> It's a huge mistake to start building with Claude without mapping out a project in detail first, by hand.
I agree with you in theory but in my opinion, it doesn't work so well when you don't even know exactly what you are looking for at the start. Yes, I knew I wanted a formatter, linter, and parser, but which language those should be written in, whether they should be one project or many, how the pieces should fit together: none of that was clear to me.
As I pointed out in the article, in this sort of "greenfield project" I work a lot better with concrete prototypes and code in front of me that I can dissect, instead of endlessly playing with designs in my head.
> It's also nuts to me that he had to go back in later to build in tests and validation.
I think this is a little misleading. Yes, I did do some testing retroactively (i.e. the upstream validation testing) but I was using TDD + verifying outputs immediately, even during the vibe coding phase. The problem, as I point out, is that this is not enough. Even when I had unit tests written at the same time as the code, they had lots of holes, and over time I kept hitting SQL statements which failed but which the testing did not cover.
> in these sort of "greenfield projects" I work a lot better with concrete prototypes and code in front of me I can dissect instead of trying to endlessly play with designs in my head
I'd really recommend separating prototyping work like that out into a pre-design phase. Do the prototypes and figure out the direction for the actual project, but then come back in with a clean repo and design docs built off the prototypes, for claude to work from. I started out using claude to refactor my old projects (or even my codex ones) before I realized it worked better starting fresh.
I think sometimes it silently decides that certain pieces of code or design are absolute constraints, and won't actually remove or change them unless you explicitly tell it to. Usually I run into this towards the end of implementation, when I'll see something I don't expect to and have to tell it to rip it out.
One example recently was an entire messaging queue (nats jetstream) docker image definition that was sitting in the deploy files unused, but claude didn't ever mention or care about it as it worked on those files; it just silently left it sitting there.
Another example was an auth-bypass setting I built in for local testing during prototyping, being not just left alone by Claude but actually propagated into other areas of the application (e.g. API) without asking.
How many of the 500 tests were actually reviewed/tested and found good? The code review results were: doesn't understand the code and/or it's pretty bad. Then 0 of those 500 tests were used, due to the full rewrite.
So nothing to extrapolate usefulness from, all that's left is a feel...
> giving me better understanding
Examples of that would also be nice (I don't doubt the personal feel that waste was justified)
> JOURNAL before: ...
> JOURNAL after: ...
> was wrong here, learned this
> How many of 500 tests were actually reviewed/tested and found good?
Essentially ~all of the tests were found to be useful, but in more of a "smoke test" capacity, i.e. they provided good "basic" coverage but it was clear that this was also not sufficient.
Which is why in the rewrite:
1) I built a TCL driver that ran the upstream SQLite tests and verified we accepted or rejected the SQL in the same way as SQLite.
2) I wrote a test runner which checked for "idempotence", i.e. it ran the formatter over all the SQL from all the other types of tests and then verified that the ASTs of the input and output were identical.
3) I also wrote a script which ran the formatter over the PerfettoSQL standard library [1], a real world SQLite-based codebase that I knew and deeply understood so I could go through each file and manually check the output.
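The "idempotence" check above can be sketched in a few lines. This is a toy illustration, not the real syntaqlite code: `tokenize` and `format_sql` are made-up stand-ins, and the token stream stands in for a real AST comparison.

```python
import re

def tokenize(sql: str) -> list[str]:
    # Crude token stream as a stand-in for a real AST.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", sql)

def format_sql(sql: str) -> str:
    # Toy formatter: collapse whitespace, uppercase keywords.
    keywords = {"select", "from", "where"}
    tokens = tokenize(sql)
    return " ".join(t.upper() if t.lower() in keywords else t for t in tokens)

def check_idempotent(sql: str) -> bool:
    formatted = format_sql(sql)
    # The parse result (here: case-folded token stream) must be unchanged,
    # and formatting the already-formatted output must be a no-op.
    before = [t.lower() for t in tokenize(sql)]
    after = [t.lower() for t in tokenize(formatted)]
    return before == after and format_sql(formatted) == formatted

print(check_idempotent("select  x from t  where x > 1"))  # True
```

The point of the harness is that it can be run over every piece of SQL the other test suites already contain, turning the whole corpus into formatter tests for free.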
> Examples of that would also be nice (I don't doubt the personal feel that waste was justified)
Some things learned concretely:
1) C was not going to work for the higher-level parts of the project; even the formatter was not pleasant to read or write in C, and the validator was much worse
2) Doing the SQLite source extraction in the same language meant that I could ship a really cool feature where the syntaqlite CLI could "generate dialect extensions" without people needing to download a separate script, run their own extraction on the SQLite source code, or worse yet, need to fork syntaqlite. This actually makes it technically possible for people in the web playground to dynamically build extensions to SQLite (though I haven't ended up plumbing that feature through yet)
3) Having a DSL [2] for extensions of SQLite (that e.g. PerfettoSQL could use) was the correct way to go, rather than using YAML/JSON/XML etc, because of how much clarity it provided and how much of the annoyance of maintaining a DSL the AI took away.
4) I need to invest much more in testing from the start and also more testing where the correctness can be "proved" in some way (e.g. idempotence testing or SQLite upstream testing as described above)
sqlitebrowser.org is cool but it's not the sort of developer tool I'm talking about. As I clarify in the side notes, I'm looking for a formatter, linter, and LSP, not an IDE.
As I replied to some other comment, I'm very aware that there is a syntax diagram but that really only tells half the story. If you actually look at those diagrams in detail, or you look into the actual parse.y grammar (https://sqlite.org/src/file?name=src/parse.y&ci=trunk), you'll find that they're missing a lot of the information required for you to actually interpret the SQL into an AST.
When I refer to "extracting sources from the SQLite codebase" a big part of that was indeed referring to compiling Lemon and executing it against a custom implementation of parse.y [1].
The problem comes from how SQLite's upstream parse.y works. Because it doesn't actually generate a parse tree, instead generating the bytecode directly, the interpretation of any node labelled "id" or "nm" is buried inside the source code behind many layers of functions. You can see for yourself by looking at SQLite's parse.y [2]
Ah, that makes sense. Thanks for the details. I see now that your article basically had all the information I needed to figure this out if I’d thought a bit harder!
Also, nice work: this makes the world just a little nicer!
I'm very well aware of parse.y; if you look into the syntaqlite code, you'll find it's a critical part of how the whole "source extraction" mentioned in the article works [1]
To be clear when I say "formal specification", I'm not just talking about the formal grammar rules but also how those interpreted in practice. Something closer to the ECMAScript specification (https://ecma-international.org/publications-and-standards/st...).
Was an interesting read but sad to see that there is nothing special for matching or destructuring tagged unions (beyond the special-cased optional type). That's one of the things from Rust I miss the most in my day-to-day work with C/C++.
It appears Tabula also gets the substituted content instead.
What I'm seeing is that for example, POS is substituted to & !ë on every line in every file, etc. I can see by comparing to the rendered PDF for other common text (like my name, the local supermarket, etc) that those all seem to be 1:1 substitutions too.
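The "compare against the rendered PDF" trick above amounts to known-plaintext recovery of a 1:1 substitution. A minimal sketch of that idea, with entirely made-up glyph pairs (the real substitutions in the files are different):

```python
def build_substitution_map(pairs):
    """pairs: list of (extracted, actual) strings of equal length."""
    mapping = {}
    for extracted, actual in pairs:
        assert len(extracted) == len(actual)
        for bad, good in zip(extracted, actual):
            # A true 1:1 substitution never maps one glyph two ways,
            # so any conflict means the substitution isn't 1:1.
            if mapping.setdefault(bad, good) != good:
                raise ValueError(f"inconsistent mapping for {bad!r}")
    return mapping

def decode(text, mapping):
    # Glyphs we never saw in a known-plaintext pair pass through unchanged.
    return "".join(mapping.get(ch, ch) for ch in text)

# Known-plaintext pairs gathered by eyeballing the rendered PDF
# (illustrative values only, not the real glyphs):
pairs = [("q&z", "POS"), ("t!", "Jo")]
m = build_substitution_map(pairs)
print(decode("q&z t!", m))  # "POS Jo"
```

If the `ValueError` never fires across many pairs, that's good evidence the extraction really is a consistent per-glyph substitution rather than arbitrary corruption.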
The core software itself is completely agnostic to currency or country. Otoh the importer ecosystem is somewhat US focused but a) you can easily write importers yourself b) I just published importers for a bunch of UK institutions (e.g. HSBC, Barclays, AJ Bell) - see the other post hanging around the HN frontpage :)
> How do you think it compares time-wise to using existing accounting software?
Author here. I tried various consumer budgeting apps before I ended up building my own (and then going to Beancount). The main problem with every one of the apps I tried is that they don't handle investments well. 99% of my money is invested and having net worth figures which are wildly wrong because the app is only tracking bank accounts really annoyed me. That was the reason I built my own thing in the first place.
> Was the time investment worth it to get the control and visibility you now have?
Absolutely yes. I think it helps me really understand where my money is going, how I can make it work harder etc. Even though the RE part of FIRE doesn't appeal to me, the FI part does and knowing where I stand at all times has been very motivating.
Thank you for taking the time to reply!
I have a question on a more personal front - please feel no obligation to reply.
What impact has having such clear visibility into your accounts had on your relationship with your wife? It feels like it would be a great catalyst for communication, trust and building things if shared finances are a key part of the relationship.
I think this part was the most inspirational - it takes a lot of courage to be that open about finances, even with partners, perhaps especially with partners.
> What impact has having such clear visibility into your accounts had on your relationship with your wife?
That's a great question. Thankfully when it comes to finances we are very aligned in our habits and goals. So we find it very natural to be open because we know that we're both going to be aligned.
Where we differ heavily though is how much we are willing to really get into the nitty-gritty details. She really likes knowing how our money works but she has no interest in spending so much time and effort on it.
But that works great because I love this stuff. So every month or so we have a "finances session" where I sit down with her, take her through the books and make sure we're both happy with everything.
Obviously whether this works very much depends on the couple, but it has for us so far!
Would you consider a follow up blog post about how you structure and approach your monthly finance sessions? I understand that it would be well outside of your topics of software engineering, performance and open-source, but I find that the human component of our industry is often missing. An insight into how someone has successfully navigated that would be a wonderful read.
Given the comments on this post and the other Beancount one in the frontpage, I definitely think there's enough I want to say for at least one if not two followups. Stay tuned!
I'm talking about apps similar to Mint and YNAB. Specifically I used to use an app called Yolt (which was shut down) and then was on an app called Emma for a bit.
I'm sure GnuCash would also work just fine but ultimately it's also a full double entry system. I never tried it because I came across ledger/hledger/beancount first and, being command line tools, they appealed more to my sensibilities.
Firstly, thanks for the thoughtful comments across the post. Really happy you found it insightful!
> But it's best to build a strong portfolio of going fast (where needed), going slow (where high leverage), and if everyone agrees your "going slow" led to huge returns you get the best of all worlds.
I guess maybe I oversold this a little in the post but I definitely think that product orgs value speed more than infra orgs: when I speak to product engineers there's often a mismatch between how fast they expect us to build things for them vs how fast we can actually go while considering all our other customers and not breaking other usecases.
> my protip for prospective staff engineers is to _never_ say you only care about [speed, stability]
Fully agreed, both are important for engineers to have. As above, I think the relative composition of them varies though depending on the area (just pulling numbers from the top of my head, maybe it's 70/30 in favor of speed in product orgs, 30/70 in favor of stability in infra orgs).
> surely we can agree impressing customer stakeholders is even more important in (healthy) product orgs
You're absolutely right about what I say in the post; reflecting on it, I think I maybe did not go into this topic in enough detail for the nuance it deserves (but the post already felt long enough!).
Let's split the discussion into 3 different areas (as IMO they each work slightly differently): infra/devex, B2C and B2B.
* Infra/devex: as I say in the post, it's critical to impress your other team's managers as that's how you prove impact.
* B2C: your customers are consumers so it's all MAUs, revenue etc. There is no "customer stakeholder" who will give you direct feedback for your promo packet
* B2B: here's where it gets interesting. If your management chain is directly able to talk to customers (and especially their senior managers) to solicit feedback without middle layers (partner managers, account managers, PMs) then yes, you're absolutely correct. But often this is not the case: the middle layers act as a filtering point, so you only get a fuzzy sense of how stakeholders in the other company feel about a specific technical thing you worked on. So again it's sort of at the whim of how your manager feels the other company's managers feel.
The basic point I was trying to make is that if you're working on an external-facing product, from my understanding, even if you impress your external stakeholders, it's not like you can attribute concrete quotes to the customers in your promo packet, quotes like "we couldn't have done X in our company without the work of Y engineer in your org who worked on Z". Whereas this sort of quote is extremely common to see in infra promo packets.
Hope this made sense, I'm not sure I communicated this last point well!