
Egraphs are for optimization passes, which all operate on the same IR, and which (without egraphs) may undo or block each other, so they get repeated to eke out more optimization.

Nanopass is for compiler passes, which each have their own IR and run once in a fixed sequence.

Egraphs also require the IR to be defined a specific way, which prevents some optimizations. My understanding of https://github.com/bytecodealliance/rfcs/blob/main/accepted/... is that cranelift’s egraph optimizations are pure expression reordering/duplication/deduplication and rewrite rules.
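A toy sketch of the phase-ordering problem egraphs sidestep (made-up rules, not Cranelift's actual ones): with a fixed pass order, running constant folding before reassociation misses a rewrite that the opposite order finds.

```python
# Toy IR: nested tuples, e.g. ("add", ("add", "x", 1), 2)

def fold_consts(e):
    """Constant-fold add nodes whose operands are both literals."""
    if isinstance(e, tuple):
        op, a, b = e
        a, b = fold_consts(a), fold_consts(b)
        if op == "add" and isinstance(a, int) and isinstance(b, int):
            return a + b
        return (op, a, b)
    return e

def reassociate(e):
    """Rewrite (x + c1) + c2 into x + (c1 + c2)."""
    if isinstance(e, tuple):
        op, a, b = e
        a, b = reassociate(a), reassociate(b)
        if (op == "add" and isinstance(b, int)
                and isinstance(a, tuple) and a[0] == "add"
                and isinstance(a[2], int)):
            return ("add", a[1], ("add", a[2], b))
        return (op, a, b)
    return e

expr = ("add", ("add", "x", 1), 2)

# fold-then-reassociate misses the folding opportunity:
print(reassociate(fold_consts(expr)))  # ('add', 'x', ('add', 1, 2))
# reassociate-then-fold finds it:
print(fold_consts(reassociate(expr)))  # ('add', 'x', 3)
```

An egraph keeps both forms in one equivalence class, so neither ordering has to "win".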


There is no reason why compiler passes cannot be in an egraph; they are more general than optimizations. When you think about it, traditional compiler concerns like instruction selection are sort of an optimization problem if you squint.

Egraphs fundamentally require a first-order language, i.e. no lambda functions, because they do not handle scoping well. There are workarounds that work in some circumstances, but no general solution (that I'm aware of).

Additionally, egraphs express rewrites on a single language. Compilers are fundamentally translators between different languages; how do you write typechecking as an egraph rewrite rule? Or conversion to assembly?


But why? The goal of egraphs is to optimally order passes, but these passes have a single order.

I wrote a small nanopass-style compiler (with many small passes) and had the same experience: lots of redundancy, and hard to debug because it wasn't clear how the passes interacted.

Admittedly, this framework has extensive metaprogramming (it's a Racket DSL), so it probably has much less redundancy, but is probably even harder to debug.

To an extent, it's impossible to implement nanopass in a way that's neither redundant nor confusing without a projectional editor. Because either each pass's AST is fully redefined, which adds redundancy, or most passes' ASTs are defined in many places, which adds confusion. But my wild speculation is that one day projectional editors will gain popularity, and someone will make a compiler where one can observe and debug individual passes.


Theorem proving and complex types are like extensions on an otherwise ordinary language:

- Agda, Idris, etc. are functional languages extended with complex types

- Isabelle, Lean, etc. are functional languages extended with complex types and unreadable interactive proofs

- Dafny etc. are imperative languages extended with theorems and hints

- ACL2 is a LISP with theorems and hints

Relatedly, typeclasses are effectively logic programming layered on an otherwise functional/imperative language (like traits in Rust, mentioned in https://rustc-dev-guide.rust-lang.org/traits/chalk.html).
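A sketch of what "typeclasses as logic programming" means, with a made-up trait world (not rustc's actual solver): instance resolution is backward chaining over Horn clauses, e.g. `Eq(List[T]) :- Eq(T)`.

```python
# Hypothetical facts: Eq holds for Int and Str.
FACTS = {("Eq", "Int"), ("Eq", "Str")}

def holds(trait, ty):
    """Backward-chaining resolution, Prolog-style.

    Base instances are facts; the one rule here says Eq holds
    for List[T] whenever Eq holds for T."""
    if (trait, ty) in FACTS:
        return True
    if isinstance(ty, tuple) and ty[0] == "List":
        return holds(trait, ty[1])  # Eq (List t) :- Eq t
    return False

print(holds("Eq", ("List", ("List", "Int"))))  # True
print(holds("Eq", "Float"))                    # False
```

A real trait solver also handles unification, where clauses, and coherence, but the resolution skeleton is the same.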


> Agda, Idris, etc. are functional languages extended with complex types

I think they are not. No amount of type level extensions can turn a regular functional language like Haskell into something suitable for theorem proving. Adding dependent types to Haskell, for example, doesn't suffice. To build a theorem prover you need to take away some capability (namely, the ability to do general recursion - the base language must be total and can't be Turing complete), not add new capabilities. In Haskell everything can be "undefined" which means that you can prove everything (even things that are supposed to be false).

There's some ways you can recover Turing completeness in theorem provers. You can use effects like in F* (non-termination can be an effect). You can separate terms that can be used in proofs (those must be total) from terms that can only be used in computations (those can be Turing complete), like in Lean. But still, you need the base terms to be total, your logic is done in the fragment that isn't Turing complete, everything else depends on it.
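In Lean 4, that separation looks roughly like this (a minimal sketch):

```lean
-- Total by structural recursion: Lean accepts it, and it can be
-- unfolded inside proofs.
def len : List α → Nat
  | []      => 0
  | _ :: xs => 1 + len xs

-- General recursion must be marked `partial`; the definition is then
-- opaque to the kernel and cannot be used in proofs.
partial def collatz (n : Nat) : Nat :=
  if n ≤ 1 then 0
  else if n % 2 == 0 then 1 + collatz (n / 2)
  else 1 + collatz (3 * n + 1)
```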


This is just wrong, you're being too dogmatic. Idris specifically lets you implement non-total functions, in the same way that Rust lets you write memory-unsafe code. The idea is that you isolate them to the part of the program with effects (including the main loop), which the compiler can't verify anyway, and leave the total, formally verified stuff to the business logic. Anything that's marked as total is proven safe, so you only need to worry about a few ugly bits; just like unsafe Rust.

Idris absolutely is a general-purpose functional language in the ML family. It is Haskell, but boosted with dependent types.


Random passerby chiming in: so this means you can write "regular" software with this stuff?

While reading TFA I thought the theorem stuff deserved its own category, but I guess it's a specialization within an ur-family (several), rather than its own family?

It definitely sounds like it deserves its own category of programming language, though. The same way Lojban has ancestry in many natural languages but is very much doing its own thing.


Yes, Idris was meant to write regular code. F* is also meant to write regular code.

But I think that the theorem prover that excels most at regular code is actually Lean. The reason I think that is because Lean has a growing community, or at least is growing much faster than other similar languages, and for regular code you really need a healthy ecosystem of libraries and stuff.

Anyway, here's an article about Lean as a general-purpose language: https://kirancodes.me/posts/log-ocaml-to-lean.html


> if you're a developer who wants to exploit the multiplicative factor of a truly flexible and extensible programming language with state of the art features from the cutting-edge of PL research, then maybe give Lean a whirl!

Does not sound that appealing to me. Sounds like little consistency and having to learn a new language for every project.


I mentioned this later

> You can separate terms that can be used in proofs (those must be total) from terms that can only be used in computations (those can be Turing complete), like in Lean

What I meant is that the part of Idris that lets people prove theorems is the non-total part

But, I think you are right. Haskell could go there by adding a keyword to mark total functions, rather than marking non-total functions like Idris does. They're otherwise very similar languages.


Haskell has Liquid Haskell to do that.

> To build a theorem prover you need to take away some capability (namely, the ability to do general recursion - the base language must be total and can't be Turing complete), not add new capabilities. In Haskell everything can be "undefined" which means that you can prove everything (even things that are supposed to be false).

Despite what the fanatical constructivists (as opposed to the ones who simply think it's pragmatically nice) seem to want us to think, it turns out that you can prove interesting things with LEM (AKA call/cc) and classical logic.


How does this disagree with "the base language must be total and can't be Turing complete"?

I mean you are kinda right but kinda wrong. To get a proof checker you take a typed lambda calculus and extend it with Pi, Sigma, and Indexed Inductive types while maintaining soundness.

Yes, Haskell's `bottom` breaks soundness, but that doesn't mean you need to take away some capability from the language. You just need to extend the language with a totality checker for the new proof fragment.


A typewriter is extreme. In school I used an AlphaSmart (https://en.wikipedia.org/wiki/AlphaSmart, because my handwriting sucked). A laptop without internet would also work.

It's not hard to deploy laptops with all ports and networking disabled; it's a BIOS option on most fleet systems.

“Writing by hand” two years ago is analogous to writing without an IDE, which was rare.

How?

I usually code faster with good (next-edit) autocomplete than by writing a prompt and waiting for the agent.


The workflow isn't that you "wait for the agent".

This is like saying "EV charging is soooo slooooow" and thinking you need to stand next to the car holding the nozzle in the charging port like with a petrol car.

Of course you go do something else unless it's a literal 30 second operation.


Did you try asking Claude Design to generate a complex UI with lots of custom details?

Or “2000s aesthetic” for something before Web 2.0 (although you’ll get a generic 2000s aesthetic unless you provide more detail).


It seems to me LL and LR parser generators are overrated, and hand-written recursive descent is best in practice. I understand why academics teach them, but not why some spend so long on different parsing techniques, nor why hobbyists who just want to compile their toy language are directed to them.

I work in PL, and from my first compiler to today, I have always found recursive descent easiest, most effective (fewer bugs, better error diagnostics, fast enough), and most intuitive. Many popular language compilers use recursive descent: I know at least C# (Roslyn) and Rust do, and I believe most others do too, except Haskell (GHC) and OCaml.

The LR algorithm was simple once I learned it, and yacc-like LR (and antlr-like LL) parser generators were straightforward once I learned how to resolve conflicts. But recursive descent (at least to me) is simpler and more straightforward.

LR being more expressive than LL has never mattered. A hand-written recursive descent parser is most expressive: it has unlimited lookahead, and can modify parsed AST nodes (e.g. reordering for precedence, converting if into if-else).

The only solution that comes close is tree-sitter, because it implements GLR, provides helpful conflict messages, and provides basic IDE support (e.g. syntax highlighting) almost for free. But it’s a build dependency, while recursive descent parsers can be written in most languages with zero dependencies and minimal boilerplate.
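For a sense of what "zero dependencies and minimal boilerplate" means here, a minimal hand-written sketch (precedence climbing, one way to do the "reordering for precedence" mentioned above; the grammar is made up):

```python
import re

TOKEN = re.compile(r"\s*(\d+|[a-z]+|[-+*/()])")

def tokenize(src):
    """Split input into numbers, identifiers, and operator characters."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens):
    ast, rest = parse_expr(tokens, 0)
    if rest:
        raise SyntaxError(f"trailing tokens: {rest}")
    return ast

def parse_expr(tokens, min_prec):
    # Precedence climbing: parse the left operand, then fold in any
    # operators at or above min_prec, building a correctly shaped tree.
    lhs, tokens = parse_atom(tokens)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op, tokens = tokens[0], tokens[1:]
        rhs, tokens = parse_expr(tokens, PREC[op] + 1)  # +1 => left-assoc
        lhs = (op, lhs, rhs)
    return lhs, tokens

def parse_atom(tokens):
    if tokens[0] == "(":
        inner, tokens = parse_expr(tokens[1:], 0)
        if tokens[0] != ")":
            raise SyntaxError("expected ')'")
        return inner, tokens[1:]
    tok = tokens[0]
    return (int(tok) if tok.isdigit() else tok), tokens[1:]

print(parse(tokenize("1 + 2 * x")))  # ('+', 1, ('*', 2, 'x'))
```

Error messages, lookahead, and AST fix-ups are all just ordinary code here, which is the expressiveness point.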


> It seems to me LL and LR parser generators are overrated, and hand-written recursive descent is best in practice

I would now agree with that. My compiler experience was on a team that happened to have a LALR expert, Jeanne Musinski PhD, a student of Jeffrey Ullman. She invented a better error recovery for the language. Recursive descent would have been perfectly suited to the task.

> LR being more expressive than LL has never mattered.

Quite agree. One might guess that a language that needs that might be hard to program in.


Parser generators are great in Python (Lark for me) so you can iterate fast and get a runtime spec of your grammar.

A hand-written recursive descent parser is something you do later when you start to industrialize your code, to get better error messages, make the parser incremental, etc.

Bison/ANTLR are code generators, they do not fit well in that model.


Nowadays I’ve heard recommended Crafting Interpreters. (https://craftinginterpreters.com)

The Nanopass paper link doesn’t work.


Compilers are broad enough that when someone recommends a "compiler book", it's rarely exactly the slice you wanted.

So this made me do a runnable cheat sheet for Crafting Interpreters. I keep the parsing demonstrative, and the AST is a little more Lisp-y than the book's.

Disclaimer: it's meant to convey the essence of what you'll learn, it is NOT by any means a replacement for the book. I'd also describe the book as more of an experience (including some things Nystrom clearly enjoyed, like the visitor pattern) than a compilers manual. If anyone's interested, I can do a separate visitor-pattern cheat sheet too, also in Python.

I turned it into a 'public-facing artifact' from private scripts with an AI agent.

[0] https://ouatu.ro/blog/crafting-interpreters-cheat-sheet/


Crafting Interpreters is great, I wish it had a companion book that covered:

  - types and typing
  - optimization passes
  - object files, executables, libraries and linking
Then the two of them together would be sufficient for writing a compiler.

To your last point, "Linkers and Loaders" has no equal despite being a bit dated

> types and typing

Types and Programming Languages, Benjamin C Pierce

> object files, executables, libraries and linking

Linkers and Loaders, John R Levine


I've read Pierce. It's not a bad book, but less grounded than CI, which has an explicit "workmanlike" approach.

> types and typing

This would be like asking for a book on designing grammar. It's just too disjoint of a field to have any kind of reasonable baseline, and it's drop-dead easy to cobble a basic one together. With those two things being equal, just like with grammar, the answer to this is any resource about implementing the language you're trying to ape.


It's drop-dead easy to cobble a basic one together until you get to hairy stuff like overloading, lambdas, and generics.

The reasonable baseline would be something like Java 1. Scalars, arrays and classes. If I remember correctly, Lox even skips arrays as an exercise for the user.
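As a sketch of that baseline (a made-up toy, nowhere near Java 1's real rules): a checker for scalars and arrays only, where types are just strings.

```python
# Toy checker for a Java-1-ish core: int/bool scalars and arrays,
# no overloading, no generics. Expressions are nested tuples,
# types are strings like "int" and "int[]".

def check(expr, env):
    if isinstance(expr, bool):   # bool before int: True is an int in Python
        return "bool"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):    # variable reference
        return env[expr]
    op, *args = expr
    if op == "+":
        a, b = (check(x, env) for x in args)
        if a == b == "int":
            return "int"
        raise TypeError(f"+ expects ints, got {a} and {b}")
    if op == "index":            # arr[i]
        arr, idx = args
        at, it = check(arr, env), check(idx, env)
        if at.endswith("[]") and it == "int":
            return at[:-2]       # int[] indexed with int -> int
        raise TypeError(f"cannot index {at} with {it}")
    raise TypeError(f"unknown node {op}")

env = {"xs": "int[]", "flag": "bool"}
print(check(("+", ("index", "xs", 0), 1), env))  # int
```

Each language feature you add (overloading, subtyping, inference) grows this from a page into a research area, which is the parent's point.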


The problem is in the case of overloading or lambdas being "hairy", it follows you're well outside the scope of a beginner. Generics are explicitly outside the scope of a beginner learning to implement a basic type system.

In the case you're not a beginner, it's not true that no literature exists on type systems and their implementation. The mountain of literature is essentially inexhaustible, but if you're still wondering about how to implement any type system at all, if you've never done it before, you really need to not worry so much about that. Kind of like how you don't even think about implementing a tracing JIT before you've written the world's shittiest tree-walking interpreter.


Many languages like Pancake Stack are looking for efficient interpreters:

https://esolangs.org/wiki/Pancake_Stack

:)




Incredible book for self-guided learning!

Awesome course! I finished it during my final CS year because I had to wait on a bunch of classes (and frankly had no one to talk to before classes). I haven't tried nanopass, but there are other links that work, so I'll give it a go.

“‘Proven’ code turns out buggy, because of a bug outside the proven specification” is relatively common. It happened in CompCert: https://lobste.rs/s/qlrh1u/runtime_error_formally_verified
