Want to write a compiler? Just read these two papers (2008)
Learning Resources for Compilers
- Many recommend approachable, project-focused material over dense theory:
  - Crenshaw’s “Let’s Build a Compiler” and similar “tiny compiler” series.
  - “Crafting Interpreters” is widely praised, though some wish for a sequel covering types, optimization, and linking.
  - Incremental/educational texts: Ghuloum’s “An Incremental Approach to Compiler Construction,” “Essentials of Compilation,” a short compiler book by Wirth, and a small C-compiler book.
  - Courses and video series: nand2tetris, a well-regarded Stanford compilers course, CS6120, and other online lectures.
  - Several commenters link to freely available PDFs and archived books and papers (nanopass, Wirth, Bornat, etc.).
Difficulty and Course Experiences
- Compiler courses are repeatedly described as very hard but often rewarding.
- Some found them purely painful, while others say teacher quality made the biggest difference.
- There is disagreement over whether writing a simple compiler is “not that difficult” or beyond most CS graduates without strong guidance.
Parsing, Frontends, and Syntax
- There is strong debate over parsing approaches:
  - Some favor parser combinators and recursive descent for clarity and better error messages.
  - Others argue that the traditional lexer/parser split and parser generators are still valuable, especially for understanding grammar design.
- There is a general sense that modern educational resources de-emphasize deep parsing theory compared to the “Dragon Book.”
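
To make the recursive-descent side of the debate concrete, here is a minimal sketch of a hand-written parser for arithmetic expressions. It is not taken from the thread: the grammar, the tuple-based tree, and the names `tokenize` and `Parser` are all assumptions chosen for illustration. It shows one function per grammar rule and error messages that report a column.

```python
def tokenize(src):
    """Split src into (kind, text, column) tuples, ending with an EOF token."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            start = i
            while i < len(src) and src[i].isdigit():
                i += 1
            tokens.append(("NUM", src[start:i], start))
        elif ch in "+-*/()":
            tokens.append((ch, ch, i))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at column {i}")
    tokens.append(("EOF", "", len(src)))
    return tokens


class Parser:
    """Hypothetical recursive-descent parser: one method per grammar rule."""

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.index = 0

    def peek(self):
        return self.tokens[self.index]

    def advance(self):
        tok = self.tokens[self.index]
        self.index += 1
        return tok

    def expect(self, kind):
        tok = self.peek()
        if tok[0] != kind:
            raise SyntaxError(
                f"expected {kind!r} but found {tok[1]!r} at column {tok[2]}")
        return self.advance()

    def parse(self):
        tree = self.expr()
        self.expect("EOF")  # reject trailing input
        return tree

    # expr := term (("+" | "-") term)*
    def expr(self):
        node = self.term()
        while self.peek()[0] in ("+", "-"):
            op = self.advance()[0]
            node = (op, node, self.term())
        return node

    # term := factor (("*" | "/") factor)*
    def term(self):
        node = self.factor()
        while self.peek()[0] in ("*", "/"):
            op = self.advance()[0]
            node = (op, node, self.factor())
        return node

    # factor := NUM | "(" expr ")"
    def factor(self):
        tok = self.peek()
        if tok[0] == "NUM":
            self.advance()
            return int(tok[1])
        if tok[0] == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(
            f"expected a number or '(' but found {tok[1]!r} at column {tok[2]}")
```

Under these assumptions, `Parser("1 + 2 * 3").parse()` yields a nested tuple in which `*` binds tighter than `+`, because `term` is called from inside `expr`. The precedence lives in the call structure, which is part of why some find this style easier to read than a generated table.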
Nanopass and Incremental Design
- Nanopass is seen as underappreciated: the key idea is many small passes with explicit input/output languages and invariants.
- This structure is argued to make compilers easier to extend and debug than monolithic designs.
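
A rough sketch of the many-small-passes idea, assuming a toy expression language rather than the actual nanopass framework (which is a Scheme library). The language names L0/L1/L2, the tuple encoding, and every function name here are hypothetical. Each pass does one job, and a checker asserts that its output stays inside the declared output language.

```python
# L0: int | variable name (str) | ("neg", e) | ("+", e, e) | ("-", e, e) | ("*", e, e)
# L1: like L0, but "neg" is no longer allowed
# L2: a flat list of stack-machine instructions
L0_OPS = {"neg", "+", "-", "*"}
L1_OPS = {"+", "-", "*"}


def check_lang(expr, allowed_ops):
    """Invariant check: expr uses only operators from allowed_ops."""
    if isinstance(expr, (int, str)):
        return
    op, *args = expr
    assert op in allowed_ops, f"operator {op!r} is not allowed in this language"
    for arg in args:
        check_lang(arg, allowed_ops)


def remove_neg(expr):
    """L0 -> L1: rewrite ("neg", e) as ("-", 0, e)."""
    if isinstance(expr, (int, str)):
        return expr
    op, *args = expr
    args = [remove_neg(arg) for arg in args]
    if op == "neg":
        return ("-", 0, args[0])
    return (op, *args)


def fold_constants(expr):
    """L1 -> L1: evaluate operators whose operands are both integer literals."""
    if isinstance(expr, (int, str)):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = fold_constants(lhs), fold_constants(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return {"+": lhs + rhs, "-": lhs - rhs, "*": lhs * rhs}[op]
    return (op, lhs, rhs)


def to_stack_code(expr):
    """L1 -> L2: flatten the tree into postfix stack instructions."""
    if isinstance(expr, int):
        return [("push", expr)]
    if isinstance(expr, str):
        return [("load", expr)]
    op, lhs, rhs = expr
    opcode = {"+": "add", "-": "sub", "*": "mul"}[op]
    return to_stack_code(lhs) + to_stack_code(rhs) + [(opcode,)]


# Each entry pairs a pass with the operator set of its output language.
PASSES = [
    (remove_neg, L1_OPS),
    (fold_constants, L1_OPS),
]


def compile_expr(expr):
    check_lang(expr, L0_OPS)
    for run_pass, output_ops in PASSES:
        expr = run_pass(expr)
        check_lang(expr, output_ops)  # fail at the pass that broke the invariant
    return to_stack_code(expr)
```

Adding a feature in this style usually means inserting one more entry in `PASSES`, and a broken invariant is reported right after the pass that caused it, which is the debugging benefit commenters point to.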
Backends, IR, and Modern Concerns
- The thread highlights the importance of SSA, data-flow analysis, and IR-based backends; some feel older texts give these topics too little coverage.
- Using LLVM IR as a target is suggested as a practical way to avoid backend complexity, at the cost of learning less about codegen.
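
As a small illustration of targeting LLVM IR, the sketch below prints textual IR for an integer expression tree instead of generating machine code. It assumes a tuple-based tree and a hypothetical `emit_llvm` helper, uses only the basic `add`, `sub`, `mul`, and `ret` instructions on `i32`, and builds the text by hand rather than through any LLVM binding. The output has not been checked against a specific LLVM release.

```python
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit_llvm(expr, fn_name="compute"):
    """Return textual LLVM IR for a function that evaluates expr.

    expr is an int literal or a tuple (op, lhs, rhs) with op in OPCODES.
    """
    lines = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, int):
            return str(node)  # literals are used directly as operands
        op, lhs, rhs = node
        left = emit(lhs)
        right = emit(rhs)
        # Every temporary is assigned exactly once, so this straight-line
        # code is trivially in SSA form.
        name = f"%t{counter}"
        counter += 1
        lines.append(f"  {name} = {OPCODES[op]} i32 {left}, {right}")
        return name

    result = emit(expr)
    body = "\n".join(lines + [f"  ret i32 {result}"])
    return f"define i32 @{fn_name}() {{\nentry:\n{body}\n}}\n"


if __name__ == "__main__":
    print(emit_llvm(("+", 1, ("*", 2, 3))))
```

Everything after this point (instruction selection, register allocation, object-file output) would be left to LLVM's own tools, which is the trade-off the thread describes: less backend work, but also less exposure to how code generation works.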
AI-Generated Toy Compilers
- One side claims small LLM-generated compilers are great for learning by tinkering and seeing all phases in minimal code.
- Others criticize such projects as buggy, poorly tested, and misleading for beginners, recommending safer targets (e.g., high-level languages) if using AI at all.