Want to write a compiler? Just read these two papers (2008)

Learning Resources for Compilers

  • Many recommend approachable, project-focused material over dense theory:
    • Crenshaw’s “Let’s Build a Compiler” and similar “tiny compiler” series.
    • “Crafting Interpreters” is widely praised, though some wish for a sequel covering types, optimization, and linking.
    • Incremental/educational texts: Ghuloum’s “An Incremental Approach to Compiler Construction,” “Essentials of Compilation,” a short compiler book by Wirth, and a small C-compiler book.
    • Courses and video series: nand2tetris, a well-regarded Stanford compilers course, CS6120, and other online lectures.
  • Commenters also share several links to freely available PDFs and archived books and papers (nanopass, Wirth, Bornat, etc.).

Difficulty and Course Experiences

  • Compiler courses are repeatedly described as very hard but often rewarding.
  • Some found them purely painful; others say the quality of teaching made the biggest difference.
  • There is disagreement over whether writing a simple compiler is “not that difficult” or beyond most CS graduates without strong guidance.

Parsing, Frontends, and Syntax

  • Strong debate on parsing approaches:
    • Some favor parser combinators and recursive descent for clarity and better error messages.
    • Others argue traditional lexer/parser splits and parser generators are still valuable, especially for understanding grammar design.
  • The general sense is that modern educational resources put less emphasis on deep parsing theory than the “Dragon Book” does.
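
To make the recursive-descent argument concrete, here is a minimal sketch in Python, assuming a toy arithmetic grammar that is not taken from the thread. The names tokenize, Parser, ParseError, and parse are hypothetical. Each grammar rule becomes one method, and because the parser knows which rule it is in when something goes wrong, it can say what it expected and where.

```python
import re

# Integers, or any single non-space character treated as an operator or paren.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


class ParseError(Exception):
    pass


def tokenize(text):
    """Return a list of (kind, value, column) tuples ending with an eof token."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        number, op = m.groups()
        if number:
            tokens.append(("num", int(number), m.start(1)))
        else:
            tokens.append(("op", op, m.start(2)))
        pos = m.end()
    tokens.append(("eof", None, len(text)))
    return tokens


def describe(token):
    kind, value, _ = token
    return "end of input" if kind == "eof" else repr(value)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        token = self.tokens[self.i]
        self.i += 1
        return token

    def binary(self, operand, ops):
        # Shared loop for left-associative rules: operand (op operand)*
        node = operand()
        while self.peek()[0] == "op" and self.peek()[1] in ops:
            op = self.advance()[1]
            node = (op, node, operand())
        return node

    def expr(self):    # expr := term (('+' | '-') term)*
        return self.binary(self.term, "+-")

    def term(self):    # term := factor (('*' | '/') factor)*
        return self.binary(self.factor, "*/")

    def factor(self):  # factor := NUMBER | '(' expr ')'
        kind, value, col = self.peek()
        if kind == "num":
            self.advance()
            return value
        if kind == "op" and value == "(":
            self.advance()
            node = self.expr()
            if self.peek()[:2] != ("op", ")"):
                raise ParseError(
                    f"expected ')' at column {self.peek()[2]}, "
                    f"found {describe(self.peek())}")
            self.advance()
            return node
        raise ParseError(
            f"expected a number or '(' at column {col}, "
            f"found {describe(self.peek())}")


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek()[0] != "eof":
        raise ParseError(
            f"unexpected {describe(parser.peek())} at column {parser.peek()[2]}")
    return tree
```

For example, parse("1 + 2 * 3") returns the nested tuple ("+", 1, ("*", 2, 3)), while parse("1 +") raises a ParseError saying a number or '(' was expected at column 3. A parser generator would derive the same structure from a grammar file; the trade-off the thread debates is between that declarative grammar and this kind of hand-written control over errors.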

Nanopass and Incremental Design

  • Nanopass is seen as underappreciated: the key idea is many small passes with explicit input/output languages and invariants.
  • This structure is argued to make compilers easier to extend and debug than monolithic designs.
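
The idea of small passes with explicit input and output languages can be sketched in a few lines of Python. This is only an illustration under assumed names (check_language, desugar_neg, fold_constants, run_pipeline are all hypothetical); it is not the API of the actual nanopass framework, which is a Scheme/Racket DSL. Expressions are nested tuples, and each "language" is simply the set of operators allowed at that stage.

```python
# Each intermediate language is described by the operators it permits.
L0 = {"+", "-", "neg"}   # source language: includes unary negation
L1 = {"+", "-"}          # after desugaring: negation is gone


def check_language(expr, allowed_ops):
    """Assert the invariant that expr only uses operators in allowed_ops."""
    if isinstance(expr, (int, str)):   # leaves: integer literals and variable names
        return
    op, *args = expr
    assert op in allowed_ops, f"operator {op!r} is not allowed at this stage"
    for arg in args:
        check_language(arg, allowed_ops)


def desugar_neg(expr):
    """L0 -> L1: rewrite ("neg", e) as ("-", 0, e)."""
    if isinstance(expr, (int, str)):
        return expr
    op, *args = expr
    args = [desugar_neg(a) for a in args]
    if op == "neg":
        return ("-", 0, args[0])
    return (op, *args)


def fold_constants(expr):
    """L1 -> L1: evaluate operations whose operands are both integer literals."""
    if isinstance(expr, (int, str)):
        return expr
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left - right
    return (op, left, right)


# Every pass declares the language it consumes and the language it produces.
PASSES = [
    (desugar_neg, L0, L1),
    (fold_constants, L1, L1),
]


def run_pipeline(expr):
    for pass_fn, input_lang, output_lang in PASSES:
        check_language(expr, input_lang)    # precondition of the pass
        expr = pass_fn(expr)
        check_language(expr, output_lang)   # postcondition of the pass
    return expr
```

Running run_pipeline(("+", "x", ("neg", ("+", 1, 2)))) yields ("+", "x", -3). If a pass leaked a "neg" node into L1, the check after that pass would fail immediately and point at the offending pass, which is the debugging benefit commenters attribute to this style.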

Backends, IR, and Modern Concerns

  • The thread highlights the importance of SSA, data-flow analysis, and IR-based backends; some feel older texts cover these too thinly.
  • Using LLVM IR as a target is suggested as a practical way to avoid backend complexity, at the cost of learning less about codegen.
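
As a rough illustration of targeting LLVM IR, the sketch below walks a tuple-based expression tree and emits text in the style of LLVM's textual IR. The function name emit_function and the tree shape are assumptions made for this example, the output has not been run through LLVM tools here, and only 32-bit integer add, sub, and mul are handled. Every temporary is assigned exactly once, which is the SSA property the thread mentions.

```python
import itertools

# Maps source operators to the corresponding integer instruction mnemonics.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit_function(name, expr):
    """Return IR-style text for a function that computes expr and returns it as i32."""
    lines = []
    counter = itertools.count()

    def emit(node):
        # Integer literals are used directly as operands.
        if isinstance(node, int):
            return str(node)
        op, left, right = node
        lhs = emit(left)
        rhs = emit(right)
        # A fresh name per instruction: each temporary is defined exactly once.
        temp = f"%t{next(counter)}"
        lines.append(f"  {temp} = {OPCODES[op]} i32 {lhs}, {rhs}")
        return temp

    result = emit(expr)
    lines.append(f"  ret i32 {result}")
    body = "\n".join(lines)
    return f"define i32 @{name}() {{\n{body}\n}}\n"


if __name__ == "__main__":
    print(emit_function("answer", ("+", 2, ("*", 4, 10))))
```

For the tree ("+", 2, ("*", 4, 10)) this emits a mul into %t0, an add of 2 and %t0 into %t1, and a ret of %t1. Register allocation, instruction selection, and optimization are then left to the LLVM toolchain, which is both the convenience and the lost learning opportunity noted above.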

AI-Generated Toy Compilers

  • One side claims small LLM-generated compilers are great for learning by tinkering and seeing all phases in minimal code.
  • Others criticize such projects as buggy, poorly tested, and misleading for beginners, recommending safer targets (e.g., high-level languages) if using AI at all.