Writing a C Compiler: Build a Real Programming Language from Scratch

Book availability & scope

  • The book was released in 2024. There was some confusion over release dates (US vs EU, Amazon vs publisher), but it is available now from the publisher, and at least one reader has a physical copy.
  • It walks through building a compiler for a subset of C, not the full language.
  • It focuses on compiling to assembly rather than only interpreting or emitting bytecode.
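
To make "compiling to assembly" concrete, here is a minimal sketch of the smallest possible code generator: it turns a function that returns an integer constant into assembly text. The function name emit_return_constant, the choice of x86-64 with AT&T syntax, and the Linux-style symbol naming are assumptions for illustration, not taken from the book or the thread.

```python
def emit_return_constant(function_name: str, value: int) -> str:
    """Return assembly text for a function that just returns `value`.

    Assumes x86-64, AT&T syntax, and Linux-style symbol names
    (illustrative choices, not the book's code).
    """
    lines = [
        f"    .globl {function_name}",  # make the symbol visible to the linker
        f"{function_name}:",            # function entry label
        f"    movl ${value}, %eax",     # integer return values go in EAX
        "    ret",                      # return to the caller
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Roughly what `int main(void) { return 2; }` could compile to.
    print(emit_return_constant("main", 2), end="")
```

The output is plain text that an assembler and linker would turn into an executable, which is the point where an assembly-targeting compiler hands off to other tools.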

Comparison to other compiler resources

  • Frequently compared to “Crafting Interpreters”: this book goes further into native code generation; Crafting Interpreters focuses on interpreters and a bytecode VM.
  • Seen as more hands‑on and implementation‑driven than classic, theory‑heavy compiler texts.
  • Other resources mentioned: older C-based compiler books, an ML-based “Modern Compiler Implementation” series, a “retargetable C compiler” book, and projects like chibicc (for preprocessor/driver) and nand2tetris.

Implementation language & OCaml debate

  • Reference implementation is in OCaml; the book itself uses pseudo‑code so readers can implement in any language.
  • Some find OCaml off‑putting or unfamiliar; others argue ML-family languages are particularly well‑suited to compilers (algebraic data types, pattern matching, GC).
  • Counterpoint: many production compilers are in C/C++ or are self‑hosted; choice of implementation language is largely preference and ecosystem.

Modern compiler design changes (vs past decades)

  • Several comments discuss what’s different now: dominance of SSA-based IRs, multiple IR stages, global and whole‑program optimization, formal semantics, and stronger requirements on correctness of optimizations.
  • Parsing has become relatively less central. Hand‑written parsers are now common because they give better error messages and a better developer experience, though parser generators historically powered many “real” languages.
  • LLVM IR is a common target; some languages also have faster custom backends or alternative debug backends.
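
The point about hand-written parsers and error messages can be illustrated with a small recursive-descent parser. This is a sketch for a toy grammar (integers, +, *, parentheses), not C; the names ParseError, tokenize, Parser, and parse are hypothetical. Because each grammar rule is an ordinary method, it knows exactly what it expected at the point of failure and can say so.

```python
class ParseError(Exception):
    """Raised with a column number and a description of what was expected."""


def tokenize(source):
    """Split source into (kind, text, column) tuples, ending with an 'eof' token."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif "0" <= ch <= "9":
            start = i
            while i < len(source) and "0" <= source[i] <= "9":
                i += 1
            tokens.append(("int", source[start:i], start))
        elif ch in "+*()":
            tokens.append((ch, ch, i))
            i += 1
        else:
            raise ParseError(f"column {i}: unexpected character {ch!r}")
    tokens.append(("eof", "", len(source)))
    return tokens


class Parser:
    def __init__(self, source):
        self.tokens = tokenize(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        token = self.peek()
        if token[0] != kind:
            found = token[1] or "end of input"
            raise ParseError(f"column {token[2]}: expected {kind!r}, found {found!r}")
        return self.advance()

    def parse_expr(self):
        # expr := term ('+' term)*
        node = self.parse_term()
        while self.peek()[0] == "+":
            self.advance()
            node = ("add", node, self.parse_term())
        return node

    def parse_term(self):
        # term := factor ('*' factor)*
        node = self.parse_factor()
        while self.peek()[0] == "*":
            self.advance()
            node = ("mul", node, self.parse_factor())
        return node

    def parse_factor(self):
        # factor := INT | '(' expr ')'
        kind, text, column = self.peek()
        if kind == "int":
            self.advance()
            return ("int", int(text))
        if kind == "(":
            self.advance()
            inner = self.parse_expr()
            self.expect(")")
            return inner
        found = text or "end of input"
        raise ParseError(f"column {column}: expected a number or '(', found {found!r}")


def parse(source):
    """Parse a whole expression and require that nothing follows it."""
    parser = Parser(source)
    tree = parser.parse_expr()
    parser.expect("eof")
    return tree
```

For example, parse("1 + 2 * 3") returns a nested tuple tree with the multiplication below the addition, while parse("(1 + 2") raises a ParseError that names the column and says a closing parenthesis was expected. A generated parser can produce similar diagnostics, but it usually takes extra work to get there.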

Learning experience & pedagogy

  • Multiple readers are actively working through the book and report it’s more demanding than some “light” books but more rewarding and concrete.
  • Each chapter includes tests; the book often omits low‑level implementation detail, expecting readers to consult code or design their own.
  • One person is implementing the compiler in Ada; others consider Rust and other languages.

Beyond the compiler: assemblers, debuggers, databases

  • Some want coverage that goes all the way to machine code, binary formats, debuggers, breakpoints, and hot‑reloading. The book stops at assembly, but its techniques carry over.
  • JIT/assembler libraries and GNU binutils are suggested for machine‑code generation; ptrace and symbol tools for simple debuggers.
  • A tangential subthread discusses books for building database systems from scratch.

Tooling, pattern matching, and debugging culture

  • Pattern matching over ASTs is highlighted as a major reason to prefer ML-like languages; it makes tree transforms shorter and clearer.
  • Debate over debugging styles: some say FP plus strong types reduce the need for interactive debuggers; others insist on breakpoints and IDE support and find pure “print debugging” inadequate.
  • OCaml’s debugger exists but tooling, especially on Windows/VS Code, is viewed as weaker than F#/.NET tooling.