Writing a C compiler in 500 lines of Python (2023)
Python-in-500-lines Counterchallenge & Data Structures
- A tongue-in-cheek response suggests writing a Python compiler in 500 lines of C; commenters note a minimal Python bytecode VM is plausible but far larger than 500 LOC in practice.
- With a strict line budget, people argue about dictionary implementations:
- One view: just use linked lists and linear search to save lines.
- Others show that simple hash tables can be written in ~10–30 lines and even “hashed lists” are nearly free syntactically.
- Several note that with a 500-line constraint, performance is irrelevant; correctness and smallness trump data-structure sophistication.
Interpreted vs Compiled Languages
- A claim that “Python is an interpreted language” is pushed back on: any language can be compiled; Python already compiles to bytecode (
.pyc) and has ahead-of-time or JIT-style compilers. - Python and Ruby are described as “nightmarish” to compile fully because of monkey patching, dynamic method creation, decorators, etc.; existing compilers often target restrictive subsets.
Learning Compilers & Linguistics
- Readers say the article demystifies compilers to the point they feel they could target small MCUs (e.g., AVR), even if it’d still be hard.
- Several connect compiler design to linguistics and formal grammars (Chomsky hierarchy) and note similar techniques in domains like DNA/RNA analysis.
- Prior minimalist C compilers (e.g., tiny self-hosting subsets) are referenced as further study.
Single-Pass vs Multi-Pass & Language Choices
- Some are surprised a single-pass compiler can be “easier” than a lexer–parser–AST–IR pipeline, given the latter’s optimization potential.
- Others stress that fewer lines ≠ less conceptual complexity, and that language choice matters:
- ML/OCaml-like languages are said to be far more concise for ASTs and pattern matching than Python’s class-based style.
- Discussion branches into generic functions, the expression problem, and trade-offs between OO and functional designs.
- Single-pass compilers are framed as reasonable for toy or historically resource-constrained systems, but not for serious optimization.
C’s Complexity, Standards, and “Simplicity” Debate
- A commenter notes that no compiler fully implements the modern C spec; real C parsers are tens of thousands of lines, and “you can’t actually parse a C header” without the full toolchain.
- Others argue this “C is impossible” narrative is exaggerated and often comes from people stuck at K&R-level understanding; they emphasize:
- C is defined in terms of an abstract machine; ABIs are platform/toolchain contracts, not in the language standard.
- Using the compiler to parse headers is natural, not a failure.
- Type-size issues (
int,intmax_t) are about portability and ABI ossification, not fundamental design flaws.
- There’s criticism of C’s feature creep and the heavy use of GCC/Clang extensions (e.g., for building the Linux kernel), along with proposals for a “smaller, saner C” (single loop construct, sized primitives, no implicit casts, explicit atomics, etc.).
- In contrast, some defend “simple C” (e.g., C89 subsets) as still practical, fast, and lightweight for small programs, especially when avoiding large external libraries.
Other Notes
- The article’s visual depiction of a compiler is widely praised as clear and charming.
- WebAssembly is seen as a clean, if slightly odd, target; another book on writing a more full-featured C compiler is recommended.
- Tangential threads cover nostalgic programmable calculators and a joking attempt to coin a new word (“cremement”).