2025-09-03

Writing a C compiler in 500 lines of Python (2023)

Python-in-500-lines Counterchallenge & Data Structures

A tongue-in-cheek response suggests writing a Python compiler in 500 lines of C; commenters note a minimal Python bytecode VM is plausible but far larger than 500 LOC in practice.
With a strict line budget, people argue about dictionary implementations:
- One view: just use linked lists and linear search to save lines.
- Others show that simple hash tables can be written in ~10–30 lines and even “hashed lists” are nearly free syntactically.
Several note that with a 500-line constraint, performance is irrelevant; correctness and smallness trump data-structure sophistication.

Interpreted vs Compiled Languages

A claim that “Python is an interpreted language” is pushed back on: any language can be compiled; Python already compiles to bytecode (.pyc) and has ahead-of-time or JIT-style compilers.
Python and Ruby are described as “nightmarish” to compile fully because of monkey patching, dynamic method creation, decorators, etc.; existing compilers often target restrictive subsets.

Learning Compilers & Linguistics

Readers say the article demystifies compilers to the point they feel they could target small MCUs (e.g., AVR), even if it’d still be hard.
Several connect compiler design to linguistics and formal grammars (Chomsky hierarchy) and note similar techniques in domains like DNA/RNA analysis.
Prior minimalist C compilers (e.g., tiny self-hosting subsets) are referenced as further study.

Single-Pass vs Multi-Pass & Language Choices

Some are surprised a single-pass compiler can be “easier” than a lexer–parser–AST–IR pipeline, given the latter’s optimization potential.
Others stress that fewer lines ≠ less conceptual complexity, and that language choice matters:
- ML/OCaml-like languages are said to be far more concise for ASTs and pattern matching than Python’s class-based style.
- Discussion branches into generic functions, the expression problem, and trade-offs between OO and functional designs.
Single-pass compilers are framed as reasonable for toy or historically resource-constrained systems, but not for serious optimization.

C’s Complexity, Standards, and “Simplicity” Debate

A commenter notes that no compiler fully implements the modern C spec; real C parsers are tens of thousands of lines, and “you can’t actually parse a C header” without the full toolchain.
Others argue this “C is impossible” narrative is exaggerated and often comes from people stuck at K&R-level understanding; they emphasize:
- C is defined in terms of an abstract machine; ABIs are platform/toolchain contracts, not in the language standard.
- Using the compiler to parse headers is natural, not a failure.
- Type-size issues (int, intmax_t) are about portability and ABI ossification, not fundamental design flaws.
There’s criticism of C’s feature creep and the heavy use of GCC/Clang extensions (e.g., for building the Linux kernel), along with proposals for a “smaller, saner C” (single loop construct, sized primitives, no implicit casts, explicit atomics, etc.).
In contrast, some defend “simple C” (e.g., C89 subsets) as still practical, fast, and lightweight for small programs, especially when avoiding large external libraries.

Other Notes

The article’s visual depiction of a compiler is widely praised as clear and charming.
WebAssembly is seen as a clean, if slightly odd, target; another book on writing a more full-featured C compiler is recommended.
Tangential threads cover nostalgic programmable calculators and a joking attempt to coin a new word (“cremement”).

Related topics