Writing a Rust compiler in C
Project goals and approach
- Dozer is a Rust compiler written in portable, minimal C that targets Cranelift/QBE. It aims to fit into the “from-nothing” bootstrappable toolchain that starts with a tiny C compiler such as TinyCC.
- Main motivation: drastically shorten and simplify the bootstrap path to a modern Rust compiler compared to the current chain (Guile → OCaml → early Rust → many rustc versions).
- The intended use is not performance or daily development but a bootstrap step that can compile the “real” rustc.
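QBE is one of the named backends. As an illustration, here is a minimal sketch (not Dozer's code) of how a front end written in C might lower a tiny i32-only expression tree into QBE's textual IL. The `Expr` type and the `emit_expr` and `emit_main` functions are hypothetical names chosen for this example. The emitted text follows QBE's IL syntax: `export function w $main()`, a `@start` block, and `=w` word-sized instructions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for an i32-only expression language. */
typedef enum { EXPR_INT, EXPR_ADD, EXPR_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                 /* used when kind == EXPR_INT */
    const struct Expr *lhs;    /* used by binary operators */
    const struct Expr *rhs;
} Expr;

/* Emit QBE instructions that compute `e` into a fresh temporary.
   Returns that temporary's number; *next counts temporaries handed out. */
static int emit_expr(FILE *out, const Expr *e, int *next) {
    assert(e != NULL);
    if (e->kind == EXPR_INT) {
        int t = (*next)++;
        /* Load the constant into a temporary with QBE's `copy`. */
        fprintf(out, "\t%%t%d =w copy %d\n", t, e->value);
        return t;
    }
    int a = emit_expr(out, e->lhs, next);
    int b = emit_expr(out, e->rhs, next);
    int t = (*next)++;
    const char *op = (e->kind == EXPR_ADD) ? "add" : "mul";
    fprintf(out, "\t%%t%d =w %s %%t%d, %%t%d\n", t, op, a, b);
    return t;
}

/* Emit `fn main() -> i32 { <body> }` as a QBE function returning a word. */
static void emit_main(FILE *out, const Expr *body) {
    int next = 0;
    fprintf(out, "export function w $main() {\n@start\n");
    int result = emit_expr(out, body, &next);
    fprintf(out, "\tret %%t%d\n}\n", result);
}
```

For `1 + 2`, `emit_main` writes a `copy` for each constant, an `add` of the two temporaries, and a `ret`. Each subexpression gets its own temporary, assigned exactly once. QBE also accepts constants directly as instruction operands, so a real emitter could skip the `copy` steps. The sketch keeps them for uniformity.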
Why C, and why not other languages
- Many argue that C is the practical first target because nearly every platform gets a C compiler first, and software written in minimal C, such as Lua, ports easily.
- Alternatives proposed: Java, a proto-Rust subset, Forth, WASM+wasm2c, or decompiling rustc output back to C. Critics note:
  - Java's own bootstrapping is complex.
  - Forth is ideal conceptually but unpleasant enough to program in that no one follows through at scale.
  - Generated C or binary blobs (like Zig’s wasm stage1) are seen as unauditable and against bootstrappable-build principles.
Security, trust, and reproducible builds
- A major driver is reducing exposure to “trusting trust” style compiler backdoors and supply-chain attacks.
- Bootstrappable Builds ethic: no pre-generated code. Everything must be derivable from human-readable source, starting from a tiny binary seed (e.g., a hex loader).
- Some see full-chain auditing as still practically infeasible; others argue “more auditable than today” is already valuable.
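A hex-loader seed is simple enough to model in a few lines. The following is a hypothetical C model of a hex0-style loader, written for readability. It is not an actual seed, which would be hand-written machine code. The format rules are an assumption modeled on the stage0 project's hex0: hexadecimal byte pairs, `#` or `;` starting a comment that runs to end of line, and whitespace otherwise ignored. The function names `hex_value` and `hex0_decode` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode hex0-style text into raw bytes (assumed format, see above).
   Returns the number of bytes written, or -1 on an invalid character,
   a dangling half byte, or output overflow. */
static long hex0_decode(const char *src, unsigned char *out, size_t cap) {
    assert(src != NULL && out != NULL);
    size_t len = 0;
    int high = -1;                       /* pending high nibble, -1 if none */
    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '#' || *p == ';') {    /* skip a comment to end of line */
            while (*p != '\0' && *p != '\n') p++;
            if (*p == '\0') break;
            continue;
        }
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') continue;
        int v = hex_value(*p);
        if (v < 0) return -1;
        if (high < 0) {
            high = v;                    /* first nibble of a byte */
        } else {
            if (len == cap) return -1;
            out[len++] = (unsigned char)(high * 16 + v);
            high = -1;
        }
    }
    return high < 0 ? (long)len : -1;
}
```

Given `"48 31 FF ; xor rdi, rdi\n0F05 # syscall\n"`, the decoder writes the five bytes `48 31 FF 0F 05`. A loader this small can be written directly in machine code and audited by hand. That property is what the seed relies on: every later stage is built from readable source.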
Practicality, porting, and skepticism
- Use cases discussed: porting Rust to new OSes, where current bootstrapping via many rustc versions and LLVM is painful and time-consuming.
- Counterpoint: for new platforms, cross-compilation often suffices; Dozer specifically targets same-architecture, from-scratch bootstrapping.
- Critics call the effort mostly aesthetic, or futile given Rust’s fast evolution and complex language. Supporters value the cleaner bootstrap story and the educational payoff.
Project status and limitations
- The current codebase is ~5k lines of C. The lexer and parts of the parser exist, but typechecking is minimal (e.g., only i32 is supported). Macros, modules, and robust codegen are missing.
- It can handle only trivial Rust examples and cannot yet compile large crates like Tokio.
- No substantial test suite is evident; some commenters want clearer test coverage of supported Rust features.
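The lexer is one of the pieces that already exists. To show roughly what that stage involves, here is a minimal, hypothetical C sketch (not Dozer's lexer) for a tiny Rust subset. It recognizes identifiers, a handful of keywords, decimal integer literals, `->`, and single-character punctuation. `TokKind`, `Token`, `is_keyword`, and `next_token` are illustrative names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_INT, TOK_ARROW, TOK_PUNCT, TOK_ERROR } TokKind;

typedef struct {
    TokKind kind;
    const char *start;   /* points into the source buffer, not copied */
    size_t len;
} Token;

/* A small subset of Rust's strict keywords. */
static const char *const KEYWORDS[] = { "fn", "let", "mut", "return", "if", "else" };

static int is_keyword(const char *s, size_t len) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strlen(KEYWORDS[i]) == len && memcmp(KEYWORDS[i], s, len) == 0) return 1;
    }
    return 0;
}

/* Scan one token starting at *cursor, then advance *cursor past it. */
static Token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;
    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *cursor = p; return t; }
    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        t.len = (size_t)(p - t.start);
        t.kind = is_keyword(t.start, t.len) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        t.len = (size_t)(p - t.start);
        t.kind = TOK_INT;
    } else if (p[0] == '-' && p[1] == '>') {
        p += 2; t.len = 2; t.kind = TOK_ARROW;
    } else if (strchr("(){};:=+-*", *p) != NULL) {
        p++; t.len = 1; t.kind = TOK_PUNCT;
    } else {
        p++; t.len = 1; t.kind = TOK_ERROR;   /* unknown character */
    }
    *cursor = p;
    return t;
}
```

Calling `next_token` repeatedly on `fn main() -> i32 { let x = 40 + 2; x }` yields `fn`, `main`, `(`, `)`, `->`, `i32`, `{`, and so on until `TOK_EOF`. Note that `i32` is an ordinary identifier at this level. A real Rust lexer must also handle string and char literals, lifetimes, nested block comments, raw identifiers, and literal suffixes such as `2i32`, all omitted here. Token-stream checks like the one above are also a cheap starting point for the kind of feature-level test coverage commenters asked for.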