2024-06-28

Meta LLM Compiler: neural optimizer and disassembler

Overview of Meta LLM Compiler

The model is built on Code Llama and trained primarily to emulate compilation (code + flags → assembly/IR), then fine‑tuned for:
- Choosing LLVM optimization pass order (auto‑tuning for size).
- Decompilation/disassembly (assembly ↔ IR / higher-level code).
Intended as a research model and foundation for further fine‑tuning, not a drop‑in replacement for existing compilers.

Determinism and Reproducible Builds

Commenters raised strong concern that compilers must be deterministic for build systems, caching, Nix-style reproducible builds, and supply-chain validation.
Historically, compilers sometimes embedded timestamps or had other nondeterministic behavior; this is now seen as an antipattern.
LLMs can be made deterministic (temperature 0, fixed seed), but:
- Outputs are still highly sensitive to small input changes.
- Determinism per input is different from reliability over a distribution of inputs, where LLMs remain weak.
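Here is a minimal sketch of the two decoding modes mentioned above. It uses made-up next-token scores, not the actual model, and the function names are hypothetical. Greedy decoding (temperature 0) is a pure function of its input, and seeded sampling repeats exactly. Yet nudging two nearly tied scores flips the greedy choice, which is the input sensitivity commenters worried about.

```python
import math
import random

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(logits):
    """Temperature 0: always pick the highest-scoring token (ties -> lowest index)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sampled_tokens(logits, temperature, seed, n):
    """Temperature > 0: sample n tokens; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # private RNG so global random state cannot leak in
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(logits)), weights=probs, k=n)

logits = [2.0, 1.9, 0.5]
print(greedy_token(logits))           # 0
print(greedy_token([1.9, 2.0, 0.5]))  # 1: a small input change flips the choice
print(sampled_tokens(logits, 0.8, 42, 5) == sampled_tokens(logits, 0.8, 42, 5))  # True
```

Determinism here only means "same input, same output". It says nothing about how often the chosen token is the right one.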

Correctness, Verification, and Safety

Many commenters distrust LLMs for correctness-critical compilation; “almost always right” is considered unacceptable.
For decompilation, the paper uses round‑tripping: x86 → (model) IR → (clang) x86. A sample counts as correct only if the recompiled assembly exactly matches the original. This yields roughly 45% successful round trips, so the output is only partially trustworthy.
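The round-trip metric can be sketched as a small harness. This is an illustration under stated assumptions, not the paper's evaluation code. `lift` stands in for the model (assembly to IR) and `compile_ir` for a clang-based step (IR to assembly). Both are hypothetical, as is the whitespace-only normalization.

```python
from typing import Callable, Iterable

def normalize(asm: str) -> str:
    """Drop blank lines and surrounding whitespace so layout differences don't count."""
    return "\n".join(line.strip() for line in asm.splitlines() if line.strip())

def round_trip_rate(
    samples: Iterable[str],
    lift: Callable[[str], str],
    compile_ir: Callable[[str], str],
) -> float:
    """Fraction of assembly samples that survive asm -> IR -> asm unchanged."""
    total = matches = 0
    for asm in samples:
        total += 1
        try:
            recompiled = compile_ir(lift(asm))
        except Exception:
            continue  # IR that fails to compile counts as a miss
        if normalize(recompiled) == normalize(asm):
            matches += 1
    return matches / total if total else 0.0

# Toy stand-ins for the model and the compiler (hypothetical; "ret i32 1" is not full IR).
KNOWN = {"mov eax, 1\nret": "ret i32 1"}

def toy_lift(asm: str) -> str:
    return KNOWN.get(asm, "not valid IR")

def toy_compile(ir: str) -> str:
    if ir == "ret i32 1":
        return "  mov eax, 1\n  ret\n"
    raise ValueError("IR does not compile")

print(round_trip_rate(["mov eax, 1\nret", "xor eax, eax\nret"], toy_lift, toy_compile))
# 0.5
```

Exact match is a strict criterion. A lifted IR that is semantically equivalent but compiles to differently scheduled instructions still counts as a miss.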
For optimization, the model only suggests pass order; LLVM still enforces semantics, though changing phase ordering is known to surface latent compiler bugs.
Alive2 is suggested for formal verification of LLVM IR transformations, but the authors note it is expensive and often times out, limiting practicality.
Consensus: use AI for profitability/heuristics, not for defining correctness.
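One way to keep AI in the profitability role is to let the model propose pass lists while real measurements and an optional verifier decide what is accepted. The following is a minimal sketch. `measure_size` and `verify` are hypothetical callbacks (they could, for example, wrap `opt`/`llc` and Alive2). The pass names and sizes are invented, and a verifier timeout is treated as a rejection.

```python
from typing import Callable, Optional, Sequence

PassList = Sequence[str]

def choose_pass_list(
    suggestions: Sequence[PassList],
    baseline: PassList,
    measure_size: Callable[[PassList], int],
    verify: Optional[Callable[[PassList], bool]] = None,
) -> PassList:
    """Return the smallest suggestion that also verifies, else the baseline.

    suggestions: candidate pass lists, e.g. model output (hypothetical).
    baseline: the pipeline to fall back on, e.g. whatever -Oz runs.
    measure_size: hypothetical callback that compiles with the passes and returns bytes.
    verify: hypothetical callback that returns False on failure or timeout.
    """
    best, best_size = baseline, measure_size(baseline)
    for passes in suggestions:
        size = measure_size(passes)
        if size >= best_size:
            continue  # not profitable: keep the current best
        if verify is not None and not verify(passes):
            continue  # rejected or timed out: never trust an unverified win
        best, best_size = passes, size
    return best

# Toy usage with invented sizes (bytes); pass names are only illustrative.
sizes = {
    ("default<Oz>",): 120,
    ("instcombine", "simplifycfg"): 110,
    ("inline", "instcombine"): 100,
}
fake_size = lambda passes: sizes[tuple(passes)]
candidates = [["instcombine", "simplifycfg"], ["inline", "instcombine"]]

print(choose_pass_list(candidates, ["default<Oz>"], fake_size))
# ['inline', 'instcombine']
print(choose_pass_list(candidates, ["default<Oz>"], fake_size,
                       verify=lambda passes: "inline" not in passes))
# ['instcombine', 'simplifycfg']
```

Falling back to the baseline means a bad suggestion costs compile time, not correctness, assuming the compiler and verifier themselves are correct.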

Decompilation and Potential Applications

Commenters see a big jump over prior decompilation work (earlier results were recalled as below 30%). Reliable mapping in both directions at 90%+ accuracy is seen as potentially transformative.
Envisioned uses: binary-to-source recovery for archival, porting old binaries, aiding Verilog / hardware work, chip simulations, and serving as a strong code assistant prior.

Optimization Focus: Size vs Performance

Current work targets code size; some commenters were disappointed that it does not yet optimize for runtime performance.
Commenters note performance is harder to measure (noisy benchmarks vs deterministic size), and cost models are still immature.
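Runtime is the harder target for a simple reason. Code size is a single number read once (for example, the byte count of a `.text` section), while a timing has to be repeated and summarized. Here is a minimal sketch using only the Python standard library. The workload and repeat count are arbitrary assumptions.

```python
import statistics
import time
from typing import Callable

def time_runs(fn: Callable[[], object], repeats: int = 15) -> list[float]:
    """Time each call separately with a monotonic high-resolution clock."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples: list[float]) -> dict[str, float]:
    """Median resists outliers; relative spread shows how noisy the runs were."""
    med = statistics.median(samples)
    lo, hi = min(samples), max(samples)
    spread = (hi - lo) / med if med > 0 else float("inf")
    return {"median": med, "min": lo, "relative_spread": spread}

# Arbitrary workload; the printed numbers vary from run to run and machine to machine.
work = lambda: sum(i * i for i in range(50_000))
print(summarize(time_runs(work)))
```

A size comparison needs none of this: the same binary always has the same size, so one measurement per candidate suffices.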
There is agreement that modern compilers still have significant optimization headroom (e.g., inlining for size), so ML‑guided heuristics could matter.
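Here is a toy, first-order model of why inlining for size is a heuristic call rather than a fixed rule. It assumes a callee of `body` bytes and `call_overhead` bytes per call sequence. It also assumes the callee can be deleted once every call site is inlined, unless `removable` is false. All parameter names and numbers are hypothetical. Real compilers must also guess how much each inlined copy will shrink after further simplification, which this model ignores.

```python
def inline_size_delta(body: int, call_overhead: int, call_sites: int, removable: bool) -> int:
    """First-order change in code size (bytes) from inlining every call site.

    Negative means inlining shrinks the code under this toy model.
    """
    before = call_sites * call_overhead + body              # call sequences + one callee body
    after = call_sites * body + (0 if removable else body)  # one copy per site (+ kept callee)
    return after - before

print(inline_size_delta(40, 8, 1, True))   # -8: single caller, inlining saves the call
print(inline_size_delta(40, 8, 3, True))   # 56: three copies of a large body
print(inline_size_delta(6, 8, 3, True))    # -12: tiny callee, inlining pays off
print(inline_size_delta(40, 8, 1, False))  # 32: callee must be kept anyway
```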

Skepticism, Naming, and Practicality

Several commenters view the “LLM compiler” label as overhyped or misleading and prefer the framing “LLM-guided compiler optimization.”
Concerns:
- High risk of subtle miscompilations.
- Production deployment would be hard due to correctness requirements, inference cost, and engineering complexity.
Others are cautiously optimistic, seeing it as a valuable research direction and a reusable base model, not an immediate product.
