LLMs could be, but shouldn't be compilers

Determinism, Reproducibility, and What “Being a Compiler” Requires

  • Big subthread on whether determinism is the key property:
    • One camp: compilers must be deterministic; same input → bitwise-identical output is core for debugging, reproducible builds, verification, and security. A “stochastic compiler” is unfit as a building block.
    • Other camp: non-determinism per se isn’t the issue; compilers need semantic closure (outputs always semantically valid), and can still be non-deterministic in implementation choices (e.g., optimization, diversification) as long as semantics are preserved.
  • There’s debate whether LLMs can be deterministic:
    • In theory: with temperature 0, fixed RNG seed, and carefully ordered arithmetic, yes.
    • In practice: GPU floating-point arithmetic, attention kernels, batching, and service-level choices make outputs non-repeatable across runs and hardware. And even where a setup is repeatable, minor prompt changes can drastically change the output, so LLMs behave “chaotically” even when mathematically deterministic.
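The "carefully ordered arithmetic" point can be made concrete with a small sketch. It assumes nothing about any real inference stack: `greedy_pick` is a hypothetical stand-in for temperature-0 decoding, and the two "logits" are just the same three numbers summed in different groupings, as could happen when a parallel reduction runs in a different order.

```python
def greedy_pick(logits):
    """Temperature-0 decoding: return the index of the largest logit.

    Python's max() returns the first maximal element, so ties go to the
    lowest index.
    """
    return max(range(len(logits)), key=lambda i: logits[i])


# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# A competing token whose logit sits exactly at the smaller of the two sums.
competitor = right_to_left

# Same model, same input, same "temperature 0", different reduction order.
logits_run_a = [competitor, left_to_right]
logits_run_b = [competitor, right_to_left]

pick_a = greedy_pick(logits_run_a)
pick_b = greedy_pick(logits_run_b)

print("sums equal:", left_to_right == right_to_left)
print("same token picked:", pick_a == pick_b)
```

The difference between the two sums is a single rounding step, yet it is enough to flip the chosen token when two candidates are nearly tied, and every later token is then conditioned on a different prefix.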

Natural Language, Underspecification, and “LLM as Compiler”

  • Many agree the real problem isn’t randomness but that prompts are underspecified: natural language leaves gaps, so LLMs must “guess” intent.
  • Some argue this invites “vibe coding”: users accept plausible output instead of sharply specifying behavior.
  • Others reject the assumption that fuzzier authoring will lead professionals to abandon correctness; requirements, tests, and business constraints still act as hard ground truth.
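A tiny illustration of underspecification, as a sketch only: the prompt "sort the names alphabetically" does not say how letter case should be treated, so two implementations can both be defensible readings and still disagree. The function names and the sample data are made up for this example.

```python
names = ["bob", "Alice", "carol", "Dave"]


def sort_case_sensitive(items):
    """One reading: plain code-point order, so uppercase sorts before lowercase."""
    return sorted(items)


def sort_case_insensitive(items):
    """Another reading: ignore case when comparing."""
    return sorted(items, key=str.lower)


reading_one = sort_case_sensitive(names)
reading_two = sort_case_insensitive(names)

print(reading_one)
print(reading_two)
print("readings agree:", reading_one == reading_two)
```

Neither result is a bug relative to the prompt; the gap is in the specification, which is why a test or an explicit requirement is what actually settles the behavior.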

Testing, Correctness, and Human Oversight

  • Pro‑LLM participants emphasize: if generated code passes real tests, meets performance/security needs, and is reviewed, the tool’s internal process doesn’t matter—just like with traditional compilers and junior devs.
  • Skeptics counter that “non‑toy” test suites with sufficient coverage are extremely hard to build for complex systems, so relying on tests alone is unrealistic.
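Both positions can be seen in one small sketch of a test-based acceptance gate. Everything here is hypothetical: `accept` is an illustrative helper, and the "generated" candidate is a deliberately incomplete leap-year check standing in for plausible-looking output.

```python
def accept(candidate, cases):
    """Accept a candidate only if it returns the expected value for every case."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed case.
            return False
    return True


def generated_is_leap_year(year):
    """Plausible but incomplete: ignores the century rules."""
    return year % 4 == 0


# A thin suite that the incomplete candidate happens to satisfy.
thin_suite = [((2024,), True), ((2023,), False)]

# A fuller suite that also covers the century rules.
fuller_suite = thin_suite + [((1900,), False), ((2000,), True)]

passes_thin = accept(generated_is_leap_year, thin_suite)
passes_fuller = accept(generated_is_leap_year, fuller_suite)

print("accepted by thin suite:", passes_thin)
print("accepted by fuller suite:", passes_fuller)
```

The gate is indifferent to how the candidate was produced, which is the pro-LLM argument; it is also only as good as its cases, which is the skeptics' argument.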

LLMs as Junior Developers / Transpilers, Not Full Compilers

  • Common model: treat LLMs as junior or mid-level devs: good for boilerplate, refactors, and transpilation, but needing supervision.
  • Several report strong wins in tasks like transpiling between languages or rewriting utilities, but no one trusts continuous regeneration of entire codebases the way we trust compilers to regenerate object code.
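One common way to supervise a transpiled or rewritten utility is differential testing: run the original and the rewrite on the same inputs and look for any disagreement. The sketch below is an assumption about how that could look, not something described in the thread; the checksum functions and `find_disagreement` are invented for illustration.

```python
import random
from functools import reduce


def reference_checksum(data: bytes) -> int:
    """The original utility, treated as the source of truth."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 65521
    return total


def ported_checksum(data: bytes) -> int:
    """A rewrite in a different style that should behave identically."""
    return reduce(lambda acc, byte: (acc * 31 + byte) % 65521, data, 0)


def buggy_port(data: bytes) -> int:
    """A rewrite that silently dropped the modulus."""
    return reduce(lambda acc, byte: acc * 31 + byte, data, 0)


def find_disagreement(reference, candidate, trials=1000, seed=0):
    """Return the first random input where the two disagree, or None."""
    rng = random.Random(seed)  # fixed seed so the check itself is repeatable
    for _ in range(trials):
        length = rng.randrange(0, 32)
        data = bytes(rng.randrange(256) for _ in range(length))
        if reference(data) != candidate(data):
            return data
    return None


good_result = find_disagreement(reference_checksum, ported_checksum)
bad_result = find_disagreement(reference_checksum, buggy_port)

print("faithful port disagrees on:", good_result)
print("buggy port caught:", bad_result is not None)
```

This only works while a trusted reference exists to compare against, which fits the one-off transpilation case better than continuous regeneration of a whole codebase.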

Safety, Domains, and Finite Resources

  • Strong resistance to LLM-driven systems in safety‑critical or financial domains: examples of “probabilistic banking” or avionics are used to highlight the need for strict determinism and auditability.
  • Some note non-determinism already exists in GC/JIT/heuristic systems; what matters is error rates and guarantees.
  • Others stress cost and finiteness: LLMs are computationally “grotesquely” expensive relative to CPUs, making them unsuitable as universal compilation backends, though perhaps useful to improve compilers.