Learnings from 100K lines of Rust with AI (2025)

Lines of code & maintainability

  • Many see 100k–130k LOC as a red flag, especially since Azure’s original C++ RSL (Replicated State Library) is ~36k LOC.
  • Others counter that Rust isn’t necessarily more concise than C++, and that the non-test Rust code here is only ~45–50k LOC.
  • Critics view the result as “AI slop”: large, likely unmaintainable, and possibly not functionally equivalent to RSL.
  • Others say the impressive part isn’t the LOC count but that a single person working with agents could build such a system quickly.

Testing & correctness

  • ~1,300 tests for 130k LOC (about one test per 100 lines, or one per ~40 lines counting only the ~50k non-test LOC) is seen by many as too sparse, especially for distributed systems.
  • Several argue that in this domain, test and tooling code should greatly exceed production LOC.
  • Skepticism about whether AI-generated tests assert meaningful properties, and concern that passing tests may be vacuous.
  • Some suggest the real proof would be long-term, successful use in production, which remains unclear.
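
To make the "vacuous test" concern concrete, here is a small Rust sketch. The `Log` type, its `append` method, and `invariants_hold` are hypothetical, a toy replicated log rather than RSL's or the project's actual API. The first test only checks that `append` returns `Ok`; the second exercises a failure path and checks properties a log like this is supposed to maintain.

```rust
/// Hypothetical, heavily simplified log entry (not RSL's data model).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub index: u64,
    pub term: u64,
}

#[derive(Debug, Default)]
pub struct Log {
    entries: Vec<Entry>,
}

impl Log {
    /// Appends an entry for `term`, rejecting terms older than the last entry.
    pub fn append(&mut self, term: u64) -> Result<u64, String> {
        if let Some(last) = self.entries.last() {
            if term < last.term {
                return Err(format!("stale term {} < {}", term, last.term));
            }
        }
        let index = self.entries.len() as u64 + 1;
        self.entries.push(Entry { index, term });
        Ok(index)
    }

    pub fn entries(&self) -> &[Entry] {
        &self.entries
    }
}

/// Properties a meaningful test should check: indices run 1, 2, 3, ...
/// with no gaps, and terms never decrease.
pub fn invariants_hold(log: &Log) -> bool {
    log.entries()
        .iter()
        .enumerate()
        .all(|(i, e)| e.index == i as u64 + 1)
        && log.entries().windows(2).all(|w| w[0].term <= w[1].term)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Vacuous: passes as long as append() returns Ok, whatever it stored.
    #[test]
    fn append_works() {
        let mut log = Log::default();
        assert!(log.append(1).is_ok());
    }

    // Meaningful: hits the stale-term path and checks the invariants.
    #[test]
    fn append_preserves_invariants() {
        let mut log = Log::default();
        for term in [1, 1, 2, 3] {
            log.append(term).unwrap();
        }
        assert!(log.append(2).is_err());
        assert!(invariants_hold(&log));
        assert_eq!(log.entries().len(), 4);
    }
}
```

Both tests pass against this `Log`, but only the second would fail if `append` silently dropped the stale-term check or skipped an index.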

AI coding in Rust vs other languages

  • Mixed experiences: some report constant lifetime errors and brute-force use of clone()/Rc/Arc<Mutex<…>>.
  • Others find Rust excellent for LLMs when using tight feedback loops (cargo check/test, clippy, formatting hooks).
  • One camp claims Rust is “nearly perfect” for LLMs due to strong typing and compile-time safety; another prefers Go, Kotlin, JS, or even C/Haskell as easier or more efficient targets.
  • Concern that overusing interior mutability merely trades compile-time safety errors for runtime deadlocks and liveness failures.
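
A minimal Rust sketch of that liveness concern, assuming a hypothetical `Node` type shared the brute-force way as `Arc<Mutex<Node>>`. Everything compiles, yet re-locking a `std::sync::Mutex` that the current thread already holds is a bug the compiler cannot see: with `lock()` the second call never returns normally (it may deadlock or panic). The sketch uses `try_lock()` so it reports the problem instead of hanging, then shows the usual fix of scoping each guard tightly.

```rust
use std::sync::{Arc, Mutex, TryLockError};

/// Hypothetical shared state, wrapped in Arc<Mutex<..>> the "brute-force" way.
#[derive(Debug, Default)]
pub struct Node {
    pub commit_index: u64,
}

/// Buggy pattern: the mutex is locked again while an earlier guard is alive.
/// With `lock()` this would hang or panic at runtime; `try_lock()` detects it.
/// Returns true if the second acquisition would have blocked.
pub fn reentrant_lock_would_block(shared: &Arc<Mutex<Node>>) -> bool {
    let _guard = shared.lock().unwrap();
    matches!(shared.try_lock(), Err(TryLockError::WouldBlock))
}

/// Safer pattern: keep each guard in its own scope so it is dropped
/// before the mutex is locked again.
pub fn bump_then_read(shared: &Arc<Mutex<Node>>) -> u64 {
    {
        let mut node = shared.lock().unwrap();
        node.commit_index += 1;
    } // guard dropped here, releasing the lock
    shared.lock().unwrap().commit_index
}
```

Nothing in the buggy version is rejected by the borrow checker; the hazard only appears when the code runs, which is the trade-off described above.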

Architectures, agents & guardrails

  • A recurring problem: LLMs create sprawling, tangled architectures where small changes touch many modules.
  • Suggested mitigations: multiple small crates, hard LOC limits per file, strict pre-commit hooks, and repo “guardrails” enforcing project conventions.
  • Commenters mention several tools and workflows that orchestrate “beads” (tasks), static checks, and agent-review steps.
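
A minimal sketch of two of these guardrails as a Rust pre-commit helper: a per-file line limit and an ordered run of `cargo` checks that stops at the first failure. The function names, the non-blank-line definition of LOC, and the check ordering are assumptions for illustration, not any specific tool from the thread; the `cargo` subcommands (`fmt --check`, `check`, `clippy -- -D warnings`, `test`) are the usual ones.

```rust
use std::process::Command;

/// Counts non-blank lines: one simple (assumed) definition of LOC.
pub fn count_loc(source: &str) -> usize {
    source.lines().filter(|l| !l.trim().is_empty()).count()
}

/// Returns (path, loc) for each file above `max_loc`, largest first,
/// so a hook can print the worst offenders at the top.
pub fn files_over_limit<'a>(files: &[(&'a str, &str)], max_loc: usize) -> Vec<(&'a str, usize)> {
    let mut over: Vec<(&'a str, usize)> = files
        .iter()
        .map(|&(path, src)| (path, count_loc(src)))
        .filter(|&(_, loc)| loc > max_loc)
        .collect();
    over.sort_by(|a, b| b.1.cmp(&a.1));
    over
}

/// Checks ordered roughly cheapest-first, so formatting and type errors
/// surface before the slower lint and test runs.
pub fn check_steps() -> Vec<Vec<&'static str>> {
    vec![
        vec!["cargo", "fmt", "--check"],
        vec!["cargo", "check"],
        vec!["cargo", "clippy", "--", "-D", "warnings"],
        vec!["cargo", "test"],
    ]
}

/// Runs each step in order and stops at the first failure, returning a
/// short message an agent or a human can act on.
pub fn run_steps(steps: &[Vec<&str>]) -> Result<(), String> {
    for step in steps {
        let (program, args) = step.split_first().ok_or("empty step")?;
        let status = Command::new(program)
            .args(args)
            .status()
            .map_err(|e| format!("could not start `{}`: {}", step.join(" "), e))?;
        if !status.success() {
            return Err(format!("`{}` failed ({})", step.join(" "), status));
        }
    }
    Ok(())
}
```

A hook binary would read the staged `.rs` files, pass them to `files_over_limit` with whatever limit the project picks, then call `run_steps(&check_steps())` and exit non-zero on any error. Keeping the pure counting logic apart from the process-spawning code makes the guardrail itself easy to test.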

Specs, multi-LLM workflows & trust

  • Some teams have one model write specs and another critique/implement them, then cross-review implementations.
  • Others argue the same model in a fresh context is enough; the key is breaking bias by resetting context.
  • A skeptical thread notes LLM outputs vary between runs and can confidently contradict themselves; warns against over-trusting specs or plans without human review.

Debate over LLM “reasoning”

  • Long subthread on whether LLMs truly “reason” or merely predict tokens.
  • One side points to apparent reasoning behavior; the other emphasizes their lack of cognition and of deterministic understanding.
  • No consensus; participants agree only that LLMs can be useful even if their internal process is opaque.