2026-05-20

Learnings from 100K lines of Rust with AI (2025)

Lines of code & maintainability

Many see 100k–130k LOC as a red flag, especially since Azure’s original C++ RSL is ~36k LOC.
Some argue Rust isn’t necessarily shorter than C++, and non-test Rust code here is only ~45–50k LOC.
Critics view the result as “AI slop”: large, likely unmaintainable, and possibly not functionally equivalent to RSL.
Others say the impressive part isn’t LOC but that a single person working with agents could build such a system quickly.

Testing & correctness

~1,300 tests for 130k LOC (or ~50k non-test LOC) is seen by many as too sparse, especially for distributed systems.
Several argue that in this domain, test and tooling code should greatly exceed production LOC.
Commenters doubt that AI-generated tests assert meaningful properties, and worry that passing tests may be vacuous.
Some ask whether the real proof would be long-term successful use in production, which is currently unclear.
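To make the "vacuous test" concern concrete, here is a minimal sketch. `ToyLog` is a hypothetical stand-in invented for illustration and has nothing to do with the actual project or with RSL. The first test runs the code but asserts nothing, so it would still pass if `append` did nothing at all. The second pins down the behavior a reviewer would care about.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy log (illustration only): entries must arrive in
/// strict sequence order, starting at 1.
#[derive(Default)]
pub struct ToyLog {
    entries: BTreeMap<u64, String>,
}

impl ToyLog {
    /// Appends `value` at `seq`; rejects gaps and duplicates.
    pub fn append(&mut self, seq: u64, value: &str) -> Result<(), String> {
        let expected = self.entries.len() as u64 + 1;
        if seq != expected {
            return Err(format!("expected seq {expected}, got {seq}"));
        }
        self.entries.insert(seq, value.to_string());
        Ok(())
    }

    /// Number of accepted entries.
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// True when no entry has been accepted yet.
    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }

    /// Returns the value stored at `seq`, if any.
    pub fn get(&self, seq: u64) -> Option<&str> {
        self.entries.get(&seq).map(String::as_str)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Vacuous: exercises the code path but checks nothing, so it would
    // still pass if `append` silently dropped every entry.
    #[test]
    fn vacuous_append_does_not_panic() {
        let mut log = ToyLog::default();
        let _ = log.append(1, "a");
    }

    // Meaningful: asserts the ordering rule and that a rejected append
    // leaves the stored state untouched.
    #[test]
    fn append_rejects_gaps_and_preserves_state() {
        let mut log = ToyLog::default();
        assert!(log.append(1, "a").is_ok());
        assert!(log.append(3, "c").is_err());
        assert_eq!(log.len(), 1);
        assert_eq!(log.get(1), Some("a"));
    }
}
```

Both tests count toward a headline figure like "~1,300 tests", which is why commenters treat the raw count as weak evidence on its own.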

AI coding in Rust vs other languages

Experiences are mixed: some report constant lifetime errors and agents brute-forcing their way past the borrow checker with clone(), Rc, or Arc<Mutex<…>>.
Others find Rust excellent for LLMs when using tight feedback loops (cargo check/test, clippy, formatting hooks).
One camp claims Rust is “nearly perfect” for LLMs due to strong typing and compile-time safety; another prefers Go, Kotlin, JS, or even C/Haskell as easier or more efficient targets.
Some worry that overusing interior mutability trades compile-time safety errors for runtime deadlocks and liveness failures.
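The interior-mutability concern can be sketched with the standard library alone. The function names below are hypothetical and chosen for illustration. The code compiles cleanly, yet whether a second lock attempt can succeed depends entirely on what is held at runtime. `try_lock` is used so the example reports the conflict instead of hanging. A plain `lock()` at the same point would not return: the standard library documents that it may deadlock or panic.

```rust
use std::sync::{Arc, Mutex};

/// Tries to take the lock a second time while the first guard is still
/// alive. The borrow checker accepts this; the conflict only shows up
/// at runtime.
pub fn second_lock_succeeds_while_held(counter: &Arc<Mutex<u64>>) -> bool {
    let _guard = counter.lock().unwrap();
    // `try_lock` never blocks, so it returns an error here instead of
    // hanging. A plain `counter.lock()` at this point would not return.
    let acquired = counter.try_lock().is_ok();
    acquired
}

/// Same work with explicit guard scoping: each guard is dropped before
/// the next lock is taken, so there is no self-deadlock.
pub fn increment_twice(counter: &Arc<Mutex<u64>>) -> u64 {
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // first guard dropped here
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    *guard
}
```

Calling `second_lock_succeeds_while_held` on a fresh `Arc<Mutex<u64>>` returns `false`, while `increment_twice` on the same counter returns 2. The type system guarantees freedom from data races in both functions, but only guard scoping avoids the liveness problem.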

Architectures, agents & guardrails

A recurring problem: LLMs create sprawling, tangled architectures where small changes touch many modules.
Suggested mitigations: multiple small crates, hard LOC limits per file, strict pre-commit hooks, and repo “guardrails” enforcing project conventions.
Several tools/workflows are mentioned that orchestrate “beads”/tasks, static checks, and agent-review steps.
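One of the suggested guardrails, a hard LOC limit per file, is simple enough to sketch. The function name, the input shape, and the limit are assumptions made for illustration. The thread does not describe any specific tool's implementation. The check is kept pure (paths and contents are passed in) so that a pre-commit hook or CI step could feed it whatever files are staged.

```rust
/// Returns `(path, line_count)` for every file whose line count exceeds
/// `max_lines`. Hypothetical helper: a real hook would read the staged
/// files from disk and fail the commit if this list is non-empty.
pub fn files_over_limit(files: &[(&str, &str)], max_lines: usize) -> Vec<(String, usize)> {
    files
        .iter()
        .map(|(path, contents)| (path.to_string(), contents.lines().count()))
        .filter(|(_, line_count)| *line_count > max_lines)
        .collect()
}
```

For example, with a limit of 3, a four-line `src/b.rs` is reported and a one-line `src/a.rs` is not. A rule this simple only caps file size. It does not stop an agent from spreading a tangled design across many small files, which is why commenters pair it with crate boundaries and review steps.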

Specs, multi-LLM workflows & trust

Some teams have one model write specs and another critique/implement them, then cross-review implementations.
Others argue the same model in a fresh context is enough; the key is breaking bias by resetting context.
A skeptical thread notes that LLM outputs vary between runs and can confidently contradict themselves, and warns against over-trusting specs or plans without human review.

Debate over LLM “reasoning”

Long subthread on whether LLMs truly “reason” versus merely predict tokens.
One side points to apparent reasoning behavior; the other emphasizes lack of cognition and deterministic understanding.
No consensus; participants agree only that LLMs can be useful even if their internal process is opaque.
