Learnings from 100K lines of Rust with AI (2025)
Lines of code & maintainability
- Many see 100k–130k LOC as a red flag, especially since Azure’s original C++ RSL is ~36k LOC.
- Some argue Rust isn’t necessarily shorter than C++, and non-test Rust code here is only ~45–50k LOC.
- Critics view the result as “AI slop”: large, likely unmaintainable, and possibly not functionally equivalent to RSL.
- Others say the impressive part isn’t LOC but that a single person + agents could build such a system quickly.
Testing & correctness
- ~1,300 tests for 130k LOC (or ~50k non-test LOC) is seen by many as too sparse, especially for distributed systems.
- Several argue that in this domain, test and tooling code should greatly exceed production LOC.
- Skeptics doubt that AI-generated tests assert meaningful properties and worry that passing tests may be vacuous.
- Some ask whether the real proof would be long-term successful use in production, which is currently unclear.
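To make the "vacuous test" concern concrete, here is a minimal sketch in Rust. The `quorum_size` helper is hypothetical and is not taken from the project under discussion; it only stands in for the kind of logic a replicated system depends on. The first test passes for almost any implementation, while the second checks a property that a wrong implementation would violate.

```rust
/// Hypothetical helper (an assumption for illustration): the smallest
/// majority quorum for a cluster of `n` replicas.
pub fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

#[cfg(test)]
mod tests {
    use super::*;

    // Vacuous: this passes for nearly any implementation, including one
    // that always returns 1, so a green result says very little.
    #[test]
    fn quorum_size_returns_something() {
        assert!(quorum_size(5) > 0);
    }

    // Meaningful: any two quorums must share at least one replica, and the
    // quorum must be the smallest size with that property.
    #[test]
    fn any_two_quorums_overlap() {
        for n in 1..=101 {
            let q = quorum_size(n);
            assert!(q <= n, "quorum cannot exceed cluster size");
            assert!(2 * q > n, "two quorums of size {q} must intersect in {n} nodes");
            assert!(2 * (q - 1) <= n, "a smaller quorum would not guarantee overlap");
        }
    }
}
```

A reviewer reading AI-generated tests can ask the same question of each one: which incorrect implementation would make this assertion fail?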
AI coding in Rust vs other languages
- Mixed experiences: some report constant lifetime errors and brute-force use of clone()/Rc/Arc<Mutex<…>>.
- Others find Rust excellent for LLMs when using tight feedback loops (cargo check/test, clippy, formatting hooks).
- One camp claims Rust is “nearly perfect” for LLMs due to strong typing and compile-time safety; another prefers Go, Kotlin, JS, or even C/Haskell as easier or more efficient targets.
- Concern that overuse of interior mutability turns safety issues into deadlocks and liveness failures.
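The deadlock concern above can be illustrated with a small sketch. The `Account` type and both transfer functions are hypothetical names chosen for this example. The naive version compiles cleanly and is free of data races, yet two threads transferring in opposite directions can each hold one lock and wait forever for the other. The ordered version avoids this by always acquiring locks in a fixed order.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state guarded by interior mutability.
pub struct Account {
    pub id: u32,
    pub balance: Mutex<i64>,
}

/// Deadlock-prone: locks `from` first, then `to`. The compiler accepts this,
/// but concurrent calls with swapped arguments can block each other forever.
#[allow(dead_code)]
pub fn transfer_naive(from: &Account, to: &Account, amount: i64) {
    let mut src = from.balance.lock().unwrap();
    let mut dst = to.balance.lock().unwrap();
    *src -= amount;
    *dst += amount;
}

/// Safer sketch: always lock the account with the lower id first, so every
/// thread acquires the two locks in the same global order.
pub fn transfer_ordered(from: &Account, to: &Account, amount: i64) {
    assert_ne!(from.id, to.id, "transfer requires two distinct accounts");
    let from_first = from.id < to.id;
    let (first, second) = if from_first { (from, to) } else { (to, from) };
    let mut g1 = first.balance.lock().unwrap();
    let mut g2 = second.balance.lock().unwrap();
    if from_first {
        *g1 -= amount;
        *g2 += amount;
    } else {
        *g2 -= amount;
        *g1 += amount;
    }
}

/// Runs opposing transfers on two threads and returns the final balances.
pub fn run_concurrent_transfers() -> (i64, i64) {
    let a = Arc::new(Account { id: 1, balance: Mutex::new(100) });
    let b = Arc::new(Account { id: 2, balance: Mutex::new(100) });
    let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
    let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = thread::spawn(move || {
        for _ in 0..1000 {
            transfer_ordered(&a1, &b1, 1);
        }
    });
    let t2 = thread::spawn(move || {
        for _ in 0..1000 {
            transfer_ordered(&b2, &a2, 1);
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
    let final_a = *a.balance.lock().unwrap();
    let final_b = *b.balance.lock().unwrap();
    (final_a, final_b)
}
```

Swapping `transfer_ordered` for `transfer_naive` in `run_concurrent_transfers` still compiles, which is the point of the concern: the type system rules out the data race but not the liveness failure.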
Architectures, agents & guardrails
- A recurring problem: LLMs create sprawling, tangled architectures where small changes touch many modules.
- Suggested mitigations: multiple small crates, hard LOC limits per file, strict pre-commit hooks, and repo “guardrails” enforcing project conventions.
- Several tools/workflows are mentioned that orchestrate “beads”/tasks, static checks, and agent-review steps.
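One of the suggested guardrails, a hard LOC limit per file, could be sketched as follows. This is an illustrative check using only the standard library, not one of the tools mentioned in the thread. The function names and the policy of counting every line (including blanks and comments) are assumptions. A pre-commit hook could call `files_over_limit` and fail the commit when the returned list is non-empty.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Counts lines in a source string. Assumed policy: every line counts,
/// including blank lines and comments.
pub fn line_count(src: &str) -> usize {
    src.lines().count()
}

/// Walks `root` recursively and returns every `.rs` file with more than
/// `max_lines` lines, together with its line count, sorted by path.
/// Symlink loops and build directories are not handled specially here.
pub fn files_over_limit(root: &Path, max_lines: usize) -> io::Result<Vec<(PathBuf, usize)>> {
    let mut offenders = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |ext| ext == "rs") {
                let n = line_count(&fs::read_to_string(&path)?);
                if n > max_lines {
                    offenders.push((path, n));
                }
            }
        }
    }
    offenders.sort();
    Ok(offenders)
}
```

The limit itself is a project decision; the sketch only shows that such a convention can be enforced mechanically instead of relying on an agent to remember it.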
Specs, multi-LLM workflows & trust
- Some teams have one model write specs and another critique/implement them, then cross-review implementations.
- Others argue the same model in a fresh context is enough; the key is breaking bias by resetting context.
- A skeptical thread notes that LLM outputs vary between runs and can confidently contradict themselves, and warns against over-trusting specs or plans without human review.
Debate over LLM “reasoning”
- Long subthread on whether LLMs truly “reason” versus merely predict tokens.
- One side points to apparent reasoning behavior; the other emphasizes lack of cognition and deterministic understanding.
- No consensus; participants agree only that LLMs can be useful even if their internal process is opaque.