When AI writes the software, who verifies it?
Scale and Nature of the Verification Problem
- Many argue AI massively accelerates code generation but not verification, turning review and testing into the new bottleneck.
- AI shifts errors from obvious syntax issues to subtle design, security, and invariant violations that casual review misses.
- Some see this as exposing a long‑standing truth: most teams already verified poorly; AI just makes the deficit visible.
Tests, Specifications, and Formal Methods
- Strong camp: formal verification (Lean, Dafny, Verus, TLA+, etc.) is the only sustainable answer as AI output scales.
- Counterpoint: formal methods don’t solve “are we proving the right thing?”, and writing the specs is itself hard, often as complex as the implementation they describe.
- Concern that AI will generate tests that merely mirror its own code, not business logic; tests can become cargo cult artifacts (e.g., 100% coverage via meaningless AI‑written tests).
- Some suggest contracts and machine‑readable specs (OpenAPI, schemas, refinement types) as more robust anchors than tests alone.
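The contract idea above can be sketched in plain Python: a decorator that enforces pre/postconditions at runtime, constraining behavior without mirroring whatever body an AI produced. The names (`contract`, `withdraw`) and the decorator design are illustrative assumptions, not something from the discussion:

```python
import functools

def contract(pre=None, post=None):
    """Wrap a function with runtime pre/postcondition checks.

    The checks act as a machine-readable spec: they pin down what the
    function must do, independent of how the body is written.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed in {fn.__name__}"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition failed in {fn.__name__}"
            return result
        return wrapper
    return decorator

@contract(
    pre=lambda balance, amount: 0 < amount <= balance,
    post=lambda new_balance, balance, amount: new_balance == balance - amount,
)
def withdraw(balance, amount):
    # An AI-written body must satisfy the contract regardless of its style.
    return balance - amount
```

Here `withdraw(100, 30)` passes both checks, while `withdraw(100, 200)` fails the precondition before the body ever runs, which is the sense in which contracts anchor correctness more firmly than tests that were generated from the same code.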
Human Review, Responsibility, and Incentives
- Many insist humans must still review AI code “like a junior dev’s,” especially for security and API design.
- Others report real review fatigue: dozens of small, scattered AI‑driven PRs are hard to reason about in context.
- Organizational incentives often favor shipping features over correctness; engineers who push back feel overruled or sidelined.
- Several predict big, public failures will be needed before organizations re‑invest in serious verification.
Languages, Tooling, and Practical Strategies
- Preference grows for strongly typed, compiled languages (Rust, Go, strict TypeScript) to constrain AI slop; dynamic ecosystems (Python/JS) are seen as fragile under LLM use.
- Some developers successfully pair AI with property tests, fuzzing, snapshot tests, and static analysis as “belt and suspenders.”
- Mixed experience with theorem provers: Dafny/F* seen as more approachable than Lean/Coq; Lean praised for power but criticized as “an island” that’s hard to integrate into typical stacks.
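The “belt and suspenders” pairing of AI output with property tests can be illustrated with a hand-rolled property check using only the standard library (in practice a library like `hypothesis` would generate the inputs). The point is that the test asserts invariants that must hold for any input, so it cannot simply mirror the implementation; `dedupe_keep_order` is an assumed example function, not from the source:

```python
import random

def dedupe_keep_order(items):
    # Candidate implementation (could be AI-generated).
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def check_properties(trials=200, seed=0):
    """Property test: random inputs, invariant assertions, no memorized outputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        result = dedupe_keep_order(data)
        # 1. No duplicates remain.
        assert len(result) == len(set(result))
        # 2. Same set of elements as the input.
        assert set(result) == set(data)
        # 3. Relative order preserved: result is a subsequence of the input
        #    (the shared iterator advances through data as each x is found).
        it = iter(data)
        assert all(x in it for x in result)
    return True
```

If the model later “optimizes” the function in a way that breaks ordering or drops elements, these properties catch it even though no expected output was ever written down.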
Agentic Workflows and Multi‑AI Verification
- Common pattern: one AI writes code, another reviews, sometimes a third generates or critiques tests.
- Advocates report good results from “fresh eyes” models and adversarial or ping‑pong setups.
- Skeptics argue this is “fighting probability with probability” and doesn’t address fundamental misalignment or spec errors.
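A minimal sketch of the ping-pong pattern described above, assuming hypothetical `write_code` and `review_code` callables; both are stubbed here for determinism, where in practice each would call a different model:

```python
def ping_pong(write_code, review_code, task, max_rounds=3):
    """Alternate a writer model and a reviewer model until the
    reviewer approves or the round budget is exhausted."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = write_code(task, feedback)
        verdict, feedback = review_code(task, code)
        if verdict == "approve":
            return code, round_no
    return None, max_rounds  # no approval: escalate to a human reviewer

# Deterministic stubs standing in for two separate models.
def write_code(task, feedback):
    return "v2" if feedback else "v1"

def review_code(task, code):
    return ("approve", None) if code == "v2" else ("revise", "handle empty input")
```

With these stubs, `ping_pong(write_code, review_code, "dedupe a list")` returns the revised draft after two rounds. The budget and the human-escalation fallback matter: as the skeptics note, looping two probabilistic systems does not converge on a correct spec by itself.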
Impact on the Developer Role and Industry
- Many foresee the job shifting from “writing code” to specifying behavior, designing architectures, and auditing AI output.
- Others note that textbooks always framed software engineering around requirements, design, and quality; AI may simply force industry to align with that.
- There’s anxiety that some engineers are becoming mere “prompt routers” between stakeholders and AI, and that AI will eventually target those higher‑level roles too.