When AI writes the software, who verifies it?

Scale and Nature of the Verification Problem

  • Many argue AI massively accelerates code generation but not verification, turning review and testing into the new bottleneck.
  • AI shifts errors from obvious syntax issues to subtle design, security, and invariant violations that casual review misses.
  • Some see this as exposing a long‑standing truth: most teams already verified poorly; AI just makes the deficit visible.

Tests, Specifications, and Formal Methods

  • Strong camp: formal verification (Lean, Dafny, Verus, TLA+, etc.) is the only sustainable answer as AI output scales.
  • Counterpoint: formal methods don’t solve “are we proving the right thing?”, and specs themselves are hard to write, often as complex as the implementation itself.
  • Concern that AI will generate tests that merely mirror its own code, not business logic; tests can become cargo cult artifacts (e.g., 100% coverage via meaningless AI‑written tests).
  • Some suggest contracts and machine‑readable specs (OpenAPI, schemas, refinement types) as more robust anchors than tests alone.
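The contract idea above can be sketched in a few lines: encode pre/postconditions that any correct implementation must satisfy, so the check anchors to the spec rather than mirroring the code under review. A minimal stdlib-only Python sketch; the `contract` decorator and `my_sort` example are hypothetical illustrations, not any particular library's API:

```python
from functools import wraps

def contract(pre=None, post=None):
    """Wrap a function with precondition/postcondition checks.
    The contract documents intended behavior independently of
    whatever implementation (human- or AI-written) sits inside."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed: {fn.__name__}"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition failed: {fn.__name__}"
            return result
        return wrapper
    return decorate

# Hypothetical example: the postcondition says the output equals the
# sorted input -- true of any correct sort, however it is written.
@contract(
    pre=lambda xs: isinstance(xs, list),
    post=lambda out, xs: out == sorted(xs),
)
def my_sort(xs):
    return sorted(xs)  # stand-in for AI-generated code under review
```

If an AI-generated body violated the postcondition, the assertion would fire regardless of what tests the same AI happened to generate.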

Human Review, Responsibility, and Incentives

  • Many insist humans must still review AI code “like a junior dev’s,” especially for security and API design.
  • Others report real review fatigue: dozens of small, scattered AI‑driven PRs are hard to reason about in context.
  • Organizational incentives often favor shipping features over correctness; engineers who push back feel overruled or sidelined.
  • Several predict big, public failures will be needed before organizations re‑invest in serious verification.

Languages, Tooling, and Practical Strategies

  • Preference grows for strongly typed, compiled languages (Rust, Go, strict TypeScript) to constrain AI slop; dynamic ecosystems (Python/JS) are seen as fragile under LLM use.
  • Some developers successfully pair AI with property tests, fuzzing, snapshot tests, and static analysis as “belt and suspenders.”
  • Mixed experience with theorem provers: Dafny/F* are seen as more approachable than Lean/Coq; Lean is praised for power but criticized as “an island” that’s hard to integrate into typical stacks.
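The “belt and suspenders” pattern can be illustrated with a stdlib-only property test: generate many random inputs and assert an invariant the business logic must hold, rather than asserting exact outputs that were derived from the code itself. A real setup would use a library such as Hypothesis or a coverage-guided fuzzer; `encode`/`decode` here are hypothetical stand-ins for an AI-written serializer pair:

```python
import random

def encode(s: str) -> bytes:
    """Hypothetical stand-in for an AI-written serializer."""
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    """Hypothetical stand-in for the matching deserializer."""
    return b.decode("utf-8")

def test_roundtrip(trials: int = 200) -> None:
    """Property: decode(encode(s)) == s for arbitrary strings.
    This checks the spec-level invariant, not any particular output."""
    rng = random.Random(0)  # seeded so failures are reproducible
    for _ in range(trials):
        # Random code points below the surrogate range, random lengths.
        s = "".join(chr(rng.randrange(32, 0xD7FF))
                    for _ in range(rng.randrange(0, 50)))
        assert decode(encode(s)) == s

test_roundtrip()
```

A test like this survives regeneration of the implementation: if a model rewrites `encode` and breaks the round-trip, the property fails even though no hand-picked example was updated.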

Agentic Workflows and Multi‑AI Verification

  • Common pattern: one AI writes code, another reviews, sometimes a third generates or critiques tests.
  • Advocates report good results from “fresh eyes” models and adversarial or ping‑pong setups.
  • Skeptics argue this is “fighting probability with probability” and doesn’t address fundamental misalignment or spec errors.
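The writer/reviewer ping-pong described above is, at its core, a small control loop. A sketch under stated assumptions: `generate` and `review` would be model calls in practice; the stubs below are deterministic placeholders purely to show the flow, and the escalate-to-a-human fallback reflects the skeptics' point that the loop alone proves nothing:

```python
from typing import Callable, Tuple

def verify_loop(
    generate: Callable[[str, str], str],            # (spec, feedback) -> candidate code
    review: Callable[[str, str], Tuple[bool, str]],  # (spec, code) -> (accepted, feedback)
    spec: str,
    max_rounds: int = 3,
) -> str:
    """Ping-pong: one model proposes, another critiques, until the
    reviewer accepts or the round budget runs out."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(spec, feedback)
        accepted, feedback = review(spec, candidate)
        if accepted:
            return candidate
    # Probability fighting probability has a floor: escalate.
    raise RuntimeError("no candidate passed review; needs a human")

# Deterministic stubs for illustration only (not real model calls).
def stub_generate(spec: str, feedback: str) -> str:
    # First attempt is buggy; the reviewer's feedback "fixes" round two.
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"

def stub_review(spec: str, code: str) -> Tuple[bool, str]:
    ok = "a + b" in code
    return ok, "" if ok else "add should sum its arguments"
```

Note that the loop only converges if the reviewer's acceptance criterion tracks the spec; a reviewer sampled from the same distribution as the writer can accept the same mistakes, which is the skeptics' objection in miniature.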

Impact on the Developer Role and Industry

  • Many foresee the job shifting from “writing code” to specifying behavior, designing architectures, and auditing AI output.
  • Others note that textbooks always framed software engineering around requirements, design, and quality; AI may simply force industry to align with that.
  • There’s anxiety that some engineers are becoming mere “prompt routers” between stakeholders and AI, and that AI will eventually target those higher‑level roles too.