When AI writes the software, who verifies it?

Scale and Nature of the Verification Problem

  • Many argue AI massively accelerates code generation but not verification, turning review and testing into the new bottleneck.
  • AI shifts errors from obvious syntax issues to subtle design, security, and invariant violations that casual review misses.
  • Some see this as exposing a long‑standing truth: most teams already verified poorly; AI just makes the deficit visible.

Tests, Specifications, and Formal Methods

  • Strong camp: formal verification (Lean, Dafny, Verus, TLA+, etc.) is the only sustainable answer as AI output scales.
  • Counterpoint: formal methods don’t solve “are we proving the right thing?”, and specs themselves are hard to write, often as complex as the implementation itself.
  • Concern that AI will generate tests that merely mirror its own code, not business logic; tests can become cargo cult artifacts (e.g., 100% coverage via meaningless AI‑written tests).
  • Some suggest contracts and machine‑readable specs (OpenAPI, schemas, refinement types) as more robust anchors than tests alone.
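The contract idea above can be sketched in a few lines: encode pre/postconditions that any correct implementation must satisfy, so the check anchors to the spec rather than mirroring the code under review. A minimal stdlib-only Python sketch; the `contract` decorator and `my_sort` example are hypothetical illustrations, not any particular library's API:

```python
from functools import wraps

def contract(pre=None, post=None):
    """Wrap a function with precondition/postcondition checks.
    The contract documents intended behavior independently of
    whatever implementation (human- or AI-written) sits inside."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed: {fn.__name__}"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition failed: {fn.__name__}"
            return result
        return wrapper
    return decorate

# Hypothetical example: the postcondition says the output equals the
# sorted input -- true of any correct sort, however it is written.
@contract(
    pre=lambda xs: isinstance(xs, list),
    post=lambda out, xs: out == sorted(xs),
)
def my_sort(xs):
    return sorted(xs)  # stand-in for AI-generated code under review
```

If an AI-generated body violated the postcondition, the assertion would fire regardless of what tests the same AI happened to generate.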

Human Review, Responsibility, and Incentives

  • Many insist humans must still review AI code “like a junior dev’s,” especially for security and API design.
  • Others report real review fatigue: dozens of small, scattered AI‑driven PRs are hard to reason about in context.
  • Organizational incentives often favor shipping features over correctness; engineers who push back feel overruled or sidelined.
  • Several predict big, public failures will be needed before organizations re‑invest in serious verification.

Languages, Tooling, and Practical Strategies

  • Preference grows for strongly typed, compiled languages (Rust, Go, strict TypeScript) to constrain AI slop; dynamic ecosystems (Python/JS) are seen as fragile under LLM use.
  • Some developers successfully pair AI with property tests, fuzzing, snapshot tests, and static analysis as “belt and suspenders.”
  • Mixed experience with theorem provers: Dafny/F* are seen as more approachable than Lean/Coq; Lean is praised for power but criticized as “an island” that’s hard to integrate into typical stacks.
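The “belt and suspenders” pattern can be illustrated with a stdlib-only property test: generate many random inputs and assert an invariant the business logic must hold, rather than asserting exact outputs that were derived from the code itself. A real setup would use a library such as Hypothesis or a coverage-guided fuzzer; `encode`/`decode` here are hypothetical stand-ins for an AI-written serializer pair:

```python
import random

def encode(s: str) -> bytes:
    """Hypothetical stand-in for an AI-written serializer."""
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    """Hypothetical stand-in for the matching deserializer."""
    return b.decode("utf-8")

def test_roundtrip(trials: int = 200) -> None:
    """Property: decode(encode(s)) == s for arbitrary strings.
    This checks the spec-level invariant, not any particular output."""
    rng = random.Random(0)  # seeded so failures are reproducible
    for _ in range(trials):
        # Random code points below the surrogate range, random lengths.
        s = "".join(chr(rng.randrange(32, 0xD7FF))
                    for _ in range(rng.randrange(0, 50)))
        assert decode(encode(s)) == s

test_roundtrip()
```

A test like this survives regeneration of the implementation: if a model rewrites `encode` and breaks the round-trip, the property fails even though no hand-picked example was updated.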

Agentic Workflows and Multi‑AI Verification

  • Common pattern: one AI writes code, another reviews, sometimes a third generates or critiques tests.
  • Advocates report good results from “fresh eyes” models and adversarial or ping‑pong setups.
  • Skeptics argue this is “fighting probability with probability” and doesn’t address fundamental misalignment or spec errors.
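The writer/reviewer ping-pong described above is, at its core, a small control loop. A sketch under stated assumptions: `generate` and `review` would be model calls in practice; the stubs below are deterministic placeholders purely to show the flow, and the escalate-to-a-human fallback reflects the skeptics' point that the loop alone proves nothing:

```python
from typing import Callable, Tuple

def verify_loop(
    generate: Callable[[str, str], str],            # (spec, feedback) -> candidate code
    review: Callable[[str, str], Tuple[bool, str]],  # (spec, code) -> (accepted, feedback)
    spec: str,
    max_rounds: int = 3,
) -> str:
    """Ping-pong: one model proposes, another critiques, until the
    reviewer accepts or the round budget runs out."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(spec, feedback)
        accepted, feedback = review(spec, candidate)
        if accepted:
            return candidate
    # Probability fighting probability has a floor: escalate.
    raise RuntimeError("no candidate passed review; needs a human")

# Deterministic stubs for illustration only (not real model calls).
def stub_generate(spec: str, feedback: str) -> str:
    # First attempt is buggy; the reviewer's feedback "fixes" round two.
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"

def stub_review(spec: str, code: str) -> Tuple[bool, str]:
    ok = "a + b" in code
    return ok, "" if ok else "add should sum its arguments"
```

Note that the loop only converges if the reviewer's acceptance criterion tracks the spec; a reviewer sampled from the same distribution as the writer can accept the same mistakes, which is the skeptics' objection in miniature.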

Impact on the Developer Role and Industry

  • Many foresee the job shifting from “writing code” to specifying behavior, designing architectures, and auditing AI output.
  • Others note that textbooks always framed software engineering around requirements, design, and quality; AI may simply force industry to align with that.
  • There’s anxiety that some engineers are becoming mere “prompt routers” between stakeholders and AI, and that AI will eventually target those higher‑level roles too.