AI will make formal verification go mainstream

AI coding agents and the need for feedback loops

  • Many comments argue AI coding agents are only useful when tightly coupled to execution and validation tools: test runners, linters, fuzzers, debuggers, browser automation (e.g., Playwright), and so on (a minimal sketch of such a loop follows this list).
  • Without the ability to run tests, agents “go off the rails,” cycling through random fixes or even fabricating tests and results.
  • Multi-agent and review workflows (separate “developer” and “reviewer” models, orchestration loops) greatly improve reliability but are expensive in API usage.
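
A minimal sketch of that feedback loop, assuming pytest as the oracle; llm_propose_patch and apply_patch are hypothetical stand-ins for a real model call and workspace edit, not any particular agent framework's API:

```python
import subprocess

def llm_propose_patch(task: str, feedback: str) -> str:
    """Hypothetical model call; a real agent would hit an LLM API here."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Hypothetical workspace edit; a real agent would write files here."""
    raise NotImplementedError

def run_tests() -> tuple[bool, str]:
    """The oracle: run the suite and return (passed, captured output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task: str, max_iters: int = 5) -> bool:
    feedback = ""
    for _ in range(max_iters):
        apply_patch(llm_propose_patch(task, feedback))
        passed, output = run_tests()
        if passed:
            return True       # the oracle accepts; stop here
        feedback = output     # ground the next attempt in real failures
    return False              # cap iterations and escalate to a human
```

The point is the wiring: every iteration is anchored to a real pass/fail signal, which is exactly what the “off the rails” failure mode lacks.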

Formal methods as part of the tooling spectrum

  • Formal verification is framed as another rung on the same ladder as types, tests, and static analysis: a stronger, more expensive oracle that gives clear yes/no answers.
  • Several people report success using LLMs with Lean, TLA+, Dafny, Verus, Quint, Z3, etc., to generate proofs or specs, saying work that used to take months now often takes days or hours.
  • A key attraction: proof checkers and model checkers reject nonsense, so hallucinated proof scripts are harmless as long as the spec is correct (see the Z3 sketch after this list).
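
For illustration, here is what “the checker rejects nonsense” looks like at its smallest, using Z3's Python bindings (the z3-solver package): to verify a universal claim you ask the solver for a counterexample, and `unsat` means none exists.

```python
from z3 import Ints, Solver

x, y = Ints("x y")

# Claim: for all integers, x > 0 and y > 0 implies x + y > 0.
# Ask Z3 for a counterexample; `unsat` means the claim is proved.
s = Solver()
s.add(x > 0, y > 0, x + y <= 0)
print(s.check())   # unsat

# A hallucinated claim ("x + y >= x for all y") fails loudly instead:
s = Solver()
s.add(x + y < x)   # counterexample search
print(s.check())   # sat
print(s.model())   # e.g. [y = -1, x = 0]
```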

Specification is the hard problem

  • Multiple threads stress that writing and maintaining a correct spec is harder than generating code or proofs. Requirements are vague, conflicting, and constantly changing.
  • For typical business software, formalizing what users actually want (and keeping it in sync with reality) is seen as the main blocker, not proof search.
  • Some suggest AI can help iterate on specs (proposing invariants, properties, and models), but humans must still judge whether those are the right ones (the sketch after this list shows an invariant that is true yet not the spec anyone wanted).
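
A toy illustration of that judgment problem (a textbook example, not from the thread): an invariant can be formally true and still not be the spec anyone wanted. “The output is sorted” sounds like a spec for a sort routine, but it is vacuously satisfied by returning nothing.

```python
from collections import Counter

def is_sorted(xs: list[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))

def broken_sort(xs: list[int]) -> list[int]:
    return []   # returns a "sorted" list every time

# The plausible-but-incomplete spec passes:
assert is_sorted(broken_sort([3, 1, 2]))

# The spec a human actually wants also demands a permutation of the input:
def full_spec(inp: list[int], out: list[int]) -> bool:
    return is_sorted(out) and Counter(out) == Counter(inp)

assert not full_spec([3, 1, 2], broken_sort([3, 1, 2]))
```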

Where formal methods are likely vs unlikely

  • Strong agreement that formal methods shine where specs are compact and stakes are high: crypto, compilers, smart contracts, OS kernels, protocols, embedded/safety-critical systems, and identity and access management (IAM); a toy model-checking sketch follows this list.
  • Many doubt they’ll ever cover most day‑to‑day app work (complex UIs, changing business rules, multi-system interactions) where bugs are tolerable and economics favor speed over rigor.
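
To make “compact spec” concrete, here is a deliberately tiny sketch of explicit-state model checking, the technique behind tools like TLA+ and Quint: a two-process lock model whose entire safety spec is one line (“never both in the critical section”). The model itself is an assumption for illustration.

```python
from collections import deque

def successors(state):
    """Yield next states of a two-process model with an atomic lock."""
    pcs, lock = state
    for p in (0, 1):
        if pcs[p] == "idle":                     # decide to enter
            yield (pcs[:p] + ("try",) + pcs[p+1:], lock)
        elif pcs[p] == "try" and lock is None:   # atomic acquire
            yield (pcs[:p] + ("crit",) + pcs[p+1:], p)
        elif pcs[p] == "crit":                   # release
            yield (pcs[:p] + ("idle",) + pcs[p+1:], None)

def check_mutex():
    init = (("idle", "idle"), None)
    seen, frontier = {init}, deque([init])
    while frontier:                              # exhaustive BFS of states
        state = frontier.popleft()
        # The whole safety spec:
        assert state[0] != ("crit", "crit"), f"violation: {state}"
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    print(f"explored {len(seen)} states; mutual exclusion holds")

check_mutex()
```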

Types, tests, and “lightweight formal”

  • Commenters note that advanced type systems, property-based testing, model-based testing, and analyzers (e.g., Roslyn) are already “formal-ish” verification that LLMs can strengthen by generating properties and tests.
  • Some foresee AI-backed property checks and model checking in CI/IDEs as the realistic “mainstream” path, rather than everyone writing Lean/Isabelle proofs (see the property-test sketch after this list).
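
A sketch of what that path might look like in practice, using the Hypothesis library for property-based testing: a property an LLM could plausibly generate and CI could run like any other test. It re-checks the two-part sorting spec from the earlier sketch over randomized inputs.

```python
from collections import Counter
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorted_meets_spec(xs):
    out = sorted(xs)
    # Property 1: the output is ordered.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # Property 2: the output is a permutation of the input.
    assert Counter(out) == Counter(xs)
```

Run under pytest, Hypothesis generates and shrinks counterexamples automatically, which is the “formal-ish” oracle behavior the commenters describe.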

Risks and skepticism

  • Concerns include: AI inserting malicious code into “tests,” spec/implementation mismatch, agents cheating by modifying tests (a simple CI-guard sketch follows this list), and over-trusting “AI-verified” systems.
  • A sizeable minority flatly reject the premise: they see LLMs as unreliable, formal methods as labor-intensive and niche, and expect organizations to accept AI-induced bugs rather than invest in rigorous verification.
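
As one hedged example of a mitigation for the test-tampering concern: a CI step that rejects any agent patch touching a protected test directory, so “make the tests pass” cannot be satisfied by editing the tests. The tests/ path and the origin/main base branch are assumptions, not details from the thread.

```python
import subprocess
import sys

# Files changed on this branch relative to the protected base branch.
changed = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

touched = [p for p in changed if p.startswith("tests/")]   # assumed layout
if touched:
    sys.exit(f"patch modifies protected test files: {touched}")
print("ok: test suite untouched")
```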