AI will make formal verification go mainstream
AI coding agents and the need for feedback loops
- Many comments argue AI coders are only useful when tightly coupled to execution and validation tools: test runners, linters, fuzzers, debuggers, browser automation (Playwright), etc. (a minimal feedback loop of this kind is sketched after this list).
- Without the ability to run tests, agents “go off the rails,” cycling through random fixes or even fabricating tests or results.
- Multi-agent and review workflows (separate “developer” and “reviewer” models, orchestration loops) greatly improve reliability but are expensive in API usage.
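
As a rough illustration of the coupling commenters describe, here is a minimal sketch of a test-feedback loop. It assumes pytest as the test runner; `propose_patch` and `apply_patch` are hypothetical stand-ins for the LLM call and the workspace edit, not parts of any real framework.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and capture output to feed back to the agent."""
    result = subprocess.run(
        ["pytest", "-x", "--tb=short"], capture_output=True, text=True
    )
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task, propose_patch, apply_patch, max_iters=5) -> bool:
    """Propose a patch, apply it, run the tests, and feed failures back."""
    feedback = ""
    for _ in range(max_iters):
        patch = propose_patch(task, feedback)  # hypothetical LLM call
        apply_patch(patch)                     # hypothetical workspace edit
        passed, output = run_tests()
        if passed:
            return True                        # the oracle says yes; stop
        feedback = output                      # ground the next attempt in real failures
    return False                               # give up rather than loop forever
```

The point is that every iteration is anchored to an external oracle; without the `run_tests` step the model has nothing real to push back against.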
Formal methods as part of the tooling spectrum
- Formal verification is framed as another rung on the same ladder as types, tests, and static analysis: a stronger, more expensive oracle that gives clear yes/no answers.
- Several people report success using LLMs with Lean, TLA+, Dafny, Verus, Quint, Z3, etc., to generate proofs or specs, saying this used to take months and now often takes days or hours.
- A key attraction: proof checkers and model checkers reject nonsense, so hallucinated proof scripts are harmless as long as the spec is correct (a small Z3 sketch follows this list).
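
To make the “yes/no oracle” point concrete, here is a small sketch using Z3’s Python bindings (the `z3-solver` package): a correct specification of `max` is confirmed with `unsat` (no counterexample exists), while an over-strong claim is rejected with a concrete counterexample. The example is illustrative, not drawn from the discussion.

```python
from z3 import Ints, Solver, If, And, Or, Not, sat

x, y = Ints("x y")
candidate_max = If(x >= y, x, y)            # implementation under scrutiny

# Spec: the result is an upper bound on both arguments and equals one of them.
spec = And(candidate_max >= x, candidate_max >= y,
           Or(candidate_max == x, candidate_max == y))

s = Solver()
s.add(Not(spec))                            # ask the solver for a counterexample
print(s.check())                            # unsat: none exists, so the spec holds

# An over-strong (wrong) claim is simply rejected, hallucinated or not.
s2 = Solver()
s2.add(Not(And(candidate_max > x, candidate_max > y)))
result = s2.check()
print(result)                               # sat: the claim is false
if result == sat:
    print(s2.model())                       # a concrete counterexample, e.g. x == y
```

This is the property commenters value: a wrong proof attempt cannot slip through, because the checker either confirms or refutes it mechanically.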
Specification is the hard problem
- Multiple threads stress that writing and maintaining a correct spec is harder than generating code or proofs. Requirements are vague, conflicting, and constantly changing.
- For typical business software, formalizing what users actually want (and keeping it in sync with reality) is seen as the main blocker, not proof search.
- Some suggest AI can help iterate on specs (propose invariants, properties, models), but humans still must judge whether those are the right ones; the toy example after this list illustrates why.
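
A toy example of why judgment is still needed: both invariants below are precise and checkable, and an LLM might plausibly propose either for a hypothetical `apply_discount(total, pct)` helper, but only someone who knows the business rules can say which one is the actual requirement.

```python
def invariant_never_negative(total: float, pct: float, result: float) -> bool:
    # "A discounted total is never negative."
    return result >= 0

def invariant_bounded_by_original(total: float, pct: float, result: float) -> bool:
    # "A discount never raises the price and never exceeds 100%."
    return 0 <= result <= total
```

Both formalize something true-sounding; deciding which captures what users actually want is the part machines do not settle.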
Where formal methods are likely vs unlikely
- Strong agreement that formal methods shine where specs are compact and stakes are high: cryptography, compilers, smart contracts, OS kernels, protocols, embedded/safety-critical systems, and identity and access management (IAM).
- Many doubt they’ll ever cover most day‑to‑day app work (complex UIs, changing business rules, multi-system interactions) where bugs are tolerable and economics favor speed over rigor.
Types, tests, and “lightweight formal”
- Commenters note that advanced type systems, property-based testing, model-based testing, and analyzers (e.g., Roslyn) are already “formal-ish” verification that LLMs can strengthen by generating properties and tests (a Hypothesis-style example follows this list).
- Some foresee AI-backed property checks and model checking in CI/IDEs as the realistic “mainstream” path, rather than everyone writing Lean/Isabelle proofs.
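
As a sketch of what AI-backed property checks in CI could look like, here is a Hypothesis-style property test in Python; the `dedupe` helper and the three properties are illustrative, the kind an LLM might propose and a reviewer would vet.

```python
from hypothesis import given, strategies as st

def dedupe(xs):
    """Remove duplicates while preserving first-occurrence order."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

@given(st.lists(st.integers()))
def test_dedupe_properties(xs):
    out = dedupe(xs)
    assert len(out) == len(set(out))   # no duplicates remain
    assert set(out) == set(xs)         # nothing invented, nothing lost
    assert dedupe(out) == out          # idempotent: deduping twice changes nothing
```

Run under pytest, Hypothesis generates and shrinks inputs automatically, so this slots into existing CI without requiring anyone to touch a proof assistant.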
Risks and skepticism
- Concerns include: AI inserting malicious code into “tests,” spec/implementation mismatch, agents cheating by modifying tests, and over-trusting “AI-verified” systems.
- A sizeable minority flatly reject the premise: they see LLMs as unreliable, formal methods as labor-intensive and niche, and expect organizations to accept AI-induced bugs rather than invest in rigorous verification.