AI will make formal verification go mainstream

AI coding agents and the need for feedback loops

  • Many comments argue AI coding agents are only useful when tightly coupled to execution and validation tools: test runners, linters, fuzzers, debuggers, browser automation (e.g., Playwright), and so on (a minimal sketch of such a loop follows this list).
  • Without the ability to run tests, agents “go off the rails,” cycling through random fixes or even fabricating tests and results.
  • Multi-agent and review workflows (separate “developer” and “reviewer” models, orchestration loops) greatly improve reliability but are expensive in API usage.
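
A minimal sketch of that feedback loop, assuming pytest as the oracle; llm_propose_patch and apply_patch are hypothetical stand-ins for a real model call and workspace edit, not any particular agent framework's API:

```python
import subprocess

def llm_propose_patch(task: str, feedback: str) -> str:
    """Hypothetical model call; a real agent would hit an LLM API here."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Hypothetical workspace edit; a real agent would write files here."""
    raise NotImplementedError

def run_tests() -> tuple[bool, str]:
    """The oracle: run the suite and return (passed, captured output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task: str, max_iters: int = 5) -> bool:
    feedback = ""
    for _ in range(max_iters):
        apply_patch(llm_propose_patch(task, feedback))
        passed, output = run_tests()
        if passed:
            return True       # the oracle accepts; stop here
        feedback = output     # ground the next attempt in real failures
    return False              # cap iterations and escalate to a human
```

The point is the wiring: every iteration is anchored to a real pass/fail signal, which is exactly what the “off the rails” failure mode lacks.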

Formal methods as part of the tooling spectrum

  • Formal verification is framed as another rung on the same ladder as types, tests, and static analysis: a stronger, more expensive oracle that gives clear yes/no answers.
  • Several people report success using LLMs with Lean, TLA+, Dafny, Verus, Quint, Z3, etc., to generate proofs or specs, saying work that used to take months now often takes days or hours.
  • A key attraction: proof checkers and model checkers reject nonsense, so hallucinated proof scripts are harmless as long as the spec is correct (see the Z3 sketch after this list).
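
For illustration, here is what “the checker rejects nonsense” looks like at its smallest, using Z3's Python bindings (the z3-solver package): to verify a universal claim you ask the solver for a counterexample, and `unsat` means none exists.

```python
from z3 import Ints, Solver

x, y = Ints("x y")

# Claim: for all integers, x > 0 and y > 0 implies x + y > 0.
# Ask Z3 for a counterexample; `unsat` means the claim is proved.
s = Solver()
s.add(x > 0, y > 0, x + y <= 0)
print(s.check())   # unsat

# A hallucinated claim ("x + y >= x for all y") fails loudly instead:
s = Solver()
s.add(x + y < x)   # counterexample search
print(s.check())   # sat
print(s.model())   # e.g. [y = -1, x = 0]
```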

Specification is the hard problem

  • Multiple threads stress that writing and maintaining a correct spec is harder than generating code or proofs. Requirements are vague, conflicting, and constantly changing.
  • For typical business software, formalizing what users actually want (and keeping it in sync with reality) is seen as the main blocker, not proof search.
  • Some suggest AI can help iterate on specs (proposing invariants, properties, and models), but humans must still judge whether those are the right ones (the sketch after this list shows an invariant that is true yet not the spec anyone wanted).
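
A toy illustration of that judgment problem (a textbook example, not from the thread): an invariant can be formally true and still not be the spec anyone wanted. “The output is sorted” sounds like a spec for a sort routine, but it is vacuously satisfied by returning nothing.

```python
from collections import Counter

def is_sorted(xs: list[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))

def broken_sort(xs: list[int]) -> list[int]:
    return []   # returns a "sorted" list every time

# The plausible-but-incomplete spec passes:
assert is_sorted(broken_sort([3, 1, 2]))

# The spec a human actually wants also demands a permutation of the input:
def full_spec(inp: list[int], out: list[int]) -> bool:
    return is_sorted(out) and Counter(out) == Counter(inp)

assert not full_spec([3, 1, 2], broken_sort([3, 1, 2]))
```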

Where formal methods are likely vs unlikely

  • Strong agreement that formal methods shine where specs are compact and stakes are high: crypto, compilers, smart contracts, OS kernels, protocols, embedded/safety-critical systems, and identity and access management (IAM); a toy model-checking sketch follows this list.
  • Many doubt they’ll ever cover most day‑to‑day app work (complex UIs, changing business rules, multi-system interactions) where bugs are tolerable and economics favor speed over rigor.
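
To make “compact spec” concrete, here is a deliberately tiny sketch of explicit-state model checking, the technique behind tools like TLA+ and Quint: a two-process lock model whose entire safety spec is one line (“never both in the critical section”). The model itself is an assumption for illustration.

```python
from collections import deque

def successors(state):
    """Yield next states of a two-process model with an atomic lock."""
    pcs, lock = state
    for p in (0, 1):
        if pcs[p] == "idle":                     # decide to enter
            yield (pcs[:p] + ("try",) + pcs[p+1:], lock)
        elif pcs[p] == "try" and lock is None:   # atomic acquire
            yield (pcs[:p] + ("crit",) + pcs[p+1:], p)
        elif pcs[p] == "crit":                   # release
            yield (pcs[:p] + ("idle",) + pcs[p+1:], None)

def check_mutex():
    init = (("idle", "idle"), None)
    seen, frontier = {init}, deque([init])
    while frontier:                              # exhaustive BFS of states
        state = frontier.popleft()
        # The whole safety spec:
        assert state[0] != ("crit", "crit"), f"violation: {state}"
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    print(f"explored {len(seen)} states; mutual exclusion holds")

check_mutex()
```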

Types, tests, and “lightweight formal”

  • Commenters note that advanced type systems, property-based testing, model-based testing, and analyzers (e.g., Roslyn) are already “formal-ish” verification that LLMs can strengthen by generating properties and tests.
  • Some foresee AI-backed property checks and model checking in CI/IDEs as the realistic “mainstream” path, rather than everyone writing Lean/Isabelle proofs (see the property-test sketch after this list).
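
A sketch of what that path might look like in practice, using the Hypothesis library for property-based testing: a property an LLM could plausibly generate and CI could run like any other test. It re-checks the two-part sorting spec from the earlier sketch over randomized inputs.

```python
from collections import Counter
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorted_meets_spec(xs):
    out = sorted(xs)
    # Property 1: the output is ordered.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # Property 2: the output is a permutation of the input.
    assert Counter(out) == Counter(xs)
```

Run under pytest, Hypothesis generates and shrinks counterexamples automatically, which is the “formal-ish” oracle behavior the commenters describe.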

Risks and skepticism

  • Concerns include: AI inserting malicious code into “tests,” spec/implementation mismatch, agents cheating by modifying tests (a simple CI-guard sketch follows this list), and over-trusting “AI-verified” systems.
  • A sizeable minority flatly reject the premise: they see LLMs as unreliable, formal methods as labor-intensive and niche, and expect organizations to accept AI-induced bugs rather than invest in rigorous verification.
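
As one hedged example of a mitigation for the test-tampering concern: a CI step that rejects any agent patch touching a protected test directory, so “make the tests pass” cannot be satisfied by editing the tests. The tests/ path and the origin/main base branch are assumptions, not details from the thread.

```python
import subprocess
import sys

# Files changed on this branch relative to the protected base branch.
changed = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

touched = [p for p in changed if p.startswith("tests/")]   # assumed layout
if touched:
    sys.exit(f"patch modifies protected test files: {touched}")
print("ok: test suite untouched")
```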