Lean proved this program correct; then I found a bug
What actually went wrong
- The verified part was the core zlib compression/decompression logic in Lean.
- Two bugs surfaced:
- A denial-of-service via unverified archive-header parsing in the Lean-zip code.
- A heap buffer overflow in the Lean runtime’s scalar array support, affecting any Lean program using that feature.
- Both issues were outside the formal proof’s stated scope, so the core correctness proof was not invalidated.
Limits of formal verification
- Multiple comments stress: proofs are always “correct with respect to a specification,” not to human intent or the full system.
- Common failure modes:
- Spec bugs (the spec doesn’t capture what was really wanted).
- Spec gaps (important behaviors like parsers, resource limits, or adversarial inputs not modeled).
- Unverified layers (runtime, OS, hardware, compilers).
- Some argue a “verified system” should be understood as the whole binary stack; others say that’s unrealistic unless every component is verified.
Trust in verification tools
- Concern: why trust the verifier more than the verified code?
- Replies: verifiers can be simpler, heavily scrutinized, and used by many, making serious undetected bugs less likely (though not impossible).
- Distinction emphasized between Lean’s kernel (logic/proofs) and runtime (execution/FFI); a runtime bug doesn’t directly falsify existing mathematical proofs.
Spec correctness and completeness
- Discussion about needing methods to “verify the specs” themselves, including adversarial thinking and high-level property proofs.
- Recognized gap between:
- Implementation soundness (“the code does X”) and
- Design soundness (“doing X is actually the right, safe thing”).
- Compression is highlighted as a good domain: simple correctness spec, complex implementation, but still vulnerable to unmodeled behaviors (e.g., malformed inputs).
AI, fuzzing, and “vibe-coded” proofs
- Several commenters report success using LLMs to help write proofs and models (Lean, TLA+, etc.).
- The AI-assisted fuzzing that found the Lean runtime overflow is seen as a significant, positive result.
- Enthusiasm that combining AI, fuzzing, and formal methods can narrow where bugs hide, even if perfection remains impossible.
Title and framing debate
- Many see the article title as misleading or clickbait, implying a flaw in the proof itself.
- Others defend it as fair from an end-user perspective: if the binary crashes or is exploitable, the “verified program” is still buggy in practice.