Lean proved this program correct; then I found a bug

What actually went wrong

  • The verified part was the core zlib compression/decompression logic in Lean.
  • Two bugs surfaced:
    • A denial-of-service via unverified archive-header parsing in the Lean-zip code.
    • A heap buffer overflow in the Lean runtime’s scalar array support, affecting any Lean program using that feature.
  • Both issues were outside the formal proof’s stated scope, so the core correctness proof was not invalidated.

Limits of formal verification

  • Multiple comments stress: proofs are always “correct with respect to a specification,” not to human intent or the full system.
  • Common failure modes:
    • Spec bugs (the spec doesn’t capture what was really wanted).
    • Spec gaps (important behaviors like parsers, resource limits, or adversarial inputs not modeled).
    • Unverified layers (runtime, OS, hardware, compilers).
  • Some argue a “verified system” should be understood as the whole binary stack; others say that’s unrealistic unless every component is verified.

Trust in verification tools

  • Concern: why trust the verifier more than the verified code?
  • Replies: verifiers can be simpler, heavily scrutinized, and used by many, making serious undetected bugs less likely (though not impossible).
  • Distinction emphasized between Lean’s kernel (logic/proofs) and runtime (execution/FFI); a runtime bug doesn’t directly falsify existing mathematical proofs.

Spec correctness and completeness

  • Discussion about needing methods to “verify the specs” themselves, including adversarial thinking and high-level property proofs.
  • Recognized gap between:
    • Implementation soundness (“the code does X”) and
    • Design soundness (“doing X is actually the right, safe thing”).
  • Compression is highlighted as a good domain: simple correctness spec, complex implementation, but still vulnerable to unmodeled behaviors (e.g., malformed inputs).

AI, fuzzing, and “vibe-coded” proofs

  • Several commenters report success using LLMs to help write proofs and models (Lean, TLA+, etc.).
  • The AI-assisted fuzzing that found the Lean runtime overflow is seen as a significant, positive result.
  • Enthusiasm that combining AI, fuzzing, and formal methods can narrow where bugs hide, even if perfection remains impossible.

Title and framing debate

  • Many see the article title as misleading or clickbait, implying a flaw in the proof itself.
  • Others defend it as fair from an end-user perspective: if the binary crashes or is exploitable, the “verified program” is still buggy in practice.