“Erdős problem #728 was solved more or less autonomously by AI”
Scope and nature of the result
- Erdős problem #728 was solved via a pipeline combining:
  - An informal proof sketched in English.
  - Translation and refinement into Lean by Aristotle (a theorem-proving system).
  - Back-and-forth with a frontier LLM (ChatGPT 5.2) and automated proof search.
- The Lean statement of the theorem was written by humans; the long formal proof (≈1400 lines) was machine-generated and checked by Lean’s small, trusted kernel.
- Autonomy is debated: some see this as “90/10” AI/human on the proof itself; others stress the human role in posing the problem, checking the statement, iterating prompts, and interpreting gaps.
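To make the division of labor concrete, here is a toy Lean 4 sketch (illustrative only, not the actual #728 statement, which is far more involved). The human-authored part is the `theorem` line; the proof body after `by` is the kind of thing a system like Aristotle generates and Lean's kernel then checks.

```lean
-- Toy illustration: the statement is human-written;
-- the proof script is what the machine searches for.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity
```

The trust story rests on the statement being right: the kernel verifies the proof term, but it cannot know whether the statement encodes the problem the humans meant.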
What Aristotle / the AI stack actually is
- Aristotle integrates:
  - A Lean proof search system guided by a large transformer (policy/value model).
  - An informal reasoning component that generates and formalizes lemmas.
  - A geometry solver inspired by AlphaGeometry.
- Thread participants argue over terminology:
  - One side: this is “just” LLMs plus tools (all major stages use large language models).
  - Other side: calling it “an LLM” is misleading; it’s a task-specific neuro‑symbolic system trained on formal math, with custom losses and search, more like AlphaFold than a chat model.
- Consensus: whatever the label, the key advance is tight coupling of powerful models with formal verification.
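The “proof search guided by a policy/value model” idea can be sketched abstractly. Everything below is illustrative, not Aristotle’s actual design: a best-first search over proof states, where a learned policy proposes candidate tactics and a value model decides which open states to expand first.

```python
import heapq

def best_first_proof_search(initial_state, policy, value, apply_tactic,
                            is_proved, budget=1000):
    """Illustrative best-first search over proof states (hypothetical API).

    policy(state)             -> list of candidate tactics (from a learned model)
    value(state)              -> float, estimated provability (higher = better)
    apply_tactic(state, t)    -> successor state, or None if the tactic fails
    is_proved(state)          -> True when no goals remain
    """
    # Max-heap via negated scores; a counter breaks ties so heapq
    # never needs to compare states directly.
    frontier = [(-value(initial_state), 0, initial_state)]
    seen = {initial_state}
    counter = 1
    while frontier and counter < budget:
        _, _, state = heapq.heappop(frontier)
        if is_proved(state):
            return state
        for tactic in policy(state):
            nxt = apply_tactic(state, tactic)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-value(nxt), counter, nxt))
                counter += 1
    return None  # budget exhausted or search space closed
```

The design point this is meant to show: the formal checker makes failed branches cheap and safe, so the learned components only need to be good at *proposing*, not at being right.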
Significance for mathematics and theorem proving
- Seen as a clear capability jump in:
  - Rapid refactoring and rewriting of arguments.
  - Turning reasonably correct informal proofs into fully formal Lean proofs.
  - Systematic reuse and “remixing” of existing methods at scale.
- Some compare this to earlier hammer-style tools such as Isabelle’s Sledgehammer and Coq’s CoqHammer, but note the new systems are far more powerful and general.
- Many expect accelerating formalization of math (mathlib, Erdős problems, FLT, IMO problems) and eventual large‑scale automated checking of the literature.
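For context on the hammer comparison: automation for routine goals already exists inside Lean itself, via tactics like `simp` and `omega`. A toy sketch of the kind of goal that needs no learned model at all:

```lean
-- Routine goals that existing Lean automation dispatches directly:
example (n : ℕ) : n + 0 = n := by simp
example (a b : ℤ) (h : a + 2 * b = 5) (h2 : b = 1) : a = 3 := by omega
```

The claimed advance is in the gap between goals like these and ≈1400-line research-level proofs.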
Verification, formalization, and trust
- Lean eliminates many LLM failure modes: if the formal statement captures the intended claim and Lean’s kernel accepts the proof, the theorem is proved.
- The remaining hard part is formalizing the intended statement:
  - Natural-language problems can be subtly misencoded.
  - Participants emphasize human review of the Lean statement and definitions, especially in tricky areas (analysis, probability, topology).
- Some point out that this “specification gap” already exists in human proofs and software verification.
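A toy Lean illustration of how the specification gap bites (illustrative only): an accidentally impossible hypothesis makes any conclusion provable. The kernel happily verifies the proof, but the statement asserts nothing, because no natural number is below zero.

```lean
-- Vacuously "provable": the hypothesis h can never hold,
-- so the theorem is accepted yet says nothing.
theorem vacuous (n : ℕ) (h : n < 0) : n = 42 := by omega
```

This is why participants stress human review of statements and definitions, not just of proofs.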
AGI vs “clever tools”
- Several commenters insist this is still narrow AI:
  - Works in a domain with automatic checking and strong structure.
  - Requires expert guidance and extensive scaffolding.
- Others see it as an early herald of AGI or “artificial general cleverness”:
  - The same models that do code, language, and now frontier math may generalize further.
- Fears and hopes about knowledge-work automation and career impacts are expressed, but no consensus emerges.
Practitioner experiences and limits
- Researchers report:
  - Strong value for literature search, routine math, and explaining known tools.
  - Much weaker performance on bleeding-edge questions in fields like quantum computing or specialized algorithms.
- AI is often more like a fast, fallible collaborator or “rubber duck” than an independent discoverer.
- A recurring theme: AI excels at producing the 2nd, 3rd, 10th version of an idea once a human has a first draft or hunch.