“Erdős problem #728 was solved more or less autonomously by AI”

Scope and nature of the result

  • Erdős problem #728 was solved via a pipeline combining:
    • An informal proof sketched in English.
    • Translation and refinement into Lean by Aristotle (a theorem-proving system).
    • Back-and-forth with a frontier LLM (ChatGPT 5.2) and automated proof search.
  • The Lean statement of the theorem was written by humans; the long formal proof (≈1400 lines) was machine-generated and checked by Lean’s small, trusted kernel.
  • Autonomy is debated: some see this as “90/10” AI/human on the proof itself; others stress the human role in posing the problem, checking the statement, iterating prompts, and interpreting gaps.
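
The division of labor above can be illustrated with a deliberately trivial Lean sketch (a toy example, not the actual Erdős #728 statement): a human writes the theorem statement, a search system produces the proof script, and Lean's kernel independently checks the result.

```lean
-- Toy illustration of the workflow (NOT the Erdős #728 theorem):
-- the statement is human-authored; the proof after `:= by` is what
-- a system like Aristotle would generate; the kernel checks it.
theorem add_comm_nat (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

In the real case the statement was a few lines and the generated proof ran to roughly 1400 lines, but the trust story is the same: only the statement and the kernel need human scrutiny.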

What Aristotle / the AI stack actually is

  • Aristotle integrates:
    • A Lean proof search system guided by a large transformer (policy/value model).
    • An informal reasoning component that generates and formalizes lemmas.
    • A geometry solver inspired by AlphaGeometry.
  • Thread participants argue over terminology:
    • One side: this is “just” LLMs plus tools (all major stages use large language models).
    • Other side: calling it “an LLM” is misleading; it’s a task-specific neuro‑symbolic system trained on formal math, with custom losses and search, more like AlphaFold than a chat model.
  • Consensus: whatever the label, the key advance is tight coupling of powerful models with formal verification.

Significance for mathematics and theorem proving

  • Seen as a clear capability jump in:
    • Rapid refactoring and rewriting of arguments.
    • Turning reasonably correct informal proofs into fully formal Lean proofs.
    • Systematic reuse and “remixing” of existing methods at scale.
  • Some compare this to earlier “hammer” tools (Isabelle’s Sledgehammer, Coq’s CoqHammer), but note the new systems are far more powerful and general.
  • Many expect accelerating formalization of math (mathlib, Erdős problems, FLT, IMO problems) and eventual large‑scale automated checking of the literature.
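
For context on the hammer comparison, existing Lean automation already dispatches routine goals by searching the lemma library; a minimal sketch:

```lean
-- Hammer-style automation (toy example): the user states the goal
-- and a search tactic closes it from library lemmas such as
-- List.append_nil, with no manual proof steps.
example (l : List Nat) : (l ++ []).length = l.length := by
  simp
```

The claimed advance is that the new systems handle multi-page arguments, not just such one-line library lookups.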

Verification, formalization, and trust

  • Lean eliminates many LLM failure modes: if the formal statement matches the intended claim and Lean’s kernel accepts the proof, the theorem is proved, regardless of how the proof was generated.
  • The remaining hard part is formalizing the intended statement:
    • Natural-language problems can be subtly misencoded.
    • Participants emphasize human review of the Lean statement and definitions, especially in tricky areas (analysis, probability, topology).
  • Some point out that this “specification gap” already exists in human proofs and software verification.
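
The specification gap can be made concrete with a classic misformalization pattern (a hypothetical toy example): an unsatisfiable hypothesis makes the statement vacuously true, so Lean happily accepts a “proof” that establishes nothing about the intended claim.

```lean
-- A subtly wrong formalization: no natural number satisfies n < 0,
-- so this "theorem" is vacuously true. Lean accepts it, but it says
-- nothing about whether n * n = 42 ever actually holds.
theorem misstated (n : Nat) (h : n < 0) : n * n = 42 := by
  exact absurd h (Nat.not_lt_zero n)
```

This is why thread participants stress human review of the statement and definitions: the kernel certifies the proof, not the faithfulness of the formalization.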

AGI vs “clever tools”

  • Several commenters insist this is still narrow AI:
    • Works in a domain with automatic checking and strong structure.
    • Requires expert guidance and extensive scaffolding.
  • Others see it as an early herald of AGI or “artificial general cleverness”:
    • The same models that do code, language, and now frontier math may generalize further.
    • Fears and hopes about knowledge-work automation and career impacts are expressed, but no consensus emerges.

Practitioner experiences and limits

  • Researchers report:
    • Strong value for literature search, routine math, and explaining known tools.
    • Much weaker performance on bleeding-edge questions in fields like quantum computing or specialized algorithms.
    • AI is often more like a fast, fallible collaborator or “rubber duck” than an independent discoverer.
  • A recurring theme: AI excels at producing the 2nd, 3rd, 10th version of an idea once a human has a first draft or hunch.