Erdős 281 solved with ChatGPT 5.2 Pro

Status of the Erdős 281 Result

  • An LLM (ChatGPT 5.2 Pro) produced a proof of Erdős problem 281 in a single long reasoning run (~41 minutes) from a one-shot prompt.
  • A leading mathematician checked the proof and judged it correct, noting it was free of the subtle errors (mishandled limits, misplaced quantifiers) that often undermine such arguments, and initially classified it as a clear AI-origin result.
  • Later, it was discovered that the result already follows from older work via known theorems; the problem was reclassified as “AI solution to a problem with prior literature.”

Novelty vs. Memorization / Training Data

  • Some argue the result could amount to LLM-style retrieval of material seen during training; others note that the proof’s method appears to differ from the one in the literature.
  • There is skepticism that one can really know what was in the training set, especially for closed models.
  • Another model (DeepSeek) also produced a proof, and a third model claimed the two proofs were equivalent; commenters caution that LLM “peer review” is not rigorous and that tiny errors can invalidate a proof.
  • A separate discussion points out a prior route via an older theorem and a proof in Erdős’s own work, raising questions about how much novelty this represents.

Erdős Problems as a Benchmark

  • Erdős problems span a huge difficulty range: some are extremely hard, while others sit in a long tail of under-explored problems or outright low-hanging fruit.
  • They’re seen as a good AI benchmark: nontrivial, crisply stated, and backed by a curated list and a wiki that tracks AI contributions.

Impact on Mathematics Practice

  • Several see real value in using LLMs to:
    • Generate candidate proofs and strategies for formalization in systems like Lean (a minimal sketch follows this list).
    • Accelerate literature search and uncover obscure results.
    • Systematically clear “easy” but neglected problems and map what’s genuinely hard.
  • Others question the benefit if proofs are machine-verified and ticked off but not actually digested by humans.
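
  A minimal Lean 4 sketch of what that formalization step looks like, illustrative only: the toy theorem below proves a core-library fact (Nat.add_comm) and has nothing to do with Erdős problem 281.

      -- A candidate statement is transcribed as a `theorem`; Lean's kernel then
      -- mechanically checks the supplied proof term, so an accepted proof cannot
      -- hide the subtle quantifier or limit errors that informal review might miss.
      theorem add_comm_example (a b : Nat) : a + b = b + a :=
        Nat.add_comm a b  -- `Nat.add_comm` ships with core Lean 4; no imports needed

  The division of labor this enables is the one described above: an LLM drafts the statement and a candidate proof, and the proof checker supplies the rigor.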

AI Capability, Hype, and Coding Analogies

  • Some view this as evidence that LLMs are becoming strong at “logic work” and will outpace humans in code and math, dismissing holdouts as “using them wrong.”
  • Skeptics counter with everyday failures (buggy code, hallucinations) and see claims of imminent developer replacement or AGI as hype.
  • A middle view: those who don’t learn to use these tools will be replaced by those who do, but the tools themselves won’t replace most experts yet.

Intelligence vs. Pattern Matching

  • A large subthread debates whether LLMs are “just pattern matchers” or genuinely intelligent systems with internal world models.
  • Some argue that even if this is “just” high-dimensional pattern matching, a large part of human intelligence may be essentially the same thing.
  • Others emphasize that LLMs lack common sense, judgment, and conscious understanding, characterizing them as powerful but alien reasoning systems.

Attribution, Ethics, and Pure Math Value

  • There is speculation that some professionals may already be using LLM assistance without attribution; norms are unclear (acknowledgments vs. co-authorship vs. silence).
  • A few question the importance of such pure-math results at all, suggesting many Erdős-type problems are intellectually recreational; others defend pure math as valuable both historically and, potentially, practically.