Erdős 281 solved with ChatGPT 5.2 Pro
Status of the Erdős 281 Result
- An LLM (ChatGPT 5.2 Pro) produced a proof of Erdős problem 281 in a single long reasoning run (~41 minutes) from a one-shot prompt.
- A leading mathematician checked the proof and judged it correct, and notably free of the subtle errors (in limits and quantifiers) that often undermine LLM proofs; it was initially classified as a clear AI-origin result.
- Later, it was discovered that the result already follows from older work via known theorems; the problem was reclassified as “AI solution to a problem with prior literature.”
Novelty vs. Memorization / Training Data
- Some argue the proof could amount to retrieval of material from the training data rather than genuine reasoning; others note that the method appears to differ from the proof in the literature.
- There is skepticism that one can really know what was in the training set, especially for closed models.
- Another model (DeepSeek) also produced a proof, and a third model claimed the two proofs were equivalent. Commenters caution that LLM “peer review” is not rigorous and that tiny errors can invalidate a proof.
- A separate discussion points out a prior route via an older theorem and a proof in Erdős’s own work, raising questions about how much novelty this represents.
Erdős Problems as a Benchmark
- Erdős problems span a huge difficulty range: some are extremely hard, while others are under-explored “long-tail” questions or outright low-hanging fruit.
- They’re seen as a good AI benchmark: nontrivial, crisply stated, and with a curated list and wiki tracking AI contributions.
Impact on Mathematics Practice
- Several see real value in using LLMs to:
- Generate candidate proofs and strategies for formalization in systems like Lean (see the sketch after this list).
- Accelerate literature search and uncover obscure results.
- Systematically clear “easy” but neglected problems and map what’s genuinely hard.
- Others question the benefit if proofs are machine-verified and ticked off but not actually digested by humans.
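One concrete version of the Lean workflow mentioned above: state the claim in Lean 4 and let the kernel check a pasted-in candidate proof, so that trust rests on the proof checker rather than on the model. The sketch below is a minimal, hypothetical illustration; the theorem is a deliberately trivial placeholder (commutativity of addition on `Nat`), not Erdős 281, and needs only core Lean 4, no Mathlib.

```lean
-- Minimal sketch of the "formalize, then machine-check" loop.
-- The theorem is a trivial stand-in for an LLM-suggested claim;
-- the point is that the Lean kernel, not the model, certifies it.
theorem candidate_add_comm (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => rw [Nat.add_zero, Nat.zero_add]        -- base case: m + 0 = 0 + m
  | succ n ih => rw [Nat.add_succ, Nat.succ_add, ih] -- step: push succ out, apply IH
```

If a pasted-in proof has even a small gap, Lean rejects the file outright, which is exactly the guarantee that informal LLM “peer review” lacks.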
AI Capability, Hype, and Coding Analogies
- Some view this as evidence that LLMs are becoming strong at “logic work” and will outpace humans in code and math, with holdouts “using them wrong.”
- Skeptics counter with everyday failures (buggy code, hallucinations) and see claims of imminent developer replacement or AGI as hype.
- A middle view: those who don’t learn to use these tools will be replaced by those who do, but the tools themselves won’t replace most experts yet.
Intelligence vs. Pattern Matching
- A large subthread debates whether LLMs are “just pattern matchers” or genuinely intelligent systems with internal world models.
- Some argue that even if this is “just” high-dimensional pattern matching, a large part of human intelligence may amount to the same thing.
- Others emphasize that LLMs lack common sense, judgment, and conscious understanding, characterizing them as powerful but alien reasoning systems.
Attribution, Ethics, and Pure Math Value
- There is speculation that some professionals may already be using LLM assistance without attribution; norms are unclear (acknowledgments vs. co-authorship vs. silence).
- A few question the importance of such pure-math results at all, suggesting many Erdős-type problems are intellectually recreational; others defend pure math as historically and potentially practically valuable.