Erdős 281 solved with ChatGPT 5.2 Pro
Status of the Erdős 281 Result
- An LLM (ChatGPT 5.2 Pro) produced a proof of Erdős problem 281 in a single long reasoning run (~41 minutes) from a one-shot prompt.
- A leading mathematician checked the proof and judged it correct, and notably free of the subtle errors (in limits and quantifiers) that often undermine LLM proofs; it was initially classified as a clear AI-origin result.
- Later, it was discovered that the result already follows from older work via known theorems; the problem was reclassified as “AI solution to a problem with prior literature.”
Novelty vs. Memorization / Training Data
- Some argue the proof could amount to retrieval of material from the training data rather than genuine reasoning; others note that the method appears to differ from the proof in the literature.
- There is skepticism that one can really know what was in the training set, especially for closed models.
- Another model (DeepSeek) also produced a proof, and a third model claimed the two proofs were equivalent. Commenters caution that LLM “peer review” is not rigorous and that tiny errors can invalidate a proof.
- A separate discussion points out a prior route via an older theorem and a proof in Erdős’s own work, raising questions about how much novelty this represents.
Erdős Problems as a Benchmark
- Erdős problems span a huge difficulty range: some are extremely hard, while others are under-explored “long-tail” questions or outright low-hanging fruit.
- They’re seen as a good AI benchmark: nontrivial, crisply stated, and with a curated list and wiki tracking AI contributions.
Impact on Mathematics Practice
- Several see real value in using LLMs to:
- Generate candidate proofs and strategies for formalization in systems like Lean (see the sketch after this list).
- Accelerate literature search and uncover obscure results.
- Systematically clear “easy” but neglected problems and map what’s genuinely hard.
- Others question the benefit if proofs are machine-verified and ticked off but not actually digested by humans.
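One concrete version of the Lean workflow mentioned above: state the claim in Lean 4 and let the kernel check a pasted-in candidate proof, so that trust rests on the proof checker rather than on the model. The sketch below is a minimal, hypothetical illustration; the theorem is a deliberately trivial placeholder (commutativity of addition on `Nat`), not Erdős 281, and needs only core Lean 4, no Mathlib.

```lean
-- Minimal sketch of the "formalize, then machine-check" loop.
-- The theorem is a trivial stand-in for an LLM-suggested claim;
-- the point is that the Lean kernel, not the model, certifies it.
theorem candidate_add_comm (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => rw [Nat.add_zero, Nat.zero_add]        -- base case: m + 0 = 0 + m
  | succ n ih => rw [Nat.add_succ, Nat.succ_add, ih] -- step: push succ out, apply IH
```

If a pasted-in proof has even a small gap, Lean rejects the file outright, which is exactly the guarantee that informal LLM “peer review” lacks.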
AI Capability, Hype, and Coding Analogies
- Some view this as evidence that LLMs are becoming strong at “logic work” and will outpace humans in code and math, with holdouts “using them wrong.”
- Skeptics counter with everyday failures (buggy code, hallucinations) and see claims of imminent developer replacement or AGI as hype.
- A middle view: those who don’t learn to use these tools will be replaced by those who do, but the tools themselves won’t replace most experts yet.
Intelligence vs. Pattern Matching
- A large subthread debates whether LLMs are “just pattern matchers” or genuinely intelligent systems with internal world models.
- Some argue that even if this is “just” high-dimensional pattern matching, a large part of human intelligence may amount to the same thing.
- Others emphasize that LLMs lack common sense, judgment, and conscious understanding, characterizing them as powerful but alien reasoning systems.
Attribution, Ethics, and Pure Math Value
- There is speculation that some professionals may already be using LLM assistance without attribution; norms are unclear (acknowledgments vs. co-authorship vs. silence).
- A few question the importance of such pure-math results at all, suggesting many Erdős-type problems are intellectually recreational; others defend pure math as historically and potentially practically valuable.