OpenAI researcher announced GPT-5 math breakthrough that never happened

What Actually Happened with the “GPT-5 Math Breakthrough”

  • GPT‑5 was used to query a community Erdős problem database; it surfaced existing published solutions to problems still marked “open” there.
  • The original researchers framed this as “superhuman literature search.”
  • A senior OpenAI exec then amplified it as “GPT‑5 just found solutions to 10 previously unsolved Erdős problems,” which many read as “novel solutions to unsolved problems.”
  • Mathematicians pointed out that the problems had been solved years earlier and that the site’s “open” status reflected the maintainer’s knowledge lag, not the problems’ actual status.
  • The OpenAI exec later retracted the claim, calling it a misunderstanding; some commenters see this as an honest mistake, others as part of a pattern of overclaiming.

Hype, Trust, and OpenAI Culture

  • Many argue this incident illustrates an institutional bias toward sensational claims (“science revolution,” “AGI achieved internally”), weak internal verification, and marketing-driven communication.
  • Others say the pile-on is disproportionate to the actual error and driven by generalized anti‑OpenAI sentiment.
  • Several note similar miscrediting episodes at other labs (e.g., AI “discovering” math or algorithms that already exist in the literature).

Hallucinations, Human Error, and Responsibility

  • The thread plays on the irony of “humans hallucinating about AI”: people at OpenAI believing their own hype and misreading ambiguous tweets.
  • Debate over whether this is best seen as hallucination, negligence, or lying; Hanlon’s razor is invoked, but corporate incentives are emphasized, via Upton Sinclair’s line that it is difficult to get a man to understand something when his salary depends on his not understanding it.
  • Many stress that extraordinary mathematical claims should face extraordinary internal scrutiny before going public.

What LLMs Are Actually Good At in Math & Research

  • Strong consensus that LLMs are currently poor at genuinely novel math or complex reasoning without heavy tool support.
  • Some describe GPT‑5‑style models as excellent semantic search / literature assistants (a minimal retrieval sketch follows this list):
    • Good at surfacing obscure or cross‑field papers and building reading lists.
    • Bad at reliably summarizing or evaluating the literature; hallucinated citations remain common (a simple existence check against a citation index, also sketched below, catches many of these).
  • Others say even as search helpers they’re “highly convincing counterfeits” and too error‑prone for serious work, especially with older or niche technical material.
  • Several suggest the real frontier value is better semantic search and citation graph tooling, not “AI solves open problems.”
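
To make the “semantic search assistant” idea concrete, here is a minimal sketch of embedding‑based retrieval over paper abstracts. It assumes the sentence-transformers package; the model name, toy corpus, and search() helper are illustrative stand‑ins, not anything described in the thread.

```python
# Minimal sketch of semantic literature search, assuming sentence-transformers.
# The model choice and corpus are placeholders for illustration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

# Hypothetical corpus of (title, abstract) pairs pulled from a paper index.
papers = [
    ("On a conjecture of Erdős", "We resolve a question of Erdős on additive bases ..."),
    ("Sum-free sets revisited", "We bound the maximum density of sum-free subsets ..."),
    ("Notes on graph colouring", "A survey of chromatic number results for sparse graphs ..."),
]
abstract_embeddings = model.encode(
    [abstract for _, abstract in papers], convert_to_tensor=True
)

def search(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank papers by cosine similarity between the query and each abstract."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, abstract_embeddings, top_k=top_k)[0]
    return [(papers[hit["corpus_id"]][0], round(float(hit["score"]), 3)) for hit in hits]

# A query phrased nothing like any title can still land on the right paper,
# which is the appeal over plain keyword search.
print(search("Has anyone settled the Erdős question about sum-free sets?"))
```

The same pattern scales to millions of abstracts with an approximate‑nearest‑neighbor index; the ranking, not the generation, is doing the useful work.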
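
On the citation side, here is a hedged sketch of the kind of “does this reference actually exist?” check that would catch many hallucinated citations, querying Crossref’s public works endpoint. The similarity threshold and title‑matching heuristic are assumptions for illustration.

```python
# Sketch: flag possibly hallucinated citations by checking whether a title
# resolves against the public Crossref index. The threshold is an assumption.
import difflib

import requests

CROSSREF_WORKS = "https://api.crossref.org/works"

def citation_exists(title: str, threshold: float = 0.9) -> bool:
    """Return True if Crossref's best match for `title` is near-identical to it."""
    resp = requests.get(
        CROSSREF_WORKS,
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return False
    best_title = " ".join(items[0].get("title", [""])).lower()
    return difflib.SequenceMatcher(None, title.lower(), best_title).ratio() >= threshold

# A fabricated reference typically returns no item, or a best match whose
# title similarity falls far below the threshold.
print(citation_exists("On the Evolution of Random Graphs"))  # a real Erdős–Rényi paper
```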

Broader Reflections: AGI, Bubble, and Pivot to Slop

  • Many see this as one more data point that we are far from AGI and that LLM “reasoning” progress has slowed; claims of near‑term super‑intelligence are seen as hype.
  • Some fear an AI investment bubble whose collapse could damage broader tech and even the economy; others think impact would be closer to a contained sector correction.
  • OpenAI’s recent pivots to ads, in‑chat commerce, and adult content are read by some as evidence of “enshittification” and of desperation to monetize, rather than of serious commitment to research.