GPT-5.2 derives a new result in theoretical physics

What GPT-5.2 Actually Contributed

  • Humans framed a specific scattering-amplitude problem, computed the low‑n base cases (which came out as very complicated expressions), and suspected a simpler closed form existed.
  • GPT‑5.2 (in an internal “scaffolded” setup) spent ~12 hours simplifying those expressions, spotting a simple pattern, conjecturing a formula valid for all n, and producing a formal proof; a toy sketch of this simplify-conjecture-verify loop follows the list.
  • Human physicists then checked the result and extended it into a full paper; GPT‑5.2 did not autonomously choose the problem or write the paper.
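As a toy illustration of that workflow (the post does not reproduce the actual amplitudes computation), here is a minimal sympy sketch of the simplify-conjecture-verify loop, with the sum-of-cubes identity standing in for the real problem:

```python
import sympy as sp

n, k = sp.symbols("n k", positive=True, integer=True)

# Toy stand-in for the workflow described above: the "complicated base
# cases" are the raw sums 1**3 + ... + m**3 for small m; the hoped-for
# result is a simpler closed form valid for all n.
base_cases = {m: sum(j**3 for j in range(1, m + 1)) for m in range(2, 7)}
# {2: 9, 3: 36, 4: 100, 5: 225, 6: 441}: every value is the square of
# a triangular number, which suggests the conjecture below.

conjecture = (n * (n + 1) / 2) ** 2

# Check the conjecture against every base case...
assert all(conjecture.subs(n, m) == v for m, v in base_cases.items())

# ...then verify it symbolically for all n (the analogue of the proof step).
closed_form = sp.summation(k**3, (k, 1, n))
assert sp.simplify(closed_form - conjecture) == 0
print("verified:", sp.factor(conjecture))
```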

Novelty, Validity, and Literature Concerns

  • Several commenters stress this is a preprint: theoretical-physics results often later get weakened, corrected, or quietly superseded.
  • Some worry it may just repackage known structures (e.g. the Parke–Taylor / MHV amplitude literature; the relevant formula is reproduced after this list for context) rather than produce something fundamentally new, though the authors explicitly cite that literature.
  • There is broader context from earlier “AI solved Erdős problems” claims, where some “novel” solutions turned out to already exist in the literature or to be minor variants of known results.
  • One physicist reading the paper finds the key generalized formula almost obvious once the n≤6 expressions are simplified, and suggests a computer algebra system (CAS) could plausibly have done the same.
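For reference, the Parke–Taylor result mentioned above is the canonical example of this kind of collapse: the tree-level maximally-helicity-violating (MHV) n-gluon amplitude, a priori a sum over a factorially growing set of Feynman diagrams, reduces (up to coupling and normalization conventions) to a single spinor-helicity ratio:

```latex
A_n^{\mathrm{MHV}}\bigl(1^+,\dots,i^-,\dots,j^-,\dots,n^+\bigr)
  \;\propto\;
  \frac{\langle i\,j\rangle^{4}}
       {\langle 1\,2\rangle \langle 2\,3\rangle \cdots \langle n\,1\rangle}
```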

Tool vs Collaborator: How to Attribute Credit

  • There is strong dispute over whether this is closer to “a calculator helped” or “a genuine co‑author contributed.”
  • Some argue GPT‑5.2 only surfaced a pattern that humans then verified, so the headline overstates its contribution.
  • Others say an agent that autonomously runs for hours, reorganizes the calculation, conjectures, and proves something the humans had failed to find merits serious research credit; hence the institutional OpenAI authorship on the paper.

Capabilities, Limits, and “New Ideas”

  • Many see this as exactly the sweet spot for LLMs: verifiable domains with test suites or formal checkers, where brute‑force structured exploration is valuable (a minimal generate-and-verify sketch follows this list).
  • Skeptics argue that so far LLMs mainly recombine existing ideas “in distribution” rather than producing paradigm‑shifting insights; defenders reply that most human advances are also recombinations.
  • Discussion spills into whether anything humans do is more than refined brute‑force search, and whether current models yet show evidence of genuine out‑of‑distribution creativity.
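A minimal sketch of what “brute-force structured exploration in a verifiable domain” can look like, assuming nothing about the actual setup: enumerate candidate closed forms over a small structured space and keep only those an exact checker accepts (here the “test suite” is a handful of precomputed values):

```python
import itertools
import sympy as sp

n = sp.symbols("n")

# The "test suite": exact values f(1)..f(6), as if produced by some
# expensive upstream computation (here f is the triangular numbers).
suite = {m: m * (m + 1) // 2 for m in range(1, 7)}

# Structured search space: quadratics with small rational coefficients.
grid = sorted({sp.Rational(a, b) for a in range(-2, 3) for b in (1, 2)})

for c2, c1, c0 in itertools.product(grid, repeat=3):
    candidate = c2 * n**2 + c1 * n + c0
    # The check is exact symbolic equality, so a surviving candidate
    # matches every test case identically, not just approximately.
    if all(candidate.subs(n, m) == v for m, v in suite.items()):
        print("found:", sp.factor(candidate))  # n*(n + 1)/2
        break
```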

Scaffolding, Long Runs, and Engineering Details

  • Commenters are curious how the 12‑hour run was orchestrated: likely multiple rounds of reasoning with context compaction (summarizing prior work into new prompts), possibly with parallel branches and verification loops; a hypothetical sketch of that loop shape follows the list.
  • Some users note current public “thinking” modes cut off around 30–60 minutes and require manual restarts; they want access to similar long‑horizon setups.
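Nobody in the thread knows the actual orchestration, so the following is purely a hypothetical sketch of the speculated loop shape; call_model, verify, and compact are invented stand-ins, not a real API:

```python
# Hypothetical stand-ins throughout; the shape of the loop is the point.

def call_model(prompt: str) -> str:
    """Placeholder for one LLM reasoning round."""
    raise NotImplementedError

def verify(attempt: str) -> bool:
    """Placeholder for the domain checker (a CAS identity test, a proof
    assistant, or a numerical test suite)."""
    raise NotImplementedError

def compact(history: list[str]) -> str:
    """Context compaction: condense all prior rounds into a short brief.
    In practice this would itself be a model call."""
    return call_model("Summarize the progress so far:\n" + "\n".join(history))

def long_run(task: str, max_rounds: int = 1000) -> str | None:
    """Alternate reasoning rounds with compaction until the verifier
    accepts or the budget runs out; parallel branches would fan out
    several such runs and pool any verified results."""
    history: list[str] = []
    prompt = task
    for _ in range(max_rounds):
        attempt = call_model(prompt)
        history.append(attempt)
        if verify(attempt):  # only a checked result ends the run
            return attempt
        # Re-prompt with the original task plus a compacted transcript,
        # keeping the context window bounded over a many-hour run.
        prompt = task + "\n\nProgress so far:\n" + compact(history)
    return None
```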

Perceived Significance for Physics

  • Domain commenters describe the result as a nontrivial but quite specialized simplification/generalization within an already well‑developed amplitudes program, not a headline‑level revolution.
  • Several emphasize that the hardest parts of physics are often choosing good questions, connecting theory to experiment, and spotting which abstruse results actually matter; these are tasks where LLMs remain unproven.

Hype, Marketing, and Societal Reactions

  • Many see the blog post as a carefully timed marketing piece (especially with an OpenAI employee on the author list), paralleling earlier overhyped AI “breakthroughs.”
  • Others push back on the growing instinct to dismiss every AI-assisted result, noting that comparable human‑only achievements would be uncontroversially respected.
  • There is extensive meta‑discussion about “moving the goalposts,” job anxiety, and the way AI success stories are being used in narratives about replacing knowledge workers.