GPT-5.2 derives a new result in theoretical physics

What GPT-5.2 Actually Contributed

  • Humans framed a specific scattering-amplitude problem, computed the low‑n base cases (which came out as very complicated expressions), and suspected a simpler closed form existed.
  • GPT‑5.2 (in an internal “scaffolded” setup) spent ~12 hours simplifying those expressions, spotting a simple pattern, conjecturing a formula valid for all n, and producing a formal proof; a toy sketch of this simplify-conjecture-verify loop follows the list.
  • Human physicists then checked the result and extended it into a full paper; GPT‑5.2 did not autonomously choose the problem or write the paper.
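As a toy illustration of that workflow (the post does not reproduce the actual amplitudes computation), here is a minimal sympy sketch of the simplify-conjecture-verify loop, with the sum-of-cubes identity standing in for the real problem:

```python
import sympy as sp

n, k = sp.symbols("n k", positive=True, integer=True)

# Toy stand-in for the workflow described above: the "complicated base
# cases" are the raw sums 1**3 + ... + m**3 for small m; the hoped-for
# result is a simpler closed form valid for all n.
base_cases = {m: sum(j**3 for j in range(1, m + 1)) for m in range(2, 7)}
# {2: 9, 3: 36, 4: 100, 5: 225, 6: 441}: every value is the square of
# a triangular number, which suggests the conjecture below.

conjecture = (n * (n + 1) / 2) ** 2

# Check the conjecture against every base case...
assert all(conjecture.subs(n, m) == v for m, v in base_cases.items())

# ...then verify it symbolically for all n (the analogue of the proof step).
closed_form = sp.summation(k**3, (k, 1, n))
assert sp.simplify(closed_form - conjecture) == 0
print("verified:", sp.factor(conjecture))
```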

Novelty, Validity, and Literature Concerns

  • Several commenters stress this is a preprint: theoretical-physics results often later get weakened, corrected, or quietly superseded.
  • Some worry it may just repackage known structures (e.g. the Parke–Taylor / MHV amplitude literature; the relevant formula is reproduced after this list for context) rather than produce something fundamentally new, though the authors explicitly cite that literature.
  • There is broader context from earlier “AI solved Erdős problems” claims, where some “novel” solutions turned out to already exist in the literature or to be minor variants of known results.
  • One physicist reading the paper finds the key generalized formula almost obvious once the n≤6 expressions are simplified, and suggests a computer algebra system (CAS) could plausibly have done the same.
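For reference, the Parke–Taylor result mentioned above is the canonical example of this kind of collapse: the tree-level maximally-helicity-violating (MHV) n-gluon amplitude, a priori a sum over a factorially growing set of Feynman diagrams, reduces (up to coupling and normalization conventions) to a single spinor-helicity ratio:

```latex
A_n^{\mathrm{MHV}}\bigl(1^+,\dots,i^-,\dots,j^-,\dots,n^+\bigr)
  \;\propto\;
  \frac{\langle i\,j\rangle^{4}}
       {\langle 1\,2\rangle \langle 2\,3\rangle \cdots \langle n\,1\rangle}
```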

Tool vs Collaborator: How to Attribute Credit

  • There is strong dispute over whether this is closer to “a calculator helped” or “a genuine co‑author contributed.”
  • Some argue GPT‑5.2 only surfaced a pattern that humans then verified, so the headline overstates its contribution.
  • Others say an agent that autonomously runs for hours, reorganizes the calculation, conjectures, and proves something the humans had failed to find merits serious research credit; hence the institutional OpenAI authorship on the paper.

Capabilities, Limits, and “New Ideas”

  • Many see this as exactly the sweet spot for LLMs: verifiable domains with test suites or formal checkers, where brute‑force structured exploration is valuable (a minimal generate-and-verify sketch follows this list).
  • Skeptics argue that so far LLMs mainly recombine existing ideas “in distribution” rather than producing paradigm‑shifting insights; defenders reply that most human advances are also recombinations.
  • Discussion spills into whether anything humans do is more than refined brute‑force search, and whether current models yet show evidence of genuine out‑of‑distribution creativity.
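A minimal sketch of what “brute-force structured exploration in a verifiable domain” can look like, assuming nothing about the actual setup: enumerate candidate closed forms over a small structured space and keep only those an exact checker accepts (here the “test suite” is a handful of precomputed values):

```python
import itertools
import sympy as sp

n = sp.symbols("n")

# The "test suite": exact values f(1)..f(6), as if produced by some
# expensive upstream computation (here f is the triangular numbers).
suite = {m: m * (m + 1) // 2 for m in range(1, 7)}

# Structured search space: quadratics with small rational coefficients.
grid = sorted({sp.Rational(a, b) for a in range(-2, 3) for b in (1, 2)})

for c2, c1, c0 in itertools.product(grid, repeat=3):
    candidate = c2 * n**2 + c1 * n + c0
    # The check is exact symbolic equality, so a surviving candidate
    # matches every test case identically, not just approximately.
    if all(candidate.subs(n, m) == v for m, v in suite.items()):
        print("found:", sp.factor(candidate))  # n*(n + 1)/2
        break
```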

Scaffolding, Long Runs, and Engineering Details

  • Commenters are curious how the 12‑hour run was orchestrated: likely multiple rounds of reasoning with context compaction (summarizing prior work into new prompts), possibly with parallel branches and verification loops; a hypothetical sketch of that loop shape follows the list.
  • Some users note current public “thinking” modes cut off around 30–60 minutes and require manual restarts; they want access to similar long‑horizon setups.
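Nobody in the thread knows the actual orchestration, so the following is purely a hypothetical sketch of the speculated loop shape; call_model, verify, and compact are invented stand-ins, not a real API:

```python
# Hypothetical stand-ins throughout; the shape of the loop is the point.

def call_model(prompt: str) -> str:
    """Placeholder for one LLM reasoning round."""
    raise NotImplementedError

def verify(attempt: str) -> bool:
    """Placeholder for the domain checker (a CAS identity test, a proof
    assistant, or a numerical test suite)."""
    raise NotImplementedError

def compact(history: list[str]) -> str:
    """Context compaction: condense all prior rounds into a short brief.
    In practice this would itself be a model call."""
    return call_model("Summarize the progress so far:\n" + "\n".join(history))

def long_run(task: str, max_rounds: int = 1000) -> str | None:
    """Alternate reasoning rounds with compaction until the verifier
    accepts or the budget runs out; parallel branches would fan out
    several such runs and pool any verified results."""
    history: list[str] = []
    prompt = task
    for _ in range(max_rounds):
        attempt = call_model(prompt)
        history.append(attempt)
        if verify(attempt):  # only a checked result ends the run
            return attempt
        # Re-prompt with the original task plus a compacted transcript,
        # keeping the context window bounded over a many-hour run.
        prompt = task + "\n\nProgress so far:\n" + compact(history)
    return None
```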

Perceived Significance for Physics

  • Domain commenters describe the result as a nontrivial but quite specialized simplification/generalization within an already well‑developed amplitudes program, not a headline‑level revolution.
  • Several emphasize that the hardest parts of physics are often choosing good questions, connecting theory to experiment, and spotting which abstruse results actually matter; these are tasks where LLMs remain unproven.

Hype, Marketing, and Societal Reactions

  • Many see the blog post as a carefully timed marketing piece (especially with an OpenAI employee on the author list), paralleling earlier overhyped AI “breakthroughs.”
  • Others push back on the growing instinct to dismiss every AI-assisted result, noting that comparable human‑only achievements would be uncontroversially respected.
  • There is extensive meta‑discussion about “moving the goalposts,” job anxiety, and the way AI success stories are being used in narratives about replacing knowledge workers.