A recent experience with ChatGPT 5.5 Pro

Perceived capabilities of ChatGPT 5.5 Pro in math

  • Seen solving nontrivial combinatorial problems and producing proofs that experts regard as publishable.
  • Particularly strong on discrete/combinatorial tasks; weaker and more error‑prone in analysis and some conceptual areas.
  • Some commenters say they can now routinely offload “weeks of pondering” to the model in under an hour, at least for well‑posed subproblems.
  • A subset of participants describe this as effectively “AGI” (or close enough for practical purposes); others call it powerful but narrow, with “jagged” intelligence and embarrassing failures on trivial questions.

Impact on mathematical research and PhD training

  • Concern: “gentle” starter problems for PhD students may now be solvable by LLMs, raising the minimum difficulty of problems that are within human reach yet beyond what AI can already solve.
  • Many argue that actually solving hard problems is critical for developing deep understanding and the ability to use AI well.
  • Worry that human recognition and “immortality” from proving theorems may erode if machines do most of the technical and even conceptual work.
  • Counterview: if AI increases the quantity and quality of mathematical results, that may be a net positive; value could shift from scarcity of ideas to their utility and interpretation.

Education and assessment

  • Undergraduate math foundations are still seen as important; AI doesn’t remove the need to understand calculus or proofs.
  • Graduate coursework and programming education are heavily disrupted: take‑home assignments are now trivial with LLMs.
  • Instructors report students “buying good grades” via paid models; in response, they are moving toward proctored in‑room or paper exams and toward code‑reading/debugging questions rather than pure coding.

Use patterns: assistant vs replacement

  • Many report strong success using LLMs as ultra‑fast “students” or junior colleagues: great at error‑spotting, ideation, literature navigation, and routine coding.
  • Equally common: stories of confident but conceptually wrong arguments, hallucinated connections, and brittle reasoning outside familiar areas.
  • Effective use requires domain expertise, clear expectations for the answer’s “shape,” and heavy emphasis on critical, negative feedback rather than blind trust.

Verification, “AI slop,” and publication norms

  • Broad agreement that AI‑generated work must be critically checked before sharing, especially with unsuspecting colleagues; “zero‑thought” AI output is seen as rude and fatiguing.
  • Worry about overwhelming journals and experts with unverified AI proofs.
  • Some propose a dedicated repository for AI‑produced mathematics, separate from venues that currently disallow AI‑written content, with humans certifying correctness and relevance.

Access, cost, and inequality

  • Researchers in less‑wealthy regions describe being unable to fund top‑tier subscriptions under existing grant and procurement rules.
  • Fear that well‑funded institutions will gain a major productivity edge as “frontier” models become core research infrastructure.
  • Others respond that, relative to salaries and other university expenses, AI tool costs are small and should be budgeted; disagreement remains on feasibility given bureaucracy and local wages.

Creativity, novelty, and the “next‑token” debate

  • Some insist LLMs merely recombine existing mathematics; others note that much human math is exactly that, and see no sharp line.
  • Debate over whether RL and long‑chain reasoning allow genuine “new ideas” or just better search within the training distribution.
  • Comparisons with compilers, calculators, and race cars: is steering a powerful tool still a major human achievement, or does most of the credit belong to the tool’s creators?

Broader labor and societal concerns

  • Graduate students express sadness that their work may no longer feel unique or enduring; advice offered to focus on understanding and enjoyment over glory.
  • Discussion of “bullshit jobs”: many workers see strong incentives to use LLMs to save time even if it erodes their own skills, because colleagues will do so and employers may reward speed over depth.
  • Some foresee a shift where human value lies less in producing first‑order work and more in judgment, verification, taste, and choosing which problems are worth solving.