2026-05-09

A recent experience with ChatGPT 5.5 Pro

Perceived capabilities of ChatGPT 5.5 Pro in math

Commenters report seeing it solve nontrivial combinatorial problems and produce proofs that experts regard as publishable.
Particularly strong on discrete/combinatorial tasks; weaker and more error‑prone in analysis and some conceptual areas.
Some commenters say they can now routinely offload “weeks of pondering” to the model in under an hour, at least for well‑posed subproblems.
A subset of participants describe this as effectively “AGI” (or close enough for practical purposes); others call it powerful but narrow, with “jagged” intelligence and embarrassing failures on trivial questions.

Impact on mathematical research and PhD training

Concern: “gentle” starter problems for PhD students may now be solvable by LLMs, raising the minimum difficulty of problems that humans can still do and that AI has not already solved.
Many argue that actually solving hard problems is critical for developing deep understanding and the ability to use AI well.
Worry that human recognition and “immortality” from proving theorems may erode if machines do most of the technical and even conceptual work.
Counterview: if AI increases the quantity and quality of mathematical results, that may be a net positive; value could shift from scarcity of ideas to their utility and interpretation.

Education and assessment

Undergraduate math foundations are still seen as important; AI doesn’t remove the need to understand calculus or proofs.
Graduate coursework and programming education are heavily disrupted: take‑home assignments are now trivial with LLMs.
Instructors report students “buying good grades” via paid models and are moving toward proctored, in‑room or paper exams, and code‑reading/debugging questions rather than pure coding.

Use patterns: assistant vs replacement

Many report strong success using LLMs as ultra‑fast “students” or junior colleagues: great at error‑spotting, ideation, literature navigation, and routine coding.
Equally common: stories of confident but conceptually wrong arguments, hallucinated connections, and brittle reasoning outside familiar areas.
Effective use requires domain expertise, clear expectations for the answer’s “shape,” and heavy emphasis on critical, negative feedback rather than blind trust.

Verification, “AI slop,” and publication norms

Broad agreement that AI‑generated work must be critically checked before sharing, especially with unsuspecting colleagues; “zero‑thought” AI output is seen as rude and fatiguing.
Worry about overwhelming journals and experts with unverified AI proofs.
Some propose a dedicated repository for AI‑produced mathematics, separate from venues that currently disallow AI‑written content, with humans certifying correctness and relevance.

Access, cost, and inequality

Researchers in less‑wealthy regions describe being unable to fund top‑tier subscriptions under existing grant and procurement rules.
Fear that well‑funded institutions will gain a major productivity edge as “frontier” models become core research infrastructure.
Others respond that, relative to salaries and other university expenses, AI tool costs are small and should be budgeted; disagreement remains on feasibility given bureaucracy and local wages.

Creativity, novelty, and the “next‑token” debate

Some insist LLMs merely recombine existing mathematics; others note that much human math is exactly that, and see no sharp line.
Debate over whether RL and long‑chain reasoning allow genuine “new ideas” or just better search within the training distribution.
Comparisons with compilers, calculators, and race cars: is steering a powerful tool still a major human achievement, or does most of the credit go to the tool’s creators?

Broader labor and societal concerns

Graduate students express sadness that their work may no longer feel unique or enduring; advice offered to focus on understanding and enjoyment over glory.
Discussion of “bullshit jobs”: many workers see strong incentives to use LLMs to save time even if it erodes their own skills, because colleagues will do so and employers may reward speed over depth.
Some foresee a shift where human value lies less in producing first‑order work and more in judgment, verification, taste, and choosing which problems are worth solving.
