A recent experience with ChatGPT 5.5 Pro
Perceived capabilities of ChatGPT 5.5 Pro in math
- Seen solving nontrivial combinatorial problems and producing proofs that experts regard as publishable.
- Particularly strong on discrete/combinatorial tasks; weaker and more error‑prone in analysis and some conceptual areas.
- Some commenters say they can now routinely offload “weeks of pondering” to the model in under an hour, at least for well‑posed subproblems.
- A subset of participants describe this as effectively “AGI” (or close enough for practical purposes); others call it powerful but narrow, with “jagged” intelligence and embarrassing failures on trivial questions.
Impact on mathematical research and PhD training
- Concern: “gentle” starter problems for PhD students may now be solvable by LLMs, raising the minimum difficulty of problems that are still feasible for a human yet not already within an AI’s reach.
- Many argue that actually solving hard problems is critical for developing deep understanding and the ability to use AI well.
- Worry that human recognition and “immortality” from proving theorems may erode if machines do most of the technical and even conceptual work.
- Counterview: if AI increases the quantity and quality of mathematical results, that may be a net positive; value could shift from scarcity of ideas to their utility and interpretation.
Education and assessment
- Undergraduate math foundations are still seen as important; AI doesn’t remove the need to understand calculus or proofs.
- Graduate coursework and programming education are heavily disrupted: take‑home assignments are now trivial with LLMs.
- Instructors report students “buying good grades” via paid models; in response, they are moving toward proctored in‑room or paper exams and toward code‑reading/debugging questions rather than pure coding.
Use patterns: assistant vs replacement
- Many report strong success using LLMs as ultra‑fast “students” or junior colleagues: great at error‑spotting, ideation, literature navigation, and routine coding.
- Equally common: stories of confident but conceptually wrong arguments, hallucinated connections, and brittle reasoning outside familiar areas.
- Effective use requires domain expertise, clear expectations for the answer’s “shape,” and heavy emphasis on critical, negative feedback rather than blind trust.
Verification, “AI slop,” and publication norms
- Broad agreement that AI‑generated work must be critically checked before sharing, especially with unsuspecting colleagues; “zero‑thought” AI output is seen as rude and fatiguing.
- Worry about overwhelming journals and experts with unverified AI proofs.
- Some propose a dedicated repository for AI‑produced mathematics, separate from venues that currently disallow AI‑written content, with humans certifying correctness and relevance.
Access, cost, and inequality
- Researchers in less‑wealthy regions describe being unable to fund top‑tier subscriptions under existing grant and procurement rules.
- Fear that well‑funded institutions will gain a major productivity edge as “frontier” models become core research infrastructure.
- Others respond that, relative to salaries and other university expenses, AI tool costs are small and should be budgeted; disagreement remains on feasibility given bureaucracy and local wages.
Creativity, novelty, and the “next‑token” debate
- Some insist LLMs merely recombine existing mathematics; others note that much human math is exactly that, and see no sharp line.
- Debate over whether RL and long‑chain reasoning allow genuine “new ideas” or just better search within the training distribution.
- Comparisons with compilers, calculators, and race cars: is steering a powerful tool still a major human achievement, or does most of the credit go to the tool’s creators?
Broader labor and societal concerns
- Graduate students express sadness that their work may no longer feel unique or enduring; advice offered to focus on understanding and enjoyment over glory.
- Discussion of “bullshit jobs”: many workers see strong incentives to use LLMs to save time even if it erodes their own skills, because colleagues will do so and employers may reward speed over depth.
- Some foresee a shift where human value lies less in producing first‑order work and more in judgment, verification, taste, and choosing which problems are worth solving.