GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it
Perceived Performance of GPT‑5 / GPT‑5 Pro
- Many see GPT‑5 as an incremental upgrade, not a breakthrough; some describe it as a “cost‑cutting initiative” rather than a frontier model.
- Several heavy users say GPT‑5 Pro is state of the art for logic, data analysis, and complex bug‑hunting, beating Grok, Gemini, and others in specific coding tasks.
- Others find it only marginally better than o3‑pro (0–2% more “knowledgeable”, slightly more inventive) and significantly slower, with similar “tone”.
- A sizeable group reports degradation vs o3: weaker deep analysis, worse at large codebase reasoning, more context loss, and more hallucinations.
Comparisons with Other Models
- o3 (and earlier o1‑pro) is repeatedly cited as superior for deep code analysis, bug‑finding, and long, structured reasoning; some users “miss o3 heavily”.
- For prose and creative writing, multiple commenters prefer Kimi K2 and DeepSeek R1; Claude Opus is praised for stylized writing despite quirks.
- Some users see Claude and Gemini free tiers as “good enough”, reducing the incentive to pay for GPT‑5.
Routing, Product Strategy, and Cost
- GPT‑5 is widely interpreted as a mass‑market product built around a routing layer: a fast, cheap mode handles most queries, with expensive reasoning engaged only when deemed necessary.
- Power users dislike opaque routing and “magic words” to trigger reasoning; they want direct model selection and transparency.
- There’s speculation that earlier reasoning models ran at higher compute and were later “turned down” for cost; GPT‑5/o3 are seen as heavily quantized.
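The routing behavior commenters describe can be sketched as a simple dispatcher. This is a hypothetical illustration, not OpenAI's actual design: the model names, keyword hints, and length threshold are all assumptions standing in for whatever learned classifier the real product uses.

```python
# Hypothetical sketch of a routing layer: a cheap fast model for most
# prompts, an expensive reasoning model only when heuristics (or an
# explicit user request) suggest it is needed. All names/thresholds
# are illustrative, not OpenAI's actual implementation.

CHEAP_MODEL = "fast-chat"          # hypothetical identifier
REASONING_MODEL = "deep-reasoner"  # hypothetical identifier

# Crude stand-in for a learned classifier: trigger words plus prompt length.
REASONING_HINTS = ("prove", "step by step", "debug", "analyze", "think hard")

def route(prompt: str, force_reasoning: bool = False) -> str:
    """Return the model identifier a request should be dispatched to."""
    if force_reasoning:  # the direct model selection power users ask for
        return REASONING_MODEL
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS) or len(prompt) > 2000:
        return REASONING_MODEL  # pay for reasoning only when triggered
    return CHEAP_MODEL          # default: fast and cheap

# Because users never see which branch was taken, phrases like those in
# REASONING_HINTS become the "magic words" folklore the thread complains about.
```

The opacity complaint follows directly from this shape: the routing decision is invisible to the user, so discovering what flips the branch becomes guesswork.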
UX, Reliability, and Regressions
- Reports of GPT‑5 losing conversation context, becoming abruptly terse, or “forgetting” prior steps; some compare it to talking to someone who wasn’t listening.
- Complaints about UI slowness, tab freezes, and a context window silently truncated well below the advertised 128k tokens, which commenters read as cost‑cutting and call “unethical”.
- At launch, custom GPTs, Deep Research, and Projects were described as broken or ignoring instructions; some of this was later reported fixed.
- “Thinking” mode is often slow and sometimes veers off‑topic; some say it over‑uses reasoning, while others say it doesn’t reason as deeply as o3.
Hallucinations and the “I Don’t Know” Problem
- Users remain frustrated by confident hallucinations (e.g., invented APIs, misreading “research” sources); 30‑minute dead‑ends are common anecdotes.
- Many argue the biggest needed improvement is honest “I don’t know”; one RAG user notes GPT‑5 is the first model that reliably does this in their setup.
- Debate over whether LLMs “know” anything: some call outputs mere statistical bullshit; others argue this mirrors fallible human memory, differing mainly in error‑checking.
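The abstention behavior the RAG user praises can be sketched in miniature: answer only from retrieved evidence, and say “I don’t know” when nothing retrieved clears a relevance bar. This is a toy illustration under stated assumptions — keyword overlap stands in for real embedding similarity, and the documents and threshold are invented for the example.

```python
# Toy sketch of RAG with abstention: refuse to answer rather than
# hallucinate when no retrieved document is relevant enough.
# Scoring is naive keyword overlap, not embeddings; DOCS and the
# threshold are illustrative assumptions.

DOCS = {
    "invoice-policy": "refunds are issued within 14 days of purchase",
    "shipping": "orders ship within 2 business days via ground carrier",
}

def overlap_score(query: str, doc: str) -> float:
    """Fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def answer(query: str, threshold: float = 0.3) -> str:
    """Answer from the best-matching document, or abstain honestly."""
    best_id, best_doc = max(DOCS.items(),
                            key=lambda kv: overlap_score(query, kv[1]))
    if overlap_score(query, best_doc) < threshold:
        return "I don't know"  # abstain instead of hallucinating
    return f"Per {best_id}: {best_doc}"
```

The design choice mirrors the thread’s debate: whether a model “knows” something matters less here than whether the pipeline has an explicit, checkable path to declining.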
Pricing, Subscriptions, and Monetization
- Some advise against long‑term subscriptions given rapid churn and strong free tiers; others pay for convenience and continuity of context.
- Complaints that AI pricing is stuck on flat subscriptions, leading to a race to the bottom; speculation that free tiers will shrink and ads will appear.
- Suspicion that Plus is being made worse to push users either to the high‑end $200 plan or an ad‑supported free tier.
Reactions to Gary Marcus and the Article
- Many see the piece as a low‑effort compilation of social‑media dunks with sensational framing, more about attacking Altman/OpenAI than technical analysis.
- Others defend the need for high‑profile skeptics to counter AGI hype and “internal AGI” claims, crediting Marcus with early emphasis on scaling limits and lack of robust reasoning.
- There’s strong disagreement over his track record: some say he’s been repeatedly vindicated on diminishing returns; others claim most of his short‑term predictions have been wrong and that better critics exist.
Hype, Expectations, and Broader AI Trajectory
- Multiple commenters highlight a gap between AGI‑adjacent marketing (“Death Star”, “internal AGI”) and the clearly incremental reality of GPT‑5.
- Some argue expectations for “GPT‑5” were impossible to meet once meme culture and OpenAI’s own hints took hold.
- Broader concerns: saturation of high‑quality training data, heavy reliance on synthetic data with risks of model collapse, and uncertainty whether scaling transformers alone can reach human‑level generality.
- Nevertheless, some report concrete productivity gains (especially in coding and research workflows) and see GPT‑5’s main significance in productization: speed, integration, tool use, and long‑horizon task handling rather than raw IQ.