GPT-4.5: "Not a frontier model"?

Pricing, Value, and Credits

  • Many see GPT‑4.5 as only a small quality bump over GPT‑4o with a huge price premium (15x), framing it as an experiment in what the market will tolerate for diminishing returns.
  • Some argue that in enterprise settings, even modest gains are worth high cost, especially if cheaper models remain available. Others say 4o is actually more “enterprise‑compatible.”
  • API credit expiration after one year is widely criticized; commenters note it pushes “use it or lose it” behavior, and that while expiring credits clears an accounting liability for OpenAI, the policy still feels user‑hostile.
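The cost gap in the first bullet can be made concrete with a quick back‑of‑the‑envelope calculation. The per‑million‑token prices below are assumptions based on launch‑era list pricing (roughly $2.50/$10 for GPT‑4o and $75/$150 for GPT‑4.5, input/output), not authoritative figures:

```python
# Sketch: per-request API cost under ASSUMED list prices (USD per 1M tokens).
PRICES = {
    "gpt-4o":  {"input": 2.50,  "output": 10.00},
    "gpt-4.5": {"input": 75.00, "output": 150.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical chat turn: ~2k prompt tokens, ~500 completion tokens.
cost_4o = request_cost("gpt-4o", 2000, 500)
cost_45 = request_cost("gpt-4.5", 2000, 500)
print(f"4o: ${cost_4o:.4f}  4.5: ${cost_45:.4f}  ratio: {cost_45 / cost_4o:.1f}x")
```

Note the blended ratio depends on the prompt/completion mix: under these assumed prices the input premium (~30x) is larger than the output premium (~15x), so prompt‑heavy workloads pay proportionally more.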

Scaling Limits and “Frontier” Status

  • Strong sentiment that GPT‑4.5 illustrates scaling hitting a wall: reportedly much larger and more expensive to run, but only “subtly better,” especially relative to expectations.
  • Several see this as the end of the rapid “sprint” phase of LLM progress; expect slower, marginal improvements and more focus on techniques like reasoning at runtime, tools, and RL/verification.
  • Others counter that 4.5 is meaningfully better in softer dimensions (humor, tone, grounding, interpreting nuance) that benchmarks don’t capture well.

Comparisons to Competitors and Distillation

  • For coding, many say Anthropic’s Claude 3.5/3.7 Sonnet or DeepSeek R1‑derived models are now preferred; for cheap non‑coding API use, Gemini 2.0 Flash is cited as strong.
  • There’s debate over whether “lightweight” open or cheap models are distilling from OpenAI outputs; behavioral distillation (training on prompt/completion pairs from the API) is technically possible, but hard to verify since OpenAI withholds chain‑of‑thought.
  • Speculation that 4.5 (codenamed Orion) is a very large MoE model used mainly as a teacher for future distilled models; parameter counts in the thread conflict and are acknowledged as uncertain.
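The behavioral‑distillation claim above amounts to a simple loop: collect a teacher model’s visible outputs and fine‑tune a smaller student on them. This is a minimal sketch with a stubbed‑in teacher; the function names are illustrative and no real provider API is called:

```python
import json

def teacher(prompt: str) -> str:
    # Stand-in for an API call to a large "teacher" model (hypothetical).
    return f"Answer to: {prompt}"

def build_distillation_set(prompts):
    """Collect (prompt, completion) pairs — the only signal the student sees.

    Hidden chain-of-thought never appears in API output, so the student can
    only imitate surface behavior, not the teacher's internal reasoning.
    """
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

dataset = build_distillation_set(["What is MoE?", "Explain RLHF briefly."])
# Each record becomes one supervised fine-tuning example for the student.
print(json.dumps(dataset[0]))
```

This also shows why distillation is opaque from the outside: the resulting dataset looks like any other instruction‑tuning corpus, with no marker of which model produced the completions.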

Real‑World Use and Behavior

  • Some users find 4.5 clearly better for:
    • Business decisions and high‑level advice
    • Capturing tone and subtle, implied constraints
    • Creative writing and songwriting, with less prompt‑wrangling
    • Staying closer to “reality” and hallucinating less, especially on short tasks
  • Others report that for structured tasks (maps, tooling) it still fails or requires classic “add tools/APIs” engineering, reinforcing the view that productization matters more than raw IQ.
  • Knowledge cutoff at Oct 2023 is noted and interpreted as evidence the model is older; others point out all recent OpenAI models share that cutoff, possibly to avoid AI‑generated web slop.
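The “add tools/APIs” pattern mentioned above can be sketched as routing structured subtasks to deterministic code while the model handles free‑form language. Everything here is illustrative (the names and routing are assumptions, not a real framework); the distance computation stands in for any task an LLM fails at but a small tool solves exactly:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km — the kind of exact computation a model
    reliably fumbles but a ten-line tool gets right."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

TOOLS = {"distance_km": haversine_km}

def answer(query: str, tool_call=None):
    # In a real system the model itself would emit the tool_call; here the
    # routing decision is hard-coded to keep the sketch self-contained.
    if tool_call:
        name, args = tool_call
        return TOOLS[name](*args)
    return "free-form LLM answer"  # placeholder for a model completion

# Paris -> Berlin, computed by the tool rather than by the model:
d = answer("How far is Paris from Berlin?",
           ("distance_km", (48.86, 2.35, 52.52, 13.40)))
```

The design point matches the thread’s conclusion: the value comes from the productized routing layer, not from making the base model smarter at arithmetic.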

Trust, Sources, and Creative Content

  • LLMs are described as “Google 2.0”: fantastic for exploration and pointing to what you don’t know, but not authoritative.
  • Strong concern that LLMs usually don’t expose true training sources; newer models/tools can surface web citations, but this is tool‑layer search, not transparent provenance.
  • Several argue AI‑generated creative writing should be clearly labeled; others say enjoyment doesn’t require human authorship and see legal requirements as overreach.

OpenAI’s Strategic Position

  • Some claim OpenAI is no longer leading, with innovation coming from elsewhere; others argue they still set the baseline and their models are widely distilled and emulated.
  • A recurring view: technical gaps are narrowing; future advantage may come more from ecosystem, integrations, and brand (“ChatGPT” as verb) than raw model superiority.
  • Releasing 4.5 is seen by some as a PR misstep that raises expectations without delivering a “next big thing,” but others welcome getting access to a model that might otherwise stay internal.

General vs Specialized Models

  • One camp insists many small domain‑specific models will ultimately beat a single general model for efficiency; another invokes the “bitter lesson” that large generalists often outperform specialists.
  • Consensus: for many practical applications, LLMs need to be combined with tools, APIs, and traditional systems—pure “general intelligence” alone doesn’t yet solve real workflows.