GPT-4.5: "Not a frontier model"?
Pricing, Value, and Credits
- Many see GPT‑4.5 as only a small quality bump over GPT‑4o with a huge price premium (15x), framing it as an experiment in what the market will tolerate for diminishing returns.
- Some argue that in enterprise settings, even modest gains are worth high cost, especially if cheaper models remain available. Others say 4o is actually more “enterprise‑compatible.”
- API credit expiration after one year is widely criticized; commenters acknowledge that expiring credits limits OpenAI’s open‑ended accounting liabilities, but say it forces “use it or lose it” behavior and feels user‑hostile.
Scaling Limits and “Frontier” Status
- Strong sentiment that GPT‑4.5 illustrates scaling hitting a wall: reportedly much larger and more expensive to run, but only “subtly better,” especially versus expectations.
- Several see this as the end of the rapid “sprint” phase of LLM progress; expect slower, marginal improvements and more focus on techniques like reasoning at runtime, tools, and RL/verification.
- Others counter that 4.5 is meaningfully better in softer dimensions (humor, tone, grounding, interpreting nuance) that benchmarks don’t capture well.
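The “reasoning at runtime” techniques mentioned above mostly amount to spending extra inference compute per query rather than training a bigger model. A minimal sketch of one such pattern, best‑of‑n sampling with a verifier, is below; `generate` and `verify` are hypothetical stand‑ins (a real system would sample from a model API at nonzero temperature and score with unit tests or a reward model):

```python
import random

def generate(prompt, rng):
    """Stand-in for an LLM sampling call (hypothetical; a real system
    would hit a model API with temperature > 0 to get varied answers)."""
    return f"candidate answer {rng.randint(0, 99)} to: {prompt}"

def verify(answer):
    """Stand-in verifier (hypothetical): a real one might run unit
    tests, check a proof, or query a reward model. Toy score here."""
    return sum(ord(c) for c in answer) % 100

def best_of_n(prompt, n=8, seed=0):
    """Spend extra inference-time compute: sample n candidates and
    keep the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=verify)

print(best_of_n("What is 17 * 24?"))
```

The trade‑off is n model calls per query instead of one, which is exactly the cost/quality lever the thread expects labs to lean on as pretraining gains slow.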
Comparisons to Competitors and Distillation
- For coding, many say Anthropic’s Claude 3.5/3.7 or DeepSeek R1‑derived models are now preferred; for cheap non‑coding API, Gemini 2.0 Flash is cited as strong.
- There’s debate over whether “lightweight” open or cheap models are distilled from OpenAI outputs; training a student on a model’s sampled responses is technically feasible, but hard to verify from the outside since OpenAI withholds chain‑of‑thought.
- Speculation that 4.5 (codenamed Orion) is a very large MoE model used mainly as a teacher for future distilled models; parameter counts in the thread conflict and are acknowledged as uncertain.
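The “teacher for distilled models” idea above has a standard formulation: train a small student to match a large teacher’s output distribution. A minimal sketch of the classic distillation loss (KL divergence between temperature‑softened softmax outputs); the logit values are toy numbers, and with API‑only access a distiller would instead fit sampled teacher text rather than logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution; higher
    temperature softens the distribution, exposing more of the
    teacher's 'dark knowledge' about near-miss tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions; minimizing
    this pushes the student's outputs toward the teacher's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]  # toy per-token logits
print(distillation_loss(teacher, teacher))          # → 0.0 (perfect match)
print(distillation_loss(teacher, [0.0, 0.0, 0.0]) > 0)  # → True
```

This is why a very large MoE model can pay for itself even if few users ever query it directly: its soft outputs are the training signal for the cheaper models that are actually served.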
Real‑World Use and Behavior
- Some users find 4.5 clearly better for:
- Business decisions and high‑level advice
- Capturing tone and subtle, implied constraints
- Creative writing and songwriting, with less prompt‑wrangling
- Staying closer to “reality” and hallucinating less, especially on short tasks
- Others report that for structured tasks (maps, tooling) it still fails or requires classic “add tools/APIs” engineering, reinforcing the view that productization matters more than raw IQ.
- Knowledge cutoff at Oct 2023 is noted and interpreted as evidence the model is older; others point out all recent OpenAI models share that cutoff, possibly to avoid AI‑generated web slop.
Trust, Sources, and Creative Content
- LLMs are described as “Google 2.0”: fantastic for exploration and pointing to what you don’t know, but not authoritative.
- Strong concern that LLMs usually don’t expose true training sources; newer models/tools can surface web citations, but this is tool‑layer search, not transparent provenance.
- Several argue AI‑generated creative writing should be clearly labeled; others say enjoyment doesn’t require human authorship and see legal requirements as overreach.
OpenAI’s Strategic Position
- Some claim OpenAI is no longer leading, with innovation coming from elsewhere; others argue they still set the baseline and their models are widely distilled and emulated.
- A recurring view: technical gaps are narrowing; future advantage may come more from ecosystem, integrations, and brand (“ChatGPT” as verb) than raw model superiority.
- Releasing 4.5 is seen by some as a PR misstep that raises expectations without delivering a “next big thing,” but others welcome getting access to a model that might otherwise stay internal.
General vs Specialized Models
- One camp insists many small domain‑specific models will ultimately beat a single general model for efficiency; another invokes the “bitter lesson” that large generalists often outperform specialists.
- Consensus: for many practical applications, LLMs need to be combined with tools, APIs, and traditional systems—pure “general intelligence” alone doesn’t yet solve real workflows.
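The consensus above, that LLMs must be wired to tools and traditional systems, follows a common loop: the model emits a structured tool request, a harness executes it, and the result is fed back. A minimal sketch, with `fake_model` as a hypothetical stand‑in for a model that responds with a JSON tool call (real function‑calling APIs follow the same name‑plus‑arguments shape):

```python
import json

# Tool registry: name -> callable. Real frameworks describe these
# tools to the model in the prompt or via a function-calling schema.
TOOLS = {
    "add": lambda a, b: a + b,
    "lookup_capital": lambda country: {"France": "Paris"}.get(country, "unknown"),
}

def fake_model(prompt):
    """Stand-in for a model response requesting a tool call
    (hypothetical; a real model would emit this JSON itself)."""
    return json.dumps({"tool": "lookup_capital", "args": {"country": "France"}})

def run_with_tools(prompt):
    """One round of the tool loop: ask the model, execute whichever
    registered tool it requested, return the tool's result."""
    reply = json.loads(fake_model(prompt))
    tool = TOOLS[reply["tool"]]
    return tool(**reply["args"])

print(run_with_tools("What is the capital of France?"))  # → Paris
```

The design point matches the thread: the reliability comes from the deterministic tool layer, not from the model’s raw “IQ.”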