GPT-4.5: "Not a frontier model"?
Pricing, Value, and Credits
- Many see GPT‑4.5 as only a small quality bump over GPT‑4o with a huge price premium (15x), framing it as an experiment in what the market will tolerate for diminishing returns.
- Some argue that in enterprise settings, even modest gains are worth high cost, especially if cheaper models remain available. Others say 4o is actually more “enterprise‑compatible.”
- API credit expiration after one year is widely criticized; commenters acknowledge that expiring credits limits OpenAI’s open‑ended accounting liabilities, but say it forces “use it or lose it” behavior and feels user‑hostile.
Scaling Limits and “Frontier” Status
- Strong sentiment that GPT‑4.5 illustrates scaling hitting a wall: reportedly much larger and more expensive to run, but only “subtly better,” especially versus expectations.
- Several see this as the end of the rapid “sprint” phase of LLM progress; expect slower, marginal improvements and more focus on techniques like reasoning at runtime, tools, and RL/verification.
- Others counter that 4.5 is meaningfully better in softer dimensions (humor, tone, grounding, interpreting nuance) that benchmarks don’t capture well.
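The “reasoning at runtime” techniques mentioned above mostly amount to spending extra inference compute per query rather than training a bigger model. A minimal sketch of one such pattern, best‑of‑n sampling with a verifier, is below; `generate` and `verify` are hypothetical stand‑ins (a real system would sample from a model API at nonzero temperature and score with unit tests or a reward model):

```python
import random

def generate(prompt, rng):
    """Stand-in for an LLM sampling call (hypothetical; a real system
    would hit a model API with temperature > 0 to get varied answers)."""
    return f"candidate answer {rng.randint(0, 99)} to: {prompt}"

def verify(answer):
    """Stand-in verifier (hypothetical): a real one might run unit
    tests, check a proof, or query a reward model. Toy score here."""
    return sum(ord(c) for c in answer) % 100

def best_of_n(prompt, n=8, seed=0):
    """Spend extra inference-time compute: sample n candidates and
    keep the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=verify)

print(best_of_n("What is 17 * 24?"))
```

The trade‑off is n model calls per query instead of one, which is exactly the cost/quality lever the thread expects labs to lean on as pretraining gains slow.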
Comparisons to Competitors and Distillation
- For coding, many say Anthropic’s Claude 3.5/3.7 or DeepSeek R1‑derived models are now preferred; for cheap non‑coding API, Gemini 2.0 Flash is cited as strong.
- There’s debate over whether “lightweight” open or cheap models are distilled from OpenAI outputs; training a student on a model’s sampled responses is technically feasible, but hard to verify from the outside since OpenAI withholds chain‑of‑thought.
- Speculation that 4.5 (codenamed Orion) is a very large MoE model used mainly as a teacher for future distilled models; parameter counts in the thread conflict and are acknowledged as uncertain.
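The “teacher for distilled models” idea above has a standard formulation: train a small student to match a large teacher’s output distribution. A minimal sketch of the classic distillation loss (KL divergence between temperature‑softened softmax outputs); the logit values are toy numbers, and with API‑only access a distiller would instead fit sampled teacher text rather than logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution; higher
    temperature softens the distribution, exposing more of the
    teacher's 'dark knowledge' about near-miss tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions; minimizing
    this pushes the student's outputs toward the teacher's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]  # toy per-token logits
print(distillation_loss(teacher, teacher))          # → 0.0 (perfect match)
print(distillation_loss(teacher, [0.0, 0.0, 0.0]) > 0)  # → True
```

This is why a very large MoE model can pay for itself even if few users ever query it directly: its soft outputs are the training signal for the cheaper models that are actually served.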
Real‑World Use and Behavior
- Some users find 4.5 clearly better for:
- Business decisions and high‑level advice
- Capturing tone and subtle, implied constraints
- Creative writing and songwriting, with less prompt‑wrangling
- Staying closer to “reality” and hallucinating less, especially on short tasks
- Others report that for structured tasks (maps, tooling) it still fails or requires classic “add tools/APIs” engineering, reinforcing the view that productization matters more than raw IQ.
- Knowledge cutoff at Oct 2023 is noted and interpreted as evidence the model is older; others point out all recent OpenAI models share that cutoff, possibly to avoid AI‑generated web slop.
Trust, Sources, and Creative Content
- LLMs are described as “Google 2.0”: fantastic for exploration and pointing to what you don’t know, but not authoritative.
- Strong concern that LLMs usually don’t expose true training sources; newer models/tools can surface web citations, but this is tool‑layer search, not transparent provenance.
- Several argue AI‑generated creative writing should be clearly labeled; others say enjoyment doesn’t require human authorship and see legal requirements as overreach.
OpenAI’s Strategic Position
- Some claim OpenAI is no longer leading, with innovation coming from elsewhere; others argue they still set the baseline and their models are widely distilled and emulated.
- A recurring view: technical gaps are narrowing; future advantage may come more from ecosystem, integrations, and brand (“ChatGPT” as verb) than raw model superiority.
- Releasing 4.5 is seen by some as a PR misstep that raises expectations without delivering a “next big thing,” but others welcome getting access to a model that might otherwise stay internal.
General vs Specialized Models
- One camp insists many small domain‑specific models will ultimately beat a single general model for efficiency; another invokes the “bitter lesson” that large generalists often outperform specialists.
- Consensus: for many practical applications, LLMs need to be combined with tools, APIs, and traditional systems—pure “general intelligence” alone doesn’t yet solve real workflows.
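The consensus above, that LLMs must be wired to tools and traditional systems, follows a common loop: the model emits a structured tool request, a harness executes it, and the result is fed back. A minimal sketch, with `fake_model` as a hypothetical stand‑in for a model that responds with a JSON tool call (real function‑calling APIs follow the same name‑plus‑arguments shape):

```python
import json

# Tool registry: name -> callable. Real frameworks describe these
# tools to the model in the prompt or via a function-calling schema.
TOOLS = {
    "add": lambda a, b: a + b,
    "lookup_capital": lambda country: {"France": "Paris"}.get(country, "unknown"),
}

def fake_model(prompt):
    """Stand-in for a model response requesting a tool call
    (hypothetical; a real model would emit this JSON itself)."""
    return json.dumps({"tool": "lookup_capital", "args": {"country": "France"}})

def run_with_tools(prompt):
    """One round of the tool loop: ask the model, execute whichever
    registered tool it requested, return the tool's result."""
    reply = json.loads(fake_model(prompt))
    tool = TOOLS[reply["tool"]]
    return tool(**reply["args"])

print(run_with_tools("What is the capital of France?"))  # → Paris
```

The design point matches the thread: the reliability comes from the deterministic tool layer, not from the model’s raw “IQ.”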