OpenAI dropped the price of o3 by 80%
Price Change and Access Limits
- o3 API pricing dropped from $10 to $2 per million input tokens and $40 to $8 per million output tokens (cached input from $2.50 to $0.50/M), an 80% cut; still roughly 4x DeepSeek R1's pricing for some usage patterns.
- ChatGPT Plus o3 message limits were raised (reports of 50→100→200/week), but many still find limits too tight for serious work.
- Some point out the cut merely brings o3 in line with or below OpenAI’s own flagship models and closer to competitors like Gemini and Claude.
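The headline arithmetic is easy to check. A quick sketch using the prices quoted in the thread (the request's token counts are hypothetical, chosen only for illustration):

```python
# Old vs. new o3 API prices (USD per million tokens), as quoted in the thread.
OLD = {"input": 10.00, "output": 40.00, "cached_input": 2.50}
NEW = {"input": 2.00, "output": 8.00, "cached_input": 0.50}

def cost(prices, input_toks, output_toks, cached_toks=0):
    """Cost in USD for one request under a given price table."""
    return (input_toks * prices["input"]
            + output_toks * prices["output"]
            + cached_toks * prices["cached_input"]) / 1_000_000

# Hypothetical request: 5k fresh input, 2k output, 20k cached-prefix tokens.
before = cost(OLD, 5_000, 2_000, 20_000)
after = cost(NEW, 5_000, 2_000, 20_000)
print(f"before=${before:.4f} after=${after:.4f} cut={1 - after/before:.0%}")
# → before=$0.1800 after=$0.0360 cut=80%
```

Because every line item was cut by the same factor, the 80% reduction holds regardless of the input/output/cached mix.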
How Can an 80% Drop Happen?
- Explanations floated:
  - Large initial margins, now squeezed by intense competition (DeepSeek, Gemini 2.5, Claude).
  - Adoption of published efficiency techniques (e.g., from DeepSeek).
  - Major inference-stack and kernel optimizations, better batching, and improved prompt/KV caching.
- OpenAI staff repeatedly state it is the same o3 model: same weights, no quantization, no silent swaps; new variants would get new model IDs. Emails and release notes likewise frame the cut as pure inference optimization.
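One intuition for how inference-side work alone could cover such a cut: decoding cost is dominated by a fixed per-step cost (loading weights, launching kernels) that gets amortized across the batch, so better batching multiplies throughput on the same hardware. A toy model with made-up numbers (none of these figures are OpenAI's):

```python
# Toy model of serving cost: a fixed per-step cost shared across the batch.
def cost_per_token(cost_per_step: float, batch_size: int) -> float:
    """Each decode step costs the same regardless of batch size (up to a
    point), so larger batches spread that fixed cost over more tokens."""
    return cost_per_step / batch_size

# Hypothetical: kernel/batching improvements let one GPU step serve 5x the
# tokens. Unit cost then falls by exactly 80% with no model change at all.
old = cost_per_token(cost_per_step=1.0, batch_size=8)
new = cost_per_token(cost_per_step=1.0, batch_size=40)
print(f"unit cost drops {1 - new/old:.0%}")  # → unit cost drops 80%
```

The real stack is far messier (memory bandwidth, sequence lengths, caching), but the sketch shows why "same weights, much cheaper" is plausible without quantization.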
Quality, Quantization, and “Model Decay”
- Many users feel models (OpenAI, Anthropic, Google) get worse over time; theories include:
  - Silent quantization.
  - Heavier safety/system prompts.
  - Personalization hurting quality.
  - Load-based throttling.
- Others counter with benchmarks (e.g., Aider leaderboards) and argue it’s mostly psychology and shifting expectations.
- OpenAI staff insist API snapshots are stable; benchmark differences were traced to different “reasoning effort” settings, not model changes.
- Some still suspect downscaled or throttled behavior at busy times; this remains unresolved/unclear.
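The "reasoning effort" point matters when comparing benchmark runs: the API exposes a `reasoning_effort` setting for o-series models, so two runs against the same snapshot can differ sharply in cost and output quality without any model change. A minimal request payload illustrating the setting (no network call is made here; field values follow the public API at the time of the thread):

```python
import json

def o3_request(prompt: str, effort: str) -> dict:
    """Build a chat-completions-style request body; two benchmark harnesses
    that disagree only on `reasoning_effort` are not testing the same thing."""
    assert effort in {"low", "medium", "high"}
    return {
        "model": "o3",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

low = o3_request("Prove there are infinitely many primes.", "low")
high = o3_request("Prove there are infinitely many primes.", "high")
print(json.dumps(high, indent=2))
```

This is the mechanism OpenAI staff pointed to when benchmark discrepancies were reported: same snapshot, different effort settings.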
Reasoning Models, Naming, and Use Cases
- Confusion over naming: o3 vs GPT‑4.1 vs 4o vs o4/o4‑mini, and “reasoning” vs “flagship for complex tasks.”
- Clarifications in-thread:
  - o3 is a reasoning model; GPT‑4.1/4o are general-purpose models, with different trade‑offs (more internal reasoning tokens, higher latency).
  - o3‑pro is "based on o3," slower and smarter, but not simply "o3 with max reasoning effort."
- Some find o3/Opus disappointing for coding compared to Gemini or Claude Sonnet; others report the opposite, often blaming client-side context limits and tooling.
Infrastructure and Caching
- Discussion of KV/prompt caching: caching shared instructions/context and reusing attention keys/values to cut cost and latency.
- OpenAI’s prompt caching doc is cited; some worry about potential side‑channel or DoS via cache behavior.
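A minimal sketch of the prefix-caching idea behind prompt/KV caching: hash the shared prefix (system instructions, tool definitions), and when a new request starts with the same tokens, reuse the stored attention keys/values instead of re-running prefill. All names here are illustrative, not OpenAI's implementation:

```python
import hashlib

class PrefixCache:
    """Illustrative prompt-prefix cache: maps a hash of the prompt prefix to
    precomputed attention KV state so shared instructions aren't re-processed."""
    def __init__(self):
        self.store = {}  # prefix hash -> KV state (stand-in object here)

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get_or_compute(self, prefix: str):
        k = self._key(prefix)
        if k in self.store:
            return self.store[k], True            # hit: prefill skipped
        kv = f"kv-state-for-{len(prefix)}-chars"  # stand-in for real KV tensors
        self.store[k] = kv
        return kv, False                          # miss: full prefill paid

cache = PrefixCache()
system = "You are a helpful assistant. Tools: ..." * 50  # long shared prefix
_, hit1 = cache.get_or_compute(system)  # first request: miss
_, hit2 = cache.get_or_compute(system)  # second request: hit, cheaper/faster
print(hit1, hit2)  # → False True
```

The latency gap between the hit and miss paths is also the root of the side-channel worry raised in the thread: an observer who can time responses may learn whether someone else recently submitted the same prefix.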
Competition, Moats, and Business Model
- Many see a “race to the bottom” on token prices with little moat at the model layer; commoditization expected.
- Counterpoints:
  - Huge capex, brand/mindshare (ChatGPT ≈ "AI" for most people), and distribution are real moats.
  - Profits will come from higher-level services, workflows, and integrations, not raw tokens.
- OpenAI’s reported revenue growth is seen by some as promising despite large losses; others doubt long‑term profitability if costs scale linearly with usage.
ID Verification, Biometrics, and Privacy
- Accessing o3 via API now requires “organization verification,” implemented as personal ID+biometric verification through Persona.
- Multiple commenters balk at:
  - Biometric collection, retention periods, and third‑party sharing.
  - Combining this with full chat logs and phone numbers.
- Defenses offered: fraud/abuse prevention, KYC‑style requirements, and concern about foreign actors training on or abusing top models.
- Several say this alone is enough to avoid o3, especially when strong alternatives (Gemini, Claude, open‑weights) exist.