OpenAI dropped the price of o3 by 80%

Price Change and Access Limits

  • o3 API price dropped from $10→$2/M input tokens and $40→$8/M output (and cached input $2.50→$0.50/M), an 80% cut across the board; still ~4x DeepSeek R1's pricing for some usage patterns (a worked example follows this list).
  • ChatGPT Plus o3 message limits were raised (reports of 50→100→200/week), but many still find limits too tight for serious work.
  • Some point out the cut merely brings o3 in line with or below OpenAI’s own flagship models and closer to competitors like Gemini and Claude.
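
As a quick sanity check on the arithmetic, here is a minimal sketch pricing one hypothetical request (the token counts are made up; the rates are the published per-million prices before and after the cut):

```python
# One hypothetical request: 10k input tokens (2k served from cache), 2k output.
# Rates are the published $-per-million prices before and after the cut.
def request_cost(inp, cached, out, p_in, p_cached, p_out):
    return ((inp - cached) * p_in + cached * p_cached + out * p_out) / 1_000_000

before = request_cost(10_000, 2_000, 2_000, 10.00, 2.50, 40.00)
after = request_cost(10_000, 2_000, 2_000, 2.00, 0.50, 8.00)
print(f"before: ${before:.3f}  after: ${after:.3f}")  # before: $0.165  after: $0.033
```

The after/before ratio comes out to exactly 0.2, consistent with the headline 80% figure.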

How Can an 80% Drop Happen?

  • Explanations floated:
    • Large initial margins, now being cut to match intense competition (DeepSeek, Gemini 2.5, Claude).
    • Implementation of published efficiency tricks (e.g., from DeepSeek).
    • Major inference-stack and kernel optimizations, batching, and better prompt/KV caching (a toy batching cost model follows this list).
  • OpenAI staff repeatedly state it is the same o3 model: same weights, no quantization, no silent swaps; new variants would get new model IDs. An email and the release notes likewise frame the cut as pure inference optimization.
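
To make the batching explanation concrete, here is a toy cost model with entirely made-up numbers (not OpenAI's actual economics): one decode step carries a fixed cost regardless of batch size, plus a small marginal cost per sequence, and each step emits one token per sequence.

```python
# Toy cost model with hypothetical numbers: one decode step has a fixed cost
# (weights streamed from memory, kernel launches) plus a small marginal cost
# per sequence in the batch; each step emits one token per sequence.
FIXED_COST_PER_STEP = 1.00
MARGINAL_COST_PER_SEQ = 0.05

def relative_cost_per_token(batch_size: int) -> float:
    step_cost = FIXED_COST_PER_STEP + MARGINAL_COST_PER_SEQ * batch_size
    return step_cost / batch_size

for b in (1, 8, 32, 128):
    print(f"batch={b:>3}  relative cost/token={relative_cost_per_token(b):.3f}")
# batch=1 -> 1.050; batch=32 -> 0.081: batching alone cuts per-token cost >10x.
```

Under assumptions like these, better batching and caching can plausibly account for a large share of an 80% cut without touching the model itself.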

Quality, Quantization, and “Model Decay”

  • Many users feel models (OpenAI, Anthropic, Google) get worse over time; theories include:
    • Silent quantization.
    • Heavier safety/system prompts.
    • Personalization hurting quality.
    • Load-based throttling.
  • Others counter with benchmarks (e.g., Aider leaderboards) and argue it’s mostly psychology and shifting expectations.
  • OpenAI staff insist API snapshots are stable; benchmark differences were traced to different “reasoning effort” settings, not model changes (see the example after this list).
  • Some still suspect downscaled or throttled behavior at busy times; this remains unresolved.
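
Reasoning effort is an explicit request parameter, so two benchmark runs against the same snapshot can differ just by this setting. A minimal sketch, assuming the official openai Python SDK, an OPENAI_API_KEY in the environment, and an o-series model that accepts reasoning_effort; the prompt is a throwaway example:

```python
from openai import OpenAI

client = OpenAI()

question = [{"role": "user", "content": "How many primes are there below 100?"}]

for effort in ("low", "medium", "high"):
    resp = client.chat.completions.create(
        model="o3",
        reasoning_effort=effort,  # same snapshot, different compute budget
        messages=question,
    )
    details = resp.usage.completion_tokens_details
    print(f"{effort:>6}: {details.reasoning_tokens} reasoning tokens")
```

A benchmark that pins the model ID but not the effort setting can report very different scores for the “same” model, which is the confusion the staff explanation points at.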

Reasoning Models, Naming, and Use Cases

  • Confusion over naming: o3 vs GPT‑4.1 vs 4o vs o4/o4‑mini, and “reasoning” vs “flagship for complex tasks.”
  • Clarifications in-thread:
    • o3 is a reasoning model; GPT‑4.1/4o are general models. Reasoning models spend extra internal “thinking” tokens, trading higher latency and output cost for better answers on hard tasks (a cost sketch follows this list).
    • o3‑pro is “based on o3,” slower and smarter, but not just “o3 with max effort.”
  • Some find o3/Opus disappointing for coding compared to Gemini or Claude Sonnet; others report the opposite, with the gap often blamed on client-side context limits and tooling.
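
A back-of-envelope sketch of that trade-off: hidden reasoning tokens are billed as output tokens, so the effective price per visible token can be far above the list price. The rates below are o3's post-cut prices; the reasoning-token count is a made-up illustration, not a measured figure.

```python
INPUT_PRICE = 2.00 / 1_000_000    # $/input token
OUTPUT_PRICE = 8.00 / 1_000_000   # $/output token (reasoning tokens bill as output)

prompt_tokens = 1_000
visible_answer_tokens = 500
hidden_reasoning_tokens = 4_000   # hypothetical internal "thinking", never returned

cost = (prompt_tokens * INPUT_PRICE
        + (visible_answer_tokens + hidden_reasoning_tokens) * OUTPUT_PRICE)
print(f"total: ${cost:.4f}")  # $0.0380
effective = cost / visible_answer_tokens * 1_000_000
print(f"effective price per visible output token: ${effective:.0f}/M")  # $76/M
```

Under these assumed numbers, the effective rate per visible token is nearly 10x the nominal $8/M, which is why per-token price comparisons between reasoning and general models can mislead.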

Infrastructure and Caching

  • Discussion of KV/prompt caching: caching shared instructions/context and reusing attention keys/values to cut cost and latency (sketched below).
  • OpenAI’s prompt caching doc is cited; some worry about potential side‑channel or DoS via cache behavior.
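
A minimal sketch of working with that behavior, assuming the official openai Python SDK; the file name shared_instructions.txt is a stand-in for any large static context. The idea is to keep the static portion of the prompt first, so repeated calls share a cacheable prefix, and to check usage.prompt_tokens_details.cached_tokens to confirm hits.

```python
from openai import OpenAI

client = OpenAI()

with open("shared_instructions.txt") as f:
    STATIC_PREFIX = f.read()  # identical across calls -> cacheable prefix

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="o3",
        messages=[
            {"role": "system", "content": STATIC_PREFIX},  # static part first
            {"role": "user", "content": question},         # variable part last
        ],
    )
    # On a cache hit, part of the prompt bills at the cached-input rate.
    details = resp.usage.prompt_tokens_details
    print(f"cached prompt tokens: {details.cached_tokens}")
    return resp.choices[0].message.content
```

The caching is automatic for sufficiently long identical prefixes, and cache hits are observable through latency and the usage fields, which is the property the side-channel worries in the thread are pointing at.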

Competition, Moats, and Business Model

  • Many see a “race to the bottom” on token prices with little moat at the model layer; commoditization expected.
  • Counterpoints:
    • Huge capex, brand/mindshare (ChatGPT ≈ “AI” for most people), and distribution are real moats.
    • Profits will come from higher-level services, workflows, and integrations, not raw tokens.
    • OpenAI’s reported revenue growth is seen by some as promising despite large losses; others doubt long‑term profitability if costs scale linearly with usage.

ID Verification, Biometrics, and Privacy

  • Accessing o3 via API now requires “organization verification,” implemented as personal ID+biometric verification through Persona.
  • Multiple commenters balk at:
    • Biometric collection, retention periods, and third‑party sharing.
    • Combining this with full chat logs and phone numbers.
  • Defenses offered: fraud/abuse prevention, KYC‑style requirements, and concern about foreign actors training on or abusing top models.
  • Several say this alone is enough to avoid o3, especially when strong alternatives (Gemini, Claude, open‑weights) exist.