Addendum to GPT-5 system card: GPT-5-Codex
Perceived quality of GPT‑5‑Codex
- Many users report GPT‑5‑Codex as the best coding model they’ve used, often surpassing standard GPT‑5 and Anthropic’s models for real-world coding and refactoring.
- Praised for:
  - Strong long‑context handling and “research” on codebases.
  - Better tool‑calling, especially knowing when to search or inspect code.
  - Clean, minimalist code generation that follows instructions closely.
- Criticisms:
  - “Lazy” behavior: it frequently stops after a few steps and asks whether to continue, even when told to run to completion.
  - Occasional severe degradation as the context window fills (repeating steps, getting stuck), forcing manual compaction or careful planning.
Comparisons: Claude Code, Gemini, Cursor, JetBrains
- Several long‑time Claude Code users say Codex caused their Claude usage to drop to near zero, citing:
  - Recent quality regressions and “lobotomization” of Claude Code.
  - Claude’s tendency to ramble, over‑scaffold, or confidently claim that failing code passes.
- Gemini is widely described as:
  - Having strong raw models but poor tooling/clients and flaky reliability.
  - Particularly bad at “agentic” coding; breaks code while insisting tasks are done.
- Cursor is commended for UX and a “privacy mode,” but some argue its privacy guarantees aren’t meaningfully better than OpenAI’s data controls.
- JetBrains’ AI (backed by GPT‑5) burns through quota fast, prompting speculation about whether Codex’s current pricing is sustainable.
Tooling, UX, and environment
- Codex CLI and VS Code integration receive mixed reviews:
  - Strong context management and steady feature updates, but some dislike the Rust TUI and the lack of fine‑grained edit approval compared to Claude Code.
  - Tabs‑vs‑spaces issues (e.g., in Go files) make diffs noisy; several argue formatting should be handled by post‑processing hooks rather than prompts (see the formatting‑hook sketch after this list).
- Debate over running code in real containers vs. “*nix emulation” in‑model:
  - One side insists real execution environments are essential and inexpensive, since containers are roughly as cheap as processes (see the timing sketch after this list).
  - The other worries about the scaling overhead of thousands of short‑lived agents and suggests lighter‑weight approaches.
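For readers who want a concrete picture of the hook idea, here is a minimal sketch in Python. The hook name, invocation, and file‑list argument are assumptions for illustration (the discussion does not specify how such a hook would be wired into Codex CLI); gofmt itself is real and rewrites Go files with canonical tab indentation, keeping formatting out of the prompt and out of the diff.

```python
#!/usr/bin/env python3
"""Hypothetical post-edit formatting hook (sketch only).

Assumes the agent or a git pre-commit hook passes the paths it touched
as command-line arguments; runs gofmt on any Go files among them so
tabs-vs-spaces choices made by the model never reach the diff.
"""
import subprocess
import sys


def format_go_files(paths):
    go_files = [p for p in paths if p.endswith(".go")]
    if not go_files:
        return
    # gofmt -w rewrites the files in place using canonical Go formatting.
    subprocess.run(["gofmt", "-w", *go_files], check=True)


if __name__ == "__main__":
    format_go_files(sys.argv[1:])
```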
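On the containers‑vs‑emulation point, the “containers ≈ processes” claim can be sanity‑checked by timing per‑invocation overhead directly. The sketch below is a rough benchmark under stated assumptions (Docker installed, the alpine image already pulled); it says nothing about how Codex Cloud actually provisions environments, and results vary widely across machines and runtimes.

```python
#!/usr/bin/env python3
"""Rough timing comparison: bare process vs. throwaway Docker container.

Assumes Docker is installed and the alpine image is already pulled.
Numbers are machine- and runtime-dependent; this only makes the
overhead debate concrete, it does not settle it.
"""
import subprocess
import time


def avg_seconds(cmd, runs=5):
    # Average wall-clock time to run `cmd` to completion `runs` times.
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True, capture_output=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    bare = avg_seconds(["true"])  # a plain short-lived process
    container = avg_seconds(["docker", "run", "--rm", "alpine", "true"])
    print(f"bare process:        {bare * 1000:8.1f} ms")
    print(f"throwaway container: {container * 1000:8.1f} ms")
```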
Pricing, limits, and availability
- Confusion around rate limits: some new API users hit limits quickly; OpenAI staff note recent limit increases and clarify Pro users are not silently switched to per‑token billing.
- GPT‑5‑Codex is currently available in Codex products (CLI, VS Code, Codex Cloud), with API access “coming soon.”
- Some users hit hard daily limits in Codex IDE, pushing them toward higher‑tier plans.
Technical notes and benchmarks
- The new GPT‑5‑Codex uses a significantly smaller system prompt in Codex CLI, and internal benchmarks show large gains on refactoring tasks versus standard GPT‑5.
- Some distrust SWE‑bench scores and rely more on hands‑on experience and workflow‑specific evals.