Addendum to GPT-5 system card: GPT-5-Codex

Perceived quality of GPT‑5‑Codex

  • Many users report GPT‑5‑Codex as the best coding model they’ve used, often surpassing GPT‑5 “normal” and Anthropic’s models for real-world coding and refactoring.
  • Praised for:
    • Strong long‑context handling and “research” on codebases.
    • Better tool‑calling, especially knowing when to search or inspect code.
    • Clean, minimalist code generation that follows instructions closely.
  • Criticisms:
    • “Lazy” behavior: it frequently stops after a few steps and asks whether to continue, even when explicitly told to run to completion.
    • Occasional severe degradation as the context window fills up (repeating steps, getting stuck), forcing manual compaction or careful up‑front planning.

Comparisons: Claude Code, Gemini, Cursor, JetBrains

  • Several long‑time Claude Code users say Codex caused their Claude usage to drop to near zero, citing:
    • Recent quality regressions and “lobotomization” of Claude Code.
    • Claude’s tendency to ramble, over‑scaffold, or confidently pass failing code.
  • Gemini is widely described as:
    • Having strong raw models but poor tooling/clients and flaky reliability.
    • Particularly bad at “agentic” coding; breaks code while insisting tasks are done.
  • Cursor is commended for UX and a “privacy mode,” but some argue its privacy guarantees aren’t meaningfully better than OpenAI’s data controls.
  • JetBrains’ AI (backed by GPT‑5) burns through quota fast, leading to speculation about Codex’s current pricing sustainability.

Tooling, UX, and environment

  • Codex CLI and VS Code integration receive mixed reviews:
    • Strong context management and steady feature updates, but some dislike the Rust TUI and lack of fine‑grained edit approval compared to Claude Code.
    • Tab‑vs‑space mismatches (e.g., in Go files, where gofmt mandates tabs) make diffs noisy; several argue formatting should be handled by post‑processing hooks rather than prompts.
  • Debate over running code in real containers vs. “*nix emulation” in‑model:
    • One side insists real execution environments are essential and cheap, since a container is little more than an isolated process.
    • The other worries about scaling overhead for thousands of short‑lived agents, suggesting lighter‑weight approaches.
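The post‑processing‑hook argument above can be made concrete. Below is a minimal, dependency‑free sketch of such a hook; `normalize_go_indent` is a hypothetical stand‑in for illustration, and a real setup would simply shell out to gofmt after each model edit:

```python
def normalize_go_indent(source: str, spaces_per_tab: int = 4) -> str:
    """Hypothetical post-processing hook: convert leading spaces to tabs.

    Go convention (enforced by gofmt) is tab indentation. Running a hook
    like this after every model edit keeps diffs quiet regardless of the
    indentation the model emitted; a real hook would invoke gofmt itself.
    """
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" ")          # drop leading spaces only
        n_spaces = len(line) - len(stripped)  # how many were removed
        out.append("\t" * (n_spaces // spaces_per_tab) + stripped)
    return "".join(out)
```

The same idea generalizes: run the project’s own formatter as a post‑edit hook, so indentation style never has to be negotiated in the prompt at all.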

Pricing, limits, and availability

  • Confusion around rate limits: some new API users hit limits quickly; OpenAI staff note recent limit increases and clarify that Pro users are not silently switched to per‑token billing.
  • GPT‑5‑Codex is currently available in Codex products (CLI, VS Code, Codex Cloud), with API access “coming soon.”
  • Some users hit hard daily limits in Codex IDE, pushing them toward higher‑tier plans.

Technical notes and benchmarks

  • GPT‑5‑Codex uses a significantly smaller system prompt in Codex CLI, and internal benchmarks reportedly show large gains on refactoring tasks versus standard GPT‑5.
  • Some users distrust SWE‑bench scores and rely more on hands‑on experience and workflow‑specific evals.