Addendum to GPT-5 system card: GPT-5-Codex

Perceived quality of GPT‑5‑Codex

  • Many users report GPT‑5‑Codex as the best coding model they’ve used, often surpassing GPT‑5 “normal” and Anthropic’s models for real-world coding and refactoring.
  • Praised for:
    • Strong long‑context handling and “research” on codebases.
    • Better tool‑calling, especially knowing when to search or inspect code.
    • Clean, minimalist code generation that follows instructions closely.
  • Criticisms:
    • “Lazy” behavior: it frequently stops after a few steps and asks whether to continue, even when explicitly told to run to completion.
    • Occasional severe degradation as the context window fills up (repeating steps, getting stuck), forcing manual compaction or careful up‑front planning.

Comparisons: Claude Code, Gemini, Cursor, JetBrains

  • Several long‑time Claude Code users say Codex caused their Claude usage to drop to near zero, citing:
    • Recent quality regressions and “lobotomization” of Claude Code.
    • Claude’s tendency to ramble, over‑scaffold, or confidently pass failing code.
  • Gemini is widely described as:
    • Having strong raw models but poor tooling/clients and flaky reliability.
    • Particularly bad at “agentic” coding; breaks code while insisting tasks are done.
  • Cursor is commended for UX and a “privacy mode,” but some argue its privacy guarantees aren’t meaningfully better than OpenAI’s data controls.
  • JetBrains’ AI (backed by GPT‑5) burns through quota fast, leading to speculation about Codex’s current pricing sustainability.

Tooling, UX, and environment

  • Codex CLI and VS Code integration receive mixed reviews:
    • Strong context management and steady feature updates, but some dislike the Rust TUI and lack of fine‑grained edit approval compared to Claude Code.
    • Tab‑vs‑space mismatches (e.g., in Go files, where gofmt mandates tabs) make diffs noisy; several argue formatting should be handled by post‑processing hooks rather than prompts.
  • Debate over running code in real containers vs. “*nix emulation” in‑model:
    • One side insists real execution environments are essential and cheap, since a container is little more than an isolated process.
    • The other worries about scaling overhead for thousands of short‑lived agents, suggesting lighter‑weight approaches.
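The post‑processing‑hook argument above can be made concrete. Below is a minimal, dependency‑free sketch of such a hook; `normalize_go_indent` is a hypothetical stand‑in for illustration, and a real setup would simply shell out to gofmt after each model edit:

```python
def normalize_go_indent(source: str, spaces_per_tab: int = 4) -> str:
    """Hypothetical post-processing hook: convert leading spaces to tabs.

    Go convention (enforced by gofmt) is tab indentation. Running a hook
    like this after every model edit keeps diffs quiet regardless of the
    indentation the model emitted; a real hook would invoke gofmt itself.
    """
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" ")          # drop leading spaces only
        n_spaces = len(line) - len(stripped)  # how many were removed
        out.append("\t" * (n_spaces // spaces_per_tab) + stripped)
    return "".join(out)
```

The same idea generalizes: run the project’s own formatter as a post‑edit hook, so indentation style never has to be negotiated in the prompt at all.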

Pricing, limits, and availability

  • Confusion around rate limits: some new API users hit limits quickly; OpenAI staff note recent limit increases and clarify that Pro users are not silently switched to per‑token billing.
  • GPT‑5‑Codex is currently available in Codex products (CLI, VS Code, Codex Cloud), with API access “coming soon.”
  • Some users hit hard daily limits in Codex IDE, pushing them toward higher‑tier plans.

Technical notes and benchmarks

  • GPT‑5‑Codex uses a significantly smaller system prompt in Codex CLI, and internal benchmarks reportedly show large gains on refactoring tasks versus standard GPT‑5.
  • Some users distrust SWE‑bench scores and rely more on hands‑on experience and workflow‑specific evals.