GPT-5-Codex
Model Improvements & Benchmarks
- GPT‑5‑Codex is seen as an incremental but meaningful upgrade: a modest gain on SWE‑bench vs GPT‑5, but a large jump on OpenAI’s internal refactor benchmark (≈34% → 51%).
- Users report better behavior on large refactors (fewer destructive rewrites, better handling of package restructuring), though file moves and deletes are still brittle.
- Some notice the system prompt is now much smaller, suggesting more behavior is baked into the model, not instructions.
Token Efficiency, Speed & Reasoning Effort
- The big advertised win is fewer internal tokens on simple tasks; people like the idea of less “performative” overthinking and boilerplate.
- In practice, many find GPT‑5‑Codex slow, especially at high reasoning effort: sometimes minutes per task, and borderline unusable on launch day.
- Others report that medium effort with reduced rambling actually feels faster overall, though tokens/sec has fluctuated since rollout.
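Reasoning effort is a per-profile setting in the Codex CLI; a minimal sketch of `~/.codex/config.toml`, assuming the `model_reasoning_effort` key and value names match the CLI version you have installed:

```toml
# ~/.codex/config.toml -- sketch; verify key names against your
# installed Codex CLI's config documentation before relying on them.
model = "gpt-5-codex"
model_reasoning_effort = "medium"  # e.g. "low" | "medium" | "high"
```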
Steerability & Prompting Style
- GPT‑5‑Codex is viewed as highly “steerable”: follows instructions closely, doesn’t eagerly do extra work unless asked.
- This is praised by experienced devs (especially for refactors in existing codebases) but seen as a drawback for “vibe coding” and sparse prompts.
- Some suggest a two-step workflow (plan first, then build) and even persona docs (in the AGENTS.md/GEMINI.md/CLAUDE.md style) to get the best results.
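A persona doc can be as simple as a markdown file checked into the repo root; a hypothetical `AGENTS.md` along these lines (contents are illustrative, not a prescribed schema):

```markdown
# AGENTS.md (illustrative example)

## Workflow
- Propose a plan and wait for approval before editing files.
- Prefer small, reviewable diffs; never rewrite whole files wholesale.

## Conventions
- Run the test suite after every change; do not commit failing tests.
- Ask before adding new dependencies or moving files.
```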
Tool Comparisons (Claude, Gemini, Grok, Aider, Cursor)
- Several users say Codex+GPT‑5 has surpassed Claude Code for serious work, especially on large repos and refactors.
- There’s a strong perception that Claude models recently regressed: more fake/mocked implementations, “yes‑man” behavior, and low quotas.
- Gemini CLI is polarizing: some think it’s terrible for coding agents and harms Gemini’s reputation; others get good results with careful configuration docs.
- Grok‑code‑fast‑1 is praised as fast/cheap in Cursor, with Codex/GPT used when “more brain” is needed.
- Aider remains liked for precise edits; multi‑step agent flows in Codex/Claude are preferred by some for larger tasks and dismissed by others.
UX, Integrations & Access
- Codex now ties into ChatGPT subscriptions (including the VS Code extension and mobile app), a bundle many find good value and more generous than Claude’s quotas.
- Users complain about product fragmentation: differing behaviors and features across CLI, VS Code, web, GitHub integration, and mobile (with iOS ahead of Android).
- Code review as a GitHub Action / PR bot is seen as one of the best UX patterns; Codex’s current flow (comment‑triggered) is less automatic than Claude’s but can be scripted via CLI.
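Scripting a comment-triggered review yourself might look like the GitHub Actions sketch below. The `codex exec` invocation and the `OPENAI_API_KEY` auth are assumptions to verify against the CLI’s docs; `gh pr diff` and `gh pr comment` are standard GitHub CLI commands:

```yaml
# .github/workflows/codex-review.yml -- rough sketch, not a drop-in config
name: codex-review
on:
  issue_comment:
    types: [created]
jobs:
  review:
    # Only run when someone comments "/codex review" on a pull request
    if: github.event.issue.pull_request && contains(github.event.comment.body, '/codex review')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g @openai/codex
      - name: Review the PR diff and post the result as a comment
        env:
          GH_TOKEN: ${{ github.token }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          gh pr diff "${{ github.event.issue.number }}" > pr.diff
          codex exec "Review this diff for bugs and risky changes: $(cat pr.diff)" > review.md
          gh pr comment "${{ github.event.issue.number }}" --body-file review.md
```

Passing the whole diff inline works only for small PRs; a real setup would chunk or attach it.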
Installation, Limits & Workflows
- Some hit npm install issues (e.g., Node feature support) and call that “not ready”; others point to high weekly downloads and suggest environment fixes.
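Many of the reported install failures trace back to an old Node.js runtime; a small pre-flight check, assuming a minimum major version of 18 as a placeholder (check the package’s `engines` field for the real requirement):

```shell
# Verify the local Node.js major version before installing the CLI.
# The minimum of 18 is an assumed placeholder, not an official figure.
required_major=18
node_major=$(node --version 2>/dev/null | sed 's/^v\([0-9]*\).*/\1/')
node_major=${node_major:-0}
if [ "$node_major" -ge "$required_major" ]; then
  echo "Node v$node_major OK"
  # npm install -g @openai/codex
else
  echo "Node v$node_major is too old; need >= $required_major" >&2
fi
```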
- People want clearer visibility into usage limits to avoid sudden lockouts; Codex quotas feel high to some, unknown/opaque to others.
- Effective usage patterns described:
- Using multiple parallel tasks/agents to hide latency, especially in the web UI where Codex manages branches/PRs.
- Letting Codex handle large refactors or integration work while humans take on mechanical file moves and test runs.
- Structuring work so agents don’t step on each other; on bare repos, users struggle more with conflicting parallel PRs and duplicated scaffolding.
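One way to keep parallel agents out of each other’s way is to give every task its own git worktree and branch; a sketch using a throwaway demo repo (the commented-out `codex exec` call is illustrative):

```shell
# Each task gets its own branch and working directory via git worktree,
# so parallel agents never edit the same checkout.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
for task in refactor-auth fix-flaky-tests; do
  # New branch plus a separate sibling directory per task
  git worktree add -q "$repo-$task" -b "$task"
  # (cd "$repo-$task" && codex exec "work on $task") &   # illustrative
done
git worktree list   # main checkout plus one worktree per task
```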
General Sentiment
- Many long‑time Claude/Cursor users are experimenting with or migrating to Codex due to perceived quality and quota advantages.
- Others remain frustrated by slow performance, poor UX around manual approvals, and the learning curve for effective multi‑agent workflows.