The Codex App
Model quality & comparisons
- Many compare Codex to Claude Code and Gemini:
  - Some report Codex 5.2 (especially Codex High) excels at backend, logic-heavy, and complex multi-step work while struggling more with UI/frontend details.
  - Others say Opus 4.5 “wins” consistently on real-world codebases, and find Codex and Gemini slower and less “smart” by comparison.
  - A subset find Codex “lazy” or “stupid”: poor doc lookup, shallow research, ignoring instructions, reverting to old framework versions, or giving up early; they see Claude Code as faster, more reliable, and better at one-shotting tasks.
  - There are also opposite anecdotes: Codex reliably fixing Claude’s mistakes, doing stronger code review, or solving problems Claude couldn’t.
Workflows & agent usage
- Several treat Codex/Claude as mid‑level “ticket taker” engineers: humans write detailed specs and plans, agents do grunt work, humans review.
- Split between:
  - “Plan-first” workflows (requirements → plan.md → review → execute) to avoid drift.
  - “Just do it” workflows where Codex is allowed to run longer, with users only supervising diffs.
- Parallel/multi-agent:
  - The Codex app supports up to 4 agents per project; some already emulate this with multiple CLI sessions or tmux (see the sketch after this list).
  - Advocates see value in parallelizing work; skeptics liken unsupervised agents to outsourcing that returns an unmaintainable mess.
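
A minimal sketch of what that emulation might look like, assuming the Codex CLI exposes a non-interactive `codex exec <prompt>` mode; the worktree paths and prompts are illustrative, not from the thread:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Illustrative task list: one git worktree per agent so edits don't
# collide, mirroring the app's agent-per-worktree model.
TASKS = [
    ("../wt-auth", "Add input validation to the login handler"),
    ("../wt-docs", "Update the README for the new config flags"),
    ("../wt-tests", "Write unit tests for the retry logic"),
    ("../wt-lint", "Fix all linter warnings in src/"),
]

def run_agent(worktree: str, prompt: str) -> str:
    # Assumes a `codex exec` subcommand that takes a prompt and exits
    # when done; adjust to whatever your CLI version actually supports.
    result = subprocess.run(
        ["codex", "exec", prompt],
        cwd=worktree,
        capture_output=True,
        text=True,
    )
    return f"{worktree}: exit={result.returncode}"

# Cap at 4 workers to match the app's per-project agent limit.
with ThreadPoolExecutor(max_workers=4) as pool:
    for summary in pool.map(lambda t: run_agent(*t), TASKS):
        print(summary)
```

Each worktree would be created beforehand with `git worktree add`, which is roughly what the tmux crowd does by hand: one pane per worktree, one session per agent, diffs reviewed separately.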
Codex app vs IDEs and other orchestrators
- Many ask why they should leave VS Code/Cursor/Claude Code + IDE integrations:
  - VS Code + Codex is still preferred for deep, hands-on coding; the Codex app is pitched as a higher-level supervisor for multiple agents/projects, with built-in git, diffs, terminal, and automations.
  - Some dislike that code editing is de-emphasized and prefer an agent in the sidebar plus a full IDE (e.g., Claude Code in Zed).
- The app is compared to Emdash, Conductor, Antigravity, Opencode, Goose, etc.; some see it as OpenAI’s first‑party version of existing multi‑agent/worktree managers.
Platform, UI & implementation
- The Mac-only (and initially ARM-only) launch frustrates Windows/Linux users; OpenAI staff say Electron was chosen specifically so Windows/Linux builds can ship soon, with Windows delayed by sandboxing work.
- Strong debate over Electron:
  - Critics see it as bloated, unprofessional for a company of this size, and symptomatic of “AI-built Electron slop.”
  - Defenders argue cross-platform speed and a shared web stack outweigh native UX, and that most users won’t care.
- Multiple complaints that the app feels unpolished and confusing; the demo game’s rough edges and sped‑up video are cited as bad optics.
Security & deployment
- Some refuse to run Codex outside a VM; others point to Codex’s documented macOS/Linux sandbox and to third-party tools that isolate the CLI further (see the container sketch after the list below).
- Strong desire for:
  - Remote/self-hosted targets (SSH/VM/servers) with good orchestration, not just local worktrees.
  - Seamless mobile handoff (phone as controller for a laptop/server session).
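
For readers wanting isolation beyond the documented sandbox, here is a rough sketch of one common third-party pattern: running the CLI in a disposable container that can only write to the mounted project. The image name `codex-cli` and the `codex exec` subcommand are assumptions, not a documented setup:

```python
import subprocess

def run_isolated(project_dir: str, prompt: str) -> int:
    """Run one Codex task inside a throwaway container.

    Only the project directory is mounted, so stray writes stay inside
    it; the network stays on because the CLI must reach the model API.
    "codex-cli" is a hypothetical image you would build yourself, with
    the CLI installed and the API key passed via env var, not baked in.
    """
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{project_dir}:/work",  # the only writable host path
        "-w", "/work",
        "-e", "OPENAI_API_KEY",        # forward the key from the host env
        "codex-cli",                   # hypothetical image with the CLI
        "codex", "exec", prompt,
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    run_isolated("/path/to/repo", "Upgrade the lockfile and run the tests")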
Limits, pricing & strategy
- Free ChatGPT users temporarily get Codex; paid plans get doubled Codex limits. This is widely read as a competitive move against Claude Code.
- Experiences with limits differ:
  - Some never hit Codex caps but regularly exhaust Claude’s usage.
  - Others hit Claude/Codex limits by running many agents in parallel and argue you “should” max out the subsidized compute.
- Broader strategic thread: perception that model quality gains are slowing and labs are pivoting to vertical integration, lock‑in, and workflow tooling (agents, MCP, orchestration) rather than pure model advances.
Attitudes toward AI coding
- Opinions range from enthusiastic (“ticket-taking coders are doomed; this lets one person do team‑sized projects”) to skeptical or hostile (“I don’t want to depend on AI; feels like busywork supervising fallible agents”).
- Several emphasize AI’s sweet spot today as “code monkey for tedious plumbing,” not unsupervised greenfield development.