The Codex App
Model quality & comparisons
- Many compare Codex to Claude Code and Gemini:
  - Some report Codex 5.2 (especially Codex High) excels at backend, logic-heavy, and complex multi-step work while struggling more with UI/frontend details.
  - Others say Opus 4.5 “wins” consistently on real-world codebases, and find Codex and Gemini slower and less “smart” by comparison.
  - A subset find Codex “lazy” or “stupid”: poor doc lookup, shallow research, ignoring instructions, reverting to old framework versions, or giving up early; they see Claude Code as faster, more reliable, and better at one-shotting tasks.
  - There are also opposite anecdotes: Codex reliably fixing Claude’s mistakes, doing stronger code review, or solving problems Claude couldn’t.
Workflows & agent usage
- Several treat Codex/Claude as mid‑level “ticket taker” engineers: humans write detailed specs and plans, agents do grunt work, humans review.
- Split between:
  - “Plan-first” workflows (requirements → plan.md → review → execute) to avoid drift.
  - “Just do it” workflows where Codex is allowed to run longer, with users only supervising diffs.
- Parallel/multi-agent:
  - The Codex app supports up to 4 agents per project; some already emulate this with multiple CLI sessions or tmux (see the sketch after this list).
  - Advocates see value in parallelizing work; skeptics liken unsupervised agents to outsourcing that returns an unmaintainable mess.
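
A minimal sketch of what that emulation might look like, assuming the Codex CLI exposes a non-interactive `codex exec <prompt>` mode; the worktree paths and prompts are illustrative, not from the thread:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Illustrative task list: one git worktree per agent so edits don't
# collide, mirroring the app's agent-per-worktree model.
TASKS = [
    ("../wt-auth", "Add input validation to the login handler"),
    ("../wt-docs", "Update the README for the new config flags"),
    ("../wt-tests", "Write unit tests for the retry logic"),
    ("../wt-lint", "Fix all linter warnings in src/"),
]

def run_agent(worktree: str, prompt: str) -> str:
    # Assumes a `codex exec` subcommand that takes a prompt and exits
    # when done; adjust to whatever your CLI version actually supports.
    result = subprocess.run(
        ["codex", "exec", prompt],
        cwd=worktree,
        capture_output=True,
        text=True,
    )
    return f"{worktree}: exit={result.returncode}"

# Cap at 4 workers to match the app's per-project agent limit.
with ThreadPoolExecutor(max_workers=4) as pool:
    for summary in pool.map(lambda t: run_agent(*t), TASKS):
        print(summary)
```

Each worktree would be created beforehand with `git worktree add`, which is roughly what the tmux crowd does by hand: one pane per worktree, one session per agent, diffs reviewed separately.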
Codex app vs IDEs and other orchestrators
- Many ask why they should leave VS Code/Cursor/Claude Code + IDE integrations:
  - VS Code + Codex is still preferred for deep, hands-on coding; the Codex app is pitched as a higher-level supervisor for multiple agents/projects, with built-in git, diffs, terminal, and automations.
  - Some dislike that code editing is de-emphasized and prefer an agent in the sidebar plus a full IDE (e.g., Claude Code in Zed).
- The app is compared to Emdash, Conductor, Antigravity, Opencode, Goose, etc.; some see it as OpenAI’s first‑party version of existing multi‑agent/worktree managers.
Platform, UI & implementation
- The Mac-only (and initially ARM-only) launch frustrates Windows/Linux users; OpenAI staff say Electron was chosen specifically so Windows/Linux builds can ship soon, with Windows delayed by sandboxing work.
- Strong debate over Electron:
  - Critics see it as bloated, unprofessional for a company of this size, and symptomatic of “AI-built Electron slop.”
  - Defenders argue cross-platform speed and a shared web stack outweigh native UX, and that most users won’t care.
- Multiple complaints that the app feels unpolished and confusing; the demo game’s rough edges and sped‑up video are cited as bad optics.
Security & deployment
- Some refuse to run Codex outside a VM; others point to Codex’s documented macOS/Linux sandbox and to third-party tools that isolate the CLI further (see the container sketch after the list below).
- Strong desire for:
  - Remote/self-hosted targets (SSH/VM/servers) with good orchestration, not just local worktrees.
  - Seamless mobile handoff (phone as controller for a laptop/server session).
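
For readers wanting isolation beyond the documented sandbox, here is a rough sketch of one common third-party pattern: running the CLI in a disposable container that can only write to the mounted project. The image name `codex-cli` and the `codex exec` subcommand are assumptions, not a documented setup:

```python
import subprocess

def run_isolated(project_dir: str, prompt: str) -> int:
    """Run one Codex task inside a throwaway container.

    Only the project directory is mounted, so stray writes stay inside
    it; the network stays on because the CLI must reach the model API.
    "codex-cli" is a hypothetical image you would build yourself, with
    the CLI installed and the API key passed via env var, not baked in.
    """
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{project_dir}:/work",  # the only writable host path
        "-w", "/work",
        "-e", "OPENAI_API_KEY",        # forward the key from the host env
        "codex-cli",                   # hypothetical image with the CLI
        "codex", "exec", prompt,
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    run_isolated("/path/to/repo", "Upgrade the lockfile and run the tests")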
Limits, pricing & strategy
- Free ChatGPT users temporarily get Codex; paid plans get doubled Codex limits. This is widely read as a competitive move against Claude Code.
- Experiences with limits differ:
  - Some never hit Codex caps but regularly exhaust Claude’s usage.
  - Others hit Claude/Codex limits by running many agents in parallel and argue you “should” max out the subsidized compute.
- Broader strategic thread: perception that model quality gains are slowing and labs are pivoting to vertical integration, lock‑in, and workflow tooling (agents, MCP, orchestration) rather than pure model advances.
Attitudes toward AI coding
- Opinions range from enthusiastic (“ticket-taking coders are doomed; this lets one person do team‑sized projects”) to skeptical or hostile (“I don’t want to depend on AI; feels like busywork supervising fallible agents”).
- Several emphasize AI’s sweet spot today as “code monkey for tedious plumbing,” not unsupervised greenfield development.