The Codex App

Model quality & comparisons

  • Many compare Codex to Claude Code and Gemini:
    • Some report Codex 5.2 (especially Codex High) excels at backend and logic-heavy tasks and complex multi-step work, but struggles more with UI/frontend details.
    • Others say Opus 4.5 “wins” consistently on real-world codebases; Codex and Gemini are seen as slower and less “smart” for them.
    • A subset find Codex “lazy” or “stupid”: poor doc lookup, shallow research, ignoring instructions, reverting to old framework versions, or giving up early; they see Claude Code as faster, more reliable, and better at one‑shotting tasks.
    • There are also opposite anecdotes: Codex reliably fixing Claude’s mistakes, doing stronger code review, or solving problems Claude couldn’t.

Workflows & agent usage

  • Several treat Codex/Claude as mid-level “ticket-taker” engineers: humans write detailed specs and plans, agents do the grunt work, humans review.
  • Workflows split between:
    • “Plan-first” workflows (requirements → plan.md → review → execute) to avoid drift.
    • “Just do it” workflows where Codex is allowed to run longer, with users only supervising diffs.
  • Parallel/multi‑agent:
    • Codex app supports up to 4 agents per project; some already emulate this with multiple CLI sessions or tmux (see the sketch after this list).
    • Advocates see value in parallelizing work; skeptics liken unsupervised agents to outsourcing that returns an unmaintainable mess.
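
  For those emulating the app's multi-agent support today, here is a minimal sketch of the tmux approach mentioned above, assuming the codex CLI is on PATH and that `codex exec "<prompt>"` runs a non-interactive turn (verify against `codex --help` for your installed version); the session name and task prompts are illustrative.

  ```python
  #!/usr/bin/env python3
  """Spawn one tmux window per task, each running its own codex session."""
  import shlex
  import subprocess

  SESSION = "codex-agents"  # hypothetical tmux session name
  TASKS = [                 # hypothetical task prompts, one per agent
      "Fix the failing unit tests in services/billing",
      "Add request logging middleware to the API gateway",
      "Write docs for the new export endpoint",
  ]

  def tmux(*args: str) -> None:
      """Run a tmux subcommand, raising if it fails."""
      subprocess.run(["tmux", *args], check=True)

  # Detached session whose first window hosts agent 0; one more window per task.
  tmux("new-session", "-d", "-s", SESSION, "-n", "agent-0")
  for i, task in enumerate(TASKS):
      if i > 0:
          tmux("new-window", "-t", SESSION, "-n", f"agent-{i}")
      # Type the command into the window and press Enter (C-m).
      tmux("send-keys", "-t", f"{SESSION}:agent-{i}",
           f"codex exec {shlex.quote(task)}", "C-m")

  print(f"Attach with: tmux attach -t {SESSION}")
  ```

  Each window is an ordinary shell, so diffs can be inspected per task before anything merges, matching the supervise-the-diffs style described above.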

Codex app vs IDEs and other orchestrators

  • Many ask why they should leave VS Code/Cursor/Claude Code + IDE integrations:
    • VS Code + Codex is still preferred for deep, hands-on coding; Codex app is pitched as a higher‑level supervisor for multiple agents/projects, with built‑in git, diffs, terminal, and automations.
    • Some dislike that code editing is de‑emphasized and prefer agent-in-sidebar + full IDE (e.g., Claude Code in Zed).
  • The app is compared to Emdash, Conductor, Antigravity, Opencode, Goose, etc.; some see it as OpenAI’s first‑party version of existing multi‑agent/worktree managers.

Platform, UI & implementation

  • The Mac‑only (and initially ARM‑only) launch frustrates Windows/Linux users; OpenAI staff say Electron was chosen specifically to ship Windows/Linux soon, with Windows delayed by sandboxing.
  • Strong debate over Electron:
    • Critics see it as bloated, unprofessional for a company of this size, and symptomatic of “AI-built Electron slop.”
    • Defenders argue cross‑platform speed and shared web stack outweigh native UX, and most users won’t care.
  • Multiple complaints that the app feels unpolished and confusing; the demo game’s rough edges and sped‑up video are cited as bad optics.

Security & deployment

  • Some refuse to run Codex outside a VM; others point to Codex’s documented macOS/Linux sandbox and third-party tools to further isolate the CLI (a container-based sketch follows this section’s list).
  • Strong desire for:
    • Remote/self‑hosted targets (SSH/VM/servers) with good orchestration, not just local worktrees.
    • Seamless mobile handoff (phone as controller for a laptop/server session).
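
  As a middle ground between the full-VM stance and the built-in sandbox, one way to approximate the isolation people describe is to run the CLI in a throwaway container that can only write to the current project. A minimal sketch, assuming Docker is installed and a hypothetical image `my/codex-sandbox` with the codex CLI baked in; the network stays enabled because the agent still needs to reach the model API.

  ```python
  #!/usr/bin/env python3
  """Launch an interactive codex session inside a disposable Docker container."""
  import os
  import subprocess

  IMAGE = "my/codex-sandbox"  # hypothetical image with the codex CLI installed

  subprocess.run(
      [
          "docker", "run", "--rm", "-it",    # throwaway, interactive container
          "-v", f"{os.getcwd()}:/workspace", # only the project dir is shared
          "-w", "/workspace",                # start the agent inside it
          IMAGE, "codex",
      ],
      check=True,
  )
  ```

  This confines the agent’s file writes to the mounted project directory; the remote/self-hosted targets above would take the same idea a step further.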

Limits, pricing & strategy

  • Free ChatGPT users temporarily get Codex; paid plans get doubled Codex limits. This is widely read as a competitive move against Claude Code.
  • Experiences with limits differ:
    • Some never hit Codex caps but regularly exhaust Claude’s usage.
    • Others hit Claude/Codex limits by running many agents in parallel and argue you “should” max out the subsidized compute.
  • Broader strategic thread: perception that model quality gains are slowing and labs are pivoting to vertical integration, lock‑in, and workflow tooling (agents, MCP, orchestration) rather than pure model advances.

Attitudes toward AI coding

  • Opinions range from enthusiastic (“ticket-taking coders are doomed; this lets one person do team‑sized projects”) to skeptical or hostile (“I don’t want to depend on AI; feels like busywork supervising fallible agents”).
  • Several emphasize AI’s sweet spot today as “code monkey for tedious plumbing,” not unsupervised greenfield development.