GPT-5.4
Model capabilities & context window
- GPT‑5.4’s standout feature is a 1M+ token context window, but many commenters note performance degrades beyond ~200–272k tokens (“context rot”), and long‑context benchmarks fall off sharply.
- Some see 1M as mainly useful for niche tasks (reverse engineering, huge codebases, long cross‑file refactors, OS interaction tests), while others call it an anti‑pattern vs. better compaction and retrieval.
- OpenAI staff in the thread emphasize compaction + shorter effective context as the default; 1M is described as experimental and more costly.
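The compaction approach staff point to can be sketched as a simple context-management loop; the token counter, summarizer, and thresholds below are hypothetical stand-ins, not OpenAI's actual implementation.

```python
# Minimal sketch of conversation compaction: when the transcript grows past a
# token budget, older messages are collapsed into a summary so the effective
# context stays short. All counting/summarizing here is a crude stub.

def count_tokens(text: str) -> int:
    # Rough stand-in: ~4 characters per token.
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    # Stand-in for a model-generated summary of the older turns.
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    total = sum(count_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent

history = [f"message {i}: " + "x" * 400 for i in range(20)]
compacted = compact(history, budget=1000)
print(len(history), "->", len(compacted))  # 20 -> 5
```

The trade-off the thread circles around: a summary loses detail the raw 1M window would keep, but keeps every turn inside the range where the model still performs well.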
Pricing & costs
- Base API pricing for GPT‑5.4 is seen as competitive vs Opus and Gemini; GPT‑5.4 Pro is widely viewed as extremely expensive ($30/M input, $180/M output).
- There’s initial confusion, later clarified in-thread: once a session exceeds ~272k tokens, the entire session (not just the overflow) is billed at 2× for input and 1.5× for output.
- Several compare subscription value: many say Codex plans (even at $20) give far more usable work than Claude’s cheaper tiers.
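The surcharge rule described above makes long sessions discontinuously more expensive. A sketch of that billing logic, using placeholder per-million base rates (the thread only quotes Pro's rates) and assuming the ~272k threshold applies to total session tokens:

```python
# Sketch of the long-context billing rule as reported in the thread:
# past ~272k tokens, ALL input tokens bill at 2x and all output tokens
# at 1.5x. Base rates below are placeholders, not published prices.

LONG_CONTEXT_THRESHOLD = 272_000

def session_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in dollars; rates are $ per 1M tokens."""
    if input_tokens + output_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate *= 2.0    # surcharge applies to the whole session,
        out_rate *= 1.5   # not just the tokens past the threshold
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Just under vs. just over the threshold (placeholder $1.25/M in, $10/M out):
print(session_cost(270_000, 1_000, 1.25, 10.0))  # 0.3475
print(session_cost(300_000, 1_000, 1.25, 10.0))  # 0.765
```

Note the jump: ~11% more tokens roughly doubles the bill, which is why commenters treat crossing the threshold as a decision, not a default.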
Coding, agents & Codex vs 5.4
- Codex 5.3 is praised as a strong coding agent: better at implementation, database queries, and cybersecurity workflows than non‑Codex GPTs, often rivaling Claude Opus.
- Some report 5.4 feels like a meaningful upgrade for coding and planning; others say 5.3‑Codex is still superior on certain coding benchmarks (e.g., Terminal Bench) or more “intelligent” in agents.
- Multi‑agent workflows (Claude + Codex, etc.) are common; people highlight compaction control, AGENTS.md, and context management as major practical issues.
UI vs API & browser automation
- The Gmail “screenshot + coordinate clicking” demo triggers debate:
  - Pro‑UI: not everything has full APIs; many services restrict API use; UI interactions are auditable and more universal for agents.
  - Pro‑API: UI driving is brittle and inefficient; APIs are cleaner interfaces when available.
- Bot detection against GUI‑driven agents is noted as a continuation of the existing automation arms race.
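The GUI-driving pattern under debate reduces to a short loop; everything below is a stub (no real browser, vision model, or input driver), just to make the "screenshot + coordinate clicking" mechanics concrete:

```python
# Abstract sketch of a screenshot-driven agent step: capture the screen,
# ask a vision model where to click, then synthesize the click. Each
# function is a stand-in for the real component.

def take_screenshot() -> bytes:
    return b"<png bytes>"  # stand-in for a real screen capture

def ask_model_for_action(screenshot: bytes, goal: str) -> dict:
    # Stand-in for a vision model returning pixel coordinates to click.
    return {"action": "click", "x": 640, "y": 360}

def click(x: int, y: int) -> str:
    # Stand-in for an OS-level input event.
    return f"clicked at ({x}, {y})"

def gui_agent_step(goal: str) -> str:
    shot = take_screenshot()
    action = ask_model_for_action(shot, goal)
    return click(action["x"], action["y"])

print(gui_agent_step("archive the first email"))  # clicked at (640, 360)
```

The brittleness objection is visible in the sketch itself: the loop depends on pixel coordinates staying valid between capture and click, which any layout change or bot-detection overlay can break; an API-based agent would instead call the service's endpoint directly, when one exists.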
Benchmarks, competition & product direction
- Many see benchmark gains as incremental and converging across frontier models; “products and harnesses, not raw models” are viewed as the real differentiator.
- Some feel GPT‑5.x writing style and instruction‑following regressed vs older models; others say 5.4 is more concise and less “cringe” than 5.3.
- Multiple commenters say they now prefer Claude, Gemini, or Qwen for specific tasks; others find Codex + 5.4 clearly better, especially for coding.
Ethics, militarization & user backlash
- The recent US DoD/military collaboration dominates sentiment for some: several cancel subscriptions, share “QuitGPT” links, or call OpenAI complicit in “mass murder” and surveillance.
- Safety card data showing a drop in “violence safety score” is seen as alarming by some, ambiguous by others.
- There is broader anxiety about AI empowering state and corporate violence vs. optimism about routing around “enshittified” platforms.