GPT-5.4

Model capabilities & context window

  • GPT‑5.4’s standout feature is a 1M+ token context window, but many note that performance degrades beyond ~200–272k tokens (“context rot”) and that long‑context benchmark scores fall off sharply.
  • Some see the 1M window as mainly useful for niche tasks (reverse engineering, huge codebases, long cross‑file refactors, OS interaction tests), while others call relying on it an anti‑pattern compared with better compaction and retrieval.
  • OpenAI staff in the thread emphasize compaction plus a shorter effective context as the default; the 1M window is described as experimental and more costly (a toy sketch of compaction follows this list).
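
As a rough illustration of the compaction approach staff describe, here is a toy Python sketch: older turns get folded into a single summary once the history exceeds a working budget. The `summarize` function stands in for a model call, and every threshold here is made up; the real compaction logic is not public.

```python
def summarize(turns: list[str]) -> str:
    """Placeholder for an LLM call that condenses older turns."""
    return "SUMMARY: " + " | ".join(t[:40] for t in turns)

def compact(history: list[str], keep_last: int = 8,
            budget_tokens: int = 200_000, est_tokens_per_turn: int = 2_000) -> list[str]:
    """Keep the most recent turns verbatim; fold everything older into one
    summary turn once the estimated token count exceeds the budget."""
    if len(history) * est_tokens_per_turn <= budget_tokens:
        return history  # still within the working budget; no compaction
    older, recent = history[:-keep_last], history[-keep_last:]
    return [summarize(older)] + recent
```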

Pricing & costs

  • Base API pricing for GPT‑5.4 is seen as competitive vs Opus and Gemini; GPT‑5.4 Pro is widely viewed as extremely expensive ($30/M input, $180/M output).
  • There’s initial confusion, later clarified: once a session grows beyond ~272k tokens, the entire session is billed at 2× the input rate and 1.5× the output rate (see the worked example after this list).
  • Several compare subscription value: many say Codex plans (even at $20) give far more usable work than Claude’s cheaper tiers.
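
To make the pricing arithmetic concrete, here is a back‑of‑the‑envelope cost model using the Pro‑tier rates reported in the thread ($30/M input, $180/M output) and the clarified long‑context rule. Whether the threshold counts input tokens only, and how billing applies exactly, are assumptions for illustration.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # approximate cutoff discussed in the thread

def session_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 30.0, out_rate: float = 180.0) -> float:
    """Estimated USD cost; rates are $ per million tokens (Pro-tier figures).
    Assumption: the threshold is checked against input (context) tokens and
    the multiplier then applies to the whole session, per the clarification."""
    long_ctx = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if long_ctx else (1.0, 1.0)
    return (input_tokens / 1e6) * in_rate * in_mult \
         + (output_tokens / 1e6) * out_rate * out_mult

# A 300k-input / 20k-output Pro session:
# 0.3 * $30 * 2 + 0.02 * $180 * 1.5 = $18.00 + $5.40 = $23.40
print(f"${session_cost(300_000, 20_000):.2f}")  # -> $23.40
```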

Coding, agents & Codex vs 5.4

  • Codex 5.3 is praised as a strong coding agent: better at implementation, database queries, and cybersecurity workflows than non‑Codex GPTs, often rivaling Claude Opus.
  • Some report 5.4 feels like a meaningful upgrade for coding and planning; others say 5.3‑Codex is still superior on certain coding benchmarks (e.g., Terminal Bench) or more “intelligent” in agents.
  • Multi‑agent workflows (Claude + Codex, etc.) are common; people highlight compaction control, AGENTS.md files, and context management as the major practical issues (a sketch of AGENTS.md merging follows this list).
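
AGENTS.md files carry repo‑ and directory‑level instructions for coding agents. How a given harness merges them is implementation‑specific; the sketch below assumes a walk‑up‑to‑the‑repo‑root scheme with deeper files taking precedence, purely for illustration.

```python
from pathlib import Path

def collect_agents_md(start: Path) -> str:
    """Gather AGENTS.md content from `start` up to the repo root.
    Root-level guidance comes first, directory-specific guidance last,
    so the more specific instructions effectively win."""
    found = []
    for d in [start.resolve(), *start.resolve().parents]:
        f = d / "AGENTS.md"
        if f.is_file():
            found.append(f"## From {f}\n{f.read_text()}")
        if (d / ".git").exists():  # treat the git root as the boundary
            break
    return "\n\n".join(reversed(found))

# Prepend to the agent's system prompt:
instructions = collect_agents_md(Path.cwd())
```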

UI vs API & browser automation

  • The Gmail “screenshot + coordinate clicking” demo triggers debate (a minimal sketch of the pattern follows this list):
    • Pro‑UI: not everything has full APIs; many services restrict API use; UI interactions are auditable and more universal for agents.
    • Pro‑API: UI driving is brittle and inefficient; APIs are cleaner interfaces when available.
  • Bot detection against GUI‑driven agents is noted as a continuation of the existing automation arms race.
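
For concreteness, the contested pattern looks roughly like this, with pyautogui used for illustration; `locate_button` is a hypothetical stand‑in for the vision model that maps a screenshot to click coordinates.

```python
import pyautogui  # screenshot() and click() are part of its real API

def locate_button(screenshot, label: str) -> tuple[int, int]:
    """Hypothetical: a vision model maps (screenshot, label) -> pixel (x, y)."""
    raise NotImplementedError

shot = pyautogui.screenshot()           # capture the current screen
x, y = locate_button(shot, "Compose")   # model chooses a click target
pyautogui.click(x, y)                   # drive the UI like a human would
```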

Benchmarks, competition & product direction

  • Many see benchmark gains as incremental and converging across frontier models; “products and harnesses, not raw models” are viewed as the real differentiator.
  • Some feel GPT‑5.x writing style and instruction‑following regressed vs older models; others say 5.4 is more concise and less “cringe” than 5.3.
  • Multiple commenters say they now prefer Claude, Gemini, or Qwen for specific tasks; others find Codex + 5.4 clearly better, especially for coding.

Ethics, militarization & user backlash

  • The recent US DoD/military collaboration dominates sentiment for some: several cancel subscriptions, share “QuitGPT” links, or call OpenAI complicit in “mass murder” and surveillance.
  • Safety card data showing a drop in “violence safety score” is seen as alarming by some, ambiguous by others.
  • There is broader anxiety about AI empowering state and corporate violence, set against optimism about AI routing around “enshittified” platforms.