GPT-5.4

Model capabilities & context window

  • GPT‑5.4’s standout feature is a 1M+ token context window, but many note that performance degrades beyond ~200–272k tokens (“context rot”) and that long‑context benchmark scores fall off sharply.
  • Some see the 1M window as mainly useful for niche tasks (reverse engineering, huge codebases, long cross‑file refactors, OS interaction tests), while others call relying on it an anti‑pattern compared with better compaction and retrieval.
  • OpenAI staff in the thread emphasize compaction plus a shorter effective context as the default; the 1M window is described as experimental and more costly (a toy sketch of compaction follows this list).
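
As a rough illustration of the compaction approach staff describe, here is a toy Python sketch: older turns get folded into a single summary once the history exceeds a working budget. The `summarize` function stands in for a model call, and every threshold here is made up; the real compaction logic is not public.

```python
def summarize(turns: list[str]) -> str:
    """Placeholder for an LLM call that condenses older turns."""
    return "SUMMARY: " + " | ".join(t[:40] for t in turns)

def compact(history: list[str], keep_last: int = 8,
            budget_tokens: int = 200_000, est_tokens_per_turn: int = 2_000) -> list[str]:
    """Keep the most recent turns verbatim; fold everything older into one
    summary turn once the estimated token count exceeds the budget."""
    if len(history) * est_tokens_per_turn <= budget_tokens:
        return history  # still within the working budget; no compaction
    older, recent = history[:-keep_last], history[-keep_last:]
    return [summarize(older)] + recent
```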

Pricing & costs

  • Base API pricing for GPT‑5.4 is seen as competitive vs Opus and Gemini; GPT‑5.4 Pro is widely viewed as extremely expensive ($30/M input, $180/M output).
  • There’s initial confusion, later clarified: once a session grows beyond ~272k tokens, the entire session is billed at 2× the input rate and 1.5× the output rate (see the worked example after this list).
  • Several compare subscription value: many say Codex plans (even at $20) give far more usable work than Claude’s cheaper tiers.
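
To make the pricing arithmetic concrete, here is a back‑of‑the‑envelope cost model using the Pro‑tier rates reported in the thread ($30/M input, $180/M output) and the clarified long‑context rule. Whether the threshold counts input tokens only, and how billing applies exactly, are assumptions for illustration.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # approximate cutoff discussed in the thread

def session_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 30.0, out_rate: float = 180.0) -> float:
    """Estimated USD cost; rates are $ per million tokens (Pro-tier figures).
    Assumption: the threshold is checked against input (context) tokens and
    the multiplier then applies to the whole session, per the clarification."""
    long_ctx = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if long_ctx else (1.0, 1.0)
    return (input_tokens / 1e6) * in_rate * in_mult \
         + (output_tokens / 1e6) * out_rate * out_mult

# A 300k-input / 20k-output Pro session:
# 0.3 * $30 * 2 + 0.02 * $180 * 1.5 = $18.00 + $5.40 = $23.40
print(f"${session_cost(300_000, 20_000):.2f}")  # -> $23.40
```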

Coding, agents & Codex vs 5.4

  • Codex 5.3 is praised as a strong coding agent: better at implementation, database queries, and cybersecurity workflows than non‑Codex GPTs, often rivaling Claude Opus.
  • Some report 5.4 feels like a meaningful upgrade for coding and planning; others say 5.3‑Codex is still superior on certain coding benchmarks (e.g., Terminal Bench) or more “intelligent” in agents.
  • Multi‑agent workflows (Claude + Codex, etc.) are common; people highlight compaction control, AGENTS.md files, and context management as the major practical issues (a sketch of AGENTS.md merging follows this list).
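
AGENTS.md files carry repo‑ and directory‑level instructions for coding agents. How a given harness merges them is implementation‑specific; the sketch below assumes a walk‑up‑to‑the‑repo‑root scheme with deeper files taking precedence, purely for illustration.

```python
from pathlib import Path

def collect_agents_md(start: Path) -> str:
    """Gather AGENTS.md content from `start` up to the repo root.
    Root-level guidance comes first, directory-specific guidance last,
    so the more specific instructions effectively win."""
    found = []
    for d in [start.resolve(), *start.resolve().parents]:
        f = d / "AGENTS.md"
        if f.is_file():
            found.append(f"## From {f}\n{f.read_text()}")
        if (d / ".git").exists():  # treat the git root as the boundary
            break
    return "\n\n".join(reversed(found))

# Prepend to the agent's system prompt:
instructions = collect_agents_md(Path.cwd())
```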

UI vs API & browser automation

  • The Gmail “screenshot + coordinate clicking” demo triggers debate (a minimal sketch of the pattern follows this list):
    • Pro‑UI: not everything has full APIs; many services restrict API use; UI interactions are auditable and more universal for agents.
    • Pro‑API: UI driving is brittle and inefficient; APIs are cleaner interfaces when available.
  • Bot detection against GUI‑driven agents is noted as a continuation of the existing automation arms race.
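
For concreteness, the contested pattern looks roughly like this, with pyautogui used for illustration; `locate_button` is a hypothetical stand‑in for the vision model that maps a screenshot to click coordinates.

```python
import pyautogui  # screenshot() and click() are part of its real API

def locate_button(screenshot, label: str) -> tuple[int, int]:
    """Hypothetical: a vision model maps (screenshot, label) -> pixel (x, y)."""
    raise NotImplementedError

shot = pyautogui.screenshot()           # capture the current screen
x, y = locate_button(shot, "Compose")   # model chooses a click target
pyautogui.click(x, y)                   # drive the UI like a human would
```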

Benchmarks, competition & product direction

  • Many see benchmark gains as incremental and converging across frontier models; “products and harnesses, not raw models” are viewed as the real differentiator.
  • Some feel GPT‑5.x writing style and instruction‑following regressed vs older models; others say 5.4 is more concise and less “cringe” than 5.3.
  • Multiple commenters say they now prefer Claude, Gemini, or Qwen for specific tasks; others find Codex + 5.4 clearly better, especially for coding.

Ethics, militarization & user backlash

  • The recent US DoD/military collaboration dominates sentiment for some: several cancel subscriptions, share “QuitGPT” links, or call OpenAI complicit in “mass murder” and surveillance.
  • Safety card data showing a drop in “violence safety score” is seen as alarming by some, ambiguous by others.
  • There is broader anxiety about AI empowering state and corporate violence, set against optimism about AI routing around “enshittified” platforms.