Claude Opus 4.5
Pricing, Token Usage, and Limits
- Opus 4.5’s $5 / $25 per million input/output tokens is roughly a 3× cut from Opus 4.1 ($15 / $75) and close to Gemini 3 Pro pricing; many see this as the most important part of the launch.
- Several users note that Opus 4.5 often uses far fewer tokens than Sonnet 4.5 for the same coding task, so cost per task can be lower despite the higher per‑token price (see the cost sketch after this list). Others complain that Claude in general is extremely verbose and wastes output tokens.
- Opus‑specific caps in Claude/Claude Code have been removed, and Max and Team limits were raised, so Opus now effectively replaces Sonnet at similar total token budgets. Some still feel Anthropic’s quotas are much tighter than OpenAI’s or Google’s.
- Practical cost comparisons from agent builders suggest Opus 4.5 can be roughly on par with or cheaper than Gemini 3 Pro per successful thread, but Gemini’s nominal per‑token price remains lower.
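To make the “fewer tokens per task vs. higher per‑token price” argument concrete, here is a minimal cost sketch. Only the Opus 4.5 prices above come from the discussion; the Sonnet 4.5 list price and the per‑task token counts are illustrative assumptions, not measured figures.

```python
# Hypothetical cost-per-task comparison. Prices are USD per million tokens;
# the per-task token counts below are illustrative assumptions.

PRICES = {                        # (input, output) USD per 1M tokens
    "opus-4.5": (5.00, 25.00),
    "sonnet-4.5": (3.00, 15.00),  # assumed Sonnet 4.5 list price
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task for the given token usage."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assumed scenario: Opus solves the task with far fewer tokens than Sonnet.
opus = task_cost("opus-4.5", input_tokens=40_000, output_tokens=6_000)
sonnet = task_cost("sonnet-4.5", input_tokens=90_000, output_tokens=20_000)

print(f"Opus 4.5:   ${opus:.2f} per task")    # $0.35
print(f"Sonnet 4.5: ${sonnet:.2f} per task")  # $0.57
```

Under these assumed numbers the pricier model ends up cheaper per completed task, which is the shape of the argument several commenters make.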
Perceived Quality, Degradation, and “Nerf Cycle”
- Multiple users report Sonnet 4.5 feeling “dumber” or more erratic in recent weeks, especially in Claude Code and the CLI; hypotheses include overload, quiet model swaps or quantization, and more aggressive routing to cheaper variants.
- Others argue this is largely psychological and not reflected in benchmarks, or is attributable to bugs Anthropic has previously acknowledged.
- A recurring narrative: new models launch strong, gradually feel worse, then a new version appears; some now judge vendors by these “nerf cycles” and complain about the lack of transparent, continuous public benchmarks for specific model versions.
Coding Performance and Workflows
- Heavy Claude Code users report Opus 4.5 feels faster than 4.1 and noticeably stronger than Sonnet 4.5 at planning, multi-file refactors, and complex bug hunting. Some say it finally resolves problems that stumped earlier models.
- Others still prefer Gemini 3 or GPT‑5.1 Codex for certain debugging or large‑architecture tasks, but often pair models: e.g., Gemini for high‑level design, Claude/Sonnet/Composer for implementation.
- Sub‑agent and tool‑calling workflows are a major theme: users wire Claude to other models (Codex, Gemini) to cross‑check plans (see the sketch after this list), or rely on Claude Code’s built‑in editor tools and MCP ecosystem. Some note that when these agentic flows go wrong, they burn huge token budgets.
- Haiku 4.5 gets mixed reviews: very fast and cheap, but several say it misdiagnoses nontrivial bugs and falls short of Sonnet‑level reasoning.
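As a rough illustration of the cross‑check pattern above (one model drafts a plan, a second model from another vendor reviews it), here is a minimal sketch using the Anthropic and OpenAI Python SDKs. The model identifiers and prompts are placeholders, not a recommendation of specific versions.

```python
# Minimal cross-check sketch: Claude drafts a refactoring plan, a second
# vendor's model critiques it before any code is written.
# Model IDs below are placeholders; API keys are read from the environment.
import anthropic
from openai import OpenAI

claude = anthropic.Anthropic()   # uses ANTHROPIC_API_KEY
reviewer = OpenAI()              # uses OPENAI_API_KEY

TASK = "Refactor the payment module to separate validation from persistence."

# Step 1: draft a plan with Claude.
plan_msg = claude.messages.create(
    model="claude-opus-4-5",     # placeholder model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Write a step-by-step plan: {TASK}"}],
)
plan = plan_msg.content[0].text

# Step 2: have a second model look for gaps or risky steps in the plan.
review = reviewer.chat.completions.create(
    model="gpt-5.1",             # placeholder model ID
    messages=[{"role": "user",
               "content": f"Critique this plan for gaps or risky steps:\n\n{plan}"}],
)

print("PLAN:\n", plan)
print("REVIEW:\n", review.choices[0].message.content)
```

Whether the cross‑check pays off depends on how often the second model catches real issues; as noted above, when these flows loop or disagree they can burn large token budgets.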
Benchmarks, Charts, and Evaluation
- Anthropic’s focus on SWE‑bench Verified is welcomed by people invested in agentic coding, but others think it makes Opus look like a one‑trick pony.
- Many criticize the blog’s charts for truncated y‑axes and omission of Haiku, calling them “chart crimes” and marketing‑driven.
- There’s concern that SWE‑bench is nearing saturation and that models are increasingly tuned to public benchmarks; several want per‑task cost metrics, failure‑mode analysis by issue type, and new evals that reflect long‑horizon, real‑world coding.
Competition, Safety, and Trust
- Experience with Gemini 3 Pro and GPT‑5.1 is highly mixed: some find Gemini “hot garbage” at coding but great for SQL, analysis, and long‑context research; others say Antigravity+Gemini is far ahead of Claude Code for agentic workflows.
- Claude is often praised for its “developer focus” and coding quality, but criticized for rate limits, for privacy policy changes (prompt reuse), and for lobbying against open‑weight models; opinions diverge on whether its safety posture is genuinely ethical or mostly regulatory capture.
- Opus 4.5’s system card (especially on prompt injection and CBRN risk) is appreciated as unusually detailed, but quick jailbreak demos and ongoing alignment debates leave some skeptical of “most aligned model” claims.