Anonymous request-token comparisons between Opus 4.6 and Opus 4.7
Tokenization change & cost impact
- Opus 4.7 uses a new tokenizer that often produces roughly 1.0–1.35× as many input tokens as 4.6 for the same text; some users measured ~30–45% increases, and pathological prompts see increases of ~90% or more.
- Since per-token pricing is unchanged, many view this as an implicit price hike: "same text, ~35% more tokens = ~35% higher input cost" (see the back-of-envelope sketch below).
- Several users report blowing through daily/weekly subscription limits much faster than with 4.6, especially on Max / xhigh effort.
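A back-of-envelope sketch of the claimed effect; all prices and token counts below are illustrative placeholders, not measurements:

```python
# Back-of-envelope input-cost comparison under ~35% tokenizer inflation.
# Prices and token counts are illustrative placeholders, not measured values.

INPUT_PRICE_PER_MTOK = 15.00   # hypothetical $ per million input tokens (unchanged across versions)
TOKENS_46 = 1_000_000          # input tokens a workload produced under the 4.6 tokenizer
INFLATION = 1.35               # 4.7 tokenizer emits ~1.35x as many tokens for the same text

tokens_47 = TOKENS_46 * INFLATION
cost_46 = TOKENS_46 / 1e6 * INPUT_PRICE_PER_MTOK
cost_47 = tokens_47 / 1e6 * INPUT_PRICE_PER_MTOK

print(f"4.6 input cost: ${cost_46:.2f}")
print(f"4.7 input cost: ${cost_47:.2f} (+{(cost_47 / cost_46 - 1) * 100:.0f}%)")
# Same per-token price, ~35% more tokens -> ~35% higher input cost for identical text.
```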
Overall cost per task
- Some analyses (the ArtificialAnalysis benchmark) show 4.7 using fewer output and reasoning tokens than 4.6, making full-suite evals ~10–11% cheaper despite the higher input token counts.
- Others counter that their real workloads, especially code agents and Claude Code, are heavily input-weighted, so net cost is higher in practice.
- Thread consensus: token-count comparisons alone are incomplete; what matters is $/task, but that is highly use-case dependent and currently unclear (a toy per-task calculation is sketched below).
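A toy $/task calculation, with made-up prices and token mixes, showing why eval-style workloads and input-heavy agent workloads can point in opposite directions:

```python
# Toy $/task comparison; all prices and token counts are made up for illustration.

def task_cost(input_tok, output_tok, in_price=15.0, out_price=75.0):
    """Dollar cost of one task, given token counts and $ per million tokens."""
    return input_tok / 1e6 * in_price + output_tok / 1e6 * out_price

# Eval-style task: 4.7 pays ~35% more input but spends fewer output/reasoning tokens.
eval_46 = task_cost(input_tok=50_000, output_tok=20_000)
eval_47 = task_cost(input_tok=67_500, output_tok=14_000)

# Input-heavy coding-agent task: huge context, short diffs.
agent_46 = task_cost(input_tok=400_000, output_tok=5_000)
agent_47 = task_cost(input_tok=540_000, output_tok=4_500)

print(f"eval-style task:  4.6 ${eval_46:.2f} vs 4.7 ${eval_47:.2f}")    # 4.7 comes out cheaper
print(f"agent-style task: 4.6 ${agent_46:.2f} vs 4.7 ${agent_47:.2f}")  # 4.7 comes out more expensive
# Which model is cheaper per task depends entirely on the input/output mix.
```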
Model quality: 4.7 vs 4.6 (and 4.5)
- Experiences are sharply split:
  - Some say 4.7 is “absolute fire,” more capable, better at long-running context, more self‑critical, and more “senior engineer”-like.
  - Others see regressions: more cycling, more “vibing” instead of precise changes, hand‑waving over hard issues, worse coding behavior, and more safety overreach (e.g., refusing harmless puzzles).
- Several power users still prefer Opus 4.5 for tight, fine-grained coding instructions; 4.6/4.7 are seen as optimized for long, agentic tasks.
Adaptive thinking, effort levels & hidden tokens
- 4.7 uses “adaptive thinking” by default; users can set an effort level, but in some harnesses cannot fully revert to the fixed thinking budgets available with 4.6 (see the sketch after this list).
- Many report long “thinking” phases that burn reasoning tokens yet still produce shallow or incorrect answers, especially under adaptive thinking.
- Confusion around prompt caching (5 min vs 1h TTL, feature flags, telemetry) and compaction behavior leads to surprise usage spikes.
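For contrast with the adaptive default, a minimal sketch of pinning a fixed thinking budget via the Anthropic Messages API, roughly the control 4.6-era workflows relied on; the model id and token numbers are placeholders, and whether a given harness still exposes this varies:

```python
# Minimal sketch: requesting a fixed thinking budget instead of relying on adaptive defaults.
# Model id, max_tokens, and budget_tokens are placeholders; harness support varies.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",                                # placeholder model id
    max_tokens=16_000,                                      # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8_000},   # hard cap on reasoning tokens
    messages=[{"role": "user", "content": "Refactor this function without changing behavior: ..."}],
)

# Per-request usage accounting makes the "hidden" reasoning spend visible.
print(response.usage.input_tokens, response.usage.output_tokens)
```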
Pricing, usage limits & “enshittification” concerns
- Numerous comments see this as the end of heavy subsidies and the start of incremental improvements with sharply higher effective prices.
- Some perceive a casino/Tinder-style engagement pattern: models meander or require multiple attempts, encouraging more token spend.
- Subscriptions (esp. Pro, even Max 5×) are increasingly described as insufficient for sustained heavy coding work; some are canceling or switching providers.
Open / local models as alternatives
- Many are moving to, or experimenting with, open models (GLM 5.1, Qwen 3.5/3.6, MiniMax, DeepSeek, Gemma 4) via infra providers or local setups.
- Opinions diverge:
  - Some claim GLM-level models are close to Sonnet/older Opus for coding at a fraction of the cost; others insist no open model yet matches frontier Opus.
  - Local near‑SOTA requires serious hardware (tens of GB VRAM or more); ROI is questionable for individuals, more plausible for larger teams (a rough estimate is sketched after this list).
- Privacy, control, and resilience to vendor rug-pulls are major reasons cited for moving off proprietary APIs.
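A rough rule of thumb behind the “tens of GB VRAM” claim: weights take roughly parameters × bytes per parameter, plus KV-cache and runtime overhead. A small sketch with example sizes and quantizations:

```python
# Rough VRAM estimate for hosting dense model weights locally:
# parameters x bytes per weight, plus a crude allowance for KV cache / runtime overhead.
# Model sizes are examples only; MoE models and long contexts shift the numbers a lot.

def est_vram_gb(params_billion, bits_per_weight, overhead_gb=4.0):
    weights_gb = params_billion * bits_per_weight / 8  # 1e9 params * bytes/weight ~= GB
    return weights_gb + overhead_gb

for name, params_billion in [("7B", 7), ("32B", 32), ("70B", 70)]:
    q4 = est_vram_gb(params_billion, bits_per_weight=4)
    fp16 = est_vram_gb(params_billion, bits_per_weight=16)
    print(f"{name}: ~{q4:.0f} GB at 4-bit, ~{fp16:.0f} GB at fp16")
# 7B: ~8/18 GB, 32B: ~20/68 GB, 70B: ~39/144 GB; near-SOTA open models sit at the high end.
```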
Developer workflows & business implications
- Many indie/bootstrapped founders say a ~30–45% token inflation “breaks” their unit economics, pushing them toward dual‑model architectures (cheap model for bulk work, expensive model only for final output); see the routing sketch after this list.
- Discussion highlights classic platform risk: “building on someone else’s land,” expectation of future price hikes, and likely “Sherlocking” of AI-based products.
- Some report real productivity gains (1.5×–10×) in established orgs, but others note that more code does not linearly translate to more revenue.
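A minimal sketch of the dual-model pattern described above: route bulk or intermediate steps to a cheap model and reserve the expensive model for the final, user-visible output. Model names and the routing rule are assumptions for illustration:

```python
# Sketch of a dual-model router: cheap model for bulk/intermediate work,
# expensive model only for the final deliverable. Names and rule are illustrative.
from dataclasses import dataclass

@dataclass
class Step:
    prompt: str
    is_final_output: bool = False

CHEAP_MODEL = "small-open-model"    # e.g. a hosted GLM/Qwen-class model (assumption)
EXPENSIVE_MODEL = "frontier-model"  # e.g. an Opus-class model (assumption)

def pick_model(step: Step) -> str:
    """Send only the final deliverable to the expensive model."""
    return EXPENSIVE_MODEL if step.is_final_output else CHEAP_MODEL

def run(steps: list[Step]) -> None:
    for step in steps:
        model = pick_model(step)
        # call_llm(model, step.prompt) would go here; stubbed out for the sketch
        print(f"{model}: {step.prompt}")

run([
    Step("Summarize the repo structure and open issues"),
    Step("Draft three candidate refactor plans"),
    Step("Write the final PR description and patch", is_final_output=True),
])
```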
Skill atrophy, dependence & safety debates
- The thread contains a long sub‑discussion on whether heavy LLM use causes skill atrophy or accelerates learning.
- Views range from “you haven’t learned anything you can’t redo without AI” to “I’ve never learned faster; AI lets me explore areas I’d never have time for.”
- Several worry about deep dependence: if access or pricing changes abruptly, many workflows may collapse.
Benchmarks, evals & ambiguity
- There is skepticism toward benchmarks (including community ELO charts): models are suspected of being overtrained on evals, and benchmarks often don’t reflect messy real-world coding sessions.
- Overall sentiment: 4.7’s true cost–performance profile remains ambiguous; users want task‑level, end‑to‑end comparisons rather than token or benchmark slices.