Anonymous request-token comparisons between Opus 4.6 and Opus 4.7
Tokenization change & cost impact
- Opus 4.7 uses a new tokenizer that often produces roughly 1.0–1.35× as many input tokens as 4.6 for the same text; some users measured ~30–45% increases, and pathological prompts see increases of ~90% or more.
- Since per-token pricing is unchanged, many view this as an implicit price hike: "same text, ~35% more tokens = ~35% higher input cost" (see the back-of-envelope sketch below).
- Several users report blowing through daily/weekly subscription limits much faster than with 4.6, especially on Max / xhigh effort.
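A back-of-envelope sketch of the claimed effect; all prices and token counts below are illustrative placeholders, not measurements:

```python
# Back-of-envelope input-cost comparison under ~35% tokenizer inflation.
# Prices and token counts are illustrative placeholders, not measured values.

INPUT_PRICE_PER_MTOK = 15.00   # hypothetical $ per million input tokens (unchanged across versions)
TOKENS_46 = 1_000_000          # input tokens a workload produced under the 4.6 tokenizer
INFLATION = 1.35               # 4.7 tokenizer emits ~1.35x as many tokens for the same text

tokens_47 = TOKENS_46 * INFLATION
cost_46 = TOKENS_46 / 1e6 * INPUT_PRICE_PER_MTOK
cost_47 = tokens_47 / 1e6 * INPUT_PRICE_PER_MTOK

print(f"4.6 input cost: ${cost_46:.2f}")
print(f"4.7 input cost: ${cost_47:.2f} (+{(cost_47 / cost_46 - 1) * 100:.0f}%)")
# Same per-token price, ~35% more tokens -> ~35% higher input cost for identical text.
```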
Overall cost per task
- Some analyses (the ArtificialAnalysis benchmark) show 4.7 using fewer output and reasoning tokens than 4.6, making full-suite evals ~10–11% cheaper despite the higher input token counts.
- Others counter that their real workloads, especially code agents and Claude Code, are heavily input-weighted, so net cost is higher in practice.
- Thread consensus: token-count comparisons alone are incomplete; what matters is $/task, but that is highly use-case dependent and currently unclear (a toy per-task calculation is sketched below).
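A toy $/task calculation, with made-up prices and token mixes, showing why eval-style workloads and input-heavy agent workloads can point in opposite directions:

```python
# Toy $/task comparison; all prices and token counts are made up for illustration.

def task_cost(input_tok, output_tok, in_price=15.0, out_price=75.0):
    """Dollar cost of one task, given token counts and $ per million tokens."""
    return input_tok / 1e6 * in_price + output_tok / 1e6 * out_price

# Eval-style task: 4.7 pays ~35% more input but spends fewer output/reasoning tokens.
eval_46 = task_cost(input_tok=50_000, output_tok=20_000)
eval_47 = task_cost(input_tok=67_500, output_tok=14_000)

# Input-heavy coding-agent task: huge context, short diffs.
agent_46 = task_cost(input_tok=400_000, output_tok=5_000)
agent_47 = task_cost(input_tok=540_000, output_tok=4_500)

print(f"eval-style task:  4.6 ${eval_46:.2f} vs 4.7 ${eval_47:.2f}")    # 4.7 comes out cheaper
print(f"agent-style task: 4.6 ${agent_46:.2f} vs 4.7 ${agent_47:.2f}")  # 4.7 comes out more expensive
# Which model is cheaper per task depends entirely on the input/output mix.
```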
Model quality: 4.7 vs 4.6 (and 4.5)
- Experiences are sharply split:
  - Some say 4.7 is “absolute fire,” more capable, better at long-running context, more self‑critical, and more “senior engineer”-like.
  - Others see regressions: more cycling, more “vibing” instead of precise changes, hand‑waving over hard issues, worse coding behavior, and more safety overreach (e.g., refusing harmless puzzles).
- Several power users still prefer Opus 4.5 for tight, fine-grained coding instructions; 4.6/4.7 are seen as optimized for long, agentic tasks.
Adaptive thinking, effort levels & hidden tokens
- 4.7 uses “adaptive thinking” by default; users can set an effort level, but in some harnesses cannot fully revert to the fixed thinking budgets available with 4.6 (see the sketch after this list).
- Many report long “thinking” phases that burn reasoning tokens yet still produce shallow or incorrect answers, especially under adaptive thinking.
- Confusion around prompt caching (5 min vs 1h TTL, feature flags, telemetry) and compaction behavior leads to surprise usage spikes.
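For contrast with the adaptive default, a minimal sketch of pinning a fixed thinking budget via the Anthropic Messages API, roughly the control 4.6-era workflows relied on; the model id and token numbers are placeholders, and whether a given harness still exposes this varies:

```python
# Minimal sketch: requesting a fixed thinking budget instead of relying on adaptive defaults.
# Model id, max_tokens, and budget_tokens are placeholders; harness support varies.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",                                # placeholder model id
    max_tokens=16_000,                                      # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8_000},   # hard cap on reasoning tokens
    messages=[{"role": "user", "content": "Refactor this function without changing behavior: ..."}],
)

# Per-request usage accounting makes the "hidden" reasoning spend visible.
print(response.usage.input_tokens, response.usage.output_tokens)
```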
Pricing, usage limits & “enshittification” concerns
- Numerous comments see this as the end of heavy subsidies and the start of incremental improvements with sharply higher effective prices.
- Some perceive a casino/Tinder-style engagement pattern: models meander or require multiple attempts, encouraging more token spend.
- Subscriptions (esp. Pro, even Max 5×) are increasingly described as insufficient for sustained heavy coding work; some are canceling or switching providers.
Open / local models as alternatives
- Many are moving to, or experimenting with, open models (GLM 5.1, Qwen 3.5/3.6, MiniMax, DeepSeek, Gemma 4) via infra providers or local setups.
- Opinions diverge:
  - Some claim GLM-level models are close to Sonnet/older Opus for coding at a fraction of the cost; others insist no open model yet matches frontier Opus.
  - Local near‑SOTA requires serious hardware (tens of GB VRAM or more); ROI is questionable for individuals, more plausible for larger teams (a rough estimate is sketched after this list).
- Privacy, control, and resilience to vendor rug-pulls are major reasons cited for moving off proprietary APIs.
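A rough rule of thumb behind the “tens of GB VRAM” claim: weights take roughly parameters × bytes per parameter, plus KV-cache and runtime overhead. A small sketch with example sizes and quantizations:

```python
# Rough VRAM estimate for hosting dense model weights locally:
# parameters x bytes per weight, plus a crude allowance for KV cache / runtime overhead.
# Model sizes are examples only; MoE models and long contexts shift the numbers a lot.

def est_vram_gb(params_billion, bits_per_weight, overhead_gb=4.0):
    weights_gb = params_billion * bits_per_weight / 8  # 1e9 params * bytes/weight ~= GB
    return weights_gb + overhead_gb

for name, params_billion in [("7B", 7), ("32B", 32), ("70B", 70)]:
    q4 = est_vram_gb(params_billion, bits_per_weight=4)
    fp16 = est_vram_gb(params_billion, bits_per_weight=16)
    print(f"{name}: ~{q4:.0f} GB at 4-bit, ~{fp16:.0f} GB at fp16")
# 7B: ~8/18 GB, 32B: ~20/68 GB, 70B: ~39/144 GB; near-SOTA open models sit at the high end.
```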
Developer workflows & business implications
- Many indie/bootstrapped founders say a ~30–45% token inflation “breaks” their unit economics, pushing them toward dual‑model architectures (cheap model for bulk work, expensive model only for final output); see the routing sketch after this list.
- Discussion highlights classic platform risk: “building on someone else’s land,” expectation of future price hikes, and likely “Sherlocking” of AI-based products.
- Some report real productivity gains (1.5×–10×) in established orgs, but others note that more code does not linearly translate to more revenue.
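A minimal sketch of the dual-model pattern described above: route bulk or intermediate steps to a cheap model and reserve the expensive model for the final, user-visible output. Model names and the routing rule are assumptions for illustration:

```python
# Sketch of a dual-model router: cheap model for bulk/intermediate work,
# expensive model only for the final deliverable. Names and rule are illustrative.
from dataclasses import dataclass

@dataclass
class Step:
    prompt: str
    is_final_output: bool = False

CHEAP_MODEL = "small-open-model"    # e.g. a hosted GLM/Qwen-class model (assumption)
EXPENSIVE_MODEL = "frontier-model"  # e.g. an Opus-class model (assumption)

def pick_model(step: Step) -> str:
    """Send only the final deliverable to the expensive model."""
    return EXPENSIVE_MODEL if step.is_final_output else CHEAP_MODEL

def run(steps: list[Step]) -> None:
    for step in steps:
        model = pick_model(step)
        # call_llm(model, step.prompt) would go here; stubbed out for the sketch
        print(f"{model}: {step.prompt}")

run([
    Step("Summarize the repo structure and open issues"),
    Step("Draft three candidate refactor plans"),
    Step("Write the final PR description and patch", is_final_output=True),
])
```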
Skill atrophy, dependence & safety debates
- The thread contains a long sub‑discussion on whether heavy LLM use causes skill atrophy or accelerates learning.
- Views range from “you haven’t learned anything you can’t redo without AI” to “I’ve never learned faster; AI lets me explore areas I’d never have time for.”
- Several worry about deep dependence: if access or pricing changes abruptly, many workflows may collapse.
Benchmarks, evals & ambiguity
- There is skepticism toward benchmarks (including community ELO charts): models are suspected of being overtrained on evals, and benchmarks often don’t reflect messy real-world coding sessions.
- Overall sentiment: 4.7’s true cost–performance profile remains ambiguous; users want task‑level, end‑to‑end comparisons rather than token or benchmark slices.