Claude Opus 4.7

Model quality vs 4.6

  • Many report that 4.6 became noticeably “dumber” or more erratic in the weeks before 4.7’s release, especially in coding and real‑world assistant tasks; others say they saw no degradation and cite external benchmarks showing stability.
  • Early 4.7 feedback is mixed: some see clear coding improvements (especially at high/xhigh effort), while others say it feels no stronger than late‑4.6, or even sloppier, with more over‑engineering and hallucinations in niche domains.
  • Several note that behavior changes may come more from harness/system‑prompt tweaks (Claude Code, adaptive thinking) than from the raw model.

Tokens, limits, and pricing

  • Strong frustration with opaque session/weekly limits, sudden “burning” of 5‑hour windows in minutes, and perceived “token shrinkage.”
  • The new tokenizer can inflate token counts by up to ~35% for the same text; combined with the raised default effort (xhigh), this is expected to raise effective costs, especially for agentic coding (see the sketch after this list).
  • Some users carefully manage context, prompts, and effort to stay within limits; others feel like they’re “calorie counting” and are anxious about usage bars.
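
A rough back‑of‑the‑envelope sketch of how those two effects compound under a flat per‑token price. Only the ~35% tokenizer figure comes from the reports above; the price and the effort multiplier are illustrative placeholders, not published numbers.

```python
# Estimate how tokenizer inflation and a higher default effort setting
# compound into effective cost. All constants except TOKENIZER_INFLATION
# are assumptions for illustration only.

BASELINE_TOKENS = 1_000_000   # tokens a workload used under the old tokenizer
TOKENIZER_INFLATION = 1.35    # up to ~35% more tokens for the same text (as reported)
EFFORT_MULTIPLIER = 1.5       # hypothetical extra output at xhigh vs. the old default
PRICE_PER_MTOK = 15.00        # hypothetical flat $ price per million tokens

old_cost = BASELINE_TOKENS / 1e6 * PRICE_PER_MTOK
new_tokens = BASELINE_TOKENS * TOKENIZER_INFLATION * EFFORT_MULTIPLIER
new_cost = new_tokens / 1e6 * PRICE_PER_MTOK

# Under these assumptions the same workload roughly doubles in cost
# (1.35 * 1.5 ≈ 2.0x), before any limit or pricing changes.
print(f"old: ${old_cost:,.2f}  new: ${new_cost:,.2f}  ratio: {new_cost / old_cost:.2f}x")
```

Even if the effort multiplier turns out to be negligible, the tokenizer change alone would add up to ~35% to any per‑token bill for the same text.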

Cybersecurity safeguards & malware checks

  • 4.7 ships with explicitly reduced cyber capabilities and new filters that block “high‑risk cybersecurity uses.”
  • Security researchers fear this will cripple legitimate work (bug bounties, reverse engineering), especially combined with the separate “Cyber Verification Program” and upcoming ID verification.
  • In Claude Code, 4.7 repeatedly checks whether every file is malware and sometimes refuses to modify benign code because of over‑strict injected prompts; this is widely criticized as token‑wasting and workflow‑breaking.

Mythos, safety story, and trust

  • Many suspect 4.7 is a nerfed or distilled version of Mythos, and compare the “too powerful / safety testing first” narrative to earlier GPT‑2‑style hype.
  • There is skepticism that the line between genuine “safeguards” and compute‑capacity or cost constraints is being blurred; some feel that silently nerfed models and vague communication have eroded trust.
  • Others argue Anthropic is genuinely capacity‑constrained and trying to buy time to patch vulnerabilities before broadly releasing Mythos‑class models.

Competition and tooling

  • A large contingent reports migrating to, or experimenting with, OpenAI Codex, Gemini, Qwen, and local models, often citing:
    • More stable day‑to‑day behavior and higher limits.
    • Better transparency and review flows in some harnesses.
  • Others still prefer Claude for initial feature implementation and use Codex/GPT as reviewers, or run multiple agents that cross‑check each other.
  • Claude Code itself gets criticism for flicker, permissions friction, hidden thought output, brittle malware prompts, and breaking changes around adaptive thinking and reasoning summaries.