Claude Opus 4.7
Model quality vs 4.6
- Many report 4.6 became noticeably “dumber” or erratic in the weeks before 4.7, especially in coding and real‑world assistant tasks; others say they saw no degradation and cite external benchmarks showing stability.
- Early 4.7 feedback is mixed: some see clear coding improvements (especially at high/xhigh effort), while others say it feels as weak as, or sloppier than, late‑4.6, with more over‑engineering and hallucinations in niche domains.
- Several note that behavior changes may stem more from harness/system‑prompt tweaks (Claude Code, adaptive thinking) than from the raw model.
Tokens, limits, and pricing
- Strong frustration with opaque session/weekly limits, sudden “burning” of 5‑hour windows in minutes, and perceived “token shrinkage.”
- The new tokenizer can increase token counts by up to ~35% for the same text; combined with the raised default effort (xhigh), this is expected to raise effective costs, especially for agentic coding.
- Some users carefully manage context, prompts, and effort to stay within limits; others feel like they’re “calorie counting” and are anxious about usage bars.
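The cost concern above is multiplicative: tokenizer inflation and a higher default effort compound rather than add. A minimal sketch, where the 1.5× effort multiplier and per‑token price are illustrative assumptions (only the ~35% tokenizer figure comes from the discussion):

```python
def effective_cost(base_tokens: int,
                   tokenizer_inflation: float,
                   effort_multiplier: float,
                   price_per_token: float) -> float:
    """Estimated spend after tokenizer and effort changes (all multipliers compound)."""
    return base_tokens * tokenizer_inflation * effort_multiplier * price_per_token

# Baseline: 100k tokens at a hypothetical $0.00001/token.
old = effective_cost(100_000, 1.00, 1.0, 1e-5)
# Same work after the changes: +35% tokens (from the thread), assumed 1.5x effort.
new = effective_cost(100_000, 1.35, 1.5, 1e-5)

print(f"baseline: ${old:.2f}, after changes: ${new:.2f}, ratio: {new / old:.3f}x")
```

Under these assumed numbers the multipliers compound to 1.35 × 1.5 ≈ 2× the effective cost for the same task, which is why users hit session limits far sooner even though neither change alone looks dramatic.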
Cybersecurity safeguards & malware checks
- 4.7 explicitly has reduced cyber capabilities and new filters blocking “high‑risk cybersecurity uses.”
- Security researchers fear this will cripple legitimate work (bug bounties, reverse engineering), especially combined with a separate “Cyber Verification Program” and incoming ID verification.
- In Claude Code, 4.7 repeatedly checks if every file is malware and sometimes refuses to modify benign code due to over‑strict injected prompts; this is widely criticized as token‑wasting and workflow‑breaking.
Mythos, safety story, and trust
- Many suspect 4.7 is a nerfed or distilled version of Mythos, with the “too powerful / safety testing first” narrative compared to earlier GPT‑2‑style hype.
- There is skepticism that the line between “safeguards” and “lack of compute”/cost concerns is being blurred; some feel silently nerfed models and vague communication have eroded trust.
- Others argue Anthropic is genuinely capacity‑constrained and trying to buy time to patch vulnerabilities before broadly releasing Mythos‑class models.
Competition and tooling
- Large contingent reports migrating to or experimenting with OpenAI Codex, Gemini, Qwen, and local models, often citing:
- More stable day‑to‑day behavior and higher limits.
- Better transparency and review flows in some harnesses.
- Others still prefer Claude for initial feature implementation and use Codex/GPT as reviewers, or run multiple agents that cross‑check each other.
- Claude Code itself gets criticism for flicker, permissions friction, hidden thought output, brittle malware prompts, and breaking changes around adaptive thinking and reasoning summaries.