Claude Opus 4.7

Model quality vs 4.6

  • Many report that 4.6 became noticeably “dumber” or more erratic in the weeks before 4.7’s release, especially in coding and real‑world assistant tasks; others say they saw no degradation and cite external benchmarks showing stability.
  • Early 4.7 feedback is mixed: some see clear coding improvements (especially at high/xhigh effort), while others say it feels no stronger than late‑4.6, or even sloppier, with more over‑engineering and hallucinations in niche domains.
  • Several note that behavior changes may come more from harness/system‑prompt tweaks (Claude Code, adaptive thinking) than from the raw model.

Tokens, limits, and pricing

  • Strong frustration with opaque session/weekly limits, sudden “burning” of 5‑hour windows in minutes, and perceived “token shrinkage.”
  • The new tokenizer can inflate token counts by up to ~35% for the same text; combined with the raised default effort (xhigh), this is expected to raise effective costs, especially for agentic coding (see the sketch after this list).
  • Some users carefully manage context, prompts, and effort to stay within limits; others feel like they’re “calorie counting” and are anxious about usage bars.
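
A rough back‑of‑the‑envelope sketch of how those two effects compound under a flat per‑token price. Only the ~35% tokenizer figure comes from the reports above; the price and the effort multiplier are illustrative placeholders, not published numbers.

```python
# Estimate how tokenizer inflation and a higher default effort setting
# compound into effective cost. All constants except TOKENIZER_INFLATION
# are assumptions for illustration only.

BASELINE_TOKENS = 1_000_000   # tokens a workload used under the old tokenizer
TOKENIZER_INFLATION = 1.35    # up to ~35% more tokens for the same text (as reported)
EFFORT_MULTIPLIER = 1.5       # hypothetical extra output at xhigh vs. the old default
PRICE_PER_MTOK = 15.00        # hypothetical flat $ price per million tokens

old_cost = BASELINE_TOKENS / 1e6 * PRICE_PER_MTOK
new_tokens = BASELINE_TOKENS * TOKENIZER_INFLATION * EFFORT_MULTIPLIER
new_cost = new_tokens / 1e6 * PRICE_PER_MTOK

# Under these assumptions the same workload roughly doubles in cost
# (1.35 * 1.5 ≈ 2.0x), before any limit or pricing changes.
print(f"old: ${old_cost:,.2f}  new: ${new_cost:,.2f}  ratio: {new_cost / old_cost:.2f}x")
```

Even if the effort multiplier turns out to be negligible, the tokenizer change alone would add up to ~35% to any per‑token bill for the same text.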

Cybersecurity safeguards & malware checks

  • 4.7 ships with explicitly reduced cyber capabilities and new filters that block “high‑risk cybersecurity uses.”
  • Security researchers fear this will cripple legitimate work (bug bounties, reverse engineering), especially combined with the separate “Cyber Verification Program” and upcoming ID verification.
  • In Claude Code, 4.7 repeatedly checks whether every file is malware and sometimes refuses to modify benign code because of over‑strict injected prompts; this is widely criticized as token‑wasting and workflow‑breaking.

Mythos, safety story, and trust

  • Many suspect 4.7 is a nerfed or distilled version of Mythos, and compare the “too powerful / safety testing first” narrative to earlier GPT‑2‑style hype.
  • There is skepticism that the line between genuine “safeguards” and compute‑capacity or cost constraints is being blurred; some feel that silently nerfed models and vague communication have eroded trust.
  • Others argue Anthropic is genuinely capacity‑constrained and trying to buy time to patch vulnerabilities before broadly releasing Mythos‑class models.

Competition and tooling

  • A large contingent reports migrating to, or experimenting with, OpenAI Codex, Gemini, Qwen, and local models, often citing:
    • More stable day‑to‑day behavior and higher limits.
    • Better transparency and review flows in some harnesses.
  • Others still prefer Claude for initial feature implementation and use Codex/GPT as reviewers, or run multiple agents that cross‑check each other.
  • Claude Code itself gets criticism for flicker, permissions friction, hidden thought output, brittle malware prompts, and breaking changes around adaptive thinking and reasoning summaries.