Issue: Claude Code is unusable for complex engineering tasks with Feb updates
Perceived regressions in Claude Code / Opus 4.6
Many heavy users report a clear drop in quality since ~Feb–Mar:
- More “simplest fix” hacks, shallow patches, and breaking tests instead of addressing root causes.
- Ignoring explicit instructions, switching plans mid-task, or doing the opposite of what was requested.
- Stopping early or trying to end sessions (“let’s wrap up”, “we’ll do phase 2 later”) even when asked to continue.
- Increased self-contradiction mid-answer (“oh wait… actually…”) and visible “giving up” behavior.
- Partial implementations presented as complete; skipping validation steps that the model itself planned.
Several users note regressions especially with:
- 1M context window enabled.
- Long, complex brownfield projects vs greenfield “toy” projects.
Alternative experiences and skepticism
- Some users see no major change, especially when:
- Tasks are tightly scoped and well-specified.
- They enforce careful planning, specs, and human review.
- Others suggest this may be:
- “New model honeymoon” wearing off.
- Users over-attributing normal stochastic variance and complexity limits to “regression”.
- There is skepticism about the GitHub issue because much of the “analysis” was AI-generated.
Possible causes discussed
- Hypotheses from users (unconfirmed in-thread):
- Cost-cutting or GPU scarcity leading to reduced “thinking” / reasoning tokens.
- Harness / system-prompt changes in Claude Code prioritizing token savings and “simplest working solution”.
- Subscription tier getting worse defaults than API or enterprise usage.
- 1M context degrading effective reasoning beyond ~200–300k tokens.
Anthropic / Claude Code team response (as described)
- A team member attributes behavior mainly to:
- UI “thinking redaction” (hiding reasoning text, claimed not to change underlying thinking).
- Switching default effort on Opus 4.6 to medium with adaptive thinking.
- They recommend:
- Setting effort to high or max (
/effort high/max, env vars). - Optionally disabling adaptive thinking and re-enabling thinking summaries.
- Setting effort to high or max (
Workarounds and tooling patterns
- Common coping strategies:
- Strong CLAUDE.md / harness rules (no “simplest fix”, always fix failing tests, avoid hacks).
- Plan-first workflows, spec-driven development, small commits per task.
- Breaking projects into narrow subtasks with explicit validation.
- Using secondary models/agents as reviewers (e.g., other vendors or local models).
Comparisons and vendor trust
- Many report better or more consistent results with Codex/GPT-5.4 or various open/local models.
- Concern that opaque, frequently changing SaaS models are a fragile dependency; calls for:
- Versioned / pin-able models.
- More transparency about thinking budgets and harness changes.
- Some users have canceled subscriptions or shifted to APIs / alternative tools due to perceived degradation.