An update on recent Claude Code quality reports
Perceived Regressions & Root Causes
- Many commenters report noticeable drops in Claude Code quality over the last 1–2 months: more laziness, failures to follow instructions, broken long-horizon workflows, and higher token burn for less progress.
- Some see the postmortem as confirming users “weren’t crazy”: default effort lowered, thinking stripped on resume, and a verbosity-reduction system prompt all degraded coding help.
- Others argue these are harness bugs/config changes, not model-weight degradation, but note that for users “Claude Code is the product,” so the distinction feels academic.
Caching, Context, and Token Costs
- The one-hour idle-session cache expiry, and the bug it subsequently triggered, are widely criticized.
- Many rely on long-lived sessions as “expensive, hard-won context”; silently dropping thinking or forcing compaction is seen as a serious regression.
- Multiple technical subthreads explain KV/prompt caching, its GPU/IO cost, and why cache misses can cause huge token charges.
- Users want: visible cache status, clear cost estimates before resuming big sessions, and an explicit choice between cost vs. quality.
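The cost asymmetry behind those complaints can be sketched with a toy calculation. Everything here is a made-up placeholder (the function, prices, discount multiplier, and token count are illustrative, not Anthropic's actual rates); the point is only how linearly a cache miss scales with accumulated context:

```python
# Hypothetical illustration of prompt-cache economics on a long session.
# Prices and the cached-read multiplier are invented placeholders,
# NOT Anthropic's actual pricing.

def resume_cost(context_tokens: int,
                price_per_mtok: float = 3.00,     # assumed base input price per million tokens
                cached_multiplier: float = 0.10,  # assumed discount when the prefix is cached
                cache_hit: bool = True) -> float:
    """Dollar cost to re-send a session's full context on resume."""
    rate = price_per_mtok * (cached_multiplier if cache_hit else 1.0)
    return context_tokens / 1_000_000 * rate

long_session = 150_000  # tokens of hard-won context in a long-lived session

hit = resume_cost(long_session, cache_hit=True)
miss = resume_cost(long_session, cache_hit=False)
print(f"cache hit:  ${hit:.3f}")
print(f"cache miss: ${miss:.3f}  ({miss / hit:.0f}x more)")
```

Under these assumed numbers a silent eviction makes the same resume roughly an order of magnitude more expensive, which is why commenters want cache status surfaced before they commit to resuming a big session.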
Reasoning Effort, System Prompts & Adaptive Thinking
- Lowering default reasoning effort from high to medium to “reduce latency” is viewed by many as an intentional quality‑for‑cost tradeoff that contradicts “we never degrade performance.”
- The “reduce verbosity” system prompt is blamed for worse code quality and odd behavior (e.g., internal prompt-injection paranoia).
- Forced/adaptive thinking and removal of explicit “always think” modes are seen as opaque and harmful for serious coding/scientific work.
Trust, Communication, and “Gaslighting” Debate
- Strong sentiment that Anthropic responded late, minimized issues, and relied on scattered social posts instead of clear product messaging.
- Some explicitly use “gaslighting”; others push back, saying it’s more likely poor instrumentation, complexity, and rushed product decisions than malice.
- Resetting usage limits is welcomed by some, dismissed by others as insufficient compensation for the time and tokens already wasted.
Pricing, A/B Tests, and Business Model Concerns
- Silent A/B tests on subscription features (e.g., removing Claude Code from some Pro users) are heavily criticized as deceptive and “enshittifying.”
- Several speculate that aggressive cache eviction and effort reductions are cost‑control measures under compute and IPO pressure.
- A minority say they’d pay far more for a stable, uncompromised “max quality” tier; others already find pricing high.
Comparisons, Alternatives, and Lock‑in
- Many report switching or partially switching to Codex, OpenAI’s models, or Chinese/open‑source models; some find those more reliable, others still prefer Claude for UX and UI work.
- There is resentment over Anthropic banning the use of third‑party harnesses with its subscriptions, since those harnesses would have let users route around Claude Code regressions.
Quality Assurance, Testing, and Harness Design
- Multiple comments say these bugs should have been caught by basic unit/e2e tests and better eval harnesses.
- Some blame “vibe coding” and over‑reliance on Claude to build Claude Code itself, leading to fragile, poorly understood behavior.
- Suggestions include: stricter release processes, staged rollouts, visible model/prompt versions, and independent evaluations of model quality over time.
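To make the testing suggestion concrete: the "thinking stripped on resume" bug is exactly the kind of thing a small invariant test would catch. A minimal sketch, where `Session` and `resume_session` are invented stand-ins for whatever the real harness does, not its actual API:

```python
# Hypothetical regression test for the "settings silently dropped on resume"
# class of bug. Session and resume_session are illustrative stand-ins.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Session:
    session_id: str
    thinking_enabled: bool
    reasoning_effort: str  # e.g. "high" | "medium" | "low"

def resume_session(s: Session) -> Session:
    """A correct resume must carry over every quality-affecting setting."""
    return replace(s)  # identity here; a real harness rebuilds state from disk

def test_resume_preserves_quality_settings() -> None:
    before = Session("abc123", thinking_enabled=True, reasoning_effort="high")
    after = resume_session(before)
    assert after.thinking_enabled == before.thinking_enabled, "thinking silently stripped"
    assert after.reasoning_effort == before.reasoning_effort, "effort silently downgraded"

test_resume_preserves_quality_settings()
print("resume preserves quality settings")
```

The design point is that the invariant ("resume never degrades quality settings") is asserted directly, so any future code path that rebuilds session state without copying these fields fails loudly in CI instead of silently in production.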