An update on recent Claude Code quality reports
Perceived Regressions & Root Causes
- Many commenters report noticeable drops in Claude Code quality over the last 1–2 months: more laziness, failures to follow instructions, broken long-horizon workflows, and higher token burn for less progress.
- Some see the postmortem as confirming users “weren’t crazy”: default effort lowered, thinking stripped on resume, and a verbosity-reduction system prompt all degraded coding help.
- Others argue these are harness bugs/config changes, not model-weight degradation, but note that for users “Claude Code is the product,” so the distinction feels academic.
Caching, Context, and Token Costs
- The one-hour idle-session cache expiry, and the bug it subsequently triggered, are widely criticized.
- Many rely on long-lived sessions as “expensive, hard-won context”; silently dropping thinking or forcing compaction is seen as a serious regression.
- Multiple technical subthreads explain KV/prompt caching, its GPU/IO cost, and why cache misses can cause huge token charges.
- Users want: visible cache status, clear cost estimates before resuming big sessions, and an explicit choice between cost vs. quality.
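The cost asymmetry behind those complaints can be sketched with a toy calculation. Everything here is a made-up placeholder (the function, prices, discount multiplier, and token count are illustrative, not Anthropic's actual rates); the point is only how linearly a cache miss scales with accumulated context:

```python
# Hypothetical illustration of prompt-cache economics on a long session.
# Prices and the cached-read multiplier are invented placeholders,
# NOT Anthropic's actual pricing.

def resume_cost(context_tokens: int,
                price_per_mtok: float = 3.00,     # assumed base input price per million tokens
                cached_multiplier: float = 0.10,  # assumed discount when the prefix is cached
                cache_hit: bool = True) -> float:
    """Dollar cost to re-send a session's full context on resume."""
    rate = price_per_mtok * (cached_multiplier if cache_hit else 1.0)
    return context_tokens / 1_000_000 * rate

long_session = 150_000  # tokens of hard-won context in a long-lived session

hit = resume_cost(long_session, cache_hit=True)
miss = resume_cost(long_session, cache_hit=False)
print(f"cache hit:  ${hit:.3f}")
print(f"cache miss: ${miss:.3f}  ({miss / hit:.0f}x more)")
```

Under these assumed numbers a silent eviction makes the same resume roughly an order of magnitude more expensive, which is why commenters want cache status surfaced before they commit to resuming a big session.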
Reasoning Effort, System Prompts & Adaptive Thinking
- Lowering default reasoning effort from high to medium to “reduce latency” is viewed by many as an intentional quality‑for‑cost tradeoff that contradicts “we never degrade performance.”
- The “reduce verbosity” system prompt is blamed for worse code quality and odd behavior (e.g., internal prompt-injection paranoia).
- Forced/adaptive thinking and removal of explicit “always think” modes are seen as opaque and harmful for serious coding/scientific work.
Trust, Communication, and “Gaslighting” Debate
- Strong sentiment that Anthropic responded late, minimized issues, and relied on scattered social posts instead of clear product messaging.
- Some explicitly use “gaslighting”; others push back, saying it’s more likely poor instrumentation, complexity, and rushed product decisions than malice.
- Resetting usage limits is welcomed by some, dismissed by others as insufficient compensation for the time and tokens already wasted.
Pricing, A/B Tests, and Business Model Concerns
- Silent A/B tests on subscription features (e.g., removing Claude Code from some Pro users) are heavily criticized as deceptive and “enshittifying.”
- Several speculate that aggressive cache eviction and effort reductions are cost‑control measures under compute and IPO pressure.
- A minority say they’d pay far more for a stable, uncompromised “max quality” tier; others already find pricing high.
Comparisons, Alternatives, and Lock‑in
- Many report switching or partially switching to Codex, OpenAI’s models, or Chinese/open‑source models; some find those more reliable, others still prefer Claude for UX and UI work.
- There is resentment over Anthropic banning the use of third‑party harnesses with its subscriptions, since those harnesses would have let users route around Claude Code regressions.
Quality Assurance, Testing, and Harness Design
- Multiple comments say these bugs should have been caught by basic unit/e2e tests and better eval harnesses.
- Some blame “vibe coding” and over‑reliance on Claude to build Claude Code itself, leading to fragile, poorly understood behavior.
- Suggestions include: stricter release processes, staged rollouts, visible model/prompt versions, and independent evaluations of model quality over time.
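To make the testing suggestion concrete: the "thinking stripped on resume" bug is exactly the kind of thing a small invariant test would catch. A minimal sketch, where `Session` and `resume_session` are invented stand-ins for whatever the real harness does, not its actual API:

```python
# Hypothetical regression test for the "settings silently dropped on resume"
# class of bug. Session and resume_session are illustrative stand-ins.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Session:
    session_id: str
    thinking_enabled: bool
    reasoning_effort: str  # e.g. "high" | "medium" | "low"

def resume_session(s: Session) -> Session:
    """A correct resume must carry over every quality-affecting setting."""
    return replace(s)  # identity here; a real harness rebuilds state from disk

def test_resume_preserves_quality_settings() -> None:
    before = Session("abc123", thinking_enabled=True, reasoning_effort="high")
    after = resume_session(before)
    assert after.thinking_enabled == before.thinking_enabled, "thinking silently stripped"
    assert after.reasoning_effort == before.reasoning_effort, "effort silently downgraded"

test_resume_preserves_quality_settings()
print("resume preserves quality settings")
```

The design point is that the invariant ("resume never degrades quality settings") is asserted directly, so any future code path that rebuilds session state without copying these fields fails loudly in CI instead of silently in production.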