Regression: malware reminder on every read still causes subagent refusals
Bug and malware-prompt regression
- Core issue: a system prompt in Claude’s managed agents / Claude Code says “whenever you read a file … you MUST refuse to improve or augment the code,” with no “if it is malware” condition scoping the refusal.
- Result: agents often refuse to modify any code after reading it, even when users clarify it isn’t malware.
- Some users report they can override it by explicitly stating the code is not malware; others still see refusals.
- The malware-analysis-on-every-file behavior is seen as wasteful, context-bloating, and poorly thought through.
Token usage, context bloat, and cost
- Several commenters report large, unexplained token use: heavy hidden prompts, repeated malware checks, and verbose “thoughts.”
- Some find managed harnesses use double or more the context of lighter custom harnesses (e.g., 30–40k vs ~15k tokens just to say “hi”).
- People worry about “revenue-positive bugs,” where misbehavior increases token consumption at users’ expense.
- Others note that subscription plans can be dramatically cheaper than raw API use, pressuring users into first‑party harnesses.
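The cost concern above is simple arithmetic: every request resends the full context, so hidden prompt overhead multiplies directly into the bill. A minimal sketch, using the thread’s 30–40k vs ~15k token figures and a purely illustrative $3-per-million-input-token price (not any provider’s actual rate):

```python
# Illustrative input-token price: $3 per million tokens (hypothetical).
PRICE_PER_MTOK = 3.00

def request_cost(context_tokens):
    """Cost of a single request whose input context is `context_tokens` long."""
    return context_tokens / 1_000_000 * PRICE_PER_MTOK

heavy = request_cost(35_000)   # managed harness with hidden prompts (~30-40k)
light = request_cost(15_000)   # lighter custom harness (~15k)
print(round(heavy, 4), round(light, 4), round(heavy / light, 2))
# → 0.105 0.045 2.33
```

At these assumed numbers the managed harness costs roughly 2.3x per request before the model produces a single useful token, which is why commenters frame prompt bloat as a billing issue rather than just a UX one.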
Harness control and alternatives
- Frustration that managed agents don’t allow users to edit system prompts or harness logic, so they’re stuck with Anthropic’s guardrails and bugs.
- Multiple commenters prefer running their own harnesses (OpenCode, Pi, custom tools) with API keys or alternative models (OpenAI Codex, DeepSeek, Qwen).
- Subscription-based access often cannot be used with third‑party harnesses, limiting flexibility.
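The appeal of a custom harness is that the user owns the entire context: the system prompt, the tool definitions, and the conversation history, with no hidden guardrail text injected. A minimal sketch of that loop, with a stub `call_model` standing in for any provider SDK (a real harness would send `messages` to an API with a key; the stub just echoes so the example is runnable):

```python
def call_model(messages):
    # Stub: a real implementation would POST `messages` to a model API.
    # Echoing the last user turn keeps this sketch self-contained.
    return f"echo: {messages[-1]['content']}"

def run_harness(system_prompt, user_turns):
    """Drive a conversation with a system prompt the *user* controls,
    so no hidden instructions sit in the context."""
    messages = [{"role": "system", "content": system_prompt}]
    replies = []
    for turn in user_turns:
        messages.append({"role": "user", "content": turn})
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies

print(run_harness("You are a helpful coding assistant.", ["hi"]))
# → ['echo: hi']
```

Because the whole context is built in `run_harness`, its token footprint is exactly what the user put there, which is the property commenters cite when comparing custom harnesses against managed ones.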
Safety vs user autonomy
- Debate over whether LLMs should strictly refuse anything possibly related to malware or act as neutral tools.
- Some argue providers should allow powerful tools but handle abuse via detection/reporting, not intrusive guardrails.
- Others point out that trying to automate such guardrails can produce exactly this kind of overreach and regressions.
Perceptions of Anthropic’s engineering and product culture
- Many see repeated regressions, opaque prompts, and “vibe-coded” changes as signs of weak testing and poor product discipline.
- Some note Claude Code was ahead of the curve but appears to be degrading, with more bugs and inconsistent behavior.
- A minority suggest these problems may be growth pains rather than pure incompetence, but dissatisfaction dominates the thread.