Changes in the system prompt between Claude Opus 4.6 and 4.7

Behavior changes: acting vs clarifying

  • Many commenters note the new “act rather than clarify” default in 4.7, compared with 4.6’s tendency to ask first.
  • Some like the reduced friction: fewer redundant questions, faster progress.
  • Others strongly dislike it: the model makes wrong assumptions, starts harmful or incorrect edits, and requires more user interruptions.
  • Several users now explicitly prompt the model to ask more questions, even inserting mandatory “interview phases” or rules like “don’t assume; ask.”
  • There is concern that this behavior is effectively “hardcoded” in the system prompt and can’t be reliably overridden by user prompts.

Malware, security, and refusals

  • Multiple comments describe “malware paranoia,” especially in 4.7 but also seen in 4.6.
  • Reports include:
    • Overzealous refusal patterns in mundane corporate contexts.
    • Extra tool-call turns devoted to justifying malware-safety decisions, burning tokens.
    • Normal data-analysis scripts or web-scraping and security research being blocked or derailed.
  • Some suspect new steering techniques or base-model changes, not only system-prompt tweaks.
  • Others argue tight malware controls are necessary given the models’ increasing coding capability.

Prompt cache, tool use, and latency/cost

  • The Claude Code system-prompt diff reveals detailed guidance on choosing delaySeconds to align with a 5‑minute prompt-cache TTL.
  • Advice like “don’t pick exactly 300s” strikes some as overly verbose, but it explains a real cost: a delay that overruns the TTL invalidates the cache, so the next request re-sends the full context at uncached rates.
  • Some users are surprised that long-running, tool-heavy sessions still incur frequent full-context reload costs.
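The cache-timing guidance above can be sketched as a small helper. This is an illustrative sketch, not the actual Claude Code logic: the function name and the margin/jitter parameters are assumptions; only the 5‑minute TTL and the “not exactly 300s” rule come from the discussed prompt.

```python
import random

CACHE_TTL_S = 300  # Anthropic prompt-cache entries expire after ~5 minutes


def pick_delay_seconds(requested: int, margin: int = 30, jitter: int = 10) -> int:
    """Clamp a wait/poll delay so the next request lands before the
    prompt-cache TTL expires (never at exactly 300s, which risks a miss)."""
    ceiling = CACHE_TTL_S - margin      # e.g. 270s: comfortably inside the TTL
    delay = min(requested, ceiling)
    if jitter:
        # small random backoff de-synchronizes concurrent agents refreshing the cache
        delay -= random.randint(0, jitter)
    return max(1, delay)
```

A delay chosen this way keeps the cached system prompt warm across tool-heavy turns; overrunning the TTL is what produces the “unexpected token burn” users report.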

System prompt size, structure, and performance

  • The system prompt is described as very long (thousands of words/tokens), raising concerns about:
    • Context bloat and instruction dilution.
    • Infrastructure cost and inefficiency, despite caching.
  • Debate over why more behavior isn’t “baked into the weights” instead of layered via enormous prompts.
  • Others note big, cached system prompts are now typical across major LLMs.

Safety sections (eating disorders, child safety, etc.)

  • The eating-disorder guidance is seen by some as common-sense harm reduction; others see it as niche, prompt-bloating “tax” applied to every request.
  • Concerns:
    • Incremental accretion of topic-specific safety sections could lead to huge, opaque guardrail stacks.
    • Potential future overreach where benign queries (e.g., basic calorie info) might be overblocked.
  • Counterpoints:
    • Eating disorders are argued to be common and high-risk enough to justify explicit handling.
    • The system prompt is framed as a temporary patch until safer behaviors can be trained directly into the model.
  • Some worry this reflects a broader trend toward moralistic control and narrowing of acceptable inquiry; others emphasize legal liability and user trust.

Alignment, identity, and wording style

  • Commenters note the system prompt’s use of “Claude does/does not X” rather than “you,” suggesting:
    • A deliberate attempt to anchor the model in a specific persona (“Claude”) rather than a generic “you.”
    • A “what would Claude do?” style of self-alignment, analogous to training a role or character.
  • Discussion of “positive prompting,” with affirmative patterns (“Claude does Y”) argued to be more effective than prohibitions (“never do X”).

User control, options, and specialization

  • Shared frustration that many behaviors (concise answers, less clarifying, strong safety stances) are fixed at the system level instead of being user-selectable modes.
  • Some want separate profiles: e.g., verbose expert mode vs. short consumer mode; cautious vs. research/security-friendly; or multiple “characters” tuned to different workflows.
  • Developers building agents describe adding their own orchestration layers (Socratic sub-agents, interview phases, self-evaluating prompts) to claw back control from the default behavior.
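One of the orchestration layers described above, a mandatory interview phase that gates the action turn, might look roughly like this. Everything here is an illustrative assumption (the function names, the `NO QUESTIONS` sentinel, the two-pass structure); the model call is abstracted as a plain callable rather than any specific SDK.

```python
from typing import Callable

# (system_prompt, user_message) -> model reply; any backend can be plugged in
ModelFn = Callable[[str, str], str]

INTERVIEW_SYSTEM = (
    "You are in an interview phase. Do not write code or take actions. "
    "List the questions you need answered before acting. "
    "If nothing is ambiguous, reply exactly: NO QUESTIONS"
)


def run_with_interview(model: ModelFn, system: str, task: str,
                       ask_user: Callable[[str], str]) -> str:
    """Two-pass orchestration: a clarification turn runs first, and its
    answers are folded into the task before the default behavior acts."""
    questions = model(INTERVIEW_SYSTEM, task)
    if questions.strip() != "NO QUESTIONS":
        answers = ask_user(questions)        # surface questions to the human
        task = f"{task}\n\nClarifications:\n{answers}"
    return model(system, task)               # action turn, now with context
```

This claws back “ask before acting” at the application layer rather than fighting the baked-in system prompt directly, which is the pattern several commenters describe.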

Perceived regressions and model comparisons

  • Opinions on version quality are mixed:
    • Some feel 4.7 adds options and improves certain capabilities but worsens decision quality or induces decision fatigue.
    • Others say 4.6 became “unusable” due to oversensitive cybersecurity flags and session terminations.
    • A few users nostalgically prefer 4.5.
  • There is debate about whether improvements in safety and capability are beginning to trade off directly against practical usability for advanced users.