Changes in the system prompt between Claude Opus 4.6 and 4.7
Behavior changes: acting vs clarifying
- Many note the new “act rather than clarify” behavior in 4.7 versus 4.6.
- Some like the reduced friction: fewer redundant questions, faster progress.
- Others strongly dislike it: the model makes wrong assumptions, starts harmful or incorrect edits, and requires more user interruptions.
- Several users now explicitly prompt the model to ask more questions, even inserting mandatory “interview phases” or rules like “don’t assume; ask.”
- There is concern that this behavior is effectively “hardcoded” in the system prompt and can’t be reliably overridden by user prompts.
Malware, security, and refusals
- Multiple comments describe “malware paranoia,” especially in 4.7 but also seen in 4.6.
- Reports include:
- Overzealous refusal patterns in mundane corporate contexts.
- Extra tool-call turns devoted to justifying malware-safety decisions, burning tokens.
- Normal data-analysis scripts, web scraping, and security research being blocked or derailed.
- Some suspect new steering techniques or base-model changes, not only system-prompt tweaks.
- Others argue tight malware controls are necessary given the models’ increasing coding capability.
Prompt cache, tool use, and latency/cost
- The Claude Code system-prompt diff reveals detailed guidance on choosing `delaySeconds` to align with a 5‑minute prompt-cache TTL.
- Advice like “don’t pick exactly 300s” strikes some as overly verbose but explains why many sessions see unexpected token burn.
- Some users are surprised that long-running, tool-heavy sessions still incur frequent full-context reload costs.
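The caching concern above can be made concrete with a small sketch. Assuming a 5-minute prompt-cache TTL, a delay at or past the TTL lets the cached prompt expire, forcing a full-context reload on the next call. The helper below is purely illustrative; `choose_delay` and its margin are invented names, not part of any Anthropic API.

```python
# Hypothetical sketch: picking a wait time relative to a 5-minute
# prompt-cache TTL. The TTL and the "don't pick exactly 300s" advice
# come from the summarized diff; this helper is illustrative only.

CACHE_TTL_SECONDS = 300  # 5-minute prompt-cache TTL


def choose_delay(requested: float, margin: float = 30.0) -> float:
    """Clamp a requested delay so the cached prompt stays warm.

    Waiting exactly the TTL (or longer) means the cache entry expires,
    so the next request pays full-context token costs again. Clamping
    just under the TTL keeps the cache hit.
    """
    if requested >= CACHE_TTL_SECONDS:
        return CACHE_TTL_SECONDS - margin  # stay inside the cache window
    return requested


print(choose_delay(120))  # short delay: left unchanged
print(choose_delay(300))  # exactly the TTL: clamped just under it
```

This also explains the token-burn surprise: any orchestration that idles past the TTL between tool calls silently converts cached reads back into full-price ones.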
System prompt size, structure, and performance
- The system prompt is described as very long (thousands of words/tokens), raising concerns about:
- Context bloat and instruction dilution.
- Infrastructure cost and inefficiency, despite caching.
- Debate over why more behavior isn’t “baked into the weights” instead of layered via enormous prompts.
- Others note big, cached system prompts are now typical across major LLMs.
Safety sections (eating disorders, child safety, etc.)
- The eating-disorder guidance is seen by some as common-sense harm reduction; others see it as niche, prompt-bloating “tax” applied to every request.
- Concerns:
- Incremental accretion of topic-specific safety sections could lead to huge, opaque guardrail stacks.
- Potential future overreach where benign queries (e.g., basic calorie info) might be overblocked.
- Counterpoints:
- Eating disorders are argued to be common and high-risk enough to justify explicit handling.
- The system prompt is framed as a temporary patch until safer behaviors can be trained directly into the model.
- Some worry this reflects a broader trend toward moralistic control and narrowing of acceptable inquiry; others emphasize legal liability and user trust.
Alignment, identity, and wording style
- Commenters note the system prompt’s use of “Claude does/does not X” rather than “you,” suggesting:
- A deliberate attempt to anchor the model in a specific persona (“Claude”) rather than a generic “you.”
- A “what would Claude do?” style of self-alignment, analogous to training a role or character.
- Discussion of “positive prompting” patterns (“Claude does Y” instead of “never do X”) as more effective than prohibitions.
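The wording patterns above can be sketched side by side. Both fragments below are invented examples of the two styles, not text from the actual system prompt, and the checker is a deliberately crude illustration of what "persona-anchored" phrasing looks like.

```python
# Illustrative sketch of the styles discussed above: a prohibition
# ("never do X") versus persona-anchored positive prompting
# ("Claude does Y"). Both strings are invented examples.

prohibition = "Never ask the user redundant clarifying questions."
positive = (
    "Claude asks a clarifying question only when the request is genuinely "
    "ambiguous; otherwise Claude proceeds with the most reasonable reading."
)


def is_third_person(instruction: str) -> bool:
    # Crude marker of the persona-anchored style: the instruction names
    # "Claude" as the subject rather than addressing "you".
    return instruction.startswith("Claude ")


print(is_third_person(positive))     # persona-anchored style
print(is_third_person(prohibition))  # imperative prohibition
```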
User control, options, and specialization
- Shared frustration that many behaviors (concise answers, less clarifying, strong safety stances) are fixed at the system level instead of being user-selectable modes.
- Some want separate profiles: e.g., verbose expert mode vs. short consumer mode; cautious vs. research/security-friendly; or multiple “characters” tuned to different workflows.
- Developers building agents describe adding their own orchestration layers (Socratic sub-agents, interview phases, self-evaluating prompts) to claw back control from the default behavior.
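The "interview phase" idea in the bullet above can be sketched as a simple gate: the agent is not released to act until a minimum number of clarifying questions has been asked. Class and method names here are hypothetical, a minimal sketch of the pattern rather than any real agent framework.

```python
# Hypothetical sketch of the "mandatory interview phase" orchestration
# pattern described above. All names are illustrative.

from dataclasses import dataclass, field


@dataclass
class InterviewGate:
    required_questions: int = 2
    asked: list[str] = field(default_factory=list)

    def ask(self, question: str) -> None:
        # Record a clarifying question posed to the user.
        self.asked.append(question)

    def may_act(self) -> bool:
        # The agent may edit files / run tools only after the mandatory
        # interview phase is complete.
        return len(self.asked) >= self.required_questions


gate = InterviewGate()
gate.ask("Which directory should the refactor touch?")
print(gate.may_act())  # still gated: one more question required
gate.ask("Should existing tests be preserved as-is?")
print(gate.may_act())  # interview complete: agent may act
```

The point of the wrapper is to claw back the clarify-first behavior at the orchestration layer, since user prompts alone are reported as unreliable against the system-level "act rather than clarify" default.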
Perceived regressions and model comparisons
- Opinions on version quality are mixed:
- Some feel 4.7 offers more options and improved capabilities in places, but worsens decision quality or induces decision fatigue.
- Others say 4.6 became “unusable” due to oversensitive cybersecurity flags and session terminations.
- A few users nostalgically prefer 4.5.
- There is debate about whether improvements in safety and capability are beginning to trade off directly against practical usability for advanced users.