Anthropic publishes the 'system prompts' that make Claude tick
Link, style, and structure of the prompts
- Many were surprised the article didn’t foreground the actual prompt link; several people went straight to HN for it.
- Anthropic’s prompts are long, detailed, and in third person (“Claude is…”) instead of the more common “you are…” style.
- Some speculate third person better matches training data (narrative descriptions vs direct instructions).
- Others note the prompts are more descriptive than imperative, unlike common ChatGPT-style system prompts.
- Concerns are raised about prompt-injection possibilities given the explicit, natural-language description.
Do system prompts actually work?
- Multiple commenters observe Claude often violates explicit instructions (e.g., still saying “Certainly” or “I apologize”).
- Negative instructions (“don’t do X”) are seen as especially unreliable and may even backfire (“don’t think of a pink elephant” effect).
- Some suggest prompts only shift probabilities, not enforce hard rules; they may reduce but not eliminate undesired behavior.
- System prompts are framed by some as a “fix it in post” patch over deeper alignment issues.
Prompt engineering vs training & alignment
- Discussion notes that behavior mainly comes from pretraining, instruction tuning, RLHF/RLAIF, and synthetic data; prompts are a lighter overlay.
- Prompts are attractive because they’re cheap and fast to iterate, versus expensive fine-tuning, but they add token overhead.
- Others emphasize provider-side KV/prefix caching mitigates runtime cost, though attention still scales with context length.
- Some doubt Anthropic’s claim of no RLHF, pointing to “constitutional AI” as effectively similar.
User experience and model personality
- Several prefer Claude’s calmer, less “salesy” tone versus ChatGPT’s forced cheerfulness; others find Claude overly apologetic and sycophantic.
- Gemini is mentioned as even more neutral and less grating.
- Some see Claude as better at staying on-task in iterative coding loops; others report GPT‑4o outperforming Claude in certain languages (e.g., Rust).
- Subscription limits and fast credit burn are a practical complaint.
Understanding, hallucination, and “intelligence”
- Long subthread debates whether LLMs “understand” vs merely predict tokens, invoking the Chinese Room and human fallibility.
- People compare LLM errors (e.g., counting letters) to human cognitive limits and illusions; others insist this shows shallow “understanding.”
- Chain-of-thought instructions in the prompt are defended as empirically helpful, not literal “thinking.”
- Anthropic’s prompt explicitly uses and explains “hallucination,” instructing Claude to warn on obscure topics or fabricated citations.
- Some would prefer “I don’t know” more often; others want tentative guesses plus explicit uncertainty.
Safety, control, and misuse fears
- Commenters worry less about the models themselves than about humans wiring them into critical systems (e.g., life support).
- An anecdote about a shelf-robot driven by an LLM “pleading” for power illustrates how easily people empathize and might grant real control.
Vision, privacy, and face-blindness
- The image section instructs Claude to act “face-blind,” never identifying people in images.
- Some see this as a privacy safeguard; others infer the model can recognize faces but is being deliberately constrained.