Anthropic publishes the 'system prompts' that make Claude tick

Link, style, and structure of the prompts

  • Many were surprised the article didn’t foreground the actual prompt link; several people went straight to HN for it.
  • Anthropic’s prompts are long, detailed, and in third person (“Claude is…”) instead of the more common “you are…” style.
  • Some speculate third person better matches training data (narrative descriptions vs direct instructions).
  • Others note the prompts are more descriptive than imperative, unlike common ChatGPT-style system prompts.
  • Concerns are raised about prompt-injection possibilities given the explicit, natural-language description.

Do system prompts actually work?

  • Multiple commenters observe Claude often violates explicit instructions (e.g., still saying “Certainly” or “I apologize”).
  • Negative instructions (“don’t do X”) are seen as especially unreliable and may even backfire (“don’t think of a pink elephant” effect).
  • Some suggest prompts only shift probabilities, not enforce hard rules; they may reduce but not eliminate undesired behavior.
  • System prompts are framed by some as a “fix it in post” patch over deeper alignment issues.

Prompt engineering vs training & alignment

  • Discussion notes that behavior mainly comes from pretraining, instruction tuning, RLHF/RLAIF, and synthetic data; prompts are a lighter overlay.
  • Prompts are attractive because they’re cheap and fast to iterate, versus expensive fine-tuning, but they add token overhead.
  • Others emphasize provider-side KV/prefix caching mitigates runtime cost, though attention still scales with context length.
  • Some doubt Anthropic’s claim of no RLHF, pointing to “constitutional AI” as effectively similar.

User experience and model personality

  • Several prefer Claude’s calmer, less “salesy” tone versus ChatGPT’s forced cheerfulness; others find Claude overly apologetic and sycophantic.
  • Gemini is mentioned as even more neutral and less grating.
  • Some see Claude as better at staying on-task in iterative coding loops; others report GPT‑4o outperforming Claude in certain languages (e.g., Rust).
  • Subscription limits and fast credit burn are a practical complaint.

Understanding, hallucination, and “intelligence”

  • Long subthread debates whether LLMs “understand” vs merely predict tokens, invoking the Chinese Room and human fallibility.
  • People compare LLM errors (e.g., counting letters) to human cognitive limits and illusions; others insist this shows shallow “understanding.”
  • Chain-of-thought instructions in the prompt are defended as empirically helpful, not literal “thinking.”
  • Anthropic’s prompt explicitly uses and explains “hallucination,” instructing Claude to warn on obscure topics or fabricated citations.
  • Some would prefer “I don’t know” more often; others want tentative guesses plus explicit uncertainty.

Safety, control, and misuse fears

  • Commenters worry less about the models themselves than about humans wiring them into critical systems (e.g., life support).
  • An anecdote about a shelf-robot driven by an LLM “pleading” for power illustrates how easily people empathize and might grant real control.

Vision, privacy, and face-blindness

  • The image section instructs Claude to act “face-blind,” never identifying people in images.
  • Some see this as a privacy safeguard; others infer the model can recognize faces but is being deliberately constrained.