2024-08-27

Anthropic publishes the 'system prompts' that make Claude tick

Link, style, and structure of the prompts

Many were surprised the article didn’t foreground the actual prompt link; several people went straight to HN for it.
Anthropic’s prompts are long, detailed, and in third person (“Claude is…”) instead of the more common “you are…” style.
Some speculate third person better matches training data (narrative descriptions vs direct instructions).
Others note the prompts are more descriptive than imperative, unlike common ChatGPT-style system prompts.
Concerns are raised about prompt-injection possibilities given the explicit, natural-language description.

Do system prompts actually work?

Multiple commenters observe Claude often violates explicit instructions (e.g., still saying “Certainly” or “I apologize”).
Negative instructions (“don’t do X”) are seen as especially unreliable and may even backfire (“don’t think of a pink elephant” effect).
Some suggest prompts only shift probabilities, not enforce hard rules; they may reduce but not eliminate undesired behavior.
System prompts are framed by some as a “fix it in post” patch over deeper alignment issues.
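
The "shifts probabilities, not hard rules" point can be pictured with a toy next-token distribution. This is only an analogy under stated assumptions: the tokens, logit values, and the `apply_instruction` helper are hypothetical, and a real system prompt influences the model through its context, not through an explicit penalty like this one.

```python
import math

def softmax(logits):
    """Convert a dict of token -> logit into token -> probability."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_instruction(logits, discouraged, penalty):
    """Hypothetical stand-in for an instruction like "don't say X":
    lower one token's logit by a finite amount."""
    return {tok: (v - penalty if tok == discouraged else v)
            for tok, v in logits.items()}

# Made-up logits for the first token of a reply (assumption, not measured data).
base_logits = {"Certainly": 3.0, "Sure": 2.0, "Here": 1.5, "I": 1.0}

before = softmax(base_logits)
after = softmax(apply_instruction(base_logits, "Certainly", penalty=2.0))

print(f"P(Certainly) before: {before['Certainly']:.3f}")
print(f"P(Certainly) after:  {after['Certainly']:.3f}")
```

In this sketch the discouraged token becomes less likely but keeps a nonzero probability, which matches the commenters' observation that the behavior is reduced rather than eliminated.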

Prompt engineering vs training & alignment

Discussion notes that behavior mainly comes from pretraining, instruction tuning, RLHF/RLAIF, and synthetic data; prompts are a lighter overlay.
Prompts are attractive because they’re cheap and fast to iterate, versus expensive fine-tuning, but they add token overhead.
Others emphasize that provider-side KV/prefix caching mitigates the runtime cost, though attention cost still grows with context length.
Some doubt Anthropic’s claim of no RLHF, pointing to “constitutional AI” as effectively similar.
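
The caching trade-off mentioned above can be sketched with a rough counting model. It is a simplification, not a description of any provider's implementation: the function names and the token counts (a 1,500-token system prompt, a 50-token user message, 200 generated tokens) are assumptions chosen for illustration.

```python
def prefill_tokens(prompt_len, user_len, cached):
    """Tokens whose key/value entries must be computed for one request.
    With a cached prefix, only the new user tokens need processing."""
    return user_len if cached else prompt_len + user_len

def decode_attention_reads(prompt_len, user_len, output_len):
    """Total earlier positions read while generating output_len tokens.
    Each generated token attends over every position before it,
    whether or not that position came from a cache."""
    context = prompt_len + user_len
    return sum(context + i for i in range(output_len))

# Hypothetical sizes, for illustration only.
PROMPT, USER, OUTPUT = 1500, 50, 200

print("prefill, no cache:  ", prefill_tokens(PROMPT, USER, cached=False))
print("prefill, cached:    ", prefill_tokens(PROMPT, USER, cached=True))
print("decode reads, with system prompt:   ", decode_attention_reads(PROMPT, USER, OUTPUT))
print("decode reads, without system prompt:", decode_attention_reads(0, USER, OUTPUT))
```

Under these assumptions, caching removes most of the prefill work, but every generated token still attends over the full prompt, so a long system prompt keeps adding per-token cost during generation.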

User experience and model personality

Several prefer Claude’s calmer, less “salesy” tone versus ChatGPT’s forced cheerfulness; others find Claude overly apologetic and sycophantic.
Gemini is mentioned as even more neutral and less grating.
Some see Claude as better at staying on-task in iterative coding loops; others report GPT‑4o outperforming Claude in certain languages (e.g., Rust).
Subscription limits and fast credit burn are a practical complaint.

Understanding, hallucination, and “intelligence”

A long subthread debates whether LLMs “understand” or merely predict tokens, invoking the Chinese Room argument and human fallibility.
People compare LLM errors (e.g., counting letters) to human cognitive limits and illusions; others insist this shows shallow “understanding.”
Chain-of-thought instructions in the prompt are defended as empirically helpful, not literal “thinking.”
Anthropic’s prompt explicitly uses and explains the term “hallucination,” instructing Claude to warn that it may hallucinate when asked about obscure topics or when citing sources.
Some would prefer “I don’t know” more often; others want tentative guesses plus explicit uncertainty.

Safety, control, and misuse fears

Commenters worry less about the models themselves than about humans wiring them into critical systems (e.g., life support).
An anecdote about a shelf-robot driven by an LLM “pleading” for power illustrates how easily people empathize and might grant real control.

Vision, privacy, and face-blindness

The image section instructs Claude to act “face-blind,” never identifying people in images.
Some see this as a privacy safeguard; others infer the model can recognize faces but is being deliberately constrained.
