Anthropic's Prompt Engineering Tutorial (2024)

Relevance of the Tutorial to Newer Models

  • Several commenters note the tutorial targets Claude 3 models and feels dated for newer “reasoning” / RL-tuned models like Sonnet 4.5.
  • Some chapters (especially those on chain-of-thought and task decomposition) are seen as less critical now that models plan autonomously, but others argue that careful structure still improves results on harder problems.
  • Multiple people want an explicitly updated 2024/2025 version.

Prompt Structure, Output Ordering, and Reasoning Models

  • A key takeaway for some readers: control the order of the model’s output.
    • Ask first for evidence, options, or pros/cons, and only then for a final answer. This reduces the “random answer + post‑hoc justification” failure mode, where the model commits to an answer early and rationalizes it afterward.
  • There’s debate about “reasoning models”:
    • One view: they’re still just next‑token predictors; ordering still matters and context can still be “poisoned.”
    • Another view: they internally generate and refine intermediate thoughts, so external prompt structure matters less.
    • Middle ground: ordering matters less but still helps on challenging tasks; models “flip‑flop,” and careful output design can nudge them toward better final choices.
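The output-ordering idea discussed above can be sketched in a few lines of Python. This is only an illustration, not something taken from the tutorial or the thread: the section tags (`EVIDENCE`, `OPTIONS`, `PROS_CONS`, `FINAL_ANSWER`) and all function names are assumptions chosen for the example, and no real model API is called.

```python
import re
from typing import Optional

# Hypothetical section names; the answer is deliberately last.
SECTIONS = ["EVIDENCE", "OPTIONS", "PROS_CONS", "FINAL_ANSWER"]


def build_ordered_prompt(question: str) -> str:
    """Build a prompt that asks for supporting material before the answer."""
    steps = "\n".join(
        f"{i}. <{name}>...</{name}>" for i, name in enumerate(SECTIONS, 1)
    )
    return (
        f"Question: {question}\n\n"
        "Respond using these tagged sections, in exactly this order:\n"
        f"{steps}\n"
        "Do not state a conclusion before the last section."
    )


def sections_in_order(response: str) -> bool:
    """Check that every section is present and appears in the requested order."""
    positions = [response.find(f"<{name}>") for name in SECTIONS]
    return all(p >= 0 for p in positions) and positions == sorted(positions)


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text of the final-answer section, or None if it is missing."""
    match = re.search(r"<FINAL_ANSWER>(.*?)</FINAL_ANSWER>", response, re.DOTALL)
    return match.group(1).strip() if match else None
```

A caller would send the string from `build_ordered_prompt` to whatever model it uses, then reject or retry responses for which `sections_in_order` is false before reading the final answer.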

Grounding, Hallucinations, and Web Use

  • Some people ask models to start with verbatim quotes or references from web sources to ground answers in real docs.
  • Others complain that models still fabricate URLs, documentation, and quotes, and may confidently deny being wrong.
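Because fabricated quotes are the complaint here, one cheap safeguard is to check mechanically that each quote the model claims is verbatim really occurs in the source text. The sketch below assumes a hypothetical convention in which the model wraps quotes in `<quote>...</quote>` tags; that convention and the function names are illustrative assumptions, not part of any tool mentioned in the thread.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_quotes(answer: str) -> List[str]:
    """Pull out every span the model marked as a verbatim quote."""
    return [q.strip() for q in re.findall(r"<quote>(.*?)</quote>", answer, re.DOTALL)]


def unsupported_quotes(answer: str, source: str) -> List[str]:
    """Return the quotes that do not appear in the source after normalization."""
    normalized_source = normalize(source)
    return [q for q in extract_quotes(answer) if normalize(q) not in normalized_source]
```

This only catches quotes that are absent from the supplied text. It says nothing about whether the source itself is real or whether a genuine quote is being used out of context.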

Is “Prompt Engineering” Really Engineering?

  • Large, heated thread on terminology:
    • Critics: “prompt engineering” is mostly trial-and-error “vibe prompting,” easily broken by small model changes and lacking established theory or repeatability; closer to alchemy than engineering.
    • Defenders: engineering routinely deals with randomness, non‑determinism, and changing inputs; with test sets, metrics, statistical validation, and monitoring, prompt work can be rigorous.
    • Some distinguish science (discovering laws) from engineering (applying them), arguing prompt work is still in the pre‑theory, exploratory phase.
    • Others point to broader dictionary senses of “engineering” (artful manipulation, social engineering) to justify the term, while some see this as marketing/ego inflation.
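The defenders' point about test sets, metrics, and statistical validation can be made concrete with a small evaluation harness. The sketch below is an assumption-laden illustration: `model` stands for any callable that maps a prompt string to a response string, exact-match scoring is the simplest possible metric, and the Wilson score interval is one common choice for putting error bars on an accuracy estimate.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def evaluate_prompt(
    model: Callable[[str], str],
    prompt_template: str,
    test_set: Sequence[Tuple[str, str]],
) -> Dict[str, float]:
    """Score a prompt template by exact match over (question, expected) pairs."""
    correct = sum(
        1
        for question, expected in test_set
        if model(prompt_template.format(question=question)).strip() == expected
    )
    low, high = wilson_interval(correct, len(test_set))
    return {"accuracy": correct / len(test_set), "ci_low": low, "ci_high": high}
```

Comparing two prompt variants on the same test set and checking whether their intervals overlap gives a rough signal of whether a change is a real improvement or noise. With small test sets the intervals are wide, which is itself useful information.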

Credentials, Titles, and Professional Responsibility

  • Side discussion on protected “Engineer” titles (e.g., Canadian or Professional Engineer (PE) licensing regimes) vs. US-style title inflation (“software engineer,” “front‑end engineer,” “prompt engineer”).
  • Some argue licensing improves safety and accountability; others see it as protectionist or mismatched to software/AI work.

LLM Limits, AGI Skepticism, and “Alchemy” Feel

  • Several users say the tutorial underscores how fragile and opaque current systems are, undermining AGI hype.
  • Some are skeptical of claims that models are “superhuman” at math and report poor performance on advanced topics.
  • Others note that LLMs are trained only to model language, not to achieve “deep comprehension,” and that we don’t yet know how to train for that.
  • Philosophical questions arise about intelligence, consciousness, and whether AGI is even attainable with current architectures.

Practical Prompting Strategies and Tools

  • One practical pattern:
    • Provide concrete context → ask for broad analysis of possible approaches → list pros/cons → then have the model pick a winner.
    • This is explicitly compared to how humans should solve hard problems.
  • Some people say newer models are good enough that they mostly use short, conversational prompts plus real‑time correction, or rely on built‑in “planning” modes.
  • Others suggest outsourcing prompt design to LLMs themselves, possibly in a loop with a judge model; IDE tools (e.g., Copilot‑style) already do prompt rewriting under the hood.
  • DSPy and “context engineering” are mentioned as more systematic ways to structure prompts and workflows.
  • A few ask for up‑to‑date, project‑based guides for agentic coding in editors like VS Code.
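The suggestion of having an LLM rewrite prompts in a loop with a judge model can be sketched as a simple hill-climbing routine. Everything here is hypothetical: `rewrite` and `judge` are placeholder callables standing in for model calls, and the thread does not describe how any particular tool implements this.

```python
from typing import Callable, Tuple


def refine_prompt(
    initial_prompt: str,
    rewrite: Callable[[str], str],   # hypothetical: asks a model to improve a prompt
    judge: Callable[[str], float],   # hypothetical: scores a prompt, higher is better
    rounds: int = 3,
) -> Tuple[str, float]:
    """Keep the best-scoring prompt seen over a fixed number of rewrite rounds."""
    best_prompt = initial_prompt
    best_score = judge(initial_prompt)
    for _ in range(rounds):
        candidate = rewrite(best_prompt)
        score = judge(candidate)
        # Only accept a rewrite if the judge rates it strictly higher.
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

In practice the judge would ideally score a prompt by running it against a held-out test set rather than by inspecting the prompt text alone, since a judge model can be as unreliable as the model being prompted.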

General Frustration and Fatigue

  • Some commenters mock the whole domain as “alchemy for beginners” or a symptom of “the dumbest timeline,” questioning the societal enthusiasm and economic backing relative to the evident brittleness of the techniques.