The new skill in AI is not prompting, it's context engineering

What “context engineering” is about

  • Commenters broadly agree that good results come less from “magic prompts” and more from assembling the right information, tools, and history for the model at each step.
  • Emphasis is on better context, not more: relevant documents, examples, schemas, tool descriptions, recent edits, etc., structured so the model can plausibly solve the task.
  • Several people liken this to classic software practices: specs, UX requirements, tech lead work, and environment/“bureaucracy” design rather than one-shot clever phrasing.
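The "assemble the right information, structured for the task" idea can be sketched as a small context-builder. Everything here is illustrative, not any real library's API; the labels, ordering, and character budget are assumptions standing in for whatever a real pipeline would use:

```python
# A minimal sketch of "better context, not more": gather only the pieces
# relevant to the task, label them, and enforce a size budget so the
# model sees a focused window. All names are hypothetical.

def assemble_context(task: str, docs: list[str], examples: list[str],
                     schema: str, recent_edits: list[str],
                     budget_chars: int = 4000) -> str:
    """Concatenate labeled sections, skipping any section that would
    exceed the character budget (a crude stand-in for token counting)."""
    sections = [
        ("Task", task),
        ("Schema", schema),
        ("Relevant documents", "\n---\n".join(docs)),
        ("Examples", "\n---\n".join(examples)),
        ("Recent edits", "\n".join(recent_edits)),
    ]
    out, used = [], 0
    for label, body in sections:
        block = f"## {label}\n{body}\n"
        if used + len(block) > budget_chars:
            continue  # drop what doesn't fit rather than truncating mid-section
        out.append(block)
        used += len(block)
    return "\n".join(out)
```

The point of the sketch is the ordering and the budget: curation is an explicit design decision, not a side effect of a clever prompt.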

Prompting vs context: real distinction or rebrand?

  • One camp says this is just prompt engineering with a new name; everything is “just tokens in the context window.”
  • Others argue “prompt” (what the user types) vs “context” (system prompts, history, retrieved docs, tool metadata, agent state) is a useful conceptual split, especially for multi-step agents.
  • There’s criticism both of anthropomorphizing LLMs (“like humans”) and of buzzword churn, but also the view that “prompt engineering” had been trivialized to mean “typing into chat,” so a new term helps.
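The prompt-vs-context split becomes concrete in a chat-style API, where the user's prompt is only the final message and everything else is application-controlled context. The request shape below is a generic sketch, not any specific vendor's schema:

```python
# "Prompt" vs "context", made concrete: the user's typed prompt is one
# message at the end; system instructions, conversation history,
# retrieved documents, and tool metadata are context the application
# assembles around it. The dict shapes here are illustrative.

def build_request(user_prompt: str, system: str, history: list[dict],
                  retrieved: list[str], tools: list[dict]) -> dict:
    messages = [{"role": "system", "content": system}]
    messages += history  # prior turns: agent state the user never typed
    if retrieved:
        messages.append({
            "role": "system",
            "content": "Relevant documents:\n" + "\n".join(retrieved),
        })
    messages.append({"role": "user", "content": user_prompt})
    return {"messages": messages, "tools": tools}
```

Seen this way, "context engineering" names everything in the request except the last message, which is why the split matters most for multi-step agents.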

Technical issues: long contexts, tools, and agents

  • Long contexts degrade (“context rot”); models weight early tokens more, and practical accuracy often drops far before the advertised max window.
  • Techniques discussed: tool loadout (choosing small subsets of tools per step), context pruning/summarization/offloading, quarantining noisy data, and using sub‑agents to keep each context focused.
  • Some expect future models with stable huge contexts and support for thousands of tools to make many current multi-agent architectures obsolete; others note costs, latency, and token pricing will still force routing and pruning.
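The "tool loadout" technique can be sketched as a ranking step before each agent turn: score every available tool against the current task and expose only the top few. The word-overlap scoring below is a deliberately simple stand-in for embedding-based retrieval, and all names are hypothetical:

```python
# "Tool loadout": instead of offering every tool on every step, rank
# tools against the current task and pass only a small subset, keeping
# the context focused. Word overlap is a toy proxy for semantic search.

def select_tools(task: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """tools maps tool name -> description; return the k best matches."""
    task_words = set(task.lower().split())

    def score(item: tuple[str, str]) -> int:
        _name, desc = item
        return len(task_words & set(desc.lower().split()))

    ranked = sorted(tools.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:k]]
```

The same idea generalizes to pruning and sub-agents: each step gets a context chosen for it, rather than the union of everything the system knows.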

Skepticism, rigor, and “engineering”

  • Many complain that “context/prompt engineering” is often trial-and-error tinkering dressed up as a discipline—likened to alchemy, SEO, or WoW strategy guides.
  • Others say it becomes real engineering once you add systematic evaluations, experiments, and measurable improvements; without evals you’re just guessing.
  • Determinism is debated: in theory, greedy decoding with a fixed seed is deterministic, but non-associative parallel floating‑point reductions and temperature sampling mean outputs often vary in practice.
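The "without evals you're just guessing" point implies even a minimal harness: run a fixed set of cases through the system and attach a pass rate to every context change. This is a sketch of the shape of such a harness, with `run_model` as a placeholder for whatever actually calls the model:

```python
# A minimal evaluation harness: the step that turns context tinkering
# into something measurable. `run_model` is a hypothetical callable
# standing in for the real model-invoking pipeline.

from typing import Callable

def evaluate(run_model: Callable[[str], str],
             cases: list[tuple[str, Callable[[str], bool]]]) -> float:
    """cases: (input, check) pairs, where check(output) -> bool.
    Returns the fraction of cases that pass."""
    passed = sum(1 for inp, check in cases if check(run_model(inp)))
    return passed / len(cases)
```

With this in place, "did adding the schema to the context help?" becomes a before/after comparison of two numbers instead of an impression.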

Real-world experience: powerful but brittle

  • Positive reports: full plugins, Manim animations, hybrid rules+ML pipelines, and complex refactors built quickly when context is well-curated.
  • Negative reports: agents that loop, break code, or produce plausible-but-wrong answers even with rich context—leading some to revert to manual coding.
  • Overall: context matters a lot, but current models still hallucinate, fail on multi-step tasks, and require human review; how durable this “skill” is as models evolve remains unclear.