2026-02-03

Agent Skills

What “skills” are and why they matter

Many see skills as small, modular “how-to” units for agents: structured docs plus optional scripts, invoked only when needed, not always in context.
Using LLMs as users of internal tools exposes poor APIs, error messages, and undocumented tribal knowledge; fixing these for agents also improves UX for humans.
Skills are framed as reusable workflows or subroutines (“do X then Y then validate”) rather than vague best-practices notes, which often get ignored.

Do agents reliably use skills? Mixed results

Several people report that agents frequently don’t invoke skills unless explicitly told, even with semantic triggers.
Vercel’s evals are cited: over half the time skills weren’t called at all; a well-crafted AGENTS.md / docs index often outperformed skills.
Workarounds:
- Put key instructions directly into AGENTS.md / CLAUDE.md and just link to skills.
- Use skills as explicit slash commands or workflows, not as background guidance.
- Make descriptions long and precise about when to use the skill; keep the total number of skills small.
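
The "explicit slash command" workaround can be sketched as a small dispatcher that bypasses the model's own decision about whether to use a skill. This is a minimal illustration, not any harness's actual API: `parse_slash_command`, `build_prompt`, the `/name args` syntax, and the example skill text are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_slash_command(message: str, skills: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (skill_name, arguments) if the message starts with a known /command."""
    text = message.strip()
    if not text.startswith("/"):
        return None
    head, _, args = text[1:].partition(" ")
    if head not in skills:
        return None
    return head, args.strip()


def build_prompt(message: str, skills: Dict[str, str]) -> str:
    """Inject the full skill text when the user names it explicitly."""
    hit = parse_slash_command(message, skills)
    if hit is None:
        # No explicit command: pass the message through unchanged.
        return message
    name, args = hit
    return f"Follow the skill '{name}' exactly.\n\n{skills[name]}\n\nUser request: {args}"


# Hypothetical skill body, keyed by command name.
SKILLS = {"deploy": "1. Run tests\n2. Build\n3. Validate the health check"}
prompt = build_prompt("/deploy staging", SKILLS)
```

Because the user names the skill, its instructions always reach the model, which sidesteps the unreliable automatic triggering described above.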

Context management & progressive disclosure

The core claimed benefit is context efficiency: an index of short descriptions stays in context, and full instructions are loaded only when relevant.
Variants like multi-level “glance → card → skill → README” hierarchies are described to minimize tokens while preserving discoverability.
Some argue this is just good documentation structure; skills mainly standardize where/how that structure lives so harnesses can auto-load it.
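
The index-then-load idea can be sketched roughly as follows. This is an illustration under assumptions, not how any particular harness works: the `Skill` and `SkillIndex` names, the callable-based loader, and the two example skills are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str              # short text that always sits in context
    load_body: Callable[[], str]  # full instructions, fetched only on demand


class SkillIndex:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self.loaded: List[str] = []  # records which bodies were actually fetched

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def index_text(self) -> str:
        """Cheap listing for the prompt: one line per skill, sorted by name."""
        ordered = sorted(self._skills.values(), key=lambda s: s.name)
        return "\n".join(f"- {s.name}: {s.description}" for s in ordered)

    def expand(self, name: str) -> str:
        """Load the full instructions for the one skill that was selected."""
        skill = self._skills[name]
        self.loaded.append(name)
        return skill.load_body()


# Hypothetical skills; in practice load_body would read a file from disk.
index = SkillIndex()
index.register(Skill("release-notes", "Draft release notes from merged PRs.",
                     lambda: "1. List merged PRs\n2. Group by label\n3. Validate links"))
index.register(Skill("db-migrate", "Write and verify a schema migration.",
                     lambda: "1. Generate migration\n2. Run on a copy\n3. Validate row counts"))

header = index.index_text()       # always in context: two short lines
body = index.expand("db-migrate") # full text pulled in only when needed
```

Only the short descriptions cost tokens on every turn; a skill's body is paid for only in the conversations that actually use it.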

Standards, directories, and overlap with other systems

There’s active debate over standard folders (.claude/skills, .codex/skills, .agents/skills, XDG paths); some want early standardization, others warn it’s premature.
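
To make the fragmentation concrete, here is a rough sketch of what a harness might do if it had to honor every folder named in the debate. The function names are hypothetical, and two details are assumptions rather than anything the thread settled: that each skill is a directory containing a `SKILL.md` file, and that the XDG location is an `agents/skills` subfolder of the config directory.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Project-relative folders mentioned in the discussion.
PROJECT_CANDIDATES = [".claude/skills", ".codex/skills", ".agents/skills"]


def candidate_dirs(project_root: Path, env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """List every folder a skill might live in, project folders first."""
    env = os.environ if env is None else env
    dirs = [project_root / rel for rel in PROJECT_CANDIDATES]
    # XDG Base Directory default is ~/.config when XDG_CONFIG_HOME is unset.
    xdg = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    dirs.append(Path(xdg) / "agents" / "skills")  # assumed subfolder, not a standard
    return dirs


def discover_skills(project_root: Path, env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return skill directories (those holding a SKILL.md) across all candidates."""
    found: List[Path] = []
    for base in candidate_dirs(project_root, env):
        if base.is_dir():
            found.extend(sorted(p for p in base.iterdir() if (p / "SKILL.md").is_file()))
    return found
```

Every additional convention adds another entry to the search list, which is roughly the cost that the early-standardization camp wants to avoid.
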
Skills are compared to MCP and plugins:
- One camp says they’re functionally similar (described capabilities, selection, potential package managers, same security risks).
- Another emphasizes: MCP = external tools with round-trips; skills = in-context manuals and scripts that can compose within a single completion.

Skepticism, security, and long-term relevance

Critics see skills as repackaged prompts and markdown wrapped in hype, and suggest that plain, well-organized docs and indexes achieve the same result.
There is concern over public skill registries: unverified content, possible prompt injection or malicious behavior, and a “supply chain” risk analogous to npm.
Some expect skills to be a transitional pattern: larger contexts and better-trained models may make rigid skill specs less important, while the underlying lesson—clear, modular documentation—remains.
