Agent Skills

What “skills” are and why they matter

  • Many commenters describe skills as small, modular “how-to” units for agents: structured docs plus optional scripts, loaded only when needed rather than kept permanently in context.
  • Having LLMs use internal tools exposes weak APIs, unhelpful error messages, and undocumented tribal knowledge; fixing these for agents also improves UX for humans.
  • Skills are framed as reusable workflows or subroutines (“do X then Y then validate”) rather than vague best-practices notes, which often get ignored.
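
The “do X then Y then validate” framing can be modeled as a small data structure. The following is a minimal sketch of that idea only: real skills are documents plus optional scripts read by an agent, not Python objects, and the `Skill` class, the `release` example, and both step functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

State = dict  # hypothetical: whatever the workflow passes from step to step


@dataclass
class Skill:
    name: str
    description: str  # short text an agent would see before loading the skill
    steps: list[Callable[[State], State]] = field(default_factory=list)
    validate: Callable[[State], bool] = lambda state: True

    def run(self, state: State) -> State:
        # Run every step in order, then check the result before returning it.
        for step in self.steps:
            state = step(state)
        if not self.validate(state):
            raise ValueError(f"skill {self.name!r} failed validation")
        return state


def bump_patch_version(state: State) -> State:
    major, minor, patch = (int(p) for p in state["version"].split("."))
    return {**state, "version": f"{major}.{minor}.{patch + 1}"}


def add_changelog_entry(state: State) -> State:
    entry = f"Release {state['version']}"
    return {**state, "changelog": [*state.get("changelog", []), entry]}


release = Skill(
    name="release",
    description="Use when cutting a patch release.",
    steps=[bump_patch_version, add_changelog_entry],
    # The validation step makes the workflow checkable rather than advisory.
    validate=lambda s: s["changelog"][-1].endswith(s["version"]),
)

result = release.run({"version": "1.2.3"})
```

The point of the sketch is the shape: ordered steps with an explicit check at the end, as opposed to a loose list of best practices.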

Do agents reliably use skills? Mixed results

  • Several people report that agents frequently don’t invoke skills unless explicitly told to, even when semantic triggers are present.
  • Vercel’s evals are cited: over half the time skills weren’t called at all; a well-crafted AGENTS.md / docs index often outperformed skills.
  • Workarounds:
    • Put key instructions directly into AGENTS.md / CLAUDE.md and just link to skills.
    • Use skills as explicit slash commands or workflows, not as background guidance.
    • Make descriptions long and precise about when to use the skill; keep the total number of skills small.
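
The slash-command workaround above removes the model's decision about whether to invoke a skill. A minimal sketch of such a dispatcher follows; the function name, the `/name args` syntax, and the in-memory `skills` mapping are assumptions for illustration, not the behavior of any particular harness.

```python
from __future__ import annotations


def parse_slash_command(message: str, skills: dict[str, str]) -> tuple[str, str] | None:
    """Return (skill_instructions, arguments) for '/name args', else None.

    `skills` maps a skill name to its full instruction text (hypothetical).
    """
    if not message.startswith("/"):
        return None  # ordinary message: no skill is forced
    name, _, args = message[1:].partition(" ")
    if name not in skills:
        return None  # unknown command: fall back to normal handling
    return skills[name], args.strip()


skills = {"review": "Review the given file for bugs, then summarize findings."}

hit = parse_slash_command("/review src/app.py", skills)
miss = parse_slash_command("please review src/app.py", skills)
```

Here `hit` carries the skill text and the argument `src/app.py`, while `miss` is `None` because the message never names the skill explicitly.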

Context management & progressive disclosure

  • The core claimed benefit is context efficiency: an index of short descriptions stays in context, and full instructions are loaded only if relevant.
  • Variants like multi-level “glance → card → skill → README” hierarchies are described to minimize tokens while preserving discoverability.
  • Some argue this is just good documentation structure; skills mainly standardize where/how that structure lives so harnesses can auto-load it.
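
Progressive disclosure as described above can be sketched in a few lines: a short index is always rendered into the prompt, and the full document is read only when a skill is selected. The `SkillEntry` type, the function names, and the `db-migration` example are hypothetical, and the temporary file merely stands in for a real skill document.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass(frozen=True)
class SkillEntry:
    name: str
    description: str  # one-line summary that is always in context
    path: Path        # full instructions, read only on demand


def build_index(entries: list[SkillEntry]) -> str:
    """Render the short index that stays in the prompt."""
    return "\n".join(f"- {e.name}: {e.description}" for e in entries)


def load_skill(entries: list[SkillEntry], name: str) -> str:
    """Read the full instructions for one skill once it is judged relevant."""
    for entry in entries:
        if entry.name == name:
            return entry.path.read_text(encoding="utf-8")
    raise KeyError(f"unknown skill: {name}")


# Demo with a throwaway file standing in for a real skill document.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "db-migration.md"
    doc.write_text(
        "1. Back up the database.\n2. Run the migration.\n3. Verify row counts.\n",
        encoding="utf-8",
    )
    entries = [SkillEntry("db-migration", "Use when changing the database schema.", doc)]
    index = build_index(entries)                # cheap: always in context
    body = load_skill(entries, "db-migration")  # larger: loaded only when needed
```

With many skills, only `index` grows with the catalog; each `body` costs tokens only in the conversations that actually need it.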

Standards, directories, and overlap with other systems

  • There’s active debate over standard folders (.claude/skills, .codex/skills, .agents/skills, XDG paths); some want early standardization, others warn it’s premature.
  • Skills are compared to MCP and plugins:
    • One camp says they’re functionally similar (described capabilities, selection, potential package managers, same security risks).
    • Another emphasizes the difference: MCP = external tools with round-trips; skills = in-context manuals and scripts that can compose within a single completion.
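
The folder debate in the first bullet of this section comes down to where a harness looks for skills. The sketch below searches three of the directories named there and lets the first match win. The search order, the one-subdirectory-per-skill layout, and the `SKILL.md` file name are assumptions made for this example; XDG paths are left out for brevity.

```python
from pathlib import Path
import tempfile

# Searched in order; the first directory that defines a skill name wins.
# This precedence is an arbitrary choice for the sketch, not a standard.
CANDIDATE_DIRS = (".agents/skills", ".claude/skills", ".codex/skills")


def discover_skills(project_root: Path) -> dict[str, Path]:
    """Map each skill name to the first SKILL.md found for it."""
    found: dict[str, Path] = {}
    for rel in CANDIDATE_DIRS:
        base = project_root / rel
        if not base.is_dir():
            continue
        for skill_file in sorted(base.glob("*/SKILL.md")):
            # setdefault keeps the earlier (higher-precedence) entry.
            found.setdefault(skill_file.parent.name, skill_file)
    return found


# Demo: "deploy" exists in two folders, "lint" in only one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in (".agents/skills/deploy", ".claude/skills/deploy", ".codex/skills/lint"):
        skill_dir = root / rel
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(f"# {skill_dir.name}\n", encoding="utf-8")
    found = discover_skills(root)
    winners = {name: path.relative_to(root).as_posix() for name, path in found.items()}
```

A harness that supports several folders needs some such precedence rule, which is part of why commenters disagree on whether to standardize one location now or wait.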

Skepticism, security, and long-term relevance

  • Critics see skills as hyped, repackaged prompts/markdown and suggest that plain, well-organized docs and indexes achieve the same result.
  • Public skill registries raise concerns: unverified content, possible prompt injection or malicious behavior, and “supply chain” risk analogous to npm.
  • Some expect skills to be a transitional pattern: larger contexts and better-trained models may make rigid skill specs less important, while the underlying lesson—clear, modular documentation—remains.