Agent Skills

Perceived Benefits of Agent Skills / Harnesses

  • Several users report strong results from Agent Skills and similar systems on side projects and production work.
  • Claimed benefits: more focus on architecture/product design, faster implementation, better handling of large/legacy codebases, and reusable “review” surfaces for plans, docs, and code.
  • Some find skills particularly useful for API design, UI testing, infrastructure work, and domain-specific “how to use this tool/library” instructions.

Comparison to Other Frameworks (Superpowers, Spec-kit, etc.)

  • Frequent comparisons with Superpowers, Compound Engineering, spec-kit, and other harnesses.
  • Mixed experiences with Superpowers:
    • Some say it replaces a lot of prompting for complex tasks and improves discipline.
    • Others removed it, citing extra latency, token burn, and marginal benefit vs. simply asking the model to plan and ask questions.
  • Many suspect heavy overlap between these frameworks; some view them as “prompt libraries” with different branding.

Skills vs. Simple Prompts / Process vs. Outcome

  • One camp: best results come from clearly specifying desired outcomes; elaborate processes and long skills are overkill and often untested.
  • Counterpoint: for complex tasks, process instructions (plan first, ask clarifying questions, follow conventions, maintain tests, document decisions) significantly improve reliability.
  • Skills are framed by some as reusable, shareable context and lightweight “sub-agent” prompts; critics argue many are bloated essays.

Complexity, Context, and Token Costs

  • Concern that long skills and many MCPs/plugins bloat context and raise costs.
  • Others note that only skill metadata is always loaded; full content is pulled in selectively, so even skills of several thousand words are manageable with large context windows.
  • Several warn against blindly installing big skill packs; recommend starting with defaults and adding minimal, task-specific skills.
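
The selective-loading point can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the `Skill` fields, the `build_context` helper, and the skill names are hypothetical and do not reflect any particular harness's actual format or loader. Word counts stand in for tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record; field names are illustrative assumptions."""
    name: str
    description: str  # short metadata, always placed in context
    body: str         # full instructions, loaded only when the skill is used


def build_context(skills: list[Skill], active: set[str]) -> str:
    """Include every skill's metadata line, plus the body of active skills only."""
    parts = []
    for skill in skills:
        parts.append(f"[skill] {skill.name}: {skill.description}")
        if skill.name in active:
            parts.append(skill.body)
    return "\n".join(parts)


def word_count(text: str) -> int:
    """Crude size proxy: whitespace-separated words, not real tokens."""
    return len(text.split())


# Two made-up skills with long bodies (3000 and 2000 filler words).
skills = [
    Skill("api-design", "Conventions for designing REST endpoints.", "word " * 3000),
    Skill("ui-testing", "How to write browser tests.", "word " * 2000),
]

idle = build_context(skills, active=set())
working = build_context(skills, active={"api-design"})

print(word_count(idle), word_count(working))
```

With nothing active, only the two metadata lines occupy context; activating one skill adds that skill's body and leaves the other's out. Whether a given harness behaves this way is something to check against its own documentation.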

Reliability, Testing, and “Snake Oil” Skepticism

  • Strong criticism that these harnesses overestimate how reliably LLMs follow rules; models still drop hard requirements.
  • Argument that real safety and quality still require human review, deterministic tests, and sandboxing; workflows alone can’t guarantee correctness.
  • Calls for proper A/B tests, benchmarks, and before/after comparisons; many note these are usually missing.
  • Others respond that, despite imperfections, these systems measurably improve speed and consistency in their environments.
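
As a rough illustration of the before/after comparison commenters ask for, the sketch below compares pass/fail outcomes on the same task set with and without a skill. The `Comparison` type, the `compare_runs` function, and the task names are hypothetical, and the outcomes are placeholders invented for the example, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    tasks: int
    baseline_pass_rate: float
    skill_pass_rate: float
    fixed: int      # failed without the skill, passed with it
    regressed: int  # passed without the skill, failed with it


def compare_runs(baseline: dict[str, bool], with_skill: dict[str, bool]) -> Comparison:
    """Compare pass/fail outcomes on the tasks present in both runs."""
    tasks = sorted(baseline.keys() & with_skill.keys())
    if not tasks:
        raise ValueError("the two runs share no tasks")
    return Comparison(
        tasks=len(tasks),
        baseline_pass_rate=sum(baseline[t] for t in tasks) / len(tasks),
        skill_pass_rate=sum(with_skill[t] for t in tasks) / len(tasks),
        fixed=sum(1 for t in tasks if not baseline[t] and with_skill[t]),
        regressed=sum(1 for t in tasks if baseline[t] and not with_skill[t]),
    )


# Placeholder outcomes, invented for illustration only (not measured results).
baseline = {"task-a": True, "task-b": False, "task-c": False, "task-d": True}
with_skill = {"task-a": True, "task-b": True, "task-c": False, "task-d": False}

result = compare_runs(baseline, with_skill)
print(result)
```

In this placeholder data the aggregate pass rate is identical in both runs even though one task was fixed and another regressed, which is one reason per-task comparisons tend to be more informative than a single headline number. A real evaluation would also need repeated runs, since model output varies between attempts.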

Productivity, Pseudo-Productivity, and Measurement

  • Some believe agent tinkering is “pseudo productivity” and will be seen as a time sink.
  • Others report quantified gains: faster ticket burn-down, automated infra tasks, and parallel agent sessions, with an initial slowdown while learning followed by a noticeable boost.
  • Debate over what to measure (features shipped, defect rates, MTTR, etc.) and how much experimentation time is justified.
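
For readers unfamiliar with the metrics named above, here is one common way to compute two of them. This is a sketch under assumptions: the function names are hypothetical, MTTR is taken as the mean of resolution time minus detection time, defect rate is taken as defects per shipped change, and the incident timestamps are placeholders rather than real data.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = [resolved - detected for detected, resolved in incidents]
    if any(d < timedelta(0) for d in durations):
        raise ValueError("an incident was resolved before it was detected")
    return sum(durations, timedelta()) / len(durations)


def defect_rate(defects: int, changes_shipped: int) -> float:
    """Defects per shipped change; one of several possible definitions."""
    if changes_shipped <= 0:
        raise ValueError("changes_shipped must be positive")
    return defects / changes_shipped


# Placeholder incidents, invented for illustration: 30 and 90 minutes long.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 10, 30)),
]

mttr = mean_time_to_recovery(incidents)
rate = defect_rate(defects=3, changes_shipped=60)
print(mttr, rate)
```

Comparing these numbers for a period before and after adopting agent tooling only says something if the workload and team were otherwise similar, which is part of what the debate is about.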

Personalization vs. Shared Configs

  • Multiple commenters emphasize skills are highly personal/team-specific.
  • Recommended usage: treat public skillsets as references; copy/adapt small pieces rather than bulk installing entire frameworks.