Agent Skills
Perceived Benefits of Agent Skills / Harnesses
- Several users report strong results from Agent Skills and similar systems on side projects and production work.
- Claimed benefits: more focus on architecture/product design, faster implementation, better handling of large/legacy codebases, and reusable “review” surfaces for plans, docs, and code.
- Some find skills particularly useful for API design, UI testing, infrastructure work, and domain-specific “how to use this tool/library” instructions.
Comparison to Other Frameworks (Superpowers, Spec-kit, etc.)
- Frequent comparisons with Superpowers, Compound Engineering, spec-kit, and other harnesses.
- Mixed experiences with Superpowers:
  - Some say it replaces a lot of prompting for complex tasks and improves discipline.
  - Others removed it, citing extra latency, token burn, and marginal benefit vs. simply asking the model to plan and ask questions.
- Many suspect heavy overlap between these frameworks; some view them as “prompt libraries” with different branding.
Skills vs. Simple Prompts / Process vs. Outcome
- One camp: best results come from clearly specifying desired outcomes; elaborate processes and long skills are overkill and often untested.
- Counterpoint: for complex tasks, process instructions (plan first, ask clarifying questions, follow conventions, maintain tests, document decisions) significantly improve reliability.
- Skills are framed by some as reusable, shareable context and lightweight “sub-agent” prompts; critics argue many are bloated essays.
Complexity, Context, and Token Costs
- Concern that long skills and many MCPs/plugins bloat context and raise costs.
- Others note that only skill metadata is always loaded and the full content is pulled in selectively, so even skills running to several thousand words are manageable with large context windows.
- Several warn against blindly installing big skill packs; recommend starting with defaults and adding minimal, task-specific skills.
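The selective-loading argument can be made concrete with a minimal sketch. The `Skill` class, the example skills, and the four-characters-per-token estimate are all hypothetical and chosen for illustration; they do not reflect any particular harness's actual loader.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short metadata, assumed to always be in context
    body: str         # full instructions, loaded only when selected


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return len(text) // 4


def context_cost(skills: list[Skill], active: set[str]) -> int:
    """Tokens used when every skill's metadata is loaded but only
    the bodies of the active skills are pulled in."""
    total = 0
    for skill in skills:
        total += estimate_tokens(skill.name + skill.description)
        if skill.name in active:
            total += estimate_tokens(skill.body)
    return total


# Hypothetical skills with placeholder bodies of 20,000 characters each.
skills = [
    Skill("api-design", "Conventions for REST endpoints.", "x" * 20_000),
    Skill("ui-testing", "How to write browser tests.", "x" * 20_000),
]

metadata_only = context_cost(skills, active=set())
one_active = context_cost(skills, active={"api-design"})
all_active = context_cost(skills, active={"api-design", "ui-testing"})
```

Under these assumptions the metadata-only cost stays tiny, and each activated skill adds its full body to the total. That is the trade-off both camps are describing: installed-but-idle skills are cheap, while skills that trigger often are not.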
Reliability, Testing, and “Snake Oil” Skepticism
- Strong criticism that these harnesses overestimate how reliably LLMs follow rules; models still drop hard requirements.
- Argument that real safety and quality still require human review, deterministic tests, and sandboxing; workflows alone can’t guarantee correctness.
- Calls for proper A/B tests, benchmarks, and before/after comparisons; many note this is usually missing.
- Others respond that, despite imperfections, these systems measurably improve speed and consistency in their environments.
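As a rough sketch of the kind of before/after comparison commenters ask for, the snippet below compares the pass rates of a fixed, deterministic test suite across runs with and without a skill. The function names and the sample outcomes are invented for illustration and are not data from the thread; a real comparison would need many more runs and a proper significance test.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of runs whose deterministic checks all passed."""
    if not results:
        raise ValueError("need at least one run")
    return sum(results) / len(results)


def compare(baseline: list[bool], with_skill: list[bool]) -> float:
    """Difference in pass rate (with_skill minus baseline)."""
    return pass_rate(with_skill) - pass_rate(baseline)


# Placeholder outcomes, NOT real measurements: True means the run
# passed the same fixed test suite.
baseline_runs = [True, False, True, False]
skill_runs = [True, True, True, False]

delta = compare(baseline_runs, skill_runs)
```

Keeping the test suite identical across both arms matters here: it is what separates a measured difference from an impression that the workflow "feels" more disciplined.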
Productivity, Pseudo-Productivity, and Measurement
- Some believe agent tinkering is “pseudo-productivity” and will come to be seen as a time sink.
- Others report quantified gains: faster ticket burn-down, automated infra tasks, parallel agent sessions; initial slowdown during learning, then noticeable boost.
- Debate over what to measure (features shipped, defect rates, MTTR, etc.) and how much experimentation time is justified.
Personalization vs. Shared Configs
- Multiple commenters emphasize that skills are highly personal or team-specific.
- Recommended usage: treat public skillsets as references; copy and adapt small pieces rather than bulk-installing entire frameworks.