2026-05-04

Agent Skills

Perceived Benefits of Agent Skills / Harnesses

Several users report strong results from Agent Skills and similar systems on side projects and production work.
Claimed benefits: more focus on architecture/product design, faster implementation, better handling of large/legacy codebases, and reusable “review” surfaces for plans, docs, and code.
Some find skills particularly useful for API design, UI testing, infrastructure work, and domain-specific “how to use this tool/library” instructions.

Comparison to Other Frameworks (Superpowers, Spec-kit, etc.)

Commenters frequently compare Agent Skills with Superpowers, Compound Engineering, spec-kit, and other harnesses.
Mixed experiences with Superpowers:
- Some say it replaces a lot of prompting for complex tasks and improves discipline.
- Others removed it, citing extra latency, token burn, and marginal benefit vs. simply asking the model to plan and ask questions.
Many suspect heavy overlap between these frameworks; some view them as “prompt libraries” with different branding.

Skills vs. Simple Prompts / Process vs. Outcome

One camp: best results come from clearly specifying desired outcomes; elaborate processes and long skills are overkill and often untested.
Counterpoint: for complex tasks, process instructions (plan first, ask clarifying questions, follow conventions, maintain tests, document decisions) significantly improve reliability.
Skills are framed by some as reusable, shareable context and lightweight “sub-agent” prompts; critics argue many are bloated essays.

Complexity, Context, and Token Costs

Concern that long skills and many MCPs/plugins bloat context and raise costs.
Others note that only each skill's metadata is always loaded; the full content is pulled in selectively, so even skills running to several thousand words are manageable with large context windows.
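The selective-loading argument can be illustrated with a minimal sketch. Everything here is hypothetical: the `Skill` fields, `build_context`, and the keyword-based `keyword_selector` are illustration-only names, and in a real harness the model itself, not a keyword match, would typically decide which skill body to load.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Skill:
    name: str         # hypothetical field: short identifier
    description: str  # short metadata, always placed in context
    body: str         # full instructions, loaded only on demand


def count_words(text: str) -> int:
    """Rough size proxy; real systems count tokens, not words."""
    return len(text.split())


def keyword_selector(skill: Skill, task: str) -> bool:
    # Assumption: a crude stand-in for the model's own choice of skill.
    task_words = set(task.lower().split())
    return any(part in task_words for part in skill.name.lower().split("-"))


def build_context(
    skills: List[Skill],
    task: str,
    selector: Callable[[Skill, str], bool],
) -> Tuple[str, List[str]]:
    """Return the assembled context and the names of fully loaded skills."""
    # Metadata for every skill is always included.
    parts = [f"{s.name}: {s.description}" for s in skills]
    # Full bodies are added only for the skills the selector picks.
    loaded = [s for s in skills if selector(s, task)]
    parts.extend(s.body for s in loaded)
    return "\n".join(parts), [s.name for s in loaded]
```

With two installed skills of a few thousand words each, a task that matches only one of them pays for one body plus two one-line descriptions, which is the point the commenters are making about context cost.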
Several warn against blindly installing big skill packs; recommend starting with defaults and adding minimal, task-specific skills.

Reliability, Testing, and “Snake Oil” Skepticism

Strong criticism that these harnesses overestimate how reliably LLMs follow rules; models still drop hard requirements.
Argument that real safety and quality still require human review, deterministic tests, and sandboxing; workflows alone can’t guarantee correctness.
Calls for proper A/B tests, benchmarks, and before/after comparisons; many note this is usually missing.
Others respond that, despite imperfections, these systems measurably improve speed and consistency in their environments.
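The A/B comparison commenters call for could start as simply as the sketch below: run the same task set with and without a skill, record pass/fail per run, and compare pass rates. The function names are hypothetical, and the pooled two-proportion z statistic is one common choice swapped in here for illustration, not a method anyone in the discussion proposed.

```python
import math
from typing import List


def pass_rate(results: List[bool]) -> float:
    """Fraction of runs that passed (for example, all tests green)."""
    return sum(results) / len(results)


def two_proportion_z(baseline: List[bool], with_skill: List[bool]) -> float:
    """Pooled z statistic for the difference in pass rates.

    Positive values mean the with_skill arm passed more often.
    Assumes both lists are non-empty and runs are independent.
    """
    n_a, n_b = len(baseline), len(with_skill)
    p_a, p_b = pass_rate(baseline), pass_rate(with_skill)
    pooled = (sum(baseline) + sum(with_skill)) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every run passed or every run failed: no measurable difference.
        return 0.0
    return (p_b - p_a) / se
```

Each list holds one boolean per task run. A |z| well below about 2 suggests the observed difference could easily be noise at that sample size, which is one reason anecdotal before/after impressions are hard to trust.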

Productivity, Pseudo-Productivity, and Measurement

Some believe agent tinkering is “pseudo productivity” and will eventually be seen as a time sink.
Others report quantified gains: faster ticket burn-down, automated infra tasks, parallel agent sessions; initial slowdown during learning, then noticeable boost.
Debate over what to measure (features shipped, defect rates, MTTR, etc.) and how much experimentation time is justified.

Personalization vs. Shared Configs

Multiple commenters emphasize skills are highly personal/team-specific.
Recommended usage: treat public skillsets as references; copy/adapt small pieces rather than bulk installing entire frameworks.
