Get Shit Done: A meta-prompting, context engineering and spec-driven dev system
Perceived Benefits of GSD / Spec-Driven Harnesses
- Some users report big productivity boosts over “raw” Claude Code: reaching ~90–95% completeness on complex tasks, then finishing the rest with manual testing.
- Examples mentioned: self‑hosted VPN manager, SaaS products (including agent‑centric CMS), macOS/iOS apps, data pipelines, lab preprocessing/visualization.
- Fans like the enforced structure: research → spec → plan → implement; multi‑step cross‑checks; and storing specs/plans as persistent context.
- Spec‑driven workflows (including alternatives like openspec, Superpowers, PAUL) are seen as helping clarify requirements, avoid vibe‑coding, and make it easier to constrain and evolve one’s own AI workflow over time.
Major Criticisms and Pain Points
- Many found GSD and similar frameworks overengineered, slow, and “all ceremony”: lots of planning and transcripts for modest code output.
- Several users got equal or better results just using Claude Code plan mode, markdown specs, or simple custom scripts/loops.
- Complaints include: difficulty adjusting plans when requirements change, black‑box behavior, and poor handling once projects become large and messy.
Token Usage, Speed, and Scale
- Repeated reports of extreme token consumption: hitting 5‑hour and weekly Claude limits quickly; hours of agent work vs minutes with lighter workflows.
- GSD is often described as a “token burner,” with Superpowers and other harnesses showing similar issues in some setups.
- Quick or “thin” modes partially mitigate cost but undercut the main value proposition of full orchestration.
Spec vs Tests and Verification
- Strong concern that lines of code (LOC) and speed overshadow verification: more AI‑generated code often means less thorough human review.
- Several argue that natural‑language specs don’t scale: they rot, are ambiguous, and aren’t systematically checked against behavior.
- Counterview: specs improve clarity and feed into tests; tests are seen by some as the true executable specs. There’s interest in workflows that enforce test‑first, mutation testing, and adversarial reviews.
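The “tests as executable specs” view can be shown with a minimal sketch. The `apply_discount` function and its one-line spec are hypothetical illustrations, not anything from GSD:

```python
# Hypothetical natural-language spec: "Orders over $100 get a 10% discount;
# the payable total never drops below $0."
# The test below encodes that spec executably, so drift between spec and
# behavior is caught mechanically rather than by re-reading prose.

def apply_discount(total: float) -> float:
    """Return the payable total after the bulk-order discount."""
    if total > 100:
        return round(total * 0.9, 2)
    return max(total, 0.0)

def test_discount_spec():
    assert apply_discount(200.0) == 180.0   # over $100: 10% off
    assert apply_discount(100.0) == 100.0   # boundary: no discount at exactly $100
    assert apply_discount(50.0) == 50.0     # under $100: unchanged
    assert apply_discount(-5.0) == 0.0      # never below zero

test_discount_spec()
```

Mutation testing and adversarial review then attack the test suite itself, checking that deliberately broken variants of `apply_discount` actually fail it.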
Harness Design, Autonomy, and Safety
- Debate over whether these are just unnecessary CLI wrappers vs genuinely useful “harnesses” that offload orchestration to deterministic software.
- Some prefer minimal scripts plus plan mode; others layer custom agents, property‑graph planners, or Ralph‑style loops.
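A “Ralph-style” loop, contrasted here with heavier harnesses, is little more than re-feeding one fixed prompt to an agent until a completion marker appears. A minimal Python sketch, where the `claude -p` invocation, file names, and iteration cap are illustrative assumptions rather than GSD’s actual implementation:

```python
# Minimal "Ralph-style" loop (illustrative sketch, not GSD's implementation):
# feed the same prompt file to a CLI agent repeatedly, stopping once the agent
# creates a completion marker that the prompt instructs it to write.
import subprocess
from pathlib import Path

PROMPT_FILE = Path("PROMPT.md")  # fixed prompt, assumed to exist in the repo
MARKER = Path("DONE.md")         # sentinel the prompt asks the agent to create

def ralph_loop(agent_cmd=("claude", "-p"), max_iters=50):
    """Re-run the agent on the same prompt until MARKER exists or we give up."""
    for _ in range(max_iters):
        if MARKER.exists():
            return True
        # `claude -p` runs Claude Code non-interactively on one prompt;
        # substitute whatever agent CLI you actually use.
        result = subprocess.run([*agent_cmd, PROMPT_FILE.read_text()])
        if result.returncode != 0:
            break
    return MARKER.exists()
```

The appeal is that all orchestration lives in a dozen lines of deterministic code you fully control, at the cost of none of the research/spec/plan scaffolding that harnesses like GSD provide.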
- Safety concerns about GSD’s recommendation to skip permission prompts; suggestions to run agents in sandboxes/VMs and to support finer‑grained permission profiles.
Broader Reflections
- Many see these frameworks as today’s equivalent of elaborate editor configs: highly personal, often ephemeral, and quickly outdated by new model capabilities.
- There’s a call for benchmarks and real‑world evidence (e.g., production features shipped, long‑lived codebases touched) rather than LOC or demo claims.