A staff engineer's journey with Claude Code
How People Actually Use Claude Code (“Vibe Coding”)
- Common workflow: first let the agent generate largely “garbage” code to explore the design space, then distill what worked into specs/CLAUDE.md, wipe the context, and do a second (or third) stricter pass focused on quality.
- Many break work into very small, testable steps: ask for a plan, have the model implement one step per commit, run tests at each step, and iterate.
- Planning mode and “don’t write code yet” prompts are widely used to force the model to outline algorithms, TODOs, and file maps before touching code.
- Some maintain per-module docs and development notes so the agent can respect existing architecture and avoid hallucinating new APIs or patterns.
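To make the distilled specs and per-module notes concrete, below is a minimal sketch of what such a CLAUDE.md might contain; the module paths, rules, and file names are invented for illustration and are not taken from the original discussion.

```markdown
# CLAUDE.md (project memory)

## Architecture (hypothetical module layout)
- `api/`: REST handlers only; business logic lives in `services/`.
- `services/billing/`: quota logic; read `services/billing/NOTES.md` before changing it.

## Rules for the agent
- Plan first: list the affected files and a step-by-step TODO before writing any code.
- One small, testable change per commit; run the test suite after each step.
- Never modify files under `tests/`; propose new test cases in the plan instead.

## Known pitfalls
- Do not invent new HTTP client wrappers; reuse the existing one in `lib/http.ts`.
```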
Where It Helps vs. Where It Fails
- Strong use cases:
- Boilerplate, config, tedious refactors, debug logging, one-off scripts.
- Exploring unfamiliar libraries/frameworks and large codebases (“who calls this?”, “where is this generated?”).
- UI and front-end scaffolding (React pages from designs, Playwright tests, etc.); see the sketch after this list.
- Weak use cases:
- Large, cohesive features in big, mature brownfield systems where context and existing abstractions matter a lot.
- Complex new architecture and non-trivial bug-hunting: models often chase dead ends, delete or weaken tests, or rewrite massive swaths of code.
- Strongly typed languages plus good tests and modular design noticeably improve results; dynamic or niche stacks often fare worse.
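As an illustration of the front-end scaffolding case referenced above, here is a minimal sketch of the kind of Playwright test agents tend to produce reliably; the route, field labels, and messages are hypothetical.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical settings page: the route, labels, and toast text are invented
// for illustration. Assumes baseURL is set in playwright.config.
test('user can update display name from settings', async ({ page }) => {
  await page.goto('/settings/profile');

  // Fill the form and submit.
  await page.getByLabel('Display name').fill('Ada Lovelace');
  await page.getByRole('button', { name: 'Save' }).click();

  // Assert the confirmation toast and the persisted value.
  await expect(page.getByText('Profile updated')).toBeVisible();
  await expect(page.getByLabel('Display name')).toHaveValue('Ada Lovelace');
});
```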
Productivity, Cost, and Tradeoffs
- Some report 2–3x speedups on specific backend features (e.g., quota systems, monitoring wrappers), while others see net-zero or negative gains once hand‑holding, plan writing, and review are counted.
- A repeated theme: it’s often not faster than an experienced engineer typing, but it’s less cognitively taxing and can be done while tired or multitasking.
- Big concern: reduced familiarity with the codebase and doubts about long‑term maintainability; code gets treated as disposable, with specs and data models as the real assets.
Prompting Skill, Juniors, and Jobs
- Effective use looks like managing a junior dev: decompose work, define success criteria, forbid touching certain files (e.g., tests), and correct recurring mistakes by updating docs/memory.
- Many complain that the overhead of granular prompting and supervision erases any gains, especially for complex backend changes.
- Parallel drawn to internships: LLMs reset each session and don’t truly learn, which may reduce incentives to hire and train human juniors.
Skepticism, Hype, and Evidence
- Several commenters ask for concrete, non‑cherry‑picked, non‑greenfield live examples; some streams and case studies exist but don’t fully settle the debate.
- Concerns about high enterprise spend ($1k–1.5k/month per engineer) vs. modest, hard‑to‑measure real gains, and about cognitive atrophy from overreliance.
- Broad consensus: today’s agents are powerful assistants and prototyping tools, not reliable autonomous engineers.