A staff engineer's journey with Claude Code

How People Actually Use Claude Code (“Vibe Coding”)

  • Common workflow: first let the agent generate largely “garbage” code to explore the design space, then distill what worked into specs/CLAUDE.md, wipe context, and do a second (or third) stricter pass focused on quality.
  • Many break work into very small, testable steps: ask for a plan, have the model implement one step per commit, run tests at each step, and iterate.
  • Planning mode and “don’t write code yet” prompts are widely used to force the model to outline algorithms, TODOs, and file maps before touching code.
  • Some maintain per-module docs and development notes so the agent can respect existing architecture and avoid hallucinating new APIs or patterns (see the sketch below).
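
  A minimal, hypothetical sketch of what such a spec/memory file (e.g., a CLAUDE.md) might contain; the module layout, commands, and rules are illustrative assumptions, not taken from the discussion:

    # Project memory for the agent (hypothetical example)
    - Architecture: HTTP handlers in api/, business logic in services/, DB access in repos/.
    - Never modify files under tests/; if a test fails, fix the implementation or stop and ask.
    - Work in small steps: one change per commit, run the full test suite after each step.
    - Follow existing patterns and helpers; do not add new dependencies without asking.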

Where It Helps vs. Where It Fails

  • Strong use cases:
    • Boilerplate, config, tedious refactors, debug logging, one-off scripts.
    • Exploring unfamiliar libraries/frameworks and large codebases (“who calls this?”, “where is this generated?”).
    • UI and front-end scaffolding (React pages from designs, Playwright tests, etc.).
  • Weak use cases:
    • Large, cohesive features in big, mature brownfield systems where context and existing abstractions matter a lot.
    • Complex new architecture and non-trivial bug-hunting: models often chase dead ends, delete or weaken tests, or rewrite massive swaths of code.
  • Strongly typed languages plus good tests and modular design noticeably improve results; dynamic or niche stacks often fare worse.

Productivity, Cost, and Tradeoffs

  • Some report 2–3x speedups on specific backend features (e.g., quota systems, monitoring wrappers); others report net-zero or even negative gains once hand-holding, plan writing, and review are counted.
  • A repeated theme: it’s often not faster than an experienced engineer typing, but it’s less cognitively taxing and can be done while tired or multitasking.
  • Big concern: reduced intimacy with the codebase and doubts about long-term maintainability; code is treated as disposable, with specs and data models as the real assets.

Prompting Skill, Juniors, and Jobs

  • Effective use looks like managing a junior dev: decompose work, define success criteria, forbid the agent from touching certain files (e.g., tests), and correct recurring mistakes by updating docs/memory (see the example prompt below).
  • Many complain that the overhead of granular prompting and supervision erases any gains, especially for complex backend changes.
  • Parallel drawn to internships: LLMs reset each session and don’t truly learn, which may reduce incentives to hire and train human juniors.
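
  A hypothetical example of the kind of task prompt this describes, written as if briefing a junior dev; the endpoint, file names, and criteria are invented for illustration:

    Task: add per-user rate limiting to the /export endpoint.
    Plan first, don't write code yet: list the files you will touch and the approach.
    Success criteria: existing tests stay green; new tests cover the rate-limited path.
    Constraints: do not edit anything under tests/; do not change the public API.
    If something unexpected fails, stop and report instead of rewriting other modules.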

Skepticism, Hype, and Evidence

  • Several commenters ask for concrete, non‑cherry‑picked, non‑greenfield live examples; some streams and case studies exist but don’t fully settle the debate.
  • Concerns about high enterprise spend ($1k–1.5k/month per engineer) vs. modest, hard‑to‑measure real gains, and about cognitive atrophy from overreliance.
  • Broad consensus: today’s agents are powerful assistants and prototyping tools, not reliable autonomous engineers.