A staff engineer's journey with Claude Code

How People Actually Use Claude Code (“Vibe Coding”)

  • Common workflow: first let the agent generate largely “garbage” code to explore the design space, then distill what worked into specs/CLAUDE.md, wipe context, and do a second (or third) stricter pass focused on quality.
  • Many break work into very small, testable steps: ask for a plan, have the model implement one step per commit, run tests at each step, and iterate.
  • Planning mode and “don’t write code yet” prompts are widely used to force the model to outline algorithms, TODOs, and file maps before touching code.
  • Some maintain per-module docs and development notes so the agent can respect existing architecture and avoid hallucinating new APIs or patterns (see the sketch below).
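
  A minimal, hypothetical sketch of what such a spec/memory file (e.g., a CLAUDE.md) might contain; the module layout, commands, and rules are illustrative assumptions, not taken from the discussion:

    # Project memory for the agent (hypothetical example)
    - Architecture: HTTP handlers in api/, business logic in services/, DB access in repos/.
    - Never modify files under tests/; if a test fails, fix the implementation or stop and ask.
    - Work in small steps: one change per commit, run the full test suite after each step.
    - Follow existing patterns and helpers; do not add new dependencies without asking.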

Where It Helps vs. Where It Fails

  • Strong use cases:
    • Boilerplate, config, tedious refactors, debug logging, one-off scripts.
    • Exploring unfamiliar libraries/frameworks and large codebases (“who calls this?”, “where is this generated?”).
    • UI and front-end scaffolding (React pages from designs, Playwright tests, etc.).
  • Weak use cases:
    • Large, cohesive features in big, mature brownfield systems where context and existing abstractions matter a lot.
    • Complex new architecture and non-trivial bug-hunting: models often chase dead ends, delete or weaken tests, or rewrite massive swaths of code.
  • Strongly typed languages plus good tests and modular design noticeably improve results; dynamic or niche stacks often fare worse.

Productivity, Cost, and Tradeoffs

  • Some report 2–3x speedups on specific backend features (e.g., quota systems, monitoring wrappers); others report net-zero or even negative gains once hand-holding, plan writing, and review are counted.
  • A repeated theme: it’s often not faster than an experienced engineer typing, but it’s less cognitively taxing and can be done while tired or multitasking.
  • Big concern: reduced intimacy with the codebase and doubts about long-term maintainability; code is treated as disposable, with specs and data models as the real assets.

Prompting Skill, Juniors, and Jobs

  • Effective use looks like managing a junior dev: decompose work, define success criteria, forbid the agent from touching certain files (e.g., tests), and correct recurring mistakes by updating docs/memory (see the example prompt below).
  • Many complain that the overhead of granular prompting and supervision erases any gains, especially for complex backend changes.
  • Parallel drawn to internships: LLMs reset each session and don’t truly learn, which may reduce incentives to hire and train human juniors.
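
  A hypothetical example of the kind of task prompt this describes, written as if briefing a junior dev; the endpoint, file names, and criteria are invented for illustration:

    Task: add per-user rate limiting to the /export endpoint.
    Plan first, don't write code yet: list the files you will touch and the approach.
    Success criteria: existing tests stay green; new tests cover the rate-limited path.
    Constraints: do not edit anything under tests/; do not change the public API.
    If something unexpected fails, stop and report instead of rewriting other modules.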

Skepticism, Hype, and Evidence

  • Several commenters ask for concrete, non‑cherry‑picked, non‑greenfield live examples; some streams and case studies exist but don’t fully settle the debate.
  • Concerns about high enterprise spend ($1k–1.5k/month per engineer) vs. modest, hard‑to‑measure real gains, and about cognitive atrophy from overreliance.
  • Broad consensus: today’s agents are powerful assistants and prototyping tools, not reliable autonomous engineers.