Notes on rolling out Cursor and Claude Code

Ambition, DevOps, and Tooling

  • Several commenters echoed the “ambition unlock”: agents make previously unthinkable tooling projects (e.g., custom type inference, complex static analysis) feel feasible.
  • Good DevOps (fast local tests, simple commands, CI, linting/prettifying) is repeatedly cited as a force multiplier: it both helps agents work better and is itself easier to improve, because agents can do the grunt work (fixing lint errors, adding type annotations, etc.); one such loop is sketched after this list.
  • Some note that tools like Semgrep and structured API docs (e.g., llms.txt) become much more valuable in an agent-driven workflow.
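
  A minimal sketch of that grunt-work loop in TypeScript. Everything here is an assumption for illustration: askAgentToFix is a hypothetical hook for whatever agent CLI or API you run, and eslint/tsc stand in for your project's own checks.

      import { execSync } from "node:child_process";

      function run(cmd: string): { ok: boolean; output: string } {
        try {
          return { ok: true, output: execSync(cmd, { encoding: "utf8" }) };
        } catch (err: any) {
          // execSync throws on a nonzero exit; stdout/stderr hold the errors
          return { ok: false, output: `${err.stdout ?? ""}${err.stderr ?? ""}` };
        }
      }

      async function gruntWorkLoop(
        askAgentToFix: (report: string) => Promise<void>, // hypothetical agent hook
        maxRounds = 3,
      ): Promise<void> {
        for (let round = 0; round < maxRounds; round++) {
          const lint = run("npx eslint .");
          const types = run("npx tsc --noEmit");
          if (lint.ok && types.ok) return; // clean tree: nothing left to fix
          // A narrow instruction keeps the agent on grunt work, not redesigns.
          await askAgentToFix(
            "Fix only these lint/type errors; do not restructure code:\n" +
              lint.output + types.output,
          );
        }
        throw new Error(`Checks still failing after ${maxRounds} rounds; review manually.`);
      }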

Comments, Code Quality, and Maintainability

  • There’s disagreement on “ugly” agent code laden with comments.
    • Some find the excessive “what this line does” comments annoying or low value and enforce “no comments except why” via prompts or rules.
    • Others like the extra comments or simply strip them on review, arguing this is a minor tradeoff.
  • Many report that agents happily produce sprawling, unstructured code that “works” but is hard to maintain. Some see a strong correlation between code that confuses humans and code that breaks/confuses LLMs.

When and Whether to Use Agents

  • A recurring theme is “forgetting” to use agents, even among heavy users.
    • Some interpret this as a sign the tool isn’t always a big win; when you know exactly what to write, typing it is faster than prompting.
    • Others emphasize habit change, cognitive overhead of deciding to invoke the tool, and the joy/value of doing parts of the work manually.
  • Latency, iterative failures, and context-switching cost also push people to sometimes just code directly.

Ecosystem, Interfaces, and Costs

  • Alternatives and complements to Cursor/Claude Code mentioned include Aider, Plandex, JetBrains with Claude, and various CLI + Neovim setups.
  • Claude Code is described as a CLI coding agent that auto-loads project context and applies diffs rather than requiring copy/paste.
  • Token spend varies wildly: some teams see heavy users at ~$50/month, while others report burning ~$20/day on big refactors. Techniques to control cost include smaller contexts, cheaper models, chunking tasks, and caching; two of these are sketched below.
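
  Two of those techniques in a minimal TypeScript sketch: chunk a large refactor into per-file prompts, and cache responses by prompt hash so retries don't re-spend tokens. callModel is a stand-in for whatever API or CLI you use, not any specific SDK.

      import { createHash } from "node:crypto";

      type CallModel = (prompt: string) => Promise<string>;

      const cache = new Map<string, string>();

      async function refactorFiles(
        files: { path: string; source: string }[],
        instruction: string,
        callModel: CallModel, // hypothetical model hook
      ): Promise<Map<string, string>> {
        const results = new Map<string, string>();
        for (const f of files) {
          // One small prompt per file instead of the whole repo in one context.
          const prompt = `${instruction}\n\nFile: ${f.path}\n${f.source}`;
          const key = createHash("sha256").update(prompt).digest("hex");
          let out = cache.get(key);
          if (out === undefined) {
            out = await callModel(prompt); // only uncached chunks cost tokens
            cache.set(key, out);
          }
          results.set(f.path, out);
        }
        return results;
      }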

Safety, Reliability, and Workflow Design

  • Several people distrust fully agentic editing after experiences like an AI deleting half a file and replacing it with a placeholder comment.
  • Recommended mitigations: always operate via diffs, constrain scope, and have tools propose human-readable change plans before anything is applied; an approval loop along these lines is sketched after this list.
  • Claude Code is compared to supervising a very fast but very junior dev: potentially productive with close review, disastrous if left unsupervised on larger codebases.
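
  A minimal sketch of the diff-first mitigation, assuming the agent hands you a plan and a patch as plain text: nothing touches disk until a human approves, and git validates the patch before applying, so a malformed one fails loudly instead of silently truncating files.

      import { createInterface } from "node:readline/promises";
      import { execSync } from "node:child_process";
      import { writeFileSync } from "node:fs";

      async function applyWithApproval(plan: string, patch: string): Promise<void> {
        console.log("Proposed change plan:\n" + plan);
        console.log("Proposed diff:\n" + patch);
        const rl = createInterface({ input: process.stdin, output: process.stdout });
        const answer = await rl.question("Apply this patch? [y/N] ");
        rl.close();
        if (answer.trim().toLowerCase() !== "y") {
          console.log("Rejected; no files were modified.");
          return;
        }
        writeFileSync("agent.patch", patch);
        execSync("git apply --check agent.patch"); // validate before touching files
        execSync("git apply agent.patch");
        console.log("Patch applied; review with `git diff` before committing.");
      }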

Non-Engineers Shipping Code

  • The article’s example of a head of product and a PM shipping hundreds of PRs provoked strong reactions:
    • Proponents say it increases dev capacity, tightens design–implementation loops, and is safe under code review and CI.
    • Skeptics see it as “horrifying” or a “disaster waiting to happen,” arguing non-technical roles should focus on higher-leverage work and that this can create maintenance debt and hype-driven optics.
  • There’s disagreement on whether, in an AI-coding world, “non-technical” remains a meaningful category.

Capabilities, Limits, and Language Choice

  • Agentic review works best when rules are explicit and the context is local (e.g., a GitHub Action checking Rails migrations against written guidelines; see the first sketch after this list). General PR review is seen as much harder.
  • Typed languages (TypeScript, etc.) are reported to work better with LLMs: the type system catches many AI mistakes at compile time. Dynamic languages like Ruby are described as more prone to pathological outputs and runtime surprises; the second sketch below shows the kind of mistake a type checker intercepts.
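
  A sketch of that explicit-rules review pattern as a CI script: each Rails migration is sent to a model together with the team's written guidelines, and the build fails on a reported violation. callModel, the guidelines path, and the PASS/FAIL verdict format are all assumptions for illustration.

      import { readdirSync, readFileSync } from "node:fs";
      import { join } from "node:path";

      type CallModel = (prompt: string) => Promise<string>;

      async function reviewMigrations(callModel: CallModel): Promise<boolean> {
        // Hypothetical location for the team's written migration guidelines.
        const guidelines = readFileSync("docs/migration-guidelines.md", "utf8");
        const dir = "db/migrate";
        let ok = true;
        for (const file of readdirSync(dir).filter((f) => f.endsWith(".rb"))) {
          const src = readFileSync(join(dir, file), "utf8");
          // Narrow, explicit task: one migration, one ruleset, one verdict.
          const verdict = await callModel(
            `Guidelines:\n${guidelines}\n\nMigration ${file}:\n${src}\n\n` +
              `Reply "PASS" or "FAIL: <violated guideline>".`,
          );
          if (!verdict.startsWith("PASS")) {
            console.error(`${file}: ${verdict}`);
            ok = false;
          }
        }
        return ok;
      }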
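
  And a tiny illustration of the typed-language point (names are invented): with strictNullChecks on, tsc rejects the careless call site at compile time, where the Ruby equivalent would only surface at runtime as a NoMethodError on nil.

      interface User { id: number; email: string }

      const users: User[] = [{ id: 1, email: "a@example.com" }];

      function findUser(id: number): User | undefined {
        return users.find((u) => u.id === id);
      }

      const user = findUser(42); // no user with id 42, so this is undefined

      // A common agent mistake: using the result without handling the miss.
      // tsc rejects the next line: "'user' is possibly 'undefined'".
      // console.log(user.email);

      // The narrowed version compiles; the missing-user case is explicit.
      if (user) console.log(user.email);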

Economic and Philosophical Concerns

  • One view is that if “anyone can ship code,” developer compensation will be pressured downward, even if full replacement doesn’t happen.
  • There’s a deeper dispute over what LLMs are doing:
    • Critics call them “just token predictors” and liken coding agents to snake oil.
    • Others counter that next-token prediction at current scales requires and exhibits nontrivial reasoning, planning, and domain modeling, which, while imperfect, is already practically useful for many coding tasks.