My experience creating software with LLM coding agents – Part 2 (Tips)

Prompting, Planning, and Clarifying Questions

  • Several commenters echo the value of “measure twice, cut once”: invest tokens in planning, decomposition, and clarifying requirements before coding.
  • Forcing the agent to ask clarifying questions upfront (sometimes multiple rounds, even requiring a minimum number of questions) is reported to greatly improve outcomes and reduce wrong turns; an example prompt follows this list.
  • Others argue a faster workflow is to ask the model to generate an “ideal context” in one shot and then correct it, minimizing back-and-forth.
  • Some people “talk it out” in a stream-of-consciousness way to share their mental model; the agent’s follow-up questions then help expose missing details.
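
An example of the kind of planning prompt being described (illustrative wording, not quoted from any commenter):

    Before writing any code, ask me at least five clarifying questions about the
    requirements, constraints, and edge cases. Wait for my answers, then restate
    the plan in your own words. Do not write any code until I approve the plan.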

Agents, Sub-Agents, and Context Management

  • People describe using a dedicated “architect” or planning mode before allowing any code generation; this planning phase can consume 30–40% of the tokens spent but reduces rework later.
  • Sub-agents are used for tests, Playwright runs, or reading “memory bank” files and returning only relevant summaries, keeping the main context focused.
  • Suggestions include putting instructions in AGENTS.md / CLAUDE.md and using @file syntax rather than long “please read X” prompts.
  • One workflow: bundle the repository into a single file for whole-codebase understanding (a rough sketch follows this list); others note this only works for small-to-medium projects.
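
A rough sketch of the repo-bundling idea in Python, assuming a plain-text codebase; the extension list, size cap, and output file name below are arbitrary choices, not something the commenters specified:

    # bundle_repo.py – concatenate a repo's source files into one context file.
    # Sketch only: extensions, size cap, and output name are arbitrary choices.
    from pathlib import Path

    SOURCE_EXTS = {".py", ".ts", ".md", ".toml", ".json"}  # adjust per project
    MAX_BYTES = 200_000                                     # skip huge generated files

    def bundle(repo_root: str, out_path: str = "codebase.txt") -> None:
        root = Path(repo_root)
        with open(out_path, "w", encoding="utf-8") as out:
            for path in sorted(root.rglob("*")):
                if (path.is_file()
                        and path.suffix in SOURCE_EXTS
                        and ".git" not in path.parts
                        and path.stat().st_size <= MAX_BYTES):
                    # A header per file keeps the model oriented inside the bundle.
                    out.write(f"\n===== {path.relative_to(root)} =====\n")
                    out.write(path.read_text(encoding="utf-8", errors="replace"))

    if __name__ == "__main__":
        bundle(".")

The per-file headers exist so the model can say which file a given snippet came from when it answers questions about the bundle.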

Pricing, Usage Levels, and ROI

  • Strong disagreement over “heavy users should use pay-as-you-go.” Multiple commenters claim Anthropic’s Max plan is dramatically cheaper (≈10% of direct API cost) for high-volume users, especially with rollover to API credits.
  • Some report spending $1k+/month on tokens and feeling it has huge productivity ROI; others are skeptical this usage is auditable or personally sustainable.
  • There’s debate on whether humans can realistically review the volume of code implied by very high token spend; some reply you only review the “final” successful version, not every intermediate attempt.
  • Others prefer cheaper, more conservative tools like GitHub Copilot over agentic systems that can propose large, risky edits.

Testing Behavior and Risks

  • Experiences differ on agents disabling tests: some see agents give up and skip or disable failing tests; others avoid this by exposing only a CLI test runner (e.g., run_tests.sh); a sketch of this setup follows the list.
  • A recurring warning: agents often generate tests that validate current (possibly buggy) behavior instead of intended behavior, leading to a false sense of stability.
  • Using two different models to cross-check each other’s work is reported as effective for catching bugs and bad designs.
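
A minimal sketch of the “only expose a test runner” idea; the commenter used a shell script (run_tests.sh), and the same pattern is shown here as a small Python wrapper with illustrative pytest arguments:

    # run_tests.py – the single test entry point the agent is told to use.
    # Sketch only: assumes a pytest project; arguments and paths are illustrative.
    import subprocess
    import sys

    def main() -> int:
        # Fixed arguments so the agent cannot narrow or silence the test run.
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "--maxfail=5", "-q", "tests/"]
        )
        return result.returncode

    if __name__ == "__main__":
        sys.exit(main())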

Documentation, Readmes, and Bot Instructions

  • Human developers ask that bot-specific instructions be kept out of README.md, suggesting BOTS.md or AGENTS.md instead.
  • There’s debate about what README.md “means”: some treat its presence as a signal that you are at the project root; others say it simply means “something you should read here,” which can appear at many levels of a repo.
  • People are converging around AGENTS.md as an emerging convention for LLM-oriented context.

Prompt Style and Over-Engineering

  • Mixed experiences on verbosity: some find concise, non-anthropomorphic prompts best; others get better results from long, detailed prompts and sharing their thought process.
  • “RFC-style” language (MUST/SHOULD/MAY) is reported to work well for persistent instructions; an example fragment follows this list.
  • LLMs are seen as prone to over-engineering: adding abstractions, caching, and refactors that look plausible but may be unnecessary or even slower.
  • Commenters stress asking tightly scoped improvement questions (e.g., “improve maintainability without major architectural change”) instead of open-ended “how can we improve this?”
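
An illustrative fragment of such RFC-style persistent instructions (for example, kept in AGENTS.md); the specific rules here are invented for the example:

    You MUST run the full test suite before declaring a task complete.
    You MUST NOT disable or delete existing tests to make a change pass.
    You SHOULD prefer small, focused diffs over broad refactors.
    You MAY add brief comments explaining non-obvious decisions.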