My experience creating software with LLM coding agents – Part 2 (Tips)
Prompting, Planning, and Clarifying Questions
- Several commenters echo the value of “measure twice, cut once”: invest tokens in planning, decomposition, and clarifying requirements before coding.
- Forcing the agent to ask clarifying questions upfront (sometimes over multiple rounds, even requiring a minimum number of questions) is reported to greatly improve outcomes and reduce wrong turns; see the example prompt after this list.
- Others argue a faster workflow is to ask the model to generate an “ideal context” in one shot and then correct it, minimizing back-and-forth.
- Some people “talk it out” stream-of-consciousness style to convey their mental model; the agent’s follow-up questions then surface the details they left out.
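As a concrete illustration of the clarifying-questions approach, a reusable planning prompt might look like the sketch below; the wording and the minimum question count are illustrative choices, not something prescribed in the thread.

```
Before writing any code, ask me at least five clarifying questions about
requirements, edge cases, constraints, and existing conventions in this repo.
Wait for my answers, then propose a short implementation plan and stop.
Do not generate code until I approve the plan.
```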
Agents, Sub-Agents, and Context Management
- People describe running a dedicated “architect” or planning mode before enabling code generation; it can consume 30–40% of the token budget but reduces rework later.
- Sub-agents are used to run tests, drive Playwright, or read “memory bank” files and return only the relevant summaries, keeping the main agent’s context focused.
- Suggestions include putting instructions in AGENTS.md / CLAUDE.md and using @file syntax rather than long “please read X” prompts.
- One workflow: bundle the repository into a single file so the model can take in the whole codebase at once; others note this only works for small-to-medium projects. A sketch of the idea follows below.
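For the single-file bundling workflow, here is a minimal Python sketch, assuming a plain-text output and simple extension/size filters; dedicated bundling tools exist, so treat this as an illustration of the idea rather than a recommended implementation.

```python
from pathlib import Path

# Assumed filters: adjust the extensions, skip list, and size cap for your project.
INCLUDE_SUFFIXES = {".py", ".ts", ".md", ".toml", ".json"}
SKIP_DIRS = {".git", "node_modules", "dist", "__pycache__"}
MAX_BYTES = 200_000  # skip generated or vendored files that would blow up the context

def bundle_repo(root: str, out_file: str = "codebase_bundle.txt") -> None:
    """Concatenate source files into one file an LLM can read in a single pass."""
    root_path = Path(root)
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(root_path.rglob("*")):
            if not path.is_file() or path.suffix not in INCLUDE_SUFFIXES:
                continue
            if any(part in SKIP_DIRS for part in path.parts):
                continue
            if path.stat().st_size > MAX_BYTES:
                continue
            # Label each file so the model can attribute code to its location.
            out.write(f"\n===== {path.relative_to(root_path)} =====\n")
            out.write(path.read_text(encoding="utf-8", errors="replace"))

if __name__ == "__main__":
    bundle_repo(".")
```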
Pricing, Usage Levels, and ROI
- Strong disagreement over “heavy users should use pay-as-you-go.” Multiple commenters claim Anthropic’s Max plan is dramatically cheaper (≈10% of direct API cost) for high-volume users, especially with rollover to API credits.
- Some report spending $1k+/month on tokens and feeling it has huge productivity ROI; others are skeptical this usage is auditable or personally sustainable.
- There’s debate on whether humans can realistically review the volume of code implied by very high token spend; some reply you only review the “final” successful version, not every intermediate attempt.
- Others prefer cheaper, more conservative tools like GitHub Copilot over agentic systems that can propose large, risky edits.
Testing Behavior and Risks
- Experiences differ on agents disabling tests: some see agents give up and skip or disable failing tests, while others avoid this by exposing only a CLI test runner (e.g., run_tests.sh); see the sketch after this list.
- A recurring warning: agents often generate tests that validate current (possibly buggy) behavior instead of intended behavior, leading to a false sense of stability.
- Using two different models to cross-check each other’s work is reported as effective for catching bugs and bad designs.
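The thread’s example of a guarded entry point is a shell script (run_tests.sh); below is a Python sketch of the same idea, assuming pytest as the underlying runner. The point is that the agent gets a single command and a terse pass/fail summary rather than direct control over test selection or configuration; the output trimming is an assumption.

```python
#!/usr/bin/env python3
"""Single test entry point exposed to the agent: run everything, report tersely."""
import subprocess
import sys

def main() -> int:
    # Always run the full suite; there are no flags for selecting or skipping tests.
    result = subprocess.run(
        ["pytest", "--quiet", "--color=no"],
        capture_output=True,
        text=True,
    )
    # Show only the tail of the output: enough to see which tests failed,
    # without a wall of logs for the agent to wade through.
    tail = result.stdout.strip().splitlines()[-20:]
    print("\n".join(tail))
    print("RESULT:", "PASS" if result.returncode == 0 else "FAIL")
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```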
Documentation, Readmes, and Bot Instructions
- Human developers ask that bot-specific instructions be kept out of README.md, suggesting BOTS.md or AGENTS.md instead.
- There’s debate about what README.md “means”: some treat it as a signal that you are at the project root; others say it simply marks “something you should read here” and can appear at many levels of a repo.
- People are converging on AGENTS.md as an emerging convention for LLM-oriented context; a minimal example follows below.
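A minimal sketch of what such a file might contain is shown below; the specific rules are invented for illustration and use the RFC-style keywords discussed in the next section.

```markdown
# AGENTS.md (illustrative example)

- You MUST run ./run_tests.sh and report its result before declaring a task done.
- You MUST NOT modify, skip, or disable existing tests to make a change pass.
- You SHOULD ask clarifying questions before starting large or ambiguous changes.
- You MAY propose refactors, but keep them in a separate, clearly labeled change.
```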
Prompt Style and Over-Engineering
- Mixed experiences on verbosity: some find concise, non-anthropomorphic prompts best; others get better results from long, detailed prompts and sharing their thought process.
- “RFC-style” language (MUST/SHOULD/MAY) is reported to work well for persistent instructions.
- LLMs are seen as prone to over-engineering: adding abstractions, caching, and refactors that look plausible but may be unnecessary or even slower.
- Commenters stress asking tightly scoped improvement questions (e.g., “improve maintainability without major architectural change”) rather than open-ended ones like “how can we improve this?”; a contrasting pair of example prompts follows below.
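To make the last point concrete, the contrast might look something like this; the wording is illustrative and stays close to the example quoted above.

```
Too open-ended:
  "How can we improve this code?"

Tightly scoped:
  "Improve the maintainability of this module without major architectural
   changes or new dependencies; list each proposed change and the reason."
```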