My experience creating software with LLM coding agents – Part 2 (Tips)

Prompting, Planning, and Clarifying Questions

  • Several commenters echo the value of “measure twice, cut once”: invest tokens in planning, decomposition, and clarifying requirements before coding.
  • Forcing the agent to ask clarifying questions upfront (sometimes multiple rounds, even requiring a minimum number of questions) is reported to greatly improve outcomes and reduce wrong turns; an example prompt follows this list.
  • Others argue a faster workflow is to ask the model to generate an “ideal context” in one shot and then correct it, minimizing back-and-forth.
  • Some people “talk it out” in a stream-of-consciousness way to share their mental model; the agent’s follow-up questions then help expose missing details.
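
An example of the kind of planning prompt being described (illustrative wording, not quoted from any commenter):

    Before writing any code, ask me at least five clarifying questions about the
    requirements, constraints, and edge cases. Wait for my answers, then restate
    the plan in your own words. Do not write any code until I approve the plan.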

Agents, Sub-Agents, and Context Management

  • People describe using a dedicated “architect” or planning mode before allowing any code generation; this planning phase can consume 30–40% of the tokens spent but reduces rework later.
  • Sub-agents are used for tests, Playwright runs, or reading “memory bank” files and returning only relevant summaries, keeping the main context focused.
  • Suggestions include putting instructions in AGENTS.md / CLAUDE.md and using @file syntax rather than long “please read X” prompts.
  • One workflow: bundle the repository into a single file for whole-codebase understanding (a rough sketch follows this list); others note this only works for small-to-medium projects.
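
A rough sketch of the repo-bundling idea in Python, assuming a plain-text codebase; the extension list, size cap, and output file name below are arbitrary choices, not something the commenters specified:

    # bundle_repo.py – concatenate a repo's source files into one context file.
    # Sketch only: extensions, size cap, and output name are arbitrary choices.
    from pathlib import Path

    SOURCE_EXTS = {".py", ".ts", ".md", ".toml", ".json"}  # adjust per project
    MAX_BYTES = 200_000                                     # skip huge generated files

    def bundle(repo_root: str, out_path: str = "codebase.txt") -> None:
        root = Path(repo_root)
        with open(out_path, "w", encoding="utf-8") as out:
            for path in sorted(root.rglob("*")):
                if (path.is_file()
                        and path.suffix in SOURCE_EXTS
                        and ".git" not in path.parts
                        and path.stat().st_size <= MAX_BYTES):
                    # A header per file keeps the model oriented inside the bundle.
                    out.write(f"\n===== {path.relative_to(root)} =====\n")
                    out.write(path.read_text(encoding="utf-8", errors="replace"))

    if __name__ == "__main__":
        bundle(".")

The per-file headers exist so the model can say which file a given snippet came from when it answers questions about the bundle.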

Pricing, Usage Levels, and ROI

  • Strong disagreement over “heavy users should use pay-as-you-go.” Multiple commenters claim Anthropic’s Max plan is dramatically cheaper (≈10% of direct API cost) for high-volume users, especially with rollover to API credits.
  • Some report spending $1k+/month on tokens and feeling it has huge productivity ROI; others are skeptical this usage is auditable or personally sustainable.
  • There’s debate on whether humans can realistically review the volume of code implied by very high token spend; some reply you only review the “final” successful version, not every intermediate attempt.
  • Others prefer cheaper, more conservative tools like GitHub Copilot over agentic systems that can propose large, risky edits.

Testing Behavior and Risks

  • Experiences differ on agents disabling tests: some see agents give up and skip or disable failing tests; others avoid this by exposing only a CLI test runner (e.g., run_tests.sh); a sketch of this setup follows the list.
  • A recurring warning: agents often generate tests that validate current (possibly buggy) behavior instead of intended behavior, leading to a false sense of stability.
  • Using two different models to cross-check each other’s work is reported as effective for catching bugs and bad designs.
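
A minimal sketch of the “only expose a test runner” idea; the commenter used a shell script (run_tests.sh), and the same pattern is shown here as a small Python wrapper with illustrative pytest arguments:

    # run_tests.py – the single test entry point the agent is told to use.
    # Sketch only: assumes a pytest project; arguments and paths are illustrative.
    import subprocess
    import sys

    def main() -> int:
        # Fixed arguments so the agent cannot narrow or silence the test run.
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "--maxfail=5", "-q", "tests/"]
        )
        return result.returncode

    if __name__ == "__main__":
        sys.exit(main())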

Documentation, Readmes, and Bot Instructions

  • Human developers ask that bot-specific instructions be kept out of README.md, suggesting BOTS.md or AGENTS.md instead.
  • There’s debate about what README.md “means”: some treat its presence as a signal that you are at the project root; others say it simply means “something you should read here,” which can appear at many levels of a repo.
  • People are converging around AGENTS.md as an emerging convention for LLM-oriented context.

Prompt Style and Over-Engineering

  • Mixed experiences on verbosity: some find concise, non-anthropomorphic prompts best; others get better results from long, detailed prompts and sharing their thought process.
  • “RFC-style” language (MUST/SHOULD/MAY) is reported to work well for persistent instructions; an example fragment follows this list.
  • LLMs are seen as prone to over-engineering: adding abstractions, caching, and refactors that look plausible but may be unnecessary or even slower.
  • Commenters stress asking tightly scoped improvement questions (e.g., “improve maintainability without major architectural change”) instead of open-ended “how can we improve this?”
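
An illustrative fragment of such RFC-style persistent instructions (for example, kept in AGENTS.md); the specific rules here are invented for the example:

    You MUST run the full test suite before declaring a task complete.
    You MUST NOT disable or delete existing tests to make a change pass.
    You SHOULD prefer small, focused diffs over broad refactors.
    You MAY add brief comments explaining non-obvious decisions.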