My LLM codegen workflow
Author’s Workflow, Tools, and Costs
- Commenters appreciate the concreteness of the workflow: plan with an LLM, have it ask clarifying questions, generate a TODO/plan, then implement stepwise via tools like Aider, Cursor, repomix, and a CLI wrapper (e.g. llm, mise tasks).
- One data point: roughly 27M input / 1.5M output tokens in a month on Anthropic's API, costing under $100.
- Some readers use similar setups with Cursor, Emacs+gptel, or custom scripts that generate repo “maps” (per-file summaries) to keep prompts small and targeted.
- Others note the article glosses over “the prompt” for Aider, but the author clarifies the planning steps themselves produce that starting prompt.
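The stepwise part of the workflow above boils down to splitting a generated plan into small, focused prompts. A minimal sketch (the function name and the markdown checkbox convention are assumptions, not anything the commenters specify) of extracting unfinished steps from a TODO plan so each can be handed to a tool like Aider one at a time:

```python
import re

def parse_todo_plan(plan_md: str) -> list[str]:
    """Extract unchecked '- [ ]' items from a markdown plan, in
    document order, so each can be fed to a coding tool as one
    small, focused step."""
    steps = []
    for line in plan_md.splitlines():
        # Match only unchecked items; '- [x]' lines are already done.
        m = re.match(r"\s*-\s*\[ \]\s+(.*)", line)
        if m:
            steps.append(m.group(1).strip())
    return steps

plan = """
# Plan
- [x] Decide on SQLite for storage
- [ ] Create the schema migration
- [ ] Implement the CLI entry point
"""

for step in parse_todo_plan(plan):
    print(step)
```

Keeping each step this small is what lets the model work with a tight, targeted context instead of the whole task at once.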
Prompting Techniques and Hallucination Control
- A widely praised idea is telling the model to ask clarifying questions (“ask up to N questions before answering”) and to generate a TODO list or plan first.
- People report this improves LLM output, helps them notice missing info, and even improves communication with human engineers.
- Several note success adding “don’t hallucinate” and “it’s OK to say you don’t know” to prompts; Apple’s system prompts and chain-of-thought self-checking are cited as inspiration, though rigorous evidence is unclear.
- Some build prompt libraries or use tools like TextExpander; DSPy is seen as promising but not yet an easy fit.
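The "ask up to N questions" and "it's OK to say you don't know" techniques are easy to bake into a reusable prompt wrapper. A minimal sketch (function name and exact wording are illustrative assumptions, not a quoted prompt from the discussion):

```python
def clarifying_prompt(task: str, max_questions: int = 3) -> str:
    """Wrap a task description with the 'ask first' technique: the
    model may ask up to N clarifying questions before answering, and
    is explicitly told it is fine to admit uncertainty."""
    return (
        f"Before answering, ask me up to {max_questions} clarifying "
        "questions if anything is ambiguous. If you don't know "
        "something, say so rather than guessing.\n\n"
        f"Task: {task}"
    )

print(clarifying_prompt("Add retry logic to the upload client", 2))
```

Commenters who maintain prompt libraries or TextExpander snippets are effectively doing this by hand; a small wrapper like this makes the technique uniform across prompts.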
Productivity, Limits, and Skill Curve
- Enthusiasts describe large real-world productivity gains, especially for greenfield / prototype work and small-to-medium scripts, sometimes cutting hours down to tens of minutes.
- Others find LLM coding feels like yak-shaving or “futzing with Emacs configs”: fun, but under deadline they revert to traditional coding as LLM output is too unreliable or hard to debug.
- A recurring theme: effectiveness is highly skill-dependent—context management, prompt design, and judgment about what to accept or discard are crucial.
Greenfield vs Legacy and Large Codebases
- Many agree LLMs excel at greenfield projects but struggle with mature or very large repos: they introduce unnecessary frameworks, wrong abstractions, and subtle bugs.
- Strategies discussed: generate per-file summaries, lightweight internal “maps,” scratchpad memory files, and strict modularization to let the model reason about small pieces.
- There is debate whether this is mainly a tooling/context-window problem or a deeper issue of models lacking genuine whole-system understanding.
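The per-file-summary "map" strategy above can be sketched in a few lines. This is an assumed minimal implementation (the readers' actual scripts are not shown): it walks a repo and pairs each Python file's relative path with the first line of its module docstring, producing a cheap whole-repo overview that fits in a prompt:

```python
import ast
import os

def repo_map(root: str, exts: tuple = (".py",)) -> str:
    """Build a compact per-file 'map': relative path plus the first
    line of the module docstring (or a placeholder), giving a model a
    cheap overview of a codebase without pasting whole files."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    tree = ast.parse(f.read())
                doc = ast.get_docstring(tree) or "(no docstring)"
                summary = doc.splitlines()[0]
            except SyntaxError:
                summary = "(unparseable)"
            lines.append(f"{os.path.relpath(path, root)}: {summary}")
    return "\n".join(lines)
```

A real version would also skip directories like `.git` and could use an LLM to write richer summaries, but even docstring-only maps let the model reason about small pieces of a large system.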
Team Workflows, Centralized Context, and the Future of Dev Work
- Several ask how this scales beyond solo work: multiple devs each running their own agent on the same codebase seems wasteful and risks inconsistent advice.
- Tools like Cody/Sourcegraph workspaces and shared indexing in editors are mentioned, but a truly “multiplayer” LLM coding environment is seen as still unsolved.
- Some foresee LLMs pushing devs toward higher-level planning/acceptance-testing roles, or even replacing much of traditional programming; others worry this will erode developers’ mental models and critical thinking.
- There is active debate over the future of frameworks and abstractions: some predict many will become unnecessary as models emit low-level code directly, while others argue good abstractions and readable code remain vital—especially for humans maintaining AI-written systems.