My LLM codegen workflow

Author’s Workflow, Tools, and Costs

  • Commenters appreciate the concreteness of the workflow: plan with an LLM, have it ask clarifying questions, generate a TODO/plan, then implement stepwise with tools such as Aider, Cursor, repomix, and CLI tooling (e.g. the llm CLI, mise tasks).
  • One data point: ~27M input / 1.5M output tokens in a month against Anthropic's API, costing under $100.
  • Some readers use similar setups with Cursor, Emacs+gptel, or custom scripts that generate repo “maps” (per-file summaries) to keep prompts small and targeted.
  • Others note the article glosses over “the prompt” for Aider, but the author clarifies the planning steps themselves produce that starting prompt.
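The cost data point above can be sanity-checked with back-of-the-envelope arithmetic. The per-token prices below are an assumption (roughly Claude 3.5 Sonnet-class list rates), not stated in the source; the straight-list-price figure lands a little above the reported sub-$100 spend, which prompt caching or a mix of cheaper models would easily account for:

```python
# Rough cost check for ~27M input / 1.5M output tokens in a month.
# Pricing is ASSUMED (Sonnet-class list rates), not from the source.
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens (assumed)

def monthly_cost(input_tokens_m: float, output_tokens_m: float) -> float:
    """Estimate monthly API spend in USD from token counts in millions."""
    return input_tokens_m * INPUT_PRICE_PER_M + output_tokens_m * OUTPUT_PRICE_PER_M

cost = monthly_cost(27, 1.5)
print(f"${cost:.2f}")  # ≈ $103.50 at these assumed rates
```

At these rates input tokens dominate the bill (~$81 of ~$103), which is why context-trimming tricks like repo "maps" matter for cost as well as quality.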

Prompting Techniques and Hallucination Control

  • A widely praised idea is telling the model to ask clarifying questions (“ask up to N questions before answering”) and to generate a TODO list or plan first.
  • People report this improves LLM output, helps them notice missing info, and even improves communication with human engineers.
  • Several note success adding “don’t hallucinate” and “it’s OK to say you don’t know” to prompts; Apple’s system prompts and chain-of-thought self-checking are cited as inspiration, though rigorous evidence for these tricks is thin.
  • Some build prompt libraries or use tools like TextExpander; DSPy is seen as promising but not yet an easy fit.
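The techniques above compose naturally into a reusable prompt template. The exact wording below is illustrative, not quoted from any commenter; this is the kind of snippet people keep in a prompt library or a TextExpander macro:

```python
def build_prompt(task: str, max_questions: int = 5) -> str:
    """Compose a codegen prompt using the techniques discussed above:
    ask-before-answering, plan-first, and permission to say "I don't know".
    The phrasing is illustrative, not a quoted prompt from the source."""
    return "\n".join([
        f"Task: {task}",
        "",
        f"Before answering, ask me up to {max_questions} clarifying questions "
        "if anything is ambiguous or underspecified.",
        "Once the requirements are clear, produce a step-by-step TODO list "
        "before writing any code.",
        "If you are not sure about something, say you don't know "
        "rather than guessing.",
    ])

print(build_prompt("Add CSV export to the reporting module"))
```

The same template works on humans, which matches the observation that writing for an LLM improves communication with engineers too.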

Productivity, Limits, and Skill Curve

  • Enthusiasts describe large real-world productivity gains, especially for greenfield / prototype work and small-to-medium scripts, sometimes cutting hours down to tens of minutes.
  • Others find LLM coding feels like yak-shaving or “futzing with Emacs configs”: fun, but under deadline they revert to traditional coding because LLM output is too unreliable or hard to debug.
  • A recurring theme: effectiveness is highly skill-dependent—context management, prompt design, and judgment about what to accept or discard are crucial.

Greenfield vs Legacy and Large Codebases

  • Many agree LLMs excel at greenfield projects but struggle with mature or very large repos: they introduce unnecessary frameworks, wrong abstractions, and subtle bugs.
  • Strategies discussed: generate per-file summaries, lightweight internal “maps,” scratchpad memory files, and strict modularization to let the model reason about small pieces.
  • There is debate whether this is mainly a tooling/context-window problem or a deeper issue of models lacking genuine whole-system understanding.
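The per-file-summary “map” strategy can be sketched in a few lines. The `summarize` function here is a hypothetical placeholder — in a real setup it would call a model API or the llm CLI — so the sketch stays runnable without network access:

```python
from pathlib import Path

def summarize(source: str) -> str:
    """Placeholder for an LLM call that summarizes one file.
    Hypothetical: a real version would call a model API or the `llm` CLI.
    Here it keeps the first line plus a size note so the sketch runs."""
    first_line = source.splitlines()[0] if source else ""
    return f"{first_line!r} ... ({len(source)} chars)"

def build_repo_map(root: str, suffixes=(".py", ".md")) -> str:
    """Concatenate one-line per-file summaries into a compact repo 'map'
    that fits in a prompt, instead of pasting whole files."""
    lines = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines.append(f"{path.relative_to(root)}: {summarize(path.read_text())}")
    return "\n".join(lines)
```

Prepending the map to a prompt lets the model pick which files to request in full — one small-pieces workaround for the whole-system-understanding problem debated above.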

Team Workflows, Centralized Context, and the Future of Dev Work

  • Several ask how this scales beyond solo work: having multiple devs each run their own agent on the same codebase seems wasteful and risks inconsistent advice.
  • Tools like Cody/Sourcegraph workspaces and shared indexing in editors are mentioned, but a truly “multiplayer” LLM coding environment is seen as still unsolved.
  • Some foresee LLMs pushing devs toward higher-level planning/acceptance-testing roles, or even replacing much of traditional programming; others worry this will erode developers’ mental models and critical thinking.
  • There is active debate over the future of frameworks and abstractions: some predict many will become unnecessary as models emit low-level code directly, while others argue good abstractions and readable code remain vital—especially for humans maintaining AI-written systems.