My LLM codegen workflow

Author’s Workflow, Tools, and Costs

  • Commenters appreciate the concreteness of the workflow: plan with an LLM, have it ask clarifying questions, generate a TODO/plan, then implement stepwise with tools such as Aider, Cursor, repomix, and CLI tooling (e.g. the llm CLI, mise tasks).
  • One data point: ~27M input / 1.5M output tokens in a month against Anthropic's API, costing under $100.
  • Some readers use similar setups with Cursor, Emacs+gptel, or custom scripts that generate repo “maps” (per-file summaries) to keep prompts small and targeted.
  • Others note the article glosses over “the prompt” for Aider, but the author clarifies the planning steps themselves produce that starting prompt.
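The cost data point above can be sanity-checked with back-of-the-envelope arithmetic. The per-token prices below are an assumption (roughly Claude 3.5 Sonnet-class list rates), not stated in the source; the straight-list-price figure lands a little above the reported sub-$100 spend, which prompt caching or a mix of cheaper models would easily account for:

```python
# Rough cost check for ~27M input / 1.5M output tokens in a month.
# Pricing is ASSUMED (Sonnet-class list rates), not from the source.
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens (assumed)

def monthly_cost(input_tokens_m: float, output_tokens_m: float) -> float:
    """Estimate monthly API spend in USD from token counts in millions."""
    return input_tokens_m * INPUT_PRICE_PER_M + output_tokens_m * OUTPUT_PRICE_PER_M

cost = monthly_cost(27, 1.5)
print(f"${cost:.2f}")  # ≈ $103.50 at these assumed rates
```

At these rates input tokens dominate the bill (~$81 of ~$103), which is why context-trimming tricks like repo "maps" matter for cost as well as quality.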

Prompting Techniques and Hallucination Control

  • A widely praised idea is telling the model to ask clarifying questions (“ask up to N questions before answering”) and to generate a TODO list or plan first.
  • People report this improves LLM output, helps them notice missing info, and even improves communication with human engineers.
  • Several note success adding “don’t hallucinate” and “it’s OK to say you don’t know” to prompts; Apple’s system prompts and chain-of-thought self-checking are cited as inspiration, though rigorous evidence for these tricks is thin.
  • Some build prompt libraries or use tools like TextExpander; DSPy is seen as promising but not yet an easy fit.
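The techniques above compose naturally into a reusable prompt template. The exact wording below is illustrative, not quoted from any commenter; this is the kind of snippet people keep in a prompt library or a TextExpander macro:

```python
def build_prompt(task: str, max_questions: int = 5) -> str:
    """Compose a codegen prompt using the techniques discussed above:
    ask-before-answering, plan-first, and permission to say "I don't know".
    The phrasing is illustrative, not a quoted prompt from the source."""
    return "\n".join([
        f"Task: {task}",
        "",
        f"Before answering, ask me up to {max_questions} clarifying questions "
        "if anything is ambiguous or underspecified.",
        "Once the requirements are clear, produce a step-by-step TODO list "
        "before writing any code.",
        "If you are not sure about something, say you don't know "
        "rather than guessing.",
    ])

print(build_prompt("Add CSV export to the reporting module"))
```

The same template works on humans, which matches the observation that writing for an LLM improves communication with engineers too.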

Productivity, Limits, and Skill Curve

  • Enthusiasts describe large real-world productivity gains, especially for greenfield / prototype work and small-to-medium scripts, sometimes cutting hours down to tens of minutes.
  • Others find LLM coding feels like yak-shaving or “futzing with Emacs configs”: fun, but under deadline they revert to traditional coding because LLM output is too unreliable or hard to debug.
  • A recurring theme: effectiveness is highly skill-dependent—context management, prompt design, and judgment about what to accept or discard are crucial.

Greenfield vs Legacy and Large Codebases

  • Many agree LLMs excel at greenfield projects but struggle with mature or very large repos: they introduce unnecessary frameworks, wrong abstractions, and subtle bugs.
  • Strategies discussed: generate per-file summaries, lightweight internal “maps,” scratchpad memory files, and strict modularization to let the model reason about small pieces.
  • There is debate whether this is mainly a tooling/context-window problem or a deeper issue of models lacking genuine whole-system understanding.
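The per-file-summary “map” strategy can be sketched in a few lines. The `summarize` function here is a hypothetical placeholder — in a real setup it would call a model API or the llm CLI — so the sketch stays runnable without network access:

```python
from pathlib import Path

def summarize(source: str) -> str:
    """Placeholder for an LLM call that summarizes one file.
    Hypothetical: a real version would call a model API or the `llm` CLI.
    Here it keeps the first line plus a size note so the sketch runs."""
    first_line = source.splitlines()[0] if source else ""
    return f"{first_line!r} ... ({len(source)} chars)"

def build_repo_map(root: str, suffixes=(".py", ".md")) -> str:
    """Concatenate one-line per-file summaries into a compact repo 'map'
    that fits in a prompt, instead of pasting whole files."""
    lines = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines.append(f"{path.relative_to(root)}: {summarize(path.read_text())}")
    return "\n".join(lines)
```

Prepending the map to a prompt lets the model pick which files to request in full — one small-pieces workaround for the whole-system-understanding problem debated above.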

Team Workflows, Centralized Context, and the Future of Dev Work

  • Several ask how this scales beyond solo work: having multiple devs each run their own agent on the same codebase seems wasteful and risks inconsistent advice.
  • Tools like Cody/Sourcegraph workspaces and shared indexing in editors are mentioned, but a truly “multiplayer” LLM coding environment is seen as still unsolved.
  • Some foresee LLMs pushing devs toward higher-level planning/acceptance-testing roles, or even replacing much of traditional programming; others worry this will erode developers’ mental models and critical thinking.
  • There is active debate over the future of frameworks and abstractions: some predict many will become unnecessary as models emit low-level code directly, while others argue good abstractions and readable code remain vital—especially for humans maintaining AI-written systems.