How to build a coding agent

Mini SWE Agent and Prompting Approach

  • A small (~100 LOC) SWE-bench agent is highlighted as impressively simple; most of its behavior comes from a very short prompt plus a YAML config.
  • Core loop: analyze codebase, create a repro script, edit source, rerun tests, then test edge cases. Some commenters reuse similar step-by-step prompts to avoid “debug loops.”
  • Others note that the YAML prompt content is substantial and that LLMs can both overestimate and underestimate their own capabilities.

Tools, Bash, and Program Synthesis

  • One camp argues a single bash tool could theoretically cover listing, searching, editing, and patching files.
  • Another argues for specialized tools (list files, read file, edit file) for safety, sandboxing, and clarity, and because some models have been specifically trained on such tools.
  • There’s mention that some models (e.g., Sonnet) sometimes synthesize helper Python programs to perform large refactors in one shot, an emergent form of program synthesis.

Effectiveness on Real Codebases and Costs

  • Skeptics say toy or fresh repos are easy; the hard case is large, old codebases where changes must be precise and non-destructive.
  • Cost concerns: “throwing tokens at the loop” equates to throwing money at the problem; suggestions include caching repo metadata, anticipating tool calls, and parallelizing calls to reduce cost.
  • Local models are seen as promising but still limited for top-tier coding performance.

UX: CLI Agents vs Dashboards/HUDs

  • Several people dislike current CLI agents: they lose context, make random edits, get stuck in loops, and rely on crude file-selection heuristics.
  • Proposed future: richer dashboards/HUDs with previews, action buttons, kanban/status views, multi-agent coordination, and better “surgical” editing (e.g., AST-based transformations rather than full-file rewrites).

Why Build Your Own Agent?

  • Some ask why not just use existing tools like Cursor or Claude Code.
  • Supporters say the value is educational: understanding the “tools-in-a-loop” pattern, being able to adapt it to non-coding workflows, and future job relevance.

Presentation, Hype, and Conceptual Framing

  • Multiple commenters find the article’s slide-heavy, image-filled format hard to read and “AI-slop-like.”
  • Buzzier concepts (AI compass, agentic vs non-agentic models) trigger skepticism and “snake oil” vibes, though others still found the technical core useful.