2025-08-24

How to build a coding agent

Mini SWE Agent and Prompting Approach

A small (~100 LOC) SWE-bench agent is highlighted as impressively simple; most of its behavior comes from a very short prompt plus a YAML config.
Core loop: analyze codebase, create a repro script, edit source, rerun tests, then test edge cases. Some commenters reuse similar step-by-step prompts to avoid “debug loops.”
Others note that the YAML prompt content is substantial and that LLMs can both overestimate and underestimate their own capabilities.

Tools, Bash, and Program Synthesis

One camp argues a single bash tool could theoretically cover listing, searching, editing, and patching files.
Another argues for specialized tools (list files, read file, edit file) for safety, sandboxing, and clarity, and because some models have been specifically trained on such tools.
There’s mention that some models (e.g., Sonnet) sometimes synthesize helper Python programs to perform large refactors in one shot, an emergent form of program synthesis.

Effectiveness on Real Codebases and Costs

Skeptics say toy or fresh repos are easy; the hard case is large, old codebases where changes must be precise and non-destructive.
Cost concerns: “throwing tokens at the loop” equates to throwing money at the problem; suggestions include caching repo metadata, anticipating tool calls, and parallelizing calls to reduce cost.
Local models are seen as promising but still limited for top-tier coding performance.

UX: CLI Agents vs Dashboards/HUDs

Several people dislike current CLI agents: they lose context, make random edits, get stuck in loops, and rely on crude file-selection heuristics.
Proposed future: richer dashboards/HUDs with previews, action buttons, kanban/status views, multi-agent coordination, and better “surgical” editing (e.g., AST-based transformations rather than full-file rewrites).

Why Build Your Own Agent?

Some ask why not just use existing tools like Cursor or Claude Code.
Supporters say the value is educational: understanding the “tools-in-a-loop” pattern, being able to adapt it to non-coding workflows, and future job relevance.

Presentation, Hype, and Conceptual Framing

Multiple commenters find the article’s slide-heavy, image-filled format hard to read and “AI-slop-like.”
Buzzier concepts (AI compass, agentic vs non-agentic models) trigger skepticism and “snake oil” vibes, though others still found the technical core useful.

Related topics