How to build a coding agent
Mini SWE Agent and Prompting Approach
- A small (~100 LOC) SWE-bench agent is highlighted as impressively simple; most of its behavior comes from a very short prompt plus a YAML config.
- Core loop: analyze codebase, create a repro script, edit source, rerun tests, then test edge cases. Some commenters reuse similar step-by-step prompts to avoid “debug loops.”
- Others note that the YAML prompt content is substantial and that LLMs can both overestimate and underestimate their own capabilities.
Tools, Bash, and Program Synthesis
- One camp argues a single
bashtool could theoretically cover listing, searching, editing, and patching files. - Another argues for specialized tools (list files, read file, edit file) for safety, sandboxing, and clarity, and because some models have been specifically trained on such tools.
- There’s mention that some models (e.g., Sonnet) sometimes synthesize helper Python programs to perform large refactors in one shot, an emergent form of program synthesis.
Effectiveness on Real Codebases and Costs
- Skeptics say toy or fresh repos are easy; the hard case is large, old codebases where changes must be precise and non-destructive.
- Cost concerns: “throwing tokens at the loop” equates to throwing money at the problem; suggestions include caching repo metadata, anticipating tool calls, and parallelizing calls to reduce cost.
- Local models are seen as promising but still limited for top-tier coding performance.
UX: CLI Agents vs Dashboards/HUDs
- Several people dislike current CLI agents: they lose context, make random edits, get stuck in loops, and rely on crude file-selection heuristics.
- Proposed future: richer dashboards/HUDs with previews, action buttons, kanban/status views, multi-agent coordination, and better “surgical” editing (e.g., AST-based transformations rather than full-file rewrites).
Why Build Your Own Agent?
- Some ask why not just use existing tools like Cursor or Claude Code.
- Supporters say the value is educational: understanding the “tools-in-a-loop” pattern, being able to adapt it to non-coding workflows, and future job relevance.
Presentation, Hype, and Conceptual Framing
- Multiple commenters find the article’s slide-heavy, image-filled format hard to read and “AI-slop-like.”
- Buzzier concepts (AI compass, agentic vs non-agentic models) trigger skepticism and “snake oil” vibes, though others still found the technical core useful.