How I program with agents
What counts as an “agent”? Naming and definitions
- Many agree the article’s definition, “an agent is a for-loop calling an LLM”, is too reductive.
- Several propose instead: an agent is an LLM plus other logic (tests, tools, overseers) that constrains and steers its behavior.
- Competing phrasings: “tools in a loop”, “LLM feedback loop systems”, “AI‑orchestrated workflows” (a minimal sketch of the loop follows this list).
- Some defend “agent” as good branding, similar to “Retina Display”: not technically precise but easily understood; others dislike the hype and vagueness.
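
For concreteness, a minimal sketch of the “tools in a loop” reading. This is an illustration, not anyone’s actual implementation: `call_llm` is a stand-in for any chat-completion API, and the tool names are hypothetical.

```python
# A minimal sketch of "tools in a loop". call_llm is a stand-in for any
# chat-completion API; the tool names are hypothetical.

def call_llm(messages: list[dict]) -> dict:
    """Stub: a real call would return either a tool request
    ({"type": "tool", "tool": ..., "args": ...}) or a final answer."""
    return {"type": "answer", "text": "done"}

TOOLS = {
    "run_tests": lambda args: "all tests passed",  # hypothetical tool
    "read_file": lambda args: "<file contents>",   # hypothetical tool
}

def agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):         # the for-loop in "for-loop calling an LLM"
        reply = call_llm(messages)
        if reply["type"] == "answer":  # model says it is finished
            return reply["text"]
        result = TOOLS[reply["tool"]](reply.get("args", {}))  # run the tool
        messages.append({"role": "tool", "content": result})  # feed result back
    raise RuntimeError("step budget exhausted")
```

The “LLM plus other logic” camp would wrap checks around this loop: schemas on the tool requests, tests gating the results, an overseer that can abort.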
Architectures and feedback loops
- Two main patterns described (the second is sketched after this list):
  - The LLM at the top, calling tools (build, test, run) per its instructions.
  - A deterministic system at the top, calling LLMs as subroutines.
- Use of schemas and constrained decoding to map probabilistic output into structured tool calls; unstructured data (logs, stack traces) is often fed back as plain text.
- “Mediator” layers may be deterministic code, another LLM, or even humans; the area is a “wild west” with no standard architecture yet.
- Containers and isolated dev environments are seen as important for safely running agents in parallel (a sandboxing sketch follows below).
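
An illustrative sketch of the second pattern, folding in the schema idea from above: a deterministic pipeline owns control flow and calls the LLM as a subroutine whose output must fit a schema. `llm_complete`, the schema, and the tool names are assumptions, not any particular API.

```python
import json

TOOL_CALL_SCHEMA = {"required": ("tool", "args"), "tools": {"build", "test", "run"}}

def llm_complete(prompt: str) -> str:
    """Stub: with constrained decoding, the model would be forced to emit
    JSON matching TOOL_CALL_SCHEMA rather than free text."""
    return '{"tool": "test", "args": {"target": "./..."}}'

def parse_tool_call(raw: str) -> dict:
    """Deterministic mediator: map probabilistic output into a structured
    call, rejecting anything that does not fit the schema."""
    call = json.loads(raw)
    for field in TOOL_CALL_SCHEMA["required"]:
        if field not in call:
            raise ValueError(f"missing field {field!r}")
    if call["tool"] not in TOOL_CALL_SCHEMA["tools"]:
        raise ValueError(f"unknown tool {call['tool']!r}")
    return call

def pipeline(task: str) -> None:
    # Control flow is ordinary code; the LLM fills in exactly one decision.
    call = parse_tool_call(llm_complete(f"Pick the next tool for: {task}"))
    print(f"would run {call['tool']} with {call['args']}")
    # Unstructured results (logs, stack traces) go back to the model as text.

pipeline("fix the failing unit test")
```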
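
And a sketch of the sandboxing idea: each agent run gets a throwaway container with no network and only its own worktree mounted. The image name and the `agent run` entrypoint are hypothetical; the docker flags are standard options.

```python
import subprocess

def run_agent_isolated(worktree: str, task: str) -> int:
    cmd = [
        "docker", "run", "--rm",   # throwaway container
        "--network=none",          # no exfiltration, no surprise installs
        "-v", f"{worktree}:/src",  # mount only this agent's checkout
        "-w", "/src",
        "agent-sandbox:latest",    # hypothetical image with the agent CLI
        "agent", "run", task,      # hypothetical agent entrypoint
    ]
    return subprocess.run(cmd).returncode
```

Giving each container its own worktree is what makes it safe to run several agents in parallel against the same repository.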
Programming practice and enjoyment
- Split attitudes:
  - Some fear losing the joy of solving problems and worry the work becomes writing specs, prompts, and reviews.
  - Others say agents revived their enthusiasm by removing boilerplate, config, repetitive refactors, and test scaffolding, letting them focus on design and the “fun parts”.
- Analogies: power tools vs hand tools; forklifts vs gym weights; juniors you can summon on demand.
- Concern that heavy reliance may atrophy code-writing skills and shift work toward continuous review of AI output.
Code review, safety, and security
- Strong agreement that review is the bottleneck and already “half‑hearted” in many teams.
- Several report security regressions from agent‑written code (old RCE patterns, injections), with developers over‑trusting “make it secure” prompts (an injection example follows this list).
- LLMs can convincingly justify wrong or unsafe designs, especially in security/crypto.
- Using LLMs as code reviewers today gets mixed reviews: they can find some issues but are often noisy and nitpicky and miss deeper problems; linters sometimes do better.
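
To make the “old injection patterns” point concrete, a small self-contained example (the table and data are invented for illustration): the first query is the classic injectable shape reviewers report agents reintroducing, the second is the parameterized fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "s3cret"), ("bob", "hunter2")])

def find_user_unsafe(name: str):
    # Vulnerable: name = "x' OR '1'='1" matches every row.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the value as data, not SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("x' OR '1'='1"))  # leaks both rows
print(find_user_safe("x' OR '1'='1"))    # returns []
```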
Use cases, benefits, and failure modes
- Reported wins: repetitive or “formulaic” glue code, CLI/arg parsing, logging setup, multi-file edits, bindings/bridges, test generation, small scripts, planning large refactors, summarizing diffs, API usage reminders.
- Failures: hallucinated APIs/endpoints, incorrect numerics or thermistor formulas, weak CSS, shallow or misleading tests, and struggles with complex parsers unless heavily guided.
- Many emphasize that agents are powerful accelerators if you already understand the domain and can verify outputs; dangerous crutches if you do not.