How I program with agents
What counts as an “agent”? Naming and definitions
- Many agree the article’s definition, “an agent is a for-loop calling an LLM”, is too reductive.
- Several propose instead: an agent is an LLM plus other logic (tests, tools, overseers) that constrains and steers its behavior.
- Competing phrasings: “tools in a loop”, “LLM feedback loop systems”, “AI‑orchestrated workflows” (a minimal sketch of the loop follows this list).
- Some defend “agent” as good branding, similar to “Retina Display”: not technically precise but easily understood; others dislike the hype and vagueness.
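
For concreteness, a minimal sketch of the “tools in a loop” reading. This is an illustration, not anyone’s actual implementation: `call_llm` is a stand-in for any chat-completion API, and the tool names are hypothetical.

```python
# A minimal sketch of "tools in a loop". call_llm is a stand-in for any
# chat-completion API; the tool names are hypothetical.

def call_llm(messages: list[dict]) -> dict:
    """Stub: a real call would return either a tool request
    ({"type": "tool", "tool": ..., "args": ...}) or a final answer."""
    return {"type": "answer", "text": "done"}

TOOLS = {
    "run_tests": lambda args: "all tests passed",  # hypothetical tool
    "read_file": lambda args: "<file contents>",   # hypothetical tool
}

def agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):         # the for-loop in "for-loop calling an LLM"
        reply = call_llm(messages)
        if reply["type"] == "answer":  # model says it is finished
            return reply["text"]
        result = TOOLS[reply["tool"]](reply.get("args", {}))  # run the tool
        messages.append({"role": "tool", "content": result})  # feed result back
    raise RuntimeError("step budget exhausted")
```

The “LLM plus other logic” camp would wrap checks around this loop: schemas on the tool requests, tests gating the results, an overseer that can abort.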
Architectures and feedback loops
- Two main patterns described (the second is sketched after this list):
  - The LLM at the top, calling tools (build, test, run) per its instructions.
  - A deterministic system at the top, calling LLMs as subroutines.
- Use of schemas and constrained decoding to map probabilistic output into structured tool calls; unstructured data (logs, stack traces) is often fed back as plain text.
- “Mediator” layers may be deterministic code, another LLM, or even humans; the area is a “wild west” with no standard architecture yet.
- Containers and isolated dev environments are seen as important for safely running agents in parallel (a sandboxing sketch follows below).
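
An illustrative sketch of the second pattern, folding in the schema idea from above: a deterministic pipeline owns control flow and calls the LLM as a subroutine whose output must fit a schema. `llm_complete`, the schema, and the tool names are assumptions, not any particular API.

```python
import json

TOOL_CALL_SCHEMA = {"required": ("tool", "args"), "tools": {"build", "test", "run"}}

def llm_complete(prompt: str) -> str:
    """Stub: with constrained decoding, the model would be forced to emit
    JSON matching TOOL_CALL_SCHEMA rather than free text."""
    return '{"tool": "test", "args": {"target": "./..."}}'

def parse_tool_call(raw: str) -> dict:
    """Deterministic mediator: map probabilistic output into a structured
    call, rejecting anything that does not fit the schema."""
    call = json.loads(raw)
    for field in TOOL_CALL_SCHEMA["required"]:
        if field not in call:
            raise ValueError(f"missing field {field!r}")
    if call["tool"] not in TOOL_CALL_SCHEMA["tools"]:
        raise ValueError(f"unknown tool {call['tool']!r}")
    return call

def pipeline(task: str) -> None:
    # Control flow is ordinary code; the LLM fills in exactly one decision.
    call = parse_tool_call(llm_complete(f"Pick the next tool for: {task}"))
    print(f"would run {call['tool']} with {call['args']}")
    # Unstructured results (logs, stack traces) go back to the model as text.

pipeline("fix the failing unit test")
```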
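
And a sketch of the sandboxing idea: each agent run gets a throwaway container with no network and only its own worktree mounted. The image name and the `agent run` entrypoint are hypothetical; the docker flags are standard options.

```python
import subprocess

def run_agent_isolated(worktree: str, task: str) -> int:
    cmd = [
        "docker", "run", "--rm",   # throwaway container
        "--network=none",          # no exfiltration, no surprise installs
        "-v", f"{worktree}:/src",  # mount only this agent's checkout
        "-w", "/src",
        "agent-sandbox:latest",    # hypothetical image with the agent CLI
        "agent", "run", task,      # hypothetical agent entrypoint
    ]
    return subprocess.run(cmd).returncode
```

Giving each container its own worktree is what makes it safe to run several agents in parallel against the same repository.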
Programming practice and enjoyment
- Split attitudes:
  - Some fear losing the joy of solving problems and worry the work becomes writing specs, prompts, and reviews.
  - Others say agents revived their enthusiasm by removing boilerplate, config, repetitive refactors, and test scaffolding, letting them focus on design and the “fun parts”.
- Analogies: power tools vs hand tools; forklifts vs gym weights; juniors you can summon on demand.
- Concern that heavy reliance may atrophy code-writing skills and shift work toward continuous review of AI output.
Code review, safety, and security
- Strong agreement that review is the bottleneck and already “half‑hearted” in many teams.
- Several report security regressions from agent‑written code (old RCE patterns, injections), with developers over‑trusting “make it secure” prompts (an injection example follows this list).
- LLMs can convincingly justify wrong or unsafe designs, especially in security/crypto.
- Using LLMs as code reviewers today gets mixed reviews: they can find some issues but are often noisy and nitpicky and miss deeper problems; linters sometimes do better.
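
To make the “old injection patterns” point concrete, a small self-contained example (the table and data are invented for illustration): the first query is the classic injectable shape reviewers report agents reintroducing, the second is the parameterized fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "s3cret"), ("bob", "hunter2")])

def find_user_unsafe(name: str):
    # Vulnerable: name = "x' OR '1'='1" matches every row.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the value as data, not SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("x' OR '1'='1"))  # leaks both rows
print(find_user_safe("x' OR '1'='1"))    # returns []
```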
Use cases, benefits, and failure modes
- Reported wins: repetitive or “formulaic” glue code, CLI/arg parsing, logging setup, multi-file edits, bindings/bridges, test generation, small scripts, planning large refactors, summarizing diffs, API usage reminders.
- Failures: hallucinated APIs/endpoints, incorrect numerics or thermistor formulas, weak CSS, shallow or misleading tests, and struggles with complex parsers unless heavily guided.
- Many emphasize that agents are powerful accelerators if you already understand the domain and can verify outputs; dangerous crutches if you do not.