How I program with agents

What counts as an “agent”? Naming and definitions

  • Many agree the article’s “agent = for-loop calling an LLM” is too reductive.
  • Several propose: an agent is an LLM plus other logic (tests, tools, overseers) that constrain and steer behavior.
  • Competing phrasings: “tools in a loop”, “LLM feedback loop systems”, “AI‑orchestrated workflows”.
  • Some defend “agent” as good branding, similar to “Retina Display”: not technically precise but easily understood; others dislike the hype and vagueness.
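The "for-loop calling an LLM" / "tools in a loop" shape that the thread debates can be sketched minimally. Everything below is illustrative: `call_llm` is a hypothetical stand-in for a real model API (stubbed here so the loop runs deterministically), and the tool names are invented.

```python
# Hypothetical stand-in for a real model call; a production agent would hit
# an LLM API here. This stub "decides" to call a tool once, then answers.
def call_llm(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "run_tests", "args": {}}
    return {"answer": "tests pass; task complete"}

TOOLS = {
    "run_tests": lambda **kw: "3 passed, 0 failed",  # stubbed build/test tool
}

def agent(task, max_steps=5):
    """The 'for-loop calling an LLM': ask, act on tool calls, feed results back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "answer" in reply:        # model is done: no further tool use
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"

print(agent("fix the failing test"))
```

The "LLM plus other logic" definitions in the thread amount to adding constraints around exactly this loop: the tool table, the step budget, and whatever checks run on each tool result are the deterministic parts that steer the probabilistic core.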

Architectures and feedback loops

  • Two main patterns described:
    • LLM at the top, calling tools (build, test, run) per instructions.
    • Deterministic system at the top, calling LLMs as subroutines.
  • Use of schemas and constrained decoding to map probabilistic output into structured tool calls; unstructured data (logs, stack traces) often fed back as plain text.
  • “Mediator” layers may be deterministic, another LLM, or even humans; area is “wild west” with no standard architecture yet.
  • Containers and isolated dev environments are seen as important for safely running agents in parallel.
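The schema-constrained mapping described above can be sketched as a validation layer between raw model output and tool dispatch. This is a minimal illustration, not any particular framework's API: the tool names and fields are invented, and real systems typically use full JSON Schema or constrained decoding rather than this hand-rolled check.

```python
import json

# Invented tool schemas: each tool's required argument names and types.
TOOL_SCHEMAS = {
    "run_build": {"target": str},
    "read_file": {"path": str},
}

def parse_tool_call(raw):
    """Map free-form model output into a validated tool call, or report why not."""
    try:
        call = json.loads(raw)
        schema = TOOL_SCHEMAS[call["tool"]]
        for field, ftype in schema.items():
            if not isinstance(call["args"][field], ftype):
                raise TypeError(f"{field} must be {ftype.__name__}")
        return call, None
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        # Malformed structure is fed back to the model as plain text,
        # much like logs or stack traces in the unstructured channel.
        return None, f"malformed tool call: {e}"

call, err = parse_tool_call('{"tool": "read_file", "args": {"path": "main.go"}}')
print(call)   # validated call, ready to dispatch
call, err = parse_tool_call('{"tool": "read_file", "args": {}}')
print(err)    # error text goes back into the conversation
```

The design point is the asymmetry the bullet notes: structured output gets validated and dispatched deterministically, while anything that fails validation drops down to the plain-text feedback path.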

Programming practice and enjoyment

  • Split attitudes:
    • Some fear losing the joy of solving problems and worry that the work will shrink to writing specs, prompts, and reviews.
    • Others say agents revived their enthusiasm by removing boilerplate, config, repetitive refactors, and test scaffolding, letting them focus on design and “fun parts”.
  • Analogies: power tools vs hand tools; forklifts vs gym weights; juniors you can summon on demand.
  • Concern that heavy reliance may atrophy code-writing skills and shift work toward continuous review of AI output.

Code review, safety, and security

  • Strong agreement that review is the bottleneck and already “half‑hearted” in many teams.
  • Several report security regressions from agent‑written code (old RCE patterns, injections) with developers over‑trusting “make it secure” prompts.
  • LLMs can convincingly justify wrong or unsafe designs, especially in security/crypto.
  • Using LLMs as code reviewers today gets mixed reviews: they can find some issues but are often noisy and nitpicky, and they miss deeper problems; linters sometimes do better.

Use cases, benefits, and failure modes

  • Reported wins: repetitive or “formulaic” glue code, CLI/arg parsing, logging setup, multi-file edits, bindings/bridges, test generation, small scripts, planning large refactors, summarizing diffs, API usage reminders.
  • Failures: hallucinated APIs/endpoints, incorrect numerics or thermistor formulas, weak CSS, shallow or misleading tests, struggling with complex parsers unless heavily guided.
  • Many emphasize that agents are powerful accelerators if you already understand the domain and can verify outputs; dangerous crutches if you do not.