Designing agentic loops

Terminology: “Agentic Loop” vs Other Terms

  • Debate over naming: “agentic harness” evokes the interface between LLM and world; “agentic loop” emphasizes the skill of designing tool-driven loops to achieve goals.
  • Relationship to “context engineering”: some see them as closely related; others distinguish context stuffing (docs, examples) from designing tools, environments, and evaluation loops.

Designing Agentic Loops & Context Management

  • Key design questions: which tools to expose, how to implement them, and which results stay in context versus being summarized, stored in memory, or discarded.
  • For systems that span multiple models, it’s unclear whether to rely on each model’s built-in memory features or to implement memory as explicit tools.
  • Tool design must account for context size: for example, APIs that return huge JSON payloads are problematic, and tools for agents should often differ from tools for humans (see the sketch after this list).
  • Some speculate future models will internalize these patterns (similar to chain-of-thought).
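
One recurring suggestion is to keep bulky tool output out of the context window entirely and expose it through a follow-up tool instead. Below is a minimal sketch of that pattern, assuming a Python harness; the function names, the in-memory store, and the size threshold are illustrative choices, not any particular framework’s API.

```python
import json

# Illustrative sketch: cap what a tool returns into the model's context and
# park the full payload in an out-of-context store the model can query later.
# MEMORY, fetch_for_agent, read_memory and the threshold are hypothetical.

MEMORY: dict[str, object] = {}     # full results, addressable by key
MAX_CONTEXT_CHARS = 2_000

def fetch_for_agent(key: str, raw_api_call) -> str:
    """Run the underlying API call; return a bounded view, stash the rest."""
    result = raw_api_call()
    MEMORY[key] = result                      # keep everything, but out of context
    text = json.dumps(result, indent=2)
    if len(text) <= MAX_CONTEXT_CHARS:
        return text                           # small enough to show in full
    top = list(result) if isinstance(result, dict) else f"{len(result)} items"
    return (
        f"Stored full result as {key!r} ({len(text)} chars). Top level: {top}.\n"
        f"Preview:\n{text[:MAX_CONTEXT_CHARS]}\n...\n"
        f"Call read_memory(key={key!r}, path=...) to drill into specific fields."
    )

def read_memory(key: str, path: str = "") -> str:
    """Companion tool: return one slice of a stored result, also size-capped."""
    node = MEMORY[key]
    for part in filter(None, path.split(".")):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return json.dumps(node, indent=2)[:MAX_CONTEXT_CHARS]
```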

Sandboxing & Execution Environments

  • Strong emphasis on sandboxing for YOLO modes (running agents with approval prompts disabled): Docker devcontainers with restricted networking; lightweight options like bubblewrap/firejail; distrobox; plain Unix users/groups; or full VMs (KVM, Linux guests). See the sketch after this list.
  • macOS is viewed as harder: sandbox-exec is deprecated/limited; people explore Lima VMs and app sandbox entitlements but hit practical issues.
  • Some prefer VM-level isolation for robustness; others argue containers are “good enough” for typical dev use, where the main risk is an accidental “rm -rf /” rather than a targeted attack.
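
For the lightweight end of that spectrum, here is a rough sketch of wrapping an agent’s shell commands in bubblewrap from Python: host directories are mounted read-only, only the project checkout is writable, and networking is unshared. It assumes bwrap is installed, is not a hardened configuration, and the bind list will need adjusting per distro.

```python
import subprocess

# Minimal "lightweight sandbox" sketch: run an agent command under bubblewrap
# with the system read-only, only the project directory writable, and no
# network access. Not hardened; adjust the binds for your distro.

def run_in_sandbox(cmd: list[str], project_dir: str) -> subprocess.CompletedProcess:
    bwrap = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",      # read-only system directories
        "--ro-bind", "/lib", "/lib",
        "--ro-bind", "/etc", "/etc",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", project_dir, "/work",   # the only writable path
        "--chdir", "/work",
        "--unshare-net",                  # no network: blunt but effective
        "--die-with-parent",
    ]
    return subprocess.run(bwrap + cmd, capture_output=True, text=True)

# Example: let a YOLO-mode agent loose on its own checkout only.
# result = run_in_sandbox(["pytest", "-q"], "/home/me/checkouts/myproject")
```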

Security & Container Escape Debate

  • One view: prompt‑injected agents will eventually discover container escapes and zero-days autonomously; VMs are recommended for serious isolation.
  • Counterview: that claim is unproven; today’s practical concern is accidental damage, not autonomous zero‑day discovery.
  • General agreement that kernel vulnerabilities can turn into sandbox escapes, but for most local YOLO workflows containers are an acceptable risk.

Experiences Building Custom Coding Agents

  • Several people report strong results from custom agents that:
    • Run inside dedicated containers/VMs.
    • Accept “missions” and operate asynchronously with no user interaction.
    • Use speculative shell scripts that try multiple things at once (see the sketch after this list).
  • Observed behaviors include cloning upstream repos to inspect dependencies, aggressively fetching source to understand undocumented APIs, and successfully running 20‑minute uninterrupted inference loops.
  • Checkpointing and rollback are discussed, but some prefer minimizing human-in-the-loop intervention and instead improving mission specs and AGENTS.md.
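
The “speculative” behaviour in the list above can be as simple as launching several candidate commands at once and keeping the first clean exit. A rough sketch, with placeholder commands and a thread pool standing in for whatever job runner the agent actually uses:

```python
import concurrent.futures
import subprocess

# Run several candidate strategies in parallel and keep the first one that
# succeeds; the losing transcripts are discarded rather than fed back into
# context. The candidate commands below are placeholders.

CANDIDATES = [
    ["python", "-m", "pytest", "tests/", "-x", "-q"],
    ["python", "-m", "pytest", "tests/", "-q", "--lf"],
    ["make", "test"],
]

def try_one(cmd: list[str]):
    return cmd, subprocess.run(cmd, capture_output=True, text=True, timeout=1200)

def speculate(candidates=CANDIDATES):
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = [pool.submit(try_one, cmd) for cmd in candidates]
        for fut in concurrent.futures.as_completed(futures):
            try:
                cmd, proc = fut.result()
            except Exception:                  # timed out, command missing, etc.
                continue
            if proc.returncode == 0:
                return cmd, proc.stdout        # first clean exit wins (pool waits for stragglers on close)
    return None, None                          # all attempts failed; escalate to the model
```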

Non-Coding & Broader Workflows

  • Agentic loops applied to documents/spreadsheets, dependency upgrading (reading changelogs, scanning code usage, rating breaking-change risk; see the sketch after this list), and other engineering domains (metrics, traces).
  • Commenters liken all this to rediscovering workflow engines; tools like Temporal are cited for orchestration.
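
The dependency-upgrade loop mentioned above needs a “how exposed are we?” signal. A minimal sketch of the code-scanning half using Python’s ast module; find_usages is a hypothetical helper, and a real agent would combine its output with the changelog before rating breaking-change risk.

```python
import ast
import pathlib

# Find every module that imports a given package and which names it pulls in,
# so a model (or a human) can judge exposure to a breaking release.

def find_usages(repo: str, package: str) -> dict[str, list[str]]:
    usages: dict[str, list[str]] = {}
    for path in pathlib.Path(repo).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue                                   # skip files that don't parse
        hits: list[str] = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                hits += [a.name for a in node.names if a.name.split(".")[0] == package]
            elif isinstance(node, ast.ImportFrom) and node.module:
                if node.module.split(".")[0] == package:
                    hits += [f"{node.module}.{a.name}" for a in node.names]
        if hits:
            usages[str(path)] = sorted(set(hits))
    return usages

# Example: find_usages(".", "requests") might return
# {"app/client.py": ["requests", "requests.adapters.HTTPAdapter"]}
```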

Compute, Cost & Parallelism

  • Anthropic’s “high compute” approach uses multiple parallel attempts, regression-test rejection, and internal scoring models to pick the best patches, trading higher cost for better results (see the sketch after this list).
  • Large, parallel, long‑running missions are seen as essential to scaling agent productivity, with sandboxing enabling aggressive speculation.
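
In code, that selection pattern is roughly best-of-N with a hard filter followed by a soft ranker. This sketch only shows the shape of it; generate_patch, passes_regressions, and score are whatever your stack supplies, and nothing here is Anthropic’s actual implementation.

```python
from typing import Callable, Optional

def best_of_n(
    task: str,
    generate_patch: Callable[[str], str],       # one sampled candidate per call
    passes_regressions: Callable[[str], bool],  # apply in a sandbox, run the suite
    score: Callable[[str], float],              # internal scoring/reward model
    n: int = 8,
) -> Optional[str]:
    candidates = [generate_patch(task) for _ in range(n)]         # parallel attempts (sequential here)
    survivors = [p for p in candidates if passes_regressions(p)]  # regression-test rejection
    if not survivors:
        return None                                               # spend more compute or give up
    return max(survivors, key=score)                              # scorer picks the winning patch
```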

Agent Ergonomics & Configuration

  • Desired UX: the “washing machine” model, where you inspect the plan, press go, and walk away while the agent runs tests and validations.
  • AGENTS.md is emerging as a de facto convention: concise, agent‑oriented instructions that tools auto‑ingest, distinct from human‑oriented README.md.
  • Some express discomfort with “agentic” as a buzzword/marketing term, though others try to tighten its definition to “an LLM running tools in a loop” (see the minimal sketch below).
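
Taking that tightened definition literally, the whole pattern fits in a short loop. A sketch assuming a framework-agnostic call_model function that returns either a final answer or a requested tool call; that contract, and the toy tool registry, are inventions for illustration rather than any vendor’s API.

```python
import json
from typing import Callable

# "An LLM running tools in a loop", minimally. call_model stands in for your
# chat API and is assumed to return either {"type": "final", "content": ...}
# or {"type": "tool", "name": ..., "arguments": "<json>"}. The tools are toys.

TOOLS: dict[str, Callable[..., str]] = {
    "read_file": lambda path: open(path).read()[:4000],   # truncate to protect context
    "run_tests": lambda: "stub: wire this to your sandboxed test runner",
}

def agentic_loop(mission: str, call_model, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": mission}]
    for _ in range(max_steps):
        reply = call_model(messages, tools=list(TOOLS))    # model sees which tools exist
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        if reply["type"] == "final":
            return reply["content"]                        # no more tool calls: done
        result = TOOLS[reply["name"]](**json.loads(reply["arguments"]))
        messages.append({"role": "tool", "name": reply["name"], "content": result})
    return "stopped: step budget exhausted"
```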