Designing agentic loops
Terminology: “Agentic Loop” vs Other Terms
- Debate over naming: “agentic harness” evokes the interface between LLM and world; “agentic loop” emphasizes the skill of designing tool-driven loops to achieve goals.
- Relationship to “context engineering”: some see them as closely related; others distinguish context stuffing (docs, examples) from designing tools, environments, and evaluation loops.
Designing Agentic Loops & Context Management
- Key design questions: which tools to expose, how to implement them, and which tool results stay in context versus being summarized, stored in memory, or discarded.
- For multi-model systems, it’s unclear whether to rely on a model’s built-in memory or to implement memory as explicit tools.
- Tool design must consider context size: e.g., APIs that return huge JSON payloads are problematic, and tools for agents should often differ from tools for humans (see the sketch after this list).
- Some speculate future models will internalize these patterns (similar to chain-of-thought).
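A minimal Python sketch of these ideas, assuming a hypothetical issue-search API: the agent-facing tool projects and truncates a verbose JSON response before it ever reaches the context, and memory is exposed as explicit remember/recall tools rather than anything built into the model. All names here are illustrative.

```python
import json

MAX_TOOL_CHARS = 4_000          # illustrative per-result context budget

MEMORY: dict[str, str] = {}     # memory as explicit tools, not built into the model

def remember(key: str, value: str) -> str:
    """Tool: persist a note outside the context window."""
    MEMORY[key] = value
    return f"stored {key!r}"

def recall(key: str) -> str:
    """Tool: pull a previously stored note back into context."""
    return MEMORY.get(key, f"nothing stored under {key!r}")

def fetch_issues_json(query: str) -> list[dict]:
    """Stand-in for a real, verbose API client."""
    return [{"id": i, "title": f"issue {i} matching {query}",
             "state": "open", "body": "long discussion " * 200}
            for i in range(200)]

def search_issues(query: str) -> str:
    """Agent-facing tool: project a few fields and cap the size so a huge
    JSON payload never floods the context window."""
    raw = fetch_issues_json(query)
    digest = [{"id": r["id"], "title": r["title"], "state": r["state"]} for r in raw]
    text = json.dumps(digest)
    if len(text) > MAX_TOOL_CHARS:
        text = text[:MAX_TOOL_CHARS] + f" ... [{len(raw)} results, output truncated]"
    return text
```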
Sandboxing & Execution Environments
- Strong emphasis on sandboxing for YOLO modes: Docker devcontainers with restricted networking (a minimal launcher is sketched after this list); lightweight options like bubblewrap/firejail; distrobox; plain Unix users/groups; or full VMs (KVM, Linux guests).
- macOS is viewed as harder: sandbox-exec is deprecated/limited; people explore Lima VMs and app sandbox entitlements but hit practical issues.
- Some prefer VM-level isolation for robustness; others argue containers are “good enough” for typical dev use where the main risk is “rm -rf /” rather than targeted attacks.
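As a hedged illustration of the Docker option, a small Python launcher that runs an agent command in a throwaway, network-less container; the flags are standard docker run options, while the image name, command, and timeout are placeholders to adjust.

```python
import subprocess
from pathlib import Path

def run_in_sandbox(workdir: Path, command: list[str]) -> subprocess.CompletedProcess:
    """Run a command in a throwaway container with no network access.

    Only the project directory is writable; the root filesystem is read-only
    and all capabilities are dropped. "agent-sandbox:latest" is a hypothetical
    pre-built dev image.
    """
    docker_cmd = [
        "docker", "run", "--rm",
        "--network", "none",            # no outbound network at all
        "--read-only",                  # immutable root filesystem
        "--tmpfs", "/tmp",              # scratch space
        "--cap-drop", "ALL",
        "-v", f"{workdir}:/workspace",
        "-w", "/workspace",
        "agent-sandbox:latest",
        *command,
    ]
    return subprocess.run(docker_cmd, capture_output=True, text=True, timeout=1800)

# Example: run_in_sandbox(Path("~/project").expanduser(), ["pytest", "-q"])
```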
Security & Container Escape Debate
- One view: prompt‑injected agents will eventually discover container escapes and zero-days autonomously; VMs are recommended for serious isolation.
- Counterview: that claim is unproven; today’s practical concern is accidental damage, not autonomous zero‑day discovery.
- General agreement that kernel vulnerabilities can turn into sandbox escapes, but for most local YOLO workflows, containers are an acceptable risk.
Experiences Building Custom Coding Agents
- Several people report strong results from custom agents that:
  - Run inside dedicated containers/VMs.
  - Accept “missions” and operate asynchronously with no user interaction.
  - Use speculative shell scripts that try multiple things at once (sketched after this list).
- Observed behaviors include cloning upstream repos to inspect dependencies, aggressively fetching source to understand undocumented APIs, and successfully running 20‑minute uninterrupted inference loops.
- Checkpointing and rollback are discussed, but some prefer minimizing human-in-the-loop intervention and instead improving mission specs and AGENTS.md.
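A sketch of that asynchronous, fire-and-forget style, reusing the run_in_sandbox launcher from the sandboxing section; agent-cli and its flags are hypothetical, standing in for whatever coding agent is being driven.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

def run_mission(mission: str, variants: list[str]) -> list[str]:
    """Launch several speculative attempts at one mission and keep the ones
    whose validation passed.

    Each variant is a different plan the agent proposed (e.g. two upgrade
    strategies). No user interaction happens mid-run: the mission text plus
    AGENTS.md is all the guidance given.
    """
    guidance = Path("AGENTS.md").read_text(encoding="utf-8")
    winners = []
    with ThreadPoolExecutor(max_workers=len(variants) or 1) as pool:
        futures = {
            pool.submit(
                run_in_sandbox,
                Path.cwd(),
                # agent-cli, --prompt and --plan are hypothetical placeholders
                ["agent-cli", "--prompt", f"{guidance}\n\nMission: {mission}",
                 "--plan", variant],
            ): variant
            for variant in variants
        }
        for fut in as_completed(futures):
            if fut.result().returncode == 0:    # keep every variant that validated
                winners.append(futures[fut])
    return winners
```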
Non-Coding & Broader Workflows
- Agentic loops applied to documents/spreadsheets, dependency upgrading (reading changelogs, scanning code usage, rating breaking risk; a sketch follows this list), and other engineering domains (metrics, traces).
- Commenters liken all this to rediscovering workflow engines; tools like Temporal are cited for orchestration.
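One step of such a dependency-upgrade loop might look like this sketch, where llm is whatever model-call function the caller supplies and the prompt wording is illustrative.

```python
import subprocess
from pathlib import Path
from typing import Callable

def rate_upgrade_risk(package: str, changelog: Path, repo: Path,
                      llm: Callable[[str], str]) -> str:
    """One iteration of a dependency-upgrade loop: gather the changelog and the
    places the package is actually used, then ask the model to rate breaking risk."""
    changes = changelog.read_text(encoding="utf-8")[:8_000]   # keep the prompt bounded
    usages = subprocess.run(
        ["grep", "-rn", f"import {package}", str(repo)],
        capture_output=True, text=True,
    ).stdout[:4_000]
    prompt = (
        f"Changelog excerpt for {package}:\n{changes}\n\n"
        f"How this repo uses it:\n{usages}\n\n"
        "Rate the breaking-change risk (low/medium/high) and list the call "
        "sites most likely to break."
    )
    return llm(prompt)
```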
Compute, Cost & Parallelism
- Anthropic’s “high compute” approach uses multiple parallel attempts, regression-test rejection, and internal scoring models to pick the best patches, trading higher cost for better results (a selection sketch follows this list).
- Large, parallel, long‑running missions are seen as essential to scaling agent productivity, with sandboxing enabling aggressive speculation.
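A hedged sketch of that selection step in Python, where run_tests and score are supplied by the caller (e.g. a sandboxed regression suite and a scoring model) and the candidate patches are whatever the parallel attempts produced.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def pick_best_patch(patches: list[str],
                    run_tests: Callable[[str], bool],
                    score: Callable[[str], float]) -> Optional[str]:
    """Best-of-N selection: evaluate every candidate patch in parallel, reject
    any that fails the regression suite, then return the highest-scoring
    survivor (or None if nothing passed)."""
    with ThreadPoolExecutor(max_workers=len(patches) or 1) as pool:
        results = list(pool.map(run_tests, patches))
    survivors = [p for p, ok in zip(patches, results) if ok]
    return max(survivors, key=score, default=None)
```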
Agent Ergonomics & Configuration
- Desired UX: “washing machine” model—inspect plan, press go, walk away while the agent runs tests and validations.
- AGENTS.md is emerging as a de facto convention: concise, agent‑oriented instructions that tools auto‑ingest, distinct from human‑oriented README.md.
- Some express discomfort with “agentic” as a buzzword or marketing term, though others try to tighten its definition to “an LLM running tools in a loop” (a minimal sketch of that loop follows).
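Under that definition the core really is small; a minimal sketch, where the message format and the call_model contract are assumptions rather than any particular vendor’s API, and AGENTS.md supplies the agent-oriented instructions mentioned above.

```python
from pathlib import Path
from typing import Any, Callable

def agent_loop(mission: str,
               call_model: Callable[[list[dict]], dict],
               tools: dict[str, Callable[..., Any]],
               max_steps: int = 50) -> str:
    """An LLM running tools in a loop: send the mission plus agent-oriented
    instructions, run whichever tool the model requests, feed the result back,
    and stop when the model returns a final answer."""
    instructions = Path("AGENTS.md").read_text(encoding="utf-8")  # concise, agent-oriented
    messages = [{"role": "system", "content": instructions},
                {"role": "user", "content": mission}]
    for _ in range(max_steps):
        reply = call_model(messages)   # assumed to return {"tool", "args"} or {"final"}
        if "final" in reply:
            return reply["final"]
        result = tools[reply["tool"]](**reply["args"])            # run the requested tool
        messages.append({"role": "tool", "content": str(result)})
    return "stopped: step budget exhausted"
```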