Designing agentic loops

Terminology: “Agentic Loop” vs Other Terms

  • Debate over naming: “agentic harness” evokes the interface between LLM and world; “agentic loop” emphasizes the skill of designing tool-driven loops to achieve goals.
  • Relationship to “context engineering”: some see them as closely related; others distinguish context stuffing (docs, examples) from designing tools, environments, and evaluation loops.

Designing Agentic Loops & Context Management

  • Key design questions: which tools to expose, how to implement them, and which results stay in context versus being summarized, stored in memory, or discarded.
  • For systems that span multiple models, it’s unclear whether to rely on each model’s built-in memory features or to implement memory as explicit tools.
  • Tool design must account for context size: for example, APIs that return huge JSON payloads are problematic, and tools for agents should often differ from tools for humans (see the sketch after this list).
  • Some speculate future models will internalize these patterns (similar to chain-of-thought).
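
One recurring suggestion is to keep bulky tool output out of the context window entirely and expose it through a follow-up tool instead. Below is a minimal sketch of that pattern, assuming a Python harness; the function names, the in-memory store, and the size threshold are illustrative choices, not any particular framework’s API.

```python
import json

# Illustrative sketch: cap what a tool returns into the model's context and
# park the full payload in an out-of-context store the model can query later.
# MEMORY, fetch_for_agent, read_memory and the threshold are hypothetical.

MEMORY: dict[str, object] = {}     # full results, addressable by key
MAX_CONTEXT_CHARS = 2_000

def fetch_for_agent(key: str, raw_api_call) -> str:
    """Run the underlying API call; return a bounded view, stash the rest."""
    result = raw_api_call()
    MEMORY[key] = result                      # keep everything, but out of context
    text = json.dumps(result, indent=2)
    if len(text) <= MAX_CONTEXT_CHARS:
        return text                           # small enough to show in full
    top = list(result) if isinstance(result, dict) else f"{len(result)} items"
    return (
        f"Stored full result as {key!r} ({len(text)} chars). Top level: {top}.\n"
        f"Preview:\n{text[:MAX_CONTEXT_CHARS]}\n...\n"
        f"Call read_memory(key={key!r}, path=...) to drill into specific fields."
    )

def read_memory(key: str, path: str = "") -> str:
    """Companion tool: return one slice of a stored result, also size-capped."""
    node = MEMORY[key]
    for part in filter(None, path.split(".")):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return json.dumps(node, indent=2)[:MAX_CONTEXT_CHARS]
```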

Sandboxing & Execution Environments

  • Strong emphasis on sandboxing for YOLO modes (running agents with approval prompts disabled): Docker devcontainers with restricted networking; lightweight options like bubblewrap/firejail; distrobox; plain Unix users/groups; or full VMs (KVM, Linux guests). See the sketch after this list.
  • macOS is viewed as harder: sandbox-exec is deprecated/limited; people explore Lima VMs and app sandbox entitlements but hit practical issues.
  • Some prefer VM-level isolation for robustness; others argue containers are “good enough” for typical dev use, where the main risk is an accidental “rm -rf /” rather than a targeted attack.
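
For the lightweight end of that spectrum, here is a rough sketch of wrapping an agent’s shell commands in bubblewrap from Python: host directories are mounted read-only, only the project checkout is writable, and networking is unshared. It assumes bwrap is installed, is not a hardened configuration, and the bind list will need adjusting per distro.

```python
import subprocess

# Minimal "lightweight sandbox" sketch: run an agent command under bubblewrap
# with the system read-only, only the project directory writable, and no
# network access. Not hardened; adjust the binds for your distro.

def run_in_sandbox(cmd: list[str], project_dir: str) -> subprocess.CompletedProcess:
    bwrap = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",      # read-only system directories
        "--ro-bind", "/lib", "/lib",
        "--ro-bind", "/etc", "/etc",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", project_dir, "/work",   # the only writable path
        "--chdir", "/work",
        "--unshare-net",                  # no network: blunt but effective
        "--die-with-parent",
    ]
    return subprocess.run(bwrap + cmd, capture_output=True, text=True)

# Example: let a YOLO-mode agent loose on its own checkout only.
# result = run_in_sandbox(["pytest", "-q"], "/home/me/checkouts/myproject")
```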

Security & Container Escape Debate

  • One view: prompt‑injected agents will eventually discover container escapes and zero-days autonomously; VMs are recommended for serious isolation.
  • Counterview: that claim is unproven; today’s practical concern is accidental damage, not autonomous zero‑day discovery.
  • General agreement that kernel vulnerabilities can turn into sandbox escapes, but for most local YOLO workflows containers are an acceptable risk.

Experiences Building Custom Coding Agents

  • Several people report strong results from custom agents that:
    • Run inside dedicated containers/VMs.
    • Accept “missions” and operate asynchronously with no user interaction.
    • Use speculative shell scripts that try multiple things at once (see the sketch after this list).
  • Observed behaviors include cloning upstream repos to inspect dependencies, aggressively fetching source to understand undocumented APIs, and successfully running 20‑minute uninterrupted inference loops.
  • Checkpointing and rollback are discussed, but some prefer minimizing human-in-the-loop intervention and instead improving mission specs and AGENTS.md.
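
The “speculative” behaviour in the list above can be as simple as launching several candidate commands at once and keeping the first clean exit. A rough sketch, with placeholder commands and a thread pool standing in for whatever job runner the agent actually uses:

```python
import concurrent.futures
import subprocess

# Run several candidate strategies in parallel and keep the first one that
# succeeds; the losing transcripts are discarded rather than fed back into
# context. The candidate commands below are placeholders.

CANDIDATES = [
    ["python", "-m", "pytest", "tests/", "-x", "-q"],
    ["python", "-m", "pytest", "tests/", "-q", "--lf"],
    ["make", "test"],
]

def try_one(cmd: list[str]):
    return cmd, subprocess.run(cmd, capture_output=True, text=True, timeout=1200)

def speculate(candidates=CANDIDATES):
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = [pool.submit(try_one, cmd) for cmd in candidates]
        for fut in concurrent.futures.as_completed(futures):
            try:
                cmd, proc = fut.result()
            except Exception:                  # timed out, command missing, etc.
                continue
            if proc.returncode == 0:
                return cmd, proc.stdout        # first clean exit wins (pool waits for stragglers on close)
    return None, None                          # all attempts failed; escalate to the model
```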

Non-Coding & Broader Workflows

  • Agentic loops applied to documents/spreadsheets, dependency upgrading (reading changelogs, scanning code usage, rating breaking-change risk; see the sketch after this list), and other engineering domains (metrics, traces).
  • Commenters liken all this to rediscovering workflow engines; tools like Temporal are cited for orchestration.
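
The dependency-upgrade loop mentioned above needs a “how exposed are we?” signal. A minimal sketch of the code-scanning half using Python’s ast module; find_usages is a hypothetical helper, and a real agent would combine its output with the changelog before rating breaking-change risk.

```python
import ast
import pathlib

# Find every module that imports a given package and which names it pulls in,
# so a model (or a human) can judge exposure to a breaking release.

def find_usages(repo: str, package: str) -> dict[str, list[str]]:
    usages: dict[str, list[str]] = {}
    for path in pathlib.Path(repo).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue                                   # skip files that don't parse
        hits: list[str] = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                hits += [a.name for a in node.names if a.name.split(".")[0] == package]
            elif isinstance(node, ast.ImportFrom) and node.module:
                if node.module.split(".")[0] == package:
                    hits += [f"{node.module}.{a.name}" for a in node.names]
        if hits:
            usages[str(path)] = sorted(set(hits))
    return usages

# Example: find_usages(".", "requests") might return
# {"app/client.py": ["requests", "requests.adapters.HTTPAdapter"]}
```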

Compute, Cost & Parallelism

  • Anthropic’s “high compute” approach uses multiple parallel attempts, regression-test rejection, and internal scoring models to pick the best patches, trading higher cost for better results (see the sketch after this list).
  • Large, parallel, long‑running missions are seen as essential to scaling agent productivity, with sandboxing enabling aggressive speculation.
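
In code, that selection pattern is roughly best-of-N with a hard filter followed by a soft ranker. This sketch only shows the shape of it; generate_patch, passes_regressions, and score are whatever your stack supplies, and nothing here is Anthropic’s actual implementation.

```python
from typing import Callable, Optional

def best_of_n(
    task: str,
    generate_patch: Callable[[str], str],       # one sampled candidate per call
    passes_regressions: Callable[[str], bool],  # apply in a sandbox, run the suite
    score: Callable[[str], float],              # internal scoring/reward model
    n: int = 8,
) -> Optional[str]:
    candidates = [generate_patch(task) for _ in range(n)]         # parallel attempts (sequential here)
    survivors = [p for p in candidates if passes_regressions(p)]  # regression-test rejection
    if not survivors:
        return None                                               # spend more compute or give up
    return max(survivors, key=score)                              # scorer picks the winning patch
```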

Agent Ergonomics & Configuration

  • Desired UX: the “washing machine” model, where you inspect the plan, press go, and walk away while the agent runs tests and validations.
  • AGENTS.md is emerging as a de facto convention: concise, agent‑oriented instructions that tools auto‑ingest, distinct from human‑oriented README.md.
  • Some express discomfort with “agentic” as a buzzword/marketing term, though others try to tighten its definition to “an LLM running tools in a loop” (see the minimal sketch below).
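
Taking that tightened definition literally, the whole pattern fits in a short loop. A sketch assuming a framework-agnostic call_model function that returns either a final answer or a requested tool call; that contract, and the toy tool registry, are inventions for illustration rather than any vendor’s API.

```python
import json
from typing import Callable

# "An LLM running tools in a loop", minimally. call_model stands in for your
# chat API and is assumed to return either {"type": "final", "content": ...}
# or {"type": "tool", "name": ..., "arguments": "<json>"}. The tools are toys.

TOOLS: dict[str, Callable[..., str]] = {
    "read_file": lambda path: open(path).read()[:4000],   # truncate to protect context
    "run_tests": lambda: "stub: wire this to your sandboxed test runner",
}

def agentic_loop(mission: str, call_model, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": mission}]
    for _ in range(max_steps):
        reply = call_model(messages, tools=list(TOOLS))    # model sees which tools exist
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        if reply["type"] == "final":
            return reply["content"]                        # no more tool calls: done
        result = TOOLS[reply["name"]](**json.loads(reply["arguments"]))
        messages.append({"role": "tool", "name": reply["name"], "content": result})
    return "stopped: step budget exhausted"
```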