Don't trust AI agents

Containerization and Real Threats

  • Debate over whether Docker/Podman/containers are a “hard” security boundary: some point to multiple recent container escapes; others note that many escapes require strong pre-existing privileges.
  • Several argue that even perfect container security doesn’t fix the main risk: agents holding powerful third‑party credentials (Google, AWS, email). Exfiltrated tokens are far more valuable than rm -rf /.

Agent Permissions, Email, and Irreversible Actions

  • Many commenters think any agent with inbox access (even “read + draft only”) can still cause serious harm: password resets, magic links, forwarding reset emails, mass exfiltration, or subtle life manipulation (e.g., via reminders/todos).
  • Some conclude the only truly safe pattern is “read-only + queue suggestions for human approval,” which is closer to a webhook than an autonomous agent.
  • Others suggest this is inherently unsafe until prompt injection and non‑determinism are fundamentally solved.
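The “read-only + queue suggestions for human approval” pattern can be sketched in a few lines. This is an illustrative Python sketch, not any commenter’s actual implementation: the agent is handed only the `propose` method, while the human reviewer alone holds the `execute` callback that performs real actions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Suggestion:
    action: str      # e.g. "reply", "archive" (hypothetical action names)
    payload: dict
    status: Status = Status.PENDING

class ApprovalQueue:
    """The agent can only enqueue suggestions; a human must approve
    each one before anything executes."""

    def __init__(self) -> None:
        self._items: list[Suggestion] = []

    def propose(self, action: str, payload: dict) -> Suggestion:
        # The only operation exposed to the agent: append, never execute.
        s = Suggestion(action, payload)
        self._items.append(s)
        return s

    def pending(self) -> list[Suggestion]:
        return [s for s in self._items if s.status is Status.PENDING]

    def approve(self, s: Suggestion,
                execute: Callable[[str, dict], None]) -> None:
        # Only the human-facing caller holds `execute`; the agent never does.
        s.status = Status.APPROVED
        execute(s.action, s.payload)

    def reject(self, s: Suggestion) -> None:
        s.status = Status.REJECTED
```

Note that nothing happens until `approve` is called with the executor, which is exactly why commenters say this is closer to a webhook with a review step than an autonomous agent.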

Nanoclaw’s Model and Critiques

  • Nanoclaw’s pitch: a small core, each agent in its own container, and “skills” that generate code and merge it into the codebase on demand, so users only get features they explicitly request.
  • Critics worry that:
    • Skills are effectively self‑modifying code guided by an LLM (i.e., with nondeterministic output), which may be less secure than a conventional plugin system.
    • Every install becomes a custom fork, complicating bug reproduction and updates.
  • The author positions Nanoclaw as a framework, not turnkey software: users are expected to review diffs and keep the codebase small enough to audit after each skill.
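The review-diffs-before-merging workflow the author describes could look something like the following. This is a hypothetical sketch (Nanoclaw’s real mechanism is not shown in the thread); the key property is that generated skill code is rendered as a diff and gated on an explicit human decision.

```python
import difflib

def render_skill_diff(old_src: str, new_src: str, path: str) -> str:
    """Render a skill's proposed code change as a unified diff for review."""
    return "".join(difflib.unified_diff(
        old_src.splitlines(keepends=True),
        new_src.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))

def apply_if_approved(old_src: str, new_src: str, path: str,
                      approve) -> str:
    """Gate the merge on a human decision (`approve` takes the diff text
    and returns True/False); the agent never writes to the codebase itself."""
    diff = render_skill_diff(old_src, new_src, path)
    if not approve(diff):
        raise PermissionError(f"skill change to {path} rejected")
    return new_src
```

Even with this gate, the critics’ points stand: every approved merge still makes each install a custom fork, and the review burden grows with the codebase.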

OpenClaw, Code Volume, and AI “Slopware”

  • Shock at OpenClaw’s claimed ~800k lines of TypeScript and thousands of issues/PRs, widely assumed to be largely LLM‑generated.
  • This triggers a long subthread on why LoC is a terrible metric, how AI encourages bloated “vibe coded” systems, and how verification and maintainability—not raw output—are what matter.
  • Some share positive anecdotes of using LLMs to rapidly build substantial systems, but emphasize that human review remains the real bottleneck.

Proposed Security Patterns and Their Limits

  • Suggested mitigations:
    • Treat agents like “enthusiastic juniors”: they draft, humans approve.
    • No direct secrets; use a hardened proxy/gateway to inject credentials and restrict network access (whitelists, time‑boxed domains, auditing).
    • Snapshot/revert for stateful agents; use VMs or microVMs rather than bare containers.
    • Limit agents to “recoverable” actions by default.
  • Others counter that GET-only tools aren’t truly safe (exfiltration via URLs/logs), proxies themselves can be prompt‑injected, and most schemes still assume non‑adversarial contexts.
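The proxy/gateway mitigation can be sketched as follows. This is a minimal illustration, not a hardened design (host names like api.example.com are placeholders): the gateway alone holds the real token, injects it into outbound requests, enforces a host allowlist and a time box, and audit-logs every decision. As the counterargument notes, a denied host stops nothing if an allowed host can be abused for exfiltration via URL parameters.

```python
import fnmatch
import time
from urllib.parse import urlparse
from urllib.request import Request, urlopen

class CredentialGateway:
    """Sketch of a credential-injecting gateway: the agent calls fetch()
    and never sees the token, the allowlist, or the expiry."""

    def __init__(self, token: str, allowed_hosts: list[str],
                 ttl_s: float = 300.0):
        self._token = token                       # the agent never sees this
        self._allowed = allowed_hosts             # e.g. ["api.example.com"]
        self._expires = time.monotonic() + ttl_s  # time-boxed access
        self.audit_log: list[str] = []

    def _host_allowed(self, host: str) -> bool:
        return any(fnmatch.fnmatch(host, pat) for pat in self._allowed)

    def fetch(self, url: str) -> bytes:
        if time.monotonic() > self._expires:
            raise PermissionError("time box expired")
        host = urlparse(url).hostname or ""
        if not self._host_allowed(host):
            self.audit_log.append(f"DENIED {url}")
            raise PermissionError(f"host not allowed: {host}")
        self.audit_log.append(f"ALLOWED {url}")
        # Credential is injected here, after the checks, never handed out.
        req = Request(url, headers={"Authorization": f"Bearer {self._token}"})
        with urlopen(req) as resp:
            return resp.read()
```

The gateway narrows the blast radius but, as commenters point out, it is itself part of the trusted computing base: if the agent can talk it into fetching an attacker-controlled URL on an allowed host, data can still leak.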

Trust, Accountability, and Human vs AI

  • Comparison to contractors/employees: you don’t fully “trust” them either, but there are laws, contracts, and liability. With agents, accountability is unclear, and “just turn it off” is the only recourse.
  • Several argue current LLMs lack the judgment to be autonomous in sensitive domains; they’re useful as assistants, not unsupervised actors.

Use Cases and Questioning the Need

  • Reported personal uses: email triage, notes, reminders/goals, meeting transcription enrichment, GitHub/Jira cross‑referencing, simple home workflows, personal research.
  • Others openly question whether everyday life has enough real “friction” to justify the risk and maintenance burden of powerful autonomous agents, especially ones wired into critical accounts.