2026-02-01

What I learned building an opinionated and minimal coding agent

Minimal, Opinionated Agent Design (Pi and Similar Projects)

Many commenters like Pi’s “small, observable, batteries-not-included” philosophy: minimal core, explicit tools, and full control over prompts and context.
Pi is seen as a strong underlying architecture (and is used by OpenClaw); some call it the more interesting layer compared to more “hyped” wrappers.
Several people are building or sharing similar minimal agent libraries and harnesses, often with built-in tools and simple CLIs.
Some appreciate that Pi doesn’t hardwire subagents or MCP, instead offering extensions so workflows can be customized rather than prescribed.
Others argue the agent space is converging too much on similar designs (Claude Code / Codex–style harnesses) and that there’s a much larger unexplored design space.

Context Management, Subagents, and Workflows

Strong consensus that context engineering is “everything”: tightly controlled system prompts, explicit workspaces, and persistent memory files (e.g., AGENTS.md, MIND_MAP.md) are seen as high leverage.
Subagents are valued both for performance (offloading to smaller models) and for keeping contexts clean and cheap; Pi leaves their orchestration to extensions.
Users report success with workflows like: “one commit at a time” with git, agents reading prior traces, and tmux sessions for long-running REPLs or jobs.
Some contrast faster, tightly-looped IDE agents (e.g., Cursor) with more autonomous, slower agents like Claude Code; people pick based on project size and tolerance for autonomy.
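To make the "persistent memory files" idea from this section concrete, here is a minimal sketch of assembling a system prompt from workspace files. The file names AGENTS.md and MIND_MAP.md come from the discussion; the function `build_system_prompt`, its parameters, and the character budget are assumptions made for illustration, not how Pi or any other tool mentioned here actually does it.

```python
from pathlib import Path

# File names taken from the discussion; the loader itself is hypothetical.
MEMORY_FILES = ("AGENTS.md", "MIND_MAP.md")


def build_system_prompt(workspace: Path, base_prompt: str, max_chars: int = 8000) -> str:
    """Join a base prompt with whichever memory files exist in the workspace."""
    parts = [base_prompt.strip()]
    for name in MEMORY_FILES:
        path = workspace / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            # Label each file so the model can tell where the text came from.
            parts.append(f"## {name}\n{text}")
    prompt = "\n\n".join(parts)
    # Crude budget: cut by characters, not tokens, to keep the sketch dependency-free.
    return prompt[:max_chars]
```

Because the prompt is built from plain files, everything the model sees stays inspectable and versionable alongside the code, which is the property commenters seemed to value.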

Security, Sandboxing, and “YOLO Mode”

There’s broad agreement that once an agent can write and run code, naive guardrails are mostly “security theater.”
Proposed mitigations include: running agents as separate Unix users, chroot/container/VM sandboxes, gVisor/Firecracker isolation, and restricting tools (e.g., read-only mode in Pi).
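The "restricting tools" mitigation can be sketched as a filter over a tool registry. Everything here is hypothetical: the `Tool` type, the `mutates` flag, `select_tools`, and the stand-in tool set are illustrative names and do not reflect Pi's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    mutates: bool  # True if the tool can change files or run commands
    run: Callable[[str], str]


def select_tools(tools: Dict[str, Tool], read_only: bool) -> Dict[str, Tool]:
    """Return only the tools the agent may call in the current mode."""
    if not read_only:
        return dict(tools)
    return {name: tool for name, tool in tools.items() if not tool.mutates}


# Hypothetical tool set; the lambdas are stand-ins, not real implementations.
ALL_TOOLS = {
    "read": Tool("read", mutates=False, run=lambda arg: f"(contents of {arg})"),
    "write": Tool("write", mutates=True, run=lambda arg: f"(wrote {arg})"),
    "bash": Tool("bash", mutates=True, run=lambda arg: f"(ran {arg})"),
}
```

A filter like this only limits what the model can request. It is not OS-level isolation, so it complements rather than replaces the separate-user or container approaches listed above.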
Disagreement centers on how much is “enough”:
- One side: a sandbox plus limited filesystem scope meaningfully reduces risks such as deleted data or a machine being enlisted in a botnet.
- Other side: sandbox doesn’t prevent exfiltration of code, secrets, or API keys if network access is available.
Approval-based execution is contentious: some say every non-read action should be manually approved; others argue this leads to blind “OK” clicking and kills usability.
Ideas emerge for stronger security models: capability-based tool systems, agent front-ends that only operate via controlled containers, and credential brokers or MCP-style servers that hold secrets while the agent never sees them.
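The credential-broker idea can be illustrated with a small sketch in which the agent only ever handles an opaque handle, while the real secret is attached on the broker's side. The class `CredentialBroker` and its methods are hypothetical, and a real broker would need to run in a separate process or host for the separation to mean anything; this in-process version only shows the shape of the interface.

```python
import secrets
from typing import Dict


class CredentialBroker:
    """Holds real secrets and hands out opaque handles instead."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}

    def register(self, secret: str) -> str:
        """Store a secret and return a random handle that reveals nothing about it."""
        handle = f"cred_{secrets.token_hex(8)}"
        self._secrets[handle] = secret
        return handle

    def authorize(self, handle: str, headers: Dict[str, str]) -> Dict[str, str]:
        """Return a copy of the headers with the real secret attached."""
        if handle not in self._secrets:
            raise KeyError(f"unknown credential handle: {handle}")
        out = dict(headers)
        out["Authorization"] = f"Bearer {self._secrets[handle]}"
        return out
```

Note that this keeps the key out of the agent's context but does not stop the agent from misusing the credential through the broker, so it still needs per-handle scoping or approval on the broker side.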

Comparisons: Claude Code, Codex, Cursor, OpenClaw, Benchmarks

Claude Code is praised for features (plan mode, todo tools, ask-user questions, hooks) and criticized for UI flicker, security choices, and occasional disabling of sandboxes.
Codex’s sandboxing (Seatbelt on macOS, others on different OSes) is defended with docs, but some users report being able to escape or write outside intended paths; skepticism remains.
Cursor is liked for tight feedback loops, model-switching, and good integration with git; some find it more accurate or faster for everyday coding, others find it less capable on niche stacks.
OpenClaw is described as a higher-level harness built on primitives like Pi, emphasizing workspace-level files (AGENTS.md, TOOLS.md, memory/) and multiple specialized agents instead of one monolith.
Pi’s “batteries-not-included” nature means it doesn’t appear on some popular leaderboards, leading to debate over how much benchmarks reflect real usefulness.

Business Models, Moats, and Costs

Several commenters argue major labs’ main moats are capital, ecosystem, and data collected from coding agents, not unique agent UX features, which can be copied.
Subsidized “agent-only” plans and model fine-tuning for those agents provide some temporary advantage but are seen as fragile once tool calling is widespread.
People worry about token costs and vendor lock-in to tools like Claude Code; Pi’s efficient context usage and compatibility with existing subscriptions (e.g., ChatGPT, Anthropic plans) are cited as potential cost savers.
There’s debate over future pricing: some expect API prices to keep dropping with generous agent allowances; others anticipate convergence between subscription bundles and raw API costs.

UI, TUI, and Implementation Details

Strong split between “just print to stdout” minimalists and those investing in TUIs (React/Ink) with higher complexity and performance issues (e.g., flickering).
Some criticize the focus on terminal framerates as misplaced effort compared to improving agent reasoning, while others acknowledge TUIs can offer better diffing and plan-editing UIs.
Developers share practical tips: using WebView2 or browser front-ends for chat-like UIs, integrating with VS Code, and improving diff/blame UX to distinguish human vs AI changes.

Minimalism vs Practical Coverage

Many resonate with the article’s stance: minimal, opinionated systems that solve real workflows can outperform feature-heavy agents, as long as they’re flexible where it matters (model choice, tools, context).
Others caution that extreme minimalism can become overfitted to a single user or environment, missing generality that tools like Claude Code or Codex provide.
