How to code Claude Code in 200 lines of code

Core idea: agent = LLM + tools + loop

  • Many commenters agree the article accurately captures the conceptual core: a while-loop where the LLM chooses tools, the harness runs them, and results go back into context.
  • Several minimal examples are shared (tens of lines in Bash, JS, PHP, Python) to show how small a usable loop can be; a Python sketch in that spirit appears after this list.
  • The post is compared to earlier “how to build an agent” pieces that made the same “emperor has no clothes” point.
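
A minimal loop in that spirit, sketched in Python against the Anthropic messages API (the model id and the single bash tool are illustrative choices, not Claude Code's actual internals):

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One tool: run a shell command and return its output.
TOOLS = [{
    "name": "bash",
    "description": "Run a shell command and return stdout and stderr.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    if name == "bash":
        proc = subprocess.run(args["command"], shell=True,
                              capture_output=True, text=True, timeout=60)
        return proc.stdout + proc.stderr
    return f"unknown tool: {name}"

def agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        resp = client.messages.create(model="claude-sonnet-4-20250514",
                                      max_tokens=4096, tools=TOOLS,
                                      messages=messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":  # no more tool calls: model is done
            return "".join(b.text for b in resp.content if b.type == "text")
        # Execute every requested tool call and feed the results back in.
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": run_tool(b.name, b.input)}
                   for b in resp.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})
```

That really is the whole trick: the model decides, the harness executes, and the transcript is the only state.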

Where real Claude Code diverges

  • Multiple people say the article is now out of date: current Claude Code has parallel subagents, hooks, skills, improved planning, TODO/task management, and more sophisticated context handling.
  • There’s internal plumbing not visible from the loop: UUID-threaded histories, message queues, file-history snapshots, subagent side-chains, queuing of tool calls, etc.; the side-chain pattern is sketched after this list.
  • Some describe Claude Code as closer to an RL-trained conductor/orchestrator than to a 200-line script.
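
The side-chain pattern itself is small even if the surrounding plumbing is not. A hypothetical sketch, reusing the agent() loop from the previous block: each subagent starts with a fresh context, and only its final summary re-enters the parent history:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_subagents(parent_messages: list, subtasks: list[str]) -> None:
    """Fan subtasks out to isolated agent loops. Each child begins with
    an empty history, so its intermediate tool calls never pollute the
    parent's context; only the final summaries are merged back."""
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(agent, subtasks))  # agent() from above
    for task, summary in zip(subtasks, summaries):
        parent_messages.append({"role": "user",
                                "content": f"[subagent result: {task}]\n{summary}"})
```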

Harness vs model quality

  • One camp argues model improvements (e.g., newer Claude Opus vs earlier Sonnet) dominate; simple harnesses like mini-swe-agent can match or beat fancy ones if the model is strong.
  • Another camp says harness details matter a lot in practice: UX, planning, skills, approvals, context pruning (sketched after this list), and parallelization can make a weaker model plus a good harness competitive for many tasks.
  • Benchmarks and anecdotal comparisons suggest large quality gaps between model generations that no harness can fully erase.
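
Context pruning, one of the harness levers cited above, can be illustrated crudely: keep the original task plus a recent window and drop the middle (real harnesses summarize dropped turns rather than just truncating):

```python
def prune(messages: list, keep_last: int = 20) -> list:
    """Keep the first message (the task) plus the most recent turns.
    Caveat: naive truncation can orphan a tool_result from its
    tool_use pair, which the API rejects; real harnesses summarize
    dropped turns and pin items like the TODO list instead."""
    if len(messages) <= keep_last + 1:
        return messages
    return messages[:1] + messages[-keep_last:]
```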

Planning, TODOs, and “early stopping”

  • A recurring pain point is premature task completion: the model stops after a few steps and declares “done.”
  • Claude Code’s TODO/task tools, repeatedly injected into prompts and kept at the top of context, are cited as a key mitigation; experiments show disabling them significantly degrades performance (the reinjection pattern is sketched after this list).
  • People describe custom variants: persistent “plan.md” files, working-memory files, DSLs for task termination, and “nudges” when the model forgets to call tools.
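
Mechanically, the reinjection trick is simple: keep the task list in harness state, re-emit it on every model call so it never scrolls out of attention, and nudge when the model stops with items still open. A sketch with assumed names, not Claude Code's actual implementation:

```python
todos: list[dict] = []  # items like {"task": "add tests", "done": False}

def todo_block() -> str:
    lines = [f"[{'x' if t['done'] else ' '}] {t['task']}" for t in todos]
    return "Current TODO list:\n" + "\n".join(lines)

def call_model(messages: list):
    # Re-inject the TODO list as the system prompt on every turn so it
    # stays at the top of context rather than scrolling out of view.
    return client.messages.create(model="claude-sonnet-4-20250514",
                                  max_tokens=4096, tools=TOOLS,
                                  system=todo_block(), messages=messages)

def maybe_nudge(messages: list) -> bool:
    # Counter "early stopping": if the model declared itself done while
    # items remain open, push it back into the loop.
    open_items = [t["task"] for t in todos if not t["done"]]
    if open_items:
        messages.append({"role": "user",
                         "content": "Unfinished TODOs remain: "
                                    + ", ".join(open_items)
                                    + ". Continue working."})
    return bool(open_items)
```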

Production complexity, safety, and skepticism

  • Practitioners building large-scale agents emphasize edge cases: user messages during active loops, Slack/webhook integration, approvals, error handling, structured decoding, and resuming async tasks.
  • Some liken the article to “Twitter in 200 lines”: educational but glossing over the bulk of real-world complexity.
  • Concerns are raised about agents’ broad filesystem access and the risks of running them unsandboxed; a minimal approval gate is sketched below.
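
On that last point, even a minimal harness can put a gate between the model and the shell. A toy example (an illustrative allowlist plus interactive confirmation, which is a speed bump, not a substitute for a real sandbox):

```python
SAFE_PREFIXES = {"ls", "cat", "grep", "git diff", "git status"}  # illustrative

def needs_approval(command: str) -> bool:
    return not any(command.strip().startswith(p) for p in SAFE_PREFIXES)

def gated_run_tool(name: str, args: dict) -> str:
    # Wraps run_tool() from the first sketch with a human approval step.
    if name == "bash" and needs_approval(args["command"]):
        if input(f"Run `{args['command']}`? [y/N] ").strip().lower() != "y":
            return "Tool call denied by the user."
    return run_tool(name, args)
```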