"Token anxiety", a slot machine by any other name

Effectiveness of Coding Agents

  • Experiences range from “95% payout” when users are skilled at validation and stay within well-trodden domains to much lower success rates in data engineering/science or novel scientific tasks.
  • Users report that LLMs parse structure (checklists, PDFs) well but misinterpret meaning, especially numeric results.
  • Some compare different models: in one example, a Codex-based agent spent 45 minutes producing mostly broken E2E tests, while another model solved the same task in 15 minutes and found serious flaws in Codex’s “passing” tests.
  • Consensus: agents are good at scaffolding, boilerplate, and common patterns; getting to production-ready quality often triggers a frustrating “Fixed it!” loop with new bugs.

Workflows, Back-and-Forth, and Guardrails

  • Many describe heavy “back-and-forth” as normal: refining specs, correcting bad plans, restarting when context bloats.
  • Practical tips: detailed README/specs, frequent restarts, stopping the agent when it “goes dumb,” using models mainly as oracles, and treating multi-agent workflows skeptically due to review overhead.
  • Others advocate agent harnesses with tests, linting, custom scripts, and plan-review subagents to systematically ground and constrain behavior.
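The harness idea above can be sketched as a simple validation gate: before accepting an agent's change, run a fixed battery of ground-truth checks (tests, lint, custom scripts) and reject the change if any fail. This is a minimal illustration, not any commenter's actual setup; the check commands below are placeholders you would swap for your project's real tools.

```python
import subprocess
import sys

def run_gate(checks):
    """Run each named check command; return (passed, list_of_failed_names).

    `checks` is a list of (name, argv) pairs. A check passes if its
    process exits with status 0.
    """
    failures = []
    for name, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(name)
    return (not failures, failures)

if __name__ == "__main__":
    # Placeholder checks run via the current interpreter so the sketch is
    # self-contained; in practice these would be e.g. pytest and a linter.
    checks = [
        ("unit tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
        ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ]
    passed, failed = run_gate(checks)
    print("accept change" if passed else f"reject change: {failed}")
```

The point of wiring this into the loop, per the thread, is that the agent's own "tests pass!" claims are not trusted; only the harness's independent exit codes gate acceptance.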

Slot Machine / Addiction Analogy

  • Supporters see intermittent reward and “one more try” behavior similar to gambling, idle games, or loot boxes; some report real “token anxiety” and neglected hobbies.
  • Critics argue the analogy breaks: LLM makers are (currently) trying to increase reliability; intermittent success is a bug, not a profit-maximizing feature. They frame heavy use as “liking to build things,” not pathology.
  • There’s debate over whether intermittent rewards alone cause compulsion, with some pointing out that most real-world variable rewards (jobs, gardening, sports) don’t create addictions.

Incentives, Business Models, and Enshittification

  • One camp claims providers optimize for engagement and token spend, likening them to casinos or social media; they note verbose defaults and features that encourage multiple agents.
  • Others counter that subscription plans and strong competition incentivize fast, correct answers; if models deliberately wasted tokens, users would switch.
  • Some fear a Google-like trajectory: tools start user-centered, then slowly shift to profit extraction once lock-in and investor pressure grow.

Work Intensity, Burnout, and Code Slop

  • Several commenters think AI tools don’t reduce work; they intensify it: more features shipped, more “cognitive debt,” and less time to deeply understand systems.
  • Work/life boundaries blur because “just sending Claude a message” from a phone feels like low-effort progress, encouraging nights-and-weekends work in a weak job market.
  • Others say 996-style expectations remain rare and overreported, though they acknowledge creeping weekend activity.
  • Easy code generation tempts teams into overbuilt, messy codebases (“workslop”), where throughput rises but maintainability and architecture suffer.