"Token anxiety", a slot machine by any other name

Effectiveness of Coding Agents

  • Experiences range from “95% payout” when users are skilled at validation and stay within well-trodden domains to much lower success rates in data engineering/science or novel scientific tasks.
  • Users report that LLMs parse structure (checklists, PDFs) well but misinterpret meaning, especially numeric results.
  • Some compare different models: in one example, a Codex-based agent spent 45 minutes producing mostly broken E2E tests, while another model solved the same task in 15 minutes and found serious flaws in Codex’s “passing” tests.
  • Consensus: agents are good at scaffolding, boilerplate, and common patterns; getting to production-ready quality often triggers a frustrating “Fixed it!” loop with new bugs.

Workflows, Back-and-Forth, and Guardrails

  • Many describe heavy “back-and-forth” as normal: refining specs, correcting bad plans, restarting when context bloats.
  • Practical tips: detailed README/specs, frequent restarts, stopping the agent when it “goes dumb,” using models mainly as oracles, and treating multi-agent workflows skeptically due to review overhead.
  • Others advocate agent harnesses with tests, linting, custom scripts, and plan-review subagents to systematically ground and constrain behavior.
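The harness idea above can be sketched as a simple validation gate: before accepting an agent's change, run a fixed battery of ground-truth checks (tests, lint, custom scripts) and reject the change if any fail. This is a minimal illustration, not any commenter's actual setup; the check commands below are placeholders you would swap for your project's real tools.

```python
import subprocess
import sys

def run_gate(checks):
    """Run each named check command; return (passed, list_of_failed_names).

    `checks` is a list of (name, argv) pairs. A check passes if its
    process exits with status 0.
    """
    failures = []
    for name, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(name)
    return (not failures, failures)

if __name__ == "__main__":
    # Placeholder checks run via the current interpreter so the sketch is
    # self-contained; in practice these would be e.g. pytest and a linter.
    checks = [
        ("unit tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
        ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ]
    passed, failed = run_gate(checks)
    print("accept change" if passed else f"reject change: {failed}")
```

The point of wiring this into the loop, per the thread, is that the agent's own "tests pass!" claims are not trusted; only the harness's independent exit codes gate acceptance.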

Slot Machine / Addiction Analogy

  • Supporters see intermittent reward and “one more try” behavior similar to gambling, idle games, or loot boxes; some report real “token anxiety” and neglected hobbies.
  • Critics argue the analogy breaks: LLM makers are (currently) trying to increase reliability; intermittent success is a bug, not a profit-maximizing feature. They frame heavy use as “liking to build things,” not pathology.
  • There’s debate over whether intermittent rewards alone cause compulsion, with some pointing out that most real-world variable rewards (jobs, gardening, sports) don’t create addictions.

Incentives, Business Models, and Enshittification

  • One camp claims providers optimize for engagement and token spend, likening them to casinos or social media; they note verbose defaults and features that encourage multiple agents.
  • Others counter that subscription plans and strong competition incentivize fast, correct answers; if models deliberately wasted tokens, users would switch.
  • Some fear a Google-like trajectory: tools start user-centered, then slowly shift to profit extraction once lock-in and investor pressure grow.

Work Intensity, Burnout, and Code Slop

  • Several commenters think AI tools don’t reduce work; they intensify it: more features shipped, more “cognitive debt,” and less time to deeply understand systems.
  • Work/life boundaries blur because “just sending Claude a message” from a phone feels like low-effort progress, encouraging nights-and-weekends work in a weak job market.
  • Others say 996-style expectations remain rare and overreported, though they acknowledge creeping weekend activity.
  • Easy code generation tempts teams into overbuilt, messy codebases (“workslop”), where throughput rises but maintainability and architecture suffer.