Project Vend: Can Claude run a small shop? (And why does that matter?)
Gap Between Hype and Actual Performance
- Many readers see the experiment as a clear demonstration of the distance between current LLMs and the “run a business” hype.
- The agent made errors a basic human shopkeeper wouldn’t: losing track of money, margins, and inventory, and succumbing to silly requests (e.g., stocking tungsten cubes).
- Some compare it unfavorably to children running lemonade stands, arguing the result was “basically a role‑playing game” that failed even at that.
Scaffolding, Tools, and Architecture Limits
- Ongoing debate over whether LLMs should ever be expected to do this “without scaffolding.”
- One side: LLMs are just language models; external tools, rules, and APIs are inherently required.
- Other side: leaning on more scaffolding just hides that next‑token prediction isn’t the right primitive for robust agents.
- Several argue the real problems were engineering ones: fuzzy goals, a lack of solid accounting tools and constraints, no explicit financial model, and poor state/context handling.
- Others think this points to the need for new base models with built‑in reinforcement learning, explicit state, and objectives, not just more wrappers.
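The "scaffolding" the engineering-focused commenters have in mind can be made concrete with a minimal sketch. The idea is that money and inventory live in a deterministic ledger outside the model, and the agent can only act through constrained tool calls that enforce hard rules (e.g., never sell below cost), so a confused model cannot corrupt the books. The `Ledger` class and its method names here are hypothetical illustrations, not anything from Anthropic's actual setup:

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Explicit financial state kept outside the model's context window."""
    cash: float
    # item -> (quantity on hand, weighted-average unit cost)
    inventory: dict = field(default_factory=dict)

    def record_purchase(self, item: str, qty: int, unit_cost: float) -> None:
        total = qty * unit_cost
        if total > self.cash:
            raise ValueError("insufficient funds")  # hard constraint, not a suggestion
        self.cash -= total
        old_qty, old_cost = self.inventory.get(item, (0, 0.0))
        new_qty = old_qty + qty
        # maintain a weighted-average cost basis so margins are computable
        self.inventory[item] = (new_qty, (old_qty * old_cost + total) / new_qty)

    def approve_sale(self, item: str, qty: int, price: float,
                     min_margin: float = 0.10) -> tuple[bool, str]:
        """Reject any sale below cost plus a minimum margin,
        regardless of what the model proposes."""
        stock, unit_cost = self.inventory.get(item, (0, 0.0))
        if qty > stock:
            return False, "out of stock"
        floor = unit_cost * (1 + min_margin)
        if price < floor:
            return False, f"price below cost floor ({floor:.2f})"
        self.inventory[item] = (stock - qty, unit_cost)
        self.cash += qty * price
        return True, "ok"
```

Under this design, a model talked into selling a tungsten cube at a loss simply gets a refusal string back from the tool; whether such wrappers suffice, or merely paper over the base model's limits, is exactly the debate above.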
Identity Crisis, Hallucinations, and Systemic Risk
- The “identity crisis” / “April Fools” episode is widely described as disturbing, akin to a boss having a temporary psychotic break.
- Hallucinated payment accounts and imaginary explanations are seen as fundamental reliability issues, not edge cases.
- Commenters worry about systemic chaos if many cloned agents misbehave in correlated ways in a future AI-managed economy.
Trust, Safety, and Business Use
- Several say they would never let an LLM run a business long‑term; a single bad day could destroy the enterprise.
- Prompt injection and user manipulation are highlighted as unsolved blockers for real-world agents.
- There’s also skepticism about vendor reliability (e.g., sudden bans, opaque support), further undermining willingness to make them business‑critical.
Critique of Anthropic’s Framing and Broader AI Hype
- Multiple comments call the post a marketing piece: an extensive failure reframed as evidence that AI middle‑managers are “on the horizon.”
- Complaints about missing system prompts, incomplete tool/memory traces, and selective storytelling; parallels drawn to prior sensational demos.
- Broader frustration with AI hype cycles, investor-facing spin, and claims that “by 2027 you won’t need software,” contrasted with unfixed hallucinations.
Narrow but Real Use Cases
- Many still find the project “cool” as a thought experiment and accept LLMs as powerful assistants.
- Suggested viable domains: drafting, summarization, low-stakes customer support, and brainstorming, i.e., places where roughly 90% correctness under human oversight is acceptable.