Computer Use is 45x more expensive than structured APIs
Overall reaction to the benchmark
- Many find the result unsurprising: UIs built for humans are inefficient for machines; APIs are the “obvious” better interface.
- Others stress the value is in quantifying the cost gap and showing wall‑clock differences (seconds vs ~17 minutes).
- Some call the gap “only” 45x and say they expected it to be even larger.
When “computer use” is still needed
- Key use cases: legacy / proprietary / locked‑down apps (EHRs, hotel PMS, archaic SaaS) and sites with no or closed APIs.
- Examples include medical billing systems where vendors block API access, forcing computer vision + OCR to navigate screens.
- Some note it’s useful for GUI debugging or end‑to‑end testing where UI behavior (not just API) must be validated.
Critiques of the experiment and models
- Commenters note that the vision agents failed on basics like scrolling, which inflated token use; some call this a model or prompt‑design problem.
- Several say the comparison rests on a single workflow, so it is more of a case study than a true benchmark.
- There’s interest in testing other browser/agent tools (Playwright, agent‑browser, dev‑browser, hybrid DOM/vision approaches).
OS, UI, and “agent‑first” design
- One camp argues an “agentic world” needs OSes and apps that expose first‑class APIs for all functionality, potentially via MCP‑like surfaces, D‑Bus, or accessibility APIs.
- Others push back: incentives for consumer apps (ads, dark patterns, data hoarding) work against exposing clean automation APIs.
- Historical analogues are mentioned (AppleScript, OLE Automation, Unix shells, REST/HATEOAS); many note they were underused or abandoned.
Cost, latency, and tokens
- Strong consensus: general‑purpose, vision‑based computer use is currently too slow and token‑hungry for most real products; APIs and CLIs are cheaper, faster, and more reliable.
- Some report browser/computer use quickly exhausting plan limits; others experiment with “code once, reuse many times” (e.g., generating Playwright scripts or reusable workflows) to amortize cost.
- A minority argue computer use can be competitive when APIs are verbose or nonexistent and when screenshots are small.
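
The “code once, reuse many times” idea above comes down to a break-even calculation. The sketch below is illustrative only. The function name, the one-time generation cost of 200 units, and the assumption that a replayed script costs about the same per run as a structured API call are hypothetical; only the 45x ratio comes from the benchmark under discussion.

```python
import math
from typing import Optional


def runs_to_break_even(one_time_cost: float,
                       agent_cost_per_run: float,
                       script_cost_per_run: float) -> Optional[int]:
    """Smallest number of runs at which generating a reusable script
    (one-time cost plus a cheap per-run cost) is no more expensive than
    driving the UI with a vision agent on every run.

    Returns None if the script never pays off.
    """
    saving_per_run = agent_cost_per_run - script_cost_per_run
    if saving_per_run <= 0:
        return None  # the agent is already as cheap or cheaper per run
    # Need: one_time_cost + n * script_cost <= n * agent_cost
    return max(1, math.ceil(one_time_cost / saving_per_run))


# Hypothetical cost units: a structured call costs 1 unit per run and the
# vision agent costs 45x that (the ratio from the benchmark).
api_cost = 1.0
agent_cost = 45 * api_cost
generation_cost = 200.0  # assumed one-time cost of having the agent write a script

print(runs_to_break_even(generation_cost, agent_cost, api_cost))  # prints 5
```

Under these assumed numbers the script pays for itself by the fifth run; a workflow that runs only once or twice would not recover the generation cost.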
Trust, safety, and practicality of agents
- Many are reluctant to let agents handle high‑stakes, sensitive tasks (taxes, background checks, company workflows) due to hallucinations and accountability concerns.
- Agents are often likened to interns who need close supervision rather than truly autonomous agents.
Emerging patterns and meta‑effects
- There’s a visible “mean reversion” toward deterministic, structured interfaces (APIs, schemas, CLIs).
- AI is unexpectedly driving better documentation, accessibility, and structured design, since these directly improve agent performance while also benefiting human users.