Computer Use is 45x more expensive than structured APIs

Overall reaction to the benchmark

  • Many find the result unsurprising: UIs built for humans are inefficient for machines; APIs are the “obvious” better interface.
  • Others stress the value is in quantifying the cost gap and showing wall‑clock differences (seconds vs ~17 minutes).
  • Some argue it’s “only” 45x and say they expected the gap to be even larger.

When “computer use” is still needed

  • Key use cases: legacy / proprietary / locked‑down apps (EHRs, hotel PMS, archaic SaaS) and sites with no or closed APIs.
  • Examples include medical billing systems where vendors block API access, forcing computer vision + OCR to navigate screens.
  • Some note it’s useful for GUI debugging or end‑to‑end testing where UI behavior (not just API) must be validated.

Critiques of the experiment and models

  • Commenters note vision agents failed on basics like scrolling, inflating token use; some call this a model or prompt‑design problem.
  • Several say the comparison is based on a single workflow, making it more of a case study than a true benchmark.
  • There’s interest in testing other browser/agent tools (Playwright, agent‑browser, dev‑browser, hybrid DOM/vision approaches).

OS, UI, and “agent‑first” design

  • One camp argues an “agentic world” needs OSes and apps with first‑class APIs for all functionality, potentially via MCP‑like surfaces, DBus, accessibility APIs, etc.
  • Others push back: incentives for consumer apps (ads, dark patterns, data hoarding) work against exposing clean automation APIs.
  • Historical analogues are mentioned (AppleScript, OLE Automation, Unix shells, REST/HATEOAS); many note they were underused or abandoned.

Cost, latency, and tokens

  • Strong consensus: general‑purpose, vision‑based computer use is currently too slow and token‑hungry for most real products; APIs and CLIs are cheaper, faster, and more reliable.
  • Some report browser/computer use quickly exhausting plan limits; others experiment with “code once, reuse many times” (e.g., generating Playwright scripts or reusable workflows) to amortize cost.
  • A minority argue computer use can be competitive when APIs are verbose or nonexistent and when screenshots are small.
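
The “code once, reuse many times” idea above is an amortization argument, and a rough sketch can make the break‑even point concrete. All figures below are hypothetical cost units, not measurements from the benchmark. Only the 45:1 ratio between an agent run and a structured call echoes the headline number. The one‑time script generation cost, and the assumption that replaying a generated script costs about as much as a structured call, are made up for illustration.

```python
import math
from typing import Optional


def total_cost(runs: int, one_time: float, per_run: float) -> float:
    """Total cost of `runs` executions given a one-time setup cost."""
    return one_time + runs * per_run


def break_even_runs(
    agent_per_run: float, script_one_time: float, script_per_run: float
) -> Optional[int]:
    """Smallest run count at which generate-once-then-replay is strictly
    cheaper than driving the UI with the agent on every run.

    Returns None if replaying the script is not cheaper per run, in which
    case the one-time cost is never recovered.
    """
    saving = agent_per_run - script_per_run
    if saving <= 0:
        return None
    # Need: script_one_time + n * script_per_run < n * agent_per_run
    # i.e.  n > script_one_time / saving
    return math.floor(script_one_time / saving) + 1


# Hypothetical cost units (assumptions, not benchmark data):
AGENT_PER_RUN = 45    # one vision-agent run, per the 45x headline ratio
SCRIPT_PER_RUN = 1    # replaying a generated script, assumed ~ one API call
SCRIPT_ONE_TIME = 90  # assumed cost of having the agent write the script

if __name__ == "__main__":
    n = break_even_runs(AGENT_PER_RUN, SCRIPT_ONE_TIME, SCRIPT_PER_RUN)
    print("break-even after", n, "runs")
    print("agent cost: ", total_cost(n, 0, AGENT_PER_RUN))
    print("script cost:", total_cost(n, SCRIPT_ONE_TIME, SCRIPT_PER_RUN))
```

With these made‑up numbers the generated script is already cheaper by the third run. The real break‑even point depends on how often the UI changes and the script has to be regenerated, which this sketch does not model.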

Trust, safety, and practicality of agents

  • Many are reluctant to let agents handle high‑stakes, sensitive tasks (taxes, background checks, company workflows) due to hallucinations and accountability concerns.
  • Agents are often likened to interns that need close supervision, not autonomous “real” agents.

Emerging patterns and meta‑effects

  • There’s a visible “mean reversion” toward deterministic, structured interfaces (APIs, schemas, CLIs).
  • AI is unexpectedly driving better documentation, accessibility, and structured design, since these directly improve agent performance and benefit human users as well.