2026-05-05

Computer Use is 45x more expensive than structured APIs

Overall reaction to the benchmark

Many find the result unsurprising: UIs built for humans are inefficient for machines; APIs are the “obvious” better interface.
Others stress the value is in quantifying the cost gap and showing wall‑clock differences (seconds vs ~17 minutes).
Some argue it’s “only” 45x and say they expected an even larger gap.

When “computer use” is still needed

Key use cases: legacy / proprietary / locked‑down apps (EHRs, hotel PMS, archaic SaaS) and sites with no API or only a closed one.
Examples include medical billing systems where vendors block API access, forcing computer vision + OCR to navigate screens.
Some note it’s useful for GUI debugging or end‑to‑end testing where UI behavior (not just API) must be validated.

Critiques of the experiment and models

Commenters note vision agents failed on basics like scrolling, inflating token use; some call this a model or prompt‑design problem.
Several say the comparison is based on a single workflow, which makes it more of a case study than a true benchmark.
There’s interest in testing other browser/agent tools (Playwright, agent‑browser, dev‑browser, hybrid DOM/vision approaches).

OS, UI, and “agent‑first” design

One camp argues an “agentic world” needs OSes and apps with first‑class APIs for all functionality, potentially via MCP‑like surfaces, D-Bus, accessibility APIs, etc.
Others push back: incentives for consumer apps (ads, dark patterns, data hoarding) work against exposing clean automation APIs.
Historical analogues are mentioned (AppleScript, OLE Automation, Unix shells, REST/HATEOAS); many note they were underused or abandoned.

Cost, latency, and tokens

Strong consensus: generic computer/vision use is currently too slow and token‑hungry for most real products; APIs and CLIs are cheaper, faster, and more reliable.
Some report browser/computer use quickly exhausting plan limits; others experiment with “code once, reuse many times” (e.g., generating Playwright scripts or reusable workflows) to amortize cost.
A minority argue computer use can be competitive when APIs are verbose or nonexistent and when screenshots are small.
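
The “code once, reuse many times” idea mentioned above comes down to simple amortization arithmetic, sketched here in Python. Everything in it is illustrative: the function names are hypothetical, costs are in relative units, and the only figure taken from the thread is the 45x ratio. The one‑time generation cost (90 units) and the assumption that replaying a generated script costs about as much as a structured API call (1 unit) are made up for the example, not reported by anyone in the discussion.

```python
import math
from typing import Optional


def amortized_cost_per_run(generation_cost: float, replay_cost: float, runs: int) -> float:
    """Average cost per run when a script is generated once and then replayed."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return (generation_cost + replay_cost * runs) / runs


def break_even_runs(generation_cost: float, replay_cost: float,
                    agent_cost_per_run: float) -> Optional[int]:
    """Smallest run count at which generate-once is strictly cheaper than
    paying for a fresh computer-use agent run every time.

    Returns None if replaying is not cheaper than a fresh agent run.
    """
    saving_per_run = agent_cost_per_run - replay_cost
    if saving_per_run <= 0:
        return None
    # Need generation_cost + replay_cost * n < agent_cost_per_run * n,
    # i.e. n > generation_cost / saving_per_run.
    return math.floor(generation_cost / saving_per_run) + 1


# Hypothetical inputs in relative units (assumptions, not measurements):
AGENT_RUN = 45.0    # one computer-use run, using the thread's 45x ratio
REPLAY_RUN = 1.0    # assumed: replaying a script costs about one API-style run
GENERATION = 90.0   # assumed: generating the script costs two agent runs

print(break_even_runs(GENERATION, REPLAY_RUN, AGENT_RUN))
print(amortized_cost_per_run(GENERATION, REPLAY_RUN, 10))
```

Under these assumed numbers the generated script pays for itself on the third run, and after ten runs the average cost is 10 units per run against 45 for a fresh agent run each time. The conclusion is sensitive to how often the UI changes and forces regeneration, which this sketch does not model.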

Trust, safety, and practicality of agents

Many are reluctant to let agents handle high‑stakes, sensitive tasks (taxes, background checks, company workflows) due to hallucinations and accountability concerns.
Agents are often likened to interns that need close supervision, not autonomous “real” agents.

Emerging patterns and meta‑effects

There’s a visible “mean reversion” toward deterministic, structured interfaces (APIs, schemas, CLIs).
AI is unexpectedly driving better documentation, accessibility, and structured design, since these directly improve agent performance and benefit humans as well as machines.
