2025-10-07

Gemini 2.5 Computer Use model

Integration and tooling

The model isn’t a drop‑in replacement: it works only through Google’s predefined computer_use tool, which confused people who tried it inside existing agents or Studio.
Custom tools can clash with the built‑ins, so they must be excluded or carefully configured.
Some compare this approach to using MCP-based browser tools or Playwright/Puppeteer; many find it simpler to have an LLM generate scripts than to run an LLM in the control loop for every click.
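
One way to picture the "exclude or carefully configure" advice is a pre-flight check that separates custom tools from anything overlapping the built-ins. This is a minimal sketch that assumes the clash is a name collision, which the discussion does not confirm. The action names and the split_custom_tools helper are hypothetical placeholders, not Google's actual list or API.

```python
# ASSUMPTION: placeholder names for the predefined actions. The real set is
# defined by Google's computer_use tool and is not given in the discussion.
ASSUMED_BUILTIN_ACTIONS = {"click_at", "type_text_at", "navigate"}


def split_custom_tools(custom_tools, builtin_names=ASSUMED_BUILTIN_ACTIONS):
    """Return (safe, clashing) lists of custom tool names.

    A tool counts as clashing here only if its name matches a built-in name;
    other kinds of conflict are outside this sketch.
    """
    safe = [name for name in custom_tools if name not in builtin_names]
    clashing = [name for name in custom_tools if name in builtin_names]
    return safe, clashing


# Hypothetical custom tools registered by an existing agent.
safe, clashing = split_custom_tools(["lookup_invoice", "click_at"])
```

Only the names in safe would then be registered alongside the predefined tool; anything in clashing would be renamed or dropped.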

Browser automation performance

The Browserbase demo impresses some: it can log in, browse, solve tasks like “not a robot” mini‑games, and even play Wordle in some runs.
Others report it getting stuck (e.g., HN demo, job application tabs, Google Sheets editing, Wordle color feedback) and frequently misclicking, which they attribute to purely vision‑based x/y coordinate control.
Latency is described as “painfully slow”; acceptable for background RPA‑style tasks, but a non‑starter for fast E2E test suites.
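
The misclick complaint is easier to reason about with the coordinate mapping written out. The sketch below assumes the model reports positions on a normalized 1000x1000 grid that the client scales to the real viewport. That convention, the viewport size, and the checkbox bounds are illustrative assumptions, not details from the discussion.

```python
def denormalize(x_norm, y_norm, width, height, grid=1000):
    """Scale a point from an assumed normalized grid to viewport pixels."""
    x = x_norm * width // grid
    y = y_norm * height // grid
    # Clamp so a coordinate at the grid edge still lands inside the viewport.
    return min(x, width - 1), min(y, height - 1)


def lands_in(point, box):
    """True if the pixel point falls inside box = (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom


# Hypothetical 16px checkbox near the centre of a 1440x900 viewport.
checkbox = (712, 442, 728, 458)
near = denormalize(502, 500, 1440, 900)  # small estimation error
far = denormalize(510, 500, 1440, 900)   # error of ten grid units
hit_near = lands_in(near, checkbox)
hit_far = lands_in(far, checkbox)
```

On this viewport one grid unit is about 1.4 pixels wide, so an estimate that is off by ten units already misses a 16px control, whereas a selector-based click would not depend on the estimate at all.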

CAPTCHAs, bot auth, and ethics

Initial claims that Gemini “solved” Google reCAPTCHA were corrected: Browserbase handles it, likely via specialized infrastructure.
Browserbase emphasizes that it doesn’t use click farms and points to “verified bot” / Web Bot Auth schemes.
Commenters note the irony that corporate bots get whitelisted while humans still solve CAPTCHAs, and that only large vendors’ bots qualify.

Use cases and value

Suggested high‑value uses: automating awful enterprise/Web UIs (HR, licensing, logistics, insurance, healthcare forms), periodic browser‑driven workflows, RPA self‑healing, and accessibility support.
Many argue a human+LLM loop that produces stable Playwright‑like scripts is more efficient than always running an LLM agent.
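
The efficiency argument can be sketched as "record once, replay many times". The example below is a rough illustration under stated assumptions: fake_model stands in for a real LLM call, the action dictionaries are a made-up format, and neither record_plan nor replay corresponds to any Playwright or Gemini API.

```python
from typing import Callable, Optional

Action = dict  # hypothetical format, e.g. {"op": "click", "selector": "#submit"}


def record_plan(propose_step: Callable[[list], Optional[Action]],
                max_steps: int = 20) -> list:
    """Ask the model for one action at a time until it returns None."""
    plan = []
    for _ in range(max_steps):
        action = propose_step(plan)
        if action is None:
            break
        plan.append(action)
    return plan


def replay(plan: list, execute: Callable[[Action], None]) -> int:
    """Run a recorded plan with no model calls; return the steps executed."""
    for action in plan:
        execute(action)
    return len(plan)


# Stand-in for an LLM: returns scripted steps and counts how often it is called.
scripted = [
    {"op": "goto", "url": "https://example.com/login"},
    {"op": "fill", "selector": "#user", "value": "demo"},
    {"op": "click", "selector": "#submit"},
]
calls = {"model": 0}


def fake_model(history):
    calls["model"] += 1
    return scripted[len(history)] if len(history) < len(scripted) else None


plan = record_plan(fake_model)   # model is consulted only while recording
log = []
for _ in range(3):               # three later runs reuse the same plan
    replay(plan, log.append)
```

The model is consulted only during recording, so repeated runs cost nothing in LLM calls. The trade-off raised in the thread is that a recorded plan breaks when the page changes, which is where a human or an LLM would step back in to regenerate it.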

UI vs APIs and architecture debate

One camp calls GUI‑driven AI a “mechanical horse” and wants APIs, structured data, and accessibility trees.
The opposing view: the real world is messy and adversarial, APIs are rare, and UIs are what’s actually tested and deployed; screenshot‑based vision is universal and often more robust to bad markup.

Governance, reliability, and broader concerns

Enterprise adoption is seen as contingent on strong hooks/callbacks and RBAC; skeptics note current agents sometimes ignore even “do not proceed” signals.
Gemini is criticized for poor tool-calling and “laziness” (prematurely declaring tasks done); commenters also cite Google’s broader track record (e.g., degraded voice assistant behavior).
Some see computer‑use agents as a key labor-impact benchmark and potential “vertical agent killers”; others worry about fraud, bot detection, and automated interactions that are indistinguishable from human ones.
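
The hooks and RBAC point implies enforcing limits in the executor rather than trusting the model to honor a "do not proceed" message. The sketch below shows one such gate under stated assumptions: the roles, action names, and the Gate class are all hypothetical, not part of any vendor API.

```python
from dataclasses import dataclass


class ActionBlocked(Exception):
    """Raised when the gate refuses to run an action."""


@dataclass
class Gate:
    role_permissions: dict  # role name -> set of allowed action names
    halted: bool = False

    def halt(self):
        """Record a 'do not proceed' signal; it stays set until reset."""
        self.halted = True

    def check(self, role, action):
        if self.halted:
            raise ActionBlocked("halt signal is set")
        if action not in self.role_permissions.get(role, set()):
            raise ActionBlocked(f"role {role!r} may not run {action!r}")


def run_action(gate, role, action, execute):
    """Pre-action hook: the check runs in the executor, outside the model."""
    gate.check(role, action)
    return execute(action)


# Hypothetical roles: a viewer may only read, an operator may also submit.
gate = Gate({"viewer": {"read_page"}, "operator": {"read_page", "submit_form"}})
done = []
run_action(gate, "operator", "submit_form", done.append)

try:
    run_action(gate, "viewer", "submit_form", done.append)
    viewer_blocked = False
except ActionBlocked:
    viewer_blocked = True

gate.halt()
try:
    run_action(gate, "operator", "read_page", done.append)
    halted_blocked = False
except ActionBlocked:
    halted_blocked = True
```

Because the check sits between the model's proposed action and its execution, a model that ignores the stop signal still cannot act on it.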
