Gemini 2.5 Computer Use model

Integration and tooling

  • The model isn’t a drop‑in replacement: it requires Google’s predefined computer_use tool, which confused people who tried it inside existing agents or Studio.
  • Custom tools can clash with the built‑ins, so they must be excluded or carefully configured.
  • Some compare this approach to using MCP-based browser tools or Playwright/Puppeteer; many find it simpler to have an LLM generate scripts than to run an LLM in the control loop for every click.

Browser automation performance

  • The Browserbase demo impresses some: it can log in, browse, solve tasks like “not a robot” mini‑games, and even play Wordle in some runs.
  • Others report it getting stuck (e.g., HN demo, job application tabs, Google Sheets editing, Wordle color feedback) and frequently misclicking because it relies purely on vision and x/y coordinates.
  • Latency is described as “painfully slow”; acceptable for background RPA‑style tasks, but a non‑starter for fast E2E test suites.
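
The misclicks blamed on vision plus x/y control can be illustrated with a small sketch. It assumes, purely for illustration, that the model reports click positions on a normalized 0 to 999 grid that the client scales to the real viewport. The grid size, the function names, and the viewport numbers are all assumptions rather than details from the discussion.

```python
GRID = 1000  # assumed size of the normalized coordinate grid (hypothetical)


def denormalize(x_norm: int, y_norm: int, width: int, height: int) -> tuple:
    """Scale a grid coordinate to integer pixel coordinates in the viewport."""
    return (x_norm * width // GRID, y_norm * height // GRID)


def cell_size(width: int, height: int) -> tuple:
    """Pixels covered by one grid cell on each axis (the scaling granularity)."""
    return (width / GRID, height / GRID)


def hits(target: tuple, point: tuple) -> bool:
    """True if point lies inside target, given as (left, top, right, bottom)."""
    left, top, right, bottom = target
    x, y = point
    return left <= x < right and top <= y < bottom


# A 12 px checkbox on a 1440x900 viewport (made-up numbers).
checkbox = (100, 100, 112, 112)
on_target = denormalize(70, 112, 1440, 900)   # the model's estimate is accurate
off_target = denormalize(78, 112, 1440, 900)  # the estimate is 8 grid cells off
print(hits(checkbox, on_target), hits(checkbox, off_target))
```

Under these assumptions, an estimate that is less than one percent of the screen width off is enough to miss a small control. A selector-based click would not have this failure mode.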

CAPTCHAs, bot auth, and ethics

  • Initial claims that Gemini “solved” Google reCAPTCHA were corrected: Browserbase handles it, likely via specialized infrastructure.
  • Browserbase emphasizes that it doesn’t use click farms and points to “verified bot” / Web Bot Auth schemes.
  • Commenters note the irony that corporate bots get whitelisted while humans still solve CAPTCHAs, and that only large vendors’ bots qualify.

Use cases and value

  • Suggested high‑value uses: automating awful enterprise/Web UIs (HR, licensing, logistics, insurance, healthcare forms), periodic browser‑driven workflows, RPA self‑healing, and accessibility support.
  • Many argue a human+LLM loop that produces stable Playwright‑like scripts is more efficient than always running an LLM agent.
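
The "stable script plus self-healing" idea from the bullets above can be sketched as a replay loop that consults a model only when a recorded step fails. This is a minimal sketch under stated assumptions: `execute` and `repair` are hypothetical callbacks standing in for a browser driver and an LLM call, and the step format is invented for illustration.

```python
from typing import Callable, Optional

Step = dict  # hypothetical step format, e.g. {"action": "click", "selector": "#submit"}


def run_workflow(
    recorded: list,
    execute: Callable[[Step], bool],
    repair: Callable[[Step], Optional[Step]],
) -> tuple:
    """Replay recorded steps, calling the repair hook only when a step fails.

    Returns the (possibly updated) script and the number of repair calls made.
    """
    updated = []
    repair_calls = 0
    for step in recorded:
        if execute(step):
            # Fast path: the recorded step still works, no model involved.
            updated.append(step)
            continue
        # Slow path: ask the hypothetical LLM hook for a replacement step.
        repair_calls += 1
        fixed = repair(step)
        if fixed is None or not execute(fixed):
            raise RuntimeError(f"step could not be repaired: {step}")
        updated.append(fixed)
    return updated, repair_calls


# Usage with stand-in callbacks: the page renamed "#old" to "#new".
working_selectors = {"#new", "#name"}
script = [
    {"action": "click", "selector": "#old"},
    {"action": "type", "selector": "#name"},
]
new_script, calls = run_workflow(
    script,
    execute=lambda s: s["selector"] in working_selectors,
    repair=lambda s: {**s, "selector": "#new"},
)
print(calls, new_script[0]["selector"])
```

In this shape the model's cost and latency are paid once per breakage rather than once per click, which is the efficiency argument commenters make.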

UI vs APIs and architecture debate

  • One camp calls GUI‑driven AI a “mechanical horse” and wants APIs, structured data, and accessibility trees.
  • The opposing view: the real world is messy and adversarial, APIs are rare, and UIs are what’s actually tested and deployed; screenshot‑based vision is universal and often more robust to bad markup.

Governance, reliability, and broader concerns

  • Enterprise adoption is seen as contingent on strong hooks/callbacks and RBAC; skeptics note current agents sometimes ignore even “do not proceed” signals.
  • Gemini is criticized for poor tool-calling and “laziness” (prematurely declaring tasks done); commenters also cite Google’s broader track record (e.g., degraded voice assistant behavior).
  • Some see computer‑use agents as a key labor-impact benchmark and potential “vertical agent killers”; others worry about fraud, bot detection, and automated interactions that are indistinguishable from human ones.
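
The hooks and RBAC requirement raised in this section could take the form of a gate that sits outside the model, so that a "do not proceed" decision is enforced by code rather than left to the agent. The sketch below is illustrative only: `PolicyGate`, `dispatch`, the role names, and the action names are hypothetical and not part of any product mentioned in the discussion.

```python
from dataclasses import dataclass, field


class ActionBlocked(Exception):
    """Raised when the gate refuses an action proposed by the agent."""


@dataclass
class PolicyGate:
    allowed: dict  # role name -> set of permitted action names
    needs_confirmation: set = field(default_factory=set)

    def check(self, role: str, action: str, confirmed: bool = False) -> None:
        """Raise ActionBlocked unless the role may perform the action now."""
        if action not in self.allowed.get(role, set()):
            raise ActionBlocked(f"role {role!r} may not perform {action!r}")
        if action in self.needs_confirmation and not confirmed:
            raise ActionBlocked(f"{action!r} requires explicit human confirmation")


def dispatch(gate: PolicyGate, role: str, action: str, perform, confirmed: bool = False):
    """Run perform() only after the gate approves; the model cannot skip this."""
    gate.check(role, action, confirmed)
    return perform()


gate = PolicyGate(
    allowed={"agent": {"click", "submit_payment"}},
    needs_confirmation={"submit_payment"},
)
print(dispatch(gate, "agent", "click", lambda: "clicked"))
```

Because the check runs in the dispatcher, an agent that ignores a stop signal still cannot execute a blocked action; it can only propose it.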