Gemini 2.5 Computer Use model
Integration and tooling
- The model isn’t a drop‑in replacement: it requires Google’s predefined computer_use tool, which confused people trying it inside existing agents or AI Studio.
- Custom tools can clash with the built‑ins, so they must be excluded or carefully configured.
- Some compare this approach to using MCP-based browser tools or Playwright/Puppeteer; many find it simpler to have an LLM generate scripts than to run an LLM in the control loop for every click.
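The cost difference between the two integration styles can be sketched as follows. This is a minimal illustration, not any vendor's API: `call_model` is a stub standing in for an LLM call, and the function names are assumptions made for the example.

```python
# Contrast: LLM in the control loop (one model call per click) vs.
# LLM generating a reusable script once. All names here are illustrative stubs.

def call_model(prompt: str) -> str:
    """Stub LLM call; in the per-click style this runs on every screenshot."""
    return "click(120, 340)"

def agent_in_the_loop(steps: int) -> int:
    """LLM in the control loop: one model call per UI action."""
    calls = 0
    for _ in range(steps):
        call_model("screenshot + goal")  # decide the next single action
        calls += 1
    return calls

def generate_script_once() -> int:
    """LLM writes a Playwright-style script once; replays need no model calls."""
    call_model("goal -> full script")
    return 1
```

Under this framing, a 50-step workflow costs 50 model calls (each with screenshot latency) in the loop style, but one up-front call in the script-generation style, at least until the UI changes.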
Browser automation performance
- The Browserbase demo impresses some: it can log in, browse, solve tasks like “not a robot” mini‑games, and even play Wordle in some runs.
- Others report it getting stuck (e.g., HN demo, job application tabs, Google Sheets editing, Wordle color feedback) and frequently misclicking due to pure vision+x/y control.
- Latency is described as “painfully slow”; acceptable for background RPA‑style tasks, but a non‑starter for fast E2E test suites.
CAPTCHAs, bot auth, and ethics
- Initial claims that Gemini “solved” Google reCAPTCHA were corrected: Browserbase handles it, likely via specialized infrastructure.
- Browserbase emphasizes that it doesn’t use click farms and points to “verified bot” / Web Bot Auth schemes.
- Commenters note the irony that corporate bots get whitelisted while humans still solve CAPTCHAs, and that only large vendors’ bots qualify.
Use cases and value
- Suggested high‑value uses: automating awful enterprise/Web UIs (HR, licensing, logistics, insurance, healthcare forms), periodic browser‑driven workflows, RPA self‑healing, and accessibility support.
- Many argue a human+LLM loop that produces stable Playwright‑like scripts is more efficient than always running an LLM agent.
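The RPA self-healing idea above can be sketched as a script-first fallback: run the cheap deterministic path, and only invoke an LLM agent when it breaks. This is a toy sketch under assumed names; `llm_recover` stands in for a real vision agent call.

```python
# Self-healing pattern: deterministic script first, LLM fallback on breakage.
# All functions are illustrative stubs, not a real automation API.

def run_script(page: dict) -> str:
    """Scripted path: fails fast if the expected selector is missing."""
    if "#submit" not in page["selectors"]:
        raise KeyError("#submit")
    return "submitted"

def llm_recover(page: dict, error: str) -> str:
    """Stub for an LLM agent that locates the control visually instead."""
    return "submitted-via-agent"

def self_healing_submit(page: dict) -> str:
    try:
        return run_script(page)           # cheap, deterministic, fast
    except KeyError as e:
        return llm_recover(page, str(e))  # slow agent only when needed
```

The design point matches the thread's argument: the expensive model runs only on the rare breakage path, not on every click.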
UI vs APIs and architecture debate
- One camp calls GUI‑driven AI a “mechanical horse” and wants APIs, structured data, and accessibility trees.
- The opposing view: the real world is messy and adversarial, APIs are rare, and UIs are what’s actually tested and deployed; screenshot‑based vision is universal and often more robust to bad markup.
Governance, reliability, and broader concerns
- Enterprise adoption is seen as contingent on strong hooks/callbacks and RBAC; skeptics note current agents sometimes ignore even “do not proceed” signals.
- Gemini is criticized for poor tool-calling, “laziness” (prematurely declaring tasks done), and Google’s broader track record (e.g., degraded voice assistant behavior).
- Some see computer‑use agents as a key labor-impact benchmark and potential “vertical agent killers”; others worry about fraud, bot detection, and indistinguishable automated interactions with humans.
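The hooks/RBAC concern above amounts to enforcing approval outside the model, so a “do not proceed” signal cannot be ignored. A minimal sketch, with an assumed toy policy and illustrative names:

```python
# Approval-hook pattern: every proposed action passes through a callback
# that can veto it; the veto is enforced by the harness, not the model.

from typing import Callable, Iterable

def run_agent(actions: Iterable[str], approve: Callable[[str], bool]):
    executed, blocked = [], []
    for action in actions:
        if approve(action):
            executed.append(action)  # would dispatch the click/keystroke here
        else:
            blocked.append(action)   # hard stop, regardless of model intent
    return executed, blocked

def rbac_approve(action: str) -> bool:
    # Toy policy: deny anything touching payment or deletion flows.
    return not any(word in action for word in ("pay", "delete"))

executed, blocked = run_agent(
    ["open_form", "fill_name", "pay_invoice", "submit"], rbac_approve
)
```

Because the check sits in the control loop rather than the prompt, a “lazy” or non-compliant model can propose a blocked action but cannot execute it.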