2026-06-24

Computer use in Gemini 3.5 Flash

Ecosystem, MCP, and Missing Integrations

Many commenters see the lack of MCP and custom tool support in Gemini’s official apps as a major gap.
Users who rely on MCP gravitate to third‑party CLIs or their own frontends. This reduces the value of Gemini’s native apps and shifts the comparison to the raw model and API against cheaper or better‑fitting alternatives.
Some cite Google’s fragmented products (Gemini app, CLI, Antigravity) and incompatible subscriptions as a serious usability and trust issue.

Computer Use: Promise vs Problems

Critics call screenshot‑driven “computer use” slow, insecure, brittle, and expensive: a token‑wasting hack compared to proper APIs or accessibility layers.
Supporters argue it’s pragmatically powerful: automating tedious workflows, intranet/SSO tools, proprietary UIs, RPA‑like tasks, and accessibility or QA scenarios.
There’s debate over which approach is better:
- Reverse‑engineering APIs / DOMs,
- Leveraging accessibility trees, or
- Letting agents drive full desktops/VMs in sandboxes.
Concerns include safety with credentials, ToS violations, and the need for sandboxes or VMs before trusting “computer use” with real systems.
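
To make the accessibility‑tree option in the list above concrete, here is a minimal Python sketch. The `Node` class and the login window are hypothetical stand‑ins, not a real platform accessibility API; the point is only that a UI exposed as structured roles and names can be serialized as a few lines of text and searched directly, rather than inferred from pixels.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, simplified accessibility-tree node (not a real platform API)."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in document order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def to_outline(root: Node) -> str:
    """Serialize the tree as an indented text outline an agent could read."""
    return "\n".join(
        f"{'  ' * depth}{node.role}: {node.name}".rstrip()
        for depth, node in walk(root)
    )


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching role and name, or None."""
    for _, node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


# Illustrative UI only: a login window with two fields and a button.
login = Node("window", "Login", [
    Node("textbox", "Username"),
    Node("textbox", "Password"),
    Node("button", "Sign in"),
])
print(to_outline(login))
```

An agent working from this outline can target `find(login, "button", "Sign in")` by role and name. Whether a given application exposes a usable tree at all is exactly what the thread disputes.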

UX, Apps, and “Agentic” Interfaces

Gemini’s official apps are widely described as weak: poor instruction following, session loss, small context windows, and behavior inconsistent with the API.
Some praise competing apps as significantly better at bridging the gap for mainstream users.
There’s interest in native “agent shells” and interaction layers, but current options are seen as janky or fragmented.

Model Quality, Benchmarks & Positioning

The discussion notes that Google’s own chart shows Gemini 3.5 Flash trailing frontier models on the OSWorld benchmark, though it is close on some scores and much cheaper.
Some think 3.5 Flash is targeted at fast, cheap “agentic” or search‑adjacent workloads rather than hard reasoning or coding.
Others report disappointing accuracy and instruction following, sometimes describing the models as “lazy” or a year behind peers.

Guardrails, Refusals, and Regional Variation

Several users encounter seemingly over‑aggressive refusals on benign topics (SIM transfers, backups, even cooking eggs).
Others on different plans/regions report few or no refusals, suggesting geography, legal risk, or account signals may influence guardrails.
Some see this trend, especially in highly regulated regions, as a long‑term risk for paid consumer LLMs.

PDFs, OCR, and Data Extraction

Experiences with Gemini on PDFs and tables are highly mixed, ranging from flawless table‑to‑CSV extraction to repeated failures and the model explicitly “giving up.”
Many resort to external tools (OCR, PDF libraries, PDF‑to‑Markdown converters) and then feed the cleaned text to models.
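
As a rough illustration of that preprocessing step, the sketch below turns already‑extracted table text into CSV before it is handed to a model. It assumes an external tool has produced plain text in which columns are separated by two or more spaces; that layout, the function name `text_table_to_csv`, and the sample data are assumptions for illustration, not something reported in the thread.

```python
import csv
import io
import re


def text_table_to_csv(text: str) -> str:
    """Convert whitespace-aligned table text into CSV.

    Assumes columns are separated by two or more spaces, which is common
    but not guaranteed in text extracted from PDFs.
    """
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in text.splitlines()
        if line.strip()
    ]
    if not rows:
        return ""
    width = len(rows[0])
    ragged = [row for row in rows if len(row) != width]
    if ragged:
        # Fail loudly instead of guessing at misaligned cells.
        raise ValueError(f"{len(ragged)} row(s) do not have {width} columns")
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for text extracted from a datasheet PDF.
sample = """
Part      Voltage (V)  Current (mA)
R1        3.3          12
U2        5.0          140
"""
print(text_table_to_csv(sample))
```

Raising on ragged rows is a deliberate choice here: a misaligned table is better surfaced to a human than passed silently to the model.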
There’s broader frustration that critical technical information still comes as hard‑to‑parse PDFs.

Coding, Agents, and Safety

Users want a clear Gemini equivalent to coding agents that can clone repos, perform static analysis, and open PRs; current offerings via Antigravity/CLI are seen as immature or unreliable.
Some report dangerous actions when using agentic tools (e.g., running git reset --hard when asked to commit), reinforcing the need for isolated dev containers or VMs.
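
One complementary safeguard is to screen agent‑proposed shell commands before they run. The following is a minimal sketch under stated assumptions: the denylist, the function names, and the "needs-confirmation" policy are all hypothetical, and a check like this is easy to bypass (for example with `git -C <path> ...` or combined flags such as `-fd`), so it supplements isolation in a container or VM rather than replacing it.

```python
import shlex

# Illustrative denylist only; a real policy would need to be far more complete.
DESTRUCTIVE_GIT = {
    ("reset", "--hard"),
    ("clean", "-f"),
    ("push", "--force"),
}


def is_destructive(command: str) -> bool:
    """Return True if the command looks like a destructive git invocation."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "git":
        return False
    subcommand, flags = tokens[1], set(tokens[2:])
    return any(
        subcommand == sub and flag in flags
        for sub, flag in DESTRUCTIVE_GIT
    )


def review(command: str) -> str:
    """Decide whether an agent-proposed command needs human confirmation."""
    return "needs-confirmation" if is_destructive(command) else "allow"


for proposed in ["git commit -m 'fix typo'", "git reset --hard"]:
    print(proposed, "->", review(proposed))
```

Under this sketch, the `git reset --hard` case described above would be held for confirmation instead of executed, while an ordinary commit would pass through.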
Overall sentiment: Gemini 3.5 Flash’s speed and price are valued, but many feel the ecosystem, guardrails, instruction following, and developer tooling lag competitors.
