Computer use in Gemini 3.5 Flash

Ecosystem, MCP, and Missing Integrations

  • Many commenters see the lack of MCP (Model Context Protocol) and custom tool support in Gemini’s official apps as a major gap.
  • Users who rely on MCP gravitate to third‑party CLIs or their own frontends. This reduces the value of Gemini’s native apps, so Gemini ends up being judged purely as a model/API against cheaper or better‑fitting alternatives.
  • Some note Google’s fragmented products (Gemini app, CLI, Antigravity) and incompatible subscriptions as a serious usability and trust issue.

Computer Use: Promise vs Problems

  • Critics call screenshot‑driven “computer use” slow, insecure, brittle, expensive, and a token‑wasting hack compared to proper APIs or accessibility layers.
  • Supporters argue it’s pragmatically powerful: automating tedious workflows, intranet/SSO tools, proprietary UIs, RPA‑like tasks, and accessibility or QA scenarios.
  • There’s debate over which approach is best:
    • Reverse‑engineering APIs / DOMs,
    • Leveraging accessibility trees, or
    • Letting agents drive full desktops/VMs in sandboxes.
  • Concerns include credential safety, ToS violations, and the need for sandboxes or VMs before trusting “computer use” with real systems.
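
To make the accessibility-tree option in the debate above more concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified tree of nested dicts (keys `role`, `name`, `children`); real accessibility APIs differ by platform, and nothing here reflects how Gemini's computer use actually works. It only illustrates why a text listing of actionable elements can be much more compact than a screenshot.

```python
# Hypothetical, simplified accessibility tree: nested dicts with
# "role", "name", and optional "children". Real platform APIs differ.
ACTIONABLE_ROLES = {"button", "link", "textbox", "checkbox"}


def list_actionable(node, path=()):
    """Yield (path, role, name) for every actionable element, depth-first."""
    role = node.get("role", "")
    name = node.get("name", "")
    here = path + (role,)
    if role in ACTIONABLE_ROLES:
        yield "/".join(here), role, name
    for child in node.get("children", []):
        yield from list_actionable(child, here)


def describe(tree):
    """Render actionable elements as numbered lines a model could refer to."""
    return "\n".join(
        f"[{i}] {role} '{name}' at {where}"
        for i, (where, role, name) in enumerate(list_actionable(tree))
    )


# Made-up example data for illustration only.
example_tree = {
    "role": "window", "name": "Login",
    "children": [
        {"role": "textbox", "name": "Username"},
        {"role": "group", "name": "", "children": [
            {"role": "button", "name": "Sign in"},
        ]},
        {"role": "image", "name": "Logo"},
    ],
}

print(describe(example_tree))
```

An agent could then answer with an element index instead of pixel coordinates. The trade-off raised by supporters of screenshots still applies: this only works where the UI exposes a usable tree.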

UX, Apps, and “Agentic” Interfaces

  • Gemini’s official apps are widely described as weak: poor instruction following, session loss, small context windows, and inconsistent behavior vs API.
  • Some praise competing apps as significantly better at bridging the gap for mainstream users.
  • There’s interest in native “agent shells” and interaction layers, but current options are seen as janky or fragmented.

Model Quality, Benchmarks & Positioning

  • Discussion notes Google’s own chart showing Gemini 3.5 Flash trailing frontier models on an OSWorld benchmark, though it is close on some scores and much cheaper.
  • Some think 3.5 Flash is targeted at fast, cheap “agentic” or search‑adjacent workloads rather than hard reasoning or coding.
  • Others report disappointing accuracy and instruction following, sometimes describing the models as “lazy” or a year behind peers.

Guardrails, Refusals, and Regional Variation

  • Several users encounter seemingly over‑aggressive refusals on benign topics (SIM transfers, backups, even cooking eggs).
  • Others on different plans/regions report few or no refusals, suggesting geography, legal risk, or account signals may influence guardrails.
  • Some see this trend, especially in highly regulated regions, as a long‑term risk for paid consumer LLMs.

PDFs, OCR, and Data Extraction

  • Experiences with Gemini on PDFs and tables are highly mixed, ranging from flawless table‑to‑CSV extraction to repeated failures and the model explicitly “giving up.”
  • Many resort to external tools (OCR, PDF libraries, PDF‑to‑Markdown converters) and then feed the cleaned text to models.
  • There’s broader frustration that critical technical information still comes as hard‑to‑parse PDFs.
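
The "extract first, then feed cleaned text to the model" workflow described above can be sketched roughly as follows. This is a minimal Python illustration, not anyone's actual pipeline: it assumes an external tool has already turned the PDF into plain text and that table columns are separated by two or more spaces, which is common but far from guaranteed. The function name `table_text_to_csv` and the sample data are made up for this example.

```python
import csv
import io
import re


def table_text_to_csv(text):
    """Convert whitespace-aligned table text into CSV.

    Assumption: columns are separated by two or more spaces, while
    single spaces may appear inside a cell.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        writer.writerow(re.split(r"\s{2,}", line))
    return out.getvalue()


# Made-up sample of text as it might come out of a PDF extraction tool.
extracted = """
Part      Voltage (V)   Max current (mA)
A100      3.3           250
B200, rev 2   5.0       1,000
"""

print(table_text_to_csv(extracted))
```

Doing the structural cleanup deterministically leaves the model with a smaller, less ambiguous input, which is the main reason commenters give for preprocessing.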

Coding, Agents, and Safety

  • Users want a clear Gemini equivalent to coding agents that can clone repos, perform static analysis, and open PRs; current offerings via Antigravity/CLI are seen as immature or unreliable.
  • Some report dangerous actions when using agentic tools (e.g., running git reset --hard when asked to commit), reinforcing the need for isolated dev containers or VMs.
  • Overall sentiment: Gemini 3.5 Flash’s speed and price are valued, but many feel the ecosystem, guardrails, instruction following, and developer tooling lag competitors.
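
As an illustration of the kind of guard the `git reset --hard` reports point toward, here is a minimal Python sketch of a pre-execution check for agent-issued shell commands. The denylist, the function names, and the "needs confirmation" policy are assumptions made for this example, not part of any Gemini tool. The check is exact-match only and easy to bypass (for instance via combined flags or `git -C`), so it would complement an isolated dev container or VM, not replace one.

```python
import shlex

# Illustrative denylist of (program, subcommand, flag) patterns; not exhaustive.
DESTRUCTIVE = [
    ("git", "reset", "--hard"),
    ("git", "clean", "-f"),
    ("git", "push", "--force"),
]


def is_destructive(command):
    """Return True if the command matches a denylisted pattern exactly."""
    tokens = shlex.split(command)
    if len(tokens) < 2:
        return False
    for program, subcommand, flag in DESTRUCTIVE:
        if tokens[0] == program and tokens[1] == subcommand and flag in tokens[2:]:
            return True
    return False


def review(command):
    """Decide whether an agent's command may run without a human check."""
    return "needs confirmation" if is_destructive(command) else "allowed"


print(review("git reset --hard HEAD~1"))
print(review("git commit -m 'fix typo'"))
```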