Gemini 2.0: our new AI model for the agentic era

Model capabilities & demos

  • Gemini 2.0 Flash adds native multimodality (text, images, audio; video via Multimodal Live) with low latency.
  • Native image and audio output are delayed until early next year; for now, image generation routes through Imagen 3.
  • The Multimodal Live API and AI Studio “Live” UI impress many: real‑time voice plus camera/screen sharing can identify objects, read text, critique physical movement, and tutor in tools like Blender.
  • Code execution inside the model sandbox works for local Python, but it has no outbound network access and runs into missing-package issues.

Benchmarks, quality & comparisons

  • Google claims Gemini 2.0 Flash beats 1.5 Pro on most benchmarks; some users see Flash ≈ old Pro, others say experimental 1206 is stronger.
  • On community leaderboards (e.g., LM Arena), Gemini 2.0 Flash ranks near GPT‑4o and other top models, but many distrust benchmarks as over‑optimized.
  • One hallucination benchmark shows a very low hallucination rate for 2.0 Flash, but several hands‑on reports still see confident errors and verbose “reasoning” that can mislead.
  • Mixed anecdotes: some say coding, Advent of Code, and vision tasks are now competitive with GPT‑4o / Claude; others find GPT‑4o or o1 clearly superior for reasoning and hard debugging.

Search integration & hallucinations

  • Strong disagreement about Gemini in Search: some find it increasingly useful; others report frequent factual errors (locations, chemistry definitions, counts of islands, corporate facts) presented as authoritative.
  • A few note that some failures are likely inherited from underlying web search, not just the model.

Pricing, access & quotas

  • Gemini Advanced subscription is ~£18/month.
  • API usage for Flash 2.0 is currently free in preview, capped at 10 requests/minute and ~1,500 requests/day; developers complain this is too low for “agentic” workloads.
  • Multimodal Live API is free during preview; many hope production pricing will undercut OpenAI’s relatively expensive audio I/O.
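A common workaround for tight preview quotas is a client-side throttle. A minimal sliding-window sketch (a hypothetical helper, not part of any Google SDK), assuming the 10 requests/minute figure reported above:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `max_calls` per `per_seconds`."""

    def __init__(self, max_calls: int, per_seconds: float):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.calls = deque()  # monotonic timestamps of recent calls

    def delay(self, now: float) -> float:
        """Seconds to wait before the next call fits in the window."""
        # Evict timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.per_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.per_seconds - (now - self.calls[0])

    def acquire(self) -> None:
        """Sleep if needed, then record the call."""
        wait = self.delay(time.monotonic())
        if wait > 0:
            time.sleep(wait)
        self.calls.append(time.monotonic())

# Usage: call limiter.acquire() before each API request.
limiter = RateLimiter(max_calls=10, per_seconds=60.0)
```

This keeps a burst of up to 10 calls available while guaranteeing the per-minute average stays under quota; the daily cap would need a second limiter with a 24-hour window.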

On‑device vs cloud, hardware & economics

  • Long debate on whether training or inference is the real moat:
    • One side: training compute (TPUs, data) is the scarce asset; inference is a commodity many hardware vendors can provide.
    • Other side: at scale, inference costs dominate; without cheap inference or good on‑device performance, economics and adoption suffer.
  • Discussion about whether on‑device models (Apple, Android Tensor chips) will become “good enough” to erode demand for paid cloud services.
  • Several argue Google doesn’t need to “win” on‑device if cloud inference remains cheap and fast; others think Apple’s eventual strong on‑device AI will force Android to respond.

“Agentic” models & terminology

  • “Agentic” is widely mocked as vague marketing jargon; people prefer plain terms like “autonomous” or “tool‑using.”
  • Some insist most “agents” are just LLMs plus tools and static workflows; complex multi‑agent handoff systems often underperform a single strong model with tools and long context.
  • Others see real promise in browser‑control projects (e.g., Project Mariner) and live multimodal agents, but think the term is over‑applied.
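The “LLMs plus tools” view above reduces to a single loop around a model call and a tool registry. A toy sketch with a stubbed model (`fake_model` is a hypothetical stand-in for a real LLM API call, not any vendor’s interface):

```python
# One loop, one tool registry, no multi-agent choreography.
TOOLS = {
    "add": lambda a, b: a + b,
}

def fake_model(messages):
    """Stub: a real agent would send `messages` to an LLM API.
    Here it requests a tool call once, then gives a final answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {tool_results[-1]['content']}"}

def run_agent(user_msg: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # Execute the requested tool and feed the result back.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "gave up"

print(run_agent("What is 2 + 3?"))  # -> The sum is 5
```

The point of the skeptics’ argument is that this loop, with a strong model and long context, often beats elaborate multi-agent handoff schemes.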

Trust, product longevity & ecosystem

  • Persistent concern about Google’s habit of killing or deprecating products and APIs (Reader, messaging apps, Stadia, GCP deprecations).
  • Some organizations explicitly avoid Google for core infra, preferring AWS/Anthropic due to perceived stability and clearer long‑term support.
  • Fears that violations of vaguely defined AI terms of service could trigger bans affecting entire Google accounts (Gmail, Docs, Photos), with little recourse.
  • Counterpoint: core products like Search, Gmail, and Workspace are long‑lived and widely relied upon; AI is seen as strategically central and unlikely to be abandoned.

Developer tooling & practical use

  • The new Python SDK (googleapis/python‑genai) is praised as more modern; it supports structured outputs via schemas (including Pydantic models).
  • Developers like Gemini’s large context windows for RAG and for dumping big docs into the prompt; they also note good speed compared with GPT‑4o’s “dog slow” feel.
  • Some find Gemini’s web UI weaker than its raw API, which can integrate well into tools (VS Code via Cline, CLI tools like llm, custom MCP/agent setups).
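A hedged sketch of the structured-output pattern mentioned above: the typed schema and JSON validation below run offline, while the actual google‑genai call is shown only in comments, with method and parameter names taken from the SDK’s README (assumptions — verify against current docs):

```python
from dataclasses import dataclass
import json

@dataclass
class Recipe:
    name: str
    minutes: int

def parse_recipes(raw_json: str) -> list:
    """Validate a JSON reply into typed records."""
    return [Recipe(**item) for item in json.loads(raw_json)]

# With the SDK (requires `pip install google-genai` and an API key);
# the call shape here is an assumption based on the project README:
#
#   from google import genai
#   client = genai.Client(api_key="...")
#   resp = client.models.generate_content(
#       model="gemini-2.0-flash-exp",
#       contents="List two quick dinner recipes as JSON.",
#       config={"response_mime_type": "application/json",
#               "response_schema": list[Recipe]},
#   )
#   recipes = parse_recipes(resp.text)

# Offline demonstration with a canned reply:
canned = '[{"name": "stir fry", "minutes": 15}]'
print(parse_recipes(canned)[0].name)  # -> stir fry
```

Keeping a local validation step like `parse_recipes` is useful regardless of SDK: it turns a schema violation into an immediate, typed error instead of a silent downstream bug.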