Gemini 2.0: our new AI model for the agentic era
Model capabilities & demos
- Gemini 2.0 Flash adds native multimodality (text, images, audio; video via Multimodal Live) with low latency.
- Native image and audio output aren't available yet (slated for early next year); current image generation routes through Imagen 3.
- The Multimodal Live API and AI Studio “Live” UI impress many: real‑time voice plus camera/screen sharing can identify objects, read text, critique physical movement, and tutor in tools like Blender.
- Code execution inside the model sandbox works for local Python but has no outbound network access and runs into missing package issues.
Benchmarks, quality & comparisons
- Google claims Gemini 2.0 Flash beats 1.5 Pro on most benchmarks; some users see Flash ≈ old Pro, others say experimental 1206 is stronger.
- On community leaderboards (e.g., LM Arena), Gemini 2.0 Flash ranks near GPT‑4o and other top models, but many distrust benchmarks as over‑optimized.
- One hallucination benchmark shows a very low hallucination rate for 2.0 Flash, but several hands‑on reports still see confident errors and verbose “reasoning” that can mislead.
- Mixed anecdotes: some say coding, Advent of Code, and vision tasks are now competitive with GPT‑4o / Claude; others find GPT‑4o or o1 clearly superior for reasoning and hard debugging.
Search integration & hallucinations
- Strong disagreement about Gemini in Search: some find it increasingly useful; others report frequent factual errors (locations, chemistry definitions, counts of islands, corporate facts) presented as authoritative.
- A few note that some failures are likely inherited from underlying web search, not just the model.
Pricing, access & quotas
- Gemini Advanced subscription is ~£18/month.
- API usage for Flash 2.0 is currently free in preview with 10 RPM limits and ~1,500 requests/day; developers complain this is too low for “agentic” workloads.
- Multimodal Live API is free during preview; many hope production pricing will undercut OpenAI’s relatively expensive audio I/O.
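Given the preview quotas above (10 RPM, ~1,500 requests/day, per commenters rather than any official guarantee), agentic workloads need client-side throttling. A minimal sliding-window limiter sketch, with an injectable clock so it can be tested without waiting:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window request limiter. Defaults match the 10 RPM
    preview figure quoted in the thread; not an official quota."""

    def __init__(self, max_calls: int = 10, window_s: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock          # injectable for testing
        self.calls = deque()        # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self.calls[0])

    def record(self) -> None:
        """Mark one request as sent."""
        self.calls.append(self.clock())
```

A daily-cap counter would sit alongside this; the per-minute window is the binding constraint for bursty agent loops.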
On‑device vs cloud, hardware & economics
- Long debate on whether training or inference is the real moat:
  - One side: training compute (TPUs, data) is the scarce asset; inference is a commodity many hardware vendors can provide.
  - Other side: at scale, inference costs dominate; without cheap inference or good on‑device performance, economics and adoption suffer.
- Discussion about whether on‑device models (Apple, Android Tensor chips) will become “good enough” to erode demand for paid cloud services.
- Several argue Google doesn’t need to “win” on‑device if cloud inference remains cheap and fast; others think Apple’s eventual strong on‑device AI will force Android to respond.
“Agentic” models & terminology
- “Agentic” is widely mocked as vague marketing jargon; people prefer plain terms like “autonomous” or “tool‑using.”
- Some insist most “agents” are just LLMs plus tools and static workflows; complex multi‑agent handoff systems often underperform a single strong model with tools and long context.
- Others see real promise in browser‑control projects (e.g., Project Mariner) and live multimodal agents, but think the term is over‑applied.
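The "just an LLM plus tools" claim above reduces to a single dispatch loop. A minimal sketch with a stubbed model standing in for a real API call; the tool names and the JSON tool-call protocol here are invented for illustration:

```python
import json
from typing import Callable, Optional

# Illustrative tool registry; names and signatures are made up.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
    "upper": lambda s: s.upper(),
}

def stub_model(prompt: str, observation: Optional[str]) -> str:
    """Stand-in for a real model call. Emits either a JSON tool call
    or a plain-text final answer, mimicking a tool-use protocol."""
    if observation is None:
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return f"The answer is {observation}"

def run_agent(prompt: str, max_steps: int = 5) -> str:
    """One model, one loop: call model, run any requested tool,
    feed the result back, stop at a plain-text reply."""
    observation = None
    for _ in range(max_steps):
        reply = stub_model(prompt, observation)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # not a tool call: final answer
        observation = TOOLS[call["tool"]](**call["args"])
    return observation or ""
```

The multi-agent systems criticized in the thread add handoffs between several such loops; the counter-argument is that one strong model with a long context makes those handoffs unnecessary.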
Trust, product longevity & ecosystem
- Persistent concern about Google’s habit of killing or deprecating products and APIs (Reader, messaging apps, Stadia, GCP deprecations).
- Some organizations explicitly avoid Google for core infra, preferring AWS/Anthropic due to perceived stability and clearer long‑term support.
- Fears that violations of vaguely defined AI terms of service could trigger bans affecting entire Google accounts (Gmail, Docs, Photos), with little recourse.
- Counterpoint: core products like Search, Gmail, and Workspace are long‑lived and widely relied upon; AI is seen as strategically central and unlikely to be abandoned.
Developer tooling & practical use
- New Python SDK (googleapis/python‑genai) is praised as more modern; supports structured outputs via schemas (including Pydantic).
- Developers like Gemini’s large context windows for RAG and dumping big docs; also note good speed vs GPT‑4o’s “dog slow” feel.
- Some find Gemini’s web UI weaker than its raw API, which can integrate well into tools (VS Code via Cline, CLI tools like `llm`, custom MCP/agent setups).
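What schema-constrained output buys you can be sketched locally, without an API call. The `Recipe` schema and `parse_structured` helper below are invented for illustration; with the actual SDK you would pass a Pydantic model in the request config rather than validating by hand:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Recipe:
    # Hypothetical schema, standing in for a Pydantic model
    # supplied to the SDK as the response schema.
    name: str
    minutes: int

def parse_structured(raw: str) -> Recipe:
    """Validate a JSON model reply against the dataclass schema,
    rejecting any extra or missing keys instead of guessing."""
    data = json.loads(raw)
    allowed = {f.name for f in fields(Recipe)}
    if set(data) != allowed:
        raise ValueError(f"schema mismatch: {set(data) ^ allowed}")
    return Recipe(**data)
```

The point of server-side structured output is that the model is constrained to emit schema-conformant JSON in the first place, so this kind of defensive parsing becomes a fallback rather than the main path.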