Gemini 3
Rollout, Access & Tooling
- Early in the thread many saw “confidential” labels, hard rate limits, and “quota exceeded” errors even though Gemini 3 appeared in AI Studio, Vertex, and the API. Some reported it quietly serving in Canvas while still labeled “2.5” before the official flip.
- Gemini 3 Pro shows up as “Thinking” on gemini.google.com, with a low/high “thinking level” option; the preview model is also exposed via Vertex and the API (gemini-3-pro-preview), and via GitHub Copilot / Cursor (see the API sketch after this list).
- CLI access is gated by a waitlist; multiple people struggled to understand how Google One, the Gemini Pro/Ultra plans, Workspace, AI Studio “paid API keys,” and CLI entitlements tie together.
- Antigravity and AI Studio apps impressed some (browser control, app builder, 3D demos), but others hit server errors, missing features, and awkward Google Drive permission prompts.
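
For concreteness, a minimal sketch of calling the preview model through the google-genai Python SDK, assuming the UI’s low/high “thinking level” toggle maps to a `thinking_level` field on `ThinkingConfig` as the preview docs suggest (treat that field name as an assumption):

```python
# Minimal sketch: Gemini 3 Pro preview via the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Refactor this function and explain the change.",
    config=types.GenerateContentConfig(
        # Assumed API-side counterpart of the UI's low/high "thinking level" toggle.
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```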
Pricing & Product Positioning
- API prices rose ~60% for input and ~20% for output vs Gemini 2.5 Pro; long-context (>200k tokens) remains pricier. Some see this as acceptable if fewer prompts are needed; others worry about squeezed margins for app builders (see the worked cost comparison after this list).
- Grounded search pricing changed from per-prompt to per-search; the net effect is unclear and depends on how many searches a typical prompt triggers.
- Comparisons: still cheaper than Claude Sonnet 4.5; well below Claude Opus pricing. Several note Google’s strategy of bundling Gemini with Google One / Android to drive adoption.
- Marketing claims like “AI Overviews now have 2 billion users” drew skepticism, with people arguing “user == saw the box” rather than opted-in usage.
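
To make the percentages concrete, a back-of-envelope comparison using the published ≤200k-token list prices ($1.25/$10 per million input/output tokens for 2.5 Pro vs $2/$12 for the 3 Pro preview, which match the ~60%/~20% figures above); the workload numbers below are illustrative, not from the thread:

```python
# Back-of-envelope API cost comparison at <=200k-token list prices (USD per 1M tokens).
PRICES = {
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
    "gemini-3-pro-preview": {"input": 2.00, "output": 12.00},
}

def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative workload: 5M input + 1M output tokens per day.
for model in PRICES:
    print(f"{model}: ${daily_cost(model, 5_000_000, 1_000_000):.2f}/day")

# Input: 1.25 -> 2.00 (+60%); output: 10.00 -> 12.00 (+20%).
# Both models charge higher rates above 200k context, hence ">200k remains pricier".
```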
Benchmarks vs Reality
- Official charts show strong gains on ARC-AGI (1 & 2), NYT Connections, and other reasoning benchmarks, sometimes beating GPT‑5.1 and Claude Sonnet 4.5. Some suspect “benchmaxxing” or contamination of public eval sets.
- Multiple commenters emphasize private, task-specific benchmarks (coding, math, law, medicine, CAD). Experiences conflict: some see Gemini 3 as clear SOTA; others find older models or Claude/OpenAI still better for their niche.
Coding & Agentic Behavior
- For many, Gemini 3 Pro is a big step up from 2.5 in complex coding, refactors, math-heavy code, CAD (e.g., Blender/OpenSCAD scripts), and UI design; a few report one-shot fixes where other models failed.
- Others find it weaker than Claude Code or GPT‑5‑Codex for “agentic” workflows: poor instruction following, over-engineered or messy code, hallucinated imports, partial fixes, or ignoring “plan first” instructions. Gemini CLI itself is viewed as buggy and UX‑rough.
- Long-context coding remains mixed: some praise project‑scale reasoning; others say Gemini still misapplies edits and forgets constraints, similar to 2.5.
Multimodal, SVG & Audio
- The “pelican riding a bicycle” SVG test and many variant prompts (giraffe in a Ferrari, goblin animations, 3D scenes) show much better spatial understanding than previous models; people note genuine generalization, not just that one meme.
- Vision is still brittle: it miscounts legs on edited animals and misses extra fingers; commenters attribute this to perception and tokenization limits, and possibly guardrails around sensitive regions.
- Audio performance is polarized: some see huge improvements in meeting summaries with accurate speaker labeling; others get heavy hallucinations, wrong timestamps, and paraphrased “transcripts” on long podcasts.
Privacy, Data & Trust
- A leaked/archived model card line about using “user data” from Google products for training triggered fears about Gmail/Drive being in the training set; others point to ToS/privacy carve‑outs and doubt bulk Gmail training, but trust is low.
- Broader unease persists about surveillance capitalism, ad‑driven incentives, and AI Overviews cannibalizing the open web’s incentive to create content.
Ecosystem, Competition & Impact
- Many see Google “waking up” and possibly retaking the lead from OpenAI/Anthropic on reasoning while leveraging its distribution (Search, Android, Workspace). Others warn that product quality, not just raw models, will decide winners.
- There’s noticeable AI fatigue: people rely on their own tasks as the “real benchmark” and are skeptical of hype. Some worry about job erosion and over‑reliance on LLMs; others see this as just another productivity tool wave akin to IDEs or outsourcing.