Gemini 3

Rollout, Access & Tooling

  • Early in the thread, many saw “confidential” labels, hard rate limits, and “quota exceeded” errors even though Gemini 3 appeared in AI Studio, Vertex AI, and the API. Some reported it quietly serving in Canvas while still labeled “2.5” before the official flip.
  • Gemini 3 Pro shows up as “Thinking” on gemini.google.com with a low/high “thinking level” option; the preview model is also exposed via Vertex AI and the API (gemini-3-pro-preview), and through GitHub Copilot / Cursor (see the API sketch after this list).
  • CLI access is gated by a waitlist; multiple people struggled to understand how the consumer Gemini plans (Pro/Ultra via Google One), Workspace, AI Studio “paid API keys,” and CLI entitlements tie together.
  • Antigravity and AI Studio apps impressed some (browser control, app builder, 3D demos) but others hit server errors, missing features, and awkward Google Drive permission prompts.
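
  For reference, a minimal sketch of calling the preview model through the API is shown below. It assumes the google-genai Python SDK and that the low/high “thinking level” maps to a thinking_config field on the request; both are assumptions worth verifying against current docs, and the model name is the gemini-3-pro-preview identifier mentioned above.

    # Minimal sketch using the google-genai Python SDK (pip install google-genai).
    # Assumes GEMINI_API_KEY is set in the environment and that the low/high
    # "thinking level" is exposed as ThinkingConfig(thinking_level=...) --
    # verify against the current SDK documentation.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    response = client.models.generate_content(
        model="gemini-3-pro-preview",
        contents="Summarize the trade-offs between low and high thinking levels.",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_level="low"),
        ),
    )
    print(response.text)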

Pricing & Product Positioning

  • API prices rose ~60% for input and ~20% for output versus Gemini 2.5 Pro, and long-context requests (>200k tokens) remain pricier. Some see this as acceptable if fewer prompts are needed; others worry about squeezed margins for app builders (see the cost sketch after this list).
  • Grounded search pricing changed from per-prompt to per-search; unclear net effect.
  • Comparisons: still cheaper than Claude Sonnet 4.5; well below Claude Opus pricing. Several note Google’s strategy of bundling Gemini with Google One / Android to drive adoption.
  • Marketing claims like “AI Overviews now have 2 billion users” drew skepticism, with people arguing “user == saw the box” rather than opted-in usage.
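
  To make the price delta concrete, a back-of-the-envelope comparison is sketched below. The per-million-token rates are assumptions chosen only to be consistent with the ~60% input / ~20% output increases noted above (roughly $1.25/$10 for 2.5 Pro and $2/$12 for Gemini 3 Pro at ≤200k context); substitute current list prices before drawing conclusions.

    # Back-of-the-envelope cost comparison. The per-million-token rates below
    # are ASSUMED, chosen only to match the ~60% input / ~20% output increases
    # described above; replace them with current list prices.
    PRICES = {
        "gemini-2.5-pro": {"input": 1.25, "output": 10.00},  # $/M tokens (assumed)
        "gemini-3-pro":   {"input": 2.00, "output": 12.00},  # $/M tokens (assumed)
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of a single request at the assumed rates."""
        p = PRICES[model]
        return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

    # Example: a 50k-token prompt with a 4k-token reply.
    for model in PRICES:
        print(model, round(request_cost(model, 50_000, 4_000), 4))
    # gemini-2.5-pro -> ~$0.1025, gemini-3-pro -> ~$0.148 (~44% more for this mix)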

Benchmarks vs Reality

  • Official charts show strong gains on ARC-AGI (1 & 2), NYT Connections, and other reasoning benchmarks, sometimes beating GPT‑5.1 and Claude Sonnet 4.5. Some suspect “benchmaxxing” or contamination of public eval sets.
  • Multiple commenters emphasize private, task-specific benchmarks (coding, math, law, medicine, CAD). Experiences conflict: some see Gemini 3 as clear SOTA; others find older models or Claude/OpenAI still better for their niche.

Coding & Agentic Behavior

  • For many, Gemini 3 Pro is a big step up from 2.5 in complex coding, refactors, math-heavy code, CAD scripting (e.g., Blender/OpenSCAD), and UI design; a few report one-shot fixes where other models had failed.
  • Others find it weaker than Claude Code or GPT‑5‑Codex for “agentic” workflows: poor instruction following, over-engineered or messy code, hallucinated imports, partial fixes, or ignoring “plan first” instructions. Gemini CLI itself is viewed as buggy and UX‑rough.
  • Long-context coding remains mixed: some praise project‑scale reasoning; others say Gemini still misapplies edits and forgets constraints, similar to 2.5.

Multimodal, SVG & Audio

  • The “pelican riding a bicycle” SVG test and many variant prompts (giraffe in a Ferrari, goblin animations, 3D scenes) show much better spatial understanding than previous models; people note genuine generalization rather than memorization of that one meme prompt.
  • Vision is still brittle: it miscounts legs on edited animals and misses extra fingers; commenters attribute this to perception and tokenization limits, and possibly guardrails around sensitive regions.
  • Audio performance is polarized: some see huge improvements in meeting summaries with accurate speaker labeling; others get heavy hallucinations, wrong timestamps, and paraphrased “transcripts” on long podcasts (a minimal sketch for trying this follows the list).
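
  For those probing the audio behavior themselves, a minimal sketch is shown below. It assumes the google-genai Python SDK, inline audio bytes, and a hypothetical local file meeting.m4a; long podcast-length recordings are better handled through the SDK’s file-upload path.

    # Minimal sketch of requesting a diarized transcript of an audio file.
    # Assumes the google-genai Python SDK and a hypothetical local file
    # "meeting.m4a"; for long recordings, upload the file instead of inlining it.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    with open("meeting.m4a", "rb") as f:
        audio = types.Part.from_bytes(data=f.read(), mime_type="audio/mp4")

    response = client.models.generate_content(
        model="gemini-3-pro-preview",
        contents=[
            audio,
            "Transcribe this meeting verbatim with speaker labels and timestamps; "
            "do not paraphrase.",
        ],
    )
    print(response.text)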

Privacy, Data & Trust

  • A leaked/archived model card line about using “user data” from Google products for training triggered fears about Gmail/Drive being in the training set; others point to ToS/privacy carve‑outs and doubt bulk Gmail training, but trust is low.
  • Broader unease persists about surveillance capitalism, ad‑driven incentives, and AI Overviews cannibalizing the open web’s incentive to create content.

Ecosystem, Competition & Impact

  • Many see Google “waking up” and possibly retaking the lead from OpenAI/Anthropic on reasoning while leveraging its distribution (Search, Android, Workspace). Others warn that product quality, not just raw models, will decide winners.
  • There’s noticeable AI fatigue: people rely on their own tasks as the “real benchmark” and are skeptical of hype. Some worry about job erosion and over‑reliance on LLMs; others see this as just another productivity tool wave akin to IDEs or outsourcing.