Gemini 3
Rollout, Access & Tooling
- Early in the thread many saw “confidential” labels, hard rate limits, and “quota exceeded” errors even though Gemini 3 appeared in AI Studio, Vertex, and the API. Some reported it quietly serving in Canvas while still labeled “2.5” before the official flip.
- Gemini 3 Pro shows up as “Thinking” on gemini.google.com, with a low/high “thinking level” option; the preview model is also exposed via Vertex and the API (gemini-3-pro-preview), and via GitHub Copilot / Cursor (see the API sketch after this list).
- CLI access is gated by a waitlist; multiple people struggled to understand how Google One, the Gemini Pro/Ultra plans, Workspace, AI Studio “paid API keys,” and CLI entitlements tie together.
- Antigravity and AI Studio apps impressed some (browser control, app builder, 3D demos), but others hit server errors, missing features, and awkward Google Drive permission prompts.
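
For concreteness, a minimal sketch of calling the preview model through the google-genai Python SDK, assuming the UI’s low/high “thinking level” toggle maps to a `thinking_level` field on `ThinkingConfig` as the preview docs suggest (treat that field name as an assumption):

```python
# Minimal sketch: Gemini 3 Pro preview via the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Refactor this function and explain the change.",
    config=types.GenerateContentConfig(
        # Assumed API-side counterpart of the UI's low/high "thinking level" toggle.
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```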
Pricing & Product Positioning
- API prices rose ~60% for input and ~20% for output vs Gemini 2.5 Pro; long-context (>200k tokens) remains pricier. Some see this as acceptable if fewer prompts are needed; others worry about squeezed margins for app builders (see the worked cost comparison after this list).
- Grounded search pricing changed from per-prompt to per-search; the net effect is unclear and depends on how many searches a typical prompt triggers.
- Comparisons: still cheaper than Claude Sonnet 4.5; well below Claude Opus pricing. Several note Google’s strategy of bundling Gemini with Google One / Android to drive adoption.
- Marketing claims like “AI Overviews now have 2 billion users” drew skepticism, with people arguing “user == saw the box” rather than opted-in usage.
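
To make the percentages concrete, a back-of-envelope comparison using the published ≤200k-token list prices ($1.25/$10 per million input/output tokens for 2.5 Pro vs $2/$12 for the 3 Pro preview, which match the ~60%/~20% figures above); the workload numbers below are illustrative, not from the thread:

```python
# Back-of-envelope API cost comparison at <=200k-token list prices (USD per 1M tokens).
PRICES = {
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
    "gemini-3-pro-preview": {"input": 2.00, "output": 12.00},
}

def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative workload: 5M input + 1M output tokens per day.
for model in PRICES:
    print(f"{model}: ${daily_cost(model, 5_000_000, 1_000_000):.2f}/day")

# Input: 1.25 -> 2.00 (+60%); output: 10.00 -> 12.00 (+20%).
# Both models charge higher rates above 200k context, hence ">200k remains pricier".
```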
Benchmarks vs Reality
- Official charts show strong gains on ARC-AGI (1 & 2), NYT Connections, and other reasoning benchmarks, sometimes beating GPT‑5.1 and Claude Sonnet 4.5. Some suspect “benchmaxxing” or contamination of public eval sets.
- Multiple commenters emphasize private, task-specific benchmarks (coding, math, law, medicine, CAD). Experiences conflict: some see Gemini 3 as clear SOTA; others find older models or Claude/OpenAI still better for their niche.
Coding & Agentic Behavior
- For many, Gemini 3 Pro is a big step up from 2.5 in complex coding, refactors, math-heavy code, CAD (e.g., Blender/OpenSCAD scripts), and UI design; a few report one-shot fixes where other models failed.
- Others find it weaker than Claude Code or GPT‑5‑Codex for “agentic” workflows: poor instruction following, over-engineered or messy code, hallucinated imports, partial fixes, or ignoring “plan first” instructions. Gemini CLI itself is viewed as buggy and UX‑rough.
- Long-context coding remains mixed: some praise project‑scale reasoning; others say Gemini still misapplies edits and forgets constraints, similar to 2.5.
Multimodal, SVG & Audio
- The “pelican riding a bicycle” SVG test and many variant prompts (giraffe in a Ferrari, goblin animations, 3D scenes) show much better spatial understanding than previous models; people note genuine generalization, not just that one meme.
- Vision is still brittle: it miscounts legs on edited animals and misses extra fingers; commenters attribute this to perception and tokenization limits, and possibly guardrails around sensitive regions.
- Audio performance is polarized: some see huge improvements in meeting summaries with accurate speaker labeling; others get heavy hallucinations, wrong timestamps, and paraphrased “transcripts” on long podcasts.
Privacy, Data & Trust
- A leaked/archived model card line about using “user data” from Google products for training triggered fears about Gmail/Drive being in the training set; others point to ToS/privacy carve‑outs and doubt bulk Gmail training, but trust is low.
- Broader unease persists about surveillance capitalism, ad‑driven incentives, and AI Overviews cannibalizing the open web’s incentive to create content.
Ecosystem, Competition & Impact
- Many see Google “waking up” and possibly retaking the lead from OpenAI/Anthropic on reasoning while leveraging its distribution (Search, Android, Workspace). Others warn that product quality, not just raw models, will decide winners.
- There’s noticeable AI fatigue: people rely on their own tasks as the “real benchmark” and are skeptical of hype. Some worry about job erosion and over‑reliance on LLMs; others see this as just another productivity tool wave akin to IDEs or outsourcing.