Gemini 3.1 Pro
Versioning, “Preview” Status & Rollout
- Many are confused that 3.1 Pro is another preview when 3.0 Pro itself never reached GA; some question what “preview” even means if new previews arrive before earlier ones stabilize.
- Preview models have tighter rate limits and short deprecation windows, so several people say they can’t rely on them for production despite strong capabilities.
- Rollout is described as disjointed: 3.1 showing up unannounced in Vertex AI or the Gemini CLI for some users, while missing or erroring for others.
Benchmarks, ARC-AGI-2 & Benchmaxxing
- The thread notes big jumps: ARC-AGI-2 from ~31% to ~77%, LiveCodeBench up sharply, and strong Terminal-Bench and Artificial Analysis scores.
- Multiple commenters suspect “benchmark maxing,” especially on ARC-AGI-2, pointing to cost-per-task rising ~4x (the quick arithmetic after this list shows why that figure matters) and the existence of public/semi-private ARC evaluation sets.
- Others argue targeted training on benchmarks isn’t inherently bad and that post-training/RL is driving most recent gains.
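A back-of-envelope calculation clarifies why the ~4x cost figure fuels the suspicion. This sketch uses only the rough ratios cited in the thread; absolute dollar costs are not given there, so everything is relative:

```python
# Rough numbers from the thread: ~31% -> ~77% on ARC-AGI-2,
# with cost-per-task reportedly up ~4x. No absolute costs assumed.
old_score, new_score = 0.31, 0.77
cost_ratio = 4.0  # new cost-per-task / old cost-per-task

score_ratio = new_score / old_score          # ~2.48x more tasks solved
efficiency_ratio = score_ratio / cost_ratio  # change in score-per-dollar

print(f"score improved {score_ratio:.2f}x")
print(f"score-per-dollar changed {efficiency_ratio:.2f}x")
# ~0.62x: per-dollar efficiency fell on these numbers, consistent with
# the view that extra test-time compute, not just raw capability,
# accounts for part of the jump.
```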
Coding & Agentic Workflows
- Very mixed reviews for coding assistance:
- Some find Gemini 3.x (especially Flash) excellent for tool use, refactoring, and deep research; a few describe large real projects (drivers, GUIs, microscope control software) built with it.
  - Many more say Gemini Pro is unreliable in agentic settings (Gemini CLI, Antigravity, OpenCode): looping, ignoring instructions, editing unrelated files, weak tool calling, and “going off the rails”; a sketch of the kind of client-side guardrails harnesses use against these failures follows this list.
- Compared with Anthropic’s Claude Code and OpenAI’s Codex, Gemini is often described as strong at raw reasoning but poor at staying on task and following scoped instructions.
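The failure modes above are exactly what agent harnesses try to contain on the client side. Here is a minimal, framework-agnostic sketch of two such guardrails, an iteration cap and a file-scope whitelist; `call_model` is a stub standing in for any LLM API, and none of this reflects how Gemini CLI, Antigravity, or OpenCode are actually implemented:

```python
from pathlib import Path

MAX_STEPS = 10  # hard cap so a looping model cannot run forever
ALLOWED = {Path("src/app.py").resolve()}  # only in-scope files may be edited

def call_model(history: list[str]) -> dict:
    """Stub standing in for any chat/tool-calling API call."""
    # A real harness would send `history` to the model and parse its reply.
    return {"action": "edit", "path": "src/app.py", "content": "print('hi')\n"}

def run_agent(task: str) -> None:
    history = [task]
    for _ in range(MAX_STEPS):
        move = call_model(history)
        if move["action"] == "done":
            print("task finished")
            return
        if move["action"] == "edit":
            target = Path(move["path"]).resolve()
            if target not in ALLOWED:
                # Reject out-of-scope edits instead of trusting the model.
                history.append(f"refused: {move['path']} is out of scope")
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(move["content"])
            history.append(f"edited {move['path']}")
    print("stopped: step limit reached")  # the looping case from the thread

run_agent("rename the main() helper in src/app.py")
```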
SVG, Vision & the Pelican Meme
- 3.1 shows clear improvement in SVG output: complex animated scenes, UI diagrams, schematics, and the long‑running “pelican riding a bicycle” test now often look coherent (a minimal harness for the test is sketched at the end of this section).
- Some think this specific capability is now overfit/benchmark‑maxed (Google showcases animated SVG animals in marketing), so the pelican test is no longer a good discriminator.
- Vision remains uneven: impressive on some pattern/geometry tasks, but still failing specific bespoke tests (e.g., medical cross‑sections, some video transcription).
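For readers who want to reproduce the pelican test themselves, a minimal sketch follows, assuming the google-genai Python SDK; the model id is a placeholder guess and should be checked against Google’s current model list:

```python
from pathlib import Path
from google import genai  # pip install google-genai

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

resp = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # placeholder id, verify before use
    contents=(
        "Generate an SVG of a pelican riding a bicycle. "
        "Return only the raw <svg>...</svg> markup, no prose."
    ),
)

# Save and open in a browser; "coherent" is judged by eye, not a metric.
Path("pelican.svg").write_text(resp.text)
```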
Cost & Competitive Position
- API pricing is unchanged from 3.0 Pro and notably cheaper than some competitors’ flagships; several call Gemini the best “intelligence-per-dollar,” especially versus Opus‑class models.
- Others counter that half-price isn’t compelling if the model wastes time, tokens, or breaks workflows; for serious coding they still prefer Claude or Codex even at higher cost (the toy comparison below illustrates the effective-cost argument).
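The disagreement is really about effective cost per finished task rather than list price. A toy comparison, where every price and token count is a hypothetical placeholder, not a figure from the thread or any real price sheet:

```python
# Toy effective-cost comparison; all numbers are invented placeholders.
def session_cost(in_tok, out_tok, in_price, out_price):
    """Cost of one coding session; prices are $ per million tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1e6

# Model A: half the list price, but loops and burns ~3x the tokens.
a = session_cost(in_tok=300_000, out_tok=90_000, in_price=2.0, out_price=12.0)
# Model B: pricier per token, but finishes the task directly.
b = session_cost(in_tok=100_000, out_tok=30_000, in_price=4.0, out_price=24.0)

print(f"A: ${a:.2f}  B: ${b:.2f}")  # A: $1.68  B: $1.12
# The "cheaper" model costs more per finished task once retries count.
```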
Product, Billing & Reliability
- Strong frustration with Google’s developer UX: confusing product matrix (AI Studio vs Vertex vs Workspace vs One “AI Pro”), opaque billing, hard-to-raise limits, and occasional silent failures.
- Some choose to pay an “OpenRouter tax” or use GitHub Copilot/third‑party harnesses rather than integrate Gemini directly (see the sketch after this list).
- Reports of transient outages, very slow responses, chat histories disappearing, and perceived mid‑cycle “nerfs” reinforce trust issues.
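The “OpenRouter tax” route works because OpenRouter exposes an OpenAI-compatible endpoint, so the stock openai SDK can simply be pointed at it; the Gemini model slug below is an assumption to verify against OpenRouter’s catalog:

```python
import os
from openai import OpenAI  # pip install openai

# OpenRouter speaks the OpenAI chat-completions protocol, so the stock
# SDK works with a different base_url and an OpenRouter API key.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="google/gemini-3.1-pro-preview",  # assumed slug; check openrouter.ai
    messages=[{"role": "user", "content": "Summarize this diff in one line."}],
)
print(resp.choices[0].message.content)
```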
Behavior, Style, Hallucinations & Safety
- Style complaints: Gemini is seen as overly verbose, corporate, analogy‑heavy, and fond of bold bullets even when asked not to; many prefer Claude’s or Codex’s “voice.”
- Several say older Gemini versions hallucinated too often; early impressions suggest 3.1 may reduce hallucination rate, but evidence is anecdotal.
- Safety filters have swung between over‑refusing (in earlier releases) and under‑refusing (more recently); behavior feels inconsistent across versions.
Where Gemini Is Liked
- Many use Gemini as a research/search companion: travel planning, product comparisons, document analysis, science/maths reasoning, and multimodal Q&A (photos, diagrams).
- Tight integration with Google Search and Workspace, large context windows, and family‑sharing subscriptions make it appealing for non‑coding knowledge work.
- Overall sentiment: 3.1 Pro looks like a real reasoning upgrade on paper and in some niches, but until Google fixes agentic reliability, tooling, and rollout discipline, many will keep using Gemini for “thinking” and Claude/Codex for “doing.”