Gemini 3.1 Pro

Versioning, “Preview” Status & Rollout

  • Many commenters are confused about why 3.1 Pro is another preview when 3.0 Pro itself never reached GA; some question what “preview” even means if new previews arrive before earlier ones stabilize.
  • Preview models have tighter rate limits and short deprecation windows, so several people say they can’t rely on them for production despite strong capabilities.
  • Rollout is described as disjointed: 3.1 showing up unannounced in Vertex or the Gemini CLI for some users, while missing or erroring for others.

Benchmarks, ARC-AGI-2 & Benchmaxxing

  • Thread notes big jumps: ARC-AGI-2 from ~31% to ~77%, LiveCodeBench up sharply, and strong Terminal-Bench and Artificial Analysis scores.
  • Multiple commenters suspect “benchmark maxing,” especially on ARC-AGI-2, pointing to cost-per-task rising ~4x and the existence of public/semiprivate ARC data.
  • Others argue targeted training on benchmarks isn’t inherently bad and that post-training/RL is driving most recent gains.
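The benchmark-maxing debate above has a quantitative core: the thread's own figures (ARC-AGI-2 roughly 31% → 77%, cost-per-task up roughly 4x) imply that raw accuracy improved faster than accuracy per dollar. A back-of-envelope check, using only those cited numbers:

```python
# Back-of-envelope check on the ARC-AGI-2 figures cited in the thread:
# score roughly 31% -> 77%, with cost-per-task rising roughly 4x.
old_score, new_score = 0.31, 0.77
cost_multiplier = 4.0

score_gain = new_score / old_score          # how much better the score got
efficiency = score_gain / cost_multiplier   # score improvement per dollar spent

print(f"score gain: {score_gain:.2f}x")           # ~2.48x
print(f"score-per-dollar vs old run: {efficiency:.2f}x")  # ~0.62x
```

So even taking the headline jump at face value, score-per-dollar on this benchmark went down, which is consistent with skeptics reading the result as compute-heavy rather than purely capability-driven.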

Coding & Agentic Workflows

  • Very mixed reviews for coding assistance:
    • Some find Gemini 3.x (especially Flash) excellent for tool use, refactoring, and deep research; a few describe large real projects (drivers, GUIs, microscope control software) built with it.
    • Many more say Gemini Pro is unreliable in agentic settings (Gemini CLI, Antigravity, OpenCode): looping, ignoring instructions, editing unrelated files, weak tool calling, and “going off the rails.”
  • Compared with Anthropic’s Claude Code and OpenAI’s Codex, Gemini is often described as strong at raw reasoning but poor at staying on task and following scoped instructions.

SVG, Vision & the Pelican Meme

  • 3.1 shows clear improvement in SVG output: complex animated scenes, UI diagrams, schematics, and the long‑running “pelican riding a bicycle” test now often look coherent.
  • Some think this specific capability is now overfit/benchmark‑maxed (Google showcases animated SVG animals in marketing), so the pelican test is no longer a good discriminator.
  • Vision remains uneven: impressive on some pattern/geometry tasks, but still failing specific bespoke tests (e.g., medical cross‑sections, some video transcription).

Cost & Competitive Position

  • API pricing is unchanged from 3.0 Pro and notably cheaper than some competitors’ flagships; several call Gemini the best “intelligence-per-dollar,” especially versus Opus‑class models.
  • Others counter that half-price isn’t compelling if the model wastes time, tokens, or breaks workflows; for serious coding they still prefer Claude or Codex even at higher cost.
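The counterargument in the last bullet is essentially about effective cost per finished task, not list price. A minimal sketch of that arithmetic, with entirely hypothetical prices and token counts (none of these numbers are real rates for any model):

```python
# Illustrative only: the prices and token counts below are made up to show
# why a half-price model can still cost more per completed task.
def effective_cost(price_per_mtok: float, tokens_per_task: float) -> float:
    """Dollars actually spent to complete one task."""
    return price_per_mtok * tokens_per_task / 1e6

# Hypothetical competitor at $10/Mtok finishing a task in 50k tokens,
# vs a half-price model at $5/Mtok that loops and burns 150k tokens.
competitor = effective_cost(10.0, 50_000)   # $0.50 per finished task
half_price = effective_cost(5.0, 150_000)   # $0.75 per finished task

print(competitor, half_price)
```

Under these assumed numbers, the nominally cheaper model is 50% more expensive per completed task, which is the shape of the "wastes time and tokens" objection.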

Product, Billing & Reliability

  • Strong frustration with Google’s developer UX: confusing product matrix (AI Studio vs Vertex vs Workspace vs One “AI Pro”), opaque billing, hard-to-raise limits, and occasional silent failures.
  • Some choose to pay an “OpenRouter tax” or use GitHub Copilot/third‑party harnesses rather than integrate Gemini directly.
  • Reports of transient outages, very slow responses, chat histories disappearing, and perceived mid‑cycle “nerfs” reinforce trust issues.

Behavior, Style, Hallucinations & Safety

  • Style complaints: Gemini is seen as overly verbose, corporate, analogy‑heavy, and fond of bold bullets even when asked not to; many prefer Claude’s or Codex’s “voice.”
  • Several say older Gemini versions hallucinated too often; early impressions suggest 3.1 may reduce hallucination rate, but evidence is anecdotal.
  • Safety filters reportedly over‑refused in earlier releases and under‑refuse more recently; behavior feels inconsistent from release to release.

Where Gemini Is Liked

  • Many use Gemini as a research/search companion: travel planning, product comparisons, document analysis, science/maths reasoning, and multimodal Q&A (photos, diagrams).
  • Tight integration with Google Search and Workspace, high context windows, and family‑sharing subscriptions make it appealing for non‑coding knowledge work.
  • Overall sentiment: 3.1 Pro looks like a real reasoning upgrade on paper and in some niches, but until Google fixes agentic reliability, tooling, and rollout discipline, many will keep using Gemini for “thinking” and Claude/Codex for “doing.”