Gemini 3.1 Pro
Versioning, “Preview” Status & Rollout
- Many are confused that 3.1 Pro is another preview when 3.0 Pro itself never reached GA; some question what “preview” even means if new previews arrive before earlier ones stabilize.
- Preview models have tighter rate limits and short deprecation windows, so several people say they can’t rely on them for production despite strong capabilities.
- Rollout is described as disjointed: 3.1 showing up unannounced in Vertex AI or the Gemini CLI for some users, while missing or erroring for others.
Benchmarks, ARC-AGI-2 & Benchmaxxing
- The thread notes big jumps: ARC-AGI-2 from ~31% to ~77%, LiveCodeBench up sharply, and strong Terminal-Bench and Artificial Analysis scores.
- Multiple commenters suspect “benchmark maxing,” especially on ARC-AGI-2, pointing to cost-per-task rising ~4x (the quick arithmetic after this list shows why that figure matters) and the existence of public/semi-private ARC evaluation sets.
- Others argue targeted training on benchmarks isn’t inherently bad and that post-training/RL is driving most recent gains.
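A back-of-envelope calculation clarifies why the ~4x cost figure fuels the suspicion. This sketch uses only the rough ratios cited in the thread; absolute dollar costs are not given there, so everything is relative:

```python
# Rough numbers from the thread: ~31% -> ~77% on ARC-AGI-2,
# with cost-per-task reportedly up ~4x. No absolute costs assumed.
old_score, new_score = 0.31, 0.77
cost_ratio = 4.0  # new cost-per-task / old cost-per-task

score_ratio = new_score / old_score          # ~2.48x more tasks solved
efficiency_ratio = score_ratio / cost_ratio  # change in score-per-dollar

print(f"score improved {score_ratio:.2f}x")
print(f"score-per-dollar changed {efficiency_ratio:.2f}x")
# ~0.62x: per-dollar efficiency fell on these numbers, consistent with
# the view that extra test-time compute, not just raw capability,
# accounts for part of the jump.
```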
Coding & Agentic Workflows
- Very mixed reviews for coding assistance:
- Some find Gemini 3.x (especially Flash) excellent for tool use, refactoring, and deep research; a few describe large real projects (drivers, GUIs, microscope control software) built with it.
  - Many more say Gemini Pro is unreliable in agentic settings (Gemini CLI, Antigravity, OpenCode): looping, ignoring instructions, editing unrelated files, weak tool calling, and “going off the rails”; a sketch of the kind of client-side guardrails harnesses use against these failures follows this list.
- Compared with Anthropic’s Claude Code and OpenAI’s Codex, Gemini is often described as strong at raw reasoning but poor at staying on task and following scoped instructions.
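The failure modes above are exactly what agent harnesses try to contain on the client side. Here is a minimal, framework-agnostic sketch of two such guardrails, an iteration cap and a file-scope whitelist; `call_model` is a stub standing in for any LLM API, and none of this reflects how Gemini CLI, Antigravity, or OpenCode are actually implemented:

```python
from pathlib import Path

MAX_STEPS = 10  # hard cap so a looping model cannot run forever
ALLOWED = {Path("src/app.py").resolve()}  # only in-scope files may be edited

def call_model(history: list[str]) -> dict:
    """Stub standing in for any chat/tool-calling API call."""
    # A real harness would send `history` to the model and parse its reply.
    return {"action": "edit", "path": "src/app.py", "content": "print('hi')\n"}

def run_agent(task: str) -> None:
    history = [task]
    for _ in range(MAX_STEPS):
        move = call_model(history)
        if move["action"] == "done":
            print("task finished")
            return
        if move["action"] == "edit":
            target = Path(move["path"]).resolve()
            if target not in ALLOWED:
                # Reject out-of-scope edits instead of trusting the model.
                history.append(f"refused: {move['path']} is out of scope")
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(move["content"])
            history.append(f"edited {move['path']}")
    print("stopped: step limit reached")  # the looping case from the thread

run_agent("rename the main() helper in src/app.py")
```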
SVG, Vision & the Pelican Meme
- 3.1 shows clear improvement in SVG output: complex animated scenes, UI diagrams, schematics, and the long‑running “pelican riding a bicycle” test now often look coherent (a minimal harness for the test is sketched at the end of this section).
- Some think this specific capability is now overfit/benchmark‑maxed (Google showcases animated SVG animals in marketing), so the pelican test is no longer a good discriminator.
- Vision remains uneven: impressive on some pattern/geometry tasks, but still failing specific bespoke tests (e.g., medical cross‑sections, some video transcription).
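For readers who want to reproduce the pelican test themselves, a minimal sketch follows, assuming the google-genai Python SDK; the model id is a placeholder guess and should be checked against Google’s current model list:

```python
from pathlib import Path
from google import genai  # pip install google-genai

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

resp = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # placeholder id, verify before use
    contents=(
        "Generate an SVG of a pelican riding a bicycle. "
        "Return only the raw <svg>...</svg> markup, no prose."
    ),
)

# Save and open in a browser; "coherent" is judged by eye, not a metric.
Path("pelican.svg").write_text(resp.text)
```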
Cost & Competitive Position
- API pricing is unchanged from 3.0 Pro and notably cheaper than some competitors’ flagships; several call Gemini the best “intelligence-per-dollar,” especially versus Opus‑class models.
- Others counter that half-price isn’t compelling if the model wastes time, tokens, or breaks workflows; for serious coding they still prefer Claude or Codex even at higher cost (the toy comparison below illustrates the effective-cost argument).
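The disagreement is really about effective cost per finished task rather than list price. A toy comparison, where every price and token count is a hypothetical placeholder, not a figure from the thread or any real price sheet:

```python
# Toy effective-cost comparison; all numbers are invented placeholders.
def session_cost(in_tok, out_tok, in_price, out_price):
    """Cost of one coding session; prices are $ per million tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1e6

# Model A: half the list price, but loops and burns ~3x the tokens.
a = session_cost(in_tok=300_000, out_tok=90_000, in_price=2.0, out_price=12.0)
# Model B: pricier per token, but finishes the task directly.
b = session_cost(in_tok=100_000, out_tok=30_000, in_price=4.0, out_price=24.0)

print(f"A: ${a:.2f}  B: ${b:.2f}")  # A: $1.68  B: $1.12
# The "cheaper" model costs more per finished task once retries count.
```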
Product, Billing & Reliability
- Strong frustration with Google’s developer UX: confusing product matrix (AI Studio vs Vertex vs Workspace vs One “AI Pro”), opaque billing, hard-to-raise limits, and occasional silent failures.
- Some choose to pay an “OpenRouter tax” or use GitHub Copilot/third‑party harnesses rather than integrate Gemini directly (see the sketch after this list).
- Reports of transient outages, very slow responses, chat histories disappearing, and perceived mid‑cycle “nerfs” reinforce trust issues.
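The “OpenRouter tax” route works because OpenRouter exposes an OpenAI-compatible endpoint, so the stock openai SDK can simply be pointed at it; the Gemini model slug below is an assumption to verify against OpenRouter’s catalog:

```python
import os
from openai import OpenAI  # pip install openai

# OpenRouter speaks the OpenAI chat-completions protocol, so the stock
# SDK works with a different base_url and an OpenRouter API key.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="google/gemini-3.1-pro-preview",  # assumed slug; check openrouter.ai
    messages=[{"role": "user", "content": "Summarize this diff in one line."}],
)
print(resp.choices[0].message.content)
```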
Behavior, Style, Hallucinations & Safety
- Style complaints: Gemini is seen as overly verbose, corporate, analogy‑heavy, and fond of bold bullets even when asked not to; many prefer Claude’s or Codex’s “voice.”
- Several say older Gemini versions hallucinated too often; early impressions suggest 3.1 may reduce hallucination rate, but evidence is anecdotal.
- Safety filters have swung between over‑refusing (in earlier releases) and under‑refusing (more recently); behavior feels inconsistent across versions.
Where Gemini Is Liked
- Many use Gemini as a research/search companion: travel planning, product comparisons, document analysis, science/maths reasoning, and multimodal Q&A (photos, diagrams).
- Tight integration with Google Search and Workspace, large context windows, and family‑sharing subscriptions make it appealing for non‑coding knowledge work.
- Overall sentiment: 3.1 Pro looks like a real reasoning upgrade on paper and in some niches, but until Google fixes agentic reliability, tooling, and rollout discipline, many will keep using Gemini for “thinking” and Claude/Codex for “doing.”