2026-05-19

Gemini 3.5 Flash

Pricing and Model Positioning

Gemini 3.5 Flash standard pricing is reported as ~$1.50/M input and $9/M output tokens, about 3× the previous Flash tier and similar to older Pro models.
Several commenters note confusion between batch/flex and on‑demand pricing; early posts quoted cheaper numbers in error.
Many see this as Flash effectively becoming the “new Pro”: cheaper than 3.1 Pro per token but no longer a “cheap fast model”.
Some suspect this is not cost-based but an attempt to move upmarket and reduce overload from underpriced Flash models.
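
To make the per-token figures concrete, here is a minimal sketch of the arithmetic for a single request. The default prices are the reported standard rates quoted above (~$1.50/M input, $9/M output); the function name and the token counts are hypothetical and chosen only for illustration.

```python
# Minimal sketch: cost of one request at the reported standard rates.
# request_cost_usd and the token counts below are illustrative assumptions.

def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 1.50,   # reported, approximate
    output_price_per_m: float = 9.00,  # reported
) -> float:
    """Return the USD cost of a request priced per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
example_cost = request_cost_usd(20_000, 2_000)
print(f"${example_cost:.3f}")
```

At these rates an output token costs six times as much as an input token, so output-heavy workloads feel the price increase most.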

Performance, Benchmarks, and Token Use

Benchmarks look strong; several claim 3.5 Flash is near or at Sonnet‑class intelligence and beats 3.1 Pro on many tests.
Others highlight cost-per-task: Artificial Analysis shows 3.5 Flash costing ~74% more than 3.1 Pro to run their suite, while scoring lower.
Multiple hands-on tests show 3.5 Flash using many more “thinking” tokens, which can erase its speed and price advantages in real workloads.
Some report 3.5 Flash solves coding/design tasks in far fewer tokens than older Gemini models; others find it more verbose.
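
The cost-per-task argument can be sketched as follows: a model with a lower per-token price can still cost more per task if it spends more tokens thinking. This sketch assumes thinking tokens are billed at the output rate. The second model's prices and all token counts are hypothetical placeholders, not figures from the discussion or from Artificial Analysis.

```python
# Minimal sketch of cost-per-task versus price-per-token.
# Assumption: thinking tokens are billed at the output-token rate.
from dataclasses import dataclass


@dataclass
class ModelRun:
    input_price_per_m: float    # USD per million input tokens
    output_price_per_m: float   # USD per million output tokens
    input_tokens: int
    visible_output_tokens: int
    thinking_tokens: int

    def cost_usd(self) -> float:
        """Cost of one task, counting thinking tokens as output."""
        billed_output = self.visible_output_tokens + self.thinking_tokens
        return (
            self.input_tokens * self.input_price_per_m
            + billed_output * self.output_price_per_m
        ) / 1_000_000


# Lower per-token price (reported 3.5 Flash rates), hypothetical heavy thinking.
cheaper_per_token = ModelRun(1.50, 9.00, 10_000, 1_000, 8_000)

# Placeholder prices for a pricier model, hypothetical light thinking.
pricier_per_token = ModelRun(2.00, 12.00, 10_000, 1_000, 2_000)

print(cheaper_per_token.cost_usd(), pricier_per_token.cost_usd())
```

With these made-up numbers the model that is cheaper per token ends up more expensive per task, which is the pattern commenters describe; real ratios depend on measured token usage.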

Developer Experience, Tools, and Reliability

Antigravity 2.0 (CLI and GUI) is praised as a strong agent harness, but:
- Quotas on the Gemini “AI Pro” plan were sharply reduced (e.g., “12 Pro prompts per 5 hours”, very easy to hit).
- People hit quota or 5xx errors after a handful of Antigravity sessions; some are canceling subscriptions.
- Users complain that failed generations (e.g., image overload errors) still consume quota.
Google’s AI Studio and API are widely described as flaky and inconsistent compared to competitors.

Coding and Agentic Use

Opinions on coding are polarized:
- Some say raw coding/reasoning is very strong for a “flash” model and competitive with higher tiers.
- Others find 3.5 Flash clearly worse than frontier models in deep systems code, long-horizon refactors, and tool use.
Recurring theme: Gemini models are “smart but stubborn”: they disregard AGENTS.md/instructions, overbuild features, disable tooling, or ignore linters.
Agentic performance (multi-step tool use, large projects) is often called Gemini’s weak spot; several say this regressed vs older models.

Hallucinations, Knowledge Cutoff, and Search

Knowledge cutoff is January 2025 with “latest update May 2026”; some find the lag worrying given how quickly web data is being polluted by LLM output.
Many still see frequent hallucinations in legal, research, niche APIs, and gaming contexts, even with web search enabled.
Others argue that web-grounded harnesses for top models have made hallucinations rare for everyday questions, but not for specialized domains.

Competition and Local Models

DeepSeek V4 (especially Flash) and Qwen 3.6 are repeatedly cited as dramatically cheaper with “good enough” capability, especially for coding.
Several note that open‑weight models now approach last year’s frontier and can be run locally on high-end consumer hardware; this makes rising cloud prices less attractive.
Some foresee a three‑tier future: free/cheap local models for most users, subscription “near frontier” models, and expensive frontier APIs for high‑value work.

Naming, Branding, and Strategy

Many find Google’s naming confusing: Flash vs Flash‑Lite vs Pro, shifting roles from release to release.
Some interpret 3.5 Flash being marked “stable” (not “preview”) plus the price hike as a long-term reset of the “cheap model” baseline, not a temporary spike.
Several suspect Google is prioritizing monetization and search integration over being the absolute frontier lab.

Culture, UX, and “Vibe”

A recurring complaint is Gemini’s personality: overly enthusiastic, flattering, and verbose, even when wrong; some users say this alone puts them off.
The “pelican on a bike” SVG benchmark shows 3.5 Flash generating elaborate, stylized but structurally flawed graphics, illustrating a tendency to “do a lot” rather than fix core mistakes.
Overall sentiment mixes respect for speed and raw capability with strong skepticism about pricing, reliability, and long-term trust.
