Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite
Real‑world usage outside coding
- Frequent non-coding uses: translation, long-document summarization, research reports, web/YouTube summarization, web scraping → semi-structured data, NDA/contract extraction, converting handwritten or scanned text to spreadsheets, real-estate listing feeds, home automation, math exploration, audio transcription, and “book club” / journaling / self‑reflection.
- Vision and multimodal: praised for handling large batches of images cheaply and reliably (e.g., building a product lexicon); YouTube access and the giant context window were repeatedly cited as differentiators.
- Many use Flash / Flash-Lite for “cheap and fast” tasks, often as a delegate that a larger model calls to generate or edit structured objects (sketched below).
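A minimal sketch of that delegate pattern, assuming the google-genai Python SDK; the Product schema, prompt, and model name are illustrative rather than taken from the thread:

```python
# Hypothetical "cheap delegate" call: a small, well-specified extraction job
# handed to Flash-Lite, returning a typed object.
from pydantic import BaseModel
from google import genai
from google.genai import types

class Product(BaseModel):  # illustrative schema, not from the thread
    name: str
    price_usd: float
    in_stock: bool

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Extract the product from: 'Acme Anvil, $49.99, in stock'",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Product,  # SDK constrains output to this schema
    ),
)
print(response.parsed)  # Product(name='Acme Anvil', price_usd=49.99, in_stock=True)
```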
Model quality & comparisons
- Several users say Gemini 2.5 Pro is strong for translation, summarization, legal-style writing, math help, and long-context drafting; some prefer its writing tone and research depth to ChatGPT's.
- Others find Gemini worse than Claude or OpenAI for serious coding or complex reasoning, describing it as verbose, off-topic, or “BuzzFeed-style” in tone.
- Some report very good coding performance and stable code from 2.5 Pro (especially via tools like Aider), but complain about excessive comments and try/except clutter.
- There’s a sense that preview versions of 2.5 Pro felt smarter, more willing to push back, and less sycophantic than the GA release.
Long context & benchmarks
- Users praise the 1M-token window for translation, big-doc Q&A, and “pile of NDAs”‑type workflows.
- A detailed subthread debates long-context evals (NIAH, MRCR, RULER, LongProc, HELMET); a toy probe after this list shows the basic test shape.
  - One side: Gemini 2.5 Pro collapses after ~32k tokens on internal enterprise benchmarks; long-context reasoning is still weak across all models.
  - Other side: in real-world doc‑assembly tasks (reports/proposals) 2.5 Pro performs uniquely well.
  - Consensus: long context is useful but far from “solved,” and benchmark choice strongly affects perceived performance.
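As a rough illustration of what these evals measure, here is a toy needle-in-a-haystack probe; the filler text, needle, and placement depth are arbitrary, and real suites like RULER and HELMET are far more rigorous:

```python
# Toy NIAH probe: bury one fact in filler text, then check retrieval.
from google import genai

client = genai.Client()

needle = "The magic number is 48291."
haystack = "Lorem ipsum dolor sit amet. " * 20000  # filler, ~100k+ tokens
mid = len(haystack) // 2
prompt = haystack[:mid] + needle + haystack[mid:] + "\n\nWhat is the magic number?"

answer = client.models.generate_content(
    model="gemini-2.5-pro", contents=prompt
).text
print("48291" in answer)  # did retrieval succeed at this depth and length?
```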
Product tiers, UX, privacy
- Gemini app, AI Studio, and Vertex behave differently:
  - Gemini app: smaller thinking budgets, stronger safety filters, generally nerfed behavior; often underperforms the API/AI Studio.
  - AI Studio: better control (system instructions, temperature, schemas) but confusion over when data may be used for training; clarified in-thread that any account with a billed project gets private treatment (see the config sketch after this list).
  - Vertex: same models with higher, more negotiable rate limits.
- Many dislike Gemini’s chat UX and file handling; they want native Git/FTP/file integration instead of copy-paste.
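For contrast with the consumer app, a sketch of the knobs the API/AI Studio tier exposes, again assuming the google-genai SDK; the prompt and settings are illustrative:

```python
# API-side controls the thread contrasts with the consumer app:
# system instruction, temperature, and output length in one config.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Summarize the key obligations in this NDA: ...",
    config=types.GenerateContentConfig(
        system_instruction="You are a terse legal summarizer.",  # not settable in the app
        temperature=0.2,          # low temperature for repeatable extractions
        max_output_tokens=512,    # cap the response length
    ),
)
print(response.text)
```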
“Thinking” mode and behavior
- “Thinking” is described as scratchpad / chain-of-thought tokens before the final answer. It improves quality but adds latency and lots of tokens.
- Users question the value of a “thinking” variant that’s weaker than regular Flash, and some see no need for thinking on latency‑sensitive tasks (voice, real-time apps).
- Reports that Flash sometimes emits thinking tokens even when the thinking budget is set to zero (see the config sketch below).
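A sketch of how that budget is set via the google-genai SDK, under the assumption that thinking_budget=0 should disable thinking on Flash; the prompt is illustrative:

```python
# Setting the thinking budget explicitly; commenters report that a zero
# budget does not always suppress thinking tokens in practice.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Reply with one word: is 17 prime?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),  # no thinking
    ),
)
print(response.text)
```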
Pricing and “bait‑and‑switch” concerns
- Major point of contention: 2.5 Flash price changes vs. the preview and vs. 2.0 Flash:
  - Input text/image/video: 2x increase over the 2.5 preview (and higher than 2.0).
  - Output: a single $2.50/M rate replaces $0.60 (non-thinking) and $3.50 (thinking), an effective ~4x increase for prior non-thinking use (worked through in the sketch after this list).
  - Audio for Flash-Lite: up ~6.3x over 2.0 Flash-Lite.
- Many see this as a “bait‑and‑switch”: developers built on cheap preview pricing, then face steep increases as models go GA.
- Others argue earlier pricing was clearly subsidized to gain adoption; as Gemini becomes competitive, it converges toward market rates.
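A back-of-envelope check of the output-price jump using the thread's figures; the 500M-token monthly volume is purely hypothetical:

```python
# Effective increase for a previously non-thinking 2.5 Flash workload.
# Rates are $/1M output tokens as quoted above; the volume is made up.
PREVIEW_NON_THINKING = 0.60  # $/M, 2.5 Flash preview, non-thinking output
GA_SINGLE_RATE = 2.50        # $/M, 2.5 Flash GA, single output rate

monthly_output_tokens = 500_000_000  # hypothetical 500M tokens/month

before = monthly_output_tokens / 1e6 * PREVIEW_NON_THINKING
after = monthly_output_tokens / 1e6 * GA_SINGLE_RATE
print(f"${before:,.0f}/mo -> ${after:,.0f}/mo ({after / before:.1f}x)")
# $300/mo -> $1,250/mo (4.2x)
```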
Limits, reliability, and access issues
- Complaints about:
  - Low default rate limits (e.g., 10k requests/day), an opaque upgrade process, and 403s mid‑batch; some moved back to OpenAI for throughput (a retry sketch follows this section).
  - Empty responses or loops (e.g., Flash-Lite repeating phrases in transcripts), often tied to length limits or safety filters.
  - 2.5 Pro being unavailable or tricky to access via some API endpoints.
- Some note that Vertex alleviates many rate-limit issues and offers more formal throughput guarantees.
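A defensive pattern several commenters imply: wrap calls in retry-with-backoff so 403s/429s or empty responses don't kill a batch. A sketch assuming the google-genai SDK; which status codes to retry and the backoff schedule are judgment calls, not an official recipe:

```python
# Retry transient failures; treat empty responses as failures too.
import time

from google import genai
from google.genai import errors

client = genai.Client()

def generate_with_retry(prompt: str, model: str = "gemini-2.5-flash",
                        max_tries: int = 5) -> str:
    for attempt in range(max_tries):
        try:
            response = client.models.generate_content(model=model, contents=prompt)
            if response.text:              # empty/filtered responses get retried
                return response.text
        except errors.APIError as e:
            if e.code not in (403, 429, 500, 503):
                raise                      # don't retry hard client errors
        time.sleep(2 ** attempt)           # backoff: 1s, 2s, 4s, 8s, ...
    raise RuntimeError(f"gave up after {max_tries} attempts")
```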
Cloud vs local models
- A few users consider local LLMs due to API pricing and limits, but others argue:
  - Hardware costs and rapid model churn make local inference worse economics unless you're processing huge volumes.
  - Quality gap: local models on 24–48 GB GPUs land closer to Flash-Lite level while running slower than top hosted models.
- Local is framed as mainly for hobby and privacy, not efficiency, at least for now.
General sentiment
- Many have shifted primary usage from ChatGPT/Claude to Gemini (especially 2.5 Pro and 2.0/2.5 Flash) and are impressed by speed, multimodal capabilities, and long context.
- At the same time, there’s strong frustration about:
  - Perceived “nerfs”/quantization, particularly in the consumer app.
  - An overly cheerful, verbose tone.
  - Sudden price hikes and confusing thinking/non-thinking semantics.
- Overall, Gemini is seen as technically strong and rapidly improving, but trust is undermined by pricing moves, behavior changes, and product fragmentation.