Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite

Real‑world usage outside coding

  • Frequent non-coding uses: translation, long-document summarization, research reports, web/YouTube summarization, web scraping → semi-structured data, NDA/contract extraction, converting handwritten or scanned text to spreadsheets, real-estate listing feeds, home automation, math exploration, audio transcription, and “book club” / journaling / self‑reflection.
  • Vision and multimodal: praised for handling large batches of images cheaply and reliably (e.g., building a product lexicon); YouTube access and the giant context window were repeatedly cited as differentiators.
  • Many use Flash / Flash-Lite for “cheap and fast” tasks, often as a delegate for a larger model, generating or editing structured objects on its behalf.
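The delegate pattern described above usually needs a thin validation layer so the orchestrating model only ever sees well-formed output. A minimal sketch, assuming the cheap model returns JSON text (function name and schema are illustrative, not any commenter's actual code):

```python
import json

def parse_delegate_output(raw: str, required_keys: set[str]) -> dict:
    """Validate JSON emitted by a cheap 'delegate' model (e.g. Flash-Lite)
    before handing it back to the orchestrating model.

    Raises ValueError if the payload is missing any required key, so a
    caller can retry the cheap model instead of propagating bad data.
    """
    obj = json.loads(raw)
    missing = required_keys - obj.keys()
    if missing:
        raise ValueError(f"delegate output missing keys: {sorted(missing)}")
    return obj

# Example: a contract-extraction delegate expected to return title + parties.
record = parse_delegate_output(
    '{"title": "NDA", "parties": ["Acme", "Globex"]}',
    {"title", "parties"},
)
```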

Model quality & comparisons

  • Several users say Gemini 2.5 Pro is strong for translation, summarization, law-like writing, math help, and long-context drafting; some prefer its writing tone and research depth to ChatGPT.
  • Others find Gemini worse than Claude or OpenAI for serious coding or complex reasoning, describing it as verbose, off-topic, or “BuzzFeed-style” in tone.
  • Some report very good coding performance and stable code from 2.5 Pro (especially via tools like Aider), but complain about excessive comments and try/except clutter.
  • There’s a sense that preview versions of 2.5 Pro felt smarter, more willing to push back, and less sycophantic than the GA release.

Long context & benchmarks

  • Users praise the 1M-token window for translation, big-doc Q&A, and “pile of NDAs”‑type workflows.
  • A detailed subthread debates long-context evals (NIAH, MRCR, RULER, LongProc, HELMET).
    • One side: Gemini 2.5 Pro collapses after ~32k tokens on internal enterprise benchmarks; long-context reasoning is still weak across all models.
    • Other side: in real-world doc‑assembly tasks (reports/proposals) 2.5 Pro performs uniquely well.
  • Consensus: long context is useful but far from “solved,” and benchmark choice strongly affects perceived performance.
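For a concrete sense of what the NIAH-style evals in that subthread measure, here is a minimal harness with a stub standing in for a real model; `model_fn` and the ~4-characters-per-token estimate are assumptions, not any benchmark's actual code:

```python
def needle_in_haystack(model_fn, approx_tokens: int,
                       needle: str, question: str, expected: str) -> bool:
    """NIAH-style probe: bury one fact mid-context, then ask for it back.

    model_fn(prompt) is a placeholder for any LLM call; context length is
    approximated at ~4 characters per token.
    """
    sentence = "The grass is green and the sky is blue. "
    filler = sentence * max(1, approx_tokens * 4 // len(sentence))
    mid = len(filler) // 2
    haystack = filler[:mid] + " " + needle + " " + filler[mid:]
    answer = model_fn(haystack + "\n\nQuestion: " + question)
    return expected.lower() in answer.lower()
```

Sweeping `approx_tokens` from a few thousand up toward 1M (and varying the needle position) is how one would probe the claimed collapse past ~32k tokens; real suites like MRCR and RULER add multi-needle and reasoning variants on top of this basic shape.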

Product tiers, UX, privacy

  • Gemini app, AI Studio, and Vertex behave differently:
    • Gemini app: smaller thinking budgets, stronger safety filters, nerfed behavior; often underperforms API/AI Studio.
    • AI Studio: better control (system instructions, temperature, schemas) but confusion over when data may be used for training; clarified that any account with a billed project gets private treatment.
    • Vertex: same models with higher, more negotiable rate limits.
  • Many dislike Gemini’s chat UX and file-handling; want native Git/FTP/file integration instead of copy-paste.

“Thinking” mode and behavior

  • “Thinking” is described as scratchpad / chain-of-thought tokens before the final answer. It improves quality but adds latency and lots of tokens.
  • Users question the value of a “thinking” variant that’s weaker than regular Flash, and some see no need for thinking on latency‑sensitive tasks (voice, real-time apps).
  • Reports that Flash sometimes emits thinking tokens even when thinking budget is set to zero.

Pricing and “bait‑and‑switch” concerns

  • Major point of contention: 2.5 Flash price changes vs preview and vs 2.0 Flash:
    • Input text/image/video: 2x increase over 2.5 preview (and higher than 2.0).
    • Output: a single $2.50/M rate replaces $0.60 (non-thinking) and $3.50 (thinking), an effective ~4x increase for prior non-thinking use.
    • Audio for Flash-Lite up ~6.3x over 2.0 Flash-Lite.
  • Many see this as a “bait‑and‑switch”: developers built on cheap preview pricing, then face steep increases as models go GA.
  • Others argue earlier pricing was clearly subsidized to gain adoption; as Gemini becomes competitive, it converges toward market rates.
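The “~4x” figure for prior non-thinking use checks out arithmetically from the rates quoted above; a quick sketch (the workload size is hypothetical):

```python
# 2.5 Flash output pricing, $/M output tokens, as quoted in the thread.
OLD_NON_THINKING = 0.60   # preview, non-thinking
OLD_THINKING     = 3.50   # preview, thinking
NEW_UNIFIED      = 2.50   # GA, single rate for all output

def output_cost(tokens_millions: float, rate: float) -> float:
    """Dollar cost for a given volume of output tokens (in millions)."""
    return tokens_millions * rate

# A hypothetical workload of 100M non-thinking output tokens per month:
old_cost = output_cost(100, OLD_NON_THINKING)  # $60
new_cost = output_cost(100, NEW_UNIFIED)       # $250
increase = new_cost / old_cost                 # ~4.17x
```

For thinking-heavy workloads the same rate change is a price *cut* ($3.50 → $2.50), which is why the two sides of the thread read the change so differently.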

Limits, reliability, and access issues

  • Complaints about:
    • Low default rate limits (e.g., 10k RPD), opaque upgrade process, getting 403s mid‑batch; some moved back to OpenAI for throughput.
    • Empty responses or loops (e.g., Flash-Lite repeating phrases in transcripts), often tied to length limits or safety filters.
    • 2.5 Pro being unavailable or tricky to access via some API endpoints.
  • Some note that Vertex alleviates many rate-limit issues and offers more formal throughput guarantees.
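For the 403s-mid-batch problem, the usual client-side mitigation is exponential backoff with jitter. A minimal stdlib sketch; the helper name and parameters are illustrative, and `fn` stands in for any API call that raises on a retryable failure:

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Retry a zero-arg callable with exponential backoff plus jitter.

    Each failed attempt waits base_delay * 2**attempt (with up to 10%
    random jitter to avoid synchronized retries); the final failure is
    re-raised to the caller.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            sleep(delay)
```

The injectable `sleep` makes the helper testable without real delays; in production it defaults to `time.sleep`. This smooths over transient 429/403 bursts but does not raise the underlying quota, which is where the Vertex route comes in.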

Cloud vs local models

  • A few users consider local LLMs due to API pricing and limits, but others argue:
    • Hardware costs and rapid model churn make local generally worse economics unless you’re processing huge volumes.
    • Quality gap: local models on 24–48 GB GPUs land closer to Flash-Lite quality, while running slower than top hosted models.
  • Local is framed as mainly for hobby use and privacy, not efficiency, at least for now.

General sentiment

  • Many have shifted primary usage from ChatGPT/Claude to Gemini (especially 2.5 Pro and 2.0/2.5 Flash) and are impressed by speed, multimodal capabilities, and long context.
  • At the same time, there’s strong frustration about:
    • Perceived “nerfs”/quantization, particularly in the consumer app.
    • Overly cheerful, verbose tone.
    • Sudden price hikes and confusing thinking/non-thinking semantics.
  • Overall, Gemini is seen as technically strong and rapidly improving, but trust is undermined by pricing moves, behavior changes, and product fragmentation.