Gemini 2.5 Flash

Pricing, “Reasoning” Mode, and Pareto Positioning

  • 2.5 Flash is ~50% more expensive than 2.0 Flash but still viewed as very cheap vs frontier models; some see it as a good new point on the price–performance “Pareto frontier.”
  • The large price gap between non‑reasoning and reasoning output (≈6× per output token) confuses people: it contradicts the mental model that reasoning just adds more of the same‑priced tokens.
  • Clarified in external docs: when “thinking” is on, all output tokens (including hidden thought tokens) are billed at the higher rate.
  • Several commenters suspect pricing is driven more by market positioning than raw compute cost, leaving room for future price cuts.
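The billing rule in the second and third bullets can be made concrete with a small cost estimator. The rates below are the preview prices reported around launch (USD per 1M tokens) and may have changed since, so treat them as placeholders; the ≈6× output-rate gap and the "hidden thought tokens billed at the thinking rate" rule are what the sketch illustrates.

```python
# Illustrative per-request cost under 2.5 Flash's dual output pricing.
# Prices are preview-era placeholders, not authoritative.
PRICE_INPUT = 0.15            # USD per 1M input tokens
PRICE_OUTPUT_PLAIN = 0.60     # USD per 1M output tokens, thinking off
PRICE_OUTPUT_THINKING = 3.50  # USD per 1M output tokens, thinking on
                              # (applies to visible AND hidden thought tokens)

def request_cost(input_tokens: int, output_tokens: int,
                 thought_tokens: int = 0, thinking: bool = False) -> float:
    """Estimated USD cost of one request under the billing model above."""
    out_rate = PRICE_OUTPUT_THINKING if thinking else PRICE_OUTPUT_PLAIN
    # With thinking on, hidden thought tokens are billed as output too.
    billed_output = output_tokens + (thought_tokens if thinking else 0)
    return (input_tokens * PRICE_INPUT + billed_output * out_rate) / 1_000_000

# Same 2k-token answer, with and without reasoning (plus 6k hidden thoughts):
plain = request_cost(10_000, 2_000)                                   # 0.0027
reasoned = request_cost(10_000, 2_000, thought_tokens=6_000, thinking=True)
print(f"plain: ${plain:.4f}  reasoned: ${reasoned:.4f}")
```

Note the effective multiplier is often well above 6×, because thought tokens typically dwarf the visible answer; that, more than the headline rate, is what surprises people.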

Performance vs Other Models

  • Many report 2.5 Pro as a big leap: strong at coding, “deep research,” and reading large codebases; several cancelled other subscriptions in favor of Gemini.
  • 2.5 Flash is seen as great “bang for the buck,” especially for classification, attribute extraction, OCR, and large‑scale PDF→JSON extraction; some say it beats specialized OCR services on cost/accuracy.
  • Others note that OpenAI’s o4‑mini outperforms 2.5 Flash on key benchmarks (e.g., AIME, MMMU), though at significantly higher cost; once Flash’s reasoning mode is enabled, its price rises and the cost gap narrows.
  • Mixed comparisons to Claude 3.7 Sonnet and DeepSeek: some find Gemini more reliable on real codebases and agentic workflows, others still prefer Claude or DeepSeek for predictable, narrow edits.

Use Cases and Capabilities

  • Popular uses:
    • Bulk text classification and extraction, where error rates become acceptable when combined with verification or human review.
    • Large‑context coding assistance, refactors, and bug‑finding on 70k+ token repos.
    • Multimodal tasks like diagram understanding, video ingestion, PDF bank/invoice parsing, and financial data summarization.
    • New image features: bounding boxes and segmentation masks from images; interesting but currently weaker than dedicated vision models on precision.
  • Built‑in Python code execution via the API is highlighted as a powerful, under‑advertised capability.
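On the bounding‑box feature above: Gemini’s documented convention is to return boxes as `[ymin, xmin, ymax, xmax]` normalized to a 0–1000 grid, leaving pixel conversion to the caller. A minimal sketch, assuming that convention (verify against the current docs before relying on it):

```python
# Convert a Gemini-style [ymin, xmin, ymax, xmax] box on the 0-1000
# normalized grid into (left, top, right, bottom) pixel coordinates.
def to_pixels(box_1000, width: int, height: int):
    ymin, xmin, ymax, xmax = box_1000
    return (round(xmin / 1000 * width), round(ymin / 1000 * height),
            round(xmax / 1000 * width), round(ymax / 1000 * height))

# A box covering the centre quarter of a 1920x1080 image:
print(to_pixels([250, 250, 750, 750], 1920, 1080))  # (480, 270, 1440, 810)
```

Getting the y‑before‑x ordering wrong silently transposes boxes, which is a common source of the “weak precision” impression, so it is worth checking before blaming the model.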

UX, Rate Limits, and Product Fragmentation

  • Strong complaints about preview rate limits and low free‑tier token‑per‑minute caps; hard to run evals or heavy dev workflows without paid billing.
  • Slow time‑to‑first‑token and occasional downgrades to older models under load are also reported.
  • AI Studio and API are praised; the consumer Gemini app and Workspace integration are widely criticized as slower, dumber, and over‑censored compared to the same models via API/Studio.
  • Confusion around model names (Pro vs Flash vs Lite vs Preview/Experimental) and around how “thinking” settings affect cost and behavior.
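The usual client‑side workaround for the preview rate limits above is retrying with jittered exponential backoff. A minimal sketch; `call` stands in for any Gemini API request, and which exceptions actually signal a 429 depends on your client library, so this version treats every exception as retryable:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a zero-argument `call` with jittered exponential backoff.

    In real use, catch only your client library's rate-limit error
    instead of bare Exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Sleep 2^attempt * base_delay seconds, plus jitter so that
            # concurrent clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage: result = with_backoff(lambda: client.models.generate_content(...))
```

This smooths over per‑minute caps but obviously cannot raise them; for evals or heavy dev workflows, paid billing remains the only real fix.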

Behavior, Guardrails, and Prompting

  • Several note Gemini has become less “refusal‑heavy” and less politically over‑tuned than earlier versions, with adjustable safety sliders on the API side.
  • Others still encounter over‑eager refactoring, verbose “robust error handling,” and difficulty constraining changes to small patches; prompt hacks (explicit rules repeated each message) help somewhat.
  • There is ongoing frustration that good results still require “speaking LLM” and detailed instructions, contradicting the marketing of “just talk to it.”

Google’s Strategic Position and Trust Issues

  • Many see Google’s custom TPUs, data sources (YouTube, Books, web crawl), and vertical integration as a long‑term advantage; some argue Google is “silently winning” the model race.
  • Counter‑views emphasize Google’s history of product shutdowns, enshittification, and ad‑driven incentives; reluctance to trust Gemini with sensitive data is common.
  • Free or very cheap access to strong models (2.5 Pro experimental, Flash) is seen as both a massive draw and a potential predatory loss‑leader.

Tooling and Ecosystem

  • Gemini is increasingly used with third‑party tools (Aider, Cline, Roo Code, Raycast, Big‑AGI, etc.) where it can compete head‑to‑head with Anthropic and OpenAI.
  • Lack of a first‑party, Claude‑Code‑style desktop agent and weaker Gemini app UX are considered major gaps, even by users who prefer Google’s models.