Gemini 2.5 Flash

Pricing, “Reasoning” Mode, and Pareto Positioning

  • 2.5 Flash is ~50% more expensive than 2.0 Flash but still viewed as very cheap vs frontier models; some see it as a good new point on the price–performance “Pareto frontier.”
  • The large price gap between non‑reasoning and reasoning output (≈6× per output token) confuses people: it contradicts the mental model that reasoning just adds more of the same‑priced tokens.
  • Clarified in external docs: when “thinking” is on, all output tokens (including hidden thought tokens) are billed at the higher rate.
  • Several commenters suspect pricing is driven more by market positioning than raw compute cost, leaving room for future price cuts.
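The billing rule in the second and third bullets can be made concrete with a small cost estimator. The rates below are the preview prices reported around launch (USD per 1M tokens) and may have changed since, so treat them as placeholders; the ≈6× output-rate gap and the "hidden thought tokens billed at the thinking rate" rule are what the sketch illustrates.

```python
# Illustrative per-request cost under 2.5 Flash's dual output pricing.
# Prices are preview-era placeholders, not authoritative.
PRICE_INPUT = 0.15            # USD per 1M input tokens
PRICE_OUTPUT_PLAIN = 0.60     # USD per 1M output tokens, thinking off
PRICE_OUTPUT_THINKING = 3.50  # USD per 1M output tokens, thinking on
                              # (applies to visible AND hidden thought tokens)

def request_cost(input_tokens: int, output_tokens: int,
                 thought_tokens: int = 0, thinking: bool = False) -> float:
    """Estimated USD cost of one request under the billing model above."""
    out_rate = PRICE_OUTPUT_THINKING if thinking else PRICE_OUTPUT_PLAIN
    # With thinking on, hidden thought tokens are billed as output too.
    billed_output = output_tokens + (thought_tokens if thinking else 0)
    return (input_tokens * PRICE_INPUT + billed_output * out_rate) / 1_000_000

# Same 2k-token answer, with and without reasoning (plus 6k hidden thoughts):
plain = request_cost(10_000, 2_000)                                   # 0.0027
reasoned = request_cost(10_000, 2_000, thought_tokens=6_000, thinking=True)
print(f"plain: ${plain:.4f}  reasoned: ${reasoned:.4f}")
```

Note the effective multiplier is often well above 6×, because thought tokens typically dwarf the visible answer; that, more than the headline rate, is what surprises people.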

Performance vs Other Models

  • Many report 2.5 Pro as a big leap: strong at coding, “deep research,” and reading large codebases; several cancelled other subscriptions in favor of Gemini.
  • 2.5 Flash is seen as great “bang for the buck,” especially for classification, attribute extraction, OCR, and large‑scale PDF→JSON extraction; some say it beats specialized OCR services on cost/accuracy.
  • Others note that OpenAI’s o4‑mini outperforms 2.5 Flash on key benchmarks (e.g., AIME, MMMU), though at significantly higher cost; once Flash’s reasoning mode is enabled, its price rises and the cost gap narrows.
  • Mixed comparisons to Claude 3.7 Sonnet and DeepSeek: some find Gemini more reliable on real codebases and agentic workflows, others still prefer Claude or DeepSeek for predictable, narrow edits.

Use Cases and Capabilities

  • Popular uses:
    • Bulk text classification and extraction, where error rates become acceptable when combined with verification or human review.
    • Large‑context coding assistance, refactors, and bug‑finding on 70k+ token repos.
    • Multimodal tasks like diagram understanding, video ingestion, PDF bank/invoice parsing, and financial data summarization.
    • New image features: bounding boxes and segmentation masks from images; interesting but currently weaker than dedicated vision models on precision.
  • Built‑in Python code execution via the API is highlighted as a powerful, under‑advertised capability.
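On the bounding‑box feature above: Gemini’s documented convention is to return boxes as `[ymin, xmin, ymax, xmax]` normalized to a 0–1000 grid, leaving pixel conversion to the caller. A minimal sketch, assuming that convention (verify against the current docs before relying on it):

```python
# Convert a Gemini-style [ymin, xmin, ymax, xmax] box on the 0-1000
# normalized grid into (left, top, right, bottom) pixel coordinates.
def to_pixels(box_1000, width: int, height: int):
    ymin, xmin, ymax, xmax = box_1000
    return (round(xmin / 1000 * width), round(ymin / 1000 * height),
            round(xmax / 1000 * width), round(ymax / 1000 * height))

# A box covering the centre quarter of a 1920x1080 image:
print(to_pixels([250, 250, 750, 750], 1920, 1080))  # (480, 270, 1440, 810)
```

Getting the y‑before‑x ordering wrong silently transposes boxes, which is a common source of the “weak precision” impression, so it is worth checking before blaming the model.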

UX, Rate Limits, and Product Fragmentation

  • Strong complaints about preview rate limits and low free‑tier token‑per‑minute caps; hard to run evals or heavy dev workflows without paid billing.
  • Slow time‑to‑first‑token and occasional downgrades to older models under load are also reported.
  • AI Studio and API are praised; the consumer Gemini app and Workspace integration are widely criticized as slower, dumber, and over‑censored compared to the same models via API/Studio.
  • Confusion around model names (Pro vs Flash vs Lite vs Preview/Experimental) and around how “thinking” settings affect cost and behavior.
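The usual client‑side workaround for the preview rate limits above is retrying with jittered exponential backoff. A minimal sketch; `call` stands in for any Gemini API request, and which exceptions actually signal a 429 depends on your client library, so this version treats every exception as retryable:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a zero-argument `call` with jittered exponential backoff.

    In real use, catch only your client library's rate-limit error
    instead of bare Exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Sleep 2^attempt * base_delay seconds, plus jitter so that
            # concurrent clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage: result = with_backoff(lambda: client.models.generate_content(...))
```

This smooths over per‑minute caps but obviously cannot raise them; for evals or heavy dev workflows, paid billing remains the only real fix.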

Behavior, Guardrails, and Prompting

  • Several note Gemini has become less “refusal‑heavy” and less politically over‑tuned than earlier versions, with adjustable safety sliders on the API side.
  • Others still encounter over‑eager refactoring, verbose “robust error handling,” and difficulty constraining changes to small patches; prompt hacks (explicit rules repeated each message) help somewhat.
  • There is ongoing frustration that good results still require “speaking LLM” and detailed instructions, contradicting the marketing of “just talk to it.”

Google’s Strategic Position and Trust Issues

  • Many see Google’s custom TPUs, data sources (YouTube, Books, web crawl), and vertical integration as a long‑term advantage; some argue Google is “silently winning” the model race.
  • Counter‑views emphasize Google’s history of product shutdowns, enshittification, and ad‑driven incentives; reluctance to trust Gemini with sensitive data is common.
  • Free or very cheap access to strong models (2.5 Pro experimental, Flash) is seen as both a massive draw and a potential predatory loss‑leader.

Tooling and Ecosystem

  • Gemini is increasingly used with third‑party tools (Aider, Cline, Roo Code, Raycast, Big‑AGI, etc.) where it can compete head‑to‑head with Anthropic and OpenAI.
  • Lack of a first‑party, Claude‑Code‑style desktop agent and weaker Gemini app UX are considered major gaps, even by users who prefer Google’s models.