Gemini 2.5 Flash
Pricing, “Reasoning” Mode, and Pareto Positioning
- 2.5 Flash is ~50% more expensive than 2.0 Flash but still viewed as very cheap vs frontier models; some see it as a good new point on the price–performance “Pareto frontier.”
- Huge price gap between non‑reasoning and reasoning (≈6× on output tokens) confuses people: it contradicts the “just sprinkle in tokens” mental model.
- Clarified in external docs: when “thinking” is on, all output tokens (including hidden thought tokens) are billed at the higher rate.
- Several commenters suspect pricing is driven more by market positioning than raw compute cost, leaving room for future price cuts.
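A minimal sketch of how that billing rule plays out, assuming the google-genai Python SDK’s thinking_budget knob; the per-token prices are illustrative placeholders rather than figures quoted from the thread:

```python
# Sketch: toggling "thinking" on 2.5 Flash and estimating output cost.
# Assumes the google-genai SDK; prices are illustrative placeholders.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

PRICE_PER_M_OUT = 0.60        # $/1M output tokens, thinking off (placeholder)
PRICE_PER_M_OUT_THINK = 3.50  # $/1M output tokens, thinking on (placeholder)

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Summarize the trade-offs of reasoning-mode pricing.",
    # thinking_budget=0 disables thinking; any positive budget switches
    # ALL output tokens (visible answer + hidden thoughts) to the higher rate.
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)

out_tokens = response.usage_metadata.candidates_token_count or 0
thought_tokens = getattr(response.usage_metadata, "thoughts_token_count", 0) or 0
billable = out_tokens + thought_tokens
print(f"billable output tokens: {billable}")
print(f"est. cost: ${billable / 1e6 * PRICE_PER_M_OUT_THINK:.6f}")
```

Note the asymmetry this implies: turning thinking on doesn’t just add thought tokens at the cheap rate, it reprices the entire output, which is exactly the point commenters found counterintuitive.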
Performance vs Other Models
- Many report 2.5 Pro as a big leap: strong at coding, “deep research,” and reading large codebases; several cancelled other subscriptions in favor of Gemini.
- 2.5 Flash is seen as great “bang for the buck,” especially for classification, attribute extraction, OCR, and large‑scale PDF→JSON extraction; some say it beats specialized OCR services on cost/accuracy.
- Others note that OpenAI’s o4‑mini outperforms 2.5 Flash on key benchmarks (e.g., AIME, MMMU), though at significantly higher cost; once 2.5 Flash’s reasoning mode is enabled, its own cost rises and the gap narrows.
- Mixed comparisons to Claude 3.7 Sonnet and DeepSeek: some find Gemini more reliable on real codebases and agentic workflows, others still prefer Claude or DeepSeek for predictable, narrow edits.
Use Cases and Capabilities
- Popular uses:
- Bulk text classification and extraction with acceptable error rates when combined with verification or human review (see the extraction sketch after this list).
- Large‑context coding assistance, refactors, and bug‑finding on 70k+ token repos.
- Multimodal tasks like diagram understanding, video ingestion, PDF bank/invoice parsing, and financial data summarization.
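A minimal sketch of the PDF→JSON extraction pattern, assuming the google-genai SDK’s structured-output support with a Pydantic schema; the Invoice fields and file name are hypothetical:

```python
# Sketch: PDF -> structured JSON extraction via a response schema.
# Assumes the google-genai SDK; the Invoice fields are hypothetical.
import pathlib
from pydantic import BaseModel
from google import genai
from google.genai import types

class LineItem(BaseModel):
    description: str
    amount: float

class Invoice(BaseModel):
    vendor: str
    invoice_date: str
    total: float
    line_items: list[LineItem]

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

pdf_bytes = pathlib.Path("invoice.pdf").read_bytes()  # placeholder file
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=[
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "Extract the invoice fields.",
    ],
    # Constrained output: the model must emit JSON matching the schema,
    # which is what keeps bulk pipelines cheap to verify downstream.
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Invoice,
    ),
)
invoice = Invoice.model_validate_json(response.text)
print(invoice.vendor, invoice.total)
```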
- New image features: bounding boxes and segmentation masks from images; interesting but currently weaker than dedicated vision models on precision.
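A sketch of consuming the bounding-box feature, assuming the documented convention that boxes come back as [ymin, xmin, ymax, xmax] normalized to 0–1000; the prompt wording and JSON field names are assumptions:

```python
# Sketch: asking 2.5 Flash for bounding boxes and scaling them to pixels.
# Assumes [ymin, xmin, ymax, xmax] normalized to 0-1000; prompt wording
# and field names are assumptions, not a fixed API contract.
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

image_bytes = open("diagram.png", "rb").read()  # placeholder image
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        'Detect all labeled components. Return JSON: '
        '[{"label": str, "box_2d": [ymin, xmin, ymax, xmax]}]',
    ],
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)

width, height = 1920, 1080  # real code would read these from the image
for det in json.loads(response.text):
    ymin, xmin, ymax, xmax = det["box_2d"]
    # Coordinates arrive normalized to 0-1000 regardless of image size.
    px_box = (xmin / 1000 * width, ymin / 1000 * height,
              xmax / 1000 * width, ymax / 1000 * height)
    print(det["label"], px_box)
```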
- Built‑in Python code execution via the API is highlighted as a powerful, under‑advertised capability.
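A minimal sketch of enabling that tool, assuming the google-genai SDK’s Tool/ToolCodeExecution types; the response-parsing field names reflect that SDK:

```python
# Sketch: enabling the built-in Python code execution tool.
# Assumes the google-genai SDK's Tool/ToolCodeExecution types.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="What is the sum of the first 50 prime numbers? "
             "Write and run Python code to compute it.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# The response interleaves text, the generated code, and its sandbox output.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print("--- code ---\n", part.executable_code.code)
    if part.code_execution_result:
        print("--- result ---\n", part.code_execution_result.output)
```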
UX, Rate Limits, and Product Fragmentation
- Strong complaints about preview rate limits and low free‑tier token‑per‑minute caps; hard to run evals or heavy dev workflows without paid billing.
- High time‑to‑first‑token latency and occasional downgrades to older models under load are noted.
- AI Studio and API are praised; the consumer Gemini app and Workspace integration are widely criticized as slower, dumber, and over‑censored compared to the same models via API/Studio.
- Confusion around model names (Pro vs Flash vs Lite vs Preview/Experimental) and around how “thinking” settings affect cost and behavior.
Behavior, Guardrails, and Prompting
- Several note Gemini has become less “refusal‑heavy” and less politically over‑tuned than earlier versions, with adjustable safety sliders on the API side (see the config sketch after this list).
- Others still encounter over‑eager refactoring, verbose “robust error handling,” and difficulty constraining changes to small patches; prompt hacks (explicit rules repeated each message) help somewhat.
- There is ongoing frustration that good results still require “speaking LLM” and detailed instructions, contradicting the marketing of “just talk to it.”
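A minimal sketch of the API-side knobs discussed above, assuming the google-genai SDK; the “small patch” rule text is illustrative, standing in for the kinds of repeated constraints commenters describe:

```python
# Sketch: API-side safety sliders plus repeated "small patch" rules.
# Assumes the google-genai SDK; the rule wording is illustrative.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

RULES = (
    "Rules: change only the lines needed to fix the bug. "
    "No drive-by refactors. No added error handling unless asked."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=RULES + "\n\nFix the off-by-one in `paginate()` below:\n...",
    config=types.GenerateContentConfig(
        # The adjustable "safety sliders": per-category block thresholds.
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
        ],
        # Repeating the constraints as a system instruction, in addition to
        # the per-message prefix, is the "prompt hack" users report helps.
        system_instruction=RULES,
    ),
)
print(response.text)
```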
Google’s Strategic Position and Trust Issues
- Many see Google’s custom TPUs, data sources (YouTube, Books, web crawl), and vertical integration as a long‑term advantage; some argue Google is “silently winning” the model race.
- Counter‑views emphasize Google’s history of product shutdowns, enshittification, and ad‑driven incentives; reluctance to trust Gemini with sensitive data is common.
- Free or very cheap access to strong models (2.5 Pro experimental, Flash) is seen as both a massive draw and a potential predatory loss‑leader.
Tooling and Ecosystem
- Gemini is increasingly used with third‑party tools (Aider, Cline, Roo Code, Raycast, Big‑AGI, etc.) where it can compete head‑to‑head with Anthropic and OpenAI.
- Lack of a first‑party, Claude‑Code‑style desktop agent and weaker Gemini app UX are considered major gaps, even by users who prefer Google’s models.