Gemini Flash
Model Capabilities & Long Context
- Headline feature is 1M-token context; many see it as enabling “dump the whole textbook/codebase” workflows without RAG or manual filtering.
- Others say practical value is limited: long prompts increase cost and latency, and models often degrade beyond ~128k tokens or forget instructions.
- Several report 1.5 Pro/Flash becoming unstable or slow with very large inputs; at least one user saw crashes with near-1M-token prompts.
- Discussion on whether a single embedding vector can adequately represent such long context; concerns about compression limits, attention sparsity, and retrieval quality.
Pricing & Context Caching
- Flash is cheaper than GPT‑3.5 Turbo on both input and output tokens, especially for multimodal tasks.
- Long-context pricing doubles past 128k tokens. Input cost for a 1M-token exchange is non-trivial and recurs each round, since the full context is resent with every request.
- “Context caching” for 1.5 Pro can halve prompt cost for shared prefixes but adds an hourly cache fee, seen as only economical at higher request rates.
- Some note that cloud LLM costs make it hard for indie devs to offer users generous prompting, in contrast to how cheap app hosting is.
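The pricing points above reduce to simple arithmetic. Below is a minimal sketch of the per-request input cost and of the request rate at which caching starts to pay off. It assumes that the doubled rate applies to the whole prompt once it exceeds 128k tokens, that a cached prefix is billed at half price, and that the cache fee is a flat hourly amount. The function names and all prices are hypothetical placeholders, not published rates.

```python
# Hypothetical sketch: every price used here is a placeholder, not a real rate.
LONG_CONTEXT_THRESHOLD = 128_000  # tokens; the threshold cited in the discussion


def input_cost(prompt_tokens: int, base_price_per_mtok: float) -> float:
    """Input cost of one request.

    Assumption: once the prompt exceeds the threshold, the doubled
    per-token rate applies to the entire prompt.
    """
    multiplier = 2 if prompt_tokens > LONG_CONTEXT_THRESHOLD else 1
    return prompt_tokens / 1_000_000 * base_price_per_mtok * multiplier


def breakeven_requests_per_hour(
    prefix_tokens: int,
    base_price_per_mtok: float,
    hourly_cache_fee: float,
    cached_fraction: float = 0.5,  # assumption: cached prefix costs half
) -> float:
    """Requests per hour above which caching the shared prefix is cheaper."""
    full_cost = input_cost(prefix_tokens, base_price_per_mtok)
    saving_per_request = full_cost * (1 - cached_fraction)
    return hourly_cache_fee / saving_per_request


if __name__ == "__main__":
    # Placeholder numbers: 1M-token shared prefix, 4.0 per million tokens,
    # 4.0 per hour to keep the cache alive.
    per_round = input_cost(1_000_000, 4.0)
    rate = breakeven_requests_per_hour(1_000_000, 4.0, 4.0)
    print(f"input cost per round: {per_round}")
    print(f"break-even requests per hour: {rate}")
```

Below the break-even rate the hourly fee outweighs the per-request saving, which matches the observation that caching only makes sense at higher request rates.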
Quality, Hallucinations & Benchmarks
- Multiple users describe Gemini 1.5 Pro (and by implication Flash) as significantly worse than GPT‑4/4o and Claude 3 Opus, especially in code, audio/video understanding, and hallucination rate.
- Others find it “intelligent enough” and particularly valuable when it can ingest entire codebases or large document sets.
- Benchmarks shared (e.g., NYT Connections) place Flash notably below top frontier models and below Gemini 1.5 Pro.
- Some distrust Google’s benchmark claims and model story (e.g., confusion around “Ultra”), though others point out 1.5 Pro scores competitively on certain leaderboards.
Ecosystem, Commoditization & Branding
- Many frame this as part of a “race to the bottom” on price, suggesting LLM APIs are becoming commodity-like, with switching mainly constrained by quality and integration.
- Google is seen as leveraging cloud scale and cheaper TPUs rather than clear technical superiority.
- Complaints about fragmented/unclear Gemini pricing pages and model names; OpenAI is criticized for clunky naming, while Google is praised for the “Gemini” branding but criticized for product sprawl.
Safety & Control
- Concerns about “safety” triggers blocking use cases; some note they can be disabled via API but still dislike corporate control over speech.