Gemini Flash

Model Capabilities & Long Context

  • Headline feature is 1M-token context; many see it as enabling “dump the whole textbook/codebase” workflows without RAG or manual filtering.
  • Others say practical value is limited: long prompts increase cost and latency, and models often degrade beyond ~128k tokens or forget instructions.
  • Several report Gemini 1.5 Pro/Flash becoming unstable or slow with very large inputs; at least one user saw crashes with near-1M-token prompts.
  • Some debate whether a single embedding vector can adequately represent such a long context, citing concerns about compression limits, attention sparsity, and retrieval quality.

Pricing & Context Caching

  • Flash is cheaper than GPT‑3.5 Turbo on both input and output tokens, especially for multimodal tasks.
  • Long-context pricing doubles for prompts past 128k tokens. The input cost of a 1M-token exchange is non-trivial and recurs on every round, since the full context is resent each time.
  • “Context caching” for 1.5 Pro can halve prompt cost for shared prefixes but adds an hourly cache fee, so commenters see it as economical only at higher request rates.
  • Some note that cloud LLM costs make generous user prompting hard for indie devs compared to cheap app hosting.
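
The pricing points above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the base price, and the hourly cache fee are hypothetical placeholders, not real Gemini rates. It also assumes that a prompt over the 128k threshold is billed entirely at the doubled rate, and that caching halves the cost of the shared prefix.

```python
# Minimal cost sketch. All prices are HYPOTHETICAL placeholders, not real rates.

def input_cost(prompt_tokens, base_price_per_million,
               threshold=128_000, multiplier=2.0):
    """Input cost of a single prompt.

    ASSUMPTION: once the prompt exceeds `threshold` tokens, the whole
    prompt is billed at `multiplier` times the base rate.
    """
    rate = base_price_per_million
    if prompt_tokens > threshold:
        rate *= multiplier
    return prompt_tokens / 1_000_000 * rate


def conversation_input_cost(prompt_tokens, rounds, base_price_per_million):
    """Input cost when the same long context is resent on every round."""
    return rounds * input_cost(prompt_tokens, base_price_per_million)


def cache_break_even_requests_per_hour(prefix_cost, hourly_cache_fee,
                                       cached_discount=0.5):
    """Requests per hour at which caching starts to pay for itself.

    Without caching, r requests cost r * prefix_cost.
    With caching, they cost r * prefix_cost * (1 - cached_discount)
    plus the hourly fee. Setting the two equal and solving for r gives
    r = hourly_cache_fee / (prefix_cost * cached_discount).
    """
    savings_per_request = prefix_cost * cached_discount
    return hourly_cache_fee / savings_per_request


if __name__ == "__main__":
    base = 1.0  # hypothetical price per million input tokens
    one_prompt = input_cost(1_000_000, base)
    print("one 1M-token prompt:", one_prompt)
    print("five rounds:", conversation_input_cost(1_000_000, 5, base))
    print("break-even requests/hour:",
          cache_break_even_requests_per_hour(one_prompt, hourly_cache_fee=4.0))
```

Below the break-even rate the hourly fee outweighs the per-request savings, which matches the view that caching only pays off at higher request rates.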

Quality, Hallucinations & Benchmarks

  • Multiple users describe Gemini 1.5 Pro (and by implication Flash) as significantly worse than GPT‑4/4o and Claude 3 Opus, especially in code, audio/video understanding, and hallucination rate.
  • Others find it “intelligent enough” and particularly valuable when it can ingest entire codebases or large document sets.
  • Benchmarks shared (e.g., NYT Connections) place Flash notably below top frontier models and below Gemini 1.5 Pro.
  • Some distrust Google’s benchmark claims and model story (e.g., confusion around “Ultra”), though others point out 1.5 Pro scores competitively on certain leaderboards.

Ecosystem, Commoditization & Branding

  • Many frame this as part of a “race to the bottom” on price, suggesting LLM APIs are becoming commodity-like, with switching mainly constrained by quality and integration.
  • Google is seen as leveraging cloud scale and cheaper TPUs rather than clear technical superiority.
  • Commenters complain about fragmented, unclear Gemini pricing pages and model names. OpenAI is criticized for clunky naming, while Google is praised for the “Gemini” branding but criticized for product sprawl.

Safety & Control

  • Concerns about “safety” triggers blocking use cases; some note these can be disabled via the API but still dislike corporate control over speech.