2024-05-14

Gemini Flash

Model Capabilities & Long Context

Headline feature is 1M-token context; many see it as enabling “dump the whole textbook/codebase” workflows without RAG or manual filtering.
Others say practical value is limited: long prompts increase cost and latency, and models often degrade beyond ~128k tokens or forget instructions.
Several report 1.5 Pro/Flash becoming unstable or slow with very large inputs; at least one user saw crashes with near-1M-token prompts.
Some debate whether a single embedding vector can adequately represent such a long context, citing concerns about compression limits, attention sparsity, and retrieval quality.

Pricing & Context Caching

Flash is cheaper than GPT‑3.5 Turbo on both input and output tokens, especially for multimodal tasks.
Long-context pricing doubles for prompts past 128k tokens. The input cost of a 1M-token exchange is non-trivial and recurs each round, because the full context is billed again with every request.
“Context caching” for 1.5 Pro can halve prompt cost for shared prefixes but adds an hourly cache fee, seen as only economical at higher request rates.
Some note that cloud LLM costs make generous user prompting hard for indie devs compared to cheap app hosting.
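The pricing points above come down to simple arithmetic, sketched below. Every number and name here is a hypothetical placeholder, not an actual Gemini rate: the `Pricing` fields, the assumption that the doubled rate applies to the whole prompt once it crosses the threshold, and the assumption that the cache fee is billed per million cached tokens per hour. The aim is only to show why a cached prefix pays off above a certain request rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical rates for illustration only; not real Gemini prices."""
    input_per_mtok: float           # USD per 1M input tokens below the threshold
    long_context_multiplier: float  # 2.0 models "pricing doubles" past the threshold
    threshold_tokens: int           # prompt size where the long-context rate starts
    cached_fraction: float          # share of the input price paid for cached tokens
    cache_fee_per_mtok_hour: float  # USD per 1M cached tokens per hour of storage


def input_cost(p: Pricing, prompt_tokens: int) -> float:
    """Input cost of one request (assumes the higher rate covers the whole prompt)."""
    rate = p.input_per_mtok
    if prompt_tokens > p.threshold_tokens:
        rate *= p.long_context_multiplier
    return rate * prompt_tokens / 1_000_000


def conversation_input_cost(p: Pricing, prompt_tokens: int, rounds: int) -> float:
    """The full context is billed again on every round."""
    return rounds * input_cost(p, prompt_tokens)


def breakeven_requests_per_hour(p: Pricing, prefix_tokens: int) -> float:
    """Requests per hour at which the cache discount equals the hourly cache fee."""
    saving_per_request = input_cost(p, prefix_tokens) * (1.0 - p.cached_fraction)
    hourly_fee = p.cache_fee_per_mtok_hour * prefix_tokens / 1_000_000
    return hourly_fee / saving_per_request


# Placeholder numbers chosen to make the arithmetic easy to follow.
example = Pricing(
    input_per_mtok=1.0,
    long_context_multiplier=2.0,
    threshold_tokens=128_000,
    cached_fraction=0.5,
    cache_fee_per_mtok_hour=4.0,
)

if __name__ == "__main__":
    print(input_cost(example, 1_000_000))
    print(conversation_input_cost(example, 1_000_000, rounds=10))
    print(breakeven_requests_per_hour(example, 1_000_000))
```

With these placeholder rates, a 1M-token prefix saves 1.00 USD per request when cached and costs 4.00 USD per hour to keep, so caching only comes out ahead above four requests per hour. Real rates would shift that threshold, but the shape of the trade-off is the same.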

Quality, Hallucinations & Benchmarks

Multiple users describe Gemini 1.5 Pro (and by implication Flash) as significantly worse than GPT‑4/4o and Claude 3 Opus, especially in code, audio/video understanding, and hallucination rate.
Others find it “intelligent enough” and particularly valuable when it can ingest entire codebases or large document sets.
Benchmarks shared (e.g., NYT Connections) place Flash notably below top frontier models and below Gemini 1.5 Pro.
Some distrust Google’s benchmark claims and model story (e.g., confusion around “Ultra”), though others point out 1.5 Pro scores competitively on certain leaderboards.

Ecosystem, Commoditization & Branding

Many frame this as part of a “race to the bottom” on price, suggesting LLM APIs are becoming commodity-like, with switching mainly constrained by quality and integration.
Google is seen as leveraging cloud scale and cheaper TPUs rather than clear technical superiority.
Complaints about fragmented, unclear Gemini pricing pages and model names. OpenAI is criticized for clunky naming; Google is praised for the “Gemini” branding but criticized for product sprawl.

Safety & Control

Concerns about “safety” triggers blocking use cases; some note they can be disabled via API but still dislike corporate control over speech.
