Gemini 2.5 Pro Preview
Billing, Credits & Monitoring
- Many want prepaid credits, hard caps, and near‑real‑time usage dashboards “like every other vendor”; Google’s current billing is seen as opaque and laggy.
- Several recommend intermediaries (Deepinfra, OpenRouter, LiteLLM + Langfuse, LLM Ops tools) to meter usage, add per‑project keys, and avoid direct billing surprises.
- Multiple reports of unexpectedly high Gemini 2.5 Pro bills, especially for coding use; some switch to wallet‑based routers to guarantee spend caps (a metering sketch follows this list).
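As a concrete illustration of the client‑side metering approach, here is a minimal sketch assuming an OpenAI‑compatible router endpoint (OpenRouter's URL is shown for illustration; the model id, per‑token prices, and the SpendGuard helper are assumptions, not published rates):

```python
from openai import OpenAI

# Illustrative per-token prices (USD); check your provider's current rates.
PROMPT_PRICE = 1.25 / 1_000_000
COMPLETION_PRICE = 10.00 / 1_000_000

class SpendGuard:
    """Client-side spend tracker with a hard cap (hypothetical helper)."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.spent_usd += (prompt_tokens * PROMPT_PRICE
                           + completion_tokens * COMPLETION_PRICE)

    def check(self) -> None:
        if self.spent_usd >= self.cap_usd:
            raise RuntimeError(f"Spend cap reached: ${self.spent_usd:.2f}")

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")
guard = SpendGuard(cap_usd=5.00)

def ask(prompt: str) -> str:
    guard.check()  # refuse before the call, not after the invoice
    resp = client.chat.completions.create(
        model="google/gemini-2.5-pro-preview",  # router-side id; verify
        messages=[{"role": "user", "content": prompt}],
    )
    guard.charge(resp.usage.prompt_tokens, resp.usage.completion_tokens)
    return resp.choices[0].message.content
```

Note this only approximates a cap after each response lands; the wallet‑based routers commenters mention enforce the limit server‑side, which is the stronger guarantee.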
Frontend & Web Dev Experience
- Some claim “design‑to‑code” FE work is effectively automated: using Cline + Gemini 2.5 + MCP tools (Figma, Playwright) to turn designs into production‑quality UI in hours, even in large codebases (leveraging the 1M‑token context); a Playwright sketch follows this list.
- Tailwind + component libraries (shadcn, Material, MUI) are seen as making models dramatically more reliable for styling vs raw CSS.
- Others report frequent visual/UX shallowness, CSS mistakes, and still needing significant manual polishing—acceptable scaffolding, not “ship‑ready” UI.
- Debate over urgency: early adopters argue devs must adapt quickly; skeptics say it’s easy to “turn on AI later” and catch up.
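On the Playwright piece of that workflow, the loop commenters describe boils down to: render the generated UI, capture a screenshot, and hand the image back to the model for a critique/fix pass. A minimal sketch, assuming Playwright for Python and a hypothetical dev server on localhost:3000:

```python
from playwright.sync_api import sync_playwright

def screenshot_ui(url: str, path: str) -> None:
    """Render the freshly generated UI and capture it for model review."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=path, full_page=True)
        browser.close()

# Hypothetical local dev server; the screenshot then goes back into the
# model's context (Gemini 2.5 accepts images) for a critique/fix pass.
screenshot_ui("http://localhost:3000", "ui_snapshot.png")
```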
Code Style: Comments, Defensiveness & Refactors
- Huge recurring complaint: Gemini 2.5 Pro produces excessively commented, “junior‑style” code and over‑defensive patterns (e.g., broad except Exception blocks, placeholder objects); see the sketch after this list.
- Many say even strong negative prompts (“no comments whatsoever”) often fail, especially via tools like Cursor; some suspect hidden system prompts pushing heavy commentary.
- Some see comments as beneficial for LLM‑driven maintenance and personal review, then strip them in a final pass (manually or via script).
- Another pain point: over‑eager refactoring and large diffs when only small changes are requested; Claude Code is praised as more conservative here.
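To make the style complaint concrete, here is an invented example of the pattern being criticized next to the tighter version reviewers say they end up writing by hand (both functions are hypothetical):

```python
import json

# The complained-about style: play-by-play comments, a broad
# except Exception, and a silent placeholder instead of a loud failure.
def load_config_defensive(path):
    # Open the configuration file
    try:
        # Read the file contents
        with open(path) as f:
            # Parse the JSON data
            return json.loads(f.read())
    except Exception:
        # Fall back to a default config if anything at all goes wrong
        return {}

# The requested style: no narration, and unexpected errors surface.
def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```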
Hallucinations, API Usage & Reliability
- Several note Gemini 2.5 (Pro and Flash) hallucinates APIs less than prior models and often replaces Stack Overflow for routine tasks; still, it can confidently invent IAM permissions, Loki functions, or framework APIs.
- Users mitigate by dumping official docs into context, using IDE tools/LSP to validate completions, wiring MCP/tools to library docs, or asking the model explicitly to cross‑check with documentation (see the sketch after this list).
- Long philosophical subthread on whether we should expect calculator‑level reliability vs “intern‑like” fallibility, and what it means for an LLM to “know” something.
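A sketch of the “dump official docs into context” mitigation, assuming the google-genai Python SDK (the exact client surface may differ by version) and a hypothetical local snapshot of the relevant reference docs:

```python
from google import genai

client = genai.Client(api_key="...")  # or rely on GOOGLE_API_KEY env var

# Hypothetical local snapshot of the official reference docs; the point is
# that the model answers against real documentation instead of memory.
with open("docs/loki_logql_functions.md") as f:
    docs = f.read()

prompt = (
    "Using ONLY the documentation below, write a LogQL query that counts "
    "error-level log lines per service over 5m. If the docs don't cover "
    "something, say so instead of guessing.\n\n--- DOCS ---\n" + docs
)

resp = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",  # dated checkpoint pin
    contents=prompt,
)
print(resp.text)
```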
Model Comparisons & Benchmarks
- Mixed reports on relative quality:
  - Many find Gemini 2.5 Pro the best all‑around coding model today, especially for large‑context, agentic workflows.
  - Others prefer Claude 3.7 Sonnet (or 3.5) for aesthetics, restraint, and maintainable diffs; some say Claude 3.7 regressed vs 3.5.
  - Some users report Grok 3 outperforming Gemini on their coding tasks (notably refactors), especially when subsidized.
- Discussion of benchmarks:
  - Google’s own model card for the new 05‑06 checkpoint shows slightly worse scores than the 03‑25 version on most non‑coding benchmarks, but a better LiveCodeBench score.
  - This prompts speculation that extra tuning for coding or cost efficiency caused mild “catastrophic forgetting” elsewhere, which Google didn’t foreground in the marketing.
UI, Product & Versioning Issues
- Gemini web/app UI is widely criticized: scroll‑jacking during streaming, heavy memory use on long chats, mobile flakiness, missing or awkward affordances (e.g., code copy buttons, TSX upload rejection unless renamed).
- AI Studio is generally regarded as the better interface for developers, though it has its own quirks.
- Considerable confusion around naming: “2.5 Pro (experimental)”, “2.5 Pro preview 03‑25 vs 05‑06”, “exp” vs “preview”, and no clear semantic versioning. Many want simple date‑ or semver‑style versions and stable pins (see the pinning sketch after this list).
- Some integrations (e.g., VS Code Copilot Gemini backend) are reportedly still broken or lagging the latest checkpoint.
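On pinning specifically: the dated checkpoints are distinct model ids, so the workaround is to pin those strings rather than a floating alias. A small illustrative sketch (the alias name is hypothetical; verify current ids before relying on them):

```python
# Floating alias: silently tracks whatever checkpoint the provider
# promotes (hypothetical alias name, for illustration only).
MODEL_LATEST = "gemini-2.5-pro-preview"

# Dated pins: intended to stay fixed until you change the string yourself
# (though reportedly even dated names have been re-pointed at newer
# checkpoints in the past, which is exactly the stability complaint).
MODEL_PINNED_PROD = "gemini-2.5-pro-preview-03-25"
MODEL_PINNED_EVAL = "gemini-2.5-pro-preview-05-06"
```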
Adoption, Workflows & Future of SWE
- In practice, many teams use Gemini 2.5 Pro via Cursor, Aider, Cline, RooCode, or GitHub Copilot; others are blocked by corporate policies that only allow Microsoft Copilot or ban external AI entirely.
- Common “best‑use” pattern: let Gemini handle boilerplate, UI wiring, tests, and small bug hunts, while humans own architecture, abstractions, and product decisions.
- Long debate over whether current models can or will soon handle high‑level architecture and abstraction: some foresee imminent super‑human design; others see clear limits in context, reasoning, and alignment with messy organizational reality.
- Concerns surface about junior devs’ career paths in an AI‑heavy world, but there’s also a strong view that AI mostly amplifies productive engineers rather than replacing them outright, at least for now.