Gemini 2.5 Pro Preview
Billing, Credits & Monitoring
- Many want prepaid credits, hard caps, and near‑real‑time usage dashboards “like every other vendor”; Google’s current billing is seen as opaque and laggy.
- Several recommend intermediaries (Deepinfra, OpenRouter, LiteLLM + Langfuse, LLM Ops tools) to meter usage, add per‑project keys, and avoid direct billing surprises.
- Multiple reports of unexpectedly high Gemini 2.5 Pro bills, especially for coding use; some switch to wallet‑based routers to guarantee spend caps (a metering sketch follows this list).
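As a concrete illustration of the client‑side metering approach, here is a minimal sketch assuming an OpenAI‑compatible router endpoint (OpenRouter's URL is shown for illustration; the model id, per‑token prices, and the SpendGuard helper are assumptions, not published rates):

```python
from openai import OpenAI

# Illustrative per-token prices (USD); check your provider's current rates.
PROMPT_PRICE = 1.25 / 1_000_000
COMPLETION_PRICE = 10.00 / 1_000_000

class SpendGuard:
    """Client-side spend tracker with a hard cap (hypothetical helper)."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.spent_usd += (prompt_tokens * PROMPT_PRICE
                           + completion_tokens * COMPLETION_PRICE)

    def check(self) -> None:
        if self.spent_usd >= self.cap_usd:
            raise RuntimeError(f"Spend cap reached: ${self.spent_usd:.2f}")

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")
guard = SpendGuard(cap_usd=5.00)

def ask(prompt: str) -> str:
    guard.check()  # refuse before the call, not after the invoice
    resp = client.chat.completions.create(
        model="google/gemini-2.5-pro-preview",  # router-side id; verify
        messages=[{"role": "user", "content": prompt}],
    )
    guard.charge(resp.usage.prompt_tokens, resp.usage.completion_tokens)
    return resp.choices[0].message.content
```

Note this only approximates a cap after each response lands; the wallet‑based routers commenters mention enforce the limit server‑side, which is the stronger guarantee.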
Frontend & Web Dev Experience
- Some claim “design‑to‑code” FE work is effectively automated: using Cline + Gemini 2.5 + MCP tools (Figma, Playwright) to turn designs into production‑quality UI in hours, even in large codebases (leveraging the 1M‑token context); a Playwright sketch follows this list.
- Tailwind + component libraries (shadcn, Material, MUI) are seen as making models dramatically more reliable for styling vs raw CSS.
- Others report frequent visual/UX shallowness, CSS mistakes, and still needing significant manual polishing—acceptable scaffolding, not “ship‑ready” UI.
- Debate over urgency: early adopters argue devs must adapt quickly; skeptics say it’s easy to “turn on AI later” and catch up.
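On the Playwright piece of that workflow, the loop commenters describe boils down to: render the generated UI, capture a screenshot, and hand the image back to the model for a critique/fix pass. A minimal sketch, assuming Playwright for Python and a hypothetical dev server on localhost:3000:

```python
from playwright.sync_api import sync_playwright

def screenshot_ui(url: str, path: str) -> None:
    """Render the freshly generated UI and capture it for model review."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=path, full_page=True)
        browser.close()

# Hypothetical local dev server; the screenshot then goes back into the
# model's context (Gemini 2.5 accepts images) for a critique/fix pass.
screenshot_ui("http://localhost:3000", "ui_snapshot.png")
```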
Code Style: Comments, Defensiveness & Refactors
- Huge recurring complaint: Gemini 2.5 Pro produces excessively commented, “junior‑style” code and over‑defensive patterns (e.g., broad except Exception blocks, placeholder objects); see the sketch after this list.
- Many say even strong negative prompts (“no comments whatsoever”) often fail, especially via tools like Cursor; some suspect hidden system prompts pushing heavy commentary.
- Some see comments as beneficial for LLM‑driven maintenance and personal review, then strip them in a final pass (manually or via script).
- Another pain point: over‑eager refactoring and large diffs when only small changes are requested; Claude Code is praised as more conservative here.
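To make the style complaint concrete, here is an invented example of the pattern being criticized next to the tighter version reviewers say they end up writing by hand (both functions are hypothetical):

```python
import json

# The complained-about style: play-by-play comments, a broad
# except Exception, and a silent placeholder instead of a loud failure.
def load_config_defensive(path):
    # Open the configuration file
    try:
        # Read the file contents
        with open(path) as f:
            # Parse the JSON data
            return json.loads(f.read())
    except Exception:
        # Fall back to a default config if anything at all goes wrong
        return {}

# The requested style: no narration, and unexpected errors surface.
def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```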
Hallucinations, API Usage & Reliability
- Several note Gemini 2.5 (Pro and Flash) hallucinates APIs less than prior models and often replaces Stack Overflow for routine tasks; still, it can confidently invent IAM permissions, Loki functions, or framework APIs.
- Users mitigate by dumping official docs into context, using IDE tools/LSP to validate completions, wiring MCP/tools to library docs, or asking the model explicitly to cross‑check with documentation (see the sketch after this list).
- Long philosophical subthread on whether we should expect calculator‑level reliability vs “intern‑like” fallibility, and what it means for an LLM to “know” something.
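A sketch of the “dump official docs into context” mitigation, assuming the google-genai Python SDK (the exact client surface may differ by version) and a hypothetical local snapshot of the relevant reference docs:

```python
from google import genai

client = genai.Client(api_key="...")  # or rely on GOOGLE_API_KEY env var

# Hypothetical local snapshot of the official reference docs; the point is
# that the model answers against real documentation instead of memory.
with open("docs/loki_logql_functions.md") as f:
    docs = f.read()

prompt = (
    "Using ONLY the documentation below, write a LogQL query that counts "
    "error-level log lines per service over 5m. If the docs don't cover "
    "something, say so instead of guessing.\n\n--- DOCS ---\n" + docs
)

resp = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",  # dated checkpoint pin
    contents=prompt,
)
print(resp.text)
```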
Model Comparisons & Benchmarks
- Mixed reports on relative quality:
  - Many find Gemini 2.5 Pro the best all‑around coding model today, especially for large‑context, agentic workflows.
  - Others prefer Claude 3.7 Sonnet (or 3.5) for aesthetics, restraint, and maintainable diffs; some say Claude 3.7 regressed vs 3.5.
  - Some users report Grok 3 outperforming Gemini on their coding tasks (notably refactors), especially when subsidized.
- Discussion of benchmarks:
  - Google’s own model card for the new 05‑06 checkpoint shows slightly worse scores than the 03‑25 version on most non‑coding benchmarks, but a better LiveCodeBench score.
  - This prompts speculation that extra tuning for coding or cost efficiency caused mild “catastrophic forgetting” elsewhere, which Google didn’t foreground in the marketing.
UI, Product & Versioning Issues
- Gemini web/app UI is widely criticized: scroll‑jacking during streaming, heavy memory use on long chats, mobile flakiness, missing or awkward affordances (e.g., code copy buttons, TSX upload rejection unless renamed).
- AI Studio is generally regarded as the better interface for developers, though it has its own quirks.
- Considerable confusion around naming: “2.5 Pro (experimental)”, “2.5 Pro preview 03‑25 vs 05‑06”, “exp” vs “preview”, and no clear semantic versioning. Many want simple date‑ or semver‑style versions and stable pins (see the pinning sketch after this list).
- Some integrations (e.g., VS Code Copilot Gemini backend) are reportedly still broken or lagging the latest checkpoint.
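On pinning specifically: the dated checkpoints are distinct model ids, so the workaround is to pin those strings rather than a floating alias. A small illustrative sketch (the alias name is hypothetical; verify current ids before relying on them):

```python
# Floating alias: silently tracks whatever checkpoint the provider
# promotes (hypothetical alias name, for illustration only).
MODEL_LATEST = "gemini-2.5-pro-preview"

# Dated pins: intended to stay fixed until you change the string yourself
# (though reportedly even dated names have been re-pointed at newer
# checkpoints in the past, which is exactly the stability complaint).
MODEL_PINNED_PROD = "gemini-2.5-pro-preview-03-25"
MODEL_PINNED_EVAL = "gemini-2.5-pro-preview-05-06"
```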
Adoption, Workflows & Future of SWE
- In practice, many teams use Gemini 2.5 Pro via Cursor, Aider, Cline, RooCode, or GitHub Copilot; others are blocked by corporate policies that only allow Microsoft Copilot or ban external AI entirely.
- Common “best‑use” pattern: let Gemini handle boilerplate, UI wiring, tests, and small bug hunts, while humans own architecture, abstractions, and product decisions.
- Long debate over whether current models can or will soon handle high‑level architecture and abstraction: some foresee imminent super‑human design; others see clear limits in context, reasoning, and alignment with messy organizational reality.
- Concerns surface about junior devs’ career paths in an AI‑heavy world, but there’s also a strong view that AI mostly amplifies productive engineers rather than replacing them outright, at least for now.