Google Gemini has the worst LLM API

Perceived squandered lead and corporate strategy

  • Several comments frame Gemini’s issues as part of a longer Google story: once API‑first and developer‑friendly, then increasingly inward‑facing since the Google+ era.
  • Some see this as driven by bureaucracy, headcount growth, and internal incentives that favor overlapping products over coherent platforms.
  • There’s debate over leadership: some say Google is floundering on innovation and execution; others point to strong financials and leadership in areas like self‑driving/LLMs as sufficient for investors.

Model quality vs developer experience

  • Many agree Gemini 2.5 Pro/Flash are excellent—especially long context, multimodality, and price—and compare favorably with competitors.
  • At the same time, dev experience is widely criticized as confusing, fragile, and overcomplicated relative to OpenAI/Anthropic.

Vertex vs Gemini vs AI Studio vs Firebase Gen AI

  • A major pain point is understanding how the offerings relate: Vertex AI (the enterprise surface), the Gemini API / AI Studio (the simpler developer surface), and Firebase Gen AI.
  • Users struggle with two near-duplicate APIs, different auth/billing behavior, and multiple partially overlapping SDKs.
  • Some advise: if you’re not already on GCP, avoid Vertex and just use AI Studio; others use Vertex specifically for compliance and data‑handling guarantees.
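
As an illustration of that split, the newer unified google-genai SDK reaches both surfaces from a single client; a minimal sketch (the API key, project ID, and region are placeholders, and constructor arguments should be checked against current SDK docs):

```python
from google import genai

# Gemini API / AI Studio surface: just an API key, no GCP project required.
studio_client = genai.Client(api_key="YOUR_API_KEY")

# Vertex AI surface: GCP project + region, authenticated via Application
# Default Credentials instead of an API key.
vertex_client = genai.Client(
    vertexai=True, project="your-gcp-project", location="us-central1"
)

# The call shape is the same on either surface.
resp = studio_client.models.generate_content(
    model="gemini-2.5-flash", contents="Hello"
)
print(resp.text)
```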

Authentication, IAM, and security

  • Opinions diverge: some find GCP auth/IAM “diabolically complex,” especially compared to simple API keys; others say it’s conceptually clean and more secure than AWS once understood.
  • Workload Identity Federation vs JSON key files is debated: one side sees static keys as “good enough,” the other sees them as an avoidable security risk.
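
The disagreement maps onto two concrete auth flows. A sketch of both, assuming the requests and google-auth packages (endpoint paths follow the public REST docs but should be verified):

```python
import requests
import google.auth
from google.auth.transport.requests import Request

body = {"contents": [{"parts": [{"text": "Hello"}]}]}

# 1) Gemini API / AI Studio: a static API key in a header. Simple, but the key
#    is a long-lived secret that has to be stored and rotated somewhere.
requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent",
    headers={"x-goog-api-key": "YOUR_API_KEY"},
    json=body,
)

# 2) Vertex AI: short-lived OAuth tokens via Application Default Credentials
#    (service account, Workload Identity Federation, or a local gcloud login).
creds, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(Request())  # mints a short-lived access token
requests.post(
    f"https://us-central1-aiplatform.googleapis.com/v1/projects/{project}"
    "/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent",
    headers={"Authorization": f"Bearer {creds.token}"},
    json=body,
)
```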

Quotas, reliability, and billing

  • Quotas and capacity contention on Gemini are described as a major operational risk; enterprise users mention needing a TAM or Provisioned Throughput, whose entry‑level commitment is seen as high.
  • New “Dynamic Shared Quota” is cited as progress, but monitoring actual usage vs limits is still awkward.
  • Billing dashboards (Vertex and Gemini) are widely criticized as confusing, delayed, and lacking hard caps/prepaid credit—some users instead route through OpenRouter just to control spend.
  • Reliability of the non‑Vertex Gemini API is questioned (outages, request rejections under load); others say it works fine for them.
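
A standard client-side mitigation for quota contention and load shedding (a generic pattern, not something prescribed in the thread) is retrying with exponential backoff and jitter on 429/503-style responses; a sketch:

```python
import random
import time
import requests

RETRYABLE = {429, 500, 503}  # rate-limited / overloaded / transient errors

def post_with_backoff(url, headers, body, max_attempts=5):
    """POST to the API, backing off 1s, 2s, 4s, ... (plus jitter) on load shedding."""
    for attempt in range(max_attempts):
        resp = requests.post(url, headers=headers, json=body, timeout=60)
        if resp.status_code not in RETRYABLE:
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {resp.status_code})")
```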

SDK fragmentation and documentation

  • Multiple SDKs (vertexai, google.generativeai, google.genai, plus the OpenAI‑compat layer) and overlapping data models cause confusion; even Googlers acknowledge this and say google.genai is the future (see the import sketch after this list).
  • Docs are described as fragmented, inconsistent, light on clear examples, and hard to search; some people resort to reading library source.
  • One commenter notes that Google APIs are internally consistent with published design guidelines, but the learning curve is steep.
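
To make the fragmentation concrete, these are the three Python entry points that get mixed up (a sketch; the older packages are in various stages of deprecation):

```python
# 1) Vertex AI SDK: GCP-centric, project/region-based auth.
import vertexai
from vertexai.generative_models import GenerativeModel

# 2) Older Gemini API SDK: API-key based, AI Studio flavored.
import google.generativeai as legacy_genai

# 3) The newer unified SDK that Googlers in the thread point to as the future.
from google import genai
```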

OpenAI‑compatible API limitations

  • The OpenAI‑compatible endpoints are helpful for quick trials but are not fully compatible: missing parameters, different behavior for tools/JSON Schema, content moderation that can’t be disabled, and subtle feature lag.
  • Some report that apps “just didn’t work” with the compatibility layer and reverted to raw HTTP or native SDKs.
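
For reference, the compatibility layer is just the OpenAI client pointed at a Gemini base URL; a sketch, assuming the openai package and the base URL from Google's compatibility docs:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

resp = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)

# Caveat from the thread: parameters, tool/JSON Schema handling, and moderation
# settings don't map one-to-one, so some apps fall back to raw HTTP or native SDKs.
```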

Structured outputs and JSON Schema quirks

  • Structured output and tool-calling support is a recurring gripe:
    • Limited or nonstandard JSON Schema support (e.g., $ref, unions, additionalProperties) breaks common libraries and polyglot abstractions.
    • Property ordering can affect outputs, and Gemini reorders schema properties alphabetically.
  • A few users work around this by auto‑resolving refs with their own code; others say this is precisely the kind of friction they don’t want.
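
The ref-resolving workaround amounts to inlining every local $ref before the schema reaches Gemini; a simplified sketch that ignores recursive schemas, external URIs, and sibling keywords next to $ref:

```python
def inline_refs(schema, root=None):
    """Return a copy of `schema` with local '#/...' $refs replaced by their targets."""
    root = root if root is not None else schema
    if isinstance(schema, dict):
        if "$ref" in schema:
            target = root
            for key in schema["$ref"].lstrip("#/").split("/"):
                target = target[key]
            return inline_refs(target, root)
        # Drop $defs once everything is inlined; copy the rest recursively.
        return {k: inline_refs(v, root) for k, v in schema.items() if k != "$defs"}
    if isinstance(schema, list):
        return [inline_refs(item, root) for item in schema]
    return schema
```

Schema generators such as Pydantic emit $defs/$ref-heavy documents by default, which is why this friction surfaces so quickly in common libraries.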

Multimodal and files handling

  • Vertex’s approach to images/files (uploading via file manager or GCS, then referencing IDs) is viewed by some as overengineered, especially in JavaScript where libraries historically assumed local file paths.
  • Others point out you can inline base64/URLs and that newer JS examples do support images, but documentation and samples lag.
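
Both paths, sketched against the google-genai SDK (parameter names such as `file` and `from_bytes` should be checked against the current SDK release):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# 1) Inline the bytes directly -- no upload step, fine for small files.
image_bytes = open("chart.png", "rb").read()
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Describe this chart.",
    ],
)

# 2) Upload via the Files API and pass the returned handle -- the heavier flow
#    the thread describes, but useful for large or frequently reused files.
uploaded = client.files.upload(file="chart.png")
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[uploaded, "Describe this chart."],
)
print(resp.text)
```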

Privacy and data use

  • There is confusion and mistrust around when AI Studio traffic is used for training. Some users find the policy self‑contradictory and say it depends on opaque account state (billing vs “trial”).
  • Official responses claim that non‑free‑tier usage is not used for training, but commenters request much clearer and auditable guarantees.

Prefix/prompt caching

  • One group praises Google’s explicit, configurable prefix caching (dedicated endpoint, TTL control) as “serious” tooling for advanced cost/latency tuning (sketched after this list).
  • Another prefers OpenAI’s automatic caching, arguing it requires no planning and just saves money under load; they see Gemini’s per‑hour pricing and Anthropic’s explicit breakpoints as harder to reason about.
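
In contrast to OpenAI's implicit approach, the explicit flow looks roughly like this with the google-genai SDK (method and field names are assumptions to verify against current docs, and explicit caching has minimum prefix-size requirements):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Cache a large shared prefix once, with an explicit TTL (storage is billed
# per token-hour, which is the per-hour pricing the second group dislikes).
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=[open("big_reference_doc.txt").read()],
        ttl="3600s",
    ),
)

# Later requests reference the cache instead of resending the prefix.
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize section 3.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(resp.text)
```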

Community engagement from Google staff

  • Multiple Google PMs/DevRel participate in the thread, acknowledging DevEx problems, clarifying quotas/auth, promising better docs, dashboards, and JSON Schema support.
  • Some participants welcome this direct engagement; others find the tone corporate or belated, but it does surface concrete roadmap hints (unified SDK, express mode with API keys, upcoming billing UX fixes).