Gemini 3.0 Pro – early tests

Unclear nature of “Gemini 3.0 Pro” tests

  • Many assume the flashy Twitter demos come from an A/B test in Google AI Studio, but it’s unclear whether the model behind them is actually Gemini 3.0.
  • Some find the showcased HTML/CSS/JS outputs unimpressive or pedestrian when inspected closely.

Benchmarks, SVG “pelican” test, and training data leakage

  • Several comments center on the “SVG of X riding Y” benchmark (e.g., pelican on a bicycle) as a private way to test models beyond public benchmarks.
  • Concern: once a benchmark becomes popular, it seeps into training sets (directly or via discussion), weakening its value.
  • Others argue that “being in the training data” is overrated; models still fail on many memorized problems, so overfitting to small, quirky tests is unlikely at scale.
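The “SVG of X riding Y” idea can be run as a tiny private eval: send the prompt to a model and at least check that the reply is well-formed SVG before eyeballing it. A minimal sketch, where `generate()` is a hypothetical stand-in for whatever model API you use (here it returns a canned reply so the snippet is self-contained):

```python
import xml.etree.ElementTree as ET

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    # Returns a canned SVG so the sketch runs without credentials.
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
            '<circle cx="50" cy="50" r="20"/></svg>')

def looks_like_svg(text: str) -> bool:
    """Weak sanity check: the reply parses as XML and has an <svg> root."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # With a namespace the tag is "{http://www.w3.org/2000/svg}svg".
    return root.tag.endswith("svg")

reply = generate("Generate an SVG of a pelican riding a bicycle")
print(looks_like_svg(reply))  # True for the canned reply above
```

This only catches malformed output, not a bad drawing; the visual judgment that makes the test useful stays manual, which is also what keeps it private.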

Skepticism about “vibe” demos

  • Many dismiss influencer demos (bouncing balls, fake Apple pages) as shallow and easy to one-shot with existing models.
  • Some are tired of visually impressive but practically irrelevant tests that don’t reflect hard, real-world software problems.

Comparisons across frontier models

  • No consensus “best” model: different people report Claude, Gemini, GPT‑5, or others as superior, often based on narrow coding workflows.
  • One synthesis:
    • Gemini: highest “ceiling” and best long-context/multimodal, but weak on token-level accuracy, tool-calling, and steering.
    • Claude: most consistent and steerable, strong on detail, but can lose track in very complex contexts.
    • GPT‑5: for some, best at long instruction-following and large feature builds; for others, erratic and inconsistent.

Gemini-specific pain points and strengths

  • Multi-turn instruction following and conversation “loops” (the model repeating itself or ignoring feedback) are a major complaint.
  • Tool-calling and structured JSON output are described as “terrible” or broken, limiting agentic coding.
  • On the plus side, Gemini’s long context and PDF handling are praised for tasks like reading huge spec documents or logs.
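When structured output is unreliable, the usual workaround is caller-side defensiveness: treat the model’s reply as untrusted text and validate it before acting on it. A minimal standard-library sketch, assuming a hypothetical reply string (models often wrap JSON in markdown fences):

```python
import json
from typing import Optional

# Hypothetical model reply wrapped in a markdown fence.
reply = '```json\n{"file": "app.py", "action": "edit", "line": 42}\n```'

def parse_tool_call(text: str, required: set) -> Optional[dict]:
    """Strip an optional markdown fence, parse JSON, and check required keys."""
    stripped = text.strip()
    if stripped.startswith("```"):
        # Drop the opening fence line and the trailing fence.
        stripped = stripped.split("\n", 1)[1].rsplit("```", 1)[0]
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not required <= data.keys():
        return None
    return data

call = parse_tool_call(reply, {"file", "action"})
print(call)  # {'file': 'app.py', 'action': 'edit', 'line': 42}
```

Returning `None` on any failure gives the agent loop a clean place to retry or re-prompt instead of crashing on malformed tool calls.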

Google’s product culture and packaging issues

  • Recurrent theme: Google has strong research and engineering but weak product vision and integration.
  • People find Gemini and other Google AI offerings hard to discover, configure, and pay for; APIs, billing, and docs are called confusing and fragmented.
  • Some believe Google had the tech for ChatGPT‑like systems early but lacked the product culture to ship; OpenAI forced their hand.

Hype fatigue, AGI chatter, and eval difficulty

  • Commenters recall past GPT‑5/AGI hype and see similar cycles around each new Google announcement.
  • There’s broad agreement that reliable evaluations are hard: public benchmarks get gamed, private ones risk being ingested, and subjective reports conflict.

Privacy and policy concerns

  • One criticism: on consumer plans, Gemini reportedly trains on user data unless chat history is disabled, which some see as weaker privacy than other major providers offer.