Gemini 3.0 Pro – early tests
Unclear nature of “Gemini 3.0 Pro” tests
- Many assume the flashy Twitter demos come from an A/B test in Google AI Studio, but it’s unclear whether they’re actually Gemini 3.0.
- Some find the showcased HTML/CSS/JS outputs unimpressive or pedestrian when inspected closely.
Benchmarks, SVG “pelican” test, and training data leakage
- Several comments center on the “SVG of X riding Y” benchmark (e.g., a pelican riding a bicycle) as a private way to probe models beyond public benchmarks; a minimal sketch of running such a test follows this list.
- Concern: once a benchmark becomes popular, it seeps into training sets (directly or via discussion), weakening its value.
- Others argue that “being in the training data” is an overstated worry; models still fail on many problems that are verifiably in their training data, so overfitting to small, quirky tests is unlikely at scale.
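Part of the appeal of this kind of private eval is that the mechanics are trivial. A minimal sketch, assuming the google-genai Python SDK and an API key in the GOOGLE_API_KEY environment variable; the model name and prompt are illustrative placeholders, not from the thread:

```python
# Minimal private-eval sketch: ask the model for an SVG and save it for
# visual inspection. Assumes `pip install google-genai` and GOOGLE_API_KEY.
from google import genai

client = genai.Client()  # picks up the API key from the environment

# The value of the test comes from varying X and Y privately, so neither
# the exact prompt nor a "correct" answer appears in any public benchmark.
prompt = "Generate an SVG of a pelican riding a bicycle. Return only the SVG markup."

response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder: whichever model is under test
    contents=prompt,
)

with open("pelican.svg", "w") as f:
    f.write(response.text)  # open in a browser and judge by eye
```

Scoring is deliberately subjective (open the file and look), which is both the test’s charm and, per the thread, its weakness once the prompt pattern becomes famous.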
Skepticism about “vibe” demos
- Many dismiss influencer demos (bouncing balls, fake Apple pages) as shallow and easy to one-shot with existing models.
- Some are tired of visually impressive but practically irrelevant tests that don’t reflect hard, real-world software problems.
Comparisons across frontier models
- No consensus “best” model: different people report Claude, Gemini, GPT‑5, or others as superior, often based on narrow coding workflows.
- One synthesis:
- Gemini: highest “ceiling” and best long-context/multimodal, but weak on token-level accuracy, tool-calling, and steering.
- Claude: most consistent and steerable, strong on detail, but can lose track in very complex contexts.
- GPT‑5: for some, best at long instruction-following and large feature builds; for others, erratic and inconsistent.
Gemini-specific pain points and strengths
- Weak multi-turn instruction following and conversational “loops” (the model repeating itself or ignoring feedback) are a major complaint.
- Tool-calling and structured JSON output are described as “terrible” or broken, limiting agentic coding; see the structured-output sketch after this list.
- On the plus side, Gemini’s long context and PDF handling are praised for tasks like reading huge spec documents or logs.
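For context on what the structured-output complaint refers to, here is a rough sketch of the constrained-JSON path, again assuming the google-genai Python SDK; the FileEdit schema, prompt, and model name are hypothetical examples, not from the thread:

```python
# Sketch of the structured-output path commenters report as flaky.
# FileEdit is a hypothetical schema for an agentic-coding step.
from pydantic import BaseModel
from google import genai
from google.genai import types

class FileEdit(BaseModel):
    path: str
    new_content: str

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder: any Gemini model under test
    contents="Propose an edit that renames config.yml to config.yaml.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=FileEdit,  # constrain decoding to this schema
    ),
)
print(response.parsed)  # a FileEdit instance, or None if parsing failed
```

The same SDK’s Files API (e.g., client.files.upload) is the usual route for the large-PDF workflows praised above; whether either path behaves reliably in practice is exactly what commenters dispute.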
Google’s product culture and packaging issues
- Recurrent theme: Google has strong research and engineering but weak product vision and integration.
- People find Gemini and other Google AI offerings hard to discover, configure, and pay for; APIs, billing, and docs are called confusing and fragmented.
- Some believe Google had the tech for ChatGPT‑like systems early but lacked the product culture to ship; OpenAI forced their hand.
Hype fatigue, AGI chatter, and eval difficulty
- Commenters recall past GPT‑5/AGI hype and see similar cycles around each new Google announcement.
- There’s broad agreement that reliable evaluations are hard: public benchmarks get gamed, private ones risk being ingested, and subjective reports conflict.
Privacy and policy concerns
- One criticism: on consumer plans, Gemini reportedly trains on user data unless chat history is disabled, which some see as weaker privacy than other major providers offer.