Three Years from GPT-3 to Gemini 3

Perceived Progress and Capabilities

  • Many see Gemini 3 as a substantial step up: useful for coding, product design discussions, math help, and high-quality editing. Some report 2–3x productivity or quality gains (e.g., faster code, better emails, thesis support).
  • Others argue the demos are cherry‑picked, and the touted “PhD‑level” paper is criticized as pattern‑matching and cargo‑cult research rather than genuine insight.
  • Several describe the models as “competent grad student” or “intermediate dev” alternating with “raving lunatic.” You still need domain knowledge to validate outputs.

Hallucinations, Reliability, and Gell‑Mann Effect

  • Hallucinations are seen as changed, not solved: fewer obvious factual glitches, more confident, self‑justifying nonsense (invented APIs, references, or methods).
  • Users note self‑contradictory reasoning and “embarrassed” behavior when models are corrected.
  • Multiple comments liken trust in AI on unfamiliar topics to the Gell‑Mann amnesia effect: you see errors in your own field yet assume quality elsewhere.

Interfaces and UX: Text vs Voice vs Generative UI

  • Strong defense of text: high information density, easy to skim, quote, and iterate. Many power users prefer chat/CLI over video or voice.
  • Others praise voice interaction (e.g., in cars, brainstorming), but complain about overly perky personalities and slowness.
  • Some expect multimodal agents and “generative UI” (dynamic, model‑designed interfaces) to be the next big shift; others think plain textboxes, tables, and graphs will remain dominant because humans haven’t changed.

Research, Novelty, and Cognitive Atrophy

  • In math and research, models help with calculations, literature surfacing, and idea refinement, but often just regurgitate known results unless heavily guided.
  • Several argue current LLMs are “huge librarians,” structurally biased toward the most probable answer, not genuine novelty.
  • There’s concern about “neural atrophy” as people offload more thinking to AI; historical analogies to books and calculators are debated.
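The “biased toward the most probable answer” claim above is a statement about decoding: greedy decoding always picks the mode of the next‑token distribution, while temperature sampling only occasionally strays from it. A toy sketch (the tokens and logits are invented purely for illustration, not from any real model):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token continuations: one "safe" answer dominates.
tokens = ["the well-known result", "a novel variant", "a wild conjecture"]
logits = [4.0, 1.0, 0.5]

# Greedy decoding: always returns the most probable continuation.
greedy = tokens[logits.index(max(logits))]

# Temperature sampling: a higher temperature flattens the distribution,
# so less probable continuations are drawn more often -- but the mode
# still dominates unless the temperature is very high.
probs = softmax(logits, temperature=1.5)
sampled = random.choices(tokens, weights=probs)[0]
```

This is only a caricature of decoding, but it makes the structural point concrete: novelty has to come from the tails of the distribution, and standard decoding is tuned to avoid the tails.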

Coding, Agents, and Security

  • Heavy use of AI for coding: “vibecoding” entire apps, then reviewing and steering, is becoming common for some; others find the same models stubborn, context‑blind, and grifty.
  • Agentic tools that can run commands or edit files raise security concerns. Some only run them in containers/VMs; others grant full access, relying on permission prompts or YOLO attitudes.
  • Worry that we’ve regressed on basic security norms by piping proprietary code and system access into opaque third‑party models.
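The container approach mentioned above can be sketched with standard Docker flags. The base image, mount layout, and the idea of dropping into a shell (rather than invoking any particular agent binary) are placeholder assumptions, not a specific tool's documented setup:

```shell
# Sketch: run an agentic coding session in a throwaway container so it can
# only touch the mounted project directory, not the rest of the host.
docker run --rm -it \
  --network none \
  --read-only \
  --tmpfs /tmp \
  -v "$PWD":/workspace \
  -w /workspace \
  python:3.12-slim bash
```

`--network none` blocks outbound access unless you deliberately relax it, `--read-only` plus `--tmpfs /tmp` keeps the root filesystem immutable, and the single bind mount confines writes to the project copy. Those who "grant full access" are effectively skipping all of these flags.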

Economics, Education, and Jobs

  • Debate over whether the massive AI spend is exceptional compared with what other sectors receive, and whether it is delivering commensurate real‑world gains.
  • Long tangent on education quality, literacy, and teacher pay: some argue we should invest in human education rather than AI; others say schooling is failing regardless of funding.
  • Developers are split between anxiety about job loss (especially for routine/CRUD work) and optimism that their individual leverage and the market for custom software will expand.