Jagged AGI: o3, Gemini 2.5, and everything after

Nature of LLMs: “text completion” vs “reasoning”

  • One camp insists current models are fundamentally probabilistic text predictors; any appearance of “assuming”, “understanding”, or “conversing” is just sophisticated next‑token completion (see the sketch after this list).
  • Others argue this framing is trivial or misleading: transformers, attention and chain‑of‑thought produce internal structure that meaningfully resembles planning, assumptions and reasoning, even if the underlying objective is text prediction.
  • A sub‑debate: whether humans themselves might be “fancy next‑word predictors”; some see this as plausible, others as missing key aspects of human thought (goals, embodiment, long‑term learning).
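
A minimal sketch of the next‑token‑completion framing in the first bullet above. Nothing here comes from a real model: `next_token_distribution` is a hypothetical stand‑in for a trained model’s forward pass, and the toy vocabulary is invented purely for illustration; the point is only that every apparent act of “reasoning” is emitted through a sampling loop shaped like this.

```python
import random

def next_token_distribution(context: list[str]) -> dict[str, float]:
    # Hypothetical stand-in for a trained model's forward pass: it maps the
    # tokens seen so far to a probability for each candidate next token.
    # A real LLM computes this with a transformer over the whole context.
    vocab = ["the", "a", "cat", "sat", "on", "mat", "."]
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def complete(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    # Autoregressive loop: append one sampled token at a time. Every "answer",
    # "plan", or "argument" a model produces comes out of a loop like this.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        choices, probs = zip(*dist.items())
        tokens.append(random.choices(choices, weights=probs, k=1)[0])
    return tokens

print(" ".join(complete(["the", "cat"])))
```

The disagreement in the bullets above is not about this loop itself but about whether the computation that produces the distribution amounts to planning or reasoning in any meaningful sense.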

AGI, “Jagged AGI,” and moving goalposts

  • Many see “jagged AGI” as a rhetorically clever way to say: models are superhuman on many tasks yet weirdly brittle on others.
  • Skeptics call this incompatible with the “G” in AGI: if capabilities are spiky and unreliable, it’s not general intelligence, just a powerful narrow system with broad coverage.
  • Stronger definitions of AGI revolve around:
    • Ability to autonomously improve its own design (recursive self‑improvement).
    • Ability to learn and retain arbitrary new skills over time like a human child.
    • Ability to function as an autonomous colleague (e.g. a full software engineer or office worker) using standard human tools.
  • Others adopt weaker, task‑based definitions: any artificial system that can apply reasoning across an unbounded domain of knowledge counts as AGI, in which case some argue we already have it.

Capabilities: where models feel impressive or superhuman

  • Many report Gemini 2.5, Claude 3.7, and o3 as huge practical upgrades:
    • Writing substantial grant proposals, research plans, and project timelines.
    • High‑quality coding assistance, debugging, and test generation.
    • Better at saying “no”, e.g. advising against changes to systems that already work.
  • Some users now prefer top models over human experts for certain fact‑based or synthesis tasks, especially when they expect more objectivity or broader literature coverage.

Limitations and failure modes

  • Classic riddles, trick questions, and slightly altered prompts still trip models up; they often revert to the most common training‑set pattern instead of carefully reading the variation.
  • Hallucinations remain a core problem, especially in domains with lots of online misinformation (e.g. trading strategies, obscure game puzzles). Models confidently invent solutions rather than admit ignorance.
  • Determinism and consistency are weak: the same question can yield conflicting answers, including about the model’s own capabilities.
  • Lack of continual learning and robust long‑term memory is widely viewed as a key missing ingredient for true AGI.

Tools, agents, and embodiment

  • Tool‑use (MCP, plugins, agents) is seen by some as lateral progress: more useful systems, but not closer to AGI unless the model itself is doing deeper reasoning and learning from these interactions.
  • Others argue “the AI” is the whole system (model + tools + prompts), and tool‑using agents already exhibit a kind of emerging general intelligence (see the sketch after this list).
  • A recurring benchmark for future AGI: an embodied agent that can reliably do a plumber’s or office worker’s job in messy real‑world conditions.
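
As a rough illustration of the “whole system” view in the second bullet above, here is a minimal sketch of a tool‑using agent loop. `call_model`, the `TOOLS` registry, and the JSON tool‑call format are hypothetical placeholders, not any particular vendor’s or MCP’s actual API; real frameworks differ in detail but follow the same pattern of the host executing tool calls the model proposes and feeding results back.

```python
import json
from typing import Callable

# Hypothetical tool registry: the host application, not the model, runs these.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(search results for {query!r})",
    "calculator": lambda expr: str(eval(expr)),  # toy only; never eval untrusted input
}

def call_model(transcript: list[str]) -> str:
    # Hypothetical stand-in for an LLM API call. A real model would return
    # either a tool invocation (here encoded as JSON) or a final answer.
    if len(transcript) == 1:
        return json.dumps({"tool": "calculator", "arg": "2 + 2"})
    return "The answer is 4."

def agent_loop(user_message: str, max_steps: int = 5) -> str:
    # In the "whole system" framing, the AI is this entire loop:
    # prompt + model + tools + orchestration, not the model weights alone.
    transcript = [user_message]
    for _ in range(max_steps):
        reply = call_model(transcript)
        try:
            call = json.loads(reply)               # model asked to use a tool
        except json.JSONDecodeError:
            return reply                           # model gave its final answer
        result = TOOLS[call["tool"]](call["arg"])  # host executes the tool...
        transcript.append(result)                  # ...and feeds the result back
    return "step limit reached"

print(agent_loop("What is 2 + 2?"))
```

Under this framing, capability gains can come from the orchestration layer (better tools, retrieval, memory) as much as from the model weights, which is exactly why the first bullet in this list calls such progress “lateral.”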

Economic and social framing

  • Some celebrate current progress as a triumph of capitalist competition driving down costs and expanding capability.
  • Others warn the real issues are concentration of power, eventual labor displacement (especially white‑collar), and the point at which AI becomes too capable to be safely controlled by “flaky tech companies.”
  • Several commenters think definitional fights over “AGI” are largely bikeshedding; what matters is empirical capability, reliability on specific tasks, and downstream societal impact.