Failing to Understand the Exponential, Again

Exponential vs. Sigmoid Growth

  • Many argue the author is mistaking the steep part of a sigmoid (logistic S-curve) for a true exponential.
  • Commenters note that most real systems (COVID spread, airline speeds, CPU clocks, human population) start exponential and then hit constraints.
  • The key disagreement: some think we’re still on the early, effectively exponential part of the S-curve; others believe we’re already seeing diminishing returns, especially for LLMs.
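The core ambiguity in this disagreement can be shown numerically: a logistic curve is nearly indistinguishable from a pure exponential until it approaches its ceiling, so early data cannot settle which curve we are on. A minimal sketch, with an illustrative growth rate and carrying capacity (assumed values, not taken from the discussion):

```python
import math

def exponential(t, r=0.35):
    """Pure exponential growth starting at 1."""
    return math.exp(r * t)

def logistic(t, r=0.35, K=1000.0):
    """Logistic (sigmoid) growth: same early rate r, but capped at K."""
    return K / (1 + (K - 1) * math.exp(-r * t))

# Early on, the two curves track each other closely (relative gap under 1%)...
early_gap = abs(exponential(5) - logistic(5)) / exponential(5)

# ...but once the logistic nears its carrying capacity, they diverge sharply:
# the exponential keeps compounding while the logistic flattens.
late_ratio = exponential(30) / logistic(30)
```

This is why both camps can cite the same historical curve: the disagreement is about where the ceiling K sits, which the early data alone cannot reveal.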

Benchmarks, “Human-Level” Claims, and Metrics

  • Heavy skepticism toward the METR “task length” metric and OpenAI’s GDPval benchmark:
    • “Length of tasks a model can do” is seen as loosely defined and easy to cherry-pick.
    • A 50% “win rate” vs. experts is criticized as a low bar that says nothing about error and hallucination rates in the cases the model loses.
    • Concerns that benchmarks select only tasks that flatter LLMs (presentations, reports) rather than the full job (e.g., nursing, software engineering).
  • Several commenters stress that evaluation on curated tests ≠ robust performance in messy real-world workflows.
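The criticism that a win rate obscures error rates can be made concrete with a toy simulation. The rates below are hypothetical, chosen only to illustrate that the two metrics are independent:

```python
import random

def evaluate(n_tasks=10_000, win_prob=0.5, error_prob=0.3, seed=0):
    """Simulate a model that wins ~half its head-to-head comparisons
    with experts while independently making a serious error on ~30%
    of tasks. Both probabilities are hypothetical, for illustration."""
    rng = random.Random(seed)
    wins = sum(rng.random() < win_prob for _ in range(n_tasks))
    errors = sum(rng.random() < error_prob for _ in range(n_tasks))
    return wins / n_tasks, errors / n_tasks

win_rate, error_rate = evaluate()
# A headline "matches experts 50% of the time" can coexist with a
# high per-task error rate; the benchmark number alone cannot tell you.
```

The point of the sketch: optimizing or reporting `win_rate` puts no constraint on `error_rate`, which is exactly what commenters say matters for jobs like nursing or software engineering.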

Limits: Data, Compute, Energy, and Economics

  • Multiple proposed limiting factors:
    • Training data (the “petri dish” is the internet; synthetic data risks feedback/hallucination loops).
    • Compute, energy, and cooling; capex may already be propping up the broader economy.
    • Funding and investor patience: exponential capability is being bought with exponential spending.
  • Others counter that information systems historically show long-run exponential improvement and that physics limits (e.g., Bremermann’s limit) are still far away.
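For concreteness on the physics-limit counterargument: Bremermann’s limit bounds the computation rate of a self-contained system at roughly m·c²/h bits per second. A quick sketch with rounded constants:

```python
# Bremermann's limit: an upper bound on the computation rate of a
# self-contained physical system, roughly m * c^2 / h bits per second.
C = 2.998e8     # speed of light in vacuum, m/s
H = 6.626e-34   # Planck constant, J*s

def bremermann_bits_per_second(mass_kg: float) -> float:
    """Maximum bits per second for a system of the given mass."""
    return mass_kg * C**2 / H

# For one kilogram of matter this is on the order of 1.4e50 bits/s,
# vastly beyond any foreseeable hardware -- the sense in which
# fundamental physics limits are "still far away".
one_kg_limit = bremermann_bits_per_second(1.0)
```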

Real-World Capability vs. Hype

  • Practitioners report:
    • Strong gains in tooling (coding assist, video editing, subtitles, masking), but models still fail in ways no competent human would.
    • “Eight hours of autonomous work” ignores memory, learning, and responsibility: LLMs don’t retain long-term context or reliably self-correct.
    • Key weaknesses remain in reasoning, math without tools, physical-world understanding, and persistent learning.

Incentives, Hype, and Trust

  • Significant criticism of conflicts of interest: the author works at a frontier lab and benefits from continued hype.
  • The pattern of AI milestones being perpetually “1–2 years away” (self-driving, AR, the metaverse, AGI) is seen as structurally tied to fundraising and competition for capital.
  • Many call for focusing less on curve-fitting and more on:
    • Concrete constraints and mechanisms,
    • Error/hallucination rates and accountability,
    • How and when systems can actually replace or safely augment human experts.