Failing to Understand the Exponential, Again
Exponential vs. Sigmoid Growth
- Many argue the author is mistaking the steep part of a sigmoid (logistic S-curve) for a true exponential.
- Commenters note that most real systems (COVID spread, airline speeds, CPU clocks, human population) begin with exponential growth and then hit constraints.
- The key disagreement: some think we’re still safely on the early, exponential-looking part of the S-curve; others believe we’re already seeing diminishing returns, especially with LLMs.
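The crux of this disagreement is that an early-stage sigmoid is numerically almost indistinguishable from a true exponential. A minimal sketch (the carrying capacity and growth rate below are arbitrary illustration values, not from the article):

```python
import math

def exponential(t: float, k: float = 1.0) -> float:
    """Unbounded exponential growth e^(k*t), normalized to 1 at t = 0."""
    return math.exp(k * t)

def logistic(t: float, L: float = 1000.0, k: float = 1.0) -> float:
    """Logistic (sigmoid) growth toward carrying capacity L, with the
    inflection point placed so that logistic(0) is also close to 1."""
    t0 = math.log(L)  # inflection point; e^(k*t0) = L
    return L / (1.0 + math.exp(-k * (t - t0)))

# Well before the inflection point (t0 ~ 6.9 here), the two curves agree
# to within about 1%; past it, they diverge completely.
for t in (1.0, 3.0, 6.9, 10.0):
    print(f"t={t:5}:  exp={exponential(t):10.1f}  logistic={logistic(t):10.1f}")
```

Until the constraint starts to bind, no amount of curve-fitting on past data distinguishes the two models, which is why both camps can read the same plot differently.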
Benchmarks, “Human-Level” Claims, and Metrics
- Heavy skepticism toward the METR “task length” metric and OpenAI’s GDPval benchmark:
  - “Length of tasks a model can do” is seen as loosely defined and easy to cherry-pick.
  - A 50% “win rate” vs. experts is criticized as a low bar, obscuring error and hallucination rates.
- Concerns that benchmarks select only tasks that flatter LLMs (presentations, reports) rather than the full job (e.g., nursing, software engineering).
- Several commenters stress that evaluation on curated tests ≠ robust performance in messy real-world workflows.
Limits: Data, Compute, Energy, and Economics
- Multiple proposed limiting factors:
  - Training data (the “petri dish” is the internet; synthetic data risks feedback/hallucination loops).
  - Compute, energy, and cooling; capex may already be propping up the broader economy.
  - Funding and investor patience: exponential capability is being bought with exponential spending.
- Others counter that information systems historically show long-run exponential improvement and that physics limits (e.g., Bremermann’s limit) are still far away.
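For scale, the Bremermann's limit argument can be worked out in a few lines: the bound is roughly m·c²/h bit operations per second per kilogram of mass-energy. The constants are standard physics; the 1 kg example mass is just an illustration:

```python
# Back-of-envelope Bremermann's limit: an upper bound on the computation
# rate of a self-contained physical system, roughly m * c^2 / h
# bit operations per second for mass m.
SPEED_OF_LIGHT = 2.998e8       # m/s
PLANCK_CONSTANT = 6.626e-34    # J*s

def bremermann_limit_bits_per_sec(mass_kg: float) -> float:
    """Maximum bit operations per second for `mass_kg` of matter."""
    return mass_kg * SPEED_OF_LIGHT**2 / PLANCK_CONSTANT

# One kilogram bounds out at ~1.36e50 bit operations per second,
# dozens of orders of magnitude beyond any current datacenter.
print(f"{bremermann_limit_bits_per_sec(1.0):.2e}")
```

This is the sense in which "physics limits are still far away": the binding constraints today are economic and thermal, not fundamental.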
Real-World Capability vs. Hype
- Practitioners report:
  - Strong gains in tooling (coding assist, video editing, subtitles, masking), but models still fail in ways no competent human would.
  - “Eight hours of autonomous work” ignores memory, learning, and responsibility: LLMs don’t retain long-term context or reliably self-correct.
  - Key weaknesses remain in reasoning, math without tools, physical-world understanding, and persistent learning.
Incentives, Hype, and Trust
- Significant criticism of conflicts of interest: the author works at a frontier lab and benefits from continued hype.
- The pattern of AI timelines always being “1–2 years away” (self-driving, AR, metaverse, AGI) is seen as structurally tied to fundraising and competition for capital.
- Many call for focusing less on curve-fitting and more on:
  - Concrete constraints and mechanisms,
  - Error/hallucination rates and accountability,
  - How and when systems can actually replace or safely augment human experts.