Grok 3: Another win for the bitter lesson

Meaning of “bitter lesson” and “exception proves the rule”

  • Thread opens with a tangent clarifying that “the exception that proves the rule” originally means “an explicit exception implies a general rule,” not “exceptions logically confirm rules.”
  • Multiple commenters argue the article similarly misuses “the bitter lesson,” which originally says: long‑run progress comes from leveraging computation via general methods, not hand‑coded domain knowledge.
  • Several say the article reduces this to “more chips = win,” ignoring algorithmic efficiency and software design.

Compute vs algorithms, talent, and DeepSeek vs xAI

  • Strong disagreement over whether “just scale compute” is realistic.
  • One side: compute grows exponentially, humans and talent pipelines don’t; scaling hardware is ultimately easier, and raw scale dominates.
  • Other side: algorithmic advances (e.g., DeepSeek’s optimizations under export constraints) can rival or beat brute force; effective use of FLOPs is nontrivial and requires rare expertise.
  • Debate over DeepSeek’s actual GPU count and spend; numbers in public reports are viewed as speculative. Some argue a first‑mover disadvantage: once techniques are public, followers can reproduce results with less compute.

Geopolitics and hardware stack concentration

  • Some see the US concentrating the critical AI stack (TSMC fabs being built in the US, ASML's lithography supply chain, NVIDIA, the big labs) and ask whether this leads to global dominance.
  • Others respond that:
    • China is likely to build an independent stack eventually.
    • Software can be exfiltrated; hardware is the real bottleneck.
    • AI’s actual geopolitical leverage is unclear and may be overhyped.

Grok 3 performance, scaling laws, and benchmark skepticism

  • Many note Grok 3’s top ranking on the LMSys Chatbot Arena and its strong published benchmark results, but others distrust the headline numbers.
  • Concerns: potential training on benchmarks, Goodhart’s law, and suspiciously high scores on reasoning tests (e.g., GPQA Diamond) for a “non‑reasoning” model.
  • Some users report Grok 3 performing impressively in real coding and application tasks; others see only incremental gains for massive extra compute.
  • Several argue this is not a clear “win for scaling laws”: large compute increases for modest benchmark deltas look like diminishing returns.
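The diminishing-returns shape commenters describe can be sketched with a simple power-law loss curve. This is an illustration, not data from the thread: the functional form follows published neural scaling-law work (loss falling as a power of training compute), and the exponent `alpha = 0.05` is an assumed, roughly Kaplan-style value, not a measured figure for Grok 3.

```python
# Hypothetical power-law scaling: L(C) = a * C**(-alpha).
# alpha and a are illustrative assumptions, chosen to show why large
# compute multipliers can yield only modest quality deltas.

def loss(compute: float, a: float = 1.0, alpha: float = 0.05) -> float:
    """Loss as a function of training compute under an assumed power law."""
    return a * compute ** (-alpha)

# A 10x compute increase (e.g. 1e24 -> 1e25 FLOPs) shrinks loss by only
# about 11% under this exponent, since 10**(-0.05) ~= 0.891.
base = loss(1e24)
scaled = loss(1e25)
relative_reduction = 1 - scaled / base
print(f"10x compute -> {relative_reduction:.1%} relative loss reduction")
```

Under this (assumed) exponent, each order of magnitude of compute buys a roughly constant small fractional loss improvement, which is consistent with the "massive extra compute for incremental gains" reading of the benchmarks.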

Are LLMs “intelligent” and economically transformative?

  • One camp believes current neural methods will surpass humans in most tasks, giving early leaders near “nuclear‑scale” advantage.
  • Skeptics counter that LLMs are fast pattern matchers lacking robust reasoning, reliability, or “System 2” thinking, and that real‑world productivity gains are modest so far.
  • Broader worry: AI investment and hype may be outpacing tangible ROI, with inference cost and monetization challenges looming.

Talent, ethics, and adoption of Grok

  • Discussion over whether high compensation outweighs ethical or political concerns about working for certain US or Chinese labs.
  • For businesses, some would adopt Grok if it’s cheaper or better and API‑compatible; others consider reliance on any closed, politically entangled provider an unacceptable strategic risk.