Grok 3: Another win for the bitter lesson
Meaning of “bitter lesson” and “exception proves the rule”
- The thread opens with a tangent clarifying that “the exception that proves the rule” originally means that stating an explicit exception implies a general rule for the unexcepted cases (e.g., “free parking on Sundays” implies parking normally costs money), not that exceptions logically confirm rules.
- Multiple commenters argue the article similarly misuses “the bitter lesson”: Rich Sutton’s original essay says long‑run progress comes from general methods that leverage computation, not from hand‑coded domain knowledge.
- Several say the article reduces this to “more chips = win,” ignoring algorithmic efficiency and software design.
Compute vs algorithms, talent, and DeepSeek vs xAI
- Strong disagreement over whether “just scale compute” is realistic.
- One side: compute grows exponentially, humans and talent pipelines don’t; scaling hardware is ultimately easier, and raw scale dominates.
- Other side: algorithmic advances (e.g., DeepSeek’s optimizations under export constraints) can rival or beat brute force; extracting useful work from raw FLOPs is nontrivial and requires rare expertise (see the arithmetic sketch after this list).
- Debate over DeepSeek’s actual GPU count and spend; the numbers in public reports are viewed as speculative.
- Some argue a first‑mover disadvantage: once techniques are public, followers can reproduce results with less compute.
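To make the “effective use of FLOPs” point concrete, here is a minimal arithmetic sketch. Every number in it is an assumption invented for illustration (none reflects actual xAI or DeepSeek figures); the only point is that effective compute is a product of chip count, utilization, and algorithmic efficiency, so a smaller cluster can close much of the gap.

```python
# Illustrative only: all figures are assumptions, not reported numbers
# for any lab. Effective training compute is a product of terms, so a
# gain in any one factor trades off against raw chip count.

def effective_flops(gpus: int, peak_flops_per_gpu: float,
                    utilization: float, algo_efficiency: float) -> float:
    """Effective throughput: hardware peak, scaled by how well the
    software stack uses it (utilization) and by algorithmic gains
    (architecture, data, training recipe)."""
    return gpus * peak_flops_per_gpu * utilization * algo_efficiency

# Hypothetical brute-force cluster: many GPUs, mediocre utilization.
brute = effective_flops(100_000, 1e15, utilization=0.30, algo_efficiency=1.0)

# Hypothetical constrained lab: 10x fewer GPUs, better utilization,
# and a 4x algorithmic-efficiency edge.
lean = effective_flops(10_000, 1e15, utilization=0.45, algo_efficiency=4.0)

print(f"brute: {brute:.1e}  lean: {lean:.1e}  lean/brute = {lean / brute:.2f}")
# With these made-up numbers the lean setup reaches 60% of the
# brute-force throughput with a tenth of the chips.
```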
Geopolitics and hardware stack concentration
- Some see the US concentrating the critical AI stack (TSMC fabs being built on US soil, ASML’s lithography supply, NVIDIA, the leading labs) and ask whether this leads to global dominance.
- Others respond that:
- China is likely to build an independent stack eventually.
- Software can be exfiltrated; hardware is the real bottleneck.
- AI’s actual geopolitical leverage is unclear and may be overhyped.
Grok 3 performance, scaling laws, and benchmark skepticism
- Many note Grok 3’s top ranking on the LMSYS Chatbot Arena and its strong showing on benchmark charts, but others distrust the headline numbers.
- Concerns: potential training on benchmarks, Goodhart’s law, and suspiciously high scores on reasoning tests (e.g., GPQA Diamond) for a “non‑reasoning” model.
- Some users report Grok 3 performing impressively in real coding and application tasks; others see only incremental gains for massive extra compute.
- Several argue this is not a clear “win for scaling laws”: large increases in compute for modest benchmark deltas look like diminishing returns (the power‑law sketch below illustrates why).
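The diminishing‑returns intuition can be made concrete with a hypothetical Chinchilla‑style power law, in which loss decays as a power of training compute toward an irreducible floor. The constants below are assumptions for illustration, not values fitted to any real model; only the qualitative shape matters.

```python
# Hypothetical power-law scaling: loss(C) = L0 + a * C**(-b).
# L0 (irreducible loss), a, and b are made-up constants; real fitted
# values differ by model family, but the curve's shape is the same.
L0, a, b = 1.7, 2.0, 0.05

def loss(compute: float) -> float:
    return L0 + a * compute ** (-b)

prev = None
for c in [1e23, 1e24, 1e25]:  # each step is 10x more training compute
    cur = loss(c)
    note = "" if prev is None else f"  (improvement: {prev - cur:.4f})"
    print(f"compute {c:.0e}: loss {cur:.4f}{note}")
    prev = cur
# Each 10x of compute buys a smaller absolute improvement: the reducible
# term shrinks by a constant factor (10**-b, about 0.89) per decade.
```

Under such a curve, huge compute increases yielding modest deltas is exactly what scaling laws predict, so the observation cuts against headline expectations rather than against the laws themselves.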
Are LLMs “intelligent” and economically transformative?
- One camp believes current neural methods will surpass humans in most tasks, giving early leaders near “nuclear‑scale” advantage.
- Skeptics counter that LLMs are fast pattern matchers lacking robust reasoning, reliability, or “System 2” thinking, and that real‑world productivity gains are modest so far.
- Broader worry: AI investment and hype may be outpacing tangible ROI, with inference cost and monetization challenges looming.
Talent, ethics, and adoption of Grok
- Commenters debate whether high compensation outweighs ethical or political concerns about working for certain US or Chinese labs.
- For businesses, some would adopt Grok if it’s cheaper or better and API‑compatible; others consider reliance on any closed, politically entangled provider an unacceptable strategic risk.