Grok 3: Another win for the bitter lesson

Meaning of “bitter lesson” and “exception proves the rule”

  • Thread opens with a tangent clarifying that “the exception that proves the rule” originally means “an explicit exception implies a general rule,” not “exceptions logically confirm rules.”
  • Multiple commenters argue the article similarly misuses “the bitter lesson,” which originally says: long‑run progress comes from leveraging computation via general methods, not hand‑coded domain knowledge.
  • Several say the article reduces this to “more chips = win,” ignoring algorithmic efficiency and software design.

Compute vs algorithms, talent, and DeepSeek vs xAI

  • Strong disagreement over whether “just scale compute” is realistic.
  • One side: compute grows exponentially, humans and talent pipelines don’t; scaling hardware is ultimately easier, and raw scale dominates.
  • Other side: algorithmic advances (e.g., DeepSeek’s optimizations under export constraints) can rival or beat brute force; effective use of FLOPs is nontrivial and requires rare expertise.
  • Debate over DeepSeek’s actual GPU count and spend; numbers in public reports are viewed as speculative. Some argue a first‑mover disadvantage: once techniques are public, followers can reproduce results with less compute.

Geopolitics and hardware stack concentration

  • Some see the US concentrating the critical AI stack (TSMC fabs being built in the US, ASML's lithography supply chain, NVIDIA, the big labs) and ask whether this leads to global dominance.
  • Others respond that:
    • China is likely to build an independent stack eventually.
    • Software can be exfiltrated; hardware is the real bottleneck.
    • AI’s actual geopolitical leverage is unclear and may be overhyped.

Grok 3 performance, scaling laws, and benchmark skepticism

  • Many note Grok 3’s top ranking on the LMSys Chatbot Arena and its strong published benchmark results, but others distrust the headline numbers.
  • Concerns: potential training on benchmarks, Goodhart’s law, and suspiciously high scores on reasoning tests (e.g., GPQA Diamond) for a “non‑reasoning” model.
  • Some users report Grok 3 performing impressively in real coding and application tasks; others see only incremental gains for massive extra compute.
  • Several argue this is not a clear “win for scaling laws”: large compute increases for modest benchmark deltas look like diminishing returns.
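The diminishing-returns shape commenters describe can be sketched with a simple power-law loss curve. This is an illustration, not data from the thread: the functional form follows published neural scaling-law work (loss falling as a power of training compute), and the exponent `alpha = 0.05` is an assumed, roughly Kaplan-style value, not a measured figure for Grok 3.

```python
# Hypothetical power-law scaling: L(C) = a * C**(-alpha).
# alpha and a are illustrative assumptions, chosen to show why large
# compute multipliers can yield only modest quality deltas.

def loss(compute: float, a: float = 1.0, alpha: float = 0.05) -> float:
    """Loss as a function of training compute under an assumed power law."""
    return a * compute ** (-alpha)

# A 10x compute increase (e.g. 1e24 -> 1e25 FLOPs) shrinks loss by only
# about 11% under this exponent, since 10**(-0.05) ~= 0.891.
base = loss(1e24)
scaled = loss(1e25)
relative_reduction = 1 - scaled / base
print(f"10x compute -> {relative_reduction:.1%} relative loss reduction")
```

Under this (assumed) exponent, each order of magnitude of compute buys a roughly constant small fractional loss improvement, which is consistent with the "massive extra compute for incremental gains" reading of the benchmarks.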

Are LLMs “intelligent” and economically transformative?

  • One camp believes current neural methods will surpass humans in most tasks, giving early leaders near “nuclear‑scale” advantage.
  • Skeptics counter that LLMs are fast pattern matchers lacking robust reasoning, reliability, or “System 2” thinking, and that real‑world productivity gains are modest so far.
  • Broader worry: AI investment and hype may be outpacing tangible ROI, with inference cost and monetization challenges looming.

Talent, ethics, and adoption of Grok

  • Discussion over whether high compensation outweighs ethical or political concerns about working for certain US or Chinese labs.
  • For businesses, some would adopt Grok if it’s cheaper or better and API‑compatible; others consider reliance on any closed, politically entangled provider an unacceptable strategic risk.