The Bitter Lesson is about AI agents
Interpretations of the Bitter Lesson
- Many commenters restate the original “bitter lesson” as: scalable, general methods plus more compute/data beat intricate, domain-specific engineering over time.
- Others argue the post being discussed oversimplifies this into “just buy more GPUs,” whereas the original was more about simple, scalable algorithms vs brittle, hand‑coded features.
- Several claim the slogan has done damage: it’s being treated as dogma to dismiss algorithmic innovation, even though recent progress (transformers, diffusion, distillation, Gaussian splatting, etc.) is largely algorithmic.
Compute, Markets, and Centralization
- “More (data and compute) generally beats better (algorithms)” is widely acknowledged in data‑intensive workloads, but some question whether rising GPU and power costs will force a retreat from pure scale.
- Chess is used as a cautionary example: huge compute spent to reach superhuman play, but the commercial value ended up mostly in human‑vs‑human platforms. A proposed “second bitter lesson”: making something possible with massive compute doesn’t guarantee a large market.
- There is concern that a compute‑centric worldview implies “whoever has the most capital wins,” leading to centralization.
Agents, Variance, and RL in Messy Domains
- For AI agents, reliability and variance matter: a system that occasionally goes haywire drives users away (customer‑support bots are cited).
- Suggestions include building variance penalties into loss functions, best‑of‑N sampling with eval filters, and ensembles of independent models for critical decisions.
- Others push back that RL in domains without good simulators (e.g., real customer service) is slow, expensive, and constrained by noisy satisfaction signals. Creating realistic simulators, or distilling smaller models from real transcripts, is proposed but seen as nontrivial.
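The variance‑reduction ideas above can be sketched concretely. This is a minimal, hypothetical illustration, not anyone's production system: `generate` and `score` stand in for a model call and an automatic evaluator, which the thread does not specify.

```python
from collections import Counter


def best_of_n(generate, score, n=5, threshold=0.0):
    """Best-of-N with an eval filter: draw n candidates, drop any that
    fail the evaluator, and return the highest-scoring survivor.

    `generate` and `score` are placeholders (assumptions), not a real API.
    """
    scored = [(score(c), c) for c in (generate() for _ in range(n))]
    passing = [sc for sc in scored if sc[0] >= threshold]
    if not passing:
        return None  # escalate / fall back rather than ship a bad answer
    return max(passing)[1]


def majority_vote(answers):
    """Ensemble of independent models: return the most common answer."""
    return Counter(answers).most_common(1)[0][0]
```

Returning `None` when every candidate fails the filter is the point of the pattern: a system that occasionally goes haywire should refuse or escalate instead, trading some recall for lower variance.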
Self‑Driving as a Test Case
- Tesla vs Waymo is heavily debated as evidence for or against the Bitter Lesson.
- One side: Waymo’s hybrid of classical control plus deep learning and richer sensors (including LiDAR) “actually works,” while Tesla’s end‑to‑end, camera‑only, data‑driven approach has not delivered. This is framed as a refutation of “just add data/compute.”
- The other side: sensors like LiDAR are not “hand‑engineered features” but superior sensing; ultimately, vision‑heavy, end‑to‑end approaches may win once compute and data catch up—though possibly too late for some companies to survive.
Pragmatic Engineering Takeaways
- Some practitioners say it’s higher ROI to assume models will rapidly improve, avoid over‑engineering prompts/guardrails, and lean into powerful models plus best‑of‑N rather than intricate wrappers.
- Others counter that shipping nonfunctional AI now, hoping future models fix it, is pointless; you either make it work today (possibly with more deterministic systems) or don’t build it.
- Multiple comments stress that building datasets, products, and customer bases now may matter more long‑term than perfectly anticipating where the “bitter lesson” leads.