2024-06-14

AI Search: The Bitter-Er Lesson

What “search” means here

Most commenters read “search” as classic AI tree search (minimax, MCTS, breadth/depth-first), not web search or RAG.
For LLMs this would mean branching over candidate thoughts/solutions, evaluating them, pruning, then answering – akin to “pondering”.
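
The branch, evaluate, prune loop described above can be sketched as a small beam search. This is only an illustration under stated assumptions: `expand`, `score`, and the string-building toy task are hypothetical stand-ins for an LLM proposing candidate thoughts and a value function rating them. None of it comes from the article or the thread.

```python
from typing import Callable, List

def beam_search(start: str,
                expand: Callable[[str], List[str]],
                score: Callable[[str], float],
                beam_width: int = 2,
                max_depth: int = 6) -> str:
    """Keep only the `beam_width` best candidates at each depth."""
    frontier = [start]
    best = start
    for _ in range(max_depth):
        # Branch: every surviving candidate proposes its continuations.
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        # Evaluate and prune: sort by score, keep the top few.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if score(frontier[0]) > score(best):
            best = frontier[0]
    return best

# Toy task (assumption): assemble a target word one letter at a time.
TARGET = "search"

def expand(state: str) -> List[str]:
    # Hypothetical "generator": append each letter of a small alphabet.
    if len(state) >= len(TARGET):
        return []
    return [state + letter for letter in "acehrs"]

def score(state: str) -> float:
    # Hypothetical "value function": count positions matching the target.
    return float(sum(a == b for a, b in zip(state, TARGET)))

result = beam_search("", expand, score)
```

The toy value function is exact, which is what makes the pruning safe here. With a noisy scorer, a narrow beam can discard the branch that would have led to the best answer.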

Perceived promise of adding search to LLMs

Search could let models spend more compute on hard problems and less on easy ones.
It might turn today’s “intuitive oracle” LLMs into explicit problem solvers that can revise and refine plans before replying.
Some see this as a plausible path to much stronger systems or even “AI foom,” especially in domains with cheap, automated evaluation (games, theorem proving, fuzzing, some science tasks).
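
A toy sketch of the "more compute on hard problems" idea, assuming a domain where checking an answer is cheap and automatic. The puzzle (reach a target integer from 1 using `+1` and `*2`) and the `solve` function are hypothetical illustrations, not anything proposed in the thread. The point is only that the same search procedure expands few nodes on an easy instance and many on a hard one.

```python
from collections import deque
from typing import Optional, Tuple

def solve(target: int, max_nodes: int = 100_000) -> Tuple[Optional[int], int]:
    """Breadth-first search from 1 using +1 and *2.

    Returns (number of steps, nodes expanded); steps is None if the
    node budget runs out before the target is reached.
    """
    queue = deque([(1, 0)])
    seen = {1}
    expanded = 0
    while queue and expanded < max_nodes:
        value, steps = queue.popleft()
        expanded += 1
        # Cheap automated evaluation: an exact equality check.
        if value == target:
            return steps, expanded
        for nxt in (value + 1, value * 2):
            # Both moves only increase the value, so overshoots are dead ends.
            if nxt <= target and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None, expanded

easy_steps, easy_cost = solve(2)
hard_steps, hard_cost = solve(100)
```

Here `easy_cost` is much smaller than `hard_cost` even though nothing in the code sets a per-problem budget. The difficulty of the instance determines how much work the search does.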

Limits: value functions and search spaces

Strong objection: search works in chess because there is a well-defined state space and a fast, accurate value function; real-world tasks and “AI research” have neither.
Value functions today are highly domain-specific; general ones are lacking and their feasibility is unclear.
For broad goals (doing AI research, curing Alzheimer’s, curing cancer), even the state space and the transitions between states are unclear.

Compute and practicality

Tree search over token sequences is computationally enormous (branching factor in the tens of thousands at token level).
Even coarse-grained, idea-level branching could be very expensive; recent papers that add search to LLMs use drastically fewer rollouts than game-playing AIs do, which suggests cost pressure.
Commenters debate the tradeoff between train-time and inference-time cost; a 100–1000× increase in inference cost may be unacceptable for many applications.
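
A rough back-of-the-envelope sketch of the token-level branching point. The vocabulary size of 50,000 is an assumption chosen to match "tens of thousands", and the beam parameters are arbitrary illustrative values. The two functions contrast the size of an exhaustive token tree with the number of evaluations a pruned search would need.

```python
def full_tree_leaves(branching: int, depth: int) -> int:
    """Number of distinct sequences in an exhaustive tree: branching ** depth."""
    return branching ** depth

def beam_evaluations(beam_width: int, expansions_per_node: int, depth: int) -> int:
    """Upper bound on candidates scored by a beam search of fixed width."""
    return beam_width * expansions_per_node * depth

VOCAB_SIZE = 50_000  # assumption: token-level branching factor

# Exhaustive search over just three tokens already has 1.25e14 leaves.
three_token_tree = full_tree_leaves(VOCAB_SIZE, 3)

# A pruned search over ten steps scores a few hundred candidates instead,
# but gives up any guarantee of finding the best sequence.
pruned_cost = beam_evaluations(beam_width=8, expansions_per_node=4, depth=10)
```

Any practical scheme has to sit near the pruned end of this range. Even so, scoring a few hundred candidates per answer is already a cost multiplier of that order over a single forward pass.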

Alignment and superintelligence debates

Some warn that anything that accelerates the path to superintelligence worsens alignment risk; they criticize the article for ignoring the questions of what to optimize for and how to keep control.
Others are skeptical that “superintelligence” is even a coherent or reachable concept, or see AGI as requiring multiple unknown breakthroughs and long timelines.

World models, generalization, and LLM limits

Repeated concern that current LLMs lack robust world models and generalization; they remix text more than they reason.
Without reliable internal models, search may just traverse a space of biased, sometimes false beliefs.
Several argue we still need mechanisms to learn usable world models (e.g., from video, rich simulations, adjustable abstraction levels).

Symbolic vs statistical approaches

Commenters note that classical search, planning, and theorem-proving already have near-optimal algorithms under known tradeoffs (soundness, completeness, efficiency).
Some advocate hybrid neuro-symbolic systems where logic, simulators, or ontologies provide structure and evaluation, with LLMs as generators.
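
A minimal sketch of the generator-plus-checker split that the hybrid proposal describes. Everything here is a hypothetical stand-in: `propose_candidates` plays the role of an LLM producing unreliable guesses, and `satisfies_equation` plays the role of a symbolic evaluator that checks each guess exactly. The quadratic is an arbitrary example.

```python
import random
from typing import Callable, Iterable, List

def propose_candidates(n: int, seed: int = 0) -> List[int]:
    """Stand-in for an LLM generator: n integer guesses in no useful order."""
    rng = random.Random(seed)
    guesses = list(range(-10, 11))
    rng.shuffle(guesses)
    return guesses[:n]

def satisfies_equation(x: int) -> bool:
    """Exact symbolic check: is x a root of x^2 - 5x + 6?"""
    return x * x - 5 * x + 6 == 0

def generate_and_verify(candidates: Iterable[int],
                        verifier: Callable[[int], bool]) -> List[int]:
    """Keep only the candidates the verifier accepts."""
    return [c for c in candidates if verifier(c)]

# With all 21 guesses proposed, the checker filters them down to the true roots.
roots = sorted(generate_and_verify(propose_candidates(21), satisfies_equation))
```

The generator can be wrong most of the time without harming correctness, because only verified candidates survive. That property depends entirely on having an exact checker, which is the part that broad domains lack.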
