AI Search: The Bitter-er Lesson

What “search” means here

  • Most commenters read “search” as classic AI tree search (minimax, MCTS, breadth/depth-first), not web search or RAG.
  • For LLMs this would mean branching over candidate thoughts/solutions, evaluating them, pruning, then answering – akin to “pondering”.
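
To make the branch/evaluate/prune loop concrete, here is a minimal beam-search sketch on a toy number puzzle. Everything in it is an assumption made for illustration: `expand` stands in for a model proposing next "thoughts", the distance-to-target score stands in for an evaluator, and the names `State`, `expand`, and `beam_search` are hypothetical rather than taken from the article or the thread.

```python
from typing import List, Tuple

# A state is (current value, steps taken so far). Hypothetical toy domain.
State = Tuple[int, Tuple[str, ...]]


def expand(state: State) -> List[State]:
    """Stand-in for a generator proposing successor 'thoughts'."""
    value, steps = state
    return [
        (value + 3, steps + ("+3",)),
        (value * 2, steps + ("*2",)),
        (value - 1, steps + ("-1",)),
    ]


def beam_search(start: int, target: int,
                beam_width: int = 3, max_depth: int = 6) -> State:
    """Branch, score, and prune to the best `beam_width` states per level."""

    def score(state: State) -> int:
        # Stand-in evaluator: closer to the target is better.
        return -abs(target - state[0])

    beam: List[State] = [(start, ())]
    best = beam[0]
    for _ in range(max_depth):
        # Branch: every state in the beam proposes successors.
        candidates = [child for state in beam for child in expand(state)]
        # Evaluate and prune: keep only the top-scoring candidates.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
        if best[0] == target:
            break  # Answer found; stop searching.
    return best


if __name__ == "__main__":
    value, steps = beam_search(start=1, target=20)
    print(value, steps)
```

The only part that would carry over to an LLM setting is the loop's shape; the generator and the scorer are exactly the pieces the discussion flags as hard.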

Perceived promise of adding search to LLMs

  • Could let models spend more compute on hard problems and less on easy ones.
  • Might turn today’s “intuitive oracle” LLMs into explicit problem solvers that can revise and refine plans before replying.
  • Some see this as a plausible path to much stronger systems or even “AI foom,” especially in domains with cheap, automated evaluation (games, theorem proving, fuzzing, some science tasks).
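
The "more compute on hard problems, less on easy ones" idea can be sketched as a propose-and-check loop that stops at the first candidate an automated checker accepts. This is a rough illustration under stated assumptions: `solve_with_budget` is a hypothetical helper, the candidate stream is a plain counter rather than a model, and the checker is a trivial arithmetic test.

```python
from itertools import count, islice
from typing import Callable, Iterable, Optional, Tuple


def solve_with_budget(candidates: Iterable[int],
                      verify: Callable[[int], bool],
                      max_attempts: int) -> Tuple[Optional[int], int]:
    """Try candidates until one verifies or the budget runs out.

    Returns (answer or None, number of attempts actually spent).
    """
    attempts = 0
    for attempts, cand in enumerate(islice(candidates, max_attempts), start=1):
        if verify(cand):
            return cand, attempts  # Easy instances exit early.
    return None, attempts  # Budget exhausted without a verified answer.


if __name__ == "__main__":
    # Toy task: find x with x * x == n. Larger n takes more attempts.
    for n in (9, 2500):
        answer, spent = solve_with_budget(count(0), lambda x: x * x == n, 100)
        print(n, answer, spent)
```

The loop only works because `verify` is cheap and trustworthy, which is why commenters single out games, theorem proving, and fuzzing.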

Limits: value functions and search spaces

  • Strong objection: chess works because it has a well-defined state space and a fast, good value function; real-world tasks and “AI research” have neither.
  • Value functions today are highly domain-specific; general ones are lacking and their feasibility is unclear.
  • For broad domains (AI research, curing Alzheimer’s, curing cancer), the state space and transitions are themselves unclear.
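
To show what the objection takes for granted in games, here is a small depth-limited negamax sketch on single-pile Nim (take 1 to 3 stones; whoever takes the last stone wins). The game, the function names, and the heuristic are assumptions chosen for illustration. The state space and moves are fully specified here, and the value function happens to be exact, which is the luxury commenters say open-ended tasks lack.

```python
from typing import List


def legal_moves(pile: int) -> List[int]:
    """Well-defined transitions: remove 1, 2, or 3 stones."""
    return [k for k in (1, 2, 3) if k <= pile]


def heuristic_value(pile: int) -> float:
    """Value for the player to move; for this toy game it is exact."""
    return 1.0 if pile % 4 != 0 else -1.0


def negamax(pile: int, depth: int) -> float:
    """Depth-limited search that falls back on the value function."""
    if pile == 0:
        return -1.0  # Opponent took the last stone; player to move lost.
    if depth == 0:
        return heuristic_value(pile)  # Cut off and trust the evaluator.
    return max(-negamax(pile - k, depth - 1) for k in legal_moves(pile))


def best_move(pile: int, depth: int = 2) -> int:
    """Pick the move whose resulting position is worst for the opponent."""
    return max(legal_moves(pile), key=lambda k: -negamax(pile - k, depth - 1))


if __name__ == "__main__":
    print(best_move(10))  # Leaves a multiple of 4 for the opponent.
```

Replace `heuristic_value` with a noisy or biased estimate and the shallow search starts picking bad moves, which is the worry for domains without a good evaluator.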

Compute and practicality

  • Tree search over token sequences is computationally enormous (branching factor in the tens of thousands at token level).
  • Even coarse-grained idea-level branching could be very expensive; recent papers that use search run drastically fewer rollouts than game AIs do, suggesting cost pressure.
  • Debate over train-time vs inference-time cost tradeoffs; a 100–1000× increase in inference cost may be unacceptable for many applications.
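
A quick back-of-the-envelope calculation shows why the branching factor dominates. The figures below are illustrative assumptions, not numbers from the thread: a 50,000-token vocabulary stands in for "tens of thousands", and 5 options per step stands in for idea-level branching.

```python
def leaf_count(branching_factor: int, depth: int) -> int:
    """Number of leaves in a full tree: branching_factor ** depth."""
    return branching_factor ** depth


def search_cost(base_cost: float, rollouts: int) -> float:
    """Rough inference cost if each rollout costs about one normal reply."""
    return base_cost * rollouts


if __name__ == "__main__":
    # Token-level branching (assumed vocabulary size) at just 3 tokens deep.
    print(f"token level, depth 3: {leaf_count(50_000, 3):.3e} leaves")
    # Idea-level branching (assumed 5 options per step), 10 steps deep.
    print(f"idea level, depth 10: {leaf_count(5, 10):,} leaves")
    # A hypothetical 1-cent reply under 100x and 1000x search multipliers.
    for rollouts in (100, 1000):
        print(f"{rollouts} rollouts: ${search_cost(0.01, rollouts):.2f}")
```

Even the coarse idea-level tree reaches nearly ten million leaves at depth 10, so any practical system has to prune hard or sample only a small fraction of the tree.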

Alignment and superintelligence debates

  • Some warn that anything accelerating the path to superintelligence worsens alignment risks; the article is criticized for ignoring “what to optimize for” and control.
  • Others are skeptical that “superintelligence” is even a coherent or reachable concept, or see AGI as requiring multiple unknown breakthroughs and long timelines.

World models, generalization, and LLM limits

  • Repeated concern that current LLMs lack robust world models and generalization; they remix text more than they reason.
  • Without reliable internal models, search may just traverse a space of biased, sometimes false beliefs.
  • Several argue we still need mechanisms to learn usable world models (e.g., from video, rich simulations, adjustable abstraction levels).

Symbolic vs statistical approaches

  • Commenters note that classical search, planning, and theorem-proving already have near-optimal algorithms under known tradeoffs (soundness, completeness, efficiency).
  • Some advocate hybrid neuro-symbolic systems where logic, simulators, or ontologies provide structure and evaluation, with LLMs as generators.
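
The hybrid pattern in the last bullet can be sketched as a fallible generator paired with an exact symbolic checker. In this illustration the "generator" is just a hard-coded list of proposed arithmetic expressions (an assumption standing in for LLM output), and the checker walks a Python syntax tree so that only integer `+`, `-`, and `*` are ever evaluated. The names `evaluate` and `first_verified` are hypothetical.

```python
import ast
import operator
from typing import Iterable, Optional

# Whitelisted operators the symbolic checker is willing to evaluate.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(expr: str) -> int:
    """Exactly evaluate an integer arithmetic expression, or raise."""

    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def first_verified(proposals: Iterable[str], target: int) -> Optional[str]:
    """Return the first proposal the checker confirms equals `target`."""
    for expr in proposals:
        try:
            if evaluate(expr) == target:
                return expr
        except (ValueError, SyntaxError):
            continue  # Malformed or unsupported proposals are rejected.
    return None


if __name__ == "__main__":
    # Stand-in for generator output: wrong, malformed, unsafe, then correct.
    proposals = ["5 * 5", "3 * (2 + ", "import os", "(1 + 2) * 8"]
    print(first_verified(proposals, target=24))
```

The generator is allowed to be wrong or malformed because the checker is sound within its small domain; the open question raised in the thread is how far such checkers extend beyond narrow, formal tasks.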