Fei-Fei Li: Spatial intelligence is the next frontier in AI [video]

LLMs vs Computer Vision and Spatial AI

  • Several commenters feel LLM hype has drained jobs, funding, and mindshare from computer vision, RL, and robotics, despite CVPR-style research continuing.
  • Others note strong recent CV progress (e.g., segmentation, depth, NeRFs, Gaussian splats) and argue LLM advances indirectly accelerate vision via better tooling and compute.
  • Some sectors (defense, aviation, UAVs, automotive) still depend on classic, real‑time vision; LLMs are seen as unsuitable for tight spatial control loops.
  • A minority frame the LLM wave as an opportunity: less competition to innovate in under‑funded CV/3D areas.

Spatial Reasoning Limitations of Current Models

  • Multiple concrete failures are reported: LLMs mishandle basic spatial relationships in geolocation tasks, 2D optimization, and CAD/OpenSCAD code, and even miscount polygon sides.
  • In a detailed geolocation case, the model could identify the city/area from a low‑quality image but repeatedly failed to place crosswalks and buildings consistently in a bird’s‑eye schematic, despite step‑by‑step corrections.
  • Text-to-image pipelines are seen as especially weak: the text understanding may be fine, but the translation into a coherent spatial layout often breaks down.

Is Spatial AI Fundamentally Harder?

  • One line of argument: real‑world spatiotemporal dynamics are sparse, nonlinear and structurally different from sequence prediction; existing public CS literature lacks general, scalable representations of arbitrary spatial relationships.
  • The commenter making this argument cites non‑public government research into high‑dimensional “cutting” data structures for complex geometry and claims that universal solutions cannot exist.
  • Others push back, citing practical successes (video models, NeRFs, 3D Gaussians, geometric methods) and questioning both the “impossible in principle” framing and the reliance on undocumented “dark” research.
  • Debate emerges over whether transformer‑based multimodal models already provide a viable path to spatial reasoning, or whether deeper theoretical breakthroughs in data structures are needed.

3D Reconstruction and Scan‑to‑CAD

  • Several practitioners describe work on detecting planes, edges, and pipes from point clouds, compressing large scans into efficient CAD‑like models.
  • There is optimism that RL/ML can soon outperform classical photogrammetry and SfM (e.g., COLMAP) for buildings and indoor scenes, unlocking value across construction, robotics, AR/VR, and mapping.
  • Funding remains challenging: investors want near‑term traction, while researchers emphasize broader, longer‑term implications.
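
For readers unfamiliar with the plane-detection step mentioned above, here is a minimal sketch of one classical baseline, RANSAC plane fitting, in Python with NumPy. The thread does not say which methods the practitioners actually use, so this is an illustrative assumption rather than their pipeline; the function name `fit_plane_ransac`, its parameters, and the synthetic point cloud are all hypothetical.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, threshold=0.02, seed=0):
    """Return (normal, d, inlier_mask) for the plane n.x + d = 0 with the most inliers.

    Hypothetical helper for illustration; parameter defaults are arbitrary.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        # Sample three distinct points and build the plane through them.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # degenerate (collinear) sample, skip it
            continue
        normal = normal / norm
        d = -normal.dot(p0)
        # Points closer than `threshold` to the plane count as inliers.
        mask = np.abs(points @ normal + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (normal, d)
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    return best_plane[0], best_plane[1], best_mask

# Synthetic scan (assumed data): a noisy floor at z = 0 plus random clutter.
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(-5, 5, (500, 2)), rng.normal(0, 0.005, 500)])
clutter = rng.uniform(-5, 5, (100, 3))
cloud = np.vstack([floor, clutter])

normal, d, inliers = fit_plane_ransac(cloud)
print("plane normal:", normal, "inliers:", int(inliers.sum()))
```

Replacing the inlier points with a single plane equation is the kind of compression the scan‑to‑CAD comments describe: hundreds of raw points collapse into four numbers. A real scan would need repeated fitting for multiple planes, plus separate handling for edges and pipes.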

Data, Embodiment, and Environments

  • Commenters pick up on Li’s “no internet of 3D space” point: spatial AI lacks an equivalent of massive text corpora.
  • Two main data strategies are discussed:
    • Synthetic/game‑engine worlds: scalable but plagued by sim‑to‑real gaps.
    • Real‑world capture (multi-sensor, multi-view): realistic but creates huge MLOps challenges around storage, alignment, labeling, and representation.
  • Some argue intelligence must be embodied and embedded in an environment; proposals include fleets of simple robots gathering experience in shared “playpens,” or highly realistic simulations.
  • Others note that humans function with coarse heuristics; a “child‑level” spatial understanding may be useful long before precise physical world models are achieved.

Human Spatial Intelligence Analogies

  • People discuss wide individual variation in spatial skills, aphantasia without spatial deficits, and “car‑proprioception” when parking.
  • There is debate on how much spatial ability is innate vs learned; examples from animals (chicks, horses, ducks) are cited as evidence of hard‑wired spatial/visual competencies, with some skepticism and counter‑links.

Reactions to the Talk and Li’s Role

  • Many praise the talk as a rare, de‑hyped framing of what comes after language‑centric AI, especially her focus on spatial intelligence and data problems.
  • Her hiring emphasis on “intellectual fearlessness” is seen as appropriate for building entirely new datasets and infrastructures.
  • A side thread discusses her remarks about age; some view them as natural context, others see mild over‑emphasis, and there is minor debate over the extent of her “genius” status.