Fei-Fei Li: Spatial intelligence is the next frontier in AI [video]
LLMs vs Computer Vision and Spatial AI
- Several commenters feel LLM hype has drained jobs, funding, and mindshare from computer vision, RL, and robotics, despite CVPR-style research continuing.
- Others note strong recent CV progress (e.g., segmentation, depth, NeRFs, Gaussian splats) and argue LLM advances indirectly accelerate vision via better tooling and compute.
- Some sectors (defense, aviation, UAVs, automotive) still depend on classic, real‑time vision; LLMs are seen as unsuitable for tight spatial control loops.
- A minority frame the LLM wave as an opportunity: less competition to innovate in under‑funded CV/3D areas.
Spatial Reasoning Limitations of Current Models
- Commenters report multiple concrete failures: LLMs mishandle basic spatial relationships in geolocation tasks, 2D optimization, CAD/OpenSCAD code, and even counting polygon sides.
- In a detailed geolocation case, the model could identify the city/area from a low‑quality image but repeatedly failed to place crosswalks and buildings consistently in a bird’s‑eye schematic, despite step‑by‑step corrections.
- Text-to-image pipelines are seen as especially weak: the text understanding may be fine, but translation into coherent spatial layouts often collapses.
Is Spatial AI Fundamentally Harder?
- One line of argument: real‑world spatiotemporal dynamics are sparse, nonlinear and structurally different from sequence prediction; existing public CS literature lacks general, scalable representations of arbitrary spatial relationships.
- The commenter making this argument references non‑public government research into high‑dimensional “cutting” data structures for complex geometry and claims universal solutions cannot exist.
- Others push back, citing practical successes (video models, NeRFs, 3D Gaussians, geometric methods) and questioning both the “impossible in principle” framing and the reliance on undocumented “dark” research.
- Debate emerges over whether transformer‑based multimodal models already provide a viable path to spatial reasoning, or whether deeper theoretical breakthroughs in data structures are needed.
3D Reconstruction and Scan‑to‑CAD
- Several practitioners describe work on detecting planes, edges, and pipes from point clouds, compressing large scans into efficient CAD‑like models.
- There is optimism that RL/ML could soon outperform classical photogrammetry and structure-from-motion (SfM) pipelines such as COLMAP for buildings and indoor scenes, unlocking value across construction, robotics, AR/VR, and mapping.
- Funding remains challenging: investors want near‑term traction, while researchers emphasize broader, longer‑term implications.
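
To make the plane-detection step mentioned above concrete, here is a minimal sketch of one classical approach, RANSAC plane fitting, on a synthetic point cloud. The thread does not say which methods the practitioners use, so the choice of RANSAC is an assumption, as are all names (plane_from_points, ransac_plane, make_demo_cloud), the thresholds, and the synthetic data. It uses only the Python standard library and is meant as an illustration, not a production pipeline.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit_normal, d) for the plane through p, q, r, or None if collinear.

    The plane satisfies n . x + d = 0.
    """
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # Cross product of the two edge vectors gives the plane normal.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None  # degenerate sample: the three points are (nearly) collinear
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d


def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Find the plane supported by the most points within `threshold` distance.

    Returns (plane, inlier_indices); plane is None if no valid sample was drawn.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [
            i for i, pt in enumerate(points)
            if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def make_demo_cloud(seed=1):
    """Hypothetical test data: a noisy floor near z = 0 plus scattered clutter."""
    rng = random.Random(seed)
    floor = [(rng.uniform(0, 5), rng.uniform(0, 5), rng.uniform(-0.002, 0.002))
             for _ in range(200)]
    clutter = [(rng.uniform(0, 5), rng.uniform(0, 5), rng.uniform(0.5, 3.0))
               for _ in range(50)]
    return floor + clutter


if __name__ == "__main__":
    cloud = make_demo_cloud()
    plane, inliers = ransac_plane(cloud)
    print("normal:", plane[0], "offset:", plane[1], "inliers:", len(inliers))
```

Replacing the inlier points with the four numbers of the fitted plane is the simplest form of the scan compression described above. Real scans would also need multiple planes, boundary extraction, and handling of curved surfaces such as pipes, none of which this sketch attempts.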
Data, Embodiment, and Environments
- Commenters pick up on Li’s “no internet of 3D space” point: spatial AI lacks an equivalent of massive text corpora.
- Two main data strategies are discussed:
  - Synthetic/game‑engine worlds: scalable but plagued by sim‑to‑real gaps.
  - Real‑world capture (multi-sensor, multi-view): realistic but creates huge MLOps challenges around storage, alignment, labeling, and representation.
- Some argue intelligence must be embodied and embedded in an environment; proposals include fleets of simple robots gathering experience in shared “playpens,” or highly realistic simulations.
- Others note that humans function with coarse heuristics; a “child‑level” spatial understanding may be useful long before precise physical world models are achieved.
Human Spatial Intelligence Analogies
- People discuss wide individual variation in spatial skills, aphantasia without spatial deficits, and “car‑proprioception” when parking.
- There is debate on how much spatial ability is innate vs learned; examples from animals (chicks, horses, ducks) are cited as evidence of hard‑wired spatial/visual competencies, with some skepticism and counter‑links.
Reactions to the Talk and Li’s Role
- Many praise the talk as a rare, de‑hyped framing of what comes after language‑centric AI, especially her focus on spatial intelligence and data problems.
- Her hiring emphasis on “intellectual fearlessness” is seen as appropriate for building entirely new datasets and infrastructures.
- A side thread discusses her remarks about age; some view them as natural context, others see mild over‑emphasis, and there is minor debate over the extent of her “genius” status.