2026-02-06

The Waymo World Model

Sensors, Perception, and Tesla Comparisons

Big subthread on whether Waymo’s “world model” implies it could run on cameras only: some say yes in principle, others note the stack and maps were bootstrapped with lidar and radar.
Repeated contrast with Tesla: Tesla uses limited lidar/radar on special fleets for ground-truthing but production cars are camera‑only; some argue Tesla’s depth estimation is now “good enough,” others insist multimodal fusion is inherently safer.
Lots of discussion of human depth perception: commenters say binocular (stereo) depth cues are only precise out to a few meters; beyond that humans rely on motion parallax, context, size priors, etc. Several argue fixed car cameras miss much of this “extra sensing,” so redundancy (lidar, radar, better optics) is important.
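
The claim about binocular range can be made concrete with the standard small-angle stereo approximation, in which depth uncertainty grows with the square of distance. The sketch below is illustrative only: the 65 mm eye separation and the 20 arcsecond angular resolution are assumed round numbers, not figures from the thread, and the function name is hypothetical.

```python
import math

ARCSEC = math.pi / (180 * 3600)      # one arcsecond in radians

# Assumed, illustrative values (not taken from the discussion):
HUMAN_BASELINE_M = 0.065             # rough distance between the eyes
HUMAN_ACUITY_RAD = 20 * ARCSEC       # rough angular disparity resolution


def stereo_depth_error(distance_m, baseline_m, disparity_error_rad):
    """Approximate depth uncertainty of a two-view (stereo) system.

    Small-angle model: the vergence angle is about baseline / distance,
    so an angular error d_theta maps to a depth error of roughly
    distance**2 * d_theta / baseline.
    """
    return distance_m ** 2 * disparity_error_rad / baseline_m


if __name__ == "__main__":
    for d in (2, 10, 50, 100):
        err = stereo_depth_error(d, HUMAN_BASELINE_M, HUMAN_ACUITY_RAD)
        print(f"{d:>4} m: depth uncertainty of roughly {err:.2f} m")
```

Under these assumptions the uncertainty is centimeters at arm's length but on the order of meters at typical road distances, which is the quadratic falloff behind both the "binocular only helps up close" argument and the case for wider camera baselines or active ranging.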

World Models and Synthetic Data

Novelty seen in generating multimodal 3D lidar-like representations from 2D video, then using this to create high‑fidelity simulations (floods, tornadoes, wildfires, wrong‑way drivers, etc.).
Some note prior work on monocular depth and “metric monodepth,” but concede Waymo’s output looks state-of-the-art.
A big implied benefit: if 2D → 3D works well, every dashcam / YouTube / CCTV video becomes potential training data, vastly outscaling Waymo’s own fleet.
Skeptics worry about “laundering” assumptions: simulated worlds built mostly from successful driving might miss or mis-model rare failure modes, and hallucinated edge cases could train unsafe behavior if not carefully validated.

Remote Operators and “Autonomy”

Several link to recent reporting and Senate testimony indicating that Waymo uses overseas human “fleet response” agents (including in the Philippines).
Clarification: these agents don’t tele‑drive; they provide high-level guidance when the car is stuck or uncertain (e.g., blocked intersections, protests), with the onboard stack retaining control of the dynamic driving task.
Some see this as normal human‑in‑the‑loop safety; others view the marketing around “autonomous” as misleading and enabled by low‑wage global labor.

Urban Difficulty and Edge Cases

Debate over whether SF is truly hard; many point to medieval European city centers, London backstreets, and Asian megacities (Mumbai, Ho Chi Minh City, Manila, Dhaka, Old Delhi) as the “final boss.”
Reports from SF and London: Waymo generally handles narrow, steep, and chaotic streets well, but can struggle on ultra‑narrow two‑way roads and during city‑wide stressors (e.g., power outages, parades) when many cars simultaneously need human assistance.
Some ask how the world model is validated on truly novel physical situations (black ice, ball bearings, heavy snow where lane markings fully disappear).

Societal Impact and Alternatives

Thread splits between “progress is inevitable; we survived tractors and electricity” and concern about millions of driving jobs disappearing with little social safety net.
A strong contingent argues that money and effort would be better spent on high‑quality public transit, bikes, and better land use; others counter that most US cities are already car‑centric and AVs will de facto become part of public transit.
Critics highlight that both roads and transit are heavily subsidized; supporters of AVs claim long‑term safety and convenience gains may justify the investment.
