Google says AI weather model masters 15-day forecast

Scope and novelty of the model

  • Commenters identify this as a new DeepMind model (GenCast), reportedly a significant improvement over prior AI weather models.
  • It’s trained on ~40 years of ERA5 reanalysis data (1979–2018) and evaluated mainly on 2019.
  • Main selling points: 15‑day global forecasts, high claimed skill vs ECMWF, and much lower runtime (minutes vs hours on HPC clusters).

Training, backtesting, and overfitting concerns

  • Evaluation is largely on historical “held-out” data (2019), not on live, forward-in-time forecasts yet.
  • Multiple commenters worry that hyperparameter tuning and model iteration against that test period may have quietly overfit the model, inflating its apparent generalization.
  • Others respond that backtesting on post‑training years is standard practice; 2019 is after the training window, so it is at least a genuine out-of-sample period in calendar time.
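The temporal holdout being debated above can be sketched in a few lines. This is a hypothetical illustration, not the paper's pipeline: the sample tuples and `split_by_year` helper are made up, and real ERA5 slices are gridded fields, not strings.

```python
from datetime import date

def split_by_year(samples, train_end_year=2018):
    """Partition (date, data) samples into a training set and a
    calendar-time out-of-sample test set."""
    train = [s for s in samples if s[0].year <= train_end_year]
    test = [s for s in samples if s[0].year > train_end_year]
    return train, test

# Toy stand-ins for reanalysis slices spanning the 1979-2018 training
# window plus the 2019 evaluation year.
samples = [(date(1979, 1, 1), "era5_slice_a"),
           (date(2018, 12, 31), "era5_slice_b"),
           (date(2019, 6, 1), "era5_slice_c")]
train, test = split_by_year(samples)
# Every test sample postdates every training sample, which is what makes
# 2019 a genuine out-of-sample period in calendar time -- though repeated
# tuning against it can still leak information, as the thread notes.
```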

Accuracy, extremes, and tail risks

  • Several people care less about “97% of cases” and more about the 3% that are wrong: are they trivial drizzle misses or catastrophic-storm failures?
  • Concern that AI models may do great on common, stable regimes but fail badly on rare, high-impact events (bomb cyclones, unusual hurricanes).
  • Some note that traditional models also struggle here, leaving open whether AI actually improves extreme-event skill.
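The tail-risk worry above is essentially that aggregate skill can hide conditional failure. A minimal sketch, with entirely illustrative numbers and thresholds (nothing here comes from the paper): compute error separately on common and extreme cases.

```python
def mean_abs_error(pairs):
    """Mean absolute error over (forecast, observed) pairs."""
    return sum(abs(f - o) for f, o in pairs) / len(pairs)

def split_by_extremes(pairs, threshold):
    """Split (forecast, observed) pairs by whether the observed
    value crosses an extreme-event threshold."""
    common = [(f, o) for f, o in pairs if o < threshold]
    extreme = [(f, o) for f, o in pairs if o >= threshold]
    return common, extreme

# Toy wind-speed data (m/s): the forecasts track ordinary conditions
# closely but badly miss the single extreme event.
pairs = [(5.1, 5.0), (7.2, 7.0), (6.0, 6.1), (20.0, 32.0)]
common, extreme = split_by_extremes(pairs, threshold=25.0)
# mean_abs_error(common) is ~0.13; mean_abs_error(extreme) is 12.0,
# even though the pooled error over all four pairs still looks modest.
```

Stratifying evaluation this way is what the commenters are implicitly asking for: not "97% accurate overall," but skill conditioned on the events that matter.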

AI vs physics-based / causal models

  • One camp argues physics-based numerical models encode causal structure, are more interpretable, and handle distribution shifts better.
  • Another notes that traditional models also have many heuristics and tuning; they’re not pure first-principles.
  • A hybrid future is discussed: physics cores with ML components (e.g., neural differential equations, emulators of dynamical cores).
  • Some evidence is cited that AI weather models reproduce classic dynamical behaviors, suggesting they’re more than naive pattern-matching.
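The hybrid idea discussed above can be caricatured as "physics step, then learned correction." Everything below is an assumption for illustration: the "physics" is a toy linear relaxation and `learned_correction` is a fixed stand-in for what would be a trained network in a real system.

```python
def physics_step(state, dt=0.1, relax=0.5):
    """Toy dynamical core: relax the state toward zero."""
    return state - relax * state * dt

def learned_correction(state):
    """Stand-in for an ML component that nudges the physics output.
    A real hybrid would evaluate a trained network here."""
    return 0.01 * state

def hybrid_step(state, dt=0.1):
    """One integration step: physics core first, ML correction second."""
    s = physics_step(state, dt)
    return s + learned_correction(s)

state = 1.0
for _ in range(10):
    state = hybrid_step(state)
# The physics core carries the causal structure; the learned term only
# corrects its residual error -- the division of labor the thread describes.
```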

Understanding vs pure prediction

  • Several worry that AI forecasts improve utility without advancing scientific understanding, since the learned weights are opaque.
  • Counterargument: operational forecasting is about usable predictions; understanding can still be pursued separately, and AI outputs can themselves be studied.

Operational, institutional, and trust issues

  • GenCast depends on ECMWF-style reanalysis and initial conditions, so the claimed compute savings partly externalize costs onto those systems.
  • DeepMind claims code/weights/forecasts will be released; some suspect eventual monetization or lock-in.
  • Skepticism about Google’s claims is fueled by past missteps (Google Flu Trends, Gemini rollout), though others point to DeepMind’s strong track record (e.g., Alpha* work).

Climate change and distribution shift

  • Debate over how much changing climate and evolving weather statistics will degrade AI model skill over time.
  • Some think underlying atmospheric dynamics are stable enough that regular retraining will suffice; others think future, shifted regimes could cause sharp accuracy drops.
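The retraining position above presumes some way to notice drift. One minimal sketch, assuming a naive mean-shift check on a single variable (the helper, tolerance, and temperature values are all invented for illustration):

```python
def mean(xs):
    return sum(xs) / len(xs)

def needs_retraining(train_obs, recent_obs, tolerance=1.0):
    """Flag when the mean of recent observations drifts beyond a
    tolerance from the training-era mean."""
    return abs(mean(recent_obs) - mean(train_obs)) > tolerance

train_temps = [14.0, 14.2, 13.9, 14.1]   # illustrative training-era values
recent_temps = [15.4, 15.6, 15.3, 15.5]  # illustrative shifted regime
# needs_retraining(train_temps, recent_temps) -> True
```

A shift a check like this can detect supports the "retrain regularly" camp; the sharper worry in the thread is regimes with no analogue in the training record at all, which no amount of monitoring on historical statistics would anticipate.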