El Capitan: new supercomputer is the world's fastest

Purpose and Role of El Capitan

  • Built at Lawrence Livermore National Laboratory using AMD Instinct MI300A APU-based nodes linked by HPE's Slingshot interconnect.
  • Officially used to model nuclear weapon performance, aging, and safety, replacing live tests under test-ban regimes.
  • Also expected to support other HPC workloads like fusion research, genomics, and fundamental simulations.

Nuclear Weapons, Deterrence, and Ethics

  • Some commenters are disturbed that leading-edge compute is driven by nuclear weapons work, especially amid geopolitical tensions and stalled disarmament.
  • Others argue supercomputer simulations are preferable to live nuclear testing and are essential for stockpile stewardship and credible deterrence.
  • Commenters clarified that modern work focuses on reliability, safety, aging, and variable-yield designs rather than ever-higher explosive yields.
  • Concern was raised that some states might neglect stewardship and discover warheads' "use-by dates" only in a crisis.

Why Nuclear Simulations Need Massive Compute

  • Simulations couple many demanding domains: radiation and neutron transport, hydrodynamics, plasma physics, high-temperature chemistry, and aging effects.
  • Extremely small time scales (nanoseconds), extreme conditions (pressures, temperatures, plasmas), and 3D modeling needs drive complexity.
  • Codes often run large ensembles (uncertainty quantification, sensitivity analysis).
  • There is debate over whether such codes simulate matter down to individual subatomic particles; the thread's consensus is that full per-particle modeling is infeasible and heavy approximations are required.
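The ensemble point above can be sketched in miniature. The snippet below is a toy illustration of uncertainty quantification by Monte Carlo ensemble: a cheap surrogate function stands in for one expensive coupled simulation, an input parameter is perturbed across many runs, and the output spread is summarized. The function `surrogate_yield` and its parameters are invented for illustration; real stockpile codes are nothing like this simple.

```python
import math
import random
import statistics

def surrogate_yield(compression: float) -> float:
    """Toy stand-in for one expensive simulation run.

    Real codes couple hydrodynamics, radiation transport, plasma
    physics, etc.; here a single nonlinear formula plays that role
    (pure illustration, arbitrary units).
    """
    return 100.0 * math.tanh(3.0 * (compression - 1.0)) + 100.0

def run_ensemble(nominal: float, uncertainty: float, n_runs: int, seed: int = 0):
    """Run n_runs perturbed 'simulations' and summarize the output spread."""
    rng = random.Random(seed)
    outputs = [
        surrogate_yield(rng.gauss(nominal, uncertainty))
        for _ in range(n_runs)
    ]
    return statistics.mean(outputs), statistics.stdev(outputs)

mean, spread = run_ensemble(nominal=1.05, uncertainty=0.02, n_runs=1000)
print(f"mean output: {mean:.1f}, spread: {spread:.1f}")
```

When each member of the ensemble is itself a multi-day 3D simulation rather than a one-line formula, running hundreds or thousands of them is exactly the kind of workload that fills an exascale machine.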

Hardware, Performance, and Precision

  • El Capitan is a significant win for AMD in exascale HPC, contrasting with less successful competing efforts.
  • Discussion on FP64 vs lower-precision compute: nuclear/HPC workloads need high precision, unlike LLM training, which tolerates FP16/FP8.
  • AI training clusters (e.g., tens of thousands of H100s) may now exceed national labs in raw (low-precision) FLOPs, but workloads and metrics are not directly comparable.
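The FP64-vs-FP16 distinction can be made concrete with a small experiment. The sketch below uses Python's `struct` half-precision format (`'e'`) to round a running sum to FP16 after every addition: accumulating many small increments works fine in FP64 but stalls in FP16 once the sum's representable spacing exceeds the increment. This is a generic floating-point demonstration, not a claim about any specific lab code.

```python
import struct

def fp16(x: float) -> float:
    """Round a Python float (FP64) to IEEE 754 half precision and back."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Accumulate 10,000 increments of 1e-4. In FP64 the sum lands near 1.0;
# in FP16 the running sum stalls once the gap between adjacent
# representable values exceeds twice the increment, so later additions
# round away to nothing.
increment = 1e-4
sum64 = 0.0
sum16 = 0.0
for _ in range(10_000):
    sum64 += increment
    sum16 = fp16(sum16 + fp16(increment))

print(f"FP64 sum: {sum64:.6f}")  # close to 1.0
print(f"FP16 sum: {sum16:.6f}")  # stalls near 0.25
```

Long chains of dependent operations in physics solvers amplify exactly this kind of rounding loss, which is why HPC codes lean on FP64 while neural-network training, with its tolerance for noisy gradients, gets away with FP16/FP8.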

Topology, Secrecy, and Alternatives

  • Key differentiator of supercomputers is low-latency, high-bandwidth interconnects and specialized topologies; many scientific codes are tightly coupled and not “embarrassingly parallel.”
  • Distributed volunteer projects (Folding@home, SETI@home) work for loosely coupled problems, but not for many nuclear/HPC simulations.
  • The TOP500 list is seen as incomplete: Chinese labs and major tech companies often withhold benchmark submissions due to sanctions, secrecy, or lack of incentive.
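The tight-coupling point above is the classic halo-exchange pattern. The sketch below splits a 1D heat-diffusion grid across two simulated "ranks": every time step, each rank must receive its neighbor's boundary cell before it can advance, which is why latency matters and why such codes don't map onto volunteer-computing projects. This is a pattern sketch in plain Python, not MPI code.

```python
# Toy 1D explicit heat diffusion, domain-decomposed across two "ranks".
# Each step requires a halo (ghost-cell) exchange across the boundary;
# an embarrassingly parallel workload would need no such exchange.

def step(local, left_halo, right_halo, alpha=0.1):
    """Advance one explicit diffusion step on a rank's local cells."""
    padded = [left_halo] + local + [right_halo]
    return [
        padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]

# Initial condition: a hot cell at rank 0's right edge, rank 1 cold.
rank0 = [0.0, 0.0, 0.0, 100.0]
rank1 = [0.0, 0.0, 0.0, 0.0]

for _ in range(50):
    # The "communication" phase: trade edge values across the boundary.
    # On a real machine this is a network round-trip every time step.
    halo_from_1 = rank1[0]
    halo_from_0 = rank0[-1]
    # Insulated outer walls: mirror each rank's own edge value.
    rank0 = step(rank0, rank0[0], halo_from_1)
    rank1 = step(rank1, halo_from_0, rank1[-1])

# Heat crosses the rank boundary only via the halo exchange.
print(f"rank1 after 50 steps: {[round(x, 2) for x in rank1]}")
```

With thousands of ranks stepping in lockstep, every step is gated on the slowest exchange, so interconnect latency and topology, not just raw FLOPS, determine how fast the whole simulation runs.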

Historical and Miscellaneous Notes

  • Fast Fourier Transform and global seismometer networks were partly driven by nuclear test detection, with significant spillover benefits to geophysics.
  • Nostalgia for earlier supercomputers and their front-panel lights, plus comparisons of historical FLOPS records with modern consumer devices.