2024-05-13

Intel announces the Aurora supercomputer has broken the exascale barrier

Project history & vendor choices

Aurora was first announced in 2015 and, after a redesign, was slated to become the first US exascale system; the delays and redesigns drew criticism of Intel’s execution.
Some defend Intel’s engineering strength and note that the DOE wanted multiple GPU/CPU vendors (Intel, AMD, Nvidia) to avoid a single-vendor bottleneck and to subsidize a broader ecosystem.
Others question the overall value to taxpayers, and some claim that alternative architectures (e.g., Cerebras) might have delivered more raw flops per dollar.

Performance, benchmarks, and FLOPS

The announcement coincides with the Spring TOP500 list: Aurora is now at ~1.0 exaflops Rmax (LINPACK), still second behind Frontier.
The jump from ~585 PFLOPS (Nov 2023) is attributed to progress in the system’s difficult commissioning (more of the machine being brought into the benchmark run), not to a mid-life hardware upgrade.
Discussion clarifies that TOP500 rankings are based solely on FP64 LINPACK (HPL); many “AI flops” numbers use lower precision (FP16, BF16, FP8) or even integer arithmetic (INT8), which is not floating point at all.
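To make the precision gap concrete, the sketch below round-trips values through IEEE 754 binary16 (FP16), binary32 (FP32), and binary64 (FP64) using Python’s standard `struct` module, and measures each format’s machine epsilon. It only illustrates numerical resolution, not speed; BF16 and FP8 are not covered because `struct` has no format codes for them. The helper names are hypothetical.

```python
import struct

def round_trip(x: float, fmt: str) -> float:
    """Store x in an IEEE 754 format and read it back as a Python float.

    'e' = binary16 (FP16), 'f' = binary32 (FP32), 'd' = binary64 (FP64).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def machine_epsilon(fmt: str) -> float:
    """Smallest power of two eps for which 1 + eps is still distinct from 1 in fmt."""
    eps = 1.0
    # Halve eps until adding half of it to 1.0 rounds back to exactly 1.0.
    while round_trip(1.0 + eps / 2, fmt) != 1.0:
        eps /= 2
    return eps

if __name__ == "__main__":
    for name, fmt in [("FP16", "e"), ("FP32", "f"), ("FP64", "d")]:
        print(name, "epsilon:", machine_epsilon(fmt), "0.1 stored as:", round_trip(0.1, fmt))
```

For example, `round_trip(0.1, 'e')` returns 0.0999755859375, and the epsilons come out as 2⁻¹⁰, 2⁻²³, and 2⁻⁵² respectively. That gap in resolution is why FP64 LINPACK results and low-precision “AI flops” figures are not interchangeable.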
There’s an extended side debate on the meaning and notation of FLOPS, FLOP/s, and “FLOPS/s”.
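The conventions the debate turns on can be written out with standard SI prefixes. This is ordinary unit arithmetic, not a ruling on which spelling the thread preferred:

```latex
\begin{aligned}
\text{FLOP} &: \text{one floating-point operation (a count)} \\
\text{FLOP/s} \;(\text{often written FLOPS}) &: \text{floating-point operations per second (a rate)} \\
1\ \text{EFLOP/s} &= 10^{3}\ \text{PFLOP/s} = 10^{18}\ \text{FLOP/s} \\
585\ \text{PFLOP/s} &= 0.585\ \text{EFLOP/s}
\end{aligned}
```

Read literally, “FLOPS/s” would be FLOP/s², a change in throughput per second, which is why it is usually treated as a redundancy.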

Power efficiency and architecture debates

Frontier (AMD) is noted as significantly more power efficient than Aurora (Intel) in kW per PFLOPS.
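The two common ways of stating power efficiency are reciprocals of each other. The hypothetical helpers below convert a TOP500-style entry (Rmax in PFLOP/s, power in kW) into kW per PFLOP/s and into GFLOP/s per watt, the metric the Green500 list ranks by. The example inputs are only the rough figures quoted in this summary (~1 EFLOP/s, ~40 MW), not official list values.

```python
def kw_per_pflops(rmax_pflops: float, power_kw: float) -> float:
    """Kilowatts drawn per PFLOP/s of sustained (Rmax) performance; lower is better."""
    return power_kw / rmax_pflops

def gflops_per_watt(rmax_pflops: float, power_kw: float) -> float:
    """GFLOP/s delivered per watt; higher is better."""
    # 1 PFLOP/s = 1e6 GFLOP/s and 1 kW = 1e3 W.
    return (rmax_pflops * 1e6) / (power_kw * 1e3)

if __name__ == "__main__":
    # Rough figures from this summary, used purely as an illustration.
    print(kw_per_pflops(1000, 40_000), gflops_per_watt(1000, 40_000))
```

With those inputs, `kw_per_pflops(1000, 40_000)` is 40.0 and `gflops_per_watt(1000, 40_000)` is 25.0 (note that 1000 / 40 = 25). Plugging in Frontier’s published Rmax and power from the same list gives the comparison the thread refers to.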
Frontier also converts more of its theoretical peak into measured LINPACK performance.
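“Converting theoretical peak into measured performance” is usually expressed as HPL efficiency, Rmax / Rpeak. Here is a minimal sketch, assuming a simplified Rpeak that counts only one kind of accelerator per node. Real submissions also include host CPUs and vendor-specific FLOPs-per-cycle figures. All names and example numbers are hypothetical.

```python
def theoretical_peak_flops(nodes: int, devices_per_node: int,
                           fp64_flops_per_cycle: int, clock_hz: float) -> float:
    """Rpeak: the FP64 rate if every device retired its maximum work on every cycle."""
    return nodes * devices_per_node * fp64_flops_per_cycle * clock_hz

def hpl_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak actually sustained on LINPACK (HPL)."""
    if not 0 < rmax <= rpeak:
        raise ValueError("expected 0 < Rmax <= Rpeak")
    return rmax / rpeak

if __name__ == "__main__":
    rpeak = theoretical_peak_flops(nodes=2, devices_per_node=4,
                                   fp64_flops_per_cycle=1000, clock_hz=1e9)
    print(rpeak, hpl_efficiency(6e12, rpeak))
```

In this toy configuration Rpeak is 8 × 10¹² FLOP/s, so sustaining 6 × 10¹² FLOP/s on HPL would be 75% efficiency. A lower ratio means more of the hardware’s nominal capability goes unused on the benchmark.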
Thread participants debate how much of the efficiency gap comes from process node (TSMC vs Intel) versus architecture and power limits, citing examples from desktop CPUs and GPUs.
Some note that the compute tiles of Aurora’s GPUs (Ponte Vecchio) are themselves fabricated at TSMC, complicating simple Intel-vs-AMD narratives.

Is “exascale” a real barrier?

Several argue “exascale barrier” is marketing language: unlike the sound barrier, nothing qualitatively changes at exactly 10¹⁸ FLOP/s.
Others counter that exascale marked a long-planned community target with real challenges: power budgets, failure rates, I/O bottlenecks, and parallel software at extreme scale.
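Failure rates are one of the challenges that grow with scale. A common back-of-the-envelope model, sketched below, assumes node failures are independent with a constant rate. Under that assumption the system’s failure rate is the sum of the node rates, and a full-machine run’s chance of finishing undisturbed decays exponentially with its length. The node MTBF and node count in the example are illustrative assumptions, not Aurora’s figures, and the helper names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365

def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """Whole-machine mean time between failures.

    Assumes independent, exponentially distributed node failures, so the
    rates add: lambda_sys = nodes / node_mtbf.
    """
    return node_mtbf_hours / nodes

def p_no_failure(run_hours: float, system_mtbf: float) -> float:
    """Probability that a run using the whole machine sees no failure."""
    return math.exp(-run_hours / system_mtbf)

if __name__ == "__main__":
    mtbf = system_mtbf_hours(5 * HOURS_PER_YEAR, 10_000)  # hypothetical inputs
    print(f"system MTBF ~{mtbf:.2f} h, P(12 h run survives) ~{p_no_failure(12, mtbf):.3f}")
```

With a hypothetical 5-year node MTBF and 10,000 nodes, the machine as a whole would expect a failure roughly every 4.4 hours. A 12-hour full-system run would then finish without one only about 6% of the time, which is why resilience and checkpointing become first-class design concerns at this scale.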
Consensus leans toward calling it a difficult milestone/goal rather than a physics-like barrier.

Usage patterns and scientific value

Most HPC systems run many jobs concurrently, but “hero runs” sometimes take most or all of the machine (e.g., weather prediction, climate, large molecular dynamics runs, lattice QCD).
Large systems are justified by:
- Research in distributed systems, concurrency, and architecture.
- National-security workloads (nuclear stockpile simulations, classified physics).
- Scientific problems that need tightly coupled, massive parallelism.
Some contributors argue many scientific problems are better served by many smaller clusters or cloud-like “embarrassingly parallel” approaches, which can be cheaper and more productive.
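The difference between the two kinds of workload can be sketched in a few lines. In the hypothetical example below, `independent_run` stands in for one self-contained job in a parameter sweep. Such jobs can be farmed out to any number of workers, or to separate small clusters, because runs never exchange data. `stencil_step`, by contrast, is a toy tightly coupled update in which every cell needs its neighbours’ current values. A machine that splits the array across nodes must therefore exchange boundary values on every step, and that traffic is what large interconnects are built to carry. Threads are used here only to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def independent_run(param: float) -> float:
    """Stand-in for one self-contained simulation: depends only on its own input."""
    x = 0.0
    for _ in range(1_000):
        x = 0.5 * x + param  # converges toward 2 * param
    return x

def parameter_sweep(params: list[float]) -> list[float]:
    # No run reads another run's data, so the runs can execute in any order on
    # any number of workers; results are only gathered at the end.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(independent_run, params))

def stencil_step(u: list[float]) -> list[float]:
    """One step of a periodic 3-point average: each cell reads its neighbours."""
    n = len(u)
    # u[i - 1] wraps to u[-1] at i == 0 and u[(i + 1) % n] wraps at the end.
    # If u were split across nodes, each node would need its neighbours'
    # boundary cells (a halo exchange) before computing this step.
    return [(u[i - 1] + u[i] + u[(i + 1) % n]) / 3 for i in range(n)]

if __name__ == "__main__":
    print(parameter_sweep([1.0, 2.0, 3.0]))
    print(stencil_step([0.0, 3.0, 0.0]))
```

The sweep scales by simply adding workers. The stencil only scales as well as the network can keep neighbouring pieces in sync, which is the case for large, tightly integrated machines.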

AI, industry clusters, and TOP500 visibility

Aurora and similar systems are now heavily marketed as “AI-centric”; critics see this as bandwagon PR, but others note GPUs for HPC have long doubled as ML accelerators.
National labs are actively courting AI projects, offering significant free compute and collaboration, and this can be attractive to startups compared with cloud GPU costs.
Large private ML clusters (e.g., at tech and finance firms) often don’t appear on TOP500 because:
- They’re busy doing production work and not taken down for LINPACK.
- They lack full-system MPI/network configuration optimized for LINPACK.
- Many AI-focused GPUs or configurations have weak FP64 and are not tuned for that benchmark.

Power, cost, and broader impacts

Aurora reportedly consumes nearly 40 MW, making it one of the highest power draws in TOP500; some view this as wasteful, others as acceptable for flagship capability.
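The power figure translates into energy and money with simple arithmetic. The sketch below assumes the machine draws its full reported power around the clock. That is an upper bound, since real average draw varies with load, and the sketch also ignores cooling overhead. The electricity price is a placeholder the reader must supply, and the helper names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year

def annual_energy_mwh(avg_power_mw: float, utilization: float = 1.0) -> float:
    """Energy drawn in one year at the given average power and duty cycle."""
    return avg_power_mw * HOURS_PER_YEAR * utilization

def annual_energy_cost(avg_power_mw: float, price_per_kwh: float,
                       utilization: float = 1.0) -> float:
    """Electricity cost for one year; price_per_kwh is a caller-supplied assumption."""
    # 1 MWh = 1,000 kWh.
    return annual_energy_mwh(avg_power_mw, utilization) * 1_000 * price_per_kwh

if __name__ == "__main__":
    print(annual_energy_mwh(40), annual_energy_cost(40, 0.07))
```

At 40 MW, `annual_energy_mwh(40)` is 350,400 MWh. At an assumed $0.07/kWh, `annual_energy_cost(40, 0.07)` comes to about $24.5 million per year.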
Some recurringly dismiss the effort as “willy waving” and national prestige rather than broad societal benefit; others counter that such systems helped drive GPU, software (BLAS, CUDA), and containerization advances that are widely used today.