AmpereOne: Cores Are the New MHz
Performance and Power Efficiency
- Multiple comments compare AmpereOne 192-core (≈276 W TDP) against AMD EPYC 192‑core parts.
- Some argue AMD’s 192-core chips, despite drawing roughly double the power, offer twice the thread count via SMT (384 threads vs. 192) and end up with similar or better performance per watt.
- Linked benchmarks (via Phoronix, videos) suggest AMD’s 192-core EPYC can outperform the AmpereOne in many workloads, and that AMD may be “king of efficiency” for now.
- Idle power is a concern: reports of EPYC systems idling at 100 W or more (the IO die draws significant power even at idle), while AmpereOne’s minimum draw is also noted as unusually high.
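The per-watt argument above can be made concrete with a back-of-the-envelope calculation. The 276 W figure comes from the thread; the EPYC wattage and the relative-performance multiplier are illustrative placeholders, not benchmark results:

```python
# Rough performance-per-watt comparison. Only the 276 W AmpereOne
# TDP comes from the discussion; everything else is an assumption.
def perf_per_watt(relative_perf: float, watts: float) -> float:
    """Performance units delivered per watt of package power."""
    return relative_perf / watts

# Assumed: EPYC at roughly double the power, and (per some linked
# benchmarks) higher total throughput -- 2.2x is a placeholder.
ampere = perf_per_watt(relative_perf=1.0, watts=276)
epyc = perf_per_watt(relative_perf=2.2, watts=550)

print(f"AmpereOne: {ampere:.5f} perf/W")
print(f"EPYC:      {epyc:.5f} perf/W")
print(f"EPYC/Ampere efficiency ratio: {epyc / ampere:.2f}")
```

With these placeholder numbers EPYC comes out slightly ahead per watt despite the higher absolute draw, which is the shape of the "AMD may be king of efficiency" claim; swap in measured throughput figures to test it properly.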
Value, Pricing, and TCO
- AmpereOne 192-core is cited around $5.5k; a comparable high-end EPYC part around $15k.
- Some say EPYC wins performance-per-watt but loses performance-per-dollar at list price.
- Others counter that you must consider total cost (rack space, power, cooling, networking) and that, when SMT is involved, the fair thread-for-thread comparison is 2×Ampere vs. 1×EPYC.
- Several note big buyers rarely pay list price for x86, which may narrow the apparent cost gap; discounting for Ampere is unclear.
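The list-price argument above is simple arithmetic. The $5.5k and $15k figures come from the thread; the relative-performance number and the electricity cost are illustrative assumptions:

```python
# Perf-per-dollar at list price vs. a crude TCO view.
# Prices are from the discussion; perf and power costs are assumed.
AMPERE_PRICE, EPYC_PRICE = 5_500, 15_000   # USD list price
AMPERE_PERF, EPYC_PERF = 1.0, 2.2          # assumed relative perf

perf_per_dollar_ampere = AMPERE_PERF / AMPERE_PRICE
perf_per_dollar_epyc = EPYC_PERF / EPYC_PRICE

def tco(price: float, watts: float, years: int = 3,
        usd_per_kwh: float = 0.10) -> float:
    """Chip price plus electricity at an assumed $0.10/kWh,
    ignoring rack space, cooling, and networking."""
    return price + watts / 1000 * 24 * 365 * years * usd_per_kwh

print(f"Ampere perf/$: {perf_per_dollar_ampere:.6f}")
print(f"EPYC   perf/$: {perf_per_dollar_epyc:.6f}")
print(f"Ampere 3yr TCO: ${tco(AMPERE_PRICE, 276):,.0f}")
print(f"EPYC   3yr TCO: ${tco(EPYC_PRICE, 550):,.0f}")
```

Even with EPYC assumed 2.2× faster, Ampere wins perf-per-dollar at list price; the electricity term is small next to the chip price gap, which is why the thread's TCO debate centers on discounting and density rather than power bills alone.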
SMT vs Many Simple Cores
- Debate over the value of SMT: measured gains range from ~15–25% on earlier Zen to up to ~50% on some Zen 5 benchmarks, not a 2× speedup.
- SMT trades per-thread latency for aggregate throughput and can cause interference between sibling threads sharing a core; how much this matters depends on the workload.
- Security concerns in multi-tenant environments are discussed (side channels between threads sharing a physical core), with core scheduling proposed as a mitigation.
- Ampere cores do not use SMT; advocates say this yields more predictable per-thread behavior.
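The SMT percentages above translate directly into a chip-level throughput comparison: a second hardware thread adds an uplift of 15–50%, not a doubling. A minimal sketch, with the per-core baseline as an arbitrary unit:

```python
# Total chip throughput with and without SMT, given a measured
# per-core SMT uplift. Uplift values are the ranges cited in the
# discussion; the per-core baseline is an arbitrary unit.
def chip_throughput(cores: int, per_core: float,
                    smt_uplift: float = 0.0) -> float:
    """SMT gives each core (1 + uplift)x throughput, not 2x."""
    return cores * per_core * (1 + smt_uplift)

no_smt = chip_throughput(cores=192, per_core=1.0)
for uplift in (0.15, 0.25, 0.50):
    with_smt = chip_throughput(cores=192, per_core=1.0, smt_uplift=uplift)
    print(f"SMT uplift {uplift:.0%}: {with_smt / no_smt:.2f}x a non-SMT chip")
```

This is why counting "384 threads" overstates an SMT chip's advantage: at a 25% uplift, 192 SMT cores behave like roughly 240 independent cores, not 384.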
Scalability, Memory, and Real Workloads
- Many note it’s hard for a single job to use 192+ cores efficiently; many algorithms stop scaling well beyond 8–32 threads (Amdahl’s law).
- The “192 cores as 48×4-core servers” framing is seen as more realistic for VM-heavy or microservice workloads.
- 10 GB RAM “per core” is presented as a way to reason about density, but actual memory assignment is via VMs and QoS rather than hard binding.
- Examples given: parallel backups, compression, and microservices often being memory-light and I/O-bound, not CPU-bound.
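Amdahl's law, invoked above, pins down why single-job scaling stalls well short of 192 threads: even a small serial fraction caps the achievable speedup. A quick sketch (the 5% serial fraction is just an example):

```python
# Amdahl's law: speedup on n threads for a program whose serial
# (non-parallelizable) fraction is s.
def amdahl_speedup(n: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With just 5% serial work, 192 threads barely improve on 32,
# and the limit as n grows is 1/s = 20x.
for n in (8, 32, 192):
    print(f"{n:3d} threads -> {amdahl_speedup(n, 0.05):.1f}x speedup")
```

At 5% serial work, going from 32 to 192 threads (6× the hardware) yields well under a 1.5× improvement, which is the arithmetic behind "many algorithms stop scaling well beyond 8–32 threads."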
LLMs and Non-GPU Use
- Running a 405B-parameter LLM on AmpereOne CPU at just under 1 token/sec is described as both “really slow” and “not bad given model size.”
- Some argue that for experimentation, offline use, or privacy (e.g., not sending code to cloud LLMs), a slow local model can still be valuable.
- Others point out that for serious LLM workloads, GPUs and batched inference are crucial; current benchmark resources often lack CPU/other-accelerator data.
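The ~1 token/sec figure is roughly what a memory-bandwidth-bound estimate predicts: during single-stream decode, each generated token requires streaming approximately the full weight set from memory once. The quantization and bandwidth numbers below are illustrative assumptions, not AmpereOne specs:

```python
# Rough decode-rate ceiling for CPU LLM inference: decode is
# memory-bandwidth bound, so tokens/s ~= bandwidth / model_bytes.
# All hardware numbers here are illustrative assumptions.
def tokens_per_sec(params_b: float, bytes_per_param: float,
                   bandwidth_gbs: float) -> float:
    model_gb = params_b * bytes_per_param
    return bandwidth_gbs / model_gb

# 405B parameters at 4-bit (~0.5 byte/param) is ~203 GB of weights;
# assume ~230 GB/s of usable memory bandwidth as a placeholder.
rate = tokens_per_sec(params_b=405, bytes_per_param=0.5,
                      bandwidth_gbs=230)
print(f"~{rate:.2f} tokens/sec upper bound")
```

An upper bound just above 1 token/sec under these assumptions makes the observed "just under 1 token/sec" unsurprising, and it also explains why GPUs with far higher memory bandwidth, plus batching to amortize each weight pass across many requests, dominate serious LLM serving.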
Architecture, Ecosystem, and Market Share
- There is discussion about ARM’s growing presence in the datacenter. One claim that “half of AWS CPUs” are ARM is corrected to “half of all Arm server CPUs are in AWS”; separately, roughly half of new AWS CPU capacity is said to have been Graviton.
- Some see x86-64 as still very competitive, especially AMD, questioning the narrative that ARM automatically means better efficiency.
- Old architectures like SPARC are invoked: some lament its demise, others say SPARC was slow and economically unsustainable.
Parallel Software and Language Frustrations
- Several ask why more software isn’t parallelized despite multicore being standard for over a decade.
- Answers focus on: difficulty of parallel programming, debugging race conditions, limited payoff for I/O-bound workloads, and algorithms that don’t scale well.
- A long subthread explores high-performance computing, genetic algorithms, and a desire for new languages that make large-scale parallelism automatic and user-friendly; this is presented more as aspiration than current reality.
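A concrete illustration of the divide described above: the "easy" cases, independent units of work with no shared state, parallelize in a few lines, while anything with shared mutable state brings the race conditions and debugging pain the thread complains about. A minimal sketch of the easy case (the worker function is a hypothetical stand-in):

```python
# The easy case: an embarrassingly parallel map over independent
# inputs. No shared mutable state, so no races -- this is the kind
# of workload that actually scales across many cores.
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    """Stand-in for an independent, CPU-bound unit of work."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100_000] * 8
    with ProcessPoolExecutor() as pool:  # one process per core by default
        results = list(pool.map(cpu_heavy, inputs))
    print(f"{len(results)} results computed in parallel")
```

Everything outside this pattern (shared caches, ordered pipelines, fine-grained locking) is where the cost of parallel programming concentrates, which is one answer to "why isn't more software parallelized?"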
Power Delivery and Cooling Context
- The article’s note about avoiding “exotic” 240 V power is debated: outside North America, 230–400 V and three-phase power are common, even in homes.
- In the US, 240 V is widespread for large appliances but not convenient in arbitrary locations; running new 240 V circuits can be non-trivial.
- Commenters see water cooling as the truly “exotic” part for many deployments, while modern high-core-count servers still rely on substantial air cooling and fans.
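The practical stake in the 240 V debate above is current draw: at a fixed power, doubling the voltage halves the amperage, which is what lets a high-TDP server run on a reasonably sized breaker. The 2 kW server figure below is an example, not a number from the article:

```python
# Current drawn by a server at different supply voltages (I = P / V).
# The 2 kW load is an illustrative example.
def amps(watts: float, volts: float) -> float:
    return watts / volts

for volts in (120, 240):
    print(f"2000 W at {volts} V draws {amps(2000, volts):.1f} A")
```

A 2 kW load draws about 16.7 A at 120 V, over the limit of a standard US 15 A circuit, but only about 8.3 A at 240 V, which is why 240 V is routine in datacenters even though running a new 240 V circuit to an arbitrary room is the non-trivial part.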