AmpereOne: Cores Are the New MHz
Performance and Power Efficiency
- Multiple comments compare AmpereOne 192-core (≈276 W TDP) against AMD EPYC 192‑core parts.
- Some argue AMD’s 192-core chips, despite drawing roughly double the power, offer twice the thread count via SMT (384 threads vs. 192) and end up with similar or better performance per watt.
- Linked benchmarks (via Phoronix, videos) suggest AMD’s 192-core EPYC can outperform the AmpereOne in many workloads, and that AMD may be “king of efficiency” for now.
- Idle power is a concern: reports of EPYC systems idling at 100 W or more (the IO die draws significant power even at idle), while AmpereOne’s minimum draw is also noted as unusually high.
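The per-watt argument above can be made concrete with a back-of-the-envelope calculation. The 276 W figure comes from the thread; the EPYC wattage and the relative-performance multiplier are illustrative placeholders, not benchmark results:

```python
# Rough performance-per-watt comparison. Only the 276 W AmpereOne
# TDP comes from the discussion; everything else is an assumption.
def perf_per_watt(relative_perf: float, watts: float) -> float:
    """Performance units delivered per watt of package power."""
    return relative_perf / watts

# Assumed: EPYC at roughly double the power, and (per some linked
# benchmarks) higher total throughput -- 2.2x is a placeholder.
ampere = perf_per_watt(relative_perf=1.0, watts=276)
epyc = perf_per_watt(relative_perf=2.2, watts=550)

print(f"AmpereOne: {ampere:.5f} perf/W")
print(f"EPYC:      {epyc:.5f} perf/W")
print(f"EPYC/Ampere efficiency ratio: {epyc / ampere:.2f}")
```

With these placeholder numbers EPYC comes out slightly ahead per watt despite the higher absolute draw, which is the shape of the "AMD may be king of efficiency" claim; swap in measured throughput figures to test it properly.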
Value, Pricing, and TCO
- AmpereOne 192-core is cited around $5.5k; a comparable high-end EPYC part around $15k.
- Some say EPYC wins performance-per-watt but loses performance-per-dollar at list price.
- Others counter that you must consider total cost (rack space, power, cooling, networking) and that, when SMT is involved, the fair thread-for-thread comparison is 2×Ampere vs. 1×EPYC.
- Several note big buyers rarely pay list price for x86, which may narrow the apparent cost gap; discounting for Ampere is unclear.
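The list-price argument above is simple arithmetic. The $5.5k and $15k figures come from the thread; the relative-performance number and the electricity cost are illustrative assumptions:

```python
# Perf-per-dollar at list price vs. a crude TCO view.
# Prices are from the discussion; perf and power costs are assumed.
AMPERE_PRICE, EPYC_PRICE = 5_500, 15_000   # USD list price
AMPERE_PERF, EPYC_PERF = 1.0, 2.2          # assumed relative perf

perf_per_dollar_ampere = AMPERE_PERF / AMPERE_PRICE
perf_per_dollar_epyc = EPYC_PERF / EPYC_PRICE

def tco(price: float, watts: float, years: int = 3,
        usd_per_kwh: float = 0.10) -> float:
    """Chip price plus electricity at an assumed $0.10/kWh,
    ignoring rack space, cooling, and networking."""
    return price + watts / 1000 * 24 * 365 * years * usd_per_kwh

print(f"Ampere perf/$: {perf_per_dollar_ampere:.6f}")
print(f"EPYC   perf/$: {perf_per_dollar_epyc:.6f}")
print(f"Ampere 3yr TCO: ${tco(AMPERE_PRICE, 276):,.0f}")
print(f"EPYC   3yr TCO: ${tco(EPYC_PRICE, 550):,.0f}")
```

Even with EPYC assumed 2.2× faster, Ampere wins perf-per-dollar at list price; the electricity term is small next to the chip price gap, which is why the thread's TCO debate centers on discounting and density rather than power bills alone.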
SMT vs Many Simple Cores
- Debate over the value of SMT: measured gains range from ~15–25% on earlier Zen to up to ~50% on some Zen 5 benchmarks, not a 2× speedup.
- SMT trades per-thread latency for aggregate throughput and can cause interference between sibling threads sharing a core; how much this matters depends on the workload.
- Security concerns in multi-tenant environments are discussed (side channels between threads sharing a physical core), with core scheduling proposed as a mitigation.
- Ampere cores do not use SMT; advocates say this yields more predictable per-thread behavior.
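The SMT percentages above translate directly into a chip-level throughput comparison: a second hardware thread adds an uplift of 15–50%, not a doubling. A minimal sketch, with the per-core baseline as an arbitrary unit:

```python
# Total chip throughput with and without SMT, given a measured
# per-core SMT uplift. Uplift values are the ranges cited in the
# discussion; the per-core baseline is an arbitrary unit.
def chip_throughput(cores: int, per_core: float,
                    smt_uplift: float = 0.0) -> float:
    """SMT gives each core (1 + uplift)x throughput, not 2x."""
    return cores * per_core * (1 + smt_uplift)

no_smt = chip_throughput(cores=192, per_core=1.0)
for uplift in (0.15, 0.25, 0.50):
    with_smt = chip_throughput(cores=192, per_core=1.0, smt_uplift=uplift)
    print(f"SMT uplift {uplift:.0%}: {with_smt / no_smt:.2f}x a non-SMT chip")
```

This is why counting "384 threads" overstates an SMT chip's advantage: at a 25% uplift, 192 SMT cores behave like roughly 240 independent cores, not 384.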
Scalability, Memory, and Real Workloads
- Many note it’s hard for a single job to use 192+ cores efficiently; many algorithms stop scaling well beyond 8–32 threads (Amdahl’s law).
- The “192 cores as 48×4-core servers” framing is seen as more realistic for VM-heavy or microservice workloads.
- 10 GB RAM “per core” is presented as a way to reason about density, but actual memory assignment is via VMs and QoS rather than hard binding.
- Examples given: parallel backups, compression, and microservices often being memory-light and I/O-bound, not CPU-bound.
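Amdahl's law, invoked above, pins down why single-job scaling stalls well short of 192 threads: even a small serial fraction caps the achievable speedup. A quick sketch (the 5% serial fraction is just an example):

```python
# Amdahl's law: speedup on n threads for a program whose serial
# (non-parallelizable) fraction is s.
def amdahl_speedup(n: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With just 5% serial work, 192 threads barely improve on 32,
# and the limit as n grows is 1/s = 20x.
for n in (8, 32, 192):
    print(f"{n:3d} threads -> {amdahl_speedup(n, 0.05):.1f}x speedup")
```

At 5% serial work, going from 32 to 192 threads (6× the hardware) yields well under a 1.5× improvement, which is the arithmetic behind "many algorithms stop scaling well beyond 8–32 threads."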
LLMs and Non-GPU Use
- Running a 405B-parameter LLM on AmpereOne CPU at just under 1 token/sec is described as both “really slow” and “not bad given model size.”
- Some argue that for experimentation, offline use, or privacy (e.g., not sending code to cloud LLMs), a slow local model can still be valuable.
- Others point out that for serious LLM workloads, GPUs and batched inference are crucial; current benchmark resources often lack CPU/other-accelerator data.
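The ~1 token/sec figure is roughly what a memory-bandwidth-bound estimate predicts: during single-stream decode, each generated token requires streaming approximately the full weight set from memory once. The quantization and bandwidth numbers below are illustrative assumptions, not AmpereOne specs:

```python
# Rough decode-rate ceiling for CPU LLM inference: decode is
# memory-bandwidth bound, so tokens/s ~= bandwidth / model_bytes.
# All hardware numbers here are illustrative assumptions.
def tokens_per_sec(params_b: float, bytes_per_param: float,
                   bandwidth_gbs: float) -> float:
    model_gb = params_b * bytes_per_param
    return bandwidth_gbs / model_gb

# 405B parameters at 4-bit (~0.5 byte/param) is ~203 GB of weights;
# assume ~230 GB/s of usable memory bandwidth as a placeholder.
rate = tokens_per_sec(params_b=405, bytes_per_param=0.5,
                      bandwidth_gbs=230)
print(f"~{rate:.2f} tokens/sec upper bound")
```

An upper bound just above 1 token/sec under these assumptions makes the observed "just under 1 token/sec" unsurprising, and it also explains why GPUs with far higher memory bandwidth, plus batching to amortize each weight pass across many requests, dominate serious LLM serving.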
Architecture, Ecosystem, and Market Share
- There is discussion about ARM’s growing presence in the datacenter. One claim that “half of AWS CPUs” are ARM is corrected to “half of all Arm server CPUs are in AWS”; separately, roughly half of new AWS CPU capacity is said to have been Graviton.
- Some see x86-64 as still very competitive, especially AMD, questioning the narrative that ARM automatically means better efficiency.
- Old architectures like SPARC are invoked: some lament its demise, others say SPARC was slow and economically unsustainable.
Parallel Software and Language Frustrations
- Several ask why more software isn’t parallelized despite multicore being standard for over a decade.
- Answers focus on: difficulty of parallel programming, debugging race conditions, limited payoff for I/O-bound workloads, and algorithms that don’t scale well.
- A long subthread explores high-performance computing, genetic algorithms, and a desire for new languages that make large-scale parallelism automatic and user-friendly; this is presented more as aspiration than current reality.
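A concrete illustration of the divide described above: the "easy" cases, independent units of work with no shared state, parallelize in a few lines, while anything with shared mutable state brings the race conditions and debugging pain the thread complains about. A minimal sketch of the easy case (the worker function is a hypothetical stand-in):

```python
# The easy case: an embarrassingly parallel map over independent
# inputs. No shared mutable state, so no races -- this is the kind
# of workload that actually scales across many cores.
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    """Stand-in for an independent, CPU-bound unit of work."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100_000] * 8
    with ProcessPoolExecutor() as pool:  # one process per core by default
        results = list(pool.map(cpu_heavy, inputs))
    print(f"{len(results)} results computed in parallel")
```

Everything outside this pattern (shared caches, ordered pipelines, fine-grained locking) is where the cost of parallel programming concentrates, which is one answer to "why isn't more software parallelized?"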
Power Delivery and Cooling Context
- The article’s note about avoiding “exotic” 240 V power is debated: outside North America, 230–400 V and three-phase power are common, even in homes.
- In the US, 240 V is widespread for large appliances but not convenient in arbitrary locations; running new 240 V circuits can be non-trivial.
- Commenters see water cooling as the truly “exotic” part for many deployments, while modern high-core-count servers still rely on substantial air cooling and fans.
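The practical stake in the 240 V debate above is current draw: at a fixed power, doubling the voltage halves the amperage, which is what lets a high-TDP server run on a reasonably sized breaker. The 2 kW server figure below is an example, not a number from the article:

```python
# Current drawn by a server at different supply voltages (I = P / V).
# The 2 kW load is an illustrative example.
def amps(watts: float, volts: float) -> float:
    return watts / volts

for volts in (120, 240):
    print(f"2000 W at {volts} V draws {amps(2000, volts):.1f} A")
```

A 2 kW load draws about 16.7 A at 120 V, over the limit of a standard US 15 A circuit, but only about 8.3 A at 240 V, which is why 240 V is routine in datacenters even though running a new 240 V circuit to an arbitrary room is the non-trivial part.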