%CPU utilization is a lie
Hyperthreading, “Cores”, and Terminology
- Several comments criticize treating a 12-core/24-thread CPU as “24 cores”; OSes and clouds expose “vCPUs” that map 1:1 to hardware threads, which misleads people into assuming linear scaling (a Linux topology sketch follows this list).
- Analogies (two chefs/one stove, 2‑ply toilet paper) emphasize that SMT threads share execution units and are not equivalent to full cores.
- Some note real, observable differences between SMT siblings and separate cores (e.g., TLB flush effects, shared caches, memory bandwidth).
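A minimal sketch of the vCPU-vs-core distinction on Linux, assuming the usual sysfs topology files; it counts distinct (package, core) pairs against the logical CPUs the OS exposes:

```python
# Sketch: distinguish logical CPUs (hardware threads / "vCPUs") from physical
# cores on Linux, using sysfs topology files. Paths are Linux-specific.
import glob
import os

def physical_vs_logical():
    logical = 0
    cores = set()
    for topo in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology"):
        logical += 1
        with open(os.path.join(topo, "core_id")) as f:
            core_id = f.read().strip()
        with open(os.path.join(topo, "physical_package_id")) as f:
            pkg_id = f.read().strip()
        cores.add((pkg_id, core_id))   # one entry per physical core
    return len(cores), logical

if __name__ == "__main__":
    phys, logical = physical_vs_logical()
    print(f"{phys} physical cores exposed as {logical} logical CPUs")
```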
When Hyperthreading Helps or Hurts
- Impact is heavily workload‑ and architecture‑dependent; one way to probe it on a given machine is sketched after this list.
- Database and multi-user/IO-bound systems often see ~10–20% or more throughput gains, sometimes even at moderate utilization.
- HPC and tightly vectorized, memory‑bandwidth‑bound workloads often see little or negative benefit; disabling SMT can simplify tuning.
- SMT interacts with thermal limits and turbo behavior but usually isn’t the dominant factor in power draw; core count and wide vector units matter more.
- There’s debate across architectures: AMD’s SMT is said to behave “closer to a full core” on some Zen generations, IBM POWER leans heavily on many-way SMT (up to SMT8), while Intel’s Hyper-Threading often delivers smaller incremental gains.
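One rough way to probe SMT impact on a specific box is to pin identical workers to SMT siblings versus two separate physical cores and compare wall time. A Linux-only sketch (os.sched_setaffinity); the CPU IDs are hypothetical and should be checked against thread_siblings_list, and the Python loop is only a stand-in for real work:

```python
# Sketch: time the same CPU-bound work pinned to SMT siblings vs separate cores.
# Linux-only. The CPU IDs below are hypothetical; check
# /sys/devices/system/cpu/cpu*/topology/thread_siblings_list for your machine.
import os
import time
from multiprocessing import Process

def burn(cpu, n=20_000_000):
    os.sched_setaffinity(0, {cpu})      # pin this worker to one logical CPU
    x = 0
    for i in range(n):                  # arbitrary integer work as a stand-in
        x += i * i

def run_pair(cpus):
    procs = [Process(target=burn, args=(c,)) for c in cpus]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    siblings = (0, 1)    # assumption: SMT siblings on this box
    separate = (0, 2)    # assumption: two distinct physical cores
    print(f"SMT siblings:   {run_pair(siblings):.2f}s")
    print(f"separate cores: {run_pair(separate):.2f}s")
```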
CPU Utilization as a Misleading Metric
- Many point out utilization is formally “fraction of time not idle,” not “fraction of maximum useful work.” That’s well-defined but often misinterpreted (the /proc/stat sketch after this list computes exactly that definition).
- Non-linearities from shared caches, memory bandwidth, interconnects, spinlocks, and frequency scaling mean 60% vs 80% utilization can correspond to dramatically different latency.
- Typical 1–60s averaging windows hide 10–100ms bursts that actually drive latency SLOs. Some advocate measuring short-window p99/p100 CPU usage instead.
- Power draw and temperature, or instructions-per-cycle (IPC), sometimes correlate better with “real” work than %CPU alone, but are themselves non-linear and hard to interpret.
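For reference, “fraction of time not idle” is exactly what the standard Linux counters yield. A minimal sketch from /proc/stat over a short window; nothing here distinguishes stalled cycles or an SMT-contended thread from productive work:

```python
# Sketch: "% not idle" from /proc/stat over a short window (Linux-specific).
# This is literally the "fraction of time not idle" definition; it says nothing
# about stalls, SMT contention, or useful work per cycle.
import time

def cpu_times():
    with open("/proc/stat") as f:
        vals = list(map(int, f.readline().split()[1:]))  # aggregate "cpu" line
    idle = vals[3] + vals[4]                             # idle + iowait
    return idle, sum(vals)

def utilization(window_s=0.1):
    idle0, total0 = cpu_times()
    time.sleep(window_s)
    idle1, total1 = cpu_times()
    return 1.0 - (idle1 - idle0) / (total1 - total0)

if __name__ == "__main__":
    # Short windows (100 ms here) expose the bursts a 60 s average smooths away.
    samples = sorted(utilization() for _ in range(50))
    print(f"median {100 * samples[25]:.0f}%  max {100 * samples[-1]:.0f}%")
```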
Queueing Theory and Capacity Planning
- Multiple commenters connect this to classic queueing theory: above roughly 60% utilization, queueing delay grows quickly, and around 80% it can explode, depending on the workload (see the M/M/1 sketch after this list).
- Some SREs treat 40–60% average CPU as “effectively full” for latency-sensitive systems, scaling out before hitting higher plateaus. Others argue IO‑bound apps can safely run hotter.
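The shape commenters describe is the textbook M/M/1 hockey stick. A purely illustrative sketch; real systems aren’t M/M/1 and the 10 ms service time is an arbitrary assumption, but the non-linearity is the point:

```python
# Sketch: M/M/1 mean response time R = S / (1 - rho), with S the mean service
# time and rho the utilization. Illustrative only; real workloads differ.
def mm1_response_time_ms(service_time_ms, rho):
    if rho >= 1.0:
        return float("inf")             # queue grows without bound at saturation
    return service_time_ms / (1.0 - rho)

if __name__ == "__main__":
    S = 10.0                            # assumed 10 ms mean service time
    for rho in (0.4, 0.6, 0.8, 0.9, 0.95):
        print(f"{rho:.0%} utilization -> "
              f"{mm1_response_time_ms(S, rho):6.1f} ms mean response")
```

With a 10 ms service time this prints roughly 17 ms at 40%, 25 ms at 60%, 50 ms at 80%, and 200 ms at 95%, which is the “explosion” the thread refers to.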
Benchmarks, Tooling, and Alternatives
- stress-ng is noted as designed to max out components, not mimic real apps; real workloads (nginx, memcached, databases) often show “hockey stick” degradation near saturation.
- Suggested tools/metrics: perf/ftrace for stalls and IPC, load average and run-queue length, queue depth, RPS and latency, power usage, GPU FLOPs vs. theoretical peak, etc. (a run-queue sketch follows this list).
- Some argue utilization remains a useful “semi-crude” indicator when combined with business metrics (latency, RPS) and proper load testing.
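As one example of a saturation signal that a %CPU average hides, a Linux-specific sketch reading the runnable-task count from /proc/loadavg and comparing it to the number of logical CPUs; what threshold to act on is left open:

```python
# Sketch: run-queue pressure from /proc/loadavg (Linux-specific).
# The fourth field is "runnable/total" scheduling entities; runnable tasks per
# logical CPU is a rough saturation signal that a %CPU average does not convey.
import os

def run_queue_pressure():
    with open("/proc/loadavg") as f:
        load1, _load5, _load15, run_ratio, _last_pid = f.read().split()
    runnable, total = map(int, run_ratio.split("/"))
    ncpu = os.cpu_count() or 1
    return float(load1), runnable, total, runnable / ncpu

if __name__ == "__main__":
    load1, runnable, total, pressure = run_queue_pressure()
    print(f"1-min load {load1}, {runnable}/{total} tasks runnable, "
          f"{pressure:.2f} runnable per logical CPU")
```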
Other Themes
- OS accounting mostly counts scheduled time; busy-waiting and memory stalls still show up as “busy” (see the spin-wait sketch after this list).
- Hyperthreading is disabled by default in some security-focused OSes (e.g., OpenBSD, citing side-channel risk); SMT also interacts with per-core licensing.
- Several note that both CPU % and memory reporting in mainstream OS tools are simplistic and often misunderstood, yet still widely relied upon.
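A trivial demonstration of the accounting point above: a spin-wait accomplishes nothing, yet the scheduler books it as fully busy.

```python
# Sketch: a busy-wait does no useful work but still shows ~100% "busy" in
# top/htop, because the kernel accounts for scheduled time, not useful work.
import time

def spin_wait(seconds):
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass                            # burning cycles, producing nothing

if __name__ == "__main__":
    spin_wait(5.0)                      # watch one core peg while this runs
```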