%CPU utilization is a lie

Hyperthreading, “Cores”, and Terminology

  • Several comments criticize treating a 12-core/24-thread CPU as “24 cores”; OSes and clouds expose “vCPUs” that map 1:1 to hardware threads, which can mislead people into assuming performance scales linearly with thread count (a topology-inspection sketch follows this list).
  • Analogies (two chefs/one stove, 2‑ply toilet paper) emphasize that SMT threads share execution units and are not equivalent to full cores.
  • Some note real, observable differences between SMT siblings and separate cores (e.g., TLB flush effects, shared caches, memory bandwidth).
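
  As a concrete illustration of the vCPU-vs-core distinction, here is a minimal Linux-only sketch that groups logical CPUs by the physical core they share; the sysfs paths are an assumption about common kernel layouts, not something taken from the thread:

      from pathlib import Path

      def physical_cores():
          """Group logical CPUs ("vCPUs") by the SMT siblings sharing one core."""
          cores = {}
          for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
              siblings_file = cpu_dir / "topology" / "thread_siblings_list"
              if not siblings_file.exists():
                  continue  # offline CPU or topology not exposed
              key = siblings_file.read_text().strip()  # e.g. "0,12" or "0-1"
              cores.setdefault(key, []).append(cpu_dir.name)
          return cores

      for siblings, cpus in physical_cores().items():
          print(f"physical core shared by logical CPUs {siblings}: {cpus}")

  On a 12-core/24-thread part this prints twelve groups of two logical CPUs, half the number the OS’s CPU count suggests.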

When Hyperthreading Helps or Hurts

  • Impact is heavily workload‑ and architecture‑dependent.
    • Database and multi-user/IO-bound systems often see ~10–20% or more throughput gains, sometimes even at moderate utilization.
    • HPC and tightly vectorized, memory‑bandwidth‑bound workloads often see little or negative benefit; disabling SMT can simplify tuning (see the SMT-control sketch after this list).
  • SMT can interact with thermal limits and turbo behavior but usually doesn’t dominate power draw; core counts and vector units matter more.
  • There’s debate over architectures: AMD SMT is said to behave “closer to a full core” in some Zen generations, IBM POWER leans heavily on many-way SMT, while Intel’s HT often delivers smaller incremental gains.
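
  For the point about disabling SMT, a minimal sketch assuming the Linux sysfs SMT interface (/sys/devices/system/cpu/smt, present on reasonably recent kernels); the exact set of accepted control values can vary by platform:

      from pathlib import Path

      SMT_DIR = Path("/sys/devices/system/cpu/smt")  # assumed Linux interface

      def smt_active() -> bool:
          # "1" means sibling hardware threads are currently online.
          return (SMT_DIR / "active").read_text().strip() == "1"

      def disable_smt() -> None:
          # Commonly accepted values include "on", "off", "forceoff"; needs root.
          (SMT_DIR / "control").write_text("off")

      print("SMT active:", smt_active())
      # disable_smt()  # uncomment to take sibling threads offline (root required)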

CPU Utilization as a Misleading Metric

  • Many point out utilization is formally “fraction of time not idle,” not “fraction of maximum useful work.” That’s well-defined but often misinterpreted.
  • Non-linearities from shared caches, memory bandwidth, interconnects, spinlocks, and frequency scaling mean 60% vs 80% utilization can correspond to dramatically different latency.
  • Typical 1–60s averaging windows hide the 10–100ms bursts that actually drive latency SLOs. Some advocate measuring short-window p99/p100 CPU usage instead (a sampling sketch follows this list).
  • Power draw and temperature, or instructions-per-cycle (IPC), sometimes correlate better with “real” work than %CPU alone, but are themselves non-linear and hard to interpret.
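
  To make the short-window measurement idea concrete, a minimal Linux-only sketch that samples the aggregate busy fraction from /proc/stat and reports the mean and p99 across samples; the 100 ms window, 10 s duration, and p99 choice are illustrative, not prescribed by the thread:

      import time

      def cpu_times():
          # First line of /proc/stat: "cpu user nice system idle iowait irq ..."
          with open("/proc/stat") as f:
              fields = [int(x) for x in f.readline().split()[1:]]
          idle = fields[3] + fields[4]  # idle + iowait
          return idle, sum(fields)

      def sample_busy(window_s=0.1, samples=100):
          busy = []
          idle_prev, total_prev = cpu_times()
          for _ in range(samples):
              time.sleep(window_s)
              idle_now, total_now = cpu_times()
              d_total = (total_now - total_prev) or 1
              busy.append(1.0 - (idle_now - idle_prev) / d_total)
              idle_prev, total_prev = idle_now, total_now
          return sorted(busy)

      busy = sample_busy()
      print(f"mean={sum(busy) / len(busy):.1%}  p99={busy[int(0.99 * (len(busy) - 1))]:.1%}")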

Queueing Theory and Capacity Planning

  • Multiple commenters connect this to classic queueing theory: above roughly 60% utilization, queueing delay grows quickly, and around 80% it can explode, depending on the workload (a worked M/M/1 example follows this list).
  • Some SREs treat 40–60% average CPU as “effectively full” for latency-sensitive systems, scaling out before utilization climbs any higher. Others argue IO‑bound apps can safely run hotter.
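
  A worked example of that knee, using the textbook M/M/1 result T = S / (1 − ρ), where S is the mean service time and ρ the utilization; real services are not M/M/1 queues, so treat the numbers as an intuition pump rather than a capacity model:

      def residence_multiplier(rho: float) -> float:
          # Mean time in an M/M/1 system, expressed as a multiple of service time.
          return 1.0 / (1.0 - rho)

      for rho in (0.5, 0.6, 0.8, 0.9, 0.95):
          print(f"utilization {rho:.0%}: mean latency ≈ {residence_multiplier(rho):.1f}x service time")

  The multiplier is 2.5x at 60% utilization, 5x at 80%, and 10x at 90%, which is the “explosion” the commenters describe.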

Benchmarks, Tooling, and Alternatives

  • stress-ng is noted to be designed to max out individual components rather than mimic real applications; real workloads (nginx, memcached, databases) often show “hockey stick” degradation near saturation.
  • Suggested tools/metrics: perf/ftrace for stalls and IPC, load average and run queue length, queue depth, RPS/latency, power usage, GPU FLOPs vs theoretical peak, etc. (a load/run-queue sketch follows this list).
  • Some argue utilization remains a useful “semi-crude” indicator when combined with business metrics (latency, RPS) and proper load testing.
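
  A minimal Linux-only sketch for two of the suggested signals, load average and instantaneous run-queue length, read from /proc/loadavg; note that Linux load averages also count tasks in uninterruptible (D) sleep, so they are not a pure CPU signal:

      def load_and_runqueue():
          # /proc/loadavg: "<1min> <5min> <15min> <running>/<total> <last pid>"
          with open("/proc/loadavg") as f:
              one, five, fifteen, running_total, _ = f.read().split()
          runnable = int(running_total.split("/")[0]) - 1  # exclude this reader
          return (float(one), float(five), float(fifteen)), max(runnable, 0)

      loadavg, runq = load_and_runqueue()
      print(f"load averages (1/5/15 min): {loadavg}, runnable tasks right now: {runq}")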

Other Themes

  • OS accounting mostly counts time a task was scheduled on a CPU; busy-waiting and memory stalls therefore still show as “busy” (a spin-vs-sleep demo follows this list).
  • Hyperthreading is disabled by default in some security-focused OSes; SMT also interacts with per-core licensing.
  • Several note that both CPU % and memory reporting in mainstream OS tools are simplistic and often misunderstood, yet still widely relied upon.
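
  To illustrate the “scheduled time, not useful work” point, a small self-contained sketch: a busy-wait and a blocking sleep take the same wall-clock time, but only the busy-wait accrues CPU time and would show a core as roughly 100% busy in top:

      import os
      import time

      def cpu_seconds():
          t = os.times()
          return t.user + t.system  # CPU time charged to this process so far

      # Busy-wait for ~1 s: no useful work, yet a full core shows as "busy".
      wall0, cpu0 = time.monotonic(), cpu_seconds()
      while time.monotonic() - wall0 < 1.0:
          pass
      print(f"spin : wall={time.monotonic() - wall0:.2f}s  cpu={cpu_seconds() - cpu0:.2f}s")

      # Blocking sleep for ~1 s: same wall time, near-zero CPU time.
      wall0, cpu0 = time.monotonic(), cpu_seconds()
      time.sleep(1.0)
      print(f"sleep: wall={time.monotonic() - wall0:.2f}s  cpu={cpu_seconds() - cpu0:.2f}s")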