AI PCs Aren't Good at AI: The CPU Beats the NPU
Scope and Benchmark Validity
- Many commenters say the headline is misleading: the tests cover one Qualcomm NPU in a Surface, not “AI PCs” in general, and don’t include AMD, Intel, or Apple NPUs.
- Several point out the RTX 4080 result (~2 TOPS vs ~40–80 expected) as evidence the benchmark or measurement method is flawed.
- Suspected issues:
  - ONNX Runtime overhead dominating a very small graph.
  - No warmup runs and too few iterations.
  - Incorrect timing of asynchronous GPU execution (measuring kernel dispatch rather than completion).
  - Unfavorable tensor shapes ([1,6,1500,1500] rather than a channels-last layout; odd dimensions like 1500 map poorly to tiled hardware).
  - Generic frameworks/converters instead of vendor-native tooling.
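The warmup and async-timing complaints above can be illustrated with a minimal timing harness. This is a hedged sketch, not the article's actual benchmark: the workload here is a stand-in NumPy matmul, and the warmup/iteration counts are arbitrary placeholders.

```python
import time
import statistics
import numpy as np

def benchmark(run, warmup=10, iters=100):
    """Time run() with warmup iterations excluded, returning per-call
    latencies in seconds. For async devices (GPU/NPU), run() must block
    until the result is ready, or you time only the dispatch call."""
    for _ in range(warmup):          # warm caches, JIT, driver state
        run()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run()
        times.append(time.perf_counter() - t0)
    return times

# Stand-in workload (hypothetical): a single fp32 matmul.
a = np.random.rand(256, 256).astype(np.float32)
times = benchmark(lambda: a @ a, warmup=5, iters=50)
print(f"median latency: {statistics.median(times) * 1e6:.1f} us")
```

Reporting the median of many post-warmup iterations avoids the first-call overhead (graph compilation, memory allocation) that a single cold run would fold into the result.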
- Qualcomm-specific profiling shared in the thread shows work going to vector cores, not tensor cores, and overhead from quantization/layout conversions; suggests the test underutilizes the NPU.
NPUs: Speed vs Efficiency
- Multiple comments stress that NPUs are mainly about power efficiency and freeing the CPU/GPU for other work, not peak speed.
- For small, steady or background tasks (speech, OCR, filters, photo indexing), NPUs can deliver good ops/watt even if absolute throughput is modest.
- Others counter that if NPUs deliver only a tiny fraction of advertised TOPS on realistic workloads, the value proposition is questionable.
Memory and System Architecture
- Low MAC utilization is attributed largely to memory bandwidth and placement: lots of compute that can’t be fed fast enough.
- On tablets/laptops with limited DRAM channels and poorly integrated accelerators, both CPU and NPU end up memory-bound.
- Some argue future designs (better cache attachment, bandwidth, shape-aware kernels) could unlock more of the theoretical TOPS.
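The memory-bound argument can be made concrete with a back-of-the-envelope roofline calculation. The peak-TOPS and bandwidth figures below are illustrative assumptions (a 45 TOPS INT8 accelerator on an LPDDR5-class memory system), not specs from the article.

```python
# Roofline sketch: how many ops per byte of DRAM traffic a kernel needs
# before peak compute, rather than memory bandwidth, becomes the limit.
# Both figures are hypothetical assumptions.
peak_tops = 45e12        # advertised INT8 ops/s
bandwidth = 135e9        # sustained DRAM bytes/s

ridge = peak_tops / bandwidth   # arithmetic intensity at the ridge point
print(f"need >= {ridge:.0f} ops/byte to be compute-bound")

# A memory-bound layer at, say, 20 ops/byte tops out well below peak:
intensity = 20
attainable = min(peak_tops, intensity * bandwidth)
print(f"attainable: {attainable / 1e12:.1f} TOPS "
      f"({attainable / peak_tops:.0%} of peak)")
```

With these assumed numbers, a kernel needs over 300 ops per byte to saturate the MACs; a layer at 20 ops/byte is capped at about 2.7 TOPS regardless of how much compute the chip carries, which is consistent with the single-digit-percent utilization figures discussed in the thread.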
Software Stack and Ecosystem
- Strong criticism of Qualcomm’s software stack (QNN, tooling, docs, error reporting) and ONNX conversion tools; contrast with Nvidia’s mature CUDA/cuDNN ecosystem.
- Deploying on NPUs often requires heavy, hardware-specific optimization (IREE, XLA, QNN, CoreML), which is effectively a full-time specialty.
Marketing, Demand, and Use Cases
- Many see “AI PC” and huge TOPS numbers as mainly marketing, driven by Nvidia’s valuation and OS vendors’ AI roadmaps.
- Some users feel they’re paying for silicon they don’t need; others note unused hardware has long been normal in mass-market PCs.
- Concrete benefits cited: on-device photo search, Face ID, transcription, filters, Recall-like OCR/search; mostly OS-level, small-model features.