AI PCs Aren't Good at AI: The CPU Beats the NPU
Scope and Benchmark Validity
- Many commenters say the headline is misleading: the tests cover one Qualcomm NPU in a Surface, not “AI PCs” in general, and don’t include AMD, Intel, or Apple NPUs.
- Several point out the RTX 4080 result (~2 TOPS vs ~40–80 expected) as evidence the benchmark or measurement method is flawed.
- Suspected issues:
  - ONNX Runtime overhead dominating a very small graph.
  - No warmup runs and too few iterations.
  - Incorrect timing of asynchronous GPU execution (measuring kernel dispatch rather than completion).
  - Unfavorable tensor shapes ([1,6,1500,1500] rather than a channels-last layout; odd dimensions like 1500 map poorly to tiled hardware).
  - Generic frameworks/converters instead of vendor-native tooling.
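The warmup and async-timing complaints above can be illustrated with a minimal timing harness. This is a hedged sketch, not the article's actual benchmark: the workload here is a stand-in NumPy matmul, and the warmup/iteration counts are arbitrary placeholders.

```python
import time
import statistics
import numpy as np

def benchmark(run, warmup=10, iters=100):
    """Time run() with warmup iterations excluded, returning per-call
    latencies in seconds. For async devices (GPU/NPU), run() must block
    until the result is ready, or you time only the dispatch call."""
    for _ in range(warmup):          # warm caches, JIT, driver state
        run()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run()
        times.append(time.perf_counter() - t0)
    return times

# Stand-in workload (hypothetical): a single fp32 matmul.
a = np.random.rand(256, 256).astype(np.float32)
times = benchmark(lambda: a @ a, warmup=5, iters=50)
print(f"median latency: {statistics.median(times) * 1e6:.1f} us")
```

Reporting the median of many post-warmup iterations avoids the first-call overhead (graph compilation, memory allocation) that a single cold run would fold into the result.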
- Qualcomm-specific profiling shared in the thread shows work going to vector cores, not tensor cores, and overhead from quantization/layout conversions; suggests the test underutilizes the NPU.
NPUs: Speed vs Efficiency
- Multiple comments stress that NPUs are mainly about power efficiency and freeing the CPU/GPU for other work, not peak speed.
- For small, steady or background tasks (speech, OCR, filters, photo indexing), NPUs can deliver good ops/watt even if absolute throughput is modest.
- Others counter that if NPUs deliver only a tiny fraction of advertised TOPS on realistic workloads, the value proposition is questionable.
Memory and System Architecture
- Low MAC utilization is attributed largely to memory bandwidth and placement: lots of compute that can’t be fed fast enough.
- On tablets/laptops with limited DRAM channels and poorly integrated accelerators, both CPU and NPU end up memory-bound.
- Some argue future designs (better cache attachment, bandwidth, shape-aware kernels) could unlock more of the theoretical TOPS.
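The memory-bound argument can be made concrete with a back-of-the-envelope roofline calculation. The peak-TOPS and bandwidth figures below are illustrative assumptions (a 45 TOPS INT8 accelerator on an LPDDR5-class memory system), not specs from the article.

```python
# Roofline sketch: how many ops per byte of DRAM traffic a kernel needs
# before peak compute, rather than memory bandwidth, becomes the limit.
# Both figures are hypothetical assumptions.
peak_tops = 45e12        # advertised INT8 ops/s
bandwidth = 135e9        # sustained DRAM bytes/s

ridge = peak_tops / bandwidth   # arithmetic intensity at the ridge point
print(f"need >= {ridge:.0f} ops/byte to be compute-bound")

# A memory-bound layer at, say, 20 ops/byte tops out well below peak:
intensity = 20
attainable = min(peak_tops, intensity * bandwidth)
print(f"attainable: {attainable / 1e12:.1f} TOPS "
      f"({attainable / peak_tops:.0%} of peak)")
```

With these assumed numbers, a kernel needs over 300 ops per byte to saturate the MACs; a layer at 20 ops/byte is capped at about 2.7 TOPS regardless of how much compute the chip carries, which is consistent with the single-digit-percent utilization figures discussed in the thread.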
Software Stack and Ecosystem
- Strong criticism of Qualcomm’s software stack (QNN, tooling, docs, error reporting) and ONNX conversion tools; contrast with Nvidia’s mature CUDA/cuDNN ecosystem.
- Deploying on NPUs often requires heavy, hardware-specific optimization (IREE, XLA, QNN, CoreML), which is effectively a full-time specialty.
Marketing, Demand, and Use Cases
- Many see “AI PC” and huge TOPS numbers as mainly marketing, driven by Nvidia’s valuation and OS vendors’ AI roadmaps.
- Some users feel they’re paying for silicon they don’t need; others note unused hardware has long been normal in mass-market PCs.
- Concrete benefits cited: on-device photo search, Face ID, transcription, filters, Recall-like OCR/search; mostly OS-level, small-model features.