AI PCs Aren't Good at AI: The CPU Beats the NPU

Scope and Benchmark Validity

  • Many commenters say the headline is misleading: the tests cover one Qualcomm NPU in a Surface, not “AI PCs” in general, and don’t include AMD/Intel or Apple NPUs.
  • Several cite the RTX 4080 result (~2 TOPS measured vs ~40–80 expected) as evidence that the benchmark or its measurement method is flawed.
  • Suspected issues:
    • ONNX Runtime overhead for a very small graph.
    • No warmup runs and too few iterations.
    • Timing asynchronous GPU execution without synchronizing before reading the clock.
    • Unfavorable tensor shapes (NCHW [1,6,1500,1500] rather than channels-last, with awkward dimensions like 1500 that don't tile cleanly).
    • Using generic frameworks/converters instead of vendor-native tooling.
  • Qualcomm-specific profiling shared in the thread shows the workload running on the NPU's vector cores rather than its tensor cores, with added overhead from quantization and layout conversions; this suggests the test underutilizes the NPU rather than measuring its ceiling.
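The methodology complaints above (cold-start measurement, too few iterations, timing async execution) can be illustrated with a minimal timing harness. This is a generic sketch, not the article's actual benchmark; the stand-in workload is hypothetical, and a real harness would wrap something like ONNX Runtime's `session.run`, which blocks until inference completes.

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    """Time a callable the way commenters suggest the article did not:
    discard warmup runs (JIT compilation, allocator and driver setup),
    then report the median over many iterations instead of a single cold
    measurement. For asynchronous GPU backends, fn must block until the
    device finishes (e.g. by copying outputs back to the host), or the
    timings capture only kernel-launch overhead."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# Stand-in workload for illustration (hypothetical, CPU-only):
median_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {median_s * 1e6:.1f} us")
```

Skipping the warmup loop or timing only one iteration inflates the result with one-time setup costs, which alone could explain an implausibly low number like 2 TOPS on a 4080.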

NPUs: Speed vs Efficiency

  • Multiple comments stress that NPUs are mainly about power efficiency and freeing CPU/GPU for other work, not peak speed.
  • For small, steady or background tasks (speech, OCR, filters, photo indexing), NPUs can deliver good ops/watt even if absolute throughput is modest.
  • Others counter that if NPUs deliver only a tiny fraction of their advertised TOPS on realistic workloads, the value proposition is questionable.
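The efficiency argument is about energy per inference, not latency. A back-of-the-envelope comparison with hypothetical numbers (not from the article or the thread) shows how an NPU can lose on speed yet win on the metric that matters for always-on background tasks:

```python
# Hypothetical figures for illustration only: an NPU that is slower but
# draws far less power than the CPU for the same model.
npu_latency_ms, npu_power_w = 40.0, 1.5   # assumed NPU: modest speed, low draw
cpu_latency_ms, cpu_power_w = 15.0, 20.0  # assumed CPU: faster, power-hungry

# Energy per inference (millijoules) = latency (ms) * power (W)
npu_mj = npu_latency_ms * npu_power_w
cpu_mj = cpu_latency_ms * cpu_power_w

print(f"NPU: {npu_mj:.0f} mJ/inference, CPU: {cpu_mj:.0f} mJ/inference")
```

Under these assumed numbers the NPU is nearly 3x slower but uses 5x less energy per inference, which is what matters for battery-powered speech, OCR, or photo-indexing workloads running in the background.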

Memory and System Architecture

  • Low MAC utilization is attributed largely to memory bandwidth and data placement: plenty of compute, but it can't be fed fast enough.
  • On tablets/laptops with limited DRAM channels and poorly integrated accelerators, both CPU and NPU end up memory-bound.
  • Some argue future designs (better cache attachment, bandwidth, shape-aware kernels) could unlock more of the theoretical TOPS.
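The memory-bound argument is the standard roofline model: achievable throughput is the lesser of peak compute and bandwidth times arithmetic intensity. A sketch with illustrative (assumed, not measured) numbers:

```python
# Roofline sketch with assumed figures: advertised NPU peak and a
# laptop-class LPDDR bandwidth. Neither number is from the article.
peak_tops = 45.0       # advertised peak compute, TOPS (assumed)
bandwidth_gbs = 100.0  # DRAM bandwidth, GB/s (assumed)

def achievable_tops(ops_per_byte):
    """ops_per_byte is the workload's arithmetic intensity: operations
    performed per byte fetched from memory. Throughput is capped by
    min(peak compute, bandwidth * intensity)."""
    return min(peak_tops, bandwidth_gbs * ops_per_byte / 1000.0)

# A layer that reuses each fetched byte only ~20 times is memory-bound
# and reaches a small fraction of the advertised peak:
print(achievable_tops(20))    # bandwidth-bound: 2.0
print(achievable_tops(1000))  # compute-bound: 45.0
```

Under these assumptions, a low-reuse workload tops out around 2 TOPS regardless of how many MACs the silicon has, which is why better cache attachment and shape-aware kernels (raising effective reuse) matter more than adding compute.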

Software Stack and Ecosystem

  • Strong criticism of Qualcomm’s software stack (QNN, tooling, docs, error reporting) and ONNX conversion tools; contrast with Nvidia’s mature CUDA/cuDNN ecosystem.
  • Deploying on NPUs often requires heavy, hardware-specific optimization (IREE, XLA, QNN, CoreML), which is effectively a full-time specialty.

Marketing, Demand, and Use Cases

  • Many see “AI PC” and huge TOPS numbers as mainly marketing, driven by Nvidia’s valuation and OS vendors’ AI roadmaps.
  • Some users feel they’re paying for silicon they don’t need; others note unused hardware has long been normal in mass-market PCs.
  • Concrete benefits cited: on-device photo search, Face ID, transcription, filters, Recall-like OCR/search—mostly OS-level, small-model features.