2024-06-04

Intel's Lion Cove Architecture Preview

Hyperthreading/SMT Removal and Workload Impact

Many commenters are curious how Lion Cove without SMT will behave on “everyday” mixed workloads compared with synthetic benchmarks.
Reported experiences:
- Disabling SMT can slightly improve some multithreaded benchmarks and games, especially when not all cores are saturated and each thread benefits from having a core's caches to itself.
- For highly parallel tasks that keep every core busy or stall often on memory (large builds, chess search, RandomX mining, vanity address generation, database workloads waiting on RAM), SMT can add roughly 20–30% throughput.
Arguments against SMT:
- In high-utilization rendering/HPC or low-power scenarios, it can reduce performance or waste power compared with using more, simpler cores.
- Contention for shared execution resources and caches hurts some HPC and latency-sensitive workloads.
- Side-channel vulnerabilities and validation complexity are major downsides; some OSes disable SMT by default.
Intel is reportedly building two Lion Cove variants: P-cores without SMT for hybrid client chips and SMT-enabled P-cores for servers.
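Reports like these only compare cleanly when you know whether SMT was actually on. As a minimal sketch assuming a Linux system, the kernel exposes the current state in sysfs at `/sys/devices/system/cpu/smt/active` (`1` when sibling threads are online, `0` otherwise). The helper name `smt_active` is hypothetical, and it returns `None` when the file is missing (other OSes, older kernels, no sysfs).

```python
from pathlib import Path
from typing import Optional

# Linux-specific sysfs file; assumed present only on reasonably recent kernels.
SMT_ACTIVE = Path("/sys/devices/system/cpu/smt/active")


def smt_active(path: Path = SMT_ACTIVE) -> Optional[bool]:
    """Return True/False if the kernel reports SMT state, None if unknown."""
    try:
        text = path.read_text().strip()
    except OSError:
        # Not Linux, file absent, or unreadable: state cannot be determined.
        return None
    if text == "1":
        return True
    if text == "0":
        return False
    return None  # unexpected content: treat as unknown rather than guess
```

Calling `smt_active()` before a benchmark run and recording the result alongside the numbers makes it clear which configuration a given result came from.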

Caches, Schedulers, and Core Design

Lion Cove adds an extra cache level: a small, very low-latency L0 (the former first-level cache) plus a new, larger ~192K intermediate level now called L1. Commenters see this as “taking the Apple hint” of bigger, faster caches.
Dropping the unified scheduler in favor of separate integer and vector schedulers brings Intel closer to AMD's and Apple's approaches.
The wider integer pipelines and separate vector scheduling are read as a rebalancing between integer and vector workloads; some expect future designs to keep swinging between unified and split schedulers.
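Latency steps between cache levels like L0, L1, and L2 are usually measured with a pointer-chasing loop: each load's address depends on the previous load, and a random single-cycle permutation defeats hardware prefetchers, so time per step approximates load-to-use latency for a working set of that size. The Python sketch below shows only the structure, using Sattolo's algorithm to build the cycle; the function names are hypothetical. Interpreter overhead swamps the few-nanosecond differences between cache levels in Python, so real measurements are written in C or assembly with an array sized to the target working set.

```python
import random
import time


def sattolo_cycle(n: int, seed: int = 0) -> list[int]:
    """Random permutation of range(n) that forms one cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i), never i itself: guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase_ns_per_step(n: int, steps: int = 1_000_000) -> float:
    """Average nanoseconds per dependent load while walking an n-element cycle."""
    nxt = sattolo_cycle(n)  # built outside the timed region
    i = 0
    t0 = time.perf_counter()
    for _ in range(steps):
        i = nxt[i]  # next address depends on this load's result
    elapsed = time.perf_counter() - t0
    return elapsed / steps * 1e9
```

Sweeping `n` upward and plotting the result would, in a compiled version, show plateaus that step up roughly as the working set outgrows each cache level.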

Vector/SIMD, GPUs, NPUs, and Heterogeneous Compute

Vector performance is viewed as essential for databases, crypto, multimedia, modern hash tables, JSON/Unicode parsing, and various throughput workloads.
Many note that relatively few applications are hand-optimized for SIMD, and that algorithm or data-structure changes often yield bigger wins.
High-level, portable SIMD abstractions (in some newer languages/runtimes) are improving adoption, but SIMD programming is still seen as painful.
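To make the lane-parallel idea concrete without CPU intrinsics, here is a sketch of SIMD within a register (SWAR): it checks eight bytes per step for a target byte using ordinary 64-bit integer arithmetic. This is the same compare-every-lane pattern that byte scanning in JSON parsers and hash-table probing applies with real vector registers. It is a swapped-in illustration, not any library's code; `find_byte` and `zero_byte_flags` are hypothetical names, and in Python it is slower than the built-in `bytes.find`.

```python
LO = 0x0101010101010101  # 0x01 in each of the 8 byte lanes
HI = 0x8080808080808080  # 0x80 in each of the 8 byte lanes


def zero_byte_flags(word: int) -> int:
    """Set bit 7 of lanes that may be zero; the lowest flagged lane is exact."""
    # Masking with HI keeps only the low 64 bits, so Python's unbounded ints
    # behave like 64-bit two's-complement arithmetic here.
    return (word - LO) & ~word & HI


def find_byte(buf: bytes, target: int) -> int:
    """Index of the first occurrence of target in buf, or -1."""
    pattern = target * LO  # broadcast target into all 8 lanes
    n = len(buf)
    i = 0
    while i + 8 <= n:
        word = int.from_bytes(buf[i:i + 8], "little")  # buf[i] -> lowest lane
        flags = zero_byte_flags(word ^ pattern)  # matching lanes become 0x00
        if flags:
            lowest = flags & -flags  # isolate the lowest flagged lane
            return i + (lowest.bit_length() - 1) // 8
        i += 8
    for j in range(i, n):  # scalar tail for the final < 8 bytes
        if buf[j] == target:
            return j
    return -1
```

The zero-byte test can also flag a lane sitting above a real match (borrow propagation turns a 0x01 byte above a 0x00 byte into a false positive), but the lowest flagged lane is always correct, which is why the code only reports the lowest one.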
Offloading to GPUs/NPUs is useful for large, regular workloads, but data movement and nonstandard APIs make them a poor replacement for CPU vector units.
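A back-of-the-envelope model shows why data movement matters: offloading only wins when launch overhead, transfer time, and accelerator compute together beat the CPU's time for the same work. The sketch below assumes a simple serial model with no overlap between transfer and compute; `OffloadCost`, `offload_wins`, and all example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OffloadCost:
    launch_s: float          # fixed per-dispatch overhead (driver, API, sync)
    link_bytes_per_s: float  # assumed host<->device bandwidth
    accel_s: float           # kernel run time on the accelerator

    def total_s(self, bytes_in: int, bytes_out: int) -> float:
        """Serial model: launch, copy in, compute, copy out (no overlap)."""
        transfer = (bytes_in + bytes_out) / self.link_bytes_per_s
        return self.launch_s + transfer + self.accel_s


def offload_wins(cpu_s: float, cost: OffloadCost, bytes_in: int, bytes_out: int) -> bool:
    """True if the modeled offload finishes faster than running on the CPU."""
    return cost.total_s(bytes_in, bytes_out) < cpu_s
```

With hypothetical numbers (10 µs launch, 16 GB/s link, 100 µs of accelerator compute, 8 MB in each direction), the transfer alone takes 1 ms, so the offload only pays off if the CPU would need more than about 1.11 ms. Small or irregular kernels rarely clear that bar.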
More radical ideas (very wide SIMD cores, many-way SMT, SIMT-like CPUs) run into problems with keeping the ISA uniform across cores, with OS scheduling, and with programming-model complexity.

Security and Side Channels

Several comments link SMT closely to cache-based side channels; safe use may require that only threads from the same security domain share a core.
Some argue the broader problem is speculative execution, cache sharing, and modern preemption in general, not SMT alone.

ARM vs x86, RISC vs CISC, and Market Position

There is debate over whether ARM is still “RISC” given its large opcode count; the rough consensus is that modern high-performance ARM and x86 cores converge on similarly deep, complex microarchitectures.
Some expect ARM laptop CPUs (e.g., Qualcomm) to beat x86 on perf/W, though x86 may still lead in absolute performance.
One side argues x86’s legacy/app advantage is decisive; another claims most important workloads are now portable or emulatable, shrinking that advantage.

Skepticism About Marketing and Benchmarks

Multiple commenters treat Intel’s pre-release claims as marketing, noting how the narrative has shifted over time: Intel first sold Hyper-Threading as a feature and is now selling its removal.
There is strong sentiment in favor of waiting for independent benchmarks and workload-specific analysis before drawing conclusions about Lion Cove.
