2024-06-04

Intel Unveils Lunar Lake Architecture

Intel, TSMC, and Foundry Trajectory

Many see Intel’s full reliance on TSMC for Lunar Lake logic/IO as a major shift and a sign of continuing trouble after the 10nm “debacle.”
Some hope this is the bottom and that Intel’s own nodes (Intel 3, 20A, 18A) will still recover; others point to Arrow Lake mixing Intel 20A and TSMC 3nm as a bad sign for Intel’s foundry business.
Competition in high-end CPUs is widely seen as socially and geopolitically important.

Core Configuration and Hyper-Threading (SMT)

Lunar Lake drops SMT and ships 4 performance (P) cores plus 4 efficiency (E) cores.
Debate:
- Pro-SMT: helps unoptimized, memory-bound, and especially compilation workloads; older generations saw 20–25% gains.
- Anti-SMT: small or negative benefit on many modern multicore laptops; can hurt rendering and games; E-cores are argued to be a better, lower-overhead way to increase throughput.
Some note Apple’s lack of SMT as evidence that SMT is less critical in mobile-focused designs.

On-Package LPDDR5X Memory and Soldering

Lunar Lake integrates up to 32 GB LPDDR5X on package (128-bit bus).
Pros cited: power savings from lower-voltage, short-trace DRAM and better perf/W, especially for thin-and-light laptops.
Cons: no user RAM upgrades. Some commenters, drawing comparisons with HBM-equipped Xeons and Apple's on-package memory, see this as customer-hostile and consider the real power savings modest.
32 GB is seen as enough for many local AI assistant workloads (e.g., Windows Copilot) but not for all LLM use cases.
There is broader frustration with soldered components and e-waste; if memory must be soldered, commenters want higher default capacities.

Apple / AMD / Intel Efficiency Debate

Multiple explanations for Apple’s perf/W:
- Early access to leading TSMC nodes.
- High IPC via very wide cores and large caches.
- Willingness to ship large, expensive dies and optimize for perf/W over cost.
Counterpoints:
- On the same node (e.g., TSMC 5nm), some AMD CPUs match or beat Apple in specific benchmarks, especially when allowed high power.
- TDP is criticized as misleading; real efficiency comparisons require measured power under load, which reviews rarely provide.
Consensus: ISA (RISC vs CISC) is minor; process, microarchitecture, and product goals dominate.
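
The point about TDP versus measured power can be made concrete with a small sketch. It computes average power, energy, and performance per watt from power samples logged during a benchmark run. The `Run` type, the chip names, and all numbers are made up for illustration and do not come from the thread or from any review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """One benchmark run with power logged at evenly spaced intervals."""
    name: str
    score: float                  # benchmark score, higher is better
    power_samples_w: List[float]  # measured package power in watts
    duration_s: float             # wall-clock duration of the run


def average_power_w(run: Run) -> float:
    # Evenly spaced samples, so a plain mean is the time-weighted average.
    return sum(run.power_samples_w) / len(run.power_samples_w)


def energy_j(run: Run) -> float:
    # Energy = average power * time.
    return average_power_w(run) * run.duration_s


def perf_per_watt(run: Run) -> float:
    return run.score / average_power_w(run)


# Hypothetical data: chip B scores higher but draws twice the power.
chip_a = Run("chip A", score=1000, power_samples_w=[20, 22, 18], duration_s=60)
chip_b = Run("chip B", score=1200, power_samples_w=[40, 44, 36], duration_s=60)

for run in (chip_a, chip_b):
    print(f"{run.name}: {average_power_w(run):.1f} W avg, "
          f"{energy_j(run):.0f} J, {perf_per_watt(run):.1f} points/W")
```

In this made-up case the faster chip is the less efficient one, a distinction a TDP figure alone would not show.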

GPU, Memory Bandwidth, and Small-Form Systems

Some hope Lunar Lake’s iGPU advances will carry over to desktops; integrated GPUs are still seen as bandwidth-limited versus discrete GPUs with higher-bandwidth memory.
Some are more excited about future AMD APUs (e.g., “Strix Halo” rumors: wider memory bus, larger iGPU) for ITX-sized gaming without a dGPU.
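
To put the bandwidth argument in numbers, here is a rough sketch of theoretical peak memory bandwidth as a function of bus width and transfer rate. The 128-bit width is from the discussion; the 8533 MT/s transfer rate and the 256-bit comparison point are assumptions added for illustration, not figures confirmed in the thread.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits: total memory bus width in bits
    transfer_rate_mts: transfers per second, in millions (MT/s)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Transfer rate and the wider bus are illustrative assumptions.
configs = {
    "128-bit bus @ 8533 MT/s (assumed rate)": (128, 8533),
    "256-bit bus @ 8533 MT/s (hypothetical wider bus)": (256, 8533),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```

Doubling the bus width doubles the theoretical peak, which is why a wider bus matters so much for a large iGPU. Sustained bandwidth in practice is lower than these figures.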

Design Complexity and Microarchitecture Details

The “sea of FUBs” → “sea of cells” shift is discussed as moving from many small, latch-heavy partitions to larger, flop-dominated partitions, improving physical design flow at the cost of tool complexity.
General agreement that modern superscalar OoO CPUs are extremely complex; even advanced university courses often study much older designs.

Security, Enclaves, and Democracy

One tangent argues for hardware-enforced private-state “actors” (enclave-like units) as a foundation for secure democratic infrastructure, instead of today’s shared-state designs.
Others respond that:
- Secure computing can just as easily empower authoritarian regimes, since they control keys.
- High “security” via opaque enclaves may conflict with the transparency needed for democratic trust.

Windows on ARM, Emulation, and CPU Commoditization

Some see Microsoft pushing toward ISA-agnostic Windows (x86, ARM, etc.) as a way to commoditize CPUs and strengthen Microsoft’s position.
Historical note: NT has long run on multiple ISAs; market demand, not Microsoft, kept x86 dominant.
Windows on ARM already exists (with x86 emulation “Prism”), and Snapdragon X laptops are coming; past attempts (e.g., RT, early WoA) had weak uptake, so it’s unclear if this wave will succeed.

NPUs, Local AI, and User Control

Lunar Lake adds an NPU with ~45 TOPS; Intel markets “120 TOPS” including GPU/CPU.
AMD’s NPUs are cited as more power-efficient per TOPS; both vendors claim large gen-on-gen efficiency gains, but real-world comparisons remain unclear.
Some worry about always-on “AI coprocessors” as potential surveillance units and ask for true hardware kill switches; no clear answer from the thread on whether such switches will exist.

Scheduling and OS Support

Lunar Lake’s efficiency gains are said to rely heavily on Windows 11’s improved heterogeneous-core scheduler.
There is concern that Linux may lag in exploiting P/E cores optimally, especially on client devices, though Intel also has an incentive to improve Linux scheduling because of servers.
Lunar Lake is simpler than Meteor Lake (no SMT, fewer core types), which may ease scheduling.
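
For readers who want to see how Linux exposes the P/E split, the sketch below reads the logical CPU lists for each core type from sysfs. It assumes a kernel that exposes hybrid Intel cores as the `cpu_core` and `cpu_atom` devices; those paths are an assumption here and may be absent on other kernels or hardware, in which case the script just reports that. `parse_cpu_list` and `read_cpu_set` are illustrative helper names.

```python
from pathlib import Path
from typing import Optional, Set

# Assumed sysfs locations for hybrid Intel core types on Linux.
P_CORE_PATH = Path("/sys/devices/cpu_core/cpus")
E_CORE_PATH = Path("/sys/devices/cpu_atom/cpus")


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel-style CPU list such as '0-3,8,10-11' into a set."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpu_set(path: Path) -> Optional[Set[int]]:
    """Return the CPU set stored at path, or None if it cannot be read."""
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None


p_cores = read_cpu_set(P_CORE_PATH)
e_cores = read_cpu_set(E_CORE_PATH)
if p_cores is None or e_cores is None:
    print("No hybrid core topology exposed on this system")
else:
    print("P-core logical CPUs:", sorted(p_cores))
    print("E-core logical CPUs:", sorted(e_cores))
```

On Linux, a set obtained this way could be passed to `os.sched_setaffinity(0, e_cores)` to pin the current process to E-cores manually. That is a blunt workaround for experiments, not a substitute for scheduler support.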
