Intel's Lion Cove P-Core and Gaming Workloads
Article reception and meta-discussion
- Many readers find the piece excellent but “non-actionable”: only Intel architects can change Lion Cove, and for most developers the takeaway is to keep using generic performance practices (e.g., reduce memory usage).
- Some note that modern CPUs are under-documented, so deep reverse-engineering/benchmark articles fill an important gap even if there’s “not much to comment.”
- Others see it as yet another disappointing Intel launch and express frustration with Intel’s recent product and branding decisions.
Lion Cove / 285K performance, efficiency, and bugs
- Shared benchmarks place the 285K around 12th in gaming, behind 13th/14th-gen Intel flagships and several AMD chips; 3D cache on AMD is credited with big gaming gains.
- In productivity workloads, the 285K can beat a 14900KS and is more power efficient than recent Intel desktop parts, though still less efficient than AMD.
- Thermal issues on Raptor Lake (and the microcode/voltage degradation saga) are cited as validation that “running deep into the performance curve” went too far.
- Lunar Lake is praised for efficiency but criticized for a serious MONITOR/MWAIT bug that breaks some Linux input handling; workarounds remove one of x86’s advantages over Arm.
Benchmarks, methodology, and trust
- A custom meta-benchmark site is debated: its ranking logic (non-percentage scoring, neighbor-based interpolation) produced confusing results and a bug, which was later fixed.
- Critics ask for more transparency (test hardware, workloads, scoring formula); defenders point to per-benchmark drill-down and note 13900K vs 14900K gaming parity is consistent with other data.
E-cores, heterogeneous CPUs, and gaming
- The article disables E-cores to isolate P-core (Lion Cove) behavior; commenters stress this means real-world gaming with E-cores enabled is likely worse.
- Some argue that for a P-core microarchitecture deep-dive this is appropriate and that mainstream “which CPU to buy” reviews already test full configurations.
- Others say E-cores are currently a net negative for gamers: scheduling can put latency-sensitive threads on weaker cores, causing stutter, and community advice often recommends disabling them.
- Responsibility is debated:
- One camp blames Intel for shipping complex heterogeneous designs and relying on imperfect OS schedulers/Thread Director.
- Another emphasizes it’s fundamentally an OS/application issue and affects AMD/Arm heterogeneity too; consumers, however, only perceive “it’s broken.”
- Multiple comments note the difficulty of optimizing for heterogeneous microarchitectures when code and runtimes assume a single target; you either:
- Compile for a generic baseline and give up 1.5–2.5× of the performance available on high-end cores, or
- Optimize for one core type and accept poor performance on the other.
- Some suggest that, long term, homogeneous cores with very wide dynamic power/perf range may be simpler than mixed microarchitectures.
- AMD’s own asymmetry (X3D vs non-X3D CCDs) is cited as a milder but still nontrivial scheduling challenge.
OS scheduling, sleep, and laptops
- A long subthread compares Windows, Linux, and macOS sleep behavior on laptops and handhelds:
- Several claim Windows sleep on consumer laptops is unreliable, with surprise wakeups and background tasks; others say it works fine on most hardware and that bad drivers/firmware are the main culprit.
- Linux is described by some as worse (frequent resume failures, black screens, kernel panics), by others as essentially problem-free.
- macOS is also reported to have external display reconnection issues and “hot bag” incidents.
- There is agreement that users don’t care whose fault it is (OS vs drivers vs CPU vendor); they only see unreliable sleep and power behavior.
Memory architecture and L3 latency
- A question about Intel competing with AMD’s Strix Halo (quad-channel LPDDR5X) leads to debate on whether more memory channels actually help:
- Some assert most workloads are memory-bound and benefit greatly from bandwidth (and from L3-heavy designs like X3D).
- Others counter that LPDDR5X trades higher bandwidth for worse latency and only shines in bandwidth-heavy tasks (e.g., large GPUs, physics); many general workloads still favor lower latency DDR5.
- A key point drawn from the article: Lion Cove’s L3 latency (~83 cycles) is significantly worse than the previous gen (68 cycles) and far worse than Zen 5 (~47 cycles).
- Commenters tie this to Lion Cove’s weak gaming results and highlight how X3D’s large, fast L3 “turbo-charges” games.
Understanding the article: resources and profiling nuance
- For readers wanting more background, Hennessy & Patterson’s “Computer Architecture: A Quantitative Approach” and its lighter RISC-V-oriented variant are recommended, plus online appendices.
- Another suggestion is to use an LLM to explain unfamiliar terms section-by-section.
- A mini-discussion on Intel’s top-down analysis:
- Frontend-bound stalls can be misleading because backend issues (e.g., long-latency loads, atomics, cross-NUMA traffic) often manifest as frontend stalls in sampling.
- Proper interpretation requires looking at surrounding instructions, dependencies, and multiple hardware counters—top-down is a starting point, not a definitive diagnosis.