2024-07-17

NVIDIA Transitions Fully Towards Open-Source Linux GPU Kernel Modules

Linux NVIDIA driver experience today

Many users report that the proprietary drivers on X11 are “rock solid” across multiple GPU generations (10xx–40xx), distros (Debian, Arch, Ubuntu, Mint, Pop!_OS, Gentoo, NixOS), and workloads (gaming, CUDA, servers).
Others describe long‑term pain: black screens after resume, DPMS/suspend issues, multi‑monitor problems, driver/kernel mismatches, and breakage after updates, especially on rolling distros or laptops with hybrid graphics.
Several users switched to AMD or Intel because those vendors’ in‑kernel, upstream drivers “just work” and avoid DKMS / out‑of‑tree complexity.
A few users report AMD or Intel GPU instability instead, underscoring highly hardware‑ and setup‑dependent experiences.

Wayland vs X11

Consensus: NVIDIA + Wayland has historically been problematic, with flickering, stuttering, glitches, broken HDMI output, HiDPI friction, and app compatibility problems (e.g., Discord screen sharing, rotation issues).
Driver 555 is widely cited as a major improvement for Wayland (less flickering, better gaming, smoother animations), but it is still not flawless, and stability varies by compositor (Hyprland, Sway, Plasma 6).
Many satisfied NVIDIA users explicitly stay on X11, citing fewer issues and no perceived benefit from Wayland for single‑monitor setups.

What “open-source GPU kernel modules” actually means

Only the kernel modules are open‑sourced. User‑space drivers (OpenGL/Vulkan stacks, libcuda, GLX libraries) remain proprietary.
Much driver logic has moved into a proprietary firmware blob running on an on‑GPU processor (GSP, RISC‑V–based), with the kernel module as a thin shim.
Firmware is large, signed, and undocumented; some see this as shifting the black box below the kernel rather than removing it.
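
To see which flavor of kernel module a given machine is actually running, one option is to read the driver's version file. The following is a minimal sketch, not an official method: it assumes the loaded module exposes /proc/driver/nvidia/version and that open builds identify themselves with the phrase "Open Kernel Module" on the NVRM line. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the NVIDIA kernel module exposes this file when loaded, and
# open builds include "Open Kernel Module" on the "NVRM version:" line.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def classify_nvidia_module(version_text: str) -> str:
    """Classify the module flavor from the text of the version file."""
    for line in version_text.splitlines():
        if line.startswith("NVRM version:"):
            if "Open Kernel Module" in line:
                return "open"
            return "proprietary"
    # No NVRM line found: the text is not in the format assumed above.
    return "unknown"


def detect_nvidia_module(path: Path = VERSION_FILE) -> str:
    """Read the version file; report 'not loaded' if it cannot be read."""
    try:
        return classify_nvidia_module(path.read_text())
    except OSError:
        return "not loaded"


if __name__ == "__main__":
    print(detect_nvidia_module())
```

Either answer describes only the kernel side. The user‑space libraries and the GSP firmware are proprietary in both cases.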

Security and isolation concerns

Some argue open kernel modules are a meaningful security win: privileged CPU code becomes auditable, while device firmware should be sandboxed via IOMMU.
Others question relying on perfectly implemented IOMMUs and chipsets, and worry about firmware having DMA access to memory and RDMA capabilities.

Motivations and market context

The move is often linked to:
- Reducing maintenance across OSes and architectures.
- Pressure from partners/cloud vendors and Linux dominance in AI/ML and data centers.
- Aligning with Grace Hopper/Blackwell platforms, which reportedly require the open kernel modules.
Some speculate that reputation repair (after Wayland‑era friction and earlier hostility to Linux) and competitive pressure also play a role; others think public shaming alone wouldn’t move a company this profitable.

Impact on ecosystem & remaining limitations

Open kernel modules help Nouveau/NVK development and make it theoretically possible to build fully open userspace stacks.
However, the lack of upstreaming into mainline Linux, the proprietary user‑space components, and the firmware blobs mean:
- CUDA remains closed and dominant.
- A truly “fully open” driver stack does not yet exist.
Some see real progress; others dismiss it as marketing or “throwing a tarball over the wall” until features, power management, and hybrid graphics work out‑of‑the‑box like AMD/Intel.
