LibreCUDA – Launch CUDA code on Nvidia GPUs without the proprietary runtime
Purpose and Motivation
- Project offers a minimal, open-source CUDA runtime that talks directly to Nvidia’s low-level RM (Resource Manager) interface via ioctls, bypassing the proprietary CUDA user-space stack.
- Goals cited:
  - Learn how the stack works and have a “simple, transparent” reference implementation.
  - Enable lighter, more debuggable environments and eventually help port CUDA-like APIs to other platforms (e.g., *BSD).
  - Challenge Nvidia’s dominance and licensing constraints, and provide an open stack for research and verification.
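To make "talks directly to the RM interface via ioctls" concrete, here is a minimal sketch of how a Linux ioctl request number is packed and how a control node might be opened. It is not LibreCUDA's code. The bit layout is the generic Linux one used on most architectures. The device path `/dev/nvidiactl`, the magic byte `'F'`, the command number `0x2B`, and the 32-byte parameter size are assumptions to check against the kernel module headers, and the helper names `ioctl_iowr` and `open_control_node` are hypothetical.

```python
import os

# Generic Linux ioctl request layout (asm-generic/ioctl.h, used on x86 and arm):
# bits 0-7 command number, 8-15 type ("magic"), 16-29 argument size, 30-31 direction.
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_WRITE, _IOC_READ = 1, 2
_IOC_SIZE_MAX = (1 << 14) - 1

# ASSUMPTIONS: values believed to match Nvidia's kernel module headers.
# Verify them against the headers of the driver version you actually run.
NV_IOCTL_MAGIC = ord("F")
NV_ESC_RM_ALLOC = 0x2B
CONTROL_NODE = "/dev/nvidiactl"


def ioctl_iowr(magic: int, nr: int, size: int) -> int:
    """Pack a read/write ioctl request number, like the C macro _IOWR."""
    if not (0 <= magic <= 0xFF and 0 <= nr <= 0xFF):
        raise ValueError("magic and nr must each fit in 8 bits")
    if not 0 <= size <= _IOC_SIZE_MAX:
        raise ValueError("size must fit in 14 bits")
    return (
        ((_IOC_READ | _IOC_WRITE) << _IOC_DIRSHIFT)
        | (size << _IOC_SIZESHIFT)
        | (magic << _IOC_TYPESHIFT)
        | (nr << _IOC_NRSHIFT)
    )


def open_control_node(path: str = CONTROL_NODE):
    """Return a file descriptor for the control node, or None if unavailable."""
    try:
        return os.open(path, os.O_RDWR | getattr(os, "O_CLOEXEC", 0))
    except OSError:
        return None


if __name__ == "__main__":
    # 32 is a placeholder parameter-struct size, not a value taken from the driver.
    print(hex(ioctl_iowr(NV_IOCTL_MAGIC, NV_ESC_RM_ALLOC, 32)))  # 0xc020462b
    fd = open_control_node()
    print("control node opened" if fd is not None else "no Nvidia control node here")
    if fd is not None:
        os.close(fd)
```

The sketch stops before issuing any ioctl: the parameter structures passed to the driver are defined by its headers, and guessing their layout would be unsafe.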
Technical Scope and Limitations
- Still very early; only a small fraction of the CUDA API is implemented. Several commenters say it must grow ~100× in coverage to be generally usable.
- It still requires Nvidia’s kernel driver (proprietary or the newer open modules). Firmware and GSP remain proprietary.
- CUDA binaries (ELF with SASS) are still needed; replacing ptxas (the proprietary PTX→SASS compiler) is described as “highly non-trivial” because Nvidia’s ISAs and latencies are undocumented and scheduling is complex.
- Some view it as a reference implementation to “run simple stuff” and so separate driver, compiler, and hardware bugs.
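Since the runtime consumes precompiled CUDA binaries, a loader's first step is usually to confirm that the input is an ELF image for the right machine. The sketch below reads only the fixed ELF header fields and is not taken from LibreCUDA. The helper name `describe_elf` is hypothetical, and the `e_machine` value 190 (`EM_CUDA` in `elf.h`) is the value believed to mark CUDA binaries, so it should be checked against your toolchain's output.

```python
import struct

EM_CUDA = 190  # e_machine value listed for NVIDIA CUDA in elf.h (assumption to verify)


def describe_elf(header: bytes) -> dict:
    """Read word size and target machine from the first 20 bytes of an ELF image."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unrecognized ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    # e_type and e_machine are 16-bit fields right after the 16-byte e_ident array.
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": 64 if ei_class == 2 else 32,
        "machine": e_machine,
        "is_cuda": e_machine == EM_CUDA,
    }


def describe_elf_file(path: str) -> dict:
    """Convenience wrapper that reads the header from a file on disk."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

For example, `describe_elf_file("kernel.cubin")` (a hypothetical file name) would report whether the file looks like a CUDA ELF. It says nothing about which SASS architecture the code inside targets.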
Legal, Licensing, and Trademark Issues
- Concern that using “CUDA” in the project name and API prefixes invites trademark trouble; discussion of likelihood-of-confusion tests and how far trademarks can reach across industries.
- Debate over whether this can be used to bypass Nvidia’s “no GeForce in datacenters” EULA clause by pairing consumer GPUs with the open kernel driver and AOT-compiled kernels. Applicability of overlapping Nvidia licenses is described as murky and potentially riskier for companies than individuals.
Alternatives and Broader Ecosystem
- Mention of related efforts: ZLUDA (CUDA on non-Nvidia hardware), tinygrad’s direct-ioctl runtimes for AMD and Nvidia, and other compiler stacks (Triton, Numba, Julia, JAX, etc.).
- Some argue Vulkan compute or OpenCL could have been the open standard, but Vulkan/SPIR-V semantics and tooling are seen as less suitable than CUDA, and OpenCL is viewed as effectively abandoned.
Debate Over Value
- Supporters: open stacks reduce lock-in, enable unsupported platforms, avoid vendor whims, and can be used even purely for testing correctness.
- Skeptics: as long as it only runs on Nvidia hardware and is incomplete, the practical benefit is limited; the real win would be a mature CUDA-like API on non-Nvidia GPUs.