2024-08-08

LibreCUDA – Launch CUDA code on Nvidia GPUs without the proprietary runtime

Purpose and Motivation

The project offers a minimal, open-source CUDA runtime that talks directly to Nvidia’s low-level RM (Resource Manager) interface via ioctls, bypassing the proprietary CUDA user-space stack.
Goals cited:
- Learn how the stack works and have a “simple, transparent” reference implementation.
- Enable lighter, more debuggable environments and eventually help port CUDA-like APIs to other platforms (e.g., *BSD).
- Challenge Nvidia’s dominance and licensing constraints, and provide an open stack for research and verification.
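
To make "talks directly to the RM interface via ioctls" more concrete, here is a minimal sketch of how a Linux ioctl request number is packed from a direction, a magic byte, a command number, and a parameter-struct size. It assumes the generic Linux field layout. The `EXAMPLE_*` values are hypothetical placeholders, not Nvidia's real constants, which would have to be taken from the driver headers.

```python
# Field widths of the generic Linux ioctl encoding (asm-generic/ioctl.h).
# Assumption: generic layout; a few architectures use different widths.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS
IOC_WRITE, IOC_READ = 1, 2


def ioc(direction: int, magic: int, nr: int, size: int) -> int:
    """Pack the four fields into one ioctl request number."""
    if not 0 <= magic < (1 << IOC_TYPEBITS):
        raise ValueError("magic must fit in 8 bits")
    if not 0 <= nr < (1 << IOC_NRBITS):
        raise ValueError("nr must fit in 8 bits")
    if not 0 <= size < (1 << IOC_SIZEBITS):
        raise ValueError("size must fit in 14 bits")
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (magic << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


def iowr(magic: int, nr: int, size: int) -> int:
    """Request that both passes data to the driver and reads data back."""
    return ioc(IOC_READ | IOC_WRITE, magic, nr, size)


# Hypothetical placeholders for illustration only (NOT Nvidia's values).
EXAMPLE_MAGIC = ord("X")
EXAMPLE_NR = 0x01
EXAMPLE_PARAM_SIZE = 32  # size in bytes of an imagined parameter struct

request = iowr(EXAMPLE_MAGIC, EXAMPLE_NR, EXAMPLE_PARAM_SIZE)
print(hex(request))
```

In a real runtime, a number built this way would be passed to something like `fcntl.ioctl(fd, request, buffer)` on an open driver device node such as `/dev/nvidiactl`. The sketch stops short of that because the parameter layouts are driver-specific.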

Technical Scope and Limitations

Still very early; only a small fraction of the CUDA API is implemented. Several commenters say it must grow ~100× in coverage to be generally usable.
It still requires Nvidia’s kernel driver (proprietary or the newer open modules). Firmware and GSP remain proprietary.
CUDA binaries (ELF with SASS) are still needed; replacing ptxas (the proprietary PTX→SASS compiler) is described as “highly non-trivial” because Nvidia’s ISAs and latencies are undocumented and scheduling is complex.
Some view it as a reference implementation that can “run simple stuff,” which helps isolate whether a bug lies in the driver, the compiler, or the hardware.
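
As a rough illustration of the "ELF with SASS" point above, the sketch below reads the `e_machine` field of an ELF header and compares it with `EM_CUDA` (190, the value listed for NVIDIA CUDA in common `elf.h` headers). The function names are made up for this example, and whether LibreCUDA performs such a check is not stated in the discussion. A match only means the container claims to be a CUDA ELF, not that it holds SASS for a particular GPU.

```python
import struct

EM_CUDA = 190  # e_machine value listed for NVIDIA CUDA in common elf.h headers


def elf_machine(data: bytes):
    """Return e_machine from an ELF header, or None if data is not ELF."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return None
    # e_ident[5] (EI_DATA) selects byte order: 1 = little, 2 = big.
    if data[5] == 1:
        endian = "<"
    elif data[5] == 2:
        endian = ">"
    else:
        return None
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(endian + "H", data, 18)
    return machine


def looks_like_cuda_elf(data: bytes) -> bool:
    """True if the header declares the CUDA machine type."""
    return elf_machine(data) == EM_CUDA


def fake_header(machine: int) -> bytes:
    """Build a synthetic 20-byte ELF64 little-endian header prefix."""
    e_ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 16 bytes total
    return e_ident + struct.pack("<HH", 2, machine)  # e_type, e_machine


print(looks_like_cuda_elf(fake_header(EM_CUDA)))
print(looks_like_cuda_elf(fake_header(62)))  # 62 is EM_X86_64
```

On a real file, the first 20 bytes read with `open(path, "rb").read(20)` would be enough input for `looks_like_cuda_elf`.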

Legal, Licensing, and Trademark Issues

Some commenters worry that using “CUDA” in the project name and API prefixes invites trademark trouble; the thread discusses likelihood-of-confusion tests and how far trademarks can reach across industries.
Debate over whether this can be used to bypass Nvidia’s “no GeForce in datacenters” EULA clause by pairing consumer GPUs with the open kernel driver and AOT-compiled kernels. Applicability of overlapping Nvidia licenses is described as murky and potentially riskier for companies than individuals.

Alternatives and Broader Ecosystem

Mention of related efforts: ZLUDA (CUDA on non-Nvidia hardware), tinygrad’s direct-ioctl runtimes for AMD and Nvidia, and other compiler stacks (Triton, Numba, Julia, JAX, etc.).
Some argue Vulkan compute or OpenCL could have been the open standard, but Vulkan/SPIR-V semantics and tooling are seen as less suitable than CUDA, and OpenCL is viewed as effectively abandoned.

Debate Over Value

Supporters: open stacks reduce lock-in, enable unsupported platforms, avoid vendor whims, and can be used even purely for testing correctness.
Skeptics: as long as it only runs on Nvidia hardware and is incomplete, the practical benefit is limited; the real win would be a mature CUDA-like API on non-Nvidia GPUs.
