Rust running on every GPU

Overall reaction & goals

  • Many commenters are impressed that ordinary no_std Rust crates can run on GPUs with little or no change, seeing this as unlocking cross‑CPU/GPU libraries and easier GPU adoption for “CPU programmers.”
  • Core goal as understood: write a single Rust implementation (host + kernels) and run it across CPUs, CUDA, Vulkan/Metal/D3D, WebGPU, etc., with conditional compilation and runtime backend selection.
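The conditional‑compilation half of that goal can be sketched in a few lines of Rust: one shared function body, with the host entry point gated on the compilation target. The `spirv` target_arch value is what the rust‑gpu backend compiles for; the exact attribute details here are illustrative, not the project's documented API.

```rust
// One shared implementation, compiled for both CPU and GPU targets.
// When built for a GPU target the crate drops std; on the host it keeps it.
#![cfg_attr(target_arch = "spirv", no_std)]

/// Plain no_std-compatible Rust: callable from a GPU kernel or a CPU test.
pub fn saxpy(a: f32, x: f32, y: f32) -> f32 {
    a * x + y
}

// Host-only entry point. In a GPU build this would instead be a kernel
// function carrying the backend's entry-point attribute.
#[cfg(not(target_arch = "spirv"))]
fn main() {
    assert_eq!(saxpy(2.0, 3.0, 1.0), 7.0);
    println!("saxpy on the host: {}", saxpy(2.0, 3.0, 1.0));
}
```

The point of the sketch is that the shared function is ordinary Rust, so the same code path can be unit‑tested on the CPU and compiled unchanged for a device target.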

Abstraction vs performance

  • Performance‑focused participants distrust high‑level GPU abstractions; they want direct control over low‑level differences between vendors and architectures.
  • Concern that a portable Rust layer becomes a “jack of all trades,” incapable of exploiting vendor‑specific features (warp reductions, tensor cores, ray tracing, bindless, etc.).
  • Others argue all GPU APIs (including CUDA) are abstractions; you pick a trade‑off between engineering capacity and peak performance, and can still drop down to PTX/ISA where needed.
  • Some expect backend‑specific optimization passes and intrinsic support to improve over time, but acknowledge that truly optimal code may still require per‑architecture kernels.
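The "drop down to PTX/ISA where needed" escape hatch can be expressed with complementary cfg items: a portable fallback plus a per‑architecture specialization behind the same signature. The NVPTX body below is only a placeholder; a real version would use warp intrinsics or inline PTX, and the function names here are made up for illustration.

```rust
// Portable fallback: plain Rust, used on every target except NVIDIA.
#[cfg(not(target_arch = "nvptx64"))]
pub fn reduce_sum(data: &[f32]) -> f32 {
    data.iter().sum()
}

// NVIDIA specialization: this is where a warp-shuffle reduction or inline
// PTX (via asm!) would live. It is never compiled on the host, so the body
// is only a marker.
#[cfg(target_arch = "nvptx64")]
pub fn reduce_sum(data: &[f32]) -> f32 {
    unimplemented!("warp-level reduction via intrinsics/PTX goes here")
}

fn main() {
    assert_eq!(reduce_sum(&[1.0, 2.0, 3.0]), 6.0);
    println!("sum = {}", reduce_sum(&[1.0, 2.0, 3.0]));
}
```

Callers see one portable API; the per‑architecture kernels the commenters anticipate slot in behind it without changing call sites.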

Toolchain, compilation targets, and layers

  • For NVIDIA, Rust code is compiled to NVVM IR and then to PTX; for Vulkan/WebGPU it is compiled to SPIR‑V. The PTX can be embedded into CPU binaries or loaded at runtime.
  • Several people note the long chain of abstractions: Rust → GPU backend → wgpu/ash/etc. → Vulkan/Metal/DX/OpenGL → drivers → hardware, raising concerns about debugging and how well performance‑critical details survive.
  • Proponents reply that similar layering exists on CPUs and in game engines, and that Rust’s “zero‑cost abstractions” plus good codegen can keep overhead acceptable.
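The "PTX embedded into CPU binaries" step can be made concrete with a small sketch. The PTX below is a hand‑written no‑op standing in for what the NVVM backend would emit (normally pulled in with `include_str!` from a build artifact), and the loading call in the comment is how a driver‑API wrapper crate such as cust typically JIT‑compiles such text; both are assumptions for illustration.

```rust
// A trivial, hand-written PTX module standing in for compiler output.
// In a real pipeline this constant would be `include_str!` of the file
// the NVVM backend produced at build time.
const KERNEL_PTX: &str = r#".version 7.0
.target sm_70
.address_size 64
.visible .entry noop() { ret; }
"#;

fn main() {
    // At runtime a driver-API wrapper would JIT this for the local GPU,
    // e.g. (with the cust crate) something like Module::from_ptx(...).
    // Here we only demonstrate that the kernel text ships in the binary.
    assert!(KERNEL_PTX.contains(".entry noop"));
    println!("embedded {} bytes of PTX", KERNEL_PTX.len());
}
```

Because PTX is a textual, forward‑compatible IR, shipping it in the host binary lets the driver compile for whatever GPU is actually present, which is the runtime‑selection half of the pipeline described above.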

Relation to existing APIs and languages

  • CUDA: skeptics ask why use Rust instead of CUDA C++; supporters cite safer language, reuse of Rust crates, and integration with existing CUDA libraries via NVVM output.
  • WebGPU: discussed as a portable graphics/compute API, but with constraints (security verification, missing advanced features, a roughly decade‑old feature baseline). It doesn’t solve “same language on CPU and GPU,” since shaders must still be written in WGSL.
  • Other languages: Zig, Nim, and LLVM IR can also target SPIR‑V. Julia and Mojo are praised for numerical computing and GPU support (including JIT‑compiled kernels and native Metal paths); Rust is seen as less numerics‑centric but more “systems”‑oriented.

Ecosystem gaps, CUDA moat, and vendor politics

  • Commenters highlight the large CUDA library ecosystem (cuBLAS, cuDNN, nvCOMP, etc.) and note many are missing or incomplete on the Rust side; some bindings exist but replicating NVIDIA’s decade of engineering is seen as a huge task.
  • NVIDIA’s CUDA EULA and proprietary stance are cited as part of its moat; attempts at open standards (OpenCL, SPIR‑V, OpenMP) and Khronos politics are mentioned as only partial successes.
  • There is recurring desire for a common GPU ISA or a SPIR‑V‑like unification, but skepticism that dominant vendors will cooperate.

Portability, demand, and future directions

  • Some doubt there is strong demand among experienced CUDA developers to switch to Rust; others counter that Rust dramatically expanded the systems‑programming audience and could do the same for GPU programming even at modest initial performance.
  • Rust GPU contributors mention forming a GPU working group, integrating GPU concepts into Rust’s language/cfg() model, and building traits/APIs that lower‑level crates (like wgpu) can use.
  • Several see this as an early but significant step toward portable ML and heterogeneous compute in Rust, while acknowledging that vendor‑specific tuning and ecosystem maturity remain open challenges.
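As a rough sketch of what "traits/APIs that lower‑level crates can use" might look like, here is a hypothetical runtime backend‑selection trait. None of these names come from an actual crate, and the probe in this sketch always falls back to the CPU.

```rust
// Hypothetical trait that a portable compute layer could define and that
// backend crates (CUDA, Vulkan via wgpu, CPU, ...) could implement.
trait ComputeBackend {
    fn name(&self) -> &'static str;
    /// x[i] = a * x[i] + y[i]
    fn dispatch_saxpy(&self, a: f32, x: &mut [f32], y: &[f32]);
}

/// Reference implementation: runs the kernel logic on the host.
struct CpuBackend;

impl ComputeBackend for CpuBackend {
    fn name(&self) -> &'static str {
        "cpu"
    }
    fn dispatch_saxpy(&self, a: f32, x: &mut [f32], y: &[f32]) {
        for (xi, yi) in x.iter_mut().zip(y) {
            *xi = a * *xi + *yi;
        }
    }
}

/// Runtime selection point: a real implementation would probe for
/// CUDA/Vulkan/Metal here; this sketch always takes the CPU fallback.
fn pick_backend() -> Box<dyn ComputeBackend> {
    Box::new(CpuBackend)
}

fn main() {
    let backend = pick_backend();
    let mut x = [1.0_f32, 2.0, 3.0];
    backend.dispatch_saxpy(2.0, &mut x, &[1.0, 1.0, 1.0]);
    assert_eq!(x, [3.0, 5.0, 7.0]);
    println!("{}: {:?}", backend.name(), x);
}
```

The design choice the thread circles around is visible even in this toy: the trait fixes a portable contract, while each backend is free to implement it with vendor‑specific kernels underneath.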