2025-11-05

The state of SIMD in Rust in 2025

Current SIMD Options in Rust

Commenters broadly agree with the article's guidance:
- Use std::simd on nightly if possible.
- Use wide for stable Rust without multiversioning.
- Use pulp/macerator when you need multiversioning and portability.
Some large projects use nightly std::simd in production, while avoiding the most unstable APIs.
Stable std::arch intrinsics for x86, ARM, and WASM are widely used for non‑portable, target-specific SIMD.
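
As a rough illustration of the target-specific route, the sketch below adds two 4-lane f32 arrays with SSE intrinsics from std::arch on x86_64 and falls back to scalar code on other targets. The helper name add4 is hypothetical and does not come from the article or the thread.

```rust
// `add4` is a hypothetical helper used only for illustration.

#[cfg(target_arch = "x86_64")]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let mut out = [0.0f32; 4];
    // SAFETY: SSE is part of the x86_64 baseline, and the unaligned
    // load/store intrinsics accept any valid pointer to four f32 values.
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

// Scalar fallback for every other architecture: the intrinsic version
// above is not portable, which is the trade-off described in the text.
#[cfg(not(target_arch = "x86_64"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Each additional architecture would need its own cfg-gated variant, which is the gap that the portable options listed above try to close.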

Why std::simd Isn’t Stable Yet

Stabilizing portable SIMD is seen as a “massive hard problem”:
- Needs to abstract many heterogeneous ISAs while balancing performance and ergonomics.
- Subject to Rust’s strong stability guarantees; once shipped, API mistakes are very hard to fix.
Blockers mentioned:
- Dependence on unstable building blocks (e.g., the unfinished parts of const generics such as generic_const_exprs, and ongoing trait-solver work).
- Interactions with safety, coherence, object safety, error reporting, etc.
- LLVM SIMD intrinsics can be volatile and have caused ICEs and codegen issues.
- Unclear how to support scalable SIMD ISAs like RISC‑V vectors or ARM SVE with today’s fixed-lane design.
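
To make the "fixed-lane" point concrete, here is a minimal sketch of a vector type whose lane count is a const generic parameter, loosely modeled on the shape of std::simd's Simd<T, N>. The Vector type below is a hypothetical stand-in written in stable Rust, not the real API.

```rust
use std::ops::Add;

/// Hypothetical fixed-lane vector: the lane count N is part of the type,
/// so it must be known at compile time.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vector<const N: usize>(pub [f32; N]);

impl<const N: usize> Vector<N> {
    /// Lane count, available as a compile-time constant.
    pub const LANES: usize = N;
}

impl<const N: usize> Add for Vector<N> {
    type Output = Self;

    /// Lane-wise addition over exactly N lanes.
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for i in 0..N {
            out[i] += rhs.0[i];
        }
        Vector(out)
    }
}
```

On a scalable ISA the hardware vector length is only discovered at run time, so there is no single N to put in the type. That mismatch is the open design question the bullet above refers to.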

Rust Governance, Priorities, and Funding

Compiler and language work is largely volunteer-driven; there are few people who can prioritize stabilization work.
The quality bar for std is high, and there is no BDFL to “just decide” when something is good enough.
Some argue Rust made too many global promises (safety, semver, trait coherence), which slows or kills complex features.
There is concern about underfunding of core compiler work despite corporate use; a maintainers fund has been started.

Autovectorization and Floating Point

Many workloads can get good SIMD via autovectorization + careful loop structure, especially for integers.
For floats, aggressive reordering is blocked by IEEE-754 semantics; Rust lacks a stable equivalent of -ffast-math.
Nightly offers “algebraic” float operations and the *_fast intrinsics (such as fadd_fast) as an opt-in, but these must be called explicitly at each operation, which makes for awkward APIs.
Some users want a more ergonomic, scoped “fast math” mechanism; others warn about the dangers of global flags.
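
The difference between integer and float reductions can be sketched in stable Rust as follows. The function names are hypothetical, and whether a given loop actually vectorizes depends on the compiler version and optimization flags, so treat this as an illustration of loop structure rather than a guarantee.

```rust
/// Integer sum: wrapping addition is associative, so the optimizer is
/// free to regroup the additions into SIMD lanes.
pub fn sum_u32(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Strict left-to-right float sum: IEEE-754 addition is not associative,
/// so the compiler has to preserve this exact order of operations.
pub fn sum_f32_strict(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0f32, |acc, &x| acc + x)
}

/// Float sum with 8 independent accumulators. The reordering is written
/// out by hand, so the compiler may map the accumulators to SIMD lanes
/// without needing any fast-math permission.
pub fn sum_f32_lanes(xs: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];

    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Combine the per-lane partial sums, then add the leftover elements.
    let mut total: f32 = acc.iter().sum();
    for &x in tail {
        total += x;
    }
    total
}
```

Because sum_f32_lanes adds in a different order than sum_f32_strict, the two can differ in the last bits on general inputs. That is exactly the semantic change a fast-math mode would make implicitly.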

Ergonomics, Multiversioning, and Abstractions

Runtime feature detection + #[target_feature] is described as painful: the attribute must be repeated on every function in the call chain, or helpers must be forced inline with #[inline(always)], which makes abstractions and reuse harder.
Workarounds exist (traits over vector types, generic algorithms instantiated per-ISA) but are fragile and often rely on unsafe.
Keeping data in SIMD form across a pipeline and handling packing/unpacking correctly is a recurring complexity.
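
A minimal sketch of the pattern being complained about, using only stable std features: a shared kernel marked #[inline(always)], an AVX2 wrapper marked #[target_feature], and a dispatcher that checks the CPU at run time. The names scale, scale_impl, and scale_avx2 are hypothetical.

```rust
/// Shared kernel. #[inline(always)] lets it be inlined into callers that
/// enable extra target features, so it is recompiled with those features.
#[inline(always)]
fn scale_impl(xs: &mut [f32], k: f32) {
    for x in xs.iter_mut() {
        *x *= k;
    }
}

/// AVX2 entry point: the attribute applies to this function and to
/// whatever gets inlined into it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn scale_avx2(xs: &mut [f32], k: f32) {
    scale_impl(xs, k)
}

/// Safe public dispatcher: picks the AVX2 version when the CPU supports
/// it and otherwise runs the baseline build of the same kernel.
pub fn scale(xs: &mut [f32], k: f32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the runtime check above confirmed AVX2 is available.
            unsafe { scale_avx2(xs, k) };
            return;
        }
    }
    scale_impl(xs, k)
}
```

Every extra ISA needs another wrapper and another branch in the dispatcher, and any helper that is not inlined silently falls back to baseline code generation. Crates such as pulp and macerator aim to hide this boilerplate.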

Comparisons to Other Languages

C# is seen as having a more mature, stable SIMD story (portable vectors + intrinsics), partly due to strong corporate backing.
Java and Go are described as weaker on SIMD; Go currently relies on awkward assembly, though intrinsics are being worked on.
Some argue Rust leans too much on LLVM and underinvests in higher-level, predictable autovectorization compared to C/C++ ecosystems.
