ROCm Device Support Wishlist
Perceived ROCm Device and Lifecycle Problems
- Many commenters say ROCm support for consumer Radeon cards is narrow, late, and short‑lived.
- Reports of cards being dropped 1–2 years after purchase; RX 7800 XT cited as supported for ~15 months.
- Complaints that only a few RX 7900 and some Pro W7xxx cards are clearly “officially supported” on Linux, with older high‑end parts (e.g., Radeon VII, Pro VII, MI50/MI60) marked deprecated despite capable hardware.
- Users argue this makes AMD GPUs a risky investment for ML/compute, especially compared to NVIDIA’s broad CUDA coverage.
Workarounds and Mixed User Experiences
- Debian and Ubuntu ROCm packages are reported to work on many “unsupported” discrete GPUs since Vega, with CI testing across architectures, but they can lag behind upstream ROCm in performance optimizations.
- Some users successfully run Stable Diffusion and LLaMA derivatives on 6xxx/7xxx cards; others hit driver crashes, hard lockups, or CPU fallback.
- Integrated GPUs (e.g., 780M) are particularly fragile; ROCm sometimes works until stress exposes bugs.
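A workaround frequently reported for cards outside the official matrix is spoofing a supported gfx target so the runtime loads kernels built for a nearby architecture. This is a community trick, not an official AMD mechanism; a minimal sketch, assuming an RDNA2 card and a ROCm install that provides `rocminfo`:

```shell
# HSA_OVERRIDE_GFX_VERSION makes the ROCm runtime treat the GPU as the given
# gfx target. 10.3.0 (gfx1030, RX 6800/6900 class) is the value commonly
# reported to work on other RDNA2 parts; RDNA3 users typically report 11.0.0.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Check what the runtime now reports, if ROCm is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | grep -i 'gfx' || true
fi
```

Whether this works depends on the specific card and ROCm release; lockups under load (as reported above for iGPUs like the 780M) are a known failure mode of running off-matrix.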
Documentation, Packaging, and Tooling Issues
- Strong criticism of confusing, contradictory compatibility matrices between AMD’s main docs and Radeon‑specific pages; hard to know which exact GPUs are supported.
- Complaints about ROCm installation being brittle, poorly documented, and far behind NVIDIA’s “install CUDA and it works” experience.
- Go bindings for AMD SMI are cited as under‑documented, with dead links and missing distro packages.
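For basic device telemetry, the ROCm CLI tools tend to be more dependable than the language bindings. A sketch using `rocm-smi` (which ships with ROCm; `amd-smi` is its newer replacement), guarded so it is a no-op on machines without ROCm:

```shell
# rocm-smi reads GPU state via sysfs and often works even on cards the
# support matrix does not list. Flags shown exist in recent ROCm releases.
if command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi --showtemp --showuse --showmeminfo vram || true
fi
```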
Comparison with NVIDIA and Ecosystem Concerns
- Many see NVIDIA as “software‑first”: mature CUDA, libraries, tooling, and wider hardware support.
- AMD is perceived as “hardware‑first,” under‑resourced in software, and focused on hyperscalers and Instinct accelerators rather than consumers.
- Several argue that rapid deprecation and unclear guarantees make ROCm unsuitable for enterprises that need predictable lifecycles.
AMD Engagement and Strategic Debates
- An AMD representative appears in the thread, asking for feedback and prioritization guidance, acknowledging gaps but avoiding hard guarantees (“will try hard”).
- Some welcome this as progress; others view it as repetition of old promises and demand explicit commitments to support “all recent GPUs” for longer periods.
Alternatives and Architectural Critiques
- ROCm’s per‑architecture code generation (gfx10xx, etc.) is criticized in contrast with CUDA’s PTX‑style virtual ISA, which lets the driver JIT‑compile a single binary for GPUs released after it was built.
- Multiple commenters advocate focusing on open standards (Vulkan compute, SYCL/oneAPI, OpenXLA/IREE) instead of chasing CUDA parity with a proprietary stack.
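The codegen critique above can be made concrete with build invocations (illustrative only; `kernel.cpp` and `kernel.cu` are hypothetical source files):

```shell
# ROCm/HIP: code objects are tied to concrete gfx ISAs, so every target the
# binary will ever run on must be named at build time (a "fat binary");
# a GPU family not listed here cannot run the result.
hipcc --offload-arch=gfx1030 --offload-arch=gfx1100 kernel.cpp -o app_amd

# CUDA: alongside native SASS for known GPUs, nvcc can embed PTX (a virtual
# ISA) that the driver JIT-compiles for GPUs released after the build.
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 kernel.cu -o app_nv
```

This difference is why old CUDA binaries often run on new NVIDIA hardware, while ROCm binaries must be rebuilt (or shipped fatter) for each new gfx generation.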