I helped fix sleep-wake hangs on Linux with AMD GPUs
User Experiences with AMD Sleep/Wake
- Many report that AMD’s Linux graphics stack is generally good but sleep/wake has been the main recurring pain point, especially with desktop dGPUs and some laptop setups.
- Aorus/X570/B550 motherboards and certain NVMe or USB‑C/Thunderbolt devices are repeatedly cited as problematic: machines freeze on wake, instantly re‑wake, or never fully reach sleep.
- Various udev rules are shared to disable PCIe/USB wake sources (power/wakeup=disabled on specific buses or devices), with mixed success; some users fixed the problem only by removing flaky PCIe cards.
- Some users on AMD laptops (including ThinkPads and handhelds with 7840U/8840U) report nearly flawless S0ix/suspend with only small tweaks to wakeup sources.
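A rule of the kind described typically matches a PCI device by vendor and device ID and assigns "disabled" to its power/wakeup attribute. The Python helper below is a minimal sketch that formats one such rule; the function name is hypothetical, the IDs in the usage line are placeholders (real ones come from the [vendor:device] pair printed by lspci -nn), and whether disabling a given bridge or controller cures spurious wakes is exactly the hit-or-miss part reported above.

```python
import string


def wakeup_disable_rule(vendor: str, device: str, subsystem: str = "pci") -> str:
    """Return a udev rule that sets power/wakeup to "disabled" for one device.

    vendor/device are hex IDs in the form sysfs prints them, e.g. "0x1022".
    """
    ids = {}
    for name, value in (("vendor", vendor), ("device", device)):
        value = value.lower()
        # sysfs reports PCI IDs as lowercase "0x" + 4 hex digits, and udev
        # compares attribute values as strings, so normalize before matching.
        if (len(value) != 6 or not value.startswith("0x")
                or not all(c in string.hexdigits for c in value[2:])):
            raise ValueError(f"{name} ID must look like 0x1234, got {value!r}")
        ids[name] = value
    return (
        f'ACTION=="add", SUBSYSTEM=="{subsystem}", '
        f'ATTR{{vendor}}=="{ids["vendor"]}", ATTR{{device}}=="{ids["device"]}", '
        f'ATTR{{power/wakeup}}="disabled"'
    )


if __name__ == "__main__":
    # Placeholder IDs; substitute the pair shown by `lspci -nn` for your device.
    print(wakeup_disable_rule("0x1022", "0x1483"))
```

The printed line would go into a file under /etc/udev/rules.d/. Matching on vendor/device IDs rather than a PCI bus address keeps the rule stable if slot numbering changes, at the cost of also hitting every identical device in the system.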
Suspend Reliability Across OSes
- Multiple commenters note that suspend/hibernate is fragile not just on Linux but also on Windows (especially with Modern Standby/S0) and macOS; stories of laptops cooking in bags or draining batteries overnight are common.
- Opinion is divided: some claim Windows is mostly fine and Linux much worse; others point to notorious Windows “Modern Standby” failures and say Linux on business ThinkPads or Linux‑focused vendors works comparably or better.
- Apple is praised for generally good sleep behavior, but several people describe occasional or recurring failures even on MacBooks, including Apple silicon (ARM) models.
Workarounds, Debugging, and Tooling
- Techniques discussed:
  - Using /proc/acpi/wakeup and /sys/.../power/wakeup to identify and disable spurious wake devices.
  - Custom systemd units vs udev rules; importance of Type=oneshot and RemainAfterExit.
  - Serial consoles, systemd’s debug shell, and decompiling kernel modules to trace crashes.
  - Memtest to rule out bad RAM for GPU‑related black screens; powercfg /lastwake on Windows.
- Some users give up on reliable sleep and instead script full session restoration (tmux resurrect, window manager layout scripts).
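The /proc/acpi/wakeup and oneshot-unit techniques fit together. The file lists each ACPI wake device with its status, and writing a device name back into it toggles that device rather than setting it, so a unit that runs twice can re-enable what it just disabled. The sketch below, with hypothetical function names and an assumed device list, parses the file's text and renders a Type=oneshot unit with RemainAfterExit=yes; each ExecStart line only writes a name while grep still sees that device enabled, which keeps re-runs harmless.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WakeSource:
    device: str                 # ACPI device name, e.g. "XHC0"
    s_state: str                # sleep state column, e.g. "S4"
    enabled: bool
    sysfs_node: Optional[str]   # e.g. "pci:0000:0b:00.3"; absent for some devices


def parse_acpi_wakeup(text: str) -> List[WakeSource]:
    """Parse the text of /proc/acpi/wakeup (first line is the header row)."""
    sources = []
    for line in text.splitlines()[1:]:
        # Continuation lines (extra physical nodes) start with whitespace; skip them.
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 3:
            continue
        status = parts[2].lstrip("*")   # strip the leading "*" flag marker
        node = parts[3] if len(parts) > 3 else None
        sources.append(WakeSource(parts[0], parts[1], status == "enabled", node))
    return sources


def render_oneshot_unit(devices: List[str]) -> str:
    """Render a systemd unit that disables the given ACPI wake devices at boot."""
    lines = [
        "[Unit]",
        "Description=Disable selected ACPI wake sources",
        "",
        "[Service]",
        "Type=oneshot",
        # Stay "active" after exiting, so the unit is not started again (and
        # its writes re-toggled) just because another unit pulls it in.
        "RemainAfterExit=yes",
    ]
    for dev in devices:
        # ACPI names are short uppercase identifiers; reject anything else so
        # nothing unexpected ends up inside the shell command.
        if not re.fullmatch(r"[A-Z0-9_]{1,4}", dev):
            raise ValueError(f"unexpected ACPI device name: {dev!r}")
        # Writing a name to /proc/acpi/wakeup toggles it, so only write while enabled.
        lines.append(
            f"ExecStart=/bin/sh -c 'grep -q \"^{dev}[[:space:]].*enabled\" "
            f"/proc/acpi/wakeup && echo {dev} > /proc/acpi/wakeup || true'"
        )
    lines += ["", "[Install]", "WantedBy=multi-user.target", ""]
    return "\n".join(lines)
```

In use, one would feed parse_acpi_wakeup the contents of /proc/acpi/wakeup, pick suspected culprits from the enabled entries, and save the rendered text as a unit under /etc/systemd/system/ before enabling it. The choice of multi-user.target is an assumption; a udev rule avoids the toggle problem entirely because it assigns a value instead of flipping one.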
Root Cause and VRAM Handling
- A concise summary is given: during suspend, the contents of GPU VRAM must be copied into system RAM; previously this eviction could happen after swap had already been disabled, so the evicted VRAM plus in-use RAM could exceed the memory actually available and hang the system.
- The fix in the article moves GPU VRAM eviction earlier in the suspend path, so it runs before swap is disabled and related memory subsystems are shut down.
- A prior user‑space workaround, memreserver, pre‑allocated and mlock’d RAM to guarantee space for VRAM, at the cost of potentially huge reservations.
- Discussion touches on Linux overcommit and the OOM killer: many see current OOM behavior as fundamentally brittle, with cgroups/zram viewed as mitigations, not real fixes.
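The memory arithmetic behind the hang can be made concrete with a deliberately simplified model, not the kernel's actual accounting: evicting VRAM needs roughly as many bytes of system memory as VRAM in use, and other pages can be pushed out to make room only while swap is still usable. The sketch assumes VRAM usage is read from amdgpu's mem_info_vram_used sysfs attribute and RAM/swap headroom from MemAvailable and SwapFree in /proc/meminfo; the function names are hypothetical.

```python
from typing import Dict

GIB = 1024 ** 3


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}, converting kB fields to bytes."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields carry a "kB" (KiB) unit; HugePages_* counts have none.
        fields[name.strip()] = value * 1024 if parts[1:] == ["kB"] else value
    return fields


def eviction_fits(vram_used: int, ram_available: int, swap_free: int,
                  swap_usable: bool) -> bool:
    """Simplified model: can system memory absorb the VRAM copied out at suspend?

    While swap is usable, other pages can be pushed to swap to make room;
    once swap is disabled, only currently available RAM counts.
    """
    headroom = ram_available + (swap_free if swap_usable else 0)
    return vram_used <= headroom
```

With 12 GiB of VRAM in use, 8 GiB of available RAM and 16 GiB of free swap, eviction fits while swap is usable (12 GiB against 24 GiB of headroom) but not once swap is off (12 GiB against 8 GiB), which is the ordering problem the fix addresses. The same arithmetic suggests why a memreserver-style reservation gets large: guaranteeing room for a full eviction means locking away memory on the order of the card's VRAM size.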
AMD vs Nvidia vs Intel on Linux
- Views are split: some say AMD GPUs on Linux are “a nightmare” and recommend avoiding them; others say the opposite—AMD and Intel iGPUs are the safest, while Nvidia causes more crashes, idle power issues, and Wayland problems.
- Nvidia users report their own suspend/black‑screen bugs; suggestions include enabling/disabling nvidia-suspend services and trying older driver branches.
- Intel Arc is mentioned as hitting similar suspend/VRAM problems, suggesting this class of bug is not AMD‑exclusive.
Broader Reflections and Future Work
- Many see suspend/hibernate as intrinsically hard because it crosses kernel, drivers, firmware, init system, graphics stack, and desktop environment, all developed somewhat independently.
- There’s a call for better automated diagnostics, e.g. a “memtest for S3/S0” and more standardized tools akin to Microsoft’s sleep diagnostics.
- Alibaba’s proposed refinements to AMD suspend/resume state machines are linked as further work aimed at systematic fixes rather than case‑by‑case patches.
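Short of a full "memtest for S3/S0", two building blocks of such a tool are easy to sketch: reading /sys/power/mem_sleep, where the bracketed entry is the suspend variant the kernel will use (s2idle for suspend-to-idle, the S0ix path; deep for S3), and looping suspend/resume cycles with rtcwake -m mem so the real-time clock wakes the machine. This is a minimal sketch, not an existing diagnostic tool: the function names are hypothetical, the loop needs root, and it only counts cycles that return an error code. A hard hang simply stops the script, so the last printed cycle number, ideally captured over a serial console, is the real signal.

```python
import subprocess
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[List[str], Optional[str]]:
    """Parse /sys/power/mem_sleep, e.g. "s2idle [deep]".

    Returns (supported modes, active mode); brackets mark the active one.
    """
    modes, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active


def rtcwake_cmd(seconds: int) -> List[str]:
    # "-m mem" suspends to RAM; the RTC alarm wakes the machine after `seconds`.
    return ["rtcwake", "-m", "mem", "-s", str(seconds)]


def suspend_cycles(cycles: int, seconds: int = 20) -> int:
    """Run repeated suspend/resume cycles (needs root); return how many succeeded.

    A hard hang never returns, so the last printed line is the useful signal.
    """
    done = 0
    for i in range(cycles):
        if subprocess.run(rtcwake_cmd(seconds)).returncode != 0:
            break
        done += 1
        print(f"cycle {i + 1}/{cycles} resumed", flush=True)
    return done
```

Reading mem_sleep first matters because the same hardware can behave very differently under s2idle and deep, so a failure report is only comparable once the active variant is known.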