macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt
macOS HDR Behavior
- Several commenters complain HDR on macOS looks “washed out” on third‑party HDR monitors (especially OLED): blacks become gray, SDR UI elements look flat, while HDR video in a window looks fine.
- Others say this is by design: macOS keeps UI in SDR while only HDR content uses extended range, and on limited‑brightness displays the SDR UI will appear gray compared to HDR highlights.
- There’s disagreement over whether raised blacks are an intentional trade‑off, an Apple bug, or a calibration/metadata issue; Windows’ HDR calibration tool is cited as handling the same displays better.
What RDMA over Thunderbolt Enables
- Previously, people chained Macs using pipeline parallelism (layers split across machines). This fits models larger than a single Mac can hold, but doesn’t speed up inference, since the machines run sequentially, one stage at a time.
- RDMA over Thunderbolt plus MLX now enables fast tensor/head parallelism: each layer is sharded across machines, with per‑node Q/K/V, local attention, then all‑reduce on outputs.
- Reported benchmarks: ~3.5× speedup in token generation on 4 machines at batch size 1, mainly because each node streams only its shard of the weights, easing per‑node memory‑bandwidth pressure. Latency and frequent synchronization remain the main challenges.
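The sharding scheme in the second bullet can be simulated in plain NumPy (a toy sketch, not the MLX implementation): each "node" owns a slice of the attention heads, with Q/K/V projections column‑sharded and the output projection row‑sharded, so summing the partial outputs stands in for the all‑reduce.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_heads, n_nodes = 8, 4, 2      # toy sizes; heads split evenly across nodes
head_dim = d // n_heads
seq = 3

x = rng.standard_normal((seq, d))
# Full projection matrices; each node would hold only its slices of these.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Wo = rng.standard_normal((d, d))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attn_heads(x, heads):
    """Attention restricted to `heads`: Wq/Wk/Wv column-sharded, Wo row-sharded."""
    out = np.zeros((x.shape[0], d))
    for h in heads:
        s = slice(h * head_dim, (h + 1) * head_dim)
        q, k, v = x @ Wq[:, s], x @ Wk[:, s], x @ Wv[:, s]
        a = softmax(q @ k.T / np.sqrt(head_dim)) @ v   # local attention
        out += a @ Wo[s]                               # partial output projection
    return out

# Each "node" computes its heads locally; summing the partials plays the
# role of the all-reduce over the Thunderbolt fabric.
heads_per_node = n_heads // n_nodes
partials = [attn_heads(x, range(n * heads_per_node, (n + 1) * heads_per_node))
            for n in range(n_nodes)]
sharded = sum(partials)

reference = attn_heads(x, range(n_heads))   # single-node result
print("sharded == single-node:", np.allclose(sharded, reference))
```

The key property is that the all‑reduce is the only communication per layer; everything before it runs on purely local weights.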
Mac Clusters vs GPU Rigs (Cost, Power, Memory)
- Enthusiasts see M‑series clusters as attractive “AI appliances” for labs, small shops, and serious hobbyists: huge unified memory, low power, plug‑and‑play, no CUDA.
- Critics argue Nvidia/AMD GPUs are far cheaper per FLOP and have much higher raw bandwidth; memory bandwidth and interconnect, not just capacity, are the real bottlenecks.
- One comparison:
  - ~$50k Mac Studio M3 Ultra cluster: ~3 TB unified memory, slow (~15 tok/s) but can host ~trillion‑parameter models.
  - ~$50k RTX 6000 workstation: much higher tokens/sec but limited to <400B‑parameter models (384 GB VRAM).
- Similar‑capacity GH200 setups cost an order of magnitude more.
- Others point out you can build used Xeon/GPU franken‑clusters or multi‑3090 rigs, trading efficiency for raw capacity and heat.
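The capacity side of the comparison is easy to sanity‑check with back‑of‑envelope arithmetic (assuming 8‑bit weights and ignoring KV cache and activations):

```python
def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB, ignoring KV cache and activations."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# 1T parameters at 8 bits/param is ~1000 GB: fits in ~3 TB unified memory,
# but not in a 384 GB VRAM workstation.
print(model_memory_gb(1000, 1))          # -> 1000.0
print(model_memory_gb(1000, 1) > 384)    # -> True
# A ~380B-parameter model at 8 bits/param just squeezes into 384 GB.
print(model_memory_gb(380, 1) <= 384)    # -> True
```

This is why the debate centers on bandwidth rather than capacity: the Mac cluster can hold the model, it just streams it slowly.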
Thunderbolt/RDMA Technical & Physical Limits
- RDMA runs over Thunderbolt/USB4’s PCIe tunnel (effectively PCIe 4.0 ×4, ~64 Gbps per port), with lower latency than standard Thunderbolt IP networking.
- Topology is a fully connected mesh; practical limit is ~6 Mac Studios, so this is not a large‑scale datacenter fabric.
- People worry about Thunderbolt’s mechanical robustness for semi‑permanent interconnects, but note locking USB‑C variants and third‑party “cable locking” accessories.
- Some lament the lack of Thunderbolt switches or QSFP‑style ports; others note a “Thunderbolt router” could just be a multi‑port computer.
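The ~64 Gbps figure follows from PCIe 4.0 lane arithmetic (a sketch of the theoretical ceiling; real Thunderbolt throughput is lower after protocol overhead):

```python
# PCIe 4.0: 16 GT/s per lane, 128b/130b line encoding.
gt_per_s = 16.0
encoding_efficiency = 128 / 130
lanes = 4

per_lane_gbps = gt_per_s * encoding_efficiency   # ~15.75 Gbps usable per lane
x4_gbps = per_lane_gbps * lanes                  # ~63 Gbps: the "~64 Gbps" figure
print(round(x4_gbps, 1))                         # -> 63.0
print(round(x4_gbps / 8, 1), "GB/s")             # roughly 7.9 GB/s per port
```

For comparison, that is two orders of magnitude below the memory bandwidth inside each Mac, which is why the approach favors communication‑light sharding.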
Deployability and Server‑Style Management
- macOS is seen as awkward in a datacenter role: GUI‑driven OS upgrades, weaker open tooling vs Linux/BSD, no real IPMI/iLO equivalent.
- MDM‑based workflows (Jamf, open‑source MDMs, erase‑install scripts, VNC/Screen Sharing) can automate upgrades and remote control, but require Apple‑specific expertise.
- Rackmount concerns: Mac Studio’s rear corner power button and non‑locking TB cables make clean rack deployments fiddly; third‑party rack kits and locking accessories partly address this.
- Some miss Xserve and argue Apple has never fully committed to server‑grade macOS; others note AWS and MacStadium already run Mac fleets successfully.
Apple’s Possible Strategy and Ecosystem Play
- Several commenters see this as part of a broader Apple plan:
  - Bake AI accelerators and large unified memory into all high‑end Macs.
  - Make Macs attractive for AI research and local inference.
  - Potentially reuse the tech to distribute AI workloads across a user’s devices (Mac, iPhone, iPad, Apple TV, HomePod) for private, on‑prem inference.
- Others are skeptical: RDMA over Thunderbolt is limited to small clusters and doesn’t directly translate to Apple’s mostly wireless consumer device network.
Wider Market, RAM, and End‑User Impact
- There’s debate over whether high‑RAM Macs could become a cost‑effective medium‑scale inference platform, especially given current DRAM shortages and price spikes.
- Some fear that high‑end Macs will be bought up by commercial AI users; others reply that typical home users neither need nor can justify 512 GB+ Macs anyway.
- A long subthread argues over whether RAM pricing spikes are a short‑term bubble or a multi‑year structural issue, with implications for “a computer in every home” and for cheap local AI.
Security, Scope, and Miscellaneous
- RDMA is disabled by default and must be explicitly enabled in recovery mode, which alleviates some concerns about plug‑and‑play physical attack vectors.
- Not tied to ML: in principle any distributed workload that benefits from low‑latency, high‑bandwidth memory access could use it (e.g., MPI/HPC), though early tests are rough.
- Gaming and eGPU hopefuls ask if this helps them; consensus is no—this is for clustering Macs, not reviving general eGPU support or multi‑node gaming.
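The MPI/HPC point boils down to collectives: any workload built on operations like all‑reduce could in principle run over this fabric. As an illustration of the algorithm (a minimal sketch in plain Python, not MLX’s or MPI’s implementation), a ring all‑reduce lets N nodes converge on a global sum in 2·(N−1) steps, each node moving only one chunk per step:

```python
def ring_all_reduce(vectors):
    """Elementwise sum across N equal-length vectors via the ring algorithm."""
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must divide into N chunks"
    c = size // n
    data = [list(v) for v in vectors]
    sl = lambda i: slice((i % n) * c, (i % n) * c + c)

    # Phase 1: reduce-scatter. After n-1 steps, node r holds the fully
    # summed chunk (r + 1) mod n.
    for t in range(n - 1):
        snapshot = [row[:] for row in data]        # all nodes "send" at once
        for r in range(n):
            src = (r - 1) % n
            s = sl(src - t)
            for j in range(s.start, s.stop):
                data[r][j] += snapshot[src][j]

    # Phase 2: all-gather. Circulate the finished chunks around the ring.
    for t in range(n - 1):
        snapshot = [row[:] for row in data]
        for r in range(n):
            src = (r - 1) % n
            s = sl(src + 1 - t)
            data[r][s] = snapshot[src][s]

    return data

out = ring_all_reduce([[1, 2, 3, 4], [10, 20, 30, 40]])
print(out)   # every node ends with the elementwise sum [11, 22, 33, 44]
```

Note the trade‑off the thread keeps circling: the algorithm is bandwidth‑efficient but latency‑sensitive, since every step is a synchronization point across the ring.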