macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

macOS HDR Behavior

  • Several commenters complain that HDR on macOS looks “washed out” on third‑party HDR monitors (especially OLED): blacks become gray and SDR UI elements look flat, while HDR video in a window looks fine.
  • Others say this is by design: macOS keeps the UI in SDR and reserves the extended range for HDR content, so on limited‑brightness displays the SDR UI appears gray next to HDR highlights.
  • There’s disagreement over whether raised blacks are an intentional trade‑off, an Apple bug, or a calibration/metadata issue; Windows’ HDR calibration tool is cited as working better on the same displays.

What RDMA over Thunderbolt Enables

  • Previously, people chained Macs using pipeline parallelism (layers split across machines). That lets you run models too large for any one Mac, but it doesn’t speed up inference: at batch size 1, each machine sits idle while the others process their stage.
  • RDMA over Thunderbolt plus MLX now enables fast tensor/head parallelism: each layer is sharded across machines, with per‑node Q/K/V projections, local attention, then an all‑reduce on the outputs (sketched in code after this list).
  • Reported benchmarks: ~3.5× speedup in token generation on 4 machines at batch size 1, mainly because each node only streams its shard of the weights, easing per‑node memory bandwidth pressure. Latency and the frequent synchronization remain the main challenges.
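
A minimal sketch of the head‑sharded attention described above, assuming MLX’s distributed API (`mx.distributed.init` / `all_sum`); the dimensions and the pre‑sharded weight layout are illustrative, not taken from the article:

```python
import mlx.core as mx

group = mx.distributed.init()        # one process per Mac in the mesh
size = group.size()

D, H, HD = 4096, 32, 128             # hidden dim, total heads, head dim (assumed)
local_heads = H // size              # each node owns a slice of the heads

def sharded_attention(x, wq, wk, wv, wo):
    # x: (T, D). wq/wk/wv: (D, local_heads*HD) shards held by this node;
    # wo: (local_heads*HD, D) shard of the output projection.
    T = x.shape[0]
    q = (x @ wq).reshape(T, local_heads, HD).transpose(1, 0, 2)
    k = (x @ wk).reshape(T, local_heads, HD).transpose(1, 0, 2)
    v = (x @ wv).reshape(T, local_heads, HD).transpose(1, 0, 2)

    # Attention runs entirely over this node's heads: no cross-node traffic.
    scores = mx.softmax((q @ k.transpose(0, 2, 1)) / HD ** 0.5, axis=-1)
    local = (scores @ v).transpose(1, 0, 2).reshape(T, local_heads * HD)

    # Each node computes a partial output projection; a single all-reduce
    # (sum) per layer gives every node the full layer output.
    return mx.distributed.all_sum(local @ wo)
```

The key property: the only per‑layer communication is one all‑sum over a single activation tensor, which is why link latency, more than bandwidth, dominates at batch size 1.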

Mac Clusters vs GPU Rigs (Cost, Power, Memory)

  • Enthusiasts see M‑series clusters as attractive “AI appliances” for labs, small shops, and serious hobbyists: huge unified memory, low power, plug‑and‑play, no CUDA.
  • Critics argue Nvidia/AMD GPUs are far cheaper per FLOP and have much higher raw bandwidth; memory bandwidth and interconnect, not just capacity, are the real bottlenecks.
  • One comparison (rough $/GB math follows this list):
    • $50k Mac Studio M3 Ultra cluster: ~3 TB unified memory, slow (15 tok/s) but can host ~trillion‑parameter models.
    • ~$50k RTX 6000 workstation: much higher tokens/sec but limited to <400B‑parameter models (384 GB VRAM).
    • Similar‑capacity GH200 setups cost an order of magnitude more.
  • Others point out you can build used Xeon/GPU franken‑clusters or multi‑3090 rigs, trading power efficiency (and tolerating a lot of heat) for cheap raw capacity.
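
For a concrete feel for the capacity trade‑off, here is the rough $/GB implied by the figures above (prices and capacities as quoted in the thread, not independently verified):

```python
# $/GB of model-accessible memory for the two ~$50k builds above.
mac_cluster = 50_000 / 3_072   # ~3 TB unified memory -> ~$16/GB
rtx_box     = 50_000 / 384     # 384 GB VRAM          -> ~$130/GB
print(f"Mac cluster ~${mac_cluster:.0f}/GB, RTX box ~${rtx_box:.0f}/GB")
```

Roughly an 8× gap in $/GB, which is the whole appeal of the Mac cluster: you pay for capacity, not throughput.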

Thunderbolt/RDMA Technical & Physical Limits

  • RDMA runs over Thunderbolt/USB4’s PCIe tunnel (effectively PCIe 4.0 ×4, ~64 Gbps per port), with lower latency than standard Thunderbolt IP networking (a back‑of‑the‑envelope per‑token estimate follows this list).
  • The topology is a fully connected mesh, so every node needs a direct cable to every other node; port count puts the practical limit at ~6 Mac Studios, meaning this is not a large‑scale datacenter fabric.
  • People worry about Thunderbolt’s mechanical robustness for semi‑permanent interconnects, but note locking USB‑C variants and third‑party “cable locking” accessories.
  • Some lament the lack of Thunderbolt switches or QSFP‑style ports; others note a “Thunderbolt router” could just be a multi‑port computer.
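
A back‑of‑the‑envelope per‑token estimate for batch‑size‑1 tensor parallelism; the hidden dimension, layer count, and fp16 activations are assumed values for illustration, not numbers from the thread:

```python
# Each layer's all-reduce moves roughly one activation vector per token.
hidden, bytes_per, layers = 8192, 2, 80   # assumed model shape, fp16
link_gbps = 64                            # ~PCIe 4.0 x4 per port

payload = hidden * bytes_per                  # 16 KiB per layer
wire_us = payload * 8 / (link_gbps * 1e3)     # ~2 us on the wire per layer
print(f"{payload // 1024} KiB/layer, ~{wire_us:.1f} us wire time, "
      f"~{layers * wire_us / 1e3:.2f} ms/token total on the wire")
```

The wire time is negligible; with ~80 synchronization points per token, fixed per‑message latency dominates, which is exactly the overhead RDMA attacks relative to a full TCP/IP stack.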

Deployability and Server‑Style Management

  • macOS is seen as awkward in a datacenter role: GUI‑driven OS upgrades, weaker open tooling vs Linux/BSD, no real IPMI/iLO equivalent.
  • MDM‑based workflows (Jamf, open‑source MDMs, erase‑install scripts, VNC/Screen Sharing) can automate upgrades and remote control (a minimal scripted‑update sketch follows this list), but they require Apple‑specific expertise.
  • Rackmount concerns: Mac Studio’s rear corner power button and non‑locking TB cables make clean rack deployments fiddly; third‑party rack kits and locking accessories partly address this.
  • Some miss Xserve and argue Apple has never fully committed to server‑grade macOS; others note AWS and MacStadium already run Mac fleets successfully.
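
As a taste of scripted (rather than GUI‑driven) updates, a minimal sketch wrapping the stock `softwareupdate` CLI; real MDM pipelines layer credential handling on top, and on Apple Silicon full OS upgrades additionally need volume‑owner credentials, which this omits:

```python
import subprocess

def pending_updates() -> str:
    """Return the raw listing of available updates."""
    result = subprocess.run(
        ["softwareupdate", "--list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def install_all_and_restart() -> None:
    """Install all available updates, rebooting if one requires it."""
    subprocess.run(
        ["softwareupdate", "--install", "--all", "--restart"],
        check=True,
    )
```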

Apple’s Possible Strategy and Ecosystem Play

  • Several commenters see this as part of a broader Apple plan:
    • Bake AI accelerators and large unified memory into all high‑end Macs.
    • Make Macs attractive for AI research and local inference.
    • Potentially reuse the tech to distribute AI workloads across a user’s devices (Mac, iPhone, iPad, Apple TV, HomePod) for private, on‑prem inference.
  • Others are skeptical: RDMA over Thunderbolt is limited to small clusters and doesn’t directly translate to Apple’s mostly wireless consumer device network.

Wider Market, RAM, and End‑User Impact

  • There’s debate over whether high‑RAM Macs could become a cost‑effective medium‑scale inference platform, especially given current DRAM shortages and price spikes.
  • Some fear that high‑end Macs will be bought out by commercial AI users; others reply that typical home users neither need nor can justify 512 GB+ Macs anyway.
  • A long subthread argues over whether RAM pricing spikes are a short‑term bubble or a multi‑year structural issue, with implications for “a computer in every home” and for cheap local AI.

Security, Scope, and Miscellaneous

  • RDMA is disabled by default and must be explicitly enabled in recovery mode, which alleviates some concerns about plug‑and‑play physical attack vectors.
  • Not tied to ML: in principle, any distributed workload that benefits from low‑latency, high‑bandwidth memory access could use it (e.g., MPI/HPC; toy example below), though early tests are rough.
  • Gaming and eGPU hopefuls ask if this helps them; consensus is no—this is for clustering Macs, not reviving general eGPU support or multi‑node gaming.
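
To make the “not tied to ML” point concrete, a toy all‑reduce in mpi4py; this is a generic HPC pattern, not an example from the article, and nothing in it is Thunderbolt‑specific (the fabric would just appear as a faster transport):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
local = np.full(4, comm.Get_rank(), dtype=np.float32)  # rank-valued vector
total = np.empty_like(local)
comm.Allreduce(local, total, op=MPI.SUM)  # every rank receives the sum
if comm.Get_rank() == 0:
    print(total)  # with 4 ranks: [6. 6. 6. 6.]  (0+1+2+3)
```

Run with, e.g., `mpirun -np 4 python allreduce.py`.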