macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt
macOS HDR Behavior
- Several commenters complain HDR on macOS looks “washed out” on third‑party HDR monitors (especially OLED): blacks become gray, SDR UI elements look flat, while HDR video in a window looks fine.
- Others say this is by design: macOS keeps UI in SDR while only HDR content uses extended range, and on limited‑brightness displays the SDR UI will appear gray compared to HDR highlights.
- There’s disagreement over whether raised blacks are an intentional trade‑off, an Apple bug, or a calibration/metadata issue; Windows’ HDR calibration tool is cited as handling the same displays better.
What RDMA over Thunderbolt Enables
- Previously, people chained Macs using pipeline parallelism (layers split across machines). This fits models larger than a single Mac can hold, but doesn’t speed up inference, since the machines run sequentially, one stage at a time.
- RDMA over Thunderbolt plus MLX now enables fast tensor/head parallelism: each layer is sharded across machines, with per‑node Q/K/V, local attention, then all‑reduce on outputs.
- Reported benchmarks: ~3.5× speedup in token generation on 4 machines at batch size 1, mainly because each node streams only its shard of the weights, easing per‑node memory‑bandwidth pressure. Latency and frequent synchronization remain the main challenges.
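The sharding scheme in the second bullet can be simulated in plain NumPy (a toy sketch, not the MLX implementation): each "node" owns a slice of the attention heads, with Q/K/V projections column‑sharded and the output projection row‑sharded, so summing the partial outputs stands in for the all‑reduce.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_heads, n_nodes = 8, 4, 2      # toy sizes; heads split evenly across nodes
head_dim = d // n_heads
seq = 3

x = rng.standard_normal((seq, d))
# Full projection matrices; each node would hold only its slices of these.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Wo = rng.standard_normal((d, d))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attn_heads(x, heads):
    """Attention restricted to `heads`: Wq/Wk/Wv column-sharded, Wo row-sharded."""
    out = np.zeros((x.shape[0], d))
    for h in heads:
        s = slice(h * head_dim, (h + 1) * head_dim)
        q, k, v = x @ Wq[:, s], x @ Wk[:, s], x @ Wv[:, s]
        a = softmax(q @ k.T / np.sqrt(head_dim)) @ v   # local attention
        out += a @ Wo[s]                               # partial output projection
    return out

# Each "node" computes its heads locally; summing the partials plays the
# role of the all-reduce over the Thunderbolt fabric.
heads_per_node = n_heads // n_nodes
partials = [attn_heads(x, range(n * heads_per_node, (n + 1) * heads_per_node))
            for n in range(n_nodes)]
sharded = sum(partials)

reference = attn_heads(x, range(n_heads))   # single-node result
print("sharded == single-node:", np.allclose(sharded, reference))
```

The key property is that the all‑reduce is the only communication per layer; everything before it runs on purely local weights.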
Mac Clusters vs GPU Rigs (Cost, Power, Memory)
- Enthusiasts see M‑series clusters as attractive “AI appliances” for labs, small shops, and serious hobbyists: huge unified memory, low power, plug‑and‑play, no CUDA.
- Critics argue Nvidia/AMD GPUs are far cheaper per FLOP and have much higher raw bandwidth; memory bandwidth and interconnect, not just capacity, are the real bottlenecks.
- One comparison:
  - ~$50k Mac Studio M3 Ultra cluster: ~3 TB unified memory, slow (~15 tok/s) but can host ~trillion‑parameter models.
  - ~$50k RTX 6000 workstation: much higher tokens/sec but limited to <400B‑parameter models (384 GB VRAM).
- Similar‑capacity GH200 setups cost an order of magnitude more.
- Others point out you can build used Xeon/GPU franken‑clusters or multi‑3090 rigs, trading efficiency for raw capacity and heat.
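The capacity side of the comparison is easy to sanity‑check with back‑of‑envelope arithmetic (assuming 8‑bit weights and ignoring KV cache and activations):

```python
def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB, ignoring KV cache and activations."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# 1T parameters at 8 bits/param is ~1000 GB: fits in ~3 TB unified memory,
# but not in a 384 GB VRAM workstation.
print(model_memory_gb(1000, 1))          # -> 1000.0
print(model_memory_gb(1000, 1) > 384)    # -> True
# A ~380B-parameter model at 8 bits/param just squeezes into 384 GB.
print(model_memory_gb(380, 1) <= 384)    # -> True
```

This is why the debate centers on bandwidth rather than capacity: the Mac cluster can hold the model, it just streams it slowly.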
Thunderbolt/RDMA Technical & Physical Limits
- RDMA runs over Thunderbolt/USB4’s PCIe tunnel (effectively PCIe 4.0 ×4, ~64 Gbps per port), with lower latency than standard Thunderbolt IP networking.
- Topology is a fully connected mesh; practical limit is ~6 Mac Studios, so this is not a large‑scale datacenter fabric.
- People worry about Thunderbolt’s mechanical robustness for semi‑permanent interconnects, but note locking USB‑C variants and third‑party “cable locking” accessories.
- Some lament the lack of Thunderbolt switches or QSFP‑style ports; others note a “Thunderbolt router” could just be a multi‑port computer.
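The ~64 Gbps figure follows from PCIe 4.0 lane arithmetic (a sketch of the theoretical ceiling; real Thunderbolt throughput is lower after protocol overhead):

```python
# PCIe 4.0: 16 GT/s per lane, 128b/130b line encoding.
gt_per_s = 16.0
encoding_efficiency = 128 / 130
lanes = 4

per_lane_gbps = gt_per_s * encoding_efficiency   # ~15.75 Gbps usable per lane
x4_gbps = per_lane_gbps * lanes                  # ~63 Gbps: the "~64 Gbps" figure
print(round(x4_gbps, 1))                         # -> 63.0
print(round(x4_gbps / 8, 1), "GB/s")             # roughly 7.9 GB/s per port
```

For comparison, that is two orders of magnitude below the memory bandwidth inside each Mac, which is why the approach favors communication‑light sharding.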
Deployability and Server‑Style Management
- macOS is seen as awkward in a datacenter role: GUI‑driven OS upgrades, weaker open tooling vs Linux/BSD, no real IPMI/iLO equivalent.
- MDM‑based workflows (Jamf, open‑source MDMs, erase‑install scripts, VNC/Screen Sharing) can automate upgrades and remote control, but require Apple‑specific expertise.
- Rackmount concerns: Mac Studio’s rear corner power button and non‑locking TB cables make clean rack deployments fiddly; third‑party rack kits and locking accessories partly address this.
- Some miss Xserve and argue Apple has never fully committed to server‑grade macOS; others note AWS and MacStadium already run Mac fleets successfully.
Apple’s Possible Strategy and Ecosystem Play
- Several commenters see this as part of a broader Apple plan:
  - Bake AI accelerators and large unified memory into all high‑end Macs.
  - Make Macs attractive for AI research and local inference.
  - Potentially reuse the tech to distribute AI workloads across a user’s devices (Mac, iPhone, iPad, Apple TV, HomePod) for private, on‑prem inference.
- Others are skeptical: RDMA over Thunderbolt is limited to small clusters and doesn’t directly translate to Apple’s mostly wireless consumer device network.
Wider Market, RAM, and End‑User Impact
- There’s debate over whether high‑RAM Macs could become a cost‑effective medium‑scale inference platform, especially given current DRAM shortages and price spikes.
- Some fear that high‑end Macs will be bought up by commercial AI users; others reply that typical home users neither need nor can justify 512 GB+ Macs anyway.
- A long subthread argues over whether RAM pricing spikes are a short‑term bubble or a multi‑year structural issue, with implications for “a computer in every home” and for cheap local AI.
Security, Scope, and Miscellaneous
- RDMA is disabled by default and must be explicitly enabled in recovery mode, which alleviates some concerns about plug‑and‑play physical attack vectors.
- Not tied to ML: in principle any distributed workload that benefits from low‑latency, high‑bandwidth memory access could use it (e.g., MPI/HPC), though early tests are rough.
- Gaming and eGPU hopefuls ask if this helps them; consensus is no—this is for clustering Macs, not reviving general eGPU support or multi‑node gaming.
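The MPI/HPC point boils down to collectives: any workload built on operations like all‑reduce could in principle run over this fabric. As an illustration of the algorithm (a minimal sketch in plain Python, not MLX’s or MPI’s implementation), a ring all‑reduce lets N nodes converge on a global sum in 2·(N−1) steps, each node moving only one chunk per step:

```python
def ring_all_reduce(vectors):
    """Elementwise sum across N equal-length vectors via the ring algorithm."""
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must divide into N chunks"
    c = size // n
    data = [list(v) for v in vectors]
    sl = lambda i: slice((i % n) * c, (i % n) * c + c)

    # Phase 1: reduce-scatter. After n-1 steps, node r holds the fully
    # summed chunk (r + 1) mod n.
    for t in range(n - 1):
        snapshot = [row[:] for row in data]        # all nodes "send" at once
        for r in range(n):
            src = (r - 1) % n
            s = sl(src - t)
            for j in range(s.start, s.stop):
                data[r][j] += snapshot[src][j]

    # Phase 2: all-gather. Circulate the finished chunks around the ring.
    for t in range(n - 1):
        snapshot = [row[:] for row in data]
        for r in range(n):
            src = (r - 1) % n
            s = sl(src + 1 - t)
            data[r][s] = snapshot[src][s]

    return data

out = ring_all_reduce([[1, 2, 3, 4], [10, 20, 30, 40]])
print(out)   # every node ends with the elementwise sum [11, 22, 33, 44]
```

Note the trade‑off the thread keeps circling: the algorithm is bandwidth‑efficient but latency‑sensitive, since every step is a synchronization point across the ring.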