Apple M3 Ultra

Unified Memory and Soldering Debate

  • Thread opens with “soldered?” and quickly shifts to why: very wide, high‑bandwidth buses demand extremely short traces and tightly coupled packaging.
  • People note the RAM is on‑package, not on‑die, but still not user‑replaceable; anything socketed (CAMM2, DIMMs) would likely break signal integrity at 512–1024‑bit widths.
  • Some argue skilled techs can desolder and upgrade, but others counter that this is qualitatively different from “plop in a DIMM,” and not realistic as a general upgrade path.
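The signal‑integrity argument follows directly from the bus arithmetic: peak bandwidth is bus width × transfer rate, so hitting ~819 GB/s at LPDDR5‑class speeds requires a bus on the order of 1024 bits wide. A minimal sketch (the 6400 MT/s transfer rate is an assumption for illustration; Apple does not publish the full memory spec):

```python
# Peak memory bandwidth = bus width (bits) x transfer rate (MT/s) / 8 (bits/byte).
def peak_bandwidth_gbps(bus_width_bits: int, transfer_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return bus_width_bits * transfer_mts * 1e6 / 8 / 1e9

# A 1024-bit bus at an assumed 6400 MT/s reproduces the ~819 GB/s M3 Ultra figure;
# a single 128-bit CAMM2/DIMM-style channel at the same rate manages only ~102 GB/s.
print(peak_bandwidth_gbps(1024, 6400))  # 819.2
print(peak_bandwidth_gbps(128, 6400))   # 102.4
```

The gap between those two numbers is why socketed modules are a non‑starter: you would need eight such channels, all with trace lengths short enough to hold timing, which is what on‑package memory provides.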

AI, LLMs, and Memory vs Bandwidth vs Compute

  • Huge enthusiasm for 512GB unified memory: it enables fitting very large models (e.g., DeepSeek‑R1 671B Q4) that are impossible on consumer GPUs, and gives GPU/NPU direct access to all of it.
  • Multiple long subthreads compare M3 Ultra (≈819 GB/s) to EPYC (12‑channel DDR5 ≈576 GB/s), 4090 (~1 TB/s), H100/H200/B200 (3–8 TB/s).
  • Consensus:
    • Capacity: M3 Ultra is unique and relatively cheap for GPU‑addressable 512GB.
    • Bandwidth: good vs consumer GPUs, far below datacenter parts.
    • Compute: orders of magnitude fewer TOPS than Nvidia’s AI GPUs; likely compute‑bound on large models.
  • Debate on whether CPU‑only EPYC boxes with 512–768GB are better: more bandwidth in dual‑socket configs but often compute‑limited and slower tokens/s in practice.
  • Several back‑of‑envelope estimates put DeepSeek‑R1 Q4 on M3 Ultra somewhere around 20–40 tok/s; EPYC builds cited around 3–6 tok/s.
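The 20–40 tok/s ballpark is consistent with a simple memory‑bound decode model: each generated token must stream the active parameter bytes from memory, and DeepSeek‑R1 is MoE with roughly 37B active parameters per token, so at Q4 (~0.5 bytes/param) that is ~18.5 GB per token. A sketch, where the active‑parameter count and bytes‑per‑param are rounded assumptions:

```python
# Memory-bound decode estimate: tokens/s ~= bandwidth / bytes streamed per token.
# For an MoE model, only the *active* parameters are read per generated token.
def est_tokens_per_sec(bandwidth_gbps: float, active_params_b: float,
                       bytes_per_param: float = 0.5) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param  # Q4 ~0.5 B/param
    return bandwidth_gbps * 1e9 / bytes_per_token

# DeepSeek-R1: ~37B active of 671B total params, Q4 quantization (assumed).
print(round(est_tokens_per_sec(819, 37), 1))  # M3 Ultra ceiling: 44.3
print(round(est_tokens_per_sec(576, 37), 1))  # 12-ch EPYC ceiling: 31.1
```

Note that the EPYC theoretical ceiling sits far above the 3–6 tok/s cited in the thread, which is the point of the debate: CPU‑only boxes are compute‑limited and never approach their bandwidth ceiling, while the M3 Ultra’s GPU can get much closer to its.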

Pricing, Value, and Target Niche

  • 512GB config around $9.5–14k sparks argument:
    • Pro‑AI and media users: a “bargain” versus multi‑GPU or H100‑class servers, and trivial compared with cloud costs at scale.
    • Skeptics: for the money you can build Threadripper/EPYC + multi‑GPU rigs with far more FLOPs (but vastly less unified VRAM).
  • Many agree this is extremely niche: people needing huge local models, macOS‑only workflows, or privacy‑sensitive inference.
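The cloud‑vs‑local argument reduces to a break‑even token count. A minimal sketch; the per‑million‑token price below is a hypothetical placeholder, not a figure from the thread, and power, depreciation, and time‑to‑result are ignored:

```python
# Break-even sketch: fixed hardware cost vs per-token cloud/API pricing.
# All prices are illustrative assumptions, not quotes from the thread.
def breakeven_tokens(hardware_usd: float, cloud_usd_per_mtok: float) -> float:
    """Output tokens after which owned hardware is cheaper than the cloud."""
    return hardware_usd / cloud_usd_per_mtok * 1e6

# e.g. a $9,500 Mac Studio vs a hypothetical $2 per million output tokens:
print(breakeven_tokens(9_500, 2.0))  # 4.75e9 tokens
```

Whether billions of tokens is “trivial” or unreachable depends entirely on workload volume, which is exactly what the two camps in the thread disagree about; privacy‑sensitive inference also sidesteps this math entirely, since the cloud is not an option.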

M3 Ultra Timing and Product Matrix

  • Confusion over the M3 Ultra shipping after the M4 Max, and only in the Mac Studio. Speculation includes:
    • M4 Max reportedly lacks the UltraFusion interconnect, so no M4 Ultra this gen.
    • Apple prioritizing datacenter use or yield/TSMC constraints.
  • Mac Pro with M2 Ultra now looks particularly odd; some expect a later M4‑based Pro or even quiet discontinuation.

OS, Tooling, and Ecosystem Limits

  • Repeated concern that macOS itself, the lack of native containers, and weak PyTorch/JAX support limit the chip’s appeal as “serious AI” hardware.
  • Asahi Linux is promising but incomplete for M3/M4 and not something enterprises can rely on.
  • CUDA lock‑in remains a central reason most AI shops will stay with Nvidia, despite Apple’s perf/W and unified memory advantages.