Apple M3 Ultra

Unified Memory and Soldering Debate

  • Thread opens with “soldered?” and quickly shifts to why: very wide, high‑bandwidth buses demand extremely short traces and tightly coupled packaging.
  • People note the RAM is on‑package, not on‑die, but still not user‑replaceable; anything socketed (CAMM2, DIMMs) would likely break signal integrity at 512–1024‑bit widths.
  • Some argue skilled techs can desolder and upgrade, but others counter that this is qualitatively different from “plop in a DIMM,” and not realistic as a general upgrade path.
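The signal‑integrity argument follows directly from the bus arithmetic: peak bandwidth is bus width × transfer rate, so hitting ~819 GB/s at LPDDR5‑class speeds requires a bus on the order of 1024 bits wide. A minimal sketch (the 6400 MT/s transfer rate is an assumption for illustration; Apple does not publish the full memory spec):

```python
# Peak memory bandwidth = bus width (bits) x transfer rate (MT/s) / 8 (bits/byte).
def peak_bandwidth_gbps(bus_width_bits: int, transfer_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return bus_width_bits * transfer_mts * 1e6 / 8 / 1e9

# A 1024-bit bus at an assumed 6400 MT/s reproduces the ~819 GB/s M3 Ultra figure;
# a single 128-bit CAMM2/DIMM-style channel at the same rate manages only ~102 GB/s.
print(peak_bandwidth_gbps(1024, 6400))  # 819.2
print(peak_bandwidth_gbps(128, 6400))   # 102.4
```

The gap between those two numbers is why socketed modules are a non‑starter: you would need eight such channels, all with trace lengths short enough to hold timing, which is what on‑package memory provides.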

AI, LLMs, and Memory vs Bandwidth vs Compute

  • Huge enthusiasm for 512GB unified memory: it enables fitting very large models (e.g., DeepSeek‑R1 671B Q4) that are impossible on consumer GPUs, and gives GPU/NPU direct access to all of it.
  • Multiple long subthreads compare M3 Ultra (≈819 GB/s) to EPYC (12‑channel DDR5 ≈576 GB/s), 4090 (~1 TB/s), H100/H200/B200 (3–8 TB/s).
  • Consensus:
    • Capacity: M3 Ultra is unique and relatively cheap for GPU‑addressable 512GB.
    • Bandwidth: good vs consumer GPUs, far below datacenter parts.
    • Compute: orders of magnitude fewer TOPS than Nvidia’s AI GPUs; likely compute‑bound on large models.
  • Debate on whether CPU‑only EPYC boxes with 512–768GB are better: more bandwidth in dual‑socket configs but often compute‑limited and slower tokens/s in practice.
  • Several back‑of‑envelope estimates put DeepSeek‑R1 Q4 on M3 Ultra somewhere around 20–40 tok/s; EPYC builds cited around 3–6 tok/s.
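The 20–40 tok/s ballpark is consistent with a simple memory‑bound decode model: each generated token must stream the active parameter bytes from memory, and DeepSeek‑R1 is MoE with roughly 37B active parameters per token, so at Q4 (~0.5 bytes/param) that is ~18.5 GB per token. A sketch, where the active‑parameter count and bytes‑per‑param are rounded assumptions:

```python
# Memory-bound decode estimate: tokens/s ~= bandwidth / bytes streamed per token.
# For an MoE model, only the *active* parameters are read per generated token.
def est_tokens_per_sec(bandwidth_gbps: float, active_params_b: float,
                       bytes_per_param: float = 0.5) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param  # Q4 ~0.5 B/param
    return bandwidth_gbps * 1e9 / bytes_per_token

# DeepSeek-R1: ~37B active of 671B total params, Q4 quantization (assumed).
print(round(est_tokens_per_sec(819, 37), 1))  # M3 Ultra ceiling: 44.3
print(round(est_tokens_per_sec(576, 37), 1))  # 12-ch EPYC ceiling: 31.1
```

Note that the EPYC theoretical ceiling sits far above the 3–6 tok/s cited in the thread, which is the point of the debate: CPU‑only boxes are compute‑limited and never approach their bandwidth ceiling, while the M3 Ultra’s GPU can get much closer to its.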

Pricing, Value, and Target Niche

  • 512GB config around $9.5–14k sparks argument:
    • Pro‑AI and media users: a “bargain” versus multi‑GPU or H100‑class servers, and trivial compared with cloud costs at scale.
    • Skeptics: for the money you can build Threadripper/EPYC + multi‑GPU rigs with far more FLOPs (but vastly less unified VRAM).
  • Many agree this is extremely niche: people needing huge local models, macOS‑only workflows, or privacy‑sensitive inference.
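The cloud‑vs‑local argument reduces to a break‑even token count. A minimal sketch; the per‑million‑token price below is a hypothetical placeholder, not a figure from the thread, and power, depreciation, and time‑to‑result are ignored:

```python
# Break-even sketch: fixed hardware cost vs per-token cloud/API pricing.
# All prices are illustrative assumptions, not quotes from the thread.
def breakeven_tokens(hardware_usd: float, cloud_usd_per_mtok: float) -> float:
    """Output tokens after which owned hardware is cheaper than the cloud."""
    return hardware_usd / cloud_usd_per_mtok * 1e6

# e.g. a $9,500 Mac Studio vs a hypothetical $2 per million output tokens:
print(breakeven_tokens(9_500, 2.0))  # 4.75e9 tokens
```

Whether billions of tokens is “trivial” or unreachable depends entirely on workload volume, which is exactly what the two camps in the thread disagree about; privacy‑sensitive inference also sidesteps this math entirely, since the cloud is not an option.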

M3 Ultra Timing and Product Matrix

  • Confusion over the M3 Ultra shipping after the M4 Max, and only in the Mac Studio. Speculation includes:
    • M4 Max reportedly lacks the UltraFusion interconnect, so no M4 Ultra this gen.
    • Apple prioritizing datacenter use or yield/TSMC constraints.
  • Mac Pro with M2 Ultra now looks particularly odd; some expect a later M4‑based Pro or even quiet discontinuation.

OS, Tooling, and Ecosystem Limits

  • Repeated concern that macOS itself, the lack of native containers, and weak PyTorch/JAX support limit the chip’s appeal as “serious AI” hardware.
  • Asahi Linux is promising but incomplete for M3/M4 and not something enterprises can rely on.
  • CUDA lock‑in remains a central reason most AI shops will stay with Nvidia, despite Apple’s perf/W and unified memory advantages.