The Magical Mystery Merge Or Why we run FreeBSD-current at Netflix (2023) [pdf]

Hardware, power, and performance

  • Slides mention milestones: 800 Gb/s servers (dual AMD 7713, NIC kTLS offload) and 100 Gb/s servers at ~100W using Nvidia Bluefield-3.
  • Commenters question whether Bluefield really beats modern x86 (e.g., future Zen 5 + DDR5) in watts per Gb/s, but no one provides concrete data.
  • Several note that total power is not just CPU: SSDs, NICs, chassis, and cooling can push configurations into 1,100–1,400W PSU territory.
  • One point: being able to do 800G in 800W doesn’t imply you can do 100G in 100W; smaller nodes have different design constraints.
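
The scaling argument in the last bullet can be made concrete with a toy power model. This is a minimal sketch, assuming a node's draw is a fixed overhead plus a per-Gb/s cost; the 80 W and 0.9 W per Gb/s figures are hypothetical values chosen only so the large node lands at 800 W, and they are not measurements from the slides or the thread.

```python
def node_power(gbps: float, fixed_watts: float, marginal_watts_per_gbps: float) -> float:
    """Toy model: total draw = fixed overhead + throughput-proportional cost."""
    return fixed_watts + marginal_watts_per_gbps * gbps


def watts_per_gbps(total_watts: float, gbps: float) -> float:
    """Efficiency metric used in the discussion: watts per Gb/s served."""
    return total_watts / gbps


# Hypothetical parameters (assumptions, not data from the source).
FIXED_WATTS = 80.0    # fans, BMC, baseline SSD/NIC draw, PSU losses
MARGINAL_WATTS = 0.9  # extra watts per Gb/s of traffic

big = node_power(800, FIXED_WATTS, MARGINAL_WATTS)
small = node_power(100, FIXED_WATTS, MARGINAL_WATTS)

print(f"800G node: {big:.0f} W, {watts_per_gbps(big, 800):.2f} W per Gb/s")
print(f"100G node: {small:.0f} W, {watts_per_gbps(small, 100):.2f} W per Gb/s")
```

Under these assumed numbers the fixed overhead is amortized over eight times less traffic on the small node, so its watts per Gb/s comes out noticeably worse even though the marginal cost is identical.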

Bluefield-3, offload, and storage architecture

  • Practitioners describe running nginx directly on Bluefield ARM cores, using PCIe fabrics (Liqid) to fan out many Bluefields from a single host CPU.
  • NVMe-over-Fabrics from Bluefield to shared NVMe cards can remove host CPUs as storage bottlenecks; bandwidth is then limited by PCIe switch capacity.

Bisection, regressions, and tracking -CURRENT

  • Debate over the claim that not tracking -stable branches avoids “weeks” of bisect work.
  • Some argue that bisection needs only about log₂(N) steps for N commits, so even 3 years of changes are bisectable in days, especially with automation.
  • Others counter that large merges add performance noise, incompatible changes, and manual merge conflicts, making each step slower and attribution harder.
  • Presenter clarifies that the 4 hours per step covers only reimaging and ramp-up time; merges were trivial only because they closely track head.
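
The log₂ argument above can be checked with a few lines of arithmetic. This is a rough sketch, assuming a single bad commit and a clean pass/fail test at every step; the 4 hours per step comes from the presenter's clarification, while the commit count of 30,000 is a hypothetical stand-in for "3 years of changes", not a figure from the thread.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of tests to isolate one bad commit: ceil(log2(N))."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, with no float rounding.
    return (num_commits - 1).bit_length()


def bisect_hours(num_commits: int, hours_per_step: float = 4.0) -> float:
    """Wall-clock estimate if every step costs a fixed reimage + ramp time."""
    return bisect_steps(num_commits) * hours_per_step


HYPOTHETICAL_COMMITS = 30_000  # assumption, for illustration only

steps = bisect_steps(HYPOTHETICAL_COMMITS)
hours = bisect_hours(HYPOTHETICAL_COMMITS)
print(f"{steps} steps, {hours:.0f} hours, {hours / 24:.1f} days")
```

The estimate covers only the ideal case. The counterarguments in the thread (noisy performance measurements, merge conflicts, incompatible changes) all raise the cost per step or force steps to be repeated, which this model does not capture.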

FreeBSD vs Linux and licensing

  • Multiple reasons cited for Netflix’s FreeBSD choice:
    • Historically stronger networking stack, dtrace, async sendfile.
    • Unified kernel+userland tree simplifies debugging and bisecting.
    • Tighter integration and clearer “single source of truth” than typical Linux distro stacks.
  • Counterpoint: similar performance could likely be achieved on Linux with a comparably strong team; big Linux users also employ many kernel engineers.
  • Licensing: the BSD license is seen as attractive to large vendors (routers, storage, consoles), and commenters mention past corporate legal fears around the GPL.
  • Some see Netflix’s use of FreeBSD as path-dependent (who they hired, historical timing) rather than an absolute technical win.

Filesystems, sendfile, and ZFS

  • Netflix doesn’t store video content on ZFS primarily because sendfile is not zero-copy or async there yet.
  • For mostly read-only video data with external redundancy, single-drive ZFS is seen as offering limited benefit (snapshots, COW, bit-rot detection less critical).

HTTP stack and Nginx

  • Question raised why Netflix hasn’t replaced nginx with a custom server like Cloudflare did.
  • Responses: their use case is mainly static file serving; nginx already works well and handles diverse client quirks.
  • Bottlenecks are more about I/O, memory bandwidth, and pacing than HTTP parsing, so a rewrite is unlikely to yield big gains.

Version control, ordering bugs, and initialization

  • Clarification that FreeBSD now uses git, not CVS/Perforce.
  • Discussion of a long-standing bug masked by link-set alphabetical ordering; a new sort order exposed it.
  • Alphabetical ordering seen as a deterministic but semantically weak tie-breaker; some advocate dependency-based or topological initialization order.
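
The difference between a tie-breaker order and a dependency-based order can be sketched as follows. This is an illustrative toy, not FreeBSD's linker-set or SYSINIT machinery: the routine names are hypothetical, and the reversed sort merely stands in for "a new sort order". It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical init routines: name -> set of routines that must run first.
DEPS = {
    "a_provider": set(),
    "b_consumer": {"a_provider"},  # the dependency nobody wrote down
    "c_logger": set(),
}


def violations(order, deps):
    """Return (routine, dependency) pairs where the dependency runs too late."""
    pos = {name: i for i, name in enumerate(order)}
    return [(n, d) for n in order for d in sorted(deps[n]) if pos[d] > pos[n]]


alphabetical = sorted(DEPS)             # happens to satisfy the dependency
resorted = sorted(DEPS, reverse=True)   # stand-in for a changed sort order
topological = list(TopologicalSorter(DEPS).static_order())

print("alphabetical:", alphabetical, violations(alphabetical, DEPS))
print("resorted:    ", resorted, violations(resorted, DEPS))
print("topological: ", topological, violations(topological, DEPS))
```

Alphabetical order works here only by accident of naming, so changing the sort exposes the latent bug. An order derived from declared dependencies stays correct no matter how the names sort.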

Operating model and update strategy

  • Several endorse running close to head with a lag (e.g., ~3 weeks) to keep the future predictable while avoiding the freshest breakage.
  • Cherry-picks are applied only at build time, and upstreaming is encouraged; non-upstreamed patches serve as visible technical debt.
  • Others note similar staged-rollout practices for OS updates in their organizations.

General sentiment

  • Many find the slides “like a thriller,” praising the small, highly capable team and deep kernel-level work.
  • Some skepticism appears about how special FreeBSD is versus Linux, but there’s broad respect for the engineering and transparency.