CDC File Transfer

Stadia origins and self-hosted game streaming

  • The tool was built to speed up transfers from Windows dev machines to Stadia’s Linux servers; some commenters see it as a rare lasting benefit of Stadia.
  • The desire for a “self-hosted Stadia” runs into legal and DRM issues. The discussion branches into claims that modern DRM effectively criminalizes sharing, and some commenters defend piracy when no DRM‑free option exists.
  • Alternatives for self-hosted streaming: Moonlight + Sunshine / Apollo, Steam’s streaming, console remote play, etc. Experiences are mixed, especially with virtual/headless displays and multi‑GPU or Linux setups.
  • Technical notes on Stadia: games were Linux builds using Vulkan plus Stadia APIs; there were custom dev kits and hardware, which makes a generic self‑hosted reuse implausible.

How CDC (Content-Defined Chunking) works

  • CDC here means “Content-Defined Chunking”, not USB CDC, the Centers for Disease Control, or other expansions of the acronym.
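  • Key idea: instead of cutting fixed-size blocks, the algorithm places chunk boundaries where the file content dictates, e.g., wherever a rolling Gear hash of the most recent bytes matches a bit mask. An insertion or deletion then changes only the chunks around the edit; later boundaries fall on the same content, so the following chunks still match.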
  • Contrast with rsync: rsync splits the target file into fixed-size blocks and slides a rolling hash over the source to find those blocks at arbitrary byte offsets. That saves bandwidth, but commenters describe it as more CPU-heavy and less efficient than CDC-based schemes.
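
The boundary rule described above can be illustrated with a minimal Python sketch. It is not the code used by cdc_rsync or FastCDC: the Gear table seed, the mask, the size limits, and the helper names (cdc_chunks, fixed_chunks, shared_fraction) are all assumptions chosen for illustration.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1
# Hypothetical parameters: 11 mask bits give a boundary roughly every 2 KiB.
BOUNDARY_MASK = ((1 << 11) - 1) << 52
MIN_SIZE = 256
MAX_SIZE = 8192

# Gear table: one pseudo-random 64-bit value per byte value (fixed seed for repeatability).
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list:
    """Split data where the Gear hash of the recent bytes matches the mask."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear update: each shift ages older bytes out of the 64-bit hash.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i + 1 - start
        if size < MIN_SIZE:
            continue
        # Cut on a content-defined boundary, or force a cut at MAX_SIZE.
        if (h & BOUNDARY_MASK) == 0 or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 2048) -> list:
    """Baseline: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old_chunks: list, new_chunks: list) -> float:
    """Fraction of new chunks whose SHA-256 digest already exists among the old chunks."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    hits = sum(hashlib.sha256(c).digest() in known for c in new_chunks)
    return hits / len(new_chunks)


def demo() -> None:
    rng = random.Random(1)
    old = bytes(rng.getrandbits(8) for _ in range(200_000))
    new = old[:1000] + b"INSERTED" + old[1000:]  # small insertion near the start
    print("fixed-size reuse:", shared_fraction(fixed_chunks(old), fixed_chunks(new)))
    print("CDC reuse:       ", shared_fraction(cdc_chunks(old), cdc_chunks(new)))


if __name__ == "__main__":
    demo()
```

With fixed-size blocks, every block after the insertion shifts and none can be reused. The content-defined split should resynchronize shortly after the edit, so most chunks keep their digests.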

Performance vs rsync and other tools

  • Google reports their CDC-based remote diffing is up to 30x faster than rsync’s algorithm (1500 MB/s vs 50 MB/s) in their tests.
  • Some confusion arises over whether rsync already does content-based boundaries; clarifications emphasize its fixed-block design.
  • Steam uses fixed 1 MB chunks for updates; backup tools like borg and restic, and git-replacement systems like xet, already exploit content-defined chunking.
  • A variant (go-cdc with lookahead) can modestly improve dedup (≈3–4% extra savings) over FastCDC, at a small cost in complexity.
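
For contrast, the fixed-block design attributed to rsync in this discussion can be sketched in Python as follows. This is a simplified illustration, not rsync's actual code: the two-part weak checksum follows the general rolling-checksum idea, SHA-256 stands in for rsync's strong checksum, and the function names are hypothetical.

```python
import hashlib
import random

M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block: bytes) -> tuple:
    """Two running sums over a block; cheap to compute and cheap to roll."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple:
    """Slide the window one byte: drop out_byte at the front, add in_byte at the back."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def strong(block: bytes) -> bytes:
    # Stand-in strong hash (an assumption for this sketch).
    return hashlib.sha256(block).digest()


def find_matches(old: bytes, new: bytes, block_len: int) -> list:
    """Return (offset_in_new, old_block_number) for old blocks found anywhere in new."""
    # Index the old file by fixed-size blocks (a trailing partial block is ignored).
    table = {}
    for idx in range(0, len(old) - block_len + 1, block_len):
        blk = old[idx:idx + block_len]
        table.setdefault(weak_checksum(blk), []).append((strong(blk), idx // block_len))

    matches = []
    if len(new) < block_len:
        return matches
    a, b = weak_checksum(new[:block_len])
    pos = 0
    while True:
        # A weak hit is only a candidate; confirm it with the strong hash.
        for digest, block_no in table.get((a, b), ()):
            if strong(new[pos:pos + block_len]) == digest:
                matches.append((pos, block_no))
                break
        if pos + block_len >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + block_len], block_len)
        pos += 1
    return matches


if __name__ == "__main__":
    rng = random.Random(7)
    old = bytes(rng.getrandbits(8) for _ in range(6400))
    new = b"XYZ" + old  # shift everything by three bytes
    print(len(find_matches(old, new, 64)), "old blocks located in the shifted file")
```

The sketch tests the weak checksum at every byte offset of the new data, which is where the extra CPU work comes from. A content-defined chunker makes a single pass and hashes each chunk once.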

Project scope, limitations, and status

  • cdc_rsync only supports a narrow Windows → Linux path, matching Stadia’s workflow; it does not support Linux → Linux.
  • The repo is archived and effectively dead; some view this as acceptable for a bespoke internal tool, others see major missed potential.
  • Complaints include Bazel as a heavy dependency and limited platform support; some praise Bazel, others dislike it.

Broader uses and comparisons

  • Game development is highlighted as a prime beneficiary: massive asset trees, slow rsync behavior with many small files, and high visibility for build-time reductions.
  • Related technologies mentioned include IBM Aspera, Microsoft RDC, borg, monoidal hashing, and simple ad‑hoc file sharing via Tailscale plus python3 -m http.server.