CDC File Transfer
Stadia origins and self-hosted game streaming
- Tool originated to speed transfers from Windows dev machines to Stadia’s Linux servers; some see this as a rare lasting benefit of Stadia.
- Desire for a “self-hosted Stadia” runs into legal/DRM issues; discussion branches into views that modern DRM effectively criminalizes sharing, and some defend piracy when no DRM‑free options exist.
- Alternatives for self-hosted streaming: Moonlight + Sunshine / Apollo, Steam’s streaming, console remote play, etc. Experiences are mixed, especially with virtual/headless displays and multi‑GPU or Linux setups.
- Technical notes on Stadia: games were Linux builds using Vulkan plus Stadia APIs; there were custom dev kits and hardware, which makes a generic self‑hosted reuse implausible.
How CDC (Content-Defined Chunking) works
- CDC here means “Content Defined Chunking”, not USB/CDC, disease control, or other acronyms.
- Key idea: instead of fixed-size blocks, chunk boundaries are determined by file content (e.g., via GEAR hashing and bit masks). This lets the algorithm recognize insertions/deletions without invalidating all following blocks.
- Contrast with rsync: rsync splits one copy of the file into fixed-size blocks and uses a rolling hash to find those blocks at arbitrary byte offsets in the other copy. This saves bandwidth, but commenters describe it as more CPU-heavy and less efficient than CDC-based schemes.
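The boundary idea above can be sketched in a few lines of Python. This is a minimal illustration of Gear-style content-defined chunking, not the code used by cdc_rsync or FastCDC: the Gear table is seeded arbitrarily, and MIN_SIZE, MAX_SIZE, MASK, and the helper names are assumptions chosen for the demo. It also includes a fixed-size splitter so the effect of a small insertion can be compared.

```python
import hashlib
import random

# Illustrative parameters only (assumptions, not cdc_rsync's real settings).
MIN_SIZE = 512            # never cut a chunk shorter than this
MAX_SIZE = 8192           # always cut once a chunk reaches this size
MASK = (1 << 11) - 1      # cut when the low 11 hash bits are zero (~2 KiB apart)
U64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value (arbitrary seed).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list:
    """Split data at positions chosen by a Gear rolling hash of the content."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear update: shift left and add the table entry for this byte.
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 2048) -> list:
    """Split data into fixed-size blocks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared(old_chunks, new_chunks) -> int:
    """Count chunks of the new file whose hash already exists in the old file."""
    seen = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(hashlib.sha256(c).digest() in seen for c in new_chunks)


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.getrandbits(8) for _ in range(200_000))
    new = old[:1000] + b"inserted!!" + old[1000:]  # 10-byte insertion
    print("fixed shared:", shared(fixed_chunks(old), fixed_chunks(new)))
    print("cdc shared:  ", shared(cdc_chunks(old), cdc_chunks(new)))
```

With fixed blocks, the 10-byte insertion shifts every later block, so none of them match the old file. With content-defined boundaries, only the chunks around the edit change and the rest are expected to line up again, which is the property the thread attributes to CDC.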
Performance vs rsync and other tools
- Google reports their CDC-based remote diffing is up to 30x faster than rsync’s algorithm (1500 MB/s vs 50 MB/s) in their tests.
- Some confusion arises over whether rsync already does content-based boundaries; clarifications emphasize its fixed-block design.
- Steam uses fixed 1 MB chunks for updates; backup tools like borg/restic, and git-replacement systems like xet, already exploit content-defined chunking.
- A variant (go-cdc with lookahead) can modestly improve dedup (≈3–4% extra savings) over FastCDC, at small complexity cost.
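For the rsync side of the comparison, the fixed-block design can be sketched as follows. This is a simplified illustration in the spirit of rsync's weak rolling checksum, not rsync's actual implementation: the 16-byte block size, the use of SHA-256 as the strong hash, and the function names are all assumptions made for the demo.

```python
import hashlib

BLOCK = 16        # tiny block size so the demo is easy to follow (assumption)
M = 1 << 16       # both checksum halves are kept modulo 2^16


def weak(block: bytes):
    """Weak checksum: a = sum of bytes, b = position-weighted sum of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def match_blocks(old: bytes, new: bytes, n: int = BLOCK):
    """Find old's fixed-size blocks at arbitrary byte offsets in new.

    Returns a list of (offset_in_new, block_index_in_old) pairs.
    """
    # Index every full fixed-size block of the old file.
    table = {}
    for off in range(0, len(old) - n + 1, n):
        blk = old[off:off + n]
        table.setdefault(weak(blk), {})[hashlib.sha256(blk).digest()] = off // n

    matches = []
    if len(new) < n:
        return matches
    pos = 0
    a, b = weak(new[:n])
    while True:
        hit = None
        candidates = table.get((a, b))
        if candidates is not None:
            # Weak match: confirm with the strong hash before trusting it.
            hit = candidates.get(hashlib.sha256(new[pos:pos + n]).digest())
        if hit is not None:
            matches.append((pos, hit))
            pos += n                      # skip past the matched block
            if pos + n > len(new):
                break
            a, b = weak(new[pos:pos + n])
        else:
            if pos + n >= len(new):
                break
            a, b = roll(a, b, new[pos], new[pos + n], n)
            pos += 1                      # otherwise advance one byte
    return matches
```

The block boundaries are fixed in the old file, and the rolling checksum lets the search test every byte offset of the new file cheaply. That per-offset lookup is the CPU cost the thread contrasts with CDC, where boundaries fall out of the content itself.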
Project scope, limitations, and status
- cdc_rsync only supports a narrow Windows → Linux path, matching Stadia’s workflow; it does not support Linux → Linux.
- The repo is archived and effectively dead; some view this as acceptable for a bespoke internal tool, others see major missed potential.
- Complaints include Bazel as a heavy dependency and limited platform support; some praise Bazel, others dislike it.
Broader uses and comparisons
- Game development is highlighted as a prime beneficiary: massive asset trees, slow rsync behavior with many small files, and high visibility for build-time reductions.
- Related technologies mentioned include IBM Aspera, Microsoft RDC, borg, monoidal hashing, and simple ad‑hoc file sharing via Tailscale plus python3 -m http.server.