The future of large files in Git is Git

Enthusiasm for native large-file support

  • Many welcome large-file handling moving into core Git rather than external tools.
  • Separate “large object remotes” and partial clones are seen as enabling broader use cases, including asset-heavy projects.

How Git already handles binaries

  • Several comments stress that all Git objects are binary and packfiles already use binary deltas.
  • The real pain is with files where small logical changes rewrite the whole binary (compressed, encrypted, some archives), inflating history.
  • Another pain point: once a big file is committed, it lives forever in history unless you rewrite it.
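
  For reference, the history‑bloat problem can be inspected and, destructively, fixed from the command line; a minimal sketch, where the 10M cutoff is arbitrary and git-filter-repo is a separate install:

      # List the largest blobs reachable anywhere in history
      git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob" {print $3, $4}' | sort -rn | head

      # Removing one for good means rewriting history, e.g.:
      git filter-repo --strip-blobs-bigger-than 10M

  The rewrite changes every descendant commit hash, which is why cleaning up old large files is treated as a last resort.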

Critique of Git LFS

  • Criticisms: awkward opt‑in (extra install, hooks, .gitattributes; see the sketch after this list), confusing pointer files, poor server UX, multiple auth prompts, and bad offline/sneakernet behavior.
  • Migration tooling can rewrite history in surprising ways (e.g., .gitattributes “pollution” in older commits).
  • Some argue “vendor lock‑in” is mostly about GitHub’s pricing and behavior, not the open LFS protocol itself; others say that in practice it does lock you in once adopted.
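
  The opt‑in flow being criticized looks roughly like this; the file name is hypothetical and the pointer contents are abbreviated:

      git lfs install                  # configures the smudge/clean filters and repo hooks
      git lfs track "*.psd"            # writes a filter rule into .gitattributes
      git add .gitattributes hero.psd
      git commit -m "Add artwork"

      # What Git itself stores is a small pointer, not the asset:
      git cat-file -p HEAD:hero.psd
      # version https://git-lfs.github.com/spec/v1
      # oid sha256:4d7a21...
      # size 104857600

  Converting existing history is a separate, history‑rewriting step (git lfs migrate import --include="*.psd" --everything), which is where the .gitattributes surprises in older commits come from.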

Partial clones & large object promisors

  • --filter clones and promisor remotes are seen as addressing history bloat by not downloading unused large blobs (example after this list).
  • Clarification: even with filters, the checked‑out working tree should be complete; only historical versions are lazily fetched.
  • Skeptics worry about:
    • New flags on git clone that beginners won’t know.
    • Broken behavior if promisor storage is lost/migrated.
    • Server support being uneven; many forges don’t support partial clones yet.
  • Debate over whether these should become safe defaults vs niche power‑user options.
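
  A minimal partial‑clone sketch, with a hypothetical URL; the promisor settings Git records are what lazy fetching depends on:

      # Download commits and trees up front, fetch blobs only when needed
      git clone --filter=blob:none https://example.com/big-repo.git

      # The remote is recorded as a promisor; missing blobs are fetched from it later
      git -C big-repo config remote.origin.promisor            # true
      git -C big-repo config remote.origin.partialclonefilter  # blob:none

      # A size cutoff is also possible, e.g. --filter=blob:limit=10m

  If that promisor remote later disappears or is migrated carelessly, any command that needs a not‑yet‑fetched blob fails, which is the storage‑loss worry above.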

Should Git manage large/binary assets?

  • One camp: Git is a general SCM for whole projects; splitting code and assets (e.g., separate artifact store, submodules) is harmful to reproducibility and release tracking.
  • Other camp: Git is fundamentally for text source; large binaries belong in Perforce/SVN/artifact stores; forcing Git into that role is a “square peg in a round hole”.
  • Game and media developers report Git/LFS struggling at hundreds of GB–TB scales; Perforce or Plastic often fare better, despite weaker surrounding tooling.

Alternatives and ecosystem tools

  • Mentioned tools: git‑annex, datalad, DVC, dud, Oxen, Xet, datamon, jj (future roadmap), DVC‑style indirection layers, artifact repos (Artifactory), and S3‑backed setups.
  • git‑annex praised for private, multi‑remote, N‑copies workflows but considered too complex and not well suited for public multi‑user projects.
  • DVC appreciated for decoupling data storage from Git history (sketch after this list); complaints include hashing overhead and old revisions accumulating without bound unless pruned.
  • Several projects pitch themselves as “Git‑like but large‑file‑first”, often with chunking, dedupe, or custom backends.
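
  As an example of the indirection style, a minimal DVC sketch with a hypothetical S3 remote and file name:

      dvc init
      dvc remote add -d storage s3://example-bucket/dvc-cache
      dvc add data/raw.parquet       # hashes the file into the cache, writes data/raw.parquet.dvc
      git add data/raw.parquet.dvc data/.gitignore
      git commit -m "Track raw data via a DVC pointer"
      dvc push                       # uploads the cached content to the remote

  Git history only ever contains the small .dvc pointer files; the hashing step and the growing remote cache are the complaints mentioned above.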

Ideas for better large-file storage

  • Proposals include:
    • Content‑defined chunking and dedup (borg/restic style) inside Git or a new SCM.
    • Prolly trees or similar structures for huge mutable blobs with efficient partial updates.
    • Format‑aware diff/merge (e.g., for Office docs, archives, JSON, scenes) or reversible text‑like representations; a textconv‑based sketch follows this list.
  • Some argue Git should instead focus on fixing shallow/partial clones and pruning policies so any repo can be an efficient mirror, without pointer schemes.
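
  Part of the format‑aware diffing can already be approximated with Git's textconv mechanism; a sketch assuming pandoc is installed, with an arbitrary driver name ("docx"):

      # .gitattributes
      *.docx diff=docx

      # .git/config or ~/.gitconfig
      [diff "docx"]
          textconv = pandoc --from=docx --to=markdown

      # git diff / git log -p now show readable text changes for .docx files

  This only improves diffs, not merges or storage size, which is why the thread still asks for format‑aware storage and merge support.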

DX, defaults, and scale

  • Repeated complaints that Git “fixes” issues by adding flags, not changing defaults; beginners are left exposed to poor UX (slow clones, obscure options).
  • Others counter that Git’s decentralized model and local full history are core strengths and worth preserving, especially for offline and OSS workflows.
  • Thread ends without consensus: many see the new features as a big step forward; others think a fundamentally new SCM may be needed for petabyte‑scale, asset‑heavy projects.