The future of large files in Git is Git
Enthusiasm for native large-file support
- Many welcome large-file handling moving into core Git rather than external tools.
- Separate “large object remotes” and partial clones are seen as enabling broader use cases, including asset-heavy projects.
How Git already handles binaries
- Several comments stress that all Git objects are binary and packfiles already use binary deltas.
- The real pain is with files where small logical changes rewrite the whole binary (compressed, encrypted, some archives), inflating history.
- Another pain point: once a big file is committed, it lives forever in history unless you rewrite it.
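As a rough illustration of that last point, a minimal sketch (standard Git plumbing, run from any clone; the `head -n 10` cutoff is arbitrary) of how to list the largest blobs still buried in history:

```sh
# Every object reachable from any ref stays in .git/objects even if the
# file was deleted in a later commit, unless history is rewritten.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" {print $3, $4}' |
  sort -rn |
  head -n 10
```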
Critique of Git LFS
- Criticisms: awkward opt-in (extra install, hooks, `.gitattributes`), confusing pointer files, poor server UX, multiple auth prompts, and bad offline/sneakernet behavior (see the sketch after this list).
- Migration tooling can rewrite history in surprising ways (e.g., `.gitattributes` “pollution” in older commits).
- Some argue “vendor lock-in” is mostly about GitHub’s pricing and behavior, not the open LFS protocol itself; others say it practically locks you in once used.
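For context on the opt-in and pointer-file complaints, a minimal sketch of the standard LFS flow (the file name, oid, and size below are illustrative, not from the thread):

```sh
git lfs install               # one-time: installs the smudge/clean hooks
git lfs track "*.psd"         # writes a filter rule into .gitattributes
git add .gitattributes design.psd
git commit -m "Add design asset via LFS"

# What the Git object database actually stores for design.psd is a small
# pointer file along these lines (oid and size are illustrative):
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:4d7a21...
#   size 104857600
```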
Partial clones & large object promisors
- `--filter` and promisors are seen as addressing history bloat by not downloading unused large blobs (see the clone sketch after this list).
- Clarification: even with filters, the checked-out working tree should be complete; only historical versions are lazily fetched.
- Skeptics worry about:
  - New flags on `git clone` that beginners won’t know.
  - Broken behavior if promisor storage is lost/migrated.
  - Server support being uneven; many forges don’t support partial clones yet.
- Debate over whether these should become safe defaults vs niche power‑user options.
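A minimal sketch of the partial-clone workflow under discussion (the repository URL and size threshold are placeholders; server support varies by forge):

```sh
# Blobless clone: fetch commits and trees now, defer blobs until needed.
git clone --filter=blob:none https://example.com/big-repo.git

# Or only defer blobs above a size threshold.
git clone --filter=blob:limit=1m https://example.com/big-repo.git

# The initial checkout still materializes a complete working tree; blobs
# for historical versions are fetched from the promisor remote lazily,
# e.g. when running `git log -p` or checking out an old commit.
```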
Should Git manage large/binary assets?
- One camp: Git is a general SCM for whole projects; splitting code and assets (e.g., separate artifact store, submodules) is harmful to reproducibility and release tracking.
- Other camp: Git is fundamentally for text source; large binaries belong in Perforce/SVN/artifact stores; forcing Git into that role is a “square peg in a round hole”.
- Game and media developers report Git/LFS struggling at hundreds of GB–TB scales; Perforce or Plastic often fare better, despite weaker surrounding tooling.
Alternatives and ecosystem tools
- Mentioned tools: git‑annex, datalad, DVC, dud, Oxen, Xet, datamon, jj (future roadmap), DVC‑style indirection layers, artifact repos (Artifactory), and S3‑backed setups.
- git‑annex praised for private, multi‑remote, N‑copies workflows but considered too complex and not well suited for public multi‑user projects.
- DVC appreciated for decoupling data storage from Git history; complaints include hashing overhead and revisions accumulating without bound unless pruned (see the sketch after this list).
- Several projects pitch themselves as “Git‑like but large‑file‑first”, often with chunking, dedupe, or custom backends.
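To make the DVC-style indirection concrete, a minimal sketch (remote name, bucket, and file paths are placeholders):

```sh
dvc init                                     # adds .dvc/ next to .git/
dvc remote add -d storage s3://my-bucket/dvc # placeholder remote
dvc add data/train.csv                       # hashes the file, writes data/train.csv.dvc
git add data/train.csv.dvc data/.gitignore   # only the small pointer is committed to Git
git commit -m "Track dataset with DVC"
dvc push                                     # uploads the data itself to the remote
```

The hashing step in `dvc add` is where the overhead complaint above comes from.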
Ideas for better large-file storage
- Proposals include:
  - Content-defined chunking and dedup (borg/restic style) inside Git or a new SCM (see the toy sketch after this list).
  - Prolly trees or similar structures for huge mutable blobs with efficient partial updates.
  - Format-aware diff/merge (e.g., for Office docs, archives, JSON, scenes) or reversible text-like representations.
- Some argue Git should instead focus on fixing shallow/partial clones and pruning policies so any repo can be an efficient mirror, without pointer schemes.
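As a toy illustration of the chunk-and-dedupe idea in the first proposal (fixed-size chunks for brevity; borg/restic-style tools instead cut chunks at content-defined boundaries with a rolling hash, so an insertion early in a file doesn’t shift every later chunk’s hash; paths, chunk size, and file names are placeholders, and GNU coreutils is assumed):

```sh
store=/tmp/chunk-store
mkdir -p "$store"

dedup_store() {
  # Split the input into 1 MiB chunks, hash each, and copy only chunks
  # whose hash is not already present; the ordered hash list (manifest)
  # is enough to reconstruct the original file later.
  split -b 1M "$1" /tmp/chunk.
  for c in /tmp/chunk.*; do
    h=$(sha256sum "$c" | cut -d' ' -f1)
    [ -e "$store/$h" ] || cp "$c" "$store/$h"
    echo "$h"
  done > "$1.manifest"
  rm -f /tmp/chunk.*
}

dedup_store big-asset-v1.bin   # placeholder files
dedup_store big-asset-v2.bin   # chunks shared between versions are stored once
```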
DX, defaults, and scale
- Repeated complaints that Git “fixes” issues by adding flags, not changing defaults; beginners are left exposed to poor UX (slow clones, obscure options).
- Others counter that Git’s decentralized model and local full history are core strengths and worth preserving, especially for offline and OSS workflows.
- Thread ends without consensus: many see the new features as a big step forward; others think a fundamentally new SCM may be needed for petabyte‑scale, asset‑heavy projects.