I made my own Git
Alternative VCS Designs & SQLite Idea
- A suggestion to back the toy Git with SQLite leads to discussion of Fossil, which already uses SQLite internally and bundles issues, wiki, docs, and forum as first-class, local-first data.
- Several commenters like Fossil for personal/small-team projects and offline work, but note: no rebasing, different collaboration model, and weaker story for drive‑by contributions compared to Git forges.
- Others argue Git is also “local-first”; replies point out that its ecosystem typically offloads issues and docs to external platforms.
- Other VCSes mentioned: Sapling (Meta’s Mercurial fork, zstd-based deltas), Pijul and Jujutsu (first-class conflict objects), Got (encrypted, large-data friendly).
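To make the SQLite suggestion concrete, here is a minimal sketch of a content-addressed object store in a single table. The table name, column names, and the helpers `open_store`, `put_object`, and `get_object` are hypothetical; this is neither Fossil's schema nor the toy project's code.

```python
import hashlib
import sqlite3
import zlib


def open_store(path=":memory:"):
    """Open (or create) a hypothetical single-table object store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " oid TEXT PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    return db


def put_object(db, kind, content):
    """Store content under the hash of its kind and bytes."""
    oid = hashlib.sha256(kind.encode() + b"\0" + content).hexdigest()
    with db:  # commits on success, rolls back on error
        # The same content always yields the same oid, so re-inserting is a no-op.
        db.execute(
            "INSERT OR IGNORE INTO objects (oid, kind, data) VALUES (?, ?, ?)",
            (oid, kind, zlib.compress(content)),
        )
    return oid


def get_object(db, oid):
    """Return (kind, content) for an object id, or raise KeyError."""
    row = db.execute(
        "SELECT kind, data FROM objects WHERE oid = ?", (oid,)
    ).fetchone()
    if row is None:
        raise KeyError(oid)
    kind, data = row
    return kind, zlib.decompress(data)
```

One appeal of this layout is that issues, wiki pages, and other metadata could live in further tables of the same file, which is roughly the property commenters liked about Fossil.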
Learning Git Internals & DIY Reimplementations
- Many links shared to “build your own Git” resources and explanations (Python/Rust implementations, “Git from the Bottom Up”, “The Git Parable”).
- Reimplementing Git is seen as a powerful way to expose its hidden complexity and improve intuition about everyday commands.
Storage, Compression & Hashing Choices
- Some think focusing early on compression (zstd vs zlib) is less interesting than Git’s object model, but others note implementation details all matter when learning.
- Discussion of SHA‑1 vs SHA‑256: collisions are a theoretical concern even for “just identifiers”; Git’s migration to SHA‑256 is noted.
- Multiple comments argue SHA‑256 is slow; BLAKE3 or similar parallel-friendly hashes can be much faster, depending on hardware.
- Git’s file-based object model is criticized as suboptimal for many small or large files; content-defined chunking is proposed as a better long-term approach.
- Some question whether compression belongs in the VCS or at the filesystem layer (e.g., btrfs with transparent zstd).
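Two of the ideas above can be sketched briefly. `git_blob_id` follows Git's blob naming (a hash over a `blob <size>\0` header plus the content), with the hash algorithm left swappable. `chunk` is a simplified Gear-style content-defined chunker; its table, mask, and size limits are illustrative assumptions, not something proposed in the thread or used by Git.

```python
import hashlib


def git_blob_id(content, algo="sha1"):
    """Object id as Git computes it for a blob: hash of header + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.new(algo, header + content).hexdigest()


# Hypothetical per-byte random table for the rolling hash, derived
# deterministically so every run picks the same chunk boundaries.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
    for i in range(256)
]


def chunk(data, mask=0xFFC00000, min_size=256, max_size=8192):
    """Split data where a rolling hash of recent bytes matches a pattern.

    Boundaries depend on nearby content rather than absolute offsets, so
    an insertion early in a file tends to leave later chunks unchanged.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left ages out old bytes: only the last 32 bytes matter.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Each chunk could then be stored under its own hash, so two large files that differ in one region share most of their chunks. Swapping `algo` to `"sha256"` changes only the digest, which is why the hash choice is largely separable from the object model.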
Performance, Caching & Empty Directories
- The toy VCS recomputes hashes for all files on each operation; commenters point out this will not scale, and reference Git’s “racy git” handling, which uses timestamps and file size as a heuristic for detecting changes.
- Git’s data model technically supports empty trees, but its index doesn’t track empty directories; the toy implementation does support empty folders explicitly.
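The caching idea commenters describe can be sketched as follows: remember each file's modification time and size alongside its hash, and rehash only when that signature changes. The names `file_signature`, `hash_file`, and `cached_hash` and the in-memory `cache` dict are hypothetical, and this is a simplification of what Git's index actually does.

```python
import hashlib
import os


def file_signature(path):
    """Cheap stand-in for file content: (mtime in ns, size in bytes)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def hash_file(path):
    """Expensive step: read the whole file and hash it."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def cached_hash(path, cache):
    """Return (digest, rehashed), where cache maps path -> (signature, digest)."""
    sig = file_signature(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == sig:
        return entry[1], False  # signature unchanged: trust the cached digest
    digest = hash_file(path)
    cache[path] = (sig, digest)
    return digest, True
```

The heuristic can be fooled by a same-size edit that lands within the timestamp's resolution. That is the "racy" case: Git guards against it by not trusting entries whose modification time is not older than the index itself, which this sketch does not attempt.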
Merging, Rebasing & Conflict Handling
- Git’s recursive merge strategy is praised for remembering conflict resolutions; rerere is mentioned but considered local and sometimes dangerous.
- Some advocate merge-based workflows over squash/rebase to preserve history and past attempts.
- Newer systems (Pijul, Jujutsu) that model conflicts as first-class objects are highlighted as more principled.
AI Training, Scraping & “Self-Eating” Models
- Several comments pivot to LLMs: code repos are clearly being scraped (e.g., many unexplained GitHub clones).
- People discuss blocking AI crawlers, model training on imperfect code, and the possibility of “poisoning” training data (mostly as a thought experiment).
- Broader concerns about humans over-trusting AI outputs and the feedback loop when authors also use LLMs to write content.
Format & UX Choices
- Strong pushback against YAML for machine-generated metadata; JSON or TOML are preferred for simplicity and fewer edge cases.
- A question is raised about why the toy VCS introduces a new ignore file instead of reusing .gitignore.