How we decreased GitLab repo backup times from 48 hours to 41 minutes
Nature of the bug and the fix
- Backup slowness was traced to a 15-year-old
git bundle createfunction with an O(N²) duplicate-ref check implemented as a nested loop over refs. - The fix replaced this with a set/map-based deduplication, turning the critical section into O(N) (or N·logN depending on implementation), yielding a ~6x improvement locally but much larger end-to-end impact for GitLab (48 hours → 41 minutes).
- Commenters note this is a textbook “use a hash/set instead of a quadratic scan” situation, and that in C it’s common to default to arrays/loops because container usage is more cumbersome.
Debate over “exponential” language
- The article’s phrase “reducing backup times exponentially” in the same sentence as big-O notation drew heavy criticism.
- Many argued that in a technical performance post, “exponential” should not be used colloquially; it created confusion and wasted readers’ time trying to locate an actual exponential-time algorithm.
- Suggested alternatives: “from quadratic to linear,” “dramatically,” “by a factor of n,” or explicitly stating the new big-O.
- A few defended colloquial use, but most saw it as sloppy or misleading in this context; the author later acknowledged the issue.
Quadratic complexity in practice
- Multiple anecdotes support the idea that O(N²) is the “sweet spot of bad algorithms”: fast in tests, disastrous at scale (sometimes far beyond quadratic in the wild).
- Discussion covers when N is small and bounded (e.g., hardware-limited cases) where quadratic or even cubic can be acceptable, versus unbounded N where it’s a latent production time bomb.
- Some advocate systematically eliminating N² unless N has a hard, low upper bound, and adding explicit safeguards if assumptions about N might change.
C, data structures, and Git implementations
- Several comments note this as an example that C alone doesn’t guarantee speed; algorithm and data structure choices dominate.
- Others argue C makes these mistakes likelier because sets/maps aren’t standard and are harder to integrate than in languages with built-in containers.
- Thread lists alternative Git implementations/libraries (Rust, Go, Java, Haskell, C libraries), with some using them successfully in production.
Backup strategies and alternatives
- Many question why GitLab relies on
git bundleinstead of filesystem-level backups (e.g., ZFS/Btrfs snapshots plus send/receive). - Defenses: direct filesystem copying can bypass Git’s integrity checking and is tricky across diverse filesystems and self-hosted environments; bundles provide a portable, Git-aware backup format.
- Offsite replication of ZFS snapshots and snapshot consistency issues (e.g., with Syncthing) are discussed; Git’s lack of a WAL-like mechanism makes naive snapshotting risky in some setups.
Profiling and tooling
- The flame graph in the article was produced via
perfplus FlameGraph tooling; alternatives recommended includepprof, gperftools, and Samply with Firefox’s profiler. - Several commenters emphasize that algorithmic changes uncovered via profiling dwarf typical micro-optimizations.
Reactions to GitLab and the write-up
- Some praise the upstream contribution (landing in Git v2.50.0) and note GitLab now has a dedicated Git team.
- Others complain about GitLab’s slow UI and the blog post’s style: too long, possibly LLM-like, and missing concrete code snippets, though still useful once skimmed to the core technical section.