Huge Binaries

Binary sizes and where the bloat comes from

  • Commenters describe 25 GiB+ binaries, noting that most of that size is often debug info rather than executable code.
  • C++ debug symbols are highlighted as a huge contributor: templates, type info, local variable locations, line mappings, and multiple specializations generate massive DWARF sections.
  • Some extreme cases: LLVM-dependent builds >30 GB, 25 GB stripped binaries, and games or applications embedding large assets or model weights inside the executable.

Static vs dynamic linking at large scale

  • Large shops favor static (or mostly static) binaries for:
    • Startup speed and reduced dynamic loader overhead (PLT/GOT, symbol interposition).
    • Easier profiling, crashdump analysis, and fleet-wide tooling that assumes a single monolithic binary.
    • Binary provenance and security guarantees: “what’s running is exactly what we built”.
  • Reasons given for avoiding dynamic libraries:
    • ABI instability and header-only templates make reusable .so’s hard in big C++ monorepos.
    • Different builds use different library versions, defeating sharing.
    • Historical ld.so performance issues with many shared objects.
    • Operational weirdness at scale (e.g., bit flips or corruption making a shared library “poisonous” for all processes on a node).
  • Skeptics point out that huge cloud providers successfully use dynamic linking and managed runtimes, questioning whether static linking is truly required for scale.

Debug info handling and tooling

  • Detached debug files, split DWARF (-gsplit-dwarf), and compressed debug sections are widely known and used, but tooling is seen as clumsy.
  • Several note that debuginfo sections don’t affect relocation distances or runtime memory (they’re non-allocated ELF sections).
  • Operational practice: ship stripped binaries, keep symbol files in a “symbol DB” for post-mortem debugging.

Code size, dead code, and optimizations

  • Many argue that hitting a 2 GiB .text limit signals missing dead-code elimination: use LTO, -ffunction-sections + --gc-sections, identical code folding (ICF), tree-shaking, or better partitioning.
  • Others counter that even with these, large monolithic C++ services can genuinely approach 2 GiB of code.
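The section-GC side of that argument is easy to demonstrate (a sketch with hypothetical file names; `-flto` extends the same idea across the whole program):

```shell
cat > lib.c <<'EOF'
int used(void)   { return 1; }
int unused(void) { return 2; }   /* never referenced anywhere */
EOF
cat > main.c <<'EOF'
int used(void);
int main(void) { return used(); }
EOF

# Plain link: 'unused' is carried along inside lib.o's single .text section.
gcc -O1 -o nogc main.c lib.c

# One section per function lets the linker garbage-collect unreferenced ones.
gcc -O1 -ffunction-sections -fdata-sections -Wl,--gc-sections -o gc main.c lib.c

nm nogc | grep -w unused                         # still present
nm gc | grep -w unused || echo "unused discarded"
```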

Code models, thunks, and relocation limits

  • Discussion dives into x86-64 code models and the ±2 GiB reach of rel32 (signed 32-bit displacement) jumps and calls.
  • Medium/large code models, thunks/trampolines, and post-link optimizers like BOLT are discussed as strategies, each with performance tradeoffs.
  • It’s noted that a proper range-extension thunk ABI for x86-64 would be preferable to pessimistically upgrading everything to the large code model.
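To make the limit concrete: a rel32 call encodes a signed 32-bit displacement, so the callee must lie within ±2 GiB of the call site. A toy range check with illustrative addresses (not a linker algorithm):

```shell
# Hypothetical call site and target roughly 2.5 GiB apart.
src=$((0x00400000))
dst=$((0x9fffffff))
disp=$((dst - src))
limit=$((1 << 31))   # 2 GiB

if [ "$disp" -ge "$((-limit))" ] && [ "$disp" -lt "$limit" ]; then
  echo "rel32 reaches: direct call/jmp is fine"
else
  echo "out of rel32 range: needs a thunk or a bigger code model"
fi
# prints "out of rel32 range: needs a thunk or a bigger code model"
```

A range-extension thunk is exactly the linker-inserted island that makes such an out-of-range call work without moving every call to 64-bit absolute addressing, which is why a standard thunk ABI beats the large code model for most code.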