Huge Binaries
Binary sizes and where the bloat comes from
- 25 GiB+ binaries are described, with commenters noting that most of that size can be debug info rather than executable code.
- C++ debug symbols are highlighted as a huge contributor: templates, type info, local variable locations, line mappings, and multiple specializations generate massive DWARF sections.
- Some extreme cases: LLVM-dependent builds >30 GB, 25 GB stripped binaries, and games or applications embedding large assets or model weights inside the executable.
Static vs dynamic linking at large scale
- Large shops favor static (or mostly static) binaries for:
  - Startup speed and reduced dynamic-loader overhead (PLT/GOT indirection, symbol interposition).
  - Easier profiling, crash-dump analysis, and fleet-wide tooling that assumes a single monolithic binary.
  - Binary provenance and security guarantees: “what’s running is exactly what we built”.
- Reasons given for avoiding dynamic libraries:
  - ABI instability and header-only templates make reusable .so’s hard in big C++ monorepos.
  - Different builds use different library versions, defeating sharing.
  - Historical ld.so performance problems when loading many shared objects.
  - Operational weirdness at scale (e.g., a bit flip or on-disk corruption making one shared library “poisonous” for every process on a node).
- Skeptics point out that huge cloud providers successfully use dynamic linking and managed runtimes, questioning whether static linking is truly required for scale.
Debug info handling and tooling
- Detached debug files, split DWARF (`-gsplit-dwarf`), and compressed debug sections are widely known and used, but the tooling is seen as clumsy.
- Several note that debug info sections don’t affect relocation distances or runtime memory, since they are non-allocated ELF sections.
- Operational practice: ship stripped binaries, keep symbol files in a “symbol DB” for post-mortem debugging.
Code size, dead code, and optimizations
- Many argue that hitting the 2 GiB `.text` limit signals missing dead-code elimination: use LTO, `-ffunction-sections` + `--gc-sections`, identical code folding, tree-shaking, or better partitioning.
- Others counter that even with these, large monolithic C++ services can genuinely approach 2 GiB of code.
Code models, thunks, and relocation limits
- Discussion dives into x86-64 code models and the 2 GiB relative jump/call limit.
- Medium/large code models, thunks/trampolines, and post-link optimizers like BOLT are discussed as strategies, each with performance tradeoffs.
- It’s noted that a proper range-extension thunk ABI for x86-64 would be preferable to pessimistically upgrading everything to the large code model.