Honey, I shrunk {fmt}: bringing binary size to 14k and ditching the C++ runtime

Locale behavior and std::format

  • {fmt} is locale‑independent by default; some see this as “fixing” historically bad C++ defaults around locales.
  • Others argue standard C++ should respect locales and are filing a Defect Report about std::format ignoring them by default.
  • Commenters note that a locale can be passed explicitly, but that does not change the default.
  • There’s mild optimism that newer standardization work avoids repeating older locale mistakes.

Floating‑point formatting complexity and performance

  • Commenters are struck by how much code it takes to format floats both correctly (shortest round‑trip output) and fast.
  • Dragonbox is highlighted as a modern, highly optimized algorithm; rough comparisons suggest older “teaching” algorithms can be ~100–1000× slower.
  • {fmt} can optionally use Dragon4 for smaller code size at the cost of speed.
  • Dragonbox can be trimmed to ~3 kB for single precision on 8‑bit AVR, but even that is considered “huge” in very tight environments.
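
In this context, "correct" means producing the shortest decimal string that parses back to exactly the same value. The sketch below illustrates that target with C++17 std::to_chars. When called without a format or precision, std::to_chars is required to produce shortest round‑trip output. The standard leaves the algorithm to the implementation, so this is not Dragonbox or {fmt} specifically. The helper names are hypothetical. The %.17g variant always round‑trips a double, but it is usually longer than necessary.

```cpp
#include <charconv>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <system_error>

// Shortest round-trip text for x: std::to_chars with no format or precision
// must produce the fewest characters that parse back to exactly x.
inline std::string shortest(double x) {
    char buf[64];
    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, x);
    if (ec != std::errc{}) return {};
    return std::string(buf, end);
}

// Fixed 17 significant digits: always enough to round-trip a double,
// but often longer than necessary.
inline std::string seventeen_digits(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", x);
    return std::string(buf);
}

// Checks that text parses back to exactly the same double.
inline bool round_trips(const std::string& text, double x) {
    return std::strtod(text.c_str(), nullptr) == x;
}
```

For 0.1, shortest() gives "0.1" while seventeen_digits() gives "0.10000000000000001". Both parse back to the same double.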

Binary size, runtimes, and allocators

  • Float formatting can dominate binary size; in one Zig example, the bloat disappeared only once floats were cast to integers before printing.
  • On Windows, avoiding the C runtime (e.g., linking with /NODEFAULTLIB and providing a custom entry point) can yield ~1 KiB self‑contained binaries.
  • The post’s technique of replacing new/delete with malloc/free (via a custom allocator and FMT_THROW with -fno-exceptions) is discussed as a way to drop C++ runtime dependencies.
  • There is debate over whether just redefining global operator new/delete would achieve similar savings.
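
For reference, redefining the global operators in terms of malloc/free looks roughly like the sketch below. It rests on several assumptions:
- Failure aborts, because code built with -fno-exceptions cannot throw std::bad_alloc.
- g_new_calls is a hypothetical counter, included only to show that the replacement is in use.
- The aligned (std::align_val_t) and nothrow overloads are left to the standard library.

Whether this alone drops the C++ runtime dependency, which is the point under debate, depends on what else the program links.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, only here to show that the replacements are used.
inline std::size_t g_new_calls = 0;

// Replaceable global allocation functions forwarded to malloc/free.
// Code built with -fno-exceptions cannot throw std::bad_alloc, so this
// sketch aborts on allocation failure instead.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // must still return a unique non-null pointer
    if (void* p = std::malloc(size)) return p;
    std::abort();
}

void* operator new[](std::size_t size) { return ::operator new(size); }

void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }

// Sized variants (C++14); the compiler may call these when the size is known.
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }
```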

Microcontrollers vs general‑purpose targets

  • One side argues that on microcontrollers with 2–16 kB of flash, a 14 kB formatting library is untenable; these developers use tiny hand‑rolled or vendor printf variants (hundreds of bytes).
  • Others counter: many modern MCUs (ESP32, Cortex‑M3+) have hundreds of kB to MB of flash; 10–14 kB for a rich formatter is acceptable there.
  • Some emphasize that the article’s optimizations target Linux/aarch64, not ultra‑tiny MCUs.
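
A hand‑rolled formatter in that spirit can stay small because it only handles what the application actually prints. The sketch below uses hypothetical names: put_u32, format_centi and to_centi. It prints a fixed‑point value held in hundredths using integer arithmetic only, which is also the "cast floats to integers before printing" idea from the Zig example. It assumes values fit in int32_t and does no bounds checking on the output buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Writes the decimal digits of v into out (no terminator); returns the count.
inline std::size_t put_u32(char* out, std::uint32_t v) {
    char tmp[10];  // 4294967295 has 10 digits
    std::size_t n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (std::size_t i = 0; i < n; ++i) out[i] = tmp[n - 1 - i];
    return n;
}

// Formats a fixed-point value held in hundredths as "[-]I.FF" and
// NUL-terminates it. Returns the length without the NUL; the longest
// output ("-21474836.48") needs 13 bytes including the terminator.
inline std::size_t format_centi(char* out, std::int32_t centi) {
    std::size_t n = 0;
    std::uint32_t mag = static_cast<std::uint32_t>(centi);
    if (centi < 0) {
        out[n++] = '-';
        mag = 0u - mag;  // magnitude via modular arithmetic, safe for INT32_MIN
    }
    n += put_u32(out + n, mag / 100);
    out[n++] = '.';
    const std::uint32_t frac = mag % 100;
    out[n++] = static_cast<char>('0' + frac / 10);
    out[n++] = static_cast<char>('0' + frac % 10);
    out[n] = '\0';
    return n;
}

// The "cast to integer before printing" step: scale to hundredths and round
// half away from zero. Assumes |x * 100| fits in int32_t.
inline std::int32_t to_centi(float x) {
    return static_cast<std::int32_t>(x * 100.0f + (x < 0.0f ? -0.5f : 0.5f));
}
```

format_centi(buf, to_centi(3.25f)) writes "3.25" into a 16‑byte buffer without printf, heap allocation, or float‑formatting code. to_centi still needs one float multiply, which becomes a soft‑float library call on MCUs without an FPU.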

Dead‑code elimination and compile‑time formats

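  • People hope unused formatting features (floats, hex, etc.) would be stripped, but note that generic, runtime‑parsed format strings make this hard: when the format string is only known at runtime, code for every argument type and specifier it might contain stays reachable and gets linked in.
  • Techniques mentioned: function/data sections with linker garbage collection, LTO, feature flags (as in Rust), and compile‑time format string processing (FMT_COMPILE), but none of these is yet a complete size solution.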
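
A toy variadic formatter shows why compile‑time type information helps. It is a hypothetical illustration of the principle, not how FMT_COMPILE or {fmt} is implemented. Each argument type selects one put() overload during compilation, so only the overloads the program actually calls need to be linked. A runtime‑parsed format string, by contrast, has to keep code for everything it might encounter.

```cpp
#include <string>

// Toy formatter (hypothetical, not {fmt}'s API): one put() overload per
// supported argument type. There is deliberately no floating-point overload.
inline void put(std::string& out, const char* s) { out += s; }

inline void put(std::string& out, long v) {
    char tmp[24];
    int n = 0;
    unsigned long mag = v < 0 ? 0ul - static_cast<unsigned long>(v)
                              : static_cast<unsigned long>(v);
    do {
        tmp[n++] = static_cast<char>('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    if (v < 0) out.push_back('-');
    while (n > 0) out.push_back(tmp[--n]);
}

inline void put(std::string& out, int v) { put(out, static_cast<long>(v)); }

// Argument types are fixed at compile time, so overload resolution picks the
// formatter for each argument; nothing is parsed at runtime.
template <class... Args>
std::string concat(const Args&... args) {
    std::string out;
    (put(out, args), ...);  // C++17 fold: one put() call per argument
    return out;
}
```

concat("x=", 42, " y=", -7) yields "x=42 y=-7". Because there is no floating‑point overload, passing a double fails to compile, since the call is ambiguous between the int and long overloads. The binary therefore cannot silently grow.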

C vs C++ in tiny systems

  • Disagreement over whether C++ is appropriate in 2 kB code spaces.
  • Some argue you can use “C++ without the runtime” (no exceptions, no RTTI, no inheritance) and still benefit from templates, RAII, and namespaces with minimal overhead.
  • Others note templates and class hierarchies can still explode code size, and historically very constrained systems avoided OO for that reason.
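
The "C++ without the runtime" style can be illustrated with the sketch below. It uses a namespace, an RAII guard, and a template, but no exceptions, RTTI, virtual functions, or heap, so it should build with -fno-exceptions -fno-rtti. hal::irq_disable and hal::irq_enable are hypothetical stubs that only track nesting depth so the code runs on a host. On a real MCU they would wrap the vendor's interrupt‑masking intrinsics.

```cpp
#include <cstddef>

namespace hal {
// Hypothetical stubs that only track nesting depth so this runs on a host.
// On a real MCU these would wrap the vendor's interrupt enable/disable.
inline int irq_depth = 0;
inline void irq_disable() { ++irq_depth; }
inline void irq_enable() { --irq_depth; }
}  // namespace hal

// RAII guard: interrupts are re-enabled on every path out of the scope,
// with no exceptions, RTTI, or virtual functions involved.
class CriticalSection {
public:
    CriticalSection() { hal::irq_disable(); }
    ~CriticalSection() { hal::irq_enable(); }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;
};

// Fixed-capacity queue: storage size is a template parameter, so no heap.
template <typename T, std::size_t N>
class RingBuffer {
    static_assert(N > 0, "capacity must be positive");

public:
    bool push(const T& value) {
        CriticalSection guard;
        if (count_ == N) return false;  // full
        data_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    bool pop(T& out) {
        CriticalSection guard;
        if (count_ == 0) return false;  // empty
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

private:
    T data_[N]{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Each distinct RingBuffer<T, N> instantiation generates its own copy of push and pop. That is exactly the kind of code‑size growth the other side of the debate warns about.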

Debugging and use‑cases

  • Extremely cheap devices (e.g., singing cards, simple consumer gadgets) are cited as real targets where every cent and every byte matter.
  • Even there, small printf‑style facilities are valued for serial/field debugging, but are disabled or replaced in production builds.
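
One common way to get that debug/production split is a compile‑time switch around a small printf‑style helper. In this sketch, DEBUG_LOG_ENABLED, DEBUG_LOG, debug_log and serial_write are hypothetical names. serial_write appends to a string so the example runs on a host; a device build would write to a UART instead. The sketch also assumes the target's C library provides vsnprintf.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical build switch; a production build would pass
// -DDEBUG_LOG_ENABLED=0.
#ifndef DEBUG_LOG_ENABLED
#define DEBUG_LOG_ENABLED 1
#endif

// Hypothetical output hook. On a device this would push bytes to a UART;
// here it appends to a string so the example runs on a host.
inline std::string g_serial_out;
inline void serial_write(const char* s) { g_serial_out += s; }

#if DEBUG_LOG_ENABLED
// printf-style logging into a small fixed stack buffer; longer messages
// are truncated by vsnprintf rather than overflowing.
inline void debug_log(const char* fmt, ...) {
    char buf[64];
    std::va_list ap;
    va_start(ap, fmt);
    std::vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    serial_write(buf);
}
#define DEBUG_LOG(...) debug_log(__VA_ARGS__)
#else
// Disabled: each call expands to a no-op and its arguments are not evaluated.
#define DEBUG_LOG(...) ((void)0)
#endif
```

Building with -DDEBUG_LOG_ENABLED=0 turns every DEBUG_LOG call into a no‑op. If nothing else references vsnprintf, the linker can then typically drop it.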

Overall view of {fmt}

  • Most agree {fmt} is designed to be feature‑rich and fast, with size as an important but secondary goal.
  • There is appreciation for the “thinking outside the box” work to get it to ~14 kB, along with recognition that it still won’t suit the most constrained microcontrollers.