Honey, I shrunk {fmt}: bringing binary size to 14k and ditching the C++ runtime
Locale behavior and std::format
- {fmt} is locale‑independent by default; some see this as “fixing” historically bad C++ defaults around locales.
- Others argue standard C++ should respect locales and are filing a Defect Report about std::format ignoring them by default.
- It’s noted you can pass a locale explicitly, but that doesn’t address the default.
- There’s mild optimism that newer standardization work avoids repeating older locale mistakes.
Floating‑point formatting complexity and performance
- Commenters are struck by how much code correct, fast float formatting requires.
- Dragonbox is highlighted as a modern, highly optimized algorithm; rough comparisons suggest older “teaching” algorithms can be ~100–1000× slower.
- {fmt} can optionally use Dragon4 for smaller code size at the cost of speed.
- Dragonbox can be trimmed to ~3 kB for single precision on 8‑bit AVR, but even that is considered “huge” in very tight environments.
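To make the "shortest round-trip" problem concrete, here is a minimal sketch of the naive approach: print with increasing precision until the text parses back to the same double. It is not Dragonbox or Dragon4 and is not taken from {fmt}; the function name shortest_repr is hypothetical, and the sketch assumes finite input. It mainly shows why a fast, correct algorithm is non-trivial: this version re-formats and re-parses up to 17 times per value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Naive shortest round-trip search (illustrative only, NOT Dragonbox):
// try 1..17 significant digits and return the first decimal string that
// parses back to exactly the same double.
std::string shortest_repr(double value) {
    assert(std::isfinite(value));  // assumption: callers pass finite values
    char buf[64];
    for (int digits = 1; digits <= 17; ++digits) {
        std::snprintf(buf, sizeof buf, "%.*g", digits, value);
        if (std::strtod(buf, nullptr) == value) {
            return buf;
        }
    }
    // 17 significant digits always round-trip a finite IEEE double,
    // so this line is only a defensive fallback.
    return buf;
}
```

For example, shortest_repr(0.1) yields "0.1", while shortest_repr(0.1 + 0.2) needs all 17 digits.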
Binary size, runtimes, and allocators
- Float formatting can dominate binary size; one Zig example showed large bloat until floats were cast to integers before printing.
- On Windows, avoiding the C runtime (e.g., using /NODEFAULTLIB and a custom entry point) can yield ~1 KiB self‑contained binaries.
- The post’s technique of replacing new/delete with malloc/free (via a custom allocator and FMT_THROW with -fno-exceptions) is discussed as a way to drop C++ runtime dependencies.
- There is debate over whether just redefining global operator new/delete would achieve similar savings.
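The alternative raised in the last point, redefining the global operators, can be sketched as follows. This is not the post's code: the counters and check_replacement are hypothetical additions so the effect can be observed, and whether this alone removes the C++ runtime dependency depends on the toolchain and on what else the program pulls in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counters exist only to make the replacement observable in this sketch.
static std::size_t g_allocs = 0;
static std::size_t g_frees = 0;

// Replacement global allocation function backed by malloc.
void* operator new(std::size_t size) {
    ++g_allocs;
    void* p = std::malloc(size ? size : 1);
    if (!p) std::abort();  // assumption: built without exceptions, so no std::bad_alloc
    return p;
}

// Replacement deallocation functions (plain and sized) backed by free.
void operator delete(void* p) noexcept {
    if (p) ++g_frees;
    std::free(p);
}
void operator delete(void* p, std::size_t) noexcept {
    if (p) ++g_frees;
    std::free(p);
}

// Self-check: one new/delete pair should hit each replacement exactly once.
// The volatile pointer keeps the compiler from eliding the allocation.
void check_replacement() {
    std::size_t a0 = g_allocs, f0 = g_frees;
    int* volatile p = new int(7);
    delete p;
    assert(g_allocs == a0 + 1 && g_frees == f0 + 1);
}
```

The default array forms (new[] and delete[]) forward to these functions, so they do not need separate definitions here.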
Microcontrollers vs general‑purpose targets
- One side: for 2–16 kB flash microcontrollers, a 14 kB formatting library is untenable; they use tiny, hand‑rolled or vendor printf variants (hundreds of bytes).
- Others counter: many modern MCUs (ESP32, Cortex‑M3+) have hundreds of kB to MB of flash; 10–14 kB for a rich formatter is acceptable there.
- Some emphasize that the article’s optimizations target Linux/aarch64, not ultra‑tiny MCUs.
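As a rough idea of what a tiny hand‑rolled printf variant looks like, here is a minimal sketch. The name mini_format and its feature set (%d, %u, %x, %s, %c, %%; no floats, widths, or padding) are assumptions chosen for illustration, not any vendor's implementation, and the compiled size will vary by target and compiler.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>

// Append one character if there is room, always leaving space for '\0'.
static void put(char* out, std::size_t cap, std::size_t& n, char c) {
    if (n + 1 < cap) out[n++] = c;
}

// Write an unsigned value in base 10 or 16.
static void put_unsigned(char* out, std::size_t cap, std::size_t& n,
                         unsigned v, unsigned base) {
    char tmp[24];  // enough digits even if unsigned is 64 bits wide
    int i = 0;
    do { tmp[i++] = "0123456789abcdef"[v % base]; v /= base; } while (v != 0);
    while (i > 0) put(out, cap, n, tmp[--i]);
}

// Hypothetical mini_format: integer and string conversions only.
// Output is truncated to fit; returns characters written, excluding '\0'.
std::size_t mini_format(char* out, std::size_t cap, const char* fmt, ...) {
    assert(out != nullptr && cap > 0);
    std::size_t n = 0;
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt; ++fmt) {
        if (*fmt != '%') { put(out, cap, n, *fmt); continue; }
        ++fmt;
        if (*fmt == '\0') break;  // stray '%' at the end of the format
        switch (*fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            unsigned u = static_cast<unsigned>(v);
            if (v < 0) { put(out, cap, n, '-'); u = 0u - u; }
            put_unsigned(out, cap, n, u, 10);
            break;
        }
        case 'u': put_unsigned(out, cap, n, va_arg(ap, unsigned), 10); break;
        case 'x': put_unsigned(out, cap, n, va_arg(ap, unsigned), 16); break;
        case 's':
            for (const char* s = va_arg(ap, const char*); *s; ++s) put(out, cap, n, *s);
            break;
        case 'c': put(out, cap, n, static_cast<char>(va_arg(ap, int))); break;
        default:  put(out, cap, n, *fmt); break;  // "%%" and unknown specifiers
        }
    }
    va_end(ap);
    out[n] = '\0';
    return n;
}
```

A call such as mini_format(buf, sizeof buf, "t=%d x=%x", -42, 255u) writes "t=-42 x=ff" into buf.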
Dead‑code elimination and compile‑time formats
- People hope unused formatting features (floats, hex, etc.) would be stripped, but note that generic, runtime‑parsed format strings make this hard.
- Techniques mentioned: function/data‑section linking, LTO, feature flags (as in Rust), and compile‑time format string processing (FMT_COMPILE), but these are not yet a complete size solution.
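The underlying issue can be illustrated with a small sketch in which formatting is selected by argument type at compile time instead of by parsing a string at run time. This is not how FMT_COMPILE is implemented; concat, append_all, and append_one are hypothetical names, and std::string is used only for brevity. Because a call site references only the overloads for the types it actually passes, section garbage collection or LTO has a chance to drop the rest, which a runtime‑parsed format string cannot offer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One overload per supported argument type. A program that never passes a
// given type never references that overload.
inline void append_one(std::string& out, const char* s) {
    assert(s != nullptr);
    out += s;
}
inline void append_one(std::string& out, int v) {
    unsigned u = static_cast<unsigned>(v);
    if (v < 0) { out += '-'; u = 0u - u; }
    char tmp[24];
    int i = 0;
    do { tmp[i++] = static_cast<char>('0' + u % 10); u /= 10; } while (u != 0);
    while (i > 0) out += tmp[--i];
}
// A double overload would live here; code that never formats a double
// would never pull it in.

// Recursion over the argument pack; the empty overload ends it.
inline void append_all(std::string&) {}
template <typename T, typename... Rest>
void append_all(std::string& out, const T& first, const Rest&... rest) {
    append_one(out, first);
    append_all(out, rest...);
}

// Hypothetical entry point: concatenates the formatted arguments in order.
template <typename... Args>
std::string concat(const Args&... args) {
    std::string out;
    append_all(out, args...);
    return out;
}
```

For instance, concat("x=", 42, ", y=", -7) produces "x=42, y=-7" without any format‑string parser in the binary.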
C vs C++ in tiny systems
- Disagreement over whether C++ is appropriate in 2 kB code spaces.
- Some argue you can use “C++ without the runtime” (no exceptions, no RTTI, no inheritance) and still benefit from templates, RAII, and namespaces with minimal overhead.
- Others note templates and class hierarchies can still explode code size, and historically very constrained systems avoided OO for that reason.
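A minimal sketch of the "C++ without the runtime" style described above: a fixed‑capacity container template and an RAII guard, with no heap, exceptions, RTTI, or virtual functions. StaticVec and ScopedFlag are hypothetical names; on a real microcontroller the guard might wrap an interrupt mask rather than a bool. Note the caveat from the last point still applies: each distinct StaticVec<T, N> instantiation generates its own code.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity vector: storage is inline, so no heap allocation, and a
// full container is reported through the return value, not an exception.
template <typename T, std::size_t N>
class StaticVec {
public:
    bool push(const T& v) {
        if (size_ == N) return false;
        data_[size_++] = v;
        return true;
    }
    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }
private:
    T data_[N] = {};
    std::size_t size_ = 0;
};

// RAII guard: sets a flag on construction and restores the previous value
// when the scope ends, on every exit path.
class ScopedFlag {
public:
    explicit ScopedFlag(bool& flag) : flag_(flag), old_(flag) { flag_ = true; }
    ~ScopedFlag() { flag_ = old_; }
    ScopedFlag(const ScopedFlag&) = delete;
    ScopedFlag& operator=(const ScopedFlag&) = delete;
private:
    bool& flag_;
    bool old_;
};
```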
Debugging and use‑cases
- Extremely cheap devices (e.g., singing cards, simple consumer gadgets) are cited as real targets where every cent and every byte matter.
- Even there, small printf‑style facilities are valued for serial/field debugging, but would be disabled or replaced for production.
Overall view of {fmt}
- Most agree {fmt} is designed to be feature‑rich and fast, with size as an important but secondary goal.
- There is appreciation for the “thinking outside the box” work to get it to ~14 kB, along with recognition that it still won’t suit the most constrained microcontrollers.