Bzip2 crate switches from C to 100% Rust
Adoption as a System bzip2 & ABI/Dynamic Linking
- Several comments discuss whether this Rust implementation could replace the “official” C bzip2 in distros, noting Fedora’s zlib→zlib-ng precedent.
- The crate exposes a C-compatible ABI (cdylib), so in principle it can be dropped in as libbz2 if packagers do the work and verify ABI/symbol compatibility.
- Long subthread clarifies Rust linking:
- Rust can produce dynamically linked libraries for the C ABI and can be dynamically linked by C.
- There is no stable Rust-to-Rust ABI across compiler versions, so Rust deps are usually statically linked, but C libs (libc, OpenSSL, zlib, etc.) are commonly dynamically linked.
- Static vs dynamic linking tradeoffs are debated: binary size, page cache sharing, LTO, rebuild costs; no consensus, but several point out that “static is always smaller” is wrong in multi-binary systems.
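To make the linking point concrete, here is a minimal sketch of how a Rust library can export a symbol with the C ABI, so that C code (or a dynamic loader) can call it. The function `demo_sum_bytes` and the `DEMO_*` constants are hypothetical and are not part of the real libbz2 interface; building an actual shared object would also need `crate-type = ["cdylib"]` in `Cargo.toml`, which is not shown here.

```rust
use std::os::raw::c_int;
use std::slice;

// Hypothetical status codes for this sketch (not the real BZ_* values).
pub const DEMO_OK: c_int = 0;
pub const DEMO_PARAM_ERROR: c_int = -2;

/// Hypothetical C-callable function: sums `len` bytes and writes the result to `out`.
///
/// # Safety
/// `data` must point to `len` readable bytes and `out` to a writable `u32`.
// `#[unsafe(no_mangle)]` keeps the symbol name unmangled so a C caller can find it.
// Editions before 2024 also accept the older spelling `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_sum_bytes(data: *const u8, len: usize, out: *mut u32) -> c_int {
    if data.is_null() || out.is_null() {
        return DEMO_PARAM_ERROR;
    }
    // Turn the raw pointer into a bounds-checked slice at the FFI boundary.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    let sum = bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)));
    unsafe { *out = sum };
    DEMO_OK
}

fn main() {
    // Calling the exported function from Rust, the same way a C caller would.
    let input = [1u8, 2, 3];
    let mut total = 0u32;
    let rc = unsafe { demo_sum_bytes(input.as_ptr(), input.len(), &mut total) };
    println!("rc={rc} total={total}");
}
```

Only the boundary function deals in raw pointers; everything behind it can use ordinary safe Rust, which is roughly the shape a C-compatible wrapper around a Rust implementation would be expected to take.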
Motivations: Safety, Maintainability, Performance
- Many see bzip2 as still relevant (tar archives, Wikipedia dumps, Common Crawl), so a safer, better-maintained implementation is valuable.
- Rewriting in Rust reduces memory-unsafe failure modes (bounds issues become data corruption or panics rather than exploitable overflows) and simplifies cross-compilation and WASM targets.
- Users report substantial real-world gains (e.g., processing hundreds of TB of data), and the published ~10–15% compression / ~5–10% decompression speedups are considered meaningful, especially at scale or for battery-constrained devices.
- A few argue that the original C is “finished” and that speedups don’t justify a more complex language with fewer maintainers; others counter that Rust is easier to contribute to and brings better tooling and test ergonomics.
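The memory-safety argument can be illustrated with a small sketch. In safe Rust an out-of-range index cannot silently read or write adjacent memory: `slice::get` returns `None`, and plain indexing (`table[index]`) panics. The `lookup` helper and the table contents below are hypothetical and are not taken from the crate.

```rust
/// Hypothetical helper: fetch one entry from a decoding table.
/// An out-of-range index yields `None` instead of reading past the buffer.
fn lookup(table: &[u8], index: usize) -> Option<u8> {
    table.get(index).copied()
}

fn main() {
    let table = [10u8, 20, 30];

    // In range: the value is returned.
    println!("{:?}", lookup(&table, 1));

    // Out of range: the caller has to handle the failure explicitly.
    // Writing `table[index]` with the same index would panic at runtime
    // rather than touch memory outside the slice.
    println!("{:?}", lookup(&table, 7));
}
```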
“Rewrite in Rust” Culture & Value of Optimization
- Some view the broader “X rewritten in Rust” trend as churn or CV-padding, especially when framed as a wholesale replacement rather than an alternative.
- Others compare it to historical waves of replacements (AT&T→BSD→GNU, Bourne→bash) and argue that innovation in CLI tools (ripgrep, tokei, sd, uutils) is beneficial.
- There is pushback against dismissing CPU efficiency as irrelevant; commenters link wasted cycles to energy cost, server bills, and UI/“Electron” bloat, invoking Wirth’s law/Jevons paradox.
Security, CVEs, and Critical Infrastructure
- A question about outstanding CVEs in bzip2 elicits the response that the Rust crate has fixed its own historical CVE (pre-0.4.4) and that many C CVEs involve bounds issues that Rust’s model helps avoid.
- Several see this as part of a larger effort (e.g., Prossimo-like initiatives) to move critical components—compression, TLS, DNS, routing protocols—into memory-safe languages; alternatives in Rust and SPARK Ada are mentioned.
Transpilation vs LLMs & Source of Speedups
- The team used c2rust to mechanically translate the C code, then incrementally refactored into idiomatic Rust, guided by the existing bzip2 test suite and fuzzing.
- Commenters consider LLM-based transpilation too error-prone for such low-level, security-sensitive code.
- Speculated performance sources: better aliasing guarantees, more precise types (enabling optimizations), easier use of appropriate data structures/algorithms, and modern intrinsics that are awkward in legacy C.
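As a rough illustration of the translate-then-refactor workflow, the sketch below contrasts code in the raw-pointer style that mechanical translation tends to produce with an idiomatic slice-based rewrite of the same logic. Both functions are hypothetical examples written for this summary, not code from the crate, and the sketch makes no claim about which version is faster.

```rust
/// Translated style: raw pointer plus explicit length, as in the C original.
///
/// # Safety
/// `buf` must point to at least `len` readable bytes.
unsafe fn run_length_translated(buf: *const u8, len: usize) -> usize {
    if len == 0 {
        return 0;
    }
    let first = unsafe { *buf };
    let mut i: usize = 1;
    // Manual pointer arithmetic; correctness depends on the caller's `len`.
    while i < len && unsafe { *buf.add(i) } == first {
        i += 1;
    }
    i
}

/// Idiomatic style: the slice carries its own length, so no `unsafe` is needed
/// and the compiler knows the exact bounds and borrowing of `buf`.
fn run_length_idiomatic(buf: &[u8]) -> usize {
    match buf.first() {
        None => 0,
        Some(&first) => buf.iter().take_while(|&&b| b == first).count(),
    }
}

fn main() {
    let data = b"aaab";
    let translated = unsafe { run_length_translated(data.as_ptr(), data.len()) };
    let idiomatic = run_length_idiomatic(data);
    println!("translated={translated} idiomatic={idiomatic}");
}
```

Keeping both versions side by side during a refactor lets an existing test suite or fuzzer check that they agree, in line with the incremental approach described in the thread.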