Bzip2 crate switches from C to 100% Rust

Adoption as a System bzip2 & ABI/Dynamic Linking

  • Several comments discuss whether this Rust implementation could replace the “official” C bzip2 in distros, noting Fedora’s zlib→zlib-ng precedent.
  • The crate exposes a C-compatible ABI (cdylib), so in principle it can be dropped in as libbz2 if packagers do the work and verify ABI/symbol compatibility.
  • A long subthread clarifies how Rust linking works:
    • Rust can build shared libraries that export a C ABI, and C programs can link against them dynamically.
    • There is no stable Rust-to-Rust ABI across compiler versions, so Rust dependencies are usually linked statically, while C libraries (libc, OpenSSL, zlib, etc.) are commonly linked dynamically.
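  • The tradeoffs between static and dynamic linking are debated: binary size, page cache sharing, LTO, rebuild costs. There is no consensus, but several commenters point out that “static is always smaller” is wrong on systems with many binaries, where each statically linked binary carries its own copy of the shared code.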
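
The C ABI point above can be sketched in a few lines. This is a minimal illustration, not code from the bzip2 crate: `demo_byte_sum` is a hypothetical function name, and the attribute spelling `#[unsafe(no_mangle)]` assumes Rust 1.82 or newer (older toolchains write `#[no_mangle]`).

```rust
use std::slice;

/// Hypothetical exported function (not part of libbz2): sums the bytes in a
/// caller-provided buffer, wrapping on overflow. Returns 0 for a null pointer.
///
/// # Safety
/// `buf` must be null or point to `len` readable bytes.
#[unsafe(no_mangle)] // keep the symbol name unmangled so C can find it
pub unsafe extern "C" fn demo_byte_sum(buf: *const u8, len: usize) -> u32 {
    if buf.is_null() {
        return 0;
    }
    // SAFETY: the caller promises that `buf` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(buf, len) };
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}
```

Built with `crate-type = ["cdylib"]` in Cargo.toml, a function like this ends up as an ordinary symbol in a shared library, and a C caller would declare it as `uint32_t demo_byte_sum(const uint8_t *buf, size_t len);`. A drop-in libbz2 replacement would additionally have to match the existing library's symbol names and signatures exactly, which is the verification work the comments refer to.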

Motivations: Safety, Maintainability, Performance

  • Many see bzip2 as still relevant (tar archives, Wikipedia dumps, Common Crawl), so a safer, better-maintained implementation is valuable.
  • Rewriting in Rust reduces memory-unsafe failure modes (a bounds bug results in a panic or corrupted output rather than an exploitable overflow) and simplifies cross-compilation, including to WASM targets.
  • Users report substantial real-world gains (e.g., processing hundreds of TB of data), and the published ~10–15% compression / ~5–10% decompression speedups are considered meaningful, especially at scale or for battery-constrained devices.
  • A few argue that the original C is “finished” and that speedups don’t justify a more complex language with fewer maintainers; others counter that Rust is easier to contribute to and brings better tooling and test ergonomics.
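
The claim that bounds issues turn into panics rather than overflows can be illustrated with a small sketch. The function names here are hypothetical and the table is made up; nothing below is taken from the bzip2 crate.

```rust
use std::panic;

/// Hypothetical table lookup that reports an out-of-range index as `None`.
pub fn checked_lookup(table: &[u16], index: usize) -> Option<u16> {
    table.get(index).copied()
}

/// Plain indexing: safe Rust checks the bound and panics if `index` is out
/// of range, instead of reading whatever memory sits next to the table.
pub fn indexed_lookup(table: &[u16], index: usize) -> u16 {
    table[index]
}

/// Returns true if `indexed_lookup` panicked for the given index.
pub fn lookup_panics(table: &[u16], index: usize) -> bool {
    panic::catch_unwind(|| indexed_lookup(table, index)).is_err()
}
```

With a three-element table, `checked_lookup(&table, 3)` returns `None` and `lookup_panics(&table, 3)` returns `true` (assuming the default unwinding panic strategy). The equivalent unchecked read in C would be undefined behavior.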

“Rewrite in Rust” Culture & Value of Optimization

  • Some view the broader “X rewritten in Rust” trend as churn or CV-padding, especially when framed as a wholesale replacement rather than an alternative.
  • Others compare it to historical waves of replacements (AT&T→BSD→GNU, Bourne→bash) and argue that innovation in CLI tools (ripgrep, tokei, sd, uutils) is beneficial.
  • There is pushback against dismissing CPU efficiency as irrelevant; commenters link wasted cycles to energy cost, server bills, and UI/“Electron” bloat, invoking Wirth’s law/Jevons paradox.

Security, CVEs, and Critical Infrastructure

  • A question about outstanding CVEs in bzip2 elicits the response that the Rust crate has fixed its own historical CVE (affecting versions before 0.4.4) and that many C CVEs involve bounds issues that Rust’s model helps avoid.
  • Several see this as part of a larger effort (e.g., Prossimo-like initiatives) to move critical components—compression, TLS, DNS, routing protocols—into memory-safe languages; alternatives in Rust and SPARK Ada are mentioned.

Transpilation vs LLMs & Source of Speedups

  • The team used c2rust to mechanically translate the C code, then incrementally refactored into idiomatic Rust, guided by the existing bzip2 test suite and fuzzing.
  • Commenters consider LLM-based transpilation too error-prone for such low-level, security-sensitive code.
  • Speculated performance sources: better aliasing guarantees, more precise types (enabling optimizations), easier use of appropriate data structures/algorithms, and modern intrinsics that are awkward in legacy C.
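
To make the "translate mechanically, then refactor" workflow concrete, here is a hedged sketch of what the two stages can look like. Both functions are hypothetical and written for illustration; they are not taken from c2rust output or from the bzip2 sources. The first mimics the raw-pointer style a mechanical translation of C tends to produce, the second is a safe rewrite with the same behavior.

```rust
/// Raw-pointer style, similar in spirit to mechanically translated C:
/// returns the length of the run of equal bytes at the start of the buffer.
///
/// # Safety
/// `data` must point to `len` readable bytes.
pub unsafe fn run_length_raw(data: *const u8, len: usize) -> usize {
    if len == 0 {
        return 0;
    }
    // SAFETY: `len > 0`, so the first byte is readable.
    let first = unsafe { *data };
    let mut i: usize = 1;
    // SAFETY: `i < len` is checked before each read.
    while i < len && unsafe { *data.add(i) } == first {
        i += 1;
    }
    i
}

/// Safe refactor: the slice carries its own length, so no `unsafe` is needed.
pub fn run_length(data: &[u8]) -> usize {
    match data.first() {
        None => 0,
        Some(&first) => data.iter().take_while(|&&b| b == first).count(),
    }
}
```

Keeping both versions around during the refactor lets a test suite or fuzzer check that they agree on every input, which is the role the existing bzip2 tests and fuzzing play in the workflow described above. The slice-based signature also hands the compiler the length and borrowing information that commenters speculate may help optimization.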