FireDucks: Pandas but Faster
Scope and Compatibility
- Marketed as a near drop‑in, “100% compatible” Pandas replacement; documentation admits it is not fully compatible but “pretty close.”
- Targeted at single‑node workloads where Pandas is already used; not intended to replace distributed engines like Spark for very large jobs.
- Currently Linux‑only, which limits applicability for some users.
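The "near drop-in" and "Linux-only" points above suggest one cautious adoption pattern: try the accelerated module first and fall back to plain Pandas where it is not installed. The sketch below is a minimal illustration, not an official recipe. The helper name `load_backend` is hypothetical, and it assumes the replacement is exposed as the module `fireducks.pandas`, as the project's documentation describes.

```python
import importlib
from types import ModuleType
from typing import Sequence


def load_backend(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that can be imported.

    Hypothetical helper: module names are tried in order, so the
    preferred (possibly platform-specific) backend goes first.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed, or unavailable on this platform; try the next one.
            continue
    raise ImportError(f"none of {list(candidates)} could be imported")
```

A script would then call something like `pd = load_backend(["fireducks.pandas", "pandas"])` and keep the rest of its code unchanged. Because compatibility is reportedly "pretty close" rather than complete, running the existing test suite against both backends is still advisable.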
Performance & Internals
- Reported speedups come from multithreading, reimplementing core Pandas operations (e.g., dropna) in C++, lazy evaluation, and an MLIR/LLVM-based JIT that prunes unused work.
- Some users impressed by large real‑world speedups (minutes → seconds) with zero code changes.
- Others question claims of 100x over Pandas and 20–30% over Polars/DuckDB/ClickHouse; one user reran the TPC-H benchmarks and found Polars dramatically faster while FireDucks crashed, leading them to distrust the published numbers.
- It’s unclear how it beats other modern engines that already use similar techniques (multithreading, Arrow, query optimization).
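To make "lazy evaluation that prunes unused work" concrete, here is a toy sketch in pure Python. It is not FireDucks' implementation (the core is closed source, and the real system reportedly uses an MLIR/LLVM-based compiler); the class `LazyFrame` and its methods are hypothetical names chosen for illustration. Derived columns are only recorded when declared, and only those needed for the requested output are ever computed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Column = List[float]


@dataclass
class LazyFrame:
    """Toy lazy table: records derived columns instead of computing them."""
    data: Dict[str, Column]
    # derived column name -> (input column names, per-row function)
    derived: Dict[str, Tuple[Tuple[str, ...], Callable[..., float]]] = field(default_factory=dict)
    # trace of the derived columns that were actually computed
    evaluated: List[str] = field(default_factory=list)

    def assign(self, name: str, inputs: Sequence[str], fn: Callable[..., float]) -> "LazyFrame":
        """Register a derived column; no computation happens here."""
        if name in self.data or name in self.derived:
            raise ValueError(f"column {name!r} already exists")
        new_derived = dict(self.derived)
        new_derived[name] = (tuple(inputs), fn)
        return LazyFrame(self.data, new_derived, self.evaluated)

    def _column(self, name: str, cache: Dict[str, Column]) -> Column:
        """Compute one column on demand, recursing into its inputs."""
        if name in cache:
            return cache[name]
        if name in self.derived:
            inputs, fn = self.derived[name]
            cols = [self._column(i, cache) for i in inputs]
            result = [fn(*row) for row in zip(*cols)]
            self.evaluated.append(name)
        else:
            result = self.data[name]
        cache[name] = result
        return result

    def collect(self, columns: Sequence[str]) -> Dict[str, Column]:
        """Materialize only the requested columns and their dependencies."""
        cache: Dict[str, Column] = {}
        return {c: self._column(c, cache) for c in columns}


lf = LazyFrame({"price": [10.0, 20.0], "qty": [2.0, 3.0]})
lf = lf.assign("total", ["price", "qty"], lambda p, q: p * q)
lf = lf.assign("unused", ["price"], lambda p: p ** 0.5)  # never requested below
result = lf.collect(["total"])
print(result)        # {'total': [20.0, 60.0]}
print(lf.evaluated)  # ['total']
```

The `unused` column is declared but never computed, because nothing in the requested output depends on it. An eager engine would have paid for it at the `assign` call. Polars' lazy API and DuckDB's query optimizer apply the same general idea, which is part of why commenters ask where any additional advantage would come from.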
Licensing, Source, and Commercialization
- Package is on PyPI under a 3‑clause BSD license, but the core C++ “magic” is only shipped as a binary; source is not available.
- This creates confusion: the binaries are distributed under an open‑source license, but in practice the code is closed source.
- Several commenters see this as misleading or a “binary blob under an OSS license,” and worry about future lock‑in.
- Official docs state current beta is free, with explicit intent to commercialize later; some view this as a “trap” for teams that adopt it before pricing is known.
- Many say they will not adopt without C++ source, especially in finance/quant where inspectability is critical.
Pandas, Polars, and Alternatives
- FireDucks’ main appeal: speedups for existing Pandas codebases without refactoring.
- Polars is widely praised for a cleaner, more regular API and Arrow integration, but lacks some Pandas time‑series ergonomics; converting large Pandas codebases is non‑trivial.
- Several users prefer sticking with Polars, DuckDB, DataFusion, or Ibis (often citing openness and Arrow-based interoperability).
- Many complaints focus on Pandas’ clunky, footgun‑prone API rather than raw speed; some want a “better Pandas” API more than a faster backend.