2024-11-14

FireDucks: Pandas but Faster

Scope and Compatibility

Marketed as a drop‑in, “100% compatible” Pandas replacement, although the documentation concedes it is not fully compatible, only “pretty close.”
Targeted at single‑node workloads where Pandas is already used; not intended to replace distributed engines like Spark for very large jobs.
Currently Linux‑only, which limits applicability for some users.
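Because FireDucks is pitched as a drop-in replacement, adopting it usually means changing only the import. The sketch below shows one way to do that while respecting the Linux-only constraint. It assumes the `fireducks.pandas` module path from FireDucks' published usage, and `pick_dataframe_module` is a hypothetical helper, not part of either library.

```python
import importlib.util
import sys
from typing import Optional


def pick_dataframe_module() -> Optional[str]:
    """Return the module name to import as `pd`, or None if neither is installed.

    Hypothetical helper: prefers FireDucks, which is currently Linux-only,
    and otherwise falls back to stock Pandas. The "fireducks.pandas" module
    path is an assumption taken from FireDucks' published usage.
    """
    # Check the top-level package first: find_spec on a dotted name
    # imports the parent package and raises if that parent is missing.
    if sys.platform.startswith("linux") and importlib.util.find_spec("fireducks") is not None:
        return "fireducks.pandas"
    if importlib.util.find_spec("pandas") is not None:
        return "pandas"
    return None


name = pick_dataframe_module()
print(f"DataFrame backend: {name or 'none installed'}")
```

Existing code would then do `pd = importlib.import_module(name)` and keep its Pandas calls unchanged. Given the "pretty close" compatibility caveat, running the test suite against both backends is a sensible precaution.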

Performance & Internals

Reported speedups come from multithreading, reimplementing core Pandas operations (e.g., dropna) in C++, lazy evaluation, and an MLIR/LLVM-based JIT that prunes unused work.
Some users report large real‑world speedups (minutes → seconds) with zero code changes.
Others question the claimed 100x speedups over Pandas and 20–30% gains over Polars/DuckDB/ClickHouse. One user reran the TPC-H benchmarks, found Polars dramatically faster and FireDucks crashing, and came away distrusting the published numbers.
Commenters find it unclear how it could beat other modern engines that already use similar techniques (multithreading, Arrow, query optimization).
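The "lazy evaluation that prunes unused work" claim is easier to weigh with a concrete model. The toy below is not FireDucks' implementation, whose compiler ships only as a binary. It is a minimal sketch, assuming a column store of plain Python lists, of how a deferred plan can skip computations whose results never reach the output. `LazyFrame`, `with_column`, and `collect` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# A derived column: (names of its input columns, per-row function).
Derived = Tuple[List[str], Callable[..., float]]


class LazyFrame:
    """Toy lazy frame: derived columns are recorded, then computed on collect()."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self.data = data                       # base columns, already materialized
        self.pending: Dict[str, Derived] = {}  # recorded but not yet computed
        self.evaluated: List[str] = []         # log of derived columns actually computed

    def with_column(self, name: str, inputs: List[str], fn: Callable[..., float]) -> "LazyFrame":
        # Record the operation only; no rows are touched here.
        self.pending[name] = (inputs, fn)
        return self

    def collect(self, columns: List[str]) -> Dict[str, List[float]]:
        # Walk dependencies backwards from the requested columns.
        needed = set()
        stack = list(columns)
        while stack:
            col = stack.pop()
            if col in needed:
                continue
            needed.add(col)
            if col in self.pending:
                stack.extend(self.pending[col][0])

        out = {c: v for c, v in self.data.items() if c in needed}
        # Evaluate in definition order (assumes inputs are defined before use);
        # anything not in `needed` is pruned and never computed.
        for name, (inputs, fn) in self.pending.items():
            if name not in needed:
                continue
            out[name] = [fn(*row) for row in zip(*(out[i] for i in inputs))]
            self.evaluated.append(name)
        return {c: out[c] for c in columns}


lf = LazyFrame({"price": [10.0, 20.0, 30.0], "qty": [1.0, 2.0, 3.0]})
lf.with_column("revenue", ["price", "qty"], lambda p, q: p * q)
lf.with_column("unused", ["price"], lambda p: p ** 10)  # never requested
result = lf.collect(["revenue"])
print(result)        # {'revenue': [10.0, 40.0, 90.0]}
print(lf.evaluated)  # ['revenue']
```

Eager Pandas would compute `unused` at assignment time, while the lazy version never does. Polars' lazy API and DuckDB's query optimizer already prune work in a similar way, and that overlap is behind the question of how FireDucks could pull ahead of them.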

Licensing, Source, and Commercialization

Package is on PyPI under a 3‑clause BSD license, but the core C++ “magic” is only shipped as a binary; source is not available.
This creates confusion: the package is open source on paper (BSD-licensed) but closed source in practice.
Several commenters see this as misleading or a “binary blob under an OSS license,” and worry about future lock‑in.
Official docs state that the current beta is free, with explicit intent to commercialize later; some view this as a “trap” for teams that adopt it before pricing is known.
Many say they will not adopt without C++ source, especially in finance/quant where inspectability is critical.

Pandas, Polars, and Alternatives

FireDucks’ main appeal: speedups for existing Pandas codebases without refactoring.
Polars is widely praised for a cleaner, more regular API and Arrow integration, but lacks some Pandas time‑series ergonomics; converting large Pandas codebases is non‑trivial.
Several users prefer sticking with Polars, DuckDB, DataFusion, or Ibis (often citing openness and Arrow-based interoperability).
Many complaints focus on Pandas’ clunky, footgun‑prone API rather than raw speed; some want a “better Pandas” API more than a faster backend.
