FireDucks: Pandas but Faster

Scope and Compatibility

  • Marketed as a near drop‑in, “100% compatible” Pandas replacement; documentation admits it is not fully compatible but “pretty close.”
  • Targeted at single‑node workloads where Pandas is already used; not intended to replace distributed engines like Spark for very large jobs.
  • Currently Linux‑only, which limits applicability for some users.
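
The "near drop‑in" idea amounts to changing only the import. The sketch below shows one way a codebase might pick a backend at import time and fall back to plain Pandas where FireDucks is unavailable (for example, off Linux). The helper name load_dataframe_backend is hypothetical, and the module path fireducks.pandas is an assumption based on the project's documented usage; neither library is imported until the function is called.

```python
import importlib


def load_dataframe_backend(candidates=("fireducks.pandas", "pandas")):
    """Return the first importable module named in `candidates`.

    The default order (assumed module names) prefers FireDucks and
    falls back to stock Pandas when FireDucks is not installed.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (or failed to import) here; try the next one.
            continue
    raise ImportError(f"none of {candidates} could be imported")


# Typical use, left as a comment so the sketch runs without either library:
#   pd = load_dataframe_backend()
#   df = pd.read_csv("data.csv")
```

Because the rest of the code only ever touches the name pd, swapping back to Pandas is a one-line change. That matters given the compatibility gaps the documentation acknowledges.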

Performance & Internals

  • Reported speedups come from multithreading, reimplementing core Pandas operations (e.g., dropna) in C++, lazy evaluation, and an MLIR/LLVM-based JIT that prunes unused work.
  • Some users report large real‑world speedups (minutes → seconds) with zero code changes.
  • Others question the claims of 100x speedups over Pandas and a 20–30% edge over Polars/DuckDB/ClickHouse; one user reran the TPC‑H benchmarks and found Polars dramatically faster while FireDucks crashed, leading them to distrust the published numbers.
  • Commenters find it unclear how FireDucks could beat other modern engines that already use similar techniques (multithreading, Arrow, query optimization).
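
To make "lazy evaluation that prunes unused work" concrete, here is a toy sketch in plain Python. It is not FireDucks's implementation (the source is not available), and the LazyFrame class and its methods are hypothetical. It only illustrates the general technique: operations are recorded rather than run, and at collection time any derived column the result does not depend on is skipped.

```python
class LazyFrame:
    """Toy lazy table: records column definitions, computes on collect()."""

    def __init__(self, data, steps=()):
        self._data = data            # dict: column name -> list of values
        self._steps = tuple(steps)   # recorded (name, func, inputs) steps

    def assign(self, name, func, inputs):
        # Record the step; nothing is computed yet.
        step = (name, func, tuple(inputs))
        return LazyFrame(self._data, self._steps + (step,))

    def collect(self, columns):
        # Walk the plan backwards, keeping only steps whose output is needed.
        needed = set(columns)
        kept = []
        for name, func, inputs in reversed(self._steps):
            if name in needed:
                kept.append((name, func, inputs))
                needed.discard(name)
                needed.update(inputs)
        # Execute the surviving steps in their original order.
        result = dict(self._data)
        for name, func, inputs in reversed(kept):
            args = [result[c] for c in inputs]
            result[name] = [func(*row) for row in zip(*args)]
        return {c: result[c] for c in columns}


calls = {"cheap": 0, "expensive": 0}


def cheap(price, qty):
    calls["cheap"] += 1
    return price * qty


def expensive(price):
    calls["expensive"] += 1
    return sum(price for _ in range(1000))


frame = (
    LazyFrame({"price": [2.0, 3.0], "qty": [5, 4]})
    .assign("total", cheap, ["price", "qty"])
    .assign("unused", expensive, ["price"])
)
out = frame.collect(["total"])
print(out)    # {'total': [10.0, 12.0]}
print(calls)  # {'cheap': 2, 'expensive': 0}
```

The expensive column is never computed because nothing requested depends on it. An eager engine would have paid for it as soon as the assignment ran.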

Licensing, Source, and Commercialization

  • Package is on PyPI under a 3‑clause BSD license, but the core C++ “magic” is only shipped as a binary; source is not available.
  • This creates confusion: the package carries an open‑source license, yet the code that matters is effectively closed source.
  • Several commenters see this as misleading or a “binary blob under an OSS license,” and worry about future lock‑in.
  • Official docs state current beta is free, with explicit intent to commercialize later; some view this as a “trap” for teams that adopt it before pricing is known.
  • Many say they will not adopt without C++ source, especially in finance/quant where inspectability is critical.

Pandas, Polars, and Alternatives

  • FireDucks’ main appeal: speedups for existing Pandas codebases without refactoring.
  • Polars is widely praised for a cleaner, more regular API and Arrow integration, but lacks some Pandas time‑series ergonomics; converting large Pandas codebases is non‑trivial.
  • Several users prefer sticking with Polars, DuckDB, DataFusion, or Ibis (often citing openness and Arrow-based interoperability).
  • Many complaints focus on Pandas’ clunky, footgun‑prone API rather than raw speed; some want a “better Pandas” API more than a faster backend.