Postgres Just Cracked the Top Fastest Databases for Analytics
What pg_mooncake Actually Is
- Implemented as a Postgres extension providing a columnstore table access method.
- Data for columnstore tables is stored as Parquet on S3 and local disk, with Delta or Iceberg metadata.
- Analytical queries on these tables are executed by an embedded DuckDB engine, while Postgres handles catalog, transactions, and WAL (for metadata only).
- Some commenters argue this means “it’s really DuckDB,” others insist it’s still “just Postgres” from the user’s perspective (psql, extensions, SQL interface).
Performance, Benchmarks, and Scaling
- On ClickBench it was initially in the top 10 but later slipped to around #12; maintainers say there are easy optimizations left.
- They claim to be faster than DuckDB on Parquet by storing detailed Parquet metadata in Postgres and using segment elimination to skip files/row groups.
- Concern raised about Postgres’ linear CPU scaling vs cloud warehouses; maintainers respond that Mooncake’s design allows offloading big queries to “stateless engines” (Athena, StarRocks, Spark, etc.) in future versions.
Roadmap and HTAP / Replication
- v0.1 has inefficient small writes (one Parquet file per insert); v0.2 aims to fix this and support small-write and time-series/HTAP workloads.
- Plan: keep OLTP tables in regular Postgres, use logical replication to maintain columnar copies, and run analytics on those.
- Logical replication is flagged as powerful but non-trivial to run reliably; some warn it can become an entire product’s worth of complexity.
Comparisons to Other Systems
- Crunchy Data Warehouse: architecturally similar (Postgres + DuckDB + Iceberg); differences cited are open-source licensing, supporting both Iceberg and Delta, and focus on small writes in v0.2.
- pg_duckdb / Hydra: mainly for querying existing Parquet/S3 data; Mooncake focuses on writing and managing columnar tables in Postgres.
- TimescaleDB is mentioned as a nearby point on the design space (time-series/columnar) but with different S3 capabilities and licensing.
Deployment, Licensing, and “All You Need Is Postgres”
- Some are skeptical of calling this “Postgres” given the dependency on a non-core extension and limitations on using such extensions in hosted environments.
- Legal/licensing concerns around extensions (AGPL, cloud restrictions) are discussed; Mooncake’s extension is MIT-licensed, with assurances it will remain so.
- Broader thread notes strong demand for “proper Postgres analytics” to avoid complex ETL into separate systems like ClickHouse or BigQuery.