Show HN: BemiDB – Postgres read replica optimized for analytics

Overall concept & architecture

  • BemiDB is positioned as a Postgres read-replica for analytics.
  • Embeds DuckDB as the query engine, stores data in Apache Iceberg tables with columnar Parquet files (often ZSTD-compressed).
  • Runs as a separate process (no Postgres extension), is queried over the Postgres wire protocol, and stores its data on S3 or local disk.

Primary use cases discussed

  • Time-series / IoT: keep recent months in Postgres for fast app queries, archive older data to S3 in Parquet/Iceberg, and run analytical or visualization queries over the full history.
  • Auditing / change capture: potential to combine with existing logical-replication-based auditing tooling from the same team.
  • Machine learning feature/data pipelines: replacing bespoke Postgres→Parquet→DuckDB flows.
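The time-series use case above comes down to a retention cutoff: rows newer than the cutoff stay in Postgres, older rows go to the archive. A minimal sketch of that split, assuming a 90-day window and rows carrying a `ts` timestamp (the function name, field names, and window are hypothetical and were not specified in the thread):

```python
from datetime import datetime, timedelta, timezone


def split_hot_cold(rows, now, retention_days=90):
    """Partition rows into (hot, cold) by their 'ts' timestamp.

    Hypothetical helper: 'hot' rows would stay in Postgres,
    'cold' rows would be archived as Parquet/Iceberg.
    """
    cutoff = now - timedelta(days=retention_days)
    hot = [r for r in rows if r["ts"] >= cutoff]   # recent: keep in Postgres
    cold = [r for r in rows if r["ts"] < cutoff]   # old: candidates for archive
    return hot, cold


# Made-up sensor readings, for illustration only.
now = datetime(2024, 11, 1, tzinfo=timezone.utc)
readings = [
    {"sensor": "a", "ts": now - timedelta(days=5), "value": 21.5},
    {"sensor": "a", "ts": now - timedelta(days=200), "value": 19.0},
    {"sensor": "b", "ts": now - timedelta(days=90), "value": 7.2},
]
hot, cold = split_hot_cold(readings, now)
```

A row exactly on the cutoff is treated as hot here; which side the boundary falls on is a design choice, not something the discussion settled.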

Syncing, consistency, and CDC

  • Current implementation: periodic full-table re-sync that exports each table via COPY to CSV and then loads it into Iceberg.
  • Incremental sync with logical replication (CDC) is on the roadmap; planned approach is to buffer changes and flush to S3 based on time/size thresholds.
  • Strong consistency is not guaranteed; users must accept delayed data for analytics.
  • Commenters asked how updates/deletes, data retention, and very large tables will be handled. The answer given: future Iceberg “diff” files stitched together via metadata, which would also enable time travel and schema evolution.
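The planned "buffer changes and flush on time/size thresholds" approach can be illustrated with a small sketch. This is not BemiDB's implementation; the class name, parameters, and flush callback are all hypothetical, and a real version would write each batch to S3 rather than hand it to a Python callback.

```python
import time


class ChangeBuffer:
    """Collects change events and flushes them in batches (hypothetical sketch)."""

    def __init__(self, flush, max_rows=1000, max_age_s=60.0, clock=time.monotonic):
        self.flush = flush            # callback that receives a list of changes
        self.max_rows = max_rows      # size threshold
        self.max_age_s = max_age_s    # time threshold, in seconds
        self.clock = clock            # injectable clock, useful for testing
        self.pending = []
        self.first_at = None          # arrival time of the oldest pending change

    def add(self, change):
        if not self.pending:
            self.first_at = self.clock()
        self.pending.append(change)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if either threshold is reached; return True if a flush happened."""
        if not self.pending:
            return False
        too_big = len(self.pending) >= self.max_rows
        too_old = self.clock() - self.first_at >= self.max_age_s
        if too_big or too_old:
            batch, self.pending = self.pending, []
            self.first_at = None
            self.flush(batch)
            return True
        return False


# Usage with a fake clock so both thresholds can be exercised deterministically.
batches = []
fake_now = [0.0]
buf = ChangeBuffer(batches.append, max_rows=3, max_age_s=10.0,
                   clock=lambda: fake_now[0])
buf.add({"op": "insert", "id": 1})
buf.add({"op": "update", "id": 1})
buf.add({"op": "delete", "id": 1})   # third change reaches the size threshold
fake_now[0] = 5.0
buf.add({"op": "insert", "id": 2})   # starts a new batch at t=5
fake_now[0] = 20.0
buf.maybe_flush()                    # 15s old, so the age threshold triggers
```

The trade-off is the one the thread points at: larger thresholds mean fewer, bigger files on object storage but staler analytics data, which is why strong consistency is off the table.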

Performance, scale, and latency

  • Benchmarks cited: on TPC-H at scale factors 0.1 and 1 (SF0.1/SF1), BemiDB’s Parquet data was much smaller than the equivalent Postgres storage; commenters debated whether unindexed Postgres baselines are realistic.
  • One commenter questioned whether logical replication can keep up on multi-TB systems; the authors say the current target is small to medium Postgres databases and expect larger-scale setups to involve more pipelines.
  • S3-based analytics are said to have latency on the order of 1 second; Iceberg on a local SSD is reported as “super fast.” Caching is on the roadmap.

Comparison with other tools

  • DuckDB: used internally, but some still see it as buggy; BemiDB adds Postgres wire protocol and Iceberg support, plus sync automation.
  • ClickHouse: widely praised for performance and S3 support; some see it as a better production pairing with Postgres, others prefer BemiDB’s simpler single-binary + object storage model.
  • Alternatives mentioned: pg_analytics (ParadeDB), pg-archiver, Debezium/Kafka→ClickHouse pipelines, Materialize/Feldera/Striim for incremental view maintenance.

Licensing debate

  • The choice of AGPL drew significant pushback over perceived legal complexity and “fair source” dynamics.
  • Others defended AGPL as aligned with user-freedom focused open source.
  • Authors indicated openness to more permissive licensing over time.