Full text search over Postgres: Elasticsearch vs. alternatives
Postgres FTS vs. Elasticsearch
- Several commenters say Postgres FTS is fine for simple or low-scale use (blogs, small apps), but becomes painful at larger scales, especially for facets and relevance.
- Postgres lacks BM25 and global term statistics in core, limiting ranking quality; extensions like RUM and ParadeDB/pg_search aim to close this gap.
- Some argue you eventually end up re‑implementing an ETL‑like pipeline and specialized indexing inside Postgres, without the full feature set of a dedicated search engine.
Elasticsearch Reliability & Operations
- Mixed experiences: some report “random” failures, GC issues, and operational headaches, others say Elasticsearch is reliable if sized and configured correctly.
- Common pitfalls: dynamic mapping creating huge schemas, underprovisioned clusters, poor shard/index design, and weak backup practices.
- Replication, corruption handling, and leader election reliability are debated; some feel it’s improved since ~7.x, others remain wary.
- Serverless Elasticsearch/OpenSearch is seen as promising but with tradeoffs (S3-backed, eventual visibility of new data, cold starts).
Faceted Search & Analytics
- Facets/aggregations over large result sets are cited as a major reason to prefer Elasticsearch/Solr/OpenSearch.
- Postgres can do facets but it’s complex and slow at scale; extensions like pgfaceting and ParadeDB’s aggregations aim to help.
- Facets + FTS + analytics in one query is described as a key “stickiness” factor for Elastic-style systems.
BM25, Ranking, and Multi‑Tenancy
- BM25 support is viewed as a big differentiator; SQLite FTS5 and Elasticsearch have it; core Postgres does not.
- Multi‑tenant BM25 is tricky: corpus‑wide stats can leak information or skew relevance between tenants. Suggested mitigations include per-tenant indexes, partial/partitioned indexes, or per-DB/collection isolation.
- Shard-level stats can affect scoring; there are workarounds (e.g., special query modes, 1‑shard indices).
Semantic/Vector Search and Hybrid Approaches
- Consensus that vector/semantic search complements, not replaces, keyword/BM25 search.
- Hybrid approaches (BM25 + vectors, reciprocal rank fusion, query rewriting) are recommended, especially when domain-specific terms or IDs are important.
- Keeping two separate datastores (vector + keyword engine) in sync is seen as operationally fragile; some advocate unified systems that combine both.
Alternatives: Meilisearch, Typesense, Solr, Others
- Meilisearch and Typesense praised for simplicity, developer experience, and ease of self-hosting, though feature sets are narrower than Elastic.
- Solr is positioned as offering most of Elasticsearch’s Lucene-based power without Elastic’s ecosystem baggage but with less mindshare and more manual scaling.
- Other options mentioned: Manticore, ArangoDB (for FTS + graphs), Manticore’s MySQL-friendly protocol, and newer hybrid engines (e.g., Tantivy-based, Infinity).
ParadeDB / pg_search Notes
- ParadeDB’s pg_search is described as bringing BM25 and “Elastic-like” FTS into Postgres, distributed as an extension or a separate, replicated Postgres instance.
- Earlier versions were weakly consistent; current BM25 indexes are said to be strongly consistent, with minor transactional overhead.
- Many users reportedly deploy ParadeDB as a separate “search Postgres” fed via logical replication from primary OLTP Postgres to decouple workloads.