2025-08-08

How we replaced Elasticsearch and MongoDB with Rust and RocksDB

Geocoding stack and open-source ecosystem

Several commenters relate the approach to existing OSM projects such as Photon and OSM Express; one suggestion is that LMDB may suit OSM-like workloads better than RocksDB.
The described system uses proprietary data (especially for FastText-based semantic models), which makes open-sourcing difficult; swapping in a small open BERT model is raised as an idea, but it would come with QPS and latency tradeoffs.
FastText is assumed to power semantic queries like “coffee near me,” while Tantivy handles more structured/address search.
The company plans to open-source S2 Rust bindings, which might help others build reverse geocoders, but a full Photon replacement is still aspirational.

Motivation and architecture (Rust + RocksDB + Tantivy)

Author explains the goal as turning a “distributed system problem” (ES/Mongo clusters) into a monolithic, in-process system using embedded storage.
Memory-mapped indexes plus “just add RAM” are used to reach global coverage; reindexing is done as immutable rebuilds on separate nodes and published as static assets.
Some argue this replicates what Postgres + pg_search/pgvector, ParadeDB, or similar could do, and see it as another case of reinventing the wheel.
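The "immutable rebuild, publish as a static asset, memory-map on the serving node" pattern mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the author's implementation (which is in Rust and is not open source); the function names `publish_index` and `open_index` and the file name `index.bin` are hypothetical.

```python
import mmap
import os
import tempfile


def publish_index(payload: bytes, live_path: str) -> None:
    """Write a freshly built index next to the live file, then swap it in.

    Hypothetical helper: the new file is fully written and fsynced before
    os.replace() renames it over the live path, so readers never observe a
    half-written index.
    """
    directory = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    # Rename within one directory (same filesystem), which is atomic on POSIX.
    os.replace(tmp_path, live_path)


def open_index(live_path: str) -> mmap.mmap:
    """Map the current index read-only; the OS pages it in on demand."""
    with open(live_path, "rb") as f:
        # Length 0 maps the whole file; the mapping outlives the file object.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "index.bin")
    publish_index(b"index build 1", path)
    reader = open_index(path)
    print(reader[:])
    reader.close()
```

On POSIX systems a reader that already holds a mapping keeps seeing the old build until it reopens the file, which is what makes a whole-file swap safe without locks. Serving nodes then only need enough RAM for the page cache to hold the hot part of the index.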

Elasticsearch operations and alternatives

Mixed experiences: some find ES fragile and high-touch compared to primary datastores; others report large clusters running for years with minimal maintenance if queries and indexing are well-designed.
Hosted ES/OpenSearch options are mentioned, but data-sovereignty and cost constraints can limit their use.
ES is praised for flexibility with changing business requirements; missing tooling (like a simple “copy data between nodes” CLI) is noted but workarounds via HTTP APIs exist.
Alternatives discussed: Typesense (simple, focused, good DX), DuckDB with spatial extensions (excellent for static or batch geo workloads), Quickwit (Tantivy-based), and ManticoreSearch, with Quickwit and MotherDuck also suggested for log/OLAP-like use cases.

RocksDB vs LMDB and reliability

One commenter warns that RocksDB’s LevelDB ancestry might hide operational pain; others counter with multi-year, large-scale RocksDB deployments without correctness issues.
LMDB is characterized as ideal for read-heavy workloads, with extremely low FD usage and almost no tuning, while RocksDB needs more configuration and can consume many FDs.

Datastore innovation and vector search

Debate over whether “data stores are mostly solved”: some say most enterprises only need Postgres; others argue there’s still real innovation needed around embeddings, filtered ANN, dynamic updates, and hybrid keyword/semantic search.
Large search engines (ES, Vespa) are seen by some as sufficient for ANN at scale if you pay the complexity and hardware costs; vector DBs are viewed by some as more about ease-of-use than new capabilities.
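To make "hybrid keyword/semantic search" concrete, one common way to combine the two result lists is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not something proposed in the thread or used by the system under discussion; the document IDs are made up and `k=60` is only the conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID so the result
    is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # hypothetical keyword (e.g. BM25) results
    semantic_hits = ["b", "d", "a"]  # hypothetical embedding/ANN results
    print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

RRF uses only ranks, so it does not need the keyword and vector scores to be on comparable scales. It does nothing for the harder problems raised in the debate, such as filtered ANN or dynamic index updates.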

Other meta points

Several readers want more concrete details: sharding, replication, failure handling, indexing latency, durability, and benchmarks.
Minor side threads critique the marketing tone, the non-open-sourcing, FastText’s maintenance status, and the title’s inclusion of “Rust” (a language) alongside ES and MongoDB (databases).
There is also a tangential but lively discussion about “in-office culture” as a benefit versus remote work preferences.
