How we replaced Elasticsearch and MongoDB with Rust and RocksDB

Geocoding stack and open-source ecosystem

  • Several commenters relate the approach to existing OSM-based projects like Photon and OSM Express; one suggests that LMDB may suit OSM-like workloads better than RocksDB.
  • The described system uses proprietary data (especially for FastText-based semantic models), which makes open-sourcing difficult; swapping in a small open BERT model is raised as an option, but at a cost in QPS and latency.
  • FastText is assumed to power semantic queries like “coffee near me,” while Tantivy handles more structured/address search.
  • The company plans to open-source S2 Rust bindings, which might help others build reverse geocoders, but a full Photon replacement is still aspirational.

Motivation and architecture (Rust + RocksDB + Tantivy)

  • Author explains the goal as turning a “distributed system problem” (ES/Mongo clusters) into a monolithic, in-process system using embedded storage.
  • Memory-mapped indexes plus “just add RAM” are used to reach global coverage; reindexing is done as immutable rebuilds on separate nodes and published as static assets.
  • Some argue this replicates what Postgres + pg_search/pgvector, ParadeDB, or similar could do, and see it as another case of reinventing the wheel.
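
The "immutable rebuild, publish as a static asset, memory-map on the serving node" pattern described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the author's implementation (which uses Rust, RocksDB and Tantivy). The tab-separated file format and the names publish_index and lookup are hypothetical, and the linear scan stands in for a real index structure.

```python
import mmap
import os
import tempfile


def publish_index(records, live_path):
    """Build a new index file off to the side, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(live_path))
    # Write the rebuild to a temporary file in the same directory so the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        # Hypothetical format: one "key<TAB>value" line per record.
        for key, value in sorted(records.items()):
            f.write(f"{key}\t{value}\n".encode("utf-8"))
        f.flush()
        os.fsync(f.fileno())
    # On POSIX filesystems the rename is atomic: a reader opens either the
    # old file or the new one, never a half-written mix.
    os.replace(tmp_path, live_path)


def lookup(live_path, key):
    """Memory-map the published file read-only and scan it for a key."""
    with open(live_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            needle = key.encode("utf-8") + b"\t"
            # Linear scan for illustration only; a real index would seek.
            for line in iter(mm.readline, b""):
                if line.startswith(needle):
                    return line[len(needle):].rstrip(b"\n").decode("utf-8")
    return None
```

Because the published file is never modified in place, the serving process needs no write path or locking for it; a rebuild simply replaces the whole file.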

Elasticsearch operations and alternatives

  • Mixed experiences: some find ES fragile and high-touch compared to primary datastores; others report large clusters running for years with minimal maintenance if queries and indexing are well-designed.
  • Hosted ES/OpenSearch options are mentioned, but data-sovereignty and cost constraints can limit their use.
  • ES is praised for flexibility with changing business requirements; missing tooling (like a simple “copy data between nodes” CLI) is noted but workarounds via HTTP APIs exist.
  • Alternatives discussed: Typesense (simple, focused, good DX), DuckDB with spatial extensions (excellent for static or batch geo workloads), ManticoreSearch, and, for log/OLAP-like use cases, Quickwit (Tantivy-based) and MotherDuck.

RocksDB vs LMDB and reliability

  • One commenter warns that RocksDB’s LevelDB ancestry might hide operational pain; others counter with multi-year, large-scale RocksDB deployments without correctness issues.
  • LMDB is characterized as ideal for read-heavy workloads, with extremely low file-descriptor (FD) usage and almost no tuning, while RocksDB needs more configuration and can consume many FDs.

Datastore innovation and vector search

  • Debate over whether “data stores are mostly solved”: some say most enterprises only need Postgres; others argue there’s still real innovation needed around embeddings, filtered ANN, dynamic updates, and hybrid keyword/semantic search.
  • Large search engines (ES, Vespa) are seen by some as sufficient for ANN at scale if you pay the complexity and hardware costs; vector DBs are viewed by some as more about ease-of-use than new capabilities.
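
To make "filtered ANN" concrete, here is a minimal sketch of the underlying query: restrict candidates by a metadata predicate, then rank the survivors by vector similarity. Note the swap: this is exact brute-force search, not an approximate index, and the document layout (dicts with "id" and "vec" keys plus arbitrary metadata) and the function names are assumptions made for illustration. ANN systems aim to return approximately the same answer without scanning every candidate.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(docs, query_vec, predicate, k=3):
    """Return the ids of the k most similar docs that satisfy predicate."""
    # Pre-filter: apply the metadata predicate first, then rank survivors.
    candidates = [d for d in docs if predicate(d)]
    candidates.sort(key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in candidates[:k]]


docs = [
    {"id": "a", "country": "DE", "vec": [1.0, 0.0]},
    {"id": "b", "country": "FR", "vec": [0.9, 0.1]},
    {"id": "c", "country": "DE", "vec": [0.0, 1.0]},
]
top_de = filtered_search(docs, [1.0, 0.0], lambda d: d["country"] == "DE", k=2)
```

Filtering first keeps the result exact, but the work grows with the number of matching documents, which is one reason this combination is cited in the thread as an open area.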

Other meta points

  • Several readers want more concrete details: sharding, replication, failure handling, indexing latency, durability, and benchmarks.
  • Minor side threads critique the marketing tone, the non-open-sourcing, FastText’s maintenance status, and the title’s inclusion of “Rust” (a language) alongside ES and MongoDB (databases).
  • There is also a tangential but lively discussion about “in-office culture” as a benefit versus remote work preferences.