How we replaced Elasticsearch and MongoDB with Rust and RocksDB
Geocoding stack and open-source ecosystem
- Several commenters relate the approach to existing OSM geocoders like Photon and OSM Express; one suggestion is that LMDB may suit OSM-like workloads better than RocksDB.
- The described system uses proprietary data (especially for FastText-based semantic models), which makes open-sourcing difficult; the idea of swapping in a small open BERT model is raised, though at a cost in QPS and latency.
- FastText is assumed to power semantic queries like “coffee near me,” while Tantivy handles more structured/address search.
- The company plans to open-source S2 Rust bindings, which might help others build reverse geocoders, but a full Photon replacement is still aspirational.
Motivation and architecture (Rust + RocksDB + Tantivy)
- Author explains the goal as turning a “distributed system problem” (ES/Mongo clusters) into a monolithic, in-process system using embedded storage.
- Memory-mapped indexes plus “just add RAM” are used to reach global coverage; reindexing is done as immutable rebuilds on separate nodes and published as static assets.
- Some argue this replicates what Postgres + pg_search/pgvector, ParadeDB, or similar could do, and see it as another case of reinventing the wheel.
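As a rough illustration of the "immutable rebuild, publish as a static asset, read via memory mapping" pattern described above, here is a minimal Python sketch using only the standard library. It is not the author's implementation (the article's system uses Rust, RocksDB, and Tantivy); the record layout, the file name `index.bin`, and the function names `build_index` and `lookup` are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: 64-bit id, latitude, longitude.
RECORD = struct.Struct("<Qdd")


def build_index(directory, records):
    """Write a brand-new sorted index file, then publish it atomically."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        for rec in sorted(records):  # tuples sort by id first
            f.write(RECORD.pack(*rec))
        f.flush()
        os.fsync(f.fileno())
    final_path = os.path.join(directory, "index.bin")
    # Rename within one filesystem: readers see the old file or the new one,
    # never a half-written index.
    os.replace(tmp_path, final_path)
    return final_path


def lookup(path, key):
    """Binary-search the memory-mapped file for a record id.

    Assumes a non-empty index file (an empty file cannot be mapped).
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            lo, hi = 0, len(mm) // RECORD.size
            while lo < hi:
                mid = (lo + hi) // 2
                rec_id, lat, lon = RECORD.unpack_from(mm, mid * RECORD.size)
                if rec_id == key:
                    return lat, lon
                if rec_id < key:
                    lo = mid + 1
                else:
                    hi = mid
    return None
```

Because the file is never modified after publication, the operating system's page cache does the caching, which is roughly what "just add RAM" amounts to in this design. Updates happen only by building a whole new file and swapping it in.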
Elasticsearch operations and alternatives
- Mixed experiences: some find ES fragile and high-touch compared to primary datastores; others report large clusters running for years with minimal maintenance if queries and indexing are well-designed.
- Hosted ES/OpenSearch options are mentioned, but data-sovereignty and cost constraints can limit their use.
- ES is praised for flexibility with changing business requirements; missing tooling (like a simple “copy data between nodes” CLI) is noted but workarounds via HTTP APIs exist.
- Alternatives discussed: Typesense (simple, focused, good DX), DuckDB with spatial extensions (excellent for static or batch geo workloads), ManticoreSearch, and Quickwit (Tantivy-based) or MotherDuck for log/OLAP-like use cases.
RocksDB vs LMDB and reliability
- One commenter warns that RocksDB’s LevelDB ancestry might hide operational pain; others counter with multi-year, large-scale RocksDB deployments without correctness issues.
- LMDB is characterized as ideal for read-heavy workloads, with extremely low file-descriptor (FD) usage and almost no tuning, while RocksDB needs more configuration and can consume many FDs.
Datastore innovation and vector search
- Debate over whether “data stores are mostly solved”: some say most enterprises only need Postgres; others argue there’s still real innovation needed around embeddings, filtered ANN, dynamic updates, and hybrid keyword/semantic search.
- Large search engines (ES, Vespa) are seen by some as sufficient for ANN at scale if you pay the complexity and hardware costs; vector DBs are viewed by some as more about ease-of-use than new capabilities.
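To make "hybrid keyword/semantic search" concrete, here is a small sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with an embedding-based ranking. The thread does not say which fusion method anyone uses, so RRF is swapped in purely for illustration; the function name and the sample document ids are hypothetical, and `k=60` is simply the conventional default for this technique.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents ranked well by multiple retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

RRF uses only rank positions, so it sidesteps the problem of keyword scores and vector similarities living on incomparable scales. It does not address the harder issues raised in the thread, such as filtered ANN or dynamic index updates.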
Other meta points
- Several readers want more concrete details: sharding, replication, failure handling, indexing latency, durability, and benchmarks.
- Minor side threads critique the marketing tone, the non-open-sourcing, FastText’s maintenance status, and the title’s inclusion of “Rust” (a language) alongside ES and MongoDB (databases).
- There is also a tangential but lively discussion about “in-office culture” as a benefit versus remote work preferences.