Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust)
Positioning vs Other Databases
- Positioned as a graph-first vector database for hybrid / Graph RAG, competing with systems like FalkorDB, Kuzu, SurrealDB, Neo4j, Memgraph, Cozo, Dgraph, Chroma, etc.
- Differentiators emphasized:
  - Tight integration of vectors with the graph (incremental indexing instead of separate, re-built vector indexes as with Kuzu).
  - On-disk HNSW index to reduce memory pressure compared to in-RAM approaches.
- Maintainers claim 1000× faster than Neo4j and 100× faster than TigerGraph on their internal benchmarks, and “much faster” than SurrealDB; several commenters are openly skeptical and request detailed, fair, reproducible benchmarks.
Query Language and LLM Friendliness
- Helix uses its own query language (HelixQL), described as functional and Gremlin-like, but more readable and type-safe.
- Some commenters dislike the bespoke DSL, preferring OpenCypher, GQL, GraphQL, or other standards to ease adoption and LLM query generation.
- Maintainers argue type safety and unified graph+vector semantics justify a new language, but acknowledge the learning curve and current LLM friction.
- Proposed mitigations:
  - Grammar-constrained decoding so LLMs emit syntactically valid HelixQL.
  - An MCP-style traversal tool so agents call graph operations instead of writing queries as text.
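
To illustrate the grammar-constrained decoding idea, here is a minimal sketch in Rust. It assumes a toy five-token grammar (`Tok`, `State`, and `step` are hypothetical and are not real HelixQL syntax) and uses per-step score arrays as a stand-in for LLM logits. At each step, tokens the grammar rejects are masked out and the best remaining token is chosen.

```rust
// Toy vocabulary for illustration only; this is NOT real HelixQL.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { Node, Out, In, Count, End }

// States of a tiny hand-written grammar: Node (Out | In)* (Count | End)
#[derive(Clone, Copy, Debug, PartialEq)]
enum State { Start, Traversing, Done }

const VOCAB: [Tok; 5] = [Tok::Node, Tok::Out, Tok::In, Tok::Count, Tok::End];

// Returns the next state if `tok` is legal in `state`, otherwise None.
fn step(state: State, tok: Tok) -> Option<State> {
    match (state, tok) {
        (State::Start, Tok::Node) => Some(State::Traversing),
        (State::Traversing, Tok::Out | Tok::In) => Some(State::Traversing),
        (State::Traversing, Tok::Count | Tok::End) => Some(State::Done),
        _ => None,
    }
}

// Greedy decoding with a grammar mask. `logits_per_step[i][j]` is the
// (stand-in) model score for VOCAB[j] at step i.
fn constrained_decode(logits_per_step: &[[f32; 5]]) -> Vec<Tok> {
    let mut state = State::Start;
    let mut out = Vec::new();
    for logits in logits_per_step {
        if state == State::Done {
            break;
        }
        // Drop every token the grammar rejects, then take the best survivor.
        let best = VOCAB
            .iter()
            .zip(logits.iter())
            .filter_map(|(&tok, &score)| step(state, tok).map(|next| (tok, score, next)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((tok, _, next)) => {
                out.push(tok);
                state = next;
            }
            None => break, // no legal token from this state
        }
    }
    out
}
```

Even when the raw scores favor an illegal token (for example `Out` before any `Node`), the mask forces a valid sequence. A real setup would apply the same masking to a tokenizer-level vocabulary and a full grammar, which is considerably more involved than this sketch.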
Architecture, Storage, and Performance Details
- Implemented in Rust, currently built on LMDB; planning a custom storage engine with in-memory + WASM support.
- Writes optimized via LMDB features (APPEND flags, duplicate keys) and UUIDv6 keys stored as `u128` for better locality and reduced space.
- Vectors currently stored as `Vec<f64>`; plan to support `f32` and fixed-size arrays plus binary quantization. No hard dimension cap yet, but likely ~64k in future.
- Sparse search: BM25 planned; commenters suggest SPLADE for non-English text.
- Core graph traversals are currently single-threaded; parallel LMDB iteration is in progress.
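
Two of the storage points above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not Helix's actual code: `time_ordered_id` is a simplified stand-in for a UUIDv6-style id (timestamp in the high bits), the big-endian key encoding is an assumption chosen because LMDB's default key comparison is byte-wise, and `binary_quantize` shows one common sign-based scheme for binary quantization.

```rust
// Simplified stand-in for a time-ordered id (NOT the exact UUIDv6 bit layout):
// timestamp in the high 64 bits, counter or random bits in the low 64.
fn time_ordered_id(timestamp: u64, low: u64) -> u128 {
    ((timestamp as u128) << 64) | low as u128
}

// Assumption: keys are written big-endian so that byte-wise comparison
// agrees with numeric (and therefore time) order. Sorted inserts are what
// an append-style write path relies on.
fn key_bytes(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

// Sign-based binary quantization: one bit per dimension,
// set when the component is positive.
fn binary_quantize(v: &[f32]) -> Vec<u64> {
    let mut bits = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

// Hamming distance between two quantized vectors of equal length.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

For a sense of scale, a hypothetical 1,536-dimension vector takes 12,288 bytes as `f64`, 6,144 bytes as `f32`, and 192 bytes after binary quantization. A 16-byte binary key is also less than half the size of the 36-character text form of a UUID.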
Use Cases, Scalability, and Graph Features
- Targeted at Graph/Hybrid RAG and knowledge graphs; some users report large speedups moving graph workloads from Postgres.
- Reported tests up to ~10B edges and ~50M nodes without issues; no published comparative scaling benchmarks yet.
- Commenters ask which standard graph algorithms are covered (those needed for GraphRAG, centralities, etc.), but the thread does not fully answer them.
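
For readers unfamiliar with the Graph/Hybrid RAG pattern mentioned above, here is a rough sketch of the retrieval step in Rust. Everything in it is hypothetical (`Node`, `cosine`, and `hybrid_retrieve` are illustration names, not Helix APIs), and a brute-force scan stands in for an ANN index such as HNSW. The idea is to find the top-k nodes by vector similarity, then widen the result with their graph neighbors.

```rust
use std::collections::BTreeSet;

// Hypothetical in-memory node: an embedding plus outgoing edges (node indices).
struct Node {
    embedding: Vec<f32>,
    neighbors: Vec<usize>,
}

// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Step 1: rank all nodes by similarity to the query (brute force here).
// Step 2: keep the top k and add each hit's one-hop neighbors.
fn hybrid_retrieve(nodes: &[Node], query: &[f32], k: usize) -> BTreeSet<usize> {
    let mut scored: Vec<(usize, f32)> = nodes
        .iter()
        .enumerate()
        .map(|(i, n)| (i, cosine(&n.embedding, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest similarity first

    let mut result = BTreeSet::new();
    for &(i, _) in scored.iter().take(k) {
        result.insert(i);
        result.extend(nodes[i].neighbors.iter().copied());
    }
    result
}
```

The graph expansion can surface nodes that vector similarity alone would rank last, which is the main argument for keeping vectors and edges in one store.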
Licensing, Deployment, and Roadmap
- Licensed AGPL-3.0: self-hosting is free; closed-source users are expected to pay for a commercial license. Some see this as a blocker for proprietary products.
- Future plans include: own storage engine, WASM/browser support, custom model endpoints, better benchmarks, horizontal/multi-region scaling, and more robust query compilation.
Miscellaneous
- Name collision with the Helix editor and a historic “Helix” database sparks mild confusion, but is treated as a minor issue.
- Browser-side usage via WASM is requested; LMDB currently blocks this, but origin-private file system APIs and an in-memory engine are being explored.