2025-05-13

Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust)

Positioning vs Other Databases

HelixDB is positioned as a graph-first vector database for hybrid / Graph RAG workloads, competing with systems such as FalkorDB, Kuzu, SurrealDB, Neo4j, Memgraph, Cozo, Dgraph, and Chroma.
Differentiators emphasized:
- Tight integration of vectors with the graph: vectors are indexed incrementally, rather than living in a separate vector index that has to be rebuilt (the maintainers' characterization of Kuzu).
- On-disk HNSW index to reduce memory pressure compared to in-RAM approaches.
Maintainers claim HelixDB is 1000× faster than Neo4j and 100× faster than TigerGraph on their internal benchmarks, and “much faster” than SurrealDB. Several commenters are openly skeptical and ask for detailed, fair, reproducible benchmarks.

Query Language and LLM Friendliness

Helix uses its own query language (HelixQL), described as functional (Gremlin-like) but more readable, and type-safe.
Some commenters dislike the bespoke DSL, preferring OpenCypher, GQL, GraphQL, or other standards to ease adoption and LLM query generation.
Maintainers argue type safety and unified graph+vector semantics justify a new language, but acknowledge the learning curve and current LLM friction.
Proposed mitigations:
- Grammar-constrained decoding so LLMs emit syntactically valid HelixQL.
- An MCP-style traversal tool so agents call graph operations instead of writing queries as text.
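
The second mitigation can be illustrated with a toy sketch: instead of generating HelixQL text, an agent emits a short list of structured steps that a tool executes. Everything here (`Step`, `ToyGraph`, `run`) is hypothetical and invented for illustration; it is not HelixDB's or MCP's actual API, and the in-memory edge list stands in for a real graph store.

```rust
/// Hypothetical structured traversal steps an agent could emit as tool calls
/// instead of writing a query string. Not part of HelixDB.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    /// Follow outgoing edges carrying this label.
    Out(String),
    /// Follow incoming edges carrying this label.
    In(String),
    /// Keep at most this many nodes in the current frontier.
    Limit(usize),
}

/// Toy in-memory graph: each edge is (from, label, to).
struct ToyGraph {
    edges: Vec<(u32, String, u32)>,
}

impl ToyGraph {
    /// Apply the steps in order, starting from a single node id,
    /// and return the resulting frontier of node ids.
    fn run(&self, start: u32, steps: &[Step]) -> Vec<u32> {
        let mut frontier = vec![start];
        for step in steps {
            let next: Vec<u32> = match step {
                Step::Out(label) => self
                    .edges
                    .iter()
                    .filter(|e| frontier.contains(&e.0) && &e.1 == label)
                    .map(|e| e.2)
                    .collect(),
                Step::In(label) => self
                    .edges
                    .iter()
                    .filter(|e| frontier.contains(&e.2) && &e.1 == label)
                    .map(|e| e.0)
                    .collect(),
                Step::Limit(n) => frontier.iter().copied().take(*n).collect(),
            };
            frontier = next;
        }
        frontier
    }
}
```

Because every step is a closed enum variant, a malformed request fails at deserialization rather than at query-parse time, which is the property the thread is after when it suggests tools over free-text queries.
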

Architecture, Storage, and Performance Details

Implemented in Rust and currently built on LMDB; a custom storage engine with in-memory and WASM support is planned.
Writes optimized via LMDB features (APPEND flags, duplicate keys) and UUIDv6 keys stored as u128 for better locality and reduced space.
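
A minimal sketch of why time-ordered UUIDs stored as a `u128` help locality, assuming keys are serialized big-endian and compared byte-wise (LMDB's default ordering). The function names are made up for illustration and the bit layout follows the UUIDv6 field order from RFC 9562; this is not HelixDB's actual key code.

```rust
/// Pack UUIDv6 fields into a u128 (layout per RFC 9562):
/// 48 bits time_high+time_mid | 4 bits version | 12 bits time_low |
/// 2 bits variant | 14 bits clock_seq | 48 bits node.
/// `timestamp_100ns` is the 60-bit count of 100 ns ticks since 1582-10-15.
fn uuid_v6_as_u128(timestamp_100ns: u64, clock_seq: u16, node: u64) -> u128 {
    let ts = (timestamp_100ns & 0x0FFF_FFFF_FFFF_FFFF) as u128; // keep 60 bits
    let time_high_mid = ts >> 12; // most significant 48 bits of the timestamp
    let time_low = ts & 0xFFF; // least significant 12 bits
    let clock_seq = (clock_seq & 0x3FFF) as u128; // 14 bits
    let node = (node & 0xFFFF_FFFF_FFFF) as u128; // 48 bits

    (time_high_mid << 80)
        | (0x6u128 << 76) // version 6
        | (time_low << 64)
        | (0b10u128 << 62) // RFC variant
        | (clock_seq << 48)
        | node
}

/// Big-endian bytes: byte-wise lexicographic order equals numeric order,
/// so newer ids sort after older ones in a B-tree keyed on these bytes.
fn key_bytes(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}
```

Since the timestamp occupies the most significant bits, ids generated later compare greater, so new keys land at the end of the key space. That is the condition under which append-style writes are usable. The 16-byte binary form is also less than half the size of the 36-character text form.
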
Vectors are currently stored as Vec<f64>; support for f32, fixed-size arrays, and binary quantization is planned. There is no hard dimension cap yet, but a limit of roughly 64k dimensions is likely in the future.
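
For a sense of what binary quantization buys: a d-dimensional vector takes 8d bytes as f64, 4d bytes as f32, and d/8 bytes at one bit per dimension. The sketch below uses the common sign-bit scheme with Hamming distance as the coarse comparison; which scheme HelixDB will adopt is not stated in the thread, so both function names and the thresholding rule are assumptions.

```rust
/// Sign-bit quantization: bit i is 1 when component i is positive.
/// Bits are packed 64 per word, lowest index in the lowest bit.
fn binary_quantize(v: &[f64]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two quantized vectors of equal dimension:
/// the number of dimensions whose sign bits differ.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

A typical use is to rank candidates cheaply by Hamming distance and then rescore a short list with the full-precision vectors.
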
Sparse search: BM25 planned; commenters suggest SPLADE for non-English text.
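
For reference, a sketch of the textbook BM25 score for one query term in one document, using the non-negative IDF variant popularized by Lucene. BM25 is only planned for HelixDB, so this is not its implementation; the function name and the `k1 = 1.2`, `b = 0.75` values in the usage note are conventional defaults assumed here.

```rust
/// BM25 contribution of a single term to a single document's score.
/// - `tf`: occurrences of the term in the document
/// - `doc_len` / `avg_doc_len`: this document's length and the corpus average
/// - `n_docs`: number of documents in the corpus
/// - `doc_freq`: number of documents containing the term
/// - `k1`, `b`: term-frequency saturation and length-normalization parameters
fn bm25_term_score(
    tf: f64,
    doc_len: f64,
    avg_doc_len: f64,
    n_docs: f64,
    doc_freq: f64,
    k1: f64,
    b: f64,
) -> f64 {
    // Rare terms get a larger weight; the +1 keeps the IDF non-negative.
    let idf = ((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0).ln();
    // Term frequency saturates, and longer-than-average documents are penalized.
    let norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len);
    idf * tf * (k1 + 1.0) / norm
}
```

A document's score for a query is the sum of this value over the query's terms, for example `bm25_term_score(tf, len, avg_len, n, df, 1.2, 0.75)` per term.
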
Core graph traversals are currently single-threaded; parallel LMDB iteration is in progress.

Use Cases, Scalability, and Graph Features

Targeted at Graph/Hybrid RAG and knowledge graphs; some users report large speedups moving graph workloads from Postgres.
Reported tests up to ~10B edges and ~50M nodes without issues; no published comparative scaling benchmarks yet.
Commenters ask which standard graph algorithms are covered (for GraphRAG, centrality measures, etc.); these questions are not answered in detail.

Licensing, Deployment, and Roadmap

Licensed AGPL-3.0: self-hosting is free; closed-source users are expected to pay for a commercial license. Some see this as a blocker for proprietary products.
Future plans include a custom storage engine, WASM/browser support, custom model endpoints, better benchmarks, horizontal/multi-region scaling, and more robust query compilation.

Miscellaneous

The name collides with the Helix editor and a historic “Helix” database, which causes mild confusion but is treated as a minor issue.
Browser-side usage via WASM is requested; LMDB currently blocks this, but origin-private file system APIs and an in-memory engine are being explored.
