Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust)

Positioning vs Other Databases

  • Positioned as a graph-first vector database for hybrid / Graph RAG, competing with systems like FalkorDB, Kuzu, SurrealDB, Neo4j, Memgraph, Cozo, Dgraph, Chroma, etc.
  • Differentiators emphasized:
    • Tight integration of vectors with the graph (incremental indexing instead of separate, re-built vector indexes as with Kuzu).
    • On-disk HNSW index to reduce memory pressure compared to in-RAM approaches.
  • Maintainers claim 1000× faster than Neo4j and 100× faster than TigerGraph on their internal benchmarks, and “much faster” than SurrealDB; several commenters are openly skeptical and request detailed, fair, reproducible benchmarks.

Query Language and LLM Friendliness

  • Helix uses its own query language (HelixQL), described as a functional, Gremlin-like language that is more readable and type-safe.
  • Some commenters dislike the bespoke DSL, preferring OpenCypher, GQL, GraphQL, or other standards to ease adoption and LLM query generation.
  • Maintainers argue type safety and unified graph+vector semantics justify a new language, but acknowledge the learning curve and current LLM friction.
  • Proposed mitigations:
    • Grammar-constrained decoding so LLMs emit syntactically valid HelixQL.
    • An MCP-style traversal tool so agents call graph operations instead of writing queries as text.
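The second mitigation can be illustrated with a minimal sketch. Instead of emitting query text, an agent sends a list of typed traversal steps, so a malformed query cannot be expressed in the first place. Everything here (the `Step` enum, the `Graph` type, and the `run` method) is hypothetical and written only for illustration; it is not HelixDB's API or HelixQL.

```rust
use std::collections::{HashMap, HashSet};

/// One traversal step an agent may request (hypothetical tool schema).
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    /// Follow outgoing edges carrying this label.
    Out(String),
    /// Keep only nodes whose kind equals this value.
    HasKind(String),
    /// Keep at most this many nodes.
    Limit(usize),
}

/// Tiny in-memory graph standing in for a real database.
#[derive(Default)]
pub struct Graph {
    kinds: HashMap<u64, String>,
    out_edges: HashMap<u64, Vec<(String, u64)>>,
}

impl Graph {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add_node(&mut self, id: u64, kind: &str) {
        self.kinds.insert(id, kind.to_string());
    }

    pub fn add_edge(&mut self, from: u64, label: &str, to: u64) {
        self.out_edges
            .entry(from)
            .or_default()
            .push((label.to_string(), to));
    }

    /// Apply the steps in order, starting from a single node.
    /// Returns the final frontier, or an error if the start node is unknown.
    pub fn run(&self, start: u64, steps: &[Step]) -> Result<Vec<u64>, String> {
        if !self.kinds.contains_key(&start) {
            return Err(format!("unknown start node {start}"));
        }
        let mut frontier = vec![start];
        for step in steps {
            match step {
                Step::Out(label) => {
                    // Collect distinct targets, preserving discovery order.
                    let mut seen = HashSet::new();
                    let mut next = Vec::new();
                    for id in &frontier {
                        for (edge_label, target) in
                            self.out_edges.get(id).into_iter().flatten()
                        {
                            if edge_label == label && seen.insert(*target) {
                                next.push(*target);
                            }
                        }
                    }
                    frontier = next;
                }
                Step::HasKind(kind) => {
                    frontier.retain(|id| self.kinds.get(id) == Some(kind));
                }
                Step::Limit(n) => frontier.truncate(*n),
            }
        }
        Ok(frontier)
    }
}
```

In a tool-calling setup, the agent would fill in a JSON array matching the `Step` variants, and the server would deserialize and run it. Grammar-constrained decoding targets the same failure mode from the other side, by restricting the tokens the model may emit while it writes query text.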

Architecture, Storage, and Performance Details

  • Implemented in Rust, currently built on LMDB; planning a custom storage engine with in-memory + WASM support.
  • Writes optimized via LMDB features (APPEND flags, duplicate keys) and UUIDv6 keys stored as u128 for better locality and reduced space.
  • Vectors are currently stored as Vec<f64>; support for f32, fixed-size arrays, and binary quantization is planned. There is no hard dimension cap yet, though a limit of roughly 64k is likely in the future.
  • Sparse search: BM25 planned; commenters suggest SPLADE for non-English text.
  • Core graph traversals are currently single-threaded; parallel LMDB iteration is in progress.
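To make the key-locality point above concrete, here is a small sketch of why a UUIDv6 held as a u128 helps. UUIDv6 places the most significant timestamp bits first, so numeric order follows creation time, and the big-endian byte form sorts the same way under a byte-wise key comparison such as LMDB's default. The function names are hypothetical, the caller supplies the timestamp, and how HelixDB actually encodes its keys is an assumption here.

```rust
/// Pack a 60-bit timestamp, 14-bit clock sequence, and 48-bit node id
/// into the UUIDv6 bit layout (RFC 9562), returned as a u128.
pub fn uuid_v6(timestamp: u64, clock_seq: u16, node: u64) -> u128 {
    let ts = (timestamp & 0x0FFF_FFFF_FFFF_FFFF) as u128; // 60 bits
    let time_high_mid = ts >> 12; // upper 48 bits of the timestamp
    let time_low = ts & 0xFFF; // lower 12 bits of the timestamp
    let clock = (clock_seq & 0x3FFF) as u128; // 14 bits
    let node = (node & 0xFFFF_FFFF_FFFF) as u128; // 48 bits

    (time_high_mid << 80)      // bits 127..80
        | (0x6_u128 << 76)     // version 6, bits 79..76
        | (time_low << 64)     // bits 75..64
        | (0b10_u128 << 62)    // RFC variant, bits 63..62
        | (clock << 48)        // bits 61..48
        | node                 // bits 47..0
}

/// Big-endian bytes: lexicographic byte order equals numeric order,
/// so newer ids land at the end of a byte-sorted key space.
pub fn key_bytes(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}
```

Because each new key sorts after the previous one, inserts arrive in key order, which is the precondition for LMDB's append mode. The binary form also takes 16 bytes per key, compared with 36 characters for the hyphenated text form.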
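The planned binary quantization mentioned above can be sketched as follows. This is a generic sign-bit scheme, not HelixDB's implementation: each dimension becomes one bit (1 if the component is positive), bits are packed into u64 words, and similarity is approximated by Hamming distance. The threshold at zero and the function names are assumptions.

```rust
/// Quantize a vector to one bit per dimension: 1 if the component is > 0.
/// Bits are packed into u64 words, lowest bit first.
pub fn binary_quantize(v: &[f64]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two quantized vectors of equal length:
/// the number of dimensions whose sign bits differ.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

For a sense of scale, a 1536-dimensional vector takes 12,288 bytes as f64, 6,144 bytes as f32, and 192 bytes in this one-bit form, at the cost of a coarser distance estimate.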

Use Cases, Scalability, and Graph Features

  • Targeted at Graph/Hybrid RAG and knowledge graphs; some users report large speedups moving graph workloads from Postgres.
  • Reported tests up to ~10B edges and ~50M nodes without issues; no published comparative scaling benchmarks yet.
  • Questions about coverage of standard graph algorithms (for GraphRAG, centralities, etc.) are raised but not answered in detail.

Licensing, Deployment, and Roadmap

  • Licensed AGPL-3.0: self-hosting is free; closed-source users are expected to pay for a commercial license. Some see this as a blocker for proprietary products.
  • Future plans include: own storage engine, WASM/browser support, custom model endpoints, better benchmarks, horizontal/multi-region scaling, and more robust query compilation.

Miscellaneous

  • Name collision with the Helix editor and a historic “Helix” database sparks mild confusion, but is treated as a minor issue.
  • Browser-side usage via WASM is requested; LMDB currently blocks this, but origin-private file system APIs and an in-memory engine are being explored.