Show HN: We open sourced our entire text-to-SQL product

Project capabilities & architecture

  • Handles multi-table schemas and joins via a schema-linking step that identifies relevant tables, columns, and foreign keys.
  • Can be fine-tuned on NL↔SQL pairs or use few-shot prompting; keeps a catalog of schema details and of values from low-cardinality columns.
  • Uses an agent loop: collects context (schema, docs, query history), generates SQL, executes a limited-result trial query to catch errors, and retries on failure.
  • Produces a confidence score for each query.
  • LLM-agnostic: works with hosted APIs (e.g., OpenAI) and self-hosted models (benchmarked with Mixtral for tool selection and CodeLlama for code generation).
  • Uses vector stores (Chroma, Pinecone, Astra) behind an extensible interface.
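The thread does not include the project's code, but the agent loop described above (collect context, generate, trial-run with a row limit, retry) can be sketched in a few lines. This is a minimal illustration assuming SQLite as the target database; `describe_schema`, `trial_run`, `answer`, and `fake_generator` are hypothetical names, and `fake_generator` stands in for the LLM call. The trial wraps the candidate in a `LIMIT`ed subquery, which assumes it is a single SELECT.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical generator signature: (question, schema_context, last_error) -> SQL.
# In a real system this would wrap an LLM call; here it is any Python callable.
SqlGenerator = Callable[[str, str, Optional[str]], str]


def describe_schema(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements to use as context for generation."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(r[0] for r in rows)


def trial_run(conn: sqlite3.Connection, sql: str, limit: int = 5) -> List[tuple]:
    """Run the candidate with a row cap so errors surface cheaply.

    Wrapping in a subquery assumes the candidate is a single SELECT.
    """
    body = sql.strip().rstrip(";")
    return conn.execute(f"SELECT * FROM ({body}) LIMIT {int(limit)}").fetchall()


def answer(conn: sqlite3.Connection, question: str, generate: SqlGenerator,
           max_attempts: int = 3) -> Tuple[str, List[tuple], int]:
    schema = describe_schema(conn)
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, schema, error)
        try:
            return sql, trial_run(conn, sql), attempt
        except sqlite3.Error as exc:
            error = str(exc)  # fed back to the generator on the next attempt
    raise RuntimeError(f"no runnable SQL after {max_attempts} attempts: {error}")


# --- usage with a stub generator that gets it wrong once ---
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")


def fake_generator(question: str, schema: str, error: Optional[str]) -> str:
    if error is None:
        # First guess references a column that only exists on customers.
        return "SELECT name, SUM(amount) FROM orders GROUP BY name"
    # Second guess uses the foreign key visible in the schema context.
    return ("SELECT c.name, SUM(o.amount) AS total FROM orders o "
            "JOIN customers c ON c.id = o.customer_id GROUP BY c.name")


sql, rows, attempts = answer(conn, "Total order amount per customer?", fake_generator)
print(attempts, sorted(rows))  # 2 [('Ada', 15.0), ('Bo', 7.5)]
```

Feeding the database's error message back into the next generation step is what lets the second attempt fix the missing column; capping `max_attempts` keeps the loop's latency bounded.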

Accuracy, complexity, and limitations

  • Several commenters say off-the-shelf LLMs handle only simple, single-table queries; this project aims at “enterprise-grade” schemas with complex joins and aggregations.
  • Others argue syntax correctness is mostly solved by large models + simple retry loops; harder problems are schema understanding, naming quirks, and business semantics.
  • Some express concern about rare but serious join mistakes and lack of formal guarantees; others counter that human analysts also err and that expectations for AI are unrealistically high.
  • Reported latency is around 20–30 seconds, which is acceptable for some production use cases and a blocker for others.

Security, governance, and deployment

  • The tool blocks DML/DDL statements (e.g., INSERT, UPDATE, DELETE, DROP) by design.
  • For multi-tenant security, maintainers strongly recommend database row-level security (RLS); some find this too complex, especially on systems without native RLS.
  • Data governance is reported as the main enterprise blocker: concerns about data leaving company premises, data being used for LLM training, and regulatory compliance.
  • Open sourcing plus self-hosting is framed as a way to alleviate governance and trust concerns.
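The thread does not say how the tool blocks DML/DDL. One way to enforce that below the prompt layer, sketched here assuming SQLite as the target, is SQLite's authorizer callback, exposed in Python as `Connection.set_authorizer`. It allowlists the actions a read-only query needs and denies everything else while the statement is compiled, so a generated `DELETE` or `DROP` never runs. This is a swapped-in technique rather than the project's mechanism, `read_only_authorizer` and `is_blocked` are hypothetical names, and it does not replace row-level security: per-tenant row filtering still has to happen in the database.

```python
import sqlite3

# Authorizer action codes a read-only query needs; everything else
# (INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, ATTACH, PRAGMA,
# BEGIN/COMMIT, ...) is denied.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def read_only_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    # SQLite calls this while compiling each statement; DENY aborts compilation.
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def is_blocked(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if executing `sql` on the guarded connection raises a DB error."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (5.0,)])
conn.commit()

# Install the guard only after setup; from here on the connection is read-only.
conn.set_authorizer(read_only_authorizer)

print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchall())  # [(2, 15.0)]
for stmt in ("DELETE FROM orders", "DROP TABLE orders",
             "INSERT INTO orders (amount) VALUES (1.0)"):
    print(stmt, "-> blocked:", is_blocked(conn, stmt))
```

Allowlisting actions is sturdier than checking whether the text starts with `SELECT` or `WITH`, since SQLite (like PostgreSQL) accepts a `WITH ... DELETE` statement that such a string check would let through.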

Open-sourcing, licensing, and business model

  • The entire stack is claimed to be open source under Apache 2.0.
  • Discussion of how to build a sustainable business on a fully open stack, with comparisons to other OSS companies and fears that competitors could cheaply rehost the project.
  • Some speculate open-sourcing can be an end-of-runway move; others see it as a traction and enterprise-adoption strategy.
  • Maintainers state they intend to keep maintaining the project.

Target users, use cases, and reception

  • Primary target is developers embedding text-to-SQL in SaaS products (e.g., analytics in CRM, payroll, customer support tools), not replacing internal analysts.
  • Debate over whether users should "just learn SQL," set against clear demand for natural-language access, especially to reduce ad-hoc requests to data teams.
  • Mixed sentiment: many express excitement and see it as historically important; others remain skeptical about reliability, safety, and real-world business value.