Show HN: We open sourced our entire text-to-SQL product
Project capabilities & architecture
- Handles multi-table schemas and joins via a schema-linking step that identifies relevant tables, columns, and foreign keys.
- Can be fine-tuned on NL↔SQL pairs or used with few-shot prompting; keeps a catalog of schema details and low-cardinality values.
- Uses an agent loop: collects context (schema, docs, query history), generates SQL, executes a limited-result trial query to catch errors, and retries.
- Produces a confidence score for each query.
- LLM-agnostic: works with hosted APIs (e.g., OpenAI) and self-hosted models (benchmarked with Mixtral for tool selection and CodeLlama for code generation).
- Supports several vector stores (Chroma, Pinecone, Astra) behind an extensible interface.
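The generate, trial-run, retry loop described in this list can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_with_retries`, `fake_generator`, and the `customers` table are hypothetical names, SQLite stands in for the target database, and a stub function stands in for the LLM call.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# Hypothetical signature for the model call: (question, last_error) -> SQL text.
Generator = Callable[[str, Optional[str]], str]


def run_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Generator,
    max_attempts: int = 3,
    trial_rows: int = 5,
) -> Tuple[str, int]:
    """Generate SQL, trial-run it with a row cap, and retry on database errors."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, error)
        # Wrap the candidate in a subquery so the trial run returns few rows.
        candidate = sql.strip().rstrip(";")
        trial = f"SELECT * FROM ({candidate}) LIMIT {trial_rows}"
        try:
            conn.execute(trial).fetchall()
        except sqlite3.Error as exc:
            # Feed the database error back into the next generation attempt.
            error = str(exc)
            continue
        return sql, attempt
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


def fake_generator(question: str, last_error: Optional[str]) -> str:
    """Stand-in for an LLM: the first draft misspells a column, the second is valid."""
    if last_error is None:
        return "SELECT nme FROM customers"
    return "SELECT name FROM customers ORDER BY name"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])

sql, attempts = run_with_retries(conn, "List customer names", fake_generator)
```

With the stub above, the first draft fails the trial run because of the misspelled column, and the second draft passes, so `attempts` ends up as 2. A trial run of this kind only catches errors the database itself reports; a query that executes cleanly but joins the wrong tables would still pass.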
Accuracy, complexity, and limitations
- Several commenters say off-the-shelf LLMs handle only simple, single-table queries; this project aims at “enterprise-grade” schemas with complex joins and aggregations.
- Others argue syntax correctness is mostly solved by large models + simple retry loops; harder problems are schema understanding, naming quirks, and business semantics.
- Some express concern about rare but serious join mistakes and lack of formal guarantees; others counter that human analysts also err and that expectations for AI are unrealistically high.
- Latency is reported at around 20–30 seconds, which is acceptable for some production use cases and a blocker for others.
Security, governance, and deployment
- Tool blocks DML/DDL statements by design.
- For multi-tenant security, maintainers strongly recommend database row-level security; some find this too complex, especially on systems without native RLS.
- Data governance is reported as the main enterprise blocker: concerns about off-prem data, LLM training, and regulatory compliance.
- Open sourcing plus self-hosting is framed as a way to alleviate governance and trust concerns.
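As an illustration of what blocking DML/DDL can look like, here is a minimal keyword-based guard. It is an assumption about one possible approach, not the project's implementation: `is_read_only` and the `FORBIDDEN` set are hypothetical, and the check is a conservative text filter rather than a real SQL parser, so it may reject some legitimate queries (for example, ones that use a forbidden word as a quoted identifier).

```python
import re

# Statement keywords that modify data or schema (illustrative, not exhaustive).
FORBIDDEN = {
    "INSERT", "UPDATE", "DELETE", "MERGE",
    "CREATE", "ALTER", "DROP", "TRUNCATE",
    "GRANT", "REVOKE",
}

# Single-quoted string literals, line comments, and block comments.
_STRING_OR_COMMENT = re.compile(r"'(?:[^']|'')*'|--[^\n]*|/\*.*?\*/", re.DOTALL)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that looks like a plain read."""
    # Drop literals and comments so their contents cannot trigger or hide keywords.
    stripped = _STRING_OR_COMMENT.sub(" ", sql).strip().rstrip(";").strip()
    # Reject empty input and anything that still contains a statement separator.
    if not stripped or ";" in stripped:
        return False
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", stripped.upper())
    # The statement must start as a query (SELECT, or WITH for CTEs).
    if not words or words[0] not in {"SELECT", "WITH"}:
        return False
    # No modifying keyword may appear anywhere, e.g. inside a CTE.
    return FORBIDDEN.isdisjoint(words)
```

A filter like this is best treated as one layer of defense. Connecting with a database role that has only read permissions, together with the row-level security the maintainers recommend, enforces the same rule inside the database itself.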
Open-sourcing, licensing, and business model
- The maintainers say the entire stack is open source under Apache 2.0.
- Discussion about how to build a sustainable business on a fully open stack; comparisons to other OSS companies and fears of competitors rehosting cheaply.
- Some speculate open-sourcing can be an end-of-runway move; others see it as a traction and enterprise-adoption strategy.
- Maintainers state they intend to keep maintaining the project.
Target users, use cases, and reception
- Primary target is developers embedding text-to-SQL in SaaS products (e.g., analytics in CRM, payroll, customer support tools), not replacing internal analysts.
- Debate over whether users should “just learn SQL”; others point to clear demand for natural-language access, especially to reduce ad-hoc requests to data teams.
- Mixed sentiment: many express excitement and see it as historically important; others remain skeptical about reliability, safety, and real-world business value.