SQLite JSON at full index speed using generated columns
Indexing JSON in SQLite: generated columns vs expression indexes
- Several commenters note that SQLite already supports “index on expression”, e.g. creating an index directly on json_extract(...), which can avoid generated columns entirely.
- Others argue generated (virtual) columns are safer and more self-documenting: they guarantee the index is used, whereas expression indexes are fragile and can be bypassed by small query changes (different JSON operators, quoting, or path syntax). Both approaches are sketched after this list.
- Some point out that views plus expression indexes can give the convenience of columns without materializing them, similar in spirit to the article’s technique.
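To make the comparison concrete, here is a minimal sketch of both approaches using Python's built-in sqlite3 module; the table and column names (events, body, user_id) are illustrative rather than taken from the article, and generated columns require SQLite 3.31+.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # generated columns need SQLite >= 3.31

# Approach 1: a virtual generated column mirroring one JSON field, plus an
# ordinary index on it. Queries name the column, so they cannot miss the index.
con.executescript("""
    CREATE TABLE events (
        id      INTEGER PRIMARY KEY,
        body    TEXT NOT NULL CHECK (json_valid(body)),
        user_id TEXT GENERATED ALWAYS AS (json_extract(body, '$.user_id')) VIRTUAL
    );
    CREATE INDEX events_user_id ON events(user_id);
""")

# Approach 2: an expression index straight on the JSON path, with no extra
# column; the query must repeat the same expression for the planner to use it.
con.execute(
    "CREATE INDEX events_user_id_expr ON events(json_extract(body, '$.user_id'))"
)

con.execute("INSERT INTO events (body) VALUES (?)",
            ('{"user_id": "u1", "kind": "click"}',))

# Generated-column query: served by the index on user_id.
print(con.execute("SELECT id FROM events WHERE user_id = 'u1'").fetchall())

# Expression query: only matches events_user_id_expr if written exactly as
# indexed; switching operators or path spelling can silently bypass it.
print(con.execute(
    "SELECT id FROM events WHERE json_extract(body, '$.user_id') = 'u1'"
).fetchall())
```

The practical difference the thread dwells on: the first query names a real column and so cannot miss the index, while the second only hits its index when the expression is spelled exactly as it was indexed.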
Reliability, schema design, and constraints
- Using JSON with generated columns is seen as common practice in multiple databases (SQLite, Postgres, SingleStore, SQL Server pre-native-JSON).
- Generated columns (or expression indexes) make it possible to index JSON fields and even enforce foreign key constraints when keys are buried inside JSON, though this may still require separate columns in some systems (see the sketch after this list).
- A few commenters like the pattern of exclusively querying via computed columns so it’s impossible to accidentally write an unindexed JSON query.
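A hedged sketch of the foreign-key point, assuming SQLite will enforce a foreign key declared on a virtual generated column (as the comments note, some systems may still need a real, separate column); the users/orders schema is purely illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

con.executescript("""
    CREATE TABLE users (
        id TEXT PRIMARY KEY
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        body    TEXT NOT NULL CHECK (json_valid(body)),
        -- the key lives inside the JSON document; the generated column surfaces
        -- it so a foreign key and an index can be attached to it
        user_id TEXT GENERATED ALWAYS AS (json_extract(body, '$.user_id')) VIRTUAL,
        FOREIGN KEY (user_id) REFERENCES users(id)
    );
    CREATE INDEX orders_user_id ON orders(user_id);
""")

con.execute("INSERT INTO users (id) VALUES ('u1')")
con.execute("INSERT INTO orders (body) VALUES (?)", ('{"user_id": "u1"}',))  # ok

try:
    # points at a user that does not exist, so the FK check should reject it
    con.execute("INSERT INTO orders (body) VALUES (?)", ('{"user_id": "nobody"}',))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```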
JSON vs normalized data
- There is tension between fully normalizing data and using JSON(B) columns.
- Critics of JSON columns highlight difficulty in indexing, enforcing constraints, handling schema migrations, and potential overhead compared to normalized tables.
- Defenders argue JSON shines when:
  - Data is tree-shaped or highly nested.
  - External APIs or heterogeneous record types would require many relational tables.
  - Application code has richer type systems and handles migrations (e.g., via Zod or language-level types).
- Some describe hybrid models: core relational columns plus a JSON “bag of attributes” for rarely queried or highly variable fields.
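A small sketch of that hybrid shape (the products table and attrs field are illustrative): stable, frequently queried fields get ordinary columns, while the variable long tail stays in a JSON bag that can later be promoted to an indexed generated column if it turns out to be hot.

```python
import sqlite3

con = sqlite3.connect(":memory:")

con.executescript("""
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        sku         TEXT NOT NULL UNIQUE,       -- core, always-present fields
        price_cents INTEGER NOT NULL,
        attrs       TEXT NOT NULL DEFAULT '{}'  -- rarely queried, variable fields
                    CHECK (json_valid(attrs))
    );
""")

con.execute(
    "INSERT INTO products (sku, price_cents, attrs) VALUES (?, ?, ?)",
    ("CHAIR-1", 4999, '{"color": "oak", "dimensions": {"w": 45, "d": 50}}'),
)

# Ad-hoc reads reach into the bag; a field that becomes hot can later be exposed
# as an indexed (virtual) generated column without rewriting existing rows.
for row in con.execute(
    "SELECT sku, price_cents, json_extract(attrs, '$.color') AS color FROM products"
):
    print(row)
```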
Performance, alternatives, and missing evidence
- Commenters appreciate the trick and its simplicity, especially for small or embedded apps graduating from raw JSON files to SQLite.
- One person questions the title’s “full speed” claim, noting the lack of explicit benchmarks or query plans in the article; a quick EXPLAIN QUERY PLAN check is sketched after this list.
- DuckDB is raised as a better fit for heavy analytical workloads over large JSON datasets, while SQLite remains favored for embedded and systems use.
- There’s interest in more powerful JSON indexing (e.g., multi-valued indexes for JSON arrays) and Postgres-style GIN indexes; SQLite currently lacks these.
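On the missing query plans: a reader can check the claim locally with SQLite's EXPLAIN QUERY PLAN, which reports whether a statement searches an index or scans the table. A minimal sketch, reusing the illustrative events schema from above (the ->> operator needs SQLite 3.38+):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (
        id      INTEGER PRIMARY KEY,
        body    TEXT NOT NULL,
        user_id TEXT GENERATED ALWAYS AS (json_extract(body, '$.user_id')) VIRTUAL
    );
    CREATE INDEX events_user_id ON events(user_id);
""")

# Expect something like: SEARCH events USING INDEX events_user_id (user_id=?)
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ?", ("u1",)
):
    print(row)

# The same predicate written against the raw JSON may instead report
# SCAN events, which is the fragility the generated-column camp warns about.
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE body ->> '$.user_id' = ?", ("u1",)
):
    print(row)
```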
Related formats and systems
- Comparisons are drawn to older XML “document store” databases and to MongoDB / JSONB-style storage.
- A side thread discusses Lite³, a serialized B-tree format for JSON-like data that supports zero-copy queries and in-place updates, contrasted with Postgres JSONB’s immutable, server-bound design.