Google's new pipe syntax in SQL
Topic drift: SQL pipes vs PDF-to-HTML
- Several commenters note the HN title is misleading relative to the article, which quickly pivots from the Google SQL paper to using an LLM to turn PDFs into HTML/Markdown.
- Some consider the PDF conversion demo more interesting than the SQL syntax itself; others are primarily there for the SQL proposal.
Pipe and FROM‑first SQL syntax
- Many like the FROM‑first, piped style for complex analytical queries: it matches execution order, reads like a dataflow, and makes autocomplete and incremental query building easier.
- Reported advantages: easier refactoring, multiple WHERE stages (pre/post aggregation), more natural mental model ("chain of filters and transforms"). One person refactored a ~500‑line, 20‑table query and preferred the new style.
- Skeptics argue SELECT‑first improves legibility and troubleshooting because the projection and source tables are visible immediately. They question the need to change a 50‑year‑old, widely understood syntax.
- There’s concern about nonstandard extensions; one embedded‑DB maintainer is explicitly waiting for the SQL standard and major engines (e.g., Postgres) to adopt FROM‑first syntax before committing to it long‑term.
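To make the "multiple WHERE stages" point concrete, here is a minimal sketch in Python. The `orders` table, its columns, and the sample rows are invented for illustration. The pipe-style query is shown only as text, written in the style of the Google proposal; it is not executed, because SQLite does not support it. The standard query runs through the built-in `sqlite3` module, and a plain-Python function mimics the pipe stages one at a time.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) rows, invented for illustration.
ROWS = [
    ("alice", 120.0),
    ("alice", -5.0),   # e.g. a refund, dropped before aggregation
    ("bob", 40.0),
    ("bob", 30.0),
    ("carol", 200.0),
]

# Pipe-style version, kept as text only (an assumption about how the query
# would look in the proposed syntax; SQLite cannot run it).
PIPE_QUERY = """
FROM orders
|> WHERE amount > 0
|> AGGREGATE SUM(amount) AS total GROUP BY customer
|> WHERE total > 100
|> ORDER BY customer
"""

# Standard SQL: the post-aggregation filter has to become HAVING.
STANDARD_QUERY = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE amount > 0
GROUP BY customer
HAVING SUM(amount) > 100
ORDER BY customer
"""


def run_standard(rows):
    """Run the standard SELECT-first query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(STANDARD_QUERY).fetchall()
    conn.close()
    return result


def run_pipeline(rows):
    """Mimic the pipe stages in order, each consuming the previous result."""
    positive = [(c, a) for c, a in rows if a > 0]          # WHERE amount > 0
    totals = {}
    for customer, amount in positive:                      # AGGREGATE ... GROUP BY customer
        totals[customer] = totals.get(customer, 0.0) + amount
    large = [(c, t) for c, t in totals.items() if t > 100]  # WHERE total > 100
    return sorted(large)                                   # ORDER BY customer


if __name__ == "__main__":
    print(run_standard(ROWS))
    print(run_pipeline(ROWS))
```

Both functions return the same rows. The difference the commenters describe is purely in how the query reads: the pipe version uses the same WHERE keyword before and after aggregation, while standard SQL needs a separate HAVING clause for the second filter.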
Relation to existing piped/query DSLs
- Commenters compare the proposal to LINQ, Kusto (and the Kusto‑like PQL), PRQL (including PRQL in ClickHouse), dplyr/tidyverse, Flux, Ecto, and others.
- Some view the Google proposal as a pragmatic, incremental change that can coexist with SQL; others see it as a too‑small reinvention given that richer non‑SQL DSLs already exist.
- There’s a broader wish for a common SQL “core IR” (like MIR/CIR) and mention of Substrait as related work.
Syntax details and bikeshedding
- GROUP BY ALL is praised for reducing boilerplate.
- Combined clauses like GROUP AND ORDER BY are criticized as unnecessary complexity versus separate GROUP BY / ORDER BY.
- Some want JSON‑style {key: value} inserts and universally allowed trailing commas.
- Others dismiss the entire clause‑order debate as bikeshedding; they feel SQL is “fine” and already remarkably successful.
PDFs vs semantic HTML/Markdown
- Long subthread on why PDFs are hard to copy from: glyph‑level layout, ligatures, legacy font handling, and inconsistent generators.
- Some argue PDF was designed as a final rendered format; extraction quality depends on producers embedding proper mappings.
- Disagreement over reading papers on phones: some prefer fixed two‑column PDFs; others strongly prefer reflowable HTML/EPUB and see paper‑optimized layouts as outdated.
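The ligature problem mentioned above can be illustrated with a small Python sketch. It assumes the extracted text contains Unicode ligature code points such as U+FB01 ("fi") and U+FB03 ("ffi"); the helper name `expand_ligatures` is hypothetical. Many PDFs instead store raw glyph IDs without a proper Unicode mapping, and in that case this approach does not help.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Apply NFKC normalization, which decomposes ligature code points
    such as U+FB01 and U+FB03 into their plain-letter sequences."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Text as it might come out of a PDF copy-paste, with ligature code points.
    extracted = "e\ufb03cient \ufb01le"
    print(expand_ligatures(extracted))
    # Searching the raw text for "file" fails until it is normalized.
    print("file" in extracted, "file" in expand_ligatures(extracted))
```

Note that NFKC also folds other compatibility characters (for example superscript digits), so it is a blunt tool and may not suit every extraction pipeline.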