2024-08-25

Google's new pipe syntax in SQL

Topic drift: SQL pipes vs PDF-to-HTML

Several commenters note the HN title is misleading relative to the article, which quickly pivots from the Google SQL paper to using an LLM to turn PDFs into HTML/Markdown.
Some consider the PDF conversion demo more interesting than the SQL syntax itself; others are primarily there for the SQL proposal.

Pipe and FROM‑first SQL syntax

Many like the FROM‑first, piped style for complex analytical queries: it matches execution order, reads like a dataflow, and makes autocomplete and incremental query building easier.
Reported advantages: easier refactoring, WHERE stages that can appear both before and after aggregation (instead of a separate HAVING), and a more natural mental model ("chain of filters and transforms"). One person refactored a ~500‑line, 20‑table query and preferred the new style.
Skeptics argue SELECT‑first improves legibility and troubleshooting because the projection and source tables are visible immediately. They question the need to change a 50‑year‑old, widely understood syntax.
There’s concern about nonstandard extensions; one embedded‑DB maintainer is explicitly waiting for the SQL standard and major engines (e.g., Postgres) before embracing FROM‑first syntax long‑term.
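The "multiple WHERE stages" point can be sketched with a small, hypothetical example. The table, columns, and data below are invented for illustration. The standard query runs on SQLite; the pipe version is only shown as text, since SQLite does not support it and its exact spelling here is an assumption based on the proposal. A plain-Python chain mirrors the piped order of operations.

```python
import sqlite3

# Hypothetical sample data: (customer, order_date, amount).
ROWS = [
    ("alice", "2024-01-05", 120.0),
    ("alice", "2024-02-11", 80.0),
    ("bob", "2024-01-20", 40.0),
    ("bob", "2023-12-31", 500.0),
    ("carol", "2024-03-02", 30.0),
]

# SELECT-first form: the pre-aggregation filter is WHERE, while the
# post-aggregation filter needs a different keyword, HAVING.
STANDARD_SQL = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer
HAVING SUM(amount) > 100
ORDER BY customer
"""

# Roughly how the same query might read in pipe syntax. Not executed here;
# treat the spelling as an assumption rather than a reference.
PIPE_SQL_SKETCH = """
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(amount) AS total GROUP BY customer
|> WHERE total > 100
|> ORDER BY customer
"""


def run_standard():
    """Run the SELECT-first query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    result = conn.execute(STANDARD_SQL).fetchall()
    conn.close()
    return result


def run_pipeline():
    """Apply the same steps in pipe order using plain Python."""
    # Stage 1: filter rows before aggregation.
    rows = [r for r in ROWS if r[1] >= "2024-01-01"]
    # Stage 2: aggregate per customer.
    totals = {}
    for customer, _date, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    # Stage 3: filter again, this time on the aggregate.
    kept = [(c, t) for c, t in totals.items() if t > 100]
    # Stage 4: order the output.
    return sorted(kept)
```

Both functions return the same rows; the difference the thread debates is only how the steps are written down and in what order they are read.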

Relation to existing piped/query DSLs

Commenters compare the proposal to LINQ, Kusto (and the Kusto-like PQL), PRQL (including PRQL in ClickHouse), dplyr/tidyverse, Flux, Ecto, and others.
Some view the Google proposal as a pragmatic, incremental change that can coexist with SQL; others see it as a too‑small reinvention given that richer non‑SQL DSLs already exist.
There’s a broader wish for a common SQL “core IR” (an intermediate representation, like MIR/CIR), and Substrait is mentioned as related work.

Syntax details and bikeshedding

GROUP BY ALL is praised for reducing boilerplate.
Combined clauses like GROUP AND ORDER BY are criticized as unnecessary complexity versus separate GROUP BY / ORDER BY.
Some want JSON‑style {key: value} inserts and universally allowed trailing commas.
Others dismiss the entire clause‑order debate as bikeshedding; they feel SQL is “fine” and already remarkably successful.

PDFs vs semantic HTML/Markdown

Long subthread on why PDFs are hard to copy from: glyph‑level layout, ligatures, legacy font handling, and inconsistent generators.
Some argue PDF was designed as a final rendered format; extraction quality depends on producers embedding proper mappings.
Disagreement over reading papers on phones: some prefer fixed two‑column PDFs; others strongly prefer reflowable HTML/EPUB and see paper‑optimized layouts as outdated.
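The ligature problem raised in this subthread can be sketched in a few lines. This assumes the best case, where the extractor emits Unicode ligature code points (such as U+FB01) rather than unmapped glyph IDs; when a PDF lacks proper mappings, normalization like this cannot help. The function name and sample string are hypothetical.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand compatibility characters, including Latin ligatures, via NFKC."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical text as it might come out of a PDF extractor:
# U+FB03 is the "ffi" ligature and U+FB01 is the "fi" ligature.
extracted = "e\ufb03cient \ufb01le"

# A plain-text search fails on the raw extraction but works after expansion.
found_before = "file" in extracted
found_after = "file" in expand_ligatures(extracted)
```

NFKC also rewrites other compatibility characters (superscripts, full-width forms), so it is a blunt tool; a narrower mapping may be preferable when the surrounding text must be preserved exactly.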
