2024-08-21

The semantic web is now widely adopted

LLMs vs. Semantic Web

Many argue that LLMs will make manual semantic markup redundant: they already extract and categorize with high accuracy on many tasks, and are expected to get cheaper and better.
Counterpoints:
- LLMs “routinely get stuff wrong,” including catastrophic errors (e.g., mislabeling people as criminals).
- Automation scales errors: at high enough volume, even a 1% error rate can mean thousands of serious mistakes per day.
- LLMs are opaque, non‑deterministic, and trained on SEO spam and blogspam, so they inherit those pathologies.
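The scaling argument in the counterpoints is simple arithmetic. A minimal sketch, assuming hypothetical daily volumes (the discussion gives no concrete figures):

```python
def expected_errors(items_per_day: int, error_rate: float) -> float:
    """Expected number of wrong outputs per day at a given error rate."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return items_per_day * error_rate


# Hypothetical volumes, chosen only to illustrate how errors scale.
for volume in (10_000, 100_000, 1_000_000):
    errors = expected_errors(volume, 0.01)
    print(f"{volume:>9,} items/day at 1% -> {errors:,.0f} errors/day")
```

At these assumed volumes a 1% rate gives 100, 1,000, and 10,000 errors per day, so "thousands per day" is reached somewhere above a hundred thousand automated decisions daily.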
Some see LLMs and semantic tech as complementary: LLMs can help build or fill out metadata and knowledge graphs; structured data and ontologies can ground and constrain LLMs (“neuro‑symbolic” approaches).
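One way to picture "structured data constraining an LLM" is a conformance check on whatever the model proposes. The sketch below is an illustration, not a method described in the discussion: `VOCABULARY` is a hand-picked subset of schema.org types and properties, `check_against_vocabulary` is a hypothetical helper, and `llm_output` is a hard-coded stand-in, so no model is actually called.

```python
# Hand-picked subset of schema.org types and properties (an assumption for
# this sketch; a real system would load a full vocabulary or ontology).
VOCABULARY = {
    "Article": {"headline", "author", "datePublished"},
    "Person": {"name", "jobTitle"},
}


def check_against_vocabulary(candidate: dict) -> list[str]:
    """Return a list of problems; an empty list means the candidate conforms."""
    item_type = candidate.get("@type")
    if item_type not in VOCABULARY:
        return [f"unknown type: {item_type!r}"]
    problems = []
    for key in candidate:
        if key != "@type" and key not in VOCABULARY[item_type]:
            problems.append(f"property {key!r} not allowed on {item_type}")
    return problems


# Stand-in for model output; the invented property should be rejected.
llm_output = {"@type": "Person", "name": "Ada Lovelace", "criminalRecord": "yes"}
print(check_against_vocabulary(llm_output))
```

A check like this only catches claims that fall outside the vocabulary; it says nothing about whether an allowed property carries a true value, which is the trust problem raised elsewhere in the discussion.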

Incentives, Trust, and Business Case

Key reason cited for “classic” Semantic Web failure: no business case for publishing rich open data. Publishing it reduces clicks and helps competitors and aggregators.
Strong incentives exist to lie or game metadata (SEO), so publishers cannot be the sole source of semantic truth.
Trust is unsolved: users need ways to judge reliability, provenance, and authority; simple vocabularies don’t address this.

Current Adoption and Practical Uses

JSON‑LD + schema.org is widely deployed for SEO, rich Google results, and link previews; CMS plugins generate it automatically.
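For readers who have not seen what such a plugin emits, here is a minimal sketch of the kind of block involved. The function name `article_jsonld` and all field values are hypothetical; the `@context`, `@type`, and property names are standard schema.org terms.

```python
import json


def article_jsonld(headline: str, author: str, date_published: str, image_url: str) -> str:
    """Build a JSON-LD <script> element describing an article."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "image": image_url,
    }
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical values for illustration only.
print(article_jsonld(
    headline="An example headline",
    author="A. Author",
    date_published="2024-08-21",
    image_url="https://example.com/cover.png",
))
```

The four fields here are the same author/title/image/date set that the discussion says most deployments stop at.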
Other formats appear in niches: RDF/XML in PDFs and archives, MARC/BibTeX in libraries, RSS/Atom, microformats, RDFa/Microdata.
Semantic tech is reportedly used in enterprise/sector contexts (e.g., European electricity grids, procurement, enterprise knowledge graphs), often internally.

Gap from Original Semantic Web Vision

Many see today’s usage (author/title/image/date for previews, basic structured data) as a drastic retreat from the original “Web‑scale queryable knowledge graph” dream.
Lack of a “killer app” for ordinary users is emphasized; benefits mainly accrue to large platforms and scrapers.

Formats, Tooling, and Developer Experience

JSON‑LD is seen as pragmatic for CMSes but aesthetically disliked (data in <script> blobs, duplication, namespaces).
Microformats/RDFa embed semantics inline but are harder to maintain and poorly supported by tooling and browsers.
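Part of the pragmatic appeal of the `<script>` blob is how little a consumer needs to read it. A minimal sketch using only the Python standard library; `JsonLdExtractor`, `extract_jsonld`, and the sample `PAGE` are hypothetical names and data for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


def extract_jsonld(html_text: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.items


# Hypothetical page used only to exercise the extractor.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}
</script>
</head><body><p>Hello</p></body></html>"""

print(extract_jsonld(PAGE))
```

Reading RDFa or Microdata instead would mean walking attributes spread across the visible markup, which is one reason inline formats put more burden on tooling.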
Ontology work is considered cognitively heavy; global, static ontologies are viewed as unrealistic, and mapping between competing schemas is hard.
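The mapping problem shows up even in a toy case. The sketch below assumes a hand-written crosswalk from a few Dublin Core terms to schema.org properties; the table `DC_TO_SCHEMA`, the function `crosswalk`, and the sample record are hypothetical, and real mappings are rarely this one-to-one.

```python
# Hypothetical, hand-written crosswalk from Dublin Core terms to
# schema.org properties; deliberately incomplete.
DC_TO_SCHEMA = {
    "dc:title": "headline",
    "dc:creator": "author",
    "dc:date": "datePublished",
}


def crosswalk(record: dict) -> tuple[dict, list[str]]:
    """Translate known fields and report the ones with no counterpart."""
    mapped, unmapped = {}, []
    for key, value in record.items():
        target = DC_TO_SCHEMA.get(key)
        if target is None:
            unmapped.append(key)
        else:
            mapped[target] = value
    return mapped, unmapped


# Illustrative record; "dc:coverage" has no entry in the table above.
record = {"dc:title": "On Meaning", "dc:creator": "A. Author", "dc:coverage": "Europe"}
mapped, unmapped = crosswalk(record)
print(mapped)
print(unmapped)
```

Every field that lands in `unmapped` is a decision someone has to make by hand, which is where the cognitive cost described above accumulates.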

Search, Semantics, and User Control

Pure keyword search is limited but transparent and user‑driven; semantic/ML search can obscure user intent and favor “average” interpretations.
Some propose user‑controlled classifiers and knowledge graphs to mitigate the publisher vs. reader incentive misalignment.
Overall, participants see the “semantic web” ideas as valuable, but adoption, incentives, and human–machine meaning gaps remain unresolved.
