2024-07-21

When ChatGPT summarises, it does nothing of the kind

Nature of LLM “Summaries”

Many commenters say current LLMs mostly “shorten” text, often missing critical or novel points, especially conclusions or minority arguments.
Others report that for many articles, GPT‑4‑class models do capture what those readers themselves see as the “main points,” which highlights that what counts as “key” is subjective.
Some argue a summary’s goal is just to help decide whether to read the full text; others want summaries that can safely replace reading.

Prompting, Methodology, and System Design

Several commenters criticize the article for omitting key details: model version, prompt, number of runs, and the exact errors observed.
Multiple people say “just call summarize()” is inadequate. They describe more elaborate pipelines:
- Chunking the text, embedding and clustering the chunks, extracting key quotes, verifying them against the source, then having the LLM rewrite the result in prose.
- Multi-step prompts with explicit instructions to include niche or rarely mentioned points.
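The "verifying against source" step can be as simple as checking that every quote the model claims to have extracted actually occurs in the original text. A minimal sketch, assuming quotes are compared verbatim after lowercasing and collapsing whitespace; the function names are hypothetical and no particular LLM API is assumed:

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so line breaks don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(source: str, quotes: list[str]) -> dict[str, bool]:
    """Map each candidate quote to True if it appears verbatim in the source."""
    haystack = normalize(source)
    return {q: normalize(q) in haystack for q in quotes}


if __name__ == "__main__":
    source = """The study found no effect in adults.
    However, a small subgroup of children showed a large benefit."""
    # Candidate quotes, e.g. as returned by a (hypothetical) extraction prompt.
    quotes = [
        "a small subgroup of children showed a large benefit",
        "the study found a strong effect in adults",
    ]
    for quote, ok in verify_quotes(source, quotes).items():
        print("KEEP" if ok else "DROP", "|", quote)
```

Quotes that fail the check (here, the fabricated "strong effect" claim) would be dropped before the LLM is asked to rewrite the surviving material in prose.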
Results from the API and from the web UI (with its built-in helpers) may differ, and accuracy reportedly degrades with long contexts.

Use Cases, Reliability, and Benchmarks

Experiences vary widely: some find LLMs excellent for condensing their own writing, grant applications, meeting notes, or HN threads; others find them frequently wrong or overconfident.
Error tolerance is seen as use‑case dependent: acceptable for blogs or “fluff,” not for medical records or other high‑stakes domains.
Several call for objective summarization benchmarks; others note this is an active research area.

Context Windows, RAG, and Technical Limits

Long context windows and sliding‑window attention are blamed for “content drift” and skipped details; splitting the text into smaller, overlapping chunks is often recommended.
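Splitting by word count with a fixed overlap is one simple way to produce such chunks, so that a point straddling a boundary appears whole in at least one chunk. A minimal sketch, assuming whitespace-separated words as the unit (real pipelines usually count model tokens instead); the function name and the defaults of 200 words per chunk with a 50-word overlap are hypothetical:

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(text, size=4, overlap=2):
        print(chunk)
```

With `size=4, overlap=2` the ten words become four chunks (`w0 w1 w2 w3`, `w2 w3 w4 w5`, `w4 w5 w6 w7`, `w6 w7 w8 w9`); each chunk would then be summarized on its own before the partial summaries are combined.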
Opinions on RAG diverge: some call it overhyped and hallucination‑prone; others find simple vector search plus light LLM summarization effective.
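The "simple vector search" half of that setup amounts to ranking chunks by similarity to a query and passing only the top few to the model for summarization. A minimal sketch, assuming bag-of-words term counts with cosine similarity as a plainly swapped-in substitute for the learned embeddings a real system would use; all names are hypothetical and the LLM call itself is left out:

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "stock prices fell sharply today",
        "a cat chased the mouse",
    ]
    for chunk in top_k("cat on a mat", chunks):
        print(chunk)
```

Replacing `vectorize` with a real embedding model would leave `top_k` unchanged; only the retrieved chunks would then go to the LLM, which keeps the prompt short and the "light" summarization step narrow.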

AI Hype, Skepticism, and “Understanding”

The thread reflects both strong skepticism (“toy,” “dangerous,” as overhyped as the metaverse or NFTs) and strong optimism (LLMs as major productivity tools and part of a broader ML trend).
There is debate over whether LLMs genuinely “understand” text or are sophisticated pattern matchers; failures on math and niche tasks are cited as evidence against “understanding.”
Trust is a recurring concern: if a summary must always be checked against the original, its practical value is questioned.
