Large language models reduce public knowledge sharing on online Q&A platforms
Effects of LLMs on Q&A activity
- Many see LLMs as reducing both basic and advanced questions on sites like Stack Overflow, not just duplicates.
- Some argue this is fine if LLMs efficiently handle common questions; others worry valuable edge‑case knowledge will no longer be documented publicly.
- There is disagreement over whether observed drops are mostly in “low‑quality” content or also in good, non‑trivial questions; some critique the paper’s methodology and framing.
Answer quality, trust, and hallucinations
- Human answers are valued for being public, reviewable, and voted on; reputation scores and writing style serve as trust signals.
- LLM answers are fast, fluent, and often “good enough,” but can hallucinate confidently and at scale, making errors harder to spot.
- Some posters say they don't care whether a wrong answer came from a mistaken human or a hallucinating AI, only whether it solves their problem; others stress that misleadingly polished AI output is uniquely risky.
Stack Overflow culture and moderation
- Many blame long‑standing hostility, pedantry, and aggressive duplicate‑closing for driving users away even before LLMs.
- Others defend strict curation as necessary for a canonical, low‑duplicate knowledge base and say most complaints stem from misunderstanding SO's purpose (a wiki‑like reference, not a chat‑style help desk).
- Scaling moderation and maintaining quality with huge user bases is seen as an unsolved problem; comparisons are made to toxic IRC channels and over‑regulated communities elsewhere.
Incentives, ownership, and “theft”
- Some describe training on user‑generated content as “theft,” primarily from contributors rather than platforms.
- Others point out existing licenses (e.g., Creative Commons) and argue that sharing knowledge has always involved remixing and derivative work.
- There is concern that LLMs erode social and reputational rewards for open contributions, potentially shrinking open source and public Q&A.
Long‑term data and model sustainability
- Worry: as public Q&A declines and AI‑generated “slop” increases, future models may lack fresh, high‑quality human data, creating a feedback loop of eventual degradation.
- Counterpoints: models can train on code, official docs, GitHub issues, synthetic data (with validation), and paid human‑generated datasets.
Shifts in where and how people ask questions
- Many developers now start with LLMs, then fall back to search/Q&A if answers fail.
- Technical discussion is migrating to Discord, GitHub issues, and other semi‑closed spaces, which improves community feel but reduces public, searchable knowledge.
- LLMs are praised as non‑judgmental tutors and “rubber ducks,” especially for beginners and in education.
Proposed hybrids and future directions
- Suggestions include Q&A sites embedding AI‑generated candidate answers subject to human voting, or agents that re‑post and upvote LLM‑derived solutions.
- Others explicitly reject “competing with AI,” seeing curated human Q&A and AI assistants as serving different roles.