Large language models reduce public knowledge sharing on online Q&A platforms
Effects of LLMs on Q&A activity
- Many see LLMs as reducing both basic and advanced questions on sites like Stack Overflow, not just duplicates.
- Some argue this is fine if LLMs efficiently handle common questions; others worry valuable edge‑case knowledge will no longer be documented publicly.
- There is disagreement over whether observed drops are mostly in “low‑quality” content or also in good, non‑trivial questions; some critique the paper’s methodology and framing.
Answer quality, trust, and hallucinations
- Human answers are valued for being public, reviewable, and voted on; reputation scores and writing style serve as trust signals.
- LLM answers are fast, fluent, and often “good enough,” but can hallucinate confidently and at scale, making errors harder to spot.
- Some posters say they don't care whether a wrong answer came from a mistaken human or a hallucinating AI, only whether it solves their problem; others stress that misleadingly polished AI output is uniquely risky.
Stack Overflow culture and moderation
- Many blame long‑standing hostility, pedantry, and aggressive duplicate‑closing for driving users away even before LLMs.
- Others defend strict curation as necessary for a canonical, low‑duplicate knowledge base and say most complaints stem from misunderstanding SO's purpose (a wiki‑like reference, not a chat‑style help desk).
- Scaling moderation and maintaining quality with huge user bases is seen as an unsolved problem; comparisons are made to toxic IRC channels and over‑regulated communities elsewhere.
Incentives, ownership, and “theft”
- Some describe training on user‑generated content as “theft,” primarily from contributors rather than platforms.
- Others point out existing licenses (e.g., Creative Commons) and argue that sharing knowledge has always involved remixing and derivative work.
- There is concern that LLMs erode social and reputational rewards for open contributions, potentially shrinking open source and public Q&A.
Long‑term data and model sustainability
- Worry: as public Q&A declines and AI‑generated “slop” increases, future models may lack fresh, high‑quality human data, creating a feedback loop of eventual degradation.
- Counterpoints: models can train on code, official docs, GitHub issues, synthetic data (with validation), and paid human‑generated datasets.
Shifts in where and how people ask questions
- Many developers now start with LLMs, then fall back to search/Q&A if answers fail.
- Technical discussion is migrating to Discord, GitHub issues, and other semi‑closed spaces, which improves community feel but reduces public, searchable knowledge.
- LLMs are praised as non‑judgmental tutors and “rubber ducks,” especially for beginners and in education.
Proposed hybrids and future directions
- Suggestions include Q&A sites embedding AI‑generated candidate answers subject to human voting, or agents that re‑post and upvote LLM‑derived solutions.
- Others explicitly reject “competing with AI,” seeing curated human Q&A and AI assistants as serving different roles.