Large language models reduce public knowledge sharing on online Q&A platforms

Effects of LLMs on Q&A activity

  • Many see LLMs as reducing both basic and advanced questions on sites like Stack Overflow, not just duplicates.
  • Some argue this is fine if LLMs efficiently handle common questions; others worry valuable edge‑case knowledge will no longer be documented publicly.
  • There is disagreement over whether observed drops are mostly in “low‑quality” content or also in good, non‑trivial questions; some critique the paper’s methodology and framing.

Answer quality, trust, and hallucinations

  • Human answers are valued for being public, reviewable, and voted on; reputation and style give trust signals.
  • LLM answers are fast, fluent, and often “good enough,” but can hallucinate confidently and at scale, making errors harder to spot.
  • Some posters say they don’t care whether a wrong answer came from a mistaken human or an AI hallucination, only whether it solves their problem; others stress that misleadingly polished AI output is uniquely risky.

Stack Overflow culture and moderation

  • Many blame long‑standing hostility, pedantry, and aggressive duplicate‑closing for driving users away even before LLMs.
  • Others defend strict curation as necessary for a canonical, low‑duplicate knowledge base and say most complaints stem from misunderstanding SO’s purpose (wiki‑like, not a chat‑help desk).
  • Scaling moderation and maintaining quality with huge user bases are seen as unsolved problems; comparisons are made to toxic IRC channels and over‑regulated communities elsewhere.

Incentives, ownership, and “theft”

  • Some describe training on user‑generated content as “theft,” primarily from contributors rather than platforms.
  • Others point out existing licenses (e.g., Creative Commons) and argue that sharing knowledge has always involved remixing and derivative work.
  • There is concern that LLMs erode social and reputational rewards for open contributions, potentially shrinking open source and public Q&A.

Long‑term data and model sustainability

  • Worry: as public Q&A declines and AI‑generated “slop” increases, future models may lack fresh, high‑quality human data, causing a degenerative feedback loop (sometimes called model collapse).
  • Counterpoints: models can train on code, official docs, GitHub issues, synthetic data (with validation), and paid human‑generated datasets.

Shifts in where and how people ask questions

  • Many developers now start with LLMs, then fall back to search/Q&A if answers fail.
  • Technical discussion is migrating to Discord, GitHub issues, and other semi‑closed spaces, which improves community feel but reduces public, searchable knowledge.
  • LLMs are praised as non‑judgmental tutors and “rubber ducks,” especially for beginners and in education.

Proposed hybrids and future directions

  • Suggestions include Q&A sites embedding AI‑generated candidate answers subject to human voting, or agents that re‑post and upvote LLM‑derived solutions.
  • Others explicitly reject “competing with AI,” seeing curated human Q&A and AI assistants as serving different roles.