2025-03-20

ChatGPT hit with privacy complaint over defamatory hallucinations

Data poisoning, manipulation & downstream harms

Several comments frame indiscriminate web scraping and non-auditable training as “corporate recklessness,” not innocent use of public data.
Hypothetical “poison” scenarios are raised: hidden instructions in web documents causing models (or derivative scoring systems) to quietly sabotage individuals in hiring, credit, health, or parole contexts.
Others note that LLMs are already trained on dubious sources; they see greater risk in deliberate, agenda-driven training or censorship than in isolated poisoned pages.
Comparisons are made to early “Google bombing” and speculation that hostile actors could flood training data to shift model behavior or even markets.

LLMs vs search engines & traditional publishing

One side argues LLM risks are akin to misusing Google or unvetted articles: the real issue is how downstream systems rely on them.
Counterpoints highlight key differences:
- Poison in an LLM is embedded behavior, hard to detect or remove.
- Web pages are static and de-indexable; model weights aren’t.
- LLMs can generate novel, source-less defamatory text.
Some stress that search engines already honor takedown laws, while LLMs currently lack equivalent, robust mechanisms.

Defamation, liability & disclaimers

Disagreement over whether generic “may be wrong” disclaimers meaningfully shield companies from defamation or GDPR duties.
Some think holding providers liable would make LLMs unusable or unavailable in strict jurisdictions; others respond that products which must disown their own outputs are fundamentally defective.
Analogies are drawn to bath salts sold as “not for human consumption” or chatbots whose lies have already produced legal liability in other sectors.

Mitigations & product changes

Discussion notes that the specific Norwegian case now yields an answer grounded in web search rather than pure model memory.
There is skepticism this fully fixes the problem: hallucinations remain possible, the model still struggles to say “I don’t know,” and similar errors may affect other names.
Proposals include: mandatory web-grounding with citations; blocking outputs involving specific names; or treating AI outputs as publisher content, with corresponding responsibility.

Hallucinations, usefulness & overclaiming

Some argue hallucination is inherent and unrecoverable for high-stakes uses, implying certain applications (legal, credit, reputational) should be off-limits.
Others say LLMs are valuable as “idea generators” or assistants when the user already has domain knowledge and can verify; they are dangerous as authoritative information sources.
Critics emphasize that marketing and UI portray these systems as reliable answer engines, not “daydream machines,” creating a mismatch between design, hype, and legal expectations.

Regulation, rights & GDPR

Multiple comments point to GDPR’s requirements for accuracy and rights to rectification/erasure of personal data, questioning how that can coexist with opaque, weight-encoded training on PII.
Some see complaints backed by privacy NGOs as essential pressure to force large vendors to take accountability; others fear new liabilities will chill open-source AI and expand surveillance or censorship.
There is a recurring tension between wanting strong remedies for individuals defamed by models and concern that overbroad rules could effectively ban or severely limit LLM deployment in certain regions.

Related topics