How to fix “AI’s original sin”

Training Data, Copyright, and Attribution

  • Many agree transparency about training data sources would help negotiations and accountability, but see it as insufficient on its own.
  • Some want traceable attribution and per-document payout schemes for contributing works; others argue such schemes would push model builders toward zero-cost data or are technically unworkable.
  • Several note that LLM training “smears” origins, which makes per-output attribution fundamentally hard. Simple retrieval is different, because the source document is directly identifiable.
  • Detection of copyrighted regurgitation via systems like Content ID is seen as unreliable and easily gamed.
  • A strong faction argues that if models can’t exist without mass unlicensed data, then they simply shouldn’t exist.

Economics, Class, and Capitalism

  • Debate over whether AI mainly pits “middle class vs middle class” while billionaires capture the gains, or whether only the working-class/owner-class distinction is meaningful.
  • Some fear AI will make labor broadly irrelevant, causing the middle class to shrink or vanish.
  • Disagreement over whether capitalism can be “augmented” via regulation to handle AI’s impact, or whether it should be replaced entirely.
  • Others push back, saying capitalism isn’t inherently zero-sum and that externalities and redistribution are political, not purely economic, questions.

Ethics of Using Public Data

  • One camp sees training on publicly available material as akin to human learning; in this view, paying once for access should be enough.
  • Another camp calls training on creators’ works without consent a moral “sin” even if it is legal, and says the practice is driving creators off the open web.
  • Creators describe feeling like uncompensated, anonymous labor behind others’ profits and reject “you published it, so it’s fair game” reasoning.

Platforms, ToS, and Historical Context

  • Discussion over whether major platforms (e.g., video sites, social networks) anticipated AI training use when collecting user data.
  • Some argue companies always believed large data troves would later become valuable for AI; others say acquisitions were mainly about advertising.
  • Participants note the legal friction between scraping and ToS restrictions, along with prior web-scraping cases, but see the question as unresolved for LLMs.

Safety, Regulation, and Geopolitics

  • Some worry copyright and ethics constraints will make Western AI lose to less-constrained Chinese efforts; others say US builders largely ignore such qualms anyway.
  • Some see “AI safety” as necessary (e.g., to curb harmful content). Others see it as overreach that weakens models, or as a funding/PR “dogwhistle.”

Cultural and Generational Reception

  • Anecdotes claim generative AI communities skew older; some younger users reportedly see genAI output as “boomer art” lacking cultural cachet.
  • Others counter that criticism may be driven by job fears or aesthetics rather than principled copyright concern.