How to fix “AI’s original sin”
Training Data, Copyright, and Attribution
- Many agree transparency about training data sources would help negotiations and accountability, but see it as insufficient on its own.
- Some want traceable attribution and per-document payout schemes for contributing sources; others argue this would push model builders toward zero-cost data or that it is technically unworkable.
- Several note that LLM training “smears” origins, making per-output attribution fundamentally hard, unlike simple retrieval.
- Detecting regurgitated copyrighted material with systems like Content ID is seen as unreliable and easily gamed.
- A strong faction argues that if models can’t exist without mass unlicensed data, then they simply shouldn’t exist.
Economics, Class, and Capitalism
- Debate over whether AI mainly pits “middle class vs middle class” while billionaires capture the gains, or whether the only meaningful distinction is between a working class and an owner class.
- Some fear AI will make labor broadly irrelevant, causing the middle class to shrink or vanish.
- Disagreement over whether capitalism can be “augmented” via regulation to handle AI’s impact, or whether it should be replaced entirely.
- Others push back, saying capitalism isn’t inherently zero-sum and that handling externalities and redistribution is a political question, not a purely economic one.
Ethics of Using Public Data
- One camp sees training on publicly available material as akin to human learning; paying once for access should be enough.
- Another camp calls training on creators’ works without consent a moral “sin” even if legal, and says it drives them off the open web.
- Creators describe feeling like uncompensated, anonymous labor behind others’ profits and reject “you published it, so it’s fair game” reasoning.
Platforms, ToS, and Historical Context
- Discussion over whether major platforms (e.g., video sites, social networks) anticipated AI training use when collecting user data.
- Some argue companies always believed large data troves would later become valuable for AI; others say acquisitions were mainly about advertising.
- Legal friction between scraping and ToS restrictions is noted, along with prior web-scraping cases, but the question is seen as unresolved for LLMs.
Safety, Regulation, and Geopolitics
- Some worry copyright and ethics constraints will make Western AI lose to less-constrained Chinese efforts; others say US builders largely ignore such qualms anyway.
- “AI safety” is seen by some as necessary (e.g., to limit harmful content), by others as overreach that weakens models, and by others still as a funding/PR “dog whistle.”
Cultural and Generational Reception
- Anecdotes claim generative AI communities skew older; some younger users reportedly see genAI output as “boomer art” lacking cultural cachet.
- Others counter that criticism may be driven by job fears or aesthetics rather than principled copyright concern.