2024-06-20

How to fix “AI’s original sin”

Training Data, Copyright, and Attribution

Many agree that transparency about training data sources would help negotiations and accountability, but see it as insufficient on its own.
Some want traceable attribution and per-document payout schemes; others argue such schemes would push developers toward zero-cost data or are technically unworkable.
Several note that LLM training “smears” the origins of its data, making per-output attribution fundamentally hard in a way that simple retrieval is not.
Detection of copyrighted regurgitation via systems like Content ID is seen as unreliable and easily gamed.
A strong faction argues that if models can’t exist without mass unlicensed data, then they simply shouldn’t exist.

Economics, Class, and Capitalism

Debate over whether AI mainly pits “middle class vs middle class” while billionaires capture the gains, or whether only the working-class/owner-class distinction is meaningful.
Some fear AI will make labor broadly irrelevant, causing the middle class to shrink or vanish.
Disagreement over whether capitalism can be “augmented” via regulation to handle AI’s impact, or whether it should be replaced entirely.
Others push back, saying capitalism isn’t inherently zero-sum and that externalities and redistribution are political, not purely economic, questions.

Ethics of Using Public Data

One camp sees training on publicly available material as akin to human learning and holds that paying once for access should be enough.
Another camp calls training on creators’ works without consent a moral “sin” even if it is legal, and says the practice drives creators off the open web.
Creators describe feeling like uncompensated, anonymous labor behind others’ profits and reject “you published it, so it’s fair game” reasoning.

Platforms, ToS, and Historical Context

Discussion of whether major platforms (e.g., video sites, social networks) anticipated AI training use when collecting user data.
Some argue companies always believed large data troves would later become valuable for AI; others say acquisitions were mainly about advertising.
Commenters note the legal tension between scraping and ToS restrictions, along with prior web-scraping cases, but see the question as unresolved for LLMs.

Safety, Regulation, and Geopolitics

Some worry copyright and ethics constraints will make Western AI lose to less-constrained Chinese efforts; others say US builders largely ignore such qualms anyway.
“AI safety” is seen by some as necessary (e.g., to prevent harmful content), by others as overreach that weakens models, or as a funding/PR “dogwhistle.”

Cultural and Generational Reception

Anecdotes claim generative AI communities skew older; some younger users reportedly see genAI output as “boomer art” lacking cultural cachet.
Others counter that criticism of genAI may be driven by job fears or aesthetic preferences rather than principled copyright concerns.
