X changes its terms to bar training of AI models using its content
Platform vs. individual control over AI training
- Several commenters argue that if a platform can ban training on “its” corpus, individual artists and authors should have the same practical power.
- Others note that large entities (e.g., news orgs, big platforms) can afford monitoring and lawsuits, while individual creators usually can’t.
- There is disagreement on whether platforms should assert such rights at all: some welcome a precedent against AI training; others see the move as corporate enclosure of a public commons.
Legal uncertainty and fair use
- Extended back-and-forth on whether training on publicly available content is fair use.
- Clarifications that in U.S. law, fair use is an affirmative defense the model trainer must raise, not something plaintiffs must disprove upfront.
- One side views training on copyrighted works (paid books in particular) as clear piracy, especially when models can reproduce long passages.
- Others stress that human art is derivative too; they distinguish between (1) training and private use vs. (2) distributing a model that can substitute for the source.
- Multiple people argue current copyright law is ill-suited for LLMs and will likely be overhauled.
Technical and practical enforceability
- Skepticism that ToS can meaningfully stop scraping; crawlers don’t read ToS and clandestine data brokers already route traffic through user devices.
- Suggestion for a web standard (HTML tag or robots.txt directive) for “no training,” plus harsh legal penalties for violators.
- Counterarguments: trivial workarounds via intermediaries, likely “Do Not Track 2.0” non-enforcement, and difficulties proving knowledge of illicit data origins.
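The proposed "no training" signal already has informal precursors: some AI crawlers document robots.txt opt-out tokens. A minimal sketch, assuming a site wants to opt out entirely (GPTBot and Google-Extended are documented opt-out user-agent tokens; the `noai` meta value is an informal convention, not a standard):

```text
# robots.txt — opt out of known AI-training crawlers
User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended   # Google's AI-training opt-out token
Disallow: /
```

A per-page analog sometimes seen in the wild is `<meta name="robots" content="noai">`, but as the counterarguments note, nothing compels a scraper, or an intermediary reselling scraped data, to honor either signal.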
Ethical and societal debate about AI
- One camp wants to halt or heavily restrict training, citing environmental damage, biodiversity loss, and techno-overreach.
- Another camp wants maximal acceleration (drug discovery, longevity, space colonization), viewing human existence as brief and expendable compared to potential progress.
- Some explicitly prefer preserving the natural world over advancing human technology.
Copyright duration, public-domain corpora, and FOSS
- Long discussion of excessive copyright terms (e.g., life+70) vs. benefits of a shorter term like 50 years from publication.
- Notes that copyright underpins GPL and other open-source licenses; shortening terms would also affect Linux and FOSS, not just media conglomerates.
- Interest in AI models trained purely on public-domain or clearly licensed datasets (pre-1926 texts, PG19, “lawful” coding corpora).
Business motives and Musk/X specifics
- Some see X’s move as protecting xAI’s exclusive access to X’s data, not as a principled defense of user rights.
- Others find it financially odd to cut off paying AI customers, but consistent if X's main value now lies in feeding xAI.
- Recurrent criticism of corporate hypocrisy: platforms extract and monetize user content while restricting others’ use.
User compensation and data rights
- Calls for mechanisms (e.g., “VAT for content,” revenue-sharing, residuals) that pay contributors whose data trains profitable models.
- Back-of-the-envelope math suggests most individuals would get trivial sums, but some see symbolic or structural value in the idea.
- GDPR is cited as offering stronger notions of data ownership/consent than typical U.S. frameworks, but public-space and usage carve-outs still apply.
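The back-of-the-envelope point can be made concrete with hypothetical figures (the pool size, contributor count, and 90/10 split below are illustrative assumptions, not numbers from the discussion):

```python
# Hypothetical residuals arithmetic: how far does a training-data
# revenue-share pool stretch across a platform's contributors?

pool = 1_000_000_000          # assumed annual pool: $1B set aside for data residuals
contributors = 250_000_000    # assumed number of active contributors

equal_share = pool / contributors
print(f"Equal split: ${equal_share:.2f} per contributor per year")  # $4.00

# Even a heavily skewed split leaves the typical user with little:
# suppose the top 1% of contributors capture 90% of the pool.
top = 0.01 * contributors
rest = contributors - top
typical_share = (0.10 * pool) / rest
print(f"Typical contributor under a 90/10 split: ${typical_share:.3f}")
```

Under any plausible parameters the per-user payout is trivial, which is why proponents tend to argue for the structural or symbolic value of such schemes rather than the income itself.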