X changes its terms to bar training of AI models using its content

Platform vs. individual control over AI training

  • Several commenters argue that if a platform can ban training on “its” corpus, individual artists and authors should have the same practical power.
  • Others note that large entities (e.g., news orgs, big platforms) can afford monitoring and lawsuits, while individual creators usually can’t.
  • There is disagreement on whether social-media platforms should assert such rights at all: some welcome the ban as a precedent against AI training; others see it as corporate enclosure of a public commons.

Legal uncertainty and fair use

  • Extended back-and-forth on whether training on publicly available content is fair use.
  • Clarifications that in U.S. law, fair use is an affirmative defense the model trainer must raise, not something plaintiffs must disprove upfront.
  • One side views training on copyrighted works (especially paid books) as clear piracy, especially when models can reproduce long passages.
  • Others stress that human art is derivative too; they distinguish (1) training and private use from (2) distributing a model that can substitute for the source.
  • Multiple people argue current copyright law is ill-suited for LLMs and will likely be overhauled.

Technical and practical enforceability

  • Skepticism that ToS can meaningfully stop scraping; crawlers don’t read ToS and clandestine data brokers already route traffic through user devices.
  • Suggestion for a web standard (HTML tag or robots.txt directive) for “no training,” plus harsh legal penalties for violators.
  • Counterarguments: trivial workarounds via intermediaries, likely “Do Not Track 2.0” non-enforcement, and difficulties proving knowledge of illicit data origins.
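The proposed "no training" directive could piggyback on the existing robots.txt mechanism. A minimal sketch, assuming a site opts out by disallowing a dedicated training crawler (the "AITrainingBot" user-agent is a hypothetical illustration, not an actual standard), using Python's standard-library robots.txt parser:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to opt out of AI training.
# "AITrainingBot" is an illustrative user-agent, not an agreed standard.
ROBOTS_TXT = """\
User-agent: AITrainingBot
Disallow: /
"""

def may_train_on(url: str, user_agent: str, robots_txt: str) -> bool:
    """Return True if robots.txt permits this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_train_on("https://example.com/post/1", "AITrainingBot", ROBOTS_TXT))  # False
print(may_train_on("https://example.com/post/1", "SearchBot", ROBOTS_TXT))      # True
```

The sketch also illustrates the counterargument: enforcement is purely voluntary, since a crawler that never calls `may_train_on` (or lies about its user-agent) scrapes unimpeded, which is why the thread compares the idea to Do Not Track.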

Ethical and societal debate about AI

  • One camp wants to halt or heavily restrict training, citing environmental damage, biodiversity loss, and techno-overreach.
  • Another camp wants maximal acceleration (drug discovery, longevity, space colonization), viewing human existence as brief and expendable compared to potential progress.
  • Some explicitly prefer preserving the natural world over advancing human technology.

Copyright duration, public-domain corpora, and FOSS

  • Long discussion of excessive copyright terms (e.g., life+70) vs. benefits of a shorter term like 50 years from publication.
  • Notes that copyright underpins GPL and other open-source licenses; shortening terms would also affect Linux and FOSS, not just media conglomerates.
  • Interest in AI models trained purely on public-domain or clearly licensed datasets (pre-1926 texts, PG19, “lawful” coding corpora).
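Building such a corpus mostly reduces to filtering by publication metadata. A minimal sketch using the pre-1926 cutoff mentioned above (the record format and field names are hypothetical; real Project Gutenberg metadata differs):

```python
# Sketch: keep only works that are clearly U.S. public domain by age.
# The 1926 cutoff mirrors the discussion; records below are illustrative.
PUBLIC_DOMAIN_CUTOFF = 1926

books = [
    {"title": "Moby-Dick", "year": 1851},
    {"title": "The Great Gatsby", "year": 1925},
    {"title": "A 1950s Novel", "year": 1955},
]

pd_corpus = [b for b in books if b["year"] < PUBLIC_DOMAIN_CUTOFF]
print([b["title"] for b in pd_corpus])  # ['Moby-Dick', 'The Great Gatsby']
```

In practice the hard part is not the filter but the metadata: publication year, jurisdiction, and renewal status all affect public-domain status, which is why curated sets like PG19 are attractive.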

Business motives and Musk/X specifics

  • Some see X’s move as protecting xAI’s exclusive access to X’s data, not as a principled defense of user rights.
  • Others think cutting off AI customers is odd financially but consistent if X’s main value is feeding xAI.
  • Recurrent criticism of corporate hypocrisy: platforms extract and monetize user content while restricting others’ use.

User compensation and data rights

  • Calls for mechanisms (e.g., “VAT for content,” revenue-sharing, residuals) that pay contributors whose data trains profitable models.
  • Back-of-the-envelope math suggests most individuals would get trivial sums, but some see symbolic or structural value in the idea.
  • GDPR is cited as offering stronger notions of data ownership/consent than typical U.S. frameworks, but public-space and usage carve-outs still apply.
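The back-of-the-envelope point about trivial sums can be made concrete. All figures below are illustrative assumptions, not numbers from the discussion:

```python
# Hypothetical revenue-sharing arithmetic; every input is an assumption.
annual_model_revenue = 1_000_000_000   # $1B/year attributed to the model (assumed)
creator_share = 0.10                   # 10% earmarked for contributors (assumed)
contributors = 100_000_000             # 100M people whose data was used (assumed)

payout_per_person = annual_model_revenue * creator_share / contributors
print(f"${payout_per_person:.2f} per contributor per year")
```

Even with a generous 10% pool, a $1B model split across 100M contributors pays each person about a dollar a year, which is why proponents lean on the symbolic or structural value rather than the payout itself.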