Aaron Swartz and Sam Altman

Comparison of Aaron Swartz and Sam Altman

  • Many contrast Swartz’s direct, activist approach to mass access (scraping JSTOR to liberate academic papers) with Altman’s corporate, profit-driven use of scraped data for AI training.
  • Some argue Altman “failed upward” through social skills and cultivating powerful backers (e.g., being chosen to lead YC), more than through clear startup wins; others counter that persuading key people and steering OpenAI is itself real merit.
  • Several comments note Altman’s talent for power-building and political maneuvering, especially in OpenAI’s internal board conflicts.

Swartz’s Ability and Legacy

  • Swartz is widely praised as precociously brilliant (e.g., co-authoring the RSS 1.0 specification as a teenager, building Infogami, his writing and activism).
  • A few challenge the “technical genius” label, calling RSS relatively simple and pointing to a lack of hard evidence for some claims about his role.
  • Others highlight his blog, activism against SOPA, and influence on “remix culture” as his primary legacy.

Legality and Ethics: JSTOR vs OpenAI

  • Key distinction drawn: Swartz physically broke into MIT infrastructure and caused JSTOR disruption; OpenAI scraped data available on the public internet.
  • Some say Swartz clearly violated computer and copyright law, while OpenAI operates in a legal gray area around fair use and “transformative” AI training.
  • Others argue US copyright law already makes OpenAI’s conduct illegal, regardless of lack of enforcement or ongoing lawsuits.

Copyright, Plagiarism, and LLM Memorization

  • Strong debate over whether LLMs “learn like humans” vs “mechanically encode/compress” data.
  • One side claims models store only lossy semantic representations, can emit only short, imperfect excerpts, and that plagiarism requires a false claim of authorship.
  • The other side cites evidence of long verbatim outputs (e.g., news articles, public documents), argues that even partial reproduction can infringe copyright, and stresses that copyright law cares about copying, not “intent” or anthropomorphic notions of learning.
  • Disagreement on whether training on copyrighted works without consent should be legal; views range from “abolish all copyright” to “new training rights” to strict enforcement against AI companies.

Corporate Power, Liability, and Double Standards

  • Several comments see a systemic double standard: individuals like Swartz get aggressively prosecuted; corporations doing similar or larger-scale data use face mild or no personal consequences.
  • Discussion of corporations as “gangs” protected by weak law and captured states; criticism of capitalism’s incentive to exploit data for private gain while punishing public-minded actors.

Broader Reflections on AI and Industry

  • Some users express gratitude for the productivity gains of modern LLMs and don’t mind companies profiting from them.
  • Others worry about social harm, copyright abuse, and hype, comparing AI to earlier “black-hat” content-spinning tools and to the dot-com bubble.
  • There is a side discussion on whether authors and artists should have a right to opt out of training datasets; responses are sharply divided.