News outlets are limiting the Internet Archive’s access to their journalism
Archiving vs Paywalls and Timing
- Many suggest a compromise in which the Internet Archive (IA) crawls pages immediately but delays public access for days or months.
- The reasoning: immediate availability competes with publishers by letting readers bypass paywalls and ads, whereas delayed access still preserves the historical record without cutting into short‑term revenue.
- Some compare this to JSTOR’s embargo model for journals and suggest an explicit “embargo” directive in robots.txt.
- Others note that IA has historically archived pages even when robots.txt disallowed their display, which reveals a tension between preservation and publisher control.
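The embargo proposal can be sketched in code. This is a minimal sketch under loud assumptions: robots.txt has no "Embargo" directive today, so the `Embargo: <days>` line, the `ExampleArchiveBot` user-agent, and the function names `parse_embargo_days` and `publicly_visible` are all hypothetical. The parser groups lines by User-agent in a simplified way and reads only the embargo value; the archive still stores the page immediately and gates only public display.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# HYPOTHETICAL: "Embargo: <days>" is not a real robots.txt directive. It models
# the proposal that a publisher states how long an archive must wait before
# showing a captured page to the public.


def parse_embargo_days(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the embargo in days for user_agent, falling back to the '*' group."""
    agents: list[str] = []    # user-agents named by the current group
    seen_rule = False         # True once the current group has a non-User-agent line
    specific: Optional[int] = None
    wildcard: Optional[int] = None
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:                     # a User-agent after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            continue
        seen_rule = True
        if key == "embargo":
            days = int(value)
            if user_agent.lower() in agents:
                specific = days
            elif "*" in agents:
                wildcard = days
    return specific if specific is not None else wildcard


def publicly_visible(captured_at: datetime, now: datetime,
                     embargo_days: Optional[int]) -> bool:
    """Crawling happens immediately; only public display waits out the embargo."""
    if embargo_days is None:
        return True
    return now >= captured_at + timedelta(days=embargo_days)


ROBOTS = """
User-agent: ExampleArchiveBot
Embargo: 90

User-agent: *
Disallow: /drafts/
"""

if __name__ == "__main__":
    days = parse_embargo_days(ROBOTS, "ExampleArchiveBot")
    captured = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # prints "90 False": a capture from Jan 1 stays hidden until Mar 31
    print(days, publicly_visible(captured, datetime(2024, 3, 1, tzinfo=timezone.utc), days))
```

Existing parsers such as Python's `urllib.robotparser` skip lines they do not recognize, so a directive like this would only take effect if archives chose to honor it.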
AI Training, IP, and Micropayments
- A core driver of new blocking is fear that AI companies are scraping IA instead of paying publishers.
- Some call this “training cost minimization” or outright “stealing”; they argue AI firms should license content directly.
- Counterpoint: blocking IA to keep out AI is seen as shortsighted because it sacrifices the public record.
- Several propose charging bots for access while keeping it free for humans, pointing to implementations already in use as well as micropayment schemes.
- Others criticize micropayments as impractical and privacy‑threatening, and note that AI scrapers can evade paywalls by creating trial accounts en masse.
Historical Memory and “Memory-Holing”
- Strong concern that blocking IA will enable quiet edits, deletions, and narrative rewrites.
- People report frequent silent article changes and disappearing stories; IA snapshots are seen as critical for fact‑checking.
- Some fear a future where history is “rewritten by the current victors,” especially without independent archives.
- National or physical archives (libraries, microfiche, local partnerships with universities) help but are far less accessible than IA.
Quality and Economics of Local News
- Many see this as a symptom of a collapsing business model: ad revenue is down, paywalls are up, and private‑equity (PE) owners prioritize extraction over reporting.
- Some argue blocking IA won’t create new subscribers because people who bypass paywalls weren’t going to pay anyway.
- Others note archives themselves are monetized (e.g., genealogy subscriptions), so outlets logically resist giving them away.
Alternative Archiving Approaches and IA Limitations
- Proposed alternatives include decentralized, torrent‑style archiving that ignores copyright, self‑hosted tools, and browser extensions that let logged‑in readers submit pages.
- There’s demand for anonymous archiving services that don’t honor takedown requests (with exceptions such as CSAM).
- Some report that IA’s own anti‑bot measures make broader research harder, even as IA itself is increasingly blocked.
Privacy and Ethical Concerns
- Some worry that searchable archives of old local news (hospital admissions, addresses, etc.) create invasive public dossiers.
- Others respond that despite these risks, losing an accessible, independent record is more dangerous overall.