News outlets are limiting the Internet Archive’s access to their journalism

Archiving vs Paywalls and Timing

  • Many suggest a compromise in which the Internet Archive (IA) crawls immediately but delays public access for days or months.
  • Argument: immediate availability competes with publishers by letting people bypass paywalls and ads, but delayed access preserves history without harming short‑term revenue.
  • Some compare this to JSTOR’s embargo model for journals and suggest an explicit “embargo” directive in robots.txt.
  • Others note IA has historically archived even when robots.txt disallowed display, revealing tension between preservation and publisher control.
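
The embargo idea above can be sketched as follows. This is a minimal illustration, not an existing standard: the `Embargo-days` directive and the function names are hypothetical, and the parser treats the directive as site-wide rather than per user agent.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical directive name; it is NOT part of any robots.txt standard.
EMBARGO_DIRECTIVE = "embargo-days"


def parse_embargo_days(robots_txt: str, default: int = 0) -> int:
    """Return the value of the hypothetical Embargo-days directive, or default."""
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.strip().lower() == EMBARGO_DIRECTIVE:
            try:
                return max(0, int(value.strip()))
            except ValueError:
                return default  # malformed value: fall back to the default
    return default


def is_publicly_viewable(captured_at: datetime, embargo_days: int,
                         now: Optional[datetime] = None) -> bool:
    """A snapshot is crawled at once but shown only after the embargo ends."""
    now = now or datetime.now(timezone.utc)
    return now >= captured_at + timedelta(days=embargo_days)


# Example robots.txt; the bot name is made up for illustration.
robots = "User-agent: ExampleArchiveBot\nAllow: /\nEmbargo-days: 30  # hypothetical\n"
embargo = parse_embargo_days(robots)
captured = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_publicly_viewable(captured, embargo,
                           now=datetime(2025, 1, 15, tzinfo=timezone.utc)))
```

The point of the design is that capture and display are decoupled: the archive stores the page on day zero, and only the display check consults the embargo.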

AI Training, IP, and Micropayments

  • A core driver of new blocking is fear that AI companies are scraping IA instead of paying publishers.
  • Some call this “training cost minimization” or outright “stealing”; they argue AI firms should license content directly.
  • Counterpoint: blocking IA to keep out AI is seen as shortsighted because it sacrifices the public record.
  • Several propose paid access for bots while keeping human access free, including live implementations and micropayment schemes.
  • Others criticize micropayments as impractical and privacy‑threatening, and note AIs can evade paywalls via mass trial accounts.
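
One way to picture the "bots pay, humans read free" proposal is a server-side decision like the rough sketch below. The marker list, the token set, and `decide_status` are all hypothetical names for illustration; the only real element is the standard HTTP 402 Payment Required status code.

```python
from http import HTTPStatus
from typing import Optional

# Illustrative substrings only; real bot detection is far more involved.
BOT_MARKERS = ("bot", "crawler", "spider")
# Hypothetical set of tokens proving a bot operator has paid.
VALID_TOKENS = {"demo-token"}


def decide_status(user_agent: str, payment_token: Optional[str] = None) -> HTTPStatus:
    """Serve humans freely; ask self-declared bots for payment."""
    ua = user_agent.lower()
    is_declared_bot = any(marker in ua for marker in BOT_MARKERS)
    if not is_declared_bot:
        return HTTPStatus.OK
    if payment_token in VALID_TOKENS:
        return HTTPStatus.OK
    return HTTPStatus.PAYMENT_REQUIRED  # HTTP 402


print(decide_status("ExampleBot/1.0"))
```

Because the check relies on a self-declared user agent, a scraper that impersonates a browser passes as human, which is the same evasion problem the critics raise.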

Historical Memory and “Memory-Holing”

  • Strong concern that blocking IA will enable quiet edits, deletions, and narrative rewrites.
  • People report frequent silent article changes and disappearing stories; IA snapshots are seen as critical for fact‑checking.
  • Some fear a future where history is “rewritten by the current victors,” especially without independent archives.
  • National or physical archives (libraries, microfiche, local partnerships with universities) help, but are less accessible than IA.
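
To illustrate why independent snapshots matter for fact-checking, here is a small sketch of how two saved copies of an article could be compared for silent edits. It assumes the article text has already been extracted from each snapshot; the function names are hypothetical, and only Python's standard `hashlib` and `difflib` modules are used.

```python
import difflib
import hashlib
from typing import List


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so reflowed copies match."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def describe_changes(old: str, new: str) -> List[str]:
    """Return removed ('- ') and added ('+ ') lines between two snapshots."""
    if fingerprint(old) == fingerprint(new):
        return []  # identical apart from whitespace
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Made-up article text for illustration.
earlier = "Mayor denies claim.\nCouncil votes Tuesday."
later = "Mayor confirms claim.\nCouncil votes Tuesday."
for change in describe_changes(earlier, later):
    print(change)
```

A comparison like this only works if the earlier copy was captured and kept by someone other than the publisher, which is the role commenters assign to IA.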

Quality and Economics of Local News

  • Many see this as a symptom of a collapsing business model: ad revenue is down, paywalls are up, and private equity (PE) owners prioritize extraction over reporting.
  • Some argue blocking IA won’t create new subscribers because people who bypass paywalls weren’t going to pay anyway.
  • Others note archives themselves are monetized (e.g., genealogy subscriptions), so outlets logically resist giving them away.

Alternative Archiving Approaches and IA Limitations

  • Proposed alternatives include decentralized, torrent‑style archiving that ignores copyright, self‑hosted tools, and browser extensions that let logged‑in readers submit pages.
  • There’s demand for anonymous archivists who don’t honor takedowns (with exceptions like CSAM).
  • IA’s own anti‑bot measures are reported to make broader research harder, even as more sites block IA itself.

Privacy and Ethical Concerns

  • Some worry that searchable archives of old local news (hospital admissions, addresses, etc.) create invasive public dossiers.
  • Others respond that despite these risks, losing an accessible, independent record is more dangerous overall.