38% of webpages that existed in 2013 are no longer accessible a decade later

Code reuse, mirroring, and copyright

  • Disagreement on whether you can mirror useful code from random websites to GitHub.
  • One view: the answer depends on the license, but small snippets mirrored for educational or archival purposes likely fall under fair use (in the U.S.), and much code is closer to “idea” than “expression.”
  • Counterpoints:
    • Fair use is an after-the-fact legal defense, not a shield.
    • Many jurisdictions don’t have fair use at all; in some (e.g., Germany) mirroring could be clear infringement.
    • Safest route is to follow the explicit license or just keep private copies.

Ephemerality vs preservation (“feature or bug”)

  • Some argue disappearance is good: forgetting is healthy, storage and attention are finite, and we shouldn’t try to fight entropy.
  • Others push back: rediscovering lost scientific/cultural knowledge is costly; future historians and archaeologists benefit from “store everything,” including mundane content (analogy to Sumerian clay-tablet trash heaps).
  • Debate over “worthy content”: highly subjective; we can’t predict what will matter in 100–10,000 years.
  • Moral angle: if disappearance is treated as a “feature,” who decides what vanishes, especially when data is in private platforms or behind paywalls?

Archiving practices and tools

  • Strong praise for the Internet Archive; several commenters donate and use it heavily, but worry about its legal exposure.
  • Many now save local copies (PDF, MHTML, SingleFile, EPUB) instead of merely bookmarking.
  • Tools mentioned: ArchiveBox, Linkwarden, bookmarklets that auto-save to Wayback, browser extensions, self-hosted link archives.
  • Some post their site content to public Git repos so others can mirror or rebuild it.
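
The "auto-save to Wayback" bookmarklets mentioned above typically work by prefixing the current page's address with the Wayback Machine's save endpoint. A minimal sketch of that idea follows, together with a lookup URL for the Wayback availability API. The function names are hypothetical, and the code only builds URLs; it makes no network requests and says nothing about rate limits or whether a capture succeeds.

```python
from urllib.parse import urlencode

# Public Wayback Machine endpoints; the helper names below are illustrative.
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_save_url(page_url: str) -> str:
    """Return the URL a bookmarklet would open to request a capture of page_url."""
    # Bookmarklets usually just concatenate the endpoint and the page address.
    return SAVE_ENDPOINT + page_url


def wayback_lookup_url(page_url: str) -> str:
    """Return the availability-API URL that reports the closest existing snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


if __name__ == "__main__":
    print(wayback_save_url("https://example.com/page"))
    print(wayback_lookup_url("example.com"))
```

Opening the first URL in a browser asks the archive to capture the page; fetching the second returns JSON describing the nearest snapshot, if one exists.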

Technical causes of decay

  • Dynamic and database-backed sites are fragile: code rot, dependency changes, CVEs, framework deprecations, and unmaintained APIs kill sites even if the HTML would still work.
  • Static sites on stable hosting (e.g., S3, GitHub Pages, plain HTML) are seen as the most durable.
  • TLS upkeep, server maintenance, and commercial hosting costs all shorten site lifetimes; lapsed domains are common.
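
The failure modes above (lapsed domains, dead servers, removed pages) surface differently to a link checker. The sketch below shows one way to tell them apart with Python's standard library. The category names and the status-code grouping are assumptions made for illustration, not a definition used in the thread or in the 38% figure.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough liveness category (assumed grouping)."""
    if 200 <= status < 400:
        return "alive"
    if status in (404, 410):
        return "gone"  # the server answered, but the page no longer exists
    return "uncertain"  # e.g. 403 or 5xx: may be temporary or access-related


def check_link(url: str, timeout: float = 10.0) -> str:
    """Issue a HEAD request and classify the outcome."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, its parent class.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure (lapsed domain), refused connection, TLS error, or timeout.
        return "unreachable"


if __name__ == "__main__":
    print(check_link("https://example.com/"))
```

Some servers reject HEAD requests, so a real checker would likely fall back to GET before declaring a page dead.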

Centralization, walled gardens, and discoverability

  • Widespread shift from independent sites/forums to Facebook pages, Instagram, Reddit, and Discord:
    • Small businesses often use Facebook as their only presence; people without accounts are excluded or discouraged.
    • Forums and niche communities move to private or semi-private groups/Discords; information becomes harder to search, index, or archive.
  • Mixed views:
    • Some lament the loss of an open, diverse “Web 1.0” and the burying of high-signal hobbyist content.
    • Others welcome semi-closed spaces: knowledge stays within communities, less exposed to scraping, SEO spam, and large ML models.

How bad is 38%, and what is lost?

  • Some are surprised the number isn’t higher, given business churn and hobby sites dying.
  • Others find a 62% survival rate troubling all the same, especially for links cited in Wikipedia, news, government sites, and niche resources (e.g., immigration forums).
  • Recognition that search engines increasingly surface SEO-heavy or “content farm” material, while older, high-quality but inactive sites sink or vanish.
  • Several respondents have already taken old content offline intentionally, sometimes explicitly because of AI scraping concerns.
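
To put the headline number in perspective, the 38% loss over ten years can be converted into an implied yearly rate. The sketch below assumes pages disappear at a constant rate each year, which is a simplification introduced here and not a claim from the thread; real attrition is probably uneven. Under that assumption the figure works out to roughly 4.7% of pages lost per year and a half-life of about 14.5 years.

```python
import math


def annual_loss_rate(fraction_lost: float, years: float) -> float:
    """Constant yearly loss rate that compounds to fraction_lost over the period."""
    surviving = 1.0 - fraction_lost
    return 1.0 - surviving ** (1.0 / years)


def half_life_years(fraction_lost: float, years: float) -> float:
    """Years until half the pages are gone, under the same constant-rate assumption."""
    surviving = 1.0 - fraction_lost
    return years * math.log(0.5) / math.log(surviving)


if __name__ == "__main__":
    print(f"annual loss: {annual_loss_rate(0.38, 10):.1%}")
    print(f"half-life:   {half_life_years(0.38, 10):.1f} years")
```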