Is stuff online worth saving?

Scope of “Worth Saving”

  • Many argue that most online content feels trivial now, but because it’s impossible to know in advance which 90% is “pointless,” it is safer to bias toward over‑saving.
  • Others feel the internet should be ephemeral, mirroring real life; saving everything creates noise and burdens.
  • Several distinguish “personally useful” from “historically/culturally valuable” content; personal filters may miss future historical value.

Historical, Personal, and Cultural Value

  • Comparisons to cherished physical artifacts: family letters, postcards, 100‑year‑old photos, flea‑market ephemera, early ads, news broadcasts.
  • Genealogy is a strong motivation: people regret how little of everyday life from past generations was preserved and try to leave richer records.
  • Old online communities (Usenet, IRC, niche forums, game mod sites, mailing lists) often vanish, taking unique technical knowledge and culture with them.
  • Technical debates, early reactions to technologies, and manuals/support pages are seen as valuable for later research and troubleshooting.

Costs, Fragility, and Data Rot

  • Storage is cheap for individuals but not cheap enough to “save everything” at global scale (e.g., Usenet volumes, all streaming video).
  • Data must be migrated across media and formats; drives, controllers, and interfaces become obsolete.
  • Some emphasize focusing on standards (HTML, PDF, common codecs) and treating “data as data” independent of medium.
  • Others see the ongoing maintenance burden as a reason to be selective and to periodically cull archives.

Tools and Practices for Archiving

  • Mentioned tools: ArchiveTeam, Internet Archive/Wayback, SingleFile/SingleFileZ, WebScrapBook, Save Page WE, monolith, ArchiveBox, Obsidian Web Clipper, print‑to‑PDF, full‑page screenshots.
  • Workflows range from obsessive (hash‑based integrity checks, automated sampling, weekly news DVDs, personal link databases) to minimalist (“delete almost everything”).
  • There’s frustration that saving dynamic modern sites only partially “just works”; scraping can be blocked or produce messy results.
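
The hash‑based integrity checks mentioned above could look something like the following. This is a minimal sketch, not any commenter's actual setup: it assumes the archive is a plain directory tree, and the function names (`sha256_of`, `build_manifest`, `verify`) are hypothetical, chosen only for illustration. The idea is to record a SHA‑256 digest per file once, then re‑hash later and compare, so that silent corruption or missing files show up.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the archive on disk against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded earlier but no longer present on disk.
        "missing": sorted(set(manifest) - set(current)),
        # Present in both, but the content hash differs (edit or bit rot).
        "changed": sorted(
            name
            for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # On disk now but not in the recorded manifest.
        "new": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would be saved next to the archive (for example as JSON) and `verify` run on a schedule. A non‑empty "changed" list for files that were never edited is the signal to restore from another copy.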

Power, Privacy, and AI

  • Some dislike that individuals’ online traces vanish while corporations maintain extensive behavioral archives.
  • One view: don’t try to compete with corporate hoarding; instead reduce data exhaust and push for user‑owned data.
  • Claims that LLMs make much of the web redundant are challenged with examples where LLMs miss niche but important archived content.