Is stuff online worth saving?
Scope of “Worth Saving”
- Many argue that most online content feels trivial now, but it’s impossible to know in advance which 90% is “pointless,” so they lean toward over‑saving.
- Others feel the internet should be ephemeral, mirroring real life; saving everything creates noise and a maintenance burden.
- Several commenters distinguish “personally useful” from “historically/culturally valuable” content; filtering by personal usefulness may discard material with future historical value.
Historical, Personal, and Cultural Value
- Comparisons to cherished physical artifacts: family letters, postcards, 100‑year‑old photos, flea‑market ephemera, early ads, news broadcasts.
- Genealogy is a strong motivation: people regret how little of everyday life from past generations was preserved and try to leave richer records.
- Old online communities (Usenet, IRC, niche forums, game mod sites, mailing lists) often vanish, taking unique technical knowledge and culture with them.
- Technical debates, early reactions to technologies, and manuals/support pages are seen as valuable for later research and troubleshooting.
Costs, Fragility, and Data Rot
- Storage is cheap for individuals but not cheap enough to “save everything” at global scale (e.g., Usenet volumes, all streaming video).
- Data must be migrated across media and formats; drives, controllers, and interfaces become obsolete.
- Some recommend sticking to widely supported standard formats (HTML, PDF, common codecs) and treating “data as data,” independent of the storage medium.
- Others see the ongoing maintenance burden as a reason to be selective and to periodically cull archives.
Tools and Practices for Archiving
- Mentioned tools: ArchiveTeam, Internet Archive/Wayback, SingleFile/SingleFileZ, WebScrapBook, Save Page WE, monolith, ArchiveBox, Obsidian Web Clipper, print‑to‑PDF, full‑page screenshots.
- Workflows range from obsessive (hash‑based integrity checks, automated sampling, weekly news DVDs, personal link databases) to minimalist (“delete almost everything”).
- Many are frustrated that saving dynamic modern sites only partly “just works”; scraping can be blocked or produce messy results.
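Some of the more obsessive workflows mention hash‑based integrity checks and automated sampling. Below is a minimal sketch of that idea, assuming the archive is a plain local directory and using SHA‑256. The function names `build_manifest` and `verify_sample` are hypothetical and are not taken from any of the tools listed above. The sketch records a digest for every file once, then periodically re‑hashes a random sample to catch silent corruption or missing files without rereading the whole archive.

```python
# Sketch only: hypothetical helpers for a fixity check over a local archive
# directory. Not the method of any specific tool mentioned above.
from __future__ import annotations

import hashlib
import random
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_sample(root: Path, manifest: dict[str, str], k: int,
                  seed: int | None = None) -> list[str]:
    """Re-hash up to k randomly chosen files from the manifest.

    Returns the relative paths that are missing or whose digest changed.
    """
    rng = random.Random(seed)
    sample = rng.sample(sorted(manifest), min(k, len(manifest)))
    bad = []
    for rel in sample:
        p = root / rel
        if not p.is_file() or sha256_of(p) != manifest[rel]:
            bad.append(rel)
    return bad
```

In practice the manifest would be saved next to the archive (for example as JSON) and `verify_sample` run on a schedule. Sampling trades completeness for cost: each run is cheap, and repeated runs eventually cover most of the archive.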
Power, Privacy, and AI
- Some dislike that individuals’ online traces vanish while corporations maintain extensive behavioral archives.
- One view: don’t try to compete with corporate hoarding; instead reduce data exhaust and push for user‑owned data.
- Claims that LLMs make much of the web redundant are countered with examples where LLMs miss niche but important content that survives only in archives.