2024-12-17

Is stuff online worth saving?

Scope of “Worth Saving”

Many argue that although most online content feels trivial now, it’s impossible to know in advance which 90% is “pointless,” so the safer bias is toward over‑saving.
Others feel the internet should be ephemeral, mirroring real life; saving everything creates noise and burdens.
Several distinguish “personally useful” vs “historically/culturally valuable” content; personal filters may miss future historical value.

Historical, Personal, and Cultural Value

Comparisons to cherished physical artifacts: family letters, postcards, 100‑year‑old photos, flea‑market ephemera, early ads, news broadcasts.
Genealogy is a strong motivation: people regret how little of everyday life from past generations was preserved and try to leave richer records.
Old online communities (Usenet, IRC, niche forums, game mod sites, mailing lists) often vanish, taking unique technical knowledge and culture with them.
Technical debates, early reactions to technologies, and manuals/support pages are seen as valuable for later research and troubleshooting.

Costs, Fragility, and Data Rot

Storage is cheap for individuals but not cheap enough to “save everything” at global scale (e.g., Usenet volumes, all streaming video).
Data must be migrated across media and formats; drives, controllers, and interfaces become obsolete.
Some emphasize focusing on standards (HTML, PDF, common codecs) and treating “data as data” independent of medium.
Others see the ongoing maintenance burden as a reason to be selective and to periodically cull archives.

Tools and Practices for Archiving

Mentioned tools: ArchiveTeam, Internet Archive/Wayback, SingleFile/SingleFileZ, WebScrapBook, Save Page WE, monolith, ArchiveBox, Obsidian Web Clipper, print‑to‑PDF, full‑page screenshots.
Workflows range from obsessive (hash‑based integrity checks, automated sampling, weekly news DVDs, personal link databases) to minimalist (“delete almost everything”).
There’s frustration that saving modern dynamic sites rarely “just works”: captures are often only partial, and scraping can be blocked or messy.
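
As an illustration of the hash‑based integrity checks mentioned among the workflows above, here is a minimal sketch in Python. It is not any commenter's actual setup; the function names (sha256_of, build_manifest, verify) and the manifest layout are assumptions made up for this example. The idea is to record a SHA‑256 digest for every file in an archive folder once, then re‑hash later and report anything missing, added, or silently changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large captures need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive on disk against a previously saved manifest."""
    current = build_manifest(root)
    return {
        # Recorded earlier but no longer present on disk.
        "missing": sorted(set(manifest) - set(current)),
        # Present on disk but not in the saved manifest.
        "added": sorted(set(current) - set(manifest)),
        # Present in both, but the content hash differs (possible data rot).
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

The manifest is a plain dictionary, so it can be written out with json.dump and kept outside the archive folder (ideally on a different drive); a file listed under "changed" that nobody edited is a candidate for restoring from another copy.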

Power, Privacy, and AI

Some dislike that individuals’ online traces vanish while corporations maintain extensive behavioral archives.
One view: don’t try to compete with corporate hoarding; instead reduce data exhaust and push for user‑owned data.
Claims that LLMs make much of the web redundant are challenged with examples where LLMs miss niche but important archived content.
