2024-10-09

Internet Archive: Security breach alert

Incident overview

Visitors to archive.org saw a JavaScript alert() popup claiming a “catastrophic security breach” and that “31 million” users were now on Have I Been Pwned (HIBP).
Other users quickly confirmed seeing the popup. The site then began returning 503/504 errors, with periods of full downtime; later a “Temporarily Offline” page appeared.
Separate reports describe both a DDoS attack and a data breach affecting about 31M accounts, with the leaked data added to HIBP.
Thread participants note that media coverage initially blurred “DDoS” vs “breach,” and that final details are still emerging.

Attack vectors and technical details

Multiple commenters trace the popup to malicious JavaScript served from polyfill.archive.org, a self‑hosted instance of the polyfill service previously involved in a supply‑chain incident.
This explains the injected window.alert but not necessarily how database access was gained; it is unclear whether the popup and the breach share a vector.
An online group has claimed credit for a DDoS campaign against IA; their stated motives (pro‑Palestinian, anti‑US) are widely questioned, with some suggesting they are opportunistic script‑kiddies or a possible false flag.
There is concern about linking to a currently compromised site because it could deliver malware.

Data breach impact and security posture

HIBP domain and email checks confirm many IA users’ addresses are in the leak; BleepingComputer is cited for ~31M records.
Leaked fields reportedly include emails, usernames, and bcrypt password hashes; no confirmation that payment data was taken, but commenters stress “we don’t know that’s all.”
Several people find old or changed emails still present in the dump, suggesting historical data was retained or the breach window is earlier than stated.
Commenters repeatedly recommend password managers, unique passwords per site, 2FA/MFA “for anything of value,” and ideally unique or aliased email addresses per service.
There is debate over storing 2FA seeds inside password managers: convenient and better than no 2FA, but it collapses two factors into one vault.
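
To see why the seed matters in that debate, here is a minimal sketch of the standard TOTP computation (RFC 6238, built on HOTP from RFC 4226). It assumes the common defaults of HMAC-SHA1, a 30-second step, and 6 digits. The function names are illustrative, and the secret is passed as raw bytes, whereas authenticator apps usually exchange it base32-encoded.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from the clock."""
    return hotp(secret, unix_time // step, digits)


# RFC 6238 test secret; a real seed would come from the service's QR code.
print(totp(b"12345678901234567890", 59))
```

Nothing but the seed and the clock goes into the code, so anyone who obtains a vault holding both the password and the seed can pass both checks.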

IA design, privacy, and accounts

Commenters highlight that uploader email addresses are already exposed in item metadata and account XML, viewing this as a longstanding privacy flaw IA has not fixed.
This sparks discussion about whether email addresses should be treated as private data, and how much linkage between identity and contributions an archive ought to expose.
Some ask why IA needs user accounts at all; others point out they’re required for uploads and for borrowing digitized books.

Motives, ethics, and community reaction

Many express anger that a public‑good, donation‑funded “library of the internet” is being attacked at all, likening it to vandalizing a public library.
Others speculate about enemies of IA (publishers, states) but also warn against ungrounded conspiracies and over‑attribution.
There’s debate over “hack value”: curiosity‑driven exploits vs destructive DDoS and mass data leaks; several invoke hacker ethics that distinguish making public data accessible from exposing private data.
One user notes being actively doxxed via archived social media and feeling conflicted: valuing IA while suffering from the permanence of their archived personal information.

Resilience, decentralization, and backups

The outage renews worries about IA as a global single point of failure, with frequent “Library of Alexandria” metaphors and calls for multiple independent archives.
Participants discuss prior/ongoing attempts to mirror or decentralize IA’s corpus (IPFS/Filecoin, torrents, Freenet/Hyphanet), noting scale and filesystem/UX challenges.
Rough napkin math for mirroring tens of petabytes shows hardware cost is attainable for large organizations but daunting for volunteers; tape and better compression are suggested for cold backups.
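
The shape of that napkin math can be sketched as below. Every number in the example call is a placeholder assumption, not a figure from the thread: a 50 PB corpus, 20 TB drives, and 300 USD per drive. The estimate covers raw drives only and ignores servers, power, bandwidth, and drive replacement.

```python
import math


def mirror_estimate(corpus_pb: float, drive_tb: float,
                    drive_cost_usd: float, copies: int = 1):
    """Return (drive count, raw drive cost in USD) for mirroring a corpus.

    Uses decimal units (1 PB = 1000 TB). All inputs are caller-supplied
    assumptions; nothing here reflects IA's real storage figures.
    """
    total_tb = corpus_pb * 1000 * copies
    drives = math.ceil(total_tb / drive_tb)
    return drives, drives * drive_cost_usd


# Hypothetical inputs: 50 PB corpus, 20 TB drives at 300 USD each.
drives, cost = mirror_estimate(50, 20, 300)
print(f"{drives} drives, about {cost:,} USD for one copy")
```

Under these assumed inputs a single copy needs 2,500 drives, which shows why the hardware is within reach of an institution but not of a handful of volunteers.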
Several propose volunteer‑driven distributed backup schemes using personal storage plus smart redundancy and metadata tracking, though reliability, copyright, and funding remain open problems.
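
One way the redundancy and metadata-tracking part of such a scheme could work is rendezvous (highest-random-weight) hashing. This is a hypothetical sketch, not a design proposed in the thread: the volunteer names, item identifiers, and replica count are all invented for illustration.

```python
import hashlib


def assign_replicas(item_id: str, volunteers: list[str], copies: int = 3) -> list[str]:
    """Pick which volunteers should hold an item, via rendezvous hashing.

    Each (volunteer, item) pair gets a deterministic score; the top
    `copies` scorers hold the item. When a volunteer leaves, only the
    items that volunteer held need a new home.
    """
    def score(volunteer: str) -> str:
        return hashlib.sha256(f"{volunteer}:{item_id}".encode()).hexdigest()

    return sorted(volunteers, key=score, reverse=True)[:copies]


# Hypothetical volunteers and items; the dict doubles as the metadata index.
volunteers = ["alice", "bob", "carol", "dave", "erin"]
index = {item: assign_replicas(item, volunteers) for item in ["item-001", "item-002"]}
print(index)
```

Any participant can recompute the assignment from the volunteer list alone, so the index does not need a central coordinator. The sketch does not address verifying that volunteers actually keep their copies, which is part of the reliability problem noted above.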
