Anna's Archive: An Update from the Team

Access, Blocking, and Censorship

  • Commenters report that Anna’s Archive is blocked differently depending on country and ISP: HTTP 451 via Cloudflare in Belgium, DNS blocks or connection resets on some UK and Dutch ISPs, while other ISPs in the same countries give full access.
  • People discuss using VPNs, Apple Private Relay, Tor and alternative DNS to bypass blocks, and note that Cloudflare must comply with local legal orders or risk being blocked wholesale.
  • There’s unease that Cloudflare is effectively becoming a filter on individuals’ web access. Some want Ofcom/EU regulators to make blocking policies more consistent and transparent.
  • HTTP status 451 (“Unavailable for Legal Reasons”) is discussed as a censorship marker; other status codes also appear, such as Cloudflare’s 523 (“Origin Is Unreachable”).

Role, Mission, and Ethics of Anna’s Archive

  • Many see AA as “one of the last good things on the internet,” a modern Library of Alexandria preserving scientific papers, textbooks and books for the whole world, especially where legal access is limited or prohibitively expensive.
  • Others push back on the site’s rhetoric (“attacks on our mission”), arguing it is fundamentally a piracy operation, even if its archival side effects are valuable.
  • Some distinguish between AA’s role in liberating scientific/academic content (often produced with public funding and locked behind paywalls) and its distribution of recent commercial ebooks that directly affect authors’ income.

Impact on Authors, Copyright, and Fairness

  • One author in the thread is furious that a book they worked on for decades is freely downloadable; others reiterate that many writers already earn very little per sale and piracy feels like “mind theft.”
  • Supporters counter that most downloads are not lost sales; many users say they use AA to discover or preview works and then buy physical or DRM‑free copies, especially for niche or older titles.
  • Multiple people cite studies that find little robust evidence that piracy significantly displaces overall sales, though skeptics argue such effects are hard to measure and likely non‑zero.
  • Libraries vs AA: libraries doing physical and controlled digital lending buy or license a limited number of copies and replace them over time; AA distributes unlimited perfect copies. Some see that as a crucial legal and moral difference.

LLMs, Training Data, and Shadow Libraries

  • Several comments state or assume that OpenAI, Meta and others have trained on data from LibGen, Z‑Library, AA and similar sites; a few claim to have direct knowledge of small payments to AA‑like projects for bulk access.
  • There’s a deep argument over whether training on copyrighted books is or should be “fair use,” and whether companies that don’t use all available (including pirated) data will be outcompeted.
  • Some argue that if training on books is judged fair use, rights‑holders must “just accept it”; others insist that changing this should require democratic reform of copyright, not unilateral corporate decisions.
  • A separate line of debate asks whether the social benefit of powerful models built on shadow‑library data justifies those libraries, and whether models should be open‑weights if built on such material.

Shadow Library Ecosystem and Preservation

  • The AA blog update notes: massive scrapes from Internet Archive’s Controlled Digital Lending, HathiTrust, DuXiu, WorldCat, Google Books; partnerships with LibGen forks, STC/Nexus, Z‑Library; and the disappearance of a LibGen fork.
  • Commenters worry that explicitly bragging about scraping IA’s lending system could harm IA in court, by letting publishers argue that even “controlled” lending leaks into unrestricted piracy.
  • WeLib is called out by AA as mirroring AA’s collection and code but not sharing new material or code back; some agree this is parasitic and dangerous for preservation, others say any extra mirror improves decentralization.
  • AA publishes large torrent sets (e.g., Sci‑Hub, LibGen) so anyone can help seed. Some individuals discuss the feasibility and cost of personally mirroring ~100–200 TB of scientific knowledge, and whether to preserve high‑quality PDFs or deduplicated text.

Funding, Paywalls, and Non‑profit Claims

  • AA uses “soft” throttling: free downloads are slow/queued; donations unlock faster mirrors. Some users are suspicious, comparing this to commercial file‑host monetization; others point out that bandwidth, storage, and legal risk are expensive and volunteers are likely not “getting rich.”
  • There’s debate over calling AA a “non‑profit” when it’s an illegal, opaque operation with no formal status or audits. Some argue “non‑profit” should be reserved for regulated entities; others say it’s about intent and non‑distribution of profits, not paperwork.
  • Anonymous funding and crypto: Monero and indirect methods (e.g., buying gift cards with crypto) are discussed; some worry that large money flows plus opacity make AA vulnerable to greed or accusations of money laundering.

Internet Design, Privacy, and Piracy Culture

  • A side thread argues the internet should be redesigned to resist DDoS, spam, surveillance, and mass scraping; replies note trade‑offs between openness, decentralization, and control, and that many “attacks” are features for powerful actors.
  • Tools like Hashcash, Tor, Freenet, I2P, and proof‑of‑work schemes are mentioned as partial mitigations with significant usability or efficiency costs.
  • Broader piracy ethics recur: some see most pirates as simply wanting free stuff and rationalizing; others emphasize that heavy pirates are often heavy buyers and that streaming/DRM and high prices helped create the demand for shadow libraries in the first place.