Anna’s Archive Faces Millions in Damages and a Permanent Injunction
Scale and Preservation of Anna’s Archive
- Commenters note the archive is ~860 TB, with suggestions to distribute it via preloaded multi‑drive arrays or tapes.
- Some think individuals or hobbyists can realistically host 250 TB subsets (e.g., non‑fiction/journals), though ISP limits and transfer times remain major bottlenecks.
- Others argue only physical distribution at scale (hard drives, tapes) is feasible, while some see partial mirroring (1 TB chunks) as not worth the legal and logistical risk.
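To make the transfer-time bottleneck concrete, here is a rough back-of-the-envelope sketch. The sizes (860 TB, 250 TB, 1 TB) come from the thread; the link speeds, the 20 TB drive size, the decimal definition of a terabyte, and the function names are all assumptions chosen for illustration, and real-world throughput would likely be lower than a fully saturated link.

```python
import math

TB = 10**12  # decimal terabyte in bytes (assumption; the thread does not say TB vs TiB)


def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link of link_mbps megabits/s.

    efficiency is the fraction of the nominal link speed actually sustained
    (1.0 means a perfectly saturated link, which is optimistic).
    """
    bits = size_tb * TB * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


def drives_needed(size_tb: float, drive_tb: float) -> int:
    """Number of drives of drive_tb capacity needed to hold size_tb, ignoring redundancy."""
    return math.ceil(size_tb / drive_tb)


if __name__ == "__main__":
    # Hypothetical home/hobbyist link speeds: 100 Mbit/s and 1 Gbit/s.
    for size_tb in (860, 250, 1):
        for link_mbps in (100, 1000):
            days = transfer_days(size_tb, link_mbps)
            print(f"{size_tb:>4} TB at {link_mbps:>4} Mbit/s: {days:8.1f} days")
    # Hypothetical 20 TB drives for a physically shipped copy.
    print("20 TB drives for 860 TB:", drives_needed(860, 20))
```

Under these assumptions, a 250 TB subset takes roughly 23 days over a saturated 1 Gbit/s link and the full 860 TB roughly 80 days, which is consistent with the view that shipping drives or tapes is the more practical route at full scale.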
Compression, LLMs, and File Formats
- Idea raised: treat LLMs or advanced predictors as lossless compressors by storing only deviations from predictions.
- Pushback: LLM-based representations are unreliable for trusted reference content; lossless schemes exist but are complex.
- Discussion of converting bloated PDFs to DjVu to shrink the archive; others say tooling is poor and that sticking with PDFs (and ultimately plain text + images) is more future‑proof.
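The "store only deviations from predictions" idea can be illustrated with a toy sketch. This is not an LLM and not anyone's proposed implementation: the `LastSuccessorPredictor` class is a hypothetical, deliberately trivial stand-in for a learned model, and `zlib` stands in for the entropy coder (practical schemes typically feed the model's probabilities to an arithmetic coder instead of storing byte residuals). The point it shows is that the scheme is lossless only if the decoder runs exactly the same deterministic predictor as the encoder, which is part of why commenters call such schemes complex.

```python
import zlib


class LastSuccessorPredictor:
    """Toy stand-in for a learned model (hypothetical, for illustration only).

    Predicts that the next byte is whichever byte most recently followed
    the previous byte. Encoder and decoder must update it identically.
    """

    def __init__(self) -> None:
        self.table = [0] * 256  # table[prev] = last byte seen after prev
        self.prev = 0

    def predict(self) -> int:
        return self.table[self.prev]

    def update(self, actual: int) -> None:
        self.table[self.prev] = actual
        self.prev = actual


def encode(data: bytes) -> bytes:
    """Store only the deviation of each byte from the model's prediction."""
    model = LastSuccessorPredictor()
    residuals = bytearray()
    for b in data:
        residuals.append((b - model.predict()) % 256)  # 0 when the guess is right
        model.update(b)
    return zlib.compress(bytes(residuals), 9)


def decode(blob: bytes) -> bytes:
    """Replay the same predictor and add the stored deviations back."""
    model = LastSuccessorPredictor()
    out = bytearray()
    for r in zlib.decompress(blob):
        b = (model.predict() + r) % 256
        out.append(b)
        model.update(b)
    return bytes(out)


if __name__ == "__main__":
    sample = b"the archive of the archive of the archive " * 50
    blob = encode(sample)
    assert decode(blob) == sample  # round trip is exact
    print(len(sample), "bytes ->", len(blob), "bytes")
```

The better the predictor, the more residuals are zero and the smaller the stored stream; the trade-off is that the predictor itself (for an LLM, the exact weights and numerics) must be preserved bit-for-bit alongside the archive.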
Views on OCLC, WorldCat, and the Lawsuit
- Many see OCLC’s mission statements about “sharing knowledge” as hypocritical given litigation against a mirror of WorldCat data.
- Some question OCLC’s added value, since it did not create the bibliographic data; others say organizing and standardizing data is non‑trivial and valuable.
- Several note a history of OCLC asserting strong control over its records, framing this suit as part of a broader gatekeeping pattern.
Scraping, “Cyberattacks,” and Claimed Damages
- OCLC claims ~$5.3M in damages (hardware upgrades, Cloudflare, salaries for 34 staff, investigations). Commenters widely view this as inflated or as normal operating costs.
- Debate over whether aggressive scraping constitutes a “cyberattack.” Some note scrapers can effectively DoS sites; others say calling scraping “hacking” is dangerous overreach.
- Comparisons drawn to earlier cases around scraping and to the prosecution of high‑profile downloaders; some hope or fear this could set precedent affecting AI training.
Intellectual Property and Copyright Debate
- A strong current of opinion holds that IP does more harm than good, especially for access to knowledge; calls range from radical abolition to sharply shorter copyright terms (e.g., 7, 15, or 30 years) and reform of patent renewals.
- Counterargument: IP‑intensive sectors represent a large share of GDP and employment; losing legal protection could undercut companies like chip designers and weaken US strategic power.
- Long sub‑thread on differences between physical and intellectual property, incentives to create, and whether markets would adapt if IP protections were scaled back.