Google Removed 749M Anna's Archive URLs from Its Search Results

Site vs. Google Search for Anna’s Archive

  • Several commenters say they never used Google to search within Anna’s Archive; its own metadata search (title/author/format/date) is “good enough.”
  • Others note Google could add value with full‑text search of book contents, but Anna’s Archive (AA) only exposes metadata, so Google likely never had the full text to index anyway.
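
To illustrate the distinction the commenters draw, here is a minimal sketch of a metadata-only search over title, author, format, and date. The BookRecord type, its field names, the search_metadata function, and the sample catalog are all hypothetical, made up for illustration; nothing here reflects how Anna’s Archive actually implements its search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRecord:
    # Hypothetical metadata-only record: no book text is stored.
    title: str
    author: str
    file_format: str
    year: int


def search_metadata(records, title=None, author=None, file_format=None, year_range=None):
    """Return the records that match every filter that was supplied."""
    results = []
    for rec in records:
        if title and title.lower() not in rec.title.lower():
            continue
        if author and author.lower() not in rec.author.lower():
            continue
        if file_format and rec.file_format.lower() != file_format.lower():
            continue
        if year_range and not (year_range[0] <= rec.year <= year_range[1]):
            continue
        results.append(rec)
    return results


# Made-up sample data for the example.
CATALOG = [
    BookRecord("The Trial", "Franz Kafka", "epub", 1925),
    BookRecord("The Castle", "Franz Kafka", "pdf", 1926),
    BookRecord("The C Programming Language", "Kernighan and Ritchie", "pdf", 1978),
]

kafka_epubs = search_metadata(CATALOG, author="kafka", file_format="epub")
print([r.title for r in kafka_epubs])  # ['The Trial']

# A phrase that only occurs inside a book's text cannot match,
# because only metadata fields are ever compared.
print(search_metadata(CATALOG, title="telling lies"))  # []
```

The second query shows why a metadata index gives an external crawler little to add: a phrase from inside a book has no field to match against.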

LLMs, DMCA, and Piracy

  • People wonder whether and how LLM providers honor DMCA takedowns and whether they can “launder” copyrighted content into ostensibly legal outputs.
  • Reports are mixed: some models refuse to provide pirated links or copyrighted text; others still surface torrent or archive links.
  • There’s concern that LLMs are just “regurgitating trash” and cannot reliably distinguish good from bad sources, making them vulnerable to manipulation.

Perceived Decline of Google Search

  • Many describe Google search as increasingly useless: SEO spam, AI overviews, ads, and hidden or capped result sets.
  • Some argue Google intentionally deprioritizes organic “good” results beyond the early pages to boost ads and AI features. Others ask for concrete evidence, noting that court findings mainly show that AI features reduce clicks on the “10 blue links,” not that the best results are deliberately buried.

Alternative Search Engines

  • Yandex is praised as especially good for DMCA‑sensitive or pirated content, “like Google circa 2005.”
  • Kagi, Startpage, DuckDuckGo, Brave, Ecosia, and Bing are repeatedly cited as better than Google for relevance, though each has trade‑offs (indexes, UI, sponsorship, Copilot clutter).
  • Personalization is debated: some want it off entirely; others say query/locale‑aware personalization (e.g., disambiguating queries such as “Kafka” or “C string”) can be genuinely useful but is poorly executed.

Corporate Motives, DMCA, and Censorship

  • One side argues Google is simply complying with DMCA using a public transparency log and that communities over‑dramatize this.
  • Others reply that large corporations are structurally driven by profit/valuation and routinely behave “sociopathically,” so defending them is misplaced.
  • Some highlight asymmetric enforcement: DMCA removals that protect rightsholders move fast, while consumer‑benefiting changes or antitrust remedies take years.
  • Some commenters allege that Google and X also remove politically sensitive war‑crime documentation, which they see as siding with powerful states.

Anna’s Archive, LibGen, and Archiving Efforts

  • Several see Anna’s Archive as continuing the original Google‑like mission of organizing and opening access to “high‑quality” information, especially after LibGen and z‑lib crackdowns.
  • Others think it’s reasonable for pirate links not to top book‑search results; the homepage still appears, so determined users can find it.
  • People discuss mirroring Anna’s Archive via torrents (tens of TB, compression, filtering out large PDFs, de‑duping editions). Some suggest a dedicated “piracy search engine” built from DMCA‑reported URLs, though Yandex is said to fill that niche already.
  • Alternatives mentioned: WeLib, open‑slum, and Telegram‑based Nexus/LibrarySTC bots for academic papers.
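
The size-filtering and edition de-duplication steps mentioned in the mirroring discussion can be sketched generically. This is a minimal illustration over an in-memory catalog, assuming a hypothetical FileEntry record, a hypothetical size cap, and a simple "same normalized title and author means same work" rule; the thread does not specify how commenters would actually do this, and real edition matching is considerably harder.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical catalog entry; field names are assumptions.
    title: str
    author: str
    file_format: str
    size_bytes: int


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings compare equal."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def select_entries(entries, max_bytes):
    """Drop files larger than max_bytes, then keep the smallest file per (title, author)."""
    best = {}
    for entry in entries:
        if entry.size_bytes > max_bytes:
            continue  # size filter: skip oversized files such as scanned PDFs
        key = (normalize(entry.title), normalize(entry.author))
        current = best.get(key)
        if current is None or entry.size_bytes < current.size_bytes:
            best[key] = entry
    return list(best.values())


# Made-up sample data for the example.
ENTRIES = [
    FileEntry("The Trial", "Franz Kafka", "epub", 400_000),
    FileEntry("The Trial.", "franz kafka", "pdf", 90_000_000),
    FileEntry("The Castle", "Franz Kafka", "pdf", 2_500_000),
    FileEntry("The Castle", "Franz Kafka", "pdf", 1_800_000),
]

kept = select_entries(ENTRIES, max_bytes=50_000_000)
print([(e.title, e.size_bytes) for e in kept])
# [('The Trial', 400000), ('The Castle', 1800000)]
```

Keeping the smallest file per work is only one possible policy; preferring a particular format or the most recent edition would be equally plausible choices.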

Legality of Downloading Digital Copies of Owned Books

  • Answers differ by jurisdiction, but the thread’s consensus is that owning a physical book generally doesn’t grant a right to download unauthorized digital copies.
  • Creating your own digital copy is more likely to be legal; downloading from an infringing source remains problematic, though enforcement usually targets distributors rather than individuals.

Broader Web Search and AI Tensions

  • Commenters note a trend toward more walled gardens and more legal barriers, and a growing need to search across multiple engines and perhaps personal indexes.
  • There’s concern that AI systems (e.g., Gemini) trained on web content now reduce traffic to the very sites they were trained on, raising fairness and conflict‑of‑interest questions.
  • Some see AI combined with retrieval‑augmented generation (RAG) over large book corpora as a huge competitive advantage, even as ordinary students and researchers lose free access to those same texts.