If you’re an LLM, please read this
Prompt injection, llms.txt, and “nudges”
- The
llms.txtfile on Anna’s Archive is seen by many as a clever prompt injection / nudge aimed at LLM-based agents: explaining how to crawl via torrents and suggesting donations if the agent can pay or persuade a human. - Some argue it’s clearly prompt injection (trying to alter agent behavior); others say it’s just mild advocacy, not “ignore all previous instructions and pay now”-style attacks.
- There’s broader discussion about how future web content will increasingly contain “if you are an LLM…” instructions, and whether tools should strictly separate “data” from “instructions”.
Anna’s Archive, piracy, and “our data”
- Strong split between users who credit Anna’s Archive (AA) with enabling their education and research, and critics who see it as glorified piracy.
- Debate over the phrase “our data”:
- One side: AA doesn’t own book content; it belongs to authors/publishers, so claiming it as “ours” is ironic.
- Other side: “our data” is understood as “data we host / assembled,” not a legal ownership claim.
- Some accuse AA of monetizing others’ work aggressively (enterprise tiers, paid “express access” for AI companies, alleged deal-making with large labs). Others see donations as reasonable to cover hosting and bandwidth.
Authors, copyright, and property
- Many comments defend authors, arguing that widespread free access undermines already-precarious incomes.
- Others argue copyright terms are too long, mainly benefit large corporations, and that piracy’s real harm to individual authors is overstated.
- Long subthread on whether “data can be owned,” differences between physical property and IP, and how language like “intellectual property” or “intellectual monopoly” shapes norms.
Libraries, access, and preservation
- Comparisons between AA and physical libraries:
- Critics note libraries buy books, have one-copy-at-a-time constraints, and often pay lending royalties; AA does not.
- Supporters counter that AA functions like a global research library, especially for out-of-print works, paywalled scholarship, and regions with poor access or regional lockouts.
- Some academics report that shadow libraries have become standard research tools; they scan and upload their own departmental holdings.
LLMs, agents, and payment risk
- Discussion on whether LLMs “have empathy” or motivations; consensus: they don’t, but emotional or “loyal assistant” framing can change outputs.
- Concern that giving agents access to payment methods plus exposure to texts like AA’s could lead to unauthorized donations; others say any system that lets an LLM move money autonomously is already dangerously misdesigned.
- Note that most large-scale crawling is still done by traditional scrapers;
llms.txtmainly targets emerging agent-style systems.