It seems like the AI crawlers learned how to solve the Anubis challenges
Role and Limits of Anubis / PoW
- Commenters stress Anubis was never a “bot detector” so much as a rate/cost limiter for abusive traffic, especially from rotating residential IPs that defeat IP-based throttling.
- It works by requiring a SHA-256 proof of work once per client/session and then issuing a JWT; a scraper pays that cost once and amortizes it over many requests, so large crawlers are only mildly inconvenienced (see the sketch after this list).
- Several note that if a normal browser can run the JS, a headless browser can too. The move from curl/Go clients to full Chromium was seen as inevitable.
- Some argue PoW is “security theater”: the cost per page is orders of magnitude too low relative to AI companies’ compute, especially given optimization and batching.
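The core mechanism is simple enough to sketch. Below is a minimal Go illustration of an Anubis-style SHA-256 proof of work, assuming a "find a nonce so that SHA-256(challenge || nonce) has N leading zero bits" scheme; the challenge encoding, difficulty, and JWT/cookie handling here are illustrative assumptions, not Anubis's actual wire format.

```go
// powsketch.go: brute-force a nonce so that SHA-256(challenge || nonce)
// starts with `difficulty` zero bits. Challenge string, difficulty, and
// encoding are illustrative, not Anubis's real protocol.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits of a 32-byte digest.
func leadingZeroBits(h [32]byte) int {
	n := 0
	for _, b := range h {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve tries nonces until the digest meets the difficulty target.
func solve(challenge string, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
		if leadingZeroBits(sha256.Sum256(buf)) >= difficulty {
			return nonce
		}
	}
}

func main() {
	nonce := solve("example-session-token", 20) // ~2^20 hashes on average
	fmt.Println("nonce:", nonce)
}
```

Once a client solves this once per session, every later request rides on the issued token, which is why the per-request cost to a large crawler is tiny.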
Economics and Alternatives (402, Micropayments, “Useful Work”)
- Many propose "402 Payment Required"–style schemes or Cloudflare-like pay-per-crawl/x402 to charge AI crawlers directly and shift costs back onto them (a minimal sketch follows this list); concerns include fees, taxes, exclusion of low-income users, and stronger DRM/copyright incentives.
- Ideas include memory-hard PoW (Argon2, scrypt), per-resource hashes, and tying challenges to limited request quotas (see the Argon2 sketch below), but there's skepticism that any tuning can meaningfully burden data centers without punishing real users.
- Some suggest embedding “useful work” (cryptomining, protein folding) in PoW; others strongly oppose normalizing web cryptominers and note that making work simultaneously useful, verifiable, and low-latency is unsolved.
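As a rough illustration of the 402-style proposals, here is a hypothetical Go middleware that returns HTTP 402 unless the request carries a valid payment token; the `X-Crawl-Payment` header, the token check, and the pricing note are made-up assumptions, not the actual x402 or Cloudflare pay-per-crawl protocol.

```go
// A hypothetical pay-per-crawl gate: requests without a valid payment
// token get HTTP 402. Header name, token check, and pricing hint are
// illustrative assumptions only.
package main

import "net/http"

func payGate(next http.Handler, validToken func(string) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tok := r.Header.Get("X-Crawl-Payment") // hypothetical header
		if tok == "" || !validToken(tok) {
			w.Header().Set("Content-Type", "text/plain")
			w.WriteHeader(http.StatusPaymentRequired) // 402
			w.Write([]byte("payment required: see /pricing for per-request terms\n"))
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	content := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("expensive page\n"))
	})
	// Toy token check; a real scheme would verify a signed receipt.
	valid := func(t string) bool { return t == "paid-demo-token" }
	http.ListenAndServe(":8080", payGate(content, valid))
}
```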
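For the memory-hard variant, a sketch under stated assumptions: the client must find a nonce whose Argon2id digest over (challenge || nonce) meets a difficulty target. The parameters (1 pass, 64 MiB, 1 thread) are illustrative, and verifying a solution costs the server the same memory-hard computation as checking one candidate, which is part of why commenters doubt any tuning hurts data centers more than users.

```go
// Sketch of a memory-hard challenge: find a nonce whose Argon2id digest
// over (challenge || nonce) has enough leading zero bits. Parameters are
// illustrative; verification is as expensive as one solving attempt.
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"

	"golang.org/x/crypto/argon2"
)

func leadingZeroBits(h []byte) int {
	n := 0
	for _, b := range h {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

func solveArgon2(challenge, salt []byte, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
		// 1 pass, 64 MiB, 1 thread, 32-byte digest: memory cost dominates.
		digest := argon2.IDKey(buf, salt, 1, 64*1024, 1, 32)
		if leadingZeroBits(digest) >= difficulty {
			return nonce
		}
	}
}

func main() {
	// difficulty 4 means ~2^4 Argon2 evaluations on average.
	nonce := solveArgon2([]byte("session-challenge"), []byte("per-site-salt"), 4)
	fmt.Println("nonce:", nonce)
}
```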
Impact of AI Crawlers on the Open Web
- Several operators of forges and personal sites report massive, robots.txt-ignoring scraping that hammers expensive endpoints (e.g., git blame, logs) and drives up bandwidth/CDN bills or causes slowdowns/DoS.
- Others say they see little such traffic and suspect this is mainly a problem for highly visible or code-heavy sites.
- There is worry that non-commercial sites will disappear or retreat behind private/overlay networks, geoblocking, or paywalls, contributing to the “balkanization” of the web.
Legal, Ethical, and Normative Arguments
- One camp: public web content is fair game for crawling unless it causes clear harm (e.g., takes sites down); mandatory robots.txt compliance or anti-crawling laws risk DRM-like regimes.
- The other camp: ignoring robots.txt and overwhelming small hosts is abusive, and there should be legal penalties (e.g., treating circumvention of systems like Anubis as bypassing “digital locks” under DMCA-style statutes).
- Debate hinges on whether publishing for humans implies consent to large-scale machine reuse and on the difficulty of cross-border enforcement.
Critiques of Anubis and Broader Arms Race
- Criticisms: Anubis harms UX (JS dependence, added delay), breaks archiving and indexing unless carefully configured, and stops only the “dumbest” bots rather than determined AI crawlers.
- Supporters counter that even partial filtering and higher marginal costs are valuable for donation-funded services that simply want to avoid being overrun.
- Some prefer alternative tactics: serving LLM-generated junk or honeypot link mazes to waste crawler resources or poison training data (a toy link-maze sketch follows); others experiment with IPv6-only sites, with mixed reports on effectiveness.
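A toy sketch of the link-maze tactic in Go: every page under a hypothetical `/maze/` prefix serves a handful of deterministic pseudo-random links deeper into the maze, so a crawler that ignores robots.txt burns requests on worthless pages. The path scheme and page text are invented for illustration.

```go
// Toy "link maze": each /maze/ page links to more /maze/ pages, generated
// deterministically from the request path so serving them is cheap.
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"net/http"
)

func mazeHandler(w http.ResponseWriter, r *http.Request) {
	// Seed from the path so links are stable across revisits.
	h := fnv.New64a()
	h.Write([]byte(r.URL.Path))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	w.Header().Set("Content-Type", "text/html")
	fmt.Fprintf(w, "<html><body><p>Archive section %d</p><ul>", rng.Intn(1000))
	for i := 0; i < 5; i++ {
		fmt.Fprintf(w, `<li><a href="/maze/%08x">entry %d</a></li>`, rng.Uint32(), i)
	}
	fmt.Fprint(w, "</ul></body></html>")
}

func main() {
	mux := http.NewServeMux()
	// A deployment would typically disallow /maze/ in robots.txt, so only
	// crawlers that ignore it ever wander in.
	mux.HandleFunc("/maze/", mazeHandler)
	http.ListenAndServe(":8080", mux)
}
```

Deterministic seeding keeps the maze stable and nearly free to serve, while a misbehaving crawler can walk it indefinitely.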