It seems like the AI crawlers learned how to solve the Anubis challenges
Role and Limits of Anubis / PoW
- Commenters stress Anubis was never a “bot detector” so much as a rate/cost limiter for abusive traffic, especially from rotating residential IPs that defeat IP-based throttling.
- It works by requiring a SHA-256 proof of work once per client/session and then issuing a JWT; a scraper pays that cost once and amortizes it over many requests, so large crawlers are only mildly inconvenienced (see the sketch after this list).
- Several note that if a normal browser can run the JS, a headless browser can too. The move from curl/Go clients to full Chromium was seen as inevitable.
- Some argue PoW is “security theater”: the cost per page is orders of magnitude too low relative to AI companies’ compute, especially given optimization and batching.
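The core mechanism is simple enough to sketch. Below is a minimal Go illustration of an Anubis-style SHA-256 proof of work, assuming a "find a nonce so that SHA-256(challenge || nonce) has N leading zero bits" scheme; the challenge encoding, difficulty, and JWT/cookie handling here are illustrative assumptions, not Anubis's actual wire format.

```go
// powsketch.go: brute-force a nonce so that SHA-256(challenge || nonce)
// starts with `difficulty` zero bits. Challenge string, difficulty, and
// encoding are illustrative, not Anubis's real protocol.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits of a 32-byte digest.
func leadingZeroBits(h [32]byte) int {
	n := 0
	for _, b := range h {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve tries nonces until the digest meets the difficulty target.
func solve(challenge string, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
		if leadingZeroBits(sha256.Sum256(buf)) >= difficulty {
			return nonce
		}
	}
}

func main() {
	nonce := solve("example-session-token", 20) // ~2^20 hashes on average
	fmt.Println("nonce:", nonce)
}
```

Once a client solves this once per session, every later request rides on the issued token, which is why the per-request cost to a large crawler is tiny.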
Economics and Alternatives (402, Micropayments, “Useful Work”)
- Many propose "402 Payment Required"–style schemes or Cloudflare-like pay-per-crawl/x402 to charge AI crawlers directly and shift costs back onto them (a minimal sketch follows this list); concerns include fees, taxes, exclusion of low-income users, and stronger DRM/copyright incentives.
- Ideas include memory-hard PoW (Argon2, scrypt), per-resource hashes, and tying challenges to limited request quotas (see the Argon2 sketch below), but there's skepticism that any tuning can meaningfully burden data centers without punishing real users.
- Some suggest embedding “useful work” (cryptomining, protein folding) in PoW; others strongly oppose normalizing web cryptominers and note that making work simultaneously useful, verifiable, and low-latency is unsolved.
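As a rough illustration of the 402-style proposals, here is a hypothetical Go middleware that returns HTTP 402 unless the request carries a valid payment token; the `X-Crawl-Payment` header, the token check, and the pricing note are made-up assumptions, not the actual x402 or Cloudflare pay-per-crawl protocol.

```go
// A hypothetical pay-per-crawl gate: requests without a valid payment
// token get HTTP 402. Header name, token check, and pricing hint are
// illustrative assumptions only.
package main

import "net/http"

func payGate(next http.Handler, validToken func(string) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tok := r.Header.Get("X-Crawl-Payment") // hypothetical header
		if tok == "" || !validToken(tok) {
			w.Header().Set("Content-Type", "text/plain")
			w.WriteHeader(http.StatusPaymentRequired) // 402
			w.Write([]byte("payment required: see /pricing for per-request terms\n"))
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	content := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("expensive page\n"))
	})
	// Toy token check; a real scheme would verify a signed receipt.
	valid := func(t string) bool { return t == "paid-demo-token" }
	http.ListenAndServe(":8080", payGate(content, valid))
}
```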
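For the memory-hard variant, a sketch under stated assumptions: the client must find a nonce whose Argon2id digest over (challenge || nonce) meets a difficulty target. The parameters (1 pass, 64 MiB, 1 thread) are illustrative, and verifying a solution costs the server the same memory-hard computation as checking one candidate, which is part of why commenters doubt any tuning hurts data centers more than users.

```go
// Sketch of a memory-hard challenge: find a nonce whose Argon2id digest
// over (challenge || nonce) has enough leading zero bits. Parameters are
// illustrative; verification is as expensive as one solving attempt.
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"

	"golang.org/x/crypto/argon2"
)

func leadingZeroBits(h []byte) int {
	n := 0
	for _, b := range h {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

func solveArgon2(challenge, salt []byte, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
		// 1 pass, 64 MiB, 1 thread, 32-byte digest: memory cost dominates.
		digest := argon2.IDKey(buf, salt, 1, 64*1024, 1, 32)
		if leadingZeroBits(digest) >= difficulty {
			return nonce
		}
	}
}

func main() {
	// difficulty 4 means ~2^4 Argon2 evaluations on average.
	nonce := solveArgon2([]byte("session-challenge"), []byte("per-site-salt"), 4)
	fmt.Println("nonce:", nonce)
}
```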
Impact of AI Crawlers on the Open Web
- Several operators of forges and personal sites report massive, robots.txt-ignoring scraping that hammers expensive endpoints (e.g., git blame, logs) and drives up bandwidth/CDN bills or causes slowdowns/DoS.
- Others say they see little such traffic and suspect this is mainly a problem for highly visible or code-heavy sites.
- There is worry that non-commercial sites will disappear or retreat behind private/overlay networks, geoblocking, or paywalls, contributing to the “balkanization” of the web.
Legal, Ethical, and Normative Arguments
- One camp: public web content is fair game for crawling unless it causes clear harm (e.g., takes sites down); mandatory robots.txt compliance or anti-crawling laws risk DRM-like regimes.
- The other camp: ignoring robots.txt and overwhelming small hosts is abusive, and there should be legal penalties (e.g., treating circumvention of systems like Anubis as bypassing “digital locks” under DMCA-style statutes).
- Debate hinges on whether publishing for humans implies consent to large-scale machine reuse and on the difficulty of cross-border enforcement.
Critiques of Anubis and Broader Arms Race
- Criticisms: Anubis harms UX (JS dependence, added delay), breaks archiving and indexing unless carefully configured, and stops only the “dumbest” bots rather than determined AI crawlers.
- Supporters counter that even partial filtering and higher marginal costs are valuable for donation-funded services that simply want to avoid being overrun.
- Some prefer alternative tactics: serving LLM-generated junk or honeypot link mazes to waste crawler resources or poison training data (a toy link-maze sketch follows); others experiment with IPv6-only sites, with mixed reports on effectiveness.
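A toy sketch of the link-maze tactic in Go: every page under a hypothetical `/maze/` prefix serves a handful of deterministic pseudo-random links deeper into the maze, so a crawler that ignores robots.txt burns requests on worthless pages. The path scheme and page text are invented for illustration.

```go
// Toy "link maze": each /maze/ page links to more /maze/ pages, generated
// deterministically from the request path so serving them is cheap.
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"net/http"
)

func mazeHandler(w http.ResponseWriter, r *http.Request) {
	// Seed from the path so links are stable across revisits.
	h := fnv.New64a()
	h.Write([]byte(r.URL.Path))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	w.Header().Set("Content-Type", "text/html")
	fmt.Fprintf(w, "<html><body><p>Archive section %d</p><ul>", rng.Intn(1000))
	for i := 0; i < 5; i++ {
		fmt.Fprintf(w, `<li><a href="/maze/%08x">entry %d</a></li>`, rng.Uint32(), i)
	}
	fmt.Fprint(w, "</ul></body></html>")
}

func main() {
	mux := http.NewServeMux()
	// A deployment would typically disallow /maze/ in robots.txt, so only
	// crawlers that ignore it ever wander in.
	mux.HandleFunc("/maze/", mazeHandler)
	http.ListenAndServe(":8080", mux)
}
```

Deterministic seeding keeps the maze stable and nearly free to serve, while a misbehaving crawler can walk it indefinitely.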