AI companies cause most of the traffic on forums
Scale and behavior of AI crawlers
- Many reports of AI-related bots (Claude, GPTBot, Meta, AmazonBot, etc.) dominating traffic on forums and wikis, sometimes >90% of hits.
- Some describe effectively DDoS-like load: repeated full-site crawls, wiki revision histories and diff views, dynamic search pages, and little or no backoff.
- Others note that raw request counts can sound large while average TPS remains modest; the real contention is burstiness and the cost of dynamic pages, not averages.
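The averages-vs-bursts distinction can be made concrete with back-of-envelope arithmetic; the volumes below are invented for illustration, not figures from the thread:

```python
# Illustrative arithmetic: a large daily request count vs. its average rate.
requests_per_day = 5_000_000          # hypothetical crawler volume
avg_tps = requests_per_day / 86_400   # seconds per day
print(f"average: {avg_tps:.1f} req/s")   # ~57.9 req/s

# The same daily volume delivered as one 10-minute burst per hour
# hits the server far harder than the average suggests:
burst_window_s = 10 * 60
burst_tps = (requests_per_day / 24) / burst_window_s
print(f"burst: {burst_tps:.1f} req/s")   # ~347.2 req/s
```

The same site that shrugs off ~58 req/s of cached static pages can fall over at ~347 req/s of uncached wiki diffs, which is the thread's point about dynamic-page cost.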
Robots.txt, legality, and norms
- Several posters say AI crawlers ignore robots.txt and then evade blocks by:
  - Rotating IPs and ASNs (often cloud/residential ranges).
  - Spoofing “normal” browser User-Agent strings.
- There is disagreement over a specific site’s robots.txt history; archival data and operator claims conflict.
- Some cite US case law suggesting that scraping public pages is legal; others argue that high-volume, non-consensual crawling that degrades service edges into unauthorized-access (CFAA) territory, though enforceability is doubted and such laws are seen as favoring large companies.
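For reference, this is what robots.txt compliance looks like when a crawler actually honors it, sketched with Python's standard urllib.robotparser (the bot names, paths, and rules below are illustrative):

```python
# Sketch: how a well-behaved crawler consults robots.txt before fetching
# (the thread's complaint is that many AI bots skip this step entirely).
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "/wiki/Some_Page"))        # False: fully disallowed
print(rp.can_fetch("SomeOtherBot", "/wiki/Some_Page"))  # True: only rate-limited
```

A compliant crawler would also honor the Crawl-delay directive; the evasion tactics listed above (rotating IPs, spoofed User-Agents) exist precisely to dodge rules like these.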
Mitigation strategies
- Technical defenses mentioned:
  - Blocking or bandwidth-limiting by User-Agent, IP, ASN, or geography.
  - Using Cloudflare/WAFs, rate limiting (e.g., nginx + fail2ban), HAProxy stick tables.
  - Returning 403/404/429/5xx, deliberately slow responses, or endless 302 redirect chains.
  - Moving content behind logins, VPNs, or whitelists.
- Concerns that over-aggressive bot-blocking (especially via Cloudflare “suspected bot” features) harms legitimate users, especially in some regions.
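A minimal sketch of the rate-limiting idea behind the nginx/WAF-style throttling mentioned above: a per-client token bucket that absorbs a small burst and then rejects. The RATE and BURST values are hypothetical, and this is an in-memory toy, not production code:

```python
# Per-client token-bucket rate limiter (sketch).
import time
from collections import defaultdict

RATE = 5.0    # tokens refilled per second (sustained req/s allowed)
BURST = 10.0  # bucket capacity (burst size allowed)

_buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

def allow(client_ip: str) -> bool:
    """Return True to serve the request, False to respond with a 429."""
    b = _buckets[client_ip]
    now = time.monotonic()
    # Refill proportionally to elapsed time, capped at bucket capacity.
    b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
    b["ts"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False
```

A crawler firing faster than RATE drains its bucket after roughly BURST requests and then receives 429s until it backs off, while a human browsing at normal speed never notices the limit.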
Data poisoning and honeypots
- Many propose serving AI bots:
  - Markov-chain or LLM-generated nonsense, subtle inaccuracies, or irrelevant text.
  - Alternative “junk” mirrors, link mazes, and tarpits.
  - Honeypot URLs exposed only via hidden links or robots.txt Disallow entries, so any client fetching them reveals itself as a misbehaving crawler.
- Skeptics argue large LLM trainers heavily clean data and can filter obvious garbage; supporters counter that obfuscation can evolve and poisoning can be a “moral victory” even if low-scale.
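As a concrete example of the Markov-chain idea, here is a toy generator that emits plausible-looking but meaningless word sequences; the corpus and function names are made up for illustration:

```python
# Toy Markov-chain text generator of the kind posters propose serving to
# misbehaving bots: statistically word-like output with no real content.
import random
from collections import defaultdict

def build_chain(text: str) -> dict:
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain: dict, start: str, length: int = 30, seed: int = 0) -> str:
    """Walk the chain from a start word, choosing successors at random."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the crawler fetched the page and the server logged the request"
print(babble(build_chain(corpus), "the"))
```

This illustrates the skeptics' point too: first-order word statistics are trivially detectable by the perplexity and quality filters large trainers already run, which is why supporters frame poisoning as an evolving arms race rather than a one-shot fix.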
Impact on small sites and future of the web
- Fear that continual bot load plus AI-generated spam will:
  - Push small forums/wikis to shut down, go private, or require logins and payments.
  - Accelerate a shift to walled gardens and centralized platforms.
- Some see this as “privatized profits, socialized losses”: AI firms monetize models trained on public content while site operators pay infra bills.