AI companies cause most of the traffic on forums

Scale and behavior of AI crawlers

  • Many reports of AI-related bots (ClaudeBot, GPTBot, Meta's crawlers, Amazonbot, etc.) dominating traffic on forums and wikis, sometimes accounting for >90% of hits.
  • Some describe effectively DDoS-like load: repeated full-site crawls, hits on wiki revision histories and diff views, dynamic search queries, and little or no backoff.
  • Others note that raw request counts can sound large while average TPS stays modest; the real cost comes from burstiness and expensive dynamic pages, not averages.
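The averages-vs-bursts distinction can be checked directly from server logs. A minimal Python sketch (the log lines and the `rate_stats` helper are hypothetical) that compares the average request rate against the worst one-second burst:

```python
import re
from collections import Counter

# Matches the bracketed timestamp in a common-log-format line,
# e.g. [10/Oct/2025:13:55:36 +0000] -- 1-second resolution.
TS_RE = re.compile(r"\[([^\]]+)\]")

def rate_stats(log_lines):
    """Return (approx. average requests/sec, peak requests in any one second)."""
    per_second = Counter()
    for line in log_lines:
        m = TS_RE.search(line)
        if m:
            per_second[m.group(1)] += 1  # bucket by full timestamp string
    if not per_second:
        return 0.0, 0
    total = sum(per_second.values())
    # Approximate the window by the number of distinct seconds actually seen;
    # a quiet site with one hot burst will show a small average and a big peak.
    return total / len(per_second), max(per_second.values())
```

A crawler that averages 2 req/s but fires 300 requests in one second shows up in the peak, not the average, which is the point several posters make.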

Robots.txt, legality, and norms

  • Several posters say AI crawlers ignore robots.txt and then evade blocks by:
    • Rotating IPs and ASNs (often cloud/residential ranges).
    • Spoofing “normal” browser User-Agent strings.
  • There is disagreement over a specific site’s robots.txt history; archival data and operator claims conflict.
  • Some cite US case law suggesting that scraping public pages is legal; others argue that high-volume, non-consensual crawling that degrades service edges toward unauthorized access under the CFAA, though enforcement is doubted and widely seen as favoring large companies.
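For reference, the opt-out mechanism these crawlers are accused of ignoring is simple to check programmatically. A minimal sketch using Python's standard `urllib.robotparser` (the rules and the `allowed` helper are a hypothetical example; GPTBot is OpenAI's documented crawler token):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny GPTBot everywhere, allow everyone else.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def allowed(user_agent: str, url: str) -> bool:
    """The check a well-behaved crawler performs before fetching a page."""
    rp = RobotFileParser()
    rp.parse(RULES.splitlines())
    return rp.can_fetch(user_agent, url)
```

The complaint in the thread is precisely that compliance is voluntary: nothing in the protocol stops a crawler from skipping this check or rotating to an unlisted User-Agent.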

Mitigation strategies

  • Technical defenses mentioned:
    • Blocking or rate-limiting by User-Agent, IP, ASN, or geography.
    • Using Cloudflare/WAFs, rate limiting (e.g., nginx + fail2ban), or HAProxy stick tables.
    • Returning 403/404/429/5xx, or very slow responses, or infinite 302 chains.
    • Moving content behind logins, VPNs, or whitelists.
  • Concerns that over-aggressive bot-blocking (especially via Cloudflare “suspected bot” features) harms legitimate users, especially in some regions.
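As a concrete illustration of the first two bullets, a hypothetical nginx snippet combining a User-Agent block with per-IP rate limiting (the directives are real nginx ones; the UA list, limits, and upstream name are illustrative, and `map`/`limit_req_zone` belong in the `http` context):

```nginx
# Flag known AI crawler User-Agents (case-insensitive substring match).
map $http_user_agent $ai_bot {
    default      0;
    ~*GPTBot     1;
    ~*ClaudeBot  1;
    ~*Amazonbot  1;
}

# Per-client-IP rate limit: 5 req/s sustained, stored in a 10 MB zone.
limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;

server {
    listen 80;

    location / {
        if ($ai_bot) { return 403; }             # refuse declared AI bots outright
        limit_req zone=perip burst=20 nodelay;   # excess requests get 503 by default
        proxy_pass http://backend;               # hypothetical upstream
    }
}
```

Note the thread's caveat applies here too: UA matching only catches bots that identify themselves, and aggressive IP-based limits can catch legitimate users behind shared NATs.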

Data poisoning and honeypots

  • Many propose serving AI bots:
    • Markov-chain or LLM-generated nonsense, subtle inaccuracies, or irrelevant text.
    • Alternative “junk” mirrors, link mazes, and tarpits.
    • Honeypot URLs in hidden links or robots.txt to detect misbehaving crawlers.
  • Skeptics argue that large LLM trainers heavily clean their data and can filter obvious garbage; supporters counter that obfuscation can evolve and that poisoning is a “moral victory” even at small scale.
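A toy sketch of the Markov-chain idea: build a bigram model from a page's real text, then serve fluent-looking gibberish to suspected crawlers instead of the genuine content (`build_chain` and `babble` are hypothetical names):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it (a bigram model)."""
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain, start, n_words, seed=0):
    """Random-walk the chain: locally plausible, globally meaningless text."""
    rng = random.Random(seed)  # seeded so the example is deterministic
    out = [start]
    for _ in range(n_words - 1):
        nexts = chain.get(out[-1])
        if not nexts:
            break  # dead end: the last word never had a successor
        out.append(rng.choice(nexts))
    return " ".join(out)
```

This is exactly the kind of output the skeptics say perplexity- and dedup-based cleaning pipelines catch easily, which is why proponents argue for subtler, evolving corruption rather than obvious word salad.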

Impact on small sites and future of the web

  • Fear that continual bot load plus AI-generated spam will:
    • Push small forums/wikis to shut down, go private, or require logins and payments.
    • Accelerate a shift to walled gardens and centralized platforms.
  • Some see this as “privatized profits, socialized losses”: AI firms monetize models trained on public content while site operators pay infra bills.