OpenAI's bot crushed this seven-person company's web site 'like a DDoS attack'

Legal and liability questions

  • Commenters discuss whether excess hosting costs from scraping are legally recoverable.
  • Cited case law suggests scraping public data generally does not violate the Computer Fraud and Abuse Act (CFAA) criminally; disputes are mostly civil.
  • One small site reports successfully getting a few thousand dollars from an AI company for bandwidth overuse.
  • Several note robots.txt has no legal force; lawsuits would likely rest on general claims of harm, not robots violations.
  • There is disagreement over whether lawsuits against large-scale, non-consensual scraping could realistically succeed; some say there is “no precedent,” while others point to past crawling lawsuits (unclear which).

Who is responsible for the overload?

  • One camp: if you run a public site without auth, rate limits, caching, or robots.txt, you should expect heavy crawling and design for it.
  • Opposing camp: small businesses can’t all be infra experts; bots that knock sites offline are behaving unreasonably, regardless of site quality.
  • Analogies (e.g., emptying a free library, filling a shop with non-buyers) are used to argue that “not illegal” ≠ ethical.

Crawler behavior and engineering quality

  • Many describe AI crawlers as poorly engineered: over-aggressive, ignoring 429 / Retry-After, re-crawling unchanged content, and sometimes spoofing user agents.
  • Some note that classic search bots historically provided support channels and honored robots more reliably.
  • Others argue the article’s “DDoS-like” framing is unproven because no request rates or timestamps are shown; Cloudflare IPs in logs further muddy attribution.
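Honoring 429 and Retry-After, which commenters say AI crawlers ignore, takes only a few lines of backoff logic. A minimal sketch of what a well-behaved crawler would do (function names are hypothetical; the request function is injected so the policy is testable without a network):

```python
import time

def fetch_politely(url, do_request, max_retries=3):
    """Fetch a URL while honoring 429 responses and Retry-After headers.

    do_request(url) -> (status_code, headers, body). Injected for
    illustration; a real crawler would wrap an HTTP client here.
    """
    status, headers, body = None, {}, ""
    for attempt in range(max_retries + 1):
        status, headers, body = do_request(url)
        if status != 429:
            return status, body
        # Honor Retry-After when the server supplies it;
        # otherwise fall back to exponential backoff.
        delay = float(headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    return status, body
```

The key design point is that the server's Retry-After value takes precedence over the client's own schedule; crawlers that skip this step are the ones the thread describes as "poorly engineered."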

Mitigations and countermeasures

  • Suggested defenses: robots.txt (with explicit AI blocks), Cloudflare protection and AI-bot blocking, fail2ban, HTTP 429 with subnet-level throttling, ASN or country blocks, IPv6-only access, and .htaccess rules.
  • Some propose “data poisoning” defenses: serving gibberish, recursive content, or compressible text to abusive bots; others argue such gibberish is easy to filter in curation.
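For the "explicit AI blocks" in robots.txt, an opt-out looks like the following, using user-agent tokens the major vendors publish (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's AI training). As commenters note, this is purely advisory:

```text
# robots.txt — ask known AI crawlers to stay out (advisory only)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```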
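The "recursive content" defense can be sketched in a few lines: serve deterministic gibberish whose links point ever deeper into a synthetic tree, so an abusive bot burns its crawl budget on worthless pages. A toy version (function name and path scheme are hypothetical):

```python
import hashlib

def gibberish_page(path, n_words=50, n_links=5):
    """Return (text, links) for a page in an endless synthetic maze.

    The content is derived from a hash of the path, so every URL yields
    stable nonsense, and each page links to n_links deeper pages that
    exist only as more gibberish. Illustrative sketch only.
    """
    words = []
    seed = path.encode()
    for i in range(n_words):
        digest = hashlib.sha256(seed + str(i).encode()).hexdigest()
        words.append(digest[:6])
    links = [f"{path.rstrip('/')}/{w}" for w in words[:n_links]]
    return " ".join(words), links
```

As the skeptics in the thread point out, hex-fragment text like this is trivially easy to filter during dataset curation, which is the main argument against the approach.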

Broader implications for the web and AI

  • Concern that aggressive AI scraping will push more content behind logins/paywalls, reducing open information.
  • Some see AI agents as reviving “personal webcrawlers” and automating interactions with sites that don’t offer APIs.
  • Others worry this simply recreates a centralized, Google-like gatekeeper, now controlled by AI companies.