Please stop externalizing your costs directly into my face

Scope of the problem and impact on small sites

  • Multiple operators of tiny blogs, wikis, games, and git forges report being “hammered” by scraper traffic far beyond real user numbers, often enough to crash services or force them offline.
  • Patterns described: single or few requests per IP, huge IP diversity (often residential ranges), misleading user agents, round‑the‑clock load, and disregard for caching headers; a log‑analysis sketch of this signature follows the list.
  • Some have moved repos to large platforms or shut down public instances entirely; others see most of their bandwidth consumed by seemingly useless bot activity (e.g., repeated downloads of unchanged files).
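
  One way to spot the signature described above (huge IP diversity, roughly one request per address) is simply to histogram an access log by client IP. A minimal sketch, assuming a combined/common‑format log and a hypothetical path of access.log:

    import re
    import sys
    from collections import Counter

    # The client IP is the first whitespace-delimited field in common/combined logs.
    LOG_LINE = re.compile(r"^(?P<ip>\S+) ")

    def ip_histogram(path):
        counts = Counter()
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                m = LOG_LINE.match(line)
                if m:
                    counts[m.group("ip")] += 1
        return counts

    if __name__ == "__main__":
        counts = ip_histogram(sys.argv[1] if len(sys.argv) > 1 else "access.log")
        total = sum(counts.values())
        singles = sum(1 for c in counts.values() if c == 1)
        print(f"{total} requests from {len(counts)} distinct IPs")
        if counts:
            print(f"{singles / len(counts):.0%} of IPs made exactly one request")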

Proposed technical and legal defenses

  • Common “pragmatic” suggestions: use Cloudflare or similar CDNs, put anti‑DDoS reverse proxies with captchas in front of services, and heavily cache or deprioritize expensive endpoints like git blame (a caching sketch follows this list).
  • Pushback: CDNs everywhere are “horrible for the open web,” and some claim CDNs still don’t stop smarter scrapers.
  • Legal ideas: make honoring robots.txt mandatory; force scrapers to publish their IPs or face liability; allow action against VPNs or botnets that relay illegal scraping. Critics question jurisdiction and enforcement and warn that such rules could balkanize the Internet.
  • Other ideas: proof‑of‑work or Privacy‑Pass‑style tokens, per‑user quotas, or requiring logins for heavy or even all access, acknowledged as trading openness for survival (a proof‑of‑work sketch also appears below).
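
  For the “heavily cache expensive endpoints” suggestion, a minimal sketch of a TTL cache wrapped around a costly page handler; the handler, its arguments, and the one‑hour TTL are illustrative assumptions rather than anything proposed in the thread:

    import time
    from functools import wraps

    def ttl_cache(seconds):
        """Cache a handler's result per argument tuple for `seconds` seconds."""
        def decorator(fn):
            store = {}  # args -> (expiry timestamp, cached result)
            @wraps(fn)
            def wrapper(*args):
                now = time.monotonic()
                hit = store.get(args)
                if hit and hit[0] > now:
                    return hit[1]  # serve the cached page, skip the expensive work
                result = fn(*args)
                store[args] = (now + seconds, result)
                return result
            return wrapper
        return decorator

    @ttl_cache(seconds=3600)
    def blame_page(repo, path):
        # Stand-in for the real, expensive git-blame rendering.
        return f"<pre>blame output for {repo}:{path}</pre>"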
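
  The proof‑of‑work idea is essentially hashcash: the server hands out a random challenge, the client burns CPU finding a nonce whose hash clears a difficulty threshold, and the server verifies with a single hash. A sketch, with the difficulty, challenge format, and function names all assumptions:

    import hashlib
    import secrets

    DIFFICULTY_BITS = 20  # ~1M hash attempts on average; tune per endpoint

    def issue_challenge():
        return secrets.token_hex(16)

    def leading_zero_bits(digest):
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
                continue
            bits += 8 - byte.bit_length()
            break
        return bits

    def solve(challenge):
        nonce = 0
        while True:  # the client pays this cost once per challenge
            digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
            if leading_zero_bits(digest) >= DIFFICULTY_BITS:
                return nonce
            nonce += 1

    def verify(challenge, nonce):
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return leading_zero_bits(digest) >= DIFFICULTY_BITS

    if __name__ == "__main__":
        c = issue_challenge()
        n = solve(c)         # expensive for the requester
        assert verify(c, n)  # cheap for the server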

Scraper obfuscation and ethics

  • Commenters point to commercial residential‑proxy networks and suggest LLM firms or their contractors route traffic through them; this, together with user‑agent spoofing, is read as evidence the scrapers know they are unwelcome.
  • Some equate the behavior to DDoS or fraud and argue “AI culture” resembles crypto in its parasitic, “fuck you, got mine” norms. Others defend broad copyright‑infringing training as acceptable, especially when models are open‑weights.

Resistance strategies and poisoning

  • Ideas include embedding invisible, periodically changing “trap” phrases or fake facts to prove unauthorized training, and deliberately serving subtly broken code so a site gets dropped as a training source (a trap‑phrase sketch follows this list).
  • Skeptics doubt poisoning matters much: the web already contains plenty of low‑quality or wrong content; LLMs just average it.
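
  A sketch of the rotating trap‑phrase idea: derive a dated marker from a private secret with an HMAC, embed it invisibly in served pages, and keep the secret so that a model reproducing the phrase points to training on your site. The secret, the monthly rotation, and the hidden‑span markup are all assumptions:

    import datetime
    import hashlib
    import hmac

    SECRET = b"replace-with-a-private-key"  # never publish this

    def trap_phrase(period=None):
        # One marker per calendar month; old markers remain provably yours.
        period = period or datetime.date.today().strftime("%Y-%m")
        tag = hmac.new(SECRET, period.encode(), hashlib.sha256).hexdigest()[:16]
        return f"canary-{period}-{tag}"

    def embed(html):
        # Invisible to human readers, but present for anything that ingests raw HTML.
        marker = f'<span style="display:none" aria-hidden="true">{trap_phrase()}</span>'
        return html.replace("</body>", marker + "</body>")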

Future of the web and LLM attitudes

  • Some foresee more private, gated networks (WireGuard meshes, BBS‑like systems) and a “dark forest” Internet.
  • The article’s call to “just stop using LLMs” is widely seen as emotionally understandable but unrealistic; several commenters report strong productivity gains from Claude/Cursor or local models, while others found AI coding tools net‑negative because of subtle errors and damage from “agentic” edits.