The internet is no longer a safe haven
Perceived causes of rising abuse
- Many see a clear recent increase in scrapers and attacks, largely attributed to:
  - Commercial demand for training data from AI companies.
  - LLMs making it trivial for non-experts to generate custom scrapers or “check X every second” tools.
  - Cheap, abundant cloud infrastructure and proxy networks (including residential/mobile IPs).
- Others argue the internet has always been hostile; what changed is scale and automation, not the fundamental dynamic.
Legal vs technical governance
- One camp sees this primarily as a legal problem:
  - Better international law enforcement and treaties (like those against software piracy) extended to DDoS and abuse.
  - Liability for hosts/ISPs and even negligent customers (e.g., outdated WordPress on a VPS).
- Pushback:
  - Global enforcement also imports foreign censorship and speech laws.
  - Centralized control (governments, clouds) is ripe for abuse and could be worse than today’s Cloudflare-style gatekeepers.
- Some advocate “tribes”/small communities with explicit gatekeeping instead of global regulation.
Identity, reputation, and proof-of-work ideas
- Proposed defenses:
  - Request-signing standards plus reputation databases for crawlers (a minimal signing sketch follows this list).
  - Persistent, per-service pseudonymous identities that can be banned but don’t reveal real-world identity.
  - Reputation systems where capabilities grow with good behavior; resetting an identity should be costly.
- Concerns:
  - “Digital death penalty” (permanent exclusion) and abuse by authoritarian regimes.
  - The tension between reputation and privacy may be fundamentally hard to resolve; zero-knowledge proofs are suggested but unproven at scale.
- Proof-of-work:
  - Suggested as a way to keep bots out; critics cite work showing that a single global difficulty threshold is unworkable and that attackers can use cheap or botnet compute (a hashcash-style sketch also follows this list).
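As a rough illustration of the request-signing proposal, which the discussion names but does not specify, the sketch below shows one way a crawler could sign requests with a shared key so the origin can tie traffic to a registered, bannable identity. The header names, the shared secret, and the crawler ID are hypothetical examples, not part of any standard referenced above.

```python
import hashlib
import hmac
import time

# Hypothetical credentials issued to a registered crawler; real proposals
# (e.g. public-key message signatures) differ, this is only a sketch.
SHARED_SECRET = b"example-crawler-secret"
CRAWLER_ID = "example-crawler"


def sign_request(method: str, path: str, secret: bytes) -> dict:
    """Build illustrative signature headers for an outgoing crawler request."""
    timestamp = str(int(time.time()))
    message = f"{method}\n{path}\n{timestamp}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {
        "X-Crawler-Id": CRAWLER_ID,        # who claims to be crawling
        "X-Crawler-Timestamp": timestamp,  # limits replay of old signatures
        "X-Crawler-Signature": signature,  # HMAC over method, path, timestamp
    }


def verify_request(method: str, path: str, headers: dict, secret: bytes,
                   max_skew: int = 300) -> bool:
    """Server side: recompute the HMAC and reject stale or forged requests."""
    try:
        timestamp = headers["X-Crawler-Timestamp"]
        claimed = headers["X-Crawler-Signature"]
    except KeyError:
        return False
    if abs(time.time() - int(timestamp)) > max_skew:
        return False  # too old or too far in the future
    message = f"{method}\n{path}\n{timestamp}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claimed)


if __name__ == "__main__":
    headers = sign_request("GET", "/feed.xml", SHARED_SECRET)
    print(verify_request("GET", "/feed.xml", headers, SHARED_SECRET))  # True
```

A reputation database would then attach bans, rate limits, or expanded capabilities to the crawler identity rather than to constantly rotating IP addresses.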
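The proof-of-work proposal is usually a hashcash-style puzzle: the server hands out a challenge and the client must find a nonce whose hash clears a difficulty threshold, while verification stays a single hash. The sketch below is a generic illustration under that assumption, not any specific system mentioned in the thread; the 20-bit difficulty is an arbitrary example.

```python
import hashlib
import secrets


def solve_challenge(challenge: bytes, difficulty_bits: int) -> int:
    """Find a nonce such that sha256(challenge || nonce) has `difficulty_bits`
    leading zero bits. Expected cost: about 2**difficulty_bits hash attempts."""
    target = 1 << (256 - difficulty_bits)  # digests below this value qualify
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash, which is the scheme's whole appeal."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    challenge = secrets.token_bytes(16)  # issued per visitor by the server
    nonce = solve_challenge(challenge, difficulty_bits=20)  # ~1M hashes
    print(verify(challenge, nonce, difficulty_bits=20))     # True
```

The critics’ objection is visible in the numbers: a threshold that costs a laptop around a second costs a botnet with thousands of machines next to nothing, while it can meaningfully slow legitimate visitors on old or battery-powered hardware, so no single global difficulty works.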
Defensive techniques in practice
- Common approaches discussed:
  - Nginx rate limiting, iptables/ASN/geo blocking, SYN anti-spoofing, rp_filter (a generic rate-limiting sketch follows this list).
  - Honeypots and traps: invisible links, fake admin paths, “bot-ban-me” hostnames, SSH user triggers (a trap-path sketch also follows).
  - Bot-wasting tactics like zipbombs or bogus content to poison AI scrapers.
  - mTLS, VPNs/WireGuard, and Cloudflare/Anubis-style frontends for private or small sites.
- Mixed experience: some see these as sufficient; others say even hobby sites get overwhelmed without big-CDN protection.
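Nginx’s rate limiting is named above only in passing; as a language-neutral sketch of the underlying idea, the code below implements a per-client token bucket, roughly the model such limiters use. The rate of 5 requests per second with a burst of 10, and the in-memory dict keyed by IP, are arbitrary choices for illustration.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate: float = 5.0, burst: float = 10.0):
        self.rate = rate
        self.burst = burst
        # client key (e.g. an IP address) -> (tokens remaining, last refill time)
        self.state = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, client: str) -> bool:
        tokens, last = self.state[client]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[client] = (tokens - 1.0, now)
            return True   # request passes
        self.state[client] = (tokens, now)
        return False      # request should be rejected (e.g. HTTP 429)


if __name__ == "__main__":
    limiter = TokenBucket(rate=5, burst=10)
    allowed = sum(limiter.allow("203.0.113.7") for _ in range(50))
    print(f"{allowed} of 50 back-to-back requests allowed")  # roughly the burst size
```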
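The honeypot/trap idea is similarly easy to show in miniature: publish a path no legitimate visitor is ever linked to (or link it invisibly), and ban any client that requests it. The sketch below uses Python’s standard http.server purely for illustration; the trap paths and the in-memory ban set are invented for the example, and a real deployment would persist bans and feed them to a firewall or Fail2ban.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Paths that are never linked anywhere visible; only crawlers that ignore
# robots.txt or probe for common admin panels should ever request them.
TRAP_PATHS = {"/wp-login-trap", "/admin-backup.zip"}
BANNED_IPS: set[str] = set()  # in-memory for the sketch; real setups persist this


class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ip = self.client_address[0]
        if ip in BANNED_IPS:
            self.send_error(403, "banned")
            return
        if self.path in TRAP_PATHS:
            BANNED_IPS.add(ip)  # hitting the trap bans the client
            self.send_error(403, "banned")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        # An invisible link to the trap that normal readers will never follow.
        self.wfile.write(
            b'<p>Hello.</p><a href="/wp-login-trap" style="display:none"></a>'
        )


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TrapHandler).serve_forever()
```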
Effects on hobbyists and self-hosting
- Many recount constant automated probing since the 2000s, with logs full of exploit attempts against software they don’t even run.
- Some argue the real issue is unoptimized stacks (e.g., Gitea + Fail2ban) rather than traffic volume.
- Others say requiring deep security expertise and endless hardening proves the environment is objectively hostile, discouraging casual self-hosting.
Debate over the internet’s past and future
- Some think “the internet is over” as an open, welcoming space; any new platform will be swarmed the moment it gains traction.
- Suggestions range from moving to niche protocols and networks (Gemini, IPv6-only hosting) to accepting centralization and signing/identity requirements as inevitable.
- There’s a broader philosophical split between seeing the internet as an immense net-benefit and as a net-negative (consumerism, attention capture, AI “plastic content”), with no consensus on a realistic path back to “safe haven” status.