The internet is no longer a safe haven

Perceived causes of rising abuse

  • Many commenters see a clear recent increase in scraping and attacks, which they largely attribute to:
    • Commercial demand for training data from AI companies.
    • LLMs making it trivial for non-experts to generate custom scrapers or “check X every second” tools.
    • Cheap, abundant cloud infrastructure and proxy networks (including residential/mobile IPs).
  • Others argue the internet has always been hostile; what changed is scale and automation, not the fundamental dynamic.

Legal vs technical governance

  • One camp sees this primarily as a legal problem:
    • Better international law enforcement, with treaties (like those against software piracy) extended to cover DDoS and abuse.
    • Liability for hosts/ISPs and even negligent customers (e.g., an outdated WordPress install on a VPS).
  • Pushback:
    • Global enforcement also imports foreign censorship and speech laws.
    • Centralized control (governments, clouds) is ripe for abuse and could be worse than today’s Cloudflare-style gatekeepers.
    • Some advocate “tribes”/small communities with explicit gatekeeping instead of global regulation.

Identity, reputation, and proof-of-work ideas

  • Proposed defenses:
    • Request-signing standards plus reputation databases for crawlers (sketched after this list).
    • Persistent, per-service pseudonymous identities that can be banned but don’t reveal real-world identity.
    • Reputation systems where capabilities grow with good behavior; resetting identity should be costly.
  • Concerns:
    • “Digital death penalty” (permanent exclusion) and abuse by authoritarian regimes.
    • The tension between reputation and privacy may be fundamentally hard to resolve; zero-knowledge proofs are suggested but unproven at scale.
  • Proof-of-work:
    • Suggested as a way to keep bots out; critics cite work arguing that no single global difficulty threshold is workable and that attackers can draw on cheap cloud or botnet compute (see the sketch below).
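
To make the request-signing-plus-reputation idea concrete, here is a minimal sketch using Ed25519 signatures from the pyca/cryptography package. The header names (X-Crawler-Key, X-Crawler-Signature) and the in-memory reputation_db are invented for illustration; an actual standard would define the header format and a shared, queryable registry of crawler keys.

```python
import base64
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# --- Crawler side: one long-lived keypair identifies the crawler. ---
crawler_key = Ed25519PrivateKey.generate()
crawler_pub = crawler_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

def sign_request(method: str, path: str, body: bytes) -> dict:
    """Headers a crawler could attach; the header names are illustrative only."""
    digest = hashlib.sha256(body).hexdigest()
    message = f"{method} {path} sha256={digest}".encode()
    return {
        "X-Crawler-Key": base64.b64encode(crawler_pub).decode(),
        "X-Crawler-Signature": base64.b64encode(crawler_key.sign(message)).decode(),
    }

# --- Site side: check the signature, then consult a reputation database. ---
# Hypothetical in-memory store; a real one would be shared across sites.
reputation_db = {crawler_pub: {"owner": "example-bot", "good": True}}

def accept_request(method: str, path: str, body: bytes, headers: dict) -> bool:
    pub_bytes = base64.b64decode(headers["X-Crawler-Key"])
    record = reputation_db.get(pub_bytes)
    if not record or not record["good"]:
        return False  # unknown or badly-reputed crawler: throttle or refuse
    message = f"{method} {path} sha256={hashlib.sha256(body).hexdigest()}".encode()
    try:
        Ed25519PublicKey.from_public_bytes(pub_bytes).verify(
            base64.b64decode(headers["X-Crawler-Signature"]), message
        )
        return True
    except InvalidSignature:
        return False

# Example round-trip:
hdrs = sign_request("GET", "/feed.xml", b"")
assert accept_request("GET", "/feed.xml", b"", hdrs)
```

The intended property is that a site can cheaply verify which crawler sent a request and look up its track record, while unknown or badly-reputed keys can simply be throttled or refused.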
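
The proof-of-work idea reduces to a hashcash-style puzzle. Below is a minimal sketch assuming a SHA-256 puzzle and a hypothetical difficulty of 20 leading zero bits (roughly a million hash attempts on average); it is here to make the critics' objection concrete, since the client's cost depends only on the difficulty setting, not on who is paying it.

```python
import hashlib
import itertools
import os

DIFFICULTY_BITS = 20  # hypothetical setting: ~1 million hash attempts on average

def make_challenge() -> str:
    """Server side: issue a random, single-use challenge string."""
    return os.urandom(16).hex()

def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: find a nonce whose SHA-256 hash has `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Server side: a single hash to check, however hard the puzzle was to solve."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

challenge = make_challenge()
assert verify(challenge, solve(challenge))
```

A threshold low enough for the slowest legitimate phone is close to free for a cloud VM or botnet node, which is the "no workable global threshold" objection in the bullet above.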

Defensive techniques in practice

  • Common approaches discussed:
    • Nginx rate limiting, iptables/ASN/geo blocking, SYN anti-spoofing, rp_filter.
    • Honeypots and traps: invisible links, fake admin paths, “bot-ban-me” hostnames, SSH user triggers.
    • Bot-wasting tactics like zipbombs or bogus content to poison AI scrapers (rate limiting, a trap path, and a zipbomb are combined in a sketch after this list).
    • mTLS, VPNs/WireGuard, and Cloudflare/Anubis-style frontends for private or small sites.
  • Mixed experience: some see these as sufficient; others say even hobby sites get overwhelmed without big-CDN protection.
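
As a rough illustration of three of the tactics above, here is a self-contained sketch (Python standard library only) that combines a per-IP sliding-window rate limit, trap paths that no legitimate page links to, and a gzip-bomb response for clients that hit a trap. The paths, limits, and bomb size are made up for the example; in practice these jobs usually live in the reverse proxy and firewall (nginx limit_req, iptables, fail2ban) rather than in application code.

```python
import gzip
import time
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative values only; tune (or better, enforce upstream) in real deployments.
RATE_LIMIT = 5                                             # max requests per client IP...
WINDOW = 10.0                                              # ...per WINDOW seconds
TRAP_PATHS = {"/wp-login.php", "/.env", "/hidden-admin"}   # never linked from real pages
BOMB = gzip.compress(b"\0" * 10_000_000)                   # ~10 MB of zeros, a few KB compressed

hits = defaultdict(list)   # ip -> timestamps of recent requests
flagged = set()            # ips that have touched a trap path

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ip = self.client_address[0]
        now = time.time()

        # Honeypot: only scanners guess these paths; flag the IP on first hit.
        if self.path in TRAP_PATHS:
            flagged.add(ip)

        # Bot-wasting: flagged clients get a gzip bomb instead of content.
        if ip in flagged:
            self.send_response(200)
            self.send_header("Content-Encoding", "gzip")
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(BOMB)))
            self.end_headers()
            self.wfile.write(BOMB)
            return

        # Sliding-window rate limit, roughly what nginx limit_req enforces.
        hits[ip] = [t for t in hits[ip] if now - t < WINDOW]
        if len(hits[ip]) >= RATE_LIMIT:
            self.send_error(429, "Too Many Requests")
            return
        hits[ip].append(now)

        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello\n")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

The trap only catches scanners because nothing on the site links to those paths, and the zipbomb shifts cost onto the client: the payload is a few kilobytes on the wire but expands to about 10 MB when a naive scraper decompresses it.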

Effects on hobbyists and self-hosting

  • Many recount constant automated probing going back to the 2000s, with logs full of exploit attempts against software they don’t even run.
  • Some argue the real issue is unoptimized stacks (e.g., Gitea + Fail2ban) rather than traffic volume.
  • Others say requiring deep security expertise and endless hardening proves the environment is objectively hostile, discouraging casual self-hosting.

Debate over the internet’s past and future

  • Some think “the internet is over” as an open, welcoming space; any new platform will be swarmed the moment it gains traction.
  • Suggestions range from moving to niche protocols (Gemini, IPv6-only) to accepting centralization and signing/identity as inevitable.
  • There’s a broader philosophical split: the internet as an immense net benefit vs. a net negative (consumerism, attention capture, AI “plastic content”), with no consensus on a realistic path back to “safe haven” status.