Aggressive bots ruined my weekend

Residential & mobile proxy networks via apps

  • Several comments say it’s well-known that “residential proxies” are often built from mobile devices and consumer connections, via SDKs bundled into free apps (including VPNs, streaming, or “passive income” apps).
  • Consent is typically buried in vague dialogs or traded for in‑app rewards that many users won’t understand; some suspect apps run proxies silently with no disclosure at all.
  • People report receiving abuse complaints from their ISPs after joining such schemes, while proxy providers market “unblock anything” capabilities and even sell scraped datasets.
  • Ethically, this is viewed as highly deceptive; some call it akin to turning user devices into unwitting botnet nodes and argue it should be illegal or treated as malware.

Impact on small/indie sites and services

  • Multiple operators (blogs, WordPress farms, a large book catalog site) describe a sharp rise in abusive scraping:
    • No respect for rate limits or robots.txt.
    • Hidden identities (no bot user agents, VPN/mobile IPs, anti-fingerprinting, TLS fingerprint spoofing).
    • Exhaustive crawling of parameter combinations, making caching hard.
  • For small services, bot requests can account for 90%+ of all traffic, turning operations into a “hellscape” and prompting questions about whether indie hosting remains viable.
  • Others argue these indie spaces are worth defending as rare pockets of authentic, personal web content.

Technical mitigation strategies

  • Ideas and experiences include:
    • Reverse proxies with advanced rules (e.g., Pingoo).
    • CDN caching layers and selective protection (Fastly rules, Cloudflare Turnstile on “expensive” paths).
    • Dynamic honeypots via robots.txt: trap-only URLs that, when hit, trigger bans or even “zip bombs.”
    • Very early, cheap per‑IP resource tracking and temporary blocking in the app server.
  • Concerns: CGNAT and residential proxies make IP-based blocking crude and potentially overbroad, sometimes effectively blocking entire cities.
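The honeypot idea above can be sketched minimally: publish a trap-only path as Disallowed in robots.txt and link it nowhere a human would click, then treat any client that fetches it as a crawler deliberately ignoring the rules. The path name and ban duration here are illustrative, not from the thread:

```python
import time

# Trap-only URL: listed as Disallow in robots.txt and never linked for humans.
# Any client fetching it has read robots.txt and chosen to ignore it.
TRAP_PATH = "/internal/do-not-crawl/"   # hypothetical trap path
BAN_SECONDS = 3600                      # illustrative ban length

ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

banned_until: dict = {}                 # ip -> unix time when the ban expires

def handle_request(ip: str, path: str, now: float = None) -> int:
    """Return an HTTP status for this request, banning IPs that hit the trap."""
    now = time.time() if now is None else now
    if banned_until.get(ip, 0.0) > now:
        return 403                      # still banned from an earlier trap hit
    if path == TRAP_PATH:
        banned_until[ip] = now + BAN_SECONDS
        return 403                      # trap hit: ban and refuse
    return 200                          # normal request
```

A harsher variant, mentioned in the thread, serves a zip bomb from the trap handler instead of a plain 403.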
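The cheap per‑IP tracking idea can likewise be sketched as a sliding-window counter checked before any expensive work; the window, limit, and block duration below are illustrative assumptions:

```python
import time
from collections import defaultdict, deque

WINDOW = 60.0      # seconds of request history kept per IP
LIMIT = 100        # max requests per window before a temporary block
BLOCK_FOR = 300.0  # seconds a noisy IP stays blocked

hits = defaultdict(deque)   # ip -> timestamps of recent requests
blocked_until = {}          # ip -> unix time when the block expires

def allow(ip: str, now: float = None) -> bool:
    """Cheap sliding-window check; call early, before expensive handlers run."""
    now = time.time() if now is None else now
    if blocked_until.get(ip, 0.0) > now:
        return False
    q = hits[ip]
    q.append(now)
    while q and q[0] <= now - WINDOW:   # drop timestamps outside the window
        q.popleft()
    if len(q) > LIMIT:
        blocked_until[ip] = now + BLOCK_FOR
        q.clear()
        return False
    return True
```

As the thread notes, this stays crude under CGNAT and residential proxies: one noisy address can stand in for a whole neighborhood, so short block durations are safer than permanent bans.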

Legal and collective responses

  • Some suggest suing abusive scrapers under DDoS theories, but others highlight:
    • Difficulty attributing traffic behind proxy networks and foreign jurisdictions.
    • Cost of legal action for small operators.
  • One proposal: a shared abuse-detection service based on probabilistic reporting and Bloom filters, though trust between participants, gaming of the reports, and shared residential IPs are seen as hard problems.
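The Bloom-filter part of that proposal could work roughly like this: each participating site hashes offending IPs into a bit array and shares only the bits, so membership can be checked (with some false positives, never false negatives) without exchanging raw logs. A toy sketch, with sizes and hash choice as assumptions:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: probabilistic set membership over a shared bit array."""

    def __init__(self, bits: int = 8192, hashes: int = 4):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item: str):
        # Derive k independent positions by salting SHA-256 with an index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Sites using identical parameters could merge reports by OR-ing their arrays; the hard problems the thread raises (who to trust, poisoned reports, many users behind one residential IP) sit outside the data structure itself.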

Scraping as essential vs. exploitative

  • One camp argues scraping public data is foundational and often beneficial (search, comparison, affiliate sites), and that we should design fair-use standards rather than demonize all scraping.
  • Others counter that the older “good citizen” norms (robots.txt, modest request rates) are being ignored by commercial and AI-driven scrapers whose profit motives externalize costs onto small sites. The result, they argue, is a web pushed toward more centralization (Cloudflare, the major clouds) and gated platforms.