Aggressive bots ruined my weekend
Residential & mobile proxy networks via apps
- Several comments say it’s well-known that “residential proxies” are often built from mobile devices and consumer connections, via SDKs bundled into free apps (including VPNs, streaming, or “passive income” apps).
- Users typically see only vague consent dialogs, sometimes paired with in‑app rewards, and many don’t understand what they are agreeing to; some suspect certain apps run proxies silently.
- People report receiving abuse complaints from their ISPs after joining such schemes, while proxy providers market “unblock anything” capabilities and even sell scraped datasets.
- Ethically, this is viewed as highly deceptive; some call it akin to turning user devices into unwitting botnet nodes and argue it should be illegal or treated as malware.
Impact on small/indie sites and services
- Multiple operators (blogs, WordPress farms, a large book catalog site) describe a sharp rise in abusive scraping:
  - No respect for rate limits or robots.txt.
  - Hidden identities (no bot UAs, VPNs/mobile IPs, anti-fingerprinting, TLS cloaking).
  - Exhaustive crawling of parameter combinations, making caching hard.
- For some small services, scrapers account for 90%+ of traffic, turning operations into a “hellscape” and prompting questions about whether indie hosting is still viable.
- Others argue these indie spaces are worth defending as rare pockets of authentic, personal web content.
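One common way to blunt the “exhaustive parameter combinations” problem is to normalize URLs before they reach the cache, so that reordered, duplicated, or irrelevant query parameters collapse into a single cache entry. This technique is not described in the discussion itself. The sketch below assumes a hypothetical per-site allowlist (`ALLOWED_PARAMS`) and a hypothetical `cache_key` helper; real sites would need to decide which parameters actually change the rendered page.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical allowlist: the only query parameters that change the rendered page.
ALLOWED_PARAMS = {"page", "sort"}


def cache_key(url: str) -> str:
    """Collapse equivalent URLs into one cache key.

    Unknown parameters are dropped, exact duplicates are removed, and the
    remaining pairs are sorted so that parameter order no longer matters.
    """
    parts = urlsplit(url)
    # parse_qsl drops blank values by default (keep_blank_values=False).
    params = sorted({(k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_PARAMS})
    query = urlencode(params)
    return parts.path + ("?" + query if query else "")
```

With this in place, `/books?sort=title&utm_source=x&page=2` and `/books?page=2&sort=title` map to the same key, so a crawler enumerating junk or reordered parameters mostly hits warm cache entries instead of the application.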
Technical mitigation strategies
- Ideas and experiences include:
  - Reverse proxies with advanced rules (e.g., Pingoo).
  - CDN caching layers and selective protection (Fastly rules, Cloudflare Turnstile on “expensive” paths).
  - Dynamic honeypots via robots.txt: trap-only URLs, disallowed in robots.txt and never linked for human visitors, that trigger bans or even “zip bombs” when requested.
  - Cheap per‑IP resource tracking very early in the app server’s request handling, with temporary blocks for clients that exceed a budget.
- Concerns: CGNAT and residential proxies make IP-based blocking crude and potentially overbroad, sometimes effectively blocking entire cities.
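The honeypot and per‑IP tracking ideas above can be combined in a small in‑process guard. This is a minimal sketch under stated assumptions, not any commenter’s actual code: the trap paths (`TRAP_PATHS`), the `IpGuard` class, the `robots_txt` helper, and the default limits are all hypothetical. It uses a token bucket per IP and a temporary ban for any client that requests a path robots.txt tells crawlers to avoid.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical trap paths: disallowed in robots.txt and never linked for humans.
TRAP_PATHS = ("/private-archive/", "/do-not-crawl/")


def robots_txt() -> str:
    """Build a robots.txt that tells every crawler to avoid the trap paths."""
    return "User-agent: *\n" + "".join(f"Disallow: {p}\n" for p in TRAP_PATHS)


@dataclass
class Bucket:
    tokens: float   # remaining request budget
    updated: float  # last refill time (monotonic seconds)


class IpGuard:
    """Per-IP token bucket plus honeypot bans, held in process memory."""

    def __init__(self, rate: float = 2.0, burst: float = 20.0, ban_seconds: float = 3600.0):
        self.rate = rate                # tokens refilled per second
        self.burst = burst              # maximum bucket size
        self.ban_seconds = ban_seconds  # bans expire, which limits CGNAT collateral damage
        self.buckets: Dict[str, Bucket] = {}
        self.banned_until: Dict[str, float] = {}

    def allow(self, ip: str, path: str, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.banned_until.get(ip, 0.0) > now:
            return False
        if path.startswith(TRAP_PATHS):
            # Only clients ignoring robots.txt should ever request these paths.
            self.banned_until[ip] = now + self.ban_seconds
            return False
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = Bucket(tokens=self.burst, updated=now)
        # Refill in proportion to elapsed time, capped at the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens < cost:
            return False
        bucket.tokens -= cost
        return True
```

The `cost` parameter lets “expensive” endpoints (search, faceted listings) drain the budget faster than cheap ones. Because bans are keyed by IP, the CGNAT and residential‑proxy concern still applies: keeping bans short and limits generous reduces, but does not remove, the risk of blocking innocent users who share an address.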
Legal and collective responses
- Some suggest suing abusive scrapers under DDoS theories, but others highlight:
  - The difficulty of attributing traffic routed through proxy networks, often from foreign jurisdictions.
  - The cost of legal action for small operators.
- A proposal appears for a shared abuse-detection service (probabilistic reporting + Bloom filters), but trust, gaming, and residential IP issues are seen as hard problems.
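The discussion does not spell out how the shared abuse‑detection service would work, so the following is only an illustration of the Bloom‑filter half of that proposal. It assumes each participating site keeps a filter of IPs it has flagged and that filters built with identical parameters can be merged by bitwise OR. The `BloomFilter` class, its sizes, and the SHA‑256 double‑hashing scheme are hypothetical choices. A Bloom filter never returns a false negative but can return false positives, so a hit should be treated as a weak signal rather than grounds for an automatic ban. It also gives little privacy for IPv4 addresses, since the whole address space can be enumerated against it.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids degenerate cycles
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def merge(self, other: "BloomFilter") -> None:
        """Union with another site's filter; only valid for identical parameters."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte
```

For example, two sites could each call `add("203.0.113.7")` for addresses they flagged, exchange their filters, and `merge` them; a lookup like `"203.0.113.7" in combined` then reports whether any participant has flagged that address. This does not address the trust and gaming problems raised in the discussion: anyone who can contribute bits can also poison the filter.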
Scraping as essential vs. exploitative
- One camp argues scraping public data is foundational and often beneficial (search, comparison, affiliate sites), and that we should design fair-use standards rather than demonize all scraping.
- Others counter that older “good citizen” norms (robots.txt, modest rates) are being ignored by commercial and AI-driven scrapers whose profit motives externalize costs onto small sites, pushing the web toward more centralization (Cloudflare, major clouds) and gated platforms.