2025-10-29

Aggressive bots ruined my weekend

Residential & mobile proxy networks via apps

Several commenters say it is well known that “residential proxies” are often built from mobile devices and consumer connections, through SDKs bundled into free apps (including VPN, streaming, and “passive income” apps).
Users typically see vague consent dialogs or are offered in‑app rewards, and many do not understand what they are agreeing to; some commenters suspect that apps may even run proxies silently.
Some people report receiving abuse complaints from their ISPs after joining such schemes, while proxy providers market “unblock anything” capabilities and even sell scraped datasets.
Commenters view this as highly deceptive; some liken it to turning user devices into unwitting botnet nodes and argue it should be illegal or treated as malware.

Impact on small/indie sites and services

Multiple operators (blogs, WordPress farms, a large book catalog site) describe a sharp rise in abusive scraping:
- No respect for rate limits or robots.txt.
- Hidden identities (no bot UAs, VPNs/mobile IPs, anti-fingerprinting, TLS cloaking).
- Exhaustive crawling of parameter combinations, making caching hard.
For small services, this scraping accounts for 90%+ of traffic in some cases, turning operations into a “hellscape” and raising the question of whether indie hosting is still viable.
Others argue these indie spaces are worth defending as rare pockets of authentic, personal web content.

Technical mitigation strategies

Ideas and experiences include:
- Reverse proxies with advanced rules (e.g., Pingoo).
- CDN caching layers and selective protection (Fastly rules, Cloudflare Turnstile on “expensive” paths).
- Dynamic honeypots via robots.txt: trap-only URLs that, when hit, trigger bans or even “zip bombs.”
- Very early, cheap per‑IP resource tracking and temporary blocking in the app server.
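The robots.txt honeypot idea from the list above could be sketched roughly as follows. This is a minimal illustration, not anyone's reported setup: the trap path `/trap/`, the one-hour ban, and the `HoneypotBans` class are all hypothetical, and the client key could be an IP or any other identifier the operator trusts.

```python
import time

TRAP_PREFIX = "/trap/"   # hypothetical trap path, advertised only as a Disallow rule
BAN_SECONDS = 3600       # hypothetical ban duration

# Well-behaved crawlers read this and never visit the trap path.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"


class HoneypotBans:
    """Temporarily bans any client that requests a URL robots.txt disallows."""

    def __init__(self, ban_seconds=BAN_SECONDS, clock=time.monotonic):
        self.ban_seconds = ban_seconds
        self.clock = clock
        self.banned_until = {}  # client key -> time at which the ban expires

    def is_banned(self, client):
        expiry = self.banned_until.get(client)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.banned_until[client]  # ban has lapsed
            return False
        return True

    def handle(self, client, path):
        """Return the HTTP status code to send for this request."""
        if self.is_banned(client):
            return 403
        if path.startswith(TRAP_PREFIX):
            # Only a client ignoring robots.txt should ever get here.
            self.banned_until[client] = self.clock() + self.ban_seconds
            return 403
        return 200
```

The ban is time-limited on purpose: addresses get reassigned, so a permanent ban would eventually punish someone unrelated.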
Concerns: CGNAT and residential proxies make IP-based blocking crude and potentially overbroad, sometimes effectively blocking entire cities.
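As a rough sketch of the per‑IP resource tracking idea, the class below charges each request a cost against a fixed-window budget and blocks the address for a while once the budget is exceeded. The class name `IpBudget`, the default numbers, and the notion of a per-request `cost` are assumptions made for illustration. Because it keys on the IP address, it shares the CGNAT weakness noted above: everyone behind one shared address is penalized together.

```python
import time


class IpBudget:
    """Fixed-window cost budget per IP, with a temporary block on overspend."""

    def __init__(self, budget=100.0, window=60.0, block_seconds=300.0,
                 clock=time.monotonic):
        self.budget = budget                # max cost allowed per window
        self.window = window                # window length in seconds
        self.block_seconds = block_seconds  # how long an overspender is blocked
        self.clock = clock
        self.usage = {}          # ip -> (window_start, cost_spent)
        self.blocked_until = {}  # ip -> time at which the block expires

    def allow(self, ip, cost=1.0):
        """Return True if the request may proceed, False if it is blocked."""
        now = self.clock()
        until = self.blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self.blocked_until[ip]  # block has expired

        start, spent = self.usage.get(ip, (now, 0.0))
        if now - start >= self.window:
            start, spent = now, 0.0     # start a fresh window
        spent += cost

        if spent > self.budget:
            self.blocked_until[ip] = now + self.block_seconds
            self.usage.pop(ip, None)
            return False
        self.usage[ip] = (start, spent)
        return True
```

Calling `allow(ip, cost)` as the first step of request handling keeps the check cheap, and an expensive endpoint can pass a larger `cost` than a cached page.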

Legal and collective responses

Some suggest suing abusive scrapers under DDoS theories, but others highlight:
- Difficulty attributing traffic behind proxy networks and foreign jurisdictions.
- Cost of legal action for small operators.
One proposal is a shared abuse-detection service (probabilistic reporting + Bloom filters), but trust, gaming of the system, and residential IPs are seen as hard problems.
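The thread does not spell out how such a service would work, so the following is only one possible reading: each site reports an abusive IP with some probability, and reports are stored in a Bloom filter that can be shared without listing addresses in the clear. `BloomFilter`, `maybe_report`, and all parameters are hypothetical names chosen for this sketch.

```python
import hashlib
import random


class BloomFilter:
    """Compact set membership with possible false positives, no false negatives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def maybe_report(bloom, ip, probability, rng=random):
    """Record ip in the shared filter with the given probability (assumed scheme)."""
    if rng.random() < probability:
        bloom.add(ip)
        return True
    return False
```

A filter hit means only "possibly reported", since false positives are inherent to the structure. That is one reason a lookup would more plausibly feed a challenge than an outright block.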

Scraping as essential vs. exploitative

One camp argues scraping public data is foundational and often beneficial (search, comparison, affiliate sites), and that we should design fair-use standards rather than demonize all scraping.
Others counter that older “good citizen” norms (robots.txt, modest rates) are being ignored by commercial and AI-driven scrapers, whose profit motives externalize costs onto small sites. In their view this pushes the web toward more centralization (Cloudflare, major clouds) and gated platforms.
