Ask HN: Website with 6^16 subpages and 80k+ daily bots
Scale of the site & color math
- Several commenters challenge the “6^16 subpages” claim, explaining that 6-digit hex colors span 16^6 = 2^24 ≈ 16.7M unique RGB values, not 6^16 (≈ 2.8 trillion).
- The thread walks through basic combinatorics (decimal 00–99 is 10^2 values; hex 000000–FFFFFF is 16^6) and byte/bit reasoning (three bytes give 256^3 = 2^24).
- Some note that other valid CSS forms (3‑digit hex, 8‑digit with alpha) would increase the URL count (8‑digit hex alone gives 16^8 ≈ 4.3 billion), but the total would still be far below 6^16.
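The arithmetic in these bullets can be checked directly. The following is a minimal Python sketch; the dictionary of CSS forms is only an illustration, and it also counts the 4-digit `#RGBA` form, which is valid CSS but was not among the forms the thread listed. Note that the short forms duplicate existing colors (`#abc` is the same color as `#aabbcc`), so they add URLs rather than colors.

```python
def count_hex_urls(digits: int) -> int:
    """Number of distinct strings made of `digits` hex characters (0-9, A-F)."""
    return 16 ** digits

# Six hex digits = three bytes = 24 bits.
rgb_6 = count_hex_urls(6)
assert rgb_6 == 256 ** 3 == 2 ** 24

# CSS hex notations; #RGBA is an extra form not named in the thread.
css_forms = {3: "#RGB", 4: "#RGBA", 6: "#RRGGBB", 8: "#RRGGBBAA"}
total = sum(count_hex_urls(d) for d in css_forms)

title_claim = 6 ** 16

print(f"6-digit hex:           {rgb_6:,}")        # 16,777,216
print(f"all four CSS hex forms: {total:,}")       # 4,311,814,144
print(f"6^16:                  {title_claim:,}")  # 2,821,109,907,456
```

Even with every hex notation counted, the total is roughly 650 times smaller than 6^16.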
Nature of the pages & crawling
- Pages are generated dynamically from the URL (e.g., /000000, /000001), each with color-derived content.
- Each page links to ~20 “similar color” URLs to feed crawlers, explaining how bots “crawl” the subpages.
- Some argue this is a single dynamic page pattern, not “millions of subpages” in the traditional sense.
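The “single dynamic page pattern” can be sketched in a few lines: one route parses the color from the path and builds the page, including its ~20 outgoing links, on the fly. This is a hypothetical Python sketch, not the site's actual code; the route regex, the `spread` of ±16 per channel, and the HTML layout are all assumptions made for illustration.

```python
import random
import re
from typing import Optional

# Hypothetical route: exactly six hex digits after the slash, e.g. /000000 or /1a2b3c.
HEX_PATH = re.compile(r"/([0-9a-fA-F]{6})")

def parse_color(path: str) -> Optional[int]:
    """Return the 24-bit color encoded in the path, or None if it is not a color URL."""
    m = HEX_PATH.fullmatch(path)
    return int(m.group(1), 16) if m else None

def similar_colors(value: int, count: int = 20, spread: int = 16) -> list[str]:
    """Pick `count` distinct nearby colors by nudging each channel by up to +/- spread."""
    rng = random.Random(value)  # seeded by the color, so a page always links the same neighbors
    r, g, b = (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
    own = f"{value:06x}"
    found: list[str] = []
    while len(found) < count:
        nr, ng, nb = (min(255, max(0, c + rng.randint(-spread, spread))) for c in (r, g, b))
        candidate = f"{nr:02x}{ng:02x}{nb:02x}"
        if candidate != own and candidate not in found:
            found.append(candidate)
    return found

def render_page(path: str) -> Optional[str]:
    """Build the whole page from the URL; None means the server should answer 404."""
    value = parse_color(path)
    if value is None:
        return None
    hex_code = f"{value:06x}"
    links = "\n".join(f'<a href="/{h}">#{h}</a>' for h in similar_colors(value))
    return (
        f'<html><body style="background:#{hex_code}">'
        f"<h1>#{hex_code}</h1>\n{links}\n</body></html>"
    )

print(render_page("/000000"))
```

Nothing is stored: every one of the ~16.7M pages exists only as the output of this function, and each rendered page hands a crawler 20 fresh URLs, which is enough for bots to wander the whole space. In this sketch `/FFFFFF` and `/ffffff` render the same page, so a real site would probably want to redirect to one canonical form.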
Monetization & value debate
- Suggestions: add AdSense, sell backlinks, create a bot IP ban-list product, or sell the high-traffic site (with debate over the ethics of hiding that traffic is mostly bots).
- Others are skeptical there’s real value without human visitors, likening it to search-engine spam or a novelty experiment.
Bot analysis & defenses
- Multiple commenters recommend treating the site as a honeypot: log user agents, IPs, ASNs, and crawl depth, then publish the stats.
- Technical mitigations proposed:
  - robots.txt and bot-specific disallows.
  - Cloudflare “Bot Fight” mode, rate limiting, CAPTCHAs, and HTTP 402 (Payment Required) responses or paywalls.
  - Serving pages without plain HTML links (links rendered only by JavaScript) or as an SPA-style UI to reduce crawlability.
  - Simple throttling: slow responses, holding sockets open, limiting POST size, and carefully chosen timeouts.
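Rate limiting is the most generic of these mitigations. The thread does not specify an algorithm; the sketch below uses a per-client token bucket, which is one common choice. The class name, the keying by IP address, and the `rate`/`capacity` values are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Bucket:
    tokens: float   # requests the client may still make right now
    updated: float  # clock reading at the last refill

class TokenBucketLimiter:
    """Per-client token bucket: `rate` requests/second sustained, bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: dict[str, Bucket] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        bucket = self.buckets.setdefault(client_id, Bucket(self.capacity, now))
        # Refill in proportion to the time elapsed, never above capacity.
        bucket.tokens = min(self.capacity, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False

# Usage: key by client IP (an ASN would work the same way) and refuse with 429.
limiter = TokenBucketLimiter(rate=2.0, capacity=5.0)
status = 200 if limiter.allow("203.0.113.7") else 429  # 429 = Too Many Requests
```

A token bucket lets a well-behaved client burst briefly while capping a crawler's sustained rate. The injectable `clock` keeps it testable. A real deployment would also need to evict idle buckets, since 80k+ daily bots would otherwise grow `buckets` without bound.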
Bot poisoning & adversarial responses
- Elaborate ideas to “fight back”:
  - Serving gzip/zip/brotli bombs (sometimes double-layered) to waste scraper resources, with debates on feasibility and limits.
  - Injecting misleading or grammatically mangled text, especially targeted at LLM and data-mining bots, to pollute training data.
  - Generating random “facts” or clickbait about each color (e.g., who “loves” or “hates” it) so they propagate into models.
- Some caution against harmful responses and broad ASN bans, noting legal and collateral-damage concerns.
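The “bomb” idea rests on an asymmetry: deflate compresses long runs of identical bytes by roughly 1000:1, so a small response can expand into a large one on the client. The Python sketch below only measures that asymmetry locally; the 10 MiB size is an arbitrary assumption, and nothing here is served over HTTP. Whether a second gzip layer has any effect depends on the client decoding nested encodings, which this sketch does not test, and the feasibility and collateral-damage caveats raised in the thread still apply.

```python
import gzip

def compressed_zeros(n_bytes: int, layers: int = 1) -> bytes:
    """Gzip `n_bytes` of zero bytes, re-compressing the result `layers` times."""
    payload = bytes(n_bytes)
    for _ in range(layers):
        payload = gzip.compress(payload, compresslevel=9)
    return payload

def expand(payload: bytes, layers: int = 1) -> bytes:
    """Undo `layers` rounds of gzip, as a client decoding every layer would."""
    for _ in range(layers):
        payload = gzip.decompress(payload)
    return payload

n = 10 * 1024 * 1024  # 10 MiB of zeros (arbitrary example size)
one = compressed_zeros(n, layers=1)
two = compressed_zeros(n, layers=2)
print(f"1 layer: {len(one):,} bytes, ratio {n / len(one):,.0f}:1")
print(f"2 layers: {len(two):,} bytes")
```

The same measurement also shows the limits debated in the thread: a client that caps decompressed size, or stops after one layer, never pays the full expansion cost.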
Extensions & creative directions
- Ideas to expand the project:
  - Include alpha (8‑digit hex) and other color spaces (LAB, HSV, CMYK), and even floating-point “infinite” subpages.
  - Turn it into a color/CyberChef-style toolkit or art tool.
  - Embrace the Library of Babel vibe; several related projects and APIs are mentioned as inspiration.
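Extra color spaces could be derived from the same 6-digit hex in each URL. The following is a minimal Python sketch, assuming sRGB input: HSV comes from the standard library's `colorsys`, CMYK uses the naive formula (no ICC profile, so not print-accurate), and LAB uses the usual sRGB → linear → XYZ → CIE L\*a\*b\* conversion with a D65 white point. The function names are hypothetical.

```python
import colorsys

def hex_to_rgb(hex_code: str) -> tuple[float, ...]:
    """'ff8800' -> (1.0, 0.533..., 0.0): channels scaled to 0..1."""
    value = int(hex_code, 16)
    return tuple(((value >> shift) & 0xFF) / 255 for shift in (16, 8, 0))

def to_hsv(hex_code: str) -> tuple[float, float, float]:
    """Hue, saturation, value, each in 0..1, via the standard library."""
    return colorsys.rgb_to_hsv(*hex_to_rgb(hex_code))

def to_cmyk(hex_code: str) -> tuple[float, ...]:
    """Naive CMYK (no ICC profile), each component in 0..1."""
    r, g, b = hex_to_rgb(hex_code)
    k = 1 - max(r, g, b)
    if k == 1:  # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    return tuple((1 - c - k) / (1 - k) for c in (r, g, b)) + (k,)

def _linearize(c: float) -> float:
    """Undo the sRGB transfer curve."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def to_lab(hex_code: str) -> tuple[float, float, float]:
    """CIE L*a*b* (D65 white point), via linear sRGB and XYZ."""
    r, g, b = (_linearize(c) for c in hex_to_rgb(hex_code))
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

print(to_hsv("ff8800"), to_cmyk("ff8800"), to_lab("ff8800"))
```

Each conversion is a pure function of the hex code, so, like the pages themselves, these values could be computed per request rather than stored.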