Ask HN: Website with 6^16 subpages and 80k+ daily bots

Scale of the site & color math

  • Several commenters challenge the “6^16 subpages” claim: 6-digit hex colors span 16^6 = 2^24 = 16,777,216 (≈ 16.7M) unique RGB values, whereas 6^16 is roughly 2.8 trillion.
  • The thread walks through basic combinatorics (decimal 00–99 as 10^2, hex 000000–FFFFFF as 16^6) and byte/bit reasoning (256^3, 2^24).
  • Some note that additional valid CSS forms (3‑digit hex, 8‑digit with alpha) would increase the URL count, but the total would still fall far below 6^16.
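
The arithmetic in the bullets above can be checked directly. This is a minimal sketch; the variable names are illustrative, and it counts only the three hex notations the thread mentions.

```python
# Count distinct URLs for each hex notation discussed in the thread.
six_digit = 16 ** 6      # 000000 through FFFFFF
three_digit = 16 ** 3    # CSS shorthand such as F0A
eight_digit = 16 ** 8    # six digits plus a two-digit alpha channel
claimed = 6 ** 16        # the figure from the thread title

# The six-digit count reached three ways: hex digits, bytes, bits.
assert six_digit == 256 ** 3 == 2 ** 24

# Even with every notation combined, the total stays well under 6^16.
total_forms = three_digit + six_digit + eight_digit

print(f"6-digit hex:       {six_digit:,}")
print(f"all three forms:   {total_forms:,}")
print(f"claimed (6^16):    {claimed:,}")
print(f"claimed / total:   {claimed / total_forms:.0f}x")
```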

Nature of the pages & crawling

  • Pages are generated dynamically from the URL (e.g., /000000, /000001), each with color-derived content.
  • Each page links to ~20 “similar color” URLs, which gives crawlers fresh links to follow and explains how bots work their way through the subpages.
  • Some argue this is a single dynamic page pattern, not “millions of subpages” in the traditional sense.

Monetization & value debate

  • Suggestions: add AdSense, sell backlinks, build a bot IP ban-list product, or sell the site on the strength of its traffic (with debate over whether it is ethical to hide that the traffic is mostly bots).
  • Others are skeptical there’s real value without human visitors, likening it to search-engine spam or a novelty experiment.

Bot analysis & defenses

  • Multiple commenters recommend treating the site as a honeypot: log user agents, IPs, ASN, crawl depth, and publish stats.
  • Technical mitigations proposed:
    • robots.txt and bot-specific disallows.
    • Cloudflare Bot Fight Mode, rate limiting, CAPTCHAs, HTTP 402 (Payment Required) responses/paywalls.
    • Serving pages without HTML links (JS-rendered only) or SPA-style UI to reduce crawlability.
    • Simple throttling: slow responses, holding sockets open, limiting POST size, and carefully chosen timeouts.
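
Of the mitigations listed, rate limiting is the easiest to illustrate. Below is a minimal per-client token-bucket sketch; the class name `PerClientLimiter`, its parameters, and the choice of keying on IP address are assumptions for illustration, not anything a commenter specified.

```python
import time


class PerClientLimiter:
    """Token bucket per client key (for example, an IP address)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        self.buckets = {}         # key -> (tokens, time of last request)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # Refill for the time elapsed since the last request, up to capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False


# Usage with a fake clock: a burst of 3 is allowed, the 4th request is refused.
fake_time = [0.0]
limiter = PerClientLimiter(rate=1.0, capacity=3, clock=lambda: fake_time[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])
```

A refused request would typically be answered with HTTP 429 (Too Many Requests). In a real deployment the bucket table would also need eviction so that it does not grow without bound.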

Bot poisoning & adversarial responses

  • Elaborate ideas to “fight back”:
    • Serving gzip/zip/brotli bombs (sometimes double-layered) to waste scraper resources, with debates on feasibility and limits.
    • Injecting misleading or grammatically mangled text, especially targeted at LLM and data-mining bots, to pollute training data.
    • Generating random “facts” or clickbait about each color (e.g., who “loves” or “hates” it) so they propagate into models.
  • Some caution against harmful responses and broad ASN bans, noting legal and collateral-damage concerns.
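
The feasibility debate around compression bombs comes down to two numbers: how much a single compression layer can shrink repetitive data, and whether the client caps its decompression. The sketch below only measures those locally with Python's `zlib`; the 1 MiB payload and 64 KiB cap are arbitrary values chosen for illustration.

```python
import zlib

# 1 MiB of zero bytes is close to the best case for DEFLATE.
payload = bytes(1024 * 1024)
compressed = zlib.compress(payload, 9)
ratio = len(payload) / len(compressed)
print(f"{len(payload):,} bytes -> {len(compressed):,} bytes ({ratio:.0f}:1)")

# A careful client can bound its memory use by capping the output size.
limit = 64 * 1024
decompressor = zlib.decompressobj()
capped = decompressor.decompress(compressed, limit)
print(f"capped read returned {len(capped):,} bytes")
print(f"input left unprocessed: {len(decompressor.unconsumed_tail) > 0}")
```

A single DEFLATE layer tops out at roughly 1000:1, which is why the thread mentions double-layered payloads. A scraper that caps its output as shown is largely unaffected either way.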

Extensions & creative directions

  • Ideas to expand the project:
    • Include alpha (8‑digit hex) and other color spaces (LAB, HSV, CMYK), or even floating-point coordinates for effectively “infinite” subpages.
    • Turn it into a color/CyberChef-style toolkit or art tool.
    • Embrace the Library of Babel vibe; several related projects and APIs are mentioned as inspiration.
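
As a small illustration of the color-space idea, the standard library's `colorsys` module can already convert a hex URL into HSV. The helper name `hex_to_hsv` and the choice to round to whole degrees and percentages are assumptions. LAB and CMYK are left out because `colorsys` does not cover them.

```python
import colorsys


def hex_to_hsv(hex_code: str) -> tuple[int, int, int]:
    """Convert a six-digit hex color to (hue in degrees, saturation %, value %)."""
    value = int(hex_code, 16)
    r, g, b = (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
    # colorsys works on floats in the range 0.0 to 1.0.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)


for code in ("ff0000", "00ff00", "0000ff", "000000"):
    print(code, hex_to_hsv(code))

# Adding a two-digit alpha channel raises the page count from 2^24 to 2^32.
print(f"8-digit hex pages: {16 ** 8:,}")
```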