Ask HN: How did the internet discover my subdomain?
Primary ways subdomains get “discovered”
- Certificate Transparency (CT) logs expose any hostname with its own public TLS cert; many tools and services continuously tail these logs.
- Large-scale IPv4 scanning (e.g., by security companies) hits every routable IP and probes common ports, then fingerprints what’s running.
- DNS-based techniques: brute-force enumeration with wordlists, AXFR (zone transfers) on misconfigured DNS servers, and DNSSEC/NSEC zone walking.
- Reverse lookups via TLS: connect to an IP over HTTPS and inspect the certificate the server presents; its Common Name and Subject Alternative Names reveal the associated hostnames.
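To make the CT point concrete, here is a minimal sketch of how a tool might pull hostnames out of CT search results. It assumes records shaped like the JSON that crt.sh returns, where a `name_value` field holds newline-separated names; the function name and sample data are illustrative, not taken from the thread.

```python
def hostnames_from_ct_entries(entries, domain):
    """Collect the unique names under `domain` from CT-style records."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for entry in entries:
        # One certificate can cover several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name == domain or name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Hypothetical records, assumed to mirror the crt.sh JSON shape.
sample_entries = [
    {"name_value": "staging-7f3a.example.com\nexample.com"},
    {"name_value": "*.example.com"},
    {"name_value": "unrelated.example.org"},
]

print(hostnames_from_ct_entries(sample_entries, "example.com"))
```

Note that the wildcard entry shows up only as `*.example.com`, while the dedicated certificate exposes `staging-7f3a.example.com` by name.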
DNS, passive data, and commercial services
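- DNS zones are not generally enumerable, but several leaks exist.
- Some authoritative servers still allow unauthenticated AXFR (a misconfiguration, but common enough to mine).
- DNSSEC NSEC records link each name to the next one in the zone, so the zone can be walked name by name; NSEC3 hashes the names, but guessable labels can still be recovered offline with a dictionary attack.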
- “Passive DNS” providers and some ISPs/resolvers sell aggregated query/answer logs, revealing which hostnames are being resolved.
- PTR (reverse DNS) records can map IPs back to hostnames.
- Many subdomain-finding tools aggregate CT, passive DNS, zone-transfer leaks, brute-forced records, and web crawling into searchable databases.
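The NSEC leak is easiest to see as a loop. The sketch below assumes a hypothetical `next_owner` callback that returns the "next domain name" field of the NSEC record owned by a given name; a real tool would back it with DNS queries, while here a dictionary stands in for the zone.

```python
def walk_nsec_chain(apex, next_owner, limit=10_000):
    """Follow NSEC 'next name' pointers from the apex until the chain wraps."""
    names = [apex]
    current = apex
    for _ in range(limit):  # guard against a broken or endless chain
        nxt = next_owner(current)
        if nxt == apex:  # the last NSEC record points back to the apex
            break
        names.append(nxt)
        current = nxt
    return names


# Stand-in for a signed zone: each name maps to the next name in its NSEC record.
fake_zone = {
    "example.com": "admin.example.com",
    "admin.example.com": "secret-9c1.example.com",
    "secret-9c1.example.com": "www.example.com",
    "www.example.com": "example.com",
}

print(walk_nsec_chain("example.com", fake_zone.__getitem__))
```

Every name in the zone is recovered, including the unguessable `secret-9c1` label, without any wordlist.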
IP scanning and default virtual hosts
- If a scanner connects by raw IP (no SNI, and no Host header that matches a configured site), the web server usually answers with its default vhost. If that default happens to be the subdomain's vhost, the requests are logged under it, creating the impression that the subdomain itself was targeted.
- When a TLS client sends no SNI, the server presents its default certificate, so the hostnames in that certificate can be learned without knowing the domain first.
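A rough sketch of what such a scanner does, using only the Python standard library. The function names are illustrative. One caveat: `getpeercert()` returns an empty dict when the certificate is not validated, so this version keeps chain validation on and only disables the hostname check, which means it reports names only for publicly trusted certificates.

```python
import socket
import ssl


def names_from_cert(cert):
    """Extract DNS names from the dict returned by SSLSocket.getpeercert()."""
    names = {value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"}
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value)
    return sorted(names)


def fetch_cert_names(ip, port=443, timeout=5.0):
    """Connect to a bare IP without SNI and list the names in the default cert."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # there is no hostname to check against
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # No server_hostname is passed, so no SNI extension is sent.
        with ctx.wrap_socket(sock) as tls:
            return names_from_cert(tls.getpeercert())


# Offline example with a hand-written certificate dict (hypothetical names).
sample_cert = {
    "subject": ((("commonName", "internal-tools.example.com"),),),
    "subjectAltName": (("DNS", "internal-tools.example.com"), ("DNS", "example.com")),
}
print(names_from_cert(sample_cert))
```

Calling `fetch_cert_names("192.0.2.10")` would perform the live lookup; the address is a documentation placeholder.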
Telemetry and browser/endpoint leaks
- Browser telemetry, corporate firewalls, antivirus, and URL-filtering appliances can observe domains users visit and feed them into security/crawling ecosystems.
- Email/webmail (e.g., links in Gmail), Chrome/Edge browsing, and similar channels can surface otherwise “unlisted” URLs.
Security through obscurity and mitigations
- Consensus: obscurity (unguessable subdomains) can reduce noise and attack surface but must not be the only control.
- Suggested mitigations: authentication, IP allowlisting, firewalling the origin so it accepts traffic only from Cloudflare, wildcard certs so that individual hostnames never appear in CT logs, or hiding sensitive services behind hard-to-guess paths rather than hostnames.
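As an illustration of the allowlisting idea, here is a minimal application-level check using the standard `ipaddress` module. The CIDR ranges are documentation placeholders, not real Cloudflare ranges; in practice the list would come from the provider's published ranges, and the check would more likely live in a firewall than in application code.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation blocks), assumed for the example.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IP versions simply evaluate to False.
    return any(addr in net for net in networks)


print(is_allowed("192.0.2.55"), is_allowed("198.51.100.7"))
```

A request from outside the listed ranges is rejected no matter which hostname it asked for, so the control does not depend on the subdomain staying secret.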