Download responsibly
Irresponsible downloads and CI pipelines
- Core problem: some users repeatedly download the same huge OSM extracts (e.g. a 20 GB Italy file thousands of times per day) or mirror every file daily.
- Many suspect misconfigured CI or deployment pipelines: “download-if-missing” logic gets moved into CI, containers are rebuilt frequently, or scripts always re-fetch fresh data.
- Others note this behavior is often accidental, not malicious, but at some point “wilful incompetence becomes malice.”
- There is concern that similar patterns already exist across ecosystems (e.g. Docker images, libraries) and waste massive compute, bandwidth, and energy.
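The cheap fix for the broken "download-if-missing" pattern is a conditional GET: send the age of the local copy and let the server answer 304 Not Modified instead of re-shipping 20 GB. A minimal sketch (the file path is a placeholder, not anything Geofabrik-specific):

```python
import email.utils
import os

def conditional_headers(local_path):
    """Build headers for an HTTP conditional GET (RFC 9110): if we already
    have a local copy, ask the server to send data only if it is newer."""
    headers = {}
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # HTTP dates must be in GMT, e.g. "Sun, 06 Nov 1994 08:49:37 GMT"
        headers["If-Modified-Since"] = email.utils.formatdate(mtime, usegmt=True)
    return headers
```

A client passes these headers on the GET and overwrites the local file only on a 200; on a 304 it keeps what it has, costing the origin a few hundred bytes instead of the full extract. Caching the server's `ETag` and sending `If-None-Match` works the same way and is more robust against clock issues.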
Rate limiting, blocking, and API keys
- Many commenters argue rate limiting is a solved problem and should be implemented rather than relying on blog appeals.
- Counterpoints:
  - IP-based limits can hurt innocent users on shared IPs (universities, CI farms, VPNs) and can be weaponized for DoS.
  - The current Geofabrik setup (Squid proxies, IPv4-only rate limiting, per-node not global) makes correct limiting nontrivial.
- Suggested middle grounds:
  - Lightweight auth (API keys, email, or UUID-style per-download URLs) to identify abusers.
  - Anonymous low-rate tier + higher limits for authenticated users.
  - Throttling rather than hard blocking (progressively slower downloads).
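The anonymous-low-rate / authenticated-higher-rate split is usually implemented as a per-key token bucket. A generic sketch of the mechanism (not Geofabrik's actual setup; rates and capacities are illustrative):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/second,
    holds at most `capacity` tokens (the allowed burst)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the request passes."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

An anonymous IP might get `TokenBucket(rate=0.001, capacity=1)` (roughly one large download per 17 minutes), while an API key gets a bigger bucket. Delaying rejected requests instead of dropping them gives the "progressively slower downloads" behaviour suggested above.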
BitTorrent and alternative distribution
- Many see this as a textbook BitTorrent use case: large, popular, mostly immutable files; examples cited include OSM planet dumps and Wikipedia torrents.
- Enthusiasts cite better scalability and reduced origin load; some existing tools and BEPs for updatable torrents are mentioned.
- Skepticism and obstacles:
- Bad reputation of BitTorrent (piracy associations, corporate policies, “potentially unwanted” software).
- NAT/firewall complexity, lack of default clients, fear of seeding/upload liability, asymmetric residential upload.
- From a network-operator view, BitTorrent’s many peer-to-peer flows complicate peering and capacity planning.
- For many corporate users, torrents are simply a non-starter; HTTP/CDNs remain easier.
API and CI culture
- Broader frustration that many APIs and tools aren’t designed for bulk or batched operations, forcing clients into many small calls.
- Complaints that some B2B customers treat HTTP 429 (Too Many Requests) responses as provider faults rather than signals to fix their own code, and will even escalate commercially.
- Several argue CI should default to offline, cached builds and disallow arbitrary network access to avoid such abuse.
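On the client side, the correct reaction to a 429 is to back off rather than escalate. A sketch of the standard recipe: honour the server's `Retry-After` header when present, otherwise use exponential backoff with full jitter (the base and cap values are arbitrary defaults, not from any particular API):

```python
import random

def backoff_delay(attempt, retry_after=None, base=1.0, cap=300.0):
    """Seconds to wait before retrying a 429/503.

    `attempt` is the zero-based retry count. If the server sent a
    Retry-After value, use it; otherwise back off exponentially with
    full jitter so many clients don't retry in lockstep."""
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The jitter matters in CI: hundreds of parallel jobs retrying on a fixed schedule just reproduce the original stampede a few seconds later.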
Open data ecosystem
- Some praise Geofabrik for providing “clean-ish” OSM extracts and note this benefits both community and related commercial services.
- Alternatives like Parquet-based OSM/Overture datasets on S3 (with surgical querying via HTTP range requests) are mentioned as more bandwidth-efficient for analytics workloads.
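The bandwidth saving from Parquet-over-HTTP comes from byte-range requests: a reader first fetches the file footer (which indexes row groups and column chunks), then requests only the byte spans it needs. Illustrative header helpers (the 64 KiB footer guess is an assumption, not an Overture detail):

```python
def parquet_footer_range(footer_guess=64 * 1024):
    """Suffix range (RFC 9110) for the Parquet footer: the last bytes of the
    file hold the metadata, so a reader starts here without knowing the size."""
    return {"Range": f"bytes=-{footer_guess}"}

def column_chunk_range(offset, length):
    """Byte range for one column chunk, at the offset/length the footer lists.
    Range end positions are inclusive, hence the -1."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}
```

Engines like DuckDB and pyarrow do this under the hood, which is why an analytics query can touch a multi-gigabyte remote file while transferring only megabytes — a sharp contrast with re-downloading a whole extract per run.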