Download responsibly

Irresponsible downloads and CI pipelines

  • Core problem: some users repeatedly download the same huge OSM extracts (e.g. a 20GB Italy file thousands of times/day), or mirror every file daily.
  • Many suspect misconfigured CI or deployment pipelines: “download-if-missing” logic gets moved into CI, containers are rebuilt frequently, or scripts always re-fetch fresh data (see the conditional-fetch sketch after this list).
  • Others note this behavior is often accidental, not malicious, but at some point “wilful incompetence becomes malice.”
  • There is concern that similar patterns already exist across ecosystems (e.g. Docker images, libraries) and waste massive compute, bandwidth, and energy.
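
One way to keep “download-if-missing” out of the rebuild hot path is to cache the extract persistently and make the fetch conditional on If-Modified-Since, so re-run pipelines only transfer data when the server actually has a newer file. A minimal Python sketch of that pattern; the URL and cache path are illustrative, not taken from the discussion:

    #!/usr/bin/env python3
    """Sketch: fetch an OSM extract only when the remote copy has changed."""
    import email.utils
    import os
    import urllib.error
    import urllib.request

    URL = "https://download.geofabrik.de/europe/italy-latest.osm.pbf"  # example extract
    DEST = "/var/cache/osm/italy-latest.osm.pbf"                       # persistent cache path

    def fetch_if_changed(url: str, dest: str) -> bool:
        """Return True if a new copy was downloaded, False if the cache was fresh."""
        request = urllib.request.Request(url)
        if os.path.exists(dest):
            # Ask the server to skip the body if our cached copy is still current.
            mtime = os.path.getmtime(dest)
            request.add_header("If-Modified-Since",
                               email.utils.formatdate(mtime, usegmt=True))
        try:
            with urllib.request.urlopen(request) as response, \
                 open(dest + ".part", "wb") as out:
                while chunk := response.read(1 << 20):
                    out.write(chunk)
        except urllib.error.HTTPError as err:
            if err.code == 304:           # Not Modified: reuse the cached file
                return False
            raise
        os.replace(dest + ".part", dest)  # atomic swap so readers never see a partial file
        return True

    if __name__ == "__main__":
        print("downloaded" if fetch_if_changed(URL, DEST) else "cache is fresh")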

Rate limiting, blocking, and API keys

  • Many commenters argue rate limiting is a solved problem and should be implemented rather than relying on blog appeals.
  • Counterpoints:
    • IP-based limits can hurt innocent users on shared IPs (universities, CI farms, VPNs) and can be weaponized for DoS.
    • The current Geofabrik setup (Squid proxies; rate limiting that is IPv4-only and applied per node rather than globally) makes correct limiting nontrivial.
  • Suggested middle grounds:
    • Lightweight auth (API keys, email, or UUID-style per-download URLs) to identify abusers.
    • Anonymous low-rate tier + higher limits for authenticated users.
    • Throttling rather than hard blocking (progressively slower downloads).
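
As a rough illustration of “throttle rather than block”, the sketch below paces download chunks for clients that exceed a byte budget instead of rejecting them outright. The budget numbers and the surrounding send loop are invented for the example and do not describe Geofabrik's actual Squid-based setup:

    """Sketch of progressive throttling: heavy downloaders get slower, not banned."""
    import time
    from collections import defaultdict

    WINDOW_SECONDS = 3600          # accounting window (assumption)
    FREE_BYTES = 5 * 1024**3       # bytes per window served at full speed (assumption)
    MIN_RATE = 256 * 1024          # floor for throttled clients, bytes/second (assumption)

    class ProgressiveThrottle:
        """Track bytes served per client and pace chunks once the budget is exhausted."""

        def __init__(self):
            self.usage = defaultdict(lambda: [0.0, 0])   # ip -> [window_start, bytes_served]

        def pace(self, ip: str, chunk_len: int) -> float:
            """Return how long to sleep before sending the next chunk to this client."""
            now = time.monotonic()
            window_start, served = self.usage[ip]
            if now - window_start > WINDOW_SECONDS:
                window_start, served = now, 0            # reset the accounting window
            served += chunk_len
            self.usage[ip] = [window_start, served]
            if served <= FREE_BYTES:
                return 0.0                               # within budget: full speed
            # Over budget: halve the allowed rate for every extra budget consumed.
            overshoot = (served - FREE_BYTES) / FREE_BYTES
            allowed_rate = max(MIN_RATE, FREE_BYTES / WINDOW_SECONDS / (2 ** overshoot))
            return chunk_len / allowed_rate

    # Usage inside a hypothetical chunked file-serving loop:
    #   throttle = ProgressiveThrottle()
    #   for chunk in read_chunks(path):
    #       time.sleep(throttle.pace(client_ip, len(chunk)))
    #       send(chunk)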

BitTorrent and alternative distribution

  • Many see this as a textbook BitTorrent use case: large, popular, mostly immutable files; examples cited include OSM planet dumps and Wikipedia torrents (a minimal .torrent sketch follows this list).
  • Enthusiasts cite better scalability and reduced origin load; some existing tools and BEPs for updatable torrents are mentioned.
  • Skepticism and obstacles:
    • Bad reputation of BitTorrent (piracy associations, corporate policies, “potentially unwanted” software).
    • NAT/firewall complexity, lack of default clients, fear of seeding/upload liability, asymmetric residential upload.
    • From a network-operator view, BitTorrent’s many peer-to-peer flows complicate peering and capacity planning.
    • For many corporate users, torrents are simply a non-starter; HTTP/CDNs remain easier.
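
For a sense of what publishing an extract over BitTorrent involves on the server side, here is a minimal, standard-library-only sketch that builds a single-file .torrent (BEP 3 metainfo: a bencoded dictionary plus per-piece SHA-1 hashes). The file name and tracker URL in the usage comment are hypothetical:

    """Sketch: build a single-file .torrent (BEP 3) using only the standard library."""
    import hashlib
    import os

    PIECE_LENGTH = 4 * 1024 * 1024   # 4 MiB pieces; a common choice for multi-GB files

    def bencode(value) -> bytes:
        """Minimal bencoder for ints, bytes, str, lists and dicts (sorted keys)."""
        if isinstance(value, int):
            return b"i%de" % value
        if isinstance(value, str):
            value = value.encode()
        if isinstance(value, bytes):
            return b"%d:%s" % (len(value), value)
        if isinstance(value, list):
            return b"l" + b"".join(bencode(v) for v in value) + b"e"
        if isinstance(value, dict):
            items = sorted((k.encode() if isinstance(k, str) else k, v)
                           for k, v in value.items())
            return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
        raise TypeError(type(value))

    def make_torrent(path: str, announce: str) -> bytes:
        pieces = bytearray()
        with open(path, "rb") as f:
            while piece := f.read(PIECE_LENGTH):
                pieces += hashlib.sha1(piece).digest()   # 20-byte hash per piece
        info = {
            "name": os.path.basename(path),
            "length": os.path.getsize(path),
            "piece length": PIECE_LENGTH,
            "pieces": bytes(pieces),
        }
        return bencode({"announce": announce, "info": info})

    # Example (hypothetical file and tracker):
    #   open("italy-latest.osm.pbf.torrent", "wb").write(
    #       make_torrent("italy-latest.osm.pbf", "https://tracker.example.org/announce"))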

API and CI culture

  • Broader frustration that many APIs and tools aren’t designed for bulk or batched operations, forcing clients into many small calls.
  • Complaints that some B2B customers treat 429s as provider faults rather than signals to change their code, and will even escalate commercially (see the backoff sketch after this list).
  • Several argue CI should default to offline, cached builds and disallow arbitrary network access to avoid such abuse.
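
The client-side counterpart to that complaint is simple: treat 429 as a signal to slow down. A minimal sketch using the third-party requests library; the endpoint in the usage comment is hypothetical:

    """Sketch: honour 429 responses with Retry-After or capped exponential backoff."""
    import random
    import time

    import requests

    def get_with_backoff(url: str, max_attempts: int = 6) -> requests.Response:
        delay = 1.0
        for attempt in range(max_attempts):
            response = requests.get(url, timeout=60)
            if response.status_code != 429:
                response.raise_for_status()
                return response
            # Prefer a numeric Retry-After hint; otherwise back off exponentially with jitter.
            retry_after = response.headers.get("Retry-After")
            wait = float(retry_after) if retry_after and retry_after.isdigit() else delay
            time.sleep(wait + random.uniform(0, wait / 2))
            delay = min(delay * 2, 300)
        raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")

    # resp = get_with_backoff("https://api.example.com/v1/bulk-export")  # hypothetical endpoint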

Open data ecosystem

  • Some praise Geofabrik for providing “clean-ish” OSM extracts and note this benefits both the community and related commercial services.
  • Alternatives like Parquet-based OSM/Overture datasets on S3 (with surgical querying via HTTP range requests) are mentioned as more bandwidth-efficient for analytics workloads.
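
As an illustration of the range-request approach, the sketch below uses DuckDB's httpfs extension to read only the needed columns and row groups of a remote Parquet file rather than downloading a full extract. The URL and column names are hypothetical stand-ins, not the actual Overture layout:

    """Sketch: surgical queries over remote Parquet via DuckDB and HTTP range requests."""
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs;")   # enables reading Parquet over HTTP/S3
    con.execute("LOAD httpfs;")

    # DuckDB pushes the column projection and (via row-group statistics) the filter
    # down to the Parquet reader, so only a small slice of the file is fetched.
    rows = con.execute("""
        SELECT id, lon, lat
        FROM read_parquet('https://data.example.org/osm/pois.parquet')
        WHERE lon BETWEEN 6.6 AND 18.5   -- rough bounding box for Italy
          AND lat BETWEEN 35.5 AND 47.1
        LIMIT 20
    """).fetchall()
    print(rows)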