Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?
Scraping vs official OSM access
- Many note OSM already provides free bulk data (planet dumps, regional extracts, minutely updates, torrents, S3), making scraping irrational and harmful.
- OSM maintainers complain of heavy, careless scraping that ignores robots.txt and hammers tile/rendering APIs instead of using dumps.
- Some suggest the scraping is often done by low-skill teams following generic tutorials rather than exploring official options.
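As a rough illustration of the "use the dumps" route described above, the sketch below fetches one regional extract and sends a conditional request so an unchanged file is not downloaded again. It rests on assumptions that are not from the discussion: the Geofabrik-style URL layout (a third-party extract provider), the helper names, and the server honoring `If-Modified-Since`. Check the provider's own documentation before relying on any of it.

```python
import os
import shutil
import urllib.error
import urllib.request
from email.utils import formatdate
from typing import Optional

# Assumption: Geofabrik-style URL layout for regional extracts (third-party mirror).
BASE_URL = "https://download.geofabrik.de"


def extract_url(region_path: str) -> str:
    """Build the URL of the latest PBF extract for a region such as 'europe/monaco'."""
    return f"{BASE_URL}/{region_path}-latest.osm.pbf"


def build_request(url: str, user_agent: str,
                  last_modified: Optional[float]) -> urllib.request.Request:
    """Create a request that identifies the client and, if a local copy
    exists, asks the server to reply only when the file has changed."""
    headers = {"User-Agent": user_agent}
    if last_modified is not None:
        headers["If-Modified-Since"] = formatdate(last_modified, usegmt=True)
    return urllib.request.Request(url, headers=headers)


def fetch_extract(region_path: str, local_path: str, user_agent: str) -> bool:
    """Download the extract unless the local copy is current.

    Returns True if a new file was written, False if the server answered 304.
    """
    mtime = os.path.getmtime(local_path) if os.path.exists(local_path) else None
    request = build_request(extract_url(region_path), user_agent, mtime)
    partial_path = local_path + ".part"
    try:
        with urllib.request.urlopen(request) as response, open(partial_path, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the existing file
            return False
        raise
    # Replace the old copy only after the download has completed.
    os.replace(partial_path, local_path)
    return True
```

A call might look like `fetch_extract("europe/monaco", "monaco.osm.pbf", "example-bot/0.1 (contact@example.org)")`, where the region, file name, and user agent are placeholders. One request per update cycle replaces the thousands of tile or API requests that a crawl would generate.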
Infrastructure load and the “bot arms race”
- Affected projects report significant extra infrastructure costs from AI crawlers and from buggy crawlers (e.g., ones that repeatedly download the same files).
- People worry this accelerates a shift from open, anonymous web access to login-gated, invite-only “islands” and “dark forest” trust models.
- Various defenses are debated: rate limiting, proof-of-work / Hashcash-style CAPTCHAs, Cloudflare Turnstile, account age checks, and poisoning data for heavy users.
- Many note that AI can solve traditional CAPTCHAs and bots can spread across many IPs, so no solution is perfect; the goal becomes “make it expensive, not impossible.”
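The "make it expensive, not impossible" idea behind the proof-of-work defenses mentioned above can be sketched as follows. This is a simplified Hashcash-style scheme, not the real Hashcash stamp format (which uses SHA-1 and a specific header layout). The function names, the `challenge:nonce` encoding, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib
import itertools
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side: issue a random, single-use challenge string."""
    return secrets.token_hex(16)


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce. Expected cost is about 2**difficulty hashes."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

The asymmetry is the point: the server spends one hash to verify, while each request costs the client roughly `2**difficulty` hashes. A single human visitor barely notices, but a crawler making millions of requests pays for every one of them, even when it spreads across many IPs. A server could also raise `difficulty` for clients that request heavily, though that policy is outside this sketch.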
IP, copyright, and AI
- Strong sentiment that AI companies are “taking without consent,” re-igniting debates on intellectual property, piracy, and derivative works.
- Some argue IP has been eroding since digital piracy; others say that in practice only powerful actors benefit from relaxed enforcement.
- Commenters contrast the past treatment of individual scrapers (e.g., Aaron Swartz) with today’s tolerance for large-scale AI-driven scraping.
Corporate payments vs donations
- Multiple commenters say it’s often easier for companies to spend large sums on commercial licenses than to make small voluntary donations to open projects.
- Reasons cited: procurement processes favor invoices and “products,” risk mitigation, and the perception that paid software includes accountable support.
- Suggestions include “selling donations as licenses” so finance departments can process them.
OSM usability and ecosystem
- Some developers find OSM’s “proper” bulk data route confusing: large files, specialized formats, scattered docs, and rate-limited APIs.
- Others respond that this is intentional: the foundation stays small and focuses on raw data; an ecosystem of third parties is expected to provide streamlined APIs and transformed datasets.
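To give a sense of what "transformed datasets" can mean in practice, here is a minimal sketch that turns raw OSM XML into flat records using only the Python standard library. The sample document is made up for illustration, and the helper name is hypothetical. Real bulk downloads are usually distributed as PBF, which needs a third-party parser, so this covers only the XML form.

```python
import xml.etree.ElementTree as ET

# Made-up sample in OSM XML form: one tagged node and one untagged node.
SAMPLE_OSM_XML = """<osm version="0.6">
  <node id="1" lat="43.7384" lon="7.4246">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="43.7390" lon="7.4250"/>
</osm>
"""


def tagged_nodes(xml_text: str):
    """Yield one flat record per node that carries at least one tag."""
    root = ET.fromstring(xml_text)
    for node in root.iter("node"):
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if not tags:
            continue  # untagged nodes are usually just geometry for ways
        yield {
            "id": int(node.get("id")),
            "lat": float(node.get("lat")),
            "lon": float(node.get("lon")),
            "tags": tags,
        }


if __name__ == "__main__":
    for record in tagged_nodes(SAMPLE_OSM_XML):
        print(record)
```

Loading the whole document into memory is fine for a small extract but not for a country-sized file, which is one reason third-party tools and streaming parsers fill this gap.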