Backing up Spotify
Legality and Terms of Service
- Many commenters state that distributing the audio files is clearly illegal copyright infringement; some distinguish “piracy” from “theft” but still call it unlawful.
- Debate over whether scraping violates the law or only Spotify’s ToS. One lawyer notes that ToS fall under contract law and usually require actual assent; commenters also distinguish criminal from civil liability.
- Metadata-only release is seen as much safer; text metadata itself might be legal, though large‑scale scraping could still breach contracts and anti‑hacking statutes depending on jurisdiction.
- Some argue research use might fall under fair use (especially in the US), but that doesn’t protect redistribution.
- Jurisdiction matters: Anna’s Archive is believed to operate from Russia or similar jurisdictions, complicating enforcement but not preventing DNS/IP blocking, which is already happening in several EU countries.
Ethics, Artists, and “Stealing”
- One side: ripping and releasing Spotify’s catalog is framed as “stealing” from thousands of artists, many already poorly paid; enabling others to resell or stream without licenses is seen as clearly harmful.
- Counter‑side: copying doesn’t deprive the rights holder of their copy; the main harm is hypothetical lost sales to people who might never have paid anyway. Many argue streaming pays artists “peanuts” and labels capture most revenue.
- Several musicians say streaming income is negligible; real money comes from touring, merch, direct sales (Bandcamp, CDs, vinyl). For them, large‑scale piracy is more about exposure than lost income.
- Preservationists stress that streaming platforms routinely remove works (rights changes, regional exits, politics), creating “contemporary lost media.” They view this archive as cultural insurance for future generations.
- Others are uneasy: they support preservation but fear this scale and visibility will draw aggressive music‑industry litigation and jeopardize Anna’s Archive’s book collections.
Spotify Critique
- Frequent reminders that Spotify itself reportedly bootstrapped with unlicensed catalogs; some see current outrage as hypocritical.
- Complaints about tiny per‑stream payouts, new minimum‑stream thresholds, label capture of revenue, and Spotify’s push of low‑royalty “garbage”/AI content and house-brand tracks.
- Users report songs and even whole catalogs disappearing, greyed‑out tracks, region restrictions, worsening recommendations, and general “enshittification.”
- Others defend Spotify as reasonably priced and convenient given storage, bandwidth, and catalog breadth.
Technical Aspects of the Rip
- Discussion of how 300 TB could be exfiltrated: many parallel accounts, continuous streaming/download at 160 kbps (free tier), or direct use of open‑source clients (librespot) and possibly DRM cracks (Widevine, “playplay”).
- Some speculate about insider access or leaked credentials but nothing concrete is known.
- Questions about Spotify’s rate‑limiting and why it didn’t prevent this; some suggest that such traffic may look like heavy but plausible listening.
- Torrent distribution: users note BitTorrent supports selective downloading; a “Popcorn Time for music” UI that streams directly from these torrents is considered technically straightforward, if blatantly illegal.
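The "many parallel accounts" point can be checked with rough arithmetic. The sketch below is only an estimate under stated assumptions: 300 TB is read as decimal terabytes, every stream runs at exactly the 160 kbps figure quoted in the thread, and data arrives no faster than playback speed. The function name `transfer_days` and the stream counts are illustrative, not taken from the discussion.

```python
def transfer_days(total_bytes: float, bitrate_bps: float, parallel_streams: int = 1) -> float:
    """Days needed to move total_bytes when each stream runs at bitrate_bps."""
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    total_bits = total_bytes * 8
    seconds = total_bits / (bitrate_bps * parallel_streams)
    return seconds / 86_400  # seconds per day


ARCHIVE_BYTES = 300e12   # 300 TB, assuming decimal terabytes
FREE_TIER_BPS = 160_000  # 160 kbps, the free-tier figure quoted in the thread

# Stream counts are hypothetical, chosen only to show the scaling.
for streams in (1, 100, 1_000, 10_000):
    days = transfer_days(ARCHIVE_BYTES, FREE_TIER_BPS, streams)
    print(f"{streams:>6} streams: {days:,.1f} days")
```

Under these assumptions a single real-time stream would need roughly 475 years, while a thousand parallel streams would need about half a year, which is why commenters assume either heavy parallelism or faster-than-playback downloads.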
Value of Metadata and Research Uses
- The metadata dump (hundreds of millions of tracks, ISRCs, genres, keys, tempos, popularity scores) is widely seen as a goldmine for:
  - Music information retrieval, recommendation, classification, and search benchmarking.
  - Studying long‑tail listening behavior: a large majority of tracks have under 1,000 streams.
  - Genre and key distributions (e.g., unexpected prevalence of Db/C#; large counts for opera and psytrance raise questions about classification quality or auto‑generated content).
- Some want this ingested into projects like MusicBrainz/EveryNoise or wrapped in an open API; others mention building search front‑ends and using it as IR benchmark data.
- There’s interest in using the metadata to detect AI‑generated “slop” and mislabeled content.
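A minimal sketch of the kind of analysis commenters describe, the long-tail share and the key distribution, might look like the following. The actual schema of the dump is not described in the thread, so the field names (`isrc`, `key`, `streams`) and the sample records are hypothetical, and Db is folded into C# only because the two are enharmonic equivalents.

```python
from collections import Counter

# Hypothetical sample records; the real dump's fields and values are unknown.
tracks = [
    {"isrc": "TEST00000001", "key": "Db", "streams": 120},
    {"isrc": "TEST00000002", "key": "C#", "streams": 950},
    {"isrc": "TEST00000003", "key": "G", "streams": 15_000},
    {"isrc": "TEST00000004", "key": "C#", "streams": 40},
    {"isrc": "TEST00000005", "key": "A", "streams": 2_000_000},
]

# Fold enharmonic spellings together so Db and C# count as one key.
ENHARMONIC = {"Db": "C#"}


def long_tail_share(records, threshold=1_000):
    """Fraction of records whose stream count is below the threshold."""
    if not records:
        return 0.0
    below = sum(1 for r in records if r["streams"] < threshold)
    return below / len(records)


def key_distribution(records):
    """Share of records per musical key, most common first."""
    counts = Counter(ENHARMONIC.get(r["key"], r["key"]) for r in records)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.most_common()}


print(f"share under 1,000 streams: {long_tail_share(tracks):.0%}")
print(key_distribution(tracks))
```

On real data the same two functions would run over a streamed file rather than an in-memory list, but the counting logic would stay the same.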
AI Training and “Slop” Concerns
- Many expect big tech and AI labs to be early heavy users; Anna’s Archive already advertises paid bulk access for AI training.
- Critics see this as fueling even more low‑effort generative music and undermining already precarious human creators.
- Supporters reply that AI companies already scrape or license massive catalogs; this archive marginally changes access but greatly helps independent researchers.
User Behavior, Alternatives, and Blocking
- Several note average listeners are unlikely to handle 300 TB torrents; piracy’s real draw is cheap, polished interfaces, not raw files.
- Others describe existing consumer‑friendly piracy boxes for video as precedent, and foresee similar tools for this music set.
- Many advocate supporting artists directly (Bandcamp, shows, merch) while self‑hosting personal libraries (Jellyfin/Navidrome/Lidarr) and using tools to back up Spotify playlists.
- Access to Anna’s Archive is already DNS‑blocked in parts of Germany, Belgium, the Netherlands and elsewhere; users bypass via VPNs or custom DNS, and criticize private “copyright clearing” bodies driving such blocks.