Transformers in music recommendation

Model choice and technical skepticism

  • Several commenters question why transformers are needed over simpler models (Wide & Deep, DCNv2, basic NNs) for short music-action histories.
  • Transformers are seen as useful for long-range dependencies, but some argue that the last few interactions usually suffice to capture “current taste.”
  • Others note that full sequences can encode multiple timescales (right now, recent weeks, time-of-day patterns, willingness to change genre), which may justify sequence models.
  • The work is viewed by some as incremental and non-novel; acceptable as a blog post but not ground‑breaking.
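
The multiple-timescales argument above can be made concrete with a small sketch. It assumes a listening history stored as (timestamp, genre) pairs; the function names, window sizes, and sample data are hypothetical and not taken from the discussed system. It shows how "right now," "recent weeks," and "time of day" can yield different summaries of the same history, which is the information a last-few-interactions model would discard.

```python
from collections import Counter
from datetime import datetime, timedelta


def genre_shares(events):
    """Fraction of plays per genre for a list of (timestamp, genre) events."""
    counts = Counter(genre for _, genre in events)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def timescale_features(history, now, last_k=3, recent_days=21, hour_window=2):
    """Summarize one history at three timescales (all window sizes are assumptions)."""
    history = sorted(history)  # oldest first
    recent_cutoff = now - timedelta(days=recent_days)

    def near_current_hour(ts):
        # Circular distance on a 24-hour clock.
        diff = abs(ts.hour - now.hour)
        return min(diff, 24 - diff) <= hour_window

    return {
        "right_now": genre_shares(history[-last_k:]),
        "recent_weeks": genre_shares([e for e in history if e[0] >= recent_cutoff]),
        "time_of_day": genre_shares([e for e in history if near_current_hour(e[0])]),
    }


# Made-up history: ambient in the mornings months ago, techno and jazz recently at night.
now = datetime(2024, 5, 20, 8, 0)
history = [
    (datetime(2024, 3, 1, 8, 0), "ambient"),
    (datetime(2024, 3, 2, 8, 30), "ambient"),
    (datetime(2024, 5, 10, 20, 0), "techno"),
    (datetime(2024, 5, 12, 21, 0), "techno"),
    (datetime(2024, 5, 19, 22, 0), "jazz"),
    (datetime(2024, 5, 19, 22, 5), "jazz"),
    (datetime(2024, 5, 19, 22, 10), "jazz"),
]
features = timescale_features(history, now)
for name, shares in features.items():
    print(name, shares)
```

In this toy data the last three plays are all jazz, yet the morning-hours view points to ambient. A model that sees only the most recent interactions cannot recover that pattern, whereas a sequence model over the full history at least has access to it.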

Content understanding vs co-occurrence

  • A major theme is that the described system appears to rely on user actions and track embeddings, not deep analysis of the audio itself.
  • Many argue that without awareness of musical content, recommendation is like a “deaf DJ” driven by charts and behavior.
  • Others counter that collaborative filtering and co-occurrence (e.g., playlist co-membership) are extremely strong baselines that are hard to beat, much as language models learn from relations between tokens rather than from explicit semantics.
  • There is discussion of audio-based features and semantic embeddings (spectral features, self-supervised models), but these are seen as costly and historically underused in large services.
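
As a rough illustration of the co-occurrence baseline mentioned above, the following sketch scores track similarity purely from playlist co-membership, with no audio analysis. The track names, playlists, and helper functions are invented for illustration; production systems would typically use matrix factorization or learned embeddings rather than raw pair counts.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_similarity(playlists):
    """Return {(a, b): cosine similarity} for track pairs that share a playlist."""
    track_counts = Counter()
    pair_counts = Counter()
    for playlist in playlists:
        tracks = sorted(set(playlist))  # dedupe; sorting gives a stable (a, b) key order
        track_counts.update(tracks)
        pair_counts.update(combinations(tracks, 2))
    return {
        (a, b): n / sqrt(track_counts[a] * track_counts[b])
        for (a, b), n in pair_counts.items()
    }


def most_similar(track, sims, k=3):
    """Top-k neighbours of a track, highest similarity first."""
    scored = []
    for (a, b), score in sims.items():
        if a == track:
            scored.append((b, score))
        elif b == track:
            scored.append((a, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical playlists.
playlists = [
    ["track_a", "track_b", "track_c"],
    ["track_a", "track_b"],
    ["track_b", "track_d"],
    ["track_c", "track_d"],
]
sims = cooccurrence_similarity(playlists)
print(most_similar("track_a", sims))
```

Here track_b ranks as the closest neighbour of track_a because the two share two playlists. Nothing in the computation knows what either track sounds like, which is the point both sides of the "deaf DJ" argument are making.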

Commercial bias and trust

  • There is strong concern that even excellent models are overridden or skewed by commercial incentives.
  • Spotify’s “Discovery Mode,” commission-based boosting of priority tracks, and “smart shuffle” inserting monetizable songs are cited as examples of pay-influenced recommendations.
  • Some doubt the legality/ethics of unlabeled sponsored recommendations; others note that disclosures exist but are obscure.

User experience, mood, and agency

  • Many feel current systems overfit to recent listening, fail to account for mood shifts, and conflate “what I like generally” with “what I’m in the mood for now.”
  • Skip behavior and listening logs are seen as very low-fidelity signals; explicit likes/dislikes and richer context are preferred but rare.
  • Some argue the best discovery is semi-random “crate digging,” not tight personalization. Others want tools for user-driven branching exploration (similar tracks lists, knowledge-graph style navigation) rather than linear “infinite radio.”

Comparisons to existing and past services

  • Rdio and Pandora are frequently praised as having had superior, more serendipitous recommendations, often attributed to expert tagging or earlier Echo Nest similarity data.
  • Opinions on current platforms are mixed:
    • Spotify: strong tools and community features, but many complain of homogenized, top‑40‑ish outcomes and label influence.
    • YouTube Music: some report uncannily good “song radios” and next-track choices.
    • Apple Music: viewed as decent but sometimes repetitive or overly focused on popular tracks.

Alternatives, DIY, and open systems

  • Users mention open or niche projects (ListenBrainz, AcousticBrainz, Discogs exploration, personal embedding experiments, custom playlist generators) as better aligned with deep discovery or local collections.
  • There is a repeated desire for:
    • Locally run, unbiased recommenders.
    • Systems that surface the long tail, not just already‑popular music.
    • Interfaces that emphasize human curation, social discovery, and knowledge graphs (labels, producers, scenes) alongside any transformer-based ranking.
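
One simple way a locally run recommender could favor the long tail is to penalize popularity when re-ranking candidates. The sketch below is only an assumption about how such a wish might be implemented: the scoring formula, the alpha value, and the candidate data are all hypothetical and do not describe any service discussed here.

```python
from math import log1p


def rerank_for_long_tail(candidates, alpha=0.05):
    """Re-rank (track_id, relevance, play_count) tuples with a popularity penalty.

    adjusted = relevance - alpha * log(1 + play_count); alpha=0 keeps the original order.
    """
    def adjusted(candidate):
        _, relevance, play_count = candidate
        return relevance - alpha * log1p(play_count)

    ranked = sorted(candidates, key=adjusted, reverse=True)
    return [track_id for track_id, _, _ in ranked]


# Made-up candidates: (track_id, model relevance, global play count).
candidates = [
    ("hit_single", 0.90, 1_000_000),
    ("deep_cut", 0.85, 1_000),
    ("obscure_b_side", 0.60, 50),
]
print(rerank_for_long_tail(candidates, alpha=0.0))
print(rerank_for_long_tail(candidates, alpha=0.05))
```

With the penalty switched off, the most-played track stays on top. With a modest alpha, the nearly-as-relevant but far less played track moves ahead of it. Choosing alpha is a trade-off between relevance and exposure that a user-controlled tool could expose as a setting.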

Ethical and societal concerns

  • Several commenters worry about recommendation systems optimized for engagement turning into addictive “slot machines.”
  • There is debate over whether services should intentionally reduce stickiness (e.g., avoid autoplay) or factor in user wellbeing; others see this as impractical or paternalistic.
  • Some fear powerful recommenders plus commercial pressure will narrow musical diversity over time, steering both listening and production toward a small set of sounds.