2024-08-19

Transformers in music recommendation

Model choice and technical skepticism

Several commenters question why transformers are needed over simpler models (Wide & Deep, DCNv2, basic NNs) for short music-action histories.
Transformers are seen as useful for long-range dependencies, but some argue that the last few interactions usually suffice to capture “current taste.”
Others note that full sequences can encode multiple timescales (right now, recent weeks, time-of-day patterns, willingness to change genre), which may justify sequence models.
The work is viewed by some as incremental and non-novel; acceptable as a blog post but not ground‑breaking.
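
To make the multiple-timescales point above concrete, here is a minimal sketch that is not taken from the discussed system. It assumes a hypothetical history format of `(day, genre)` pairs and uses exponential decay with two illustrative half-lives to contrast "right now" taste with "recent weeks" taste. The function name, the data, and the half-lives are all assumptions.

```python
from collections import defaultdict


def taste_profile(history, now, half_life_days):
    """Return a normalized genre distribution, weighting recent plays more.

    history: iterable of (day, genre) pairs (hypothetical format).
    A play that is `half_life_days` old counts half as much as one from today.
    """
    weights = defaultdict(float)
    for day, genre in history:
        age = now - day
        weights[genre] += 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()} if total else {}


# Illustrative data: a long jazz habit, then three techno plays in the last days.
history = [(d, "jazz") for d in (0, 5, 10, 15, 20, 25, 30)]
history += [(58, "techno"), (59, "techno"), (60, "techno")]
now = 60

short_term = taste_profile(history, now, half_life_days=1)   # "right now"
long_term = taste_profile(history, now, half_life_days=90)   # "recent weeks"
```

With this data the short-term profile is almost entirely techno, while the long-term profile still favors jazz. A model that sees only the last few interactions loses that second signal, which is the argument made for sequence models.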

Content understanding vs co-occurrence

A major theme is that the described system appears to rely on user actions and track embeddings, not deep analysis of the audio itself.
Many argue that without awareness of musical content, recommendation is like a “deaf DJ” driven by charts and behavior.
Others counter that collaborative filtering and co-occurrence (e.g., playlist co-membership) are extremely strong baselines and hard to beat, comparable to how language models learn from token relations, not semantics.
There is discussion of audio-based features and semantic embeddings (spectral features, self-supervised models), but these are seen as costly and historically underused in large services.
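
The co-occurrence baseline that commenters describe can be sketched in a few lines. This is a minimal illustration, not any service's actual method: it assumes playlists are plain lists of track IDs and scores track pairs by cosine similarity over binary playlist membership. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_similarity(playlists):
    """Cosine similarity between tracks based on shared playlist membership."""
    track_counts = Counter()
    pair_counts = Counter()
    for playlist in playlists:
        tracks = set(playlist)  # ignore duplicates within one playlist
        track_counts.update(tracks)
        for a, b in combinations(sorted(tracks), 2):
            pair_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        # shared playlists / sqrt(playlists with a * playlists with b)
        score = n / math.sqrt(track_counts[a] * track_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def similar_tracks(sims, track, k=3):
    """Return up to k tracks most similar to `track`, ties broken by ID."""
    neighbors = sims.get(track, {})
    return sorted(neighbors, key=lambda t: (-neighbors[t], t))[:k]


playlists = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c", "d"],
    ["d", "e"],
]
sims = cooccurrence_similarity(playlists)
print(similar_tracks(sims, "a"))
```

Nothing here looks at audio. The similarity comes entirely from how people group tracks, which is both the strength commenters point to and the "deaf DJ" limitation others object to.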

Commercial bias and trust

There is strong concern that even excellent models are overridden or skewed by commercial incentives.
Spotify’s “Discovery Mode” and its commission-based boosting of priority tracks are cited as examples of pay-influenced recommendations, along with “smart shuffle” inserting monetizable songs.
Some doubt the legality/ethics of unlabeled sponsored recommendations; others note that disclosures exist but are obscure.

User experience, mood, and agency

Many feel current systems overfit to recent listening, fail to account for mood shifts, and conflate “what I like generally” with “what I’m in the mood for now.”
Skip behavior and listening logs are seen as very low-fidelity signals; explicit likes/dislikes and richer context are preferred but rare.
Some argue the best discovery is semi-random “crate digging,” not tight personalization. Others want tools for user-driven branching exploration (similar tracks lists, knowledge-graph style navigation) rather than linear “infinite radio.”

Comparisons to existing and past services

Rdio and Pandora are frequently praised as having had superior, more serendipitous recommendations, often built on expert tagging or earlier Echo Nest similarity data.
Opinions on current platforms are mixed:
- Spotify: strong tools and community features, but many complain of homogenized, top‑40‑ish outcomes and label influence.
- YouTube Music: some report uncannily good “song radios” and next-track choices.
- Apple Music: viewed as decent but sometimes repetitive or overly focused on popular tracks.

Alternatives, DIY, and open systems

Users mention open or niche projects (ListenBrainz, AcousticBrainz, Discogs exploration, personal embedding experiments, custom playlist generators) as better aligned with deep discovery or local collections.
There is repeated desire for:
- Locally run, unbiased recommenders.
- Systems that surface the long tail, not just already‑popular music.
- Interfaces that emphasize human curation, social discovery, and knowledge graphs (labels, producers, scenes) alongside any transformer-based ranking.
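
One simple way to act on the long-tail request above is to re-rank candidates with a popularity penalty. The sketch below is an assumption-laden illustration, not something proposed in the thread: the `(track_id, relevance, play_count)` tuple format, the `alpha` knob, and the log penalty are all hypothetical choices.

```python
import math


def rerank_for_long_tail(candidates, alpha=0.1):
    """Re-rank candidates, penalizing tracks that are already popular.

    candidates: list of (track_id, relevance, play_count) tuples (hypothetical).
    alpha: penalty strength; 0 keeps the pure relevance order.
    """
    def adjusted(item):
        _, relevance, play_count = item
        # log1p keeps the penalty gentle and well-defined for zero plays
        return relevance - alpha * math.log1p(play_count)

    ranked = sorted(candidates, key=adjusted, reverse=True)
    return [track_id for track_id, _, _ in ranked]


candidates = [
    ("hit", 0.90, 1_000_000),
    ("mid", 0.85, 10_000),
    ("niche", 0.80, 100),
]
by_relevance = rerank_for_long_tail(candidates, alpha=0.0)
long_tail_first = rerank_for_long_tail(candidates, alpha=0.1)
```

With `alpha=0.0` the order follows relevance alone. With `alpha=0.1` the niche track moves to the top. In a locally run recommender, the user could tune this knob directly.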

Ethical and societal concerns

Several commenters worry about recommendation systems optimized for engagement turning into addictive “slot machines.”
There is debate over whether services should intentionally reduce stickiness (e.g., avoid autoplay) or factor in user wellbeing; others see this as impractical or paternalistic.
Some fear powerful recommenders plus commercial pressure will narrow musical diversity over time, steering both listening and production toward a small set of sounds.
