DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion
Automation, Capitalism, and Creative Work
- A strong thread argues that businesses don’t “hate creatives,” they hate costs: any job that can be done cheaper or better becomes a target.
- Some see a bleak endgame: pervasive automation, gig work, mass underemployment, and welfare-supported populations not sharing in productivity gains.
- Others counter that history shows technology usually creates more jobs than it destroys (e.g., agriculture → other sectors), and see mass displacement as unlikely on short time scales.
- There’s explicit worry that capital uses AI to crush labor (including artists) while hoarding gains, with comparisons to past political/economic crises.
What Musicians Actually Want from AI
- Multiple commenters want fine-grained assistive tools rather than “one-click songs”:
  - Multi-tracking, humming-to-melody, generating counterpoint, call/response, flexible accompaniment.
  - Plugins/VSTs that separate “composition” (chords, melodies) from “synthesis” (rendering sound).
- One-shot models are widely called “toys,” “pipe dreams,” or novelties; the expectation is that serious use will come from compositional aids embedded in DAWs.
Technical & Structural Critiques of DiffRhythm
- The architecture is considered impressive: generating full-length songs in ~10 seconds is seen as a major technical milestone.
- Audio quality still has noticeable artifacts; listeners expect them to become more obvious with repeated listening.
- Major gap: lack of song structure. Many tracks are described as having no clear chorus, ebb and flow, or development: “glitchy lyrics over a bland backing track.”
- Some suggest that structureless tracks could form a new style; others insist they’re “by definition an incomplete song.”
Use Cases and Practical Value
- Proposed uses:
  - Custom background music for video without copyright worries (though similarity strikes remain possible).
  - Dynamic music for interactive media.
  - Producers generating material to slice, sample, and rework, especially as traditional sample sources are “used up.”
- Skeptics argue royalty-free libraries and existing tools already cover most pragmatic needs.
Is It “Real” Music? Quality, Taste, and “AI Slop”
- Several argue AI tracks are “noise that sounds like music,” analogous to LLM “word salad”: competent surface, no substance.
- Common perception: outputs are thoroughly mediocre, which also exposes how cliché-ridden much human-made music is.
- Some find the lack of clear time signatures and intentionality actively unpleasant.
Creativity, Process, and Human Value
- One camp: the act of making music is itself the value; AI removes practice, growth, and emotional investment, turning creation into a “slot machine.”
- Others: these are just new tools; responsibility and meaning still come from how humans use them, especially when models allow iterative control instead of pure one-shot prompting.
- Concern that AI diminishes the social role and status of people who invested years in a craft, possibly causing them to quit.
Democratization vs Devaluation
- Supporters frame AI as democratizing creativity for those without skill, time, or access; critics counter that subscriptions and GPU costs are a shallow kind of “democracy.”
- Strong pushback from working musicians who see this as uncompensated exploitation of their training data and direct economic competition.
- Broader parallel drawn to AI coding tools: developers may be automating away work they actually enjoy, potentially undermining their own future roles.