2025-03-04

DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion

Automation, Capitalism, and Creative Work

A strong thread argues that businesses don’t “hate creatives,” they hate costs: any job that can be done cheaper or better becomes a target.
Some see a bleak endgame: pervasive automation, gig work, mass underemployment, and welfare-supported populations not sharing in productivity gains.
Others counter that history shows technology usually creates more jobs than it destroys (e.g., agriculture → other sectors), and see mass displacement as unlikely on short time scales.
There’s explicit worry that capital uses AI to crush labor (including artists) while hoarding gains, with comparisons to past political/economic crises.

What Musicians Actually Want from AI

Multiple commenters don’t want “one-click songs” but fine-grained assistive tools:
- Multi-tracking, humming-to-melody, generating counterpoint, call/response, flexible accompaniment.
- Plugins/VSTs that separate “composition” (chords, melodies) from “synthesis” (rendering sound).
One-shot models are widely called “toys,” “pipe dreams,” or novelties; the expectation is that serious use will come as compositional aids embedded in DAWs.

Technical & Structural Critiques of DiffRhythm

Architecture is considered impressive: full-length songs in ~10 seconds is seen as a big technical milestone.
Audio quality still has noticeable artifacts; listeners expect them to become more obvious with repeated listening.
Major gap: lack of song structure. Many tracks are described as having no clear chorus, ebb/flow, or development—“glitchy lyrics over a bland backing track.”
Some suggest that structureless tracks could form a new style; others insist they’re “by definition an incomplete song.”

Use Cases and Practical Value

Proposed uses:
- Custom background music for video without copyright worries (though similarity strikes remain possible).
- Dynamic music for interactive media.
- Producers generating material to slice, sample, and rework, especially as traditional sample sources are “used up.”
Skeptics argue royalty-free libraries and existing tools already cover most pragmatic needs.

Is It “Real” Music? Quality, Taste, and “AI Slop”

Several argue AI tracks are “noise that sounds like music,” analogous to LLM “word salad”: competent surface, no substance.
Common perception: outputs are thoroughly mediocre, which also exposes the clichés in human-made music.
Some find the lack of clear time signatures and intentionality actively unpleasant.

Creativity, Process, and Human Value

One camp: the act of making music is itself the value; AI removes practice, growth, and emotional investment, turning creation into a “slot machine.”
Others: these are just new tools; responsibility and meaning still come from how humans use them, especially when models allow iterative control instead of pure one-shot prompting.
Concern that AI diminishes the social role and status of people who invested years in a craft, possibly causing them to quit.

Democratization vs Devaluation

Supporters frame AI as democratizing creativity for those without skill, time, or access; critics counter that subscriptions and GPU costs are a shallow kind of “democracy.”
Strong pushback from working musicians who see this as uncompensated exploitation of their training data and direct economic competition.
Broader parallel drawn to AI coding tools: developers may be automating away work they actually enjoy, potentially undermining their own future roles.
