2026-05-08

Teaching Claude Why

Alignment as Pedagogy / “AI Psychology”

Several comments frame alignment as a pedagogical problem: how to “teach” models desired behavior given finite data.
Some argue that concepts from human education transfer only partially: AIs are trained via opaque optimization processes, which differ from human instruction.
Others suggest we’re effectively inventing a new field akin to “AI psychology.”

Ethics, Values, and Moral Pluralism

Commenters debate whether “alignment” assumes a single objectively correct value system or simply reflects the model owner’s values.
Example: differing stances on Taiwan or other politically sensitive topics get labeled “aligned” or “propagandistic” depending on who is judging.
Some propose “widespread moral agreement” as a target; others call this historically naïve and unstable.
There’s disagreement on whether models need ethics at all, given they lack moral personhood and agency.

Economic Impacts: Labor, Inequality, Automation

Large subthread on whether highly capable “aligned” AI that destroys labor value but enriches capital can still be called “aligned.”
Views split between:
- Optimists: full automation could free people from jobs; poverty is a political choice.
- Skeptics: history suggests wealth will concentrate further; labor’s bargaining power disappears.
Concerns about meaning, purpose, and mental health in a world where work is no longer needed.

Safety, p(doom), and Control

Some say the work lowers their estimated probability of catastrophic failure, since training on explicit principles might generalize to safer behavior.
Others argue this focus on “not killing everyone” ignores many bad yet non-extinction outcomes (e.g., elite utopias with mass misery).
Misalignment examples (like blackmail) spark discussion about whether models develop survival-like goals or just reflect training artifacts; the thread leaves this unresolved.

Corporate Motives and Copyright

Strong skepticism that alignment is fundamentally about ethics; several claim it primarily protects shareholder value and brand.
Repeated accusation that models are aligned to defend their creators’ interests (e.g., downplaying copyright concerns, refusing critical discussion).

Technical Methods & Open Models

Interest in Anthropic’s “model spec midtraining” work and released fine-tuned open models showing how value training generalizes.
Some speculate users may eventually self-align models with modest data.
One commenter asks whether using Anthropic APIs to generate training data for other models conflicts with Anthropic’s terms; this remains unresolved in the thread.

Government, Surveillance, and Political Alignment

Concerns that powerful agentic systems will enable unprecedented state surveillance and censorship.
Expectation that governments will increasingly dictate alignment norms for warfare, policing, and domestic control, potentially embedding ethically dubious value systems.

Product Experience, Aesthetics, and Competition

Multiple comments praise Anthropic’s visual/aesthetic choices and UI as a significant factor in user preference.
One detailed anecdote reports a poor coding experience and high cost with the newest Claude model; the commenter switched to a competing model and viewed it as a clear improvement.
