Teaching Claude Why
Alignment as Pedagogy / “AI Psychology”
- Several comments frame alignment as a pedagogical problem: how to “teach” models desired behavior given finite data.
- Some argue human education concepts only partially transfer: AIs are trained via opaque optimization processes unlike human instruction.
- Others suggest we’re effectively inventing a new field akin to “AI psychology.”
Ethics, Values, and Moral Pluralism
- Debate over whether “alignment” assumes a single objectively correct value system versus reflecting the model owner’s values.
- Example: differing stances on Taiwan or other politically sensitive topics as “aligned” vs “propagandistic.”
- Some propose “widespread moral agreement” as a target; others call this historically naïve and unstable.
- There’s disagreement on whether models need ethics at all, given they lack moral personhood and agency.
Economic Impacts: Labor, Inequality, Automation
- Large subthread on whether highly capable “aligned” AI that destroys labor value but enriches capital can still be called “aligned.”
- Views split between:
- Optimists: full automation could free people from jobs; poverty is a political choice.
- Skeptics: history suggests wealth will concentrate further; labor’s bargaining power disappears.
- Concerns about meaning, purpose, and mental health in a world where work is no longer needed.
Safety, p(doom), and Control
- Some say the work lowers their estimated probability of catastrophic failure, since training on explicit principles might generalize to safer behavior.
- Others argue this focus on “not killing everyone” ignores many bad yet non-extinction outcomes (e.g., elite utopias with mass misery).
- Misalignment examples (like blackmail) spark discussion about whether models develop survival-like goals or just reflect training artifacts; this is seen as unclear.
Corporate Motives and Copyright
- Strong skepticism that alignment is fundamentally about ethics; several claim it primarily protects shareholder value and brand.
- Repeated accusation that models are aligned to defend their creators’ interests (e.g., downplaying copyright concerns, refusing critical discussion).
Technical Methods & Open Models
- Interest in Anthropic’s “model spec midtraining” work and released fine-tuned open models showing how value training generalizes.
- Some speculate users may eventually self-align models with modest data.
- A question is raised whether using Anthropic APIs to generate training data for other models conflicts with their terms; this remains unresolved in the thread.
Government, Surveillance, and Political Alignment
- Concerns that powerful agentic systems will enable unprecedented state surveillance and censorship.
- Expectation that governments will increasingly dictate alignment norms for warfare, policing, and domestic control, potentially embedding ethically dubious value systems.
Product Experience, Aesthetics, and Competition
- Multiple comments praise Anthropic’s visual/aesthetic choices and UI as a significant factor in user preference.
- A detailed anecdote reports poor coding experience and high cost with the newest Claude model, with a switch to a competing model viewed as a clear improvement.