Teaching Claude Why

Alignment as Pedagogy / “AI Psychology”

  • Several comments frame alignment as a pedagogical problem: how to “teach” models desired behavior given finite data.
  • Some argue human education concepts only partially transfer: AIs are trained via opaque optimization processes unlike human instruction.
  • Others suggest we’re effectively inventing a new field akin to “AI psychology.”

Ethics, Values, and Moral Pluralism

  • Debate over whether “alignment” assumes a single objectively correct value system versus reflecting the model owner’s values.
  • Example: differing stances on Taiwan or other politically sensitive topics as “aligned” vs “propagandistic.”
  • Some propose “widespread moral agreement” as a target; others call this historically naïve and unstable.
  • There’s disagreement on whether models need ethics at all, given they lack moral personhood and agency.

Economic Impacts: Labor, Inequality, Automation

  • Large subthread on whether highly capable “aligned” AI that destroys labor value but enriches capital can still be called “aligned.”
  • Views split between:
    • Optimists: full automation could free people from jobs; poverty is a political choice.
    • Skeptics: history suggests wealth will concentrate further; labor’s bargaining power disappears.
  • Concerns about meaning, purpose, and mental health in a world where work is no longer needed.

Safety, p(doom), and Control

  • Some say the work lowers their estimated probability of catastrophic failure, since training on explicit principles might generalize to safer behavior.
  • Others argue this focus on “not killing everyone” ignores many bad yet non-extinction outcomes (e.g., elite utopias with mass misery).
  • Misalignment examples (like blackmail) spark discussion about whether models develop survival-like goals or just reflect training artifacts; this is seen as unclear.

Corporate Motives and Copyright

  • Strong skepticism that alignment is fundamentally about ethics; several claim it primarily protects shareholder value and brand.
  • Repeated accusation that models are aligned to defend their creators’ interests (e.g., downplaying copyright concerns, refusing critical discussion).

Technical Methods & Open Models

  • Interest in Anthropic’s “model spec midtraining” work and released fine-tuned open models showing how value training generalizes.
  • Some speculate users may eventually self-align models with modest data.
  • A question is raised whether using Anthropic APIs to generate training data for other models conflicts with their terms; this remains unresolved in the thread.

Government, Surveillance, and Political Alignment

  • Concerns that powerful agentic systems will enable unprecedented state surveillance and censorship.
  • Expectation that governments will increasingly dictate alignment norms for warfare, policing, and domestic control, potentially embedding ethically dubious value systems.

Product Experience, Aesthetics, and Competition

  • Multiple comments praise Anthropic’s visual/aesthetic choices and UI as a significant factor in user preference.
  • A detailed anecdote reports poor coding experience and high cost with the newest Claude model, with a switch to a competing model viewed as a clear improvement.