LLMs are more persuasive than incentivized human persuaders

Why LLMs May Outperform Humans at Persuasion

  • LLMs can recall and recombine huge amounts of “factual-sounding” content, making their answers seem researched and authoritative compared to short, bare human replies.
  • They don’t get tired, will respond to every point in a gish-gallop, and can mirror the interlocutor’s tone and style, which helps with rapport.
  • They’re trained on vast corpora full of persuasive language (marketing, scams, bullying, debate material), giving them a rich library of tactics.
  • Their strength is “shallow but extremely wide” search: rapidly exploring wording and framing that satisfy many small constraints.

Hallucination, Lying, and RLHF

  • Commenters stress that LLMs smoothly fabricate details to make arguments look stronger; bad math proofs and fake “facts” can look airtight until inspected closely.
  • Some models hallucinate less than others, but benchmarks show trade-offs between capability and hallucination rates.
  • A key criticism: RLHF / “human preference” tuning rewards outputs people like, not truth. A lie that isn’t recognized as a lie is often preferred, effectively optimizing for undetectable deception.
  • This makes LLMs “bad tools” in an engineering sense: they fail silently and confidently, instead of flagging uncertainty.

Human vs LLM Communication Styles

  • Examples (like defining a stack) show humans arguing over minutiae, misreading, or hair-splitting logic structures. Some see this as necessary precision; others say it’s why people prefer LLMs’ smoother answers.
  • Several anecdotes compare LLMs to skilled human bullshitters who care only about being convincing, not about truth.

Debate Culture, Gish Gallop, and Datasets

  • High-school/college debate practices (spreading: ultra-fast delivery of many arguments, gish-gallop tactics) are cited as analogous to LLM persuasion.
  • Debate incentives reward volume of arguments and penalize ignoring even absurd claims, distorting debate away from clarity or audience understanding.
  • A large open debate-argument dataset derived from this culture is being used to train/evaluate LLMs, arguably reinforcing these tactics.

Experimental Design and Word Count

  • One close reading of the paper notes LLM advice messages were over twice as long as human ones. Word count may explain much of the persuasive gap.
  • Some suggest rerunning the experiment with controlled lengths or instructing humans to write longer, to see if LLMs still win.
  • Others note that longer outputs also reduce hallucinations, and that humans underestimate how much sheer length biases perceived rigor.

Social, Political, and Commercial Implications

  • Many are worried about mass persuasion: targeted political messaging, subtle advertising, and manipulation on social platforms.
  • Fears include young users over-trusting “magic oracles” and the prospect of chatbots that quietly embed product pushes into otherwise helpful advice.
  • Some propose personal “loyal” models to critique incoming persuasive content—leading to the image of LLMs arguing with other LLMs on our behalf.
  • Commenters expect political campaigns and advertisers to adopt such systems aggressively; some joke that salespeople, not programmers, should be most worried about replacement.

Broader AI Trajectory and Labor

  • One camp anticipates rapid upheaval: any value delivered via digital interfaces (especially knowledge work and persuasion-heavy roles) is vulnerable, with robotics following later.
  • Another camp is skeptical: hallucinations and legal liability limit real deployment; current productivity gains feel closer to autocomplete than revolution.
  • There’s debate over whether we’re heading toward a “weak singularity” (recursive improvement, end of scarcity) or just another overhyped tech wave.