LLMs are more persuasive than incentivized human persuaders
Why LLMs May Outperform Humans at Persuasion
- LLMs can recall and recombine huge amounts of “factual-sounding” content, making their answers seem researched and authoritative compared to short, bare human replies.
- They don’t get tired, will respond to every point in a gish-gallop, and can mirror the interlocutor’s tone and style, which helps with rapport.
- They’re trained on vast corpora full of persuasive language (marketing, scams, bullying, debate material), giving them a rich library of tactics.
- Their strength is “shallow but extremely wide” search: rapidly exploring wording and framing that satisfy many small constraints.
Hallucination, Lying, and RLHF
- Commenters stress that LLMs smoothly fabricate details to make arguments look stronger; bad math proofs and fake “facts” can look airtight until inspected closely.
- Some models hallucinate less than others, but benchmarks show trade-offs between capability and hallucination rates.
- A key criticism: RLHF / “human preference” tuning rewards outputs people like, not truth (the standard training objective is sketched after this list). A lie that isn’t recognized as a lie is often preferred, effectively optimizing for undetectable deception.
- This makes LLMs “bad tools” in an engineering sense: they fail silently and confidently, instead of flagging uncertainty.
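To make that criticism concrete, here is the standard pairwise reward-model objective used in typical RLHF pipelines; this is general background, not a formula from the paper under discussion:

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big]$$

where $y_w$ is the response the annotator preferred over $y_l$ for prompt $x$, and $r_\theta$ is the learned reward model. Nothing in this objective distinguishes an answer preferred because it is true from one preferred because it merely sounds convincing, which is exactly the gap commenters point at.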
Human vs LLM Communication Styles
- Examples (such as a thread about defining a stack) show humans arguing over minutiae, misreading each other, or splitting hairs over logical structure. Some see this as necessary precision; others say it’s exactly why people prefer LLMs’ smoother answers.
- Several anecdotes compare LLMs to skilled human bullshitters who care only about being convincing, not about truth.
Debate Culture, Gish Gallop, and Datasets
- High-school and college debate practices (“spreading”: ultra-fast delivery of as many arguments as possible, essentially a gish gallop) are cited as analogous to LLM persuasion.
- Debate incentives reward volume of arguments and penalize ignoring even absurd claims, distorting debate away from clarity or audience understanding.
- A large open debate-argument dataset derived from this culture is being used to train/evaluate LLMs, arguably reinforcing these tactics.
Experimental Design and Word Count
- One close reading of the paper notes that the LLMs’ advice messages were more than twice as long as the humans’. Word count alone may explain much of the persuasion gap.
- Some suggest rerunning the experiment with message lengths held constant, or instructing humans to write longer, to see whether LLMs still win (a length-controlled re-analysis is sketched after this list).
- Others note that longer outputs also reduce hallucinations, and that humans underestimate how much sheer length biases perceived rigor.
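Commenters asking for a length-controlled rerun are essentially asking for the analysis below: a logistic regression of persuasion success on persuader type while controlling for word count. This is a toy sketch on simulated, made-up data, not a reproduction of the paper’s analysis; the variable names and effect sizes are assumptions.

```python
# Hypothetical re-analysis sketch: does "LLM vs. human persuader" still
# predict success once message length is controlled for?
# All numbers below are made up; they are not data from the paper.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

is_llm = rng.integers(0, 2, size=n)                    # 1 = LLM persuader, 0 = human
word_count = 80 + 90 * is_llm + rng.normal(0, 20, n)   # LLM messages roughly 2x longer
# Simulate outcomes where length, not persuader type, drives persuasion.
true_logit = -4 + 0.03 * word_count
persuaded = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

X = sm.add_constant(np.column_stack([is_llm, word_count]))
fit = sm.Logit(persuaded, X).fit(disp=0)
print(fit.summary(xname=["const", "is_llm", "word_count"]))
# If the is_llm coefficient is near zero once word_count is in the model,
# sheer length, not the persuader's identity, explains most of the gap.
```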
Social, Political, and Commercial Implications
- Many are worried about mass persuasion: targeted political messaging, subtle advertising, and manipulation on social platforms.
- Fears include young users over-trusting “magic oracles” and the prospect of chatbots that quietly embed product pushes into otherwise helpful advice.
- Some propose personal “loyal” models that critique incoming persuasive content, leading to the image of LLMs arguing with other LLMs on our behalf (a minimal version is sketched after this list).
- Commenters expect political campaigns and advertisers to adopt such systems aggressively; some joke that salespeople, not programmers, should be most worried about replacement.
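One way to picture the “loyal model” idea is a thin wrapper that asks a model you control to audit a message before you act on it. The sketch below is purely illustrative: `call_loyal_model` and the prompt wording are hypothetical placeholders, not a real product or API, and the model call is stubbed so the example runs on its own.

```python
# Illustrative sketch of a "loyal critic" pipeline, not a real product or API.
# call_loyal_model() is a hypothetical placeholder for a model you control;
# it is stubbed out here so the example runs without any external service.

CRITIQUE_PROMPT = """You work only for me. Analyze the message below.
List: (1) factual claims I should verify, (2) persuasion techniques used
(flattery, urgency, gish gallop, vague appeals to authority), and
(3) what the sender likely wants me to do.

Message:
{message}
"""

def call_loyal_model(prompt: str) -> str:
    # Stub: replace with a call to whatever local or hosted model you trust.
    return "(model critique would appear here)"

def critique_incoming(message: str) -> str:
    """Return a critique of an incoming persuasive message."""
    return call_loyal_model(CRITIQUE_PROMPT.format(message=message))

if __name__ == "__main__":
    ad = "Experts agree this is the last day to lock in your discount!"
    print(critique_incoming(ad))
```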
Broader AI Trajectory and Labor
- One camp anticipates rapid upheaval: any value delivered via digital interfaces (especially knowledge work and persuasion-heavy roles) is vulnerable, with robotics following later.
- Another camp is skeptical: hallucinations and legal liability limit real deployment; current productivity gains feel closer to autocomplete than revolution.
- There’s debate over whether we’re heading toward a “weak singularity” (recursive improvement, end of scarcity) or just another overhyped tech wave.