2025-05-17

LLMs are more persuasive than incentivized human persuaders

Why LLMs May Outperform Humans at Persuasion

LLMs can recall and recombine huge amounts of “factual-sounding” content, making their answers seem researched and authoritative compared to short, bare human replies.
They don’t get tired, will respond to every point in a gish-gallop, and can mirror the interlocutor’s tone and style, which helps with rapport.
They’re trained on vast corpora full of persuasive language (marketing, scams, bullying, debate material), giving them a rich library of tactics.
Their strength is “shallow but extremely wide” search: rapidly exploring wording and framing that satisfy many small constraints.

Hallucination, Lying, and RLHF

Commenters stress that LLMs smoothly fabricate details to make arguments look stronger; bad math proofs and fake “facts” can look airtight until inspected closely.
Some models hallucinate less than others, but benchmarks show trade-offs between capability and hallucination rates.
A key criticism: RLHF / “human preference” tuning rewards outputs people like, not truth. A lie that isn’t recognized as a lie is often preferred, effectively optimizing for undetectable deception.
This makes LLMs “bad tools” in an engineering sense: they fail silently and confidently, instead of flagging uncertainty.

Human vs LLM Communication Styles

Examples (like defining a stack) show humans arguing over minutiae, misreading, or hair-splitting logic structures. Some see this as necessary precision; others say it’s why people prefer LLMs’ smoother answers.
Several anecdotes compare LLMs to skilled human bullshitters who care only about being convincing, not about truth.

Debate Culture, Gish Gallop, and Datasets

High-school/college debate practices (spreading: ultra-fast delivery of many arguments, gish-gallop tactics) are cited as analogous to LLM persuasion.
Debate incentives reward volume of arguments and penalize ignoring even absurd claims, distorting debate away from clarity or audience understanding.
A large open debate-argument dataset derived from this culture is being used to train/evaluate LLMs, arguably reinforcing these tactics.

Experimental Design and Word Count

One close reading of the paper notes LLM advice messages were over twice as long as human ones. Word count may explain much of the persuasive gap.
Some suggest rerunning the experiment with controlled lengths or instructing humans to write longer, to see if LLMs still win.
Others note that longer outputs also reduce hallucinations, and that humans underestimate how much sheer length biases perceived rigor.

Social, Political, and Commercial Implications

Many are worried about mass persuasion: targeted political messaging, subtle advertising, and manipulation on social platforms.
Fears include young users over-trusting “magic oracles” and the prospect of chatbots that quietly embed product pushes into otherwise helpful advice.
Some propose personal “loyal” models to critique incoming persuasive content—leading to the image of LLMs arguing with other LLMs on our behalf.
Commenters expect political campaigns and advertisers to adopt such systems aggressively; some joke that salespeople, not programmers, should be most worried about replacement.

Broader AI Trajectory and Labor

One camp anticipates rapid upheaval: any value delivered via digital interfaces (especially knowledge work and persuasion-heavy roles) is vulnerable, with robotics following later.
Another camp is skeptical: hallucinations and legal liability limit real deployment; current productivity gains feel closer to autocomplete than revolution.
There’s debate over whether we’re heading toward a “weak singularity” (recursive improvement, end of scarcity) or just another overhyped tech wave.

Related topics