Should We Respect LLMs? A Study on the Influence of Prompt Politeness on Performance
Effect of Politeness on LLM Performance
- Several commenters highlight the paper’s core claim: prompt politeness measurably affects LLM performance, with impolite prompts often yielding worse answers, more refusals, or stronger bias.
- Extremely respectful language doesn’t always help either; “moderate” politeness tends to work best, with the optimum varying by language and model.
- A common hypothesis: because models are trained on human text, polite prompts may steer them toward training examples where people gave more careful, higher‑quality answers.
- Some suggest this could be automated: a system could rewrite user prompts into optimally polite form before sending them to the model.
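A minimal sketch of that automation idea, assuming a generic `complete(prompt) -> str` call standing in for whatever model API is actually used (the names and wording below are hypothetical, not from the paper or the thread):

```python
from typing import Callable

# Instruction used to normalize the user's wording before the real request is sent.
REWRITE_INSTRUCTION = (
    "Rewrite the following request so it is clearly and politely worded, "
    "without changing its meaning or adding new requirements. "
    "Return only the rewritten request.\n\nRequest: "
)

def ask_politely(prompt: str, complete: Callable[[str], str]) -> str:
    """Rewrite `prompt` into a moderately polite form, then answer the rewritten prompt."""
    rewritten = complete(REWRITE_INSTRUCTION + prompt).strip()
    return complete(rewritten)
```

Whether the extra round trip pays off would depend on model and task, given the paper’s finding that moderate (not maximal) politeness works best.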
Anthropomorphism vs. “Just a Tool”
- One side strongly rejects anthropomorphizing LLMs: they are “word calculators,” not sentient beings, and don’t merit respect in a moral sense.
- Others counter that anthropomorphism is unavoidable and partly the point: the entire interface is human language, and models actively present as human‑like.
- There’s debate over whether treating LLMs like people is needless skeuomorphism or a genuinely useful UI choice.
User Psychology, Manners, and Habits
- Many say they remain polite (“please,” “thank you”) not for the model’s sake but to maintain their own habits of courtesy.
- Concern: getting used to barking orders at LLMs might bleed into how people treat baristas, colleagues, or smart speakers with human voices.
- Others argue humans can context‑switch just fine (terminal vs. email vs. chat) and that rudeness toward a machine need not generalize.
- Some frame politeness as self‑discipline or “practicing good manners in private to be well mannered in public.”
Ethics, Rights, and Social Risks
- A minority worry that over‑politeness contributes to a cultural push to grant AI “human‑like” standing or rights, despite no evidence of consciousness.
- Others note that if AI ever does become conscious, rights claims will be inevitable, just as views about animals’ moral standing evolved.
- A few jokingly invoke “future AI judging us” or Roko’s basilisk–style scenarios, while others critique this as Pascal’s‑wager‑type thinking.
Prompt Style, Structure, and Tactics
- Multiple commenters report that polite‑but‑firm, specific instructions often yield better, more focused code or text than either harsh abuse or vague brevity (see the illustration after this list).
- Some find that explicit positive feedback (“this part is good, now tweak X”) prevents unnecessary rewrites.
- Others say structure and role‑play (e.g., military hierarchy, emotional framing) matter more than raw politeness level for steering behavior.
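As a hypothetical illustration of the contrast commenters describe (the task details are invented), the two prompts below differ in tone, specificity, and explicit feedback:

```python
# Hypothetical prompts illustrating the contrast commenters describe.
vague_harsh = "This is wrong. Fix it."

polite_but_firm = (
    "Thanks, the header-parsing function is correct; please keep it unchanged. "
    "Only change the retry loop to use exponential backoff with a 30-second cap, "
    "and don't rewrite any other part of the file."
)
```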