People are just as bad as my LLMs
Reaction to the Article and Title
- Several commenters argue the title (“People are just as bad as my LLMs”) overstates the case and veers into misanthropy; a more accurate framing would be “people can be just as bad.”
- Many think the experiment is poorly chosen: using interleaved HN comments as a proxy for hiring potential is seen as arbitrary and weakly related to real-world performance.
- Some say the whole exercise is like comparing one Markov chain with another and doesn’t justify broad conclusions about people vs. LLMs.
- Others note this is really about one narrow bias shared by humans and LLMs, not general equivalence.
RLHF, Randomness, and Number Biases
- Discussion of research: RLHF and “aligning to human preferences” can induce mode collapse, e.g., “choose a random number” converging on 7 or 42.
- Long subthread explores why humans pick 7: mid-range avoidance of extremes, preference for primes, and (contested) cultural/religious “luck” associations that differ by region.
- People note humans are systematically bad at generating random sequences (avoid repetition, avoid endpoints, avoid small subranges).
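The mode collapse described above is easy to quantify once you have a batch of picks: tally the frequency of each number and compare it to the uniform baseline. The sampler below is a hypothetical stand-in for querying an LLM (or polling people); the heavy skew toward 7 is an assumed illustration, not real survey data.

```python
# Sketch: quantifying mode collapse in "pick a random number from 1 to 10".
import random
from collections import Counter

def biased_pick():
    # Stand-in distribution that over-weights 7, mimicking the collapse
    # discussed in the thread (assumption for illustration only).
    population = list(range(1, 11)) + [7] * 20
    return random.choice(population)

def pick_frequencies(sampler, n=10_000):
    """Sample n picks and return the relative frequency of each value."""
    counts = Counter(sampler() for _ in range(n))
    return {k: counts[k] / n for k in sorted(counts)}

random.seed(0)
freqs = pick_frequencies(biased_pick)
# Under a truly uniform picker every value sits near 0.10; a collapsed
# model concentrates most of the mass on one or two modes.
print(max(freqs, key=freqs.get))  # → 7
```

The same tally, run against real model outputs or human survey answers, makes the “avoid endpoints, prefer 7” pattern visible at a glance.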
Pairwise Ranking and Label-Order Bias
- The article’s “person 1 vs person 2” bias is recognized as a known effect in LLM pairwise ranking.
- Suggested mitigations:
  - Evaluate each pair in both orders and average the results.
  - Use sorting-based schemes (e.g., Quicksort/Heapsort) instead of exhaustive pairwise comparisons, trading bias against compute.
- Some argue symmetrization can hide that the model isn’t actually doing the intended evaluation, just responding to label position.
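The “both orders” mitigation can be sketched as follows. The judge here is a toy with a deliberate position bias standing in for an LLM judge; `biased_judge` and its 30% “pick person 1 regardless” rate are assumptions for illustration.

```python
# Sketch of symmetrized pairwise comparison to cancel label-order bias.
import random

def biased_judge(a, b):
    # Toy judge: 30% of the time it picks "person 1" regardless of content;
    # otherwise it prefers the larger score. An LLM call would replace this.
    if random.random() < 0.3:
        return True
    return a > b

def symmetric_compare(judge, a, b, trials=50):
    """Run the judge in both label orders and tally net preference for a."""
    score = 0
    for _ in range(trials):
        if judge(a, b):   # a presented as "person 1"
            score += 1
        if judge(b, a):   # b presented as "person 1"
            score -= 1
    return score  # > 0: a preferred; < 0: b preferred; near 0: no signal

random.seed(1)
print(symmetric_compare(biased_judge, 8, 3))  # positive despite the bias
print(symmetric_compare(biased_judge, 3, 8))  # negative, as expected
```

Note how this also exposes the objection in the last bullet: a judge that responds *only* to label position produces a score near zero under symmetrization, which cancels the bias but reveals there was no content signal to begin with.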
Do LLMs Emulate Humans or Just Text?
- One camp: LLMs are trained on human text, so they inherit human-like statistical biases (primacy/recency, cultural patterns).
- Another camp stresses they “parrot documents,” not “behave like humans”; confusion between sounding human and thinking human is linked to the ELIZA effect.
- Distinction is drawn between pretraining (mimic corpus) and RLHF (optimize for human raters’ preferences, attempting—but not fully succeeding—to suppress some social biases).
Intelligence, Reliability, and Correctness Standards
- Long debate on whether LLMs are “intelligent”:
  - Skeptics: they predict tokens, don’t form concepts, and lack intention, introspection, or world models; calling them intelligent is anthropomorphizing.
  - Defenders: functionally they design and debug novel code and solve complex tasks; if intelligence is “acquire and apply knowledge,” they meet at least part of that bar.
- Others argue the term “AI” in CS was never meant to imply human-like minds; it covers many subfields far removed from human cognition.
- Some point out LLMs often confidently state falsehoods (e.g., incorrect current dates) instead of admitting ignorance; this is contrasted with humans who can decline to answer or check a clock.
- Others note LLMs are trained to be agreeable and admit possible error, unlike many humans who resist acknowledging they’re wrong.
Accountability, Safety, and Economic Pressures
- Key distinction: humans are alive, have rights/responsibilities, and can be held accountable (fired, jailed); LLMs cannot, though companies deploying them can.
- Concern that accepting “AI interns” with human-like unreliability undermines the traditional expectation of computers as precise, deterministic tools.
- Counterpoint: in many domains we already manage fallible humans with process, redundancy, and fault tolerance; similar safety engineering could be applied around AI instead of assuming perfect accuracy.
- Some foresee a “race to the bottom” where cheaper but lower-quality AI replaces humans; others say that race is driven by incentives and is hard to avoid.
- Several insist that, despite individual biases, a consensus of humans still outperforms current LLMs on judgment-heavy tasks.
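The redundancy argument above can be sketched concretely: instead of assuming a fallible component (human or model) answers correctly, query it several times and take a majority vote. `flaky_classifier` and its 80% accuracy are hypothetical stand-ins for any unreliable judgment call.

```python
# Sketch: fault tolerance via redundancy around an unreliable component.
import random
from collections import Counter

def flaky_classifier(x):
    # Labels even/odd correctly 80% of the time, errs otherwise
    # (assumed error rate, for illustration only).
    truth = "even" if x % 2 == 0 else "odd"
    if random.random() < 0.8:
        return truth
    return "odd" if truth == "even" else "even"

def majority_vote(component, x, voters=9):
    """Query the unreliable component several times and return the mode."""
    votes = Counter(component(x) for _ in range(voters))
    return votes.most_common(1)[0][0]

random.seed(2)
print(majority_vote(flaky_classifier, 10))  # far more reliable than one call
```

With independent 80%-accurate calls, a 9-way vote is right roughly 98% of the time, which is the same engineering move (process, redundancy, tolerance) the thread suggests applying around AI rather than demanding perfect accuracy from a single call.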
Language Around Human Faults
- A dense subthread debates whether describing “people” as a group as bad is “racist,” “prejudicial,” or merely “misanthropic.”
- The core objection: treating “humanity” as a homogeneous class with fixed negative traits ignores variability and exceptions.