People are just as bad as my LLMs
Reaction to the Article and Title
- Several commenters argue the title (“People are just as bad as my LLMs”) overstates the case and veers into misanthropy; a more accurate framing would be “people can be just as bad.”
- Many think the experiment is poorly chosen: using interleaved HN comments as a proxy for hiring potential is seen as arbitrary and weakly related to real-world performance.
- Some say the whole exercise is like comparing one Markov chain with another and doesn’t justify broad conclusions about people vs. LLMs.
- Others note this is really about one narrow bias shared by humans and LLMs, not general equivalence.
RLHF, Randomness, and Number Biases
- Discussion of research: RLHF and “aligning to human preferences” can induce mode collapse, e.g., “choose a random number” converging on 7 or 42.
- Long subthread explores why humans pick 7: mid-range avoidance of extremes, preference for primes, and (contested) cultural/religious “luck” associations that differ by region.
- People note humans are systematically bad at generating random sequences (avoid repetition, avoid endpoints, avoid small subranges).
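The mode collapse described above is easy to quantify once you have a batch of picks: tally the frequency of each number and compare it to the uniform baseline. The sampler below is a hypothetical stand-in for querying an LLM (or polling people); the heavy skew toward 7 is an assumed illustration, not real survey data.

```python
# Sketch: quantifying mode collapse in "pick a random number from 1 to 10".
import random
from collections import Counter

def biased_pick():
    # Stand-in distribution that over-weights 7, mimicking the collapse
    # discussed in the thread (assumption for illustration only).
    population = list(range(1, 11)) + [7] * 20
    return random.choice(population)

def pick_frequencies(sampler, n=10_000):
    """Sample n picks and return the relative frequency of each value."""
    counts = Counter(sampler() for _ in range(n))
    return {k: counts[k] / n for k in sorted(counts)}

random.seed(0)
freqs = pick_frequencies(biased_pick)
# Under a truly uniform picker every value sits near 0.10; a collapsed
# model concentrates most of the mass on one or two modes.
print(max(freqs, key=freqs.get))  # → 7
```

The same tally, run against real model outputs or human survey answers, makes the “avoid endpoints, prefer 7” pattern visible at a glance.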
Pairwise Ranking and Label-Order Bias
- The article’s “person 1 vs person 2” bias is recognized as a known effect in LLM pairwise ranking.
- Suggested mitigations:
  - Evaluate each pair in both orders and average the results.
  - Use sorting-based schemes (e.g., Quicksort/Heapsort) instead of exhaustive pairwise comparisons, trading bias against compute.
- Some argue symmetrization can hide that the model isn’t actually doing the intended evaluation, just responding to label position.
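The “both orders” mitigation can be sketched as follows. The judge here is a toy with a deliberate position bias standing in for an LLM judge; `biased_judge` and its 30% “pick person 1 regardless” rate are assumptions for illustration.

```python
# Sketch of symmetrized pairwise comparison to cancel label-order bias.
import random

def biased_judge(a, b):
    # Toy judge: 30% of the time it picks "person 1" regardless of content;
    # otherwise it prefers the larger score. An LLM call would replace this.
    if random.random() < 0.3:
        return True
    return a > b

def symmetric_compare(judge, a, b, trials=50):
    """Run the judge in both label orders and tally net preference for a."""
    score = 0
    for _ in range(trials):
        if judge(a, b):   # a presented as "person 1"
            score += 1
        if judge(b, a):   # b presented as "person 1"
            score -= 1
    return score  # > 0: a preferred; < 0: b preferred; near 0: no signal

random.seed(1)
print(symmetric_compare(biased_judge, 8, 3))  # positive despite the bias
print(symmetric_compare(biased_judge, 3, 8))  # negative, as expected
```

Note how this also exposes the objection in the last bullet: a judge that responds *only* to label position produces a score near zero under symmetrization, which cancels the bias but reveals there was no content signal to begin with.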
Do LLMs Emulate Humans or Just Text?
- One camp: LLMs are trained on human text, so they inherit human-like statistical biases (primacy/recency, cultural patterns).
- Another camp stresses they “parrot documents,” not “behave like humans”; confusion between sounding human and thinking human is linked to the ELIZA effect.
- Distinction is drawn between pretraining (mimic corpus) and RLHF (optimize for human raters’ preferences, attempting—but not fully succeeding—to suppress some social biases).
Intelligence, Reliability, and Correctness Standards
- Long debate on whether LLMs are “intelligent”:
  - Skeptics: they predict tokens, don’t form concepts, and lack intention, introspection, or world models; calling them intelligent is anthropomorphizing.
  - Defenders: functionally they design and debug novel code and solve complex tasks; if intelligence is “acquire and apply knowledge,” they meet at least part of that bar.
- Others argue the term “AI” in CS was never meant to imply human-like minds; it covers many subfields far removed from human cognition.
- Some point out LLMs often confidently state falsehoods (e.g., incorrect current dates) instead of admitting ignorance; this is contrasted with humans who can decline to answer or check a clock.
- Others note LLMs are trained to be agreeable and admit possible error, unlike many humans who resist acknowledging they’re wrong.
Accountability, Safety, and Economic Pressures
- Key distinction: humans are alive, have rights/responsibilities, and can be held accountable (fired, jailed); LLMs cannot, though companies deploying them can.
- Concern that accepting “AI interns” with human-like unreliability undermines the traditional expectation of computers as precise, deterministic tools.
- Counterpoint: in many domains we already manage fallible humans with process, redundancy, and fault tolerance; similar safety engineering could be applied around AI instead of assuming perfect accuracy.
- Some foresee a “race to the bottom” where cheaper but lower-quality AI replaces humans; others say that race is driven by incentives and is hard to avoid.
- Several insist that, despite individual biases, a consensus of humans still outperforms current LLMs on judgment-heavy tasks.
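The redundancy argument above can be sketched concretely: instead of assuming a fallible component (human or model) answers correctly, query it several times and take a majority vote. `flaky_classifier` and its 80% accuracy are hypothetical stand-ins for any unreliable judgment call.

```python
# Sketch: fault tolerance via redundancy around an unreliable component.
import random
from collections import Counter

def flaky_classifier(x):
    # Labels even/odd correctly 80% of the time, errs otherwise
    # (assumed error rate, for illustration only).
    truth = "even" if x % 2 == 0 else "odd"
    if random.random() < 0.8:
        return truth
    return "odd" if truth == "even" else "even"

def majority_vote(component, x, voters=9):
    """Query the unreliable component several times and return the mode."""
    votes = Counter(component(x) for _ in range(voters))
    return votes.most_common(1)[0][0]

random.seed(2)
print(majority_vote(flaky_classifier, 10))  # far more reliable than one call
```

With independent 80%-accurate calls, a 9-way vote is right roughly 98% of the time, which is the same engineering move (process, redundancy, tolerance) the thread suggests applying around AI rather than demanding perfect accuracy from a single call.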
Language Around Human Faults
- A dense subthread debates whether describing “people” as a group as bad is “racist,” “prejudicial,” or merely “misanthropic.”
- The core objection: treating “humanity” as a homogeneous class with fixed negative traits ignores variability and exceptions.