People are just as bad as my LLMs

Reaction to the Article and Title

  • Several commenters argue the title (“People are just as bad as my LLMs”) overstates the case and veers into misanthropy; a more accurate framing would be “people can be just as bad.”
  • Many think the experiment is poorly chosen: using interleaved HN comments as a proxy for hiring potential is seen as arbitrary and weakly related to real-world performance.
  • Some say the whole exercise is like comparing one Markov chain with another and doesn’t justify broad conclusions about people vs. LLMs.
  • Others note this is really about one narrow bias shared by humans and LLMs, not general equivalence.

RLHF, Randomness, and Number Biases

  • Commenters cite research showing that RLHF and “aligning to human preferences” can induce mode collapse, e.g., responses to “choose a random number” converging on 7 or 42.
  • Long subthread explores why humans pick 7: mid-range avoidance of extremes, preference for primes, and (contested) cultural/religious “luck” associations that differ by region.
  • People note humans are systematically bad at generating random sequences (avoid repetition, avoid endpoints, avoid small subranges).
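The non-uniformity described above is easy to quantify. A minimal sketch, using a hypothetical sample that mimics the “pick 7” skew (the data and the threshold-free gap metric are illustrative assumptions, not from the thread):

```python
from collections import Counter

def uniformity_gap(picks, lo=1, hi=10):
    """Largest absolute deviation (in counts) of observed pick
    frequencies from the count expected under a uniform draw.
    A large gap suggests systematic bias toward some values."""
    counts = Counter(picks)
    expected = len(picks) / (hi - lo + 1)
    return max(abs(counts.get(v, 0) - expected) for v in range(lo, hi + 1))

# Hypothetical sample illustrating the heavy mass on 7.
human_picks = [7, 7, 3, 7, 4, 7, 8, 7, 3, 7]
flat_picks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(uniformity_gap(human_picks))  # large gap: 7 is over-picked
print(uniformity_gap(flat_picks))   # zero gap: perfectly flat
```

A proper statistical treatment would use a chi-square goodness-of-fit test over a much larger sample; the gap metric here just makes the skew visible.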

Pairwise Ranking and Label-Order Bias

  • The article’s “person 1 vs person 2” bias is recognized as a known effect in LLM pairwise ranking.
  • Suggested mitigations:
    • Evaluate each pair in both orders and average.
    • Use sorting-based schemes (e.g., Quicksort or Heapsort driven by LLM comparisons) instead of exhaustive pairwise comparisons, trading bias against compute.
  • Some argue symmetrization can hide that the model isn’t actually doing the intended evaluation, just responding to label position.
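The first mitigation, evaluating each pair in both orders, can be sketched as follows. The `judge` callable stands in for an LLM call (its interface, returning 1 for the first-listed item and 2 for the second, is an assumption for illustration); the sketch also surfaces the caveat from the last bullet by reporting a tie whenever the verdict flips with label order:

```python
def debiased_compare(a, b, judge):
    """Query the judge with both label orders and combine the answers.

    judge(x, y) -> 1 if it prefers the first-listed item, 2 otherwise
    (hypothetical interface standing in for an LLM prompt).
    Returns "a", "b", or "tie"; a tie means the answer depended on
    label position, i.e., the judge may only be tracking "person 1".
    """
    first = judge(a, b)   # a presented as "person 1"
    second = judge(b, a)  # b presented as "person 1"
    if first == 1 and second == 2:
        return "a"        # preferred in both orders
    if first == 2 and second == 1:
        return "b"        # preferred in both orders
    return "tie"          # order-dependent: likely position bias

# A pathological judge that always prefers whichever item is listed
# first is correctly exposed as position-biased:
always_first = lambda x, y: 1
print(debiased_compare("alice", "bob", always_first))  # tie
```

This doubles the number of model calls per pair, which is the bias-vs-compute trade-off the sorting-based schemes try to soften by needing only O(n log n) comparisons overall.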

Do LLMs Emulate Humans or Just Text?

  • One camp: LLMs are trained on human text, so they inherit human-like statistical biases (primacy/recency, cultural patterns).
  • Another camp stresses they “parrot documents,” not “behave like humans”; confusion between sounding human and thinking human is linked to the ELIZA effect.
  • Distinction is drawn between pretraining (mimic corpus) and RLHF (optimize for human raters’ preferences, attempting—but not fully succeeding—to suppress some social biases).

Intelligence, Reliability, and Correctness Standards

  • Long debate on whether LLMs are “intelligent”:
    • Skeptics: they predict tokens, don’t form concepts, lack intention, introspection, or world models; calling them intelligent is anthropomorphizing.
    • Defenders: functionally they design/debug novel code and solve complex tasks; if intelligence is “acquire and apply knowledge,” they meet at least part of that bar.
    • Others argue the term “AI” in CS was never meant to imply human-like minds; it covers many subfields far removed from human cognition.
  • Some point out LLMs often confidently state falsehoods (e.g., incorrect current dates) instead of admitting ignorance; this is contrasted with humans who can decline to answer or check a clock.
  • Others note LLMs are trained to be agreeable and admit possible error, unlike many humans who resist acknowledging they’re wrong.

Accountability, Safety, and Economic Pressures

  • Key distinction: humans are alive, have rights/responsibilities, and can be held accountable (fired, jailed); LLMs cannot, though companies deploying them can.
  • Concern that accepting “AI interns” with human-like unreliability undermines the traditional expectation of computers as precise, deterministic tools.
  • Counterpoint: in many domains we already manage fallible humans with process, redundancy, and fault tolerance; similar safety engineering could be applied around AI instead of assuming perfect accuracy.
  • Some foresee a “race to the bottom” where cheaper but lower-quality AI replaces humans; others say that race is driven by incentives and is hard to avoid.
  • Several insist that, despite individual biases, a consensus of humans still outperforms current LLMs on judgment-heavy tasks.

Language Around Human Faults

  • A dense subthread debates whether describing “people” as a group as bad is “racist,” “prejudicial,” or merely “misanthropic.”
  • The core objection: treating “humanity” as a homogeneous class with fixed negative traits ignores variability and exceptions.