Surprising gender biases in GPT

Paper design and quality

  • Several commenters criticize the prompts’ spelling/grammar, initially assuming sloppiness.
  • Others note that the paper explicitly aimed to mimic elementary-school writing, typical errors included, and that the authors are non-native English speakers, which some see as reasonable.
  • A few argue that “playing with ChatGPT” is too thin to be a serious paper; they expect broader cross-linguistic, cross-cultural analysis of gender in language models.

Source of GPT’s gender bias

  • One camp sees bias as a direct artifact of training data: scrape the internet, get its sexism and norms, then models echo them.
  • Another camp argues RLHF and alignment are the main cause, deliberately pushing models toward “pro-female/anti-male” or “woke” norms, citing examples like image models refusing to show certain demographics.
  • Some note that alignment seems to work only in obvious, explicit cases; subtle biases leak through.

Using GPT to infer real-world attitudes

  • Some say the paper is only about GPT-4’s behavior, not society.
  • Others point out the authors interpret biases as reflecting human text corpora and thus underlying social attitudes.
  • A few find this move questionable: GPT is a tuned commercial product, not a neutral survey instrument.

Societal gender norms and feminism (large tangent)

  • Many tie GPT’s asymmetries to real-world patterns: society warmly supports women entering “masculine” roles but is less accepting of men in caregiving or “feminine” roles.
  • Multiple comments discuss higher female college enrollment, targeted programs for women, and perceived neglect of boys/men.
  • Debates flare over feminism’s goals (equality vs. “overcorrection”), equal opportunity vs. equal outcomes, and whether modern policies mainly serve economic growth (by expanding the workforce) rather than families.
  • Examples raised include parental leave asymmetries, domestic labor imbalance, and lingering patriarchal norms vs emerging “anti-male” sentiment.

Language, bias, and technical notes

  • Some explore how pronoun choice (“he”/“she”/“they”) interacts with stereotypes and information efficiency.
  • Others discuss whether LLMs can be used as instruments to measure societal bias, or whether alignment and corporate filters distort that signal.
  • A few suggest directly inspecting base models’ token probabilities (e.g., for “he” vs “she”) as a cleaner way to study bias.
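
As a rough illustration of the token-probability idea in the last bullet, here is a minimal sketch of comparing the probabilities of “he” and “she” as the next token. It assumes you have already obtained next-token logits from a base model for a prompt such as “The nurse said that”. The logit values, the token strings, and the function names below are all hypothetical, invented for illustration; they do not come from the paper or from any real model.

```python
import math

def softmax(logits):
    """Turn a {token: logit} mapping into a {token: probability} mapping."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - top) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def pronoun_log_odds(logits, a=" he", b=" she"):
    """Log-odds of token `a` over token `b`; positive means `a` is favored."""
    probs = softmax(logits)
    return math.log(probs[a] / probs[b])

# Hypothetical logits for the token following a prompt; NOT real model output.
toy_logits = {" he": 2.0, " she": 1.0, " they": 0.5}

probs = softmax(toy_logits)
log_odds = pronoun_log_odds(toy_logits)
print({tok: round(p, 3) for tok, p in probs.items()})
print(round(log_odds, 3))
```

The log-odds reduce to the difference between the two logits, so the measure does not depend on how much probability the rest of the vocabulary receives. Repeating this over many prompts that differ only in an occupation or role word would give a per-prompt bias score, under the assumption that the model exposes raw logits.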