Famous cognitive psychology experiments that failed to replicate

Replication Rates and Famous Results

  • Commenters cite large-scale replication projects showing low replication rates across psychology subfields (e.g., social ~37%, cognitive ~42%).
  • Several note that “famous” and counterintuitive results are often the least robust, yet get the most citations and media attention.
  • There is interest in a corresponding list of “famous experiments that do replicate,” which seems harder to assemble.

Incentives, Publication, and Tracking Replications

  • Structural incentives favor novel, striking findings over careful replications.
  • Suggestions:
    • Require PhD students or publicly funded projects to include replication work.
    • Attach a persistent “stats card” to each paper, tracking replications, failures, and citations (see the sketch after this list).
  • Others push back that offloading replication onto grad students is unfair and does not fix career-pressure incentives.
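
As a rough sketch of the “stats card” idea, the record below shows what a registry might attach to each paper. It is a minimal illustration only; every field and name here is hypothetical rather than taken from any existing system:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PaperStatsCard:
        # Hypothetical per-paper record a replication registry could keep.
        doi: str
        citations: int = 0
        successful_replications: int = 0
        failed_replications: int = 0

        @property
        def attempts(self) -> int:
            return self.successful_replications + self.failed_replications

        @property
        def replication_rate(self) -> Optional[float]:
            # None until at least one replication has been attempted.
            if self.attempts == 0:
                return None
            return self.successful_replications / self.attempts

    card = PaperStatsCard(doi="10.0000/example", citations=1200, failed_replications=3)
    print(card.replication_rate)  # 0.0 -- widely cited, never successfully replicated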

How “Debunked” Are These Studies?

  • Multiple commenters argue the article overstates its conclusions; “failed replication” ≠ “false.”
  • Some replications are underpowered or differ in design from the originals; for effects like ego depletion or stereotype threat, meta-analyses and the careful wording of key replication papers leave room for small or context-dependent effects (see the power calculation after this list).
  • There’s concern the piece encourages simplistic “psychology is silly” takes and doesn’t communicate uncertainty well.
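
On the “underpowered” point, a quick power calculation makes the issue concrete. The effect size and sample sizes below are illustrative, not drawn from the studies discussed; the sketch uses the standard two-sample t-test power routines from statsmodels:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Chance of detecting a small true effect (Cohen's d = 0.2) with
    # 50 participants per group at alpha = 0.05 (two-sided test).
    power = analysis.power(effect_size=0.2, nobs1=50, alpha=0.05, ratio=1.0)
    print(f"power: {power:.2f}")  # ~0.17 -- most such replications come back null

    # Per-group sample size needed to reach 80% power for the same effect.
    n = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
    print(f"n per group: {n:.0f}")  # ~394

A replication run at the original study’s sample size can therefore fail most of the time even when a small effect is real, which is the sense in which “failed replication” ≠ “false.”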

IQ, Measurement, and Cultural Bias

  • IQ tests are proposed as an example of highly replicable cognitive measures; others counter:
    • They largely predict performance in test-like, culturally specific contexts.
    • Results vary with practice, schooling, and socio-economic status.
    • Cross-cultural and “culture-specific IQ” examples highlight strong cultural loading.
  • Debate extends to personality tests: the Big Five is seen as better supported than Myers–Briggs, but even it faces serious critiques.

Statistics, Methodology, and Cross-Discipline Problems

  • Several claim psychology has a “cookbook” stats culture, with widespread p‑hacking and weak experimental design (see the simulation after this list).
  • Others note that designing valid experiments on humans is intrinsically hard and that similar replication problems exist in biomedicine, economics, and machine learning.
  • Some advocate more Bayesian methods and better experimental design training.
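
To make the p‑hacking concern concrete, the simulation below shows one common form, optional stopping: an experimenter peeks at the data in batches and stops as soon as p < 0.05, even though there is no real effect. This is a generic illustration using NumPy and SciPy, not a reconstruction of any study mentioned in the thread:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def peeks_until_significant(batch=10, max_n=100):
        # Two groups drawn from the SAME distribution (true effect is zero);
        # test after every batch and stop at the first p < .05.
        a, b = [], []
        while len(a) < max_n:
            a.extend(rng.normal(0.0, 1.0, batch))
            b.extend(rng.normal(0.0, 1.0, batch))
            if stats.ttest_ind(a, b).pvalue < 0.05:
                return True
        return False

    trials = 2000
    hits = sum(peeks_until_significant() for _ in range(trials))
    # A single fixed-n test would be wrong ~5% of the time; repeated
    # peeking pushes the false-positive rate well above the nominal level.
    print(f"false-positive rate: {hits / trials:.1%}")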

Social Impact and Trust in Science

  • Discussion about how much harm bad social science has caused:
    • Some point to limited direct policy impact; others cite findings such as stereotype threat that have been used to justify real policies.
    • A major concern is erosion of public trust in “science,” feeding vaccine and COVID skepticism.
  • Commenters distinguish between science as a method (which demands skepticism) and “trust the science” as dogma.

Field Boundaries, Theory, and Reform

  • Multiple people note that most of the article’s examples come from social or developmental psychology, not cognitive psychology per se.
  • One argument: psychology suffers from a lack of strong, falsifiable core theories, so surprising findings can’t be screened against theory before publication.
  • Others say psychology is among the fields most actively confronting the replication crisis, with tightening standards over the last decade.

Other Notable Threads

  • Stanford Prison Experiment and related ethical scandals (e.g., APA and interrogation/torture) reinforce mistrust.
  • Hormone- and neurotransmitter-heavy language (cortisol, dopamine) is flagged as a strong heuristic for spotting pseudoscientific self-help.
  • Some commenters still find personal value in “debunked” ideas (e.g., power poses, marshmallow test, growth mindset) as metaphors or habits, independent of the original experimental claims.