Famous cognitive psychology experiments that failed to replicate
Replication Rates and Famous Results
- Commenters cite large replication projects showing low rates across psychology subfields (social ~37%, cognitive ~42%, etc.).
- Several note that “famous” and counterintuitive results are often the least robust, yet get the most citations and media attention.
- There is interest in a corresponding list of “famous experiments that do replicate,” which seems harder to assemble.
Incentives, Publication, and Tracking Replications
- Structural incentives favor novel, striking findings over careful replications.
- Suggestions:
  - Require PhD students or publicly funded projects to include replication work.
  - Attach a persistent “stats card” to each paper, tracking replications, failures, and citations.
- Others push back that offloading replication onto grad students is unfair and does not fix career-pressure incentives.
How “Debunked” Are These Studies?
- Multiple commenters argue the article overstates its conclusions; “failed replication” ≠ “false.”
- Some replications are underpowered or differ in design from the original studies. For effects like ego depletion or stereotype threat, meta-analyses and the careful wording of key replication papers leave room for small or context-dependent effects.
- There’s concern the piece encourages simplistic “psychology is silly” takes and doesn’t communicate uncertainty well.
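The “underpowered replication” point above can be made concrete with a standard power calculation. The sketch below is an illustration I'm adding (not from the thread): it uses the usual normal approximation for a two-sided, two-sample test to show that a small true effect (Cohen's d ≈ 0.2) with 30 subjects per group is detected only ~12% of the time, so a non-significant replication says little on its own.

```python
import math
from statistics import NormalDist

def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test for a true
    standardized effect d (Cohen's d), via the normal approximation."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)          # critical value, e.g. 1.96
    z_effect = d * math.sqrt(n_per_group / 2)    # expected test statistic
    return nd.cdf(z_effect - z_alpha)

# A small effect with a typical-size replication sample:
print(power_two_sample(0.2, 30))    # roughly 0.12 -- failure is the likely outcome
# The textbook "80% power" case, for comparison:
print(power_two_sample(0.5, 64))    # roughly 0.80
```

In other words, with these (assumed, illustrative) numbers a faithful replication of a real-but-small effect would be expected to “fail” almost nine times out of ten.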
IQ, Measurement, and Cultural Bias
- IQ tests are proposed as an example of highly replicable cognitive measures; others counter:
  - They largely predict performance in test-like, culturally specific contexts.
  - Results vary with practice, schooling, and socio-economic status.
  - Cross-cultural and “culture-specific IQ” examples highlight strong cultural loading.
- Debate extends to personality tests: Big Five seen as better than Myers–Briggs, but even it faces serious critiques.
Statistics, Methodology, and Cross-Discipline Problems
- Several claim psychology has a “cookbook” stats culture, with widespread p‑hacking and weak experimental design.
- Others note that designing valid experiments on humans is intrinsically hard and that similar replication issues exist in biomedicine, economics, ML, and medical research.
- Some advocate more Bayesian methods and better experimental design training.
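The p-hacking concern above has a simple quantitative core: testing several outcome measures and reporting whichever one “works” inflates the false-positive rate well beyond the nominal 5%. This Monte Carlo sketch is my own illustration of that mechanism (the function name and parameters are not from the thread): it simulates studies where every null hypothesis is true, runs k independent tests per study, and counts how often at least one comes out significant.

```python
import random

def familywise_false_positive_rate(k_tests: int,
                                   n_sims: int = 20_000,
                                   seed: int = 0) -> float:
    """Estimate how often a study with k independent null tests yields
    at least one 'significant' result at p < .05 (|z| > 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Under the null, each test statistic is standard normal.
        if any(abs(rng.gauss(0, 1)) > 1.96 for _ in range(k_tests)):
            hits += 1
    return hits / n_sims

print(familywise_false_positive_rate(1))   # close to 0.05, as advertised
print(familywise_false_positive_rate(5))   # close to 1 - 0.95**5, about 0.23
```

With five candidate outcomes and no correction, roughly one “study” in four finds something, even when nothing is there, which is the structural engine behind the incentive problems discussed above.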
Social Impact and Trust in Science
- Discussion about how much harm bad social science has caused:
  - Some point to limited direct policy impact; others cite examples like stereotype threat and other findings used to justify policies.
- A major concern is erosion of public trust in “science,” feeding vaccine and COVID skepticism.
- Commenters distinguish between science as a method (which demands skepticism) and “trust the science” as dogma.
Field Boundaries, Theory, and Reform
- Multiple people note most examples are really social/developmental psychology, not “cognitive” per se.
- One argument: psychology suffers from a lack of strong, falsifiable core theories, so surprising findings can’t be screened against theory before publication.
- Others say psychology is among the fields most actively confronting the replication crisis, with tightening standards over the last decade.
Other Notable Threads
- Stanford Prison Experiment and related ethical scandals (e.g., APA and interrogation/torture) reinforce mistrust.
- Hormone- and neurotransmitter-heavy language (cortisol, dopamine) is flagged as a strong heuristic for pseudoscientific self-help.
- Some commenters still find personal value in “debunked” ideas (e.g., power poses, marshmallow test, growth mindset) as metaphors or habits, independent of the original experimental claims.