Famous cognitive psychology experiments that failed to replicate

Replication Rates and Famous Results

  • Commenters cite large-scale replication projects showing low replication rates across psychology subfields (e.g., social ~37%, cognitive ~42%).
  • Several note that “famous” and counterintuitive results are often the least robust, yet get the most citations and media attention.
  • There is interest in a corresponding list of “famous experiments that do replicate,” which seems harder to assemble.

Incentives, Publication, and Tracking Replications

  • Structural incentives favor novel, striking findings over careful replications.
  • Suggestions:
    • Require PhD students or publicly funded projects to include replication work.
    • Attach a persistent “stats card” to each paper, tracking replications, failures, and citations (see the sketch after this list).
  • Others push back that offloading replication onto grad students is unfair and does not fix career-pressure incentives.
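
As a rough sketch of the “stats card” idea, the record below shows what a registry might attach to each paper. It is a minimal illustration only; every field and name here is hypothetical rather than taken from any existing system:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PaperStatsCard:
        # Hypothetical per-paper record a replication registry could keep.
        doi: str
        citations: int = 0
        successful_replications: int = 0
        failed_replications: int = 0

        @property
        def attempts(self) -> int:
            return self.successful_replications + self.failed_replications

        @property
        def replication_rate(self) -> Optional[float]:
            # None until at least one replication has been attempted.
            if self.attempts == 0:
                return None
            return self.successful_replications / self.attempts

    card = PaperStatsCard(doi="10.0000/example", citations=1200, failed_replications=3)
    print(card.replication_rate)  # 0.0 -- widely cited, never successfully replicated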

How “Debunked” Are These Studies?

  • Multiple commenters argue the article overstates its conclusions; “failed replication” ≠ “false.”
  • Some replications are underpowered or differ in design from the originals; for effects like ego depletion or stereotype threat, meta-analyses and the careful wording of key replication papers leave room for small or context-dependent effects (see the power calculation after this list).
  • There’s concern the piece encourages simplistic “psychology is silly” takes and doesn’t communicate uncertainty well.
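
On the “underpowered” point, a quick power calculation makes the issue concrete. The effect size and sample sizes below are illustrative, not drawn from the studies discussed; the sketch uses the standard two-sample t-test power routines from statsmodels:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Chance of detecting a small true effect (Cohen's d = 0.2) with
    # 50 participants per group at alpha = 0.05 (two-sided test).
    power = analysis.power(effect_size=0.2, nobs1=50, alpha=0.05, ratio=1.0)
    print(f"power: {power:.2f}")  # ~0.17 -- most such replications come back null

    # Per-group sample size needed to reach 80% power for the same effect.
    n = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
    print(f"n per group: {n:.0f}")  # ~394

A replication run at the original study’s sample size can therefore fail most of the time even when a small effect is real, which is the sense in which “failed replication” ≠ “false.”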

IQ, Measurement, and Cultural Bias

  • IQ tests are proposed as an example of highly replicable cognitive measures; others counter:
    • They largely predict performance in test-like, culturally specific contexts.
    • Results vary with practice, schooling, and socio-economic status.
    • Cross-cultural and “culture-specific IQ” examples highlight strong cultural loading.
  • Debate extends to personality tests: the Big Five is seen as better supported than Myers–Briggs, but even it faces serious critiques.

Statistics, Methodology, and Cross-Discipline Problems

  • Several claim psychology has a “cookbook” stats culture, with widespread p‑hacking and weak experimental design (see the simulation after this list).
  • Others note that designing valid experiments on humans is intrinsically hard and that similar replication problems exist in biomedicine, economics, and machine learning.
  • Some advocate more Bayesian methods and better experimental design training.
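
To make the p‑hacking concern concrete, the simulation below shows one common form, optional stopping: an experimenter peeks at the data in batches and stops as soon as p < 0.05, even though there is no real effect. This is a generic illustration using NumPy and SciPy, not a reconstruction of any study mentioned in the thread:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def peeks_until_significant(batch=10, max_n=100):
        # Two groups drawn from the SAME distribution (true effect is zero);
        # test after every batch and stop at the first p < .05.
        a, b = [], []
        while len(a) < max_n:
            a.extend(rng.normal(0.0, 1.0, batch))
            b.extend(rng.normal(0.0, 1.0, batch))
            if stats.ttest_ind(a, b).pvalue < 0.05:
                return True
        return False

    trials = 2000
    hits = sum(peeks_until_significant() for _ in range(trials))
    # A single fixed-n test would be wrong ~5% of the time; repeated
    # peeking pushes the false-positive rate well above the nominal level.
    print(f"false-positive rate: {hits / trials:.1%}")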

Social Impact and Trust in Science

  • Discussion about how much harm bad social science has caused:
    • Some point to limited direct policy impact; others cite findings such as stereotype threat that have been used to justify real policies.
    • A major concern is erosion of public trust in “science,” feeding vaccine and COVID skepticism.
  • Commenters distinguish between science as a method (which demands skepticism) and “trust the science” as dogma.

Field Boundaries, Theory, and Reform

  • Multiple people note that most of the article’s examples come from social or developmental psychology, not cognitive psychology per se.
  • One argument: psychology suffers from a lack of strong, falsifiable core theories, so surprising findings can’t be screened against theory before publication.
  • Others say psychology is among the fields most actively confronting the replication crisis, with tightening standards over the last decade.

Other Notable Threads

  • Stanford Prison Experiment and related ethical scandals (e.g., APA and interrogation/torture) reinforce mistrust.
  • Hormone- and neurotransmitter-heavy language (cortisol, dopamine) is flagged as a strong heuristic for spotting pseudoscientific self-help.
  • Some commenters still find personal value in “debunked” ideas (e.g., power poses, marshmallow test, growth mindset) as metaphors or habits, independent of the original experimental claims.