Results Are In From Psychology's Largest Reproducibility Test: 39/100 Reproduced
An anonymous reader writes: A crowd-sourced effort to replicate 100 psychology studies has successfully reproduced findings from 39 of them. Some psychologists say this shows the field has a replicability problem. Others say the results are "not bad at all". The results are nuanced: 24 non-replications had findings at least "moderately similar" to the original paper but which didn't quite reach statistical significance. From the article: "The results should convince everyone that psychology has a replicability problem, says Hal Pashler, a cognitive psychologist at the University of California, San Diego, and an author of one of the papers whose findings were successfully repeated. 'A lot of working scientists assume that if it’s published, it’s right,' he says. 'This makes it hard to dismiss that there are still a lot of false positives in the literature.'"
You need to put this in perspective. Sure, psychology is a wishy-washy field filled with pseudoscience. But apparently its studies are about as reproducible as those in a bunch of hard-science fields. If there is anything that reproducibility studies have taught us, it is that if there is around a 50% chance your result is correct, then you are around the norm in a great many fields. This 39% would put psychology about on par with what I remember from medical/cancer reproducibility studies.
Troll is not a replacement for I disagree.
Psychology, sociology and other social sciences have always been given special treatment precisely because it's difficult in some cases to get two independent groups together to rerun an experiment in the first place. And if you try to reproduce an experiment done in the 1950s today and get different results, is that due to poor scientific method in the original experiment, or because the evidence gathered was misinterpreted, or because society has changed, which means the results have changed?
Assuming there is some actual effect being investigated, one reproduction will not get you to a 'good' level of certainty about that effect. To hit '95%', you're likely going to need more than ten reproductions.
One study != one sample. Each study should have enough cases to make it statistically significant. The problem is related to issues with the sample population or systematic flaws in what you're measuring. To bring it into the realm of physics, if we do a high school gravity experiment and ignore air resistance, we can run as many tests as we like, check for measurement uncertainty in our clocks and whatnot, and put up some confidence intervals that are still horribly wrong. It's very hard to isolate and experiment with one tiny aspect of the human psyche, and most of the problem is that the result is nothing but either a statistical fluke or a quirk of the people tested that doesn't generalize to the wider population.
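For anyone who wants to see the gravity example in numbers, here is a rough Python sketch. Everything in it is an assumption made up for illustration: the 2 m drop height, the quadratic drag coefficient (roughly what a very light ball might see), the 10 ms timing noise, and the function names. It simulates drop times that include air resistance, then estimates g with the no-drag formula g = 2h/t^2 and puts a normal-approximation 95% confidence interval around the mean.

```python
import math
import random
import statistics

G_TRUE = 9.81    # m/s^2, the value the experiment is trying to recover
HEIGHT = 2.0     # m, assumed drop height
DRAG_K = 0.15    # 1/m, hypothetical quadratic drag coefficient per unit mass
TIMER_SD = 0.01  # s, assumed random timing error per trial


def fall_time_with_drag(height, g=G_TRUE, k=DRAG_K):
    """Time to fall `height` from rest with drag deceleration k * v**2.

    Uses the closed form y(t) = ln(cosh(g * t / v_t)) / k, where
    v_t = sqrt(g / k) is the terminal velocity, solved for t.
    """
    v_t = math.sqrt(g / k)
    return (v_t / g) * math.acosh(math.exp(k * height))


def estimate_g(n_trials, seed=0):
    """Estimate g from noisy drop times while wrongly ignoring drag.

    Returns (mean, (ci_low, ci_high)) with a normal-approximation 95% CI.
    """
    rng = random.Random(seed)
    t_true = fall_time_with_drag(HEIGHT)
    estimates = []
    for _ in range(n_trials):
        t_measured = t_true + rng.gauss(0.0, TIMER_SD)
        # The analysis model assumes h = g * t**2 / 2, i.e. no air resistance.
        estimates.append(2.0 * HEIGHT / t_measured ** 2)
    mean = statistics.fmean(estimates)
    sem = statistics.stdev(estimates) / math.sqrt(n_trials)
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        mean, (lo, hi) = estimate_g(n)
        print(f"n={n:5d}  g_hat={mean:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Under these made-up numbers the interval gets narrower as the trial count goes up, but it stays centered well below 9.81, because the random timing error averages out and the ignored drag does not. More repetitions of the same flawed setup only make you more confident in the wrong answer.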