Results Are In From Psychology's Largest Reproducibility Test: 39/100 Reproduced

Posted by samzenpus on Friday May 1, 2015 @12:07AM from the different-directions dept.

An anonymous reader writes: A crowd-sourced effort to replicate 100 psychology studies has successfully reproduced findings from 39 of them. Some psychologists say this shows the field has a replicability problem. Others say the results are "not bad at all". The results are nuanced: 24 non-replications had findings at least "moderately similar" to the original paper but did not quite reach statistical significance. From the article: "The results should convince everyone that psychology has a replicability problem, says Hal Pashler, a cognitive psychologist at the University of California, San Diego, and an author of one of the papers whose findings were successfully repeated. 'A lot of working scientists assume that if it’s published, it’s right,' he says. 'This makes it hard to dismiss that there are still a lot of false positives in the literature.'"

39/100 is the new passing grade. by geekmux · 2015-05-01 00:19 · Score: 4, Insightful

Is there a valid reason we accept studies that have not been reproduced at least one more time to truly vet them before the community?
Logistics, resources, patents, or a need to just plain bullshit people. I'm sure there's plenty of excuses as to why we don't, but doesn't sound like we have a whole lot of good reasons why not.
And those that are labeling a score of 39/100 "not bad at all" should have their head checked. Enjoy your legal fun from that ball of lies.
1. Re:39/100 is the new passing grade. by queazocotal · 2015-05-01 00:30 · Score: 4, Insightful
  
  Funding.
  Assuming for the moment that the reproducers were not particularly more skilled than the original scientists, you can't go from '60% not reproduced' to '60% wrong'.
  Assuming there is some actual effect being investigated, one reproduction will not get you to 'good' levels of surety about the effect. To hit '95%' - you're going to need likely over ten reproductions.
2. Re:39/100 is the new passing grade. by Anonymous Coward · 2015-05-01 00:35 · Score: 5, Insightful
  
  Please remember that this applies to Psychology, a field that is rife with lots of historical issues (and it is getting better).
  In the past, they've practiced thinly veiled religion (Freud, if you don't believe me it's because my unverifiable model explains it, and you are in denial)
  They've overstated their findings, to the tune of "and because of this experiment, we can extrapolate that EVERYONE is just the same!" (Stanford experiment, one of the ones with reproducibility problems too, coincidentally).
  They've put up roadblocks to proper scientific evidence gathering (and so this experiment was done before we adopted an ethical code that made verification of its outstanding results, that is reproducibility, possible, but we are going to believe the conclusions anyway).
  It was always called a "soft" Science because that way they could dodge the bullet coming from the real Sciences. However, when you read some of their works (especially their older works) you begin to see a pattern. They come from a history that probably poised them to have a long and hard road to understanding much of what we really do. In their defense they were attempting to build a model of the "mind" which is something that they assumed existed in a particular way, but didn't really test (it took a long time for Turing to come around). Finally, they are burdened with a lot of thinking that doesn't meet Philosophical rigor, because they shore it up with testing that doesn't meet Scientific rigor.
  I'm glad to see the new wave of Psychology coming through. They now base a lot of their findings on biochemical analysis and stronger testing (including better attention to controls and double blind testing, which to their credit, they invented). It's just disheartening that the field lacks respect in other ways because in every intro to Psychology class they keep pushing the sensational "Dogma" experiments as facts when in reality they often fail to reproduce the results.
3. Re:39/100 is the new passing grade. by lurking_giant · 2015-05-01 01:05 · Score: 3, Funny
  
  Should have had the teachers in Atlanta grade the results... http://www.nytimes.com/2015/05...
4. Re:39/100 is the new passing grade. by Richard_at_work · 2015-05-01 01:17 · Score: 3, Interesting
  
  Psychology, sociology and other social sciences have always been given special treatment precisely because it's difficult in some cases to get two independent groups together to rerun an experiment in the first place. And if you try to reproduce an experiment done in the 1950s today, are the differing results due to poor scientific method in the original experiment, or because the evidence gathered was misinterpreted, or because society has changed, which means the results have changed?
5. Re:39/100 is the new passing grade. by geminidomino · 2015-05-01 01:30 · Score: 3, Funny
  
  And those that are labeling a score of 39/100 "not bad at all" should have their head checked.
  
  They did, but only 39/100 of them found anything out of whack.
6. Re:39/100 is the new passing grade. by bondsbw · 2015-05-01 03:31 · Score: 3, Insightful
  
  Many sciences, not just the ones you listed, have at least some problem with reproducibility. Verification isn't nearly as sexy as coming up with a new idea.
  During my academic days, all the focus was on new work and literature reviews, but only one professor seemed to (defeatedly) care about verifying the results of other researchers. That doesn't get the funding.
  
  --
  All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
7. Re:39/100 is the new passing grade. by rockmuelle · 2015-05-01 04:08 · Score: 4, Insightful
  
  Gah. I have mod points but want to add to this conversation.
  The point of publishing is to share results of an experiment or study. Basically, a scientific publication tells the audience what the scientist was studying, how they did the experiment, what they found, and what they learned from it. The point of peer review is to review the work to make sure appropriate methods were followed and that the general results agree with the data. Peer review is not meant to verify or reproduce the results, but rather just make sure that the methods were sound.
  Scientific papers are _incremental_ and meant to add to the body of knowledge. It's important to know that papers are never the last word on a subject and the results may not be reproducible. It's up to the community to determine which results are important enough to warrant reproduction. It's also up to the community to read papers in the context of newly acquired knowledge. An active researcher in any field can quickly scan old papers and know which ones are likely no-longer relevant.
  That said, there is a popular belief that once something is published, it is irrefutable truth. That's a problem with how society interacts with science. No practicing scientist believes any individual paper is the gospel truth on a topic.
  The main problem in science that this study highlights is not that papers are difficult to reproduce (that's expected by how science works), but that some (most?) fields currently allow large areas of research to move forward fairly unchecked. In the rush to publish novel results and cover a broad area, no one goes back to make sure the previous results hold up. Thus, we end up with situations where there are a lot of topics that should be explored more deeply but aren't due to the pursuit of novelty.
  If journals encouraged more follow-up and incremental papers, this problem would resolve itself. Once a paper is published, there's almost always follow-up work to see how well the results really hold up. But, publishing that work is more difficult and doesn't help advance a career, especially if the original paper was not yours, so the follow-up work rarely gets done.
  tl;dr: for the general public, it's important to understand that the point of publishing is to share work, peer review just makes sure the work was done properly and makes no claims on correctness, and science is fluid. For scientists, yeah, there are some issues with the constant quest for novel publications vs. incremental work.
  -Chris
8. Re:39/100 is the new passing grade. by Kjella · 2015-05-01 04:20 · Score: 4, Interesting
  
  Assuming there is some actual effect being investigated, one reproduction will not get you to 'good' levels of surety about the effect. To hit '95%' - you're going to need likely over ten reproductions.
  One study != one sample. Each study should have enough cases to make it statistically significant. The problem is related to issues with the sample population or systematic flaws in what you're measuring. To bring it into the realm of physics, if we do a high school gravity experiment and ignore air resistance we can make as many tests as we like, check for measurement uncertainty in our clocks and whatnot and put up some confidence intervals that are still horribly wrong. It's very hard to isolate and experiment with one tiny aspect of the human psyche and most of the problem is the result is nothing but either a statistical fluke or quirk with the people tested that doesn't generalize to the general population.
  
  --
  Live today, because you never know what tomorrow brings
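The gravity example above can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the drop height, the 5% slowdown standing in for air resistance, and the timing noise are all hypothetical numbers, and the function names are invented for this sketch. It shows that repeating the measurement shrinks the confidence interval without moving it any closer to the true value.

```python
import random
import statistics

TRUE_G = 9.81  # m/s^2, the value the experiment is trying to recover


def simulate_g_estimates(n_trials, height=2.0, drag_slowdown=0.05,
                         timing_noise=0.01, seed=0):
    """Estimate g from drop times while ignoring air resistance.

    drag_slowdown is a hypothetical systematic effect: every fall takes
    5% longer than the drag-free formula predicts. timing_noise is the
    random stopwatch error in seconds.
    """
    rng = random.Random(seed)
    ideal_time = (2 * height / TRUE_G) ** 0.5
    estimates = []
    for _ in range(n_trials):
        measured = ideal_time * (1 + drag_slowdown) + rng.gauss(0, timing_noise)
        # The analysis assumes no drag, so it uses g = 2h / t^2 directly.
        estimates.append(2 * height / measured ** 2)
    return estimates


def mean_and_ci95(values):
    """Return the mean and a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    for n in (100, 10_000):
        mean, low, high = mean_and_ci95(simulate_g_estimates(n))
        print(f"n={n}: g = {mean:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

More trials only make the interval narrower around the biased estimate, so a replication that shares the same flaw would "confirm" the wrong number.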
9. Re:39/100 is the new passing grade. by amicusNYCL · 2015-05-01 05:53 · Score: 4, Insightful
  
  I think that there is a bit of a disconnect between scientific methods and things like psychology. I think it stems from the fact that it's easy to set up a thermometer, or an EKG, or whatever else and get those discrete data points, but it's difficult to measure things like "I feel better" or "I don't think about that the same way." I have a few friends who would swear up and down that EMDR therapy helped them out tremendously, but I don't know if there's a single way to gather data that would actually quantify what happened to them. I see the results not so much as psychology being flawed, but more about the difficulty of simply gathering the type of quantified data that a scientific study would require.
  
  --
  "Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
false positives by jc42 · 2015-05-01 00:21 · Score: 4, Insightful

'This makes it hard to dismiss that there are still a lot of false positives in the literature.'
An even more widespread problem is that there are a lot of true negatives that aren't in the literature.
Of course, this is a problem in most scientific fields, not just the "soft sciences" like psychology. I'm occasionally impressed by a researcher who publishes descriptions of things studied and found to be not significant, but this doesn't happen very often.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
1. Re:false positives by quintessencesluglord · 2015-05-01 01:01 · Score: 4, Informative
  
  More to the point, this is a problem of funding in all fields.
  No one wants to pay for basic research, even if it yields other useful ideas for further research. Unless it is hyped to high heavens, the possibility of getting dollars is nil.
  Gent I know was a decent researcher who got demoted to teaching community college. After a year of not being able to produce the "right" results (read: results able to secure further funding), he was canned, and another researcher who was more accommodating to fudging results got the position.
  It's not like the experiments were going to be reproduced anyway. Just fodder for additional grants because you produce "results".
2. Re:false positives by randomencounter · 2015-05-01 01:21 · Score: 2
  
  False positives are why we have to insist on doing research to verify past studies, and they are inevitable for various reasons.
  Accepting that, and funding verification studies, is how a science goes from soft to hard.
  
  --
  Forget diamonds, copyright is forever.
3. Re:false positives by mrchaotica · 2015-05-01 01:35 · Score: 4, Informative
  
  You missed the GP's point: the problem is not that true negatives were found; the problem is that they were not published. Because they were not published, future researchers might waste more effort re-discovering them.
  
  --
  "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Obg. XKCD by Reaper9889 · 2015-05-01 00:31 · Score: 5, Insightful

I think it could have something to do with this XKCD:
https://xkcd.com/882/
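
The arithmetic behind that comic is short enough to write down. A minimal sketch, assuming 20 independent tests of effects that do not exist, each judged at the usual 0.05 threshold (the function names are made up for this example):

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests
    when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    print(f"{familywise_error_rate(20):.2f}")  # roughly 0.64
    print(f"{familywise_error_rate(20, bonferroni_alpha(20)):.3f}")
```

So with 20 tries, a "significant" result somewhere is more likely than not, unless the threshold is tightened (Bonferroni is one common, conservative way to do that).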
Has Anyone Reproduced Their Results? by registrations_suck · 2015-05-01 00:32 · Score: 4, Funny

Well, this is interesting news, to be sure. Gives us plenty to think about. I can't help but wonder if anyone has been able to reproduce their results.
Perspective by wisnoskij · 2015-05-01 00:44 · Score: 4, Interesting

You need to put this in perspective. Sure, psychology is a wishy-washy field filled with pseudoscience. But apparently its studies are about as reproducible as those in a bunch of the hard science fields. If there is anything that reproducibility studies have taught us, it is that if there is around a 50% chance your result is correct, then you are around the norm in a great many fields. This 39% would make them about on par with what I remember from medical/cancer reproducibility studies.

--
Troll is not a replacement for I disagree.
Quick plug for JASNH by AEton · 2015-05-01 01:07 · Score: 5, Informative

Just taking this quick opportunity to post a link to my favorite journal, the Journal of Articles in Support of the Null Hypothesis: http://www.jasnh.com/ .
JASNH is one of the few places where you can submit a paper that says "we tested for X effect on Y and found no evidence that X affects Y". Generally this research is unpublishable and people will tweak parameters to get something career-advancing out of their research; I like JASNH because of the reminder that "falsifiability" can really happen.

--
We recently had heard in the office over one of the Yellow Machine that's made by Anthology Solutions.
Which gives a global p value of... by leehwtsohg · 2015-05-01 01:11 · Score: 2

Great! With p=2E-65, studies in psychology aren't totally random.
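
The comment does not say how that p value was computed. One way to get a number of this kind is a binomial tail: ask how likely 39 or more successful replications out of 100 would be if every original finding were noise. The sketch below assumes each replication then "succeeds" independently with probability 0.05; that null model is an assumption made for illustration and need not match whatever produced 2E-65.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Assumed null: each of 100 replications passes by chance with p = 0.05.
    print(binomial_tail(39, 100, 0.05))
```

Whatever null is chosen, the tail is tiny, which is the comment's point: 39 out of 100 is far more than chance alone would give, even if it is far less than 100.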
Re:What They Don't Say by Penguinisto · 2015-05-01 04:23 · Score: 3, Funny

Think of it this way... psychology is the 1960s-70s equivalent of today's MBA, and the two have many similarities:
* neither has an objective means of measuring success or failure, in spite of claiming to have a wide array of methods by which to do so.
* neither the psychologist nor the MBA is held accountable for incompetence or non-criminal malice.
* sometimes either one can take on the semblance of religion, minus a deity.
* the big 'do-nothing-but-are-promised-great-riches' degree of the 60's-80's was psychology, as hordes of students took that class thinking just that. In the 90's through today it's the MBA program.
* both can stretch logic and credulity in their work to attempt things that would get an engineer either incarcerated or killed.
(...add your own here...)
(Trigger warning for the MBAs and Psych majors: this is what is known as a joke.)

--
Quo usque tandem abutere, Nimbus, patientia nostra?
Social science is mutable by edcheevy · 2015-05-01 06:38 · Score: 2

Unlike the hard sciences, awareness of classic social science findings can loop back to impact the phenomena in question or they can change in response to society's evolution. Take the bystander effect for example. How many thousands or millions of college students have learned about the bystander effect in Psych 101? Hypothetically, now that they're aware of it, the effect should diminish and not be quite as reproducible as it once was. Then you layer on societal changes (oblivious smartphone/iTunes users increase the effect, but ubiquitous phones may decrease barriers to reporting and responding to violent crime, etc) and the ability to reproduce an earlier effect becomes muddled.

When a physicist announces a new particle, nothing changes. All the particles keep behaving how they were behaving before the announcement, and they don't care how society changes. The findings should be reproducible 100 years from now.

Many other comments have correctly pointed out that studies in general often focus on the new and shiny and statistically significant rather than reproducing prior results or reporting null findings, but the issue of settling on "truth" is made that much more difficult in the social sciences due to the existence of moving targets.
Because it is a SOFT science by blang · 2015-05-01 07:11 · Score: 2

You can measure how many parts per million of some matter are in the air.
You can measure how many bacteria of a certain type are in your bloodstream.
How do you measure if someone is in a good or bad mood?
The tester's bedside (or couch-side) manners can be enough to tilt the result one way or the other.
And if the researcher has an idea of what he is looking to find, he can (even subconsciously) manipulate the patient into reacting one way or the other, tainting the measurement.
What do we measure, how do we measure it? The subject could be lying. The subject could be imagining something. The tester has no way to verify.
Reproducibility is NOT the problem.
Even research that was reproduced can be wrong, for same reasons as above.
The NATURE of the field is the problem, not the lack of reproducibility.
Lack of reproducibility is merely the proof that there are fundamental problems with measurements and conclusions.
But I agree that the conclusion we can draw, is that there are a lot of false positives.

--
-- Another senseless waste of fine bytes.
Invisible Hand by PopeRatzo · 2015-05-01 08:08 · Score: 2

As dismal as Psychology's record is as a science, it's still way more rigorous and evidence-based than Economics.

--
You are welcome on my lawn.