Results Are In From Psychology's Largest Reproducibility Test: 39/100 Reproduced
An anonymous reader writes: A crowd-sourced effort to replicate 100 psychology studies has successfully reproduced findings from 39 of them. Some psychologists say this shows the field has a replicability problem. Others say the results are "not bad at all". The results are nuanced: 24 non-replications had findings at least "moderately similar" to the original paper but which didn't quite reach statistical significance. From the article: "The results should convince everyone that psychology has a replicability problem, says Hal Pashler, a cognitive psychologist at the University of California, San Diego, and an author of one of the papers whose findings were successfully repeated. 'A lot of working scientists assume that if it’s published, it’s right,' he says. 'This makes it hard to dismiss that there are still a lot of false positives in the literature.'”
Is there a valid reason we accept studies that have not been reproduced at least one more time to truly vet them before the community?
Logistics, resources, patents, or a need to just plain bullshit people. I'm sure there's plenty of excuses as to why we don't, but doesn't sound like we have a whole lot of good reasons why not.
And those that are labeling a score of 39/100 "not bad at all" should have their head checked. Enjoy your legal fun from that ball of lies.
'This makes it hard to dismiss that there are still a lot of false positives in the literature.'
An even more widespread problem is that there are a lot of true negatives that aren't in the literature.
Of course, this is a problem in most scientific fields, not just the "soft sciences" like psychology. I'm occasionally impressed by a researcher who publishes descriptions of things studied and found to be not significant, but this doesn't happen very often.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
I think it could have something to do with this XKCD:
https://xkcd.com/882/
Well, this is interesting news, to be sure. Gives us plenty to think about. I can't help but wonder if anyone has been able to reproduce their results.
You need to put this in perspective. Sure, psychology is a wishy-washy field filled with pseudoscience. But apparently their studies are about as reproducible as those in a bunch of the hard-science fields. If there is anything that reproducibility studies have taught us, it is that if there is around a 50% chance your result is correct, then you are around the norm in a great many fields. This 39% would put them about on par with what I remember from medical/cancer reproducibility studies.
Troll is not a replacement for I disagree.
It wasn't just published on the internet, it was published in a scientific journal!
Silly researchers. You're not supposed to publish science fiction just because a company paid you to write a story that matches their agenda.
@Whee
I'm trying to imagine the results for a similar review of computer science and computer architecture research from most universities.
Since there is nearly zero reproduction of results, limited validation, generally poor test content and few incentives to improve research quality I doubt the results would instill much confidence.
Just taking this quick opportunity to post a link to my favorite journal, the Journal of Articles in Support of the Null Hypothesis: http://www.jasnh.com/ .
JASNH is one of the few places where you can submit a paper that says "we tested for X effect on Y and found no evidence that X affects Y". Generally this research is unpublishable and people will tweak parameters to get something career-advancing out of their research; I like JASNH because of the reminder that "falsifiability" can really happen.
We recently had heard in the office over one of the Yellow Machine that's made by Anthology Solutions.
Great! With p=2E-65, studies in psychology aren't totally random.
Psychology is basically pulling something out of your ass, and making it sound good. A floor sweeper contributes more to society.
Where batting .390 is considered a good thing.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Okay, 39/100 is an absolute, total and complete failure in all possible regards. Legitimate scientific fields don't get recognized for being able to back up 39% of their research. This goes to show why psychology will never be considered a real science: it produces unverifiable results and flaky, questionable answers. I'm glad someone took the time to finally put this issue to bed once and for all. 39/100 verified studies is the same as saying: "You're not science, stop acting like it!"
Which makes it very, very hard to do research on. In addition, effects may change over time, e.g. as a person's individual psychology changes as they age or as culture changes people over time. Doing research in a natural system, like the Social Sciences, is very hard. It is much harder than Physics, Chemistry, or other such Sciences. It is much harder than putting together a cutesy mobile app. And longitudinal studies are even harder to do well.
putting the 'B' in LGBTQ+
I had despaired that psychology could ever pull its head out of its own ass. But if they start actually doing real science again then the field might actually be saved.
It had gotten so bad that I just assumed that the neurologists would have to deal with all this stuff from the other side. Answer the psychology questions with neurology science.
Psychology has become something of a joke lately and there is no way to fix it short of subjecting it to cold empirical science.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
The frightening part is when one of these unreproducible studies is used to formulate government social services policies. Then we have people imprisoned, or their children taken from them, based on bad science.
Gamingmuseum.com: Give your 3D accelerator a rest.
A lot of people claim the soft sciences are not 'really science' due to the intangibility of their results - and this plays directly into that bias.
However, it's very much not just the softer sciences that have this issue. There's a growing realization that it's pervasive across many hard science disciplines:
http://www.wsj.com/articles/SB... : 64% of pharma trials couldn't be reproduced.
http://retractionwatch.com/201... - half of researchers couldn't reproduce published findings.
We're inundated with data that, due to the specificity of the field or detail of the results, has to come from 'experts' and doesn't lend itself to a sort of common-sense vetting that we can use to filter bullshit in the usual course of our lives. Whether it's from ignorance of statistical methods, poor experimental technique, motivated mendacity (for whatever reason), or simply experimental results that represent only an unusual end of a bell-curve, there are many, many reasons that scientific data has to be taken with a serious grain of salt. It can't be assumed to be conclusive until we've reproduced it in whatever context we're trying to apply it.
-Styopa
Sounds about right.
I trust the judgement of peers before I trust the judgement of a shrink. I've heard too many things from and about shrinks to believe that they know shit from shinola.
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
Here's an experiment that will always reproduce the same results:
1. psychologist thinks they're right
2. psychologist gets mediocre stats that sort of support their claim
3. psychologist messes with the numbers and eliminates "incorrect" data to make their point appear more supported
When I was a freshman in college, I took Intro to Psych, and like many psych 101 classes in colleges across the country, the students are required to participate in research studies for grad student researchers in order to complete the class. Some of the studies were what you would consider to be more traditional one-on-one experiments with the researcher, but more often than not, the experiment merely consisted of taking a survey. All of the surveys were scheduled so that everybody involved took them at the same time. Well, I remember one time a survey was scheduled for something like 6pm on a Friday. Nobody wanted to be there, so of course, every student was rushing through the survey so they could get out of there and start partying for the weekend. I remember feeling so bad for the researcher, knowing that their dissertation probably depended on getting good data. Unlike the rest of the students, I took my time taking the survey, to the point where I was the last student left. That's when the researcher came up to me and in a sarcastic tone said "Are you done yet?". At that point I just randomly filled in the rest of the bubbles and came to the conclusion that most psychology studies are probably BS.
The "soft science" in the sense that a stick of butter is a "soft structural material"
There's no grant money available from government for that. It's only available (in huge quantities) for supporting the paradigm. Strange but true.
I rather thought that energy companies had a few extra dollars running around. Exxon's 2014 revenue was over 400 billion. Surely they could fund a few studies all by themselves.
Faster! Faster! Faster would be better!
So, are all you skeptics paid shills, or just really passionate about your cause?
I, for one, am rather passionate about some odd numbers not being primes.
That there is an infinite amount of them strengthens my position, but having only a few exceptions is enough to invalidate the theory I cited as an example.
In Soviet Washington the swamp drains you.
I'd suggest a study going farther than just checking reproducibility.
I bet for many studies you could produce opposite or contradictory outcomes.
That ought to get someone's PhD published.
If you are unethical and try to reproduce a given experiment 100 times and it reproduces 10 times, you can publish a paper saying "I reproduced this experiment 10 times successfully" and destroy the evidence of the other 90 trials. Find 2 or 3 "independent" shills to do the same type of fake "reproduction" over the course of a few months and people will just assume that the experiment is valid and stop trying to disprove it.
It works in reverse too:
If you are unethical and try to reproduce a given experiment 100 times and it reproduces 90 times, you can publish a paper saying "I tried and failed to reproduce this experiment 10 times" and destroy the evidence of the other 90 trials. Have a few "independent" shills repeat the sham "failure to reproduce" a few times and the original experiment will be discredited, probably along with the original research team and its institution.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I grew up a few blocks from a large mental institution, and many of my classmates' parents worked there. An awful lot of the people who worked there ended up loonier than their patients.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
Think of it this way... psychology is the 1960s-70s equivalent of today's MBA, and the two have many similarities:
* neither has an objective means of measuring success or failure, in spite of claiming to have a wide array of methods by which to do so.
* neither the psychologist nor the MBA is held accountable for incompetence or non-criminal malice.
* sometimes either one can take on the semblance of religion, minus a deity.
* the big 'do-nothing-but-are-promised-great-riches' degree of the 60's-80's was psychology, as hordes of students took that class thinking just that. In the 90's through today it's the MBA program.
* both can stretch logic and credulity in their work to attempt things that would get an engineer either incarcerated or killed.
(...add your own here...)
(Trigger warning for the MBAs and Psych majors: this is what is known as a joke.)
Quo usque tandem abutere, Nimbus, patientia nostra?
The Bible was last edited by the Council of Nicea in 325 AD. It gets re-translated regularly. Look at the various translations of 1st Timothy 2:12 for how politics can affect that. Who decided to remove 'she should be quiet'?
By your standard, that makes the bible about 1/10 as reliable as the Hindu books about flying elephants having battles.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Unlike the hard sciences, awareness of classic social science findings can loop back to impact the phenomena in question or they can change in response to society's evolution. Take the bystander effect for example. How many thousands or millions of college students have learned about the bystander effect in Psych 101? Hypothetically, now that they're aware of it, the effect should diminish and not be quite as reproducible as it once was. Then you layer on societal changes (oblivious smartphone/iTunes users increase the effect, but ubiquitous phones may decrease barriers to reporting and responding to violent crime, etc) and the ability to reproduce an earlier effect becomes muddled.
When a physicist announces a new particle, nothing changes. All the particles keep behaving how they were behaving before the announcement, and they don't care how society changes. The findings should be reproducible 100 years from now.
Many other comments have correctly pointed out that studies in general often focus on the new and shiny and statistically significant rather than reproducing prior results or reporting null findings, but the issue of settling on "truth" is made that much more difficult in the social sciences due to the existence of moving targets.
If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time. If, as is often the case, experiments are underpowered, you will be wrong most of the time.
And given the low power of most psychology experiments I am not surprised by this result.
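For anyone who wants to check that arithmetic, here is a minimal Python sketch. The 10% share of tested hypotheses that are really true is purely an assumption for illustration (it is not stated in the comment above), as are the two power values; the function name is made up for this example.

```python
def false_discovery_rate(prior, power, alpha=0.05):
    """Fraction of 'significant' results that are actually false positives.

    prior: assumed share of tested hypotheses that are really true
    power: probability of detecting a real effect (1 - beta)
    alpha: significance threshold
    """
    true_positives = prior * power          # real effects that reach significance
    false_positives = (1 - prior) * alpha   # null effects that reach significance
    return false_positives / (true_positives + false_positives)


# Assumed scenario: 1 in 10 tested hypotheses is true.
well_powered = false_discovery_rate(prior=0.10, power=0.80)
underpowered = false_discovery_rate(prior=0.10, power=0.20)

print(f"power 0.80: {well_powered:.0%} of discoveries are false")
print(f"power 0.20: {underpowered:.0%} of discoveries are false")
```

Under those assumptions the well-powered case comes out at 36% false discoveries and the underpowered one at roughly 69%, which is in line with "at least 30%" and "wrong most of the time". A different prior moves the numbers, so treat this as a back-of-the-envelope check rather than a measurement.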
I've always thought it had something to do with this. Yes, another xkcd post:
https://xkcd.com/435/
I can see how messy proving things is in sociology and psychology, and how absolute mathematical proofs are. It's always disturbed me how uncertain we can be with the sciences as we move to the left, though I really don't know at what point we can call something 'pure.'
You can measure how many parts per million of some matter are in the air.
You can measure how many bacteria of a certain type are in your bloodstream.
How do you measure if someone is in a good or bad mood?
The tester's bedside (or couch-side) manners can be enough to tilt the result one way or the other.
And if the researcher has an idea of what he is looking to find, he can (even subconsciously) manipulate the patient into reacting one way or the other, tainting the measurement.
What do we measure, how do we measure it? The subject could be lying. The subject could be imagining something. The tester has no way to verify.
Reproducibility is NOT the problem.
Even research that was reproduced can be wrong, for same reasons as above.
The NATURE of the field is the problem, not the lack of reproducibility.
Lack of reproducibility is merely the proof that there are fundamental problems with measurements and conclusions.
But I agree that the conclusion we can draw, is that there are a lot of false positives.
-- Another senseless waste of fine bytes.
'A lot of working scientists assume that if it's published, it's right,' he says. 'This makes it hard to dismiss that there are still a lot of false positives in the literature.'
Ummm... they do? Like, who? Not a single one I know.
If a result is published, I assume (as do most other scientists) that means very little until it's been reproduced, and even then I remain quite skeptical until it's stood the test of time. I assume many published results will turn out to be wrong. That's just the nature of science. Every paper is a work in progress, a snapshot of someone's research at one moment. And that's fine.
So 39% were successfully reproduced, and another 24% came close? I'd call that pretty good, especially in psychology where you're studying an incredibly complex system (the human brain) while trying to sort out hundreds of interacting factors.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
That is all.
Most linux users don't know this, but the man pages were named after Chuck Norris. Chuck Norris fsck'ing hates noobs!
You mean they said they worked there.
You are welcome on my lawn.
As dismal as Psychology's record is as a science, it's still way more rigorous and evidence-based than Economics.
You are welcome on my lawn.
See PLOS's "most" viewed paper, "Why Most Published Research Findings Are False" by John Ioannidis, August 30, 2005, at PlosMedicine.org.
Ioannidis' paper shows that
"After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the Positive Predictive Value,"
PPV = (1 - beta) R / (R - beta * R + alpha)
= 1 / [1 + alpha / ((1 - beta) R)]
where alpha is the significance level, 1 - beta is the power, and R is the ratio of true to false relationships tested in the field.
Here, for psychology, with alpha = 0.05,
PPV = 0.39 = 1 / [1 + 0.05 / ((1 - beta) R)]
so
R = 0.03 if 1 - beta = 1 [the power for a very large sample]
R = 0.06 if 1 - beta = 0.5
R = 0.16 if 1 - beta = 0.2 [the power for a moderate sample].
That is, these psychology papers operate in a field with around R = 0.16 true/false relationships.
Germany's Pharmaceutical Bayer found only 30 percent (PPV=0.30) of all pharmaceutical papers verifiable, corresponding to an R = 0.11.
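The back-calculation of R from an observed PPV can be sketched in a few lines of Python. This is just the formula PPV = (1 - beta) R / (R - beta * R + alpha) solved for R; the function names are invented for this example, and the power values are the ones assumed in the comment, not measured quantities.

```python
def ppv(R, power, alpha=0.05):
    """Positive predictive value: PPV = power*R / (power*R + alpha)."""
    return power * R / (power * R + alpha)


def odds_from_ppv(target_ppv, power, alpha=0.05):
    """Invert the PPV formula: the true/false ratio R that yields target_ppv."""
    return alpha * target_ppv / (power * (1 - target_ppv))


# Psychology replication rate (PPV = 0.39) under three assumed power levels.
for power in (1.0, 0.5, 0.2):
    R = odds_from_ppv(0.39, power)
    print(f"power {power:.1f}: R = {R:.3f}, check PPV = {ppv(R, power):.2f}")

# Bayer figure (PPV = 0.30), again assuming a power of 0.2.
print(f"pharma: R = {odds_from_ppv(0.30, 0.2):.3f}")
```

With alpha = 0.05 this gives R of about 0.03, 0.06 and 0.16 for the psychology figure and about 0.11 for the Bayer figure, close to the values quoted above. The result is only as good as the assumed power, which nobody in this thread actually knows.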
You can change the ratio R to
R / (1 + R),
the pre-study probability the relationship is true. Call this the "Background Probability" of a true relationship.
In the extreme though not uncommon genetics field, research seeks from 30,000 genes the (at most) 30 genes that influence a genetic disease, for which
R = 30/30000 = 0.001
and at this small R, PPV is then also tiny: about 0.004 at a power of 0.2 and still only about 0.02 at a power of 1 (with alpha = 0.05).
Don't lose track. There are three fractions mentioned here,
(1) R (ratio of true relationships to false relationships in the field, before experiment)
(2) Background probability = R / (1 + R)
(3) PPV (after an experiment and publication, this is the probability that a result reported as significant is true)
While the researchers/statisticians can set alpha = 0.05, and can get a power of 1 - beta = 0.80, their probability meaning is clouded by their frequentist interpretation. What the statistician can't set, and what is never mentioned -- the Background Probability -- differs and is important in each research field!
When the Background Probability is moderate, a design with moderate power (1 - beta) can get good PPV. But research often works in a field of previously unseen results, or uses data mining software (a good generator of false results and tool of charlatans), where R does equal 0.01 or even 0.001. In these many fields, the Background Probability swamps any statistical design's alpha and beta. "Most research findings are false for most research designs and for most fields... a PPV exceeding 50% is quite difficult to get." Indeed, a look at the PPV formula shows that whatever alpha, even a power of 1 (a little thought reveals why more power hardly helps here) produces mostly false results if the Background Probability itself is less than alpha!
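The "more power hardly helps" remark can be read straight off the PPV formula. A short derivation, written in terms of the ratio R rather than the Background Probability (for small values the two nearly coincide, so this is a slight approximation of the claim as worded above):

```latex
\mathrm{PPV} = \frac{(1-\beta)R}{(1-\beta)R + \alpha} > \frac{1}{2}
\iff (1-\beta)R > \alpha
\iff R > \frac{\alpha}{1-\beta} \ge \alpha .
```

Since the power 1 - beta can never exceed 1, the threshold alpha / (1 - beta) is never below alpha. So whenever R is below alpha, even a perfectly powered design leaves more than half of the significant findings false.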
If R must be relatively large in a "field" for published results to represent true relationships, then a large proportion of relationships considered in that field are true (significant). Such a research field should be exceedingly boring. In the other extreme, in a "field" with relatively few true relationships, research produces mostly false conclusions. However, in followup studies from published results (eg, pharmaceuticals check results with further studies), R becomes large (note the conditioning). When you see that the probability published research represents a true relationsh
It was a full tilt edit. How did the poor old lady giving a single copper story show up in John? It was never there before the edits. There are multiple fragments that cover the verses affected. It wasn't there before Nicea. They just paraphrased it from Mark.
They fully consolidated the bible. Synchronizing 'good' gospels while burning others etc.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Just proves what we already knew - that psychology is half science and half pseudoscience. Good thing that so many base their careers and their children's future and the future of the world on it..
An Astrologer "..and they had the nerve to call us pseudoscientists!"
Below the speed of light Special Relativity is one of the most accurate theories in physics - above the speed of light..