Results Are In From Psychology's Largest Reproducibility Test: 39/100 Reproduced
An anonymous reader writes: A crowd-sourced effort to replicate 100 psychology studies has successfully reproduced findings from 39 of them. Some psychologists say this shows the field has a replicability problem. Others say the results are "not bad at all". The results are nuanced: 24 of the non-replications had findings at least "moderately similar" to those of the original paper but did not quite reach statistical significance. From the article: "The results should convince everyone that psychology has a replicability problem, says Hal Pashler, a cognitive psychologist at the University of California, San Diego, and an author of one of the papers whose findings were successfully repeated. 'A lot of working scientists assume that if it's published, it's right,' he says. 'This makes it hard to dismiss that there are still a lot of false positives in the literature.'"
What they don't tell you is that of the 39 reproducible results:
9 were telepathic,
3 were remote viewing,
15 were astral projection,
11 were telekinetic,
and 1 was precognitive (we knew the results of this study before it was published).
Is there a valid reason we accept studies that have not been reproduced at least one more time to truly vet them before the community?
Logistics, resources, patents, or a need to just plain bullshit people. I'm sure there are plenty of excuses for why we don't, but it doesn't sound like we have a whole lot of good reasons.
And those who are labeling a score of 39/100 "not bad at all" should have their heads checked. Enjoy your legal fun from that ball of lies.
'This makes it hard to dismiss that there are still a lot of false positives in the literature.'
An even more widespread problem is that there are a lot of true negatives that aren't in the literature.
Of course, this is a problem in most scientific fields, not just the "soft sciences" like psychology. I'm occasionally impressed by a researcher who publishes descriptions of things studied and found to be not significant, but this doesn't happen very often.
I think it could have something to do with this XKCD:
https://xkcd.com/882/
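The comic is about multiple comparisons: test enough hypotheses at p < 0.05 and something will come up "significant" by chance. A minimal Python sketch of the arithmetic, assuming independent tests of true null hypotheses and the conventional alpha = 0.05; the Bonferroni correction shown is one standard remedy, not necessarily what any of the replicated studies used, and the function names are illustrative.

```python
def family_wise_error_rate(m: int, alpha: float = 0.05) -> float:
    # Probability of at least one false positive among m independent
    # tests when every null hypothesis is actually true.
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(m: int, alpha: float = 0.05) -> float:
    # Per-test threshold that keeps the family-wise error rate at or below alpha.
    return alpha / m


if __name__ == "__main__":
    m = 20  # e.g. the twenty jelly bean colours in the comic
    print(f"P(at least one false positive in {m} tests) = {family_wise_error_rate(m):.3f}")
    print(f"Bonferroni per-test threshold: {bonferroni_alpha(m):.4f}")
```

With twenty tests the chance of at least one spurious "discovery" is about 64%, even though each individual test looks strict.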
Well, this is interesting news, to be sure. Gives us plenty to think about. I can't help but wonder if anyone has been able to reproduce their results.
The reason the 61% weren't reproducible was because they didn't want to be reproduced.
See, in psychology, for it to work, you have to want it to work. Psychology is basically the study of placebos. Actually, psychology is a placebo.
You need to put this in perspective. Sure, psychology is a wishy-washy field filled with pseudoscience. But apparently its studies are about as reproducible as those in a bunch of the hard-science fields. If reproducibility studies have taught us anything, it's that if there is around a 50% chance your result is correct, then you are around the norm in a great many fields. This 39% would put them about on par with what I remember from medical/cancer reproducibility studies.
It wasn't just published on the internet, it was published in a scientific journal!
Silly researchers. You're not supposed to publish science fiction just because a company paid you to write a story that matches their agenda.
I'm trying to imagine the results for a similar review of computer science and computer architecture research from most universities.
Since there is nearly zero reproduction of results, limited validation, generally poor test content and few incentives to improve research quality I doubt the results would instill much confidence.
It would be funny to real scientists, except that it is a disgrace, because this pretend science is used to decide if people should be executed or go to jail or be released from jail.
Right there. Psychology is as much a science as Astrology and Tarot cards. It's all based on what the observer thinks is normal.
Just taking this quick opportunity to post a link to my favorite journal, the Journal of Articles in Support of the Null Hypothesis: http://www.jasnh.com/ .
JASNH is one of the few places where you can submit a paper that says "we tested for X effect on Y and found no evidence that X affects Y". Generally this research is unpublishable and people will tweak parameters to get something career-advancing out of their research; I like JASNH because of the reminder that "falsifiability" can really happen.
You can now say, "Psychology Research results show correlation with reality higher than expected with P > 0.05."
C'mon, this isn't science, it's Psychology! That's good enough!
Great! With p=2E-65, studies in psychology aren't totally random.
So should circumstantial evidence from psychologists be admissible in courts? Should the field be used to determine who is eligible for parole? What about lifetime sex offenders: does this mean their lifetime label might not be correct?
Aren't most psychology studies done on college students with mandatory participation to pass the required psychology course? Isn't that quite a biased selection? Perhaps they tend to give the researchers what they want, or just don't care enough about being there to give accurate answers?
I'd say credibility is the problem plaguing the field.
The experimenters really wanted each experiment to predict their theory... The less precise a science, the worse a problem this is.
In Mathematics, where absolute proofs are possible, and proponents of this or that public policy fiddle not, things are fine. But if a psychologist or, dare I say it, a climate scientist thought that all odd numbers are prime, for example, they would've staged an experiment: "3, 5, 7" and declared the theory confirmed... Yes, the subsequent "9" is problematic, but "11" and "13" are further confirmations, and how much research do you need anyway, skeptic, before you start doing something?
Psychology is basically pulling something out of your ass, and making it sound good. A floor sweeper contributes more to society.
Can someone explain to my fiancee why and how her dysthymia isn't real and that she is absolutely fine and depression is fake and she can just do normal stuff now because we found out psychology isn't real?
The energy industry would like you to come work for them to help "prove" that climate change is hokum.
Where batting .390 is considered a good thing.
Okay, 39/100 is an absolute, total and complete failure in all possible regards. Legitimate scientific fields don't get recognized for being able to back up 39% of their research. This goes to show why Psychology will never be considered a real science: it produces unverifiable results and flaky / questionable answers. I'm glad someone took the time to finally put this issue to bed once and for all; 39/100 verified studies is the same as saying: "You're not science, stop acting like it!"
Which makes it very, very hard to do research on. In addition, effects may change over time, e.g. as a person's individual psychology changes with age or as culture changes people over time. Doing research in a natural system, like the Social Sciences, is very hard. It is much harder than Physics, Chemistry, or other such Sciences. It is much harder than putting together a cutesy mobile app. And longitudinal studies are even harder to do well.
I had despaired that psychology could ever pull its head out of its own ass. But if they start actually doing real science again, then the field might actually be saved.
It had gotten so bad that I just assumed that the neurologists would have to deal with all this stuff from the other side. Answer the psychology questions with neurology science.
Psychology has become something of a joke lately and there is no way to fix it short of subjecting it to cold empirical science.
The frightening part is when one of these unreproducible studies is used to formulate government social services policies. Then we have people imprisoned, or their children taken from them, based on bad science.
Looks like the head shrinkers who buy the oogy boogy fake "science" that is psychology have mod points today.
Lumpy is right, anyone that thinks this crap is real is bat-shit crazy and needs to see a shrink.
People assume that just because there is an "-ology" on the end of it, it's legitimate science. Hypothesis, experiment, testable data.
Many psychology experiments fail the basic test of repeatability and instead rely on surveys and statistical assumptions.
It also doesn't look good that the only "scientific" test for inclusion in or exclusion from things like the DSM is 'voting'.
A lot of people claim the soft sciences are not 'really science' due to the intangibility of their results - and this plays directly into that bias.
However, it's very much not just the softer sciences that have this issue. There's a growing realization that it's pervasive across many hard science disciplines:
http://www.wsj.com/articles/SB... : 64% of pharma trials couldn't be reproduced.
http://retractionwatch.com/201... - half of researchers couldn't reproduce published findings.
We're inundated with data that, due to the specificity of the field or detail of the results, has to come from 'experts' and doesn't lend itself to a sort of common-sense vetting that we can use to filter bullshit in the usual course of our lives. Whether it's from ignorance of statistical methods, poor experimental technique, motivated mendacity (for whatever reason), or simply experimental results that represent only an unusual end of a bell-curve, there are many, many reasons that scientific data has to be taken with a serious grain of salt. It can't be assumed to be conclusive until we've reproduced it in whatever context we're trying to apply it.
-Styopa
Here's an experiment that will always reproduce the same results:
1. psychologist thinks they're right
2. psychologist gets mediocre stats that sort of support their claim
3. psychologist messes with the numbers and eliminates "incorrect" data to make their point appear more supported
When I was a freshman in college, I took Intro to Psych, which, like many Psych 101 classes across the country, required students to participate in research studies run by grad student researchers in order to complete the class. Some of the studies were what you would consider more traditional one-on-one experiments with the researcher, but more often than not, the experiment merely consisted of taking a survey. All of the surveys were scheduled so that everybody involved took them at the same time. Well, I remember one time a survey was scheduled for something like 6pm on a Friday. Nobody wanted to be there, so of course every student was rushing through the survey so they could get out of there and start partying for the weekend. I remember feeling so bad for the researcher, knowing that their dissertation probably depended on getting good data. Unlike the rest of the students, I took my time with the survey, to the point where I was the last student left. That's when the researcher came up to me and said in a sarcastic tone, "Are you done yet?" At that point I just randomly filled in the rest of the bubbles and came to the conclusion that most psychology studies are probably BS.
The "soft science" in the sense that a stick of butter is a "soft structural material"
I'd suggest a study going farther than just checking reproducibility.
I bet for many studies you could produce opposite or contradictory outcomes.
That ought to get someone's PhD published.
OK, I know we all like to gang up on psychologists, but the real question is: if I randomly take 100 studies in physics/chemistry/biology, how many would I be able to reproduce?
I worked on semiconductors. On the theory side, the field is rife with papers that will likely never be testable by experiment. I should know, because I wrote one of these, and many of the references I cited were similar: papers that model an effect while assuming that many more dominant effects are absent. In reality, I don't think anyone will ever be able to construct an environment in the lab where you can isolate the effect in the paper from all the other effects. Neither I nor my peers could even conceive of an experiment that would let you do that.
I got sick of it, and wasn't happy I had contributed to journal pollution, so I quit my PhD. I can assure you that amongst my peers, I was the only one. Others were fine publishing nonreproducible research, and many said so openly. The system encourages it.
Then go to the experimental side of solid state research. Here you run into a different problem: Many researchers intentionally omit key details in their experimental setup because they want to have a monopoly on the topic - so perhaps they actually did do the research, but they make it insanely hard to reproduce. No one bothers.
As always, the likelihood of reproducibility is inversely proportional to the grandiosity of the claims in the paper.
Non-reproducibility is common. Is it worse in psychology? I don't know.
If you are unethical and try to reproduce a given experiment 100 times and it reproduces 10 times, you can publish a paper saying "I reproduced this experiment 10 times successfully" and destroy the evidence of the other 90 trials. Find 2 or 3 "independent" shills to do the same type of fake "reproduction" over the course of a few months and people will just assume that the experiment is valid and stop trying to disprove it.
It works in reverse too:
If you are unethical and try to reproduce a given experiment 100 times and it reproduces 90 times, you can publish a paper saying "I tried and failed to reproduce this experiment 10 times" and destroy the evidence of the other 90 trials. Have a few "independent" shills repeat the sham "failure to reproduce" a few times and the original experiment will be discredited, probably along with the original research team and its institution.
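A related baseline makes the point even without deliberate fraud: if there is no real effect at all, roughly 5 out of every 100 runs will still clear p < 0.05 by chance, so quietly keeping only those runs manufactures "successful reproductions" of nothing. A minimal Python sketch, assuming normally distributed data with known standard deviation 1, a two-sided z-test at |z| > 1.96, 30 subjects per group, and a fixed random seed; the function names are hypothetical.

```python
import math
import random


def run_null_experiment(rng: random.Random, n: int = 30) -> bool:
    # Both groups come from the same N(0, 1) distribution, so the true
    # effect is zero and any "significant" result is a false positive.
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    # Two-sided z-test; the difference of means has variance 2/n here.
    z = diff / math.sqrt(2.0 / n)
    return abs(z) > 1.96  # roughly p < 0.05


def selectively_reported(trials: int, seed: int = 0) -> tuple[int, int]:
    # Run `trials` null experiments; return (significant, quietly discarded).
    rng = random.Random(seed)
    hits = sum(run_null_experiment(rng) for _ in range(trials))
    return hits, trials - hits


if __name__ == "__main__":
    hits, hidden = selectively_reported(100)
    print(f"'Successful reproductions' reported: {hits}; trials dropped: {hidden}")
```

Reporting only `hits` and discarding the rest is exactly the file-drawer move described above; the reported runs look like evidence even though every one of them is noise.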
Is there a valid reason we accept studies that have not been reproduced at least one more time to truly vet them before the community?
Science is the new religion. Actually, it's worse. At least the Bible has been around for 1000 years. Nowadays scientists claim things like evolution and global warming are true, and we are all supposed to take their word for it even though neither passes even the simplest of tests.
It's not just in psych, it's all of medicine. See "Why Most Published Research Findings Are False"
John P. A. Ioannidis, http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124: "Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true." In a separate 2005 paper, Ioannidis analyzed "49 of the most highly regarded research findings in medicine over the previous 13 years". He compared the 45 studies that claimed to have uncovered effective interventions with data from subsequent studies with larger sample sizes: 7 (16%) of the studies were contradicted, in 7 (16%) the effects were smaller than in the initial study, 20 (44%) were replicated, and 11 (24%) remained largely unchallenged.[5]
Many biopharma trials that fail, fail quietly, as the business case to complete and publish evaporates for studies that don't produce desired results.
http://blogs.wsj.com/pharmalot/2014/11/19/nih-and-fda-toughen-rules-for-reporting-clinical-trial-results/
All that said, let's not get too worked up about the results of this reproducibility study until its results can be reproduced.
Unlike the hard sciences, awareness of classic social science findings can loop back to impact the phenomena in question or they can change in response to society's evolution. Take the bystander effect for example. How many thousands or millions of college students have learned about the bystander effect in Psych 101? Hypothetically, now that they're aware of it, the effect should diminish and not be quite as reproducible as it once was. Then you layer on societal changes (oblivious smartphone/iTunes users increase the effect, but ubiquitous phones may decrease barriers to reporting and responding to violent crime, etc) and the ability to reproduce an earlier effect becomes muddled.
When a physicist announces a new particle, nothing changes. All the particles keep behaving how they were behaving before the announcement, and they don't care how society changes. The findings should be reproducible 100 years from now.
Many other comments have correctly pointed out that studies in general often focus on the new and shiny and statistically significant rather than reproducing prior results or reporting null findings, but the issue of settling on "truth" is made that much more difficult in the social sciences due to the existence of moving targets.
I'm fairly certain that's psychiatry, not psychology. It's a big difference.
Personally I'm more concerned about the masses who seem to want things like driver's licenses and gun permits to be only given to people who have passed some sort of psychological testing. The only benefits of that are pleasing the dumbasses and giving a massive amount of power over everyone's daily lives to psychologists who may or may not have only their wallets' interests at heart.
If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time. If, as is often the case, experiments are underpowered, you will be wrong most of the time.
And given the low power of most psychology experiments I am not surprised by this result.
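How often a p < 0.05 "discovery" is wrong depends on how many of the tested hypotheses were true to begin with, which the comment does not state. A minimal Python sketch, assuming (purely for illustration) that 10% of tested hypotheses are real effects; the false discovery rate computed here is the share of significant results that are false positives.

```python
def false_discovery_rate(prior: float, power: float, alpha: float = 0.05) -> float:
    # prior: assumed fraction of tested hypotheses that are actually true
    # power: probability that a real effect reaches significance (1 - beta)
    false_pos = (1.0 - prior) * alpha  # true nulls that pass the test anyway
    true_pos = prior * power           # real effects that pass the test
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    for power in (0.8, 0.5, 0.2):
        fdr = false_discovery_rate(prior=0.1, power=power)
        print(f"prior=0.1, power={power}: {fdr:.0%} of discoveries are false")
```

Under that assumed prior this prints 36%, 47% and 69%: even a well-powered study is wrong more than 30% of the time, and an underpowered one is wrong most of the time.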
I've always thought it had something to do with this. Yes, another xkcd post:
https://xkcd.com/435/
I can see how messy proving things is in sociology and psychology, and how absolute mathematical proofs are. It's always disturbed me how uncertain we can be with the sciences as we move to the left, though I really don't know at what point we can call something 'pure.'
You can measure how many parts per million of some substance are in the air.
You can measure how many bacteria of a certain type are in your bloodstream.
How do you measure if someone is in a good or bad mood?
The tester's bedside (or couch-side) manner can be enough to tilt the result one way or the other.
And if the researcher has an idea of what he is looking to find, he can (even subconsciously) manipulate the patient into reacting one way or the other, tainting the measurement.
What do we measure, and how do we measure it? The subject could be lying. The subject could be imagining something. The tester has no way to verify.
Reproducibility is NOT the problem.
Even research that was reproduced can be wrong, for same reasons as above.
The NATURE of the field is the problem, not the lack of reproducibility.
Lack of reproducibility is merely the proof that there are fundamental problems with measurements and conclusions.
But I agree that the conclusion we can draw is that there are a lot of false positives.
'A lot of working scientists assume that if it's published, it's right,' he says. 'This makes it hard to dismiss that there are still a lot of false positives in the literature.'
Ummm... they do? Like, who? Not a single one I know.
If a result is published, I assume (as do most other scientists) that means very little until it's been reproduced, and even then I remain quite skeptical until it's stood the test of time. I assume many published results will turn out to be wrong. That's just the nature of science. Every paper is a work in progress, a snapshot of someone's research at one moment. And that's fine.
So 39% were successfully reproduced, and another 24% came close? I'd call that pretty good, especially in psychology where you're studying an incredibly complex system (the human brain) while trying to sort out hundreds of interacting factors.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
That is all.
As dismal as Psychology's record is as a science, it's still way more rigorous and evidence-based than Economics.
First, people who actually pay attention know full-well that the Bible is not a single book but rather a large collected set of books, letters, poems/songs etc. which are bound together in one volume for convenience. To clarify: the council of Nicea did not really edit an existing recognized book, but rather assembled the book from a large collection of works. The works that were deemed to not belong (for many reasons that would have also applied to the assembly of a secular book) were excluded from the collection but many are available separately for those who are interested in them. This is not the same as the "edited a book" idea many readers will have, which implies an existing book was manipulated (though it IS like the editing a publisher does with a book author before a book is printed).
Second, the "various translations" of the Bible you mention will also be misunderstood by readers not familiar with the material. All the Biblical texts were written in ancient non-English languages, so any Bible owned by a person who cannot read ancient Hebrew, Greek, etc. is likely to be a translation (same as with any other ancient book). The word "translation" does not imply "manipulation" or "distortion". The simple fact is that no two languages map directly word-for-word, so any time a translation is done, the translator/s have to choose the best wording they can to convey the ideas as accurately as they can, and different translators have different opinions of what is "best". Biblical translations are even tougher than secular translations because the scholars involved are often trying to carefully avoid a result that implies things which ought not to be implied and which might inadvertently lead a reader to think something which is Biblically "sinful" is OK (thereby misleading a reader into sin). This is further complicated by the continual shifts in contemporary language as well. The translators of the King James version, for example, wrote "Thou shalt not kill" in their translation of the Ten Commandments, which readers of their time would understand full-well did NOT mean to become vegetarians, or not to defend yourself in war or during a violent crime, but modern English readers might misread it this way, so other, more modern versions more accurately translate that verse to something like "Do not murder". The King James translation is still remarkably faithful to ancient manuscripts, but its translators were being very careful not to lead their readers into sin (they took this stuff VERY seriously and believed their readers' souls were at stake). The fact that two versions word that verse differently does not make either version "wrong", and in fact many people who study the Bible keep more than one version and study them together.
There ARE some very recent "translations" of the Bible which are a bit political in that they do things like remove gender references from many verses to conform with modern political correctness, but these are widely rejected by many who see that as a variation away from the original text which is invalid precisely because it attempts to manipulate the meaning away from the intent of the authors.
The fact that there are a number of translations of an old book, even if they disagree on the precise wording used, does NOT mean that they are "less reliable" or are a manipulation, as your post implies. Were that the case, we would have to say that we have NO reliable books on anything from before Gutenberg's printing press; there are fewer surviving manuscripts, and with more translational variations, for many ancient Roman and Greek books which everybody accepts than for the Biblical texts that many people use these criticisms to attack.
People who make money claiming they know what other people are thinking MIGHT be full of crap.
Sadly, this reminds me of my personal favorite bit of proposed legislation which failed to become law.
An entire profession based on non-reproducible results and claims of insight into the minds of others is simply no more reliable and credible than a 1-900 number telephone psychic. The "testimony" of people in this profession is used in courts to make sure some people are kept jailed and others are let go, with NO penalty for the professional who is later proven wrong; it's a profession with no true accountability other than by peers and associations of people in the profession (i.e. LESS accountability than the telephone scammers). Oh, and for any psycho who disagrees with me on this...... you're just "in denial" and you probably have "issues" involving your mother....
See PLOS's most viewed paper, "Why Most Published Research Findings Are False" by John Ioannidis, August 30, 2005, at PlosMedicine.org.
Ioannidis's paper shows that "after a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the Positive Predictive Value," PPV:
$$\mathrm{PPV} = \frac{(1-\beta)R}{R - \beta R + \alpha} = \frac{1}{1 + \dfrac{\alpha}{(1-\beta)R}}$$
where alpha is the significance threshold (Type I error rate), 1 - beta is the statistical power, and R is the ratio of true to false relationships among those tested in the field.
Here, for psychology, with alpha = 0.05 and PPV = 0.39:
$$0.39 = \frac{1}{1 + \dfrac{0.05}{(1-\beta)R}}$$
so
R ≈ 0.03 if 1 - beta = 1 [the power for a very large sample],
R ≈ 0.06 if 1 - beta = 0.5,
R ≈ 0.16 if 1 - beta = 0.2 [the power for a moderate sample].
That is, these psychology papers operate in a field with a ratio of true to false relationships of around R = 0.16.
The German pharmaceutical company Bayer found only 30 percent (PPV = 0.30) of the published pharmaceutical findings it checked to be verifiable, which at the same power of 0.2 corresponds to R ≈ 0.11.
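The psychology and Bayer figures can be checked by inverting the PPV formula for R. A minimal Python sketch; treating the 39% replication rate as the PPV is the commenter's assumption, and the power of 0.2 used for the Bayer case is an assumption (it is the power that reproduces R ≈ 0.11), not something the comment states. The function names are illustrative.

```python
def ppv(r: float, power: float, alpha: float = 0.05) -> float:
    # Ioannidis: PPV = (1 - beta) R / (R - beta R + alpha),
    # rewritten using R - beta R = (1 - beta) R = power * R.
    return power * r / (power * r + alpha)


def r_from_ppv(target_ppv: float, power: float, alpha: float = 0.05) -> float:
    # Solve PPV = 1 / [1 + alpha / ((1 - beta) R)] for R.
    return alpha / (power * (1.0 / target_ppv - 1.0))


if __name__ == "__main__":
    # Psychology: PPV taken to be the 39% replication rate.
    for power in (1.0, 0.5, 0.2):
        print(f"psychology, power {power}: R = {r_from_ppv(0.39, power):.3f}")
    # Bayer: PPV = 0.30, with an assumed power of 0.2.
    print(f"Bayer, power 0.2: R = {r_from_ppv(0.30, 0.2):.3f}")
```

This gives R of about 0.032, 0.064 and 0.160 for psychology and about 0.107 for Bayer; the power-0.2 psychology value comes out nearer 0.16 than 0.15 once the power-1 value is not rounded first.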
You can convert the ratio R to
R / (R + 1),
the pre-study probability that the relationship is true. Call this the "Background Probability" of a true relationship.
In the extreme, though not uncommon, field of genetics, research seeks from 30,000 genes the (at most) 30 genes that influence a genetic disease, for which
R ≈ 30/30000 = 0.001
and at this small R the Background Probability R / (R + 1) is also about 0.001; even with a power of 1 and alpha = 0.05, the PPV is only about 0.02.
Don't lose track. There are three fractions mentioned here:
(1) R (ratio of true relationships to false relationships in the field, before the experiment)
(2) Background Probability = R / (R + 1)
(3) PPV (after an experiment and publication, the probability that a result reported as significant is true)
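The three fractions for the genetics example can be computed side by side. A minimal Python sketch, assuming the pre-study probability is R / (R + 1) as in Ioannidis's paper, taking R strictly as true : false (30 true genes against 29,970 unrelated ones), and granting a perfect power of 1 at alpha = 0.05; the function names are illustrative.

```python
def background_probability(r: float) -> float:
    # Pre-study probability that a probed relationship is true.
    return r / (r + 1.0)


def ppv(r: float, power: float, alpha: float = 0.05) -> float:
    # Ioannidis: (1 - beta) R / (R - beta R + alpha), with power = 1 - beta.
    return power * r / (power * r + alpha)


if __name__ == "__main__":
    true_genes, candidates = 30, 30_000
    r = true_genes / (candidates - true_genes)  # ratio of true to false relationships
    print(f"(1) R                      = {r:.5f}")
    print(f"(2) Background Probability = {background_probability(r):.5f}")
    print(f"(3) PPV, power 1, alpha .05 = {ppv(r, power=1.0):.3f}")
```

R and the Background Probability are both about 0.001 here, and even a perfectly powered study lifts the PPV only to roughly 0.02.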
While the researchers/statisticians can set alpha = 0.05 and can aim for a power 1 - beta = 0.80, the probabilistic meaning of these numbers is clouded by their frequentist interpretation. What the statistician can't set, and what is never mentioned (the Background Probability), differs from field to field and is important in each research field!
When the Background Probability is moderate, a design with moderate power (1 - beta) can achieve a good PPV. But research often works in a field of previously unseen results, or uses data-mining software (a good generator of false results and a tool of charlatans), where R does equal 0.01 or even 0.001. In these many fields, the Background Probability swamps any statistical design's alpha and beta. "Most research findings are false for most research designs and for most fields... a PPV exceeding 50% is quite difficult to get." Indeed, a look at the PPV formula shows that, whatever alpha is chosen, even a power of 1 (a little thought reveals why more power hardly helps here) produces mostly false results if R, which for small values is essentially the Background Probability, is less than alpha!
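With power 1 the formula reduces to PPV = R / (R + alpha), so the PPV drops below 0.5 exactly when R < alpha, and raising power from 0.8 to 1 barely moves it. A minimal Python sketch tabulating the PPV over a few illustrative R values (chosen for the example, not taken from any study) at alpha = 0.05.

```python
def ppv(r: float, power: float, alpha: float = 0.05) -> float:
    # Ioannidis: (1 - beta) R / (R - beta R + alpha), with power = 1 - beta.
    return power * r / (power * r + alpha)


if __name__ == "__main__":
    alpha = 0.05
    powers = (0.2, 0.8, 1.0)
    for r in (1.0, 0.1, 0.05, 0.01, 0.001):
        row = "  ".join(f"{ppv(r, p, alpha):.3f}" for p in powers)
        print(f"R={r:<6} PPV at power 0.2/0.8/1.0: {row}")
```

At R = 0.01, for example, the PPV is about 0.14 at power 0.8 and only about 0.17 at power 1; once R is below alpha, no amount of power gets the PPV past one half.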
If R must be relatively large in a "field" for published results to represent true relationships, then a large proportion of relationships considered in that field are true (significant). Such a research field should be exceedingly boring. In the other extreme, in a "field" with relatively few true relationships, research produces mostly false conclusions. However, in followup studies from published results (eg, pharmaceuticals check results with further studies), R becomes large (note the conditioning). When you see that the probability published research represents a true relationsh
Just proves what we already knew - that psychology is half science and half pseudoscience. Good thing that so many base their careers and their children's future and the future of the world on it..
An Astrologer "..and they had the nerve to call us pseudoscientists!"
Below the speed of light Special Relativity is one of the most accurate theories in physics - above the speed of light..