Why Published Research Findings Are Often False
Hugh Pickens writes "Jonah Lehrer has an interesting article in the New Yorker reporting that all sorts of well-established, multiply confirmed findings in science have started to look increasingly uncertain as they cannot be replicated. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only anti-psychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants. 'One of my mentors told me that my real mistake was trying to replicate my work,' says researcher Jonathan Schooler. 'He told me doing that was just setting myself up for disappointment.' For many scientists, the effect is especially troubling because of what it exposes about the scientific process. 'If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved?' writes Lehrer. 'Which results should we believe?' Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to 'put nature to the question,' but it now appears that nature often gives us different answers. According to John Ioannidis, author of Why Most Published Research Findings Are False, the main problem is that too many researchers engage in what he calls 'significance chasing,' or finding ways to interpret the data so that it passes the statistical test of significance, the ninety-five-per-cent boundary invented by Ronald Fisher. 'The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,'"
Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
Even in academia, there's an establishment and people who are powerful within that establishment are rarely challenged. A new upstart in the field will be summarily ignored and dismissed for having the arrogance to challenge someone who's widely respected. Even if that respected figure is incorrect, many people will just go along to keep their careers moving forward.
After years of speculation, a study has revealed that scientists are, in fact, human. The poor wages, long hours, and relative obscurity that most scientists dwell in have apparently caused widespread errors, making them almost pathetically human and just like every other working schmuck out there. Every major news organization south of the Mason-Dixon line in the United States and many religious organizations took this to mean that faith is better, as it is better suited to slavery, long hours, and no recognition than science, a relatively new kind of faith that has only recently received any recognition. In other news, the TSA banned popcorn from flights on fears that the strong smell could cause rioting from hungry and naked passengers who cannot be fed, go to the bathroom, or leave their seats for the duration of the flight for safety reasons....
I see this as one more planted article in mainstream press: "Science is there to mislead you, listen to fake news instead". The rising tide against education and critical thinking in the USA is reminiscent of the Cultural Revolution in China. It is even more ironic that the argument "against" metrics that usefully determine validity is couched in a pseudo-analytical format itself. At this point in the USA, most folks reading (even) the New Yorker have no idea what a p-value is or why these things matter, and they will just recall the headline "science is wrong". And then they wonder in Detroit why they can't make $100k a year anymore pushing the button on a robot that was designed overseas by someone else. You know, overseas, where engineering, science, etc. are still held in high regard.
I'm a scientist myself. It's quite clear from where I'm standing that to get good jobs, research grants, etc., one needs plenty of published articles. Whether the conclusions of those are true or false is not something that hiring committees will delve into too much. If you are young and have a family to support, it can be tempting to take shortcuts.
This article has already been taken apart by P.Z. Myers in a blog post on Pharyngula. Here's his conclusion:
Basically, it's not like anyone's surprised at this.
The New Yorker article is well written and informative. It's clearly not assuming that there is something wrong with the scientific method, but just asks: could there be? There is an excellent reply by George Musser at "Scientific American" http://cot.ag/hWqKo2
This is what I call interesting and engaging public discussion and journalism.
I agree. Though it's not lying by the Clintonesque definition of lying that most people use. It's more lying by omission, distorting the meaning of the results by not putting them in their complete context. At least that's how it is with the papers I've read and known enough about to have an educated opinion on. Although the misrepresentation is usually at least partially intentional, I don't think it's all intentional.
> 'Which results should we believe?'
What a ridiculous question. How about the results that are replicated, accurately, time and time again, and not the ones that aren't based on scientific theory, or that come from failed attempts at scientific theory?
That article is as flawed as the supposed errors it reports on. The author just "discovered" that biases exist in human cognition? The "effect" he describes is quite well understood, and is the very reason behind the controls in place in science. This is why we don't, in science, just accept the first study published, why scientific consensus is slow to emerge. Scientists understand that. It's journalists who jump on the first study describing a certain effect, and who lack the honesty to review it in the light of further evidence, not scientists.
> Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
Precisely. As mentioned in a Scientific American blog:
"The difficulties Lehrer describes do not signal a failing of the scientific method, but a triumph: our knowledge is so good that new discoveries are increasingly hard to make, indicating that scientists really are converging on some objective truth."
That's not a given. Particularly in the soft sciences - psychology, for instance - it is extremely difficult to control for all factors (I'm more inclined to say nearly impossible) and so replication of results can be subsumed by other effects, or even simply not work at all. You know that whole generation gap thing? That's a good example of groups of people who are different enough that the reactions they will have to certain subject matter can be polar opposites. So something that was "definitively determined" in 1960 may be statistically irrelevant among the current generation.
That's just one example of how squishy this all is. Without having to bring lying into it at all. And then, there will be liars; and there will be people who draw conclusions without scientific rigor at all, simply because it's just too difficult, expensive or time-consuming to attempt to confirm the ideas at hand. And there is the outlier personality; the one who accounts for those other few percent -- all the declarations of "this is how it is" are false for them right out of the gate.
Hard sciences simply lend themselves a lot better to repeatability. Where I think we go wrong is assigning the same certainties to the claims of the soft scientists. I have personally seen psychiatrists, best intent not in doubt, completely err in characterizing a situation to the great detriment of the people involved, because the court took the psychiatrist's word as gospel truth.
All science is an exercise in metaphor, but soft science is an exercise of metaphor that is almost always far too flexible. One place you can see this happening is the trendy / cyclic adherence to Freud, Jung, Maslow, Rogers and so forth... the "correct" way to raise babies... Ferberizing, etc. This stuff isn't generally lies at all, but it also generally isn't "right." Good intentions do not automatically make good science.
Serious medicine is another good example. Something that might work very well for you might not work at all for me; get the wrong group of test subjects, and your results will skew or worse. This is an area that I think is fair to call a hard science, but where we just don't know enough about the systems involved. Generally speaking, I don't think our oncologist lies to us; further, I think he's pretty well aware of the limitations of his practice and the state of knowledge that informs it; but they just don't know enough. To which I hopefully add, "yet."
On a personal level - since that's all I can really affect - I treat soft science about the same way I do astrology. If you believe it, you'll probably attempt to modify your behavior because of the predictions, which in turn may, or may not, affect your actual outcome. If you don't, it's either irrelevant or too uncertain to trust anyway. So it's low confidence all the way.
I do, however, still place very high confidence in Boyle's law for gasses. Hard science works very well. :)
It's only lying if you do it intentionally. If ten labs independently and without knowing of each other perform essentially the same experiment, and one of them has a statistically significant result, is that lying? The other nine won't get published because, unfortunately, people only rarely (and for large or controversial experiments) publish negative results, but the one anomalous study will.
The vast majority of science is performed with all the good will in the world, but it's simply impossible for scientists to not be human. That's why we do replicate experiments - hell, my wife just published a paper where she tried to replicate someone else's results and got entirely different ones, and analyzed why the first guy got it wrong.
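To put a rough number on the ten-lab scenario above, here is a minimal sketch. It assumes the effect being tested does not exist at all, that the ten labs are independent, and that each uses the conventional p < 0.05 cutoff; the variable names are just for illustration.

```python
# Hypothetical illustration: ten independent labs test an effect that is
# not real, each using the conventional 5% significance threshold.
alpha = 0.05
n_labs = 10

# Chance that no lab gets a false positive, and the complement of that.
p_none = (1 - alpha) ** n_labs
p_at_least_one = 1 - p_none

print(f"P(at least one of {n_labs} labs looks 'significant') = {p_at_least_one:.3f}")
```

Under those assumptions the chance that at least one lab ends up with a publishable "result" is about 40%, with nobody lying.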
The article can be viewed on a single page here: http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all
Not surprisingly, most of the posts so far show no signs of having actually RTFA.
Lehrer goes through all kinds of logical contortions to try to explain something that is fundamentally pretty simple: it's publication bias plus regression to the mean. He dismisses publication bias and regression to the mean as being unable to explain cases where the level of statistical significance was extremely high. Let's take the example of a published experiment where the level of statistical significance is so high that the result only had one chance in a million of occurring due to chance. One in a million is 4.9 sigma. There are two problems that you will see in virtually all experiments: (1) people always underestimate their random errors, and (2) people always miss sources of systematic error.
It's *extremely* common for people to underestimate their random errors by a factor of 2. That means that the 4.9-sigma result is only a 2.45-sigma result. But 2.45-sigma results happen about 1.4% of the time. That means that if 71 people do experiments, typically one of them will result in a 2.45-sigma confidence level. That person then underestimates his random errors by a factor of 2, and publishes it as a result that could only have happened one time in a million by pure chance.
Missing a systematic error does pretty much the same thing.
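The sigma figures above can be checked with a few lines of code. This is only a sketch: it assumes Gaussian errors and a two-sided tail probability, which is what makes "one in a million" come out near 4.9 sigma. The function name and the factor-of-2 variable are illustrative.

```python
import math

def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond |z| sigma."""
    return math.erfc(abs(z) / math.sqrt(2))

claimed_sigma = 4.9
inflation = 2.0  # assumed factor by which the random errors were underestimated
actual_sigma = claimed_sigma / inflation

p_claimed = two_sided_p(claimed_sigma)
p_actual = two_sided_p(actual_sigma)

print(f"{claimed_sigma} sigma  -> p = {p_claimed:.2e}")
print(f"{actual_sigma} sigma -> p = {p_actual:.4f} (about 1 in {1 / p_actual:.0f})")
```

With those assumptions, 4.9 sigma comes out at roughly one in a million and 2.45 sigma at roughly 1.4%, or about one experiment in 70.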
Lehrer cites an example of an ESP experiment by Rhine in which a certain subject did far better than chance at first, and later didn't do as well. Possibly this is just underestimation of errors, publication bias, and regression to the mean. There is also good evidence that a lot of Rhine's published work on ESP was tainted by his assistants' cheating: http://en.wikipedia.org/wiki/Joseph_Banks_Rhine#Criticism
Now science uses different math, and the results are expressed differently, even probabilistically. But in real science those probabilities are not what most people think of as probability. A scanning tunneling microscope, for instance, works by the probability that a particle can jump an air gap. Though this is probabilistic, it is well understood, so it allows us to map atoms. There is minimal uncertainty in the outcome of the experiment.
The research talked about in the article may or may not be science. First, anything having to do with human systems is going to be based on statistics. We cannot isolate human systems in a lab. The statistics used are very hard. From discussions with people in the field, I believe they are every bit as hard as the math used for quantum mechanics. The difference is that much of the math is codified in computer applications and researchers do not necessarily understand everything the computer is doing. In effect, everyone is using the same model to build results, but may not know if the model is valid. It is like using a constant acceleration model for a case where there is jerk. The results will be not quite right. However, if everyone uses the faulty model, the results will be reproducible.
Second, the article talks about the drug dealers. The drug dealers are like the Catholic Church of Galileo's time. The purpose is not to do science, but to keep power and sell product. Science serves as a process to develop product and minimize legal liability, not to explore the nature of the universe. As such, calling what any pharmaceutical company does the 'scientific method' is at best misguided.
The scientific method works. The scientific method may not be completely applicable to fields of study that try to find things that often, but not always, work in a particular case. The scientific method is also not resistant to group illusion. This was the basis of 'The Structure of Scientific Revolutions'. The issue here, if there is one, is the lack of education about the scientific method that tends to make people give individual results more credence than is rational, or treat science as some sort of magic.
I agree. I was a science major and saw quite a willingness to fudge/manipulate data, and I believe it has worked its way into general research. During a brief PhD stint I redid some experiments, and my results showed the opposite of what other students had found. Mine showed some significance while theirs had not. Funny thing was, my data was ugly, while theirs was pretty. This was from an experiment where organisms were growing in media and had to be counted via microscope and measured with a spectrograph at set time periods. My guess is their data was pretty because they fudged it by saying they took the samples at exactly a particular time ratio, whereas I recorded the actual elapsed time (the procedure was complicated and there was variability in how long it took me to complete the tasks, sometimes more than the interval to the next checkpoint). I also guess that the student wanted pretty-looking data because he thought that would look better to his boss (the professor who ran the lab). Even if the scientists are not doing this from pressure to go higher, their underlings might be doing it to be "impressive". Part of the problem is science is no longer something people do because they love it. It is too commoditized and has become just a job at the low end and a vicious battle for survival at the high end.
Did you even read the article?
This is basically about poorly designed clinical drug trials without sufficient controls. Sloppy work, even if it seemed rigorous enough at the time.
The sensationalistic "scientific method in question" stuff is pure BS, but after all this is New Yorker magazine we're talking about, so one wouldn't expect too much scientific literacy. It was the scientific method of "predict and test" that caught these erroneous results, so the method itself is fine. The "scientist" who designed a sloppy experiment is to blame, not the method.
However, I'm not sure that psychiatric drug trials even deserve to be called science in the first place. The principle of GIGO (Garbage In - Garbage Out) applies. This is touchy-feely soft science at best. How do you feel today on a scale of 1-10? Do the green pills make you happy?
Are you serious? Many thousands of people are dead simply because a few people were trying to stay gainfully employed to support their families?
I am truly sorry if this comes off as offensive as I think it does but if you believe there would be mass suffering from unemployment if we did not bomb the shit out of Iraq and that was the basis for the lies that resulted in many thousands losing their lives then you are seriously deluded.
As a U.S. citizen I found Clinton's actions and lies embarrassing, but the lies from Bush transferred billions, if not trillions, of public funds into the hands of a few and resulted in the deaths of many thousands of people.
Comparing lies about a blow job to lies resulting in debt and death is absurdity on a grand scale.
I think that people tend to underestimate the pervasive impact of regression toward the mean.
Even without "data snooping" (improperly reanalyzing your data post-hoc in multiple ways to find something that appears to be statistically significant), there is still going to be bias. If I do an experiment and I happen to "luck out" and get a large (i.e. larger than the "true" mean of an infinite number of observations) effect size just by chance, I am far more likely to do follow-up experiments than if I am unlucky and the effect size is small or the result is not statistically significant. If subsequent experiments asking the same question in different ways also give a statistically significant result, my belief in the phenomenon is reinforced even if the effect size is a bit smaller.
So I am far more likely to identify a real phenomenon if because of a statistical fluctuation I initially observe a larger effect size or a smaller standard error than the "true" value. And my figures from that initial study, showing a nice big effect and a small error bar are far more likely to pass peer review than if the effect size is smaller and the error bars are larger, even if the criterion for statistical significance is satisfied.
If I am unlucky, and I get a lot of variation and/or a small effect size (again, compared to the "true" value from an infinite number of experiments), there is a good chance that the experiment will go into a drawer. Perhaps I'll give up on the idea, or perhaps I'll try it again, but I'll improve the experimental design in a way that I hope will reduce the statistical variability or give me a larger effect size. Of course, if it "works," I'll pat myself on the back for solving the technical problem and go on to do follow-up studies, even though statistically speaking it may well be the case that the prettier result from the new design is itself just a statistical fluctuation.
Part of the problem is that by convention, we report a single value for effect size. Yes, some sort of estimate of standard deviation is appended, but what people remember is that single value. It simply is very hard for human beings to think in terms of statistical distributions. We tend to forget (even though we know it to be true in theory) that a statistically significant result does not show that our estimate of effect size is correct--all it tells us is that the effect size is unlikely to be zero.
Thus, we can predict, just on statistical grounds, that effect sizes will tend to decline ("regress" toward the "true" mean) over time with follow-up studies, based on the simple fact that those follow-up studies are far more likely to happen if the measured effect size was initially larger than the "true" value than if it was smaller. And as far as I know, nobody has been able to come up with any statistically rigorous way of estimating the magnitude of this unavoidable bias.
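The selection effect described above is easy to see in a toy simulation. This is a sketch under made-up assumptions: a known true effect of 0.2, a standard error of 0.1 per study, and a rule that a follow-up study only happens when the first study clears 1.96 sigma. None of these numbers come from the article, and in real research the true effect is exactly the thing nobody knows.

```python
import random
import statistics

def simulate(true_effect=0.2, std_error=0.1, n_labs=100_000, seed=0):
    """Return the mean effect of 'significant' first studies and of their follow-ups."""
    rng = random.Random(seed)
    initial, followup = [], []
    for _ in range(n_labs):
        first = rng.gauss(true_effect, std_error)
        # Only a first result that clears the significance bar gets followed up.
        if first / std_error > 1.96:
            initial.append(first)
            # The follow-up is an unbiased draw around the same true effect.
            followup.append(rng.gauss(true_effect, std_error))
    return statistics.fmean(initial), statistics.fmean(followup)

initial_mean, followup_mean = simulate()
print(f"true effect:             0.200")
print(f"mean 'significant' first: {initial_mean:.3f}")
print(f"mean follow-up:           {followup_mean:.3f}")
```

The first studies that survive the filter overstate the effect on average, while their follow-ups drift back toward the true value, so the "decline" shows up without any change in the underlying phenomenon.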