Why Published Research Findings Are Often False
Hugh Pickens writes "Jonah Lehrer has an interesting article in the New Yorker reporting that all sorts of well-established, multiply confirmed findings in science have started to look increasingly uncertain as they cannot be replicated. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only anti-psychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants. 'One of my mentors told me that my real mistake was trying to replicate my work,' says researcher Jonathan Schooler. 'He told me doing that was just setting myself up for disappointment.' For many scientists, the effect is especially troubling because of what it exposes about the scientific process. 'If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved?' writes Lehrer. 'Which results should we believe?' Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to 'put nature to the question,' but it now appears that nature often gives us different answers. According to John Ioannidis, author of Why Most Published Research Findings Are False, the main problem is that too many researchers engage in what he calls 'significance chasing,' or finding ways to interpret the data so that it passes the statistical test of significance, the ninety-five-per-cent boundary invented by Ronald Fisher. 'The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,'"
Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
"As the intrepid kobold companion continues his journey, he begins to wonder... if priests raises dead, why anybody die?
Even in academia, there's an establishment and people who are powerful within that establishment are rarely challenged. A new upstart in the field will be summarily ignored and dismissed for having the arrogance to challenge someone who's widely respected. Even if that respected figure is incorrect, many people will just go along to keep their careers moving forward.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
I see this as one more planted article in the mainstream press: "Science is there to mislead you, listen to fake news instead". The rising tide against education and critical thinking in the USA is reminiscent of the Cultural Revolution in China. It is even more ironic that the argument "against" metrics that usefully determine validity is couched in a pseudo-analytical format itself. At this point in the USA, most folks reading (even) the New Yorker have no idea what a p-value is or why these things matter, and they will just recall the headline "science is wrong". And then they wonder in Detroit why they can't make $100k a year anymore pushing the button on a robot that was designed overseas by someone else. You know, overseas, where engineering, science, etc. are still held in high regard.
That article is as flawed as the supposed errors it reports on. The author just "discovered" that biases exist in human cognition? The "effect" he describes is quite well understood, and is the very reason behind the controls in place in science. This is why we don't, in science, just accept the first study published, why scientific consensus is slow to emerge. Scientists understand that. It's journalists who jump on the first study describing a certain effect, and who lack the honesty to review it in the light of further evidence, not scientists.
After years of speculation, a study has revealed that scientists are, in fact, human. The poor wages, long hours, and relative obscurity that most scientists dwell in have apparently caused widespread errors, making them almost pathetically human and just like every other working schmuck out there...
I'll add another cause to the list. The "publish or perish" mentality encourages researchers to rush work to print often before they are sure of it themselves. The annual review and tenure process at most mid-level research universities rewards a long list of marginal publications much more than a single good publication.
Personally, I feel that many researchers publish far too many papers with each one being an epsilon improvement on the previous. I would rather they wait and produce one good well-written paper rather than a string of ten sequential papers. In fact, I find that the sequential approach yields nearly unreadable papers after the second or third one because they assume everything that is in the previous papers. Of course, I was guilty of that myself because if you wait to produce a single good paper, then you'll lose your job or get denied tenure or promotion. So, I'm just complaining without being able to offer a good solution.
The article can be viewed on a single page here: http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all
Not surprisingly, most of the posts so far show no signs of having actually RTFA.
Lehrer goes through all kinds of logical contortions to try to explain something that is fundamentally pretty simple: it's publication bias plus regression to the mean. He dismisses publication bias and regression to the mean as being unable to explain cases where the level of statistical significance was extremely high. Let's take the example of a published experiment where the level of statistical significance is so high that the result only had one chance in a million of occurring due to chance. One in a million is 4.9 sigma. There are two problems that you will see in virtually all experiments: (1) people always underestimate their random errors, and (2) people always miss sources of systematic error.
It's *extremely* common for people to underestimate their random errors by a factor of 2. That means that the 4.9-sigma result is really only a 2.45-sigma result. But 2.45-sigma results happen about 1.4% of the time. That means that if 71 people do experiments, typically one of them will result in a 2.45-sigma confidence level. That person then underestimates his random errors by a factor of 2, and publishes it as a result that could only have happened one time in a million by pure chance.
Missing a systematic error does pretty much the same thing.
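For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes Gaussian errors and a two-sided test, which the post does not state explicitly; the function and variable names are my own.

```python
import math


def two_sided_p(sigma):
    """Chance of a Gaussian fluctuation at least `sigma` standard
    deviations from zero, in either direction (two-sided)."""
    return math.erfc(sigma / math.sqrt(2.0))


claimed_sigma = 4.9          # significance as published
error_underestimate = 2.0    # assumed factor by which errors were underestimated

# If the real error bars are twice as large, the significance is halved.
actual_sigma = claimed_sigma / error_underestimate

p_claimed = two_sided_p(claimed_sigma)
p_actual = two_sided_p(actual_sigma)

# How many null experiments it takes, on average, to produce one such fluke.
experiments_per_fluke = 1.0 / p_actual

print(f"claimed {claimed_sigma} sigma: p = {p_claimed:.2e}")
print(f"actual {actual_sigma} sigma: p = {p_actual:.4f}")
print(f"about one fluke per {experiments_per_fluke:.0f} experiments")
```

Under those assumptions the numbers land close to the ones quoted: roughly one in a million for 4.9 sigma, and about 1.4% (one experiment in 70 or so) for 2.45 sigma.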
Lehrer cites an example of an ESP experiment by Rhine in which a certain subject did far better than chance at first, and later didn't do as well. Possibly this is just underestimation of errors, publication bias, and regression to the mean. There is also good evidence that a lot of Rhine's published work on ESP was tainted by his assistants' cheating: http://en.wikipedia.org/wiki/Joseph_Banks_Rhine#Criticism
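Publication bias plus regression to the mean is easy to see in a toy simulation. This is only a sketch under made-up assumptions: every study measures a true effect of exactly zero, results are standard-normal z-scores, and a study gets "published" only if it clears the usual 1.96-sigma cutoff. The function name, study count, and seed are arbitrary choices of mine.

```python
import random


def simulate(num_studies=10000, threshold=1.96, seed=0):
    """Run null studies, 'publish' only the significant ones, then
    replicate each published study once with fresh data."""
    rng = random.Random(seed)
    published = []
    replications = []
    for _ in range(num_studies):
        z = rng.gauss(0.0, 1.0)          # original study, true effect is zero
        if abs(z) > threshold:           # publication filter
            published.append(abs(z))
            # Independent replication of the same (nonexistent) effect.
            replications.append(abs(rng.gauss(0.0, 1.0)))
    mean_pub = sum(published) / len(published)
    mean_rep = sum(replications) / len(replications)
    return len(published), mean_pub, mean_rep


count, mean_published, mean_replicated = simulate()
print(f"{count} of 10000 null studies were 'published'")
print(f"mean |z| when published:  {mean_published:.2f}")
print(f"mean |z| when replicated: {mean_replicated:.2f}")
```

Every published result looks significant by construction, and the replications fall back toward what pure noise gives you. Nothing mysterious has to be going on for an effect to "decline" this way.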
Find free books.
Now science uses different math, and the results are expressed differently, even probabilistically. But in real science those probabilities are not what most people think of as probability. A scanning tunneling microscope, for instance, works by the probability that a particle can jump an air gap. Though this is probabilistic, it is well understood, and so it allows us to map atoms. There is minimal uncertainty in the outcome of the experiment.
The research talked about in the article may or may not be science. First, anything having to do with human systems is going to be based on statistics. We cannot isolate human systems in a lab. The statistics used are very hard. From discussions with people in the field, I believe they are every bit as hard as the math used for quantum mechanics. The difference is that much of the math is codified in computer applications, and researchers do not necessarily understand everything the computer is doing. In effect, everyone is using the same model to build results, but may not know if the model is valid. It is like using a constant-acceleration model for a case where there is jerk. The results will be not quite right. However, if everyone uses the faulty model, the results will be reproducible.
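To make the constant-acceleration analogy concrete, here is a small sketch with made-up numbers. The acceleration, jerk, and time values are purely illustrative, and the two "labs" are hypothetical; the point is only that a shared wrong model gives agreeing answers that are still off.

```python
def true_position(t, accel, jerk):
    """Distance travelled from rest when acceleration itself changes
    at a constant rate (constant jerk)."""
    return 0.5 * accel * t**2 + jerk * t**3 / 6.0


def constant_accel_model(t, accel):
    """The simplified model everyone uses: jerk is ignored."""
    return 0.5 * accel * t**2


# Illustrative values only (not from any real experiment).
t, accel, jerk = 10.0, 2.0, 0.3

# Two independent labs apply the same simplified model to the same inputs.
lab_a = constant_accel_model(t, accel)
lab_b = constant_accel_model(t, accel)

actual = true_position(t, accel, jerk)
model_error = actual - lab_a

print(f"lab A: {lab_a}, lab B: {lab_b} (they agree)")
print(f"actual: {actual}, shared model error: {model_error}")
```

The two labs reproduce each other perfectly, yet both miss the real answer by the same amount, so agreement between them says nothing about whether the model is valid.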
Second, the article talks about the drug dealers. The drug dealers are like the Catholic Church of Galileo's time. The purpose is not to do science, but to keep power and sell product. Science serves as a process to develop product and minimize legal liability, not to explore the nature of the universe. As such, calling what any pharmaceutical company does the 'scientific method' is at best misguided.
The scientific method works. The scientific method may not be completely applicable to fields of study that try to find things that often, but not always, work in a particular case. The scientific method is also not resistant to group illusion. This was the basis of 'The Structure of Scientific Revolutions'. The issue here, if there is one, is the lack of education about the scientific method, which tends to make people give individual results more credence than is rational, or think that it is some sort of magic.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Agreed. Way too many papers from academia are ZERO value added. Most are a response to "publish or perish" realities.
Case in point: One of my less favorite profs published approximately 20 papers on a single project, mostly written by his grad students. Most are redundant papers taking the most recent few months' data and producing fresh statistical numbers. He became department head, then dean of engineering.
As a design engineer I find it maddening that 95% of the journals in the areas I specialize in are:
1. Impossible to read (academia style writing and non-standard vocabulary).
2. Redundant. Substrate integrated waveguide papers for example are all rehashes of original waveguide work done in the 50's and 60's, but of generally lower value. Sadly the academics have botched a lot of it, and for example have "invented" "novel" waveguide to microstrip transitions that stink compared to well known techniques from 60's papers.
3. Useless. Most, once I decipher them, end up describing a widget that sucks at the intended purpose. New and "novel" filters should actually filter, and be in some way as good as or better than the current state of the art, or they should not be published at all.
4. Incomplete. Many interesting papers report on results, but don't describe the techniques and methods used. So while I can see that University of Dillweed has something of interest, I can't actually utilize it.
So as a result, when I try to use the vast number of published papers and journals in my field, and in niches of my field in which I am darn near an expert, I cannot separate the wheat from the chaff. Searches yield time-wasting, useless results, many of which require laborious deciphering before I can figure out that they are stupid or incomplete. Maybe only 10% of the time does a day-long literature search yield something of utility. Ugh.
Hard sciences simply lend themselves a lot better to repeatability. Where I think we go wrong is assigning the same certainties to the claims of the soft scientists.
Granted that hard sciences are probably more reliable, but unfortunately, a lot of the research even there is shaky. I overheard roughly the following conversation between a graduate student in mathematics and his thesis adviser one summer, while I was doing undergraduate summer math research at the CUNY Graduate Center on an NSF grant (RTG):
Even if high-profile results are more reliable in the hard sciences, your average paper is still unreproducible garbage. The problem is the system, which forces everyone to publish as much as possible without heed to quality; and the journals, which publish only positive results. Researchers need to publish all their results publicly, including registering their hypotheses before they even begin the study. Universities need to take a stand by not focusing on quantity of publications. More emphasis must be placed on repeatability.
The people who treat this kind of finding as an attack on science are perpetuating the problem. We should be looking to make the scientific process ever better and more accurate as we come to understand its pitfalls better, not shrug off its inadequacies as inevitable.
MediaWiki developer, Total War Center sysadmin