Why Published Research Findings Are Often False
Hugh Pickens writes "Jonah Lehrer has an interesting article in the New Yorker reporting that all sorts of well-established, multiply confirmed findings in science have started to look increasingly uncertain as they cannot be replicated. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only anti-psychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants. 'One of my mentors told me that my real mistake was trying to replicate my work,' says researcher Jonathan Schooler. 'He told me doing that was just setting myself up for disappointment.' For many scientists, the effect is especially troubling because of what it exposes about the scientific process. 'If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved?' writes Lehrer. 'Which results should we believe?' Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to 'put nature to the question,' but it now appears that nature often gives us different answers. According to John Ioannidis, author of 'Why Most Published Research Findings Are False,' the main problem is that too many researchers engage in what he calls 'significance chasing,' or finding ways to interpret the data so that it passes the statistical test of significance, the ninety-five-per-cent boundary invented by Ronald Fisher. 'The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,'"
fail
Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
"As the intrepid kobold companion continues his journey, he begins to wonder... if priests raises dead, why anybody die?
The article says "this phenomenon doesn't yet have an official name," [yet] but it actually does. It's called "lying".
"If you want to know what happens to you when you die, go look at some dead stuff."
Even in academia, there's an establishment and people who are powerful within that establishment are rarely challenged. A new upstart in the field will be summarily ignored and dismissed for having the arrogance to challenge someone who's widely respected. Even if that respected figure is incorrect, many people will just go along to keep their careers moving forward.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
The scientist's favorite song:
The best things in life are free
But you can keep 'em for the birds and bees
Now give me money (that's what I want)
That's what I want (that's what I want)
That's what I want (that's what I want), yeah
That's what I want
Given the political environment of the last presidential administration, and what it did to science, this is much worse than it might initially seem.
After years of speculation, a study has revealed that scientists are, in fact, human. The poor wages, long hours, and relative obscurity that most scientists dwell in have apparently caused widespread errors, making them almost pathetically human and just like every other working schmuck out there. Every major news organization south of the Mason-Dixon line in the United States and many religious organizations took this to mean that faith is better, as it is better suited to slavery, long hours, and no recognition than science, a relatively new kind of faith that has only recently received any recognition. In other news, the TSA banned popcorn from flights on fears that the strong smell could cause rioting from hungry and naked passengers who cannot be fed, go to the bathroom, or leave their seats for the duration of the flight for safety reasons....
#fuckbeta #iamslashdot #dicemustdie
I see this as one more planted article in the mainstream press: "Science is there to mislead you, listen to fake news instead". The rising tide against education and critical thinking in the USA is reminiscent of the Cultural Revolution in China. It is even more ironic that the argument "against" metrics that usefully determine validity is couched in a pseudo-analytical format itself. At this point in the USA, most folks reading (even) the New Yorker have no idea what a p-value is or why these things matter, and they will just recall the headline "science is wrong". And then they wonder in Detroit why they can't make $100k a year anymore pushing the button on a robot that was designed overseas by someone else - you know, overseas, where engineering, science, etc. are still held in high regard.
I'm a scientist myself. It's quite clear from where I'm standing that to get good jobs, research grants, etc one needs plenty of published articles. Whether the conclusions of those are true or false is not something that hiring committees will delve into too much. If you are young and have a family to support, it can be tempting to take shortcuts.
The article falsely gives a sense of "increasing junk"
- Since there is tangible progress in the field of medicine (don't know about others), we must be doing something right.
- Clearly the total scientific output is increasing and the junk is bound to increase. What matters is percentage, not the absolute count.
- The New Yorker article cites a few hand picked cases, that's all this 5 page article is based on?
This article has already been taken apart by P.Z. Myers in a blog post on Pharyngula. Here's his conclusion:
Basically, it's not like anyone's surprised at this.
The New Yorker article is well written and informative. It's clearly not assuming that there is something wrong with the scientific method, but just asks - could there be? There is an excellent reply by George Musser at "Scientific American" http://cot.ag/hWqKo2
This is what I call interesting and engaging public discussion and journalism.
user@ubuntubox:~$ stfu This server is going down for shutdown NOW!
I was actually about to feed the troll. I was 2 sentences in before going "oh... right."
Most science is funded by government, and you don't get more funding if your data shows that "everything is OK, no further research needed."
So of course the results aren't reproducible, they are fiction in the first place!
~ now you know
Some of the things I've taken comfort in as I age are:
But if the fundamental indicator of that progress, published scientific results, contains a potentially large and unknown degree of misinformation, then my hopes are called into question.
I mean, obviously some progress is being made. We see that in the life expectancy statistics, in cancer survival rates, etc. But how much potential are we missing due to bogus publications?
This isn't about issues with the scientific method. Intellectually honest scientists don't care about how studies/experiments turn out. They have no vested interest or sought outcome. However, this article is not about honest scientists at all; it's about well-known frauds who found that their results didn't match their beliefs, and so they made up an excuse for why.
Too many researchers are eager to torture the data until they confess to something. Sometimes the data just don't have anything conclusive to say.
> 'Which results should we believe?'
What a ridiculous question. How about the results that are replicated, accurately, time and time again, and not ones that aren't based off of scientific theory, or failed attempts at scientific theory?
That article is as flawed as the supposed errors it reports on. The author just "discovered" that biases exist in human cognition? The "effect" he describes is quite well understood, and is the very reason behind the controls in place in science. This is why we don't, in science, just accept the first study published, why scientific consensus is slow to emerge. Scientists understand that. It's journalists who jump on the first study describing a certain effect, and who lack the honesty to review it in the light of further evidence, not scientists.
I'm not sure about ecology, but psychology and medicine are definitely not science, nor have they ever been science.
Probably the best indictment of psychology as a pseudo-science I've ever seen is "The Trauma Myth: The Truth About the Sexual Abuse of Children--and Its Aftermath" by Susan Clancy.
She herself is basically a scientist, she engages in testing hypotheses in order to determine their validity and has been willing to set aside ones that were demonstrated to be false in favor of better ones. But, unfortunately, most in her field are charlatans.
> Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
Precisely. As mentioned in a Scientific American blog:
"The difficulties Lehrer describes do not signal a failing of the scientific method, but a triumph: our knowledge is so good that new discoveries are increasingly hard to make, indicating that scientists really are converging on some objective truth."
All science is either true or false there is no in between!
In my field, I have noticed that the grant writing cycle often drives researchers to propose doing things that are inherently difficult to do outside a particular setting (e.g. an academic medical center), but which are helpful in getting funding for research. One of the undesirable consequences of such research is that it is difficult to reproduce the exact setting (and consequently the results) elsewhere, so it can lead to findings that have limited external validity.
> The article says "this phenomenon doesn't yet have an official name," [yet] but it actually does. It's called "lying".
For today's PC Nazis, I prefer "non reproducible truth".
:-P
This is the natural outcome of 'publish or perish.' If keeping your job depends almost solely on getting 'results' published, you will find those results.
Discovery is more prestigious than replication. I don't see how to fix that.
After a piece of research is published, there is plenty of time for someone to test it or find an experiment that disproves it (this could still happen with relativity). The same mechanism is at play here as with Murphy's laws, where we only notice when something goes wrong: we don't count the findings that are not yet disproved, only the disproved ones, so "often" could be misleading.
If you had bothered to read the fucking article instead of jumping to some half assed conclusion you would see that the article has nothing to do with lying.
It's not "the oil companies have paid scientists to lie about science"
It's "I'm fascinated that trends I detected early in my research seem to fall apart as I continue to investigate"
Anyway.. thanks for lowering the level of discussion on /. even further, douche.
That's not a given. Particularly in the soft sciences - psychology, for instance - it is extremely difficult to control for all factors (I'm more inclined to say nearly impossible) and so replication of results can be subsumed by other effects, or even simply not work at all. You know that whole generation gap thing? That's a good example of groups of people who are different enough that the reactions they will have to certain subject matter can be polar opposites. So something that was "definitively determined" in 1960 may be statistically irrelevant among the current generation.
That's just one example of how squishy this all is. Without having to bring lying into it at all. And then, there will be liars; and there will be people who draw conclusions without scientific rigor at all, simply because it's just too difficult, expensive or time-consuming to attempt to confirm the ideas at hand. And there is the outlier personality; the one who accounts for those other few percent -- all the declarations of "this is how it is" are false for them right out of the gate.
Hard sciences simply lend themselves a lot better to repeatability. Where I think we go wrong is assigning the same certainties to the claims of the soft scientists. I have personally seen psychiatrists, best intent not in doubt, completely err in characterizing a situation to the great detriment of the people involved, because the court took the psychiatrist's word as gospel truth.
All science is an exercise in metaphor, but soft science is an exercise of metaphor that is almost always far too flexible. One place you can see this happening is the trendy / cyclic adherence to Freud, Jung, Maslow, Rogers and so forth... the "correct" way to raise babies... Ferberizing, etc. This stuff isn't generally lies at all, but it also generally isn't "right." Good intentions do not automatically make good science.
Serious medicine is another good example. Something that might work very well for you might not work at all for me; get the wrong group of test subjects, and your results will skew or worse. This is an area that I think is fair to call a hard science, but where we just don't know enough about the systems involved. Generally speaking, I don't think our oncologist lies to us; further, I think he's pretty well aware of the limitations of his practice and the state of knowledge that informs it; but they just don't know enough. To which I hopefully add, "yet."
On a personal level - since that's all I can really affect - I treat soft science about the same way I do astrology. If you believe it, you'll probably attempt to modify your behavior because of the predictions, which in turn may, or may not, affect your actual outcome. If you don't, it's either irrelevant or too uncertain to trust anyway. So it's low confidence all the way.
I do, however, still place very high confidence in Boyle's law for gasses. Hard science works very well. :)
I've fallen off your lawn, and I can't get up.
Before you can question the scientific method through experimentation you first must understand and utilize the scientific process. That last quote is a massive clue that the issue is that they are stepping away from the scientific process and trying to force an answer.
I'll go read the article but before I do I'll just note that in working in semiconductor manufacturing and development both the scientific process and statistical significance are at the core of resolving problems, maintaining repeatable manufacturing and developing new processes and products. And from my 20 years of experience the scientific process worked just fine and when results were not reproducible then you had more work to do but you didn't decide that science no longer worked and that the answer simply changed.
I can guarantee that if we throw away the scientific process and no longer rely on peer review and replication then all those fun little gadgets everyone enjoys these days will become a thing of the past and we'll enter into the second dark age.
This just proves that "science" is a load of bullshit. The creationists are right.
Now, where did I put my leeches. I feel a cold coming on...
Similar to the upcoming US election results
The article can be viewed on a single page here: http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all
Not surprisingly, most of the posts so far show no signs of having actually RTFA.
Lehrer goes through all kinds of logical contortions to try to explain something that is fundamentally pretty simple: it's publication bias plus regression to the mean. He dismisses publication bias and regression to the mean as being unable to explain cases where the level of statistical significance was extremely high. Let's take the example of a published experiment where the level of statistical significance is so high that the result only had one chance in a million of occurring due to chance. One in a million is 4.9 sigma. There are two problems that you will see in virtually all experiments: (1) people always underestimate their random errors, and (2) people always miss sources of systematic error.
It's *extremely* common for people to underestimate their random errors by a factor of 2. That means that the 4.9-sigma result is only a 2.45-sigma result. But 2.45-sigma results happen about 1.4% of the time. That means that if 71 people do experiments, typically one of them will result in a 2.45-sigma confidence level. That person then underestimates his random errors by a factor of 2, and publishes it as a result that could only have happened one time in a million by pure chance.
Missing a systematic error does pretty much the same thing.
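For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes normally distributed errors and two-sided tail probabilities, which is what makes "one in a million" come out near 4.9 sigma; the function and variable names are made up for illustration.

```python
import math


def two_sided_p(sigma: float) -> float:
    """Chance that a standard normal value lies more than `sigma`
    standard deviations from zero, in either direction."""
    return math.erfc(sigma / math.sqrt(2.0))


claimed_sigma = 4.9        # significance as published
underestimate = 2.0        # assumed factor by which random errors were underestimated
actual_sigma = claimed_sigma / underestimate

p_claimed = two_sided_p(claimed_sigma)
p_actual = two_sided_p(actual_sigma)

print(f"claimed: {claimed_sigma:.2f} sigma, p = {p_claimed:.2e}")
print(f"actual:  {actual_sigma:.2f} sigma, p = {p_actual:.4f}")
print(f"experiments per fluke of this size: about {1.0 / p_actual:.0f}")
```

Under those assumptions the halved result has a chance probability of roughly 1.4%, so about one experiment in seventy produces it with no real effect present.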
Lehrer cites an example of an ESP experiment by Rhine in which a certain subject did far better than chance at first, and later didn't do as well. Possibly this is just underestimation of errors, publication bias, and regression to the mean. There is also good evidence that a lot of Rhine's published work on ESP was tainted by his assistants' cheating: http://en.wikipedia.org/wiki/Joseph_Banks_Rhine#Criticism
Find free books.
If science has become about "good enough" statistical analysis then many of our scientific truths are actually scientific "truths."
We have far too much politically motivated scientific "research" and too many paid-for "reports" and "studies" that amount to Photoshop Science. Shouldn't we demand more from scientists so we can discredit the "scientists?"
Only the dead have seen the end of War. - Plato
Now science uses different math, and the results are expressed differently, even probabilistically. But in real science those probabilities are not what most people think of as probability. A scanning tunneling microscope, for instance, works by the probability that a particle can jump an air gap. Though this is probabilistic, it is well understood, so it allows us to map atoms. There is minimal uncertainty in the outcome of the experiment.
The research talked about in the article may or may not be science. First, anything having to do with human systems is going to be based on statistics. We cannot isolate human systems in a lab. The statistics used are very hard. From discussions with people in the field, I believe they are every bit as hard as the math used for quantum mechanics. The difference is that much of the math is codified in computer applications, and researchers do not necessarily understand everything the computer is doing. In effect, everyone is using the same model to build results, but may not know if the model is valid. It is like using a constant acceleration model for a case where there is jerk. The results will be not quite right. However, if everyone uses the faulty model, the results will be reproducible.
Second, the article talks about the drug dealers. The drug dealers are like the Catholic Church of Galileo's time. The purpose is not to do science, but to keep power and sell product. Science serves as a process to develop product and minimize legal liability, not to explore the nature of the universe. As such, calling what any pharmaceutical company does the 'scientific method' is at best misguided.
The scientific method works. The scientific method may not be completely applicable to fields of study that try to find things that often, but not always, work in a particular case. The scientific method is also not resistant to group illusion. This was the basis of 'The Structure of Scientific Revolutions'. The issue here, if there is one, is the lack of education about the scientific method, which tends to make people give individual results more credence than is rational, or treat them as some sort of magic.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Worth pointing out that, if you do 20 hypothesis tests in a study, all at the 5% level, your expectation should be that approximately 1 of your conclusions is false.
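To put numbers on the point above, here is a small sketch. It assumes every null hypothesis is actually true and that the tests are independent; both are simplifying assumptions, and the function names are made up for illustration.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Average number of tests that reject a true null hypothesis."""
    return num_tests * alpha


def prob_at_least_one(num_tests: int, alpha: float) -> float:
    """Chance that one or more independent tests reject a true null."""
    return 1.0 - (1.0 - alpha) ** num_tests


tests, level = 20, 0.05
print(f"expected false positives: {expected_false_positives(tests, level):.2f}")
print(f"chance of at least one:   {prob_at_least_one(tests, level):.3f}")
```

With 20 tests at the 5% level, the expected count is one, and the chance of at least one spurious "finding" is about 64%.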
Also, between subjective errors, regression to the mean, and publication bias, it's not surprising that at least some of these major results turn out to be wrong....
Actually, it's a whole conflated mish-mash of things, including as you say straight-out lying, but also cherry-picking, cognitive bias, statistical naivety and a whole bunch more things.
And really the headline should be "Checking for reproducibility really does identify flawed results: Science works".
There are 2 types of valid study: an experimental investigation that tries to test the prediction of a theory to either confirm or disprove it, or a study that attempts to quantify an observed phenomenon.
Fishing expeditions (let's see if ESP is real, let's see if random compounds do something for condition x) are not valid, for all the reasons outlined in the article, unless they produce results that are stone cold solid. One example of an investigation of this type that has worked is the mapping of supernovae to redshift that revealed dark energy (or yet another reason to stop believing anything about cosmology, whatever you want to call it). They were mapping the sky and found that all the models of the universe were utter bollocks. (Note: any theory that fails to account for 90% of the known physical conditions that it attempts to derive is utter rubbish, and no amount of bum squeezing carping whilst pointing to nonsense sums will make up for it. When you can explain mass we will talk; when you can explain non-baryonic matter I will sit and listen.)
Interestingly though that study, which no one can argue with (cos you can look at the sky and see it if you have a few thousand $ of kit) has been dealt with by the cosmology community with a name (dark energy) and a few sheepish looks.
--------------------------------------------- "In the end, we're all just water and old stars."
Taking an example from a discipline and condemning the whole discipline for it is not intelligent. I mean, I could take some aspects of evolution and point at how biologists study them, and claim it is science - when compared to what most other disciplines do, the rigor is laughable.
Basically, there are two camps in psychology: Those who rigorously follow the scientific method, and those who loosely follow it. Declaring a whole discipline as not science would be like declaring biology not to be science.
Beetle B.
There are ways of controlling for that such that you have a 5% level for the 20 test combination. Sadly, I believe that many researchers without a solid statistical background do just as you suggest.
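One standard way of doing that is the Bonferroni correction, with the Sidak correction as a slightly less conservative variant. The sketch below is only an illustration; the comment above does not say which method it has in mind, the family-wise check assumes independent tests, and the function names are made up.

```python
def bonferroni_alpha(family_alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate
    at or below `family_alpha` (Bonferroni correction)."""
    return family_alpha / num_tests


def sidak_alpha(family_alpha: float, num_tests: int) -> float:
    """Per-test threshold that gives exactly `family_alpha` family-wise,
    assuming the tests are independent (Sidak correction)."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / num_tests)


def family_wise_error(per_test_alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** num_tests


tests, level = 20, 0.05
for name, fn in (("Bonferroni", bonferroni_alpha), ("Sidak", sidak_alpha)):
    per_test = fn(level, tests)
    overall = family_wise_error(per_test, tests)
    print(f"{name}: per-test alpha = {per_test:.5f}, family-wise = {overall:.4f}")
```

The price is statistical power: each individual test now has to clear a threshold of about 0.0025 instead of 0.05.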
Just last week I had a frank discussion with a former surgeon general about the predecessor article referenced on Slashdot, from the Atlantic: "Lies, Damned Lies, and Medical Science."
He noted that there was a lot of truth to the article. We discussed a few bases for this phenomenon, most notably:
1. The money: Researchers need funding, but funding is often effectively conditional on a finding of conclusions favourable to the funder (which funders are often either big pharmaceuticals or big governments);
2. The stigma: A "failure" to "prove" a hypothesis reflects poorly on a researcher, so they often choose topics that are:
(a) irrelevant and so unlikely to ever be tested in the future; or
(b) trite and so unlikely to fail.
We have created a self-perpetuating system of "research" that leads to few useful results in the form of valuable hypotheticals being tested. Where potentially valuable hypotheses are being "tested" the methods used are often contrived so as to reach a specific conclusion, and unconcerned with the truth. These facades of research designed so as to reach specific conclusions allow companies and governments to market product and policy decisions, respectively, which they consider favourable.
All to say, the finding of a useful truth, although supposedly the object of scientific research and generally considered to be at least an incidental consequence of our economic system through e.g. the market's invisible hand, is in practice in the Western world at best irrelevant and at worst heavily counter-incentivized.
The absence of consequence – the curse of affluence – serves to perpetuate an increasing disconnect between reality and the publications that peddle the results of research.
Did you even read the article?
This is basically about poorly designed clinical drug trials without sufficient controls. Sloppy work, even if it seemed rigorous enough at the time.
The sensationalistic "scientific method in question" stuff is pure BS, but after all this is New Yorker magazine we're talking about, so one wouldn't expect too much scientific literacy. It was the scientific method of "predict and test" that caught these erroneous results, so the method itself is fine. The "scientist" who designed a sloppy experiment is to blame, not the method.
However, I'm not sure that psychiatric drug trials even deserve to be called science in the first place. The principle of GIGO (Garbage In - Garbage Out) applies. This is touchy-feely soft science at best. How do you feel today on a scale of 1-10? Do the green pills make you happy?
"This phenomenon doesn't yet have an official name..."
Sure it does: Publication Bias. It's even mentioned in the article itself: "Jennions, similarly, argues that the decline effect is largely a product of publication bias..." (p. 3 of the linked online article).
Unfortunately, the New Yorker has gotten in the habit of publishing articles in the vein of "Enormous scientific existential mystery!... Or actually, it's a standard topic that's been known for decades". Methinks someone got snookered by the 1st-page article headline/hype.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
I remember some time in the '80s, a doctor published some "research" that claimed to show that abused children could be identified by how they reacted to a pencil shoved into their anus. Yes, really! Unfortunately, doctors think they are scientists and for the most part, they are not, so they did not properly evaluate the methods used for this "research". The real shame of this was that some doctors actually used this "method" to identify supposedly abused children, with all the attendant hurt and distress that these false accusations caused.
ANY "scientific" finding that cannot be replicated must be called into question and absolutely not allowed standing in the domain as "fact". That is the entire purpose of the scientific method. If you cannot replicate your findings, then either your hypothesis is wrong, or your methods are flawed. In either case, you are back to square one, but with knowledge that may help in your next efforts.
Sometimes, real fast is almost as good as real-time.
In one of the first articles I published, I naively described the experiment in detail, including the details of the (micro)device fabrication. One of the reviewers was scathing because "the article includes _too_ many details about the experiment". So how the fuck are my peers going to reproduce my results (or try, at least) without these details?
Still, I got "wiser" later on, and relented on the details. Much smaller rejection rate, after that. Sadly.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Climate change?
Science is hard.
Being unbiased (or more realistically, being aware of possible biases and checking for their influence on your research) is hard.
Funding which doesn't include the proper checks and balances to protect scientific independence can make science even harder.
And since when does scientific progress rely on seemingly significant results from a single unreproducible experiment?
The surest sign that intelligent life exists elsewhere in the universe is that none of it has tried to contact us.
I haven't RTFA, but don't the FINDINGS of the article in consideration apply to ITSELF? Shouldn't the article be titled "Why Published Research Findings Are Often False, Except THIS ONE!!!"
This has already been nicely deconstructed by PZ Myers here: http://scienceblogs.com/pharyngula/2010/12/science_is_not_dead.php
Or, to put it more charitably, medicine and psychology are describing far more complex phenomena than we like to admit.
For example, in psychiatric genetics, there are dozens of articles every year that find a new gene associated with a common and important condition (e.g. autism, schizophrenia, depression). After each new finding comes out, there are dozens of labs that try to replicate that finding, usually one or two replicate (or partially replicate) the finding, and five or six don't replicate it. Why is it so hard to replicate these findings? Probably because there are really dozens of independent genes that contribute to these complex disorders (probably in combination with each other), and some populations tend to have mutations in one set, while other populations tend to have mutations in another set.
We're moving towards understanding, but the disorders are far more complex than the assumption that there will be a single cause.
So basically the idea is that scientists fudged their results to get past p < 0.05 and then find they can't repeat it. Sounds a lot more like a lack of rigor. Not that this is surprising. Instead of forming a hypothesis and testing a null hypothesis, researchers do the above and then, if their null isn't falsified, go hunting through data for suggestive data so they can publish - on top of the massive amount of data fudging that goes on (and which I have personally seen when people shave off decimal places in their favor or find excuses to omit people in data sets that contradict the desired goal). The journals, of course, encourage this by only plugging positive results and the institutions, wanting long resumes for prestige, help them along.
Scientists are just humans. They want their work to matter. They want to outshine their peers. And of course they want money.
So, they are under tremendous pressure to produce stuff that is awesome. It takes nearly super-human self-discipline to properly apply the scientific method when doing so means you gather a bunch of data that is relatively boring, leaves you relatively nameless, and doesn't get you any grant money next year.
...medicine and biochemistry. In some of them 95% confidence is considered utterly inadequate.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
I wonder whether part of this is unnoticed assumptions that all parties make when confirming results. Then much later when people's assumptions are different, nobody can duplicate the results anymore. Sort of like in any field there's plenty of knowledge that isn't documented well, and gets lost across generations. Sucks. It's like all of a sudden being unable to read anything from your storage media and not knowing why.
I have done research, and tried to be rigorous. But - who knows what errors I may have made unknowingly? Did anyone try to independently reproduce my results? Almost certainly the answer is "no".
The point is: most researchers want to do original research. Very few research results are ever independently reproduced. If the initial researchers made implicit assumptions, if their work was affected by an external factor they failed to note - there are innumerable reasons their results might not hold up.
In regards to people-oriented research (medicines, etc), there can be any number of confounding factors. This should not really be terribly surprising to anyone who has done serious research. The problem may be more the fact that there is no system in place to arrange for verification of important results...
Enjoy life! This is not a dress rehearsal.
I stopped reading after the author said three times in the first page that science was "proving" this or that. Unless you are a mathematician, you are not proving anything. So I can't really take this guy too seriously.
The scientific process is basically about experimenting/analyzing/hypothesizing/ruminating. Good scientists are overwhelmingly conservative in their conclusions because good scientists understand "the box" within which they are working.
The fact that early studies are overturned with new analysis is exactly what makes the scientific process so powerful. When new studies call into question the results of earlier studies it is called progress. If a new study shows that a previous study used questionable statistical approaches, then future reviewers can cite this new knowledge to keep new studies from using these flawed approaches. The scientific process and the peer-review process is certainly not perfect, but I have yet to hear from its detractors of a better alternative.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
As the other guy said, don't let one bad apple ruin the whole bunch. There's plenty we know about psychology, that we really do know. The Lake Wobegon Effect for one -- if you look for it, you find it. The Fundamental Attribution Error is another. People make money (and politics) by exploiting these effects.
A related thought occurred to me about surveys and their margin of error.
http://en.wikipedia.org/wiki/Margin_of_error
http://en.wikipedia.org/wiki/Confidence_interval
Surely as thousands of surveys are published each day, some of their results will fall outside of the stated confidence interval.
Some of them will simply be quite wrong, useless, and/or misleading, without the individual publisher having any fault at all.
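A rough sketch of that point, for anyone who wants to see it in numbers. The setup is an assumption made up for illustration: every survey polls 400 people about a yes/no question whose true answer is 50%, and each one reports the usual 95% margin of error. About one survey in twenty should miss the true value even though nobody did anything wrong.

```python
import math
import random

def fraction_outside_interval(true_p=0.5, n_respondents=400, n_surveys=1000, seed=42):
    """Fraction of simulated surveys whose 95% interval misses the true proportion."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_surveys):
        # Each respondent answers "yes" with probability true_p.
        yes = sum(1 for _ in range(n_respondents) if rng.random() < true_p)
        p_hat = yes / n_respondents
        # Standard (Wald) 95% margin of error for a proportion.
        margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n_respondents)
        if abs(p_hat - true_p) > margin:
            misses += 1
    return misses / n_surveys

miss_rate = fraction_outside_interval()
print(f"{miss_rate:.1%} of simulated surveys missed the true value")
```

The simulated pollsters here are all honest and competent; the misses come purely from sampling noise.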
Stephan
http://stephan.sugarmotor.org
Find out who funded the Research, and if they had a vested interest in the results turning out a specific way you can probably be sure in which way they turn out.
The law of gravity should not be confused with the force of gravity. The article writer made a horrendous assertion that they were the same and that somehow a scientific law was wrong "some of the time". Journalists shouldn't write about what they don't understand as it can be quite damaging to the public at large. I wish he had made that assertion in the first paragraph rather than the last so that I could have saved myself from reading 5 pages of supposition.
Think globally but act within local variable scope.
Surely the level of detail in describing the experiment should match the level of specificity of the claim/theory it aims to prove/disprove.
If your claim is that a mixture of 50% dog shit and 50% ketchup will spontaneously ignite, then it shouldn't matter what type of container you used to prove the result. If you say the container type matters, then go ahead and specify what you used.
How you fabricated your test equipment would seem to be entirely irrelevant unless your theory was that it matters.
I stopped reading as soon as I saw the word "psychology" (which is a make work project for med school wash outs who couldn't stomach the sight of blood).
It starts off heavily implying that reality itself is somehow changeable
The unacknowledged problem is that the scientist is a part of his experiment. Scientists are humans with expectations, and cannot be impartial observers.
The interaction is usually subtle, but is always present. There was another placebo story last week: Placebos Work -- Even Without Deception. The most important part of this latest placebo study was the wording the doctors used as they handed the patients their sugar pills.
Learn the rules so you know how to break them properly.
www.teslabox.com
In the medical field everything is about "statistical error" -- that was great when you had studies with low numbers of subjects. With small N, statistical uncertainties are reckoned to be large and generally are large relative to the systematic uncertainties due to model assumptions. As medical science grows up studies get larger and statistical uncertainties shrink -- eventually you can no longer ignore the systematic uncertainties. For years it's been nearly impossible to get a publication in particle physics without two sets of error bars: value +/- stat error +/- syst error. It's hard to determine systematic errors so most people shove their fingers in their ears and sing "blah blah blah".
Google "average height adult male" and somewhere at the top of the list will be a webpage from the NIH with lots of tables breaking down height and weight by age group.
The columns are:
Age, Subject N, Mean Height, Std. Dev. of Mean, etc...
Now, just glancing at the tables, the avg height for a 20-30yr old male was 69.2 inches. But the next column was 0.012in. I was like "huh?" because the distribution of the population is certainly not 69.2 \pm 0.012 in. But then I read the headings again, and it's the standard deviation of the mean. With an N=1000, 0.012 becomes 0.012 * sqrt(1000) \approx 0.4, which sounds a bit more accurate.
But boy oh boy, 0.012 sure looks like a much tighter measurement than 0.4. And 0.4 is the sigma for the actual population.
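The conversion above is just SD = SEM * sqrt(N). A minimal sketch, using the figures as quoted in this comment (0.012 in and N = 1000 are the commenter's recollection, not values checked against the NIH tables):

```python
import math

def sd_from_sem(sem, n):
    """Recover the sample standard deviation from the standard error of the mean."""
    return sem * math.sqrt(n)

sem = 0.012  # inches, standard error of the mean as quoted above
n = 1000     # sample size assumed above
sd = sd_from_sem(sem, n)
print(round(sd, 2))  # 0.38
```

The standard error says how well the mean is pinned down; the standard deviation says how spread out the individuals are. The two differ by a factor of sqrt(N), which is why the column looks so tight.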
"When the experiments are done, we still have to choose what to believe."
BULLSHIT.
read this, it's a better response than I can put together right now:
http://www.sciencebasedmedicine.org/?p=8987
The Kruger Dunning explains most post on
"We have created a self-perpetuating system of "research""
No, what you described is what some researchers may do, not the whole system.
The Kruger Dunning explains most post on
"This phenomenon doesn't yet have an official name,"
It's called CHEATING.
The problem is calling these fields as "science" and these people as "scientists".
The most hilarious one is the "Science of Economics".
It's all right if the subject is too complex and we don't yet have better ways to study it. The best people available have done their best in studying the field, whichever method they adopt. There is nothing more we can ask for.
Except, just don't call it the fucking SCIENCE.
I agree with the post and there are two reasons for bad statistics: being lazy/dumb and money.
Here in Belgium, research facilities (from a university) are paid on certain grounds.
One of them is the number of publications in your facility: the more publications, the more money.
They feel the pressure and it's obvious that the quality of research is going down.
There are some journals which aren't very attentive to the statistics, so they publish what other journals wouldn't.
But more and more researchers are seeing what is happening, so in the next five or ten years, I think it's going to change.
There is an equilibrium and soon, it will turn over to the other side.
Research will be scientific again, if they figure out a way to fund researchers appropriately.
(I am only speaking for psychology, I don't know anything about other domains.)
People tend to forget that science is done by scientists, who vary in quality, and have up and down weeks and years....
(even T Woods or M Jordan or R Stallman has a bad day)
as for the Ioannidis article, I would be very surprised if more than 2 or 3 slashdotters know enough statistics to comment; I do know that I have looked at the article, and it is incomprehensible; further the article has been severely criticised by other statisticians (forgive the spelling)
There is empirical data to suggest that the problem is that most science is just bad or worthless; since it is worthless, no one bothers to check if it is right or wrong, so it doesn't matter if you can't reproduce it.
The data is citation counts; something like half of all papers are cited zero or one times - that means that for a substantial fraction of the published science, no one bothers to follow up.
this is data from the ISI, which publishes the science citation index, a very valuable tool for anyone doing science.
Wouldn't it be better if other researchers didn't exactly copy your setup, but used your hypothesis to design their own experiment and test it? If the results don't match then more research would be needed to validate either study.
This would (hopefully) rule out results due to environmental interactions in your particular setup. If two different experiments come to the same conclusion, that is better than two identical experimental setups coming to the same conclusion (and more useful as well).
Given that the microdevice in question was the subject of the experiment, an accurate description of its fabrication is essential to reproduce the results. I did not include any details that were not essential to reproduce the experiment.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
or just afraid of your wife?
Yes, in 2008, when public pressure was on and a new election was coming he gave lip service to people about getting more money. At a time when he would have no incentive to actually back up his statement.
Seriously, learn freaking politics before mashing out nonsense.
Obama has NOT CUT NASA BUDGET. He increased it by 6 billion dollars.
In 2006, NASA had its budget butchered and 3 billion cut.
The republicans have been slowly cutting science since Reagan. The more the religious right infects the republican party, the more it cuts science. Who cut the metric system conversion program? republicans under the guise of 'cutting the budget'. Who removed all effort to remove oil dependence? the republicans.
Who made it so the general population could use the internet? democrats. Who increased NASA's budget? Democrats.
And comparing cuts or increase to just the president is so naive as to be called stupid.
The Kruger Dunning explains most post on
This issue is exactly why many scientists are moving towards model selection approaches instead of significance testing. Significance testing is arbitrary and silly at some level, and even Fisher knew that. The .05 cutoff is just something he pulled out of his butt one day as an arbitrary threshold that one might use for determining whether or not to provisionally believe a result, it's not some fundamental constant of the universe that has any real external justification to it. The good news is that the younger generation of scientists is increasingly comfortable with model selection, and as a result this is a problem that is in the process of correcting itself.
If you are outside the field, there is no way that you can describe in a few pages (the limit for most journals, esp. prestigious ones) enough detail to let someone actually duplicate your experiment (assuming it can be duplicated - not everyone can afford to build a Hubble, or a Fermilab, or a 500 MHz NMR, or gain access to and maintain specialized cell lines or transgenic mice).
If you are in the field it is also silly; if you really need all the nitty gritty details, you call or email the authors and say, hey, these points here, what exactly do you mean...
this idea that you should describe enough detail to allow someone else to reproduce your work is one of those comforting hoary myths from 18th century England...Dobzhansky says somewhere that when he started in genetics, he could claim to have at least looked at every single paper ever published in the field, but that even a few years later, this was impossible...
http://www.youtube.com/watch?v=qNxfPAF1frM
The Kruger Dunning explains most post on
"Filing Cabinet Syndrome" Discuss.
We must drive a sword through any hypothesis that is not strictly necessary.
I'm not sure why this is news. Isn't evolutionary theory based on unrepeatable results? I.e., Molecules to Man?
The importance that most scientific approaches give to statistics is ridiculous. For a great discussion on its value in scientific research, Murray Sidman's "Tactics of Scientific Research" is a must-read. It is focused on Psychology, but the basic ideas apply to every field. I have my students read this book as soon as I start supervising them.
First, science has always had a political aspect. Publication reviewers are always biased by conventional wisdom among their scientific peers, and they will become critical of any submitted paper that strays from that view. A lot of careers are based on following the conventional wisdom, and threats to those careers are met with political responses.
Second, the quest for statistical significance is based on serious misunderstanding of statistics among scientists. It has been so for decades. Publication editors are thoroughly ignorant of statistics if they demand statistical significance at the .95 or .99 levels as a condition of acceptance.
Results that are statistically significant may or may not be clinically significant. Both factors must be considered.
Significance levels are based on one model of statistical inference. There are other models, although those have been subjected to politics within the mathematical/statistical community. Although Bayesian statistics are now accepted (and form a critical basis in theories of signal processing, radar, and other technologies) they were rejected by the statistical community for many years. The rejection was almost completely political, because the concepts challenged the conventional wisdom.
The basic scientific method is not a problem. The major problem is the factors in publication acceptance and the related biases and pressures to adhere to the conventional wisdom. Rejection of papers based on politics or on ignorance of statistical methods is outside the scientific method and needs to be rooted out.
I think that people tend to underestimate the pervasive impact of regression toward the mean.
Even without "data snooping" (improperly reanalyzing your data post-hoc in multiple ways to find something that appears to be statistically significant), there is still going to be bias. If I do an experiment and I happen to "luck out" and get a large (i.e. larger than the "true" mean of an infinite number of observations) effect size just by chance, I am far more likely to do follow-up experiments than if I am unlucky and the effect size is small or the result is not statistically significant. If subsequent experiments asking the same question in different ways also give a statistically significant result, my belief in the phenomenon is reinforced even if the effect size is a bit smaller.
So I am far more likely to identify a real phenomenon if because of a statistical fluctuation I initially observe a larger effect size or a smaller standard error than the "true" value. And my figures from that initial study, showing a nice big effect and a small error bar are far more likely to pass peer review than if the effect size is smaller and the error bars are larger, even if the criterion for statistical significance is satisfied.
If I am unlucky, and I get a lot of variation and/or a small effect size (again, compared to the "true" value from an infinite number of experiments), there is a good chance that the experiment will go into a drawer. Perhaps I'll give up on the idea, or perhaps I'll try it again, but I'll improve the experimental design in a way that I hope will reduce the statistical variability or give me a larger effect size. Of course, if it "works," I'll pat myself on the back for solving the technical problem and go on to do follow-up studies, even though statistically speaking it may well be the case that the prettier result from the new design is itself just a statistical fluctuation.
Part of the problem is that by convention, we report a single value for effect size. Yes, some sort of estimate of standard deviation is appended, but what people remember is that single value. It simply is very hard for human beings to think in terms of statistical distributions. We tend to forget (even though we know it to be true in theory) that a statistically significant result does not show that our estimate of effect size is correct--all it tells us is that the effect size is unlikely to be zero.
Thus, we can predict, just on statistical grounds, that effect sizes will tend to decline ("regress" toward the "true" mean) over time with follow-up studies, based on the simple fact that those follow-up studies are far more likely to happen if the measured effect size was initially larger than the "true" value than if it was smaller. And as far as I know, nobody has been able to come up with any statistically rigorous way of estimating the magnitude of this unavoidable bias.
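The selection effect described above is easy to reproduce in a toy simulation. This is only a sketch under made-up assumptions: every lab measures the same true effect (0.2) with the same normally distributed noise (standard error 0.15), and a follow-up is run only when the first result clears z > 1.96. None of these numbers come from any real study.

```python
import random
import statistics

def simulate_followups(true_effect=0.2, se=0.15, n_labs=20000, seed=1):
    """Mean first and follow-up estimates, among labs whose first result was significant."""
    rng = random.Random(seed)
    first_results, followup_results = [], []
    for _ in range(n_labs):
        first = rng.gauss(true_effect, se)
        if first / se > 1.96:  # only "significant" first results get followed up
            first_results.append(first)
            # The follow-up is an honest, independent measurement of the same effect.
            followup_results.append(rng.gauss(true_effect, se))
    return statistics.mean(first_results), statistics.mean(followup_results)

mean_first, mean_followup = simulate_followups()
print(f"mean first estimate:     {mean_first:.2f}")
print(f"mean follow-up estimate: {mean_followup:.2f}")
```

The follow-ups average out near the true effect while the published first results sit well above it, so the effect appears to "decline" even though nothing about nature changed and nobody snooped the data.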
I am not sure if you argue for or against thorough description of the experiment. The tone says "against" but the content says "definitely for".
Let me add this: the description of the experiment is there as a convenience, and any given researcher has the option of using or not using that description. I can't see any downside to this.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
I think the really important question is whether or not this is the same John Ioannidis who wrote the original IPsec stack used in OpenBSD. Perhaps he is trying to tell us something? :-)
I guess they become true
Seriously, the guy makes a few points but the paper's headline was and is misleading at best. Then again, its sole purpose always was to generate citations for and drive eyes towards the new journal (at the time) PLoS. Considering it was done in 2005, I'd say mission accomplished. Not like this place never covered it before, either
http://science.slashdot.org/story/10/10/15/1934228/Meta-Research-Debunks-Medical-Study-Findings?from=rss
http://science.slashdot.org/article.pl?sid=08/10/19/172254
http://science.slashdot.org/article.pl?sid=05/08/30/2048236
http://science.slashdot.org/article.pl?sid=07/09/18/1429222
We got the message. I think we can go back to doing science now.
Anyone familiar with the concept of "reality on demand" knows that it is constructed on a need-by-need basis by the Blue People. Now the Blue People are not 100% reliable. Sometimes they forget to put back key pieces of reality. This is the source of the problem of failure to reproduce results. The reproduction is being attempted in a reality which is simply too different.
Two possible solutions come to mind. Only conduct experiments where one never has to leave the room. Or maybe find a good lawyer who can negotiate a higher quality level contract with the Blue People.
Not really. This would be only true if all of those 2000 ways were statistically independent from one another. It would take a much larger dataset than most scientists deal with for there to be 2000 different ways of analyzing it, and even then they would not be statistically independent.
So the problem is not as bad as you suggest, but it is real. If I compare 20 different statistically independent measurements, one is expected to meet the p < 0.05 criterion by pure random chance. There are ways of correcting for this bias, by requiring a higher criterion of statistical significance (say p < 0.0025), but that also reduces the power of my study to detect a real difference.
Which is appropriate really depends upon the nature of the experiment and the question being asked. If I do 20 measurements and half of them are statistically significant, I may not much care if one of them is by chance.
If I want to minimize the likelihood of reporting an incorrect result, while maximizing the power of my study, my best bet is to decide in advance on a very few measurements and statistical tests, and stick with them. That's good for me, but it doesn't really help the reader who is looking at a bunch of different studies, because each finding reported at p = 0.05 still has one chance in 20 of being wrong. Added to that is an unknown magnitude of publication bias, because studies with significant findings are more likely to be published than those that find nothing of statistical significance.
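The arithmetic behind the 20-measurement example can be sketched as follows, assuming the tests are fully independent and that no real effect exists in any of them. The function names are just labels for this illustration; the corrected threshold is the standard Bonferroni division of alpha by the number of tests.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests with no real effects."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the overall false-positive chance near alpha."""
    return alpha / n_tests

n_tests = 20
expected_false_positives = n_tests * 0.05
uncorrected = familywise_error(0.05, n_tests)
threshold = bonferroni_threshold(0.05, n_tests)
corrected = familywise_error(threshold, n_tests)

print(f"expected false positives at p < 0.05: {expected_false_positives:.1f}")
print(f"chance of at least one:               {uncorrected:.2f}")
print(f"Bonferroni per-test threshold:        {threshold:.4f}")
print(f"chance of at least one after fix:     {corrected:.3f}")
```

With 20 tests the chance of at least one spurious hit is roughly 64%, and tightening each test to p < 0.0025 pulls it back to just under 5%, at the cost in power mentioned above.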
Pet theories and funding from a company that has an interest in a certain result.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Last time I tried to repeat Creationism it kinda fizzled, too.
Ok, I'm not that good at magic, I admit.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Fisher did an excellent job on these sorts of questions. Here is what it boils down to: Significance is an outcome of an experiment, not a goal. One does estimate how many trials will be needed to detect a particular effect. But, after making that estimate, one does not call off an experiment early if the effect turns out to be obvious or extend the experiment to chase a possible weak detection. The experiment gets analyzed and reported. If there is no detection, one can contemplate a different experiment.
Here is the problem with calling off an experiment early: since your finishing criterion has become the apparent strong signal, your significance is no longer an independent outcome. You have manufactured it. Thus, it no longer has meaning. The same can be said for extending the experiment. I've been surprised by the number of senior scientists I've met who do not comprehend this though Fisher certainly did. When they use administrative authority to modify experimental design ex post facto they may induce false results for which their supervisory charges ultimately get the blame.
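A toy simulation of why this matters, under assumptions chosen purely for illustration: the null hypothesis is true (data are standard normal noise with known variance), the planned sample size is 100, and the "peeking" experimenter checks a z-test after every observation from the 10th onward and would stop at the first |z| > 1.96.

```python
import math
import random

def false_positive_rates(n_min=10, n_max=100, trials=2000, z_crit=1.96, seed=0):
    """Return (fixed-design rate, peeking rate) of false positives when no effect exists."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(trials):
        total = 0.0
        crossed = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # null is true: mean 0, sd 1
            # Peeking design: test after every observation once n_min is reached.
            if n >= n_min and abs(total / math.sqrt(n)) > z_crit:
                crossed = True
        if crossed:
            peeking_hits += 1
        # Fixed design: test exactly once, at the planned sample size.
        if abs(total / math.sqrt(n_max)) > z_crit:
            fixed_hits += 1
    return fixed_hits / trials, peeking_hits / trials

fixed_rate, peeking_rate = false_positive_rates()
print(f"fixed design: {fixed_rate:.1%} false positives")
print(f"peeking:      {peeking_rate:.1%} false positives")
```

The fixed design stays near the nominal 5%, while stopping at the first apparent signal produces false positives several times as often. That is the sense in which the significance has been manufactured.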
Sales of Charles Dickens' Christmas Carol continue to remain strong in English Lit departments.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
"According to John Ioannidis, author of Why Most Published Research Findings Are False, the main problem is that too many researchers engage in what he calls 'significance chasing,' or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. 'The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,'
Describes climate science to a tee.
Post Doc ergo Proctor Doc.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
All you need is a computer simulation. Doesn't even have to be a provable simulation.
Just run the numbers and then change the world!
No repeatable experiments needed.
Replication is for pussies.
Of course I didn't read TFA, but one problem with the summary is that psychology and ecology *aren't sciences*. Both involve extensive subjective evaluations and value judgements and so it is no surprise at all that they can't be repeated.
Are we seeing Accidental Scienticide then?
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
http://www.naturalnews.com/z030209_placebo_medical_fraud.html
http://www.wired.com/medtech/drugs/magazine/17-09/ff_placebo_effect?currentPage=all
http://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/8269/
http://www.its.caltech.edu/~dg/crunch_art.html
http://www.theatlantic.com/past/docs/issues/2000/03/press.htm
http://www.scoop.co.nz/stories/HL9910/S00096/rankin-on-thursday-where-communism-succeeded.htm
http://www.nybooks.com/articles/archives/2004/jul/15/the-truth-about-the-drug-companies/
http://www.bloomberg.com/news/2010-10-26/glaxo-said-to-settle-u-s-drug-manufacturing-lawsuit-for-750-million.html
Wired on the original article:
http://www.wired.com/wiredscience/2010/12/the-truth-wears-off/
Anyway, this New Yorker article once again underscores the folly of going to extremes against common sense or long standing cultural traditions, based on some new scientific report or another, without looking at the broad big picture on overall weight of all the evidence we have from a variety of perspectives.
But even when there is a wide variety of good science, often policy ignores it.
Problems with the recent timid vitamin D recommendation:
http://www.grassrootshealth.net/recommendation
Dr. Joel Fuhrman on how much money the USA spends on sick care for very poor outcomes:
http://vimeo.com/16682935
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
"but it now appears that nature often gives us different answers. " - if it gives you different answers, then it does not pass the test of the scientific method and must be thrown out. The laws of the universe do not change, so if what was once validated, no longer validates, it was validated in error. There is no supernatural entity changing the laws of nature....humans commit errors and those errors can be discovered when technology advances.
I'm not sure what the point of this article or posting was. The scientific method is as solid as ever. As we increase our knowledge and our abilities, we will invariably find past claims that no longer stand up to scrutiny. That is the cost of progress. The scientific method remains unscathed because it is the best means we have to validate research. Once we thought the world was a flat disc....and that seemed to stand up to scrutiny for a while....
Also: http://www.google.com/#q=peer+review+as+censorship
http://www.counterpunch.org/mazur02262010.html
http://www.suppressedscience.net/censorship-medicine.html
A key point being that keeping information from the public is not the same as modding up (or revising interactively) information like on slashdot. What would slashdot be like if every comment needed "peer review" before it was posted? Instead, slashdot uses after the fact moderation. (Nothing is perfect, of course.)
In general:
http://www.suppressedscience.net/
http://www.disciplinedminds.com/
http://www.jamesphogan.com/books/book.php?titleID=37
http://www.newciv.org/whole/schoolteacher.txt
http://www.johntaylorgatto.com/chapters/16a.htm
And from a previously posted link (from 1994 from the Vice Provost of Caltech, and it has probably gotten worse since): ..."
http://www.its.caltech.edu/~dg/crunch_art.html
"Peer review is usually quite a good way to identify valid science. Of course, a referee will occasionally fail to appreciate a truly visionary or revolutionary idea, but by and large, peer review works pretty well so long as scientific validity is the only issue at stake. However, it is not at all suited to arbitrate an intense competition for research funds or for editorial space in prestigious journals. There are many reasons for this, not the least being the fact that the referees have an obvious conflict of interest, since they are themselves competitors for the same resources. This point seems to be another one of those relativistic anomalies, obvious to any outside observer, but invisible to those of us who are falling into the black hole. It would take impossibly high ethical standards for referees to avoid taking advantage of their privileged anonymity to advance their own interests, but as time goes on, more and more referees have their ethical standards eroded as a consequence of having themselves been victimized by unfair reviews when they were authors. Peer review is thus one among many examples of practices that were well suited to the time of exponential expansion, but will become increasingly dysfunctional in the difficult future we face.
We must find a radically different social structure to organize research and education in science after The Big Crunch. That is not meant to be an exhortation. It is meant simply to be a statement of a fact known to be true with mathematical certainty, if science is to survive at all.
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
One tends to hear this sort of thing from people who don't know anything about the pharmaceutical industry, and of course this attitude is pushed very hard by people who are hawking quack cures of one sort or another, and who are thus competitors of the pharmaceutical industry.
I'm an academic pharmacologist, but I've met a lot of the people involved in industrial drug discovery, and trained more than a few of them. People tend to go into pharmacology because they are interested in curing disease and alleviating suffering. Many of them were motivated to enter the area by formative experiences with family members or other loved ones suffering from disease. They don't lose this motivation because they happen to become employed by a pharmaceutical company--indeed, many enter industry because it is there that they have the greatest opportunity to be directly involved in developing treatments that will actually cure people.
It is certainly true that pharmaceutical companies are businesses, and their decisions regarding how much to spend on treatments for different illnesses are strongly influenced by the potential profits. A potential treatment for a widespread chronic disease can certainly justify a larger investment than a one-time cure. But it can also be very profitable to be the only company with a cure for a serious disease. And it would be very bad to spend a lot of money developing a symptomatic treatment only to have somebody else find a cure. So a company passes up an opportunity for a cure at its peril. There is definitely a great deal of research going on in industry on potential cures.
The real reason why cures are rare is that curing disease is hard. Biology is complicated, and even where the cause is well understood, a cure can be hard to implement. For example, we understand in principle how many genetic diseases can be cured, but nobody in industry or academia knows how to reliably and safely edit the genes of a living person in practice. It is worth noting that the classic "folk" treatments for disease, including virtually all of the classic herbal treatments that have been found to actually be effective--aspirin, digitalis, ma huang, etc--are not cures; they are symptomatic treatments. Antibiotics were a major breakthrough in the curing of bacterial diseases, but they were not created from scratch, but by co-opting biological antibacterial weapons that were the product of millions of years of evolution. Unfortunately, for many diseases we are not lucky enough to find that evolution has already done the hardest part of the research for us.
I can almost hear the cretinists and similar science-haters rubbing their hands in glee...
Read PZ's take on it in Pharyngula to clear up the FUD.
--
El Guerrero del Interfaz
Biology is based on Chemistry is based on Physics is based on Mathematics is based on Logic is based on Anthropology is based on Neuroscience is based on Biology.
Where does statistics fit into this?
Republicans have often been very good for scientific funding. Clinton would propose a low NIH budget, and the Republicans would increase it. That is probably the best scenario for science--a Democratic administration, with a strong Republican presence in Congress. The problem with the Republicans actually being in charge is that while they like science, they love war, and they hate taxes. So they tend to end up getting into a war, and then they can't afford science.
Ah, another article by a Wired editor (at least this one has some basic scientific education) sexing up the old "a lot of scientists are crap at stats" thing into an angsty article about how "science" is flawed and how do we know what to believe? Add in a good dose of Roland, er, Hugh sensationalization (nature is giving different results!!11!) and you've got a winner.
Here's how it works - a lot of scientists are crap at statistics, particularly in the squishier sciences, where rigorous stats are more important. MOST published studies are really exploratory - they might show some interesting results but those results came after so many comparisons the p-value is NOT a p-value. These studies are valuable, as they point out interesting things to look at further, but they're not "truth." And finally, drug companies, surprise, surprise, have a vested interest in making their expensive investments look good, even if it means bending some silly statistical rules a little.
After I read TFA I found it very interesting, but upon reading it a couple more times...not so much.
John Ioannidis coined this as the Proteus phenomenon.
The name of the effect is "I'm tired of this horseshit; just give me my fucking {masters,doctorate} syndrome" which is often abbreviated as idontgiveashititus.
I recall having a little discussion about this a week or so ago on /., at the time of the bee research by 8-year olds, in which I linked to this PLoS paper. It's from 2005, not exactly new.
I don't think it's a problem with the research in itself, but the demand to get things published or funded by demonstrating statistical significance. Most researchers aren't statisticians, but know that demonstrating a p-value of less than 0.05 (at least in biomedical research) is good enough to convince most publishers and funders. Test 20 different things, and chances are that one of those things will pass this significance threshold. That chance effect will end up getting published and the rest will be left behind as useless work. I find the bit about "gold-standard" research three pages into TFA to be particularly telling:
In 2005, Ioannidis published an article... that looked at the forty-nine most cited clinical-research studies in three major medical journals.... the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
Note it mentions "of the claims that had been subject to replication". If you look at the abstract of that paper, only 20 (44%) of the most cited studies were replicated, and 11 (24%) remained largely unchallenged.
But wait, there's more:
Claims from highly cited observational studies persist and continue to be supported in the medical literature despite strong contradictory evidence from randomized trials.
I don't think this is as much of a problem in the fields of mathematics and physics, because people there are more likely to understand the statistics involved in demonstrating that something is a real effect. I have heard that physicists only accept p-value cutoffs that are approaching the Planck constant in magnitude, which would be somewhat harder to fudge.
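To put a number on the "test 20 different things" point above: if all 20 null hypotheses are true and the tests are independent (both assumptions made here for illustration), the chance that at least one clears p < 0.05 is 1 - 0.95^20, or roughly 64%. A minimal Python sketch, with function names invented for this example:

```python
import random


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests of TRUE nulls gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate(n_tests: int, alpha: float = 0.05,
             n_studies: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check of the closed form above."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null, each p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(chance_of_false_positive(20))  # closed form
    print(simulate(20))                  # simulated fraction of "lucky" studies
```

Real comparisons within one study are rarely independent, so treat this as a rough guide rather than an exact rate.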
Ask me about repetitive DNA
Seriously, those are the two leading causes for this phenom
Some people who 'confirm' the findings take shortcuts to get their name out there before someone else confirms it. Some just flat out lie about confirming it and fake it nearly completely.
Others are stupid. I realize that most people think people with higher education in research positions have intelligence, but having known as many of my wife's coworkers over the years as I have, I can assure you that a 'higher education' has absolutely 0 relationship to intelligence.
Just because they work in a research setting doesn't mean they have a clue; it just means they got someone to fund them, and getting funding is relatively easy: it's a tax write-off and free advertising. Easier than buying a home or car in most cases, actually, as long as you're one of those people who don't mind bullshitting.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Feynman's words of wisdom can be found in "The Pleasure of Finding Things Out" on page 22 under the heading "Science Which Is Not a Science . . ."
IMHO - "Truer words were never spoken":
"There’s all kinds of myths and pseudoscience all over the place.
I may be quite wrong, maybe they do know all these things, but I don’t think I’m wrong. You see, I have the advantage of having found out how hard it is to get to really know something, how careful you have to be about checking the experiments, how easy it is to make mistakes and fool yourself. I know what it means to know something, and therefore I see how they get their information and I can’t believe that they know it, they haven’t done the work necessary, haven’t done the checks necessary, haven’t done the care necessary. I have a great suspicion that they don’t know..."
The man himself can also be seen uttering these very words on YouTube -
R. P. Feynman and Social Science:
http://www.youtube.com/watch?v=IaO69CF5mbY
May d 4s b wiz u!
Aside from arguments about the research environment, people here on /. have also mentioned that statistics need to be understood better by certain scientists.
Those arguments all make sense, but here's some more:
The fields mentioned in the articles are things like biology, medicine, and psychology. I'd bet economics would show up there as well if it had been investigated. These are the kinds of fields where there are many interacting components, often with feedback loops and non-linearities. Often, it is also very hard to find a well-suited dataset for testing specific factors on their own. In other fields, one can rule out a great many factors by clever experimental design. With complex stuff, there's no obvious simplification. Often, you even have the problem of a self-aware dataset. Someone mentioned the Demand Effect.
So what can one do? The brute force way is to try and gather a lot of data. That can help, until you realise that the very act of gathering more data might corrupt the data. Well, suppose you're trying to learn something, and the very act of learning it screws up your result. Sounds like quantum mechanics, doesn't it? But at least in QM you can have someone else try to replicate your result. Suppose you've published an article saying that stocks tend to rise in January. That could screw up the market every January, as investors have read your article. What I'm getting at is some things seem to be path dependent. How the heck do you hypothesise about that when you only have one "real world" to look at? I think economics might have an issue like this. Loosely speaking, everyone studies what caused previous crises, and bases their response to the next one on what they think they learned. So obviously, you never get a second test with the same model. Is the model right then? Hard to say.
Ben Goldacre, an MD from the UK, has been at the detecting pseudoscience game for a while now. I have just started reading his book, Bad Science: Quacks, Hacks, and Big Pharma Flacks. I find it refreshingly topical and well-focused on the problem: evidence-based decision making.
Similar to Goldacre's findings, my experience has been that evidence, which has been produced by some test, requires the nature of that test to be disclosed. Following the model of the scientific process, evidence requires the following before it is complete: a testable idea and a test (or series of tests). To address TFA's issue of replication, it is often nice to include the test setup, the procedure for executing the test, the results of running the test given some inputs, etc.
--I apologize for any weirdness. I have been trying to edit this but apparently copy/paste is broken for my mode of /. viewing and Mac OS X 10.6.5 Safari 5.0.3.
> Francis Bacon .... declared that experiments were essential, because they allowed us to 'put nature to the question'
Call that Bacon's Law; but consider what could be called Torquemada's Corollary to Bacon's Law: Torturers will hear exactly what they want to hear.
Studies that don't replicate well probably missed something important.
Bacon in fact makes that point explicitly.
"There are two images used by Bacon to refer to knowledge, torture and light.
The torture refers to the violent twisting of nature's secrets...."
-- http://www.studyworld.com/newsite/reportessay/SocialIssues/Religion%5CFrancis_Bacon_and_the_Society_of_New_Atlantis-32139.htm
This brings to mind Feynman's warning about not fooling yourself and about brutal honesty.
" 'One of my mentors told me that my real mistake was trying to replicate my work,'...'He told me doing that was just setting myself up for disappointment.' "
A lament of the politically correct, post modern age. Many "scientists" are lapdogs for vested interests or academic gamesmanship. Some even find out belatedly, or get found out, that they really aren't scientists.
Rather than trying to invent a whole load of new effects with psychological explanations, I wonder whether anyone has actually looked at things using basic statistics. The decline effect only seems to occur when the author has observed a noticeable effect. A noticeable effect is far more likely to be spotted when a statistical fluctuation makes it bigger, rather than smaller. When you then repeat the measurement you will notice a smaller effect.
The article with the ESP experiment is a dead ringer for this. A student suddenly gets far more correct than they should on average so everyone takes notice and then, over time, the number of correct guesses drops to normal. Would anyone have noticed an exceptionally unlucky streak where a student got more than normal wrong and then suddenly got "better" by approaching normal?
This simple statistical effect has been well known in particle physics for years. Discoveries are typically made on upward fluctuations of data and then you will typically see them decrease with subsequent measurements. However the change is usually within reasonable uncertainty of the previous measurement and is not there for all measurements (although less likely you can still discover something on a downward fluctuation). So how about testing the simplest, statistical hypothesis first before inventing new psychological explanations...unless you think that fundamental particles are somehow subject to the positive, upbeat nature of particle physicists!
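The selection effect described above is easy to simulate. The following Python sketch assumes a hypothetical true effect of 0.2 measured with a standard error of 0.15 (numbers picked purely for illustration), "discovers" the effect only when the first measurement clears a 1.96-sigma cut, and then repeats the measurement independently:

```python
import random
import statistics


def decline_effect(true_effect: float = 0.2, std_error: float = 0.15,
                   n_studies: int = 50000, seed: int = 2):
    """Return (mean first estimate, mean replication) among 'discoveries'."""
    rng = random.Random(seed)
    threshold = 1.96 * std_error  # estimate must exceed this to be "noticed"
    discoveries, replications = [], []
    for _ in range(n_studies):
        first = rng.gauss(true_effect, std_error)
        if first > threshold:
            # Only upward fluctuations get through the filter...
            discoveries.append(first)
            # ...but the repeat measurement is an unfiltered, independent draw.
            replications.append(rng.gauss(true_effect, std_error))
    return statistics.mean(discoveries), statistics.mean(replications)


if __name__ == "__main__":
    first, repeat = decline_effect()
    print(f"mean effect at discovery:   {first:.3f}")
    print(f"mean effect on replication: {repeat:.3f}")
```

Nothing about the underlying effect changes between the two measurements; the "decline" comes entirely from having selected on the first one.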
I am the author of one of the relatively few open source/free random number generator testing programs out there (dieharder) and I will affirm that there are plenty of people who are nominally statisticians who have no clue as to what e.g. p means when testing against a null hypothesis (which is the most common basis of random number generator testing). The problem is that they are taught the rule in ordinary English that if you perform a statistical test and compute p, the probability of getting your result given a null hypothesis, then if p is larger than some cutoff one "passes the test" at some confidence level.
This is complete bullshit, as is the counter-error, that if p is less than that cutoff (where people tend to pick something absurdly large, like 0.05) one has failed the test at the 5% confidence level.
The truth is that (as the famous George Marsaglia remarks in the diehard documentation) "p happens". In fact, for a correctly designed test, p is itself a uniform deviate. What that means is that one can easily get a p value of 0.001 even if the null hypothesis is true -- one will get this value (or lower) one in a thousand test runs, for a perfectly good random number generator and test, or else the generator should fail the test. One is precisely as justified at failing the generator if p happens to be in the range 0.5-0.55 as one is at failing it if p is in the 0.0-0.05 range.
The relevance of this is hopefully clear. The way to test is to run many tests and look at the distribution of p (or do the moral equivalent of running many tests if testing other ways).
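A minimal sketch of the "p is itself a uniform deviate" claim, using only the Python standard library. The test here is a plain two-sided z-test on draws from N(0, 1), chosen for simplicity as a stand-in; it is not one of the dieharder tests, and the function names are made up for this example:

```python
import math
import random


def p_value_under_null(rng: random.Random, n: int = 30) -> float:
    """Two-sided z-test p-value for 'mean is 0' on n draws from N(0, 1)."""
    z = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def p_histogram(n_runs: int = 20000, n_bins: int = 20, seed: int = 3):
    """Fraction of p-values landing in each of n_bins equal-width bins."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_runs):
        p = p_value_under_null(rng)
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / n_runs for c in counts]


if __name__ == "__main__":
    bins = p_histogram()
    print(f"fraction with p in [0.00, 0.05): {bins[0]:.3f}")
    print(f"fraction with p in [0.50, 0.55): {bins[10]:.3f}")
```

With the null true, every bin should hold about 5% of the runs, which is why a single p below 0.05 says no more than a single p between 0.5 and 0.55. It is the shape of the whole histogram that carries the information.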
Unfortunately, this immediately introduces two problems. First is that more tests cost more money -- a lot more money because of the need for e.g. HIPAA compliance in medicine. People prematurely publish results that basically might be significant as if they are significant once they've done all of the testing they can afford, at least for now. Second, if many statisticians don't properly appreciate the meaning of p in hypothesis testing and use terms like "confidence level" in places where they mean nothing of the sort, how can one expect people for whom statistics and calculus and math in general was a struggle, working in contexts that reward "results" (however shaky) and punish "no results" (however honest), to get this right?
When you get right down to it, Bayes pretty much lived in vain. Once you get out of the simple realm of hypothesis testing you run right into Bayes. The correct computation of posterior probabilities seems elusive, even in statistical analyses that more or less demand the use of Bayes' theorem because of the wealth of prior knowledge we have available. However, if you asked any medical researcher about Bayes' theorem, I predict (as a testable hypothesis:-) that no more than two in ten would identify it as something important in statistics, and not one in twenty would be able to actually write it down or explain how it works and why it is important...
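For reference, Bayes' theorem is P(H | D) = P(D | H) P(H) / P(D). Here is a small Python sketch of what it implies for a "significant" result; the prior, power and alpha values are assumptions picked for illustration, not figures from TFA:

```python
def posterior_true(prior: float, power: float, alpha: float = 0.05) -> float:
    """P(hypothesis is true | test came out significant), by Bayes' theorem.

    prior : P(H), plausibility of the hypothesis before the study
    power : P(significant | H true)
    alpha : P(significant | H false), the false-positive rate
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # A long-shot hypothesis tested with good power.
    print(posterior_true(prior=0.1, power=0.8))
    # The same hypothesis tested in a small, underpowered study.
    print(posterior_true(prior=0.1, power=0.2))
```

Under these assumed inputs the first case comes out at 0.64, so a result that passes at the 5% level would still be wrong about a third of the time; the lower the prior or the power, the worse it gets.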
rgb
Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
In fairness to Fisher, in his context (crop yields) 95% significance happened to be the point at which it became worth it to do the alternative thing being tested, from a probabilistic point of view (in that at that point the expected benefit minus the expected cost became positive).
So it's not like he made up the 95% number from nothing. But then people who don't understand statistics came along and cargo-culted it into various contexts where it may or may not make sense.
the research on those chem trails was flawed!
I was promised a flying car. Where is my flying car?
Studies, particularly medical ones, often underestimate the size of an effect because of uncertainties in the data. For instance, a study of asthma inhalers used inhalers that secretly recorded when they were used. The researchers found that patients lied about how they used the inhalers, claiming that they had used them regularly as instructed, when in fact they had not (in some cases, not at all). Similarly, patients in a trial may not take pills as instructed, and lie about it.
The result of this is that the true effect of taking a medication is underestimated, because some of those assumed to be taking it did not. And it is not just when people lie that this occurs. If you are looking at the effects of smoking, say, then people's poor recollection of how much they smoked, and when, results in an underestimation of the effects of smoking.
Any uncertainty in the data (e.g., uncertainties in the radiation doses received by individuals at Hiroshima) reduces the estimated magnitude of the effect. Statisticians can compensate for this if they know the magnitude of the uncertainty (and in fact this was done in the study of the effects of radiation at Hiroshima), but in many studies it is assumed that the data is perfect (all patients took their pills as instructed) so that the true effect of the medication is underestimated.
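A rough Python sketch of this attenuation, assuming the simplest textbook case: a linear dose-response and independent random error in the recorded dose. All numbers are invented for illustration. With equal variance in the true dose and in the recording error, the fitted slope should come out at about half the true one, since the shrink factor is var(dose) / (var(dose) + var(error)).

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def attenuation_demo(true_slope: float = 2.0, dose_sd: float = 1.0,
                     error_sd: float = 1.0, n: int = 50000, seed: int = 4):
    """Return (slope using the true dose, slope using the noisily recorded dose)."""
    rng = random.Random(seed)
    true_dose = [rng.gauss(0.0, dose_sd) for _ in range(n)]
    outcome = [true_slope * d + rng.gauss(0.0, 0.5) for d in true_dose]
    # What the study actually has: the dose as reported or reconstructed.
    recorded = [d + rng.gauss(0.0, error_sd) for d in true_dose]
    return ols_slope(true_dose, outcome), ols_slope(recorded, outcome)


if __name__ == "__main__":
    exact, noisy = attenuation_demo()
    print(f"slope with true doses:     {exact:.2f}")
    print(f"slope with recorded doses: {noisy:.2f}")
```

Knowing the size of the recording error lets you divide the shrink factor back out, which is the kind of compensation mentioned above.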
The whole thing is rubbish. One of the main requirements of an experiment is its reproducibility. An experiment with scientific relevance should be constructed in a way that makes it possible to reconstruct it in every lab, all over the world. As I said, it's one of the main characteristics of an experiment. So a) it's true that non-reproducible experiments are not reliable, but b) duh, even Galileo knew that.
Extraordinary claims require extraordinary evidence to be accepted as proposed by Carl Sagan. This makes the decline effect described in the article entirely expected. Novel results will be biased to have high statistical significance since they will be considered to be a statistical anomaly otherwise. This means that discovery results will almost always have too high a significance and subsequent study will find a smaller effect.
A prime example of this kind of thing: the theory of continental drift. Wegener proposed it sometime around WWI, but it wasn't accepted by the geology community until around the 60's. There were a couple of reasons for this - among them: Wegener was unable to identify a mechanism for the movement of the continents, but even after a full-fledged theory of plate tectonics came out around 1959, the whole thing was pretty much laughed off for years. The other big reason to dismiss Wegener: he wasn't a geologist by trade, and after all, how could he know anything about the topic?
Medicine is not science? Dammit, if you told me earlier I wouldn't have gone through all the trouble to get my MD-PhD!!
this sig is useless
It is quite possible to use statistics to prove almost anything.
One classic case that I still refer to is that, allegedly (as told by my professor), in the late '70s the American automotive industry "proved" by statistical means that letting your child use a bicycle to go to school would make them drug addicts as they got older! Statistically it holds, because of the way the numbers were put up. See, in the '70s only very poor (usually black) families did not have a car, and their kids had to use their own bikes to go to school (as the relatively poor public schools in their areas did not have a bus service), and these kids, statistically, had a much higher chance of ending up as criminals and drug users than anyone else. So by turning cause and effect on its head you could conclude that the cause of this effect is the use of bicycles; the numbers are there...
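The bicycle story is a textbook confounder, and a toy simulation shows how the numbers can "be there" with no causal link at all. In this Python sketch every rate is hypothetical: low income raises both the chance of cycling to school and the chance of the bad outcome, while cycling itself has zero effect.

```python
import random


def confounding_demo(n: int = 100000, seed: int = 5):
    """Return risk ratios (cyclists vs. non-cyclists): crude, low-income only, others only."""
    rng = random.Random(seed)
    # counts[(low_income, cycles)] = [people, bad outcomes]
    counts = {(li, cy): [0, 0] for li in (False, True) for cy in (False, True)}
    for _ in range(n):
        low_income = rng.random() < 0.2
        cycles = rng.random() < (0.8 if low_income else 0.1)
        # The outcome depends ONLY on income, never on cycling.
        bad = rng.random() < (0.30 if low_income else 0.05)
        cell = counts[(low_income, cycles)]
        cell[0] += 1
        cell[1] += int(bad)

    def rate(cells):
        people = sum(counts[c][0] for c in cells)
        bad = sum(counts[c][1] for c in cells)
        return bad / people

    crude = rate([(False, True), (True, True)]) / rate([(False, False), (True, False)])
    within_low = rate([(True, True)]) / rate([(True, False)])
    within_rest = rate([(False, True)]) / rate([(False, False)])
    return crude, within_low, within_rest


if __name__ == "__main__":
    crude, within_low, within_rest = confounding_demo()
    print(f"crude risk ratio:            {crude:.2f}")
    print(f"ratio within low income:     {within_low:.2f}")
    print(f"ratio within everyone else:  {within_rest:.2f}")
```

Lumped together, cyclists look several times more at risk; split by income, the ratio sits near 1 in both groups.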
When many of these numbers are taken at face value, it is because they are so abstract to most of us that we can't really tell what they actually mean. Some chemical formula might improve the life of someone with a chronic illness, but the mere fact that this person was chosen to participate in a scientific test might already have changed this person's life (he gets out, meets other participants, feels lucky, feels "something is being done", placebo effect or something entirely different), and this effect is also being measured and credited to the chemical. This effect would be very different if the chemical was administered unknowingly into said person's life. But without actually performing such devious tests it is hard to know the exact effect, even without anyone intentionally "cheating" to make some numbers add up to a given sum.