Weak Statistical Standards Implicated In Scientific Irreproducibility

Posted by Soulskill on Tuesday November 12, 2013 @11:40AM from the nobody-who-needs-to-understand-statistics-understands-statistics dept.

ananyo writes "The plague of non-reproducibility in science may be mostly due to scientists' use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson of Texas A&M University. Johnson found that a P value of 0.05 or less, commonly considered evidence in support of a hypothesis in many fields including social science, still means that as many as 17–25% of such findings are probably false. He advocates that scientists use a more stringent threshold of P ≤ 0.005 to support their findings, and thinks that the 0.05 standard might account for most of the problem of non-reproducibility in science, even more than other issues such as bias and scientific misconduct."
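The arithmetic behind a figure like 17–25% can be sketched with Bayes' rule in odds form: the posterior odds that the null hypothesis is true equal its prior odds divided by the Bayes factor in favor of the alternative. This is a minimal sketch, not Johnson's method; the Bayes factors of 3 and 5 and the even (50/50) prior are illustrative assumptions, chosen because they happen to land on the quoted range.

```python
def posterior_null_probability(bayes_factor: float, prior_null: float = 0.5) -> float:
    """Probability that the null hypothesis is true after seeing the data.

    bayes_factor: evidence for the alternative over the null (BF10).
    prior_null:   prior probability of the null; 0.5 means even prior odds.
    """
    if bayes_factor <= 0 or not 0 < prior_null < 1:
        raise ValueError("need bayes_factor > 0 and 0 < prior_null < 1")
    prior_odds_null = prior_null / (1 - prior_null)
    # Bayes' rule in odds form: evidence for H1 divides the odds on H0.
    posterior_odds_null = prior_odds_null / bayes_factor
    # Convert the posterior odds back into a probability.
    return posterior_odds_null / (1 + posterior_odds_null)


# Assumed (illustrative) Bayes factors for a result that just reaches p = 0.05.
for bf in (3, 5):
    print(f"BF = {bf}: P(null | data) = {posterior_null_probability(bf):.3f}")
# BF = 3: P(null | data) = 0.250
# BF = 5: P(null | data) = 0.167
```

Raising the prior probability of the null (for example `prior_null=0.9`, a field where most tested hypotheses are false) pushes the same Bayes factor of 5 to a false-finding probability of about 0.64.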

2 of 182 comments

Re:Or you know.. by hde226868 · 2013-11-12 12:01 · Score: 5, Insightful

The problem with frequentist statistics as used in the article is that its "recipe" character often leads people who do not understand its limitations to apply it anyway (a good example is assuming a normal distribution when the data are not normally distributed). The Bayesian approach does not suffer from this problem, in part because it forces you to think a little more about the problem you are trying to solve than the frequentist approach does. But that is also the problem with the cited article: staying within the same framework and moving to more discriminating thresholds does not really solve the problem that people do not understand their data analysis (a p-value based on the wrong distribution remains meaningless, even if you change the threshold). Because Bayesian statistics is more logical in its setup, the danger of making such mistakes is smaller there. The telescoper over at http://telescoper.wordpress.com/2013/11/12/the-curse-of-p-values/ has a good discussion of these issues.
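The point that a p-value computed against the wrong distribution stays miscalibrated whatever the threshold can be illustrated with a small simulation. This sketch assumes normally distributed data with only n = 5 observations per sample; the test statistic then actually follows Student's t distribution with 4 degrees of freedom, but the hypothetical analyst refers it to the standard normal anyway. Every simulated sample satisfies the null hypothesis, so every rejection is a false positive.

```python
import math
import random
import statistics

STANDARD_NORMAL = statistics.NormalDist()  # mean 0, sd 1


def naive_z_p_value(sample):
    """Two-sided p-value for H0: mean == 0, referring the studentized mean
    to the standard normal. With a small sample and an estimated SD the
    statistic really follows Student's t (n - 1 df), so this is the
    'wrong distribution' mistake."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample SD with n - 1 denominator
    z = mean / (sd / math.sqrt(n))
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def false_positive_rate(alpha, n=5, trials=20_000, seed=1):
    """Fraction of null-true samples that the naive test calls significant."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data really are normal with mean 0, so H0 is true every time.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if naive_z_p_value(sample) <= alpha:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.005):
    print(f"nominal {alpha}: observed {false_positive_rate(alpha):.3f}")
```

For n = 5 the theoretical false-positive rates are about 0.12 at a nominal 0.05 and about 0.05 at a nominal 0.005, so tightening the threshold shrinks the absolute error but makes the relative miscalibration worse (roughly 2.4× versus roughly 10×).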

Interpretation of the 0.05 threshold by Michael Woodhams · 2013-11-12 12:06 · Score: 5, Insightful

Personally, I've treated results with p-values between 0.01 and 0.05 as merely 'suggestive': "It may be worth looking into this more closely to find out whether this effect is real." Between 0.001 and 0.01 I'd take the result as tentatively true; I'll accept it until someone refutes it.
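The tiers described above can be written down as a small function. This is a hypothetical sketch: the comment leaves open which tier gets exactly p = 0.01, and it does not name the regions above 0.05 or at or below 0.001, so those labels are placeholders.

```python
def interpret_p(p: float) -> str:
    """Label a p-value using the commenter's informal tiers.

    Boundary handling (which tier gets exactly 0.01 or 0.05) and the labels
    for p > 0.05 and p <= 0.001 are hypothetical placeholders.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p > 0.05:
        return "not significant"    # hypothetical label
    if p > 0.01:
        return "suggestive"         # 0.01 < p <= 0.05
    if p > 0.001:
        return "tentatively true"   # 0.001 < p <= 0.01
    return "stronger evidence"      # hypothetical label, p <= 0.001


print(interpret_p(0.04))  # suggestive
```

Under this scheme a p = 0.04 result is only 'suggestive', not something to accept as established.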
If you take p=0.04 as demonstrating that a result is true, you're being foolish and statistically naive. However, unless you're a compulsive citation follower (which I'm not), you are somewhat at the mercy of other authors. If Alice says "In Bob (1998) it was shown that ...", I'll tend to accept it without realizing that Bob (1998) rested on a p=0.04 result.
Obligatory XKCD

--
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas. (Four things in this world are sacred: books, children, liberty and generosity.)