Why P-values Cannot Tell You If a Hypothesis Is Correct
ananyo writes "P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume. Critically, they cannot tell you the odds that a hypothesis is correct. A feature in Nature looks at why, if a result looks too good to be true, it probably is, despite an impressive-seeming P value."
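To put rough numbers on the summary's point that a P value is not the odds that a hypothesis is correct, here is a minimal sketch using Bayes' rule. The prior values and the 80% power figure are illustrative assumptions, not numbers taken from the Nature feature, and the function name is made up for this example.

```python
def prob_effect_is_real(prior, power=0.8, alpha=0.05):
    """Probability that an effect is real, given a 'significant' result.

    prior: assumed chance the hypothesis was true before the study
    power: assumed chance of a significant result when the effect is real
    alpha: significance threshold, i.e. false-positive rate under the null
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Same p < 0.05 result, very different conclusions depending on the prior.
long_shot = prob_effect_is_real(prior=0.1)
toss_up = prob_effect_is_real(prior=0.5)
print(f"long-shot hypothesis: {long_shot:.2f}")
print(f"toss-up hypothesis:   {toss_up:.2f}")
```

Under these assumptions, a significant result for a long-shot hypothesis still leaves roughly a one-in-three chance that it is a false alarm, which the threshold of 0.05 alone does not reveal.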
Don't worry, with the way beta is going you'll soon have first post on -every- post :)
http://xkcd.com/882/
Even the example of p=0.01 from the article is subject to the same problem. That's why the LHC held out for something like 5 sigma before declaring the Higgs boson to be discovered. Even then, there's always the chance, however remote, that statistics fooled them.
It takes more than one study.
There is a push to have studies include Bayesian probability.
IMHO all papers should be read by statisticians just to be sure the calculations are correct.
The Dunning-Kruger effect explains most posts on
There is no shortage of misleading statistics out there. It can be a discipline fraught with peril for the uninformed, and there are lots of statistics packages out there that reduce advanced tests to a "point and shoot" level of difficulty that produces results that may not mean what the user thinks they mean. I've read some articles showing no lack of problems in the social sciences, but the problem is bigger than that.
I can't help wondering how much that plays into the oscillating recommendations that you see for various foods. Both coffee and eggs have gone through repeated cycles of, "it's bad," "no, it's good," "no, it's bad," "no, it's good." I understand that at least some of it comes down to the aspect they choose to measure, but I can't help but wonder how much bad statistics is playing into it.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
The world is full of coincidental correlations waiting to be rationalized into causal relationships.
That means "outmoded and archaic", right?
I realize I have a p-value in my .sig line and have for a decade, but p-values were a mediocre way to communicate the plausibility of a claim even in 2003. They are still used simply because the scientific community--and even more so the research communities in some areas of the social sciences--are incredibly conservative and unwilling to update their standards of practice long after the rest of the world has passed them by.
Everyone who cares about epistemology has known for decades that p-values are a lousy way to communicate (im)plausibility. This is part and parcel of the Bayesian revolution. It's good that Nature is finally noticing, but it's not as if papers haven't been published in ApJ and similar journals since the '90s with curves showing the plausibility of hypotheses as positive statements.
A p-value is the probability of the data occurring given that the null hypothesis is true, which in the strictest sense says nothing about the hypothesis under test, only the null. This is why the value cited in my .sig line is relevant: people who are innocent are not guilty. This rare case, where there is an interesting binary opposition between competing hypotheses, is the only one where p-values are modestly useful.
In the general case there are multiple competing hypotheses, and Bayesian analysis is well-suited to updating their plausibilities given some new evidence (I'm personally in favour of biased priors as well.) The result of such an analysis is the plausibility of each hypothesis given everything we know, which is the most anyone can ever reasonably hope for in our quest to know the world.
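For anyone who wants to see what that updating looks like mechanically, here is a rough sketch with three competing hypotheses about a coin. Everything in it (the hypotheses, the deliberately biased prior, the 14-heads-in-20 data and the function names) is made up for illustration; it is one simple way to do the update, not a general-purpose method.

```python
from math import comb


def binomial_likelihood(heads, flips, p_heads):
    """Probability of seeing exactly `heads` in `flips` if P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


def bayes_update(priors, likelihoods):
    """Multiply each prior by its likelihood, then renormalise to sum to 1."""
    weights = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical competing hypotheses: each one fixes the coin's P(heads).
hypotheses = {"fair": 0.5, "slightly biased": 0.6, "heavily biased": 0.8}
# A biased prior: we start out strongly favouring "fair".
priors = {"fair": 0.80, "slightly biased": 0.15, "heavily biased": 0.05}

heads, flips = 14, 20  # made-up evidence
likelihoods = {h: binomial_likelihood(heads, flips, p) for h, p in hypotheses.items()}
posterior = bayes_update(priors, likelihoods)

for name, plausibility in posterior.items():
    print(f"{name}: {plausibility:.3f}")
```

With these made-up numbers, "fair" stays the most plausible hypothesis but loses ground to the other two, and the output is a plausibility for every hypothesis rather than a single accept/reject verdict.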
[Note on language: I distinguish between "plausibility"--which is the degree of belief we have in something--and "probability"--which I'm comfortable taking on a more-or-less frequentist basis. Many Bayesians use "probability" for both of these related but distinct concepts, which I believe is a source of a great deal of confusion, particularly around the question of subjectivity. Plausibilities are subjective, probabilities are objective.]
Blasphemy is a human right. Blasphemophobia kills.
I learnt the uselessness of statistics for guidance of correctness when trying to reduce the effort I needed at Sudoku. I've since discovered the best way to win is not to play. Doesn't stop me trying though!
From TFA:
A few folk here have commented using incomplete or inaccurate definitions of p-values. A p-value is the probability of finding new data as extreme as, or more extreme than, the data you observed, assuming a null hypothesis is true. A couple of salient criticisms not mentioned in the article are a) why should more extreme data be lumped in with what was observed and b) what if "new" data can't sensibly be obtained.
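To make the "as or more extreme" part concrete, here is a small sketch that computes an exact one-sided p-value for a coin-flip experiment. The scenario (15 heads in 20 flips, null hypothesis of a fair coin) and the function name are assumptions chosen purely for illustration.

```python
from math import comb


def one_sided_binomial_p_value(observed_heads, flips, p_null=0.5):
    """P(heads >= observed_heads) if the null P(heads) = p_null is true.

    The sum lumps the observed count together with every more extreme
    count, which is exactly criticism (a) above.
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(observed_heads, flips + 1)
    )


p_value = one_sided_binomial_p_value(15, 20)
print(f"p = {p_value:.4f}")
```

Note what the number is conditioned on: the null being true. Nothing in the calculation refers to the probability that the null, or any alternative, is itself correct.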
In a less technical sense, what the article didn't get into so much is that there is a strong publication bias towards results that are significant (i.e. small p-values), to the point where you need <0.05 to even consider submitting. Some key reading: http://www.stanford.edu/~neilm/qjps.pdf. The short version is to not believe it when the news says that "recent research shows...".
Personally, I wait for evidence to accumulate before, say, changing my diet. And if you really want to get it right, dig through the literature yourself. Some of my saddest moments have come from statistics consulting, where mostly people come to you looking for permission to run an inappropriate analysis, not to understand their data or fit the "right" model. They want to get published, and that's just how things are done.
Also there is a simpler analysis of the above article
MOD THE CHILD UP!
One variant of "p-hacking" is "torturing the data", or performing the same statistical test over and over again, on slightly different data sets, until you get the result that you want. You will eventually get the result you want, regardless of the underlying reality, because when there is no real effect you should still expect, on average, about 1 spurious "significant" result for every 20 statistical tests you perform at the p=0.05 threshold.
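A quick sketch of how fast this adds up, assuming the tests are independent and there is no real effect anywhere (in which case each test's p-value is uniformly distributed between 0 and 1). The function names, trial count and seed are arbitrary choices for the illustration.

```python
import random


def chance_of_spurious_hit(num_tests, alpha=0.05):
    """Exact probability that at least one of `num_tests` independent
    tests of a true null comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** num_tests


def simulate_spurious_hit(num_tests, alpha=0.05, trials=20_000, seed=0):
    """Monte Carlo check: draw uniform p-values (the null is true in every
    test) and count how often at least one falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(num_tests))
        for _ in range(trials)
    )
    return hits / trials


exact = chance_of_spurious_hit(20)
simulated = simulate_spurious_hit(20)
print(f"exact:     {exact:.3f}")
print(f"simulated: {simulated:.3f}")
```

Under these assumptions, running 20 tests gives roughly a 64% chance of at least one "significant" finding with nothing real behind it. Real subgroup analyses are rarely independent, so the exact figure will differ, but the direction of the problem is the same.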
I remember one amusing example, which involved a researcher who claimed that a positive mental outlook increases cancer survival times. He had a poorly-controlled study demonstrating that people who keep their "mood up" are more likely to survive longer if they have cancer. When other researchers designed a larger, high-quality study to examine this phenomenon, it found no effect. Mood made no difference to survival time.
Then something interesting happened. The original researcher responded by looking for subsets of the data from the large study, to find any sub-groups where his hypothesis would be confirmed. He ended up retorting that "keeping a positive mental outlook DID work, according to your own data, for 35-45 year-old East Asian females (peven if the p value was 0.05.
This kind of thing crops up all the time.