Is Statistical Significance Significant? (npr.org)
More than 850 scientists and statisticians told the authors of a Nature commentary that they are endorsing an idea to ban "statistical significance." Critics say that declaring a result to be statistically significant or not essentially forces complicated questions to be answered as true or false. "The world is much more uncertain than that," says Nicole Lazar, a professor of statistics at the University of Georgia. An entire issue of the journal The American Statistician is devoted to this question, with 43 articles and a 17,500-word editorial that Lazar co-authored.
"In the early 20th century, the father of statistics, R.A. Fisher, developed a test of significance," reports NPR. "It involves a variable called the p-value, that he intended to be a guide for judging results. Over the years, scientists have warped that idea beyond all recognition, creating an arbitrary threshold for the p-value, typically 0.05, and they use that to declare whether a scientific result is significant or not. Slashdot reader apoc.famine writes: In a nutshell, what the statisticians are recommending is that we embrace uncertainty, quantify it, and discuss it, rather than set arbitrary measures for when studies are worth publishing. This way research which appears interesting but which doesn't hit that magical p == 0.05 can be published and discussed, and scientists won't feel pressured to p-hack.
"In the early 20th century, the father of statistics, R.A. Fisher, developed a test of significance," reports NPR. "It involves a variable called the p-value, that he intended to be a guide for judging results. Over the years, scientists have warped that idea beyond all recognition, creating an arbitrary threshold for the p-value, typically 0.05, and they use that to declare whether a scientific result is significant or not. Slashdot reader apoc.famine writes: In a nutshell, what the statisticians are recommending is that we embrace uncertainty, quantify it, and discuss it, rather than set arbitrary measures for when studies are worth publishing. This way research which appears interesting but which doesn't hit that magical p == 0.05 can be published and discussed, and scientists won't feel pressured to p-hack.
But not always.
Then I took a course on statistics, and the stats professor told me that 47.37% of all statisticians make up their own statistics.
Some drink at the fountain of knowledge. Others just gargle.
100% of all published incorrect results have a P value above 0.05
> 850 scientists and statisticians
Not a statistically significant representation of the scientific community.
A prime number is divisible only by itself and 1
1 is prime (by this definition)
3 is prime
5 is prime
7 is prime
11 is prime
13 is prime
9 is experimental error.
The proposition that "all odd numbers are prime" has a P value above 0.05.
Nope. I'll delete it from Wikipedia later today.
882: Significant
If intelligent life is too complex to evolve on its own, who designed God?
In particle physics (the field in which I have my Ph.D. but--full disclosure--no longer work), the standard is 3 sigma to claim evidence for an effect, and 5 sigma to claim discovery. Publication of results below 3 sigma is not only encouraged, but required...it's unethical to conceal such results. A null result can be a theory killer.
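For a sense of scale, here is a minimal sketch that converts sigma levels into tail probabilities. It assumes a Gaussian test statistic and shows both the one-sided and two-sided conventions; the comment above does not say which convention is meant, and the function names are hypothetical.

```python
import math

def one_sided_p(n_sigma):
    """Upper-tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

def two_sided_p(n_sigma):
    """Probability of landing more than n_sigma from the mean in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))

# 1.96 sigma is included because it corresponds to the familiar two-sided 0.05.
for n_sigma in (1.96, 3, 5):
    print(f"{n_sigma} sigma: one-sided p = {one_sided_p(n_sigma):.3g}, "
          f"two-sided p = {two_sided_p(n_sigma):.3g}")
```

Under these assumptions, a two-sided p of 0.05 sits at roughly 2 sigma, far below either particle physics threshold.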
In my International Relations graduate program there was a big push towards quantitative research and analysis; there were two mandatory classes on it. However, I always felt that it broke things down into too simplistic a view, and while it could tell things might be correlated, it never told you why. And with human systems like societies, states, conflict, politics, etc, there are so many inputs, so many factors that contribute to why people act the way they do, what decisions they make, that to boil it down to one or two that are "statistically significant" isn't missing the forest for a tree, it's missing the forest for a bush. Complex systems very often have complex inputs.
That's why I preferred a more qualitative approach: there was no arbitrary line of significance. It allowed you to explore more complicated or elaborate analyses. There was no worry about getting bogged down in what regression method you used or why, whether your math was wrong, or you excluded/included a variable that you shouldn't have. It gives you the chance to simply lay out your theory, your interpretation, and the evidence to back up that interpretation. And best of all, it allows you to do it in such a way that it makes your research much more accessible to other people. I also prefer a more narrative style of writing anyway. Now, of course this is for a humanities discipline. A more scientific discipline would require significantly more math.
The only thing necessary for evil to triumph is for it to be pitted against a slightly greater evil
This way research which appears interesting but which doesn't hit that magical p == 0.05 can be published and discussed
The significance value is essentially a measurement of how good a researcher is at their job. Unfortunately, a lot of academics feel that they shouldn't be bothered by silly things like "accountability", because they've chosen the noble ivory tower of research.
If your experiment can't hit that level of certainty, redesign your experiment. Go get more samples, run more simulations, and grow more cultures. Alternatively, go ahead and publish, but include the note that the job isn't actually finished. Use the partial result to justify asking for more funding so you can complete the work.
(These are all things I saw first- or secondhand during my time in academia)
I'd be fine getting rid of the p-value, but it would have to be replaced by something else that does an equal job of filtering out the half-assed crank "research" that makes more headlines than discoveries. The only replacement I can think of that wouldn't be vulnerable to similar "hack" methods would be to require that every experiment go through an exhaustive process inspection before, during, and after the run. That's an even more painful thing to deal with than making sure your experiment can produce significant results.
You do not have a moral or legal right to do absolutely anything you want.
Plus they are almost all from biology or medicine. Just because their fields don't seem to understand what statistically significant means does not mean that the rest of us do not. In their example, two results measure the same value, but one is within one sigma of a null result and the other is not, and they claim that people interpret this as two incompatible results!? I do not know of any physicist who would look at those data and make that assertion.
Their paper reads more like an "I wish our colleagues understood simple statistics" plea. Banning certain terms is not going to address the underlying problem they clearly have. The solution to ignorance is education, not censorship, as they really ought to know, working in universities!
Even without a magical "significant/insignificant" threshold, researchers will still evaluate, judge, and compare levels of significance. The pressure will just shift to come up with results that are "MORE significant" rather than "LESS significant," and thus p-hacking will continue by those that were willing to cross that line in the first place.
The root cause is going to remain until peer reviewers force researchers to commit to how they're going to evaluate their measurements before they take those measurements. But the likely outcome would be either a lot less research would get published at all or published research would start to lose some of the imprimatur it now enjoys, including that of the peer reviewers. So that's unlikely to happen.
On average, humans have one breast and one testicle.
It's even worse when economic trends are reported in the popular press.
1 is prime by that definition, but it's mostly called a unit and defined as *not* prime to make factorising integers into primes unique (up to the order of the factors): Prime number - Primality of 1
Sure, in a perfect world we would all discuss the exact probabilities. The reality is we all (even professionals in an industry) have a limited attention span. Benchmarks are useful, even imperfect benchmarks. This is just another example of some purists thinking we should move to some idealized but impractical situation.
I'm really curious about what people think about this comment and my attempt to defend p-values and statistical significance testing as a concept. I used to hate p-values like any respectable scientist, but in teaching intro college stats class (targeted to behavioral science), I've come to appreciate them, for a few reasons.
1. We have to take uncertain science and make certain decisions about the conclusions. Science gets simplified to dichotomous decisions. You either approve the drug or not. You either eat eggs or don't eat eggs. The defendant is guilty or not guilty. In each of these cases, we take scientific and other evidence and have to make a decision: do we trust these data? Confidence intervals, odds ratios, etc, help give a picture but they don't give a clear guideline about what to accept.
2. It's really hard to understand (and teach) Bayesian and other approaches. I think that statistical significance is a decent proxy, as long as the limitations are well-understood. I am a big believer in teaching science research to people who have no desire to ever be "researchers", and in order to evaluate their studies, statistical significance is a good proxy. If you are doing an intro biology lab testing whether there are more bacteria on your hands after washing your hands versus hand sanitizer, a t-test with a p < .05 criterion is a good approach. It won't get published in JAMA, but it's good for teaching research concepts.
3. Reviewers still want p-values. Each time I have submitted a manuscript without p-values, I get a nasty reviewer who requires p-values. Maybe I've had bad luck, but I'm guessing this is pretty common in the literature. Any time I try a statistical technique that goes beyond null hypothesis testing, there is at least one reviewer who doesn't understand the technique and gripes because there are no p-values or decision criteria. As long as this is required to publish, we need to do it.
So these aren't very good defenses, but it's why I'm still teaching p-values and null hypothesis testing. Maybe we will get rid of it, but like some other comments here, it leaves the question of what the alternative would be.
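To make the intro-lab example concrete, here is a minimal sketch of a two-group comparison against a 0.05 criterion. Note the swap: instead of the t-test mentioned above, it uses an exact permutation test on the difference in means, only because that needs nothing beyond the standard library. The colony counts are invented for illustration and the function name is hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_gap(sum_a):
        # Absolute difference in means when one group sums to sum_a.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    extreme = 0
    splits = 0
    # Enumerate every way of relabelling which units belong to group A.
    for idx in combinations(range(len(pooled)), n_a):
        splits += 1
        if mean_gap(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / splits

# Hypothetical bacterial colony counts, NOT real measurements.
hand_washing = [12, 15, 11, 14, 13]
sanitizer = [30, 28, 35, 31, 29]

p = permutation_p_value(hand_washing, sanitizer)
print(f"p = {p:.4f}, below 0.05: {p < 0.05}")
```

With five units per group there are only 252 possible relabellings, so full enumeration is cheap; larger classes would need random sampling of relabellings instead.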
Mostly, they don't understand that the world isn't black and white.
People want answers. That's a given. And they used to turn to science for this. I say used to, because more and more people think that woo has better answers for their questions. The reason is less that science does not have answers, but that the answers science has require thinking and understanding. They are rarely YES or NO. There's a lot of ifs and buts attached, but people don't want that. They want easy answers.
And reality has rarely easy answers.
"Statistically significant" doesn't mean "resoundingly YES". But that was what was read into it, and of course that expected YES cannot be delivered.
Yes, reading statistics requires some effort by those trying to understand them. Unfortunately that's not what people want to do when they're looking for answers.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
If you understand what it means and how to apply it. If you blindly slap on the formula and use the resulting number to say, "Look, it's significant!", then, no, it isn't.
When I took statistics, the text made it clear that a P-value of 0.05 is *somewhat* arbitrary, in that for any individual analysis, it is a useful threshold, but by itself not an absolute indicator of significance. I think the people in this group are guilty of overstating their argument. Determining P-value, or any other statistical measure of significance, is the *start* of a study, and then comes all the hard work of determining if that value is pointing to something truly significant. A p value of 0.05 is certainly going to suggest that the finding is significant, but it is not THE definitive test.
The world's burning. Moped Jesus spotted on I50. Details at 11.
meh just set it to 0.051 and watch 90% of "science" publication burn
The wait is over - YOU did!
"A prime number is divisible only by itself and 1
1 is prime (by this definition)"
When I was learning Maths (Mathematics is plural where I come from) I was taught that 1 is not prime, it is a special case.
Anyway for 1 the statement becomes:
1 is divisible by 1 and 1
But 1and1 is now IONOS
And reality has rarely easy answers.
Which is why engineers answer most questions with "it depends".
Having P = 0.05 as the threshold has led to "P-Hacking". One example was the "chocolate" trial that had a small group eating different diets and tracked a whole slew of things, looking for a correlation in any of them. It happened that the small group eating 1.5oz of chocolate and otherwise dieting lost slightly more weight than the group just dieting. Due to not having a specific goal in mind, and a small sample size, they were bound to find some sort of "positive correlation" somewhere, and there you go. If it hadn't been weight loss, it would have been heart rate or something else.
https://io9.gizmodo.com/i-fool...
https://en.wikipedia.org/wiki/...
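The "bound to find something" point can be sketched with a little arithmetic. This assumes every tracked outcome is independent and that no real effect exists, both simplifications; the outcome counts used below are illustrative and not taken from the trial.

```python
def chance_of_any_false_positive(n_outcomes, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_outcomes

# Illustrative numbers of tracked outcomes (assumed, not from the study).
for n_outcomes in (1, 5, 20):
    chance = chance_of_any_false_positive(n_outcomes)
    print(f"{n_outcomes:2d} outcomes tracked -> {chance:.1%} chance of a spurious 'finding'")
```

Under those assumptions, tracking 20 outcomes gives roughly a 64% chance of at least one "significant" result from pure noise.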
Inheritance is the sincerest form of nepotism.
The issue I find with nearly every single biological application of p-value testing is that either the wrong test is used, or, far more frequently, the necessary validations of the assumptions of the test have not been made. I assume that among those many articles from The American Statistician (a journal that I do not read) that point will have been made because although it is a subtle one, it isn't that subtle, and it is important.
The most commonly used statistical tests assume that unaccounted experimental variability will be Gaussian in nature. That assumption is patently false for the general case. Noise sources for some things are Gaussian -- thermal noise in an electrical signal for example -- but many, many biological sources are not.
When Nature is non-Gaussian, you have to be extra super careful with your tests of significance. And nearly every paper that I've read skips doing noise analysis to validate their tests. Even the lowly mean and standard deviation functions assume Gaussian variability for correct interpretation. The alternative is to have p-values that are so small that results are obvious by inspection --- and then you don't need statistics.
That's the sort of science I strive to perform.
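A minimal sketch of the point about mean and standard deviation: the familiar reading "about 68% of values fall within one standard deviation of the mean" holds for Gaussian noise but not for skewed noise. The log-normal distribution here is just one assumed example of non-Gaussian variability, and the helper name is hypothetical.

```python
import math
import random

def fraction_within_one_sd(samples):
    """Share of samples lying within one sample standard deviation of the sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return sum(1 for x in samples if abs(x - mean) <= sd) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]  # assumed non-Gaussian example

gaussian_fraction = fraction_within_one_sd(gaussian)
skewed_fraction = fraction_within_one_sd(skewed)
print(f"Gaussian noise:   {gaussian_fraction:.3f} within one SD")
print(f"Log-normal noise: {skewed_fraction:.3f} within one SD")
```

The same two summary numbers describe very different shapes, which is why checking the noise distribution before choosing a test matters.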
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
If you browse around a typical statistics textbook, you will probably find a brief discussion about the difference between statistical significance and real world significance. It seems like a lot of people in the sciences, especially in the soft sciences, are chasing after statistical significance because it's now some kind of a prerequisite to get published. However, their findings can amount to very little in the real world. Imagine for example that you find a statistically significant difference in commute distance between people who drink diet coke and people who drink tomato juice. Sounds like a great title for a click-bait report. But in reality, your estimates can be 7.34 miles vs 7.36 miles, a difference of about 32 meters.
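Here is a rough sketch of how a 0.02 mile gap can come out "significant". It uses a two-sample z-test with a normal approximation, and assumes a 5 mile standard deviation in commute distance and equal group sizes; those numbers are invented for illustration, and the function name is hypothetical.

```python
import math

METERS_PER_MILE = 1609.344

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sample z-test p-value for a difference in means (normal approximation)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))

diff_miles = 7.36 - 7.34
diff_meters = diff_miles * METERS_PER_MILE  # roughly 32 meters

ASSUMED_SD_MILES = 5.0  # assumption, not from the comment above
p_small = two_sided_p_value(diff_miles, ASSUMED_SD_MILES, 1_000)
p_huge = two_sided_p_value(diff_miles, ASSUMED_SD_MILES, 1_000_000)

print(f"difference: {diff_meters:.1f} m")
print(f"n = 1,000 per group:     p = {p_small:.3f}")
print(f"n = 1,000,000 per group: p = {p_huge:.4f}")
```

The effect size is identical in both runs; only the sample size changes, and with it the verdict at the 0.05 line.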
Yay.
So we can look forward to even MORE broken, badly researched, pointless garbage being published as academically or scientifically relevant.
Look at the finances of any journal pushing this crap. They're probably on borrowed time, in the financial sense.
Chas - The one, the only.
THANK GOD!!!
And this is why we don't make good politicians. Politics need easy answers. They needn't be correct or even solve anything, but they have to be easy to understand.
By Michael Greenacre & Gurdeep Stephens
Barcelona, Catalonia & Victoria, Canada, March 2015
Statistics, logistics, cladistics seem to me
To have a common theme scientifically,
Economists, biologists, with PhD degrees,
They all need some proof of their theories.
A letter is the key, you'll see clearly,
Not B nor G nor V -- but it's the P !
There's no values like P-values
Like no values I know
Think of something that is not worth proving,
An hypothesis that everyone calls null,
If your P is too large to reject it
Then your experiment is rather dull.
There's no values like P-values,
Especially when they are low,
Don't be sad if your P's over point-O-five,
Just try again with samples twice the size,
Everything is possible, just trust in me:
Put your faith in the P.
The F test, the Z test, the chi-square and the T
And other cryptic terminology
Anova, regression, tests distribution-free,
They all need some sort of guarantee.
So if you find a tiny effect size
The P-value will be a good disguise.
There's no values like P-values,
The frequentist's hero,
When you get that data modeling feeling
But results you have are not a lot,
You will need some stats that are appealing
To show the journals your work is hot!
There's no values like P-values
Especially when they're low
Don't be sad if your P's over point-O-five,
Just try again with samples twice the size
Everything is possible, just trust in me:
Put your faith in the P!
We have too many researchers doing too many studies about the same topic and we incorrectly view each study as a separate event. Without quantification of the number of unpublished, published, and significant studies on a given topic, an individual study's relevance is unknown. If 200 separate researchers did 50 studies each (or 2000 did 5 studies) for a total of 10,000 studies, at p = .05, we could expect 500 false positives if none of the effects were real. When a study is published without knowing the universe of all studies on that topic, we do not know if any report of a significance level is really significant. Add that there is a bias to report and document positive over negative results. There is also data mining, where an existing database is used to search for any relationship among the historical variables at a given p value and then report that relationship. With a large universe of studies and with data mining of historical data, an individual study's significance level is unknown and reproducibility is very low. Combining the data of published studies does not help, since there is a bias in what is reported and published.
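The arithmetic behind the 500 figure, as a minimal sketch. It assumes every study tests an effect that does not exist, so each p-value is uniform on [0, 1], and that the studies are independent; the `share_null_true` parameter and the function names are hypothetical additions for illustration.

```python
import random

def expected_false_positives(n_studies, alpha, share_null_true=1.0):
    """Expected count of 'significant' results among studies of non-existent effects."""
    return n_studies * share_null_true * alpha

def simulate_false_positives(n_studies, alpha, seed=0):
    """Draw one uniform p-value per null study and count those below alpha."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_studies) if rng.random() < alpha)

n_studies = 200 * 50  # same total as 2000 researchers doing 5 studies each
expected = expected_false_positives(n_studies, alpha=0.05)
simulated = simulate_false_positives(n_studies, alpha=0.05)

print(f"studies: {n_studies}, expected false positives: {expected:.0f}")
print(f"one simulated batch: {simulated} false positives")
```

If only the "significant" studies get written up, a reader sees hundreds of positive reports and none of the thousands of nulls behind them.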
The real problem is when scientists aren't interested in finding something significant, they are interested in getting published. In that situation, even setting the threshold at .0005 will end up with p value hacking.
"First they came for the slanderers and i said nothing."
"Prime numbers have to be greater than 1 so 1 is not a prime."
According to your definition. Like most terms, there is no king to give the definitive definition. To me, prime is a cut of meat.
Also, remember that math is just a mental construct that allows our human minds to interpret the universe around us.
Ninjas don't carry tic tacs
The even number 2
"All odd numbers are prime" does not imply "no even numbers are prime".
He's getting rather old, but he's a good mouse.
Actually 1 is neither prime nor composite by some deep mathematical definitions which go beyond the integers -- they go into the structure of algebraic rings which are generalizations of the integers. If you allow 1 (a unit) to be prime then you break some properties and theorems which everyone generally accepts in the algebra of the integers. The most well known such property is that of unique factorization -- any natural number is factored uniquely into prime factors. If you let 1 be prime then the prime factorization of a composite number can have any number of factors of 1 in it.
The deeper definition of a prime (from my old abstract algebra book) is, "In the Euclidean ring R a nonunit p is said to be a prime element of R if whenever p = ab, where a, b are in R, then one of a or b is a unit in R."
And there is a king which gives the definitive definition -- it is the accepted body of mathematical definitions by the world's mathematical community. There are sometimes differing definitions of a term, but those differences are usually well spelled out in any discussions. You can choose not to accept the definitions as the professionals in the field use them but then don't claim your definition is as good or useful as that of the pros.
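The unique factorization point can be illustrated with a short sketch. This is plain trial division over the positive integers, not the general Euclidean ring setting quoted above, and the function name is hypothetical.

```python
import math

def prime_factors(n):
    """Return the prime factors of n >= 2 in non-decreasing order (trial division)."""
    if n < 2:
        raise ValueError("only integers >= 2 have a prime factorization here")
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

canonical = prime_factors(12)
print(canonical)

# If 1 counted as prime, each of these would be an equally valid "prime factorization" of 12:
padded_versions = [[1] * k + canonical for k in range(4)]
for version in padded_versions:
    print(version, "->", math.prod(version))
```

Excluding 1 is what lets the function return one answer per number instead of infinitely many.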
I don't care how significant your p value is, if your n is less than 40 case/control matches, your values are meaningless, other than as proof of concept for further study.
Wake me up when you get 256/256 fully matched case/control with true randomization. Then we'll talk p values.
-- Tigger warning: This post may contain tiggers! --
To get a truly significant number, you would need 850 scientists, and 850 controls (or non-scientists). And you would need a truly randomized sample of both. If all the scientists are the same age, BMI, and gender, it's not even close to randomized. Throw in some post-docs.
p < 0.05 is supposed to be kind of a minimum threshold. Higher than that and you really can't draw any conclusions. Less than that, and you might have something. Maybe. It's basically a first level filter.
Does that mean you get some false negatives? Absolutely. And you also get lots of false positives.
There seem to be a bunch of people who want to look at confidence intervals and say "well, a good part of my confidence interval is over here, which is interesting, so this is important!" There are also a bunch of people who think that too many false positives get published and want to clean things up. These two things are at odds with each other.
I much prefer Bayes factor hacking. Sounds way fancier.
Lies, damned lies, and statistics is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments
https://en.m.wikipedia.org/wik...
Casteism
Then don't use the term "statistical significance" to mean "real". This p-value exaggeration was already being discussed over 10 years ago, but the p-value is still being used because many people, like you, feel, as you are saying, that they have to accept some thing or method in order for a result to be "real". As MightyMartian said, the p-value could be used as an initial test, but it should never be called "statistical significance" at all.
I've always thought "statistical reliability" was a better name.