Social Science Journal 'Bans' Use of p-values
sandbagger writes: Editors of Basic and Applied Social Psychology announced in a February editorial that researchers who submit studies for publication would not be allowed to use common statistical methods, including p-values. While p-values are routinely misused in scientific literature, many researchers who understand their proper role are upset about the ban. Biostatistician Steven Goodman said, "This might be a case in which the cure is worse than the disease. The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however misused, they need to substitute it with something more meaningful."
It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like this social science journal either has bad reviewers or is bad at methodology.
It's a war, I tell you, a war on frequentists! I'm 95% certain!
https://xkcd.com/882/
This is social science. Mathematics and statistics aren't even relevant.
Correlation between low intelligence and uninformed statements of this nature is p<0.01.
This is social science. Mathematics and statistics aren't even relevant.
Yes they are. Get quantitative data, use quantitative methods.
Just because most social 'scientists' are not experts at statistical inference, it doesn't mean it can't be done correctly.
p-values are just a probability of something. Design your experiment well and the 'something' makes sense.
Ok, let me enlighten the readers a bit. The reviewers tend to be typical researchers within the field. The typical social researcher does not have a very strong math background. A lot of them do qualitative research, and quantitative work tends to stop at ANOVA. I have multiple master's degrees in business and social science and worked on a Ph.D. in social science (being vague here for a reason). However, I have a dual bachelor's in comp sci and math, and I know statistical analysis very well. My master's thesis for my MBA was an in-depth analysis of survey responses: 30 pages of body and really good graphs. My research professor, an econometrics professor, and I submitted it to a second-tier journal associated with the field I specialized in...
... 6 pages got published. 6?!? They took out the vast majority of the math. Why? "Our readers are really bad at math," said the editor. If you knew the field... you would be scared shitless. The reviewers suggested we take out the math because it confused them. This is why they want p-values out... they are misunderstood and abused. The reviewers have NO idea whether they are being used correctly.
This is why we can't have nice things.
I agree with you, but there's no need for the quotes around social 'scientists.' Psychologists, sociologists, etc. employ the same experimental designs and mathematical techniques as doctors or others running drug-efficacy or medical-outcome experiments, for example.
P-value: it's intervention group versus control group. Standard, basic scientific experimental design and statistical analysis stuff.
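For readers who want to see what an intervention-versus-control p-value actually computes, here is a minimal sketch using a permutation test, one of several valid ways to get such a p-value (a t-test is the more common textbook choice). The scores are invented for illustration, and `permutation_p_value` is a hypothetical helper, not a library function.

```python
import random
from statistics import fmean

def permutation_p_value(intervention, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    If the intervention does nothing, the group labels are arbitrary, so we
    shuffle them and count how often a shuffled difference in means is at
    least as large (in absolute value) as the observed one.
    """
    rng = random.Random(seed)
    observed = fmean(intervention) - fmean(control)
    pooled = list(intervention) + list(control)
    n_int = len(intervention)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_int]) - fmean(pooled[n_int:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical outcome scores, invented purely for illustration.
intervention = [12, 15, 14, 16, 13, 17, 15, 16]
control = [11, 12, 10, 13, 12, 11, 14, 12]
print(permutation_p_value(intervention, control))
```

The appeal of the permutation version is that the null hypothesis ("the labels don't matter") is spelled out directly in the code rather than hidden in a distribution table.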
It's an uninformed and naive view to think that people looking at the behavior of humans at the level of social organization are somehow intellectually or scientifically less able than those examining them at the biological level.
I don't think you even need to be pushing people to do Bayesian stats. You just need to force them to graph their data properly. In *a lot* of biological and social science sub-fields it's standard practice to show your raw data only in the form of a table and the results of stats tests only in the form of a table. They aren't used to looking at graphs and raw data. You can hide a lot of terrible stuff that way, like weird outliers. Things would likely improve immediately in these fields if they banned tables and forced researchers to produce box plots (ideally with overlaid jittered raw data), histograms, overlaid 95% confidence intervals corresponding to their stats tests, etc, etc.
Having seen some of these people work, it's clear that many of them never make these plots in the first place. All they do is look at lists of numbers in summary tables. They have no clue what their data really look like, and no good knowledge of how to properly analyse data and make graphs. Before they even teach stats to undergrads they should be making them learn to plot data and read graphs. It's obvious most of them can't even do that.
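As a rough illustration of the point about tables hiding outliers, this sketch computes the numbers a box plot draws (quartiles, median, and points beyond Tukey's 1.5 × IQR fences) for a small invented data set. The reaction times and the `box_plot_summary` helper are hypothetical; a real analysis would plot these rather than print them.

```python
import statistics

def box_plot_summary(data):
    """Return the quartiles a box plot draws, plus points beyond the 1.5*IQR fences."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical reaction times in ms; the last value looks like a data-entry error.
reaction_times = [310, 295, 330, 305, 320, 315, 300, 3100]
print(statistics.mean(reaction_times))   # the kind of number a summary table reports
print(box_plot_summary(reaction_times))  # the kind of thing a box plot makes obvious
```

A table showing only the mean would report a value more than twice the typical response, with nothing to flag why.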
Yes they can, in some cases. There was a very well-controlled study in which two sets of anonymous letters of application were sent to various positions at a large number of companies from a large number of applicants. The letters included similar random credentials from random institutions, random cosmetic variations of the same cover letter, and so on, to avoid tipping the researchers' hand. The only difference between the two groups of letters was that one group was given names sampled uniformly from those of African-Americans, and the other names sampled uniformly from those of everyone else. The names were assigned blindly, literally by random insertion into a form, to avoid introducing bias.
I'm sure you can guess where this is going. The response and offer rates for the applicants with African-American names were significantly lower, both statistically and practically. It's rather hard to explain that away, though I'm sure someone here will try without having even read the study.
There was a very well-controlled study where two sets of anonymous letters of application ...
This study was conducted by Marianne Bertrand and Sendhil Mullainathan, and is described in Stephen Levitt's book Freakonomics, which is a fantastic book for anyone interested in the application of statistics to social science. Here is the original paper.
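To make the "statistically and practically" distinction in the callback study concrete, here is a sketch of a standard two-proportion z-test. The callback counts below are invented for illustration and are NOT the study's data; `two_proportion_z_test` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled standard error)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value

# Invented counts: 200 of 2,000 letters get a callback in one group, 130 of 2,000 in the other.
diff, z, p = two_proportion_z_test(200, 2000, 130, 2000)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.2g}")
```

The p-value answers "could sampling luck alone produce this gap?"; the difference itself (here 10% vs 6.5%, roughly 1.5 times as many callbacks) is what speaks to practical significance.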
...and this isn't even the first journal to do this. It's probably happening now because an entire book has just come out walking people through how universally p-values are abused as statistical measures.
http://www.statisticsdonewrong...
The book is nice in that it does give one replacements that are more robust and less likely to be meaningless, although nothing can substitute for having a clue about data dredging etc.
rgb
Actually, p-values are about CORRELATION. Maybe *you* aren't well-positioned to be denigrating others as not statistical experts.
I may be responding to a troll here, but, no, the GP is correct. P-values are about probability. They're often used in the context of evaluating a correlation, but they needn't be. Specifically, a p-value is the probability of getting a result at least as extreme as the observed one (which may be a correlation) purely through the luck of random sampling, assuming there is no real effect. Good sampling techniques can't eliminate the possibility that your random sample just happens to be non-representative, and the p-value quantifies how often sampling luck alone would produce a result like yours. A p-value of 0.05 means that, if there were no real effect, there would be a 5% chance of getting results this extreme from a particularly unlucky sample.
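A quick simulation makes the sampling-luck idea concrete: when there is no real effect at all, about 5% of experiments still come out with p < 0.05. This sketch assumes normally distributed data with a known standard deviation of 1 (so a simple z-test applies); `null_false_positive_rate` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, fmean

def null_false_positive_rate(trials=20_000, n=10, alpha=0.05, seed=1):
    """Fraction of simulated experiments with NO real effect that still reach p < alpha."""
    rng = random.Random(seed)
    normal = NormalDist()
    hits = 0
    for _ in range(trials):
        # Both groups are drawn from the same distribution, so the null is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z-test using the known population SD of 1.
        z = (fmean(a) - fmean(b)) / math.sqrt(2 / n)
        p = 2 * (1 - normal.cdf(abs(z)))
        if p < alpha:
            hits += 1
    return hits / trials

print(null_false_positive_rate())
```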
The problem with p values is that they only describe one way the experiment could have gone wrong, but people interpret them as overall confidence in the result, or, even worse, as the practical significance of the result, when they really only address the chance that the sample was unrepresentative through bad luck in random sampling. The sample could have been biased because the sampling methodology wasn't good. The result could be meaningless because it finds an effect which is real but negligibly small. It could be meaningless because the experiment was badly constructed and didn't measure what it was supposed to measure. There could be lots and lots of other problems.
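The "real but negligibly small" case is easy to demonstrate. This sketch takes an effect of one hundredth of a standard deviation and computes the two-sided p-value at increasing group sizes, under the simplifying assumptions that the observed difference equals the true effect exactly and the population SD is known. `z_test_p_value` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def z_test_p_value(effect_in_sd, n_per_group):
    """Two-sided p-value for an observed mean difference of `effect_in_sd` standard
    deviations between two groups of n_per_group each (known SD, normal approximation)."""
    z = effect_in_sd / math.sqrt(2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same negligible effect, tested with ever larger groups.
for n in (100, 10_000, 1_000_000):
    print(n, z_test_p_value(0.01, n))
```

The effect never changes; only the sample size does, and with enough data it becomes "highly significant" while staying practically irrelevant.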
There's nothing inherently wrong with p values, but people tend to believe they mean far more than they do.
It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like this social science journal either has bad reviewers or is bad at methodology.
That's a good sentiment, but it won't work in practice. Here's an example:
Suppose a researcher is running rats in a maze. He measures many things, including the direction that first-run rats turn in their first choice.
He rummages around in the data and finds that more rats (by a lot) turn left on their first attempt. It's highly unlikely that this number of rats would turn left on their first choice based on chance (an easy calculation), so this seems like an interesting effect.
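The "easy calculation" here is a binomial tail probability. A minimal sketch, assuming a hypothetical 70 of 100 rats turned left and that, by chance alone, each rat turns left with probability 0.5 independently of the others:

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more left turns by luck alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binomial_tail(70, 100))
```

Taken on its own, this number is correctly computed and very small, which is exactly why the flaw in the example is hard to spot.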
He writes his paper and submits it for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.
There's no realistic way that a reviewer can spot the flaw in this paper.
Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?
(Note that this is a flaw in statistical reasoning, not methodology. It's not because of latent scent trails in the maze or anything else about the setup.)
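One common answer to the puzzle, and probably the one intended, is the multiple-comparisons (data-dredging) problem: he measured many things and reported the one that looked unusual. This sketch assumes the measures are independent and none has a real effect; the helper names are hypothetical. It also shows the Bonferroni correction, one standard (conservative) fix.

```python
def chance_of_a_false_positive(num_measures, alpha=0.05):
    """P(at least one p < alpha) across independent measures with no real effects."""
    return 1 - (1 - alpha) ** num_measures

def bonferroni_threshold(num_measures, alpha=0.05):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / num_measures

for m in (1, 5, 20):
    print(m, round(chance_of_a_false_positive(m), 3), bonferroni_threshold(m))
```

With 20 things measured, the odds are better than even that at least one of them clears P<0.05 by luck, and nothing in the published paper reveals how many were tried.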
====
Add to this the number of misunderstandings that people have about the statistical process, and it becomes clear that... what?
Where does the 0.05 number come from? It comes from Pearson himself, of course - any textbook will tell you that. If P<0.05, then the results are significant and worthy of publication.
Except that Pearson didn't *say* that - he said something vaguely similar and it was misinterpreted by many people. Can you describe the difference between what he said and what the textbooks claim he said?
====
You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation.
P<0.01 is the probability of the data, given the (null) hypothesis. Thus we assume that the probability of the hypothesis is low, given the data.
Can you point out the flaw in this reasoning? Can you do it in a way that other readers will immediately see the problem?
There is a further calculation/formula that will fix the flawed reasoning and allow you to make a correct inference. It's very well-known, the formula has a name, and probably everyone reading this has at least heard of the name. Can you describe how to fix the inference in a way that will make it obvious to the reader?
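The named formula is presumably Bayes' theorem. A minimal sketch of how it turns "probability of a significant result given the null" into "probability of the null given a significant result". The prior (how often tested hypotheses are actually true) and the power are assumed values chosen for illustration, and the helper name is hypothetical.

```python
def prob_null_given_significant(alpha, power, prior_alternative):
    """Bayes' theorem: P(null is true | result was significant at level alpha)."""
    prior_null = 1 - prior_alternative
    sig_and_null = alpha * prior_null          # false positives
    sig_and_alt = power * prior_alternative    # true positives
    return sig_and_null / (sig_and_null + sig_and_alt)

# Assumed inputs: 5% of tested hypotheses are true, 80% power, alpha = 0.01.
print(prob_null_given_significant(alpha=0.01, power=0.8, prior_alternative=0.05))
```

Under these assumptions, even a P<0.01 result leaves roughly a 19% chance that the null hypothesis is true, far from the 1% the naive reading suggests.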
I studied and tutored experimental design and this use of inferential statistics. I even came up with a formula that needed 1/5 the calculator keystrokes when learning to calculate the p-value manually: take the standard deviation and mean for each group, then divide the standard deviation of the means (how different the groups are) by the mean of the standard deviations (how wide the groups of data are), and multiply by the square root of n (the sample size for each group). But that's off the point.
We had 5 papers in our class for psychology majors (I almost graduated in that instead of engineering) that discussed why controlled experiments (using the p-value) should not be published. In each case my knee-jerk reaction was that the authors didn't like math or didn't understand math and just wanted to 'suppose' answers. But each article, written by proficient academics at universities who did this sort of research, attacked the abuse of the math. I came around, too.
The math is established for random environments, but the scientists control every bit of the environment, not to get better results but to detect things so tiny that they really don't matter. The math lets them misuse the word 'significant' as though there were a strong connection between cause and effect. Yet every environmental restriction (same living arrangements, same diets, same genetic strain of rats, etc.) invalidates the result outside the lab. It's called internal validity (finding it in the experiment) vs. external validity (applying it in real life). You can also detect weaker effects by using larger groups (the smallest detectable effect shrinks in proportion to 1/√n). A study can be set up so as to likely find 'something' tiny and get the research prestige, but another study can be set up with different controls that turn out an opposite result. And none of them applies to real life the way results from an entire population living normal lives would.
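Reading the keystroke-saving formula described above as applying to two equal-sized groups, it works out algebraically to the ordinary two-sample t statistic whenever the two groups have equal standard deviations (you would still convert it to a p-value with a t table). When the standard deviations differ, it comes out somewhat larger than t. This sketch checks both cases; the function names and data are hypothetical.

```python
import math
import statistics

def shortcut_statistic(group_a, group_b):
    """SD of the group means / mean of the group SDs * sqrt(n). Assumes equal group sizes."""
    n = len(group_a)
    means = [statistics.mean(group_a), statistics.mean(group_b)]
    sds = [statistics.stdev(group_a), statistics.stdev(group_b)]
    return statistics.stdev(means) / statistics.mean(sds) * math.sqrt(n)

def two_sample_t(group_a, group_b):
    """Standard two-sample t statistic (absolute value) for equal group sizes."""
    n = len(group_a)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt((statistics.variance(group_a) + statistics.variance(group_b)) / n)
    return abs(diff) / se

a = [1, 2, 3, 4, 5]
b = [3, 4, 5, 6, 7]        # same spread as a: the two statistics agree
c = [0, 4, 8, 12, 16]      # much wider spread: the shortcut overshoots t
print(shortcut_statistic(a, b), two_sample_t(a, b))
print(shortcut_statistic(a, c), two_sample_t(a, c))
```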
You have to study and think for quite a while, as I did (even walking the streets around Berkeley to find books on the subject going back 40 years), to see that the phrase "99 percent significance level" means not a strong effect but, more likely, one so tiny, maybe a part in a million, that you'd never see it in real life.
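The square-root-of-n scaling mentioned above can be made explicit: the smallest mean difference that reaches the 99% level shrinks by a factor of 10 every time the group size grows by a factor of 100. This sketch assumes two equal groups, a known standard deviation, and a normal approximation; `smallest_significant_difference` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def smallest_significant_difference(n_per_group, alpha=0.01):
    """Smallest observed mean difference, in SD units, that reaches two-sided p < alpha
    for two groups of n_per_group each (known SD, normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * math.sqrt(2 / n_per_group)

for n in (10, 1_000, 100_000, 10_000_000):
    print(n, smallest_significant_difference(n))
```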