Social Science Journal 'Bans' Use of p-values
sandbagger writes: Editors of Basic and Applied Social Psychology announced in a February editorial that researchers who submit studies for publication would not be allowed to use common statistical methods, including p-values. While p-values are routinely misused in scientific literature, many researchers who understand its proper role are upset about the ban. Biostatistician Steven Goodman said, "This might be a case in which the cure is worse than the disease. The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however misused, they need to substitute it with something more meaningful."
http://xkcd.com/1478/
It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like the social journals simply either have bad reviewers or suck at methodology.
C. Sagan : A demon haunted world:
http://www.amazon.com/gp/product/0345409469/
visit randi.org
My immediate thought would be that hard math in this field doesn't toe the groupthink line, revealing too much that they want to be able to argue around, so their solution is to try to eliminate the math.
I don't know that this is the case or anything: it's just the only real motivation that would lead to this. Like, the studies show stuff that no one wants to talk about or the math prevents people from coming to a conclusion opposite reality.
It's a war, I tell you, a war on frequentists! I'm 95% certain!
https://xkcd.com/882/
This is social science. Mathematics and statistics aren't even relevant.
Correlation between low intelligence and uninformed statements of this nature is p<0.01.
Don't worry, we'll find another panacea statistic.
My next recommendation to the Basic and Applied Social Psychology committee is that any conjunctions in a sentence are to be removed. Mainly because of the poor usage of english grammer in a typical submittal.
From a blog by a colleague of mine on the subject: "Questions that p-values can answer" != "Interesting questions about the world".
"Look at this experimental evidence and tell me what you see?"
Revolution is the opium of the intellectuals.
This is social science. Mathematics and statistics aren't even relevant.
Yes they are. Get quantitative data, use quantitative methods.
Just because most social 'scientists' are not experts at statistical inference, it doesn't mean it can't be done correctly.
p-values are just a probability of something. Do your experiment well and 'something' makes sense.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
miss used them, it is right to ban them. To not ban them is to support racism.
Did that sound as dumb in your head as it looks on the screen?
At least one president of the American Psychological Association published a statistics book intelligent enough that it used to be required in university statistics intro classes: http://books.google.com/books/about/How_to_use_and_misuse_statistics.html
Not that he would have disagreed with the comment about social psychologists...
Ok, let me enlighten the readers a bit. The reviewers tend to be the typical researcher within the field. The typical social researcher does not have a very strong math background. A lot of them are into qualitative research, and the quantitative work tends to stop at ANOVA. I have multiple masters in business and social science and worked on a Ph.D. in social science (Being vague here for a reason). However, I have a dual bachelors in comp sci and math. I know statistical analysis very well. My master's thesis for my MBA was an in-depth analysis of survey responses. 30 pages of body and really good graphs. My research professor, an econometrics professor, and I submitted it to a second tier journal associated with the field I specialized in...
... 6 pages got published. 6?!? They took out the vast majority of the math. Why? "Our readers are really bad at math," said the editor. If you knew the field... you would be scared shitless. The reviewers suggested we take out the math because it confused them. This is why they want the p-value out... it is misunderstood and abused. The reviewers have NO idea if it is being used correctly.
Blindfold. Check.
Math textbook. Check.
Bedroom. Check.
Girlfriend to enjoy my fetish with??? Oh wait, this is Slashdot.
This is why we can't have nice things.
used to be required in university statistics intro classes: http://books.google.com/books/about/How_to_use_and_misuse_statistics.html
I suspect that book is still foundational in most University advertising/marketing programs.
the growth in cynicism and rebellion has not been without cause
Just because you are post positivist doesn't mean you are right. Social science is more difficult and has constantly been used to resolve issues with 'hard' science. How difficult do you think it was to create the Nash equilibrium, which is social science? What about the Nobel prize game theory won in Biology based on social science? Why don't you spend some time learning something before you blast it?
I just tried thinking for myself but because you told me to do it, it didn't feel right. Sheesh, any suggestions??
While p-values are routinely misused in scientific literature, many researchers who understand its proper role are upset about the ban.
Do they also know whether "p-values" is plural or singular?
systemd is Roko's Basilisk.
"Racism", "sexism", "patriarchy" and related topics of study within the social "sciences" inherently can't be quantitatively analyzed in any meaningful way.
I agree with you. Yet no need for the quotes around social 'scientists.' Psychologists, socialists, etc. employ the same experimental designs and mathematical techniques in experiments as doctors or others performing drug efficacy or medical outcome experiments, for example.
P-Value: It's intervention versus control group. Standard, basic scientific experimental design and statistical analysis stuff.
It's an uninformed and naive view to think that people looking at the behavior of humans at the level of social organization are somehow intellectually or scientifically less able than those examining them at the biological level.
Nash equilibria are pure math. They do have lots of applications, but the definition is about a point in a multidimensional space satisfying a system of inequalities.
Just because a lot of researchers in a particular field suck at what they do doesn't mean the field is inherently not a science, only that the subset of crappy work is not scientific. There is plenty of work in social sciences that does live up to the science name, including controlled experiments and practical applications.
those two examples are from economics. whatever your opinion of that discipline may be, it is, at least, in a different class of bullshit from sociology or "social psychology".
"They were pure niggers." – Noam Chomsky
I could cite examples of your folly all day, but since only one instance is needed to refute your foolhardy blanket statement, this will suffice:
http://en.wikipedia.org/wiki/I...
Wow! Who [apart from a self-aware and non-self-righteous human being] could imagine that the processes of the brain could be measured and analyzed mathematically?!
Well, I guess you learn something every day, eh?
I don't think you even need to be pushing people to do Bayesian stats. You just need to force them to graph their data properly. In *a lot* of biological and social science sub-fields it's standard practice to show your raw data only in the form of a table and the results of stats tests only in the form of a table. They aren't used to looking at graphs and raw data. You can hide a lot of terrible stuff that way, like weird outliers. Things would likely improve immediately in these fields if they banned tables and forced researchers to produce box plots (ideally with overlaid jittered raw data), histograms, overlaid 95% confidence intervals corresponding to their stats tests, etc, etc.
Having seen some of these people work, it's clear that many of them never make these plots in the first place. All they do is look at lists of numbers in summary tables. They have no clue in the first place what their data really look like, and no good knowledge of how to properly analyse data and make graphs. Before they even teach stats to undergrads they should be making them learn to plot data and read graphs. It's obvious most of them can't even do that.
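To make the "tables hide outliers" point concrete, here is a minimal sketch using only Python's standard library. The measurements are made up for illustration, and the 1.5 x IQR fences are the usual box-plot whisker rule rather than anything the poster specified.

```python
import statistics

# Made-up measurements: nine ordinary values and one wild one.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 12.0]

# What a summary table would report.
mean = statistics.mean(data)
sd = statistics.stdev(data)

# What a box plot would draw: quartiles plus 1.5 * IQR fences.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Points beyond the fences are the ones a box plot marks individually.
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"table view: mean={mean:.2f} sd={sd:.2f}")
print(f"plot view:  median={median:.2f} outliers={outliers}")
```

The table row (mean and SD) gives no hint that a single point is dragging the mean upward; the box-plot view flags it straight away.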
soylentnews.org
I agree with you. Yet no need for the quotes around social 'scientists.' Psychologists, socialists, etc. employ the same experimental designs and mathematical techniques in experiments as doctors
No, socialists are a political group, and psychology is the unscientific part of psychiatry.
Welp you sure convinced me. Let's lock up all the white male cispigs.
Yes they can, in some cases. There was a very well-controlled study where two sets of anonymous letters of application were sent to various positions at a large number of companies from a large number of applicants. The letters included similar random credentials from random institutions, random cosmetic variations of the same cover letter, and so on, to avoid tipping the hand of the researchers. The only difference between the two groups of letters was that one was given names sampled uniformly from African-Americans, and the other was given names sampled uniformly from everyone else. The names were assigned in a blind way, literally a random form insertion, to avoid introducing bias.
I'm sure you can guess where this is going. The response and offer rate to the blacks was significantly lower, both statistically and practically. It's rather hard to explain that away, though I'm sure someone here will try without having even read the study.
This is social science. Mathematics and statistics aren't even relevant.
Undergraduate psychology and sociology students are required to study statistics. Undergraduates in medicine, biology, chemistry, physics, et al. are not. So, perhaps you need to rethink your ignorance about the limits of the scientific method and educate yourself about what these subject areas actually entail in real life instead of in your uninformed world view.
Dammit, I meant the letters were written anonymously and then labeled with names later. I guess "pseudonymous" would have been a better word. Oh well.
Not a big fan of college, eh?
Use of the p-value gave us conclusions that weren't politically correct. We have corrected the issue by banning the use of the p-value so that only True Science may be published.
Welp you sure convinced me. Let's lock up all the white male cispigs.
You obviously didn't read. IAT applies equally to all races.
Did that sound as dumb in your head as it looks on the screen?
+1
LOL
Just because most social 'scientists' are not experts at statistical inference, it doesn't mean it can't be done correctly.
p-values are just a probability of something.
Actually, p-values are about CORRELATION.
Maybe *you* aren't well-positioned to be denigrating others as not statistical experts.
"Racism", "sexism", "patriarchy" and related topics of study within the social "sciences" inherently can't be quantitatively analyzed in any meaningful way.
You sound as silly now as the people who used to think atoms were the *inherent* limit of divisibility and exploration. Then electrons...
In science, as in politics, innovation tends to come from the death of the old stalwarts rather than their enlightenment.
Even Einstein became an obstructionist to quantum mechanics in his later years.
There was a very well-controlled study where two sets of anonymous letters of application ...
This study was conducted by Steven Levitt, and is described in his book Freakonomics, which is a fantastic book for anyone interested in the application of statistics to social science. Here is the original paper.
Even Einstein became an obstructionist to quantum mechanics in his later years.
"God does not play dice with the universe." ;-)
...and this isn't even the first journal to do this. It's probably happening now because an entire book has just come out walking people through how universally abused p-values are as statistical measures.
http://www.statisticsdonewrong...
The book is nice in that it does give one replacements that are more robust and less likely to be meaningless, although nothing can substitute for having a clue about data dredging etc.
rgb
Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
Are you sure that the term "well-controlled study" applies, given how you repeatedly used the term "random" when describing this experiment?
Randomness is not compatible with experimental control. Additionally, randomness itself cannot be controlled, because doing so would prevent it from being true randomness.
I used 'scientists' in quotes in the same sense I'd put computer 'scientists' in quotes. My degree is computer science, but I dispute that it's a science in the conventional sense.
I find debugging hardware is closer to science. You can't really see inside the chip, but you can develop hypotheses about what is wrong and come up with tests that will refute (or not) the hypotheses. Iterate until you think you probably know the truth.
Doing things well in social sciences is hard. The field (human subjects, IRB etc.) doesn't admit normal testing methods readily. You can't set up a control group and not teach them anything when the control group are school children, or not treat them when the control group is cancer patients, or not house them when the control group is people of the prevalent skin color in the area. The statistics to do things correctly are therefore non trivial and are all about making do with what you have and not over-inferring. If your professors don't know this stuff, and you don't know this stuff, and the paper reviewers don't know this stuff, then it's going to be hard to be rigorous.
I design chips and I do new things that haven't been done before in the analog/digital overlap. So I need data to test. My curves and P-values look great, since I just pull a couple of gig of data when I need it. The control group won't get upset, it's a chip. This is easy compared to statistics in the social sciences. So it's less 'science' and more 'advanced inference'.
It's reasonable for a journal to declare that it (and its reviewers) don't know that stuff. Presumably there is a journal with statistically skilled reviewers and you should submit there if you need that sort of peer review.
IMO the main problem with p-values is it equates p(a|b) with p(b|a) ... disregarding that different values of p(b) and p(a) could make that equation woefully inaccurate.
it really needs to be a bayesian estimate. they need to look at p(a) and p(b) in addition to p(a|b) (or p(b|a)).
regarding "... they need to substitute it with something more meaningful", that's the more meaningful thing they need to substitute it with.
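For reference, the relation between the two conditionals is Bayes' theorem, written here with the same a and b as above:

```latex
p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)}
```

The two conditionals coincide only when p(a) = p(b), which is the gap described above.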
If this is important enough of an issue to consider such a radical change to policy, then they should also have considered other possible solutions, like requiring a statistician be included in the pool of reviewers. The journal I submit to most frequently uses 2 to 3 ad hoc reviewers plus the associate section editor. It could be possible to require the section editor who chooses the ad hoc reviewers to include a statistician as the 3rd reviewer. They would then review for the soundness of the statistical procedures, and the appropriateness of the conclusions based on the model used, and analysis conducted.
I have better stats chops than most in my field (Dunning-Kruger delusion on my part, possibly), but I know that I'm no statistician. I think that getting an actual statistician involved in reviewing most papers as a content expert is far more valuable to science as a whole than simply banning a statistical convention that can be, but is not universally, abused. The comments from the statistician would improve the statistical prowess of the corresponding author, thus reducing the tendency for conclusions based on poor stats to be accepted at face value. This move just hides the ignorance behind confidence intervals, which can also be abused if they are not calculated correctly.
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
Actually, p-values are about CORRELATION. Maybe *you* aren't well-positioned to be denigrating others as not statistical experts.
I may be responding to a troll here, but, no, the GP is correct. P-values are about probability. They're often used in the context of evaluating a correlation, but they needn't be. Specifically, p-values specify the probability that the observed statistical result (which may be a correlation) could be a result of random selection of a particularly bad sample. Good sampling techniques can't eliminate the possibility that your random sample just happens to be non-representative, and the p value measures the probability that this has happened. A p value of 0.05 means that there's a 5% chance that your results are bogus in this particular way.
The problem with p values is that they only describe one way that the experiment could have gone wrong, but people interpret them to mean overall confidence -- or, even worse -- significance of the result, when they really only describe confidence that the sample wasn't biased due to bad luck in random sampling. It could have been biased because the sampling methodology wasn't good. It could have been meaningless because it finds an effect which is real, but negligibly small. It could be meaningless because the experiment was just badly constructed and didn't measure what it thought it was measuring. There could be lots and lots of other problems.
There's nothing inherently wrong with p values, but people tend to believe they mean far more than they do.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
The Q are intrigued. Pray they don't intervene.
MMM. Actually economics (especially the behavioral variety, which is the most innovative field, as I'm sure you didn't know) *is* social psychology with another name.
Back to the books, bro.
It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like the social journals simply either have bad reviewers or suck at methodology.
That's a good sentiment, but it won't work in practice. Here's an example:
Suppose a researcher is running rats in a maze. He measures many things, including the direction that first-run rats turn in their first choice.
He rummages around in the data and finds that more rats (by a lot) turn left on their first attempt. It's highly unlikely that this number of rats would turn left on their first choice based on chance (an easy calculation), so this seems like an interesting effect.
He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.
There's no realistic way that a reviewer can spot the flaw in this paper.
Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?
(Note that this is a flaw in statistical reasoning, not methodology. It's not because of latent scent trails in the maze or anything else about the setup.)
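One candidate answer, offered as a guess since the poster does not give the solution: the researcher "measures many things" and reports only the one that came out extreme, which is the multiple-comparisons problem. A minimal sketch of how fast that inflates false positives, assuming the measurements are independent and nothing real is going on (the function name is illustrative):

```python
# Chance that at least one of k independent null measurements
# crosses the p < 0.05 line purely by luck.
ALPHA = 0.05


def chance_of_false_hit(k: int, alpha: float = ALPHA) -> float:
    """Probability of one or more 'significant' flukes among k null tests."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 20, 50):
    print(f"{k:3d} things measured -> {chance_of_false_hit(k):.0%} chance of a fluke")
```

With 20 things measured the chance of at least one fluke is already about 64%, and a reviewer who only sees the one reported measurement has no way to know how many others were tried.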
====
Add to this the number of misunderstandings that people have about the statistical process, and it becomes clear that... what?
Where does the 0.05 number come from? It comes from Pearson himself, of course - any textbook will tell you that. If P<0.05, then the results are significant and worthy of publication.
Except that Pearson didn't *say* that - he said something vaguely similar and it was misinterpreted by many people. Can you describe the difference between what he said and what the textbooks claim he said?
====
You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation.
P<0.01 is the probability of the data, given the (null) hypothesis. Thus we assume that the probability of the hypothesis is low, given the data.
Can you point out the flaw in this reasoning? Can you do it in a way that other readers will immediately see the problem?
There is a further calculation/formula that will fix the flawed reasoning and allow you to make a correct inference. It's very well-known, the formula has a name, and probably everyone reading this has at least heard of the name. Can you describe how to fix the inference in a way that will make it obvious to the reader?
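Assuming the named formula is Bayes' theorem (the poster does not say so outright), a minimal numeric sketch looks like this. Every number is hypothetical: a prior that 90% of tested nulls are true, data with probability 0.01 under the null, and data with probability 0.10 under the alternative.

```python
def posterior_null(prior_null: float,
                   p_data_given_null: float,
                   p_data_given_alt: float) -> float:
    """P(null | data) via Bayes' theorem for two competing hypotheses."""
    prior_alt = 1.0 - prior_null
    # Total probability of the data across both hypotheses.
    evidence = p_data_given_null * prior_null + p_data_given_alt * prior_alt
    return p_data_given_null * prior_null / evidence


# Hypothetical inputs, chosen only to illustrate the gap.
print(posterior_null(prior_null=0.9,
                     p_data_given_null=0.01,
                     p_data_given_alt=0.10))
```

Under these made-up inputs the null is still roughly 47% likely after seeing the data, even though the probability of the data given the null was only 0.01. The low P(data | null) does not translate into a low P(null | data) without the priors and the alternative.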
Just because you put the word science behind your name, doesn't mean you're doing scientific work. Psychology is a great example of where the term "science" has been grossly misused and misdirected. Psychology is really about the pursuit to understand why we're all fucked up and explain away behaviour that no one wants to take ownership of. If this publication is going to block anything, block anyone using the term science, because psychology is not science, it's people trying to make excuses about the way they feel and act, which all boils down to scapegoating responsibility for your actions.
With the exception of chemical imbalance, every single person is directly responsible for their actions, case closed, now let's stop using the term science to describe excuse generation.
Didn't have time for college, eh? Don't like to do much reading on your own? Things you don't know yet are intimidating. I understand. Anyway, if you ever feel like it, you can educate yourself about the types of research conducted in these fields.
Here's a hint... Freud is to psychology what Copernicus was to physics, an early thinker but not the state-of-the-art.
Considering the amount of math that goes into advanced sociology... It isn't bullshit, not to say there isn't bullshit in it or hard science (I am looking at you Physics). The problem is that people see qualitative methods and think, that is so much bullshit. However, it isn't. It is a formalized way of determining phenomena just so that we can use quantitative analysis to figure out if the phenomena are valid. Especially since the Belmont Report, because we can't set experiments that ruin people for life anymore.
I studied and tutored experimental design and this use of inferential statistics. I even came up with a formula for 1/5 the calculator keystrokes when learning to calculate the p-value manually. Take the standard deviation and mean for each group, then calculate the standard deviation of these means (how different the groups are) divided by the mean of these standard deviations (how wide the groups of data are) and multiply by the square root of n (sample size for each group). But that's off the point. We had 5 papers in our class for psychology majors (I almost graduated in that instead of engineering) that discussed why controlled experiments (using the p-value) should not be published. In each case my knee-jerk reaction was that they didn't like math or didn't understand math and just wanted to 'suppose' answers. But each article attacked the math abuse, by proficient academics at universities who did this sort of research. I came around too. The math is established for random environments but the scientists control every bit of the environment, not to get better results but to detect things so tiny that they really don't matter. The math lets them misuse the word 'significant' as though there is a strong connection between cause and effect. Yet every environmental restriction (same living arrangements, same diets, same genetic strain of rats, etc) invalidates the result. It's called intrinsic validity (finding it in the experiment) vs. extrinsic validity (applying in real life). You can also find things that are weaker (by the square root of n) by using larger groups. A study can be set up in a way so as to likely find 'something' tiny and get the research prestige, but another study can be set up with different controls that turn out an opposite result. And none apply to real life like reading the results of an entire population living normal lives.
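A quick check of the keystroke shortcut, as a sketch with made-up data. For two equal-sized groups with the same spread, the recipe appears to reproduce the ordinary two-sample t statistic; note that it yields the test statistic, so turning it into a p-value still needs a t table. The comparison formula is the textbook one, not something the poster gave.

```python
import math
import statistics

# Made-up groups with identical spread and equal size.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
n = len(group_a)

means = [statistics.mean(group_a), statistics.mean(group_b)]
sds = [statistics.stdev(group_a), statistics.stdev(group_b)]

# Shortcut as described: SD of the group means, divided by the
# mean of the group SDs, times the square root of n.
shortcut = statistics.stdev(means) / statistics.mean(sds) * math.sqrt(n)

# Textbook two-sample t statistic for equal group sizes.
t_stat = abs(means[0] - means[1]) / math.sqrt((sds[0] ** 2 + sds[1] ** 2) / n)

print(shortcut, t_stat)
```

When the two group SDs differ, the mean of the SDs is no longer the same as the pooled SD, so the shortcut only approximates the textbook value.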
You have to study and think quite a while, as I did (even walking the streets around Berkeley to find books on the subject up to 40 years prior) to see that the words "99 percent significance level" mean not a strong effect but more likely one that is so tiny, maybe a part in a million, that you'd never see it in real life.
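The square-root-of-n effect is easy to see numerically. A minimal sketch, assuming a two-sample z test with known unit variance and a hypothetical observed difference of one hundredth of a standard deviation (the function name and numbers are illustrative):

```python
import math


def two_sided_p(effect_in_sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z test for a given observed effect."""
    # The standard error of the difference shrinks like sqrt(2 / n),
    # so the z score grows like sqrt(n) for a fixed effect.
    z = effect_in_sd * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))


for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sided_p(0.01, n):.3g}")
```

The effect never changes size, yet with a million subjects per group it sails past any conventional significance threshold.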
OK a new size TV
p-values are not probabilities. What people would like them to be are probabilities that one hypothesis is correct compared to another. But that is not what they do, and because people ignore that gap and misinterpret them, it has become such a problem; that's why they are being banned. Many experiments with acceptable p-values (p<0.05) are not reproducible.
Actually the inventor of p-values never intended them for a test, only to uncover that something is perhaps worthy of further investigation.
p-values tell you, if you collected data under the current model, how frequently you will get data more extreme than the data at hand. p<0.01 means that in only 1% of cases will you get such an "outlier". But it assumes that the model itself is correct. It varies the data!
Instead, what should be done is to compare one model versus another one, with the data we have. Bayes factors do that, and should be used and taught.
The problem came to be because social sciences do not have proper, meaningful models, which can be compared. So they have resorted to techniques that do not require specifying models (or alternatives) rigorously. In the physical sciences, you can precisely write a model for a planetary system with 2 planets and one with 3 planets, and the Bayes factor will be meaningful.
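For readers who have not met a Bayes factor, here is a minimal sketch of comparing one model against another with the data held fixed. Both models and the data are hypothetical: a coin with heads probability 0.7 versus a fair coin, after 14 heads in 20 flips. Real applications usually integrate over a prior on the parameter rather than using two point values.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_factor(k: int, n: int, p_model_a: float, p_model_b: float) -> float:
    """Ratio of the likelihood of the same data under model A vs. model B."""
    return binom_pmf(k, n, p_model_a) / binom_pmf(k, n, p_model_b)


# Hypothetical data: 14 heads in 20 flips.
print(bayes_factor(14, 20, p_model_a=0.7, p_model_b=0.5))
```

A value a little above 5 says the observed flips are about five times more probable under the biased-coin model than under the fair-coin model, which is a statement about the models rather than about hypothetical repeated data.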
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
Undergraduates in medicine, biology, chemistry, physics, et al. are not.
Depends on the school. The schools I went to for undergrad and grad programs, and the three I've worked at since, all required statistics for science majors. Some of them required only a single semester, others required a full year. My undergrad only required one semester (covering calculus based statistics, e.g. mle evaluation for distributions), and the second optional semester was ~40% math majors and ~40% physics majors.
The key is to make sure that "something" isn't totally irrelevant. For example, the probability that two groups of people/animals/cells are from exactly the same hypothetical infinite population is not helpful. If you teach people they should calculate such a probability, they will attribute magical powers to it in an attempt to make sense of why they are doing it. If not you get this:
Fisher, R A (1958). "The Nature of Probability". Centennial Review 2: 261–274.
And yes, I know it is partly Fisher's fault.
You obviously didn't read about the criticism and numerous problems associated with that measurement technique.
Doctors are the opposite of scientists. They use induction (or rather abduction) to make their clinical decisions. Whereas scientists principally use deduction and the scientific method (i.e. experimentation). Legal reasoning in the context of evidence is also based on abductive logic. Basically, doctors and lawyers typically draw conclusions from circumstantial evidence. Whereas scientists formulate theories partly based on circumstantial evidence, but only draw conclusions from experimentation and deduction of the results.
These differences are important to comprehend if you want to understand how and why doctors behave the way they do. Some fields, like nutrition science, are less scientific than they could be specifically because they apply Doctor Logic (tm) too heavily. Not that there's anything wrong with Doctor Logic (tm), as long as it's applied in the correct context. You obviously cannot experiment on a human being in order to address his particular ailment, and there are very few symptoms or set of symptoms which conclusively identify a specific ailment. And in the context of a single case, induction and abduction are arguably less susceptible to biases because the particularities of the case are more highly correlated with the underlying effect. Whereas in the context of huge samples induction can lead you to a multitude of different conclusions because of the cornucopia of circumstantial evidence.
Anyhow, people who perform medical research are rarely practicing medical doctors. They usually have a Ph.D in a hard science, like chemistry or virology.
Your false assumption is that doctors, chemists and physicists get things right with any greater frequency. It's not that social scientists are misusing statistics but that a large number of scientists in most disciplines simply do a poor job of quantifying things. It's a little more obvious when it happens in social science, but accurate measurement is hard or often impossible, so bad proxy measures are a pervasive feature of most scientific disciplines. That's one of many reasons why most "experts" usually get it wrong.
yes, i am.
true randomization allows you to control for everything (intuitively: since it's randomized, there is no way for you to introduce bias), at the cost of increased variance. however, you can make up for increased variance by increasing the sample size, which is what they did here. i forget the exact numbers, but they sent out hundreds of letters.
far from what you assert, randomization is fundamental to experimental control, and randomness is quite easily generated in a controlled manner. here's a general hint for you and everyone else: don't say things like "randomness cannot be controlled because then it wouldn't be 'true' randomness". it just makes you seem like an idiot.
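A small sketch of why random assignment counts as control, using made-up data: each hypothetical letter carries a hidden "quality" score that the experimenter never looks at, and a plain shuffle still balances it across the two groups. The seed, field names and sample size are all illustrative.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical application letters with a hidden quality score.
letters = [{"id": i, "quality": rng.random()} for i in range(10_000)]

# Random assignment: shuffle, then split down the middle.
rng.shuffle(letters)
half = len(letters) // 2
group_a, group_b = letters[:half], letters[half:]

# The hidden variable ends up nearly identical in both groups,
# even though nobody matched the groups on it.
mean_a = statistics.mean(item["quality"] for item in group_a)
mean_b = statistics.mean(item["quality"] for item in group_b)
gap = abs(mean_a - mean_b)

print(f"group A mean quality {mean_a:.3f}, group B {mean_b:.3f}, gap {gap:.3f}")
```

The same balancing happens for every other unmeasured variable at once, which is the sense in which randomization "controls for everything", and the residual gap shrinks as the sample grows.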
"They were pure niggers." – Noam Chomsky
yes, you are correct: social psychology done rigorously becomes economics. as for the rest, however...
+1
The useful interpretation of the "group comparison" p-value was not figured out until 2013:
http://arxiv.org/abs/1311.0081
i am a statistician and i've worked closely with a sociologist (one of the few who uses math correctly, if a bit pedantically). you are correct, it is not intrinsically impossible to do sociology correctly. however, the mathematical literacy standards for the field are woefully lacking even in the ivy league.
this song by Tom Lehrer holds true today, just replace "sigma and chi-square" by "social network analysis".
indicates that authors incorrectly measure p-values to their study results 86.5% of the time (P<0.001).
Maybe you and the mods just didn't major in the sciences, or maybe you just didn't go to that great of a school. In my experience statistics is a required course for the sciences, and statistics is needed for the USMLE exam for those going into medicine... so maybe you need to take your own advice and rethink things instead of using an uninformed world view.
Modern social psychology is notorious for rejecting objective evidence, since it can often uncover facts about human nature which society (and many social psychologists) don't like. Stan Milgram's experiments come to mind.
On the other hand, get rid of p-values and other forms of objective verification and you can make up anything you want to. You can come up with any amount of airy-fairy so-called 'evidence' to support your pet theory. Get rid of that annoying inconvenience of 'logic'. These days, it's all inference and innuendo, especially since the "critical" / "discursive" crowd have gotten a hold.
P-value certainly is a probability - it's the probability that you'd see data at least as extreme as the data you saw, if the null hypothesis is correct.
You misspelled "English".
putting the 'B' in LGBTQ+
Citations please.
"Randomness is not compatible with experimental control."
You have no clue about how to set up an experiment.
and the other crazies were right all along, that psychiatry is not a real science?
Or does it just prove that the general understanding of math and statistics (except among mathematicians) is in free fall, and that a few years from now, college graduates won't even be able to recite the multiplication table up to 10?
-- Another senseless waste of fine bytes.
Psychiatry is a medical profession where the practitioners go to medical school and then train in the profession as per other medical professions. Psychology is not medicine. Psychologists study human emotions, thought, mental illnesses and disorders (overlapping with Psychiatrists) but cannot prescribe unless they also train as a doctor or a Psych nurse. Psychologists do more counseling and group dynamics. Psychiatrists are more focused on drug treatments, but often work in tandem with Psychologists.
Psychological is training based on both medical and Psychological research.
Also "grammar."
There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.
Criticisms of p-values usually fall into two groups. Some people believe that p-values are bad because some people interpret them as the false positive rate. Personally, I think that's a problem with some people, and not p-values. The other criticism, which is particularly prevalent in social sciences, epidemiology and some of the squishier medical-type areas, is that if you get a non-significant p-value you discard potentially useful results. The usual proposal (which is probably the situation in this case) is to use confidence intervals. That way you can see all the area where your confidence interval is not overlapping zero! I have two objections to that. First, CIs are simply calculated from p-values and vice versa - they're really the same thing presented differently. Second, the reason you discard your result (or save it for a meta-analysis) if you get an insignificant p-value is because your data has been ruled insufficient evidence. Looking at CIs and marvelling at all the potentially meaningful area between them is just softening the p < 0.05 rule of thumb. Incidentally, the false positive rate people suggest doing the opposite - using p < 0.01 or 0.001 as the threshold for significance.
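To make the "same thing presented differently" point concrete, here is a minimal sketch for one simple case: a two-sided z-test of "effect = 0" with a known standard error. The function name and the numbers are made up for illustration, and other tests pair with their intervals in their own ways, so treat this as one example rather than a general proof.

```python
from statistics import NormalDist

def z_test(estimate, std_error, level=0.95):
    """Two-sided z-test of 'true effect = 0' plus the matching confidence interval."""
    norm = NormalDist()
    z = estimate / std_error
    p_value = 2 * (1 - norm.cdf(abs(z)))
    # Half-width of the interval uses the same normal quantile as the test threshold.
    half_width = norm.inv_cdf(1 - (1 - level) / 2) * std_error
    return p_value, (estimate - half_width, estimate + half_width)

# Illustrative estimates, all with standard error 1.0 (assumed values).
for est in (1.0, 1.9, 2.0, 3.0):
    p, (lo, hi) = z_test(est, std_error=1.0)
    excludes_zero = lo > 0 or hi < 0
    print(f"estimate={est}: p={p:.4f}, 95% CI=({lo:.2f}, {hi:.2f}), "
          f"CI excludes 0: {excludes_zero}, p<0.05: {p < 0.05}")
```

In this setup the 95% interval excludes zero exactly when p < 0.05, which is the sense in which the two carry the same information.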
Whereas scientists principally use deduction
To all autodidacts: Imagine if YOU were to make a statement this absurd, without even a hint of self doubt. Worse, what if this is the kind of thing you actually believe as a result of your online "learning" adventures?
This is why a formal education is important. On your own, you could very well end up like the AC above -- so deeply misinformed that there's little hope for recovery.
Required reading for internet skeptics
I suspect that book is still foundational in most University advertising/marketing programs.
I think historically, a more influential book has been Darrell Huff's "How To Lie With Statistics", the second book in this list.
It was originally written in 1954. And while less rigorous, it is an entertaining read and probably gets its point across to a much wider audience.
I know for a fact that Huff's book is still used as a text in college statistics courses... but probably only the lower-level classes.
Well I happen to have several friends studying psychology right now and I can tell you that compared to real science, what they do is mostly hogwash. Your initial analogy was correct, that I'll give you credit for, but that's about all I can give you credit for.
My cousin is studying hoarding disorder and how to overcome it. She's been given money, A LOT of money to study this. None of the people in her study have any kind of chemical imbalance, which as I already stated would completely change the landscape. So far her research at a PhD level has determined that people with hoarding disorder attach false emotional states to objects, which means in other words, they can't rationalize emotion. That's not science, that's a lack of emotional control, they need to mature and stop playing cry baby.
My Sister studied depression, again in people with NO chemical imbalance. People felt sad because they didn't want to grow up and face the world for what it is! Again, not science. More proof people are cry babies.
I could keep going but it's all the same story.
Randomized controlled trial
Momentarily, the need for the construction of new light will no longer exist.
P-values certainly are probabilities. You just argued they aren't probabilities, but they are probabilities of this other thing. You contradicted yourself. I was specifically vague when I called it 'something' because it changes with the type of test and there are many to choose from and I didn't want to write a whole book. That book has already been written by smarter people than I.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Hey there. Again, we're generally on the same page here, and I agreed with your comment, and my counterpoint was directed not so much at you as at the general idea of the folks here with a dismissive view of what social science means. BTW, Interesting your comment about computer science. ;-)
I realize now we may not even have been using the same definitions. I was thinking more like psychology (let's say a stress coping training study, for example) versus biology (let's say a cancer treatment study). So, it's funny to me now to realize you are perhaps calling the latter social science as well. Anyhow, in both studies you absolutely can set up valid control groups. In the cancer case, the control might not be "no treatment," but you compare your new treatment to the efficacy of existing conventional ones. In a non-health related area completely, cognitive psychology is filled with countless examples of measuring the effect of priming the brain with images or words associated with different categories of concepts upon reaction times, opinion formation, behavior, etc. That's just one example off the top of my head.
Also, many studies are able to be performed on existing data sets without requiring an interventional experiment. Steven Levitt has received much acclaim over his career performing these types of analyses. For anyone who doubts that social science is a rigorous and fascinating field, they should read (or listen to) some of his work.
There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.
Randomized sampling goes a long way, but only if you have a large enough population. This is one of the problems of social sciences. A randomized 10% subsample from 100 subjects ain't gonna cut it. A randomized subsample from 10,000,000 people isn't going to get funded.
I lived through my wife's PhD in education. I helped with the statistics. It was mind curdling stuff. But her thesis had rigor. S-Plus, Excel and everything else doesn't have MANOVAs. R does. We used R.
I have a doctorate, and learned these particular distinctions under a professor emeritus who teaches in the statistics and engineering department of an esteemed tier 1 American university, and who is well regarded in the fields of risk analysis and systems engineering. He's spent the latter part of his career analyzing the systems and methods of knowledge acquisition in science, medicine, law, and other fields. In fact, he was one of the early critics of over-reliance on P-values in scientific literature, and has provided concrete experimental data proving problems with how modern scientific research is performed. When you read papers studying reproducibility of research, or comparing and contrasting the worth of different statistical methods--i.e. the frequentist versus bayesian debate--he may very well be the author.
But thank you very much. I do consider myself an autodidact. It's not like self-directed learning is somehow mutually exclusive relative to working through formal academic programs.
I don't know what your qualifications or experience are, but you should kind of be ashamed of yourself. The notion that science principally uses deduction to draw _conclusions_ is kind of the _definition_ of the scientific method. Generate a hypothesis: if A, then B. Test the hypothesis. Publish your results. If the results affirm "if A, then B", and especially if the results are reproduced, it's subsequently applied as a premise in further work. That's deduction. Without deduction, every experiment would necessarily need to reproduce every previous experiment. Furthermore, invalidating any premise in the chain can devastate confidence in subsequent work.
When diagnosing an ailment as a doctor, diagnosing a problem in a mechanical or software system, or proving guilt in a trial, deduction is not the _principal_ methodology. Your evidence is limited. You often can't test a hypothesis by running a test--either because it would be too expensive, or in the case of a criminal trial it's simply not possible at all. So your principal tool is induction--inferring a conclusion from incomplete and inconclusive data.
Your false assumption is that doctors, chemists and physicists get things right with any greater frequency.
Did you mean to reply to me? It's a bit surreal to see you seemingly support what I wrote but tell me about a false assumption I made. In case you were speaking to me, I would like to point out that I made no such assumption. I argued that social scientists were as rigorous as any others but made no claims about either group's infallibility in absolute terms.
That sounds like an excellent reason to use scare quotes around "scientists". When only 25% of published biomedical results can be reproduced, that field needs to do work to justify the claim to be science as well.
Are you sure that the term "well-controlled study" applies, given how you repeatedly used the term "random" when describing this experiment?
Randomness is not compatible with experimental control. Additionally, randomness itself cannot be controlled, because doing so would prevent it from being true randomness.
Quick! Someone is wrong on the internet.
SJWs do not apply it equally however..
Except that those terms are subjective and, really, based on emotions.. Atoms are not.
socialists have built a 5th column in psychiatry so that they can label those who disagree with their political goals as mentally ill.
Unfortunately academia was taken over by quacks a long time ago..
Not really a probability. If you use an exact approach, one figures out the possible outcomes for your statistic under your initial hypothesis and then determines the rank of the observed value of the statistic. If it's too low, then it's time for some abduction. The rest is just convenient approximations using p.d.f.s. See Fisher's reanalysis of Darwin's pea data.
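For anyone who hasn't seen the exact approach written out, here is a rough sketch of one version of it: a two-group permutation test on the difference in means. The function name and the data are invented for illustration, and the choice of statistic (absolute mean difference) is an assumption, not something the comment above specifies.

```python
from itertools import combinations

def exact_permutation_test(group_a, group_b):
    """Fraction of relabelings whose |mean difference| is at least the observed one."""
    pooled = group_a + group_b
    n_a = len(group_a)
    total = sum(pooled)

    def mean_diff(indices):
        sum_a = sum(pooled[i] for i in indices)
        return sum_a / n_a - (total - sum_a) / (len(pooled) - n_a)

    observed = abs(mean_diff(range(n_a)))
    # Enumerate every way of assigning n_a of the pooled values to group A.
    diffs = [abs(mean_diff(idx)) for idx in combinations(range(len(pooled)), n_a)]
    # Small tolerance so floating-point ties count as "at least as extreme".
    as_extreme = sum(d >= observed - 1e-12 for d in diffs)
    return as_extreme / len(diffs)

treated = [12.1, 14.3, 13.8, 15.2]  # made-up measurements
control = [10.2, 11.5, 9.8, 11.9]
print(exact_permutation_test(treated, control))
```

With these numbers there are 70 possible relabelings, and the observed split is tied for most extreme with its mirror image, so the rank works out to 2 out of 70.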
There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.
Randomized sampling goes a long way, but only if you have a large enough population. This is one of the problems of social sciences. A randomized 10% subsample from 100 subjects ain't gonna cut it. A randomized subsample from 10,000,000 people isn't going to get funded.
Why wouldn't a randomized subsample from 10M people get funded? The required sample size doesn't grow as the population does.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Because identifying the 10 million and sampling the 1 million will be expensive. Worse, that many people in the class may not exist. If your class is 'residents of Boring, Oregon', there may simply be too few of them to randomize away the confounders and drive the p-value down.
Top tip. If you want to find something in the data, it helps if it sticks out above the noise floor like a sore thumb. If you're having to push the noise floor down with sample size to make something visible, the odds you got something else wrong go up in proportion.
I used to have one of those old fashioned uniplication tables but ordered a new multiplication table because it collects spilled drinks in its crevices rather than letting it drip on the floor. Much more sanitary. Why would I need to recite? I already cited amazon for one, are you saying they are usually defective?
The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives
Stephen T. Ziliak and Deirdre N. McCloskey
http://www.press.umich.edu/titleDetailDesc.do?id=186351
The University of Michigan Press
https://www.press.umich.edu/pdf/9780472070077-fm.pdf
Citing one table will get them to grade you as a D at most. You have to recite at least ten times to get the benefits of being in the A group. Even then it is up to chance since it is graded on a curve.
The notion that science principally uses deduction to draw _conclusions_ is kind of the _definition_ of the scientific method.
I'm sorry, but the very definition of the scientific method is inductive reasoning. This can include deductive steps, but inevitably there is never a 100% proof about the universe because you are unable to observe all spaces at all times, especially the past. Your evidence is always limited, and any general conclusion you come to is subject to being wrong with future observations.
And I am also someone who's gotten a PhD in the sciences... but thought this was something thoroughly covered in a basic philosophy of science course that many schools require for science majors.
That's not science, that's a lack of emotional control, they need to mature and stop playing cry baby.
And exactly how many engineering and science endeavors have been in the name of laziness? If people just used protection, we wouldn't have to study STDs, etc. Whether or not something is science isn't why people find the results useful, but the process that is used to find results.
Classical statistics mentions the significance level, alpha=0.05. It mentions beta -- (1-beta) is the power of the test, the probability of rejecting the null hypothesis when it is false. Classical statistics never mentions R, the background ratio of true to false relationships in a field. While R lies in the interval [0,infinity), you could think instead about the background probability of true relationships. PLOS had an article several years ago that showed the probability that a relationship touted in a published article is actually true, a probability they called the Positive Predictive Value,
PPV = 1 / [1 + alpha / ((1 - beta) * R)]
The person designing an experiment seeks a large power, 1 - beta, so is bounded away from 0 and at most 1, so this factor becomes irrelevant (remember, the article gets published). When R is much less than alpha; eg, R=0.001 is less than 0.05, then PPV is about
R / alpha
or often
R / 0.05
The background proportion of true relationships R dominates over alpha and over beta in the probability the relationship is true PPV.
You do a statistical test in a "field" of relationships where most of the relationships are wrong, otherwise any relationship stated has a good chance to be correct and the "field" is easy if not boring. Consider the search for some 30 genes that might cause a genetic disease out of 30,000 genes in a genome. Then R is 1 / 1000 and (about)
PPV ≈ 1/(1 + 0.05/(1/1000)) = 1/51 ≈ 0.02
That is, such published genetics articles tout relationships that are very unlikely (0.02) to be correct.
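The arithmetic above is easy to check. Here is a small sketch that just evaluates the PPV formula as written in this comment; the function name is mine, and the second call uses an assumed power of 0.8 purely to show that the power term barely matters when R is tiny.

```python
def ppv(alpha, beta, R):
    """PPV = 1 / [1 + alpha / ((1 - beta) * R)], as given in the comment above."""
    return 1.0 / (1.0 + alpha / ((1.0 - beta) * R))

alpha = 0.05
R = 30 / 30_000  # 30 causal genes out of 30,000, i.e. 1/1000

print(ppv(alpha, beta=0.0, R=R))  # idealized power of 1, matches 1/51
print(ppv(alpha, beta=0.2, R=R))  # assumed power of 0.8, for comparison
print(R / alpha)                  # the rough approximation R / alpha
```

All three numbers land near 0.02, which is the point being made: when R is much smaller than alpha, R dominates.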
The German pharmaceutical company Bayer called a large sample of published article authors, duplicated their procedures, yet found 70 percent of the publications' touted results could not be confirmed (probably wrong). Many statistical tools will give fame -- hypothesis tests or even more so data mining tools -- these are often charlatans' tools.
So on a scale of 1 to freedom, how much sympathy do we have for the authors who got banned for using p-values?
I lived through my wife's PhD in education. I helped with the statistics. It was mind curdling stuff. But her thesis had rigor. S-Plus, Excel and everything else doesn't have MANOVAs. R does. We used R.
Right. So you know first hand how ignorant it is to say math and statistics have nothing to do with social science. :-)
Just simplify it, most of psychology is immature people who don't want to take responsibility, complaining that they might have to take responsibility for their actions. For the real cases of people who have chemical imbalance, it's about understanding how the brain forms attachment and reason.
Speaking of immaturity and projecting emotions to justify one's own feelings...
Except that those terms [racism, sexism, ...] are subjective and, really, based on emotions.. Atoms are not.
That the mind uses stereotypes to classify and categorize information is neither "subjective" nor an "emotion." This is what researchers actually study. Subsequently, inferences can perhaps be generalized that relate to the functioning of the subjective terms the previous poster used.
I lived through my wife's PhD in education. I helped with the statistics. It was mind curdling stuff. But her thesis had rigor. S-Plus, Excel and everything else doesn't have MANOVAs. R does. We used R.
Right. So you know first hand how ignorant it is to say math and statistics have nothing to do with social science. :-)
It wasn't me who said that.
Right. So you know first hand how ignorant it is to say math and statistics have nothing to do with social science. :-)
It wasn't me who said that.
I know. I wasn't trying to imply you did. =)
And it's a "submission".
Sorry. I had a bit of a whooshy moment.
english grammer. Kelsey's cousin, who thinks he's e. e. cummings.
Star Trek transporters are just 3d printers.
I suspect that book is still foundational in most University advertising/marketing programs.
I think historically, a more influential book has been Darrell Huff's "How To Lie With Statistics", the second book in this list. It was originally written in 1954. And while less rigorous, it is an entertaining read and probably gets its point across to a much wider audience. I know for a fact that Huff's book is still used as a text in college statistics courses... but probably only the lower-level classes.
Not to be confused with the much more exciting book, "How to lie with Statisticians"
"Racism", "sexism", "patriarchy" and related topics of study within the social "sciences" inherently can't be quantitatively analyzed in any meaningful way.
Yeah, based on your multiyear immersion in the field, right? And those so-called climate scientists, I bet they didn't even include solar effects. And don't get me started on medical science, they're all a bunch of quacks, one year coffee is good for you one year it's bad for you.
You randomize your two populations, then you test to ensure that there are no significant differences between the two populations in what you are trying to control for. If there is, like 1 group is all males and the other is all females, then "the randomization failed". Which of course is guaranteed to happen 5% of the time for each factor, so if you have 20 factors.....
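To put a number on the 20-factor case, here is a quick sketch. It assumes the baseline checks are independent and each has a 5% false alarm rate, which real covariates rarely satisfy exactly, so read it as a ballpark figure. The function name is made up.

```python
def prob_any_false_alarm(n_factors, alpha=0.05):
    """Chance that at least one of n independent checks comes out 'significant' by luck."""
    return 1 - (1 - alpha) ** n_factors

for n in (1, 5, 20):
    print(n, round(prob_any_false_alarm(n), 3))
```

Under those assumptions, with 20 factors you would expect at least one apparent "randomization failure" roughly 64% of the time.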
How is it subjective that, given random applications or whatever, as in the previously described test, subject A reliably responds favorably to names like George Whittington Huxley III and unfavorably to names like D'shawn Mohammed Washington, whereas the majority of subjects respond equally to both? Statistically verifiable, and all that?
The other side of the problem is that a random sample of 10,000,000 people is going to find everything significantly different. That's from the inverse dependence of the standard deviation on the root of N. Given any nonzero difference between two samples, there will always be some value of N high enough that the standard deviation is therefore low enough that that difference will have a p value < .05, or as low as you want it to be.
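A rough illustration of that effect, using a two-sample z-test with a known standard deviation. The tiny difference of 0.01 standard deviations and the group sizes are assumptions picked for the example, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_p(diff, sd, n_per_group):
    """Two-sided z-test p-value for a difference in means between two equal-sized groups."""
    std_error = sd * sqrt(2.0 / n_per_group)  # shrinks like 1/sqrt(N)
    z = diff / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same trivially small difference every time; only N changes.
for n in (100, 10_000, 1_000_000, 5_000_000):
    print(n, two_sample_p(diff=0.01, sd=1.0, n_per_group=n))
```

The difference never changes, yet the p-value goes from nowhere near significant to effectively zero as N grows.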
Because identifying the 10 million and sampling the 1 million will be expensive. Worse, that many people in the class may not exist. If your class is 'residents of Boring, Oregon', there may simply be too few of them to randomize away the confounders and drive the p-value down.
Top tip. If you want to find something in the data, it helps if it sticks out above the noise floor like a sore thumb. If you're having to push the noise floor down with sample size to make something visible, the odds you got something else wrong go up in proportion.
Oh you really mean a sample from a population of 10,000,000? I thought you meant a sample of 10,000,000 but were a bit imprecise in wording. You don't need a sample of 1,000,000 for a population of 10,000,000, a sample of 100 will do just fine if you are sure it's representative and randomly sampled. And if it's not representative and randomly sampled, a sample of 1,000,000 won't give you a valid answer either. That's why we can do clinical trials on a few hundred people, at most, and decide that a drug is in all reasonable probability efficacious and safe enough to be marketed to a population of 600,000,000.
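Here is a small sketch of why population size hardly matters. It assumes simple random sampling, a proportion as the quantity being estimated, and the usual normal approximation with a finite population correction; the function name and the sample size of 1,000 are picked just for illustration.

```python
from math import sqrt

def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion under simple random sampling."""
    standard_error = sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: only matters when the sample is a big share of the population.
    fpc = sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

for population in (100_000, 10_000_000, 600_000_000):
    print(population, round(margin_of_error(1_000, population), 4))
```

The margin comes out at about 3 percentage points in every case, so the precision is driven by the sample size, not by how big the population is.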
Because identifying the 10 million and sampling the 1 million will be expensive. Worse, that many people in the class may not exist. If your class is 'residents of Boring, Oregon', there may simply be too few of them to randomize away the confounders and drive the p-value down.
Top tip. If you want to find something in the data, it helps if it sticks out above the noise floor like a sore thumb. If you're having to push the noise floor down with sample size to make something visible, the odds you got something else wrong go up in proportion.
But you are right though. If the effect is invisible until teased out statistically, it's probably not real, or at best not big enough to be interesting, and at the very best nobody will believe it anyway. Especially when the raw effect goes one way, but after statistically clearing out the debris, it suddenly changes polarity. Statistics is best used as a minor tool to get a more precise estimate of an effect which is clear before you start the statistical work.
But people publish that tortured out stuff anyway.
To be fair, even if there's substantial doubt about a result, if it's important enough it's worth publishing just to see if people can either repeat it, refute it, or explain what the heck happened. Cold fusion being a perfect example.
Actually, p-values are about CORRELATION. Maybe *you* aren't well-positioned to be denigrating others as not statistical experts.
I may be responding to a troll here, but, no, the GP is correct. P-values are about probability. They're often used in the context of evaluating a correlation, but they needn't be. Specifically, p-values specify the probability that the observed statistical result (which may be a correlation) could be a result of random selection of a particularly bad sample. Good sampling techniques can't eliminate the possibility that your random sample just happens to be non-representative, and the p value measures the probability that this has happened. A p value of 0.05 means that there's a 5% chance that your results are bogus in this particular way.
The problem with p values is that they only describe one way that the experiment could have gone wrong, but people interpret them to mean overall confidence -- or, even worse -- significance of the result, when they really only describe confidence that the sample wasn't biased due to bad luck in random sampling. It could have been biased because the sampling methodology wasn't good. It could have been meaningless because it finds an effect which is real, but negligibly small. It could be meaningless because the experiment was just badly constructed and didn't measure what it thought it was measuring. There could be lots and lots of other problems.
There's nothing inherently wrong with p values, but people tend to believe they mean far more than they do.
Yeah. p-values are much more sensitive to having a small standard deviation than they are to having a large difference between the two samples tested. So you can have a test where the differences between the two samples ranged from 3-4 be significant, while an identical test where the differences ranged from 10-30 were not significant. Thus the dependence on big sample size I discussed elsewhere.
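A toy version of that, assuming a paired design with just three pairs so the test is a one-sample t-test on the differences with 2 degrees of freedom. The data are invented, and the 4.303 cutoff is the standard two-sided 5% critical value for that case.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF2 = 4.303  # two-sided 5% critical value of Student's t with 2 degrees of freedom

def one_sample_t(differences):
    """t statistic for testing 'mean difference = 0'."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))

small_but_tight = [3.0, 3.5, 4.0]     # differences ranging from 3 to 4
large_but_noisy = [10.0, 20.0, 30.0]  # differences ranging from 10 to 30

for name, data in (("3-4", small_but_tight), ("10-30", large_but_noisy)):
    t = one_sample_t(data)
    print(f"differences {name}: t={t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF2}")
```

The small, consistent differences clear the bar while the much larger but scattered ones do not.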
Well, the whole debate circles around the fact that there is a missing piece of information, no matter how you try to shove the wrinkle in the carpet around, it has to show up somewhere. In this case, they're saying that the p-value is reflecting the probability that the null hypothesis is correct when the results obtained say it is incorrect, and what is missing is the probability that the test hypothesis is correct. .05 says that if we got a result that the drug is harmful, then the chance of it not being harmful is less than 5%, which of course is pretty much useless information; but the power result also tells you that if the drug is harmful at a rate of XXXX%, we would have a 95% chance of seeing it, which is more useful, but not as precise numerically.
The basic forest comprising all these trees is Type I errors and Type II errors, i.e. incorrectly rejecting the null hypothesis (false positive), vs incorrectly not rejecting the null hypothesis (false negative). For any given experiment and result, whatever the noise and error are, the Type I and Type II errors interact; if you want to avoid false positives when analyzing that dataset, you specify a small p-value, but you increase the chances of false negatives, and vice versa. Mostly, we've decided to minimize false positives, so we go with p=.05. If you were able to precisely measure the rate of one of these types of errors you could precisely calculate the other; but you can't, so you have to just push your uncertainty over to where you figure it will do the least damage.
You can put a floor on the Type II error rate estimate by specifying the power of the experiment; i.e., the more tests you make, or the higher the number in your sample, the smaller the effect you can find. For instance, if you're trying to prove a drug is not harmful, intuitively a test of 5 people isn't going to be anywhere near as conclusive as a test of 1,000 people; statistically/mathematically/calculatedly, that's because in that case, the null hypothesis is that the drug is not harmful, and the p-value is testing for Type I errors; but what you're really looking for is Type II errors, i.e. you want to ensure that there isn't a nonzero rate of harm that is too low to show up in your little sample size. You can't get that the way you get p-value, though; the best you can do is calculate the statistical power of the experiment, i.e. that with this sample size, if the rate of harmful complications was greater than 1 in 100 or whatever, we would see it with 95% probability. So, when you include this calculation in with your stats, you get the best set of numbers you can; that the p-value of
Amazingly, that wasn't even considered by the FDA for years when evaluating drugs for safety, so manufacturers were free to test drugs on tiny populations and say with all honesty that they didn't see any problems. It wasn't until later that it occurred to somebody that they really needed a properly powered trial to be safe.
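A back-of-the-envelope version of the "properly powered" idea, simplified to the chance of seeing at least one harmful event at all. It assumes each subject independently has the same complication rate, and the 1-in-100 rate and 95% target are the illustrative figures from the comment above, not real trial numbers. The function names are made up.

```python
from math import ceil, log

def prob_at_least_one_event(rate, n_subjects):
    """Chance of observing one or more events if each subject has the given event rate."""
    return 1 - (1 - rate) ** n_subjects

def subjects_needed(rate, detection_prob=0.95):
    """Smallest trial size that would show at least one event with the target probability."""
    return ceil(log(1 - detection_prob) / log(1 - rate))

print(prob_at_least_one_event(0.01, 5))     # tiny trial
print(prob_at_least_one_event(0.01, 1000))  # large trial
print(subjects_needed(0.01))                # size needed for a 95% chance
```

With 5 subjects you would almost never see a 1-in-100 complication, which is how a tiny trial can honestly report "no problems observed".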
P-values certainly are probabilities. You just argued they aren't probabilities, but they are probabilities of this other thing. You contradicted yourself. I was specifically vague when I called it 'something' because it changes with the type of test and there are many to choose from and I didn't want to write a whole book. That book has already been written by smarter people than I.
Right, .05 (or whatever your p-value is). And, hard as it is to believe given so many of the words being the same, you can't get the answer you want from the answer the p-value gives you.
Basically, but vaguely, the experimenter compares two sets of numbers, and calculates the average difference between numbers in the same group (hopefully, getting the same results in each group) and compares that to the difference in the average between the two groups, and wants to know; given this difference between the two groups and this difference between those in the same group, what is the probability that in reality, if there were no errors or noise, there would be a real actual difference between groups (and what is the reasonable range that the actual difference might be). So far, pretty clear, right?
but what the p-value tells you is the opposite way around; i.e., if there really is no difference between the two groups except for errors and noise, then the probability that I'd see the difference between groups and the difference between members of the same group that I'm seeing in my experiment, is
Comparing means is one kind of test. There are many others.
maybe i'm misunderstanding you, but why would you test to "ensure that"? the randomization guarantees it (assuming that it is done correctly, of course); poking around after-the-fact can only undo the blind, which is why good experiments take some measures to make it difficult.
and why is it "guaranteed to happen 5% of the time"? is that independent of sample size and distribution of the factor? quite remarkable indeed!
you sound quite confused about certain things.
maybe i'm misunderstanding you, but why would you test to "ensure that"? the randomization guarantees it (assuming that it is done correctly, of course); poking around after-the-fact can only undo the blind, which is why good experiments take some measures to make it difficult.
and why is it "guaranteed to happen 5% of the time"? is that independent of sample size and distribution of the factor? quite remarkable indeed!
you sound quite confused about certain things.
The whole point of the concern of the editors of the journal, as described in the article, is what the p-value actually represents: which is, the chances that the test in question, using two randomized samples from the same population, will demonstrate a difference of the size in question. I.e., 5% of the time a randomized population will show a difference in any test with a Gaussian distribution of p=.05. You do the randomization, you test all the independent variables you are controlling for/adjusting for/interested in/worried about; if any are significantly different at .05 or whatever you preferentially redo the randomization; if not you have to rely on your statistical adjustment to take care of it, but you're safer if you can redo the randomization.
If you're not doing this, you should tell people in your publications, because it's something they should know. Apparently, you believe that if you flip a coin twice, it's guaranteed to produce one head and one tail. This is not a good assumption to go into statistical analysis with.
ah, you're just confusing randomization as a means of controlling nuisance factors, with the formal significance level of the result about the factor of interest. you are confused; these are different concepts. to wit, randomization certainly does not involve testing "all the independent variables". trying to randomize this way is a waste of time at best, and would probably fuck up your experiment.
it is worth recalling, at times like this, that the last person to speak to me with such a combination of ignorance and certitude was found dead three days later from profuse rectal bleeding.
ah, you're just confusing randomization as a means of controlling nuisance factors, with the formal significance level of the result about the factor of interest. you are confused; these are different concepts. to wit, randomization certainly does not involve testing "all the independent variables". trying to randomize this way is a waste of time at best, and would probably fuck up your experiment.
it is worth recalling, at times like this, that the last person to speak to me with such a combination of ignorance and certitude was found dead three days later from profuse rectal bleeding.
Not sure what you're saying, but me and my rectum are outa here.