Science and the Shortcomings of Statistics
Kilrah_il writes "The linked article provides a short summary of the problems scientists have with statistics. As an intern, I see it many times: Doctors do lots of research but don't have a clue when it comes to statistics — and in the social science area, it's even worse. From the article: 'Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.'"
In other news, math may not lie but people still can, all the honesty and good statistics in the world don't help against end-user stupidity, and there are statistically two popes per square kilometer in the Vatican.
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
How to Lie with Statistics by Darrell Huff. Recommended reading.
Do not mock my vision of impractical footwear
Standard deviation is what you learn very early in school. And this was an endocrinologist, a specialist who no doubt took a lot of biostatistics courses and such, and used a lot of statistics all through his education. And you are telling me that it's not his "job" to know? Wow! We are talking about the most basic stuff that anyone with a degree in the sciences should know. It's almost like saying that an English major can be excused if he doesn't know that 2+2=4 because "it's not his job to know".
And lots of others. It then suggests Bayesian reasoning as an alternative to traditional statistical tests.
Most post-PhD scientists are aware of the common mistakes, but being aware that we make mistakes doesn't necessarily stop us from making them. If you choose a random set of conference proceedings, it is almost certain you will find at least one paper (and I suspect usually a dozen or more) that has statistical mistakes in it.
Peer review is not about catching mistakes, although it can on occasion. Peer review is about clear communication, such that the experiment can be repeated as identically as possible and the readers can understand the authors' justification for their conclusions. At least that's what every journal article I've read on the topic indicated was the reason for the peer review process's creation. One of my advisors asked me about it on my written preliminary exam and I needed to do a lot of reading to be prepared for the oral exam. There were several different societies that claimed to have originated the idea, but no one claimed that the purpose was to catch mistakes, fabrications, or data manipulations.
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
And what's the law about spelling/grammar corrections that incorrectly correct the supposed spelling error? (Redundancy is purposefully deliberate.) "Its" is possessive. "It's" is a contraction of "it" and "is". -- This has been a message from your friendly neighborhood Spelling Nazi.
The largest demographic in American prisons is black Americans. Real statistic, but is it true?
Given a particular sample that indicates blacks are 60% of the prison population this would appear to be true.
But what if I said: "The largest demographic in prison is minority, non-whites." Suddenly the % jumps from 60% (black) to 80% (minority). Which is more right? This is the problem with statistics. Context.
Now I can say readily that the largest demographic in prison is actually right-handed people. The % now jumps to 90%.
But wait! There is more! The largest demographic in prison is actually people who, prior to arrest, were below the poverty line, which jumps to 99% of the population. Again, all of the above are accurate based on a sample, but which is MORE correct? Linear Algebra is coming into play here quickly....
When that kind of issue comes into play, it is the classic "Correlation != Causation" confusion. Are the majority of people in prison there because of being black? Being a minority? Being right-handed? Or being poor? None of the above. The majority of them are in there because they were convicted of a crime and sentenced. That is the cause of their imprisonment; the rest is correlation, which may have a causal effect on the conviction or sentencing, but is not the direct cause of being in prison. (E.g., you cannot be thrown into prison for being poor, black, a minority, or right-handed.)
Same with medical research, politics, economics, etc. Take the price of oil rising 10% and a subsequent 5% drop in shipping orders. Measuring the significance of regressors is important but, oddly, rarely reported. Many factors get masked or shadowed by higher-level regressors (e.g., being a minority masks a variety of other social and economic factors). In addition, a regressor can distort statistical work by being too broad: Asians face different economic and social factors than North American blacks, who in turn differ even from African immigrants.
Back to the original subject:
We can take 100 prisoners and 100 non-prisoners and figure out rather quickly if being black is statistically significant in the prison population. In the non-prison population, blacks would account for 25%-45% (depending on location). We can see that 60% of prisoners are black. That is a deviation of 15 to 35 percentage points from the norm, and we can test the significance of that. Same with minorities. We also find quickly that being right-handed is insignificant because it doesn't deviate from the norm. We can test left-handed and right-handed populations and rule out the handedness of a convict being significant.
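As a rough sketch of the test described above, here is a two-proportion z-test in plain Python. The 60-in-100 and 90% figures are the illustrative ones from this post; the 35-in-100 baseline is simply an assumed midpoint of the 25%-45% range, and the function name is made up for this example. None of these are real measurements.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for H0: both samples share one proportion."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative numbers only: 60 of 100 prisoners vs an assumed 35 of 100 non-prisoners
z_black, p_black = two_proportion_z_test(60, 100, 35, 100)

# Right-handedness: 90 of 100 in both samples, so no deviation from the norm
z_hand, p_hand = two_proportion_z_test(90, 100, 90, 100)

print(f"black: z={z_black:.2f}, p={p_black:.4f}")
print(f"right-handed: z={z_hand:.2f}, p={p_hand:.4f}")
```

With samples this small the normal approximation is only a rough guide, and a significant difference in proportions still says nothing about cause.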
We can find that economic status is considerably MORE significant than minority or black as a status. We can determine that the reason minorities or blacks are disproportionately prevalent in prison is that blacks and minorities have higher rates of poverty. We can extract and determine the statistical weight of POVERTY in regard to imprisonment (since we find a high % of whites in prison who are poor compared to the normal population). Once we figure that out, we can remove it and continue the investigation to figure out what weight minority and black have once POVERTY has been removed from the model (residual analysis).
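One simple way to see the "remove POVERTY and look at what is left" idea is stratification: compare rates within each poverty level instead of across the whole population. This is a stand-in for the regression and residual analysis described above, not the same procedure. Every count below is invented purely so the arithmetic is easy to follow, and the group and stratum labels are hypothetical.

```python
# Hypothetical counts, invented for illustration only: (imprisoned, total)
counts = {
    ("minority", "poor"): (72, 600),
    ("minority", "not_poor"): (6, 400),
    ("majority", "poor"): (16, 200),
    ("majority", "not_poor"): (8, 800),
}

def rate(group, strata):
    """Imprisonment rate for one group, pooled over the given poverty strata."""
    imprisoned = sum(counts[(group, s)][0] for s in strata)
    total = sum(counts[(group, s)][1] for s in strata)
    return imprisoned / total

def rate_ratio(strata):
    """Minority rate divided by majority rate over the given strata."""
    return rate("minority", strata) / rate("majority", strata)

# Crude comparison: poverty is ignored, so its effect is mixed into the ratio
crude = rate_ratio(["poor", "not_poor"])

# Stratified comparison: poverty is held fixed within each stratum
within_poor = rate_ratio(["poor"])
within_not_poor = rate_ratio(["not_poor"])

print(f"crude ratio: {crude:.2f}")
print(f"ratio among poor: {within_poor:.2f}")
print(f"ratio among not poor: {within_not_poor:.2f}")
```

In this made-up table the crude ratio is more than double the ratio inside either stratum, because the minority group is also the poorer one. Whether real data would show a larger, smaller, or no residual effect is exactly what the full analysis has to establish.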
The problem in reporting is that without the whole, comprehensive analysis you can miss important things. For instance, when trying to correct injustice in sentencing, if you don't report the weight POVERTY has in contrast to BLACK or MINORITY, you may lose sight of the fact that you may have better success addressing POVERTY to normalize sentencing rather than MINORITY or BLACK (or not).
The same happens in medical research. Given a cocktail of drugs, without the whole analysis you may end up providing more of Medicine A versus B but lose sight of the fact that A & B are limited by the dosage of Medicine C.
Statistics are not bullshit, rather merely observations with no intrinsic agenda or even implication of truth. Purely amoral, like a handgun: useful to both the good and the evil.
Statistics don't lie, nor do they tell the truth. They simply show the relationship of the data as it stands. The Truth or Truthiness of it is subjective and vulnerable to context.
-=[ Who Is John Galt? ]=-