Science and the Shortcomings of Statistics

Lies, Damned Lies, and Statistics. by Shadow of Eternity · 2010-03-17 14:31 · Score: 5, Informative

In other news, math may not lie but people still can, all the honesty and good statistics in the world don't help end-user stupidity, and there are statistically two popes per square kilometer in the Vatican.

--
A bullet may have your name on it but splash damage is addressed "To whom it may concern."

Re:Lies, Damned Lies, and Statistics. by jeckled · 2010-03-17 14:41 · Score: 3, Informative

Also, statistics are often manipulated to suggest correlations where there are none.
Re:Lies, Damned Lies, and Statistics. by Michael Kristopeit · 2010-03-17 15:02 · Score: 2, Insightful

valid correlations are often manipulated to suggest causation where there is none.
in the end, it's only a problem if the person listening is an idiot...
Re:Lies, Damned Lies, and Statistics. by dwarfsoft · 2010-03-17 15:03 · Score: 4, Funny

As with everything, xkcd delivers. My personal favorite :)
People often get caught assuming that Correlation == Causation.

--
Cheers, Chris
Re:Lies, Damned Lies, and Statistics. by Cryacin · 2010-03-17 15:08 · Score: 5, Funny

Exactly. I would never believe a statistic that I did not make up myself!

--
Science advances one funeral at a time- Max Planck
Re:Lies, Damned Lies, and Statistics. by menkhaura · 2010-03-17 15:56 · Score: 2, Funny

In other news, researchers proved that water causes cancer. 100% of the cancer patients that died in 2009 drank water regularly.

--
Stupidity is an equal opportunity striker.
Fellow slashdotter Bill Dog
Re:Lies, Damned Lies, and Statistics. by Mitchell314 · 2010-03-17 16:02 · Score: 2, Funny

*Only for nonzero values of pope.

--
I read TFA and all I got was this lousy cookie
Re:Lies, Damned Lies, and Statistics. by crmarvin42 · 2010-03-17 16:27 · Score: 4, Funny

That particular oversight drives me nuts. An extension of that is when someone uses orthogonal polynomial contrasts and multiple comparison tests on the same data without adjusting their alpha level. If Tukey's HSD accounts for all tests and gives you an overall alpha of 0.05, and you then proceed to run linear and quadratic contrasts, the combined alpha level is actually about 0.10, not 0.05, because Tukey's doesn't adjust for contrasts and contrasts don't contain adjustments for multiple comparisons.

I'm actually at a scientific meeting and saw 7 presentations in which they "double dipped" on their statistics before we broke for lunch.
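
A rough sketch of the arithmetic behind that complaint, assuming the Tukey family and the contrast family are each run at alpha = 0.05 and count as two separate chances of a false positive. The function names and the independence assumption are illustrative only; real test families on the same data are usually correlated, so the true figure sits somewhere below the upper bound.

```python
def bonferroni_bound(alphas):
    """Upper bound on the chance of at least one false positive."""
    return min(1.0, sum(alphas))


def familywise_if_independent(alphas):
    """Chance of at least one false positive if the families were independent."""
    p_no_error = 1.0
    for a in alphas:
        p_no_error *= (1.0 - a)
    return 1.0 - p_no_error


# Assumed: one family for Tukey's HSD, one for the polynomial contrasts.
alphas = [0.05, 0.05]
bound = bonferroni_bound(alphas)
independent = familywise_if_independent(alphas)
print(f"upper bound: {bound:.4f}, if independent: {independent:.4f}")
```

Either way the overall error rate is close to double the 0.05 that each procedure advertises on its own.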

--
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
Re:Lies, Damned Lies, and Statistics. by Opportunist · 2010-03-17 18:21 · Score: 2, Funny

Not only that, but it is also the key ingredient in most of today's problems. It's the core element of acid rain, it's a main ingredient in beer and many other alcoholic beverages that cause families to break apart, you find it in fattening food and it is the main ingredient in all high carbon soft drinks.
Consuming that stuff might also lead to antisocial behaviour, as it has been confirmed that all murderers, gunmen and even terrorists have consumed it pretty much all their life. When are we going to ban that substance? Doesn't anyone think of the children anymore?
Please read on. It is a serious problem and people should be informed urgently. Giving one sided, biased and doctored results is a really urgent problem in today's statistics and presentation of information. I beg you, make sure you read it and heed the warning. Think of the children!

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:Lies, Damned Lies, and Statistics. by the_womble · 2010-03-17 18:35 · Score: 3, Insightful

The problem is that a lot of people believe statistics produced by an expert such as a doctor. Sir Roy Meadow had people sent to prison, and lots of children taken away from their parents, by misinterpreting statistics.
Re:Lies, Damned Lies, and Statistics. by Chrisq · 2010-03-17 22:33 · Score: 5, Funny

That since a dead clock is right twice a day, those two times cause the clock to work again?
No, the clock is right all of the time, it just shows local sidereal time and is often in the wrong place
Re:Lies, Damned Lies, and Statistics. by imakemusic · 2010-03-17 22:55 · Score: 4, Funny

Indeed. For example: 6 out of 7 dwarves aren't Happy.

--
Brain surgery - it's not rocket science!
Re:Lies, Damned Lies, and Statistics. by Anonymous Cowpat · 2010-03-18 02:37 · Score: 2, Interesting

I saw a fascinating presentation by an eminent professor of physics on what Meadow did wrong. It boiled down to misapplying Bayes' theorem. Meadow had got an extremely high probability of the accused being guilty out of it. What the professor did was point out that (a) the probability put in for the chance of two babies dying couldn't be obtained by simply multiplying the chance of one dying by itself, as the events may not be independent, and (b) that would be rather moot anyway, because the accused's chance of having 2 dead babies was 1.
Putting the correct numbers in and turning the handle produces a chance of guilt of about 1%.
What was most shocking was that, given the elementary error in the application of statistics, no-one called him on it in court.
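
A minimal sketch of the two points above, with every number made up purely to show the mechanics; none of them are the figures used in court or in the professor's talk. It assumes only two explanations are on the table (two natural deaths or two murders) and that each fully accounts for the evidence.

```python
def posterior(prior_a, prior_b):
    """P(explanation A | evidence), when A and B are the only explanations
    considered and each one fully accounts for the evidence."""
    return prior_a / (prior_a + prior_b)


# Hypothetical inputs, chosen only to illustrate the reasoning.
p_one_death = 1e-4             # assumed chance of one unexplained infant death
naive_two = p_one_death ** 2   # squaring assumes the two deaths are independent
dependence_factor = 10         # assumed: a second death is likelier after a first
p_two_natural = naive_two * dependence_factor
p_two_murders = 1e-8           # assumed chance of a double murder in a random family

# Point (b): the two deaths have already happened, so the question is which
# rare explanation is more plausible, not how rare the deaths were to begin with.
p_guilt = posterior(p_two_murders, p_two_natural)
print(f"naive: {naive_two:.1e}, with dependence: {p_two_natural:.1e}")
print(f"P(guilt | two deaths) = {p_guilt:.3f}")
```

The tiny probability of two natural deaths never appears on its own in the answer; it only matters relative to the equally tiny probability of the alternative.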

--
FGD 135
Re:Lies, Damned Lies, and Statistics. by silentcoder · 2010-03-18 03:32 · Score: 2, Informative

The actual truth, as usual, is a bit more complex than the bit we all remember and quote.
Where a correlation occurs there are four distinct possible reasons:
Let's say the correlation is that, during the time when X is known to have increased, Y showed a corresponding increase. Then:
1) It is possible that this is because X caused Y - i.e. the causation that way isn't implied, but yes, it's a possibility.
2) It is possible that Y in fact caused X (i.e. the causation is in fact in the opposite direction of what the quoter of the stat is trying to say).
3) It is possible that X and Y were both caused by an unknown third factor Z.
4) It is possible that X and Y were caused by completely different factors and their correlation is purely coincidental.
The mere existence of a correlation does not imply any of these four possibilities more strongly than the others - they are equally likely unless additional data is presented to corroborate one.
The example from my philosophy textbook (which I'm shamelessly citing here) was this:
Between 1955 and 1965 the number of schools in the US where sex-ed was given increased by 75%; during the same period the number of teenage pregnancies increased by nearly 80% (both compared to the decade before that). Conclusion - giving sex-ed led to more teenage pregnancies.
This citation is a classic example of the correlation/causation mistake in that it assumes option 1.
In this case option 2 actually seems quite likely - if teenage pregnancies were going up, that would put pressure on schools to give sex-ed to try and reverse this trend.
But what if we consider more of the available data? Specifically, that the pill came on the market in 1953, sparking the sexual revolution.
If we consider that the pill led to a more relaxed attitude among teenagers about sex, but that this attitude probably spread a lot faster than actual usage of the pill, then it explains the increase in teen pregnancies, which combined with the known presence of this attitude would put pressure on the schools to give sex-ed.
So then it suggests that in fact we have options 2, 3 and 4 happening in a mutually reinforcing manner. The only conclusion that isn't supported by the data at all is option 1.
With each bit of additional data added, including comparison with other times where there was a sharp increase in teenage pregnancy (like the early years of the current decade under the Bush administration) we find that the likelihood of X actually causing Y in this example gets smaller and smaller and in fact becomes statistically insignificantly small.
But that doesn't mean option 1 is never the right answer. Sometimes a correlation really is due to causation. You just cannot assume it without further evidence.

--
Unicode killed the ASCII-art *

Re:Its common knowledge by snl2587 · 2010-03-17 14:39 · Score: 2, Funny

How do you figure that? My latest calculations placed it at 70% [Note: Error +/- 10%].

Summery? by sincewhen · 2010-03-17 14:40 · Score: 4, Funny

It's not just statistics that people have a problem with...

--
-- Braden's law of data: All data spends some of its lifetime in an excel spreadsheet.

Re:Summery? by oGMo · 2010-03-17 15:18 · Score: 3, Funny

From your sig:

-- Braden's law of data: All data spends some of it's lifetime in an excel spreadsheet.

What's that law about spelling/grammar corrections inevitably having spelling or grammar mistakes in them?

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:Summery? by martin-boundary · 2010-03-17 15:46 · Score: 4, Funny

Godwin's.

Only if the sentence misspells Hilter.
Re:Summery? by icannotthinkofaname · 2010-03-17 16:12 · Score: 2, Informative

That would be Muphry's law.
For details on Muphry's law, click on the above hyperlink. For more fun laws, click on the below hyperlink.
More fun here.

--
Let q be a radix > 1. I am in ur base-q, killing 10 d00ds.
Re:Summery? by Saroful · 2010-03-17 19:37 · Score: 5, Informative

And what's the law about spelling/grammar corrections that incorrectly correct the supposed spelling error? (Redundancy is purposefully deliberate.) "Its" is possessive. "It's" is a contraction of "it" and "is". -- This has been a message from your friendly neighborhood Spelling Nazi.
Re:Summery? by bingoUV · 2010-03-18 00:36 · Score: 2, Insightful

No idea, but there must be a law about people assuming that editing Slashdot signature doesn't affect posts made previous to the edit.

--
Bingo Dictionary - Pragmatist, n. A myopic idealist.

Example: Standard Deviation by cytoman · 2010-03-17 14:42 · Score: 4, Interesting

My doctor was explaining to me that my blood sugar readings should not have a standard deviation of more than 1/3rd of the average blood sugar reading. Just to test if he knew what it meant, I asked him what a standard deviation was. Oh the fun when he tried to bullshit his way out of that one! He eventually told me that when I plot my data in Excel I can ask it to give me statistics on the column and it would mention what the standard deviation value was. But when I pressed on and asked him what a standard deviation is, he shooed me off and told me to go look it up. Never did he confess that he had no clue.

Re:Example: Standard Deviation by cytoman · 2010-03-17 14:59 · Score: 4, Informative

Standard deviation is what you learn very early in school. And this was an endocrinologist - a specialist who no doubt took a lot of Biostatistics courses and such, and used a lot of statistics all through his education. And you are telling me that it's not his "job" to know? Wow! We are talking the most basic stuff that anyone with a degree in the sciences should know. It's almost like saying that an English major can be excused if he doesn't know that 2+2=4 because "it's not his job to know".
Re:Example: Standard Deviation by cytoman · 2010-03-17 15:19 · Score: 4, Insightful

You are missing the point - he did not know what a standard deviation means! That is unforgivable for anyone with a medical degree...hell, it's unforgivable for anyone who has passed a course in statistics in school.
Re:Example: Standard Deviation by PSUspud · 2010-03-17 15:32 · Score: 2, Insightful

As a statistics teacher (HS / Tech school level), this doesn't surprise me in the least. Statistics and statistics education have become a giant game of "plug the numbers in and damn the understanding". When a student has never calculated a standard deviation by hand, how can they be expected to know what the heck a root mean square deviation from the sample mean really is?
Going further, I would say that statistics is a tool for answering questions. Like any other tool, it works well for some jobs and not for others. So far, no problem. But the problem comes from students that are just not willing to understand the questions that statistics can answer. Case in point -- a p value of 0.05 does _not_ mean that the null hypothesis has a 95% chance of being wrong. That's what stats students want it to mean, because they are not willing to ask the questions that stats can answer.
Until students are willing to actually do the work, for the sake of actually learning, I don't see any hope.
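
A small sketch of why a significant result does not translate into "95% chance the null is wrong". It uses Bayes' rule with an assumed prior share of true nulls and an assumed power; both are hypothetical inputs, as is the function name, and it works with the threshold "p below alpha" rather than an exact p value.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(null is true | test came out significant), via Bayes' rule."""
    false_positives = alpha * prior_null          # true nulls that reach significance
    true_positives = power * (1.0 - prior_null)   # real effects that reach significance
    return false_positives / (false_positives + true_positives)


# Assumed scenarios: half of tested hypotheses are null, then 90% of them.
for prior_null in (0.5, 0.9):
    p = prob_null_given_significant(prior_null, alpha=0.05, power=0.8)
    print(f"prior P(null) = {prior_null}: P(null | significant) = {p:.3f}")
```

The answer moves with the prior and the power, neither of which the p value knows anything about, which is the question stats can't answer on its own.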

--
----- Why sig when you can sign? PGP key id 7675D05E
Re:Example: Standard Deviation by Ethanol-fueled · 2010-03-17 15:33 · Score: 2, Informative

s = sample standard deviation = sqrt(sum((x-xbar)^2)/(n-1)), where xbar is the sample mean
sigma = population standard deviation = sqrt(sum((x-mu)^2)/N), where mu is the population mean
s is approximately equal to (highestValue-lowestValue)/4, range rule of thumb
Unusual values are outside +/- 2 standard deviations
Z = ((x-mu)/sigma) where Z is in terms of standard deviations.
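
The formulas above can be tried out with Python's standard `statistics` module. This is only a sketch: the readings are hypothetical values, the z-scores use the sample mean and s in place of mu and sigma, and the last check is the "SD no more than a third of the average" rule of thumb from the start of this thread.

```python
import statistics

# Hypothetical blood sugar readings, invented for illustration.
readings = [95, 110, 102, 130, 88, 121, 99, 105]

mean = statistics.mean(readings)
s = statistics.stdev(readings)        # sample SD, divides by n - 1
sigma = statistics.pstdev(readings)   # population SD, divides by N
range_estimate = (max(readings) - min(readings)) / 4  # range rule of thumb

# z-scores: how many standard deviations each reading is from the mean
z_scores = [(x - mean) / s for x in readings]
unusual = [x for x, z in zip(readings, z_scores) if abs(z) > 2]
within_one_third = s <= mean / 3

print(f"mean={mean:.2f} s={s:.2f} sigma={sigma:.2f} range/4={range_estimate:.2f}")
print(f"unusual values: {unusual}, SD within 1/3 of mean: {within_one_third}")
```

Note that s is always a little larger than sigma for the same data, because it divides by n - 1 instead of N.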
Re:Example: Standard Deviation by Opportunist · 2010-03-17 15:43 · Score: 4, Interesting

Doctors are notoriously bad with statistics. But the real kings of bad statistics are psychiatrists.
Notice how a LOT of studies in psychiatry are essentially statistics, statistics and a bit of statistics? It might be the reason why a lot of the courses you have to pass to become a shrink also consist of a lot of statistics, statistics... you get the idea.
NOBODY who decides that his course of studies will be psychiatry decides on it because he enjoys statistics that much, though. Actually, most psych students struggle badly with statistics. Psychiatry is one of the fields where the label doesn't match the contents. It looks like you're going to do a lot of messing with people's minds (aka "solving their psychology problems") but actually, judging from the courses, you become a refined statistician who had a bit of counseling tutoring on the side.
That's not what people become shrinks for, though. They want to sit in their office, put people on their couch (or, more modern, in a comfy chair) and get 100 bucks an hour for listening to some idiot whine. And most do just that and will do fine.
It gets bizarre when they somehow end up in a spot where they have to rely on their statistics. Hey, you got a masters in that, and that entails a buttload of statistics, so you can do it... Nobody really cares that 9 out of 10 somehow managed to get their diploma either by learning only what they absolutely needed (and forgetting it right after the test, certain that they'd never need it again, because ... ya know, listening to idiots and stuff, not sitting there plotting standard deviations...) or by cribbing altogether.
And then you get studies of the usefulness of psychotropic drugs and wonder whose black hole they pulled that out of...

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:Example: Standard Deviation by JumpDrive · 2010-03-17 15:44 · Score: 3, Interesting

I agree with your concerns. Being a chemical engineer and a physical scientist, I have often found medical doctors' understanding of chemistry and other sciences lacking. I once had an argument with a doctor about the chemical kinetics involved in a prescription drug I was taking; he basically told me I didn't know what I was talking about and blew me off. After another run-in with him over another issue I fired him. But that's just one of my personal issues with a doctor.

Back when I was in graduate school, my colleagues in graduate science and I taught pre-med chemistry and physics, which was a really watered down version of the chemistry and physics taught to engineers and science majors. To be honest I thought it was kind of scary. All these years I was taught that medical students were supposed to be the best and the brightest, but we spoon fed them "baby chemistry" and "baby physics".

Since that time I have had many discussions with professors about this and they and I have come to the same conclusion, "the best and the brightest do not go into medical school". Thirty or forty years ago this may have been true, but economics has taken a turn and it just isn't the case anymore.

And why would they? They can make more money on Wall Street, they don't have to hassle with the bureaucracy of health insurance, they don't have to hassle with lawyers, so why would the best and brightest go into medicine?

And you want to know what kind of income a hot little girl with a business degree can get? Pharmaceutical sales can pay 6 figures for one good figure. So the next time you see that good looking girl pulling that bag through your doctor's office, realize she is probably making a lot of money. More money than the average general practitioner.
Re:Example: Standard Deviation by cytoman · 2010-03-17 15:47 · Score: 4, Insightful

There are some things you should never be able to forget - the definitions and meanings of probability, mean, median, standard deviation and variance come to mind. You find yourself in situations everyday where you need to apply some of these things. Am I wrong about this? Do people forget basic definitions so easily?
Re:Example: Standard Deviation by Capsaicin · 2010-03-17 15:54 · Score: 3, Insightful

Standard deviation is what you learn very early in school.
So early, in fact, that you forget the details by the time you have had some serious study under your belt. Do you have any idea of the stuff you have to keep in your head to be an endocrinologist? So long as he remembers that it's a measure of variance (which he obviously does), it hardly matters whether he can explain to a mathematician how to derive it. And if OP gets off on tripping up specialists with such minutiae, it ain't the specialist who has issues.
And you are telling me that it's not his "job" to know?
YMMV, but I would prefer to visit an endocrinologist who was an expert on the subject of hormones etc rather than stats.

--
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
Re:Example: Standard Deviation by Jah-Wren Ryel · 2010-03-17 16:13 · Score: 2, Interesting

And then you get studies of the usefulness of psychotropic drugs and wonder whose black hole they pulled that out of...
Indeed. Normally I would never cite an article in a McNews magazine like Time or Newsweek, but I found this explanation of the state of antidepressant drug efficacy to be one of the best I've run across so far - hundreds of billions of dollars all depending on some really, really bad math. It's like the collateralized debt securities of the drug & psychiatric industries:
http://www.newsweek.com/id/232781

--
When information is power, privacy is freedom.
Re:Example: Standard Deviation by crmarvin42 · 2010-03-17 16:58 · Score: 3, Insightful

I am in the life sciences (ie not a computer programmer).

If MDs are reading medical journals and interpreting their results, which they all are expected to do (especially those with a Board Certified Specialty like Endocrinology), then there is no excuse for them to have forgotten what the standard deviation is a measure of. They should be using the variance estimates provided in a data table to interpret the results it contains every time they read an article. If not, then they aren't worth the exorbitant fees they are charging, because critical thinking is part of a physician's job description, and accepting whatever gets published in the New England Journal of Medicine at face value is not.

I can accept forgetting the equation, but there is NO EXCUSE for forgetting that SD is a measure of variation (along with SEM, SED, and CV) as opposed to a measure of central tendency (mean, mode, median). That is something they teach you in the first week of a statistics course, and it is used in every subsequent class because it is so fundamental to the interpretation of statistics. If I were cytoman, I'd be looking for a new Endocrinologist.

--
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
Re:Example: Standard Deviation by rve · 2010-03-17 17:05 · Score: 3, Informative

You're mixing up psychiatrists, psychologists and psychotherapists.
A psychiatrist went to med school, got a doctor's degree and specialized in problems with the brain. A psychologist went to university to learn the study of the behavior of people. This involves a lot of statistics and many of them probably do consider it something they didn't go to college for, but it's a study that is supposed to follow the scientific method and prepare students for doing research, not therapy.
A psychotherapist is anyone who feels like calling themselves that. As a preparation they may have studied psychology at university, or they may have spent 20 years meditating in the Himalayas, or followed a short course at a religious group such as an institute of multiple personality disorder therapists or scientology.
Re:Example: Standard Deviation by hazem · 2010-03-17 18:44 · Score: 4, Interesting

There are some things you should never be able to forget.... Do people forget basic definitions so easily?
Given a couple years with little contact with people who speak your native language, you'll actually begin to forget that very language you have lived speaking all your life. So it doesn't surprise me at all that people would forget basic definitions if they don't actually think about those definitions very often.
I figure if you can forget your native language then pretty much all bets are off for the stuff you've known for a lot less time and used a much smaller percentage of your thinking life.
Re:Example: Standard Deviation by fsterman · 2010-03-17 19:42 · Score: 2, Insightful

Except, if you had read this story, you would have found the antidepressant = placebo story to be incorrect due to poor statistical reasoning:
"Another concern is the common strategy of combining results from many trials into a single “meta-analysis,” a study of studies. In a single trial with relatively few participants, statistical tests may not detect small but real and possibly important effects. In principle, combining smaller studies to create a larger sample would allow the tests to detect such small effects. But statistical techniques for doing so are valid only if certain criteria are met. For one thing, all the studies conducted on the drug must be included — published and unpublished. And all the studies should have been performed in a similar way, using the same protocols, definitions, types of patients and doses. When combining studies with differences, it is necessary first to show that those differences would not affect the analysis, Goodman notes, but that seldom happens. “That’s not a formal part of most meta-analyses,” he says.
Meta-analyses have produced many controversial conclusions. Common claims that antidepressants work no better than placebos, for example, are based on meta-analyses that do not conform to the criteria that would confer validity."

--
Is there anything better than clicking through Microsoft ads on Slashdot?
Re:Example: Standard Deviation by cycoj · 2010-03-17 19:50 · Score: 2

I think you don't have a clue how doctors work. Do you really think doctors evaluate the effectiveness of a medicine by reading through scientific articles and possibly even recalculating the results? No, they follow guidelines which are written by other medical people whose job is evaluating this sort of stuff. That is a good thing, because the people who make the guidelines actually do know their stuff and I trust them to do the statistics more than some doctor. The doctor's job is to know what tests to run, what symptoms to look for etc. Not to do a statistical analysis about the likelihood that you have disease X or Y.
Re:Example: Standard Deviation by cycoj · 2010-03-17 20:07 · Score: 2, Insightful

Have you ever looked at the sheer amount of knowledge that doctors have to know (and actually do know)? Yes they are learning baby physics and baby chemistry. We have physicists and chemists to do the non-baby physics and chemistry. You could also say the same thing for other sciences. I know friends who've taught physics to chemistry students and that was baby-physics which the chemists struggled to understand. Similarly I've had to learn chemistry for my physics degree and have pretty much forgotten almost anything about it, that didn't prevent me from getting a PhD in physics.
I wonder how much of the physics or chemistry out of your field of expertise you still remember.
Back to the topic of doctors, a lot of the stuff that doctors do is purely knowing things, but they need to do a lot of it. They don't necessarily know exactly how a drug works, they just know when to give that drug. So a lot of their work could be done with a very big flowchart, except for the fact that quite a lot is actually observation not just what you tell them.
Re:Example: Standard Deviation by jpmorgan · 2010-03-17 20:50 · Score: 3, Insightful

There's a reason why you keep getting modded up and those disagreeing with you keep getting modded down.
You're exactly right. Modern diagnostic medicine is predicated on interpreting statistical studies to make diagnoses. It is practically incompetence for a practicing medical doctor to not know what standard deviation means.
Re:Example: Standard Deviation by demonlapin · 2010-03-17 23:08 · Score: 2, Informative

Med schools hire statisticians for this. Read the thank-yous. Even small studies will thank the biostatistician. The biostatistician will be an author on a major study.
Re:Example: Standard Deviation by tibit · 2010-03-18 01:43 · Score: 3, Insightful

I believe that forgetting something usually means you never really understood it. I don't think that if you really understand something, you'll ever forget it. There are things I do rather rarely yet I don't forget them because I understand them and could re-derive them from first principles.
Usually if I forget something, it means I never quite understood it in the first place.
I think that real understanding implies almost indefinite retention, and lack of retention can usually be explained by lack of understanding.
It's very easy to forget something if all you know about it is a memorized definition and an equation or two.
If you use statistical terms and concepts daily, you should be able to explain them after being woken up in the middle of your sleep, after several hours of partying with lots of booze. Anything less probably means you're acting things out rather than understanding them.
Feynman often talked about the issue of real understanding. One could summarize his view thusly: if you cannot explain it to a non-specialist of decent intelligence, you probably don't understand it.

--
A successful API design takes a mixture of software design and pedagogy.

Re:Maths anxiety by Nefarious Wheel · 2010-03-17 14:50 · Score: 4, Informative

How to Lie with Statistics by Darrell Huff. Recommended reading.

--
Do not mock my vision of impractical footwear

Two weeks of six sigma classes... by ctmurray · 2010-03-17 14:52 · Score: 2, Funny

Our company six sigma training included two weeks of collecting and analyzing data with a stats package. I got enough experience to even train me how to use the program. I can still do a few things that come up regularly. Probably the best thing to come out of six sigma (for me at least).

Re:Two weeks of six sigma classes... by Bemopolis · 2010-03-17 18:03 · Score: 2, Funny

So, you got twelve sigma-weeks of statistical training?

--
"I guess the moral of the story is, don't paint your airship with rocket fuel." -- Addison Bain

Personal experience by nanoakron · 2010-03-17 14:53 · Score: 5, Interesting

As a doctor myself, I feel I should add my $0.02...

Throughout med school we had the odd scattered lecture on statistics, and later when reading papers I used to skim over most of the maths just to look for the P value at the end (one representation of how statistically significant a result is).

However, I then took a formal stats course and was amazed at how little I understood - Monte Carlo techniques, Markov models, and even something as trivial yet important as the difference between a parametric versus a non-parametric test.

And then it struck me - most of the research I had read had applied parametric statistical tests to their data - that is, the researchers made an assumption that the underlying distribution of results would fall on a normal curve. Yet this simple assumption may be all it takes to skew the data when they should have chosen a non-parametric test instead.
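
For a concrete feel of what "non-parametric" means, here is a sketch of an exact two-sided sign test, one of the simplest such tests. It looks only at the direction of each paired difference, so it needs no assumption that the data fall on a normal curve. The paired differences are hypothetical, and this is an illustration rather than a recommendation of which test any particular study should have used.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test; zero differences are dropped."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    # Under the null hypothesis, the number of positive signs is Binomial(n, 0.5).
    tail_low = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    tail_high = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(tail_low, tail_high))


# Hypothetical after-minus-before differences, including one large outlier.
diffs = [1.2, 0.4, 3.1, 0.9, 15.0, 0.3, 2.2, -0.5, 0.8, 1.1]
p = sign_test_p_value(diffs)
print(f"sign test p-value: {p:.4f}")
```

The outlier of 15.0 counts as just one more positive sign, so it cannot distort the result the way it could distort a mean-based test; the price is that the test throws away the size of each difference.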

So yes, stats are vitally important, badly taught, and focus too much on the maths rather than the concepts. Remember that we're doctors, not mathematicians - the last set of sums I did were in high school. If I need to analyse data, I'll probably plug it into SPSS - although now with my eyes open.

-Nano.

Re:Personal experience by Frequency Domain · 2010-03-17 15:24 · Score: 5, Insightful

...And then it struck me - most of the research I had read had applied parametric statistical tests to their data - that is, the researchers made an assumption that the underlying distribution of results would fall on a normal curve. Yet this simple assumption may be all it takes to skew the data when they should have chosen a non-parametric test instead.
So yes, stats are vitally important, badly taught, and focus too much on the maths rather than the concepts. Remember that we're doctors, not mathematicians - the last set of sums I did were in high school. If I need to analyse data, I'll probably plug it into SPSS - although now with my eyes open.
That's a good insight. I'm a statistics professor, and some of the problems I see are a) people generally get exposed to a single course in statistics; b) they're usually mathematically unprepared for it; c) so much gets squeezed into that one opportunity that heads are exploding; d) because of (a) - (c), everybody wants you to "just give 'em the formula"; e) since statistics is so widely used, there's a plethora of courses that are being taught by people who themselves are victims/products of (a) - (d), and are very happy to "just give 'em the formula"; and so f) most people plug and chug data through a stats package with no idea of the applicability, limitations, and interpretation of the results. The sheer volume of bad analyses is enough to make you weep, and contributes to the widely held perception about "lies, damned lies, and statistics". And that completely ignores the intentional falsehoods propagated by people who are trying to support various advocacy viewpoints, and will happily mislead the public with biased samples, Simpson's paradox, invalid assumptions, etc.
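
Simpson's paradox is easy to state and hard to believe until you see the numbers. The sketch below uses illustrative counts (treatment labels, subgroup names and figures are all assumptions for the example): treatment A has the better success rate in each subgroup, yet B looks better once the subgroups are pooled, because A was given mostly to the harder cases.

```python
# (successes, patients) per subgroup and treatment; illustrative counts only.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total


def pooled_rate(treatment):
    """Success rate for one treatment with the subgroups lumped together."""
    successes = sum(g[treatment][0] for g in groups.values())
    total = sum(g[treatment][1] for g in groups.values())
    return successes / total


for name, g in groups.items():
    print(f"{name}: A={rate(*g['A']):.3f} B={rate(*g['B']):.3f}")
print(f"pooled: A={pooled_rate('A'):.3f} B={pooled_rate('B'):.3f}")
```

Anyone who wants to mislead can simply report whichever of the two views, pooled or per subgroup, supports their side.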
Re:Personal experience by martin-boundary · 2010-03-17 16:00 · Score: 3, Interesting

There's a certain level of historical baggage as well.
Even ten to fifteen years ago, students in Statistics courses had very little computer exposure, and that of course means any practical analyses would imply the use of approximations - hence the widespread use of chi squared tests and normal distributions for everything, whether appropriate or not.
If the statistician -> textbook -> student/scientist -> textbook -> scientist process is factored in, I have no doubt that it will take another generation or two before the old style of statistics is replaced sufficiently widely to be only a memory.
Re:Personal experience by WeirdJohn · 2010-03-17 16:22 · Score: 2, Informative

It's the approach of just pumping the numbers into SPSS or Statistica, and then calling on a battery of tests until you get a "significant" result, that leads to the kind of errors the article describes (and that a disturbing number of /. readers fall into).
Unless you're dealing with large samples, all z and t tests assume normality in the population, with insignificant skew or kurtosis. Yet by definition, if we have enough data to be sure we have a normal population, we have enough data that the central limit theorem makes the differences moot. Even more extreme, if we have a complete description of the population (a census) we have no need to use any inferential statistics.
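
A quick simulation sketch of the central limit theorem point, using only the standard library. The choice of an exponential population, the sample size of 50 and the number of repetitions are arbitrary assumptions made for the illustration; the skewness helper is a plain moment-based estimate.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    # Exponential(1) is strongly right-skewed; its mean and SD are both 1.
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


def skewness(values):
    """Moment-based skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


raw = [random.expovariate(1.0) for _ in range(5000)]
means = [sample_mean(50) for _ in range(2000)]

print(f"skew of raw draws:    {skewness(raw):.2f}")
print(f"skew of sample means: {skewness(means):.2f}")
print(f"SD of sample means:   {statistics.pstdev(means):.3f} "
      f"(theory: {1 / 50 ** 0.5:.3f})")
```

The individual draws are nowhere near normal, but the means of samples of 50 are already much closer to symmetric, with a spread near sigma/sqrt(n).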
Meanwhile students are told to test the data for normality, homoscedasticity and linearity, to the point where the repeated tests on a single data set make the chance of a Type II error better than even. But by saying "SPSS said so" and burying assumptions beneath a mountain of waffle and misunderstood jargon we can still get these "results" published.
No-one who can't perform a balanced block design ANOVA by hand, or explain what transforming data does to residuals under assumptions of a linear additive model, should be allowed near statistical software in my opinion. And the so-called statistics packages in popular spreadsheets should be banned, and any student relying on them should be failed.
Re:Personal experience by failedlogic · 2010-03-17 18:09 · Score: 2, Interesting

I think the term "Statistics" has become so general that people don't understand how complicated it can be. People think of Statistics as Bob saying to Alice - "Get me the stats on this week's sales." Alice just goes digging around and gets Bob the total sales in $, # of units sold .... etc. People don't understand or know of the concepts that are involved in polling, they just think they called 2,000 random people and that's it. That's statistics to the public and many college graduates.
I had to take a few stats courses for my BA. Learning stats is humbling - I know I really know nothing about it now after taking a few courses. The classes I took assumed you have no inkling of the basics of calculus or algebra (much past grade 9 level). I didn't know any calculus - I took some 1000 level Algebra but, after graduating a few years ago, I'm teaching myself Calc now and I'm realizing how much less I really understood about Stats at the time.
When people really don't understand the underlying mathematical principles they shouldn't use SPSS or Excel. Heck, if you ask people what 2+2 is, they know the answer. But you tell them to apply X or Y to such and such data set with SPSS, they probably won't investigate the results. Print it out with the report. Done! If you use a Stats program you should understand what your data means, what is happening to your data, what it means when X is applied to your data and what the end result means. I don't think a lot of people are humble enough to say they don't really understand. Little white lies!

Fair and Balanced: Fox quotes the Bible as saying by vandelais · 2010-03-17 14:57 · Score: 2, Funny

that there are only 3 kinds of scientists: those that are good at math and those that aren't.

--
Game: Player 'Donald J Trump' now has AI skill level 'experimental'.

Re:No surprise here by Homburg · 2010-03-17 15:00 · Score: 5, Funny

I think your example would be more persuasive if it involved algebra, though.

The problem is statisticians by BrokenHalo · 2010-03-17 15:01 · Score: 5, Insightful

In other news math may not lie but people still can...

Usually (in science at least) it's not even a matter of lying. Part of the problem is that the multi-headed monster that statistics has become has a tendency to lead people to over-use numerical "answers" vomited up by stats packages, without really understanding what they are for, or how to interpret them.

Statistics are very useful for predicting certain things, but all too often they are submitted as "proof" of a given condition, which is dangerous. Sometimes we need to throw away statistics and start applying common sense.

Re:The problem is statisticians by caerwyn · 2010-03-17 15:35 · Score: 4, Interesting

Actually, one of the most dangerous uses of statistics is exactly predicting with them inappropriately. Curve fitting is especially prone to this error- attempting to make any predictions outside of the central mass of the points used to *produce* the curve is completely bogus, and yet people do it all the time.
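
A minimal sketch of the extrapolation problem described above, not anyone's real analysis. It assumes a hypothetical "true" relationship y = x^2 (the `true_curve` function is made up for illustration), observes it only on [0, 1], fits a straight line by ordinary least squares, and then compares the error inside the fitted range with the error at x = 5.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def true_curve(x):
    # Hypothetical "real" relationship, chosen only for illustration.
    return x * x


# Observations cover only the interval [0, 1].
xs = [i / 10 for i in range(11)]
ys = [true_curve(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def predict(x):
    # Prediction from the fitted straight line.
    return slope * x + intercept


# Worst error inside the fitted range versus the error far outside it.
max_error_inside = max(abs(predict(x) - true_curve(x)) for x in xs)
error_outside = abs(predict(5.0) - true_curve(5.0))

if __name__ == "__main__":
    print(f"fitted line: y = {slope:.3f} * x + {intercept:.3f}")
    print(f"largest error for x in [0, 1]: {max_error_inside:.3f}")
    print(f"error at x = 5: {error_outside:.3f}")
```

Inside the data the line is never off by more than about 0.15, which looks like a respectable fit; at x = 5 it is off by roughly 20. Nothing in the fitted range warns you about that.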

--
The ringing of the division bell has begun... -PF
Re:The problem is statisticians by Martin+Blank · 2010-03-17 17:38 · Score: 3, Insightful

Many times the answer that "just can't be right" is; the problem comes when we "throw away the statistics" instead of figuring out why and how it gave the answer it did.
I've adopted in my life a truism I learned from my flight training: deal with things as they are, not how we would wish them to be.
In my work in network security, I often come across some oddities, which I present to management. They can present some uncomfortable episodes, and management sometimes wishes to just sweep them under the rug instead of addressing the problems. Now that we have a newly-upgraded IDS, we're seeing things that we never noticed before, and I suspect that we're going to be getting new guidelines on what is important.
I hope that's just cynicism leaking through the rum, but I've been there long enough to think it might be reality instead.

--
You can never go home again... but I guess you can shop there.
Re:The problem is statisticians by Capsaicin · 2010-03-17 18:26 · Score: 3, Funny

*Ahem* Cue carbon dating.
To be fair the problem with carbon dating is not merely curve fitting. A larger problem is that when God created the universe in Oct 4004BC (or thereabouts), He created Adam with a belly-button.

--
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
Re:The problem is statisticians by the_womble · 2010-03-17 18:38 · Score: 3, Funny

I feel somewhat vindicated for being no good at econometrics when I see where the people who were good at it have landed us.....
Re:The problem is statisticians by Aceticon · 2010-03-17 21:49 · Score: 2, Funny

Here's a good example (credits to Nassim Taleb and his "The Black Swan" book) on the risks of extrapolation (of which curve fitting is one method):
- Based on previous experience, a turkey will confidently predict that he will wake up every morning, be fed during the day and go to sleep in the evening. He can easily extrapolate this from the fact that it has happened every day of his life. At some point before Christmas this turkey is going to have a big surprise ...

Re:No surprise here by skine · 2010-03-17 15:02 · Score: 2, Informative

It's perfectly reasonable that someone use a calculator for sales tax (if an exact answer is desired).

Also, sales tax is multiplication - not algebra.

Re:No surprise here by coolsnowmen · 2010-03-17 15:05 · Score: 5, Insightful

You are a jerk.
You are insulting your sister because she is bad at mental math? It is a skill; one not required for extensive knowledge of the social sciences. Additionally, maybe sales tax is simple in your state, like 10%, but where I live it is 4.5%, which is not always easy to get exactly right in your head.

I had a roommate who was brilliant, funny, a singer and an artist, and yet he couldn't calculate a tip to save his life, but I certainly don't hold that against him.

Excellent by zoso1132 · 2010-03-17 15:08 · Score: 2, Insightful

One of the best articles I've seen on stats (and their misuse). I'm taking a data analysis course at the moment and I've spent at least a dozen hours simply computing confidence intervals, testing the null hypothesis, and determining significance. It really has changed how I view statistics because it keeps pounding in these very key but oft-ignored principles.

--
"Everything is linear if plotted log-log with a fat magic marker."

bad title by obliv!on · 2010-03-17 15:10 · Score: 5, Interesting

It is not a shortcoming of statistics that other people, like various scientists who aren't statisticians, don't know how to use or properly interpret statistics. It is a shortcoming of their knowledge.

It is not a shortcoming of the Copenhagen interpretation of quantum mechanics or the Chicago school of economics if I don't understand or know how to correctly interpret their results. It is my shortcoming and fault for not knowing enough to connect the dots.

I do statistical research; some of that is through interacting with researchers in the biosciences. When I go to talk to a researcher and ask them if they could use some statistical or mathematical or computational assistance with their research, it has almost always been a fruitful starting point for long conversations and getting into the research. Now sometimes it was simply a matter of looking at their F-test results or ANOVA scores and telling them what it meant (like with a regression model relating proportions of certain characteristics between taxa); more useful interactions for me often mean working on new algorithms or estimators, or fitting a model from their empirical data because there isn't a reliable standard model to work off of (like intergenic distance between genes in an operon). That kind of challenge makes less engaging work worth the hassle. Maybe I'm odd because I've worked hard to have a good background in both statistics and biology, but I shouldn't be.

Although here is an observation that perhaps supports some of the intent of the article from my own experience. I was speaking with a biology graduate student and it came up that they had a biostatistics course in the department. Of course as a statistician my mind goes towards survival function, failure rate, life tables, censored data, bioassay, epidemiology, microarrays, clinical trials, topics along those lines. It turned out their course focused on z tests, t tests, F tests, confidence intervals, point predictions, least squares regression, multiple regression, ANOVA, and things along these lines, just with simulated problems in a lab setting. That is not necessarily a bad thing, but much of the core math was underplayed or missing, like model assumptions and alternate formulations or things like dummy variables. The worst part was that even though they were doing well with the class they had no confidence in actually using the statistics and didn't understand how to interpret the meaning of something like a confidence interval; they knew how to calculate one, but it wasn't clear what it actually meant to them.

The corollary to the notion in the summary I'd rant and claim is that scientists overall have less-than-desirable skills in mathematics, statistics, and computation compared with those who studied those disciplines principally, and that's hurting science. However, many in those three disciplines really know little beyond basic results in any of the sciences, which hurts the applicability of these mathematical fields to the sciences and likely hurts our ability to develop certain types of discipline-specific results that can be generalized from work in application problems.

In either case, whether you're a typical scientist or a typical math/stat/comp person, becoming proficient enough in the other areas requires going an awfully long way out of the way compared to any counterpart who simply does not care and goes straight through as many before have. While in some areas of research on either side it is no problem to do as has been done and not further knowledge into those other areas, increasingly the results that have the highest levels of impact are coming from truly interdisciplinary research. In order to further encourage that for those who are interested in such fields (aside from making more clear what areas in any of the fields fringe to such interdisciplinary work) we need more incentive to study more than one field and/or better ways of enabling fruitful cooperation between the camps.

Re:PhD Candidate in Biostatistics Here by MindlessAutomata · 2010-03-17 15:16 · Score: 2, Funny

I don't have to be a statistician to know that the above post is 97% bullshit.

only in medicine by rook166 · 2010-03-17 15:20 · Score: 5, Interesting

In reading a couple of these types of articles recently I've noticed that the articles always talk about this being a problem across all journals, but only seem to mention a couple of different disciplines - medicine usually chief among them. Has anyone heard/read anything naming a hard science (e.g. chemistry or physics) as full of bad stats? My hunch is that this happens most often in medicine because you have the combination of controlling for a lot of variables as well as inadequate mathematics training.

Re:only in medicine by daver00 · 2010-03-17 19:31 · Score: 3, Funny

Physics (yes, Physics, THE hardest of hard sciences) is full of terrible mathematics, absolutely terrible, shockingly bad stuff. The good ones know it, some will say it doesn't matter because their butchery comes up with "accurate" results. If they can't even get their analysis right, what can we expect of the softer sciences? That said, physics is not so much concerned with statistics as it is with probability; nonetheless, they have some serious problems, for example they often simply decide highly non-convergent things should converge because the experiment says it should...
The greatest tragedy in modern science (in my eyes) is the loss of physics as a hard science, currently these guys are way off with the fairies and producing nothing of worth, string theorists are the worst. We'll see what the CERN guys manage to come up with, but right now the mathematicians have taken the ball and run with it. It has been said that physics has become too hard for the Physicists...
I am not trolling, I am quite serious about Physicists playing dodgy games with mathematics.
Re:only in medicine by Anonymous Coward · 2010-03-18 00:25 · Score: 2, Insightful

Give some examples. I mean, real, specific examples of mathematical practices or mathematical theories that are invalid and why they are such. Based on what you said, my suspicion is you are basing your claim on a smattering of slashdot comments and no understanding of any of the physics you are referring to. Several points give you away:
1) You speak of physics but your two vague examples are (I'm guessing because your description is almost unrecognizable) renormalization theory and string theory. You, and many others besides, forget that many of physics' sub-disciplines are not directly concerned with the former, and almost no one outside of high energy physics is involved in the latter. In other words, your examples leave out the bulk of physics being done.
2) Renormalization theory involves demonstrating that apparent divergences will exactly cancel. You do not just discard them. There was a saying that was popular in the 50's when people were developing the mathematical foundations for it: "Just because it is infinite, does not mean it is zero!". It was an extremely important milestone when Freeman Dyson showed in the early 50's that all such divergences - obeying certain, explicit criteria - occurring in quantum electrodynamics were renormalizable. In case you weren't paying attention, Dyson was a mathematician. In the following decades a lot of work was done to explore the mathematical properties of renormalizable theories, contrary to your assertion.
Now many theories are not - in the strict mathematical sense - renormalizable. In these cases, either cutting off divergences is physically meaningful (condensed matter physics, where matter is discrete at small length scales), or physicists actively and openly discuss and search for ways to formulate theories that possess no divergences or are strictly renormalizable. One may also ask, what if the correct theory is *not* renormalizable? In other words, what if our theory, while mathematically sound, is physically inaccurate (which is the opposite of the bizarre paradigm you suggest)? This is something actively discussed (and even widely assumed) in the search for new physics, but if true, the effects are too small to be currently detectable. In other words, we are back to discarding things because they are small, which is standard practice.
3) String theory - which again, is actually a very small part of physics - is actually almost entirely mathematical, which you concede. The mathematics is fine; the question is what, if anything, does it actually mean? Your criticism makes no sense here - are you suggesting by having math taking over the physics, the math becomes bad?
4) You put accurate in quotes, as if to suggest it was a dubious claim. This is disingenuous - in fields where a physicist is liable to claim this, it is demonstrably true; theories are able to predict many constants (such as the magnetic moment of the electron) to experimental precision. Many general, quantitative phenomena that are predicted as a result of the mathematics have been experimentally verified (BCS superconductivity, Bose-Einstein condensates, Aharonov-Bohm effect, quantum Hall effect, etc).
5) More generally physics has often been less than mathematically rigorous as new theories are developed and refined. Calculus - the basis for Newtonian physics - was not put on firm mathematical footing until the 19th century. And even then the intuitive form of calculus that Newton and Leibniz were thinking of was not formally developed until the 1960's (nonstandard analysis). Part of the maturation of physical theories is the introduction of mathematically rigorous foundations.
Seriously, make some specific claims rather than casting blanket aspersions. What physical theories today lack rigorous mathematical underpinning that physicists ignore?
Re:only in medicine by Vornzog · 2010-03-18 05:24 · Score: 2, Interesting

I've had my name included on several 'hard science' papers that had horrible statistical assumptions. I fought, and lost, because my professor had a big grant to maintain, and nobody else understood the underlying assumptions (we used an absolute scaling function, guaranteeing that our distribution was not normal, then tried to assume that it was normal). The second half of my thesis refutes the math in the last three papers I was on. Not one single person who read it understood it, which is sad because it wasn't actually all that impressive.
The only reason I'm not completely ashamed to admit that is that the bad stats don't actually change the conclusions in this case. They do invalidate the confidence intervals, though...
The training in stats required for 'hard science' is essentially nil. Most of the hard science folks I know who are not into high-end mathematical modeling just assume a normal distribution for their data, do a bit of analysis, and publish. I was in an analytical chemistry lab, where that sort of thing normally works, and to a very high precision. However, we were working with sloppy biological assays, where being within a factor of two is a miracle. Under those conditions, you need to know a lot more statistics.
Basically, the people who know enough math are working on well defined systems and theories, and the medical and biological communities don't know much math at all, but are working on very sloppy systems that need a lot of math to analyze correctly. It is therefore easier to spot the mistakes in those communities, but don't assume they aren't there in the 'hard science' papers.

--
-V-

Who can decide a priori? Nobody.
-Sartre

Re:Long winded troll by TapeCutter · 2010-03-17 15:20 · Score: 2, Informative

It's a troll because it implies scientists don't know about those things.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

Re:Its common knowledge by Opportunist · 2010-03-17 15:31 · Score: 3, Insightful

And 77.335% of all statistics claim more accuracy than their expected deviation warrants.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

What it actually said by williamhb · 2010-03-17 15:38 · Score: 5, Informative

Contrary to the parent poster's claim, the article does not focus on correlation vs causation. It focuses on people getting the correlation wrong in the first place. It lists several common mistakes scientists make when writing up research studies. (Not all scientists are very good at stats). These include:

If you run enough studies you are almost certain to find a difference that appears statistically significant at the p<0.05 level through chance alone. (It is incredibly unlikely that you will win the lottery; but across the whole pool of tickets someone wins it most weeks.) That makes studies that bulk analyze large amounts of data against many different factors, actively hunting for something that is significantly different, erroneous.
"p < 0.05" does not mean there is a 95% chance of your result being "true"; it just means that someone else rolling dice has a 5% chance of achieving the same result through chance alone.
Tests are often combined in ways that are mathematically inconsistent
Finding a statistical effect does not mean it is a strong effect
You cannot simply compare effect sizes between two studies because the results of their control groups may differ ("effect size analysis" is usually wrong)
Failing to find a significant effect does not mean there is no effect ("we found there was no significant effect on..." is misleading because "no statistical significance" is "no information" [your study didn't tell anybody anything] not "no effect" -- to prove "no effect" you need a different statistical test)

And lots of others. It then suggests Bayesian reasoning as an alternative to traditional statistical tests.
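
The first point in the list above (run enough tests and something will cross p < 0.05 by chance) can be sketched as follows. This is an illustration under stated assumptions, not the article's own calculation: it assumes every null hypothesis is true and that the tests are independent, and the function names `familywise_error` and `simulate_false_alarms` are made up for this example.

```python
import random


def familywise_error(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_alarms(num_tests, alpha=0.05, trials=20000, seed=0):
    """Fraction of simulated studies with at least one p-value below alpha.

    Under a true null hypothesis (and a continuous test statistic) a p-value
    is uniform on [0, 1], so each test is modelled as one uniform draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Columns: number of tests, analytic rate, simulated rate.
    for k in (1, 5, 20, 100):
        print(k, round(familywise_error(k), 3), round(simulate_false_alarms(k), 3))
```

With 20 independent tests the chance of at least one spurious "significant" result is already about 64%, which is why hunting across many factors without correction is a problem. Real tests on the same data set are usually correlated, so the independence assumption here is only a rough guide.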

Most post-PhD scientists are aware of the common mistakes, but being aware that we make mistakes doesn't necessarily stop us from making them. If you choose a random set of conference proceedings, it is almost certain you will find at least one paper (and I suspect usually a dozen or more) with statistical mistakes in it.

Re:What it actually said by RightwingNutjob · 2010-03-17 16:39 · Score: 4, Insightful

People who deal with raw physical measurements (radar engineers, astronomers, the guy who makes the airspeed sensor of the B2--er,um...) have had this problem figured out for a while.

The result, repeatedly proven mathematically and by experience, is that the magic number is always Signal-to-Noise-Ratio. You can't get good information from crappy, scant, data.

Humanities and social-"science" types, and unfortunately the med school set, are by and large composed of people with varying degrees of pathological fear of mathematics, computation, and computer programming. I'd be willing to bet that a largish portion of even the post-PhD scientists who 'know' how to make a proper calculation for a statistical test don't really understand the physical meaning of the numbers they're copying and pasting in and out of excel.

When your attention and skill set are focused on looking through a microscope, or cutting up lab rats, or synthesizing chemicals, you probably never have the experience of being up to your eyeballs in noise estimates and P_FA's that bludgeon in the fact that your data really sucks because it's too noisy, and never need to answer fundamental questions like 'what's the probability that the ruskies will fire off a missile and this radar won't see it'/[insert biologically relevant example here], which *requires* learning the right way to do statistics.
Re:What it actually said by tabdelgawad · 2010-03-17 17:30 · Score: 2, Insightful

Good summary, but I call bullshit on the article. Most of the problems you mention and the others in the article are common popular misinterpretations of statistical results, but that doesn't mean they're common mistakes made by researchers in the studies themselves. Any rookie peer-reviewer would spot them immediately if they ever make it into a manuscript.
This doesn't mean that there aren't a lot of bad statistics-based studies out there, especially in medicine. But the problems are usually much more subtle than the article implies. Standard statistical methods require many regularity and sampling assumptions to be valid, and a lot of times researchers take these assumptions for granted when even a little probing would show that they're violated. A lot of advances in recent econometrics have been in the development of robust methods (valid when standard assumptions are violated), and those advances unfortunately take a long time to filter down to the 'applied researcher' level. If you're an applied researcher, it's generally unlikely you'll use statistical advances you didn't learn as a grad student.
And frankly, I have no idea what the Frequentist/Bayesian debate has to do with any of this. To suggest that using Bayesian methods is some sort of solution for the problems listed in the article is ridiculous.

--
Imposing Libertarian views on everyone online since 1992.
Re:What it actually said by TapeCutter · 2010-03-17 18:39 · Score: 2, Interesting

"Contrary to the parent poster's claim, the article does not focus on correlation vs causation. It focuses on people getting the correlation wrong in the first place."

Fair point, I only skimmed TFA, but I still stand by my assertion that it's a troll of the "scientists don't understand statistics" genre; it even starts by claiming statistics is a "mutant form of math". Had they omitted that drivel and not referenced discredited papers, then maybe I would have read the whole thing.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

Re:Maths anxiety by reverseengineer · 2010-03-17 15:41 · Score: 3, Funny

Unfortunately, it is hard to break a viscous cycle. The high viscosity makes it easy to get stuck.

--
"FDA staff reviewers expressed concern about the number of patients who were left out of the study because they died."

Re:Long winded troll by Lars+T. · 2010-03-17 15:51 · Score: 2, Funny

Yes, it's rarely mentioned that causation implies correlation.

Interestingly, I have observed a correlation between people who cite that "correlation != causation" and those who ignore "causation implies correlation" in their arguments.

--

Lars T.

To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

Re:Statistical assumptions are often ignored by solanum · 2010-03-17 15:57 · Score: 3, Informative

and IAAB (biologist) and I can tell you that most scientists don't have access to statisticians or don't have the grant money to pay for them. I also don't have time to learn SAS and code my own tests, therefore I use stuff like SPSS or Genstat (both of which do allow you to code your own tests as well). Just because they are easy to use doesn't mean I do or do not understand the tests, the assumptions or their results. I would say my grasp of stats is above average for my peer group, below where I would like it to be and obviously limited.

One thing that is interesting to me is that throughout my education and career I have been warned off using multiple means comparisons and LSD in particular (I understand why and have avoided where I can and the latter always). Yet the only actual statisticians I have dealt with in recent years have recommended me to use LSD on means comparisons with 10s of means. I would be hard pressed to publish those results.

In summary, whilst statisticians like to blame easy to use stats programs for bad stats the reality is they are just a tool and if statisticians can't agree on the acceptable use of the simplest procedures I'm not sure what chance the rest of us have of getting it right.

--
Si hoc legere scis nimium eruditionis habes.

The problem is with statistics itself by Z8 · 2010-03-17 15:57 · Score: 3, Informative

I see a lot of posts bashing people for being idiots, and I'm sure that's often the case, but IMHO there are some big problems with statistics itself.

The most common school is the "classical" school, which is extremely counterintuitive. For instance, most people think that if a 95% confidence interval is 5 to 10, then the parameter has a 95% chance of being between 5 and 10. This would be true with Bayesian statistics, but exactly backwards for classical statistics. For classical statistics, it's that your 5 to 10 interval has a 95% chance of being around the parameter! This is a subtle difference that most statisticians don't even understand, and it screws up almost everyone. Furthermore the classical statement is much less useful than the intuitive statement that people think it is.
Relatedly, other schools which make more sense such as Bayesianism and likelihoodism aren't taught. Furthermore, nonparametric statistics are usually not taught to undergrads (unless they are statistics majors probably). In the real world, non-parametric statistics are often more useful because no parametric model is actually true (for instance, basic regression assumes that the Truth is in your model, and it almost never is).
Finally, a lot of statistics as it is normally taught depends on the central limit theorem. Any result that depends on the central limit theorem (or the law of large numbers) is often useless in real applications due to data poverty. The basic reason is that the average of i.i.d. random variables only converges to a normal distribution as 1/sqrt(n). Everyone knows this, and it's obvious that something that converges as 1/sqrt(n) is much much slower than the typical 1/n convergence, but people still rely on the central limit theorem.

Statistics is changing slowly (mostly because computers and R make non-classical statistics more practical) but the way it's taught still leads to problems.
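
The classical reading of a confidence interval described above (the interval is the random thing, the parameter is fixed) can be checked with a small simulation. This is only a sketch under assumptions picked for illustration: normally distributed data, a known standard deviation, and made-up values for `true_mean`, `sigma` and `n`.

```python
import math
import random


def coverage(true_mean=7.5, sigma=2.0, n=25, trials=5000, seed=1):
    """Fraction of 95% z-intervals that contain the fixed true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # known-sigma 95% interval
    covered = 0
    for _ in range(trials):
        # Each trial is a fresh experiment: new sample, new interval.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = sum(sample) / n
        if mean - half_width <= true_mean <= mean + half_width:
            covered += 1
    return covered / trials


if __name__ == "__main__":
    print(f"fraction of intervals covering the true mean: {coverage():.3f}")
```

The true mean never moves; what varies from experiment to experiment is where the interval lands. About 95% of the intervals produced this way contain the true mean, and that long-run property of the procedure is all the "95%" refers to in the classical framework.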

Re:Long winded troll by williamhb · 2010-03-17 16:03 · Score: 3, Insightful

Actually the subtler issue here has nothing to do with statistics, they are implying peer-review does not work.

"Peer review" is another of the things that has been over-sold to the public. A science research group spends six months and a hundred thousand dollars conducting a research study using highly specialised equipment. They submit a paper to an academic conference or a small journal. It gets put out to review by three people who each spend about four hours reading it and reviewing it, and who usually do not have access to the equipment or the original data that was used in the study. Do you really think we're likely to catch every mistake at review? We certainly can't check the stats (except for the most egregious errors) because we don't have the full data tables they analyzed.

Scientists actually accept that inevitably some incorrect results will be published. More often in the smaller conferences than in the most prestigious journals, but even the journals have to publish a retraction every now and then. We also accept that most studies are never repeated, and so the "objective repeatable experiment" is rarely really tested for being either objective or repeatable. However, science has long had the "many eyes" effect at work. There are hundreds of thousands of scientists reading papers and using them in our own experiments. If some theorised effect out there is wrong, usually we'll find out eventually.

Re:Long winded troll by crmarvin42 · 2010-03-17 16:36 · Score: 4, Informative

Peer review is not about catching mistakes, although it can on occasion. Peer review is about clear communication, such that the experiment can be repeated as identically as possible and that the readers can understand the authors' justification for their conclusions. At least that's what every journal article I've read on the topic indicated was the reason for the peer review process's creation. One of my advisors asked me about it on my written preliminary exam and I needed to do a lot of reading to be prepared for the oral exam. There were several different societies that claimed to have originated the idea, but no one claimed that the purpose was to catch mistakes, fabrications, or data manipulations.

--
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde

Re:Long winded troll by Capsaicin · 2010-03-17 17:11 · Score: 2, Funny

Interestingly, I have observed a correlation between people who cite that "correlation != causation" and those who ignore "causation implies correlation" in their arguments.

Ah yes, but can you suggest any causal relationship between those two observations?

--
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke

Looking for a good book on statistics by steveha · 2010-03-17 18:37 · Score: 3, Interesting

I'm interested in learning the essentials of statistics. What would be a good book to start me out?

I got The Manga Guide to Statistics and it did introduce me to the very basics. However, there are many places where it just gives you an equation, without deriving it or even explaining it. After reading this book, I now know how to calculate standard deviation, but I'm still a bit vague on how people actually use it. I would like to see some examples of how people use statistics in (for example) science experiments.

My ideal book would explain the basics, with examples, and show how the math works. Ideally it wouldn't be a thousand pages long, either, but that's a secondary consideration.

Recommendations, please?

P.S. Those of you who know about statistics: how good are the Wikipedia pages on statistics?

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

Re:Looking for a good book on statistics by Daniel+Dvorkin · 2010-03-17 19:53 · Score: 2, Informative

Devore's Probability and Statistics for Engineering and the Sciences is probably the best one-volume, undergrad-level intro to statistics out there. Get a copy (I think it's on the sixth or seventh edition now; you can pick up a fifth edition for cheap) and work your way through that, and you'll have a pretty good idea of where all those formulae come from and how they're used. Get a copy of R and check out the "Devore*" packages in the package list too. If you want to learn more after that, I recommend Kutner et al.'s Applied Linear Statistical Models for applications, and Casella and Berger's Statistical Inference for theory.
The Wikipedia stats pages are pretty good for most things, but many of them are written with the assumption of a lot of background knowledge. If you open up a page on a particular stats subject and you comprehend it, great; if not, be prepared to do a lot of digging outside of Wikipedia, because trying to figure out the subject from the links to other WP pages is an exercise in circularity.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

Significance is NOT probability by drewhk · 2010-03-17 19:01 · Score: 2, Insightful

.. or at least not the probability of the hypothesis. This is one of the errors that people make. Having 0.95 significance does NOT imply a 95% chance that the hypothesis is true! The significance is the probability of the test outcome assuming the hypothesis is true (in other words it is a likelihood value). You have to multiply it by a prior to obtain real probabilities.

Significance values will not even add up to 1 over the two hypotheses!

The root of the problem is that frequentists cannot use probabilities for statements -- only for events. In frequentist terms you have to have a sigma algebra over some Omega state space which is measurable. Bayesians on the other hand can talk about the probabilities of any statements using probability theory as an extension of formal logic. I really recommend reading the books of E. T. Jaynes and David MacKay.
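To make the "multiply it by a prior" step concrete, here is a minimal sketch of Bayes' rule applied to a significant test result. The function name and the numbers (80% power, 5% false-positive rate, three different priors) are illustrative assumptions, not figures from this thread.

```python
def posterior_h1(prior_h1, power, alpha):
    """P(H1 | significant result) via Bayes' rule.

    power = P(significant | H1 true)   (assumed value, for illustration)
    alpha = P(significant | H0 true)   (the significance threshold)
    """
    evidence = power * prior_h1 + alpha * (1.0 - prior_h1)
    return power * prior_h1 / evidence


# Same test, same "significant" outcome, three different priors.
for prior in (0.5, 0.1, 0.01):
    post = posterior_h1(prior, power=0.8, alpha=0.05)
    print(f"prior={prior:<5} posterior={post:.3f}")
```

The same significant result gives very different posterior probabilities depending on the prior, which is why a significance level by itself cannot be read as the probability of the hypothesis.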

Other false assumptions people make with statistics:
- Everything is normally distributed
- Everything has a variance
- Everything has an expected value
- Hypothesis testing is without bias (in fact it is equivalent to giving 50% prior probability to both hypotheses)
- Variance means average distance from mean
- Empirical variance does not have a variance
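As a small illustration of the "variance means average distance from mean" item in the list above, this sketch compares the mean absolute deviation with the variance and standard deviation on a made-up five-point dataset (the data values are an assumption chosen only to keep the arithmetic easy to follow).

```python
import statistics

data = [1, 2, 3, 4, 10]  # hypothetical sample, chosen for easy arithmetic

mean = statistics.mean(data)

# "Average distance from the mean" is the mean absolute deviation.
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

# Variance is the average SQUARED distance; its square root is the
# standard deviation, which still differs from the mean absolute deviation.
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(f"mean={mean}, mean abs dev={mean_abs_dev}, "
      f"variance={variance}, std dev={std_dev:.3f}")
```

Because the deviations are squared before averaging, the single outlying value dominates the variance far more than it dominates the average distance.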

The use and abuse of statistics. by AliasMarlowe · 2010-03-17 19:18 · Score: 3, Interesting

I'm actually at a scientific meeting and saw 7 presentations in which they "double dipped" on their statistics before we broke for lunch.

Double-dipping is bad enough, but the medical field is rife with multiple-dipping. Each dataset is plumbed to test dozens of hypotheses, without appropriately adjusting the acceptance criteria. Even with separate datasets, if you test 20 hypotheses and discover that each one is just valid at the 95% confidence level, then there is a very good chance that there are some false positives. In the medical alleged-sciences, however, all 20 would be blindly proclaimed as truth.
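A rough sketch of the arithmetic behind the 20-hypotheses example, assuming the tests are independent and every null hypothesis is actually true (both are simplifying assumptions). The Bonferroni correction shown is one common adjustment, swapped in here for illustration rather than taken from the post.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / m


m = 20
unadjusted = familywise_error_rate(0.05, m)
adjusted = familywise_error_rate(bonferroni_alpha(0.05, m), m)

print(f"unadjusted: {unadjusted:.3f}")  # roughly 0.64
print(f"Bonferroni: {adjusted:.3f}")    # just under 0.05
```

Under these assumptions, testing 20 true nulls at the 5% level gives about a 64% chance of at least one spurious "discovery", and one false positive is expected on average.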

And then there are the social nonsenses^W sciences... If practitioners of some discipline do not understand how to use quantitative methods, they should limit themselves to qualitative argument only. Unfortunately, in statistics as in other fields, those who are ignorant or incompetent are generally unaware of the extent of their ignorance and incompetence.

--
Those who can make you believe absurdities can make you commit atrocities. - Voltaire

Re:The use and abuse of statistics. by Peter+Mork · 2010-03-17 22:20 · Score: 4, Insightful

And then there are the social nonsenses^W sciences... If practitioners of some discipline do not understand how to use quantitative methods, they should limit themselves to qualitative argument only.
Has it ever been demonstrated that social scientists have a worse understanding of statistics than physical scientists? I ask because my observations are the opposite. The physical scientists run a t-test and declare the matter resolved (significant or not-significant). Given the complexities of social sciences, these scientists check the assumptions required to use a test (e.g., normality) and have a good understanding of the statistics involved. (The obligatory exception is statistical genetics: physical science with a solid statistical basis.)

Re:There are lies, damn lies... by u38cg · 2010-03-17 22:43 · Score: 2, Insightful

I'm sick of this bullshit. There is statistics, and there is lies. Statistical operations are mathematical procedures, which may or may not be appropriate. They are not, however, lies. They may be errors, deliberate or accidental. Lies, on the other hand, are what you introduce when the data does not fit the hypothesis you want to put forward. Blame the liar, not his smoke and mirrors.

--
[FUCK BETA]

Re:No surprise here by qc_dk · 2010-03-17 23:52 · Score: 2, Interesting

I have the same problem. In school they were considering putting me in remedial classes because I had trouble doing basic arithmetic with even single digit numbers (I still have trouble with anything above 6). I can and could do a reasonably accurate estimate, but not the real result (possibly has something to do with me also having a bad short term memory). As soon as we got to the abstract bit (i.e. real math) I had no trouble. I can do integration with coordinate system shifts (e.g. cartesian->polar) in my head, but I will have to check my constants with a calculator.

MY common conversation by kenp2002 · 2010-03-18 02:29 · Score: 4, Informative

The largest demographic in American prisons is black Americans. Real statistic, but is it true?

Given a particular sample that indicates blacks are 60% of the prison population this would appear to be true.

But what if I said: "The largest demographic in prison is minority, non-whites." Suddenly the % jumps from 60% (black) to 80% (minority). Which is more right? This is the problem with statistics. Context.

Now I can say readily that the largest demographic in prison is actually right-handed people. The % now jumps to 90%.

But wait! There is more! The largest demographic in prison is actually people who were below the poverty line prior to arrest, which jumps to 99% of the population. Again, all of the above are accurate based on a sample, but which is MORE correct? Linear Algebra is coming into play here quickly....

When that kind of issue comes into play, it is the classic "Correlation != Causation" confusion. The majority of people in prison are in there because of "Being black? Being a minority? being right handed? or being poor?" None of the above. The majority of them are in there because they were convicted of a crime and sentenced. That is the causation of their imprisonment, the rest is correlation which may have a direct causation on the conviction or sentencing, but no direct causation on being in prison. (e.g. You cannot be thrown into prison for being poor, black, minority, right handed)

Same with medical research, politics, economics, etc. The price of oil rising 10% and a subsequent 5% drop in shipping orders. Measuring the significance of regressors is important but oddly never reported most of the time. Many factors get masked or shadowed by higher level regressors (e.g. being a minority masks a variety of other social and economic factors. In addition it can distort statistical work by being too broad. Asians have a variety of different economic and social factors than North American blacks, or even African immigrants.)

Back to the original subject:

We can take 100 prisoners and 100 non-prisoners and figure out rather quickly if being black is statistically significant in the prison population. In the non-prison population, blacks would account for 25%-45% of the population (depending on location). We can see that 60% of prisoners are black. There is a 20+% deviation from the norm. We can test to see the significance of that. Same with minorities. Now we quickly find that being right-handed is insignificant because it doesn't deviate from the norm. We can test left-handed and right-handed populations and rule out the handedness of a convict being significant.
We can find that economic status is considerably MORE significant than minority or black as a status. We can determine that the reason minorities or blacks are disproportionately prevalent in prison is that blacks and minorities have higher rates of poverty. We can extract and determine the statistical weight of POVERTY in regards to imprisonment (since we find a high % of whites in prison who are poor compared to the normal population). Once we figure that out we can remove that and continue an investigation and figure out what weight minority and black has once we have removed POVERTY from the model (residual analysis).
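The first step described above (checking whether a group's share among 100 prisoners deviates from its share in the general population) can be sketched as an exact one-sided binomial test. The 45% baseline is the upper end of the range quoted above, and the 90% baseline for right-handedness is an assumption implied by "doesn't deviate from the norm"; the counts are the hypothetical sample figures from the post, not real data. This covers only the over-representation check, not the later step of removing POVERTY from the model.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    members of a group among n prisoners if the prison population simply
    mirrored the general population share p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = 100  # hypothetical sample of prisoners

# 60 of 100 observed vs. a 45% baseline (most conservative end of 25%-45%).
p_black = binom_upper_tail(60, n, 0.45)

# 90 of 100 observed vs. an assumed 90% baseline for right-handedness.
p_right = binom_upper_tail(90, n, 0.90)

print(f"black, 60 vs 45% baseline:        p = {p_black:.4f}")
print(f"right-handed, 90 vs 90% baseline: p = {p_right:.4f}")
```

With these made-up numbers the first p-value is well below 0.01 while the second is nowhere near significance, which matches the argument that handedness can be ruled out quickly.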

The problem in reporting is that without providing the whole, comprehensive analysis you can miss important things. For instance, to correct the injustice in sentencing, without reporting the weight POVERTY has in contrast to BLACK or MINORITY you may lose sight that you may have better success addressing POVERTY to normalize sentencing rather than MINORITY or BLACK (or not).

The same happens in medical research. Given a cocktail of drugs, without having the whole analysis you may end up providing more of Medicine A versus B but lose sight that A & B are limited by the dosage of Medicine C.

Statistics are not bullshit, rather merely observations with no intrinsic agenda or even implication of truth. Purely amoral, like a hand gun.. useful to both the good and evil.

Statistics don't lie, nor do they tell the truth. They simply show the relationship of the data as it stands. The Truth or Truthiness of it is subjective and vulnerable to context.

--
-=[ Who Is John Galt? ]=-

Re:What are you measuring? by RightwingNutjob · 2010-03-18 12:27 · Score: 2, Insightful

That's exactly the point. If obtaining a degree of certainty in one measurement takes a bookload of theory to do 'properly', and is 'hard', obtaining the same degree of certainty in a space with N channels should be 'hard'^N. The OP's point was that people assume that it should be just as easy, and don't go to the trouble of learning what it takes to do it right.

Re:Long winded troll by crmarvin42 · 2010-03-20 01:36 · Score: 2, Informative

That people are trying to use peer-review as a method to detect fraud does not make it a good method for doing so. I've mentioned this before on /., although not in this thread, but I have no way of telling if the numbers in a table were generated by the experiment described, some other experiment, a random number generator, or the PR department at the company whose product is being evaluated. As long as the numbers are internally consistent, I have to "trust" that what they describe happened. I can catch obvious errors, such as the SEM not supporting claims of statistical significance made by the authors. However, if during the review process, they claim that the SEM was a typo (numbers were actually SD and not SEM for example) and change it, I have no way of verifying that their explanation was true.
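To show why the SEM-versus-SD relabelling matters, here is a rough sketch of the kind of consistency check a reviewer can do from a results table alone. The means, the reported spread, and n = 16 are made-up numbers, and the z-style statistic is a simplification (a real check would use the test the authors actually ran).

```python
from math import sqrt


def z_for_difference(mean_a, mean_b, se_a, se_b):
    """Approximate z statistic for the difference between two group means,
    given each group's standard error of the mean."""
    return abs(mean_a - mean_b) / sqrt(se_a**2 + se_b**2)


# Hypothetical table entries: two treatment means with a reported spread of 1.0.
mean_a, mean_b = 10.0, 11.0
reported = 1.0
n = 16  # hypothetical animals per treatment

# Reading the reported value as SEM: the difference is far from significant.
z_if_sem = z_for_difference(mean_a, mean_b, reported, reported)

# Reading the same value as SD: SEM = SD / sqrt(n), and the picture changes.
sem_from_sd = reported / sqrt(n)
z_if_sd = z_for_difference(mean_a, mean_b, sem_from_sd, sem_from_sd)

print(f"z if the value is SEM: {z_if_sem:.2f}")
print(f"z if the value is SD:  {z_if_sd:.2f}")
```

In this sketch the same table entry gives a z of about 0.7 under one label and about 2.8 under the other, so swapping the label is enough to turn an unsupported claim into a supported one, and the reviewer cannot tell which label is true.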

Also, in your quote you highlighted 2 different lines. The first has to do with the soundness of the conclusions. This is most definitely a role of peer review, but not related to accuracy. It doesn't mean that they verify that your conclusions are correct. Conclusions are not objective. The data gives you objective facts from which to draw subjective conclusions. This line indicates that your discussion will be evaluated for how well the data (yours and previous literature) supports your conclusions. If you extrapolate, or ignore important results then your paper will be rejected.

The second bolded section just indicates that if serious errors are found (using insufficiently large sample size, extrapolating results, etc.) then the paper will be rejected. That's totally understandable to reject, but obviously serious errors of this sort are uncommon. Most errors are much harder to detect, and are not picked up by the peer review process in my experience.

--
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
