Science and the Shortcomings of Statistics
Kilrah_il writes "The linked article provides a short summary of the problems scientists have with statistics. As an intern, I see it many times: Doctors do lots of research but don't have a clue when it comes to statistics — and in the social science area, it's even worse. From the article: 'Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.'"
In other news math may not lie but people still can, all the honesty and good statistics in the world doesn't help end-user stupidity, and there are statistically two popes per square kilometer in the Vatican.
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
That 77.28% of all statistics are made up.
There are two kinds of fool. One says, This is old, and therefore good. And one says, This is new, and therefore better.
It's not just statistics that people have a problem with...
-- Braden's law of data: All data spends some of its lifetime in an excel spreadsheet.
The entire article can be summed up by the tiresome cliche "correlation != causation". To make matters worse they quote an economic historian who does not understand that science is not in the business of proof: “That test itself is neither necessary nor sufficient for proving a scientific result,” asserts Stephen Ziliak, an economic historian at Roosevelt University in Chicago.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
She was like a little ball of sunshine.
As for statistics, does this really surprise anyone in a time when net polls are being reported as hard news?
My doctor was explaining to me that my blood sugar readings should not have a standard deviation of more than 1/3rd of the average blood sugar reading. Just to test if he knew what it meant, I asked him what a standard deviation was. Oh the fun when he tried to bullshit his way out of that one! He eventually told me that when I plot my data in Excel I can ask it to give me statistics on the column and it would mention what the standard deviation value was. But when I pressed on and asked him what a standard deviation is, he shooed me off and told me to go look it up. Never did he confess that he had no clue.
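For anyone else wondering what that rule actually computes: the standard deviation is the square root of the average squared distance of the readings from their mean, and dividing it by the mean gives what is usually called the coefficient of variation. Here is a minimal Python sketch, assuming the sample standard deviation is what was meant (the doctor never said) and using made-up numbers. The function names are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(readings):
    """Sample standard deviation divided by the mean of the readings."""
    return stdev(readings) / mean(readings)

def within_limit(readings, limit=1 / 3):
    """True if the standard deviation is at most `limit` times the mean."""
    return coefficient_of_variation(readings) <= limit

# Made-up readings for illustration only, not real patient data.
steady = [5.0, 6.0, 7.0]
erratic = [2.0, 6.0, 10.0]

print(within_limit(steady))   # small spread relative to the mean
print(within_limit(erratic))  # large spread relative to the mean
```

Excel's column statistics report the same standard deviation; the sketch only makes the "1/3rd of the average" comparison explicit.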
I've found that statistics knowledge amongst maths-anxious non-math-majors only seems to deepen the fear they have of mathematics in general. One was talking about a basic regression model. He had no appreciation for what anything was (i.e. it was all "plug in the numbers" for him), and when I tried to explain, he'd consider doing mathematics to be some kind of labour to be avoided!
Luckily this was bachelor's level.
Maths anxiety is a vicious cycle which needs to be broken at an early age at home.
A lot of the "science" done today isn't actually much more than agreeing that their idea is right and "well supported." It's the mathematics, the statistics and models that they use, that are most often the first and most obvious signal flares. An awful lot of them should really just give their degrees back to their university and go off and do something useful, like play in the traffic, or maybe catch a flight to Saudi and run up and down the main streets yelling insults to Islam, or even tootle down to Chelsea's ground and start singing Inter Milan chants. The claims of "science" and "it's a scientific fact" in the new millennium all too often tend to be completely against reality.
Actually, it's a tough situation. No real-life experimental data can 100% fit the assumptions of commonly used statistical models. Real-life data is messy, so there is always some degree of simplification. In addition, resorting to whiz-bang fancy methods that "fit" the real data may produce results that are not easily interpretable, and ease of interpretation is what medical scientists want. There are other issues as well, such as computing time, derivability of the equations, etc.
In addition, many many medical scientists use statistics as a tool to filter things (e.g. candidate genes, target enzymes, treatments, etc). In this case, 100% accuracy is not really important. Once the scientists narrow down the genes, they can test the validity directly in either test animals or real people.
--
Error 500: Internal sig error
These days, most people cannot deal with basic algebra. Case in point: my sister, who has a master's degree in the social sciences, reaches for a calculator to calculate the sales tax on purchases (and she does it because she cannot manage without it).
Why on earth would we expect Joe Average to be able to comprehend the meaning of statistics?
linquendum tondere
Our company six sigma training included two weeks of collecting and analyzing data with a stats package. I got enough experience to learn how to use the program, and I can still do a few things that come up regularly. Probably the best thing to come out of six sigma (for me at least).
As a doctor myself, I feel I should add my $0.02...
Throughout med school we had the odd scattered lecture on statistics, and later when reading papers I used to skim over most of the maths just to look for the P value at the end (one representation of how statistically significant a result is).
However, I then took a formal stats course and was amazed at how little I understood - Monte Carlo techniques, Markov models, and even something as trivial yet important as the difference between a parametric versus a non-parametric test.
And then it struck me - most of the research I had read had applied parametric statistical tests to their data - that is, the researchers made an assumption that the underlying distribution of results would fall on a normal curve. Yet this simple assumption may be all it takes to skew the results when they should have chosen a non-parametric test instead.
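To make the parametric versus non-parametric distinction concrete, here is a minimal Python sketch of one non-parametric approach, an exact permutation test for a difference in means. It is only an illustration, not necessarily the test any particular paper should have used. The function name and the toy data are hypothetical, and the method is only practical for small samples because it enumerates every possible regrouping.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    No normal curve is assumed: the p-value is the fraction of all
    ways to relabel the pooled data whose difference in means is at
    least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count

# Toy data for illustration only.
control = [1, 2, 3, 4, 5]
treated = [6, 7, 8, 9, 10]
print(exact_permutation_p_value(control, treated))
```

With these toy groups only 2 of the 252 possible relabelings are as extreme as the observed split, so the p-value is 2/252. The point is that nothing in the calculation relies on the data being normally distributed.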
So yes, stats are vitally important, badly taught, and focus too much on the maths rather than the concepts. Remember that we're doctors, not mathematicians - the last set of sums I did were in high school. If I need to analyse data, I'll probably plug it into SPSS - although now with my eyes open.
-Nano.
Funny Stat correlation: http://www.seanbonner.com/blog/archives/piratesarecool.jpg
Troll is not a replacement for I disagree.
When I did my BA in psychology Statistics was the core of the degree. It was the one subject that you could not escape and had to take for the full year every year of the degree. I heard later that the Psychology department at that Uni was sometimes disparagingly described as teaching Rats and Stats psychology.
Statistical methods are typically developed for fairly specific mathematical models. A practitioner may err greatly by using a statistical method outside of its intended purview. For example, many statistical tests assume that different groups of observations are independent or correlated in a specific way. If this isn't true then the resulting inferences can be very inaccurate.
Unfortunately the spread of "easy to use" statistical software is making this problem worse. Many scientists just enter their data and select an analysis from a drop-down menu, thinking that just because their data is in the right format the results will be accurate. It would be better if people had to think about what analysis to choose rather than just treating the choice of a test like the choice of a visual effect in Photoshop.
IAAS (statistician), for what it's worth...
that there are only 3 kinds of scientists: those that are good at math and those that aren't.
Game: Player 'Donald J Trump' now has AI skill level 'experimental'.
In other news math may not lie but people still can...
Usually (in science at least) it's not even a matter of lying. Part of the problem is that the multi-headed monster that statistics has become has a tendency to lead people to over-use numerical "answers" vomited up by stats packages, without really understanding what they are for, or how to interpret them.
Statistics are very useful for predicting certain things, but all too often they are submitted as "proof" of a given condition, which is dangerous. Sometimes we need to throw away statistics and start applying common sense.
Does that mean that we should send people who know what they're doing to sort through results and draw more meaningful conclusions? Or just rerun the tests?
This seems obvious, so please don't waste mod points here, people who know what they're actually talking about will probably chime in.
-
They're all buncha crap, and I say this with 95% confidence interval, or sum such stat shit that I wish I can remember.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
> countless conclusions in the scientific literature are erroneous
Number of Publications: Finite
Number of Conclusions: Finite
Time taken to count erroneous conclusions: Finite
Countless Conclusions? I don't think so!
A large but unspecified number of conclusions in the scientific literature are erroneous: Not so compelling
One of the best articles I've seen on stats (and their misuse). I'm taking a data analysis course at the moment and I've spent at least a dozen hours simply computing confidence intervals, testing the null hypothesis, and determining significance. It really has changed how I view statistics because it keeps pounding in these very key but oft-ignored principles.
"Everything is linear if plotted log-log with a fat magic marker."
It is not a shortcoming of statistics that other people, like various scientists who aren't statisticians, don't know how to use or properly interpret statistics. It is a shortcoming of their knowledge.
It is not a shortcoming of the Copenhagen interpretation of quantum mechanics or the Chicago school of economics if I don't understand or know how to correctly interpret their results. It is my shortcoming and fault for not knowing enough to connect the dots.
I do statistical research, some of it through interacting with researchers in the biosciences. When I go to talk to a researcher and ask whether they could use some statistical, mathematical, or computational assistance with their research, it has almost always been a fruitful starting point for long conversations and for getting into the research. Sometimes it was simply a matter of looking at their F-test results or ANOVA scores and telling them what they meant (as with a regression model relating proportions of certain characteristics between taxa). The more useful interactions for me often mean working on new algorithms or estimators, or fitting a model from their empirical data because there isn't a reliable standard model to work from (such as intergenic distance between genes in an operon). That kind of challenge makes the less engaging work worth the hassle. Maybe I'm odd because I've worked hard to have a good background in both statistics and biology, but I shouldn't be.
Here is an observation from my own experience that perhaps supports some of the intent of the article. I was speaking with a biology graduate student and it came up that they had a biostatistics course in the department. Of course, as a statistician my mind goes towards survival functions, failure rates, life tables, censored data, bioassay, epidemiology, microarrays, clinical trials, topics along those lines. It turned out their course focused on z tests, t tests, F tests, confidence intervals, point predictions, least squares regression, multiple regression, ANOVA, and things along these lines, just with simulated problems in a lab setting. That is not necessarily a bad thing, but much of the core math was underplayed or missing, such as model assumptions, alternate formulations, or things like dummy variables. The worst part was that even though they were doing well in the class, they had no confidence in actually using the statistics and didn't understand how to interpret something like a confidence interval: they knew how to calculate one, but it wasn't clear to them what it actually meant.
The corollary to the notion in the summary, which I'd rant about and claim, is that scientists overall have weaker skills in mathematics, statistics, and computation than those who studied those disciplines principally, and that's hurting science. However, many in those three disciplines know little beyond basic results in any of the sciences, which hurts the applicability of these mathematical fields to the sciences and likely hurts our ability to develop discipline-specific results that can be generalized from work on applied problems.
In either case, whether you're a typical scientist or a typical math/stat/comp person, becoming proficient enough in the other areas requires going an awfully long way out of your way compared to any counterpart who simply does not care and goes straight through, as many have before. In some areas of research on either side it is no problem to do as has always been done and not extend one's knowledge into those other areas. Increasingly, though, the results with the highest impact are coming from truly interdisciplinary research. To encourage that for those who are interested in such fields (aside from making clearer which areas of each field border on such interdisciplinary work), we need more incentive to study more than one field and/or better ways of enabling fruitful cooperation between the camps.
E.g.: Study shows a cancer group of size 3000 is cured by drug A 99% of the time.
1% it fails.
30 patients are dead. No correlation it seems at the time.
*2970* patients are saved.
20 years later, it's proven a dormant undetected/sequenced gene is responsible for the 1% failure of the drug, making it ineffective.
Statistics allowed the drug to be approved at the time that saved millions of lives.
I hate stats as much as 70% of the average Joe :), but anyone with an education knows its importance. (Esp. those dudes that are breathing right now because that drug saved their lives)
So the article in short: don't lie about your stats (or in general, don't lie!) and you can benefit humanity.
The clearest discussion of the logic of probability reasoning I know of is E.T. Jaynes' Probability Theory: The Logic of Science. (Cambridge University Press). Many of Jaynes' excellent papers on statistics are downloadable from http://bayes.wustl.edu/etj/etj.html.
I don't have to be a statistician to know that the above post is 97% bullshit.
maybe we'd get some honest science if it wasn't a bidding war.
+/- 50%*.
*confidence interval=100%
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
In reading a couple of these types of articles recently I've noticed that the articles always talk about this being a problem across all journals, but only seem to mention a couple of different disciplines - medicine usually chief among them. Has anyone heard/read anything naming a hard science (e.g. chemistry or physics) as full of bad stats? My hunch is that this happens most often in medicine because you have the combination of controlling for a lot of variables as well as inadequate mathematics training.
You think I'm full of it? Wait till you hear professors at seminars, making up whatever theories they like. I've witnessed professors from household-name schools acting like this.
From TFA:
One has to wonder, though: how much of that is due to misuse of statistics and how much is because it's paid research expected to get certain results in favour of those paying for the research?
Ok, so the referenced fields that have problems with stats are both not Sciences. Medicine has no theories that govern the human body. All they do is memorize a bunch of crap and then poke some squishy bits and memorize how it looks and feels when healthy/normal vs. unhealthy/abnormal. It's really the Engineers, Physicists, Chemists and to a lesser extent (though they are gaining market-share) Biologists, that make the true breakthroughs in Medicine.
And the social "sciences" are just plain an embarrassment when compared to real Science. Seriously...
People in the real Sciences would have been forced to take enough Mathematics and/or Statistics to be able to properly interpret Statistics. And just as importantly, be able to do proper experiment design (Medicine, I'm looking at you). Then there's the whole not being able to tell the difference between causation and correlation. I could go on.
87.24% of statistics are made up.
[Insert pithy quote here]
I had taken a stats class in undergrad... did not really pay attention as I thought it had no use. While getting my masters I was obligated to take an advanced statistics class. Going in, for the life of me I thought it would be a waste - it was the best class I ever took. I was able to use it in my job almost every week if not more (most of the other classes were theoretical at best and had no real world application). Ten years later, I still rely on things I learned in that class. Statistics should be mandatory for all in college regardless of major because it can be used for so many things.
And lots of others. It then suggests Bayesian reasoning as an alternative to traditional statistical tests.
Most post-PhD scientists are aware of the common mistakes, but being aware that we make mistakes doesn't necessarily stop us from making them. If you chose a random set of conference proceedings, it is almost certain you would find at least one paper (and I suspect usually a dozen or more) with statistical mistakes in it.
Warning Signs in Experimental Design and Interpretation
http://norvig.com/experiment-design.html
He does an excellent job of describing and illustrating common research mistakes, statistical and otherwise.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
Well, you ARE in Biostatistics - try doing a real science and getting away with that shit.
what kind of atheist are you now?
keep posting comments like this on slashdot and see if the "gods" award you your PhD...
What is missing in this discussion is systematic error, which is often very large and often dwarfs the analyzed random error or even the result itself. Systematic error is frequently a basic problem in biological research and in emerging technologies with crude tools and poorly understood cofactors. The human factor can hugely inflate systematic errors where legal, marketing or politics are involved. The systematic error may not be uncovered for years or decades, if ever.
One can design "tests" that are beautifully reproducible and precise, but absolutely, and deliberately, absurdly wrong. And get away with it, nay, be rewarded handsomely as a salable skill. It happens behind the scenes. I have direct experience in science and engineering where politics have butted in, but I see this as more common in medicine, pharmaceuticals and the medical journals. Multiple, blatant design and interpretation errors in any single article that are extremely hard to assign to mere stupidity and/or ignorance, that involve authors with clear conflicts of interest to victimize cheap (defenseless) generic drugs and supplements, and to promote their product.
How blatant does it get? I have industrial experience where a big name, big $ university consultant was given free rein to do a "political assignment". On a literature comparison of two materials' figure of merit, even after fair warning, he missed reality by 9+ ORDERS of magnitude, over a billion fold, by avoiding data in equal test environments. The results of internally published, correct tests were later deliberately ignored. This did eventually lead to his backers' catastrophic failure and his dismissal. Millions wasted. This is one of dozens of such situations I've seen in intercorporate wars with NoAm and European companies (no names here). The pharmaceutical and medicine situation appears blatantly worse in terms of number of fundamental test errors in a given high profile paper, resultant damage and duration. But big profits are made!
Statistics is changing slowly (mostly because computers and R make non-classical statistics more practical) but the way it's taught still leads to problems.
... and those that may or may not be good at math. :P
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
http://www.americanscientist.org/science/pub/everything-is-dangerous-a-controversy
These posts express my own personal views, not those of my employer
So call them out. If you don't, you're just a part of the problem you describe.
Research shows that 78% of all people who use the term "research shows" are just making s**t up.
Non-statisticians are always going to be doing statistics. Perhaps from a purist point of view they shouldn't, but they will. They might be health researchers, business analysts etc, and their job requires the use of statistics. Unfortunately, many statistical packages assume people can be trusted to choose the right test and interpret it correctly. But that isn't enough, and even some of the more helpful wizards and documentation are inadequate. The open source SOFA (Statistics Open For All - https://sourceforge.net/projects/sofastatistics/) project is an attempt to provide the required guidance and tools and is looking for people to join the community.
See, you stretch too far.
I could believe in conflicting medical studies. Biological systems are tough things to confirm (as you should know).
But you are going to spout that isotopes are misleading and untestable?! For a quick proof to the contrary, ever heard of a nuclear bomb? Yup, uses a purified isotope of uranium. Tritium (as seen in Spiderman)? Also an isotope. Half life is not just a game, but a measurable entity.
As with most 'damning evolution evidence' you have to give more evidence than some hand waving about bones in the wrong place.
Yes, as critical intellectuals, we are able to look at ourselves in this critical manner. However, really successful people, e.g. lawyers, politicians, psychologists, & salespeople never have this drawback. I remember Ronald Reagan talking about various issues and being absolutely wrong. However, he said it with such conviction and determination that I had to go back and check the facts. But apparently, he never did.
Another time, I remember reading about using DNA to cross-check previous serious crime convictions. Judges and politicians refuse to open closed cases, because doing so would suggest that the justice system might be quite faulty. Rather than worrying about incarcerating innocent people, the legal profession was more worried about protecting its own future revenue stream.
Now, salespeople, no matter how professional and honest they might seem, are taught to never let a sucker get an even break. Doctors too, are often taught that you should never allow the impression that you might be wrong to be formed in people's minds.
Last year, NOVA had an episode about the practice of performing lobotomies on mentally ill people. One part of the story focused on the treatment of one of the Kennedy girls during the 1960s. The girl had definite problems. However, the real tragedy of the story was how cream-of-the-cream doctors from Harvard and Johns Hopkins turned a girl with the IQ of a little girl into a vegetable. Although there were no scientific cases of a lobotomy curing anyone with her problems, the doctors went ahead and performed the procedure anyway. The biggest irony is that if these were the best doctors money can buy, I shudder to think what would have happened to people in mental institutions for the indigent and politically unconnected, run by doctors graduating from state universities and military institutes.
I have come to the same conclusion, "the best and the brightest do not go into medical school".
Interestingly, at the medical school I attended, during the graduation ceremony the PhDs are called up prior to the MDs. This of course implies more deference to the PhD degree. I wonder if this is standard everywhere...
... and I noticed it is dated March 27. OK I guess it is when the magazine comes out but still it was a little ironic in this instance.
and statistics.
Seriously, the problem with statistics is that they can be manipulated to mean whatever the presenter wants. Taken out of context, which is how a lot of statistics are presented, enhances the problem. I wouldn't trust any statistic unless I can examine the data behind it.
Statistics are not inherently bad, but I think they are over-used in many areas and often present a purposefully distorted view of something. Statistics do not address causality.
Better yet, last night on "Mad Money with Jim Cramer", someone pointed out that Goldman Sachs research recommended selling all assets of HOG, yet looking at insider activity of their holdings, GS increased its holdings of HOG from .5M to 4M. Cramer attributed this to different divisions of GS advocating different positions. However, I think most viewers thought that GS is trying to get everyone to sell their shares of HOG so that GS can get them on the cheap.
I was specializing in methodology during my doctorate work and so had to not only have a good grasp on stats as performed but also be able to at least estimate how well the analyses I was developing worked. We had a top notch stats professor who'd started in psychology and so ended up not only teaching all graduate courses for our department but also serving as top level consultant for any and all of our projects. Since some of my work was in nonlinear phenomena and therefore stats, I spent many an hour trying to absorb everything he could offer.
When I'd gotten on top of the material, some of what I saw going on made me disturbed, angry and/or disgusted.
In EEG research it was common to go through an analysis system wherein one first does a test on all electrodes together to determine if there's a difference between conditions. Fine. But then to localize, one first divided the electrodes in half and tested left vs. right. Then one tested both left and right according to front and rear. And so on, until individual electrodes are compared. I was told this reduced false positives and retained power. I was told to do it in my dissertation. I was told who started this process. I wasn't told it was bullshit; I figured that out on my own. I looked up the reference. There was no mention of this process in the article. As is common when I tracked down such rituals, the article said to do what you could justify doing but to know what you could and could not justify due to your own ability. I also found an article that said such processes did not retain power nor reduce errors.
The stats prof pointed out that each collection of electrodes in each test was arbitrary. There was no reason that every possible combination should not be included in their ritual. A "real" result from the process should require that. I pointed out that our software localized electrical sources in the brain down to 1mm voxels (to work with fMRI data) making surface electrode analysis extraneous. I took these points and the articles back to the department and was told finally to "do what I had to" for my diss. I ended up using a nonlinear running t-test to analyze time series of signals in 2 msec windows and produced a 'movie' of dopamine effects on the frontal lobe across the first 20 msec post stimulus. Nobody on my committee could understand the analysis, but they all loved the movie. I didn't tell them I'd adapted the analysis technique used in fMRI because some of them had done fMRI research and thought they knew what they were doing. Had I had to explain my workings I'd have had to tell them they didn't understand what they were doing, and at that point I wanted to get done and get to my first job offer. NIH. Invited and non-competitive. They understood my work. Besides, by this time I'd already studied at Santa Fe Institute and had learned the difference between learning from people who knew more than I likely ever would and jumping through hoops for people who I'd already passed in ability.
I also saw colleagues doing fMRI work who had no clue they were pushing statistical testing so hard that due to the necessary correction factor they were trying to find individual data points with p values with up to 22 zeroes between dot and data, a certainty they could never realistically achieve, and a cut off level they'd never even consider trying to look at in any study where they knew at least some of what was going on. I've seen entire poster sessions at conferences on brain mapping where maybe 2 out of 200 could accurately and factually explain how their analysis worked (typically they worked with a biophysicist who could, but none of which understood the phenomenon under test well enough to describe it, meaning together they could produce results but not knowledge as they couldn't pass the latter back and forth between them).
And I've seen researchers who did understand fMRI and SPM (statistical parametric mapping, the analysis technique used for fMRI). And they refused to use the technique for the reasons given. My boss at
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
One statistic I remember vividly is that in informal situations, 43.85% of all statistics are made up on the spot!
I'm interested in learning the essentials of statistics. What would be a good book to start me out?
I got The Manga Guide to Statistics and it did introduce me to the very basics. However, there are many places where it just gives you an equation, without deriving it or even explaining it. After reading this book, I now know how to calculate standard deviation, but I'm still a bit vague on how people actually use it. I would like to see some examples of how people use statistics in (for example) science experiments.
My ideal book would explain the basics, with examples, and show how the math works. Ideally it wouldn't be a thousand pages long, either, but that's a secondary consideration.
Recommendations, please?
P.S. Those of you who know about statistics: how good are the Wikipedia pages on statistics?
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
.. or at least not the probability of the hypothesis. This is one of the errors that people make. Having 0.95 significance does NOT imply a 95% chance of the hypothesis being true! The significance is the probability of the test outcome assuming the hypothesis is true (in other words it is a likelihood value). You have to multiply it by a prior to obtain real probabilities.
Significance values will not even add up to 1 over the two hypotheses!
The root of the problem is that frequentists cannot use probabilities for statements, only for events. In frequentist terms you have to have a sigma algebra over some Omega state space which is measurable. Bayesians on the other hand can talk about the probabilities of any statements using probability theory as an extension of formal logic. I really recommend reading the books of E. T. Jaynes and David MacKay.
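A small worked example of the "multiply by a prior" step may help. This is only a sketch of Bayes' rule for two competing hypotheses. The function name is hypothetical, and the numbers (a 10% prior, 80% power, a 5% false-positive rate) are assumptions picked for illustration, not figures from the article.

```python
def posterior_probability(prior, p_data_if_true, p_data_if_false):
    """Bayes' rule for one hypothesis against its complement.

    prior            P(hypothesis) before seeing the data
    p_data_if_true   P(data | hypothesis is true)
    p_data_if_false  P(data | hypothesis is false)
    """
    numerator = prior * p_data_if_true
    evidence = numerator + (1.0 - prior) * p_data_if_false
    return numerator / evidence

# Assumed numbers: the effect is real in 10% of the experiments one
# runs, the test detects a real effect 80% of the time, and it gives
# a false alarm 5% of the time.
print(posterior_probability(prior=0.10,
                            p_data_if_true=0.80,
                            p_data_if_false=0.05))  # roughly 0.64
```

So under these assumptions a "significant" result leaves about a 64% chance that the effect is real, not 95%. Change the prior and the answer moves with it, which is exactly why the significance level alone cannot be read as the probability of the hypothesis.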
Other false assumptions people make with statistics:
- Everything is normally distributed
- Everything has a variance
- Everything has an expected value
- Hypothesis testing is without bias (in fact it is equivalent to giving 50% prior probability to both hypotheses)
- Variance means average distance from mean
- Empirical variance does not have a variance
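The "variance means average distance from mean" item in the list above is easy to check numerically. A minimal Python sketch with made-up data follows; the helper name is hypothetical, and population (not sample) formulas are assumed so the arithmetic stays simple.

```python
from statistics import fmean, pstdev, pvariance

def mean_absolute_deviation(data):
    """Average absolute distance of each value from the mean."""
    centre = fmean(data)
    return fmean([abs(x - centre) for x in data])

# Made-up data with one large value, for illustration only.
data = [0, 0, 0, 0, 10]

print(pvariance(data))                # average SQUARED distance: 16
print(pstdev(data))                   # its square root: 4.0
print(mean_absolute_deviation(data))  # average distance: 3.2
```

Three different numbers: the variance is the mean squared distance, the standard deviation is its square root, and neither equals the plain average distance from the mean.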
All these studies in Africa show that circumcision prevents AIDS. However, if you look, Scandinavian countries have the lowest rates of AIDS and virtually no circumcision.
The US has a high circ rate and a high AIDS rate.
The reason Africa has those studies showing circumcision reduces AIDS is because after being cut you are laid up and can't boink!!
I'm actually at a scientific meeting and saw 7 presentations in which they "double dipped" on their statistics before we broke for lunch.
Double-dipping is bad enough, but the medical field is rife with multiple-dipping. Each dataset is plumbed to test dozens of hypotheses, without appropriately adjusting the acceptance criteria. Even with separate datasets, if you test 20 hypotheses and discover that each one is just valid at the 95% confidence level, then there is a very good chance that there are some false positives. In the medical alleged-sciences, however, all 20 would be blindly proclaimed as truth.
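The arithmetic behind the 20-hypotheses remark can be sketched in a few lines of Python. This assumes the tests are independent and that every null hypothesis is actually true, which is a simplification; the Bonferroni correction shown is just one common (and conservative) adjustment, not the only option. Function names are my own.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests

n = 20
print(f"unadjusted chance of a false positive: {familywise_error(n):.3f}")
print(f"Bonferroni per-test threshold:         {bonferroni_alpha(n):.4f}")
print(f"chance after adjustment:               "
      f"{familywise_error(n, bonferroni_alpha(n)):.3f}")
```

Under these assumptions, 20 unadjusted tests at the 0.05 level give roughly a 64% chance of at least one false positive; with the adjusted threshold of 0.0025 per test it falls back to just under 5%.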
And then there are the social nonsenses^W sciences... If practitioners of some discipline do not understand how to use quantitative methods, they should limit themselves to qualitative argument only. Unfortunately, in statistics as in other fields, those who are ignorant or incompetent are generally unaware of the extent of their ignorance and incompetence.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
.. a place where the sun doesn't shine (often - statistically), does that mean 100% of those are stinky?
--- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
I think hippies tried to warn us too in the seventies of avoiding bad trips (LSD) .. Didn't know there was any math involved in that ...
That would mean your 100% is truly only valid for 99% at most?
Why do people ask .. are you 100% sure? while the answer is mostly "I think so" ?
To stretch this ... are you correct about your statement, even if it is statistically only 99% correct?
Irrationality by Stuart Sutherland. Talks about irrationality in general, with a focus on how statistics are generally misunderstood and misused by the public, and particularly health officials. He also recommends Innumeracy by John Allen Paulos as a good start to learn about statistics and probability theory.
I've always thought teaching a good understanding of statistics should be a requirement for high schools, since statistics are so often (mis)used to justify public policies and legislation. We need a citizenry that can see through the bullshit, or at least think a bit critically on the subject.
I think a firm understanding of statistics is more useful than entry-level calculus and entry-level science courses like chemistry and biology (not that those aren't good too, just not as relevant to citizenship).
Here's a nice book on statistics called "How To Lie With Statistics" that covers a lot of the ways statistics are misused. (not a referrer link or anything like that)
http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728
> there are statistically two popes per square kilometer in the vatican.
Two LIVE popes, you mean. You'll find many more per square kilometer if you remove that restriction.
Which means we should be grateful that there haven't been more anti-popes, given their estimated mass, close proximity and E = m*c^2 ...
> And then it struck me - most of the research I had read had applied parametric statistical tests to their data - that is, the researchers made an assumption that the underlying distribution of results would fall on a normal curve.
Which in cases with lots of samples is a perfectly valid assumption. See http://en.wikipedia.org/wiki/Central_limit_theorem
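A rough simulation may help show what the central limit theorem does and does not promise: it is the distribution of sample means that approaches a normal curve as the sample size grows, while the raw data stay as skewed as ever. The choice of an exponential distribution, the sample size of 100, the 2000 replicates and the seed are all arbitrary assumptions for this sketch.

```python
import random
import statistics

def skewness(values):
    """Population skewness: 0 for a symmetric distribution like the normal."""
    m = sum(values) / len(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, replicates, rng):
    """Means of `replicates` samples, each of n exponential draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(replicates)]

rng = random.Random(0)  # fixed seed so the run is repeatable

skew_raw = skewness(sample_means(1, 2000, rng))      # single draws: raw data
skew_means = skewness(sample_means(100, 2000, rng))  # means of 100 draws

print(f"skewness of raw exponential data:   {skew_raw:.2f}")
print(f"skewness of means of 100 draws:     {skew_means:.2f}")
```

The raw data should show strong skew (the theoretical value for an exponential is 2), while the means of 100 draws should be much closer to symmetric. Whether that justifies a parametric test in a given study still depends on the sample size and on how badly behaved the underlying distribution is.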
And why would they? They can make more money on Wall Street
Think you are missing the point dude.
We (mostly!) didn't become doctors / scientists to make money.
If people are only motivated by money.... then have you ever wondered why kids climb trees ?
Anyone quoted by a reporter knows how little they understand
Don't believe what you read is the truth.
There are three kinds of lies: harmless lies, harmful lies...and then there's statistics ;)
40% of all people know that
Apologies that I can't remember the exact details but I read about the case of a university professor in the US who lost his job for allegedly saying there were more men in science because men were more intelligent than women. The issue revolved around the press not understanding standard deviations. What the professor had actually said (in fewer words) was that the standard bell curve for intelligence is slightly different by gender. For men it is shorter and fatter but the tails don't extend very far while for women the curve is taller but with very long tails. It boils down to there being more intelligent men but equally, more stupid men while women have the potential to be both significantly more intelligent but also significantly less intelligent than the bulk of the male population.
All the details are in the book Super Crunchers which is incidentally a fantastic read for anyone interested in the application of statistics in a very general, non-mathematical sense (it covers the use of statistics by baseball scouts, medical computers, predicting changes in flight prices and predicting wine vintages to name a few scenarios that are covered). Unfortunately the professor lost his job because of the furore generated by the misinterpretation in the press.
http://www.amazon.com/How-misuse-statistics-Spectrum-book/dp/0134362047 was written by an early president of the American Psychological Association and, in its day, was often used when teaching lower-level statistics courses.
You are missing the point - he did not know what a standard deviation means! That is unforgivable for anyone with a medical degree...hell, it's unforgivable for anyone who has passed a course in statistics in school.
School knowledge and real life knowledge are separate things.
He may have known at one time what the textbook definition was, but over the years it was knocked out of his brain by other things used more regularly on a day-to-day basis. He can look at a set of numbers and "know" (or "feel") that they're wrong or right when it comes to someone's health. That's called experience.
The universe isn't as deterministic as our models, graphs, and tables make it seem sometimes.
Scientists should start working with statisticians.
How do you prove that isotopes stay in the same place in the mud for millions of years?
When people can't use a basic tool, it's the fault of the tool, not the people.
As Max Planck remarked, it's not that new ideas triumph because people discard old ideas; new ideas triumph because old people die and the students learn the correct idea in the first place.
Shortcomings of statistics? More like shortcomings of humans *attempting* to use statistics.
When I studied medicine a few years ago I was surprised to see my fellow students not understanding the easiest mathematical tests and their implications. But the university wasn't of any help. Instead of telling the students about the importance of these tests and showing them how and where to get help to get correct tests, they declared this knowledge as not important for becoming a physician. So nobody even tried to understand these tests, the courses were just lost time. Later these students used software to create charts which looked great. The fact that they were wrong was of minor importance, as nobody understood or checked them.
cb
Misuse of statistics is well-represented in scientific articles. Other things that are well-represented are poor knowledge and reasoning in the area of the subject discipline, inept writing, misleading or unhelpful graphics, poor scholarship, etc. Sturgeon's Law applies across the board.
Having read a fair number of sky-is-falling articles about statistics in science, and having worked with my share of researchers (MDs and PhDs in a variety of fields) who think everything is rosy, I'm pretty sure that the truth is somewhere in between. A minor saving grace is the fact that getting the statistics wrong is not the same as getting the answer wrong. Although it's certainly quite common to find published articles that make claims with no support whatsoever, in my experience it's much more common to find articles where the inappropriate statistics just mean the support isn't nearly as strong as claimed. Spurious results tend, though not as reliably as we'd like, to get weeded out by the literature. I rarely read an article that isn't specifically about methodology in which the methods/statistics are really solid, but I also rarely read an article in which unsound statistics undermines the entire contribution.
I do stat mech. Most of the papers I read pay very little attention to assigning a level of statistical significance to their "measurements". When they do, assumptions of uncorrelated measurements are always made - and probably incorrectly. I struggle with the statistics myself. I find myself working out of my undergraduate stats text mostly. I feel I'm more concerned with understanding how statistically meaningful my measurements are than most of my colleagues. And I worry about my understanding of the statistical methods I use.
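To give a feel for how much the uncorrelated-measurements assumption can matter, here is a small sketch using the standard large-sample approximation for a series whose autocorrelation decays like an AR(1) process with lag-1 correlation rho: the effective number of independent samples is about n(1 - rho)/(1 + rho). The AR(1) model and the example values (sigma = 1, n = 1000, rho = 0.9) are assumptions for illustration; real simulation data may need a measured autocorrelation time instead.

```python
import math

def naive_standard_error(sigma, n):
    """Standard error of the mean if all n measurements were independent."""
    return sigma / math.sqrt(n)

def ar1_standard_error(sigma, n, rho):
    """Approximate standard error of the mean for AR(1)-correlated data.

    Uses the large-n effective sample size n * (1 - rho) / (1 + rho).
    """
    n_eff = n * (1.0 - rho) / (1.0 + rho)
    return sigma / math.sqrt(n_eff)

sigma, n, rho = 1.0, 1000, 0.9   # illustrative values only
naive = naive_standard_error(sigma, n)
corrected = ar1_standard_error(sigma, n, rho)
print(f"naive error bar:     {naive:.4f}")
print(f"corrected error bar: {corrected:.4f}")
print(f"underestimated by a factor of {corrected / naive:.2f}")
```

With these numbers the naive error bar is too small by a factor of sqrt(19), a bit more than 4, which is enough to turn a "significant" difference into noise.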
46 & 2
That's irrelevant, because isotope studies in geology are not done on "mud". You don't take the lid off a mountain and find liquid mud that's been sitting there for millions of years (in which case the isotopes would have circulated as you suggest).
The "mud" hardens into a rock. If it's really mud, then it might be mudstone or shale. Anything that's in the mud, like isotopes, is trapped in place in the chemistry of the rock. And believe it or not, geologists do realize that things "leak out" or are otherwise mixed up over time, and this is taken into account.
That said, as a geologist myself I do often think results based on isotope studies are bullshit, but not because the science of isotopes is bullshit. It's because of the problems described in TFA - misunderstanding of statistics - and misapplication of isotope-related techniques.
Your disagreement with public health bullshit is understandable, and I agree to some extent with that. However, I really don't think you understand the chemistry of isotope studies and the principles of geology that make these things valid (when used properly).
It sounds like you are a Truth Seeker who has become jaded because of the basic assumptions underlying science and because our broken human nature does not always treat the scientific results properly. You are pointing out the mess we are in and how Naturalism does not solve the problems. I would encourage you to continue to seek to understand Reality/Truth. It is important. I found that the Christian Faith fits reality best. Consider it. There are presuppositions/assumptions also with Christianity but I believe it does explain reality best and science can fit into that Christian framework.
The largest demographic in American prisons is black Americans. Real statistic, but is it true?
Given a particular sample that indicates blacks are 60% of the prison population this would appear to be true.
But what if I said: "The largest demographic in prison is minority, non-whites." Suddenly the % jumps from 60% (black) to 80% (minority). Which is more right? This is the problem with statistics. Context.
Now I can say readily that the largest demographic in prison is actually right-handed people. The % now jumps to 90%.
But wait! There is more! The largest demographic in prison is actually people who, prior to arrest, were below the poverty line, which jumps to 99% of the population. Again, all of the above are accurate based on a sample but which is MORE correct? Linear Algebra is coming into play here quickly....
When that kind of issue comes into play, it is the classic "Correlation != Causation" confusion. The majority of people in prison are in there because of "Being black? Being a minority? being right handed? or being poor?" None of the above. The majority of them are in there because they were convicted of a crime and sentenced. That is the causation of their imprisonment, the rest is correlation which may have a direct causation on the conviction or sentencing, but no direct causation on being in prison. (e.g. You cannot be thrown into prison for being poor, black, minority, right handed)
Same with medical research, politics, economics, etc. The price of oil rising 10% and a subsequent 5% drop in shipping orders. Measuring the significance of regressors is important but, oddly, is rarely reported. Many factors get masked or shadowed by higher-level regressors (e.g. being a minority masks a variety of other social and economic factors. In addition it can distort statistical work by being too broad: Asians have a different set of economic and social factors than North American blacks or even African immigrants.)
Back to the original subject:
We can take 100 prisoners and 100 non-prisoners and figure out rather quickly if being black is statistically significant in prison population. Non-prison population blacks would account for 25%-45% of the population (depending on location). We can see that 60% of prisoners are black. There is a 20+% deviation from the norm. We can test to see the significance of that. Same with minorities. We also quickly find that being right-handed is insignificant because it doesn't deviate from the norm. We can test left-handed and right-handed populations and rule out the handedness of a convict being significant.
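One way to "test the significance of that" is a two-proportion z-test, sketched below in plain Python. The counts are the illustrative figures from this comment (60 of 100 versus the upper end of the quoted range, 45 of 100), plus an assumed 90 of 100 right-handers in both groups; none of these are real survey data, and a significant difference here says nothing by itself about cause.

```python
import math

def two_proportion_z(count_a, n_a, count_b, n_b):
    """Two-sided z-test for a difference between two sample proportions."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal distribution via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Illustrative counts only: 100 people sampled in each group.
z_group, p_group = two_proportion_z(60, 100, 45, 100)
z_hand, p_hand = two_proportion_z(90, 100, 90, 100)

print(f"60% vs 45%: z={z_group:.2f}, p={p_group:.3f}")
print(f"90% vs 90%: z={z_hand:.2f}, p={p_hand:.3f}")
```

The first comparison comes out significant at the 0.05 level even against the 45% end of the range, while identical proportions give z = 0 and a p-value of 1, which is the "doesn't deviate from the norm" case.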
We can find that economic status is considerably MORE significant than minority or black as a status. We can determine that the reason minorities or blacks are disproportionately more prevalent in prison is that blacks and minorities have higher rates of poverty. We can extract and determine the statistical weight of POVERTY in regards to imprisonment (since we find a high % of whites in prison who are poor compared to the normal population). Once we figure that out we can remove that and continue an investigation and figure out what weight minority and black has once we have removed POVERTY from the model (residual analysis).
The problem in reporting is that without providing the whole, comprehensive analysis you can miss important things. For instance, to correct the injustice in sentencing, without reporting the weight POVERTY has in contrast to BLACK or MINORITY you may lose sight of the fact that you may have better success addressing POVERTY to normalize sentencing rather than MINORITY or BLACK (or not).
The same happens in medical research. Given a cocktail of drugs, without having the whole analysis you may end up providing more of Medicine A versus B but lose sight of the fact that A & B are limited by the dosage of Medicine C.
Statistics are not bullshit, rather merely observations with no intrinsic agenda or even implication of truth. Purely amoral, like a handgun: useful to both the good and the evil.
Statistics don't lie, nor do they tell the truth. They simply show the relationship of the data as it stands. The Truth or Truthiness of it is subjective and vulnerable to context.
-=[ Who Is John Galt? ]=-
of doctors and researchers who deal with statistics on a regular basis. My aunt and uncle are both oncologists. My grandfather is an orthopedist. Last year, my grandfather discussed this very issue with me: for the majority of his career, he did not understand statistics well enough to truly gain anything from scientific journals. He could understand things like means, standard deviation, median, etc. But when the literature begins to lean toward more esoteric statistics, he can no longer discern the meaning. He then handed me a book titled The Lady Tasting Tea, which he claims made a great difference in his understanding of statistics and their meanings. I graduated with a BS in computer science, and have taken enough statistics courses that the idea of reading one more word about chi square tests would melt my brain. But I digress. The point is that there is accessible literature out there for people who are not versed in statistics.
Lots of statistical problems seem to be ignored. Papers which blithely present meta-analyses as if they had the power of a single large study. Far too much significance attributed to case-control studies (which magnify small effects and can't, by nature, show causation). And statistical tests which simply don't have the power to show what they purport to be showing.
One example: A study purporting to demonstrate the effect of an event E on a particular variable X. The study took the average of the variable 12 months prior to E (high), and 12 months following E (much lower), and determined that event E reduced variable X. Only problem is that variable X had been declining, and about the time event E happened, that decline reversed and X started going up, though more slowly than it had been declining.
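The before/after trap in that example is easy to reproduce with invented numbers. In the sketch below X falls by 2 units a month for the 12 months before E and rises by 1 unit a month for the 12 months after; the values, the rates and the helper names are all hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def slope_per_month(values):
    """Least-squares slope of the values against month index 0, 1, 2, ..."""
    n = len(values)
    mx = (n - 1) / 2.0
    my = mean(values)
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(values))
    sxx = sum((x - mx) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical monthly values of X around event E.
before = [100 - 2 * m for m in range(12)]      # 100, 98, ..., 78
after = [78 + m for m in range(1, 13)]         # 79, 80, ..., 90

print(f"mean before E: {mean(before):.1f}, mean after E: {mean(after):.1f}")
print(f"trend before E: {slope_per_month(before):+.1f} per month")
print(f"trend after E:  {slope_per_month(after):+.1f} per month")
```

Comparing the two averages suggests E lowered X, yet the trend flipped from falling to rising at E. Looking at slopes (or plotting the series) tells the opposite story from the 12-month means.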
Yeah, the mathematical statistics courses were just chock full of what we called "meds keeners" or "hoovers", i.e. those seeking admission to med school. Even those majoring in alleged sciences like biology were often shockingly ignorant of hard sciences and tended to fulfill only the minimum requirements in things like chemistry.
I did not dispute the properties of isotopes themselves. When studied in the laboratory, isotopes appear to have predictable properties of exponential decay. (Though, I would add that the more stable isotopes which last for millions of years can only be assumed on faith to have the same properties of exponential decay as more short-lived isotopes. Could there be emergent properties that make them deviate from predictions? We know from the world of survival analysis that longer-lived entities often do deviate; the "proportional hazards" model is inappropriate.)
What I question, as you have, are the untestable postulates involving mixtures of isotopes in the crust. The only way to test this is to make hundreds of planets and wait millions of years. We don't have the technology to do so, so we only have postulates that appear logical. As we know, plenty of ideas sound logically elegant, but fail to work in the real world.
I embrace Christianity as a moral system. I am a Christian in this sense. I'm not here to promote intelligent design or to oppose the theory of evolution as a whole, as the slashdot crowd may wish to label me. My position is that any number of theories can be cooked up about evolution, cosmology, etc., and one can find any amount of data to support their theory. Every theory I've come across relies on a faith in untestable or non-falsifiable postulates. Our physical reality is defined by what we look for; there are any number of legitimate theories based on data that isn't available yet, because we didn't look in the right places, or ask the right questions.
Thou shalt not worship the .05 level.
Correlation does not imply causation -- you need to have some idea of HOW the values are correlated.
Linear regression is only valid when the relationship is in fact linear.
The more variables added to a multivariate statistical model, the greater the likelihood that there will be a spurious correlation.
SPSS will always find something when you tell it to look hard enough.
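The commandment about linear regression can be checked with a tiny example: a variable that is completely determined by another, but through a non-linear relationship, can still have a linear correlation of zero. The data (y = x squared on a symmetric range) and the helper name are chosen purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(-5, 6))          # -5, -4, ..., 5
ys = [x * x for x in xs]         # y depends on x exactly, but not linearly

r = pearson_r(xs, ys)
print(f"Pearson r between x and x^2: {r:.3f}")
```

A straight-line fit to this data would report "no relationship" even though y is a deterministic function of x, which is why it pays to plot the data before trusting a linear model.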
Lies
Damn Lies
Statistics
'nuff said......
> Every theory I've come across relies on a faith in untestable or non-falsifiable postulates. Our physical reality is defined by what we look for; there are any number of legitimate theories based on data that isn't available yet, because we didn't look in the right places, or ask the right questions.
Yes, we don't know Truth/Reality in full. We only know bits and pieces and we have some concepts that are just wrong. I agree that all of our theories are incomplete and based on "untestable or non-falsifiable postulates". And Kurt Gödel showed that to be the case even with math with his incompleteness theorems. And yet I believe we can successfully strive to better understand reality, not the many different realities we perceive, but the one true Reality we live in.
Though science has given us a better understanding of reality, it is good to recognize the limitations of science and to question the assumptions, presuppositions and axioms that make up the theories and our beliefs. Ask yourself how well does this theory/belief match reality? If it does a poor job, if possible replace it with one that does a better job. And recognize that because humans can be biased and blind to reality, there are and will continue to be theories promoted that fall far short of reality and/or are based on bad assumptions. Don't let that get you down. Strive to better understand reality. That is the journey I am on and I believe the Christian worldview gives me the best framework to understand reality.
http://www.newscientist.com/article/dn7915
we see what we want to see, we see what we are paid to see
People who deal with raw physical measurements (radar engineers, astronomers, the guy who makes airspeed sensor of the B2--er,um...) have had this problem figured out for a while.
It sounds easy to you cos you have an easy job. You only have a single, easy to measure parameter.
In other fields there can be dozens, hundreds or thousands of parameters, each with its own signal. Determining which of the signals (if any) are meaningful is a lot harder than what you're doing. What I'm saying is, you're an engineer, not a scientist.
When I started teaching, I was just telling what is in the book. The basic problem of statistics is that we are using ratio analysis, which is subject to error. Statistics is similar to geometry, where you hypothesize whether two triangles are congruent or not. Both statistics and geometry use inductive logic (or thinking), and thus are not similar to arithmetic, algebra or calculus. Unfortunately we do not teach statistics as an interesting tool for investigating non-algebraic systems like human behavior, disease behavior, etc. I am in the process of getting a patent on my statistics methods and teaching material. I have tested this with over 300 graduate students in a blind study, with about 85% understanding and using the knowledge in their field. Blaming doctors is not right, though; most hear such statistics from sales people (of medical/pharmaceutical companies). In general, statistics controls our lives, and those who don't understand or use it become part of the statistics. So who teaches statistics and how it is taught finally determines the usefulness or misuse of statistics; it is not the fault of the subject itself. When the population being sampled is stratified, that is, not homogeneous, and if all the associated facts are not carefully selected, then statistics tell lies. We get only 40% real information in any situation, and about 60% has to be carefully collected or assumed. If the assumptions are wrong and we collect the wrong data, everything fails. Take, for example, the brilliant mathematicians and engineers working for the banks, etc.: they did not take the behavior of US consumers into account, and the statistics failed!