Why Standard Deviation Should Be Retired From Scientific Use



Posted by Soulskill on Wednesday January 15, 2014 @09:48AM from the hope-it-gets-a-good-pension dept.

An anonymous reader writes "Statistician and author Nassim Taleb has a suggestion for scientific researchers: stop using standard deviation in your work. He says it's misunderstood more often than not, and that it is also not the best tool for its purpose. Taleb thinks researchers should use mean deviation instead. 'It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error." The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation. But it is not just journalists who fall for the mistake: I recall seeing official documents from the Department of Commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.'"
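
To make the two measures in the summary concrete, here is a minimal sketch in Python. It is not taken from Taleb's article: the function names and both samples are invented for illustration, and it uses the population form of each measure (dividing by n).

```python
import math

def mean_deviation(xs):
    """Mean absolute deviation around the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def standard_deviation(xs):
    """Population standard deviation (root mean square deviation from the mean)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Hypothetical daily changes, invented for illustration only.
calm = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
with_outlier = [-1.0, 1.0, -1.0, 1.0, -1.0, 7.0]

for name, xs in (("calm", calm), ("with outlier", with_outlier)):
    md = mean_deviation(xs)
    sd = standard_deviation(xs)
    print(f"{name}: mean deviation = {md:.3f}, standard deviation = {sd:.3f}")
```

For the first sample both measures equal 1. For the second, the mean deviation is 2 while the standard deviation is about 2.83, because squaring gives the single large value more weight. The standard deviation is never smaller than the mean deviation, which is the "(higher)" in the quote.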

10 of 312 comments


So you want to retire a statistical term... by Anonymous Coward · 2014-01-15 09:53 · Score: 5, Insightful

...because people use it incorrectly in economics? Get bent. The standard deviation is a useful tool for statistical analysis of large populations.
1. Re:So you want to retire a statistical term... by Fouquet · 2014-01-15 10:50 · Score: 5, Insightful
  
  +1 this. The problem here is the author's impression that "social scientists" and economists are scientists. The groups that he excludes in the first paragraph (physicists) are scientists. Anyone attempting to implement a statistical model designed for a large (and Gaussian) data set on a small number of data points (as the article's example does) should expect to get an answer that is at best marginal. Any scientist who has ever received even the most basic statistics and/or data analysis training knows this. Understand the problem first, then take enough data points, then carry out your statistical analysis & formulate conclusions.
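
A rough way to see the small-sample point is to simulate it. The sketch below assumes a Gaussian with true standard deviation 1; the function name, sample sizes, trial count, and seed are arbitrary choices for illustration, not anything from the article.

```python
import random
import statistics

def spread_of_sd_estimates(sample_size, trials=200, seed=0):
    """Return the standard deviation of repeated sample-SD estimates.

    Each trial draws `sample_size` values from a Gaussian with mean 0 and
    true standard deviation 1, then estimates the SD from that sample.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        estimates.append(statistics.stdev(sample))
    return statistics.stdev(estimates)

small = spread_of_sd_estimates(5)
large = spread_of_sd_estimates(500)
print(f"spread of SD estimates, n=5:   {small:.3f}")
print(f"spread of SD estimates, n=500: {large:.3f}")
```

The estimates from five points scatter far more widely around the true value than the estimates from five hundred, which is why a figure computed from a handful of observations is marginal at best.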
Basic Statistics by TechyImmigrant · 2014-01-15 09:55 · Score: 4, Insightful

The meaning of standard deviation is something you learn on a basic statistics course.
We don't ask biochemists to change their terms because the electron transport chain is complicated.
We don't ask cryptographers to change their terms because the difference between extra entropy and multiplicative prediction resistance is not obvious.
We should not ask statisticians to change their terms because people are too stupid to understand them.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
1. Re:Basic Statistics by Fly+Swatter · 2014-01-15 10:41 · Score: 3, Insightful
  
  Someone should tell that to the lawyers!
That's not the problem. by khasim · 2014-01-15 09:56 · Score: 4, Insightful

The problem is that people think they understand statistics when all they know is how to enter numbers into a program to generate "statistics".
They mistake the tools-used-to-make-the-model for reality. Whether intentionally or not.
1. Re:That's not the problem. by JoeMerchant · 2014-01-15 10:13 · Score: 5, Insightful
  
  The problem is that people's attention spans are rapidly approaching that of a water flea.
  Up until the past 50 or so years, people who learned about Standard Deviation would do so in environments with far less stimulation and distraction. Their lives weren't so filled with extra-curricular activities and entertainments that they never sat for a moment from waking until sleep without some form of stimulus-based pastime. When they "understood" the concept, there was time for it to ruminate and gel into a meaningful set of connections with how it is calculated and commonly applied. Today, if you can guess the right answer from a set of 4 choices often enough, you are certified as an expert and given a high-level degree in the subject.
  Not bashing modern life, it's great, but it isn't making many "great thinkers" in the mold of the 19th century mathematicians. We do more, with less understanding of how, or why.
2. Re:That's not the problem. by TsuruchiBrian · 2014-01-15 11:45 · Score: 4, Insightful
  
  I think it's also true that a larger percentage of people are going to university, so the average "intelligence" of people in university in terms of natural ability is probably lower now than when it was just the very best students attending.
  Most of the mediocre students today would have simply not gone to university in the past. I think the same principle holds when it comes to things like blogs. Public discourse can sometimes make it seem as if people are getting dumber, when it is really just that more and more people know how to read and write and can now even be published. In the past, there was a higher cost to publishing, and you were more likely to have something important to say before being willing to incur that cost.
Re:Standard Deviation is Important by neonsignal · 2014-01-15 11:13 · Score: 3, Insightful

I'm a little surprised at Nassim Taleb's position on this.
He has rightly pointed out that not all distributions that we encounter are Gaussian, and that the outliers (the 'black swans') can be more common than we expect. But moving to a mean absolute deviation hides these effects even more than standard deviation; outliers are further discounted. This would mean that the null hypothesis in studies is more likely to be rejected (mean absolute deviation is typically smaller than standard deviation), and we will be finding 'correlations' everywhere.
For non-Gaussian distributions, the solution is not to discard standard deviation, but to reframe the distribution. For example, for some scale-invariant distributions, one could take the standard deviation of the log of the values, which would then translate to a deviation 'index' or 'factor'.
I agree with him that standard deviation is not trustworthy if you apply it blindly. If the standard deviation of a particular distribution is not stable, I want to know about it (not hide it), and come up with a better measure of deviation for that distribution. But I think the emphasis should be on identifying the distributions being studied, rather than trying to push mean absolute deviation as a catch-all measure.
And for Gaussian distributions (which are not uncommon), standard deviation makes a lot of sense mathematically (for the reasons outlined in the parent post).
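
The log-based "factor" suggested above can be sketched as follows. This is one reading of the suggestion, using what is usually called the geometric standard deviation; the function name and the data are made up for illustration.

```python
import math
import statistics

def log_sd_factor(values):
    """Standard deviation of ln(values), expressed as a multiplicative factor.

    Commonly called the geometric standard deviation; all values must be > 0.
    """
    logs = [math.log(v) for v in values]
    return math.exp(statistics.pstdev(logs))

# Hypothetical data spanning several scales: each value is 10x the previous.
values = [1.0, 10.0, 100.0, 1000.0, 10000.0]
factor = log_sd_factor(values)
print(f"plain SD:      {statistics.pstdev(values):.1f}")
print(f"log-SD factor: {factor:.2f}")

# Rescaling the data (for example, changing units) leaves the factor unchanged.
rescaled = [v * 1000.0 for v in values]
print(math.isclose(log_sd_factor(rescaled), factor))
```

The plain standard deviation here is dominated by the largest value and changes with the units, while the factor reads as "values typically differ from the geometric mean by a multiple of about this much" and stays the same under rescaling.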
Re:The big picture by reve_etrange · 2014-01-15 12:21 · Score: 3, Insightful

I think NNT is saying that the MAD ought to be used when you are conveying a numerical representation of the "deviations" with the intent that readers use this number to imagine or intuit the size of the "deviations." His example is that of how much the temperature might change on a day-to-day basis. According to him, it's not just that the concept is easier to explain, but that it is the more accurate measure to use for this purpose.
Based on his other work I'm sure he understands that the STD is generally superior for optimization purposes, fit comparison, etc.

--
.: Semper Absurda :.
Re:Would those data scientists with PhDs by The_Wilschon · 2014-01-15 13:44 · Score: 3, Insightful

I know several people who have left high energy physics to become data scientists. Nobody in HEP calls themselves a "data scientist", but that's (some of) what we do anyway. It's just analysis of very large data sets. Unlike in the life sciences, both HEP and many commercial / industrial environments have sufficiently large data sets that very complex questions can be asked and answered. You can never have "enough data" -- if you think you have "enough data", then you aren't asking hard enough questions.

--
SIGSEGV caught, terminating

wait... not that kind of sig.