Slashdot Mirror


Why Standard Deviation Should Be Retired From Scientific Use

An anonymous reader writes "Statistician and author Nassim Taleb has a suggestion for scientific researchers: stop trying to use standard deviations in your work. He says it's misunderstood more often than not, and also not the best tool for its purpose. Taleb thinks researchers should use mean deviation instead. 'It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error." The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation. But it is not just journalists who fall for the mistake: I recall seeing official documents from the department of commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.'"

4 of 312 comments

  1. Issues by Edward Kmett · · Score: 5, Informative

    On the other hand, you also need to use 2-pass algorithms to compute Mean Absolute Deviation, whereas STD can be easily calculated in one pass. And you still need standard deviation as it relates directly to the second moment about the mean.
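To make the one-pass versus two-pass point concrete, here is a minimal Python sketch. It assumes the population (divide-by-n) form of both statistics and uses Welford's running update for the standard deviation; the function names and the sample data are illustrative, not taken from the comment.

```python
import math

def running_std(stream):
    """Population standard deviation in a single pass (Welford's update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n) if n else float("nan")

def mean_abs_deviation(values):
    """Mean absolute deviation about the mean; reads the data twice."""
    values = list(values)                  # the data must be kept around
    mean = sum(values) / len(values)       # pass 1: find the mean
    return sum(abs(x - mean) for x in values) / len(values)  # pass 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(running_std(iter(data)))    # works on a one-shot iterator
print(mean_abs_deviation(data))   # needs the stored values
```

The standard deviation version never stores the data, which is why it suits streams; the mean absolute deviation version has to hold on to the values (or re-read them) because the mean is not known until the first pass ends.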

    Also, annoyingly, Median Absolute Deviation competes for the MAD name and is more robust against outliers.
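The two statistics that share the MAD name behave very differently when one value is far from the rest. The following sketch uses made-up data and hypothetical helper names to compare them alongside the standard deviation.

```python
import statistics

def mean_abs_dev(xs):
    """Mean absolute deviation about the mean."""
    m = statistics.fmean(xs)
    return statistics.fmean([abs(x - m) for x in xs])

def median_abs_dev(xs):
    """Median absolute deviation about the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]   # illustrative data, one extreme value

for name, xs in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          "std:", statistics.pstdev(xs),
          "mean abs dev:", mean_abs_dev(xs),
          "median abs dev:", median_abs_dev(xs))
```

On this data the median absolute deviation is unchanged by the outlier, while the mean absolute deviation and the standard deviation both grow sharply.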

    --
    Sanity is a sandbox. I prefer the swings.
  2. Standard Deviation is Important by njnnja · · Score: 5, Informative

    Standard Deviation is the square root of the second moment about the mean, an important fundamental concept to probability distributions. Looking at moments of probability distributions gives us lots of tools that have been developed over the years and in many cases we can apply closed form solutions with reasonably lenient assumptions. Then we apply the square root in order to put it in the same units as the original list of observations and get some of the heuristic advantages that he attributes to the mean absolute deviation.
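For reference, the quantities being discussed can be written out as follows. The symbols (X for the random variable, \mu for its mean, \mu_2 for the second central moment, MD for the mean absolute deviation) are notation chosen here, not the commenter's.

```latex
\mu_2 = \mathbb{E}\!\left[(X-\mu)^2\right], \qquad
\sigma = \sqrt{\mu_2}, \qquad
\mathrm{MD} = \mathbb{E}\,\lvert X-\mu\rvert \;\le\; \sigma .
% For a Gaussian X: \mathrm{MD} = \sigma\sqrt{2/\pi} \approx 0.80\,\sigma
```

The inequality follows from Jensen's inequality, which is why the standard deviation is never the smaller of the two numbers.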

    But it is a balance, and any data set should be looked at from multiple angles, with multiple summary statistics. To say MAD is better than standard deviation is a reasonable point (with which I would disagree), but to say we should stop using standard deviation (the point made in TFA) is totally incorrect.

  3. Re:So you want to retire a statistical term... by mysidia · · Score: 5, Informative

    the mean *absolute* deviation, rather than the square root of the mean *squared* deviation (the standard deviation).

    The mean absolute deviation is a simpler measure of variability. However....

    The algebraic manipulation of the standard deviation is simpler; the absolute deviation is more difficult to deal with.
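One standard illustration of that algebraic convenience, assuming X and Y are independent:

```latex
\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)
\quad\Longrightarrow\quad
\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}
```

No comparably general identity holds for the mean absolute deviation of a sum.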

    Further, when drawing a number of samples from a large, normally distributed population, the mean deviations of those samples vary more, relative to their average, than the samples' standard deviations do; that is to say, the standard deviation of a sample provides an estimate that is more in line with the whole.
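A small simulation can check this claim for Gaussian data. The sketch below is an illustration under stated assumptions, not a reproduction of Fisher's derivation: samples are drawn from a standard normal, the sample size, trial count, and seed are arbitrary, and the mean deviation is rescaled by sqrt(pi/2) so that both statistics estimate the same sigma.

```python
import math
import random

def spread_estimates(sample):
    """Return (standard deviation, rescaled mean deviation) for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    md = sum(abs(x - mean) for x in sample) / n
    # For Gaussian data, mean deviation = sigma * sqrt(2/pi), so rescale.
    return sd, md * math.sqrt(math.pi / 2)

def relative_scatter(values):
    """Standard deviation of the estimates divided by their mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(var) / mean

def compare(trials=4000, n=50, seed=0):
    """Relative scatter of both estimators over repeated Gaussian samples."""
    rng = random.Random(seed)
    sds, mds = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sd, md = spread_estimates(sample)
        sds.append(sd)
        mds.append(md)
    return relative_scatter(sds), relative_scatter(mds)

cv_sd, cv_md = compare()
print("relative scatter of standard deviation:", cv_sd)
print("relative scatter of mean deviation:    ", cv_md)
```

For normal data the mean deviation should show the larger relative scatter. The comparison can reverse for heavy-tailed data, where the normality assumption fails.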

    That is to say.... there are cases where the Standard Deviation may be better, AND, much of statistics is using standard deviation as its basis.

    Fisher, R. 1920, Monthly Notices of the Royal Astronomical Society, 80, 758-770:

    the quality of any statistic could be judged in terms of three characteristics. The statistic, and the population parameter that it represents, should be consistent; the statistic should be sufficient; and the statistic should be efficient, i.e. have the smallest probable error as an estimate of the population parameter. Both the standard deviation and mean deviation met the first two criteria (to the same extent); however, in meeting the third criterion, the standard deviation proves superior.

  4. Re:So you want to retire a statistical term... by Fouquet · · Score: 5, Informative

    That actually was part of my point. In my day job (and night job and weekend job, and, oh god I need a vacation) I'm an astrophysicist. I have more data sets than I can recall, and the number of problems for which I'm confident that the errors are Gaussian is at most 2 or 3. We're finally in an era where computational power facilitates forward modeling & Bayesian techniques that can provide good estimates of true uncertainties. But I (and many of my colleagues) barely understand how they work. Any expectation that most researchers are willing to invest the time to understand anything beyond Gaussian statistics is unrealistic.