Why Standard Deviation Should Be Retired From Scientific Use
An anonymous reader writes "Statistician and author Nassim Taleb has a suggestion for scientific researchers: stop trying to use standard deviations in your work. He says it's misunderstood more often than not, and also not the best tool for its purpose. Taleb thinks researchers should use mean deviation instead. 'It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error." The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation. But it is not just journalists who fall for the mistake: I recall seeing official documents from the department of commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.'"
And, in a world full of highly numerate simpletons, one must never forget this.
...because people use it incorrectly in economics? Get bent. The standard deviation is a useful tool for statistical analysis of large populations.
The meaning of standard deviation is something you learn on a basic statistics course.
We don't ask biochemists to change their terms because the electron transport chain is complicated.
We don't ask cryptographers to change their terms because the difference between extra entropy and multiplicative prediction resistance is not obvious.
We should not ask statisticians to change their terms because people are too stupid to understand them.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
On the other hand, you also need to use 2-pass algorithms to compute Mean Absolute Deviation, whereas STD can be easily calculated in one pass. And you still need standard deviation as it relates directly to the second moment about the mean.
Also, annoyingly, Median Absolute Deviation competes for the MAD name and is more robust against outliers.
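To make the name clash concrete, here is a minimal sketch of the two "MAD"s side by side. The function names and the sample numbers are made up for illustration; they do not come from TFA.

```python
from statistics import mean, median

def mean_absolute_deviation(xs):
    """Average distance from the mean (what TFA calls MAD)."""
    m = mean(xs)
    return mean([abs(x - m) for x in xs])

def median_absolute_deviation(xs):
    """Median distance from the median (the other MAD)."""
    med = median(xs)
    return median([abs(x - med) for x in xs])

# Hypothetical readings, then the same readings plus one wild outlier.
clean = [9, 10, 10, 11, 10, 9, 11]
dirty = clean + [1000]

for name, xs in (("clean", clean), ("dirty", dirty)):
    print(name, mean_absolute_deviation(xs), median_absolute_deviation(xs))
```

On this toy data the single outlier blows up the mean absolute deviation, while the median absolute deviation does not move.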
Sanity is a sandbox. I prefer the swings.
The problem is that people think they understand statistics when all they know is how to enter numbers into a program to generate "statistics".
They mistake the tools-used-to-make-the-model for reality. Whether intentionally or not.
Not even close. As someone just a field or two over from climate science, I gotta say that I've never heard of a data scientist before. They have nothing in common.
error free written right in to our scriptdead pretense.
If the value being measured is a voltage or current the square is proportional to energy (or power) so standard deviation has an important physical interpretation. In other applications it could be worthless. No one measure works for all cases - apply the correct tool for the job.
Standard Deviation is the square root of the second moment about the mean, an important fundamental concept to probability distributions. Looking at moments of probability distributions gives us lots of tools that have been developed over the years and in many cases we can apply closed form solutions with reasonably lenient assumptions. Then we apply the square root in order to put it in the same units as the original list of observations and get some of the heuristic advantages that he attributes to the mean absolute deviation.
But it is a balance, and any data set should be looked at from multiple angles, with multiple summary statistics. To say MAD is better than standard deviation is a reasonable point (with which I would disagree), but to say we should stop using standard deviation (the point made in TFA) is totally incorrect.
There is a great difference between a mean value and an RMS value. Scientific people can work with the appropriate version so I don't see a problem with using the correct one for the correct occasion. And certainly science should stay with the correct term as appropriate.
What I believe the person is calling for here is the most appropriate use when communicating to the non-scientific person. This is an education issue in that the communication really should not use either term as a shorthand but should explain in full the effect of the distribution. Science uses mean and standard deviation (often also requiring a named distribution) because they are shorthands that describe the random behavior and have full meaning without any other explanation needed. So I say use neither term when communicating to the non-scientific, as they do not fulfill the communication role for which they are intended.
What I believe should actually be done is proper education of all, so that they understand the differences between various random distributions and move totally away from reasoning like "it is cold today, so global climate change based on heating must be a lie".
The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation.
First, when the media reports on a scientific discovery, they report the researcher's stats - not their own. So, if there's an error with the interpretation of STD, then it's the original researchers'.
Do you take every observation: square it, average the total, then take the square root? Or do you remove the sign and calculate the average? For there are serious differences between the two methods.
I suggest he publishes in a peer-reviewed journal instead of ... WTF is 'edge.org'???
If there are "data scientists" who don't understand what the standard deviation is, then they certainly shouldn't be calling themselves "data scientists," and quite possibly not scientists at all. What subjects are their PhDs in, I wonder? This doesn't do anything to reduce my skepticism that such a thing as "data science" really needs to exist.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Given that most of the buzz about "data science" seems to be in the business world, I'd say it's more likely they're corporate hacks working for the propaganda machine that's so effective on suckers like you.
Yes, use the interquartile range instead https://en.wikipedia.org/wiki/Interquartile_range
Like the median, it is a very robust method, not readily influenced by outliers. https://en.wikipedia.org/wiki/Median
The median is wickedly robust, with a breakdown point at 50%, meaning that you can throw a huge amount of junk data at it and it still doesn't care.
The arithmetic mean and the standard deviation are both junk, often worse than the too-often-assumed-normal data thrown at it.
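A quick way to see the breakdown-point claim is to corrupt a chunk of a sample and watch what happens. This is only a sketch with invented numbers: 30% of the values are replaced with junk, which is still below the median's 50% breakdown point.

```python
from statistics import mean, median

# Hypothetical measurements clustered around 10.
good = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3]

# Three junk values, so 3 of 10 observations (30%) are garbage.
junk = [1e6, 1e6, 1e6]
data = good + junk

print("mean  :", mean(data))    # dragged far away from 10 by the junk
print("median:", median(data))  # still sits inside the range of the good values
```

Push the junk fraction past half of the sample and the median gives way too, which is all the 50% breakdown point says.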
We should not ask statisticians to change their terms because people are too stupid to understand them.
I've always wondered about this attitude.
For me, any change requires an analysis of risk/reward versus value. For example, if code contains confusing names, it might be worthwhile to refactor it.
The tradeoff is in the time spent refactoring versus the perceived value - if it's a mature product that largely works with few planned updates and few people will have to deal with the confusion, then the effort outweighs the returned value. If the code is open source, being actively developed and with many eyes looking at it, there may be a great deal of value in making it easier to understand.
The same could be said of English versus Metric measurements. Why should the US change to use the new system when everyone understands the one we have?
If the Federal Reserve sometimes gets it wrong, there may be great value in changing terms. The effort to fix the mistakes people make might be a good deal less effort than changing the terms used by a subset of mathematicians.
You can look at the big picture and see changes that would return a large overall/distributed value, or you can look at small groups and see that making those changes would cost them time and effort.
Is it too much to ask statisticians to look at the big picture?
That's a good enough replacement term.
---- The above post was generated by the Turing Institute. Maybe.
Properly educating the world on this problem would likely take no more effort than convincing everyone to stop using standard deviation. To that end, why eliminate something that (apparently) has widespread use?
My college math teacher made a special point of warning us that journalists almost always mix up pct and pp. Sometimes they even do that on purpose!
If you don't like the term "standard deviation", use "margin of error" instead.
-Bob-
I don't know which is more foolish, thinking that saying nothing, but saying it first, is a worthwhile goal, or claiming to be first when you're not. No need for you to choose, however: you did both.
No, it's more likely to be a public relations firm in the guise of a "grass roots" organization in the pockets of big oil. But we digress.
If there are "data scientists" who don't understand what the standard deviation is, then they certainly shouldn't be calling themselves "data scientists," and quite possibly not scientists at all. What subjects are their PhDs in, I wonder?
The problem isn't with highly-educated people, it's people who are not highly educated, or who are highly educated but in a different field.
If a particular intersection attracts a lot of accidents, we consider the accidents to be the fault of the drivers involved. But at the same time, we recognize that aspects of the intersection might be a contributing factor as well.
Expert drivers would never have such accidents, but if we spend some effort reblocking the intersection we could get improved safety, and sometimes there is value in doing this.
Like the roadway intersection, if a term is so confusing that average people make mistakes because of it, there may well be value in changing to easier-to-understand terms.
First!
... to within 0.5 standard deviations.
Actually, the more posts this story attracts, the more accurate your statement is, and the fewer standard deviations you are away from true first. Response times not being distributed in a Gaussian curve perhaps complicates things.
Perhaps non-mathematicians don't have a problem with this, but it rubs me the wrong way.
What makes the mean an interesting quantity is that it is the constant that best approximates the data, where the measure of goodness of the approximation is precisely the way I like it: As the sum of the squares of the differences.
I understand that not everybody is an "L2" kind of guy, like I am. "L1" people prefer to measure the distance between things as the sum of the absolute values of the differences. But in that case, what makes the mean important? The constant that minimizes the sum of absolute values of the differences is the median, not the mean.
So you either use mean and standard deviation, or you use median and mean absolute deviation. But this notion of measuring mean absolute deviation from the mean is strange.
Anyway, his proposal is preposterous: I use the standard deviation daily and I don't care if others lack the sophistication to understand what it means.
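The L2/L1 point above can be checked numerically. The sketch below brute-forces the best constant over a grid; the data set and the grid are arbitrary choices made for illustration.

```python
from statistics import mean, median

def sum_sq(c, xs):
    """L2 cost: sum of squared differences from the constant c."""
    return sum((x - c) ** 2 for x in xs)

def sum_abs(c, xs):
    """L1 cost: sum of absolute differences from the constant c."""
    return sum(abs(x - c) for x in xs)

data = [1, 2, 3, 4, 100]

# Candidate constants 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(0, 1001)]

best_l2 = min(candidates, key=lambda c: sum_sq(c, data))
best_l1 = min(candidates, key=lambda c: sum_abs(c, data))

print("L2 minimiser:", best_l2, "mean  :", mean(data))
print("L1 minimiser:", best_l1, "median:", median(data))
```

The squared-error minimiser lands on the mean and the absolute-error minimiser lands on the median, which is why pairing the mean with an absolute deviation mixes two different notions of "best constant".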
You forgot making a blatantly late fp just to elicit a reaction from someone, otherwise known as trolling. Someone with only six digits should know better FFS.
I also think averages should go away. Most people think they are being reported the median (the number in the middle) when people tell them the average. It's great for real estate agents, and people trying to advocate for tax reform, but the numbers are not what people think they are.
They are vital to getting meaningful information out of a sea of data. Cancer research and particle physics use data scientists. Unfortunately so does amazon.com.
Well, given that they think it's a great idea to take two different data sets measured in the same units, but measured in completely different ways, and put them together as a comparison over time then I'd say the definition of deviation is the least of their worries.
Food for thought: "Revisiting a 90-year-old debate: the advantages of the mean deviation"
http://www.leeds.ac.uk/educol/documents/00003759.htm
When taking measurements (such as protein concentration in blood) we are forced by the journal editors to report SD as an error estimate. That is in my view plainly wrong, as the SD is an estimate of the population variance. I try to use what is known as standard error of the mean (SEM) (mean deviation in TFA).
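For reference, the SD, the SEM and the mean absolute deviation are three different numbers computed from the same sample. A minimal sketch, using invented concentration values and the usual textbook definition SEM = SD / sqrt(n):

```python
import math
from statistics import mean, stdev

# Hypothetical repeated measurements (arbitrary units).
concentrations = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4, 4.1]

n = len(concentrations)
m = mean(concentrations)

sd = stdev(concentrations)                         # sample SD (n - 1 divisor)
sem = sd / math.sqrt(n)                            # standard error of the mean
mad = mean([abs(x - m) for x in concentrations])   # mean absolute deviation

print(f"SD={sd:.3f}  SEM={sem:.3f}  MAD={mad:.3f}")
```

The SEM shrinks as n grows, because it describes uncertainty in the estimated mean, while the SD and the mean absolute deviation describe the spread of individual measurements.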
Didn't Taleb warn us about the perils of modeling things with normal distributions that fail to capture outliers ("Black Swans") and yet now he advocates the use of a statistical measure that conceals^H^H^H^H^H is robust with respect to outliers?
Oh well, next year he'll probably come up with something along the lines of "Monte Carlo methods major cause of global warming, return to analytic methods and moments unavoidable truth"...
The problem is not that standard deviation is confusing. The problem is that sociologists need to learn how to apply statistics. Either the majority of sociology PhDs are ignorant of statistics, or they've mastered the art of selecting a politically desirable conclusion and misapplying statistics to support it.
I find this article quite confusing. Is the actual suggestion that we should be going around using the mean deviation as a way of capturing the general variance of our data sets? Or to put it another way, does he want "deviation" measures not to give us a real sense of the larger deviations that might occur with some real probability? For example, with temperatures, standard deviation is more likely to suggest that we can have periods of significantly higher and lower temperatures than a simple "mean deviation".
Adding to my confusion is that there is no reference to articles, books, or other subject material that supports the general thesis. If the "mean deviation" is better than the "std deviation", give some real concrete examples and supporting mathematics.
Also, there seems to be no reference to "bell curve" distributions and "non bell curve" distributions. Standard deviation computations are built around bell curve distributions for their mathematical soundness. For example, if I were to take every number and raise it to the fourth power, standard deviation would not work so well on this new set of numbers. Is the author suggesting that typical sampling distributions of sampled events tend not to be "bell curve" like?
Standard deviation is taught in 7th grade in my local school. It shows up constantly in any standard K-12 curriculum. To challenge this, you really should bring a lot more substance to any argument that we should do things differently.
For example, I could argue that we should use 1:2 to represent 1/2 because the slash (/) should be used for logical dependency arguments instead. I could create lots of examples and go into a diatribe about how people constantly misuse fractions and ratios because they use a slash in their construction. But I would still be spouting nonsense.
Well... first of all, summary has it wrong. It's not "mean deviation", it's "mean absolute deviation", or just "absolute deviation" from the literature I've seen. (Mean deviation is actually always zero, the most useless thing you could possibly consider.)
Keep in mind that standard deviation is the provably best basis if your goal is to estimate a population *mean*, the most commonly used measure of center. Absolute deviation, on the other hand, is the best basis to use for an estimate of a population *median*, which is maybe fine for finances, which is what the linked paper seems mostly focused on. (Bayesian best estimators, if I recall correctly.)
If the main critique is that economists and social scientists don't know what the F they're doing, then I won't disagree with that. But no need to metastasize the infection to math and statistics in general.
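The remark above that the signed "mean deviation" is always zero follows in one line from the definition of the sample mean; the absolute value is what makes the quantity informative:

```latex
\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)
  = \frac{1}{n}\sum_{i=1}^{n}x_i-\bar{x}
  = \bar{x}-\bar{x} = 0,
\qquad\text{whereas}\qquad
\frac{1}{n}\sum_{i=1}^{n}\left|x_i-\bar{x}\right| \ge 0 .
```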
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
The example in the article isn't even an example of a standard deviation. He may have plugged his five values into the RMS formula, but what it produced isn't an actual standard deviation because five values is too small of a sample size.
This article is really more a demonstration of why people should stop misusing the term "standard deviation" than it is an argument for why people should stop using standard deviation.
-Glires
I studied geodesy in Germany, earning a diploma at a technical university. Standard deviation has its right to exist and to be used. If this man really means what he says, he should not tell people to abandon standard deviation but write BOOKS that teach people correctly what it is and how it is calculated on the data you have. Yes, I also meet people (who call themselves scientists and researchers) who have no fucking clue how to work with data and standard deviation, but on the other hand I also meet a lot who do know and who derive the right conclusions, formulas, or algorithms from it. To me this guy sounds like a mad panda who just didn't get it right...
He says that standard deviation gives too much weight to tail events.
But I think the bigger problem in finance is under weighting tail events.
When I was in school, they still taught the central limit theorem which explains why so many error distributions are "normal". Our world provides us with millions of examples in everyday life where the standard deviation of our experiences is the best statistic to estimate the probability of future events.
What you do with a statistic is what counts. It's easy to look at the standard deviation and estimate the probability that the conclusion was reached by chances of the draw, though it takes some practice to develop your intuition. It is embedded in our language when we talk of "6 sigma" reliability or "4 sigma" thinkers. Anyone who thinks he is a scientist should understand this!
Mr. Taleb may be working in a field where normal distributions are rare, but the probability is he is either lying or poorly educated.
I agree that mathematicians may become imprinted on standard deviation and forget that it is only used because it is easier to work with than average absolute deviation (ex: the derivative of x^2 is continuous, unlike abs(x)), and that less technically inclined readers might not realize there is a difference. However, they ARE usually pretty close (I don't have a reference, but I once ran a simulation comparing the 2 using random data with a Gaussian distribution and the curves matched exactly), and it's harder to find exact solutions with average absolute deviation. On the other hand, it wouldn't hurt to use "MAD" occasionally on a data set to make sure that the standard deviation gives results that are meaningful as a measure of "deviation".
CEP of 50% and 10 meters means half the missiles land on your house, and half on your neighbor's. A fine measure
That's what he concludes at the bottom of the article. He starts the article by saying that standard deviation should only be used by physicists, mathematicians, and mathematical statisticians. If I'm not mistaken, "physics" and "math" covers a whole lot of different fields, including most of the STEM fields that (largely) define the users of this site.
I know in my particular field (physics based), standard deviation is a hell of a lot more useful than mean average deviation. And easier to use.
Bah. I call poor summary.
It may look like I'm doing nothing, but I'm actively waiting for my problems to go away.
--Scott Adams
Cancer research and particle physics use data scientists. Unfortunately so does amazon.com.
Okay, since cancer research is a very large field, I can't say for sure one way or the other ... but I do know that working in bioinformatics at a major academic research center, I've never known a single person in medical research of any kind who called themselves a "data scientist." We have lots of computer scientists and statisticians, most of whom, fortunately, get along well enough to make use of each other's strengths. Regarding particle physics I have no idea, but yeah, I'm willing to bet Amazon or any other large corporation hires more "data scientists" than all the scientific institutions in the world put together--and gets exactly the kind of buzzword bingo they're paying for in return.
Have you checked to see if you're oscillating while it is in use? That might be why you're not inside it, I think you're supposed to be both the particle and the wave unless you check.
Unfortunately so does amazon.com.
"Amazon.com: Turning your Main Streets into Skid Rows since Walmart made it fashionable — with a computer!"
"Choke down your Super-size while your city chokes on down-size — shop Amazon!"
"Amazon — named after the lumber used to package your shit, bro! High-five!"
If you read the post his description presumes you have 1) already calculated the mean and found it is zero thereby simplifying the calculation 2) "average the total" whatever that is, when he really means total the squares of the values and divide by the number of values less one. This is pretty poor language from a professor writing a post about confusion. Nevertheless I think his argument is based on a lay person's intuitive misunderstanding of SD, and maybe if that is true, and I suspect it is, MAD might be a better measure to report in newspaper articles?
This guy is not a statistician.
http://www.edge.org/memberbio/nassim_nicholas_taleb
He is someone who *uses* statistics, just like the scientists he criticizes, albeit with probably a fair bit more understanding of the underlying mathematics. But nonetheless, it is wrong to label him a "statistician" just to add unwarranted authority to his words.
If I read a non-scientific article that spewed out standard deviations I would automatically disregard the numbers anyway. It is a safe assumption that a journalism major doesn't understand what they're writing about and just adding filler to boost word count.
I am becoming gerund, destroyer of verbs.
For normal densities, standard deviations and MAD are just proportional, with a factor of about 1.25, so it doesn't matter which you use.
For non-normal densities, neither of them really is universally "right" for characterizing the deviation, but it's mathematically a whole lot easier to understand how standard deviation behaves in those cases than MAD. So even there, standard deviations are usually the better choice.
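The "about 1.25" factor for normal data is sqrt(pi/2), roughly 1.2533. A rough sketch that compares the closed form with a simulation follows; the seed and the sample size are arbitrary choices.

```python
import math
import random
from statistics import fmean, pstdev

# Closed-form ratio SD / (mean absolute deviation) for a normal distribution.
ratio_theory = math.sqrt(math.pi / 2)

# Simulated check on Gaussian draws (fixed seed so the run is repeatable).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

m = fmean(sample)
mad = fmean([abs(x - m) for x in sample])
ratio_simulated = pstdev(sample) / mad

print(f"theory    : {ratio_theory:.4f}")
print(f"simulated : {ratio_simulated:.4f}")
```

For heavier-tailed data the ratio is larger, so the 1.25 rule of thumb only applies when the data really are close to normal.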
Standard deviation is a much better measurement of the spread of a distribution than the mean error. There are many great mathematical properties (mostly originating from Pythagoras' theorem) that emerge from using the squared errors. Further, using the regular mean error can more often result in non-unique values. For example, the MAD for the distribution -4, -4, 4, 4 will result in the same value as for the distribution of -8, 0, 0, 8. In most cases, different decisions would be made between these two distributions. For example, if it's stock prices, the latter is normally considered more volatile than the former.
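The two four-point data sets in the comment above are easy to check directly. A small sketch, using the population (divide by n) standard deviation:

```python
from statistics import fmean, pstdev

def mean_abs_dev(xs):
    """Mean absolute deviation about the mean."""
    m = fmean(xs)
    return fmean([abs(x - m) for x in xs])

a = [-4, -4, 4, 4]
b = [-8, 0, 0, 8]

for name, xs in (("a", a), ("b", b)):
    print(name, "MAD =", mean_abs_dev(xs), " SD =", round(pstdev(xs), 3))
```

Both sets have a mean absolute deviation of 4, while the standard deviation separates them (4 versus about 5.66) because squaring weights the two values at distance 8 more heavily.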
The celebrated Heisenberg uncertainty principle in quantum theory is based on statistical statements about the coupled standard deviations of position and momentum measurements (for example), not the mean deviation. The mean deviations are assumed to be zero since the means of the position and momentum distributions are exactly known for theoretical work. What matters are the fluctuations about the mean. In fairness, Taleb does allow physicists to keep using STD. But, quantum mechanics aside, it seems characterizing fluctuations about the mean, rather than fluctuations of the mean, is often an important measure depending on the nature of the investigation. Retiring the standard deviation seems a bit hasty.
i\hbar\dot{\psi}=\hat{H}\psi
It weights by the difference between the observation and the mean, by the variation. So large observations are not weighed any more than small ones. Two observations equally far from the mean get equal weight.
Widely varying observations do get higher weight and that is intentional. Standard deviation is that way because it is so useful in analysis of variance and measuring likelihood of statistical significance.
http://lkml.org/lkml/2005/8/20/95
...and besides... JUST THINK of all the rigorous Lean Management courses that will have to re-certify all of their "Six-Sigma Black Belts" to some kind of "Half-Dozen of the Other" degrees!
PANDEMONIUM!!!
First! to Help you out with that. ;)
Where does it say that?
Agreed that this is a ridiculous proposal. He probably just wants more publicity.
This was the guy who wrote the book "Anti-Fragile", which I had hoped would educate and broaden my way of thinking, in the same way that the Malcolm Gladwell books ("Tipping Point", "Blink", "Outliers") did. He ended up droning on and on without really making a worthwhile point, and I gave up after a while.
404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
[GPG key in journal]
I know several people who have left high energy physics to become data scientists. Nobody in HEP calls themselves a "data scientist", but that's (some of) what we do anyway. It's just analysis of very large data sets. Unlike in the life sciences, both HEP and many commercial / industrial environments have sufficiently large data sets that very complex questions can be asked and answered. You can never have "enough data" -- if you think you have "enough data", then you aren't asking hard enough questions.
SIGSEGV caught, terminating
wait... not that kind of sig.
And here's /. advertising yet another blog article.
Step 1 Start bullshit blog with paid advertising.
Step 2 Get blog article on front page of /.
Step 3 Profit!
"If people believe that they have to give up a comfortable lifestyle to reduce carbon dioxide emissions, they will look for any evidence that AGW is incorrect, no matter how flimsy it is. You can see this behavior for what it is when people cling to a mistaken idea for dear life."
The above reminded me of something from Nassim Taleb's writings. Those who have read his books may be familiar with the following Upton Sinclair quote: "It is difficult to get a man to understand something, when his salary depends upon his not understanding it." NNT applies this principle to financial 'experts' (quants, stockbrokers, advisors, etc.) who do things that are demonstrably counterproductive (applying stat methods that assume Gaussianity to non-normal distributions; disregarding the randomness inherent in stock movements) not necessarily out of ignorance, but largely because such actions serve their economic benefit. In all areas, people often disregard evidence when doing so serves what they perceive as their immediate interests.
Data science is a field that combines machine learning and statistics to derive meaning from data. Data scientists should be reasonably well-versed in classical stats, but the data sets they deal with are often huge, ill-defined, and not amenable to analysis using classical methods. To deal with such challenges, data science recruits a healthy combination of certain areas of comp-sci (databases, machine learning, NLP, AI), statistical methods, and, quite often, improvisation.
Strange that there are so many people on here that are unfamiliar with data science.
Concerning his education credentials: he's got a U. Penn. MBA and a U. of Paris doctorate, and currently teaches at NYU Polytech. If you want to know his thoughts on normal distributions, stats, epistemology, econ, and the social sciences, his books are excellent, and are well worth a read (although much of the best material is quite derivative of Mandelbrot). NNT may be called an anti-academic, anti-econ establishment crank, but it would generally be inaccurate, in accordance with your inference, to accuse him of lying.
odd. i would suggest the exact opposite
I can really go for renaming standard deviation, but it should not be abolished.
Standard deviation is a function of the second moment of the data, and if you remember your laws for combining moments of inertia (the parallel axis theorem), then you'll understand better what you're dealing with.
2nd moments detail resistance to spin, and thus the resilience of your findings to changes and errors.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Calculating the standard deviation of a data set with only one pass over the data as you initially collect it is fairly straightforward. This is ideal in situations where the data you are working with is ephemeral, or of unbounded size and impractical to store every individual sample. How do you calculate the mean deviation without having to go back and revisit all of your samples?
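One common way to do the single-pass calculation is the running update usually attributed to Welford. The class below is a minimal sketch with a made-up name (`RunningStd`); it keeps only three numbers in memory no matter how many samples stream past.

```python
import math

class RunningStd:
    """One-pass mean and standard deviation (Welford-style update)."""

    def __init__(self):
        self.n = 0        # samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pstdev(self):
        """Population standard deviation of everything pushed so far."""
        return math.sqrt(self.m2 / self.n) if self.n else float("nan")

# Usage: feed samples as they arrive, never storing them.
rs = RunningStd()
for sample in [-23, 7, -3, 20, -1]:
    rs.push(sample)
print(rs.mean, rs.pstdev())
```

The mean absolute deviation about the mean has no equally simple exact update, because every |x - mean| term changes whenever the mean moves. That is the asymmetry the question above is pointing at.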
File under 'M' for 'Manic ranting'
Naw, you'd probably need a Poisson distribution ;-).
"[I]t is a wise man who admits the limits of his knowledge or skill, and that pretending either causes harm." --Terry Go
Perhaps we should be looking at MAD statistics more often when summarizing or describing data. However, standard deviations are very useful in statistical inference. Standard deviations are always reported with the parameter estimates. Now this is really useful because the parameter estimates are assumed to be approximately normally distributed, either due to the Central Limit Theorem or by assumption of iid normal disturbances. Under the standard normal distribution, two standard deviations account for about 95% of the coverage probability, so just by glancing at the standard deviations you know roughly the confidence intervals and also the outcome of a simple z or t test of a hypothesis about the given estimate.
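The "two standard deviations is about 95%" rule of thumb can be reproduced from the normal CDF. A short sketch using only the standard library; the helper name `coverage` is made up:

```python
import math

def coverage(k):
    """P(|Z| < k) for a standard normal Z, i.e. the mass within k sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} sigma: {coverage(k):.4f}")
```

Strictly, 2 sigma covers about 95.45% and the exact 95% point is about 1.96 sigma, which is why "two standard deviations" works as a quick mental check.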
Do you take every observation: square it, average the total, then take the square root? Or do you remove the sign and calculate the average?
WTF! - he's managed to get both the definition of standard deviation and mean absolute deviation wrong.
Beats me, but everyone knows there are 20 fluid ounces in a pint.
And neither does the media-consuming public. Most would totally ignore your measure of precision regardless of whether you call it standard deviation or mean absolute deviation. For them your average is absolute and if any values aren't at all near it something is terribly wrong. They will also not rest until every school performs above average and nothing in your work will convince them otherwise. The public doesn't like uncertainty and will assume every outcome is for a special reason, and this even goes for the non-religious ones. The idea that some things aren't absolute and are actually uncertain and variable terrifies them.
Nowhere is this more apparent than in sports. Everything there is always "written in the stars" or "destiny" and if you win it always proves beyond doubt you are better than your opposition (or you were 100% cheated by the refs). Hell, journalists may have had a full article written up 2 minutes before the end of a game and then completely change everything to be about one team's dogged determination because chance would have it they scored in the last minute. I love football (soccer), but discussing it can be frustrating.
If you still believe you can convince them, use mean absolute deviation in your "executive summary" or press release and leave the standard deviation as is in your actual paper. The only ones that actually read the paper are scientists anyway. The typical journalist reading your actual paper is likely to misunderstand something in every paragraph anyway. Changing real science to pander to the masses is a fucking huge mistake.
In TFA, Taleb claims that, for the set of differences from the mean [-23, 7, -3, 20, -1], the standard deviation is 15.7, but it's actually 14.1!
I'm highly surprised that someone with such a reputation would make such a mistake in such an article...
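For anyone who wants to check the arithmetic, here is a quick sketch using only the five deviations quoted above. The variable names are mine; the divide-by-n versus divide-by-(n - 1) split is the standard population versus sample distinction.

```python
from math import sqrt

deviations = [-23, 7, -3, 20, -1]  # already centered: they sum to zero
n = len(deviations)

sum_sq = sum(d * d for d in deviations)             # sum of squared deviations
population_sd = sqrt(sum_sq / n)                    # root mean square, divide by n
sample_sd = sqrt(sum_sq / (n - 1))                  # sample version, divide by n - 1
mean_abs_dev = sum(abs(d) for d in deviations) / n  # mean absolute deviation

print(round(population_sd, 2), round(sample_sd, 2), mean_abs_dev)
# prints: 14.06 15.72 10.8
```

So the 15.7 figure matches the n - 1 denominator, while the description of squaring, averaging and taking the square root corresponds to the divide-by-n value of about 14.1.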
Why on earth would he suggest that the standard deviation is an *average*? It's an average deviation *from* an average.
I get that the concept of an average deviation from an average is highly confusing, to the point of apparently confusing the guy himself. Or his apparent message, I should say.
Incidentally, for data points such as the ones he suggests (-23, 7, -3, 20, -1), I consider the most sensible average to be zero. I'm sure there's a nifty name for it. It happens to sit at the mean distance above the lowest sample, i.e. -23 + (0 + 30 + 20 + 43 + 22) / 5 = -23 + 23 = 0.
I know some people who try to use a screwdriver as a hammer, and to open buckets of paint. If we follow the logic of the article, since that's not what screwdrivers are for, everyone else should stop using them.
From a Bayesian view -- we would prefer not to quote any "statistics" anyway, either SD or this MD. Rather, we should infer the posterior distribution over the variable of interest, and use it directly to inform our action based on utility. For just reporting, we can show the full posterior as a picture, or if it's a Gaussian, quote its generative mean and variance parameters, which enable a reader to do anything they like with them. (The square root of the variance, sigma, is generally understood to describe the region around the mean where about 68.2% -- a one-sigma portion -- of the generated data would fall. And it appears directly in the posterior Gaussian generative equation in a way that MD does not.)
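The one-sigma figure can be checked directly. This is a small sketch, assuming a Gaussian, that uses the identity P(|X - mu| < k * sigma) = erf(k / sqrt(2)); the function name is mine.

```python
from math import erf, sqrt

def normal_coverage(k):
    """Probability that a Gaussian draw lands within k sigma of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4))
```

For k = 1 this gives roughly 0.6827, which is the "68.2%" quoted above, and for k = 2 roughly 0.9545.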
You mean that there are scientists publishing papers out there that don't understand basic concepts of statistics? Like, the rest of the world?
I'm truly 99.9999% e+/-0.5 shocked!
"I think this line is mostly filler"
A standard deviation is something kinky everyone should try at least once.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
(what's new right?)
I never even thought to conflate Std Deviation and Mean Deviation prior to reading this article/summary. I just thought of Std Deviation as that bit of the normal distribution which captures ~68.2% of the values (for +/- 1 sigma). And Yes, I knew how it is calculated, my mind just didn't go that direction.
McFly777
- - -
"What do people mean when they say the computer went down on them?" -Marilyn Pittman
I worked for one of those guys once. Not long after I was hired, he eagerly explained to me that sigma is how many nines there are after the decimal point.
If he doesn't like standard deviation, then what he wants is the standard error of the mean. The mean deviation is meaningless, unless you're Al Gore perhaps.
Using the mean deviation is kind of like kissing your sister. Nice, but it doesn't go anywhere. Significance testing is right out, for starts.
So what day of the year should be Mean Deviation Day?
Physicist here...
I should think the travesty in this article is an economist not making a huge deal about the real issue here, which is that measures of central tendency (any measure) only really make sense when you're looking at Gaussian-type data (don't economists have fat-tail debacles etched into them at school???). Using a mean and a standard deviation, RMSE or whatever to encapsulate a power-law-distributed thing is dangerous when you start USING it for something (like derivatives pricing). Power law distributions are more prevalent than popularly imagined... Use care when using measures of central tendency on them.
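As one illustration of why, take the Pareto distribution as a stand-in for "power law" (my choice of example, not something from the article). Its textbook moments show that for a small enough tail exponent alpha the variance, and eventually the mean, simply do not exist as finite numbers. The function names and the scale parameter x_min are hypothetical labels.

```python
from math import inf

def pareto_mean(alpha, x_min=1.0):
    """Mean of a Pareto(alpha, x_min) distribution; infinite when alpha <= 1."""
    if alpha <= 1:
        return inf
    return alpha * x_min / (alpha - 1)

def pareto_variance(alpha, x_min=1.0):
    """Variance of a Pareto(alpha, x_min) distribution; infinite when alpha <= 2."""
    if alpha <= 2:
        return inf
    return x_min ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))

for alpha in (3.0, 1.5, 1.0):
    print(alpha, pareto_mean(alpha), pareto_variance(alpha))
```

With alpha = 1.5 the mean is finite but the variance is not, so a standard deviation computed from any finite sample is describing the sample and not a stable property of the distribution.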
There is nothing standard about my deviation. But I can root a mean square.
I like that little poke at journalists:
In other words, it's not just journalists who fall for the mistake, so do educated people.
The real danger comes not from a 50% confusion between standard deviation and mean absolute deviation; but from the assumption that the statistical distribution is Gaussian.
Before the credit crunch, financiers who considered themselves "masters of the universe" believed on the basis of the Black–Scholes equation that they could hedge their risks with a mean time to failure of billions of years. The probability distributions were assumed to be Gaussian, but this bore no relation to the past performance of the stock market.
You just have to choose the statistic which gives you the answer that you want and always insist that your opponents are using the wrong numbers or do not possess the understanding of statistics to hold an opinion. In the end the guy who shouts the loudest will win the day. We see it every day in the Climate Debate, which of course does not exist because "the debate is over!". [/semi-sarc]
He described a simple RMS calculation... but he calculated standard deviation of a sample. (It's the difference between STDEVPA and STDEVA in Excel, or a difference of n or n-1 in the denominator of the equation before the square root.)
The true standard deviation over the five values given is actually about 14.06, not 15.7...
It made it hard for me to keep reading when his first description and number are wrong...
poleguy
I like that little poke at journalists: ...
I assume you have met a journalist, before? 8-}