Gaussian Distribution being questioned
Robert Wilde writes "The Financial Times is reporting in two stories that a group of scientists have discovered that any scale-independent system does not follow the traditional Gaussian bell curve but a new curve." Interesting implications for such systems. From what I can gather from the article, for those systems in which this curve is more appropriate, rare events will occur more often than predicted by the Gaussian distribution. Anyone have more comments on this?
The curve is weird looking, but still readily quantifiable. Besides the mean, and standard deviation, you can use skew (un-centeredness) and kurtosis (bulging in the middle) to describe how different a given curve is from a bell-curve.
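To make the skew and kurtosis remark concrete, here is a minimal sketch that computes all four summary numbers from raw data. The function name `moments_summary` and the two sample lists are made up for illustration, and the population (divide by n) formulas are an assumption; other conventions differ slightly.

```python
import math

def moments_summary(values):
    """Return (mean, std, skewness, excess kurtosis) using population formulas."""
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5            # 0 for any symmetric data set
    excess_kurt = m4 / m2 ** 2 - 3   # 0 for an exact Gaussian
    return mean, std, skew, excess_kurt

symmetric = [1, 2, 3, 4, 5]        # hypothetical data, no skew
right_tailed = [1, 1, 1, 2, 10]    # hypothetical data, one large outlier

print(moments_summary(symmetric))
print(moments_summary(right_tailed))
```

Both lists have the same mean, so the mean alone cannot tell them apart; the skewness can.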
More interesting, though, is the fact that the curves shown in the ft.com article weren't properly normalized; comparing these graphs visually doesn't begin to show what the differences are, and the axes on the graphs didn't make much sense. If x = "rarity", then what does y correspond to? Typically you would show y as frequency and x as a value, and from this you would determine rarity.
anyway, my two cents
-m.d.
Easily. This is called the Slashdot effect.
The journalist completely missed the point. There are already lots of other curves that you can use to fit 'non-normal' data (e.g. Weibull, Gamma, etc) that could be more appropriate than a normal curve. The link with chaos theory and the new distribution is interesting though.
'There is no intellectual exercise that is not ultimately useless' - Jorge Luis Borges
This is the driest article to appear on slashdot ever. As Hemos gets closer and closer to grad school the articles get drier and drier. How about doing something interesting, like teaching lab mice how to roll over.
I admit that popular science journalism is lacking (from the articles given here it is really impossible to form any informed opinion about the scientific validity or merit of the work reported on, and no references to that work are given). I quickly checked the first name in the article (Donald Turcotte), and indeed he has been published in a number of peer-reviewed journals, including Science, on self-organized critical behavior. What I don't understand is how you can judge whether the scientists involved in this research are doing good work or quackery based on one popular press article, without trying to examine the facts before jumping to conclusions. I'm not saying what is described in the articles is right or groundbreaking; what I am saying is that these articles alone don't give nearly enough information to form a reasonable opinion (although I don't have enough time to go and do a full literature review to form an informed opinion either).
Don't blame the scientists for poor reporting.
-------- This space intentionally left blank --------
The interesting thing here, and it has long been known but often ignored, is that there are some measurable things in life for which the average value, or the standard deviation, cannot be calculated, because no matter how many values you average together, along comes a bigger value to throw off your average. (Another way of saying that the tails of the distribution are bigger than usual.)
Search for Pareto or Paretian distributions and you can find a few equations that deal with the same thing.
But the article cited was kinda dumb, because in this case, they just found another equation to misuse.
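A rough way to see the "along comes a bigger value" effect is to track the running average of Pareto samples. This is only a sketch: the tail exponents 5.0 and 0.8, the checkpoints and the seed are arbitrary choices for illustration, not anything taken from the articles.

```python
import random

def running_means(alpha, checkpoints=(100, 1_000, 10_000, 100_000), seed=0):
    """Mean of the first n Pareto(alpha) draws, reported at each checkpoint n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    results = {}
    for i in range(1, max(checkpoints) + 1):
        total += rng.paretovariate(alpha)  # draws are >= 1, tail falls off like x**(-alpha)
        if i in checkpoints:
            results[i] = total / i
    return results

# alpha = 5.0: the mean is finite (alpha / (alpha - 1) = 1.25) and the average settles.
# alpha = 0.8: the theoretical mean is infinite and the average keeps drifting upward.
for alpha in (5.0, 0.8):
    print(alpha, running_means(alpha))
```

With the lighter tail the four checkpoints should agree closely; with the heavier tail they typically do not, however many samples are added.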
Turcotte has a 2-part paper on "Self-Affine Time Series" in a recent /Adv. Geophys./ that looks, by timing and title, as if it would provide the technical information those who are really interested would want.
I decided that behaving ethically was the most nihilistic thing I could do. - Paul Pavel
Do not suffer too much. That article is garbage. Read some books on related subjects instead. As an example, Mandelbrot is pretty funny, and many parts of his writing are accessible to a person without a strong mathematics/statistics background.
<^>_<(ô ô)>_<^>
Not to be too crazy, but if this holds up, and others find this curve, it is exceptional. The basic curves of life, and chaos. This is the stuff that explains why a seashell and a universe have the same design. Chaos theory and quantum mechanics both show a certain unpredictability to reality. Science like this shows there is some underlying pattern. At the very least this is extremely interesting, at least for all of us that want our own universe some day.
+&x
That's not an observation or a fact.
You miss, sorry. This is only true for finite-variance variables. Observables in nature are not required to have finite variance - there are plenty of cases when they do not.
Your position is typical for those who just took some statistics classes, but never bothered to check the fine print and understand what it means (no insult, please). But be careful when you make strong statements in public. They sound funny.
Check some references on "stable distributions" in statistics. For physical examples do a search on "Levy flights".
Also, the Gaussian distribution is not "normal" in the sense of occurring frequently. I would claim that a scaling, or "power law" as physicists refer to it, distribution is much more common in natural phenomena.
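The finite-variance fine print can be checked numerically. The sketch below compares how the spread of sample averages behaves for Gaussian draws and for Cauchy draws (a standard example of a stable law with infinite variance). The helper names, sample sizes and seed are illustrative assumptions.

```python
import math
import random

def iqr(values):
    """Interquartile range: a spread measure that exists even for heavy tails."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[(3 * n) // 4] - ordered[n // 4]

def spread_of_means(draw, n_per_mean, n_means=1000, seed=0):
    """IQR of n_means sample averages, each taken over n_per_mean draws."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n_per_mean)) / n_per_mean
             for _ in range(n_means)]
    return iqr(means)

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variable.
    return math.tan(math.pi * (rng.random() - 0.5))

for n in (1, 100):
    print(n, spread_of_means(draw_normal, n), spread_of_means(draw_cauchy, n))
```

Averaging 100 Gaussian draws should shrink the spread by about a factor of ten, while the average of 100 Cauchy draws is about as scattered as a single draw.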
<^>_<(ô ô)>_<^>
Statisticians have said for ages that not all data follows the normal (a.k.a. gaussian) distribution. We even have names for the ways in which distributions differ from the normal. Skewness describes distributions where one tail is stretched out in one direction longer than the other like this, or this more extreme example.
Kurtosis describes the "thickness" of the tails in comparison to the height of the centre of the distribution (i.e. this has more kurtosis than this).
So, with some distributions, the chance of rare events is greater than some others.
Secondly, in the Financial Times articles (not my usual choice of statistical literature) there seems to be little link between the "universal curve" stuff and distributions other than the normal.
---- "First came stats, pulling habits out of rats
Note: this is all dependent on whether this is actual or just some disillusioned scientists. I tend to believe it, mainly because these scientists would most likely not be the type to publish normally, but until I see it from another source I won't totally believe it. That being said, let me argue like I do.
:)
:)
Let's say one night you watch the results of the lottery on TV, and the numbers '1-2-3-4-5-6' come up. Is that a rare occurrence? No. That sequence is as likely to occur as your birthday and your girlfriend's birthday combined into esoteric equations.
Example number 2: I'm with this girl one night. I say my astrological sign is Scorpio. "Really!" she exclaims, "I'm Scorpio too!" What are the probabilities of that happening? 1/144? No, just 1/12. At one point (and cryptos will be familiar with this) if you add people, it becomes a rare event that you do not find people with the same sign.
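The zodiac arithmetic can be worked out exactly. This sketch assumes all twelve signs are equally likely and that people are independent, which is a simplification; `p_shared_sign` is a name made up for the example.

```python
from fractions import Fraction

SIGNS = 12  # assumption: all twelve signs are equally likely

def p_shared_sign(people):
    """Probability that at least two of `people` share a sign."""
    p_all_distinct = Fraction(1)
    for k in range(people):
        # The (k+1)-th person must avoid the k signs already taken.
        p_all_distinct *= Fraction(max(SIGNS - k, 0), SIGNS)
    return 1 - p_all_distinct

for n in (2, 3, 4, 5, 8, 13):
    print(n, float(p_shared_sign(n)))
```

For two people the result is the 1/12 quoted above; under these assumptions a shared sign becomes more likely than not once five people are in the room, and certain at thirteen.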
Both of the examples you give here are actually rare occurrences: not the number series themselves, but the fact that you recognize them as special series. You note their occurrence as extremely rare (the water cooler talk if the lotto was 1-2-3-4-5-6!!), thus in fact making them rare.
These guys were both looking at special curves, in fact random, that turned out to be the same. That is significant in the number of other patterns that can, or cannot, be explained. At the very least this will cause your insurance rates to go up.
We're 6 billion on this Earth. It's bound to happen to someone. Same thing with winning the grand prize lottery once or twice.
That's what the story said: very rare occurrences are more likely. Check out the Drake Equation if you think that couldn't be significant.
cold fusion
This is different (so far) in that it was two totally separate areas of study that found the same thing, not some freaks in the desert.
Cool stuff regardless.
Slashdotia
pronounced Slash-dosh-ya?
+&x
This is all bollocks.
The Gaussian distribution is 'the universal distribution' in the following sense:
Consider a series of events that generate some value. For example, rolling a die, which generates a value from 1 to 6. Assume that these events are independent, meaning that, say, the 10th outcome will in no way influence, say, the 20th outcome. Now take the first N outcomes, add them together and divide by N. The larger you take N, the better the distribution of this average follows the Gaussian distribution. (And I should add that there are some mild conditions that have to be satisfied.)
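The dice construction is easy to try out. The sketch below simulates the average of N rolls many times and compares the spread with the theoretical value sqrt((35/12)/N), where 35/12 is the variance of one fair die. The trial count, the seed and the crude two-standard-deviation check are choices made for this illustration only.

```python
import math
import random
import statistics

SINGLE_DIE_VARIANCE = 35 / 12  # variance of one fair six-sided die

def dice_averages(n_rolls, trials=5000, seed=1):
    """Simulate `trials` averages, each over n_rolls fair die rolls."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls
            for _ in range(trials)]

def share_within_two_sd(averages, n_rolls):
    """Fraction of averages within two theoretical standard deviations of 3.5."""
    sd = math.sqrt(SINGLE_DIE_VARIANCE / n_rolls)
    return sum(abs(a - 3.5) <= 2 * sd for a in averages) / len(averages)

for n in (1, 5, 50):
    avgs = dice_averages(n)
    print(n,
          round(statistics.mean(avgs), 3),
          round(statistics.pstdev(avgs), 3),
          round(math.sqrt(SINGLE_DIE_VARIANCE / n), 3),
          round(share_within_two_sd(avgs, n), 3))
```

A Gaussian puts roughly 95% of its mass within two standard deviations. A single die puts 100% there, while the average of 50 rolls should land close to the Gaussian figure.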
Now what are they saying here? That the 'rareness' of species does not follow the Gaussian distribution? How do you quantify 'rareness'? How can this satisfy any kind of independence condition (where there's one rare animal, there are bound to be more).
What's the weirdest of all is the statement that rare species are more common than expected. What a joke! If something is more common than expected, then by definition it is not as rare as you thought!
Like someone already said, show me a formula for the distribution. I have to guess that if it really has anything to do with Chaos Theory the actual distribution curve is less than half the story. Here is a picture of chi-square (actually it's flip-flopped, but you get the idea): http://www.ruf.rice.edu/~lane/hyperstat/A100557.html
They're trying to sweeten up the deal by placing the guys behind this as innovators who took on a controversial path. That's just downright silly. I took Chaos Theory grad. courses in college, and let me tell you it's so widely-used that it's like saying electricity is a controversial theory. Let me also tell you that what they're trying to say has absolutely nothing to do with Chaos Theory.
I mean! I hope they never make a movie starring Jeff Goldblum about Newton's life, because we might end up refuting Classical Mechanics (even at non-relativistic speeds) tomorrow, wouldn't we? And those movies 'IQ' and 'Young Einstein' really ruined Relativity for me. Drat.
"There is no surer way to ruin a good discussion than to contaminate it with the facts."
This is exactly what one sees when plotting exponential distributions on a log scale. If you have a reaction where A -> B at some rate, then plotting, for example, event durations, you would get this distribution as long as the x-axis is log. When working with any system where there is a delta G of reaction(s), the distribution is not Gaussian and you can see this graphically.
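For the log-axis remark: if durations t are exponential with rate k, then y = ln(t) has density k * exp(y) * exp(-k * exp(y)), which is what a histogram with logarithmically spaced bins traces out. The sketch below evaluates that density on a grid; the rate 2.0 and the grid limits are arbitrary illustrative values.

```python
import math

def log_duration_density(y, rate):
    """Density of y = ln(t) when t is exponentially distributed with the given rate."""
    t = math.exp(y)
    return rate * t * math.exp(-rate * t)

def peak_location(rate, lo=-10.0, hi=5.0, steps=3000):
    """Grid search for the value of ln(t) where the density is largest."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda y: log_duration_density(y, rate))

rate = 2.0  # hypothetical rate constant for A -> B
peak = peak_location(rate)
print("peak at ln(t) =", round(peak, 3), "expected", round(-math.log(rate), 3))
print("two units left of peak :", log_duration_density(peak - 2, rate))
print("two units right of peak:", log_duration_density(peak + 2, rate))
```

The curve peaks at t = 1/rate and is lopsided: it falls off slowly toward short durations and very sharply toward long ones, so it is clearly not a symmetric bell.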
Researchers have known for years that natural processes typically don't follow Gaussian distributions - in fact if a process is self-similar with certain parameters you can mathematically prove that it is non-Gaussian. But it's nice to see this fact get some attention. There was some nice research on self-similar models of network traffic a few years back -- here's an experiment for you: Plot an hour's worth of pings to a distant site. It will be non-Gaussian, or I'll eat my hat.
is not because it's more correct, but because it makes the math nice. and it turns out, the things that we do with that nice math seem to have pretty accurate results. for instance, my data has sufficient outliers that an exponential distribution would probably model them better, but using a gaussian distribution means i can use easy math to get answers that agree with my data to experimental uncertainty, and i'm not going to spend any time making a more precise model that won't get me any improvement in my results (my data was all taken long ago, so the experimental uncertainty is fixed)
>{Yes, I know you can have asymmetric gaussian distributions.}
Sorry, I was thinking three dimensions. It really wouldn't be gaussian if it were asymmetric in two.
As you said, for the central limit theorem you need a lot of independent rv's. I wonder if self-similarity causes interdependent rv's, such that they can be shown to converge to a specific curve, or if their research is just "look at our data - it looks about the same" type stuff.
Does anyone know if there is a formal theoretical basis for this work?
I wonder why there is the sudden interest in this. While I'll admit that many of my colleagues still haven't figured out that the Gaussian curve is not supreme, it has been known for many decades that most things don't follow straight Gaussian randomness (or white noise as many like to call it). Since I started looking at chaos and fractals many years ago, all of the research I've done and looked at, ranging from particle motion, to weather patterns, to fluid dynamics, to DNA, to Internet traffic, to images and textures, to EKG signals, and the list goes on, has had very non-Gaussian but still random characteristics. Our descriptions of the randomness were through chaos and fractal theories.
I'm glad to see that this is getting some press time, but, it does seem strange to me since much of this has been known since well before the 1970s as quoted in the articles.
I suppose it is time to get the word out a little more and throw off the limiting shackles of the Gaussian distribution and white noise
(try brown and pink noise instead.. much more pleasing).
This is interesting but I think that, like many popular accounts, it is misleading. From my undergraduate statistics class I remember the Central Limit Theorem, which (loosely) states that the distribution of the mean of a large number of random samples from a population with a given (finite) mean and variance will be approximately Gaussian, regardless of what the population distribution is. The key here is the word approximately: what the article seems to indicate is that for large samples drawn from chaotically driven processes (a chaotic process being deterministic but extremely sensitive to initial conditions) their curve is a better approximation of the expected distribution than a pure Gaussian. I wouldn't be surprised, though, if as the sample size went to infinity their curve (really a family of curves, like any distribution) became approximately Gaussian too, albeit "slowly".
Thanks, people. I was looking at the graph merely in one dimension, and didn't even consider that it represented an X-Y axis. Now I understand perfectly. :)
"There are no shortcuts to any place worth going."
"Be regular and orderly in your life, so that you may be violent and original in your work." -Flaubert
I would just like to take this opportunity to say that chaos theory is completely misunderstood by almost everybody, especially journalists. Chaos theory is no use in predicting specific events, nor does it pretend to be. Chaos theory tells us that there can be order in random events, but that this order is not predictable.
I think that anyone who deals with large amounts of computer hardware (ie. enough to be a statistically sound sampling) could attest to the claim made.
Certain failures and other occurrences happen much more frequently than one expects from a straightforward analysis of uptimes and standard accepted failure rates.
I love you... Ok I love you AND the UNIX operating system, but then I've known it longer.
You know, this isn't really DIRECTLY on topic, however:
:)
In high school a few years back I was a teacher's aide, and my teachers all talked about the standard curve, blah blah blah.
Yet, never once did I ever see the distribution - but no, I had to be wrong, right? I mean, who am I to tell the teacher they're wrong (not that I was ever slow to disagree >:P )
Anyways, looking at the graph, it seems a bit more realistic than a standard curve, because in reality, intelligence grows fast but falls faster
(much like my grades...quick to raise, quicker to fall)
Oh, and anyone notice how if you turn your head sideways this kind of looks like half a turnip... with this and the poll option, has Rob revealed a secret fetish?
also go here halfway down the page
jump to "turcotte"
+&x
http://www.ft.com/hippocampus/q14ae5a.htm
Also, I don't understand how self-similarity would change the bell curve. You'd think every portion would still have the same probabilities, no?
-- perl -e'print pack"H*","6e656d6f406d38792e6f7267"'
this is just another ruse for the insurance company to raise my flood insurance premiums again!
Chuck
Conspiracy theorist
try { do() || do_not(); } catch (JediException err) { yoda(err); }
The monkeys were less rare and therefore plot to the right, while the programmers, and script kiddies are rare and plot to the left. The "mean" value is the dogs and cats; this plots more to the right.
So what they are saying is that there are more species that have a smaller (rarer) number of critters that they could find. The "most common" value corresponds to the "average" number of critters per species.
I'm guessing now, but if one did a similar survey of the world's population using nationality instead of species, one may get a similar type of distribution.
He had linked to both articles.
sorry.
I think what they mean is that the breakthrough was the linkage between this particular skewed curve and a whole slew of previously 'unpredictable' events (e.g. demagnetisation of magnets using heat, turbulent flows, etc.). Also, the possible applications to other types of so-called 'self-similar' events, if their theory turns out to have some merit.
--
From what I see here, few folks 'get' what these guys 'got.' They seem to have empirical evidence that randomness is an illusion, as per Einstein's own fervent declarations. I predict that upon closer examination, classic 'random' models of *anything* will be found to be seriously flawed. In Real Life, randomness does not exist. At All. Seemingly random events are the convergence of a literally infinite number of causations at one 'point' (again, a sucky term). You don't see those causations because they happen at all sorts of levels, from the quantum on up to that car coming at you. It's Mach's principle on acid. Make sense?
In the 1st article, there is a graph about midway that appears to illustrate the notion that, with the new curve, you are more likely to find the rarest creature than the least-rarest creature. I must not be interpreting right, and I tried reading that part a few more times.
Also, unrelated to the above question, how come it took scientists so long to analyze the obviousness of the microcosm in such detail within the field of statistics? Shouldn't this have been obvious? Why do you think it wasn't? I have no clue.
Isn't this just Murphy's Law?
"The number of suckers born each minute doubles every 18 months."
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
Suppose that ppl's programming skills are statistically Gaussian distributed. These ppl then decide to produce a "new" OS called linux. The contribution of these ppl are then plotted up. One would find that the majority of ppl produced a lot of "minor" improvements, smaller programs, scripts, and responses on mailing lists. There would be a smaller group of ppl that contributed a lot of important stuff.
This is the lognormal statistical distribution, IIRC. A bunch of ppl are capable of writing good code in support of this new OS. Unfortunately, only a smaller subset of these ppl have the time to work on the project for a long period of time. Then only a smaller subset of these ppl have the inclination to volunteer their services for this long period of time. Additionally, only a smaller portion of these ppl have the overall skills to do this. The result is that there are only a few ppl that have all of these attributes.
Sorry for this simplistic explanation (it is late and I should really be sleeping now). A lognormal really arises from a summation of random terms in log space (multiplication in regular space). Another way to view this is to ask yourself a bunch of statistical what-if questions (the questions should really generate a set of answers that are Gaussian distributed). When you answer no, then you are out of the game. More ppl are eliminated early.
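The "multiplication in regular space" point can be sketched with a small simulation: multiply many independent positive factors together, then look at the product and at its logarithm. The factor range (0.5 to 1.5), the number of factors and the seed are arbitrary assumptions for illustration and are not meant to model real contributors.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

def simulate_products(n_factors=20, trials=5000, seed=2):
    """Each trial multiplies n_factors independent uniform(0.5, 1.5) factors."""
    rng = random.Random(seed)
    products = []
    for _ in range(trials):
        value = 1.0
        for _ in range(n_factors):
            value *= rng.uniform(0.5, 1.5)
        products.append(value)
    return products

products = simulate_products()
logs = [math.log(p) for p in products]  # a sum of 20 independent log-factors
print("skewness of products:", round(skewness(products), 2))
print("skewness of logs    :", round(skewness(logs), 2))
print("mean vs median      :", statistics.mean(products), statistics.median(products))
```

The products come out strongly right-skewed, with a few large values pulling the mean well above the median, while their logarithms are close to symmetric, which is the lognormal signature.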
The new curve is broader and more gently sloping, suggesting that the rarest events occur more often than predicted by the bell-shaped curve.
Or, as wizzards have known for years, million to one chances happen nine times out of ten.
But seriously, folks. This reminds me a lot in terms of its applicability to pretty much everything of an article in New Scientist that I also found darn interesting.
The Bayesian community has known about this for many years. It is a log Gaussian, which is the prior commonly used for SCALE PARAMETERS in Bayesian estimation. It is interesting that it applies to other scale parameters, but it's what you'd expect, not some big breakthrough.
Spoken like a true college student. Possibly even a grad student(?)
Just because something does not fit the current model does not mean it's wrong (Well it does when you're in college).
I bet you get A's, don't you? Fudge your lab data a lot?
This might turn out to be on par with cold fusion or it might be significant. Let's wait for the additional research and find out.
-Rich
The analysis of test data for composite materials, in order to determine a single value for yield strength to use for design purposes, uses the Weibull distribution. Metallic materials use Gaussian. If you are interested, track down copies of MIL-HDBK-5 (metallics) and MIL-HDBK-17 (composites) for aerospace applications.
If I remember back to my Random Processes class, not much really is Gaussian. There are two reasons that that assumption is often made. The first is that we have so many tools that assume it, and work OK if we are near it. The second is the abuse of the central limit theorem, which says (correct me here if I'm not precise) that the sum of a large set of random variables tends toward a Gaussian distribution as the number of variables approaches infinity. The problem is that people tend to shortchange the infinity part and exaggerate the "tends to" part.
What we really need to do is stop teaching statistics classes that depend on a gaussian distribution. Down with standard deviations:-).
The infinitely defined portion of the gaussian curve doesn't add anything... because as x -> infinity, gaussian(x) -> 0 much faster... off the top of my head, I can't remember the expression for the gaussian though.
Not every pseudo-random event that you plot will produce a gaussian curve... tracking the rolls of a die will be just a flat linear curve, while tracking the sum of two dice rolled together will produce a peaked (strictly speaking triangular) curve that starts to resemble a gaussian bell curve...
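The dice shapes can be computed exactly instead of tracked by hand. This sketch builds the distribution of the total of n fair dice by repeated convolution; `sum_distribution` is a name invented for the example.

```python
from fractions import Fraction

def sum_distribution(n_dice):
    """Exact probabilities for the total of n fair six-sided dice."""
    dist = {0: Fraction(1)}  # with zero dice the total is certainly 0
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                # Each face is added with probability 1/6.
                new[total + face] = new.get(total + face, Fraction(0)) + p / 6
        dist = new
    return dist

for n in (1, 2, 3):
    dist = sum_distribution(n)
    print(n, {total: str(p) for total, p in sorted(dist.items())})
```

One die gives a flat line, two dice give a triangle peaking at 7, and only with three or more dice does the outline start to round off toward a bell shape.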
There's also more fun you can have with a 'Lorentz' distribution.... as well as however many other distributions there are out there.
If I remember correctly though, a Poisson distribution looks like a discrete gaussian distribution in the limit. Basically for n -> infinity.
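How close the Poisson gets to a Gaussian can be measured directly. The sketch below compares the Poisson probabilities with a Gaussian density of the same mean and variance and reports the largest gap; the three mean values tried and the cut-off for k are illustrative choices.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k events, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_gap(lam):
    """Largest absolute difference between the two curves over k = 0 .. 3*lam + 20."""
    upper = int(3 * lam) + 20
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam))
               for k in range(upper + 1))

for lam in (1, 10, 100):
    print(lam, max_gap(lam))
```

For a mean of 1 the two curves disagree badly near zero, and the gap shrinks steadily as the mean grows, so the Gaussian is a large-mean approximation rather than an identity.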
This article is nothing in and of itself.
Could someone please provide a link to the actual scientific paper(s)?
I wonder how this will affect the whole field of data analysis? If this curve proves to be pretty common, then wouldn't it affect the assumption (in at least the social sciences) that your distribution is normal (i.e. as in z- and t-tests and ANOVA tests)? I guess if you can't assume normality then you will need to find other analysis techniques.
-------------------------------------------- It looks just like a Telefunken U-47! -Frank Zappa
This really isn't the biggest discovery ever. In fact what they've accomplished is to rediscover the base assumptions of the bell curve. The normal curve (bell curve) is a product of stochastic interactions between atomistic events; it pretty much only reflects behavior in systems where new actions are not affected by the history of the system. If you have a saturated system (like the ground being unable to absorb more water in the case of the flooding example) you've got a messed up curve. Any decent book on statistics will give you the basics about this.
-m.d.
That's a very good question. Am I the only person who noticed that this curve - at least what we can see of it - looks very much like the black body radiation curve? If any one knows of a way to get a more complete graph, I would be very interested in seeing it.
Least likely events happen when you least need them.
You can't prove established, proved mathematics wrong, because it doesn't claim anything about reality. It only states that, given some assumptions, the Gaussian distribution emerges. If the assumptions do not fit a certain application, then the Gaussian curve does not necessarily emerge. The assumptions behind the Gaussian curve are very general but certainly not universal. For sure there are phenomena that do not follow the Gaussian curve. (Roughly speaking, if you have lots of independent random variables, their sum follows the Gaussian distribution. That's why it emerges in so many places: most things we measure include the effects of countless independent error sources in an additive manner.) When you sum up those random variables, it's usually the tails which converge most slowly towards the Gaussian distribution. It's therefore easy to find examples where the tails behave in a non-Gaussian way. (No, I didn't read the article.)
Geee! Scientists have discovered chaos again (and again and again).
Genesis, Wind and Wuthering,
MartinS
In the figure given, the x axis (horiz) [if it is indeed a bell curve] is e.g. 'height of person', i.e. it is the value you are measuring, or 'percentage of slashdot readers in a given area who are mathematically incompetent' (which is actually quite high imo - computer literacy != mathematical competence, as I have frequently discovered). The y axis is how often this value of the measurement is found. So the least frequently found measurements are at either end of the x-axis (i.e. huge people or tiny people), where the y-value is going to be low (e.g. no one less than 2 inches high or more than 50ft high). Hope this clears up the facts a bit.
Pretty stupid claim that the distribution is wrong. Gaussian is only an approximation, usually for infinite statistics. When you have a finite one you should use the Poisson distribution, and this is what their curve looks like. I am sure the researchers will find the binomial and exponential distributions next.
The interesting issue here is the reporting of the science. It makes a good story if you can paint a picture of a brilliant free-thinking scientist being oppressed by a conservative scientific majority. And so even the most trivial new theory often gets reported in this light. Think about it - more than half the scientific articles you read in the non-scientific press take exactly this form.
And in all the cases in fields I understand, either the theory is reasonable but the story has totally misrepresented it, or the proponent is a crackpot after publicity.
The difference between this and the gaussian model is that with the gaussian you are merely dealing with the summed behavior of a large number of independent variables. With the SOC model you are dealing with a particular pattern -- the frequency of changes as a function of their magnitude is described by a power law. It's not just a bunch of stuff happening randomly, it's a particular state in which the rarity of a change is correlated in a precise way to its magnitude.
If I'm correct and that *is* what they're talking about, this isn't all THAT new. I have a neuroscientist friend who's been working on applying the SOC model to brain function with some success for a couple years now.
But the article is vague enough that it's not totally clear that's what they're talking about.
BTW, the "new" curve that they show looks like a Rayleigh or a lognormal distribution.
The authors have found things that were mismodeled as gaussian and instead follow another distribution. So what? There are plenty of distributions besides the normal that are asymmetric and have fatter tails.
It *may* be that they've found another distribution that appears in multiple fields, but there's not enough here to judge this as a statistician. If it has any parameters beyond mean and variance, I'm not likely to be impressed--I can probably produce a three parameter beta distribution that's close.
hawk,wearing his Ph.D. statistician hat for the moment
Isn't this whole thing just a fractal?
Spoon not. Fork, or fork not. There is no spoon.
The key that some people seem to be ignoring is that this is only for self-similar phenomena. This gives rise to the asymmetry and emphasis for rare events as compared to the gaussian distribution. The assertion is not that a Gaussian is incorrect, but that it does not accurately model certain self-similar phenomena.
Ok, I'm getting "arbetsskada". I read that and thought to myself, "Sweet name for a new distribution of linux."
Slap.
Hey Jeff, you have misunderstood the article. The article that you read was not very interesting; it was _vague_ reading made for people who do not understand maths. I strongly disagree with the topic: "Gaussian Distribution questioned"... The article was clearly pointless and the writer clearly didn't understand the subject. The Gauss distribution is what it is, and its rigorous construction cannot be undone by any new curve that predicts some events better than the gaussian does.
The September Sci American has an interesting blurb about possible non-gaussian distribution of the Cosmic Background radiation.
http://www.sciam.com/1999/0999issue/0999scicit5.html
Unfortunately, it does not reference the papers that it is based on. Sigh.
MobiusKlein.
Not really your fault. It was labelled poorly (as others have explained) and it was misleading: the two curves that were being compared contained two different areas.
Given an infinite number of samples, the data will perfectly fit some distribution (maybe not any distribution we commonly use or even know about). So unless you are talking about infinite samples, there's no way in HELL that you're going to have your data ever perfectly fit a probabilistic distribution.
All too often... well I do it too... researchers use the Gaussian distribution because it's very easy to use (only two moments). So the Gaussian distribution isn't wrong; it just isn't the correct choice of distribution for that particular experiment.
I saw the New Scientist article, too; it claims that Benford's Law describes the (only) scale independent distribution. Another description can be found in this Mathland column.
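Benford's Law gives the share of numbers whose leading digit is d as log10(1 + 1/d). As a rough sketch of the scale-independence claim, the code below tallies leading digits for the first 1000 powers of 2 and again after multiplying every value by 3; the choice of sequence and of the factor 3 is purely illustrative.

```python
import math
from collections import Counter

def benford(d):
    """Benford's Law: expected share of numbers with leading digit d."""
    return math.log10(1 + 1 / d)

def leading_digit_shares(numbers):
    """Observed share of each leading digit 1..9 among positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

powers = [2 ** n for n in range(1000)]
rescaled = [3 * p for p in powers]  # same data in different "units"

original_shares = leading_digit_shares(powers)
rescaled_shares = leading_digit_shares(rescaled)
for d in range(1, 10):
    print(d, round(benford(d), 3), original_shares[d], rescaled_shares[d])
```

Both tallies should sit close to the predicted shares, with about 30% of the values starting with a 1, and changing the units barely moves them.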
You know, Web search engines are useful for this kind of thing. The Donald L. Turcotte homepage seems wildly out of date. But he also wrote a textbook explaining all this.
The interpretation of this is that the heart is in a self-similar state, that is, all lengths of time between heart beats occur, at all scales - the distribution of which is a power law. The heart is in a similar state to a condensed matter phase transition, that is, its control mechanism keeps the heart in a critically balanced state, ready to change period rapidly.
Donald Turcotte
Dr. Steve Bramwell
John Harte
I guess that the difference from your example is that the equations used to explain the experiments you listed are rather basic ones. As I understood it, the curve described by the article is very complicated, and it would make sense for it to explain just one specific experiment well. But it explained a lot more than expected; actually it turns out that it could be a basic curve as well.
--
"take the red pill and you stay in wonderland and I'll show you how deep the rabbit hole goes"
[]'s Victor Bogado da Silva Lins
^[:wq
Equipment failures are generally assumed to follow the Weibull distribution, which has already been discovered (by Weibull).
jsm
Indeed I suspect that they just have some variation on a lognormal curve Prob not lognorm as has negative numbers on it, but I would guess that it can be put together from beta and gamma functions. More generally, I wonder what the properties of this density function are? Social sciences (or at least, economics fer sure) don't really assume that things are distributed Gaussian -- they just recognise the fact that if you use some sort of screwy distribution that doesn't add, subtract and scale like the Gaussian, you're going to be an old man before you get a working model. On the other hand, if the new curve does scale & add, that would be a discovery .... jsm
Given an infinite number of samples, the data will perfectly fit some distribution (maybe not any distribution we commonly use or even know about).
Well, actually, given an infinite number of samples, the data will (must) fit a Gaussian distribution. All distributions converge in the limit to the normal. It's called the "central limit theorem", and I don't think that it's going anywhere soon, no matter what the "self-similarity" bunch say.
jsm
1. The earth is flat 2. Sun revolves around the earth 3. Moon landing was faked 4. Elvis is not really dead 5. Pink teletubby is really gay
You're right, the journalists missed the point. The part that's "wrong" is the over-application of the Gaussian distribution as a model of everything.
--Mizerai
More crap journalism. No references, as far as I can see, to the original source of the alternative curve so that we can find out what it's really about. Anyone know of a better reference or even the original publication?
This doesn't prove a Gaussian curve is "wrong". What it is saying is that the new data is evidence that a Gaussian distribution is probably the wrong model to be using under the circumstances described. The theory is that self-similar phenomena do not follow a Gaussian distribution, but follow this new distribution. It seems to me that the deep mathematical analysis has not really been done, but the experimental evidence suggests the existence of this distribution. There is probably a lot of work ahead in coming up with a mathematical model for the new distribution. What would be really interesting would be if the mathematical model for the distribution reduces to Gaussian under specific conditions, kind of like how special relativity reduces to classical mechanics at low speeds.
The biggest implication of the model is in the insurance industry. If it is found that floods, fire, earthquakes, and hurricanes follow the new distribution, it may allow insurers to go back to insuring against earthquakes and hurricanes, because they can actually predict long-term income and expenditures more accurately. Maybe they will actually do their job instead of claiming hardship whenever a disaster strikes somewhere.
Insurance Exec: Oh wahhh!!! We can't pay a billion in claims, go to the government.
Translation: We have taken in a net profit of 2 billion dollars over the last two years. But, the billion dollars for this disaster will affect our earnings numbers for the next quarter or two and my stock options will be worthless.
I looked over the articles, and all I can say is "So what?" The Gaussian distribution is based on pure randomness. Did you expect everything to be a completely random event?
Neither article seems to go into great detail about how the new curve was calculated, but it's simply a _FACT_ that applying the Gaussian distribution to most events is considered a "simplification" of the problem, assuming it's random. Take away some randomness, and of course the Gaussian distribution won't fit.
Intelligence (however measured) will not be purely random, nor will floods, grade distributions, tornadoes, or anything
What's missing from both of these pieces is an explanation of the way the new curve was built, and on what foundation. The Poisson (spelling is way off there) distribution is frequently used in place of Gaussian because it "fits better," but again, that doesn't prove that the events have much to do with the math.
This is a case of "curve fitting gone wild" here, and unless I can see someone spell out in scientific detail the relationship between the events and the distribution, I don't buy it. So, they have a new equation, and a new curve; it doesn't mean that the events are related to the math directly. If you look for anything hard enough, you will start to find it everywhere.
I do award them credit for a new curve that better fits some models, if the equation for their curve is manageable. If it's a complex equation, it's worthless, because the whole point is to make some equation fit a distribution of events. If theirs fits, and it's easy to calculate, it's beneficial. But it does not imply a direct correlation between the functions and the variables in the distribution. How do I explain this in SlashDot terms... (/me gets frustrated).
Ok, take Moore's Law, you all know that, right? Processor power doubles every 18 months? Or, more accurately, I believe he stated something to the effect that the number of circuits would double every 18 months. Well, a loosely fit exponential function will almost match this trend (roughly). But then you have to "adjust" the month scale between 12 and 24 until the curve fits well. Now, that's a "model", but it does not prove scientifically that circuits and design engineers are behaving exactly as can be predicted. At some point in the future, everyone has predicted, Moore's Law will fail. See... It's a model! Curve fitting.... It doesn't PROVE anything about what's going on in developers' minds, or much that's tangible other than the "estimation" that things will get more powerful in the computing world.
Now, take it a step further. Say Moore's Law fails right as people develop a new method of increasing computing performance, like, say, 3D circuits, or something not yet conceived, and with fewer "countable circuits" you get more performance. Suddenly, new devices start to have a few fewer circuits, and more power. Now the Moore's Law curve goes down, slowly at first, leveling off, and maybe dropping just a tad, and it starts to look like a "bell shaped curve" only half drawn. You could go "curve fitting crazy" and say "Hey, it's Gaussian, it's going to go down now, and within another 15 years, we will all be back to 8 bit processors!" That's just idiotic.
In short, curve fitting is useful to predict many things, but it cannot be assumed that the curve implies natural phenomena. Any curve that fits data is useful. A curve that fits data does not directly imply complete correlation of events, or definitive proof that God does or doesn't play dice (hope he does personally, has to have fun sometime!). And furthermore:
For those who continue to doubt that it could all be so simple, Prof Turcotte has a suitably direct response. "People say: 'You can't do it because it's too complicated a problem'," he says. "We say: 'Just look at the data'."
So his data fit, so what? Any reasonable math whiz should be able to come up with a few dozen equations that fit a line. Doesn't prove a thing.
Forgive my typos, bad grammar, and spelling; I got pretty pissed at tabloid junk science, and I had to vent. Feel free to prove me wrong. I would like to see how you can prove the new equation and chaos theory are the best "insight into the universe" we have... BTW, if you can prove it, you'll probably be up for a Nobel Prize too.
So the Gaussian distribution isn't wrong, it just isn't the correct choice of distribution for that particular experiment.
My point exactly- the people doing that research probably never intended for the article to lean towards a "Scrap Gaussian! Look at us!" thesis, but that's what happens from time to time when you get journalists in the act of "reporting" science.
Statistics classes which teach Gaussian distributions are fine provided students learn what is required for the central limit theorem to apply. I think you may be mistaken regarding the "noninfinite" part. We can quantify the fluctuations from a Gaussian law in a rigorous manner in cases where a noninfinite number of variables are concerned. The real difficulty, on the other hand, is that people sometimes apply the CLT when it is not valid.
You should add "...approaches infinity, provided the fractional contribution from any one random variable to the sum uniformly converges to zero in the limit as N -> infinity." This is an important distinction. For instance, Levy distributions are a class of stable limit laws for which this is not the case--the largest variable in the sum can in fact dominate the sum. Symmetric Levy distributions may superficially resemble Gaussian laws, but with tails that decay slower (like power laws rather than exponentially fast).
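A quick way to see this distinction is to simulate it. The sketch below is only an illustration, not anything taken from the article: it compares averages of uniform draws (finite variance, so the CLT applies) with averages of standard Cauchy draws (a symmetric Levy-stable law with power-law tails). The function names, sample sizes and seed are my own assumptions.

```python
import math
import random


def draw_uniform(rng):
    # Finite variance: the classical CLT applies to sums of these.
    return rng.uniform(-1.0, 1.0)


def draw_cauchy(rng):
    # Standard Cauchy via the inverse CDF; its variance is infinite.
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_means(draw, n_terms, n_trials, rng):
    """Interquartile range of the mean of n_terms draws, over n_trials repeats."""
    means = sorted(
        sum(draw(rng) for _ in range(n_terms)) / n_terms
        for _ in range(n_trials)
    )
    return means[(3 * n_trials) // 4] - means[n_trials // 4]


def compare(n_terms, n_trials=2000, seed=0):
    """Return (spread of uniform means, spread of Cauchy means)."""
    rng = random.Random(seed)
    return (
        iqr_of_means(draw_uniform, n_terms, n_trials, rng),
        iqr_of_means(draw_cauchy, n_terms, n_trials, rng),
    )


for n in (1, 10, 100):
    spread_uniform, spread_cauchy = compare(n)
    print(n, round(spread_uniform, 3), round(spread_cauchy, 3))
```

The spread of the uniform averages should shrink roughly like 1/sqrt(N), while the spread of the Cauchy averages should stay about the same however many terms you average, because a single huge draw can dominate the sum.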
This article is amusing if only because it is a nostalgic throwback to the days of P.R. and hype over "chaos theory." Call me dense, but I don't understand why something as simple as scale-invariance needs to be dressed in the extra jargon and hype. Assuming the author did not miss anything terribly fundamental, I don't see anything novel in what was reported. Perhaps someone in the know can fill me in on just how exactly this turns statistical physics on its head?
[For those who are interested, Levy distributions are treated quite adequately in Limit Distributions for Sums of Independent Random Variables (Gnedenko and Kolmogorov) (c)1954].
I had not heard about the New Scientist article, but I have known about Benford's law for some years. Indeed without a description of what curve these people think that they have, I am not sure how it *differs* from Benford's law in practical import!
Indeed I suspect that they just have some variation on a lognormal curve. (Which does indeed show up in many different places.)
Incidentally, one of the few things that I disagree with in Knuth is his presentation on Benford's law. Sure, the toy mathematical model he generates is fun and all, but he says nothing about why it applies to the real world. And hence his "proof" says nothing about why real numbers that appear in real computers follow Benford's law. I personally find the general explanation in the article you listed to be far more convincing...
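For anyone who hasn't met Benford's law: it says the leading digit d shows up with probability log10(1 + 1/d). Here is a minimal sketch of the scale-independence angle, assuming a toy dataset spread evenly in log space over six decades (my own choice, not data from either article). Changing units, here multiplying by 2.54, should leave the first-digit frequencies essentially unchanged.

```python
import math
import random


def benford_probability(d):
    """Benford's law: probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def leading_digit(x):
    # Scientific notation puts the leading digit first; assumes x > 0.
    return int(f"{x:.15e}"[0])


def first_digit_frequencies(values):
    counts = [0] * 10
    for v in values:
        counts[leading_digit(v)] += 1
    return [c / len(values) for c in counts[1:]]


rng = random.Random(1)
# Toy data: log-uniform over six decades (an assumption made for illustration).
values = [10 ** rng.uniform(0, 6) for _ in range(100000)]

freq = first_digit_frequencies(values)
rescaled = first_digit_frequencies([v * 2.54 for v in values])

for d in range(1, 10):
    print(d, round(benford_probability(d), 4),
          round(freq[d - 1], 4), round(rescaled[d - 1], 4))
```

This only shows that one scale-free toy dataset obeys the law before and after a change of units; it says nothing about why real-world numbers do, which is exactly the gap complained about above.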
Cheers,
Ben
My usual seat in the cluetrain is at A HREF="http://pub4.ezboard.com/biwethey.ht
Notice that they place 'most common' on the rightmost of the graph, instead of in the center of the curve.
P.S. While I generally try to be tolerant of the differences between British and American English, "maths" really, really grates on my nerves. I wish they'd learn proper slang.
Actually, I was wondering if it weren't related to the Gaussian Orthogonal Ensemble (GOE) distribution, which was a result of much of Wigner's work pioneering Random Matrix Theory (RMT) decades ago.
Mathematically, the GOE distribution characterizes the eigenvalues of a Gaussian distribution of orthogonal matrices containing random elements. (Forgive me if I've got the math a bit wrong; I'm a physicist by trade...)
Physically, the GOE distribution has been popping up in increasingly many physical systems for a while now. Years ago (maybe by Wigner himself? not sure) it was noticed that the energy level spacings of atomic nuclei have statistical properties consistent with the GOE distribution. Some time later, people fooling around with microwave cavities began seeing these distributions as well. The quantum dot folks have also run into the GOE distribution, I believe.
The GOE distribution seems to provide a good test for broken symmetries in a system. As a system's symmetry is gradually broken by, say, shaving off a corner of a piezoelectric crystal, the statistics followed by the eigenvalues (in this example, the resonant frequencies) gradually shift from GOE to Poisson, the latter of which characterizes the eigenvalues of a truly random system.
Now, two really cool things about the apparent universality of the GOE distribution are:
Neat, chaos! Well, sort of. If you take a classically chaotic system, say, a Sinai billiard, and quantize it (solve the Schroedinger equation), time after time you will discover that the eigenvalues of the quantized system have these nice statistical properties that happen to fall out of RMT, namely, the GOE distribution.
So does that mean all quantum systems that follow GOE statistics are chaotic? No. In fact, it's difficult to define what "chaos" really means for a quantum system that has no classical analog. But it implies there's a connection, it certainly is fun to think about, and perhaps continued research will reveal a deeper universal phenomenon at work. I wonder if these researchers haven't taken another step in that direction.
Dang, I wish I had something up on the web about how my research relates to all this... well, you can email me.
wcb
First, let me say that the graph in the article is poorly labeled (or at least their example
Not to mention that the area under the new curve in their graph is significantly more than that under the bell curve. Which means that the total probability is above 1. To use their example, we have a very neat species distribution, say 50% wolves, 50% rabbits, and 30% bears, for a total of 130%... My question is, is the Financial Times always that bad at math?
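The normalization complaint is easy to check numerically. This is a sketch under my own assumptions: the "too tall" curve is a made-up lopsided stand-in (1.3 times x*e^-x), not the function from the article, which was never given. Two densities can only be compared by eye once each has area 1.

```python
import math


def trapezoid_area(f, lo, hi, steps=20000):
    """Plain trapezoid-rule estimate of the area under f on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


def bell(x):
    # Standard Gaussian density; integrates to 1.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def drawn_too_tall(x):
    # Hypothetical lopsided curve plotted 30% too high.
    return 1.3 * x * math.exp(-x) if x >= 0 else 0.0


def normalised(f, area):
    """Rescale f so that its total area becomes 1."""
    return lambda x: f(x) / area


area_bell = trapezoid_area(bell, -10.0, 60.0)
area_tall = trapezoid_area(drawn_too_tall, -10.0, 60.0)
area_fixed = trapezoid_area(normalised(drawn_too_tall, area_tall), -10.0, 60.0)
print(round(area_bell, 4), round(area_tall, 4), round(area_fixed, 4))
```

A curve whose area comes out near 1.3 is the "130% of species" situation; dividing by the area puts it back on the same footing as the bell curve.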
Maybe in the long term, this will be "proven" or "disproven," but does anybody remember the phrase, "Assume a point mass ..." or "Assume a sphere ..." way back in school? What happens in the long run is that it's never as nice as you'd like. That's why there are other distributions than the Gaussian. Why should we assume that everything that can be found has already been found? Take the time to think, folks. Maybe this is like cold fusion, but maybe it's like the transistor for model prediction. More advanced, more accurate models could be the result.
I'll never be as good as I want to be. I can only be as good as I am.
If memory serves me, the "God doesn't play dice" line came from a dispute Einstein had with Niels Bohr, where Einstein didn't believe in the randomness of quantum mechanics but was proven wrong later: God does play dice.
Did I remember correctly?
That's why I said "the article claims they use Gaussian for this and that". I thought someone will have more insight. Yeah, I also heard that insurance companies employ a few good men/women. I would imagine they would use some extrapolated curve based on claims data for a century back or something, not just some silly distribution.
In which case, you're absolutely right: WTF is the target audience for the article? Someone speculating on insurance company stocks?
One of the original researchers in this field is Benoit Mandelbrot, who applied it to financial markets, showing that price changes are not (as is frequently assumed) gaussian. Why the Financial Times did not pick up on this angle is beyond me.
As the mystic said to the hot dog vendor, "Make me one with everything"
Yes, I realize that it is statistics. However, it deals with determining what seemingly random events are most likely to happen everywhere, not in a single closed environment. Statistics are compilations of data. How that data is used is not statistics. "1 out of 10 blah blah blah" is spouting off statistics. What causes that 1 out of 10 is not statistics. That's the gist of this article.
The reason I spoke of Psychohistory is because it is supposed to be using the statistics on an advanced probability engine. This is a step toward that equation. The more refined we get to deciding that "1 out of 10 blah blah blah because [insert reason]" is the closer we get to figuring out the universe and how humanity acts as a whole.
-NYFreddie
Barbie of Borg - She doesn't just Assimilate, She Accessorizes too!
One thing that no one has mentioned is the possibility that this technology could be applied to anything that uses Gaussian distributions to analyze data. That means compression routines like .Z, .jpg and .png could be improved. That is, of course, if the article isn't just a bunch of fluff....
As I read this I wonder if the curve is a real distribution or whether it just models the human mode of perception. We are more likely to notice rare events instead of common ones. For example, take a field of flowers. It's 99% green, but what we notice are the colored bits...
Chemical engineers study a similar relationship between different systems, in a course often called Transport Phenomena after the name of a classic textbook on the subject. Replace the electrical bits in this explanation of System Dynamics with heat flow, and change the names of some terms (since it's not electrical-based), and you've got Transport Phenomena.
It's not exactly surprising that Gaussians aren't scale-invariant. Scale invariance requires
that the distribution obey a property like:
f(ax) = b*f(x), where a is an arbitrary scale and b depends only on a.
Gaussians ( e^(-cx^2) ) clearly can't meet that because the x^2 term is non-linear. Certain
hyperbolic distributions (e.g. 1/(1+cx) ) meet this, at least for large x.
It's a pity the article didn't give the function they
used.
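The scaling property can be checked numerically. In the sketch below (function names and parameter values are my own, chosen only for illustration), the ratio f(ax)/f(x) is computed at several x: if it is the same everywhere, the function is scale-invariant for that a. A pure power law x^-k passes, a Gaussian fails badly, and 1/(1+cx) only approaches a constant ratio once cx is much larger than 1.

```python
import math


def power_law(x, k=2.0):
    return x ** -k


def gaussian(x, c=1.0):
    return math.exp(-c * x * x)


def hyperbolic(x, c=1.0):
    return 1.0 / (1.0 + c * x)


def rescale_ratios(f, a, xs):
    """f(a*x) / f(x) at each x; constant across xs iff f(ax) = b*f(x)."""
    return [f(a * x) / f(x) for x in xs]


xs = [0.5, 1.0, 2.0, 4.0]
print("power law :", rescale_ratios(power_law, 3.0, xs))
print("gaussian  :", rescale_ratios(gaussian, 3.0, xs))
print("hyperbolic:", rescale_ratios(hyperbolic, 3.0, xs))
print("hyperbolic, large x:", rescale_ratios(hyperbolic, 3.0, [1e6, 1e7]))
```

For the power law the ratio is a^-k at every x, which is the "b depends only on a" condition; the Gaussian ratios differ by many orders of magnitude across the same points.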
It's also a really common mistake to use a Gaussian to describe a process where there are limits which make it invalid. An example is the
distribution of intervals in a line ( x e^-x) or a
Rayleigh distribution (for noise intensity)
( x e ^(-ax^2) ; x >= 0). The Gaussian expectations will be systematically wrong for these cases.
Just my half-cents.
rob
(harrison@asterix.jci.tju.edu)
In my own field, the distribution of stock market returns is often taken to be distributed log-normal
You can also start with log returns (instead of "normal" returns). This will give you an approximation to a Gaussian (as opposed to a lognormal distribution), plus they are summable across time. I work almost exclusively with log returns -- they are a pain when you need to calculate portfolios, but nice otherwise.
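To make the "summable across time" point concrete, here is a small sketch with made-up prices (the numbers are purely illustrative). Log returns over consecutive periods add up exactly to the log return over the whole span, while simple returns do not add.

```python
import math

# Hypothetical closing prices, one per period.
prices = [100.0, 104.0, 99.0, 103.5, 110.0]

# Simple ("normal") return per period: p1/p0 - 1.
simple_returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

# Log return per period: ln(p1/p0).
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

total_simple = prices[-1] / prices[0] - 1.0
total_log = math.log(prices[-1] / prices[0])

print("sum of log returns   :", sum(log_returns), "vs", total_log)
print("sum of simple returns:", sum(simple_returns), "vs", total_simple)
```

The portfolio pain comes from the other direction: a portfolio's simple return is the weighted average of its holdings' simple returns, but that is not true of log returns.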
A new distribution that gives increased weight to rare events would be very useful
There are several (e.g. Cauchy), but the problem is that they are much harder to deal with (analytically) than the Gaussian. And if you don't like any, you can always work with the empirical distribution -- no need to pollute the facts with assumptions about what they should be. However, not much of statistics will be useful to you -- the Bayesians offer some good tools.
Getting back to the original point, I wonder if these guys have heard of Hurst and Hurst processes. A persistent Hurst process (sometimes called black noise) will generate something like what they found, and Hurst himself developed his theory on the basis of natural phenomena (he started with the frequency of floods on the Nile, which occurred, surprise, more often than should have been expected). Skim through Peters' "Fractal Market Analysis" for more information.
I bet these guys rediscovered Hurst processes.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
Yes, the Gaussian is one case of the family of so-called "stable" distributions - the only one with a finite variance. A stable distribution is one where, if you add random variables with this distribution, you get a variable with the same distribution (up to shift and scale) - as in the law of large numbers, but without finite variance.
Infinite variance means the tails of the distribution fall off slowly - it is more likely to get an event further from the mean value.
So fucking what? Big news? Hardly.
Stable distributions have a lot of applications in many areas of physics and finance. Do a literature search on "Levy flights" for examples. There was a good article on Levy flights in one recent "Nature" (IIRC). For some financial applications, check out Mandelbrot's very easily written (but for a specialist kinda useless, IMHO) "Fractals and Scaling in Finance". It has some good discussion on the subject.
Guys, you look like fools, making news out of a rather well known field. And discussing it rather childishly...
<^>_<(ô ô)>_<^>
Quick! Somebody fix the SETI client! We don't want to miss any of the alien signals!!
:)
SETI GBC Analysis
Just kidding.
It has been known for some time now that the assumption of infinite degrees of freedom inherent to the Gaussian is inappropriate in many cases where it is applied, essentially due to laziness and its analytic simplicity. The t-distribution, which has fatter tails (ie. rare events are more common) is commonly a better fit, but a pain to use.
Yes, my sentiments also. Does anyone have a link to a page which actually SAYS something on this topic?
Normal distributions only make sense if your fundamental operation is addition. If, however, your fundamental operation is multiplying the random variables of interest, then you get a distribution whose logarithm is normal. Hence the name lognormal.
This is just as natural as a normal distribution, and appears more often than straight normal distributions in subjects like finance and stochastic analysis.
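Here is a rough simulation of that argument, with my own made-up ingredients: each sample is the product of 50 independent factors drawn uniformly from [0.5, 1.5]. The products themselves should come out heavily right-skewed, while their logarithms (a sum of 50 independent terms) should look close to symmetric, which is what lognormal means.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation in units of the standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def product_of_factors(n_factors, rng):
    # Multiply independent positive factors instead of adding them.
    p = 1.0
    for _ in range(n_factors):
        p *= rng.uniform(0.5, 1.5)
    return p


rng = random.Random(42)
products = [product_of_factors(50, rng) for _ in range(20000)]
logs = [math.log(p) for p in products]

print("skewness of products     :", round(skewness(products), 2))
print("skewness of log(products):", round(skewness(logs), 2))
```

Taking logs turns the product into a sum, so the usual central limit argument applies to log(product) rather than to the product itself.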
Cheers,
Ben
This is a well-known fact in statistics and finance: so-called fat tails.
Just in case people don't know where in the heck the above jokes are coming from, go out, buy some books by Terry Pratchett, and have a ball...
Ben
it's a one-in-a-million chance of winning the lottery...
Insert mind here.
Umm, is that article just on crack or what? I am sure that graph (Learning curves: a fresh approach) has to be a misrepresentation of something... Notice how the "old" graph says that to the left we have a small distribution of "rare" species, then (as you move to the right) it gets larger, and then for some reason known only to the business kidz the very unrare species become, umm, rare again :). Obviously this just isn't true; no one maps out Gaussian curves like that.. I think by the asymmetry of the "new" curve they are showing that some things are better modelled using Poisson statistics.. that's what the "new graph" looks like anyway.. the question is what on earth are they trying to show.. what is the Y axis??
:)
Anyway, I'll put money on the fact that whoever these profs are, they're trying to scam cash from financial Wall Street types.. (New curves, new ways to predict the stock market, give us money *cough*) this article was a plant.. but I'm feeling kind of cynical today
-avi
Here is an example of "universal" mathematical models from a Mechanical Engineer:
In school, I took a class called System Dynamics which is all about modeling dynamic behavior of systems. There is an interesting similarity of behavior between electrical, mechanical, and hydraulic systems in the equations used and how you define them.
Driver:
Electrical = voltage
Mechanical = force
Hydraulic = pressure
Flow:
Electrical = current
Mechanical = velocity
Hydraulic = flowrate of fluid
Resistance:
Electrical: voltage = constant*current
Mechanical: force = constant*velocity
Hydraulic: pressure = constant*flowrate of fluid
Capacitance:
Electrical: voltage = constant*integral(current) over time
Mechanical: force = constant*distance traveled
Hydraulic: pressure = constant*integral(flowrate) over time
Inductance:
Electrical: Voltage = constant*delta(current)/delta(time)
Mechanical: Force = constant*delta(velocity)/delta(time)
Hydraulic: Pressure = constant*delta(flowrate)/delta(time)
(In the mechanical example, mass is the constant)
The equations are very similar, but you don't see me calling the press and saying I've found a "universal" mathematical model.
Trying to claim a "universal" law is hype. That there is similar behavior for magnetic properties, turbulent flow, and distribution of species is interesting, but it doesn't suggest that everything is related in a similar way. I think that is why Mr. Turcotte got such a hostile reaction. Before you claim there might be a "universal law linking patterns of mineral deposits, floods and landslides" you had better look at the data first, and not argue from the specific to the general the way he did in this case.
Your password has expired, please login to change it.
As many have pointed out, there is nothing new or nothing surprising in the claim. What the statistical theories claim is that if a variable is truly (mathematically) random the statistical distribution asymptotes to a Gaussian distribution (or the Bell curve). That's not an observation or a fact. That's a theorem which one can prove, in other words, it's more like a definition of a "true randomness" of a variable. Roughly speaking, if something is truly random, its distribution will begin to look like a Bell curve. The real question is, "what is truly random?"
It's almost nonsensical to state that nature does not follow the Gaussian curve just because a statistical variable does not follow it. Perhaps it tells you more about the variable itself. If a variable x has a perfect Gaussian distribution, the distribution of log(x) will look nothing like a Gaussian distribution. Does that tell us the Gaussian curve is not the normal curve? It only tells us that even if x is truly random, log(x) is not.
First, the bell curve is ubiquitous because so many random processes satisfy the assumptions of the Central Limit Theorem. (finiteness...)
However, there are lots of natural phenomena that don't meet those requirements and so we use lots of probability distributions in science. Lorentzian and Poisson distributions come to mind.
It's fascinating, but unsurprising that self-similarity leads to a different kind of probability distribution.
The journalist heads into "Golly Gee" territory once he starts calling it a "Universal Curve".
DK
It assumed its pre-eminence, precisely because it has nice analytic properties (i.e. one can prove pencil-and-paper theorems about it). That does *NOT* imply that all stochastic phenomena obey Gaussians. Their distributions are what they are, and trying to fit any/all data with Gaussians is a disease. Get over it.
Judging from the curve, it looks like it can be built by multiplying several random 'factors' together, instead of adding.
What you're talking about is called the "central limit theorem", which holds for the summation of iid (independent, identically distributed) random variables of any kind of distribution with finite mean and variance.
-- Will quantum computers run imaginary-time operating systems?
If you measure a set of mostly random events you will end up with a bell curve.
      It seems to me that external, modifying events are removed from scientific studies as much as possible.
This act automagically skews the results at least slightly, enough that you will find something else in nature.
It would seem to me that it would be impossible to take EVERYTHING (i.e. everything) that might affect the results into account, so we don't bother trying.
      If we want to predict a mostly random event we apply the bell shaped curve. But I say 'mostly random' because most things are not truly random.
      Just because we fail to predict or fully understand a problem does not mean that it is utterly random. This new curve helps to predict some things. Others might take a whole new curve. I do not believe that there will ever be a universally true curve. All that this points out (gasp) is that not all things are utterly random.
Just because your data doesn't precisely fit the distribution, it does not mean the distribution is "wrong." What it means is your data doesn't match your distribution.
This appears to be another case where journalists have missed the point.
The Gaussian distribution is not "wrong" in any shape or form.
It sounds like something out of an Asimov novel, actually. A common formula that can be used to judge seemingly random events when large masses are considered - the individual is random, but the collective is predictable.
I wonder if I should take this back to school and demand they raise my grades for all those times I "created the bell curve".
-NYFreddie
Barbie of Borg - She doesn't just Assimilate, She Accessorizes too!
Goddess, I love it when the status quo gets shaken up! Woo!
To explain the 'rare more common than common' phenomenon, one need look no further than Hallmark or Precious Moments or crap like that: "We are all special, we are all unique, etc." Blah!
Still giddy, this is cool!
The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk
The House Between - Original Sci-Fi Series
For people in the physical sciences. Many of us have been using binormal or power law (fractal) distributions to characterize our data for some time.
First, let me say that the graph in the article is poorly labeled (or at least their example poorly chosen), IMHO, since "rarity" is related to the number of standard deviations you are from the mean (whether or not the distribution is symmetrical), whereas their graph has rarity monotonically decreasing from left to right. I guess in this sense ("rarity of a species"), rarity != probability.
This new graph strikes me as a bit odd, since it's not symmetrical. With the bell curve, you only need to know how many standard deviations you are from the mean. With this curve, "above the mean" and "below the mean" are vastly different territories.
This curve brings up two questions for me:
I guess this new curve is just another way of saying that "Hey, there's a class of 'random' events out there that share a common non-uniform distribution!" While that's useful to know, I don't see it as the ultimate refutation of the Gaussian distribution.
--Joe--
Program Intellivision!
The point of this article isn't really about math, but that one side or the other in the conservation movement is going to get some ammo to use against the other side.
I suspect from the way that the article is written, that the authors are claiming that the extrapolations about the numbers of species, and by implication the numbers of species that have gone extinct, are too high.
This would imply that the ecological crisis claimed by the extremist conservationists is vastly overstated.
So this is a political piece, not a technical one. (I'm not going to comment, one way or the other, about the ecological issues that are implicit in this article. That's a completely different discussion.)
skg
Rumor has it that Wall Street quants use fractal correlations to predict/exploit financial time series. This has been a long-term interest of Mr. Fractal himself, Dr. Mandelbrot, who wrote a Scientific American cover story on it a few months back.
The use of the Gaussian curve is based on the assumption that the random variable we are considering is actually generated as an average of many, many independent random variables. It has been shown that for all 'reasonable' independent random variables, in the limit their average will have a Gaussian distribution. This is straightforward mathematics; no arguing with this.
As such, from a mathematical point of view this has nothing to do with replacing the Gaussian curve... it is still clearly the most 'natural' mathematical curve. However, what I understand the authors to be claiming is that certain types of real-world events are not actually Gaussian and are described better by this model. This shouldn't be that surprising, as often the 'extreme' cases are not caused by a mere sum of the independent random variables mentioned earlier.
For instance, intelligence might be regarded as the influence of a great many small random variables (how some genes got arranged, upbringing, etc.), but the true tail-end cases such as mental retardation do not occur because all of these factors go bad (someone who is retarded is usually the result of some genetic defect, not a combination of bad upbringing, poor nutrition, etc.). This is probably not the kind of thing the distribution describes, but it shows that the Gaussian really never has been the be-all and end-all.
So while this is undoubtedly a very interesting subject, it really isn't that exciting. Oh, and the claim that the greater incidence of natural disasters disproves the Gaussian was really BS; while they may not be Gaussian, this doesn't appear to be a large enough sample size to make such definitive claims.
Marriage is the "pseudo-ethics" that cloaks the messy truth of sexuality in the raiment of propriety -- it's "Don't Ask,
you're suggesting we call this the beanie-baby curve?
"The number of suckers born each minute doubles every 18 months."
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
You have grasped half of the situation: the derivation of the Gaussian curve depends on the assumption of independence of events. When you deal with self-similar systems, there is an overall structure (e.g., a fractal structure) which means there is a coherence (correlation) between seemingly unrelated events (you just may not be able to articulate precisely what that correlation is -- the fractal structure proves it exists nonetheless). In the real world, there is very often a low-level interdependence among events, and the existence of that interdependence means there is no reason whatsoever to expect those events to fall along a Gaussian distribution. What they have identified (as I interpret this very incomplete description of the discovery) is that there is apparently a universal form of distribution of the events causing evolution in self-similar (fractal) systems. This is something I definitely want to look into, but if something along the lines of my imperfect present understanding is in fact the case, this is a very, very big deal!
The bell shaped Gaussian distribution is just a model. In many cases, there are good theoretical reasons why it makes sense to use the Gaussian model (e.g. the central limit theorem, easy to work with mathematically, etc.), but there are many different statistical models in use. For example, noise in undersea communications is better modeled using a Laplacian distribution than a Gaussian. So I don't really see what's surprising about the fact that real data doesn't really fit the Gaussian model.
This article is just confusing.
Anyone out there who can give an honest description of this whole thing in mathematical terms?
(would be nice if it was someone who was in maths-sup-math-spe...thanks...)
zeb
Research Guide to the Palestinian-Israeli Co
I happened to think of one possible reason why so many phenomena might fit a lopsided curve better: The bell curve implies the possibility of infinite extension in both directions. If the mean of the distribution is near one physical extreme (for instance, looking at average rainfall levels -- you can't have negative rainfall), then the curve must become lopsided.
Perhaps that's what they've stumbled onto?
--Joe--
Program Intellivision!
That curve is just the probabilistic map taken from a column right after the period 3 in any bifurcation map. Other columns also give different curves.
-- You are in a twisty maze of passages, all alike.
Aren't you guys missing a point?
We have different distributions and none are "good" or "bad" by default, as anyone with a science degree knows, but:
1. This article is in the Financial Times
2. The scientists proposing the new distribution have established it partially based on data about the frequency of forest fires and air turbulence
3. The article claims that for natural disasters insurance companies currently use the Gaussian distribution
4. Insurance companies = Tera $$$ = what "financial" stands for in the web site's name
So simply, if it is true that insurance companies use the Gaussian distribution for e.g. forest fires, and research shows that the actual data doesn't fit that distribution, then there's a good chance those companies will use this research and insurance premiums for such events will change. For the finance world, that IS big news.
If this wasn't related to $$$, you wouldn't see an article like this in the Financial Times. They don't give a damn about chaos theory or anything, unless it means $$$.
For the implications of this new distribution in science, a better article with formulas and such would do the rest of us good.
The Gaussian distribution is 'the universal distribution' in the following sense:
Consider a series of events that generate some value. For example, the roll of a die, which generates a value from 1 to 6. Assume that these events are independent, meaning that, say, the 10th outcome will in no way influence, say, the 20th outcome. Now take the first N outcomes, add them together and divide by N. The larger you take N, the better the distribution of this average follows the Gaussian distribution. (And I should add that there are some mild conditions that have to be satisfied.)
Perhaps you should get the calculation right. You have to subtract N times the expected value, then divide by the square root of N to see a real Gaussian distribution emerge.
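For anyone who wants to try the recipe from the two comments above, here is a minimal sketch in Python. It assumes fair six-sided dice, and it also divides by the per-die standard deviation (an extra step not mentioned above) so that the result has unit variance; the function names, roll counts and seed are all made up for illustration.

```python
import random
import statistics

def standardized_dice_sum(n_rolls, rng):
    """Return (sum - n*mu) / (sigma*sqrt(n)) for n_rolls fair six-sided dice."""
    mu = 3.5                   # expected value of one fair die
    sigma = (35 / 12) ** 0.5   # standard deviation of one fair die
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return (total - n_rolls * mu) / (sigma * n_rolls ** 0.5)

def simulate(n_rolls=100, n_trials=10_000, seed=0):
    """Repeat the experiment n_trials times and collect the standardized sums."""
    rng = random.Random(seed)
    return [standardized_dice_sum(n_rolls, rng) for _ in range(n_trials)]

if __name__ == "__main__":
    z = simulate()
    print("mean:", statistics.fmean(z))
    print("stdev:", statistics.pstdev(z))
    # For a standard Gaussian about 68% of values lie within one sd of zero.
    print("fraction within 1 sd:", sum(abs(v) <= 1 for v in z) / len(z))
```

If the independence assumption holds, the mean should come out near 0, the standard deviation near 1, and roughly two thirds of the values within one standard deviation.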
Now what are they saying here? That the 'rareness' of species does not follow the Gaussian distribution? How do you quantify 'rareness'? How can this satisfy any kind of independence condition (where there's one rare animal, there are bound to be more).
What's weirdest of all is the statement that rare species are more common than expected. What a joke! If something is more common than expected, then by definition it is not as rare as you thought!
I would have thought that to be pretty darned obvious. The species is rare if there are few members of that species. Their claim is that there are more species with only a few members than one would expect. (My retort is that humans have a lot to do with how many rare species there are!)
Let me say that again slowly.
Step 1: Categorize species based on how many members they have.
Step 2: Look at the distribution of species by the population of the species.
The non-technical articles don't tell us anything useful when it comes to judging their math. But given your math, when it comes to labelling things mathematical nonsense I can only think, pot, kettle, black.
Sincerely,
Ben
My usual seat in the cluetrain is at A HREF="http://pub4.ezboard.com/biwethey.ht
So, what you're telling me is that according to this distribution certain rare events will happen more often......like Microsoft releasing stable and useful products, actually getting help from tech support, etc., etc..... ;-) Microsoft Enthusiast
I don't think the new distribution is due to the mean being near one physical extreme. This situation is very common and has been studied by statisticians. In such cases the data is not expected to be normally distributed, although sometimes a "transformation" (e.g. 1/x or sqrt(x)) can be used to get a normal distribution from such a data set.
What I gathered from the article is that the Gaussian distribution is still correct. They have just discovered a set of situations which follow a different distribution. While it may have important implications for some fields, this is not earth-shattering. Many phenomena have non-Gaussian distributions (e.g. nuclear events follow a Lorentzian distribution).
Too bad the article didn't give many details about this new distribution!
Why coincidences happen more often than they should, and it's not all 'in our heads' like the psychobabblists like to tell us.
:)
My bet is that it has something to do with quantum mechanics. Those things always seem to crop up and show us how classical physics can be full of shit under the right circumstances.
You really can read the other guy's mind. You just don't know it
Anonymous Coward, get it?
Not bad spelling, bad typing
Too bad you never see any other than in a magazine...
However, chaos theory, which can be considered a new form of geometry, demonstrates by strict observation that nature is better represented by fractals than by Euclidean forms (even complex ones). A good example is the length of the coast of England: if you use a straight meter stick to measure it, you obtain a certain length. Now use something a foot long: your total will be higher, because you now have to follow irregularities that your meter stick had to ignore. Now take the smallest ruler you can, something near 10^-35 meter. If you have the patience to do it, you will obtain something approaching infinity. That's it: a finite space with an infinitely long boundary! This can shock your common sense, but it is still strictly a question of observation and mathematics. Nothing new, just the point of view.
Fractals and self-similarity are more natural than Euclidean geometry and Gaussian probabilities, IMHO. Early scientists used to name everything they found, distinguishing the aorta from the veins, fingers from toes, etc. In chaos theory, it seems easier to talk about the self-similarity of a single component. A vein is really just a small aorta, and fingers are really like toes that evolved differently because of different conditions. Chaos simply says this: Euclidean mathematics may be easy for the human mind, but it is not for nature. Nature prefers to use the simpler way of fractal geometry, with no regard for our simple minds. Chaos is not a new invention; it is only a tool that gives us a new model of the world around us, a model that is more precise than the previous one.
Finally, a little remark on the context in which Einstein said, "God does not play dice!" It was about quantum physics, which says that probability is inherent to particles, for example in the uncertainty between their position and their energy. A particle CAN USE this uncertainty to, e.g., gain enough energy to get itself out of the atom. That's called radioactivity. The problem Einstein had with this is that, even theoretically, you can never predict what will happen. It's just as if God had said: "You're lucky to be here, but there's still some chance that all the air of the atmosphere will suddenly leave the planet. I made things this way, so even I cannot know which will happen." In fact, one interpretation of quantum theory simply states that all the possibilities happen; we are simply not there to see the others.
Fabien Ninoles -- Debian GNU/Linux Developer
Damn discordians...
This looks like a job for SLASHDOT...
Seriously...
Either the graph is backwards or journalismology, the study of hype and fluff in published science, is prevailing here.
Actually it's worse: rarity IS VERTICAL, NOT HORIZONTAL.
Put differently however...
We should have two statistics:
One for situations where there's a constant change that affects the system and indirectly itself.
Use the new curve here.
For example: Number of hours of studying and test scores.
Small to high shouldn't be too steep in the beginning, but if you study for a very long time, there should be a quick drop because you don't actually understand the material and are attempting to memorise it.
Which proves something I've known for a while:
Computer Newbies Are Not Stupid. Give them some information and they get better. Spoonfeed them and they're helpless!
The old curve should still apply to where a hundred rocks land if dropped from a height. There's no interaction between the objects themselves.
As for the obviousness, BLAME IT ON NETWORK TV. All the mindnumbing leading brand detergent and pharmaceutical commercials are responsible for giving people a false sense of completion as far as statistical studies are concerned.
My take on it is DUH! or 32DOHS.wav.
The message on the other side of this sig is false.
That would make so little difference in most real world situations. If the average rainfall is 1 inch +- .1 inches, then according to the model, negative rainfall has probability > 0, but 10 sd's below the mean is ~10^-30 (top of the head estimate). That's not going to skew anything. AC
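The "top of the head" estimate above is easy to check. A minimal sketch, assuming rainfall is modelled as exactly Gaussian with mean 1 inch and standard deviation 0.1 inch (the numbers from the comment above); `normal_cdf` is a helper name made up here, built on the standard library's `math.erfc`.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

mean_rain = 1.0   # inches, from the comment above
sd_rain = 0.1     # inches, from the comment above

# Probability mass the Gaussian model assigns to negative rainfall,
# i.e. to values more than 10 standard deviations below the mean.
p_negative = normal_cdf((0.0 - mean_rain) / sd_rain)
print(p_negative)
```

The value comes out at roughly 8e-24 rather than 10^-30, but the point stands either way: the mass below zero is far too small to skew anything.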
"Universality of rare fluctuations in turbulence and critical phenomena" S.T. Bramwell, P.C.W. Holdsworth and J.-F. Pinton, Nature 396, 552 (1998)
Now why didn't the FT just give a reference, and save us all some time?
The concept that there could be a "universal" curve behind all statistical phenomena is silly. About as silly as the implication that scientists currently think Gaussian curves are universally applicable.
Every statistical process has a different distribution. Some are Gaussian, some become Gaussian under certain limits, and some can be approximated as Gaussians. But many others are simply something completely different: Lorentzian, Poissonian, etc. (Many of which have fatter tails than a Gaussian distribution.)
These guys may or may not have come up with something. But there's certainly no news value to the claim that some distributions are non-Gaussian.
-Steve Stuart
All that this article really says is that it appears that certain natural data fit a distribution other than the Gaussian. OK, fine. So do other processes. Besides, without some details on the distribution itself (give me formulae!) there's no real way to evaluate it.
"You can never have too many elephants on your team."
If you take the law of large numbers into account, you can make this assumption rapidly. It obviously states that anything outside the ordinary can and will happen due to external and internal events. I highly doubt the "group of scientists" thought of this simple truth before making this assertion. The assertion you're making is based on a law that defines behaviour similar to this. If you state that the behaviour follows another curve for a particular sample, then another sample with a different distribution will follow yet another. Keep in mind the Gaussian distribution assumes a lot of the output is due to unknown factors. This keeps the values representative in the analytic major, but it proves a central behaviour for a percentage, not the entirety. So to say that something follows a new curve is silly. It's nothing new. That's like skewing statistics for your own personal use. Please, teach your children math, so they don't grow up to make stupid assumptions.
"Fear the scientists! They can CONTROL your life! They know WHEN FLOODS will HAPPEN!"
Or maybe I'm reading too much into scientific illiteracy.
But I really _must_ protest the whole "Donald Turcotte recalls the hostile reaction he received when he suggested there might be a universal law linking patterns of mineral deposits, floods and landslides..." *sigh*
I hope someone comes up with a better article on this.
-=Best Viewed Using [INLINE]=-
Phrases like "has discovered a new mathematical curve" and "is derived from chaos theory" may sound good but don't tell me anything. Does anyone have a link to more informative articles?
I personally prefer the more voluptuous curves.
I wonder what implications this has, if any, for mechanics (classical and maybe more so quantum). With quantum mechanics being in large part a statistical analysis, I wonder if some of these non-Gaussian distributions would better describe quantum mechanical events? I don't know. I'm just askin'.
Now only if they GPL the curve. We should all write in and request they opensource the curve before microsoft can copyright it.
one thing that i sort of wondered is if there was a possibility that the curve is somehow changing shape over time?
-i mean, theoretically, universal laws could be allowed to change slightly over millions of years, yes/no?
assuming that every rare occurrence has the probability of spawning an infinite number of new probable and improbable outcomes, one would think the gaussian model would be correct, but i'm sort of looking at this from the perspective that an infinite number of rare occurrences might not yet exist (or is this simply things that already exist?)
hence, probable outcomes could only increase every time a rare occurrence becomes probable, whereas rare outcomes have the capability to increase...well, randomly?
-adam
showtell.com
javanet.com/~user
September 3, 1999
Wow!
It is interesting to see the response that this "research" article in the financial times generated. I'm a research associate (Bruce Malamud) working closely with Donald Turcotte. A student wrote me about the discussion your web site was having. Donald Turcotte was one of the scientists "quoted" in the financial times article. My research area has been in the areas of "time-series analysis" and also applying ideas of fractals and self-organized criticality to natural hazards. I did my Ph.D. with Donald Turcotte and am now doing a brief stint as a postdoc while I look for a "real" job in the world.
First of all, this Financial Times article was a "quickly" researched article on the part of the person who wrote it. Donald Turcotte was contacted and interviewed by phone on Tuesday/Wednesday, with no contact afterwards from the Financial Times to see how correct they got the overall picture. This is how things are and he and I both gulped when we saw how the article appeared. We quickly prepared a short "response" from him (below) to the deluge of e-mails and telephone calls that he received yesterday.
Bottom line, he was a bit misquoted, but the general idea holds. We are talking about applying the ideas of power-law frequency-size distributions (i.e., fractals) to extreme events, including floods, forest fires, earthquakes, landslides, etc. Donald Turcotte has been active for many years in the area of applying fractals, self-organized criticality, and chaos theory to the earth sciences, and yes, he knows very well that he did not "invent" the idea, just made many applications (well, a bit more than that, but read his book).
On the most basic level (and no, I'm not trying to be insulting, I'm sure many people on this site know what I'm talking about already as this is basic statistics), the idea is a very simple one. Plot the frequency-size distribution of a set of data and see which curve best fits the data, i.e. what the underlying distribution might be. For some sets of data (such as forest-fire burn areas, earthquakes, and many other "natural" data sets) the frequency-size distribution follows a nice straight line in log-log space, i.e. it follows a power-law (fractal or self-similar) distribution. Although one cannot say for SURE what an underlying distribution is, one can make certain (statistical) guesses as to whether a distribution more closely follows a Gaussian, log-normal, power-law, etc.
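The "straight line in log-log space" check described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the event sizes are synthetic draws from the standard library's `random.paretovariate`, the exponent 1.5 is an arbitrary choice, and a plain least-squares fit to the cumulative frequency-size curve stands in for whatever fitting procedure the papers actually use. `loglog_slope` is a made-up helper name.

```python
import math
import random

def loglog_slope(sizes):
    """Least-squares slope of log10(fraction of events >= s) versus log10(s)."""
    ordered = sorted(sizes)
    n = len(ordered)
    xs = [math.log10(s) for s in ordered]
    # The i-th smallest value (0-based) is equalled or exceeded by n - i events.
    ys = [math.log10((n - i) / n) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
alpha = 1.5  # hypothetical power-law exponent, chosen only for illustration
sizes = [rng.paretovariate(alpha) for _ in range(20_000)]

slope = loglog_slope(sizes)
print("fitted log-log slope:", slope, "(expected to be near", -alpha, ")")
```

For data that really follow a power law, the fitted slope should land near minus the exponent; data from a Gaussian or log-normal would instead curve downward on the same plot.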
Once one "believes" that a set of data follows a certain distribution, one can then begin to make some guesses as to what an "extension" of that curve might bring in time. If one has 30 years of flood-discharge data, one might then be able to make certain predictions as to the "size" of what the 100-year flood might be. Same with earthquakes. One has a better idea of the probability of having a certain size or greater earthquake, flood, forest fire, etc. each year. It just happens that many of these events appear to follow power-law distributions, and these are not as "accepted" in the statistical community.
Don just came in and is looking over my shoulder. He adds (to my above comments) that statisticians do not in general recognize power-law distributions because one cannot define a pdf for them. (Although one can define pdf's for certain distributions that are similar distributions to the power-law distributions, such as the Pareto distribution).
So...in terms of the insurance community, they are of course very interested in whether a given "natural hazard" appears to follow a power-law distribution rather than a log-normal or Gaussian one, as the resulting recurrence intervals will be very different. Power-law distributions tend to be very conservative for extreme events, i.e. one would expect more large events in a given period of time than with, say, a Gaussian distribution. Others of course interested in this underlying distribution would be engineers trying to decide how big a flood one might expect in a given area in a given amount of time (and yes, we're dealing with extreme events, so the statistics are small and unsure), so as to know where people can build houses, how deep to make the bridge supports, etc. Bottom line is the statistics are unsure because the data sets are small, but people need some sort of a starting point, as a lot of money rides on the answers of what the "underlying" distribution might be.
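To make the point about recurrence intervals concrete, here is a rough sketch. Every number in it is hypothetical: the two models are not fitted to the same data (or to any data), the Gaussian has mean 1 and standard deviation 1 in arbitrary units, and the power law is a simple Pareto-style tail with exponent 2. It only illustrates how differently the two tails decay.

```python
import math

def gaussian_exceedance(x, mu, sigma):
    """Annual probability of an event of size >= x under a Gaussian model."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

def power_law_exceedance(x, x_min, alpha):
    """Annual probability of an event of size >= x under a Pareto-style tail."""
    return (x / x_min) ** (-alpha) if x >= x_min else 1.0

def recurrence_interval(p_annual):
    """Average number of years between events with annual probability p_annual."""
    return 1.0 / p_annual

size = 6.0  # hypothetical event size, in the same arbitrary units
gauss_years = recurrence_interval(gaussian_exceedance(size, mu=1.0, sigma=1.0))
power_years = recurrence_interval(power_law_exceedance(size, x_min=1.0, alpha=2.0))
print("Gaussian model:  one such event every", gauss_years, "years")
print("power-law model: one such event every", power_years, "years")
```

With these made-up parameters the power-law model expects the event about once every 36 years, while the Gaussian model puts it at millions of years, which is why the choice of distribution matters so much to whoever is pricing the risk.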
There are also many scientific implications, ranging from simply "describing" what distribution a data set best follows, to understanding better (or in a different way) the underlying basic physics or equations that describe a given natural phenomenon, through a better understanding of the statistics resulting from the equations vs. the actual data. In addition, many scientists are now beginning to think that the pervasive power-law distribution in nature is a general indication of self-organized critical behavior. One definition of self-organized critical behavior is when one has a complex system with a small steady input, and a power-law distribution of the "avalanches" (the events). Donald Turcotte and I wrote a paper (in Science, see below) applying this general idea of self-organized criticality to computer models and forest fires. Of the references listed below, this is probably the easiest for people to get.
OK, before I start babbling. Below is the "reply" that Donald Turcotte wrote to many of the e-mails that came in during the last day.
Bruce Malamud
_________________________________________
Wednesday September 2, 1999
Ithaca, NY, USA
Dear Interested Reader:
Due to the large number of e-mails and telephone calls I have received with respect to the articles by Michael Peel, "New Curve Makes Life Predictable" and "Redrawing the Curve Reveals New Pattern of Events", that appeared in the Financial Times, September 2, 1999, I have prepared a short general reply. If you have further questions or comments after reading the below "comment" to the article, please do not hesitate to contact me for further information.
These Financial Times articles emphasize the importance of power-law (also called fractal or fat-tail) distributions in estimating the probability of occurrence of extreme events. It is unfortunate the article implies that I invented the idea of power-law distributions, which have been recognized now for many decades. For instance, earthquake hazard assessment is based mainly on the Gutenberg-Richter relation, which is a power-law distribution of the number of earthquakes as a function of their magnitude [for some papers where I discuss this, see DLT, Annual Review of Earth and Planetary Sciences, Vol. 19, p. 263-281, 1991; DLT, Physics of Earth and Planetary Interiors, Vol. 111, 275-293, 1999].
My work in power-law distributions is based on the concept of fractals, which is due to the pioneering work of Benoit Mandelbrot [for instance, see his book, The Fractal Geometry of Nature, Freeman, San Francisco, 1982]. Mandelbrot, along with many other researchers, has applied the concept of fractals to many phenomena in the natural and "man-made" world, including to financial time series. Other distributions, similar to the power-law, such as the Pareto distributions, have also been used for a long time. A good web page which discusses fractals and has many links is The Spanky Fractal Database (http://spanky.triumf.ca/).
My own contributions have concerned applications to natural hazards and related phenomena. These are set forward in detail in my book [DLT, Fractals and Chaos in Geology and Geophysics, 2nd ed., Cambridge University Press, Cambridge, 1997] and in a major review paper on self-organized criticality [DLT, Reports on Progress in Physics, Vol. 62, 1999, available as a pdf document (preprint) which can be sent upon request].
The principal contributions of my group have been the applications of fractal distributions to:
(1) Fragmentation (by explosions in asteroids, etc.). [DLT, Journal of Geophysical Research, Vol. 91, p. 1921-1926, 1986]
(2) Mineral deposits. [DLT, Economic Geology, Vol. 81, p. 1528-1532, 1986]
(3) Floods. [DLT and L. Greene, Stochastic Hydrology and Hydraulics, Vol. 7, p. 33-40, 1993; DLT, Journal of Research NIST, Vol. 99, p. 377-389, 1994; B.D. Malamud, DLT, and C.C. Barton, Environmental and Engineering Geosciences, Vol. 2, p. 479-486, 1996. The last paper is available as a pdf document at http://coastal.er.usgs.gov/barton/pubs_online.htm]
(4) Landslides. [J.D. Pelletier, B.D. Malamud, T. Blodgett, and DLT. Engineering Geology, Vol. 48., p. 255-268, 1997; available as a postscript file at http://www.gps.caltech.edu/~jon/]
(5) Forest Fires. [B.D. Malamud, G. Morein, and DLT. Science, Vol. 281, p. 1840-1842, 1998; available as a pdf document for subscribers of Science, web site: http://www.sciencemag.org/]
Many extreme-value events are directly related to time series that exhibit persistence or memory (for instance, time series of temperature, river discharge, the stock market, etc.). A good reference to applying persistent techniques (and a discussion of how to apply the techniques) is Advances in Geophysics, Vol. 40, B.D. Malamud, J.D. Pelletier, and DLT.
Two other colleagues that have used power-law techniques applied to natural hazards include Dr. Bruce D. Malamud (Cornell University, e-mail: Bruce@Malamud.Com) and Dr. Christopher C. Barton (USGS, e-mail: barton@usgs.gov, home page: http://coastal.er.usgs.gov/barton/).
Again, please do not hesitate to contact me for further questions.
Donald L. Turcotte
Maxwell Upson Professor of Engineering
:::::::::::::::::::::::::::::::::::::::::::::::
:: Donald L. Turcotte
:: Department of Geological Sciences
:: Cornell University, Snee Hall
:: Ithaca, NY 14853-1504, USA
:: Office: 607-255-7282; Fax: 607-254-4780
:: e-mail: turcotte@geology.cornell.edu
:::::::::::::::::::::::::::::::::::::::::::::::
"In the same way, the distribution of species in 1 sq km of forest ought to be similar to that found in one square metre of the same habitat. Likewise, a turbulent or magnetic system is made up of a series of miniature systems, each of which is made up of a set of yet smaller arrays."
Who comes up with this? Are the scientists expecting us to believe that this is valid information?
"Self-similarity in the distribution and abundance of species", J.Harte, A. Kinzig, J.Green, Science 284 334 1999
The Gaussian is very useful and very important, but it is easy to misuse it. The Gaussian *cannot* be strictly correct when measuring a quantity that cannot be negative (such as number of individuals in a species, or an absolute strength of magnetization) because it extends with non-zero probability infinitely in both directions. (It may still be a very good approximation, however, if the mean is many standard deviations above zero.)
Another point is that any deviations between your true distribution and the Gaussian are likely to be most noticeable at the extremes. Studies of extreme values should not use Gaussian distributions unless there is good theoretical and/or experimental support for doing so. Assuming a Gaussian distribution of heights is fine if you want to sell trousers, but is likely very poor if you want to recruit basketball players or jockeys.
does the new curve apply to the rolling of two six sided dice and if so how can one use it to make money at the craps table?
Could someone tell me what is really new in that? Looking at the articles (examples), the presented curves are a simple convolution of the Gaussian function and phase space (sorry for the slang; probability density (?)).
What these guys are saying is that they have discovered a series of apparently unrelated problems that appear to share the same, non-gaussian distribution.
Processes that follow the mechanism that gives rise to the distribution are not Gaussian but slightly skewed in their outliers. I'm waiting to read something more technical before I conclude that they are full of shit.
In my own field (financial modelling) we don't have a good distribution for stock market activity. It's usually modelled lognormal, but it is clear that there are too many big drop days for the true distribution to be lognormal. People try to patch lognormal distributions, but the results are mixed. We use the lognormal distribution because it's the best one that we've got realizing that we're not going to catch those big drop days in the models. If a new distribution has a reasonable physical model underlying it with parameters that are readily estimable, you can bet that we'll use it. But we just can't tell from *this* article what exactly they've found here.
I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
The problem here is how you define and measure a rare occurrence. Let me give you an example.
Let's say one night you watch the results of the lottery on TV, and the numbers '1-2-3-4-5-6' come up. Is that a rare occurrence? No. That sequence is as likely to occur as one made from your birthday and your girlfriend's birthday combined by esoteric equations.
Example number 2: I'm with this girl one night. I say my astrological sign is Scorpio. "Really!" she exclaims, "I'm Scorpio too!" What are the probabilities of that happening? 1/144? No, just 1/12. At one point (and cryptos will be familiar with this) if you add people, it becomes a rare event that you do not find people with the same sign.
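The sign-matching arithmetic above is the birthday problem with 12 "days". A minimal sketch, assuming all 12 signs are equally likely and people are independent (neither is exactly true in real life); `prob_shared_sign` is a name made up for this example.

```python
from fractions import Fraction

def prob_shared_sign(n_people, n_signs=12):
    """Probability that at least two of n_people share a sign,
    assuming every sign is equally likely and people are independent."""
    if n_people > n_signs:
        return Fraction(1)  # pigeonhole: a match is certain
    p_all_distinct = Fraction(1)
    for k in range(n_people):
        # The (k+1)-th person must avoid the k signs already taken.
        p_all_distinct *= Fraction(n_signs - k, n_signs)
    return 1 - p_all_distinct

for n in range(2, 8):
    p = prob_shared_sign(n)
    print(n, "people:", p, "=", float(p))
```

For two people this gives exactly 1/12, as stated above, and under these assumptions a shared sign becomes more likely than not once five people are in the room.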
All that graph is showing me is that the guys (I'm hesitating to call them scientists - I mean, they published in "serious papers"? Come on. Names, please) looked purposefully for freak occurrences, discarding other "rare" occurrences that were perfectly normal. That's why the left side of the graph is wider.
Thing is, the Gaussian curve doesn't come out of nowhere; it's not arbitrary. For instance, in statistical mechanics and quantum mechanics, you get bell curve distributions precisely because of the distribution of particle states.
All these guys are saying is, "rare events are not as rare as we think they are". That's not because the bell curve is wrong, it's because we seem to forget how huge a sample the Earth provides.
What are the odds of being struck by lightning twice? One in a billion? We're 6 billion on this Earth. It's bound to happen to someone. Same thing with winning the grand prize lottery once or twice.
And, again, same thing with floods or tornadoes. Yes, in themselves they're rare. When taken alone they seem improbable. But on the scale of the planet, that's the kind of thing that happens.
Alright, anyone got another article on cold fusion lying around?
"There is no surer way to ruin a good discussion than to contaminate it with the facts."
What a frustrating article. There are no technical details, and no links. Does anyone know what this wonderful, new, universal curve actually is?
Curiously, the article seems devoid of references to any articles on this new curve -- any one out there know the journals/articles? Any hard math posted online somewhere?
Yeah, that explains one side; but specifically (I think) all they are saying is that the bell curve is skewed in some cases. BFD, this is not a major breakthrough based on my reading of the article. Perhaps the author is not explaining a real breakthrough well, or a silly researcher has deluded himself into thinking he's found something new.