Cause and Effect: How a Revolutionary New Statistical Test Can Tease Them Apart
KentuckyFC writes: Statisticians have long thought it impossible to tell cause and effect apart using observational data. The problem is to take two sets of measurements that are correlated, say X and Y, and to find out if X caused Y or Y caused X. That's straightforward with a controlled experiment in which one variable can be held constant to see how this influences the other. Take, for example, a correlation between wind speed and the rotation speed of a wind turbine. Observational data gives no clue about cause and effect, but an experiment that holds the wind speed constant while measuring the speed of the turbine, and vice versa, would soon give an answer. But in the last couple of years, statisticians have developed a technique that can tease apart cause and effect from the observational data alone. It is based on the idea that any set of measurements always contains noise. However, the noise in the cause variable can influence the effect but not the other way round. So the noise in the effect dataset is always more complex than the noise in the cause dataset. The new statistical test, known as the additive noise model, is designed to find this asymmetry. Now statisticians have tested the model on 88 sets of cause-and-effect data, ranging from altitude and temperature measurements at German weather stations to the correlation between rent and apartment size in student accommodation. The results suggest that the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases (provided there are no confounding factors or selection effects). That's a useful new trick in a statistician's armoury, particularly in areas of science where controlled experiments are expensive, unethical or practically impossible.
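For readers who want to see the asymmetry idea in action, here is a minimal sketch, not the method from the paper. It assumes synthetic data, a cubic polynomial as the regression model, and a biased HSIC estimate with unit-bandwidth Gaussian kernels as the dependence score. All function names (`standardize`, `polyfit`, `residuals`, `hsic`, `anm_direction`) are made up for illustration. The idea: fit effect = f(cause) + noise in both directions and prefer the direction whose leftover noise looks less dependent on the input.

```python
import math
import random

def standardize(values):
    """Rescale to zero mean and unit variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the Vandermonde matrix A.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def residuals(xs, ys, degree=3):
    """What is left of ys after a polynomial fit on xs (assumed model class)."""
    coef = polyfit(xs, ys, degree)
    return [y - sum(c * x ** k for k, c in enumerate(coef))
            for x, y in zip(xs, ys)]

def hsic(a, b):
    """Biased HSIC estimate; larger values suggest stronger dependence."""
    a, b = standardize(a), standardize(b)
    n = len(a)

    def centered_gram(v):
        k = [[math.exp(-0.5 * (p - q) ** 2) for q in v] for p in v]
        row = [sum(r) / n for r in k]
        total = sum(row) / n
        return [[k[i][j] - row[i] - row[j] + total for j in range(n)]
                for i in range(n)]

    ka, kb = centered_gram(a), centered_gram(b)
    return sum(ka[i][j] * kb[i][j] for i in range(n) for j in range(n)) / n ** 2

def anm_direction(x, y):
    """Pick the direction whose residuals look more independent of the input."""
    x, y = standardize(x), standardize(y)
    forward = hsic(x, residuals(x, y))    # model: y = f(x) + noise
    backward = hsic(y, residuals(y, x))   # model: x = g(y) + noise
    return ("x -> y" if forward < backward else "y -> x"), forward, backward

# Synthetic example where x really is the cause: y = x^3 + independent noise.
rng = random.Random(0)
x = [rng.uniform(-2, 2) for _ in range(300)]
y = [xi ** 3 + rng.uniform(-1, 1) for xi in x]
direction, forward, backward = anm_direction(x, y)
print(direction, forward, backward)
```

Note the caveat from the summary still applies: a sketch like this says nothing useful when a hidden third variable drives both measurements, and the polynomial model class is an arbitrary choice.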
>provided there are no confounding factors or selection effects
So that'll provide plenty of material for medical researchers, nutrition researchers, education researchers and economists to keep doing what they're doing.
So once we start using this on everything, 1 out of every 5 times, it will lead us to bogus conclusions with false statistical confidence....
So the noise in the effect dataset is always more complex than the noise in the cause dataset....... the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases
In other words, not always.
"First they came for the slanderers and i said nothing."
Reading through the article, it wasn't clear to me how it is determined whether it worked correctly or not.
But still, an interesting statistical breakthrough, and one that allows researchers to ask interesting questions about their data.
Lies, damned lies and statistics.
It must have been something you assimilated. . . .
Well, of course it can. How do you think causation is determined? First by noticing a correlation. There can't be causation without correlation.
Gawd I hate the brain-dead fools who thoughtlessly parrot, "Correlation is not causation!"
I drank too much wine, I must take a piss.
So if Z causes both X and Y, I assume that this amazing test gives garbage?
You blithering idiots have not exactly solved Hume's fundamental Problem of Induction.
Is that a joke for the quantitatively pedantic?
Hey, we have this new technique. It's somewhere between 0% and 80% reliable.
80 percent accuracy would get you laughed out of a room 100% of the time.
80% of the time it confirms the scientists' expectations.
I only read the summary, not TFA, but it doesn't seem like a new idea to me. I can think of scads of areas in engineering in which assuming there is (typically independent, Gaussian) noise in the model and/or the measurement is the basis that makes the calculations work out. E.g., using random perturbations and a Kalman filtering algorithm to uncover the model of an unknown system knowing only the output of the black box.
Many other attempts at detecting causality exist. There's one based on dynamical systems theory (Takens' theorem): in a multidimensional, causally linked dynamical system, all the information in the high-dimensional system can be recovered from multiple values of a single dimension over time.
The method works by reconstructing values of X from lagged vectors of Y(t), using the nearest-neighbor lagged vectors of Y in a training set. As the training set gets larger, the predictions get better. If they keep getting better, X probably causes Y. The idea that the noise in X(t) shows up in Y(t) but not the other way around is implicitly captured in that approach, although not in a statistically rigorous way.
Sugihara et al. Science 2012 (sorry about paywall).
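A rough sketch of the nearest-neighbor reconstruction described above, for anyone who wants to play with it. This is a simplified illustration, not the published algorithm: the coupled logistic maps, their parameters, the embedding dimension of 2, and the exponential distance weighting are all assumptions chosen for the demo, and the names `coupled_logistic`, `pearson` and `cross_map_skill` are hypothetical.

```python
import math

def coupled_logistic(n, x0=0.4, y0=0.2, coupling=0.32):
    """Two logistic maps in which X drives Y but Y has no effect on X."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.7 - 3.7 * x))
        ys.append(y * (3.7 - 3.7 * y - coupling * x))
    return xs, ys

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def cross_map_skill(source, target, lib_size, dim=2):
    """Estimate target(t) from the nearest lagged vectors of source.

    Returns the correlation between estimates and true values, using only
    the first lib_size points as the training set (leave-one-out).
    """
    vectors = [tuple(source[t - k] for k in range(dim))
               for t in range(dim - 1, lib_size)]
    truth = target[dim - 1:lib_size]
    estimates = []
    for i, v in enumerate(vectors):
        # The dim + 1 nearest neighbours, excluding the point itself.
        nearest = sorted((math.dist(v, w), j)
                         for j, w in enumerate(vectors) if j != i)[:dim + 1]
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimates.append(sum(w * truth[j] for w, (_, j) in zip(weights, nearest))
                         / sum(weights))
    return pearson(estimates, truth)

xs, ys = coupled_logistic(500)
for lib in (50, 100, 200, 400):
    x_from_y = cross_map_skill(ys, xs, lib)  # evidence for "X causes Y"
    y_from_x = cross_map_skill(xs, ys, lib)  # evidence for "Y causes X"
    print(lib, round(x_from_y, 3), round(y_from_x, 3))
```

The thing to watch is the trend across library sizes rather than any single number: under the argument above, the estimate of X from Y's lagged vectors should keep improving as the training set grows if X really drives Y.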
2) A whole bunch of people totally ignoring this study because they don't like what it means.
Don't fucking delete your fucking data you fucking dipshit.
Unless, of course, you know that some fucking data is bad, and other fucking data is good. In that case it makes sense to fucking delete the fucking bad data.
Finally, we can discover whether increased crime causes ice cream sales to rise...or if it's the other way around.
But then how are you supposed to get your research published?
sysadmins and parents of newborns get the same amount of sleep.
Wow, so angry! Look at all those fucks and fuckings you wrote! boy are you mad, ropeable even...spitting tacks, cross and angry angry!
So much vile, so much hatred, so much angry anger and swearing and lots and lots of fucks!
Boy, you are as mad as anybody I have ever seen.
So angry!
Wait...verifying........ ...yes, he's angry alright! So angry!
So...um...why so angry bro?
You torture the data until they confess.
I would like to officially confirm that, indeed, OP is angry.
sexconker, can you please point to me the place on the doll where the bad ebil statistician touched you?
We will get you some therapy sorted out. Please don't rape, torture and mutilate the dead body of an innocent person in the meantime.
So angry!
It implicitly presumes that there is some relatively direct causal relationship between the two events.
Fundamental flaw.
Now that they've found a way to filter out ("ignore") data that doesn't fit, maybe now they'll actually be able to conclusively prove that climate change exists!
AGW skeptic here, but I'd be very cautious about applying this technique to climatic data to try and prove anything.
This technique works best where there are a limited number (read: two) of variables, and a clear cause and effect (i.e. one variable is dependent). At least that's my understanding.
Climate data is mindbogglingly complex, with a huge number of known variables with known and unknown dependencies. Even something as seemingly straightforward as the carbon cycle has a large number of feedbacks, which (again as I understand it) would only mess this approach up.
To my mind, the AGW hypothesis either succeeds or fails based on the predictions it makes and how much in-line those predictions are with observed reality. Clever statistical tricks don't help nor lend credibility in either case.
The turbine example is poor. Adequate data will show causality in time between a wind gust, and a delayed turbine rotation rate. Momentum easily causes a lag between one data set and the other, and the concept of time running in one direction can easily be used to suggest causality.
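To make the lag argument concrete, here is a small sketch on made-up data. Everything in it is an assumption for illustration: the gusts are white noise around the mean wind speed, the turbine responds three samples late with some smoothing standing in for momentum, and the helper names `pearson` and `lagged_correlation` are hypothetical. It simply looks for the time shift at which the two series line up best.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(first, second, lag):
    """Correlation between first[t] and second[t + lag]."""
    if lag >= 0:
        return pearson(first[:len(first) - lag], second[lag:])
    return pearson(first[-lag:], second[:len(second) + lag])

# Synthetic data: gust deviations from the mean wind speed, and a turbine
# whose rotation rate follows the wind three samples later, smoothed.
rng = random.Random(1)
n = 2000
wind = [rng.gauss(0, 1) for _ in range(n)]
turbine = [0.0] * n
for t in range(3, n):
    turbine[t] = 0.7 * turbine[t - 1] + 0.3 * wind[t - 3]

# A positive best lag means changes in wind show up in the turbine later.
best_lag = max(range(-10, 11),
               key=lambda k: lagged_correlation(wind, turbine, k))
print(best_lag)
```

A positive best lag only suggests that wind changes precede turbine changes. As other comments point out, a third factor that hits both series with different delays would produce the same picture.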
I am more curious about a test that would show if 2 data sets are clearly caused by a third non-measured factor.
There's no such thing as bad data, only bad methods of taking measurements. If you can't quantify, precisely and deterministically, what's wrong with your measurement method, then trying to just noise-filter the resulting data is a net loss of real data.
Which direction in time does cause/effect flow? The world may never know.
I looked at the article - I don't understand how this is different than a covariance matrix?
It seems you don't like statistics ... wondering how you would 'look at' TBs of raw data (e.g. from the LHC)?
Example of bad data: a series of measurements of windspeed that has, during the series, a block of flats put right next to it.
This is the "Garbage In" that climate deniers are supposed to be against because it gives "Garbage Out", however, they often DEMAND the garbage data is put in, without any reference to the factors you provide for making sense of that bad data.
Can they say:
Does A cause B? Probably not.
Does B cause A? Probably not.
So there's probably a C causing A and B.
There's a lot of probablys in that.
... light at the end of the tunnel re: Chicken v. Egg... Pretty interesting though!
Wind is generated by the turbine, and the turbine spins due to the wind. If one only takes measurements in the steady-state situation, there is no way to tell which is cause and which is effect!
This excellent blog article describes a technique developed by Judea Pearl decades ago to do exactly this. Would be interested to understand how this is different/better.
I love statistics. I hate "statisticians".
You can't know your data is bad when doing experimentation. That's the point of experimentation - you control variables and observe others to test a hypothesis.
The point at which you can KNOW data is bad is the point at which you know all of the variables and all the details of the phenomenon you're observing. It's like "experimenting" with 1+1 on a calculator. When it gives you a 12 you know you've got bad data (you keyed in 11+1 or 1+11 or something), but that's only because you know the entire system and what it's supposed to do. It's not an experiment at that point, and there's no fucking point in doing it.
If you're experimenting on something then you don't know the entire system. If you don't know the entire system then you cannot know for sure whether any data is bad or not.
Even without going to that extreme, "bad data" (even obviously "bad data") is merely a failure to control variables. The methodology and the experiment as a whole are then suspect. Repeat with better control and methodology, or deal with the small amount of ugliness in the graph that the "bad data" may have contributed.
Understandable reaction to the quantity of smugness in the story?
How should noise be interpreted with regard to altitude or temperature? I guess it is much more related to observational error.
Hmmm. There is a lot more noise in the global temperature data than there is in the atmospheric concentration of CO2.
Now that they've found a way to filter out ("ignore") data that doesn't fit, maybe now they'll actually be able to conclusively prove that climate change exists!
Oh, wait. They are already ignoring the data that doesn't fit, so I guess this won't help. Well, maybe sometime in the next 50 years they'll actually come up with a model that is accurate for more than 2-3 years in the future.
There's undoubtedly more noise in climate data than in CO2 data, so you've just reminded us about the "climate makes CO2 rise, not the other way around" argument and it is now even more clear that it is false. Good job!
Star Trek transporters are just 3d printers.