Cause and Effect: How a Revolutionary New Statistical Test Can Tease Them Apart
KentuckyFC writes Statisticians have long thought it impossible to tell cause and effect apart using observational data. The problem is to take two sets of measurements that are correlated, say X and Y, and to find out if X caused Y or Y caused X. That's straightforward with a controlled experiment in which one variable can be held constant to see how this influences the other. Take, for example, a correlation between wind speed and the rotation speed of a wind turbine. Observational data gives no clue about cause and effect, but an experiment that holds the wind speed constant while measuring the speed of the turbine, and vice versa, would soon give an answer. But in the last couple of years, statisticians have developed a technique that can tease apart cause and effect from the observational data alone. It is based on the idea that any set of measurements always contains noise. However, the noise in the cause variable can influence the effect but not the other way round. So the noise in the effect dataset is always more complex than the noise in the cause dataset. The new statistical test, known as the additive noise model, is designed to find this asymmetry. Now statisticians have tested the model on 88 sets of cause-and-effect data, ranging from altitude and temperature measurements at German weather stations to the correlation between rent and apartment size in student accommodation. The results suggest that the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases (provided there are no confounding factors or selection effects). That's a useful new trick in a statistician's armoury, particularly in areas of science where controlled experiments are expensive, unethical or practically impossible.
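For anyone who wants to poke at the asymmetry described in the summary, here is a minimal sketch. It is not the researchers' code. The polynomial regression, the HSIC dependence score, the helper names (`hsic`, `residual_dependence`, `anm_direction`) and the toy data (Y = X^3 plus uniform noise) are all assumptions picked for illustration. The idea: fit a smooth function in each direction and check in which direction the leftover noise looks independent of the input.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC dependence score with Gaussian kernels (median bandwidth)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    n = len(a)
    H = np.eye(n) - 1.0 / n              # centering matrix
    K, L = gram(a), gram(b)
    return np.sum((H @ K @ H) * L) / n**2

def residual_dependence(cause, effect, degree=5):
    """Fit effect = f(cause) + noise; score dependence of residuals on cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

def anm_direction(x, y):
    """Prefer the direction whose residuals look more independent of the input."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    forward = residual_dependence(x, y)   # hypothesis: X causes Y
    backward = residual_dependence(y, x)  # hypothesis: Y causes X
    return ("X->Y" if forward < backward else "Y->X"), forward, backward

# Toy data (assumed, not from the article): X really does cause Y here.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = x**3 + rng.uniform(-1, 1, 500)

direction, fwd, bwd = anm_direction(x, y)
print(direction, "forward score:", fwd, "backward score:", bwd)
```

A lower score means the residuals look more like pure noise. As the summary says, this only has a chance of working when there are no confounders or selection effects, and a linear relationship with Gaussian noise gives no asymmetry to detect.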
How does this cause effect you?
>provided there are no confounding factors or selection effects
So that'll provide plenty of material for medical researchers, nutrition researchers, education researchers and economists to keep doing what they're doing.
So once we start using this on everything, 1 out of every 5 times, it will lead us to bogus conclusions with false statistical confidence....
So the noise in the effect dataset is always more complex than the noise in the cause dataset....... the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases
In other words, not always.
"First they came for the slanderers and i said nothing."
Reading through the article, it wasn't clear to me how it is determined whether it worked correctly or not.
But still, an interesting statistical breakthrough, and one that allows researchers to ask interesting questions about their data.
Lies, damned lies and statistics.
It must have been something you assimilated. . . .
Well, of course it can. How do you think causation is determined? First by noticing a correlation. There can't be causation without correlation.
Gawd I hate the brain-dead fools who thoughtlessly parrot, "Correlation is not causation!"
I drank too much wine, I must take a piss.
So if Z causes both X and Y, I assume that this amazing test gives garbage?
You blithering idiots have not exactly solved Hume's fundamental Problem of Induction.
Is that a joke for the quantitatively pedantic?
Hey, we have this new technique. It's somewhere between 0% and 80% reliable.
Now that they've found a way to filter out ("ignore") data that doesn't fit, maybe now they'll actually be able to conclusively prove that climate change exists!
Oh, wait. They are already ignoring the data that doesn't fit, so I guess this won't help. Well, maybe sometime in the next 50 years they'll actually come up with a model that is accurate for more than 2-3 years in the future.
80 percent accuracy would get you laughed out of a room 100% of the time.
I only read the summary, not TFA, but it doesn't seem like a new idea to me. I can think of scads of areas in engineering in which assuming there is (typically independent, Gaussian) noise in the model and/or the measurement is the basis that makes the calculations work out. E.g., using random perturbations and a Kalman filtering algorithm to uncover the model of an unknown system knowing only the output of the black box.
Many other attempts at detecting causality exist. There's one based on dynamical systems theory (Takens' theorem): in a multidimensional, causally linked dynamical system, all the information in the high-dimensional system can be recovered from multiple values of a single dimension over time.
The method works by reconstructing values of X from lagged vectors of Y(t), using the nearest-neighbor lagged vectors of Y in a training set. As the training set gets larger, the predictions get better. If they keep getting better, X probably causes Y. The idea that the noise in X(t) shows up in Y(t) but not the other way around is implicitly captured in that approach, although not in a statistically rigorous way.
Sugihara et al. Science 2012 (sorry about paywall).
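A rough sketch of that cross-mapping idea, for the curious. Everything specific here is an assumption for illustration: the toy system (two logistic maps where x drives y with a made-up coupling of 0.3), the embedding dimension E=2, the exponential neighbor weighting, and the function names. It is not the published implementation.

```python
import numpy as np

def simulate(n, coupling=0.3, discard=200):
    """Toy system: x drives y, y never feeds back into x."""
    x, y = 0.4, 0.2
    xs, ys = [], []
    for t in range(n + discard):
        x, y = x * (3.8 - 3.8 * x), y * (3.5 - 3.5 * y - coupling * x)
        if t >= discard:                 # drop the initial transient
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

def cross_map_skill(source, target, lib_size, E=2):
    """Estimate target(t) from lagged vectors of source, using a library of
    lib_size points; returns the correlation of estimates with the truth."""
    # Row i is (source[i+E-1], ..., source[i]), aligned with target[i+E-1].
    M = np.column_stack([source[E - 1 - k: len(source) - k] for k in range(E)])
    M, tgt = M[:lib_size], target[E - 1:][:lib_size]
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point may not be its own neighbor
    nn = np.argsort(d, axis=1)[:, :E + 1]
    nd = np.take_along_axis(d, nn, axis=1)
    w = np.exp(-nd / np.maximum(nd[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * tgt[nn]).sum(axis=1)
    return np.corrcoef(estimate, tgt)[0, 1]

x, y = simulate(1000)
# Reconstruct X from lagged vectors of Y at growing library sizes.
skills = {L: cross_map_skill(y, x, L) for L in (50, 200, 800)}
for L, rho in skills.items():
    print(f"library size {L}: skill of X estimated from Y = {rho:.3f}")
```

If the skill keeps climbing as the library grows, that is the convergence the method looks for as a sign that X influences Y. Running the same function with the arguments swapped gives the comparison for the opposite direction.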
You don't need to do this. It is obvious if you just look at the raw fucking data.
Statisticians have gone full circle (and full retard). They started out "adjusting" data - dropping outliers, massaging noisy data, etc. all in an attempt to make it fit a preconceived pattern in order to shit together a graph for a powerpoint. Now they want to add that "noise" back in (but only after tweaking it further).
Rule fucking 1 of data analysis: Don't fucking delete your fucking data you fucking dipshit.
2) A whole bunch of people totally ignoring this study because they don't like what it means.
Finally, we can discover whether increased crime causes ice cream sales to rise...or if it's the other way around.
It implicitly presumes that there is some relatively direct causal relationship between the two events.
Fundamental flaw.
The turbine example is poor. Adequate data will show causality in time between a wind gust, and a delayed turbine rotation rate. Momentum easily causes a lag between one data set and the other, and the concept of time running in one direction can easily be used to suggest causality.
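To make the lag argument concrete, here is a small sketch on simulated data. The numbers are invented: the gust model, the 0.8 momentum factor and the noise levels are assumptions, not measurements from any real turbine. It looks for the time shift at which wind and rotor speed line up best.

```python
import numpy as np

def lagged_corr(a, b, lag):
    """Correlation between a[t] and b[t + lag]."""
    if lag > 0:
        a, b = a[:-lag], b[lag:]
    elif lag < 0:
        a, b = a[-lag:], b[:lag]
    return np.corrcoef(a, b)[0, 1]

# Simulated series (assumed model): the rotor responds to the previous
# wind sample and keeps most of its own speed (momentum).
rng = np.random.default_rng(1)
n = 5000
wind = np.zeros(n)
rotor = np.zeros(n)
for t in range(1, n):
    wind[t] = 0.7 * wind[t - 1] + rng.normal()
    rotor[t] = 0.8 * rotor[t - 1] + 0.2 * wind[t - 1] + 0.05 * rng.normal()

corrs = {k: lagged_corr(wind, rotor, k) for k in range(-10, 11)}
best_lag = max(corrs, key=corrs.get)
print("best lag (samples):", best_lag)
```

A positive best lag means the rotor trails the wind. That only suggests a direction, since a hidden third factor with different delays could produce the same pattern.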
I am more curious about a test that would show if 2 data sets are clearly caused by a third non-measured factor.
Which direction in time does cause/effect flow? The world may never know.
I looked at the article - I don't understand how this is different than a covariance matrix?
Example of bad data: a series of measurements of windspeed that has, during the series, a block of flats put right next to it.
This is the "Garbage In" that climate deniers are supposed to be against because it gives "Garbage Out", however, they often DEMAND the garbage data is put in, without any reference to the factors you provide for making sense of that bad data.
Can they say:
Does A cause B? Probably not.
Does B cause A? Probably not.
So there's probably a C causing A and B.
There's a lot of probablys in that.
... light at the end of the tunnel re: Chicken v. Egg... Pretty interesting though!
Wind is generated by the turbine, and turbine spins due to the wind. If one only takes measurements at the steady-state situation, there is no way to tell which is cause and which is effect!
This excellent blog article describes a technique developed by Judea Pearl decades ago to do exactly this. Would be interested to understand how this is different/better.
How should noise be interpreted with regard to altitude or temperature? I guess it is much more related to observational error.
Hmmm. There is a lot more noise in the global temperature data than there is in the atmospheric concentration of CO2.