Slashdot Mirror


Cause and Effect: How a Revolutionary New Statistical Test Can Tease Them Apart

KentuckyFC writes: Statisticians have long thought it impossible to tell cause and effect apart using observational data. The problem is to take two sets of measurements that are correlated, say X and Y, and to find out if X caused Y or Y caused X. That's straightforward with a controlled experiment in which one variable can be held constant to see how this influences the other. Take, for example, a correlation between wind speed and the rotation speed of a wind turbine. Observational data gives no clue about cause and effect, but an experiment that holds the wind speed constant while measuring the speed of the turbine, and vice versa, would soon give an answer. But in the last couple of years, statisticians have developed a technique that can tease apart cause and effect from the observational data alone. It is based on the idea that any set of measurements always contains noise. However, the noise in the cause variable can influence the effect but not the other way round. So the noise in the effect dataset is always more complex than the noise in the cause dataset. The new statistical test, known as the additive noise model, is designed to find this asymmetry. Now statisticians have tested the model on 88 sets of cause-and-effect data, ranging from altitude and temperature measurements at German weather stations to the correlation between rent and apartment size in student accommodation. The results suggest that the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases (provided there are no confounding factors or selection effects). That's a useful new trick in a statistician's armoury, particularly in areas of science where controlled experiments are expensive, unethical or practically impossible.
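The asymmetry described in the summary can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the 88-dataset study: the cubic polynomial regression, the kernel dependence score (a biased HSIC estimate with a fixed bandwidth), the synthetic data, and every function name below are choices made for this example. The idea is to regress each variable on the other and keep the direction whose residuals look less dependent on the input.

```python
import numpy as np

def rbf_gram(v, sigma=1.0):
    # Gaussian kernel matrix for a 1-D sample (bandwidth is an assumption).
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(a, b):
    # Biased HSIC estimate: near 0 when a and b are independent, larger otherwise.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(a) @ centering @ rbf_gram(b) @ centering) / n ** 2

def residual_dependence(cause, effect, degree=3):
    # Fit effect = f(cause) + noise, then score how much the leftover
    # noise still depends on the supposed cause.
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

def infer_direction(x, y):
    forward = residual_dependence(x, y)   # hypothesis: x causes y
    backward = residual_dependence(y, x)  # hypothesis: y causes x
    label = "x->y" if forward < backward else "y->x"
    return label, forward, backward

# Synthetic data in which x really does cause y (nonlinear, additive noise).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 400)
y = x ** 3 + rng.normal(0.0, 1.0, 400)

direction, fwd, bwd = infer_direction(x, y)
print(direction, fwd, bwd)
```

In the true direction the residuals are just the injected noise, so they carry almost no information about x. In the reverse direction no function of y leaves residuals that are independent of y, and the dependence score picks that up. Confounders and selection effects are not handled at all, which matches the caveat in the summary.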

30 of 137 comments

  1. No problem. by TechyImmigrant · · Score: 4, Insightful

    >provided there are no confounding factors or selection effects

    So that'll provide plenty of material for medical researchers, nutrition researchers, education researchers and economists to keep doing what they're doing.
     

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    1. Re:No problem. by Noah+Haders · · Score: 5, Funny

      one weird trick to separate cause and effect!

    2. Re:No problem. by Anonymous Coward · · Score: 2, Funny

      Statisticians HATE him!

    3. Re:No problem. by Anonymous Coward · · Score: 5, Insightful

      I can't think of any cases where I know there are no confounding factors but don't know which is the cause and which is the effect.

      Also, when it comes to medical stuff, or any human observational study, I can't think of any that don't have selection effects as well. It's a neat trick, but I honestly can't think of a single case where it applies in a useful way. Does anyone have an example?

      The article starts with this example of a confounding factor (which makes this test not applicable):

      That turned out to be an erroneous conclusion. Later studies showed that women who took hormone replacement therapy were likely to be from higher socio-economic groups with higher incomes, better diets and generally healthier outcomes. It was this that caused the correlation the earlier studies had found. By contrast, proper randomised controlled trials showed that hormone replacement therapy actually increased the risk of heart disease.

      This test may sometimes be able to provide evidence against causation in such cases (which is useful) but it can't determine causation (because there may be confounding factors). That may be newsworthy, but it deserves a more accurate headline: new statistical test can form confidence bounds for how unlikely it would be for a new parameter to be of this magnitude if there were causation; when combined with existing tests it may discredit more potential claims of causation than previously practical.

    4. Re:No problem. by wiredlogic · · Score: 2

      At least PBS will be able to keep up their snake oil infomercials and I won't feel guilty for not supporting them.

      --
      I am becoming gerund, destroyer of verbs.
    5. Re: No problem. by TechyImmigrant · · Score: 4, Insightful

      If you stop the wind all of a sudden, the turbine will continue to turn, causing wind, until the energy in the turbine is spent.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    6. Re:No problem. by rdnetto · · Score: 2

      I suspect the test could be generalized to work for N variables, since the noise should increase as we move along a causal chain. The only issue is the exponential drop-off in confidence. If the accuracy could be improved, it could be quite useful for deriving or verifying Bayesian networks.

      --
      Most human behaviour can be explained in terms of identity.
    7. Re:No problem. by gzuckier · · Score: 2

      Indeed. Famous, perhaps apocryphal, finding that storks bring babies, correlating postwar stork population in Europe with birth rate; confounding factor was spike in marriage rate after war, resulting in more babies, and more houses (in the chimneys of which storks nest). Can't take that apart by comparing noise rate in stork count and in baby count.

      --
      Star Trek transporters are just 3d printers.
    8. Re:No problem. by gzuckier · · Score: 2

      The other thing they fail to understand is that causality is patently obvious in the vast majority of cases where there are no confounding factors.

      Probably the social sciences are most in need of tests like this, as they are always trying to pin some outcome on some input in a bubbling cauldron of alternatives. But of course, the cauldron is full of confounding factors.

      Still going to need to elucidate a reasonably valid mechanism to convince anybody of anything.

      --
      Star Trek transporters are just 3d printers.
    9. Re: No problem. by gzuckier · · Score: 2

      I still believe that the only reason fridges and freezers are cold is because you keep buying cold stuff, and putting it into them. That's why they need to be insulated. All those electric motors and stuff just keep chugging heat out.

      --
      Star Trek transporters are just 3d printers.
    10. Re:No problem. by mcswell · · Score: 2

      It may be obvious, but that doesn't mean it isn't contested. An example (which Pearl uses in his book Causality) is lung cancer and smoking. It was obvious to most people that smoking caused lung cancer, but another possibility was that there was a genetic predisposition to lung cancer, and that genetic factor also caused people to want to smoke. The tobacco industry in fact argued this, and (IIUC) it took some time before the direction of causation could be established in the legal sense.

      An example I heard about just yesterday involved exercise and health. The question was not so much whether exercise improved health (that is obvious), but how the causation worked. The study I read about said exercise had been shown to cause methylation of DNA. Establishing that causal relationship was done experimentally by having people exercise one leg and not the other.

      Causation in economics is also hard to establish (I'm told--I'm glad not to be an economist).

      So no, I don't think causation is always obvious.

  2. Always by phantomfive · · Score: 3, Interesting

    So the noise in the effect dataset is always more complex than the noise in the cause dataset....... the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases

    In other words, not always.

    --
    "First they came for the slanderers and i said nothing."
    1. Re:Always by Mr+D+from+63 · · Score: 5, Interesting

      This is the tricky part, and it seems to work if you know exactly the cause and effect in advance, so you know which data to look at. It is quite clever though, and would seem to have application as an indicator if nothing else.

      I recall some equipment monitoring techniques used in my industry. There were reams of data. If a piece of equipment failed, you could go back and look at the data and see that there were indications. But filtering those indications out as useful input was always the problem. Only the blatant, in your face indications were caught. I see a similar problem here, that you might be able to show cause and effect with this data in hindsight, but it won't be so clear when you don't know the answer already.

    2. Re:Always by phantomfive · · Score: 3, Interesting

      Indeed, it's easy to think of situations where the opposite is true, where the noise is simpler in the 'effect' than in the 'cause,' because there is some attenuation factor in between that reduces the noise. That's more or less what a damper or shock absorber is designed to do. And a low pass filter in audio does the same thing.

      Now you might say, "obviously a low-pass filter is in the way, and that's causing the difference" but that gets back to your point, where it's easy to figure out when you already know the system, but if you don't, then it's not so easy.
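The shock absorber and low-pass filter objection is easy to reproduce numerically. The sketch below is a toy under assumptions chosen for this example (white Gaussian noise as the "cause", a 20-sample moving average as the filter, and a hypothetical `roughness` helper), and it only shows that an effect can carry less and smoother noise than its cause.

```python
import numpy as np

rng = np.random.default_rng(1)

# The "cause": a noisy input signal.
cause = rng.normal(0.0, 1.0, 5000)

# The "effect": the same signal after a moving-average low-pass filter,
# standing in for a damper between input and output.
window = 20
kernel = np.ones(window) / window
effect = np.convolve(cause, kernel, mode="valid")

def roughness(signal):
    # Spread of sample-to-sample jumps; smaller means smoother.
    return np.std(np.diff(signal))

print("std of cause: ", np.std(cause))
print("std of effect:", np.std(effect))
print("roughness of cause: ", roughness(cause))
print("roughness of effect:", roughness(effect))
```

Someone handed only the two series, with no knowledge of the filter, would see the smoother series on the effect side, which is the situation the comment describes.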

      --
      "First they came for the slanderers and i said nothing."
    3. Re:Always by itzly · · Score: 2

      Probably because A and B have a large overlap in time, combined with poor record keeping at the beginning.

    4. Re:Always by TechyImmigrant · · Score: 3, Informative

      An algorithm changes its behavior based on the value.

      The example I gave is a sneaky algorithm in the FIPS spec that deletes consecutive values when they match.
      i.e.
      if this_value == last_value:
          don't output this_value
      else:
          output this_value

      This is on the output of an RNG and so it reduces the entropy in the random numbers because there are no matching consecutive numbers, whereas in a full entropy stream, all pairs would be equally likely.
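A small simulation makes the entropy loss visible. This is a sketch of the filter as the comment describes it, not code taken from the FIPS specification; the 4-bit symbol size, the seeded generator, and the names `drop_repeats` and `count_adjacent_repeats` are assumptions made for illustration.

```python
import math
import random

def drop_repeats(values):
    """Drop every value that equals the value immediately before it."""
    output = []
    last_value = None
    for this_value in values:
        if this_value != last_value:
            output.append(this_value)
        last_value = this_value
    return output

def count_adjacent_repeats(values):
    # Number of positions where a value equals its predecessor.
    return sum(1 for a, b in zip(values, values[1:]) if a == b)

rng = random.Random(0)
raw = [rng.randrange(16) for _ in range(100_000)]  # stand-in for 4-bit RNG output
filtered = drop_repeats(raw)

print("adjacent repeats in raw stream:", count_adjacent_repeats(raw))
print("adjacent repeats after filter: ", count_adjacent_repeats(filtered))
# A uniform 4-bit symbol carries log2(16) bits. Once repeats are removed,
# only 15 values remain possible after any given symbol.
print("bits per symbol, unfiltered:    ", math.log2(16))
print("bits per symbol, given previous:", math.log2(15))
```

After filtering, knowing one symbol rules out one candidate for the next, so each symbol carries slightly less than the full 4 bits.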

      In the context of noise in statistical analysis, it can confound the additive noise models.

      Algorithms that do things to data but don't look at the values of the data when deciding what to do are not data dependent, which limits the scope for various bad things to happen.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  3. That 20% is the killer though by neilo_1701D · · Score: 2

    Reading through the article, it wasn't clear to me how it is determined whether it worked correctly or not.

    But still, an interesting statistical breakthrough, and one that allows researchers to ask interesting questions about their data.

  4. So, correlation CAN mean causation? by Anonymous Coward · · Score: 5, Insightful

    Well, of course it can. How do you think causation is determined? First by noticing a correlation. There can't be causation without correlation.

    Gawd I hate the brain-dead fools who thoughtlessly parrot, "Correlation is not causation!"

    1. Re:So, correlation CAN mean causation? by Anonymous Coward · · Score: 2, Interesting

      Gawd I hate the brain-dead fools who thoughtlessly parrot, "Correlation is not causation!"

      The proper term is: "Correlation does not imply causation". Perhaps you are being pedantic, but I'd rather hang around people who think "Correlation is not causation" (since it is more correct), than people who think "Correlation is causation".

    2. Re:So, correlation CAN mean causation? by Wraithlyn · · Score: 4, Interesting

      I prefer "Correlation does not prove causation".

      Edward Tufte suggested "Correlation is not causation but it sure is a hint."

      --
      "Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
  5. Other causality tests exist by Anonymous Coward · · Score: 5, Informative

    Many other attempts at detecting causality exist. There's one based on dynamical systems theory (Takens' theorem): in a multidimensional, causally linked dynamical system, all the information in the high-dimensional system can be recovered from multiple values of a single dimension over time.

    The method works by reconstructing values of X from lagged vectors of Y(t), using the nearest-neighbor lagged vectors of Y in a training set. As the training set gets larger, the predictions get better. If they keep getting better, X probably causes Y. The idea that the noise in X(t) shows up in Y(t) but not the other way around is implicitly captured in that approach, although not in a statistically rigorous way.
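A rough sketch of the cross-mapping step, under assumptions that are not from the comment: a toy pair of logistic maps in which X drives Y but not the reverse, an embedding dimension of 2, exponentially weighted nearest neighbours, and a single training-set size (so the convergence check over growing training sets is left out). The function names are made up for this example.

```python
import numpy as np

def simulate(n, coupling=0.1):
    # Toy system: x evolves on its own, y is nudged by x (x causes y).
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - coupling * x[t])
    return x, y

def lag_embed(series, dim=2):
    # Row for time t is (s[t], s[t-1], ..., s[t-dim+1]).
    n = len(series)
    return np.column_stack([series[dim - 1 - k : n - k] for k in range(dim)])

def cross_map_skill(source, target, dim=2):
    """Correlation between target and its estimate from source's lagged vectors."""
    manifold = lag_embed(source, dim)
    aligned = target[dim - 1:]
    dists = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)              # never use a point as its own neighbour
    k = dim + 1
    idx = np.argsort(dists, axis=1)[:, :k]       # k nearest lagged vectors
    nearest = np.take_along_axis(dists, idx, axis=1)
    weights = np.exp(-nearest / np.maximum(nearest[:, :1], 1e-12))
    weights /= weights.sum(axis=1, keepdims=True)
    estimate = (weights * aligned[idx]).sum(axis=1)
    return np.corrcoef(aligned, estimate)[0, 1]

x, y = simulate(1100)
x, y = x[100:], y[100:]                          # discard the transient

x_from_y = cross_map_skill(source=y, target=x)   # should be good if x causes y
y_from_x = cross_map_skill(source=x, target=y)   # should be poor if y does not cause x
print("skill reconstructing X from lagged Y:", x_from_y)
print("skill reconstructing Y from lagged X:", y_from_x)
```

Because Y carries an imprint of X, X can be recovered from Y's history, while X's history says little about Y. A fuller version would repeat this for increasing training-set sizes and look for the improvement the comment mentions.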

    Sugihara et al. Science 2012 (sorry about paywall).

  6. I predict by gurps_npc · · Score: 2

    1) A sudden disappearance of studies claiming that video games cause violent behavior rather than the other way around.

    2) A whole bunch of people totally ignoring this study because they don't like what it means.

    --
    excitingthingstodo.blogspot.com
  7. Re:Great... by Black+Parrot · · Score: 2

    The standard t-test for detecting an effect is already probabilistic. In science and medicine a 95% confidence level is commonly used, which means a 1-in-20 chance of detecting something that isn't there.
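The 1-in-20 figure can be checked with a quick simulation. This is a sketch with assumed settings: two groups of 50 drawn from the same normal distribution, an equal-variance two-sample t statistic computed by hand, and an approximate two-sided 5% critical value of 1.984 for 98 degrees of freedom. The function name is hypothetical.

```python
import numpy as np

def false_positive_rate(trials=4000, n=50, t_critical=1.984, seed=0):
    # Both groups come from the same distribution, so every
    # "significant" result is a false positive.
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(trials, n))
    b = rng.normal(size=(trials, n))
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2.0
    t_stat = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * 2.0 / n)
    return np.mean(np.abs(t_stat) > t_critical)

rate = false_positive_rate()
print("fraction of null experiments flagged as significant:", rate)
```

The printed fraction should sit close to 0.05, which is the sense in which a 95% threshold still flags roughly one null result in twenty.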

    --
    Sheesh, evil *and* a jerk. -- Jade
  8. Re:What if there is a third party? by Black+Parrot · · Score: 2

    So if Z causes both X and Y, I assume that this amazing test gives garbage?

    Perhaps in some cases it would be possible to detect that both X and Y were being affected by the same noise, implying the existence of some unknown Z?

    --
    Sheesh, evil *and* a jerk. -- Jade
  9. Re:David Hume by Black+Parrot · · Score: 5, Insightful

    Yes, but now we can find out whether we read Slashdot because we are nerds, or we are nerds because we read Slashdot.

    --
    Sheesh, evil *and* a jerk. -- Jade
  10. This tells us nothing about the arrow of time by Culture20 · · Score: 3, Insightful

    Which direction in time does cause/effect flow? The world may never know.

  11. Re:Lies, damned lies, and statistics by skids · · Score: 2

    Almost any level of accuracy above pure randomness can be fruitfully added to the Bayesian inference process. You can pretty harmlessly add the pure noise as well, it's just not going to be fruitful.
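A worked Bayes update shows the point. The sketch assumes a simplified setting that is not in the comment: exactly two hypotheses ("X causes Y" or "Y causes X"), and a test that names the right direction with the same probability `accuracy` whichever one is true. The function name is made up for this example.

```python
def update(prior, accuracy):
    """Posterior for 'X causes Y' after the test reports 'X causes Y'."""
    evidence = accuracy * prior + (1.0 - accuracy) * (1.0 - prior)
    return accuracy * prior / evidence

posterior_useful = update(prior=0.5, accuracy=0.8)  # an 80%-accurate test
posterior_noise = update(prior=0.5, accuracy=0.5)   # a coin flip
print("posterior after an 80%-accurate test:", posterior_useful)
print("posterior after a coin-flip test:    ", posterior_noise)
```

The 80%-accurate result moves the belief away from 50/50, while the coin-flip result leaves the prior exactly where it was: harmless, but not fruitful.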

  12. Re:Great... by Zephyn · · Score: 2

    So once we start using this on everything, 1 out of every 5 times, it will lead us to bogus conclusions with false statistical confidence....

    Apparently the Trident Gum people have been using this for decades.

  13. Re:Bad turbine example by BenSchuarmer · · Score: 3, Funny

    That's just what big turbine wants you to believe.

  14. Re:Great... by ConceptJunkie · · Score: 3, Funny

    So once we start using this on everything, 1 out of every 5 times, it will lead us to bogus conclusions with false statistical confidence....

    So, a vast improvement then? ;-)

    --
    You are in a maze of twisty little passages, all alike.