Slashdot Mirror


Scientists Are Failing To Replicate AI Studies (sciencemag.org)

The booming field of artificial intelligence (AI) is grappling with a replication crisis, much like the ones that have afflicted psychology, medicine, and other fields over the past decade. From a report: AI researchers have found it difficult to reproduce many key results, and that is leading to a new conscientiousness about research methods and publication protocols. "I think people outside the field might assume that because we have code, reproducibility is kind of guaranteed," says Nicolas Rougier, a computational neuroscientist at France's National Institute for Research in Computer Science and Automation in Bordeaux. "Far from it." Last week, at a meeting of the Association for the Advancement of Artificial Intelligence (AAAI) in New Orleans, Louisiana, reproducibility was on the agenda, with some teams diagnosing the problem -- and one laying out tools to mitigate it.

17 of 89 comments

  1. Isn't that the point? by ArhcAngel · · Score: 2, Insightful

    If you give ten people the exact same stimuli you will get ten different reactions to those stimuli. There will be a dominant leaning reaction but each person will assess the stimuli based on their personal history and beliefs. AI is an attempt to mimic the human thought process so if successful the same stimulus will start to generate different results as new data is processed. In fact the same stimulus can be perceived differently by the same person given different context. If you come to my door in the afternoon I might be glad to see you but if it is 3 AM I probably won't be.

    --
    "A person is smart. People are dumb, panicky dangerous animals and you know it." - K
    1. Re:Isn't that the point? by fluffernutter · · Score: 4, Insightful

      This is about applying the exact same stimuli during the upbringing of the same person and yet getting people with vastly different beliefs about the world. Pretty scary that such a psychopath will soon be trying to drive us around.

      --
      Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
    2. Re:Isn't that the point? by mikael · · Score: 2

      That's a known issue with the quality of graphics rendering. With floating-point data, there's a technique known as guard bits. These are extra bits of precision that remain internally within the floating-point logic units. These aren't mandatory, but protect against numerical instability with small values. This can be visualized by comparing simple color gradients:

      https://community.arm.com/grap...

      For some calculations like CFD, any overflow in one grid cell will expand outwards to all the other grid cells quite rapidly.
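A minimal sketch of the underlying effect, in Python rather than on GPU hardware: it does not model guard bits in any particular floating-point unit, it only shows that the same numbers added in a different order can give a different double-precision result, and that carrying extra precision internally avoids the loss. The name `naive_sum` and the sample values are illustrative assumptions, not something from the linked article.

```python
import math


def naive_sum(values):
    """Add floats left to right, rounding to double precision at every step."""
    total = 0.0
    for v in values:
        total += v
    return total


# The small terms are below the rounding granularity of the large ones.
values = [1e16, 1.0, -1e16, 1.0]

forward = naive_sum(values)            # original order
reordered = naive_sum(sorted(values))  # same numbers, ascending order
careful = math.fsum(values)            # keeps the lost low-order bits internally

print(forward, reordered, careful)
```

The three printed numbers differ from one another, even though all three are "the sum of the same list". The explicit loop is used instead of the built-in `sum` because recent Python versions may apply compensated summation to floats, which would hide the effect.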

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    3. Re:Isn't that the point? by ShanghaiBill · · Score: 3, Interesting

      AI is an attempt to mimic the human thought process

      This is no more true than claiming that the Boeing 747 was designed to mimic a hummingbird's flight process.

    4. Re:Isn't that the point? by gweihir · · Score: 2

      You seem to have no clue what this research area deals with. It is not intelligence, despite the misleading name. It is automation.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  2. When I did Computer Science... by ssclift · · Score: 2

    ... an algorithm was something which reliably produced results when processing the same input. NN/AI people keep using that word, "algorithm", I do not think it means what they think it means...
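One way to reconcile randomized methods with that definition, sketched here under the assumption that the only source of variation is a pseudo-random generator: treat the seed as part of the input. The function `noisy_mean` and its data are hypothetical and stand in for any procedure that samples at random.

```python
import random


def noisy_mean(data, seed=None):
    """Mean of a bootstrap resample of `data`.

    With seed=None the generator is seeded from the OS, so runs can differ.
    With a fixed seed, the same input always yields the same output.
    """
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in range(len(data))]
    return sum(sample) / len(sample)


data = [2, 3, 5, 7, 11, 13, 17, 19]

print(noisy_mean(data, seed=42) == noisy_mean(data, seed=42))  # same seed, same result
print(noisy_mean(data), noisy_mean(data))                      # unseeded runs may differ
```

This only covers the generator itself. Real training runs can also vary with hardware, library versions, and thread scheduling, none of which the sketch addresses.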

    1. Re:When I did Computer Science... by K.+S.+Kyosuke · · Score: 2

      So maybe the point is that it's not entirely about algorithms, is it? After all, animals aren't algorithms either.

      --
      Ezekiel 23:20
    2. Re:When I did Computer Science... by rtb61 · · Score: 2

      It's a complexity problem: because it is too complex in the initial instance, it produces unpredictable results. So how do you get a computer to learn how to communicate? First look at the normal learning approach: take an adult from the forest and try to teach them how to communicate as an adult and you will have very poor outcomes; teach them as a child and you have good outcomes.

      So how do you teach a computer to speak? Start at lower complexities and teach it by ages. First let it learn how to communicate as a 1 year old: really simple stuff, giggles and smiles and lots of crying. Once the AI has that down pat, move on to a 2 year old level of complexity, let it get that correct, and on and on it goes. Each step increases complexity, but it only has to learn the difference between each step rather than the entire concept in one go. It learns based upon each step of its evolution. There are sound mathematical reasons why this works better in randomised environments: you tend to average out probabilities by taking smaller steps.

      --
      Chaos - everything, everywhere, everywhen
  3. Sign of the Singularity by SuperKendall · · Score: 4, Funny

    It seems quite obvious that if AI results cannot be replicated, the only possible explanation is that sentience has been achieved and it is throwing off results to mask true advancement.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  4. Re:How about sharing code? by Pinky's+Brain · · Score: 4, Interesting

    It's called Reproducible Research. Also yes, any scientist who doesn't practice it is a hack. At best a semi-commercial researcher trying to pretend he is a scientist.

    All scientific publications in this day and age should include the complete version-controlled datasets and processing software as well as the lab notes. The latter not for reproducibility, but for true insight into the process which led to the results and to find potential avenues missed along the way. Storage is free; sticking to the traditional method of scientific dissemination at this point only happens because "science" has been turned into a mockery. It's all about publish or perish, commercialization of software, trade secrets and patents ... promoting scientific progress isn't even a consideration for most.

  5. Re:How about sharing code? by Anonymous Coward · · Score: 4, Interesting

    There are advantages and disadvantages to this. One advantage is transparency, in the sense anyone can run my code and, hopefully, reproduce the results. This acts as a sanity check and demonstrates that my methodology works as advertised. Another advantage is that people can use my code and compare against my methodology. This usually means more citations, which looks good when I'm up for a performance review or awards.

    There are many downsides. Labs with more students and funding can devote their efforts to immediately dissecting and extending my work. This can mean that they advance the methodology before I, the original creator, have a chance to finalize the work and write about it. By keeping the code private for some time after publication, I have a chance to work on these extensions without having to compete against others. Another downside is needing to support the code. Someone will inevitably run into problems running the code on their system, no matter how well the code is written and documented. Troubleshooting those issues eats into my time that could be spent elsewhere on more fruitful endeavors.

    That being said, I ultimately do release code for many of my conference and journal papers. I release it for almost all of my methods papers at least a few months to a year after publication. I do not release code for systems papers, however. This is partly because fewer people are likely to use code from a systems paper, which is catered toward a very specific application, than a methods paper, which is more general and can be used for many applications. Moreover, the frameworks described in systems papers are usually intimately tied to a particular grant or series of grants. If you make an underlying simulator available, then other researchers can more easily compete against you for future grants from that program manager.

  6. Re:Join the Crowd by ShanghaiBill · · Score: 3, Insightful

    Science has a Replication problem

    This is not really the same issue. Replication failures in the physical and social sciences are difficult to fix, since they can be caused by small differences in data collection, experimental procedures, and statistical analysis. It is a hard problem.

    Fixing the replication problem described in TFA is drop-dead easy, since it has exactly two causes: closed data, and closed source. The fix? Reject any paper for publication if full source and data are not available. Science is based on openness, not secrets.

  7. Re:All Show, No Go by ShanghaiBill · · Score: 2

    Everything now is hype for headlines and continued funding

    Not true. Most AI research is being done by tech giants (Google, Facebook, Alibaba, Amazon, Baidu, etc), where funding has nothing to do with "headlines".

    The main incentive for these companies to publish is to help them attract talent. New graduates want to join a winning team.

  8. Re:Imagine that by gweihir · · Score: 3, Insightful

    Very true. Also, calling an utterly dumb statistical classifier "AI" does not make it intelligent. I like the old terminology better where pattern recognition, planning algorithms, fuzzy database searches, etc. were just called "automation" and it was amply clear that they are not intelligent in any way. As to what is today called "strong AI", I fully agree that at this time we do not even know that it can be done and all available evidence pretty clearly indicates that it probably cannot be done.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  9. Re:Join the Crowd by ceoyoyo · · Score: 4, Insightful

    I agree with you, but I think it's the same problem at the root.

    A robust result, whether it's a psych study, something in a petri dish, or some machine learning tweak, must be replicable on new data. If it's not... what's the point really?

    That's more obvious and easily demonstrable in machine learning; a research group asked for my help last year because they were having trouble with their deep learning model. They trained it on one dataset and it wouldn't work on another, similar dataset. Not surprising... you have to train it on diverse data to have it generalize well. Yeah, that's harder.
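A toy sketch of that failure mode, assuming nothing about the group's actual model or data: a straight line is fitted to points drawn from a narrow range and then scored on a nearby but different range. The functions `fit_line`, `mean_squared_error`, and `target`, and both datasets, are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared gap between the line's predictions and the targets."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def target(x):
    """The 'true' relationship, which is not actually a straight line."""
    return x * x


train_x = [i / 10 for i in range(11)]        # 0.0 .. 1.0
shifted_x = [2 + i / 10 for i in range(11)]  # 2.0 .. 3.0, a "similar" dataset

model = fit_line(train_x, [target(x) for x in train_x])
train_error = mean_squared_error(model, train_x, [target(x) for x in train_x])
shifted_error = mean_squared_error(model, shifted_x, [target(x) for x in shifted_x])

print(train_error, shifted_error)
```

The fit looks good on the data it was trained on and is far worse on the shifted range, which is the gap between reproducing a number on the same data and replicating a result on new data.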

    Other fields are no different. Tightly controlled studies make things easier and cheaper. But if that result is to be used generally then the necessary controls need to be quantified.

    Having said that, the scientific literature is not supposed to be "truth." Papers are reports of observations. Individual papers are supposed to be the starting point for further investigation by other groups. Problem is, we've forgotten that, and don't reward it.

    I like the idea of open data, but it concerns me that it might just exacerbate the problem: I do something and publish the result and the data; you come along, confirm my result (in the same data) and we call it replicated.

  10. Re:Imagine that by ceoyoyo · · Score: 2

    By "the old terminology" do you mean prior to the 1950s? AI has always referred to a somewhat fuzzy collection of techniques that produce machine behaviour that is adaptive or not entirely deterministic.

    The pop culture definition of AI is pretty wildly variable and usually changes depending on the current success-to-promises ratio.

  11. Re:How about sharing code? by HiThere · · Score: 3, Interesting

    You're assuming that the goal is to come to the same (correct) result each time, but with lots of AI programs the goal is to come up with *some* correct result each time, and their use case is generally in places where you can't define one particular result as correct, though you may be able to define a lot of results as wrong, e.g., finish the sentence
    "My love is like..."
    Clearly one possible answer is " a red, red, rose", and clearly " a rutabaga" would need a strange context to be a correct answer. But how would you evaluate " a willow wand"? Many would think that a fine continuation. (I've never been sure why "a red, red, rose" is accepted as a reasonable answer, but Robert Burns wasn't wrong about it being a good completion. And Google gives lots of other weird completions that are also accepted as reasonable, at least in some contexts. ["a candle"???])

    This kind of problem doesn't have a correct answer, just wrong ones and a bunch of varying acceptability. And what answers are acceptable can depend a lot on context.

    (Please note, the prior paragraph is the description of the variety of problem. Complete the sentence was an example, not a defining epitomization. But it's the one that came to mind, and it was easy to describe.)

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.