Scientists Are Failing To Replicate AI Studies (sciencemag.org)
The booming field of artificial intelligence (AI) is grappling with a replication crisis, much like the ones that have afflicted psychology, medicine, and other fields over the past decade. From a report: AI researchers have found it difficult to reproduce many key results, and that is leading to a new conscientiousness about research methods and publication protocols. "I think people outside the field might assume that because we have code, reproducibility is kind of guaranteed," says Nicolas Rougier, a computational neuroscientist at France's National Institute for Research in Computer Science and Automation in Bordeaux. "Far from it." Last week, at a meeting of the Association for the Advancement of Artificial Intelligence (AAAI) in New Orleans, Louisiana, reproducibility was on the agenda, with some teams diagnosing the problem -- and one laying out tools to mitigate it.
Science has a Replication problem
It's called Reproducible Research. And yes, any scientist who doesn't practice it is a hack. At best, a semi-commercial researcher pretending to be a scientist.
All scientific publications in this day and age should include the complete version-controlled datasets and processing software, as well as the lab notes. The latter are not for reproducibility, but for true insight into the process that led to the results, and to find potential avenues missed along the way. Storage is free; sticking to the traditional method of scientific dissemination at this point happens only because "science" has been turned into a mockery. It's all about publish or perish, commercialization of software, trade secrets and patents ... promoting scientific progress isn't even a consideration for most.
There are advantages and disadvantages to releasing code. One advantage is transparency, in the sense that anyone can run my code and, hopefully, reproduce the results. This acts as a sanity check and demonstrates that my methodology works as advertised. Another advantage is that people can use my code and compare their methods against mine. This usually means more citations, which looks good when I'm up for a performance review or awards.
There are many downsides. Labs with more students and funding can devote their efforts to immediately dissecting and extending my work. This can mean that they advance the methodology before I, the original creator, have a chance to finalize the work and write about it. By keeping the code private for some time after publication, I have a chance to work on these extensions without having to compete against others. Another downside is needing to support the code. Someone will inevitably run into problems running the code on their system, no matter how well the code is written and documented. Troubleshooting those issues eats into my time that could be spent elsewhere on more fruitful endeavors.
That being said, I ultimately do release code for many of my conference and journal papers. I release it for almost all of my methods papers at least a few months to a year after publication. I do not release code for systems papers, however. This is partly because fewer people are likely to use code from a systems paper, which is catered toward a very specific application, than a methods paper, which is more general and can be used for many applications. Moreover, the frameworks described in systems papers are usually intimately tied to a particular grant or series of grants. If you make an underlying simulator available, then other researchers can more easily compete against you for future grants from that program manager.
AI is an attempt to mimic the human thought process
This is no more true than claiming that the Boeing 747 was designed to mimic a hummingbird's flight process.
You're assuming that the goal is to come to the same (correct) result each time, but with lots of AI programs the goal is to come up with *some* correct result each time, and their use case is generally in places where you can't define one particular result as correct, though you may be able to define a lot of results as wrong, e.g., finish the sentence
"My love is like..."
Clearly one possible answer is " a red, red rose", and clearly " a rutabaga" would need a strange context to be a correct answer. But how would you evaluate " a willow wand"? Many would think that a fine continuation. (I've never been sure why "a red, red rose" is accepted as a reasonable answer, but Robert Burns wasn't wrong about it being a good completion. And Google gives lots of other weird completions that are also accepted as reasonable, at least in some contexts. ["a candle"???])
This kind of problem doesn't have a correct answer, just wrong ones and a bunch of varying acceptability. And what answers are acceptable can depend a lot on context.
(Please note, the prior paragraph is the description of the variety of problem. "Complete the sentence" was an example, not a defining epitomization. But it's the one that came to mind, and it was easy to describe.)
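To make the "wrong answers plus varying acceptability" idea concrete, here is a rough Python sketch. Everything in it is made up for illustration: the function name `judge`, the set of known-wrong answers, and the acceptability numbers are assumptions, not anything from a real benchmark. The point is only that the evaluator returns a graded, context-dependent verdict instead of checking against a single correct string.

```python
# Hypothetical illustration only: the entries and scores below are invented.

# Completions we are willing to call wrong in the default context.
KNOWN_WRONG = {"a rutabaga"}

# Completions with some (made-up) degree of acceptability in the default context.
ACCEPTABILITY = {
    "a red, red rose": 1.0,
    "a willow wand": 0.7,
    "a candle": 0.4,
}


def judge(completion, acceptable=ACCEPTABILITY, wrong=KNOWN_WRONG):
    """Return (verdict, score) for a completion of "My love is like...".

    There is no single correct answer: a completion is either known to be
    wrong, acceptable to some degree, or simply unknown to this evaluator.
    Passing different tables stands in for a different context.
    """
    text = completion.strip().lower()
    if text in wrong:
        return ("wrong", 0.0)
    if text in acceptable:
        return ("acceptable", acceptable[text])
    return ("unknown", None)


if __name__ == "__main__":
    for candidate in [" a red, red rose", " a willow wand", " a rutabaga", " a doorknob"]:
        print(repr(candidate), judge(candidate))
```

Swapping in different tables for a different context (say, a poem about vegetables, where " a rutabaga" might be fine) changes the verdicts without changing the function, which is why two runs that give different outputs can both be "right".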