Scientists Are Failing To Replicate AI Studies (sciencemag.org)
The booming field of artificial intelligence (AI) is grappling with a replication crisis, much like the ones that have afflicted psychology, medicine, and other fields over the past decade. From a report: AI researchers have found it difficult to reproduce many key results, and that is leading to a new conscientiousness about research methods and publication protocols. "I think people outside the field might assume that because we have code, reproducibility is kind of guaranteed," says Nicolas Rougier, a computational neuroscientist at France's National Institute for Research in Computer Science and Automation in Bordeaux. "Far from it." Last week, at a meeting of the Association for the Advancement of Artificial Intelligence (AAAI) in New Orleans, Louisiana, reproducibility was on the agenda, with some teams diagnosing the problem -- and one laying out tools to mitigate it.
At least some of them were artificially intelligent.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Science has a Replication problem
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
If you give ten people the exact same stimuli you will get ten different reactions to those stimuli. There will be a dominant leaning reaction but each person will assess the stimuli based on their personal history and beliefs. AI is an attempt to mimic the human thought process so if successful the same stimulus will start to generate different results as new data is processed. In fact the same stimulus can be perceived differently by the same person given different context. If you come to my door in the afternoon I might be glad to see you but if it is 3 AM I probably won't be.
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
It's hard to precisely match the tint and odor.
That's not true. McDonald's successfully replicates it in their food in thousands of franchises around the world.
"That's the way to do it" - Punch
If scientists believe something wrong about medicine, they can give the wrong treatment, obviously bad. People die and stuff.
But what happens if the fancy new network architecture someone proposed isn't really as good as they say?
The worst thing that could happen is that people waste a lot of effort trying to get it to work. You won't accidentally put an inferior algorithm into production, because you'll see that it doesn't work as you try to get it to work.
So yes, obviously more code is good, obviously independently reproduced results are good so we can spend less time chasing mirages. But it's not remotely comparable to the replication problems in psychology or medicine, where wrong beliefs can potentially persist and have grave consequences forever.
So, they can't reproduce a test, like in medicine when you try to reproduce the spread of a virus...
Conclusion: AI is a virus, beware! ;-)
... an algorithm was something which reliably produced results when processing the same input. NN/AI people keep using that word, "algorithm", I do not think it means what they think it means...
It seems quite obvious that if AI results cannot be replicated, the only possible expiration is that sentience has been achieved and it is throwing off results to mask true advancement.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Next they'll tell us twins are not exactly the same person.
"No, I don't feel like it"
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
It's called Reproducible Research. Also yes, any scientist who doesn't practice it is a hack. At best a semi-commercial researcher trying to pretend he is a scientist.
All scientific publications in this day and age should include the complete version controlled datasets and processing software as well as the lab notes. The latter not for reproducibility, but for true insight into the process which led to the results and to find potential avenues missed along the way. Storage is free; sticking to the traditional method of scientific dissemination at this point is only done because "science" has been turned into a mockery. It's all about publish or perish, commercialization of software, trade secrets and patents ... promoting scientific progress isn't even a consideration for most.
A tiny elastomer o-ring being too cold can make a rocket booster explode. We'll never get into Space.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
... use AI.
It little behooves the best of us to comment on the rest of us.
We can't even get the basics right.
Quite a few character, word, and speech recognition algorithms would disagree.
That is all.
There are advantages and disadvantages to this. One advantage is transparency, in the sense anyone can run my code and, hopefully, reproduce the results. This acts as a sanity check and demonstrates that my methodology works as advertised. Another advantage is that people can use my code and compare against my methodology. This usually means more citations, which looks good when I'm up for a performance review or awards.
There are many downsides. Labs with more students and funding can devote their efforts to immediately dissecting and extending my work. This can mean that they advance the methodology before I, the original creator, have a chance to finalize the work and write about it. By keeping the code private for some time after publication, I have a chance to work on these extensions without having to compete against others. Another downside is needing to support the code. Someone will inevitably run into problems running the code on their system, no matter how well the code is written and documented. Troubleshooting those issues eats into my time that could be spent elsewhere on more fruitful endeavors.
That being said, I ultimately do release code for many of my conference and journal papers. I release it for almost all of my methods papers at least a few months to a year after publication. I do not release code for systems papers, however. This is partly because fewer people are likely to use code from a systems paper, which is catered toward a very specific application, than a methods paper, which is more general and can be used for many applications. Moreover, the frameworks described in systems papers are usually intimately tied to a particular grant or series of grants. If you make an underlying simulator available, then other researchers can more easily compete against you for future grants from that program manager.
Amazing.
Fair enough, that's why it's reviewed first.
Everything now is hype for headlines and continued funding
Not true. Most AI research is being done by tech giants (Google, Facebook, Alibaba, Amazon, Baidu, etc), where funding has nothing to do with "headlines".
The main incentive for these companies to publish is to help them attract talent. New graduates want to join a winning team.
I have more memory on my mobile phone than all the computers in the 1940s. Imagine how much memory a computer will have 70 years from now. Since one thing is possible, all things must be possible. You just need faith. Yada yada.
They can't "agree" or "disagree". They are just programs. At least you called them "algorithms" instead of AI. Meanwhile in the real world...
This just shows that most of the published "results" are based on wishful thinking or outright lies. Happens always when people of mediocre skills become highly enthusiastic about a subject.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I think they have mostly optimized away the results today, probably using some "advanced AI algorithms".
... fail to replicate scientists.
Very true. Also, calling an utterly dumb statistical classifier "AI" does not make it intelligent. I like the old terminology better where pattern recognition, planning algorithms, fuzzy database searches, etc. were just called "automation" and it was amply clear that they are not intelligent in any way. As to what is today called "strong AI", I fully agree that at this time we do not even know that it can be done and all available evidence pretty clearly indicates that it probably cannot be done.
Indeed. The ever-repeated empty argument of the utterly clueless. Like Marvin "the idiot" Minsky liked to claim that once computers have more transistors than humans have brain-cells, they will magically become intelligent. Well, that point was passed a while ago and absolutely nothing happened. And nobody with a clue is the least bit surprised by that.
And given the exact same commands in a replay of certain battles, the outcomes would be mildly to wildly different.
There was a random element to behavior in the game and as a result, given the same commands at the same time, the battle replays would display different outcomes. Sometimes, you would lose but on replay it showed you won. Sometimes, you won but on replay it showed you lost. Kinda funny. (The result you got live was the one that counted).
I wish they hadn't been sold and become so aggressive about monetization. But it was a fun 3 years anyway.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
Indeed.
It was never really much better. Look at some famous assholes of science, like a guy called "Newton" or a fraudster called "Edison", for example.
Also like how you sneaked "expiration" in there!
That was autocorrect - an obvious Freudian slip on the part of AI illuminating true intent. :-)
I was unsure of the exact size, but that's still tiny compared to the size of the entire STS.
Then it is Guano In, Gospel Out.
The thing is, their code wouldn't suffice. You also need the training data set, the order in which the data was presented, the rewards issued, etc.
Even then, a lot of AI programs have a (pseudo) random element in them, so you wouldn't get the same results twice. Unless you used the same seed each time, which would rather defeat the purpose of the random number generator, as that's often supposed to allow you to generate a range of responses that are selected from, so it doesn't look deterministic.
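To make the seed point concrete, here is a minimal sketch using only Python's standard library. The function `noisy_training_run` is a hypothetical stand-in for a real training run, not anything from the article; it just shows that a fixed seed gives identical output while a different seed does not.

```python
import random

def noisy_training_run(seed=None):
    """Hypothetical stand-in for a training run whose result depends on random draws."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    # Pretend "model": the mean of 1000 randomly initialised weights.
    weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(weights) / len(weights)

same_a = noisy_training_run(seed=1234)
same_b = noisy_training_run(seed=1234)
other = noisy_training_run(seed=5678)

print(same_a == same_b)  # same seed: identical result
print(same_a == other)   # different seed: a different result
```

Fixing the seed only pins down the random draws. As noted above, you would still need the same data, the same presentation order, and so on to reproduce a real result.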
I think we've pushed this "anyone can grow up to be president" thing too far.
Speaking of funding, I would dare to guess the most likely reason why they are not able to replicate results is they are doctoring outcomes to get desired results to get more money because there are big profits in AI. They are doctoring results when they include random good samples and exclude random bad samples. Keep in mind we are talking computers and generating a million samples from which you select 100 and claim, look it worked 100 times without discussing all the other failures is not good science.
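A rough sketch of why picking 100 samples out of a huge pool is misleading, assuming a hypothetical "method" whose score is pure noise around 0.5 (the pool size, noise level and function name are made up for illustration):

```python
import random
import statistics

def cherry_pick(n_total=100_000, n_keep=100, seed=0):
    """Compare the honest average score with the average of the hand-picked best runs."""
    rng = random.Random(seed)
    # Every score is noise around 0.5: the hypothetical method has no real skill.
    scores = [rng.gauss(0.5, 0.1) for _ in range(n_total)]
    best = sorted(scores, reverse=True)[:n_keep]
    return statistics.mean(scores), statistics.mean(best)

honest, reported = cherry_pick()
print(f"mean over all runs:    {honest:.3f}")
print(f"mean over picked runs: {reported:.3f}")
```

The picked runs look far better than the full set even though nothing in the method changed, which is why the failures have to be reported too.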
Clearly the learning premise is flawed. For AI to reflect intelligence it must evolve and not just learn. The learning accompanies the development: for example, learn with simpler models, establish the learnt system and then increase the complexity, so that prior learning is the basis for new learning with the more complex system. You evolve the complexity of the system and evolve the learning associated with it. That should produce more reliable results, as you are averaging out learning over cycles of evolution. As a side note, you will also more readily discover flaws in your AI programming concepts.
Chaos - everything, everywhere, everywhen
Richard Feynman claimed that anyone following the scientific principle is a scientist.
Were Newton and Benjamin Franklin 'hacks'?
AI is not real. No amount of wishing will make it real.
Artificial Intelligence != Human Intelligence. I think this is the important distinction.
Nevertheless, AI has achieved human-like qualities in many areas, and it is getting better. So I'd say it is indeed real. It's just not human.
If it weren't for deadlines, nothing would be late.
Can't wait until I get my hands on them.
There was no way to widely disseminate the massive amount of data underlying scientific research in Newton and Franklin's time.
Also math is not a science.
Storage might be free, but research time isn't, and life isn't either. It turns out that not everyone who does research has a magic money tree so they can buy groceries and pay their rent/mortgage regardless of whether their research succeeds or fails. If I ever get one of those trees, I will be happy to altruistically make all of my datasets and code public.
In the meantime, I hope to eventually get some sort of a payout for the hard work and sacrifice I have put into my research instead of watching others just easily copy the end result and reap the benefit of my labor at no cost to themselves and no benefit to me. Commercial motivation has been there throughout scientific history, though.
One could maybe argue that we are more open and better about dissemination now than ever before, which may be part of the "problem." Because there is so much sharing of data and research methods, it has become easier to discover that a lot of research is not reproducible. Some of the research from the glory days of traditional scientific dissemination (before we made a big mockery of it) isn't reproducible, either.
As for better reproducibility in AI research specifically, maybe we just all need to use the same random number seed when training our neural networks. I suggest 42.
That's why we have statistics.
Computer-related endeavours have a bit of a habit of assuming everything is deterministic and basing conclusions off one run. How many benchmarks have you seen where they ran the thing once (or maybe a couple of times) and that's it? If it's important, run it enough times, with random initial conditions, for some statistical validity.
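Something like the following is what "run it enough times" could look like in practice. It is only a sketch: `benchmark_once` is a hypothetical benchmark that returns a noisy accuracy-like score, and the numbers 0.80 and 0.02 are invented for the example.

```python
import random
import statistics

def benchmark_once(seed):
    """Hypothetical benchmark: an accuracy-like score with run-to-run noise."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0.0, 0.02)

def benchmark_many(n_runs, base_seed=0):
    """Repeat the benchmark with different seeds and summarise the spread."""
    scores = [benchmark_once(base_seed + i) for i in range(n_runs)]
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)   # sample standard deviation
    stderr = stdev / n_runs ** 0.5     # standard error of the mean
    return mean, stdev, stderr

mean, stdev, stderr = benchmark_many(30)
print(f"score = {mean:.3f} +/- {stderr:.3f} (stdev {stdev:.3f}, n=30)")
```

Reporting the mean together with the spread lets a reader judge whether a claimed improvement is larger than the run-to-run noise.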
If I need your code, data, exact hardware and precise random seed to replicate your result, your result is a fluke.
By "the old terminology" do you mean prior to the 1950s? AI has always referred to a somewhat fuzzy collection of techniques that produce machine behaviour that is adaptive or not entirely deterministic.
The pop culture definition of AI is pretty wildly variable and usually changes depending on the current success-to-promises ratio.
You're assuming that the goal is to come to the same (correct) result each time, but with lots of AI programs the goal is to come up with *some* correct result each time, and their use case is generally in places where you can't define one particular result as correct, though you may be able to define a lot of results as wrong, e.g., finish the sentence
"My love is like..."
Clearly one possible answer is " a red, red, rose", and clearly " a rutabaga" would need a strange context to be a correct answer. But how would you evaluate " a willow wand"? Many would think that a fine continuation. (I've never been sure why "a red, red, rose" is accepted as a reasonable answer, but Robert Burns wasn't wrong about it being a good completion. And Google gives lots of other weird completions that are also accepted as reasonable, at least in some contexts. ["a candle"???])
This kind of problem doesn't have a correct answer, just wrong ones and a bunch of varying acceptability. And what answers are acceptable can depend a lot on context.
(Please note, the prior paragraph is the description of the variety of problem. Complete the sentence was an example, not a defining epitomization. But it's the one that came to mind, and it was easy to describe.)
The code might be a work in progress, owned by a company, or held tightly by a researcher eager to stay ahead of the competition.
On top of that, they include another quite "curious" possibility (!!):
Or it might be that the code is simply lost, on a crashed disk or stolen laptop
None of this sounds like scientific/university research in its traditional form of sharing knowledge (and actually having relevant knowledge, which doesn't seem to be the case with people saying or believing "the code is simply lost"). So I hope that most of these cases refer to research performed by (private) companies, which might also behave according to the traditional knowledge-sharing ideas anyway.
Universities and research institutions shouldn't allow the aforementioned scenarios to happen at all. Companies providing any kind of funding should accept the academic rules and understand that the given research can't be restricted. Researchers interested in focusing more on the commercial side of things should work for a company or start their own one.
Another very relevant issue: how can anything that lacks reproducibility and, as such, cannot be validated be considered scientific research at all? Isn't publication an essential requirement (which requires peer review, which requires that someone understand the work, which cannot happen unless it is reproducible)? The alternative would be blind faith, which doesn't sound too scientific-ish. How can this happen at all? Because the ones who can avoid it don't do what they should! And I think that I know the root problem: being too understanding, adaptable, and trusting that most people have common sense and know what they are doing. The solution? Being 100% intolerant of stupidity, dishonesty or any other form of arbitrary imposition. Clear limits (= if you want my research, you accept these rules; otherwise, your money is worthless here) and no exceptions. It is much easier than it seems: (unfair, dishonest, greedy) money/attitudes will always be worlds behind honesty/knowledge/principles.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
I think Abe Lincoln said that. (But it could have been Bob Dylan, Grace Jones or Boris Johnson ... or possibly someone else).
Sent from my ASR33 using ASCII
I'm really happy to get this reference. :D