Uber has Cracked Two Classic '80s Video Games by Giving an AI Algorithm a New Type of Memory (technologyreview.com)
An algorithm that remembers previous explorations in Montezuma's Revenge and Pitfall! could make computers and robots better at learning how to succeed in the real world. From a report: A new kind of machine-learning algorithm just mastered a couple of throwback video games that have proved to be a big headache for AI. Those following along will know that AI algorithms have bested the world's top human players at the ancient, elegant strategy game Go, one of the most difficult games imaginable. But two pixelated classics from the era of 8-bit computer games -- Montezuma's Revenge and Pitfall! -- have stymied AI researchers. There's a reason for this seeming contradiction. Although deceptively simple, both Montezuma's Revenge and Pitfall! have been immune to mastery via reinforcement learning, a technique that's otherwise adept at learning to conquer video games.
DeepMind, a subsidiary of Alphabet focused on artificial intelligence, famously used it to develop algorithms capable of learning how to play several classic video games at an expert level. Reinforcement-learning algorithms mesh well with most games, because they tweak their behavior in response to positive feedback -- the score going up. The success of the approach has generated hope that AI algorithms could teach themselves to do all sorts of useful things that are currently impossible for machines. The problem with both Montezuma's Revenge and Pitfall! is that there are few reliable reward signals. Both titles involve typical scenarios: protagonists explore blockish worlds filled with deadly creatures and traps. But in each case, lots of behaviors that are necessary to advance within the game do not help increase the score until much later. Ordinary reinforcement-learning algorithms usually fail to get out of the first room in Montezuma's Revenge, and in Pitfall! they score exactly zero.
So researchers have discovered that short-term gains can come at the expense of long-term success? *gasp* Say it isn't so!
Actually, that's been a known problem for a long time. You end up at a local maximum of the "score" function with no neighboring move that improves it, so reinforcement learning just keeps you there, even though you might do substantially better if you accepted a temporary decrease in the "score" and ended up on the path to some other, higher maximum.
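To make the local-maximum point concrete, here is a minimal sketch in Python. The `score` landscape and the `greedy_climb` helper are made up for illustration (they are not from the article or from any real RL library); the point is only that a learner which accepts strictly improving moves gets parked on the small peak.

```python
def score(x):
    # Hypothetical 1-D "score" landscape (an assumption for illustration):
    # a small peak of height 3 at x=2 and a taller peak of height 10 at x=8.
    return max(0, 3 - abs(x - 2)) + max(0, 10 - 2 * abs(x - 8))


def greedy_climb(x, steps=50):
    # Move to the better neighbour only if it strictly improves the score.
    for _ in range(steps):
        best = max((x - 1, x + 1), key=score)
        if score(best) <= score(x):
            break  # no neighbour is better: stuck on a (possibly local) maximum
        x = best
    return x


stuck = greedy_climb(0)    # climbs the small peak and stops there
better = greedy_climb(5)   # starts past the dip, so it reaches the tall peak
print(stuck, score(stuck), better, score(better))
```

Starting at 0 the climber stops on the small peak, because both neighbours score lower; getting to the tall peak would require walking downhill through x=3 first.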
(Oh, and "Fr1st ps0t!", especially if it isn't.)
If it works in theory, try something else in practice.
The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remember where they have been before, and will return to a particular area or task later on to see if it might help provide better overall results. The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount.
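For the curious, the "remember where you have been and go back there later" idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not Uber's implementation: the corridor environment, `step`, `GOAL`, the uniform choice of which remembered position to revisit, and returning by replaying stored actions are all invented here for illustration.

```python
import random

GOAL = 30  # hypothetical: the only reward sits at position 30


def step(pos, action):
    # Toy deterministic corridor (an assumption): action is -1 or +1,
    # position cannot go below 0, and reward appears only at the goal.
    pos = max(0, pos + action)
    return pos, (1 if pos == GOAL else 0)


def go_explore(iterations=20000, explore_steps=5, seed=0):
    rng = random.Random(seed)
    archive = {0: []}  # visited position -> list of actions that reaches it
    for _ in range(iterations):
        # "Go": pick a remembered position and return to it by replaying
        # the stored actions from the start.
        start = rng.choice(list(archive))
        actions = list(archive[start])
        pos = 0
        for a in actions:
            pos, _reward = step(pos, a)
        # "Explore": take a few random actions from there.
        for _k in range(explore_steps):
            a = rng.choice((-1, 1))
            pos, reward = step(pos, a)
            actions.append(a)
            # Remember new positions, or shorter routes to known ones.
            if pos not in archive or len(actions) < len(archive[pos]):
                archive[pos] = list(actions)
            if reward:
                return archive[pos]
    return None


trajectory = go_explore()
print("found a route to the goal:", trajectory is not None)
```

The archive is what keeps progress from being lost: each new run can start from the farthest point reached so far rather than from the beginning, which matters when the score gives no feedback until the very end.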
"First they came for the slanderers and i said nothing."
As an aside, have we even developed an accepted definition for what properly qualifies as "AI"? Recently I was being flogged some software whose selling point was an "AI engine", which turned out to be little more than a previous version with a little bit of Bayesian statistics bolted on.
I couldn't find this link anywhere in the summary or in the article Slashdot linked to. It's the blog post laying out what Go-Explore is in more detail:
http://eng.uber.com/go-explore/
"There is more worth loving than we have strength to love." - Brian Jay Stanley