Uber has Cracked Two Classic '80s Video Games by Giving an AI Algorithm a New Type of Memory (technologyreview.com)

Posted by msmash on Tuesday November 27, 2018 @05:42AM from the closer-look dept.

An algorithm that remembers previous explorations in Montezuma's Revenge and Pitfall! could make computers and robots better at learning how to succeed in the real world. From a report: A new kind of machine-learning algorithm just mastered a couple of throwback video games that have proved to be a big headache for AI. Those following along will know that AI algorithms have bested the world's top human players at the ancient, elegant strategy game Go, one of the most difficult games imaginable. But two pixelated classics from the era of 8-bit computer games -- Montezuma's Revenge and Pitfall! -- have stymied AI researchers. There's a reason for this seeming contradiction. Although deceptively simple, both Montezuma's Revenge and Pitfall! have been immune to mastery via reinforcement learning, a technique that's otherwise adept at learning to conquer video games.

DeepMind, a subsidiary of Alphabet focused on artificial intelligence, famously used it to develop algorithms capable of learning how to play several classic video games at an expert level. Reinforcement-learning algorithms mesh well with most games, because they tweak their behavior in response to positive feedback -- the score going up. The success of the approach has generated hope that AI algorithms could teach themselves to do all sorts of useful things that are currently impossible for machines. The problem with both Montezuma's Revenge and Pitfall! is that there are few reliable reward signals. Both titles involve typical scenarios: protagonists explore blockish worlds filled with deadly creatures and traps. But in each case, lots of behaviors that are necessary to advance within the game do not help increase the score until much later. Ordinary reinforcement-learning algorithms usually fail to get out of the first room in Montezuma's Revenge, and in Pitfall! they score exactly zero.

short term vs long term gain by LostOne · 2018-11-27 05:50 · Score: 4, Insightful

So researchers have discovered that short term gains can come at the expense of long term success? *gasp* Say it isn't so!
Actually, that's been a known problem for a long time. You end up at a local maximum on the "score" function with no way to improve, so reinforcement learning just keeps you there, even though you might do substantially better if you accepted a decrease in the "score" and ended up on the path to some other maximum of the function.
(Oh, and "Fr1st ps0t!", especially if it isn't.)
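The local-maximum trap described in this comment can be shown with a toy sketch. This is only an illustration, not anything from the article: the `scores` list and the `hill_climb` helper are made-up names, and a greedy climber on a one-dimensional list is a stand-in for a far more complex learning algorithm.

```python
def hill_climb(scores, start):
    """Greedy ascent: move to a neighbour only if its score is strictly higher."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(scores)]
        best = max(neighbours, key=lambda i: scores[i])
        if scores[best] <= scores[pos]:
            return pos  # no neighbour is better, so the climber stops here
        pos = best

# Hypothetical score landscape: a small hill, a dip, then a much taller hill.
scores = [0, 1, 2, 3, 2, 1, 0, 4, 8, 9]

local_peak = hill_climb(scores, start=0)   # gets stuck on the small hill
global_peak = max(range(len(scores)), key=lambda i: scores[i])
print(local_peak, scores[local_peak], global_peak, scores[global_peak])
```

Starting from the left, the climber refuses to walk down through the dip, so it never sees the taller hill on the right.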

--

If it works in theory, try something else in practice.
1. Re:short term vs long term gain by lgw · 2018-11-27 05:59 · Score: 4, Interesting
  
  This is more important than you make it out to be. The key to these games is that you have to make a map to succeed. That's not the kind of learning you get from "machine learning", as obvious as it might be to a human player.
  One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world. It's no accident that the neurological seat of human intelligence is an addition to our massive vision processing wetware - understanding the world in terms of objects precedes self awareness in the only example we can study. "AI" doesn't work that way, at least for the most part.
  I find it impressive that someone has managed to connect the idea of making a map with the internals of machine learning (which are completely arbitrary matrices that have no obvious connection to the result).
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
2. Re:short term vs long term gain by Areyoukiddingme · 2018-11-27 07:54 · Score: 4, Interesting
  
  One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world.
  There are many kinds of AI. Neural nets don't construct a representational model of the world from visual input but other AI techniques do. The Soar framework used so successfully for the machine-controlled antagonists in Descent (among many other uses) supports chunking, reinforcement learning, episodic learning, and semantic learning. It is based on the unified theory of cognition. It has both a temporary and permanent representational memory. It's fundamentally rule-based, rather than a neural net.
  There was at one time a neural net version of Soar called Neuro-Soar but it's not part of the mainstream Soar library.
If a human chooses the algorithm, is it AI? by phantomfive · 2018-11-27 06:09 · Score: 2

Here is the key quote from the article that vaguely describes the algorithm they used:

The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remember where they have been before, and will return to a particular area or task later on to see if it might help provide better overall results. The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount.
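The "remember where you have been, return there, then explore" idea in that quote can be sketched on a toy problem. This is a rough illustration under stated assumptions, not Uber's implementation: the corridor environment, the names `step`, `replay`, and `explore_with_memory`, the least-chosen selection rule, and every parameter value are invented for the example. It also assumes a deterministic world, so that replaying stored actions reliably returns to a remembered place.

```python
import random

GOAL = 20  # hypothetical corridor length; a reward would exist only at the far end

def step(state, action):
    """Deterministic toy environment: move left (-1) or right (+1), clamped to the corridor."""
    return max(0, min(GOAL, state + action))

def replay(actions, start=0):
    """Return to a remembered place by replaying the actions that reached it."""
    state = start
    for action in actions:
        state = step(state, action)
    return state

def explore_with_memory(iterations=10_000, burst=5, seed=0):
    rng = random.Random(seed)
    archive = {0: []}       # place -> shortest known action sequence that reaches it
    times_chosen = {0: 0}   # how often each place was used as a starting point
    for _ in range(iterations):
        if GOAL in archive:
            break
        # Assumed heuristic: go back to the place we have started from least often.
        cell = min(archive, key=lambda c: times_chosen[c])
        times_chosen[cell] += 1
        path = list(archive[cell])
        state = replay(path)          # return to the remembered place
        for _ in range(burst):        # then take a few random steps from there
            action = rng.choice((-1, 1))
            state = step(state, action)
            path.append(action)
            # Remember new places, and shorter routes to known ones.
            if state not in archive or len(path) < len(archive[state]):
                archive[state] = list(path)
                times_chosen.setdefault(state, 0)
    return archive

archive = explore_with_memory()
print(GOAL in archive, len(archive))
```

No score is consulted while exploring. Progress comes only from the memory of visited places, which is why an approach of this kind can get through long stretches where the game gives no reward.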

--
"First they came for the slanderers and i said nothing."
1. Re: If a human chooses the algorithm, is it AI? by phantomfive · 2018-11-27 10:54 · Score: 2
  
  In a single post you learned I'm a moron. No repetition needed.
  
  --
  "First they came for the slanderers and i said nothing."
2. Re:If a human chooses the algorithm, is it AI? by ceoyoyo · 2018-11-27 13:49 · Score: 3, Interesting
  
  Humans are not slow. Hinton has computed the amount of sensory information that is processed by the human brain, using reasonable approximations for things like the effective sampling rate of the eyes and ears. It's enormous.
  The *consciousness* that we subjectively experience is slow. We're also pretty horrible at tasks we have to consciously think about as we're doing them too. Both of which suggest that "consciousness" might be considerably less important than many give it credit for.
Non-differentiable functions are hard to optimize by smoothnorman · 2018-11-27 06:25 · Score: 2

Once again, this seems to be a case of "AI" research re-discovering some basic math: if the function has discontinuities or is otherwise non-differentiable ("behaviors that are necessary to advance within the game do not help increase the score until much later.") then its optimization is hard or dependent on fortunate starting conditions.
As an aside, have we even developed an accepted definition for what properly qualifies as "AI"? Recently I was being flogged some software whose selling point was an "AI engine" which turned out to be little more than a previous version with a little bit of Bayesian statistics bolted on.
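The point above about discontinuous score functions can be made concrete with a small sketch. Everything here is assumed for illustration: `sparse_score` is a made-up reward that is 1 only near a goal and 0 elsewhere, and `finite_difference` is an ordinary central-difference slope estimate.

```python
def sparse_score(x, goal=10.0, width=0.5):
    """Hypothetical sparse reward: 1 inside a narrow window around the goal, else 0."""
    return 1.0 if abs(x - goal) <= width else 0.0

def finite_difference(f, x, h=1e-3):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_ascent(f, x, rate=1.0, steps=100):
    """Follow the estimated slope uphill; with a zero slope, x never changes."""
    for _ in range(steps):
        x += rate * finite_difference(f, x)
    return x

starts = [0.0, 2.5, 5.0, 7.5]
slopes = [finite_difference(sparse_score, x) for x in starts]
finals = [gradient_ascent(sparse_score, x) for x in starts]
print(slopes, finals)
```

Away from the goal the estimated slope is zero, so an optimizer that only follows the local slope stays wherever it started. That is the "dependent on fortunate starting conditions" case.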
Want to read the actual details? Here's the blog. by SuperKendall · 2018-11-27 08:02 · Score: 4, Informative

I couldn't find this link anywhere in the actual article Slashdot linked to or the summary - the blog post laying out what Go-Explore is in more detail:
http://eng.uber.com/go-explore/

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley