Uber has Cracked Two Classic '80s Video Games by Giving an AI Algorithm a New Type of Memory (technologyreview.com)



Posted by msmash on Tuesday November 27, 2018 @05:42AM from the closer-look dept.

An algorithm that remembers previous explorations in Montezuma's Revenge and Pitfall! could make computers and robots better at learning how to succeed in the real world. From a report: A new kind of machine-learning algorithm just mastered a couple of throwback video games that have proved to be a big headache for AI. Those following along will know that AI algorithms have bested the world's top human players at the ancient, elegant strategy game Go, one of the most difficult games imaginable. But two pixelated classics from the era of 8-bit computer games -- Montezuma's Revenge and Pitfall! -- have stymied AI researchers. There's a reason for this seeming contradiction. Although deceptively simple, both Montezuma's Revenge and Pitfall! have been immune to mastery via reinforcement learning, a technique that's otherwise adept at learning to conquer video games.

DeepMind, a subsidiary of Alphabet focused on artificial intelligence, famously used it to develop algorithms capable of learning how to play several classic video games at an expert level. Reinforcement-learning algorithms mesh well with most games, because they tweak their behavior in response to positive feedback -- the score going up. The success of the approach has generated hope that AI algorithms could teach themselves to do all sorts of useful things that are currently impossible for machines. The problem with both Montezuma's Revenge and Pitfall! is that there are few reliable reward signals. Both titles involve typical scenarios: protagonists explore blockish worlds filled with deadly creatures and traps. But in each case, lots of behaviors that are necessary to advance within the game do not help increase the score until much later. Ordinary reinforcement-learning algorithms usually fail to get out of the first room in Montezuma's Revenge, and in Pitfall! they score exactly zero.
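
The sparse-reward problem described above can be illustrated with a small Python sketch. This is a made-up toy corridor, not Montezuma's Revenge or Pitfall!, and it is not Uber's actual algorithm; the names `GOAL`, `HORIZON`, `step`, `random_episode`, and `explore_with_archive` are all hypothetical. It only shows, under those assumptions, why acting at random almost never reaches a distant reward, while an explorer that remembers where it has already been and returns there to explore further does.

```python
import random

GOAL = 40      # the only rewarded position in the toy corridor (assumption)
HORIZON = 50   # step budget for one episode of the random baseline (assumption)

def step(pos, action):
    """Move left (-1) or right (+1); position 0 is a wall. Reward is 1 only at GOAL."""
    new_pos = max(0, pos + action)
    return new_pos, int(new_pos == GOAL)

def random_episode(rng):
    """Baseline: act at random from the start and return the episode's total reward."""
    pos = 0
    for _ in range(HORIZON):
        pos, reward = step(pos, rng.choice((-1, 1)))
        if reward:
            return 1
    return 0

def explore_with_archive(rng, iterations=20000, burst=5):
    """Remember each position reached and a path to it, then keep returning
    to remembered positions and exploring a few random steps from there."""
    archive = {0: []}  # position -> list of actions that reaches it from 0
    for _ in range(iterations):
        start = rng.choice(list(archive))
        # The toy world is deterministic, so jumping to `start` is
        # equivalent to replaying archive[start] from position 0.
        pos, path = start, list(archive[start])
        for _ in range(burst):
            action = rng.choice((-1, 1))
            pos, reward = step(pos, action)
            path.append(action)
            # Keep the shortest known path to every position seen so far.
            if pos not in archive or len(path) < len(archive[pos]):
                archive[pos] = list(path)
            if reward:
                return archive[pos]
    return None

if __name__ == "__main__":
    rng = random.Random(0)
    print("random successes:", sum(random_episode(rng) for _ in range(200)))
    found = explore_with_archive(rng)
    print("archive explorer found a path:", found is not None)
```

The random baseline sees no score change until it happens to walk all 40 steps to the right, so it gets essentially no feedback to learn from. The archive explorer needs no reward to make progress: newly reached positions are themselves what it keeps, which is the rough idea behind "remembering previous explorations".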

3 of 100 comments


Re:short term vs long term gain by lgw · 2018-11-27 05:59 · Score: 4, Interesting

This is more important than you make it out to be. The key to these games is that you have to make a map to succeed. That's not the kind of learning you get from "machine learning", as obvious as it might be to a human player.
One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world. It's no accident that the neurological seat of human intelligence is an addition to our massive vision processing wetware - understanding the world in terms of objects precedes self awareness in the only example we can study. "AI" doesn't work that way, at least for the most part.
I find it impressive that someone has managed to connect the idea of making a map with the internals of machine learning (which are completely arbitrary matrices that have no obvious connection to the result).

--
Socialism: a lie told by totalitarians and believed by fools.

Re:short term vs long term gain by Areyoukiddingme · 2018-11-27 07:54 · Score: 4, Interesting

> One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world.
There are many kinds of AI. Neural nets don't construct a representational model of the world from visual input, but other AI techniques do. The Soar framework, used so successfully for the machine-controlled antagonists in Descent (among many other uses), supports chunking, reinforcement learning, episodic learning, and semantic learning. It is based on the unified theory of cognition. It has both a temporary and a permanent representational memory. It's fundamentally rule-based, rather than a neural net.
There was at one time a neural net version of Soar called Neuro-Soar but it's not part of the mainstream Soar library.

Re:If a human chooses the algorithm, is it AI? by ceoyoyo · 2018-11-27 13:49 · Score: 3, Interesting

Humans are not slow. Hinton has computed the amount of sensory information that is processed by the human brain, using reasonable approximations for things like the effective sampling rate of the eyes and ears. It's enormous.
The *consciousness* that we subjectively experience is slow. We're also pretty horrible at tasks we have to consciously think about as we're doing them too. Both of which suggest that "consciousness" might be considerably less important than many give it credit for.