Artificial General Intelligence That Plays Video Games: How Did DeepMind Do It?

Posted by timothy on Thursday September 25, 2014 @08:20AM from the they-can't-let-you-do-that-dave dept.

First time accepted submitter Hallie Siegel writes: Last December, an article named 'Playing Atari with Deep Reinforcement Learning' was uploaded to arXiv by employees of a small AI company called DeepMind. Two months later Google bought DeepMind for 500 million euros, and this article is almost the only thing we know about the company. A research team from the Computational Neuroscience Group at the University of Tartu's Institute of Computer Science is trying to replicate DeepMind's work and describe its inner workings.

3 of 93 comments

Re:How to do it. by steve.webster · 2014-09-25 09:03 · Score: 3, Informative

According to their paper, DeepMind's Q-learning agent is indeed fed simplified Atari screen pixels straight into a neural net. There's no MPEG or other pre-encoding of the screen, just conversion to grayscale, down-sampling to 110x84 pixels and cropping to an 84x84 region that roughly covers the playing area.
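A minimal, dependency-free Python sketch of that preprocessing, assuming a frame is a list of rows of (r, g, b) tuples. The luminance weights, the nearest-neighbour resizing and the hypothetical `crop_top=18` offset are illustrative assumptions; the paper only says the crop roughly captures the playing area.

```python
def to_grayscale(frame):
    """Convert an RGB frame (list of rows of (r, g, b) tuples) to luminance values."""
    # Assumed ITU-R BT.601 luma weights; any reasonable grayscale conversion would do.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour down-sampling of a 2-D list to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def preprocess(frame, crop_top=18):
    """Grayscale -> 110x84 -> 84x84 crop starting at row crop_top (hypothetical offset)."""
    gray = to_grayscale(frame)
    small = resize_nearest(gray, 110, 84)
    return small[crop_top:crop_top + 84]
```

For example, calling `preprocess` on an Atari-sized 210x160 frame returns 84 rows of 84 grayscale values. A real implementation would more likely use an image library for the resize; plain lists are used here only to keep the sketch self-contained.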

How to do it. by Jmstuckman · 2014-09-25 09:38 · Score: 4, Informative

Advances in deep learning have made it far easier to extract features from visual input; in fact, feeding raw pixels straight to the neural net is pretty close to being all you need to do.
Take a look at these slides and read about convolutional neural networks: http://www.slideshare.net/0xda...
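To make "feeding pixels straight to the net" concrete, here is a small Python sketch of a strided, single-channel "valid" convolution, the operation a convnet layer applies with many learned filters at once. The layer sizes in the comments (8x8 filters with stride 4, then 4x4 filters with stride 2, on an 84x84 input) follow the 2013 DeepMind paper; the all-ones kernels are placeholders standing in for learned weights.

```python
def conv2d_valid(img, kernel, stride):
    """Single-channel 'valid' 2-D cross-correlation, which deep-learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(img) - kh) // stride + 1
    out_w = (len(img[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i*stride, j*stride).
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    s += img[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(s)
        out.append(row)
    return out


# Output-size arithmetic for the paper's first two layers on an 84x84 input:
#   (84 - 8) // 4 + 1 = 20  -> 20x20 feature map per 8x8 filter
#   (20 - 4) // 2 + 1 = 9   -> 9x9 feature map per 4x4 filter
img = [[1.0] * 84 for _ in range(84)]
h1 = conv2d_valid(img, [[1.0] * 8 for _ in range(8)], stride=4)
h2 = conv2d_valid(h1, [[1.0] * 4 for _ in range(4)], stride=2)
print(len(h1), len(h2))
```

Each output value only looks at a small patch of the previous layer, which is why stacking a few such layers lets the network build up features from raw pixels without hand-written feature extraction.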

Q Learning by Giant+Robot · 2014-09-25 11:05 · Score: 3, Informative

The methodology DeepMind used for training the game player is based on a classical reinforcement learning algorithm called Q-learning (http://en.wikipedia.org/wiki/Q-learning), developed in the late 1980s. This approach, in which the agent selects the action in its current state that maximizes expected future reward, has some parallels with studies of how the basal ganglia region of our brain conducts reward learning.
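The classical, table-based form of Q-learning can be sketched in a few lines of Python. After each step the table entry is nudged towards the observed reward plus the discounted best value of the next state: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The five-state "chain" environment and the values of alpha, gamma and epsilon are hypothetical choices for illustration only.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the goal (terminal)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # the look-up table Q[s][a]
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy exploration: occasionally try a random action.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # the Q-learning update
            s, steps = s2, steps + 1
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

On this toy problem the learned greedy policy is to move right from every non-goal state. The table has one entry per state-action pair, which is exactly what stops scaling once the "state" is a screen full of pixels.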
What has been done is to approximate the action-value function Q (originally stored as a look-up table) with a more general function, so that the method can tackle larger problems with a much larger (or infinite) number of states. The function chosen here is one that can fit large amounts of data: a multi-layer neural network, with convolutional layers that first process the raw image input to identify features, which then attempts to learn the game.
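The idea of replacing the table with a function can be sketched with the simplest possible approximator. The Python below uses a linear model, Q(s, a) = w_a . phi(s), trained by semi-gradient Q-learning. This is a deliberate swap: DeepMind used a deep convolutional network, not a linear model. The feature vectors, `alpha` and `gamma` are hypothetical; only the shape of the update carries over.

```python
def q_value(weights, features, action):
    """Linear approximator: Q(s, a) = w_a . phi(s), with one weight vector per action."""
    return sum(w * f for w, f in zip(weights[action], features))


def semi_gradient_update(weights, phi, action, reward, phi_next, done,
                         alpha=0.1, gamma=0.99):
    """One Q-learning step that adjusts weights instead of a table entry."""
    n_actions = len(weights)
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_value(weights, phi_next, b) for b in range(n_actions))
    td_error = target - q_value(weights, phi, action)
    # For a linear model, the gradient of w_a . phi(s) with respect to w_a is phi(s).
    weights[action] = [w + alpha * td_error * f for w, f in zip(weights[action], phi)]
    return td_error


weights = [[0.0, 0.0], [0.0, 0.0]]    # two actions, two hypothetical features
td = semi_gradient_update(weights, [1.0, 2.0], 1, 1.0, [0.0, 0.0], True)
print(td, q_value(weights, [1.0, 2.0], 1))
```

Because every state sharing similar features shares the same weights, one update generalises to states the agent has never seen; a neural network does the same thing with a far more flexible function, and its weights are updated by backpropagating the same TD error.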
This was actually done years ago by Gerald Tesauro (now at IBM Research), whose TD-Gammon combined a neural network with temporal-difference learning, a closely related method, to play backgammon at an advanced level.
What is new is that in recent years we can use cheap GPUs, which train these networks much faster than conventional CPUs, to construct much larger and deeper networks that learn from more complicated systems. Many new 'tricks' for optimizing learning have also been developed recently (sigmoid activations replaced by simpler rectified linear units, dropout, etc.), so we are going to see better and more amazing uses for this relatively old technology.
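Two of those tricks fit in a few lines of Python. The sketch compares the gradient of a sigmoid with that of a rectified linear unit (the sigmoid's gradient is at most 0.25 and vanishes for large inputs, while the ReLU's stays at 1 for any positive input) and implements dropout. It uses the common "inverted dropout" variant, which rescales surviving units during training; the original formulation instead scaled weights at test time. The drop probability `p=0.5` is an illustrative default.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # at most 0.25, and close to 0 for large |x|


def relu(x):
    return x if x > 0 else 0.0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0   # does not shrink for large positive inputs


def dropout(activations, p=0.5, rng=None, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)   # no-op at evaluation time
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


print(sigmoid_grad(0.0), sigmoid_grad(10.0), relu_grad(10.0))
print(dropout([1.0] * 8, p=0.5, rng=random.Random(0)))
```

Gradients that do not shrink layer after layer are one reason deeper networks became trainable, and randomly silencing units during training discourages the network from relying on any single feature.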