Artificial General Intelligence That Plays Video Games: How Did DeepMind Do It?

Posted by timothy on Thursday September 25, 2014 @08:20AM from the they-can't-let-you-do-that-dave dept.

First-time accepted submitter Hallie Siegel writes: "Last December, an article titled 'Playing Atari with Deep Reinforcement Learning' was uploaded to arXiv by employees of a small AI company called DeepMind. Two months later Google bought DeepMind for 500 million euros, and this article is almost the only thing we know about the company. A research team from the Computational Neuroscience Group at University of Tartu's Institute of Computer Science is trying to replicate DeepMind's work and describe its inner workings."

DeepMind? by ArcadeMan · 2014-09-25 08:25 · Score: 4, Funny

I've seen the next generation after DeepMind, and it requires seven and a half million years of calculation to play a video game.

--
Get free satoshi (Bitcoin) and Dogecoins
1. Re:DeepMind? by i+kan+reed · 2014-09-25 08:36 · Score: 3, Interesting
  
  They have multi-GPU-accelerated, map-reduced neural nets these days. By comparison, they are amazingly fast and cheap. You can even buy physical servers built for that exact function.
2. Re:DeepMind? by ArcadeMan · 2014-09-25 09:08 · Score: 2
  
  I'm guessing 42 minutes. Or is it 42 years?
  
  --
  Get free satoshi (Bitcoin) and Dogecoins
Re:Opensource remake by i+kan+reed · 2014-09-25 08:37 · Score: 2

Well, you know what they say: make a proof of concept first, then make it good later (only a few people ever bother to do this).
How to do it. by Animats · 2014-09-25 08:38 · Score: 4, Interesting

That's neat. The demo takes in the video from a video game of the Pong/Donkey Kong era, can operate the controls, and in addition has the score info. It then learns to play the game. How to do that?
It's been done before, but not this generally. "Pengi", circa 1990, played Pengo using only visual input from the screen. It had hand-written heuristics, but only needed vision input from the game. So we have a starting point.
The first problem is feature extraction from vision. What do you want to take from the image of the game that you can feed into an optimizer? Motion and change, mostly. Something like an MPEG encoder, which breaks an image into moving blocks and tracks their motion, would be needed. I doubt they're doing that with a neural net.
Now you have a large number of time-varying scalar values, which is what's needed to feed a neural net. The first thing to learn is how the controls affect the state of the game. Then, how the state of the game affects the score.
I wonder how fast this thing learns, and how many tries it needs.
1. Re:How to do it. by steve.webster · 2014-09-25 09:03 · Score: 3, Informative
  
  According to their paper, DeepMind's Q-learning is indeed passing simplified, vectorized Atari screen pixels straight into a neural net. There's no MPEG or other pre-encoding of the screen, just conversion to grayscale, downsampling to 110x84 pixels, and cropping to an 84x84 region, with the last four frames stacked as the input.
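  For readers who want to see what that kind of preprocessing amounts to, here is a minimal NumPy sketch. It is not DeepMind's code: the function names `preprocess` and `stack_frames`, the nearest-neighbour resampling, the central crop, and the default sizes (110 rows scaled, 84x84 output, as I read the paper) are all assumptions made for illustration.

```python
import numpy as np

def preprocess(frame, scaled_height=110, out_size=84):
    """Turn one RGB frame (H, W, 3, uint8) into a small grayscale image in [0, 1].

    Hypothetical helper: sizes and crop offset are illustrative assumptions.
    """
    # Grayscale via standard luma weights.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights            # shape (H, W)

    # Nearest-neighbour downsample to (scaled_height, out_size).
    rows = np.linspace(0, gray.shape[0] - 1, scaled_height).round().astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, out_size).round().astype(int)
    small = gray[np.ix_(rows, cols)]

    # Central crop to a square (the exact crop region is an assumption).
    top = (scaled_height - out_size) // 2
    cropped = small[top:top + out_size, :]

    # Scale to [0, 1]; clip guards against float rounding just above 1.
    return np.clip(cropped / 255.0, 0.0, 1.0).astype(np.float32)

def stack_frames(frames):
    """Stack the most recent preprocessed frames into one network input."""
    return np.stack(frames, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_screen = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
    state = stack_frames([preprocess(fake_screen)] * 4)
    print(state.shape)
```

  Stacking several consecutive frames is what lets a network see motion, since a single still frame does not show which way the ball is travelling.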
Re:Opensource remake by jdavidb · 2014-09-25 09:00 · Score: 2

I took a graduate neural networks class in 2002 and did my implementation in Perl using PDL. The professor desperately pushed MATLAB on everybody but left us free to choose our own implementation language, and I chose Perl. I felt I understood neural networks pretty well at the end of the project. Twelve years on, all I remember are the basic concepts at a high level.

--
Secession is the right of all sentient beings.
No AI Can Simulate A Video Game Tester by __aaclcg7560 · 2014-09-25 09:31 · Score: 3, Interesting

When I worked as a video game tester for Accolade/Infogrames/Atari (same company, different owners, multiple identity crises), I drove the programmers nuts on a racing title. Most video game players will play a race from beginning to end. Not an experienced video game tester. I would stop the vehicle just before the finish line, turn around or drive in reverse, and crash the game by crossing the starting line. The programmers would complain that no one plays a racing game that way, try to wiggle out of fixing their code, and fix the bug only when it prevented them from going to code release. This is why testing automation is never used in the video game industry.
How to do it. by Jmstuckman · 2014-09-25 09:38 · Score: 4, Informative

Advances in Deep Learning have made it far easier to extract features from vision -- in fact, feeding pixels straight to the neural net is pretty close to being all you need to do.
Take a look at these slides and read about convolutional neural networks: http://www.slideshare.net/0xda...
Q Learning by Giant+Robot · 2014-09-25 11:05 · Score: 3, Informative

The methodology DeepMind used to train the game player is based on a classical reinforcement learning algorithm called Q-learning (http://en.wikipedia.org/wiki/Q-learning), developed in the late 1980s. In this approach the agent selects the action in its current state that maximizes expected future reward, which has some parallels with studies of how the basal ganglia region of our brain conducts reward learning.
What has been done is to approximate the action-value function Q (which originally used a lookup table) with a more general function, so that it can handle larger problems with a much larger (or infinite) number of states. The approach here was to use a function that can fit large amounts of data, in this case a multilayer neural network (with convolutional layers that first process the raw image input to identify features), to learn the game.
Something very similar was done a while ago by Tesauro (now at IBM Research), whose TD-Gammon combined a neural network with temporal-difference learning, a close relative of Q-learning, to play backgammon at an advanced level.
This is new because in recent years cheap GPUs have made training far faster than on conventional CPUs, so we can build much larger and deeper networks and learn from more complicated systems. Many new 'tricks' to improve learning have also been developed in recent years (sigmoid activations replaced by the simpler rectified linear function, dropout, etc.), so we are going to see better and more amazing uses for this relatively old technology.
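
To make the lookup-table version of Q-learning concrete, here is a minimal sketch on a toy five-state corridor, where the agent gets a reward of 1 for reaching the right-hand end. Everything in it (the environment, the names `q_update`, `step` and `train`, the learning rate and discount) is a made-up illustration, not anything from DeepMind's paper. The behaviour policy is uniformly random, which is allowed because Q-learning is off-policy.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward r + gamma * max Q[s_next]."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def step(s, a, n_states):
    """Toy corridor: action 0 moves left, action 1 moves right; the last state is the goal."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

def train(n_states=5, episodes=500, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))           # the lookup table: one row per state
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = int(rng.integers(2))      # random exploration (off-policy)
            s_next, r, done = step(s, a, n_states)
            q_update(Q, s, a, r, s_next, done)
            if done:
                break
            s = s_next
    return Q

if __name__ == "__main__":
    Q = train()
    print(np.argmax(Q[:-1], axis=1))      # greedy action in each non-goal state
```

A table like `Q` needs one row per state, which is hopeless when the state is a stack of screen images. The deep variant keeps the same update target but has a neural network output the Q-values for a given screen, and trains that network toward the target instead of overwriting a table entry.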