Artificial General Intelligence That Plays Video Games: How Did DeepMind Do It?
First time accepted submitter Hallie Siegel writes "Last December, an article named 'Playing Atari with Deep Reinforcement Learning' was uploaded to arXiv by employees of a small AI company called DeepMind. Two months later Google bought DeepMind for 500 million euros, and this article is almost the only thing we know about the company. A research team from the Computational Neuroscience Group at University of Tartu's Institute of Computer Science is trying to replicate DeepMind's work and describe its inner workings."
I've seen the next-generation after DeepMind, and it requires seven and a half million years of calculation to play a video game.
Well, you know what they say: make a proof of concept first, then make it good later (only a few people ever bother to do this).
That's neat. The demo takes in the video from a video game of the Pong/Donkey Kong era, can operate the controls, and in addition has the score info. It then learns to play the game. How to do that?
It's been done before, but not this generally. "Pengi", circa 1990, played Pengo using only visual input from the screen. It had hand-written heuristics, but only needed vision input from the game. So we have a starting point.
The first problem is feature extraction from vision. What do you want to take from the image of the game that you can feed into an optimizer? Motion and change, mostly. Something like an MPEG encoder, which breaks an image into moving blocks and tracks their motion, would be needed. I doubt they're doing that with a neural net.
Now you have a large number of time-varying scalar values, which is what's needed to feed a neural net. The first thing to learn is how the controls affect the state of the game. Then, how the state of the game affects the score.
I wonder how fast this thing learns, and how many tries it needs.
interesting piece...
I took a graduate neural networks class in 2002 and did my implementation in Perl using PDL. The professor desperately pushed MATLAB on everybody but left us free to choose our own implementation language, and I chose Perl. I felt I understood neural networks pretty well at the end of the project. Twelve years on all I remember are the basic concepts at a high level.
Secession is the right of all sentient beings.
"Clearly not concerned about the AI's performance?"
It uses Python, indeed. And for the computationally intensive tasks, it uses NumPy and Theano. Theano is a general symbolic computation framework that can automatically accelerate your vector computations on a nearby GPU, etc.
I don't know how it compares with DeepMind's (likely Lua, Torch-based) implementation. But assuming that scientific Python programs actually do their expensive computations in the Python VM is really rather silly.
It's not the fall that kills you. It's the sudden stop at the end. -Douglas Adams
What I would like to know is how to get $500M with so little track record, intellectual property, or even publications. I don't get it.
When I worked as a video game tester for Accolade/Infogrames/Atari (same company, different owners, multiple identity crises), I drove the programmers nuts on a racing title. Most video game players will play a race from beginning to end. Not an experienced video game tester. I would stop the vehicle just before the finish line, turn around or drive in reverse, and crash the game by crossing the starting line. The programmers would complain that no one plays a racing game that way, try to wiggle out of fixing their code, and fix the bug only when it prevented them from going to code release. This is why testing automation is never used in the video game industry.
Advances in Deep Learning have made it far easier to extract features from vision -- in fact, feeding pixels straight to the neural net is pretty close to being all you need to do.
Take a look at these slides and read about convolutional neural networks: http://www.slideshare.net/0xda...
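To make "feeding pixels straight to the neural net" a little more concrete, here is a minimal sketch of the sliding-filter operation a convolutional layer applies to raw pixels. The tiny frame and the hand-picked edge filter are made up for illustration; a trained network would learn its own filters, and this is not DeepMind's code.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # Weighted sum of the pixel patch under the filter.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 "frame": dark on the left, bright on the right.
frame = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(frame, edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the edge sits, which is the kind of "where did something change" signal that used to be extracted by hand.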
I find myself wondering about the following question:
How did they differentiate "learning to play the game" from "learning how to track the game's RNG"?
Most video games have ridiculously simplistic PRNGs embedded in them. An AI might get "sidetracked" and learn how to play the underlying RNG output of the game, rather than the game itself. That would yield really good results for most arcade games of this type, I imagine (weak RNG, limited input and timing options, etc.) I don't know if they checked for that possibility.
Easy way to check, though: Reach into the game and substitute a better RNG (cryptographically strong/hardware/quantum) for the one in the game. That would enable you to quickly determine the difference. If the AI's game performance suddenly goes to shit, it wasn't a real game-playing AI. If it doesn't, well, all hail Skynet, I guess.
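A toy sketch of that check, under heavy assumptions: the "game" is reduced to a stream of random bits, WeakLCG is a made-up stand-in for an old game's generator, and RngTracker is a hypothetical agent that has memorised that generator instead of learning the game. Swapping in the operating system's entropy source (random.SystemRandom) shows the tracker's advantage disappearing.

```python
import random


class WeakLCG:
    """Tiny linear congruential generator, a stand-in for an old game's PRNG."""

    def __init__(self, seed=1):
        self.state = seed

    def next_bit(self):
        self.state = (self.state * 75 + 74) % 65537
        return self.state & 1


class RngTracker:
    """Hypothetical agent that has learned the weak generator's sequence."""

    def __init__(self, seed=1):
        self.model = WeakLCG(seed)

    def predict(self):
        return self.model.next_bit()


def hit_rate(game_bit_source, agent, rounds=2000):
    """Fraction of rounds in which the agent guesses the game's next bit."""
    hits = 0
    for _ in range(rounds):
        guess = agent.predict()
        hits += guess == game_bit_source()
    return hits / rounds


# Original game: the tracker predicts every "random" event.
weak_rate = hit_rate(WeakLCG(seed=1).next_bit, RngTracker(seed=1))

# Patched game: same agent, but the bits now come from the OS entropy pool.
strong = random.SystemRandom()
strong_rate = hit_rate(lambda: strong.getrandbits(1), RngTracker(seed=1))

print(weak_rate, strong_rate)
```

An agent that really plays the game should score about the same under both generators; one that only tracks the RNG falls back to roughly coin-flip performance.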
I am Chaos. I am alive, and I tell you that you are Free. -Eris
They used a fish. Didn't you see it on twitch or wherever?
Per the article, the AI has absolutely no knowledge of the game, what a "player" is, etc. All it has as input are the 84x84 pixels of the preprocessed game screen, and a "score delta" (represented as -1, 0, 1 -- score goes up, it gets a 1 signal).
Everything else is "learned" by the engine (on its own) after repeatedly playing the game for a couple of hours. They tried this algorithm out on 7 different Atari games -- nothing learned from Game A was carried over to Game B.
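The "score delta" signal can be sketched in a couple of lines. The function name and the example scores are made up for illustration; the only thing taken from the description above is that the agent sees -1, 0 or 1 rather than the raw score.

```python
def clip_reward(score_before, score_after):
    """Reduce a raw score change to -1, 0 or 1 (its sign)."""
    delta = score_after - score_before
    return (delta > 0) - (delta < 0)


# Hypothetical score readings taken on consecutive frames.
print(clip_reward(100, 350))  # score went up
print(clip_reward(50, 50))    # no change
print(clip_reward(20, 10))    # score went down
```

Throwing away the magnitude means a game that hands out 10 points per hit and one that hands out 1000 look the same to the learner, which is part of why one algorithm can be pointed at several games without retuning.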
OMFG, these guys talking about pixels (soon they will write OpenCV stubs in comments). It's not about the video or picture or anything like that. It's about the fact that, without any knowledge of what the game is, the AI can learn the structure and the rules of the game.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)
they are a part of (ahem!) team Googie, not team IBM. Sorry article.
Both the summary and article say "google".
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
The methodology DeepMind used for training the game player is based on a classical reinforcement learning algorithm called Q-learning (http://en.wikipedia.org/wiki/Q-learning), developed in the late 1980s. This approach, in which the agent selects the action in the current state that maximizes expected future reward, has some parallels with studies of how the basal ganglia region of our brain conducts reward learning.
What has been done is to approximate the action-value function Q (which originally used a lookup table) by a more general function, in order to approach larger problems with a much larger (or infinite) number of states. The approach here was to use a function which can fit large amounts of data, in this case a multi-layered neural network (with convnet layers to preprocess the raw image input first to identify features), to attempt to learn the game.
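For reference, the lookup-table form of Q-learning fits in a few lines. This is a minimal sketch on a made-up five-state corridor (all names, the environment and the constants are assumptions for illustration), not the Atari setup; the deep version replaces the table q[state][action] with a neural network's output.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)      # 0 = left, 1 = right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Toy deterministic corridor: reward 1 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the lookup table Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore uniformly at random; Q-learning is off-policy, so it
            # still learns the values of the greedy policy.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q_table = train()
greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

After training, the greedy policy read off the table heads right from every state. With a screen's worth of pixels as the state there are far too many rows for a table, which is where the function approximator comes in.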
This was actually done a while ago by Tesauro (now at IBM Research), who used a closely related approach, temporal-difference learning with a neural network, to create an agent that plays backgammon at an advanced level.
The reason this is new is that in recent years we can employ cheap GPUs to train far more quickly than on conventional CPUs, and can construct much larger and deeper networks to learn from more complicated systems. Also, many new 'tricks' have been developed to optimize learning in recent years (sigmoid functions replaced by the simpler rectified linear function, dropout, etc.), so we are going to see better and more amazing uses for this relatively old technology.
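One of those tricks is easy to show numerically. The sketch below compares the gradient of a sigmoid unit with that of a rectified linear unit for a large input; the function names are just illustrative helpers, not from any particular library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Derivative of the rectified linear function (taken as 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0


for x in (0.0, 2.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))
```

The sigmoid's gradient peaks at 0.25 and shrinks towards zero as the unit saturates, so the error signal fades as it is passed back through many layers; the rectified linear unit passes it through unchanged whenever the unit is active.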
Twelve years on all I remember are the basic concepts at a high level.
I formally studied AI and neural nets 25yrs ago, I recently came across this series of video lectures on YT. I started watching to refresh my memory and ended up learning quite a bit of new stuff that was unknown when I did my degree. It took me about a month or so to watch the whole series, definitely worth the effort if you already have the basics, but forget it if statistical maths or matrices scare you.
Peal/Python - A toy AI doesn't need to be fast; its purpose is to play with ideas, and scripts are much more flexible than binaries for this purpose.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
from the technical side, the real hurdle is vision...the ability to compute the best move is relatively easy
here's why: video games are 'AI'...we program games to 'play' us all the time...which is reacting to continually changing parameters to choose the best option for input to control the game to 'win'
from a complexity standpoint, think of the AI from a new P2P shooter's level of complexity vs a ghost in PacMan
all of 'ai' is abstractions based on arbitrary choices...in this instance they define "artificial intelligence plays video game" to mean having an external visual processor that relays data...then a robot arm presses buttons
again, making a robot arm to press buttons and move a joystick surely isn't "easy" but it is *simple* behavior for robotics programming
so...what is this AI really doing? what is the real engineering and coding work being done? it's the vision...it's programming the "eye" of the computer to take in visual data in the best way, in this case from another screen
"computer vision" is interesting of course...but i'm not sure we need to bother with the whole "ai" aspect of this work...it's a lot of language to describe coding
Thank you Dave Raggett
hey...i came to the exact opposite conclusion as you...my comment is above you can check it out and tell me what you think
Thank you Dave Raggett
How does this compare to Learnfun and Playfun, programs publicized last year as learning and playing NES games?
http://www.cs.cmu.edu/~tom7/ma...
Me, too. No, wait... that was the Minestrone / Cannabis problem. My bad. I did solve it, though. Ate every damn bit of that minestrone. I think I found some old popcorn under the sofa seats, too. Don't exactly remember.
I've fallen off your lawn, and I can't get up.
I remember Peal... from Bell Labs, right? Yeah, I thought it rang a bell.
I've fallen off your lawn, and I can't get up.
"machine learning" is the same as every other machine behavior: it is the product of coding instructions from humans
i don't have a problem with the language, but it's not the same as "human learning" at all
when people say a machine "learned" what they really mean is that its optimization algorithm did its programmed task effectively
"learning" = optimization over time based on parameters set by humans
Thank you Dave Raggett
Differentiating between the two AIs is easy. One should mostly work on all levels and the other needs to be trained on every level.
I also regret getting a masters in AI. Before, everything AI related was awesome; now it's all trivial. This is what reinforcement learning does. Given some goal, it runs hundreds to millions of simulations, slowly reusing info from what worked better than before. If you set up the problem correctly, it will always eventually reach the goal. Some of the specifics are non-trivial to implement if you want it to finish before you die, but all the high-level algorithms are trivial.
Not only did AI lose its awesomeness, it also opened my eyes to all the corruption going on (or purposeful self-ignorance). These guys 'stole' all the credit for another's work. Yavar Naddaf did a very similar thing for a masters thesis a couple of years before these guys. The main difference was he used normal reinforcement learning and these guys used deep reinforcement learning. They didn't even cite his work. They made a fairly trivial and safe improvement to something that had already been done: http://www.arcadelearningenvironment.org/publications/yavar-thesis/ and are getting a lot of fame and money for it.
* Yes, experience has made me bitter at an early age. I don't have a lawn for you to get off of.
* I don't know Naddaf in any way. I read his paper a few years ago when doing research for mine.
Take the goat and cabbage across the river in the boat.
Leave the wolf behind.
Why the heck were you traveling with a wolf to begin with?
(ob xkcd I think).
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
Somebody else already told you about Theano. To add to that, a lot of neural net stuff gets done in Python because Theano will happily take your equation, compile it for a multi-GPU or CPU setup, optimize it, and run it fast.
A neural net is a couple of equations that need to run fast and a lot of data manipulation and visualization. Theano, Cython, a C module, pyOpenCL/pyCUDA, or something equivalent takes care of the little bit that needs to be fast.
I don't recall Reaper bots learning opponent play styles. They did learn to path around new maps (the guy who wrote the AI had a day job writing routing routines for network routers, and he applied concepts from that to game pathing), and they had a simple finite state machine that allowed them to track enemies, search for enemies, engage in combat using techniques like circle-strafing, disengage from combat if conditions like health/armour/powerups/weapons were unfavourable, etc. As far as I remember they treated all players the same, though.
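A finite state machine of that kind is small enough to sketch. The state names follow the behaviours listed above, but the events and the transition table are guesses made up for illustration, not the Reaper bot's actual logic.

```python
# (current state, event) -> next state. Hypothetical transitions.
TRANSITIONS = {
    ("search", "enemy_seen"): "engage",
    ("track", "enemy_seen"): "engage",
    ("engage", "enemy_lost"): "track",
    ("track", "trail_cold"): "search",
    ("engage", "low_health"): "disengage",
    ("disengage", "recovered"): "search",
}


def advance(state, event):
    """Return the next state; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)


# Walk the bot through a made-up sequence of game events.
state = "search"
history = [state]
for event in ["enemy_seen", "low_health", "recovered",
              "enemy_seen", "enemy_lost", "trail_cold"]:
    state = advance(state, event)
    history.append(state)

print(" -> ".join(history))
```

Nothing in the table depends on who the opponent is, which matches the recollection that every player was treated the same.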
again, misquoting me and using linguistics to make your case not engineering
the behavior you describe, a computer doing a task without 'a priori knowledge' is "machine learning"
my point, which you ignore, is that "play any game without 'a priori knowledge'" is ***the computer running code*** that some dumb monkey ***programmed*** according to parameters
it's all coding...it's not "intelligence" and you're purposefully inventing linguistic distinctions to keep arguing
Thank you Dave Raggett
It's not scary. It's pretty basic.
Genetic algorithms have a classic example where a GA evolved a chip design that could distinguish between two frequencies of an electrical input. It did so in a more efficient and smaller package than anything designed to do so. And it was so complex that (it was said, but people say a lot) nobody really understands how it works... but it does.
The problem is that it's also not intelligence of any kind. If anything, the opposite. Pigeons, for example, if you put them in an enclosed room and feed them at completely random times will pick up a "superstition" where they correlate something they were doing at the time the food was dispensed, and think that "brings food". So you'll get pigeons sitting there banging their head against the wall because the first time they did that, some food dropped at random. That's not intelligence, even if it's being displayed by a real animal.
This is the same thing. By sheer chance, a correlation is formed between "winning slightly more often" and a certain action and that action is reinforced every time it wins slightly more.
This isn't scary, and it's certainly not intelligence or AI, and it's miles away from making the leap that real "intelligent" animals can make: "if I do this, this will happen, which will give me what I want". That is an entirely different abstraction from repeatedly trying something; it involves an intellectual "insight", which may help in situations that you've NEVER encountered before (instead of being repeatedly trained on the same situation until you happen to find the solution).
It's the difference between a point-and-click adventure where if you just point-and-click enough and in every combination, you'll "win", and a 3D FPS where you can use tactics and strategy that can outwit an opponent who's totally unpredictable through insight into the deeper situation.
there is no "singularity"...humans are unique and have free will and civil rights...we are always dynamic and each human learns differently...
also, we understand a lot about how humans learn...there are whole fields of inquiry in academia and professions devoted to it...you may have even met one of these people when you attended school...they study people like Vygotsky now...and integrate neuroscience into their learning models
the same neuroscience we programmers use to model computer architecture
humans are different from machines and always will be...there is no correlation between processor speed and 'AI' advancement towards being "like human"...you're reading too much sci-fi
Thank you Dave Raggett
this whole ontology, it's not science or engineering...it's language tricks to make us humans feel like we've accomplished something when really it's just coding...
'ai' is code...code written by humans
also, there is no specific line where we can say "we've learned everything about how humans learn"...you can't have a black/white dichotomy with an abstract idea like "learning"
"learning" is different to every human and always will be...every human is unique in the universe and has free will...no machine will ever have these characteristics...only the characteristics we linguistically ascribe to machine behavior that is ***entirely*** dependent and predictable by the human coding that instructs it
Thank you Dave Raggett
absolutely not...you've been reading too much Richard Dawkins...put his books down forever he's a troll on academia
**secular humanism** also holds essentially this same view...
"every human is unique in the universe and has free will...no machine will ever have these characteristics"
not the last part about machines, but the free will aspect of human existence is **NOT TIED TO RELIGION**
here is the UN declaration of human rights: http://en.wikipedia.org/wiki/U...
it is not religious in nature at all
that said, thanks for your insightful comments!
Thank you Dave Raggett
hey thanks for the comments
You seem to be implying that humans somehow learn differently than programs because the program is "programmed" and we're not. Do you have anything to support that assertion, besides "it's blatantly obvious to anyone with technical experience?" There's fairly good evidence that we've been "programmed" very effectively, and quite beyond what most of us would like to believe, by evolution.
now...what kind of evidence could I present that would satisfy your need?
if i had access, i could take Watson or another well known AI and show you the logic schematics then show you the codebase, then demonstrate how changing the codebase changes how Watson (or w/e) behaves with highly predictable results. I could have the engineers who made Watson walk you through their entire development process, and at each point of decision, explain to you how the decision affects how it works...
or, would you prefer some kind of academic study? i honestly don't know if such a study could even logically exist...my assertions are not provable -or- disprovable in that way
it's about having done the work of making a machine function...that's the experience/knowledge that i feel makes my assertion 'obvious'
now, humans being "programmed"...
i feel i know a bit too much about this...but yes, you can use technology, like chemistry or electricity or other E-M stuff, to alter human behavior
3 shots of whiskey or a 100K Volt cattle prod or some GHB...all can be said to "program" human behavior
the key here is consent...one human can "program" or control another but if it is without consent then it is abuse!
Thank you Dave Raggett
also, you may want to brush up on your education theory, because it's made leaps and bounds in the last 15 years, incorporating the exact same neuroscience that AI learning tries to use
i got an MA in Education from CU-Boulder in 2007...don't teach now, but i was genuinely impressed with how teaching has advanced as a profession
i also taught snowboarding for 6 seasons...applying the "Facilitated Learning Model", which was developed by...wait for it...education theorists at CU-Boulder
Vygotsky and Csikszentmihalyi formed the basis for the Facilitated Learning Model (which is more an application of theory for the classroom)...it makes a distinction between memorizing a list and learning
The "learning" happens in what Vygotsky called the "Zone of Proximal Development"
The facilitated learning model is the act of strategically placing intrinsically motivated individual autonomous actors into positions where a teacher can model behavior while simultaneously reacting to the differing needs of each individual learner
http://en.wikipedia.org/wiki/F...
http://en.wikipedia.org/wiki/M...
http://en.wikipedia.org/wiki/L...
http://en.wikipedia.org/wiki/Z...
the point is, human learning has advanced much...based on russian theorists who operated from the idea that humans have free will and that education is facilitating the learning that happens when one human is in the zone of proximal development
i admit, applying these theories to machine programming opens some interesting possibilities, but again, it's an adaptation to a command executed by a machine that **mimics** the behavior we see in humans in a machine for a specific task...in other words, yes, maybe we can use these theories to program better machines, but the act of doing so proves what i'm saying...machines are fundamentally different than humans
Thank you Dave Raggett
it's not "perfectly conceivable"...it's complete conjecture
like i said a few comments back, you've been watching too much sci-fi and have no concept of how this stuff is actually made
that's why i said, earlier, that i'd have to *literally* take you by the hand and have you talk to the Watson (or other ai) team, look at the codebase...because it seems that's the only way you can understand how complex this work is
here's your problem in a nutshell:
I suspect that we'll understand and be able to construct artificial intelligence before we can replicate a human brain, but I don't think either is more than 100 years away.
before what?
***we already understand "artificial intelligence" it's just code***
don't you see?
the notion that "artificial intelligence" is something that we can 100% "understand" shows a fundamental misunderstanding of what "artificial intelligence" actually is...it's just software running on hardware, all programmed by humans
also, it burdens me greatly that you somehow don't think humans have free will...you want to inject religion or 'supernatural' stuff but that's not even relevant
I linked you to the Universal Declaration of Human Rights...you should at least have a cursory understanding of how civil rights works in the US...it's absolutely ridiculous that you think I need to proffer up some sort of link to prove humans have free will
here...if humans do not have free will and inherent civil rights then let me send you a Power of Attorney form and you can **prove** to me that you don't have free will or civil rights by signing them away to me
Thank you Dave Raggett
the notion that our brains are deterministic machines
you're already a "true believer" aren't you?
the idea that the *thing that created machines* (human brain) is nothing more than a machine is ridiculous
machines are tools for humans...that's all...
our brains can be compared to machines (anything can be compared to anything else), but that doesn't mean that our brains function like machines
it's a false ontology...and it's based on your **personal beliefs** not rationality or logic
Thank you Dave Raggett