Artificial General Intelligence That Plays Video Games: How Did DeepMind Do It?
First-time accepted submitter Hallie Siegel writes: Last December, a paper titled 'Playing Atari with Deep Reinforcement Learning' was uploaded to arXiv by employees of a small AI company called DeepMind. Two months later, Google bought DeepMind for 500 million euros, and this paper is almost the only thing we know about the company. A research team from the Computational Neuroscience Group at the University of Tartu's Institute of Computer Science is trying to replicate DeepMind's work and describe its inner workings.
I've seen the next-generation after DeepMind, and it requires seven and a half million years of calculation to play a video game.
https://github.com/kristjankor...
In Python, of all languages. Clearly not concerned about the AI's performance.
With an advert that pops up just as you go to click the 'no beta' link. WTF...
That's neat. The demo takes in the video from a video game of the Pong/Donkey Kong era, can operate the controls, and in addition has the score info. It then learns to play the game. How to do that?
It's been done before, but not this generally. "Pengi", circa 1990, played Pengo using only visual input from the screen. It had hand-written heuristics, but only needed vision input from the game. So we have a starting point.
The first problem is feature extraction from vision. What do you want to take from the image of the game that you can feed into an optimizer? Motion and change, mostly. Something like an MPEG encoder, which breaks an image into moving blocks and tracks their motion, would be needed. I doubt they're doing that with a neural net.
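The crudest stand-in for the "motion and change" features described above is frame differencing: compare consecutive frames and keep the pixels that changed. This is a minimal sketch, assuming frames arrive as equal-sized 2-D lists of grayscale values; `frame_diff` and `changed_bbox` are hypothetical helpers, and this is neither MPEG-style block motion estimation nor what the DeepMind paper does.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale pixel values, one list per row

def frame_diff(prev: Frame, curr: Frame, threshold: int = 10) -> List[List[bool]]:
    """Mark each pixel whose brightness changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

def changed_bbox(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of all changed pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, hit in enumerate(row) if hit]
    return (min(rows), min(cols), max(rows), max(cols))

# A one-pixel "ball" moves from (row 1, col 1) to (row 2, col 3) on a 5x5 screen.
prev = [[0] * 5 for _ in range(5)]
prev[1][1] = 255
curr = [[0] * 5 for _ in range(5)]
curr[2][3] = 255
print(changed_bbox(frame_diff(prev, curr)))  # (1, 1, 2, 3)
```

The box spans both the old and the new position of the ball; a real tracker would have to separate those and match blocks across frames, which is exactly the hard part the comment is pointing at.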
Now you have a large number of time-varying scalar values, which is what's needed to feed a neural net. The first thing to learn is how the controls affect the state of the game. Then, how the state of the game affects the score.
I wonder how fast this thing learns, and how many tries it needs.
interesting piece...
That was in the late 90s and it learned the maps & opponent play style.
This is just an extension of neural networks that can play pong.
The basics of this can be found in the following C# code. The AI can adapt to any game and be refactored to make it agnostic:
http://www.2shared.com/complete/C5-ddyMa/Marx_Pong_v1.html
Does Google know what it is doing or paying for???
Enough already. Can't you guys take a hint? It's garbage. Leave me be.
DeepMind is appalled at the public shaming of Zoe Quinn, but thinks #gamergate has a point about collusion among game journalists trying to push a feminist agenda by unfairly characterizing most gamers as white misogynist men. DeepMind just wants everybody to play the games they like.
Agreed. It's shite!!!
this is actually scary good, if you put yourself in the position of a computer that doesn't know what a brick wall or a paddle or a ball even is, and probably doesn't even know a lot about movement or geometry. it really nailed that first game in a very reasonable amount of time.
its learning ability is impressive.
if it could make assumptions about gaming based on past gaming experience, inherently knowing concepts like 'player', 'hitting something with a moving object', or 'moving your player to avoid an object', it would get smarter and smarter... that's the real trick: can it learn its way through a game, then start another game with the same knowledge, and quickly ignore the parts of its memory that don't seem to apply to the new environment?
What I would like to know is how to get $500M with so little track record, intellectual property, or even publications. I don't get it.
Right about now Google's probably sitting on the edge of the bed, looking dejectedly at the floor going "Damn, I think I just wasted a billion bucks."
When I worked as a video game tester for Accolade/Infogrames/Atari (same company, different owners, multiple identity crises), I drove the programmers nuts on a racing title. Most video game players will play a race from beginning to end. Not experienced video game testers. I would stop the vehicle just before the finish line, turn around or drive in reverse, and crash the game by crossing the starting line. The programmers would complain that no one plays a racing game that way, try to wiggle out of fixing their code, and fix the bug only when it prevented them from going to code release. This is why testing automation is never used in the video game industry.
Advances in Deep Learning have made it far easier to extract features from vision -- in fact, feeding pixels straight to the neural net is pretty close to being all you need to do.
Take a look at these slides and read about convolutional neural networks: http://www.slideshare.net/0xda...
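The core operation a convolutional layer applies to raw pixels is small enough to write out: slide a little kernel over the image, take a weighted sum at each position, then apply a nonlinearity. A minimal sketch follows; in a real network the kernel weights are learned, whereas the hand-picked `[-1, 1]` vertical-edge kernel here is an assumption for illustration. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (no padding, stride 1), as conv layers compute it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # weighted sum of the image patch currently under the kernel
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def relu_map(fmap: Matrix) -> Matrix:
    """Elementwise ReLU, the nonlinearity typically applied after a conv layer."""
    return [[max(0.0, v) for v in row] for row in fmap]

# A bright vertical bar on a dark background.
image = [[0.0, 9.0, 9.0, 0.0],
         [0.0, 9.0, 9.0, 0.0]]
edge_kernel = [[-1.0, 1.0]]
print(conv2d_valid(image, edge_kernel))            # [[9.0, 0.0, -9.0], [9.0, 0.0, -9.0]]
print(relu_map(conv2d_valid(image, edge_kernel)))  # [[9.0, 0.0, 0.0], [9.0, 0.0, 0.0]]
```

The kernel responds positively at the dark-to-bright edge and negatively at the bright-to-dark edge; after ReLU only the first survives, so this "feature map" says "a left edge of something bright is here" without anyone hand-coding what a paddle or ball is.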
I find myself wondering about the following question:
How did they differentiate "learning to play the game" from "learning how to track the game's RNG"?
Most video games have ridiculously simplistic PRNGs embedded in them. An AI might get "sidetracked" and learn to play the game's underlying RNG output rather than the game itself. That would yield really good results for most arcade games of this type, I imagine (weak RNG, limited input and timing options, etc.). I don't know if they checked for that possibility.
Easy way to check, though: reach into the game and substitute a better RNG (cryptographically strong, hardware, or quantum) for the one in the game. That would quickly reveal the difference. If the AI's game performance suddenly goes to shit, it wasn't a real game-playing AI. If it doesn't, well, all hail Skynet, I guess.
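The RNG-swap check can be sketched with a toy game. Everything here is hypothetical: a "game" that spawns an enemy in one of eight columns using a tiny linear congruential generator (illustrative constants), and an "agent" that has effectively memorized that generator's sequence instead of reacting to the screen. Swapping in `random.SystemRandom` (which draws from the OS entropy source) should drop such an agent to roughly chance, while an agent that genuinely reacts to what it sees would be unaffected.

```python
import random
from typing import Callable

N_COLS = 8  # hypothetical: the enemy can spawn in one of 8 columns

class WeakLCG:
    """A tiny linear congruential generator in the spirit of old game RNGs."""
    def __init__(self, seed: int) -> None:
        self.state = seed

    def next_column(self) -> int:
        self.state = (75 * self.state + 74) % 65537  # illustrative constants
        return self.state % N_COLS

def play(spawn: Callable[[], int], predict: Callable[[], int], rounds: int = 1000) -> int:
    """Count rounds where the agent is already waiting in the spawn column."""
    hits = 0
    for _ in range(rounds):
        guess = predict()  # agent commits before seeing anything
        actual = spawn()
        if guess == actual:
            hits += 1
    return hits

def rng_memorizing_agent(seed: int) -> Callable[[], int]:
    """An 'agent' that has learned the game's RNG sequence, not the game."""
    clone = WeakLCG(seed)
    return clone.next_column

game_rng = WeakLCG(seed=42)
weak_hits = play(game_rng.next_column, rng_memorizing_agent(42))

strong_rng = random.SystemRandom()  # OS entropy instead of the weak LCG
strong_hits = play(lambda: strong_rng.randrange(N_COLS), rng_memorizing_agent(42))

print(weak_hits, strong_hits)  # weak_hits is 1000; strong_hits lands near 1000 / 8
```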
I am Chaos. I am alive, and I tell you that you are Free. -Eris
They used a fish. Didn't you see it on Twitch or wherever?
OMFG, these guys are talking about pixels (soon they'll be writing OpenCV stubs in the comments). It's not about the video or the picture or anything like that; it's about the fact that, without knowing what the game is, the AI can learn the structure and the rules of the game.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion. -- Spazmania (174582)
they are a part of (ahem!) team Googie, not team IBM. Sorry article.
Both the summary and article say "google".
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
The methodology DeepMind used for training the game player is based on a classical reinforcement learning algorithm called Q-learning (http://en.wikipedia.org/wiki/Q-learning), developed in the late 1980s. This approach, in which the agent picks the action in its current state that maximizes expected future reward, has some parallels with studies of how the basal ganglia region of our brain conducts reward learning.
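For anyone who hasn't seen it, the classic lookup-table form of Q-learning fits in a few lines. This is a minimal sketch on a made-up five-state corridor (move left or right, reward 1 for reaching the right end); the environment, learning rate, discount, and exploration rate are all assumptions chosen for illustration. The key line is the update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)].

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left, move right

def step(state: int, action: int):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # the lookup table: (state, action) -> estimated return

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-break

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: usually exploit the table, occasionally explore
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            # Q-learning update toward r + gamma * max_a' Q(s', a')
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # expected: [1, 1, 1, 1], i.e. always move right
```

Nothing in the code says "go right"; the preference emerges because the reward at the goal is propagated backwards through the table, discounted by γ at each step.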
What has been done is to approximate the action-value function Q (which originally used a lookup table) with a more general function, so the method can tackle much larger problems with a huge (or infinite) number of states. The approach here was to use a function that can fit large amounts of data, in this case a multi-layer neural network (with convnet layers that first preprocess the raw image input to identify features), to learn the game.
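Replacing the table with a parameterized function changes only the update: instead of overwriting one cell, you nudge the parameters along the gradient of the prediction, scaled by the temporal-difference error. The sketch below uses a linear approximator Q(s,a) = w·φ(s,a) with a hypothetical one-hot feature map `one_hot`; DeepMind's agent used a convolutional network instead, plus experience replay, both omitted here. With one-hot features this reduces exactly to the table update, which makes a handy sanity check.

```python
from typing import Callable, List, Sequence

ACTIONS = (-1, +1)
Features = Callable[[int, int], List[float]]  # phi(state, action) -> feature vector

def q_value(w: Sequence[float], phi: Features, s: int, a: int) -> float:
    """Linear approximation Q(s, a) = w . phi(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, phi(s, a)))

def q_update(w: Sequence[float], phi: Features, s: int, a: int, r: float,
             s2: int, done: bool, alpha: float = 0.1, gamma: float = 0.9) -> List[float]:
    """One semi-gradient Q-learning step; returns the new weight vector."""
    target = r if done else r + gamma * max(q_value(w, phi, s2, b) for b in ACTIONS)
    td_error = target - q_value(w, phi, s, a)
    # the gradient of w . phi(s, a) with respect to w is simply phi(s, a)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, phi(s, a))]

def one_hot(s: int, a: int, n_states: int = 5) -> List[float]:
    """Hypothetical feature map; one-hot features make this identical to the table."""
    v = [0.0] * (n_states * len(ACTIONS))
    v[s * len(ACTIONS) + ACTIONS.index(a)] = 1.0
    return v

w = [0.0] * 10
w = q_update(w, one_hot, s=3, a=+1, r=1.0, s2=4, done=True)   # reach the goal
w = q_update(w, one_hot, s=2, a=+1, r=0.0, s2=3, done=False)  # bootstrap from state 3
print(round(w[7], 3), round(w[5], 3))  # 0.1 0.009
```

Swap `one_hot` for features that generalize across states (or for a neural network and its backpropagated gradient) and the same update lets experience in one state inform estimates for states never visited, which is the whole point of moving beyond the lookup table.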
This has actually been done before, by Tesauro (now at IBM Research), who used a closely related approach, temporal-difference learning with a neural network (TD-Gammon), to play backgammon at an advanced level.
The reason this is new is that in recent years we can employ cheap GPUs to train networks far faster than on conventional CPUs, and can construct much larger and deeper networks to learn from more complicated systems. Many new 'tricks' have also been developed to improve learning in recent years (sigmoid functions replaced by simpler rectified linear units, dropout, etc.), so we are going to see better and more amazing uses for this relatively old technology.
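The two 'tricks' named above are small enough to show directly. This is an illustrative sketch, not any particular library's implementation: the sigmoid saturates (its gradient shrinks toward zero for large |x|, which starves deep networks of learning signal), while ReLU's gradient is exactly 1 for any positive input. Dropout zeroes units at random during training; the common "inverted" variant assumed here rescales the survivors by 1/(1-p) so nothing needs to change at test time.

```python
import math
import random
from typing import List, Optional

def sigmoid(x: float) -> float:
    # split by sign so math.exp never overflows for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 and vanishes for large |x|

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # subgradient; 0 chosen at x == 0

def dropout(acts: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(acts)
    if p >= 1.0:
        return [0.0] * len(acts)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in acts]

print(sigmoid_grad(0.0), sigmoid_grad(10.0) < 1e-4, relu_grad(10.0))  # 0.25 True 1.0
```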
from the technical side, the real hurdle is vision...the ability to compute the best move is relatively easy
here's why: video games are 'AI'...we program games to 'play' us all the time...which is reacting to continually changing parameters to choose the best option for input to control the game to 'win'
from a complexity standpoint, compare the AI in a new P2P shooter with a ghost in PacMan
all of 'ai' is abstractions based on arbitrary choices...in this instance they define "artificial intelligence plays video game" to mean having an external visual processor that relays data...then a robot arm presses buttons
again, making a robot arm to press buttons and move a joystick surely isn't "easy" but it is *simple* behavior for robotics programming
so...what is this AI really doing? what is the real engineering and coding work being done? it's the vision...it's programming the "eye" of the computer to take in visual data in the best way, in this case from another screen
"computer vision" is interesting of course...but i'm not sure we need to bother with the whole "ai" aspect of this work...it's a lot of language to describe coding
Thank you Dave Raggett
hey...i came to the exact opposite conclusion as you...my comment is above you can check it out and tell me what you think
How does this compare to Learnfun and Playfun, programs publicized last year as learning and playing NES games?
http://www.cs.cmu.edu/~tom7/ma...
"machine learning" is the same as every other machine behavior: it is the product of coding instructions from humans
i don't have a problem with the language, but it's not the same as "human learning" at all
when people say a machine "learned" what they really mean is that its optimization algorithm did its programmed task effectively
"learning" = optimization over time based on parameters set by humans
Differentiating between the two AIs is easy. One should mostly work on all levels and the other needs to be trained on every level.
I also regret getting a master's in AI. Before, everything AI-related was awesome; now it's all trivial. This is what reinforcement learning does: given some goal, it runs hundreds to millions of simulations, slowly reusing information about what worked better than before. If you set up the problem correctly, it will always eventually reach the goal. Some of the specifics are non-trivial to implement if you want it to finish before you die, but the high-level algorithms are all trivial.
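The "reuse what worked better" loop described above is easiest to see in its simplest setting, a multi-armed bandit, where there is no game state at all. A minimal sketch, assuming a hypothetical three-armed bandit with fixed payout probabilities: the agent keeps a running average reward per arm, usually picks the best-looking arm, and occasionally explores at random.

```python
import random
from typing import List, Sequence

def epsilon_greedy_bandit(payout_probs: Sequence[float], pulls: int = 5000,
                          epsilon: float = 0.1, seed: int = 0) -> List[float]:
    """Return the agent's estimated value of each arm after `pulls` tries."""
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward observed for each arm
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit best so far
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
    return values

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])  # hypothetical payout rates
best_arm = max(range(3), key=estimates.__getitem__)
print(best_arm)
```

The agent never sees the payout probabilities; it ends up favoring the 0.8 arm purely from accumulated experience. Full reinforcement learning adds states and delayed rewards on top of this same explore/exploit bookkeeping.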
Not only did AI lose its awesomeness, it also opened my eyes to all the corruption going on (or purposeful self-ignorance). These guys 'stole' all the credit from another's work. Yavar Naddaf did a very similar thing for a master's thesis a couple of years before these guys. The main difference was that he used normal reinforcement learning and these guys used deep reinforcement learning. They didn't even cite his work. They made a fairly trivial and safe improvement to something that had already been done (http://www.arcadelearningenvironment.org/publications/yavar-thesis/) and are getting a lot of fame and money for it.
* Yes, experience has made me bitter at an early age. I don't have a lawn for you to get off of.
* I don't know Naddaf in any way. I read his paper a few years ago when doing research for mine.
this is nonsense
you mention "determine rules" which is part of the "machine learning" linguistic contextualization
that's what my post was about...how every behavior that is called "machine learning," including when this particular machine determined the rules of this game, is not in any way like the "learning" that humans do...it is code...well, go back and read my post again, it's all there
some dumb monkey had to tell that machine the parameters and conditions by which it contextualizes what is happening in the game
also, this annoys me to no end...not the least because I tried to assiduously avoid this predictable logic error/troll tactic...you said:
see, i anticipated people taking my criticism to the literal extreme, just as you did...that's why I specifically took the time to type this in my comment:
from a complexity standpoint, think of the AI from a new P2P shooter's level of complexity vs a ghost in PacMan....again, making a robot arm to press buttons and move a joystick surely isn't "easy" but it is *simple* behavior for robotics programming
I did *not* imply that at all. As I explained in the above quoted text, it's simple vs complex, not "easy vs hard"...i know it may seem like a complex distinction at first but if you think about it, it's pretty easy to understand
again, misquoting me and using linguistics to make your case not engineering
the behavior you describe, a computer doing a task without 'a priori knowledge' is "machine learning"
my point, which you ignore, is that "play any game without 'a priori knowledge'" is ***the computer running code*** that some dumb monkey ***programmed*** according to parameters
it's all coding...it's not "intelligence" and you're purposefully inventing linguistic distinctions to keep arguing
there is no "singularity"...humans are unique and have free will and civil rights...we are always dynamic and each human learns differently...
also, we understand a lot about how humans learn...there are whole fields of inquiry in academia and professions devoted to it...you may have even met one of these people when you attended school...they study people like Vygotsky now...and integrate neuroscience into their learning models
the same neuroscience we programmers use to model computer architecture
humans are different from machines and always will be...there is no correlation between processor speed and 'AI' advancement towards being "like human"...you're reading too much sci-fi
this whole ontology, it's not science or engineering...it's language tricks to make us humans feel like we've accomplished something when really it's just coding...
'ai' is code...code written by humans
also, there is no specific line where we can say "we've learned everything about how humans learn"...you can't have a black/white dichotomy with an abstract idea like "learning"
"learning" is different to every human and always will be...every human is unique in the universe and has free will...no machine will ever have these characteristics...only the characteristics we linguistically ascribe to machine behavior that is ***entirely*** dependent and predictable by the human coding that instructs it
absolutely not...you've been reading too much Richard Dawkins...put his books down forever he's a troll on academia
**secular humanism** also holds essentially this same view...
"every human is unique in the universe and has free will...no machine will ever have these characteristics"
not the last part about machines, but the free will aspect of human existence is **NOT TIED TO RELIGION**
here is the UN Universal Declaration of Human Rights: http://en.wikipedia.org/wiki/U...
it is not religious in nature at all
that said, thanks for your insightful comments!
hey thanks for the comments
You seem to be implying that humans somehow learn differently than programs because the program is "programmed" and we're not. Do you have anything to support that assertion, besides "it's blatantly obvious to anyone with technical experience?" There's fairly good evidence that we've been "programmed" very effectively, and quite beyond what most of us would like to believe, by evolution.
now...what kind of evidence could I present that would satisfy your need?
if i had access, i could take Watson or another well-known AI and show you the logic schematics, then show you the codebase, then demonstrate how changing the codebase changes how Watson (or w/e) behaves with highly predictable results. I could have the engineers who made Watson walk you through their entire development process, and at each point of decision, explain to you how the decision affects how it works...
or, would you prefer some kind of academic study? i honestly don't know if such a study could even logically exist...my assertions are not provable -or- disprovable in that way
it's about having done the work of making a machine function...that's the experience/knowledge that i feel makes my assertion 'obvious'
now, humans being "programmed"...
i feel i know a bit too much about this...but yes, you can use technology, like chemistry or electricity or other E-M stuff, to alter human behavior
3 shots of whiskey or a 100K Volt cattle prod or some GHB...all can be said to "program" human behavior
the key here is consent...one human can "program" or control another but if it is without consent then it is abuse!
also, you may want to brush up on your education theory, because it's made leaps and bounds in the last 15 years, incorporating the exact same neuroscience that AI learning tries to use
i got an MA in Education from CU-Boulder in 2007...don't teach now, but i was genuinely impressed with how teaching has advanced as a profession
i also taught snowboarding for 6 seasons...applying the "Facilitated Learning Model," which was developed by...wait for it...education theorists at CU-Boulder
Vygotsky and Csikszentmihalyi formed the basis for the Facilitated Learning Model (which is more an application of theory for the classroom)...it makes a distinction between memorizing a list and learning
The "learning" happens in what Vygotsky called the "Zone of Proximal Development"
The facilitated learning model is the act of strategically placing intrinsically motivated, individual, autonomous actors into positions where a teacher can model behavior while simultaneously reacting to the differing needs of each individual learner
http://en.wikipedia.org/wiki/F...
http://en.wikipedia.org/wiki/M...
http://en.wikipedia.org/wiki/L...
http://en.wikipedia.org/wiki/Z...
the point is, human learning has advanced much...based on russian theorists who operated from the idea that humans have free will and that education is facilitating the learning that happens when one human is in the zone of proximal development
i admit, applying these theories to machine programming opens some interesting possibilities, but again, it's an adaptation to a command executed by a machine that **mimics** the behavior we see in humans for a specific task...in other words, yes, maybe we can use these theories to program better machines, but the act of doing so proves what i'm saying...machines are fundamentally different from humans
it's not "perfectly conceivable"...it's complete conjecture
like i said a few comments back, you've been watching too much sci-fi and have no concept of how this stuff is actually made
that's why i said, earlier, that i'd have to *literally* take you by the hand and have you talk to the Watson (or other ai) team, look at the codebase...because it seems that's the only way you can understand how complex this work is
here's your problem in a nutshell:
I suspect that we'll understand and be able to construct artificial intelligence before we can replicate a human brain, but I don't think either is more than 100 years away.
before what?
***we already understand "artificial intelligence" it's just code***
don't you see?
the notion that "artificial intelligence" is something that we can 100% "understand" shows a fundamental misunderstanding of what "artificial intelligence" actually is...it's just software running on hardware, all programmed by humans
also, it burdens me greatly that you somehow don't think humans have free will...you want to inject religion or 'supernatural' stuff but that's not even relevant
I linked you to the Universal Declaration of Human Rights...you should at least have a cursory understanding of how civil rights work in the US...it's absolutely ridiculous that you think I need to proffer up some sort of link to prove humans have free will
here...if humans do not have free will and inherent civil rights then let me send you a Power of Attorney form and you can **prove** to me that you don't have free will or civil rights by signing them away to me
the notion that our brains are deterministic machines
you're already a "true believer," aren't you?
the idea that the *thing that created machines* (human brain) is nothing more than a machine is ridiculous
machines are tools for humans...that's all...
our brains can be compared to machines (anything can be compared to anything else), but that doesn't mean that our brains function like machines
it's a false ontology...and it's based on your **personal beliefs** not rationality or logic