Neural Net Learns Breakout By Watching It On Screen, Then Beats Humans
KentuckyFC writes "A curious thing about video games is that computers have never been very good at playing them like humans by simply looking at a monitor and judging actions accordingly. Sure, they're pretty good if they have direct access to the program itself, but 'hand-to-eye-co-ordination' has never been their thing. Now our superiority in this area is coming to an end. A team of AI specialists in London have created a neural network that learns to play games simply by looking at the RGB output from the console. They've tested it successfully on a number of games from the legendary Atari 2600 system from the 1980s. The method is relatively straightforward. To simplify the visual part of the problem, the system down-samples the Atari's 128-colour, 210x160 pixel image to create an 84x84 grayscale version. Then it simply practices repeatedly to learn what to do. That's time-consuming, but fairly simple since at any instant in time during a game, a player can choose from a finite set of actions that the game allows: move to the left, move to the right, fire and so on. So the task for any player, human or otherwise, is to choose an action at each point in the game that maximizes the eventual score. The researchers say that after learning Atari classics such as Breakout and Pong, the neural net can then thrash expert human players. However, the neural net still struggles to match average human performance in games such as Seaquest, Q*bert and, most importantly, Space Invaders. So there's hope for us yet... just not for very much longer."
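For anyone wondering what the down-sampling step in the summary might look like, here is a minimal sketch. It assumes the frame arrives as a 210-row by 160-column grid of (R, G, B) tuples and uses nearest-neighbour sampling with the common BT.601 luminance weights; the function names are made up for illustration, and the researchers' actual preprocessing may well differ.

```python
def to_gray(pixel):
    """Luminance-weighted grayscale value (0-255) for one (R, G, B) pixel."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def preprocess(frame, size=84):
    """Down-sample a frame (list of rows of RGB tuples) to size x size grayscale."""
    height, width = len(frame), len(frame[0])
    # Nearest-neighbour sampling: pick evenly spaced source rows and columns.
    rows = [i * height // size for i in range(size)]
    cols = [j * width // size for j in range(size)]
    return [[to_gray(frame[r][c]) for c in cols] for r in rows]


# Synthetic 210x160 test frame (a colour gradient), standing in for console output.
frame = [[(x % 256, y % 256, 0) for x in range(160)] for y in range(210)]
small = preprocess(frame)
```

Plain lists keep the sketch dependency-free; anything doing this at 60 frames per second would use an array library instead.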
I for one welcome our new virtual ass-kicking overlords.
CLI paste? paste.pr0.tips!
"I hope the lifter pilot doesn't get too bored." Jarvis is all chummy again.
"There is no pilot. It's a smart gel."
"Really? You don't say." Jarvis frowns. "Those are scary things, those gels. You know one suffocated a bunch of people in London a while back?"
Yes, Joel's about to say, but Jarvis is back in spew mode. "No shit. It was running the subway system over there, perfect operational record, and then one day it just forgets to crank up the ventilators when it's supposed to. Train slides into station fifteen meters underground, everybody gets out, no air, boom."
Joel's heard this before. The punchline's got something to do with a broken clock, if he remembers it right.
"These things teach themselves from experience, right?" Jarvis continues. "So everyone just assumed it had learned to cue the ventilators on something obvious. Body heat, motion, CO2 levels, you know. Turns out instead it was watching a clock on the wall. Train arrival correlated with a predictable subset of patterns on the digital display, so it started the fans whenever it saw one of those patterns."
"Yeah. That's right." Joel shakes his head. "And vandals had smashed the clock, or something."
"Hey. You did hear about it."
"Jarvis, that story's ten years old if it's a day. That was way back when they were starting out with these things. Those gels have been debugged from the molecules up since then."
"Yeah? What makes you so sure?"
"Because a gel's been running the lifter for the better part of a year now, and it's had plenty of opportunity to fuck up. It hasn't."
"So you like these things?"
"Fuck no," Joel says, thinking about Ray Stericker. Thinking about himself. "I'd like 'em a lot better if they did screw up sometimes, you know?"
"Well, I don't like 'em or trust 'em. You've got to wonder what they're up to."
Next up we have Skynet.
Let this Neural Net watch Matrix a few times, then turn on the sound and hear it say: "I know Kung Fu".
You can't handle the truth.
For once, something based on proper AI (rather than human-generated heuristics).
However - notice its limitations: Where there is a direct correlation between where you need to be, and where something else is on the screen (basically a 1:1 relationship in Pong, for example), it can cope with going higher or lower as required.
But when you put it into something that has more than a single thing to "learn" (move left/right, avoid bombs, shoot aliens, choose which aliens to shoot, don't shoot your own base, etc.) then the amount of training required goes up exponentially. And thus we could spend centuries of computer time in order to get something that can do as well as a simple heuristic designed by someone who knows the game (not saying heuristics don't have their place!).
"Trained" devices require training relative to some power of the variety of the inputs and the directness of their correlation to the game-arena. And thus, proper AI is really stymied when it comes to learning complex tasks.
But still - this is the sort of thing we should be doing. If it takes an infant two years with the best "computer" in the universe that we know of to learn how to talk, why should we think it will take a machine at even the top-end of the supercomputer scale (which can't have as many "connections" as the average human brain) any less?
This neural-net-combined-with-trial-and-error style of algorithm is typically referred to as a "JavaScript Programmer"-type algorithm in recent AI literature. (I'm being completely serious, too, in case you think this is a joke; it isn't.)
The name derives from the similarity between how these kinds of algorithms work, and how JavaScript programmers tend to work.
Both the algorithms and JavaScript programmers use a very basic, minute form of pseudo-intelligence.
This small dab of pseudo-intelligence is then used to repeatedly attempt to solve a problem, followed by an analysis of the success of the attempt.
In the case described in this article, it involves the computer trying to play the game, with the aim of winning.
In the case of the JavaScript programmer, it involves the programmer repeatedly searching through Stack Overflow, finding code to copy-and-paste, and then hoping that it works well enough to trick the customer or employer into thinking the job is done.
The summary should have probably mentioned this, but I suspect that the submitter may not be following the latest AI journals and research very closely.
Deep Blue
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Where did the researchers find the "expert Breakout and Pong players" to match their neural net against? Was it that same loudmouth kid down the hall who is always "beating the spread" on football?
The question is not, "when can a bunch of machinery beat a human at X." The question is "when can a bunch of machinery beat a team of humans _with access to similar computational resources_ at X." I don't see much progress there.
"Exterminate... Exterminate...."
Actually, when they become advanced enough, we won't need to work anymore.
I'll buy TWO. One to do my job and one ... just in case.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
"the programmer repeatedly searching through Stack Overflow, finding code to copy-and-paste, and then hoping that it works well enough to trick the customer or employer into thinking the job is done."
No sigs in BETA. Beta SUCKS.
You were just asking for an oblig, weren't you?
Summary of post:
tl;dr
Cwm, fjord-bank glyphs vext quiz
In this case, the "gels" were employing a heuristic to know when to do something (in this case, turn on the air ventilation system). It was assumed that it was something meaningful to the action (i.e. something to do with the recipients of the ventilation), but it was something arbitrary (i.e. the way the clock looked). So, unless you have insight into what the heuristic is, you won't know when it's going to have the expected behavior and when it isn't. Even if it seemingly has the expected behavior for a long time.
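The failure mode described above can be shown with a toy learner. This is only a sketch with made-up data: the cue names and the training log are hypothetical, and the "learner" simply picks whichever single input best matched train arrivals in the past.

```python
def best_single_cue(history, target):
    """Return the input whose past values most often matched the target."""
    cues = [name for name in history[0] if name != target]

    def accuracy(cue):
        return sum(row[cue] == row[target] for row in history) / len(history)

    return max(cues, key=accuracy)


# Hypothetical training log: the clock pattern matches every arrival,
# while the CO2 sensor is right only some of the time.
history = [
    {"clock_pattern": 1, "co2_high": 1, "train_arrived": 1},
    {"clock_pattern": 0, "co2_high": 0, "train_arrived": 0},
    {"clock_pattern": 1, "co2_high": 0, "train_arrived": 1},
    {"clock_pattern": 0, "co2_high": 1, "train_arrived": 0},
    {"clock_pattern": 1, "co2_high": 1, "train_arrived": 1},
    {"clock_pattern": 0, "co2_high": 0, "train_arrived": 0},
]
cue = best_single_cue(history, "train_arrived")

# The day the clock is smashed: a train arrives but the display shows nothing.
broken_clock_day = {"clock_pattern": 0, "co2_high": 1, "train_arrived": 1}
fans_on = bool(broken_clock_day[cue])
```

The learner settles on the clock because it was the perfect predictor in training, so the fans stay off on the one day it matters. Nothing in its past record would have warned you.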
Tetris, by nature, would prove most interesting. I myself never made it past level 10, and I've never seen anyone make it past level 20. I wonder what the breaking point for this neural net would be after a few days of practice. I would love to see a video of it starting from level one and making its way to the insanity of level 50 - if it's up to the task. I imagine a super computer would have too much latency.
Brought to you by Carl's Junior.
"Now our superiority in this area is coming to an end."
You mean our superiority in BREAKOUT and PONG. This hardly applies to EVERYTHING.
Jarvis and Joel are discussing smart gel - some kind of AI that apparently has had bugs. What was difficult to understand?
Must be you, I understood it easily. It's a conversation between two people, Jarvis and Joel, about the dangers of smart gels (gel form neural network), based on an incident a decade ago.
They should spin up two instances of the neural net and have it play itself
The AI has another advantage over us human players with the Atari 2600. No blisters.
But has it learned to let someone else design Breakout and then steal a couple thousand dollars from him for his efforts? When it does that, it will truly be an intelligence. (And it will be a superior intelligence if it leaves off the black turtlenecks.)
Hoist Number One and Number Six.
Show me an AI that can play minecraft; that would be impressive.
to farm gold for me.
My eyes reflect the stars and a smile lights up my face.
I would think making sure the score section of the net gets special emphasis might help with the harder games. Separate inputs even with the numbers of various things known to a human player (score, lives), rather than having the AI get that from a bitmap and separate/extract.
Also, I'm guessing they're not using all the tricks one can with neural nets. Like long short-term memory. That would seem to seriously help with this sort of thing. Basically I'm guessing their lack of success with the harder games is not due to inherent limitations like some of the above posters said, but due to limitations in their implementation.
So it can play Breakout, big deal.
Wake me when it's giving the checkers-playing chicken a run for her money.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
You're not alone. Although I understood it, that is some really shitty writing. Really bad. I couldn't imagine reading a whole book with that writing style, it's terrible.
Don't trust computers if your name is Jarvis.
In the greater context I think the post is trying to make the same point: AIs that use visuals to make decisions may not be trusted with human life.
Life is a great ride, the vehicle doesn't matter
Lrrr: You are defeated. Instead of shooting where I was, you should have shot where I was going to be. Muahahahaha!
The interesting part of the slim article was the part left out: why did it not perform as well on some of the games? There was not much detail on that issue. I'm not familiar with the poorly played games, but I would guess they introduce a level of visual complexity that overwhelms the AI?
Other than that, simply astounding accomplishment.
Life is a great ride, the vehicle doesn't matter
The performance would be better than that of a human regardless. A human will get high or drunk and smash the trains into each other once in a while.
Aside from the weird writing style, the whole excerpt doesn't really make sense. They assume because the software hasn't killed anyone yet that it's perfect. Not a good assumption. Then he says he would like the software more if it did kill them.
tl;dr == "I hate reading". You should NEVER see a tl;dr at slashdot. NERDS READ.
I remember having read about this in Scientific American some 20 years ago. It did not use neural nets but matchboxes and colored beads, but it could learn tic-tac-toe.
Vajk
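The matchbox scheme is easy to sketch in code. What follows is a simplified illustration, not a reconstruction of the machine from the article: the class name, the three starting beads per move, and the reward sizes are all assumptions. Each board state gets a "box" of beads, a move is drawn in proportion to its bead count, and beads are added or removed once the game ends.

```python
import random


class MatchboxLearner:
    """Matchbox-and-beads learner: one box of beads per board state."""

    def __init__(self, initial_beads=3, seed=None):
        self.initial_beads = initial_beads
        self.rng = random.Random(seed)
        self.boxes = {}    # board state -> {move: bead count}
        self.history = []  # (state, move) pairs played in the current game

    def choose(self, state, legal_moves):
        """Draw a move at random, weighted by how many beads it has."""
        box = self.boxes.setdefault(
            state, {move: self.initial_beads for move in legal_moves})
        if sum(box.values()) == 0:
            # The box ran empty: refill it so a move can still be drawn.
            for move in box:
                box[move] = self.initial_beads
        moves = list(box)
        move = self.rng.choices(moves, weights=[box[m] for m in moves])[0]
        self.history.append((state, move))
        return move

    def finish(self, reward):
        """Add beads (positive reward) or remove them (negative) for moves played."""
        for state, move in self.history:
            self.boxes[state][move] = max(0, self.boxes[state][move] + reward)
        self.history.clear()


learner = MatchboxLearner(seed=0)
empty = "." * 9                      # empty tic-tac-toe board as a 9-char string
first = learner.choose(empty, range(9))
learner.finish(reward=3)             # pretend that game was won
```

After the pretend win, the opening move that was played holds six beads against three for every other square, so it is twice as likely to be drawn next time.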
Nonsense. Just because something is in written form it does not mean that there's value in reading it. This is especially true if it's longer, yet inherently worthless, written material that'll waste a lot of time and effort to read through. The excerpt is a good example of this. It's rambling, obtuse, and pretty much incomprehensible. One can read it, but there's nothing of value to be obtained by reading it. Another good example is your comment. Yes, it involves some reading, but its lack of substance and insight, if not its outright incorrectness, means that reading it is a pointless activity.
Don't let it play Missile Command
...Welcome the new King of Kong!
Winning means nothing unless you can enjoy it. Is the computer having any fun?
Neural Net Learns Breakout By Watching It On Screen, Then Beats Humans
Women and children and nerds first!! The machines are coming!
Oh, my mistake. I thought it said "neural learns to break out" and then something about beating humans.
This is one reason why most people don't Capitalise Every Word In A Headline.
systemd is Roko's Basilisk.
You should learn to read.
This neural-net-combined-with-trial-and-error style of algorithm is typically referred to as a "JavaScript Programmer"-type algorithm in recent AI literature. (I'm being completely serious, too, in case you think this is a joke; it isn't.)
The name derives from the similarity between how these kinds of algorithms work, and how JavaScript programmers tend to work.
Funny, of course :)
But, you got me thinking. The JavaScript programmer is generally trying to affect the appearance of stuff on the screen, therefore, he looks at the stuff on the screen, and tries to affect ... the stuff on the screen. So, it makes more sense than it might.
Our new pong-playing overlords, on the other hand, if they are actually doing something important like remotely fighting wars or trying to save people or something, well, then we don't really know if they are looking at the right input, and it becomes much more important that they, and we, understand exactly how they are coming to their decisions.
"Hey, I have an idea, let's take concepts, deliberately misunderstand and exaggerate them, and then the person who created the concepts will look stupid!
Oh, wait, that's a dumb idea, because we'll end up looking like the stupid ones."
That is the conversation you should have had with yourself before you posted.
In the excerpt one of the chars expresses a begrudged acceptance of the 'gels' because they haven't 'fucked up' which is not, despite the anecdote which precedes the opinion, exclusive to fatalities. The responding party understands this, because he's not a total idiot, and says that he wishes the 'gels' made some kind of mistakes (again, with NO exclusivity to fatalities as you ridiculously assert in your summation).
Makes me wonder how people like those in these comments ever passed verbal standardized tests. Reading comprehension is negligible and it seems even actively avoided.
I support the Slashcott and will not be reading or commenting from 2/10/14 to 2/17/14. Beta is steaming pile of dog shit
Truer words were never spoken:
"the programmer repeatedly searching through Stack Overflow, finding code to copy-and-paste, and then hoping that it works well enough to trick the customer or employer into thinking the job is done."
If it really works, if the specifications are met, and if it passes testing, then the job is done.
Wisely leveraging the shared knowledge of others is a good thing to do.
People would learn more, faster if they spent more time reading. A University of Alberta student published a very similar thesis in 2010: http://www.arcadelearningenvironment.org/wp-content/uploads/2012/07/Naddaf_2010_Game-Independent-AI-Agents-for-Playing-Atari-2600-Console-Games.pdf The work is so similar I had to double-check that the authors weren't the same and simply continuing their work at a different university. The thesis tested various reinforcement learning methods as well as search-based solutions. The basic idea was the same. Graphically watch the games, then learn to replay them. I think the thesis did a better job than these guys.
I really hate how researchers claim they're always first to do some new novel thing. The only sort-of new thing they did was try out the same problem with a different learning algorithm. If someone has already shown it's possible with one algorithm, it's going to be possible with another. I'd have more respect for them if they didn't say they were the first. I see no reason to include such statements in research papers except for ego stroking.
And now I'm wondering if there is another way of creating DOM-manipulating JavaScript. I mean, most of the time I can write a Linux module by reading the documentation of a device and writing code that makes it work (though for some devices, it's the JavaScript way); I can imagine some kind of data representation and then write it down with native types; I can imagine a Python map, a Haskell fold, or a SQL query and write it down; and in all those cases the stuff that I imagined works (give or take a few bugs). But I could never, in my entire life, create a piece of DOM-manipulating JavaScript that did what I thought it should do. I always resort to trial and error.
Rethinking email
Almost every AI algorithm does this. If we knew the correct choices for every game state, the problem would be quickly automated and people would forget about it. How many people are researching new Tic-Tac-Toe AIs?
Even logical inference systems do this. Behind the scenes, backtracking/DFS/BFS tries going down different states.
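Tic-Tac-Toe makes the point nicely, because the whole game tree is small enough to search exhaustively. A minimal sketch of that kind of depth-first search (plain minimax with memoisation; the board encoding as a 9-character string is just a convenient assumption):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Value for X under perfect play: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won:
        return 1 if won == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    # Depth-first: try every empty square, recurse, then back up the best result.
    results = [value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == "."]
    return max(results) if player == "X" else min(results)


print(value("." * 9, "X"))  # 0: perfect play from the empty board is a draw
```

Once every state's value can be computed like this there is nothing left to learn, which is why nobody is researching it. The Atari games are interesting precisely because that table is far too big to fill in.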
http://en.wikipedia.org/wiki/Destination:_Void
You have been warned.
TFA does not describe an advancement in AI technology whatsoever.
It is an external 'computer player'...We have had AI's that play video games virtually since we had video games.
Take good ol' Tecmo Bowl...you play against an AI opponent that does absolutely everything this AI did and more.
This is not an AI advancement, it is....an **application** of new and better **sensor inputs** for an external AI
I don't see anything in this that would indicate we are some kind of 'step' closer to having Terminator kill bots....it's just an application of visual pattern recognition to a particular task.
Another example of this tech being in use today is assembly line robots. They are programmed to behave according to certain visual parameters input from visual sensors.
This is **application** of existing technology...engineering...not new science or AI evolution....this is HYPE
Thank you Dave Raggett
"used to repeatedly attempt to solve a problem, followed by an analysis of the success of the attempt."
The above is exactly how humans learn to play simple games. Sure, you learn a few rules beforehand but then you actively - and to an extent subconsciously - engage in trial and error about what to hit/kick/click at what time in what scenario. It's called "practice". No one for example becomes a good football (soccer for the yanks) player by analysing angles of attack of other players' feet - they just go out and keep playing until they become better.
"Would you like to hear the song I learned today while we play? Daisy, Daisy, give me your answer do ..."
Alternatively, your mind is just too rigid to dig the style and too inflexible to get what the dialog is about.
It's also more formally called TDD. Create some code that tests the suitability of the existing code to solve the problem. Then randomly change the code until it passes the test, and all the others. Repeat, rinse, etc.
... they don't need us creators any longer. Now if only we could see what they come up with about their origins after we are gone.
Nuff said.
This is neat but it needs to be understood that mimicking what other players do without the understanding of strategy and deeper conceptual thought severely limits what the AI can do. This AI could never learn to play sophisticated games simply because it works by copy-pasting basic behavior. Even something with as basic rules as Go would be far beyond this AI.
It's rambling, obtuse, and pretty much incomprehensible. One can read it, but there's nothing of value to be obtained by reading it. [...] its lack of substance and insight, if not its outright incorrectness, means that reading it is a pointless activity.
Required reading for internet skeptics
Very regularly, someone writes a clever new algorithm to crunch a specific limited set of data more efficiently.
Repeat it with me: "This is not an AI breakthrough".
How...About...A...Nice...Game...Of...Chess...
I weep for humanity...
"weird writing style" apparently means "written beyond a 6th grade reading level" (which is incidentally what USA Today is written for, by and large — a good reason to aspire to better news periodicals, even though editorial standards are slipping across the board)
Incidentally, the writing style is not that uncommon, and some of the techniques he uses can be found in other great novels of multiple genres (e.g., detective novels). At least one review describes Starfish as a thriller.
I assure you the excerpt makes sense. If you have trouble understanding this, perhaps you should go read more, and in greater variety. They used to teach reading comprehension in schools, but I'm starting to think programs like No Child Left Behind may have de-emphasized that in exchange for teaching kids how to pass more and more standardized tests that focus on bare essentials. I guess you don't need to have excellent English skills to be a good consumer.
In the case of the JavaScript programmer, it involves the programmer repeatedly searching through Stack Overflow, finding code to copy-and-paste, and then hoping that it works well enough to trick the customer or employer into thinking the job is done.
I don't hope. I believe.
— A Deep Belief JavaScript Programmer
I didn't know anyone would still dare call themselves an expert Pong or Breakout player any more.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Oh, they are getting better: faster, bigger nets. I'd like to hear details: what flavor of neural net, node and layer counts, and so on. And how long they had to train it, and how many times their net design wouldn't gel and they had to start over (more common the more complex the problem and solution are).
For more complex games, you use a NN as a *TOOL* for the aspects it is appropriate for, and other AI tools for the aspects THOSE are better suited to.
A lot of the processing work required for something like this is converting the game output into a form that a NN can handle. Breakout is almost perfect because it is a rather simple fixed grid pattern (little optical interpretation required) and the input likewise is fairly simple (figure that it used to run on something with less CPU horsepower than your watch has today).
Of course someone had to filter the training input to tell the program what was a 'good' result and what was a 'bad' result so that it could emulate properly to win. That's another advantage of the game they picked, in that the play can be easily broken down into scenario chunks and is quite repetitive.
"but I'm starting to think programs like No Child Left Behind may have de-emphasized that in exchange for teaching kids how to pass more and more standardized tests that focus on bare essentials"
so you're positing that holding teachers and children to a fairly standardized level of math and reading comprehension by forcing them to prove they have the very skills that society needs them to have is somehow bad for an educational system??
that is just so asinine..."teaching for the test"?? jesus almost every minute of every hour of school SHOULD be "teaching for a test"! otherwise, what is everyone doing there??
and how do we prove to the parents, like me, that the teachers, admins, and schools are doing their jobs if we don't somehow have a global measuring stick to 1. identify those kids most at risk and 2. reward and learn from those teachers who are excelling at their jobs??
now if teachers are found gaming the system by basically giving students answers and drilling those into our kids' heads, that should be a criminal-level offense and the test makers and district people should work constantly on ferreting out those scumbags and kicking them out of my kids' classrooms.
never bring a twinkie to a food fight.
Unless, of course, your name is Jarvis and you are a computer.
No, you're wrong, it really just is an example of sub-par writing.
Hold off on your accusations of me being "illiterate" or "the product of a program like No Child Left Behind", too. I was born in the late 1930s, received my education in the 1940s and 1950s, and have a personal library that includes over 12,500 novels. I've read one or two novels a day now for nearly 40 years, of every possible category you could imagine, and every possible theme.
That excerpt is one of the poorest-quality examples of fiction that I've seen in a long time. While poor writing style is endemic when it comes to science fiction, that example is worse than the norm. It requires upwards of 10 to 15 readings before it begins to make sense; most other novels are comprehensible the first read through.
It may seem good if all that you're accustomed to is reading science fiction defecate. But for those of us who have a wider and deeper understanding of modern English literature, what you consider to be "good" is total cowshit.
Touché. Well played, for I had forgotten that reference.
Life is a great ride, the vehicle doesn't matter
If computers could do that for video games, do you think they could perform the same tasks behind a real gun in real life in a real battle? It would help armies a lot if such technology could be used.
I doubt this will affect Asians at all.
There are more ways to learn, and prove that you have learned, than taking and passing tests. The idea that we go to school only to learn the rote of what is taught is the very problem with the system. Our education needs to focus on critical thinking and analysis, not memorizing the answers to test questions out of their textbooks.
I should be able to ask a class of high school students what they believe was the cause of some historic event, and hear back several different answers. They don't have to be the 'correct' answers, they just need to have some reasonable explanation behind them. If I actually did this though, I think I would hear one answer: the one in their book.
Sure, it wasn't beautiful, but neither was it very hard to understand in one reading. I've read Nabokov, Conrad, Lethem, David Foster Wallace, et fucking cetera, so whatever you might consider to be "wider and deeper," I've probably read something pretty damned close to it.
Maybe your brain's just getting a bit worn out?
See http://www.arcadelearningenvironment.org/ for a few other approaches to this de-facto AI test.
Precisely! This really isn't an amazing result. Basically there is one mechanic in Breakout: move the platform to where the ball is or is going. That's it! Of course the computer is going to be better than a human... the computer can move the platform by the exact amount of horizontal movement the ball has made each frame; people can't do that at 30-60 fps consistently.
Just like the majority of programmers anymore, they Google, click Stack Overflow, copy-paste, and then ask why other programmers are still working on the issue (even though their "solution" is usually error-prone).
Both of these cases have something very important in common: they work for the simplest of problems only. Creating something brand new that has never been done before? Have fun searching Stack Overflow for 10 minutes before saying "It's impossible!" (true story, one of my friends did this during college...). Want to play a game that actually requires strategy instead of reflex? Have fun letting your Neural Network work on it for many years before determining it is only so good and doesn't seem to be getting any better...
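To make the "one mechanic" point concrete, here is roughly what a hand-written Breakout player amounts to. This is an illustrative sketch only: the function name, the pixel coordinates and the speed cap are assumptions, and it reads the ball position directly instead of finding it on screen, which is the part the neural net has to learn for itself.

```python
def paddle_step(paddle_x, ball_x, max_speed=4):
    """Move the paddle toward the ball's x position, at most max_speed per frame."""
    delta = ball_x - paddle_x
    # Clamp the move so the paddle cannot teleport across the screen.
    return paddle_x + max(-max_speed, min(max_speed, delta))


# Track a ball drifting right by 3 pixels per frame for ten frames.
paddle, ball = 50, 60
for _ in range(10):
    ball += 3
    paddle = paddle_step(paddle, ball)
```

With the paddle faster than the ball, it catches up and then stays locked on. A reflex game with one rule is exactly where this kind of controller, learned or hand-written, beats people.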
hehehe, I'm one of these people and I like to refer to myself as a copy and paste coder. It's just a coincidence that I love JS; I've been a ripper ever since the Commodore 64.
And I'm cool with you guys looking down at me, that's just fine, but saying I have pseudo-intelligence is just mean.
I know what my intelligence levels are and work within them, and that's why I copy and paste.
I use code that I understand well enough that I can adapt it to my needs and fix problems that might occur.
Yes, I often solve problems by systematic elimination, but what's wrong with that? It's just my way of learning, and although I have memory/learning problems it eventually sticks and I've truly learnt something, using the very methods you look down on.
It's great that you've got an awesome brain that can just understand things... for me it's just a longer journey, that's all.
And saying "works well enough to trick the customer or employer into thinking the job is done." is also a little mean and wrong.
I know a guy who has built a career being a copy/paste coder and makes a lot of money and has a lot of very happy clients all over Australia and abroad.
He cheated his way through Uni...me and a mate did his homework and he cheated on every test he ever had. And lied his way into his first job.
But copying and pasting is a bridge for him as well: he uses it to get a job done and then learns how it all works (takes him time coz he's got his own problems as well), and no one ever complains or knows because he's the best cheater you'll ever meet.
And his stuff works, he doesn't have to trick anyone. His stuff is running on terminals all over Australia and I've never heard one of his colleagues say anything about any of it crashing or bugging, EVER.
You really smart coders like to overcomplicate everything, turn everything into big complex systems and then go to great lengths explaining why it should be that way, when the bulk of the time none of that crap is needed.
It's like when I teach coding to someone for the first time and they ask me why there's so many confusing words or explanations... "because they want to make it seem a lot harder than it is so they can charge stupid amounts of money for it"... it's a joke, but a lot of the time it feels real.
The performance would be better than that of a human regardless.
Until it's not. Perhaps the best solution is to use both. That is, automate but have a human operator as backup. That way when the automation goes off the rails (either figuratively or literally), there's a human there saying "that's not quite right" and can resume the reins. Automation should allow for a higher system-to-human ratio, so it's not a complete loss.
"However, the neural net still struggles to match average human performance in games such as Seaquest, Q*bert and, most importantly, Space Invaders."
There's the Singularity put off for another year.