Slashdot Mirror


Neural Net Learns Breakout By Watching It On Screen, Then Beats Humans

KentuckyFC writes "A curious thing about video games is that computers have never been very good at playing them like humans by simply looking at a monitor and judging actions accordingly. Sure, they're pretty good if they have direct access to the program itself, but 'hand-to-eye-co-ordination' has never been their thing. Now our superiority in this area is coming to an end. A team of AI specialists in London have created a neural network that learns to play games simply by looking at the RGB output from the console. They've tested it successfully on a number of games from the legendary Atari 2600 system from the 1980s. The method is relatively straightforward. To simplify the visual part of the problem, the system down-samples the Atari's 128-colour, 210x160 pixel image to create an 84x84 grayscale version. Then it simply practices repeatedly to learn what to do. That's time-consuming, but fairly simple since at any instant in time during a game, a player can choose from a finite set of actions that the game allows: move to the left, move to the right, fire and so on. So the task for any player, human or otherwise, is to choose an action at each point in the game that maximizes the eventual score. The researchers say that after learning Atari classics such as Breakout and Pong, the neural net can then thrash expert human players. However, the neural net still struggles to match average human performance in games such as Seaquest, Q*bert and, most importantly, Space Invaders. So there's hope for us yet... just not for very much longer."
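
The pipeline described in the summary (shrink the frame to 84x84 grayscale, then repeatedly pick from a finite set of actions so as to maximize the eventual score) can be sketched roughly as below. This is a minimal illustration, not the researchers' code. The summary does not say how the grayscale conversion or the down-sampling is done, so the BT.601 luma weights and the nearest-neighbour resize are assumptions. The summary also does not name the learning rule, so the epsilon-greedy choice and the one-step Q-learning update are stand-ins for "practice repeatedly", and every function name here is hypothetical.

```python
import random


def to_grayscale(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to rows of luminance values."""
    # Assumption: ITU-R BT.601 luma weights; the summary does not specify the conversion.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def downsample(gray, out_h=84, out_w=84):
    """Resize to out_h x out_w by nearest-neighbour sampling (an assumed method)."""
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q_old, reward, next_q_values, alpha=0.1, gamma=0.99):
    """Move the old estimate toward reward + discounted best estimate of the next state."""
    target = reward + gamma * max(next_q_values)
    return q_old + alpha * (target - q_old)


# A blank frame, assumed here to be 210 rows by 160 columns of RGB pixels.
frame = [[(0, 0, 0)] * 160 for _ in range(210)]
small = downsample(to_grayscale(frame))
```

In the system the summary describes, a neural network estimates the value of each action from the 84x84 image; `q_values` above is just a plain list standing in for that network's output.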

19 of 138 comments

  1. I for one by fisted · · Score: 4, Funny

    I for one welcome our new virtual ass-kicking overlords.

  2. Excerpt from "Starfish" by Peter Watts by Anonymous Coward · · Score: 5, Interesting

    "I hope the lifter pilot doesn't get too bored." Jarvis is all chummy again.
    "There is no pilot. It's a smart gel."
    "Really? You don't say." Jarvis frowns. "Those are scary things, those gels. You know one suffocated a bunch of people in London a while back?"
    Yes, Joel's about to say, but Jarvis is back in spew mode. "No shit. It was running the subway system over there, perfect operational record, and then one day it just forgets to crank up the ventilators when it's supposed to. Train slides into station fifteen meters underground, everybody gets out, no air, boom."
    Joel's heard this before. The punchline's got something to do with a broken clock, if he remembers it right.
    "These things teach themselves from experience, right?," Jarvis continues. "So everyone just assumed it had learned to cue the ventilators on something obvious. Body heat, motion, CO2 levels, you know. Turns out instead it was watching a clock on the wall. Train arrival correlated with a predictable subset of patterns on the digital display, so it started the fans whenever it saw one of those patterns."
    "Yeah. That's right." Joel shakes his head. "And vandals had smashed the clock, or something."
    "Hey. You did hear about it."
    "Jarvis, that story's ten years old if it's a day. That was way back when they were starting out with these things. Those gels have been debugged from the molecules up since then."
    "Yeah? What makes you so sure?"
    "Because a gel's been running the lifter for the better part of a year now, and it's had plenty of opportunity to fuck up. It hasn't."
    "So you like these things?"
    "Fuck no," Joel says, thinking about Ray Stericker. Thinking about himself. "I'd like 'em a lot better if they did screw up sometimes, you know?"
    "Well, I don't like 'em or trust 'em. You've got to wonder what they're up to."

  3. AI by ledow · · Score: 5, Insightful

    For once, something based on proper AI (rather than human-generated heuristics).

    However - notice its limitations: Where there is a direct correlation between where you need to be, and where something else is on the screen (basically a 1:1 relationship in Pong, for example), it can cope with going higher or lower as required.

    But when you put it into something that has more than a single thing to "learn" (move left/right, avoid bombs, shoot aliens, choose which aliens to shoot, don't shoot your own base, etc.) then the amount of training required goes up exponentially. And thus we could spend centuries of computer time in order to get something that can do as well as a simple heuristic designed by someone who knows the game (not saying heuristics don't have their place!).

    "Trained" devices require training relative to some power of the variety of the inputs and the directness of their correlation to the game-arena. And thus, proper AI is really stymied when it comes to learning complex tasks.

    But still - this is the sort of thing we should be doing. If it takes an infant two years with the best "computer" in the universe that we know of to learn how to talk, why should we think it will take a machine at even the top-end of the supercomputer scale (which can't have as many "connections" as the average human brain) any less?

    1. Re:AI by Anonymous Coward · · Score: 3, Interesting

      If it takes an infant two years with the best "computer" in the universe that we know of to learn how to talk, why should we think it will take a machine at even the top-end of the supercomputer scale (which can't have as many "connections" as the average human brain) any less?

      Because we're learning languages in the wrong way.

    2. Re:AI by StripedCow · · Score: 3, Interesting

      If it takes an infant two years with the best "computer" in the universe that we know of to learn how to talk, why should we think it will take a machine at even the top-end of the supercomputer scale (which can't have as many "connections" as the average human brain) any less?

      Because neurons are much slower than transistors?

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    3. Re:AI by GTRacer · · Score: 2, Interesting

      So, Pimsleur or Rosetta?

      --
      Defending IP by destroying access to it? That makes sense, RIAA/MPAA. Go to the corner until you can play nice!
  4. It's called a "JavaScript Programmer" algorithm. by Anonymous Coward · · Score: 5, Funny

    This neural-net-combined-with-trial-and-error style of algorithm is typically referred to as a "JavaScript Programmer"-type algorithm in recent AI literature. (I'm being completely serious, too, in case you think this is a joke; it isn't.)

    The name derives from the similarity between how these kinds of algorithms work, and how JavaScript programmers tend to work.

    Both the algorithms and JavaScript programmers use a very basic, minute form of pseudo-intelligence.

    This small dab of pseudo-intelligence is then used to repeatedly attempt to solve a problem, followed by an analysis of the success of the attempt.

    In the case described in this article, it involves the computer trying to play the game, with the aim of winning.

    In the case of the JavaScript programmer, it involves the programmer repeatedly searching through Stack Overflow, finding code to copy-and-paste, and then hoping that it works well enough to trick the customer or employer into thinking the job is done.

    The summary should have probably mentioned this, but I suspect that the submitter may not be following the latest AI journals and research very closely.

  5. Re:It's called a "JavaScript Programmer" algorithm by cfulton · · Score: 2, Funny
    Truer words were never spoken:

    the programmer repeatedly searching through Stack Overflow, finding code to copy-and-paste, and then hoping that it works well enough to trick the customer or employer into thinking the job is done.

    --
    No sigs in BETA. Beta SUCKS.
  6. Re:The handwriting on the wall by cold+fjord · · Score: 5, Interesting
    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  7. Re:Can we get a summary of that excerpt, please? by themightythor · · Score: 4, Informative

    In this case, the "gels" were employing a heuristic to know when to do something (in this case, turn on the air ventilation system). It was assumed that it was something meaningful to the action (i.e. something to do with the recipients of the ventilation), but it was something arbitrary (i.e. the way the clock looked). So, unless you have insight into what the heuristic is, you won't know when it's going to have the expected behavior and when it isn't. Even if it seemingly has the expected behavior for a long time.

  8. Tetris by wjcofkc · · Score: 2

    Tetris, by nature, would prove most interesting. I myself never made it past level 10, and I've never seen anyone make it past level 20. I wonder what the breaking point for this neural net would be after a few days of practice. I would love to see a video of it starting from level one and making its way to the insanity of level 50 - if it's up to the task. I imagine a supercomputer would have too much latency.

    --
    Brought to you by Carl's Junior.
    1. Re:Tetris by Sigma+7 · · Score: 4, Informative

      Tetris is a solved problem if you're going for survival (assuming you don't get an extremely unlucky piece selection). Since AI has access to the current piece, the next piece, and can do a probability check on the next piece, it can basically last forever.

      I myself never made it past level 10, and I've never seen anyone make it past level 20.

      Tetris: The Grand Master: http://www.youtube.com/watch?v=jwC544Z37qo - fast forward to 3:00 to see the first major speedup, 4:45 for the final speedup, and 5:01 for invisible pieces.

      That, and 999999 was done on a real NES within 3 minutes 11 seconds: http://www.youtube.com/watch?v=bR0BKCHJ48s

  9. All is lost! by portwojc · · Score: 2

    The AI has another advantage over us human players with the Atari 2600. No blisters.

  10. But by Stargoat · · Score: 2

    But has it learned to let someone else design Breakout and then steal a couple thousand dollars from him for his efforts? When it does that, it will truly be an intelligence. (And it will be a superior intelligence if it leaves off the black turtlenecks.)

    --
    Hoist Number One and Number Six.
  11. Perhaps I can teach it by The-Ixian · · Score: 2

    to farm gold for me.

    --
    My eyes reflect the stars and a smile lights up my face.
  12. Re:Can we get a summary of that excerpt, please? by Anonymous Coward · · Score: 2, Insightful

    tl;dr == "I hate reading". You should NEVER see a tl;dr at slashdot. NERDS READ.

  13. Re:It's called a "JavaScript Programmer" algorithm by cascadingstylesheet · · Score: 3, Interesting

    This neural-net-combined-with-trial-and-error style of algorithm is typically referred to as a "JavaScript Programmer"-type algorithm in recent AI literature. (I'm being completely serious, too, in case you think this is a joke; it isn't.)

    The name derives from the similarity between how these kinds of algorithms work, and how JavaScript programmers tend to work.

    Funny, of course :)

    But, you got me thinking. The JavaScript programmer is generally trying to affect the appearance of stuff on the screen, therefore, he looks at the stuff on the screen, and tries to affect ... the stuff on the screen. So, it makes more sense than it might.

    Our new pong-playing overlords, on the other hand, if they are actually doing something important like remotely fighting wars or trying to save people or something, well, then we don't really know if they are looking at the right input, and it becomes much more important that they, and we, understand exactly how they are coming to their decisions.

  14. Re:wining is pointless by egcagrac0 · · Score: 2

    It's learning to have fun.

  15. Re:Oh, KentuckyFC by Savage-Rabbit · · Score: 3, Funny

    You were just asking for an oblig, weren't you?

    http://xkcd.com/347/ ...now that was truly obligatory.

    --
    Only to idiots, are orders laws.
    -- Henning von Tresckow