DeepMind Used YouTube Videos To Train Game-Beating Atari Bot (theregister.co.uk)

Posted by BeauHD on Thursday May 31, 2018 @02:20PM from the imitation-learning dept.

Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, and thus defeating the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences.

To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC), and cross-modal temporal distance classification (CMC). The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame is close to or matches the one in the human's video, it is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, ApeX, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
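The checkpoint idea in the summary can be sketched roughly as follows. This is a toy illustration, not DeepMind's code: the function names, the cosine-similarity comparison, the 0.9 threshold, and the reward value of 1.0 are all assumptions, and the short vectors stand in for whatever learned frame embeddings the real agent uses.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def episode_rewards(agent_embeddings, checkpoints, stride=16, threshold=0.9):
    """Reward the agent each time a sampled frame matches the next checkpoint.

    agent_embeddings: one vector per frame of the agent's session.
    checkpoints: vectors taken from the human video, in play order.
    stride: sample every sixteenth agent frame, as the summary describes.
    threshold: assumed cutoff for "close or matches" (hypothetical value).
    """
    rewards = []
    next_checkpoint = 0
    for frame_index in range(0, len(agent_embeddings), stride):
        reward = 0.0
        if next_checkpoint < len(checkpoints):
            similarity = cosine_similarity(
                agent_embeddings[frame_index], checkpoints[next_checkpoint]
            )
            if similarity >= threshold:
                reward = 1.0          # assumed reward size
                next_checkpoint += 1  # checkpoints must be hit in order
        rewards.append(reward)
    return rewards
```

Because checkpoints are consumed in order, the agent is nudged to reproduce the human's progress through the game rather than to revisit one easy frame repeatedly.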

61 comments

Train your dragon by Anonymous Coward · 2018-05-31 14:25 · Score: 0

Beat it is the only way to make it to what you want, what you really want.
1. Re:Train your dragon by Anonymous Coward · 2018-05-31 15:37 · Score: 0
  
  Beat it is the only way to make it to what you want, what you really want.
  Your mom is the same way.
Mimicry by Anonymous Coward · 2018-05-31 14:30 · Score: 2, Interesting

So, essentially, little more than mimicry to bootstrap what any human who had never seen a video game would do intuitively and without instruction...
Winter is coming: https://blog.piekniewski.info/2018/05/28/ai-winter-is-well-on-its-way/
Captcha: outvote
1. Re:Mimicry by rmdingler · 2018-05-31 14:48 · Score: 1
  
  Perhaps intuitive artificial intelligence has indeed been over-hyped by corporate spokesmen who have an interest in seeing it arrive sooner, rather than later; yet, its very future depends on the ability to mimic the current human overlords.
  
  --
  Happiness in intelligent people is the rarest thing I know.
  Ernest Hemingway
2. Re:Mimicry by phantomfive · 2018-05-31 15:10 · Score: 1
  
  ; yet, its very future depends on the ability to mimic the current human overlords.
  I wonder if something similar could be used to make more realistic text-to-voice synthesizers.
  
  --
  "First they came for the slanderers and i said nothing."
3. Re:Mimicry by ranton · 2018-06-01 01:15 · Score: 1
  
  I have been very impressed with how AI technology is progressing, and often argue with those on Slashdot who think anything short of Skynet is not "real AI". But training AI to win at Atari games is one story that just doesn't mean much to me. Those games are so basic it still seems the work done to beat chess masters two decades ago was more difficult.
  Once these AIs can beat professional Starcraft players, or build an AI that can beat Deity level human players in Civ 5 without cheating, then it will become impressive.
  
  --
  -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
4. Re:Mimicry by religionofpeas · 2018-06-01 02:54 · Score: 1
  
  But training AI to win at Atari games is one story that just doesn't mean much to me.
  
  I figure there was a reason they couldn't train an AI to beat these games a few years ago. Must be harder than you'd assume. And if they can beat them now, that's a step of progress.
5. Re:Mimicry by ranton · 2018-06-01 03:08 · Score: 1
  
  But training AI to win at Atari games is one story that just doesn't mean much to me.
  I figure there was a reason they couldn't train an AI to beat these games a few years ago. Must be harder than you'd assume. And if they can beat them now, that's a step of progress.
  Many times it is just because no one had tried yet. They thought it wasn't possible yet, then saw extreme success in another area, and decided to take a crack at it.
  Also sometimes no one had done it because they don't see the need. There are probably plenty of AI related tests you could do that may be a good training exercise but don't necessarily show anything new or novel. I'm not saying I could do this, just like I couldn't play in the NBA. But that doesn't mean it is news every time an NBA player makes a basket, or every time someone does a thought experiment with neural networks.
  
  --
  -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
Monkey see, monkey do by SlaveToTheGrind · 2018-05-31 14:45 · Score: 1

So if an algorithm "watches" a video of a good player, the algorithm's play approaches the level of the human. Hooray?
1. Re:Monkey see, monkey do by Anonymous Coward · 2018-05-31 14:48 · Score: 0
  
  Agreed, it's pretty fucking stupid. But people are getting paid for this apparently.
2. Re:Monkey see, monkey do by 110010001000 · 2018-05-31 15:02 · Score: 1
  
  As an added bonus it needs to be reprogrammed ("have its own agent") to play each different game. Genius!
3. Re:Monkey see, monkey do by Anonymous Coward · 2018-05-31 15:16 · Score: 0
  
  This seems like more effort and questionable results compared to the write-an-algorithm-to-play-the-game exercise that kids do in college, where they compete on whose algorithm scores the best.
4. Re:Monkey see, monkey do by ShanghaiBill · 2018-05-31 15:43 · Score: 1
  
  So if an algorithm "watches" a video of a good player, the algorithm's play approaches the level of the human. Hooray?
  ANNs use gradient descent, which is prone to converge on local minima. So for good performance, you need to get it into the right ballpark. This is why image recognizers often use auto-encoding. The same is going on here. Learn to mimic the best human, and then use that as the starting point for further optimization.
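The point about local minima and starting in "the right ballpark" can be seen in a one-variable toy problem. The function below is an arbitrary example chosen because it has two valleys; it has nothing to do with the actual agent, and the learning rate and step count are assumed values.

```python
def f(x):
    """Toy loss with two valleys: a shallow one near x = 1.13
    and a deeper one near x = -1.30."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x, learning_rate=0.01, steps=2000):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

right = descend(2.0)    # settles in the shallow valley
left = descend(-2.0)    # settles in the deeper valley
print(right, f(right))
print(left, f(left))
```

Both runs use the same rule and the same function; only the starting point differs, and that alone decides which valley the search ends in. A starting point borrowed from a good human demonstration plays the role of the better initial guess.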
5. Re: Monkey see, monkey do by Anonymous Coward · 2018-05-31 18:59 · Score: 0
  
  The AI could watch their hands on the controller in hi res and accomplish the same goal. It isn't learning anything, just mimicking without context.
6. Re:Monkey see, monkey do by Anonymous Coward · 2018-05-31 23:11 · Score: 0
  
  Incidentally, that's how we humans do it too. We learn from exploration (own experience), from seeing the experiences of others, and then make informed guesses combining all that previous knowledge, further refining our own experience.
7. Re: Monkey see, monkey do by phantomfive · 2018-06-01 00:10 · Score: 1
  
  Apparently it needs to be reprogrammed if a new level is added to the same game.
  
  --
  "First they came for the slanderers and i said nothing."
Re: Mimicry - no, it's cheating. by Anonymous Coward · 2018-05-31 14:50 · Score: 0

Players already have a term for people looking up clues on youtube. Cheaters.
The whole point of many games is to solve the puzzles.
If you just google for a video to find the answers, that's still cheating.
Even if the A.I. has to brute force an exhaustive approach to solve a game, so be it. Or develop better methods.
Montezuma's Revenge Map by UnknownSoldier · 2018-05-31 15:09 · Score: 1

What do you mean "not obvious where to go" ? There are only 9 levels and the map is in the shape of a pyramid -- sections of the pyramid are blocked off for that level.
True, it isn't deterministic, but cry me a river. That's the _whole_ point of intelligence --- to make an intelligent decision!
1. Re:Montezuma's Revenge Map by religionofpeas · 2018-05-31 21:28 · Score: 5, Insightful
  
  That's the _whole_ point of intelligence --- to make an intelligent decision!
  The problem is that intelligence operates on previously recognized patterns. A human playing the game already knows the concept of a map, and a pyramid, and understands locked doors that can be opened with a key. The AI starts with absolutely zero knowledge.
2. Re: Montezuma's Revenge Map by Anonymous Coward · 2018-06-01 02:50 · Score: 0
  
  So give it a small level with a lock, map and key. Master the small levels and then have it explore larger levels.
100^18 possible action sequences by AsmCoder8088 · 2018-05-31 15:18 · Score: 2

I'm pretty sure they must mean 18^100 possible sequences.
That is, if there are 18 possibilities for each step, then 100 steps would yield 18^100 possibilities.
Similar to how if there are two choices to make (go left or go right) and you make 100 of them, then there would be 2^100 possibilities.
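A quick check of the arithmetic, assuming 18 available actions per step and 100 steps as in the summary. Python integers are arbitrary precision, so the numbers can be computed exactly.

```python
actions_per_step = 18
steps = 100

sequences = actions_per_step ** steps   # 18^100: one choice of 18 at each of 100 steps
misprint = steps ** actions_per_step    # 100^18: the figure quoted in the summary

print(len(str(sequences)))  # number of decimal digits in 18^100
print(len(str(misprint)))   # number of decimal digits in 100^18
print(2 ** 100)             # the two-choice example from the comment
```

100^18 is exactly 10^36, while 18^100 has 126 digits, so the two figures differ by roughly 89 orders of magnitude.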
1. Re:100^18 possible action sequences by Anonymous Coward · 2018-05-31 15:39 · Score: 1
  
  Yeah exactly. And even 18^100 is not entirely correct, because first going left and then going right is (usually) the same as first going right and then going left. So the actual number of unique possibilities will be far lower.
2. Re:100^18 possible action sequences by Anonymous Coward · 2018-06-01 07:21 · Score: 0
  
  Chess playing AI used to work that out. They would hash the positions of the game pieces to give each possible outcome a unique ID. Then if that ID were ever encountered again during a depth search of future moves, that path was already considered. It handled any number of combinations that led up to a particular state.
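A minimal sketch of that idea, using a toy one-dimensional game (step left or right) rather than chess. The function name and the use of a plain Python set keyed on (position, moves remaining) are illustrative assumptions; real chess engines typically use a transposition table keyed by a Zobrist hash of the board.

```python
def count_expansions(depth, use_table):
    """Count search nodes visited when exploring every left/right move
    sequence of the given depth, optionally skipping repeated states."""
    seen = set()
    expanded = 0

    def search(position, remaining):
        nonlocal expanded
        if use_table:
            key = (position, remaining)  # stand-in for a hash of the game state
            if key in seen:
                return                   # this state was already explored
            seen.add(key)
        expanded += 1
        if remaining == 0:
            return
        for step in (-1, +1):            # go left or go right
            search(position + step, remaining - 1)

    search(0, depth)
    return expanded

print(count_expansions(10, use_table=False))  # every sequence explored separately
print(count_expansions(10, use_table=True))   # repeated states explored once
```

With depth 10 the plain search visits 2047 nodes, while the table version visits 66, because "left then right" and "right then left" land in the same state and only one of them is expanded further.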
Text adventures by darkshadow · 2018-05-31 15:42 · Score: 1

Can they beat Zork yet?

--
-Darkshadow (There was a thing called Heaven; but all the same they used to drink enormous quantities of alcohol.)
1. Re: Text adventures by denis.goddard · 2018-05-31 16:22 · Score: 1
  
  The real test is Nethack. I've been playing ~20 years, and ascended less than a dozen times. There are tens of thousands of games ttyrec'd on the public nethack servers like nethack.alt.org. Let's see an AI get the Amulet of Yendor.
2. Re: Text adventures by Anonymous Coward · 2018-05-31 16:41 · Score: 1
  
  Getting the Amulet? That's easy! I've had the Amulet dozens of times.
  But there's the pesky Wizard, getting thrown back down several levels in Gehennom over and over, and the Plane of Air and the Astral Plane are crazy hard.
  I say let an AI read the source, the spoilers, and the full RGRN archive. It still won't ascend.
3. Re: Text adventures by Khashishi · 2018-06-01 05:09 · Score: 1
  
  Already been done. https://www.reddit.com/r/netha...
Smoke and mirrors by Dan+East · 2018-05-31 15:55 · Score: 3, Informative

So random control inputs are fired over and over until the image of the gameplay the AI is doing very closely matches that of an expert human player. There really is no intelligence to this at all. If the slightest bit of randomness occurs in the game then it will fail, because that would not match the game the human played originally.

the agent was able to exceed average human players
Uh, that's because the "average human" would suck at these kinds of games, and the AI has merely copied the exact gameplay of an expert human who played it originally. So an expert human at a given game exceeds the average human. I hope it didn't take too much research money to come to that conclusion.

--
Better known as 318230.
1. Re:Smoke and mirrors by Anonymous Coward · 2018-05-31 17:19 · Score: 0
  
  That's how most of these AI systems work. Still scared of them taking over anytime soon?
  You can get very far on applying randomness towards maximizing a goal value. You can apply humans to that standard too: trying out things in life based on past attempts, with the goal of gaining your mate and financial security. AIs are dumb systems, but so are we. We're just really complex dumb systems. AIs are becoming more complex too. No one is really sure what level of complexity == smart, though scope matters. We've already simulated the brains of tiny insects. They're smart enough for what they do, so in that sense we've already developed smart AIs capable of taking over the world, as insects have already achieved that victory. All hail our stupid underlord masters! Terrors of children's nightmares! Able to turn adults into screaming wrecks at a single glance! Tasty food for cats. All hail cats, our saviors! Yay!
  (can you tell I'm tired?)
2. Re:Smoke and mirrors by Anonymous Coward · 2018-05-31 21:16 · Score: 1
  
  Have you ever seen how a baby learns to do something? Sending muscle control signals over and over, until the output matches expectations.
  Or does a baby just 'know' how to walk? or even to accurately put its thumb in its mouth?
3. Re:Smoke and mirrors by Anonymous Coward · 2018-05-31 21:20 · Score: 0
  
  A+ content
4. Re:Smoke and mirrors by religionofpeas · 2018-05-31 21:33 · Score: 1
  
  It's almost like a toddler trying to mash different shaped objects in various holes, until it figures out that the cylinder goes in the circle.
  Sufficiently advanced smoke and mirrors is indistinguishable from intelligence.
5. Re:Smoke and mirrors by Anonymous Coward · 2018-06-01 00:02 · Score: 1
  
  It's almost like a toddler trying to mash different shaped objects in various holes, until it figures out that the cylinder goes in the circle.
  Sufficiently advanced smoke and mirrors is indistinguishable from intelligence.
  no it is not
  the toddler isn't copying a video of another toddler putting objects in holes. they are literally only rewarding this ai when it copies the video. this is a form of copying, not learning. it's an expensive mirror, not intelligent. mirrors are not intelligent because they always mirror what is in front of them, you're way off.
6. Re:Smoke and mirrors by Impy+the+Impiuos+Imp · 2018-06-01 00:56 · Score: 1
  
  It is how humans learn -- the vast majority is watching what others did which is successful (also the easiest way to change someone's mind, as it is the opposite of preaching.)
  
  --
  (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
7. Re:Smoke and mirrors by MobyDisk · 2018-06-01 01:29 · Score: 1
  
  It's not the same though. Animals don't make random spasmodic movements until it matches the desired output. Instead, they build a mental model correlating their actions to desired outputs. That is entirely different, and it is why this approach is not AI at all. I was very excited. :-(
  The real test is this: Create a new screen in Montezuma's Revenge, and let a human player and an AI player both play that new screen. It sounds like this "AI" would simply stand there since it did not have any input on what to do.
  Alternatively, move one of the keys 25 pixels to the left or right. The human players will probably complete the level just the same and maybe not even notice. The AI will probably jump where the key used to be, then try to walk through the locked door over and over because it doesn't have the key. The AI didn't learn that keys open doors.
8. Re:Smoke and mirrors by Anonymous Coward · 2018-06-01 02:20 · Score: 0
  
  It never ceases to amaze me how quick people are to assume they already know more about any given subject than researchers who have studied it for years.
  Has it never occurred to you that there might actually be more to the field than a few paragraphs you just read from some equally inexpert journalist?
9. Re:Smoke and mirrors by religionofpeas · 2018-06-01 04:43 · Score: 0
  
  Alternatively, move one of the keys 25 pixels to the left or right. The human players will probably complete the level just the same and maybe not even notice. The AI will probably jump where the key used to be
  Not necessarily. By studying the human games, it doesn't just memorize the exact screens and movements. It can also extract patterns that can be used somewhere else, like the image of a key.
10. Re:Smoke and mirrors by MobyDisk · 2018-06-01 10:44 · Score: 1
  
  Dan East's comments made me think otherwise. I can't tell from the linked article, and I don't see them giving it any novel inputs. Many versions of Montezuma's Revenge reset all the object positions when you exit and come back into the room, so you get a free "reset" each time.
I'll start worrying when... by Grog6 · 2018-05-31 16:23 · Score: 2

They train one to get all 222 points in Leisure Suit Larry I.
Best I ever got was 221, lol.

--
Truth isn't Truth - Guliani
Did they let it read the comments too? by Anonymous Coward · 2018-05-31 16:25 · Score: 0

Yoiur motha!
Obligatory by Anonymous Coward · 2018-05-31 16:43 · Score: 0

I, for one, welcome our AI Video Gaming Overlords.
Informative by Apk+Master · 2018-05-31 18:33 · Score: 1

Such an Informative post. Especially for gaming lovers.
Re: Mimicry - no, it's cheating. by Anonymous Coward · 2018-05-31 22:09 · Score: 0

I remember this game, might've even made a map for it. The AI computer needs pen and paper, perhaps a joint, and lot's of hours, and a mouth so it can be screaming at the screen when you die, that helped me learn it. Also, an fresh Atari joystick for when it breaks the first one in frustration.
Montezuma's Revenge https://www.youtube.com/watch?v=_zbg9rs5QZY
This has great opportunities by houghi · 2018-05-31 22:49 · Score: 1

See how many movies actually are about Atari games. Now look at the amount of movies on e.g. youporn, redtube and many similar sites and you see where I am going with this.

--
Don't fight for your country, if your country does not fight for you.
Re: Mimicry - no, it's cheating. by Anonymous Coward · 2018-05-31 23:32 · Score: 0

Well, depends on the game.
Puzzles, certainly, but no amount of videos can make you automatically finish some games unless you are up to the required skills/reflexes: fighting, arcades, even Tetris!
And is watching football games from your next opponent cheating, then?
lack of human intelligence by Anonymous Coward · 2018-06-01 00:50 · Score: 0

I always wondered what the hell the devs were thinking when they decided to name their game after diarrhea.
Re: Mimicry - no, it's cheating. by gnick · 2018-06-01 00:51 · Score: 1

...computer needs pen and paper...
That's one way to implement optical storage.

--
He's getting rather old, but he's a good mouse.
Re: Mimicry - no, it's cheating. by gnick · 2018-06-01 00:55 · Score: 1

no amount of videos can make you automatically finish some games unless you are up to the required skills/reflexes

AI skills impress me. Here it sounds like the primary skill is mimicry, but others exist. AI reflexes should be up to almost any task.

--
He's getting rather old, but he's a good mouse.
change the game in any way by Sqreater · 2018-06-01 01:58 · Score: 1

and the program would be lost again. Not the human, who would adapt. The program locks in the solution. And this is the problem with AI. Wherever you use it in lieu of humans you freeze the level of expertise.

--
E Proelio Veritas.
Congrats! by Anonymous Coward · 2018-06-01 02:37 · Score: 0

you are one of the 10,000 who understood that the soup of classifiers and feedback voodoo is called AI today.
Millennial misdirection by Anonymous Coward · 2018-06-01 03:10 · Score: 0

Once again we see millennials failing to grasp a concept. Say it with me: *The method of delivery or the process is not the end result.* it isn't 'being trained' or 'learning' anymore than software that has been programmed by hand, and if programming is easier and faster, why wouldn't you just do that? Oh, that's right. Millennial engineers are lazy. So are a lot of other people. Funny that the 'automation revolution' (which will probably never materialize, as this is just technological iteration, not even a leap forward, and past advances haven't made much of a dent. We don't think this stuff is half as impressive or cool because unlike your baby ass, we've seen it before) has turned out to be about laziness and ineptitude rather than innovation or progress. Whatever, but stop wasting our time with your science fair, thank you.
Re: Mimicry - no, it's cheating. by Anonymous Coward · 2018-06-01 03:11 · Score: 0

I apologize to /. for the grammar mistakes in my post. I wish /. would implement at least a 'one time edit' ability. :/
A step back for DeepMind by stdarg · 2018-06-01 03:50 · Score: 1

This sounds like a step back for DeepMind. The whole point of their high profile project AlphaZero was to learn to play games (go, chess, shogi) without any mimicry of human players, and it proved that such an approach could be successful. It soundly defeated the previous go world champ, AlphaGo Master, which was trained with top-level human games and self-play. AlphaZero wasn't taught or shown any patterns, rather it discovered them through self-play and random moves.
The funny thing is, after AlphaZero, DeepMind was like, "We're not doing this just to beat people at games, we want to apply this to solving serious problems in energy, healthcare, etc." (paraphrase, not a direct quote)
And now they're revealing how they're playing old Atari games...
1. Re:A step back for DeepMind by religionofpeas · 2018-06-01 04:41 · Score: 1
  
  It's not really a step back, but rather a more difficult problem. The reason that AlphaZero worked, is because the consequences of a mistake are quickly visible, leading to a short feedback cycle of improvements.
  With these particular games, there are too many choices, and too much delay between making a choice and the consequence, making it very hard to detect patterns between a specific action and the outcome.
2. Re:A step back for DeepMind by Areyoukiddingme · 2018-06-01 10:29 · Score: 1
  
  With these particular games, there are too many choices, and too much delay between making a choice and the consequence, making it very hard to detect patterns between a specific action and the outcome.
  Humans have big problems with that too...
I'm pretty sure by Anonymous Coward · 2018-06-01 05:17 · Score: 0

I'm pretty sure I could train my dog to do the same. I don't get how this is so special?
Later Levels by slapout · 2018-06-01 09:49 · Score: 1

The later levels in Montezuma's Revenge were the same as the earlier ones, except they were blacked out. You have to have already memorized where to go.

--
Coder's Stone: The programming language quick ref for iPad
Isn't this cheating? by Anonymous Coward · 2018-06-02 15:12 · Score: 0

Using the best human runs as a basis for training the AI seems like cheating. The AI should learn without preconceived knowledge.