DeepMind Used YouTube Videos To Train Game-Beating Atari Bot (theregister.co.uk)

Posted by BeauHD on Thursday May 31, 2018 @02:20PM from the imitation-learning dept.

Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, and thus defeating the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences.

To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC), and cross-modal temporal distance classification (CDC). The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While the agent plays the game, every sixteenth video frame of its session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame is close to or matches the one in the human's video, the agent is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, ApeX, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
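The checkpoint idea described above can be sketched roughly as follows. This is an illustration, not DeepMind's code: it assumes each frame has already been turned into an embedding vector, that "close" means cosine similarity above a made-up threshold, and that checkpoints must be reached in order. The function names, the `threshold` and the `bonus` value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def checkpoint_rewards(agent_embeddings, checkpoints, stride=16,
                       threshold=0.9, bonus=1.0):
    """Return one reward per agent frame.

    Every `stride`-th agent frame is compared with the next checkpoint
    from the human video that has not been reached yet. The threshold,
    the bonus and the in-order matching are assumptions for illustration.
    """
    rewards = [0.0] * len(agent_embeddings)
    next_checkpoint = 0
    for t in range(0, len(agent_embeddings), stride):
        if next_checkpoint >= len(checkpoints):
            break  # every checkpoint has already been reached
        similarity = cosine_similarity(agent_embeddings[t],
                                       checkpoints[next_checkpoint])
        if similarity >= threshold:
            rewards[t] = bonus
            next_checkpoint += 1
    return rewards


# Toy example with 2-D "embeddings": the agent reaches both checkpoints.
human_checkpoints = [[1.0, 0.0], [0.0, 1.0]]
agent_frames = [[1.0, 0.0]] * 16 + [[0.0, 1.0]] * 16 + [[1.0, 1.0]] * 16
print(sum(checkpoint_rewards(agent_frames, human_checkpoints)))
```

Because the comparison happens in embedding space rather than pixel by pixel, a frame only has to be close to the checkpoint, not identical to it.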

Mimicry by Anonymous Coward · 2018-05-31 14:30 · Score: 2, Interesting

So, essentially, little more than mimicry to bootstrap what any human who had never seen a video game would do intuitively and without instruction...
Winter is coming: https://blog.piekniewski.info/2018/05/28/ai-winter-is-well-on-its-way/
Captcha: outvote

100^18 possible action sequences by AsmCoder8088 · 2018-05-31 15:18 · Score: 2

I'm pretty sure they must mean 18^100 possible sequences.
That is, if there are 18 possibilities for each step, then 100 steps would yield 18^100 possibilities.
Similar to how if there are two choices to make (go left or go right) and you make 100 of them, then there would be 2^100 possibilities.
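A quick check of the arithmetic above, assuming (as the comment does) 18 possible actions per step and 100 steps. The variable names are only illustrative; Python integers are arbitrary precision, so both values can be computed exactly.

```python
ACTIONS_PER_STEP = 18  # assumption taken from the comment above
STEPS = 100

# One independent choice per step multiplies the count: 18 * 18 * ... (100 times).
sequences = ACTIONS_PER_STEP ** STEPS
# The figure as printed in the article summary.
as_printed = 100 ** 18

print(len(str(sequences)))   # number of decimal digits in 18^100
print(len(str(as_printed)))  # number of decimal digits in 100^18
print(2 ** 100)              # the left/right example: two choices, 100 times
```

18^100 has 126 digits, while 100^18 is just 10^36, so the two figures differ by roughly 89 orders of magnitude.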

Smoke and mirrors by Dan+East · 2018-05-31 15:55 · Score: 3, Informative

So random control inputs are fired over and over until the image of the gameplay the AI is doing very closely matches that of an expert human player. There really is no intelligence to this at all. If the slightest bit of randomness occurs in the game then it will fail, because that would not match the game the human played originally.

"the agent was able to exceed average human players"
Uh, that's because the "average human" would suck at these kinds of games, and the AI has merely copied the exact gameplay of an expert human who played it originally. So an expert human at a given game exceeds the average human. I hope it didn't take too much research money to come to that conclusion.

--
Better known as 318230.

I'll start worrying when... by Grog6 · 2018-05-31 16:23 · Score: 2

They train one to get all 222 points in Leisure Suit Larry I.
Best I ever got was 221, lol.

--
Truth isn't Truth - Giuliani

Re:Montezuma's Revenge Map by religionofpeas · 2018-05-31 21:28 · Score: 5, Insightful

"That's the _whole_ point of intelligence --- to make an intelligent decision!"
The problem is that intelligence operates on previously recognized patterns. A human playing the game already knows the concept of a map, and a pyramid, and understands locked doors that can be opened with a key. The AI starts with absolutely zero knowledge.