DeepMind Used YouTube Videos To Train Game-Beating Atari Bot (theregister.co.uk)
Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, and thus defeating the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences.
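A quick check of that arithmetic, as a sketch only: it assumes the agent picks from the full Atari action set of 18 actions at every step, which the summary does not state. Under that assumption, a 100-step sequence can be chosen in 18^100 ways, a far larger number than the 100^18 quoted above. The function name `count_action_sequences` is made up for illustration.

```python
def count_action_sequences(num_actions: int, num_steps: int) -> int:
    """Number of distinct action sequences when one of `num_actions`
    choices is made independently at each of `num_steps` steps."""
    return num_actions ** num_steps


# Assumption: 18 available actions (full Atari joystick set), 100 steps.
sequences = count_action_sequences(num_actions=18, num_steps=100)
quoted_figure = 100 ** 18  # the number as written in the summary

# Compare orders of magnitude by counting decimal digits.
print(len(str(sequences)))      # digits in 18^100
print(len(str(quoted_figure)))  # digits in 100^18
```

Either way the point of the summary stands: blind trial and error is hopeless at this scale, so the agent needs some intermediate signal.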
To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC), and cross-modal temporal distance classification (CMC). The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame is close to or matches the one in the human's video, it is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, ApeX, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
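The checkpoint idea described above can be sketched roughly as follows. This is not DeepMind's code: the frame embeddings are stand-in lists of floats, and the cosine similarity measure, the `MATCH_THRESHOLD` value, the reward size of 1.0, and all names are assumptions made for illustration. The stride of 16 comes from the summary; here it is applied to the human demonstration, which is one plausible reading.

```python
from math import sqrt

CHECKPOINT_STRIDE = 16  # "every sixteenth frame", per the summary
MATCH_THRESHOLD = 0.9   # hypothetical similarity cut-off


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_checkpoints(demo_embeddings, stride=CHECKPOINT_STRIDE):
    """Keep every `stride`-th embedded frame of the human video."""
    return demo_embeddings[::stride]


class CheckpointReward:
    """Pays out once per checkpoint, in order, when the agent's frame matches."""

    def __init__(self, checkpoints, threshold=MATCH_THRESHOLD):
        self.checkpoints = checkpoints
        self.threshold = threshold
        self.next_index = 0  # index of the next checkpoint to reach

    def __call__(self, agent_embedding):
        if self.next_index >= len(self.checkpoints):
            return 0.0  # all checkpoints already collected
        target = self.checkpoints[self.next_index]
        if cosine(agent_embedding, target) >= self.threshold:
            self.next_index += 1
            return 1.0
        return 0.0


# Toy demonstration: 16 frames in one "room", then 16 in another.
demo = [[1.0, 0.0]] * 16 + [[0.0, 1.0]] * 16
reward_fn = CheckpointReward(make_checkpoints(demo))
rewards = [reward_fn(frame) for frame in ([0.0, 1.0], [1.0, 0.1], [0.0, 2.0])]
print(rewards)
```

Because the match is made in an embedding space with a tolerance, rather than pixel for pixel, the agent is rewarded for reaching similar game states, not for reproducing the human's exact inputs.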
So random control inputs are fired over and over until the image of the gameplay the AI is doing very closely matches that of an expert human player. There really is no intelligence to this at all. If the slightest bit of randomness occurs in the game then it will fail, because that would not match the game the human played originally.
the agent was able to exceed average human players
Uh, that's because the "average human" would suck at these kinds of games, and the AI has merely copied the exact gameplay of an expert human who played it originally. So an expert human at a given game exceeds the average human. I hope it didn't take too much research money to come to that conclusion.
Better known as 318230.