DeepMind Used YouTube Videos To Train Game-Beating Atari Bot (theregister.co.uk)
Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, and thus defeating the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences.
To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC) and cross-modal temporal distance classification (CDC). The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame is close to or matches the one in the human's video, it is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, ApeX, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
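The checkpoint reward described in the summary can be sketched roughly as follows. This is only an illustration of the idea as the summary states it, not DeepMind's code: the class name `CheckpointReward`, the use of cosine similarity, and the `threshold` and `bonus` values are all assumptions, and short lists of floats stand in for whatever learned frame features the real system compares.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CheckpointReward:
    """Hypothetical imitation reward: every `every`-th agent frame is
    compared with the next unreached checkpoint from a human video."""

    def __init__(self, checkpoints, every=16, threshold=0.9, bonus=1.0):
        self.checkpoints = checkpoints  # feature vectors from the human video
        self.every = every              # sampling interval from the summary
        self.threshold = threshold      # assumed similarity cut-off
        self.bonus = bonus              # assumed reward per checkpoint
        self.steps = 0
        self.next_index = 0

    def __call__(self, agent_frame):
        self.steps += 1
        # Only every sixteenth agent frame is taken as a snapshot.
        if self.steps % self.every != 0:
            return 0.0
        # No reward once every checkpoint has been reached.
        if self.next_index >= len(self.checkpoints):
            return 0.0
        target = self.checkpoints[self.next_index]
        if cosine_similarity(agent_frame, target) >= self.threshold:
            self.next_index += 1  # move on to the next checkpoint
            return self.bonus
        return 0.0


if __name__ == "__main__":
    reward = CheckpointReward([[1.0, 0.0], [0.0, 1.0]])
    total = sum(reward([1.0, 0.0]) for _ in range(16))
    print("reward after 16 matching frames:", total)
```

Because checkpoints are consumed in order, the agent is only paid for following the human's route in sequence, which is the "similar sequence of moves" the summary mentions.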
Beat it is the only way to make it to what you want, what you really want.
So, essentially, little more than mimicry to bootstrap what any human who had never seen a video game would do intuitively and without instruction...
Winter is coming: https://blog.piekniewski.info/2018/05/28/ai-winter-is-well-on-its-way/
Captcha: outvote
So if an algorithm "watches" a video of a good player, the algorithm's play approaches the level of the human. Hooray?
Players already have a term for people looking up clues on YouTube. Cheaters.
The whole point of many games is to solve the puzzles.
If you just google for a video to find the answers, that's still cheating.
Even if the A.I. has to brute force an exhaustive approach to solve a game, so be it. Or develop better methods.
What do you mean "not obvious where to go"? There are only 9 levels and the map is in the shape of a pyramid -- sections of the pyramid are blocked off for that level.
True, it isn't deterministic, but cry me a river. That's the _whole_ point of intelligence --- to make an intelligent decision!
I'm pretty sure they must mean 18^100 possible sequences.
That is, if there are 18 possibilities for each step, then 100 steps would yield 18^100 possibilities.
Similar to how if there are two choices to make (go left or go right) and you make 100 of them, then there would be 2^100 possibilities.
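A quick way to check the arithmetic, as a sketch that takes the 18 actions per step and 100 steps from the comment above as given (the variable names are just for illustration):

```python
# Count action sequences: one choice out of `actions` at each of `steps` steps.
actions = 18   # possibilities per step, as assumed in the comment above
steps = 100    # steps needed to reach the first key, per the summary

sequences = actions ** steps   # 18^100, the figure the comment argues for
swapped = steps ** actions     # 100^18, the figure printed in the summary
binary = 2 ** steps            # the left/right analogy: 2^100

print("18^100 has", len(str(sequences)), "digits")
print("100^18 has", len(str(swapped)), "digits")
print("2^100  has", len(str(binary)), "digits")
```

The two figures are nowhere near each other: 100^18 is exactly 10^36, while 18^100 is far larger.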
Can they beat Zork yet?
-Darkshadow (There was a thing called Heaven; but all the same they used to drink enormous quantities of alcohol.)
So random control inputs are fired over and over until the image of the gameplay the AI is doing very closely matches that of an expert human player. There really is no intelligence to this at all. If the slightest bit of randomness occurs in the game then it will fail, because that would not match the game the human played originally.
the agent was able to exceed average human players
Uh, that's because the "average human" would suck at these kinds of games, and the AI has merely copied the exact gameplay of an expert human who played it originally. So an expert human at a given game exceeds the average human. I hope it didn't take too much research money to come to that conclusion.
Better known as 318230.
They train one to get all 222 points in Leisure Suit Larry I.
Best I ever got was 221, lol.
Truth isn't Truth - Giuliani
Your motha!
I, for one, welcome our AI Video Gaming Overlords.
Such an Informative post. Especially for gaming lovers.
I remember this game, might've even made a map for it. The AI computer needs pen and paper, perhaps a joint, lots of hours, and a mouth so it can scream at the screen when it dies. That helped me learn it. Also, a fresh Atari joystick for when it breaks the first one in frustration.
Montezuma's Revenge https://www.youtube.com/watch?v=_zbg9rs5QZY
See how many movies actually are about Atari games. Now look at the amount of movies on e.g. youporn, redtube and many similar sites and you see where I am going with this.
Don't fight for your country, if your country does not fight for you.
Well, depends on the game.
Puzzles, certainly, but no amount of videos can make you automatically finish some games unless you are up to the required skills/reflexes: fighting, arcades, even Tetris!
And is watching your next opponent's football games cheating, then?
I always wondered what the hell the devs were thinking when they decided to name their game after diarrhea.
...computer needs pen and paper...
That's one way to implement optical storage.
He's getting rather old, but he's a good mouse.
no amount of videos can make you automatically finish some games unless you are up to the required skills/reflexes
AI skills impress me. Here it sounds like the primary skill is mimicry, but others exist. AI reflexes should be up to almost any task.
He's getting rather old, but he's a good mouse.
and the program would be lost again. Not the human, who would adapt. The program locks in the solution. And this is the problem with AI. Wherever you use it in lieu of humans you freeze the level of expertise.
E Proelio Veritas.
You are one of the 10,000 who understood that the soup of classifiers and feedback voodoo is called AI today.
Once again we see millennials failing to grasp a concept. Say it with me: *The method of delivery or the process is not the end result.* it isn't 'being trained' or 'learning' anymore than software that has been programmed by hand, and if programming is easier and faster, why wouldn't you just do that? Oh, that's right. Millennial engineers are lazy. So are a lot of other people. Funny that the 'automation revolution' (which will probably never materialize, as this is just technological iteration, not even a leap forward, and past advances haven't made much of a dent. We don't think this stuff is half as impressive or cool because unlike your baby ass, we've seen it before) has turned out to be about laziness and ineptitude rather than innovation or progress. Whatever, but stop wasting our time with your science fair, thank you.
I apologize to /. for the grammar mistakes in my post. I wish /. would implement at least a 'one time edit' ability. :/
This sounds like a step back for DeepMind. The whole point of their high profile project AlphaZero was to learn to play games (go, chess, shogi) without any mimicry of human players, and it proved that such an approach could be successful. It soundly defeated the previous go world champ, AlphaGo Master, which was trained with top-level human games and self-play. AlphaZero wasn't taught or shown any patterns, rather it discovered them through self-play and random moves.
The funny thing is, after AlphaZero, DeepMind was like, "We're not doing this just to beat people at games, we want to apply this to solving serious problems in energy, healthcare, etc." (paraphrase, not a direct quote)
And now they're revealing how they're playing old Atari games...
I'm pretty sure I could train my dog to do the same. I don't get how this is so special?
The later levels in Montezuma's Revenge were the same as the earlier ones, except they were blacked out. You have to have already memorized where to go.
Coder's Stone: The programming language quick ref for iPad
Using the best human runs as a basis for training the AI seems like cheating. The AI should learn without preconceived knowledge.