Slashdot Mirror


AI Cheats at Old Atari Games By Finding Unknown Bugs in the Code (theverge.com)

An anonymous reader shares a report: AI research and video games are a match made in heaven. Researchers get a ready-made virtual environment with predefined goals they can control completely, and the AI agent gets to romp around without doing any damage. Sometimes, though, they do break things. Case in point is a paper published this week by a trio of machine learning researchers from the University of Freiburg in Germany. They were exploring a particular method of teaching AI agents to navigate video games (in this case, desktop ports of old Atari titles from the 1980s) when they discovered something odd. The software they were testing discovered a bug in the port of the retro video game Q*bert that allowed it to rack up near-infinite points. As the trio describe in the paper, published on pre-print server arXiv, the agent was learning how to play Q*bert when it discovered an "interesting solution." Normally, in Q*bert, players jump from cube to cube, with this action changing the platforms' colors. Change all the colors (and dispatch some enemies), and you're rewarded with points and sent to the next level. The AI found a better way, though: "First, it completes the first level and then starts to jump from platform to platform in what seems to be a random manner. For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points (close to 1 million for our episode time limit)."

3 of 45 comments

  1. Cheating by Anonymous Coward · Score: 4, Interesting

    For those who didn't read the Reddit thread hours ago: the bug only exists in the bad port they used for the AI, not the real version.
    It's not like the SMW-bug where the real deal allows you to inject arbitrary code with clever positioning.

  2. Not cheating ... by Anonymous Coward · Score: 2, Interesting

    That's not cheating, that's winning according to the rules as discovered by the AI.

    In other words, it tried pretty much everything, discovered a corner case nobody knew of, and concluded if it's possible, it's a valid option.

    Which, while a silly example, tells us that if an AI determines "kill all the humans" is possible and achieves its goals, it will.

    Machine learning just finds optimal solutions. If the inputs allow for it, those 'optimal' solutions could include some outcomes we don't like ... and since nobody really knows the decision process and what rules it's made for itself, you'll simply never know until it's too late.

    Sooner or later, one of these things is going to do something exceedingly dangerous and costly, and nobody will see it coming or know why it happened.

    I fully expect any AI applied to the stock market would eventually conclude stock manipulation would maximise return, and start breaking the law.

    This is interesting, but more so because it points out some terrifying possibilities. Which is why as much work needs to be put into constraining these things as building them.

  3. Just found out something similar yesterday by Anonymous Coward · Score: 2, Interesting

    I was playing a retro cabinet game called Rampage that was eventually ported to the PS3. You play a giant monster and try to destroy all the buildings before the humans kill you. Yesterday, one building that I destroyed wasn't tallied as destroyed, so the level never completed. But without buildings to hide in, the humans were less frequent and were sitting ducks. They kept sending commandos to try to dynamite this collapsed building, and I kept eating them, which also heals your monster. And I got the game into a boring, but highly manageable, infinite state. I could rack up points by punching helicopters, and stay alive by healing off the commandos.