Slashdot Mirror


DeepMind's AI Agents Exceed 'Human-Level' Gameplay In Quake III (theverge.com)

An anonymous reader quotes a report from The Verge: AI agents continue to rack up wins in the video game world. Last week, OpenAI's bots were playing Dota 2; this week, it's Quake III, with a team of researchers from Google's DeepMind subsidiary successfully training agents that can beat humans at a game of capture the flag. DeepMind's researchers used a method of AI training that's also becoming standard: reinforcement learning, which is basically training by trial and error at a huge scale. Agents are given no instructions on how to play the game, but simply compete against themselves until they work out the strategies needed to win. Usually this means one version of the AI agent playing against an identical clone. DeepMind gave extra depth to this formula by training a whole cohort of 30 agents to introduce a "diversity" of play styles. How many games does it take to train an AI this way? Nearly half a million, each lasting five minutes. DeepMind's agents not only learned the basic rules of capture the flag, but strategies like guarding your own flag, camping at your opponent's base, and following teammates around so you can gang up on the enemy. "[T]he bot-only teams were most successful, with a 74 percent win probability," reports The Verge. "This compared to 43 percent probability for average human players, and 52 percent probability for strong human players. So: clearly the AI agents are the better players."

3 of 137 comments

  1. Re:Bad Challenge by Djoulihen · · Score: 4, Informative

    From TFA: "DeepMind’s agents also didn’t have access to raw numerical data about the game — feeds of numbers that represents information like the distance between opponents and health bars. Instead, they learned to play just by looking at the visual input from the screen, the same as a human"

    You've got your very least, but I'm pretty sure you'll find another way to turn this into just "shite" work.

  2. Stripped down by thePsychologist · · Score: 4, Informative

    While interesting and promising, it's worth noting that the game they were playing was not the "real" Quake 3 arena with all the weapons but a highly stripped down version with one weapon, no power-ups, and brightly-coloured walls to help the AI perceive the level design.

    --
    "What lies behind us, and what lies before us are tiny matters compared to what lies within us." Ralph Waldo Emerson
  3. Re: Bad Challenge by martyros · · Score: 4, Informative

    But that's a skill-based game, as opposed to strategy or anything needing intelligence. "Skill" as in reaction time to seeing an opponent and successfully moving and clicking the mouse on their head.

    Strangely enough, they already thought of that:

    First, we noticed that the agents had very fast reaction times and were very accurate taggers, which could explain their performance. However, by artificially reducing this accuracy and reaction time we saw that this was only one factor in their success. ...Even with human-comparable accuracy and reaction time the performance of our agents is higher than that of humans.

    Both the summary and the Verge article seem to have missed the point of this development -- an improvement to the agent design scheme.

    Last year, after smashing both Go and chess with their self-play-from-zero strategy, they tried the same thing with StarCraft. And they lost spectacularly -- even after millions of games, their self-trained DeepMind agents were unable to beat even the most simplistic "scripted" StarCraft AI -- the ones designed for n00b humans to beat up on. They discovered that while the self-play agents were able to eventually figure out activities like "harvest minerals", they were unable to put those together into higher-level activities like building an army and winning a game.

    One of the key refinements they introduce in this paper is to allow the agents to evolve their own internal "rewards", which are sub-steps towards winning. These rewards cover things like killing an opponent, capturing a flag, recapturing their own flag, avoiding being killed, and so on. The programmers built in the *possibility* of such rewards, but let the learning algorithm decide which events to reward and how much each one was worth.
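
    To make that concrete, here is a toy sketch of the idea as I understand it, not DeepMind's actual code. The event names, the `fitness` function and the copy-and-perturb step are all my own assumptions for illustration: each agent carries its own weight per game event, trains on the weighted sum, and an outer loop evolves the weights towards whatever wins more.

```python
import random

# Event names are taken from the comment above; the exact set is illustrative.
EVENTS = ["tagged_opponent", "captured_flag", "returned_own_flag", "was_tagged"]


def internal_reward(weights, event_counts):
    """Weighted sum of game events: the agent's own training signal."""
    return sum(weights[e] * event_counts.get(e, 0) for e in EVENTS)


def evolve(population, fitness, rng, sigma=0.1):
    """One evolution step over a list of weight dicts.

    The weakest agent copies the strongest agent's weights and perturbs
    them with Gaussian noise. `fitness` is a hypothetical stand-in for
    match results (for example, win rate against the rest of the cohort).
    """
    ranked = sorted(population, key=fitness)
    worst, best = ranked[0], ranked[-1]
    for e in EVENTS:
        worst[e] = best[e] + rng.gauss(0.0, sigma)  # mutates the dict in place
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    # A small cohort with random starting weights (assumed initialisation).
    cohort = [{e: rng.uniform(-1.0, 1.0) for e in EVENTS} for _ in range(4)]
    # Placeholder fitness: pretend agents that value flag captures win more.
    evolve(cohort, fitness=lambda w: w["captured_flag"], rng=rng)
    print(internal_reward(cohort[0], {"captured_flag": 1, "was_tagged": 2}))
```

    The point of the split is that the programmers only fix the list of events; the numbers attached to them are left to the outer loop.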

    They call this architecture 'FTW'. They then ran their vanilla "self-play from nothing" bots again and found that, just as in StarCraft, those bots never made much progress; the new bots, with their self-made internal rewards, were able to consistently beat strong humans, even after having their reaction time and visual accuracy reduced below that of measured humans.

    --

    TCP: Why the Internet is full of SYN.