Uber has Cracked Two Classic '80s Video Games by Giving an AI Algorithm a New Type of Memory (technologyreview.com)
An algorithm that remembers previous explorations in Montezuma's Revenge and Pitfall! could make computers and robots better at learning how to succeed in the real world. From a report: A new kind of machine-learning algorithm just mastered a couple of throwback video games that have proved to be a big headache for AI. Those following along will know that AI algorithms have bested the world's top human players at the ancient, elegant strategy game Go, one of the most difficult games imaginable. But two pixelated classics from the era of 8-bit computer games -- Montezuma's Revenge and Pitfall! -- have stymied AI researchers. There's a reason for this seeming contradiction. Although deceptively simple, both Montezuma's Revenge and Pitfall! have been immune to mastery via reinforcement learning, a technique that's otherwise adept at learning to conquer video games.
DeepMind, a subsidiary of Alphabet focused on artificial intelligence, famously used it to develop algorithms capable of learning how to play several classic video games at an expert level. Reinforcement-learning algorithms mesh well with most games, because they tweak their behavior in response to positive feedback -- the score going up. The success of the approach has generated hope that AI algorithms could teach themselves to do all sorts of useful things that are currently impossible for machines. The problem with both Montezuma's Revenge and Pitfall! is that there are few reliable reward signals. Both titles involve typical scenarios: protagonists explore blockish worlds filled with deadly creatures and traps. But in each case, lots of behaviors that are necessary to advance within the game do not help increase the score until much later. Ordinary reinforcement-learning algorithms usually fail to get out of the first room in Montezuma's Revenge, and in Pitfall! they score exactly zero.
So researchers have discovered that short term gains can come at the expense of long term success? *gasp* Say it isn't so!
Actually, that's been a known problem for a long time. You end up at a local maximum on the "score" function and now you have no possible way to improve, so reinforcement learning just keeps you there, even though you might do substantially better if you actually took a decrease in the "score" and ended up on the path to some other maximum on the function.
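To make the local-maximum point concrete, here is a minimal sketch in Python. The score function and the greedy climber are invented for illustration (they are not from the article): a greedy learner that only ever accepts a higher score stops on the small peak and never reaches the big one.

```python
def score(x):
    # Hypothetical score with two peaks: a small one at x=2 (height 3)
    # and a much larger one at x=10 (height 10), with a dip in between.
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 10))


def hill_climb(score_fn, start, max_steps=100):
    """Greedy climber: move to the best neighbor, stop when none is better."""
    x = start
    for _ in range(max_steps):
        best = max((x - 1, x, x + 1), key=score_fn)
        if score_fn(best) <= score_fn(x):
            break  # no neighbor improves the score: stuck on a peak
        x = best
    return x


print(hill_climb(score, 0))  # starts near the small peak and stays there
print(hill_climb(score, 6))  # starts past the dip and reaches the big peak
```

Starting at 0 the climber halts at x=2, because every step toward x=10 first lowers the score.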
(Oh, and "Fr1st ps0t!", especially if it isn't.)
If it works in theory, try something else in practice.
These games aren't hard for a computer to play. You could write a fairly straightforward algorithm that would play them both well.
What's hard is to develop a very general learning algorithm - one that doesn't know about the task - that just happens to pass the test of being able to learn these games.
The approach here seems "cheaty". That's not to say their technique is useless (and maybe it's more generalizable than I'm giving it credit for) - but from the vague overview of the article it seems like they're effectively juicing their performance.
Let's not stir that bag of worms...
The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remember where they have been before, and will return to a particular area or task later on to see if it might help provide better overall results. The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount.
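A rough sketch of that remember-and-return idea, in Python. This is a toy on a one-dimensional corridor, not Uber's implementation: the function names, the corridor, the random cell selection and all the parameters are assumptions made up for illustration.

```python
import random


def replay(path, n_cells):
    """Replay a saved action sequence from the start cell; return the end cell."""
    pos = 0
    for action in path:
        pos = min(max(pos + action, 0), n_cells - 1)
    return pos


def go_explore(n_cells=30, iterations=200, explore_steps=5, seed=0):
    rng = random.Random(seed)
    # The "memory": each visited cell maps to the shortest known
    # action sequence that reaches it from the start.
    archive = {0: []}
    for _ in range(iterations):
        # 1. Pick a remembered cell (uniformly here; an assumption).
        cell = rng.choice(list(archive))
        # 2. Return to it by restoring its saved path.
        pos, path = cell, list(archive[cell])
        # 3. Explore from there with random actions.
        for _step in range(explore_steps):
            action = rng.choice((-1, 1))
            pos = min(max(pos + action, 0), n_cells - 1)
            path.append(action)
            # 4. Remember new cells, or shorter routes to known ones.
            if pos not in archive or len(path) < len(archive[pos]):
                archive[pos] = list(path)
    return archive


archive = go_explore()
print(len(archive), "cells remembered; farthest:", max(archive))
```

The point of the archive is that exploration restarts from the frontier instead of from the first room every time.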
"First they came for the slanderers and i said nothing."
So what does this (an article about Alphabet's deep learning) have to do with Uber???
As an aside, have we even developed an accepted definition for what properly qualifies as "AI"? Recently I was being flogged some software whose selling point was an "AI engine", which turned out to be little more than a previous version with a little bit of Bayesian statistics bolted on.
They mean "cracked" in a different sense. For example you can crack an egg, or a tech journalist can be addicted to crack.
“Common sense is not so common.” — Voltaire
> protagonists explore blockish worlds filled with deadly creatures and traps.
No, they ARE block rooms -- each room IS exactly 40x25 tiles (on the Apple ][ it only displays 40x24 tiles.) The tiles just happen to be a) animated, and b) mega-tiles such as ladders which are three tiles wide.
Also, here is a map of the world --- It makes a pyramid shape, go figure!
Impressive that it could fit 99 rooms in less than 32 KB!
Well why not just make building models of the environment and reducing surprise at observation compared to expectation the optimization goal?
Isn't that the point of "free energy principle" thinking?
How do you know I didn't read it prior to commenting? That's amazing! ...and yet I did read it before I commented. It was me saying it's hard. I didn't say they said it. At least you're not lacking for assumptions in your gratuitous reply.
Why wouldn't you assign points for a strategy not resulting in something negative (ie staying alive points)? Seems like an easy tweak.
love is just extroverted narcissism
Uber already has one death. How many more before they get a safe auto-drive AI?
After falling off a drawbridge, next time we can slam on the gas or add map data that there is one.
After driving off a pier, next time we can add "wait for ferry" to the map data.
I couldn't find this link anywhere in the actual article Slashdot linked to or the summary - the blog post laying out what Go-Explore is in more detail:
http://eng.uber.com/go-explore/
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I remember managing to finish all 9 levels of the game back in the day in a single sitting. I do recall that you have to map out your path to the bottom of the pyramid, so essentially there's a lot of exploration involved... however, past level 3 the map remains largely unchanged and instead the lower levels go dark, requiring a torch. At this point there's rote memorization of the platforms that needs to happen because you can't see them before finding the torch ... especially one of the rooms that has the torch, because it is a nightmare.
I figure that as long as the AI is good at exploring and mapping, it will have no issues remembering when and where to jump, even as it can't see the floor in the later levels.
READY.
PRINT ""+-0
... is unstructured. When you put an 'algorithm' in a game it has no awareness to make discoveries about goals and motivations, take the idea of them mentioning one of their algorithms not being able to get out of the first room...
Now think of what that means: your AI has no sense of when to move on because it doesn't realize there's nothing interesting going on in the space it finds itself in. When goals don't exist or are unstructured you basically have to invent goals, aka come to the realization you're wasting time and resources in a space that isn't interesting. Instead of making an algorithm to go through a game, they should basically automate the navigation and come up with algorithms that can come up with goals on their own when there is no stated goal, aka it should be able to separate areas of interest from areas of disinterest and then from there use those to come up with goals.
Many of the algorithms when I read about them don't sound very interesting because the problem with the real world is that tasks are unstructured and you usually have to do a lot of "legwork" first before you can even infer a goal or come up with a task.
AKA there really need to be algorithms that make sense of an environment when that environment has no particular 'end state'. Imagine an open world game where there are tasks and activities to do, but there's no 'finish' line. To take an example, say you've got a fishing minigame, you can drive around town, etc. The AI should come up with a way to discover or generate interest and goals for itself when the world is simply unstructured.
This reminds me in some ways of the chart parsers I was playing around with in university for a paper on natural language processing. I think these days they are mostly used in the context of code compilation, but I must admit I don't know much about modern natural language processing tools, so I don't know if they're still a thing there.
Never trust a man in a blue trench coat, Never drive a car when you're dead
The title mentions novel AI techniques by Uber involving a new type of memory. Cool!!
The summary does not mention Uber or this new memory. What exactly is the news?
Reinforcement learning basically exists to solve problems that have the properties you describe. Researchers in the field have been aware of them for a long time. Many modern reinforcement learning algorithms basically use artificial neural networks to estimate the trickier bits in a Q-learning framework. Q-learning was introduced in 1989 and the basic theory developed in the early nineties.
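For anyone who hasn't seen it, the core of tabular Q-learning is a one-line update. Below is a minimal sketch in Python; the function name, the dict-based table and the default alpha/gamma values are my own choices for illustration. The neural-network variants replace the table with a learned estimator.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    `q` is a dict keyed by (state, action); missing entries count as 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# A reward that only shows up later still propagates backwards:
q = {(1, "right"): 2.0}  # value already learned for the next state
print(q_update(q, 0, "right", 0.0, 1, ["left", "right"]))
```

Even with zero immediate reward, state 0 picks up value from state 1 through the discounted max term, which is how delayed rewards are supposed to reach earlier actions.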
From TFA: "The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount." This defeats the whole purpose of autonomous independent exploration.
Games like tic-tac-toe are solved, and I believe Checkers/Draughts is solved as well. Maybe Chess and Go will be solved in the near future.
Nethack may remain the one unsolved game for the foreseeable future.
“Common sense is not so common.” — Voltaire