Uber has Cracked Two Classic '80s Video Games by Giving an AI Algorithm a New Type of Memory (technologyreview.com)
An algorithm that remembers previous explorations in Montezuma's Revenge and Pitfall! could make computers and robots better at learning how to succeed in the real world. From a report: A new kind of machine-learning algorithm just mastered a couple of throwback video games that have proved to be a big headache for AI. Those following along will know that AI algorithms have bested the world's top human players at the ancient, elegant strategy game Go, one of the most difficult games imaginable. But two pixelated classics from the era of 8-bit computer games -- Montezuma's Revenge and Pitfall! -- have stymied AI researchers. There's a reason for this seeming contradiction. Although deceptively simple, both Montezuma's Revenge and Pitfall! have been immune to mastery via reinforcement learning, a technique that's otherwise adept at learning to conquer video games.
DeepMind, a subsidiary of Alphabet focused on artificial intelligence, famously used it to develop algorithms capable of learning how to play several classic video games at an expert level. Reinforcement-learning algorithms mesh well with most games, because they tweak their behavior in response to positive feedback -- the score going up. The success of the approach has generated hope that AI algorithms could teach themselves to do all sorts of useful things that are currently impossible for machines. The problem with both Montezuma's Revenge and Pitfall! is that there are few reliable reward signals. Both titles involve typical scenarios: protagonists explore blockish worlds filled with deadly creatures and traps. But in each case, lots of behaviors that are necessary to advance within the game do not help increase the score until much later. Ordinary reinforcement-learning algorithms usually fail to get out of the first room in Montezuma's Revenge, and in Pitfall! they score exactly zero.
The key in any of these kinds of games is not to make assumptions about anything one hasn't already seen
So researchers have discovered that short term gains can come at the expense of long term success? *gasp* Say it isn't so!
Actually, that's been a known problem for a long time. You end up at a local maximum on the "score" function and now have no way to improve, so reinforcement learning just keeps you there, even though you might do substantially better if you accepted a decrease in the "score" and ended up on the path to some other maximum of the function.
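To make the local-maximum point concrete, here is a minimal sketch of a greedy climber on a made-up one-dimensional "score" function. The landscape, the function names and the integer positions are all assumptions chosen for illustration; nothing here comes from the article.

```python
def score(x):
    # Hypothetical landscape: a small peak at x=2 (height 3)
    # and a taller peak at x=8 (height 10), with a dip in between.
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8))


def hill_climb(score_fn, start, steps=100):
    """Greedy ascent on integer positions: move only to a strictly better neighbour."""
    x = start
    for _ in range(steps):
        best = max((x - 1, x + 1), key=score_fn)
        if score_fn(best) <= score_fn(x):
            return x  # no neighbour improves: stuck at a (possibly local) maximum
        x = best
    return x


if __name__ == "__main__":
    print(hill_climb(score, 0))  # starts near the small peak
    print(hill_climb(score, 5))  # starts on the slope of the tall peak
```

Started at 0, the climber stops on the small peak because every neighbouring move lowers the score, even though the taller peak is only a few steps past the dip.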
(Oh, and "Fr1st ps0t!", especially if it isn't.)
If it works in theory, try something else in practice.
These games aren't hard for a computer to play. You could write a fairly straightforward algorithm that would play them both well.
What's hard is to develop a very general learning algorithm - one that doesn't know about the task - that just happens to pass the test of being able to learn these games.
The approach here seems "cheaty". That's not to say their technique is useless (and maybe it's more generalizable than I'm giving it credit for) - but from the vague overview of the article it seems like they're effectively juicing their performance.
Let's not stir that bag of worms...
I didn't think the 2600 had any copy protection *to* crack, never mind requiring AI to crack anything that might be there. WEIRD.
The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remember where they have been before, and will return to a particular area or task later on to see if it might help provide better overall results. The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount.
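The "remember where you have been and return there later" idea can be sketched on a toy problem. This is only an illustration under stated assumptions, not Uber's implementation: the one-dimensional corridor, the `GOAL` position, uniform cell selection and random exploration are all hypothetical choices made to keep the example short.

```python
import random

GOAL = 12  # hypothetical: the only reward sits at position 12 of a 1-D corridor


def step(pos, action):
    """Toy deterministic environment: action is -1 or +1, reward only at GOAL."""
    pos = max(0, pos + action)
    return pos, (1 if pos == GOAL else 0)


def explore_with_archive(iterations=5000, explore_steps=5, seed=0):
    rng = random.Random(seed)
    # The archive maps each visited position to the shortest action list found so far.
    archive = {0: []}
    for _ in range(iterations):
        cell = rng.choice(list(archive))  # pick a remembered place
        actions = list(archive[cell])
        pos = 0
        for a in actions:                 # "return" by replaying the stored actions
            pos, _unused = step(pos, a)
        for _ in range(explore_steps):    # then explore randomly from there
            a = rng.choice((-1, 1))
            pos, reward = step(pos, a)
            actions.append(a)
            if pos not in archive or len(actions) < len(archive[pos]):
                archive[pos] = list(actions)
            if reward:
                return archive[pos]       # action sequence that reaches the reward
    return None
```

Because progress is stored in the archive rather than rediscovered from the start each time, the frontier keeps moving outward even though no reward is seen until the very end.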
"First they came for the slanderers and i said nothing."
So what does this (an article about Alphabet's deep learning) have to do with Uber???
Now all they need to do is make an AI that can figure out how to make them profitable!
As an aside, have we even developed an accepted definition for what properly qualifies as "AI"? Recently I was being flogged some software whose selling point was an "AI engine", which turned out to be little more than a previous version with a little bit of Bayesian statistics bolted on.
I remember chuckling as an eight year old at the name of this video game when it came out. 30+ years later, I still can't believe they named it what they did.
For those unfamiliar with the American idiom: https://en.wikipedia.org/wiki/Traveler%27s_diarrhea
If it is the Activision version being referred to, that game was so repetitive we used to have contests about how far you could get playing with your eyes closed.
> protagonists explore blockish worlds filled with deadly creatures and traps.
No, they ARE block rooms -- each room IS exactly 40x25 tiles (on the Apple ][ it only displays 40x24 tiles.) The tiles just happen to be a) animated, and b) mega-tiles such as ladders which are three tiles wide.
Also, here is a map of the world --- It makes a pyramid shape, go figure!
Impressive that it could fit in 99 rooms in less than 32 KB!
n/t
Once again, this seems to be a case of "AI" research re-discovering some basic math: if the function has discontinuities or is otherwise non-differentiable ("behaviors that are necessary to advance within the game do not help increase the score until much later.") then its optimization is hard or dependent on fortunate starting conditions.
Except they didn't say it was hard. Since you are too smart to even read before commenting, here's a picture, which shows how much better they've done than all other "AI" and a human.
Well why not just make building models of the environment and reducing surprise at observation compared to expectation the optimization goal?
Isn't that the point of "free energy principle" thinking?
How do you know I didn't read it prior to commenting? That's amazing! ...and yet I did read it before I commented. It was me saying it's hard. I didn't say they said it. At least you're not lacking for assumptions in your gratuitous reply.
Technically it was hacked, not cracked.
Why wouldn't you assign points for a strategy not resulting in something negative (ie staying alive points)? Seems like an easy tweak.
love is just extroverted narcissism
Uber already has one death; how many more before they get a safe auto-drive AI?
Shit I loved that game. What an awesome game it was... swinging from vines and jumping on croc heads. Hell yes!
After falling off a drawbridge, next time we can slam on the gas or add map data that there is one.
After driving off a pier, next time we can add "wait for ferry" to the map data.
I couldn't find this link anywhere in the actual article Slashdot linked to or the summary - the blog post laying out what Go-Explore is in more detail:
http://eng.uber.com/go-explore/
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I remember managing to finish all 9 levels of the game back in the day in a single sitting. I do recall that you have to map out your path to the bottom of the pyramid, so essentially there's a lot of exploration involved... however, past level 3 the map remains largely unchanged and instead the lower levels go dark, requiring a torch. At this point there's rote memorization of the platforms that needs to happen because you can't see them before finding the torch ... especially one of the rooms that has the torch, because it is a nightmare.
I figure that as long as the AI is good at exploring and mapping, it will have no issues remembering when and where to jump, even as it can't see the floor in the later levels.
READY.
PRINT ""+-0
... is unstructured. When you put an 'algorithm' in a game it has no awareness to make discoveries about goals and motivations, take the idea of them mentioning one of their algorithms not being able to get out of the first room...
Now think of what that means: your AI has no sense of when to move on because it doesn't realize there's nothing interesting going on in the space it finds itself in. When goals don't exist or are unstructured you basically have to invent goals, aka come to the realization you're wasting time and resources in a space that isn't interesting. Instead of making an algorithm to go through a game, they should basically automate the navigation and come up with algorithms that can come up with goals on their own when there is no stated goal, aka they should be able to separate areas of interest from areas of disinterest and then use those to come up with goals.
Many of the algorithms when I read about them don't sound very interesting because the problem with the real world is that tasks are unstructured and you usually have to do a lot of "legwork" first before you can even infer a goal or come up with a task.
AKA there really need to be algorithms that make sense of an environment when that environment has no particular 'end state'. Imagine an open world game where there are tasks and activities to do, but there's no 'finish' line. To take an example, say you've got a fishing minigame, you can drive around town, etc. The AI should come up with a way to discover or generate interest and goals for itself when the world is simply unstructured.
This reminds me in some ways of the chart parsers I was playing around with in university for a paper on natural language processing. I think these days they are mostly used in the context of code compilation, but I must admit I don't know much about modern natural language processing tools, so I don't know if they're still a thing there.
Never trust a man in a blue trench coat, Never drive a car when you're dead
The title mentions novel AI techniques by Uber involving a new type of memory. Cool!!
The summary mentions neither Uber nor this new memory. What exactly is the news?
It's heartening to know that one of my favourite games of that era was also one of the hardest for AI to learn. I don't know how many hours I poured into that one.
Reinforcement learning basically exists to solve problems that have the properties you describe. Researchers in the field have been aware of them for a long time. Many modern reinforcement learning algorithms basically use artificial neural networks to estimate the trickier bits in a Q-learning framework. Q-learning was introduced in 1989 and the basic theory developed in the early nineties.
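For readers who haven't seen it, the core of tabular Q-learning is a single update rule. Below is a minimal sketch; the state and action names and the `alpha` (learning rate) and `gamma` (discount) values are assumptions for illustration, and deep RL methods replace the table with a neural network estimate.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# The Q-table defaults every unseen (state, action) pair to 0.0.
q = defaultdict(float)
```

The discounted `max` over the next state's values is what lets a reward that arrives "much later" propagate back, one update at a time, to the earlier actions that led to it. When rewards are as rare as in these two games, though, there is often nothing to propagate in the first place.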
From TFA: "The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount." This defeats the whole purpose of autonomous independent exploration.
Games like tic-tac-toe are solved, and I believe Checkers/Draughts is solved as well. Maybe Chess and Go will be solved in the near future.
Nethack may remain the one unsolved game for the foreseeable future.
“Common sense is not so common.” — Voltaire