Researchers "Solve" Texas Hold'Em, Create Perfect Robotic Player
Jason Koebler writes The best limit Texas Hold'Em poker player in the world is a robot. Given enough hands, it will never, ever lose, regardless of what its opponent does or which cards it is dealt. Researchers at the University of Alberta essentially "brute forced" the game of limit poker, in which there are roughly 3 x 10^14 possible decisions. Cepheus runs through a massive table of all of these possible permutations of the game—the table itself is 11 terabytes of data—and decides what the best move is, regardless of opponent.
...they got banned by 6241 online casinos and bragging here is the only thing left?
I hope they went to Vegas before they published.
Wouldn't another robot which knows of all possible decisions of this particular robot be better than this "Perfect Robotic Player"?
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
I wonder how long it will be before the google glasses interface is done.
The computer cannot win every hand, which means it must lose some hands. Since it cannot control how large the bet gets, and in real gambling there is no such thing as infinite reserves, then the computer is still subject to the same worries the pros have: whether you can weather the losses and not go bankrupt long enough for your skill to have you come out on top eventually.
Recently I noticed that Texas Hold'em is only half of the game. The betting is the real strategic part. Unless the bot can do this well, I don't think it will ever really "beat" a human player.
If so, then the program may play a perfect game technically but still lose money, which for most people, is not really winning.
I "discovered" this same thing like 10 years ago! Fucking a...
No matter how many hands you play (besides zero), it will rarely ever NOT be losing if it plays against similarly skilled opponents.
And if it does, the users will just hand out these muffins: http://i.imgur.com/UKhXeLn.png
There are even casinos where you can play against limit players.
:P Some day in the future everyone will be playing perfect NL Texas Holdem, and it will no longer be a game of skill, it will be gambling since everyone is playing the same style.
The real "challenge" if you call it that is finding No Limit solved.
I could easily code a No limit bot which will make the right move maybe 80-90% of the time that puts reads on and everything.
But where's the fun in that? Online poker sites crack down on bots with captchas, so why would I want to lose my initial deposit? Anymore when I play no limit, I play it like it is solved. I get my money in correctly maybe 95% of the time. The fun part of poker is that once you know how to play, it's simple. It's hard to learn how to play correctly, but it is easier than tic-tac-toe once you've got a strategy down. And yes, I've turned $1 into $1000 over several thousand games, so I'm not just bsing. I'm just a lifelong gamer with a math statistician background who got interested in poker over about a decade.
The reason I don't just go and make a living over it is that I still have some moral hangups on making money and not producing anything for society. It is the same reason I didn't decide to be a pro-gamer, but instead wanted to make software games and useful apps for people to use.
The only thing a good NL bot could do in today's world is make bad players into geniuses, like chess programs make everyone online play like a Grand Master. And there's no good way to monetize this with piracy or I'd sell "near-perfect No limit poker bots" for $500 a pop or something. Since there's no money to be had due to piracy, it's the same reason good players don't write books on the best strategies: they can just play the game and make money. It's better to keep other poker players in the dark on the right moves instead of educating them. I only teach a select few how to play the game right, and normally they don't listen anyway.
God spoke to me
First, robots can't play poker. Computer programs called "bots" are hated by online poker sites. They're fun for an offline game like Poker Academy, but they don't play perfectly.
How do the creators of this thing say it's perfect? When handed the lowest hand in the game, how does it not lose? Bluff too badly and that's a loss. This makes no sense to me.
Can you search 11TB of data within 30 seconds?
That's the time limit on most online tables.
It's not a typo if you understood the meaning!
There is another computer that triples the bet every time it loses and leaves when it wins and says something about knowing when to walk away.
And then there is the computer that deals the cards and takes 4% of the pot every round, muttering "House always wins".
The third computer that can beat it keeps giving it complimentary alcoholic beverages until it gets a buffer overrun.
The fourth computer doesn't "believe in no win situations" and reprograms the computer to lose, but then Spock finds out and he gets kicked out of the Academy.
Priest: "Universe from nothing, no laws of physics, sped up time"+ huge discrepancies. Creationism? No. Big Bang Theory
I bet a couple of shots of redeye would lower its winning percentage.
I'm a former professional poker player, now semi-pro and working again in the IT industry. In a game like poker, to "solve" the game, from a mathematical and game theory point of view, means to develop a strategy that is "unexploitable", which basically means "mistake free". If two game-theory perfect players were to play against each other, then their "expectation" would be zero, as if they were flipping a coin between each other. Neither would make a mistake, so only the randomness of the cards would determine the winner of a given hand. In the long run, both perfect players would win as often as they lose.
But in a real poker game, human players make lots of mistakes. A player who adjusts their strategy to exploit these mistakes will win vastly more than this (formerly theoretical) "perfect player". The game-theory optimal strategy is focused on not losing, rather than exploiting mistakes and winning the most.
So in an actual game, the expert human player will outperform the computer because the other humans in the game are exploitable.
In live play, especially in tournaments, computer solutions are used in poker. In particular, when the game is "heads up" (only two players), and the chips are not deep, which happens at the end of every tournament, then the correct strategy is to "jam or fold" all hands. The solution to this has been determined in a computer and top players have the table memorized.
If this subject interests you, I HIGHLY recommend "The Mathematics of Poker", by Chen and Ankenman.
...does it know when to fold 'em? When to walk away? When to run?
Koans and fables for the software engineer
I go all in...
You give me this bot, and I'll give you Bob. Bob plays exactly like this bot, but is also able to bluff the opponent when playing the worst possible hand. Bob is thus a measurably better player than the bot.
Always bet and win with the high hands, and fold when you do not have the highest hand.
You've got to lookup when to hold 'em
Lookup when to fold 'em
Lookup when to walk away
Lookup when to run
You never lookup your money
When you're sittin' at the lookup table
There'll be time enough for lookups
When the dealin's done
It's about psychology: guessing what your opponents hold, whether you can beat what you think they hold, or whether you can bluff them into folding.
I'm betting that a good human player could pretty quickly learn how this bot plays, and learn how to react to various scenarios to defeat it...regardless of the math.
Limit poker can be mathematically gamed. That book will teach you how. No limit completely screws up the pot odds which is why it's considered the only "pure" game.
The published article's actual title is: "HEADS-UP LIMIT HOLD’EM POKER IS SOLVED."
This is critically important. Heads-Up is two player. Limit poker can consist of 2-10 players, and while there are some learnings to be had I'm sure, heads-up limit is not a particularly interesting game, and really only occurs at the end of limit poker tournaments.
They basically solved the weakest case possible, which can obviously be solved because it reduces the amount of influence betting strategy has on an outcome. In a 10 way game against world class players, this bot would likely get destroyed.
You can better understand what is going on by considering the much simpler game Rock paper scissors. 'Perfect' here basically means the strategy gives you the best possible worst case.
For RPS, the perfect strategy (using the term in the same sense as it is used for the poker bot) is to play completely randomly. There is no way to gain an edge over this strategy, no counter-strategy which will give you more than 50% chance of winning, even if you know your opponent's strategy. (In this case, there is also no strategy which will give you less than 50% chance of winning against the 'perfect' strategy.)
For the poker bot, there is no strategy that will give you greater than 50% chance of winning against it in a two player game. If you know its strategy perfectly (but of course you don't know its cards) the best you can do is to equal that 50% chance (which is what happens if it plays itself.) Unlike RPS, you can lose to the perfect poker bot by playing poorly. Also, as noted in the article, the perfect poker bot always plays as if it were playing against perfect opposition. A good human player will fleece you faster than the perfect bot, because the human player will notice your peculiar imperfections and exploit them, choosing to play in a way which would be suboptimal against a perfect opponent, but superior against you.
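The rock-paper-scissors point above can be checked with a few lines of arithmetic. This is only a sketch: the payoff convention (+1 win, -1 loss, 0 tie) and the two sample opponent mixes are my own assumptions, not anything from the article. It measures expected payoff per round rather than "chance of winning", since RPS has ties.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
# Each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """Return +1 for a win, -1 for a loss, 0 for a tie (assumed convention)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(my_mix, their_mix):
    """Exact expected payoff per round for two mixed strategies."""
    return sum(my_mix[m] * their_mix[t] * payoff(m, t)
               for m in MOVES for t in MOVES)

# The "perfect" strategy: each move with probability 1/3.
uniform = {m: Fraction(1, 3) for m in MOVES}

# Two hypothetical opponents, chosen only for illustration.
rock_heavy = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}
always_paper = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}

print(expected_payoff(uniform, rock_heavy))       # uniform cannot be beaten...
print(expected_payoff(uniform, always_paper))     # ...but it never profits either
print(expected_payoff(always_paper, rock_heavy))  # an exploiting strategy does profit
```

The uniform mix breaks even against both opponents, while "always paper" earns a positive expectation against the rock-heavy player. That is the same trade-off described for the poker bot: unexploitable, but not exploiting.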
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
Texas Hold'em is poker that's stupid enough to show on TV. Wake me when it figures out 7-card stud.
even in a fair game, the house wins.
Not always. Poker isn't played against the house; it's played against other players. The house just gets about 4 percent of any pots that flop.
Blackjack is the other casino game that can be "beaten". It's played against the house, but the house plays like a robot, and its only advantage is that it takes double busts (hands where both the player's cards and the dealer's cards total more than 21). The player has plenty of advantages, including double payout for player A-10, the "insurance" side bet on dealer A-10, split A and 8, double down on 10 or 11, and standing on less than 17. There's a basic strategy that by itself makes the house advantage negligible, and if you can mentally estimate when A-10 is more likely, you can know when to bet higher and when to insure. This was enough for the MIT Blackjack Team to turn a modest profit.
It seems like there was already a pretty established body of theory on limit Hold-em by players. I wonder if there are any interesting ideas that will be proved or disproved by this bot.
When there's no mandate, people will buy insurance if they see their situation as particularly risky. It's called adverse selection, and you can make it work for you in Blackjack.
Insurance is offered when the dealer shows an ace. It's a 2:1 bet that the dealer's hole card is 10. On a fresh shuffle of multiple decks, this is a loser because only 4 out of every 13 cards are 10. But if you can prove a high proportion of 10s left in the library, more than one-third of the remaining cards, it becomes advantageous to insure every dealer ace. Even if you don't count cards, a single deck game where you're playing multiple hands is more likely to produce a hole 10 if there are no player 10s.
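The one-third threshold for insurance falls straight out of the 2:1 payout. Here is a rough sketch of the arithmetic; the function name and the sample shoe counts are made up for illustration, and it ignores the dealer's visible ace when counting the remaining cards.

```python
from fractions import Fraction

def insurance_ev(tens_left, cards_left):
    """Expected value of a 1-unit insurance bet paying 2:1.

    Wins 2 units if the hole card is ten-valued, loses 1 unit otherwise.
    Assumes the hole card is a uniform draw from the unseen cards.
    """
    p_ten = Fraction(tens_left, cards_left)
    return 2 * p_ten - (1 - p_ten)

print(insurance_ev(4, 13))   # fresh-shuffle proportion: negative
print(insurance_ev(1, 3))    # exactly one-third tens: break-even
print(insurance_ev(20, 50))  # hypothetical ten-rich remainder: positive
```

The expression simplifies to 3p - 1, so the bet is a loser at p = 4/13 and only turns profitable once more than a third of the unseen cards are tens.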
heads up limit Holdem is something you in practice never play, unless you are in the last table in a limit tournament.
besides, being able to beat or play even with the perfect player will in fact make you broke in the long run. the casino rake will clean you out, it is just a matter of how many hands.
the only claim to fame this robot could make is that it is the best ever limit heads up novice. there have been plenty of pokerbots before, making good money, playing low stakes online poker. it does not take much to consistently beat the weak. maybe 100 lines of c code with simple heuristics.
anyone who knows anything about winning poker knows that you play the players, not the cards. your winning is dependent on making the other players make the wrong decision, folding winning hands and betting losing hands. weak players make so many bad decisions, you can just play straight. in addition to decisions on each hand, one must take into account how the decision will help in the overall strategy. you might want to get caught in a bluff. you might want to keep your nuts secret, or vice versa. and then you have to play your streaks, as well as the streaks of other players. taking money from very weak players is easy, but to play poker at professional level you need to be a lot better than average, in order to stay ahead of the casino rake.
-- Another senseless waste of fine bytes.
Actually playing the game by the rules is probably less than 10% of the actual game in professional poker. Often pots are won by the weaker of two hands. Real professionals can guess with uncanny accuracy what other players' hands are, and know when a bluff can pay off more than playing to the odds of whatever hand you were dealt. And the betting amount is as important as anything. Sometimes you use it to bluff, sometimes you try to pretend you don't really have much to get others to up their bets. All of this requires loads of psychology, and watching and knowing their opponents. And in fact, if a player was restrained to always play his hand perfectly mathematically correct, and an opponent guesses this, he would then have an incredibly huge advantage. Unless human psychology is also a solved problem, then this AI is nowhere close to being a perfect player.
Troll is not a replacement for I disagree.
> Of course you can, but that's like looking at your opponent's cards while they are playing. You still would have no idea what "hand" the other robot is playing.
You CAN know, with sufficient precision to matter, what cards the opponent holds. We can know if it has a strong hand, a medium hand, or a weak hand. That's the difference between a competent poker player and a pro. That's the whole point of everything you learn after your first few poker games. I'll explain.
Consider the first bet. The bot bets based on its hole cards only. Obviously you'd bet pocket aces and you wouldn't bet 3-7 off suit. In between are a number of different hand possibilities, each with a defined rank - you can trivially say one pair of cards is better or worse than another pair. Therefore, your computerized rule for the first bet comes down to one value - bet if your hand is better than X, check or fold if it's X or worse. A good player will notice that every time the bot bets, it has at least a pair of threes. So when the bot bets, we know it has at least a pair of threes.
Suppose we have a pair of twos and the bot bets. We know that we should fold, because the bot has us beat with a pair of threes or better. Suppose the bot plays first and checks. We know it doesn't have a pair of threes, so if we have that or better we should check.
By watching what the bot does and what cards it later reveals, we learn its rules for those questionable situations. We can invert those rules to know whether it has good, great, or poor cards this time. Suppose the bot throws in a bit of randomness. That has been no problem since Bayes' theorem in the 1700s. We want to know the probability of a strong hand, given the bet. That's written as P(S|B). We calculate that as P(S|B) = P(B|S) * P(S) / P(B). P(S) we just look up from any of the sources who have calculated the odds of getting a strong hand. We have P(B) and P(B|S) empirically, from its betting history.
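For anyone who wants to see the Bayes step with numbers plugged in, here is a minimal sketch. The three input values are invented purely for illustration; in practice P(S) would come from a hand-odds table and the other two from the observed betting history, as described above.

```python
from fractions import Fraction

def p_strong_given_bet(p_bet_given_strong, p_strong, p_bet):
    """Bayes' theorem: P(S|B) = P(B|S) * P(S) / P(B)."""
    if p_bet <= 0:
        raise ValueError("P(B) must be positive: the bot has to bet sometimes")
    return p_bet_given_strong * p_strong / p_bet

# Hypothetical inputs, not measured from any real bot:
p_s = Fraction(1, 10)          # P(S): chance of being dealt a strong hand
p_b_given_s = Fraction(9, 10)  # P(B|S): how often it bets when strong
p_b = Fraction(3, 10)          # P(B): how often it bets overall

print(p_strong_given_bet(p_b_given_s, p_s, p_b))
```

With these made-up numbers a bet raises the chance of a strong hand from 1 in 10 to 3 in 10. How useful that is depends entirely on how far P(B|S) sits from P(B), which is exactly what randomized betting is designed to shrink.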
I dunno, my Roomba has a pretty good poker-face.
Table-ized A.I.
Playing mathematically perfect making rational decisions based on the value of the cards is very beatable, I'd like to see this actually win in a large number of games against skilled players. The claim that if the system played itself "the instance with the better cards would win" is pretty telling that they have a long way to go in understanding how poker is played in the real world.
Cepheus wins 280, bringing their balance to 135.
Who are they and why are they taking Cepheus' money? Is it the house? I guess in Alabama they don't even teach remedial English to CS students.
Simple statistics can prove that luck plays absolutely no role in gambling if you look at a large enough number of individual games. No need to build a fancy robot for that.
In fact, most "casino" type games of "chance" are designed to have a very small house edge. This keeps the players playing while at the same time ensuring that the house does not lose money. Lotteries, on the other hand, have a house edge high enough that it's pretty close to cheating.
This is limit holdem, not no-limit. It's a completely different game. Betting is in fixed amounts. What you're saying is that humans have additional (visual) information. Why the hell you think that your personal experience with an entirely different game has anything to do with a mathematical construct is beyond me.
What is with all of the posters here being complete fucking morons? "That's not real poker!" "But what about mah bluffing?" "But if I change the conditions, the optimal strategy will lose!" "Ah don't know mayuth, but these here guys are wrong!" It's one thing to not understand the results. 9/10 people here aren't even talking about the same fucking game, including you. What a fucking disgrace...
Is there some reason to believe that this bot never bluffs with the worst possible hand?
Given enough hands, it will never, ever lose, regardless of what its opponent does or which cards it is dealt.
So what happens if it plays against another instance of itself?
systemd is Roko's Basilisk.
A game theoretically optimal player.
Please note that this bot will never lose, and also NEVER WIN! The last part is conveniently omitted in popular articles like this. :p
The only way to make a profit with a game theoretically optimal strategy is when your opponent plays dominated mistakes. That is math jargon for "blatantly stupid"
Also my guess is that this is heads-up only, so as soon as you have three players or more, it is a game again.
This is very unlikely. The idea you could reduce poker to a look-up table regardless of your opponent is laughable.
1. While it wins (or at least doesn't lose) at the limit, it doesn't account for the rake taken by casino. Given good enough opponents, both this robot and the opponent may be actually losing in the long run,
2. Their research is focused only on limit hold'em (which for bounded amount of cash at the table has finitely many states) and on not losing rather than winning (trying to win would lead to a rock-paper-scissors game with no objectively best strategy... of course by trying not to lose / trying to merely avoid mistakes the bot typically plays better than the opponent and thus ends up winning, but it does not try to exploit the opponent's mistakes in any way),
3. Their research is only applicable to heads-up play. When multiple players are involved, "Nash equilibrium" makes no sense, and in fact collusion (whether intentional or not) occurs and plays a huge role in poker.
4. Heads-up tables at all casinos had already been plagued by bots years ago.
But that doesn't mean it will win more money. And that's really the goal, isn't it?
http://forumserver.twoplustwo.... The author admits that the nemesis (perfect enemy) of this bot would have a win rate of 0.05bb/100 (this is extremely small). So it is not technically solved, but it's close enough.
see his paper on why poker is not a game of luck:
http://www.tau.ac.il/~nogaa/PDFS/skill4.pdf
This has/is already happening.
Several online poker sites have been caught with "poker bots" in the past. While they may not use 11TB of data or be "perfect", they do not have to be. They just have to beat the bad human players, and considering the house takes 3% or whatever anyway it hardly matters.
Winning at poker is all about knowing what your opponent has and betting accordingly. As soon as I understand what that bot is doing, I will use it to my advantage.
Just curious...
Matthew Broderick and Joshua figured out the only way to win is to not play at all. This was while holding pocket aces and facing thousands of incoming ICBMs.
I suspect that the rules the robot follows (without reading the article - where's the fun in that?) are not of the form "In situation S(1234) do response R(456)" but rather are "In situation S(1234), 22% of the time do R(456) and 78% of the time do R(678)". Even if you know exactly what the algorithm is, you will not be able to tell much about the robot's hand by seeing what its bets are.
Game theory types of strategies quite often have this "Do X 20% of the time and Y 80% of the time" nature, especially for games where there is incomplete knowledge.
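A rule of that "22% / 78%" form is easy to picture as a lookup table whose entries are probability distributions rather than single actions. The sketch below is my guess at the general shape only; the situation and response labels are the placeholders from the comment above, and nothing here is taken from the actual Cepheus tables.

```python
import random

# Hypothetical one-entry strategy table: situation -> {response: probability}.
STRATEGY = {
    "S1234": {"R456": 0.22, "R678": 0.78},
}

def choose_action(situation, rng):
    """Sample one response according to the stored mix for this situation."""
    mix = STRATEGY[situation]
    actions = list(mix)
    weights = [mix[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded only so the run is repeatable
samples = [choose_action("S1234", rng) for _ in range(10_000)]
share = samples.count("R456") / len(samples)
print(f"R456 chosen in {share:.1%} of samples")
```

Over many hands the observed frequency settles near 22%, but any single action tells an observer little, because the same situation can produce either response.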
That's precisely what the last paragraph of my post addressed. A bit of random choice is no problem - you can easily solve for that. I've copy-pasted my last paragraph for you below. I should also add, however, that I doubt the bot does much of that, if any, because for Hold 'Em specifically that's more likely to be harmful than helpful. In the vast majority of Hold 'Em situations, there is one answer that is unequivocally much better mathematically, and deviating from that more than 5% of the time will cost the bot significantly. (Given that the bot is playing the math, not understanding the psychology of their opponent). Some situations in Hold 'Em are borderline, but those fall within the margin of error. Without getting into poker notation you may not be familiar with, suppose you could choose randomly for a pair of threes. That would mean you'll need to bet a pair of fours and fold a pair of twos - the smart opponent can tell, by your bet, whether you hold "pair of threes or higher" vs "pair of threes or lower". You're only "hiding" your cards when they happen to be exactly a pair of threes, so it's probably not worth the trouble.
Paragraph from original post:
Suppose the bot throws in a bit of randomness. That has been no problem since Bayes' theorem in the 1700s. We want to know the probability of a strong hand, given the bet. That's written as P(S|B). We calculate that as P(S|B) = P(B|S) * P(S) / P(B). P(S) we just look up from any of the sources who have calculated the odds of getting a strong hand. We have P(B) and P(B|S) empirically, from its betting history.
But I don't think you are playing enough hands to really get any useful data. How many hands do you think you need to get data from in order to be able to draw any strong conclusions about the bot's hand based on their bets? And you get no useful data when it folds as I assume you don't get to see their hand then.
And it doesn't really matter anyway - you could know exactly what the bot would do in any situation (by getting a copy of the 11TB of lookup tables), and it doesn't give you any advantage without knowing the hidden cards.
If the authors are correct and they have an optimal playing strategy, the opponent can play any way they want and it doesn't make the robot's job any more difficult - in the long run the optimal strategy will not be beaten by any other strategy.
Of course I could be wrong in my understanding. Do you think your playing is good enough to beat it? Are you planning on giving it a go when the website is not crushed under the weight of everyone else trying to test it out? http://poker.srv.ualberta.ca/
With human players, analyzing 100 hands is enough to put them into one of four categories. This is what internet poker software does. I've written a bot and tested it based on hundreds of thousands of hands, and I found that categorizing your opponent based on 100 hands truly does work. In other words, it's empirically proven. Analyzing the computer, we already know ahead of time that it plays strict odds, so we can place it in category 4 a priori. Watching a few dozen or a few hundred hands allows us to narrow it down even more specifically.
When the opponent folds, you don't learn much specific, but you do learn a few very important things:
How often do they fold at each betting interval, called P(Fx).
Facing a bet, what is P(Fx)? This is called P(Fx|I).
On the button, what is P(Fx)?
Facing a check, what is their P(Fx)?
etc.
All of these numbers allow you to categorize them on certain axes.
The programmers assume an optimal strategy for poker, but there is no such optimum strategy as such. There is a mathematically optimum strategy ASSUMING THAT YOUR OPPONENT PLAYS THE SAME STRATEGY, which is what they probably calculated. That strategy is too tight for play against most humans. A different strategy is optimum for playing a loose/wild opponent, which is different from the optimum strategy against a tight/cold one.
Blackjack has one mathematically optimum strategy, because you are effectively playing against a robot - the opponent always has the exact same strategy. Professional level poker is more like football. When playing against the 2014 Broncos, the best strategy is to defend against the short play because Manning rushes himself, and doesn't throw deep very often. Playing against Elway's Broncos, the optimum was to rush while covering deep because Elway would take his time and throw deep. The optimum strategy depends very much on which opponent you're playing, so understanding your opponent and making the appropriate adjustments is key.
I left out a step in my explanation about analyzing their folds so you know which hands they fold.
Suppose you observe that they fold 34% of the time they're bet into at the first interval, when they've only seen their hole cards.
You then refer to your table of odds and see that the 34th percentile corresponds to a hand of 8-9 suited or worse.
Now you know that they fold 8-9 suited or worse.
If that's not clear, maybe an example will help:
You drop 100 coins at intervals in the middle of the sidewalk, 50 pennies, 30 nickels, 15 dimes and 5 quarters.
From a distance, you watch me walk down the sidewalk, counting how many coins I pick up, but you can't see WHICH coins I pick up or pass by.
You observe that I picked up 20 coins, passing up 80 others.
You deduce that I decided to pick up quarters and dimes, while passing up (folding) nickels and pennies.
You didn't need to see WHICH coins I picked up. Knowing HOW OFTEN I picked up coins allowed you to deduce which ones I picked up.
In poker, out of 200 hands, you'll have some hands rated 1 on a scale of 1-10, some rated 2, some 3, etc. By knowing which portion of hands you play, I know what your cutoff is to consider it a hand good enough to play.
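The coin example boils down to walking down a ranked list until the running count matches how often the player acts. A rough sketch follows; it assumes the player always prefers the more valuable category and that we rank categories the same way they do, and the function name is my own.

```python
def deduce_kept(categories, n_kept):
    """Guess which categories were kept, given only how many items were kept.

    `categories` is a list of (name, count) pairs ordered best first.
    If n_kept falls inside a category, that boundary category is included
    even though it may be only partly kept.
    """
    kept, running = [], 0
    for name, count in categories:
        if running >= n_kept:
            break
        kept.append(name)
        running += count
    return kept

# The sidewalk from the example: 100 coins, most valuable first.
coins = [("quarter", 5), ("dime", 15), ("nickel", 30), ("penny", 50)]

print(deduce_kept(coins, 20))  # picked up 20 coins, passed up 80
```

The same walk works for starting hands if the (name, count) pairs are replaced by hand ranks and how often each is dealt. A fold rate of 34% then points at the hand sitting at the 34th percentile, as in the 8-9 suited example above.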
You certainly can "solve" Hold 'Em the same way you can solve Tic-Tac-Toe. It's not that hard to do. Competent players all know that strategy more or less - close enough to be within the margin of random. For some, they calculate it numerically, others calculate it the same way an experienced quarterback calculates the trajectory of a thrown ball, subconsciously, but effectively. Yet some players are much better than other competent players. Here's why.
You mentioned RPS, which is trivial to analyze mathematically - the winner each hand is random IF one opponent's move is random. All strategies mathematically tend to 50/50 over the long haul. Yet computers routinely beat people because people AREN'T random. Poker is like that. The optimum strategy is to play the psychology, NOT the cards. The psychology is different for every opponent.
Imagine you played 4x4 tic-tac-toe but put pictures on the board - one corner is Hitler, another corner is Beyonce, one is the prophet Mohamad, another is Obama. The pictures would likely affect your opponent's play if they don't have the exact optimum strategy memorized. Recognizing their psychological bias would CHANGE your optimum strategy. The psychology would be different playing against an Isis member vs against Jay-Z, so the optimum strategy would be different for each opponent.
Yes, it's interesting that the section of Wikipedia you linked to uses the same RPS example I used:
> As an example, the perfect strategy for Rock, Paper, Scissors would be to randomly choose each of the options with equal (1/3) probability. The disadvantage in this example is that this strategy will never exploit non-optimal strategies of the opponent, so the expected outcome of this strategy versus any strategy will always be equal to the minimal expected outcome.
As pointed out by the wiki, the expected outcome is the WORST possible non-losing outcome. I'm referring to a strategy with the BEST expected value.
You asked if I've been able to consistently beat that computer. I've consistently played as well as that computer. As described in the Wikipedia article, Perfect Play basically attempts to guarantee a tie. I'm going to assume the programmers don't have bugs, so they successfully tie everyone, over the long haul. I've done the same - I've tied that computer. I can guarantee a tie between me and the computer every week. My strategy is "don't play". That guarantees a tie. :)
Actually beating the computer is simple as well. Play sanely until you're up by one bet, then stop. It's virtually guaranteed that you'll be up by one bet at some point, so by stopping at that point you virtually guarantee a win against the computer.
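The "quit when you're up one bet" idea can be tried out in a toy simulation. This is a sketch under strong assumptions that are mine, not the article's: every hand is a fair coin flip worth exactly one bet, there is no rake, and the player can absorb any losing streak for up to 10,000 hands.

```python
import random

def reaches_plus_one(rng, max_hands):
    """Play fair +1/-1 hands until the balance hits +1 or max_hands runs out.

    Returns (hit, lowest), where `lowest` is the worst balance seen on the way.
    """
    balance = 0
    lowest = 0
    for _ in range(max_hands):
        balance += rng.choice((1, -1))
        lowest = min(lowest, balance)
        if balance == 1:
            return True, lowest
    return False, lowest

rng = random.Random(1)  # seeded only so the run is repeatable
results = [reaches_plus_one(rng, 10_000) for _ in range(1_000)]
hit_rate = sum(hit for hit, _ in results) / len(results)
worst_drawdown = min(low for _, low in results)

print(f"sessions that reached +1: {hit_rate:.1%}")
print(f"deepest hole along the way: {worst_drawdown} bets")
```

Nearly every session does reach +1, which supports the "virtually guaranteed" part. The catch shows up in the second number: some sessions dig a very deep hole first, so the trick only works with a bankroll and patience that real players don't have.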