'Tit for Tat' Defeated In Prisoner's Dilemma Challenge
colonist writes "Tit for Tat, the reigning champion of the Iterated Prisoner's Dilemma Competition, has been defeated by a group of cooperating programs from the University of Southampton. The Prisoner's Dilemma is a game with two players and two possible moves: cooperate or defect. If the two players cooperate, they both have small wins. If one player cooperates and the other defects, the cooperator has a big loss and the defector has a big win. If both players defect, they both have small losses. Tit for Tat cooperates in the first round and imitates its opponent's previous move for the rest of the game. Tit for Tat is similar to the Mutual Assured Destruction strategy used by the two nuclear superpowers during the Cold War. Southampton's programs executed a known series of 5 to 10 moves which allowed them to recognize each other. After recognition, the two Southampton programs became 'master and slave': one program would keep defecting and the other would keep cooperating. If a Southampton program determined that another program was non-Southampton, it would defect."
Update: 10/14 15:08 GMT by J : If anyone wants to try writing their own PD strategy and see how it fares in a Darwinian contest, I'll host a tournament of Slashdot readers. Here are the docs, sample code, notes on previous runs, and my email address.
But the proper test is really whether the master half of these programs can do better than tit for tat on a large scale basis. I suspect that the S/M program will still do less simply because it plays a pattern during the interaction phase which is likely to result in tit for tat still coming out ahead- if there is one tit for tat, it won't do so well since the costs of being tit for tat are relevant if you don't know the master sign and most of those you interact with are expecting to hear it. But that's already well known. If tit for tat's numbers start growing, it does better. You see, tit for tat has an identification mechanism too, which is simply that it always starts out nice and immediately gets nasty if it gets fucked. If the number of tit for tats increases to a reasonable critical mass, they can have enough positive reactions to do very well. In fact, they'd become a secret society within the S/Ms!
In short, if tit for tat is isolated, it won't do so well since everyone is fucking with it. If there are just a few tit for tats out there, their power increases significantly with each one added.
What we call folk wisdom is often no more than a kind of expedient stupidity.-Edward Abbey
What's being ignored is that the total profit of all the colluding algorithms is less than that of Tit-for-Tat, which makes the solution unviable in real-world Prisoner Dilemma situations. (bidding on large construction projects under certain auction formats, etc)
As an analogy of unprofitable collusion, I could win the World Series of Poker by hiring enough shills and paying their time and entry fees. I would lose money by doing this, probably more than I could recoup with post-tournament income via endorsements/books/whatever.
The parent is correct. Tit-for-Tat is still superior in equal numbers, and a modified Tit-for-Tat that can spoof the recognition algorithm of colluders will trounce them.
Repeated games have radically different outcomes than one-time games. It's long been known that where cooperation is possible, cooperation can beat solitary strategies in repeated games. I really don't think there's anything surprising here.
See what I've been reading.
I'm not sure you read the article correctly.
this is not cheating, what is happening is throughout the iterations the programs can experience the equivalent of morse code in the patterns of defections and co-operations in the form of the penalties.
i.e.: 4yrs 4yrs free free 2years.
As this is iterative, and no actual lifespan (in human terms) the pattern can become quickly evident. Kind of like a "penalty handshake".
In this way one program can detect the intentions of what is happening, without actually communicating directly.
The secret of success is honesty and fair dealing. If you can fake those, you've got it made. (Marx)
That's why this is the Iterated PD Challenge, not just the classic PD. If the competitors only played ONE round of PD each, the contest would not be very interesting. Their past performance in the contest itself is part of the "history" they are using to evaluate their choices.
A mathematical treatment of population genetics in groups was given by W. D. Hamilton in "Innate Social Aptitudes of Man". In the last sentence of that paper, Hamilton, the originator of modern kin selection theory, states:
What Hamilton is referring to is the fact that in any structure of components vs composite, there is the opportunity to defect. An individual gene can defect against the organism within which it resides via, say, meiotic drive. An individual may defect against his tribe made up of his close relatives. A tribe may defect against the others making up a nation. A nation may defect against others making up a geographic race. A geographic race may defect against others making up humanity as a whole.It is indeed a dilemma but it isn't without a rigorous treatement within genetic theory.
Steve Sailer has written an an excellent review of the politically touchy issue of ethnic nepotism given from Hamilton's group selective perspective.
Seastead this.
Err, of course this is the iterated prisoner's dilemma. It is quite easy to do the optimal thing in the non-iterated case: defect. You couldn't make a competition out of that.
Finally! A year of moderation! Ready for 2019?
The length of the code is one of the largest problems to overcome. Performing any signal other than all-cooperate produces a net loss of 1 or 4 points per round for your team in traditional (0,1,3,5) IPD. Simple signalling, ie 4th round defect was very effective. While the master/slave aspect was amazingly effective in my research, the "spoiler" was not. A small population of master/slaves could invade an arbitrariliy large block of TitForTat if evolution was by duplicating winner and removing loser after n iterations. The population of "spoilers" stagnates very quickly in a large TFT population. TFT should be considered a friend, not an enemy because they are a positive growth environment. Going "spoiler" on any non-TFT/ally was quite effective as any bot not prone to cooperate posed the only real risk of "master" losing.
"The best argument against democracy is a five minute chat with the average voter."
--Winston Churchill
You aren't missing anything. In fact, TfT never beats *any* other strategy. The best it can do is tie.
Also worth noting is that the Southampton programs *were* cooperating -- with each other.
That's right, traitor (hawk) beats TfT in any given trial.
BUT, in an environment made up of a few players playing each strategy, then you have the following matchups:
Hawk vs Hawk. Horrible horrible loss for both of them.
TfT vs Hawk. Hawk wins, but only by a single round.
TfT vs TfT. Both TfT 'win' - neither betray the other.
So, overall, TfT does better than hawk.
The interesting part isn't beating TfT (which, as you point out, isn't THAT hard to do) but in doing consistently better than it against a wide variety of programs. Which is what TfT has long been the baseline for.
Tit for Tat is outperformed by "Tit for Two Tats", because it is better at avoiding long runs of damaging mutual recrimination. That was 5 years ago. The performance of any of these strategies is only determined by the opponent strategies that they face, which is arbitrary. It is therefore meaningless to talk of one strategy being 'better' than another - most advanced strategies can beat Tit for Tat given the right opponents.
foo mane padme hum
I've written perl scripts to run Prisoner's Dilemma tournaments on my website (the perl source code for all the species in the last run is on that page) and new submissions for the next tournament are welcome. Since my tournament does Darwinian selection on every agent that plays in it -- agents which don't earn points eventually die off, those which earn a lot of points reproduce -- self-sacrificing pairs of strategies won't do very well.
I haven't yet written code which can trickle in new members over time, but that really wouldn't be very hard to add (I could probably set it up in an hour if you are really interested :)
I presume there is no way the entrants can know how many moves will be in a given round, or Tit-for-Tat could be slightly be beaten by a modified TfT which always defected in the final round.
--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft
Unfortunately, PD has a dominant strategy to cheat your opponent. No matter what your oppoent does, you can do better by cheating.
Iterated PD is more interested since it lets your opponent punish you for cheating. So you get into some interesting social issues.
How well a given strategy does depends on the strategies other in its community are using. If the population is heavily cheater based, then agents that cooperate will lose big time. However, if there are enough cooperators, then they will form a coalition of sorts, and even though they will lose to the cheaters, in the end they will come out on top.
"[...]How well a given strategy does depends on the strategies other in its community are using. If the population is heavily cheater based, then agents that cooperate will lose big time. However, if there are enough cooperators, then they will form a coalition of sorts, and even though they will lose to the cheaters, in the end they will come out on top."
I remember reading about the tournament in a "Science" magazine article, back when the original tournament was done; Tit for Tat won irrespectively of the number of strategies it played against.
Furthermore, in a variant in which the winning strategy "spawned" more often, and the loser did not, Tit for Tat became a majority of the population irrespective of the initial sample, except in the extreme case of only one tit for tat and defectors. This is explained better here. The most interesting thing was, for me, the fact that Tit for Tat was superior to a strategy in which the program responded with a delay, i.e. it made the opponent's move two turns down the line. So, remember this if you have kids, or a pet, or both like me: whatever reaction you deem appropriate, it should be done soon, or not at all.
NOTE: the payoff was described as such:if A cooperates and B cooperates, both get X points; if A "defects" it gets W points to Z for B;if both defect, they get Y, where:
W > X > Y > Z, all positive integers.
"If a boss demands loyalty, give him integrity. But if he demands integrity, give him loyalty." (John Boyd, 1927-1997)
In the particular case I ran, it didn't make any difference. Now, though, I've added errors to what each agent tries to play. Tit-For-Tat responds to those errors and, playing against another Tit-For-Tat, quickly plunks down to (say) 90% cooperation plus or minus error, and random-walks around from there. "Always cooperate," though, sticks at 100% plus or minus error, so it does a little better.
The whole reason people are interested in the Prisoner's Dilemma (and Iterated) is that they are models for situations encountered in science. This isn't a comepetition like chess where we're trying to see who (as in the humans) are the best at it. We're trying to see what interesting results come of it. This is an interesting result.
Sort of like the parable of the teacher who said there'd be a pop quiz but you wouldn't know what day it was. Obviously it wasn't Friday, because on Thursday you'd know the quiz was friday. Likewise for Thursday because if the quiz wasn't friday, and if you got to wednesday w/o a quiz, then you would know the quiz was thursday, and therefore the quiz couldn't be on thursday... leading to the conclusion that there is no quiz. Not a perfect example (since then the teacher could give the quiz on any day and it would be equally unknowable given the possibility of "no quiz"), but I hope it clarifies some.