'Tit for Tat' Defeated In Prisoner's Dilemma Challenge

Posted by ryuzaki0 on Thursday October 14, 2004 @02:06AM from the taking-one-for-the-team dept.

colonist writes "Tit for Tat, the reigning champion of the Iterated Prisoner's Dilemma Competition, has been defeated by a group of cooperating programs from the University of Southampton. The Prisoner's Dilemma is a game with two players and two possible moves: cooperate or defect. If the two players cooperate, they both have small wins. If one player cooperates and the other defects, the cooperator has a big loss and the defector has a big win. If both players defect, they both have small losses. Tit for Tat cooperates in the first round and imitates its opponent's previous move for the rest of the game. Tit for Tat is similar to the Mutual Assured Destruction strategy used by the two nuclear superpowers during the Cold War. Southampton's programs executed a known series of 5 to 10 moves which allowed them to recognize each other. After recognition, the two Southampton programs became 'master and slave': one program would keep defecting and the other would keep cooperating. If a Southampton program determined that another program was non-Southampton, it would defect." Update: 10/14 15:08 GMT by J: If anyone wants to try writing their own PD strategy and see how it fares in a Darwinian contest, I'll host a tournament of Slashdot readers. Here are the docs, sample code, notes on previous runs, and my email address.
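For anyone taking up the tournament offer, here is a minimal Python sketch of the game and of Tit for Tat as the summary describes it. The payoff numbers are an assumption: they are Axelrod's classic T=5, R=3, P=1, S=0, shifted so that nothing is negative, since the story only says "small wins" and "big losses". Only the ordering T > R > P > S matters. The function names are hypothetical and are not taken from the tournament's sample code.

```python
# Minimal Iterated Prisoner's Dilemma sketch.
# Assumption: Axelrod-style payoffs T=5, R=3, P=1, S=0.
C, D = "C", "D"

# PAYOFF[(my_move, their_move)] -> my points for one round
PAYOFF = {
    (C, C): 3,  # R: reward for mutual cooperation ("small win")
    (C, D): 0,  # S: sucker's payoff ("big loss")
    (D, C): 5,  # T: temptation to defect ("big win")
    (D, D): 1,  # P: punishment for mutual defection ("small loss")
}


def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    return C if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    return D


def play(strategy_a, strategy_b, rounds):
    """Play a fixed number of rounds and return (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, always_defect, 10))  # (9, 14)
print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30)
```

Tit for Tat loses only the first round to a pure defector and then answers every later defection in kind.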

27 of 356 comments

Scary Stuff by mfh · 2004-10-14 02:07 · Score: 5, Funny
FTA:
- If you confess and your partner denies taking part in the crime, you go free and your partner goes to prison for five years.
- If your partner confesses and you deny participating in the crime, you go to prison for five years and yor [sic] partner goes free.
- If you both confess you will serve four years each.
- If you both deny taking part in the crime, you both go to prison for two years.
This sounds pretty much like the RIAA might be involved. I would deny everything if I were you!
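Joking aside, the four sentences above do form a Prisoner's Dilemma. Whatever your partner does, confessing gets you less time, yet two confessors each serve longer than two deniers would. Below is a small Python check of that dominance argument. The dictionary and function names are only for illustration.

```python
# Years in prison from the article's version of the story (lower is better).
# YEARS[(my_choice, partner_choice)] -> my sentence in years
YEARS = {
    ("confess", "deny"): 0,
    ("deny", "confess"): 5,
    ("confess", "confess"): 4,
    ("deny", "deny"): 2,
}


def best_reply(partner_choice):
    """The choice that minimizes my sentence, given a fixed partner choice."""
    return min(("confess", "deny"), key=lambda mine: YEARS[(mine, partner_choice)])


# Confessing is the best reply whatever the partner does...
dominant = {p: best_reply(p) for p in ("confess", "deny")}
print(dominant)  # {'confess': 'confess', 'deny': 'confess'}

# ...yet mutual confession (4 years each) is worse than mutual denial (2 years each).
print(YEARS[("deny", "deny")] < YEARS[("confess", "confess")])  # True
```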
--
The dangers of knowledge trigger emotional distress in human beings.
That's not really so special by Perianwyr+Stormcrow · 2004-10-14 02:09 · Score: 4, Interesting

In other words, an in-group can work vs. tit for tat if it outnumbers it. I'd like to see a trial with a slow trickle of immigration of tit for tats into a large population of S/M programs. That might be illuminating. I suspect the outcome would be that tit for tat still does well.

--
What we call folk wisdom is often no more than a kind of expedient stupidity.-Edward Abbey
1. Re:That's not really so special by Minwee · 2004-10-14 02:18 · Score: 4, Funny
  
  I was with you up until you started talking about tits and S/M... Am I still on slashdot, or did I wander onto alt.com by mistake?
2. Re:That's not really so special by Anonymous Coward · 2004-10-14 02:20 · Score: 5, Informative
  
  What's being ignored is that the total profit of all the colluding algorithms is less than that of Tit-for-Tat, which makes the solution unviable in real-world Prisoner Dilemma situations. (bidding on large construction projects under certain auction formats, etc)
  
  As an analogy of unprofitable collusion, I could win the World Series of Poker by hiring enough shills and paying their time and entry fees. I would lose money by doing this, probably more than I could recoup with post-tournament income via endorsements/books/whatever.
  
  The parent is correct. Tit-for-Tat is still superior in equal numbers, and a modified Tit-for-Tat that can spoof the recognition algorithm of colluders will trounce them.
3. Re:That's not really so special by harrkev · 2004-10-14 02:45 · Score: 4, Insightful
  
  Yup! It is the "outnumbers" thing which (in my opinion) makes things unfair.
  
  Had this been an actual prisoner's dilemma, this winning strategy would require recruiting a large number of thugs who LIKE going to prison and are willing to "take one for the team."
  
  Although cooperation is not explicitly defined as being against the rules, IMHO, it goes against the "spirit" of the competition. The point is that each algorithm is supposed to act in a greedy manner.
  
  This will no doubt spark a LOT of discussion, but to me, they "cheated." (OK. Maybe "worked the system" is a better phrase).
  
  --
  "-1 Troll" is apparently the same as "-1 I disagree with you."
That's why... by Stile+65 · 2004-10-14 02:11 · Score: 4, Funny

...fraternities and secret societies work so well!

I'm off to join the Freemasons. Be back in a few.

--
I claim first use of "Error No. 0B" - or "No. 0B error." It'll be the new ID 10T!
Uh, isn't that just cheating? by DoorFrame · 2004-10-14 02:14 · Score: 4, Interesting

I mean, the whole point of the Prisoner's Dilemma is that you don't have all the information. You don't know what your partner/opponent is going to do and you have to decide based entirely on what little information you have from your history with your partner/opponent. What these people are doing is creating a pattern to be recognized by another player, and then working as a team. And, it's not like they're people where one person might change their mind and decide to defect unilaterally... they're programs. Once they've locked onto each other as the same program, that's it. They'll play to their advantage until the end.

The real trick is to find a program that can beat other DIFFERENT programs, not beat itself. This seems really stupid, or am I missing something?

--
RumorsDaily
1. Re:Uh, isn't that just cheating? by Urban+Garlic · 2004-10-14 02:23 · Score: 4, Insightful
  
  Well, part of the interest is that these programmers found a way, within the rules, to get more information, by means of their "secret handshake". The important lesson (to my mind) is that the environment can be manipulated in surprising ways to get a desired result. That's creativity and innovation doing its thing.
  
  Interestingly, this strategy is also fairly "brittle", I think, in that simple rule-changes could foil it. Requiring only one submission per team, for example, or scoring teams according to the total (or average) scores of all their programs, would complicate any strategy of collusion.
  
  --
  2*3*3*3*3*11*251
2. Re:Uh, isn't that just cheating? by Minwee · 2004-10-14 02:29 · Score: 4, Insightful
  
  I don't see it as cheating. It's a lot like Bridge -- The rules say that you can't show your partner your hand and you can't tell them what you have, but you are allowed to use prearranged bidding conventions to pass information across the table. All that the Southampton agents did was use a bidding strategy. They did act as a team, but they had no out-of-game way of knowing that they were up against a team member. That doesn't break any rules, and it did work. The Southampton team took the top three spots in the competition. If you insist on comparing the entrants to people, consider this. They worked as a team, for the good of the team, knowing that at least some of them would win even if the others bombed. People do that kind of thing all the time outside of competitions. Why should it be so out of place here?
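The "bidding convention" idea fits in a few lines of Python. This is not Southampton's code. The story only says their entries played a known series of 5 to 10 moves, so the handshake below is a hypothetical five-move sequence. Each entry's master/slave role is also assumed to be fixed at submission. The payoffs are assumed to be the classic T=5, R=3, P=1, S=0.

```python
C, D = "C", "D"
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}  # assumed T=5, R=3, P=1, S=0

# Hypothetical recognition sequence; the real one is not given in the story.
HANDSHAKE = [C, D, D, C, D]


def make_colluder(role):
    """Return a strategy for a 'master' or 'slave' entry (role fixed in advance)."""
    def strategy(my_history, their_history):
        n = len(my_history)
        if n < len(HANDSHAKE):
            return HANDSHAKE[n]                    # still signalling
        if their_history[:len(HANDSHAKE)] == HANDSHAKE:
            return D if role == "master" else C    # teammate: master exploits, slave feeds it
        return D                                   # stranger: play spoiler
    return strategy


def tit_for_tat(my_history, their_history):
    return C if not their_history else their_history[-1]


def play(strategy_a, strategy_b, rounds):
    """Play a fixed number of rounds and return (score_a, score_b)."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a, hist_b), strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


master, slave = make_colluder("master"), make_colluder("slave")
print(play(master, slave, 10))        # (34, 9)
print(play(master, tit_for_tat, 10))  # (19, 14)
```

The slave's 9 points are the "taking one for the team" part. The master's 34 is well above the 30 that each of two Tit for Tat players would earn over the same 10 rounds.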
Does this defeat the purpose? by Snowspinner · 2004-10-14 02:14 · Score: 5, Interesting

This seems to me to be an unfair way to "win." The point of the PD simulation is to talk about whether, in the absence of any social consequences, it is better to screw someone over for money or to work cooperatively with them. It's not a perfect model for that question, but that is still the question that makes us care about the PD in the first place.

All this has done is make a meta-PD game in which the two programs create a meta-game in which they agree to cooperate. That is to say, this is a solution to the PD problem that relies on the cooperation of a cohort (Someone to keep choosing loyalty while you defect and get all the money). Which is exactly not the point of PD.

So the real headline, I think, is "Trivial flaw found in definition of Prisoner's Dilemma problem. University of Southampton wastes money demonstrating flaw instead of writing a goddamn paper like a normal person would."

--
Philip Sandifer's academic website
...by cheating! by The-Bus · 2004-10-14 02:15 · Score: 4, Insightful

If the program recognized that another player was not a Southampton entry, it would immediately defect to act as a spoiler for the non-Southampton player. The result is that Southampton had the top three performers -- but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team. Another twist to the game was the addition of noise, which allowed some moves to be deliberately misrepresented. In the original game, the two prisoners could not communicate. But Southampton's design lets the prisoners do the equivalent of signaling to each other their intentions by tapping in Morse code on the prison wall. Kendall noted that there was nothing in the competition rules to preclude such a strategy, though he admitted that the ability to submit multiple players means it's difficult to tell whether this strategy would really beat Tit for Tat in the original version. But he believes it would be impossible to prevent collusion between entrants.

Yeah, that's not the Prisoner's Dilemma. Or even the Iterated PD. This whole "signaling Morse code" on the prison walls is nonsense, because it was not part of the original plan. Just because it's not in the rules doesn't mean you can do it. In Chess there's no rule specifically against me bringing a SuperGrape(TM) onto the board. The SuperGrape(TM) immediately destroys all pawns on a color of my choosing.

No, it doesn't work that way.

While this is an interesting experiment, it's not a true victory.

--
Small potatoes make the steak look bigger.
1. Re:...by cheating! by billbaggins · 2004-10-14 02:47 · Score: 4, Informative
  
  Actually, this is exactly the sort of thing that the organizers were hoping would happen. From the FAQ, question 12:
  But we don't want to [impose limits on the number of entries] as it will be interesting to see if people can come up with strategies that cooperate with themselves within the whole population.
  
  --
  "The best argument against democracy is a five minute chat with the average voter."
  --Winston Churchill
Evolutionarily stable? by Dr.+Manhattan · 2004-10-14 02:17 · Score: 4, Insightful

From TFA: "Our initial results tell us that ours is an evolutionarily stable strategy -- if we start off with a reasonable number of our colluders in the system, in the end everyone will be a colluder like ours," he said.
It's not clear to me how the entries determined who would be the 'master' and who would be the 'slave'. It seems that if you had lots of 'colluders' around who could be induced to 'suicide' for another's benefit, you'd very quickly get cheaters who worked to be the 'master' in all situations.
This strikes me as a lot more reminiscent of the Hawk/Dove situation.

--
PHEM - party like it's 1997-2003!
Tit for Tat by alexo · 2004-10-14 02:17 · Score: 4, Funny

Why should Tat get all the fun?
Spirit of the PD by johnthorensen · 2004-10-14 02:19 · Score: 4, Interesting

Not precisely cheating, as the rules are set up to play this way...but this certainly violates the spirit of the original Prisoner's Dilemma. Why?

Real prisoners only get to choose ONCE.

By taking advantage of the multiple-iteration aspect of the simulation with this sort of 'portknocking' strategy, the winning programs kind of take a cheap shot at the original PD.

Of course, it's all hypothetical anyway, and come to think of it Tit For Tat technically takes advantage of the multiple-iteration aspect as well by doing whatever the opponent did the last time...

Ah well, at least the Wikipedia entry makes a distinction between regular "Prisoner's Dilemma" and "Iterated Prisoner's Dilemma".
The important codicil to the story is... by aug24 · 2004-10-14 02:19 · Score: 4, Insightful

"The result is that Southampton had the top three performers -- but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team."

J.

--
You're only jealous cos the little penguins are talking to me.
The article got it wrong by nels_tomlinson · 2004-10-14 02:21 · Score: 5, Informative

The article got it wrong: they compared the tit-for-tat strategy for the iterated prisoner's dilemma to mutual assured destruction. That's wrong, since nuclear war is usually considered to be a one-time game: once you've blown each other up, there is no next round. Tit-for-tat requires that there always be a following round.
Repeated games have radically different outcomes than one-time games. It's long been known that where cooperation is possible, cooperation can beat solitary strategies in repeated games. I really don't think there's anything surprising here.

--
See what I've been reading.
1. Re:The article got it wrong by Anonymous Coward · 2004-10-14 02:36 · Score: 5, Insightful
  
  "The article got it wrong: they compared the tit-for-tat strategy for the iterated prisoner's dilemma to mutual assured destruction. That's wrong, since nuclear war is usually considered to be a one-time game: once you've blown each other up, there is no next round. Tit-for-tat requires that there always be a following round."
  
  The nuclear MAD comparison is apt, because of the time lag between launch detection and detonation. During the flight time of the first launch, there is time for several rounds to occur.
  
  Actually, the nuclear standoff could be considered an ongoing PD game with both sides playing Tit-For-Tat strategies. The rounds occur every few minutes with both sides asking "did the other side screw us yet" and responding "no, so we won't screw them yet". This PD game has consisted of millions and millions of turns already, with both players using historical knowledge to influence their current choices.
It is not the first by Flyboy+Connor · 2004-10-14 02:28 · Score: 4, Interesting

Axelrod never claimed that Tit-for-Tat was the best PD-playing program. He just stated that Tit-for-Tat would play well against any other combination of programs. Actually, IIRC, in the second tournament he organised Tit-for-Tat came in second. There was a different program that managed to exploit faults in other programs.
It is easy to score better than Tit-for-Tat in Axelrod's (original) tournament. He included a program that played random moves. It is not difficult to recognise this program after, say, ten moves have been played. You can always defect against random, because its moves are unrelated to its history. So, a program that plays Tit-for-Tat by default, but always defects against Random, scores better than Tit-for-Tat.
Does this dilute Tit-for-Tat's accomplishment? Of course not. Tit-for-Tat still plays well. And it is such a simple strategy that it can be programmed in two lines ("C on move 1, then copy opponent's previous move"), which none of the other programs achieve. Tit-for-Tat is simple, elegant, and strong. It's beautiful.
Southampton entries, on the other hand, are complex, sneaky, and cheating against (perhaps unwritten, but nonetheless agreed-upon) rules. They're ugly. They only prove that backstabbing cheating bastards may defeat just-and-fair if the referee is looking the other way for a moment.
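A Random-detecting variant of Tit-for-Tat might look something like this Python sketch. The detection rule is purely illustrative and does not come from Axelrod or from the comment. A responsive player's move tends to echo our previous move, while a random player echoes it only about half the time. So after ten moves the sketch compares the echo rate with an arbitrary 0.7 cut-off. PROBE_ROUNDS and THRESHOLD are hypothetical knobs.

```python
C, D = "C", "D"
PROBE_ROUNDS = 10   # hypothetical: "after, say, ten moves"
THRESHOLD = 0.7     # hypothetical cut-off for "responsive enough"


def looks_random(my_history, their_history):
    """Heuristic: how often does the opponent's move echo my previous move?"""
    if len(their_history) < PROBE_ROUNDS:
        return False                      # not enough evidence yet
    pairs = list(zip(my_history[:-1], their_history[1:]))
    echoes = sum(mine == theirs for mine, theirs in pairs)
    return echoes / len(pairs) < THRESHOLD


def tft_exploit_random(my_history, their_history):
    """Tit-for-Tat, except always defect against an apparently random player."""
    if looks_random(my_history, their_history):
        return D                          # Random ignores history, so defecting is never punished
    return C if not their_history else their_history[-1]


# A steady cooperator echoes our moves every time, so we stay with Tit-for-Tat.
print(looks_random([C] * 10, [C] * 10))                                # False
# An erratic opponent echoes our (all-C) moves only 3 times out of 9.
print(looks_random([C] * 10, [C, D, C, D, D, C, D, D, C, D]))          # True
```

The rule is crude. For example, an opponent that simply alternates C and D would pass it as responsive. Treat it as an illustration rather than a robust classifier.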
Re:Practicality by Daniel_Staal · 2004-10-14 02:29 · Score: 4, Insightful

Sorry, you've probably already lost that one. The prisoner's dilemma is quite useful in normal life, or at least the thinking that gives rise to the solution is. It applies any time there is significant advantage to be gained by working together, but also much advantage to be the one 'cheating'.

For /., try this interpretation:
If we both share our source code, we will both be more productive.
If I share my source code, and you don't, you can be more productive. (Assuming you can use mine.)
If neither of us shares, we will both have to re-create each other's work...

--
'Sensible' is a curse word.
Did the same thing a few years ago... by Anonymous Coward · 2004-10-14 02:42 · Score: 5, Informative

The length of the code is one of the largest problems to overcome. Performing any signal other than all-cooperate produces a net loss of 1 or 4 points per round for your team in traditional (0,1,3,5) IPD. Simple signalling, i.e. defecting on the 4th round, was very effective. While the master/slave aspect was amazingly effective in my research, the "spoiler" was not. A small population of master/slaves could invade an arbitrarily large block of TitForTat if evolution was by duplicating the winner and removing the loser after n iterations. The population of "spoilers" stagnates very quickly in a large TFT population. TFT should be considered a friend, not an enemy, because they are a positive growth environment. Going "spoiler" on any non-TFT/ally was quite effective, as any bot not prone to cooperate posed the only real risk of "master" losing.
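The "net loss of 1 or 4 points per round" follows directly from the (0,1,3,5) payoffs. It is measured against the 6 points that two cooperating teammates would otherwise share. Here is a Python sketch. The function names are illustrative only.

```python
C, D = "C", "D"
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}  # traditional (0,1,3,5) IPD


def team_total(a, b):
    """Points two teammates earn together in one round."""
    return PAYOFF[(a, b)] + PAYOFF[(b, a)]


BASELINE = team_total(C, C)  # 6: both cooperate

print(BASELINE - team_total(D, C))  # 1: one teammate defects as a signal
print(BASELINE - team_total(D, D))  # 4: both defect as a signal


def handshake_cost(sequence):
    """Team points lost when two teammates both play the same signal sequence."""
    return sum(BASELINE - team_total(move, move) for move in sequence)


print(handshake_cost([C, C, C, D]))  # 4: the "4th round defect" signal
```

This makes a short signal attractive. Every extra D in a mutual handshake costs the team another 4 points.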
The winner basically cheated (good for him :) by jamie · 2004-10-14 02:50 · Score: 4, Interesting

It's pretty trivial that if two or more Dilemma agents are able to recognize each other, they have an advantage over those which cannot. I've got a Prisoner's Dilemma simulation running on my website -- I wrote some code for it over the summer and have been playing around with it on and off.
Once I experimented with letting the agents recognize which "species" they were in and which "species" their opponent was. The runaway winner, of course, was the one which always cooperated with itself, and was less nice to every other species. (In my version, "less nice" meant playing Tit-For-Tat, but the idea's the same.)
Being able to do this is like having the teacher's edition. If recognizing which species other agents belong to is allowed, that's a pretty trivial strategy. It's not called cooperation. It's called xenophobia, or to put it into the most familiar anthropomorphization, racism.
(The life lesson, if I may go out on a limb, is that in an environment where some recognize a quality called "race" and discriminate based on it, being unable to see that quality is a liability. Being truly color-blind means you are unable to recognize not only race but racism, which means you will be taken advantage of.)
When I ran my first tournament and got some interesting results based on this, I realized that knowledge of what "species" an agent belongs to is too powerful, it throws a monkey wrench into the works. So I scrapped it and moved on to stuff I found more interesting.
But the winner of this PD tournament was even craftier; he submitted a ton of entries, all of which were xenophobic in this way, except that they all recognized one "species" as the top dog. The other "species" essentially committed suicide to give the highest score to the top dog. That wouldn't have worked in my tournament, since they literally would have committed suicide (my agents starve to death if they don't score high enough) and that would have shaped the resulting environment. Every tournament is artificial in some way, and the human submitting entries to this one was clever enough to take advantage of these particular artificialities.
Since it's now been shown that inter-agent communication is possible, that's going to be fair game for every tournament from now on. The next step is going to be designing tournaments to work with this trick, not against it. As I wrote to this tournament's organizers:

Since that's such a powerful strategy, I think the next step in PD tournaments is not to try to overcome it, but to embrace it: allow agents to communicate, not just with their own species, but with whoever they're playing against. My guess is that mere xenophobia would be eclipsed by the much more powerful strategy of joining the ongoing discussion about which agents can and can't be trusted. That's the next big feature I want to try.
That's right by Anonymous Coward · 2004-10-14 02:52 · Score: 4, Informative

That's right, traitor (hawk) beats TfT in any given trial.

BUT, in an environment made up of a few players playing each strategy, then you have the following matchups:

Hawk vs Hawk. Horrible horrible loss for both of them.
TfT vs Hawk. Hawk wins, but only by a single round.
TfT vs TfT. Both TfT 'win' - neither betrays the other.

So, overall, TfT does better than hawk.

The interesting part isn't beating TfT (which, as you point out, isn't THAT hard to do) but in doing consistently better than it against a wide variety of programs. Which is what TfT has long been the baseline for.
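The matchup argument is easy to check numerically. This Python sketch makes several assumptions. Games last 10 rounds and use the classic T=5, R=3, P=1, S=0 payoffs. The population has two Hawks (always defect) and two TfT players, and everyone meets everyone else once.

```python
from itertools import combinations

C, D = "C", "D"
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}  # assumed T=5, R=3, P=1, S=0


def hawk(my_history, their_history):
    return D


def tit_for_tat(my_history, their_history):
    return C if not their_history else their_history[-1]


def play(strategy_a, strategy_b, rounds):
    """Play a fixed number of rounds and return (score_a, score_b)."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a, hist_b), strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def round_robin(population, rounds):
    """Every player meets every other player once; return total scores in order."""
    totals = [0] * len(population)
    for i, j in combinations(range(len(population)), 2):
        score_i, score_j = play(population[i], population[j], rounds)
        totals[i] += score_i
        totals[j] += score_j
    return totals


print(round_robin([hawk, hawk, tit_for_tat, tit_for_tat], 10))  # [38, 38, 48, 48]
```

Each Hawk beats every TfT it meets (14 to 9). The TfT players still finish ahead because they collect 30 points from each other, while the two Hawks get only 10 apiece from their own meeting.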
Re:Don't you see the beauty? by kisrael · 2004-10-14 03:45 · Score: 4, Insightful

They "cheated", and the other guy didn't, so they won big! Wasn't that the whole premise?

Well, they kind of went for a win on the "metalevel", utilizing the circumstances of the competition rather than solving the originally stated issue in an abstract way. On the one hand that's cool because evolution can work like that sometimes, but on the other hand, it really isn't answering the original question any more. (The question is "what's probably the best strategy for any given individual in the Prisoner's Dilemma?" and they changed the question to "how can we get some individuals to be super-players with the way this Prisoner's Dilemma simulator is set up?")

--
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
Missing option... by balaam's+ass · 2004-10-14 04:51 · Score: 5, Insightful

I agree that this definition of the "Prisoner's Dilemma" is no more than a "meta-game," and not really a problem of philosophical ethics (though it may appear to be to some people.)

What I find disturbing is that the way the problem is framed presupposes no underlying system of ethics. To wit....
* If you confess and your partner denies taking part in the crime, you go free and your partner goes to prison for five years.
* If your partner confesses and you deny participating in the crime, you go to prison for five years and yor [sic] partner goes free.
* If you both confess you will serve four years each.
* If you both deny taking part in the crime, you both go to prison for two years.
What do you do?

How about: Tell the truth? Regardless of what your partner does, tell the truth. I find it disturbing that the problem is framed in a way that the actual truth of the matter is irrelevant. (i.e. the problem would be unchanged if I replaced "You and your partner have committed a crime and are caught" with "You and a friend have been accused of a crime which you may or may not have committed.")

I'm not trolling or off-topic here. I'm dead serious. This formulation of the PD is ethically doomed from the get-go, and thus the results of the experiment may be of interest to mathematical game theorists of this particular game, but I find it unwise to think the results make any significant implications about ethics (or anything else for that matter).

Someone will counter that since this is a "Prisoner's" dilemma the person involved must be a criminal with no "ethical" principles other than an interest in self-preservation (i.e. the person is already debased and can contribute nothing meaningful on the subject of ethics! ;-) ). I'd say that just because someone committed a crime does not mean they necessarily want to continue committing crimes...
Pavlov, Grim, and the other strategies. by DrRobin · 2004-10-14 05:09 · Score: 4, Interesting

As a microbiologist with an interest in evolution, I have followed this field from afar for years. Looking over the results, I was surprised at how relatively poorly "Pavlov" (win-stay lose-shift) did, since it performs so strongly in noisy, evolutionary versions of the game. [see:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=npg&cmd=Retrieve&db=PubMed&list_uids=8316296&dopt=Abstract]
It was also a bit dismaying to see how well "Grim" (hold a grudge forever) did in both games. In evolutionary versions of the game, Pavlov helps keep down the population of "suckers" (thereby decreasing the food supply for more predatory and parasitic strategies) while still rewarding "provokable" cooperators (thereby increasing the total aggregate "reward" of the ecosystem).
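Here is a Python sketch of why Pavlov copes well with noise. The helper names are assumptions; only the win-stay, lose-shift rule comes from the comment. The sketch injects one accidental defection into a game between two copies of the same strategy. Two Pavlov players are back to mutual cooperation two rounds later. Two Tit for Tat players keep echoing the mistake back and forth.

```python
C, D = "C", "D"


def pavlov(my_history, their_history):
    """Win-stay, lose-shift: repeat my move after a good round, switch after a bad one."""
    if not my_history:
        return C
    if their_history[-1] == C:               # I scored R or T: a "win", so stay
        return my_history[-1]
    return D if my_history[-1] == C else C   # I scored P or S: a "loss", so shift


def tit_for_tat(my_history, their_history):
    return C if not their_history else their_history[-1]


def play_with_error(strategy, rounds, flip_round):
    """Self-play where player A's move is accidentally flipped once (noise)."""
    hist_a, hist_b, moves = [], [], []
    for t in range(rounds):
        a = strategy(hist_a, hist_b)
        b = strategy(hist_b, hist_a)
        if t == flip_round:
            a = D if a == C else C
        hist_a.append(a)
        hist_b.append(b)
        moves.append((a, b))
    return moves


print(play_with_error(pavlov, 6, 1))
# [('C', 'C'), ('D', 'C'), ('D', 'D'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play_with_error(tit_for_tat, 6, 1))
# [('C', 'C'), ('D', 'C'), ('C', 'D'), ('D', 'C'), ('C', 'D'), ('D', 'C')]
```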
Also, one essential part of the payoff structure that deserves emphasis is that the payoff for mutual cooperation has to be more than the average of the winner's and loser's payoffs when one player defects on the other; otherwise both players benefit by simply taking turns exploiting each other. This is a little bit like what the winners did here, where they got the top spots at the cost of a lower total take for their "team". One real-world example of Slashdot interest where this might make sense is if you take these losses in order to eliminate your rivals from the game and then reap monopoly benefits once you control the game (not to mention any names...).
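In the usual notation (T = temptation, R = reward, P = punishment, S = sucker's payoff), this requirement is R > (T + S) / 2, i.e. 2R > T + S. It applies on top of the ordering T > R > P > S. Here is a quick Python check with assumed example values.

```python
def is_iterated_pd(T, R, P, S):
    """Check the payoff conditions usually required for an Iterated Prisoner's Dilemma.

    T: temptation, R: reward, P: punishment, S: sucker's payoff.
    """
    ordering = T > R > P > S
    # Two rounds of taking turns exploiting each other earn T + S per player,
    # versus 2R for two rounds of steady mutual cooperation.
    cooperation_beats_alternation = 2 * R > T + S
    return ordering and cooperation_beats_alternation


print(is_iterated_pd(5, 3, 1, 0))  # True: 6 > 5
print(is_iterated_pd(7, 3, 1, 0))  # False: alternating earns 7, cooperating only 6
```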
Maybe someone who has analyzed the results in more detail could comment on how the various well known strategies fared and why.
Re:Practicality by spitzak · 2004-10-14 05:51 · Score: 4, Insightful

Yes, I agree that public domain code is very much the same as the prisoner's dilemma.

The GPL is an attempt to make it *not* the prisoner's dilemma by forcing the other side to cooperate if you do. This eliminates the losing part of the cooperation choice and thus it is no longer a dilemma.