Volume 8, Number 1, pp. 87–100. http://www.scpe.org © 2007 SWPS
THE SUCCESS OF COOPERATIVE STRATEGIES IN THE ITERATED PRISONER'S DILEMMA AND THE CHICKEN GAME
BENGT CARLSSON∗ AND K. INGEMAR JÖNSSON†
Abstract. The prisoner's dilemma has evolved into a standard game for analyzing the success of cooperative strategies in repeated games. With the aim of investigating the behavior of strategies in some alternative games, we analyzed the outcome of iterated games for both the prisoner's dilemma and the chicken game. In the chicken game, mutual defection is punished more strongly than in the prisoner's dilemma, and yields the lowest fitness. We also ran our analyses under different levels of noise. The results reveal a striking difference in the outcome between the games. The iterated chicken game needed more generations to find a winning strategy. It also favored nice, forgiving strategies, able to forgive a defection from an opponent. In particular, the well-known strategy tit-for-tat has a poor success rate under noisy conditions. The chicken game conditions may be relatively common in other sciences, and therefore we suggest that this game should receive more interest as a cooperative game from researchers within computer science.
Keywords. Game theory, prisoner's dilemma, chicken game, noise, tit-for-tat
1. Introduction. Within computer science, biology, and the social and economic sciences, the issue of cooperation between individuals in an evolutionary context is widely discussed. An evolutionary context means some conflict of interest between the participants, preferably modeled in a game theoretical context using conflicting games. A simple, but frequently used, game model is between two participants, each with two choices, either to cooperate or to defect (a $2 \times 2$ matrix game), played once or repeated. In multi-agent systems, iterated games have become a popular tool for analyzing social behavior and cooperation based on reciprocity ([3, 5, 4, 9]). By allowing games to be played several times and against several other strategies, a "shadow of the future", i.e. a non-zero probability for the agents to meet again in the future, is created for the current game. This increases the opportunity for cooperative behavior to evolve (e.g., [4]). A collection of different models of cooperation and altruism was discussed in Lehmann and Keller [14].
Most iterative analyses on cooperation have focused on the payoff environment defined as the prisoner's dilemma (PD) ([5, 9, 13, 20]). In terms of payoffs, a PD is defined when $T > R > P > S$, where $R$ = reward, $S$ = sucker, $T$ = temptation and $P$ = punishment. It should also hold that $2R > T + S$, according to table 1.1a. The second condition means that the value of the payoff, when shared in cooperation, must be greater than it is when shared by a cooperator and a defector. Because it pays more to defect, no matter how the opponent chooses to act, an agent is bound to defect if the agents are not deriving advantage from repeating the game. If $2R < T + S$ is allowed, there will be no upper limit for the value of the temptation. However, there is no definite reason for excluding this possibility. Carlsson and Johansson [11] argued that Rapoport and Chammah [23] introduced this constraint for practical more than theoretical reasons. PD belongs to a class of games where each player has a dominating strategy of playing defect in the single-play PD.
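The two PD conditions and the dominance argument can be checked mechanically. The following is a minimal sketch, not part of the original analysis; the function names and the `require_2r` flag are illustrative assumptions, and the example payoffs are Axelrod's classic values ($T = 5$, $R = 3$, $P = 1$, $S = 0$).

```python
def is_prisoners_dilemma(T, R, P, S, require_2r=True):
    """True if T > R > P > S holds (and, optionally, 2R > T + S).

    The require_2r flag is a hypothetical switch for the second condition,
    which the text describes as a practical rather than theoretical constraint.
    """
    ordered = T > R > P > S
    if require_2r:
        return ordered and 2 * R > T + S
    return ordered


def defect_dominates(T, R, P, S):
    """In the single-play game, D dominates C if it pays more against C and against D."""
    return T > R and P > S


# Axelrod's classic payoffs: a PD in which defect dominates.
print(is_prisoners_dilemma(5, 3, 1, 0))  # True
print(defect_dominates(5, 3, 1, 0))      # True
# A very large temptation violates 2R > T + S but keeps the ordering.
print(is_prisoners_dilemma(10, 3, 1, 0))                    # False
print(is_prisoners_dilemma(10, 3, 1, 0, require_2r=False))  # True
```

Dropping the second condition only widens the set of accepted games; the dominance of defect in the single-play game depends on the ordering alone.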
Chicken game (CG) is a similar but much less studied game than PD, but see Tutzauer et al. [26] for a recent study. CG is defined when $T > R > S > P$, i.e. mutual defection is punished more in the CG than in the PD. In the single-play form, the CG has no dominant strategy (although it has two Nash equilibria in pure strategies, and one mixed equilibrium), and thus no expected outcome as in the PD [16]. Together with the generous chicken game (GCG), also called the battle of sexes [17] or coordination game, CG belongs to a class of games where neither player has a dominating strategy. For a GCG, playing defect increases the payoff for both of them, unless the other agent also plays defect ($T > S > R > P$). In table 1.1b, $R$ and $P$ are assumed to be fixed to 1 and 0, respectively. This can be obtained through a two-step reduction where $P$ is first subtracted from all variables, which are then divided by $R - P$. This makes it possible to describe the games with only two parameters, $S' = (S - P)/(R - P)$ and $T' = (T - P)/(R - P)$. In fact, we can capture all possible $2 \times 2$ games in a two-dimensional plane. In figure 1.1 the parameter space for PD, CG and GCG, defined by $S'$ and $T'$, is shown. $T' = 1$ marks a dividing line between conflict and cooperation. $S' = 0$ marks the line between CG and PD. $T' < 1$ means that playing cooperate ($R$) is favored over playing defect ($T$) when the other agent cooperates. This prevents an agent from being selfish in a surrounding of cooperation. Conflicting games are expected when $T' > 1$ because of the better outcome of playing temptation ($T$).

∗ School of Engineering, Blekinge Institute of Technology, S-372 25 Ronneby, Sweden, +46 457 385813, bengt.carlsson@bth.se
† Department of Mathematics and Sciences, Kristianstad University, S-291 88 Kristianstad, Sweden, +46 44 203429, ingemar.jonsson@mna.hkr.se

Fig. 1.1. The areas covered by three kinds of conflicting games in a two-dimensional plane: prisoner's dilemma, chicken game and generous chicken game

In an evolutionary context, the payoff obtained from a particular game represents the change in fitness (reproductive success) of a player. Maynard Smith [18] describes an evolutionary resource allocation within a $2 \times 2$ game as a hawk and dove game. In the matrices of table 1.1, a hawk constitutes playing D, and a dove constitutes playing C. A hawk gets all the resources playing against a dove. Two doves share the resource, whereas two hawks escalate a fight about the resource. If the cost of obtaining the resource for the hawks is greater than the resource, there is a CG; otherwise there is a PD. In a generous CG (not a hawk and dove game), more resources are obtained for both agents when one agent defects compared to both playing cooperate or defect.
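The reduction to $S'$ and $T'$ and the resulting regions of the parameter plane can be illustrated with a short sketch. The function names are hypothetical, and the treatment of points lying exactly on a dividing line (reported here as "boundary or other") is an assumption, since the text only names the lines $T' = 1$ and $S' = 0$.

```python
def normalize(T, R, P, S):
    """Map (T, R, P, S) to (S', T'), which fixes R to 1 and P to 0. Assumes R > P."""
    if R <= P:
        raise ValueError("the reduction assumes R > P")
    return (S - P) / (R - P), (T - P) / (R - P)


def classify(T, R, P, S):
    """Name the region of the (S', T') plane that a game falls into."""
    s, t = normalize(T, R, P, S)
    if t <= 1:
        return "non-conflicting"   # T' <= 1: cooperating against a cooperator pays at least as well
    if s < 0:
        return "PD"                # T > R > P > S
    if 0 < s < 1:
        return "CG"                # T > R > S > P
    if 1 < s < t:
        return "GCG"               # T > S > R > P
    return "boundary or other"


print(normalize(5, 3, 1, 0))        # (-0.5, 2.0)
print(classify(5, 3, 1, 0))         # PD
print(classify(2, 1.5, 0.6, 1))     # CG
print(classify(4, 2, 1, 3))         # GCG
```

Because only the two ratios matter, games with very different raw payoffs can occupy the same point in the plane.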
Recent analyses have focused on the effects of mistakes in the implementation of strategies. In particular, such mistakes, usually called noise, may allow evolutionary stability of pure strategies in iterated games [9]. Two separate cases are generally considered: the trembling hand noise and misinterpretations. Within the trembling hand noise ([24, 4]), a perfect strategy would take into account that agents occasionally do not perform the intended action¹. In the misinterpretations case, an agent may not have chosen the wrong action. Instead, it is interpreted as such by at least one of its opponents, resulting in agents keeping different opinions about what happened in the game. This introduction of mistakes represents an important step, as real biological systems as well as computer systems will usually involve uncertainty at some level.
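The difference between the two kinds of noise lies in where the error enters: in the action itself, or in an observer's record of it. A minimal sketch, with hypothetical function names and an assumed error probability `p`:

```python
import random


def flip(move):
    """Return the alternative action."""
    return "D" if move == "C" else "C"


def trembling_hand(intended, p, rng):
    """The agent itself errs: with probability p the opposite action is performed.
    Every player then observes the action that was actually played."""
    return flip(intended) if rng.random() < p else intended


def misinterpret(actual, p, rng):
    """The action is performed as intended, but one observer perceives
    the opposite action with probability p."""
    return flip(actual) if rng.random() < p else actual


rng = random.Random(0)                         # seeded only for reproducibility
played = trembling_hand("C", 0.1, rng)         # what enters the payoff
seen_by_opponent = misinterpret(played, 0.1, rng)  # what the opponent believes was played
print(played, seen_by_opponent)
```

With a trembling hand both agents agree on the history; with misinterpretation their histories may diverge, which is why the two cases are treated separately.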
Here, we study the behavior of strategies in iterated games within the prisoner's dilemma and chicken game payoff structures, under different levels of noise. We first give a background to our simulations, including a round robin tournament and a characterization of the strategies that we use. We then present the outcome of iterated population tournaments, and discuss the implications of our results for game theoretical studies on the evolution of cooperation.
¹ In this metaphor an agent chooses between two buttons. The trembling hand may, by mistake, cause the agent to press the
2. Games, Strategies, and Simulation Procedures.
2.1. Games. A game can be modeled as a strategic or an extensive game. A strategic game is a model of a situation in which each agent chooses his plan of action once and for all, and all agents' decisions are made simultaneously, while an extensive game specifies the possible orders of events. The strategic agent is not informed of the plan of action chosen by any other agent, while an extensive agent can consider its plan of action whenever a decision has to be made. All the agents in our analyses are strategic. All strategies may affect the moves of the other agent, i.e. to play C or D, but not the payoff value, so the latter does not influence the strategy. The kind of games that we simulate here have been called ecological simulations, as distinguished from evolutionary simulations in which new strategies may arise in the course of the game by mutation ([3]). However, ecological simulations include all components necessary for the mimicking of an evolutionary process: variation in types (strategies), selection of these types resulting from the differential payoffs obtained in the contests, and differential propagation of strategies over generations. Consequently, we find the distinction between ecological and evolutionary simulations based on the criteria of mutation rather misleading.
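The three components named above (variation, selection by differential payoff, differential propagation) can be sketched as a single generation step. The update rule below, in which each strategy's share grows in proportion to its average payoff against the current population, is an assumption chosen for illustration; the text does not specify the rule used in the simulations. It also assumes positive payoffs.

```python
def next_generation(shares, payoff):
    """One step of a hypothetical ecological simulation.

    shares[i]    : current population share of strategy i (shares sum to 1)
    payoff[i][j] : average payoff strategy i obtains when meeting strategy j
    No new strategies are created, so there is no mutation.
    """
    n = len(shares)
    # Expected payoff (fitness) of each strategy against the current population.
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(shares[i] * fitness[i] for i in range(n))
    # Differential propagation: shares are rescaled by relative fitness.
    return [shares[i] * fitness[i] / mean_fitness for i in range(n)]


# Two strategies, AllC (index 0) and AllD (index 1), with single-round
# payoffs R = 3, S = 0, T = 5, P = 1.
payoff = [[3, 0],
          [5, 1]]
print(next_generation([0.5, 0.5], payoff))  # AllD grows from 1/2 to about 2/3
```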
The PDs and CGs that we analyze are repeated games with memory, usually called iterated games. In iterated games some background information is known about what happened in the game up to now. In our simulation the strategies know the previous moves of their antagonist². In all our simulations, interactions among players are pair-wise, i.e. a player interacts with only one player at a time.
2.2. Nice and Mean Strategies. Axelrod ([1, 5, 2, 3]) categorized strategies as nice or mean. A nice strategy never plays defection before the other player defects, whereas a mean strategy never plays cooperation before the opponent cooperates. Thus the nice and mean terminology describes an agent's next move.
According to the categorization of Axelrod, Tit-for-tat, TfT, is a nice strategy, but it could as well be regarded as a repeating strategy. Another category of strategies is a group of forgiving strategies consisting of Simpleton, Grofman, and Fair. They can, unlike TfT, avoid getting into mutual defection by playing cooperate. If the opponent does not respond to this forgiving behavior, they start to play defect again. Finally, we separate a group of revenging strategies, which retaliate a defection at some point of the game with defection for the rest of the game. Friedman and Davis belong to this group of strategies.
The principle for the categorization of strategies into nice and forgiving against defecting strategies, which use threats and punishments, is unclear. For instance, why is TfT not just treated as a strategy repeating the action of the other strategy instead?
2.3. Generous and Greedy Strategies. One alternative way of categorizing strategies is to group them together as being generous, even-matched, or greedy ([11, 10]). If a strategy more often plays as a sucker, $n_S$, than playing temptation, $n_T$, then it is a generous strategy, $n_S > n_T$. An even-matched strategy has $n_S \approx n_T$ and a greedy strategy has $n_S < n_T$, where $n_S$ and $n_T$ are the proportions of rounds in which an agent plays sucker and temptation, respectively. Boerlijst et al. [8] use a similar categorization into good or bad standings. An agent is in good standing if it has cooperated in the previous round or if it has defected while provoked, i.e., if the agent is in good standing it should not be greedy unless the other agent was greedy the round before. In every other case of defection the agent is in bad standing, i.e. it tries to be greedy. The generous and greedy categorization uses a stable approach, a once and for all categorization³, contrary to the more dynamic good and bad standing dealing with what happened in the previous move.
The stable approach of the generous and greedy categorization makes it easier to analyze this model. The basis of the partition is that it is a zero-sum game at the meta-level, in that the sum of the proportions $n_S$ over all strategies must equal the sum of the proportions $n_T$. In other words, if there is a generous strategy, then there must also be a greedy strategy.
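The categorization and its zero-sum property can be made concrete by tallying outcomes from a pair of move sequences. This is a sketch under stated assumptions: the function names are illustrative, and the tolerance `tol` used to decide $n_S \approx n_T$ is a hypothetical threshold that the text does not specify.

```python
from collections import Counter


def outcome(mine, theirs):
    """Label one round from a single player's perspective: R, S, T or P."""
    table = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}
    return table[(mine, theirs)]


def tally(moves_a, moves_b):
    """Count the R/S/T/P outcomes of one match for both players."""
    a = Counter(outcome(x, y) for x, y in zip(moves_a, moves_b))
    b = Counter(outcome(y, x) for x, y in zip(moves_a, moves_b))
    return a, b


def categorize(counts, tol=0.05):
    """Generous if n_S > n_T, greedy if n_S < n_T, even-matched within tol (assumed)."""
    total = sum(counts.values())
    n_s, n_t = counts["S"] / total, counts["T"] / total
    if abs(n_s - n_t) <= tol:
        return "even-matched"
    return "generous" if n_s > n_t else "greedy"


a, b = tally("CCDC", "CDDC")
print(categorize(a), categorize(b))       # generous greedy
# Every sucker outcome for one player is a temptation outcome for the other.
print(a["S"] + b["S"] == a["T"] + b["T"])  # True
```

Each S recorded for one player is a T for its opponent, so the totals must balance across any set of pair-wise matches.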
The classification of a strategy can change depending on the surrounding strategies. Let us assume we have the following four strategies:
• Always Cooperate (AllC) has 100 percent co-operate, $n_R + n_S$, when meeting another strategy. AllC will never act as a greedy strategy.
• Always Defect (AllD) has 100 percent defect, $n_T + n_P$, when meeting another strategy. AllD will never act as a generous strategy.
• Tit-for-tat (TfT) always repeats the move of the other contestant, making it a repeating strategy. TfT naturally entails that $n_S \approx n_T$.
• Random plays cooperate and defect approximately half of the time each. The proportions of $n_S$ and $n_T$ will be determined by the surrounding strategies.
Random will be a greedy strategy in a surrounding of AllC and Random, and a generous strategy in a surrounding of AllD and Random. Both TfT and Random will behave as an even-matched strategy in the presence of only these two strategies, as well as in a surrounding of all four strategies with AllC and AllD participating in the same proportions. All strategies are even-matched when there is only a single strategy left.

² One of the strategies, Fair, also remembers its own previous moves.

Fig. 2.1. Proportions of $R$, $S$, $T$ and $P$ for different strategies. There is a generous strategy if $n_S > n_T$ and a greedy strategy if $n_S < n_T$
The strategies used in our iterated prisoner's dilemma (IPD) and iterated chicken game (ICG), in all 14 different strategies plus playing Random, are presented in table 2.1. AllC, AllD and Random do not need any memory function at all because they always do the same thing (which for Random means always randomize). TfT and ATfT need to look back one move because they repeat or reverse the move of their opponent. Most of the other strategies also need to look back one move but may respond to defection or show forgiveness.
AllC definitely belongs to a group of generous strategies and so do 95% Cooperate (95%C), tit-for-two-tats (Tf2T), Grofman, Fair, and Simpleton, in this specific environment.
The even-matched group of strategies includes TfT, Random, and Anti-tit-for-tat (ATfT).
Within the group of greedy strategies, Feld, Davis, and Friedman belong to a smaller family of strategies doing more co-operation moves than Random, i.e. having significantly more than 50% $R$ or $S$. An analogous family consists of Joss, Tester, and AllD. These strategies co-operate less frequently than does Random.
What will happen to a particular strategy depends both on the surrounding strategies and on the characteristics of the strategy. For example, AllC will always be generous while 95%C will change to a greedy strategy when these two are the only strategies left. The described relation between strategies is independent of what kind of game is played, but the actual outcome of the game is related to the payoff matrix.
2.4. Simulation Procedures. The set of strategies used in our first simulation includes some of Axelrod's original strategies and a few, later reported, successful strategies. Of course, these strategies represent only a very limited number of all possible strategies. However, the emphasis in our work is on differences between IPD and ICG. Whether there exists a single "best of the game" strategy is outside the scope of our analyses.
Mistakes in the implementation of strategies (noise) were incorporated by attaching a certain probability $p$, between 0.02 and 20%, to play the alternative action (C or D), and a corresponding probability $(1 - p)$ to play the intended action.

Table 2.1
Description of the different strategies used in the first simulation (see section 3.1)

Strategy    First move   Description
AllC        C            Cooperates all the time
95%C        C            Cooperates 95% of the time
Tf2T        C            Tit-for-two-tats. Cooperates until its opponent defects twice,
                         and then defects until its opponent starts to cooperate again
Grofman     C            Cooperates if R or P was played, otherwise it cooperates with
                         a probability of 2/7
Fair        C            A strategy with three possible states: 'satisfied' (C),
                         'apologizing' (C) and 'angry' (D). It starts in the satisfied
                         state and cooperates until its opponent defects; then it
                         switches to its angry state, and defects until its opponent
                         cooperates, before returning to the satisfied state. If Fair
                         accidentally defects, the apologizing state is entered and it
                         stays cooperating until its opponent forgives the mistake and
                         starts to cooperate again
Simpleton   C            Like Grofman, it cooperates whenever the previous moves were
                         the same, but it always defects when the moves differed
                         (e.g. S)
TfT         C            Tit-for-tat. Repeats the moves of the opponent
Feld        C            Basically a tit-for-tat, but with a linearly increasing (from 0
                         with 0.25% per iteration up to iteration 200) probability of
                         playing D instead of C
Davis       C            Cooperates on the first 10 moves, and then, if there is a
                         defection, it defects until the end of the game
Friedman    C            Cooperates as long as its opponent does so. Once the opponent
                         defects, Friedman defects for the rest of the game
ATfT        D            Anti-tit-for-tat. Plays the complementary move of the opponent
Joss        C            A TfT-variant that cooperates with a probability of 90% when
                         the opponent cooperated, and defects when the opponent defected
Tester      D            Alternates D and C until its opponent defects, then it plays
                         a C and TfT
AllD        D            Defects all the time
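A few of the deterministic strategies in table 2.1, together with the noise mechanism described above, can be sketched as follows. This is an illustration under assumptions rather than the simulation code: each strategy is written as a hypothetical function of its own and its opponent's move history, and Tf2T is read here as defecting only after two consecutive defections.

```python
import random


def all_c(own, opp):
    return "C"

def all_d(own, opp):
    return "D"

def tft(own, opp):
    return opp[-1] if opp else "C"          # repeat the opponent's last move

def atft(own, opp):
    if not opp:
        return "D"                          # first move is D
    return "D" if opp[-1] == "C" else "C"   # play the complementary move

def tf2t(own, opp):
    return "D" if opp[-2:] == ["D", "D"] else "C"   # assumed: two defections in a row

def friedman(own, opp):
    return "D" if "D" in opp else "C"       # never forgives a defection


def play(strat_a, strat_b, rounds, p=0.0, seed=0):
    """Iterated pair-wise game; each chosen move is flipped with probability p."""
    rng = random.Random(seed)
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        if rng.random() < p:
            a = "D" if a == "C" else "C"
        if rng.random() < p:
            b = "D" if b == "C" else "C"
        hist_a.append(a)
        hist_b.append(b)
    return "".join(hist_a), "".join(hist_b)


print(play(tft, all_d, 4))    # ('CDDD', 'DDDD')
print(play(tf2t, all_d, 4))   # ('CCDD', 'DDDD')
print(play(atft, tft, 4))     # ('DDCC', 'CDDC')
```

Setting `p` above zero makes the outcome depend on the seed, so no expected output is given for noisy runs.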
Our population tournament involves two sets of analyses. In the first set, the strategies are allowed to compete within a round robin tournament with the aim of obtaining a general evaluation of the tendency of different strategies to play cooperate and defect. In a round robin tournament, each strategy is paired once with all other strategies plus its twin. The results from the round robin tournament are used within the population tournament but will not be presented here (for the results see [10]). In the second set, the competitive abilities of strategies in iterated population tournaments were studied within the IPD and the ICG. We also conducted a second simulation of the IPD and the ICG where two sets of strategies were used. We used the strategies in figure 2.2, represented by finite automata [15]. The play between two automata is a stochastic process where all finite memory strategies can be represented by increasingly complicated finite automata. Memory-0 strategies, like AllC and AllD, do not involve any memory capacity at all. If the strategy in use only has to look back at one draw, there is a memory-1 strategy (a choice between two circles dependent on the other agent's move). All the strategies in figure 2.2 belong to memory-0 or memory-1 strategies.
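The automaton view of memory-0 and memory-1 strategies can be sketched with a small data structure. The encoding below (a start state, an action per state, and a transition per state and opponent move) is an assumption made for illustration, and the sketch is deterministic, corresponding to a probability factor of 100%, whereas the automata in the text are stochastic.

```python
# Each automaton: (start_state, {state: action}, {(state, opponent_move): next_state})
TFT = ("c", {"c": "C", "d": "D"},
       {("c", "C"): "c", ("c", "D"): "d", ("d", "C"): "c", ("d", "D"): "d"})
ATFT = ("d", {"c": "C", "d": "D"},
        {("c", "C"): "d", ("c", "D"): "c", ("d", "C"): "d", ("d", "D"): "c"})
# Memory-0 strategies need a single state: the opponent's move never changes it.
ALLC = ("c", {"c": "C"}, {("c", "C"): "c", ("c", "D"): "c"})
ALLD = ("d", {"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"})


def run(auto_a, auto_b, rounds):
    """Play two automata against each other and return the joint moves per round."""
    (state_a, act_a, next_a), (state_b, act_b, next_b) = auto_a, auto_b
    moves = []
    for _ in range(rounds):
        a, b = act_a[state_a], act_b[state_b]
        moves.append(a + b)
        # Each automaton moves to a new state based on the other's action.
        state_a, state_b = next_a[(state_a, b)], next_b[(state_b, a)]
    return moves


print(run(TFT, ALLD, 3))   # ['CD', 'DD', 'DD']
print(run(ATFT, TFT, 4))   # ['DC', 'DD', 'CD', 'CC']
```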
Both sets of strategies include AllD, AllC, TfT, ATfT and Random. In the first set of strategies, the cooperative-set, five AllC variants (100, 99.99, 99.9, 99 and 90% probability of playing C) are added. In the second set of strategies, five AllD variants (100, 99.99, 99.9, 99 and 90% probability of playing D) are added. $C_y$ and $D_y$ in figure 2.2 show a probability factor $y$ of 100, 99.99, 99.9, 99 or 90%, or for the Random strategy 50%, for playing C and D, respectively.

Fig. 2.2. a) AllD (and variants) b) TfT c) ATfT d) AllC (and variants). On the transition edges, the left symbol corresponds to an action done by a strategy against an opponent performing the right symbol, where an X denotes an arbitrary action. $y$ in $C_y$ and $D_y$ denotes a probability factor for playing C and D, respectively

3. Population Tournament With Noise.
3.1. First Simulation. We evaluated the strategies in table 2.1 by allowing them to compete within a round robin tournament.
To obtain a more general treatment of IPD and ICG, we used several variants of payoff matrices within these games, based on the general matrix of table 3.1. In this matrix, C stands for cooperate, D for defect, and $q$ is a cost variable.

Table 3.1
Payoff values used in our simulation. $q$ is a cost parameter. $0 < q < 0.5$ defines a prisoner's dilemma game, while $q > 0.5$ defines a chicken game

                Player 2
Player 1        C        D
C               1.5      1
D               2        1.5 - q
The payoff for a D agent playing against a C agent is 2, while the corresponding payoff for a C agent playing against a D agent is 1, etc. Two C agents share the resource and get 1.5 each.
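The dependence of the game type on the cost parameter can be checked directly from the matrix of table 3.1. A minimal sketch with hypothetical function names; the value $q = 0.5$, where $P = S$, is reported as a boundary case, which is an assumption since the table caption only covers $q < 0.5$ and $q > 0.5$.

```python
def payoff_matrix(q):
    """Payoff to player 1, indexed by (own move, opponent move), as in table 3.1."""
    return {("C", "C"): 1.5, ("C", "D"): 1.0, ("D", "C"): 2.0, ("D", "D"): 1.5 - q}


def game_type(q):
    """Classify the game defined by cost q using the payoff orderings."""
    m = payoff_matrix(q)
    T, R, S, P = m[("D", "C")], m[("C", "C")], m[("C", "D")], m[("D", "D")]
    if T > R > P > S:
        return "PD"
    if T > R > S > P:
        return "CG"
    return "boundary"


for q in (0.1, 0.4, 0.5, 0.6, 0.9, 1.5):
    print(q, game_type(q))
# 0.1 PD, 0.4 PD, 0.5 boundary, 0.6 CG, 0.9 CG, 1.5 CG
```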
The outcome of a contest with two D agents depends on $q$. For $0 < q < 0.5$ a PD game is defined, and for $q > 0.5$ we have a CG. Simulations were run with the values for $(1.5 - q)$ set to 1.4 and 1.1 for the PD, and to 0.9, 0.6, and 0.0 for the CG (these values are chosen to span a wide range of the games but are otherwise arbitrary). We also included Axelrod's original matrix Ax ($R = 3$, $S = 0$, $T = 5$ and $P = 1$) and a compromise dilemma game CD ($R = 2$, $S = 2$, $T = 3$ and $P = 1$). A CD is located on the borderline between the CG area and the generous CG area. In the discussion part we also compare the mentioned strategies with a coordination game CoG ($R = 2$, $S = 0$, $T = 0$ and $P = 1$), the only game with $T' < 1$. CoG is included as a reference game and does not belong to the conflicting games. In figure 3.1 all these games are shown within the two-dimensional plane. The CD is closely related to the chicken