
The success of cooperative strategies in the iterated prisoner's dilemma and the chicken game


Volume 8, Number 1, pp. 87-100. http://www.scpe.org © 2007 SWPS

THE SUCCESS OF COOPERATIVE STRATEGIES IN THE ITERATED PRISONER'S DILEMMA AND THE CHICKEN GAME

BENGT CARLSSON AND K. INGEMAR JÖNSSON

Abstract. The prisoner's dilemma has evolved into a standard game for analyzing the success of cooperative strategies in repeated games. With the aim of investigating the behavior of strategies in some alternative games we analyzed the outcome of iterated games for both the prisoner's dilemma and the chicken game. In the chicken game, mutual defection is punished more strongly than in the prisoner's dilemma, and yields the lowest fitness. We also ran our analyses under different levels of noise. The results reveal a striking difference in the outcome between the games. The iterated chicken game needed more generations to find a winning strategy. It also favored nice, forgiving strategies able to forgive a defection from an opponent. In particular, the well-known strategy tit-for-tat has a poor success rate under noisy conditions. The chicken game conditions may be relatively common in other sciences, and therefore we suggest that this game should receive more interest as a cooperative game from researchers within computer science.

Keywords. Game theory, prisoner's dilemma, chicken game, noise, tit-for-tat

1. Introduction. Within computer science, biology, social and economic sciences the issue of cooperation between individuals in an evolutionary context is widely discussed. An evolutionary context means some conflict of interest between the participants, preferably modeled in a game theoretical context using conflicting games. A simple, but frequently used, game model is between two participants each with two choices, either to cooperate or to defect (a $2 \times 2$ matrix game), played once or repeated. In multi agent systems iterated games have become a popular tool for analyzing social behavior and cooperation based on reciprocity ([3, 5, 4, 9]). By allowing games to be played several times and against several other strategies, a shadow of the future, i.e. a non-zero probability for the agents to meet again in the future, is created for the current game. This increases the opportunity for cooperative behavior to evolve (e.g., [4]). A collection of different models of cooperation and altruism was discussed in Lehmann and Keller [14].

Most iterative analyses on cooperation have focused on the payoff environment defined as the prisoner's dilemma (PD) ([5, 9, 13, 20]). In terms of payoffs, a PD is defined when $T > R > P > S$, where $R$ = reward, $S$ = sucker, $T$ = temptation and $P$ = punishment. It should also hold that $2R > T + S$ according to table 1.1a.

The second condition means that the value of the payoff, when shared in cooperation, must be greater than it is when shared by a cooperator and a defector. Because it pays more to defect, no matter how the opponent chooses to act, an agent is bound to defect if the agents are not deriving advantage from repeating the game. If $2R < T + S$ is allowed there will be no upper limit for the value of the temptation. However, there is no definite reason for excluding this possibility. Carlsson and Johansson [11] argued that Rapoport and Chammah [23] introduced this constraint for practical more than theoretical reasons. PD belongs to a class of games where each player has a dominating strategy of playing defect in the single play PD.

Chicken game (CG) is a similar but much less studied game than PD, but see Tutzauer et al. [26] for a recent study. CG is defined when $T > R > S > P$, i.e. mutual defection is punished more in the CG than in the PD. In the single-play form, the CG has no dominant strategy (although it has two Nash equilibria in pure strategies, and one mixed equilibrium), and thus no expected outcome as in the PD [16]. Together with the generous chicken game (GCG), also called the battle of sexes [17] or coordination game, CG belongs to a class of games where neither player has a dominating strategy. For a GCG, playing defect increases the payoff for both of them, unless the other agent also plays defect ($T > S > R > P$).
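The three payoff orderings quoted above can be checked mechanically. The following is a minimal sketch, not part of the original paper: the function names `classify_game` and `satisfies_2r_constraint` are hypothetical, and the example payoffs are Axelrod's matrix and one chicken game setting that the text quotes in section 3.1.

```python
def classify_game(T, R, S, P):
    """Classify a symmetric 2x2 game by the strict payoff orderings in the text."""
    if T > R > P > S:
        return "PD"    # prisoner's dilemma
    if T > R > S > P:
        return "CG"    # chicken game
    if T > S > R > P:
        return "GCG"   # generous chicken game
    return "other"     # ties (borderline games) and all remaining orderings


def satisfies_2r_constraint(T, R, S):
    """Additional PD condition 2R > T + S."""
    return 2 * R > T + S


# Axelrod's matrix: R = 3, S = 0, T = 5, P = 1
print(classify_game(T=5, R=3, S=0, P=1), satisfies_2r_constraint(T=5, R=3, S=0))
# Table 3.1 payoffs with 1.5 - q = 0.9
print(classify_game(T=2, R=1.5, S=1, P=0.9))
```

Games with tied payoffs, such as the compromise dilemma with $R = S$, fall into the "other" branch because the orderings in the text are strict.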

In table 1.1b, $R$ and $P$ are assumed to be fixed to $1$ and $0$ respectively. This can be obtained through a two-step reduction where $P$ is first subtracted from all variables, which are then divided by $R - P$. This makes it possible to describe the games with only two parameters, the rescaled values $S = (S - P)/(R - P)$ and $T = (T - P)/(R - P)$. In fact we can capture all possible $2 \times 2$ games in a two-dimensional plane.
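As a worked illustration of the two-step reduction, the sketch below rescales a payoff set so that $R$ becomes 1 and $P$ becomes 0. The function name `normalize` is an assumption made for this example; the input values are Axelrod's matrix, the compromise dilemma, and the coordination game quoted in section 3.1.

```python
def normalize(T, R, S, P):
    """Return the rescaled pair (S, T) after subtracting P and dividing by R - P."""
    if R == P:
        raise ValueError("R and P must differ for the reduction to be defined")
    scale = R - P
    return (S - P) / scale, (T - P) / scale


# Axelrod's matrix (R = 3, S = 0, T = 5, P = 1)
print(normalize(T=5, R=3, S=0, P=1))
# Compromise dilemma CD (R = 2, S = 2, T = 3, P = 1)
print(normalize(T=3, R=2, S=2, P=1))
# Coordination game CoG (R = 2, S = 0, T = 0, P = 1)
print(normalize(T=0, R=2, S=0, P=1))
```

Under this rescaling the coordination game ends up with a rescaled $T$ below 1, which agrees with the statement in section 3.1 that it is the only listed game with $T < 1$.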

In figure 1.1 the parameter space for PD, CG and GCG, defined by $S$ and $T$, is shown. $T = 1$ marks a dividing line between conflict and cooperation. $S = 0$ marks the line between CG and PD. $T < 1$ means that playing cooperate ($R$) is favored over playing defect ($T$) when the other agent cooperates. This prevents an agent from being selfish in a surrounding of cooperation. Conflicting games are expected when $T > 1$ because of the better outcome of playing temptation ($T$).

School of Engineering, Blekinge Institute of Technology, S-372 25 Ronneby, Sweden, +46 457 385813, bengt.carlsson@bth.se

Department of Mathematics and Sciences, Kristianstad University, S-291 88 Kristianstad, Sweden. +46 44 203429, ingemar.jonsson@mna.hkr.se

Fig. 1.1. The areas covered by three kinds of conflicting games in a two-dimensional plane: prisoner's dilemma, chicken game and generous chicken game

In an evolutionary context, the payoff obtained from a particular game represents the change in fitness (reproductive success) of a player. Maynard Smith [18] describes an evolutionary resource allocation within a $2 \times 2$ game as a hawk and dove game. In the matrices of table 1.1 a hawk constitutes playing D, and a dove constitutes playing C. A hawk gets all the resources playing against a dove. Two doves share the resource whereas two hawks escalate a fight about the resource. If the cost of obtaining the resource for the hawks is greater than the resource there is a CG, otherwise there is a PD. In a generous CG (not a hawk and dove game) more resources are obtained for both agents when one agent defects compared to both playing cooperate or defect.

Recent analyses have focused on the effects of mistakes in the implementation of strategies. In particular, such mistakes, usually called noise, may allow evolutionary stability of pure strategies in iterated games [9]. Two separate cases are generally considered: the trembling hand noise and misinterpretations. Within the trembling hand noise ([24, 4]) a perfect strategy would take into account that agents occasionally do not perform the intended action¹. In the misinterpretations case an agent may not have chosen the wrong action. Instead it is interpreted as such by at least one of its opponents, resulting in agents keeping different opinions about what happened in the game. This introduction of mistakes represents an important step, as real biological systems as well as computer systems will usually involve uncertainty at some level.
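The difference between the two noise cases lies in where the error enters: in the action itself, or in how the action is perceived. The sketch below illustrates this under stated assumptions; the function names, the dictionary layout, and the choice to apply the same probability `p` independently to each agent are ours and are not taken from the paper.

```python
import random


def flip(move):
    """Return the alternative action."""
    return "D" if move == "C" else "C"


def maybe_flip(move, p, rng):
    """Flip a move with probability p."""
    return flip(move) if rng.random() < p else move


def play_round(intent_a, intent_b, p, rng, model):
    """Play one noisy round and report what was played and what each agent saw.

    model="trembling": the executed action may differ from the intention,
    and both agents observe the executed actions.
    model="misinterpretation": actions are executed as intended, but each
    observer may perceive the opponent's action wrongly.
    """
    if model == "trembling":
        a = maybe_flip(intent_a, p, rng)
        b = maybe_flip(intent_b, p, rng)
        return {"played": (a, b), "seen_by_a": b, "seen_by_b": a}
    if model == "misinterpretation":
        return {
            "played": (intent_a, intent_b),
            "seen_by_a": maybe_flip(intent_b, p, rng),
            "seen_by_b": maybe_flip(intent_a, p, rng),
        }
    raise ValueError("unknown noise model: " + model)


rng = random.Random(0)
print(play_round("C", "C", 0.05, rng, "trembling"))
print(play_round("C", "C", 0.05, rng, "misinterpretation"))
```

In the misinterpretation branch the two agents can end up with different records of the same round, which is the "different opinions about what happened" described above.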

Here, we study the behavior of strategies in iterated games within the prisoner's dilemma and chicken game payoff structures, under different levels of noise. We first give a background to our simulations, including a round robin tournament and a characterization of the strategies that we use. We then present the outcome of iterated population tournaments, and discuss the implications of our results for game theoretical studies on the evolution of cooperation.

¹ In this metaphor an agent chooses between two buttons. The trembling hand may, by mistake, cause the agent to press the

2. Games, Strategies, and Simulation Procedures.

2.1. Games. A game can be modeled as a strategic or an extensive game. A strategic game is a model of a situation in which each agent chooses his plan of action once and for all, and all agents' decisions are made simultaneously, while an extensive game specifies the possible orders of events. The strategic agent is not informed of the plan of action chosen by any other agent while an extensive agent can consider its plan of action whenever a decision has to be made. All the agents in our analyses are strategic. All strategies may affect the moves of the other agent, i.e. to play C or D, but not the payoff value, so the latter does not influence the strategy. The kind of games that we simulate here have been called ecological simulations, as distinguished from evolutionary simulations in which new strategies may arise in the course of the game by mutation ([3]). However, ecological simulations include all components necessary for the mimicking of an evolutionary process: variation in types (strategies), selection of these types resulting from the differential payoffs obtained in the contests, and differential propagation of strategies over generations. Consequently, we find the distinction between ecological and evolutionary simulations based on the criteria of mutation rather misleading.

The PDs and CGs that we analyze are repeated games with memory, usually called iterated games. In iterated games some background information is known about what happened in the game up to now. In our simulation the strategies know the previous moves of their antagonist². In all our simulations, interactions among players are pair-wise, i.e. a player interacts with only one player at a time.

2.2. Nice and Mean Strategies. Axelrod ([1, 5, 2, 3]) categorized strategies as nice or mean. A nice strategy never plays defection before the other player defects, whereas a mean strategy never plays cooperation before the opponent cooperates. Thus the nice and mean terminology describes an agent's next move.

According to the categorization of Axelrod, Tit-for-tat (TfT) is a nice strategy, but it could as well be regarded as a repeating strategy. Another category of strategies is a group of forgiving strategies consisting of Simpleton, Grofman, and Fair. They can, unlike TfT, avoid getting into mutual defection by playing cooperate. If the opponent does not respond to this forgiving behavior they start to play defect again. Finally we separate a group of revenging strategies, which retaliate a defection at some point of the game with defection for the rest of the game. Friedman and Davis belong to this group of strategies.

The principle for the categorization of strategies into nice and forgiving against defecting strategies, which use threats and punishments, is unclear. For instance, why is TfT not just treated as a strategy repeating the action of the other strategy instead?

2.3. Generous and Greedy Strategies. One alternative way of categorizing strategies is to group them together as being generous, even-matched, or greedy ([11, 10]). If a strategy more often plays as a sucker, $n_S$, than playing temptation, $n_T$, then it is a generous strategy, $n_S > n_T$. An even-matched strategy has $n_S \approx n_T$ and a greedy strategy has $n_S < n_T$, where $n_S$ and $n_T$ are the proportions an agent plays sucker and temptation, respectively.
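The classification can be made concrete by counting outcomes in a recorded game. The following is a minimal sketch: the function names and the tolerance `tol` used to decide when $n_S \approx n_T$ are assumptions for illustration, since the text does not state how close the two proportions must be.

```python
# Outcome label from the point of view of the agent whose move comes first.
OUTCOME = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}


def outcome_proportions(history):
    """history is a list of (own_move, opponent_move) pairs."""
    if not history:
        raise ValueError("history must contain at least one round")
    counts = {"R": 0, "S": 0, "T": 0, "P": 0}
    for own, other in history:
        counts[OUTCOME[(own, other)]] += 1
    return {label: n / len(history) for label, n in counts.items()}


def classify_strategy(n_s, n_t, tol=0.01):
    """Generous if n_S > n_T, greedy if n_S < n_T, even-matched if close."""
    if abs(n_s - n_t) <= tol:
        return "even-matched"
    return "generous" if n_s > n_t else "greedy"


history = [("C", "D"), ("C", "C"), ("D", "D"), ("C", "D")]
props = outcome_proportions(history)
print(props, classify_strategy(props["S"], props["T"]))
```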

Boerlijst et al. [8] use a similar categorization into good or bad standings. An agent is in good standing if it has cooperated in the previous round or if it has defected while provoked, i.e., if the agent is in good standing it should not be greedy unless the other agent was greedy the round before. In every other case of defection the agent is in bad standing, i.e. it tries to be greedy. The generous and greedy categorization uses a stable approach, a once and for all categorization³, contrary to the more dynamic good and bad standing dealing with what happened in the previous move.

The stable approach of the generous and greedy categorization makes it easier to analyze this model. The basis of the partition is that it is a zero-sum game at the meta-level in that the sum of the proportions $n_S$ over the strategies must equal the sum of the proportions $n_T$ over the strategies. In other words, if there is a generous strategy, then there must also be a greedy strategy.

The classification of a strategy can change depending on the surrounding strategies. Let us assume we have the following four strategies:

Always Cooperate (AllC) has 100 percent co-operate, $n_R + n_S$, when meeting another strategy. AllC will never act as a greedy strategy.

Always Defect (AllD) has 100 percent defect, $n_T + n_P$, when meeting another strategy. AllD will never act as a generous strategy.

Tit-for-tat (TfT) always repeats the move of the other contestant, making it a repeating strategy. TfT naturally entails that $n_S \approx n_T$.

Random plays cooperate and defect approximately half of the time each. The proportions of $n_S$ and $n_T$ will be determined by the surrounding strategies.

² One of the strategies, Fair, also remembers its own previous moves.

³

Fig. 2.1. Proportions of $R$, $S$, $T$ and $P$ for different strategies. There is a generous strategy if $n_S > n_T$ and a greedy strategy if $n_S < n_T$.

Random will be a greedy strategy in a surrounding of AllC and Random, and a generous strategy in a surrounding of AllD and Random. Both TfT and Random will behave as an even-matched strategy in the presence of only these two strategies as well as in a surrounding of all four strategies, with AllC and AllD participating in the same proportions. All strategies are even-matched when there is only a single strategy left.
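For strategies that ignore the history (AllC, AllD, Random), the expected proportions can be computed directly, which reproduces the first claim about Random above. This is a sketch under assumptions that are ours: each strategy is described only by its probability of playing C, all strategies in the surrounding are weighted equally, and the helper name `expected_s_t` is hypothetical. TfT is left out because its moves depend on the opponent's previous move.

```python
def expected_s_t(c_own, surrounding):
    """Expected (n_S, n_T) for a memoryless strategy.

    c_own is the agent's probability of playing C; surrounding is a list of
    the opponents' probabilities of playing C, each weighted equally.
    """
    n = len(surrounding)
    n_s = sum(c_own * (1.0 - c) for c in surrounding) / n        # own C meets D
    n_t = sum((1.0 - c_own) * c for c in surrounding) / n        # own D meets C
    return n_s, n_t


ALLC, ALLD, RANDOM = 1.0, 0.0, 0.5

# Random in a surrounding of AllC and Random: n_S < n_T, i.e. greedy
print(expected_s_t(RANDOM, [ALLC, RANDOM]))
# Random in a surrounding of AllD and Random: n_S > n_T, i.e. generous
print(expected_s_t(RANDOM, [ALLD, RANDOM]))
```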

The strategies used in our iterated prisoner's dilemma (IPD) and iterated chicken game (ICG), in all 14 different strategies plus playing Random, are presented in table 2.1. AllC, AllD and Random do not need any memory function at all because they always do the same thing (which for Random means always randomize). TfT and ATfT need to look back one move because they repeat or reverse the move of their opponent. Most of the other strategies also need to look back one move but may respond to defection or show forgiveness.

AllC definitely belongs to a group of generous strategies and so do 95% Cooperate (95%C), tit-for-two-tats (Tf2T), Grofman, Fair, and Simpleton, in this specific environment.

The even-matched group of strategies includes TfT, Random, and Anti-tit-for-tat (ATfT).

Within the group of greedy strategies, Feld, Davis, and Friedman belong to a smaller family of strategies doing more co-operation moves than Random, i.e. having significantly more than 50% $R$ or $S$. An analogous family consists of Joss, Tester, and AllD. These strategies co-operate less frequently than does Random.

What will happen to a particular strategy depends both on the surrounding strategies and on the characteristics of the strategy. For example, AllC will always be generous while 95%C will change to a greedy strategy when these two are the only strategies left. The described relation between strategies is independent of what kind of game is played, but the actual outcome of the game is related to the payoff matrix.

2.4. Simulation Procedures. The set of strategies used in our first simulation includes some of Axelrod's original strategies and a few, later reported, successful strategies. Of course, these strategies represent only a very limited number of all possible strategies. However, the emphasis in our work is on differences between IPD and ICG. Whether there exists a single "best of the game" strategy is outside the scope of our analyses.

Mistakes in the implementation of strategies (noise) were incorporated by attaching a certain probability $p$ between 0.02 and 20% to play the alternative action (C or D), and a corresponding probability $(1 - p)$ to play


Table 2.1
Description of the different strategies used in the first simulation (see section 3.1)

Strategy    First move   Description
AllC        C            Cooperates all the time
95%C        C            Cooperates 95% of the time
Tf2T        C            Tit-for-two-tats. Cooperates until its opponent defects twice, and then defects until its opponent starts to cooperate again
Grofman     C            Cooperates if R or P was played, otherwise it cooperates with a probability of 2/7
Fair        C            A strategy with three possible states: 'satisfied' (C), 'apologizing' (C) and 'angry' (D). It starts in the satisfied state and cooperates until its opponent defects; then it switches to its angry state, and defects until its opponent cooperates, before returning to the satisfied state. If Fair accidentally defects, the apologizing state is entered and it stays cooperating until its opponent forgives the mistake and starts to cooperate again
Simpleton   C            Like Grofman, it cooperates whenever the previous moves were the same, but it always defects when the moves differed (e.g. S)
TfT         C            Tit-for-tat. Repeats the moves of the opponent
Feld        C            Basically a tit-for-tat, but with a linearly increasing (from 0 with 0.25% per iteration up to iteration 200) probability of playing D instead of C
Davis       C            Cooperates on the first 10 moves, and then, if there is a defection, it defects until the end of the game
Friedman    C            Cooperates as long as its opponent does so. Once the opponent defects, Friedman defects for the rest of the game
ATfT        D            Anti-tit-for-tat. Plays the complementary move of the opponent
Joss        C            A TfT-variant that cooperates with a probability of 90% when the opponent cooperated and defects when the opponent defected
Tester      D            Alternates between D and C until its opponent defects, then it plays a C and TfT
AllD        D            Defects all the time
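Some of the deterministic entries of table 2.1 translate directly into small functions. The sketch below is our reading of the table descriptions for TfT, ATfT, Friedman and Simpleton, without noise; the function signatures and the helper `play` are assumptions for illustration and not the authors' implementation.

```python
def tft(own, opp):
    """Tit-for-tat: start with C, then repeat the opponent's last move."""
    return "C" if not opp else opp[-1]


def atft(own, opp):
    """Anti-tit-for-tat: start with D, then play the complement of the opponent's last move."""
    if not opp:
        return "D"
    return "D" if opp[-1] == "C" else "C"


def friedman(own, opp):
    """Cooperate until the opponent defects once, then defect for the rest of the game."""
    return "D" if "D" in opp else "C"


def simpleton(own, opp):
    """Cooperate if both previous moves were the same, defect if they differed."""
    if not opp:
        return "C"
    return "C" if own[-1] == opp[-1] else "D"


def play(strategy_a, strategy_b, rounds):
    """Let two strategies meet for a number of rounds; return the list of move pairs."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return list(zip(hist_a, hist_b))


print(play(tft, atft, 4))
print(play(friedman, atft, 3))
```

Each strategy receives its own history first and the opponent's history second, so strategies such as Simpleton that need both previous moves can be written in the same form as those that only look at the opponent.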

Our population tournament involves two sets of analyses. In the first set, the strategies are allowed to compete within a round robin tournament with the aim of obtaining a general evaluation of the tendency of different strategies to play cooperate and defect. In a round robin tournament, each strategy is paired once with all other strategies plus its twin. The results from the round robin tournament are used within the population tournament but will not be presented here (for the results see [10]). In the second set, the competitive abilities of strategies in iterated population tournaments were studied within the IPD and the ICG. We also conducted a second simulation of the IPD and the ICG where two sets of strategies were used. We used the strategies in figure 2.2 represented by finite automata [15]. The play between two automata is a stochastic process where all finite memory strategies can be represented by increasingly complicated finite automata. Memory-0 strategies, like AllC and AllD, do not involve any memory capacity at all. If the strategy in use only has to look back at one draw, there is a memory-1 strategy (a choice between two circles dependent on the other agent's move). All the strategies in figure 2.2 belong to memory-0 or memory-1 strategies.
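The pairing rule of the round robin tournament described above, each strategy meeting every other strategy once plus its twin, can be sketched with the standard library. The list `STRATEGIES` simply collects the names from table 2.1 plus Random; the function name is an assumption for illustration.

```python
from itertools import combinations_with_replacement

STRATEGIES = [
    "AllC", "95%C", "Tf2T", "Grofman", "Fair", "Simpleton", "TfT", "Feld",
    "Davis", "Friedman", "ATfT", "Joss", "Tester", "AllD", "Random",
]


def round_robin_pairs(names):
    """All unordered pairings, including each strategy against its twin."""
    return list(combinations_with_replacement(names, 2))


pairs = round_robin_pairs(STRATEGIES)
print(len(pairs))   # n * (n + 1) / 2 pairings for n strategies
```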

Both sets of strategies include AllD, AllC, TfT, ATfT and Random. In the first set of strategies, the cooperative-set, five AllC variants (100, 99.99, 99.9, 99 and 90% probability of playing C) are added. In the [...] probability of playing D) are added. $C_y$ and $D_y$ in figure 2.2 show a probability factor $y$ of 100, 99.99, 99.9, 99, 90% or, for the Random strategy, 50% for playing C and D respectively.

Fig. 2.2. a) AllD (and variants) b) TfT c) ATfT d) AllC (and variants). On the transition edges, the left symbol corresponds to an action done by a strategy against an opponent performing the right symbol, where an X denotes an arbitrary action. $y$ in $C_y$ and $D_y$ denotes a probability factor for playing C and D respectively
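Memory-0 and memory-1 strategies with a probability factor $y$ can be written as three probabilities of playing C: on the first move, after the opponent played C, and after the opponent played D. The encoding below is an assumption made for illustration (the paper represents the strategies as finite automata in figure 2.2); the class and function names are hypothetical, and the first moves of TfT and ATfT follow table 2.1.

```python
import random
from typing import NamedTuple, Optional


class MemoryOne(NamedTuple):
    first: float     # probability of C on the first move
    after_c: float   # probability of C after the opponent played C
    after_d: float   # probability of C after the opponent played D


def all_c(y=1.0):
    """AllC variant: plays C with probability y regardless of history."""
    return MemoryOne(y, y, y)


def all_d(y=1.0):
    """AllD variant: plays D with probability y regardless of history."""
    return MemoryOne(1.0 - y, 1.0 - y, 1.0 - y)


TFT = MemoryOne(1.0, 1.0, 0.0)
ATFT = MemoryOne(0.0, 0.0, 1.0)
RANDOM = all_c(0.5)


def next_move(strategy: MemoryOne, opp_last: Optional[str], rng: random.Random) -> str:
    """Draw the next move given the opponent's last move (None on the first move)."""
    if opp_last is None:
        p_c = strategy.first
    elif opp_last == "C":
        p_c = strategy.after_c
    else:
        p_c = strategy.after_d
    return "C" if rng.random() < p_c else "D"


rng = random.Random(0)
print([next_move(all_c(0.9), None, rng) for _ in range(10)])
```

The memory-0 strategies are the special case where all three probabilities are equal, so the same `next_move` function covers every strategy in both sets.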

3. Population Tournament With Noise.

3.1. First Simulation. We evaluated the strategies in table 2.1 by allowing them to compete within a round robin tournament.

To obtain a more general treatment of IPD and ICG, we used several variants of payoff matrices within these games, based on the general matrix of table 3.1. In this matrix, C stands for cooperate, D for defect, and $q$ is a cost variable.

Table 3.1
Payoff values used in our simulation. $q$ is a cost parameter. $0 < q < 0.5$ defines a prisoner's dilemma game, while $q > 0.5$ defines a chicken game

              Player 2
Player 1      C        D
C             1.5      1
D             2        1.5 - q

The payoff for a D agent playing against a C agent is 2, while the corresponding payoff for a C agent playing against a D agent is 1, etc. Two C agents share the resource and get 1.5 each.
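The payoff matrix of table 3.1 and the role of the cost parameter $q$ can be written down as a small lookup. This is a minimal sketch; the function names are hypothetical, and the "borderline" label for $q = 0.5$ is our own naming for the case the table caption leaves out.

```python
def payoff_matrix(q):
    """Payoff to player 1 for each (player 1 move, player 2 move) pair, as in table 3.1."""
    return {
        ("C", "C"): 1.5,
        ("C", "D"): 1.0,
        ("D", "C"): 2.0,
        ("D", "D"): 1.5 - q,
    }


def game_type(q):
    """PD if mutual defection pays more than the sucker payoff, CG if it pays less."""
    m = payoff_matrix(q)
    sucker, punishment = m[("C", "D")], m[("D", "D")]
    if punishment > sucker:
        return "PD"
    if punishment < sucker:
        return "CG"
    return "borderline"


for q in (0.1, 0.4, 0.6, 0.9, 1.5):
    print(q, payoff_matrix(q)[("D", "D")], game_type(q))
```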

The outcome of a contest with two D agents depends on $q$. For $0 < q < 0.5$, a PD game is defined, and for $q > 0.5$ we have a CG. Simulations were run with the values for $(1.5 - q)$ set to 1.4 and 1.1 for PD, and to 0.9, 0.6, and 0.0 for the CG (these values are chosen with the purpose to span a wide range of the games but are otherwise arbitrarily chosen). We also included Axelrod's original matrix Ax ($R = 3, S = 0, T = 5$ and $P = 1$) and a compromise dilemma game CD ($R = 2, S = 2, T = 3$ and $P = 1$). A CD is located on the borderline between the CG area and the generous CG area. In the discussion part we also compare the mentioned strategies with a coordination game CoG ($R = 2, S = 0, T = 0$ and $P = 1$), the only game with $T < 1$. CoG is included as a reference game and does not belong to the conflicting games. In figure 3.1 all these games are shown within the two-dimensional plane. The CD is closely related to the chicken
