Linköping Studies in Science and Technology
Thesis No. 1363
Dynamic Abstraction for Interleaved Task Planning and Execution
by
Per Nyblom
Submitted to Linköping Institute of Technology at Linköping University in partial
fulfilment of the requirements for degree of Licentiate of Engineering
Department of Computer and Information Science
Linköpings universitet
SE-581 83 Linköping, Sweden
April 2008
ISBN 978-91-7393-905-8
ISSN 0280-7971
LiU-Tek-Lic-2008:21
ABSTRACT
It is often beneficial for an autonomous agent that operates in a complex environment to
make use of different types of mathematical models to keep track of unobservable parts
of the world or to perform prediction, planning and other types of reasoning. Since a
model is always a simplification of something else, there always exists a tradeoff between
the model's accuracy and feasibility when it is used within a certain application due
to the limited available computational resources. Currently, this tradeoff is to a large
extent balanced by humans for model construction in general and for autonomous agents
in particular. This thesis investigates different solutions where such agents are more
responsible for balancing the tradeoff for models themselves in the context of interleaved
task planning and plan execution. The necessary components for an autonomous agent
that performs its abstractions and constructs planning models dynamically during task
planning and execution are investigated and a method called DARE is developed that is a
template for handling the possible situations that can occur such as the rise of unsuitable
abstractions and need for dynamic construction of abstraction levels. Implementations
of DARE are presented in two case studies where both a fully and a partially observable
stochastic domain are used, motivated by research with Unmanned Aircraft Systems.
The case studies also demonstrate possible ways to perform dynamic abstraction and
problem model construction in practice.
This work has been supported by the Swedish Aeronautics Research Council (NFFP4-
S4203), the Swedish National Graduate School in Computer Science (CUGS), the
Swedish Research Council (50405001) and the Wallenberg Foundation (WITAS Project).
Acknowledgements
I would like to thank my advisor Patrick Doherty who has given me a more or
less free hand to investigate this fascinating field of Artificial Intelligence.
It has truly been some of the most interesting years of my life and I apologize
for always picking subjects that you are less familiar with.
During my time at the Artificial Intelligence and Integrated Computer
Systems division (AIICS), I have received valuable input from many people.
Special thanks to Martin Magnusson, Fredrik Heintz, Per-Magnus Olsson,
David Landén, Piotr Rudol and Gianpaolo Conte for commenting on drafts of
this thesis and related papers at various (perhaps dynamically generated)
levels of abstraction.
Thanks to Martin Magnusson for the many interesting
and sometimes endless discussions which really make me grow as a person.
Also thanks to Patrik Haslum for your endless wisdom and for supporting
me during my early development. Thanks to Fredrik Heintz for your sense
of detail and perfection and Jonas Kvarnström for your incredible problem
solving capabilities (and will to share them). Thanks to Tommy Person and
Björn Wingman for your help with all the implementation issues and your
insights into the UAS Tech system.
Finally, I thank my parents Kurt and Gunilla, my girlfriend Anna and
my daughter Anneli for love and support.
Contents
1 Introduction
  1.1 Models and Tradeoffs
  1.2 Task Environments and Models
  1.3 The UAS Tech System
  1.4 Abstractions
  1.5 Planning Model Types
  1.6 Constructing Planning Models
  1.7 Focus of Attention
  1.8 Dynamic Abstraction
  1.9 Dynamic Abstraction for Planning and Execution
    1.9.1 Example
    1.9.2 DARE
  1.10 Related Work
  1.11 Contributions
  1.12 Outline
2 Preliminaries
  2.1 Probability Theory
    2.1.1 Basic Assumptions
    2.1.2 Stochastic Variables
    2.1.3 Distributions and Density Functions
    2.1.4 Joint Distributions
    2.1.5 Conditional Distributions
    2.1.6 Bayes Rule
    2.1.7 Expectation
  2.2 Bayesian Networks
    2.2.1 Definition
    2.2.2 Hybrid Models
    2.2.3 Inference
    2.2.4 Implicit Models
    2.2.5 Model Estimation
  2.3 Dynamic Bayesian Networks
  2.4 Optimization
  2.5 Execution Systems
    2.5.1 Modular Task Architecture
    2.5.2 Other Architectures
    2.5.3 Definition of Skills
3 Dynamic Decision Networks
  3.1 Example
  3.2 Local Reward, Global Utility
  3.3 Solution Techniques
  3.4 Special Case: Markov Decision Processes
    3.4.1 Policy
    3.4.2 Solutions and Solver Methods
    3.4.3 Value Iteration
    3.4.4 Reinforcement Learning
    3.4.5 RL with Model Building
4 The DARE Method
  4.1 Tasks and Beliefs
  4.2 Overview of DARE
  4.3 Execution Assumptions
  4.4 Refinement Assumptions
  4.5 Hierarchical Solution Nodes
  4.6 Subscription VS Poll
  4.7 The Method
    4.7.1 Main
    4.7.2 DynabsSolve
    4.7.3 CreateSubProblems
    4.7.4 ReplanIfNecessary
  4.8 Discussion
5 Case Study I
  5.1 Task Environment Class
  5.2 Task Environment Model
    5.2.1 Danger Rewards
    5.2.2 Observation Target Rewards
  5.3 Skills
  5.4 DARE Implementation
    5.4.1 Problem Models
    5.4.2 Dynamic Abstraction
    5.4.3 Solution Method
    5.4.4 Subproblem Generation
    5.4.5 Replanning Conditions
  5.5 Experiments
    5.5.1 Setup
  5.6 Comments
6 Case Study II
  6.1 Task Environment Class
  6.2 Task Environment Model
  6.3 Skills
  6.4 Belief State and Filtering
  6.5 DARE Implementation
    6.5.1 Planning Model Generation
    6.5.2 Solution Method
    6.5.3 Camera Movement
    6.5.4 DynabsSolve Implementation
    6.5.5 Replanning
  6.6 Experiments
    6.6.1 Setup
    6.6.2 Results
  6.7 Discussion
7 Conclusion
  7.1 Future Work
    7.1.1 Extensions to the Case Studies
    7.1.2 Dynamic Task Environment Models
Chapter 1
Introduction
It is difficult to overestimate the importance of mathematical models in our
modern society because of their common use in e.g. natural sciences and
engineering disciplines for many different purposes. Many types of models
exist today that can be used for a variety of tasks such as predicting
weather, simulating vehicle dynamics, monitoring nuclear power reactors
and verifying computer programs.
Models have also been used within the area of Artificial Intelligence (AI)
to develop autonomous agents. It is widely considered that such agents
should have models of their environments (and themselves) to make it
possible to operate more successfully. The models can for example be used to
keep track of the unobservable parts of the world, perform prediction [34],
task planning [31] and other types of reasoning [10].
1.1 Models and Tradeoffs
One common trait for mathematical models used in practical applications
is that it is not always beneficial (or even possible) to model every aspect of
a system of study down to the smallest detail to make the model as accurate as
possible. The problem is that there is always a tradeoff between accuracy and
feasibility of a model that should be used for a certain application on a given
architecture. There might be a demand for timely response of a system that
prohibits long deliberation time, which in turn can make a highly detailed
but computationally demanding model inappropriate for use in that
particular domain. Although the computational resources that can be made
available for different applications have been increasing exponentially since
the dawn of electronic computers, there will always be a limit when a
particular system is being developed and deployed. This means that one will
always have to trade a model's accuracy for feasibility to get a reasonable
performance in any future system, which is a fact that is often mentioned
in the literature about practical mathematical modelling [44][24].
1.2 Task Environments and Models
When a model is to be constructed for an autonomous agent, it is important
to consider the task environment [63] in which the agent will operate. The
complexity of the task environment can give significant hints about the
different types of models that can be used for whatever the purpose of the
model is.
A task environment, which can be either real or simulated, specifies:
• what the agent can do to the environment with its actuators,
• what information it can receive from its sensors,
• how the environment works and what it contains, and
• what is considered "good or bad" with the help of a performance
  measure.
A task environment for an autonomous ground robot can e.g. specify
that the actuators consist of a propulsion system and possibly a manipulator
arm. Such agents are also typically equipped with sensors such as laser range
scanners, cameras and sometimes collision sensors. The environment may
consist of tables, chairs, walls, stairs etc., and its performance measure may
be defined in terms of power consumption and the time to complete an
assigned task (such as delivering a package).
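The four parts of a task environment, and the ground robot example, can be written down as a small data structure. The following is only an illustrative sketch: the names `TaskEnvironment`, `robot_step` and `robot_performance`, the state variables and the cost figures are all assumptions made here and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, float]


@dataclass(frozen=True)
class TaskEnvironment:
    """Hypothetical container for the four parts listed above."""
    actuators: Tuple[str, ...]             # what the agent can do
    sensors: Tuple[str, ...]               # what the agent can perceive
    contents: Tuple[str, ...]              # what the environment contains
    step: Callable[[State, str], State]    # how the environment works
    performance: Callable[[State], float]  # what counts as good or bad


def robot_step(state: State, action: str) -> State:
    # Assumed toy dynamics: every action takes one second,
    # and driving additionally consumes 2 Wh of energy.
    energy = state["energy_wh"] + (2.0 if action == "drive" else 0.0)
    return {"energy_wh": energy, "time_s": state["time_s"] + 1.0}


def robot_performance(state: State) -> float:
    # Higher is better: power consumption and elapsed time are
    # penalised with equal (arbitrarily chosen) weights.
    return -(state["energy_wh"] + state["time_s"])


ground_robot = TaskEnvironment(
    actuators=("propulsion", "manipulator_arm"),
    sensors=("laser_range_scanner", "camera", "collision_sensor"),
    contents=("table", "chair", "wall", "stairs"),
    step=robot_step,
    performance=robot_performance,
)

# Apply a short action sequence and evaluate the resulting state.
state = {"energy_wh": 0.0, "time_s": 0.0}
for action in ("drive", "drive", "grasp"):
    state = ground_robot.step(state, action)
score = ground_robot.performance(state)
```

Keeping the dynamics and the performance measure as plain functions makes it easy to swap in a coarser or finer model of the same environment, which is the kind of choice the accuracy and feasibility tradeoff is about.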
A model that an agent uses should be closely connected to the task
environment that the agent operates within. For example, if a model is going to
be used for predicting the state of an autonomous agent's task environment
depending on what actions it performs, it had better include specifications of
how the actuators, sensors and the surroundings work in order to be useful.
Such a task environment model cannot be too detailed due to the tradeoff
between accuracy and feasibility.
A task environment or a model thereof can be classified according to
some commonly used dimensions [63] which to a large extent determine
how difficult it is to handle.
Fully Observable or Partially Observable: If the agent's sensors
can give access to all the relevant information in the environment
it is called a fully observable task environment; otherwise the task
environment is called partially observable.
Deterministic or Stochastic: If the next state is completely
determined by the current state and the action executed by the agent the
task environment is called deterministic. If there are several possible
outcomes of an action it is called a stochastic environment. The term
non-deterministic is often used when outcomes do not have proba-
Episodic or Sequential: In an episodic environment, the agent's
current decision does not influence the performance of any future
episode. All environments considered in this thesis will be sequential,
which means that the agent's current decision might influence the
performance of the agent in future states.
Static or Dynamic: A task environment which may change while
the agent deliberates is called a dynamic environment; otherwise it is
called static.
Discrete or Continuous: A continuous environment contains
elements that are more accurately described with continuous models
involving real values instead of an enumerable set of values. Task
environments that do not have any continuous elements are called
discrete.
Single Agent or Multiagent: A task environment where other
external agents, besides the main agent itself, try to reach goals or
maximize their utilities is called multiagent. If the external agents are
better described without decision capabilities, or if no external agents
exist, the environment can be considered single agent.
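The six dimensions above can be captured as a small record, which makes it possible to compare two task environments (or an environment and a model of it) dimension by dimension. This is a minimal sketch; the type names, field names and the two example classifications are assumptions for illustration and are not taken from the thesis or its case studies.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum
from typing import List


class Observability(Enum):
    FULL = "fully observable"
    PARTIAL = "partially observable"


class Outcome(Enum):
    DETERMINISTIC = "deterministic"
    STOCHASTIC = "stochastic"


class Horizon(Enum):
    EPISODIC = "episodic"
    SEQUENTIAL = "sequential"


class Change(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Values(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class Agents(Enum):
    SINGLE = "single agent"
    MULTI = "multiagent"


@dataclass(frozen=True)
class Classification:
    """One value per dimension; all names here are hypothetical."""
    observability: Observability
    outcome: Outcome
    horizon: Horizon
    change: Change
    values: Values
    agents: Agents


def differing_dimensions(a: Classification, b: Classification) -> List[str]:
    # List the names of the dimensions on which two classifications disagree.
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


# Placeholder example: the same environment seen with and without
# full sensor coverage. The concrete values are illustrative only.
env_a = Classification(Observability.FULL, Outcome.STOCHASTIC,
                       Horizon.SEQUENTIAL, Change.DYNAMIC,
                       Values.CONTINUOUS, Agents.SINGLE)
env_b = replace(env_a, observability=Observability.PARTIAL)
diff = differing_dimensions(env_a, env_b)
```

Here `diff` contains only `"observability"`, since the two records agree on every other dimension.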
In this thesis, these dimensions are used to classify the intrinsic
properties of a task environment. They are not assumptions that e.g. a designer
of an agent can make. On the other hand, a designer can make assumptions
that are reflected in the agent's task environment models that it is supposed
to use. Constructed models that represent parts of a task environment must
often be a simplification of the real thing and the different dimensions are
then used to classify the model construction assumptions that are not
already a property of the task environment. This will be discussed more in
Section 1.4.
It is assumed that task environment models can be simulated. This
means that different actions can be tested with the model, which may result
in one or several possible outcomes depending on whether the model is
deterministic or not. Stochastic models can be simulated by pseudorandom
number generators.
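As a sketch of what simulating a stochastic model with a pseudorandom number generator can look like, the example below samples one outcome of an action from a tabulated outcome distribution. The states, actions and probabilities are invented for illustration, and the table-based representation is an assumption made here, not the model format used in the thesis.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical model: (state, action) -> list of (probability, next state).
Model = Dict[Tuple[str, str], List[Tuple[float, str]]]

MODEL: Model = {
    ("at_base", "fly_to_target"): [(0.9, "at_target"), (0.1, "at_base")],
    ("at_target", "fly_to_base"): [(1.0, "at_base")],  # deterministic action
}


def simulate(model: Model, state: str, action: str,
             rng: random.Random) -> str:
    """Sample one possible outcome of executing `action` in `state`."""
    outcomes = model[(state, action)]
    weights = [p for p, _ in outcomes]
    next_states = [s for _, s in outcomes]
    return rng.choices(next_states, weights=weights, k=1)[0]


def rollout(seed: int, n: int) -> List[str]:
    # A seeded generator makes the simulated outcomes reproducible.
    rng = random.Random(seed)
    return [simulate(MODEL, "at_base", "fly_to_target", rng)
            for _ in range(n)]


samples = rollout(seed=0, n=1000)
success_rate = samples.count("at_target") / len(samples)
```

With a deterministic model every entry has a single outcome with probability 1.0, so the same function covers both cases; `success_rate` should come out close to the assumed probability of 0.9.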
A task environment class or environment class is a set of task
environments with similar properties. An agent is often designed to operate in
instances of a particular task environment class where e.g. the environment
can contain a different number of objects and agents but most of the other
properties or assumptions stay the same. In this thesis, the task
environment instances in a particular environment class are assumed to have the
same classification according to the previously mentioned dimensions and
to have similarly modelled actuators and sensors. Within a particular
environment class, the types of the objects in the environment also stay the
same but the number and initial conditions may vary in task environment