
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Bachelor thesis, 16 ECTS | Datateknik

2017 | LIU-IDA/LITH-EX-G–17/077–SE

Implementation and testing of an FPT-algorithm for computing the h+-heuristic

Implementering och testning av en FPT-algoritm för beräkning av h+-heuristiken

Niclas Jonsson

Supervisor: Simon Ståhlberg
Examiner: Christer Bäckström



Copyright

The publishers will keep this document online on the Internet or its possible replacement for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

In this thesis we have implemented and benchmarked an FPT-algorithm that, besides the input problem instance (a planning instance), has two input parameters, k and w. The algorithm has a running time that is exponential as a function of these two parameters. The implemented algorithm computes the heuristic value h+(s) of a state s in a state space that originates from a STRIPS instance. The purpose of the project was to test whether the algorithm can be used to compute the heuristic function h+, i.e. the delete-relaxation heuristic, in practice. The delete-relaxation heuristic value for a state is the length of the optimal solution from that state to a goal in the delete-relaxed instance, which is the original instance without all its negative effects. Planning instances were benchmarked with the search algorithm A* to test the algorithm's practical value. The heuristic function blind was benchmarked together with A* on the same instances so that we could compare the quality of the benchmark results for the implemented algorithm. The conclusion of the project was that the implemented algorithm is too slow to be used in practice.


Acknowledgments

I wish to thank my supervisor Christer Bäckström for his guidance, patience and always helpful comments for my work in this thesis. I would also like to thank Simon Ståhlberg who created one of the planners that I have used in this project. Simon was always quick to offer me help whenever I had problems understanding the planner. I also wish to thank Malte Helmert who created the planner Fast Downward, and the community on Fast Downward Google group who helped me understand the planner.


Contents

Abstract iii

Acknowledgments iv

Contents v

List of Figures vii

List of Tables viii

1 Introduction 1

1.1 Background and Motivation . . . 1

1.2 Problem Definition . . . 4
1.3 Limitations . . . 4
1.4 Thesis Structure . . . 4

2 Planning Theory 5
2.1 STRIPS Instance . . . 5
2.2 Instance Projections . . . 6
2.3 Transition Graph . . . 6

2.4 Domain Transition Graph . . . 7

2.5 Causal Graphs . . . 7

2.6 Search Algorithms and Heuristics . . . 8

3 More Theory 13
3.1 Fixed-Parameter Tractability . . . 13

3.2 Graphs . . . 14

3.3 The Constraint Satisfaction Problem . . . 14

4 Bäckström's Algorithm 16

5 Planners 20
5.1 Modelling Planning Problems . . . 20

5.2 Planning Systems . . . 21

6 The Implementation 22

7 A Literature Review of Tree Decomposition Algorithms 28

8 Result 30

9 Discussion 32


Bibliography 35


List of Figures

1.1 A Block World instance with 3 blocks. . . 2

2.1 A TG of the planning instance . . . 7

2.2 The transition graph of the projection TG(P[{v2}]) from example 7. . . 7

2.3 The CG of the instance in example 10. . . 8

2.4 A graph and two heuristic functions. . . 12

3.1 A tree decomposition of a graph . . . 14

4.1 An illustration of how the constraint Ri,j is defined. . . 17

4.2 A CG for the variables of the instance from example 21. . . 17

4.3 A TD of the undirected causal graph in figure 4.2. . . 18


List of Tables

1.1 A subset of all actions to model the Block World instance in figure 1.1 . . . 3
8.1 The result of the benchmarking when Bäckström's algorithm computed the ...


1 Introduction

This is a bachelor's thesis written with the motivation to analyze the speed and effectiveness of a fixed-parameter tractable algorithm in practice. We will refer to the algorithm as Bäckström's algorithm. While the algorithm can be used to find optimal paths in any SAS instance with acyclic domain transition graphs, we will only use it to compute the heuristic h+. The purpose of a heuristic in this context is to reduce the search time for planning problems. Before we proceed to the background and motivation section, we will quickly explain what an FPT-algorithm is.

A problem with an input length n and a parameter k is said to be in the complexity class FPT if the problem can be solved in O(n^O(1) · f(k)) time, where f is a function which is usually exponential (but may be polynomial) in k. As one can see, the function f is independent of the input size n. The parameter k is a parameter of the input instance which is not strictly related to the input size of the instance. The parameter can, for example, be the maximum vertex degree of a graph or the length of the shortest path. Bäckström's algorithm has three parameters, but only two of them are relevant for computing h+.

1.1 Background and Motivation

Heuristic functions are used in automated planning and scheduling to estimate the distance from a state to the closest goal from that state. The purpose of this is to have an estimate of which of the generated states are closest to the goal. This information can be used by a search algorithm to reduce the part of the search space the algorithm needs to explore. Depending on what the planning instances model, different heuristics may work better than others. We will discuss heuristic functions further below.

We will now describe what a planning problem is and how heuristic functions can be used to solve planning problems faster. A planning problem is generally NP-hard, so it is practically impossible to find an (optimal) solution with an exhaustive search method for these problems. Planning problems are represented by states and actions. To move between states, actions are used. A classical framework to represent planning problems is STRIPS. A STRIPS instance is a tuple of a set of variables, a set of actions, an initial state and a description of the goal states. In the STRIPS framework, each variable is either true or false. There exist other planning frameworks as well. SAS is one of them. SAS is a generalization of STRIPS and allows variables to have more than two values.


Figure 1.1: A Block World instance with 3 blocks.

The state in the figure can be represented as OnTable(A) ∧ On(C,A) ∧ OnTable(B) ∧ Clear(B) ∧ Clear(C) ∧ Empty(), with the variables from list 1.1.

This thesis will only experiment with instances that can be represented within the STRIPS framework.

One of the most classic planning problems that can be represented with STRIPS is the block world problem. We explain the problem below and give an example of how to reach a solution. Parts of the example are borrowed from Amershi et al. [17].

In the block world problem, there is a floor, a robot arm and some blocks that can be moved. A picture of a block world can be seen in figure 1.1. A block can be picked up by the robot arm if there is no block on top of it and if the arm does not hold any other block. The block world instance in figure 1.1 can be modelled with the binary variables in list 1.1. Holding(A) is true if the arm is holding block A, On(A,B) is true if block A is on block B, OnTable(A) is true if block A is standing on the table (and not on a block), Clear(A) is true if no block is on A, and Empty() is true if the arm is not holding any block. With these variables, we can represent all possible states that exist in this instance. A state can be represented by a conjunction over all variables that are satisfied. We need actions to move between states. A few actions for this instance can be seen in table 1.1. The first column shows the names of the actions, the second column the preconditions, i.e. the conditions that need to be satisfied for the action to be applied, the third column the add list, i.e. the conditions that will be added to the current state when the action is applied, and the fourth column the delete list, i.e. the conditions that will be removed from the state. The result of applying an action in a state is another state. To explain the meaning of a few actions: pickup(A,B) means that the arm picks up block A when block A is standing on block B, pickuptable(A) means that block A is picked up from the table, putdown(A,B) means that block A, held by the arm, is placed on block B, and putdowntable(A) means that block A, held by the arm, is placed on the table.



Action           Preconditions                 Add List                      Delete List
pickup(A, B)     empty, clear(A), on(A, B)     holding(A), clear(B)          empty, on(A, B), clear(A)
pickuptable(B)   empty, clear(B), ontable(B)   holding(B)                    empty, ontable(B), clear(B)
putdown(A, B)    holding(A), clear(B)          empty, on(A, B), clear(A)     clear(B), holding(A)
putdowntable(B)  holding(B)                    empty, ontable(B), clear(B)   holding(B)

Table 1.1: A subset of all actions to model the Block World instance in figure 1.1. The action pickup(A, B) is used to pick up block A when block A is on block B, and pickuptable(A) is used to pick up block A when the block is standing on the table.

Example 1 Assume we want to reach a state where On(C,B) holds and we are in the state that figure 1.1 represents, i.e. OnTable(A) ∧ On(C,A) ∧ OnTable(B) ∧ Clear(B) ∧ Clear(C) ∧ Empty(). By applying the action pickup(C, A), we reach the state OnTable(A) ∧ OnTable(B) ∧ Clear(A) ∧ Clear(B) ∧ Holding(C). From this state, we can apply the action putdown(C, B) and reach the state OnTable(A) ∧ On(C, B) ∧ OnTable(B) ∧ Clear(A) ∧ Clear(C) ∧ Empty(), which is a goal state since On(C, B) is satisfied.

• Holding(A)
• Holding(B)
• Holding(C)
• On(A,B)
• On(A,C)
• On(B,A)
• On(B,C)
• On(C,A)
• On(C,B)
• OnTable(A)
• OnTable(B)
• OnTable(C)
• Clear(A)
• Clear(B)
• Clear(C)
• Empty()

List 1.1: A list of variables to model the block world in figure 1.1.
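To make the add- and delete-list mechanics of table 1.1 concrete, here is a minimal Python sketch (not part of the thesis; the fact strings and the helper function are our own illustrative choices) that represents a state as a set of ground facts and applies the pickup action to the state of figure 1.1:

# Minimal sketch: a state is a set of ground facts; an action is (pre, add, delete).
def apply_action(state, pre, add, delete):
    """Return the successor state, or None if the preconditions do not hold."""
    if not pre <= state:          # every precondition must be present in the current state
        return None
    return (state - delete) | add

# The state shown in figure 1.1: C is on A, A and B are on the table, the arm is empty.
initial = {"OnTable(A)", "On(C,A)", "OnTable(B)", "Clear(B)", "Clear(C)", "Empty()"}

# pickup(C, A) from table 1.1: preconditions, add list and delete list.
pre    = {"Empty()", "Clear(C)", "On(C,A)"}
add    = {"Holding(C)", "Clear(A)"}
delete = {"Empty()", "On(C,A)", "Clear(C)"}

print(apply_action(initial, pre, add, delete))
# -> OnTable(A), OnTable(B), Clear(B), Clear(A), Holding(C) (in some order)

The result matches the intermediate state of example 1.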

Even though many conjunctions of true conditions contradict each other (like Holding(A) and Holding(B)) and can therefore never be reached by any action from the initial state, the block world problem is NP-complete, since the number of reachable states grows exponentially with the number of blocks in the instance [11].

In light of this, it is clear that an exhaustive search algorithm like Breadth First Search (BFS) would be too slow to solve the problem when the input size is large. To increase the search speed of the algorithm, we need to find a way to decrease the number of expanded vertices, i.e. the states the search algorithm reaches. One way to decrease the number of expanded vertices is to avoid exploring vertices that are unlikely to be closer to a goal than others. This is the purpose of heuristic functions. A heuristic function takes a state as input and returns a number which indicates how far the state is from reaching a goal. The lower the return value, the closer the state is to reaching the goal (if the heuristic is accurate). A heuristic function returns an estimate of the distance from a state to a goal, since returning the exact distance would defeat the purpose: the heuristic would then need to compute an optimal solution by the kind of exhaustive search we are trying to avoid. It is clear that there is a trade-off between the speed and the precision of heuristic functions.

There are many well-known heuristic functions. One of these is the delete-relaxation heuristic. This method removes all negative conditions from all actions of a planning instance P, which creates an easier planning instance P+. It then finds the shortest solution for P+ and returns the cost of that path. It is easy to see that it is easier (or at least not harder) to reach a goal in a delete-relaxed instance than in a non-relaxed instance, since one can reach states like holding(A) ∧ clear(A) from the initial state in a delete-relaxed instance, which is not possible in a non-relaxed instance.


This is possible in a relaxed instance since no variable is ever set to a negative value in a relaxed instance.

The optimal cost of a delete-relaxed instance seems easy to calculate at first, but the problem is in fact NP-hard, so h+ is normally estimated instead of being computed optimally [6]. We will implement Bäckström's algorithm in this thesis, which can be used to calculate h+ in time that is linear in the input size but exponential in some other parameters of the instance.

1.2 Problem Definition

Is the implementation of Bäckström's algorithm made in this project useful in practice? We will measure its practical value by its running time and by the number of states that are generated for each instance by the search algorithm A*.

We will benchmark Bäckström's algorithm by implementing it as a heuristic function for the search algorithm A*. The benchmark will be done on instances from the IPC (International Planning Competition). The search algorithm A* will then benchmark another well-known heuristic on the same IPC instances, so that we can compare the implemented algorithm's running time with some popular heuristic like the FF heuristic. Two planners will be used for this project: one where we will implement and benchmark Bäckström's algorithm and one where we will benchmark the other heuristic. We use two planners because the planner in which we implement the algorithm has very few heuristics, and the other planner lacks documentation, which makes it problematic to implement algorithms in it. We will run the search algorithm with the heuristic blind, which returns the value 1 for each state, on all instances we use for the benchmarking on both planners. This is done to compare the performance of the two planners.

1.3 Limitations

• The benchmarks will only be performed on a MacBook Pro Retina with a 2.6 GHz Intel Core i5 and 8 GB of memory.

• There are many ways to implement this algorithm and there is a trade-off in speed for some steps but only one implementation will be tested.

• Only unit cost actions will be used.

1.4 Thesis Structure

This section presents the structure of the thesis. Chapter two presents the planning framework STRIPS, graphs that are related to the framework, heuristic functions and the search algorithm A*, which is used for the benchmarking. Chapter three covers the rest of the theory needed to understand Bäckström's algorithm. Bäckström's algorithm is explained in chapter four, and chapter five gives a short description of the two planners and how to use them. In chapter six, the implementation of the algorithm is discussed. Chapter seven contains a literature review of tree decomposition algorithms. The result and the discussion of the result are found in chapter eight and chapter nine, respectively. Chapter ten presents our conclusions of the project and a short discussion about future work related to the project. An example of an input file for the planner in which the algorithm was implemented is found in appendix A.


2 Planning Theory

This chapter introduces the planning theory needed for the reader to understand Bäckström's algorithm, as well as general information about planning theory, including search algorithms and heuristics. More theory, which is needed to understand Bäckström's algorithm, is found in chapter three.

2.1 STRIPS Instance

The following definition is a modification of the SAS planning framework in [2]. The purpose of the modification is to transfer the SAS instance definition to a STRIPS instance definition.

Definition 2 Let V = {v1, . . . , vn} be a finite set of binary variables. A variable is either positive or negative. We denote a positive variable v as v and a negative one as v̄. We define the state space for V as S(V) = {v1, v̄1} × . . . × {vn, v̄n}, and the members of the state space are called states or total states. The value of a variable v in a state s is denoted s[v] ∈ {v̄, v}. A partial state is similarly a partial function over V such that for each vi ∈ V, either s[vi] is undefined or s[vi] ∈ {v̄i, vi}. The notation vars(s) denotes the set of variables vi ∈ V such that s[vi] is defined. The two functions Pos and Neg are defined as Pos(s) = {v | v = s[v], v ∈ vars(s)} and Neg(s) = {v̄ | v̄ = s[v], v ∈ vars(s)}, for a partial state s.

Furthermore, we define the projection of a partial state s such that for a V′ ⊆ V, s[V′] is a partial state over the variables vars(s) ∩ V′.

We are now ready to give the definition of a STRIPS instance. A STRIPS instance is a tuple P = ⟨V, A, sI, sG⟩, where

• V = {v1, . . . , vn} is a finite set of binary variables

• sI is a state

• sG ⊆ V is a subset of positive variables

• A is the set of actions. An action a is a pair ⟨pre, eff⟩ where pre(a) and eff(a) are partial states. Let a ∈ A and let s be a total state. Then a is valid in s if pre(a)[v] = s[v] for all v ∈ vars(pre(a)). Furthermore, the result of a in s is the state t ∈ S(V) such that for all v ∈ V, t[v] = eff(a)[v] if v ∈ vars(eff(a)) and t[v] = s[v] otherwise.


Let s0, sℓ ∈ S(V) and let ω = ⟨a1, . . . , aℓ⟩ be a sequence of actions. Then ω is a plan from s0 to sℓ if either (1) ω = ⟨⟩ and ℓ = 0, or (2) there are states s1, . . . , sℓ−1 ∈ S(V) such that for all i (1 ≤ i ≤ ℓ), ai is valid in si−1 and si is the result of ai in si−1. An action sequence ω is a plan for P if it is a plan from sI to sG.
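Definition 2 and the plan condition above translate almost directly into code. The following Python sketch is not from the thesis: the variable names, action names and goal are made up for illustration, and total states are simply dicts from variables to truth values.

# Sketch of definition 2: total states are dicts variable -> bool, actions are pairs of partial states.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    pre: dict   # partial state: variable -> bool
    eff: dict   # partial state: variable -> bool

def valid(action, state):
    """a is valid in s if pre(a)[v] = s[v] for all v in vars(pre(a))."""
    return all(state[v] == val for v, val in action.pre.items())

def result(action, state):
    """t[v] = eff(a)[v] if v in vars(eff(a)), otherwise t[v] = s[v]."""
    return {**state, **action.eff}

def is_plan(actions, s0, goal_vars):
    """Check that the sequence is applicable from s0 and that all goal variables end up positive."""
    s = dict(s0)
    for a in actions:
        if not valid(a, s):
            return False
        s = result(a, s)
    return all(s[v] for v in goal_vars)

# A tiny hypothetical instance with V = {v1, v2} and three actions.
a = Action("a", {"v1": False, "v2": False}, {"v2": True})
b = Action("b", {"v1": False, "v2": False}, {"v1": True, "v2": False})
c = Action("c", {"v2": True}, {"v1": True})
sI = {"v1": False, "v2": False}
print(is_plan([a, c], sI, {"v1"}))   # True: a makes v2 true, c then makes v1 true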

2.2 Instance Projections

The following definition is taken from [2].

Definition 3 Projections are extended as follows. Let P = ⟨V, D, A, sI, sG⟩ be a planning instance and let V′ ⊆ V. Then, for each action a ∈ A, define a[V′] as the restriction a′ of a where pre(a′) = pre(a)[V′] and eff(a′) = eff(a)[V′]. That is, we treat a and a[V′] as different variants of the same action. Also define A[V′] = {a[V′] | a ∈ A and eff(a[V′]) ≠ ∅}. Then P[V′] = ⟨V′, A[V′], sI[V′], sG[V′]⟩. The projection of an action sequence ω = ⟨a1, . . . , aℓ⟩ over A onto V′ is denoted ω[V′] and defined as follows. First define the sequence ω′ = ⟨a′1, . . . , a′ℓ⟩ such that a′i = ai[V′] for all i (1 ≤ i ≤ ℓ). Then define ω[V′] as the subsequence of ω′ that contains only those a′i where eff(a′i) ≠ ∅. Also here, we consider ω[V′] to be a subsequence of ω, i.e. it consists of actions from ω although in a restricted variant. For all cases, we also define the projection onto a single variable v such that a[v] = a[{v}] etc. The following result is known in the literature (cf. Helmert [12]).

Proposition 4 Let P = ⟨V, D, A, sI, sG⟩ be a planning instance and let V′ ⊆ V. If ω is a plan for P, then ω[V′] is a plan for P[V′].

2.3 Transition Graph

The following definition is taken from [2].

Definition 5 We define the transition graph for a planning instance P = ⟨V, D, A, sI, sG⟩ as the labelled directed graph TG(P) = ⟨S, E⟩, where S = S(V), i.e. the state space of V, and E ⊆ S × A × S such that for all s, t ∈ S and a ∈ A, ⟨s, a, t⟩ ∈ E if a is valid in s and t is the result of a in s. Obviously, the paths from sI to sG in TG(P) correspond to the plans for P.

We will now give an example of a STRIPS instance and visualize its transition graph (TG).

Example 6 Consider the planning instance P = ⟨V, A, sI, sG⟩, where V = {v1, v2} and A contains the following actions:

• a : v̄1v̄2 → v2

• b : v̄1v̄2 → v1v̄2

• c : v2 → v1

The TG of this instance can be seen in figure 2.1.


Figure 2.1: A TG of the planning instance in example 6. The labels on the edges are action names.

Let us now look at a transition graph of a projection of the instance in the example above.

Example 7 The projection onto {v2} of the example above is P[{v2}] = ⟨V, A, sI, sG⟩ where V = {v2}, A contains the actions

• a : v̄2 → v2

• b : v̄2 → v̄2

sI = {v̄2} and sG = ∅.

The transition graph TG(P[{v2}]) can be seen in figure 2.2. The goal set is empty in this projection and therefore any sequence of actions is a solution, including the empty one. The sequences ⟨⟩, ⟨a⟩ and ⟨a, b⟩ are therefore all solutions to P[{v2}].


Figure 2.2: The transition graph of the projection TG(P[{v2}]) from example 7.

The graph is also a domain transition graph (see section 2.4) of variable v2 over the instance, since the instance is projected over only one variable.

2.4 Domain Transition Graph

The following definition is taken from [2].

Definition 8 We define the domain transition graph (DTG) for a variable v ∈ V as DTG(v) = TG(P[v]). The paths from sI[v] to sG[v] in DTG(v) describe all possible ways to go from the initial state to the goal for this particular variable treated in isolation. A domain transition graph (DTG) can be seen in figure 2.2.

2.5 Causal Graphs


Definition 9 The causal graph for a planning instance describes how the variables of the instance depend on each other, as implicitly defined by the actions. Let P = ⟨V, D, A, sI, sG⟩ be a planning instance. Then the causal graph of P is the directed graph CG(P) = ⟨V, E⟩ where E contains the edge ⟨v, w⟩ for every pair of distinct vertices v, w ∈ V such that (1) v ∈ vars(pre(a)) ∪ vars(eff(a)) and (2) w ∈ vars(eff(a)) for some action a ∈ A.

A visualization of a causal graph can be seen in example 10.

Example 10 Let P = ⟨V, A, sI, sG⟩, where V = {v1, v2, v3, v4} and A contains the following actions:

• a : v̄1v̄2 → v2

• b : v̄1v̄2 → v1v̄2

• c : v̄2 → v3

• d : v̄3 → v4

sI = {v̄1, v̄2} and sG = {v1, v2}. The causal graph CG(P) can be seen in figure 2.3.


Figure 2.3: The CG of the instance in example 10.
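Definition 9 is easy to compute mechanically. Below is a hedged Python sketch (our own helper name and data layout, not code from the thesis) that derives the causal-graph edges of example 10 from the precondition and effect variables of each action:

# Causal graph sketch: add edge (v, w) whenever v occurs in pre(a) or eff(a),
# w occurs in eff(a), and v != w, for some action a.
def causal_graph(actions):
    edges = set()
    for pre_vars, eff_vars in actions:
        for w in eff_vars:
            for v in pre_vars | eff_vars:
                if v != w:
                    edges.add((v, w))
    return edges

# Example 10: each action given as (vars(pre), vars(eff)).
actions = [
    ({"v1", "v2"}, {"v2"}),          # a: pre v1, v2; eff v2
    ({"v1", "v2"}, {"v1", "v2"}),    # b: pre v1, v2; eff v1, v2
    ({"v2"}, {"v3"}),                # c: pre v2; eff v3
    ({"v3"}, {"v4"}),                # d: pre v3; eff v4
]
print(sorted(causal_graph(actions)))
# [('v1', 'v2'), ('v2', 'v1'), ('v2', 'v3'), ('v3', 'v4')]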

2.6 Search Algorithms and Heuristics

Heuristic functions are used in automated planning to get an estimate of the length of the shortest path from a state s to the closest goal node from s. A search algorithm can use this information to determine which vertex it should expand next. A planning problem can, for example, model a set of airports and planes where the goal is to fly some given routes so that the total cost for the company that owns the airports is minimized while all laws and security rules are respected. A much simpler planning problem could be a game where the goal is to find a path to a coin in a grid from the player's initial position when some paths are blocked by walls. If it is only possible to move up, down, left and right in the grid, it is not possible to cross a wall, and the positions of the coins are given, a simple heuristic could be the Manhattan distance between the position of the player and the position of the coin. By using this heuristic in a search algorithm, the optimal path might be found faster in comparison to the same search algorithm performing the search without the heuristic function.
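As a concrete illustration of the grid example, a Manhattan-distance heuristic is a one-liner; the Python sketch below is ours and assumes positions are (row, column) pairs:

def manhattan(player, coin):
    """Admissible estimate of the number of up/down/left/right moves needed."""
    return abs(player[0] - coin[0]) + abs(player[1] - coin[1])

print(manhattan((0, 0), (2, 3)))   # 5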

The described heuristic for the grid problem is a so-called admissible heuristic. A heuristic is said to be admissible if and only if the heuristic distance from any vertex to the closest goal is never larger than the actual shortest path to a goal, i.e. ∀s ∀g : h(s, g) ≤ dist(s, g), where s is a state, g is a goal, h is the heuristic function and dist is the length of the shortest path between two states. A simple example of a non-admissible heuristic is a function which returns twice the Manhattan distance from the current position of the player to the coin in the game planning problem described above. This heuristic is not admissible since, if the player is one step away from the coin and there is no wall between the player and the coin, the heuristic returns 2 while the shortest distance is 1.


The disadvantage of using a non-admissible heuristic is that the search algorithm may not find the optimal path. Also, it might slow down the search algorithm instead of guiding it, since a non-admissible heuristic can make a search algorithm expand vertices further and further away from the goal(s). On the other hand, a non-admissible heuristic could have a lower time complexity than an admissible heuristic and therefore make the search algorithm find a solution faster, even if the guidance may be wrong for a few states.

It is vital that the heuristic function that is being used with a search algorithm is accurate, so that it gives good estimates: a heuristic that returns 0 for every state is clearly admissible but also useless, since no planning problem is solved faster with such a heuristic.

Delete-Relaxation Heuristic

We will define and discuss the delete-relaxation heuristic in this section. The definition of the delete-relaxed instance P+ of an instance P will also be given. We start by defining P+ (definition 11), since the definition of the heuristic requires it.

Definition 11 Given a planning instance P = ⟨V, A, sI, sG⟩, P+ is the delete-relaxed instance of P, where P+ = ⟨V, A+, sI, sG⟩ and A+ = {a+ | |eff(a+)| > 0, where a+ = ⟨pre′(a), eff′(a)⟩, pre′(a) = Pos(pre(a)) and eff′(a) = Pos(eff(a))}, for all a ∈ A.

As one can see in the definition above, only the actions in a delete-relaxed instance differ from the original instance: all negative pre- and post-conditions are removed in the delete-relaxed instance. A relaxed action that does not contain any effect can be removed, since such an action does not contribute to a solution and is therefore unnecessary.

Example 12 a+ : v2 → v1 is the relaxed action of the action a : v̄1v2 → v1v̄2.
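A hedged Python sketch of definition 11 (the data layout and helper name are ours, not the thesis implementation): each action keeps only its positive preconditions and effects, and relaxed actions whose effect list becomes empty are dropped.

# Delete-relaxation sketch: a partial state is a dict variable -> bool.
def relax(actions):
    relaxed = []
    for name, pre, eff in actions:
        pre_pos = {v for v, val in pre.items() if val}   # Pos(pre(a))
        eff_pos = {v for v, val in eff.items() if val}   # Pos(eff(a))
        if eff_pos:                                      # drop actions with an empty effect list
            relaxed.append((name + "+", pre_pos, eff_pos))
    return relaxed

# Example 12: the action a with pre {v1 false, v2 true} and eff {v1 true, v2 false}.
actions = [("a", {"v1": False, "v2": True}, {"v1": True, "v2": False})]
print(relax(actions))   # [('a+', {'v2'}, {'v1'})]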

We can now define the delete-relaxation heuristic:

Definition 13 The delete-relaxation heuristic is denoted h+. For a state s in the state space of the instance P = ⟨V, A, sI, sG⟩, the heuristic value for state s is denoted h+(s), which is equal to the length of an optimal solution for P+s = ⟨V, A+, s, sG⟩.

One might think that finding an optimal path in a delete-relaxed instance is simple, but the problem is in fact NP-hard [14]. Therefore, the heuristic value h+ is usually estimated, since it is too time consuming to compute the exact value. That an estimate of h+ is useful in practice has been proven, since estimates of h+ are widely used in the International Planning Competition (IPC). Seven out of eleven competitors used a method that estimated h+ during the third competition [13].

Delete-relaxed instances have a special property which is useful when one is searching for solutions in the transition graph of such an instance. Since the set vars(eff(a+)) contains no negative variables for any action in a delete-relaxed instance, we know that none of the variables in eff(a+) will be negative in any state reachable from a state which was reached by traversing an edge with action a+. Because of this, we know that it is never useful to include an action twice in a path if the instance is a delete-relaxed instance. This makes it easier to search for solutions, since many edges in the graph can be ignored.

The delete-relaxation heuristic is an admissible heuristic. Removing negative effects and preconditions will never make it harder to find a goal, since no variable will be set to false as in the original instance, and therefore the shortest path in a delete-relaxed instance P+ can never be longer than the shortest path in the original instance P.

A delete-relaxed instance's transition graph is always a directed acyclic graph, since it is never possible to travel to a state in which a variable has a negative value from a state where the variable is true. The transition graphs are acyclic for all delete-relaxed instances since the DTGs for all variables in a delete-relaxed instance are also acyclic. This is an important fact, since the input to Bäckström's algorithm must be an instance with acyclic DTGs for all variables of the instance.


For some planning problems only the goal state(s) are interesting, while the path used to reach the goal state(s) is not. An example of this kind of problem is the n-queens problem. An instance of the problem is an n × n chess board and n queens. A solution to the problem is to place the n queens on the board so that none of them threatens any other queen. How a goal state looks for this problem is interesting, but how a goal state is found is not. However, this is not the case for most planning problems. The (optimal) path that ends in a goal state is usually what is interesting. In the grid toy example above, we know that the solution for the problem is to place the player on the same coordinates as the coin. The problem is to find a path which moves the player to that position.

We can search for an optimal solution in the transition graph of any instance we want to find a solution for. There exist many search algorithms with different advantages and disadvantages. Breadth-first search (BFS) is guaranteed to find the optimal path but requires exponential space. Depth-first search (DFS), on the other hand, requires only polynomial space but is not guaranteed to have found the optimal path when it first finds a goal state; in the worst case, DFS needs to find all paths to the goal node before it can be certain that the optimal path to that node has been found. Uniform-cost search, or Dijkstra's algorithm, is a generalization of BFS which can find optimal paths in graphs with weighted edges. One shared disadvantage of the algorithms above is that none of them uses a heuristic. Instead, they all search blindly for a goal node. A search algorithm that uses a heuristic function (the heuristic function is a parameter to the algorithm) is A*, which the next section will cover. This is the search algorithm that was used for the benchmarking.

A*

The search algorithm A* is a popular search algorithm which finds a path between two vertices in a graph. It is used in many real-time strategy games like Warcraft and Age of Empires. The steps of the algorithm are very similar to those of uniform-cost search. The only difference between the two algorithms is that A* uses both the cost of the edges and a heuristic function to determine in which order the vertices should be expanded, whereas uniform-cost search only uses the costs of the edges to determine the order. The search algorithm A* is guaranteed to return an optimal path if the heuristic function it uses is admissible. The heuristic function is usually an input parameter in most A* implementations. If this is the case, then different heuristics can use the same A* implementation. The input parameters to A* are usually a function that computes the neighbours of a state, a start vertex, a goal function and a heuristic function. Note that the transition graph is not an input. It is unlikely that the complete transition graph is needed; it is more likely that just a part of the transition graph is needed to find an (optimal) path. Because of this, and the fact that the size of the transition graph is generally exponential in the number of input variables, we do not want to generate more vertices of the graph than necessary. Instead, we only generate the states that are needed to perform the search. Let us give an example of this.

Example 14 Let the vertex that represents the state of the block world in figure 1.1 be the vertex that is currently expanded. From this state, we can reach the states OnTable(A) ∧ OnTable(B) ∧ Holding(C) and OnTable(A) ∧ On(C, A) ∧ Holding(B) by testing which actions we can apply from this state. The states we can reach are generated as vertices and added to the data structure that stores them.

This approach saves both time and memory. Unfortunately, the time complexity is the same with this approach as with the naive one, which computes the whole transition graph at once. If the transition graph were an input, then the complete TG must have been generated earlier and the optimal path would already have been found, so searching for the optimal path with A* in the TG would be redundant. Therefore, it is never a good idea to generate the complete transition graph, since many vertices in the graph are never reached when we search for a solution.



The pseudo-code for A* can be seen in algorithm 1. The input parameters to the algorithm are h(), which is a heuristic function, sI, which is the start vertex, g(), which is a function that returns true for a vertex if and only if the vertex is a goal state, and neig(), which is a function that computes all reachable neighbours of a vertex. The variable S defined on line 2 is a minimum priority queue, the variable D defined on line 3 is a hash map which maps each vertex to the actual cost of travelling from the initial vertex to it, and on line 4 the hash map P is defined, which maps the parent-child relationship between all vertices. A while-iteration in A* performs the following steps: the algorithm expands the vertex that is at the top of the priority queue S. We call this vertex v. It then checks whether v is a goal vertex. If it is, then the vertex v and the hash map P are returned. The optimal path can be obtained from this map by iterating over P, starting from v, until the initial vertex sI is reached. Otherwise, i.e. if v is not a goal, all neighbours of v that have not been expanded yet are generated and added to S (c(v, u) denotes the cost of the edge from v to u). The vertex u with the lowest sum distTou + h(u) in the priority queue S is first in the queue. This guarantees an optimal solution if h is admissible.

Algorithm 1 A*
 1: function A*(h(), sI, g(), neig())
 2:     S = empty minimum priority queue
 3:     D = empty hash map
 4:     P = empty hash map
 5:     expanded = ∅
 6:     S.add(sI, 0 + h(sI))
 7:     D.add(sI, 0)
 8:     P.add(sI, NULL)
 9:     while S is not empty do
10:         v = S.top(), S.pop()
11:         distTov = D.lookup(v)
12:         expanded = expanded ∪ {v}
13:         if g(v) then
14:             return {P, v}
15:         neighbours = neig(v) − expanded
16:         for all u in neighbours do
17:             P.add(u, v)
18:             D.add(u, distTov + c(v, u))
19:             S.add(u, distTov + c(v, u) + h(u))

Since A* makes a non-constant number of calls to the heuristic function, A*'s time complexity depends on the complexity of the heuristic function. The time complexity of A* is O(|E| + |V| log(|V|) + |V| · Ht), where Ht is the time complexity of the heuristic function. If the blind heuristic is used, i.e. a heuristic that returns the same value for all vertices, then there is no difference between the complexity of A* and uniform-cost search, since returning a constant value is done in constant time; A*'s time complexity is O(|E| + |V| log(|V|)) in that case. The space complexity also depends on the heuristic. The space complexity of A* is O(|V| + |E| + Hs), where Hs is the heuristic's space complexity.
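For readers who prefer runnable code over pseudocode, here is a compact Python sketch of the A* variant in algorithm 1. The function and parameter names are ours, it defaults to unit edge costs unless a cost function is supplied, and it re-pushes improved queue entries instead of using a decrease-key operation; it is a sketch, not the planner's implementation.

import heapq

def astar(h, s_i, is_goal, neighbours, cost=lambda v, u: 1):
    """Return (goal_vertex, parent_map) or None; parent_map reconstructs the path."""
    dist = {s_i: 0}                  # D: cheapest known cost from s_i
    parent = {s_i: None}             # P: parent pointers
    queue = [(h(s_i), s_i)]          # S: min-priority queue ordered by dist + h
    expanded = set()
    while queue:
        _, v = heapq.heappop(queue)
        if v in expanded:
            continue
        expanded.add(v)
        if is_goal(v):
            return v, parent
        for u in neighbours(v):
            d = dist[v] + cost(v, u)
            if u not in dist or d < dist[u]:
                dist[u] = d
                parent[u] = v
                heapq.heappush(queue, (d + h(u), u))
    return None

# Tiny usage example on an explicit graph (made up for illustration).
graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
goal, parents = astar(lambda v: 0, "S", lambda v: v == "G", lambda v: graph[v])
path = []
while goal is not None:
    path.append(goal)
    goal = parents[goal]
print(path[::-1])   # ['S', 'A', 'G'] (a shortest path of cost 2)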

We finish this section with an example that demonstrates how A* works and how the algorithm can fail to find an optimal path if the heuristic it uses is not admissible.

Example 15 We will search for a solution with A* in the graph in figure 2.4a in this example. The vertex S is the start vertex and G is the only goal vertex in the graph. We have two heuristic functions: H1 and H2. The values of the two heuristic functions can be seen in table 2.4b. The heuristic function H1 is admissible, since the heuristic value for each vertex is smaller than the distance to the goal.


(a) A graph with a start vertex S and a goal vertex G. The number over each edge is the cost for traversing over it.

Vertex  H1  H2
S       4   4
A       4   4
B       3   4
C       2   2
D       2   8
G       0   0

(b) Two heuristics for the graph.

Figure 2.4: A graph and two heuristic functions.

This is not the case for H2, since the heuristic value for vertex D is 8, which is larger than the actual distance from D to the goal. If we use H1 with A* to find the goal from S, we expand the vertices in the order S, B, D, G. We then stop and return a path, since we have explored G. The path that would be returned is S, B, D, G, since G was found by D, D by B and B by S. This path is optimal.

On the other hand, if we use heuristic H2 instead of H1, we expand the vertices in the order S, B, A, C, G. The path we would return is then S, A, C, G, since G was found by C, C by A and A by S. This path is not optimal. The example illustrates how A* can find a non-optimal path if it uses a non-admissible heuristic.


3 More Theory

This chapter presents the complexity class FPT, a few definitions from graph theory and the constraint satisfaction problem.

3.1 Fixed-Parameter Tractability

Definition 16 Fixed-parameter tractability (FPT) is a complexity class like P and NP. A problem P is said to be in the complexity class FPT if there exists a deterministic algorithm that solves each instance (n, k) of the problem P in O(n^O(1) · f(k)) time, where n is the input size of the problem, k is some parameter of the input instance and f is a function which is independent of n.

Under the assumption that P ≠ NP, many important problems in the complexity class NP can only be solved in exponential running time, but if the problem is in FPT, the exponential part of an algorithm that solves it does not need to be a function of the input size of the problem, only of some other parameter of the instance.

Let us give an example of this. The Knapsack problem is a well-known NP-complete problem that can be solved in O(2^n) time by an exhaustive search method. That algorithm is exponential in n. But the knapsack problem can also be solved in O(nW), where W is the capacity of the knapsack. The running time O(nW) looks polynomial but is only pseudo-polynomial, since W is an integer and the input size of W in bits is c = ⌈log2(W)⌉, so iterating W times takes O(2^c) time, which is exponential in the input size. Under the assumption that the parameter W is fixed for all instances of the Knapsack problem, the problem would be solvable in polynomial time, since O(nW) = O(n) under this assumption. The knapsack problem would then be tractable (in theory). This is why the class is named fixed-parameter tractability. FPT-algorithms can solve problems faster than algorithms that are exponential in the input size, but this is not always the case.
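The O(nW) behaviour mentioned above is the classic dynamic program over capacities. A short Python sketch of it (items and capacity made up for illustration, helper name ours):

def knapsack(items, W):
    """items: list of (weight, value); returns the best value within capacity W.
    The table has W + 1 entries updated once per item, hence the O(n * W) running time."""
    best = [0] * (W + 1)
    for weight, value in items:
        for cap in range(W, weight - 1, -1):     # iterate capacities downwards
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[W]

print(knapsack([(3, 4), (4, 5), (2, 3)], 6))   # 8: take the items of weight 4 and 2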

The class FPT belongs to parameterized complexity, which is a branch of computational complexity theory, and FPT is one of many complexity classes in this branch. The following hierarchy exists in the parameterized complexity branch: FPT ⊆ W[1] ⊆ W[2] ⊆ W[3] ⊆ · · · ⊆ W[P] ⊆ XP. An XP-problem can be solved in O(n^f(k)) time. This branch is very important under the assumption that P ≠ NP, since it gives us an alternative way to solve NP-complete problems. On the other hand, if P = NP, then all classes in this hierarchy would be equal to each other and the branch would not be so interesting.


Figure 3.1: A tree decomposition of a graph

The graph to the right is a tree decomposition of the graph to the left.

3.2 Graphs

Tree decompositions

The following definition is taken from [2]:

Definition 17 A tree decomposition (TD) of a graph G = ⟨V, E⟩ is a tuple ⟨N, T⟩ where N = {N1, . . . , Nn} is a family of subsets of V and T is a tree with nodes N1, . . . , Nn, satisfying the following properties (the term node is used to refer to a vertex of T to avoid confusion with vertices of G):

1. The union of all sets Ni equals V. That is, each graph vertex is contained in at least one tree node.

2. For each vertex v ∈ V, the tree nodes of N containing vertex v form a connected subtree of T.

3. For every edge {v, w} in the graph, there is a subset Ni that contains both v and w. That is, vertices are adjacent in the graph only when the corresponding subtrees have a node in common.

The width of a tree decomposition is the size of its largest set Ni minus one. The treewidth of a graph G is the minimum width among all possible tree decompositions of G.
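The three conditions of definition 17 can be checked mechanically. The following Python sketch is ours (the graph at the end is a small hypothetical example, not the graph of figure 3.1), and it also reports the width of the decomposition:

def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """bags: dict node -> set of graph vertices; tree_edges: edges between bag nodes."""
    # 1. Every graph vertex appears in at least one bag.
    cond1 = all(any(v in bag for bag in bags.values()) for v in vertices)
    # 3. Every graph edge {u, w} is contained in some bag.
    cond3 = all(any({u, w} <= bag for bag in bags.values()) for u, w in edges)
    # 2. For each vertex, the bag nodes containing it induce a connected subtree (BFS check).
    def connected(nodes):
        if not nodes:
            return True
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for a, b in tree_edges:
                for x, y in ((a, b), (b, a)):
                    if x == n and y in nodes and y not in seen:
                        seen.add(y)
                        stack.append(y)
        return seen == nodes
    cond2 = all(connected({i for i, bag in bags.items() if v in bag}) for v in vertices)
    width = max(len(bag) for bag in bags.values()) - 1
    return cond1 and cond2 and cond3, width

# A small hypothetical graph: a 4-cycle a-b-c-d-a with the chord a-c.
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
bags = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
tree_edges = [(1, 2)]
print(is_tree_decomposition(vertices, edges, bags, tree_edges))   # (True, 2)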

The nodes of a tree decomposition are sometimes referred to as bags and are denoted Xi in the literature. We will use this notation in chapter 6, where we describe the implementation of Bäckström's algorithm.

The Center of a Graph

The center vertices of a graph are the vertices with the smallest eccentricity. A vertex's eccentricity is the largest distance between the vertex and any other vertex in the graph. A tree always has one or two center vertices.

3.3 The Constraint Satisfaction Problem

The following definitions are taken from [2]:

Definition 18 An instance of the constraint satisfaction problem (CSP) is a triple C = ⟨X, D, C⟩, where X is a finite set of variables, D is a domain function assigning a finite domain to each variable and C is a finite set of constraints. Each constraint in C is a tuple ⟨t, R⟩ where t is a sequence ⟨xi1, . . . , xir⟩ of variables from X and R is a relation R ⊆ D(xi1) × . . . × D(xir). An instantiation of C is a mapping α that maps each xi ∈ X to an element in D(xi). A solution for C is an instantiation α that satisfies all constraints in C, i.e. R(α(xi1), . . . , α(xir)) holds for every constraint ⟨⟨xi1, . . . , xir⟩, R⟩ in C. When all constraint relations are binary, the constraint graph for C is the graph G = ⟨X, E⟩ where E contains the edge {xi, xj} whenever there is some constraint ⟨t, R⟩ such that t = ⟨xi, xj⟩ or t = ⟨xj, xi⟩.

Definition 19 An instance of the constraint satisfaction optimization problem (CSOP) is a CSP instance C = ⟨X, D, C, W⟩ with the additional parameter W, which contains a weight (or cost) function w : D(x) → Q≥0 for each x ∈ X. The weight of an instantiation α is defined as w(α) = Σx∈X w(α(x)). A solution to C is either the answer 'no', if there is no satisfying instantiation, or the minimum value of w(α) over all satisfying instantiations α.

Here is an example of a basic CSP problem:

Example 20 We are given a CSP instance with the variables x1, x2 and x3, the domains D(x1) = {1, 2}, D(x2) = {1, 2, 3, 4} and D(x3) = {7, 8, 9}, and the constraints R1,2 ⊆ D(x1) × D(x2) = {(x1, x2) | x1 + x2 = 6} and R2,3 ⊆ D(x2) × D(x3) = {(x2, x3) | x2 + x3 = 12}. The mapping α is a solution to the problem if it maps, for example, x1 to 2, x2 to 4 and x3 to 8.

Constraint satisfaction problems are NP-complete in general, but if all constraints of the instance are binary and the constraint graph of the instance is a tree, then the instance can be solved in O(d²n) time, where d is the size of the largest domain and n is the number of variables in the instance [7].
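The O(d²n) bound comes from processing the constraint tree bottom-up: for each value of a parent variable we only need the cheapest compatible value of each child. A hedged Python sketch of that idea for the optimisation (CSOP) variant is shown below; the instance at the end is made up for illustration and the function names are ours.

import math

def solve_tree_csop(domains, weights, tree, constraints, root):
    """domains: var -> list of values; weights: (var, value) -> cost;
    tree: var -> list of child vars; constraints: (parent, child) -> set of allowed value pairs.
    Returns the minimum total weight, or None if the instance is unsatisfiable."""
    def best(var):
        # cost of the subtree rooted at var, for each value of var
        table = {val: weights[(var, val)] for val in domains[var]}
        for child in tree.get(var, []):
            child_table = best(child)
            for val in list(table):
                compatible = [child_table[cv] for cv in domains[child]
                              if (val, cv) in constraints[(var, child)]]
                table[val] += min(compatible) if compatible else math.inf
        return table
    answer = min(best(root).values())
    return None if answer == math.inf else answer

# Made-up instance: x1 is the root with children x2 and x3; the weight of a value is the value itself.
domains = {"x1": [1, 2], "x2": [1, 2, 3], "x3": [2, 3]}
weights = {(v, d): d for v in domains for d in domains[v]}
tree = {"x1": ["x2", "x3"]}
constraints = {("x1", "x2"): {(1, 2), (2, 1), (2, 3)},   # allowed (parent, child) pairs
               ("x1", "x3"): {(1, 3), (2, 2)}}
print(solve_tree_csop(domains, weights, tree, constraints, "x1"))   # 5 (x1=2, x2=1, x3=2)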


4 Bäckström's Algorithm

This chapter presents Bäckström's algorithm, gives an example of each step of the algorithm on an example instance, and explains why the algorithm works.

The input to the algorithm in this project is always a delete-relaxed instance P+ = ⟨V, A+, sI, sG⟩, since we will use the algorithm to compute the heuristic h+. The algorithm reduces the planning problem P+ to a CSOP problem C = ⟨X, D, C, W⟩ with a binary constraint graph, in FPT-time. The optimal solution for C is the length of an optimal path for the delete-relaxed instance P+. One can often construct several optimal solutions for P+ from an optimal solution to the CSOP C, but we do not need to construct any of these paths, since we are only interested in the length of an optimal path for P+, which is equal to the value of the heuristic function h+.

The first step of the algorithm is to compute the causal graph CG(P+). A tree decomposition ⟨N, T⟩ of the causal graph CG(P+) is then computed. The undirected causal graph UCG(P+) is the input to the tree decomposition algorithm instead of the directed causal graph CG(P+), since there is no definition of a tree decomposition for a directed graph. A transition graph TG(P+[Ni]) is then created for each vertex Ni ∈ N, once the tree decomposition ⟨N, T⟩ has been computed.

We are now ready to construct the CSOP problem C = ⟨X, D, C, W⟩: for each variable Ni ∈ N, there is a corresponding variable xi ∈ X. The domain set D(xi), for each xi ∈ X, is the set of all solution sequences of the transition graph TG(P+[Ni]) in which an action occurs at most once. This is due to the fact that a variable that is true in a state s can never be false in a state that is reachable from s, so repeating actions is equivalent to looping in any of the transition graphs TG(P+[Ni]), Ni ∈ N, that have been generated. No sequence in any domain D(xi) can therefore be longer than the number of actions in TG(P+[Ni]), since we do not allow repeated actions in sequences.

Before we define the constraint set R of the CSOP problem C, we define the weight function W, and before we do that, the sets Ai need to be defined. We define the sets Ai as Ai = {a | v ∈ vars(eff(a)), v ∈ Ni, ∄j < i : v ∈ Nj}, for all a ∈ A+. The sets Ai are a partition of the set A+. The weight of a solution is the cardinality of the union of all actions in all sequences of the solution, since each action is only counted once, and we define the weight function W as W : D(xi) → |A+ ∩ Ai|. Finally, we define the set of constraints R as follows: Ri,j ⊆ D(xi) × D(xj) such that {ωi, ωj} ∈ Ri,j ⟺ ωi[Ni ∩ Nj] = ωj[Ni ∩ Nj], where ωi ∈ D(xi), ωj ∈ D(xj) and i < j.


Figure 4.1: An illustration of how the constraint Ri,j is defined.

(The figure is used with permission from the author of [2].)
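The constraint check itself is just a comparison of two action sequences projected onto the shared variables of the two bags. A minimal Python sketch of that comparison (our own simplified representation: an action is a name plus the set of variables it affects, and only effects are considered, which is enough for comparing the solution sequences in this illustration):

def project(sequence, shared_vars, effects):
    """Keep the actions whose effect variables intersect the shared variables,
    restricted to those variables (compare with the projection in definition 3)."""
    projected = []
    for action in sequence:
        restricted = effects[action] & shared_vars
        if restricted:
            projected.append((action, frozenset(restricted)))
    return projected

def satisfies_constraint(seq_i, seq_j, bag_i, bag_j, effects):
    """R_ij holds if both sequences project to the same sequence on bag_i ∩ bag_j."""
    shared = bag_i & bag_j
    return project(seq_i, shared, effects) == project(seq_j, shared, effects)

# Effect variables of the relaxed actions of example 21 below.
effects = {"a": {"v1"}, "b": {"v2"}, "c": {"v3"}, "d": {"v4"}, "e": {"v4"}}
N1, N2 = {"v2", "v4"}, {"v4"}
print(satisfies_constraint(["b", "d"], ["d"], N1, N2, effects))   # True: both project to <d>
print(satisfies_constraint(["b", "d"], ["e"], N1, N2, effects))   # False: <d> versus <e>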

We will now illustrate the steps of the algorithm with an example:

Example 21 Let the instance P+ = ⟨V, A+, sI, sG⟩, where V = {v1, v2, v3, v4}, A+ contains the actions

• a : ∅ → v1

• b : ∅ → v2

• c : v1 → v3

• d : v2, v3 → v4

• e : v2, v3 → v4

sI = {v̄1, v̄2, v̄3, v̄4} and sG = {v3, v4}, be the input to Bäckström's algorithm. The first step of the algorithm is to compute the causal graph CG(P+), which can be seen in figure 4.2.


Figure 4.2: A CG for the variables of the instance from example 21.

The tree decomposition ⟨N, T⟩ of CG(P+) can now be computed. A tree decomposition (many exist) can be seen in figure 4.3. The algorithm used in the implementation for this project would compute this TD with the graph in figure 4.2 as the input graph. The node N2 is unnecessary in this TD, since N1 and N3 could be neighbours if the node N2 did not exist and the graph would still be a valid TD.

We label the vertices of T arbitrarily as N1 = {v2, v4}, N2 = {v4}, N3 = {v3, v4} and N4 = {v1, v3}.


Figure 4.3: A TD of the undirected causal graph in figure 4.2.

The sub-transition graphs can be seen in figure 4.4. The paths that lead to a solution for each TG(P+[Ni]), Ni ∈ N, form the domains of the variables in the CSOP problem C = ⟨X, D, R, W⟩. The domains for each variable of the CSOP problem C can be seen in the following list:

• D(X1) = {⟨b, d⟩, ⟨b, e⟩, ⟨b, d, e⟩, ⟨b, e, d⟩}

• D(X2) = {⟨d⟩, ⟨e⟩, ⟨d, e⟩, ⟨e, d⟩}

• D(X3) = {⟨c, d⟩, ⟨c, e⟩, ⟨c, d, e⟩, ⟨c, e, d⟩}

• D(X4) = {⟨a, c⟩}

An optimal solution to the problem is to assign

• X1 = ⟨b, d⟩ ∈ D(X1)
• X2 = ⟨d⟩ ∈ D(X2)
• X3 = ⟨c, d⟩ ∈ D(X3)
• X4 = ⟨a, c⟩ ∈ D(X4)

since X1[N1 ∩ N2] = ⟨d⟩ = X2[N1 ∩ N2], X2[N2 ∩ N3] = ⟨d⟩ = X3[N2 ∩ N3] and X3[N3 ∩ N4] = ⟨c⟩ = X4[N3 ∩ N4].

We can construct the following solutions for P+ from the assignment above:

• ⟨a, b, c, d⟩
• ⟨a, c, b, d⟩
• ⟨b, a, c, d⟩

A plan for P+ is constructed from a solution of the CSOP problem C by including all actions that occur in the solution to the CSOP problem C and by respecting all the partial orders of the subsequences.

The partial orders in this example are a, b, c < d and a < c. Assigning X2 = ⟨d, e⟩ instead of ⟨d⟩ is also a feasible solution to C, but any of the merged plans would then have length five, so that would be a non-optimal assignment. Note that ⟨a, b⟩ ≠ ⟨b, a⟩: the order of the elements in the sequences matters, since the merged plan needs to respect the partial order of all subsequences.

Here is an idea of why the algorithm works (the formal proof can be read in [2]): the algorithm is correct if each action aj in the merged plan ω can be applied in the state sj in which it occurs.


Figure 4.4: The sub-transition graphs for each projection P[Ni], where Ni ∈ N. Only the states that are reachable from the initial state sI are visualized in each graph. We do not need to consider the other states, since we can never reach them from the initial state sI.

Action aj can only be applied in state sj if the precondition pre(aj) is satisfied in sj. Consider any two variables u ∈ vars(pre(aj)) and v ∈ vars(eff(aj)). The variables u, v must both be in some vertex Ni in N, and the action aj[Ni] must be valid in the partial state sj[Ni] per definition. The variable u must therefore have been set to true by an action ah that was applied before aj. The action aj must therefore be valid in sj, since no action can set a variable to false in a delete-relaxed instance.


5 Planners

This chapter describes how STRIPS instances can be modeled in computer systems and how one can run instances with the planners Fast Downward and Ståhlberg's Planner, which are the planners we use to benchmark instances.

5.1 Modelling Planning Problems

The two planners model instances differently. The planner Fast Downward uses both the Planning Domain Definition Language (PDDL) and the translator file format to model a planning instance, while Ståhlberg's Planner only uses the translator file format. Both models are described below.

Planning Domain Definition Language

Two files are used to model a planning problem in PDDL. One of the files is the domain file and the other one is the problem file. The domain file models functions and actions, while the problem file models the variables, the initial state and the goal state(s). The advantage of PDDL is that one domain file can be used with many problem files of the same type. A domain file which models the block world problem (described in the introduction) can be used to model many instances of the block world problem, since many block world problem files can be used together with this domain file. Each of these problem files can have a different number of blocks, initial states and goal states. PDDL can only model planning problems that can be formulated in the STRIPS framework.

Translator File Format

The Translator File Format (TFS) is another file format that can model planning problems. In TFS it is possible to model SAS-formulated problems, unlike in PDDL, where only STRIPS-formulated problems can be modelled. A TFS file is a list of variable names, their domain sizes, the name for each domain and variable, the initial state, the goal state(s) and the actions. Each instance that we use in this project has binary domains, since only STRIPS instances are used in this project. A positive domain value can be any string of letters, but the negative value of each domain in the instances we are using is always called "Negate" + the positive domain value, so if the positive value for a variable has the name "OnA", the negative domain value for that variable is "NegateOnA".



5.2 Planning Systems

Fast Downward

Fast Downward (FD) is one of the planning systems we will use for the benchmark. It is written by Malte Helmert in C++ [15, 13]. The planner has become widely recognized since it won the fourth International Planning Competition at ICAPS 2004. FD has several implemented search algorithms, like A*, depth-first search and greedy search, to name a few. It also has several heuristics, like the blind heuristic, the FF heuristic and the causal graph heuristic [13]. To run an instance with FD, navigate to the folder where FD is located, enter the folder /src with a shell and execute the following three commands:

translate/translate.py [DOMAIN.pddl] PROBLEM.pddl
preprocess/preprocess < OUTPUT.SAS
search/downward OPTIONS < OUTPUT

PDDL files can be found in the folder benchmarking/*/. The translation phase of FD often optimizes the output, which may result in an output file with non-binary domains for some variables. One can add the flag

--invariant-generation-max-candidates 0

to the translation phase to avoid this and force the planner to make all domains binary.

Ståhlberg Planner

The other planner we will use for the benchmark is Ståhlberg's Planner, which is written in C# by Simon Ståhlberg [18, Chapter 4]. The search algorithm A* and the heuristic function blind are implemented in the planner. To run an instance in the planner, execute the following command in a shell:

mono /consistency-checking/ConsistencyCheck/bin/Debug/ConsistencyCheck.exe -i consistency-checking/test-instances/inputfile.sas -m normal -s "astar(heuristic=algorithm())"


6 The Implementation

The implementation of Bäckström's algorithm that will be used to compute the heuristic h+ is explained and motivated in this chapter. The algorithm was implemented in Ståhlberg's planner, which is written in C#.

Each heuristic has its own class in the planner. We named the class implementing the heuristic HPlus. The heuristic classes in the planner inherit virtual methods from the class HEURISTICSTATISTICS. The purpose of some of these methods is to give the planner information about the heuristic. This information includes flags such as whether the heuristic is consistent and whether it is admissible. Other inherited methods collect information that can be used to measure how successful the heuristic is, in terms of how many vertices were expanded and generated in the search space during a search. A method with the signature VALUE(Instance, state) is inherited as well. Each state that the search algorithm explores will be an argument to the parameter state of this method, and the method will return the heuristic value h+ for that state. The first parameter to the method VALUE(Instance, state) is the original instance, i.e. the input instance to the planner. The parameter instance is an object of the class INSTANCE. This class has all the data members an instance needs, such as variables, actions, goal(s) and the initial state. The class also has several methods that can be used to make a projection, generate the causal graph of the instance, etc.

The Constructor

Each heuristic class has a constructor, which takes the original instance as a parameter. We can do a few steps of the algorithm in the constructor, since some steps only need to be done once. The steps we do in the constructor are to create the delete-relaxed instance of the input instance, build the causal graph of the delete-relaxed instance and construct a tree decomposition of the causal graph of the relaxed instance.

Modifications of the Instance

An advantage of delete-relaxed instances is that we know that a variable that is true in the initial state can never be false in a state that is reachable from the initial state. In light of this, we can project the instance onto all variables that are false in the initial state. This is done by finding all variables in the initial state that have a value that starts with "Negate",

(31)

since we know from chapter 5 that all negative values always start with the string "Negate". We then project the instance over these variables. This is done before the causal graph is generated. This can speed up the algorithm, since we may work with fever variables and actions. The causal graph of the instance may have several components after this projection has been performed, which implies that it is easier to compute a tree decomposition with a smaller width, since fewer variables are connected to each other in each component of the causal graph compared to if we would not have performed the projection.

The projection over the false variables in the initial state is made with the instance method PROJECT. The false variables in the initial state are found by a loop which checks which variables in the initial state have a value that starts with the string "Negate".

A method in the class INSTANCE is then called to delete all negative pre- and effect conditions for all actions in the instance. The modified instance is then saved in a data-member of the class, since it does not need to be modified anymore but is needed in the method VALUE(Instance, state).
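As a small illustration of how these variables are identified, the following self-contained sketch selects the variables whose initial value starts with "Negate". The variable names, values and the dictionary representation are made up for the example; the real implementation reads this information from the INSTANCE object.

using System;
using System.Collections.Generic;
using System.Linq;

class FalseVariableSketch
{
    static void Main()
    {
        // made-up initial state: variable name -> value in the initial state
        var initialState = new Dictionary<string, string>
        {
            ["clear_a"]   = "clear_a",           // true initially, can never change in the relaxed instance
            ["on_a_b"]    = "Negate_on_a_b",     // false initially
            ["holding_a"] = "Negate_holding_a"   // false initially
        };

        // variables that are false in the initial state (value starts with "Negate")
        var falseVariables = initialState
            .Where(kv => kv.Value.StartsWith("Negate"))
            .Select(kv => kv.Key)
            .ToList();

        // the instance (and later each search state) is projected over these variables
        Console.WriteLine(string.Join(", ", falseVariables));  // the two false variables: on_a_b, holding_a
    }
}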

Generation of the Causal Graph

The causal graph (CG) of the instance can finally be generated and stored in a variable. The causal graph is generated by a method of the class INSTANCE.

Generating a Tree Decomposition

A tree decomposition (TD) of each component of the CG is now generated. The undirected version of the CG is the input to the tree decomposition algorithm, since tree decompositions are not defined for directed graphs. We use algorithm number two in Bodlaender’s paper [5] to compute the tree decomposition(s) of the CG. The algorithm is a heuristic which runs in O(n) time, where n is the number of nodes in the graph given as input to the TD algorithm. A heuristic in this context means that the TD the algorithm returns is neither guaranteed to have optimal width, nor to approximate a TD with optimal width.

The algorithm takes a graph G = ⟨V, E⟩ and a map a as input. The map a maps each vertex to a unique integer between 1 and |V|. The algorithm works as follows: if G contains only one vertex v, then a TD with the single node Ni = {v} is returned. Otherwise, a graph G′ is computed, which is obtained by eliminating vertex v1 in G. Vertex vi denotes the vertex that the map a maps to the integer i, 1 ≤ i ≤ |V|. Eliminating a vertex v in a graph G means that edges are added to G so that all vertices connected to v form a clique; vertex v is then deleted from the graph. The result of an elimination is again a graph.

A recursive call to the algorithm is now made with the arguments G′ and a map a′. The map a′ maps each vertex in the graph G′ to a unique integer between 1 and |V| − 1. Let ({Xw | w ∈ V′}, T′ = (V′, F′)) be the TD returned from the recursive call. We create a new bag Xi, consisting of all neighbors of v1 in G together with v1 itself, and connect it to this TD. Let Xj be the bag of the vertex vj, where j = min{ i | (v1, vi) ∈ E }, i.e. vj is the lowest-numbered neighbor of v1. We add the bag Xi to ({Xw | w ∈ V′}, T′ = (V′, F′)), add an edge between Xi and Xj, and return the resulting TD.

We use the graph data structure ADJACENCYGRAPH<> in the implementation to store graphs. This data structure allows us to add vertices and edges but not to remove them, which is problematic when we need to eliminate a vertex in a graph. To solve this, we add an extra parameter x, a non-negative integer with 0 ≤ x ≤ |V|, to the function. If a vertex v has order j such that x > j, we always ignore v, since it has already been eliminated. The parameter x is increased by one in each recursive call.
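The following self-contained sketch shows the same construction in an iterative form: each vertex is eliminated in numbering order, its bag is the vertex together with its current neighbors, and the bag is connected to the bag of its lowest-numbered remaining neighbor. The example graph, the vertex numbering, and the use of mutable adjacency sets (instead of the ADJACENCYGRAPH<> workaround with the parameter x) are choices made only for this illustration.

using System;
using System.Collections.Generic;
using System.Linq;

class TreeDecompositionSketch
{
    static void Main()
    {
        // made-up undirected graph: vertex -> set of neighbors
        var adj = new Dictionary<int, HashSet<int>>
        {
            [1] = new HashSet<int> { 2, 3 },
            [2] = new HashSet<int> { 1, 3 },
            [3] = new HashSet<int> { 1, 2, 4 },
            [4] = new HashSet<int> { 3 }
        };

        var bags = new Dictionary<int, HashSet<int>>();   // one bag per eliminated vertex
        var treeEdges = new List<(int, int)>();           // edges between bags

        var order = adj.Keys.OrderBy(v => v).ToList();    // elimination order a: 1, 2, ..., n
        foreach (int v in order)
        {
            var neighbours = new HashSet<int>(adj[v]);
            bags[v] = new HashSet<int>(neighbours) { v };  // bag = {v} plus its current neighbors

            // connect the new bag to the bag of the lowest-numbered remaining neighbor
            if (neighbours.Count > 0)
                treeEdges.Add((v, neighbours.Min()));

            // eliminate v: make its neighbors a clique, then remove v from the graph
            foreach (int u in neighbours)
                foreach (int w in neighbours)
                    if (u != w) adj[u].Add(w);
            foreach (int u in neighbours) adj[u].Remove(v);
            adj.Remove(v);
        }

        foreach (var kv in bags)
            Console.WriteLine($"Bag {kv.Key}: {{{string.Join(", ", kv.Value)}}}");
        foreach (var e in treeEdges)
            Console.WriteLine($"Tree edge: {e.Item1} - {e.Item2}");
    }
}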


The Projections of the Nodes

The last thing that is done in the constructor is to create the projections P+[Ni], Ni ∈ N, for each tree decomposition. The pseudo-code for the constructor on a high level can be seen in algorithm 2.

The Method Value

We will now describe the implementation of the method VALUE(Instance, state). The pseudo-code for this method can be seen in algorithm 3. There is a loop over the components in case the causal graph has more than one component. If that is the case, then the tree decomposition also has several components, and we then have several independent CSOP problems to solve. The sum of the weights of the optimal solutions to these CSOP problems is returned by the method VALUE(Instance, state).

The first thing that is done in this method is to project the input state s over all variables that are false in the initial state of the original instance. This has to be done since none of the data structures that were generated in the constructor are familiar with the variables that are true in the initial state of the original instance; as a consequence, these data structures cannot recognize the state s. We denote the projection of s over these variables by s′. A transition graph TG(P+[Ni]) is generated for each projection P+[Ni], with s′[Ni] as each graph’s start vertex. The instance method TRANSITIONGRAPH generates these transition graphs. The state s′ will be different for each VALUE(Instance, state) call, and each transition graph TG(P+[Ni]) will therefore also be different for each call. This is why the transition graphs must be generated in this method and not in the constructor.

The Search Algorithm

The next step of the algorithm is to search for all solution sequences with non-repeating actions in all the transition graphs that we just generated. We use an algorithm by M. Migliore, V. Martorana and F. Sciortino to find all paths in the transition graphs [16]. We will refer to this algorithm as Migliore’s algorithm. Migliore’s algorithm is a modification of the depth-first search (DFS) algorithm. In DFS, an edge (u, v) is never added to a path if vertex v is already in the path, but this is allowed in Migliore’s algorithm. The modification is simple: we traverse an edge e = (u, v, a) (and add action a to the path) if and only if action a is not already included in the path. All transition graphs that we search for paths in are directed acyclic graphs (DAGs) apart from possible self-loops, so some paths would be excluded if the search were performed with plain DFS. For example, both the paths ⟨b, e⟩ and ⟨b, e, d⟩ in the graph in figure 4.4a are solutions, but the edge with the label d is a loop at the node v2, v4.

Migliore’s algorithm is not an FPT algorithm, so this step of the implementation does not run in FPT time. Originally, Einstein’s algorithm [9], which is an FPT algorithm, was meant to be implemented. However, due to the complexity of that algorithm in terms of the number of steps, and the time limitation of this project, it was never successfully implemented. This does not imply that Migliore’s algorithm is slower in practice than Einstein’s algorithm; Migliore’s algorithm is much simpler and may therefore be faster.

Migliore’s algorithm is described in the paper as recursive, but we implemented the algorithm iteratively, since an iterative implementation avoids the overhead of recursive function calls and is therefore typically faster. The pseudo-code for the iterative implementation can be seen in algorithm 5, and the function that calls Migliore’s algorithm can be seen in algorithm 4.

The input to Migliore’s algorithm in our implementation is a (transition) graph and an operator o. This operator is pushed to a stack named S_0 and is added to the list this_path. The number 0 is then pushed to the stack S_i. This stack stores indices. Each index in the stack tells the for-loop, which iterates over the outgoing edges of the target vertex of the last operator in the corresponding path, at which edge to resume.

The initialization of the function is now complete and the while-loop is entered: the variable path_i is assigned the top value of the stack S_i. If the target of the top element in this_path is a goal, then the top element is added to the list of goal-paths, but only if path_i is 0. Without that if-condition, the same path would be added several times to the list of goal-paths.

The for-loop is now reached. It iterates over all edges from the index path_i to the edge with the highest index among the outgoing edges of the target of the operator o. If there is an edge with an operator that is not already in the path, the operator is added to the path and the number i+1 is pushed to the stack S_i. This is to remember from which index we shall start to iterate the next time this path is assigned to the variable this_path. We then push 0 to the stack S_i, push the operator that was not already in the path to the stack S_0, and break the for-loop. If no operator was added to the path when the for-loop terminates, then there are no more outgoing edges of the target of operator o that we need to consider, and we can remove the operator o from the path, since we cannot find any more goals with this operator as the last operator in the path.
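The traversal rule itself can be illustrated with a small self-contained sketch. For brevity it is written recursively, whereas the implementation described above is iterative, and the labelled graph is made up for the example; the only point is that an edge is followed exactly when its action does not already occur on the path.

using System;
using System.Collections.Generic;

class PathEnumerationSketch
{
    // outgoing edges: vertex -> list of (target vertex, action label)
    static Dictionary<int, List<(int, string)>> edges;

    static void Enumerate(int vertex, ISet<int> goals,
                          List<string> path, List<List<string>> solutions)
    {
        if (goals.Contains(vertex))
            solutions.Add(new List<string>(path));      // record a solution path

        if (!edges.ContainsKey(vertex)) return;
        foreach (var (target, action) in edges[vertex])
        {
            if (path.Contains(action)) continue;        // never repeat an action on the path
            path.Add(action);
            Enumerate(target, goals, path, solutions);
            path.RemoveAt(path.Count - 1);              // backtrack
        }
    }

    static void Main()
    {
        edges = new Dictionary<int, List<(int, string)>>
        {
            [1] = new List<(int, string)> { (2, "b") },
            [2] = new List<(int, string)> { (3, "e"), (2, "d") }   // "d" is a self-loop at vertex 2
        };
        var solutions = new List<List<string>>();
        Enumerate(1, new HashSet<int> { 3 }, new List<string>(), solutions);
        foreach (var s in solutions)
            Console.WriteLine(string.Join(", ", s));    // prints "b, e" and "b, d, e"
    }
}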

An action in the planner is an object of the class OPERATOR, which has the data-members Name, Precondition, and Effect. We only need to keep track of the action names to avoid adding the same action several times to a path. The action names are stored as strings and can be quite long; an action name can be "block_a_on_block_b", for example. To store less data, we use a mapping that maps each action name to a unique number, and store only a number instead of an operator for each operator in a path. The expected look-up time to find the mapped integer for an operator is constant with the data structure DICTIONARY<STRING, INT> in C#. All found solutions for each TG(P+[Ni]) are stored in a dictionary DICTIONARY<VARIABLE[], HASHSET<PATHSOLUTION>> so that they can be looked up in constant time.
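A self-contained sketch of the action-name compression is shown below. The action names are made up, and IdOf is a hypothetical helper that assigns the next free integer the first time a name is seen.

using System;
using System.Collections.Generic;

class ActionIdSketch
{
    static readonly Dictionary<string, int> actionIds = new Dictionary<string, int>();

    // expected constant-time lookup; assigns a new id the first time a name is seen
    static int IdOf(string actionName)
    {
        if (!actionIds.TryGetValue(actionName, out int id))
        {
            id = actionIds.Count;
            actionIds[actionName] = id;
        }
        return id;
    }

    static void Main()
    {
        var path = new List<int>
        {
            IdOf("pick_up_block_a"),
            IdOf("block_a_on_block_b"),
            IdOf("pick_up_block_a")   // same action name -> same integer id
        };
        Console.WriteLine(string.Join(", ", path));  // prints "0, 1, 0"
    }
}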

Solving the Constraint Satisfaction Problem Optimally

We now have all the information that is needed to solve the CSOP C = ⟨X, D, C, W⟩, which was defined in chapter 2: the set of nodes N is known, and the domains D(xi) for all xi ∈ X, which are the sets of all paths that are solutions for TG(P+[Ni]), Ni ∈ N, have been computed. We need to find an assignment xi = ωi ∈ D(xi), xi ∈ X, such that ωi[Ni ∩ Nj] = ωj[Ni ∩ Nj], where i < j and Ni, Nj ∈ N are adjacent in T, so that the length of the merged plan ω is minimised.

The pseudo-code for the algorithm that finds such an assignment can be seen in algorithm 7. The parameter Ni is a vertex in the TD, x is an element of the domain D(xi), and Nj is a neighbor of Ni in the TD. The variable D′ in the pseudo-code is the subset of D(xj) that matches with x, and the variable N is the set of Nj’s neighbors besides Ni.

To find all the paths in the domain D(xj) that match with x, we first do a look-up in the dictionary DICTIONARY<VARIABLE[], HASHSET<PATHSOLUTION>>. We then project a copy of the instance that was saved in the constructor over the variables Ni ∪ Nj. The actions in this projection are the actions that P+[Ni] and P+[Nj] have in common. Let us refer to this set of actions as S. We then add all paths y ∈ D(xj) such that y[S] = x[S], i.e. the paths that match with x, to the list D′.

If the set N is empty, then Nj is a leaf vertex in the TD and the variable xj in the CSOP C only needs to match with xi, so we can simply choose the cheapest element in D′. If N is not empty, then Nj has at least one neighbor besides Ni, and we must test all the elements in the set D′ to see which one is best suited. We can find an optimal solution to the CSOP C with algorithm 7 once we have created the sets Ai and the weight function W, as described in chapter 4.
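The matching test y[S] = x[S] can be illustrated with a small self-contained sketch, where paths are represented as lists of action names and Project is a hypothetical helper that restricts a path to the shared action set S; all names and paths are made up for the example.

using System;
using System.Collections.Generic;
using System.Linq;

class CsopMatchingSketch
{
    // project a path onto S: keep only the actions that belong to S, in order
    static List<string> Project(IEnumerable<string> path, ISet<string> S) =>
        path.Where(S.Contains).ToList();

    static void Main()
    {
        var S = new HashSet<string> { "a", "c" };        // actions shared by P+[Ni] and P+[Nj]
        var x = new List<string> { "a", "b", "c" };      // chosen path for node Ni
        var Dj = new List<List<string>>                  // candidate paths for node Nj
        {
            new List<string> { "a", "d", "c" },          // matches: projects to <a, c>
            new List<string> { "c", "a" }                // does not match: projects to <c, a>
        };

        var xS = Project(x, S);
        var D1 = Dj.Where(y => Project(y, S).SequenceEqual(xS)).ToList();

        Console.WriteLine($"{D1.Count} matching path(s) found.");  // prints "1 matching path(s) found."
    }
}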

The first call to algorithm 7 is made in algorithm 6. The parameter Nj is a center vertex in the TD.
