
UPTEC F 20039

Degree project, 30 credits, June 2020

Learning stationary tasks using behavior trees and genetic algorithms

Martin Edin

Master's Programme in Physics



Abstract

Learning stationary tasks using behavior trees and genetic algorithms

Martin Edin

The demand for collaborative, easy-to-use robots has increased during the last decades, in the hope of bringing robotics to smaller production scales through easier and faster programming.

Artificial intelligence (AI) and machine learning (ML) are showing promising potential in robotics, and this project attempts to automatically solve a specific assembly task with behavior trees (BTs).

BTs can be used to elegantly divide a problem into subtasks while remaining modular and easy to modify. The main focus is on developing a genetic algorithm (GA) that uses the fundamentals of biological evolution to produce BTs that solve the problem at hand. As a comparison to the GA results, a so-called automated planner was developed to solve the problem and produce a benchmark BT. Using a realistic physics simulation, this project automatically generated BTs that build a tower of Duplo-like bricks, with successful results. The BTs produced by the GA showed a variety of possible solutions: some resembled the automated planner's result, while others were alternative, perhaps more elegant, solutions. In conclusion, the approach used in this project shows promising signs and leaves many possible improvements for future research.

Supervisor: Jonathan Styrud


Acknowledgements

I would like to thank ABB Robotics for giving me the opportunity to write my master's thesis in an inspiring and professional environment. In particular, I would like to express my deep gratitude to my supervisor at ABB, Jonathan Styrud, for the guidance and support during the project. I wish you great success in your continued research. Moreover, I would like to thank Niklas Wahlström at Uppsala University for his interest and feedback during the project.

Finally, I would like to thank my mother and my sister for always believing in me and lifting me up. To all my friends and family, thank you for being there through the good times and the bad. This thesis is dedicated to the memory of my father, Daniel Edin (1967 - 2017).

Keywords— Behavior Tree, Genetic Algorithm, Evolutionary Algorithm, Automated Planning, ABB Robotics, ROS2, Algoryx Dynamics


Popular science summary (Populärvetenskaplig sammanfattning)

Automated robots are increasingly becoming a part of modern society. Seeing a robot automatically mow the lawn or vacuum a home is, for example, no longer unusual. Traditionally, the robotics industry has been driven by heavy industry, which with its enormous production volumes has needed automated processes. This places large demands on knowledge and capital within the companies. In 2015, ABB Robotics launched the world's first fully collaborative robot, YuMi, with a vision of changing how humans and robots work together. A long-term goal is to be able to show a robot how to perform a task and have the robot learn it automatically, without further information. This would make robots cheaper to use, and thereby potentially profitable for smaller companies that do not possess the expertise required today.

This project examines the possibility of automatically teaching a robot to stack Duplo bricks into a tower. As a first step, this is done in a simulated environment with realistic physics that accounts for, among other things, gravity and friction between the Duplo bricks and the robot tool. A continuation of this work is to investigate the possibility of transferring the results from the simulated environment to tests with a real robot.

Figure 1: Simplified example of how a robot vacuum cleaner can be controlled using a behavior tree.

The robot is controlled using so-called behavior trees. A behavior tree is a mathematical model for structuring different tasks in a tree. Figure 1 gives an overview of how the model works, describing a robot vacuum cleaner that cleans a home in a simplified way. The root of the tree, in this case the fallback node, checks its left child, "Has cleaned during the last 24 hours". If this is true, the fallback node is satisfied, since it only requires one child to succeed. If not, it proceeds to its second child, the sequence node. This node requires all of its children to succeed and starts with "Clean all surfaces". When this task is done, the sequence node continues with the next child, "Return to the starting position". If this succeeds, the sequence node reports success back to its parent, the fallback node, which thereby also succeeds. For the next 24 hours no cleaning is needed, since the first child will be true.

To make the robot learn a task automatically, a genetic algorithm is used. This algorithm builds on the theory of natural evolution, where stronger individuals have a greater chance of surviving and passing on their genes. As a comparison to the genetic algorithm, an algorithm based on automated planning is also used. The project succeeds in performing the task and shows that this method has potential for tasks of this kind.


Contents

1 Introduction
  1.1 Thesis goal
2 Background
  2.1 ABB Robotics
  2.2 Behavior trees
    2.2.1 Nodes
    2.2.2 BT example
    2.2.3 Comparing BTs to other methods
    2.2.4 BTs with backchaining
  2.3 Planning and acting with BTs
  2.4 Genetic Algorithm
    2.4.1 Fundamentals
    2.4.2 Evaluation
    2.4.3 Selection
    2.4.4 Crossover
    2.4.5 Mutation
    2.4.6 Pseudocode of GA
  2.5 Choice of learning algorithm
  2.6 Previous work
3 Method implementations
  3.1 Limitations
  3.2 Method 1: Genetic algorithm (GA) implementation
    3.2.1 Task and available actions
    3.2.2 Cost function
    3.2.3 BT representation
    3.2.4 Creation of BTs
    3.2.5 Code structure & Hyperparameters
    3.2.6 Tournament selection
    3.2.7 BT mutations
    3.2.8 BT crossover
    3.2.9 Software setup and simulation block scheme
    3.2.10 Syncing simulation clocks, avoiding duplicates and multiple runs
  3.3 Method 2: Planning and acting (PA-BT) implementation
  3.4 Simulation environment
4 Results
  4.1 Method 1: Genetic Algorithm (GA)
    4.1.1 Sample result
    4.1.2 Standard hyperparameter setup
    4.1.3 Only mutation
    4.1.4 20 individuals in population
    4.1.5 100 individuals in population
    4.1.6 Merged plot
  4.2 Method 2: Planning and Acting (PA-BT)
    4.2.1 Theoretical PA-BT result
    4.2.2 PA-BT result used in simulation
5 Discussion
  5.1 GA results
  5.2 Deterministic simulation environment
  5.3 General comments on GA and the project
  5.4 PA-BT discussion
6 Conclusion
7 Future research
8 Appendix
  8.1 GA mutations


1 Introduction

Modern industrial automation can solve complex tasks in structured environments, but the knowledge and time needed for programming and system integration are considerable. Consequently, automation solutions have traditionally been used in larger production scales of expensive products such as cars. Reducing the setup time with easier and faster programming could benefit smaller production series and make industrial robots profitable to use. The ultimate goal would be to achieve "show and tell"-like programming, but industrial applications have in fact shown little progress in this direction. Specifically, machine learning (ML) has not been used commercially for higher-level robot programming. With ML on the rise, as well as promising results in decision structures, the belief is that now could be the time to take the next steps in industrial automation. ML is a very promising technique in robotics that can make it possible to solve tasks autonomously by learning from experiments. A drawback with many learning techniques is that a lot of data is necessary to learn and validate an ML solution. If a task, such as a grasp or an assembly operation, is to be performed in an application, an extensive experimental effort is required, which often poses a problem in practice.

Behavior Trees (BTs) [1] are a decision structure that can efficiently model performing and switching between different subtasks for autonomous agents. BTs originate from controlling virtual entities in video games, but have received increasing attention in robotics. Because BTs are easy to model, change and reuse, they are well suited for ML, and specifically for evolutionary learning approaches, to generate solutions for various robotic tasks.

1.1 Thesis goal

The goal of this master's thesis is to create BTs that assemble Duplo-like bricks into simple structures and to investigate whether different artificial intelligence (AI) methods can be used to automatically generate BTs that complete certain tasks. Simulation tools with a realistic robot gripper and objects will be used to accelerate the learning. Specifically, the main focus will be on developing a genetic algorithm (GA) [2] that generates successful BTs for a certain task. As a comparison, an automatic planner [3] for BTs will be used as a reference for the GA result.

2 Background

2.1 ABB Robotics

ABB Robotics is one of the world's leading manufacturers of industrial robots and robot systems [4]. Since the release of the first commercially available industrial robot in 1974, ABB has installed more than 400,000 robots worldwide [5]. Furthermore, ABB manufactures and supplies complete robot systems with software, peripheral equipment, process equipment, modular manufacturing cells and services for tasks such as welding and painting. Robot technology and automation solutions have traditionally been driven by the demand from large production scales, such as car manufacturing and metal fabrication. However, during the last decades, the need for easy-to-use, collaborative robots has grown in industry, in smaller businesses and in everyday human life.

ABB changed the game in 2015 by introducing YuMi, the world’s first truly collaborative robot.

YuMi gave rise to a new era where people and robots can work side by side, without any barriers, in a safe and productive manner. Collaborative robots offer great flexibility in assembly processes, where the goal is to make small lots of highly individualized products in short cycles. By combining humans' unique ability to adapt to change with the robot's endurance in precise and repetitive tasks, it is possible to automate a variety of product types on the same line [6].

Figure 2: ABB YuMi

2.2 Behavior trees

A Behavior Tree (BT) is a way to model plan execution for an autonomous agent, such as a virtual game character or a robot. BTs can effectively switch between different tasks, which has attracted increasing attention during the last decade. Two main advantages of BTs are that complex systems can be created efficiently and that the resulting systems are both reactive and modular. These qualities are advantageous in various applications, which has led to an increasing use of BTs in computer game programming, different branches of artificial intelligence (AI) and robotics [1, 7].

With the assumption that tasks can be divided into sub-activities, a BT is a directed rooted tree with internal nodes called control nodes and leaf nodes called execution nodes. The common terminology of parent and child is used for the connection between nodes, where all nodes have exactly one parent, except for the root node, which has none. A control node has at least one child, which can be another control node or an execution node. An example of a simple BT that performs a simple pick-and-place task can be seen in Figure 3. The BT consists of one control node with three execution nodes as children.


Figure 3: Example of a simple BT with three sequential actions, ticked from left to right.

The execution of a BT is initiated by the root node sending out signals, called ticks, to its children with a given frequency. A child node is executed if and only if it receives a tick. Once a node is ticked, it immediately returns its status to its parent: Running if execution is under way, Success if the goal is achieved, or Failure otherwise.

2.2.1 Nodes

The nodes in a BT are divided into categories with different features. A control node can be of type Sequence, Fallback or Parallel, while an execution node is either an Action or a Condition node. Below follows an explanation of the different types, with a summary in Table 1.

A Sequence node (the root node of Figure 3) ticks its children one by one, starting from the far left. The Sequence node succeeds if and only if all its children return Success. However, if one of the children returns either Running or Failure, that status is returned to the parent and the execution of the Sequence is halted until it receives another tick. Pseudocode of a Sequence node is shown below in Algorithm 1.

Algorithm 1: Pseudocode of a Sequence node with N children

1 for i ← 1 to N do
2     childStatus ← Tick(child(i))
3     if childStatus = Running then
4         return Running
5     else if childStatus = Failure then
6         return Failure
7 return Success

A Fallback node also ticks its children from the far left and succeeds as soon as one of the children returns Success. If a child returns Running, that status is sent to the parent and the Fallback waits until the next tick. A Fallback node fails if and only if all its children return Failure. Pseudocode of a Fallback node can be seen below in Algorithm 2.


Algorithm 2: Pseudocode of a Fallback node with N children

1 for i ← 1 to N do
2     childStatus ← Tick(child(i))
3     if childStatus = Running then
4         return Running
5     else if childStatus = Success then
6         return Success
7 return Failure

A Parallel node ticks all of its N children and returns Success if at least M children return Success, Failure if more than N − M children return Failure (so that Success can no longer be reached), and Running otherwise. M acts as a threshold parameter and is user defined with M ≤ N. Pseudocode of a Parallel node can be found in Algorithm 3.

Algorithm 3: Pseudocode of a Parallel node with N children and success threshold M

1 for i ← 1 to N do
2     childStatus(i) ← Tick(child(i))
3 if Σ_{i : childStatus(i) = Success} 1 ≥ M then
4     return Success
5 else if Σ_{i : childStatus(i) = Failure} 1 > N − M then
6     return Failure
7 return Running

An Action node executes a certain command when ticked. If the action is completed, it returns Success, and if it fails, it returns Failure. While the action is ongoing, the node returns Running. A Condition node, in contrast, never returns Running; it can only return Success or Failure, depending on whether its proposition holds or not.


Table 1: Node types and characteristics of a BT.

Node      | Success                 | Failure                  | Running
Sequence  | If all children succeed | If one child fails       | If one child returns Running
Fallback  | If one child succeeds   | If all children fail     | If one child returns Running
Parallel  | If ≥ M children succeed | If > N − M children fail | Otherwise
Action    | When completed          | If it fails to complete  | While the action is ongoing
Condition | If true                 | If false                 | Never
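To make the tick semantics above concrete, the following is a minimal Python sketch of Sequence and Fallback control nodes. It is an illustration only, not the Py-trees implementation used later in this project; the class and method names are chosen purely for the example.

from enum import Enum

class Status(Enum):
    SUCCESS = 0
    FAILURE = 1
    RUNNING = 2

class Sequence:
    """Succeeds only if all children succeed; returns the first non-Success status."""
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:   # Running or Failure stops the sequence
                return status
        return Status.SUCCESS

class Fallback:
    """Succeeds as soon as one child succeeds; fails only if all children fail."""
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:   # Running or Success stops the fallback
                return status
        return Status.FAILURE

Any object with a tick() method returning a Status can act as a child here, so Actions and Conditions would simply implement tick() accordingly.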

2.2.2 BT example

Consider the example in figure 3, with the pick node expanded into a sub-BT, as seen in figure 4.

The task is for an agent to locate a target, approach it, pick it up and then place the target at a specific position. The root, which is a Sequence node, starts by ticking its first child, "Locate target".

If the agent fails to locate the target, Failure is returned to the root and the BT fails. However, if it succeeds, the root ticks the next child, which is also a Sequence. The ticks continue to explore the BT as it traverses down to the Condition node "Target close", which returns Failure if the target is out of reach. Since its parent is a Fallback node, the next child, "Approach target", is then ticked, returning Running until it reaches either Failure or Success. Assume the agent fails to approach the target; then its parent, the Fallback node, fails, and in turn both Sequence nodes above it fail. If instead the target is approached, the first Fallback returns Success to the Sequence node, which ticks the next Fallback node, which first reaches "Target picked". This Condition node returns Failure, and the action "Pick target" is initiated and ticked until it reaches Failure or Success. Before reaching the last node of the root Sequence, the BT is again ticked from the left. "Locate target" returns Success, and since "Approach target" previously returned Success, the first Fallback node succeeds because the proposition "Target close" now holds. The same goes for the second Fallback node, which succeeds since "Target picked" holds. Finally, if "Place target" succeeds, the BT returns Success.


Figure 4: Expanded example of the BT in figure 3.

An important note is the BT's ability to remain reactive, as mentioned above. In order to perform "Place target" in figure 4, the root Sequence ticks all its children and requires Success from all previous children. If the agent accidentally drops the target, the condition "Target picked" will return Failure, and hence the BT will keep ticking the second Fallback node rather than proceeding to tick "Place target". This behavior can be changed by the user through "control nodes with memory" [1], which avoid re-ticking children once they return Success or Failure. Using control nodes with memory is a way to reduce ticks in parts of the tree where reactivity is not needed.

2.2.3 Comparing BTs to other methods

This section briefly presents two of the most widely used control architectures, the Finite State Machine (FSM) and the Utility System (US), and discusses their advantages and disadvantages compared to BTs. More comparisons and generalizations of BTs can be found in [1].

• Finite State Machine (FSM): consists of events, transitions and states. Figure 5 illustrates an example corresponding to the BT example in section 2.2.2. From its current state, the FSM can move into a different state depending on which event occurs.


Figure 5: FSM example corresponding to the BT example in section 2.2.2, with states for locating, approaching, picking and placing the target, Success and Failure outcomes, and events such as "Located", "Approached", "Picked", "Target placed" and "Fault" on the transitions. The thicker boundary of "Locate target" indicates the initial state. The events are attached to each transition line.

FSMs are widely used in computer science; they are easy to implement and relatively easy to understand for simple problems. However, as the problem size grows, FSMs become complex and require a large number of states and transitions to remain reactive. This poses a problem for the modularity of the system, and changing the model becomes a challenge. The reusability is limited, which makes it impractical to reuse FSMs or small parts of an FSM. Specifically, having n different states results in n² possible state transitions, which quickly produces a large structure that is very complex to adjust, debug and reuse [8].

• Utility System (US): USs are commonly used in AI as a tool to make decisions based on available data. The decision is made by quantifying input data and measuring the relative suitability of each action by mapping the input through a specific function [9]. One disadvantage of USs is the difficulty of weighting the input values in the function, which becomes challenging when dealing with multi-parametric data. Also, for certain input data it may be perfectly reasonable to perform several different actions. Nevertheless, the model has proven to perform well on complex multi-choice decisions. Another disadvantage is adjusting and re-scaling a US: attempting to change a well-performing system, by tweaking parameters for a specific element or expanding the number of actions, may produce unexpected results.

Comparing both methods presented above to BTs, there are some clear advantages regarding scaling, modularity and the ability to transfer parts of a model to new problems. The modularity does not degrade with increasing size of the BT, making it more reliable and easier to use than FSMs while keeping reactivity. Since subtrees of a BT can be designed to perform specific subtasks, the reusability is superior. As BTs grow, they have a clear advantage in usability. Still, grasping very large BTs can be challenging, although not as complex as with FSMs.

Furthermore, an FSM's state transition is in a way similar to a GOTO statement, which is considered obsolete in modern higher-level programming [10]. The BT's way of controlling transfers between nodes is more similar to function calls, with subtrees being ticked and returning their current status [7]. Even though USs perform well on multi-choice problems, BTs show great promise when it comes to switching between problems and reusability, as discussed above.

However, combining USs with BTs that have multiple actions might well be the best solution [9].


2.2.4 BTs with backchaining

Figure 6: Example BT using backchaining.

BTs can be created by using so-called backchaining, meaning that the construction starts backwards from the main goal of the task [1]. This increases the reactivity of the BT by checking each goal state before trying to achieve it. As seen in figure 6, the condition c is the main goal to be achieved. The BT has two subtrees, where each has an action a_i which completes the main task, granted that its preconditions c_jk hold. This format is called Postcondition-Precondition-Action (PPA). Generally, a postcondition c can be achieved by a single action or a set of actions a_i, with corresponding sets of conditions c_jk. From a computational and efficiency point of view, it is advantageous to put the actions that are most likely to succeed first, and, within the subtrees, the preconditions that are most likely to fail first.

A more specific example can be found in figure 7(a), where an agent is supposed to ride a unicycle if the tire tube is in good condition. If the main task "Riding unicycle" is not achieved, the BT proceeds down to check whether the tire tube is functional. Provided that the tube is ok, the BT proceeds to tick the action "Start to ride unicycle" and the main task is completed. However, if the tube is defective, the condition fails. Applying two PPAs to the failing condition "Unicycle tube is ok" resolves the problem if the agent is able to complete either one, as seen in the expanded example in figure 7(b). The first alternative is the action that is more likely to succeed: simply switching the tube, if a new tube is available. The second choice has two preconditions that must succeed before performing the action. Either of the PPAs achieves the postcondition "Unicycle tube is ok", which in turn achieves the precondition for "Start to ride unicycle", and the main task is completed.


(a) PPA for achieving the postcondition "Riding unicycle". If the main task is not already achieved, the BT checks the PPA and performs the action if the precondition is satisfied.

(b) The same BT as in figure 7(a), with the condition "Unicycle tube is ok" expanded so that it can be satisfied, if possible, by two different PPAs.

Figure 7: PPA example of riding a unicycle.

2.3 Planning and acting with BTs

As presented in [1], a planning and acting approach for BTs (PA-BT) was inspired by a task planner dealing with infinite state spaces, called the Hybrid Backward-Forward (HBF) algorithm [11]. The algorithm has shown great efficiency in solving problems with large state spaces. By using an HBF-style algorithm, one can create descriptive models of different actions and map them onto an operational model that defines how to perform certain actions under different circumstances. PA-BT combines the advantages of BTs, being both modular and reactive, with HBF's planning capability in an infinite space. This approach has its roots in so-called STRIPS-style planning, proposed at Stanford in 1971 [12].

PA-BT uses the idea of backchaining, presented in section 2.2.4, where a failing condition is replaced by a PPA BT. A simple example is illustrated below in figures 8-10. The BT is initially a single node corresponding to the main goal, seen in figure 8, where an object is to be placed at a certain goal position. Since the task is not completed, the BT fails and the planner proceeds to look for an action that has a postcondition achieving the condition that failed, resulting in a PPA being added with the action "Place object at GOAL". Figure 9 shows the first expansion of the BT. If the agent is holding the object, the condition "Object in hand" is already satisfied and the BT returns Success. However, if the condition returns Failure, the planner once again searches for an action with a postcondition that satisfies the failing condition. As seen in figure 10, the action "Pick object" achieves the condition "Object in hand". Given that the agent initially has no object grasped, the BT then performs a successful pick-and-place task. More advanced examples can be found in [1, 3].

Figure 8: Main goal, a BT with one condition.

Figure 9: BT after one iteration. The main goal failed and the action "Place" was found as a solution.

Figure 10: BT after two iterations. "Handgrasped" failed and "Pick" was found as a solution.


Algorithms 4-5 present pseudocode for PA-BT, inspired by [1, 3], followed by descriptions of the different parts in detail.

Algorithm 4: Main code for PA-BT

1 T ← ∅;                                   // Create an empty BT.
2 for c in C_goal do
3     T ← SequenceNode(T, c);              // Add the goal conditions in C_goal to a Sequence node.
4 while True do
5     r ← Tick(T);                         // Tick the behavior tree.
6     if r = Failure then
7         c_f ← GetConditionToExpand(T);   // Find the failing condition.
8         T, T_new_subtree ← ExpandTree(T, c_f)
9         while Conflict(T) do
10            T ← IncreasePriority(T_new_subtree)

Algorithm 5: ExpandTree function for PA-BT

1 Function ExpandTree(T, c_f):
2     A_T ← GetAllActionTemplatesFor(c_f)
3     T_fallback ← c_f
4     for action in A_T do
5         T_seq ← ∅
6         for c_a in action.preconditions do
7             T_seq ← SequenceNode(T_seq, c_a)
8         T_seq ← SequenceNode(T_seq, action)
9         T_fallback ← CreatePPA(T_fallback, T_seq)
10    T ← Substitute(T, c_f, T_fallback)
11    return T, T_fallback


Figure 11: Action template for Place. The postcondition will be met if the preconditions are satisfied and the action succeeds.

Before running the algorithm, the user specifies the initial state and the available PPAs for the problem, referred to as action templates. Algorithm 4 initiates PA-BT by adding the set of goal conditions, C_goal, to a Sequence node. Note that if there is only a single goal condition, a Sequence node is superfluous, as in figure 8. Line 4 starts the loop and ticks the BT. Once a condition in the BT fails, Line 7 of Algorithm 4 finds the failing condition in the BT. Once found, the call to expand the BT is made at Line 8. Line 2 of Algorithm 5 fetches all the action templates that satisfy the failing condition.

In the example above, the first action template is "Place object". An action template has a number of preconditions, which are added to a Sequence node at Line 6. A typical example of an action template can be seen in figure 11. Once all the preconditions have been added, the action is added last in the sequence at Line 8. Finally, the subtree is added to the Fallback parent of the condition, creating a PPA-style subtree at Line 9. This is done for all the actions that satisfy the condition. In figure 9, the result of one iteration is shown. Algorithm 4 loops until the main goal is satisfied. Figure 10 shows a complete BT.

Line 9 in Algorithm 4 is used to solve conflicts in the PA-BT. A conflict appears when the planner expands a failing condition, but the postcondition of the expanded subtree leaves the agent in a state where it cannot proceed with ticking the BT. The function IncreasePriority at Line 10 then moves the subtree to a higher priority in the BT until no conflicts remain. An example of this can be found in [3].
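As a rough illustration of the expansion step in Algorithm 5, the following is a minimal Python sketch operating on a simple nested-list BT representation ("f" for Fallback, "s" for Sequence). The data structures, action templates and helper names are invented for this example and are not the implementation used in this thesis.

# Action templates: name -> (list of preconditions, postcondition)
ACTION_TEMPLATES = {
    "place_object": (["object_in_hand"], "object_at_goal"),
    "pick_object":  ([], "object_in_hand"),
}

def expand_tree(tree, failing_condition):
    """Replace a failing condition with a PPA subtree:
    Fallback(condition, Sequence(preconditions..., action)) for every
    action template whose postcondition satisfies the condition."""
    fallback = ["f", failing_condition]
    for name, (preconditions, postcondition) in ACTION_TEMPLATES.items():
        if postcondition == failing_condition:
            fallback.append(["s"] + list(preconditions) + [name])
    return substitute(tree, failing_condition, fallback)

def substitute(tree, target, replacement):
    """Recursively replace occurrences of a condition node with a subtree."""
    if tree == target:
        return replacement
    if isinstance(tree, list):
        return [substitute(child, target, replacement) for child in tree]
    return tree

# Expanding the single goal condition of figure 8 yields the PPA of figure 9,
# and expanding the failing precondition yields the full tree of figure 10.
bt = ["s", "object_at_goal"]
bt = expand_tree(bt, "object_at_goal")
bt = expand_tree(bt, "object_in_hand")
print(bt)

Conflict handling (Lines 9-10 of Algorithm 4) is deliberately left out of this sketch; it would reorder subtrees inside the Fallback parents until no conflict remains.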

2.4 Genetic Algorithm

A Genetic Algorithm (GA) is an evolutionary programming method based on the fundamentals of biological evolution. The method was first proposed by John Holland in 1975 for computationally difficult problems [13]. By utilizing the mechanisms of natural selection, where stronger individuals in a population are more likely to survive and reproduce, GAs have proven to be a successful and powerful optimization tool [2]. This section explains the fundamentals of GAs and the different functions that make up the algorithm.

2.4.1 Fundamentals

A GA maintains a population of individuals, where an individual, if structured correctly, represents a possible solution. Every individual is represented as a genome consisting of different traits in a specific order. Every trait can be seen as a gene of the genome, where each trait has an impact on the performance of the individual. The traits can consist of parameter values, action calls, etc., depending on the encoding. Let t ∈ {t_1, t_2, ..., t_n} be the set of all possible traits; all possible combinations of these traits then form the search space of the problem. An example of two different individuals can be seen below in figure 12.

Figure 12: Possible GA encoding of two individuals.

2.4.2 Evaluation

The key aspect of GAs is the ability to quantitatively evaluate the performance of each individual. The evaluation is done by a predefined fitness function or cost function that measures the success of an individual in terms of various criteria and objectives such as sub-goals, distance, execution time, etc. After the evaluation, the candidates with a good score have a higher probability of being chosen for the next generation. How to measure success is not always trivial, which poses a challenge for the developer when designing the different attributes to be evaluated.

2.4.3 Selection

To produce the next population, the GA uses the measured score as a quality discriminator for each individual, together with a selection method that drives the evolution of the selected individuals forward.

The selection can be done in several ways; some of the most common methods are listed below [14].

• Elitist selection: The individuals with the best score are chosen for the next generation.

• Fitness/cost proportional: Individuals with better scores are more likely to move on to the next generation, but are not guaranteed to.

• Rank selection: Similar to fitness-proportional selection, but lets the user define the probability of choosing each ranked individual.

• Tournament selection: Several subgroups of the population are chosen with uniform probability to compete against each other, and only the individual with the best score in each subgroup is chosen.

Depending on the nature of the problem, a GA might use complete replacement between generations, but it can also keep a fraction of the individuals to ensure that the next generation is at least as good as the preceding one. A complete guide to choosing a method can be found in [15].


2.4.4 Crossover

After selection, the GA performs a crossover between two individuals to yield offspring. The idea is to mix genes between individuals, and this can be done in several ways. An example can be seen in figure 13.

Figure 13: Example of crossover where two parents mate and produce mixed offspring.

2.4.5 Mutation

Finally, after selection and crossover, a chosen number of individuals are mutated to maintain diversity. This is done either by switching one of the genes or by adding or removing a randomly chosen trait. An example can be seen in figure 14.

Figure 14: Example of mutation, where the trait ”t6” is switched with ”t3”, producing an offspring.

2.4.6 Pseudocode of GA

The exact structure of a GA varies depending on the problem at hand. The choice of hyperparameters such as population size, crossover probability, mutation probability, cost/fitness function, etc., depends on the nature of the problem. A general outline of a GA is shown in Algorithm 6. A more detailed GA is presented in section 3.


Algorithm 6: Pseudocode of a GA.

1 Randomly generate an initial population of I individuals.
2 Calculate the cost of each individual i in the population.
3 Select individuals for the next generation.
4 Change individuals randomly (crossover and mutation).
5 Repeat steps 2-4 until a stopping criterion is met.
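To tie these steps together, here is a minimal, generic Python sketch of the loop in Algorithm 6, using tournament selection, one-point crossover and a simple switch mutation on lists of traits. It is an illustration under stated assumptions: the trait alphabet, the toy cost function and all parameter values are placeholders, not the implementation described in section 3.

import random

TRAITS = ["t1", "t2", "t3", "t4", "t5", "t6"]   # placeholder trait alphabet
TARGET = ["t1", "t2", "t3", "t4", "t5"]          # hidden target genome for the toy cost

def cost(individual):
    # Toy cost: number of positions differing from the hidden target genome.
    # In this project the cost would instead come from the physics simulation.
    return sum(a != b for a, b in zip(individual, TARGET)) + abs(len(individual) - len(TARGET))

def tournament(population, k=2):
    # Pick k individuals uniformly at random and return the best one.
    return min(random.sample(population, k), key=cost)

def crossover(parent_a, parent_b):
    # One-point crossover of two trait lists.
    point = random.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(individual):
    # Switch one randomly chosen trait for a random trait.
    child = list(individual)
    child[random.randrange(len(child))] = random.choice(TRAITS)
    return child

def run_ga(pop_size=30, genome_length=5, generations=50, elitism=2):
    population = [[random.choice(TRAITS) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        new_population = population[:elitism]          # keep the elite unchanged
        while len(new_population) < pop_size:
            a, b = tournament(population), tournament(population)
            new_population.append(mutate(crossover(a, b)))
        population = new_population
    return min(population, key=cost)

print(run_ga())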

2.5 Choice of learning algorithm

The structure of BTs has been shown to be well suited to evolutionary algorithms such as GAs [7]. Specifically, locality has been shown to be an important factor for performance, meaning that small changes in the model give small changes in the results [16]. The BT's modularity and node structure enable such behavior. As discussed in [1], a GA is a good candidate when developing BTs from scratch with no prior knowledge.

Reinforcement learning (RL) is a well-developed and powerful ML tool. The basic idea of RL is that a software agent performs actions in an environment that can be represented by different states. The agent has a set of actions to choose between, each of which changes the state of the environment in a different way. Depending on the outcome, the agent receives feedback on how beneficial an action is in a certain state. By using trial and error, the agent learns over time to choose the correct actions in certain situations [17]. When learning a model from scratch with RL (sometimes referred to as pure RL), the search space becomes excessively large. This requires a tremendous amount of computational power, and pure RL is by some seen as an ineffective way of learning [18]. Using RL on BTs from scratch would quickly result in excessive search spaces.

Still, RL has proven to be useful on predefined BTs, where RL is applied to smaller state spaces to optimize certain subproblems of a BT; see the example in [1].

This project attempts to solve relatively simple examples, where the search space of an RL algorithm might not be a problem. However, since the hope is that more complex tasks will be explored in the future, GA was chosen in order to avoid the curse of dimensionality.

2.6 Previous work

BTs originate from the need to control modular AI in computer games, but their potential has received great attention in robotics during the last decade. A comprehensive survey of BTs in robotics and AI [7] lists 166 different papers divided by application area and topic. This section highlights some of the articles related to this thesis.

• The automatic planner presented in [3] covers both mobile and stationary robots. The planner is able to generate BTs for different tasks, given a main goal, and furthermore to replan if outside factors disturb the task. For example, when proceeding towards the goal, an obstacle is placed in front of the robot and the subtree that moves the robot towards the goal fails. In this case, the planner expands the tree and solves the problem, while avoiding conflicts by rearranging the BT structure.

• An example of the use of a GA with BTs can be seen in [19]. This paper applies a GA to teach an autonomous agent to solve a task in a video game using BTs. The implementation uses the knowledge of how BTs are structured with conditions and actions during evolution, which makes for clean and well-structured BTs.

• In [20], BTs are used to control swarms of robots to solve a specific task, with a GA as the learning method. The field of swarm robotics is inspired by natural phenomena such as flocks of birds or schools of fish moving collectively as a unit. By using a clever approach to evolving the BT with a GA, the authors achieved promising results in a simulated environment. They also tested the resulting BT on real robots with successful results, although with slightly lower precision.

• The paper [21] demonstrates learning BTs with reinforcement learning. The BTs are not learned from scratch; rather, a predefined BT is used and a certain behavior is chosen in specific situations. One of the investigated situations is when an agent can choose between specific types of fire extinguishers, depending on the type of fire to be put out. The results converged to 100% accuracy.

This project will use a GA on BTs without providing any prior knowledge of how a BT should be structured, apart from preventing infeasible BT structures. The BTs are generated completely at random, and every node is treated the same when making changes to a BT.

The GA-based projects mentioned above utilize BT properties or rules when running the GA. This makes for a more structured development of the trees, where control nodes, conditions and actions can be used to obtain desirable structures. The GA used in this project is fairly simplified in comparison, as discussed further in section 3.

In addition, a PA-BT inspired by the work mentioned above is implemented. The implementation lacks some vital features, as also explained in section 3.


3 Method implementations

As mentioned in the thesis goal in section 1.1, the goal of the task studied in this project is to produce a BT that solves an assembly task. Specifically, the goal is to stack three objects on top of each other in a specific order. This is done in a simulated environment, shown in figure 15. The environment uses realistic physics, where a robot gripper is created as two finger-like structures, to resemble the YuMi robot shown in section 2.1, together with three Duplo-like bricks. More details about the simulated environment can be found in section 3.4. Two different methods are used. First, a GA executes simplified BTs in the simulation environment in pursuit of completing the task, using predefined actions with built-in conditions. Second, as a comparison to the GA results, the PA-BT produces a theoretical BT that utilizes the structure of having conditions connected to specific actions. A more detailed description of the limitations of this project can be found in section 3.1.

– Method 1: Genetic Algorithm (GA). All the actions the robot can perform, as well as the control nodes of a BT, are encoded as traits of an individual in the GA. The GA generates a population of individuals with an initial number of traits, represented as either actions or control nodes, and then runs for a specified number of generations, trying to achieve the goal. Success is measured by the distance of each brick from its predefined goal position. The process is carried out in a simulated environment and is presented in section 3.2.

– Method 2: Planning and Acting for BTs (PA-BT). Before executing the PA-BT, all available action templates are defined. The PA-BT then receives a set of goal conditions to accomplish. If a condition fails, the PA-BT expands the BT with a suitable action to satisfy the failing condition; see section 2.3 for more details about the method. The method succeeds when all goal conditions are achieved. The results are not used directly in the simulation, due to the limitations of the simulation setup. The method is presented in section 3.3.

Figure 15: Visualization of the setup in Algoryx Dynamics, with three Duplo bricks. The robot performs a pick action on the blue brick.


3.1 Limitations

Running BTs in the simulated environment has its limitations. The GA creates individuals from predefined actions that incorporate the conditions connected to each action, see section 3.2.1. This produces simple BTs, where the result usually is a sequence of the required actions in some correct order. This leads to non-reactive BTs that exploit neither the theory about reactive BTs presented in section 2.2.2 nor the backchaining approach mentioned in section 2.2.4.

Figure 16: Simplification of the PA-BT tree shown in figure 10.

The PA-BT result acts as a reference for how a BT can be designed to acquire reactive properties. The theory presented in section 2.3 utilizes many of the strengths of BTs, by using backchaining and creating reactive BTs. Due to the simplifications made to the BTs in the simulated environment, directly simulating the resulting BT from the PA-BT is not possible at the moment. However, the PA-BT result can be translated into a simple BT that can be used in the simulation. Figure 10 shows a BT that performs a pick-and-place with corresponding conditions; this tree can be translated into a sequence with two actions, as seen in figure 16.

To obtain BTs with reactive features, the environment needs to be probabilistic, so that states can change seemingly at random. As an example, an already placed brick could be returned to its starting position. Otherwise, the GA is prone to create the simplest solution, as in figure 16, since the condition checks will not affect the result. For instance, the user could define a probability that the robot drops the object after it has been picked.

3.2 Method 1: Genetic algorithm (GA) implementation

This section presents the GA implementation together with the simulated environment. The different parts of the GA are described in detail and, finally, the connection between the GA and the simulated environment is presented. An illustrative block scheme of the setup can be seen in figure 23.

3.2.1 Task and available actions

The assembly task to be attempted by the GA is to stack three Duplo bricks at a specific goal position in a certain order. The available actions are listed below. To keep the BTs as simple as possible, the conditions of each action are incorporated into the actions. The actions move the robot gripper to various reference positions and succeed if it arrives within a certain range, between 0.1 mm and 1 mm, depending on the importance of the reference position. The end goal of each action is to move the robot gripper to a reference position. This is done in several steps, depending on the action, and the different actions are described below. More detailed information about the simulated environment can be found in section 3.4.

• Pick brick: This action can pick a brick, provided that no other brick is already picked. If the robot has already grasped an object, the action returns Failure. Otherwise, if the robot is able to reach the gripping position, grip the brick and successfully lift it, the action returns Success.

• Place brick at target: This action can only be performed if the robot has grasped an object. Furthermore, the robot keeps track of how far the brick was lifted when picked, and therefore places the object at the target at the same level the brick was picked from. This enables the robot to lift stacks of bricks. If the gripper manages to move the gripped object to the target, the action returns Success.

• Place brick on another brick: If a brick is grasped, the robot checks the position of the other brick and proceeds to stack the grasped brick on top of it. If the robot gripper is able to move the gripped brick to the reference point of the other brick, the action returns Success.

• Apply force on brick: Due to the structure of a Duplo brick, a force is required to merge two stacked bricks. This is done by the gripper, which approaches the brick from above with a closed grip, applying a force in the downward direction during a chosen simulation time period.

3.2.2 Cost function

The cost function, see section 2.4.2, compares the bricks' positions with predefined goal positions for each brick. Each millimeter of total distance from the goal positions adds 1 to the cost. Furthermore, to keep the BT as compact as possible, the cost increases with the number of nodes, with a penalty of 0.1 per node.
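As a concrete illustration of this cost function, the following is a minimal Python sketch. The brick positions would come from the physics simulation; the function and variable names are invented for the example, while the weights (1 per mm of distance, 0.1 per node) are those stated above.

import math

def cost(brick_positions, goal_positions, num_nodes,
         distance_weight=1.0, node_weight=0.1):
    """Cost = total distance from the goal positions (in mm) + 0.1 per BT node."""
    total_distance_mm = sum(
        math.dist(pos, goal)                      # Euclidean distance per brick
        for pos, goal in zip(brick_positions, goal_positions)
    )
    return distance_weight * total_distance_mm + node_weight * num_nodes

# Example: two bricks 3 mm and 4 mm from their goals, BT with 6 nodes -> cost 7.6
print(cost([(0, 0, 3), (0, 4, 0)], [(0, 0, 0), (0, 0, 0)], num_nodes=6))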

3.2.3 BT representation

Figure 17: Visualization of the way a BT can be created from a string representation.

In order to represent a BT as an individual in the GA, the different node types are encoded as traits. These traits can be combined to form a string representation of an individual's genome. The encoding rules are described below, followed by a parsing sketch after the list, and an example of an individual with its traits can be seen in figure 17.

• Control nodes: The different control nodes are encoded as "s(", "f(" and "p(" for Sequence, Fallback and Parallel nodes, respectively. The parenthesis indicates the start of a subtree or the root of the BT.

• Leaf nodes: All conditions and actions are encoded as unique traits. See the example in figure 17.

• Up-node: This node is not itself part of the BT, but changes its structure. The up-node is treated as a trait of the individual and works as a command to close control nodes of the BT. See the example in figure 17.
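The following is a minimal Python sketch of how such a string genome could be parsed into a nested tree structure, assuming the encoding above ("s(", "f(", "p(" open a control node, ")" stands for the up-node, and any other token is a leaf). The token symbols and the nested-list representation are choices made for this example; the project itself uses the Py-trees library and the recursive string parsing described in section 3.2.9.

def parse_genome(genome):
    """Parse a list of trait strings, e.g. ['s(', 'a1', 'f(', 'c1', 'a2', ')', ')'],
    into a nested list where the first element of each list is the control node type."""
    def parse(index):
        node_type = genome[index][0]          # 's', 'f' or 'p'
        children, i = [node_type], index + 1
        while i < len(genome) and genome[i] != ")":
            if genome[i] in ("s(", "f(", "p("):
                subtree, i = parse(i)         # recurse into a new control node
                children.append(subtree)
            else:
                children.append(genome[i])    # leaf node (action or condition)
                i += 1
        return children, i + 1                # skip the closing up-node

    tree, _ = parse(0)
    return tree

print(parse_genome(["s(", "a1", "f(", "c1", "a2", ")", ")"]))
# -> ['s', 'a1', ['f', 'c1', 'a2']]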

3.2.4 Creation of BTs

The initial population of string BTs for the GA is randomly generated at the initialization stage, with a specific start length set by the user. Each BT gets a random control node as the first element in the string, to create a valid BT structure. After that, any node can be added to the BT until the desired length is reached. After creation, the BT is checked to ensure that the structure is valid. If a BT is not structured correctly, for example by having a control node as its last element, the individual is removed and a new BT is generated. The up-node can be randomly added anywhere in the BT (except as the first element), and is also appended after each individual is created to close the string. See the string representation in figure 17.

3.2.5 Code structure & Hyperparameters

Algorithm 7 shows a version of the implemented GA. The algorithm has various hyperparameters, described in Table 2.

Algorithm 7: Pseudocode of the GA used in simulation.

1 Choose the number of elite individuals (E), the number of crossovers (CO) and the number of mutations (M), summing up to the population size P.
2 Randomly generate an initial population of P individuals.
3 for Generation ← 1 to N do
4     Calculate the cost C(i) of each individual i in the population.
5     Append the E elite individuals to the new population.
6     Perform tournament selection for CO individuals and perform crossover.
7     Perform tournament selection for M individuals and perform mutation.
8     Append the crossover and mutation offspring to the new population.


Table 2: Hyperparameters of the GA.

Hyperparameter        | Definition                                                                 | Note
Actions               | A list of all possible actions that the BT can use.                        |
Start length          | The start length of each individual in the initial population.            |
Generations           | The number of generations to be simulated.                                 | A specific stop criterion is not set, since the optimal result might not be known.
Population size       | The number of individuals in each generation.                              |
Crossover             | The number of individuals mating, generating an equal number of offspring. |
Mutations             | The number of individuals going through mutation.                          |
Elitism               | The number of top individuals that survive between generations, used to guarantee that the top individual of the next generation is at least as good. | Follows from the number of crossovers and mutations, to keep the population size static.
Mutation: Add node    | The probability of adding a node of any type instead of mutating.          | Specified by the user.
Mutation: Delete node | The probability of deleting a node of any type instead of mutating.        | Specified by the user.
Mutation: Switch node | The probability of switching a node of any type into a random node.        | The remaining probability after add and delete have been chosen.
Selection method      | Choice between elitist selection and tournament selection. Tournament selection is used for both crossover and mutation. |
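As an illustration, the hyperparameters in Table 2 could be collected in a simple configuration dictionary like the one below. All values shown are arbitrary placeholders for the example, not the settings used in the thesis experiments.

# Hypothetical GA configuration; values are placeholders, not the thesis settings.
ga_config = {
    "actions": ["pick", "place_at_target", "place_on_brick", "apply_force"],
    "start_length": 5,          # initial genome length
    "generations": 50,
    "population_size": 30,
    "num_crossovers": 14,       # offspring created by crossover
    "num_mutations": 14,        # offspring created by mutation
    "elitism": 2,               # population_size - crossovers - mutations
    "p_add_node": 0.2,          # mutation: add a node
    "p_delete_node": 0.2,       # mutation: delete a node
    # switch-node probability is the remainder: 1 - p_add_node - p_delete_node
    "selection_method": "tournament",
}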


3.2.6 Tournament selection

The GA can use tournament selection as its selection method, as described in section 2.4.3. The BTs in a population are shuffled before participating in the tournament. They are then paired into groups of two individuals, and the best individual in each group is selected. This procedure is done for both crossover and mutation. An illustrative example is shown in figure 18.

Figure 18: Example of 8 individuals participating in a tournament.
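A minimal Python sketch of this pairwise tournament, assuming a cost function as in section 3.2.2, could look as follows; it shuffles the population, pairs neighbours and keeps the cheaper individual of each pair. The helper names are invented for this illustration.

import random

def pairwise_tournament(population, cost):
    """Shuffle the population, pair adjacent individuals and keep the best of each pair."""
    shuffled = population[:]            # copy so the caller's list is untouched
    random.shuffle(shuffled)
    winners = []
    for i in range(0, len(shuffled) - 1, 2):
        a, b = shuffled[i], shuffled[i + 1]
        winners.append(a if cost(a) <= cost(b) else b)
    return winners

# Example with a toy cost (genome length as cost):
population = [["s(", "a1", ")"], ["s(", "a1", "a2", ")"],
              ["f(", "a2", ")"], ["s(", "a3", "a1", "a2", ")"]]
print(pairwise_tournament(population, cost=len))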

3.2.7 BT mutations

This section describes the mutations that can be applied to a BT. Three illustrative examples are shown in this section, figures 19-21, while illustrations of the remaining mutations can be found in the Appendix, section 8. When mutating, any node type can be switched, deleted or added, which makes it important to keep track of the structure of the BT. A mutation that destroys the structure of a BT is not allowed. One of switch, add or delete is picked according to the specified probabilities, followed by randomly selecting the position in the BT to be changed. Below follows a list of the different changes, with motivation.

If the mutation of adding a node occurs, one of the following changes is made to the BT:

• Add leaf node: Since the size of the final BT is not known, this mutation can add a random leaf node at any position in the BT. See figure 19.

• Add control node: Adding a control node to the BT requires it to have children. The control node inherits all remaining nodes to the right of where it is added, creating a subtree with all the nodes and subtrees of the current level. If the position at which the control node is added has no children to its right, a new position is chosen randomly. See figures 38 and 39 in the Appendix.

If the mutation of deleting a node occurs, one of the following changes can occur:

• Delete leaf node: This mutation changes the structure by removing superfluous nodes. See figure 20.

• Delete single-child leaf node: A special case, when a leaf node is the sole child of a control node. This mutation also deletes the parent, since a control node without a child forms an infeasible BT. See figure 40 in the Appendix.

• Delete control node: Changes the structure of the BT. The parent of the control node inherits all children of the deleted control node. See figure 41 in the Appendix.

Finally, if the mutation switches a node, the following changes can occur:

• Switch leaf node: The standard mutation, which simply swaps one leaf node for another leaf node. See figure 21.

• Switch control node: Changing the type of a control node can change the entire structure of the BT. See figure 42 in the Appendix. Special case: this switch is always applied if the root node is chosen.

• Switch a leaf node with a control node: A special case; simply switching a leaf node into a control node would destroy the structure and yield an infeasible BT. By letting the control node inherit the leaf node being switched, the BT remains feasible. However, a control node with a single child is superfluous and yields the same result as before the switch, so a random leaf node is also added when this occurs. See figure 43 in the Appendix.

• Switch a control node with a leaf node: A mutation in which all the children of the mutating node are added to the parent of the mutating node. See figure 44 in the Appendix.

Special case for dealing with the up-node:

• Add/delete an up-node: By adding or deleting up-nodes, the structure within the BT changes, as nodes are moved between subtrees. Switching an up-node into a different kind of node is not allowed. An example of adding an up-node can be seen in figure 45 in the Appendix.


Figure 19: Add a leaf node "b0" into a sequence node.

Figure 20: Delete leaf node "a3".

Figure 21: Switch leaf node "a1" into leaf node "b1", seen in the bottom left corner of the BTs.


3.2.8 BT crossover

The crossover switches a subtree or a single leaf node of one BT with one from another BT at random. The key point is that all parts of the BT can be switched, from single leaf nodes to large subtrees. After selection, the parents are randomly paired and a random node from each BT is chosen. The crossover can be as simple as switching two leaf nodes between two parents, but can also be used to inherit large subtrees. A crossover between two BTs can be seen in figure 22, where a leaf node is switched with a subtree.

Figure 22: Result of a simple crossover. The parents before crossover are shown to the left, with red circles surrounding the crossover parts. The right part of the figure shows the resulting offspring.
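To illustrate the idea, here is a minimal Python sketch of subtree crossover on the nested-list BT representation used in the parsing sketch above (the first element of each list is the control node type, strings are leaf nodes). The helper names are invented for this example; the thesis implementation performs the equivalent operation on the string/BT representation described in section 3.2.3.

import copy
import random

def enumerate_nodes(tree, path=()):
    """Yield (path, node) for every leaf and subtree except the root.
    A path is a tuple of child indices into the nested lists."""
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):   # skip the node-type tag
            yield (path + (i,), child)
            yield from enumerate_nodes(child, path + (i,))

def get_parent(tree, path):
    node = tree
    for index in path[:-1]:
        node = node[index]
    return node

def crossover(parent_a, parent_b):
    """Swap one randomly chosen node (leaf or subtree) between two BTs."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    path_a, _ = random.choice(list(enumerate_nodes(child_a)))
    path_b, _ = random.choice(list(enumerate_nodes(child_b)))
    holder_a, holder_b = get_parent(child_a, path_a), get_parent(child_b, path_b)
    i, j = path_a[-1], path_b[-1]
    holder_a[i], holder_b[j] = holder_b[j], holder_a[i]
    return child_a, child_b

a = ["s", "a1", ["f", "c1", "a2"]]
b = ["f", "b1", "b2"]
print(crossover(a, b))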

3.2.9 Software setup and simulation block scheme

This section presents the different software components connected to the GA implementation. The software is listed below, and the block scheme in figure 23 illustrates the connections, together with a descriptive text.

• To create and handle BTs, the open-source Python implementation Py-trees [22] is used.

• Algoryx Dynamics [23] is used as the physics engine. Duplo-like bricks are used, as well as an adaptation of the YuMi gripper. See figure 15 for a visualization and section 3.4 for details.

• The communication between the BTs and the physics engine is done with Robot Operating System 2 (ROS2), Eloquent, for Windows 10 [24]. ROS2 is an open-source software framework for robotics development. It can, for instance, be used to enable message passing between different processes. Both Algoryx Dynamics and Py-trees have built-in ROS2 support. ROS2 is used by both programs, where each sets up a topic to write information onto, while the other reads it. See figure 23 for a visual representation of how the programs are connected.


Figure 23: Block scheme of the GA setup. The GA ("Execute GA") and the Algoryx Dynamics physics engine ("Setup simulation environment, launch ROS node") run as separate programs, indicated by thicker borders. The behavior tree launches its own ROS node and ticks the tree, and the programs exchange individuals, instructions, simulation data and results via ROS2 topics.

Figure 23 shows the two top blocks with thicker borders, which indicate separate programs running simultaneously. The GA, top left, is executed with the chosen hyperparameters, sending an individual, in the form of a string containing the BT to be simulated, to the block "Behavior Tree". To create a BT individual from a string representation (see the example in figure 17), a recursive algorithm iterates through the string. Once the BT is created, a ROS2 topic is set up on which the BT publishes instructions to the physics engine.

Meanwhile, the physics engine, shown in the top right block, is launched with the setup shown in figure 15. The physics engine continuously publishes simulation data onto a ROS2 topic.

Both programs subscribe to the other program's topic, enabling message passing between the two entities.

Once initiated, the BT reads the data published by the physics engine and is ticked. The BT starts publishing instructions to a ROS2 topic, which the physics engine reads and executes. As the physics engine updates, it publishes its data to the ROS2 topic, and the BT reads whether the performed action succeeded or failed. Once the BT reaches the state Success or Failure, the result is sent back to the GA and the cost function evaluates the success of the run. The physics engine is reset between every individual BT, and the process is repeated until the GA has finished.
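As a rough illustration of this message-passing pattern, the following is a minimal rclpy (ROS2 Python) sketch of a node that publishes BT instructions on one topic and subscribes to simulation data on another. The node name, topic names, message type (std_msgs/String) and placeholder instruction are assumptions made for this example; they are not the actual topics or message definitions used in the thesis setup.

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class BTBridge(Node):
    """Publishes BT instructions and listens to simulation data (illustrative only)."""
    def __init__(self):
        super().__init__("bt_bridge")
        # Topic names are hypothetical placeholders.
        self.instruction_pub = self.create_publisher(String, "bt_instructions", 10)
        self.data_sub = self.create_subscription(String, "simulation_data",
                                                 self.on_simulation_data, 10)

    def on_simulation_data(self, msg):
        # In the real setup, this is where the BT would be ticked with new state data.
        self.get_logger().info(f"Received simulation data: {msg.data}")
        instruction = String()
        instruction.data = "pick blue_brick"     # placeholder instruction
        self.instruction_pub.publish(instruction)

def main():
    rclpy.init()
    node = BTBridge()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()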

3.2.10 Syncing simulation clocks, avoiding duplicates and multiple runs

The simulation clock of the physics simulation and the ticks in the running BT are synced, to control the BT ticking speed and keep it constant with regards to the physics engine. During
