
School of Innovation Design and Engineering

Västerås, Sweden

Thesis for the Degree of Master of Science in Intelligent Embedded Systems - 30.0 credits

DECISION-MAKING FOR AUTONOMOUS CONSTRUCTION VEHICLES

Marielle Gallardo

mgo12003@student.mdh.se

Sweta Chakraborty

scy17001@student.mdh.se

Examiner: Daniel Sundmark

Mälardalen University, Västerås, Sweden

Supervisors: Saad Mubeen, Ning Xiong

Mälardalen University, Västerås, Sweden

Company supervisor: Torbjörn Martinsson

Volvo Construction Equipment, Eskilstuna, Sweden


Abstract

Autonomous driving requires tactical decision-making while navigating in a dynamic shared space environment. The complexity and uncertainty in this process arise due to unknown and tightly-coupled interaction among traffic users. This thesis work formulates an unknown navigation problem as a Markov decision process (MDP), supported by models of traffic participants and the user space. Instead of modeling a traditional MDP, this work formulates a multi-policy decision making (MPDM) [1] approach in a shared space scenario with pedestrians and vehicles. The employed model enables a unified and robust self-driving of the ego vehicle by selecting a desired policy along the pre-planned path. Obstacle avoidance is coupled within the navigation module, performing a detour off the planned path, obtaining a reward on task completion, and penalizing collisions with others. In addition to this, the thesis work is further extended by analyzing the real-time constraints of the proposed model. The performance of the implemented framework is evaluated in a simulation environment on a typical construction (quarry) scenario. The effectiveness and efficiency of the selected policy verify the desired behavior of the autonomous vehicle.

Keywords: shared-space users, MPDM, timing analysis, planning and decision-making, autonomous vehicle, MDP, reinforcement learning, social force model


Acknowledgements

The authors would like to express sincere gratitude to the academic supervisor of this thesis, Ning Xiong, co-supervisor Saad Mubeen and thesis examiner Daniel Sundmark for their continuous support, motivation and immense knowledge. The valuable guidance and feedback improved the quality of the thesis work by steering it in the right direction.

Besides our advisors, we would also like to express great gratitude towards our industrial supervisor, Torbjörn Martinsson, for providing such an interesting thesis opportunity and continuous assistance throughout the entire process.

This thesis work would not have been achievable without the co-operation and skills of the aforementioned persons. The authors would like to express their deep appreciation and will be grateful forever.


Abbreviations

ALF ARTIST2 language for WCET flow analysis
CFG Control flow graph
CNN Convolutional neural network
DARPA Defence advanced research projects agency
E2EDA End-to-end delay analysis
GPU Graphics processing unit
HRTA Holistic response time analysis
IDE Integrated development environment
ILP Integer linear programming
LIDAR Light detection and ranging
MBD Model based development
MCDM Multi criteria decision making
MDP Markov decision process
MPDM Multi-policy decision making
POMDP Partially observable Markov decision process
RADAR Radio detection and ranging
RCM Rubus component model
RTA Response time analysis
SFM Social force model
SWC Software circuit or Software component
SWEET Swedish execution time tool
TP Timed path
WCET Worst-case execution time
WCRT Worst-case response time


Table of Contents

1. Introduction 1
1.1. Industrial use case scenario . . . 1
1.2. Motivation and problem formulation . . . 3
1.3. Initial assumptions . . . 3
1.4. Thesis outline . . . 4
2. Background 5
2.1. Rational agents and environments . . . 5
2.2. Modeling the environment . . . 5
2.2.1 Markov decision process . . . 6
2.2.2 Partially observable Markov decision process . . . 8
2.3. Modeling the agent . . . 9
2.3.1 Creating a belief state . . . 9
2.3.2 The decision process . . . 10
2.4. Timing analysis of real-time systems . . . 12
2.4.1 Real-time characteristics . . . 12
2.4.2 Task chain . . . 13
2.4.3 End-to-end timing analysis . . . 14
2.4.4 The Rubus tool . . . 15
2.5. Control components . . . 16
2.5.1 Systems required for autonomous navigation . . . 16
3. Related work 18
3.1. Decision-making algorithm . . . 18
3.1.1 Trajectory optimization . . . 18
3.1.2 Rule-based approach . . . 18
3.1.3 Machine learning approach . . . 19
3.1.4 Probabilistic approach . . . 19
3.2. Worst case execution time analysis . . . 20
3.2.1 Static WCET analysis . . . 20
3.2.2 Measurement-based WCET analysis . . . 21
3.2.3 Hybrid WCET analysis . . . 21
3.3. End-to-end timing analysis . . . 22
3.4. Discussion . . . 22
4. Research method 23
4.1. System development research method . . . 23
4.2. Application of the research method . . . 23
4.3. Discussion . . . 24
5. Proposed solution 25
5.1. Specification of the decision-making system . . . 25
5.2. Software architecture . . . 25
5.3. Specification of timing properties and requirements . . . 27
5.4. The behavior of software components . . . 28
5.5. WCET estimation . . . 29
5.6. End-to-end timing analysis . . . 29
6. Implementation 31
6.1. Autonomous driving in shared space formulated as a MDP . . . 31
6.1.1 States . . . 31
6.1.2 Transition function . . . 31
6.1.3 Action space and policies . . . 32
6.1.4 Reward model . . . 32
6.2.1 Pedestrian forces . . . 33
6.2.2 Vehicle forces . . . 34
6.2.3 SFM based policies . . . 35
6.3. Solving MDP with multipolicy decision-making . . . 36
6.4. Simulation environment . . . 37
6.5. WCET test scenario . . . 37
7. Results 39
7.1. How multiple forces influence the motion of traffic participants . . . 39
7.1.1 Motion actions of traffic participants towards goal . . . 39
7.1.2 Avoidance of obstacles . . . 40
7.1.3 Interaction force among traffic users . . . 41
7.1.4 Selection of an optimal policy in a typical construction terrain . . . 41
7.1.5 WCET estimation . . . 43
8. Discussion 44
8.1. Analysis of the results . . . 44
8.1.1 Motion actions of traffic participants towards goal . . . 44
8.1.2 Obstacle avoidance . . . 44
8.1.3 Interaction force among traffic users . . . 44
8.1.4 Selection of an optimal policy in a typical construction terrain . . . 45
8.1.5 WCET estimation and end-to-end timing analysis . . . 46
8.2. Simulation environment and parameters . . . 46
9. Conclusion and future work 47
9.1. Formulating a tactical decision making system as a MDP framework . . . 47
9.2. Solving a MDP prototype by selecting an optimal policy for the system . . . 47
9.3. Effect of system load on real-time characteristics . . . 47
9.4. Future work . . . 47


List of Figures

1 Illustrates the difference between the urban traffic scenario (A) and the construction site environment (B) . . . 2
2 Control System for Autonomous Vehicle . . . 4
3 A visualization of the agent interacting with the environment . . . 5
4 Vehicle B must predict the maneuver of the vehicle A before proceeding . . . 6
5 Grid-world example where the agent aims to find the diamond while avoiding the rocks . . . 7
6 Example of how the policy favors complete safety over being fast . . . 8
7 Presenting the state transitioning in a MDP . . . 10
8 Task Model . . . 12
9 Single-Rate System . . . 13
10 Multi-Rate System . . . 13
11 Data Propagation Delays: Age Delay . . . 14
12 Data Propagation Delays: Reaction Delay . . . 15
13 Periodic Task Chain . . . 15
14 Modelling of software architecture in RCM . . . 16
15 A flow diagram of the components required for autonomous navigation . . . 17
16 A Flow Diagram of Multi-methodological Research Approach . . . 23
17 Process of System Development Approach . . . 24
18 A Flow Diagram of the Approach Applied For System Design . . . 26
19 Overview of Software Architecture . . . 26
20 The different Move Crowd SWCs . . . 27
21 Overview of the forces exerted on pedestrian i . . . 28
22 Example code to demonstrate path clustering . . . 29
23 Software Architecture Modelled in Rubus ICE . . . 30
24 Demonstrates the angle ϕyδ . . . 35
25 Effective field of view for human drivers . . . 35
26 A presentation of the simulated environment where the black dots represent pedestrians and the blue circles represent vehicles . . . 38
27 Shows the behavior of the agent while navigating towards the goal without disturbance . . . 39
28 The autonomous agent adapts its velocity and direction towards the left as one of the vehicles is located in its path . . . 40
29 Shows how the autonomous vehicle avoids static obstacles detected on its path . . . 40
30 Shows how agents interact with each other; the autonomous vehicle stops in captures 3 and 4 when pedestrians move towards it . . . 41
31 The vehicle in front is estimated to navigate to the right and as such the agent continues on its path . . . 42
32 The agent stops when vehicles are heading towards it . . . 42
33 The vehicle in front is heading in the same direction; as a result the agent chooses to follow the leading vehicle . . . 42
34 The agent, though they are parallel and not in a straight line, still considers to observe the other agent and follow it as they have the same direction and orientation . . . 43

List of Tables

1 Analysis Statistics . . . 43
2 Analysis Result . . . 43


1. Introduction

In recent years, there has been an increased interest in the research and development of autonomous vehicles. The potential benefits of self-driving vehicles on public roads are well recognized within the automotive industry [2]. Every year approximately 1.35 million people are killed in car accidents, and the main cause is directly linked to human errors [3]. The increased deployment of autonomous systems in vehicles could provide long-term benefits in terms of greater traffic safety, increased efficiency and lower fuel consumption [4]. Today automation technology is already being incorporated in modern cars, such as dynamic steering response, adaptive cruise control, and automatic braking systems.

The strive for automation does not only apply to the automotive sector. The effective use of automation in the construction industry is viewed as one of its greatest opportunities [5]. Accidents with heavy equipment vehicles are common [6], and automating machinery strives to provide a safer work site. Employing fully autonomous heavy equipment vehicles could also potentially increase productivity and efficiency [7]. Such machines are capable of dealing with work tasks without halt and provide quality work in a fast-paced manner, thus effectively speeding up the process of finishing the overall work operation.

Despite the benefits of automation, there are several problems to overcome in the process of automating these vehicles [8]. Constructing reliable and safe autonomous systems requires modeling and handling all the complex scenarios that the vehicle might encounter in the environment. Automation control systems are reliant on robust predictions and sensor measurements to reliably adapt to different scenarios. In such situations, the world model must account for any uncertainty in sensor measurements, and consider the behavior of different traffic participants to make appropriate risk assessments of future planning and motion [9].

In automation, planning and decision-making methods provide a framework for modeling complex scenarios and automatically determining an appropriate maneuver for the situation. Considering the complexity of the real world and the limited precision of sensors, a probabilistic decision model is often employed [10] [11]. With a probabilistic model, the vehicle makes robust decisions by predicting and assessing future consequences. While proper theoretical frameworks for planning under uncertainty exist, namely the Markov decision process (MDP) and the Partially observable Markov decision process (POMDP), tractable solutions require choosing a suitable architecture for handling the different models of the environment [12].

In the architecture of an autonomous vehicle, motion planning is a fundamental part that affects the performance of the autonomous vehicle [13]. Motion planning concerns the high-level goal of transporting the vehicle from point A to point B in a collision-free manner. The task of moving the vehicle from one point to another alone involves controlling several maneuvers, e.g. providing appropriate vehicle speed and wheel steering angle. Along with this, in order for the autonomous system to avoid any obstacle that could result in an accident or collision, the system has to re-plan the original maneuver tactics. For re-planning, it is extremely important that a feasible solution is calculated in a timely manner [14]. Preventing planning latency is therefore of utmost importance. Managing all subsystems simultaneously becomes a highly complex problem. A hierarchical software architecture, allowing for different planning methods in different scenarios, is therefore considered a key solution for solving large planning problems [15].

This thesis work aims to investigate how a decision-making module can be designed to navigate in a dynamic, unstructured construction environment. The environment constrains the system to estimate the behavior of traffic participants in the space to account for uncertain encounters. To ensure that the decision-making system is capable of performing safe online trajectory planning, a set of well-defined maneuvering tactics needs to be established. Furthermore, the evaluation of decisions is required to be made within a strict time constraint to meet productivity goals and ensure safety.

1.1. Industrial use case scenario

The complexity of the environment among different driving scenarios differs greatly. Urban or city driving is viewed as one of the most difficult environments to model since it involves a variety of complex driving situations.


There are a lot of different traffic participants, traffic rules and road types to consider. A lot of research regarding autonomous driving has therefore mainly focused on driving in highway scenarios since these are considered most tractable [14]. In these cases, the decision system is often modeled as taking tactical high-level decisions such as choosing between lane-changing or lane-staying while avoiding collisions. These decisions are then sent to the trajectory generator that provides low-level lateral and longitudinal control.

The environment at the construction site varies from urban and highway driving scenarios, as depicted in Figure 1. Unlike public roads, there are no lanes and space is less limited. Furthermore, there are no predefined traffic rules. Humans operating the vehicles are instructed to perform specific tasks, e.g. going towards a pile and picking up gravel. At the same time, the work of all vehicles at the site is administratively controlled by humans, remotely or at the site, to guarantee an efficient workflow.


Figure 1: Illustrates the difference between the urban traffic scenario (A) and the construction site environment (B)

The main tasks and goals likewise differ for construction vehicles as opposed to regular cars. For self-driving cars, the main objective is to travel from point A to B in a safe manner. Construction vehicles are not only concerned with providing a safe trajectory. Other tasks, such as digging and loading material efficiently are equally important.

The main purpose of this work is to model a generic decision system that can handle different scenarios while considering high-level goals. To provide a proof of concept the scenario is required to be explicitly defined. In this work, the system will consider the autonomous vehicle to be a wheel loader traveling towards a pile. Any other vehicles in its surrounding are considered to be human operated. The high-level goal is to reach the pile efficiently and safely. Along the way, the vehicle may encounter different static or dynamic objects.

In motion planning, objects in the vehicle's path are treated as obstacles and thus avoided [16]. In real situations, such as at the construction site, different objects should be treated differently depending on the circumstances and established rules. For instance, if some particular vehicle is crossing the path of the autonomous wheel loader, the wheel loader might be required to always give way to that vehicle according to company rules. Or, in the presence of humans, it might be ordered to perform an emergency stop, signal and wait until the human is completely out of sight before continuing. Considering the wheel loader is the only autonomous vehicle at the site, it should further not behave abruptly or scare others.

The purpose of the hierarchical decision system is to provide rules, similar to traffic rules, that obey company requirements. The system takes behavior decisions on how the different objects will be handled, while still considering the main goals. The decision system then sends commands to the motion planner and controllers (e.g. adaptive cruise control) to adapt lateral and longitudinal velocity.

This thesis work is a collaboration with Volvo Construction Equipment with the purpose of creating a decision-making system for autonomous construction vehicles.


Productivity and safety are important aspects, and as such the main objectives of the decision engine are to make safe and robust decisions while considering timing requirements. In this work, the aimed level of automation is level 5 according to SAE International [17], which means fully automated under all conditions.

1.2. Motivation and problem formulation

Modeling a hierarchical decision engine reduces the probability of acquiring planning latency and execution errors that occur due to complex decision making [14]. However, there is no standard hierarchical solution, and finding an appropriate hierarchical structure is a challenge [15]. At the same time, in order to perform real-time decisions, the decision-making algorithm must be able to evaluate decisions within a strict time constraint.

A hierarchical control model for driver behavior further divides the decision-making in autonomous vehicles into three categories [18], since decision-making is required for different areas and applications. The model is based on how humans select tactics to achieve a goal and comprises a strategic level developed for long-term goals, a tactical level selecting high-level maneuvers to react to the current situations, and a control level handling low-level decisions such as the appropriate steering angle for adapting to the high-level maneuvers.

The proposed system operates at the tactical level. A key challenge in autonomous cars is how to model decision systems that operate at this level, since it is difficult to determine the appropriate behavior when many external and uncertain factors affect the proper maneuver. Thus, in order to provide a framework for planning under uncertainty, the problem is modeled as an MDP or POMDP.

As can be observed, the complexity of designing a tactical decision system is large. The environment, possible scenarios, initial assumptions, and rules are required to be well established and limited to provide a solution. With these guidelines, the work in this thesis aims to answer the following questions whilst maintaining the focus on the construction vehicle domain:

• Research question 1: How can a tactical decision engine for construction vehicles be formulated as a partially observable- or Markov decision process?

• Research question 2: How can real-time characteristics be taken into account while modeling a tactical decision engine?

• Research question 3: How can we solve the POMDP/MDP formulation and obtain an optimal policy¹ considering real-time constraints?

1.3. Initial assumptions

Limiting the scope of this thesis work is very important considering the complexity of the problem. Therefore, several assumptions have to be made beforehand. Discussing the limitations requires understanding the structure of a typical control system for autonomous vehicles [19]. Figure 2 illustrates such a system, which can be further divided into separate modules.

The perception subsystem corresponds to the extraction of available information from the vehicle's prevailing surrounding environment. The data is obtained from both onboard and external sensors. The abstract data (e.g. vehicle position, static or dynamic objects in the surroundings) gathered from these sensors is then processed and subsequently transferred in an adequate form to the following rules interpreter and behavior module. This work assumes the collection of all raw data from the perception layer to be performed exclusively through onboard sensors. Additionally, it is also presumed that the construction site may be populated by both static and dynamic objects. For example, at the construction site, static obstacles can be piles of material, barrels, or parked vehicles, whereas dynamic obstacles are working vehicles or humans.

Based on the information provided by the perception subsystem, the traffic rules interpreter module checks the current rule and points at the most suitable reactions. Unlike urban or highway traffic conditions, at a construction site there are no predefined traffic rules. The authors of this thesis have defined possible rules considering a typical construction environment at the behavior controller module.

¹An optimal policy lets the agent know the most favorable action to take in each state, such that it maximizes the expected reward.


[Figure content: control system blocks – Car State, Environment, Sensor Data, Perception, Traffic Rules Interpreter, Behavior Controller / Decision Making, Behavior Interpreter and Low-level Controller]

Figure 2: Control System for Autonomous Vehicle

Furthermore, for the sake of simplicity, an even terrain is assumed at the construction site.

The behavior controller or decision-making subsystem comprises a set of closed-loop control algorithms and a certain number of user-defined rules. Each algorithm is able to maneuver the vehicle under a given environmental situation (e.g. following or avoiding other vehicles, or braking). The role of the behavior controller module is to decide upon which specific algorithm the system should perform in any distinct scenario. Finally, the output decision is forwarded to the behavior interpreter and low-level controller module, which then plans and executes the designated control action, such as planning the best route and modulating the vehicle's rotational or translational velocity, steering angle and other actuators. This thesis mainly focuses on the design of a decision-making module, assuming other modules are already provided as given inputs to the system.

1.4. Thesis outline

The rest of the thesis work is organized as follows: Section 2. explores relevant background for understanding the topic of the thesis. In the background, two different approaches for modeling the decision-making environment are discussed. First, the Markov decision process (MDP) is discussed and then the Partially observable Markov decision process (POMDP) is presented. Section 3. presents and discusses various related work. Continuing this, a scientific approach for research methodology is introduced together with its application; this is described in Section 4. Section 5. discusses how to formulate the decision-making system as an MDP. Later, the software architecture and timing properties of the system are presented. Lastly, how to select a suitable timing analysis technique for the designed system is addressed. In Section 6. various implementation concepts and formulas are described in detail. Section 7. evaluates the performance of the implemented system with different test-case scenarios and Section 8. further discusses the results along with the limitations of the system. Finally, the thesis is concluded in Section 9., followed by a short proposal on the further development of the designed system.


2. Background

This chapter addresses the theoretical foundations for understanding the concept of real-time decision-making in intelligent agents. Firstly, the concepts of rational agents and environments are discussed. An overview of the theory and algorithms behind Partially observable- and Markov decision processes are then presented. Following this, a general overview of real-time schedulability analysis is briefly discussed. The chapter ends with presenting the components and algorithms required for the vehicle system to sense, react and navigate.

2.1. Rational agents and environments

In artificial intelligence, the study of thought processing, reasoning, and behavior of rational agents is of interest. An agent is any system that can acquire information from the environment, make decisions and take actions. The agent can be anything from a human or robot to a nonphysical entity such as a software program. The fundamental part is that the agent entity has a way of perceiving its environment and acting upon the environment. Consider us humans, who have eyes and ears to orientate ourselves in the world, or a robot system that has different vision sensors and various motors operating as actuators.

An agent is said to be rational if it can take the ideal action, given the information it knows. The study of finding the optimal action is what the field of decision making concerns itself with. To solve this problem the decision-making system is composed of two distinct components, the agent and its environment as depicted in Figure 3.


Figure 3: A visualization of the agent interacting with the environment

The environment provides a way to model the agent's world and how it interacts with it. Rewarding certain actions in the environment provides a way to make the agent behave in a certain manner. The agent component can further be divided into the interpreter and the decision solver. The interpreter is the part of the agent responsible for modeling its sensors; it observes and creates a belief state of where the agent is believed to be in the environment. The solver is the brain of the agent. It considers the environment, the rewards given for certain actions, and the observations made by the interpreter to decide upon what action to take. The components of the system are further discussed in detail below.

2.2. Modeling the environment

The properties of the environment determine how the environment is modeled. An environment is said to be fully observable if the agent, by its sensors, has a complete awareness of the state of the world at every point in time. The benefit of such a world model is that the agent does not need to create a memory database of past observations to keep track of the environment. Consider the checkers game. The environment is fully observable in the sense that the agent has full state awareness of the game by knowing the game rules and the number of pieces that it and its opponent have access to.


Contrarily, the environment is partially observable if the agent only has limited awareness of the world state. The limited awareness is either a result of incorrect sensor measurements from the agent or because the information is missing from the environment. For autonomous vehicles navigating in traffic, the scenario deals with partial information. The vehicle does not fully know the intention of other traffic participants and as such must make predictions of future states.

For autonomous vehicles, the environment is stochastic since the outcome of a specific state is uncertain. This is opposed to when the environment is deterministic, meaning the next state is completely determined by the current state and the action performed by the agent. However, most real-world environments are not deterministic, as can be illustrated with the example of navigation for autonomous vehicles represented in Figure 4. A mathematical framework is required for modeling a stochastic world environment. The following sections will discuss the widely employed Markov decision process framework for modeling stochastic decision making where the outcome is partly uncertain.


Figure 4: Vehicle B must predict the maneuver of the vehicle A before proceeding

2.2.1 Markov decision process

A Markov decision process (MDP) is a 5-tuple ⟨S, A, T, R, γ⟩ where every element is defined as follows:

• S: a finite set of states

• A: a finite set of actions for each state

• T: a transition model T(s, a, s′) = P(s′ | s, a) giving the probability of going to another state s′ given the current state s and executing the action a

• R: a reward function R(s, a) providing a scalar value as a reward for being in a particular state s and executing action a

• γ: a discount factor γ ∈ [0, 1) for adjusting the preference of immediate rewards or larger rewards in the future

The MDP framework aims to be a straightforward model for achieving goals that require learning from interacting with the environment. The agent learns that certain actions in a particular state are more favorable than others. The interaction is continuous as the agent in every point in time


has to select an action, change to a different state and react to the new situation. The transition model governs the probability that the agent actually changes to the desired state. For MDPs the environment is stochastic and fully observable [20]. The stochasticity is due to the agent not being completely certain of what state it will be placed in after taking a particular action, but the environment is fully observable as the agent’s observations of the environment are assumed to be correct. Consider the following grid-world example where the goal of the agent is to find the diamond and avoid the rocks:

Figure 5: Grid-world example where the agent aims to find the diamond while avoiding the rocks

• States are the different grid blocks.

• Actions are up, down, left, right.

• The transition function states that there is a probability of 30% that an incorrect action is taken: if the agent wants to go up (or down), the probability of actually going up (or down) is 70%, with a 15% chance that the agent goes left and a 15% chance that it goes right. If the agent goes left (or right), the probability of actually going left (or right) is 70%, with a 15% chance that the agent goes up or down. Once the agent is in a particular state it is assumed to fully know its position in the grid-world.

• The reward model gives the agent -10 points when it hits the rocks, +20 points when finding the diamond and -0.2 points for transitioning to any other block. The negative reward for moving into an empty block provides a way to prioritize reaching the diamond as quickly as possible. A minimal code sketch of this model is given after the list.
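To make the grid-world MDP above concrete, the following sketch shows one way the state, transition and reward models could be encoded in Python. The grid size and the rock and diamond positions are illustrative assumptions (they are not given numerically here), so this is a minimal sketch rather than the thesis' implementation.

```python
# Minimal sketch of the grid-world MDP; grid size and cell positions are assumed.
import itertools

ROWS, COLS = 3, 4
ROCKS = {(1, 1), (1, 2)}          # assumed rock cells
DIAMOND = (0, 3)                  # assumed diamond cell
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}
STATES = list(itertools.product(range(ROWS), range(COLS)))
ACTIONS = list(MOVES)

def move(state, direction):
    """Apply a move deterministically; bumping into the border keeps the agent in place."""
    r, c = state[0] + MOVES[direction][0], state[1] + MOVES[direction][1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state

def transition(state, action):
    """T(s, a, .): 70% the intended direction, 15% for each perpendicular direction."""
    probs = {}
    for direction, p in [(action, 0.70),
                         (SIDEWAYS[action][0], 0.15),
                         (SIDEWAYS[action][1], 0.15)]:
        nxt = move(state, direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs

def reward(next_state):
    """Reward for entering a cell: -10 on rocks, +20 on the diamond, -0.2 otherwise."""
    if next_state in ROCKS:
        return -10.0
    return 20.0 if next_state == DIAMOND else -0.2
```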

The MDP has the Markov property, stating that once in the current state, the necessary information about the history of past states is accessible without the need to explicitly memorize or save past events. In mathematical expressions, this means that state s′ = s_{t+1} has the Markov property only if

P[s_{t+1} | s_t] = P[s_{t+1} | s_1, s_2, ..., s_t],   (1)

where the state can access relevant information from the history.


The agent follows a policy in order to arrive at the main goal. The main objective of an MDP is to find a policy π(s) → a that determines the optimal action for every state. The optimal policy π*(s) → a is a policy that maximizes the expected reward R(s, a). The value of an arbitrary policy π(s) is defined as follows:

V^π = E_π [ Σ_{t=0}^∞ γ^t R(s_t, π(s_t)) ],   (2)

where the policy maps every state to a particular action, and the expected accumulated return starting from state s, taking action a, and following policy π tells us the greatness of the policy. The policy is guaranteed to converge over an infinite time horizon since γ ∈ [0, 1).

The optimal policy π*(s) can be defined as follows:

π* = argmax_π E [ Σ_{t=0}^∞ γ^t R(s_t, π(s_t)) ],   (3)

where the only difference is the maximization of the accumulated reward over all actions sampled from a particular state as defined by the policy.
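As a rough illustration of Equation (2), the expected discounted return of a policy can be approximated by averaging sampled rollouts. The sketch below is assumption-based: it presumes transition(s, a) returns a dictionary mapping successor states to probabilities and reward(s′) returns a scalar, mirroring the grid-world sketch above, and it truncates the infinite horizon.

```python
import random

def rollout_return(policy, transition, reward, start, gamma=0.9, horizon=200):
    """One sampled discounted return sum_t gamma^t * R, following policy(state) -> action."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):                      # truncate the infinite sum
        probs = transition(state, policy(state))  # dict: successor state -> probability
        state = random.choices(list(probs), weights=list(probs.values()))[0]
        total += discount * reward(state)
        discount *= gamma
    return total

def estimate_policy_value(policy, transition, reward, start, episodes=2000, **kw):
    """Monte Carlo estimate of the expected discounted return of a policy from `start`."""
    return sum(rollout_return(policy, transition, reward, start, **kw)
               for _ in range(episodes)) / episodes
```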

Consider the grid world example again. Once the agent is in a particular state it can take different routes for achieving the main goal of getting the diamond as previously mentioned. The agent could take a safer route away from the rocks or take a more risky but shorter path alongside the rocks as can be depicted in Figure 5. Finding the optimal policy is deciding which strategy is most optimal.

The transition and reward functions alter which parameters are of most importance and therefore affect what defines the optimal policy. In the grid-world case, the reward function might favor safety over being fast. Consider the same reward and transition model example presented before. The highlighted state in Figure 6 might seem unreasonable. However, the transition model states that when the agent goes down there is no probability of going up. Thus, the agent can completely avoid the rock. The agent might be stuck in the same state for a while, but because there is a probability of 15% of going right, the agent will eventually reach the goal. This is fine because the reward model only gives the agent a penalty of -0.2 for transitioning to other states. Hitting the rock is worse as it gives the agent -10 points. The optimal policy has therefore defined that the most optimal way to act in states near the rock is to completely avoid it.

Figure 6: Example of how the policy favors complete safety over being fast

There are different methods for finding the optimal policy, such as dynamic programming and different tree search methods. The solver component in the agent has the role of finding the most optimal policy to execute. A more descriptive role of the solver part of the agent is presented in Section 2.3.2.

2.2.2 Partially observable Markov decision process

Partially observable Markov decision processes, as can be observed by the name, are an extension of MDPs for modeling partly observable stochastic decision making to account for sensor- and


environmental uncertainties. Contrary to the case of MDPs, the agent is not capable of observing the full state. Instead, the agent has to make observations to hint about the true state. This is the case for many real systems that are relying on uncertain or subjective sensors to model the environment. Suppose the agent fails to perform an action or the environment changes due to outside forces. POMDPs provide a way to handle these situations.

In a POMDP, the set of observations can be probabilistic to model the probability of each observation for each state in the model. This is referred to as the belief state and creates the notion of possible states the agent might be in. Comprising knowledge about the current state in a POMDP would generally require keeping track of the entire history of past states, making the POMDP a non-Markovian process; however, maintaining the belief state (i.e. a probability distribution) over possible states provides the information needed to maintain the history [20]. Thus POMDPs are Markovian processes.

POMDPs are a 7-tuple ⟨S, A, T, R, γ, Ω, O⟩ where the first five elements form a regular MDP and the last two elements are defined as follows:

• Ω: a set of observations

• O: a set of observation probabilities O(s′, a, o) = P(o | s′, a) specifying the probability of receiving the observation o ∈ Ω given that the system ended up in state s′ after taking action a

The belief state B is the key component for the interpreter part of the agent. As new observations are made the belief state is updated to get more accurate knowledge about the possible state. The interpreter and belief states are discussed below.

2.3. Modeling the agent

This section first discusses the agent component required for interpreting sensor data to model a belief state of the environment. The actual decision process in the agent is then described.

2.3.1 Creating a belief state

The creation of the belief state is important for modeling sensor uncertainty. In the process, the agent starts with a prior belief distribution. As new observations are made the belief distribution is then updated to create a posterior belief distribution. The process of updating the belief state is referred to as filtering. Filtering allows the agent to refine its belief of possible states.

The Markovian property states that the new belief over states only requires the notion of the previous belief distribution, the current observation and the action taken. In mathematical terms, this is expressed as follows:

b′ = τ(b, a, o),   (4)

where b′ is the posterior belief state and b is the prior belief distribution.

The posterior (updated) belief distribution can be computed as follows:

b′(s′) = α P(o | s′, a) Σ_s P(s′ | s, a) b(s),   (5)

where b′(s′) is the probability of state s′ and α is a normalizing constant that makes the belief state sum to 1. The probability of state s′ depends on the probability of making observation o ∈ Ω given state s′ and action a, weighted by the transition probability of reaching state s′ and the previous belief state b(s).
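A direct transcription of the belief update in Equation (5) could look as follows. The sketch assumes T(s, a, s′) and O(s′, a, o) are supplied as functions and the belief is a dictionary over states; it is an illustration under those assumptions, not the thesis' implementation.

```python
def belief_update(belief, action, observation, T, O, states):
    """One Bayes filter step: b'(s') = alpha * O(s', a, o) * sum_s T(s, a, s') * b(s)."""
    unnormalized = {}
    for s_next in states:
        prior = sum(T(s, action, s_next) * belief[s] for s in states)
        unnormalized[s_next] = O(s_next, action, observation) * prior
    alpha = sum(unnormalized.values())            # normalizing constant
    if alpha == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / alpha for s, p in unnormalized.items()}
```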

For updating the belief distribution, different filtering methods exist, such as Bayesian filtering and Monte Carlo particle filters. Due to the Markovian property of the belief state, a POMDP can be formulated as an MDP where the belief states are the states. The problem is that the belief-state space is continuous considering the possibility of infinite probability distributions of


the belief state. Finding the optimal policy π∗(b), therefore, requires other solutions or adapting the standard methods employed for solving classical discrete MDPs [20].

2.3.2 The decision process

The actual decision procedure happens when the agent, after making observations, decides what action to execute considering its beliefs. The process is narrowed down to finding the optimal policy in MDPs and POMDPs. There are several methods and algorithms for finding an optimal policy. The methods are further categorized as either online or offline approaches.

The offline approaches assume the solutions can be calculated beforehand for every state (e.g. supervised learning used to equip the agent with knowledge beforehand to choose good actions), whereas online methods find the optimal policy directly from the current state (e.g. Monte Carlo tree search simulating the best action to take in every state). For problems with a large state space, calculating the policy beforehand for every state might be impossible [21]. The often employed value iteration and policy iteration methods are presented below to describe how an optimal policy can be found for MDPs.

Example of how to find an optimal policy

Value iteration and policy iteration are dynamic programming methods. Both these methods assume that the agent knows the MDP model of the environment. This refers to the agent having knowledge about the transition and reward functions. Assuming this knowledge, the agent can plan what action to take beforehand (offline) [21].

In MDP there is the notion of states and q-states. States are regular states while q-states are chance nodes. While an agent is at a particular state it determines to take a certain action. It then transitions to a q-state where the transition model determines the probability to land in the next state. This process repeats itself. The reward function gives a value when the transition has happened. The actual process is illustrated in Figure 7.


Figure 7: Presenting the state transitioning in a MDP

The utility is the sum of discounted rewards. Maximizing the long-term discounted reward requires finding a policy π that maximizes the expected future utility of each state s, and of each q-state (s, a):

V^π(s) = E{ R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... | s_t = s, π },   (6)

Q^π(s, a) = E{ R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... | s_t = s, a_t = a, π },   (7)

where γ is the discount factor.

Both these functions are referred to as value functions. The value function V^π(s) specifies how good a state s is if following the policy π. In other words, the value function is equal to the expected future utility starting from state s and executing the actions specified by the policy. With this value function, the policy is evaluated depending on the long-term value the agent is expected to get from following the policy. The action-value function Q^π(s, a) specifies how good a particular action is from a particular state while following policy π. It specifies the expected future utility from a q-state following the actions defined by the policy.

The optimal value function V*(s) is the function that has a higher value than all other value functions. It is the expected utility starting in s and acting optimally. The optimal action-value function Q*(s, a) gives the highest value for having taken action a from state s and then behaving optimally. Since the optimal value function V*(s) defines acting optimally from state s, and Q*(s, a) defines taking an action a from state s and then acting optimally, V*(s) equals the maximum of Q*(s, a) over all possible actions, defined as follows:

V*(s) = max_a Q*(s, a)   (8)

The definition of the optimal Q-function is the following:

Q*(s, a) = R(s, a) + γ Σ_{s′} P(s′ | s, a) V*(s′),   (9)

where R(s, a) is the immediate reward after performing action a in state s. Since there is a distribution over outcomes, the sum over all possible next states s′ is weighted by the probability of landing in those states when taking action a, times the discounted expected future utility starting from s′, which is given by V*(s′). These are recursive Bellman equations that can be solved.

Value iteration initializes the value function V(s) to arbitrary random values. The Q(s, a) and V(s) values are updated iteratively in a bottom-up way to compute the optimal value functions. The idea is to work with time-limited values: at every step, the value V_k(s) is known and is used to compute V_{k+1}(s), and both values are recursively updated. This process is repeated until the values converge. The optimal policy is given once the optimal values of every state are set. The following equation calculates the optimal value of s for a finite horizon:

V_{k+1}(s) = max_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V_k(s′) ]   (10)
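A small, assumption-based sketch of the value iteration update in Equation (10) is given below; it assumes tabular states and actions with T(s, a, s′) and R(s, a) provided as functions, as in the earlier grid-world sketch.

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Iterate V_{k+1}(s) = max_a [R(s, a) + gamma * sum_s' T(s, a, s') * V_k(s')]."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(R(s, a) + gamma * sum(T(s, a, s2) * V[s2] for s2 in states)
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < eps:   # convergence check
            return V_new
        V = V_new
```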

The complexity of value iteration is as high as O(S²A) since the equation needs to look at each state, action and next state per iteration. The algorithm is impractical over large state and action spaces. Policy iteration reduces the complexity to O(S²) per iteration by calculating the value function for a fixed policy. The action space is reduced considering the policy maps every state to one action. The process first evaluates an arbitrary policy π_i in a similar manner as value iteration until convergence:

V^{π_i}_{k+1}(s) = R(s, π_i(s)) + γ Σ_{s′} P(s′ | s, π_i(s)) V^{π_i}_k(s′)   (11)

Then a new better policy is created by comparing the evaluated policy to the actions available. The action with the best value is chosen and the process is repeated for every new state. Convergence is reached once the evaluated policy is equal to the new optimal policy. Instead of returning the value of the best action, the action is returned. In mathematical form this is expressed as follows:

π_{i+1}(s) = argmax_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V^{π_i}(s′) ]   (12)
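For comparison, a sketch of policy iteration following Equations (11) and (12) is shown below, again as a tabular illustration under the same assumptions as the value iteration sketch, not as the thesis' implementation.

```python
def policy_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Alternate policy evaluation (Eq. 11) and greedy policy improvement (Eq. 12)."""
    policy = {s: actions[0] for s in states}          # arbitrary initial policy
    while True:
        # Policy evaluation: V(s) = R(s, pi(s)) + gamma * sum_s' T(s, pi(s), s') V(s')
        V = {s: 0.0 for s in states}
        while True:
            V_new = {s: R(s, policy[s]) + gamma *
                        sum(T(s, policy[s], s2) * V[s2] for s2 in states)
                     for s in states}
            converged = max(abs(V_new[s] - V[s]) for s in states) < eps
            V = V_new
            if converged:
                break
        # Policy improvement: choose the greedy action under the evaluated V
        improved = {s: max(actions,
                           key=lambda a, s=s: R(s, a) + gamma *
                           sum(T(s, a, s2) * V[s2] for s2 in states))
                    for s in states}
        if improved == policy:                        # stable policy -> optimal
            return policy, V
        policy = improved
```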


2.4. Timing analysis of real-time systems

This section first addresses the general concept of schedulability analysis which verifies whether the timing constraints specified on the system are satisfied or not. Following this, the end-to-end timing constraints are discussed. At the end of this section, the utilized industrial tool suite for timing analysis is presented.

2.4.1 Real-time characteristics

A real-time system is required to interact and communicate with its environment in a timely manner. Such a system must process information and deliver a logically correct output within a strict timing constraint. Regardless of the system load, failure of such a system may in many applications result in catastrophic consequences such as endangering human life or the environment. Hence, the temporal aspects of these systems must be predictable. While considering the real-time characteristics of the embedded system, each software component of the system is viewed as a set of tasks. The three fundamental characteristics of these task sets in a periodic task model are the worst-case execution time (C), the deadline (D) and the period (T). A brief description of each concept is presented below and shown in Figures 8a and 8b.

• Task: A sequence of a program executed on a processor, aimed to perform a certain computation.

• Time period: The time interval between activations of two consecutive task instances.

• Deadline: The latest time within which the execution of the task has to be finished after its release.

• Worst-case execution time (WCET): The maximum execution time of a program on specific hardware without any interruption. Estimation of safe and tight WCET of a program is crucial for reliable and accurate behavior of the real-time system.

• Worst-case response time (WCRT): The longest possible time required by a task to finish its execution while considering worst-case interference (preemption and blocking) from other tasks in the model.

[Figure content: task timing diagrams showing period start, finishing time, worst-case execution time, relative and absolute deadline, and response time: (a) without interference, (b) with interference]

Figure 8: Task Model

The response time of a task is bounded by interference from other tasks in the model. More precisely, a task execution may be preempted by other higher-priority tasks or blocked by lower-priority tasks. As depicted in Figure 8a, the execution of a task is not interfered with by any other task in the model. Hence, the task's response time and WCET are equal. On the other hand (see Figure 8b), due to interference by another task in the model, the response time of the task is higher than the WCET.


The real-time requirement of each software component primarily depends on two major constraints. First, the WCET of each task must be less than or equal to its deadline. Moreover, the deadline for each task should be less than or equal to its period, as shown in Equation 13.

C ≤ D ≤ T (13)

Another requirement is the schedulability of task sets. This implies that each task's maximum response time, i.e., the time elapsed between the moment the task becomes ready to execute and the moment it finishes its execution, must be less than or equal to its deadline [22]. Schedulability analysis methods verify that the timing requirements of the system are met. One such method is Response time analysis (RTA), which checks the schedulability of a system based on conditions that are both necessary and sufficient [23]. In real-time systems and networks, the upper bounds on the response times of tasks or messages are calculated by the RTA method.
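The RTA equations are not given here, but the classic fixed-point recurrence for fixed-priority preemptive tasks (a task's response time equals its own WCET plus preemption from all higher-priority tasks) can be sketched as follows; the task parameters in the usage example are hypothetical.

```python
import math

def worst_case_response_time(C, T, i):
    """Standard RTA recurrence R = C_i + sum_{j in hp(i)} ceil(R / T_j) * C_j.

    C and T list WCETs and periods ordered by decreasing priority
    (index 0 = highest). Returns None if the task cannot meet an implicit
    deadline D = T, i.e., it is deemed unschedulable by this test.
    """
    R = C[i]
    while True:
        R_next = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
        if R_next == R:                 # fixed point reached
            return R
        if R_next > T[i]:               # exceeds the implicit deadline
            return None
        R = R_next

# Hypothetical task set: WCETs and periods, highest priority first
C, T = [1.0, 2.0, 1.5], [4.0, 8.0, 16.0]
print([worst_case_response_time(C, T, i) for i in range(3)])   # -> [1.0, 3.0, 5.5]
```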

2.4.2 Task chain

A task chain contains a sequence of tasks and can be classified as a trigger or data chain depending on the activation of tasks in the chain. Therefore, the analysis of the task activation pattern is of significance to the performance of the real-time system. The classified task chains are presented and described in the list below:

• Trigger chain: The first task in the chain is initially activated by a triggering source (e.g., event, clock or interrupt) as shown in Figure 9. The rest of the tasks in the trigger chain are activated by their predecessor tasks. In such a chain, there is an implicit precedence relation between any two neighboring tasks, i.e., a task can only be triggered for execution after its predecessor task has finished its execution. If a system is modeled solely with trigger chains then the system is said to be single-rate.


Figure 9: Single-Rate System

• Data chain: Each task in the data chain is activated individually with an independent clock and often with distinct periods (see in Figure 10). Hence, all tasks are independent of each other. Signals from peripheral devices or messages from the network interfaces provide data to the first task in the chain. However, the rest of the tasks receive data from their predecessors. If a system contains a single data chain with different clocks then it is called a multi-rate system.

Figure 10: Multi-Rate System

2.4.3 End-to-end timing analysis

In the case of a pure trigger chain or single-rate system, not every task in the chain is independently activated. In fact, only the first task in the chain is activated by an event, while precedence constraints exist among the other tasks. In this work, the analysis of end-to-end data propagation delay is discussed exclusively. In order to verify the end-to-end delay constraint, especially in the automotive domain, the age delay and the reaction delay are expected to be calculated. To explain the analysis method, a few relevant concepts are briefly described. The following list presents concepts for understanding end-to-end data propagation delay:

• Timed path (TP): A sequence of all task instances along a task chain in the system.

• Reachable timed path: A path where data can actually propagate from the first task to the last task in the chain.

• Age delay: The maximum delay on reading the output among all reachable timed paths in the chain, as shown in Figure 11. The age delay can be calculated manually either using a mathematical equation or by creating a graphical representation. The mathematical equation for age delay estimation is expressed in Equation 14:

AgeDelay = Max{ Delay(TP^1), ..., Delay(TP^M) },   (14)

where Delay(TP^M) is the data propagation delay in each timed path and can be calculated by Equation 15:

Delay(TP^M) = α_n(TP_n^M) + R_n(TP_n^M) − α_1(TP_1^M),   (15)

where
α_1(TP_1^M) is the activation time of the instance of the 1st task in path TP^M,
α_n(TP_n^M) is the activation time of the instance of the nth task (the last task in the chain) in path TP^M, and
R_n(TP_n^M) is the response time of the instance of the nth task in path TP^M.


Figure 11: Data Propagation Delays: Age Delay

• Reaction delay: In order to calculate the reaction delay in a task chain it is required to consider the effect of missing new data as input in the first instance of a task. Reaction delay is defined as the maximum delay among all reachable timed paths in a single hyperperiod having a non-duplicate or "first" output of the chain. The mathematical formula to calculate the reaction delay is expressed in Equation 16; for the graphical representation see Figure 12.

ReactionDelay = Max{ Delay(TP^1), ..., Delay(TP^M) },   (16)

where Delay(TP^M) is the delay in each such path and can be calculated by Equation 17:

Delay(TP^M) = α_n(TP_n^M) + R_n(TP_n^M) − α_1(Pred(TP_1^M))   (17)



Figure 12: Data Propagation Delays: Reaction Delay

α_1(Pred(TP_1^M)) is the activation time of the predecessor of the instance of the 1st task in path TP^M.

Example of an age and reaction delay calculation

Consider a task chain (see Figure 13) where all tasks are periodic and activated independently. The tasks communicate with each other through registers. The timing diagrams of the task chain are depicted in Figures 11 and 12.

[Figure content: a chain of three periodic tasks communicating through Register 1 and Register 2 – a high-priority task (period = 8), a medium-priority task (period = 2) and a low-priority task (period = 4)]

Figure 13: Periodic Task Chain

Age delay calculation for the task chain:

Delay(TP^1) = α_3(TP_3^1) + R_3(TP_3^1) − α_1(TP_1^1) = α_3(1) + R_3(1) − α_1(1) = 0 + 1.5 − 0 = 1.5
Delay(TP^2) = α_3(TP_3^2) + R_3(TP_3^2) − α_1(TP_1^2) = α_3(2) + R_3(2) − α_1(1) = 4 + 1 − 0 = 5.0
AgeDelay = Max{ Delay(TP^1), Delay(TP^2) } = Max{1.5, 5.0} = 5.0
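The age-delay computation reduces to a maximum over per-path delays; a minimal sketch, with the example's activation and response times hard-coded purely for illustration:

```python
def path_delay(alpha_first, alpha_last, response_last):
    """Delay(TP) = alpha_n(TP_n) + R_n(TP_n) - alpha_1(TP_1), as in Equation 15."""
    return alpha_last + response_last - alpha_first

# Reachable timed paths TP1 and TP2 from the example: (alpha_1, alpha_3, R_3)
reachable_paths = [(0.0, 0.0, 1.5),   # TP1
                   (0.0, 4.0, 1.0)]   # TP2
age_delay = max(path_delay(a1, a3, r3) for a1, a3, r3 in reachable_paths)
print(age_delay)                      # -> 5.0
```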

Reaction delay calculation for the task chain:

Delay(TP) = α_3(TP_3) + R_3(TP_3) − α_1(Pred(TP_1)) = α_3(3) + R_3(3) − α_1(1) = 8 + 1.5 − 0 = 9.5
ReactionDelay = Max{ Delay(TP) } = 9.5

2.4.4 The Rubus tool

In order to model and develop a component-based, resource-constrained embedded system, the Rubus tool [24] is utilized. The overall goal of this tool-set is to form a resource-efficient system. The tool is developed by Arcticus Systems [24] in close collaboration with other research institutes and industrial partners [25], [26]. The Rubus concept is based around the Rubus component model (RCM) [27] and its integrated component development environment. The environment is called Rubus-ICE and is a commercial tool for modeling an application, including its control flows, data flows, and constraints.


[Figure content: Software Circuit_1 and Software Circuit_2 connected through trigger and data ports, activated by a clock, with a trigger terminator at the end]
Figure 14: Modelling of software architecture in RCM

The Rubus component model (RCM)

The software architecture is referred to as the fundamental structure of a software system. It comprises software components and their interactions in terms of data and control flow. The control flow is triggered by either a periodic clock, interrupts, or internal and external events. The purpose of the Rubus component model is to express the infrastructure for the software architecture and analyze the components' timing constraints. The basic software component in RCM is called a software circuit (SWC) and its purpose is to encapsulate functional code. The trigger and data flow in SWCs are realized by a trigger port and a data port respectively. In RCM the functional code is separated from the framework implementing the execution model. Hence, a SWC has no knowledge regarding its connection to other components, which benefits the reuse of components in different contexts [28]. Figure 14 illustrates the interaction of the components with regard to both data and triggering.

The Rubus analysis framework

In the component-based design of the software architecture, the real-time properties and characteristics are specified at the architectural level. These real-time characteristics are WCET and stack usage. The designer must express these properties while designing the model of a system. Later, the scheduler will consider these real-time constraints while generating the schedule of the overall software architecture.

2.5. Control components

The initial sections of this chapter gave a conceptual and theoretical representation of how the environment and the agent can be modeled with MDPs and POMDPs. Different methods for solving MDPs and POMDPs were presented, as they provide a way to make predictions about future states and decide on optimal actions. Concepts for understanding and analyzing real-time systems were then presented, together with tools and methods for creating real-time constrained systems, since these are required for designing the initial structure of the decision-making system.

The next section intends to give a more concrete representation of the different control components required for autonomous vehicle navigation. It aims to provide the reader with an understanding of what role the different subsystems have in relation to the decision-making module. This is important since the subsystems are all interconnected.

2.5.1 Systems required for autonomous navigation

In order to enable the construction vehicle to navigate autonomously and safely in a real-world cluttered construction site, several technical requirements need to be taken into consideration.


The system requires algorithms for path planning, path tracking, local navigation, and decision-making. All these components are independent and directly impact the autonomous navigation, treated as a fundamental skill of the agent. The different components for autonomous navigation and decision-making are shown in Figure 15.


Figure 15: A flow diagram of the components required for autonomous navigation

• Sensor data technology: Sensor technology fuses data from multiple sensors (e.g. LIDAR, RADAR, Stereo Camera, GPS, Odometry, as specified by the company) and delivers accurate and reliable information regarding the vehicle’s immediate surrounding environment.

• World model: Software components collect sensor data and model a static view of the vehicle’s environment.

• Localization: This system tracks the vehicle’s current position in a static map.

• Path planning: Path planning is further divided into global and local path planning. The global path planning algorithm deliberately determines an optimal path in the configured space, thus enabling the vehicle to navigate from a start location to a goal location while avoiding static obstacles present in the configuration space. The local path planning system utilizes real-time sensor data and provides a fast and reactive algorithm. The algorithm enables the avoidance of dynamic obstacles emerging in the environment in a minimal amount of time while proceeding towards the overall goal location (a minimal sketch of the global planning step is given after this list).

• Object perception: Detects objects and obstacles via incoming real-time sensor data, which provides various information about the detected objects such as their type, size, and color.

• Low-level actuator: Utilizes algorithms such as path tracking, which enable the vehicle to follow a predefined trajectory accurately, as provided by the path planner. A path tracking algorithm utilizes the positional or localization information to control the vehicle’s speed and steering angle.

• Decision making: The role of the decision-making system is to choose among competing priorities and select the most suitable action based on the current scenario. This module must take into account the uncertainties in the environment and other model imperfections. The output of the decision-making system is combined with the output of the path tracker module, thereby translating into an intelligent selection of the vehicle’s velocity and steering angle.
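As a minimal illustration of the global path-planning step listed above (the occupancy grid, start and goal below are invented, and a real planner operates on the configuration space produced by the world model), an A*-style search on a toy grid could look as follows:

# Minimal sketch of global path planning on a toy occupancy grid (A* search).
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    open_set = [(0, start)]
    came_from, g = {}, {start: 0}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                new_g = g[cur] + 1
                if new_g < g.get(nxt, float("inf")):
                    g[nxt], came_from[nxt] = new_g, cur
                    h = abs(goal[0] - nxt[0]) + abs(goal[1] - nxt[1])  # Manhattan heuristic
                    heapq.heappush(open_set, (new_g + h, nxt))
    return None  # no collision-free path exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],   # 1 = static obstacle
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 3)))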


3. Related work

In this chapter, an overview of the state-of-the-art research conducted in the domain of autonomous navigation is described. In this thesis work, different fields of scientific approaches are considered. Consequently, the related work is divided into various subsections. Primarily, different approaches for solving decision-making problems for autonomous driving are discussed. Following this, methods for WCET estimation of real-time systems are presented. Afterward, the end-to-end timing analysis constraints are discussed briefly. At the end of this chapter, a motivation for the chosen methods is presented.

3.1. Decision-making algorithm

There are several decision-making algorithms that have been designed and developed for autonomous navigation. However, the selection of one such method depends highly upon the prevailing uncertainty in the environment, the efficiency in producing optimal behaviour and the complexity of the implementation. A short overview of different algorithms is presented below.

3.1.1 Trajectory optimization

Over the past decade, a number of investigators have proposed decision-making solutions for autonomous vehicles through the lens of safe and efficient trajectory optimization algorithms. In a simulated environment, Gu et al. [29] use dynamic programming and generate an efficient trajectory planner in an uncertain environment. In their work, the generate and test approach is used to select the best trajectory solution, which encodes the suitable driving maneuvers. Although their simulation results were promising, their method was never implemented in a real vehicle scenario.
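A highly simplified sketch of the generate-and-test idea (not the planner from [29]; the motion model, cost terms and candidate set are assumptions) is shown below: candidate maneuvers are rolled out forward, scored against a cost function, and the cheapest one is kept.

# Sketch of "generate and test": sample candidate maneuvers, score each
# against a cost function, keep the best. All values are illustrative.
import math

def rollout(curvature, speed, horizon=20, dt=0.1):
    """Forward-integrate a simple unicycle model for one candidate maneuver."""
    x = y = heading = 0.0
    traj = []
    for _ in range(horizon):
        heading += curvature * speed * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        traj.append((x, y))
    return traj

def cost(traj, goal, obstacles, safety_radius=1.0):
    goal_cost = math.dist(traj[-1], goal)
    collision = any(math.dist(p, o) < safety_radius for p in traj for o in obstacles)
    return goal_cost + (1e6 if collision else 0.0)  # hard penalty on collision

goal, obstacles = (8.0, 1.0), [(4.0, 0.0)]
candidates = [(c, 5.0) for c in (-0.2, -0.1, 0.0, 0.1, 0.2)]  # curvature options
best = min(candidates, key=lambda c: cost(rollout(*c), goal, obstacles))
print("selected curvature, speed:", best)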

A novel collision-free motion algorithm for multiple agents moving in a common workspace is proposed by Berg et al. [30]. In their work, each agent is completely independent and does not communicate with other agents. A velocity obstacles approach is used for obstacle avoidance, considering a known simulated environment. The main drawback of this method is the strong assumptions, which make the approach less applicable in a real-world scenario. Another trajectory optimization method is proposed in [31]. The system generates an intelligent driving maneuver while accounting for both static and dynamic obstacles in the environment. The authors also compared the performance of the motion planner before and after trajectory optimization. Both simulation and experimental results revealed a 52% reduction in the motion planner's computational run-time with improved trajectory quality. However, their method was primarily implemented for a simple passing maneuver. Another drawback of their experimental methodology was that it was limited to static obstacles. A similar approach is also used by Kuwata et al. [32], Tumova et al. [33] and Vasile et al. [34].

In a complex, cluttered traffic environment, modeling interactions with the surrounding vehicles is necessary. The key problem with trajectory optimization methods is that they can only predict other vehicles' current positions in the traffic, not their future reactions in response to actions of the ego vehicle. In a real traffic scenario, however, closed-loop coupled interactions among other vehicles must also be accounted for. Instead of one or several collision-free paths, the ego vehicle requires a decision-making system that suggests when and how to navigate in a dynamic environment.

3.1.2 Rule-based approach

As of today, the most common approach to decision making in autonomous navigation is to manually model reactions to situations [35]. State machines are commonly used to assess situations and take decisions in a single framework. During the 2007 DARPA Urban Challenge, the first notable approaches for an autonomous vehicle's decision-making architecture in an urban driving environment were proposed [36]. Due to the controlled traffic situation, the design of the decision-making system was more inclined towards a rule-based expert system. A concurrent hierarchical state machine is used by Ziegler et al. [37] to model the basic behaviors in accordance with the traffic rules. In their work, the trajectory planner controls the driving maneuver only for merging the ego vehicle into traffic.


However, several strong assumptions have been made, such as other cars in the traffic maintaining the right-bound distance and not accelerating while driving. The uncertainty arising in common traffic situations was also not considered. Team Junior [38] in the DARPA challenge finds critical zones where the ego vehicle has to wait in different traffic situations. Based upon the ego vehicle's speed and proximity relative to other vehicles, they conducted a threshold test to identify such zones. The concept of a rule-based decision-making system is implemented in a wide variety of solutions, ranging from decision trees [39], behavior networks [40], heuristic approaches [41] and distributed voting-based behavior architectures [42] to hybrids of rule-based decision-making and behavior models [43].

In order to adapt these approaches to real-world scenarios, more and more states and transitions need to be considered for a complex and cluttered environment. For example, with the state machine approach, individual complex tasks, like following a leader vehicle or merging into traffic, usually need tailored solutions. This process is tedious, error-prone and not robust considering the increasing complexity of driving situations.
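To make the rule-based idea concrete, the following toy Python state machine selects a driving behavior from hand-written transition rules; the states, observations and thresholds are invented for illustration and do not correspond to any of the cited systems.

# Toy finite-state machine in the spirit of rule-based decision making.
def next_state(state, obs):
    if state == "CRUISE":
        if obs["leader_distance"] < 15.0:
            return "FOLLOW_LEADER"
        if obs["merge_gap"] > 20.0 and obs["intends_to_merge"]:
            return "MERGE"
    elif state == "FOLLOW_LEADER":
        if obs["leader_distance"] > 30.0:
            return "CRUISE"
    elif state == "MERGE":
        if obs["merge_completed"]:
            return "CRUISE"
    return state  # no rule fired: stay in the current state

obs = {"leader_distance": 10.0, "merge_gap": 5.0,
       "intends_to_merge": False, "merge_completed": False}
print(next_state("CRUISE", obs))  # -> FOLLOW_LEADER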

3.1.3 Machine learning approach

To make decisions for autonomous navigation, various learning-based techniques have been endorsed in the automotive industry. Muller et al. [44] proposed a vision-based obstacle avoidance system for an autonomous mobile robot. The system is trained end to end to map raw input images to steering angles. The learning system is a CNN model which predicts the relation between input images and the driver's steering angle. Experimental results show high-speed performance and real-time obstacle detection and path navigation. A real-time learning-based trajectory generation approach is proposed by Guo et al. [45] for implementing an advanced driver assistance system with lane-following and adaptive cruise control functions. In particular, in their work the ego vehicle learned local trajectory generation from the existing leader vehicle in the host lane. Experimental results depict the effectiveness of the proposed system in a typical but challenging urban environment. Other machine-learning techniques such as Gaussian mixture models [46], Gaussian process regression [47] and inverse reinforcement learning [48] also provide good solutions by predicting uncertain conditions in the environment and planning a course through them.

With these approaches, the generation of optimal decisions is limited by the extensive training of the system, which in itself is a complex and time-consuming endeavor. Additionally, it is difficult to reproduce multiple training scenarios which can represent the complex combinations of diverse situations arising in the real world.
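For illustration, a minimal end-to-end steering regressor in the spirit of such vision-based approaches could be sketched as below; the layer sizes, input resolution and single training step are placeholders and not the architecture from [44].

# Minimal sketch of an end-to-end steering regressor: raw image in, angle out.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(36, 48, kernel_size=3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(48, 1),            # single output: steering angle
)

images = torch.randn(8, 3, 66, 200)      # dummy batch of camera frames
angles = torch.randn(8, 1)               # recorded driver steering angles
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = nn.functional.mse_loss(model(images), angles)  # one training step
optimiser.zero_grad()
loss.backward()
optimiser.step()
print(float(loss))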

3.1.4 Probabilistic approach

Human behavior is highly specific and complex; hence, probabilistic prediction is used by several researchers as an input to the motion planner [49]. Damerow et al. [50] proposed a behavior generation approach for advanced driver-assistance systems, considering multiple traffic scenarios with different probabilities of occurrence. They created several cost-maps for each traffic situation to solve the global planning problem. Although they managed to generate different behaviors applicable to different situations, they made unrealistic initial assumptions which may not hold true in real traffic conditions.

Zhan et al. [51] presented a non-conservative defensive strategy for urban driving. They considered two deterministic driving maneuvers (passing and yielding) for other vehicles in traffic. The simulation results demonstrated the realistic behavior of the system under uncertainty. In probabilistic approaches, the problem is often formulated as a POMDP, where the intention and re-planning procedure of other agents are not directly observable. Different navigation problems are formulated and solved with general POMDP models, as observed in [52] and [53]. However, the main drawback of this method is the requirement of high computational run-time even for state, action and observation spaces that are small compared to a real-world scenario. To overcome the problem with traditional POMDPs, a group of researchers provided a set of pre-computed initial policies to the POMDP model. The POMDP solver then refines these policies and switches them on and off over time to select the best policy. However, with these methods, significant resources are required to compute a set of policies, which in turn limits the planning horizon and the state, action, and observation spaces.
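The belief bookkeeping underlying such POMDP-style formulations can be sketched as a discrete Bayes update over another road user's hidden intention; the intentions and observation likelihoods below are invented for illustration and not taken from the cited works.

# Sketch: maintain a probability over another driver's hidden intention and
# update it from noisy observations with Bayes' rule.
belief = {"yield": 0.5, "pass": 0.5}          # prior over the other vehicle's intent

# P(observation | intention): likelihood of a measured behaviour under each intent
likelihood = {
    "decelerating": {"yield": 0.8, "pass": 0.2},
    "accelerating": {"yield": 0.1, "pass": 0.9},
}

def update(belief, observation):
    posterior = {i: likelihood[observation][i] * p for i, p in belief.items()}
    norm = sum(posterior.values())
    return {i: p / norm for i, p in posterior.items()}

for obs in ["decelerating", "decelerating", "accelerating"]:
    belief = update(belief, obs)
    print(obs, belief)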


A more practical approach towards the decision-making problem for autonomous driving is observed in [54]. The authors proposed a multiple criteria decision making (MCDM) theory to generate a high-level decision process in the navigation system. However, they do not predict the future intentions of other vehicles participating in the traffic. In these approaches, motion planning algorithms determine various discrete behaviors of other vehicles and their probabilities. The motion actions of the other vehicles and the ego vehicle are modeled independently of each other (non-interactive). Hence, the planning and decision-making modules are not integrated into the system. The robotics community has stated that such systems result in an uncertainty explosion in future states and thus in the freezing robot problem [55]. Within this class, the interaction may occur implicitly through re-planning, but it is not explicitly planned.

More recently, in a highway entry scenario involving merging of the ego vehicle into moving traffic, Wei et al. presented a set of possible high-level policies in an MDP. The authors performed a forward simulation to find the best policy in the most likely traffic scenario. They then scored every policy against the ego vehicle's cost function and executed the best policy [56]. Guo et al. [57] generate a hybrid potential map which accounts for detecting obstacles and predicting risks in the environment. Their MDP model is used to generate candidate actions/policies and is tuned using machine learning. Although both sets of authors obtained promising results in a simulated environment, their work still lacks real-world experimentation. Cunningham et al. solved the complex POMDP model by formulating the highway lane-changing problem as a multi-policy decision making (MPDM) process. The authors modeled the high-level decision process as a set of policies that encode closed-loop behavior and used manual tuning to select the best policy for vehicle control. They modeled the high-level behaviors of all agents in the system and, thus, the principal decision-making process is robust in scenarios with extensive coupled interaction among agents [58]. An extension of this work [1] was carried out in an indoor setting, where the number and complexity of the candidate policies are much higher.
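A schematic of the policy-election step in MPDM, under strongly simplified assumptions (one-dimensional motion, a hand-written cost function and three illustrative policies, not the implementation of [1] or [58]), could look as follows: each candidate closed-loop policy is forward-simulated over a short horizon, the outcome is scored with the ego vehicle's cost function, and the cheapest policy is executed.

# Schematic of MPDM policy election: forward-simulate candidate policies,
# score the outcomes, execute the cheapest policy. All numbers are assumptions.
def simulate(policy, ego_pos, other_pos, horizon=30, dt=0.1):
    progress, min_gap = 0.0, float("inf")
    for _ in range(horizon):
        ego_speed = policy(ego_pos, other_pos)        # closed-loop behaviour
        ego_pos += ego_speed * dt
        other_pos += 4.0 * dt                         # assumed constant-speed road user ahead
        progress += ego_speed * dt
        min_gap = min(min_gap, abs(other_pos - ego_pos))
    return progress, min_gap

def cost(progress, min_gap):
    collision_penalty = 1e3 if min_gap < 2.0 else 0.0
    return -progress + collision_penalty              # reward progress, penalise near-collisions

policies = {
    "go":    lambda ego, other: 6.0,                                # keep nominal speed
    "yield": lambda ego, other: 2.0 if other - ego < 10.0 else 6.0, # slow down when close
    "stop":  lambda ego, other: 0.0,
}

scores = {name: cost(*simulate(p, 0.0, 5.0)) for name, p in policies.items()}
best = min(scores, key=scores.get)
print(scores, "-> execute:", best)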

3.2. Worst case execution time analysis

There are three main methodologies for analyzing the WCET of real-time applications in order to determine the maximum amount of time an application may require to execute. The choice among WCET analysis methods depends on restrictions such as the non-availability of the source code, the properties of the hardware and the prohibition of reverse engineering on the binary [59]. Below, a short review of different WCET approaches is given. The key differences in estimating the WCET are also presented and the state-of-the-art method is highlighted.

3.2.1 Static WCET analysis

Static analysis methods have been around for a long time and they generate strong theoretical results by performing compile-time optimization of object code, as observed in [60] and [61]. Wilhelm et al. presented an abstract interpretation method alongside static analysis to determine bounds on the execution time of a program [62]. These bounds represent the timing constraints of a hard real-time system. In their method, both cache and pipeline analysis results are integrated to provide safe and tight bounds on the execution time. However, the experimental outcomes are limited to cache analysis and an extremely simple pipeline. Alur et al. proposed the timed automata method to model the timing constraints of real-time systems that require time bounds [63]. The timing analysis method is similar to a finite-state machine where every edge corresponds to a matrix of intervals restricting various delays. This method provides upper and lower bounds on the execution time of a finite-state system. The main drawback of this method is that verification of the analysis results against an industrial benchmark was not shown.
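As a toy illustration of the path-analysis step behind static WCET estimation (real tools such as aiT additionally model caches, pipelines and loop bounds; the block costs and control flow graph here are invented), the bound for a loop-free control flow graph is the cost of the most expensive path from entry to exit:

# Toy static path analysis: WCET bound = longest path over a loop-free CFG.
from functools import lru_cache

block_cost = {"entry": 2, "cond": 1, "then": 8, "else": 3, "exit": 2}   # cycles per basic block
successors = {"entry": ["cond"], "cond": ["then", "else"],
              "then": ["exit"], "else": ["exit"], "exit": []}

@lru_cache(maxsize=None)
def wcet(block):
    tail = max((wcet(s) for s in successors[block]), default=0)
    return block_cost[block] + tail

print(wcet("entry"))  # 2 + 1 + max(8, 3) + 2 = 13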

The timing analysis of assembler programs using symbolic simulation was observed in [64], [65] and [66]. However, these works remain at a machine-independent level and make the major assumption of a unit-time-based system. Furthermore, optimal results were only obtained for a highly simplified architecture due to the complexity of the method.

A few commercial static analysis tools are also available on the market, such as aiT [67] and Bound-T [68]. In a case study, Andreas Ermedahl et al. presented their view of using the aiT WCET analysis tool from AbsInt GmbH [69] in [70] and [71]. In their work,

