
DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2016

Sensor-Based Trajectory Optimization

MARTIN BIEL

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES


Sensor-Based Trajectory Optimization

MARTIN BIEL

Master’s Thesis in Optimization and Systems Theory (30 ECTS credits)
Master Programme in Applied and Computational Mathematics (120 credits)
Royal Institute of Technology, year 2016
Supervisor at ABB: Mikael Norrlöf
Supervisor at KTH: Xiaoming Hu
Examiner: Xiaoming Hu

TRITA-MAT-E 2016:28 ISRN-KTH/MAT/E--16/28--SE

Royal Institute of Technology
School of Engineering Sciences (SCI)
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


Sammanfattning

In the robot industry, it is necessary to employ efficient trajectory planners in order to perform given tasks with good performance, and also to provide the user with an easy-to-use interface that does not require manual tuning to obtain optimal motions. A common method in trajectory planning is to first generate a geometric path and then determine an optimal trajectory along that path. This work investigates an alternative method where the underlying path is allowed to change dynamically. The end result is a general framework, together with a software implementation, for planning solution trajectories to optimal control problems in real time. Specific solutions to problems concerning collision avoidance and catching moving targets are realized by introducing objective functions and constraints in the underlying optimization problem. The trajectory planner is evaluated by applying it in simple robot applications featuring moving obstacles and moving targets. The planner looks promising for future use, but needs to be developed further before being used in real applications.



Abstract

In the robot industry, it is necessary to employ efficient trajectory planners to perform tasks with good performance, as well as to provide the end-user with a friendly interface which does not require manual tuning to achieve optimal movements. A common method for trajectory planning is to first generate a geometric path and then determine an optimal trajectory along that path. This work investigates an alternative method where the underlying path is allowed to change dynamically during the planning procedure. The end result is a general framework and a software implementation for planning solution trajectories to optimal control problems in real time. Specific adaptations are made to solve problems related to collision avoidance and moving targets, by introducing objective functions and constraints in the underlying optimization problem. The planner is evaluated after applying it in simple robot applications that feature moving obstacles and targets. The planner looks promising for future use, but needs to be explored further before being applied in real applications.



Acknowledgements

I would like to thank my supervisor Mikael Norrlöf for all the insightful discussions and the great amount of support I have received during my work.

I would also like to thank ABB and my manager Mattias Björkman for the warm welcome I received and for the compensation of my work and travel expenses.

Finally, I would like to thank my examiner Prof. Xiaoming Hu at KTH for the discussions about my project.


Contents

1 Introduction
1.1 Background
1.2 Problem formulation
1.3 Related work
1.4 Approach

2 Preliminaries
2.1 Robot modelling
2.2 Optimal control problem
2.2.1 General motion planning problem
2.2.2 Constraint extensions
2.2.3 Objective extensions
2.3 Timed elastic band
2.4 Non-linear model predictive control

3 Trajectory Planner
3.1 Trajectory initialization
3.2 Deformation of the timed elastic band
3.2.1 Deformation in time
3.2.2 Deformation in space
3.3 Control application and planner update
3.4 Collision avoidance
3.5 Tracking moving targets
3.6 Multiple trajectories
3.7 Real-time algorithm
3.8 Implementation

4 Simulation tests
4.1 Scenario 1: Simple target
4.1.1 Minimum time trajectories
4.1.2 Minimum energy trajectories
4.1.3 Time-Energy trade-off trajectories
4.1.4 Multiple trajectories
4.2 Scenario 2: Avoid obstacles
4.2.1 Stationary obstacles
4.2.2 Moving obstacles
4.3 Scenario 3: Track moving target
4.4 Scenario 4: Pick and place

5 Discussion and conclusion
5.1 Evaluation of the trajectory planner
5.1.1 Collision avoidance
5.1.2 Moving targets
5.1.3 Pick and place
5.1.4 Parameter sensitivity
5.1.5 Real-time capabilities
5.2 Improvements and future work
5.3 Final remark

A Sequential Quadratic Programming
A.1 Quasi-Newton update
A.2 Line search
A.3 Convergence test

B Automatic Differentiation

C Planar Elbow Model

D Parameter glossary

Bibliography


1 Introduction

Robotic manipulators are frequently utilized in industry and constitute a common topic of research. Industrial robots are able to accomplish their given tasks with high performance and are therefore distinguished tools for automating different processes. Classical examples of tasks include moving a tool along a given path or different pick and place scenarios. Typical performance measures are low transition time or high precision. In addition, energy minimizing movements are of great importance in many industries.

An important aspect of robot control is motion planning. An adequate trajectory planner is a prerequisite for achieving high performance in terms of the mentioned measures. In addition, automatically providing services for optimizing the robot movement, instead of requiring the robot programmer to manually tune parameters, leads to a more user-friendly interface. In the area of robotic manipulators, real-time motion planning is challenging due to the often complex system dynamics and fast movement. In brief, since the robots are capable of performing their tasks with great speed it becomes critical to plan the motion fast, which is made difficult by the complexity of the problem.

Nowadays, there are readily available embedded computers and sensors with high computational performance. As a result, it is possible to construct real-time adaptive motion planners that can react to sensor events providing the positions of obstacles that should be avoided or moving targets that should be tracked.

Apart from just finding a collision-free path, the planner should determine a trajectory along the path that can be realized by the manipulator. As such, there is a distinction between path generation and path following.


[Figure 1.1 here: a block diagram contrasting the traditional serial pipeline (a high level planner performing path generation, a dynamical optimization performing path following, and low level controllers receiving reference values via a set of via-points) with the combined trajectory planner, in which path generation and path following are merged and reference values are passed directly to the low level controllers.]

Figure 1.1: A traditional serial path and trajectory optimization versus a combined path and trajectory generation process.

1.1 Background

Traditionally, the motion planning problem is decoupled into two parts. First, the geometric path is determined by extracting some geometry specified by the user. Alternatively, the path is generated by a high level planner, based on for example Dijkstra's algorithm [3], A* [7] or a more modern approach based on probabilistic roadmaps [8]. Common to these types of planners is that they only consider geometric constraints and ignore system dynamics. This first process is here denoted as path generation. Afterwards, an optimization based planner, that accounts for dynamics and other constraints, determines an optimal trajectory along the predefined path. The second process is here defined as path following.

The two-step planning procedure provides a highly efficient solution that can be realised on-line and is generally a sufficient approach, since many applications imply a specific geometric path. However, if the manipulator works in dynamic environments that feature for example moving targets and obstacles, then it is necessary to re-plan the geometric path as well as the trajectory along that path. Additionally, the two-step approach is in essence suboptimal, since the path generation does not consider system dynamics. A proposed solution could be to introduce a trajectory planner that combines path generation and path following, and updates the trajectory in real-time based on sensor input. Such an approach would involve higher complexity than the two-step procedure, but could lead to a solution that is better suited in dynamic environments. Fig. 1.1 illustrates the distinction between the decoupled approach and the combined approach.

1.2 Problem formulation

The objective of this thesis work is to investigate the possibility of constructing a real-time capable trajectory planner where the path is allowed to change dynamically. The planner should be able to compute a trajectory that is consistent with some given system dynamics. The work will assume the existence of low level controllers that, given the reference values of a feasible trajectory, can compute the control values required to realize the trajectory. For example, in a robotics application, this would include generating appropriate voltage and current values and synchronizing the different joint movements in time. Also, it is assumed that the current state of the system, as well as the location of obstacles and targets, can be obtained through sensor readings and/or state estimation.

Additionally, it will be investigated how well such a planner could be applied in the field of robotic manipulators. Specifically, the target application will be a pick and place scenario. In such applications, two important aspects are collision avoidance and tracking of moving targets. As such, the planner should also be able to compute collision-free trajectories that can accommodate a moving end point.

1.3 Related work

In [2], F. Debrouwere provides an in-depth study of dynamic trajectory optimization when an a priori geometric path is given. The path following problem is transformed into a one-dimensional problem through a variable transformation. Next, the obtained problem is solved efficiently using a modified second-order cone solver. In addition, [2] includes extensive modelling work, such as more realistic robot models, constraints and collision avoidance.

Khatib et al. developed a technique to dynamically update a given trajectory based on sensor input through a so-called elastic band framework [12]. A given trajectory that connects the start state to the end state generates an initial elastic band, which then deforms around obstacles by letting the obstacles exert virtual forces on the band. The technique enables real-time collision avoidance while also keeping the geometric path as short as possible. However, there is no clear-cut way to include the system dynamics in the trajectory planning, nor to allow for time optimal transitions.

The concept of elastic bands was extended in [13] and [15], where the time interval between poses is included to form a so-called Timed Elastic Band (TEB). The inclusion of temporal information makes it possible to account for system dynamics and optimize with respect to time. The trajectory planning problem is modelled as a multi-objective optimization problem, where collision avoidance and system dynamics are introduced through penalty functions in the objective while also aiming to minimize the transition time. In [15], the concept was further improved by saving old optimal trajectories. After the trajectory has been recomputed, the global optimum is chosen among the candidates. Most recently, in [14], the timed elastic band concept was applied in an MPC setting. The approach looks promising as a basis for a real-time reactive planner, but has so far only been tested in simulation, and not in the context of robotic manipulators.


1.4 Approach

The main focus will be on constructing a real-time capable trajectory planner that could potentially be applied to many different optimal control problems. To this end, the intention of this work is to adopt the TEB concept in combination with non-linear model predictive control techniques. In addition, the planner will then be applied in the field of robotic manipulators. The idea is to investigate if a combined path generation and path following approach could serve as an effective alternative to the two-step approach, especially when operating in dynamic environments. Also, the outcome could be used to evaluate the applicability of the planner to specific problems. To limit the scope, the work will frequently rely on already existing solutions to common problems in optimization and robotics.

Regarding the target application of pick and place, the work will be limited to seeking simple solutions to problems that involve obstacle avoidance and tracking of moving targets. The aim is to evaluate how well the trajectory planner can be extended to solve additional problems. The results could then be used as support when addressing the possible future use of the methodology and the tools presented in this work.


2 Preliminaries

In this chapter, some mathematical preliminaries are provided as a base for further chapters. First, a brief summary of concepts from robotics is supplied together with some terminology and notation. Second, the motion planning problem for robotic manipulators is posed as an optimal control problem. Furthermore, the concept of timed elastic bands is introduced and used to restate the motion planning problem in a form suitable for on-line trajectory optimization. Finally, non-linear model predictive control is presented as a method for solving the reformulated problem in real time.

2.1 Robot modelling

The following overview of robot modelling is based on [17]. In the field of robotics, rigid motions are commonly represented as homogeneous transformations, which provide a framework where matrices can be used to denote translations and rotations. Furthermore, robot positions and movements can be represented through a set of joint variables. Two important spaces are introduced, the configuration space and the operational space. The configuration space is denoted $Q$ and includes all possible joint poses $\mathbf{q}$. The operational space is denoted $O$ and constitutes the Cartesian space in which the manipulator is positioned and operates. The workspace $W \subset O$ includes all possible end-effector positions $\mathbf{y}$, which are the points the manipulator can reach with respect to the joint constraints.

Two classical problems in robot modelling connect the configuration space with the operational space. The forward kinematics problem concerns determining the position and orientation of the end-effector given a specific joint configuration. In notation, the problem involves finding a function $\chi(\mathbf{q})$ so that
\[
\begin{pmatrix} \mathbf{y} \\ \boldsymbol{\phi} \end{pmatrix}
= \begin{pmatrix} \chi_y(\mathbf{q}) \\ \chi_\phi(\mathbf{q}) \end{pmatrix}
= \chi(\mathbf{q})
\]


There exist procedures for systematically determining the forward kinematics, and established solutions exist for many robot types. The inverse kinematics problem concerns the more complex procedure of inverting the above relation. Specifically, it involves determining a joint configuration given an end-effector position and orientation. Typically, there exist multiple joint solutions, and sometimes even an infinite number if there are redundant joints. In simple examples, it is usually possible to find a closed-form solution. However, more complicated robots typically require utilization of some numeric approach. This work will not delve further into solution methods for these problems and instead utilizes existing solutions.

Given a forward kinematics function $\chi(\mathbf{q})$, the analytic Jacobian
\[
J(\mathbf{q}) = \begin{pmatrix} J_v(\mathbf{q}) \\ J_\omega(\mathbf{q}) \end{pmatrix}
\]
of $\chi(\mathbf{q})$ provides a transformation from the joint velocity $\dot{\mathbf{q}}$ to the linear velocity $\mathbf{v}$ and angular velocity $\boldsymbol{\omega}$ of the end-effector through
\[
\begin{pmatrix} \mathbf{v} \\ \boldsymbol{\omega} \end{pmatrix} = J(\mathbf{q})\,\dot{\mathbf{q}}
\]
Consequently, the inverse kinematics combined with the velocity Jacobian can be used to determine the target joint configuration and joint velocity required to perform a specific task. However, it is not always possible to find a solution given an end-effector position and velocity. Joint configurations for which $J(\mathbf{q})$ drops rank are called singularities. At a singularity, some directions of motion might be unattainable, and some target velocities might even require unbounded joint velocities and/or control input. Hence, identifying and handling singularities is an important problem in robotics. This work will not discuss direct solutions to this problem, but it will be addressed during the implementation of the trajectory planner.
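To make the kinematic quantities above concrete, the following Python sketch evaluates the forward kinematics $\chi_y(\mathbf{q})$ and the velocity Jacobian for a planar two-link elbow manipulator of the kind used as a test model in this work (Appendix C). The link lengths are assumed illustrative values; the code is only a sketch of the relations above, not the implementation accompanying the thesis.

```python
import numpy as np

def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector position chi_y(q) of a planar two-link elbow manipulator.
    q: joint angles [q1, q2]; l1, l2: assumed link lengths."""
    q1, q2 = q
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    return np.array([x, y])

def jacobian(q, l1=1.0, l2=1.0):
    """Velocity Jacobian J_v(q): maps joint velocity qdot to Cartesian velocity."""
    q1, q2 = q
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

q = np.array([0.3, 0.8])
qdot = np.array([0.1, -0.2])
v = jacobian(q) @ qdot                        # end-effector velocity
# The Jacobian drops rank (singularity) when q2 = 0 or q2 = pi:
print(np.linalg.matrix_rank(jacobian(np.array([0.3, 0.0]))))  # -> 1
```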

The dynamics of rigid motions are derived from analytical mechanics. Based on the principle of virtual work and the Euler-Lagrange equations, one can derive a matrix form of the dynamical equation of motion for a robotic manipulator
\[
M(\mathbf{q}(t))\ddot{\mathbf{q}}(t) + C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) + g(\mathbf{q}(t)) = \boldsymbol{\tau}(t)
\tag{2.1}
\]
A time dependence has been added to emphasize that the joint variables are now dynamic entities. The first term, $M(\mathbf{q}(t))\ddot{\mathbf{q}}(t)$, includes the inertia matrix $M(\mathbf{q}(t))$. The second term, $C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t)$, involves centrifugal and Coriolis terms, which describe the interaction between the links. The third term, $g(\mathbf{q}(t))$, stems from the potential energy and includes external forces such as gravity. The right hand side contains the torque vector $\boldsymbol{\tau}(t)$, which is typically used as the control variable. In implementation, an intermediate step involving actuator dynamics is required to attain a certain torque. Independent joint control treats each manipulator joint as an independent SISO system and derives corresponding control laws for determining voltage and current values. Furthermore, there exist several control techniques, such as inverse dynamics or passivity-based control, that can be applied to realize a given trajectory. It is assumed here that, given a feasible trajectory, each reference state
\[
(\mathbf{q}_i, \dot{\mathbf{q}}_i, \ddot{\mathbf{q}}_i)
\]
can be realised through application of such techniques.
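For illustration, equation (2.1) can be evaluated directly once $M$, $C$ and $g$ are available. The sketch below does so for a planar elbow manipulator with point masses at the link ends; the masses, link lengths and this particular parametrization are assumptions made only for the example.

```python
import numpy as np

def inverse_dynamics(q, qdot, qddot, m1=1.0, m2=1.0, l1=1.0, l2=1.0, g0=9.81):
    """Evaluate tau = M(q) qddot + C(q, qdot) qdot + g(q) from (2.1) for a planar
    elbow manipulator with point masses at the link ends (assumed parameters)."""
    q1, q2 = q
    c2, s2 = np.cos(q2), np.sin(q2)
    M = np.array([
        [(m1 + m2) * l1**2 + m2 * l2**2 + 2 * m2 * l1 * l2 * c2,
         m2 * l2**2 + m2 * l1 * l2 * c2],
        [m2 * l2**2 + m2 * l1 * l2 * c2, m2 * l2**2],
    ])
    c_factor = m2 * l1 * l2 * s2
    C = np.array([[-c_factor * qdot[1], -c_factor * (qdot[0] + qdot[1])],
                  [ c_factor * qdot[0], 0.0]])
    g = np.array([
        (m1 + m2) * g0 * l1 * np.cos(q1) + m2 * g0 * l2 * np.cos(q1 + q2),
        m2 * g0 * l2 * np.cos(q1 + q2),
    ])
    return M @ qddot + C @ qdot + g

# torque required to hold a configuration with a small joint velocity
tau = inverse_dynamics(np.array([0.3, 0.8]), np.array([0.1, -0.2]), np.zeros(2))
```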

2.2 Optimal control problem

The motion planning problem falls under a more general category of problems called optimal control problems

\[
\begin{aligned}
\min_{\mathbf{u}(\cdot)} \quad & \int_{t_0}^{t_f} f_0(\mathbf{x}(t), \mathbf{u}(t), t)\,dt \\
\text{s.t.} \quad & \dot{\mathbf{x}}(t) = f(\mathbf{x}(t), \mathbf{u}(t), t) \\
& \mathbf{x}(t) \in X_t \\
& \mathbf{u}(t) \in U_t(\mathbf{x}(t)) \\
& \phi_0(\mathbf{x}(t_0), \mathbf{u}(t_0), t_0) = 0 \\
& \phi_f(\mathbf{x}(t_f), \mathbf{u}(t_f), t_f) = 0
\end{aligned}
\tag{2.2}
\]

2.2.1 General motion planning problem

The motion planning problem for a robotic manipulator is now formulated as an optimal control problem, following the structure of (2.2). The system dynamics will be given by the rigid motion dynamics (2.1). The system variables will be the joint variables, which are constrained to the configuration space $Q$ and connected to the operational space through some given forward kinematics $\chi_y$. For simplicity, this work will leave out specifying a target orientation for the end-effector. Some applications, such as pick and place, might require including a target orientation. However, adding this functionality is straightforward and the only reason for not including it here is simply to limit the scope. The input torques will be used as control variables, with simple predefined bounds. The trajectory will be constrained by given position and velocity endpoints, specified in operational space. First, as a general formulation, the aim will be to minimize the transition time. For simplicity, the starting time is set to zero and the end time is denoted $T$. Hence, the general motion planning problem is given by

\[
\begin{aligned}
\min_{\boldsymbol{\tau}(\cdot)} \quad & T \\
\text{s.t.} \quad & M(\mathbf{q}(t))\ddot{\mathbf{q}}(t) + C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) + g(\mathbf{q}(t)) = \boldsymbol{\tau}(t) \\
& \mathbf{q}(t) \in Q \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}(t) \leq \boldsymbol{\tau}^+ \\
& \mathbf{y}(t) = \chi_y(\mathbf{q}(t)) \\
& \mathbf{y}(0) = \mathbf{y}_0, \quad \dot{\mathbf{y}}(0) = \dot{\mathbf{y}}_0 \\
& \mathbf{y}(T) = \mathbf{y}_T, \quad \dot{\mathbf{y}}(T) = \dot{\mathbf{y}}_T
\end{aligned}
\tag{2.3}
\]

2.2.2 Constraint extensions

A first extension to this problem could be to introduce more kinematic constraints in both the configuration space and the operational space. Both for safety and product lifetime reasons it is beneficial to limit the speed at which the robot operates. One could impose constraints on the joint speeds, either directly or by introducing some joint dependence in the control input limitations. In addition, one could directly pose limits on the Cartesian velocity to accommodate some task constraint. Simultaneously including these velocity constraints will most probably be redundant, since having one constraint active will typically imply that the others are satisfied. However, a general inequality constraint that captures all desired limitations could become complicated, and it is probably more flexible to specify the velocity constraints directly. Including these constraints yields the following formulation,

\[
\begin{aligned}
\min_{\boldsymbol{\tau}(\cdot)} \quad & T \\
\text{s.t.} \quad & M(\mathbf{q}(t))\ddot{\mathbf{q}}(t) + C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) + g(\mathbf{q}(t)) = \boldsymbol{\tau}(t) \\
& \mathbf{q}(t) \in Q \\
& \dot{\mathbf{q}}^- \leq \dot{\mathbf{q}}(t) \leq \dot{\mathbf{q}}^+ \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}(t) \leq \boldsymbol{\tau}^+ \\
& \mathbf{y}(t) = \chi_y(\mathbf{q}(t)) \\
& |J(\mathbf{q}(t))\dot{\mathbf{q}}(t)| \leq v_{\max} \\
& \mathbf{y}(0) = \mathbf{y}_0, \quad \dot{\mathbf{y}}(0) = \dot{\mathbf{y}}_0 \\
& \mathbf{y}(T) = \mathbf{y}_T, \quad \dot{\mathbf{y}}(T) = \dot{\mathbf{y}}_T
\end{aligned}
\tag{2.4}
\]

Another important extension is to formulate criteria for collision avoidance. As suggested in [5], if the manipulator and an obstacle are modelled as unions of convex polyhedra
\[
P = \bigcup_{i=1}^{n_P} P^{(i)} \quad \text{and} \quad Q = \bigcup_{j=1}^{n_Q} Q^{(j)}
\]
then the obstacle is avoided if and only if
\[
P^{(i)} \cap Q^{(j)} = \emptyset \quad \forall i = 1, \ldots, n_P, \ \forall j = 1, \ldots, n_Q
\]
This criterion can be formulated as joint constraints which could be included in (2.3). However, introducing collision avoidance through this involved framework is an extensive task. Therefore, this work incorporates a more simplified approach, which is covered in Section 3.4.

2.2.3 Objective extensions

A natural extension to the objective function of (2.3) is to include energy considerations. Different sources of energy loss, such as electrical losses in the actuators or mechanical losses due to Coulomb friction, can be modelled and included in the formulation. The simplest approach is to minimize the energy of the control signal, which indirectly reduces the energy dissipation. A problem occurs if the transition time objective is completely replaced by an energy measure of the input signal, since the control input can be made arbitrarily small if no external forces are present and $T \to \infty$. To counter this, the transition time can be bounded by some predefined $\bar T_{\max}$. Hence, the energy minimizing formulation is given by

\[
\begin{aligned}
\min_{\boldsymbol{\tau}(\cdot)} \quad & \int_{0}^{T} |\boldsymbol{\tau}(t)|^2\,dt \\
\text{s.t.} \quad & M(\mathbf{q}(t))\ddot{\mathbf{q}}(t) + C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) + g(\mathbf{q}(t)) = \boldsymbol{\tau}(t) \\
& \mathbf{q}(t) \in Q \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}(t) \leq \boldsymbol{\tau}^+ \\
& \mathbf{y}(t) = \chi_y(\mathbf{q}(t)) \\
& \mathbf{y}(0) = \mathbf{y}_0, \quad \dot{\mathbf{y}}(0) = \dot{\mathbf{y}}_0 \\
& \mathbf{y}(T) = \mathbf{y}_T, \quad \dot{\mathbf{y}}(T) = \dot{\mathbf{y}}_T \\
& T \leq \bar T_{\max}
\end{aligned}
\tag{2.5}
\]

Another approach could be to combine both formulations, which yields a trade-off problem between transition time and energy loss. Employing $\bar\alpha$ and $\bar\beta$ as selection parameters, (2.3) is reformulated as

\[
\begin{aligned}
\min_{\boldsymbol{\tau}(\cdot)} \quad & \bar\alpha\, T + \int_{0}^{T} \bar\beta\,|\boldsymbol{\tau}(t)|^2\,dt \\
\text{s.t.} \quad & M(\mathbf{q}(t))\ddot{\mathbf{q}}(t) + C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) + g(\mathbf{q}(t)) = \boldsymbol{\tau}(t) \\
& \mathbf{q}(t) \in Q \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}(t) \leq \boldsymbol{\tau}^+ \\
& \mathbf{y}(t) = \chi_y(\mathbf{q}(t)) \\
& \mathbf{y}(0) = \mathbf{y}_0, \quad \dot{\mathbf{y}}(0) = \dot{\mathbf{y}}_0 \\
& \mathbf{y}(T) = \mathbf{y}_T, \quad \dot{\mathbf{y}}(T) = \dot{\mathbf{y}}_T
\end{aligned}
\tag{2.6}
\]
In short, varying $\bar\alpha$ and $\bar\beta$ leads to some solution along the Pareto frontier, i.e. an optimal trade-off between transition time and energy loss.

2.3 Timed elastic band

Introduce the state vector
\[
\mathbf{x}(t) = \begin{pmatrix} \mathbf{q}(t) \\ \dot{\mathbf{q}}(t) \end{pmatrix}
\]
which includes the position and velocity trajectories of the manipulator. The dynamics of the introduced state can be obtained by rewriting (2.1) into
\[
\ddot{\mathbf{q}}(t) = M^{-1}(\mathbf{q}(t))\left(\boldsymbol{\tau}(t) - C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) - g(\mathbf{q}(t))\right)
\]
Now, let
\[
f(\mathbf{x}) = -M^{-1}(\mathbf{q}(t))\left(C(\mathbf{q}(t), \dot{\mathbf{q}}(t))\dot{\mathbf{q}}(t) + g(\mathbf{q}(t))\right), \qquad
h(\mathbf{x}) = M^{-1}(\mathbf{q}(t))
\]
and define
\[
A = \begin{pmatrix} 0 & I \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 \\ I \end{pmatrix}
\]


The complete state dynamics can now be condensed to
\[
\dot{\mathbf{x}}(t) = A\mathbf{x}(t) + B\left(f(\mathbf{x}(t)) + h(\mathbf{x}(t))\boldsymbol{\tau}(t)\right)
\]
Discretize the state and input trajectories by finite differences
\[
\mathbf{x} := \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}, \qquad
\boldsymbol{\tau} := \{\boldsymbol{\tau}_1, \boldsymbol{\tau}_2, \ldots, \boldsymbol{\tau}_{n-1}\}
\]
where $\mathbf{x}_i \in \mathbb{R}^{2p}$, $\boldsymbol{\tau}_i \in \mathbb{R}^{q}$ and the finite time difference is given by $\Delta T$. Note that $n$ and $\Delta T$ are not fixed, and will be allowed to vary as the band deforms in time. $n$ will be bounded between some predefined $\bar n_{\min}$ and $\bar n_{\max}$, while for $\Delta T$ it should obviously hold that $\Delta T > 0$. The discrete states and inputs are constrained by the approximated system dynamics, obtained through forward differences
\[
\frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta T} = A\mathbf{x}_k + B\left(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k\right)
\]
Following the notation of [14], the state and input sequences, along with the time difference, are lumped into a so-called Timed Elastic Band (TEB) set $\mathcal{B} \subseteq \mathbb{R}^{\tilde d}$, where $\tilde d = 2np + (n-1)q + 1$,
\[
\mathcal{B} := \{\mathbf{x}_1, \boldsymbol{\tau}_1, \mathbf{x}_2, \boldsymbol{\tau}_2, \ldots, \mathbf{x}_{n-1}, \boldsymbol{\tau}_{n-1}, \mathbf{x}_n, \Delta T\}
\]
The set constitutes the timed elastic band and can now be deformed in space and time while aiming to satisfy the approximated system dynamics between each state.

The general motion planning problem (2.3) can now be rewritten in the TEB framework. The problem will be formulated in the variables included in the TEB set, which requires elimination of the Cartesian path coordinates appearing in (2.3). For simplicity, assume that the initial position $\mathbf{q}_s$ and velocity $\dot{\mathbf{q}}_s$ are known through, for example, state estimation, and that the final velocity $\dot{\mathbf{q}}_f$ is zero. Also, assume the existence of inverse kinematics so that the final state can be obtained as $\mathbf{q}_f = \chi_y^{-1}(\mathbf{y}_T)$, where $\mathbf{y}_T$ is some given target. The possible existence of multiple solutions is addressed in Section 3.6. Hence, the boundary states are given by
\[
\mathbf{x}_s = \begin{pmatrix} \mathbf{q}_s \\ \dot{\mathbf{q}}_s \end{pmatrix}, \qquad
\mathbf{x}_f = \begin{pmatrix} \chi_y^{-1}(\mathbf{y}_T) \\ \mathbf{0} \end{pmatrix}
\]
The planning problem (2.3) can now be reformulated as
\[
\begin{aligned}
\min_{\mathcal{B}} \quad & (n-1)\Delta T \\
\text{s.t.} \quad & \frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta T} - \left(A\mathbf{x}_k + B(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k)\right) = 0 \quad (k = 1, 2, \ldots, n-1) \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}_k \leq \boldsymbol{\tau}^+ \quad (k = 1, 2, \ldots, n-1) \\
& \mathbf{x}_1 = \mathbf{x}_s, \quad \mathbf{x}_n = \mathbf{x}_f, \quad \Delta T > 0
\end{aligned}
\tag{2.7}
\]
for a fixed $n$, $\bar n_{\min} \leq n \leq \bar n_{\max}$.
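The layout of the TEB set and the forward-difference dynamics constraint of (2.7) can be illustrated with a small sketch. The double-integrator stand-ins for $f$ and $h$ and the choice of input dimension equal to the number of joints are assumptions made purely for illustration.

```python
import numpy as np

def pack_teb(states, inputs, dT):
    """Flat TEB vector B := {x_1, tau_1, ..., x_{n-1}, tau_{n-1}, x_n, dT}."""
    parts = []
    for x_k, tau_k in zip(states[:-1], inputs):
        parts += [x_k, tau_k]
    parts += [states[-1], np.array([dT])]
    return np.concatenate(parts)

def dynamics_defects(states, inputs, dT, A, B, f, h):
    """Residuals (x_{k+1} - x_k)/dT - (A x_k + B (f(x_k) + h(x_k) tau_k)) of (2.7)."""
    return [(states[k + 1] - states[k]) / dT
            - (A @ states[k] + B @ (f(states[k]) + h(states[k]) @ inputs[k]))
            for k in range(len(inputs))]

# Double-integrator stand-in with p = 2 joints: x = (q, qdot) in R^{2p}, tau in R^p.
p, n, dT = 2, 5, 0.05
A = np.block([[np.zeros((p, p)), np.eye(p)], [np.zeros((p, p)), np.zeros((p, p))]])
B = np.vstack([np.zeros((p, p)), np.eye(p)])
f = lambda x: np.zeros(p)    # stand-in for -M^{-1}(C qdot + g)
h = lambda x: np.eye(p)      # stand-in for M^{-1}

states = [np.array([0.1 * k, 0.2 * k, 0.1, 0.2]) for k in range(n)]
inputs = [np.zeros(p) for _ in range(n - 1)]
teb = pack_teb(states, inputs, dT)
assert teb.size == 2 * n * p + (n - 1) * p + 1   # matches d~ = 2np + (n-1)q + 1 with q = p
defects = dynamics_defects(states, inputs, dT, A, B, f, h)
```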


2.4 Non-linear model predictive control

Before introducing the proposed planning procedure, a brief analysis of the reformulated planning problem (2.7) is required. Providing technical proofs of optimality, feasibility or stability is beyond the scope of this work, but some heuristic arguments will be used to support the solution method described in the next section.

The approximated system dynamics that appear in the constraints can be highly non-linear and the problem is typically not convex. As a result, it is generally not possible to determine a global optimal solution. Additionally, it would not be suitable for a planner that should be capable of operating in real-time to search for a global optimal solution, since the required computation time would be too large. Instead, the aim will be to determine a feasible trajectory that can be improved during subsequent iterations. A common technique for attaining such suboptimal trajectories is non-linear model predictive control. During each control cycle, the current trajectory is improved through some iterative optimization method, and then only the first state and input are used as the basis for the next control action. In the next cycle, the current state is obtained directly from sensor readings or from state estimation. The next optimization iteration can either warm-start directly from the previous solution or shift-initialize the trajectory based on the new starting state.

To achieve a quasi time-optimal trajectory, the length n will be allowed to vary between iterations. By doing so, the resolution of the trajectory can be made finer in the vicinity of obstacles to increase the possibility of finding a feasible solution without collisions. Likewise, the trajectory can be made coarser when the path is clear and the manipulator approaches the target. Aiming to keep the trajectory length n low while retaining the possibility of finding a feasible solution is a good basis for achieving a low transition time without the requirement of long computational periods. Consequently, (2.7) will be solved for different n during the planning procedure.

To explore the possibility of maintaining feasibility when the length of the solution trajectory is altered, a couple of assumptions are required:

• There exists an $n$, $\bar n_{\min} \leq n \leq \bar n_{\max}$, such that (2.7) has a feasible solution $\mathcal{B}^*_n$.

• There is no model-plant mismatch.

The latter assumption is not likely to hold in implementation, especially if $\Delta T$ greatly differs from the inherent sampling time, but there exist robust MPC techniques that counter this issue [9]. Now, if a feasible trajectory of length $n^*$ has been computed,
\[
\mathcal{B}^*_{n^*} := \{\mathbf{x}_s, \boldsymbol{\tau}_1, \mathbf{x}_2, \boldsymbol{\tau}_2, \ldots, \mathbf{x}_{n^*-1}, \boldsymbol{\tau}_{n^*-1}, \mathbf{x}_f, \Delta T\}
\]
it can easily be extended to a trajectory of length $n^*+1$ if the final position is a steady state. It then holds that $\mathbf{x}_{n^*+1} = \mathbf{x}_f$ when using the input $\boldsymbol{\tau}_f = \mathbf{0}$, or some $\boldsymbol{\tau}_f \neq \mathbf{0}$ if external forces such as gravity are present. Hence, the new trajectory is given by
\[
\mathcal{B}^*_{n^*+1} := \{\mathbf{x}_s, \boldsymbol{\tau}_1, \mathbf{x}_2, \boldsymbol{\tau}_2, \ldots, \mathbf{x}_{n^*-1}, \boldsymbol{\tau}_{n^*-1}, \mathbf{x}_f, \boldsymbol{\tau}_f, \mathbf{x}_f, \Delta T\}
\]


Thus, there exists at least one feasible trajectory of length $n^*+1$ which can be improved. Alternatively, if the trajectory length is reduced to $n^*-1$, either a feasible solution exists, in which case it can be improved, or the previous trajectory $\mathcal{B}^*_{n^*}$ can be used as a basis for control. In the next cycle the solution set is shift-initialized. With the assumption of no model mismatch, the state $\mathbf{x}_2$ will be attained in the next cycle and the principle of optimality implies that
\[
\mathcal{B}^*_{n^*-1} := \{\mathbf{x}_2, \boldsymbol{\tau}_2, \ldots, \mathbf{x}_{n^*-1}, \boldsymbol{\tau}_{n^*-1}, \mathbf{x}_f, \Delta T\}
\]
is a feasible solution to
\[
\begin{aligned}
\min_{\mathcal{B}} \quad & (n-1)\Delta T \\
\text{s.t.} \quad & \frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta T} - \left(A\mathbf{x}_k + B(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k)\right) = 0 \quad (k = 1, 2, \ldots, n-1) \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}_k \leq \boldsymbol{\tau}^+ \quad (k = 1, 2, \ldots, n-1) \\
& \mathbf{x}_1 = \mathbf{x}_2, \quad \mathbf{x}_n = \mathbf{x}_f, \quad \Delta T > 0
\end{aligned}
\]
with $n = n^*-1$. Hence, the solution set $\mathcal{B}^*_{n^*-1}$ is feasible in the next cycle and can then be further improved. In summary, if a feasible trajectory is supplied as a starting guess it is possible to maintain feasibility of the trajectory while it is improved in the iterative optimization process.
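A minimal sketch of the shift-initialization step described above is given below; the rule for re-padding the band with the steady final state and input is an assumption made only for illustration.

```python
import numpy as np

def shift_initialize(states, inputs, x_f, tau_f, n_min):
    """Warm start for the next NMPC cycle: drop the first state and the applied
    input; if the band would fall below n_min states, append a copy of the
    steady final state x_f driven by the steady input tau_f."""
    states, inputs = states[1:], inputs[1:]
    if len(states) < n_min:
        states = states + [x_f.copy()]
        inputs = inputs + [tau_f.copy()]
    return states, inputs

# toy usage with the stand-in dimensions from the previous sketch
x_f = np.array([1.0, 1.0, 0.0, 0.0])
tau_f = np.zeros(2)
states = [np.full(4, 0.2 * k) for k in range(4)] + [x_f]
inputs = [np.zeros(2) for _ in range(4)]
states, inputs = shift_initialize(states, inputs, x_f, tau_f, n_min=5)
```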

Finally, the stability of the nonlinear model predictive control approach should be addressed. It has been shown that under some mild conditions suboptimal model predictive controllers are stabilizing [16]. In brief, as long as feasibility is maintained, the state is steered towards the terminal state. The results are based on an approach similar to standard Lyapunov theory. One of the conditions is that during each iteration there is a sufficient decrease in the objective, which in [16] is ensured by assuming quadratic objective functions. To enforce convergence, (2.7) can be reformulated into a tracking error minimization problem with a quadratic objective function.
\[
\begin{aligned}
\min_{\mathcal{B}\setminus\{\Delta T\}} \quad & \sum_{k=1}^{n-1} \|\mathbf{x}_k - \mathbf{x}_f\|^2 \\
\text{s.t.} \quad & \frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta \bar T_{\text{sample}}} - \left(A\mathbf{x}_k + B(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k)\right) = 0 \quad (k = 1, 2, \ldots, n-1) \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}_k \leq \boldsymbol{\tau}^+ \quad (k = 1, 2, \ldots, n-1) \\
& \mathbf{x}_1 = \mathbf{x}_s, \quad \mathbf{x}_n = \mathbf{x}_f
\end{aligned}
\tag{2.8}
\]
Note that the inherent sampling time $\Delta \bar T_{\text{sample}}$ is used in the tracking formulation, which indicates that the time increment is no longer allowed to vary. This is to ensure that time deformation does not disrupt the convergence properties of the quadratic formulation, as well as to reduce the model mismatch. In essence, this is a more traditional non-linear MPC formulation. Assuming the planning procedure will be able to steer the manipulator close enough to the target using the time minimizing formulation (2.7), a switching strategy that utilizes the tracking formulation (2.8) could be employed to ensure that the goal is reached.


3 Trajectory Planner

The trajectory planner is introduced together with a discussion of the different implementation details. First, the main method used to solve (2.7) is presented, followed by the solution approaches to problems that involve collision avoidance and tracking of moving targets. In addition, the concept of planning multiple trajectories is introduced and discussed. Afterwards, the full solution method is summarized and provided as a real-time algorithm. Finally, a software implementation of the trajectory planner is presented.

3.1 Trajectory initialization

The main idea of the trajectory planner is to successively improve a trajectory as it is being traversed. Consequently, an initial trajectory suggestion has to be supplied. As discussed before, the chance of maintaining a feasible trajectory is increased if the initial trajectory is feasible. The planner uses a primitive approach by simply calculating a linear interpolation between the start and goal state in the joint space. Hence, given a predefined initial trajectory length $\bar n_{\text{init}}$ and initial time increment $\Delta\bar T_{\text{init}}$, the initial joint trajectory is obtained through
\[
\mathbf{q}_i = \mathbf{q}_s + \frac{i}{\bar n_{\text{init}}}\left(\mathbf{q}_f - \mathbf{q}_s\right)
\]
To form a feasible trajectory, a linear velocity is determined which realises the linearly interpolated trajectory. Such a velocity is given by
\[
\dot{\mathbf{q}}_l = \frac{\mathbf{q}_{i+1} - \mathbf{q}_i}{\Delta\bar T_{\text{init}}}
\]
for any $i = 1, \ldots, \bar n_{\text{init}}$. Furthermore, if forward dynamics are available, the planner can also calculate the control input required to attain this velocity. First, determine the required initial acceleration
\[
\ddot{\mathbf{q}}_l = \frac{\dot{\mathbf{q}}_l - \dot{\mathbf{q}}_s}{\Delta\bar T_{\text{init}}}
\]
The required input is then given by
\[
\boldsymbol{\tau}_s = M(\mathbf{q}_s)\ddot{\mathbf{q}}_l + C(\mathbf{q}_s, \dot{\mathbf{q}}_l)\dot{\mathbf{q}}_l + g(\mathbf{q}_s)
\]
During the intermediate states, zero control input is required, and the linear trajectory is traversed with the linear velocity $\dot{\mathbf{q}}_l$. At the final state, the inverse input is applied to again attain zero velocity. Otherwise, if the goal velocity is non-zero, the final input is determined by
\[
\boldsymbol{\tau}_{\bar n_{\text{init}}-1} = M(\mathbf{q}_{\bar n_{\text{init}}-1})\ddot{\mathbf{q}}_{\bar n_{\text{init}}-1} + C(\mathbf{q}_{\bar n_{\text{init}}-1}, \dot{\mathbf{q}}_{\bar n_{\text{init}}-1})\dot{\mathbf{q}}_{\bar n_{\text{init}}-1} + g(\mathbf{q}_{\bar n_{\text{init}}-1})
\]
where
\[
\ddot{\mathbf{q}}_{\bar n_{\text{init}}-1} = \frac{\dot{\mathbf{q}}_f - \dot{\mathbf{q}}_{\bar n_{\text{init}}-1}}{\Delta\bar T_{\text{init}}}
\]

An immediate problem with this simple approach is the risk of violating the bounds on velocity and control input. In this case, the planner will clamp the initial trajectory so that it satisfies the bounds, with the hope that the trajectory is still somewhat feasible.

The primitive initial trajectory will probably be sufficient in simple scenarios. However, if the system is complicated, or the workspace contains numerous obstacles and a convoluted topology, the planner will probably have to rely on some existing path planning procedure to generate a collision free path. Another idea could be to use the efficient two-step approach [2] to generate a feasible initial trajectory.
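The initialization described in this section can be sketched as follows. The clamping to a simple velocity bound follows the text above, while the numerical values and function names are assumptions made only for the example.

```python
import numpy as np

def initialize_trajectory(q_s, q_f, n_init, dT_init, qdot_max):
    """Linear-interpolation warm start of Section 3.1 (illustrative sketch).
    Returns the joint positions q_i = q_s + i/n_init (q_f - q_s) and the constant
    interpolation velocity, clamped to the simple velocity bound."""
    i = np.arange(n_init + 1)[:, None]
    q_traj = q_s + (i / n_init) * (q_f - q_s)
    qdot_l = (q_traj[1] - q_traj[0]) / dT_init      # constant linear velocity
    qdot_l = np.clip(qdot_l, -qdot_max, qdot_max)   # clamp to the bounds
    return q_traj, qdot_l

q_s, q_f = np.array([0.0, 0.0]), np.array([1.2, -0.6])
q_traj, qdot_l = initialize_trajectory(q_s, q_f, n_init=10, dT_init=0.1,
                                       qdot_max=np.array([1.0, 1.0]))
```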

3.2 Deformation of the timed elastic band

The trajectory planner stores the currently planned trajectory as a TEB set $\mathcal{B}$. During each control cycle, the trajectory is deformed with the aim of achieving a feasible quasi time-optimal solution. Figuratively, the trajectory is first deformed in time during contraction/expansion of the optimization vector, and then deformed in space during the subsequent optimization iterations.

3.2.1 Deformation in time

The length of the timed elastic band, and accordingly the solution vector $\mathcal{B}$, will be varied according to a simple decision rule, as suggested in [14]. Ideally, $\Delta T$ should be close to the inherent sampling time of the system, since this would reduce the model mismatch. Hence, let $\Delta\bar T_{\text{ref}}$ denote a predefined reference time related to the sampling time, and also define $\Delta\bar T_{\text{hyst}} = 0.1\,\Delta\bar T_{\text{ref}}$, which will be used to avoid oscillations during updates. The decision rule is as follows. During each TEB update $i$:

• Insert a new state if $\Delta T_i > \Delta\bar T_{\text{ref}} + \Delta\bar T_{\text{hyst}}$ and $n_i < \bar n_{\max}$

• Remove a state if $\Delta T_i < \Delta\bar T_{\text{ref}} - \Delta\bar T_{\text{hyst}}$ and $n_i > \bar n_{\min}$

• Leave the TEB unchanged otherwise

The predicted total time should not change, so it holds that
\[
\Delta T_{i+1}\, n_{i+1} = \Delta T_i\, n_i
\]
Hence, inserting a new state will decrease $\Delta T$, and this action is therefore defined as contracting in time. Similarly, removing a state will increase $\Delta T$, and it is defined as expanding in time. During each control cycle, the TEB update is performed a predefined $\bar I_{TEB}$ number of times.

After the length has been changed, the complete length of the new TEB set $\mathcal{B}_{i+1}$ is given by
\[
\tilde d = 2n_{i+1}p + (n_{i+1} - 1)q + 1
\]

Consequently, the problem (2.7) has now changed, and it should be discussed how states are to be inserted into or removed from $\mathcal{B}_i$, since $\mathcal{B}_{i+1}$ will be used as a starting guess when calling the optimization solver. A first approach could be, in an attempt to maintain feasibility when inserting a state, to simply duplicate the last state if it is a steady state. However, there is no guarantee that this is a suitable starting guess, and there is no clear-cut approach that can be applied when removing a state. Instead, the method used here is to perform a linear re-interpolation in time of the TEB set $\mathcal{B}_i$. After a contraction in time, a duplicate of the end state is added to the end of the trajectory. Next, the intermediate points are replaced by linear interpolations of the old trajectory. Likewise, after expanding in time, the state before the end state is removed and the intermediate points are re-interpolated. In brief, the intermediate states are updated according to
\[
\mathbf{q}^{(i+1)}_k = \mathbf{q}^{(i)}_{\bar k} + \gamma_k\left(\mathbf{q}^{(i)}_{\bar k+1} - \mathbf{q}^{(i)}_{\bar k}\right)
\]
where
\[
\bar k = \left\lfloor \frac{k\,\Delta T_{i+1}}{\Delta T_i} \right\rfloor = \left\lfloor k\,\frac{n_i}{n_{i+1}} \right\rfloor
\]
and
\[
\gamma_k = \frac{k\,\Delta T_{i+1}}{\Delta T_i} - \bar k = k\,\frac{n_i}{n_{i+1}} - \bar k, \qquad \gamma_k \in [0, 1)
\]

The same method is applied to re-interpolate the control inputs. If $n$ is large enough, the re-interpolated trajectory will be similar to the previous trajectory, which increases the chance of retaining feasibility. If the previous trajectory satisfies the bounds on state, velocity and input, then the re-interpolated trajectory will satisfy them as well, since these simple bounds define a convex set of allowed values. This does not apply to all trajectory constraints, as the non-linear dynamics typically make the feasible region non-convex. An alternative approach could be to instead use a cubic re-interpolation scheme, which might have a better chance of maintaining feasibility.
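The linear re-interpolation in time can be sketched as below. The 0-based indexing and the index guard are implementation assumptions; the interpolation rule itself follows the expressions above.

```python
import numpy as np

def reinterpolate(q_old, n_new):
    """Linear re-interpolation in time of the TEB states when the band length
    changes from n_old to n_new (sketch of the Section 3.2.1 update rule,
    0-based indices; the first and last states are kept fixed)."""
    n_old = len(q_old)
    q_new = [q_old[0]]
    for k in range(1, n_new - 1):
        s = k * n_old / n_new            # k * dT_{i+1} / dT_i = k * n_i / n_{i+1}
        k_bar = int(np.floor(s))         # index on the old band
        gamma = s - k_bar                # gamma_k in [0, 1)
        k_bar = min(k_bar, n_old - 2)    # guard so that k_bar + 1 stays on the band
        q_new.append(q_old[k_bar] + gamma * (q_old[k_bar + 1] - q_old[k_bar]))
    q_new.append(q_old[-1])
    return np.array(q_new)

q_old = np.linspace(0.0, 1.0, 6)[:, None] * np.array([1.0, -0.5])  # 6 states, 2 joints
q_finer = reinterpolate(q_old, n_new=7)    # after inserting a state (contracting in time)
q_coarser = reinterpolate(q_old, n_new=5)  # after removing a state (expanding in time)
```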


3.2.2 Deformation in space

After updating the length of the trajectory, some variation of (2.7) will be solved for the current value of $n$. Each non-linear optimization problem is solved using the Sequential Quadratic Programming (SQP) method. Appendix A provides a brief theoretical overview of the SQP method, as well as discusses different implementation details of the SQP solver included in the trajectory planner. The SQP solver runs for a predefined $\bar I_{SQP}$ number of iterations after each TEB update. Hence, the complete number of optimization iterations performed each control cycle is given by $\bar I_{TEB} \cdot \bar I_{SQP}$.

Employing the SQP method requires gradient evaluations of the objective and the constraints of (2.7). Since the system dynamics are usually highly non-linear, it is not likely that analytical gradient information is available. Instead, gradients are obtained through automatic differentiation, a modern technique for computing derivatives that is suited for optimization methods [10]. Appendix B provides an introduction to automatic differentiation. The technique can also be used to compute the required Hessians. However, as discussed in Appendix A, the planner implementation will opt to use a quasi-Newton update instead, which induces less computational cost and ensures that the Hessian is positive definite.
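As an illustration of the technique, a minimal forward-mode automatic differentiation based on dual numbers is sketched below. It is not the AD tool used by the planner (see Appendix B), but it shows how exact derivatives of a constraint expression can be obtained without analytical gradients.

```python
import math

class Dual:
    """Minimal forward-mode automatic differentiation with dual numbers."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def constraint(q1, q2):
    # toy forward-kinematics-like expression: sin(q1 + q2) + q1 * q2
    return (q1 + q2).sin() + q1 * q2

# Partial derivative w.r.t. q1 at (0.3, 0.8): seed q1 with derivative 1.
dq1 = constraint(Dual(0.3, 1.0), Dual(0.8, 0.0)).dot
# dq1 equals cos(1.1) + 0.8, exact up to machine precision
```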

Each call to the SQP solver uses the current trajectory $\mathcal{B}_i$, which might have been re-interpolated during the TEB update, as a starting guess. Assuming that the initial starting guess is adequate, the SQP solver should most of the time be able to improve the trajectory through subsequent iterations, even though the problem changes slightly between calls. However, if no feasible solution is found, or the SQP procedure aborts for some other reason, then the TEB updates performed during that control cycle are rejected and the solution is reverted to the trajectory that was computed in the previous control cycle. By doing so, the trajectory will stay feasible during the planning procedure, assuming the initial trajectory is feasible.

3.3 Control application and planner update

Now, the MPC action of the trajectory planner is performed. When the solver has finished deforming the trajectory, the first state of the TEB set is used to determine the control action. In real applications, the complete state, including position, velocity, acceleration and the control input, can be fed to some low level controllers which then generate signals to realize the movement. In the simulations performed in the next chapter, the first control input is simply applied to the system.

After the control application, the resulting state of the system is obtained through sensor readings. The stored trajectory is then shift-initialized to the current state. The previous first state and the applied input are removed, and the new start state is set from the sensor readings. In addition, the planner can here be supplied with a new end state, if, say, the target has moved during the control cycle. The update effectively shrinks the trajectory by one state. If the number of states falls below $\bar n_{\min}$, the trajectory is contracted in time, which inserts a new state.


The sensor readings will also be used to update the state of the planner. If the new end state is not within a predefined close proximity $\bar\sigma_p$ of the previous end state, the trajectory is re-initialized. Also, if the current state enters a predefined tracking vicinity $\bar\sigma_t$, the underlying optimization problem of the solver is switched to (2.8) to enforce convergence. If the target is reached to within some predefined tolerance $\bar\epsilon$, the planning procedure stops.
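The per-cycle planner update can be summarized by a small decision sketch; the threshold values and function names below are assumptions made only for illustration.

```python
import numpy as np

def planner_update(q_now, q_goal_prev, q_goal_new, mode,
                   sigma_p=0.2, sigma_t=0.1, eps=1e-3):
    """Sketch of the per-cycle planner update in Section 3.3. The thresholds
    sigma_p (re-initialization proximity), sigma_t (tracking vicinity) and eps
    (goal tolerance) are assumed illustrative values."""
    if np.linalg.norm(q_goal_new - q_goal_prev) > sigma_p:
        return "reinitialize"            # goal jumped: rebuild the initial trajectory
    if np.linalg.norm(q_goal_new - q_now) < eps:
        return "stop"                    # target reached within tolerance
    if np.linalg.norm(q_goal_new - q_now) < sigma_t:
        return "track"                   # switch the solver to the tracking problem (2.8)
    return mode                          # keep the time-optimal formulation (2.7)

mode = planner_update(np.array([0.9, 0.1]), np.array([1.0, 0.0]),
                      np.array([1.0, 0.0]), mode="min_time")
```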

3.4 Collision avoidance

The trajectory planner will incorporate simple collision avoidance in the sense that it will only aim to keep the end-effector from colliding. The simplified collision avoidance is used in this work as a proof of concept, to see how well the planner can deform the trajectory around simple obstacles. Future work may involve complete collision models, as in for example [5], where rigid bodies are modelled as a collection of polyhedra.

The simple obstacles are modelled as spheres with a fixed center coordinate and size. Assume there are $m$ such obstacles known to the planner in a specific control cycle. Let obstacle $j$ be located at $O_j$ with radius $r_j$. The planner will utilize a predefined safety distance $\bar s$ that should be maintained from each obstacle. Hence, it holds that
\[
\|\mathbf{y}(t) - O_j\|^2 \geq (\bar s + r_j)^2 \quad (j = 1, 2, \ldots, m)
\]
or equivalently
\[
\|\chi_y(\mathbf{q}(t)) - O_j\|^2 \geq (\bar s + r_j)^2 \quad (j = 1, 2, \ldots, m)
\]
if the path is collision-free. In the timed elastic band framework, the above condition becomes
\[
\|\chi_y(\mathbf{q}_k) - O_j\|^2 \geq (\bar s + r_j)^2 \quad (k = 1, 2, \ldots, n-1) \quad (j = 1, 2, \ldots, m)
\tag{3.1}
\]
where
\[
\mathbf{q}_k = \begin{pmatrix} I & 0 \end{pmatrix}\mathbf{x}_k := C\mathbf{x}_k
\]

Now, collision avoidance can be included in the planning procedure by introducing (3.1) in (2.7), either through penalty functions or as inequality constraints. The preferred method will be to add a penalizing function to the objective for each obstacle. As a result, (2.7) becomes
\[
\begin{aligned}
\min_{\mathcal{B}} \quad & (n-1)\Delta T - \sum_{j=1}^{m}\sum_{k=1}^{n-1} \|\chi_y(C\mathbf{x}_k) - O_j\|^2 \\
\text{s.t.} \quad & \frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta T} - \left(A\mathbf{x}_k + B(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k)\right) = 0 \quad (k = 1, 2, \ldots, n-1) \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}_k \leq \boldsymbol{\tau}^+ \quad (k = 1, 2, \ldots, n-1) \\
& \mathbf{x}_1 = \mathbf{x}_s, \quad \mathbf{x}_n = \mathbf{x}_f, \quad \Delta T > 0
\end{aligned}
\tag{3.2}
\]


The effect of the penalizing functions should be similar to the original elastic band formulation, where the obstacles exert virtual forces that deform the trajectories around them. To achieve a more focused effect, the planner will only include points on the band within a predefined close proximity $\bar\sigma_{op}$ of an obstacle in the corresponding avoidance formulation. Define the index set of all states in the TEB set within the close proximity of obstacle $j$,
\[
K_{j,\bar\sigma_{op}} := \left\{ k : \|\chi_y(C\mathbf{x}_k) - O_j\|^2 \leq (\bar\sigma_{op} + r_j)^2 \right\}
\]
Note that $K_{j,\bar\sigma_{op}}$ will be determined based on the current state of the trajectory and then held fixed during the following control cycle. Now, (3.2) becomes
\[
\begin{aligned}
\min_{\mathcal{B}} \quad & (n-1)\Delta T - \sum_{j=1}^{m}\sum_{k \in K_{j,\bar\sigma_{op}}} \|\chi_y(C\mathbf{x}_k) - O_j\|^2 \\
\text{s.t.} \quad & \frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta T} - \left(A\mathbf{x}_k + B(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k)\right) = 0 \quad (k = 1, 2, \ldots, n-1) \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}_k \leq \boldsymbol{\tau}^+ \quad (k = 1, 2, \ldots, n-1) \\
& \mathbf{x}_1 = \mathbf{x}_s, \quad \mathbf{x}_n = \mathbf{x}_f, \quad \Delta T > 0
\end{aligned}
\tag{3.3}
\]

With this formulation, paths that wrap around obstacles will be favoured over those that pass through them. As a result, the trajectory will be continuously deformed away from obstacles until no points on the band are within the close proximity of any obstacle, i.e. $K_{j,\bar\sigma_{op}} = \emptyset$ for all $j$.

If the end-effector enters the close proximity of obstacle $i$, then the collision avoidance will be enforced through a hard constraint, so that (3.3) becomes
\[
\begin{aligned}
\min_{\mathcal{B}} \quad & (n-1)\Delta T - \sum_{\substack{j=1 \\ j \neq i}}^{m}\sum_{k \in K_{j,\bar\sigma_{op}}} \|\chi_y(C\mathbf{x}_k) - O_j\|^2 \\
\text{s.t.} \quad & \frac{\mathbf{x}_{k+1} - \mathbf{x}_k}{\Delta T} - \left(A\mathbf{x}_k + B(f(\mathbf{x}_k) + h(\mathbf{x}_k)\boldsymbol{\tau}_k)\right) = 0 \quad (k = 1, 2, \ldots, n-1) \\
& \boldsymbol{\tau}^- \leq \boldsymbol{\tau}_k \leq \boldsymbol{\tau}^+ \quad (k = 1, 2, \ldots, n-1) \\
& \mathbf{x}_1 = \mathbf{x}_s, \quad \mathbf{x}_n = \mathbf{x}_f, \quad \Delta T > 0 \\
& \|\chi_y(C\mathbf{x}_k) - O_i\|^2 \geq (\bar s + r_i)^2 \quad (k = 1, 2, \ldots, n-1)
\end{aligned}
\tag{3.4}
\]

As (3.3) is a less complex problem than (3.4), the strategy of preferring (3.3) saves computation time as well as increases the chance of finding a feasible solution. The downside is that the manipulator may enter the close proximity on a trajectory that still collides with the obstacle and then fail to find a feasible solution. The risk of such scenarios will be greatly influenced by the choice of $\bar\sigma_{op}$, which in implementation should be set conservatively high so that the planner has enough time to deform the trajectory around the obstacles.

In brief, the trajectory will be deformed around the obstacles known to the planner through formulation (3.3) and avoidance will only be enforced through (3.4) if the end-effector gets too close.
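The index sets $K_{j,\bar\sigma_{op}}$ and the penalty term of (3.3) can be sketched as follows; the example band positions and the single obstacle are assumed values used only for illustration.

```python
import numpy as np

def proximity_sets(ee_positions, obstacles, radii, sigma_op):
    """Index sets K_{j,sigma_op} of band states within the close proximity of
    each spherical obstacle (held fixed for the coming control cycle)."""
    return [
        {k for k, y in enumerate(ee_positions)
         if np.sum((y - O_j) ** 2) <= (sigma_op + r_j) ** 2}
        for O_j, r_j in zip(obstacles, radii)
    ]

def avoidance_penalty(ee_positions, obstacles, K_sets):
    """Term subtracted from the objective in (3.3): summed squared distance from
    the selected band points to the obstacle centers."""
    return sum(np.sum((ee_positions[k] - O_j) ** 2)
               for O_j, K_j in zip(obstacles, K_sets) for k in K_j)

ee = [np.array([0.2 * k, 0.1 * k]) for k in range(8)]   # chi_y(C x_k) along the band
obstacles, radii = [np.array([0.8, 0.4])], [0.1]
K = proximity_sets(ee, obstacles, radii, sigma_op=0.3)
objective_correction = -avoidance_penalty(ee, obstacles, K)  # enters (3.3) with a minus sign
```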


3.5 Tracking moving targets

Assume the objective is to track some moving target, represented here by the curve $\mathbf{y}_{tg}(t)$ in Cartesian space. Reaching the target is then equivalent to the following end condition,
\[
\mathbf{y}(T) = \mathbf{y}_{tg}(T), \qquad \dot{\mathbf{y}}(T) = \dot{\mathbf{y}}_{tg}(T)
\]
Consequently, a different formulation than (2.7), where the end velocity is set to zero, is required. The goal state is, as before, acquired through the inverse kinematics
\[
\mathbf{q}_f = \chi_y^{-1}\!\left(\mathbf{y}_{tg}(T)\right)
\]
but note that $\mathbf{q}_f$ will change between solver calls since the target is moving. As long as the velocity of the target is not too large, the new underlying optimization problem should not differ too much and the previous trajectory will still serve as a feasible initial guess. The trajectory is only re-initialized if the new goal is outside the close proximity of the previous goal. Hence, problems where the target velocity is large enough for the target to leave the close proximity will be considered ill-posed. Determining the goal velocity now requires utilizing the velocity Jacobian. The Jacobian can either be supplied directly, or it is obtained by differentiating the forward kinematics. The goal velocity is then given by
\[
\dot{\mathbf{q}}_f = J(\mathbf{q}_f)^{-1}\dot{\mathbf{y}}_{tg}(T)
\]

Care must be taken since the Jacobian might become singular during the procedure. All singularities should be identified before implementation, and the problem could then be alleviated by, for example, ensuring that the targets do not pass through singularities. Situations where this is impossible to avoid will require further considerations.

If the target velocity is linear, or if some time-parametrized position $\mathbf{y}_{tg}(t)$ or velocity $\dot{\mathbf{y}}_{tg}(t)$ function is supplied, a more sophisticated approach can be applied to the problem. Before running the solver during some cycle $i$, the current estimated total time $(n_{i-1} - 1)\Delta T_{i-1}$ can be used to predict where the target will end up. For example, if the target velocity is given by the fixed vector $\mathbf{v}$, then instead of aiming at the target's current position $\mathbf{y}_{tg}^i$ the goal can be set to
\[
\mathbf{y}_{tg}^i + (n_{i-1} - 1)\Delta T_{i-1}\,\mathbf{v}
\]

As the target is approached, and the estimated total transition time becomes more accurate, the predicted target state will also improve. As a result, the system will have a better chance of reaching the target with matching speed, instead of first reaching the target and then having to catch up. An apparent drawback of this method is the fact that the prediction can end up outside the workspace, where the target can no longer be reached. In some cases, this would indicate that the target could never have been reached and that the planner should give up.

However, another cause for such a prediction could be that the predicted total transition time is too large, which for example might occur during the initial iterations. Properly distinguishing between these causes would require further exploration of the approach.
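The target prediction described above can be sketched as follows; the spherical workspace check and the numerical values are assumed simplifications made only for the example.

```python
import numpy as np

def predicted_goal(y_tg_now, v_tg, n_prev, dT_prev, workspace_radius=None):
    """Aim ahead of a moving target as in Section 3.5: propagate the target
    position over the currently estimated transition time (n_prev - 1) * dT_prev."""
    y_pred = y_tg_now + (n_prev - 1) * dT_prev * v_tg
    if workspace_radius is not None and np.linalg.norm(y_pred) > workspace_radius:
        return None   # prediction leaves the workspace; fall back or give up
    return y_pred

goal = predicted_goal(np.array([0.6, 0.2]), np.array([0.05, 0.0]),
                      n_prev=20, dT_prev=0.05, workspace_radius=2.0)
```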


3.6 Multiple trajectories

An extension is made to the trajectory planner by introducing the possibility of maintaining multiple candidate trajectories. By doing so, it becomes possible to explore different paths. In an environment featuring many obstacles, the number of topologically distinct collision-free paths might increase, and it is probably not apparent which candidate would lead to the shortest transition time. A novel solution to this problem is provided in [15]. First, numerous collision-free paths are generated using the probabilistic roadmap approach from [8]. Next, the paths are filtered based on a topological equivalence relation. The remaining candidates can then be stored and improved simultaneously by a trajectory planner.

The approach is efficient and appears promising for real-time applications.

In this work, the multiple trajectory approach is applied to solve the issue of multiple solutions to the inverse kinematics problem. When the given target location implies different joint configuration goals, the trajectory initialization described in Section 3.1 will generate distinct initial guesses, and it is not clear which of them should be fed to the planner to achieve the shortest transition time. A simple solution to the problem is to choose the goal state that is closest to the current state, but that would not necessarily yield the best trajectory if, say, the initial velocity is not zero. Hence, to fully alleviate this issue, all candidates are fed to the planner. Since the trajectories are independent of each other, they can be solved in parallel. Additionally, candidate trajectories that pass through obstacles will be penalized when choosing the best candidate. As a result, this simple approach could in some cases increase the chance of finding a collision free trajectory, since the planner will have the opportunity to explore alternative paths. The approach will definitely incur a heavier demand on the processing computer, but the chance of finding an optimal trajectory increases. In this simplified setting, after some iterations it is not likely that the planner would switch from the currently active trajectory, as the other candidates will become more ill-posed as the active one is traversed. One could then commit to this trajectory and free up computational resources. Apart from solving the problem of multiple goal configurations, the results could be used to evaluate the prospect of fully adopting the multiple trajectory approach. Future work may then involve adapting the more sophisticated method from [15], in combination with some improved initial path planning such as [8].

3.7 Real-time algorithm

The complete trajectory planning procedure is now condensed into a real-time algorithm. The main solution procedure is outlined in Algorithm 1. The trajectory planner needs to be supplied with a desired control strategy. The available strategies are given by

• MinimizeTime

• MinimizeEnergy
