
Robust Booster Landing

Guidance/Control

UGURCAN ÇELIK

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES


Robust Booster Landing

Guidance/Control

UGURCAN ÇELIK

Degree Projects Systems Engineering (30 ECTS credits) Degree Programme in Aerospace Engineering (120 credits) KTH Royal Institute of Technology year 2020

Supervisor at KTH: Johan Karlsson Examiner at KTH: Johan Karlsson


TRITA-SCI-GRU 2020:300 MAT-E 2020:073

Royal Institute of Technology

School of Engineering Sciences

KTH SCI

SE-100 44 Stockholm, Sweden URL: www.kth.se/sci


Abstract

The space industry and the technological developments regarding space exploration have not been this popular since the first moon landing. The privatization of space exploration and vertical landing rockets made rocket science mainstream again. While being able to re-use rockets is efficient both in terms of profitability and popularity, these developments are still in their early stages. Vertical landing has challenges that, if neglected, can cause disastrous consequences. The existing studies on the matter usually don't account for aerodynamic forces and the corresponding controls, which results in higher fuel consumption, thus lessening the economical benefits of vertical landing.

Similar problems have been tackled in studies regarding not booster landings but planetary landings. And while multiple solutions have been proposed for these planetary landing problems, the fact that reinforcement learning concepts work well and provide robustness made them a valid candidate for application to booster landings. In this study, we focus on developing a vertical booster descent guidance and control law that is robust by applying the reinforcement learning concept. Since the chosen reinforcement learning method requires solving Optimal Control Problems (OCP), we also designed and developed an OCP solver software. The robustness of the resulting hybrid guidance and control policy will be examined against various uncertainties, including but not limited to wind, delay and aerodynamic uncertainty.


Sammanfattning

The space industry and the technological development of space exploration have not been this popular since the first moon landing. The privatization of space exploration and the vertical landing rockets brought rocket science back as a major topic. Although re-using rockets is efficient in terms of profitability and popularity, this development is still in its early stages. Vertical landing has challenges that, if neglected, can cause catastrophic consequences. Existing studies on the matter usually do not account for aerodynamic forces and the corresponding controls, which results in higher fuel consumption and reduces the economic benefits of vertical landing.

Similar problems have been addressed in studies concerning not booster landings but planetary landings. Although several solutions have been proposed for these planetary landing problems, the fact that reinforcement learning concepts work well and provide robustness made them a valid candidate for application to booster landings. In this study, we focus on developing a robust guidance and control law for vertical booster descent by applying reinforcement learning concepts. Since the chosen reinforcement learning method requires solving optimal control problems (OCP), we also designed and developed an OCP solver software. The robustness of the resulting hybrid guidance and control policy will be examined against various uncertainties including, but not limited to, wind, delay and aerodynamic uncertainty.


Contents

1 Introduction
  1.1 Motivation and Problem Definition
  1.2 Literature Review
  1.3 Focus and Objective of Thesis
  1.4 Organization of the Thesis

2 OCP Solver - RSOPT
  2.1 General Formulation of the OCP
  2.2 Direct Transcription Method
    2.2.1 Polynomial Approximation
    2.2.2 Legendre-Gauss-Radau Transcription
  2.3 Multiple Segment Optimal Control Problem Formulation
    2.3.1 Multiple Segment Legendre-Gauss-Radau Transcription
  2.4 Mesh Refinement
    2.4.1 Relative Error Estimation
    2.4.2 Refinement Strategy
  2.5 RSOPT External Packages
    2.5.1 AdiGator
    2.5.2 IPOPT
  2.6 Implementation Details
    2.6.1 Single Segment OCP
    2.6.2 Multiple Segment OCP
    2.6.3 Constraint Jacobian Computation
    2.6.4 Hessian of Lagrangian Computation
  2.7 RSOPT Structure
  2.8 RSOPT Example Problems
    2.8.1 Example Problem-1
    2.8.2 Example Problem-2

3 Booster Landing with RL
  3.1 Modelling and Simulation
    3.1.1 Motion Model
    3.1.2 Aerodynamic-Gravity-Density Models
  3.2 Reinforcement Learning Framework
    3.2.1 Introduction to Reinforcement Learning
    3.2.2 BLP State-Action-Reward Definitions
    3.2.3 BLP Stochastic Environment
    3.2.4 BLP Policy Model
  3.3 Policy Optimization
  3.4 Apprenticeship Learning
    3.4.1 Data Generation
    3.4.2 ANN Training
  3.5 Results
    3.5.1 Apprenticeship Learning
    3.5.2 Policy Optimization Results
    3.5.3 Alternative Heuristic Method


1 Introduction

1.1 Motivation and Problem Definition

The ability of rockets to perform an autonomous, precise vertical landing, which enables their re-use in further missions, has recently been gaining attention in the space industry [1]. The reason is the significant decrease in mission launch costs in conjunction with the increase in mission responsiveness.

Even though this concept began to be investigated in the early 1990s [2], due to significant challenges, successful missions have only been carried out in recent years, by companies such as SpaceX and Blue Origin. Lars Blackmore, the former principal rocket landing engineer of SpaceX, examined the challenges of these missions in four main categories as follows [3]:

• Extreme Environment
• Small Margin for Error
• Tough Touchdown Conditions
• Precise Landing

The extreme environment refers to the extreme conditions a booster is subjected to during re-entry into the earth's atmosphere from space. Aerodynamic loads on the booster, aerodynamic friction causing extreme heating problems, winds, gusts, and air ionization resulting in a communications black-out can be given as examples of these conditions. The small margin for error is related to the need for an exact combination of precise soft landing conditions, such as achieving a vertical attitude with zero velocity specifically


at ground level. Moreover, the booster needs to land precisely at the desired landing location, and the landing legs need to withstand all the loads experienced during touchdown and prevent the booster from tipping over. When all these challenges are taken into account, vertical landing of a booster needs to be investigated further in detail in order to avoid catastrophic results.

A typical mission profile for a re-usable rocket is given in figure 1.1, which belongs to SpaceX's Falcon-9 launch vehicle. In this mission profile, the re-usable part of the launch vehicle is the first stage booster. When the return part of the mission is considered, it can be observed that the booster re-enters the earth's atmosphere and lands on the ground after passing through several flight phases, named entry burn, aerodynamic guidance and finally vertical landing in the representative figure 1.1. All these phases have their own importance with regard to the challenges mentioned above. For instance, the entry burn phase is carried out to decelerate the booster in order to decrease aerodynamic friction and avoid extreme heating during re-entry. The aerodynamic guidance phase is conducted to steer the booster to the desired landing location without using fuel, to facilitate the job of the final phase.

Figure 1.1: Re-usable Rocket Typical Mission Profile (taken from [4])

However, in this thesis, the main area of interest will be the powered vertical landing phase, since it is the most critical phase, in which the booster needs to shrink all the dispersion resulting from the former phases while also attenuating


the disturbances encountered during this phase in order to achieve a precise soft landing at the desired location. Therefore, a hybrid guidance and control law will be developed for this phase of the return mission of a reusable booster.

1.2 Literature Review

Since this concept is still in its early stages, studies in the literature regarding this subject can be classified under two categories.

In the first category, the thrust profiles of the booster during landing are approximated with time dependent polynomials whose coefficients are computed to meet the end point constraints (see [5]-[6]). However, the aerodynamic forces are neglected in both of these studies.

In the second category, the authors preferred to convexify the optimal control problem associated with vertical landing, whose objective is to minimize the fuel used in the mission, in order to make real time computation possible (see [1]-[2]-[7]). However, in these studies, robustness analyses are not taken into account, such as the effects of existing delays, sensor noises and the disturbances which can be encountered during landing.

While there is a limited number of studies regarding booster landing missions, there are numerous studies regarding planetary landing problems. In fact, extensive research has been carried out on landing on the Moon or Mars surface, employing various types of methods such as learning a guidance law with deep neural networks, developing a guidance and control policy with reinforcement learning, and deriving a closed-form guidance law from an OCP (see [8]-[9]-[10]-[11]). Although there exist significant differences when the environments are considered, the nature of the dynamics of both problems is similar. Therefore, these studies are valuable for the purpose of applying them to booster landing missions with the necessary modifications.

1.3 Focus and Objective of Thesis

In this thesis, the main goal is to develop a robust hybrid guidance and control law, which has aerodynamic control capability, with the reinforcement learning concept for powered descent guidance of a booster, by employing the method originally introduced in [8] for the Mars pinpoint landing problem. In [8], the aerodynamic forces are neglected due to the low air density of the Mars atmosphere, which means no aerodynamic control exists. However, in this study, because of the high air density in the earth's atmosphere, this method will be


generalized by including aerodynamic control ability. Moreover, since an optimal control problem (OCP) solver software is needed in order to follow the method, a home-grown OCP solver software, called RSOPT, is also designed and developed in this thesis. The reason a commercial software package isn't used for this purpose is the current inaccessibility of such tools for students who study in European countries, since academic licenses for widely adopted software such as GPOPS [12] and SOCS [13] are provided either only to students who study in America or not at all.

1.4 Organization of the Thesis

The outline of this thesis is as follows. In Chapter 2, the design of the OCP solver software called RSOPT will be introduced. First, the mathematical preliminaries will be explained, then the implementation details will be given. Finally, example problems will be solved in order to validate the software. In Chapter 3, robust guidance and control law development using the reinforcement learning concept will be explored for vertical booster descent guidance. First, the models and parameters used in the booster landing simulation will be introduced, and then the reinforcement learning framework, which is the key part in developing the guidance/control policy, will be explained in detail. Finally, the optimization results will be given in the related section. In Chapter 4, the conclusion and future work will be delivered.


2 OCP Solver - RSOPT

RSOPT is a home-grown software package, written in Matlab, for solving single phase optimal control problems (OCP) using pseudo-spectral collocation methods. The primary reason for designing this software is to have the ability to solve complex aerospace-related trajectory optimization problems.

In this chapter, first the general formulation of single phase optimal control problems will be introduced. In section 2.2, the numerical method followed in RSOPT will be explained for single segment OCPs, and it will then be generalized to multiple segment OCPs in section 2.3. Then, the mesh (segment) refinement method will be described in section 2.4. After explaining all the methods followed, the external packages used in RSOPT will be introduced in section 2.5. In section 2.6, the implementation details of RSOPT will be explained in detail. Then, the general algorithm structure of RSOPT will be demonstrated in section 2.7. Lastly, two example problems will be solved and the results will be shared in section 2.8 in order to validate RSOPT.

2.1 General Formulation of the OCP

Consider the single phase Bolza form optimal control problem: the main purpose of this type of problem is to find optimal state trajectories $x(t) \in \mathbb{R}^n$ and control trajectories $u(t) \in \mathbb{R}^m$ which maximize or minimize the objective function on a defined time interval $t \in [t_0, t_f]$. For this purpose, the objective function can be defined as

$$J = \phi[x(t_f), u(t_f), t_f] + \int_{t_0}^{t_f} g(x(t), u(t), t)\,dt \tag{2.1}$$

In this objective formulation, $\phi$ represents the terminal cost while $g$ is the continuous integral cost.

The states are subject to the differential dynamic constraints:

$$\dot{x}(t) = f(x(t), u(t), t) \tag{2.2}$$

Boundary conditions on the states and controls are given as:

$$\varphi(x(t_0), x(t_f), u(t_0), u(t_f)) = 0 \tag{2.3}$$

Path constraints on the states and controls are:

$$C_L \le C(x(t), u(t), t) \le C_U \tag{2.4}$$

Upper and lower bounds on the states and controls are represented as:

$$x_L \le x(t) \le x_U, \qquad u_L \le u(t) \le u_U \tag{2.5}$$

Therefore, the Bolza form optimal control problem defined above can be represented as the following optimization problem:

$$\begin{aligned} \min_{x,u}\ & J = \phi[x(t_f), u(t_f), t_f] + \int_{t_0}^{t_f} g(x(t), u(t), t)\,dt \\ \text{s.t.}\ & \dot{x}(t) = f(x(t), u(t), t) \\ & \varphi(x(t_0), x(t_f), u(t_0), u(t_f)) = 0 \\ & C_L \le C(x(t), u(t), t) \le C_U \\ & x_L \le x(t) \le x_U \\ & u_L \le u(t) \le u_U \end{aligned} \tag{2.6}$$

2.2 Direct Transcription Method

Most of the time, it is challenging to obtain analytical solutions for optimal control problems; in fact, it is almost impossible for highly nonlinear trajectory optimization problems. For this reason, it is better to employ numerical methods to solve OCPs.

In this section, the numerical methods followed in RSOPT to obtain solutions for optimal control problems will be discussed. Generally, numerical methods for optimal control problems are separated into two different groups: indirect methods and direct methods.


Indirect methods are based on finding the roots of the first order optimality conditions of the optimization problem given at equation 2.6. Often, these conditions result in two point boundary value problems to be solved numerically. Although it is possible to attain high accuracy solutions with indirect methods, they are not used in RSOPT. The reason is that it is often intractable to derive the first order optimality conditions for aerospace-related trajectory problems. Moreover, it is usually also challenging to solve the resulting two point boundary value problem, since convergence depends heavily on the initial guess.

Hence, the approach followed in this thesis is a direct method, specifically a direct transcription method, which depends on parametrization of both the states and the controls. Three necessary steps must be followed in a direct transcription method. The first step is to convert the continuous optimal control problem to a finite-dimensional nonlinear programming (NLP) problem by state and control parametrization. In the second step, the resulting NLP is solved with a well-known parameter optimization solver such as IPOPT. In the last step, the accuracy of the obtained solution is evaluated, and the approximations are refined according to this evaluation if needed [14].

As mentioned above, the first step is to parametrize the states and controls in order to transform the continuous optimal control problem into a finite dimensional NLP problem. This transformation can be achieved by using various discretization techniques from the literature. These techniques usually fall into one of three categories: local transcription, global transcription and hybrid local/global transcription.

• Local Transcription: In these methods, time is segmented into smaller intervals. State and control variables are approximated with low order polynomials, e.g. 2nd-4th order. Examples of these techniques are well known discretization methods such as the trapezoidal, Runge-Kutta and Hermite-Simpson methods.

• Global Transcription: In a global transcription method, time isn't segmented into intervals; instead, state and control variables are approximated with high order polynomials. Orthogonal collocation methods such as Chebyshev collocation and Gauss-Legendre collocation are examples of these methods.

• Hybrid Local/Global Transcription: These methods can be regarded as multiple-segment global transcription, since time is segmented into relatively small intervals and high order polynomials are used to approximate the state and control variables within each segment. All the global transcription examples can also serve as examples of these methods, with segmented time.

In RSOPT, a multiple segment global orthogonal collocation method is implemented. Global orthogonal collocation methods have an exponential convergence rate if the problem has a smooth solution; for this reason they are also called pseudo-spectral methods [15]. It is known that most aerospace-related optimization problems need to have smooth state or control changes over time due to structural and performance limitations. Therefore, it is reasonable to proceed with global orthogonal collocation methods in RSOPT. The reason time segmentation, which makes the method a hybrid one, is also taken into account is that it yields computationally more efficient nonlinear programming problems. This will be explained later in section 2.4.

2.2.1 Polynomial Approximation

As mentioned above, in a global orthogonal collocation method the states and controls are approximated with high order polynomials. The approximations in this thesis are achieved using the Lagrange interpolation formula.

Consider an arbitrary function $P(t)$. This function can be approximated on an interval $t \in [t_0, t_f]$ by taking $N$ support points $t_i$ in this interval, with function values $P_i$:

$$P(t_i) = P_i, \quad i = 1, 2, \ldots, N \tag{2.7}$$

Then, the degree $N-1$ polynomial approximation of $P(t)$ is obtained with Lagrange interpolation polynomials as follows:

$$P(t) \approx \sum_{i=1}^{N} L_i(t)\, P(t_i) \tag{2.8}$$

where $L_i(t)$ denotes the unique Lagrange polynomials, computed as

$$L_i(t) = \prod_{\substack{j=1 \\ j \ne i}}^{N} \frac{t - t_j}{t_i - t_j} \tag{2.9}$$
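Equations 2.8 and 2.9 translate directly into code. The following minimal Python sketch (an illustration only, not part of RSOPT's Matlab code) evaluates the Lagrange basis and the interpolant; since the degree $N-1$ interpolant is exact for polynomials up to that degree, sampling a cubic at four points reproduces it exactly:

```python
def lagrange_basis(i, t, nodes):
    """L_i(t) from equation (2.9): product over j != i of (t - t_j)/(t_i - t_j)."""
    val = 1.0
    for j, tj in enumerate(nodes):
        if j != i:
            val *= (t - tj) / (nodes[i] - tj)
    return val

def lagrange_interp(t, nodes, values):
    """P(t) ~ sum_i L_i(t) P(t_i), equation (2.8)."""
    return sum(lagrange_basis(i, t, nodes) * values[i]
               for i in range(len(nodes)))

# A degree-(N-1) interpolant reproduces any polynomial of degree <= N-1 exactly.
nodes = [-1.0, -0.5, 0.5, 1.0]              # N = 4 support points
values = [t ** 3 for t in nodes]            # sample the cubic P(t) = t^3
print(lagrange_interp(0.3, nodes, values))  # ~ 0.027 = 0.3^3
```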


In Lagrange approximation, the number of support points used, as well as their locations, has a significant effect on the approximation performance. One might expect that increasing the number of support points, i.e. increasing the degree of the Lagrange polynomials, will yield a better approximation. In fact, the result can be different from this expectation: if equi-spaced support points are used, then increasing the number of points causes the error to increase near the boundaries. This is known as the Runge phenomenon [16]. An illustration of this phenomenon can be seen in figure 2.1, where the function $P(t) = \frac{t}{1+11t^2}$ is approximated using Lagrange interpolation and $\hat{P}(t)$ represents the approximation. Figure 2.1a shows the approximation with $N = 15$ and figure 2.1b the approximation with $N = 25$. It can be observed that increasing the number of equally spaced support points from 15 to 25 results in a higher error near the boundaries: the maximum absolute error observed with $N = 15$ is 0.1449, while for $N = 25$ it is 1.3245.

Figure 2.1: Runge Phenomenon, (a) N = 15, (b) N = 25
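The Runge phenomenon described above can be reproduced numerically. The sketch below (a Python illustration, not taken from the thesis) interpolates the same function $P(t) = t/(1+11t^2)$ at equally spaced points and confirms that the maximum error grows when the number of support points increases from 15 to 25:

```python
def lagrange_interp(t, nodes, values):
    # direct Lagrange evaluation (equations 2.8-2.9)
    total = 0.0
    for i, (ti, vi) in enumerate(zip(nodes, values)):
        li = 1.0
        for j, tj in enumerate(nodes):
            if j != i:
                li *= (t - tj) / (ti - tj)
        total += li * vi
    return total

def runge_error(n_points):
    """Max interpolation error for P(t) = t/(1 + 11 t^2) on [-1, 1]
    using n_points equally spaced support points."""
    f = lambda t: t / (1.0 + 11.0 * t * t)
    nodes = [-1.0 + 2.0 * k / (n_points - 1) for k in range(n_points)]
    vals = [f(t) for t in nodes]
    grid = [-1.0 + 2.0 * k / 400 for k in range(401)]  # dense evaluation grid
    return max(abs(lagrange_interp(t, nodes, vals) - f(t)) for t in grid)

# More equispaced points -> larger error near the boundaries (Runge phenomenon).
print(runge_error(15) < runge_error(25))   # -> True
```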

To overcome this phenomenon, it is possible to use support points located at Chebyshev nodes or at Gaussian-quadrature nodes. This can be regarded as the starting point of the orthogonal collocation methods used in RSOPT, since specific kinds of Gaussian-quadrature nodes are selected, which means the dynamics will also be collocated at these nodes for transcription purposes. This will be explained in section 2.2.2.

There exist three different sets of Gaussian-quadrature nodes: Legendre-Gauss-Lobatto (LGL) nodes, Legendre-Gauss-Radau (LGR) nodes and Legendre-Gauss (LG) nodes. The node distributions for each set can be seen in figure 2.2, with 10 nodes per set.

Figure 2.2: Illustration of LG, LGR, LGL Node Locations

From figure 2.2, it can be observed that the node locations for all three sets are denser near the boundaries, which helps to eliminate the Runge phenomenon. The main differences between them are summarized as follows:

• LGL Nodes: are contained in the interval $[-1, 1]$ and are symmetric about the origin. They are the roots of the equation $\dot{P}_{N-1}(\tau) = 0$ plus the points $-1$ and $+1$, where $\dot{P}_{N-1}$ is the derivative of the $(N-1)$th degree Legendre polynomial.

• LGR Nodes: are contained in the interval $[-1, 1)$. They are the roots of the equation $P_{N-1}(\tau) + P_N(\tau) = 0$, where $P_{N-1}$ and $P_N$ denote the $(N-1)$th and $N$th degree Legendre polynomials respectively.

• LG Nodes: are contained in the interval $(-1, 1)$ and are symmetric about the origin. They are the roots of the equation $P_N(\tau) = 0$, where $P_N$ is the $N$th degree Legendre polynomial.

In RSOPT, LGR nodes will be used as support points for the approximations. Hence, Legendre-Gauss-Radau orthogonal collocation will also be the transcription method used to transform the continuous OCP into a finite dimensional NLP problem. The reason LGR nodes are selected is that they are more suitable for multiple-segment collocation.
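The LGR nodes can be computed directly from their definition as the roots of $P_N + P_{N-1}$. Below is a minimal Python sketch using the three-term Legendre recurrence and bisection; this is a simple stand-in for the eigenvalue-based root finders usually used in practice:

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def lgr_nodes(n):
    """The n LGR nodes: roots of P_n + P_(n-1)."""
    f = lambda x: legendre(n, x) + legendre(n - 1, x)
    roots = [-1.0]   # P_n(-1) + P_(n-1)(-1) = (-1)^n + (-1)^(n-1) = 0 for every n
    grid = [-0.999 + 1.998 * k / 4000 for k in range(4001)]
    for a, b in zip(grid, grid[1:]):
        if f(a) * f(b) < 0:              # sign change -> bisect down to the root
            for _ in range(80):
                m = 0.5 * (a + b)
                a, b = (a, m) if f(a) * f(m) <= 0 else (m, b)
            roots.append(0.5 * (a + b))
    return sorted(roots)

nodes = lgr_nodes(5)
print(len(nodes), nodes[0], nodes[-1] < 1.0)   # 5 nodes in [-1, 1)
```

Note that $\tau = -1$ is always an LGR node, since $P_N(-1) + P_{N-1}(-1) = (-1)^N + (-1)^{N-1} = 0$, which is why the left endpoint can always serve as a collocation point.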


This time, $P(t) = \frac{t}{1+11t^2}$ is approximated using LGR nodes as support points for the Lagrange interpolation. In figure 2.3, $\hat{P}(t)$ represents the resulting approximation. Figures 2.3a and 2.3b show the approximations with $N = 15$ and $N = 25$ respectively. It can be observed that increasing the number of LGR nodes from 15 to 25 results in a much better approximation, i.e. no Runge phenomenon is observed. The maximum absolute error is 0.0113 for $N = 15$ and $7.42 \times 10^{-4}$ for $N = 25$.

Figure 2.3: LGR Approximation, (a) N = 15, (b) N = 25

Before proceeding to the transcription methodology, a change of variables will be applied to the Bolza form optimal control problem defined at equations 2.1-2.6. The problem, defined on the time interval $[t_0, t_f]$, can be transformed to a new fixed time domain $[-1, 1]$ with the following affine transformation:

$$\tau = \frac{2t}{t_f - t_0} - \frac{t_f + t_0}{t_f - t_0} \tag{2.10}$$

The reason for doing so is that this form is better suited to the LGR orthogonal collocation method, since the node locations of this method are contained within the $[-1, 1)$ interval, as mentioned above.

Therefore, the Bolza form optimal control problem can be represented in the new time frame. The objective function defined at equation 2.1 is redefined as follows:

$$J = \phi[x(1), u(1), 1] + \frac{t_f - t_0}{2} \int_{-1}^{1} g(x(\tau), u(\tau), \tau)\,d\tau \tag{2.11}$$

The states' differential constraints become:

$$\dot{x}(\tau) = \frac{t_f - t_0}{2} f(x(\tau), u(\tau), \tau) \tag{2.12}$$

Boundary conditions on the states and controls are given as:

$$\varphi(x(-1), x(1), u(-1), u(1)) = 0 \tag{2.13}$$

Path constraints on the states and controls are:

$$C_L \le C(x(\tau), u(\tau), \tau) \le C_U \tag{2.14}$$

Upper and lower bounds on the states and controls are represented as:

$$x_L \le x(\tau) \le x_U, \qquad u_L \le u(\tau) \le u_U \tag{2.15}$$

Hence, the transformed Bolza form optimal control problem defined above can be represented as the following optimization problem:

$$\begin{aligned} \min_{x,u}\ & J = \phi[x(1), u(1), 1] + \frac{t_f - t_0}{2} \int_{-1}^{1} g(x(\tau), u(\tau), \tau)\,d\tau \\ \text{s.t.}\ & \dot{x}(\tau) = \frac{t_f - t_0}{2} f(x(\tau), u(\tau), \tau) \\ & \varphi(x(-1), x(1), u(-1), u(1)) = 0 \\ & C_L \le C(x(\tau), u(\tau), \tau) \le C_U \\ & x_L \le x(\tau) \le x_U \\ & u_L \le u(\tau) \le u_U \end{aligned}$$

2.2.2 Legendre-Gauss-Radau Transcription

In Legendre-Gauss-Radau transcription methods, the states' differential constraints are collocated at the LGR nodes. The LGR nodes lie on the interval $(-1, 1]$ or $[-1, 1)$, depending on which Legendre function is chosen to obtain the roots:


$$\begin{aligned} &\text{If roots of } P_N(\tau) + P_{N-1}(\tau) &\longrightarrow\ & Nodes_{LGR} = [-1, 1) \\ &\text{If roots of } P_N(\tau) - P_{N-1}(\tau) &\longrightarrow\ & Nodes_{LGR} = (-1, 1] \end{aligned} \tag{2.16}$$

On the other hand, the state parametrization is done at the LGR nodes plus the point $+1$ or $-1$, depending on which endpoint is included as a collocation point. That is, if the point $-1$ is a collocation point, then the point $+1$ will also be included for the interpolation of the states. The controls, however, are only parametrized at the LGR nodes with Lagrange interpolation. An example node distribution schematic can be seen in figure 2.4 for different numbers of nodes.

Figure 2.4: Illustration of LGR nodes (5, 15 and 25 collocation points)

As explained above, the states and controls are approximated with Lagrange interpolation as follows:

$$x(\tau) \approx \sum_{i=1}^{N+1} L^x_i(\tau)\, x(\tau_i), \qquad u(\tau) \approx \sum_{i=1}^{N} L^u_i(\tau)\, u(\tau_i) \tag{2.17}$$


where $L^x_i(\tau)$ and $L^u_i(\tau)$ are the Lagrange polynomials for the states and controls respectively:

$$L^x_i(\tau) = \prod_{\substack{j=1 \\ j \ne i}}^{N+1} \frac{\tau - \tau_j}{\tau_i - \tau_j}, \qquad L^u_i(\tau) = \prod_{\substack{j=1 \\ j \ne i}}^{N} \frac{\tau - \tau_j}{\tau_i - \tau_j} \tag{2.18}$$

Therefore, equations 2.17 and 2.18 can be used to obtain the state derivatives with respect to the scaled time $\tau$. Since $x(\tau_i)$ is a constant value at the support points, it suffices to differentiate the Lagrange basis $L^x(\tau)$:

$$\dot{x}(\tau) \approx \sum_{i=1}^{N+1} \dot{L}^x_i(\tau)\, x(\tau_i) \tag{2.19}$$

State derivatives at the LGR nodes are obtained by evaluating the derivatives of the Lagrange polynomials at the nodes. Since the node locations are the roots of the corresponding Legendre functions and are predetermined constants, the derivatives at the nodes can be represented with a constant LGR differentiation matrix $D^{LGR} \in \mathbb{R}^{N \times (N+1)}$:

$$\dot{x}(\tau_k) = \sum_{i=1}^{N+1} \dot{L}^x_i(\tau_k)\, x(\tau_i) = \sum_{i=1}^{N+1} D_{ki}\, x(\tau_i), \quad k = 1, 2, \ldots, N \tag{2.20}$$
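The differentiation matrix of equation 2.20 can be assembled from the analytic derivatives of the Lagrange basis. The sketch below is an illustrative Python version (generic support points are used instead of true LGR nodes for brevity; RSOPT would keep only the N collocation rows of the N+1 columns). It verifies the matrix by differentiating $x(\tau) = \tau^2$, which any set of four or more support points must handle exactly:

```python
def diff_matrix(support):
    """D[k][i] = derivative of the Lagrange basis L_i evaluated at support[k]."""
    n = len(support)
    D = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            if i == k:
                # diagonal: L_i'(tau_i) = sum over j != i of 1/(tau_i - tau_j)
                D[k][i] = sum(1.0 / (support[i] - support[j])
                              for j in range(n) if j != i)
            else:
                # off-diagonal: prod_{j!=i,k}(tau_k - tau_j) / prod_{j!=i}(tau_i - tau_j)
                num, den = 1.0, 1.0
                for j in range(n):
                    if j != i:
                        den *= support[i] - support[j]
                        if j != k:
                            num *= support[k] - support[j]
                D[k][i] = num / den
    return D

# Differentiating x(tau) = tau^2 exactly: (D x)[k] should equal 2 * tau_k.
pts = [-1.0, -0.5, 0.2, 1.0]
x = [t * t for t in pts]
dx = [sum(Dki * xi for Dki, xi in zip(row, x)) for row in diff_matrix(pts)]
print(dx)   # ~ [-2.0, -1.0, 0.4, 2.0]
```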

On the other hand, the continuous time integral term in the objective function defined at equation 2.11 can be approximated at the LGR nodes with Gauss-Radau quadrature as follows:

$$\int_{-1}^{1} g(x(\tau), u(\tau), \tau)\,d\tau \approx \sum_{i=1}^{N} w_i^{LGR}\, g(x(\tau_i), u(\tau_i), \tau_i) \tag{2.21}$$

where $w_i^{LGR}$ are the LGR quadrature weights for the corresponding node locations, computed as

$$w_i^{LGR} = \frac{1 - \tau_i}{N^2\, [P_{N-1}(\tau_i)]^2} \tag{2.22}$$

Hence, the Bolza form optimal control problem defined at equations 2.11-2.15 can be transformed into a finite dimensional nonlinear programming problem by combining it with equations 2.20-2.21. The resulting NLP problem is given below:

$$\begin{aligned} \min_{x,u}\ & J = \phi[x(1), u(1), 1] + \frac{t_f - t_0}{2} \sum_{i=1}^{N} w_i^{LGR}\, g(x(\tau_i), u(\tau_i), \tau_i) \\ \text{s.t.}\ & \sum_{i=1}^{N+1} D_{ki}\, x(\tau_i) - \frac{t_f - t_0}{2} f(x(\tau_k), u(\tau_k), \tau_k) = 0 \\ & \varphi(x(-1), x(1), u(-1), u(1)) = 0 \\ & C_L \le C(x(\tau_k), u(\tau_k), \tau_k) \le C_U \\ & x_L \le x(\tau_k) \le x_U, \quad u_L \le u(\tau_k) \le u_U \\ & k = 1, 2, \ldots, N \end{aligned} \tag{2.23}$$
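The collocation (defect) constraints of equation 2.23 are easy to illustrate: for the trivial dynamics $\dot{x} = u$ with constant control, the exact linear solution makes every defect vanish, because the differentiation matrix is exact for polynomials. In the Python sketch below (illustrative only; generic support points stand in for true LGR nodes, with the last support point being the non-collocated one):

```python
def diff_row(k, support):
    # k-th row of the differentiation matrix: derivatives of all Lagrange
    # bases evaluated at support[k] (cf. equation 2.20)
    n = len(support)
    row = []
    for i in range(n):
        if i == k:
            row.append(sum(1.0 / (support[i] - support[j])
                           for j in range(n) if j != i))
        else:
            num, den = 1.0, 1.0
            for j in range(n):
                if j != i:
                    den *= support[i] - support[j]
                    if j != k:
                        num *= support[k] - support[j]
            row.append(num / den)
    return row

def collocation_residual(tau, x, u, f, t0, tf):
    """Defect of equation (2.23): sum_i D_ki x_i - (tf - t0)/2 * f(x_k, u_k)
    at each collocation point k (every support point except the last)."""
    scale = 0.5 * (tf - t0)
    return [sum(d * xi for d, xi in zip(diff_row(k, tau), x))
            - scale * f(x[k], u[k])
            for k in range(len(tau) - 1)]

# Dynamics xdot = u with u = 2 on t in [0, 1]; x(t) = 2t, i.e. x(tau) = tau + 1.
tau = [-1.0, -0.6, 0.1, 0.7, 1.0]      # stand-in support points (not true LGR nodes)
x = [t + 1.0 for t in tau]             # exact linear solution in scaled time
u = [2.0] * len(tau)
res = collocation_residual(tau, x, u, lambda xk, uk: uk, 0.0, 1.0)
print(max(abs(r) for r in res))        # ~ 0.0
```

In the full NLP, these residuals appear as equality constraints handed to the solver, while the objective sum of equation 2.23 is evaluated at the same nodes.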

2.3 Multiple Segment Optimal Control Problem Formulation

The single phase, single segment optimal control problem defined at equations 2.1-2.6 can be extended to a problem containing multiple segments/mesh intervals by dividing the time interval into relatively small segments. An illustration of this segmentation can be seen in figure 2.5.

Figure 2.5: Illustration of Time Segmentation

Dividing time into smaller segments facilitates operations such as mesh refinement, in order to obtain good solutions for the problem at hand. After segmentation, the objective function of the Bolza form optimal control problem defined at equation 2.1 can be rewritten as follows:

$$J = \phi[x^{(S)}(t_f^{(S)}), u^{(S)}(t_f^{(S)})] + \sum_{s=1}^{S} \left[ \int_{t_0^{(s)}}^{t_f^{(s)}} g^{(s)}(x^{(s)}(t), u^{(s)}(t), t)\,dt \right] \tag{2.24}$$


In this objective formulation, $S$ is the total number of segments, or total number of mesh intervals, in the single phase optimal control problem. The other terms are the same as defined before.

The states' differential dynamic constraints for the multiple segment OCP are represented as:

$$\dot{x}^{(s)} = f^{(s)}(x^{(s)}(t), u^{(s)}(t), t^{(s)}), \quad s = 1, 2, \ldots, S \tag{2.25}$$

Boundary conditions on the states and controls for the multiple segment OCP are given as:

$$\varphi(x^{(s)}(t_0^{(s)}), x^{(s)}(t_f^{(s)}), u^{(s)}(t_0^{(s)}), u^{(s)}(t_f^{(s)})) = 0, \quad s = 1, 2, \ldots, S \tag{2.26}$$

Path constraints on the states and controls for the multiple segment OCP are:

$$C_L \le C^{(s)}(x^{(s)}(t), u^{(s)}(t), t) \le C_U, \quad s = 1, 2, \ldots, S \tag{2.27}$$

Upper and lower bounds on the states and controls for the multiple segment OCP are represented as:

$$x_L \le x^{(s)}(t) \le x_U, \qquad u_L \le u^{(s)}(t) \le u_U, \quad s = 1, 2, \ldots, S \tag{2.28}$$

Distinct from the single segment OCP, additional linkage constraints must also be included for the multiple segment OCP in order to satisfy state and control continuity across segment transition times. These constraints are given in the following form:

$$L_n\big[x^{(1)}(t_0^{(1)}), u^{(1)}(t_0^{(1)}), x^{(1)}(t_f^{(1)}), u^{(1)}(t_f^{(1)}), t_0^{(1)}, t_f^{(1)},\ \ldots,\ x^{(S)}(t_0^{(S)}), u^{(S)}(t_0^{(S)}), x^{(S)}(t_f^{(S)}), u^{(S)}(t_f^{(S)}), t_0^{(S)}, t_f^{(S)}\big] = 0 \tag{2.29}$$


With time segmentation, the continuous time OCP can be represented by the following optimization problem:

$$\begin{aligned} \min_{x,u}\ & J = \phi[x^{(S)}(t_f^{(S)}), u^{(S)}(t_f^{(S)})] + \sum_{s=1}^{S} \left[ \int_{t_0^{(s)}}^{t_f^{(s)}} g^{(s)}(x^{(s)}(t), u^{(s)}(t), t)\,dt \right] \\ \text{s.t.}\ & \dot{x}^{(s)} = f^{(s)}(x^{(s)}(t), u^{(s)}(t), t^{(s)}) \\ & \varphi(x^{(s)}(t_0^{(s)}), x^{(s)}(t_f^{(s)}), u^{(s)}(t_0^{(s)}), u^{(s)}(t_f^{(s)})) = 0 \\ & C_L \le C^{(s)}(x^{(s)}(t), u^{(s)}(t), t) \le C_U \\ & x_L \le x^{(s)}(t) \le x_U, \quad u_L \le u^{(s)}(t) \le u_U \\ & L_n[\,\cdot\,] = 0 \quad \text{(linkage constraints of equation 2.29)} \\ & s = 1, 2, \ldots, S \end{aligned} \tag{2.30}$$

The index convention for the time segmentation can be observed at figure 2.6.

Figure 2.6: Illustration of Time Segmentation
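The linkage constraints $L_n$ of equation 2.29 are left generic in the formulation; a common concrete choice is simple continuity of time, state and control across adjacent segment boundaries. The following Python sketch illustrates such residuals (the dict layout is invented here for illustration and is not RSOPT's data structure):

```python
def linkage_residuals(segments):
    """Continuity residuals across adjacent segment boundaries, one common
    instance of the linkage constraints L_n of equation (2.29). Each segment
    is a dict with keys t0, tf, x0, xf, u0, uf (scalar state/control here)."""
    res = []
    for left, right in zip(segments, segments[1:]):
        res.append(right["t0"] - left["tf"])   # time continuity
        res.append(right["x0"] - left["xf"])   # state continuity
        res.append(right["u0"] - left["uf"])   # control continuity
    return res

# Two segments that link up exactly: all residuals vanish.
segs = [
    {"t0": 0.0, "tf": 0.4, "x0": 0.0, "xf": 1.2, "u0": 3.0, "uf": 3.0},
    {"t0": 0.4, "tf": 1.0, "x0": 1.2, "xf": 2.0, "u0": 3.0, "uf": 0.0},
]
print(linkage_residuals(segs))   # -> [0.0, 0.0, 0.0]
```

In the assembled NLP these residuals are appended to the constraint vector, so the solver enforces them together with the per-segment collocation defects.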

2.3.1 Multiple Segment Legendre-Gauss-Radau Transcription

Similar to what is carried out for the single segment problem, the first step is to apply the affine time transformation given at equation 2.10 to each segment. The objective function is then redefined as follows:


$$J = \phi[x(\tau^{(S)} = 1), u(\tau^{(S)} = 1)] + \sum_{s=1}^{S} \left[ \frac{t_f^{(s)} - t_0^{(s)}}{2} \int_{-1}^{1} g(x(\tau^{(s)}), u(\tau^{(s)}), \tau^{(s)})\,d\tau^{(s)} \right] \tag{2.31}$$

The states' differential constraints in the transformed time frame for the multiple segment continuous OCP can be written as:

$$\dot{x}^{(s)} = \frac{t_f^{(s)} - t_0^{(s)}}{2} f^{(s)}(x^{(s)}(\tau^{(s)}), u^{(s)}(\tau^{(s)}), \tau^{(s)}) \tag{2.32}$$

Boundary conditions on the states and controls are functions of only the first and final segments, since the initial time is contained in the first segment and the final time in the final segment. Hence, the boundary condition constraint defined at equation 2.13 can be redefined as:

$$\varphi\left(x^{(1)}(\tau^{(1)} = -1),\, x^{(S)}(\tau^{(S)} = 1),\, u^{(1)}(\tau^{(1)} = -1),\, u^{(S)}(\tau^{(S)} = 1)\right) = 0 \tag{2.33}$$

Path constraints on the states and controls are:

$$C_L \le C(x^{(s)}(\tau^{(s)}), u^{(s)}(\tau^{(s)}), \tau^{(s)}) \le C_U, \quad s = 1, 2, \ldots, S \tag{2.34}$$

Upper bounds and lower bounds on states and controls are represented as; xL≤ x(s)(τ(s)) ≤ xU

uL ≤ u(s)(τ(s)) ≤ uU

(s = 1, 2, 3, ...S)

(2.35)

Linkage constraints between segment transitions are formulated as;

L_n\big[x^{(1)}(\tau_0^{(1)}), u^{(1)}(\tau_0^{(1)}), x^{(1)}(\tau_f^{(1)}), u^{(1)}(\tau_f^{(1)}), \tau_0^{(1)}, \tau_f^{(1)},\; \ldots,\; x^{(S)}(\tau_0^{(S)}), u^{(S)}(\tau_0^{(S)}), x^{(S)}(\tau_f^{(S)}), u^{(S)}(\tau_f^{(S)}), \tau_0^{(S)}, \tau_f^{(S)}\big] = 0 \tag{2.36}

Now the continuous single phase multiple segment optimal control problem is in an appropriate form for Legendre-Gauss-Radau transcription, which transforms the continuous problem into a finite dimensional NLP. As in the single segment case, the states and controls will be approximated with Lagrange polynomials.


However, for multiple segment cases an important benefit is the ability to approximate the states and controls with polynomials of different order in different segments. This allows rapidly changing dynamics to be captured with high order polynomial approximations in the intervals where they are needed. The state and control approximations can therefore be written as;

x^{(s)}(\tau^{(s)}) = \sum_{i=1}^{N^{(s)}+1} L_{x_i}^{(s)}(\tau^{(s)})\, x^{(s)}(\tau_i^{(s)}), \qquad u^{(s)}(\tau^{(s)}) = \sum_{i=1}^{N^{(s)}} L_{u_i}^{(s)}(\tau^{(s)})\, u^{(s)}(\tau_i^{(s)}) \tag{2.37}

As in the single segment case, L_x and L_u represent the Lagrange polynomials, whose formulas are given as;

L_{x_i}^{(s)}(\tau) = \prod_{\substack{j=1 \\ j \ne i}}^{N^{(s)}+1} \frac{\tau - \tau_j^{(s)}}{\tau_i^{(s)} - \tau_j^{(s)}}, \qquad L_{u_i}^{(s)}(\tau) = \prod_{\substack{j=1 \\ j \ne i}}^{N^{(s)}} \frac{\tau - \tau_j^{(s)}}{\tau_i^{(s)} - \tau_j^{(s)}} \tag{2.38}
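The Lagrange interpolation of equations 2.37-2.38 can be sketched in a few lines. The thesis implements RSOPT in Matlab, so the following Python/NumPy fragment is only an illustration of the product formula; the three node values used in the example are the 3-point LGR nodes on [-1, 1], and the function name is our own.

```python
import numpy as np

def lagrange_basis(tau_nodes, tau):
    """Evaluate every Lagrange basis polynomial L_i(tau) on the given nodes."""
    n = len(tau_nodes)
    L = np.ones(n)
    for i in range(n):
        for j in range(n):
            if j != i:
                L[i] *= (tau - tau_nodes[j]) / (tau_nodes[i] - tau_nodes[j])
    return L

# Interpolation from node values: x(tau) ~= sum_i L_i(tau) * x(tau_i)
nodes = np.array([-1.0, -0.2898979, 0.6898979])  # 3-point LGR nodes on [-1, 1]
vals = nodes**2                                   # sample x(tau) = tau^2 at the nodes
x_half = lagrange_basis(nodes, 0.5) @ vals        # exact for polynomials of degree <= 2
```

Since the basis polynomials sum to one, interpolating a constant reproduces it exactly; interpolating the quadratic above at tau = 0.5 returns 0.25.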

State derivatives with respect to the scaled time \tau^{(s)} at the LGR collocation nodes are evaluated as in the single segment transcription, by taking derivatives of the Lagrange polynomials only. The derivatives at the nodes are again represented with constant LGR differentiation matrices D^{(s)} \in \mathbb{R}^{N^{(s)} \times (N^{(s)}+1)};

\dot{x}^{(s)}(\tau_{k_s}^{(s)}) = \sum_{i=1}^{N^{(s)}+1} \dot{L}_{x_i}^{(s)}(\tau_{k_s}^{(s)})\, x^{(s)}(\tau_i^{(s)}) = \sum_{i=1}^{N^{(s)}+1} D_{k_s i}^{(s)}\, x^{(s)}(\tau_i^{(s)}), \quad k_s = 1, 2, \ldots, N^{(s)} \tag{2.39}

Moreover, the continuous integral cost for the multi-segment OCP is approximated with the LGR quadrature formula, where the corresponding quadrature weights w^{LGR} can be calculated by using equation 2.22.


\sum_{s=1}^{S} \int_{-1}^{1} g\big(x(\tau^{(s)}), u(\tau^{(s)}), \tau^{(s)}\big)\, d\tau^{(s)} \approx \sum_{s=1}^{S} \sum_{i=1}^{N^{(s)}} w_i^{(s)\,LGR}\, g^{(s)}\big(x^{(s)}(\tau_i^{(s)}), u^{(s)}(\tau_i^{(s)}), \tau_i^{(s)}\big) \tag{2.40}

Hence, the continuous single phase multi-segment OCP defined at equations 2.24-2.28 is converted into a finite dimensional NLP with the LGR transcription by using equations 2.31-2.40. The resulting NLP can be represented as follows;

\begin{aligned}
\min_{x,u}\; & J = \varphi\big[x(\tau^{(S)}=1), u(\tau^{(S)}=1)\big] + \sum_{s=1}^{S} \frac{t_f^{(s)} - t_0^{(s)}}{2} \sum_{i=1}^{N^{(s)}} w_i^{(s)\,LGR}\, g^{(s)}\big(x^{(s)}(\tau_i^{(s)}), u^{(s)}(\tau_i^{(s)}), \tau_i^{(s)}\big) \\
\text{s.t.}\; & \sum_{i=1}^{N^{(s)}+1} D_{k_s i}^{(s)}\, x^{(s)}(\tau_i^{(s)}) - \frac{t_f^{(s)} - t_0^{(s)}}{2}\, f^{(s)}\big(x^{(s)}(\tau_{k_s}^{(s)}), u^{(s)}(\tau_{k_s}^{(s)}), \tau_{k_s}^{(s)}\big) = 0 \\
& \phi\big(x^{(1)}(\tau^{(1)}=-1), x^{(S)}(\tau^{(S)}=1), u^{(1)}(\tau^{(1)}=-1), u^{(S)}(\tau^{(S)}=1)\big) = 0 \\
& C_L \le C\big(x^{(s)}(\tau^{(s)}), u^{(s)}(\tau^{(s)}), \tau^{(s)}\big) \le C_U \\
& x_L \le x^{(s)}(\tau^{(s)}) \le x_U, \quad u_L \le u^{(s)}(\tau^{(s)}) \le u_U \\
& L_n\big[x^{(1)}(\tau_0^{(1)}), u^{(1)}(\tau_0^{(1)}), \ldots, \tau_0^{(S)}, \tau_f^{(S)}\big] = 0 \\
& s = 1, 2, \ldots, S, \quad k_s = 1, 2, \ldots, N^{(s)}
\end{aligned} \tag{2.41}
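The LGR nodes and quadrature weights underlying equation 2.40 can be computed directly: the nodes are the roots of P_{N-1} + P_N (which include the endpoint -1), and the classical Radau weight formulas give the w_i. This Python/NumPy sketch is only illustrative of the quadrature rule (RSOPT itself is written in Matlab); the function name is our own.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def lgr_nodes_weights(n):
    """Legendre-Gauss-Radau nodes (endpoint -1 included) and quadrature weights.

    Nodes: roots of P_{n-1}(tau) + P_n(tau).
    Weights: w_1 = 2/n^2 and w_i = (1 - tau_i) / (n^2 * P_{n-1}(tau_i)^2).
    """
    c = np.zeros(n + 1)          # coefficients of P_{n-1} + P_n in the Legendre basis
    c[n - 1] = 1.0
    c[n] = 1.0
    tau = np.sort(leg.legroots(c))            # smallest root is tau = -1
    Pnm1 = leg.legval(tau, np.eye(n)[n - 1])  # P_{n-1} evaluated at the nodes
    w = (1.0 - tau) / (n**2 * Pnm1**2)
    w[0] = 2.0 / n**2                         # fixed-endpoint weight
    return tau, w

tau, w = lgr_nodes_weights(4)
integral = w @ tau**6   # LGR with n nodes integrates polynomials up to degree 2n-2 exactly
```

With n = 4 the rule is exact up to degree 6, so the quadrature above reproduces the integral of tau^6 over [-1, 1], which is 2/7.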

2.4 Mesh Refinement

At the beginning of the direct transcription methodology, it was mentioned that there are three steps which must be followed to solve OCPs. The first two, transforming the continuous OCP into a finite dimensional NLP and solving it with well-known solvers, have been explained step by step so far. Now we shall look at the last step, the evaluation of the solution accuracy, which leads to the mesh refinement procedure. Mesh refinement assesses the accuracy of the approximations made while transforming the OCP into a finite dimensional NLP. If the accuracy is not within the expected limits, it is inevitable to refine the approximations. That is, if the solution obtained with the current discretization isn't accurate enough, the discretization is refined and the problem is re-solved until an accurate solution is obtained [14]. It should also be noted that mesh refinement can be regarded as identical to segment refinement; however, in this section the terms mesh intervals and mesh refinement are used in order to be consistent with the literature.

Due to the increase in popularity of direct transcription methods for solving OCPs, extensive research has been carried out on mesh refinement techniques. In general, these techniques can be categorized into four groups: h-, p-, r-, and hp-methods.

• h methods : These methods are usually identified with traditional local discretization approaches such as trapezoidal, Runge-Kutta and Hermite-Simpson. The reason is that in h-methods, extra mesh intervals (segments) are added where needed in order to improve the solution accuracy, based on a relative error estimate. However, no change is applied to the degree of the approximating polynomials for states and controls.

• p methods : Unlike the h-methods, accuracy improvements are accomplished by increasing the degree of the approximating polynomials for states and controls without changing the number of mesh intervals. In fact, most of the time a single mesh interval is used for transcription purposes. For this reason p-methods are better suited to global discretization methods.

• r methods : In r refinement methods, the total number of mesh intervals and the degree of the approximating polynomials are kept constant; however, segment locations can be relocated to improve the solution accuracy [17]. These methods are used when a low computational load is required throughout the refinement process, for instance in real-time optimization applications.


• hp methods : These methods can be regarded as the combination of h-methods and p-methods. They allow a change in the number of mesh intervals in the problem as well as in the degree of the approximating polynomials. Therefore, it is practical to use hp mesh refinement techniques for local, global and hybrid transcription methods.

All of the methods explained above have their own pros and cons. However, in RSOPT it is preferred to use hp mesh refinement methods, since recent research has shown that hp-methods surpass the others in terms of effectiveness [15]. The reasons can be explained by comparison. With h-methods, it is common to need an extremely fine mesh in order to obtain a high-accuracy solution. With p-methods, large degree polynomials are usually required to achieve accurate solutions [15]. Both result in high dimensional nonlinear programming problems, which cause computational issues. As for r-methods, although they are said to be computationally more effective than the others, it is hard to obtain sufficient accuracy with a fixed number of mesh intervals when solving optimal control problems with fast changing dynamics. Therefore, it is convenient to use hp methods for mesh refinement, since they overcome all of the issues mentioned here by allowing both the number of mesh intervals and the degree of the approximating polynomials to vary.

As mentioned earlier, a considerable amount of research has been carried out on mesh refinement methods, especially hp-methods. For this reason, many different hp-method algorithms exist in the literature (see [15]-[18]-[19]-[20]-[21]). The distinctions between them arise from the characteristics of the relative error estimation and of the refinement strategy based on that estimation, which are the two key parts of the mesh refinement procedure. In this thesis, the simple and efficient adaptive hp-mesh refinement algorithm given at [19] is implemented for RSOPT. The authors also refer to this algorithm as a ph-method (p-then-h strategy), since it first tries to achieve adequate accuracy by adjusting the degree of the approximating polynomials and, if that fails, it arranges the mesh intervals accordingly. In the coming sections, the method derived and explained in detail by the authors is summarized for the sake of completeness.

2.4.1 Relative Error Estimation

In this section, we summarize how the relative error estimation is performed in the adaptive hp-mesh refinement method defined in [19]. The authors derive the relative error estimate in each mesh interval by comparing the current Lagrange state approximation with a more accurate state approximation, obtained by using higher order Lagrange polynomials for the states. For problems with smooth solutions, increasing the degree of the Lagrange polynomials, which can also be thought of as increasing the number of LGR nodes, yields more accurate results.

The first step of the relative error estimation, after solving the finite dimensional NLP defined at equation 2.41, is to increase the number of collocation nodes in each mesh interval. The authors prefer to increase it by one;

M^{(s)} = N^{(s)} + 1 \tag{2.42}

Here, M^{(s)} represents the new number of nodes in each mesh interval. It should be noted that the new node locations are also computed by solving the corresponding Legendre polynomial equation given at equation 2.16. Then, the states and controls at these new node locations are approximated by using the state and control solutions attained from the NLP at the former nodes and the Lagrange interpolation formulas given at equations 2.37-2.38. After that, one can construct a more accurate state approximation by using the following Legendre-Radau integration formula;

\hat{x}^{(s)}(\tilde{\tau}_j^{(s)}) = x^{(s)}(\tilde{\tau}_1^{(s)}) + \frac{t_f^{(s)} - t_0^{(s)}}{2} \sum_{l=1}^{M^{(s)}+1} I_{jl}^{(s)}\, f\big(x^{(s)}(\tilde{\tau}_l^{(s)}), u^{(s)}(\tilde{\tau}_l^{(s)}), \tilde{\tau}_l^{(s)}\big), \quad s = 1, \ldots, S, \quad j = 2, \ldots, M^{(s)}+1 \tag{2.43}

where \hat{x} and \tilde{\tau} represent the more accurate state approximation and the new node locations, respectively, and I denotes the M^{(s)} \times M^{(s)} Radau integration matrix. Interested readers can find detailed information about these formulas and their derivation in [19]. After attaining the more accurate approximation for the states, the last step is to evaluate the relative error as follows;


e_i^{(s)}(\tilde{\tau}_l^{(s)}) = \frac{\big|\hat{x}_i^{(s)}(\tilde{\tau}_l^{(s)}) - x_i^{(s)}(\tilde{\tau}_l^{(s)})\big|}{1 + \max_{j \in [1, 2, \ldots, M^{(s)}+1]} \big|x_i^{(s)}(\tilde{\tau}_j^{(s)})\big|}, \qquad e_{\max}^{(s)} = \max_{i,l}\, e_i^{(s)}(\tilde{\tau}_l^{(s)})
\quad s = 1, \ldots, S, \quad l = 1, \ldots, M^{(s)}+1, \quad i = 1, \ldots, n_x \tag{2.44}

where e_{\max}^{(s)} represents the maximum relative error estimate over all states in each mesh interval.
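The normalized error of equation 2.44 can be sketched compactly with array operations. This Python/NumPy fragment is only an illustration (RSOPT is written in Matlab), and the sample numbers are hypothetical values for one mesh interval.

```python
import numpy as np

def max_relative_error(x_hat, x):
    """Relative error per equation 2.44 for one mesh interval.

    x_hat, x : arrays of shape (M+1, n_x) holding the higher-order and the
    current state approximations sampled at the new nodes tau-tilde.
    """
    denom = 1.0 + np.abs(x).max(axis=0)   # per-state normalization, shape (n_x,)
    e = np.abs(x_hat - x) / denom         # elementwise relative errors
    return e.max()                        # e_max over all states and nodes

# Hypothetical values: 2 states sampled at 4 nodes, with small perturbations
x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
x_hat = x + np.array([[0.0, 0.0], [0.03, 0.41], [0.01, 0.2], [0.0, 0.0]])
```

The `1 +` term in the denominator keeps the estimate well defined for states that stay near zero.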

2.4.2 Refinement Strategy

After the relative error estimation is carried out for each mesh interval, the next step in finalizing the mesh refinement procedure is to establish the refinement strategy according to the error distribution over the mesh intervals. For this purpose, as mentioned earlier, the authors propose a p-then-h strategy for refining the current discretization, resulting in the following pseudo-algorithm;

Algorithm 1: Refinement Strategy

for each segment s = 1, ..., S:
    if e_max^(s) < epsilon:
        keep the current discretization
    else:
        P^(s)  = ceil( log_{N^(s)}( e_max^(s) / epsilon ) )
        N~^(s) = N^(s) + P^(s)
        if N~^(s) < N_max:
            solve the problem with the refined discretization
        else:
            B^(s) = max( ceil( N~^(s) / N_min ), 2 )
            divide the interval into B^(s) subintervals
            solve the problem with the refined discretization

In Algorithm 1, e_max^(s) denotes the relative error information and epsilon is the user-defined error tolerance. P^(s) represents the additional number of nodes needed in that specific mesh interval to achieve this tolerance, and B^(s) is the number of subintervals to be created in a mesh interval. N_min and N_max represent the minimum and maximum allowable numbers of nodes in a mesh interval. From the pseudo-algorithm, it is easy to see why the authors call this a p-then-h strategy: mesh subdivisions are only conducted when the number of needed nodes exceeds the maximum. It can therefore be concluded that this algorithm requires three user-specified values, namely epsilon, N_max and N_min. Again, for further information, see [19].
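The p-then-h decision for a single mesh interval can be sketched directly from Algorithm 1. This Python fragment is a schematic of the rule described above (the ceil operators are our reading of the garbled brackets in the source), not RSOPT's Matlab implementation.

```python
import math

def refine_interval(e_max, N, eps, N_min, N_max):
    """p-then-h decision for one mesh interval (sketch of Algorithm 1).

    Returns ("keep", N), ("p", new node count), or ("h", number of subintervals).
    """
    if e_max < eps:
        return ("keep", N)                       # accuracy already met
    P = math.ceil(math.log(e_max / eps, N))      # extra nodes needed (p step)
    N_new = N + P
    if N_new < N_max:
        return ("p", N_new)                      # raise polynomial degree
    B = max(math.ceil(N_new / N_min), 2)         # otherwise split (h step)
    return ("h", B)
```

For example, with N = 4, eps = 1e-6 and e_max = 1e-3, the rule requests five extra nodes; if that would exceed N_max, the interval is split instead.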

2.5 RSOPT External Packages

In RSOPT, two external open source packages are used to perform required tasks.

2.5.1 AdiGator

AdiGator is an open source automatic differentiation package written in Matlab. It transforms user defined mathematical function files into derivative function files by using forward mode algorithmic differentiation [22]. This package is used to produce the Hessian and Jacobian files of the NLP transcribed from the OCP, which significantly shortens the time required for the NLP to converge. AdiGator also enables the user to create vectorized derivative files from the sparsity pattern of the problem; hence, it isn't necessary to create derivative files repeatedly after mesh refinements.
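Forward mode algorithmic differentiation, the principle behind AdiGator's derivative files, can be illustrated with a minimal dual-number class: each value carries its derivative along with it, and arithmetic rules propagate both. This Python sketch is purely conceptual and bears no relation to AdiGator's actual source-transformation API.

```python
class Dual:
    """Minimal forward-mode AD value: carries f(x) and f'(x) together."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule propagates the derivative alongside the value
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

x = Dual(2.0, 1.0)                 # seed dx/dx = 1
y = f(x)                           # y.val = 17.0, y.dot = 14.0
```

Evaluating f once with a seeded dual number yields both f(2) = 17 and f'(2) = 14, without symbolic manipulation or finite differences.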

2.5.2 IPOPT

IPOPT is a well-known open source solver for general large-scale nonlinear programming problems, originally written in C++ [23]; its Matlab interface is also available online. It employs an interior-point algorithm to solve problems. In RSOPT, IPOPT is used to solve the NLPs transcribed from OCPs, as defined at equation 2.41.

2.6 Implementation Details

In this section, some implementation details of RSOPT in Matlab are discussed and explained. For convenience, we start with single segment optimal control problems. The steps followed are then generalized to multiple segment problems with slight modifications.

2.6.1 Single Segment OCP

The resulting NLP for a single segment problem after transcription is defined at equation 2.64, and it will be our starting point.

Because Matlab is a strong programming language for matrix calculations, it is reasonable to avoid loops and perform calculations through matrix operations. For this reason, in order to facilitate the computations and store the state trajectories, the following matrix X is defined.

X = \begin{bmatrix}
x_{1,1} & x_{2,1} & x_{3,1} & \ldots & x_{n_x,1} \\
x_{1,2} & x_{2,2} & x_{3,2} & \ldots & x_{n_x,2} \\
x_{1,3} & x_{2,3} & x_{3,3} & \ldots & x_{n_x,3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{1,N+1} & x_{2,N+1} & x_{3,N+1} & \ldots & x_{n_x,N+1}
\end{bmatrix} \tag{2.45}

As mentioned earlier, although the states are only collocated at the LGR nodes, they are approximated at the nodes plus one extra location. That is why X \in \mathbb{R}^{(N+1) \times n_x}, where n_x is the total number of states. The controls, on the other hand, are only approximated at the LGR nodes, so it makes sense to define the following matrix U \in \mathbb{R}^{N \times n_u}, where n_u is the total number of controls.

U = \begin{bmatrix}
u_{1,1} & u_{2,1} & u_{3,1} & \ldots & u_{n_u,1} \\
u_{1,2} & u_{2,2} & u_{3,2} & \ldots & u_{n_u,2} \\
u_{1,3} & u_{2,3} & u_{3,3} & \ldots & u_{n_u,3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_{1,N} & u_{2,N} & u_{3,N} & \ldots & u_{n_u,N}
\end{bmatrix} \tag{2.46}

It is worth mentioning that the subscripts (\cdot)_{n,t} denote the state/control index and the node location, respectively.

The decision variable of the NLP is created by stacking the states, controls, initial time and final time into a column vector. After the definition of the matrices X and U, the decision variable is denoted with a column vector Z \in \mathbb{R}^{(N+1)n_x + N n_u + 2};

Z = \begin{bmatrix} X(:) \\ U(:) \\ t_0 \\ t_f \end{bmatrix} \tag{2.47}


Here, (\cdot)(:) is Matlab notation for the column stacking operation. When the differential constraints are taken into account, the right hand side dynamics, previously denoted f, can also be computed in matrix form at the collocation nodes as F \in \mathbb{R}^{N \times n_x};

F = \begin{bmatrix}
f_{1,1}(X(1,:), U(1,:), \tau_1) & \ldots & f_{n_x,1}(X(1,:), U(1,:), \tau_1) \\
f_{1,2}(X(2,:), U(2,:), \tau_2) & \ldots & f_{n_x,2}(X(2,:), U(2,:), \tau_2) \\
\vdots & \ddots & \vdots \\
f_{1,N}(X(N,:), U(N,:), \tau_N) & \ldots & f_{n_x,N}(X(N,:), U(N,:), \tau_N)
\end{bmatrix}
= \begin{bmatrix}
\dot{x}_{1,1} & \dot{x}_{2,1} & \ldots & \dot{x}_{n_x,1} \\
\dot{x}_{1,2} & \dot{x}_{2,2} & \ldots & \dot{x}_{n_x,2} \\
\vdots & \vdots & \ddots & \vdots \\
\dot{x}_{1,N} & \dot{x}_{2,N} & \ldots & \dot{x}_{n_x,N}
\end{bmatrix} \tag{2.48}

In the F matrix, the notation (\cdot)(R,:) is Matlab pseudo-code indicating the R-th row of the matrix (\cdot).

After defining the X and U matrices, the differential state constraints, also called differential defects at the nodes in the literature, can be expressed as follows with \Delta \in \mathbb{R}^{N \times n_x};

\Delta = D^{LGR} X - \frac{t_f - t_0}{2}\, F = 0 \tag{2.49}
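The defect computation of equation 2.49 is a single matrix expression. This Python/NumPy sketch illustrates it (RSOPT implements this in Matlab); the small D matrix in the example is a first-order-difference stand-in, not a real LGR differentiation matrix.

```python
import numpy as np

def lgr_defects(D, X, F, t0, tf):
    """Differential defects Delta = D X - (tf - t0)/2 * F (equation 2.49).

    D : (N, N+1) differentiation matrix
    X : (N+1, n_x) states at the N collocation nodes plus the endpoint
    F : (N, n_x) dynamics evaluated at the N collocation nodes
    """
    return D @ X - 0.5 * (tf - t0) * F

# Hypothetical stand-in for D (NOT a real LGR matrix, just a forward difference):
D = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0]])
X = np.array([[0.0], [1.0], [2.0]])     # state grows with slope 1 in tau
F = np.ones((2, 1))                     # dynamics consistent with that slope
delta = lgr_defects(D, X, F, 0.0, 2.0)  # zero residual: the defects are satisfied
```

A converged NLP iterate drives every entry of this residual matrix to (numerical) zero.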

The boundary conditions of the NLP are assumed to be linear and are denoted with a column vector \phi \in \mathbb{R}^{n_\phi}, where n_\phi denotes the number of boundary conditions.

\phi(Z) = A_\phi Z = 0 \tag{2.50}

In the equation above, A_\phi \in \mathbb{R}^{n_\phi \times ((N+1)n_x + N n_u + 2)}. One could argue that boundary conditions might be inequality constraints; however, in RSOPT they are assumed to be equalities. Inequality boundary conditions can still be implemented, not as boundary conditions but as bound constraints on the corresponding states or controls.

Similarly, the path constraints of the NLP are also assumed to be linear and are denoted with a column vector C \in \mathbb{R}^{n_c}, where n_c is the total number of path constraints.

C_L \le C(Z) = A_C Z \le C_U \tag{2.51}


The bound constraints are also adapted accordingly after defining the decision variable Z of the NLP given at equation 2.47.

B_{Z_L} \le Z \le B_{Z_U} \tag{2.52}

where B_{Z_L}, B_{Z_U} \in \mathbb{R}^{(N+1)n_x + N n_u + 2}.

Now, all the constraints of the NLP given at equation 2.64 except the bound constraints are grouped together into a column vector. The bound constraints are not included because the NLP solver IPOPT requires them to be defined separately from the others.

G_l = \begin{bmatrix} 0_{n_x N} \\ 0_{n_\phi} \\ C_L \end{bmatrix} \le G = \begin{bmatrix} \Delta(:) \\ \phi \\ C \end{bmatrix} \le G_u = \begin{bmatrix} 0_{n_x N} \\ 0_{n_\phi} \\ C_U \end{bmatrix} \tag{2.53}

In equation 2.53, 0_{(\cdot)} indicates a zero vector with length (\cdot).

By following equations 2.45-2.53, the final form of the NLP defined at equation 2.64 can be seen below; it is also the way it is implemented in RSOPT.

\begin{aligned}
\min_Z\; & J = \varphi\big[X(N+1,:), U(N,:), 1\big] + \frac{t_f - t_0}{2}\, w^{LGR}\, g(X, U, \tau) \\
\text{s.t.}\; & G_L \le G(Z) \le G_U \\
& B_{Z_L} \le Z \le B_{Z_U}
\end{aligned} \tag{2.54}
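The stacking that forms Z in equation 2.47 (and its inverse, needed when evaluating the constraints) can be sketched as follows. This is a Python/NumPy illustration of the Matlab column-stacking convention described in the text; the function names and the small sizes in the example are our own.

```python
import numpy as np

def pack_decision_variable(X, U, t0, tf):
    """Stack Z = [X(:); U(:); t0; tf] using column-major order like Matlab's (:)."""
    return np.concatenate([X.flatten(order="F"), U.flatten(order="F"), [t0, tf]])

def unpack_decision_variable(Z, N, n_x, n_u):
    """Recover X, U, t0, tf from the stacked vector."""
    nX = (N + 1) * n_x
    X = Z[:nX].reshape((N + 1, n_x), order="F")
    U = Z[nX:nX + N * n_u].reshape((N, n_u), order="F")
    return X, U, Z[-2], Z[-1]

# Round trip with hypothetical sizes N = 3, n_x = 2, n_u = 1:
X = np.arange(8, dtype=float).reshape(4, 2)
U = np.arange(3, dtype=float).reshape(3, 1)
Z = pack_decision_variable(X, U, 0.0, 10.0)  # length (N+1)*n_x + N*n_u + 2 = 13
```

The column-major (`order="F"`) flattening matters: it matches Matlab's `(:)` operator, so the Jacobian sparsity pattern lines up with the text's ordering.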

2.6.2 Multiple Segment OCP

The implementation details for multiple segment problems are similar to those given for single segment problems, since the NLPs obtained after transcription have a similar form. Hence, the same steps are followed. The main difference is the linkage constraints that multiple segment OCPs have. As mentioned before, linkage constraints exist in multiple segment OCPs due to the continuity of the states between segment transitions. However, it is possible to eliminate these constraints by using the same variables at the transition points. That is, if the same variables are used both at the end of segment s and at the beginning of segment s + 1, then no extra linkage constraints need to be added to the problem at hand. In order to achieve this, and also to facilitate the computations, the following state trajectory matrix is created.

X = \begin{bmatrix}
x_{1,1}^{1} & x_{2,1}^{1} & \ldots & x_{n_x,1}^{1} \\
x_{1,2}^{1} & x_{2,2}^{1} & \ldots & x_{n_x,2}^{1} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,(N_1+1)}^{1} = x_{1,1}^{2} & x_{2,(N_1+1)}^{1} = x_{2,1}^{2} & \ldots & x_{n_x,(N_1+1)}^{1} = x_{n_x,1}^{2} \\
x_{1,2}^{2} & x_{2,2}^{2} & \ldots & x_{n_x,2}^{2} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,(N_2+1)}^{2} = x_{1,1}^{3} & x_{2,(N_2+1)}^{2} = x_{2,1}^{3} & \ldots & x_{n_x,(N_2+1)}^{2} = x_{n_x,1}^{3} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,(N_{S-1}+1)}^{S-1} = x_{1,1}^{S} & x_{2,(N_{S-1}+1)}^{S-1} = x_{2,1}^{S} & \ldots & x_{n_x,(N_{S-1}+1)}^{S-1} = x_{n_x,1}^{S} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,N_S+1}^{S} & x_{2,N_S+1}^{S} & \ldots & x_{n_x,N_S+1}^{S}
\end{bmatrix} \tag{2.55}

where X \in \mathbb{R}^{(\sum_{s=1}^{S} N^{(s)} + 1) \times n_x}. In this matrix, the superscript s of (\cdot)_{n,t}^{(s)} denotes the segment, and the subscripts n, t indicate the state and node location, as in the single segment case. Observing this matrix, one can notice that the same variable is defined both for the end of each segment and for the beginning of the following segment.

Similarly to the single segment case, the following matrix U is also defined in RSOPT to store the control trajectories.

U = \begin{bmatrix}
u_{1,1}^{1} & u_{2,1}^{1} & u_{3,1}^{1} & \ldots & u_{n_u,1}^{1} \\
u_{1,2}^{1} & u_{2,2}^{1} & u_{3,2}^{1} & \ldots & u_{n_u,2}^{1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_{1,N^{(1)}}^{1} & u_{2,N^{(1)}}^{1} & u_{3,N^{(1)}}^{1} & \ldots & u_{n_u,N^{(1)}}^{1} \\
u_{1,1}^{2} & u_{2,1}^{2} & u_{3,1}^{2} & \ldots & u_{n_u,1}^{2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_{1,N^{(2)}}^{2} & u_{2,N^{(2)}}^{2} & u_{3,N^{(2)}}^{2} & \ldots & u_{n_u,N^{(2)}}^{2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_{1,1}^{S} & u_{2,1}^{S} & u_{3,1}^{S} & \ldots & u_{n_u,1}^{S} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_{1,N^{(S)}}^{S} & u_{2,N^{(S)}}^{S} & u_{3,N^{(S)}}^{S} & \ldots & u_{n_u,N^{(S)}}^{S}
\end{bmatrix} \tag{2.56}

where U \in \mathbb{R}^{(\sum_{s=1}^{S} N^{(s)}) \times n_u}. In this matrix, the superscripts denote segment numbers and the subscripts n, t denote the control and node locations. The decision variable of the multi segment NLP is created by stacking the states, controls, initial time and final time into a column vector Z with length (\sum_{s=1}^{S} N^{(s)})(n_x + n_u) + n_x + 2.

Z = \begin{bmatrix} X(:) \\ U(:) \\ t_0 \\ t_f \end{bmatrix} \tag{2.57}

Here, (\cdot)(:) again denotes Matlab's column stacking operation. It should also be noted that the initial time is contained in the first segment and the final time in the last segment of the problem.

Analogously, the right hand side dynamics are also computed in matrix form at collocation nodes as in the following form.

F = \begin{bmatrix}
f_{1,1}^{(1)}(X(1,:), U(1,:), \tau_1^{(1)}) & \ldots & f_{n_x,1}^{(1)}(X(1,:), U(1,:), \tau_1^{(1)}) \\
\vdots & \ddots & \vdots \\
f_{1,N^{(1)}}^{(1)}(X(N^{(1)},:), U(N^{(1)},:), \tau_{N^{(1)}}^{(1)}) & \ldots & f_{n_x,N^{(1)}}^{(1)}(X(N^{(1)},:), U(N^{(1)},:), \tau_{N^{(1)}}^{(1)}) \\
f_{1,N^{(1)}+1}^{(2)}(X(N^{(1)}+1,:), U(N^{(1)}+1,:), \tau_1^{(2)}) & \ldots & f_{n_x,N^{(1)}+1}^{(2)}(X(N^{(1)}+1,:), U(N^{(1)}+1,:), \tau_1^{(2)}) \\
\vdots & \ddots & \vdots \\
f_{1,\sum_{s=1}^{S} N^{(s)}}^{(S)}(X(\text{end}-1,:), U(\text{end},:), \tau_{N^{(S)}}^{(S)}) & \ldots & f_{n_x,\sum_{s=1}^{S} N^{(s)}}^{(S)}(X(\text{end}-1,:), U(\text{end},:), \tau_{N^{(S)}}^{(S)})
\end{bmatrix}
= \begin{bmatrix}
\dot{x}_{1,1}^{(1)} & \ldots & \dot{x}_{n_x,1}^{(1)} \\
\vdots & \ddots & \vdots \\
\dot{x}_{1,N^{(1)}}^{(1)} & \ldots & \dot{x}_{n_x,N^{(1)}}^{(1)} \\
\dot{x}_{1,1}^{(2)} & \ldots & \dot{x}_{n_x,1}^{(2)} \\
\vdots & \ddots & \vdots \\
\dot{x}_{1,N^{(S)}}^{(S)} & \ldots & \dot{x}_{n_x,N^{(S)}}^{(S)}
\end{bmatrix} \tag{2.58}

In this matrix, the subscripts and superscripts are identical to the previous ones. The notation (\cdot)(end,:) is Matlab pseudo-code taking the last row of the corresponding matrix, and (\cdot)(end-1,:) means one row before the last.

Before proceeding to the compact, implemented form of the differential defect constraints, it is necessary to define some additional matrices for completeness. The first one is the composite LGR differentiation matrix D_{compact}^{LGR} \in \mathbb{R}^{M_1 \times M_2}, where M_1 = \sum_{s=1}^{S} N^{(s)} and M_2 = (\sum_{s=1}^{S} N^{(s)}) + 1. Its structure can be seen below.

D_{compact}^{LGR} =
\begin{bmatrix}
D^{(1)} &         &        &         \\
        & D^{(2)} &        &         \\
        &         & \ddots &         \\
        &         &        & D^{(S)}
\end{bmatrix}

When the structure of the compact differentiation matrix is observed, it consists of blocks, which are the LGR differentiation matrices for each mesh interval. They are located one column before the diagonal due to the use of the same state variables at segment transitions. In addition, all the blocks are matrices of size N^{(s)} by (N^{(s)} + 1).
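The one-column overlap between consecutive blocks can be made concrete with a small assembly routine. This Python/NumPy sketch only illustrates the block placement described above (RSOPT is written in Matlab), and the all-ones blocks in the example are hypothetical placeholders for real LGR differentiation matrices.

```python
import numpy as np

def compose_lgr_diff(blocks):
    """Assemble the composite LGR differentiation matrix from per-segment blocks.

    Each block is N_s x (N_s + 1); consecutive blocks share one column, so the
    result is (sum N_s) x (sum N_s + 1).
    """
    M1 = sum(b.shape[0] for b in blocks)
    D = np.zeros((M1, M1 + 1))
    row = col = 0
    for b in blocks:
        n = b.shape[0]
        D[row:row + n, col:col + n + 1] = b
        row += n
        col += n   # advance by N_s, not N_s + 1: one-column overlap
    return D

# Two hypothetical segments with N = 2 each (2x3 placeholder blocks):
b = np.ones((2, 3))
D = compose_lgr_diff([b, b])   # shape (4, 5); column 2 is shared
```

The shared column is exactly the state variable that sits at both the end of one segment and the start of the next, which is what removes the explicit linkage constraints.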

The last definition before the differential defect constraints is the compact time difference matrix T \in \mathbb{R}^{T_1 \times T_2}, where T_1 = T_2 = \sum_{s=1}^{S} N^{(s)}. It is a diagonal matrix in which the difference t_f^{(s)} - t_0^{(s)} is repeated N^{(s)} times for each segment:

T = \mathrm{diag}\Big(\underbrace{t_f^{(1)}-t_0^{(1)}, \ldots, t_f^{(1)}-t_0^{(1)}}_{N^{(1)}\ \text{times}},\; \underbrace{t_f^{(2)}-t_0^{(2)}, \ldots, t_f^{(2)}-t_0^{(2)}}_{N^{(2)}\ \text{times}},\; \ldots,\; \underbrace{t_f^{(S)}-t_0^{(S)}, \ldots, t_f^{(S)}-t_0^{(S)}}_{N^{(S)}\ \text{times}}\Big)

Hence, by using all the definitions given above, the differential constraints defined at equation 2.41 are rewritten in the new compact form;

\Delta = D_{compact}^{LGR} X - \frac{1}{2}\, T F = 0 \tag{2.59}

where \Delta \in \mathbb{R}^{(\sum_{s=1}^{S} N^{(s)}) \times n_x}.

The assumptions made for single segment OCPs are also valid in this case. Identical steps are followed for the boundary condition constraints, path constraints and bound constraints, and the same notation is used for ease of reading. The only distinction is their sizes, which are presented after each formulation.

The boundary conditions of the NLP are assumed to be linear and are denoted with a column vector \phi \in \mathbb{R}^{n_\phi}, where n_\phi denotes the number of boundary conditions.

\phi(Z) = A_\phi Z = 0 \tag{2.60}

Here, A_\phi \in \mathbb{R}^{n_\phi \times A_1}, where A_1 = (\sum_{s=1}^{S} N^{(s)})(n_x + n_u) + n_x + 2, which is equal to the length of Z.

The path constraints of the NLP are also assumed to be linear and are represented with a column vector C \in \mathbb{R}^{n_c}, where n_c is the total number of path constraints.

C_L \le C(Z) = A_C Z \le C_U \tag{2.61}

where A_C \in \mathbb{R}^{n_c \times A_1} and C_L, C_U \in \mathbb{R}^{n_c}; A_1 is the same as defined above.

The bound constraints are also adapted accordingly after defining the decision variable Z of the multiple segment NLP given at equation 2.57.

B_{Z_L} \le Z \le B_{Z_U} \tag{2.62}

where the lengths of B_{Z_L} and B_{Z_U} are equal to the length of Z, given above as A_1.

Now, the last step is identical to the single segment case. All the constraints except the bound constraints are grouped together into a column vector.

G_l = \begin{bmatrix} 0_{(\sum_{s=1}^{S} N^{(s)})\, n_x} \\ 0_{n_\phi} \\ C_L \end{bmatrix} \le G = \begin{bmatrix} \Delta(:) \\ \phi \\ C \end{bmatrix} \le G_u = \begin{bmatrix} 0_{(\sum_{s=1}^{S} N^{(s)})\, n_x} \\ 0_{n_\phi} \\ C_U \end{bmatrix} \tag{2.63}

In equation 2.63, 0_{(\cdot)} indicates a zero vector with length (\cdot).

Hence, by combining all the definitions and results obtained from equations 2.55-2.63, the final form of the NLP defined at equation 2.41 is;

\begin{aligned}
\min_Z\; & J = \varphi\big(X(\text{end},:), U(\text{end},:), \tau(\text{end})\big) + \sum_{s=1}^{S} \frac{t_f^{(s)} - t_0^{(s)}}{2}\, w^{(s)\,LGR}\, g\big(X^{(s)}, U^{(s)}, \tau^{(s)}\big) \\
\text{s.t.}\; & G_L \le G(Z) \le G_U \\
& B_{Z_L} \le Z \le B_{Z_U}
\end{aligned} \tag{2.64}

2.6.3 Constraint Jacobian Computation

In order to accelerate the convergence of the NLP, it is crucial to supply constraint Jacobian information to the solver. Since the decision variable of the NLP is defined as a column stacked vector of states, controls, initial time and final time, Z given at equation 2.57, the Jacobian of the constraints corresponds to the gradient information with respect to the variable Z. It must also be noted that IPOPT, the NLP solver used in RSOPT, requires the gradient of all constraints except the bound constraints. Hence, the Jacobian can be computed as follows;


\nabla_Z G = \begin{bmatrix} \nabla_Z \Delta(:) \\ \nabla_Z \phi \\ \nabla_Z C \end{bmatrix} \tag{2.65}

Since the boundary constraints and path constraints were assumed to be linear, we have;

\nabla_Z \phi = A_\phi, \qquad \nabla_Z C = A_C \tag{2.66}

where A_\phi and A_C are constant matrices. The only part of the Jacobian matrix that changes over the NLP iterations is the gradient of the differential defects. In order to evaluate this part of the Jacobian, the gradient operator \nabla_Z is expanded as follows;

\nabla_Z = \big[\nabla_x \;\; \nabla_u \;\; \nabla_t\big] \tag{2.67}

where \nabla_t contains both t_0 and t_f. When the differential defects given at equation 2.59 are taken into account, the gradients with respect to x, u and t can be computed as follows;

\nabla_x \Delta = D_{compact}^{LGR} \frac{\partial X}{\partial x} - \frac{1}{2}\, T \frac{\partial F}{\partial x}, \qquad
\nabla_u \Delta = -\frac{1}{2}\, T \frac{\partial F}{\partial u}, \qquad
\nabla_t \Delta = -\frac{1}{2}\, \frac{\partial T}{\partial t}\, F \tag{2.68}

Considering equation 2.68, one can notice that the only non-constant terms in these expressions are the gradients of the right hand side dynamics F. Most of the time this function contains highly nonlinear dynamics, so it isn't easy to obtain exact gradients algebraically. For this reason, the automatic differentiation package AdiGator, introduced in the related section of this thesis, is used to obtain these parts of the gradients. After obtaining the function files which compute the gradient of F with respect to x and u with the help of the AdiGator package, the results are combined using linear algebra in order to supply the Jacobian of the constraints, in the form of equation 2.65, to the NLP solver IPOPT.

2.6.4 Hessian of Lagrangian Computation

Supplying the Hessian of the Lagrangian to the NLP solver also speeds up convergence; therefore, it is also taken into consideration.


The computations are carried out in a similar way to the Jacobian computation. However, due to the time restrictions of this thesis, Hessian computations are only implemented for specific types of optimal control problems in RSOPT, namely those with a linear objective function and no integral cost. In addition, it is assumed that the boundary constraints and path constraints are linear.

\mathcal{L} = \varphi(Z) + \lambda\, G(Z) = \varphi(Z) + \begin{bmatrix} \lambda_\Delta & \lambda_\phi & \lambda_C \end{bmatrix} \begin{bmatrix} \Delta(Z)(:) \\ \phi(Z) \\ C(Z) \end{bmatrix} \tag{2.69}

where \mathcal{L} represents the Lagrangian of the NLP and \lambda denotes the Lagrange multiplier vector. The Hessian of the Lagrangian is then computed as follows;

\nabla_{ZZ}^2 \mathcal{L} = \nabla_{ZZ}^2 \varphi(Z) + \begin{bmatrix} \lambda_\Delta & \lambda_\phi & \lambda_C \end{bmatrix} \nabla_{ZZ}^2 \begin{bmatrix} \Delta(Z)(:) \\ \phi(Z) \\ C(Z) \end{bmatrix} \tag{2.70}

The Hessian of the Lagrangian \nabla_{ZZ}^2 \mathcal{L} is a square matrix whose dimension equals the length of the decision variable.

The second derivatives of all linear functions are equal to zero. Therefore, the only contribution to the Hessian of the Lagrangian comes from the differential defect constraints.

\nabla_{ZZ}^2 \varphi(Z) = 0, \qquad \nabla_{ZZ}^2 \phi(Z) = 0, \qquad \nabla_{ZZ}^2 C(Z) = 0 \tag{2.71}

The operator \nabla_{ZZ}^2 can be written as follows;

\nabla_{ZZ}^2 = \begin{bmatrix}
\nabla_{xx}^2 & \nabla_{ux}^2 & \nabla_{tx}^2 \\
\nabla_{ux}^2 & \nabla_{uu}^2 & \nabla_{tu}^2 \\
\nabla_{xt}^2 & \nabla_{ut}^2 & \nabla_{tt}^2
\end{bmatrix} \tag{2.72}

Then, the second derivatives of the differential defect constraints can be computed as follows;


\begin{aligned}
\nabla_{xx}^2 \Delta &= -\frac{1}{2}\, T \frac{\partial^2 F}{\partial x \partial x}, &
\nabla_{ux}^2 \Delta = \nabla_{xu}^2 \Delta &= -\frac{1}{2}\, T \frac{\partial^2 F}{\partial x \partial u}, &
\nabla_{tx}^2 \Delta = \nabla_{xt}^2 \Delta &= -\frac{1}{2}\, \frac{\partial T}{\partial t} \frac{\partial F}{\partial x}, \\
\nabla_{uu}^2 \Delta &= -\frac{1}{2}\, T \frac{\partial^2 F}{\partial u \partial u}, &
\nabla_{tu}^2 \Delta = \nabla_{ut}^2 \Delta &= -\frac{1}{2}\, \frac{\partial T}{\partial t} \frac{\partial F}{\partial u}, &
\nabla_{tt}^2 \Delta &= 0
\end{aligned} \tag{2.73}

Similar to the Jacobian computation, the automatic differentiation package AdiGator is used to create the function files which compute the second derivatives of F with respect to x and u. The results are then combined appropriately using linear algebra in order to create the Hessian of the Lagrangian matrix.


2.7 RSOPT Structure

In this section, the general workflow of RSOPT is explained. Its general structure can be seen below at figure 2.7;

[Figure 2.7 shows the RSOPT OCP solver flowchart: problem initialization and files (function files, bounds and boundary conditions, initial mesh, error tolerance, max and min nodes) feed the problem information setup; AdiGator is called; the discretization information matrices are built; IPOPT is called; the relative error is checked; if E > epsilon, mesh refinement is performed and the loop repeats, while if E < epsilon, the optimal solution is returned.]

Figure 2.7: RSOPT Structure

From the flowchart, it can be observed that the first step of the algorithm is to receive the user defined function files which describe the optimal control problem, together with some parameter values used in the corresponding parts of RSOPT's algorithm.


To describe the optimal control problem, one can refer to the general OCP formulation given at equation 2.30. The function files must include the problem's dynamic equations and objective function. In addition to these files, the constraint information, such as boundary conditions and bounds on states and controls, also needs to be included. After describing the OCP with the necessary files, additional parameter values must be supplied to RSOPT, such as the initial mesh intervals, the maximum and minimum number of nodes which can be used in any mesh interval, and the solution accuracy tolerance. All of these parameters are used in RSOPT's mesh refinement algorithm, which is explained at section 2.4.

After defining the OCP and the necessary parameter values, the algorithm first transforms the continuous optimal control problem into a finite dimensional NLP in the way explained at section 2.3.1, and creates the matrices defined at section 2.6.2 with an initial guess computed by interpolating the given boundary conditions.

After the transformation is carried out, the algorithm calls the external package AdiGator only once in order to create the files used to construct the Jacobian of the constraints and the Hessian of the Lagrangian, as explained at sections 2.6.3-2.6.4.

Then, all the necessary information and files are ready for the algorithm to call the external package IPOPT to solve the problem. After the problem is solved and a solution is obtained with the current approximation, the relative errors of the states are estimated in each mesh interval according to equations 2.43-2.44 given at section 2.4.1. If the desired solution accuracy is not met, the algorithm enters the mesh refinement procedure and determines new mesh intervals in accordance with Algorithm 1 given at section 2.4.

When mesh refinement is performed, the algorithm transcribes the problem with the refined approximation, and the initial guess is based on the solution obtained previously. This refinement procedure is repeated until the desired accuracy is attained. When the solution meets the user-defined accuracy tolerance, the algorithm stops and returns the optimal solution.


2.8 RSOPT Example Problems

In this section, two example problems are solved and the results are shared for the purposes of validation and comparison. The first example, taken from [19], has an analytical optimal solution. The second example is the space shuttle re-entry trajectory optimization problem taken from [14].

2.8.1 Example Problem-1

The problem formulation, which has an objective function with an integral cost, a differential constraint, and boundary conditions on the state, can be seen below.

$$
\begin{aligned}
\min_{x,u} \quad & J = \int_{t_0}^{t_f} \left( x^2 + u^2 \right) dt \\
\text{s.t.} \quad & \dot{x}(t) = -x(t) + u(t) \\
& x(t_0) = x_0, \quad x(t_f) = x_f
\end{aligned}
$$

For large values of $t_f$, this problem is also known in the literature as the hypersensitive problem [19]; however, that behaviour is not investigated in this thesis. The reason this problem is included here is that it has an analytical optimal solution, which makes it possible to validate RSOPT's algorithm by comparing the obtained results with the analytical solution.

The analytical solution to this problem has the form

$$
\begin{aligned}
x^*(t) &= c_1 e^{t\sqrt{2}} + c_2 e^{-t\sqrt{2}} \\
u^*(t) &= c_1 \left(1+\sqrt{2}\right) e^{t\sqrt{2}} + c_2 \left(1-\sqrt{2}\right) e^{-t\sqrt{2}}
\end{aligned}
\tag{2.74}
$$

where

$$
c_1 = \frac{x(t_0)\, e^{-t_f\sqrt{2}} - x(t_f)}{e^{-t_f\sqrt{2}} - e^{t_f\sqrt{2}}}, \qquad
c_2 = \frac{x(t_f) - x(t_0)\, e^{t_f\sqrt{2}}}{e^{-t_f\sqrt{2}} - e^{t_f\sqrt{2}}}
\tag{2.75}
$$
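As a sanity check on (2.74)-(2.75), the sketch below evaluates the constants, verifies the boundary conditions and the consistency $u^* = \dot{x}^* + x^*$ implied by the dynamics, and integrates the cost of the analytical solution by quadrature. This is plain NumPy, independent of RSOPT.

```python
import numpy as np

# Constants of the analytical solution (2.75) for t0 = 0, tf = 100,
# x(0) = x(100) = 1.5 (the case solved in this section).
s2 = np.sqrt(2.0)
t0, tf, x0, xf = 0.0, 100.0, 1.5, 1.5
den = np.exp(-tf*s2) - np.exp(tf*s2)
c1 = (x0*np.exp(-tf*s2) - xf) / den
c2 = (xf - x0*np.exp(tf*s2)) / den

# Analytical state and control (2.74), and the state derivative.
x_opt = lambda t: c1*np.exp(t*s2) + c2*np.exp(-t*s2)
u_opt = lambda t: c1*(1+s2)*np.exp(t*s2) + c2*(1-s2)*np.exp(-t*s2)
xdot  = lambda t: c1*s2*np.exp(t*s2) - c2*s2*np.exp(-t*s2)

# Boundary conditions are met:
print(np.isclose(x_opt(t0), x0), np.isclose(x_opt(tf), xf))

# Since x_dot = -x + u, the optimal control must satisfy u* = x*_dot + x*:
t = np.linspace(t0, tf, 1001)
print(np.allclose(u_opt(t), xdot(t) + x_opt(t)))

# Optimal cost by trapezoidal quadrature of the integrand x*^2 + u*^2:
tq = np.linspace(t0, tf, 200001)
g = x_opt(tq)**2 + u_opt(tq)**2
J = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(tq))
```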

The optimal control problem is solved for $t_f = 100$, $x(0) = 1.5$ and $x(100) = 1.5$. For the solution accuracy tolerance, $\epsilon = 10^{-6}$ is used. The optimal value of the objective function is found as;
