Predicting the ground effect in drone landing with online learning

HELEN FROM

Master in Machine Learning
Second cycle, 30 credits
Stockholm, Sweden 2019
Date: October 15, 2019
Supervisor: Gustav Eje Henter
Examiner: Hedvig Kjellström
School of Electrical Engineering and Computer Science
Host organization: National Institute of Applied Sciences of Lyon
Supervisor at host organization: Christian Wolf

Abstract

When a drone flies close to the ground or other surfaces, the aerodynamic forces acting on it change; this change is called the ground effect. Traditional models fail to describe the full complexity of these forces and also lack the ability to adapt to differences in the ground effect due to the environment. In this thesis I implemented a deep neural network to predict the ground effect from the state (i.e. altitude, velocity and attitude) and control input (i.e. rotor speeds) of a drone, which in previous work has been shown to improve control accuracy in drone landing. As an extension of this, I added online learning to be able to adapt to new environments, or new values of the ground effect, that are not known in advance. The method was tested in simulations and was shown to improve landing in different scenarios.


Contents

1 Introduction
  1.1 Problem
  1.2 Research question
  1.3 Delimitations
  1.4 Sustainability, ethics, and societal aspects
  1.5 Thesis outline

2 Background
  2.1 Control theory
    2.1.1 Feedback linearisation
    2.1.2 PID controller
  2.2 Drone control
    2.2.1 Drone state
    2.2.2 Nonlinear feedback-linearising drone controller
  2.3 Ground effect
  2.4 Artificial neural networks
    2.4.1 Feed-forward neural networks
    2.4.2 Network training
    2.4.3 Deep neural networks
  2.5 Learning with changing data
    2.5.1 Continual learning
    2.5.2 Concept drift

3 Method
  3.1 Method overview
  3.2 Technology stack
    3.2.1 PyTorch
    3.2.2 Gazebo
    3.2.3 Robotics operating system
  3.3 The simulator
    3.3.1 Ground effect
  3.4 Controller
  3.5 Data gathering and network training
    3.5.1 Gathering data
    3.5.2 Data preprocessing
    3.5.3 Network
  3.6 Evaluation
    3.6.1 Confidence interval
    3.6.2 Hypothesis testing
    3.6.3 Error measure

4 Experiments and discussion
  4.1 Offline vs. baseline
    4.1.1 Experiment
    4.1.2 Results
    4.1.3 Discussion
  4.2 Offline learning of several parameters
    4.2.1 Experiment
    4.2.2 Results
    4.2.3 Discussion
  4.3 Offline vs. online learning
    4.3.1 Experiment
    4.3.2 Results
    4.3.3 Discussion
  4.4 Remembering previous knowledge
    4.4.1 Experiment
    4.4.2 Results
    4.4.3 Discussion
  4.5 Comparing online methods
    4.5.1 Experiment
    4.5.2 Results
    4.5.3 Discussion
  4.6 Flight test
    4.6.1 Experiment
    4.6.2 Results
    4.6.3 Discussion
  4.7 Complementary discussion
    4.7.1 Size of the ground effect
    4.7.2 Real obstacles

5 Conclusion and future work
  5.1 Conclusion
  5.2 Future work

Bibliography

1 Introduction

This chapter introduces the background and motivation behind this thesis. A brief outline of the approach of this thesis is provided, followed by an explicit statement of the research question. Finally, the impact and relevance to society are discussed.

1.1 Problem

A drone is an unmanned aerial vehicle (UAV) that can be controlled remotely or autonomously. A quadrotor is probably the type of vehicle that most people associate with the word drone. It is a multirotor helicopter, steered with four lifting propellers; see figure 1.1. From now on, when I use the word drone, this is what I am referring to.

Figure 1.1 – Drone.

The applications for drones are numerous, with examples such as surveillance, taking measurements, video capture, and monitoring land and property. Research on autonomous drones is therefore a topic of high relevance, with great advances during the last few years and with expectations of more to come.

Most of the state-of-the-art solutions for autonomous drone navigation are vision-based approaches that are able to spot things like gates in a drone racing competition. Kaufmann et al. [1] used a deep-learning approach with visual input to be able to adapt to new course layouts, a solution which made them the winners of the IROS Autonomous Drone Race Competition 2018 [2]. Another approach is proposed by Li et al. [3], with the advantage of being computationally efficient at detecting gates in a racing environment.


One problem that automatic drone control encounters is the aerodynamic changes in lift and drag that an aerial vehicle can experience when close to the ground or other fixed surfaces. This change in forces is called the ground effect. Many autonomous drone controllers still perform poorly close to obstacles and are still in need of human pilots for, for example, take-off and landing.

The first attempt to model the ground effect for helicopters was made by Cheeseman and Bennett [4] in 1955. In the case of drones, it is much harder to model the ground effect due to its higher complexity. Sanchez-Cuevas, Heredia, and Ollero [5] study the ground effect specifically for multi-rotor vehicles, and they show that the ground effect behaves differently compared to a single-rotor vehicle. The airflow generated by the propellers interacts with the air around the drone and the environment, but also with the airflow from the other propellers.

It is, however, possible to increase stability in automatic multi-rotor landing by modelling the ground effect, as shown by Danjun et al. [6], but a model-based approach lacks the ability to capture more complex and potentially shifting behaviours of the ground effect.

In the paper "Neural Lander: Stable Drone Landing Control Using Learned Dynamics", Shi et al. [7] propose a novel way of approaching the problem, namely to learn the ground effect with a neural network. The solution is an offline supervised learning method which uses the drone state (e.g. altitude, velocity and attitude) and control input (e.g. rotor speeds) to predict the ground effect. The predicted ground effect is then taken into account in a nonlinear feedback-linearisation controller to steer the drone. However, the Neural Lander solution is likely to generalise poorly to new situations. The ground effect will behave differently in different environments, and this is not taken into account in their model. The drone may approach obstacles, around which the air will deviate, disperse, etc. A drone which lands on a flat surface will experience a very different kind of ground effect than one which lands near a wall, or one which lands near a complexly shaped object like a chair.

In this thesis project, I started from the method proposed in the Neural Lander paper. To it, I then added online learning (updates of the network during flight) to be able to generalise better to new environments, or, more specifically, new values of the ground effect that had not been seen during the offline training phase. This was implemented and tested in a simulator environment. The results show that this is a promising approach that is able to improve the drone's landing ability in different scenarios.


1.2 Research question

The research question that this thesis will address is:

Is it possible to predict the ground effect using online learning to improve drone landing and to generalise to new situations?

More specifically, I will look at a nonlinear feedback-linearising controller that includes predictions of the ground effect from a neural network. The network will be updated online, on data measured during the flight, to be able to adapt to the current environment. The drone controller will be tested, first and foremost, in landing experiments, and the ground effect will be varied to test the adaptation ability of networks trained online, in comparison with networks trained offline, at predicting the ground effect.

1.3 Delimitations

This thesis tackles one step in a larger project whose aim is to develop a controller with predictions over some future horizon. In this thesis we use a fairly simple controller which does not optimise over more than the current time step. The main focus of this thesis is to develop a machine learning approach to learn the ground effect and adapt it online. This will later be used in a controller that predicts several time steps into the future, e.g. a model predictive controller, but that is not part of this thesis.

1.4 Sustainability, ethics, and societal aspects

Machine learning has lately grown popular in many different domains, and drones have also gained more and more interest. Some of the more well-known usages for drones are military applications, and video capture for personal use or movie creation. An example of when drones can be used in combination with machine learning is the surveillance of crops: to interpret meteorological changes, to spot places that need water or pesticides, to see when it is the best time to harvest, and so on. This is a good way to save money and resources, and at the same time help the environment by reducing the use of pesticides.

Drones in combination with machine learning can also be used for documenting architectural or historical objects or places, to be able to 3D-map them or, in some cases, to aid restoration.


Drones are also useful to the police or firefighters in situations like emergency rescue. Locating a person in danger, delivering food or water to places with difficult access, or finding a rescue path are some tasks that could potentially save lives.

In all these applications, the ability to fly stably near different surfaces is important. Improving the autonomy of drone navigation would facilitate the usage of drones in many different applications without the need for a professional pilot. This thesis contributes to the research on an approach to increase the stability of autonomous drone navigation.

One ethical issue with drones is the problem of privacy. If your neighbour were to fly a drone in his backyard, you might be concerned that his drone camera is taking images of you. It is a serious concern if we can no longer feel privacy in our own homes. On the other hand, drone cameras used in surveillance can help keep crime down and protect people and property. The problem is that it is hard to draw the line between where drones are used for the greater good and where they invade our privacy and freedom.

A full discussion of the ethics of military use of drones would be far too long for this report, so I will only mention some thoughts that I find important. I believe that drones risk increasing power imbalances. There is always an imbalance between rich and poor countries in a war situation, but drones risk making this gap even greater. For example, using drones in wars means that you no longer need to risk the lives of your own soldiers, but this is a luxury that not everyone can afford. Secondly, a drone can be fairly anonymous and can therefore be sent out on missions with a much smaller risk of being traced to its owner, compared to sending out a soldier in war. Thus, drones in military usage can facilitate actions such as attacks or wars at a relatively small cost. This in turn could potentially lead to an increase in these actions.

Another ethical issue in the military usage of drones is their degree of autonomy. If a drone is autonomously steered, should it also autonomously spot and choose its targets? Can we trust facial recognition to spot the right targets, and how can we guarantee that no innocents are harmed? I believe that there are many more questions like this that are really difficult to answer. Considering that this is potentially a gamble with people's lives, I believe that it needs to be taken very seriously, and rules need to be set at a higher level to prevent misuse.


1.5 Thesis outline

In chapter 2 I introduce the important background theory and related works of this thesis. Chapter 3 describes the implementation of the controller as well as of the neural network predicting the ground effect. In chapter 4, the different experiments and their results are presented and discussed. Finally, in chapter 5, a summary of the conclusions, the answer to the research question and some ideas for future work are presented.


2 Background

This chapter introduces the background theory needed to follow this thesis. It starts by introducing the relevant control theory behind the drone controller in this thesis, and then continues on to present the theory of artificial neural networks and online learning.

2.1 Control theory

Control theory, or automatic control, is the theory and technology of regulating a process with minimal or no direct human intervention. This section contains some theory that is useful in automatic control and that will be used later to control the drone.

2.1.1 Feedback linearisation

The key idea of feedback linearisation [8] is to transform a nonlinear system into a linear one. In drone control, a nonlinear system describing the dynamics of the drone can be linearised and solved to calculate the control inputs of the drone.

Consider a nonlinear system of the following form:

$$\dot{x} = f(x) + g(x)u \qquad (2.1)$$

$$y = h(x) \qquad (2.2)$$

Here $x$ is a state vector, $\dot{x}$ is the time derivative of $x$, $u$ is the vector of inputs and $y$ is the output.

The system is transformed using a controller of the type $u = r(x, v)$, where $v$ is a new intermediate control input. The $u$ is chosen so that the system is linear with respect to the new control input $v$. In order to find this transformation, the successive derivatives of each $y_i$ are calculated until the input appears in the expression of the derivative. This results in an equation of the type

$$\begin{pmatrix} y_1^{(k_1)} \\ \vdots \\ y_n^{(k_n)} \end{pmatrix} = A(x)u + b(x) \qquad (2.3)$$

where $k_i$ denotes the number of times $y_i$ needs to be differentiated in order to make the input appear. Assuming that $A(x)$ is invertible, the transformation becomes

$$u = A^{-1}(x)\bigl(v - b(x)\bigr), \qquad (2.4)$$

forming the linear system

$$\begin{pmatrix} y_1^{(k_1)} \\ \vdots \\ y_n^{(k_n)} \end{pmatrix} = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}. \qquad (2.5)$$

The new system can then be controlled with standard linear techniques, such as the PD controller described in the next section.

2.1.2 PID controller

A proportional-integral-derivative controller (PID controller) provides a simple yet efficient solution to many real-world control problems [9]. The controller aims to minimise an error over time by adjusting a control variable $u(t)$.

The controller includes three parts: one that is proportional to a given error signal $e(t)$, another that integrates the error over time, and a third that takes the time derivative of the error. The error at a time $t$ is the difference between some desired and measured value, e.g. position or velocity.

The control signal is given as a sum of the three parts,

$$u(t) = K_p e(t) + K_i \int_0^t e(t')\,dt' + K_d \frac{de(t)}{dt} \qquad (2.6)$$

where $K_p$, $K_i$ and $K_d$ are constants that need to be tuned to minimise the total error.

The proportional term provides control directly proportional to the error, the integral term reduces steady-state errors, and the derivative term has a damping effect on rapid changes.

A simpler version of the PID controller is the PD controller, where the integral part is removed.
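To make equation (2.6) concrete, the following is a minimal discrete-time sketch of a PID controller in Python; the class name, time-step handling and gain values are illustrative and not part of the controller used in this thesis.

```python
class PID:
    """Minimal discrete-time PID controller implementing equation (2.6)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # running approximation of the integral term
        self.prev_error = None   # previous error, for the derivative term

    def update(self, error, dt):
        # Accumulate the integral of the error over time.
        self.integral += error * dt
        # Approximate the time derivative with a backward difference.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive a measured value towards a desired set-point.
pid = PID(kp=1.0, ki=0.1, kd=0.5)
u = pid.update(error=0.25, dt=0.01)  # control signal for one time step
```

Setting `ki=0` gives the simpler PD controller mentioned above.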


2.2 Drone control

The ability to autonomously steer a drone without a human pilot would facilitate its usage in many different applications. However, because of the nonlinear behaviour of drones, this is a difficult task.

One type of automatic controller for drones can be developed using feedback linearisation [8]. The feedback-linearising controller is simple to understand and implement while still showing good performance. Another way of steering a drone is by using model predictive control (MPC) [10]. MPC optimises the control signal by predicting the drone states over some future time steps, implements the current time step, and then optimises again. This makes the controller able to anticipate future events and act accordingly, which is the main advantage of MPC.

For this thesis we used a feedback-linearising controller because of its simple nature and because one was almost fully implemented and integrated in the simulation environment available at the laboratory. It is, however, desirable in the future to implement a controller such as MPC, with predictions based on optimisation over multiple steps into the future. Before I go further into feedback-linearising control in the application of drones, I will explain some useful techniques to measure the state of a drone.

2.2.1 Drone state

Steering a drone requires knowledge of the state of the drone. The state can be its position, velocity and attitude (rotation). The position and velocity are usually represented by 3D vectors in relation to some frame of reference. The attitude of an object or vehicle can be parameterised in different ways; brief explanations of two of these follow.

Yaw, pitch, and roll

A rotation from one Cartesian frame to another can be described by three angles $\psi$, $\theta$ and $\phi$. These angles are sometimes called the Euler angles. There are different definitions of the Euler angles, and the definition used in this thesis is also named roll, pitch and yaw [8]. The combined rotation can be derived from the following rotation matrix:

$$R(\psi, \theta, \phi) = \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{pmatrix} \qquad (2.7)$$

where the first matrix corresponds to the yaw, the second to the pitch and the third to the roll, as illustrated in figure 2.1.

Figure 2.1 – An illustration of the roll, pitch and yaw axes of an airplane. Source: Yaw_Axis_Corrected.svg via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Yaw_Axis_Corrected.svg. Licence: Creative Commons CC BY-SA 3.0.

Quaternions

Another way of measuring the attitude of a body is with quaternions. Quaternions can be seen as a generalisation of complex numbers, where a quaternion is defined as

$$q = q_w + q_x i + q_y j + q_z k. \qquad (2.8)$$

Here $i$, $j$ and $k$ are the quaternion units, defined by the following rules [11]:

$$i^2 = j^2 = k^2 = ijk = -1, \qquad (2.9)$$

$$ij = k, \quad jk = i, \quad ki = j, \qquad (2.10)$$

$$ji = -k, \quad kj = -i, \quad ik = -j. \qquad (2.11)$$


The idea of measuring the attitude of a body with quaternions comes from Euler's rotation theorem, which states that any rotation or sequence of rotations about a fixed point can be described as a single rotation around a rotation axis [12].

A rotation around a normalised vector $(x, y, z)$ by the angle $\theta$ can be expressed in quaternions as

$$q = \cos\left(\frac{\theta}{2}\right) + \sin\left(\frac{\theta}{2}\right)(xi + yj + zk). \qquad (2.12)$$

Hence the attitude of a drone relative to an inertial frame can be represented by the four values $(q_w, q_x, q_y, q_z)$ representing the quaternion according to equations (2.8) and (2.12).
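As a small illustration of equation (2.12), the following Python sketch (my own, not part of the thesis implementation) converts an axis-angle rotation to a quaternion:

```python
import math

def axis_angle_to_quaternion(axis, angle):
    """Convert a rotation of `angle` radians around a 3D `axis`
    to a quaternion (qw, qx, qy, qz), following equation (2.12)."""
    # Normalise the rotation axis so that (x, y, z) is a unit vector.
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    half = angle / 2.0
    s = math.sin(half)
    return (math.cos(half), x * s, y * s, z * s)

# Example: a 90-degree rotation around the vertical (yaw) axis.
qw, qx, qy, qz = axis_angle_to_quaternion((0.0, 0.0, 1.0), math.pi / 2)
```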

2.2.2 Nonlinear feedback-linearising drone controller

A feedback-linearising controller in the application of drones is presented by Voos [13]. The state vector he presents is a vector of the velocities $v_x$, $v_y$, $v_z$, the Euler angles $\psi$, $\theta$ and $\phi$, and the rotational speeds of the Euler angles $\dot\psi$, $\dot\theta$ and $\dot\phi$. The system is then divided into an attitude control system and a velocity control system. Feedback linearisation is used to transform the nonlinear attitude control system into a linear one. Finally, the two systems are solved independently, and together they give an input signal which is directly related to the angular velocities of each rotor, which are used to steer the drone.

This thesis uses a similar controller, with a few differences such as the choice of frame of reference and the application of the feedback linearisation: we applied the latter in both position/velocity control and attitude control. To this we also added a dependency on the ground effect, which is not included in Voos's proposed controller [13].

2.3 Ground effect

In this thesis it is relevant to model the ground effect, since we are working in a simulation environment where the ground effect does not naturally occur.

A commonly used model of the ground effect for helicopters was introduced by Cheeseman and Bennett [4]. To adapt this model to quadrotors or drones, Danjun et al. [6] proposed a slight modification, adding a constant $\rho > 0$ that needs to be tuned. Their model is as follows:

$$\frac{T_{\mathrm{input}}}{T_{\mathrm{output}}} = 1 - \rho \left( \frac{R}{4z} \right)^2 \qquad (2.13)$$

where $z$ is the altitude of the rotor and $R$ is the rotor radius. $T_{\mathrm{input}}$ is the thrust commanded, which I will call $f_{\mathrm{rotor}}$, and it can be calculated as

$$f_{\mathrm{rotor}} = C_T \cdot n^2 \qquad (2.14)$$

with $C_T$ being a thrust constant and $n$ being the rotor speed. $T_{\mathrm{output}}$ is the resulting output thrust, which can be seen as a sum of the rotor thrust $f_{\mathrm{rotor}}$ and the ground effect $f_{\mathrm{ge}}$.

The result of this model is that rotors spinning at a given rate of rotation produce a greater thrust near the ground than they would at a higher altitude (greater $z$). The effect increases with increased $\rho$. The model was proposed for a drone with rotors parallel to the ground, and the effect of the rotors' orientation relative to the ground was not tested in the paper [6].

2.4 Artificial neural networks

An artificial neural network (ANN) is a computational model which consists of processing units, called neurons, and weights that represent connections between the neurons. The network is fed data called input. The input consists of data vectors of features describing the data, e.g. the RGB values of the pixels in an image or the answers to a questionnaire. The features are represented by numerical values that can be either Boolean, usually represented with a one or a zero, discrete numerical values or continuous values.

Each neuron takes the weighted sum of its inputs plus a constant bias term, potentially runs it through a transfer function called an activation function, and then outputs a result. The idea of the network is to process external inputs and return as output some function of that input. The output is, similarly to the input, a vector of features. The mathematical function that the network computes can be simple, or complex and nonlinear. By updating the weight coefficients, the network can be trained to compute or approximate a function of interest.

In supervised learning, the weight coefficients are updated based on pairs of desired inputs and outputs; thus the network is trained to approximate a function specified only through examples that represent some desired behaviour. The desired outputs are often called targets. Training minimises the difference between the target values and the outputs computed by the network from the corresponding inputs.

Another type of machine learning is unsupervised learning, where the training is done without known targets and the objective is instead to find patterns or commonalities in the data. Finally, reinforcement learning is a third branch of machine learning, where the network learns by trying to maximise a reward for each action it takes. In this thesis I will use supervised learning, since both the input and the target values are known during training.

When training a neural network, the available data is usually separated into different sets. Training data is the set of data used to update the weights during the training process. Validation data is a set of data used during the training process to check that the network is generalising well and not overfitting to the training data. A network that generalises well should be able to predict the target values from the inputs as well on new unseen data as on the training data, while an overfitting network shows good prediction performance only on the training data but not on new unseen data. The network is not trained on the validation data; it instead acts as a set of unseen data used only to test the generalisation ability of the network during training. Lastly, a set of test data is used to assess the prediction performance of the network on new unseen data after the training process is finished.

2.4.1 Feed-forward neural networks

A feed-forward neural network is an acyclic network, which means that the connections between the neurons do not form any cycles. This type of network consists of layers, subsets of neurons, where the neurons in each layer feed directly into the neurons in the succeeding layer. The values are thus computed from other, previously computed values, without any recursion.

The first layer is called the input layer and the last is called the output layer. Some networks have layers in between the input and the output; these layers are called hidden layers.

A multilayer perceptron (MLP) is a type of feed-forward neural network which is fully connected. Fully connected means that each neuron in one layer is connected to all the neurons in the next layer. As the name suggests, an MLP consists of several layers, with at least one hidden layer, as opposed to a single-layer perceptron, which is a feed-forward neural network with no hidden layers.

Figure 2.2 shows an example of an MLP with one hidden layer. The nodes in the graph represent neurons and the arrows represent the network weights. The weights in each layer $l$ can be gathered in a weight matrix $W^{(l)}$. In the example network, the matrix $W^{(1)} \in \mathbb{R}^{3\times3}$ contains the weights represented by the arrows between the input layer and the hidden layer. $x = (x_0, x_1, x_2)^T$ is the input vector, where for simplicity I have included the constant bias term as $x_0 = 1$. The input vector is passed through this network in two steps.

Figure 2.2 – A multilayer perceptron network with one hidden layer.

First, the output values of the hidden-layer neurons are calculated as

$$h = \phi(W^{(1)} x) \qquad (2.15)$$

where $\phi(\cdot)$ represents a nonlinear activation function, which will be further explained in section 2.4.3, applied elementwise on its argument.

Finally, the data is passed from the hidden layer to the output, where I again include the bias $h_0 = 1$. Depending on the application, it is common to not pass the data through an activation function in the last layer. The output $y$ is then given by

$$y = W^{(2)} h. \qquad (2.16)$$

The full network can now be described as a function $f(x, \theta)$, where $\theta = \{W^{(1)}, W^{(2)}\}$ denotes the set of network parameters:

$$f(x, \theta) = W^{(2)} \phi(W^{(1)} x). \qquad (2.17)$$
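As a small illustration, equations (2.15)–(2.17) can be written directly in PyTorch. The dimensions below (three inputs, three hidden units, two outputs) are illustrative assumptions for this example network, not those of the network used later in the thesis:

```python
import torch

# Illustrative weight matrices; in practice these are learned.
W1 = torch.randn(3, 3)  # input layer -> hidden layer
W2 = torch.randn(2, 3)  # hidden layer -> output layer

def forward(x):
    """Compute f(x, theta) = W2 * phi(W1 * x), equation (2.17)."""
    h = torch.relu(W1 @ x)  # equation (2.15), with phi = ReLU
    return W2 @ h           # equation (2.16), no activation on the output

x = torch.tensor([1.0, 0.5, -0.3])  # x_0 = 1 acts as the constant bias term
y = forward(x)
```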

2.4.2 Network training

Training a neural network means learning values of the weights $\theta$ such that the network performs well on a specific task. This is an optimisation problem that, for complex tasks, cannot be solved analytically.


The optimisation problem is defined by an objective function and a set of values over which the function is optimised. The objective function is often referred to as a loss function $L$, and it measures the network performance during training and testing; the smaller the loss, the more accurate the predictions. It is optimised over the set of network parameters $\theta$, whose values can be restricted by some constraints.

A commonly used loss function is the mean squared error (MSE), defined as

$$L_{MSE}(X, T, \theta) = \frac{1}{N} \sum_{i=1}^{N} \| t_i - f(x_i, \theta) \|_2^2 \qquad (2.18)$$

where $X = \{x_1, \dots, x_N\}$ is a set of $N$ data points, $T = \{t_1, \dots, t_N\}$ is a set of corresponding target values, and $f(x_i, \theta)$ is the network output given some input $x_i$.

Optimiser

Gradient descent and stochastic gradient descent are optimisation methods that try to find the minimum of a function by taking steps following the negative gradient of the function. The size of the step is regulated by multiplying the gradient by a learning rate $\lambda$.

In the application of neural networks, gradient descent is a type of optimiser that tries to find the optimum of the objective function, normally the minimum of the loss function. Each update of the network weights is based on the formula

$$w_{ij} = w_{ij} - \lambda \frac{\partial L}{\partial w_{ij}}. \qquad (2.19)$$

Another optimiser used in training neural networks is the Adam optimiser [14]. As opposed to the stochastic gradient descent method, which uses a fixed learning rate, the Adam optimiser adaptively computes and adjusts individual learning rates for different parameters. The method is popular due to its efficiency in reaching good results with fewer optimisation updates.

Backpropagation

The network is trained by minimising the output of the loss function using techniques such as gradient descent, to find a local minimum. Backpropagation is a technique for computing the gradient term in equation (2.19) for updating the weights of the network [15], [16]. The method is divided into two phases.


In the forward pass, the input is propagated forward through the network, as described in equations (2.15) and (2.16). The output of each node is stored to be used in the second phase of the backpropagation algorithm. When the output layer is reached, a loss $L$ is calculated by applying the loss function to the network output and the target output.

The backward pass consists of calculating the gradients of the loss function with respect to the weights, by propagating information about the loss backwards through the network. The gradients are then used to update the weights according to the optimiser, as described in the previous section.

Let $\mathrm{net}_j$ be the input of neuron $j$,

$$\mathrm{net}_j = \sum_{k=1}^{n} w_{kj} o_k, \qquad (2.20)$$

with outputs $o_k$ of the previous layer and weights $w_{kj}$. The output $o_j$ of neuron $j$ is then

$$o_j = \phi(\mathrm{net}_j) = \phi\left( \sum_{k=1}^{n} w_{kj} o_k \right). \qquad (2.21)$$

The partial derivative of $L$ with respect to a weight $w_{ij}$ is calculated using the chain rule twice:

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial o_j} \frac{\partial o_j}{\partial \mathrm{net}_j} \frac{\partial \mathrm{net}_j}{\partial w_{ij}}. \qquad (2.22)$$

The last factor of the right-hand side in equation (2.22) is simplified as follows:

$$\frac{\partial \mathrm{net}_j}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_{k=1}^{n} w_{kj} o_k = \frac{\partial}{\partial w_{ij}} w_{ij} o_i = o_i. \qquad (2.23)$$

Here we have used that the sum only contains one term that depends on $w_{ij}$. The middle factor of the right-hand side in equation (2.22) is the derivative of the activation function evaluated at $\mathrm{net}_j$:

$$\frac{\partial o_j}{\partial \mathrm{net}_j} = \phi'(\mathrm{net}_j). \qquad (2.24)$$

The first factor of the right-hand side of equation (2.22) is easy to calculate if the neuron is in the output layer. If the neuron is in an arbitrary layer $j$, we need to find the derivative recursively as

$$\frac{\partial L}{\partial o_j} = \sum_{l \in \mathcal{L}} \left( \frac{\partial L}{\partial \mathrm{net}_l} \frac{\partial \mathrm{net}_l}{\partial o_j} \right) = \sum_{l \in \mathcal{L}} \left( \frac{\partial L}{\partial o_l} \frac{\partial o_l}{\partial \mathrm{net}_l} \frac{\partial \mathrm{net}_l}{\partial o_j} \right) = \sum_{l \in \mathcal{L}} \left( \frac{\partial L}{\partial o_l} \cdot \phi'(\mathrm{net}_l) \cdot w_{jl} \right). \qquad (2.25)$$

Here $\mathcal{L}$ is the set of all neurons receiving input from neuron $j$. The formula depends on the stored output values of each node, calculated in the forward propagation, but also on the derivatives of $L$ with respect to the output values in the next layer. Therefore the backpropagation algorithm calculates the gradients by moving backwards in the network: it starts in the last layer and then uses the calculated gradients of that layer to calculate the gradients of the preceding layer, and so on. In contrast, the forward propagation algorithm calculates the outputs of each node by moving forwards in the network.

Backpropagation can be done over the full data set at once, on a subset of data points called a mini-batch, or on a single data point. Computing the gradients over the full data set can be computationally heavy for large data sets and might result in getting stuck in local minima or saddle points. Updating on one data point at a time makes the gradients very noisy, which helps to get out of local minima, but the learning process is usually very slow because of too much noise. A good compromise between these two approaches is to use mini-batches. A mini-batch is a set of data points sampled from the full data set. The gradient is calculated on one batch at a time, which creates a steadier gradient with enough noise to often be able to escape local minima and saddle points. It is also less computationally heavy than computing the gradients for the full data set, and quicker than following the gradient of one data point at a time.
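A minimal PyTorch sketch of mini-batch training with backpropagation; the model, data and hyper-parameters below are illustrative placeholders, not those used in this thesis:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative data: 1000 random input/target pairs.
inputs, targets = torch.randn(1000, 12), torch.randn(1000, 3)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=64, shuffle=True)

model = torch.nn.Linear(12, 3)                        # placeholder model
optimiser = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()                          # equation (2.18)

for x_batch, t_batch in loader:
    optimiser.zero_grad()
    loss = loss_fn(model(x_batch), t_batch)
    loss.backward()      # backpropagation: computes all gradients
    optimiser.step()     # gradient descent step, equation (2.19)
```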

Data preprocessing

Before training a network, some data preprocessing is usually done. Preprocessing can consist of pruning irrelevant features of the data, removing outliers, normalising the data, removing null values, etc. This can help the network reach good results quicker and sometimes generalise better (that is, return more suitable output values for input values not represented in the training examples) [17].

When using features with different units or scales, it is common to standardise the features of the network. Standardisation makes the features more balanced relative to each other, which has been found to improve results in practical applications. It can be applied to both input and output values.

A standardisation method called z-score normalisation, also known as mean and variance normalisation, translates the data points, feature by feature, by subtracting the mean $\mu$, and then scales the data points, feature by feature, by dividing by the standard deviation $\sigma$ [17]. The mean and the standard deviation for a feature $j$ are calculated as follows:

$$\mu_j = \frac{1}{N} \sum_{i=1}^{N} x_{i,j} \qquad (2.26)$$

$$\sigma_j = \sqrt{\frac{\sum_{i=1}^{N} (x_{i,j} - \mu_j)^2}{N - 1}} \qquad (2.27)$$

where $N$ is the number of data points in the training data set and $x_i$ is one training data point. The following formula describes how to compute the standardised data points that are used as input for the network:

$$x_{i,j}^{\mathrm{new}} = \frac{x_{i,j} - \mu_j}{\sigma_j} \qquad (2.28)$$

for each data point index $i$ and each feature $j$. The result is that each feature is transformed to have mean zero and standard deviation one over the data.
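In code, z-score normalisation amounts to the following PyTorch sketch, where the statistics are computed on the training data and reused for new data, as is also done later in section 3.5.2 (the variable names are my own):

```python
import torch

train = torch.randn(1000, 12)  # illustrative training data, one row per point

# Per-feature statistics, equations (2.26) and (2.27).
mu = train.mean(dim=0)
sigma = train.std(dim=0, unbiased=True)  # divides by N - 1

def standardise(x):
    """Apply equation (2.28) feature by feature."""
    return (x - mu) / sigma

train_std = standardise(train)            # mean ~0, std ~1 per feature
new_point = standardise(torch.randn(12))  # same transform for unseen data
```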

2.4.3 Deep neural networks

Deep neural networks are neural networks with several layers. This gives the network the ability to succinctly capture more complex characteristics of the function to be approximated.

A problem that might arise with deeper neural networks is that of vanishing gradients [18]: the gradients become vanishingly small when the error is propagated through the network during backpropagation, see equation (2.25). The problem typically arises when the derivative of the activation function, which will be described in the next section, takes values between zero and one. Due to the repeated multiplication of these factors, the gradients decrease with the depth of the network and thus quickly vanish. This problem can be addressed by choosing another type of activation function.

Training a deep neural network is often computationally heavy and requires large amounts of data. Over the years, the computational power of computers has increased, as have storage space and the amount of data that is available. This, together with the understanding of, and solutions to, the vanishing gradients problem, are contributing factors to the large increase in popularity of deep neural networks in recent years.


Activation function

The activation functions in a deep neural network allow us to learn complex and non-linear mappings between the network input and output by introducing non-linear properties to the network. Without activation functions in the layers of a deep neural network, the weight matrices of the network could simply be multiplied together into a single matrix, equivalent to a single-layer network that computes the same function.

The activation function is applied to the input of a neuron, and the resulting output is transferred forward to the neurons in the next layer of the network. The function can force the neuron to output a value in a certain range or with a certain quality.

One type of activation function is the sigmoid functions. They are characterised by their "S"-shape and can be functions such as the hyperbolic tangent $\phi(x) = \tanh(x)$ or a logistic function like

$$\phi(x) = \frac{1}{1 + e^{-x}}. \qquad (2.29)$$

A popular activation function today is the rectified linear unit (ReLU),

$$\phi(x) = \max(0, x), \qquad (2.30)$$

which has been shown to give state-of-the-art results in many applications. The advantage of this function is that the derivative for positive input is 1, and thus the problem of vanishing gradients does not occur [19].

Spectral normalization

Spectral normalization is a weight normalization method first introduced by Miyato et al. [20]. The method aims to improve the training of networks by controlling the Lipschitz constant.

The Lipschitz constant $\|f\|_{\mathrm{Lip}}$ of a function $f$ is the smallest value $M$ such that

$$\frac{\|f(x) - f(x')\|_2}{\|x - x'\|_2} \leq M \qquad (2.31)$$

for any $x$ and $x'$. Functions that have finite $M$ form a strict subset of the continuous functions.

By bounding the Lipschitz constant of a network through spectral normalization, the gradients of the network become bounded, which stabilizes training.


The spectral norm of a matrix $A$ is defined as follows:

$$\sigma(A) := \max_{h : h \neq 0} \frac{\|Ah\|_2}{\|h\|_2} = \max_{\|h\|_2 \leq 1} \|Ah\|_2. \qquad (2.32)$$

Spectral normalization is applied to a network by normalizing the weights of the network with the spectral norm of the weights. Shi et al. [7] prove that for a ReLU network, the Lipschitz constant of the entire network is bounded by a constant $M$ when spectral normalization is applied as follows on each layer of the network:

$$W_s = W / \sigma(W) \cdot M^{1/L}. \qquad (2.33)$$

Here $W_s$ is the normalized weight matrix and $L$ is the number of layers in the network. Bounding the Lipschitz constant is important for stability in the control system when using deep neural networks to control a drone [7].
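As a sketch, the per-layer normalization of equation (2.33) can be written as follows. Here the spectral norm is computed exactly from the largest singular value, whereas practical implementations such as [20] usually approximate it with power iteration; the function name is my own:

```python
import torch

def spectrally_normalise(weights, M):
    """Rescale each layer's weight matrix as W_s = W / sigma(W) * M^(1/L),
    equation (2.33), so the whole ReLU network is Lipschitz-bounded by M."""
    L = len(weights)
    normalised = []
    for W in weights:
        sigma = torch.linalg.svdvals(W)[0]  # spectral norm: largest singular value
        normalised.append(W / sigma * M ** (1.0 / L))
    return normalised

# Example: three layers, target Lipschitz bound M = 2.
layers = [torch.randn(30, 12), torch.randn(15, 30), torch.randn(3, 15)]
layers_sn = spectrally_normalise(layers, M=2.0)
```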

2.5 Learning with changing data

Working with real-world data can give rise to problems. Sometimes data is acquired over time, and sometimes the data changes, which can require the network to be adapted. The remainder of this background chapter is about these problems and how learners may deal with this changing world.

2.5.1 Continual learning

Continual learning is the ability of a model to learn continually from a stream of data, building on previously learned knowledge without forgetting previously seen tasks.

In some neural network applications, it is practical to be able to update the network when new data is acquired. In this thesis, for example, we can pre-train the drone controller in an offline manner, where a full set of data is known and does not change at training time. However, to be able to adapt to and learn from new situations, we want to be able to update the network during the flight of the drone in an online manner, where the data is received sequentially in real time.

One simple method is to re-adapt the model sequentially with stochastic gradient descent, given each new data point. Such a solution can encounter a problem known as catastrophic interference or catastrophic forgetting [21], [22]: the network forgets what it previously learnt when learning a new task or learning in a new situation. Humans can learn new tasks without forgetting old knowledge, or even utilise experience to learn new tasks quicker. An artificial neural network, however, risks forgetting what it previously learnt while training on new tasks or new data.

Another method is to re-train the network on the full data set, including the new data points, but this is not computationally feasible in real-time flight; a more efficient method is needed.

There are several papers comparing different online/continual learning methods empirically in different test scenarios [23], [24]. To give an overview of the different types of continual learning techniques, Maltoni and Lomonaco [25] categorised them into three groups.

Architectural techniques aim to lower forgetting by using different architectural methods [26], [27]. This can consist of freezing the training of certain parameters while still being able to leverage the previously learnt knowledge. A disadvantage of some of these methods is that the number of parameters might grow with the number of tasks.

In regularisation techniques, the changes in the weights are restricted or regulated. This can be done by adding an extra term to the loss function that restricts changes in weights important for previously learned tasks [28], [29], [30].

Finally, in rehearsal techniques, catastrophic forgetting is avoided by replaying past information. This can be done by, e.g., storing previous training data or using a generative model to generate old data points. This is then interleaved with data for a new task during the updates of the network.

Here follow some examples of different methods for continual learning.

Elastic weight consolidation (EWC) [28] is a method that intends to mimic the way a brain protects previously strengthened synapses when learning new skills. The algorithm slows down learning on certain weights based on their importance to previously learned tasks. To decide the importance of the weights, the conditional probability of the parameters given the data is considered. The loss function for the second task is then a sum of the loss of the second task alone and a distance measure between the old and the new parameters, weighted by their importance.

An extension of EWC is introduced by Liu et al. [29]. Their approach is to rotate the parameter space of the neural network to attempt to make the Fisher information matrix approximately diagonal while keeping the output unchanged. The authors say that this rotation significantly improves the results compared to the original EWC method, which instead assumes that the Fisher information matrix is diagonal, without any rotation.


Intelligent synapses [30] is a method that provides the weights of a network with an importance measure for solving previous tasks. Changes to parameters with a high importance measure are then penalised to avoid forgetting.

Rehearsal methods can require a lot of memory if large amounts of data are stored to be replayed to the network during online training. Hayes, Cahill, and Kanan [31] provide a solution where a buffer with a small set of data prototypes is stored for each class. The class-specific buffer is updated when a new data point arrives, and then the network is updated on all buffers in a random order.

In another approach, introduced by Shin et al. [32], a deep generative model is trained to generate data similar to the old data. Generated data is then interleaved with the new data points during the online update of the network.

In summary, there are several methods for continual learning, and their main goal is to be able to acquire new knowledge without losing old knowledge. In this thesis we will learn and try to adapt to new environments and ground effects. While updating the network, we do not want to forget all previous knowledge we had about the ground effect, and thus continual learning techniques might be useful. However, this leads me to another problem we could encounter in our application, namely concept drift.

2.5.2 Concept drift

In real-world applications, the data often changes over time. This can be gradual changes, such as prices on the market slowly increasing over the years, or sudden changes, such as the change in someone's working hours when suddenly going on vacation or changing jobs.

In machine learning, these changes in data are called concept drift [33]. Concept drift can occur as a shift in the target concept of the data while the data distribution remains the same; this is referred to as real concept drift. Alternatively, the data distribution might change while the targets remain the same; this is known as virtual concept drift. An example of virtual concept drift is in the detection of spam, where the spam itself may change over time but our classifications remain spam or not spam.

When a model experiences concept drift, it needs to be updated and adapted to the new situation. Sometimes concepts might reoccur, and it is then preferable if the model can quickly re-adapt to the reoccurring situation when it arrives.


Probably the first algorithm to address concept drift was STAGGER [34], presented by Schlimmer and Granger Jr. in 1986. The algorithm adapts representations of concepts by constructing features whose relevance is evaluated on the current data.

There are different types of strategies for dealing with concept drift, for example instance selection, instance weighting and ensemble learning [33], [35].

In instance selection, the idea is to select instances relevant to the current concept. One common way of doing the selection is by using a training window over recent instances to update the model for predictions in the near future [36], [37].

In instance weighting, training instances are weighted by their importance, e.g. their age [38].

Ensemble methods use several different models for prediction [39], [40], [41]. The models' outputs can then be combined through voting or weighted voting; alternatively, the best model for the current data can be selected alone.

In this thesis, we anticipated that we might encounter concept drift, where the ground effect might change even though the input data looks the same. We then had to find a solution that can predict the ground effect, adapt to new ground effects, and still remember the important general qualities of a ground effect.


3 Method

This chapter first presents an overview of the implementation of this project, followed by detailed descriptions of each component of the system.

3.1 Method overview

Before going into the details of the implementation of this project, I want to start by giving an overview of the different parts of the system. The main building blocks of the system are the simulator/drone, the controller and the neural network. Figure 3.1 shows an overview of how the different parts of the system are connected.

Figure 3.1 – An overview of the drone system.

The implementation of this thesis was made in a simulation environment, where I could simulate the drone and its flight under different configurations.


To simulate a natural environment, I added gravity as well as a modelled ground effect force acting on the drone. The simulator provides real-time data on the drone state, such as the position, velocity, acceleration and attitude of the drone.

A neural network serves as a model of the ground effect. Given the drone state, the neural network can provide an estimate of the ground effect. The controller uses the ground effect estimate along with the drone state to calculate a control input for the drone, i.e. a commanded rotor speed for each rotor.

In the end, I add online learning to the system. The drone states are then stored and used, along with calculations of the ground effect, to update the neural network model online, during flight.

3.2 Technology stack

In order to implement and carry out the practical parts of this project, the following technology frameworks were used.

3.2.1 PyTorch

PyTorch is an open-source deep learning platform that provides a large library of functionality and tools for programming deep neural networks [42]. The platform is built for Python and is quick and easy to use.

3.2.2 Gazebo

Gazebo [43] is a free, open-source 3D robotics simulator. The simulator makes it possible to design robots and then control, simulate and test algorithms for them in a realistic scenario and environment. In this way, algorithms can be safely tested before being applied to real robots or drones.

3.2.3 Robotics operating system

The robotics operating system (ROS) is a framework for writing robotics software [44]. ROS serves as an interface which enables communication between the robot algorithms and the Gazebo simulation environment.


3.3 The simulator

All the simulations in this thesis were made in Gazebo [43], in combination with the robotics operating system (ROS) [44] for communication between the drone and its controller. The controller was written partly in C++ and partly in Python. The networks were created in Python using the PyTorch deep learning platform.

The drone model used was a nano drone with the following specifications:

• Weight: 56 g

• Body dimensions: 8 × 8 × 2.52 cm

• Smallest distance between two rotors: 6.328 cm

• Rotor radius: 4.7 cm

• Thrust constant: $C_T = 2.39 \times 10^{-5}$ kg·m/rad²

• Max rotor speed: 140 rad/s

3.3.1 Ground effect

To simulate the ground effect, I used the model of the ground effect introduced in section 2.3. Recall equation (2.13):

$$\frac{T_{\mathrm{input}}}{T_{\mathrm{output}}} = 1 - \rho \left( \frac{R}{4z} \right)^2, \qquad (3.1)$$

with $T_{\mathrm{input}} = f_{\mathrm{rotor}}$ and $T_{\mathrm{output}} = f_{\mathrm{rotor}} + f_{\mathrm{ge}}$. Together these equations give:

$$\frac{f_{\mathrm{rotor}}}{f_{\mathrm{rotor}} + f_{\mathrm{ge}}} = 1 - \rho \left( \frac{R}{4z} \right)^2. \qquad (3.2)$$

Recall also equation (2.14):

$$f_{\mathrm{rotor}} = C_T \cdot n^2. \qquad (3.3)$$

Combined with equation (3.2), this gives the ground effect $f_{\mathrm{ge}}$, a vertical force added in the simulator on each rotor and computed as follows:

$$f_{\mathrm{ge}} = C_T n^2 \left( \frac{1}{1 - \rho \left( \frac{R}{4z} \right)^2} - 1 \right). \qquad (3.4)$$


The hyper-parameter $\rho$ was varied to create different ground effect strengths in the simulator. I will denote the hyper-parameter of the simulated ground effect by $\rho_{\mathrm{sim}}$, to distinguish it from the parameter used during training of the models. To avoid problems with an unbounded ground effect, the ground effect was modelled as constant for all altitudes below a certain cut-off value. The cut-off value was chosen as the smallest two-decimal value larger than the break point where the ground effect goes to infinity; it was thus chosen as 0.02 m for both $\rho = 1$ and $\rho = 2$.
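To make the simulated force concrete, here is a minimal Python sketch of equation (3.4) with the cut-off described above; the constants follow the drone specification in section 3.3, while the function name and interface are illustrative:

```python
def ground_effect(n, z, rho, C_T=2.39e-5, R=0.047, z_cutoff=0.02):
    """Vertical ground-effect force [N] on one rotor, equation (3.4).

    n: rotor speed [rad/s], z: rotor altitude [m], rho: strength parameter.
    Below z_cutoff the force is held constant to avoid the singularity."""
    z = max(z, z_cutoff)                 # cut-off for small altitudes
    ratio = rho * (R / (4 * z)) ** 2
    f_rotor = C_T * n ** 2               # commanded rotor thrust, eq. (3.3)
    return f_rotor * (1 / (1 - ratio) - 1)

# Example: a rotor spinning at 100 rad/s close to the ground.
f_ge = ground_effect(n=100.0, z=0.05, rho=1.0)
```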

Figure 3.2 shows the behaviour of the ground effect for different values of the altitude. The ground effect is very large close to the ground, while it quickly approaches zero as the altitude increases.

Figure 3.2 – The ground effect [N] over altitude [m] for different hyper-parameter values, denoted in the legend. The rotor speeds were chosen to have the drone hovering at each altitude.

3.4 Controller

Essentially, we want to steer the drone by controlling the speed of each of its rotors. To compute the necessary speeds of the four rotors, we first need to compute the thrust and moments required to reach some desired way-point.

The rotor speed $n_i$ of each rotor $i$ of the drone is related to the force $f_i$ produced by that rotor as follows:

$$\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \end{pmatrix} = C_T \begin{pmatrix} n_0^2 \\ n_1^2 \\ n_2^2 \\ n_3^2 \end{pmatrix}. \qquad (3.5)$$

This in turn is related to the moments and thrust according to the force and moment balance equations

$$\begin{aligned} T &= f_0 + f_1 + f_2 + f_3 \\ M_i &= k_i (f_0 + f_1 - f_2 - f_3) \\ M_j &= k_j (-f_0 + f_1 + f_2 - f_3) \\ M_k &= k_k (-f_0 + f_1 - f_2 + f_3) \end{aligned} \qquad (3.6)$$

where $k_i$, $k_j$ and $k_k$ are drone-specific constants.

Three different frames of reference are used in these calculations. First, the inertial frame, where we denote the axes $x$, $y$ and $z$, with $z$ the vertical axis. Second, we define a frame rotated with the yaw of the drone compared to the inertial frame, but with the third axis remaining vertical; we denote this frame with axes $u$, $v$ and $w$. The last frame is the body frame of the drone, denoted $i$, $j$ and $k$, where $k$ corresponds to up relative to the drone.

First, a desired position or way-point $p_d = (x_d, y_d, z_d)$ and a desired yaw angle $\psi_d$ are decided. In this application, the desired yaw will always be zero. We want the drone to stop at each way-point; hence the desired velocity and acceleration are set to zero.

Let us first consider the position of the drone. With a proportional-derivative controller we get the control signal

$$u = \alpha_0 (p_d - p) + \alpha_1 (\dot{p}_d - \dot{p}) + \ddot{p}_d \qquad (3.7)$$

where $u = \ddot{p}$. If the error between the desired and the true value is denoted by $e$, we get the following equation:

$$\ddot{e} + \alpha_1 \dot{e} + \alpha_0 e = 0. \qquad (3.8)$$

The characteristic polynomial of this equation can be written as

$$P(s) = s^2 + \alpha_1 s + \alpha_0. \qquad (3.9)$$

To ensure stability, the roots of this polynomial are chosen with a negative real part [8]. We choose the solution $\alpha_0 = 1$ and $\alpha_1 = 2$, putting the roots at $s = -1$. With these values inserted in equation (3.7), we can derive the acceleration $a_{xyz} = u$ needed to compute the command to the drone.

To determine the desired thrust, we look at Newton's second law described in the $uvw$-frame. The total required thrust force $f_{\mathrm{rotors}} = (0, 0, T)$, described in the $ijk$-frame, has here been rotated by the roll and pitch (the middle term on the right-hand side of equation (3.10)). The acceleration and ground effect are rotated from the inertial frame by the yaw:

$$m \begin{pmatrix} a_u \\ a_v \\ a_w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -mg \end{pmatrix} + T \begin{pmatrix} \cos\phi \sin\theta \\ -\sin\phi \\ \cos\phi \cos\theta \end{pmatrix} + \begin{pmatrix} f_{\mathrm{ge},u} \\ f_{\mathrm{ge},v} \\ f_{\mathrm{ge},w} \end{pmatrix} \qquad (3.10)$$

Here $m$ is the drone mass, $g$ the gravitational acceleration constant and $f_{\mathrm{ge},uvw}$ the total ground effect acting on the drone.

The thrust can then be derived as follows:

$$T = \frac{m a_w + mg - f_{\mathrm{ge},w}}{\cos\phi \cos\theta} \qquad (3.11)$$

The angles $\theta$ and $\phi$ are given from equations (3.10) and (3.11) as

$$\theta_{dc} = \arctan\left( \frac{a_u - f_{\mathrm{ge},u}/m}{a_w + g - f_{\mathrm{ge},w}/m} \right) \qquad (3.12)$$

$$\phi_{dc} = \arctan\left( \frac{a_v - f_{\mathrm{ge},v}/m}{-a_w - g + f_{\mathrm{ge},w}/m} \right) \qquad (3.13)$$

I denote these angles with the subscript $dc$, meaning a desired angle used to obtain the commanded thrust, as opposed to $\psi_d$, which is a real desired angle.

Next, we compute the moment of the drone while stabilising the attitude movements. Following the same steps as in equations (3.7) and (3.8), we get

$$u = \begin{pmatrix} \ddot\psi_d \\ \ddot\theta_{dc} \\ \ddot\phi_{dc} \end{pmatrix} + 2 \left( \begin{pmatrix} \dot\psi_d \\ \dot\theta_{dc} \\ \dot\phi_{dc} \end{pmatrix} - \begin{pmatrix} \dot\psi \\ \dot\theta \\ \dot\phi \end{pmatrix} \right) + \begin{pmatrix} \psi_d \\ \theta_{dc} \\ \phi_{dc} \end{pmatrix} - \begin{pmatrix} \psi \\ \theta \\ \phi \end{pmatrix} \qquad (3.14)$$

with $u = (\ddot\psi, \ddot\theta, \ddot\phi)^T$.

The rotational speeds are given by the simulator as a rotation around the $i$, $j$ and $k$ axes of the drone, represented as $\omega$. This relates to the rotational speed in Euler angles as $(\dot\psi, \dot\theta, \dot\phi)^T = R\,(\omega_i, \omega_j, \omega_k)^T$, with $R$ being a rotation matrix. If we differentiate this expression, we get the rotational acceleration as

$$\begin{pmatrix} \ddot\psi \\ \ddot\theta \\ \ddot\phi \end{pmatrix} = \dot{R} \begin{pmatrix} \omega_i \\ \omega_j \\ \omega_k \end{pmatrix} + R \begin{pmatrix} \dot\omega_i \\ \dot\omega_j \\ \dot\omega_k \end{pmatrix} \qquad (3.15)$$

Finally, the moment is directly related to the rotational acceleration as $M = I\dot\omega$, where $I$ is the inertia matrix of the drone. Now, with the thrust $T$ and the moment $M$, we can go back to equations (3.5) and (3.6) to compute the commanded rotor speeds.
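As an illustration of this last step, the following sketch inverts equations (3.5) and (3.6) to obtain commanded rotor speeds from a desired thrust and moments. The mixing matrix encodes equation (3.6); the constant values are placeholders, not the drone's actual $k_i$, $k_j$, $k_k$:

```python
import numpy as np

def rotor_speeds(T, M_i, M_j, M_k, C_T=2.39e-5, k_i=1.0, k_j=1.0, k_k=1.0):
    """Solve equations (3.5)-(3.6) for the commanded rotor speeds n_0..n_3."""
    # Mixing matrix: rows map rotor forces (f0..f3) to (T, M_i, M_j, M_k).
    A = np.array([[   1,    1,    1,    1],
                  [ k_i,  k_i, -k_i, -k_i],
                  [-k_j,  k_j,  k_j, -k_j],
                  [-k_k,  k_k, -k_k,  k_k]])
    f = np.linalg.solve(A, np.array([T, M_i, M_j, M_k]))  # rotor forces
    return np.sqrt(np.clip(f, 0.0, None) / C_T)           # n_i = sqrt(f_i / C_T)

# Example: a hover-like command (thrust ~ m*g for the 56 g drone, no moments).
n = rotor_speeds(T=0.55, M_i=0.0, M_j=0.0, M_k=0.0)
```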


3.5 Data gathering and network training

3.5.1 Gathering data

During data gathering, the true model of the applied ground effect was used in the controller, to simulate an expert pilot, as was done when gathering data for the Neural Lander [7]. The drone was then set to follow way-points randomly sampled from a uniform distribution with $-0.25\ \mathrm{m} \leq x, y \leq 0.25\ \mathrm{m}$. To get enough data in the most relevant area, closer to the ground, two out of three way-points were sampled at altitudes $0 \leq z \leq 0.25\ \mathrm{m}$ and one out of three at $0 \leq z \leq 1.5\ \mathrm{m}$.

Data was gathered separately for two different values of the ground-effect hyper-parameter, $\rho = 1.0$ and $\rho = 2.0$. Approximately 10,000 data points were gathered for each hyper-parameter value, corresponding to approximately 15–20 minutes of flight time.

The measured data was a state vector $\zeta$ of the drone, consisting of the altitude $z$, velocity $v_x, v_y, v_z$, attitude $q_w, q_x, q_y, q_z$ and rotor speeds $n_0, n_1, n_2, n_3$ of the four rotors. The acceleration $a_x, a_y, a_z$ was also measured, to be able to calculate the target ground effect as follows:

$$f_{\mathrm{ge}} = ma - mg - f_{\mathrm{rotors}} \qquad (3.16)$$

where $f_{\mathrm{rotors}}$ was calculated as the sum of the individual rotor forces $f_{\mathrm{rotor}}$ (equation (2.14)).
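The following sketch shows how one target sample could be formed from a measurement according to equation (3.16). Treating gravity as a vector along the negative $z$-axis and rotating the total thrust by the drone attitude are my assumptions about the bookkeeping, not details given in the text:

```python
import numpy as np

def ge_target(a, rotor_speeds, R_body_to_world, m=0.056, C_T=2.39e-5):
    """Target ground effect f_ge = m*a - m*g - f_rotors, equation (3.16).

    a: measured acceleration [m/s^2] in the inertial frame;
    R_body_to_world: rotation matrix of the drone attitude."""
    g = np.array([0.0, 0.0, -9.81])                 # gravity vector
    thrust = C_T * np.sum(np.square(rotor_speeds))  # total rotor thrust, eq. (2.14)
    f_rotors = R_body_to_world @ np.array([0.0, 0.0, thrust])
    return m * np.asarray(a) - m * g - f_rotors
```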

3.5.2 Data preprocessing

First, some potential outliers were sorted out by putting strict upper and lower limits on the z-value of the ground effect and simply removing the data points with values outside of this range. The range I used was $-10^{-3}\ \mathrm{N} \leq f_{\mathrm{ge},z} \leq 1\ \mathrm{N}$, which removed only a small fraction of the data set. I chose a lower boundary slightly below zero; setting the lower limit to exactly zero would have removed a large fraction of data points that were approximately zero but had been measured as small negative values.

Then the data was standardised by calculating the standard deviation and mean value of each feature over the full set of offline training data. These two vectors of values were used to translate and scale all data points, in both training and testing and, further on, in the online learning updates, all according to formula (2.28).


3.5.3 Network

I set up a deep feed-forward neural network with 3 hidden layers of dimensions 25, 30 and 15. The input was the 12-dimensional drone state vector $\zeta$ and the output was the 3-dimensional ground effect. Predicting the ground effect as a 3-dimensional vector allows the network to learn a more general ground effect. ReLU activation functions were applied in each hidden layer, and spectral normalisation was used on all the weights in each layer, following the example of [7].
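A PyTorch sketch of a network with this architecture (12 → 25 → 30 → 15 → 3, ReLU). Here I use PyTorch's built-in torch.nn.utils.spectral_norm as a stand-in for the scaled normalisation of equation (2.33), so this is an approximation of the setup described above rather than its exact implementation:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

class GroundEffectNet(nn.Module):
    """12-dim drone state -> 3-dim ground-effect prediction."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            spectral_norm(nn.Linear(12, 25)), nn.ReLU(),
            spectral_norm(nn.Linear(25, 30)), nn.ReLU(),
            spectral_norm(nn.Linear(30, 15)), nn.ReLU(),
            spectral_norm(nn.Linear(15, 3)),  # linear output layer
        )

    def forward(self, state):
        return self.layers(state)
```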

Offline training

The input-output data pairs were first shuffled into a random order and then separated into a training and a validation set, with approximately 80% of the data in the training set and 20% in the validation set. The network was then trained with the Adam optimiser for approximately 1000 epochs, until the validation error converged. For the first 10 epochs a learning rate of $10^{-3}$ was used, after which it was decreased to $10^{-4}$. These values were chosen by testing different values and observing how the error curve behaved. The training was done on mini-batches of size 64, and the loss function used was the mean squared error of the predicted ground effect compared to its target values.

Online training

The online learning was done similarly to the offline learning, except that here the data was gathered for updating the network during flight and was therefore given in sequential order. The network was updated as soon as a batch of 64 data points had been gathered. Since the data was gathered at a frequency of 100 Hz, the network was updated a little less than twice every second.

I also implemented two modified versions of online learning: one that I call Online constraint and another that I call Online margin. These two methods follow the same implementation as the naive online learning method described above, but with a modification to the loss function.

The idea of these modifications was to restrict too-large changes of the network parameters, to avoid catastrophic forgetting while still allowing the model to change over time. This is similar to regularisation methods such as L1 regularisation and the regularisation techniques [28], [29], [30] discussed in section 2.5.1, but instead of penalising weights for deviating from zero, I penalise weights that deviate from the weight values learned offline.
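The exact forms of the Online constraint and Online margin losses are not given in this excerpt; the following sketch only shows the general idea of such a modified loss, penalising deviation from the offline-trained weights. The quadratic penalty form and its weighting are my illustrative assumptions, not the thesis's definitions:

```python
import copy
import torch

model = torch.nn.Linear(12, 3)        # placeholder for the trained network
offline_model = copy.deepcopy(model)  # frozen offline-trained reference
for p in offline_model.parameters():
    p.requires_grad_(False)

def online_loss(x_batch, t_batch, lam=1.0):
    """MSE on new flight data plus a penalty on parameters that
    drift away from the values learned offline (illustrative form)."""
    mse = torch.nn.functional.mse_loss(model(x_batch), t_batch)
    drift = sum(((p - p_off) ** 2).sum()
                for p, p_off in zip(model.parameters(),
                                    offline_model.parameters()))
    return mse + lam * drift
```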
