
UPTEC F 20030

Degree project, 30 credits

June 2020

Adaptive detection of anomalies in the Saab Gripen fuel tanks using machine learning

Carl Tysk


Abstract

Adaptive detection of anomalies in the Saab Gripen fuel tanks using machine learning

Carl Tysk, Jonathan Sundell

Gripen E, a fighter jet developed by Saab, has to fulfill a number of specifications and is therefore tested thoroughly. This project is about detecting anomalies in such tests and thereby improving the automation of the test data evaluation.

The methodology during this project was to model the expected deviation between the measured signals and the corresponding signals from a fuel system model using machine learning methods. This methodology was applied to the mass in one of the fuel tanks. The challenge lies in the fact that the expected deviation is unknown and dependent on the operating conditions of the fuel system in the aircraft. Furthermore, two different machine learning approaches to estimate a prediction interval, within which the residual was expected to be, were tested. These were quantile regression and a variance estimation based method. The machine learning models used in this project were LSTM, Ridge Regression, Random Forest Regressor and Gradient Boosting Regressor.

One of the problems encountered was imbalanced data, since different operating modes were not equally represented. Also, whether the time dependency of the signals had to be taken into account was investigated. Moreover, choosing which input signals to use for the machine learning methods had a large impact on the result. The concept appears to work well. Known anomalies were detected, and with a low degree of false alarms. The variance estimation based approach seems superior to quantile regression. For data containing anomalies, the target signal drifted significantly outside the boundaries of the prediction interval. Such test flights were flagged for anomaly. Furthermore, the concept was also successfully verified for another fuel tank, with only minor and obvious adaptions, such as replacing the target signal with the new one.


Popular science summary

Gripen E, a fighter jet under ongoing development by Saab, must fulfill a range of specifications, set both by the company's customers and by authorities. Therefore, test flights are carried out in which it is verified that the aircraft meets these requirements. Flight test engineers then analyze the data from these test flights, which requires a lot of resources and time. Thanks to simulation models from Saab, it is possible to compare simulated signals with the corresponding measured signals from the test flights. If they differ noticeably, an anomaly may have occurred in the aircraft or in the simulation.

This work concerns the next step in making this comparison more efficient. The focus is on the fuel system, and more specifically on the difference between the measured and simulated mass in one fuel tank.

To quote Donald Rumsfeld:

"There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns - the ones we don't know we don't know."

The physical model is the "known knowns". These are properties of the fuel system that are known. The "known unknowns" are the known approximations in the simulation model. For example, the fuel tanks are modeled as cuboids when they in fact have a considerably more complex structure. Finally, the "unknown unknowns" are systematic differences and deviations that are unknown. The advantage of machine learning is that it is well suited to finding not only the "known unknowns" but also the "unknown unknowns", unlike more traditional methods that start from known trends and limitations of the system.

The goal of the project is to use machine learning to train a model that finds the expected deviation between simulation data and the corresponding measurement data. This model can then be used to detect when these deviate from each other to an unexpectedly high degree. The machine learning models used are LSTM, Ridge Regression, Random Forest Regressor and Gradient Boosting Regressor. They were used not only to estimate a normal difference but also to estimate a prediction interval within which the difference was expected to lie. Two different methods, one based on quantile regression and one on estimated variance, were tested for estimating the prediction intervals.

In addition, the amount of data has been limited, since the number of available data sets from the test flights was low, especially at the beginning of the project. Whether the time dependency of the signals must be taken into account has also been investigated. Different ways of processing the data have been explored and used, such as different scalings of the data, low pass filtering and different ways of removing data that does not carry essential information. The choice of input signals has been of great importance for the result.

The concept seems to work. Known anomalies were detected with a low level of false alarms. In addition, a suspected anomaly was disproved with the help of the model: when fuel system engineers analyzed the data more closely, it turned out that the signals in fact behaved as expected but were affected by rare conditions. When the difference between the measured and simulated tank level deviated significantly outside the prediction intervals, the test flight was flagged for anomaly. The results indicate that the method based on variance estimation is superior to quantile regression.


Acknowledgments

This thesis was performed at the division Concepts, Modelling and Simulation of Vehicle Systems at Saab. We would like to thank Saab for the opportunity to study the advanced technology of the fighter jet Gripen, especially its fuel system. Also, we are very grateful for the level of trust shown to us, in that we got to study highly classified material. Our work was supervised by Doctor Ylva Nilsson, employee at Saab. Without her, the depth of the analysis would have been severely limited. Her support and ideas were crucial for the work to be successfully carried out. Furthermore, we would like to thank the System Engineer Olof Bengtsson. His knowledge of the fuel system and viewpoints were of extreme importance for the project.


Contents

1 Introduction 8
1.1 Background . . . 8
1.2 Purpose . . . 10
1.3 Intended User . . . 10
1.4 Limitations . . . 10
1.5 Requirements . . . 11
1.6 Idea . . . 11

2 Theory 12
2.1 Overview of the fuel system . . . 12

2.1.1 Transfer pump and jet pumps . . . 13

2.1.2 Fuel level/volume/mass - measurement . . . 14

2.1.3 Signals . . . 14

2.1.4 Inclination in the tank levels . . . 15

2.1.5 Physical limitations . . . 16

2.2 Prediction intervals . . . 16

2.2.1 Quantile Regression . . . 17

2.2.2 Variance estimation . . . 19

2.3 Machine learning algorithms . . . 19

2.3.1 Ridge Regression . . . 20

2.3.2 Decision Trees . . . 21

2.3.3 Random Forest Regressor . . . 22

2.3.4 Gradient Boosting Regressor . . . 22

2.3.5 Long Short-Term Memory . . . 23

2.4 Machine Learning preprocessing . . . 24

2.4.1 Low pass filtering . . . 24

2.4.2 Feature selection . . . 25

2.4.3 Linear correlation coefficient . . . 26

2.4.4 Feature importances . . . 26

2.4.5 Principal Component Analysis . . . 26

2.4.6 Sampling methods . . . 28

2.4.7 Scaling parameters . . . 28

2.5 Applying the machine learning algorithm . . . 29

2.5.1 Cross Validation . . . 29

2.5.2 Tuning of the hyperparameters . . . 31

2.6 Model Verification . . . 31

2.7 Anomaly detection . . . 32


3 Method 34

3.1 Prediction Intervals . . . 34

3.1.1 Quantile Regression . . . 34

3.1.2 Variance Estimation . . . 35

3.2 Data acquisition . . . 36

3.3 Visualizing the data . . . 37

3.4 Machine Learning preprocessing . . . 39

3.4.1 Test and train data separation . . . 40

3.4.2 Data cleaning . . . 41

3.4.3 Low pass filtering . . . 42

3.4.4 Inclination in tank levels . . . 42

3.4.5 Label . . . 42

3.4.6 Accumulated signals . . . 43

3.4.7 Cut signals . . . 44

3.4.8 One hot encoding . . . 44

3.4.9 Feature selection . . . 45

3.4.10 Sampling methods . . . 46

3.4.11 Scaling . . . 52

3.4.12 Framing for time dependent solution . . . 52

3.4.13 Features and labels data splitting . . . 53

3.5 Applying the machine learning algorithm . . . 53

3.5.1 Tuning hyperparameters . . . 53

3.5.2 Cross Validation . . . 53

3.5.3 Ridge Regression and Random Forest Regressor . . . 53

3.5.4 Gradient Boosting Regressor . . . 54

3.5.5 LSTM . . . 54

3.6 Model verification . . . 55

3.6.1 Mean square error, mean absolute error, R2 . . . 55

3.7 Anomaly detection . . . 55

3.7.1 Cusum algorithm . . . 56

4 Results 58
4.1 Preprocessing . . . 58

4.1.1 Choosing features . . . 58

4.1.2 Principal component analysis . . . 62

4.1.3 Stratified sampling . . . 64

4.1.4 Uneven oversampling . . . 67

4.2 Model . . . 67

4.2.1 Test flights without any known anomalies . . . 67

4.2.2 Variance estimation model . . . 73

4.2.3 Accuracy comparison . . . 73

4.2.4 Generalization . . . 75

4.3 Detection . . . 78


4.3.2 Anomaly . . . 79

4.3.3 Initial offset . . . 81

4.3.4 Leakage . . . 83

4.3.5 Main generator turned off . . . 86

4.3.6 Possible anomaly . . . 88


1 Introduction

This master thesis revolves around the research and development of the next generation fighter jet "JAS 39E/F Gripen", developed by Saab AB in Linköping, Sweden. The aircraft is also referred to as "Gripen E" or "Gripen F" depending on whether it is the single- or two-seater model respectively. In this thesis the aircraft will from now on be referred to as "Gripen".

When developing a next generation fighter jet, one of the most important work areas is to test whether the requirements from the customer are fulfilled. At Saab in Linköping, the department "Flight test and verification" performs a large number of flight tests in order to verify the performance and functionality of Gripen. During such flights, data from multiple sensors in many different aircraft systems is recorded for later analysis.

1.1 Background

Analyzing all the signals from all the test flights, to verify that everything proceeded as expected, is a very time demanding and rather complex task. Firstly, the number of measurement signals is very large. Secondly, there is a huge number of flights to be analyzed. As it is now, the flight test engineers have to do this analysis manually.

Therefore, Saab is investigating the possibility of automating this procedure. A new infrastructure is being developed where the flight engineers are able to compare data from the flights with data from a simulation model. The purpose of the simulation model is to capture the most relevant behaviour of the system and its sensors. The simulation model takes in control signals recorded during the flight itself together with surrounding environment data, also recorded during the flight. Examples of control signals are the pilot pressing the throttle or a valve being opened or shut. Surrounding environment data is for instance the static reference pressure, the temperature of the air surrounding the aircraft, and the flight speed.


Figure 1: A figure which illustrates the process of data storage and analysis of flight data.

The comparison and analysis between simulation and measurement data can be seen in Figure 1 as the box named "Analyzis & report generation". The collected data for analysis comes from "Storage B". Firstly, the aircraft performs some maneuvers to verify that everything functions properly. The measurement data is stored in a place referred to as "Storage A". Thereafter, it goes through the necessary pre-processing and the data is stored in Storage B. Also, the processed data is sent to the simulation model, the OMSimulator, which generates data that should correspond to the expected measurement data.

There are occasions when the simulation behaves very differently from the measurement signals. In such cases, the data is analyzed by the engineers and an investigation is carried out whether there was something wrong in the system, or if it is simply the simulation model that fails to capture the normal behaviour at these operating conditions. That gives an intuition about limitations in both the measurement equipment and the simulation model.

Naturally, the measurement data is not identical to the output from the simulation model. The measurements suffer from inexact readings due to limited resolution, faulty readings due to extreme conditions during the flight, and unwanted noise. The simulation model suffers from inexact estimations due to approximations of the physical system. For instance, the fuel tanks are modeled as cuboids even though they have a much more complex structure.


degrees of freedom that have to be considered.

1.2 Purpose

The described simulation tool has become a step forward in automatically finding unexpected deviations in the data. However, the result still needs to be analyzed manually and the automation can be extended further. Manually going through all the data to check whether the measurement and simulation are in agreement or not is very inefficient. Therefore, it would be a large improvement to find an automatic procedure to filter out all the expected data, and only visualize the data that contains anomalies. Anomalies are data points where the simulation and measurements differ from each other in a manner that is unexpected.

The purpose of this thesis is to predict what the difference between the measurement and the corresponding simulation data normally is. This difference will from now on be called the residual. For instance, the residual for the fuel level in tank 2 is simply the measured fuel level in tank 2 minus the simulated fuel level for tank 2.

1.3 Intended User

The intended user of the automation tool is staff with much knowledge and experience of the particular sub-system, such as flight test engineers, system engineers or model developers. This is due to the fact that the user has to be able to find the cause of the anomaly. The tool is only for automatically finding events of anomalies, not identifying the cause. The application will be used off-line, after the flight has been completed and the data sets have been stored in Storage B.

1.4 Limitations

There are many different subsystems in the aircraft, but the investigation in this thesis is limited to the fuel system. However, the concept could be generalized to include other systems. Furthermore, the investigation will primarily be focused on predicting the residual of fuel tank 2. A few other signals will also be investigated. In addition, the methodology will also be verified for tank 3.


The aim of this project is not to improve the simulation model or the measurement system, but to find an automatic way of detecting where they are not functioning as intended. If the application, during the work process, indicates imperfections in the simulation model, that information would obviously be valuable. But to keep the data set untouched during this working process, no corrections regarding the simulation model will be made.

A robust way of detecting anomalies, in the sense of filtering out outlier values, is outside the scope. It would be sufficient to raise a warning flag as soon as the unseen residual is outside of the prediction interval. However, a more advanced approach will be tested.

1.5 Requirements

All the necessary code shall be written in Python and be structured as a module. This makes it possible for the staff to import the application. Naturally, the code should be well documented, well structured and easy to follow. Furthermore, the pre-processing steps should be divided into several functions. This will make it easy to improve and develop the application. The code should, continuously, be version controlled in Git in a given repository.

The model should predict what the normal residual value should be given certain inputs, a prediction interval around that value in which the residual is expected to be, and return a flag message if the measured residual is outside the boundaries of the interval. The prediction interval should be dependent on the input signals, since if, for instance, the aircraft is flying upside down, it is expected that the prediction interval becomes larger because the flying conditions make the data more varied and unreliable.

1.6 Idea

The prediction model for the residual will be achieved using machine learning. The algorithm should be trained, in a supervised manner, to find how to weight the input data, measured sensor data from the aircraft and simulation signals, to estimate the target signal, which is the residual between flight data and the corresponding simulation data.


2 Theory

Since the focus of this thesis is on the fuel system in Gripen, some basic knowledge about the fuel system needs to be introduced. As mentioned in section 1.4, the thesis is limited to a single aircraft individual, and this section is therefore specific to this individual regarding the aircraft and the fuel system design.

2.1 Overview of the fuel system

The fuel of the aircraft is stored in several different tanks spread out in the aircraft. However, the only tank that directly provides fuel for the engine is tank 1. A pump makes sure that this tank is constantly fed with fuel from the other tanks if its fuel volume is under a certain threshold. This pump is referred to as the transfer pump. One of the main reasons for having several different tanks in an aircraft is to be able to control the center of gravity of the aircraft by transferring fuel between the tanks [11]. Another attribute of the fuel system is that it is possible to pressurize the tanks. This pressurization helps the transfer pump to transfer fuel with more ease and helps the fuel not to boil at high altitudes.

Figure 2: A figure which shows where the tanks are located in the aircraft and how they are labeled.


VT stands for "Venting Tank" and works as a tank for venting of the pressurization system, but also as a last buffer tank, a place for the fuel to end up in instead of other parts of the aircraft. The Venting Tank is generally dry most of the time and is therefore also a good place to put the transfer pump.

The basic idea of the fuel system, regarding fuel transfer, is to empty the outer tanks first and make sure that there is fuel left in tank 1 as long as possible.

2.1.1 Transfer pump and jet pumps

The transfer pump is driven by the aircraft's hydraulic system. The control signal of the transfer pump is a small current (mA range) and is sent to the transfer pump as steps with different amplitudes. When the amplitude of the signal is at its largest, the pump will work at max capacity. Because of the priorities of the hydraulic system, the pump is controlled to run at max capacity only when flying at low altitudes with high fuel consumption of the engine.

To help the transfer pump with fuel transfer, jet pumps are placed in some of the tanks.


2.1.2 Fuel level/volume/mass - measurement

Figure 4: A figure which shows the probes and the fuel level in a typical tank. The fuel system consists of multiple tanks.

As seen in Figure 4, there may be several probes located in each tank. The probes are capacitive and the fuel tank levels are calculated from

C_{tot} = C_{stray} + C_{dry} + x(K - 1)C_{dry}   (1)

where C_{tot} is the total measured capacitance, C_{stray} is a stray capacitance in the probes and C_{dry} is the capacitance when the probe is totally dry. x is the immersed ratio of the fuel probe and K the relative permittivity. When x = 1 the probe is completely immersed in the fuel. The relative permittivity can then be calculated as

K - 1 = \frac{C_{tot} - C_{stray} - C_{dry}}{C_{dry}}.   (2)

There is a known statistical relationship between permittivity and density which can be stated as

\rho = \frac{K - 1}{(K - 1)b + a}   (3)

where \rho is the density and a and b are fuel specific constants.
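To make the chain from probe capacitance to tank mass concrete, the sketch below applies equations (1)-(3) in Python, the language used in the rest of the project. The constants, the example values and the volume lookup are hypothetical placeholders, not actual Gripen probe data.

```python
import numpy as np

def fuel_density(c_tot, c_stray, c_dry, a, b):
    """Density of the fuel for a fully immersed probe (x = 1), following Eqs. (1)-(3).

    a and b are fuel specific constants; the values passed below are placeholders.
    """
    k_minus_1 = (c_tot - c_stray - c_dry) / c_dry   # Eq. (2)
    return k_minus_1 / (k_minus_1 * b + a)          # Eq. (3)

# Hypothetical numbers purely for illustration.
rho = fuel_density(c_tot=120.0, c_stray=5.0, c_dry=50.0, a=0.7, b=0.001)
volume = 0.4          # assumed to come from the level-to-volume tables of section 2.1.3
mass = rho * volume   # fuel mass in the tank
```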

2.1.3 Signals

A lot of signals from the fuel system can be measured and stored for later use. The signals can be divided into two groups: measurement signals and control signals. Measurement signals are real signals and consist mainly of:

• Capacitances
• Temperatures
• Pressures

The capacitances are measured from the probes in the different tanks. The temperatures are measured by temperature sensors placed around the system, for example after the boost pump. The pressures come from pressure sensors connected to the air pressure system and the different tanks.

From the measured capacitance the volume can be calculated using known tables. The reason for this approach is how the probes are placed and the special geometry of the tanks. From this volume the mass can then be calculated, since the density of the fuel is known.

2.1.4 Inclination in the tank levels

When the aircraft accelerates to gain speed or climbs to increase altitude, measurements of the acceleration in either direction can be made during the flight.

Figure 5: How the acceleration in either direction is labeled

These accelerations correspond to the inclination of the fuel surface. This inclination can indirectly be measured from the accelerations and translated to two measures. The two measures are called the Pitch, calculated as

\theta = \arctan\left(\frac{-N_x}{N_z}\right) \cdot (180/\pi)   (4)

and the Roll, calculated as

\phi = \arctan\left(\frac{-N_y}{N_z}\right) \cdot (180/\pi).   (5)
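As a small illustration, the two angles can be computed directly from the recorded accelerations; the sketch below is a plain transcription of equations (4) and (5), with N_x, N_y and N_z given as arrays of toy values.

```python
import numpy as np

def pitch_roll(nx, ny, nz):
    """Pitch (theta) and roll (phi) in degrees from the accelerations, Eqs. (4)-(5)."""
    theta = np.arctan(-nx / nz) * (180.0 / np.pi)
    phi = np.arctan(-ny / nz) * (180.0 / np.pi)
    return theta, phi

# Toy values standing in for recorded load factors.
theta, phi = pitch_roll(np.array([0.1, 0.2]), np.array([0.0, 0.05]), np.array([1.0, 0.98]))
```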


2.1.5 Physical limitations

Naturally, there exist physical limitations for the aircraft and its systems in practice. As mentioned, the measurement probes in the fuel tanks are not capable of exact readings due to fuel splashing and unwanted measurement noise. Another limitation regarding the tanks and their measurement probes is that they are not capable of measuring the fuel during large inclinations of the aircraft. This is because the probes become either fully submerged in fuel or, the opposite, completely clear of any fuel, because of how they are positioned in the tanks. According to fuel system experts, the measurements can be trusted for known intervals on \theta and \phi:

-a < \phi < a   (6)

and

-b < \theta < c   (7)

where a, b and c are classified constants. For the measurements there is also the limitation of the sampling frequency. The resolution of the sampling frequency is high relative to the resolution the data is saved at; the data is saved at a frequency of 1 Hz.

2.2 Prediction intervals


Figure 6: A conceptual figure of the prediction interval

Figure 6 illustrates the function of the prediction intervals. If the target signal is within the expected intervals the values are considered normal.

2.2.1 Quantile Regression

Quantile regression can be used to get an estimate of the prediction interval [10]. Let Y be the target signal with a cumulative distribution function

F_Y(y) = P(Y \le y).   (8)

Thus, the \tau th quantile is calculated as

Q_Y(\tau) = \inf\{y : F_Y(y) \ge \tau\}   (9)

where \tau is a value between 0 and 1.

Let \mu(x, \hat{\beta}) be a regression function that maps input data x with parameters \hat{\beta} to the target signal Y. For instance, \hat{\beta} could be the parameters that minimize the squared residuals,

\hat{\beta} = \arg\min_\beta \sum_i (Y_i - \mu(X_i, \beta))^2.   (10)

Let f(x, \hat{\beta}) be the quantile regression function. Minimizing the expected loss of Y - \mu(x, \hat{\beta}) using quantile regression can be stated as

\min_f \, E\big[L_\tau(Y - f(x, \hat{\beta}))\big]   (11)

where L_\tau is the quantile loss, a check function defined by

L(\xi \mid \tau) =
\begin{cases}
\tau \, \xi & \xi \ge 0 \\
(\tau - 1) \, \xi & \xi < 0
\end{cases}   (12)

\xi = y - \hat{y} is the error term, the difference between the estimate and the target signal (the residual in a tank), and \tau is the penalizing factor. Setting \tau equal to 0.5 means that the model simply tries to minimize the overall mean absolute error. Setting \tau to 0.75 penalizes under-predictions with weight 0.75 whereas over-predictions are penalized with 1 - 0.75 = 0.25. Thus, a model trained with \tau = 0.75 will tend to over-predict so that the errors of its over-predictions become as small as possible, and an upper prediction bound is obtained. Similarly, setting \tau = 0.25 penalizes over-predictions three times as much as under-predictions, giving a lower bound. Thus, the aim is to minimize

(\tau - 1) \int_{-\infty}^{y} \xi \, dF_\xi(y) + \tau \int_{y}^{\infty} \xi \, dF_\xi(y)   (13)

under a given loss function.

Figure 7: Illustrates the penalty for different errors and quantiles. The error axis is a made up axis from -1 to 1 to visualize how the algorithm works.
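To make the check function concrete, here is a minimal Python sketch of the quantile (pinball) loss of equation (12), averaged over a batch of samples; the variable names are illustrative only.

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau):
    """Check loss of Eq. (12), averaged over all samples."""
    xi = y_true - y_pred                                    # error term xi = y - y_hat
    return np.mean(np.where(xi >= 0, tau * xi, (tau - 1) * xi))

# With tau = 0.75 under-predictions are penalized three times as hard as
# over-predictions, so a model minimizing this loss gives an upper bound.
```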


2.2.2 Variance estimation

Let the target signal y be linearly dependent on k input variables x_1, x_2, ..., x_k,

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k   (14)

where \beta = [\beta_0, \beta_1, \beta_2, ..., \beta_k]^T \in \mathbb{R}^{k \times 1} is a column vector with coefficients. A single observation y_i can then be written as y_i = x_i \beta + \epsilon_i where \epsilon_i is noise.

\mu(x_0) = x_0 \beta is the expected value of y in the point x_0 = [x_{01}, x_{02}, ..., x_{0k}] \in \mathbb{R}^{1 \times k}.

Let s denote the estimated standard deviation. X \in \mathbb{R}^{n \times k} is a matrix consisting of the values of the k input variables in a row, one row for each observation, and n is the number of observations. This gives that the estimated variance of Y, Y \in \mathbb{R}^{n \times 1}, is [12]

s^2 = \frac{1}{n - (k+1)} (Y - X\beta)^T (Y - X\beta).   (15)

Assuming that \epsilon_i is normally distributed noise, \epsilon_i \sim N(0, \sigma), the confidence interval for the model in the point x_0 is given by

I_\mu(x_0) \approx \mu(x_0) \pm t_{\alpha/2}(n - (k+1)) \, s \, \sqrt{x_0 (X^T X)^{-1} x_0^T}   (16)

where I_\mu(x_0) is the confidence interval in the point x_0 and t_{\alpha/2} is the t distribution for the significance level \alpha [12]. The prediction interval can be calculated as

I_Y(x_0) = \mu(x_0) \pm t_{\alpha/2}(n - (k+1)) \, s \, \sqrt{1 + x_0 (X^T X)^{-1} x_0^T}   (17)

where I_Y(x_0) is the prediction interval for Y in the point x_0 [12]. Assuming that n is large in comparison to k, and that the uncertainty in the model is small in comparison to the standard deviation, gives the approximation

I_Y(x_0) \approx \mu(x_0) \pm t_{\alpha/2}(\infty) \, s.   (18)
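The approximation in equation (18) is simple to apply once a linear model has been fitted. The sketch below computes s from equation (15) and the approximate interval, using the normal quantile for t_{\alpha/2}(\infty); it assumes the intercept is included as a column of ones in X and is an illustration only.

```python
import numpy as np
from scipy import stats

def approx_prediction_interval(X, y, beta_hat, x0, alpha=0.05):
    """Approximate prediction interval of Eq. (18) for a fitted linear model.

    X: (n, k) design matrix, y: (n,) targets, beta_hat: (k,) fitted coefficients,
    x0: (k,) point of interest. A sketch under the assumptions stated in the text.
    """
    n, k = X.shape
    resid = y - X @ beta_hat
    s = np.sqrt(resid @ resid / (n - (k + 1)))   # Eq. (15)
    mu0 = x0 @ beta_hat                          # expected value at x0
    t = stats.norm.ppf(1 - alpha / 2)            # t_{alpha/2}(inf) = normal quantile
    return mu0 - t * s, mu0 + t * s
```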

2.3 Machine learning algorithms


This is how the implemented simulation model, the "OMSimulator" at Saab, works. Machine learning methods are of more use when the rules governing a system are not very clear. For example, when looking at the difference between measurement data and simulation data, it is very hard to specify rules for when the simulation model under-predicts, over-predicts or is in agreement with the measurement data, due to the mentioned imperfections and limitations in the simulation model and the measurement system. Here, machine learning algorithms can learn to find rules by themselves by training on data sets.

There are many types of machine learning systems. For this project, to solve the regression task, the focus is on supervised, batch, model based machine learning. Supervised means that the purpose of the machine learning algorithm is to find relationships between the input variables, from now on called features, and the target signal, or label. The problem can both be framed as a time dependent and time independent problem and the approach to solve the problem will, obviously, differ between the two alternatives. Since the aim is to analyze the problem off-line, batch machine learning is used. Model based implies that the number of parameters in the model is not changed with the size of the training data.

There are numerous different machine learning algorithms, for example k-Nearest Neighbors (KNN), Linear Regression, Support Vector Machines (SVM), Neural Networks (NN) and many more [8]. A few of these are used in this work, such as Random Forest, Ridge Regression, Gradient Boosting Regressor (GBR) and Long Short-Term Memory (LSTM).

2.3.1 Ridge Regression

One of the most common supervised ML algorithms is Linear Regression. A linear model can be written as

y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma_e^2 I)   (19)

where y is the labels, X is the inputs, \beta is the weights, \epsilon is random noise and \sigma_e is the standard deviation of the noise. From this, the least squares problem becomes

\hat{\beta} = \arg\min_\beta \|X\beta - y\|_2^2   (20)

and is solved by the normal equations

X^T X \hat{\beta} = X^T y.   (21)

Ridge Regression is a form of linear regression where a penalty term is added to the least squares problem:

\hat{\beta} = \arg\min_\beta \left( \|X\beta - y\|_2^2 + \lambda \|\beta\|_2^2 \right)   (22)

where \lambda is a regularization parameter. If \lambda = 0 then Ridge Regression is just linear regression. If \lambda is very large, then all weights end up close to zero, meaning the result will be a line through the mean of the data.
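In scikit-learn, which is used later in the project, ridge regression is readily available; the \lambda of equation (22) corresponds to the alpha argument. The data below is a toy stand-in for the preprocessed features and residual labels.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))                        # toy feature matrix
y_train = X_train @ np.array([1.0, -0.5, 0.2, 0.0, 0.3]) \
          + rng.normal(scale=0.1, size=200)                # toy residual labels

model = Ridge(alpha=1.0)          # alpha plays the role of lambda in Eq. (22)
model.fit(X_train, y_train)
y_hat = model.predict(X_train)
```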

2.3.2 Decision Trees

A very popular and widespread machine learning algorithm is the decision tree. Decision trees are often used for classification but can also, as in this case, be used for regression purposes.

(a) A figure which illustrates how the data is split for a 2-dim problem.

(b) A figure which shows how certain splits are mapped to the target signal y.

Figure 8: A conceptual figure of the principle of tree based machine learning for a 2-dimensional problem. The features are split and then mapped to the target signal y.

Instead of optimizing parameter weights for a model consisting of, for example, polynomials, the decision tree method finds groups of samples with as similar a value of the target signal as possible, with respect to the features. The method searches for the optimal splits between features that give the smallest loss.


Figure 9: A figure which illustrates the tree-like structure of the algorithm.

Figure 9 shows the resulting tree-like structure of the algorithm. The method then learns, by training, to find the most optimal splits given the training data. Note that in these illustrations there are only 2 features, while in this project there are many more.

2.3.3 Random Forest Regressor

The Random Forest regressor is an ensemble method [8]. An ensemble method is a group of predictors. The idea behind an ensemble method is to use different predictors and aggregate their predictions, which often gives better results than the best individual predictor. For Random Forest the predictors are Decision Trees.

Another characteristic of Random Forest is that it uses the concept of bagging. Bagging is when the training set is divided into subsets so that each individual decision tree trains on a different subset. This sampling of subsets is performed with replacement, which in statistics is called bootstrapping; bagging is short for bootstrap aggregating [8].

2.3.4 Gradient Boosting Regressor


The Gradient Boosting Regressor is an ensemble of decision trees, just as Random Forest, but the difference for GBR is that it uses the concept of boosting instead of bagging. Boosting trains its predictors sequentially, where every predictor tries to correct the errors of its predecessor. For GBR this is done by letting each predictor train on the residual errors made by the previous predictor [8].

An important property of GBR, for this project, is that it has the ability to use quantile regression as a loss function. Thus, different GBR-models can be trained for different quantiles.
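In scikit-learn this corresponds to choosing the quantile loss and setting the quantile through the alpha parameter; one model per quantile gives the lower bound, the expected residual and the upper bound. The sketch below only constructs the (untrained) models.

```python
from sklearn.ensemble import GradientBoostingRegressor

# One GBR per quantile; fitting each of them on the same features and labels
# yields a lower bound, a median estimate and an upper bound of the residual.
gbr_lower = GradientBoostingRegressor(loss="quantile", alpha=0.25)
gbr_median = GradientBoostingRegressor(loss="quantile", alpha=0.50)
gbr_upper = GradientBoostingRegressor(loss="quantile", alpha=0.75)
```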

2.3.5 Long Short-Term Memory

A powerful machine learning algorithm is Long Short-Term Memory, or LSTM for short. LSTM uses artificial neural networks. These consist of layers of neurons that function similarly to the brain's neurons. The idea is that the neurons connect or disconnect certain connections in the network, based on feedback from the labels. Training these connections of neurons can be seen as training the weights of the model. LSTM has the ability to learn relationships and tendencies over arbitrarily long time intervals [8].


sent through the activation function tanh. The output gate filters the output from the tanh function and passes on the short term state h(t), which is equal to the output y(t).

FC stands for fully connected layer. g(t) is the output from the main layer, where the current input vector x(t) and the previous state h(t-1) are analyzed. If there are relevant tendencies and relationships to be saved, they are sent to the long term memory state. The three remaining layers are gate controllers. They are controlled by the logistic activation function, which means that their output ranges between 0 and 1. Their output is multiplied element-wise: if the product equals one the gate is opened, whereas 0 closes the gate.

There are three gates and they are all controlled by this condition. Firstly, the forget gate which, as previously mentioned, determines what part of the long term state should be erased. It is controlled by f(t). Secondly, the input gate, controlled by i(t), determines which memories should be added. Thirdly, the output gate, controlled by o(t), chooses which memories to use in the current output (y(t) and h(t)).

2.4 Machine Learning preprocessing

Before any ML is applied, the data needs to be thoroughly preprocessed. This often determines whether the ML method will produce good results or not. Moreover, the preprocessing steps have to be performed in the correct order. This section provides the theory behind the most important preprocessing steps used in this work.

2.4.1 Low pass filtering

Because of the way the measurements are made in the fuel tanks in Gripen, fuel splashing naturally occurs, and the measurements therefore contain a lot of noise. To deal with this noise, low pass filtering can be used, in particular a Butterworth filter. A Butterworth filter is designed to minimize the ripples in the passband [4].

The magnitude of a general Butterworth filter can be written as

|H(\omega)|^2 = \frac{1}{1 + (\omega/\omega_0)^{2n}}   (23)

where \omega_0 is the cutoff frequency and n the order of the filter model.

To avoid phase distortion, forward-backward filtering can be used. In forward-backward filtering a linear filter is first applied forwards in time, and since the data is off-line and part of a batch, the filter can then also be applied backwards [1]. Doing this results in a zero-phase filter.

Figure 11: Illustrates the frequency response of five different orders of Butterworth filter.

Figure 11 shows the frequency response of a Butterworth filter with cutoff frequency \omega_0 = 1 rad/s, for orders 1, 2, 3, 4 and 5.
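With SciPy, a low order Butterworth filter combined with forward-backward (zero-phase) filtering is only a few lines; the cutoff frequency below is an assumed value for illustration, not the one used in the project.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1.0                      # the data is stored at 1 Hz
cutoff = 0.05                 # assumed cutoff frequency in Hz, for illustration only
b, a = butter(N=2, Wn=cutoff, btype="low", fs=fs)

t = np.arange(0.0, 600.0, 1.0 / fs)
noisy_mass = 500.0 - 0.2 * t + np.random.normal(scale=5.0, size=t.size)  # toy tank mass
smoothed = filtfilt(b, a, noisy_mass)      # forward-backward filtering -> zero phase
```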

2.4.2 Feature selection


this project is that most of the underlying physics is already taken care of by the simulation model, and a large part of what remains as trends in the residuals consists of flaws in the simulation model, individual performance differences between devices, measurement noise, splashing in the fuel tanks, etc.

2.4.3 Linear correlation coefficient

Analyzing the linear correlation coefficient between the residuals and the features gives an understanding of which signals are linearly correlated with the output signals. A common statistical measure of linear correlation is Pearson's r. It is calculated as

\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}   (24)

where X and Y are the variables of interest [5]. In this case, the linear correlation is calculated between the residual and a certain feature. Pearson's r is a value between -1 and 1, where a value at the boundaries \pm 1 means that the variables are perfectly linearly correlated or anticorrelated. A value of 0 means that there is no linear correlation between them. However, the signals could still have a nonlinear relationship, for instance one that is symmetric around x = 0.

2.4.4 Feature importances

For tree based methods the importance of each feature can be analyzed using the feature importances provided by the Python module scikit-learn [6]. Each time a decision is made in the algorithm, the feature used and the level at which it is used are recorded; the earlier in the tree, the more important the feature. This information is stored and can be analyzed after the algorithm has processed all the data. Consequently, a list of the features that were most important, based on the decisions made in the decision trees, is generated. If there are features that are rarely used, they can be disregarded. The algorithm can then be run with a new, shortened, list of features to see if that improves the overall accuracy.
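One way to obtain such a ranking with scikit-learn is through the feature_importances_ attribute of a fitted tree ensemble; the data below is synthetic and only illustrates the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                                  # toy features
y = 2.0 * X[:, 0] + 0.1 * X[:, 3] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]        # most important first
print(ranking, forest.feature_importances_)
```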

2.4.5 Principal Component Analysis

Principal component analysis, often shortened to PCA, is a tool for analyzing which features span the most variance [7]. The aim of principal component analysis is to transform the features so that the features sent into the machine learning algorithms are as uncorrelated as possible.


Figure 12: The figure shows the idea behind PCA. The features are transformed into a new set of orthogonal eigenvectors.

Figure 12 visualizes the concept behind PCA: the initial features, feature 1 and feature 2, are transformed into two principal components, 1 and 2. Principal component 1 spans most of the variance of the signal; the variance along that axis is larger than that of the axis spanned by principal component 2.

The procedure is as follows. The first component of the PCA, i.e. Principal Component 1 in Figure 12, is the axis that spans the greatest variance under a scalar projection of the data. This is clearly visualized in Figure 12: if the data were scalar projected onto the axis spanned by Principal Component 1, that would be the axis with the greatest variance.

To transform the features into the new set of principal components, each data point is projected onto the weight vectors,

t_k^{(i)} = x^{(i)} \cdot w^{(k)}.   (25)


Keeping fewer principal components than the initial number of features reduces the dimension of the input data to the machine learning algorithm. The first weight vector is calculated as

w_1 = \arg\max_{w} \left( \sum_i (x_i \cdot w)^2 \right).   (26)

Further principal components are calculated in a similar fashion. Rewriting equation (26) in matrix form gives

w_1 = \arg\max_{w} \left( w^T X^T X w \right)   (27)

where X is the matrix with the features as columns. The principal components are ordered by their relative importance, where the first component is the most significant. To calculate principal component number k, the first k-1 principal components are subtracted from X to build a new data matrix. Thereafter, the same calculation as in equation (27) is performed, but with this new data matrix.
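In practice the transformation is usually done with a library rather than by hand; the sketch below uses scikit-learn's PCA on a synthetic feature matrix and keeps an arbitrary number of components for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))           # toy feature matrix (n samples x k features)

pca = PCA(n_components=5)                 # keep the 5 most significant components
X_reduced = pca.fit_transform(X)          # the projections t_k of Eq. (25)
print(pca.explained_variance_ratio_)      # share of the variance spanned per component
```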

2.4.6 Sampling methods

One important aspect of machine learning is that the data is evenly distributed, sometimes referred to as balanced [8]. Otherwise, the model will fit to the parts of the data that are over-represented, even though they may not carry more information about the actual data. Events, or data points, that occur very rarely but are of the same importance as other, more frequent, events for explaining the model will drown in quantity.

2.4.7 Scaling parameters

Many machine learning algorithms are sensitive to the scaling of the features [8], especially neural networks, where the activation functions cannot function properly if the input data is out of their specified range.

The same transformation has to be performed on all the input features and not only the ones out of range. Firstly, the part of the data that is used for training is transformed. Thereafter, the parameters that were used in the transformation of the training data are used to transform the features in the test data.

Standardization. Standardization means transforming the features so that they have zero mean and unit variance. The calculation behind this transformation is simply

x_{train,s} = \frac{x_{train} - \mu_{x_{train}}}{\sigma_{x_{train}}}   (28)

where x_{train,s} is the transformed feature, x_{train} is the feature, \mu_{x_{train}} is the mean and \sigma_{x_{train}} the standard deviation. All the features during the training phase are transformed in this manner and their means and standard deviations are saved.

To perform the same transformation during the test phase of the model, the same parameters are used as in the training phase,

x_{test,s} = \frac{x_{test} - \mu_{x_{train}}}{\sigma_{x_{train}}}   (29)

where x_{test} is a feature used during the test phase. Note that x_{train} and x_{test} contain the same feature, for instance the mass in tank 2.

Normalization. The aim of normalization is to transform all the data so that it ranges between 0 and 1. The transformation can be stated as

x_{train,n} = \frac{x_{train} - \min(x_{train})}{\max(x_{train}) - \min(x_{train})}   (30)

where min(...) and max(...) mean taking the minimum and maximum value of the vector, and x_{train,n} is the normalized feature x_{train}. As for standardization, the same transformation has to be applied to the features during the test phase,

x_{test,n} = \frac{x_{test} - \min(x_{train})}{\max(x_{train}) - \min(x_{train})}   (31)

where x_{test} is a feature during the test phase. Note that x_{train} and x_{test} are the same feature, for instance the mass in tank 2.
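The key point of equations (28)-(31) is that the test data is transformed with the parameters of the training data. A small sketch of both transformations, with hypothetical array inputs:

```python
import numpy as np

def standardize(train, test):
    """Eqs. (28)-(29): the test set reuses the training mean and standard deviation."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma

def normalize(train, test):
    """Eqs. (30)-(31): the test set reuses the training minimum and maximum."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    return (train - lo) / (hi - lo), (test - lo) / (hi - lo)

# Toy feature matrices; in the project these would be the concatenated flight data.
X_train, X_test = np.random.rand(100, 3), np.random.rand(20, 3)
Xtr_s, Xte_s = standardize(X_train, X_test)
Xtr_n, Xte_n = normalize(X_train, X_test)
```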

2.5 Applying the machine learning algorithm

After the data has been formatted correctly, processed and made ready for the algorithm, it is divided into two parts: a training set and a test set. The training data is used for tuning the parameters within the model so that it finds the relations between the features and the target signal. The test data is used for verifying how well the trained model estimates the labels.

2.5.1 Cross Validation


phase it has to be reconstructed in a less complex manner. Say for instance that the model is forbidden from finding polynomial trends over a certain degree.

However, instead of having to reconstruct the model every time it overfits or underfits, cross validation can be used on the training data. Validation data is taken from the training set and used repeatedly to make sure that the model does not overfit. If the model starts to overfit, the error term between the labels and the estimate tends toward zero. Normally, the model would in that case stop iterating. However, the model is then tested on the validation data and will probably perform poorly. The model gets this poor result as feedback and is retrained to see if it fits the validation data more closely at the end of the next iteration. When the accuracy score during the validation phase stops improving, the model is ready to be verified against the test data.


Figure 13 illustrates how the data is divided into training and validation data. After one iteration, another part of the data is used for validation.

2.5.2 Tuning of the hyperparameters

A grid search is a structured method of evaluating which set of hyperparameters gives the best result [6]. The hyperparameters are the parameters that define the model, such as the number of coefficients in a linear regression model. The grid search takes as input the combinations of hyperparameters to be evaluated. It also takes as input which loss function should be used for evaluating the model. For instance, least squares is a loss function which will give the optimal set of hyperparameters under the condition that the sum of squared errors should be minimized.
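With scikit-learn the grid search and the cross validation of the previous section are combined in GridSearchCV; the grid below is an assumed example, not the one used in the project.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]}  # assumed grid
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",   # least-squares style evaluation
    cv=5,                               # 5-fold cross validation on the training data
)
# search.fit(X_train, y_train) then exposes search.best_params_ and search.best_estimator_
```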

2.6 Model Verification

The model verification is performed on the test set. There are different methods to evaluate how well the model performs. Firstly, one can calculate different accuracy metrics. The mean absolute error shows how well the estimate captures the target signal. It can be stated as

MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|   (32)

where n is the number of samples [3]. Furthermore, the mean squared error can be stated as

MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2.   (33)


the training batch. Only then is it probable that the model finds any anomalies when it is used on new data. Furthermore, there are flights with known anomalies. Consequently, one way of evaluating the model is whether it flags those anomalies.

2.7 Anomaly detection

2.7.1 Cusum

The cusum function is a test that can be used for anomaly detection. The main principle of the test is that the function should only give an alarm if the time integral of the difference between the estimated signal and the target signal rises above certain threshold values [2]. Thus, the function gives an alarm only if the target signal is sufficiently far off from the estimated signal or if they differ for a sufficiently long time. Consequently, the cusum algorithm makes the anomaly detection more robust: small errors during short time periods are ignored whereas larger errors are detected. The test can be stated as follows:

g_t = g_{t-1} + s_t - v
g_t = 0 and \hat{k} = t,  if g_t < 0
g_t = 0, t_a = t and alarm,  if g_t > h > 0

where g_t is the accumulated area between the estimated signal and the target signal and s_t is the difference between the signals at time step t. v is a drift term which makes sure that the "error area" is decreased when the estimated signal and the target signal are sufficiently close to each other. \hat{k} is a reset parameter. Without the drift term, the function would alarm every time the estimated signal and target signal differ from each other once the threshold has been passed during prior time steps. t_a is the output from the test, which is the time index at which the threshold was reached.

Note that there are several varieties of the cusum algorithm. For one of them, shown in Figure 14, the algorithm can be used to flag for anomaly as long as the target signal remains outside of the boundaries of the prediction interval. This variety can be stated as follows:

g_t = g_{t-1} + s_t,  if s_t > 0
g_t = g_{t-1} - v,  if s_t = 0
g_t = 0 and \hat{k} = t,  if g_t < 0
t_a = t and alarm,  if g_t > h > 0, until g_t = 0


index 15, g_t is gradually decreased by the drift term until it becomes zero. s_t is zero when the target signal is within the prediction interval, and the absolute difference between the target signal and the prediction interval boundary otherwise. This is the approach used in this project.
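A minimal Python sketch of this cusum variant is given below; the interval bounds, drift v and threshold h are inputs, and the function returns one alarm flag per time step. It is an illustration of the equations above, not the project's implementation.

```python
import numpy as np

def cusum_flags(y, lower, upper, v, h):
    """Flag anomalies with the cusum variant described above.

    y: target signal (the residual), lower/upper: prediction interval bounds,
    v: drift term, h: alarm threshold.
    """
    g, alarm_on = 0.0, False
    alarms = np.zeros(len(y), dtype=bool)
    for t in range(len(y)):
        # s_t: distance outside the prediction interval, zero when inside it.
        s = max(lower[t] - y[t], y[t] - upper[t], 0.0)
        g = g + s if s > 0 else g - v     # accumulate the error area or let it drift down
        if g < 0:
            g = 0.0                       # reset
        if g > h:
            alarm_on = True               # alarm once the threshold is exceeded ...
        elif g == 0.0:
            alarm_on = False              # ... and keep it until g has returned to zero
        alarms[t] = alarm_on
    return alarms
```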


3 Method

The aim of this thesis is to estimate the expected difference, the residual, between a measured and a simulated signal in Gripen. The model of the residual is used to determine whether an anomaly has occurred, as stated in section 1.2. This idea can, however, be applied to any residual between an existing measured and simulated signal. To limit the scope of this project it is targeted towards estimating the fuel level of one of the tanks. The chosen tank is tank 2 in Gripen, for which the focus lies on estimating the residual between the measured and simulated fuel mass. Furthermore, the method will be applied to the mass in tank 3 to see if the concept can be generalized.

The code will be written in Python. There are some packages in Python that are very applicable to this project. Firstly, numpy and pandas are two packages which make it possible to structure the data so that the necessary preprocessing steps can be applied. Secondly, scikit and Keras/TensorFlow are two packages from which it is very simple to construct machine learning models. scikit comes with many simpler models whereas Keras/TensorFlow is very suitable for constructing more advanced neural networks. Note that several other packages are also needed, but those mentioned are the most significant for this project.

The aim is to detect unexpected anomalies by creating a model that predicts the interval within which the residual is expected to be. This can be achieved either assuming that the model for the residual has to take the time dependency of the signals into account, or not.

In section 2 the theory behind the different models and processing steps is presented. This section presents how this theory is applied in the project, from preprocessing the data to anomaly detection.

3.1 Prediction Intervals

As mentioned in section 2.2, two different approaches to estimating the prediction interval are used, quantile regression and variance estimation. How they are used is presented in the following sections.

3.1.1 Quantile Regression


factor the larger the interval becomes. This approach is tested for LSTM and Gradient Boosting Regressor. In GBR, the quantile is directly specified as a hyperparameter of the model. In LSTM, a customized loss function is created which penalizes over- and under-prediction differently.

3.1.2 Variance Estimation

The variance estimation method consists of two different parts. Firstly, a machine learning model tries to fit the target signal, the residual, as closely as possible. Thereafter, the squared difference between the target signal and the estimate is calculated, see equation (15),

e^2 = (y - \hat{y})^2.   (34)

This gives a vector [e_1^2, e_2^2, e_3^2, ..., e_n^2] where each term e_i^2 corresponds to the squared difference calculated at that point. Thereafter, a machine learning algorithm estimates the variance by fitting the features to the squared difference. This relationship, between the features x and the squared difference, is denoted f(x). The prediction interval is then approximated with

I_Y(x_0) \approx \mu(x_0) \pm t_{\alpha/2}(\infty) f(x)   (35)

based on the expression derived in equation (18). The fewer data points close to x_0, the more inaccurate the approximation becomes. It is assumed that the residual is normally distributed and that its expectation value is 0. Furthermore, it is assumed that f(x) is an accurate approximation of the variance in x_0. The true prediction interval for \alpha will differ from the estimated prediction interval for \alpha [12]. As a result, it is directly incorrect to state that 95% of the data is expected to be within the prediction interval for \alpha = 10. \alpha should rather be viewed as a parameter which can be used to tune the size of the prediction interval.

To visualize the concept behind variance estimation, let y be a variable which is modeled as a second degree polynomial,

y_i = a + bx_i + cx_i^2 + e_i.   (36)


Figure 15: A figure which shows the true model (\mu, dashed), the estimated model (\mu^*, green), the model's confidence interval (I_\mu, red), the prediction interval (I_Y, magenta) and the estimated prediction interval (I_{Y,est}, blue dashes). The * markers are the data samples y_i.

Figure 15 visualizes the concept behind variance estimation. The prediction interval is given by I_Y. Furthermore, the estimated prediction interval is labeled I_{Y,est} and the expected value \mu. Also, the estimated expected value is labeled \mu^* and the confidence interval is given by I_\mu. Consequently, \mu is the value of y unaffected by noise whereas \mu^* is the polynomial fitted to the noisy signal. The difference between the confidence interval and the prediction interval is that the former only takes into account the uncertainties in the model itself, whereas the latter also models the uncertainty in the data.

3.2 Data acquisition


3.3 Visualizing the data

To determine which signals might be useful and which not, it is important to get an overview of the signals. Therefore, the first step, once the data is loaded and made ready in Python, is to plot all the signals.

Figure 16: A plot over the measured and simulated mass in tank 2 and its residual for a typical flight. The aircraft left ground at about t = 980s and landed at about t=2000s.


Figure 17: A plot over the measured static reference pressure, theta angle and phi angle over time for a typical test flight.

In Figure 17 the static reference pressure, theta angle and phi angle for a typical flight are visualized. The static reference pressure is the atmospheric pressure measured outside the aircraft and corresponds to the altitude of the aircraft. This is known to have a big impact on the fuel system according to fuel system engineers at Saab. The theta and phi angles give information about how the aircraft is positioned, and thereby the ability to measure signals such as the fuel tank levels is dependent on how the aircraft is tilted. In the model of the fuel mass residual one can presume that the theta and phi angles will be useful signals.


Figure 18: A scatter plot for the residual in tank 2 over theta angle for 4 different test flights.

The data is visualized in a lot of different ways. For instance, in Figure 18 two signals, the theta angle and the residual for tank 2, are plotted against each other. Plotting different signals against each other in this way is done to see correlations between signals and how they relate to the fuel system at large.

3.4 Machine Learning preprocessing

The approach to solving the problem is presented in this section. These steps have to be performed in the correct order. For instance, scaling of the data has to be performed after irrelevant parts of the flights have been cut away. Otherwise it would be impossible to find where the cuts should be. Also, the parts that are cut away from the data would impact the scaling. Consequently, the cutting of the signals is performed before the scaling.

A technique called a pipeline is used to structure the preprocessing steps. Pipeline is a common tool, available in scikit, for automatically performing a series of data processing steps with one call [6]. This makes it much easier to structure the code and to make sure that all necessary steps have been performed.
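A small sketch of the idea; the actual preprocessing steps of section 3.4 would be wrapped as custom transformers and chained in the same way, so the steps and estimator below are placeholders.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ("scale", StandardScaler()),   # placeholder preprocessing step
    ("model", Ridge(alpha=1.0)),   # placeholder regression model
])
# pipe.fit(X_train, y_train) runs every step, in order, with a single call.
```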


Note that there is far too little data to train the machine learning algorithm using only data from one flight. One flight consists of approximately a few thousand data samples. Therefore, data samples from approximately 100 flights are concatenated after the preprocessing steps have been performed on each flight's data separately.

3.4.1 Test and train data separation

The data is split into two different parts. One is the training batch which is used for training the model, fitting its coefficients and weights, so that it finds relationships between the features and the label. The second part is the data used for testing and verifying the model. This part is for making sure that the trained model can be generalized to unseen data.

The preprocessing steps differ for the different training and test set batches. There are processing steps that are performed on the training batch which are not performed on the testing part. This is to make sure that the training data is as well balanced as possible and to limit the impact of faulty and inaccurate measurement readings.

For Ridge Regression and the Decision Tree Regressors, the processing steps for the training data are listed below.

• Data is cleaned from NaN, doublets and other faulty numbers
• Low pass filtering of measurement signals
• Residual signal is added to the data (which will be separated from the features)
• Inclinations θ and φ are added to the features
• Accumulated signals are added to the features
• Data is cut
• Data is resampled using stratified oversampling
• Irrelevant features are removed


For LSTM, the training data preprocessing steps are listed below.

• Data is cleaned from NaN, doublets and other faulty numbers
• Low pass filtering of measurement signals
• Residual signal is added to the data (which will be separated from the features)
• Inclinations θ and φ are added to the features
• Data is cut
• Irrelevant features are removed
• Features are separated from the labels
• Features and labels are normalized
• The features are framed for LSTM, either using uneven oversampling or sampling evenly back in time. The labels are shifted so that they are equal in length to the features

The difference between the test batch and the training batch is the normalization of the labels. For the test part, only the features have to be normalized. This is due to the fact that only data used to train the model has to be normalized.

3.4.2 Data cleaning

Firstly, the data is cleaned by removing samples with NaN. Also, the simulation data comes in doublets: every sample is recorded twice, which is solved by taking every second value. This process has to be performed on all the data in order for the machine learning algorithm to work. Furthermore, the time is indexed as time elapsed since midnight. Therefore, another time vector is created whose indexation starts from 0 and stops when the engine of the aircraft is turned off.
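As an illustration of these steps with pandas, a sketch under the assumption that one flight is loaded into a DataFrame df (the column name for the new time vector is made up):

```python
import pandas as pd

def clean_flight(df: pd.DataFrame) -> pd.DataFrame:
    """Drop NaN rows, remove the doublets and build a time index starting at 0."""
    df = df.dropna()                               # remove samples containing NaN
    df = df.iloc[::2].reset_index(drop=True)       # every sample is recorded twice
    df["t"] = df.index.astype(float)               # new time vector in seconds (1 Hz data)
    return df
```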


3.4.3 Low pass filtering

Figure 19: Low pass filtering of a typical measurement signal (mass in tank 2).

In Figure 19 we can see a Butterworth filter applied to a typical measurement signal in Gripen. The noisiness of the unfiltered graph in Figure 19 is due to fuel splashing in the tank. It is likely that applying a filter in this manner removes some information from the signal. It is rather difficult to determine which degree of low pass filtering should be used. According to Gripen fuel system engineers at Saab, the fuel system is relatively slow. The interesting dynamics of the system lie in larger deviations of the fuel mass in a tank.

3.4.4 Inclination in tank levels

The inclination of the tank levels is not measured directly, as mentioned in section 2.1.4, but calculated from the accelerations of the aircraft. These signals are sensor data from the aircraft. The tanks are modeled as cuboids but they are, in reality, shaped very irregularly. Due to this simplification in the model, the residual is expected to differ between different inclinations of the tank.

3.4.5 Label


3.4.6 Accumulated signals

If, for instance, a valve is modeled with a certain area that differs from reality, this can lead to an accumulated difference between some simulated and measured signals: they will gradually drift further apart. One approach to this problem is to add certain accumulated signals, so that the model can learn the tendency of accumulated differences between these measured and simulated signals.

Two signals are created for this purpose. The valve opening command signals indicate that there is a flow into and out of tank 2. These are integrated, giving the total time that the input and output valve commands have been open. Thus, two new features are added to the set, which is expected to increase the model's ability to learn accumulated tendencies.

During the working process of this project, some specific behaviors of the simulation model are observed. It can be noted that the simulation is inaccurate when predicting the masses in the fuel tanks while the transfer pump runs at maximum capacity. Under these conditions the simulation model often over-predicts the masses in the fuel tanks, causing the residual to become strongly negative. An attempt to increase the ML model's ability to learn this behavior is made by creating another signal.

To create this signal, the transfer pump's control signal is integrated over every index where the signal is at its highest level, meaning that the transfer pump runs at maximum capacity. This is very similar to the previously created signals, and the new feature is added to the data sets.
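A minimal sketch of how such accumulated features could be built is given below; the signal names are placeholders and dt denotes the (assumed constant) sample time:

import numpy as np

def accumulate_on_time(binary_signal: np.ndarray, dt: float) -> np.ndarray:
    # Accumulated time (in seconds) that a command signal has been active
    return np.cumsum(binary_signal.astype(float)) * dt

# Assumed usage (placeholder names):
# acc_valve_in  = accumulate_on_time(valve_in_cmd_open, dt)
# acc_valve_out = accumulate_on_time(valve_out_cmd_open, dt)
# acc_pump_max  = accumulate_on_time(pump_ctrl == pump_ctrl_max_level, dt)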


3.4.7 Cut signals

Figure 20: Visualization of how the data is cut based on θ.

Figure 20 visualizes the cutting of the data for angles that lie outside the interval −b < θ < c. As previously mentioned, the measurements are expected to be very unreliable under such conditions. All measurement sensor values recorded simultaneously with the data samples with large angles are removed. As a result, all feature vectors remain the same length, but shorter than before the cut.
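In code, this cut reduces to a boolean mask over the samples; a minimal sketch, where the thresholds b and c are project-specific values not stated here:

import numpy as np

def cut_by_theta(features: np.ndarray, theta: np.ndarray, b: float, c: float) -> np.ndarray:
    # Keep only the rows recorded while -b < theta < c
    mask = (theta > -b) & (theta < c)
    return features[mask]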

3.4.8 One hot encoding


Figure 21: Control signal for the transfer pump servo (blue), and binary signals for the different levels of the transfer pump servo signal (red, green, light blue, yellow).

In Figure 21 the control signal for the transfer pump servo is shown as the blue curve at the top. This signal can be 0 mA, 1.75 mA, 2.75 mA or 5 mA, where the different currents correspond to different levels of mass flow. The reason for having several levels, instead of only on/off, is, as mentioned in section 2.1.1, the priorities in the control of the pump. The transfer pump is driven by the hydraulic oil system, which prioritizes the control system of the aircraft and the landing gear higher than the transfer pump. Controlling the transfer pump at different levels reduces the wear on the pump and the usage of the hydraulic oil system.

To capture these levels and the behavior of the pump, one hot encoding is implemented. The idea of one hot encoding is to represent categorical data with numerical values. Here, four new binary signals are created, one for each of the levels, which is one when that level is active and zero otherwise. These signals are shown as the red, green, light blue and yellow curves in Figure 21.
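A minimal sketch of this encoding is shown below, assuming the control signal is available as a pandas Series named pump_cmd (a placeholder name); the tolerance is an illustrative choice for matching the discrete current levels:

import pandas as pd

levels = [0.0, 1.75, 2.75, 5.0]  # control levels in mA

def one_hot_pump_levels(pump_cmd: pd.Series, tol: float = 0.1) -> pd.DataFrame:
    # One binary column per control level: 1 when the command is at that level
    return pd.DataFrame(
        {f"pump_{lvl}mA": (pump_cmd.sub(lvl).abs() < tol).astype(int) for lvl in levels},
        index=pump_cmd.index,
    )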

3.4.9 Feature selection


more complex, nonlinear manner. Many of the features are correlated with each other. Therefore, adding more and more features does not necessarily mean that more information is provided to the machine learning algorithm.

Figure 22: Principal component analysis applied to all the relevant features.

As seen in Figure 22, the 30 most significant principal components explain more than 98% of the variance of the 65 features that are expected to be correlated with the residual. Therefore, reducing the dimension of the features to only the most important principal components is likely to produce a more accurate result: the problem becomes less complex, while the transformed features still carry almost all the information of the initial features.
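A sketch of such a dimension reduction with scikit-learn is given below, assuming feature matrices X_train and X_test exist (placeholder names); the 98% threshold follows the figure, while fitting the scaler and PCA on the training batch only is an assumed but standard practice:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)
# Keep enough components to explain 98% of the variance (about 30 here)
pca = PCA(n_components=0.98).fit(scaler.transform(X_train))
X_train_pc = pca.transform(scaler.transform(X_train))
X_test_pc = pca.transform(scaler.transform(X_test))
print(pca.n_components_, pca.explained_variance_ratio_.sum())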

3.4.10 Sampling methods

The reason behind stratified sampling. This paragraph explains why stratified oversampling is used in this project; the next paragraph explains the method itself.


Figure 23: (a) The model estimates the training labels, i.e. the signal it is trained to estimate; the estimate ŷ (blue curve) is shown but hidden behind y. (b) The model estimates a test set taken as 30% of the original data set (shuffled).


Figure 23a shows the target signal estimate, the target signal and the prediction interval for the estimation of the training batch. The features used for training the model are used as input to the trained model. Clearly, the estimate is very close to the target signal. This is, however, not surprising, since the model is trained on these very feature values.

Figure 23b shows that the target signal estimate is also very close to the target signal here. The features and labels come from the 30% of the data that was removed from the original data set.

Figure 24: The model estimates data that is not in the original data set, i.e. data that comes neither from the 70% used for training nor from the 30% previously used for testing, see Figure 23a and Figure 23b.


Stratified oversampling. During test flights, the pilot has a test card stating all the maneuvers scheduled for the flight. These flights are performed to verify that the different requirements on the aircraft are fulfilled. The typical sequence of events for a test flight is to first start the APU (Auxiliary Power Unit), which basically works as an emergency power supply for different aircraft systems and as a starter for the main engine and the recording. Thereafter the main engine can be started and the pilot can taxi out to the runway, where the pilot checks with the control group for a clear sign to take off. After take-off the pilot flies to a play area where the scheduled maneuvers are performed. When all maneuvers are done, the pilot returns to the base for landing. Even though the maneuvers change between flight tests, the basic procedure is the same.

Figure 25: Altitude and a histogram of the distribution of the altitude samples for a flight.


Since there is no known distribution of flight modes, nor an exact specification of what defines a flight mode, the approach was to identify, with the help of fuel system engineers for Gripen, which signals could be used to define different flight modes. With these signals, a flight mode is defined by investigating where the signals are relatively constant.

Figure 26: The concept behind stratified sampling. A signal is divided into different flight modes by the aperture levels. The time indexes x0, x1, x2, x3 and x4 are saved. The same procedure is repeated for three signals and the time indexes are sorted chronologically.

Figure 26 illustrates the concept of how the data is balanced over different flight modes. Firstly, a signal for defining the different flight modes is picked out, represented by the brown curve in the figure. Secondly, aperture levels are defined for this signal. Aperture levels are levels between which the operating conditions are expected to be approximately the same. If the brown curve represented altitude, similar altitudes would be grouped together by the apertures. All data points between two adjacent aperture levels are considered as data for the same flight mode. The next step is to iterate through all the data samples in the signal and see between which aperture levels each data sample lies. The index of the value immediately after the signal crosses an aperture line is saved, and the next index to be saved has to follow a crossing of a different aperture line than the one crossed most recently. These are noted in Figure 26 as x1, x2 and x3. The boundaries, i.e. the first and last indexes x0 and x4, are also saved, so that the whole flight is covered by flight modes.

Between all adjacent indexes, a fixed number of data samples, for instance 10, is chosen randomly. The same data point may be chosen multiple times. This ensures that the data is evenly distributed over the different flight modes. The signals chosen for defining the flight modes are the static reference pressure, θ, and the control signal for the transfer pump. The indexes from the three different signals are merged and sorted chronologically.

This stratified sampling approach is also flexible: the sizes of the different aperture levels can be changed to find an optimum, and the number of data points taken between each pair of indexes can easily be changed.

Note that this approach can only be used for the time independent solution; merging data over several time steps would destroy information about different time lags.
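A simplified sketch of the stratified oversampling procedure is shown below. It saves an index at every aperture-band change (the report additionally requires that the next saved index follows a crossing of a different aperture line than the last one), and all signal and level names in the usage comment are placeholders:

import numpy as np

def aperture_crossings(signal: np.ndarray, levels: np.ndarray) -> np.ndarray:
    # Indexes where the signal moves into a new aperture band, plus the endpoints
    bands = np.digitize(signal, levels)
    change = np.flatnonzero(np.diff(bands)) + 1
    return np.concatenate(([0], change, [len(signal) - 1]))

def stratified_oversample(indexes: np.ndarray, n_per_mode: int, seed=None) -> np.ndarray:
    # Draw the same number of samples (with replacement) from every flight mode,
    # i.e. from every segment between adjacent saved indexes
    rng = np.random.default_rng(seed)
    picks = [rng.integers(lo, hi, size=n_per_mode)
             for lo, hi in zip(indexes[:-1], indexes[1:]) if hi > lo]
    return np.sort(np.concatenate(picks))

# Assumed usage: combine crossings from static pressure, theta and the pump command,
# sort them chronologically, then draw e.g. 10 points per segment:
# idx = np.unique(np.concatenate([aperture_crossings(p_stat, p_levels),
#                                 aperture_crossings(theta, theta_levels),
#                                 aperture_crossings(pump_cmd, pump_levels)]))
# rows = stratified_oversample(idx, n_per_mode=10)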

Uneven oversampling. For the time dependent solution, namely using LSTM for the prediction, the number of data points grows considerably. This is because LSTM cells, constructed using Keras with a TensorFlow backend, require a certain format for the data. Firstly, the number of time lags has to be chosen. This number is rather difficult to choose, since it determines over which time lags there is relevant information and relationships between signals. Moreover, choosing a large number of lags is problematic, since it leads to memory problems and long computation times.


3.4.11 Scaling

The data is scaled differently depending on the machine learning algorithm. This is especially important for LSTM, since neural networks may fail to converge if the data is poorly scaled.

As mentioned in section 2.4.7, two very common scaling techniques in machine learning are standardization and normalization. Both are applied to the signals to see which of them provides the most satisfactory result.
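A minimal sketch of both techniques with scikit-learn is shown below. The variable names are placeholders, and fitting the scalers on the training batch only (and reusing them on the test batch, where only the features are scaled) follows the split described earlier:

from sklearn.preprocessing import MinMaxScaler, StandardScaler

feature_scaler = MinMaxScaler().fit(X_train)      # normalization to [0, 1]
# feature_scaler = StandardScaler().fit(X_train)  # alternative: standardization
X_train_scaled = feature_scaler.transform(X_train)
X_test_scaled = feature_scaler.transform(X_test)

# Labels are only scaled for training; predictions can be mapped back with
# label_scaler.inverse_transform(...)
label_scaler = MinMaxScaler().fit(y_train.reshape(-1, 1))
y_train_scaled = label_scaler.transform(y_train.reshape(-1, 1))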

3.4.12 Framing for time dependent solution

For the time dependent solution, in contrast to the time independent one, it is assumed that prior time steps carry information about how to predict the current time step. As a result, the data has to be prepared in a different manner. LSTM is the machine learning model used for this purpose. The input to an LSTM cell is structured as data samples, time lags and lastly the number of features. Every feature contains multiple data samples (approximately 3000 samples per flight), which are stored as several lagged data vectors.

Figure 27: Illustration of how the data is formatted for the time dependent solution.
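A sketch of this framing, for the variant that samples evenly back in time, is given below; the number of lags and the variable names in the usage comment are illustrative assumptions:

import numpy as np

def frame_for_lstm(features: np.ndarray, labels: np.ndarray, n_lags: int):
    # Reshape a (n_samples, n_features) matrix into the 3-D array
    # (samples, time lags, features) expected by a Keras LSTM layer.
    # Each output sample contains the current time step and the n_lags - 1
    # preceding steps; the labels are shifted so both arrays have equal length.
    X, y = [], []
    for t in range(n_lags - 1, len(features)):
        X.append(features[t - n_lags + 1 : t + 1])  # window ending at time t
        y.append(labels[t])
    return np.asarray(X), np.asarray(y)

# Assumed usage: X_lstm, y_lstm = frame_for_lstm(X_scaled, y_scaled, n_lags=10)
# X_lstm.shape -> (n_samples - 9, 10, n_features)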
