Mathematical Methods for Maritime Signal Curation in Noisy Environments

(1)

School of Education, Culture and Communication

Division of Applied Mathematics

MASTER THESIS IN MATHEMATICS / APPLIED MATHEMATICS

Mathematical Methods for Maritime Signal Curation in Noisy

Environments

by

Lara Jiménez Blázquez

Masterarbete i matematik / tillämpad matematik

DIVISION OF APPLIED MATHEMATICS

MÄLARDALEN UNIVERSITY SE-721 23 VÄSTERÅS, SWEDEN

(2)

School of Education, Culture and Communication

Division of Applied Mathematics

Master thesis in mathematics / applied mathematics

Date:

2019-06-07

Project name:

Mathematical Methods for Maritime Signal Curation in Noisy Environments

Author:

Lara Jiménez Blázquez

Supervisor(s):

Christopher Engström (MDH) George Fodor (QTAGG) Tomas Lindqvist (QTAGG)

Reviewer: Richard Bonner Examiner: Sergei Silvestrov Comprising: 30 ECTS credits

(3)

Abstract

QTAGG has designed a real-time autonomous system that continuously calculates an optimum propulsion plan controlling the engines and propellers of a vessel. In this way, the precision of the signals that are used is very important, as any little error in the signal can produce incorrect control effects and cause critical damages to the equipment or passengers.

This thesis describes the mathematics and implementation of a system to detect and correct disturbances in the data signals of a vessel. The system applies a signal curation based on mathematical modelling and statistics leading to clean data to use in QTAGG’s control sys-tem.

(4)

Chapter 1 Introduction

"Information is the fuel of the 21st century, and data analytics the combustion engine" - Peter Sondergaard

1.1 QTAGG: Marine Energy Efficiency

QTAGG R&B is an enterprise focused on marine energy efficiency. It was founded in 1997 in Västerås, Sweden. QTAGG’s products reduce the fuel on large seagoing vessels as well as wear and tear of the engine, rudder and propeller.

This company has developed an autonomous self-learning control system, which combines statistical and real-time data to calculate an optimum propulsion plan governing the engines and propellers of the vessel. Hence, improving the total efficiency of the ship, arriving on time at the lowest cost. In this way, without any manual intervention, the system automatically adapts to the new conditions and circumstances of the sea, ensuring that the propulsion of the vessel performs as its best during the whole voyage.

The system continuously analyzes the ship’s motion and simulates an interactive model of the ship based on sensor data. With this model, all possible scenarios are evaluated and the alternative with the lowest fuel consumption is continuously selected. Depending on the sys-tem configuration, the size of the vessel and operational route, the fuel saving is around 5-10%.

1.2 Problem description

In these last years, the flow of data used in industry has increased remarkably. Specifically in the marine field, this sharp growth is due to both the increased number of sensors and the availability of networked information such as weather forecast, marine traffic, etc.

The analysis of this data helps companies to take more efficient decisions, improve their products, do smarter business movements, and therefore have happier customers.

(7)

But often this big amount of data has not the desired quality. Sometimes, the signal is lost from the sensor, altered or corrupted by noise, giving as a result outliers or temporary bias. Hence, developing an autonomous monitoring system which enables automatic detection and correction of abnormalities in signals, is a subject of big interest.

QTAGG has developed a real-time autonomous system which repeatedly calculates an op-timum propulsion plan controlling the engines and propellers of the vessel. In this way, the accuracy of the signals used is crucial, as a wrong signal can produce an erroneous control effect which can cause serious damages to the equipment or passengers.

In this project, a system for signal curation based on statistics and mathematical modelling is developed in order to detect and adjust the previous perturbations which can appear in the data signals of a vessel. The clean signal will be used in QTAGG’s control system.

1.3 Goals and objectives

1. Implementation of a signal curation system which enables automatic detection and cor-rection of abnormalities in the signals of a vessel

2. Use Mathematics and Data analysis to obtain robust detection and correction methods

1.4 Content of the report

The report is structured as follows:

• Chapter 1: It contains an introduction to the company in which the project was de-veloped (QTAGG), a description of the problem, the objectives and a brief explanation of the structure of this report.

• Chapter 2: Explanation of the theoretical framework in which the problem appears.

For the imputation of missing values the methods that were used are Recursive Least Squares and Kalman Filter. For the detection and correction of outliers, the method was Recursive Hampel Filter.

• Chapter 3: The study case is developed under the concepts explained in the theoretical framework.

Implementation of a signal curation system for the detection and correction of disturb-ance in data signals of big boats.

• Chapter 4: It concludes with a general analysis of the results obtained in the study and proposal for future lines of analysis.

(8)

1.5 Literature review

It was not easy to find relevant literature for this project as there are many papers related to signal analysis and correction, but most of them are very focused on a unique particular topic or application. It is possible to classify the existing literature in five types as shown below.

1. Theoretical results found in the signal processing literature, such as filters, outlier de-tectors or orthogonal transforms.

2. Theoretical results found in the modelling literature (mainly so called soft sensors liter-ature and different kinds of Kalman filters).

3. Results in the field of error detection, correction and recovery. The most known are the IFAC (International Federation for Automation and Control) conferences SAFEPRO-CESS on Fault Detection, Supervision and Safety for Technical Processes.

4. Results in Sensor Fusion research. Sensor fusion studies architectures how it is possible to merge data from multiple sensors to decrease the amount of uncertainty on signals. 5. Different reports where the focus is on how concrete applications are done rather than

describing specific theories.

In this project, we will focus on both, theoretical and practical research. About traditional theoretical literature, it would be important to remark:

- Imputation of missing values in a signal is achieved by using Kalman filter and Recurs-ive Least Squares. These methods are very useful since they can provide valuable information about important variables that enable the system to provide accurate control inputs to the ac-tuator. They are a very practical and economical alternative to real measurements.

However, estimation is challenging as many times it needs to handle faulty system models and noisy sensor measurements. Thus, the reason to use these two specific methods is that Kalman filter is very accurate when solving the estimation problem for linear dynamic sys-tems with noisy observations, and Recursive Least Squares can be recursively updated as more input data is available with a very good performance in the presence of Gaussian measurement noise.

Kalman filter was proposed in the paper ‘A New Approach to Linear Filtering and Predic-tion Problems’ by R.E. Kalman in 1960, where it is presented a recursive solution to the discrete data filtering problem. Since that time, this algorithm raises in importance, becoming a subject of extensive study and application, especially in the area of autonomous navigation. Thus, currently it is one of the most essential estimation methods, as it manages to estimate hidden variables based on faulty and uncertain measurements. Moreover, it makes possible to predict the future system state from past estimations.

(9)

Recursive Least Squares was first developed by Gauss in 1821 in his ‘Theoria Combinationis Observationum Erroribus Minimis Obnoxiae’, but it was ignored for almost a century until 1950 when Plackett rediscovered Gauss’ work and the Recursive Least Squares method star-ted taking the importance that it has nowadays, due to its attractive and statistical properties.

- For the detection of outliers, we used the Recursive Hampel Filter method. This algorithm is an outlier detector and corrector that removes the faulty values without distorting the original data and improves the quality of the signal. It is based on a moving data window, the median of its values and the MAD (median absolute deviation) scale estimator.

The Recursive Hampel Filter is an improvement of the recursive standard median filter, which unfortunately can also introduce some distortion in the fragment of the signal that we want to retain, making its utility very application-dependent. This new method was described by Davies and Gather in ‘Hampel identifier’ in 1993, when it started to be implemented as an outlier filtering technique in many physics and industrial applications.

Moreover, many results from sensor fusion literature were relevant to the development of this thesis:

- Each sensor in a tracked object provides different information with different accuracy. These sensors can suffer a breakdown, uncertainty or cover only a restrict region. A good solution for the listed problems is to use sensor fusion. It combines different sensory data in order to improve the information from the sources when they are used individually. In this way, sensor fusion has the same approach as this thesis.

In sensor fusion literature, several types of architectures are described in order to obtain a clean data from several sensors by using redundancy among them. For example, the paper of X. Rosalind Wang et al (2008) which has a closest solution to our problem, describes an architecture used for determining false alarms in gas sensors. Mathematically, the method boils down to classify a residual between a predicted signal value obtained from a model and the actual signal (in the article the residual is classified with a Bayesian network). We use this approach to detect signals errors and classify the correction method as described later in Section 3.

In this manner, all this literature is very important as the approach of this thesis is to make use of different theoretical methods such as Kalman Filter, Recursive Least Squares and Re-cursive Hampel Filter, in order to improve the information received from the sensory data so that it can be used later in other applications. That is why, this thesis is very complete since it combines both theoretical background and a specific application in the marine field.

(10)

Chapter 2 Theoretical framework

This project analyzes data in real time, therefore it is not possible to use the standard machine learning techniques. It is necessary to use its adaptation to real time series data analysis. In this way, the three main methods used in this project are Kalman filter and Recursive Least Squares, for the imputation of missing values, and the Recursive Hampel Filter, for the filtering of the signals.

2.1 Kalman filter

The Kalman filter is named after Rudolph Kalman, who in 1960 proposed a recursive solution to the discrete-data linear filtering problem.

This filter is a set of mathematical equations for an optimal estimator of the predictor-corrector type. Thus, it is a method that produces an estimation of unknown variables from a series of measurements observed over time which have some statistical noise.

Algorithm:

Consider that we want to know the value of a variable within a process of the form x_t= Atxt−1+ Btut+ wt

where xt is the state vector, Atthe state-transition matrix from time t − 1 to t, ut the input data

and wt is the associated white noise process with known covariance. This is what we call the

state equation.

Moreover, we can model the observations by what is called the measurement equation: z_t= Ctxt+ vt

where zt is the actual measurement of x at the time t, Ct is the connection matrix between the

(11)

measure-ment error. The error covariance matrix Pt is given by:

P_t= E[eteTt ] = E[(xt− ¯xt)(xt− ¯xt)T]

The Kalman filter is usually described in two steps: time update and measurement update.

Figure 2.1: Schema for Kalman Filter

In this way, having the system

xt= Atxt−1+ Btut+ wt

z_t= Ctxt+ vt

PREDICTION: These are predictions of the state variables just taking into account the dy-namics that have been assumed to follow

¯

xt= Atxt−1+ Btut (Prediction of the state)

¯

P_t = A_t· P_t−1· AT

t + Rt (Prediction of the covariance)

ACTUALIZATION: we improve the predicted data using the information of the observable variables

K_t= ¯P_tC_tT(CtP¯tCt+ Qt)−1 (Calculation of Kalman gain)

xt= ¯xt+ Kt(zt−Ctx¯t) (Actualization of the state)

(12)

where,

At: state-transition matrix from t − 1 to t

B_t: observation matrix

w_t: process noise, where wt ∼ N(0, Rt)

Rt: covariance matrix of the process noise

v_t: observation noise, where vt∼ N(0,Ct)

C_t : matrix that relates the state with the measurements Qt : covariance of the observation noise

K_t: Kalman gain. It indicates the confidence in the observed characteristics

There is a result that the optimum weight between two signals that gives the smallest standard deviation of the result is when the weight is the ratio of the variance of the two signals. Thus, the Kalman weight K is the ratio of the variances of the signals.

In this manner, if we rewrite the equation for the actualization of the state as follows: x_t= (1 − Kt ·Ct)x_t+ K_t· z_t

It is possible to see that if the input has not too much noise, this means that the standard de-viation of z is low and then the Kalman gain K would be larger, giving a higher weight to the input signal. While if the input signal z has a lot of noise, then the current value of the input would have less weight and therefore, the old value of the state will have more, as 1 − K would be larger.

In this way, it can be seen that the Kalman filter will be correct when a correct value of the noise is presented to it. Therefore, the signal curation process should not change the standard deviation of the signal. However, if the signal has outliers, interruption, locks at certain value or noise not related to the process, then these disturbances need to be removed by the signal curation.

Note: It is important to remark that in the case where the process has a non-linear equa-tion, the method that would be used is the Extend Kalman Filter. However, in this project we have focused only on the linear case.

Augmented Kalman filter

This filter includes in the state vector some unknown system parameters to be estimated (bias). These parameters are modelled as stochastic variables where bt is zero mean Gaussian noise

(see equations below). In this way, the covariance matrix will become an augmented process noise matrix such that:

P_aug= Px 0 0 P_b

(13)

Therefore, by augmenting the state vector with the bias, the Kalman Filter can be used to track simultaneously both the states and the bias. Thus, the augmented model would look like:

xt+1= Atxt+ Btut+ Ftbt+ wt

z_t = Ctxt+ Gtbt+ vt

where xt is the state vector, ztthe observation vector, ut the input vector and bt the bias vector.

If we reformulate this as states:

¯ A= A F 0 I ¯ B= B 0 ¯ C= C G X= xt b_t ¯ w= w x t wb_t

This gives the new model:

X_t+1= ¯A_tX_t+ ¯B_tu_t+ ¯w_t z_t= ¯C_tx_t+ vt

Hence, we have seen the formulation of the Augmented Kalman Filter based on the analysis of augmented complex statistics.

2.2 Recursive least squares

Recursive Least Squares was discovered by Gauss in 1821, but it was not until 1950 when Plackett rediscovered the original work of Gauss, when it gets the use and importance that it has nowadays.

It is an adaptive filter which recursively finds the coefficients wn to minimize the weighted

least squares error cost function

C(w) =

N

∑

i=0

λN−i· e(i)2

where 0 < λ ≤ 1 is the ‘forgetting factor’ which exponentially gives less weight to the previous samples. In this manner, the smaller λ is, the less contribution of older samples is in the covariance matrix. Normally, λ has a value between 0.98 and 1.

Algorithm

Being the nth order linear regression of the form

(14)

where the measurement d(k) of x(k) has a zero mean value noise e(k) so that d(k) = x(k) + e(k); k= 1, 2, . . .

The RLS algorithm tries to find a recovered signal ˆd(N) for the observation N, such that the estimation is good if the error ˆd(N) − d(N) is a small value.

In this way, the weight least squares error cost function would be

C(w) = N

∑

i=0 λN−i· e(i)2 = N

∑

i=0

λN−i· (d(i) − ˆd(i))2

= N

∑

i=0 λN−i· (d(i) − n

∑

k=0 w_k(i)x(i))2 Calling,

d0(i) = λN−id(i) x0(i) = λN−ix(i) then the cost function can be rewritten as

C(w) = N

∑

i=0 (d0(i) − n

∑

k=0 w_k(i)x0(i))2

which is the usual standard least squares criterion. Hence, the LS solution can be calculated as follows, w= N

∑

i=0 x0(i)x0(i)T !−1 · N

∑

i=0 x0(i)d0(i) = N

∑

i=0 λN−ix(i)x(i)T !−1 · N

∑

i=0 λN−ix(i)d(i)

where we will denote,

Φ(N) = N

∑

i=1 λN−ix(i)x(i)T ψ (N) = N

∑

i=1 λN−ix(i)d(i) To be able to find a recursive in time way to calculate wn,

(15)

we need to rewrite the variable Φ(N) and ψ(N) as functions of the previous sample, so that Φ(n) = N

∑

i=1 λN−ix(i)x(i)T = λ N−1

∑

i=1 λN−1−ix(i)x(i)T+ x(N)x(N)T = λ Φ(N − 1) + x(N)x(N)T ψ (n) = N

∑

i=1 λN−ix(i)d(i) = λ N−1

∑

i=1 λN−1−ix(i)d(i) + x(N)d(N) = λ ψ(N − 1) + x(N)d(N)

Prop: (The matrix inversion lemma) If A and B are M × M positive definite matrices, D is a N× N matrix, and C is a M × N matrix which are related by

A= B−1+CD−1CT then,

A−1= B − BC D +CTBC−1CTB

Applying the previous proposition to Φ(N), we obtain,

Φ−1(N) = λ−1Φ−1(N − 1) −λ −2_Φ−1_{(N − 1)x(n)x}T_(N)Φ−1_{(N − 1)} 1 + λ−1xT(N)Φ−1(N − 1)x(N) where denoting, P(N) = Φ−1(N) and g(N) = λ −1_{P(N − 1)x(N)} 1 + λ−1xT_{(N)P(N − 1)x(N)} = (calculations) = P(N)x(n) we obtain, P(N) = λ−1P(N − 1) − λ−1g(N)xT(N)P(N − 1) and thus, w(N) = (Φ(N))−1ψ (N) = P(N)ψ (N) = P(N)(λ ψ(N − 1) + x(N)d(N)) = P(N)(λ Φ(N − 1)w(N − 1) + x(N)d(N)) = P(N) Φ(N) − x(N)x(N)T w(N − 1) + x(N)d(N) = w(N − 1) − P(N)x(N)x(N)Tw(N − 1) + P(N)x(N)d(N) = w(N − 1) + P(N)x(N) d(N) − x(N)Tw(N − 1) = w(N − 1) + P(N)x(N)α(N) = w(N − 1) + g(N)α(N) where, α (N) = d(N) − x(N)Tw(N − 1)

is the a priori error. It is important to remark that it is different from the a posteriori error e(N) = d(N) − xT(N)w(N)

(16)

Now, with the previous equations, it is possible to summarize the RLS algorithm, g(N) = λ −1_{P(N − 1)u(N)} 1 + λ−1x−1xT_{(N)P(N − 1)x(N)} α (N) = d(N) − x(N)Tw(N − 1) P(N) = λ−1P(N − 1) − λ−1g(N)xT(N)P(N − 1) w(N) = w(N − 1) + g(N)α(N)

It is possible to see that the equation for the covariance matrix P follows an algebraic Riccati equation and hence, it has similarities with the Kalman filter.

2.3 Recursive Hampel filter

The main idea of the Recursive Hampel Filter is to run through the signal entry by entry, replacing if necessary, each entry by the median of its neighbours. These neighbours form a ‘window’ which moves over the entire signal. In this way, let w be the following moving data window

wK_i = {yi−K, yi−K+1.., xi, .., xi+K−1, xi+K}

where K is the width, yi− jthe output for the previous input sequence {xi} at time i − j.

The recursive standard median filter introduced by J.W. Turkey (1974), is obtained by cal-culating the median of the window such that

m_i= median{yi−K, yi−K+1.., xi, .., xi+K−1, xi+K}

This recursive standard median filter performs accurately against local outliers in the data. However, it can also introduce some distortion, making this technique to often fail entirely in practice. For this reason, new techniques such as median filter extensions, recursive median filters or weight filters have been developed.

Algorithm

Recursive Hampel filter is a simple but efficient filter to identify outliers in a time-series data sequence, replacing these with a more reasonable value. They are known to operate better than standard median filters, as these typically work with a fixed threshold, while the Hampel filter uses a scaling factor that takes into consideration the distribution of the data.

In this way, the Recursive Hampel Filter uses the median of the neighbour observations as a reference and the median absolute deviation (MAD) as a measure of distance. So that, a data point is considered an outlier when it lies more than a number t (threshold) of MAD from the median of its neighbour, and it would be replaced by the median.

(17)

More precisely, for each observation the Recursive Hampel Filter builds a moving window with K previous points and K next future points. Hence, the MAD is calculated from these 2K + 1 points and if necessary, the observed point would be replaced by the median of these 2K + 1 observations. Analytically, the filter’s response is given by

y_i= xi, if |xi− mi| ≤ t · Si m_i, if |xi− mi| > t · Si

where xithe observation to analyze, miis the median value for the window, t the threshold and

Siis defined as

S_i= σ · madi≈ 1.4826 · median(|xi−K− mi|, .., |xi+K− mi|)

The factor 1.486 makes the MAD an unbiased estimator of the standard deviation for Gaussian data.

It is important to remark that this moving data window wK_i is not well defined for the first and last K observations. The most common approach for these points is to extend the ori-ginal sequence by appending K additional copies at the beginning of the sequence for the first element or at the end, for the last one. However, this approach will not be very useful if the last observation is an outlier. In this way, a delay of K observation would be needed in order to perform the algorithm properly and to introduce a specific constant for each different ana-lyzed signal. Therefore, a better approach is to only use a window formed by the previously computed values and the new point.

(18)

Chapter 3 Study case

3.1 Contextualization

QTAGG has developed a real-time autonomous system which continuously schedules an op-timum propulsion plan governing the engines and propellers of the vessel. In this manner, the precision of the used signals is very important, as a wrong signal can generate incorrect control effects which can cause serious damages to the equipment or passengers.

The process of the control system would be as follows,

Figure 3.1: Schema of the full control process

The data comes from the sensors of the vessel with some disturbances which the signal cura-tion system will process. It is important to remark that this analysis and correccura-tion need to be precise and very accurate, and not a total ‘clean’ of the signal, since some of the noise will be

(19)

later used by the controller of the ship to model and optimize efficiently the consumption of fuel. At the same time, the curation system should remove outliers since the control system calculates derivatives of signals and a very large derivative can have disastrous consequences.

The signal correction block is classifying the residual to obtain an appropriate curation method as follows:

• If the standard deviation is larger than a given range, we apply the Hampel filter

• If the modeled value from the least square estimation is larger or smaller than a given value, then a least-square correction is applied

• If the process model used by the Kalman filter gives an estimated value much different from the actual value, then the system uses a different model structure for the Kalman filter.

More details are explained in the code section.

Hence, the method of the signal curation system to automatically detect and correct signals works as follows:

1. Model the properties of the signal

2. Preprocess to identify if a signal is correct or not

3. Select the method necessary to improve the signal using the residual 4. Compute the correction of the signal

(20)

There are different scenarios of disturbances in a signal that need to be corrected:

PROBLEM SOLUTION

Lost data Imputation by statistic relation with other sensor signals, using methods such as Kalman filter

Outliers Correct wrong samples, when it is known that physical properties would not allow large variations, using techniques such as Hampel filter

Wrong data Detection of wrong values, such as when a specific value is repeated during a too long period of time

Table 3.1: Signal problems and their solution

In this project, a system for signal curation based on statistics and mathematical modelling is implemented in order to detect and correct disturbances which can appear in the data signals of a vessel and that will be used in QTAGG’s control system.

3.2 Approach and analysis

As it has been explained before, there are different reasons why a signal needs to be corrected. In this way, it is necessary a different method and implementation for each different case. In this project, we have focused concretely on three signals: Fuel consumption, GPS position and Speed through water.

The data used in this project was taken from Mariella, a vessel that links Stockholm and Helsinki. Concretely, this data corresponds to the date 21/09/2018. Each data logs contains 15 minutes of samples, and for this project, we used 11 hours of data in total. The data logs are organized such that every column is a different signal coming from the physical sensors of the vessel, and each sample of the dataset (row) is taken every 0.25 seconds.

(21)

Figure 3.3: Example of some columns of a data log used in this project

3.2.1 Fuel consumption

Lost data

Let’s consider N signals obtained from the sensors of a vessel. Some of these signals are related to each other, in the manner that a signal sncan be computed from other related signals,

such that

ˆ

s_n(t) = f si(t), sj(t), sk(t), ...

In this way, if the signal snhas missing values, these can be estimated from the other related

signals.

This is the case for the variable fuel consumption. Engine power, speed, boat weight, sea state, wind resistance and the number of miles sailed have a direct influence on fuel consump-tion. By looking at the correlations between the fuel consumption signal and some of these variables in the data, it is possible to state the following regression model:

f c= β0+ β1· speed + β2· wind

We also need to take into account the number of engines that are ON in the moment the estim-ation is calculated, therefore we are going to have five of this model depending on the number of engines. So that, for instance, if the value that we want to estimate is when two engines are ON, instead of taking the immediately previous value (that maybe corresponds when three engines are ON), we will take the previous value when two engines were ON.

(22)

It is important to remark that in the data, there are two different variables for the speed ( ‘Speed through water’ and ‘Speed over ground’). The first one is the speed of the vessel under the water, which can be affected by the sea current, and second one is the speed of the vessel above the water, which can be influenced by the wind. These two variables are really similar to each other and therefore, there is a high risk of having multicollinearity. We take into con-sideration only one of them (Speed Through Water) to include in the model, as it is the only relevant for the fuel consumption.

Moreover, there are also different variables related to the wind speed, but we will take into account the variable that refers to the relative speed of the wind, since it already takes into consideration the angle of it.

With respect to the model, instead of using a standard multiple regression, we have implemen-ted the recursive least squares method in order to adapt the model to real-time data analysis.

In the following graph, it is possible to see the accurate performance of this method.

(23)

Outliers

An outlier is an observation which is distant from the rest of points in a dataset. It needs to be analyzed carefully since it often contains valuable information, but sometimes it is just a bad data point which causes serious problems in statistical analyses.

In our case, the outliers will be normally just errors in the system. For example, if we analyze the ‘Fuel consumption’ signal, it cannot go from 800 l/h to 1300 l/h suddenly, since it is not physically possible. Therefore, this outlier should be corrected. In order to do so, a Recursive Hampel Filter has been implemented.

As we have seen in the previous section, the Recursive Hampel filter uses a window of the form,

where K is the width of the window, yi− j the output for the previous input sequence {xi} at

time i − j.

Therefore, K ‘future’ neighbours are needed. The most frequent approach is to extend the original sequence by adding K additional copies at the beginning of the window for the first element or at the end, for the last one. Nevertheless, this technique does not perform very well for outliers. Instead, it is better to consider a window formed by the previously computed values and the new point, and to include a specific constant, such that a point is considered an outlier if

|xi− mi| > t · max (Si, cte)

where xiis the observation to analyze, mithe median value for the window, t the threshold and

S_iis defined as

S_i= σ · madi≈ 1.4826 · median(|yi−K− mi|, .., |xi+K− mi|)

It is important to remark that the number of neighbours K should not be a large number since if so, the signal would be ‘too clean’. This is, the noise of the signal would also be removed, and it is important to keep it, as it is necessary for future analysis in the controller of the vessel.

(24)

Figure 3.5: Outlier filtering in fuel consumption signal

3.2.2 GPS Position

Outliers

For the GPS Position, we have focused on the detection and correction of outliers since it is very difficult to input values for the GPS Latitude and Longitude, as they are not very correl-ated with any other signals.

Firstly, it is essential to remember that for the latitude any value out of −90◦ and 90◦ and for the longitude, −180◦and 180◦, is considered an outlier.

To improve the performance of the Recursive Hampel Filter, we decomposed the data point as below, where we considered the unitary circumference for the calculations.

(25)

sinθ = a c = a cosθ = b

c = b

In this way, the Recursive Hampel Filter method is applied in ‘signal a’ and ‘signal b’. Later, the signal is re-transformed into degrees.

Instead of applying the following transformation,

tanθ = a

b =⇒ θ = arctan a

b

we calculate each θ of ‘signal a’ and ‘signal b’, and compute the average of them. Since, the division between the values of ‘signal a’ and ‘signal b’ can be minimal, there is a big risk that MATLAB does not perform accurately the computation of the value of θ . In this way,

θa= arcsin(a)

θ_b= arccos(b)

=⇒ θ =θa+ θb 2

In the following graphs for the Latitude signal, it is possible to see that there is only a straight line, this is because the values for the signal changes around 0.001 degrees every 6 or 7 samples approximately (1-2 seconds in real life). This makes sense, since this data was taken when the boat was already sailing in the sea and not in the harbour making any turn. However, to see the good performance of the algorithm we included several outliers in the dataset.

(26)

Figure 3.6: Outlier filtering in GPS Latitude signal

Figure 3.7: Detail in outlier filtering for GPS Latitude signal

It is possible to see that the outliers are really well detected and replaced by an accurate value, therefore the model performs properly.

(27)

3.2.3 Speed Through Water

Lost data

For the imputation of values for the Speed Through Water signal, we have applied the Aug-mented Kalman filter. This method is a variation of the standard Kalman filter which includes some undetermined system parameters to be estimated (bias) in the state vector. This bias is considered as zero mean Gaussian noise.

The model for the Augmented Kalman Filter looks like follows: x_t+1= Atxt+ Btut+ Ftbt+ wt

zt = Ctxt+ Gtbt+ vt

where xt is the state vector, ztthe observation vector, ut the input vector and bt the bias vector.

The new covariance matrix will be extended to an augmented process noise matrix such as:

P_aug= Px 0 0 P_b

In this way, to construct the specific model for the estimation of the STW signal, we could set the following relation:

stw= sog + k · w

where, stw is the Speed Through Water signal, sog the Speed Over Ground signal and w the Relative Wind signal of our dataset.

It is important to remark that although the difference between their values is really small, and it might seem to be the same, the STW and SOG signals measure different parameters.

Besides, we have taken into consideration the wind. The first idea was to take into consid-eration the sea current, however there was no data available. Thus, the wind was included in the bias term in order to improve the estimation of the Speed Through Water signal. We have selected the relative speed of the wind since it also takes into account the direction and angle of the wind.

In this manner, the model would look like:

X_t+1= ¯A_tX_t+ ¯B_tu_t+ ¯w_t zt= ¯Ctxt+ vt

(28)

where, ¯ A=   1 2 1 2 k 1 2 1 2 −k 0 0 1   C¯=   1 0 k 0 1 −k 0 0 k   X=   stw_t sog_t bt   w¯ =   wstw_t wsog_t w_tb   ¯

B= 0 since we have no known inputs. Moreover, it is important to remark that the variances w_tstwand w_tsogare estimated using the median of the variance over a sliding window.

We have tried this model in the set of data obtained from the Mariella, and we can see the accurate performance of it.

(29)

Figure 3.9: Detail of comparison of raw data of STW with its estimation using Kalman Filter

Outliers

The Speed Through Water is the speed of the vessel relative to the water. It is very important that the data is accurate and clean as it is vital for a good control and performance of the boat.

As for the case of the Fuel Consumption signal, in order to remove the outliers, a Recurs-ive Hampel Filter has been implemented.

To see the precise performance of this method, we added a random outlier to the data se-quence from Mariella. The filter is able to detect the outlier and substitute it for a more precise point.

(30)

(31)

Chapter 4 Matlab code

In this section, we are going to give an in-depth explanation of the code developed in this thesis for the implementation of the signal curation system.

4.1 Fuel consumption

4.1.1 Lost data

For the imputation of missing data in the fuel consumption signal, it was implemented a func-tion, which from the relation between the fuel consumption with the speed through water and wind speed, and also the number of working engines, makes possible to apply Recursive Least Squares and compute the missing values in real time.

function y_est = FuelConsumptionMissingValuesLMB(stw,Wind, fc,E1Speed, E2Speed,E3Speed,E4Speed)

% Function to estimate the fuel consumption taking into %account the number

% of engines that are ON in the moment of estimation % Input : Log data

% Output: ’clean’ fuel consumption signal %

% p = # of observations (filter order) % n = # of variables

% w Estimated coefficients n X 1 % x Observation matrix p X 1 % P Covariance matrix n X n

% alpha Error p X 1

% lambda Forgetting factor

(32)

g2 alpha2 P2 w2 g3 alpha3 P3 w3 g4 alpha4 P4 w4

if isempty(w0)

w0 = zeros(2,1); end

... %Same for w1, w2, w3 and w4

if isempty(P0) P0 = eye(2); end

... %Same for P1, P2, P3 and P4

We created a function called RLSEngineRealTime which has as inputs: the Speed Through Water, Relative Wind Speed, Fuel Consumption, Engine 1 Speed, Engine 2 Speed, Engine 3 Speed and Engine 4 Speed signals. The obtained output is an imputation of the fuel consump-tion signal.

For a later implementation in QTAGG’s system, some variables are defined as persistent and set as zero.

%Variables ======================================= %SI measurements

stw = stw*1.8519*5/18; %Transform knots into m/s %Wind already in m/s

fc = fc/3600;%Transform l/h into l/s

E1Speed = E1Speed*2*pi/60; %Transform RPM into rad/s E2Speed = E2Speed *2*pi/60; %Transform RPM into rad/s E3Speed = E3Speed *2*pi/60; %Transform RPM into rad/s E4Speed = E4Speed *2*pi/60; %Transform RPM into rad/s

%General names for variables

%================================================== x1 = stw;

x2 = Wind; y = fc;

For a better performance of the algorithm, we worked with system international units. This is, meters per second for wind speed and speed through water, liters per second for the fuel signal and radians per second for the angular speed of the engines.

(33)

% Initialisation =================================== z = [x1 x2];

engines = [E1Speed E2Speed E3Speed E4Speed]; lambda = 0.98;

Now, the Recursive Least Squares method is implemented. To start, we first set a value for the forgetting factor λ .

%RLS =============================================== d = y;

x = z’;

num_engines = 4 - sum(engines == 0);

%RLS for 0 Engine (stopped) if (num_engines == 0) alpha0 = d - x’*w0; g0 = P0 * x * (lambda + x’*P0*x)^(-1); P0 = 1/lambda * P0 - g0*x’*(1/lambda)*P0; w0 = w0 + alpha0 *g0; y_est = w0(1)*x(1) + w0(2)*x(2); if y_est < 0 y_est = 0; end end

... %Same for 1, 2, 3, 4 Engines ON

Later, the method is implemented as shown in chapter 2. However, it is important to remark that we need to take into account the number of engines that are ON in the moment the estimation is calculated. In this way, there will be 5 models, one for each case (0,1,2,3,4 engines ON).

% Convert into l/h again =========================== y_est = y_est*3600;

(34)

4.1.2 Outliers

Normally the outliers registered in the fuel consumption signal are just errors in the system. Therefore, they should be corrected. To do so, we have implemented a Recursive Hampel Filter function.

%

function [y_est,winf] = FuelConsumptionOutliersLMB(winf, d) %

%Function to remove outliers of short time duration for the %fuel

%consumption signal %

%As it is necessary to have K ’future neighbours’, the %delay cannot

%be less than K sample intervals %

%Source reference: S.B. Leeb, et. al. (1997) ’Applications %of Real-Time Median Filtering

%with Fast Digital and Analog Sorters’ (page 138) %

%Input: signal to be cleaned %Output: ’clean’ signal

%

% p = #number of observations

% k = #number of neighbours to take in the window

% constant = Specific constant for the RHF of the signal % sd = Standard deviation

% indice = Index to build the first sliding window % t = Threshold

% w = Sliding window p X 1

persistent k constant t sd indice w

if isempty(k)

k = 13; % number of neighbour points. %Equivalent to 3segs in real life

end

if isempty(constant)

constant = 50; %Specific constant for fuel consumption %signal

(35)

end

if isempty(sd)

sd = 1/(erfinv(1/2)*sqrt(2)); %Standard deviation end

if isempty(t)

t = 1; %Threshold for fuel consumption signal end

if isempty(indice)

indice = 1; %Index for first values end

if isempty(w)

w = zeros(1,14); %Sliding window end

We created a function called FuelConsumptionOutliersLMB which has as inputs: the pre-vious Fuel Consumption signal and the new value. The obtained output is the corrected value of the signal.

Later, the values for the standard deviation, threshold, number of neighbours for the win-dow and specific constant for the fuel consumption signal are established, and for a later implementation in QTAGG’s system, defined as persistent. It is important to remark that the number of neighbours should not be very high, since it is important to keep some of the noise of the signal in order to use it later in the controller of the system.

% Initialisation ===================================== % First values are used to build the first sliding window if indice <= k

y_est = d;

winf(indice) = d; indice = indice+1; else

Then, to start the algorithm, the first k values of the signals are not filtered since there is no possible window to analyze them with.

(36)

%RHF =================================================== w(1:13) = winf;

w(14) = d; %Definition of the sliding window

%Calculation of median of the window and mas value medi = median(w);

mad = median(abs(w-medi));

if (abs(d - medi) <= t*max(sd*mad,constant)) %RHF %condition

y_est = d; %Return y_est value

winf = w(2:14); %New inferior window

else

y_est = medi; %Return y_est value w(14) = medi;

winf = w(2:14); %New inferior window end

end

Then, the Recursive Hampel Filter method is implemented. As we have seen before, the Recursive Hampel filter uses a window of the form,

where K is the width of the window, yi− j the output for the previous input sequence {xi} at

time i − j.

However, in the real time series implementation, there are no future points, thus the win-dow will be formed only by K previous values of the sequence and the new value of the signal.

By the Recursive Hampel Filter, a point is considered an outlier if |xi− mi| > t · max (Si, cte)

where xiis the observation to analyze, mithe median value for the window, t the threshold and

S_iis defined as

Si= σ · madi≈ 1.4826 · median(|xi−K− mi|, .., |xi+K− mi|)

(37)

4.2 GPS Position

4.2.1 Outliers

For the GPS Position signals, as the latitude and longitude are not very correlated with any other signals from the dataset, it is very complicated to impute missing values. Hence, we have focused only on the diagnosis and adjustment of outliers.

We have implemented two different functions, one for the GPS Latitude signal (GPSLatOut-liersLMB) and another for the GPS Longitude (GPSLongOut(GPSLatOut-liersLMB), which are equal ex-cept a specific condition for each of them, as it is shown below.

%

function [y_est,winf_a,winf_b] = GPSLatOutliersLMB(winf_a, winf_b, d)

%

%Function to remove outliers of short time duration for the %GPSLong signal

%

%As it is necessary to have K ’future neighbours’, the delay %cannot

%

% constant = Specific constant for the RHF of the signal % sd Standard deviation

% indice Index to build the first the first sliding window % t Threshold

% w Sliding window p X 1

persistent d_ant k constant t sd indice w_a w_b

if isempty(k)

(38)

%Equivalent to 3seg in real life end

constant = 0.5; %Specific constant for STW consumption %signal

end

if isempty(d_ant)

d_ant = 0; %Previous value of d end

if isempty(sd)

if isempty(t)

t = 2; %Threshold for STW consumption signal end

if isempty(indice)

if isempty(w_a)

w_a = zeros(1,14); %Sliding window signal a end

if isempty(w_b)

w_b = zeros(1,14); %Sliding window signal b end

The input of the function GPSLatOutliersLMB are the previous clean sequence, the previ-ous sequence of signal a and signal b and the new value.

As in previous cases, some variables are set as zero and defined as persistent.

%Latitude constraint ================================= d = d_ant;

(39)

d_ant = d; end %Long constraint ===================================== if (d < -180 | d > 180) d = d_ant; else d_ant = d; end

Later, the latitude constraint is set: any value out of −90◦and 90◦is considered an outlier. If it would be for the GPSLongOutliersLMB function, the implementation for the Longitude signal, the condition would be: any value out of −180◦ and 180◦ is considered an outlier. Hence, the code would be as shown before.

% Initialisation ======================================= signal_a = sind(d);

signal_b = cosd(d);

% First values are used to build the first sliding window if indice <= k y_est = atand(signal_a/signal_b); winf_a(indice) = signal_a; winf_b(indice) = signal_b; indice = indice+1; else

Then, the new value is transformed into ‘signal a’ and ‘signal b’, applying sin and cos functions. It is important to remember that we are working in degrees and it is necessary to specify it to MATLAB.

Besides, to start the algorithm, the first k values of the sequence are not filtered since there is no window to analyze them with.

(40)

%RHF signal_a =========================================== w_a(1:13) = winf_a;

w_a(14) = d; %Definition of the sliding window

%Calculation of median of the window and mas value medi_a = median(w_a);

mad_a = median(abs(w_a-medi_a));

if (abs(d - medi_a) <= t*max(sd*mad_a,constant)) %RHF %condition

winf_a = w_a(2:14); %New inferior window

else

signal_a = medi_a; %Return y_est value w_a(14) = medi_a;

winf_a = w_a(2:14); %New inferior window end

theta1 = asind(signal_a);

... %Same for ‘signal b’

Next, the Recursive Hampel Filter method is applied in ‘signal a’ and ‘signal b’ separately.

%Return to linear measurement =========================== y_est = (theta1 + theta2)/2;

And finally, the value is returned to linear measurements, being the average of both θ (θ1

for latitude and θ2for longitude).

4.3 Speed Through Water

4.3.1 Lost data

For the imputation of missing values in the Speed Through Water signal, we have used the Kalman Filter method. In this way, we have two different functions: one for the model of the problem and another for the implementation of the Kalman Filter algorithm.

(41)

%

function xhat = Kalman2SAuto_final(s1, s2, s3, range) %Function for the designed model

%Inputs:

%s1 = first signal %s2 = 2nd signal %s3 = 3rd signal

%coef = coefficients in the least squares relation between %the signals

%range = vector defining valid values for s1,s2,s3 %

%Output:

%xhat = estimated signal %

%if no range specified if nargin < 3

range = [-inf inf; -inf inf]; end

n = min(length(s1),length(s2));

z = [s1(1:n)’;s2(1:n)’]; %input signals used b0 = s3(1); %initial estimate of bias

k=2;

while isnan(b0)

b0 = s3(k); %initial estimate of bias k = k+1;

end

For the model of the problem, we create a function called Kalman2SAuto-final which has as inputs: Relative Wind Speed, Speed Though Water, Speed Over Ground signals and the range which these signals cannot exceed. The obtained output is the estimated signal.

We also establish the first value of the Relative Wind Speed signal into the bias, to improve the estimation of the Speed Through Water.

%Estimate variances discarding any windows with zero %variance

(42)

v2 = movvar(s2,9,’omitnan’); v3 = movvar(s3,9,’omitnan’); v1 = median(v1(v1~=0),’omitnan’); v2 = median(v2(v2~=0),’omitnan’); v3 = median(v3(v3~=0),’omitnan’);

Px = diag([v1 v2]); %estimated state variance

Pb = v3; %estimated variance of bias, errors assumed to be %independent

V = diag([v1 v2]); %estimated observation variance

%Calculation of the coefficient of the relation among the %variables

%stw = b1*sog + b2*sc + b0 R = corrcoef(s1,s2,s3);

coef = [R(1,1) R(1,2) R(1,3)];

Then, the variances are estimated using the median of the variance over a sliding window. And from these values, we later compute the augmented process noise matrix such as:

Paug=

Px 0

0 P_b

From the correlation matrix, we calculate the coefficients to construct a specific model for our problem.

stw= sog + k · w

where, stw is the Speed Through Water signal, sog the Speed Over Ground signal and w the Relative Wind signal of our dataset.

%Model

A = [1 1; 1 1]/2; %Model, states equal to mean \pm bias F = [coef(3); -coef(3)];

C = A; %Measurements, equal to observation \pm bias G = [coef(3) -coef(3)]’;

%Call Kalman filter function

(43)

Later, the model explained in the previous section is defined and we call the Kalman Filter function.

function xhat = s2kalman(z, b0, A, F, C, G, V, Wx, Wb, range) % Kalman filter for two inputs with slowly changing

%constant difference

% between them. For example stw and sog.

% Signals are modelled as the mean between the two %plus/minus a dynamically

% changing bias estimate using a Kalman filter.

% While one signal is missing the bias estimate will %slowly decrease towards zero.

%

% z Measurement signals m = # of observations % A State transition model n X n, n = # of states % F State Bias model n X 1

% C Observation model m X n % G Observation Bias model n X 1 % V Observation noise estimate n X n % Wx State noise estimate n X n % Wb Bias noise estimate 1 X 1

% range domain for z 1 X 2 ex [min max]

% Initialisation ====================================== if nargin < 10

range = [-inf inf; -inf inf]; end

if size(range,1) == 1

range = [range; range]; end

A new function s2kalman is created for the Kalman Filter algorithm, which has as inputs: the signals of the model, the first value in the bias, the matrices of the model, the variance matrix for the observations, the augmented matrix and the range which the signals cannot exceed. The obtained output is the estimated signal. This function was already created by my supervisor Christopher Engström for previous projects in QTAGG.

Abar = [A F; zeros(size(F,2),size(A,2)) eye(size(F,2))]; Cbar = [C G];

(44)

The matrices ¯Aand ¯C are built from the input matrices. ¯B= 0 since we have no known inputs.

Xt+1= ¯AtXt+ ¯Btut+ ¯wt

z_t= ¯C_tx_t+ vt

Besides, the augmented covariance matrix is built from the covariance matrices created in the previous function.

% Number of sensors m = size(Cbar, 1);

% Number of state values n = size(Cbar, 2);

% Number of observations numobs = length(z);

%initialize first state to first observation in interval xhat = zeros(n, numobs);

%range condition k = 1;

while z(1,k) < range(1,1) || z(1,k) > range(1,2) || isnan(z(1,k))

k = k+1; end

xhat(1,1) = [z(1,k)]; k = 1;

while z(2,k) < range(2,1) || z(2,k) > range(2,2) || isnan(z(2,k))

k = k+1; end

xhat(1,2) = [z(2,k)]; xhat(1,3) = b0;

More initial conditions are set and the range condition is checked before starting the Kal-man Filter algorithm, which also discards the missing values of the dataset.

% Initialize P, I P = ones(size(Abar));

(45)

I = eye(size(Abar));

% Kalman Filter ==================================== for k = 2:numobs

if z(1,k) < range(1,1) || z(1,k) > range(1,2) || isnan(z(1,k))

alpha1 = 0; %if z(1,k) is out of the range or a %missing data, to not take it into account for %the Kalman z(1,k) = 0; else alpha1 = 1; end if z(2,k) < range(2,1) || z(2,k) > range(2,2) || isnan(z(2,k)) alpha2 = 0; z(2,k) = 0; else alpha2 = 1; end % Predict

xhat(:,k) = Abar * xhat(:,k-1); %prediction P _{= Abar * P * Abar’ + W;}

% Update

K = P _{* Cbar’ / (Cbar * P * Cbar’ + V);} P _{= (I - K * Cbar) * P;}

xhat(:,k) = xhat(:,k) + K * (z(:,k) - Cbar * xhat(:,k)); xhat(3,k) = xhat(3,k)*alpha1*alpha2+xhat(3,k)*

(1-alpha1*alpha2)*0.95;

%reduce bias term when measurements are missing end

Then, the Kalman Filter algorithm explained in the previous section is implemented. Moreover, we have included a condition to properly handle the observations that are miss-ing or out of range. All these measurements will be ‘marked’ and not taken into account in the algorithm. Thus, in the case that an observation is set as missing, we will only use the prediction step to compute its next value. However, we have also introduced some alpha terms that will reduce the bias for these missing measurements.

Thus, we have included a new signal in the bias to improve the standard implementation of the Kalman Filter, in order to reach a better estimation of the Speed Through Water signal.

(46)

4.3.2 Outliers

An accurate Speed Through Water data is a pre-requisite for many decisions related to the control and performance of a vessel. Hence, it is important to have a high quality signal for these future actions and decisions of the system. For the cleaning of the STW signal, we have used the Recursive Hampel Filter method.

%

function [y_est,winf] = STWOutliersLMB(winf, d) %

%Function to remove outliers of short time duration for the %STW signal

%

%As it is necessary to have K ’future neighbours’, the delay %cannot

%

% constant = Specific constant for the RHF of the signal % sd Standard deviation

% indice Index to build the first sliding window % t Threshold

% w Sliding window p X 1

persistent k constant t sd indice w

if isempty(k)

k = 13; % number of neighbour points. %Equivalent to 3seg in real life

end

constant = 0.5; %Specific constant for STW consumption %signal

(47)

end

if isempty(sd)

if isempty(t)

t = 1; %Thershold for STW consumption signal end

if isempty(indice)

if isempty(w)

w = zeros(1,14); %Sliding window end

The input of the function STWOutliersLMB are the previous clean sequence and the new value. The output is the clean signal.

As in previous cases, some variables are set as zero and defined as persistent.

% Initialisation ====================================== % First values are used to build the first sliding window if indice <= k

y_est = d;

winf(indice) = d; indice = indice+1; else

To start the algorithm, the first k values of the data sequence are not filtered as there is no possible window to evaluate them with.

%RHF =================================================== w(1:13) = winf;

(48)

w(14) = d; %Definition of the sliding window

%Calculation of median of the window and mas value medi = median(w);

mad = median(abs(w-medi));

if (abs(d - medi) <= t*max(sd*mad,constant)) %RHF %condition

y_est = d; %Return y_est value

winf = w(2:14); %New inferior window

else

y_est = medi; %Return y_est value w(14) = medi;

winf = w(2:14); %New inferior window end

end

Finally, the Recursive Hampel Filter is implemented, and as we mentioned before, a point is considered an outlier if

|xi− mi| > t · max (Si, cte)

where xi is the observation to analyze, mi the median value of the sliding window, t the

threshold and Siis defined as

S_i= σ · madi≈ 1.4826 · median(|xi−K− mi|, .., |xi+K− mi|)

(49)

Chapter 5 Conclusions

In first place, the aim of this work was to implement a signal curation system that enables automatic detection and correction of abnormalities in the signals of a vessel.

There are three different scenarios of disturbances that a signal can have and that need to be adjusted: lost data, wrong data and outliers. For this project, we focused on the detection and correction of these disturbances in three specific signals: Fuel Consumption, GPS Posi-tion and the Speed Through Water.

To clean the Fuel Consumption signal, we implemented a function based on the Recursive Least Squares method for the imputation of missing values and a Recursive Hampel Filter for the location and rectification of outliers. For the GPS Position signals (Latitude and Longit-ude), we focused on the removal of outliers and correction of wrong data (values out of range) using a function based on the Recursive Hampel Filter. In the Speed Through Water signal, we implemented a Kalman Filter for the estimation of missing values and again a Recursive Hampel filter based function for the correction of outliers.

As we have seen before, these methods perform very accurately and therefore they are precise tools to be implemented in QTAGG’s control system and reach an optimal performance of the vessel.

In this way, our second goal, to make use of Mathematics and Data analysis to obtain robust detection and correction methods can be seen in this project, strengthening the idea of Math-ematics and Data analysis as one of the main keys in the industrial application. As without this signal curation system, the performance of the control system will not be that precise and severe damages to the equipment or passengers can be produced.

Finally, interesting future lines of study could be the implementation of a signal curation system for more interesting signals such as the Roll Position or Engine Speed, or to improve the accuracy of the Fuel Consumption signal by including the sea current in the model, as in this project it was not possible because of lack of data.

(50)

List of Figures

2.1 Schema for Kalman Filter . . . 8

3.1 Schema of the full control process . . . 15

3.2 Schema of the system LMB . . . 16

3.3 Example of some columns of a data log used in this project . . . 18

3.4 Comparison of raw data of fuel consumption with its estimation using RLS . 19 3.5 Outlier filtering in fuel consumption signal . . . 21

3.6 Outlier filtering in GPS Latitude signal . . . 23

3.7 Detail in outlier filtering for GPS Latitude signal . . . 23

3.8 Comparison of raw data of STW with its estimation using Kalman Filter . . . 25

3.9 Detail of comparison of raw data of STW with its estimation using Kalman Filter . . . 26

(51)

Bibliography

[1] R.G.K.M. Aarts System Identification and Parameter Estimation. Lecture Notes, Univer-sity of Twente, 2012.

[2] M. Alexandersson. Design Evaluation Tool of Ship Concept. 2009.

[3] G. Bishop and G. Welch. An Introduction to Kalman Filter. University of North Carolina at Chapel Hill, 2001.

[4] A. Bocchetti and A. Lepore. A statistical control of the ship fuel consumption. 2013. [5] A. El Ouardi. Applying Autonomous Methods for Signal Analysis and Correction with

Ap-plications in the Ship Industry. Master’s thesis, Department of Mathematics, Mälardalen University, 2017.

[6] F. Haugen. State estimation with Kalman Filter. Kompendium for Kyb. 2 with Oslo Uni-versity, 2015.

[7] S.S. Haykin. Adaptive filter theory. 2002.

[8] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng, Volume 82, Pages 35-45, 1960.

[9] J. Lacaille. An Automatic Sensor Fault Detection and Correction Algorithm. AIAA ATIO, Hilton Beach, 2009.

[10] S. B. Leeb et al. Applications of Real-Time Median Filtering with Fast Digital and Ana-log Sorters.IEEE/ASME Transactions on mechatronics, Vol. 2, No. 2, 1997.

[11] S.J. Miller. The method of Least Squares. Mathematics Department Brown University, Vol. 114, 2006.

[12] R.K. Pearson et al. The Class of Generalized Hampel Filters. 23rd European Signal Pro-cessing Conference (EUSIPCO), 2015.

[13] I. Peñarrocha et al. Adaptive extended Kalman filter for recursive identification under missing data.Universitat Jaume I, 2010.

[14] J.W. Tukey. Nonlinear (nonsuperposable) methods for smoothing data. Abstracts, Cong. Rec. EASCON-74, 673. 1974.

(52)

[15] O. Valentin et al. Design Evaluation Tool of Ship Concept. 2016.

[16] Wang X.R. et al. Spatiotemporal Anomaly Detection in Gas Monitoring Sensor Net-works. In: Verdone R. (eds) Wireless Sensor NetNet-works. Lecture Notes in Computer Sci-ence, vol 4913. Springer, Berlin, Heidelberg, 2018

[17] J. Wiklander. Performance comparison of the Extended Kalman Filter and the Recursive Prediction Error Method. Degree project, Control Technology, Technical University of Linkoping, 2003.

[18] S. Boyd. Lectures notes: Linear Dynamical Systems Standford University, 2008. https://stanford.edu/class/ee363/lectures/kf.pdf

[19] I. Tabus. Lectures notes: Recursive Least Squares. Tampere University of Technology, 2012. http://www.cs.tut.fi/ tabus/course/ASP/LectureNew10.pdf

(53)

Appendix A

Summary of reflection of objectives in the

Thesis

Compliance of the requirements of the Swedish National Agency for Higher Education to Master theses (2 years) is shown below.

A.1 Objective 1: Knowledge and understanding

This project deals with the knowledge and understanding of real-time data analysis techniques for noisy environments. In the section for literature review, some different resources neces-sary for the development of this thesis were displayed. Moreover, the whole chapter 2 is an expansion of these methods, with a full explanation of each of them, making also a specific application in the marine field, which is detailed in chapter 3.

Besides, programming skills were indispensable. The student implemented the different meth-ods for three signals of one of the vessels that QTAGG works with, obtaining really precise and accurate results.

A.2 Objective 2: Methodological knowledge

In addition to the theoretical part of chapter 2, where the three main mathematical techniques used in this thesis were exposed, a deeper methodological knowledge is shown in a particular application, the implementation of a ‘magic box’ in order to clean the signals received from the sensors of a vessel.

(54)

A.3 Objective 3: Critically and systematically integrate

know-ledge

For the development of this thesis, the student used different theoretical articles and books for the explanation of the theoretical framework, and also many of the sources that the supervisor in QTAGG George Fodor gave to the student. These last articles that he provided were mainly about signal processing and marine information, which were very useful in order to be able to apply the mathematical methods in the theoretical chapter of this thesis to the marine field.

A.4 Objective 4: Independently and creatively identify and

carry out advanced tasks

The first decision of this thesis was how to design a good structure of it. Then, for the theor-etical part, I decided to give an extended explanation for each of the mathematical methods. Later, for the study case, based on the supervisor Christopher Engström’s code, a new imple-mentation was developed for the signal curation system presented in this thesis, also explaining its contextualization and results in this report. However, it is important to remark the help and discussion with the supervisors in order to improve the content of the thesis.

A.5 Objective 5: Present and discuss conclusions and

know-ledge

The structure of this thesis enables different levels of depth in the subject. As for the under-standing of the project, the study case chapter would be enough. However, for a more in-depth extension, a theoretical framework is available, and also a full explanation of the MATLAB code created for this project. Besides, an oral presentation of this thesis will take place in June 2019, where the content will be exposed, and the audience will have the opportunity to ask questions.

A.6 Objective 6: Scientific, social and ethical aspects

For the mathematical methods behind the implementation of the signal curation system, clear references to the sources used were given in the bibliography section. Besides, a detailed explanation of the data provided by QTAGG for the development of this thesis, can be found in the study case section. And finally, all the study case and implemented code (see Appendix B) is original of this work.

(55)

Appendix A

Appendix: Complete Matlab code

Due to privacy concerns this appendix has been removed. In case of interest, the reader can contact the companyhttps://www.qtagg.com/

Mathematical Methods for Maritime Signal Curation in Noisy Environments

School of Education, Culture and Communication

Division of Applied Mathematics

MASTER THESIS IN MATHEMATICS / APPLIED MATHEMATICS

Mathematical Methods for Maritime Signal Curation in Noisy

Environments

by

Lara Jiménez Blázquez

Masterarbete i matematik / tillämpad matematik

DIVISION OF APPLIED MATHEMATICS

School of Education, Culture and Communication

Division of Applied Mathematics

Contents

Chapter 1

Introduction

1.1

QTAGG: Marine Energy Efficiency

1.2

Problem description

1.3

Goals and objectives

1.4

Content of the report

1.5

Literature review

Chapter 2

Theoretical framework

2.1

Kalman filter

2.2

Recursive least squares

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

2.3

Recursive Hampel filter

Chapter 3

Study case

3.1

Contextualization

3.2

Approach and analysis

3.2.1

Fuel consumption

3.2.2

GPS Position

3.2.3

Speed Through Water

Chapter 4

Matlab code

4.1

Fuel consumption

4.1.1

Lost data

4.1.2

Outliers

4.2

GPS Position

4.2.1

Outliers

4.3

Speed Through Water

4.3.1

Lost data

4.3.2

Outliers