
Mälardalen University, Västerås, Sweden

Thesis for the Degree of Master of Science - Computer Science with

Specialization in Embedded Systems 30.0 credits

FAULT DETECTION IN MOBILE

ROBOTICS USING AUTOENCODER

AND MAHALANOBIS DISTANCE

Christian Mortensen

cmn16002@student.mdh.se

Examiner:

Thomas Nolte

Mälardalen University, Västerås, Sweden

Supervisor:

Nandinbaatar Tsog

Mälardalen University, Västerås, Sweden

Company Supervisor: Jonatan Blom

ABB Corporate Research, Västerås, Sweden


Abstract

Intelligent fault detection systems using machine learning can be applied to learn to spot anomalies in signals sampled directly from machinery. As a result, expensive repair costs due to mechanical breakdowns and potential harm to humans due to malfunctioning equipment can be prevented. In recent years, Autoencoders have been applied for fault detection in areas such as industrial manufacturing. It has been shown that they are well suited for the purpose, as such models can learn to recognize healthy signals, which facilitates the detection of anomalies. The content of this thesis is an investigation into the applicability of Autoencoders for fault detection in mobile robotics by assigning anomaly scores to sampled torque signals based on the Autoencoder reconstruction errors and the Mahalanobis distance to a known distribution of healthy errors. An experiment was carried out by training a model with signals recorded from a four-wheeled mobile robot executing a pre-defined diagnostics routine to stress the motors, and datasets of healthy samples along with three different injected faults were created. The model produced overall greater anomaly scores for one of the fault cases in comparison to the healthy data. However, the two other cases did not yield any difference in anomaly scores due to the faults not impacting the pattern of the signals. Additionally, the Autoencoder's ability to isolate a fault to a location was studied by examining the reconstruction errors of faulty samples to determine whether the errors of signals originating from the faulty component could be used for this purpose. Although we could not confirm this based on the results, fault isolation with Autoencoders could still be possible given more representative signals.


Table of Contents

1 Introduction
  1.1 Problem Formulation
  1.2 Research Contributions
  1.3 Thesis Outline
2 Background
  2.1 Model-based Fault Detection
  2.2 Data-driven Fault Detection
  2.3 Autoencoders
  2.4 Anomaly Scoring
    2.4.1 Distance Metrics
3 Related Work
4 Research Methodology
5 System Model
6 Autoencoder-Based Fault Detection
  6.1 Fault Detection
    6.1.1 Autoencoder
    6.1.2 Anomaly Scoring
    6.1.3 Fault Threshold
  6.2 Fault Isolation
7 Experiments
  7.1 Industrial Use Case
    7.1.1 Diagnostics Routine
    7.1.2 Fault Injections
    7.1.3 Data Collection
  7.2 Synthetic Data
  7.3 Data Pre-Processing
  7.4 Model Training
8 Results
  8.1 Fault Detection
  8.2 Fault Isolation
9 Discussion
10 Conclusions
  10.1 Future Work
References
Appendix A Industrial Use Case Results


List of Acronyms

ANN Artificial Neural Network

CNN Convolutional Neural Network

IMU Inertial Measurement Unit

k-NN k-Nearest Neighbor

LSR Latent Space Representation

LSTM Long Short-Term Memory

MCD Minimum Covariance Determinant

MLP Multilayer Perceptron

PCA Principal Component Analysis

ROS Robot Operating System

SGD Stochastic Gradient Descent

SVM Support Vector Machine


1. Introduction

As the complexity of modern technologies such as robots and automotive vehicles increases, so does the need for robust and intelligent monitoring strategies to automatically detect faults during operation. Due to the large amount of sensor data that is constantly flowing through such systems, it is not feasible for a human diagnostician to analyze and detect anomalies manually from these signals. This is problematic as undetected mechanical faults can lead to damage to the equipment or harm to human operators, among other things [1, 2]. Model-based methods are widely used in the industry to detect faults in mechanical components such as motors, and this is done by mathematically modelling the expected response when some input is given [3]. These methods rely on a priori knowledge of the system to correctly model the expected behaviour, and have difficulties in accounting for noise caused by external factors such as changes in the operating environment [3].

Fault detection using machine learning techniques can alleviate the issues that come with model-based approaches, as a machine learning model can instead be trained to detect faults from recorded sensor data as opposed to hand-crafted mathematical models [4, 5]. The traditional approach to machine learning in the fault detection domain is to extract descriptive features in either the time, frequency, or time-frequency domain using signal processing techniques [3, 6]. Still, the data may contain useless information which will affect the stability and accuracy of the fault detection. To improve the quality of the extracted data, further techniques are commonly applied to select the most relevant features; a common approach is to use principal component analysis (PCA) to reduce the dimensionality of the data [7]. Fault detection can then be done by training a machine learning model using the pre-processed data, e.g. a Support Vector Machine (SVM) or k-Nearest Neighbor (k-NN) [6]. While these approaches have proved to yield very good results when used for fault detection [7, 8], they require expert knowledge about signal processing to design a feature extraction and selection pipeline that will produce descriptive features [6, 9, 10]. In addition to the aforementioned problem, traditional models do not generalize well over multiple applications, often requiring a redesign of the pre-processing pipeline to migrate a model over to another use case [6].

Due to the recent advances in machine learning using artificial neural networks (ANNs), a large part of the current research in the domain is focused on the application of such techniques [2, 3, 4, 10, 11, 12]. In particular, special types of neural networks known as Autoencoders have found interest for fault detection applications. Autoencoders have the ability to extract relevant features directly from raw sensor signals, requiring no complex pre-processing, leading to a greater capability for a single model to generalize over multiple different faults [2, 4, 12, 13]. They are also trained in an unsupervised fashion, where they learn to reduce training samples into intermediate representations which facilitate the reconstruction back to signals that are close to the original samples [13]. Autoencoders are well suited for anomaly detection applications as a representation of fault-free sensor data can be learned, and consequently faulty data will yield poor reconstructions provided that the signals differ enough from the healthy ones.

Previous research in fault detection techniques using Autoencoders has mainly targeted industrial uses in areas such as manufacturing equipment or wind turbines, but applications for these techniques can also be found in other areas such as robotics. Mobile robots operate in dynamic environments and can as such benefit greatly from such techniques, as the model can be trained to extract valuable features from many different situations. Additionally, sensor data can be gathered over longer periods of time and be used to update the weights of the model for better predictive power.


The goal of this thesis is to evaluate the usage of Autoencoders and the Mahalanobis distance metric for fault detection in a four-wheeled mobile robot, and the work is carried out in collaboration with ABB Corporate Research in Västerås, Sweden. Additionally, the ability of the Autoencoder to distinguish between healthy and faulty variables in a sample is investigated for fault isolation purposes.

1.1. Problem Formulation

Intelligent fault monitoring systems can be used to analyze sensor signals from machinery for early detection of faults, and as a result can prevent costly repairs. Data-driven approaches have previously been used successfully for this purpose; they use data recorded from the machinery to teach a model to distinguish between healthy and faulty operating conditions. Traditional data-driven approaches typically require manual feature engineering using e.g. signal processing techniques to convert raw signals into a format that contains relevant information for anomaly detection, a process which can be very work intensive. In recent years, Autoencoders have been successfully applied in models for anomaly detection in various forms of machinery, where they can act as automatic feature extractors from raw signals to remove the need for manual signal processing steps. The majority of research on Autoencoder-based methods for fault detection has been focused on industrial machinery, with a minority of research being conducted on robotic systems such as UAVs. This thesis aims to investigate how Autoencoders can be applied for fault detection and isolation in the mobile robotics domain, and with this in mind the following research questions have been formulated:

RQ1: How can we apply Autoencoders to detect faults in signals originating from the drive system of a wheeled mobile robot?

RQ2: Can we use Autoencoders to isolate faults to a location based on signals originating from the drive system of a wheeled mobile robot?

1.2. Research Contributions

The main research contribution of this work is a study on the usage of Autoencoders for fault detection in the domain of mobile robotics, which so far has not been widely researched. An experiment is carried out by training an Autoencoder to recognize healthy signals recorded from a mobile robot, which is then evaluated by inputting both healthy and faulty sensor signals to compare the results.

1.3. Thesis Outline

Listed below is an outline of the rest of the thesis, together with brief descriptions of the content of each section:

• Section 2 provides the theoretical background to the work. It provides an overview of intelligent fault detection techniques, and defines the methods used in this thesis.

• Section 3 presents other works that are related to this thesis.


• Section 4 describes the research methodology that was used from the initial problem description, to experimental design and execution, and finally to result analysis.

• Section 5 describes the assumptions regarding the system, explaining the expected inputs and outputs.

• Section 6 documents the proposed fault detection method, where each individual software component is described.

• Section 7 contains information regarding how the experiments were carried out, including data collection, pre-processing, and experimental parameters.

• Section 8 presents the results of the experiments in table and graph form.

• Section 9 provides an analysis and discussion of the results from Section 8, and tries to answer the research questions.

• Section 10 concludes the report and summarizes the results. Possible future works are also presented here.


2. Background

A number of different strategies can be applied for fault detection in robotics, where the correct choice of method for a given situation will depend on many different factors. Faults may be software or hardware related, or even related to how the robot interacts with its environment and other robots [5]. An additional contributing factor to the choice of method is the level of autonomy of the robotic system itself. Mobile robots are complex systems with several different sub-systems that work together to achieve autonomy. Higher level behaviours such as localisation, navigation, and even collaboration between robots are all parts that may experience faults that can be diagnosed. The goal of this research is to detect mechanical anomalies, and as such the rest of this section will be focused on strategies that can be applied to fulfill this purpose.

2.1. Model-based Fault Detection

Model-based methods rely on a priori knowledge of the system, using mathematical models of the system under normal operation or models of known faulty behaviours, where discrepancies between the modeled and the actual behaviour are used to detect faults [5]. Analytical models can be applied to individual subsystems of the robot to estimate their expected output using observers and to generate a residual as the difference between this value and the actual output [5, 14]. In a fault-free system the residual should be close to zero, and if not it can be considered that some error has occurred [5]. The expected outputs of sub-systems of the mobile robot, such as the revolutions per minute (RPM) of the motors that drive the wheels, can be easily modeled as functions of time [5]. However, a major flaw with analytical models is the difficulty in accounting for the uncertainties and disturbances which are unavoidable when the model is applied in practice [3, 14].

2.2. Data-driven Fault Detection

Data-driven approaches extract relevant information directly from sampled data, and include machine learning approaches and statistical filtering techniques [5]. A major disadvantage of relying on model-based fault detection is the difficulty in correctly modeling the complex behaviours of a robot, and as such data-driven methods could be more suitable due to the sampled data being used to drive the construction of the model. Data-driven methods typically generalize well over multiple problem areas, and as such the same model could possibly be used to detect faults on many different robots [5]. These methods learn from previous examples of observed normal or faulty behaviours, and can then be applied during run-time to detect or predict faulty behaviours from the data produced online by the robot. Training can be supervised by using labeled data samples such that the labels correspond to either normal behaviour or to a known fault, or be unsupervised where the model can distinguish between faulty and normal behaviour solely based on the data itself [3]. The majority of recent research regarding intelligent fault detection concerns unsupervised or semi-supervised methods, where only a small amount of labeled data is provided. This may be due to the difficulty in capturing and labeling all possible fault cases in the recorded data [5, 15], the difficulty of acquiring enough training data to train an accurate model [5, 15], in addition to the knowledge and work of a human expert influencing the accuracy due to how the data is labeled [12, 15, 16]. Unsupervised fault detection approaches include those that use clustering techniques to separate samples into categories.

Figure 1: The components of an Autoencoder. Some input is coded by the encoder into a format of a lower dimensionality, which is then reconstructed by the decoder into its original form.

Machine learning models based on Artificial Neural Networks (ANNs) have gained a lot of attention during recent years. An ANN can be considered as a set of processing units referred to as neurons, each with its own set of input and output connections [17, 18]. Neural networks are divided into input and output layers, with multiple hidden layers of neurons in-between them. Each neuron is activated by a non-linear activation function of the values produced by the weighted connections from the previous layer, the value of which is passed on to the connected neurons in the next layer [18]. The weights of the input connections are updated during training based on how well the model performed, which is expressed with a loss function [18]. Two categories of ANNs exist, referred to as feed-forward and recurrent networks. Feed-forward networks follow the structure described previously and can be seen as directed acyclic graphs, whereas each neuron in a recurrent network has an additional connection to itself [18].

One of the main challenges regarding machine learning techniques is to develop a suitable feature engineering strategy to convert the signals into a format which is workable by a classifier. For this purpose, various signal processing methods are applied to transform the raw signals into the frequency or time-frequency domain. Other statistical techniques such as Principal Component Analysis (PCA) may be applied to extract the most important information from the data while at the same time reducing the dimensionality of the feature vector used by a classifier [19, 20]. An Autoencoder is a type of ANN that can be seen as a generalization of PCA, in that it both reduces the data to a format of lower dimension and then reconstructs the intermediate format into a representation that is close to the original [21]. The intermediate data of the Autoencoder can then be used in a similar way as the principal components in PCA, but has some interesting characteristics which may give it an edge for fault detection in multivariate data.

2.3. Autoencoders

An Autoencoder is an ANN consisting of two main components: an encoder that converts the input data into a format with lower dimensionality by learning its most distinguishing features, and a decoder which reconstructs the encoded data with as small a difference from the original representation as possible [13]. The code produced by the encoder is referred to as the Latent Space Representation (LSR), and due to its reduced number of neurons in comparison to the input layer, it should contain only the most representative features of the input.

Figure 2: An Autoencoder consisting of a single hidden layer with the neurons h1, h2, ..., hk. Each neuron is fully connected to all neurons in its preceding layer in addition to an additional bias neuron. The influence the connections have is determined by the weight matrices W1 and W2, as well as the bias vectors b1 and b2.

Autoencoders are trained in an unsupervised fashion, i.e. with unlabeled data. To get a separation between anomalies and nominal data, we can train an Autoencoder to reconstruct healthy signals, and then consider an input as faulty when the difference between the reconstruction and the input is above a certain threshold [3].

For a formal definition of an Autoencoder, we can consider the function fθ(x) = h as the encoder mapping the input x = {x1, x2, ..., xn} into a feature vector h = {h1, h2, ..., hk}, i.e. the LSR [12, 22]. We then define the decoder as a function of the LSR, gθ(h) = x̃, mapping from the feature space back to the input space, producing a reconstruction x̃ = {x̃1, x̃2, ..., x̃n} of the input such that x ≈ x̃ [22].

h = fθ(x) = f(W1x + b1)    (1)

x̃ = gθ(h) = g(W2h + b2)    (2)

Equations 1 and 2 show the mathematical definitions of the encoder and decoder respectively. The functions f(·) and g(·) are typically non-linear, such as the sigmoid or hyperbolic tangent functions. Both functions are parameterized by θ = {W1, W2, b1, b2}, containing the respective weights and biases of the links between the layers. The weight matrices W1 and W2 determine the contribution of the outputs of the previous layer in activating the neurons, whereas the bias vectors b1 and b2 determine contributions from an additional dummy neuron which is always fully activated (in other words, it is always equal to 1) [18]. These parameters can be seen in Figure 2 as the weights of the links between the layers.

In the training phase, the parameters are initially randomized and then updated simultaneously through several iterations to minimize some loss (cost) function. In the most basic Autoencoder, the loss for a training sample L(x, x̃) is equal to the squared reconstruction error [22], as is shown in Equation 3.

L(x, x̃) = ‖x − x̃‖²    (3)

For a training dataset with N examples, the goal is to find an optimal set of parameters that minimizes the loss over all samples, see Equation 4. A common optimization strategy to use is Stochastic Gradient Descent (SGD) [22].

θ′ = argminθ (1/N) Σᵢ₌₁ᴺ L(xi, gθ(fθ(xi)))    (4)
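For illustration, the following is a minimal NumPy sketch of Equations 1 to 4 for a single hidden layer with sigmoid activations; the dimensions, the random initialization, and the single evaluated sample are arbitrary example values, not parameters used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 128, 16                                # example input and latent sizes
W1, b1 = 0.01 * rng.normal(size=(k, n)), np.zeros(k)
W2, b2 = 0.01 * rng.normal(size=(n, k)), np.zeros(n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # Equation 1: h = f(W1 x + b1)
    return sigmoid(W1 @ x + b1)

def decode(h):
    # Equation 2: x~ = g(W2 h + b2)
    return sigmoid(W2 @ h + b2)

def loss(x):
    # Equation 3: squared reconstruction error for a single sample
    return float(np.sum((x - decode(encode(x))) ** 2))

# Equation 4 corresponds to minimizing the average of this loss over a
# training set, e.g. with Stochastic Gradient Descent; here we only
# evaluate the loss for one random input.
print(loss(rng.random(n)))
```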

When designing an Autoencoder, it is important to consider some strategy for preventing overfitting, as a generalizable model needs to produce a low reconstruction error on both the training and testing samples [22]. Reducing the size of the latent space enough is one way to do this, as the encoder will be forced to discard information and keep only coarse-grained features. However, reducing it too much will likely negatively impact the accuracy of the model, and in these cases a better option may be to use regularized Autoencoders, which add additional mechanisms to the process to increase the generalizability of the model [22]. A common regularization technique is to enforce a sparsity constraint by including an additional term in the optimization objective that penalizes the weights of the hidden neurons, which prevents certain neurons from firing [22]. Such techniques can be used to increase the size of the bottleneck between the encoder and the decoder and may even allow for Autoencoders with an LSR of a higher dimensionality than the input data, as they prevent the model from simply duplicating the inputs [22]. L1 regularization can be used for this purpose, where the penalty term is equal to the sum of the weights multiplied by some regularization factor λ [22, 23].

2.4. Anomaly Scoring

Fault detection entails separating faulty from healthy samples, and there exist several methods for achieving this by having an Autoencoder act as a feature extractor. Clustering approaches such as k-means or k-NN can be applied on the LSR to group samples based on some similarity metric [12]. In this work, the reconstruction errors from an Autoencoder were used, and similar approaches have previously been applied by several authors with very promising results [10, 20, 24]. The intuition is to train an Autoencoder to model the normal operating behaviour of the robot, so that erroneous signals will be reconstructed with large errors at a higher frequency than healthy signals. Consequently, a sample can be marked as faulty if the distance to some distribution of healthy samples is greater than a set threshold, and the distance can be used as a metric of how anomalous the sample is, i.e. an anomaly score.

2.4.1. Distance Metrics

To find anomalies, a reliable distance metric is needed to score the samples based on their positions in relation to a known distribution of healthy samples. We can consider a sample to be scored as a vector e = {e1, e2, ..., ep} with p elements, each being the reconstruction error of some observation (i.e., sensor measurement) in o and its reconstruction õ, calculated as in Equation 5. For one-dimensional non-time-series inputs, o is typically equal to the full inputs and outputs of the Autoencoder, but in the case of two-dimensional time-series data the size is equal to the number of different signals.

e = õ − o    (5)

The simplest metric that we can use to score the sample is its Euclidean distance to the mean of a known distribution of reconstruction errors from healthy samples. This metric can be seen in Equation 6, denoted as ED(·), where µ ∈ Rp is the center of the distribution, in other words the vector of mean values from the samples.

ED(e) = ‖e − µ‖    (6)

Figure 3: A comparison between an anomaly score based on the Euclidean distance in the left figure (a) and the Mahalanobis distance in the right figure (b). In the case of the Euclidean distance the two purple samples are given the same score despite one of them being closer to the cluster, whereas the scores differ for the Mahalanobis distance.

In cases where the variables are uncorrelated, the Euclidean distance may be a useful metric to find outliers. However, in cases where some correlation between the variables can be observed, it will not provide useful information in terms of the distance to the cluster, as can be seen in Figure 3a. The Mahalanobis distance, which is defined in Equation 7, is a measure better suited for multivariate data with correlations as it considers the distribution of the points, which is done by incorporating the covariance matrix [25]. Figure 3b shows how the Mahalanobis distance provides a better estimate of the anomalousness of a sample, as points which are further away from the cluster are scored higher than those closer to it.

MD(e) = √((e − µ) Σ⁻¹ (e − µ)ᵀ)    (7)

A further extension is the robust Mahalanobis distance, which reduces the influence of anomalies in the training data by computing a covariance matrix through the Minimum Covariance Determinant (MCD) estimator. The MCD estimator finds a subset of samples with a given size from the original dataset which minimizes the determinant of the covariance matrix [26]. Equation 8 provides the definition of the robust Mahalanobis distance, where µ̂ is the robust location of the distribution (the mean of the samples selected by MCD) and Σ̂ is the MCD covariance matrix. See the paper by Rousseeuw and Driessen [27] as well as the paper by Hubert and Debruyne [26] for further information regarding MCD and its uses in anomaly detection.

RD(e) = √((e − µ̂) Σ̂⁻¹ (e − µ̂)ᵀ)    (8)
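As a small illustration of why the (robust) Mahalanobis distance of Equations 7 and 8 is preferred over the Euclidean distance of Equation 6 for correlated variables, consider the following sketch using scikit-learn's MinCovDet estimator; the two-dimensional cluster and the two test points are hypothetical and only loosely mirror Figure 3.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)

# Correlated two-dimensional "healthy" cluster.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
E = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)
mu = E.mean(axis=0)

# Robust location and covariance through the MCD estimator (Equation 8).
mcd = MinCovDet(random_state=0).fit(E)

def euclidean(e):
    # Equation 6: distance to the mean of the healthy errors.
    return float(np.linalg.norm(e - mu))

def robust_mahalanobis(e):
    # Equation 8; scikit-learn returns squared distances, hence the sqrt.
    return float(np.sqrt(mcd.mahalanobis(e.reshape(1, -1))[0]))

a = np.array([2.0, 2.0])     # lies along the correlation direction
b = np.array([2.0, -2.0])    # lies against the correlation direction
print(euclidean(a), euclidean(b))                    # nearly identical scores
print(robust_mahalanobis(a), robust_mahalanobis(b))  # b scores much higher
```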

3. Related Work

Historically the problem of detecting faults in machinery has been approached in several different ways. Model-based approaches have found a lot of success, but in recent years the research into data-driven methods has picked up speed. An early example is the work conducted by Goel et al., where an intelligent monitoring system for a mobile robot was developed by analyzing sensory signals from wheel encoders and gyroscopes [28]. A bank of Kalman filters which were tuned to known faults generated residuals, and an ANN trained under supervision was applied to provide a final classification. An average Mahalanobis distance for each filter with their respective output residuals and covariance matrices was calculated from multiple time-steps. The final classification was produced by feeding the average distances of each filter to the ANN to produce a final probability distribution, and then selecting the final fault class as the most likely candidate from the distribution.

A more recent example is the work of Fu et al., where deep learning with a hybrid CNN and LSTM network was applied to detect and isolate actuator faults in UAVs [29]. The training dataset contained signals such as power outputs and attitude data from six different actuators. Faults were pre-recorded and labeled to correspond to the faulty actuator to facilitate fault classification by the network.

Unsupervised methods have also been explored to both detect and isolate faults. The main advantage of these methods is that the training data can be reduced to only contain nominal data to construct a model which is capable of detecting anomalies in online signals in comparison to known healthy patterns, and as a result being able to cover a wider span of possible faults with a smaller dataset. Khalastchi et al. explored a fault detection method based on the Mahalanobis distance metric to find outliers in recent sensor readings, by comparing them to previously sampled data [25]. The model is trained during run-time by utilizing a sliding window to select a fixed number of recent readings, excluding the most recent ones. From the samples in a window, sets of correlated attributes are found by computing the Pearson correlation coefficient of each attribute pair, and discarding those that are below a set threshold. The Pearson correlation coefficient was also used by Zhao et al. to capture correlations between signals for fault detection in electric generators [30].

In recent years Autoencoders have been applied as a replacement for manual feature engineering techniques, and have been shown to be capable of capturing non-linear relationships between variables without supervision [10]. Wu et al. employed a multi-level denoising Autoencoder to detect faults in wind turbines based on multivariate data sampled from a SCADA system [10]. The Autoencoder was trained to reconstruct healthy signals, and with the trained network the same signals were once again passed through the model to output a set of healthy reconstruction errors. From the errors an MCD covariance matrix was computed, which was then used to derive a distribution of anomaly scores with the robust Mahalanobis distance metric. A fault detection threshold was then set from the distribution by integrating a Gaussian kernel density function up to some confidence level.

Liang et al. used a similar method for fault detection in pumps [24]. Reconstruction errors from a sparse Autoencoder were computed during run-time, and the Mahalanobis distance with respect to a distribution of healthy reconstruction errors was used to determine anomaly scores. A dynamic fault threshold was also computed ahead of time by integrating the density estimate with a Gaussian kernel from the healthy distribution to some confidence level. When the anomaly score of a sample was greater than the threshold a fault was assumed, and the fault was then isolated to a subset of the input variables as the ones with the greatest squared prediction error between the inputs and reconstructions.

Park et al. applied a variational Autoencoder for anomaly detection in a manipulator robot used for assisted feeding tasks [31]. The encoder and decoder were combined with LSTM units to better account for the temporal features of the signals. Additionally, Gaussian noise was added to the signals to introduce denoising capabilities to the model. Fault thresholds were set online by applying support vector regression with a radial basis function kernel to the LSR of the variational Autoencoder, while anomaly scores were computed as the negative log-likelihood of an input.

Sadhu et al. investigated a combined fault detection and diagnosis framework based on deep learning for real-time usage in UAVs [20]. The framework consisted of an initial Autoencoder based on 1D CNNs and bi-directional LSTM units to act as the fault detector, and a CNN and LSTM classifier to perform the diagnosis in case a fault was detected. Sampled raw IMU data recorded during non-faulty operation of the UAV was used to train the Autoencoder, and once the network was trained the samples were inputted once again to compute the reconstruction errors and fit them to a Gaussian distribution. The Mahalanobis distance of the errors for each datapoint with respect to the fitted distribution was used as an anomaly score, and a fault threshold was set based on the top 0.01% scoring samples.

As can be seen from the previously described works, feature extraction techniques based on Autoencoders have been applied in heavy static machinery such as pumps and wind turbines. For autonomous systems we can refer to the works of Park et al. and Sadhu et al., where Autoencoders have successfully been used to extract representative features for robot manipulators and UAVs. However, the data collected by these authors may not be representative for similar applications in the domain of mobile robotics, and to the best of our knowledge no previous works have been published regarding fault detection using Autoencoders in this area.


4. Research Methodology

A framework for research proposed by Holz et al. was used as a base to more concretely formulate the research methodology, consisting of four questions as listed below [32].

1. What do we want to achieve? - The goal of the research is to evaluate if Autoencoders are applicable for the purpose of detecting mechanical faults in mobile robotics.

2. Where does the data come from? - An experiment is conducted to produce the data. Sensor signals recorded from the drive system of a mobile robot are used to train a machine learning model, and the outputs of the model are then used to evaluate its performance. Both healthy signals, and signals recorded when faults have been injected into the drive system are recorded.

3. What do we do with the data? - With the data we try to identify patterns in the model’s output when different types of signals are provided, i.e. if we can see any differences between healthy and faulty signals.

4. Have we achieved our goal? - By inspecting the data we can draw conclusions as to whether or not the model that was developed performed well for the purpose.

Based on the points above, the research methods used in this thesis are literature reviews, empirical experiments, synthetic experiments, and statistical analysis. The research process followed a structure as described by Robson and McCartan [33], where an initial literature study into relevant works was used to guide the formulation of research questions and the design of an experiment. An experiment consisting of a base-line and various treatments was used, where the base-line consisted of healthy samples and the treatments were samples recorded during various fault cases. Figure 4 shows the main steps of the research process, which is based on the work by Nunamaker and Chen [34], and each step is detailed below:

• An initial problem description was provided by the industrial partner regarding whether artificial intelligence techniques can be used to detect faults in their mobile platform.

• A literature study was carried out to get a better understanding of the problem, in addition to mapping out the state-of-the-art in the major problem areas, which are fault detection, isolation, as well as diagnostics. From this phase information was gathered about methods that could be applied to the given problem, in addition to how the problem could be used to gain interesting insights from a research perspective. A large part of the thesis was spent on this step. The literature study was conducted in several iterations, as knowledge gaps that were found in some of the other steps would require further study of the state-of-the-art.

Figure 4: A diagram outlining the major steps of the research process as boxes in chronological order from left to right, where the arrows show the transitions between steps. In certain cases revisions to the previous steps had to be made as new knowledge was gained further down the process, and as such arrows can point both forward and backwards.

• From the information gained by the literature study, a more concrete problem formulation was made. The problem formulation consists of a selection of research questions that would later guide the design and execution of the experiment, but changes to the questions could be made at a later stage based on additional knowledge gained from the experiments.

• In the experiment design phase the core of the work was formulated and the design of the model was made, which was then implemented and used during the experiment execution stage. Two separate experiments were created and executed: (1) to investigate how the model performed when changing various parameters such as the Autoencoder's number of neurons and the size of the input samples, and (2) to evaluate the capability of the best performing model from the first step to detect faults. The second experiment was conducted to answer the research questions and was designed as a single-case experiment [33], with a base-line generated by inputting healthy observations into the model, and several treatments in the form of sensor readings recorded after various types of faults had been injected into the drive system. To be able to answer RQ1, the anomaly scores produced by the model were observed, whereas the prediction error was used to answer RQ2.

• The outputs of the experiments were finally studied in the result analysis stage using statistical methods, and conclusions were made based on visual observations of how the anomaly scores and prediction errors differed from the base-line after various treatments had been applied.

As can be seen in Figure 4, the research was carried out as an iterative process, where it was possible to return to previous steps after additional knowledge had been attained in the later stages. This was applied in practice at several points during the work. An example of how this was used is by conducting additional literature studies after a problem that was too big for the scope of the thesis had been formulated, or by returning to the experiment design step to revise the model after it was found not to perform well enough on the test data.


5. System Model

In this section the assumptions made regarding the system are described, including its inputs and outputs. The data that is processed by the system consists of sensory signals sampled from a robot, and we define the number of signals as p from an unbounded time-series. Our system processes one sample at a time, and to be able to also capture temporal information we wish to process some number of time-points at a time. We define a single sample as a two-dimensional matrix X ∈ Rp×w, where w is the number of time-points. The output is a binary value designating whether the sample is either nominal or anomalous, and as such we define the entire system as a function F(X) = y, where y ∈ {¬Anomalous, Anomalous}. Prior to online fault detection, the model must first be trained offline using recorded healthy data.


6. Autoencoder-Based Fault Detection

In this section a concrete formulation of the proposed method is made, and the different components making up the system are explained. The method consists of an initial model training step, where patterns in healthy signals are learned using a pre-recorded dataset, and an online fault detector, where the model produces predictions regarding signals sampled in real-time. A diagram showing the offline training steps can be seen in Figure 5, and a similar diagram for how anomalies are detected in real-time is visible in Figure 6.

6.1. Fault Detection

Figure 5: A diagram showing the offline training process of the fault detector. The training data containing healthy samples is first used to train the weights of the neural network, and is then once again passed through the Autoencoder to produce a set of reconstruction errors to be used for determining the online fault detection threshold. Anomaly scores are assigned to the errors based on the covariance and central tendency of the error distribution from an MCD estimator, and these scores are then used to set the threshold to correspond with a top percentage of the most anomalous samples.

The model relies on an initial feature extraction step using an Autoencoder, which is followed by an anomaly detection scheme based on the Mahalanobis distance metric. We first train the model offline using a dataset of recorded healthy signals, where the weights and biases of the Autoencoder are set and a fault threshold is determined. A diagram detailing the training process of the fault detector can be seen in Figure 5, where the trained weights and the fault threshold are later passed on to the model used for anomaly detection in online data.

6.1.1. Autoencoder

The Autoencoder used in this work consists of a single hidden layer. For both the encoder and decoder networks the sigmoidal activation function is used, as defined in Equation 9, which maps its input to the range [0, 1].

σ(x) = 1 / (1 + e⁻ˣ)    (9)

We train the network using the squared reconstruction error as the cost function, which was previously defined in Equation 4, and we also add an L1 regularization term for the weights of the hidden layer.
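As an illustration, a network of this shape could be defined in Keras roughly as follows. This is only a sketch: the dimensions reuse the notation of Section 7 (w, p, k), and the regularization factor as well as the choice of attaching the L1 penalty to the hidden-layer weights are assumptions rather than settings taken from the thesis.

```python
import tensorflow as tf

w, p, k = 16, 8, 64            # window size, number of signals, hidden neurons
input_dim = w * p

inputs = tf.keras.Input(shape=(input_dim,))
# Single hidden (latent) layer with sigmoid activation and an L1 penalty;
# the factor 1e-5 is an assumed value for the regularization strength.
hidden = tf.keras.layers.Dense(
    k,
    activation="sigmoid",
    kernel_regularizer=tf.keras.regularizers.l1(1e-5),
)(inputs)
outputs = tf.keras.layers.Dense(input_dim, activation="sigmoid")(hidden)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
```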

Figure 6: A diagram of the online fault detector, which utilizes the pre-computed Autoencoder weights, covariance matrix, and robust location. A single sample is processed at a time, where a sample error is computed based on the reconstructed signal from the Autoencoder. Following this, an anomaly score is assigned by computing the Mahalanobis distance from the previously computed distribution of healthy training samples. Finally, the score is compared with the fault threshold. The fault is isolated only if the sample is determined to be faulty.

6.1.2. Anomaly Scoring

The mean squared error over the time points in a window is used as the reconstruction error measure for a sample, and this is done to produce a unified error for the entire sample such that e ∈ Rp. In other words, we derive a sample error by first computing the squared error between the original p signals and their reconstructions, and then averaging these values over all time-points in the sample window. During the training phase a distribution of sample errors is computed from the same healthy signals that were used to train the Autoencoder in the previous step, and based on this distribution a covariance matrix and a robust estimation of central tendency are computed using the MCD estimator. The covariance matrix and robust location are then used to compute anomaly scores using the robust Mahalanobis distance, which can be seen in Equation 8.
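A minimal sketch of this scoring step is shown below, assuming error vectors of dimension p = 8; the healthy error matrix is random placeholder data, and the threshold computation anticipates the top-0.1% rule used in the experiments (see Table 3).

```python
import numpy as np
from sklearn.covariance import MinCovDet

def sample_error(X, X_rec):
    """Per-signal mean squared reconstruction error of one p x w sample."""
    return np.mean((X_rec - X) ** 2, axis=1)          # -> vector in R^p

# Placeholder distribution of healthy sample errors (N samples, p signals).
rng = np.random.default_rng(0)
E_train = np.abs(rng.normal(size=(1000, 8)))

# Robust covariance and location of the healthy errors via MCD.
mcd = MinCovDet(random_state=0).fit(E_train)

def anomaly_score(e):
    # Robust Mahalanobis distance (Equation 8); sqrt of the squared distance.
    return float(np.sqrt(mcd.mahalanobis(e.reshape(1, -1))[0]))

# Fault threshold: the anomaly score of the top 0.1% scoring healthy sample.
train_scores = np.sqrt(mcd.mahalanobis(E_train))
threshold = np.quantile(train_scores, 0.999)
print(anomaly_score(E_train[0]) > threshold)          # healthy -> likely False
```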

6.1.3. Fault Threshold

The fault threshold is an important part of the fault detector as it has a large effect on the model's ability to detect true faults from faulty signals, in addition to the number of false positives produced from healthy ones. During training, the threshold is set to correspond to a certain percentage of the highest anomaly scores produced from the healthy training data. Any sample that during testing or online use has an anomaly score exceeding this threshold will be predicted as anomalous by the model.

6.2. Fault Isolation

The previously described fault detection module only signals that some fault has happened, but does not provide any information about where it originates from. Inspired by a previous approach taken by Liang et al. [24], we compute a two-dimensional error map using the Q-statistic (the squared prediction error), as in Equation 10, from the elements of the input and reconstruction matrices.

Qi,j = (x̃i,j − xi,j)²,  i = 1, ..., p,  j = 1, ..., w    (10)

In the error map the signals are separated row-wise, and as such we can isolate the faulty variables to the rows with the largest contribution in the map. Furthermore, we isolate the fault to a given component as the set of signals with the largest contribution which originate from it.
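A sketch of this isolation step is given below; the signal names and the injected error are hypothetical, and ranking the rows by their summed contribution to the error map is one straightforward reading of "largest contribution".

```python
import numpy as np

def q_map(X, X_rec):
    # Equation 10: element-wise squared prediction error, a p x w map.
    return (X_rec - X) ** 2

def rank_signals(X, X_rec, signal_names):
    """Rank signals (rows) by their total contribution to the Q map."""
    contributions = q_map(X, X_rec).sum(axis=1)
    order = np.argsort(contributions)[::-1]
    return [(signal_names[i], float(contributions[i])) for i in order]

# Hypothetical example: p = 4 signals, w = 16 time-points, with a large
# reconstruction error injected into the third signal.
rng = np.random.default_rng(0)
X = rng.random((4, 16))
X_rec = X + rng.normal(scale=0.01, size=X.shape)
X_rec[2] += 0.5
names = ["front_left", "front_right", "rear_left", "rear_right"]  # placeholders
print(rank_signals(X, X_rec, names)[0])     # -> ("rear_left", ...)
```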


7. Experiments

This section details how the experiments were carried out, including information on how data was collected and processed, the architecture of the model, and various experimental parameters. The experiments were carried out on two different sets of data, where the first dataset contained sensor measurements recorded from a mobile robot, and the second was synthetically created.

7.1. Industrial Use Case

To evaluate the model for real world applications, a dataset was created from signals recorded from a four-wheel omni-directional mobile platform. A model of the target platform can be seen in Figure 7. At the highest level an Intel NUC mini-PC running Robot Operating System (ROS) [35] can be seen, which communicates with an STM32 F7 ARM Cortex microcontroller through a network switch. The STM32 controller coordinates and communicates with drive units 1 to 4 over CAN. Each drive unit actuates separate brushless DC steering and wheel drive motors placed at the four corners of the mobile base, i.e. at the front left, front right, rear left, and rear right positions.

Figure 7: A diagram showing the components of interest and their connections in the distributed architecture of the robot.

7.1.1. Diagnostics Routine

In order to simplify the data collection process, a diagnostics routine was created, intended to stress the motors such that faulty signals could be discerned from healthy ones. One full execution cycle of the diagnostics routine is described below:

1. Each wheel is rotated in turn by actuating the steering motors at different velocities.

2. The wheels are aligned in a circle and the entire base is rotated in-place back and forth four times with increasing velocity.

3. The base is rotated with a constant velocity, and one steering motor at a time is actuated to rotate the wheel to -4 degrees, then to 4 degrees, then back to 0 degrees relative to the starting rotation.

4. The base is rotated with a constant velocity, and for one wheel motor at a time a switch from velocity control to torque control is made with two different reference values (2 Nm and −2 Nm).

7.1.2. Fault Injections

To provide a better understanding of the model's capability to detect faults, two different types of artificial sensor faults were injected into the drive units. A description of each of these faults is listed below:

• Encoder alignment fault (1 degree) - An error of 1 degree is introduced into the alignment of the rotor magnetic field relative to the stator, which is used by the motor encoder, to produce an offset.

• Steering motor absolute angle error (3 degrees) - An error of 3 degrees is introduced into the angle of the steering motor encoder. This causes the wheel to always be slightly misaligned with respect to its expected rotation.

7.1.3. Data Collection

A dataset was created by recording signals directly from the robot in a lab environment. The signals were recorded into a ROS bag format at 50 Hz from a remote laptop connected to the on-board network switch over Wi-Fi, and the bag files were then converted into a CSV format to be processed by the model. Torque signals from the drive and steering motors of all wheels were selected for use in training the final model, as they provided promising results in preliminary tests, resulting in a total of 8 different signals, as can be seen in Table 1.

Origin Signal Unit

Drive Motor Torque Nm

Steering Motor Torque Nm

Number of collected signals: 8

Table 1: Actuator signals collected from each of the four drive units.

A description of the complete dataset is shown in Table 2. One complete diagnostics cycle was recorded for each of the injected faults, and 12 cycles for the healthy data. Before each diagnostics cycle the robot was moved to a different location, and was also rotated to have varying starting orientations. In Figure 8 the eight torque signals recorded during one of the healthy cycles are shown.


Type Collected cycles

Healthy 12 Cycles

Front right encoder alignment fault 1 Cycle

Rear right encoder alignment fault 1 Cycle

Front right steering motor absolute angle error 1 Cycle

Table 2: Number of diagnostics cycles collected for each fault class.

Figure 8: Healthy torque signals recorded during one of the diagnostics cycles from the mobile robot in the industrial use case.

7.2. Synthetic Data

To further test the model a synthetic dataset was generated to tweak the model and generate additional results, and was used for the initial design and testing of the model before the real world data for the industrial use case could be recorded. The complete dataset contained four different waves that were generated with varying amplitudes and frequencies, and Gaussian noise was also added to further diversify the data. Data to simulate healthy signals were generated with fixed values for waveform amplitude and frequency, and the noise was generated with a fixed standard deviation. In a similar fashion to the industrial datasets, the signals were sampled at 50Hz. A sequence of faulty data was generated by in turn increasing the amplitude, frequency, and noise standard deviation, with the goal of investigating how the model reacts to changes in these values.
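The synthetic signals could be generated along the following lines; the amplitudes, frequencies, noise levels, and phase offsets below are illustrative assumptions, with only the 50 Hz sampling rate taken from the text.

```python
import numpy as np

def synthetic_signals(duration_s, fs=50, amplitude=1.0, frequency=0.5,
                      noise_std=0.05, seed=0):
    """Generate four sinusoidal signals with additive Gaussian noise."""
    t = np.arange(0, duration_s, 1.0 / fs)
    rng = np.random.default_rng(seed)
    waves = []
    for i in range(4):
        # Offset each wave slightly so that the four signals differ.
        wave = amplitude * np.sin(2 * np.pi * frequency * t + i * np.pi / 4)
        waves.append(wave + rng.normal(scale=noise_std, size=t.shape))
    return np.stack(waves)                    # shape: (4, duration_s * fs)

healthy = synthetic_signals(50)
faulty_amp = synthetic_signals(50, amplitude=2.0)      # increased amplitude
faulty_freq = synthetic_signals(50, frequency=1.0)     # increased frequency
faulty_noise = synthetic_signals(50, noise_std=0.5)    # increased noise
```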

7.3. Data Pre-Processing

In the experiments the data was pre-processed by standardization and normalization, where the range of the elements in the datasets is mapped to the interval [0, 1]. After this the signals are divided into samples of size w × p such that they can be fed to the model. Data standardization is done as shown in Equation 11, where x̂ is the raw data prior to standardization, and µ ∈ Rp and s ∈ Rp are the means and standard deviations of the signals computed from the healthy training data. After standardization, the data is normalized to lie within the range [0, 1] to speed up the learning process.

xi = (x̂i − µ) / s    (11)

The datasets are then divided into equally sized blocks using a sliding window approach. The width of the window is set in accordance to the expected length of each sample (w), and this window is then moved across the data sequence with a set step size of 1 to capture as much information as possible from the signals. At each step across the sequence, a sample is created by extracting the sensor values within the window.
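A sketch of the standardization, normalization, and windowing steps is shown below; whether the min-max normalization is applied globally or per signal is not stated, so the global variant here is an assumption.

```python
import numpy as np

def preprocess(raw, mu, s):
    """Standardize with the healthy-training mean/std (Equation 11),
    then rescale the result to [0, 1]. raw has shape (p, T)."""
    x = (raw - mu[:, None]) / s[:, None]
    return (x - x.min()) / (x.max() - x.min())

def sliding_windows(x, w, step=1):
    """Split a (p, T) sequence into samples of shape (p, w)."""
    _, T = x.shape
    return np.stack([x[:, i:i + w] for i in range(0, T - w + 1, step)])

# Hypothetical usage with p = 8 signals and 1000 time-points.
rng = np.random.default_rng(0)
raw = rng.normal(size=(8, 1000))
mu, s = raw.mean(axis=1), raw.std(axis=1)
samples = sliding_windows(preprocess(raw, mu, s), w=16)
print(samples.shape)    # (985, 8, 16)
```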

7.4. Model Training

The training of the model was done using the healthy data, which was split up into subsets by keeping two of the recorded cycles for validation and testing, and the rest for training. The validation data is used to provide a metric of how well the Autoencoder performs after each training epoch, while the testing set is used to assess the predictive skill of the full model. The data is split up into smaller samples using the sliding window technique separately for each cycle. The samples from the 10 different diagnostic cycles in the training set are then merged into one unified segment of samples, and 101965 training samples were extracted in this way. No additional pre-processing was made on the signals with the exception of the standardization and normalization previously discussed, and as such the signals can be considered to be raw.

Table 3 shows the parameters of the experiment. The window size is fixed at 16, and a step size of 1 is used to capture as many features as possible from the signals. Several models are created to assess the impact of the number of hidden neurons on the predictive performance of the model, controlled by the parameter k.

Parameter Value

Number of hidden neurons (k) [8, 16, 32, 64]

Window size (w) [8, 16, 32]

Window step size 1

Threshold top % 0.1%

Table 3: Experimental parameters.

In Table 4, the structure of the Autoencoder can be seen. We utilize a Multilayer Perceptron (MLP) structure for the network using fully connected layers, where each neuron in a layer has connections to all neurons in the neighboring layers (excluding the bias neurons). As can be seen, the size of the first fully connected layer (i.e. the hidden layer) is controlled by the parameter k. The size of the input layer and the second fully connected layer (the output layer) is controlled by w and p, which is the shape of a sample window.

Layer Number of neurons

Input layer w × p

Fully connected layer k

Fully connected layer w × p

Table 4: Network structure of the Autoencoder.

Training was carried out on an external laptop running Ubuntu 20.04, with a 2.8 GHz Intel i5-6500U CPU with 2 physical and 4 logical cores. The model was implemented in Python using Tensorflow with Keras [37]. Scikit-learn [38] 0.24.1 was used for fast MCD estimations, as well as for Mahalanobis distance calculations. Additionally, NumPy [39] version 1.18.4 was used.

In Table 5 the hyperparameters that were used to train the Autoencoder can be seen. The loss function used is the mean squared error between the input and output layers, as defined in Equation 3. The Adam optimizer was used to update the weights in the network with the learning parameters that are listed in Table 6. For more information on the Adam optimization algorithm, see the paper by Kingma and Ba [40]. The model was trained with an early stopping patience of 20 epochs (i.e., the training stops after 20 complete iterations through all samples without a reduction in validation loss), or until a maximum of 400 epochs. Additionally, a batch size of 128 samples per iteration was used.

Hyperparameter Value

Loss function MSE

Optimizer Adam

Learning rate 0.001

Max epochs 400

Early stopping patience 20

Batch size 128

Table 5: Tensorflow hyperparameters for the Autoencoder network.

Parameter Value

α 0.001

β1 0.9

β2 0.999

ε 10⁻⁷

Table 6: Parameters used for the Adam optimizer.
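Putting the settings of Tables 5 and 6 together, the training configuration could be reproduced roughly as sketched below; the model and data are small stand-ins (compare the sketch in Section 6.1.1), not the thesis code.

```python
import numpy as np
import tensorflow as tf

w, p, k = 16, 8, 64                       # shapes following Table 4
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(w * p,)),
    tf.keras.layers.Dense(k, activation="sigmoid"),
    tf.keras.layers.Dense(w * p, activation="sigmoid"),
])

# Adam with the parameters of Table 6 and MSE loss (Table 5).
autoencoder.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7),
    loss="mse",
)

# Placeholder training and validation data in place of the recorded samples.
rng = np.random.default_rng(0)
x_train = rng.random((1024, w * p)).astype("float32")
x_val = rng.random((128, w * p)).astype("float32")

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)

autoencoder.fit(
    x_train, x_train,                     # the Autoencoder reconstructs its input
    validation_data=(x_val, x_val),
    epochs=400,                           # upper bound; stopped early in practice
    batch_size=128,
    callbacks=[early_stop],
    verbose=0,
)
```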

8. Results

In this section the results of the experiments are presented in the form of graphs and tables. To investigate the impact of the number of hidden neurons and the size of the sample window on the predictive accuracy of the model, an initial experiment was conducted where changes were made to the parameters k and w, as can be seen in Table 3. The parameters were investigated individually by first fixing the window size w to 16 and tweaking the number of hidden neurons k, after which k was fixed to the value that performed the best while changes were made to w. Tables 7 and 8 show the results of changing the parameters for the synthetic dataset and the data from the mobile robot in the industrial use case respectively. The measures of performance in these tables are the percentage of true positives (i.e. the percentage of anomalous samples over all samples in the faulty datasets), and the percentage of false positives (i.e. the percentage of anomalous samples from the healthy test datasets). A sample is reported as anomalous if it exceeds the fault threshold set during training, which corresponds to the anomaly score of the top 0.1% highest scoring sample from the healthy training data. Figures showing the impact the parameters have for varying top percentages used to determine the fault threshold (from 0% to 100%) can be seen in Appendices A and B for further reference.

k 8 16 32 64 16 16 16

w 16 16 16 16 8 16 32

True Positives 60.29% 55.47% 52.78% 49.16% 25.99% 55.47% 60.08%

False Positives 0.55% 0.18% 0.15% 0.04% 0.04% 0.18% 0.41%

Table 7: Reported percentage of true and false positives with varying parameter values from the synthetic dataset.

The results from the synthetic dataset in Table 7 show that a smaller number of hidden neurons provides the greatest percentage of both true and false positives, while the opposite can be seen as the number of neurons increases. Selecting the best parameter here depends on whether a large number of true positives or a low number of false positives is desired, and is as such dependent on the situation. With this in mind k = 16 was selected, as it provides a good middle point between both variables. A similar pattern can be seen for the window size, as a larger window size increases both the percentage of true positives and false positives. The smallest tested window size (8) performs quite poorly however, producing a percentage of true positives that is roughly half as large as when w = 16. Because of this, w = 16 was chosen for further experiments as it performed the best overall.

k 8 16 32 64 64 64 64

w 16 16 16 16 8 16 32

True Positives 0.09% 0.11% 0.61% 1.14% 0.96% 1.14% 1.22%

False Positives 0.11% 0.11% 0.11% 0.14% 0.11% 0.14% 0.16%

Table 8: Reported percentage of true and false positives with varying parameter values from the dataset recorded from the mobile robot in the industrial use case.

We can see some differences between the synthetic dataset and the industrial use case by comparing the previous table to Table 8, which contains the results of changing the same parameters for the torque signals recorded directly from the mobile robot. The most prominent feature of the results is the overall low percentage of both true and false positives. Secondly, we can see that the highest number of hidden neurons gives the highest percentage of true positives, while the lower numbers perform better in terms of lowering the amount of false positives. It is worth noting here that the size of the input is larger for these models (8 input signals as opposed to 4 for the synthetic data), which may have an impact on these results. A hidden layer size of k = 64 was fixed when tweaking the size of the sliding window, and the value w = 16 was chosen for further experiments. In Table 9 the final parameters of both models can be seen, in addition to the fault thresholds.

Dataset      k    w    Fault threshold
Synthetic    16   16   5.70
Industrial   64   16   336.46

Table 9: The final parameters chosen for both datasets and used for further experimentation.

8.1. Fault Detection

In Figure 9, a box plot showing the distribution of anomaly scores for the synthetic datasets can be seen. There is a noticeable difference between the anomaly scores of the two classes, with a median score of around 2 and 8 for the healthy and faulty datasets respectively. Figure 23 in Appendix B provides a different view of these distributions in the form of histograms.

Figure 9: A logarithmic scale box plot showing the spread of anomaly scores for the synthetic test data.

Figures 10 and 11 show a comparison between the original signals and the anomaly scores assigned by the model for the synthetic healthy and faulty test sets. The fault threshold fits the healthy data well in this case, with only occasional samples exceeding it. For the faulty data we can see an increase in anomaly scores, with several large areas that exceed the fault threshold. Increasing the amplitudes, as is the case in the interval [0, 1000], does not seem to produce large anomaly scores, although some increases can be seen at the end of this interval. The model seems more sensitive to changes in the frequency of the waveforms, as large scores can be seen in the interval [1000, 2250] where this parameter was increased when the signals were generated. The model also appears able to detect noise in the signals, in this case Gaussian noise, which is evident in the interval [2500, 3500] where increases in the standard deviation of the noise generator also produced larger anomaly scores.
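For reference, the sketch below illustrates the three kinds of deviations discussed above (increased amplitude, changed frequency, and added Gaussian noise) applied to a sinusoidal signal over separate index ranges. It is a hypothetical example only; the actual generator used for the synthetic dataset and its parameters may differ, and phase continuity is ignored for simplicity.

```python
import numpy as np

def synthetic_faulty_signal(n=4000, base_amp=5.0, base_freq=1.0, fs=50.0, seed=0):
    # Hypothetical generator: a sine wave whose amplitude, frequency and
    # noise level are perturbed over separate index ranges, mimicking the
    # three deviation types discussed above.
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    amp = np.full(n, base_amp)
    freq = np.full(n, base_freq)
    noise_std = np.full(n, 0.1)

    amp[0:1000] *= 1.5           # increased amplitude
    freq[1000:2250] *= 2.0       # changed frequency (phase continuity ignored)
    noise_std[2500:3500] = 1.0   # larger standard deviation of the Gaussian noise

    return t, amp * np.sin(2 * np.pi * freq * t) + rng.normal(0.0, noise_std)
```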


Figure 10: "Healthy" signals from the test sets from the synthetic data, and the predicted anomaly scores. The fault threshold with value 5.70 is denoted by the red line in the bottom graph.

Figure 11: Differing signals (i.e. "faulty" signals) from the synthetic dataset, and their anomaly scores. The fault threshold with value 5.70 is denoted by the red line in the bottom graph.

For the industrial use case, a breakdown of the distribution of anomaly scores for all the recorded fault classes and the healthy test case can be seen in Figure 12. The scores for both encoder alignment fault cases follow the same distribution as the healthy test data, and as a result the model will not be able to detect these faults from the provided signals. On the other hand, we can see that the absolute angle fault injected in the front right steer motor has a different distribution of scores, signifying that the model has been able to detect some anomalies in these samples. Refer to Figure 20 in Appendix A for histograms detailing these results.

Figure 12: A logarithmic scale box plot showing the spread of anomaly scores for all test cases from the signals recorded from the mobile robot in the industrial use case.

Figure 13 shows the healthy signals recorded during the diagnostics routine together with the anomaly scores produced by the method. As can be seen, a majority of the scores are low, but with occasional large spikes which exceed the threshold, which is in line with the threshold being set to the anomaly score of the top 0.1% scoring sample. In comparison to the synthetic data, the threshold does not fit the majority of the data well due to these spikes in anomaly scores.

Figure 13: Healthy signals produced by the mobile robot during the diagnostics cycle, and the predicted anomaly scores. The fault threshold with value 336.46 is denoted by the red line in the bottom graph.

Similar results can be seen for the two encoder faults in Figure 14, which is expected considering that their respective anomaly score distributions in Figure 12 are close to that of the healthy data. There seem to be no noticeable differences in the input signals in this case, which may be due to the injected error not being severe enough to cause a difference in the output torque of any motor.

In contrast, Figure 15 shows the anomaly scores for the absolute angle fault of the front right steering motor. Here we can see large differences in anomaly scores in comparison to the previous examples. Similarly to the other samples, we can observe the same kind of spikes at the start of the diagnostics routine, followed by several large areas exceeding the threshold with increasing severity as the velocity increases during the second step of the routine. We can also see some larger areas during the second and third steps where the anomaly scores are higher than those of the nominal case, but not severe enough to exceed the threshold and be flagged as anomalous.


Figure 14: Faulty signals from an injected alignment fault into the front and rear right wheel encoders of the mobile robot, and the anomaly scores produced by the model. The fault threshold with value 336.46 is denoted by the red line in the bottom-most graphs.

Figure 15: Faulty signals from a diagnostics cycle with an absolute angle error fault injected into the front right steering motor of the mobile robot, and its resulting anomaly scores. The fault threshold with value 336.46 is denoted by the red line in the bottom graph.

8.2. Fault Isolation

The fault isolation strategy builds upon the premise of being able to identify the failing components from sensor readings. We will only examine the data from the industrial use case for this, as it contains known faulty components. Moreover, we will only use the data from the front right steering motor angle error fault, since it was the only fault case that showed any differences from the samples of the nominal test case.

Figure 16 shows the mean Q-statistic (Equation 10) of each cell (sensor and time pair) computed over several fixed intervals. The largest differences in anomaly scores between this case and the nominal data can be observed in the time interval [70s, 200s] (see Figure 15), and as such the prediction errors have been calculated separately in four different slices of this interval to investigate which signals have the largest reconstruction errors. Additionally, the values have been normalized to lie within the interval [0, 1] for visualization purposes.
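As a minimal sketch of this computation, the following assumes the original and reconstructed windows of one time slice are available as NumPy arrays of shape (windows, window length, signals); the array names, the normalization constant, and the model interface in the usage comment are illustrative assumptions rather than the implementation used in the thesis.

```python
import numpy as np

def spe_cell_map(x, x_hat):
    # x, x_hat: original and reconstructed windows within one time slice,
    # both of shape (n_windows, window_len, n_signals).
    spe = (x - x_hat) ** 2        # squared prediction error per cell
    mean_spe = spe.mean(axis=0)   # average over all windows in the slice
    # Min-max normalization to [0, 1] for visualization purposes.
    lo, hi = mean_spe.min(), mean_spe.max()
    return (mean_spe - lo) / (hi - lo + 1e-12)

# Hypothetical usage, one map per 40 s slice of the interval of interest:
# maps = [spe_cell_map(x_slice, model.predict(x_slice)) for x_slice in slices]
```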

Figure 16: Map of the squared prediction errors averaged over four different intervals ([80s, 120s], [120s, 160s], [160s, 200s], and [200s, 240s]) for the absolute angle error fault in the front right steering motor. The different signals are organized row-wise whereas the time-points in the sample window are stacked column-wise.

From the images in Figure 16 it can be observed that the largest errors in all cases can be found in the signals originating from the rear right wheel drive motor, despite the fault being injected in the front right steering motor. In Figure 17 the averaged errors over the full samples for each signal have been plotted for the same test case. We can see that the largest errors are not produced for the signals originating from the front right motors, which is where the fault was injected.


Figure 17: A comparison between the original signals from the absolute angle fault in the front right steering motor and the squared prediction errors averaged over each signal.

9. Discussion

The results provide some insights into answering the research questions, but a larger dataset that includes more fault cases is necessary in order to draw more concrete conclusions. Signals from both encoder alignment fault cases closely follow those of the nominal case, and as such the pattern of anomaly scores is almost identical to that produced from the healthy test data. However, the approach clearly shows some potential when considering the results from feeding the model data from the absolute angle error fault case, shown in Figure 15, where major differences can be seen in the time interval of the second and third steps of the diagnostics routine.

In all the test cases, we see large spikes in anomaly scores during the first and third steps of the routine, where the signals are quite noisy. This has major implications for the accuracy of the model and negatively impacts the selection of a good fault threshold. In the experiments, the threshold was selected from the top 0.1% scoring sample of the training data, which resulted in a threshold at around the value of these outliers. As a consequence, several areas which are visibly different do not exceed this threshold and are therefore not interpreted as anomalous by the model. For instance, consider the time interval [70s, 160s] in Figure 15, where multiple large areas of successively increasing anomaly scores can be seen. As a result of the fault threshold being set too high, most of these noticeably anomalous samples are not classified as such. The spikes in anomaly score indicate that the Autoencoder has issues learning a good enough representation of the noisy data. Hence, future improvements may need to be made by either denoising the data in an initial pre-processing step, or by investigating how a denoising Autoencoder would perform with the same data.
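As a starting point for the latter suggestion, the sketch below outlines a denoising Autoencoder in Keras, trained to reconstruct clean windows from noise-corrupted inputs. The framework, layer sizes, activations, and noise level are assumptions for illustration and do not describe the model used in the thesis.

```python
import numpy as np
import tensorflow as tf

def build_denoising_autoencoder(input_dim, k=64):
    # Single-hidden-layer Autoencoder; the denoising effect comes from
    # training on corrupted inputs with clean targets (see usage below).
    inputs = tf.keras.Input(shape=(input_dim,))
    hidden = tf.keras.layers.Dense(k, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(input_dim, activation="linear")(hidden)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Hypothetical usage with flattened healthy windows of shape
# (n_samples, window_len * n_signals):
# model = build_denoising_autoencoder(x_train.shape[1], k=64)
# noisy = x_train + np.random.normal(0.0, 0.1, size=x_train.shape)
# model.fit(noisy, x_train, epochs=50, batch_size=32)
```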

The final experiment investigated whether the signals reconstructed by the Autoencoder provide information on which signals are faulty, thereby pinpointing the faulty component for isolation purposes. For this purpose, the squared prediction error between the original and reconstructed signals was used, with the hypothesis that the errors of the faulty signals would be greater within a given sample than those of the healthy ones. As can be seen in Figures 16 and 17, the reconstruction errors for the absolute angle fault in the front right steering motor cannot be used to confirm this hypothesis, as most of the large errors point towards the rear right wheel motor being faulty. By inspecting the input signals we can, however, see differences between the front left and rear right wheel drive motor signals and their healthy counterparts, so a spread in reconstruction errors like the one seen in the figure is to be expected. Hence, torque signals may not provide valuable enough information as-is, and an additional layer might need to be added to the model to map such patterns to their actual causes. Further experimentation with a larger dataset of faults is needed to draw any real conclusions, however, as the signals from the other fault cases do not show any indication of being faulty and can therefore not be used.

The results can be partly used to answer the research questions of the thesis. As shown in the experiments carried out on the industrial dataset, the Autoencoder is capable of learning valuable features in order to differentiate between healthy and faulty behaviour (RQ1). The experiments did not confirm whether the Autoencoder can be used to isolate faults to the failing component by examining the reconstructed signals, as anomalous behaviour in the torque outputs from other motors interfered with the reconstruction errors (RQ2).
