Outlier detection on sparse-encoded vibration signals from rolling element bearings

Kammal Al-Kahwati

Computer Science and Engineering, master's level (120 credits) 2019

Luleå University of Technology


Outlier detection on sparse-encoded vibration signals from rolling element bearings

Kammal Al-Kahwati

Dept. of Computer Science, Electrical and Space Engineering

Luleå University of Technology


ABSTRACT

The demand for reliable condition monitoring systems on rotating machinery for power generation is continuously increasing due to the wider use of wind power as an energy source, which requires expertise in the diagnostics of these systems. An alternative to the limited availability of diagnostics and maintenance experts in the wind energy sector is to use unsupervised machine learning algorithms as a support tool for condition monitoring. Condition monitoring systems can employ unsupervised machine learning algorithms to prioritize the assets to monitor via the number of anomalies detected in the vibration signals of the rolling element bearings. Previous work has focused on the detection of anomalies using features taken directly from the time or frequency domain of the vibration signals to determine if a machine has a fault. In this work, I detect outliers using features derived from vibration signals encoded via sparse coding with dictionary learning. I investigate multiple outlier detection algorithms and evaluate their performance using different features taken from the sparse representation. I show that it is possible to detect abnormal behavior of a bearing earlier than the fault dates reported by typical condition monitoring systems.


PREFACE

I would like to thank my girlfriend, Rania Awn, for cheering me on, supporting me, and listening to topics she has not studied just so that I could clear my mind. Without her positive and caring attitude, this thesis work would have been much harder to do.

I would also like to thank my family and friends, especially Daniel Björk, Niklas Lundborg and Shahin Salehi, for all the support and tips provided.

Lastly, I would like to give my greatest thanks to my supervisor, Sergio Martin-del-Campo Barraza, for his insight and all the help provided during the course of this thesis. It has been an invaluable experience and I have learned a lot from him.

Kammal Al-Kahwati

September 2019, Luleå, Sweden


CONTENTS

Chapter 1 – Introduction
  1.1 Background
  1.2 Motivation
  1.3 Problem definition
  1.4 Thesis structure

Chapter 2 – Related work
  2.1 Rolling element bearings
  2.2 Machine learning on rolling element bearings
  2.3 Sparse coding with dictionary learning

Chapter 3 – Theory
  3.1 Supervised learning
  3.2 Unsupervised learning
  3.3 Isolation forest
  3.4 Extended isolation forest
  3.5 DBSCAN

Chapter 4 – Implementation
  4.1 Data
  4.2 Frameworks
    4.2.1 NumPy
    4.2.2 Pandas
    4.2.3 Matplotlib
    4.2.4 Scikit-learn
  4.3 Isolation Forest
  4.4 Extended isolation forest
  4.5 DBSCAN

Chapter 5 – Results
  5.1 Isolation Forest
    5.1.1 Comparison among segments
    5.1.2 Comparison among turbines
  5.2 Extended isolation forest
  5.3 DBSCAN
    5.3.1 Running DBSCAN without normalization
    5.3.2 Running DBSCAN with normalization
    5.3.3 Running DBSCAN with adjusted normalization
  5.4 Discussion
    5.4.1 Comparison among segments
    5.4.2 Comparison among turbines
    5.4.3 DBSCAN

Chapter 6 – Conclusions and future work
  6.1 Conclusions
  6.2 Future work

Appendix A – Appendix


CHAPTER 1

Introduction

Wind power, as an energy source in Sweden, is becoming more popular each year. In 2006, wind power accounted for less than 0.5% of the total energy production, while roughly ten years later, in 2017, the number was up to 11% [1]. Wind turbines are essential to the ongoing growth of wind power as a renewable energy source. The turbines are often placed close to each other in what is known as “wind farms”. The wind turbines within each wind farm are typically of a similar model and they all experience similar environmental and operational conditions. During their lifetime, the turbines may experience faults, of which the most common originate in rolling element bearings. Therefore, machine and bearing maintenance requires efficient condition monitoring methods that include data analytics for the detection of faults as early as possible while operating in resource-constrained environments. Bearing diagnostics is typically carried out by experts who use the vibration signals originating from a machine. However, the demand for condition monitoring expertise grows constantly and outpaces the resources and number of experts available.

1.1 Background

The wind power industry has grown rapidly, with turbines that have gone from producing 2 MW to 10 MW in the span of a decade. Adding to this, the lifespan of a turbine tends to be around 20 years, so service and maintenance have proven to be a challenge. When a maintenance action is required, the company needs to halt the operation of the machine and perform the maintenance using a crane. This action is expensive in itself, adds the cost of not producing energy, and brings the risk of not fulfilling the energy demand. Furthermore, spare parts in this fast-developing industry are not always readily available. The highest number of failures is found in the drivetrain of the wind turbine, with gearbox failures being the most cumbersome, since the gearbox is one of the most expensive components to replace. A typical 2 MW gearbox may cost €200k-400k to replace, including bearings, which may cost between €50k and €100k [2]. Service personnel monitor the turbines throughout the day and fix problems accordingly. However, as the number of turbines grows, one person may be responsible for the continuous operation of up to a hundred wind turbines. Wind turbines are complex, and when they suffer a fault, experts need to investigate the root cause of the problem and fix it accordingly. Thus, companies employ different types of condition monitoring methods to monitor any possible problem. These methods include using vibration sensors or accelerometers for each bearing in question. The sensors produce an output, known as a vibration signal, which is studied by experts. Research has found that certain vibration frequencies correspond to different kinds of faults. These vibration signals include noise originating from other elements within the machine, which mixes with the signal component emanating from the rolling element bearings, resulting in weak signatures for some faults [3]. Despite all of these challenges, one can assume that wind turbines in the same park experience similar environmental and operational conditions. This means the methods may be adapted to several turbines, and data may be validated in such a manner as well.

Condition-based maintenance involves continuous monitoring of wind turbines to detect developing faults before they lead to failure. The process can be described in four stages: data acquisition, preprocessing, analysis and selecting the right maintenance action [5]. One interesting aspect of the preprocessing stage is feature extraction. A feature is any characteristic of the signal that is measurable and particularly effective for modeling or classifying it [4]. The features are often manually selected by experts, and a change in such a measure is often an indication of a developing failure [5].

One newer approach for condition monitoring of rolling element bearings is an unsupervised method called sparse coding with dictionary learning. The vibration signal is preprocessed and modelled as a linear superposition of waveforms and Gaussian noise. These waveforms are known as “atoms” and are learned from the signal itself. Atoms can be regarded as features: different waveforms that together describe the entire vibration signal. The process is roughly conducted in the following way:

The vibration signals from the sensors are recorded and preprocessed using a sparse encoding algorithm. Common algorithms include Matching Pursuit [6] and Orthogonal Matching Pursuit [7]. The atoms in the dictionary are then updated from the resulting sparse representation using an optimization algorithm. This procedure may be carried out when the wind turbine is expected to be in healthy operational condition, such as at the beginning of operation.
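To make the encoding step concrete, below is a minimal sketch of shift-invariant matching pursuit over a dictionary of unit-norm atoms. It is illustrative only (function and variable names are assumptions), not the implementation used to preprocess the data in this thesis.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_events):
    """Greedy matching pursuit sketch: approximate `signal` as a sparse
    sum of shifted atoms plus a residual (the noise term).

    dictionary: (atom_length, n_atoms) array of unit-norm waveforms.
    Returns a list of (atom_index, offset, weight) events and the residual.
    """
    residual = signal.astype(float).copy()
    atom_len, n_atoms = dictionary.shape
    events = []
    for _ in range(n_events):
        best_w, best_k, best_i = 0.0, 0, 0
        for k in range(n_atoms):
            # Correlate the atom with the residual at every offset.
            corr = np.correlate(residual, dictionary[:, k], mode="valid")
            i = int(np.argmax(np.abs(corr)))
            if abs(corr[i]) > abs(best_w):
                best_w, best_k, best_i = corr[i], k, i
        # Subtract the best-matching atom; for unit-norm atoms the
        # correlation value is the event weight.
        residual[best_i:best_i + atom_len] -= best_w * dictionary[:, best_k]
        events.append((best_k, best_i, best_w))
    return events, residual
```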

Since data is continuously recorded from these vibration sensors, new dictionaries can be formed at any time, with continuous updates over time. As mentioned earlier, a change in one of these features of a rolling element bearing often points towards a change in that bearing. The new dictionaries are compared to “baseline” dictionaries to see if any drastic changes have occurred over time. This approach to condition monitoring of wind turbines has shown that a dictionary distance measure is a useful indicator that can complement already existing methods. Bearing faults could be detected several months earlier, which could help reduce the risk of costly failures in wind turbines.

The focus of this work is on the sparse representation of the vibration signals and the features derived from the dictionary learning approach. The aim of this project is to use unsupervised machine learning methods to identify outliers in the sparse representation and proposed features, which might originate from anomalies in the rolling element bearings of wind turbines.

1.2 Motivation

Condition monitoring of wind turbines is important for the effectiveness and fault reduction of electrical energy generation using wind power. As the turbines grow in number, the service personnel responsible for the diagnostics get an increased workload. This may result in a decrease in the quality of maintenance and service provided, which in turn may increase the number of faults occurring in wind turbines, leading to revenue losses and stressed-out personnel. Dictionary learning methods using sparse coding representations of vibration signals have proven to be useful for the earlier diagnosis of wind turbine faults, which may benefit operators by relieving unnecessary stress. Unsupervised machine learning methods are rather straightforward and cheap to deploy compared to methods that require labeled data to define a model. Thus, they may prove to be useful in the detection of faults in rolling element bearings residing in the drivetrain of wind turbines.

1.3 Problem definition

Unsupervised machine learning methods do not require labeled data for their operation, which provides greater versatility in their use. The data used in this study correspond to 46 months of vibration signals collected approximately every 12 hours from six different wind turbines located in a wind farm in northern Sweden. The data has been preprocessed using the Matching Pursuit and Orthogonal Matching Pursuit algorithms to create the sparse representations, and both used the same dictionary learning algorithm to update the dictionaries.

It has already been shown that sparse coding with dictionary learning can serve as a complement to condition monitoring methods currently in use. The question that arises is whether it is possible to use unsupervised machine learning methods to find anomalies in the sparse representation of the vibration signals. In particular, this work focuses on the identification of outliers. An outlier can be described as a point or segment of a signal that does not fit in with the shape of the rest of that signal. Since a change in a feature is indicative of abnormal behavior, outliers in the sparse representation of vibration data of rolling element bearings should also be indicative of possible anomalies in the bearing. The sparse representation is an approximation of the vibration signal as a linear combination of features, known as atoms, plus noise.

Since unsupervised machine learning covers a rather broad spectrum of methods, one needs to ask what kind of unsupervised machine learning algorithms could be used. The preprocessed data can have as many features as data points in the signal, which requires tailoring the data to the appropriate machine learning algorithm. How should the data be visualized to best enable the identification of outliers?

The identification of outliers is the end goal of this project, but it requires the identification of suitable features to achieve this goal. Lastly, looking at the bigger picture, what additional information is it possible to derive from the outlier detection study of the sparse representation of vibration signals?

In summary, the aim of this project is to address and investigate the answer to the following four research questions:

Q1: Is it possible to identify outliers from the sparse representation of vibration signals of rolling element bearings?

Q2: What are the methods that are best suited for the identification of outliers from the sparse representation of vibration signals?

Q3: What are the most suitable features in the identification of outliers?

Q4: What information is it possible to derive from the outlier detection study of the sparse representation of the vibration data?

Using unsupervised machine learning methods is a rather broad topic, which brings the need for delimitations. In this study, three different algorithms will be applied to the sparse representation of vibration signals to detect outliers:

1. Isolation Forest [8]

2. Extended Isolation Forest [9]

3. DBSCAN [10]

Isolation Forest and Extended Isolation Forest are used to study and compare datasets encompassing multiple turbines, and datasets among segments of the sparse representation of the vibration signals of each individual turbine. DBSCAN operates on datasets describing the sparse representation over time.

1.4 Thesis structure

The thesis is structured as follows. Chapter 2 presents work related to this project, such as the sparse coding method with dictionary learning. Chapter 3 describes the theory behind different machine learning approaches and the algorithms implemented in this work. Chapter 4 provides a description of how the work was implemented; the frameworks used in this project are described along with the flow of data through the implementations, and the reader is presented with a brief overview of the Python files and their use. Chapter 5 presents the results of this work and a discussion of the results. Chapter 6 presents the conclusions, which include answers to the research questions stated in the problem definition, and future work.


CHAPTER 2

Related work

This chapter describes work related to the research conducted in this study: rolling element bearings, their uses, and how condition monitoring is carried out on them. Furthermore, it presents the use of machine learning in condition monitoring processes of rolling element bearings, typical features and how the features are used. Lastly, sparse coding with dictionary learning is introduced along with its implementation as it pertains to condition monitoring of rolling element bearings.

2.1 Rolling element bearings

A rolling element bearing is a mechanical component located between two parts that permits rotation between them, and it is among the most common types of bearing due to its generally useful properties. Rolling element bearings work by transferring a load between the parts in contact while reducing the friction between those parts.

Martin-del-Campo et al. [11] present a description of rolling element bearings and their characteristic frequencies. The fault frequencies of rolling element bearings situated in wind turbines are typically evaluated in terms of order, which is the ratio between frequency and shaft speed. The faults can occur due to a variety of reasons, such as faulty lubrication, fretting, excessive load, corrosion and regular wear, among others [12]. Table 2.1 presents an example of the characteristic frequencies of a rolling element bearing situated on the high-speed shaft of the gearbox in a wind turbine. All these values are presented in terms of order for ease of understanding: an order of 1 means that an event occurs exactly once per revolution of the shaft.


Table 2.1: Example of fault frequencies in wind turbines, courtesy of [11].

Fault                            Characteristic order
Motor shaft speed                1.0
Ball pass frequency, inner race  9.6
Ball pass frequency, outer race  7.4
Ball spin frequency              3.7
Fundamental train frequency      0.4
Gear mesh                        35

Methods of condition monitoring of rolling element bearings typically depend on the application for which the bearing is to be used, among other factors. The methods offer benefits but also suffer from limitations related to the bearing size, location, lubrication system and mounting conditions [13]. Common methods of condition monitoring of rolling element bearings include measuring vibration, acoustic emission, sound pressure, temperature or lubricant analysis [14]. The work described in this thesis is limited to vibration analysis only.

2.2 Machine learning on rolling element bearings

One approach for condition monitoring of rolling element bearings is an unsupervised method called dictionary learning using a sparse representation of vibration data, presented in [4]. The signal is modeled as a linear superposition of waveforms and Gaussian noise. The waveforms are “atoms”, learned from that signal. The atoms can be referred to as features of the signal. An interesting aspect that Martin-del-Campo et al. present is that if one trains a dictionary with the sparse signal approximation, one can compare this baseline dictionary with future dictionaries of the wind turbines, since condition monitoring requires constant data acquisition. If the dictionaries differ, then the features have changed, which is indicative of an anomaly. In that study, six turbines were recorded during a period of 46 consecutive months. One of the turbines had a gearbox failure after around a year of deployment, and its distance to the baseline dictionary was greater than in the healthy turbines.

2.3 Sparse coding with dictionary learning

Sparse coding with dictionary learning has been used in condition monitoring applications of rolling element bearings before. In [4], Martin-del-Campo et al. explain sparse coding with dictionary learning as modeling a signal S(t) as a linear superposition of waveforms with compact support plus Gaussian noise. The authors of [15] model a ball bearing vibration signal using a dictionary of 16 atoms; see Figure 2.1 for an example of this representation. In [16], the authors use sparse coding with dictionary learning on bearings in different conditions. They train different dictionaries for different kinds of faults in the bearings, later merge the dictionaries together, and show promising results regarding fault diagnosis in machinery when considered as a classification problem. In the sparse representation of a signal, the signal is reconstructed using a dictionary of waveforms, also known as atoms, placed at different timestamps of the signal with a corresponding weight. The sparse representation also provides features such as fidelity, which contains the resulting Signal-to-Noise residual (SNR) of the signal segment at the end of the sparse coding algorithm, and dictionary distance, which measures the distance from the dictionary of the signal segment to a baseline dictionary, among others.


Figure 2.1: Sparse representation of a vibration signal from a rolling element bearing. In the top panel, the dotted line is the vibration signal and the blue line is the method residual. The bottom panel shows the sparse representation; it has not been created for the second half of the top panel, as the representation stops at sample i. The right panel shows the dictionary of atoms used to represent the whole vibration signal. Courtesy of [15].


CHAPTER 3

Theory

The term machine learning was coined by Arthur Lee Samuel in 1959 [17]. Samuel wrote a paper on training a computer to play the game of checkers, and verified that the machine could play better than a person after a training period of 8-10 hours [18]. However, it is during the last decade that machine learning has become one of the most relevant topics within many organizations, as those looking for innovative ways to use data to help grow their business turn to machine learning applications. Machine learning can be described as using algorithms that iteratively learn from previous data observations in order to predict future outputs [17]. The more data is used for training the algorithms, the more precise they become. Machine learning algorithms can be divided into three main categories: supervised, unsupervised and reinforcement learning. In this work, unsupervised methods are used. However, supervised learning is introduced to explain the difference from unsupervised learning approaches, which pertains mainly to the presence of labeled data.

3.1 Supervised learning

In supervised machine learning algorithms, the data comes in input-output pairs [19], or labels [20]. The inputs could, for example, be a feature or set of features, which are introduced together with a label representing the output. This is used to train a model, which in turn generalizes to new inputs and tries to predict their output. This type of machine learning is often more expensive than unsupervised learning, since it requires labeling of data and human intervention to train models.

Popular supervised machine learning algorithms include Support Vector Machines (SVMs) and the Naive Bayes algorithm. The Support Vector Machine algorithm works by defining a decision boundary which is then used to separate the data into two classes [13]. Naive Bayes is a simple algorithm based on applying Bayes’ theorem [21] to the input data while incorporating a prior, which is an assumption on the data distribution.

(21)

The algorithm has been criticized for being too severe in its assumptions and thus rendering bad results, but there are effective ways to implement it, which makes it a valid alternative to the previously discussed SVMs [22].

3.2 Unsupervised learning

Unsupervised machine learning methods are well suited when the dataset is large and lacks labels. The data may be hard to understand, and the algorithms use clustering or similar methods to identify patterns in it [17]. In this approach, the data is not labeled and can thus only be regarded as input features. This work uses unsupervised machine learning methods to create models that identify patterns based on commonalities among the features.

3.3 Isolation forest

Isolation forest builds on the principle that anomalous points in a given data set are in the minority and have different attribute values [8]. The algorithm has a linear time complexity and a constant memory requirement.

The algorithm works by recursively partitioning the data until a single point has been isolated. Since anomalous points have different attribute values, they are isolated much faster than regular points. It uses a random tree structure to partition the data, i.e. to isolate all points and store the path length to each point. When a collection of these random trees (a forest) produces shorter path lengths for some of the points, there is a high chance of those points being anomalous.

Figure 3.1: Comparison between isolating an inlier and an outlier. Picture taken from Liu et al. [8].


Figure 3.1 presents an example taken from Liu et al. [8] that shows the difference between isolating an inlier, x_i, and an outlier, x_0, using a Gaussian distribution of 135 points as input data. The inlier required 12 partitions to isolate, while the outlier required only 4 partitions.

Anomaly detection using Isolation Forest

Using the Isolation Forest algorithm for the detection of outliers is a two-step process. The algorithm first needs to be trained on some set of data. The training stage uses samples of the training set to build isolation trees. The second step, testing, passes test samples through the isolation trees to obtain anomaly scores [8]. The input to the algorithm consists of two parameters: the sub-sampling size ψ and the number of trees t. Algorithm 1 creates the forest by recursively calling Algorithm 2 to create iTrees, until the tree height limit l = ⌈log₂ ψ⌉ is reached or a point is isolated. Pseudocode for the training stage can be found in Figure 3.2, which describes the forest creation, and Figure 3.3 [8], which describes the creation of iTrees. Pseudocode for testing can be found in Figure 3.4. The reader should note that iForest also works on data sets with no anomalies [8].

Figure 3.2: Training algorithm 1. Pseudocode taken from Liu et al. [8].

Figure 3.3: Training algorithm 2. Pseudocode taken from Liu et al. [8].


Figure 3.4: Testing algorithm. Pseudocode taken from Liu et al. [8].
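As a complement to the pseudocode in Figures 3.2-3.4, the following is a simplified Python sketch of building an isolation tree and computing an adjusted path length, after Liu et al. [8]. The anomaly score of a point is then s(x, ψ) = 2^(−E[h(x)]/c(ψ)), where E[h(x)] is the mean path length over all trees in the forest. Names are illustrative and edge cases are simplified.

```python
import math
import random

def itree(X, height, height_limit):
    """Recursively build an isolation tree (simplified Algorithm 2 of [8])."""
    if height >= height_limit or len(X) <= 1:
        return {"size": len(X)}                      # external node
    q = random.randrange(len(X[0]))                  # random split attribute
    lo = min(x[q] for x in X)
    hi = max(x[q] for x in X)
    if lo == hi:                                     # attribute is constant
        return {"size": len(X)}
    p = random.uniform(lo, hi)                       # random split value
    left = [x for x in X if x[q] < p]
    right = [x for x in X if x[q] >= p]
    return {"q": q, "p": p,
            "left": itree(left, height + 1, height_limit),
            "right": itree(right, height + 1, height_limit)}

def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used in [8] to compensate for the unbuilt subtree below an external node."""
    if n <= 1:
        return 0.0
    h = math.log(n - 1) + 0.5772156649               # harmonic number estimate
    return 2.0 * h - 2.0 * (n - 1) / n

def path_length(x, node, height=0):
    """Traverse an isolation tree and return the adjusted path length of x."""
    if "size" in node:
        return height + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, height + 1)
```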

3.4 Extended isolation forest

Extended Isolation Forest is a variation of Isolation Forest (IF) that incorporates some improvements. Hariri et al. [9] argue that the anomaly scores created by IF suffer from inconsistencies and are biased. Using score maps, they showed that the distribution of scores in two-dimensional datasets was inconsistent.

Even though the slicing operations used by IF to isolate data points occur at random, the slices are always parallel to the coordinate axes. Extended IF fixes this issue by either transforming the data at random before each tree creation, or by slicing the data with hyperplanes of random slope, as shown in Figure 3.5. The reader should note that this random slicing also works on datasets of dimension greater than two.

The code in Algorithm 1 remains the same as in IF, but Algorithms 2 and 3 have been changed. In Algorithm 2, the changes concern picking random features and random values for those features. Hariri et al. also made it possible to define the extension level used by the algorithm, so that in higher-dimensional data with low cardinality in one dimension, that feature can be ignored to reduce possible selection bias. In Algorithm 3, the changes mirror those of Algorithm 2. Pseudocode can be found in Figure 3.6 and Figure 3.7 [9].


Figure 3.5: Visualization of partitioning in Extended Isolation Forest. Picture from Hariri et al. [9].

Figure 3.6: Training algorithm 2, code taken from Hariri et al. [9].

Figure 3.7: Testing algorithm, code taken from Hariri et al. [9].
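To illustrate the difference from the axis-parallel cuts of IF, the sketch below draws a single random-hyperplane cut in the spirit of Hariri et al. [9]. The uniform intercept choice and the function names are assumptions made for illustration.

```python
import numpy as np

def random_hyperplane_split(X, rng):
    """One EIF-style cut: split the data on a hyperplane with random slope.

    In standard IF the cut is axis-parallel (one attribute, one value);
    here a random normal vector n and a random intercept point p are drawn,
    and points are split on the sign of (x - p) . n.
    """
    n = rng.normal(size=X.shape[1])                  # random slope (normal vector)
    p = rng.uniform(X.min(axis=0), X.max(axis=0))    # intercept inside data range
    mask = (X - p) @ n < 0
    return X[mask], X[~mask]

# Example: split a 2D point cloud once.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
left, right = random_hyperplane_split(X, rng)
```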

3.5 DBSCAN

Density-based spatial clustering of applications with noise (DBSCAN) is a popular clustering algorithm which is implemented in several toolkits and discussed in textbooks, such as [23] and [24]. It uses a minimum density level estimation and takes two arguments from the user: minPts, which defines a number of neighbors, and a radius measure ε. Points with more than minPts neighbors within ε are called core points, and all points within this radius of a core point are considered to belong to the same cluster [10]. If a point in this neighborhood is itself a core point, its neighborhood is also included; this is called density reachability. Points that are not core points but still lie in the neighborhood of one are classified as border points. All points within the set are density connected. Points that fall into neither category are considered noise. Figure 3.8 presents a description of this procedure. The reader can find pseudocode of the algorithm in Figure 3.9 [10].

Figure 3.8: Illustration of DBSCAN. Arrows indicate density reachability. Point A and the other red points are core points. Points B and C are border points and are density connected together with A and the red points. Point N is neither a core point nor a border point and is hence classified as noise. Picture taken from Schubert et al. [10].
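Following the definitions above, core, border and noise points can be classified directly; the brute-force sketch below (assuming Euclidean distance, with illustrative names) is for explanation only and ignores the indexing structures a real implementation would use.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each point as core (2), border (1) or noise (0), per [10].

    Brute-force O(n^2) neighborhood computation, for illustration only.
    The eps-neighborhood here includes the point itself.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances and eps-neighborhood membership.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = d <= eps
    core = neighbors.sum(axis=1) >= min_pts
    # Border points: not core, but within eps of at least one core point.
    border = ~core & (neighbors & core[None, :]).any(axis=1)
    return np.where(core, 2, np.where(border, 1, 0))
```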


CHAPTER 4

Implementation

This chapter describes the implementation of this project. It covers the data processing, how the data is handled within the various frameworks used, and how those frameworks work with each other. Furthermore, the software architecture is introduced, together with the .py files used in the project and the machine learning implementations. Finally, the visualization of data is discussed.

4.1 Data

The data used in this project comes from vibration signals from six different wind turbines located in a wind farm in northern Sweden. It was recorded during a time span of 46 consecutive months at approximately 12-hour intervals (some days have more than two recordings, some have fewer). The data is recorded with a sampling rate of 12.8 kS/s and each signal segment is 1.28 seconds long, providing 16384 samples. During the recording of the data, five of the six turbines (1, 2, 3, 4 and 6) did not experience any bearing faults, although turbine 2 suffered an electrical fault at the start of operation, probably from an incorrect installation.

Turbine 5 suffered two failures. One failure occurred in the inner raceway of a rolling element bearing on the output shaft. The failure was reported in January 2013 and the bearing was replaced a month later, in February, after 1.2 years of operation. The second reported fault was an inner raceway failure in one of the four cylindrical bearings supporting one of the planets in the first planetary gear, causing the company to replace the entire gearbox in November 2013, after 2 years of operation [4].

The dataset has been preprocessed using sparse coding with dictionary learning for use in this project, and comes in six different .mat files, one for each turbine. These files contain the following information:

Sparse representation matrices: The matrices have dimension M×N, where M = 1600 and N ≈ 2000 (it differs for each turbine). Each row m represents an atomic event and each column n corresponds to a sample time of the vibration signal, henceforth referred to as a segment. An atomic event is defined by an atom number, a time stamp and a weight. Each of these variables forms one of the matrices of the sparse representation. All of these variables exist for both sparse coding algorithms (Matching Pursuit and Orthogonal Matching Pursuit). The list of variables includes:

• e: indicates the atom number

• t: indicates the time within the signal segment

• w: indicates the weight

• DB: contains the resulting Signal-to-Noise residual (SNR) of the signal segment at the end of the sparse coding algorithm

• dist0: contains the dictionary distance as measured from the current signal segment to a baseline dictionary

• dist60: contains the dictionary distance as measured from the current signal segment to 60 signal segments in the past

• ldc: contains the learned dictionaries

Common variables to both sparse coding algorithms:

• Vrms: each row corresponds to the vibration RMS value of each of the signal segments

• selseg: list of signal segments selected from the original input signal

• measInfo: matrix where each row corresponds to a signal segment and each column is a different signal descriptor:

1. Speed: mean segment speed [cpm]
2. SpeedMin: minimum segment speed [cpm]
3. SpeedMax: maximum segment speed [cpm]
4. Load: load of the segment
5. RotDir: rotational direction
6. SignalLines: number of signal points in the original signal
7. SampleRate: sampling frequency [Hz], always 16800

The implementation consists of five Python files: Plot, which handles all the plotting and visualisations; IF, an implementation of Isolation Forest; EIF, an implementation of Extended Isolation Forest; DBSCAN, an implementation of DBSCAN; and finally Data, which handles all the data extraction and manipulation from the .mat files provided for this project. When one of the machine learning algorithms or a visualisation is run, the .mat files are loaded into Python using the SciPy package, which loads them as Python dictionaries. The dictionary keys are the variable names of the .mat file and their respective values are the values of the variables (matrix or vector). Data relevant to the feature set chosen for the algorithm or visualisation is extracted and transformed using NumPy and finally stored as a Pandas DataFrame. See Figure 4.1 for a simplified view of the communication between classes.

Figure 4.1: View of the Python implementation consisting of the five files for plotting, Isolation Forest, Extended Isolation Forest, DBSCAN and data handling and manipulation. The machine learning implementations are kept separated but connect to the data class and the plot class. The numbers indicate in which order the classes are communicating.
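A minimal sketch of this data path, assuming a hypothetical file name (`turbine5.mat`) and using two of the variables listed in Section 4.1 (`dist0` and `DB`):

```python
import pandas as pd
from scipy.io import loadmat

# loadmat returns a dict mapping .mat variable names to NumPy arrays.
mat = loadmat("turbine5.mat")  # hypothetical file name; one file per turbine

# Build a 2D feature set, e.g. dictionary distance vs. SNR residual,
# as a Pandas DataFrame for the outlier-detection classes.
features = pd.DataFrame({
    "dist": mat["dist0"].ravel(),
    "db": mat["DB"].ravel(),
})
```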

4.2 Frameworks

In this subsection, the reader is presented with the frameworks used in the project: NumPy, Pandas, Matplotlib and scikit-learn.

4.2.1 NumPy

NumPy is widely used for scientific computation in Python. The package contains an N-dimensional array object, ndarray, which arranges the data as vectors and matrices. Some of the most common operations in this package include linear algebra, Fourier transforms and random number capabilities, among many other useful functions.


4.2.2 Pandas

Pandas is an open source tool for data analysis applications. It provides easy-to-use data structures with list-like capabilities and contains numerous functions to help the user create, transform and analyse data. The package is very well integrated with NumPy and Matplotlib.

4.2.3 Matplotlib

Matplotlib is a plotting library used for both 2D and 3D plotting in Python. The package provides easy-to-use functions that are familiar to Matlab users. The documentation is well written with many code examples, and the interfaces are customisable for the more experienced user.

4.2.4 Scikit-learn

Scikit-learn is an open source package used for data mining and data analysis. The package is built on NumPy and Matplotlib. It provides various implementations of preprocessing, model selection and machine learning algorithms, among others.

4.3 Isolation Forest

The Isolation Forest (IF) algorithm is installed from scikit-learn, and the implementation is built on that. The class connects to the data class, which delivers the relevant data as a Pandas DataFrame. In this work, the data used as input to the algorithm are 2D sets of feature representations chosen from the sparse coding variables described in Section 4.1.

The algorithm computes the score of an input sample as the mean anomaly score of the trees in the forest. When the algorithm has been run on the feature set, anomaly scores can be retrieved using the decision_function method. Scores below 0 are considered outliers, and the lower the score, the more anomalous the input sample. The algorithm is run with a contamination parameter specified by the user (default 10%), which it uses to classify the most anomalous points within the dataset. The fit_predict method returns labels for each of the input samples based on this classification. The labels are either -1 (outlier) or 1 (inlier) for each sample. The plot class created for this project is used to plot the results: green dots represent inliers and red dots represent outliers.
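A minimal usage sketch of the scikit-learn estimator described above; the feature set and the function name are illustrative:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def isolation_forest_outliers(features: pd.DataFrame, contamination=0.01):
    """Fit IsolationForest on a 2D feature set and return labels and scores."""
    clf = IsolationForest(contamination=contamination, random_state=0)
    labels = clf.fit_predict(features)        # -1 = outlier, 1 = inlier
    scores = clf.decision_function(features)  # < 0 leans outlier; lower = worse
    return labels, scores
```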

4.4 Extended isolation forest

The Extended Isolation Forest (EIF) algorithm is installed from https://github.com/sahandha/eif.git and the implementation is built on that. The class connects to the data class, which delivers the relevant data as a Pandas DataFrame. In this work, the data delivered to the algorithm are 2D sets of feature representations chosen from the sparse coding variables described in Section 4.1.

After running the algorithm, it returns an anomaly score for each point. The scores are positive, and the higher the score, the more anomalous the point. Points with scores greater than 0.5 can generally be considered anomalous, but this can differ depending on the data. The implementation in this work has a contamination parameter to be specified by the user, which is used to classify the most anomalous data as outliers. This is done to keep the operation of IF and EIF similar. The plot class created for this project is used to plot the results, with green dots representing inliers and red dots representing outliers.
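A usage sketch of the eif package from the repository above; the parameter names follow its README at the time of writing and may differ between versions:

```python
import numpy as np
import eif  # installed from https://github.com/sahandha/eif

X = np.random.default_rng(0).normal(size=(1000, 2))  # placeholder feature set

# ExtensionLevel=1 gives fully extended splits for 2D data;
# ExtensionLevel=0 reduces to standard Isolation Forest.
forest = eif.iForest(X, ntrees=100, sample_size=256, ExtensionLevel=1)
scores = forest.compute_paths(X_in=X)  # higher score = more anomalous

# Emulate a contamination parameter: flag the top 1% as outliers.
outliers = scores > np.quantile(scores, 0.99)
```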

4.5 DBSCAN

The DBSCAN algorithm is installed from scikit-learn, and the implementation is built on that. The class connects to the data class, which delivers the relevant data as a Pandas DataFrame. In this work, the data delivered to the algorithm are 2D sets of feature representations chosen from the sparse coding variables described in Section 4.1.

The algorithm takes two inputs, eps and minSamples. Eps is the radius and minSamples is the minimum number of neighbors required within that radius for a point to be called a core point. Refer to Section 3.5 for an explanation of what eps and minSamples mean.

When the algorithm is run, it groups data into clusters and labels each cluster with a number, with -1 being considered “noise”, i.e. an outlier. The plot class created for this project is used to plot the results: green dots represent inliers and red dots represent outliers.

The algorithm is implemented to run in three different ways, which will be called without normalization, with normalization and with adjusted normalization. Without normalization runs the algorithm as-is on the feature set provided. With normalization uses scikit-learn's StandardScaler, which standardizes the features by removing the mean and scaling to unit variance before running the algorithm.

With adjusted normalization takes the feature set, retrieves the greatest value of the y-axis (usually the weight of the atom), and scales the x-axis by that value. The motivation behind this implementation is that the distribution of data is very tight in some areas when using the inter-spike interval as a feature, and DBSCAN could have issues clustering those.
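The three variants could be sketched as follows; the direction of the "adjusted" scaling is one plausible reading of the description above:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def run_dbscan(X, eps, min_samples, mode="none"):
    """Run DBSCAN under the three preprocessing variants described above.

    X is assumed to be an (n_samples, 2) array with the x feature in
    column 0 and the y feature (e.g. atom weight) in column 1.
    mode: "none", "standard" (StandardScaler) or "adjusted".
    """
    X = np.asarray(X, dtype=float).copy()
    if mode == "standard":
        # Zero mean, unit variance per feature.
        X = StandardScaler().fit_transform(X)
    elif mode == "adjusted":
        # Scale the x-axis by the largest y value (one interpretation).
        X[:, 0] *= X[:, 1].max()
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return labels  # -1 marks noise/outliers
```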


CHAPTER 5

Results

This chapter presents the results from the models discussed in the previous chapter. The results include a comparison among all segments (time) of the turbines and a comparison among turbines at each segment using Isolation Forest and Extended Isolation Forest. This chapter also includes the results from the DBSCAN algorithm under three different scenarios. It is also worth noting that all of the data used in this chapter has been preprocessed with the Matching Pursuit algorithm.

5.1 Isolation Forest

5.1.1 Comparison among segments

This section presents a comparison among all segments using Isolation Forest (IF) and considering three features. Each segment is a vibration recording with a length of 1.28 seconds (16384 samples). The considered features are the dictionary distance, the Signal-to-Noise residual (DB) and the segment number. The study is first run using dictionary distance and segment number, followed by the study of dictionary distance versus Signal-to-Noise residual. The algorithm in this subsection runs with a contamination parameter of 0.01, which means that 1% of the data points will be interpreted as outliers. However, the interest lies in the actual anomaly scores.



Figure 5.1: Isolation Forest applied to Turbines 1-6 using dictionary distance (vertical axis) and segment number (horizontal axis) as input features.

Figure 5.1 shows IF applied to Turbine 1 through Turbine 6 with the features dictionary distance and segment number. All panels essentially show that the outliers reside either at the head or the tail of the curves. These points are the easiest to isolate given the characteristics of the Isolation Forest algorithm. This behavior might be the result of a suboptimal visualization for the selected features.


Table 5.1: Dates and anomaly scores from the Isolation Forest algorithm on the features dictionary distance and segment number for Turbine 1, Turbine 2 and Turbine 3.

Turbine 1                  Turbine 2                  Turbine 3
Date        Anomaly score  Date        Anomaly score  Date        Anomaly score
2011-11-12  -0.220414      2011-11-10  -0.249934      2012-02-29  -0.201839
2011-11-12  -0.217981      2011-11-10  -0.249427      2012-02-29  -0.201839
2011-11-13  -0.217011      2011-11-11  -0.247296      2012-03-01  -0.194283
2011-11-13  -0.208811      2011-11-12  -0.242257      2012-03-01  -0.187341
2011-11-14  -0.207373      2011-11-12  -0.239750      2012-03-01  -0.181457
2011-11-14  -0.205532      2011-11-12  -0.237751      2012-03-02  -0.172365
2011-11-15  -0.202199      2011-11-12  -0.234762      2012-03-02  -0.167550
2011-11-15  -0.199354      2011-11-12  -0.231291      2012-03-02  -0.158719
2011-11-15  -0.197464      2011-11-12  -0.230796      2012-03-03  -0.156939
2011-11-16  -0.195108      2011-11-12  -0.229314      2012-03-04  -0.148752
2011-11-16  -0.192608      2011-11-12  -0.222927      2015-09-15  -0.145574
2011-11-16  -0.191671      2011-11-12  -0.219097      2015-09-16  -0.145574
2011-11-16  -0.188869      2011-11-12  -0.213282
2011-11-17  -0.184687      2011-11-12  -0.210393
2011-11-17  -0.184224      2011-11-13  -0.208164
2011-11-17  -0.183761      2011-11-13  -0.203865
2011-11-17  -0.181452      2011-11-13  -0.197701
2011-11-17  -0.176399      2011-11-13  -0.195804
2011-11-17  -0.171838      2011-11-13  -0.196275
2011-11-17  -0.170022      2011-11-13  -0.195942
2011-11-17  -0.170731      2011-11-13  -0.195845

Table 5.1 shows the output of running Isolation Forest on Turbine 1, Turbine 2 and Turbine 3 with dictionary distance and segment number as input features. As discussed earlier, the algorithm classifies points at the head or tail of the curves shown in Figure 5.1 as outliers, and this representation might be suboptimal given the behavior of the algorithm.


Table 5.2: Dates and anomaly scores from the Isolation Forest algorithm on the features dictionary distance and segment number for Turbine 4, Turbine 5 and Turbine 6.

Turbine 4                  Turbine 5                  Turbine 6
Date        Anomaly score  Date        Anomaly score  Date        Anomaly score
2012-02-29  -0.219772      2011-07-17  -0.202673      2011-11-04  -0.205603
2012-02-29  -0.216371      2011-07-19  -0.200300      2011-11-04  -0.205603
2012-03-01  -0.213952      2011-07-20  -0.198880      2011-11-04  -0.202269
2012-03-01  -0.212504      2011-07-20  -0.198880      2011-11-05  -0.198852
2012-03-01  -0.211541      2011-07-20  -0.196991      2011-11-05  -0.190666
2012-03-02  -0.202928      2011-07-21  -0.195010      2011-11-06  -0.186474
2012-03-02  -0.193819      2011-07-21  -0.194070      2011-11-06  -0.186009
2012-03-02  -0.189607      2011-07-21  -0.192194      2011-11-06  -0.181749
2012-03-03  -0.186348      2011-07-22  -0.192194      2011-11-07  -0.175684
2012-03-04  -0.184030      2011-07-22  -0.191726      2011-11-07  -0.169220
2012-03-04  -0.177028      2011-07-22  -0.189926      2011-11-07  -0.162594
2012-03-05  -0.175656      2011-07-22  -0.188993      2011-11-07  -0.161987
2012-03-05  -0.171100      2011-07-22  -0.186666      2011-11-07  -0.158544
2012-03-06  -0.168739      2011-07-22  -0.185737      2011-11-07  -0.159502
2012-03-06  -0.166162      2011-07-22  -0.183884      2015-09-13  -0.157436
2012-03-07  -0.162922      2011-07-22  -0.179122      2015-09-13  -0.159218
2012-03-07  -0.157409      2011-07-22  -0.179217      2015-09-14  -0.159218
                           2011-07-22  -0.176008
                           2011-07-23  -0.177314

Table 5.2 shows the output of running Isolation Forest on Turbine 4, Turbine 5 and Turbine 6 with dictionary distance and segment number as input features. As discussed for Table 5.1, the outliers classified by Isolation Forest in Figure 5.1 are located either at the head or the tail of the curves. This behavior might be the result of a suboptimal visualization for the selected features.


Figure 5.2: Isolation forest applied to all turbines with the dictionary distance and segment number as input features.

In contrast to the view of the individual turbines, Figure 5.2 presents the IF algorithm applied to the dictionary distance and segment number of all the turbines simultaneously. A similar conclusion appears to hold here as well: IF only classifies the heads or tails of the curves, since the algorithm isolates these points with the fewest splits. The visualization introduces a bias towards the initial and final segments and is not optimal for this work.


This study continues with the evaluation of the Isolation Forest algorithm using dictionary distance and Signal-to-Noise residual as input features.


Figure 5.3: Isolation Forest applied to Turbine 1, Turbine 2, Turbine 3, Turbine 4, Turbine 5 and Turbine 6 using features Dictionary distance (vertical axis) and Signal-to-Noise residual (horizontal axis).

Figure 5.3 shows IF applied to Turbines 1-6 using the features dictionary distance and Signal-to-Noise residual. Turbine 1, Turbine 3, Turbine 4 and Turbine 6 show a dispersed behavior with outliers residing in the outer area of the graphs, which is expected considering how the IF algorithm identifies anomalous points.

Turbine 2 and Turbine 5, on the other hand, contain edges that are visually evident. The graphs also show the spread to be limited to the top right in Turbine 2, and to the top middle for Turbine 5. In Turbine 2, the outliers reside in the lower-density area of the bottom left corner. In Turbine 5, the outliers lie off the edge in the top left corner.

An additional aspect worth noting is that the x-axes (Signal-to-Noise residual) of Turbine 2 and Turbine 5 have significantly lower minimum values compared to Turbine 1, Turbine 3, Turbine 4 and Turbine 6. Turbine 2 has a minimum Signal-to-Noise residual of approximately -7 dB and Turbine 5 of approximately -3 dB, while the other turbines have minimum values in the range between 4 and 7 dB. The maximum Signal-to-Noise residual is around 14 dB for all the turbines, suggesting that Turbine 2 and Turbine 5 have a wider range for this feature.

The dates corresponding to outliers and their anomaly scores can be found in Table 5.3 and Table 5.4. The turbines are split into two tables for ease of review.

Table 5.3: Dates and anomaly scores from the Isolation Forest algorithm on the features dictionary distance and Signal-to-Noise residual for Turbine 1, Turbine 2 and Turbine 3.

Turbine 1                  Turbine 2                  Turbine 3
Date        Anomaly score  Date        Anomaly score  Date        Anomaly score
2011-11-12  -0.178211      2011-11-10  -0.248357      2012-02-29  -0.160730
2011-11-12  -0.180089      2011-11-11  -0.305871      2012-03-07  -0.190203
2011-11-13  -0.178944      2011-11-12  -0.308602      2012-03-07  -0.159701
2011-11-13  -0.169412      2011-11-12  -0.276325      2012-03-11  -0.214023
2011-11-14  -0.178352      2011-11-12  -0.258158      2012-03-12  -0.211298
2011-11-14  -0.177204      2011-11-12  -0.210534      2012-03-26  -0.208530
2011-11-15  -0.194433      2011-11-12  -0.251010      2012-03-26  -0.213440
2011-11-15  -0.171273      2011-11-12  -0.214065      2012-04-10  -0.206644
2011-11-15  -0.163977      2011-11-12  -0.218913      2012-06-20  -0.185631
2011-11-16  -0.159479      2011-11-13  -0.211653      2012-07-02  -0.164648
2011-11-17  -0.150625      2011-11-13  -0.211329      2012-10-23  -0.172605
2011-11-21  -0.162435      2011-11-13  -0.234428      2014-12-10  -0.165227
2011-11-24  -0.178991      2011-11-13  -0.234428
2011-11-24  -0.189709      2011-11-13  -0.206533
2012-05-06  -0.153733      2011-11-14  -0.219541
2013-01-09  -0.168245      2011-11-19  -0.213435
2013-05-02  -0.152027      2011-11-20  -0.229646
2013-11-27  -0.169045      2011-11-20  -0.231623
2015-04-13  -0.178135      2011-11-22  -0.209047
2015-06-04  -0.171642      2011-11-23  -0.259530
2015-09-06  -0.159288      2011-12-10  -0.211517


Table 5.3 shows data concerning Turbine 1, Turbine 2 and Turbine 3. It shows the dates corresponding to the outliers and their respective anomaly scores as classified by the Isolation Forest algorithm. The majority of dates for Turbine 1 and Turbine 3 correspond to the beginning of operation, with Turbine 1 having some later dates as well, and relatively high scores when compared to Turbine 2.

Regarding Turbine 2, all of the outlier dates correspond to the beginning of operation, for which it is known that there was an electrical fault.

Table 5.4: Dates and anomaly scores from the Isolation Forest algorithm on the features dictionary distance and Signal-to-Noise residual for Turbine 4, Turbine 5 and Turbine 6.

Turbine 4                  Turbine 5                  Turbine 6
Date        Anomaly score  Date        Anomaly score  Date        Anomaly score
2012-03-01  -0.128744      2011-07-20  -0.169423      2011-11-05  -0.156313
2012-03-02  -0.126086      2011-07-21  -0.202719      2011-11-06  -0.152303
2012-03-05  -0.133677      2011-07-22  -0.182594      2011-11-07  -0.151639
2012-03-06  -0.127164      2011-07-22  -0.168359      2011-11-07  -0.142854
2012-03-07  -0.147593      2013-01-10  -0.179847      2011-11-07  -0.141924
2012-03-08  -0.127703      2013-01-15  -0.168085      2011-11-07  -0.149576
2012-03-11  -0.132336      2013-01-25  -0.180264      2011-11-07  -0.138734
2012-03-13  -0.131127      2013-01-26  -0.216947      2011-11-16  -0.142582
2012-03-26  -0.143443      2013-01-26  -0.189301      2011-11-16  -0.143721
2012-05-14  -0.159312      2013-01-27  -0.191500      2011-11-21  -0.175539
2012-09-13  -0.140218      2013-01-28  -0.187671      2011-12-02  -0.165427
2013-03-20  -0.136088      2013-01-29  -0.174897      2012-04-20  -0.142290
2014-06-16  -0.140887      2013-01-29  -0.174802      2012-05-05  -0.149215
2015-03-09  -0.135611      2013-01-31  -0.177334      2013-03-02  -0.142269
2015-06-10  -0.128630      2013-02-02  -0.179847      2014-02-24  -0.145499
2015-06-11  -0.148859      2013-02-02  -0.211702      2015-02-16  -0.141042
2015-07-10  -0.141087      2013-02-03  -0.178469      2015-03-07  -0.140455
                           2013-02-04  -0.213631
                           2013-02-05  -0.206018

Table 5.4 shows data concerning Turbine 4, Turbine 5 and Turbine 6. The healthy turbines (Turbine 4 and Turbine 6) have high (less anomalous) scores when compared to Turbine 5, and the majority of their outliers correspond to the beginning of operation.

Regarding Turbine 5, the majority of outliers correspond to the date when the failure of the rolling element bearing was reported, and continue up until the replacement of that bearing.

Overall, Turbine 2 and Turbine 5 received lower anomaly scores than the healthy turbines, and the dates and scores in these two turbines correspond to the electrical failure in Turbine 2 and the rolling element bearing failure reported on Turbine 5.

As a comparison to the individual turbines, Figure 5.4 shows IF running on the same features (dictionary distance and Signal-to-Noise residual) but with all the turbines together. Since the algorithm's contamination parameter was set to 0.01, the outliers seem to correspond mainly with the outliers found in Figure 5.3(b) and Figure 5.3(e), namely Turbine 2 and Turbine 5.


Figure 5.4: Isolation Forest running on all turbines, where the features are dictionary distance and Signal-to-Noise residual.


The anomaly scores of this evaluation are available in Table A.1, Table A.2 and Table A.3, located in Appendix A. Table A.1 contains data on Turbine 2; the dates correspond to the beginning of operation, when it is known there was an electrical fault, possibly from an incorrect installation. The table contains 39 entries. Table A.2 contains data on Turbine 3, where the majority of entries correspond to the beginning of operation. The table contains 10 entries. Table A.3 contains data on Turbine 5. Here the reader can see that the dates correspond to the report of the inner raceway failure of the rolling element bearing on the output shaft in January of 2013, continuing to the date when the bearing was replaced in February of 2013. The dates then extend from July 2013 to November 2013, with anomaly scores fluctuating in the lower range until the gearbox replacement on 2013-11-17. No dates after this replacement are produced by this run of the Isolation Forest algorithm. The table contains 55 entries.

5.1.2 Comparison among turbines

This section presents the Isolation Forest algorithm applied to a study of each individual segment. The features correspond to the weight and order of each atomic event within an individual segment. Weight is the atom weight resulting from the sparse representation. Order is a measure normalized with respect to the shaft speed, derived from the inter-spike interval (ISI) of each event per atom number.
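As an illustration, an order value could be derived from the inter-spike intervals of one atom's events under the assumed relation order = event frequency / shaft rotational frequency; the exact computation used in this work is not shown here, and all names below are illustrative.

```python
import numpy as np

def event_orders(timestamps_s, shaft_speed_cpm):
    """Convert the inter-spike intervals of one atom's events into orders.

    timestamps_s: event times (seconds) for a single atom within a segment.
    shaft_speed_cpm: mean segment speed in cycles per minute (cpm).
    """
    isi = np.diff(np.sort(np.asarray(timestamps_s, dtype=float)))
    event_freq_hz = 1.0 / isi                 # frequency implied by each ISI
    shaft_freq_hz = shaft_speed_cpm / 60.0    # cpm -> Hz
    return event_freq_hz / shaft_freq_hz      # assumed order definition
```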

The mean and standard deviation of the identified anomaly scores are calculated for all turbines combined, and for each turbine individually. The number of anomaly scores outside different standard deviations is also presented. Rows corresponding to offline turbines are marked with “-”.


Figure 5.5: Scatter plot (a) and histogram (b) of the Isolation Forest algorithm applied to all combined turbines from 2013-01-05.

Figure 5.5 shows a scatter plot of the IF algorithm applied to all available turbines on 2013-01-05 and histograms of their corresponding anomaly scores. This date corresponds to the report of an inner raceway failure on a rolling element bearing of the output shaft in Turbine 5.

Table 5.5: Mean (µ) and standard deviation (σ) of the segment with date 2013-01-05, corresponding to the reported bearing failure. Turbine 6 is offline.

Turbine  µ      σ
All      0.066  0.078
T1       0.066  0.075
T2       0.062  0.084
T3       0.065  0.082
T4       0.064  0.078
T5       0.074  0.070
T6       -      -

Table 5.6: Number of anomaly scores outside different standard deviations for segment 2013-01-05, corresponding to the reported bearing failure. Turbine 6 is offline.

        1σ    2σ   3σ   4σ  5σ  6σ
Total   1060  541  189  44  1   0
T1      227   105  26   1   0   0
T2      245   146  46   7   0   0
T3      237   129  41   11  0   0
T4      188   93   48   21  1   0
T5      163   68   28   4   0   0
T6      -     -    -    -   -   -


Table 5.5 and Table 5.6 are derived from the histogram in Figure 5.5(b). The tables show the mean and standard deviation of the anomaly scores, and the number of anomalies outside various standard deviations. Table 5.5 indicates that the mean anomaly score of Turbine 5 is higher than those of the other turbines. Table 5.6 shows that the number of outliers found outside various standard deviations for Turbine 5 is smaller than for the other turbines. The other turbines were operating in a healthy state on this date.


Figure 5.6: Scatter plot (a) and histogram (b) of the Isolation Forest algorithm applied to all combined turbines from 2013-02-04.

Figure 5.6 shows the scatter plot of the IF algorithm applied to all turbines available on 2013-02-04 and histograms of their corresponding anomaly scores. This date corresponds to the replacement of the rolling element bearing in Turbine 5, whose failure was reported approximately three weeks earlier.



Table 5.7: Mean (µ) and standard deviation (σ) of the segment with date 2013-02-04, corresponding to the bearing replacement. Turbine 6 is offline.

Turbine      µ        σ
All        0.061    0.079
T1         0.061    0.074
T2         0.055    0.088
T3         0.059    0.083
T4         0.059    0.080
T5         0.073    0.068
T6           −        −

Table 5.8: Number of anomaly scores outside different standard deviations for segment 2013-02-04, corresponding to the bearing replacement. Turbine 6 is offline.

          1σ     2σ     3σ    4σ    5σ    6σ
Total    1128    499    141    34     1     0
T1        231     69     16     4     0     0
T2        274    137     43     7     0     0
T3        246    131     30     9     0     0
T4        209     92     46    11     1     0
T5        168     70      6     3     0     0
T6          −      −      −     −     −     −

Table 5.7 and Table 5.8 are derived from the histogram in Figure 5.6(b). The tables show the mean and standard deviation of the anomaly scores and the number of anomalies outside various standard deviations. Table 5.7 indicates that the mean of the anomaly scores for Turbine 5 is higher than for the other turbines, and Table 5.8 shows that the number of outliers found outside the various standard deviations is smaller for Turbine 5 than for the other turbines. The other turbines were operating in a healthy state on this date.



Figure 5.7: Scatter plot (a) and histogram (b) of the Isolation Forest algorithm applied to all combined turbines from 2013-02-03.

Figure 5.7 shows a scatter plot of the IF algorithm applied to all turbines online on 2013-02-03 and histograms of their corresponding anomaly scores. This date corresponds to the replacement of the rolling element bearing in Turbine 5, whose failure was reported approximately three weeks earlier. The reader should note that this data is from one day earlier than Figure 5.6; it is presented because it is the date closest to the fault for which data from Turbine 6 is available.



Table 5.9: Mean (µ) and standard deviation (σ) of the segment with date 2013-02-03, corresponding to the bearing replacement. Turbine 3 is offline.

Turbine      µ        σ
All        0.071    0.079
T1         0.078    0.074
T2         0.077    0.080
T3           −        −
T4         0.062    0.084
T5         0.077    0.065
T6         0.061    0.087

Table 5.10: Number of anomaly scores outside different standard deviations for segment 2013-02-03, corresponding to the bearing replacement. Turbine 3 is offline.

          1σ     2σ     3σ    4σ    5σ    6σ
Total    1091    554    198    32     0     0
T1        200     89     29     3     0     0
T2        200    122     45     2     0     0
T3          −      −      −     −     −     −
T4        261    127     53    10     0     0
T5        143     62     22     6     0     0
T6        287    154     49    11     0     0

Table 5.9 and Table 5.10 are derived from the histogram in Figure 5.7(b). The tables show the mean and standard deviation of the anomaly scores and the number of anomalies outside various standard deviations. In Table 5.9, the results show that the mean is similar for Turbine 1, Turbine 2 and Turbine 5. However, Turbine 5 has a lower spread than the other turbines, as can be seen in Table 5.10. The other turbines were operating under healthy conditions on this day.



Figure 5.8: Scatter plot (a) and histogram (b) of the Isolation Forest algorithm applied to all combined turbines from 2013-11-17.

Figure 5.8 shows a scatter plot of the IF algorithm applied to all available turbines on 2013-11-17 and histograms of their corresponding anomaly scores. This date corresponds to the replacement of the gearbox in Turbine 5.

Table 5.11: Mean (µ) and standard deviation (σ) of the segment with date 2013-11-17, corresponding to the replacement of the entire gearbox.

Turbine      µ        σ
All        0.062    0.079
T1         0.059    0.082
T2         0.060    0.083
T3         0.060    0.084
T4         0.060    0.078
T5         0.068    0.066
T6         0.062    0.081

Table 5.12: Number of anomaly scores outside different standard deviations for segment 2013-11-17, corresponding to the replacement of the entire gearbox.

          1σ     2σ     3σ    4σ    5σ    6σ
Total    1396    603    183    47     0     0
T1        269    116     34     6     0     0
T2        286    126     22     7     0     0
T3        250    126     39    12     0     0
T4        199     86     40     9     0     0
T5        159     47     13     2     0     0
T6        233    102     35    11     0     0



Table 5.11 and Table 5.12 are derived from the histogram in Figure 5.8(b). The tables show the mean and standard deviation of the anomaly scores and the number of anomalies outside various standard deviations. Table 5.11 indicates that the mean of the anomaly scores for Turbine 5 is higher than for the other turbines, and Table 5.12 shows that the number of outliers found outside the various standard deviations is smaller for Turbine 5 than for the other turbines. The other turbines were operating in a healthy state on this date.


Figure 5.9: Scatter plot (a) and histogram (b) of the Isolation Forest algorithm applied to all combined turbines from 2014-06-01.

Figure 5.9 shows a scatter plot of the IF algorithm applied to all available turbines on 2014-06-01 and histograms of their corresponding anomaly scores. All turbines are considered to be operating in a healthy state on this date.


Table 5.13: Mean (µ) and standard deviation (σ) of the segment with date 2014-06-01, corresponding to healthy operation of all turbines.

Turbine      µ        σ
All        0.061    0.074
T1         0.057    0.078
T2         0.054    0.079
T3         0.058    0.076
T4         0.067    0.070
T5         0.064    0.071
T6         0.066    0.070

Table 5.14: Number of anomaly scores outside different standard deviations for segment 2014-06-01, corresponding to healthy operation of all turbines.

          1σ     2σ     3σ    4σ    5σ    6σ
Total    1324    579    208    77     1     0
T1        240    113     42    16     0     0
T2        286    118     29    17     0     0
T3        242    109     33    10     0     0
T4        177     76     39    11     1     0
T5        199     81     31     9     0     0
T6        180     82     34    14     0     0

Table 5.13 shows that Turbine 4 has the greatest mean and, together with Turbine 6, the least spread. Turbine 5 does not stand out as it did on the previous dates.

5.2 Extended Isolation Forest

5.2.1 Comparison among segments

This section presents a comparison among all segments using the Extended Isolation Forest (EIF) and the same three features as in the previous section. Each segment is a vibration recording with a length of 1.28 seconds (16384 samples). The considered features are the dictionary distance, the signal-to-noise residual (dB) and the segment number. The study is first run using the dictionary distance and the signal-to-noise residual, followed by the study of the dictionary distance versus the segment number. The algorithm in this section uses a contamination parameter of 0.01, which means that 1% of the data points will be interpreted as outliers. However, the interest lies in the actual anomaly score.
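As an illustration, the following sketch mimics this setup with the open-source eif package by Hariri et al.; the synthetic feature matrix, the number of trees and the sample size are assumptions made for the example, and the contamination of 0.01 is reproduced by thresholding the scores at their 99th percentile rather than by a package parameter.

import numpy as np
import eif  # Extended Isolation Forest implementation by Hariri et al.

rng = np.random.default_rng(0)
# Illustrative stand-in for the real features of one turbine:
# column 0 is the segment number, column 1 the dictionary distance.
X = np.column_stack([np.arange(2000, dtype=float),
                     rng.normal(5.0, 1.0, size=2000)])

forest = eif.iForest(X, ntrees=100, sample_size=256,
                     ExtensionLevel=X.shape[1] - 1)  # fully extended splits
scores = forest.compute_paths(X_in=X)  # anomaly score per segment
threshold = np.quantile(scores, 0.99)  # contamination of 0.01
outliers = scores >= threshold         # top 1% flagged as outliers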

It is worth noting that the analysis in this section is similar to that of the previous sections; as a result, many points will be reiterated.



Figure 5.10: Extended Isolation Forest applied to Turbine 1, Turbine 2, Turbine 3, Turbine 4, Turbine 5 and Turbine 6 using features Dictionary distance (vertical axis) and Segment number (horizontal axis).

Figure 5.10 shows the EIF applied to Turbine 1 through Turbine 6 with the features dictionary distance and segment number. All the figures essentially show that the algorithm selects the initial and final segments as outliers. It appears that even when the algorithm performs splits using random slopes with respect to the axes, it is still easier to isolate the points at the extremes of the curves.
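The random-slope split that distinguishes the EIF from the standard IF can be sketched as follows: instead of an axis-parallel test on a single feature, each node draws a random normal vector n and a random intercept point p, and sends a point x to the left branch when (x − p) · n ≤ 0. This is a conceptual sketch of a single node split, not the thesis code.

import numpy as np

def eif_node_split(X, rng):
    # Random slope: a normal vector drawn from a standard Gaussian.
    n = rng.normal(size=X.shape[1])
    # Random intercept point inside the bounding box of the data.
    p = rng.uniform(X.min(axis=0), X.max(axis=0))
    # Branch on which side of the hyperplane (x - p) . n = 0 each point lies.
    left = (X - p) @ n <= 0
    return X[left], X[~left]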


Table 5.15: Dates and scores from the Extended Isolation Forest algorithm on the features Dictionary distance and Segment number for Turbine 1, Turbine 2 and Turbine 3.

Turbine 1 Turbine 2 Turbine 3

Date Anomaly score Date Anomaly score Date Anomaly score

2015-09-15 0.646916 2011-11-10 0.649620 2015-09-16 0.659316 2015-09-14 0.646479 2011-11-10 0.647423 2015-09-15 0.657979 2015-09-13 0.646479 2011-11-11 0.645583 2015-09-14 0.654198 2015-09-12 0.646042 2011-11-12 0.644274 2015-09-14 0.652430 2015-09-12 0.645605 2011-11-12 0.643336 2015-09-09 0.645403 2015-09-11 0.645605 2015-09-15 0.642184 2012-02-29 0.643752 2015-09-11 0.644285 2015-09-15 0.642117 2012-02-29 0.643752 2011-11-12 0.644255 2011-11-12 0.641294 2015-09-09 0.642421 2011-11-12 0.642079 2015-09-14 0.639948 2012-03-01 0.641211 2015-09-09 0.639508 2015-09-14 0.638650 2015-09-08 0.638819 2015-09-09 0.636828 2015-09-13 0.638584 2012-03-01 0.636390 2015-09-08 0.635537 2015-09-13 0.636427 2011-11-13 0.634606 2015-09-12 0.633783 2011-11-13 0.634606 2011-11-12 0.631627 2011-11-14 0.633203 2011-11-12 0.629560 2011-11-14 0.633203 2015-09-12 0.629532 2015-09-07 0.632073 2015-09-10 0.629040 2011-11-15 0.629775 2011-11-12 0.628129 2011-11-15 0.629349 2011-11-12 0.627214 2011-11-15 0.626288 2015-09-09 0.626426

Table 5.15 shows the output of running the Extended Isolation Forest on Turbine 1, Turbine 2 and Turbine 3 with the dictionary distance and segment number as input features. As discussed earlier, the algorithm classifies the head or tail of the curves shown in Figure 5.10 as outliers. This behavior might be the result of a suboptimal visualization for the selected features.



Table 5.16: Dates and scores from the Extended Isolation Forest algorithm on the features Dictionary distance and Segment number for Turbine 4, Turbine 5 and Turbine 6.

Turbine 4 Turbine 5 Turbine 6

Date Anomaly score Date Anomaly score Date Anomaly score

2015-09-15 0.662830 2015-09-14 0.657041 2015-09-14 0.665033 2015-09-15 0.660946 2015-09-14 0.655709 2011-11-04 0.664136 2015-09-14 0.659090 2015-09-13 0.655572 2015-09-13 0.664133 2015-09-14 0.657754 2015-09-13 0.653731 2015-09-13 0.662338 2012-02-29 0.656464 2015-09-12 0.650135 2015-09-12 0.660479 2012-02-29 0.654597 2015-09-11 0.648816 2011-11-04 0.659589 2015-09-13 0.654020 2015-09-09 0.647939 2011-11-05 0.657258 2012-03-01 0.651436 2015-09-08 0.647410 2011-11-04 0.657258 2015-09-13 0.651368 2011-07-19 0.646897 2011-11-05 0.653619 2015-09-12 0.650859 2011-07-17 0.646897 2015-09-12 0.650960 2012-03-01 0.649226 2015-09-09 0.646602 2011-11-06 0.650023 2015-09-12 0.647278 2011-07-20 0.642535 2015-09-09 0.648918 2015-09-10 0.647210 2011-07-20 0.639358 2015-09-07 0.638462 2012-03-01 0.643147 2015-09-07 0.638462 2015-09-09 0.648918 2012-03-02 0.639622 2015-09-08 0.637688 2011-11-06 0.645783 2015-09-09 0.633440 2011-07-20 0.633367 2015-09-08 0.645257 2011-07-21 0.632444 2015-09-07 0.630580 2015-09-07 0.630580 2011-11-06 0.643602 2015-09-06 0.629639

Table 5.16 shows the output of the Extended Isolation Forest algorithm applied to Turbine 4, Turbine 5 and Turbine 6. As discussed earlier, the algorithm classifies the head or tail of the curves shown in Figure 5.10 as outliers. Given the nature of the algorithm, the visualization of the selected features might be suboptimal.



Figure 5.11: Extended Isolation Forest applied to all turbines with dictionary distance and segment numbers as input features.

In contrast to the view of the individual turbines, Figure 5.11 presents the EIF algorithm applied to the dictionary distance and segment number of all the turbines simultaneously. A similar conclusion as in the previous section regarding IF appears to hold here too: the algorithm only classifies the heads or tails of the curves as outliers, since it isolates these points with the fewest splits. The visualization introduces a bias towards the initial and final segments and is not optimal for this work.
