Detection of Long Term Vibration Deviations in Gas Turbine Monitoring Data


Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2020

Detection of Long Term

Vibration Deviations in Gas

Turbine Monitoring Data


Johan Hansson

LiTH-ISY-EX--20/5298--SE

Supervisor: Pavel Anistratov

ISY, Linköping University

Andreas Hansson

Siemens Industrial Turbomachinery AB

Examiner: Erik Frisk

ISY, Linköping University

Division of Vehicular Systems Department of Electrical Engineering

Linköping University SE-581 83 Linköping, Sweden


Abstract

Condition based monitoring is today essential for any machine manufacturer to be able to detect and predict faults in their machine fleet. This reduces the maintenance cost and also reduces machine downtime. In this master's thesis, two approaches are evaluated for detecting long term vibration deviations, also called vibration anomalies, in Siemens gas turbines of type SGT-800. The first is a simple rule-based approach where a series of CUSUM tests is applied to several signals in order to check whether a vibration anomaly has occurred. The second approach uses three common machine learning anomaly detection algorithms to detect these vibration anomalies. The machine learning algorithms evaluated are k-means clustering, Isolation Forest and One-class SVM. This master's thesis concludes that these vibration anomalies can be detected with these ML models, but also with the rule-based model, with different levels of success. A set of features that are the most important for detecting vibration anomalies was also obtained. This thesis also presents which of these models is best suited for anomaly detection and would be the most appropriate for Siemens to implement.


Acknowledgments

This master's thesis has been written at Siemens Industrial Turbomachinery AB. I want to thank all of my colleagues at Siemens for interesting insights and help throughout this master's thesis. I also want to thank my supervisor at Siemens, Andreas Hansson, and the department manager Fredrik Tengvall for all the help and for pushing me in the right direction.

I also want to thank my examiner Erik Frisk for valuable insights. Last but not least, I want to thank Pavel Anistratov, my supervisor at Linköping University, for a lot of valuable discussions and for proofreading this master's thesis.

Norrköping, May 2020 Johan Hansson


Contents

Notation

1 Introduction
  1.1 Problem Formulation
  1.2 Delimitations

2 Background
  2.1 Industrial Gas Turbines
  2.2 Vibration Measurements
  2.3 Vibration Anomalies
  2.4 Operational Data
  2.5 Related Work

3 Theory
  3.1 CUSUM
  3.2 K-means Clustering
  3.3 Support Vector Machine
    3.3.1 One-class Support Vector Machine
  3.4 Isolation Forest
  3.5 Dimensionality reduction
  3.6 Normalization
  3.7 Performance Evaluation

4 Method
  4.1 Tools and Frameworks
  4.2 Rule-Based Model
    4.2.1 Parameters and Thresholds
  4.3 ML models
    4.3.1 Feature Selection
    4.3.2 Pre-Processing
    4.3.3 Parameters and Thresholds
  4.4 K-means
  4.5 Isolation Forest
  4.6 One-class SVM
  4.7 Implementation
  4.8 Performance Evaluation
  4.9 Data Visualization

5 Results
  5.1 Data Visualization
  5.2 Rule-based Model
  5.3 K-means Model
  5.4 Isolation Forest Model
  5.5 One-class SVM Model
  5.6 Model Performances

6 Discussion
  6.1 Normalization
  6.2 Models
  6.3 Ground truth
  6.4 Related Work
  6.5 Future Work
  6.6 Results Usefulness

7 Conclusions
  7.1 Thesis Questions

Bibliography


Notation

Abbreviations

Abbreviation | Meaning
RMS   | Root Mean Square
RDC   | Remote Diagnostic Centre
CUSUM | Cumulative SUM
SVM   | Support Vector Machine
IF    | Isolation Forest
ML    | Machine Learning
PCA   | Principal Component Analysis
SGT   | Siemens Gas Turbine
TP    | True Positives
FP    | False Positives
TN    | True Negatives
FN    | False Negatives


1

Introduction

Siemens Industrial Turbomachinery AB today produces and services a range of different gas turbines. It is crucial that faults are detected before they occur in the gas turbine, since otherwise an unplanned stop of the turbine will be needed. Siemens monitors the turbines as a service to make the customers aware if a turbine shows abnormal behavior. At the Remote Diagnostic Center (RDC) department, the team performs daily monitoring of turbine operation, where everything ranging from lube oil pressure to the vibrations in the turbines is supervised. Operational data for the different monitored turbines arrives on a daily basis and triggers a chain of analyses for detecting any abnormal behavior. The historical signals can then be viewed in an internal signal database. All of this work at RDC serves to provide proactive diagnosis for the customers and to avoid unplanned stops at the customer turbine sites.

One kind of abnormal behavior is long term vibration anomalies. These can be seen as increasing vibrations over time or a change in trends. Not all increases in vibrations are abnormal if they can be explained by other signals, such as an increase in load. Vibrations in turbines are normal and not in themselves dangerous, but dangerous faults show up in vibrations. The goal of this master's thesis is to evaluate different methods to detect deviations in long term vibration trends.


1.1

Problem Formulation

The thesis aim is to answer the following questions:

• Can vibration anomalies be detected in available operational turbine data at Siemens?

• Which methods are suitable for vibration anomaly detection?

• Which features are the most significant for vibration anomaly detection in gas turbines?

With the first question, I wish to answer whether the quality of the available turbine data is good enough to be able to detect vibration anomalies, and whether the developed models can detect these anomalies reliably. The second question aims to answer which of the developed models is best suited for this kind of anomaly detection. The third question aims to answer which of the available features or signals are the most important for anomaly detection.

1.2

Delimitations

This master's thesis is limited to detecting vibration anomalies in turbines of type SGT-800. The turbines from which data has been gathered are also situated in one region and run in similar conditions in terms of load profile and ambient temperature, to make the data as similar as possible. This is necessary to be able to develop a fleet model for vibration anomaly detection.


2

Background

2.1

Industrial Gas Turbines

Industrial gas turbines are used worldwide. They can be used either to generate electrical power for an energy grid or to generate mechanical power for another system such as a compressor. Siemens Industrial Turbomachinery AB today produces a range of different industrial turbines, but the turbine considered in this master's thesis is the medium size gas turbine of class SGT-800, see Figure 2.1. This turbine class is sold worldwide and its applications vary from the gas and oil industry to the mining industry [23]. It is a single shaft turbine engine, and the driveshaft extends from the core engine on the right into the gearbox on the left in Figure 2.1. The core engine is composed of three main components: the compressor, the combustor and the turbine.

In Figure 2.1, the gearbox is to the far left. It connects the driveshaft to the generator, which is not visible in the figure. The gearbox reduces the gas turbine shaft speed to a shaft speed compatible with the generator; the generator operates at around 1500 RPM for a 50 Hz electrical grid, while the shaft speed of the turbine is around 6600 RPM. The next component to the right of the gearbox is the air duct, where the gas turbine has its air inlet.

Figure 2.1: Full view of a Siemens Industrial Gas Turbine of class SGT-800

In Figure 2.2, a cross section view of the compressor, combustor and turbine is displayed. The compressor contains 15 stages of rotors and stators. One pair of rotor and stator is considered as one stage. As the air passes through the different stages, the radius of the rotors and the slits in the stators get smaller. The pressure of the air flowing through increases from 1 bar to 20 bar until the air finally exits. The temperature of the air also increases as the air is compressed inside the compressor. The compressor also has three variable guide vanes that can be used to control the airflow, and there are bleed valves in case the compressor has to release some of its compressed air. The purpose of the compressor is to increase the pressure of the air. An increase in air pressure together with an increase in firing temperature leads to an increase in the thermal efficiency of the turbine, up to a certain degree [4]. For each gas turbine there is a threshold beyond which the efficiency decreases with a further increase in air pressure. Since the machine is a single axis machine, the entire rotor starts spinning as the compressor starts spinning.

Compressed air then enters the combustor, where a ring of burners ignites the gas together with some of the compressed air to generate thrust. As the combustor is ignited the turbine reaches idle, and the ignition level is increased until the sought level of operation is reached. When the sought level of operation is reached, the generator switch is closed and the turbine is then operational. The combustor in an SGT-800 is annular, which means that there is no separating wall between the burners. The idea behind this construction is to reduce the pressure loss and enable a smaller engine diameter [21]. The purpose of the combustor is to provide the highest possible firing temperature while at the same time keeping the amount of nitrogen oxides (NOx) in the exhaust gas low. An increased firing temperature in the combustor increases the amount of NOx [4]. The turbine consists of three stages of rotor blades, which have interlocking shrouds to reduce vibrations. The rotor blades are thermally coated with special alloys to withstand the temperatures created by the combustor. The turbine blades also have internal cooling via channels through the blade to further protect the blades from high temperatures. The turbine is in turn connected to the driveshaft, which enters the gearbox.



Figure 2.2: Cross section view of a Siemens Industrial Gas Turbine of class SGT-800

As the magnet spins, the electrons in the coil move, thus creating a current. The magnet can have two poles or more. An increasing number of poles in the magnet reduces the RPM needed for the spinning magnet, because the current alternates faster with each extra set of poles. The generators connected to the SGT-800 turbines in general have two poles.

2.2

Vibration Measurements

A vibration is an oscillatory movement around some reference point. Vibrations can be measured as either a displacement (d), a velocity (v) or an acceleration (a), modeled with the following equations:

d = A sin(ωt + φ),  v = Aω cos(ωt + φ),  a = −Aω² sin(ωt + φ).  (2.1)

Measurement metrics are defined differently depending on what is being measured. For acceleration, the metric is the peak shown in Figure 2.3, and for displacement it is the peak to peak value. For velocity, the metric is obtained by calculating the root mean square (RMS) of the signal, which is defined by

RMS = √( (1/T) ∫₀ᵀ x(t)² dt ).  (2.2)

The different vibration measurements are used in different frequency ranges [4]. For frequencies f ≤ 10 Hz, displacement is used, while velocity is used for frequencies in the range 10 Hz ≤ f ≤ 1000 Hz and acceleration for f ≥ 1000 Hz.

Figure 2.3: Time signal showing the measurements Peak, Peak-Peak and RMS

A transducer transforms the mechanical vibration to a time-varying voltage output [4]. The raw signal is then filtered by an analog filter to obtain the overall signal, which is the energy content of the raw signal at frequencies between 10 and 1000 Hz. The 1xn and 2xn signals represent the energy content of the raw signal at frequencies of one and two times the current rotor frequency, respectively. They are obtained by applying a moving analog filter which follows the current rotor running frequency, also called a tracking filter. The overall, 1xn and 2xn signals are then stored in the data collector at the turbine site. The amplitude and phase of these signals are then sent to a signal database called STA-RMS, where they can be monitored. This flow is shown in Figure 2.4.



Figure 2.4: A system view of how the vibrations are measured in a SGT-800

Figure 2.5: The vibration measurement points in a SGT-800

Vibrations are measured at 6 different points in the turbine, which can be seen in Figure 2.5: at the first and second turbine bearings, in the gearbox, where they are measured at the gearbox casing and at the low speed end, and at the generator, where the vibrations are measured at each of its two bearings. For each of these 6 measurement points, an overall signal is generated. The 1xn and 2xn signals are generated for the first and second turbine bearings and the gearbox. There might also be signal redundancy, depending on which package the customer has ordered.

2.3

Vibration Anomalies

What is a vibration anomaly? A vibration anomaly is an increase in vibration levels over time that cannot be explained by the signals they usually depend on. In the case of the gearbox casing, its vibrations depend on load and ambient temperature. In Figure 2.6 one can see the overall gearbox casing vibration content. This signal is plotted together with active load (in green) and ambient temperature (in red). Note that in July 2019 the vibration levels change. This is likely due to a rebalancing of the rotor. But the changes in vibrations after November 2019 indicate a possible anomaly. In Figure 2.7 one can see this change zoomed in. Figure 2.7 shows that the load has not changed significantly during this time period. Another thing to note is that the vibrations seem to decrease when the outside temperature decreases. In Figure 2.8 the peak is zoomed in further. One can see, for the different peaks of ambient temperature, that the vibrations are at different levels while the load has not changed significantly during this time period. Thus this peak seems to show anomalous behavior, which needs to be analyzed further by a vibration expert.


2.4

Operational Data

The available data set consists of sensor measurements from up to 800+ signals. Of all of these available signals, 102 were selected based on expert knowledge at Siemens. These 102 signals are in some way relevant to vibration anomalies. Among these signals there might be signal redundancy, meaning several sensors measure the same thing. The data is compressed before it is sent to the signal database using an algorithm called the swinging door trending algorithm (SDT) [6]. Swinging door trending compression roughly takes two points and draws a line between them; any points that are within a certain distance of this line get removed. When the data is extracted from the signal database, a data management system interpolates the different signals in such a way that data for all signals is available at the same time instances, even though the signals are sampled at different times and frequencies.
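To illustrate the idea, a simplified swinging-door-style compression can be sketched as below. This is only a rough sketch of the principle described above, not the exact algorithm of [6] or the implementation used at Siemens; the function name and tolerance handling are assumptions for illustration.

```python
def sdt_compress(times, values, tol):
    """Simplified swinging-door-style compression: drop every point that
    lies within `tol` of the straight line between the last kept point
    and the next candidate point. Assumes strictly increasing times."""
    if len(times) <= 2:
        return list(zip(times, values))
    kept = [(times[0], values[0])]
    i = 0                      # index of the last kept point
    j = 2                      # candidate end point of the current "door"
    while j < len(times):
        t0, v0 = kept[-1]
        slope = (values[j] - v0) / (times[j] - t0)
        # all intermediate points must stay within tol of the line
        ok = all(abs(v0 + slope * (times[k] - t0) - values[k]) <= tol
                 for k in range(i + 1, j))
        if ok:
            j += 1             # widen the door
        else:
            kept.append((times[j - 1], values[j - 1]))
            i = j - 1
            j = i + 2
    kept.append((times[-1], values[-1]))
    return kept
```

On a perfectly linear signal, only the two endpoints survive; a signal with sharp excursions keeps all its points.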

Figure 2.6: Overall gearbox casing vibrations with active load and ambient temperature



Figure 2.7: Zoom of the overall gearbox casing vibrations with active load and ambient temperature from Figure 2.6

Figure 2.8: A deeper zoom of the overall gearbox casing vibrations with active load and ambient temperature from Figure 2.7


2.5

Related Work

The field of anomaly detection is an established field. The authors of [7] describe anomaly detection in the following way: "anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior". Anomaly detection is used in various fields such as machine fault detection, fraud detection at credit card companies and network intrusion detection in network security. K-means clustering is used in [24] for network anomaly detection. The data set in [24] consists of normal network data and network data recorded when intrusions are made. In [24] k-means has the lowest accuracy, but since the model is simple to implement it will still be considered in this master's thesis. Isolation Forest has been applied to detect gas path anomalies in gas turbines [26], where the authors have no labels available and anomalies are few. This is the case for this master's thesis as well. In [26] the Isolation Forest model performs well for gas path anomalies and needs no labels. Another approach is to create a model, predict the values of the available signals and then compare them to the observed values to determine whether the data is anomalous or not [13]. The authors use a Support Vector Regression model that predicts performance signals in order to detect performance anomalies. A recurrent neural network has also been used for normal pattern extraction for gas path analysis of gas turbines [2]. In [2] only normal data is available and the authors develop an anomaly detection concept for this specific case. Good accuracy was achieved for both faulty data and normal data. One-class SVM has also been used for machine fault detection [22]. In [22], One-class SVM is applied to rotating machinery, which is a similar case to this master's thesis. The authors in [22] concluded that One-class SVM is competitive with neural networks in terms of anomaly detection.


3

Theory

This chapter contains a brief summary of the theory behind the different methods used throughout the master's thesis. The methods range from Machine Learning (ML) models such as K-means and Isolation Forest (IF) to dimensionality reduction techniques such as Principal Component Analysis (PCA).

3.1

CUSUM

The Cumulative SUM (CUSUM) is a common algorithm for signal change detection [17]. The algorithm works by creating a test quantity Tk as

Tk+1 = max(Tk + sk, 0),  T0 = 0,  sk = |rk| − ν,  (3.1)

where ν is the drift factor and rk is the residual. The residual should be set up in such a way that r = 0 when there is no change and r ≠ 0 if there is a change. This gives the test quantity the property that it continues to increase as long as there is a change. The residual is often r = y − ŷ, where ŷ is the predicted value and y is the observed value.
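A minimal sketch of Equation 3.1 in Python (the function name is illustrative, not from the thesis):

```python
def cusum(residuals, drift):
    """CUSUM test quantities T_0..T_n from Equation 3.1:
    T_{k+1} = max(T_k + |r_k| - drift, 0), with T_0 = 0."""
    T = [0.0]
    for r in residuals:
        T.append(max(T[-1] + abs(r) - drift, 0.0))
    return T
```

With residuals near zero the test quantity stays at zero, while a persistent deviation makes it grow until it crosses a chosen alarm threshold.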

3.2

K-means Clustering

K-means is a popular unsupervised clustering method [8]. It is an iterative clustering method which starts by guessing the placement of the k cluster centers. It then assigns the data points to the closest centers and recomputes each cluster center as the average of the data points assigned to it. It repeats this process until the cluster centers do not move [8]. The algorithm is described in [8] as follows.


1. Given a cluster assignment C, the cluster variance in Equation 3.3 is minimized with respect to m1, ..., mK, which gives the means of the newly assigned clusters, Equation 3.2.

2. With a given set of means m1, ..., mK, Equation 3.3 is then minimized by assigning each observation to its closest cluster mean, as given by Equation 3.4.

3. Steps 1 and 2 are repeated until the assigned clusters do not change.

x̄S = argmin_m Σ_{i∈S} ||xi − m||²  (3.2)

min_{C, m1,...,mK} Σ_{k=1}^{K} Nk Σ_{C(i)=k} ||xi − mk||²  (3.3)

C(i) = argmin_{1≤k≤K} ||xi − mk||²  (3.4)

In the ideal case, the features are distributed as in Figure 3.1, where a distinct clustering is visible. However, this is not usually the case, since often more than three features are used and visualization of the clusters is no longer possible.
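The two alternating steps can be sketched in plain Python as below. This is a toy implementation for illustration only; the thesis uses scikit-learn's K-means.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to the
    nearest center (Eq. 3.4) and recomputing every center as the mean of
    its assigned points (Eq. 3.2), until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial guess: k data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:                           # mean of the assigned points
                new_centers.append(tuple(sum(xs) / len(cl)
                                         for xs in zip(*cl)))
            else:                            # keep an empty cluster's center
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

On two well-separated point clouds, the two returned centers end up near the respective cloud means.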


3.3

Support Vector Machine

The Support Vector Machine (SVM) is a supervised classification algorithm that tries to construct a linear decision boundary that separates the different classes of data. It does this by calculating the boundary that maximizes the margin between the two classes, which is the distance between the dotted line and the solid line in Figure 3.2. If the data points are not linearly separable, the algorithm tries to project the data into a higher dimensional space using a kernel function φ(x) to more easily separate the data points [8].

Figure 3.2: An SVM example

The algorithm is defined in [8] as follows. Let (x1, y1), ..., (xN, yN) be the training data, where xi ∈ R^p. The data points can be either of class −1 or 1. Define the hyperplane as {x : f(x) = xᵀβ + β0 = 0}, where β is a unit vector, ||β|| = 1, and β0 is a constant; β is orthogonal to the hyperplane. The margin M is defined as the distance between the hyperplane and the nearest point belonging to either class. It can be seen in Figure 3.2 as the distance between either of the dotted lines and the hyperplane, which is the line between them. The problem can then be written as

max_{β, β0, ||β||=1} M
subject to yi(xiᵀβ + β0) ≥ M,  i = 1, ..., N,  (3.5)


where yi ∈ {−1, 1}: yi = 1 if xi belongs to class +1 and yi = −1 if xi belongs to class −1. Since the margin can be written as M = 1/||β||, the problem can be rewritten as

min_{β, β0} ||β||
subject to yi(xiᵀβ + β0) ≥ 1,  i = 1, ..., N.  (3.6)

One then introduces the slack variables ξ = {ξ1, ..., ξN} to handle that the two classes −1 and 1 overlap. The slack variable ξi is the proportional amount by which the prediction f(xi) = xiᵀβ + β0 is on the wrong side of its margin. The slack variables can then be introduced into Equation 3.6, which gives

min_{β, β0} ||β||
subject to yi(xiᵀβ + β0) ≥ 1 − ξi,  ξi ≥ 0,  Σi ξi ≤ constant,  i = 1, ..., N.  (3.7)

The next step is to re-express Equation 3.7 as a quadratic programming problem,

min_{β, β0} (1/2)||β||² + C Σ_{i=1}^{N} ξi
subject to ξi ≥ 0,  yi(xiᵀβ + β0) ≥ 1 − ξi,  i = 1, ..., N,  (3.8)

where C is a cost parameter which replaces the constant in Equation 3.7. In the case that the classes are separable, C = ∞. The solution is obtained by using Lagrange multipliers. The Lagrange primal function is defined as

LP = (1/2)||β||² + C Σ_{i=1}^{N} ξi − Σ_{i=1}^{N} αi (yi(xiᵀβ + β0) − (1 − ξi)) − Σ_{i=1}^{N} μi ξi.  (3.9)

Equation 3.9 is then differentiated with respect to β, β0 and ξi and set to zero, which gives the following equations:

β = Σ_{i=1}^{N} αi yi xi,  0 = Σ_{i=1}^{N} αi yi,  αi = C − μi ∀i,  with αi, μi, ξi ≥ 0 ∀i.  (3.10)

By substituting Equation 3.10 into Equation 3.9, one gets the Lagrange dual problem, shown in Equation 3.11:

LD = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{i'=1}^{N} αi αi' yi yi' xiᵀxi'.  (3.11)


This Lagrange dual problem is then solved. If the classes are not linearly separable, a kernel function can be used to project the data into another space. One popular kernel is the radial basis function kernel, defined as

K_RBF(x, x') = exp(−γ ||x − x'||²).  (3.12)

3.3.1

One-class Support Vector Machine

One-class support vector machine (SVM) is an unsupervised anomaly detection algorithm [1] and is a special case of the SVM already described. One-class SVM tries to create a decision boundary separating normal points from anomalies. This can be seen in Figure 3.3. The algorithm works in a similar fashion to the SVM, but instead of separating two labeled classes, the One-class SVM tries to isolate normal data from anomalous data. The problem is defined in a similar fashion to the SVM as

min_{β, β0, ξ} (1/2)||β||² − β0 + (1/(νN)) Σ_{i=1}^{N} ξi
subject to βᵀφ(xi) ≥ β0 − ξi,  ξi ≥ 0,  i = 1, ..., N,  (3.13)

where ξ are slack variables and ν is a regularization parameter that controls the trade-off between the constant β0 and the slack variables ξ. ν sets an upper limit on the fraction of anomalies and a lower limit on the fraction of support vectors.
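A minimal scikit-learn sketch of the idea follows. The data is synthetic and the parameter values are illustrative, not the ones used later in the thesis.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 0.5, size=(200, 2))   # synthetic "normal" operating data

# nu bounds the fraction of training points treated as anomalies
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
clf.fit(X_train)

# predict() returns +1 for points inside the learned boundary, -1 outside
pred = clf.predict(np.array([[0.0, 0.0], [10.0, 10.0]]))
```

Here the point at the origin falls inside the learned boundary, while the distant point is flagged as an anomaly.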

3.4

Isolation Forest

Isolation Forest (IF) is another unsupervised anomaly detection algorithm [14]. The method works by trying to isolate anomalies rather than modeling the normal data points. It does this by trying to isolate all data points, creating a tree structure: the algorithm randomly selects a feature and randomly splits that feature, and then creates several more tree structures using the same process. The idea is that anomalies are few, lie outside the cluster of normal data points and are thereby easier to isolate.

Figure 3.3: A One-class SVM example

Isolation Forest creates a tree structure called an isolation tree [14]. Each node T of the isolation tree has either zero or two children Tl, Tr. The values from the parent node are divided into the child nodes using a test on a randomly selected attribute q and a randomly selected split value p. This splitting of the data continues until either the isolation tree reaches a height limit or all data points are isolated. Next, the path length h(x) is defined as the number of edges of the tree one has to walk from the current position to the root node. Since the anomalous points are easier to isolate, they end up closer to the root node and thus have a short path length. The average search path length in a binary search tree with n nodes is given by

c(n) = 2H(n − 1) − 2(n − 1)/n,  (3.14)

where H(i) is estimated by ln(i) + 0.5772156649. With this, one can then introduce a score function as

s(x, n) = 2^(−E(h(x))/c(n)).  (3.15)

The score function s(x, n) gives a measure of how anomalous a sample is. A sample with a score close to 1 is definitely anomalous, while a sample with a score much smaller than 0.5 can be regarded as normal. If all of the instances have a score close to 0.5, the dataset does not have any clear anomalies.
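Equations 3.14 and 3.15 can be written directly in Python. This is a sketch of the scoring only, not of the tree construction.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful search in a binary
    search tree with n nodes (Equation 3.14)."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # H(i) ~ ln(i) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Isolation Forest anomaly score (Equation 3.15):
    s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-avg_path_length / c(n))
```

A sample isolated after a single split in trees built on 256 points scores close to 1, while an average path length equal to c(n) gives exactly 0.5.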

3.5

Dimensionality reduction

To reduce the number of features and improve model efficiency, a dimensionality reduction technique called principal component analysis (PCA) [8] is applied to the data set before it is sent to the models. PCA reduces the dimensions of the dataset by creating a new dataset from linear combinations of the original features. In Figure 3.4 the process of creating principal components is shown. The data seen in Figure 3.4 can be projected onto either of the two principal components. In Figure 3.5 the data is projected onto the first principal component. The goal of principal component analysis is to maximize the explained variance of the projected points.

Figure 3.4: PCA example. (a) Data; (b) data with principal components.
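A small sketch of the projection using numpy's SVD (illustrative only; the thesis uses scikit-learn's PCA implementation):

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its first n_components principal components and
    return the explained-variance ratio of each kept component."""
    Xc = X - X.mean(axis=0)                      # centre the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]
```

For data where one column is an exact multiple of another, the first component explains essentially all of the variance.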


3.6

Normalization

Since the ML models use Euclidean distances and the features are of different magnitudes, it is necessary to normalize the data [18]. There are different scaling methods available; one of them is min-max scaling, which is defined as

x' = (x − min(x)) / (max(x) − min(x)).  (3.16)

There is also z-score normalization, which is defined as

x' = (x − x̄) / σ,  (3.17)

where x̄ and σ are the mean and standard deviation of the signal [18].
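Both scalers from Equations 3.16 and 3.17 are straightforward to write out. This is a plain-Python sketch; the thesis uses scikit-learn's scalers.

```python
def min_max(xs):
    """Min-max scaling (Equation 3.16): maps the signal to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Z-score normalization (Equation 3.17): zero mean, unit variance."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]
```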

3.7

Performance Evaluation

There exist several metrics for performance evaluation, such as precision, recall, accuracy and F1 [19]. These metrics use the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), and are defined as

Precision = TP / (TP + FP)  (3.18)
Recall = TP / (TP + FN)  (3.19)
F1 = TP / (TP + (FP + FN)/2)  (3.20)
Accuracy = (TP + TN) / (TP + FP + TN + FN)  (3.21)
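The four metrics can be computed directly from the counts (a small sketch; the function name is illustrative):

```python
def evaluate(tp, fp, tn, fn):
    """Compute the metrics of Equations 3.18-3.21 from the counts of
    true/false positives and negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = tp / (tp + (fp + fn) / 2)          # harmonic mean of precision and recall
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy
```

For example, 8 true positives, 2 false positives, 85 true negatives and 5 false negatives give a precision of 0.8 and an accuracy of 0.93.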


4

Method

This chapter explains and further elaborates on how the models and the dataset are used. Two approaches for long term anomaly detection have been developed. One is a rule-based model, where a series of CUSUM tests is used to generate alarms. The other is a data driven approach, where three different ML models, K-means, IF and One-class SVM, have been implemented and applied to the dataset.

4.1

Tools and Frameworks

All of the code for this master's thesis has been written in Python 3.7 using the PyCharm IDE [20][5]. Several Python frameworks have been used throughout this master's thesis, of which matplotlib and scikit-learn are worth mentioning [12][16]. Matplotlib is an open source visualization framework for Python; all of the graphs and plots in this master's thesis have been generated using matplotlib. Scikit-learn is an open source ML framework for Python and contains a lot of functionality. The framework provides the different ML models, but also the dimensionality reduction method PCA and the two normalization methods.

4.2

Rule-Based Model

This model uses the same logic for detecting anomalous vibration behavior as one would use when checking manually.

• Have all of the vibration signals changed significantly?

• Has the load not changed significantly during the same time period?

• Has the ambient temperature not changed significantly during the same time period?


If the answer to all of these questions is yes, then the time period is likely an anomaly. One can see an overview of the rule-based model in Figure 4.1. In all of the CUSUM tests, a rolling mean has been used as the normal value. The drift parameter is set differently depending on which signal the test tries to detect a change in, since the signals are all of different magnitudes. For this model, the 8 signals shown in Figure 4.1 have been used; more signals are not needed to raise an alarm that a timestamp indicates a vibration anomaly. Before the signals are sent to the model, the data is filtered on active load, using only data with a load over 20 MW to ensure that the turbine is running.

Figure 4.1: A system overview of how the rule-based model approach works

4.2.1

Parameters and Thresholds

For each of the CUSUM tests, the drift parameter is tuned as a fraction of the normal value. The thresholds for the alarms were set by hand while testing on the bad turbine.
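Putting the pieces together, the rule-based check for a single signal might be sketched as below. The function names, window size, drift fraction and threshold are illustrative assumptions, not the hand-tuned values used at Siemens.

```python
def rolling_mean(xs, window):
    """Trailing rolling mean used as the 'normal' reference value."""
    return [sum(xs[max(0, i - window + 1): i + 1]) / min(window, i + 1)
            for i in range(len(xs))]

def rule_based_alarm(signal, window=24, drift_frac=0.05, threshold=5.0):
    """Hypothetical sketch of the rule-based check for one signal:
    residual = deviation from the rolling mean, drift set as a fraction
    of the reference value, alarm when the CUSUM quantity crosses a
    hand-tuned threshold."""
    ref = rolling_mean(signal, window)
    T = 0.0
    alarms = []
    for x, m in zip(signal, ref):
        T = max(T + abs(x - m) - drift_frac * abs(m), 0.0)
        alarms.append(T > threshold)
    return alarms
```

A steady signal never raises an alarm, while a sustained level shift does once the test quantity exceeds the threshold.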

4.3

ML models

The ML approach can be seen in Figure 4.2. The data is sent to three ML anomaly detection algorithms covered in Chapter 3.

4.3.1

Feature Selection

Before sending the data into the ML models, a feature selection is done by selecting the features which have an absolute correlation above 0.5 with one of the overall vibration signals. This is achieved by creating a correlation matrix for the data belonging to all of the good turbines and then looking at the correlations for one overall vibration signal at a time. Each row of the correlation matrix contains the correlations of a specific signal in the dataset with the other signals; the matrix can be seen in Figure 4.3. Any duplicates are then removed from the list of features, which results in a list of unique important features.


Figure 4.2: A system overview of how the ML models approach works

• Bearing 1, overall vibration signal (b_n1_vib1_fs)
• Bearing 1, 1xn vibration signal (bearing1_1xn_vib1_fs)
• Bearing 2, overall vibration signal (b_n2_vib1_fs)
• Bearing 2, 1xn vibration signal (bearing2_1xn_vib1_fs)
• Generator bearing 1, overall vibration signal (gen_b_vib1_fs)
• Generator bearing 2, overall vibration signal (gen_b2_vib1_fs)
• Generator bearing 1, temperature (generator_bearing1_temp_fs)
• Generator bearing 2, temperature (generator_bearing2_temp_fs)
• Rotor speed signal (speed_transducer_fs)
• Gearbox, 1xn vibration signal (gearbox_1xn_vib1_fs)
• Gearbox, 2xn vibration signal (gearbox_2xn_vib1_fs)
• Gearbox, low speed overall vibration signal (gearbox_gen_vib_fs)
• Gearbox, high speed overall vibration signal (gearbox_vib1_fs)
• Lube oil temperature (lube_oil_temp1_fs)
• Gearbox, temperature (gearbox_temp1_fs)
• Bearing 1, 2xn vibration signal (bearing1_2xn_vib1_fs)
• Lube oil tank pressure (lube_oil_tank_pressure_fs)


Figure 4.3: The correlation matrix for the most relevant features

4.3.2 Pre-Processing

The second step is to pre-process the data. A rolling mean with a window of 24 hours was applied to the selected features; the window size was obtained by increasing it step by step while monitoring the performance of the models. The rolling mean takes the mean of the signal over a window of a specific size, which smooths out the signals. Since the scope is long term changes, fast changes in the signals are not important and should be removed if possible. The dataset was then filtered by removing all points which did not have an active load above 20 MW, to ensure that the turbine is running. The dataset was then normalized by applying a min-max scaler and a zscore scaler (see Section 3.6). The normalization is necessary to bring the signals to a common scale: the distributions of the signals may differ, which makes it problematic to measure similarities and differences [24]. Two normalization methods were used in order to evaluate which of them was best suited and whether they had an impact on the performance of the models. To reduce the time complexity of the models, the normalized signals are transformed by a PCA model to check if the dataset can be represented using fewer features.
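The pre-processing chain can be sketched like this (the load column name and helper are assumptions; the minmax variant would swap in `MinMaxScaler`):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler  # zscore; MinMaxScaler for minmax

def preprocess(df: pd.DataFrame, load_col: str = "active_load",
               load_limit: float = 20.0, n_components: int = 10):
    """24 h rolling mean -> load filter -> zscore scaling -> PCA."""
    smoothed = df.rolling("24h").mean().dropna()
    running = smoothed[smoothed[load_col] > load_limit].drop(columns=load_col)
    scaled = StandardScaler().fit_transform(running)
    pca = PCA(n_components=n_components)
    return pca.fit_transform(scaled), pca
```

The function expects a time-indexed frame; the fitted PCA is returned so the same transform can later be applied to test data.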



In Figure 4.4 the explainability of the variance as a function of the number of features used is shown for both minmax and zscore normalized features. From these plots one can conclude that 10 PCA components is a good choice regardless of which normalization method is used.

(a)PCA for minmax normalized features (b)PCA for zscore normalized features

Figure 4.4: The explainability of the variance with an increasing number of features for minmax and zscore normalized features

These 10 PCA components were then fed into each of the models.
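The choice of component count can also be checked programmatically from the cumulative explained variance (the 95% target here is an illustrative assumption, and the helper is mine):

```python
import numpy as np
from sklearn.decomposition import PCA

def n_components_for(X: np.ndarray, target: float = 0.95) -> int:
    """Smallest number of PCA components explaining `target` of the variance."""
    cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    return int(np.searchsorted(cum, target) + 1)
```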

4.3.3 Parameters and Thresholds

Each of the ML models contains either thresholds (in the k-means clustering case) or parameters which give an indication of how many alarms are accepted for the training data. The training data contains no indications of vibration anomalies, so every alarm on it is a false alarm. The contamination parameter for IF, the ν parameter for One-class SVM and the thresholds for the k-means clustering are set such that there are 0.1% false alarms for the training data. These parameters and thresholds were tuned by looking at the model performance for different levels of contamination: the number of alarms for the different models increases if the contamination parameter is increased and decreases if it is decreased. The contamination parameter could not be set to zero for the training data, since then the ML models would not generate any alarms for the tests.

4.4 K-means

The K-means model that has been used is the one implemented by scikit-learn [10]. Two models are created, one for each of the two normalization methods (minmax and zscore). First the desired number of clusters needed to be determined. To get a suitable number of clusters, an elbow plot was created using the function provided by [25]. The elbow plot, or elbow method, looks at the explained variance as a function of the number of clusters [3]. The

(34)

idea is based on there being a point after which the model does not improve much by adding another cluster, which gives an indication of how many clusters are suitable. In Figure 4.5 the elbow plot for the minmax normalized features is shown, and in Figure 4.6 for the zscore normalized features. Based on these, five clusters are used for both of the models.

Figure 4.5: The distribution score of the K-means model using minmax normalized features with an increasing number of clusters

K-means assigns clusters to the entire dataset. To get the anomalies for this model, the distance from each point to its cluster center was computed. A threshold was then set based on the training data from turbines that had not shown any abnormal behavior in the past, in such a way that a certain percentage of the data is considered anomalous. Thus for each of the 5 clusters a threshold is set such that 0.1% of the training data belonging to that cluster is anomalous. This is done by selecting the n biggest distances and setting the threshold to the smallest of them, where n is the anomaly fraction times the total number of points in the cluster. The training data is thus anomalous according to the anomaly fraction variable; see Figure 4.7 for an example of how the distribution of distances and the threshold can look. Apart from the number-of-clusters parameter, the number of jobs for the algorithm was set to -1, which means it uses all available threads. The rest of the parameters for k-means were left at their defaults and not examined further.
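A sketch of this distance-threshold scheme (the cluster count and anomaly fraction match the text; the helper names are mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_kmeans_thresholds(X_train, n_clusters=5, anomaly_fraction=0.001, seed=0):
    """Fit k-means, then set a per-cluster distance threshold so that
    roughly `anomaly_fraction` of each cluster's training points are anomalous."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train)
    dists = np.linalg.norm(X_train - km.cluster_centers_[km.labels_], axis=1)
    thresholds = np.empty(n_clusters)
    for c in range(n_clusters):
        d = np.sort(dists[km.labels_ == c])
        n = max(1, int(anomaly_fraction * len(d)))  # the n biggest distances
        thresholds[c] = d[-n]                       # smallest of those n
    return km, thresholds

def predict_anomalies(km, thresholds, X):
    """Flag points whose distance to their assigned center exceeds the threshold."""
    labels = km.predict(X)
    dists = np.linalg.norm(X - km.cluster_centers_[labels], axis=1)
    return dists > thresholds[labels]
```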



Figure 4.6: The distribution score of the K-means model using zscore normalized features with an increasing number of clusters

4.5 Isolation Forest

The IF model that is used is the one implemented by scikit-learn [9]. Two models are created, one for each of the two normalization methods. The contamination variable is used as a rough estimate of how many anomalies the training data has; after some tuning it is set to 0.1%, the same as for the k-means model. Different numbers of estimators, i.e. the number of isolation trees in the model, were also tested: 100, 1000 and 10000. The computation time increased as expected with the number of estimators while the accuracy did not seem to improve, so 100 estimators were kept. The rest of the parameters were not examined further but set to their default values.
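A minimal IF setup with the stated contamination level (the data here is synthetic, standing in for the PCA features):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 5))           # stand-in for normal PCA features

clf = IsolationForest(n_estimators=100, contamination=0.001, random_state=0)
clf.fit(X_train)

pred = clf.predict(rng.normal(size=(100, 5)))  # +1 = normal, -1 = anomaly
far_point = clf.predict(np.full((1, 5), 10.0)) # an obviously isolated point
```

Fixing `random_state` pins down the otherwise random feature splits, which is relevant given the run-to-run variability discussed later.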

4.6 One-class SVM

The One-class SVM model that has been used is the one implemented by scikit-learn [11]. Two models are created, one for each of the two normalization methods. Since there are no labels, tuning the hyperparameters was an issue. The parameter ν is set in a similar fashion to the contamination parameter in the IF model and uses the same anomaly fraction of 0.1%. The kernel used in the model was the radial basis function. The parameter γ

(36)

Figure 4.7: The distribution of the distances to cluster 2 with the threshold in red

which is the kernel coefficient was, after some testing, set to 10⁻⁶. The rest of the parameters were not examined and set to their default values.

4.7 Implementation

These models can be implemented in Siemens' existing analysis tool and used on daily monitoring data. This can be achieved by fetching the signals for the day which is to be analyzed together with the day before. This is necessary since the rule-based model uses a rolling 1-day mean as the normal value for the signals, and the signals are pre-processed using a 1-day rolling mean in the ML models. If no extra day is buffered, the models will not have any data to analyze, since the rolling mean removes points which do not have a 1-day look-back.

4.8 Performance Evaluation

To be able to evaluate the performance of the different models a ground truth is needed. For the faulty turbine, the period where a vibration anomaly is indicated is considered as a true anomaly. This is visualized in Figure 4.8, where the anomalous data points are the points between the red lines indicating areas 1 and 2. For the other two machines the assumption is that they do not have any

Figure 4.8: The overall gearbox vibration signal with the red lines as an indication of where the anomalies are. The lower figure represents the actual ground truth signal, which is either zero or one.

vibration anomalies. This means that the precision, recall and F1 metrics are not available for these tests, since TP and FN are both zero when the ground truth contains no anomalies.
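The metrics used here can be computed from the point-wise ground truth as follows (a small helper of my own, not from the thesis; it returns `None` where a metric is undefined):

```python
import numpy as np

def metrics(y_true, y_pred):
    """Point-wise precision, recall, F1 and accuracy; None where undefined."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    f1 = (2 * precision * recall / (precision + recall)
          if precision and recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": (tp + tn) / len(y_true)}
```

With an all-zero ground truth, recall comes out as `None`, matching the situation described for the two good turbines.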

4.9 Data Visualization

To be able to visualize the effects of the different normalization methods and of the rolling mean, t-sne was applied to project the data onto a 2-D plane [15]. The data was labeled into different classes: normal data used for training, normal data from the good turbines, and normal or abnormal data from the bad turbine. Again, the estimated ground truth (see Section 4.8) is used to determine whether a point is abnormal or not. The normal training data is then compared with the normal and abnormal data from the bad turbine and with the normal data from the good turbines.
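The projection itself is a one-liner with scikit-learn (the perplexity value is a tunable assumption; the data here is synthetic):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# stand-ins for normal training data and data from one turbine
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 10)),
               rng.normal(5.0, 1.0, size=(20, 10))])
labels = np.array([0] * 100 + [1] * 20)       # class used only for coloring

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```

The `labels` array is then used only to color the scatter plot, as in Figures 5.1-5.4.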


5 Results

This chapter contains the results from the two different approaches. The tests have been performed on one bad turbine, which indicates a vibration anomaly, and two turbines without any indications of anomalies. These three turbines are new to the models and have not been available during training. The ML models have been trained on 3 months of data from 12 different gas turbines with no indications of vibration anomalies.

5.1 Data Visualization

In Figure 5.1 the t-sne projections for the unfiltered dataset are shown. One can observe that the abnormal data points seem to be very close to the normal data points from the bad turbine. The normal data from the bad turbine is also clearly distinguishable from the normal data in the training set. The normal data from the two normal machines seems to form two distinct clusters. In Figure 5.2 the dataset is visualized after applying the rolling 24-h mean. Here one can note that the rolling mean seems to smooth out the abnormal data points and separate them from the normal data points from the bad turbine. It seems, however, as if the abnormal data is closer to the normal training data. The normal data from the bad turbine seems to form clusters of its own but is also more spread out. The normal data from the two good turbines also gets more spread out compared to Figure 5.1. In Figure 5.3 one can see the visualization of the minmax normalized features. One can note that the abnormal data points seem to form their own clusters while still being close to the normal data from the bad turbine and the normal data from the training set. The different normal data seem to form separable clusters. The cluster belonging to good turbine 2 seems to be further away from the normal training data compared to the other good turbine. In Figure 5.4 the zscore normalized dataset is shown. Here one can note that the abnormal points


form a distinct cluster outside the two other normal clusters. The normal data points also form their own separate clusters. The cluster of the normal data belonging to good turbine 2 seems to be closer to the normal data from the training set compared to the normal data from good turbine 1.

(a) 2D visualization of normal training data with abnormal and normal data from the bad turbine.

(b)2D visualization of the normal training data with normal data from the two good turbines.

Figure 5.1:2D t-sne projection of the unfiltered dataset.

(a) 2D visualization of normal training data with abnormal and normal data from the bad turbine.

(b) 2D visualization of normal training data with the normal data from the two good turbines.

Figure 5.2:2D t-sne projection of the filtered dataset.

5.2 Rule-based Model

Figure 5.5 shows the test for the bad turbine, where there are anomalies. In this figure the number of alarms is shown together with the accuracy of the model and the F1 measure. One can see from the figure that the model generates a couple of false alarms in the region where the signal is low and when the signal levels change. This observable change in signal mean is, as already mentioned (Section 2.3), likely due to a re-balance of the rotor, but the model detects a change here since all of the vibration levels change after the re-balance. Since this data



(a) 2D visualization of normal training data with abnormal and normal data from the bad turbine.

(b) 2D visualization of normal training data with the normal data from the two good turbines.

Figure 5.3: 2D t-sne projection of the minmax normalized and filtered dataset.

(a) 2D visualization of normal training data with abnormal and normal data from the bad turbine.

(b) 2D visualization of normal training data with the normal data from the two good turbines.

Figure 5.4:2D t-sne projection of the zscore normalized and filtered dataset.

is not filtered with a rolling mean, the signal varies more, but the accuracy is still quite good while the F1 score is quite low.

In Figure 5.6 the rule-based model is tested on good turbine 1, which has no anomalies. Here one can note that the number of alarms is lower compared to the previous test and the accuracy is higher, since no alarms are expected for this test.

In Figure 5.7 one can see the results for good turbine 2. The number of alarms for this machine is even lower than for the previous ones and the accuracy is thus higher, since no alarms are expected for this turbine either.

5.3 K-means Model

In Figures 5.8, 5.9 and 5.10 the results for the k-means model are shown. In these figures one can see the number of alarms generated together with the accuracy


Figure 5.5:The Rule-based model applied to the bad turbine data.

Figure 5.6:The Rule-based model applied to the good turbine 1 data.

and F1 measures. In Figure 5.8 one can observe that the model with minmax normalized data alarms more frequently compared to the one with zscore normalized data. In this test, the timestamps where an alarm should be raised according to the created ground truth (see Figure 4.8) are after 2019-11 in Figure 5.8. The number of false alarms



Figure 5.7:The Rule-based model applied to the good turbine 2 data.

is higher for the minmax normalized features compared to the zscore features. The accuracy is also better for the zscore normalized features, while the F1 measure is slightly lower.

In Figure 5.9 one can see that the zscore normalized features generate alarms less frequently compared to the minmax normalized features. In this test no alarms are expected, as mentioned in Section 4.8. The accuracy is thus lower for the minmax normalized features. Since there are no expected alarms, the precision and recall measures are not available.

In Figure 5.10 the results from the test on good turbine 2 are shown. One can note that the k-means model with zscore normalized features generates fewer alarms compared to the other model. Since the expected result for this turbine is no alarms, the accuracy for the k-means model with zscore normalized features is the highest.

5.4 Isolation Forest Model

In Figures 5.11, 5.12 and 5.13 the results for the IF model are shown. In these figures the number of generated alarms is shown together with the accuracy and F1 measure. In Figure 5.11 one can see the results for the test on the bad turbine for both minmax normalized features and zscore normalized features.

(44)

Figure 5.8: Kmeans model with 24-hour rolling mean applied to the bad turbine data.

Figure 5.9: Kmeans model with 24-hour rolling mean applied to the good turbine 1 data.

Here the minmax normalized features generate more alarms compared to the zscore normalized features, and the accuracy and F1 are also higher. This is because the model using zscore normalized features does not detect the ground truth anomalies; its F1 measure is thus zero since there are no true positives.



Figure 5.10:Kmeans model with 24-hour rolling mean applied to the good turbine 2 data.

In Figure 5.12 one can note that the IF model with zscore normalized features generates more alarms compared to the model with minmax normalized features. The accuracy of the model with zscore normalized features is lower than that of the model with minmax normalized features. Both of the models generate alarms at the drop of the vibration signal.

In Figure 5.13 the two models are tested on good turbine 2. One can note that the model with zscore normalized features generates more alarms compared to the model with minmax features, which is why the accuracy is better for the model with minmax normalized features.

5.5 One-class SVM Model

The results for the different tests of the One-class SVM model can be seen in Figures 5.14, 5.15 and 5.16. In these figures the number of generated alarms is shown together with the accuracy and F1 measures. In Figure 5.14 the test on the bad turbine is shown. One can note that the model using zscore normalized features generates more alarms and has higher accuracy and F1 measures.

In Figure 5.15 the test on good turbine 1 is shown. One can note that both of the models generate the same number of alarms and have the same accuracy and F1 measures. The number of generated alarms is low, hence the high accuracy, since no alarms are expected.

In Figure 5.16 the test on good turbine 2 is shown. Here as well, both of the models generate the same number of alarms and have the same accuracy and F1 measures. The number of alarms generated in this test is also low, just as in the previous test, which is how the high accuracy of the models is achieved.


Figure 5.11:IF model with 24-hour rolling mean applied to the bad turbine data.

Figure 5.12:IF model with 24-hour rolling mean applied to the good turbine 1 data.

5.6 Model Performances

Tables 5.1 and 5.2 present the performance metrics for the different tests. The k-means model has the highest recall even though it has the worst accuracy of all of the models. The precision for the k-means model is the lowest



Figure 5.13:IF model with 24-hour rolling mean applied to the good turbine 2 data.

Figure 5.14:One-class SVM model with 24-hour rolling mean applied to the bad turbine data.

compared to the other ML models. The accuracy for the k-means model is also the worst in the tests on the two good machines. The One-class SVM model has good accuracy for the test on the bad turbine; depending on which normalization approach it uses, it achieves either really good or quite bad results compared to the other models. The best results from


Figure 5.15:One-class SVM model with 24-hour rolling mean applied to the good turbine 1 data.

Figure 5.16:One-class SVM model with 24-hour rolling mean applied to the good turbine 2 data.

the One-class SVM have the second best precision and accuracy for the test on the bad turbine, and the model achieves the best accuracy for good turbine 1. The IF model also shows different results depending on which normalization method is used; its best results have the highest precision, recall, F1 and accuracy measures. The performance on the two good turbines is good, with accuracy slightly lower



Table 5.1:The test results for the bad turbine.

Test Precision Recall F1 Accuracy

OneClassSVM_bad_minmax 0.2302 0.0796 0.1183 0.7879
OneClassSVM_bad_zscore 0.7289 0.3441 0.4675 0.8599
kmeans_bad_minmax 0.2184 0.7895 0.3421 0.4576
kmeans_bad_zscore 0.2575 0.4106 0.3165 0.6831
RuleBased_bad 0.1726 0.0195 0.0351 0.8272
if_bad_minmax 0.9403 0.5160 0.6663 0.9076
if_bad_zscore 0.0 0.0 0.0 0.8129

Table 5.2:The test results for the good turbine 1 and 2.

Test Accuracy
OneClassSVM_good1_minmax 0.9961
OneClassSVM_good1_zscore 0.9961
OneClassSVM_good2_minmax 0.8678
OneClassSVM_good2_zscore 0.8678
kmeans_good1_minmax 0.5595
kmeans_good1_zscore 0.6930
kmeans_good2_minmax 0.1959
kmeans_good2_zscore 0.8444
if_good1_minmax 0.9835
if_good1_zscore 0.9774
if_good2_minmax 0.9869
if_good2_zscore 0.9239
RuleBased_good1 0.9758
RuleBased_good2 0.9891

than that of the One-class SVM when looking at the best results for IF. The rule-based model has lower precision, recall and F1 compared to the ML models but good accuracy for the test on the bad turbine. The accuracy for the tests on the two good turbines is comparable to the One-class SVM but slightly lower. When comparing accuracies, the rule-based model outperforms the k-means model.


6 Discussion

In this chapter the results and findings of this master’s thesis are presented and discussed.

6.1 Normalization

From the results in the previous chapter one can clearly see that the normalization method has an impact on the model performance. The zscore normalization seems to be the best choice for the k-means models, while the minmax normalization outperforms zscore for the IF model. When it comes to the One-class SVM, the zscore normalization achieves the best performance for both the bad turbine and the two good turbines.

6.2 Models

The results presented in Chapter 5 indicate that, of all the ML models, k-means has the worst performance. Since it generates alarms very frequently, it has a good recall score, detecting a lot of the anomalies, but it also has the worst accuracy compared with the other models (see Tables 5.1 and 5.2). The One-class SVM had by far the longest training and prediction times when using a low γ compared to the other two models. The One-class model performs better than k-means when using zscore normalization: it has high performance in the test on the bad machine while at the same time having the best accuracy for the test on good turbine 1. The rule-based model has good accuracy for the bad turbine while at the same time having good accuracy for both of the good turbines. What it lacks, however, is good precision, since it does not detect all of the anomalies. The IF model using minmax normalized features has the best performance of all of the models. It has high precision and good accuracy for the bad turbine while at the same time having good accuracy for both of the good turbines.

One reason why the k-means model does not perform as well as the other ML models can be that data points are not as close to the cluster centers as expected. The idea is that the normal data points should be close together near the cluster center while the anomalies should be further out. The cluster centers are calculated using the training data; thus, if the test data is not sufficiently similar to the training data, its points will end up further from the cluster centers. The k-means model generates alarms more frequently compared to the other ML models, and quite frequently overall. This suggests that the thresholds set from the training data are too narrow and need further development. From Figures 5.3 and 5.4 one can see that the normal data from the two good turbines form their own clusters instead of overlapping with the normal training data. This could indicate that the normal data is not as similar as one would like it to be. The different turbines have quite unique signals, but the preconception was that they would be similar enough to form clusters of normal data, which does not seem to happen, since the k-means model alarms too frequently. The abnormal data shown in Figures 5.3 and 5.4 does not seem to be that far away from the two normal clusters in either figure, which can make it hard for the k-means model to detect the abnormal data.

One has to keep in mind that the IF model is by default random, and thus one cannot expect the same results from the same test. This is because the IF model splits the features randomly when trained to isolate each data point. The results from the IF model can thus be unreliable, which has been observed in this master's thesis. The results from the IF model shown in Section 5.4 are the best results achieved; tests on the bad turbine where the models using either of the normalization methods miss the anomalies have also been observed.

One reason why the One-class SVM performs better when using the zscore normalization method compared to the minmax features could be how distinct the abnormal data is. One can observe in Figures 5.3 and 5.4 that the abnormal data cluster seems to be more distinguishable from the normal data when using the zscore normalization compared to minmax normalization.

6.3 Ground truth

In Figure 4.8 the ground truth for the bad turbine is shown. In this master's thesis all of the points in regions 1 and 2 are considered true anomalies, but are they part of one anomalous occurrence or are there two separate occurrences, one in each region? It seems as if there are two separate occurrences, since the vibration levels drop back to normal in between. Comparing the results from the models to Figure 4.8, all of the models detect both of these occurrences, but with different amounts of false alarms in between. The rule-based method alarms both on the vibration increases and on the vibration decrease. The k-means model generates alarms throughout both of the occurrences, but since it alarms frequently it also generates false alarms between and after the occurrences. The IF model



detects both of the occurrences and also generates some false alarms, but not as many as the k-means model. The One-class SVM detects both of the occurrences while generating few false alarms, if any, around the two occurrences.

6.4 Related Work

The approach proposed in this master's thesis differs a bit from some of the work mentioned in Section 2.5. The approach in [26] uses a series of IF models with different levels of contamination to group the dataset into normal and abnormal data. This has not been attempted in this master's thesis; instead a simpler approach is used, which considers each point of the dataset in terms of whether it is abnormal or normal. As mentioned in Section 6.2, the IF model can miss vibration anomalies but can achieve high performance when vibration anomalies are detected. The authors of [26] find that it is a good option for gas turbine gas path anomalies. They achieve precision, recall and F1 of around 94%, compared to the best results in this thesis of 94% precision, 51% recall and 66% F1. Thus the model from [26] clearly performs better. One reason for this can be that the gas path anomalies are more anomalous than vibration anomalies and are thus easier to detect.

In [13] the authors also use a filtering method for the feature selection, but since the scope of that article is flight performance, flight dynamic features were used. In this master's thesis, since the scope is vibration anomalies, the overall vibration signals were used instead. The Support Vector Regression model used in [13] predicts the flight dynamics feature values and compares the predicted value with the observed value to check whether the point is anomalous or not. The scope of [13] also concerns fast dynamic changes, which differs from this master's thesis, which looks at long term changes. In [2] the approach is instead to find features that remain unchanged during normal conditions and predict those features using a recurrent neural network to detect gas path anomalies. Since both [2] and [13] predict signals, their performance metrics are in terms of how well the signal was predicted; thus the results of [2] and [13] cannot be directly compared to this master's thesis results.

In [24] the authors use a k-means model to detect network intrusions, on a dataset with duplicated records. This was never attempted in this master's thesis, since it was realized only at a late stage that it could be useful: one could duplicate the data from the bad machine and add Gaussian noise to obtain new faulty data. The authors of [24] achieve an accuracy of around 58% and a false alarm ratio of approximately 23%. This can be compared to the k-means results for the bad turbine, with an accuracy of 68% and a false alarm rate around 15 − 30%. Thus a higher accuracy is achieved in this master's thesis, but a bigger false alarm rate is observed for good turbine 1.


The difference between the approaches is that the authors of [22] have a supervised problem for a single machine, while this master's thesis proposes an unsupervised approach for a fleet of machines. The results from [22] point out that One-class SVM is a good model for machine fault detection, while this thesis' results indicate that One-class SVM is decent at detecting vibration anomalies. The authors of [22] use different performance metrics compared to this master's thesis, but one can still conclude that they achieve better results. This could be due to One-class SVM performing better on supervised problems compared to unsupervised ones.

6.5 Future Work

Since the vibration levels for the different turbines are quite unique and not as similar as one would hope, anomaly detection using unsupervised learning algorithms is difficult. An idea for improvement is to build one or several regression models to predict the different vibration levels. One can then create a residual by subtracting the predicted value from the observed value and raise an alarm when the difference exceeds a threshold. The benefit of doing this is that the vibration signals are available in Siemens' signal database; thus the true value of the vibrations is known. This approach is attempted in [13] and [2] with some differences. The idea to predict a feature which would remain the same under normal conditions seems like a good idea to investigate for vibration anomaly detection. Here the overall signals can probably be used, since they contain the entire vibration content.
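The residual idea can be sketched as follows (the choice of regression model and threshold quantile are illustrative assumptions; synthetic data stands in for vibration levels):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                  # e.g. load, speed, temperatures
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + 0.01 * rng.normal(size=500)

model = Ridge().fit(X, y)                      # fit on normal operation only
resid = np.abs(y - model.predict(X))
threshold = np.quantile(resid, 0.999)          # allow ~0.1% training alarms

alarms = resid > threshold                     # alarm when residual is too big
```

Because the model is fitted on normal operation, a sustained vibration deviation shows up as a residual well above the threshold.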

Since the number of indicated anomalies in the data is small, another idea for improving the dataset is to inject a vibration anomaly. This can be done by increasing the overall vibration signals with a step for a certain amount of time. More tests with an anomaly would thus be available, increasing the reliability of the performance measures. One could also duplicate the bad turbine data, as is done in [24], and then apply Gaussian noise to it, producing new faulty data to work with.
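Duplicating the bad-turbine data with Gaussian noise, as in [24], could be done like this (the helper name and noise scale are assumptions):

```python
import numpy as np

def augment_with_noise(X_faulty: np.ndarray, n_copies: int = 5,
                       noise_scale: float = 0.05, seed: int = 0) -> np.ndarray:
    """Stack the faulty data with noisy copies of itself, the noise
    scaled per feature by the data's standard deviation."""
    rng = np.random.default_rng(seed)
    std = X_faulty.std(axis=0)
    copies = [X_faulty + noise_scale * std * rng.normal(size=X_faulty.shape)
              for _ in range(n_copies)]
    return np.vstack([X_faulty, *copies])
```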

It would be great if the employees working with the daily monitoring could mark parts of the turbine data that are anomalous. This would enable a supervised model approach, but it leads to another question: how should the data be labeled? It would be good if all of the different faults and anomalous behaviors had their own labels; it seems wise to separate the anomalies into groups such as sensor anomalies and vibration anomalies. A good idea would be to start with the anomalies that are the most troublesome to find manually.

One more idea that could be looked into is to make machine specific models for vibration anomaly detection. One feature that is available but has not been used is the phase signal for the 1xn and 2xn components. The vibration experts at Siemens look at both the vibration magnitude and the angle, or phase, of the vibration, and represent vibrations as vectors in a 2-d space. This vibration signature is unique for each turbine; thus it has not been used throughout this master's thesis, but it would be really useful for modeling the normal behavior of a single machine.



Something that has not been explored in this master's thesis and could improve the models is handcrafting features: creating new features from the available ones to obtain features better suited for vibration anomaly detection. This, however, requires domain knowledge of turbine vibrations and could possibly be done with help from the vibration experts at Siemens.

6.6 Results Usefulness

The results provided in Chapter 5 show that, to a certain extent, vibration anomalies can be detected using the models proposed in this master's thesis. Better performance can be achieved by implementing some of the methods proposed in the previous section. The next step would be to develop new models for detecting new kinds of anomalies in other signals. Gas path anomalies seem to be a good place to start, since some research has already been done in this field. One could also design models for specific turbine faults; one problem with this is that faulty machine data is rare, but this could be overcome by injecting such faults into the turbine data. Good diagnostic models, whether data driven such as the ML models or rule-based, are crucial for condition based maintenance. Condition based maintenance would allow for a reduction in maintenance cost and an increase in turbine uptime, which is critical for good profitability for both Siemens and its customers.


7 Conclusions

This chapter gives the conclusions reached regarding the thesis questions (see Section 1.1) and the master's thesis as a whole.

7.1 Thesis Questions

Can vibration anomalies be detected in available operational turbine data at Siemens? Yes, vibration anomalies can be detected, but with different levels of reliability depending on which of the models is used. One also has to keep in mind that the results from the different models stem from an educated guess made when creating the ground truth (see Section 4.8). The performance of the models therefore only becomes as accurate as the guess of where the anomalies are.

Which methods are suitable for vibration anomaly detection? When comparing the different ML models one can clearly see that the IF model has the best performance and a low execution time compared to the k-means and One-class SVM models. However, it is slightly undependable, since the results from the model cannot be expected to be the same from test run to test run. This is because during training the model randomly selects feature splits to detect anomalies. The One-class SVM has the second best performance but a high training and prediction time compared to the other ML models. The performance of the rule-based method is also good when looking at the two good turbines. Bearing in mind how the model will be implemented and used in practice, the rule-based approach has the advantage of being simple to implement and simple to explain to the end users at RDC. The ML models will inevitably be harder to implement and harder to explain to an end user who lacks machine learning domain knowledge. An ML model is also likely to require more maintenance since it is more complex. Taking this into consideration, the rule-based model seems to be the best choice.


Which features are the most significant for vibration anomaly detection in gas turbines? The overall vibration signals show the entire vibration content for different parts of the turbine. Since this master's thesis is limited to detecting anomalies in the vibrations, it seems reasonable to use the signals that have the highest absolute correlation to the overall vibration signals. If this were a supervised problem, one could use different sets of signals or features, check the model output, and keep the set with the best performance. But since no true set of anomalies is available and this is an unsupervised approach, this is not possible. The list of features that were found to be the most important can be seen in Section 4.3.1.
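The correlation-based selection described above can be sketched with pandas. The column names and synthetic relationships below are hypothetical placeholders for the real turbine signals:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
overall = rng.normal(5.0, 1.0, 1000)
# Stand-in signals: one positively correlated, one negatively
# correlated, and one unrelated to the overall vibration.
df = pd.DataFrame({
    "overall_vib": overall,
    "bearing_temp": 0.8 * overall + rng.normal(0.0, 0.3, 1000),
    "speed": rng.normal(6000.0, 50.0, 1000),
    "exhaust_temp": -0.6 * overall + rng.normal(0.0, 0.5, 1000),
})

# Rank candidate signals by absolute correlation to the overall
# vibration signal and keep the strongest ones as model features.
corr = df.corr()["overall_vib"].drop("overall_vib").abs()
selected = corr.sort_values(ascending=False).head(2).index.tolist()
```

Using the absolute value is what lets strongly negatively correlated signals, like the stand-in `exhaust_temp` here, rank above uncorrelated ones.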
