Fault Clustering With Unsupervised Learning Using a Modified Gaussian Mixture Model and Expectation Maximization

(1)

Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2021

Fault Clustering With

Unsupervised Learning

Using a Modified Gaussian

Mixture Model and

Expectation Maximization

(2)

Mixture Model and Expectation Maximization

Kevin Lindström LiTH-ISY-EX–21/5423–SE Supervisor: Max Johansson

isy_{, Linköpings universitet} Examiner: Daniel Jung

isy_{, Linköpings universitet}

Division of Automatic Control Department of Electrical Engineering

(3)

Sammanfattning

När ett fel upptäcks i bilmotorn så lyser motorlampan. Då är det ofta upp till mekanikern att diagnostisera felet på motorn. Manuell klassificering av en meka-niker kan vara tidskrävande och dyrt. Teknologiska framsteg har givit oss över-väldigande datorkraft som kan användas för att diagnostisera fel med datadrivna klassificerare. Datadrivna klassificerare kräver generellt sett väldigt mycket trä-ningsdata för att kunna diagnostisera systemfel med precision genom att jämföra sensordata med träningsdata. Detta eftersom att det krävs träningsdata från en stor mängd olika realiseringar från samma typ av fel. I denna studie föreslås en algoritm som inte kräver uppmärkt träningsdata, det vill säga träningsdata där korrekt typ av fel är dokumenterat. Istället klustrar den föreslagna algoritmen feldata genom att kombinera en systemmodell ochunsupervised learning i form

av en modifierad version avGaussian mixture model och Expectation Maximization.

Om ett eller flera av dessa felscenarier blir diagnostiserade vid ett senare tillfälle, är det sannolikt att resterande felrealisationer i samma kluster har samma feldia-gnos.

Den modifierade versionen avGaussian mixture model som föreslås i den här

stu-dien tar hänsyn till att residualdata, i vissa fall, inklusive fallet i denna studie då data samlats från en förbränningsmotor, verkar divergera från det nominella fallet (datapunkter nära origo) längs en ungefärligt linjär bana när felstorleken ökar. Detta tas hänsyn till genom att modellera klustren som gaussiska fördel-ningar runt felvektorer som representerar den bana som residualdatat divergerar längs när felstorleken ökar. Algoritmen tar dessutom hänsyn till att data upp-mätt vid ett och samma scenario sannolikt tillhör samma felklass, alltså är det inte nödvändigt att klustra varje datapunkt separat. Istället kan data från ett och samma scenario ses som en grupp som klustras tillsammans.

Den här studien utvärderar dessutom den föreslagna modellen som ensemi-supervised learner där viss data är känt. I det fallet kan algoritmen dessutom användas för

att estimera felstorlek på de okända felen genom att använda de beräknade fel-vektorerna, givet att det finns kända felstorlekar för data i samma kluster. Algoritmen utvärderas med data insamlat från en motortestbänk som använder en kommersiell volvomotor och visar lovande resultat där de flesta felrealisering-ar klustras korrekt. Dock visfelrealisering-ar resultaten att det finns tvetydigheter för data med små felstorlekar eftersom att denna typ av data liknar det nominella fallet och därmed överlappar mer med data från andra felklasser.

(4)

(5)

Abstract

When a fault is detected in the engine, the check engine light will come on. After that, it is often up to the mechanic to diagnose the engine fault. Manual fault classification by a mechanic can be time-consuming and expensive. Recent tech-nological advancements have granted us immense computing power, which can be utilized to diagnose faults using data-driven classifiers. Data-driven classifiers generally require a lot of training data to be able to accurately diagnose system faults by comparing sensor data to training data because labeled training data is required for a wide variety of different realizations of the same faults. In this study an algorithm is proposed that does not rely on labeled training data, in-stead the proposed algorithm clusters similar fault data together by combining an engine model and unsupervised learning in the form of a modified Gaussian mixture model using Expectation Maximization. If one or more of the fault sce-narios in a cluster is later diagnosed, the rest of the data in the same cluster is likely to have the same diagnosis.

The modified Gaussian mixture model proposed in this study takes into account that residual data, in some cases including the case in this study when the data is from an internal combustion engine, seem to diverge from the nominal case (data points near the origin) along a linear trajectory as the fault size increases. This is taken into account by modeling the clusters as Gaussian distributions around fault vectors that each represent the trajectories the data moves along as the fault size increases for each cluster or fault mode. The algorithm also takes into ac-count that data from one scenario are likely to belong to the same fault class i.e. it is not necessary to classify each data point separately, instead the data can be clustered as batches.

This study also evaluates the proposed model as a semi-supervised learner, where some data is known. In this case, the algorithm can also be used to estimate the fault sizes of unknown faults by using the acquired fault vectors, given that there are known fault sizes for other data in the same cluster.

The algorithm is evaluated with data collected from an engine test bench using a commercial Volvo engine and shows promising results as most fault scenarios can be correctly clustered. However, results show that there are clustering ambi-guities for data from small faults, as they are more similar to the nominal case and overlap more with data from other fault modes.

(6)

(7)

Acknowledgments

I would like to give a special thanks to my supervisor Max Johansson who has pro-vided invaluable feedback and constructive criticism when brainstorming how to proceed with the study. I would also like to give a special thanks to my examiner Daniel Jung who has also provided great feedback and helped lay the foundation for this study.

(8)

Notation xi

1 Introduction 1

1.1 Motivation . . . 2

1.1.1 Clustering . . . 3

1.1.2 Fault size estimation . . . 3

1.2 Aim . . . 4 1.3 Research questions . . . 5 1.4 Delimitations . . . 5 1.5 Methodology . . . 5 1.6 Thesis outline . . . 6 2 Theory 7 2.1 Gaussian Mixture Model . . . 7

2.1.1 Expectation Maximization for GMM . . . 7

2.2 Principal Component Analysis . . . 11

3 Method 13 3.1 Proposed algorithm . . . 13

3.2 Estimating number of clusters . . . 18

3.3 Fault size estimation . . . 18

4 Results 21 4.1 System and Data . . . 21

4.2 Clustering . . . 23

4.2.1 Unsupervised learning . . . 23

4.2.2 Semi-supervised learning . . . 27

4.2.3 Supervised learning . . . 34

4.3 Estimating number of clusters . . . 36

4.4 Fault size estimation using labeled data . . . 37

5 Discussion and conclusion 41 5.1 Results . . . 41

(9)

Contents ix

5.2 Method . . . 42 5.3 Future work . . . 43

(10)

(11)

Notation

Fault modes

Notation Meaning

fpim Fault in intake manifold pressure sensor

fpic Fault in intercooler pressure sensor

fwaf Fault in air filter air flow sensor

fiml Leakage in intake manifold

Abbreviations

Abbreviation Meaning

NF No fault mode

WLTC Worldwide harmonized Light-duty vehicles Test Cy-cles

GMM Gaussian Mixture Model EM Expectation Maximization

(12)

(13)

1

Introduction

Technical systems, e.g. engines fail, and when they do, the fault should be identi-fied and dealt with as soon as possible to reduce the risk of further damage to the system and to ensure that the system remains safe to operate. Historically it has been up to experts to diagnose faults manually. However, with today’s techno-logical advancements, especially in computational power, fault diagnosis using computers can give important clues to aid the experts e.g. a mechanic when diag-nosing the fault. Supervised machine learning can be used to train a data-driven model to diagnose the system fault. However, as faults are rare events and the actual fault might not be known, or the fault label in the training data might be incorrect, correctly labeled training data is scarce. In unsupervised learning, the point is to train a model that can recognize patterns of individual faults without any labels for the data. There are also cases where only some of the training data comes from known fault cases where the goal is to predict the class of data that comes from unknown fault cases, which is called semi-supervised learning. In model based diagnosis, residuals comparing model predictions with sensor data, are used to detect anomalies caused by faults in the system. Data driven fault classification uses previously collected data to model the relationship be-tween a set of features such as residuals to fault classes so that the model can later be used to determine the type of fault that new fault data belongs to when a new fault occurs.

The diagnostic problem is further complicated by uncertainties in the data and ambiguities in the diagnoses since there are often several fault hypotheses that can explain a set of residual outputs. Furthermore, different realizations of the same fault class can have other variations such as different fault sizes, which means there are seldom obvious differences between different fault classes. But

(14)

at the same time, by plotting and looking at the residual data for different faults, it is visible that the residuals for different faults seem to spread out differently in space which could give useful information to be able to cluster data.

1.1 Motivation

Two common approaches to fault diagnosis are model-based diagnosis [5] and data-driven diagnosis [20]. There is a wide variety of papers discussing fault di-agnosis using machine learning. Data-driven didi-agnosis uses data collected from previous fault occurrences to classify new faults. A limitation of this approach is that correctly labeled training data can be scarce, and without a good system model, a tremendous amount of data could be required to cover all operating points as well as a wide variety of fault severities. Model-based diagnosis can overcome the issue with different operating points since a good model will take the operating point into consideration. Simply put, residual generators compare sensor data to the model estimations of the sensor values. A common approach in model-based diagnosis is to match sets of triggered residual generators to certain fault hypotheses. A problem with this approach is that there can be several diag-noses that can explain the triggered residuals. In [11] a hybrid diagnosis system combining model-based and data-driven fault diagnosis is designed to identify known faults and to identify unknown faults i.e. faults that are not represented in the training data. A literature review discussing the bridging between model-based diagnosis and data-driven diagnosis for fault diagnosis can be found in [14].

Unsupervised learning, contrary to supervised learning, is able to cluster unla-beled data together. There are different types of unsupervised learning methods. One of the simplest unsupervised learning methods is k-means clustering [16]. Another more advanced method is Gaussian mixture models [7, 18]. K-means clustering is simple because it only looks at the distance between data points and the cluster mean. Gaussian mixture models however models the probability that a data point belongs to a certain cluster by using a sum of Gaussian distributions. Not only can this describe the clusters more accurately, it also has the advantage of returning the probability that a certain point belongs to a certain cluster. In this study, a modified version of the Gaussian mixture model using Expec-tation Maximization is proposed that is developed to take into account that data from different fault modes seem to spread out in a certain direction in the resid-ual space, as well as the fact that residresid-uals usresid-ually are time-series data. By using this method and taking these observed data features into account, unlabeled data can quite successfully be grouped together based on the underlying fault mode without any labeled data.

(15)

1.1 Motivation 3

In [22], a vector clustering technique has been developed to be able to classify faults of varying fault sizes in heat pumps using limited known training data. They note that data from a certain fault mode moves away from the nominal case along a linear trajectory in space as the fault size varies. The same phenomenon can be seen in the residual data from the internal combustion engine used in this study, see Figure 1.1.

1.1.1 Clustering

A common approach to data-driven diagnosis is to use a supervised learner to classify different fault modes. With a supervised learner, it is a necessity that there is correctly labeled training data. To classify a certain fault mode with a certain severity, most approaches need labeled training data from a similar fault severity to correctly classify new data. Residual data for different severities for the fault mode fpic can be seen in Figure 1.1. The residual data for small fault

sizes are close to the nominal case in the origin, whereas data from larger fault sizes seem to diverge more from the nominal case, but on a linear trajectory in a certain direction. The same phenomenon can be seen when inspecting the resid-ual data from other fault modes as well. To overcome the necessity of labeled training data, an unsupervised clustering algorithm will be used, more specifi-cally a modified version of the Gaussian mixture model using Expectation Maxi-mization, and to reduce the need for many different fault severities in the training data the algorithm will take into account the fact that the residual data from a certain fault mode seem to diverge from the nominal case on an approximately linear trajectory regardless of fault severity.

The residual data used in this study, which can be seen in Figure 4.1, is close to the origin in the nominal case. Because the fault realizations diverge from the nominal case as the severity increases, there will be a significant overlap between different fault modes, especially when the fault sizes are small. The Gaussian mixture model can model the probabilities that a data point belongs to each clus-ter. By using these calculated probabilities and taking the joint probability that every data point in a measurement file belongs to each cluster, the most proba-ble cluster should be the one where the joint probability is maximized. This is utilized in the proposed algorithm in Chapter 3.

1.1.2 Fault size estimation

When a car is in operation and a fault is detected, it can be of great importance for the safety of the driver as well as the longevity of the vehicle that the fault size or severity of the fault is estimated. It could be a severe fault that greatly impacts the performance of the vehicle and safety of the driver and passengers that require the vehicle to be stopped immediately, or it could be a less severe fault where the vehicle is safe to drive to the mechanic but should be fixed as soon as possible, or a minor fault that should be looked over but is not urgent. Fault size estimation is relevant for tracking component degradation and prognostics where the objective

(16)

Figure 1.1:Fault mode fpicwith different severities

is to predict remaining useful life. Fault size estimation is a fairly unexplored subject but has been studied in, for example, [2] and [4] on bearings. Since these have been conducted on bearings, the approach is not applicable to the case of internal combustion engine data in this study. The fault vectors, estimated in the clustering algorithm, are used to estimate the direction of the data spread of a certain fault mode as the fault size varies. The estimated fault vectors could be utilized to estimate the fault size or severity of the detected fault, as long as there is some collected data from the same fault mode with a known fault size or severity for comparison.

1.2 Aim

The objective of this thesis is to develop a method to cluster residual data from an internal combustion engine based on fault modes without having to provide labeled training data. Another goal is to explore the possibilities of using the obtained clusters to estimate the fault sizes of unknown faults provided that some of the data used for clustering is correctly labeled and with known fault sizes.

(17)

1.3 Research questions 5

1.3 Research questions

The following research questions have been selected to address the problems de-scribed in Section 1.1:

1. Can the Gaussian mixture model be modified such that it can be used to cluster residual data from different fault scenarios based on fault modes? 2. Is it possible to use time series information to increase separation between

fault classes by for example using batch data instead of analyzing one sam-ple at a time?

3. Can the fault size be estimated using the obtained model?

4. Is it possible to formulate an algorithm that can estimate the number of clusters?

The proposed model following these researching questions will be validated us-ing an engine case study.

1.4 Delimitations

The following delimitations are set for this study: • Only single fault modes will be considered.

• Residuals are generated using an existing model and no further modeling will be performed.

• The available training data is assumed to be correctly labeled and have ac-curate documented fault sizes.

1.5 Methodology

This study is conducted as part literature study and part experimental work where an algorithm for residual fault clustering is designed. The literature study part of the study is intended to motivate the experimental work and describe some central concepts related to the experimental work. The experimental work will originate from the conventional GMM and EM algorithm, and step by step be modified to better handle residual data. After each step, the model will be validated using real data from a test bench. Finally, the study is concluded with a discussion about the results.

(18)

1.6 Thesis outline

The continuation of the thesis will be divided into the following chapters: • Chapter 2 will introduce some of the central concepts used to design the

proposed method.

• Chapter 3 presents a proposed method to answer the research questions in Section 1.3.

• Chapter 4 presents the results from the engine case study evaluating the proposed method.

• Chapter 5 discusses the results, methodology, future work, and answers the research questions.

(19)

2

Theory

This chapter introduces some of the central concepts in this study. The imple-mentation using these methods is described in Chapter 3.

2.1 Gaussian Mixture Model

Gaussian mixture model is a model to cluster data for fault diagnosis that has been used in e.g. [19] and tested on a rotary machine. This thesis will use the same method with some modifications. Instead of clustering around a center-point, the points should be clustered around a vector. Each vector should then update iteratively by using the Expectation Maximization algorithm and in each step, update the direction of the vector using the points that are closer to this vector than any of the other vectors and update the covariance matrices of the distributions. These steps will be iterated until the vectors have converged or when a maximum number of iterations has been reached.

2.1.1 Expectation Maximization for GMM

Expectation Maximization is a popular tool for simplifying difficult maximum likelihood problems [8] and it is especially popular for Gaussian mixture models, which will be used in this thesis. The goal is to maximize the Gaussian mixture model likelihood function with respect to the parameters (means, covariances and mixing proportions). The Expectation Maximization algorithm can be de-scribed by the following pseudocode for Gaussian mixture models, see for exam-ple [3], [8]:

1. Define a sample x ∈ RM, where M is the number of residuals. Initialize the means µk ∈ RM, covariances Σk ∈ RM×M and mixing proportions πk ∈

(20)

[0, 1], where k = 1, ..., K is cluster index, and evaluate the initial value of the log likelihood ln p(X|µ, Σ, π) = N X n=1 ln        K X k=1 πkN(xn|µk, Σk)        (2.1) 2. Expectation step:

Evaluate the responsibilities that a sample xn belongs to a given cluster k

using the current parameter values of the Gaussian distributions

γk,n=

πkN(xn|µk, Σk)

PK

j=1πjN(xn|µj, Σj)

. (2.2)

where N (xn|µk, Σk) is the likelihood of xn computed from a multivariate

normal distribution with mean µk and covariance Σk, and πk is the mixing

proportion for cluster k. The responsibilities γk,n are scalars in the range

[0, 1] and

K

X

k=1

πkN(xn|µk, Σk) = 1 (2.3)

for all n. The responsibilities represent how much responsibility each clus-ter has for each sample n, or in other words the responsibilities are a mea-sure of how well each cluster fits each data point n. The responsibilities are then used as weights in the maximization step.

3. Maximiziation step:

Re-estimate the parameters of the Gaussian distributions using the current responsibilities µnew_k = 1 Nk N X n=1 γk,i· xn (2.4) Σnew_k = 1 Nk N X n=1

γk,i· (xn−µnewk ) · (xn−µnewk )T (2.5)

π_knew= Nk N (2.6) where Nk = N X n=1 ln        K X k=1 πkN(xn|µk, Σk)        (2.7) 4. Evaluate the log likelihood (2.1) and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.

(21)

2.1 Gaussian Mixture Model 9

This algorithm will converge to a local optimum in the log-likelihood, but is not guaranteed to find a global optimum [17] [9], hence the algorithm should be per-formed many times with different initializations of the parameters. The parame-ters from the run with the highest log-likelihood (2.1) is then selected.

An illustrative example of how the algorithm could look will be presented be-low. In Figure 2.1 an initialization of the algorithm on some data is shown where there seems to be two separate Gaussian distributions. The distributions are ini-tialized randomly and even though they are overlapping and quite similar, their means and covariances are different which will affect the responsibilities. The responsibilities for the upper left points will be larger for the green distribution than for the blue distribution.

Figure 2.1:Step 1: Initialization of GMM.

In Figure 2.2 new means and covariances has been calculated using the new re-sponsibilities and the mean of the green distribution has shifted more towards the points in the upper left corner since the responsibilities for those points are larger for the green distribution as mentioned previously. Similarly, the mean of the blue distribution has shifted toward the points in the lower right corner.

(22)

Figure 2.2:Step 2 of EM GMM.

New responsibilities are calculated once again and in Figure 2.3 the means and covariances are calculated again and the algorithm has converged to a solution which seems to fit the two distributions quite well.

(23)

2.2 Principal Component Analysis 11

2.2 Principal Component Analysis

The principal components of a collection of data x1, ..., xN are a set of orthogonal

vectors that span the space RMin which the data points reside, just like the stan-dard basis. The difference is that the direction of the first principal component is chosen so that it matches the direction in which the data have the largest variance. The second principal component is chosen the same way but has to be orthogonal to the first principal component. The third principal component has to be orthog-onal to the first and second principal component and chosen the same way, and so on.

Principal Component Analysis computes the principal components of the data and is often used to perform a change of basis on the data and only using the first few principal components that explain a certain amount of variance in the data and discard the rest of the principal components. This can be a useful tool to reduce the dimensionality of the data but still explain as much of the variance as possible. An example of principal components in two dimensional data is shown in Figure 2.4.

Figure 2.4:Example of PCA displaying the two principal components for a set of data points.

In this study, weighted Principal Component Analysis is used to extract the fault vector as the first principal component. The difference is that each data point has a weight that specifies how much that data point should contribute to the principal components. For more on weighted Principal Component Analysis, see [6] [1].

(24)

(25)

3

Method

In this chapter, an algorithm will be proposed for clustering unknown faults and estimating unknown fault sizes given some known faults. Data from unknown faults are assumed to have an unknown fault mode and an unknown fault size. Data from known faults are assumed to have a known fault mode and in most cases a known fault size. The proposed algorithm is based on the multivariate GMM using an EM algorithm. To increase clustering performance, the changes made from the standard GMM and EM algorithm are made to take into account the observed linear spread of different fault modes as the fault size varies as well as the fact that the data is time-series data where each scenario represents a cer-tain fault instead of looking at individual data points.

3.1 Proposed algorithm

In this section the algorithm for this study will be proposed. Before going into de-tails, an illustrative example of regular GMM as well as the proposed algorithm is shown. Figure 3.1 shows some data surrounded by ellipses/ellipsoids. The data shown is artificial, but could represent, for example, residual data from an inter-nal combustion engine where two faults of different fault sizes are present. Data from different fault sizes might not overlap if the difference is fault size is large enough, just like for the data in Figure 3.1. The ellipsoids are there to show differ-ent clusters that could be found by regular GMM introduced in Chapter 2. GMM models data as a mixture of Gaussian distributions, and since data from differ-ent fault sizes can be separated, GMM could model each fault size as a separate Gaussian distribution.

(26)

Figure 3.1:Plot of data with ellipses/ellipsoids showing how a regular GMM could cluster this type of data.

Figure 3.2 shows the initialization of the proposed algorithm where the clusters are Gaussian distributions around a fault vector. The algorithm described later in this chapter is then used to iteratively update the clusters until convergence. An example of a desirable result from this algorithm when it has converged is shown in Figure 3.3 where the clusters cover the data on a linear trajectory.

(27)

3.1 Proposed algorithm 15

Figure 3.3:Vectorized GMM has converged.

The algorithm in its most general form when used as an unsupervised learner is described step by step below. This can be compared to the standard Gaussian mixture model using Expectation Maximization described in Chapter 2.

1. Take initial guesses for parameters ˆvk ∈ RM(fault vector for each cluster k),

ˆ

Σ_k (covariance matrix for each cluster k), ˆπk (mixing proportions for each

cluster k).

• Initial guesses for fault vectors ˆvkare constructed by randomly

choos-ing k points from the data and form k fault vectors from the origin to these points.

• The residual data as well as the fault vector are then projected onto the hyperplane perpendicular to the fault vector. Multiplying the residual matrix D and the fault vectors ˆvk with the transformation matrix

A =hN ( ˆvk)T vˆk

i−1

(3.1) where N ( ˆvk) is the nullspace of ˆvk, yields the residual matrix as well

as the fault vectors in a coordinate system where the last component is in the direction of fault vector ˆvk, and the rest of the components are

perpendicular to fault vectors ˆvk

ˆ

vk· A (3.2)

D · A (3.3)

Discarding the last component of the fault vectors and the last columns of the residual matrix therefore yields the coordinates of the fault vec-tors as well as the residual matrix projected onto the hyperplane per-pendicular to the fault vector where the former one (fault vector pro-jection) corresponds to the cluster mean ˆµk.

(28)

• Initial guesses for the covariance matrices ˆΣ_kare set to the overall sam-ple variance in the hyperplane orthogonal to fault vector k.

• Initial guesses for the mixing proportions ˆπkare each set to 1/K where

K is the number of fault vectors (clusters).

2. Expectation Step: compute the responsibilities ˆ γi,k= ˆ πkN(yi,k|µˆk, ˆΣk) PK j=1πˆjN(yi,j|µˆj, ˆΣj) , i = 1, 2, ..., N . k = 1, 2, ..., K. (3.4) The responsibilities ˆγi,k are proportional to the probabilities that point i

belongs to cluster k given the current model. 3. Calculate batchwise responsibilities:

The responsibilities calculated in the previous step are calculated for each sample. However, it is known that the residual data comes from different scenarios where each scenario contains measurements taken over a period of time i.e. residual data from a certain scenario should represent a certain fault or combination of faults. To take this information into consideration, batchwise responsibilities are calculated by calculating the joint probability for several measurements of a user specified batch size as

ˆ γb,k = Q i∈b( ˆγi,k) PK j=1 Q i∈b( ˆγi,j) (3.5) where b is a batch and then normalizing so that the sum of the responsibili-ties over all clusters k for a batch is 1. To calculate the joint probability that each data point belongs to cluster k, the batch size should be set to the num-ber of data points in one scenario. A problem with such a large batch size is that the joint probabilities easily become too small for the computer to handle. A solution to this is to reformulate (3.5) by using the logarithm to reduce the exponential value of both the numerator and denominator using

Y i∈b ˆ γi,k= exp        log        Y i∈b ˆ γi,k               = exp        X i∈b log ˆγi,k        = = exp        X i∈b log ˆγi,k − B        · exp(B) ⇒ ˆγb,k = = exp P

i∈blog ˆγi,k − B · exp(B)

PK j=1 expP i∈blog ˆ γi,j −_B_{· exp(B)} = = exp P

i∈blog ˆγi,k − B

PK j=1 expP i∈blog ˆ γi,j −_B (3.6) and by choosing B as B = max {_j∈K}        X i∈b logγˆi,j       

(29)

3.1 Proposed algorithm 17

the numerator is forced to the interval [0, 1] and the denominator to the interval [1, K], hence this becomes a feasible problem for the computer to solve as long as the absolute value of B does not become too large for nu-meric computations. If B becomes too large, each fault scenario could in-stead be divided into several smaller batches to overcome this issue. 4. Maximization step: Compute fault vectors and covariance matrices

Fault vectors are calculated using weighted PCA with the calculated respon-sibilities as weights. The covariance matrices are calculated as

ˆ Σk = PN i=1γˆi,k· x_i,kT −_µ_ˆ_k_·_xT i,k−µˆk T PN i=1γˆi,k (3.7) 5. Evaluate the log-likelihood:

Calculate the log-likelihood using

ln p(X|µ, Σ, π) = N X n=1 ln        K X k=1 πkN(xn|µk, Σk)        (3.8)

and check for convergence of the log-likelihood. If the convergence crite-rion is not satisfied return to step 2. If the convergence critecrite-rion has been satisfied, save the log-likelihood, fault vectors, mixing proportions and co-variance matrices. If the predetermined maximum number of outer itera-tions has not been reached, return to step 1, otherwise move to the next step.

6. Choose the model parameters from the outer iteration with the highest log-likelihood.

There are two loops in the algorithm, one inner loop and one outer loop. The pur-pose of the inner loop is to update the parameters until convergence. Since the log-likelihood of mixture models is prone to have many different local maxima [17] [9], the outer loop serves the purpose of finding different local maxima by ini-tiating each outer loop iteration with different parameters. When the predefined number of maximum outer loop iterations has been reached, the model is chosen as the model with the highest log-likelihood out of all outer-loop iterations. When using the algorithm as a semi-supervised learner there are some modifi-cations to the algorithm described above. The known data for each fault mode is permanently bound to a certain cluster k with responsibility ˆγk = 1 and clusters

that contain known data have their fault vectors initialized as a randomly cho-sen data point in the known data to get a fault vector that lines up well with the known fault mode.

(30)

mixing proportions are not reliable. Hence, another modification of the algo-rithm described above, that will be evaluated, is when the mixing proportions ˆπk

are disregarded and set constant to 1/K. This will calculate the likelihood that a point belongs to a certain cluster without taking into account how many data points currently belong to each cluster. When constantly updating the mixing proportions ˆπk the clustering result will be highly dependent on how much data

is available for each fault mode. Consider a case where there is training data from two fault modes where 1000 data points are available from fault mode 1 and 9000 data points are available from fault mode 2. This would result in the mixing proportions ˆπ1 = 0.1 and ˆπ2 = 0.9, which would significantly decrease

the probability that data from fault mode 1 is correctly clustered, even though it fits better with its corresponding cluster. For this reason it remains interest-ing to investigate this modification of the original Gaussian mixture model and compare its performance to when ˆπkis updated each inner iteration.

3.2 Estimating number of clusters

Determining the number of clusters is one of the core problems when clustering data. Increasing the number of clusters will result in overfitting. The Bayesian Information Criterion (BIC) is a statistical measure for model selection that will be used in this study [12]. Generally the first local minimum is a good estimate for model selection, but when it comes to clustering data with mixture models it is not sufficient because the log-likelihood monotonically increases with the number of clusters. A more robust method is to find the knee of the BIC-curve [21] which is used in this study. The algorithm is run from one cluster up to a user defined number of clusters to get a BIC-curve to estimate the number of clusters. There are many ways to determine the knee point, in this study it is determined by analyzing how the BIC changes for each number of clusters and find the change point using root mean square which is a built-in function in, for example, MATLAB.

3.3 Fault size estimation

The proposed algorithm in Section 3.1 finds vectors that matches the direction in which each fault mode propagates as the fault size varies. This is taken advantage of in the fault size estimation of this study. By looking at the data in Figure 1.1 it seems that data for a certain fault mode moves along a trajectory as the fault size varies. It even looks like the fault size is linearly dependant on the divergence in the fault vector direction which is why a simple linear model has been deemed sufficient. In this study, the proposed algorithm is evaluated as an unsupervised learner, semi-supervised learner as well as a supervised learner. In the case of an unsupervised learner, there are no labels and no fault size data available, hence the fault size cannot be estimated in that case. Instead, the fault size is estimated

(31)

3.3 Fault size estimation 19

when the algorithm is used as a semi-supervised learner to evaluate the fault size estimation performance, and as supervised learner to validate that a linear fit is sufficient for the available data.

(32)

(33)

4

Results

This chapter presents the results from the experiments evaluating the proposed algorithm. First, a description of the residual data is given in Section 4.1. Then, the proposed clustering algorithm is evaluated in Section 4.2. Finally, the perfor-mance when estimating the number of clusters and for the fault size estimation, are evaluated in Sections 4.3 and 4.4 respectively. The analysis and discussion of the results is left for the discussion in Chapter 5.

4.1 System and Data

The residual data used in this study consist of observations from the nominal case where no fault is present as well as four different fault modes, more specifically three sensor faults as well as leakage in the intake manifold. Since the severity of a fault can vary, for each of these fault modes, data has been collected for differ-ent severities. The fault modes and their corresponding severities in the available data are shown in Table 4.1, where there are two data sets with unknown fault sizes. The data used in this thesis is collected from an engine test bench using a commercial four cylinder, turbo charged internal combustion engine from Volvo. The sensors used in this study are the same sensors available in a commercial setup of the engine. The data is generated using a WLTP (Worldwide Harmo-nized Light-Duty Vehicles Test Procedures) cycle using a driver model and a ve-hicle model to follow the cycle. The WLTP cycle covers different operating points of the engine which is an important aspect to be able to cluster faults regardless of operating point. The WLTP cycle is described in more detail in [15]. Further-more, the residuals used in this study are generated using a set of physically-based Grey-Box Recurrent Neural Networks which are described in detail in [10]. After residual generation, there are nine residuals that are used in this study i.e.

xn∈ R9. All data is downsampled by a factor of 10 to decrease the time taken to

(34)

perform the clustering. This factor was chosen because it significantly decreases the time taken while keeping the clustering performance at a good level. Three of the nine dimensions in the original residual data are shown in Figure 4.1 and the downsampled residuals can be seen in Figure 4.2.

Figure 4.1:The original residuals plotted in the first three dimensions.

fi Description of fault mode Severities θ

fpic Sensor fault intercooler pressure -20,-15,-10,-5,5,10,15 (%)

fpim Sensor fault intake manifold pressure -20,-15,-10,-5,5,10,15 (%)

fwaf Sensor fault air flow in air filter -20,-15,-10,-5,5,10,15,20 (%)

fiml Intake manifold leakage 4mm, 6mm, ?, ?

N F No fault 0

Table 4.1:Available data from different fault realizations and different sever-ities. The question marks in the severities represents data that has an un-known fault size.

(35)

4.2 Clustering 23

Figure 4.2:The downsampled residuals plotted in the first three dimensions.

4.2 Clustering

In this section the proposed clustering algorithm is evaluated given that the num-ber of clusters is known. First off, as a baseline for comparison, unsupervised clustering using conventional GMM is evaluated followed by sample-by-sample clustering using the proposed algorithm. Then the algorithm is evaluated as an unsupervised learning with updating mixing proportions and with fixed mixing proportions. Finally, the results are presented for the semi-supervised learning and supervised learning variants of the algorithm in Sections 4.2.2 and 4.2.3. For the tests, a commercial desktop CPU with six cores has been used. The compu-tation time for the proposed algorithm as an unsupervised learner when all fault realizations in Table 4.1 are used was less than one minute when using fixed mix-ing proportions and 18 outer iterations. More outer iterations are required to obtain a desirable result when using updating mixing proportions. For the re-sults in Section 4.2.1, 3000 outer iterations were used, which took approximately an hour.

4.2.1 Unsupervised learning

This subsection shows the result when using the proposed algorithm for unsuper-vised learning. To further motivate the proposed algorithm by having something to compare the results to, the results using regular GMM and EM is first pre-sented for comparison.

(36)

Clustering using conventional GMM

The matching matrix for the regular GMM can be seen in Table 4.2 and a plot of the corresponding clusters can be seen in Figure 4.3. The regular GMM does not seem to find clusters that clearly separate the different fault modes since data from each fault mode is spread out over the different clusters.

fiml fpic fpim fwaf

Gr1 6437 1126 11187 12018

Gr2 2918 6323 8050 7151

Gr3 5 6611 3 1

Gr4 5068 10496 6109 9670

Table 4.2: Matching matrix for clustering using regular GMM. Data from all fault sizes for faults fpim, fpic, fwaf, fiml using the data in Figure 4.2 are

included in the clustering.

Figure 4.3:Clustering result when using regular GMM.

Sample-by-sample clustering

In Table 4.3, the result is presented for the proposed algorithm but when the clustering is done on individual data points instead of batchwise clustering to further motivate the use of batchwise clustering, the corresponding clusters can be seen in Figure 4.4. Once again, the clusters do not clearly separate the different fault modes as data from different fault modes are spread out over the different

(37)

4.2 Clustering 25

clusters. This can later be compared to the results of the proposed clustering algorithm using batchwise clustering.

fiml fpic fpim fwaf

Gr1 6413 1113 11146 11808

Gr2 6388 1782 9070 12348

Gr3 206 12600 805 1310

Gr4 1421 9081 4328 3374

Table 4.3: Matching matrix for clustering individual data points instead of batchwise clustering. Data from all fault sizes for faults fpim, fpic, fwaf, fiml

using the data in Figure 4.2 are included in the clustering.

Figure 4.4:Clustering result when clustering individual data points.

Proposed algorithm

Now that some baseline for comparison has been established, the results for the proposed algorithm using batchwise clustering will be presented. In the first ex-ample, estimated mixing proportions will be used. Table 4.4 displays how many realizations of each fault mode in Table 4.1 was assigned to each cluster in the

(38)

clustering algorithm when the data in Figure 4.2 was used. For easier compar-ison to previous results where the data was clustered sample-by-sample, each batch, or realization, consists of approximately 3600 data points. To be able to explain the results more clearly, a fault realization of fault mode fi is said to be

correctly clustered if it belongs to a cluster where the majority of the fault real-izations from fault mode fi were clustered, and that fault mode fi is the most

common fault mode in the cluster. Similarly, data of fault mode fi is said to be

incorrectly clustered if fault mode fi is not the most common fault mode in the

cluster or if the majority of realizations from fault mode fi are not in the same

cluster. With these definitions, there are two incorrectly clustered fault realiza-tions, more specifically fpim with severity θ = −5% which is the smallest fault

size in the available fpimdata, and fimlwith an unknown fault size. The fault size

of this fiml instance is estimated to have the smallest fault size of the available

fiml data in Section 4.4. Most data are correctly clustered. The resulting fault

vectors from the cluster algorithm plotted with the some of the data can be seen in Figure 4.5.

fiml fpic fpim fwaf

Gr1 3 0 0 0

Gr2 0 7 0 0

Gr3 0 0 6 0

Gr4 1 0 1 8

Table 4.4: Matching matrix for clustering data from all fault sizes for faults

fpim, fpic, fwaf, fiml using the data in Figure 4.2. The resulting log-likelihood

for these clusters is 2.526 · 106.

The result for the same example as above but with equal mixing proportions can be seen in Table 4.5 where only one fault instance fpimis clustered incorrectly

con-trary to previous example where two fault instances were clustered incorrectly. The plot of the data and fault vectors look very similar to the previous example and need not be included.

fiml fpic fpim fwaf

Gr1 4 0 0 0

Gr2 0 7 0 0

Gr3 0 0 6 0

Gr4 0 0 1 8

Table 4.5: Matching matrix for clustering data from all fault sizes for faults

fpim, fpic, fwaf, fiml using equal mixing proportions and the data in

(39)

4.2 Clustering 27 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 Res1 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Res2 Residuals

Figure 4.5:Clustering result with fault vectors for unsupervised learning.

4.2.2 Semi-supervised learning

To motivate the use of the proposed algorithm in Chapter 3 as a semi-supervised learner, this subsection presents results for different scenarios. The results when roughly half of the data is known and the rest is unknown is presented in Ta-ble 4.7, the corresponding matching matrix can be seen in TaTa-ble 4.6. Some of the tables in this section also include the fault size estimation θ which will be cov-ered more in Section 4.4. In Tables 4.8 and 4.9, the known/unknown faults are inverted compared to Table 4.7 to cover both cases of known/unknown for each fault. In both scenarios, the number of clusters is set to four.

Clustering with updating mixing proportions

In the case presented in Table 4.7 there are three incorrectly clustered fault in-stances, more specifically fiml with severities θ = [4mm, unknown] as well as

fpimwith severity θ = −5%. The incorrectly clustered instance of fimlwith an

un-known fault size is estimated to have the smallest fault size in the available data of fiml in Section 4.4. The incorrectly clustered realization of fpim is clustered

into the same cluster as fwaf with an estimated fault size of 0.132%. This

esti-mated fault size is considerably smaller than the smallest fault size of fwaf which

(40)

These three incorrectly clustered fault instances are all part of the known data and were in the correct clusters during the algorithm. However, when using the obtained model parameters to cluster, these fault realizations do not fit best with the correct clusters. This means that these fault realizations have contributed cor-rectly to the model, but still do not fit best to the cluster of the corresponding fault mode. To clarify, the results presented in Table 4.7 is the result when using the obtained model to cluster the fault realizations. This will be discussed further in Chapter 5. Since the incorrectly clustered faults are known and need not be clustered, the number of incorrectly clustered scenarios is actually zero in this case.

In Table 4.9 there are four misclassified fault instances, more specifically fpim

with severities θ = [−5%, +5%] and fwaf with severities θ = [−5%, +5%] i.e. the

smallest severities in the available data for these two fault modes. One realiza-tion of fpimis incorrectly clustered as fwaf with a fault size of −0.6571% which

is considerably smaller than the smallest fault size of fwaf which is 5%. This is

as previously mentioned a good indicator that the fault realization has been in-correctly clustered. The three other inin-correctly clustered fault realizations were all clustered into the same cluster as fiml. Since there is only one known fault

realization of fimlin this scenario, the fault size can not be estimated of the three

fault realizations that were incorrectly clustered into the same cluster as fiml.

fiml fpic fpim fwaf

Gr1 2 0 0 0

Gr2 0 7 0 0

Gr3 0 0 6 0

Gr4 2 0 1 8

Table 4.6: Matching matrix for semi-supervised learning using mixing pro-portions.

(41)

4.2 Clustering 29

fi θ fˆi θˆ Known/Unknown

fpim -20% fpim -20.54% Known

fpim -15% fpim -15.55% Unknown

fpim -5% fwaf 0.13% Known

fpim 5% fpim 5.08% Known

fpim 10% fpim 9.31% Unknown

fpic -20% fpic -20.04% Known

fpic -15% fpic -15.38% Unknown

fpic 5% fpic 4.50% Known

fpic 10% fpic 9.10% Unknown

fwaf -20% fwaf -19.76% Known

fwaf -10% fwaf -10.48% Unknown

fwaf 5% fwaf 5.01% Known

fwaf 10% fwaf 9.82% Unknown

fiml 4mm fwaf 2.68% Known

fiml 6mm fiml - Unknown

fiml - fwaf 0.79% Known

fiml - fiml - Unknown

Table 4.7: Results for semi-supervised learning using mixing proportions. The incorrectly clustered fault realizations are fpim, fiml, fimlwith fault sizes

−_{5%, 4mm and an unknown fault size, respectively.} _{The resulting} log-likelihood of the clusters is 2.520 · 106.

(42)

fiml fpic fpim fwaf

Gr1 2 0 1 0

Gr2 0 7 0 0

Gr3 0 0 5 0

Gr4 2 0 1 8

Table 4.8: Matching matrix for semi-supervised learning using mixing pro-portions.

fpim -5% fwaf -0.66% Unknown

fpim 5% fiml - Unknown

fwaf -5% fiml - Unknown

fwaf 5% fiml - Unknown

fiml 6mm fiml - Known

fiml - fiml - Known

Table 4.9: Results for semi-supervised learning using mixing proportions. The incorrectly clustered fault realizations are fpim, fpim, fwaf, fwaf with fault

sizes −5%, 5%, −5%, 5%, respectively. The resulting log-likelihood of the clusters is 2.528 · 106

(43)

4.2 Clustering 31

Clustering with fixed mixing proportions

In the previous examples, the mixing proportions were updated during the algo-rithm. This means that the algorithm takes into account not only how likely it is that a certain fault realization belongs to a certain fault vector and covariance matrix, but it also takes into account how many data points belong to each cluster. This means that the classification result will vary depending on how many fault scenarios are present for each fault mode. Below, an alternative method using the same scenarios as above will be evaluated, with the difference being that equal mixing proportions are used.

In Tables 4.10 and 4.11, only fpim with fault severity θ = −5% is incorrectly

clustered. As a recurring theme, it is the smallest fault realization in the avail-able data. In Tavail-able 4.13, the same fault and fault severity is misclassified. In both these cases, one realization of fpimis incorrectly clustered as fwaf and estimated

to have a much smaller fault size than the least severe fault of all fwaf realizations

which could be an indicator that the fault has been incorrectly clustered. fiml fpic fpim fwaf

Gr1 4 0 0 0

Gr2 0 7 0 0

Gr3 0 0 6 0

Gr4 0 0 1 8

Table 4.10:Matching matrix for semi-supervised learning using equal mix-ing proportions.

(44)

fpim -5% fwaf 0.01% Known

Table 4.11: Semi supervised clustering with equal mixing proportions. There is one incorrectly clustered fault realization, fpimwith fault size −5%.

Log-likelihood = 2.526 · 106

fiml fpic fpim fwaf

Gr1 4 0 0 0

Gr2 0 7 0 0

Gr3 0 0 6 0

Gr4 0 0 1 8

Table 4.12: Matching matrix for semi-supervised learning using equal mix-ing proportions.

(45)

4.2 Clustering 33

fpim -5% fwaf -0.62% Unknown

fwaf 5% fwaf 5.36 Unknown

Table 4.13: Semi-supervised clustering with equal mixing proportions. There is one incorrectly clustered fault realization, fpimwith fault size −5%.

(46)

4.2.3 Supervised learning

In this section, the results when all training data is known will be presented. This section mostly serves the purpose as a comparison to the unsupervised and semi-supervised learning sections, but also to show that the proposed algorithm in Chapter 3 can be used for supervised learning.

To motivate the use of this algorithm as a supervised learner, let us look at the case where the limited data in Table 4.14 is used for training a model and the rest of the available data is clustered using the obtained model. A model con-sisting of fault vectors and covariance matrices is created using the known data in Table 4.14. Using this model, the training data as well as rest of the data is classified and the result is presented in Table 4.15. Aside from the fault free data which is not included in the training data, only fpim(θ = −5%) and fiml with an

unknown fault size are incorrectly clustered as fwaf, these are the same two fault

realizations that were misclassified in the unsupervised learning in Section 4.2. As a common theme in the results, the two incorrectly clustered fault realizations are estimated to have fault sizes that are significantly smaller than the least severe fault of fwaf. fi θ fpim [-15%, +15%] fpic [-15%, +15%] fwaf [-15%, +15%] fiml [4mm, 6mm]

(47)

4.2 Clustering 35

N F 0 fwaf 2.45% Unknown

N F 0 fwaf -0.59% Unknown

fpim -5% fwaf 0.35% Unknown

fiml 4mm fiml 3.13mm Known

fiml 6mm fiml 6.39mm Known

fiml - fwaf 0.19% Unknown

fiml - fiml 5.88mm Unknown

Table 4.15:Results for supervised learning. The incorrectly clustered fault realizations are fpim, fiml with fault sizes −5% and an unknown fault size,

(48)

4.3 Estimating number of clusters

In this section the results from estimation of number of clusters is presented. In Figure 4.6 the results when all four faults are included in the clustering for both updating mixing proportions and equal mixing proportions are presented. The algorithm correctly guesses that there are four clusters in both cases. The clustering results from the run with updating mixing proportions can be seen in Table 4.4. In Table 4.5 the resulting clusters when equal mixing proportions are used is presented. 1 2 3 4 5 6 7 8 Number of clusters -5 -4.9 -4.8 -4.7 -4.6 BIC

106 BIC for all faults using updating mixing proportions

1 2 3 4 5 6 7 8 Number of clusters -5 -4.9 -4.8 -4.7 -4.6 BIC

106 BIC for all faults using equal mixing proportions

Figure 4.6: Estimation of number of clusters by finding a knee in the BIC-curve, which is shown as a red circle in this figure. Four fault classes are included.

In Figure 4.7, two examples of the estimation of number of clusters are shown when there are three different fault modes in the data and equal mixing pro-portions are used. The first example estimates the number of clusters when

fpim, fpicand fwaf are included in the data, whereas the second example

esti-mates the number of clusters when fpim, fpicand fiml are included in the data.

In both examples the estimations are correct as they estimate that there are three clusters. It might not be obvious to the eye why the knee point is estimated as 3 instead of, e.g., 5. This is discussed further in Chapter 5.

(49)

4.4 Fault size estimation using labeled data 37 1 2 3 4 5 6 7 8 Number of clusters -4.2 -4.1 -4 -3.9 BIC

106 BIC for fpim, fpic, fwaf

1 2 3 4 5 6 7 8 Number of clusters -3.5 -3.4 -3.3 -3.2 BIC

106 BIC for fpim, fpic, fiml

Figure 4.7: Estimation of number of clusters by finding a knee in the BIC-curve, equal mixing proportions with three different fault modes.

4.4 Fault size estimation using labeled data

In this section, fault size estimation results from different scenarios will be pre-sented, since the fault size estimation depends on what data is known. How the simple linear model used in this study fits to the data can be seen in Figure 4.8 where a linear model is fit between fault size and mean deviation in fault vec-tor direction for fpim, fpicand fwaf. Table 4.16 shows fault size estimations for

{N F, fpim, fpic, fwaf, fiml} when all known fault sizes are used for the estimations.

The no-fault data N F has four different fault size estimations, each one is es-timated using the fault mode {fpim, fpic, fwaf, fiml}. Fault size estimations from

other scenarios can be seen in Tables 4.7, 4.9, 4.11, 4.13 and 4.15. Fault size esti-mations for semi-supervised learning using updating mixing proportions can be seen in Tables 4.7 and 4.9. The estimations are quite accurate for correctly clus-tered faults, whereas incorrectly clusclus-tered faults are estimated to have very small fault sizes. Fault size estimation for semi-supervised learning using fixed mixing proportions can be seen in Tables 4.11 and 4.13. The same conclusions can be drawn for this scenario as in the previous scenario with updating mixing propor-tions. Finally, Table 4.15 shows that the fault size estimation is accurate even for supervised learning with limited training data, and that once again, incorrectly clustered faults are estimated to have very small fault sizes. To summarize, when the faults are clustered correctly, the fault size estimations are close to the actual fault size, and when the faults are clustered incorrectly, the estimated fault sizes are small which could be an indication that they have been incorrectly clustered.

(50)

Figure 4.8: Fitting linear model between fault size and mean deviation in fault vector direction.

(51)

4.4 Fault size estimation using labeled data 39 fi θ θˆ N F 0 [-0.36%, 0.18%, 2.47%, 0.57mm] N F 0 [1.00%, 0.29%, 0.46%, 0.99mm] fpim -20% -20.12% fpim -15% -15.51% fpim -10% -9.85% fpim -5% -4.35% fpim 5% 5.64% fpim 10% 9.57% fpim 15% 14.62% fpic -20% -19.92% fpic -15% -15.34% fpic -10% -9.70% fpic -5% -4.47% fpic 5% 4.68% fpic 10% 9.35% fpic 15% 15.49% fwaf -20% -19.80% fwaf -15% -14.99% fwaf -10% -10.50% fwaf -5% -5.14% fwaf 5% 5.83% fwaf 10% 10.33% fwaf 15% 14.25% fwaf 20% 20.03% fiml 4mm 3.13mm fiml 6mm 6.39mm fiml - 1.63mm fiml - 5.88mm

Table 4.16:Fault size estimation when all known fault sizes in θ are used to estimate fault size. Estimation for NF is divided into four categories depend-ing on which fault mode is used to estimate the fault size of NF

(52)

(53)

5

Discussion and conclusion

This chapter discusses the results, method and some future work. The proposed algorithm has been customized to take characteristics of residual data and diag-nosis into account. Customizing the algorithm for specific purposes makes it less general, but can significantly improve the clustering performance when applied to data with the specific characteristics it has been designed for. The customiza-tions made also makes it easier to estimate the fault sizes of the fault realizacustomiza-tions.

5.1 Results

The goal of this study was to develop a method that can group unlabeled residual data together based on fault modes to battle the fact that labeled data can be diffi-cult and time consuming to generate, and the fact that someone could potentially label the data incorrectly. The proposed modified Gaussian mixture model takes the residual behaviour as well as time series data into account and shows promis-ing results as only some of the least severe realizations of the fault modes are clustered incorrectly. Why only the smaller fault sizes are clustered incorrectly can probably be explained by the fact that data from smaller fault realizations are closer to the nominal case and have a large overlap with other fault modes. The results of the proposed algorithm are nonetheless significantly better than the results from regular GMM in Table 4.2 and the results from the proposed algorithm using batchwise clustering is also significantly better than the results from clustering each data point separately in Table 4.3.

As mentioned the results are promising, however the tests in this study have been performed with a limited amount of data all collected from the same en-gine where data from different fault scenarios are collected using the same drive cycle and the same engine test bench. To further test the performance of this

(54)

algorithm it should be tested with data from different drive cycles as well as data from different engines and engine models. The clustering results could also be affected if other residual generators are used, but residual generation is not part of this study.

The fault size estimation also seems accurate, this is however expected since the relationship between fault size and mean cluster position in the fault vector di-rection looks very close to linear in Figure 4.8. The fault size estimation for fiml

can not be validated as there are only two available fault sizes. As such, only the three sensor fault size estimations can be validated, and it could very well be that other types of faults do not have a linear relationship between fault size and mean cluster position in the fault vector direction, this is something that could be investigated in the future with more data from other types of faults.

The clustering using fixed mixing proportions seems more accurate than the ver-sion with updating mixing proportions and is also less sensitive to when there is different amount of data for different fault modes. It also requires significantly fewer (10x-100x) outer iterations to converge to a good optimum. The estimation of number of clusters is then very slow for the version with updating mixing pro-portions as the algorithm has to be run for K = [1, N ]. For context, when using a standard desktop CPU with six cores, the computation time is approximately one minute for the scenario in Table 4.5 where equal mixing proportions are used, and the computation time for updating mixing proportions in Table 4.4 is over ten minutes. The estimation of number of clusters has been performed with both updating mixing proportions and equal mixing proportions, but most examples are with equal mixing proportions because that approach finds a good solution faster and is, therefore, less time-consuming.

5.2 Method

In the following section the research questions in Section 1.3 will be revisited to see that they are answered with the proposed algorithm, but also to criticize the method used to solve the questions.

1. Can the Gaussian mixture model be modified such that it can be used to cluster residual data from an internal combustion engine based on fault modes?

Yes, the proposed algorithm can cluster data based on fault modes quite successfully. However, the data used for validation is limited and all comes from the same drive cycle and the same engine using the same engine model. This means it has only been validated that it seems to work well on the setup used in this study, but the results could be different for other data. The chosen drive cycle is designed to cover a wide range of operating points, but it would be interesting to validate the proposed algorithm on another data set.