
Self-organizing maps for virtual sensors, fault detection and fault isolation in diesel engines

Master's thesis performed in Automatic Control at Linköping Institute of Technology

by

Conny Bergkvist and Stefan Wikner

Reg nr: LiTH-ISY-EX--05/3634--SE

Supervisors: Johan Sjöberg (LiTH), Urban Walter (Volvo), Fredrik Wattwil (Volvo)
Examiners: Svante Gunnarsson (LiTH), Jonas Sjöberg (Chalmers)

Linköping, 18th February 2005


Division, Department: Institutionen för systemteknik (Department of Electrical Engineering), 581 83 Linköping
Date: 2005-02-11
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LITH-ISY-EX--05/3634--SE
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2005/3634/
Title: Self-organizing maps for virtual sensors, fault detection and fault isolation in diesel engines
Authors: Conny Bergkvist and Stefan Wikner
Keywords: self-organizing maps, neural network, virtual sensor, diesel engine, fault detection, fault isolation, automotive, development system


Abstract

This master's thesis discusses the use of self-organizing maps in a diesel engine management system. Self-organizing maps are a type of artificial neural network well suited to visualizing data and solving classification problems. The system studied is the Vindax® development system from Axeon Ltd. By rewriting the problem formulation, function estimation and conditioning problems can be solved in addition to classification problems.

In this report a feasibility study of the Vindax® development system is performed and, as implementations, the inlet air system is diagnosed and the engine torque is estimated. The results indicate that self-organizing maps can be used in future diagnosis functions as well as virtual sensors when physical models are hard to obtain.

Keywords: self-organizing maps, neural network, virtual sensor, diesel engine, fault detection, fault isolation, automotive, development system


Preface

This thesis has been written by two students, one from Chalmers University of Technology and one from Linköping University. For practical purposes the report exists in two versions, one at Chalmers and one at Linköping. Their contents are identical; only the framework differs slightly to meet each university's rules.


Acknowledgments

Throughout the thesis work a number of people have been helpful and engaged in our work. We would like to thank our instructors Fredrik Wattwil and Urban Walter at Engine Diagnostics, Volvo Powertrain, for their support and engagement. In addition we thank our instructors and examiners, Svante Gunnarsson and Johan Sjöberg at Linköping University and Jonas Sjöberg at Chalmers University of Technology. The support team from Axeon Ltd - Chris Kirkham, Helge Nareid, Iain MacLeod and Richard Georgi - the project coordinator, Carl-Gustaf Theen, and the other employees at the Engine Diagnostics group have also been of great help for the success of this thesis.


Notation

Symbols

x, X   Boldface letters are used for vectors and matrices.

Abbreviations

ABSE   ABSolute Error
ANN    Artificial Neural Network
DC     Dynamic Cycle
EMS    Engine Management System
ETC    European Transient Cycle
PCI    Peripheral Component Interconnect
RMSE   Root-Mean-Square Error
SOM    Self-Organizing Map
VDS    Vindax® Development System
VP     Vindax® Processor


Contents

1 Introduction
  1.1 Background
  1.2 Problem description
  1.3 Purpose
  1.4 Goal
  1.5 Delimitations
  1.6 Method
  1.7 Thesis outline

2 Background theory
  2.1 Self-organizing maps
  2.2 Diesel engines
    2.2.1 The basics
    2.2.2 Four-stroke engine
    2.2.3 Turbo charger
    2.2.4 What happens when the gas pedal is pushed?

3 The Vindax® Development System
  3.1 Introduction
    3.1.1 Labeling step
    3.1.2 Classification step
  3.2 Conditioning of input signals
  3.3 Classification of cluster data
    3.3.1 Introduction
    3.3.2 Results and discussion
  3.4 Function estimation
    3.4.1 Introduction
    3.4.2 Results and discussion
  3.5 Handling dynamic systems
    3.5.1 Introduction
    3.5.2 Results and discussion
  3.6 Comparison with other estimation and classification methods
    3.6.1 Cluster classification through minimum distance estimation
    3.6.2 Function estimation through polynomial regression

4 Data collection and pre-processing
  4.1 Data collection
    4.1.1 Amount of data
    4.1.2 Sampling frequency
    4.1.3 Quality of data
    4.1.4 Measurement method
  4.2 Pre-processing

5 Conditioning - Fault detection
  5.1 Introduction
  5.2 Method
  5.3 Results and discussion

6 Classification - Fault detection and isolation
  6.1 Introduction
  6.2 Method
    6.2.1 Leakage isolation
    6.2.2 Reduced intercooler efficiency isolation
  6.3 Results and discussion

7 Function estimation - Virtual sensor
  7.1 Introduction
  7.2 Method
  7.3 Results and discussion

8 Conclusions and future work
  8.1 Conclusions
    8.1.1 Conditioning
    8.1.2 Classification
    8.1.3 Function estimation
  8.2 Method criticism
  8.3 Future work

Bibliography

A RMSE versus ABSE
B Measured variables
C Fault detection results
  C.1 90 % activation frequency resolving
  C.2 80 % activation frequency resolving
  C.3 70 % activation frequency resolving

Chapter 1

Introduction

This chapter gives the background, purpose, goal, delimitations and methods of the thesis, as well as its outline.

1.1 Background

The demand for new methods and technologies in the Engine Management System, EMS, is increasing. Laws that govern the permissible level of pollutants in the exhaust of diesel engines have to be followed at the same time as drivers demand high performance and low fuel consumption, putting the EMS under pressure.

Applications in the EMS are based on a number of inputs measured by sensors and estimated by models. Limitations of these sensors and models create situations where qualitative input signals cannot be obtained. Adding new sensors is expensive and creates an additional need for diagnostics. Together, this forces engineers to look at new methods to attain these values or to find ways around the problem.

Identifying models using Artificial Neural Networks, ANNs, can be one solution. A lot of research has been put into this area, and the increasing number of applications with implemented ANNs indicates that the technology could be of use in an EMS. There are development systems available for producing hardware and software applications using ANNs to be deployed in the EMS. The Vindax® Development System, VDS, by Axeon Limited¹ is an example of this. The system may be capable of solving problems within the EMS.

The thesis work has been performed at Volvo Powertrain Corporation. The diagnosis group wants to know if the VDS can be used to help in the area of diagnosing diesel engines.

¹ www.axeon.com


1.2 Problem description

Three different kinds of problems are investigated in this thesis:

Conditioning - Fault detection Conditioning a system is one way of determining whether the system is fault free. Assume that a system has states x ∈ Φ, where Φ denotes a set, if and only if it is fault free. Determining whether x ∈ Φ is then a conditioning problem.

Classification - Fault isolation There can be different kinds of faults that need to be isolated. Assume that a system with states x is fault free if x ∈ Φ₀. Assume also that k different classes of errors can occur and that each error causes the states to belong to one of the sets Φ₁, Φ₂, . . . , Φₖ. Determining which set x belongs to is a classification problem.

Function estimation - Virtual sensor A system generates output according to y = f(u) ∈ ℝ. This output can be measured by sensors or obtained by determining the function f. In complex systems, e.g. an engine, it may be hard to determine f exactly. Finding an estimate of f is a function estimation problem.

1.3 Purpose

The aim of this master's thesis is to investigate a neural network application based on Self-Organizing Maps, SOM. The main purpose is to investigate how well such a neural network works compared to traditional software applications used for control of real-time functions in EMSs. The actual environment to be investigated is the VDS. The thesis should:

• Evaluate the training process of the system regarding:
  - Time requirements
  - Size and type of source measurement data needed
• Evaluate the strengths and weaknesses of the system
• Evaluate the performance and capabilities of the VDS to solve the problems described in section 1.2.


1.4 Goal

To fulfill the purpose, the thesis goal is divided into three parts:

• Estimate the output torque with better accuracy than the value estimated by the EMS
• Detect air leakages in the inlet air system, as well as a reduced intercooler efficiency, with accuracy sufficient to make a diagnosis
• Isolate the same errors with accuracy sufficient to make a diagnosis

All experiments are performed on a Volvo engine using the VDS.

1.5 Delimitations

The evaluation of the VDS is based upon a PCI²-card version of the Vindax® hardware on a Windows PC.

For the fault detection and isolation problems, data from one single engine individual are used. However, two different types of test running cycles are used. For function estimation, data from one single type of test cycle are used, but the test cycle is run on two different types of engines. For all problems, the data used for verification are of the same kind as the data used for training.

² Peripheral Component Interconnect, PCI, is a standard for connecting expansion cards to a computer.

1.6 Method

To fulfill the purpose and reach the goal of this thesis, a feasibility study of the VDS is performed. The study reveals basic properties of the system and serves as a solid base for the continued work. With the feasibility study as a base, the actual development of the applications takes place.


1.7 Thesis outline

Chapter 2 Introduction to the theory behind the Vindax® Processor, VP, used in the VDS: self-organizing maps. The diesel engine is also described, to give a sense of the environment where the algorithm is going to be deployed.

Chapter 3 A feasibility study of the VDS and its three main application areas: conditioning, classification and function estimation.

Chapter 4 Here the collection and pre-processing of data are discussed. Important issues such as representativeness of data are handled, both in general and in specific terms.

Chapters 5-7 Each of these chapters describes the development of an application. These chapters are very specific and present results that can be expected to be achieved with the VDS.

Chapter 8 Ends the report with conclusions and recommendations for future work.

Chapter 2

Background theory

This chapter gives a short introduction to self-organizing maps and diesel engines for readers with little or no knowledge of these areas.

2.1 Self-organizing maps

Professor Kohonen developed the SOM algorithm in its present form in 1982 and presents it in his book [1]. The algorithm maps high-dimensional data onto a low-dimensional array. It can be seen as a conversion of statistical relationships into geometrical ones, suitable for visualization of high-dimensional data. As this is a kind of compression of information, preserving topological and/or metric relationships of the primary data elements, it can also be seen as a kind of abstraction. These two aspects make the algorithm useful for process analysis, machine perception, control tasks and more.¹

¹ It is formally described, by the inventor himself, as ([1], p 106) "a nonlinear, ordered, smooth mapping of high-dimensional input data manifolds onto the elements of a regular low-dimensional array."

A SOM is actually an ANN categorized as a competitive learning network. These kinds of ANNs contain neurons that, for each input, compete over determining the output. Each time the SOM is fed an input signal, a competition takes place. One neuron is chosen as the winner and decides the output of the network.

The algorithm starts with initiation of the SOM. Assume that the input variables, ξᵢ, are fed to the SOM as a vector u = [ξ₁, ξ₂, . . . , ξₙ]ᵀ ∈ ℝⁿ. This defines the input space as a space of dimension n. Each neuron is then associated with a reference vector, m_i ∈ ℝⁿ, that gives the neuron an orientation in the input space. The reference vectors are finally, often randomly, given initial values.

Then the algorithm carries out a regression process to represent the available inputs. For each sample of the input signal, u(t), the SOM algorithm selects a winning neuron. To find the winner, the distance between each neuron and the input sample is calculated. According to equation (2.1)², the winner m_c(t) is the neuron closest to the input sample:

||u(t) − m_c(t)|| ≤ ||u(t) − m_i(t)||   ∀ i    (2.1)

Here t is a discrete time coordinate of the input samples.

² In equation (2.1) the norm used is usually the Euclidean norm (2-norm), but other norms can be used as well. The VDS uses the 1-norm.

When a winner is selected, the reference vectors of the network are updated. The winning neuron and all neurons that belong to its neighborhood are updated according to equation (2.2):

m_i(t + 1) = m_i(t) + h_c,i(t) (u(t) − m_i(t))    (2.2)

The function h_c,i(t) is a neighborhood function, decreasing with the distance between m_c and m_i. The neighborhood function is traditionally implemented as a Gaussian (bell-shaped) function. For convergence reasons, h_c,i(t) → 0 as t → ∞. This regression distributes the neurons to approximate the probability density function of the input signal.

To give a more qualitative description of the SOM algorithm, an example is considered. The input signal, generated by taking points on the arc of a half circle³, is shown in figure 2.1. This is a two-dimensional signal that the SOM is going to map. The SOM in this case has a size of 16 neurons, i.e. a 4-by-4 array. To illustrate the learning process, the array is imagined as a net, with knots representing neurons and elastic strings keeping a relation to each knot's neighbors. For initiation, the SOM is distributed over the input space according to figure 2.1. The reference vectors are randomized in the input space (using a Gaussian distribution).

³ Although there is probably no physical system giving this input, it suits the purpose of illustrating the SOM algorithm.

During training, a regression process adjusts the SOM to the input signal. For each input, the closest neuron is chosen to be the winner. When a winner is selected, it and its neighbors are adjusted towards the input. This can be seen as taking the winning neuron and pulling it towards the input. The elastic strings in the net cause the neighbors to adjust as well, following the winner. Referring to equation (2.2), the neighborhood function h_c,i(t) determines the elasticity of the net.

The regression continues over a finite number of steps to complete the training process. In figure 2.1, network shapes after 10 and 100 steps of training are illustrated. As can be seen, a nonlinear, ordered and smooth mapping of the input signal is formed after 100 steps. The network has formed an approximation of the input signal.
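As a minimal sketch of the training loop described above, applied to the half-circle example (the function name, grid layout and learning-rate schedule are our own illustrative assumptions, not the VDS implementation), equations (2.1) and (2.2) might look as follows in Python:

```python
import numpy as np

def train_som(inputs, n_neurons=16, n_steps=100, sigma0=2.0, lr0=0.5, seed=0):
    """Minimal SOM training loop following equations (2.1)-(2.2).

    inputs: array of shape (n_samples, dim); the neurons form a square grid."""
    rng = np.random.default_rng(seed)
    side = int(np.sqrt(n_neurons))
    # Grid coordinates of each neuron, used by the neighborhood function.
    grid = np.array([(i, j) for i in range(side) for j in range(side)], dtype=float)
    # Random (Gaussian) initialization of the reference vectors m_i.
    m = rng.normal(inputs.mean(0), inputs.std(0) + 1e-9, (n_neurons, inputs.shape[1]))

    for t in range(n_steps):
        u = inputs[rng.integers(len(inputs))]        # one input sample u(t)
        # Winner selection, eq. (2.1); the VDS uses the 1-norm.
        c = int(np.abs(m - u).sum(axis=1).argmin())
        # Gaussian neighborhood h_{c,i}(t), shrinking over time so that
        # h_{c,i}(t) -> 0 as t grows (needed for convergence).
        frac = t / n_steps
        sigma = sigma0 * (1.0 - frac) + 1e-3
        h = lr0 * (1.0 - frac) * np.exp(
            -((grid - grid[c]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        # Update rule, eq. (2.2): pull the winner and its neighbors towards u(t).
        m += h[:, None] * (u - m)
    return m

# The half-circle example of figure 2.1: a 4-by-4 net fitted to points on an arc.
angles = np.linspace(0.0, np.pi, 1000)
arc = np.column_stack([np.cos(angles), np.sin(angles)])
net = train_som(arc, n_neurons=16, n_steps=100)
```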


Figure 2.1. Illustration of the SOM algorithm. Each step involves presenting one input sample to the SOM. The network has started approximating the distribution of the input signal already after 10 steps. After 100 steps the network has formed an approximation of the input signal.


2.2 Diesel engines

It is not crucial to know how diesel engines work to read this report, but some knowledge might improve the understanding of why certain signals are used and how they affect the engine.

This thesis only discusses the four-stroke diesel engine, because that is the type used in Volvo's trucks. To learn more about engines, both diesel and Otto (e.g. petrol), see e.g. [2], [3].

2.2.1 The basics

A diesel engine, or compression ignition engine as it is sometimes called, converts air and fuel through combustion into torque and emissions.

Figure 2.2. A side view of one cylinder in a diesel engine.

Figure 2.2 shows the essential components in the engine. The air is brought into the engine through a filter to the intake manifold, where it is guided into the different cylinders. In the cylinder the air is compressed, and when the fuel is injected the mixture self-ignites. The combustion results in a pressure that generates a force on the piston. This in turn is transformed into torque on the crankshaft, making it rotate. Another result of the combustion is exhaust gases, which go through the exhaust manifold and the muffler and out into the air.

In this thesis, the measured intake manifold pressure and temperature are called boost pressure and boost temperature, respectively.


2.2.2 Four-stroke engine

The engine operates in four strokes (see figure 2.3):

Figure 2.3. The four strokes in an internal combustion engine.

1. Intake stroke. (The inlet valve is open and the exhaust valve is closed.) The air in the intake manifold is sucked into the cylinder while the piston moves downwards. In engines equipped with a turbo charger (see section 2.2.3), air is pressed into the cylinder.

2. Compression stroke. (Both valves are closed.) When the piston moves upwards the air is compressed. The compression causes the temperature to rise. Fuel is injected when the piston is close to the top position, and the high temperature causes the air-fuel mixture to ignite.

3. Expansion stroke. (Both valves are closed.) The combustion increases the pressure in the cylinder and pushes the piston downwards, which in turn rotates the crankshaft. This is the stroke where work is generated.

4. Exhaust stroke. (The exhaust valve is open and the inlet valve is closed.) When the piston moves upwards again it pushes the exhaust gases out into the exhaust manifold.

After the exhaust stroke it all starts over with the intake stroke again. As a result, each cylinder produces work during one stroke and consumes work (friction, heat, etc.) during three strokes.

2.2.3 Turbo charger

The amount of work, and in the end the speed, generated by the engine depends on the amount of fuel injected, but the relative mass of fuel and oxygen is also important. For the fuel to be able to burn there must be enough oxygen. From the intake manifold pressure the available amount of oxygen is calculated, and from this, together with the desired work, the amount of fuel to inject is calculated. A manifold pressure that is not high enough might therefore lead to lower performance.

Figure 2.4. A schematic picture of a diesel engine.

To get more air, and with that more oxygen, into the cylinders, almost all large diesel engines of today use a turbo charger. This increases the performance of the engine, as more fuel can be injected. The turbo charger uses the heat energy in the exhaust gases to rotate a turbine. The turbine is connected to a compressor (see figure 2.4) that pushes air from the outside into the cylinders.

One effect of the turbo charger is that the compressor increases the temperature of the air. When the air is warmer it contains less oxygen per volume. An intercooler can be used to increase the density of oxygen and thereby increase the performance of the engine. After the intercooler the air has (almost) the same temperature as before going through the compressor, but at a much higher pressure. From the ideal gas law, d = m/V = p/(RT), we get that the density has increased. This way we can get more air mass into the same cylinder volume, using only energy that would otherwise be thrown away.
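As a worked instance of this reasoning, with illustrative numbers of our own (not measurements from the thesis): if the intercooler returns the air to roughly ambient temperature while the compressor has raised the pressure from 100 kPa to 250 kPa, the density increases by the same factor as the pressure.

```python
R = 287.0          # specific gas constant for air [J/(kg K)]
T = 293.0          # assume the intercooler returns air to ~ambient temperature [K]
p_ambient = 100e3  # [Pa], illustrative value
p_boost = 250e3    # [Pa], illustrative value

d_ambient = p_ambient / (R * T)   # ideal gas law: d = p/(RT)
d_boost = p_boost / (R * T)
print(d_boost / d_ambient)        # 2.5, i.e. 2.5x the air mass in the same volume
```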

2.2.4 What happens when the gas pedal is pushed?

When the driver pushes the pedal the engine produces more torque. Simplified, the process looks like this:

1. The driver pushes the pedal.

2. More fuel is injected into the cylinders.

3. More fuel yields a larger explosion, which leads to more heat and in turn more pressure in the cylinder (ideal gas law).


4. The increased pressure pushes the piston harder, which leads to higher torque. But it does not stop here; after a little while the turbo 'kicks in'.

5. The increased amount of exhaust gases, together with the increased temperature, accelerates the turbine.

6. The increased speed of the turbine makes the compressor push more air into the intake manifold, and the pressure rises.

7. This leads to more air in the cylinders.

8. With more air in the cylinders, more fuel can be used in the combustion, which in turn leads to an even larger explosion and even more torque.

The final four steps are repeated until a steady state is achieved. This is a rather slow process, taking a second or two.

(30)
(31)

Chapter 3

The Vindax® Development System

In this chapter the VDS by Axeon Limited is introduced through a feasibility study. To make it easy to visualize and understand how the VDS works, low-dimensional inputs are used to create illustrative examples. To finish the chapter, the VDS is compared with other systems trying to solve the same problems.

In this chapter the variables y and u denote the output signal and the input signal respectively.

3.1 Introduction

The VDS deploys a SOM algorithm in hardware through a software development system. The hardware used for this thesis is a PCI version of the VP, consisting of 256 RISC processors, each representing a neuron in a SOM. The architecture allows the computations to be done in parallel, which decreases the working time substantially. The dimension of the input vectors, i.e. the memory depth, is currently limited to 16.

Configuration of the processor is quite flexible. It can be divided into up to four separate networks that can work in parallel, individually or in other types of hierarchies. There is also a possibility to swap the memory (during operation), enabling more than one full-size network to run in the same processor.

The software is simple to use and provides good possibilities to handle and visualize data. It also includes some data pre-processing tools as well as output functions. In the software system, the VP is used with network sizes of 256 neurons and smaller. To test larger networks, a software emulation mode is available, where the maximum size is 1024 neurons and the memory depth is 1024. The emulation mode does not use the VP and therefore increases the calculation times considerably.


The VDS is able to solve classification problems, described in section 1.2, according to section 3.3. Rewriting the conditioning and function estimation problems enables a solution to those problems as well; this is described in sections 3.2 and 3.4.

Development of applications using the VDS involves four steps:

1. Data collection and pre-processing
2. Training
3. Labeling
4. Classification

Data collection and pre-processing are described in chapter 4. During the training step the network is organized using the SOM algorithm presented in section 2.1. The labeling and classification steps are described below.

3.1.1 Labeling step

The training has formed an approximation of the input signal distribution, and this approximation should now be associated with output signals, i.e. be labeled.

The labeling step uses the same input data as the training step, together with measured output data. It requires the correct output value to be known for each input. The neuron responding to a particular input sample should have an output value matching the measured output value. Presenting the data to the network makes it possible to assign values to each neuron.

As there are more input samples than neurons, each neuron will have many output value candidates. If these candidates are not identical, which is most often the case, some kind of strategy is needed for choosing which one to use as label. Either it is chosen manually for each neuron, or one of the automatic methods of the VDS is used. Different kinds of methods suit different kinds of problems, and they are discussed in their contexts in sections 3.2-3.4.

3.1.2 Classification step

If correctly labeled, the network can be used for mapping new inputs using the classification step. This is done in a very straightforward manner and can be seen as function evaluation. For each new input a winning neuron is selected, as described in section 2.1. The label of the winner is the output of the network.
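As a minimal sketch, reusing the trained map m from the earlier training sketch (the helper name is our own, not the VDS API), the classification step reduces to a winner lookup:

```python
import numpy as np

def evaluate(m, labels, u):
    """Classification step: find the winning neuron for input u (1-norm,
    as the VDS uses) and return its label as the network output."""
    c = int(np.abs(m - u).sum(axis=1).argmin())
    return labels[c]
```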


3.2 Conditioning of input signals

To solve a conditioning problem, according to section 1.2, the distance measurement has to be involved. As discussed in section 2.1, the distance from the input sample to each neuron is calculated when a winner is selected. It is this distance that can be used for conditioning of signals.

The idea is to save the maximum distance between the neurons and the input signals that occurs when the network is presented with input signals from a fault-free system. If this maximum distance is exceeded when new inputs are presented to the network, an error has probably occurred.

To do this, the network is first trained on data from a system with no faults. After that, the same data are fed to the network again, and the distance between the input signal and the winning neuron is saved for each sample. The maximum distance, for each neuron, is used as its label.

During the classification step, the label value of the winning neuron is subtracted from the distance between this neuron and the current sample, giving the difference between the distance of the current sample and the maximum distance that occurred during training. If the difference is larger than zero, previously unseen data have been presented to the VP. Assuming that the network has been trained on representative data (see 4.1), this implies that an error has probably occurred.
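A minimal sketch of this labeling and conditioning scheme, assuming a map m trained on fault-free data as in the earlier sketch (function names are ours, not the VDS API):

```python
import numpy as np

def label_max_distance(m, fault_free_inputs):
    """Label each neuron with the largest winning distance (1-norm) observed
    when the fault-free training data are fed through the trained map m."""
    labels = np.zeros(len(m))
    for u in fault_free_inputs:
        d = np.abs(m - u).sum(axis=1)       # distance from u to every neuron
        c = int(d.argmin())                 # winning neuron
        labels[c] = max(labels[c], d[c])    # keep the maximum winning distance
    return labels

def condition(m, labels, u):
    """Return (winning distance - stored maximum); a positive value means
    previously unseen data, i.e. a probable fault."""
    d = np.abs(m - u).sum(axis=1)
    c = int(d.argmin())
    return d[c] - labels[c]
```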

Applying this to the real problem is as simple as applying it to a constructed one; see chapter 5 for a real-world example that illustrates the use of this technique.


3.3 Classification of cluster data

To explore the VP's capability to solve classification problems (see section 1.2), an example with clusters in three dimensions is constructed.

3.3.1 Introduction

Three Gaussian clusters are created, with mean values at three different points. They are labeled blue, red and black. Different variances are used to create two sets of input data, shown in figures 3.1-3.2. Printed in black and white it is hard to see which cluster the data points belong to; the purpose of the figures is, however, to show how the clusters are distributed.

Figure 3.1. First cluster setup with every tenth sample of the data plotted from different angles.

The task is to take an input signal, composed of the three coordinates of each point, and classify it as blue, red or black. To do this, the VP is first trained (using all 256 neurons). This creates an approximation of the input signal distribution that, hopefully, contains three regions. This can be visualized by looking at the distance between neurons using the visualization functionality in the VDS. This is very useful when the input signal has more than three dimensions, as it is then not possible to visualize the actual data itself. In this way it is possible to get an idea of how the neurons are distributed over the input space also when working in higher dimensions. Then the SOM is labeled to be able to give output. This is the part where settings for the classification are made.


Figure 3.2. Second cluster setup with every tenth sample of the data plotted from different angles.

When labeling the SOM, there will be conflicts when neurons are associated with more than one label. For example, if a neuron is associated with inputs labeled red 60 times and black 40 times, the neuron will be named multi-classified after the labeling process in the VDS. This situation has to be resolved, and there are different methods for this.

In the VDS, there are six different multi-classification resolving methods:

1. Most frequent
2. Activation frequency threshold
3. Activation count threshold
4. Activation frequency range threshold
5. Activation count range threshold
6. Manual

The first two are the most appropriate for the situations occurring in this thesis, which is why the other methods are not handled here.¹

¹ The third method is similar to the second one but harder to use, as the number of activations depends on the amount of data. There is no reason to use a range (methods 4-5) instead of a threshold, as the neurons with a high number of activations definitely should be labeled. Finally, the manual method is very time-consuming and does not really provide an advantage compared to the first two methods.


Using the first method makes the VDS assign each neuron the label that occurred most frequently. In the example above, this would mean that the neuron is classified as red.

With the second option it is possible to affect the accuracy of the classification. There will be a trade-off between the number of unclassified inputs and the accuracy of the ones classified. If the activation frequency threshold is set to a high value, say 90 percent, the classification will be accurate, but many inputs may be classified as unknown. In the example above, this would mean that the neuron is not classified; the threshold would have to be lowered to 60 percent to classify the neuron as red.

Figure 3.3. Possible label regions for input classification when using data from figure 3.2 as input.

This is visualized in figure 3.3. In between the clusters there are inputs that belong to one cluster but are situated closer to another cluster center. For example, there are many inputs in figure 3.2 that belong to the red cluster but are closer to the black cluster center (and vice versa). Depending on how high the frequency threshold is set for the labeling, these inputs will be classified as black or not classified at all.

With a high threshold there will be many neurons without a label. This causes the unclassified region to be big and the accuracy of the classified inputs to be high. The drawback is that many inputs will not be classified at all; e.g., trying to distinguish between an engine with no faults and an engine with an air leakage will return many inconclusive answers.
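A minimal sketch of activation-frequency-threshold labeling (our own illustrative code, not the VDS implementation; a threshold of 1.0 corresponds to no multi-classification resolution, while dropping the threshold test reproduces the most-frequent method):

```python
from collections import Counter

def label_with_threshold(winners, true_labels, n_neurons, threshold=0.9):
    """Give each neuron the label that accounts for at least `threshold` of its
    activations; otherwise leave it unclassified (None)."""
    counts = [Counter() for _ in range(n_neurons)]
    for c, lab in zip(winners, true_labels):
        counts[c][lab] += 1
    labels = []
    for cnt in counts:
        total = sum(cnt.values())
        if total == 0:
            labels.append(None)               # neuron never activated
            continue
        lab, n = cnt.most_common(1)[0]
        labels.append(lab if n / total >= threshold else None)
    return labels

# The thesis example: a neuron activated 60 times by 'red' and 40 by 'black'
# stays unclassified at threshold 0.9 but becomes 'red' at threshold 0.6.
winners = [0] * 100
true_labels = ['red'] * 60 + ['black'] * 40
print(label_with_threshold(winners, true_labels, 1, 0.9))  # [None]
print(label_with_threshold(winners, true_labels, 1, 0.6))  # ['red']
```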


3.3.2 Results and discussion

The results from using the VDS to perform this task are summarized in tables 3.1-3.4.

                   Classified as blue   Classified as red   Classified as black   Not classified
True value blue    95.74%               0.00%               0.06%                 4.20%
True value red     0.00%                95.72%              0.06%                 4.16%
True value black   0.00%                0.00%               98.41%                1.59%

Table 3.1. Classification on first cluster (figure 3.1) data with no multi-classification resolution.

                   Classified as blue   Classified as red   Classified as black
True value blue    99.75%               0.00%               0.26%
True value red     0.13%                99.74%              0.13%
True value black   0.07%                0.50%               99.42%

Table 3.2. Classification on first cluster (figure 3.1) data using most frequent as multi-classification resolution.

                   Classified as blue   Classified as red   Classified as black   Not classified
True value blue    77.72%               0.13%               0.19%                 21.96%
True value red     0.07%                84.13%              0.13%                 15.68%
True value black   0.28%                0.28%               70.25%                29.19%

Table 3.3. Classification on second cluster (figure 3.2) data with no multi-classification resolution.

                   Classified as blue   Classified as red   Classified as black
True value blue    96.76%               1.10%               2.14%
True value red     0.59%                96.80%              2.61%
True value black   2.46%                1.12%               96.42%

Table 3.4. Classification on second cluster (figure 3.2) data using most frequent as multi-classification resolution.

Using no multi-classification resolution, i.e. a frequency threshold of 100 percent, gives high accuracy but many unclassified samples. Using most-frequent resolution instead gives a high classification rate but lower accuracy. Which one to use depends on the application.

These two are extremes: no multi-classification resolution leaves many neurons without a label, whereas the most-frequent method labels all neurons except the ones with equally many activations from two or more labels. The frequency threshold can be lowered to get a result somewhere in between, i.e. slightly lower accuracy but fewer unclassified signals.


3.4 Function estimation

This section investigates the capability of the VDS to estimate functions (see section 1.2).

3.4.1 Introduction

First a simple linear system is created:

y₁[t] = 600 u₁[t] + 750 u₂[t],    (3.1)

with uniformly distributed random inputs u₁, u₂ ∈ [0, 2]. These are used to train the VP.

This is a very simple problem: parameter estimation. It is questionable whether a neural network should be used to solve it, as simpler methods will probably produce better results. It is however suitable for illustrating the characteristics of the VDS.

The labeling of the VP is done by, for each neuron, choosing the mean value of all label candidates. This is the most common method for selecting labels when estimating functions. It is appropriate as there will be a very large number of labels, i.e. measured outputs, and choosing the mean value will give a good approximation.
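A minimal sketch of this labeling rule (our own illustrative helper, not the VDS API), given the winning neuron index for each training sample and the corresponding measured outputs:

```python
import numpy as np

def label_by_mean(winners, outputs, n_neurons):
    """Label each neuron with the mean of the measured outputs for which it won."""
    winners = np.asarray(winners)
    outputs = np.asarray(outputs, dtype=float)
    labels = np.zeros(n_neurons)
    for i in range(n_neurons):
        mask = winners == i
        if mask.any():
            labels[i] = outputs[mask].mean()
    return labels
```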

Three network sizes are used to estimate the system:

- 64 neurons
- 256 neurons, the physical limit of the VP
- 1024 neurons, the limit of the software emulation mode

Increasing the number of neurons also increases the number of output values (one output value per neuron). Applied to the same problem, a larger network should therefore give a higher output resolution and hence reduce the errors.

As a second step, noise is introduced to the system. The network is trained and labeled using noisy signals. The classification can then be compared with the true output values to see whether the network performs worse or whether the method is robust.

The noise, band-limited white noise, is added to the input and output signals. Different amplitudes are used, as the input signals are much weaker than the output. The amplitudes are chosen according to table 3.5.

Signal   Noise amplitude
u₁       5 · 10⁻⁴
u₂       5 · 10⁻⁴
y        100

Table 3.5. Noise amplitudes for the input and output signals.


Three non-linear systems are compared to the linear system. The first two are:

y₂[t] = 1200 · atan(u₁[t]) + |18 · u₂[t]|²    (3.2)

y₃[t] = 1200 · atan(u₁[t]) + 100 · e² · √(u₂[t])    (3.3)

The third system is identical to system (3.2), except that a backlash with a dead-band width of 200 and an initial output of 1000² is applied to the output signal.

² See Matlab®/Simulink® help for details.

All systems have (different) uniformly distributed random inputs, u₁, u₂ ∈ [0, 2].

3.4.2 Results and discussion

The results are summarized in table 3.6 as Root-Mean-Square Errors, RMSEs, and ABSolute Errors, ABSEs, for the estimations. See appendix A for details on RMSE and ABSE.

System                                      RMSE    mean ABSE   max ABSE
System (3.1), 64 neurons                    70.9    59.2        187.8
System (3.1), 256 neurons                   35.3    29.6        93.2
System (3.1), 1024 neurons                  18.5    15.6        51.5
System (3.1), 256 neurons (noisy signal)    254.0   199.2       1069
System (3.2), without backlash              39.8    31.2        164.5
System (3.2), with backlash                 83.4    68.4        283.1
System (3.3)                                41.2    33.3        148.4

Table 3.6. Statistics after classification with the VDS on the four systems examined in this section. Systems (3.2) and (3.3) use 256-neuron networks and no noise.

With a SOM of 256 neurons, the result shown in figure 3.4 is achieved. The estimated output values in the figure are gathered at a number of levels, i.e. the output from the VP is discrete. It is hard to see in the figure, but there are actually 256 different levels, corresponding to the number of neurons in the SOM.

Looking at how the SOM is distributed over the input space, the output levels are denser towards the middle. When trained, the SOM approximates the probability distribution of the input signal. In this example two uniformly distributed signals are used as input. Together these inputs have a higher probability for values in the middle range of the space; extreme values are less probable, hence the look of figure 3.4.³

³ A classical example from probability theory is throwing dice. Each throw is uniformly distributed among the outcomes, but over two throws in a row, the probability of a total of 6 is higher than the probability of, for example, 2. That is very similar to the situation approached here.


Figure 3.4. Correlation plot with estimated values compared to the true values of system (3.1), using a network of 256 neurons. The straight line shows the ideal case where the network has infinitely many neurons and is optimally trained and labeled.

Figure 3.5. Correlation plot with estimated values compared to the true values using a network of 1024 neurons applied on system (3.1). The straight line shows the ideal case where the network has infinitely many neurons and is optimally trained and labeled.


Comparing figure 3.4 with figure 3.5, the horizontal lines are closer together, and also shorter, when the SOM is larger. This indicates smaller errors, and the values in table 3.6 confirm it.

The horizontal lines in figure 3.4 are approximately twice the length of the lines in figure 3.5. The RMSE and the mean ABSE are also twice as big with a 256-neuron SOM, see table 3.6.

The results when using only 64 neurons are, approximately, twice as large as for 256 neurons. Altogether the results indicate that, for this problem, the error is reduced with the square root of the network size. This is probably problem dependent, and more tests are needed to reveal the true relationship.

Figure 3.6. Correlation plot with estimated values compared to the true values using a SOM of 256 neurons with noisy signals applied on system (3.1). The straight line shows the ideal case where the SOM has infinitely many neurons and is optimally trained and labeled.

As noise is introduced to the problem, the SOM performs considerably worse. Figure 3.6 shows the correlation plot, and the estimation error is much bigger now: roughly six times larger than the noise-free estimation error. The horizontal lines are distributed the same way as in figure 3.4, but they are longer. This shows that more inputs are misclassified than before, increasing the error, which concurs with the values in table 3.6.

Looking into non-linearities, the results from systems (3.2) and (3.3) do not differ much from the linear case. This can also be seen when comparing the topmost graph in figure 3.7 with figure 3.4. There is no big difference between a linear and a non-linear system as long as the system is static. The reason for this is that the SOM performs a mapping.

Figure 3.7. Correlation plot between true and estimated values for system (3.2), without backlash (top figure) and with backlash (bottom figure). The bottom figure clearly shows worse results than the top figure. The straight line shows the ideal case where the network has infinitely many neurons and is optimally trained and labeled.

The addition of the backlash to system (3.2) decreases the ability of the VDS to model the system, as can be seen in table 3.6 and when comparing the two plots in figure 3.7. A SOM cannot handle a backlash: the backlash output depends on the direction from which the dead-band is entered. The SOM system is static, looking only at current signal values, and therefore it does not have the information required to handle dynamic systems, i.e. systems that depend on previous values. See section 3.5 for more about dynamic systems and SOMs.


3.5 Handling dynamic systems

In this section, a method to estimate dynamic systems with SOMs is described. Although a function estimation problem is used as the example, the method applies to conditioning and classification problems as well.

In [4], the estimation of a dynamic system is done automatically by modification of the SOM algorithm. In short, the output signal is used as an extra input signal during the training step to create a short-term memory mechanism. The technique is called Vector-Quantized Temporal Associative Memory. This solution is not possible in the VDS without remaking the core system.

Another, manual, way of handling dynamic data is adopted instead: lagged input and output signals are used as additional input signals to incorporate the information needed to handle dynamics.

3.5.1 Introduction

The SOM can be seen as a function estimator, estimating the function f(·) in

y(t) = f(u(t)),

i.e. the output at time t only depends on the input signal at time t. By adding historical input and output values to the input vector, the method can estimate functions of the form

y(t) = f(u(t), u(t − 1), . . . , u(t − k), y(t − 1), . . . , y(t − n))

In this way the dynamic function becomes, in theory, static when enough historical values are used. Since the SOM method has no problem with non-linearities, this means that, in theory, given enough historical signal values, all causal, identifiable systems can be estimated with this method. The only limitation is the discrete output: there can only be as many output values as there are neurons.
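A minimal sketch of this manual lagging scheme (the helper name and layout are our own assumptions): stack delayed inputs and outputs into a regressor vector that a static mapper such as the SOM can be trained on.

```python
import numpy as np

def make_regressors(u, y, k=1, n=0):
    """Build X[t] = [u(t), u(t-1), ..., u(t-k), y(t-1), ..., y(t-n)]
    together with the targets y(t), so dynamics become a static mapping."""
    start = max(k, n)
    X, T = [], []
    for t in range(start, len(u)):
        row = [u[t - i] for i in range(k + 1)] + [y[t - j] for j in range(1, n + 1)]
        X.append(row)
        T.append(y[t])
    return np.array(X), np.array(T)

# For the time-shifted system y[t] = u[t-1] (equation (3.5) below), building
# regressors with k=1 gives the SOM access to the lagged value it needs.
u = np.random.default_rng(0).uniform(0, 2, 1000)
y = np.concatenate(([0.0], u[:-1]))        # y[t] = u[t-1]
X, T = make_regressors(u, y, k=1)
```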

This theory is illustrated using the discrete system

y[t] = u[t],    (3.4)

with both Gaussian and uniformly distributed random signals as input u. The VP is trained on the signal u[t] and labeled with y[t]. This example also shows the differences between using different kinds of input distributions.

When this is verified, a time shift is introduced to the system:

y[t] = u[t − 1]    (3.5)

Exactly as before, the processor is trained on the signal u[t] and labeled with y[t]. It should be impossible to approximate this system, because of the random input data. The VP is not able to guess the value of y[t] knowing only u[t] and not u[t − 1], as there is no correlation between them.

Then the VP is trained on the signal u[t − 1] and labeled with y[t]. This problem is identical to the first one, up to a change of variable, and there should be no problem estimating the system with the same accuracy as the system in equation (3.4).

A situation where the exact dynamics of the system are known is unlikely to appear in practice. Therefore a test is performed to see what happens if the VP is over-fed with information: the processor is trained with both u[t] and historical values of u[t] as input.

3.5.2 Results and discussion

Table 3.7 summarizes the results.

System            Input signals        max ABSE   mean ABSE   RMSE
y[t] = u[t]       u[t]                 2.6        0.4         0.5
                  u[t] Gaussian        38.1       0.4         0.8
y[t] = u[t − 1]   u[t]                 134.7      64.0        73.9
                  u[t − 1]             2.0        0.4         0.5
                  u[t], u[t − 1]       12.7       4.0         4.8
                  u[t], ..., u[t − 3]  89.3       15.6        19.6

Table 3.7. Results of function estimation using the VDS on simple time-lagging examples.

Starting with the system in equation (3.4), the results show that the VP has no problem doing this estimation using a uniformly distributed random signal as input. The correlation plot in figure 3.8 visualizes how well the system is estimated, and the values in table 3.7 are very good.

Figure 3.8. Correlation plot of correct vs estimated values using a uniformly distributed input signal.

There is a difference between using a Gaussian and a uniformly distributed signal as input: the estimation at extreme values is worse with a Gaussian signal. The reason is that the SOM algorithm approximates the distribution of the training data. Using a uniformly distributed input signal spreads the neurons evenly over the input space, while with Gaussian input more neurons are placed in the center of the input space and fewer at the perimeter. Fewer neurons in an area give a larger error when discretizing the input signal, and most often a larger error in the output signal, which can be seen in figure 3.9.

Figure 3.9. Correlation plot of correct vs estimated values using a Gaussian input signal.

Trying to estimate the system in equation (3.5), the result again confirms the expectations. The correlation between the correct and the estimated values of y is ∼ 0.0043; the system could not be estimated. As proposed, this is solved by lagging the input signal. The accuracy is then almost identical to the first estimation; the small difference is due to the randomness of the input signals.

The VP is clearly confused when the dynamics are not exactly matched, as can be seen when comparing figure 3.10 with figure 3.8. Here both u[t] and u[t − 1] are used as inputs. The result shows that it is possible to estimate system (3.5) this way, although not as well as when using only u[t − 1] as input. This is because the SOM method uses unsupervised learning: it does not know, during training, that the extra dimension in the input space is useless, which results in a lower resolution. Using more historical signals as input gives even worse results. This shows the importance of having good knowledge of time shifts and other dynamics in the system.


Figure 3.10. Correlation plot of correct vs estimated values when estimating y[t] = u[t − 1] with the VP. The processor is given u[t] and u[t − 1] as inputs.


3.6 Comparison with other estimation and classification methods

In this section a comparison is made between the VDS and some other simple methods, to get an idea of the performance of the VDS. Classification and function estimation are the problems chosen as suitable for a comparison.

3.6.1 Cluster classification through minimum distance estimation

The classification of cluster data with the VP can be compared with a simple algorithm implemented in Matlab®. The idea is to estimate the cluster centres by taking an average over the input training data. These averaged centres then act as reference points to which a distance is calculated when an input data point is to be classified. The algorithm is simply⁴ (a sketch follows the list):

1. Estimate each cluster centre by taking the mean value of all input data that belong to the cluster

2. Calculate the distance from each new input data point to all the cluster centres

3. Classify the input as belonging to the closest cluster centre

⁴ The first step of this algorithm corresponds to the training and labeling steps in the VDS, and the last two steps to the classification step.
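The thesis implements this in Matlab®; a minimal equivalent sketch in Python (function names are ours):

```python
import numpy as np

def fit_centres(X, labels):
    """Step 1: estimate each cluster centre as the mean of its training points."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centres = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, centres

def classify(x, classes, centres):
    """Steps 2-3: assign x to the class with the nearest centre."""
    d = np.linalg.norm(centres - x, axis=1)
    return classes[int(d.argmin())]
```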

                   Classified as blue   Classified as red   Classified as black
True value blue    97.10%               0.86%               2.04%
True value red     0.50%                98.18%              1.32%
True value black   1.84%                1.34%               96.82%

Table 3.8. Classification results using the algorithm described.

This algorithm is tested on the second set of cluster data (figure 3.2). The results are presented in table 3.8 and are compared with table 3.2, where most-frequent multi-classification resolution is used. The performance is slightly better in the VDS approach.

This shows that the VDS has sufficient capabilities to classify data. The example was a simple one, which is why the comparison algorithm was quite easy to design. If, however, noisy signals are used or the clusters are not Gaussian, it may be much harder to find a suitable algorithm, mainly because the cluster centres are difficult to estimate. The VDS, however, is used in the same way regardless, which gives it a strong advantage in simplicity.


Figure 3.11. Correlation plot with estimated values compared to true values using least squares estimation.

3.6.2 Function estimation through polynomial regression

The estimation of the system in equation (3.1) with the VDS in section 3.4 is put into perspective with a least-squares parameter estimation in Matlab®.

With no noise added to inputs and outputs, the estimation problem is deterministic. Therefore only the case with noise is interesting for a comparison. With knowledge of the system, the estimation problem is formulated as

y = Θu    (3.6)

where Θ = [θ₁ θ₂] are the parameters to be estimated and u = [u₁ u₂]ᵀ the input signals. This is an overdetermined system of equations that is solved with least-squares minimization.

Using exactly the same data set as in the noise example in section 3.4, but using Matlab®'s polynomial regression, the result shown in figure 3.11 is achieved. Compared to figure 3.6, the polynomial regression looks worse, and table 3.9 confirms this. The VDS performs better than polynomial regression.

Method                  Max ABSE   Average ABSE   RMSE
VDS                     1078       189.3          254.0
Polynomial regression   1371       237.1          297.3

Table 3.9. Errors when using polynomial regression compared to the results from the VDS.
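A minimal sketch of the least-squares comparison (our own illustrative Python; the thesis uses Matlab®, and the band-limited white noise of table 3.5 is approximated here by Gaussian noise):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
U = rng.uniform(0, 2, (N, 2))                  # inputs u1, u2 of system (3.1)
y = 600 * U[:, 0] + 750 * U[:, 1]              # true outputs

U_noisy = U + rng.normal(0, 5e-4, U.shape)     # input noise (cf. table 3.5)
y_noisy = y + rng.normal(0, 100, N)            # output noise

# Least-squares solution of the overdetermined system y = Theta u, eq. (3.6).
theta, *_ = np.linalg.lstsq(U_noisy, y_noisy, rcond=None)
print(theta)                                   # close to [600, 750]
```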


Chapter 4

Data collection and pre-processing

Collecting and pre-processing data are two subjects that, due to their complexity, could be the subject of a thesis of their own. Therefore this report does not handle them in depth; only relevant issues are discussed. It is however important to stress that data collection and pre-processing are two key factors for success in developing applications with SOMs.

4.1 Data collection

The entirely data-driven nature of SOMs, and therefore of the VDS, makes data collection extremely important, as the training of the SOM is affected by the way data are collected. The following issues have to be considered:

• amount of data
• sampling frequency
• quality of data
• measurement method

4.1.1 Amount of data

A rule of thumb is that during the training sequence the VDS should be presented with an amount of data points at least 500 times larger than the size of the SOM¹. This is to ensure that the network approximates the probability density function of the input data. In a standard-size VP with 256 neurons, this means 128000 samples.

¹ There is no risk of over-training (over-fitting); the SOM method has no such problems.

When not enough data are available, training data can be presented to the SOM in a cyclic way until the required number of data points has been processed². However, doing this may cause information to be lost, e.g. input data with gaps are not representative.

² In this thesis the data are also randomized, to ensure that they are not grouped in a bad way.

In addition, enough data for both training and validation have to be available.

4.1.2 Sampling frequency

The sampling frequency has to be considered when dealing with dynamics. As shown in section 3.5, dynamics can be handled by lagging input and/or output signals. How many and which old values should be used depends on the dynamics of the system combined with the sampling frequency. With a high sampling frequency, many old values are needed to cover the dynamics of the system. On the other hand, a low sampling frequency might cause dynamic information to be lost.

In this thesis a sampling frequency of 10 Hz is used; this is the sampling frequency available in the test cell equipment used.

4.1.3 Quality of data

The quality of data is the most important factor for good results. There are many aspects of this, but the two most important are that data have to be representative and correctly distributed. This is, again, due to the fact that the SOM algorithm is data driven and cannot extrapolate or interpolate. The VDS will most likely generate unreliable output if variables are not representative and correctly distributed.

Representative data contain values from the entire operating range. This means that the variables have varied from their maximum to their minimum values during the measurement. All possible combinations of variables should also be present, to ensure representativeness.

Data from variables that are not representative should either be omitted or used for scaling/normalization. E.g., the ambient air pressure variable is hard to measure over its full range; therefore it is better used to scale other pressure variables that depend upon it.

The distributions of the input variables depend on the test cycle used when collecting data. The test cycle therefore has to produce data that are suitable for the intended application. Two different categories are used: uniform and real-life distributions.

Uniform distribution The distribution should be uniform when estimating functions and doing condition monitoring. In both cases it is important that the application has equal performance over the entire input space, which is why input data should be uniformly distributed.

Real-life distribution A classification problem does not require equal performance over the entire input space. It is sufficient, and often advantageous, to make the classification in areas where the input signal resides frequently in a real situation.

Although these are the distributions wanted for all input signals, often not all of them can be controlled individually; e.g. the boost temperature depends on the torque and the engine speed and cannot be controlled manually. This is something that needs to be considered from case to case when constructing the cycle used for gathering data.

4.1.4 Measurement method

Data are collected in an engine test cell, where an engine is run in a test bench. In the bench, an electromagnetic braking device is connected to the engine. This device brakes the engine to the desired speed, so that different operating conditions can be obtained. Many variables can be measured; the ones measured for this thesis are listed in appendix B.

As there are deviations between engine individuals, only one engine is used. In the laboratory, the engine is controlled with test cycles/schemes to simulate different driving styles and routes. It is important that the cycle generates appropriate data that suit the purpose, as discussed in section 4.1.3. For this thesis, two different test cycles are chosen: the Dynamic Cycle, DC, and the European Transient Cycle, ETC. A DC provides the possibility to control variables using different regulators - here the engine speed and torque are used - whereas the ETC is a standardized cycle.

Aiming for data from the entire dynamical range of the engine, a randomized DC is created. The cycle starts with a torque of 600 Nm and an engine speed of 1350 RPM and then requests new torques and engine speeds twice per second. These values are randomly set under constraints that prevent the engine from overloading. (An overload situation can occur if the engine is running at a very high speed and the demanded speed drops to a much lower level in a short time, for example going from 2500 RPM to 1000 RPM in one second; the braking of the engine will then cause the torque to peak.) In addition, the speed is not allowed to vary by more than 100 RPM/second and the torque by more than 1000 Nm/second. In this way, the cycle takes the engine through a large number of operating conditions.

Two different change rates are used in this thesis. The first sets new desired values two times per second and runs for 2000 seconds; with a sampling frequency of 10 Hz, 20000 samples are collected per cycle. The other sets new desired values five times per 10 seconds and runs for 20000 seconds; the same sampling frequency is used, hence 200000 samples are collected.
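For illustration, the sketch below shows how such a randomized cycle could be generated (a minimal NumPy sketch, not the exact cycle generator used in the thesis: the speed and torque ranges are assumptions, and the overload constraint is simplified to the rate limits):

import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

def random_dc(duration_s=2000.0, rate_hz=2.0,
              speed_lim=(600.0, 2500.0), torque_lim=(0.0, 2000.0),
              max_dspeed=100.0, max_dtorque=1000.0):
    # Generate randomized (engine speed, torque) setpoints. The starting
    # point (1350 RPM, 600 Nm), the setpoint rate and the rate limits
    # (100 RPM/s, 1000 Nm/s) follow the text; the speed/torque ranges
    # are assumptions.
    dt = 1.0 / rate_hz
    speed, torque = 1350.0, 600.0
    cycle = []
    for _ in range(int(duration_s * rate_hz)):
        # Draw new random targets, then clip each step so the allowed
        # change per second (scaled by dt) is never exceeded.
        speed += np.clip(rng.uniform(*speed_lim) - speed,
                         -max_dspeed * dt, max_dspeed * dt)
        torque += np.clip(rng.uniform(*torque_lim) - torque,
                          -max_dtorque * dt, max_dtorque * dt)
        cycle.append((speed, torque))
    return np.array(cycle)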

The ETC is a standardized cycle that contains three phases:

1. city driving - many idle points, red lights, and not very high speed
2. country road driving - smaller roads with many curves
3. highway driving - rather continuous high speed driving

The data from this cycle are not as uniformly distributed as the DC data but contain other types of information about the dynamical behavior of the engine. Sequences such as taking the engine from idle to full load are captured in the ETC but not in the DC.

The ETC takes approximately 1800 seconds. Again 10 Hz is used as sampling frequency, hence approximately 18000 samples are collected.

Measurements are performed in three different cases, listed in table 4.1. They are performed on different occasions (different days), which is why the quality of the collected data has to be examined closely. Both the DC and the ETC are used, thus six sets of measurements are collected.

Case  Description
1     Fault free engine
2     Engine with an 11 mm leakage on the air inlet pipe (after the intercooler, see figure 2.2)
3     Engine with intercooler efficiency reduced to 80%

Table 4.1. Description of the three different measurement cases.

4.2 Pre-processing

The pre-processing of data can be divided into specific and general pre-processing. The specific pre-processing deals with how to generate proper files for use as input to the VDS. This can differ a lot depending on how the data files are structured. In addition to this, data may have to be filtered, time shifted, differentiated, etc. This is also, more appropriately, called feature extraction and is more a part of the problem solving method. (The feature extraction is handled in chapters 5-7, where the development of each application is described.) When the input files are in order, data are ready to be used as input to the VDS but not to the VP.

The general pre-processing involves file segmenting, statistics generation and channel splitting, for which different toolboxes in the VDS as well as other software can be used. The general pre-processing also includes transforming data to a form the VP handles: the input interface of the VP uses 8-bit unsigned integers, so the data, usually provided as floats, are scaled to the range 0-255 and then converted to unsigned integers. It is convenient to use the VDS for these tasks.
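As an illustration, the conversion could look as follows (a minimal NumPy sketch; in this thesis the VDS itself performs the conversion, and the function name is an assumption):

import numpy as np

def to_uint8(x, lo, hi):
    # Scale a float signal to 0-255 and convert to the 8-bit unsigned
    # integers that the VP input interface expects. lo/hi should be
    # fixed limits taken from the training data so that training and
    # verification data are scaled identically.
    x = np.asarray(x, dtype=float)
    scaled = np.clip((x - lo) / (hi - lo), 0.0, 1.0) * 255.0
    return np.round(scaled).astype(np.uint8)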

One variable that is handled in the same way for all applications (chapters 5-7) is the ambient air pressure. The mean values and standard deviations in table 4.2 reveal that this variable is not representative. The pressure values collected stem from the ambient air pressure outdoors, which varies from day to day. (All cases are measured on different days. This will cause the VDS to produce very impressive results if, for example, a distinction is to be made between case one and two, since the VDS will then only use the ambient pressure when deciding the winner. Such results are not applicable to a real situation where the ambient air pressure varies more.) Measurements from all different altitudes and weather conditions are not available, and therefore this variable is used to normalize the turbo boost pressure, which is the only other pressure signal used.

Case                      1       2       3
Mean value [kPa]          97.3    98.1    99.7
Standard deviation [kPa]  0.0603  0.0448  0.0604

Table 4.2. Ambient air pressure mean values and standard deviations for the three cases (see table 4.1), using the DC.


Chapter 5

Conditioning - Fault detection

In this chapter a fault detection problem is treated. This is done using the conditioning method in the VDS. The goal is to test if Vindax® can be used to detect some faults in a diesel engine.

5.1 Introduction

The functionality used for fault detection is described in section 3.2. This method is based on the distance measurement between neurons and input data that can be extracted from the VDS. When a combination of the engine states produces a larger distance between the winning neuron and the input signal than occurred during training, an error is detected, or, at least, an unknown situation has occurred.
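The idea can be written down compactly. The sketch below assumes the neuron weight vectors can be exported from the VDS as a plain array (the function names and this array interface are assumptions for illustration): the threshold is the largest winner distance seen on fault free training data, and any sample whose winner distance exceeds it gives a difference above zero.

import numpy as np

def train_threshold(codebook, train_data):
    # Largest winner (best matching unit) distance observed on fault
    # free training data; codebook is an (M, d) array of neuron weight
    # vectors, train_data an (N, d) array of training samples.
    d = np.linalg.norm(train_data[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).max()

def condition(codebook, x, threshold):
    # Difference between the winner distance for a new sample x and the
    # training threshold; a value above zero flags an unknown, i.e.
    # probably faulty, operating point.
    return np.linalg.norm(codebook - x, axis=1).min() - threshold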

5.2 Method

Data from a DC and an ETC, collected according to chapter 4, are used. The system is trained and labeled with 75% of the data from the engine with no faults. The verification data consist of the remaining 25% of the fault free data plus data from an engine with an air leakage and an engine with reduced intercooler (IC) efficiency.

The following variables are used as input:

• moving average over 10 values of boost pressure
• moving average over 100 values of boost temperature
• moving average over 100 values of fuel value
• moving average over 100 values of engine speed


The smoothing with moving averages is done to solve the problem with dynamics; see section 6.2 for a more thorough discussion. It would also be possible to delay signals to solve dynamic problems. However, since different faults show up with different delays and no obvious delays can be detected, that approach is not used.
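For illustration, the smoothing and the alternative delaying could be implemented as follows (a minimal NumPy sketch; the function names are not from the VDS, whose own pre-processing toolboxes are used in practice):

import numpy as np

def moving_average(x, n):
    # Causal moving average over the last n samples; the first n - 1
    # output samples are start-up transients.
    return np.convolve(x, np.ones(n) / n, mode="full")[: len(x)]

def delay(x, k):
    # Delay a signal k samples, padding the start with the first value.
    return np.concatenate([np.full(k, x[0]), x[:-k]]) if k > 0 else x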

5.3 Results and discussion

The results from the conditioning are summarized in table 5.1 and visualized in figure 5.1.

                 %      mean
Healthy engine   0.9    4.3
Air leakage      11.8   9.1
IC reduced       35.7   33.9

Table 5.1. Results during fault detection with a fault free engine, an engine with an air leakage and an engine with reduced IC efficiency. The second column shows the percentage of all samples for which the difference is above zero. The last column shows the mean of the difference values when the difference is above zero. Values above zero indicate an unknown input combination, i.e. a probable fault.

This shows that faults can be detected. As explained in section 3.2, difference values above zero indicate a probable fault in the system. There are a lot more samples with a difference above zero when there is a fault in the system, and the differences are in general also larger in the faulty cases than in the fault free case.

The results presented make it clear that the fault detection performs better when the error is a reduced IC efficiency than when it is an air leakage. This can be explained by comparing the chosen input variables to the variables chosen in chapter 6: they are similar to the variables chosen for isolation of reduced IC efficiency faults, which explains the better performance.

It would be the other way around if the chosen variables were more similar to the ones for isolation of an air leakage. A change of variables in this direction would therefore improve the fault detection when there is an air leakage. This is, however, not a solid approach, as the idea of conditioning is to detect new, i.e. unseen, types of faults. Input variables should not be adapted to a specific problem but should be quite general to ensure that unseen faults are detected.


Figure 5.1. Results during fault detection with (from top to bottom) a healthy engine, an engine with air system leakage and an engine with reduced IC. Values above zero indicate an unknown input combination, i.e. a probable fault.


Chapter 6

Classification - Fault detection and isolation

This chapter describes fault detection and isolation with the VDS. The purpose is to demonstrate how to develop such an application and to give an indication of the results that can be expected when using Vindax® for this task. The goal is to test whether or not Vindax® is capable of producing sufficient information to make a diagnosis on a diesel engine inlet air system.

6.1 Introduction

As was described in section 4.1.4, data from three different cases are available. In engine diagnostics it is desirable to isolate the faults in these cases, i.e. distinguish a fault free engine from an engine with a leakage in the inlet air system or a reduced intercooler efficiency. This is what the VDS is going to be used for in this chapter. Two applications are developed to fulfill the purpose of this chapter: detection of a leakage and detection of a reduced intercooler efficiency. DC and ETC data from a fault free engine and an engine with implemented faults are combined and used as input. For training and labeling 75% of the data are used, and the remaining part is used for verification.

6.2 Method

To develop these applications the network is trained to recognize fault situations. The engine state will reveal these situations through the combination of its state variables, e.g. load, speed, oil and water temperature. Not all variables will contain information about the fault, and hence the selection of the proper variables is vital.


6.2.1 Leakage isolation

When there is an air leakage, the compressor is not able to build up the intake manifold pressure, i.e. the boost pressure signal, as fast or to the same level as in the fault free engine. Therefore, the boost pressure signal and its derivative are suitable as inputs. An additional state variable is the amount of fuel injected, i.e. the fuel value, with its derivative. This variable is selected because the amount of fuel injected is correlated with the boost pressure. See section 2.2.4 for details.

The following signals are used as inputs and pre-processed as follows:

• boost pressure
Smoothed by taking a 10 sample average and then normalized with the ambient air pressure.

• boost pressure derivatives
The differences between the current boost pressure and the values at t − 4 and t − 8 are used as inputs to incorporate the slope. These two were chosen simply by examining the slope of the boost pressure and trying to incorporate as much useful information about the derivative as possible.

• fuel value
Smoothed by taking a 10 sample average, delayed by 7 samples to account for dynamics, see section 3.5. The time constant, 7 samples, is estimated by maximizing the correlation between the fuel value and the boost pressure signals.

• fuel value derivatives
The differences between the current fuel value and the values at t − 4 and t − 8 are used as inputs to incorporate the slope. These two were chosen for the same reasons as the boost pressure derivatives.

The smoothing reduces the effect of noise and dynamic behavior. This makes a total of six input signals; a sketch of this feature extraction is given below.
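Assuming the raw signals are available as NumPy arrays, the six inputs could be assembled as follows (signal and function names are assumptions for illustration; the scaling to 0-255 described in section 4.2 is applied afterwards):

import numpy as np

def ma(x, n):
    # Causal n-sample moving average (start-up transient in the first
    # n - 1 samples).
    return np.convolve(x, np.ones(n) / n, mode="full")[: len(x)]

def shift(x, k):
    # x delayed k samples, start padded with the first value.
    return np.concatenate([np.full(k, x[0]), x[:-k]])

def leakage_features(boost_p, fuel, ambient_p, lag=7):
    # Six inputs for the leakage application: smoothed boost pressure
    # normalized with the ambient air pressure, its changes over 4 and
    # 8 samples, and the lag-compensated smoothed fuel value with its
    # corresponding changes.
    p = ma(boost_p, 10) / ambient_p
    f = shift(ma(fuel, 10), lag)
    X = np.column_stack([p, p - shift(p, 4), p - shift(p, 8),
                         f, f - shift(f, 4), f - shift(f, 8)])
    return X[10 + lag + 8:]  # drop rows affected by start-up transients

The 7-sample lag could, for instance, be estimated by locating the maximum of the cross-correlation between the smoothed fuel value and the smoothed boost pressure.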

Another important variable is the engine speed, which is however not suitable as an input. Experiments using engine speed as an input variable give worse results. (Why this is the case is not obvious. Maybe the VP is confused as more inputs are introduced, but it could also be that the dynamics are not properly handled, i.e. the signal lagging is not correct. Experiments could be done to try to solve this, but the solution presented here gives satisfactory results.) The information that resides in the engine speed signal is instead incorporated in another way. As the boost pressure depends on this variable, but not as strongly as on the fuel value, the input space is divided into three areas:

1. Low speed: 0 rpm → 1250 rpm
2. Middle speed: 1150 rpm → 1650 rpm
3. High speed: 1550 rpm → ∞ rpm

This is done to reduce the effect that the engine speed has on the boost pressure.
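A sample can then be routed to the network responsible for its speed area, for instance as in the sketch below (the function name is an assumption). Note that the areas given above overlap (1150-1250 RPM and 1550-1650 RPM), so a sample near a border belongs to two areas; the sketch simply returns all matching areas:

def speed_regions(rpm):
    # Return the speed region(s) a sample belongs to; the ranges follow
    # the list above and deliberately overlap, so a sample near a border
    # can be assigned to two region-specific networks.
    regions = []
    if rpm <= 1250:
        regions.append("low")
    if 1150 <= rpm <= 1650:
        regions.append("middle")
    if rpm >= 1550:
        regions.append("high")
    return regions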

