A Comparison of Isolation Algorithms on
a Benchmark System
MARTIN TORELM
Abstract
Acknowledgements
I would like to take this opportunity to thank my supervisor, Anna Pernestål, who has been of great help and support throughout this master's degree project. I would also like to thank Scania - NED, and the department of Automatic Control at the Royal Institute of Technology, Sweden. I would also like to thank Vicenç Puig for providing information about the previous work on the benchmark problem used in this thesis.
Table of Contents

1 Introduction
  1.1 Background
  1.2 Objective
  1.3 Goal
2 The benchmark problem
  2.1 The model
  2.2 Faults
    2.2.1 Additive faults
    2.2.2 Mixed faults
3 Test quantities
  3.1 Residual generation
    3.1.1 Discretisation
  3.2 Filtering and thresholds
    3.2.1 Cusum test
    3.2.2 Thresholds
    3.2.3 The Failure Signature Matrix
4 Isolation algorithms
  4.1 Column reasoning
  4.2 Row reasoning
  4.3 A Bayesian approach to isolation
    4.3.1 Independence
    4.3.2 Partial independence
    4.3.3 Full dependence
  4.4 Diagnostic Model Processor
5 Performance measures
  5.1 Memory usage
  5.2 Diagnostic resolution
  5.3 Normalised diagnostic accuracy
  5.4 Error rate
6 Simulation and evaluation
  6.1 Simulations
    6.1.1 Column reasoning
    6.1.2 Row reasoning
    6.1.3 Bayesian isolation methods
    6.1.4 Diagnostic Model Processor
  6.2 Evaluation of the algorithms
    6.2.1 Memory usage
7 Results
  7.1 Performance measures
    7.1.1 Additive faults
    7.1.2 Mixed faults without filtering
    7.1.3 Mixed faults with filtering
    7.1.4 Memory usage
8 Conclusion
  8.1 Discussion
  8.2 Recommendation
  8.3 Summary
  8.4 Future work

A Notation and parameters
B Isolation
Chapter 1
Introduction
Diagnostics is an important task in the field of industrial systems. A malfunctioning component could, for example, result in decreased efficiency, damage to the system, or even personal injury.
When a fault is present, it is important to detect and isolate it in order to make the right decisions. Isolation algorithms are used to determine which component is failing, and how. Isolation can be based on consistency tests for a process: based on knowledge of how the tests react to different faults, the faulty component can be pointed out. A good isolation system can be of valuable help to the shop technician when locating and repairing the fault.
1.1 Background
[Figure 1.1: block diagram: Process data -> Residual generation -> Filtering (test quantities) -> Isolation -> Diagnoses]
Figure 1.1: The structure used in this work for making diagnoses. Residuals are filtered and thresholded, and the resulting tests are used as inputs to the isolation. The output of the isolation is a set of diagnoses: the possible faults explaining the behaviour of the process.
The residuals are filtered, and tests are computed by applying thresholds to the filtered residuals. The tests are then used as inputs to the isolation.
1.2 Objective
The objective of this thesis is to compare different isolation algorithms. Based on the comparison of the different approaches, a recommendation is made regarding the types of problems to which each isolation algorithm is applicable.
1.3 Goal
The goal of this project is to:

1. Implement four types of isolation methods on a benchmark problem
2. Develop performance measures to compare the isolation methods
3. Compare and evaluate the methods with the developed performance measures
Chapter 2
The benchmark problem
In this chapter the benchmark problem is presented and the model equations are derived. The model equations will be used for simulation and for forming the residuals. This benchmark problem has previously been used in [Pulido] and [O. Bouamama].
2.1 The model
The benchmark system that is used for comparing the algorithms is taken from [O. Bouamama] and consists of two tanks with various components connecting the tanks. The system is shown in Figure 2.1.
The purpose of the two-tank system is to provide a constant flow to the consumer. A PI-controlled pump provides water to tank T1, with an inlet flow Qp, to a nominal level of h1c = 0.5 m. Tank T1 is connected to T2 by a pipe. An "ON-OFF" controller regulates the water level, h2, in tank T2 by acting on the valve Vb, causing a water flow, Q12, from tank T1. The valves Vf1 and Vf2 are used to simulate water leakage from the respective tanks.
The inputs to the system are the pump flow, Qp, the control output from the "ON-OFF" controller, Ub, the input voltage, Uo, to the valve for the outlet to the consumer, and the pump voltage, Up. The input vector u is a measured quantity. Measured quantities are denoted with a superscript "m":

$$u = \begin{bmatrix} Q_p^m & U_b^m & U_o^m & U_p^m \end{bmatrix}^T$$
Figure 2.1: System used for the benchmark.

The measured outputs are the water levels in the two tanks:

$$y^m = \begin{bmatrix} y_1^m \\ y_2^m \end{bmatrix} = \begin{bmatrix} h_1 + \varepsilon_1 \\ h_2 + \varepsilon_2 \end{bmatrix}, \quad (2.1)$$
where ε1 and ε2 are measurement noises.
The change of the volume in the tanks can be described as the difference between the sum of all in-flows and the sum of all out-flows. This is written as:

$$\dot V_1 = A_1 \dot h_1 = Q_{in,1} - Q_{out,1} = Q_p - Q_{12} - Q_{f1}$$
$$\dot V_2 = A_2 \dot h_2 = Q_{in,2} - Q_{out,2} = Q_{12} - Q_o - Q_{f2} \quad (2.2)$$
The inlet flow Qp is assumed to be proportional to the pump voltage, Up. Taking into account the limitation of the pump, the inlet flow is described as:
$$Q_p(t) = \begin{cases} U_p & 0 < U_p < Q_{p,max} \\ 0 & U_p \le 0 \\ Q_{p,max} & U_p \ge Q_{p,max} \end{cases} \quad (2.3)$$
The PI controller acting on the pump is modelled as:

$$U_p = K_p (h_{1c} - h_1(t)) + K_i \int (h_{1c} - h_1(t)) \, dt, \quad (2.4)$$
where $K_p$ and $K_i$ are constants and $h_{1c}$ is the set point for the PI controller. Using Bernoulli's law, the water flow $Q_{12}$ between the two tanks is

$$Q_{12} = C_{vb} \, \mathrm{sgn}(h_1 - h_2) \sqrt{|h_1 - h_2|} \, U_b^m. \quad (2.5)$$
The ON-OFF controller controls the inlet flow to tank T2 through a valve, Vb. The valve opens if the water level in tank T2 is less than or equal to 0.09 m and closes if the water level is greater than 0.09 m. The water level cannot be less than 0 m and cannot be greater than 0.11 m. The control signal acting on the valve is

$$U_b^m = \begin{cases} 0 & \text{if } 0.09\,\mathrm{m} \le h_2 < 0.11\,\mathrm{m} \\ 1 & \text{if } 0.00\,\mathrm{m} \le h_2 < 0.09\,\mathrm{m} \end{cases} \quad (2.6)$$
Using Bernoulli's law a second time gives the outflow to the consumer:

$$Q_o = C_{vo} \sqrt{h_2} \, U_o^m, \quad (2.7)$$

where $U_o^m$ is the control signal to the valve $V_o$:

$$U_o^m = \begin{cases} 1 & \text{if } V_o \text{ is open} \\ 0 & \text{if } V_o \text{ is closed} \end{cases} \quad (2.8)$$
Using Equations 2.2 - 2.8, the final model of the system is written as:

$$\dot h_1 = \frac{Q_p - C_{vb}\,\mathrm{sgn}(h_1 - h_2)\sqrt{|h_1 - h_2|}\,U_b^m - Q_{f1}}{A_1}$$
$$\dot h_2 = \frac{C_{vb}\,\mathrm{sgn}(h_1 - h_2)\sqrt{|h_1 - h_2|}\,U_b^m - C_{vo}\sqrt{h_2}\,U_o^m - Q_{f2}}{A_2} \quad (2.9)$$
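To make the model concrete, the dynamics in Eq. 2.9 can be stepped forward with a simple fixed-step Euler scheme. The thesis simulations were done in SIMULINK; the sketch below is an illustrative Python version, and the parameter values (tank areas, valve coefficients, PI gains, pump limit) are placeholders, not the benchmark's actual values, which are listed in Appendix A.

```python
import math

# Illustrative parameters only: the benchmark's actual values are listed in
# Appendix A, so the numbers below are placeholders chosen to give a stable run.
A1, A2 = 0.0154, 0.0154      # tank cross-section areas [m^2]
Cvb, Cvo = 1e-4, 1e-4        # valve flow coefficients
Kp, Ki = 10.0, 2.0           # PI controller gains
Qp_max = 1e-4                # pump saturation limit [m^3/s]
h1c = 0.5                    # set point for the level in tank T1 [m]
T = 0.1                      # Euler step [s]

def simulate(t_end, Qf1=0.0, Qf2=0.0):
    """Euler simulation of Eq. 2.2-2.9; Qf1, Qf2 inject leak faults."""
    h1 = h2 = integral = 0.0
    for _ in range(int(t_end / T)):
        e = h1c - h1
        integral += e * T
        # anti-windup clamp (an addition for stability, not part of the thesis model)
        integral = max(min(integral, Qp_max / Ki), -Qp_max / Ki)
        Up = Kp * e + Ki * integral                      # PI law, Eq. 2.4
        Qp = min(max(Up, 0.0), Qp_max)                   # pump saturation, Eq. 2.3
        Ub = 1.0 if h2 < 0.09 else 0.0                   # ON-OFF controller, Eq. 2.6
        Uo = 1.0                                         # outlet valve open, Eq. 2.8
        Q12 = Cvb * math.copysign(1.0, h1 - h2) * math.sqrt(abs(h1 - h2)) * Ub
        Qo = Cvo * math.sqrt(h2) * Uo                    # outflow, Eq. 2.7
        h1 = max(h1 + T * (Qp - Q12 - Qf1) / A1, 0.0)    # tank dynamics, Eq. 2.9
        h2 = min(max(h2 + T * (Q12 - Qo - Qf2) / A2, 0.0), 0.11)  # level limits
    return h1, h2
```

With these placeholder values the level h1 settles close to its 0.5 m set point while h2 is kept between 0 and 0.11 m by the ON-OFF controller.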
2.2 Faults
Different types of faults are considered in the benchmark problem. In [Pulido] a total of six additive faults are simulated. In [O. Bouamama] a total of eight faults are simulated, both additive and multiplicative. Both works cover only the single fault scenario, but in this work multiple faults are also considered. In this thesis both the set of faults in [Pulido] and the set in [O. Bouamama] are studied. The faults will be denoted F, with different subscripts for the different faults. The superscript "P" will be used for faults taken from [Pulido], and "B" for faults taken from [O. Bouamama].
2.2.1 Additive faults
In [Pulido] only additive, single faults are considered. To be able to compare the results, the following faults from that work are used:

F_ff^P: Fault-free mode: the process runs without faults
F_pump^P: Pump fault: additive fault in pump P1
F_y1^P: Additive fault in level sensor y1
F_y2^P: Additive fault in level sensor y2
F_Qf1^P: Constant leak in tank T1
F_Qf2^P: Constant leak in tank T2
F_Up^P: Additive fault in the controller output Up in tank T1
This gives a total of six different single faults that can be simulated. All combinations of faults will be simulated in the benchmark system, which means that there are 2^6 = 64 possible fault scenarios.
2.2.2 Mixed faults
The faults considered in [O. Bouamama] are of two types, additive and multiplicative:

F_ff^B: Fault-free mode: the process runs without faults
F_pump^B: Pump fault: the pump is simulated off from t = 40 s to t = 120 s
F_y1^B: Level sensor y1m is stuck at zero from t = 40 s to t = 120 s
F_y2^B: Level sensor y2m is stuck at zero from t = 40 s to t = 120 s
F_Qf1^B: Water leak in tank T1 from t = 40 s to t = 120 s, Qf1 = 10^-4 m^3/s
F_Qf2^B: Water leak in tank T2 from t = 40 s to t = 120 s, Qf2 = 10^-4 m^3/s
F_Up^B: Controller output Up^m is short-circuited to ground from t = 40 s to t = 120 s
F_Vb^B: Valve Vb is blocked from t = 40 s to t = 150 s
F_Ub^B: Controller output Ub^m is short-circuited to ground from t = 40 s to t = 120 s
Chapter 3
Test quantities
In this chapter, residuals are computed and test quantities are formed. The test quantities are obtained by simple relations from the model and will be used as input for the isolation.
3.1 Residual generation
The detection part of the diagnostic system is based on residuals. Residuals are obtained from Analytical Redundancy Relations, ARRs [Nyberg, Frisk]. The ARRs, shown in Equations 3.1 - 3.4, are obtained from the model equations (see Equations 2.2 - 2.9 in Chapter 2) by identifying the relations between the measured outputs, $x_j$, and the modelled outputs, $\hat x_j$, with $j = 1, 2, \dots, 4$.
$$\underbrace{A_1\frac{dy_1^m}{dt} + \frac{d\varepsilon_1}{dt}}_{x_1} \approx \underbrace{Q_{12} + Q_p^m + \varepsilon_3 - Q_{f1}}_{\hat x_1} \quad (3.1)$$

$$\underbrace{A_2\frac{dy_2^m}{dt} + \frac{d\varepsilon_2}{dt}}_{x_2} \approx \underbrace{-Q_{12} - C_{vo}\sqrt{y_2^m + \varepsilon_2}\,U_o^m - Q_{f2}}_{\hat x_2} \quad (3.2)$$

$$\underbrace{U_p^m + \varepsilon_4}_{x_3} \approx \underbrace{K_p\, e_{h1} + K_i \int e_{h1}\, dt}_{\hat x_3} \quad (3.3)$$

$$\underbrace{Q_p^m + \varepsilon_3}_{x_4} \approx \underbrace{\begin{cases} U_p^m + \varepsilon_4 & \text{if } 0 < U_p^m + \varepsilon_4 < Q_{p,max} \\ 0 & \text{if } U_p^m + \varepsilon_4 \le 0 \\ Q_{p,max} & \text{if } U_p^m + \varepsilon_4 \ge Q_{p,max} \end{cases}}_{\hat x_4} \quad (3.4)$$

where $\varepsilon_i$, $i = 1, 2, \dots, 4$, is the sensor noise, $e_{h1}$ is the level control error for tank T1, and

$$Q_{12} = -C_{vb}\,\mathrm{sign}(y_1^m + \varepsilon_1 - y_2^m - \varepsilon_2)\sqrt{|y_1^m + \varepsilon_1 - y_2^m - \varepsilon_2|}\cdot U_b^m$$
The residuals can then be computed as the difference between the modelled and the measured outputs:

$$r_j(t) = \hat x_j(t) - x_j(t), \quad j = 1, 2, \dots, 4.$$
3.1.1 Discretisation
The model equations in 2.9 must be discretised, because the implementation of the residuals in the simulation needs to be done in discrete form. The model was discretised using Euler's method, which, for instance, transforms $dy^m/dt$ into

$$\frac{dy^m}{dt} \approx \frac{y^m(k) - y^m(k-1)}{T}. \quad (3.5)$$
The residual generation used for the simulations can then be written as:

$$r_1(k) = -C_{vb}\,\mathrm{sgn}(y_1^m(k) - y_2^m(k))\sqrt{|y_1^m(k) - y_2^m(k)|}\,U_b^m(k) + Q_p^m(k) - Q_{f1}(k) - A_1\frac{y_1^m(k) - y_1^m(k-1)}{T}$$

$$r_2(k) = C_{vb}\,\mathrm{sgn}(y_1^m(k) - y_2^m(k))\sqrt{|y_1^m(k) - y_2^m(k)|}\,U_b^m(k) - C_{vo}\sqrt{y_2^m(k)}\,U_o^m(k) - Q_{f2}(k) - A_2\frac{y_2^m(k) - y_2^m(k-1)}{T}$$

$$r_3(k) = U_p^m(k) - U_p(k-1) - K_p(e^m(k) - e^m(k-1)) - K_i T e^m(k)$$

$$r_4(k) = Q_p^m(k) - \begin{cases} U_p^m(k) & 0 < U_p^m(k) < Q_{p,max} \\ 0 & U_p^m(k) \le 0 \\ Q_{p,max} & U_p^m(k) \ge Q_{p,max} \end{cases} \quad (3.6)$$
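The discrete residual generators of Eq. 3.6 are straightforward to implement. The sketch below is a Python illustration (the thesis implementation was in MATLAB); the signal arguments are hypothetical sequence names, and the nominal model sets the unknown leak flows Qf1 = Qf2 = 0.

```python
import math

def residuals(k, y1m, y2m, Qpm, Ubm, Uom, Upm, em,
              A1, A2, Cvb, Cvo, Kp, Ki, T, Qp_max):
    """Discrete-time residuals of Eq. 3.6 at sample k (k >= 1).
    All signal arguments are sequences indexed by sample number; the
    nominal model assumes the leak flows Qf1 = Qf2 = 0."""
    q12 = Cvb * math.copysign(1.0, y1m[k] - y2m[k]) \
          * math.sqrt(abs(y1m[k] - y2m[k])) * Ubm[k]
    r1 = Qpm[k] - q12 - A1 * (y1m[k] - y1m[k - 1]) / T
    r2 = q12 - Cvo * math.sqrt(max(y2m[k], 0.0)) * Uom[k] \
         - A2 * (y2m[k] - y2m[k - 1]) / T
    # incremental form of the PI law: Up(k) = Up(k-1) + Kp*(e(k)-e(k-1)) + Ki*T*e(k)
    r3 = Upm[k] - Upm[k - 1] - Kp * (em[k] - em[k - 1]) - Ki * T * em[k]
    # pump saturation, Eq. 2.3
    r4 = Qpm[k] - min(max(Upm[k], 0.0), Qp_max)
    return r1, r2, r3, r4
```

For consistent fault-free data, all four residuals evaluate to zero up to noise.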
3.2 Filtering and thresholds
Figure 3.1: Residuals r1, r2, r3 and r4 for an additive fault in y1. The dashed lines show the thresholds used for the residuals.
3.2.1 Cusum test
The Cusum test [Gustafsson] is used to detect small changes in the bias of the residuals. The Cusum test is defined as

$$S_{t+1} = \begin{cases} S_t + y_{t+1} & y_{t+1} > h \\ 0 & y_{t+1} \le h \end{cases}$$
A rule of thumb is that the value of h should be 2.5 times the fault size.
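The filter defined above is easy to transcribe directly. A minimal Python illustration (the thesis implementation was in MATLAB):

```python
def cusum(signal, h):
    """Cusum filter as defined above: the sum grows while samples exceed
    the level h and resets to zero otherwise."""
    out, S = [], 0.0
    for y in signal:
        S = S + y if y > h else 0.0
        out.append(S)
    return out

print(cusum([0, 1, 2, 0, 3], h=0.5))  # [0.0, 1.0, 3.0, 0.0, 3.0]
```

A small residual bias that persistently exceeds h therefore accumulates into a large test quantity, which is what makes the filter useful for detecting small faults.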
3.2.2 Thresholds
Thresholds are used to determine if a test has reacted to a fault. The thresholds are chosen such that no test reacts when the system is simulated in the fault-free case: each threshold is set just above the highest value of the residual, in order to avoid false alarms. A simulation of an additive fault in the sensor for y1 is shown in Figure 3.1. The residuals r1 and r2 are sensitive to the fault and have exceeded their thresholds. Figure 3.2 shows the test results for the same simulation.
3.2.3 The Failure Signature Matrix
Figure 3.2: The test results for the additive fault F_y1^P.
r_i    F_pump^P  F_y1^P  F_y2^P  F_Qf1^P  F_Qf2^P  F_Up^P
r_1       0        x       x        x        0       0
r_2       0        x       x        0        x       0
r_3       0        0       0        0        0       x
r_4       x        0       0        0        0       x

Table 3.1: The failure signature matrix.
[Figure 3.3: residual amplitude plots with thresholds]
Chapter 4
Isolation algorithms
There are many different algorithms that have been developed for isolation. Some of them are based on models of the process, while others are based on experience. In this project four different model-based approaches to fault isolation are considered, as well as variations of the algorithms obtained by changing different assumptions.
First, some notation. Let $c_i$ be a variable which describes the behavioural mode of component $i$, such that

$$c_i = \begin{cases} 0 & \text{for no fault in component } i \\ 1 & \text{for fault in component } i \end{cases} \quad (4.1)$$

Further, let $d_j$ be the test result from test $j$:

$$d_j = \begin{cases} 0 & \text{for no alarm} \\ 1 & \text{for alarm} \end{cases} \quad (4.2)$$
The current system behavioural mode, C, and the test results, D, can then be written as
C = [c1, c2, c3, . . . cn] (4.3)
D = [d1, d2, d3, . . . dm]. (4.4)
Let Δ be a diagnosis, i.e. a system behavioural mode that is consistent with the measurements. Note that there can be several system behavioural modes that are consistent with the measurements. The output from the isolation system is a set of diagnoses, D.
4.1 Column reasoning
Column reasoning uses the columns of the failure signature matrix: when a test reacts, the candidate faults are those marked with an x in the corresponding row. Two variants are considered for handling test results that have not reacted:

1. No conclusion can be drawn from a test result that has not been activated.

2. An inactivated test result excludes the faults marked with an x in its row (this is the traditional way of doing isolation in the FDI field, but it is generally not recommended, since it could exclude a correct diagnosis when a test misses a detection; see Figure 3.3).
Example 1 Consider the FSM:
d_i    c_1  c_2  c_3
d_1     x    0    x
d_2     0    x    x     (4.5)
When test d_1 reacts and test d_2 does not react, the first variant of the Column reasoning method gives the diagnoses

D = {{c_1}, {c_3}}, (4.6)

while the second variant of Column reasoning gives

D = {{c_1}}. (4.7)
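The two variants can be sketched as set operations over the FSM rows. A minimal Python illustration for the single-fault case (function and variable names are mine, not from the thesis):

```python
def column_reasoning(fsm, tests, variant=2):
    """Single-fault column reasoning over a failure signature matrix.
    fsm[j][i] is True when test j is sensitive to fault i;
    tests[j] is True when test j has alarmed.
    Variant 1 draws no conclusion from inactive tests; variant 2 also
    excludes every fault marked 'x' in an inactive test's row."""
    n = len(fsm[0])
    candidates = set(range(n))
    for j, alarmed in enumerate(tests):
        if alarmed:
            # an alarming test keeps only the faults it is sensitive to
            candidates &= {i for i in range(n) if fsm[j][i]}
        elif variant == 2:
            # a silent test excludes the faults it is sensitive to
            candidates -= {i for i in range(n) if fsm[j][i]}
    return candidates

# a two-test, three-component FSM with rows d1 = (x 0 x), d2 = (0 x x)
fsm = [[True, False, True],
       [False, True, True]]
print(column_reasoning(fsm, [True, False], variant=1))  # {0, 2}: c1 or c3
print(column_reasoning(fsm, [True, False], variant=2))  # {0}: only c1
```

Variant 2 gives sharper answers but, as noted above, a missed detection can make it exclude the true fault.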
4.2 Row reasoning
Row reasoning, a variant of Reiter's algorithm (DX), is a common way to handle the isolation problem. The main idea behind row reasoning is that each reacting test results in a conflict. A conflict means, in this case, that not all of the components included in the conflict can be non-faulty at once. The conflicts are generated from the reacting tests together with the rows of the Failure Signature Matrix. For every new conflict, the intersection with the old ones produces the new diagnosis statement.
Example 2
d_i    c_1  c_2  c_3
d_1     x    0    x
d_2     0    x    x
Consider the FSM above. If test number one reacts, i.e. d_1 = 1, then it produces the conflict {c_1, c_3} and the diagnoses

D = {{c_1}, {c_3}}. (4.8)

Later on, if test number two also reacts (d_2 = 1), then the conflicts are {c_1, c_3} ∧ {c_2, c_3}, and the Row reasoning isolation method produces the diagnoses

{c_1, c_3} ∧ {c_2, c_3} ⇒ D = {{c_1, c_2}, {c_1, c_3}, {c_2, c_3}, {c_3}}. (4.9)
This means that there are four possible explanations of the system's behaviour; three of them contain double faults, and one contains a single fault. The most common assumption is that the current behaviour of the process has its most probable explanation in the fault that includes the fewest components, which in the example above is {c_3}, even though all of the above sets are possible.

In this thesis, Reiter's algorithm is used for finding the minimal diagnoses. It produces a minimal set of possible faults, and a common interpretation is that the current system behavioural mode exists in this set, even if all supersets of the minimal diagnoses are possible. In the example above, the result would be D = {{c_1, c_2}, {c_3}}.
The structure of Reiter’s algorithm is shown below:
1. Initialise the set of minimal diagnoses to hold only the empty set, i.e. D = {∅}
2. Given a (new) conflict, find out if any minimal diagnosis is invalidated, i.e. has an empty intersection with the conflict
3. Extend any invalidated diagnosis to a set of new diagnoses, each consisting of the invalidated diagnosis and an element from the new conflict
4. Remove any new diagnoses that are not minimal, i.e. are supersets of any other minimal diagnosis
5. Iterate from Item 2 for all new conflicts
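The steps above amount to incrementally maintaining the minimal hitting sets of the conflicts. A compact Python sketch (names are mine; this is the hitting-set update, not the full HS-tree of Reiter's original algorithm):

```python
def minimal_diagnoses(conflicts):
    """Minimal hitting sets of a list of conflicts: start from the empty
    diagnosis, extend every diagnosis invalidated by a new conflict with one
    element of that conflict, then prune non-minimal sets."""
    diagnoses = [frozenset()]
    for conflict in conflicts:
        valid = [d for d in diagnoses if d & conflict]
        invalid = [d for d in diagnoses if not (d & conflict)]
        extended = [d | {c} for d in invalid for c in conflict]
        candidates = valid + extended
        # keep only minimal sets (strict subset test prunes supersets)
        diagnoses = list({d for d in candidates
                          if not any(e < d for e in candidates)})
    return diagnoses

# the conflicts from the example above: {c1, c3} and {c2, c3}
result = minimal_diagnoses([frozenset({'c1', 'c3'}), frozenset({'c2', 'c3'})])
print(sorted(map(sorted, result)))  # [['c1', 'c2'], ['c3']]
```

The result matches the minimal diagnoses stated in the text: {c1, c2} and {c3}.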
4.3 A Bayesian approach to isolation
A drawback with Column reasoning and Row reasoning is that the algorithms often produce many diagnoses. Therefore, a Bayesian approach to isolation [Pernestål] has been considered. The main idea behind this approach to fault isolation is to compute the probability that a fault is present. This probability can then be used for ranking, or for decision making on fault accommodation.
Let C be the current system behavioural mode and D the test results. Then Bayes' rule is applicable as follows:

$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)} \quad (4.10)$$
In order to obtain a good estimate of the functions in Equation 4.10, the system has to be simulated with all combinations of faults, single as well as multiple. The simulations are done with the current system behavioural mode as input and the test results as outputs. P(C) is called the prior probability. If all faults are assumed to be independent and the probability for a single fault to occur is $P(c_i) = p_c$ for all $i = 1, \dots, n$, then

$$P(C) = \prod_{i=1}^{n} p_c^{c_i} (1 - p_c)^{1 - c_i}.$$

The function P(D) is a normalisation factor and is calculated as follows:

$$P(D) = \sum_C P(D \mid C)\,P(C) \quad (4.11)$$
The Bayesian approach is varied using different assumptions about independence; the only difference is how P(D | C) is computed. The next three subsections describe the different assumptions.
4.3.1 Independence
In this variant of the Bayesian algorithm, all test results, $d_j$, are assumed to be independent. The probability $P(D \mid C)$ can then be computed as

$$P(D \mid C) = \prod_j P(d_j \mid C). \quad (4.12)$$

For every fault simulation, simulation data is gathered at the sample times $t_k = kT$, where $k = 0, \dots, N-1$ and $T$ is the sampling interval. The distribution $P(d_j \mid C)$ is then estimated as follows:

$$P(d_j = 1 \mid C) = \frac{\sum_{k=1}^{N} d_j(t_k \mid C)}{N}, \qquad P(d_j = 0 \mid C) = 1 - P(d_j = 1 \mid C), \quad (4.13)$$
where dj(tk | C) is the observed test result from time tk given the system behavioural mode C.
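Estimating the per-test probabilities and turning them into a posterior over behavioural modes can be sketched as below. This is an illustrative Python version (the thesis used MATLAB); the training-data layout is an assumption of mine.

```python
def estimate_test_probabilities(training):
    """Estimate P(d_j = 1 | C) per Eq. 4.13 as the fraction of samples in
    which test j alarmed. `training` maps a behavioural mode C to a list of
    observed test-result vectors."""
    p = {}
    for mode, samples in training.items():
        m = len(samples[0])
        p[mode] = [sum(s[j] for s in samples) / len(samples) for j in range(m)]
    return p

def posterior(p, prior, observed):
    """P(C | D) under the independence assumption (Eq. 4.10-4.12)."""
    joint = {}
    for mode, pj in p.items():
        lik = 1.0
        for j, d in enumerate(observed):
            lik *= pj[j] if d else 1.0 - pj[j]
        joint[mode] = lik * prior[mode]
    z = sum(joint.values())                    # P(D), Eq. 4.11
    return {mode: v / z for mode, v in joint.items()}

# two modes, two tests; four training samples per mode (hypothetical data)
training = {'NF': [(0, 0), (0, 0), (0, 1), (0, 0)],
            'F1': [(1, 1), (1, 0), (1, 1), (1, 1)]}
p = estimate_test_probabilities(training)
print(posterior(p, {'NF': 0.9, 'F1': 0.1}, (1, 1)))
```

With both tests alarming, the posterior concentrates on F1 even though its prior probability is low.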
4.3.2 Partial independence
The assumption of independence among test results is generally not valid. Two tests could, for example, be dependent when they share the same underlying relations: if one of the tests reacts, it can cause the other to react. In this variant of the Bayesian algorithm, partial independence of the test results is assumed.
The method used in this work for finding dependence among tests is taken from [Pernestål]. If tests are dependent and a test has reacted, knowledge about the system behavioural mode will not provide any further information about the other tests. To decide if tests are dependent, training data is collected from different system behavioural modes. The training data is then evaluated, and the likelihoods of different dependencies are computed.
4.3.3 Full dependence
In reality, there is always a possibility of dependence among tests. To make sure that all possibilities are covered, we can assume that there are dependencies between all tests; the probabilities should then be calculated as:

$$P(C \mid D) = \frac{P(d_1 d_2 \dots d_m \mid C)\,P(C)}{P(D)} \quad (4.15)$$
The assumption about full dependence is the best that can be done when computing the probabilities for system behavioural modes given the test results, and can also be used as reference when evaluating how well the other assumptions work.
4.4 Diagnostic Model Processor
The Diagnostic Model Processor [Petti], DMP, is a model-based algorithm for diagnostics. This method is also based on residuals; the residuals are weighted to decide the degree of violation of the model equations. The thresholds are obtained as before, and the residual $r_j$ exceeding the threshold $\tau_j$ corresponds to $v_j$ exceeding the value 0.5. The residuals are calculated from measurements, u and y, from the process. Let C be a vector of assumptions about the system behavioural mode. Then the residuals can be written as:

$$r_j = g_j(C) \quad (4.16)$$
The residuals are used to calculate a satisfaction vector, $v^{sf}$, which contains the information on how well the model equations are satisfied: 0 for perfect satisfaction and ±1 when the model equations are severely violated high or low, respectively:

$$v_j^{sf} = \frac{(r_j/\tau_j)^n}{1 + (r_j/\tau_j)^n} \quad (4.17)$$
This can be seen as another way of thresholding.

The sensitivity matrix, S, is determined through the partial derivatives of the model equations, $g_j$, with respect to the faults, $c_i$:

$$S_{ij} = \frac{\partial g_j / \partial c_i}{|\tau_j|} \quad (4.18)$$

The sensitivity matrix corresponds to the FSM in the FDI approach and describes how easily a behavioural mode, $c_i$, violates the residual $r_j$.
The failure likelihood, $F_i$, of assumption $c_i$ is determined from:

$$F_i = \frac{\sum_{j=1}^{n} S_{ij}\,v_j^{sf}}{\sum_{j=1}^{n} |S_{ij}|} \quad (4.19)$$
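One DMP evaluation step can be sketched as below. This is an illustration, not the thesis implementation; to keep the satisfaction value bounded and sign-preserving (matching the ±1 description above), the denominator is written with $|r_j/\tau_j|^n$, which is a choice of mine.

```python
import math

def dmp_step(r, tau, S, n=4):
    """One DMP evaluation: satisfaction values (Eq. 4.17) and failure
    likelihoods (Eq. 4.19). S[i][j] is the sensitivity of residual j to
    fault i. The denominator uses |r/tau|^n so that v stays bounded and
    sign-preserving (an assumption of this sketch)."""
    v = []
    for rj, tj in zip(r, tau):
        x = rj / tj
        v.append(math.copysign(abs(x) ** n / (1.0 + abs(x) ** n), x))
    F = [sum(sij * vj for sij, vj in zip(Si, v)) / sum(abs(sij) for sij in Si)
         for Si in S]
    return v, F

# one residual at twice its threshold; two faults with opposite sensitivities
v, F = dmp_step([0.2], [0.1], [[1.0], [-1.0]])
print(v, F)
```

A residual at twice its threshold gives v close to +1, so the positively sensitive fault gets a failure likelihood near +1 and the negatively sensitive one near -1, illustrating the signed likelihoods discussed in Section 6.1.4.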
Chapter 5
Performance measures
To be able to compare the performance and the resources required by the respective algorithms, performance measures are needed. Explanations and definitions of the performance measures follow in this chapter. Some of the measures are not applicable to certain types of problems; where this is the case, a description can be found in the respective section.
5.1 Memory usage
The memory usage of an algorithm is defined as the amount of memory required for carrying through the isolation. This measure depends on the size as well as the desired accuracy of the data structures:

• In Column reasoning and Row reasoning, an FSM needs to be stored
• In the Bayesian approaches, the functions P(D | C) and P(C) need to be stored as tables
• In DMP, the sensitivity matrix S needs to be stored
5.2 Diagnostic resolution
The diagnostic resolution ([Pulido]) measures the average of the beliefs, $p_C^k$, assigned to the system behavioural modes, C, evaluated for performance test number k. The diagnostic resolution is defined as:

$$\gamma = \frac{1}{L} \sum_{k=1}^{L} \sum_C p_C^k. \quad (5.1)$$

The optimal value of the diagnostic resolution is one; to achieve it, the diagnoses need to point out one system behavioural mode on average.
Example 3 Let B1, B2, B3 and B4 be the possible behavioural modes. If the Row reasoning method produces the diagnosis statement {{B1}, {B2}} in performance test k, then

$$\sum_B p_B^k = 2. \quad (5.2)$$

If the Bayesian method states the diagnoses {P(B1) = 0.5, P(B2) = 0.4, P(B3) = 0.1, P(B4) = 0} for performance test k, then

$$\sum_B p_B^k = 1. \quad (5.3)$$

Since the Bayesian method states the diagnoses in the form of probabilities, the diagnostic resolution will always be one in this case.
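Eq. 5.1 is a simple average of summed beliefs. A Python sketch reproducing the numbers of Example 3 (data structures are mine):

```python
def diagnostic_resolution(beliefs_per_test):
    """Diagnostic resolution (Eq. 5.1): the average, over L performance
    tests, of the summed belief assigned to behavioural modes.
    beliefs_per_test[k] maps each candidate mode to its belief p_C^k."""
    L = len(beliefs_per_test)
    return sum(sum(b.values()) for b in beliefs_per_test) / L

# Example 3: Row reasoning assigns belief 1 to each of two diagnoses,
# while the Bayesian method spreads probability mass summing to one.
row = [{'B1': 1, 'B2': 1}]
bayes = [{'B1': 0.5, 'B2': 0.4, 'B3': 0.1, 'B4': 0.0}]
print(diagnostic_resolution(row))    # 2.0
print(diagnostic_resolution(bayes))  # 1.0
```

This makes the difference concrete: logic-based methods can score above one because every consistent diagnosis gets full belief, while probabilistic outputs always sum to one.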
5.3 Normalised diagnostic accuracy
Normalised diagnostic accuracy, NDA, was developed in order to handle the multiple fault scenario. The idea is to place different weights depending on how important a component is, i.e. to let single faults be more important than behavioural modes containing multiple faults. Let the function f(C | D) denote the confidence of a diagnosis, with

$$\sum_C f(C \mid D) = 1. \quad (5.4)$$

The NDA is then defined as:

$$\alpha = \frac{\frac{1}{N}\sum_C f(C \mid D_C)\,k_C}{\frac{1}{N}\sum_C k_C} = \frac{\sum_C f(C \mid D_C)\,k_C}{\sum_C k_C}, \quad (5.5)$$

where $D_C$ is the observation of the test results when C is the true system behavioural mode, and $k_C$ is a weight for system behavioural mode C. The weights are chosen such that important behavioural modes get large values and less important ones get smaller values, depending on how important it is that the behavioural mode C is included in the diagnosis when it is active. For example, it is more important to have a correct diagnosis statement for single faults or NF than for the case when more or all components are faulty. The parameter $k_C$ is a design parameter; a good choice is to let single faults have the weight $0.1^1$, double faults the weight $0.1^2$, and so on.
Example 4 Assume that the true system behavioural mode is C2, that the test result is D_{C2}, and that the isolation reaches the conclusion that C2 or C3 is present. The confidences of the diagnoses are then:

f(C1 | D_{C2}) = 0, f(C2 | D_{C2}) = 1/2, f(C3 | D_{C2}) = 1/2.

The only confidence which contributes to the sum in the numerator of Equation 5.5 is f(C2 | D_{C2}) = 1/2.
The optimal value for the NDA is 1. This value is not achievable in practice, because it would require the confidence of the diagnosis to be one at all times.
5.4 Error rate
The error rate is defined as the average fraction of faulty diagnoses over the current system behavioural modes. A faulty diagnosis means that the true system behavioural mode is not present in the diagnosis:

$$\beta = \frac{1}{L} \sum_C \mathbf{1}\{f(C \mid D_C) = 0\} \quad (5.6)$$
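The two measures defined in Sections 5.3 and 5.4 can be computed directly from the confidences. A Python sketch with hypothetical confidences and weights:

```python
def nda(confidences, weights):
    """Normalised diagnostic accuracy (Eq. 5.5): confidence-weighted sum over
    true modes C, normalised by the total weight."""
    return (sum(confidences[c] * weights[c] for c in weights)
            / sum(weights.values()))

def error_rate(confidences):
    """Error rate (Eq. 5.6): the fraction of true modes whose diagnosis got
    zero confidence, i.e. the true mode was missed entirely."""
    return sum(1 for f in confidences.values() if f == 0) / len(confidences)

# hypothetical confidences f(C | D_C) and weights k_C for three true modes,
# weighting the single faults C1, C2 ten times more than the double fault C3
conf = {'C1': 0.5, 'C2': 0.0, 'C3': 1.0}
k = {'C1': 0.1, 'C2': 0.1, 'C3': 0.01}
print(nda(conf, k))       # 0.06 / 0.21
print(error_rate(conf))   # 1/3
```

Note how the weighting makes the missed single fault C2 hurt the NDA far more than a missed double fault would.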
Chapter 6
Simulation and evaluation
This chapter describes how the simulation and evaluation of the isolation methods were carried out.
6.1 Simulations
The benchmark system was simulated in SIMULINK and the isolation algorithms were implemented in MATLAB. The residual generation is the same for all algorithms in order to make an objective comparison. All thresholds were also kept the same for all algorithms, with the exception of DMP, where thresholds are defined in a different way. All simulations were done both for the case where only additive faults are used, with a total of six faults, and for the case where multiplicative faults are also used. Two ways of handling the test results have been considered:
[Figure 6.1: a test result computed at every sample (1) compared with a test result held once activated (2)]
1. Test results are computed at every sample time
2. Test results are computed at every sample time until the result equals one; it is then held at one during the rest of the simulation. In this way fluctuation in the diagnosis is avoided (see Figure 6.1)
The residuals are shown in Figures 6.2 - 6.3 while simulating the faults F_y1^P and F_Qf1^P. Figures 6.4 - 6.5 show the test results. The simulation output for each isolation method is shown in Figures 6.6 - 6.13, using the faults F_y1^P and F_Qf1^P, when only single faults are considered and when test results are held.
6.1.1 Column reasoning
Column reasoning was implemented both for single faults and multiple faults. The multiple fault case needed an extended FSM, obtained by merging the single fault FSM into a multiple fault FSM. A simulation of an additive fault in the level sensor for tank T1 is shown in Figure 6.6 (only considering single faults). Note that the first variant of Column reasoning is more careful about excluding faults, which leads to many diagnoses; but if there are small active faults, causing the tests not to react, then Column reasoning 1 seems reasonable.
6.1.2 Row reasoning
All combinations of faults were used during the simulations. The properties of row reasoning are such that the regular FSM can be used in both the single-fault case and the multiple-fault case. Row reasoning always produces a diagnosis for multiple faults; therefore, when evaluating the algorithm in the single fault case, diagnoses with more than one component are ignored. The output from the Row reasoning method is shown in Figure 6.8 while simulating a fault in the level sensor for tank T1. A comparison with the output from the Column reasoning method (see Figure 6.6, to the right) shows that the methods are very similar. The difference appears around 40 s, when just one test has reacted: the Row reasoning method shows three possibly faulty components, while the Column reasoning method points out just one, incorrect, component. Shortly after that, the next test reacts and the two methods show the same output.
6.1.3 Bayesian isolation methods
Figure 6.2: Residuals for the single fault scenario with additive faults considered, simulating the fault F_y1^P from t = 40 s to t = 120 s.
Figure 6.3: Residuals for the single fault scenario with additive faults considered, simulating the fault F_Qf1^P.
Figure 6.4: Test results for the single fault scenario with additive faults considered, simulating the fault F_y1^P from t = 40 s to t = 120 s.
Figure 6.5: Test results for the single fault scenario with additive faults considered, simulating the fault F_Qf1^P.
Figure 6.6: Output from the Column reasoning isolation methods for the single fault scenario with additive faults considered, simulating the fault F_y1^P from t = 40 s to t = 120 s. The figure to the left shows that no conclusion can be drawn about the system behavioural mode for the first 40 samples.
Figure 6.7: Output from the Column reasoning isolation methods (variants 1 and 2) for the single fault scenario with additive faults considered, simulating the fault F_Qf1^P.
Figure 6.8: Output from the Row reasoning isolation method for the single fault scenario with additive faults considered, simulating the fault F_y1^P from t = 40 s to t = 120 s.
Figure 6.9: Output from the Row reasoning isolation method for the single fault scenario with additive faults considered, simulating the fault F_Qf1^P.
Example 5 Consider a system with three components and three tests, i. e. i = 3 and j = 3. If we simulate all behavioural modes, the probabilities can be estimated through
$$P(d_1 = x, d_2 = y, d_3 = z \mid c_1 = \xi, c_2 = \zeta, c_3 = \vartheta) = \frac{n_{xyz}^{\xi\zeta\vartheta}}{N} \quad (6.1)$$

where $n_{xyz}^{\xi\zeta\vartheta}$ is the number of samples with that outcome, N is the total number of samples, and $x, y, z, \xi, \zeta, \vartheta$ can take the values 0 or 1.
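The counting in Eq. 6.1 can be sketched as below for a single behavioural mode, using hypothetical sample data:

```python
from collections import Counter

def estimate_joint(samples):
    """Estimate P(d1,...,dm | C) for one behavioural mode C by counting
    complete test-result vectors (Eq. 6.1): n_xyz / N."""
    counts = Counter(tuple(s) for s in samples)
    N = len(samples)
    return {outcome: n / N for outcome, n in counts.items()}

# e.g. four samples observed while simulating one mode
p = estimate_joint([(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)])
print(p[(1, 1, 0)])  # 0.5
```

Because complete outcome vectors are counted, the table grows as 2^m entries per mode, which is the memory price of the full dependence assumption discussed in Section 6.2.1.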
A simulation of an additive fault in the level sensor in tank T1, where only single faults are considered, is shown in Figure 6.10.
6.1.4 Diagnostic Model Processor
DMP was implemented in MATLAB, and the thresholds and the sensitivity matrix from [Pulido] were used for the case where only additive faults are considered. For the multiplicative faults, the sensitivity matrix was chosen such that the elements corresponding to multiplicative faults got the values +1 and -1 for residuals which react with a positive and a negative derivative, respectively. The thresholds were chosen such that no residual reacts in the fault-free mode.
A simulation of a fault in the level sensor for tank T1 is shown in Figure 6.12, where only single faults are assumed to be possible. The figure shows the likelihoods for faults in the different components. Note that this method produces a different kind of result than the previous methods: the likelihoods can be both positive and negative. A negative likelihood can be interpreted either as a negative fault being present in the corresponding component, or as it being highly unlikely that the fault is present.
6.2 Evaluation of the algorithms
The evaluation of the isolation algorithms was done with MATLAB. Scripts were used to simulate all combinations of faults, and the data was processed afterwards. All of the performance measures are evaluated both for snapshots of data and for snapshots where the test results are held active once activated. For the case where mixed faults are simulated, the performance is also measured with the test results filtered using the Cusum test. In the additive fault case, filtering is not necessary, because simulations showed that it was always possible to separate a violated residual from a non-violated one.
Figure 6.10: Output from the Bayesian method, where the test results are assumed to be independent, and the fault F_y1^P is present from t = 40 s to t = 120 s. The figure shows the probabilities for the single fault scenario with additive faults considered.
Figure 6.11: Output from the Bayesian method, where the test results are assumed to be independent, and the fault FQPf1 is present from t = 40 s to t = 120 s.
Figure 6.12: Output from the Diagnostic Model Processor. The figure shows the likelihoods for the single fault scenario with additive faults considered, simulating the fault FyP1 from t = 40 s to t = 120 s.
Figure 6.13: Output from the Diagnostic Model Processor. The figure shows the likelihoods for the single fault scenario with additive faults considered, simulating the fault FQPf1 from t = 40 s to t = 120 s.
• Error rate
The error rate is measured for the single-fault case only; it is not evaluated for the multiple fault case.
• Diagnostic resolution
The diagnostic resolution is likewise measured for the single-fault case only.
• Normalised Diagnostic Accuracy
The NDA uses different weights depending on the importance of isolating the current system behavioural mode, which makes it suitable for measuring performance on multiple faults. Isolation performance is therefore measured for all faults, single as well as multiple.
• Memory usage
The memory usage is computed by analysing the memory structures needed for storing the information about the isolation.
The parameters for the benchmark system and the isolation can be found in Appendix A.
6.2.1
Memory usage
The memory usage is denoted δ. For the case where non-boolean structures are used, the number of bits used in the elements of the structures is η.

Single faults

Isolation algorithm                    Need to store                                             δ [bit]
Column reasoning                       A regular FSM with m·n booleans                           m·n
Row reasoning                          A regular FSM with m·n booleans                           m·n
Bayesian method, independence          P(C): n·η;  P(D|C) = ∏_j P(d_j|C): m·n·η                  (1+m)·n·η
Bayesian method, partial independence  P(C): n·η;  P(D|C) = P(d_r d_s|C)·∏_{j≠r,s} P(d_j|C):
                                       ((m−2)·n + 2^2·n)·η                                       (3+m)·n·η
Bayesian method, full dependence       P(C): n·η;  P(D|C): 2^m·n·η                               (1+2^m)·n·η
Diagnostic Model Processor             The sensitivity function with m·n real numbers            m·n·η

Multiple faults

Isolation algorithm                    Need to store                                             δ [bit]
Column reasoning                       An extended FSM with m·2^n booleans                       m·2^n
Row reasoning                          A regular FSM with m·n booleans                           m·n
Bayesian method, independence          P(C): 2^n·η;  P(D|C) = ∏_j P(d_j|C): m·2^n·η              (1+m)·2^n·η
Bayesian method, partial independence  P(C): 2^n·η;  P(D|C): ((m−2)·2^n + 2^2·2^n)·η             (3+m)·2^n·η
Bayesian method, full dependence       P(C): 2^n·η;  P(D|C): 2^m·2^n·η                           (1+2^m)·2^n·η
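The δ formulas can be checked numerically. Assuming m = 4 tests, n = 6 components (the additive-fault case) and η = 64 bits per stored number, they reproduce the memory figures reported in Chapter 7:

```python
# Memory usage formulas evaluated for the additive-fault case:
# m = 4 tests, n = 6 components, eta = 64 bits per stored number.
m, n, eta = 4, 6, 64

single = {
    "Column reasoning":            m * n,                    # regular FSM
    "Row reasoning":               m * n,                    # regular FSM
    "Bayes, independence":         (1 + m) * n * eta,
    "Bayes, partial independence": (3 + m) * n * eta,
    "Bayes, full dependence":      (1 + 2**m) * n * eta,
    "Diagnostic Model Processor":  m * n * eta,              # sensitivity matrix
}

multiple = {
    "Column reasoning":            m * 2**n,                 # extended FSM
    "Row reasoning":               m * n,
    "Bayes, independence":         (1 + m) * 2**n * eta,
    "Bayes, partial independence": (3 + m) * 2**n * eta,
    "Bayes, full dependence":      (1 + 2**m) * 2**n * eta,
}

for name, bits in single.items():
    print(f"{name:30s} single: {bits:7d} bit")
# Column reasoning gives 24 bit, Bayes independence 1920 bit, etc.,
# matching the additive-fault memory table in Section 7.1.4.
```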
Chapter 7
Results
The sections in this chapter present the results from the simulations and the evaluation of the performance.
7.1
Performance measures
A comparison of the performance measures for the two sets of faults shows that isolation is easier for the set of additive faults. The main reason is that the same residuals are used in both cases while there are more faults in the mixed fault case, which makes the faults harder to isolate. When only considering single faults, the diagnostic resolution has its largest value for the Column reasoning and Bayesian methods.
The optimal value for the error rate is zero, and the Bayesian methods always make the error rate equal to zero, because the true system behavioural mode always gets a probability greater than zero. The error rate is high for the Row reasoning method and for Column reasoning under assumption 2. This is because the benchmark system's reactions to some faults are delayed, due to the thresholds of the residuals and time delays in the system. Column reasoning has a particularly high error rate, since it requires that the failure signatures exactly match the test results. Row reasoning has a high error rate in the case where the test results are held. Column reasoning under assumption 1 has higher diagnostic resolution than under assumption 2, which means that assumption 1 is more cautious about excluding faults. The error rate for assumption 2, on the other hand, is higher than for assumption 1.
Row reasoning has a lower diagnostic resolution than Column reasoning. This is due to the minimal diagnoses that Row reasoning produces. In the multiple fault case, the use of minimal diagnoses leads to decreased normalised diagnostic accuracy.
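The core difference between the two FSM-based schemes can be sketched on the additive-fault FSM from Appendix A. This is only the basic idea: details such as how the fault-free mode NF is treated under assumptions 1 and 2 are left out, and the encoding below is an illustrative choice.

```python
# FSM encoded as: test -> set of components the test reacts to ("x" entries).
FSM = {
    "d1": {"c2", "c3", "c4"},
    "d2": {"c2", "c3", "c5"},
    "d3": {"c6"},
    "d4": {"c1", "c6"},
}
COMPONENTS = ["c1", "c2", "c3", "c4", "c5", "c6"]

def column_reasoning(result):
    """Keep the single faults whose FSM column exactly matches the
    observed test-result vector (exact signature matching)."""
    return [c for c in COMPONENTS
            if all((c in comps) == result[t] for t, comps in FSM.items())]

def row_reasoning(result):
    """Each reacted test implicates the components in its row; the
    minimal diagnoses are the minimal hitting sets of these conflict
    sets (only the conflict sets are collected here)."""
    return [sorted(FSM[t]) for t, fired in result.items() if fired]

result = {"d1": True, "d2": True, "d3": False, "d4": False}
print(column_reasoning(result))   # prints ['c2', 'c3']
print(row_reasoning(result))      # the conflict sets of the fired tests
```

Column reasoning discards any candidate whose signature does not match exactly, which is why a delayed test reaction inflates its error rate, while row reasoning never excludes a component merely because some of its tests have not (yet) reacted.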
The Bayesian methods have a high NDA for all simulations.
The effect of holding the test results active, once they are activated, is that fluctuations in the diagnosis statement are avoided. This leads to an increased diagnostic accuracy. The evaluation shows that the effect of holding the test results is largest for the mixed fault case, which can be explained by multiplicative faults generating test results that fluctuate more.
For the additive fault scenario, the diagnoses for the different test results are shown in Appendix B. A comparison of the diagnoses from Column reasoning, assumption 1, and Row reasoning shows that the only difference between them is how the behavioural mode NF is treated. The difference between the Bayesian approaches is that when dependence among tests is assumed, the test result [d1 d2 d3 d4] = [0010] says that neither a single fault nor NF is probable. This means that this particular test result was never present during the simulations; if the algorithm sees such a test result, it cannot say anything about the present fault.
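The independence-assumption Bayesian update discussed here amounts to P(C|D) ∝ P(C)·∏_j P(d_j|C). The following is a minimal sketch of that computation; the modes and probability tables are illustrative placeholders, not the ones estimated in the thesis.

```python
# Naive-Bayes style isolation: P(C|D) proportional to P(C) * prod_j P(d_j|C).
modes = ["NF", "c2-fault", "c6-fault"]
prior = {"NF": 0.98, "c2-fault": 0.01, "c6-fault": 0.01}

# P(d_j = 1 | C): probability that test j reacts in each mode (assumed values).
p_react = {
    "NF":       {"d1": 0.01, "d2": 0.01, "d3": 0.01, "d4": 0.01},
    "c2-fault": {"d1": 0.90, "d2": 0.90, "d3": 0.01, "d4": 0.01},
    "c6-fault": {"d1": 0.01, "d2": 0.01, "d3": 0.90, "d4": 0.90},
}

def posterior(result):
    """Multiply the per-test likelihoods under independence, then normalise."""
    unnorm = {}
    for c in modes:
        p = prior[c]
        for test, fired in result.items():
            q = p_react[c][test]
            p *= q if fired else (1.0 - q)
        unnorm[c] = p
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

post = posterior({"d1": True, "d2": True, "d3": False, "d4": False})
```

Because every mode keeps a nonzero posterior, the ranked statement never excludes the true mode outright, which is exactly why the Bayesian methods achieve a zero error rate above.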
The differences between the other assumptions are not very large; for example, the test results [1000] and [0100] differ only marginally. The Diagnostic Model Processor's diagnoses look a little different from those of the other methods: it shows the likelihoods for the different faults. The likelihood takes a value from -1 to +1, and one interpretation of this, in the single fault scenario, is that it shows whether the current fault is positive or negative.
7.1.4
Memory usage
The results of the calculations are given in the tables below. Decreasing the number of bits used for storing the tables can reduce the memory usage for the Bayesian methods and the Diagnostic Model Processor. The Row reasoning algorithm uses less memory than any other method considered. In the single fault case, the Column reasoning and Row reasoning methods use an equal amount of memory. The Diagnostic Model Processor uses less memory than the Bayesian methods, but more than Column and Row reasoning.
Additive faults

                                         Memory usage, δ [bit]
Isolation algorithm                      Single faults   Multiple faults
Column reasoning, assumption 1           24              256
Column reasoning, assumption 2           24              256
Row reasoning                            24              24
Bayesian method, independence            1 920           20 480
Bayesian method, partial independence    2 688           28 672
Bayesian method, full dependence         6 528           69 632
Diagnostic Model Processor               1 536           -

Mixed faults

                                         Memory usage, δ [bit]
Isolation algorithm                      Single faults   Multiple faults
Column reasoning, assumption 1           32              1 024
Column reasoning, assumption 2           32              1 024
Row reasoning                            32              32
Bayesian method, independence            2 560           81 920
Bayesian method, partial independence    3 584           114 688
Bayesian method, full dependence         8 704           278 528
Diagnostic Model Processor               2 048           -
Chapter 8
Conclusion
In this chapter, the comparison of the isolation algorithms is discussed and conclusions are drawn from the results. Recommendations about when to use each of the algorithms are also presented.
8.1
Discussion
All isolation algorithms presented here are good at isolating single faults in the benchmark problem, but when it comes to multiple faults it is almost impossible to find an isolation algorithm capable of isolating all of the 64 and 256 faults, respectively. This is because there are too few test quantities: the number of unique diagnoses that can be stated from the four test quantities is 2^4 = 16. In the DMP case, the number of unique diagnoses is much higher, because the thresholds are treated in another way and the sign is also taken into account. To be able to increase the isolation performance, more test quantities are needed.
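The counting argument above can be made concrete: four Boolean tests give only 2^4 = 16 distinguishable outcomes, which cannot separate the 2^6 = 64 (additive) or 2^8 = 256 (mixed) behavioural modes of the multiple-fault case.

```python
from itertools import product

# All possible Boolean test-result vectors for four test quantities.
test_results = list(product([0, 1], repeat=4))

distinguishable = len(test_results)       # 16 distinct outcomes at most
modes_additive = 2 ** 6                   # behavioural modes, additive case
modes_mixed = 2 ** 8                      # behavioural modes, mixed case

print(distinguishable, modes_additive, modes_mixed)
```

By the pigeonhole principle, at least 64 − 16 = 48 additive-fault modes must share a diagnosis statement with some other mode, regardless of how clever the isolation algorithm is.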
The isolation performance in general also depends on how good the tests are.
The performance measures in this thesis show that simple algorithms like Row reasoning and Column reasoning are efficient considering memory usage. They are also good at isolating single faults. This could be a good starting point when developing an isolation system.
If it turns out that the isolation performance is not good enough, or if the diagnoses need to be ranked, then DMP or a Bayesian approach could be interesting. For systems with memory restrictions, DMP would be preferred. If the NDA is not high enough, the Bayesian algorithms should be used. It is important to know that there is a trade-off between memory usage and NDA.
The Bayesian method where the test results are assumed to be independent gives a good estimate of P(C|D), considering both isolation performance and error rate. This could be a good alternative if there are memory restrictions. The other independence assumptions in the Bayesian algorithm are only necessary if better precision is required or if the test results have strong dependence. The differences between the assumptions might also grow with decreased fault sizes.
8.2
Recommendation
From the conclusions above, the following can be recommended:

Isolation method             Suits
Column reasoning             - Small to large-scale systems
                             - When memory usage is restricted
Row reasoning                - When diagnosing large-scale systems
                             - When multiple faults need to be diagnosed
                             - Systems with narrow memory restrictions
Bayesian methods             - When the diagnosis statement needs to be ranked,
                               for example when other isolation algorithms
                               produce too many diagnoses
                             - For medium to large-scale systems
                             - If memory usage is not an issue
Diagnostic Model Processor   - For small to medium-sized systems
                             - When the diagnosis statement needs to be ranked
                             - When there are memory restrictions

Table 8.1: The isolation methods together with the types of problems for which the respective method is recommended.
8.3
Summary
In this thesis the following goals have been reached:
• Four isolation methods have been implemented on a benchmark problem.
• Performance measures have been gathered and developed.
8.4
Future work
The following future work is recommended:
• Extend the benchmark by adding extra tanks and tests, to be able to see how the isolation algorithms handle additional faults, and to measure complexity etc.
• The performance of the isolation depends on how the tests are formed and filtered. More work is needed to find methods for optimising the tests for isolation.
Bibliography
[O. Bouamama] B. Ould Bouamama, R. Mrani Alaoui, P. Taillibert and M. Staroswiecki, Diagnosis of a two-tank system, 2001.
[Nyberg, Frisk] M. Nyberg and E. Frisk, Model Based Diagnosis of Technical Processes, 2005.
[Jensen] M. Jensen, Distributed Fault Diagnosis for Networked Embedded Systems, 2003.
[Gertler] J. Gertler and D. Singer, A New Structural Framework for Parity Equation-based Failure Detection and Isolation, Automatica, 381-388, 1990.
[Pulido] B. Pulido, V. Puig, T. Escobet and J. Quevedo, A new fault localization algorithm that improves..., 2005.
[Wotawa] F. Wotawa, A variant of Reiter's hitting-set algorithm, 1999.
[Pernestål] A. Pernestål, A Bayesian Approach to Fault Isolation - Structure Estimation and Inference, 2005.
[Petti] T. Petti et al., Diagnostic Model Processor: Using Deep Knowledge for Process Fault Diagnosis, AIChE Journal, 36(4):565-575, 1990.
[NiÅrPe] A. Nilsson, K.-E. Årzén and T. F. Petti, Model-based diagnosis - State transition events and constraint equations, 1992.
[M. Kościelny] J. M. Kościelny, Fault isolation in industrial processes by the dynamic table of states method, Automatica, Vol. 31, No. 5, 747-753, 1995.
[Schmid] F. Schmid, Model-Based Fault Detection and Isolation: A New Approach for Fault Isolation in Dynamic Networks with Time Delays, diploma thesis, 2004.
Appendix A
Notation and parameters
Quantity   Description                                        Value      Unit
ε1         Measurement noise                                  -          V
ε2         Measurement noise                                  -          V
ε3         Measurement noise                                  -          V
ε4         Measurement noise                                  -          V
my1        Rate noise                                         5.00e-4    -
my2        Rate noise                                         3.00e-4    -
mUp        Rate noise                                         1.00e-7    -
mQp        Rate noise                                         1.00e-7    -
T          Sample time                                        1          s
A1         Area, tank 1                                       0.0154     m^2
A2         Area, tank 2                                       0.0154     m^2
Cvb        Global hydraulic flow coefficient of valve Vb      1.59e-4    -
Cvo        Hydraulic flow coefficient of valve Vo             1.60e-4    -
h1c        Reference value for PI-controller                  0.5        m
h1         Water level in tank 1                              -          m
h2         Water level in tank 2                              -          m
h1max      Maximal height of tank T1                          0.6        m
h2max      Maximal height of tank T2                          0.6        m
Kp         Proportional control constant                      1.00e-3    -
Ki         Integration control constant                       5.00e-6    -
P1         Pump                                               -          -
Qint1      Inlet flow, tank T1                                -          m^3/s
Qint2      Inlet flow, tank T2                                -          m^3/s
Qout1      Outflow from T1                                    -          m^3/s
Qout2      Outflow from T2                                    -          m^3/s
Q12        Flow from T1 to T2                                 -          m^3/s
Qf1        Leak flow from T1                                  -          m^3/s
Qo         Outflow to consumer                                -          m^3/s
Qpmax      Max flow from P1                                   0.01       m^3/s
Qp         Water flow from P1                                 -          m^3/s
Qmp        Measured water flow from P1                        -          m^3/s
T1         Tank 1                                             -          -
T2         Tank 2                                             -          -
Ubm        Control signal for valve Vb                        -          V
Uom        Control signal for valve Vo                        -          V
Up         Control signal for the pump P1                     -          V
V1         Valve 1                                            -          -
V2         Valve 2                                            -          -
Vb         Valve Vb                                           -          -
ym         Measured level in tanks                            -          m
ym1        Measured level in tank T1                          -          m
ym2        Measured level in tank T2                          -          m
C          System behavioural mode                            -          -
ci         Status for component no. i                         -          -
D          Test result                                        -          -
dj         Status for test no. j                              -          -
i          Component index                                    -          -
j          Test/residual/satisfaction index                   -          -
k          Sample                                             -          -
l          -                                                  -          -
m          Number of test results/residuals/satisfaction      -          -
           tests
n          Number of components                               -          -
rj         Residual number j                                  -          -
a          Assumption                                         -          -
Fi         Failure likelihood                                 -          -
Sij        Sensitivity function                               -          -
vsfi       Satisfaction vector                                -          -
P(C)       Prior probability                                  -          -
P(C|D)     Probability of the system behavioural mode C,      -          -
           given the test results D
P(D|C)     Probability of the test results D, given the       -          -
           system behavioural mode C
L          Total number of samples used when evaluating       -          -
           the performance measures
f(C|DC)    Diagnostic confidence of the system behavioural    -          -
           mode C, given the test result DC
DC         The test result from system behavioural mode C     -          -
α          Normalised diagnostic accuracy                     -          -
β          Error rate                                         -          -
δ          Memory usage                                       -          -
γ          Diagnostic resolution                              -          -
x, y, z    -                                                  -          -
ξ, ζ, ϑ    -                                                  -          -
τ1         Threshold for residual 1     6.30e-4 / -7.63e-5 / -7.63e-5 *  -
τ2         Threshold for residual 2     3.74e-4 / -1.23e-4 / -1.23e-4 *  -
τ3         Threshold for residual 3     1.20e-7 / -1.14e-7 / -3.34e-7 *  -
τ4         Threshold for residual 4     2.24e-7 / -1.03e-7 / -1.03e-7 *  -
*) Thresholds for the cases: additive faults / mixed faults / mixed faults with filtering
FSM used for additive faults

     c1  c2  c3  c4  c5  c6
d1   0   x   x   x   0   0
d2   0   x   x   0   x   0
d3   0   0   0   0   0   x
d4   x   0   0   0   0   x

FSM for mixed faults

     c1  c2  c3  c4  c5  c6  c7  c8
d1   x   x   x   x   0   0   x   x
d2   0   x   x   0   x   0   x   x
d3   0   x   0   x   0   x   0   0
d4   x   0   0   0   0   x   x   0

Sensitivity matrix, S, for additive faults

     c1     c2      c3     c4     c5     c6
d1   0      -0.87   0.87   56.46  0      0
d2   0      0.2815  -0.85  0      54.92  0
d3   0      0       0      0      0      -0.93
d4   -0.87  0       0      0      0      0.87

Sensitivity matrix, S, for mixed faults

     c1  c2  c3  c4  c5  c6  c7  c8
d1   1   1   1   1   0   0   1   1
d2   0   1   1   0   1   0   1   1
d3   0   1   0   1   0   1   0   0