A Comparison of Isolation Algorithms on
a Benchmark System
MARTIN TORELM
Abstract
Acknowledgements
I would like to take this opportunity to thank my supervisor, Anna Pernestål, who has been of great help and support throughout this master's degree project. I would also like to thank Scania NED and the Department of Automatic Control at the Royal Institute of Technology, Sweden. Finally, I would like to thank Vicenç Puig for providing information about the previous work on the benchmark problem used in this thesis.
Table of Contents
1 Introduction
1.1 Background
1.2 Objective
1.3 Goal
2 The benchmark problem
2.1 The model
2.2 Faults
2.2.1 Additive faults
2.2.2 Mixed faults
3 Test quantities
3.1 Residual generation
3.1.1 Discretisation
3.2 Filtering and thresholds
3.2.1 Cusum test
3.2.2 Thresholds
3.2.3 The Failure Signature Matrix
4 Isolation algorithms
4.1 Column reasoning
4.2 Row reasoning
4.3 A Bayesian approach to isolation
4.3.1 Independence
4.3.2 Partial independence
4.3.3 Full dependence
5 Performance measures
5.1 Memory usage
5.2 Diagnostic resolution
5.3 Normalised diagnostic accuracy
5.4 Error rate
6 Simulation and evaluation
6.1 Simulations
6.1.1 Column reasoning
6.1.2 Row reasoning
6.1.3 Bayesian isolation methods
6.1.4 Diagnostic Model Processor
6.2 Evaluation of the algorithms
6.2.1 Memory usage
7 Results
7.1 Performance measures
7.1.1 Additive faults
7.1.2 Mixed faults without filtering
7.1.3 Mixed faults with filtering
7.1.4 Memory usage
8 Conclusion
8.1 Discussion
8.2 Recommendation
8.3 Summary
8.4 Future work
A Notation and parameters
B Isolation
Chapter 1
Introduction
Diagnostics is an important task in the field of industrial systems. A malfunctioning component could, for example, result in decreased efficiency, damage to the system, or even personal injury.
It is important to detect and isolate a fault in order to make the right decisions when it is present. Isolation algorithms are used to find out which component is failing, and in what way. Isolation can be based on consistency tests for a process: based on knowledge of how the tests react to different faults, the faulty component can be pointed out. A good isolation system can be of valuable help to the shop technician when locating and repairing the fault.
1.1
Background
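[Block diagram: Process → Residual generation → Filtering → Isolation → Diagnoses. Process data are the input to residual generation, and the test quantities are the input to isolation.]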
Figure 1.1: The structure used in this work for making a diagnosis. Residuals are filtered and thresholded, and the results are used as inputs to the isolation. The output of the isolation is the diagnoses, i.e. the set of possible faults that can explain the behaviour of the process.
The residuals are then filtered, and tests are computed by applying thresholds to the filtered residuals. The tests are then used as inputs to the isolation.
1.2
Objective
The objective of this thesis is to compare different isolation algorithms. Based on the comparison of the different approaches, a recommendation is made as to which types of problems each isolation algorithm is suited for.
1.3
Goal
The goals of this project are to:
1. Implement four types of isolation methods on a benchmark problem
2. Develop performance measures to compare the isolation methods
3. Compare and evaluate the methods with the developed performance measures
Chapter 2
The benchmark problem
In this chapter, the benchmark problem is presented and the model equations are derived. The model equations are used both for simulation and for forming the residuals. This benchmark problem has previously been used in [Pulido] and [O. Bouamama].
2.1
The model
The benchmark system that is used for comparing the algorithms is taken from [O. Bouamama] and consists of two tanks with various components connecting the tanks. The system is shown in Figure 2.1.
The purpose of the two-tank system is to provide a constant flow to the consumer. A PI-controlled pump supplies water to tank T1 with an inlet flow Q_{p}, keeping the water level h1 at a nominal level of h_{1c} = 0.5 m. Tank T1 is connected to T2 by a pipe. An ON-OFF controller acting on the valve V_{b} causes a water flow, Q12, from tank T1 and thereby controls the water level, h2, in tank T2. The valves V_{f1} and V_{f2} are used to simulate water leakage from the respective tanks.
The inputs to the system are the pump flow, Q_{p}, the control output from the ON-OFF controller, U_{b}, the input voltage U_{o} to the valve on the outlet to the consumer, and the pump voltage U_{p}. The input vector u is measured; measured quantities are denoted by a superscript "m" on the variable:
$$u = \begin{bmatrix} Q_p^m & U_b^m & U_o^m & U_p^m \end{bmatrix}^T$$
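Figure 2.1: System used for benchmark.
The measured outputs are the water levels in the two tanks, written as:
$$y^m = \begin{bmatrix} y_1^m \\ y_2^m \end{bmatrix} = \begin{bmatrix} h_1 + \varepsilon_1 \\ h_2 + \varepsilon_2 \end{bmatrix}, \qquad (2.1)$$
where ε1 and ε2 are measurement noise.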
The change of the volume in each tank can be described as the difference between the sum of all inflows and the sum of all outflows:
$$\dot V_1 = A_1 \dot h_1 = Q_{in,1} - Q_{out,1} = Q_p - Q_{12} - Q_{f1}$$
$$\dot V_2 = A_2 \dot h_2 = Q_{in,2} - Q_{out,2} = Q_{12} - Q_o - Q_{f2} \qquad (2.2)$$
The inlet flow Q_{p} is assumed to be proportional to the pump voltage, U_{p}. Taking the limitation of the pump into account, the inlet flow is described by
$$Q_p(t) = \begin{cases} U_p, & 0 < U_p < Q_{p,max} \\ 0, & U_p \le 0 \\ Q_{p,max}, & U_p \ge Q_{p,max} \end{cases} \qquad (2.3)$$
The PI controller acting on the pump is modelled as
$$U_p = K_p\,(h_{1c} - h_1(t)) + K_i \int (h_{1c} - h_1(t))\,dt, \qquad (2.4)$$
where K_{p} and K_{i} are constants and h_{1c} is the set point for the PI controller. Using Bernoulli's law, the water flow Q12 between the two tanks is
$$Q_{12} = C_{vb}\,\mathrm{sgn}(h_1 - h_2)\sqrt{|h_1 - h_2|}\,U_b^m. \qquad (2.5)$$
The ON-OFF controller controls the inlet flow to tank T2 through the valve V_{b}. The valve opens if the water level in tank T2 is below 0.09 m and closes if the water level is 0.09 m or higher. The water level cannot be less than 0 m or greater than 0.11 m. The control signal acting on the valve is
$$U_b^m = \begin{cases} 0, & 0.09\ \mathrm{m} \le h_2 < 0.11\ \mathrm{m} \\ 1, & 0.00\ \mathrm{m} \le h_2 < 0.09\ \mathrm{m} \end{cases} \qquad (2.6)$$
Using Bernoulli's law a second time gives the outflow to the consumer:
$$Q_o = C_{vo}\sqrt{h_2}\,U_o^m, \qquad (2.7)$$
where U_o^m is the control signal to the valve V_{o}:
$$U_o^m = \begin{cases} 1, & V_o \text{ open} \\ 0, & V_o \text{ closed} \end{cases} \qquad (2.8)$$
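Using Equations 2.2 to 2.8, the final model of the system is written as:
$$\dot h_1 = \frac{Q_p - C_{vb}\,\mathrm{sgn}(h_1 - h_2)\sqrt{|h_1 - h_2|}\,U_b^m - Q_{f1}}{A_1}$$
$$\dot h_2 = \frac{C_{vb}\,\mathrm{sgn}(h_1 - h_2)\sqrt{|h_1 - h_2|}\,U_b^m - C_{vo}\sqrt{h_2}\,U_o^m - Q_{f2}}{A_2} \qquad (2.9)$$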
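To illustrate how Equations 2.3 to 2.9 fit together, the following Python sketch integrates the model with Euler's method, the discretisation also used in Section 3.1.1. The parameter values in `PARAMS`, the empty-tank initial condition and the clamping of the levels at zero are assumptions made for this sketch only; they are not the values or choices of the thesis.

```python
import math

# Hypothetical parameter values, for illustration only; they are NOT the
# values from Appendix A of this thesis.
PARAMS = dict(A1=0.0154, A2=0.0154, C_vb=1.5e-4, C_vo=1.5e-4,
              Q_pmax=1e-4, K_p=1e-3, K_i=1e-5, h1c=0.5)


def flow_12(h1, h2, u_b, c_vb):
    """Eq. 2.5: Q12 = C_vb * sgn(h1 - h2) * sqrt(|h1 - h2|) * U_b."""
    return c_vb * math.copysign(math.sqrt(abs(h1 - h2)), h1 - h2) * u_b


def simulate(t_end=200.0, T=0.1, u_o=1, q_f1=0.0, q_f2=0.0, p=PARAMS):
    """Euler integration of Eq. 2.9 with the PI (2.4) and ON-OFF (2.6) controllers.

    Returns a list of (t, h1, h2, Q_p, U_b) tuples, one per time step.
    """
    h1 = h2 = 0.0          # assumed initial condition: both tanks empty
    integral = 0.0         # state of the integral term in Eq. 2.4
    log = []
    for k in range(int(round(t_end / T))):
        e = p["h1c"] - h1
        integral += e * T
        u_p = p["K_p"] * e + p["K_i"] * integral        # Eq. 2.4
        q_p = min(max(u_p, 0.0), p["Q_pmax"])            # Eq. 2.3 (saturation)
        u_b = 1 if h2 < 0.09 else 0                      # Eq. 2.6
        q12 = flow_12(h1, h2, u_b, p["C_vb"])            # Eq. 2.5
        q_o = p["C_vo"] * math.sqrt(h2) * u_o            # Eq. 2.7
        # Eq. 2.9; clamping at zero is an added assumption (levels cannot be negative)
        h1 = max(h1 + T * (q_p - q12 - q_f1) / p["A1"], 0.0)
        h2 = max(h2 + T * (q12 - q_o - q_f2) / p["A2"], 0.0)
        log.append(((k + 1) * T, h1, h2, q_p, u_b))
    return log
```

For example, `simulate(q_f1=1e-5)` adds a constant leak from tank T1 for the whole run; the returned tuples can be plotted or passed on to a residual generator.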
2.2
Faults
Different types of faults are considered in the benchmark problem. In [Pulido], a total of six additive faults are simulated. In [O. Bouamama], a total of eight faults, both additive and multiplicative, are simulated. Both works only cover the single-fault scenario; this work also considers multiple faults. In this thesis, both the set of faults in [Pulido] and the set in [O. Bouamama] are studied. The faults are denoted F, with different subscripts for the different faults. The superscript "P" is used for faults taken from [Pulido], and "B" for faults taken from [O. Bouamama].
2.2.1
Additive faults
Only single, additive faults are considered in [Pulido]. To make the results comparable, the following faults from that work are used:
F_{ff}^{P}: Fault-free mode: the process runs without faults
F_{pump}^{P}: Pump fault: additive fault in pump P1
F_{y1}^{P}: Additive fault in level sensor y1
F_{y2}^{P}: Additive fault in level sensor y2
F_{Qf1}^{P}: Constant leak in tank T1
F_{Qf2}^{P}: Constant leak in tank T2
F_{Up}^{P}: Additive fault in the controller output U_{p} in tank T1
This gives a total of six different single faults that can be simulated. All combinations of faults are simulated in the benchmark system, which means that there are 2^6 = 64 possible fault scenarios, including the fault-free case.
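The count of fault scenarios can be checked by enumerating every on/off combination of the six single faults. In this Python sketch, the short fault labels are chosen for illustration only:

```python
from itertools import product

# Labels for the six single faults F^P (shorthand chosen for this sketch).
FAULTS = ["pump", "y1", "y2", "Qf1", "Qf2", "Up"]

# Every on/off combination of the six faults; the all-zero pattern is fault-free.
scenarios = [
    tuple(name for name, active in zip(FAULTS, bits) if active)
    for bits in product((0, 1), repeat=len(FAULTS))
]
```

The list contains 64 scenarios: the empty tuple (fault-free), six single faults, and all multiple-fault combinations up to the case where every component is faulty.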
2.2.2
Mixed faults
The faults considered in [O. Bouamama] are of two types, additive and multiplicative:
F_{ff}^{B}: Fault-free mode: the process runs without faults
F_{pump}^{B}: Pump fault: the pump is switched off from t = 40 s to t = 120 s
F_{y1}^{B}: Level sensor y_1^m is stuck at zero from t = 40 s to t = 120 s
F_{y2}^{B}: Level sensor y_2^m is stuck at zero from t = 40 s to t = 120 s
F_{Qf1}^{B}: Water leak in tank T1 from t = 40 s to t = 120 s, Q_{f1} = 10^{-4} m^3/s
F_{Qf2}^{B}: Water leak in tank T2 from t = 40 s to t = 120 s, Q_{f2} = 10^{-4} m^3/s
F_{Up}^{B}: Controller output U_p^m is short-circuited to ground from t = 40 s to t = 120 s
F_{Vb}^{B}: Valve V_{b} is blocked from t = 40 s to t = 150 s
F_{Ub}^{B}: Controller output U_b^m is short-circuited to ground from t = 40 s to t = 120 s
Chapter 3
Test quantities
In this chapter, residuals are computed and test quantities are formed. The test quantities are obtained by simple relations from the model and will be used as input for the isolation.
3.1
Residual generation
The detection part of the diagnostic system is based on residuals. Residuals are obtained from Analytic Redundancy Relations, ARRs [Nyberg, Frisk]. The ARRs, shown in Equations 3.1 to 3.4, are obtained from the model equations (Equations 2.2 to 2.9 in Chapter 2) by identifying relations between the measured outputs, x_{j}, and the modelled outputs, x̂_{j}, with j = 1, 2, ..., 4.
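$$\underbrace{A_1\left(\frac{dy_1^m}{dt} + \frac{d\varepsilon_1}{dt}\right)}_{x_1} \approx \underbrace{Q_p^m + \varepsilon_3 - Q_{12} - Q_{f1}}_{\hat x_1} \qquad (3.1)$$
$$\underbrace{A_2\left(\frac{dy_2^m}{dt} + \frac{d\varepsilon_2}{dt}\right)}_{x_2} \approx \underbrace{Q_{12} - C_{vo}\sqrt{y_2^m + \varepsilon_2}\,U_o^m - Q_{f2}}_{\hat x_2} \qquad (3.2)$$
$$\underbrace{U_p^m + \varepsilon_4}_{x_3} \approx \underbrace{K_p\,e_{h1} + K_i \int e_{h1}\,dt}_{\hat x_3} \qquad (3.3)$$
$$\underbrace{Q_p^m + \varepsilon_3}_{x_4} \approx \underbrace{\begin{cases} U_p^m + \varepsilon_4, & 0 < U_p^m + \varepsilon_4 < Q_{p,max} \\ 0, & U_p^m + \varepsilon_4 \le 0 \\ Q_{p,max}, & U_p^m + \varepsilon_4 \ge Q_{p,max} \end{cases}}_{\hat x_4} \qquad (3.4)$$
where ε_{i}, i = 1, 2, ..., 4, is sensor noise, e_{h1} = h_{1c} − (y_1^m + ε1) is the control error, and
$$Q_{12} = C_{vb}\,\mathrm{sgn}(y_1^m + \varepsilon_1 - y_2^m - \varepsilon_2)\sqrt{|y_1^m + \varepsilon_1 - y_2^m - \varepsilon_2|}\,U_b^m.$$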
The residuals can then be computed as the difference between the modelled and the measured outputs:
$$r_j(t) = \hat x_j(t) - x_j(t), \qquad j = 1, \dots, 4.$$
3.1.1
Discretisation
The model equations in 2.9 must be discretised, because the residuals are implemented in discrete time in the simulation. The model was discretised using Euler's method, which, for instance, approximates dy^m/dt by the backward difference
$$\frac{dy^m}{dt} \approx \frac{y^m(k) - y^m(k-1)}{T}, \qquad (3.5)$$
where T is the sampling interval.
The residual generation used in the simulations can then be written as:
$$r_1(k) = Q_p^m(k) - C_{vb}\,\mathrm{sgn}(y_1^m(k) - y_2^m(k))\sqrt{|y_1^m(k) - y_2^m(k)|}\,U_b^m(k) - Q_{f1}(k) - A_1\frac{y_1^m(k) - y_1^m(k-1)}{T}$$
$$r_2(k) = C_{vb}\,\mathrm{sgn}(y_1^m(k) - y_2^m(k))\sqrt{|y_1^m(k) - y_2^m(k)|}\,U_b^m(k) - C_{vo}\sqrt{y_2^m(k)}\,U_o^m(k) - Q_{f2}(k) - A_2\frac{y_2^m(k) - y_2^m(k-1)}{T}$$
$$r_3(k) = U_p^m(k) - K_p\,(e^m(k) - e^m(k-1)) - K_i T\,e^m(k) - U_p^m(k-1)$$
$$r_4(k) = Q_p^m(k) - \begin{cases} U_p^m(k), & 0 < U_p^m(k) < Q_{p,max} \\ 0, & U_p^m(k) \le 0 \\ Q_{p,max}, & U_p^m(k) \ge Q_{p,max} \end{cases} \qquad (3.6)$$
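The first two residuals in Equation 3.6 translate directly into code. The Python sketch below uses hypothetical parameter values (not those of Appendix A) and treats the leak terms Q_{f1} and Q_{f2} as constants defaulting to zero, i.e. the nominal, leak-free model; this is an assumption of the sketch. Residuals r3 and r4 can be written in the same way.

```python
import math

# Hypothetical parameter values for illustration only (not from the thesis).
A1, A2 = 0.0154, 0.0154      # tank cross-sections [m^2]
C_VB, C_VO = 1.5e-4, 1.5e-4  # valve coefficients
T = 0.1                      # sampling interval [s]


def q12(y1, y2, u_b):
    """Modelled flow between the tanks from the measured levels (Eq. 2.5 with y for h)."""
    return C_VB * math.copysign(math.sqrt(abs(y1 - y2)), y1 - y2) * u_b


def r1(k, y1, y2, u_b, q_p, q_f1=0.0):
    """Residual r1(k) from Eq. 3.6; sequences are indexed by sample k >= 1."""
    return q_p[k] - q12(y1[k], y2[k], u_b[k]) - q_f1 - A1 * (y1[k] - y1[k - 1]) / T


def r2(k, y1, y2, u_b, u_o, q_f2=0.0):
    """Residual r2(k) from Eq. 3.6; assumes y2[k] >= 0 so the square root is defined."""
    return (q12(y1[k], y2[k], u_b[k]) - C_VO * math.sqrt(y2[k]) * u_o[k]
            - q_f2 - A2 * (y2[k] - y2[k - 1]) / T)
```

With constant levels and a measured pump flow that balances the modelled flow Q12, r1 evaluates to zero; if y1 instead rises by 0.001 m in one sample with otherwise balanced flows, r1 is approximately −A1 · 0.01.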
3.2
Filtering and thresholds
Figure 3.1: Residuals r1, r2, r3 and r4 plotted against time [s] for an additive fault in y1. The dashed line shows the threshold used for each residual.
3.2.1
Cusum test
The Cusum test [Gustafsson] is used to detect small changes in the bias of the residuals. The Cusum test is defined as
$$S_{t+1} = \begin{cases} S_t + y_{t+1}, & y_{t+1} > h \\ 0, & y_{t+1} \le h \end{cases}$$
A rule of thumb is that the value of h should be 2.5 times the fault size.
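The recursion can be written directly in Python. This sketch follows the definition as stated, with y the sequence fed to the test (for example a residual). How the statistic S is turned into a test result is not specified in this section, so `alarm_level` in `cusum_alarm` is a hypothetical parameter.

```python
def cusum(y, h):
    """Cusum statistic as defined above: S_{t+1} = S_t + y_{t+1} if y_{t+1} > h, else 0."""
    s, out = 0.0, []
    for value in y:
        s = s + value if value > h else 0.0
        out.append(s)
    return out


def cusum_alarm(y, h, alarm_level):
    """Test result per sample: 1 when the Cusum statistic exceeds alarm_level.

    `alarm_level` is a hypothetical parameter of this sketch.
    """
    return [1 if s > alarm_level else 0 for s in cusum(y, h)]
```

Any sample at or below h resets the statistic, so only runs of consecutive samples above h accumulate.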
3.2.2
Thresholds
Thresholds are used to determine whether a test has reacted to a fault. The thresholds are chosen such that no test reacts when the system is simulated in the fault-free case: each threshold is set just above the highest value of the corresponding residual in the fault-free simulation. This avoids false alarms. A simulation of an additive fault in the sensor for y1 is shown in Figure 3.1. The residuals r1 and r2 are sensitive to the fault and exceed their thresholds. Figure 3.2 shows the test results for the same simulation.
3.2.3
The Failure Signature Matrix
Figure 3.2: Test results d1 to d4 plotted against time [s] for the additive fault F_{y1}^{P}.
r_i   F_{pump}^{P}  F_{y1}^{P}  F_{y2}^{P}  F_{Qf1}^{P}  F_{Qf2}^{P}  F_{Up}^{P}
r1    0             x           x           x            0            0
r2    0             x           x           0            x            0
r3    0             0           0           0            0            x
r4    x             0           0           0            0            x
Table 3.1: The failure signature matrix
[Figure: signal amplitude and threshold plotted against time [s].]
Chapter 4
Isolation algorithms
Many different algorithms have been developed for isolation. Some of them are based on models of the process, while others are based on experience. In this project, four different model-based approaches to fault isolation are considered, together with variants of the algorithms obtained by changing their assumptions.
First, some notation. Let c_{i} be a variable that describes the behavioural mode of component i, such that
$$c_i = \begin{cases} 0 & \text{no fault in component } i \\ 1 & \text{fault in component } i \end{cases} \qquad (4.1)$$
Further, let d_{j} be the result of test j, where
$$d_j = \begin{cases} 0 & \text{no alarm} \\ 1 & \text{alarm} \end{cases} \qquad (4.2)$$
The current system behavioural mode, C, and the test results, D, can then be written as
$$C = [c_1, c_2, c_3, \dots, c_n] \qquad (4.3)$$
$$D = [d_1, d_2, d_3, \dots, d_m]. \qquad (4.4)$$
Let Δ be a diagnosis, i.e. a system behavioural mode that is consistent with the measurements. Note that several system behavioural modes can be consistent with the measurements. The output from the isolation system, 𝒟, is therefore a set of diagnoses.
4.1
Column reasoning
Two variants of Column reasoning are considered; they differ in how tests that have not reacted are treated:
1. No conclusion is drawn from a test result that has not been activated.
2. An inactive test result excludes the faults marked with an x in its row. This is the traditional way of doing isolation in the FDI field, but it is generally not recommended, since it can exclude the correct diagnosis when a test misses a detection (see Figure 3.3).
Example 1 Consider the FSM:
d_i   c1  c2  c3
d1    x   0   x
d2    0   x   x      (4.5)
When test d2 reacts and test d1 does not react, the first variant of Column reasoning gives the diagnoses
$$\mathcal{D} = \{\{c_2\}, \{c_3\}\}, \qquad (4.6)$$
while the second variant of Column reasoning gives
$$\mathcal{D} = \{\{c_2\}\}. \qquad (4.7)$$
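The two variants can be sketched in Python for the single-fault case, representing each FSM row as the set of components marked with x. The sketch assumes that variant 1 keeps the components marked in every reacting row, which is the reading that agrees with Equation 4.6; function and variable names are hypothetical.

```python
# FSM of Example 1 as "test -> set of components marked with x in its row".
FSM = {"d1": {"c1", "c3"}, "d2": {"c2", "c3"}}
COMPONENTS = ["c1", "c2", "c3"]


def column_reasoning(fsm, alarms, components, variant):
    """Single-fault Column reasoning; returns the candidate components, sorted.

    variant 1: only reacting tests are used (a component must be marked in
               every reacting test's row).
    variant 2: additionally, a test that has not reacted excludes every
               component marked with x in its row.
    """
    candidates = set(components)
    for test, marked in fsm.items():
        if alarms[test]:
            candidates &= marked
        elif variant == 2:
            candidates -= marked
    return sorted(candidates)
```

With d2 reacting and d1 silent, variant 1 returns ['c2', 'c3'] and variant 2 returns ['c2'], in agreement with Equations 4.6 and 4.7. With no reacting tests, variant 1 keeps all components, i.e. no conclusion is drawn.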
4.2
Row reasoning
Row reasoning, a variant of Reiter's algorithm (DX), is a common way to handle the isolation problem. The main idea behind row reasoning is that each test that reacts results in a conflict. A conflict means, in this case, that the components included in the conflict cannot all be fault-free at the same time. The conflicts are generated from the reacting tests together with the rows of the Failure Signature Matrix. Each new conflict is combined with the previous ones to produce the new diagnosis statement.
Example 2
d_i   c1  c2  c3
d1    x   0   x
d2    0   x   x
Consider the FSM above. If test number one reacts, i.e. d1 = 1, it produces the conflict {c1, c3} and the diagnoses
$$\mathcal{D} = \{\{c_1\}, \{c_3\}\} \qquad (4.8)$$
If test number two later also reacts (d2 = 1), the conflicts are {c1, c3} ∧ {c2, c3}, and the Row reasoning isolation method produces the diagnoses
$$\{c_1, c_3\} \wedge \{c_2, c_3\} \Rightarrow \mathcal{D} = \{\{c_3\}, \{c_1, c_2\}, \{c_1, c_3\}, \{c_2, c_3\}\} \qquad (4.9)$$
This means that there are four possible explanations of the system's behaviour: three of them contain double faults, and one contains a single fault.
The most common assumption is that the current behaviour of the process is most probably explained by the diagnosis with the fewest faulty components. In the example above this is {c3}, even though all of the above sets are possible.
In this thesis, Reiter's algorithm is used for finding the minimal diagnoses. It produces the set of minimal diagnoses, and a common interpretation is that the current system behavioural mode is in this set, even though all supersets of the minimal diagnoses are also possible. For the example above, the result would be 𝒟 = {{c1, c2}, {c3}}.
The structure of Reiter’s algorithm is shown below:
1. Initialise the set of minimal diagnoses to hold only the empty set, i.e. 𝒟 = {∅}
2. Given a (new) conflict, find out if any minimal diagnosis is invalidated, i.e. has an empty intersection with the conflict
3. Extend any invalidated diagnosis to a set of new diagnoses, each consisting of the invalidated diagnosis and one element from the new conflict
4. Remove any new diagnoses that are not minimal, i.e. are supersets of any other minimal diagnosis
5. Iterate from Item 2 for all new conflicts
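The steps above can be sketched in Python as an incremental minimal-hitting-set update. This is an illustrative version written for this text, not the implementation used in the thesis; diagnoses are represented as frozensets of component names.

```python
def update_minimal_diagnoses(diagnoses, conflict):
    """One pass of items 2 to 4: update the minimal diagnoses with a new conflict.

    diagnoses: list of frozensets (the current minimal diagnoses)
    conflict:  set of components that cannot all be fault-free
    """
    conflict = frozenset(conflict)
    still_valid = [d for d in diagnoses if d & conflict]
    invalidated = [d for d in diagnoses if not d & conflict]
    # Item 3: extend each invalidated diagnosis with one element of the conflict.
    candidates = set(still_valid)
    for d in invalidated:
        for component in conflict:
            candidates.add(d | {component})
    # Item 4: keep only candidates that have no proper subset among the candidates.
    return [d for d in candidates if not any(other < d for other in candidates)]


def minimal_diagnoses(conflicts):
    diagnoses = [frozenset()]          # Item 1: only the empty diagnosis
    for conflict in conflicts:         # Item 5: iterate over all conflicts
        diagnoses = update_minimal_diagnoses(diagnoses, conflict)
    return diagnoses
```

Feeding it the two conflicts from Example 2, {c1, c3} and {c2, c3}, returns the minimal diagnoses {c3} and {c1, c2}, in agreement with the result stated for that example.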
4.3
A Bayesian approach to isolation
A drawback of Column reasoning and Row reasoning is that the algorithms often produce many diagnoses. Therefore, a Bayesian approach to isolation [Pernestål] is also considered. The main idea behind this approach is to compute the probability that a fault is present. This probability can then be used for ranking the diagnoses or for decision making on fault accommodation.
Let C be the current system behavioural mode and D the test results. Bayes' rule then gives
$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)} \qquad (4.10)$$
In order to obtain good estimates of the functions in Equation 4.10, the system has to be simulated with all combinations of faults, single as well as multiple. If all faults are assumed to be independent and the probability for a single fault to occur is P(c_{i} = 1) = p_{c}, ∀i = 1, ..., n, then
$$P(C) = \prod_{i=1}^{n} p_c^{\,c_i}\,(1 - p_c)^{1 - c_i}.$$
P(C) is called the prior probability. In the simulations, the current system behavioural mode is recorded together with the resulting test results. The function P(D) is a normalisation factor and is calculated as follows:
$$P(D) = \sum_{C} P(D \mid C)\,P(C) \qquad (4.11)$$
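A minimal Python sketch of Equations 4.10 and 4.11, assuming independent faults with a common prior probability p_c as described above. The likelihood P(D | C) is passed in as a function, since how it is obtained is exactly what distinguishes the variants in the following subsections; `prior` and `posterior` are hypothetical helper names.

```python
from itertools import product


def prior(mode, p_c):
    """P(C) when faults are independent and each occurs with probability p_c."""
    k = sum(mode)
    return p_c ** k * (1 - p_c) ** (len(mode) - k)


def posterior(likelihood, n, p_c):
    """P(C | D) for all 2**n modes via Eq. 4.10, with P(D) from Eq. 4.11.

    likelihood(C) must return P(D | C) for the observed test results D.
    P(D) must be non-zero, i.e. some mode must be able to explain D.
    """
    modes = list(product((0, 1), repeat=n))
    joint = {C: likelihood(C) * prior(C, p_c) for C in modes}
    p_d = sum(joint.values())                      # Eq. 4.11
    return {C: value / p_d for C, value in joint.items()}
```

With a likelihood that is the same for every mode, the posterior equals the prior; with a likelihood that is non-zero for only one mode, all probability mass ends up on that mode.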
The Bayesian approach is varied using different assumptions about independence; the variants differ only in how P(D | C) is computed. The next three subsections describe the different assumptions.
4.3.1
Independence
In this variant of the Bayesian algorithm, all test results, d_{j}, are assumed to be independent given the system behavioural mode. The probability P(D | C) can then be computed as
$$P(D \mid C) = \prod_{j} P(d_j \mid C). \qquad (4.12)$$
For every fault simulation, data is gathered at the sample times t_{k} = kT, where k = 0, ..., N − 1 and T is the sampling interval. The distribution P(d_{j} | C) is then estimated as
$$P(d_j = 1 \mid C) = \frac{1}{N}\sum_{k=0}^{N-1} d_j(t_k \mid C), \qquad P(d_j = 0 \mid C) = 1 - P(d_j = 1 \mid C), \qquad (4.13)$$
where d_{j}(t_{k} | C) is the observed test result at time t_{k} given the system behavioural mode C.
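The estimate in Equation 4.13 is a relative frequency, and Equation 4.12 multiplies one such factor per test. A short Python sketch using NumPy (function names are hypothetical):

```python
import numpy as np


def estimate_alarm_probabilities(d_sim):
    """Eq. 4.13: P(d_j = 1 | C) as the fraction of samples where test j alarmed.

    d_sim has shape (N, m): test results d_j(t_k | C) from a simulation of mode C.
    """
    return np.asarray(d_sim, dtype=float).mean(axis=0)


def likelihood_independent(D, p_alarm):
    """Eq. 4.12: P(D | C) as a product over tests, assuming independence given C."""
    D = np.asarray(D)
    return float(np.prod(np.where(D == 1, p_alarm, 1.0 - p_alarm)))
```

For example, if test 1 alarms in three of four samples and test 2 in one of four, the observation D = (1, 0) gets the likelihood 0.75 · 0.75 = 0.5625.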
4.3.2
Partial independence
The assumption about independence among test results is generally not valid. Two tests could, for example, be dependent when they share the same underlying relations, and if one of the tests reacts it can cause the other to react. The assumption of partial independence of the test results is used in this variant of the Bayesian algorithm.
The method used in this work for finding dependence among tests is taken from [Pernestål]. If tests are dependent, and a test has reacted, the knowledge about the system behavioural mode will not provide any information about the other tests. To decide whether tests are dependent, training data is collected from different system behavioural modes. The training data is then evaluated, and the likelihoods of different dependencies are computed.
4.3.3
Full dependence
In reality, there is always a possibility of dependence among tests. To make sure that all possibilities are covered, we can assume that there are dependencies between all tests. The probabilities should then be calculated as
$$P(C \mid D) = \frac{P(d_1 d_2 \dots d_m \mid C)\,P(C)}{P(D)} \qquad (4.15)$$
The assumption of full dependence is the best that can be done when computing the probabilities of the system behavioural modes given the test results, since it does not rely on any independence assumption. It can therefore also be used as a reference when evaluating how well the other assumptions work.
4.4
Diagnostic Model Processor
The Diagnostic Model Processor [Petti], DMP, is a model-based algorithm for diagnostics. This method is also based on residuals. The residuals are weighted to decide the degree to which the model equations are violated. The thresholds are obtained as before, and the residual r_{j} exceeding its threshold τ_{j} corresponds to v_{j}^{sf} exceeding the value 0.5 in magnitude. The residuals are calculated from measurements, u and y, from the process. Let C be a vector of assumptions about the system behavioural mode. The residuals can then be written as
$$r_j = g_j(C) \qquad (4.16)$$
The residuals are used to calculate a satisfaction vector, v^{sf}, which contains information on how well the model equations are satisfied: 0 for perfect satisfaction, and +1 or −1 when the model equations are severely violated high or low, respectively:
$$v_j^{sf} = \mathrm{sgn}(r_j)\,\frac{|r_j/\tau_j|^{n}}{1 + |r_j/\tau_j|^{n}} \qquad (4.17)$$
This can be seen as another way of thresholding.
The sensitivity function, S, is determined through the partial derivative of the model equations, g_{j}, with respect to the fault, c_{i}:
$$S_{ij} = \frac{\partial g_j / \partial c_i}{\tau_j} \qquad (4.18)$$
The sensitivity function corresponds to the FSM in the FDI approach and describes how easily a behavioural mode, c_{i}, violates the residual r_{j}.
The failure likelihood, F_{i}, of assumption c_{i} is determined from the equation
$$F_i = \frac{\sum_j S_{ij}\,v_j^{sf}}{\sum_j |S_{ij}|} \qquad (4.19)$$
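The following Python sketch evaluates Equations 4.17 and 4.19 with NumPy. It applies the sign of r_j in the satisfaction function so that v^{sf} ranges between −1 and +1 as described in the text; the default exponent n = 2 and the function names are assumptions for illustration.

```python
import numpy as np


def satisfaction(r, tau, n=2):
    """Eq. 4.17 with the sign of r applied: values in (-1, 1), +-0.5 at |r| = tau."""
    z = np.abs(np.asarray(r, dtype=float) / np.asarray(tau, dtype=float)) ** n
    return np.sign(r) * z / (1.0 + z)


def failure_likelihood(S, v_sf):
    """Eq. 4.19: F_i = sum_j S_ij v_j / sum_j |S_ij|.

    S has one row per assumption c_i and one column per residual r_j; every row
    is assumed to contain at least one non-zero entry.
    """
    S = np.asarray(S, dtype=float)
    return S @ np.asarray(v_sf, dtype=float) / np.abs(S).sum(axis=1)
```

A residual exactly at its threshold gives a satisfaction of ±0.5, and a fault whose sensitivity row has signs opposite to the observed violations obtains a negative likelihood, which is how negative values of F_i arise.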
Chapter 5
Performance measures
To be able to compare the performance of the algorithms and the resources they require, performance measures are needed. Definitions and explanations of the performance measures follow in this chapter. Some of the measures are not applicable to certain types of problems; where this is the case, it is described in the respective section.
5.1
Memory usage
The memory usage of an algorithm is defined as the amount of memory required for carrying out the isolation. This measure depends on the size as well as on the desired accuracy of the data structures:
• In Column reasoning and Row reasoning, an FSM needs to be stored
• In the Bayesian approaches, the functions P(D | C) and P(C) need to be stored as tables
• In DMP, the sensitivity matrix S needs to be stored
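The storage requirements listed above can be compared by counting table entries. The following Python sketch does this under simplifying assumptions: one stored number per entry, all 2^n behavioural modes tabulated for the Bayesian variants, and P(C) stored as its own table. The function name `table_entries` is hypothetical, and the extended multiple-fault FSM of Column reasoning is not counted. The actual memory in bytes also depends on the numeric precision, as noted above.

```python
def table_entries(n_components, n_tests):
    """Rough count of stored numbers per method (one number per table entry)."""
    n_modes = 2 ** n_components
    return {
        "FSM (Column/Row reasoning)": n_tests * n_components,
        "DMP sensitivity matrix S": n_tests * n_components,
        # P(d_j = 1 | C) for every test and mode, plus the prior table P(C)
        "Bayes, independent tests": n_tests * n_modes + n_modes,
        # P(D | C) for every test-result pattern and mode, plus P(C)
        "Bayes, full dependence": 2 ** n_tests * n_modes + n_modes,
    }
```

For the six faults F^{P} and the four tests d1 to d4, `table_entries(6, 4)` gives 24, 24, 320 and 1088 entries respectively, which illustrates how quickly the full-dependence table grows with the number of tests.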
5.2
Diagnostic resolution
The diagnostic resolution ([Pulido]) measures the average of the belief, p_{kC}, of the system behavioural mode, C, evaluated for sample number k. The diagnostic resolution is defined as
$$\gamma = \frac{1}{L}\sum_{k=1}^{L}\sum_{C} p_{kC}, \qquad (5.1)$$
where L is the number of evaluated samples. To reach the optimal value of the diagnostic resolution, an isolation method needs to point out one system behavioural mode on average.
Example 3 Let B1, B2, B3 and B4 be the possible behavioural modes. If the Row reasoning method produces the diagnostic statement {{B1}, {B2}} in performance test k, then
$$\sum_{C} p_{kC} = 2 \qquad (5.2)$$
If the Bayesian method states the diagnoses {P(B1) = 0.5, P(B2) = 0.4, P(B3) = 0.1, P(B4) = 0} for performance test k, then
$$\sum_{C} p_{kC} = 1 \qquad (5.3)$$
Since the Bayesian method states the diagnosis in the form of probabilities, the diagnostic resolution is always one in this case.
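Equation 5.1 and Example 3 can be reproduced with a few lines of Python, representing a set-valued diagnostic statement by a belief of 1 for each listed mode, as in Example 3 (function name hypothetical):

```python
def diagnostic_resolution(beliefs):
    """Eq. 5.1: mean over samples k of the summed belief p_kC over modes C.

    beliefs: one dict {mode: p_kC} per evaluated sample k.
    """
    return sum(sum(sample.values()) for sample in beliefs) / len(beliefs)
```

The Row reasoning statement of Example 3 gives 2, the Bayesian statement gives 1, and averaging over both samples gives 1.5.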
5.3
Normalised diagnostic accuracy
Normalised diagnostic accuracy, NDA, was developed in order to handle the multiple-fault scenario. The idea is to place different weights on the system behavioural modes depending on how important they are, i.e. to let single faults be more important than behavioural modes containing multiple faults. Let the function f(C | D) denote the confidence of a diagnosis, and let
$$\sum_{C} f(C \mid D) = 1 \qquad (5.4)$$
The NDA is then defined as
$$\alpha = \frac{\frac{1}{N}\sum_C f(C \mid D_C)\,k_C}{\frac{1}{N}\sum_C k_C} = \frac{\sum_C f(C \mid D_C)\,k_C}{\sum_C k_C} \qquad (5.5)$$
where D_{C} denotes the observed test results when C is the true system behavioural mode, and k_{C} is the weight for system behavioural mode C. The weights reflect how important it is that the system behavioural mode C is included in the diagnosis when it is active: important behavioural modes get large weights, and less important ones get smaller weights. For example, it is more important to obtain a correct diagnosis statement for single faults or NF (no fault) than for the case when several or all components are faulty. The parameter k_{C} is a design parameter. A good choice is to let single faults have the value 0.1^1, double faults the value 0.1^2, and so on.
Example 4 If the true system behavioural mode is C2, the test results are D_{C2}, and the isolation reaches the conclusion that C2 or C3 is present, then the confidences of the diagnoses are:
f(C1 | D_{C2}) = 0,  f(C2 | D_{C2}) = 1/2,  f(C3 | D_{C2}) = 1/2
The only confidence that contributes to the sum in the numerator is then f(C2 | D_{C2}) = 1/2.
The optimal value of the NDA is 1. In practice, this value is not achievable, because it requires the confidence of the true system behavioural mode to be one at all times.
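A minimal Python sketch of Equation 5.5. Spreading the confidence evenly over the stated diagnoses follows Example 4; giving the fault-free mode the weight 0.1^0 = 1 is an assumption that extends the stated 0.1^1, 0.1^2 pattern. All function names are hypothetical.

```python
def confidence(true_mode, diagnoses):
    """f(C | D_C) when the confidence is spread evenly over the stated diagnoses."""
    return 1.0 / len(diagnoses) if true_mode in diagnoses else 0.0


def weight(mode, base=0.1):
    """k_C = base**(number of faulty components); gives 1 for the fault-free mode."""
    return base ** len(mode)


def nda(conf_true, weights):
    """Eq. 5.5: weighted average of the confidence given to the true mode.

    conf_true[C] = f(C | D_C) and weights[C] = k_C for every evaluated mode C.
    """
    numerator = sum(conf_true[C] * weights[C] for C in weights)
    return numerator / sum(weights.values())
```

For Example 4, `confidence("C2", {"C2", "C3"})` gives 1/2. If the fault-free mode is always isolated correctly and a single fault receives confidence 1/2, the NDA is (1 · 1 + 0.1 · 0.5) / (1 + 0.1) ≈ 0.95.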
5.4
Error rate
The error rate is defined as the average percentage of faulty diagnoses for the current system behavioural modes. A faulty diagnosis means that the true system behavioural mode is not present in the diagnosis:
$$\beta = \frac{1}{L}\sum_{C} \mathbf{1}\{f(C \mid D_C) = 0\} \qquad (5.6)$$
where 1{·} equals one if the condition holds and zero otherwise.
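Equation 5.6 counts the evaluations in which the true mode receives zero confidence. A minimal Python sketch, taking L as the number of evaluated cases (an interpretation, since L is not defined separately for this measure):

```python
def error_rate(conf_true):
    """Eq. 5.6: fraction of evaluations whose diagnosis gives the true mode zero confidence."""
    return sum(1 for f in conf_true if f == 0) / len(conf_true)
```

For instance, if the true mode is missed in two out of four evaluations, the error rate is 0.5.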
Chapter 6
Simulation and evaluation
This chapter describes how the simulation and evaluation of the isolation methods were carried out technically.
6.1
Simulations
The benchmark system was simulated in SIMULINK, and the isolation algorithms were implemented in MATLAB. The residual generation is the same for all algorithms, in order to make an objective comparison. All thresholds were also kept the same for all algorithms, with the exception of DMP, where thresholds are defined differently. All simulations were done both for the case where only additive faults are used (six faults in total) and for the case where multiplicative faults are also used. Two ways of handling the test results have also been considered:
1. Test results are computed at every sample time
2. Test results are computed at every sample time until a result is equal to one; it is then held at one for the rest of the simulation. In this way, fluctuation in the diagnosis is avoided (see Figure 6.1)
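Variant 2 amounts to a running maximum over time for each test. A NumPy sketch (hypothetical function name):

```python
import numpy as np


def hold_test_results(d):
    """Variant 2: once d_j(k) = 1, keep d_j = 1 for all later samples.

    d has shape (N, m) with one row per sample and one column per test.
    """
    return np.maximum.accumulate(np.asarray(d), axis=0)
```

A test that alarms once and then falls back to zero is therefore reported as alarming from its first alarm until the end of the simulation.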
The residuals are shown in Figures 6.2 and 6.3 for simulations of the faults F_{y1}^{P} and F_{Qf1}^{P}, and Figures 6.4 and 6.5 show the corresponding test results. The output of each isolation method is shown in Figures 6.6 to 6.13 for the same two faults, when only single faults are considered and the test results are held.
6.1.1
Column reasoning
Column reasoning was implemented both for single faults and for multiple faults. The multiple-fault case needed an extended FSM, which is obtained by merging the single-fault FSM into a multiple-fault FSM. A simulation of an additive fault in the level sensor for tank T1 is shown in Figure 6.6 (only single faults considered). Note that the first variant of Column reasoning is more careful about excluding faults, which leads to many diagnoses; however, if small faults are present that do not make all of their tests react, the behaviour of Column reasoning 1 seems reasonable.
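One way to merge the single-fault FSM into a multiple-fault FSM is to mark a multiple fault in every test row where at least one of its components is marked. This OR rule is an assumption of the sketch below, since the merging rule is not spelled out here; the result keeps the same "test → marked modes" representation as a single-fault FSM, and the function name is hypothetical.

```python
from itertools import combinations


def extend_fsm(fsm, components, max_order):
    """Build an FSM over multiple faults from a single-fault FSM.

    Assumption: a multiple fault is marked with x in a test's row if any of its
    components is (the columns are combined with a logical OR).
    fsm maps each test to the set of single components marked with x.
    """
    modes = [frozenset(c) for k in range(1, max_order + 1)
             for c in combinations(components, k)]
    return {test: {m for m in modes if m & marked} for test, marked in fsm.items()}
```

For the FSM of Example 1 with up to double faults, the double fault {c1, c2} is marked in both rows, while the single fault {c2} is still only marked in row d2.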
6.1.2
Row reasoning
All combinations of faults were used during the simulations. Because of the properties of row reasoning, the regular FSM can be used in both the single-fault and the multiple-fault case. Row reasoning may produce diagnoses containing multiple faults; therefore, when the algorithm is evaluated in the single-fault case, diagnoses with more than one component are ignored. The output from the Row reasoning method is shown in Figure 6.8 for a simulated fault in the level sensor for tank T1. Comparing it with the output from Column reasoning (Figure 6.6, right panel) shows that the methods are very similar. The difference appears at around 40 s, when only one test has reacted: the Row reasoning method indicates three possibly faulty components, while the Column reasoning method points out a single, wrong component. Shortly afterwards, the next test reacts and the two methods give the same output.
6.1.3
Bayesian isolation methods
Figure 6.2: Residuals r_1 to r_4 plotted against time [s] for the single fault scenario with additive faults, simulating the fault F_{y1}^{P} from t = 40 s to t = 120 s.
Figure 6.3: Residuals r_1 to r_4 plotted against time [s] for the single fault scenario with additive faults, simulating the fault F_{Qf1}^{P}.
Figure 6.4: Test results d1 to d4 plotted against time [s] for the single fault scenario with additive faults, simulating the fault F_{y1}^{P} from t = 40 s to t = 120 s.
Figure 6.5: Test results d1 to d4 plotted against time [s] for the single fault scenario with additive faults, simulating the fault F_{Qf1}^{P}.
Figure 6.6: Output from the Column reasoning isolation methods for the single fault scenario with additive faults, simulating the fault F_{y1}^{P} from t = 40 s to t = 120 s. Left panel: Column reasoning 1; right panel: Column reasoning 2. Both panels show the diagnoses F_{pump}^{P}, F_{y1}^{P}, F_{y2}^{P}, F_{Qf1}^{P}, F_{Qf2}^{P} and F_{Up}^{P} against time [s]. The left panel shows that no conclusion can be drawn about the system behavioural mode before t = 40 s.
Figure 6.7: Output from the Column reasoning isolation methods for the single fault scenario with additive faults, simulating the fault F_{Qf1}^{P}. Left panel: Column reasoning 1; right panel: Column reasoning 2. Both panels show the diagnoses F_{pump}^{P}, F_{y1}^{P}, F_{y2}^{P}, F_{Qf1}^{P}, F_{Qf2}^{P} and F_{Up}^{P} against time [s].
Figure 6.8: Output from the Row reasoning isolation method (diagnoses F_{pump}^{P} to F_{Up}^{P} against time [s]) for the single fault scenario with additive faults, simulating the fault F_{y1}^{P} from t = 40 s to t = 120 s.
Figure 6.9: Output from the Row reasoning isolation method (diagnoses F_{pump}^{P} to F_{Up}^{P} against time [s]) for the single fault scenario with additive faults, simulating the fault F_{Qf1}^{P}.
Example 5 Consider a system with three components and three tests, i.e. n = 3 and m = 3. If all behavioural modes are simulated, the probabilities can be estimated through
$$P(d_1 = x, d_2 = y, d_3 = z \mid c_1 = \xi, c_2 = \zeta, c_3 = \vartheta) = \frac{n^{\xi\zeta\vartheta}_{xyz}}{N} \qquad (6.1)$$
where n^{ξζϑ}_{xyz} is the number of samples with the test results (x, y, z) in the simulation of the behavioural mode (ξ, ζ, ϑ), N is the total number of samples in that simulation, and x, y, z, ξ, ζ, ϑ can take the values 0 or 1.
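Equation 6.1 is again a relative frequency, now of the whole test-result pattern rather than of each test separately. The sketch below (hypothetical function name, Python standard library only) estimates P(D | C) for one simulated mode C without any independence assumption:

```python
from collections import Counter


def estimate_joint(d_sim):
    """Eq. 6.1: P(d_1..d_m = pattern | C) as the relative frequency of each pattern.

    d_sim: sequence of N test-result rows (0/1) from a simulation of mode C.
    Patterns that never occur are absent from the result (probability 0).
    """
    counts = Counter(tuple(int(v) for v in row) for row in d_sim)
    N = len(d_sim)
    return {pattern: count / N for pattern, count in counts.items()}
```

Because every pattern is counted jointly, the table can grow to 2^m entries per behavioural mode, which is the price of the full-dependence assumption.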
A simulation of an additive fault in the level sensor in tank T1, where only single faults are considered, is shown in Figure 6.10.
6.1.4
Diagnostic Model Processor
DMP was implemented in MATLAB, and the thresholds and the sensitivity function from [Pulido] were used for the case where only additive faults are considered. For the multiplicative faults, the sensitivity function was chosen such that the elements corresponding to multiplicative faults got the values +1 and −1 for residuals that react with a positive and a negative derivative, respectively. The thresholds were chosen such that no residual reacts in the fault-free mode.
A simulation of a fault in the level sensor for tank T1 is shown in Figure 6.12, where only single faults are assumed to be possible. The figure shows the likelihoods for faults in the different components. Note that this method produces a different kind of result than the previous methods: the likelihoods can be both positive and negative. A negative likelihood can be interpreted either as a negative fault being present in the corresponding component or as the fault being highly unlikely.
6.2
Evaluation of the algorithms
The evaluation of the isolation algorithms was done in MATLAB. Scripts were used to simulate all combinations of faults, and the data was processed afterwards. All of the performance measures are evaluated both for snapshots of data and for snapshots where the test results are held active once activated. For the case where mixed faults are simulated, the performance is also measured with the test results filtered using the Cusum test. In the additive fault case, filtering is not necessary, because simulations showed that it was always possible to separate a violated residual from a non-violated residual.
Figure 6.10: Output from the Bayesian method, where the test results are assumed to be independent, and the fault $F^{P}_{y1}$ is present from t = 40 s to t = 120 s. The figure shows the probabilities for the single fault scenario with additive faults. (Panel: "Bayesian method - independence, diagnoses for fault $F^{P}_{y1}$"; curves for $F^{P}_{pump}$, $F^{P}_{y1}$, $F^{P}_{y2}$, $F^{P}_{Qf1}$, $F^{P}_{Qf2}$ and $F^{P}_{Up}$ versus time [s].)
Figure 6.11: Output from the Bayesian method, where the test results are assumed to be independent, and the fault $F^{P}_{Qf1}$ is present from t = 40 s to t = 120 s. (Panel: "Bayesian method - independence, diagnoses for fault $F^{P}_{Qf1}$"; curves for $F^{P}_{pump}$, $F^{P}_{y1}$, $F^{P}_{y2}$, $F^{P}_{Qf1}$, $F^{P}_{Qf2}$ and $F^{P}_{Up}$ versus time [s].)
Figure 6.12: Output from the Diagnostic Model Processor. The figure shows the likelihoods for the single fault scenario with additive faults, simulating the fault $F^{P}_{y1}$ from t = 40 s to t = 120 s. (Panel: "Diagnostic model processor, diagnoses for fault $F^{P}_{y1}$"; curves for $F^{P}_{pump}$, $F^{P}_{y1}$, $F^{P}_{y2}$, $F^{P}_{Qf1}$, $F^{P}_{Qf2}$ and $F^{P}_{Up}$ versus time [s].)
Figure 6.13: Output from the Diagnostic Model Processor. The figure shows the likelihoods for the single fault scenario with additive faults, simulating the fault $F^{P}_{Qf1}$ from t = 40 s to t = 120 s. (Panel: "Diagnostic model processor, diagnoses for fault $F^{P}_{Qf1}$"; curves for $F^{P}_{pump}$, $F^{P}_{y1}$, $F^{P}_{y2}$, $F^{P}_{Qf1}$, $F^{P}_{Qf2}$ and $F^{P}_{Up}$ versus time [s].)
The performance measures are evaluated as follows:
• Error rate
The error rate is measured for the single-fault case only.
• Diagnostic resolution
The diagnostic resolution is measured for the single-fault case only.
• Normalised Diagnostic Accuracy
NDA uses different weights depending on the importance of isolating the current system behavioural mode, which makes it suitable for measuring performance on multiple faults. It is measured for all faults, single as well as multiple.
• Memory usage
The memory usage is computed by analysing the memory structures needed for storing the information about the isolation.
The parameters for the benchmark system and the isolation can be found in Appendix A.
6.2.1 Memory usage
The memory usage is denoted δ. For the case where non-boolean structures are used, the number of bits used in each element of the structures is η.

Single faults

Isolation algorithm | Need to store | δ [bit]
Column reasoning | A regular FSM with m · n booleans | m · n
Row reasoning | A regular FSM with m · n booleans | m · n
Bayesian method, independence | P(C): n · η; P(D|C) = ∏_j P(d_j|C): m · n · η | (1 + m) n · η
Bayesian method, partial independence | P(C): n · η; P(D|C) = P(d_r, d_s|C) ∏_{j≠r,s} P(d_j|C): ((m − 2) n + 2^2 n) · η | (3 + m) n · η
Bayesian method, full dependence | P(C): n · η; P(D|C): 2^m · n · η | (1 + 2^m) n · η
Diagnostic Model Processor | The sensitivity function with m · n real numbers | m · n · η

Multiple faults

Isolation algorithm | Need to store | δ [bit]
Column reasoning | An extended FSM with m · 2^n booleans | m · 2^n
Row reasoning | A regular FSM with m · n booleans | m · n
Bayesian method, independence | P(C): 2^n · η; P(D|C) = ∏_j P(d_j|C): m · 2^n · η | (1 + m) 2^n · η
Bayesian method, partial independence | P(C): 2^n · η; P(D|C) = P(d_r, d_s|C) ∏_{j≠r,s} P(d_j|C): ((m − 2) 2^n + 2^2 · 2^n) · η | (3 + m) 2^n · η
Bayesian method, full dependence | P(C): 2^n · η; P(D|C): 2^m · 2^n · η | (1 + 2^m) 2^n · η
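The formulas in the tables above can be evaluated directly. The sketch below assumes m = 4 tests, n = 6 components for the additive faults and n = 8 for the mixed faults, as in the failure signature matrices of Appendix A. It also assumes η = 64 bits per element. That value is not stated in this section, but it reproduces the memory figures reported in Section 7.1.4. The function name and dictionary keys are hypothetical.

```python
def memory_usage(m, n, eta, multiple_faults=False):
    """Memory usage delta [bit] per isolation algorithm.

    m: number of tests, n: number of components, eta: bits per element of
    non-boolean structures. With multiple faults, the number of behavioural
    modes that must be represented grows from n to 2**n.
    """
    modes = 2 ** n if multiple_faults else n
    usage = {
        # Regular FSM (m * n) or extended FSM (m * 2**n) of booleans.
        "Column reasoning": m * modes,
        # Row reasoning always uses the regular m * n FSM.
        "Row reasoning": m * n,
        # P(C) plus m tables P(d_j | C).
        "Bayesian, independence": (1 + m) * modes * eta,
        # P(C), m - 2 single tables and one 2**2-entry joint table per mode.
        "Bayesian, partial independence": (3 + m) * modes * eta,
        # P(C) plus the full joint P(D | C) with 2**m entries per mode.
        "Bayesian, full dependence": (1 + 2 ** m) * modes * eta,
    }
    if not multiple_faults:
        # Sensitivity function with m * n real numbers (single-fault table).
        usage["Diagnostic Model Processor"] = m * n * eta
    return usage


print(memory_usage(4, 6, 64)["Bayesian, independence"])  # 1920
```

The exponential factor 2^n in the multiple-fault case explains why the Bayesian tables grow so much faster than the Row reasoning FSM.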
Chapter 7
Results
The sections in this chapter present the results from the simulations and the evaluation of the performance.
7.1 Performance measures
A comparison of the performance measures for both sets of faults shows that isolation is easier for the set of additive faults. The main reason is that the same residuals are used in both cases, while the mixed fault case contains more faults, which makes the faults harder to isolate. When only single faults are considered, the diagnostic resolution is largest for Column reasoning and the Bayesian methods.
The optimal value for the error rate is zero. The Bayesian methods always give an error rate of zero, because the true system behavioural mode always gets a probability greater than zero. The error rate is high for Row reasoning and for Column reasoning, assumption 2. This is because the benchmark system's reactions to some faults are delayed by the thresholds of the residuals and by time delays in the system. Column reasoning has a particularly high error rate, since it requires that the failure signatures exactly match the test results. Row reasoning has a high error rate in the case where the test results are held. Column reasoning, assumption 1, has a higher diagnostic resolution than assumption 2, which means that assumption 1 is more cautious about excluding faults. The error rate for assumption 2, on the other hand, is higher than for assumption 1.
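Strict column matching, as described above, can be sketched as follows. The failure signatures are transcribed from the mixed-fault FSM in Appendix A, with every x read as a test that reacts. The function name and the choice to return NF when no test has reacted belong to this hypothetical sketch. It is not claimed to be the exact variant (assumption 1 or 2) used in the thesis.

```python
# Failure signatures (d1, d2, d3, d4) from the mixed-fault FSM in Appendix A,
# with every x encoded as 1.
FSM_MIXED = {
    "c1": (1, 0, 0, 1), "c2": (1, 1, 1, 0), "c3": (1, 1, 0, 0), "c4": (1, 0, 1, 0),
    "c5": (0, 1, 0, 0), "c6": (0, 0, 1, 1), "c7": (1, 1, 0, 1), "c8": (1, 1, 0, 0),
}


def column_reasoning(fsm, d):
    """Strict matching: diagnoses are the components whose signature equals d."""
    d = tuple(d)
    if not any(d):
        return ["NF"]            # no violated test: the no-fault mode
    return [c for c, signature in fsm.items() if signature == d]


print(column_reasoning(FSM_MIXED, (1, 1, 0, 0)))  # ['c3', 'c8']
print(column_reasoning(FSM_MIXED, (0, 1, 1, 1)))  # []
```

The first call returns two components because c3 and c8 have identical signatures and cannot be separated. The second call returns no diagnosis at all. This is how a delayed or missing test reaction becomes an error under strict matching.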
Row reasoning has a lower diagnostic resolution than Column reasoning. This is due to the minimal diagnoses that Row reasoning produces. In the multiple fault case, the use of minimal diagnoses leads to a decreased normalised diagnostic accuracy.
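Minimal diagnoses can be illustrated with a brute-force hitting-set search. The sketch assumes, as is common in consistency-based diagnosis, that each violated test yields a conflict: the components marked x in its FSM row, at least one of which must be faulty. A minimal diagnosis is then a minimal set of components that intersects every conflict. The sketch is hypothetical and reuses the rows of the mixed-fault FSM in Appendix A. [Wotawa] describes a more efficient hitting-set algorithm, and the sketch is not necessarily how Row reasoning was implemented in this thesis.

```python
from itertools import combinations

# Components marked x in each row (d1, d2, d3, d4) of the mixed-fault FSM.
ROWS_MIXED = [
    {"c1", "c2", "c3", "c4", "c7", "c8"},
    {"c2", "c3", "c5", "c7", "c8"},
    {"c2", "c4", "c6"},
    {"c1", "c6", "c7"},
]


def row_reasoning(rows, d):
    """Minimal hitting sets of the conflicts given by the violated tests."""
    conflicts = [rows[j] for j, d_j in enumerate(d) if d_j == 1]
    if not conflicts:
        return [frozenset()]                 # empty diagnosis: no fault
    components = sorted(set().union(*conflicts))
    minimal = []
    # Search by increasing size, so any superset of a found diagnosis is skipped.
    for size in range(1, len(components) + 1):
        for cand in combinations(components, size):
            cand = frozenset(cand)
            hits_all = all(cand & conflict for conflict in conflicts)
            if hits_all and not any(prev <= cand for prev in minimal):
                minimal.append(cand)
    return minimal


print([sorted(s) for s in row_reasoning(ROWS_MIXED, (1, 1, 0, 0))])
# [['c2'], ['c3'], ['c7'], ['c8'], ['c1', 'c5'], ['c4', 'c5']]
```

Both single and double faults appear among the minimal diagnoses. For example, {c1, c5} explains both violated tests without containing any of the single-fault diagnoses.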
The Bayesian methods have a high NDA for all simulations.
relatively low.
The effect of holding the test results active once they are activated is that fluctuations in the diagnosis statement are avoided. This leads to an increased diagnostic accuracy. The results of the performance evaluation show that the effect of holding the test results is largest for the mixed fault case. This can be explained by the fact that multiplicative faults generate test results that fluctuate more.
For the additive fault scenario, the diagnoses for different test results are shown in Appendix B. A comparison of the diagnoses from Column reasoning, assumption 1, and Row reasoning shows that the only difference between them is how the behavioural mode NF is treated. The difference between the Bayesian approaches is that, when dependence among the tests is assumed, the test result [d1 d2 d3 d4] = [0010] leads to the statement that neither a single fault nor NF is probable. This means that this particular test result never occurred during the simulations. If the algorithm encounters such a test result, it cannot say anything about the present fault.
The differences between the other assumptions are not very large. For example, the diagnoses for the test results [1000] and [0100] differ only marginally. The diagnoses of the Diagnostic Model Processor look a little different from those of the other methods, since they show the likelihoods for the different faults. The likelihood takes values from −1 to +1. One interpretation of this, in the single fault scenario, is that it shows whether the current fault is positive or negative.
7.1.4 Memory usage
The results of the calculations are given below. Decreasing the number of bits used for storing the tables can reduce the memory usage of the Bayesian methods and the Diagnostic Model Processor. The Row reasoning algorithm uses no more memory than any other method considered. In the single fault case, Column reasoning and Row reasoning use the same amount of memory. The Diagnostic Model Processor uses less memory than the Bayesian methods, but more than Column and Row reasoning.
Additive faults

Isolation algorithm | δ [bits], single faults | δ [bits], multiple faults
Column reasoning, assumption 1 | 24 | 256
Column reasoning, assumption 2 | 24 | 256
Row reasoning | 24 | 24
Bayesian method, independence | 1 920 | 20 480
Bayesian method, partial independence | 2 688 | 28 672
Bayesian method, full dependence | 6 528 | 69 632
Diagnostic Model Processor | 1 536 |

Mixed faults

Isolation algorithm | δ [bits], single faults | δ [bits], multiple faults
Column reasoning, assumption 1 | 32 | 1 024
Column reasoning, assumption 2 | 32 | 1 024
Row reasoning | 32 | 32
Bayesian method, independence | 2 560 | 81 920
Bayesian method, partial independence | 3 584 | 114 688
Bayesian method, full dependence | 8 704 | 278 528
Chapter 8
Conclusion
In this chapter, the comparison of the isolation algorithms is discussed and conclusions are drawn from the results. Recommendations on when to use each of the algorithms are also presented.
8.1 Discussion
All isolation algorithms presented here are good at isolating single faults in the benchmark problem. For multiple faults, however, it is almost impossible to find an isolation algorithm capable of isolating all 64 and 256 fault combinations in the additive and mixed fault cases, respectively. This is because there are too few test quantities: the number of unique diagnoses that can be stated from the four test quantities is 2^4 = 16. In the DMP case, the number of unique diagnoses is much higher, because the thresholds are handled differently and the sign is also taken into account. To increase the isolation performance, more test quantities are needed.
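The counting argument behind this statement can be written out for m = 4 binary tests and the 6 (additive) and 8 (mixed) components of the failure signature matrices in Appendix A:

$$|\{0,1\}^{4}| = 2^{4} = 16 \;<\; 2^{6} = 64 \;<\; 2^{8} = 256$$

By the pigeonhole principle, some test result must be shared by at least ⌈64/16⌉ = 4 behavioural modes in the additive case and ⌈256/16⌉ = 16 in the mixed case. Modes that share a test result cannot be separated by an isolation algorithm that only uses boolean test results.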
In general, the isolation performance also depends on the quality of the tests.
The performance measures in this thesis show that simple algorithms like Row reasoning and Column reasoning are efficient in terms of memory usage. They are also good at isolating single faults. They could therefore be a good starting point when developing an isolation system.
If the isolation performance turns out not to be good enough, or if the diagnoses need to be ranked, then DMP or a Bayesian approach could be of interest. For systems with memory restrictions, DMP would be preferred. If the NDA is not high enough, the Bayesian algorithms should be used. Note that there is a trade-off between memory usage and NDA.
The Bayesian method in which the test results are assumed to be independent gives a good estimate of P(C|D), considering both isolation performance and error rate. This could be a good alternative if there are memory restrictions. The other independence assumptions in the Bayesian algorithm are only necessary if better precision is required or if the test results are strongly dependent. The difference between the assumptions might also grow with decreasing fault sizes.
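Under the independence assumption, the posterior follows from Bayes' rule as P(C|D) ∝ P(C) ∏_j P(d_j|C), normalised over the behavioural modes. The sketch below is minimal and uses hypothetical priors and test probabilities. The numbers, the function name and the data layout are illustrative only and are not values from the benchmark.

```python
def posterior_independent(prior, p_fire, d):
    """P(C|D) for each mode C, assuming independent test results.

    prior:  dict mode -> P(C)
    p_fire: dict mode -> [P(d_j = 1 | C) for each test j]
    d:      observed test results, one 0/1 value per test
    Returns a dict mode -> P(C|D), or None if P(D) = 0 under the model.
    """
    weights = {}
    for mode, p_c in prior.items():
        likelihood = 1.0
        for p1, d_j in zip(p_fire[mode], d):
            likelihood *= p1 if d_j == 1 else 1.0 - p1
        weights[mode] = p_c * likelihood
    total = sum(weights.values())          # P(D)
    if total == 0.0:
        return None
    return {mode: w / total for mode, w in weights.items()}


# Hypothetical numbers: two tests, no-fault mode NF and two single faults.
prior = {"NF": 0.5, "c1": 0.25, "c2": 0.25}
p_fire = {"NF": [0.1, 0.1], "c1": [0.9, 0.1], "c2": [0.9, 0.9]}
post = posterior_independent(prior, p_fire, (1, 0))
print({mode: round(p, 3) for mode, p in post.items()})
# {'NF': 0.167, 'c1': 0.75, 'c2': 0.083}
```

As long as every P(d_j = 1 | C) lies strictly between 0 and 1, every mode keeps a non-zero posterior. This is consistent with the zero error rate of the Bayesian methods reported in Chapter 7.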
8.2 Recommendation
From the conclusions above, the following can be recommended:

Column reasoning
• Small to large-scale systems
• When memory usage is restricted
Row reasoning
• When diagnosing large-scale systems
• When multiple faults need to be diagnosed
• Systems with narrow memory restrictions
Bayesian methods
• When the diagnosis statement needs to be ranked, for example when other isolation algorithms produce too many diagnoses
• For medium to large-scale systems
• If memory usage is not an issue
Diagnostic Model Processor
• For small to medium-sized systems
• When the diagnosis statement needs to be ranked
• When there are memory restrictions

Table 8.1: The isolation methods together with the types of problems for which each method is recommended.
8.3 Summary
In this thesis the following goals have been reached:
• Four isolation methods have been implemented on a benchmark problem.
• Performance measures have been gathered and developed.
8.4 Future work
The following future work is recommended.
• Extend the benchmark by adding extra tanks and tests, to see how the isolation algorithms handle additional faults and to measure their complexity.
• The isolation performance depends on how the tests are designed and filtered. More work is needed to find methods for optimising the tests for isolation.
Bibliography
[O. Bouamama] B. Ould Bouamama, R. Mrani Alaoui, P. Taillibert and M. Staroswiecki. Diagnosis of a two-tank system, 2001.
[Nyberg, Frisk] Mattias Nyberg, Erik Frisk. Model Based Diagnosis of Technical Processes, 2005.
[Jensen] Mathias Jensen. Distributed Fault Diagnosis for Networked Embedded Systems, 2003.
[Gertler] J. Gertler, D. Singer. A New Structural Framework for Parity Equation-based Failure Detection and Isolation. Automatica, 381–388, 1990.
[Pulido] B. Pulido, V. Puig, T. Escobet, J. Quevedo. A new fault localization algorithm that improves..., 2005.
[Wotawa] F. Wotawa. A variant of Reiter's hitting-set algorithm, 1999.
[Pernestål] A. Pernestål. A Bayesian Approach to Fault Isolation - Structure Estimation and Inference, 2005.
[Petti] Petti et al. Diagnostic Model Processor: using deep knowledge for process fault diagnosis. AIChE Journal, 36(4):565–575, 1990.
[NiÅrPe] Anders Nilsson, Karl-Erik Årzén, Thomas F. Petti. Model-based diagnosis - State transition events and constraint equations, 1992.
[M. Kościelny] Jan Maciej Kościelny. Fault isolation in industrial processes by the dynamic table of states method. Automatica, Vol. 31, No. 5, 747–753, 1995.
[Schmid] F. Schmid. Model-Based Fault Detection and Isolation: A New Approach for Fault Isolation in Dynamic Networks with Time Delays. Diploma thesis, 2004.
Appendix A
Notation and parameters
Quantity Description Value Unit
ε1 Measurement noise V
ε2 Measurement noise V
ε3 Measurement noise V
ε4 Measurement noise V
m_{y1} Noise rate 5.00e−4
m_{y2} Noise rate 3.00e−4
m_{Up} Noise rate 1.00e−7
m_{Qp} Noise rate 1.00e−7
T Sample time 1 s
A1 Area tank 1 0.0154 m²
A2 Area tank 2 0.0154 m²
C_{vb} Global hydraulic flow coefficient of valve V_{b} 1.59e−4
C_{vo} Hydraulic flow coefficient of valve V_{o} 1.60e−4
h_{1c} Reference value for PI controller 0.5 m
h1 Water level in tank 1  m
h2 Water level in tank 2  m
h1max Maximal height of the tank T1 0.6 m
h2max Maximal height of the tank T2 0.6 m
K_{p} Proportional control constant 1.00e−3
K_{i} Integration control constant 5.00e−6
P1 Pump
Qint1 Inlet flow tank T1  m³/s
Qint2 Inlet flow tank T2  m³/s
Qout1 Outflow from T1  m³/s
Qout2 Outflow from T2  m³/s
Q12 Flow from T1 to T2  m³/s
Q_{f1} Leak flow from T1 m³/s
Q_{o} Outflow to consumer  m³/s
Q_{pmax} Max flow from P1 0.01 m³/s
Q_{p} Water flow from P1  m³/s
Qm_{p} Measured water flow from P1  m³/s
T1 Tank 1 
T2 Tank 2 
U_{b}m Control signal for valve V_{b}  V
U_{o}m Control signal for valve V_{o}  V
U_{p} Control signal for the pump, P1  V
V1 Valve 1 
V2 Valve 2 
V_{b} Valve V_{b} 
ym Measured level in tanks  m
ym_{1} Measured level in tank T1  m
ym_{2} Measured level in tank T2  m
C System behavioural mode 
c_{i} Status for component no. i
D Test result
d_{j} Status for test no. j
i Component index 
j Test/residual/satisfaction index 
k Sample 
l 
m Number of tests/residuals/satisfaction tests
n Number of components
r_{j} Residual number j
a Assumption
F_{i} Failure likelihood
S_{ij} Sensitivity function
vsf_{i} Satisfaction vector
P(C) Prior probability
P(C|D) Probability of the system behavioural mode, C, given the test results, D
P(D|C) Probability of the test results, D, given the system behavioural mode, C
L Total number of samples used when evaluating the performance measures
f(C|D_{C}) Diagnostic confidence of the system behavioural mode C, given the test result D_{C}
D_{C} The test result from system behavioural mode C
α Normalised diagnostic accuracy
β Error rate
δ Memory usage
γ Diagnostic resolution
x, y, z Values (0 or 1) of the test results in Example 5
ξ, ζ, ϑ Values (0 or 1) of the component statuses in Example 5
τ1 Threshold for residual 1 6.30e−4 / 7.63e−5 / 7.63e−5 *
τ2 Threshold for residual 2 3.74e−4 / 1.23e−4 / 1.23e−4 *
τ3 Threshold for residual 3 1.20e−7 / 1.14e−7 / 3.34e−7 *
τ4 Threshold for residual 4 2.24e−7 / 1.03e−7 / 1.03e−7 *
*) Thresholds for the cases: additive faults / mixed faults / mixed faults with filtering
FSM used for additive faults
     c1 c2 c3 c4 c5 c6
d1   0  x  x  x  0  0
d2   0  x  x  0  x  0
d3   0  0  0  0  0  x

FSM used for mixed faults
     c1 c2 c3 c4 c5 c6 c7 c8
d1   x  x  x  x  0  0  x  x
d2   0  x  x  0  x  0  x  x
d3   0  x  0  x  0  x  0  0
d4   x  0  0  0  0  x  x  0
Sensitivity matrix, S, for additive faults
c1 c2 c3 c4 c5 c6
d1 0 0.87 0.87 56.46 0 0
d2 0 0.2815 0.85 0 54.92 0
d3 0 0 0 0 0 0.93
d4 0.87 0 0 0 0 0.87
Sensitivity matrix, S, for mixed faults
c1 c2 c3 c4 c5 c6 c7 c8
d1 1 1 1 1 0 0 1 1
d2 0 1 1 0 1 0 1 1
d3 0 1 0 1 0 1 0 0