
An Analysis of Diagnostic Test Performance with Regard to Dispersion Among Individuals

ANDERS SIMON

Master’s Degree Project
Stockholm, Sweden, January 2009

XR-EE-RT 2009:001


Master’s Thesis at Dept. of Electrical Engineering
Supervisors: Anna Pernestål, Anders Granberg

Examiner: Bo Wahlberg


Abstract

The methods used today at Scania to test and calibrate diagnostic tests often rely on data from only a few trucks. This is due to the expense and the vast amount of time associated with collecting data.

The dispersion between different sensor individuals is thus not included in the evaluation of a diagnostic test. Often data from a single truck is collected, and this data is then used for diagnostic test calibration. As a result, when the diagnostic test is implemented in a different truck with a different set of sensors, the diagnostic test might not act as desired.

The main focus of this thesis is the development of a method that evaluates the performance of a diagnostic test, both when data is taken from only one individual and when data is taken from a whole population. The method makes it possible to use data from only one individual to predict the diagnostic test performance for the whole population of individuals. A performance evaluating method that can take sensor individuals into account is compared with a method that cannot. These methods are tested on parts of the Scania Selective Catalytic Reduction diagnosis system. Results show that the authenticity of the evaluated diagnostic test performance is improved significantly when the requirements for using the developed method are fulfilled.


Referat

Statistical evaluation of diagnostic tests with a focus on dispersion among individuals

The design and evaluation of a diagnostic test at Scania is today often carried out given data from only a few trucks, because collecting new driving data is an expensive and time-consuming process. The dispersion that exists between different trucks, the individual dispersion, is then not included in the evaluation. Individual dispersion can for example be due to different trucks having different sets of sensors. As a consequence, the performance of a diagnostic test differs between trucks.

This thesis treats the evaluation of a diagnostic test with individual dispersion taken into account, as well as the setting of requirements on a diagnostic test.

A method is presented that, given data from only one truck, estimates the performance of a diagnostic test. The method makes it possible to use data from only one individual to estimate the performance of a diagnostic test for a whole population of individuals. The developed method has been applied to and evaluated on a diagnostic test designed to detect reduced conversion efficiency in Scania’s SCR after-treatment system. Results show that the authenticity of an evaluated diagnostic test’s performance increases markedly when the requirements for using the developed method are fulfilled.


Acknowledgement

I would like to start off by giving a great thanks to my supervisors Anna Pernestål and Anders Granberg. This thesis would not have been possible without your ideas and valuable feedback. Then I want to thank my thesis coworker Joakim Goldkuhl for all the great discussions and his substantial contribution to this thesis. I also want to thank all the people working at NED for a great working atmosphere and all the pleasant coffee breaks.

I want to thank Marcus, Josef and Samuel, who were also doing their thesis work at NED, for many very much appreciated lunch breaks up at Syd.

Finally I want to thank my family and my friends for all the support you have given me.

Anders Simon
Södertälje, December 2008.


Contents

1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Related Work

2 Diagnosis Theory
  2.1 Faults and Service Failures
  2.2 False Alarms and Missed Detection
  2.3 Diagnosis Requirements
    2.3.1 Detecting Requirements
    2.3.2 Isolation Requirements
  2.4 Test Design
    2.4.1 Hardware Redundancy
    2.4.2 Signal Range Check
    2.4.3 Model Based Diagnosis
  2.5 Test Quantity
  2.6 Hypothesis Testing
  2.7 Power Function
  2.8 Summary

3 Handling Dispersion
  3.1 Multidimensional Power Function
  3.2 Estimating the Power Function From One Data Sample Set
    3.2.1 The Test Quantity Assumption
    3.2.2 Known ϕ
    3.2.3 Unknown ϕ
  3.3 Estimating the Power Function From a Sufficient Amount of Data
  3.4 Summary

4 Performance Measure and Model Accuracy
  4.1 Evaluating a Diagnostic Test
  4.2 Model Accuracy
    4.2.1 Model Variance
    4.2.2 Normal Distributed Noise
  4.3 Summary

5 The SCR System
  5.1 Emission Legislations
  5.2 Selective Catalytic Reduction
  5.3 Engine Certification Process
    5.3.1 ETC, ESC
    5.3.2 On Board Diagnosis

6 Method Evaluation on SCR System
  6.1 The Diagnostic Test
  6.2 Evaluation Procedure
    6.2.1 Service Failure Limit
    6.2.2 Estimating pdf:s
    6.2.3 Sensor Individuals
  6.3 Simulation
    6.3.1 Scenarios/Experiments
    6.3.2 Results
    6.3.3 Summary

7 Conclusions
  7.1 Accomplishments, Objective Comparison
  7.2 Future Challenges

Bibliography


Chapter 1

Introduction

1.1 Background

Diagnosis systems are frequently used in both medical and technical applications. The task of a diagnosis system is to generate a diagnosis from observations and knowledge, i.e. to decide whether a fault is present or not and, if so, to identify the fault. Within the vehicle industry, diagnosis systems are commonly used to ensure both the safety and reliability of a vehicle, but also to make sure that emission legislation is complied with.

A diagnosis system can consist of one or many diagnostic tests. Let us consider a diagnostic test for a heavy duty truck, and let the diagnostic test be used to ensure that exhaust emission levels are not too high. Let the diagnostic test depend on sensor measurements. This is a very plausible scenario within the vehicle industry. The methods used today to test and calibrate such a diagnostic test typically do not take dispersion between different sensor individuals, mounted on different trucks, into account. Different sensor individuals could for example be temperature sensors with different built-in bias faults. Often data from a single truck is collected, and this data is then used for diagnostic test calibration. As a result, when the diagnostic test is implemented in a different truck with a different set of sensors, the diagnostic test might not act as desired.

When evaluating and calibrating a diagnostic test there is usually only a limited amount of data available. Sometimes data is collected through physical test runs and sometimes it is possible to collect data through computer simulations. Within the vehicle industry it can be both expensive and time consuming to collect data through physical testing. It is therefore desirable to keep the number of test runs needed for evaluation low.

We will in this thesis develop a method for evaluating the performance of diagnostic tests, both when data is taken from the whole population and when data is taken from only one data sample set, i.e. data from only one truck. A performance evaluating method that takes differences between, for example, sensor individuals into account will be compared with a method that does not. These methods will then be tested on parts of the SCR diagnosis system.

1.2 Objectives

The objectives of this thesis are to:

• Develop methods and performance measures for evaluation of diagnostic tests when given representative data.

• Given data from one individual, evaluate the expected performance of a diagnostic test on the whole population of individuals.

• Apply the performance evaluating method on chosen parts of the Scania Selective Catalytic Reduction (SCR) diagnosis system.

• For a given diagnostic test with a desired bound on the probability of false alarm and the probability of detection, derive requirements on the variance allowed in the models and sensors used by the diagnostic test.

This master’s thesis was carried out at Scania CV AB in Södertälje. The work was done under the supervision of the Automatic Control group, Dept. of Electrical Engineering, KTH.

1.3 Related Work

Work related to this thesis has been done by Haraldsson [3], regarding the design of diagnostic tests and how to set a residual threshold for fault detection. According to Nyberg [5], the performance of a diagnostic test can be evaluated through the use of a power function, which can be calculated for a given diagnostic test. Having the fault size as input, it gives the probability of the outcome of the diagnostic test. In [5] the power function is calculated from a large amount of data from many individuals. It is, however, not discussed in [5] nor [3] how to handle the case when data from only one individual is available and how this affects the diagnosis.

Other methods for evaluating the performance of a diagnostic test are ROC (Receiver Operating Characteristics) and RPF (Residual Performance Function).

ROC and RPF are both functions of a threshold limit, while the power function is a function of the fault size. ROC and RPF are suitable when the diagnosed fault lacks a fault size; this could for example be a stuck sensor. More information about ROC and RPF can be found in [4] and [6]. It is not discussed in [4] nor [6] how to handle the case when data from only one individual is available and how this affects the diagnosis.

Overall, to the author’s knowledge, the methods used in this thesis to evaluate the performance of a diagnostic test when data from only one individual is available have not previously been applied to diagnostic test evaluation.

In the area of diagnosis, false alarms and missed detections are often discussed, and how to define them is not trivial. There are several ways to define false alarms and missed detections. In [7], Pernestål and Nyberg discuss false alarms and missed detections in relation to service failures of components. Their work helped develop the definitions of false alarm and missed detection used in this thesis.

In this thesis we will continue developing existing performance evaluating methods. The methods will be designed to also take dispersion among individuals into account.


Chapter 2

Diagnosis Theory

In this chapter we will go through the background theory that forms the foundation of the developed performance evaluating methods and requirements.

2.1 Faults and Service Failures

Systems are designed to deliver services. Take for example the Selective Catalytic Reduction (SCR) system: the service it provides is to reduce emissions to an accepted level. A sensor’s service is to provide measurements. Let us say that a fault appears which causes a measured temperature to differ from the true temperature. How far off from the true value should the sensor readings be before the diagnosis system detects the sensor as faulty? We introduce so-called Service Failures; this view on faults was introduced by Pernestål and Nyberg [7]. We define the following:

• Fault: a deviation of at least one characteristic property or variable of the system from nominal or acceptable behavior.

• Service Failure: an unpermitted deviation in a system’s ability to perform a required task.

With this view on faults, we say that a sensor should be diagnosed as faulty if readings from that particular sensor lead to a service failure. A service failure could for example be an SCR system not being able to reduce emissions to legal emission levels. We can therefore say that it is ”ok” for the system to have faults, as long as the faults are not large enough to cause a service failure. We can then state that the goal of diagnosis is to detect whether a service failure is present or not, and if a service failure is present, isolate its source.

2.2 False Alarms and Missed Detection

When setting up requirements for a diagnostic test, false alarms and missed detections are often discussed. With the following example, false alarm and missed detection will be defined.

Let z be the outcome of a random variable Z. Let Z have a sample set defined as:

Ω: entire sample set
Θ0: no service failure present
L: alarm
FL = L ∩ Θ0: alarm, no service failure present
ML = Ω\(Θ0 ∪ L): no alarm, service failure present

For an illustration, see Figure 2.1.

Figure 2.1. Sample Set for a random variable Z

Ideally, from a diagnosis perspective, we would like that

L = Ω\Θ0,

i.e. that the probability of ML, P(ML), and the probability of FL, P(FL), are both equal to zero. In real life this is seldom possible to achieve; we must take so-called false alarms and missed detections into consideration. Following [7], we define the two events ”false alarm” and ”missed detection” as the expressions shown in (2.1) and (2.2) below. Let us call this approach the exclusive definition:

False Alarm = {z : (Z ∈ FL | Z ∈ Θ0)}   (2.1)
Missed Detection = {z : (Z ∈ ML | Z ∈ Ω\Θ0)}   (2.2)

An alternative way to define false alarm and missed detection is to use the expressions shown in (2.3) and (2.4). Let us call this approach the absolute definition; we separate it from the exclusive definition by adding a star. We define the absolute definition as:

False Alarm* = {z : (Z ∈ FL | Z ∈ Ω)}   (2.3)
Missed Detection* = {z : (Z ∈ ML | Z ∈ Ω)}   (2.4)

It is not obvious which of the two definitions should be used to define false alarm and missed detection. We will explain the difference between the two with a simple example. Let Z(θ) be a random variable. Depending on the fault size θ, the outcome z of Z(θ) will belong to one of the sample sets defined above (see Figure 2.1). We then get the following:

• P(False Alarm*) = {probability of alarm with no service failure present, when the outcome z of Z(θ) can belong to the whole sample set Ω (i.e. a sample set where a service failure can be present)} = P(FL)

• P(False Alarm) = {probability of alarm with no service failure present, when the outcome z of Z(θ) is restricted to the sample set Θ0 (i.e. a sample set where no service failure can be present)} = P(FL | Θ0)

One can argue that the expression in (2.1) is a ”better” definition of false alarm, since it is out of context to discuss false alarms when service failures can be present. We therefore agree with [7] and will use the exclusive definition to define false alarm. In the same way we can argue that the exclusive definition should be used to define missed detection.
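To make the difference concrete, the following minimal Monte Carlo sketch (not from the thesis; the fault model, noise level, service failure limit and threshold J are all hypothetical) estimates both probabilities for a toy test quantity:

```python
# Contrast the "exclusive" definition (2.1), P(FL | Theta_0), with the
# "absolute" definition (2.3), P(FL | Omega) = P(FL). All numbers are assumed.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
theta_sf = 1.0      # assumed fault size above which a service failure exists
J = 0.8             # assumed alarm threshold on the observation z

theta = rng.uniform(0.0, 2.0, N)         # random fault sizes
z = theta + rng.normal(0.0, 0.3, N)      # noisy observation of the fault
alarm = z > J
no_failure = theta < theta_sf            # outcomes belonging to Theta_0

# Exclusive: probability of alarm given that no service failure is present.
p_fa_exclusive = np.mean(alarm[no_failure])
# Absolute: probability of (alarm and no service failure) over all of Omega.
p_fa_absolute = np.mean(alarm & no_failure)

print(f"P(False Alarm)  exclusive: {p_fa_exclusive:.3f}")
print(f"P(False Alarm*) absolute:  {p_fa_absolute:.3f}")
```

The exclusive estimate conditions on the no-failure region, while the absolute estimate dilutes the same event over the whole sample set, so the two numbers generally differ.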

Using the exclusive definition, and the definition of conditional probability [1], we get the following probability expressions:

P(False Alarm) = P(FL | Θ0) = P((L ∩ Θ0) | Θ0) = P((L ∩ Θ0) ∩ Θ0)/P(Θ0) = P(L ∩ (Θ0 ∩ Θ0))/P(Θ0) = P(L ∩ Θ0)/P(Θ0) = P(L | Θ0)   (2.5)

P(Missed Detection) = P(ML | Ω\Θ0) = 1 − P((L ∩ Ω\Θ0) | Ω\Θ0) = 1 − P((L ∩ Ω\Θ0) ∩ Ω\Θ0)/P(Ω\Θ0) = 1 − P(L ∩ (Ω\Θ0 ∩ Ω\Θ0))/P(Ω\Θ0) = 1 − P(L ∩ Ω\Θ0)/P(Ω\Θ0) = 1 − P(L | Ω\Θ0)   (2.6)

We can now extend the exclusive definition to also include the definition of detection. We define detection as:

Detection = {z : (Z ∈ L | Z ∈ Ω\Θ0)}   (2.7)

The probability of detection can be written as

P(Detection) = P(L | Ω\Θ0) = 1 − P(Missed Detection)   (2.8)

We define alarm as

Alarm = {z : (Z ∈ L | Z ∈ Ω)}   (2.9)

and the probability of alarm can be written as

P(Alarm) = P(L)   (2.10)

2.3 Diagnosis Requirements

2.3.1 Detecting Requirements

As mentioned in Section 2.1, the desired accomplishment of a diagnosis system is to, through observations and knowledge, generate a diagnosis, i.e. to decide whether a service failure is present or not. In automotive applications a high probability of detection is important to ensure safety, reliability and that environmental requirements are met. On the other hand, false alarms are not tolerated.

Diagnosis must be performed in a way that ensures a small probability of detecting a service failure when no service failure is present, but at the same time a high probability of detecting failures when they are present. Let S = (S1, . . . , Sk) be the set of services that we monitor, and let Si = fail denote that service i has failed. A requirement on fault detection could then be ”Si = fail should be detected with at least probability γi”. If a failure is detected, an alarm is set off. The above can be stated more formally. Let A be the set of possible alarms. Let A = 1 denote that at least one alarm in A is set off, i.e. A = 1 is equivalent to at least one Si being detected as failed. A = 0 denotes that none of the alarms in A has been set off, i.e. none of the services Si are detected as failed. We can then write the detection requirement as

P(Detection) = P(A = 1 | Si = fail) ≥ γi.   (2.11)

For a vehicle to be well received on the market, the probability of false alarms must be low. In the same way as above, we say that the probability of false alarms must be less than λ. That gives us the following requirement:

P(False Alarm) = P(A = 1 | S = ok) ≤ λ,   (2.12)

where S = ok states that no service failure is present. The diagnostic test performance measure used in this thesis is the evaluated P(Detection) and P(False Alarm) for a given diagnostic test.


2.3.2 Isolation Requirements

Another part of diagnosis is isolation, i.e. determining the source of the failure. The process of diagnosis can be divided into three steps: in the first step we observe or measure the process that is subjected to diagnosis; in the second step we generate diagnostic tests and statements, which are used in the third step to isolate the fault (see Figure 2.2). In this thesis there will not be any focus on how isolation can be done.

Figure 2.2. Illustration of the 3 steps of diagnosis

2.4 Test Design

There are numerous ways to create a diagnostic test. Depending on the process and which parts are being diagnosed, some tests are more suitable than others. According to [5], some of the most common approaches are the following.


2.4.1 Hardware Redundancy

One way of making sure a measured value is correct is to have multiple sensors measuring the same state. At least two sensors are needed: if the residual between the two sensors becomes too large, one can suspect that one of the two sensors is malfunctioning. To be able to tell which sensor is faulty you need a third sensor; the sensor whose value deviates the most from the third sensor is the faulty one. The drawbacks of this approach are that, depending on the type of sensor, having multiple sensors can be an expensive solution, and that all three sensors in the example above could be faulty.

2.4.2 Signal Range Check

Another approach is to use expert knowledge of which values a measured signal should take when no service failure is present. The system then checks whether the measured signal is within a certain range; a service failure is detected as present if the measured signal passes an upper or a lower threshold. A drawback with this method is that the output domain of a process with a faulty sensor might be very similar to the output domain of a process with a properly functioning sensor.

2.4.3 Model Based Diagnosis

The idea with model based diagnosis is to compare measured values of the process with estimated values from a model of the process. A residual r(t) is formed as:

r(t) = ys(t) − ym(t) (2.13)

where ys(t) is the measured value and ym(t) is the estimated value. The residual r(t) is a function that is ideally zero in the fault free case. In reality, the residual will not be zero even in the fault free case, due to noise and model errors. The residual is often signal processed, yielding a so-called Test Quantity (see Section 2.5). Usually a thresholded test quantity is used to produce a diagnosis statement. Model based diagnosis is an effective and powerful way of doing diagnosis; a drawback is that it might be very difficult or even impossible to create an accurate enough model of the diagnosed process.

2.5 Test Quantity

Let x = (x1, . . . , xn) be a set of observations from some known or unknown distribution. Typically, x consists of residuals. According to [5], the purpose of a test quantity is to better separate residual probability density functions. A test quantity TQ(x) is a function from an observation x to a scalar value. A test quantity is often used with model based diagnosis (see Section 2.4.3), where it is used to determine whether a fault is present or not. The test quantity is thresholded, and if its value exceeds a predefined upper and/or lower threshold Ju, Jl, fault detection responds, i.e. the diagnostic test states that a service failure is present. The idea is to generate probability density functions (pdf:s) for the corresponding TQ(x): one pdf for the fault free case and one for when a fault large enough to cause a service failure is implemented. Ideally the pdf:s are well separated. If this is the case, we can easily find a suitable threshold J. J should be chosen so that the probability of detecting a service failure is large, while the probability of false alarm is kept low (see Figure 2.3).

Figure 2.3. Example of well separated pdf:s

There are many different algorithms for designing a test quantity. Which one is most suitable depends on the test quantity distributions, i.e. one distribution for the fault free case and one for when a large enough fault is present. One way of constructing a TQ could be:

TQ(x) = (1/(N + 1)) Σ_{t=0}^{N} r(t)   (2.14)

In expression (2.14) above we have x = (r(0), . . . , r(N)), and TQ(x) is the mean value of the residual over the N + 1 latest sample values. This approach could for example be used to reduce the impact of noise. Other TQ algorithms can, according to [3], be more complex and involve different transformations, noise reduction, subset rejection and normalization.
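As an illustration of (2.14), here is a minimal sketch (not the thesis code; the residual signal, fault size and threshold are assumed) of the averaging test quantity followed by a threshold check:

```python
# Averaging test quantity (2.14): the mean of the N+1 latest residuals,
# compared against a threshold J. All data here are hypothetical.
import numpy as np

def test_quantity(residuals: np.ndarray) -> float:
    """TQ(x) = mean of the residual window, as in (2.14)."""
    return float(residuals.mean())

rng = np.random.default_rng(1)
N = 49                                         # window uses N + 1 = 50 samples
bias_fault = 0.5                               # assumed additive fault in the residual
r = bias_fault + rng.normal(0.0, 1.0, N + 1)   # residuals r(0), ..., r(N)

J = 0.3                                        # assumed alarm threshold
tq = test_quantity(r)
print(f"TQ = {tq:.3f} -> {'alarm' if tq > J else 'no alarm'}")
```

Averaging over N + 1 samples reduces the noise variance of the TQ by a factor N + 1, which is exactly why this TQ separates the fault free and faulty pdf:s better than a single residual sample.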


2.6 Hypothesis Testing

Given a set of observations x = (x1, . . . , xn) from a distribution, we wish to test a null hypothesis. The distribution of x is a function of θ and η, where θ defines the fault size and η is a random variable describing, for example, measurement noise and model errors. We emphasize this by writing x as x(θ, η). Let the null hypothesis H0 be the case when no service failure is present. In this thesis only binary hypothesis testing is considered, i.e. either H0 is rejected or H0 is not rejected. Let a test quantity TQ(x(θ, η)) be defined, and let C be a rejection region, where C is a subset of the set of possible outcomes of TQ(x(θ, η)). We can then use the following significance test:

if TQ(x(θ, η)) ∈ C, reject H0
if TQ(x(θ, η)) ∉ C, do not reject H0

The significance level α is defined as

α = P(TQ(x(θ, η)) ∈ C | H0 is true).

The significance level is the same as the false alarm rate of the diagnostic test. As mentioned in Section 2.3, it is desired to keep the probability of false alarm small while keeping the probability of detecting service failures high. These requirements always contradict each other. One realizes this by looking at Figure 2.3: there, the region C consists of all values on the horizontal axis to the right of the threshold J. If the threshold J is moved further left, the probability of detecting service failures increases, but at the same time the probability of false alarms also increases. If J is instead moved further right, both probabilities decrease. Depending on how well separated the TQ pdf:s are, relative to the variance of the TQ outcome, adequate probabilities of false alarm and service failure detection can be achieved.

2.7 Power Function

In [5] the Power Function is defined as

β(θ) = P(reject H0 | Θ = θ) = P(TQ(x(θ, η)) > J | Θ = θ),   (2.15)

where x, θ and η are defined as above. The fault size θ is the outcome of a random variable Θ. With a defined test quantity TQ(x) and a threshold J, one can, through simulations or analytical expressions, calculate the power function β(θ). A power function β(θ), together with knowledge of which fault sizes cause service failures, gives the probability of false alarm and missed detection for a given fault size θ. The power function is therefore a method to evaluate the performance of a diagnostic test. Figure 2.4 shows a typical power function, where the dotted lines indicate θll (lower limit) and θul (upper limit), the fault sizes large enough to cause a service failure, i.e.

θ ≤ θll or θ ≥ θul ⇒ Service Failure.

The probability of rejecting the hypothesis H0 increases as the fault size θ increases.

Figure 2.4. Typical power function

For a given diagnostic test, we would ideally want its corresponding power function to have the following properties:

β(θ) = 0 ∀ θ ∈ (θll, θul)
β(θ) = 1 ∀ θ ∉ (θll, θul)   (2.16)

If the power function fulfills (2.16) we have

P(Detection) = 1
P(False Alarm) = 0.

2.8 Summary

In this chapter we started off by describing the fundamentals of diagnosis. We defined false alarm and missed detection, defined a diagnostic test performance measure, and gradually built up the theory needed to introduce the power function β(θ).


Chapter 3

Handling Dispersion

In this chapter we will extend existing performance evaluating methods. Methods will be designed to also take dispersion among individuals into account.

3.1 Multidimensional Power Function

In real world diagnostic tests we often have a multidimensional power function. The reason for this is that the probability density function of a test quantity often depends on variables other than the fault size θ. This could for example be due to dispersion among individuals. Let ϕ define the uncertainties other than the fault size θ, and let ϕ be the outcome of a random variable Φ. As an example, take a diagnostic test designed to detect whether a fluid has reached a certain degree of dilution. Let the diagnostic test have the following setup:

A sensor measures the degree of dilution; if the measured degree of dilution is larger than a threshold J, the diagnostic test alarms. The sensor measuring the degree of dilution can have a built-in bias fault, and the size of this bias fault can differ between sensor individuals. This means that two setups of the above described diagnostic test, i.e. two sensors measuring degree of dilution, might not respond equally.

In the above example, θ represents the degree of dilution and Φ represents the dispersion among sensor individuals. We extend the definition of the power function to also include Φ = ϕ:

β(θ, ϕ) = P(reject H0 | θ, ϕ) = P(TQ(x(θ, η, ϕ)) > J | θ, ϕ)   (3.1)

In many applications it is preferable to have a power function that depends on θ only. By marginalizing β(θ, ϕ) over all possible ϕ:s we obtain a power function dependent on θ only.


As an example, assume that Φ has a discrete distribution with possible outcomes ϕi, i = 1, . . . , n, and that P(Φ = ϕi) is known. Then we can calculate β(θ) by marginalizing over all possible ϕ:s, and express β(θ) as

β(θ) = Σ_{i=1}^{n} β(θ | ϕi) P(ϕi) = Σ_{i=1}^{n} P(TQ(x(θ, η, ϕ)) > J | θ, ϕi) P(ϕi).   (3.2)

If Φ instead has a continuous distribution, fΦ(ϕ), marginalizing produces the following expression for β(θ):

β(θ) = ∫_{−∞}^{∞} β(θ | ϕ) fΦ(ϕ) dϕ = ∫_{−∞}^{∞} P(TQ(x(θ, η, ϕ)) > J | θ, ϕ) fΦ(ϕ) dϕ.   (3.3)

3.2 Estimating the Power Function From One Data Sample Set

Let us consider a diagnostic test for a heavy duty truck, and let the diagnostic test be used to ensure that emission levels are not too high. Let the diagnostic test depend on sensor measurements. This is a very plausible scenario within the vehicle industry. The methods used today to test and calibrate such a diagnostic test typically do not take dispersion between different sensor individuals into account.

Different sensor individuals could for example be temperature sensors with different built-in bias faults. Often data from a single truck is collected, and this data is then used for diagnostic test calibration. As a result, when the diagnostic test is implemented in a different truck with a different set of sensors, the diagnostic test might not act as desired.

When evaluating a diagnostic test there is usually only a limited amount of data available. Data can be collected either through computer simulations or through physical test runs. When a computer model of the diagnosed system is not available, physical test runs are the only option for obtaining test data. Within the vehicle industry it can be both expensive and time consuming to collect data through physical testing. It is therefore desirable to keep the number of test runs low. We will in this section describe a method for creating a power function, valid for the whole population of individuals, using data from only one data sample set, i.e. data from only one individual.

3.2.1 The Test Quantity Assumption

A test quantity can be written as

TQ(η, Θ, Φ),   (3.4)

where η can be either a single or multidimensional random variable, depending on the nature of the test quantity. η can vary within each individual; an example of η is measurement noise. Let Θ, as in previous sections, be a random variable that describes the size of a fault. Let Φ be either a single or multidimensional random variable, depending on the nature of the test quantity, and let the outcome of Φ set a parameter value for a given individual. Φ could for example be a constant measurement bias fault, where different individuals have different outcomes of Φ. We say that the outcome of Φ specifies the individual. A residual between a calculated model based value and a sensor value is an example of a test quantity that fulfills equation (3.4); see Example 1.

Example 1

Let y be the true output signal from a system, let ym be a model based estimate of y, and let ys be a measured sensor value of y. We define ym and ys as

ym = y + Dm(θ)
ys = y + ε,   ε = ε0 + µ.   (3.5)

Here Dm(θ) describes how the fault θ influences the difference between the true value y and the model estimated value ym. ε0 and µ describe the difference between the sensor value ys and the true value y: µ is a random variable whose outcome sets a parameter value for an individual, while ε0 is a random variable that can vary within each individual. We create the residual r = ys − ym and then let r be the test quantity TQ(r). With the use of equation (3.5) we can write TQ(r) as

TQ(r) = r = ys − ym = y + ε0 + µ − (y + Dm(θ)) = ε0 + µ − Dm(θ) = TQ(ε0, θ, µ).   (3.6)

Due to the increasing method complexity when using multiplicative noise, we will in this thesis only consider additive noise. We consider the case when the test quantity TQ is a sum of two functions: one function K(η, Θ), where η and Θ vary within each individual, and one function V(Φ), where Φ varies between the different individuals. In this case TQ can then be written as

TQ(η, Θ, Φ) = K(η, Θ) + V(Φ).   (3.7)

We define a test quantity TQi, with Φ = ϕi fixed, as

TQi = TQ(η, Θ, ϕi).   (3.8)

The assumption made in (3.7) gives us that the pdf:s of two test quantities TQi and TQj, with corresponding Φ values ϕi and ϕj, are position shifted relative to each other. Mathematically this can be written as

TQi =d TQj − ∆ij,   (3.9)

where ∆ij = V(ϕj) − V(ϕi) and A =d B denotes that A and B have the same distribution. This means that if we have the TQi pdf for one known individual, we can calculate the TQj pdf:s for all individuals j = 1, . . . , n. The TQi pdf can for example be estimated from the one data sample set.

Example 1 (cont.)

In this example an individual is one setup of the diagnosed system. Each setup of the system has a sensor with a measurement error ε. The sensor has one of the n possible types of measurement error, ε1, . . . , εn. The distribution of the measurement errors has the following property:

εi =d ε0 + µi ∀ i,   (3.10)

where ε0 is a random variable with E[ε0] = 0, so that E[εi] = µi. µi is one of the possible outcomes µ1, . . . , µn of the random variable µ. This corresponds to the different sensors having the same measurement noise variance but differing in offset. Using (3.8), we let TQi be the test quantity of a system whose sensor has measurement error εi. TQi can then be written using the same structure as (3.7), since

TQi = TQ(εi, Θ) = TQ(ε0, Θ, µi) = εi − Dm(Θ) = ε0 + µi − Dm(Θ) = K(ε0, Θ) + V(µi),   (3.11)

where K(ε0, Θ) = ε0 − Dm(Θ) and V(µi) = µi. Comparing with expression (3.7), we see that ε0 corresponds to η and µi to Φ = ϕi. Since Dm(Θ), for a fixed θ, is constant, we know that the TQ pdf, fTQi(z), is the same as the pdf of εi with a position shift of −Dm(θ).

3.2.2 Known ϕ

We will here take advantage of the consequences of expression (3.9) and explain how β(θ) can be estimated from only one data sample set.

Let Φ have a discrete distribution with possible outcomes ϕ1, . . . , ϕn. It is then possible to obtain the power function β(θ) by marginalizing over the possible outcomes of Φ:

β(θ) = Σ_{j=1}^{n} β(θ | ϕj) P(ϕj) = Σ_{j=1}^{n} P(TQj > J | θ, ϕj) P(ϕj)   (3.12)


If we instead let the distribution of Φ be continuous, we get

β(θ) = ∫_{−∞}^{∞} β(θ | ϕ) fΦ(ϕ) dϕ = ∫_{−∞}^{∞} P(TQϕ > J | θ, ϕ) fΦ(ϕ) dϕ,   (3.13)

where fΦ(ϕ) is the pdf of Φ.

Figure 3.1. Illustration of β(θ)

Keeping in mind the statement made in equation (3.9), we know that the test quantity pdf fTQj is fTQi position shifted by ∆ij, where ∆ij = V(ϕj) − V(ϕi). We remind ourselves that, for a given θ, β(θ) is the integrated test quantity pdf area to the right of the threshold J (see Figure 3.1 for an illustration). When calculating β(θ), a position shift of +∆ij of the test quantity pdf therefore yields the same result as a −∆ij position shift of the threshold J. We can therefore state the following:

P(TQj > J | θ, ϕj) = P(TQi > J − ∆ij | θ, ϕi, ϕj)   (3.14)

Expression (3.14) states that if, for a given θ value, we have the pdf of one test quantity TQi whose corresponding Φ value ϕi is known, then we can calculate the TQ pdf for any outcome of Φ.

By using equations (3.12) and (3.14) we can now write the power function as follows.

If Φ has a discrete distribution:

βi(θ) = Σ_{j=1}^{n} P(TQi > J − ∆ij | θ, ϕi, ϕj) P(ϕj)   (3.15)

If Φ has a continuous distribution:

βi(θ) = ∫_{−∞}^{∞} P(TQi > J − ∆ | θ, ϕi, ϕ) fΦ(ϕ) dϕ   (3.16)

The subscript of β states that the distribution of TQi has been used to calculate the power function β(θ), i.e. the distribution of TQi has been used to calculate the test quantity distribution for all outcomes of Φ.

In conclusion we can now state the following: given data from only one data sample set, a pdf for Θ and a model of V(ϕ), we can estimate the power function β(θ). By one data sample set we mean data from, for example, one truck, i.e. one individual, i.e. a specific known Φ = ϕi. The data set includes data for that one individual from when all possible θ fault sizes have been implemented, and the data is used to estimate a TQ pdf for each θ fault. We can then, through the use of equation (3.15) together with the assumptions made above, estimate the power function β(θ). As mentioned above, it is often desirable to be able to use data from only one individual instead of data from a whole population.

Example 2 (V(ϕ) modeled as a linear function)

V(ϕ) could for example be modeled as a linear function,

V(ϕ) = kϕ.   (3.17)

We then express a position shift ∆ij as

∆ij = V(ϕj) − V(ϕi) = kϕj − kϕi = k(ϕj − ϕi).   (3.18)

If we let Φ have a discrete distribution with possible outcomes ϕ1, . . . , ϕn, the power function βi(θ) can be calculated as follows:

βi(θ) = Σ_{j=1}^{n} P(TQi > J − k(ϕj − ϕi) | θ, ϕi, ϕj) P(ϕj) = Σ_{j=1}^{n} ( ∫_{J−k(ϕj−ϕi)}^{∞} fTQi|θ,ϕi,ϕj(z) dz ) P(ϕj)   (3.19)


3.2.3 Unknown ϕ

We have up until now assumed that the parameter ϕi (defining the individual from which we have data) is known for the given data sample set. If ϕi is unknown for the given data sample set, it is no longer possible to use expression (3.15). Instead we can estimate β(θ) with β̂(θ), the expectation of β(θ) with regard to the outcome of Φ for our individual (ϕi if the distribution of Φ is discrete, ϕ0 if continuous). This requires that the possible outcomes of Φ are known, together with information about either P(Φ = ϕi) or fΦ(ϕ0), depending on whether the distribution of Φ is discrete or continuous. We get the following expressions for β̂(θ):

If Φ has a discrete distribution:

β̂(θ) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} P(TQi > J − ∆ij | θ, ϕi, ϕj) P(ϕj) ) P(ϕi) = Σ_{i=1}^{n} βi(θ) P(ϕi)   (3.20)

If Φ has a continuous distribution:

β̂(θ) = ∫_{ϕ0=−∞}^{∞} ( ∫_{ϕ=−∞}^{∞} P(TQϕ0 > J − ∆ϕ0ϕ | θ, ϕ0, ϕ) fΦ(ϕ) dϕ ) fΦ(ϕ0) dϕ0 = ∫_{ϕ0=−∞}^{∞} βϕ0(θ) fΦ(ϕ0) dϕ0   (3.21)

When the power function β(θ) is obtained the probabilities of false alarm and missed detection can be calculated (see Section 4.1).

3.3 Estimating the Power Function From a Sufficient Amount of Data

Deriving the power function analytically is in most cases very hard or even impossible, since the pdf of the test quantity is in most cases unknown. When this is the case, estimating the power function is a possible solution. If information about dispersion between different sensor individuals is not available, one approach is to collect a lot of data from many different individuals.

The idea is to use a large amount of data from many different individuals to estimate a power function that describes the whole population of individuals well. This can be done with the so-called Monte Carlo simulation method [5]. The method can be described as follows (a code sketch follows the list):

1. Assume a distribution of the noise in the observations xi, i = 1, . . . , N.


2. The random variable Θ is set to a fixed value, Θ = θf, for which β(θf) is to be calculated.

3. With this fixed fault size θf implemented, a large number of observation data sets xi, i = 1, . . . , N, are collected. N is typically of at least four-digit size. The data sets are collected from many different individuals.

4. For each data set xi a test quantity TQi(xi) is calculated.

5. All the TQi values are collected in a histogram, which is then used to estimate the probability density function fTQ(x|θf).

6. By using a fixed threshold J, β(θf) can now be estimated.

7. Go back to step 2 and fix a new θf. Continue doing so until the desired resolution is achieved.
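A minimal sketch of this procedure for a toy system follows (not from the thesis; the data "collection" is simulated, while in practice the TQ values in steps 3-4 would come from test runs on many different individuals):

```python
import numpy as np

rng = np.random.default_rng(4)
J = 2.0                                       # fixed threshold (step 6)
thetas = np.linspace(0.0, 4.0, 9)             # grid of fixed fault sizes (step 7)
N = 10_000                                    # number of data sets per theta

for theta_f in thetas:                        # step 2: fix theta_f
    # Steps 3-4: one TQ value per data set, from randomly drawn individuals.
    phi = rng.normal(0.0, 0.5, N)             # hypothetical individual dispersion
    noise = rng.normal(0.0, 1.0, N)           # step 1: assumed noise distribution
    tq = theta_f + phi + noise
    # Steps 5-6: the empirical distribution of TQ gives the estimate of beta.
    beta_hat = np.mean(tq > J)
    print(f"beta({theta_f:.1f}) ~= {beta_hat:.3f}")
```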

The downside of this method is that in step 3 you need data from the whole spectrum of individuals to get a good estimate of the power function. As stated in the beginning of this section, the absence of data from all possible individuals is one of the main problems, which makes this method inappropriate here.

3.4 Summary

In Section 3.2.1 we made assumptions about the test quantity distribution. We then described a method, dependent on these assumptions, for creating a power function, valid for the whole population of individuals, using data from only one data sample set, i.e. data from only one individual. This method takes the dispersion among individuals, Φ, into account.


Chapter 4

Performance Measure and Model Accuracy

As stated in Section 2.3.1, we wish, for a given diagnostic test, to be able to evaluate the probability of false alarm and the probability of missed detection. Calculating these probabilities gives us a performance measure that can be used to evaluate a diagnostic test. We will in this chapter explain how these probabilities can be obtained and thereby introduce our developed performance evaluating method. We will also discuss how to derive a requirement on model accuracy.

4.1 Evaluating a Diagnostic Test

We assume that all service failures are caused by one or several malfunctioning parts of the diagnosed system. As defined above, θ is the fault size of the malfunctioning part. A power function β(θ), together with knowledge of which fault sizes cause service failures, gives the probability of false alarm and missed detection for a given θ. A power function does not take into account the probability of a θ fault actually being present. By marginalizing the power function, the desired total probabilities of false alarm and missed detection can be calculated. To achieve this we need the power function β(θ) and the pdf fΘ(θ) (we assume that Θ has a continuous distribution). Let Ω be the entire outcome set of Θ. This gives us the following expression:

P(False Alarm) = P(TQ > J | θ ∈ Θ0) = ( ∫_{θ∈Θ0} P(TQ > J | θ) fΘ(θ) dθ ) / ( ∫_{θ∈Θ0} fΘ(θ) dθ ) = ( ∫_{θ∈Θ0} β(θ) fΘ(θ) dθ ) / ( ∫_{θ∈Θ0} fΘ(θ) dθ ),   (4.1)

where Θ0 defines the set of θ faults not large enough to cause service failures. The numerator in (4.1) is the marginalization of the power function; the denominator comes from the false alarm definition in Section 2.2 and equals P(θ ∈ Θ0). In the same way as above we can express P(Missed Detection) as

P(Missed Detection) = P(TQ < J | θ ∈ Ω\Θ0) = ( ∫_{θ∈Ω\Θ0} P(TQ < J | θ) fΘ(θ) dθ ) / ( ∫_{θ∈Ω\Θ0} fΘ(θ) dθ ) = ( ∫_{θ∈Ω\Θ0} (1 − β(θ)) fΘ(θ) dθ ) / ( ∫_{θ∈Ω\Θ0} fΘ(θ) dθ ),   (4.2)

where Ω\Θ0 defines the set of θ faults large enough to cause service failures. Using the definitions from Section 2.2 we get

P(Detection) = 1 − P(Missed Detection).   (4.3)

If one wishes to calculate P(Alarm), this can be done by extending the integration limits in expression (4.1) to the whole Θ domain. We then get

P(Alarm) = P(TQ > J) = ∫_{θ∈Ω} P(TQ > J | θ) fΘ(θ) dθ.   (4.4)

If Θ instead has a discrete distribution, the calculations above need to be adjusted. The following changes then need to be made:

∫_{θ∈Θ0} ⇒ Σ_{θ∈Θ0}   (4.5)
∫_{θ∈Ω\Θ0} ⇒ Σ_{θ∈Ω\Θ0}   (4.6)
∫_{θ∈Ω} ⇒ Σ_{θ∈Ω}   (4.7)
fΘ(θ)dθ ⇒ P(θ)   (4.8)
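The following minimal sketch (not from the thesis) evaluates (4.1) and (4.2) numerically for an assumed power function and fault-size pdf; the Gaussian β model, the fault prior and the service failure limit are all hypothetical:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

theta_sf = 1.0   # assumed limit: faults above theta_sf cause a service failure

def beta(th: float) -> float:
    """Assumed power function beta(theta) for TQ ~ N(theta, 1), J = 2."""
    return norm.sf(2.0, loc=th, scale=1.0)

def f_theta(th: float) -> float:
    """Assumed pdf of the fault size Theta."""
    return norm.pdf(th, loc=0.0, scale=1.0)

# Equation (4.1): marginalize beta over Theta_0 = {theta < theta_sf}.
num_fa, _ = quad(lambda th: beta(th) * f_theta(th), -np.inf, theta_sf)
den_fa, _ = quad(f_theta, -np.inf, theta_sf)
p_false_alarm = num_fa / den_fa

# Equation (4.2): marginalize 1 - beta over Omega \ Theta_0.
num_md, _ = quad(lambda th: (1.0 - beta(th)) * f_theta(th), theta_sf, np.inf)
den_md, _ = quad(f_theta, theta_sf, np.inf)
p_missed_detection = num_md / den_md

print(f"P(False Alarm)      = {p_false_alarm:.4f}")
print(f"P(Missed Detection) = {p_missed_detection:.4f}")
```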

4.2 Model Accuracy

Within the area of model based diagnosis it is of interest to know whether the existing model is accurate enough to give a test that fulfills the diagnostic requirements. A diagnostic requirement could for example be P(False Alarm) ≤ λ. We will in this section present a method for calculating the model accuracy needed to fulfill this requirement. The presented method is restricted to the special case where the pdf of the model noise (see Section 4.2.1) is known.

4.2.1 Model Variance

It is difficult, and sometimes impossible, to perfectly model complex dynamic systems. Often linearization and other simplifications are involved, which means the model will be more accurate within some regions of operation and less accurate in others. The difference between the true output of a system, y, and the output of an estimated model of the same system, ym, can therefore be seen as a random variable, or so-called model noise (see Figure 4.1). The variance of this random variable can be used as a measure of how accurate a model is.


Figure 4.1. Illustration of typical impact of model variance (model output ym and real output y plotted against time t)

4.2.2 Normal Distributed Noise

Let r = ys − ym, where ys is the measured output of the diagnosed system and ym is the output from an estimated model of the diagnosed system. Let us denote the deviation between ys and the true output y by εs, and similarly let εm be the deviation between ym and y. We can think of εm and εs as two independent random variables. Let us consider the following model based diagnostic test: if the residual r is larger than a set threshold J, a service failure is assumed to be present. We also have the following information:

ym − y = εm, where εm ∼ N(µm(θ), σm)   (4.9)
ys − y = εs, where εs ∼ N(µs, σs)   (4.10)
r = ys − ym = εs − εm, r ∼ N(µs − µm(θ), √(σm² + σs²))   (4.11)

Let θ, which is the outcome of a random variable Θ, define the fault size. The task of the above described model based diagnostic test is then to detect θ faults large enough to cause a service failure.

With the above configuration we see that the residual distribution, for different θ sizes, varies only in expectation value and not in variance. Let TQ(θ) be the random variable describing the outcome of the residual above when a θ fault is present. This gives us

TQ(θ) ∼ N(µ(θ), √(σm² + σs²))   (4.12)
µ(θ) = µs − µm(θ).   (4.13)

We then wish to find, for a given σs, the variance of the model, σm, so that the following requirement is fulfilled:

P(False Alarm) ≤ λ   (4.14)

Let us assume that Θ has a discrete distribution and that the possible outcomes of Θ are θi, i = 1, . . . , n. Let θ1, . . . , θm, where m < n, define faults not large enough to cause a service failure. We can then use the results from Section 4.1 and state the above requirement as

P(False Alarm) = ( Σ_{i=1}^{m} P(TQ(θi) > J) P(θi) ) / ( Σ_{i=1}^{m} P(θi) ) ≤ λ.   (4.15)

If we replace √(σm² + σs²) with σ and rewrite (4.15) using (4.12), we get

P(False Alarm) = ( Σ_{i=1}^{m} ( ∫_{J}^{∞} (1/(σ√(2π))) e^{−(z−µ(θi))²/(2σ²)} dz ) P(θi) ) / ( Σ_{i=1}^{m} P(θi) ) ≤ λ.   (4.16)

If Θ instead has a continuous distribution, fΘ(θ), and θ values between 0 and θSF define faults not large enough to cause a service failure, then expression (4.16) instead becomes

P(False Alarm) = ( ∫_{0}^{θSF} ( ∫_{J}^{∞} (1/(σ√(2π))) e^{−(z−µ(θ))²/(2σ²)} dz ) fΘ(θ) dθ ) / ( ∫_{0}^{θSF} fΘ(θ) dθ ) ≤ λ.   (4.17)

We see that if µ(θi) and P(θi) (for the discrete case), or µ(θ) and fΘ(θ) (for the continuous case), are known, then (4.16) and (4.17) become equations in one variable, σ. Solving these equations, together with information about σs, makes it possible to calculate the variance of the model as

σm = √(σ² − σs²).   (4.18)

Solving equations (4.16) and (4.17) analytically is in most cases impossible when m > 1. Instead we can solve them numerically.

If instead σm is given and we wish to find the sensor variance, σs, such that the requirement P(False Alarm) ≤ λ is fulfilled, we can use the same approach as above and calculate σs as

σs = √(σ² − σm²).   (4.19)
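As an illustration of the numerical route, here is a minimal sketch (not from the thesis) that solves (4.16) for σ by root finding and then obtains σm from (4.18); the fault grid, µ(θ), the prior P(θi) and all constants are hypothetical:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

J, lam, sigma_s = 2.0, 0.01, 0.3          # threshold, allowed P(FA), sensor std
thetas = np.array([0.0, 0.2, 0.4])        # fault sizes below the service failure limit
p_theta = np.array([0.6, 0.3, 0.1])       # assumed P(theta_i), restricted to Theta_0

def mu(th: np.ndarray) -> np.ndarray:
    """Assumed mean mu(theta) = mu_s - mu_m(theta) of the test quantity."""
    return -0.5 * th

def p_false_alarm(sigma: float) -> float:
    """Left-hand side of (4.16): Gaussian tail beyond J, averaged over Theta_0."""
    tail = norm.sf(J, loc=mu(thetas), scale=sigma)
    return float(np.sum(tail * p_theta) / np.sum(p_theta))

# P(False Alarm) grows with sigma here, so find where it equals lambda.
sigma_max = brentq(lambda s: p_false_alarm(s) - lam, 1e-6, 10.0)
sigma_m_max = np.sqrt(sigma_max**2 - sigma_s**2)
print(f"largest admissible sigma   = {sigma_max:.3f}")
print(f"largest admissible sigma_m = {sigma_m_max:.3f}")
```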

4.3 Summary

By marginalizing the power function with respect to the fault size θ, a diagnostic test performance evaluating method could be derived. We discussed model accuracy and, for a given requirement P(False Alarm) ≤ λ together with an assumption of normally distributed test quantities, we derived a method for choosing σm so that the requirement is fulfilled. In Chapter 6 the derived performance evaluating method will be applied to a diagnostic test designed for the Scania SCR system. We will compare results where the effects of Φ are taken into consideration with results where they are not.


Chapter 5

The SCR system

We will in this chapter give a short introduction to today's emission legislation and how NOx emissions can be reduced with the help of an SCR after-treatment system.

We will also describe the engine certification process.

5.1 Emission Legislations

Legislation on emission levels for heavy duty trucks is very strict, and it will become even stricter in the future. The purpose of these laws is, among other things, to reduce the environmental impact of diesel engine exhausts.

This puts a lot of pressure on the manufacturers: to comply with these laws they have to develop more efficient engines and after-treatment systems. In this master's thesis we will study an after-treatment system called SCR, and in particular the diagnosis of this system. The SCR system helps reduce so-called NOx emissions.

In Europe, manufacturers need to meet the different sets of rules and requirements of the EURO legislation [8]. The demands of the EURO 4 legislation include both restrictions on pollution (NOx emissions) and the demand for an On Board Diagnosis (OBD) system. The purpose of such a system is to make sure that the emission requirements are met, not just for recently built engines, but throughout an engine's lifetime. The legislation says that the OBD system needs to detect if emissions are too high. What makes this a difficult task is that the emission threshold levels refer to the mean emissions during a specific test cycle, a so-called European Transient Cycle (ETC). Due to the structure of this cycle (see Section 5.3.1), it is difficult, during normal everyday operation of a vehicle, to determine whether the emission thresholds are met.
