A Change Detection and Segmentation Toolbox for Matlab

Fredrik Gustafsson

Dept. of Electrical Engineering

Linköping University

S-58183 Linköping, Sweden

Abstract

This report describes the algorithms implemented in a Matlab toolbox for change detection and data segmentation. Functions are provided for simulating changes, choosing design parameters and detecting abrupt changes in signals.

1 Introduction

A signal or system is said to be abruptly changing when it can be described by a parametric model with a piecewise constant parameter vector \theta. Of primary interest are the vector of change times k_1, k_2, \dots, k_n and the sequence of parameter vectors. These are referred to as jumptimes and thseg, respectively.

A change detector compares the residuals from filters matched to different hypotheses about possible change times. Designing a change detector includes the following subproblems:

Choose a data model structure (nnn) used to compute residuals from input-output data (z).

Choose a search scheme to reduce the number of hypotheses.

Choose a distance measure (DM) for comparing different hypotheses, which is a function of the residuals.

Some distance measures must be supported by a stopping rule (SR) for deciding when the distance measure is large enough for accepting a change hypothesis.

Work done whilst visiting Department of Electrical and Computer Engineering, University of Newcastle, Australia


Basically, the search schemes are implemented as different functions, where the other choices are input parameters. Generally, there is no distinction between detection (estimation of one change time) and segmentation (simultaneous estimation of several change times). When a detector has found a change, it is simply restarted to look for a new one.

The toolbox contains the following functions:

Purpose                    Syntax
Simulation                 y = simchange(z,nnn,jumptimes,thseg)
Linear filtering           [thhat,epsi] = filt(z,nnn,ff)
Detection/Segmentation     [jumptimes,thseg,gt] = onemodel(z,nnn,DM,SR)
                           [jumptimes,thseg,gt] = twomodel(z,nnn,DM,SR,M)
                           [jumptimes,gt] = cpe(z,nnn,DM,h)
                           [jumptimes,thseg] = segm(z,nnn,DM)
                           [jumptimes,thseg,lr] = mlr(z,nnn,DM)
                           [jumptimes,thseg,lr] = glr(z,nnn)
Design                     L0 = arl(h,th,acc,Nstep,dsp)
                           L = MC(h,th,Niter)
                           h = cusumdesign(L0,th,hmax)
                           h = chi2(d,alpha)
Model conversion           nnn = ss2nnn(A,B,C,D,Q,R,P0)
                           [A,B,C,D,Q,R,P0] = nnn2ss(nnn)
Plot                       segplot(z,jumptimes)
Help                       helpdetect
Demonstration              demodetect

The following sections treat the subproblems listed above and explain the possible solutions and the Matlab syntax.

2 Data models

In any model-based approach to signal processing, the user has to specify a mathematical model for the data. Filters G_y and G_u (if there is an exogenous input), matched to the data model and applied to the signal, give normally distributed residuals \varepsilon_t under H_0, the hypothesis of no model change:

G_y(q^{-1}) y_t + G_u(q^{-1}) u_t = \varepsilon_t \in NID(0, S_t),

where NID denotes independent normal distribution. All change detection approaches aim to test whether the residuals are independent, zero mean and normally distributed. First, we need to characterize the possible models that can be used, which indirectly specify the data filters G_u and G_y.

2.1 Mathematical model definitions

Five linear models are supported by the toolbox. In all cases, the n jump times are denoted k_1, k_2, \dots, k_n and the piecewise constant parameters in segment i by \theta^{(i)}, i = 0, 1, 2, \dots, n.

1. A piecewise constant offset in white noise:

y_t = \theta^{(i)} + e_t.

2. A regression model with arbitrary regressor \varphi_t:

y_t = \varphi_t^T \theta^{(i)} + e_t.

3. An autoregression (AR) of order na as a special case of 2:

\varphi_t^T = (-y_{t-1}, -y_{t-2}, \dots, -y_{t-na})

y_t = \varphi_t^T \theta^{(i)} + e_t.

4. An autoregression with exogenous input (ARX) as one further special case of 2:

\varphi_t^T = (-y_{t-1}, -y_{t-2}, \dots, -y_{t-na}, u_{t-nk}, u_{t-nk-1}, \dots, u_{t-nk-nb+1})

y_t = \varphi_t^T \theta^{(i)} + e_t.

5. A linear state space model, where the change is momentarily injected in the state:

x_{t+1} = A x_t + B u_t + G w_t + \sum_{i=1}^{n} \delta(t - k_i) \theta^{(i)}

y_t = C x_t + D u_t + e_t,

where the noise covariances, each carrying the noise scaling \lambda^{(i)}, are denoted

\lambda^{(i)} Q = E w_t w_t^T, \quad \lambda^{(i)} R = E e_t^2, \quad \lambda^{(i)} P_0 = E x_0 x_0^T.

In the first four cases, the measurement noise variance is denoted \lambda^{(i)} = E e_t^2, and it may change between segments. The non-standard noise scaling \lambda^{(i)} in model 5 is a feature that can improve performance considerably on real signals.

The appropriate linear filter matched to k^n is the recursive least squares (RLS) scheme [11] for models 1-4 and the Kalman filter [3] for model 5. These filters deliver the residuals \varepsilon_t(k^n) and the estimated noise variance \hat\lambda^{(i)}. Nothing needs to be known about RLS and the Kalman filter except that they produce white and independent normally distributed residuals as long as they are matched to the correct model of the signal.
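As a concrete illustration in Python (not the toolbox's Matlab code; the function names simulate_changing_mean and rls_mean_residuals are our own), model 1 can be simulated and matched with an RLS filter, which for a constant-mean model reduces to a recursive sample mean:

```python
import random

def simulate_changing_mean(N, jumptimes, thseg, sigma=1.0, seed=0):
    """Model 1: y_t = theta^(i) + e_t, with jumps at the given times."""
    rng = random.Random(seed)
    y, seg = [], 0
    for t in range(N):
        if seg < len(jumptimes) and t >= jumptimes[seg]:
            seg += 1                      # enter the next segment
        y.append(thseg[seg] + rng.gauss(0.0, sigma))
    return y

def rls_mean_residuals(y, ff=1.0):
    """RLS with forgetting factor ff for model 1; with ff = 1 this is the
    plain recursive sample mean.  Returns the estimate and the residuals."""
    theta, w, eps = 0.0, 0.0, []
    for yt in y:
        eps.append(yt - theta)            # residual before updating theta
        w = ff * w + 1.0                  # effective number of samples
        theta += eps[-1] / w
    return theta, eps
```

After a change, the residuals acquire a bias that the whiteness tests and stopping rules of the later sections are designed to detect.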


2.2 The structure parameter nnn

The structure of the linear model is conveniently collected in the parameter nnn, and the data in a matrix z. The syntax is as follows:

1. nnn = [] and z = y for a piecewise constant mean model.

2. nnn = [] and z = [y Phi] for a piecewise constant linear regression model.

3. nnn = na and z = y for a piecewise constant AR(na) model.

4. nnn = [na nb nk] and z = [y u] for a piecewise constant ARX(na,nb,nk) model.

5. nnn = [A B Q P0 C D R zeros(1,2*length(C)-1)] and z = [y u] for a state space model.

For the last model, there are two macros for model conversion:

nnn = ss2nnn(A,B,C,D,Q,R,P0)

[A,B,C,D,Q,R,P0] = nnn2ss(nnn)

2.3 Simulation

The function simchange facilitates experiments and design of detectors. In Matlab notation, the row vector of jump times is denoted jumptimes, and the n+1 column vectors \theta^{(i)} are collected in the d x (n+1) matrix thseg. The output from a linear system subject to the abrupt changes specified in jumptimes and thseg is simulated by

y = simchange(z,nnn,jumptimes,thseg)

where z = [e], z = [e Phi], z = [e], z = [e u] or z = [e u w], respectively.

The result of a simulation or segmentation is conveniently plotted by

segplot(z,jumptimes)

2.4 Filtering

The filt function allows a quick examination of the signal. The residuals are computed by a recursive least squares (RLS) scheme with forgetting factor ff for the first four model structures and by a Kalman filter for the last one. The syntax is

[thhat,epsi] = filt(z,nnn,ff)

where z is either y or [y u] if there is an input. By plotting the residuals, abrupt changes can be visually detected.


3 Search schemes

A well-known problem in detection and segmentation is the computational complexity. In detection, there are t+1 hypotheses at time t, corresponding to

H_0: no change
H_1^k: a change at time k.

In segmentation, there can be a change at each time instant, giving 2^t hypotheses. The following ways to decrease the complexity have been proposed:

Compute the residuals under H_0 only and apply a whiteness test to these residuals, as done in for instance [9].

Compute the residuals under H_0 and H_1^L for only one jump time L, as proposed in [6] and [5]. If the residuals from H_1^L are smaller, a change is decided and the actual change time k \le L is estimated in a second step.

Only change times in a sliding window are considered, as done in [17].

In the segmentation case, a pre-determined number M of hypotheses is considered at each time iteration. Only the M most likely sequences k^n from time t are saved to time t+1, where the possibility of a new change gives 2M sequences, which, after considering the new measurement, are again decreased to M sequences, and so on. The implementation requires M parallel filters and is described in [4] and [?].

These methods are implemented in the following functions:

onemodel.

twomodel: the window size is one argument.

glr: the window size is one argument.

segm: the arguments include the number of considered hypotheses, a minimum segment length and how many data points a new segment must contain before it can be rejected.

4 Distance measures

A distance measure s_t is a function of the residual at time t, which is expected to be small under the no-jump hypothesis H_0 and large after a change. There are two kinds of distance measures. The first kind needs a threshold or a stopping rule, which is not needed for the second kind. The first class is not parsimonious; that is, there is no inherent mechanism preventing a more complicated model (one with a change) from being preferred to a simpler one (without a jump) even if there is no evidence for it in the data. This is the reason for thresholding or incorporating a stopping rule.

This is not the case for the second class, which satisfies the parsimony principle. The distance measure is here denoted V(k^n) and is a function of all residuals. This measure is, for each k^n, directly comparable to the others, which makes this class very suitable for segmentation; that is, the possibility of several jumps (n > 1) is considered.

4.1 Non-parsimonious measures

4.1.1 Whiteness test

If just one model is estimated, some different whiteness tests of its residuals have been proposed. The simplest one is to check the average of

s_t = \varepsilon_t / \sqrt{\lambda},

which should sum to approximately zero under H_0. The normalization with the standard deviation simplifies the choice of thresholds. The \chi^2-test is another possibility,

s_t = \varepsilon_t^2 / \lambda.

By summing up n terms, we get a \chi^2(n) distribution, and the actual value can be compared to a table.
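These two statistics are easy to compute; here is a Python sketch (the 1/sqrt(n) normalization of the average, making it approximately N(0,1) under H_0, is our assumption — the report only says the average is normalized by its standard deviation):

```python
def whiteness_stats(eps, lam):
    """Whiteness statistics for residuals eps with noise variance lam:
    the normalized average (approximately N(0,1) under H0) and the
    chi-square sum (approximately chi2(n) under H0)."""
    n = len(eps)
    avg = sum(e / lam ** 0.5 for e in eps) / n ** 0.5
    chi2 = sum(e * e / lam for e in eps)
    return avg, chi2
```

Under H_0 the chi-square statistic stays close to its mean n; a sustained bias in the residuals inflates both statistics.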

4.1.2 Comparative measures

Here two models, 0 and 1, corresponding to no jump and a jump at a certain time instant, are compared. The GLR test [17, 5] gives

s_t = \log \frac{\lambda^{(0)}}{\lambda^{(1)}} + \frac{\varepsilon_t^2(0)}{\lambda^{(0)}} - \frac{\varepsilon_t^2(1)}{\lambda^{(1)}}

and the divergence test [6] gives

s_t = \frac{\lambda^{(0)}}{\lambda^{(1)}} - 1 + \left( 1 + \frac{\lambda^{(0)}}{\lambda^{(1)}} \right) \frac{\varepsilon_t^2(0)}{\lambda^{(0)}} - 2 \frac{\varepsilon_t(0)\,\varepsilon_t(1)}{\lambda^{(1)}}.

4.2 Parsimonious measures

The likelihood for data, given all parameters k^n, \theta^n, \lambda^n and a Gaussian assumption on all stochastic elements, is easily computed as

-2 \log p(y^N | k^n, \theta^n, \lambda^n) = C + \sum_{i=1}^{n} \left[ \sum_{t=k_{i-1}+1}^{k_i} \frac{(y_t - \varphi_t^T \theta^{(i)})^2}{\lambda^{(i)}} + (k_i - k_{i-1}) \log \lambda^{(i)} \right].

The likelihood can be minimized with respect to the nuisance parameters \theta^n and \lambda^n, which gives the following distance measure:

V(k^n) = \min_{\theta^n, \lambda^n} -2 \log p(y^N | k^n, \theta^n, \lambda^n) = C + \sum_{i=1}^{n} \log \left( \sum_{t=k_{i-1}+1}^{k_i} (y_t - \varphi_t^T \hat\theta^{(i)})^2 \right).

This is not a parsimonious distance measure, since it is a decreasing function of n. The likelihood can, however, be complemented with a penalty term as in Akaike's AIC [1] and BIC [2] and Rissanen's MDL [13] criteria for model structure selection, as suggested in [10, 18]:

V(k^n) = -2 \log p(y^N | k^n, \theta^n, \lambda^n) = C + \sum_{i=1}^{n} \log \left( \sum_{t=k_{i-1}+1}^{k_i} (y_t - \varphi_t^T \hat\theta^{(i)})^2 \right) + f(N)\, n\, d.

Here n d is the number of parameters used in the segmentation, and f(N) = \log N for MDL and BIC and f(N) = 2 for AIC.
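To make the penalized criterion concrete, here is a brute-force Python sketch for the changing-mean model (d = 1), comparing the no-change hypothesis against every single change point. The function names are our own, the constant C is dropped, and the minimum-segment-length guard plays the same role as the corresponding argument of segm:

```python
import math

def seg_cost(y, ks, f_N):
    """Penalized criterion V(k^n) for change points ks (constant C dropped):
    sum over segments of log(RSS) plus the penalty f(N)*n*d with d = 1."""
    bounds = [0] + list(ks) + [len(y)]
    cost = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = y[a:b]
        m = sum(seg) / len(seg)           # per-segment mean estimate
        cost += math.log(sum((v - m) ** 2 for v in seg))
    return cost + f_N * len(ks)

def best_single_change(y, f_N, minlen=5):
    """Compare the no-change hypothesis with every single change point."""
    best_cost, best_k = seg_cost(y, [], f_N), None
    for k in range(minlen, len(y) - minlen):
        c = seg_cost(y, [k], f_N)
        if c < best_cost:
            best_cost, best_k = c, k
    return best_cost, best_k
```

With f(N) = log N this is the BIC/MDL variant; f(N) = 2 gives AIC.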

Another possibility is to marginalize the likelihood with respect to the nuisance parameters:

V(k^n) = -2 \log \int_{\theta^n} \int_{\lambda^n} p(y^N | k^n, \theta^n, \lambda^n)\, d\theta^n\, d\lambda^n.

This leads to a parsimonious alternative to the GLR test [?], which will be referred to as the marginalized likelihood ratio (MLR) test.

4.3 Representation

Different distance measures are supported by the different detection/segmentation functions. For onemodel, there are

1. Mean in residuals.

2. The \chi^2-test.

and for twomodel

1. The GLR test. In combination with the sliding window assumed in this function, this is commonly referred to as Brandt's GLR.

2. The divergence test.

The first one seems to be more popular and is the default. For the parsimonious measures, there are the following possibilities for mlr:

1. The marginalized likelihood with known constant noise scaling \lambda^{(i)} = \lambda.

2. The marginalized likelihood with unknown constant noise scaling \lambda^{(i)} = \lambda.


3. The marginalized likelihood with unknown changing noise scaling \lambda^{(i)},

where alternative 3 is the default. For segm, there are

1. The marginalized likelihood with known constant noise scaling \lambda^{(i)} = \lambda.

2. The marginalized likelihood with unknown changing noise scaling \lambda^{(i)}.

3. The likelihood with AIC penalty term.

4. The likelihood with BIC/MDL penalty term.

The second possibility is the most powerful for real signals and is the default one.

5 Stopping rules

The non-parsimonious distance measures s_t are often proposed to be used in combination with a stopping rule. Two possibilities are supported. The stopping time t_a in the cumulative sum (CUSUM) [12, 8] algorithm is defined as

g_t = \max(0, g_{t-1} + s_t - \nu)
t_a = \min_t (g_t > h).

Here h is the threshold and \nu a drift parameter related to the smallest possible change that can be detected. The stopping time of the geometric moving average (GMA) [14] test is

g_t = \lambda g_{t-1} + s_t
t_a = \min_t (g_t > h).   (1)

In both cases, there is a forgetting of past data, in \nu and \lambda respectively. Since we cannot forget more than we know, the CUSUM test statistic is reset to 0 if it becomes negative. A very rough estimate of the change time is computed in the toolbox. For the CUSUM test, the last zero time of g_t is taken,

\hat k = \max_k (g_k = 0),

and for the GMA test

\hat k = \max_k (g_k < 0.1\, g_{t_a}).

The CUSUM and GMA algorithms test for a positive bias in the test statistic. There are also two-sided versions that test for both positive and negative biases. All these cases can be summarized as follows:

g_t^+ = \max(g_0, \lambda g_{t-1}^+ + s_t - \nu)
g_t^- = \max(g_0, \lambda g_{t-1}^- - s_t - \nu)
t_a = \min_t (g_t^+ > h^+ \text{ or } g_t^- > h^-).

Here g_0 = -\infty, \nu = 0 in the GMA test and g_0 = 0, \lambda = 1 in the CUSUM test. For one-sided tests h^- = \infty, otherwise h^- = h^+.
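The one- and two-sided CUSUM recursions can be sketched in a few lines of Python (the function name and return convention are ours; the toolbox's stopping rules live inside its detection functions):

```python
def cusum_two_sided(s, h, nu, h_minus=None):
    """Two-sided CUSUM with threshold h and drift nu.  Returns the alarm
    time t_a and the rough change-time estimate (the last zero of the
    winning statistic), or (None, None) if no alarm occurs.  Pass
    h_minus=float('inf') for a one-sided test."""
    if h_minus is None:
        h_minus = h
    gp = gm = 0.0
    last_zero_p = last_zero_m = 0
    for t, st in enumerate(s, start=1):
        gp = max(0.0, gp + st - nu)       # test for positive bias
        gm = max(0.0, gm - st - nu)       # test for negative bias
        if gp == 0.0:
            last_zero_p = t
        if gm == 0.0:
            last_zero_m = t
        if gp > h:
            return t, last_zero_p
        if gm > h_minus:
            return t, last_zero_m
    return None, None
```

A step change in the residual mean triggers an alarm shortly after the change, with the last-zero estimate pointing roughly at the change itself.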


6 Change point estimation

The detection problem is recognized as change point estimation in the statistical literature. The assumption is that the mean of a white stochastic process changes at time k under H_1(k):

y_t = \theta_0 + e_t,  t \le k
y_t = \theta_1 + e_t,  t > k.

We summarize the survey paper [15] on different procedures to test H_0 against H_1. The following sub-problems are considered:

P1. \theta_1 > \theta_0, where \theta_0 is unknown.

P2. \theta_1 > \theta_0, where \theta_0 = 0 is known.

P3. \theta_1 \ne \theta_0, where \theta_0 is unknown.

P4. \theta_1 \ne \theta_0, where \theta_0 = 0 is known.

The changing mean model is data model 1 in Section 2, so the change point estimation in P3 and P4 are special cases of the other methods. However, the scalar case enables one-sided tests, and the non-parametric approaches below are interesting.

6.1 The Bayesian approach

A Bayesian approach, where the prior probabilities of all hypotheses are the same, gives:

P1: U_B^1 = \sum_{t=2}^{N} t (y_t - \bar y)

P2: U_B^2 = \sum_{t=2}^{N} t y_t

P3: U_B^3 = \frac{1}{N^2} \sum_{k=1}^{N-1} \sum_{t=k+1}^{N} (y_t - \bar y)^2

P4: U_B^4 = \frac{1}{N^2} \sum_{k=1}^{N-1} \sum_{t=k+1}^{N} y_t^2,

where \bar y is the sample mean of y_t. If U > h, where h is a prespecified threshold, one possible estimate of the jump time (change point) is given by

\hat k_B^3 = \arg\max_{1 \le k < N} \frac{1}{N-k} \sum_{t=k+1}^{N} (y_t - \bar y)^2

for P3, and similarly for P4.


6.2 The maximum likelihood approach

Using the ML method, the test statistics are as follows:

P1: U_{ML}^1 = \max_k \frac{\bar y_{k+1:N} - \bar y_{1:k}}{\sqrt{k^{-1} + (N-k)^{-1}}}

P2: U_{ML}^2 = \max_k \sqrt{N-k}\; \bar y_{k+1:N}

P3: U_{ML}^3 = \max_k \frac{(\bar y_{k+1:N} - \bar y_{1:k})^2}{k^{-1} + (N-k)^{-1}}

P4: U_{ML}^4 = \max_k (N-k)\, \bar y_{k+1:N}^2,

where \bar y_{m:n} = \frac{1}{n-m+1} \sum_{t=m}^{n} y_t. If H_1 is decided, the jump time estimate is given by replacing \max_k by \arg\max_k.
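For P3 the statistic is straightforward to implement; a Python sketch (the function name is our own), returning U and the maximizing k:

```python
def ml_change_point(y):
    """ML test statistic U^3 and change point estimate for problem P3
    (theta_0 unknown, two-sided alternative)."""
    N = len(y)
    total = sum(y)
    best_u, best_k, head = float("-inf"), None, 0.0
    for k in range(1, N):                 # candidate change points
        head += y[k - 1]
        m1 = head / k                     # mean of y_1..y_k
        m2 = (total - head) / (N - k)     # mean of y_{k+1}..y_N
        u = (m2 - m1) ** 2 / (1.0 / k + 1.0 / (N - k))
        if u > best_u:
            best_u, best_k = u, k
    return best_u, best_k
```

If U exceeds the threshold h, best_k is the jump time estimate.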

6.3 A non-parametric approach

The Bayesian and maximum likelihood approaches presumed a Gaussian distribution for the noise. Non-parametric tests for the first two problems, assuming only whiteness, are the following:

U_{NP}^i = \max_{1 \le k < N} \frac{s_k^i - E s_k^i}{\sqrt{\operatorname{Var} s_k^i}}, \quad i = 1, 2,

where

s_k^1 = \sum_{t=k+1}^{N} I(y_t - \operatorname{med} y > 0)

s_k^2 = \sum_{t=k+1}^{N} \sum_{m=1}^{N} I(y_t \ge y_m).

Here med denotes the median and I the indicator function. These are a kind of whiteness test, based on the idea that under H_0, y_t is larger than its mean with probability 50%. Determining the expectation and variance of s_k^i is a standard probability theory problem; for instance, the distribution of s_k^1 is hypergeometric under H_0. Again, if H_1 is decided, the jump time estimate is given by the maximizing argument.
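A Python sketch of the first statistic s_k^1 with its hypergeometric moments under H_0 (the one-sided normalization and the function name are our own):

```python
import statistics

def sign_change_stat(y):
    """Scan statistic based on s_k^1: counts of y_t above the sample median
    in the tail y_{k+1..N}, centered and scaled by the hypergeometric mean
    and variance under H0.  Returns the largest value and its k."""
    N = len(y)
    med = statistics.median(y)
    above = [1 if v > med else 0 for v in y]
    K = sum(above)                        # total count above the median
    tail = K
    best_u, best_k = float("-inf"), None
    for k in range(1, N - 1):
        tail -= above[k - 1]              # s_k^1 over t = k+1, ..., N
        n = N - k                         # tail length (draws w/o replacement)
        mean = n * K / N
        var = n * (K / N) * (1 - K / N) * (N - n) / (N - 1)
        if var > 0:
            u = (tail - mean) / var ** 0.5
            if u > best_u:
                best_u, best_k = u, k
    return best_u, best_k
```

A large value indicates that observations above the median are concentrated after time k, i.e. an upward change in the mean.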

6.4 Implementation

Change point estimation is implemented in

[jumptime,thseg,gt,U] = cpe(y,DM,h)

The distance measure DM might be

1. Bayesian

2. Maximum likelihood

3. Non-parametric I

4. Non-parametric II

and, if DM is a vector, the second element chooses the problem formulation.

7 Design

The design of a CUSUM detector (1) requires tuning of the parameters h and \nu. Normally, some typical faults are examined and \nu is chosen as one half of their minimum influence on the residuals. A less cumbersome and more pragmatic approach is to fix \nu to, say, one standard deviation of the noise. The threshold h is then chosen from a specified mean time between false alarms.

There exists one function to evaluate the analytical properties of the CUSUM detector, namely the average run length (ARL) function. Suppose that the input to the CUSUM algorithm is white noise with mean \theta and variance \sigma^2. The ARL function is defined as

L_0 = E(t_a \,|\, \text{no change}; h, \nu, \theta, \sigma).

That is, L_0 is the average stopping time of the algorithm before a false alarm. It is in fact a function of only two parameters:

L_0 = f\left( \frac{h}{\sigma}, \frac{\nu - \theta}{\sigma} \right) = f(\bar h, \bar\nu).

This function replaces time-consuming Monte Carlo simulations for tuning h and \nu. The ARL function is evaluated numerically (see Eq. 5.2.28 in [7], or [8]) in

L0 = arl(h,th)

Since this is sometimes an ill-conditioned problem, it may take some time to compute.

An accurate explicit approximation is derived in [16]:

L_0 \approx \frac{e^{-2\bar\mu(\bar h + 1.166)} - 1 + 2\bar\mu(\bar h + 1.166)}{2\bar\mu^2}, \qquad \bar\mu = \frac{\theta - \nu}{\sigma}.   (2)

Under H_0 the normalized drift \bar\mu of the test statistic is negative, so L_0 grows rapidly with h and \nu. This efficient approximation is one option in arl.
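A Python sketch of approximation (2) (the function name and the normalized-drift parameterization are our own; mubar = (theta - nu)/sigma is negative under H_0):

```python
import math

def arl_approx(hbar, mubar):
    """Explicit ARL approximation: hbar = h/sigma is the normalized
    threshold, mubar the normalized drift of the CUSUM increments
    (negative under the no-change hypothesis)."""
    b = hbar + 1.166                      # threshold with the 1.166 correction
    x = -2.0 * mubar * b
    return (math.exp(x) - 1.0 - x) / (2.0 * mubar ** 2)
```

For small |mubar| the approximation tends to (hbar + 1.166)^2, and it grows rapidly with both the threshold and the drift, as a mean time between false alarms should.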

The ARL function is very sensitive to its design parameters, and the mean time L_0 says nothing about the distribution of the alarm times. It is advisable to evaluate the final design using Monte Carlo simulations. This can be done by

L = MC(h,th,Niter)


The vector L contains the time instants of the first false alarm in the Niter iterations.

In practice, the inverse of the ARL function would be more useful; that is, to compute the threshold h from a specified mean time between false alarms L_0. No explicit solution seems to be known. A very simple search strategy, using the ARL function, is implemented in

h = cusumdesign(L0,th,h0)

It starts by computing L0 = arl(h0,th) and then uses a bisection technique to find the solution. The approximation (2) of the ARL function is used to speed up the calculation.
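The bisection idea can be sketched in Python on top of the explicit approximation (the names and the normalized parameterization mubar = (theta - nu)/sigma are our own; the approximate ARL is monotonically increasing in the threshold, so bisection converges):

```python
import math

def threshold_for_arl(L0, mubar, h_lo=0.01, h_hi=100.0, tol=1e-6):
    """Invert the explicit ARL approximation by bisection: find the
    normalized threshold hbar whose mean time between false alarms is L0."""
    def arl(hbar):
        x = -2.0 * mubar * (hbar + 1.166)
        return (math.exp(x) - 1.0 - x) / (2.0 * mubar ** 2)
    while h_hi - h_lo > tol:
        mid = 0.5 * (h_lo + h_hi)
        if arl(mid) < L0:
            h_lo = mid                    # ARL too short: raise the threshold
        else:
            h_hi = mid
    return 0.5 * (h_lo + h_hi)
```

The toolbox's cusumdesign works from arl itself; this sketch only inverts the approximation.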

Finally,

h = chi2(d,alpha)

computes the threshold for a \chi^2 distribution with d degrees of freedom, which gives a confidence level (probability of false alarms) of alpha.

References

[1] H. Akaike. Fitting autoregressive models for prediction. Ann. Inst. Statist. Math., 21:243-247, 1969.

[2] H. Akaike. On entropy maximization principle. In Symposium on Applications of Statistics, 1977.

[3] B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice Hall, Englewood Cliffs, NJ, 1979.

[4] P. Andersson. Adaptive forgetting in recursive identification through multiple models. International Journal of Control, 42(5):1175-1193, 1985.

[5] U. Appel and A.V. Brandt. Adaptive sequential segmentation of piecewise stationary time series. Information Sciences, 29(1):27-56, 1985.

[6] M. Basseville and A. Benveniste. Design and comparative study of some sequential jump detection algorithms for digital signals. IEEE Transactions on Acoustics, Speech and Signal Processing, 31:521-535, 1983.

[7] M. Basseville and I.V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Information and System Science Series. Prentice Hall, Englewood Cliffs, NJ, 1993.

[8] C.S. Van Dobben de Bruyn. Cumulative Sum Tests: Theory and Practice. Hafner, New York, 1968.

[9] R.H. Jones, D.H. Crowell, and L.E. Kapuniai. Change detection model for serially correlated multivariate data. Biometrica, 26:269-280, 1970.

[10] G. Kitagawa and H. Akaike. A procedure for the modeling of nonstationary time series. Ann. Inst. Statist. Math., 30:351-360, 1978.

[11] L. Ljung and T. Söderström. Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA, 1983.

[12] E.S. Page. Continuous inspection schemes. Biometrika, 41:100-115, 1954.

[13] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore, 1989.

[14] S.W. Roberts. Control charts based on geometric moving averages. Technometrics, 8:411-430, 1959.

[15] A. Sen and M.S. Srivastava. On tests for detecting change in the mean. Annals of Statistics, 3:98-108, 1975.

[16] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals. Springer Series in Statistics. Springer, 1985.

[17] A.S. Willsky and H.L. Jones. A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Transactions on Automatic Control, pages 108-112, 1976.

[18] Y. Yao. Estimating the number of change points via Schwarz' criterion. Statistics and Probability Letters, pages 181-189, 1988.
