A Change Detection and Segmentation Toolbox for Matlab

Fredrik Gustafsson

Dept. of Electrical Engineering

Linköping University

S-58183 Linköping, Sweden

Abstract

This report describes the algorithms implemented in a Matlab toolbox for change detection and data segmentation. Functions are provided for simulating changes, choosing design parameters and detecting abrupt changes in signals.

1 Introduction

A signal or system is said to be abruptly changing when it can be described by a parametric model with a piecewise constant parameter vector \theta. Of primary interest are the vector of change times k_1, k_2, \dots, k_n and the sequence of parameter vectors. These are referred to as jumptimes and thseg, respectively.

A change detector compares the residuals from filters matched to different hypotheses about possible change times. Designing a change detector includes the following subproblems:

Choose a data model structure (nnn) used to compute residuals from input-output data (z).

Choose a search scheme to reduce the number of hypotheses.

Choose a distance measure (DM) for comparing different hypotheses, which is a function of the residuals.

Some distance measures must be supported by a stopping rule (SR) for deciding when the distance measure is large enough for accepting a change hypothesis.

Work done whilst visiting Department of Electrical and Computer Engineering, University of Newcastle, Australia


Basically, the search schemes are implemented as different functions, where the other choices are input parameters. Generally, there is no distinction between detection (estimation of one change time) and segmentation (simultaneous estimation of several change times). When a detector has found a change, it is simply restarted to look for a new one.

The toolbox contains the following functions:

Purpose                    Syntax
Simulation                 y = simchange(z,nnn,jumptimes,thseg)
Linear filtering           [thhat,epsi] = filt(z,nnn,ff)
Detection/Segmentation     [jumptimes,thseg,gt] = onemodel(z,nnn,DM,SR)
                           [jumptimes,thseg,gt] = twomodel(z,nnn,DM,SR,M)
                           [jumptimes,gt] = cpe(z,nnn,DM,h)
                           [jumptimes,thseg] = segm(z,nnn,DM)
                           [jumptimes,thseg,lr] = mlr(z,nnn,DM)
                           [jumptimes,thseg,lr] = glr(z,nnn)
Design                     L0 = arl(h,th,acc,Nstep,dsp)
                           L = MC(h,th,Niter)
                           h = cusumdesign(L0,th,hmax)
                           h = chi2(d,alpha)
Model conversion           nnn = ss2nnn(A,B,C,D,Q,R,P0)
                           [A,B,C,D,Q,R,P0] = nnn2ss(nnn)
Plot                       segplot(z,jumptimes)
Help                       helpdetect
Demonstration              demodetect

The following sections treat the subproblems listed above and explain the possible solutions and the Matlab syntax.

2 Data models

In any model-based approach to signal processing, the user has to specify a mathematical model for the data. Filters G_y and G_u (if there is an exogenous input), matched to the data model and applied to the signal, give normally distributed residuals \varepsilon_t under H_0, the hypothesis of no model change:

G_y(q^{-1}) y_t + G_u(q^{-1}) u_t = \varepsilon_t \in NID(0, S_t),

where NID denotes independent normal distribution. All change detection approaches aim to test whether the residuals are independent, zero mean and normally distributed. First, we need to characterize the possible models that can be used, which indirectly specify the data filters G_u and G_y.

2.1 Mathematical model definitions

Five linear models are supported by the toolbox. In all cases, the n jump times are denoted k_1, k_2, \dots, k_n and the piecewise constant parameters in segment i by \theta^{(i)}, i = 0, 1, 2, \dots, n.

1. A piecewise constant offset in white noise:

y_t = \theta^{(i)} + e_t.

2. A regression model with arbitrary regressor \varphi_t:

y_t = \varphi_t^T \theta^{(i)} + e_t.

3. An autoregression (AR) of order na as a special case of 2:

\varphi_t^T = (-y_{t-1}, -y_{t-2}, \dots, -y_{t-na})

y_t = \varphi_t^T \theta^{(i)} + e_t.

4. An autoregression with exogenous input (ARX) as one further special case of 2:

\varphi_t^T = (-y_{t-1}, -y_{t-2}, \dots, -y_{t-na}, u_{t-nk}, u_{t-nk-1}, \dots, u_{t-nk-nb+1})

y_t = \varphi_t^T \theta^{(i)} + e_t.

5. A linear state space model, where the change is momentarily injected in the state:

x_{t+1} = A x_t + B u_t + G w_t + \sum_{i=1}^{n} \delta(t - k_i) \theta^{(i)}

y_t = C x_t + D u_t + e_t,

where the noise covariances, each carrying the noise scaling \lambda^{(i)}, are denoted

\lambda^{(i)} Q = E w_t w_t^T, \quad \lambda^{(i)} R = E e_t^2, \quad \lambda^{(i)} P_0 = E x_0 x_0^T.

In the first four cases, the measurement noise variance is denoted \lambda^{(i)} = E e_t^2, and it may change between segments. The non-standard noise scaling \lambda^{(i)} in model 5 is a feature that can improve performance considerably on real signals.

The appropriate linear filter matched to k^n is the recursive least squares (RLS) scheme [11] for models 1-4 and the Kalman filter [3] for model 5. These filters deliver the residuals \varepsilon_t(k^n) and the estimated noise variance \hat\lambda^{(i)}. Nothing needs to be known about RLS and the Kalman filter except that they produce white and independent normally distributed residuals as long as they are matched to the correct model of the signal.
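As a concrete illustration in Python (not the toolbox's Matlab code; the function names simulate_changing_mean and rls_mean_residuals are our own), model 1 can be simulated and matched with an RLS filter, which for a constant-mean model reduces to a recursive sample mean:

```python
import random

def simulate_changing_mean(N, jumptimes, thseg, sigma=1.0, seed=0):
    """Model 1: y_t = theta^(i) + e_t, with jumps at the given times."""
    rng = random.Random(seed)
    y, seg = [], 0
    for t in range(N):
        if seg < len(jumptimes) and t >= jumptimes[seg]:
            seg += 1                      # enter the next segment
        y.append(thseg[seg] + rng.gauss(0.0, sigma))
    return y

def rls_mean_residuals(y, ff=1.0):
    """RLS with forgetting factor ff for model 1; with ff = 1 this is the
    plain recursive sample mean.  Returns the estimate and the residuals."""
    theta, w, eps = 0.0, 0.0, []
    for yt in y:
        eps.append(yt - theta)            # residual before updating theta
        w = ff * w + 1.0                  # effective number of samples
        theta += eps[-1] / w
    return theta, eps
```

After a change, the residuals acquire a bias that the whiteness tests and stopping rules of the later sections are designed to detect.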


2.2 The structure parameter nnn

The structure of the linear model is conveniently collected in the parameter nnn, and the data in a matrix z. The syntax is as follows:

1. nnn = [] and z = y for a piecewise constant mean model.

2. nnn = [] and z = [y Phi] for a piecewise constant linear regression model.

3. nnn = na and z = y for a piecewise constant AR(na) model.

4. nnn = [na nb nk] and z = [y u] for a piecewise constant ARX(na,nb,nk) model.

5. nnn = [A B Q P0 C D R zeros(1,2*length(C)-1)] and z = [y u] for a state space model.

For the last model, there are two macros for model conversion:

nnn = ss2nnn(A,B,C,D,Q,R,P0)

[A,B,C,D,Q,R,P0] = nnn2ss(nnn)

2.3 Simulation

The function simchange facilitates experiments and design of detectors. In Matlab notation, the row vector of jump times is denoted jumptimes, and the n+1 column vectors \theta^{(i)} are collected in the d x (n+1) matrix thseg. The output from a linear system subject to the abrupt changes specified in jumptimes and thseg is simulated by

y = simchange(z,nnn,jumptimes,thseg)

where z = [e], z = [e Phi], z = [e], z = [e u] or z = [e u w], respectively.

The result of a simulation or segmentation is conveniently plotted by

segplot(z,jumptimes)

2.4 Filtering

The filt function allows a quick examination of the signal. The residuals are computed by a recursive least squares (RLS) scheme with forgetting factor ff for the first four model structures and by a Kalman filter for the last one. The syntax is

[thhat,epsi] = filt(z,nnn,ff)

where z is either y or [y u] if there is an input. By plotting the residuals, abrupt changes can be visually detected.


3 Search schemes

A well-known problem in detection and segmentation is the computational complexity. In detection, there are t+1 hypotheses at time t, corresponding to

H_0: no change
H_1^k: a change at time k.

In segmentation, there can be a change at each time instant, giving 2^t hypotheses. The following ways to decrease the complexity have been proposed:

Compute the residuals under H_0 only and apply a whiteness test to these residuals, as done in for instance [9].

Compute the residuals under H_0 and H_1^L for only one jump time L, as proposed in [6] and [5]. If the residuals from H_1^L are smaller, a change is decided and the actual change time k \le L is estimated in a second step.

Only change times in a sliding window are considered, as done in [17].

In the segmentation case, a pre-determined number M of hypotheses is considered at each time iteration. Only the M most likely sequences k^n from time t are saved to time t+1, where the possibility of a new change gives 2M sequences, which, after considering the new measurement, are again decreased to M sequences, and so on. The implementation requires M parallel filters and is described in [4] and [?].

These methods are implemented in the following functions:

onemodel.

twomodel: the window size is one argument.

glr: the window size is one argument.

segm: the arguments include the number of considered hypotheses, a minimum segment length and how many data points a new segment must contain before it can be rejected.

4 Distance measures

A distance measure s_t is a function of the residual at time t, which is expected to be small under the no-jump hypothesis H_0 and large after a change. There are two kinds of distance measures. The first kind needs a threshold or a stopping rule, which is not needed for the second kind. The first class is not parsimonious; that is, there is no inherent mechanism preventing a more complicated model (one with a change) from being preferred to a simpler one (without a jump) even if there is no evidence for it in the data. This is the reason for thresholding or incorporating a stopping rule.

This is not the case for the second class, which satisfies the parsimony principle. The distance measure is here denoted V(k^n) and is a function of all residuals. This measure is, for each k^n, directly comparable to the others, which makes this class very suitable for segmentation; that is, the possibility of several jumps (n > 1) is considered.

4.1 Non-parsimonious measures

4.1.1 Whiteness test

If just one model is estimated, some different whiteness tests of its residuals have been proposed. The simplest one is to check the average of

s_t = \varepsilon_t / \sqrt{\lambda},

which should sum to approximately zero under H_0. The normalization with the standard deviation simplifies the choice of thresholds. The \chi^2-test is another possibility,

s_t = \varepsilon_t^2 / \lambda.

By summing up n terms, we get a \chi^2(n) distribution, and the actual value can be compared to a table.
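These two statistics are easy to compute; here is a Python sketch (the 1/sqrt(n) normalization of the average, making it approximately N(0,1) under H_0, is our assumption — the report only says the average is normalized by its standard deviation):

```python
def whiteness_stats(eps, lam):
    """Whiteness statistics for residuals eps with noise variance lam:
    the normalized average (approximately N(0,1) under H0) and the
    chi-square sum (approximately chi2(n) under H0)."""
    n = len(eps)
    avg = sum(e / lam ** 0.5 for e in eps) / n ** 0.5
    chi2 = sum(e * e / lam for e in eps)
    return avg, chi2
```

Under H_0 the chi-square statistic stays close to its mean n; a sustained bias in the residuals inflates both statistics.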

4.1.2 Comparative measures

Here two models, 0 and 1, corresponding to no jump and a jump at a certain time instant, are compared. The GLR test [17, 5] gives

s_t = \log \frac{\lambda^{(0)}}{\lambda^{(1)}} + \frac{\varepsilon_t^2(0)}{\lambda^{(0)}} - \frac{\varepsilon_t^2(1)}{\lambda^{(1)}}

and the divergence test [6] gives

s_t = \frac{\lambda^{(0)}}{\lambda^{(1)}} - 1 + \left( 1 + \frac{\lambda^{(0)}}{\lambda^{(1)}} \right) \frac{\varepsilon_t^2(0)}{\lambda^{(0)}} - 2 \frac{\varepsilon_t(0)\,\varepsilon_t(1)}{\lambda^{(1)}}.

4.2 Parsimonious measures

The likelihood for data, given all parameters k^n, \theta^n, \lambda^n and a Gaussian assumption on all stochastic elements, is easily computed as

-2 \log p(y^N | k^n, \theta^n, \lambda^n) = C + \sum_{i=1}^{n} \left[ \sum_{t=k_{i-1}+1}^{k_i} \frac{(y_t - \varphi_t^T \theta^{(i)})^2}{\lambda^{(i)}} + (k_i - k_{i-1}) \log \lambda^{(i)} \right].

The likelihood can be minimized with respect to the nuisance parameters \theta^n and \lambda^n, which gives the following distance measure:

V(k^n) = \min_{\theta^n, \lambda^n} -2 \log p(y^N | k^n, \theta^n, \lambda^n) = C + \sum_{i=1}^{n} \log \left( \sum_{t=k_{i-1}+1}^{k_i} (y_t - \varphi_t^T \hat\theta^{(i)})^2 \right).

This is not a parsimonious distance measure, since it is a decreasing function of n. The likelihood can, however, be complemented with a penalty term as in Akaike's AIC [1] and BIC [2] and Rissanen's MDL [13] criteria for model structure selection, as suggested in [10, 18]:

V(k^n) = -2 \log p(y^N | k^n, \theta^n, \lambda^n) = C + \sum_{i=1}^{n} \log \left( \sum_{t=k_{i-1}+1}^{k_i} (y_t - \varphi_t^T \hat\theta^{(i)})^2 \right) + f(N)\, n\, d.

Here n d is the number of parameters used in the segmentation, and f(N) = \log N for MDL and BIC and f(N) = 2 for AIC.
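To make the penalized criterion concrete, here is a brute-force Python sketch for the changing-mean model (d = 1), comparing the no-change hypothesis against every single change point. The function names are our own, the constant C is dropped, and the minimum-segment-length guard plays the same role as the corresponding argument of segm:

```python
import math

def seg_cost(y, ks, f_N):
    """Penalized criterion V(k^n) for change points ks (constant C dropped):
    sum over segments of log(RSS) plus the penalty f(N)*n*d with d = 1."""
    bounds = [0] + list(ks) + [len(y)]
    cost = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = y[a:b]
        m = sum(seg) / len(seg)           # per-segment mean estimate
        cost += math.log(sum((v - m) ** 2 for v in seg))
    return cost + f_N * len(ks)

def best_single_change(y, f_N, minlen=5):
    """Compare the no-change hypothesis with every single change point."""
    best_cost, best_k = seg_cost(y, [], f_N), None
    for k in range(minlen, len(y) - minlen):
        c = seg_cost(y, [k], f_N)
        if c < best_cost:
            best_cost, best_k = c, k
    return best_cost, best_k
```

With f(N) = log N this is the BIC/MDL variant; f(N) = 2 gives AIC.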

Another possibility is to marginalize the likelihood with respect to the nuisance parameters:

V(k^n) = -2 \log \int_{\theta^n} \int_{\lambda^n} p(y^N | k^n, \theta^n, \lambda^n)\, d\theta^n\, d\lambda^n.

This leads to a parsimonious alternative to the GLR test [?], which will be referred to as the marginalized likelihood ratio (MLR) test.

4.3 Representation

Different distance measures are supported by the different detection/segmentation functions. For onemodel, there are

1. Mean in residuals.

2. The \chi^2-test.

and for twomodel

1. The GLR test. In combination with the sliding window assumed in this function, this is commonly referred to as Brandt's GLR.

2. The divergence test.

The first one seems to be more popular and is the default. For the parsimonious measures, there are the following possibilities for mlr:

1. The marginalized likelihood with known constant noise scaling \lambda^{(i)} = \lambda.

2. The marginalized likelihood with unknown constant noise scaling \lambda^{(i)} = \lambda.


3. The marginalized likelihood with unknown changing noise scaling \lambda^{(i)},

where alternative 3 is the default. For segm, there are

1. The marginalized likelihood with known constant noise scaling \lambda^{(i)} = \lambda.

2. The marginalized likelihood with unknown changing noise scaling \lambda^{(i)}.

3. The likelihood with AIC penalty term.

4. The likelihood with BIC/MDL penalty term.

The second possibility is the most powerful for real signals and is the default one.

5 Stopping rules

The non-parsimonious distance measures s_t are often proposed to be used in combination with a stopping rule. Two possibilities are supported. The stopping time t_a in the cumulative sum (CUSUM) [12, 8] algorithm is defined as

g_t = \max(0, g_{t-1} + s_t - \nu)
t_a = \min_t (g_t > h).

Here h is the threshold and \nu a drift parameter related to the smallest possible change that can be detected. The stopping time of the geometric moving average (GMA) [14] test is

g_t = \lambda g_{t-1} + s_t
t_a = \min_t (g_t > h).   (1)

In both cases, there is a forgetting of past data, in \nu and \lambda respectively. Since we cannot forget more than we know, the CUSUM test statistic is reset to 0 if it becomes negative. A very rough estimate of the change time is computed in the toolbox. For the CUSUM test, the last zero time of g_t is taken,

\hat k = \max_k (g_k = 0),

and for the GMA test

\hat k = \max_k (g_k < 0.1\, g_{t_a}).

The CUSUM and GMA algorithms test for a positive bias in the test statistic. There are also two-sided versions that test for both positive and negative biases. All these cases can be summarized as follows:

g_t^+ = \max(g_0, \lambda g_{t-1}^+ + s_t - \nu)
g_t^- = \max(g_0, \lambda g_{t-1}^- - s_t - \nu)
t_a = \min_t (g_t^+ > h^+ \text{ or } g_t^- > h^-).

Here g_0 = -\infty, \nu = 0 in the GMA test and g_0 = 0, \lambda = 1 in the CUSUM test. For one-sided tests h^- = \infty, otherwise h^- = h^+.
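The one- and two-sided CUSUM recursions can be sketched in a few lines of Python (the function name and return convention are ours; the toolbox's stopping rules live inside its detection functions):

```python
def cusum_two_sided(s, h, nu, h_minus=None):
    """Two-sided CUSUM with threshold h and drift nu.  Returns the alarm
    time t_a and the rough change-time estimate (the last zero of the
    winning statistic), or (None, None) if no alarm occurs.  Pass
    h_minus=float('inf') for a one-sided test."""
    if h_minus is None:
        h_minus = h
    gp = gm = 0.0
    last_zero_p = last_zero_m = 0
    for t, st in enumerate(s, start=1):
        gp = max(0.0, gp + st - nu)       # test for positive bias
        gm = max(0.0, gm - st - nu)       # test for negative bias
        if gp == 0.0:
            last_zero_p = t
        if gm == 0.0:
            last_zero_m = t
        if gp > h:
            return t, last_zero_p
        if gm > h_minus:
            return t, last_zero_m
    return None, None
```

A step change in the residual mean triggers an alarm shortly after the change, with the last-zero estimate pointing roughly at the change itself.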


6 Change point estimation

The detection problem is recognized as change point estimation in the statistical literature. The assumption is that the mean of a white stochastic process changes at time k under H_1(k):

y_t = \theta_0 + e_t,  t \le k
y_t = \theta_1 + e_t,  t > k.

We summarize the survey paper [15] on different procedures to test H_0 against H_1. The following sub-problems are considered:

P1. \theta_1 > \theta_0, where \theta_0 is unknown.

P2. \theta_1 > \theta_0, where \theta_0 = 0 is known.

P3. \theta_1 \ne \theta_0, where \theta_0 is unknown.

P4. \theta_1 \ne \theta_0, where \theta_0 = 0 is known.

The changing mean model is data model 1 in Section 2, so the change point estimation in P3 and P4 are special cases of the other methods. However, the scalar case enables one-sided tests, and the non-parametric approaches below are interesting.

6.1 The Bayesian approach

A Bayesian approach, where the prior probabilities of all hypotheses are the same, gives:

P1: U_B^1 = \sum_{t=2}^{N} t (y_t - \bar y)

P2: U_B^2 = \sum_{t=2}^{N} t y_t

P3: U_B^3 = \frac{1}{N^2} \sum_{k=1}^{N-1} \sum_{t=k+1}^{N} (y_t - \bar y)^2

P4: U_B^4 = \frac{1}{N^2} \sum_{k=1}^{N-1} \sum_{t=k+1}^{N} y_t^2,

where \bar y is the sample mean of y_t. If U > h, where h is a prespecified threshold, one possible estimate of the jump time (change point) is given by

\hat k_B^3 = \arg\max_{1 \le k < N} \frac{1}{N-k} \sum_{t=k+1}^{N} (y_t - \bar y)^2

for P3, and similarly for P4.


6.2 The maximum likelihood approach

Using the ML method, the test statistics are as follows:

P1: U_{ML}^1 = \max_k \frac{\bar y_{k+1:N} - \bar y_{1:k}}{\sqrt{k^{-1} + (N-k)^{-1}}}

P2: U_{ML}^2 = \max_k \sqrt{N-k}\; \bar y_{k+1:N}

P3: U_{ML}^3 = \max_k \frac{(\bar y_{k+1:N} - \bar y_{1:k})^2}{k^{-1} + (N-k)^{-1}}

P4: U_{ML}^4 = \max_k (N-k)\, \bar y_{k+1:N}^2,

where \bar y_{m:n} = \frac{1}{n-m+1} \sum_{t=m}^{n} y_t. If H_1 is decided, the jump time estimate is given by replacing \max_k by \arg\max_k.
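For P3 the statistic is straightforward to implement; a Python sketch (the function name is our own), returning U and the maximizing k:

```python
def ml_change_point(y):
    """ML test statistic U^3 and change point estimate for problem P3
    (theta_0 unknown, two-sided alternative)."""
    N = len(y)
    total = sum(y)
    best_u, best_k, head = float("-inf"), None, 0.0
    for k in range(1, N):                 # candidate change points
        head += y[k - 1]
        m1 = head / k                     # mean of y_1..y_k
        m2 = (total - head) / (N - k)     # mean of y_{k+1}..y_N
        u = (m2 - m1) ** 2 / (1.0 / k + 1.0 / (N - k))
        if u > best_u:
            best_u, best_k = u, k
    return best_u, best_k
```

If U exceeds the threshold h, best_k is the jump time estimate.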

6.3 A non-parametric approach

The Bayesian and maximum likelihood approaches presumed a Gaussian distribution for the noise. Non-parametric tests for the first two problems, assuming only whiteness, are the following:

U_{NP}^i = \max_{1 \le k < N} \frac{s_k^i - E s_k^i}{\sqrt{\operatorname{Var} s_k^i}}, \quad i = 1, 2,

where

s_k^1 = \sum_{t=k+1}^{N} I(y_t - \operatorname{med} y > 0)

s_k^2 = \sum_{t=k+1}^{N} \sum_{m=1}^{N} I(y_t \ge y_m).

Here med denotes the median and I the indicator function. These are a kind of whiteness test, based on the idea that under H_0, y_t is larger than its mean with probability 50%. Determining the expectation and variance of s_k^i is a standard probability theory problem; for instance, the distribution of s_k^1 is hypergeometric under H_0. Again, if H_1 is decided, the jump time estimate is given by the maximizing argument.
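A Python sketch of the first statistic s_k^1 with its hypergeometric moments under H_0 (the one-sided normalization and the function name are our own):

```python
import statistics

def sign_change_stat(y):
    """Scan statistic based on s_k^1: counts of y_t above the sample median
    in the tail y_{k+1..N}, centered and scaled by the hypergeometric mean
    and variance under H0.  Returns the largest value and its k."""
    N = len(y)
    med = statistics.median(y)
    above = [1 if v > med else 0 for v in y]
    K = sum(above)                        # total count above the median
    tail = K
    best_u, best_k = float("-inf"), None
    for k in range(1, N - 1):
        tail -= above[k - 1]              # s_k^1 over t = k+1, ..., N
        n = N - k                         # tail length (draws w/o replacement)
        mean = n * K / N
        var = n * (K / N) * (1 - K / N) * (N - n) / (N - 1)
        if var > 0:
            u = (tail - mean) / var ** 0.5
            if u > best_u:
                best_u, best_k = u, k
    return best_u, best_k
```

A large value indicates that observations above the median are concentrated after time k, i.e. an upward change in the mean.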

6.4 Implementation

Change point estimation is implemented in

[jumptime,thseg,gt,U] = cpe(y,DM,h)

The distance measure DM might be

1. Bayesian

2. Maximum likelihood

3. Non-parametric I

4. Non-parametric II

and, if DM is a vector, the second element chooses the problem formulation.

7 Design

The design of a CUSUM detector (1) requires tuning of the parameters h and \nu. Normally, some typical faults are examined and \nu is chosen as one half of their minimum influence on the residuals. A less cumbersome and more pragmatic approach is to fix \nu to, say, one standard deviation of the noise. The threshold h is then chosen from a specified mean time between false alarms.

There exists one function to evaluate the analytical properties of the CUSUM detector, namely the average run length (ARL) function. Suppose that the input to the CUSUM algorithm is white noise with mean \theta and variance \sigma^2. The ARL function is defined as

L_0 = E(t_a \,|\, \text{no change}; h, \nu, \theta, \sigma).

That is, L_0 is the average stopping time of the algorithm before a false alarm. It is in fact a function of only two parameters:

L_0 = f\left( \frac{h}{\sigma}, \frac{\nu - \theta}{\sigma} \right) = f(\bar h, \bar\nu).

This function replaces time-consuming Monte Carlo simulations for tuning h and \nu. The ARL function is evaluated numerically (see Eq. 5.2.28 in [7], or [8]) in

L0 = arl(h,th)

Since this is sometimes an ill-conditioned problem, it may take some time to compute.

An accurate explicit approximation is derived in [16]:

L_0 \approx \frac{e^{-2\bar\mu(\bar h + 1.166)} - 1 + 2\bar\mu(\bar h + 1.166)}{2\bar\mu^2}, \qquad \bar\mu = \frac{\theta - \nu}{\sigma}.   (2)

Under H_0 the normalized drift \bar\mu of the test statistic is negative, so L_0 grows rapidly with h and \nu. This efficient approximation is one option in arl.
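A Python sketch of approximation (2) (the function name and the normalized-drift parameterization are our own; mubar = (theta - nu)/sigma is negative under H_0):

```python
import math

def arl_approx(hbar, mubar):
    """Explicit ARL approximation: hbar = h/sigma is the normalized
    threshold, mubar the normalized drift of the CUSUM increments
    (negative under the no-change hypothesis)."""
    b = hbar + 1.166                      # threshold with the 1.166 correction
    x = -2.0 * mubar * b
    return (math.exp(x) - 1.0 - x) / (2.0 * mubar ** 2)
```

For small |mubar| the approximation tends to (hbar + 1.166)^2, and it grows rapidly with both the threshold and the drift, as a mean time between false alarms should.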

The ARL function is very sensitive to its design parameters, and the mean time L_0 says nothing about the distribution of the alarm times. It is advisable to evaluate the final design using Monte Carlo simulations. This can be done by

L = MC(h,th,Niter)


The vector L contains the time instants of the first false alarm in the Niter iterations.

In practice, the inverse of the ARL function would be more useful; that is, to compute the threshold h from a specified mean time between false alarms L_0. No explicit solution seems to be known. A very simple search strategy, using the ARL function, is implemented in

h = cusumdesign(L0,th,h0)

It starts by computing L0 = arl(h0,th) and then uses a bisection technique to find the solution. The approximation (2) of the ARL function is used to speed up the calculation.
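The bisection idea can be sketched in Python on top of the explicit approximation (the names and the normalized parameterization mubar = (theta - nu)/sigma are our own; the approximate ARL is monotonically increasing in the threshold, so bisection converges):

```python
import math

def threshold_for_arl(L0, mubar, h_lo=0.01, h_hi=100.0, tol=1e-6):
    """Invert the explicit ARL approximation by bisection: find the
    normalized threshold hbar whose mean time between false alarms is L0."""
    def arl(hbar):
        x = -2.0 * mubar * (hbar + 1.166)
        return (math.exp(x) - 1.0 - x) / (2.0 * mubar ** 2)
    while h_hi - h_lo > tol:
        mid = 0.5 * (h_lo + h_hi)
        if arl(mid) < L0:
            h_lo = mid                    # ARL too short: raise the threshold
        else:
            h_hi = mid
    return 0.5 * (h_lo + h_hi)
```

The toolbox's cusumdesign works from arl itself; this sketch only inverts the approximation.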

Finally,

h = chi2(d,alpha)

computes the threshold for a \chi^2 distribution with d degrees of freedom, which gives a confidence level (probability of false alarms) of alpha.

References

[1] H. Akaike. Fitting autoregressive models for prediction. Ann. Inst. Statist. Math., 21:243-247, 1969.

[2] H. Akaike. On entropy maximization principle. In Symposium on Applications of Statistics, 1977.

[3] B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice Hall, Englewood Cliffs, NJ, 1979.

[4] P. Andersson. Adaptive forgetting in recursive identification through multiple models. International Journal of Control, 42(5):1175-1193, 1985.

[5] U. Appel and A.V. Brandt. Adaptive sequential segmentation of piecewise stationary time series. Information Sciences, 29(1):27-56, 1985.

[6] M. Basseville and A. Benveniste. Design and comparative study of some sequential jump detection algorithms for digital signals. IEEE Transactions on Acoustics, Speech and Signal Processing, 31:521-535, 1983.

[7] M. Basseville and I.V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Information and System Science Series. Prentice Hall, Englewood Cliffs, NJ, 1993.

[8] C.S. Van Dobben de Bruyn. Cumulative Sum Tests: Theory and Practice. Hafner, New York, 1968.

[9] R.H. Jones, D.H. Crowell, and L.E. Kapuniai. Change detection model for serially correlated multivariate data. Biometrica, 26:269-280, 1970.

[10] G. Kitagawa and H. Akaike. A procedure for the modeling of nonstationary time series. Ann. Inst. Statist. Math., 30:351-360, 1978.

[11] L. Ljung and T. Söderström. Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA, 1983.

[12] E.S. Page. Continuous inspection schemes. Biometrika, 41:100-115, 1954.

[13] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore, 1989.

[14] S.W. Roberts. Control charts based on geometric moving averages. Technometrics, 8:411-430, 1959.

[15] A. Sen and M.S. Srivastava. On tests for detecting change in the mean. Annals of Statistics, 3:98-108, 1975.

[16] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals. Springer Series in Statistics. Springer, 1985.

[17] A.S. Willsky and H.L. Jones. A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Transactions on Automatic Control, pages 108-112, 1976.

[18] Y. Yao. Estimating the number of change points via Schwarz' criterion. Statistics and Probability Letters, pages 181-189, 1988.
