# Conditional particle filters for system identification

## Full text

(1)

### Conditional particle filters for system identification

Fredrik Lindsten

Division of Automatic Control, Linköping University, Sweden

(2)

### Identification of state-space models

Consider a nonlinear, discrete-time state-space model,

$$x_{t+1} = f_t(x_t; \theta) + v_{t,\theta}, \qquad y_t = h_t(x_t; \theta) + e_{t,\theta}.$$

We observe $y_{1:T} = \{y_1, \ldots, y_T\}$. Aim: Estimate $\theta$ given $y_{1:T}$.

(3)

### Latent variable model

Alternate between updating $\theta$ and updating $x_{1:T}$.

Frequentists:

• Find $\hat\theta_{\mathrm{ML}} = \arg\max_\theta p_\theta(y_{1:T})$.

• Use e.g. the EM algorithm.

Bayesians:

• Find $p(\theta \mid y_{1:T})$.

• Use e.g. Gibbs sampling.

(4)

Definition

Let $K(y \mid x)$ be a Markovian transition kernel and let $\pi(x)$ be some probability density function. $K$ is said to leave $\pi$ invariant if

$$\int K(y \mid x)\, \pi(x)\, dx = \pi(y).$$

• A Markov chain $\{X_n\}_{n \geq 1}$ with transition kernel $K$ has $\pi$ as stationary distribution.

• If the chain is ergodic, then $\pi$ is its limiting distribution.
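For a finite state space, the definition above reduces to $\pi P = \pi$ for a transition matrix $P$. A minimal numerical check (the two-state chain here is a hypothetical example, not from the slides):

```python
import numpy as np

# Hypothetical two-state Markov chain: a discrete analogue of the kernel K.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# The invariant distribution is the left eigenvector of P with eigenvalue 1,
# i.e. the discrete version of  int K(y|x) pi(x) dx = pi(y).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

assert np.allclose(pi @ P, pi)   # invariance: pi P = pi

# The chain is ergodic, so the marginal converges to pi from any start
# (pi is the limiting distribution).
mu = np.array([1.0, 0.0])
for _ in range(200):
    mu = mu @ P
assert np.allclose(mu, pi, atol=1e-6)
```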

(5)

### ex) Invariant distribution

[Figure: a simulated trajectory $x_t$ for $t = 1, \ldots, 500$, together with the empirical density of the samples.]

(6)

### ex) Invariant distribution

[Figure: the same example over a longer run ($t$ up to 10000); the empirical density approaches the invariant density.]

(7)

### Markov chain Monte Carlo

Objective: Sample from an intractable target distribution $\pi(x)$.

Markov chain Monte Carlo (MCMC):

• Find a kernel $K(y \mid x)$ which leaves $\pi(x)$ invariant (key step).

• Sample a Markov chain with kernel $K$.

• Discard the transient (burn-in) and use the sample path as approximate samples from $\pi$.

(8)

ex) Sample from

$$\begin{pmatrix} x \\ y \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} 10 \\ 10 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \right).$$

Gibbs sampler:

• Draw $x' \sim \pi(x \mid y)$;

• Draw $y' \sim \pi(y \mid x')$.

[Figure: Gibbs sampler trajectory in the $(x, y)$ plane.]
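A minimal sketch of this Gibbs sampler. The Gaussian conditionals below follow from the stated mean and covariance by standard conditioning formulas; the iteration counts are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: N(mean=(10,10), cov=[[2,1],[1,1]]). Conditionals:
#   x | y ~ N(10 + 1.0*(y - 10), 2 - 1*1/1) = N(10 + (y - 10), 1.0)
#   y | x ~ N(10 + 0.5*(x - 10), 1 - 1*1/2) = N(10 + 0.5*(x - 10), 0.5)
x, y = 0.0, 0.0                 # arbitrary initialization
chain = []
for r in range(20000):
    x = rng.normal(10 + (y - 10), np.sqrt(1.0))        # x' ~ pi(x | y)
    y = rng.normal(10 + 0.5 * (x - 10), np.sqrt(0.5))  # y' ~ pi(y | x')
    chain.append((x, y))

samples = np.array(chain[1000:])    # discard burn-in
print(samples.mean(axis=0))         # close to (10, 10)
print(np.cov(samples.T))            # close to [[2, 1], [1, 1]]
```

Each full sweep is a Markov kernel that leaves the bivariate Gaussian invariant, so after burn-in the sample path approximates draws from the target.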

(9)

Aim: Find $p(\theta, x_{1:T} \mid y_{1:T})$.

Alternate between updating $\theta$ and updating $x_{1:T}$.

MCMC: Gibbs sampling for state-space models. Iterate,

• Draw $\theta[r] \sim p(\theta \mid x_{1:T}[r-1], y_{1:T})$;

• Draw $x_{1:T}[r] \sim p(x_{1:T} \mid \theta[r], y_{1:T})$.

The above procedure results in a Markov chain, $\{\theta[r], x_{1:T}[r]\}_{r \geq 1}$, with stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.


(13)

### Linear Gaussian state-space model

ex) Gibbs sampling for linear system identification,

$$\begin{pmatrix} x_{t+1} \\ y_t \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} x_t \\ u_t \end{pmatrix} + \begin{pmatrix} v_t \\ e_t \end{pmatrix}.$$

Iterate,

• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$;

• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$.

[Figure: Bode plot (magnitude and phase) comparing the true system, the posterior mean, and the 95 % credibility region.]

(14)

### Gibbs sampler for general SSM?

What about the general nonlinear/non-Gaussian case?

• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$. OK!

• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$. Hard!

Problem: $p(x_{1:T} \mid \theta, y_{1:T})$ is not available!

Idea: Approximate $p(x_{1:T} \mid \theta, y_{1:T})$ using a particle filter.


(17)

### The particle filter

• Resampling: $\{x^i_{1:t-1}, w^i_{t-1}\}_{i=1}^N \to \{\tilde x^i_{1:t-1}, 1/N\}_{i=1}^N$.

• Propagation: $x^i_t \sim R^\theta_t(dx_t \mid \tilde x^i_{1:t-1})$ and $x^i_{1:t} = \{\tilde x^i_{1:t-1}, x^i_t\}$.

• Weighting: $w^i_t = W^\theta_t(x^i_{1:t})$.

Result: $\{x^i_{1:t}, w^i_t\}_{i=1}^N$.

[Diagram: weighting → resampling → propagation → weighting → resampling → …]

(18)

### The particle filter

• Resampling + Propagation:

$$(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t) = \frac{w^{a_t}_{t-1}}{\sum_l w^l_{t-1}}\, R^\theta_t(x_t \mid x^{a_t}_{1:t-1}).$$

• Weighting: $w^i_t = W^\theta_t(x^i_{1:t})$.

Result: $\{x^i_{1:t}, w^i_t\}_{i=1}^N$.

(19)

### The particle filter

Algorithm: Particle filter (PF)

1. Initialize ($t = 1$):
   (a) Draw $x^i_1 \sim R^\theta_1(x_1)$ for $i = 1, \ldots, N$.
   (b) Set $w^i_1 = W^\theta_1(x^i_1)$ for $i = 1, \ldots, N$.

2. For $t = 2, \ldots, T$:
   (a) Draw $(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t)$ for $i = 1, \ldots, N$.
   (b) Set $x^i_{1:t} = \{x^{a^i_t}_{1:t-1}, x^i_t\}$ and $w^i_t = W^\theta_t(x^i_{1:t})$ for $i = 1, \ldots, N$.
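A runnable sketch of this algorithm in bootstrap form, tracking only the filtering marginal rather than full paths. The linear-Gaussian toy model and all constants are assumptions for illustration; the state transition plays the role of $R_t$ and the observation likelihood plays the role of $W_t$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy model (bootstrap proposal):
#   x_{t+1} = 0.7 x_t + v_t,  v_t ~ N(0, 1)    -- plays the role of R_t
#   y_t     = x_t + e_t,      e_t ~ N(0, 0.5)  -- W_t is p(y_t | x_t)
T, N = 50, 200
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.7 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(scale=np.sqrt(0.5), size=T)

# 1. Initialize (t = 1): draw x_1^i ~ R_1 and set w_1^i = W_1(x_1^i).
particles = rng.normal(size=N)
logw = -0.5 * (y[0] - particles) ** 2 / 0.5
for t in range(1, T):
    # 2(a). Draw (a_t^i, x_t^i) ~ M_t: resample ancestors with
    #       P(a_t = l) proportional to w_{t-1}^l, then propagate through R_t.
    w = np.exp(logw - logw.max())
    w /= w.sum()
    a = rng.choice(N, size=N, p=w)
    particles = 0.7 * particles[a] + rng.normal(size=N)
    # 2(b). Weight: w_t^i = W_t(x_{1:t}^i) = p(y_t | x_t^i).
    logw = -0.5 * (y[t] - particles) ** 2 / 0.5

w = np.exp(logw - logw.max())
w /= w.sum()
print(np.sum(w * particles))  # filtered-mean estimate of E[x_T | y_{1:T}]
```

Storing the full ancestries $x^i_{1:t}$ (as the algorithm states) would additionally require reindexing the stored paths by `a` at each step.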

(20)

[Figure: particle system over time (state vs. time).]

(21)

• With $P(x'_{1:T} = x^i_{1:T}) \propto w^i_T$ we get $x'_{1:T} \overset{\text{approx.}}{\sim} p(x_{1:T} \mid \theta, y_{1:T})$.

[Figure: a single path drawn from the weighted particle system.]

(22)

### Problems

Problems with this approach,

• Based on a PF ⇒ approximate sample.

• Does not leave $p(\theta, x_{1:T} \mid y_{1:T})$ invariant!

• Relies on large $N$ to be successful.

• A lot of wasted computations.

To get around these problems: use a conditional particle filter (CPF). One prespecified path is retained throughout the sampler.

C. Andrieu, A. Doucet and R. Holenstein, “Particle Markov chain Monte Carlo methods”, Journal of the Royal Statistical Society: Series B, 72:269–342, 2010.


(24)

### Conditional PF with ancestor sampling

Algorithm: CPF with ancestor sampling (CPF-AS), conditioned on $x^\star_{1:T}$

1. Initialize ($t = 1$):
   (a) Draw $x^i_1 \sim R^\theta_1(x_1)$ for $i \neq N$ and set $x^N_1 = x^\star_1$.
   (b) Set $w^i_1 = W^\theta_1(x^i_1)$ for $i = 1, \ldots, N$.

2. For $t = 2, \ldots, T$:
   (a) Draw $(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t)$ for $i \neq N$ and set $x^N_t = x^\star_t$.
   (b) Draw $a^N_t$ with $P(a^N_t = i) \propto w^i_{t-1}\, p(x^\star_t \mid \theta, x^i_{t-1})$.
   (c) Set $x^i_{1:t} = \{x^{a^i_t}_{1:t-1}, x^i_t\}$ and $w^i_t = W^\theta_t(x^i_{1:t})$ for $i = 1, \ldots, N$.

F. Lindsten, M. I. Jordan and T. B. Schön, “Ancestor sampling for particle Gibbs”, in Proceedings of the 2012 Conference on Neural Information Processing Systems (NIPS), (accepted).
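A runnable sketch of one CPF-AS sweep for a hypothetical AR(1) model with additive Gaussian observations. The model, constants, and bootstrap proposal are assumptions for illustration; particle $N$ holds the conditioned reference path:

```python
import numpy as np

def cpf_as(y, x_ref, N, rng, a=0.7, q=1.0, r=0.5):
    """One sweep of CPF-AS for the hypothetical model
       x_{t+1} = a x_t + v_t, v_t ~ N(0, q);  y_t = x_t + e_t, e_t ~ N(0, r).
       The N-th particle is clamped to the reference path x_ref."""
    T = len(y)
    X = np.zeros((T, N))              # stored paths; column i holds x_{1:T}^i
    # 1(a). Draw x_1^i for i != N, set x_1^N = x*_1.  1(b). Weight.
    X[0] = rng.normal(size=N)
    X[0, -1] = x_ref[0]
    logw = -0.5 * (y[0] - X[0]) ** 2 / r
    for t in range(1, T):
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # 2(a). Resample + propagate for i != N, clamp the N-th particle.
        anc = rng.choice(N, size=N, p=w)
        new = a * X[t - 1, anc] + rng.normal(scale=np.sqrt(q), size=N)
        new[-1] = x_ref[t]
        # 2(b). Ancestor sampling: P(a_t^N = i) ∝ w_{t-1}^i p(x*_t | x_{t-1}^i).
        log_as = logw - 0.5 * (x_ref[t] - a * X[t - 1]) ** 2 / q
        p_as = np.exp(log_as - log_as.max())
        p_as /= p_as.sum()
        anc[-1] = rng.choice(N, p=p_as)
        # 2(c). Reindex stored paths by the ancestors and weight.
        X[:t] = X[:t, anc]
        X[t] = new
        logw = -0.5 * (y[t] - X[t]) ** 2 / r
    # Return one path with P(x'_{1:T} = x_{1:T}^i) proportional to w_T^i.
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return X[:, rng.choice(N, p=w)]

rng = np.random.default_rng(2)
y = rng.normal(size=25)                       # stand-in observations
path = cpf_as(y, x_ref=np.zeros(25), N=5, rng=rng)
print(path.shape)                             # (25,)
```

Iterating this sweep, feeding each returned path back in as the next reference, is exactly the Markov kernel the theorem below refers to.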

(25)

Theorem

For any $N \geq 2$, the procedure:

(i) Run CPF-AS($x^\star_{1:T}$);

(ii) Sample with $P(x'_{1:T} = x^i_{1:T}) \propto w^i_T$;

defines a Markov kernel on $\mathsf{X}^T$ which leaves $p(x_{1:T} \mid \theta, y_{1:T})$ invariant.


(27)

### CPF vs. CPF-AS

[Figure: particle paths over time for the CPF (left) and CPF-AS (right).]

(28)

### Particle Gibbs with ancestor sampling

Bayesian identification: Gibbs + CPF-AS = PG-AS.

Algorithm: PG-AS (Particle Gibbs with ancestor sampling)

1. Initialize: Set $\theta[0]$ and $x_{1:T}[0]$ arbitrarily.

2. For $r \geq 1$, iterate:
   (a) Draw $\theta[r] \sim p(\theta \mid x_{1:T}[r-1], y_{1:T})$.
   (b) Run CPF-AS($x_{1:T}[r-1]$), targeting $p(x_{1:T} \mid \theta[r], y_{1:T})$.
   (c) Sample with $P(x_{1:T}[r] = x^i_{1:T}) \propto w^i_T$.

For any number of particles $N \geq 2$, the Markov chain $\{\theta[r], x_{1:T}[r]\}_{r \geq 1}$ has stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.

(29)

### ex) Stochastic volatility

PG-AS for a stochastic volatility model,

$$x_{t+1} = \theta_1 x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta_2), \qquad y_t = e_t \exp\!\left(\tfrac{1}{2} x_t\right), \quad e_t \sim \mathcal{N}(0, 1).$$

[Figure: trace plots of $\theta_1$ and $\theta_2$ over 1000 iterations, for $N = 5, 20, 100, 1000, 5000$.]

(30)

### ex) Stochastic volatility

[Figure: estimated posterior densities of $\theta_1$ and $\theta_2$ for $N = 5, 20, 100, 1000, 5000$.]

(31)

### Maximum likelihood inference

Back to the frequentist objective, $\hat\theta_{\mathrm{ML}} = \arg\max_\theta p_\theta(y_{1:T})$.

Expectation maximization (EM). Iterate;

(E) $Q(\theta, \theta[r-1]) = \mathrm{E}_{\theta[r-1]}\left[\log p_\theta(x_{1:T}, y_{1:T}) \mid y_{1:T}\right]$;

(M) $\theta[r] = \arg\max_\theta Q(\theta, \theta[r-1])$.

Problem: The E-step requires us to solve a smoothing problem, i.e. to compute an expectation under $p_\theta(x_{1:T} \mid y_{1:T})$.


(33)

### Particle smoother EM

Idea: Use a particle smoother (PS) for the E-step,

$$\widehat{p}_{\theta'}(x_{1:T} \mid y_{1:T}) = \frac{1}{N} \sum_{j=1}^{N} \delta_{\tilde x^j_{1:T}}(x_{1:T}).$$

The E-step is approximated with

$$\widehat{Q}_N(\theta, \theta') = \frac{1}{N} \sum_{j=1}^{N} \log p_\theta(\tilde x^j_{1:T}, y_{1:T}).$$

(34)

### Problems with PS-EM

Problems with PS-EM,

• Doubly asymptotic – requires $N \to \infty$ and $R \to \infty$ simultaneously to converge.

• Relies on large $N$ to be successful.

• A lot of wasted computations.

(35)

### Stochastic approximation EM

Assume for the time being that we can sample from $p_\theta(x_{1:T} \mid y_{1:T})$.

Stochastic approximation EM (SAEM): Replace the E-step with,

$$\widehat{Q}_r(\theta) = \widehat{Q}_{r-1}(\theta) + \gamma_r \left( \frac{1}{M} \sum_{j=1}^{M} \log p_\theta(\tilde x^j_{1:T}, y_{1:T}) - \widehat{Q}_{r-1}(\theta) \right),$$

where $\tilde x^j_{1:T} \overset{\text{i.i.d.}}{\sim} p_\theta(x_{1:T} \mid y_{1:T})$ for $j = 1, \ldots, M$.

SAEM converges to a maximum of $p_\theta(y_{1:T})$ for any $M \geq 1$ under standard stochastic approximation conditions.

B. Delyon, M. Lavielle and E. Moulines, “Convergence of a stochastic approximation version of the EM algorithm”, The Annals of Statistics, 27:94–128, 1999.
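The SAEM recursion can be sketched on a toy latent-variable problem where the posterior can be sampled exactly (the model and step-size choice are assumptions for illustration, standing in for the state-space case):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy model: x_i ~ N(theta, 1) latent, y_i | x_i ~ N(x_i, 1)
# observed, so theta_ML = mean(y). SAEM should recover it.
y = rng.normal(loc=2.0, scale=np.sqrt(2.0), size=500)

theta, S = 0.0, 0.0
for r in range(1, 2001):
    # Simulation step (M = 1): x~_i ~ p_theta(x_i | y_i) = N((theta+y_i)/2, 1/2).
    x = rng.normal((theta + y) / 2, np.sqrt(0.5))
    # Stochastic approximation of the E-step, on the sufficient statistic of Q:
    #   S_r = S_{r-1} + gamma_r * (mean(x~) - S_{r-1}),  gamma_r = 1/r.
    gamma = 1.0 / r
    S = S + gamma * (x.mean() - S)
    # M-step: theta[r] = arg max Q_hat_r(theta), which here is just S_r.
    theta = S

print(theta, y.mean())   # theta approaches the ML estimate mean(y)
```

Because $\log p_\theta$ is quadratic here, $\widehat{Q}_r$ is summarized by the single running statistic `S`; the decreasing step size $\gamma_r = 1/r$ satisfies the standard stochastic approximation conditions.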

(36)

### Stochastic approximation EM

24(28)

Assume for the time being that we can sample frompθ

x1:T

y1:T

### )

. Stochastic approximation EM (SAEM): Replace the E-step with,

Qbr

θ

Qbr1

θ

γr 1 M

## ∑

M j=1

log pθ

˜xj1:T, y1:T

Qbr1

θ

### )

! ,

where˜xj1:Ti.i.d.

pθ

x1:T

y1:T

forj

### =

1, . . . , M.

SAEM converges to a maximum of pθ

y1:T

for any M

### ≥

1 under standard stochastic approximation conditions.

B. Delyon, M. Lavielle and E. Moulines, “Convergence of a stochastic approximation version of the EM algorithm”, The Annals of Statistics, 27:94-128, 1999.

(37)

### Stochastic approximation EM

Bad news: We cannot sample from $p_\theta(x_{1:T} \mid y_{1:T})$.

Good news: It is enough to sample from a uniformly ergodic Markov kernel leaving $p_\theta(x_{1:T} \mid y_{1:T})$ invariant.

We can use CPF-AS to sample the states!


(40)

### SAEM-AS

Maximum likelihood identification: SAEM + CPF-AS = SAEM-AS.

Algorithm: SAEM-AS (Particle SAEM with ancestor sampling)

1. Initialize: Set $\theta[0]$ and $x_{1:T}[0]$ arbitrarily.

2. For $r \geq 1$, iterate:
   (a) Run CPF-AS($x_{1:T}[r-1]$), targeting $p(x_{1:T} \mid \theta[r-1], y_{1:T})$.
   (b) Compute
   $$\widehat{Q}_r(\theta) = \widehat{Q}_{r-1}(\theta) + \gamma_r \left( \sum_{i=1}^{N} w^i_T \log p_\theta(x^i_{1:T}, y_{1:T}) - \widehat{Q}_{r-1}(\theta) \right).$$
   (c) Compute $\theta[r] = \arg\max_\theta \widehat{Q}_r(\theta)$.
   (d) Sample with $P(x_{1:T}[r] = x^i_{1:T}) \propto w^i_T$.

(41)

SAEM-AS with $N = 10$ for the stochastic volatility model,

$$x_{t+1} = \theta_1 x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta_2), \qquad y_t = e_t \exp\!\left(\tfrac{1}{2} x_t\right), \quad e_t \sim \mathcal{N}(0, 1).$$

[Figure: trace plots of $\theta_1$ and $\theta_2$ over 1000 iterations.]

(42)

### Conclusions

Conclusions,

• Conditional particle filters are useful for identification!

• CPF-AS defines a kernel on $\mathsf{X}^T$ leaving $p_\theta(x_{1:T} \mid y_{1:T})$ invariant.

• CPF-AS consists of two parts:
  – Conditioning: Ensures the correct stationary distribution for any $N$.
  – Ancestor sampling: Mitigates path degeneracy and enables movement around the conditioned path.

• PG-AS for Bayesian inference and SAEM-AS for maximum likelihood inference. Both work with few particles.

Future work includes finding. . .

• . . . stronger ergodicity results for CPF-AS (uniformly ergodic?).

• . . . exact conditions for almost sure convergence of SAEM-AS.
