Conditional particle filters for system identification

Fredrik Lindsten

Division of Automatic Control, Linköping University, Sweden

Identification of state-space models

Consider a nonlinear, discrete-time state-space model,
$$x_{t+1} = f_t(x_t; \theta) + v_t(\theta), \qquad y_t = h_t(x_t; \theta) + e_t(\theta).$$
We observe $y_{1:T} = \{y_1, \dots, y_T\}$.

Aim: Estimate $\theta$ given $y_{1:T}$.
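To make the setup concrete, here is a minimal Python sketch that simulates data from a model of this form for a fixed $\theta$; the particular $f$, $h$ and noise variances below are illustrative placeholders, not the models used later in the talk.

```python
import numpy as np

def simulate_ssm(f, h, Q, R, x1, T, rng):
    """Simulate x_{t+1} = f(x_t) + v_t, y_t = h(x_t) + e_t,
    with v_t ~ N(0, Q) and e_t ~ N(0, R) (scalar state for simplicity)."""
    x = np.empty(T)
    y = np.empty(T)
    x[0] = x1
    for t in range(T):
        y[t] = h(x[t]) + np.sqrt(R) * rng.standard_normal()
        if t < T - 1:
            x[t + 1] = f(x[t]) + np.sqrt(Q) * rng.standard_normal()
    return x, y

rng = np.random.default_rng(0)
# Hypothetical choices of f and h, just to make the sketch runnable.
x, y = simulate_ssm(f=lambda x: 0.9 * x, h=lambda x: x, Q=1.0, R=0.1,
                    x1=0.0, T=100, rng=rng)
```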

Latent variable model

Alternate between updating $\theta$ and updating $x_{1:T}$.

Frequentists:
• Find $\hat\theta_{\mathrm{ML}} = \arg\max_\theta p_\theta(y_{1:T})$.
• Use e.g. the EM algorithm.

Bayesians:
• Find $p(\theta \mid y_{1:T})$.
• Use e.g. Gibbs sampling.

Invariant distribution

Definition. Let $K(y \mid x)$ be a Markovian transition kernel and let $\pi(x)$ be some probability density function. $K$ is said to leave $\pi$ invariant if
$$\int K(y \mid x)\, \pi(x)\, \mathrm{d}x = \pi(y).$$

• A Markov chain $\{X_n\}_{n \geq 1}$ with transition kernel $K$ has $\pi$ as stationary distribution.
• If the chain is ergodic, then $\pi$ is its limiting distribution.

ex) Invariant distribution

[Figure: a simulated chain shown over 500 and over 10000 time steps; left panels plot the trajectory $x_t$ against time $t$, right panels plot density value against $x$, comparing the empirical distribution with the stationary density.]
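The definition is easy to check empirically. Below is a minimal sketch using an AR(1) kernel whose invariant distribution is known to be Gaussian; the kernel and the parameter values are chosen for illustration, not read off the figure.

```python
import numpy as np

# AR(1) kernel K(y|x): y = mu + a*(x - mu) + sqrt(1 - a^2)*sigma*eps.
# A standard calculation shows it leaves N(mu, sigma^2) invariant:
# mean and variance are preserved by one kernel step.
mu, sigma, a = 0.0, 2.0, 0.9
rng = np.random.default_rng(1)

def kernel_step(x):
    return mu + a * (x - mu) + np.sqrt(1 - a**2) * sigma * rng.standard_normal()

x = 30.0                      # start far from the stationary distribution
chain = []
for n in range(10_000):
    x = kernel_step(x)
    chain.append(x)

burn = 500                    # discard the transient
print(np.mean(chain[burn:]), np.std(chain[burn:]))  # approx. 0.0 and 2.0
```

Started far from the stationary distribution, the chain forgets its initial state after a transient, matching the behaviour illustrated in the figure.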

Markov chain Monte Carlo

Objective: Sample from an intractable target distribution $\pi(x)$.

Markov chain Monte Carlo (MCMC):
• Find a kernel $K(y \mid x)$ which leaves $\pi(x)$ invariant (key step).
• Sample a Markov chain with kernel $K$.
• Discard the transient (burn-in) and use the sample path as approximate samples from $\pi$.

Gibbs sampler

ex) Sample from
$$\mathcal{N}\left( \begin{pmatrix} x \\ y \end{pmatrix};\; \begin{pmatrix} 10 \\ 10 \end{pmatrix},\; \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \right).$$

Gibbs sampler:
• Draw $x' \sim \pi(x \mid y)$;
• Draw $y' \sim \pi(y \mid x')$.

[Figure: scatter plot of the Gibbs samples in the $(X, Y)$ plane.]
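A sketch of this sampler in Python. The full conditionals follow from the standard Gaussian conditioning formulas applied to the mean and covariance above (as reconstructed): $x \mid y \sim \mathcal{N}(y, 1)$ and $y \mid x \sim \mathcal{N}(10 + (x - 10)/2,\, 1/2)$.

```python
import numpy as np

# Gibbs sampler for the bivariate Gaussian with mean (10, 10) and
# covariance [[2, 1], [1, 1]], using its exact full conditionals.
rng = np.random.default_rng(2)
x, y = 0.0, 0.0                      # arbitrary initialization
samples = np.empty((5000, 2))
for r in range(5000):
    x = y + rng.standard_normal()                    # x' ~ pi(x | y) = N(y, 1)
    y = 10 + 0.5 * (x - 10) + np.sqrt(0.5) * rng.standard_normal()  # y' ~ pi(y | x')
    samples[r] = (x, y)

print(samples[100:].mean(axis=0))    # approx. (10, 10) after burn-in
```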

Gibbs sampler for SSMs

Aim: Find $p(\theta, x_{1:T} \mid y_{1:T})$.

Alternate between updating $\theta$ and updating $x_{1:T}$.

MCMC: Gibbs sampling for state-space models. Iterate,
• Draw $\theta[r] \sim p(\theta \mid x_{1:T}[r-1], y_{1:T})$;
• Draw $x_{1:T}[r] \sim p(x_{1:T} \mid \theta[r], y_{1:T})$.

The above procedure results in a Markov chain $\{\theta[r], x_{1:T}[r]\}_{r \geq 1}$ with stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.

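In code, the sweep is just two exact conditional draws per iteration. A minimal skeleton, where sample_theta and sample_states are hypothetical model-specific routines; for the linear Gaussian special case on the next slide, sample_states could be a Kalman forward-filter backward-simulator pass.

```python
def gibbs_ssm(y, theta0, x0, n_iters, sample_theta, sample_states, rng):
    """Generic Gibbs sweep for an SSM.  The two callables are
    model-specific stand-ins, not defined here."""
    theta, x = theta0, x0
    thetas = []
    for r in range(n_iters):
        theta = sample_theta(x, y, rng)    # theta[r] ~ p(theta | x_{1:T}[r-1], y_{1:T})
        x = sample_states(theta, y, rng)   # x_{1:T}[r] ~ p(x_{1:T} | theta[r], y_{1:T})
        thetas.append(theta)
    return thetas
```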

Linear Gaussian state-space model

ex) Gibbs sampling for linear system identification.
$$\begin{pmatrix} x_{t+1} \\ y_t \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} x_t \\ u_t \end{pmatrix} + \begin{pmatrix} v_t \\ e_t \end{pmatrix}.$$
Iterate,
• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$;
• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$.

[Figure: Bode plot (magnitude in dB and phase in degrees versus frequency in rad/s) comparing the true system, the posterior mean, and a 95 % credibility region.]

Gibbs sampler for general SSM?

What about the general nonlinear/non-Gaussian case?
• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$. OK!
• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$. Hard!

Problem: $p(x_{1:T} \mid \theta, y_{1:T})$ is not available!

Idea: Approximate $p(x_{1:T} \mid \theta, y_{1:T})$ using a particle filter.


The particle filter

• Resampling: $\{x^i_{1:t-1}, w^i_{t-1}\}_{i=1}^N \to \{\tilde{x}^i_{1:t-1}, 1/N\}_{i=1}^N$.
• Propagation: $x^i_t \sim R^\theta_t(\mathrm{d}x_t \mid \tilde{x}^i_{1:t-1})$ and $x^i_{1:t} = \{\tilde{x}^i_{1:t-1}, x^i_t\}$.
• Weighting: $w^i_t = W^\theta_t(x^i_{1:t})$.

$\Rightarrow \{x^i_{1:t}, w^i_t\}_{i=1}^N$

[Diagram: the cycle weighting → resampling → propagation → weighting → resampling.]

The particle filter

• Resampling + Propagation:
$$(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t) = \frac{w^{a_t}_{t-1}}{\sum_l w^l_{t-1}}\, R^\theta_t(x_t \mid x^{a_t}_{1:t-1}).$$
• Weighting: $w^i_t = W^\theta_t(x^i_{1:t})$.

$\Rightarrow \{x^i_{1:t}, w^i_t\}_{i=1}^N$

The particle filter

Algorithm: Particle filter (PF)
1. Initialize ($t = 1$):
   (a) Draw $x^i_1 \sim R^\theta_1(x_1)$ for $i = 1, \dots, N$.
   (b) Set $w^i_1 = W^\theta_1(x^i_1)$ for $i = 1, \dots, N$.
2. For $t = 2, \dots, T$:
   (a) Draw $(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t)$ for $i = 1, \dots, N$.
   (b) Set $x^i_{1:t} = \{x^{a^i_t}_{1:t-1}, x^i_t\}$ and $w^i_t = W^\theta_t(x^i_{1:t})$ for $i = 1, \dots, N$.
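A minimal Python sketch of this algorithm for the common bootstrap choice, where $R_t$ is the prior transition kernel so that the weight function reduces to the observation likelihood. The transition $x_t = f(x_{t-1}) + v_t$, the initialization $x_1 \sim \mathcal{N}(0, Q)$, and the lik helper are assumptions of the sketch, not part of the slide.

```python
import numpy as np

def bootstrap_pf(y, f, lik, Q, N, rng):
    """Bootstrap PF: propagate with the prior transition, so the
    weight is W_t = p(y_t | x_t), evaluated elementwise by lik."""
    T = len(y)
    x = np.zeros((T, N))                 # particle states
    anc = np.zeros((T, N), dtype=int)    # ancestor indices a_t^i
    x[0] = np.sqrt(Q) * rng.standard_normal(N)   # assumes x_1 ~ N(0, Q)
    w = lik(y[0], x[0]); w = w / w.sum()
    for t in range(1, T):
        anc[t] = rng.choice(N, size=N, p=w)      # resample ancestors
        x[t] = f(x[t - 1, anc[t]]) + np.sqrt(Q) * rng.standard_normal(N)
        w = lik(y[t], x[t]); w = w / w.sum()     # reweight
    return x, anc, w

# e.g. for the stochastic volatility model appearing later in the talk,
# p(y_t | x_t) = N(y_t; 0, exp(x_t)), so up to a constant:
# lik = lambda yt, xt: np.exp(-0.5 * yt**2 * np.exp(-xt) - 0.5 * xt)
```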

The particle filter

[Figure: particle paths, state versus time.]

Sampling based on the PF

• With $P(x'_{1:T} = x^i_{1:T}) \propto w^i_T$ we get $x'_{1:T} \overset{\text{approx.}}{\sim} p(x_{1:T} \mid \theta, y_{1:T})$.

[Figure: particle paths, state versus time, with one drawn path.]

Problems

Problems with this approach,
• Based on a PF ⇒ approximate sample.
• Does not leave $p(\theta, x_{1:T} \mid y_{1:T})$ invariant!
• Relies on large $N$ to be successful.
• A lot of wasted computations.

To get around these problems, use a conditional particle filter (CPF). One prespecified path is retained throughout the sampler.

C. Andrieu, A. Doucet and R. Holenstein, "Particle Markov chain Monte Carlo methods", Journal of the Royal Statistical Society: Series B, 72:269–342, 2010.


Conditional PF with ancestor sampling

Algorithm: CPF with ancestor sampling (CPF-AS), conditioned on $x^\star_{1:T}$
1. Initialize ($t = 1$):
   (a) Draw $x^i_1 \sim R^\theta_1(x_1)$ for $i \neq N$ and set $x^N_1 = x^\star_1$.
   (b) Set $w^i_1 = W^\theta_1(x^i_1)$ for $i = 1, \dots, N$.
2. For $t = 2, \dots, T$:
   (a) Draw $(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t)$ for $i \neq N$ and set $x^N_t = x^\star_t$.
   (b) Draw $a^N_t$ with $P(a^N_t = i) \propto w^i_{t-1}\, p(x^\star_t \mid \theta, x^i_{t-1})$.
   (c) Set $x^i_{1:t} = \{x^{a^i_t}_{1:t-1}, x^i_t\}$ and $w^i_t = W^\theta_t(x^i_{1:t})$ for $i = 1, \dots, N$.

F. Lindsten, M. I. Jordan and T. B. Schön, "Ancestor sampling for particle Gibbs", in Proceedings of the 2012 Conference on Neural Information Processing Systems (NIPS), 2012 (accepted).
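A sketch of CPF-AS under the same bootstrap assumptions as the particle filter sketch above; trans_logpdf(x, xs) is a hypothetical helper returning $\log p(x \mid \theta, x^i_{t-1})$ for each particle. The algorithm's steps appear as comments, and the final backtracking extracts one path with probability proportional to $w^i_T$ (step (ii) of the theorem on the next slide).

```python
import numpy as np

def cpf_as(y, x_ref, f, lik, trans_logpdf, Q, N, rng):
    """Conditional PF with ancestor sampling, bootstrap-proposal sketch.
    Particle N is fixed to the reference path x_ref throughout."""
    T = len(y)
    x = np.zeros((T, N))
    anc = np.zeros((T, N), dtype=int)
    x[0, :-1] = np.sqrt(Q) * rng.standard_normal(N - 1)  # assumes x_1 ~ N(0, Q)
    x[0, -1] = x_ref[0]                                  # 1(a): conditioned particle
    w = lik(y[0], x[0]); w = w / w.sum()
    for t in range(1, T):
        anc[t, :-1] = rng.choice(N, size=N - 1, p=w)     # 2(a): free particles
        x[t, :-1] = f(x[t - 1, anc[t, :-1]]) + np.sqrt(Q) * rng.standard_normal(N - 1)
        x[t, -1] = x_ref[t]                              # 2(a): conditioned particle
        p_anc = w * np.exp(trans_logpdf(x_ref[t], x[t - 1]))  # 2(b): ancestor sampling
        anc[t, -1] = rng.choice(N, p=p_anc / p_anc.sum())
        w = lik(y[t], x[t]); w = w / w.sum()             # 2(c): reweight
    j = rng.choice(N, p=w)               # extract one path, P(i) propto w_T^i
    path = np.empty(T)
    path[-1] = x[-1, j]
    for t in range(T - 1, 0, -1):        # backtrack through the ancestry
        j = anc[t, j]
        path[t - 1] = x[t - 1, j]
    return path
```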

Conditional PF with ancestor sampling

Theorem
For any $N \geq 2$, the procedure
(i) run CPF-AS($x^\star_{1:T}$);
(ii) sample $x'_{1:T}$ with $P(x'_{1:T} = x^i_{1:T}) \propto w^i_T$;
defines a Markov kernel on $\mathsf{X}^T$ which leaves $p(x_{1:T} \mid \theta, y_{1:T})$ invariant.

Proof. Ask me later. . .


CPF vs. CPF-AS

[Figure: two panels, CPF and CPF-AS; particle paths, state versus time.]

Particle Gibbs with ancestor sampling

Bayesian identification: Gibbs + CPF-AS = PG-AS

Algorithm: PG-AS (particle Gibbs with ancestor sampling)
1. Initialize: Set $\{\theta[0], x_{1:T}[0]\}$ arbitrarily.
2. For $r \geq 1$, iterate:
   (a) Draw $\theta[r] \sim p(\theta \mid x_{1:T}[r-1], y_{1:T})$.
   (b) Run CPF-AS($x_{1:T}[r-1]$), targeting $p(x_{1:T} \mid \theta[r], y_{1:T})$.
   (c) Sample with $P(x_{1:T}[r] = x^i_{1:T}) \propto w^i_T$.

For any number of particles $N \geq 2$, the Markov chain $\{\theta[r], x_{1:T}[r]\}_{r \geq 1}$ has stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.
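The outer loop is a direct composition of the two kernels. A skeleton, where sample_theta and run_cpf_as are model-specific stand-ins; the latter could wrap the cpf_as sketch above with the current $\theta$ baked into $f$, lik and trans_logpdf.

```python
def pg_as(y, theta0, x_ref0, n_iters, sample_theta, run_cpf_as, rng):
    """PG-AS outer loop: alternate an exact theta draw with one
    CPF-AS sweep that returns a new reference path (steps (b)+(c))."""
    theta, x_ref = theta0, x_ref0
    trace = []
    for r in range(n_iters):
        theta = sample_theta(x_ref, y, rng)       # (a): theta[r] ~ p(theta | x, y)
        x_ref = run_cpf_as(y, x_ref, theta, rng)  # (b)+(c): new reference path
        trace.append(theta)
    return trace
```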

ex) Stochastic volatility

PG-AS for the stochastic volatility model,
$$x_{t+1} = \theta_1 x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta_2), \qquad y_t = e_t \exp\left(\tfrac{1}{2} x_t\right), \quad e_t \sim \mathcal{N}(0, 1).$$

[Figure: trace plots of $\theta_1$ and $\theta_2$ against iteration number, for N = 5, 20, 100, 1000, 5000.]

[Figure: posterior density estimates of $\theta_1$ and $\theta_2$, for N = 5, 20, 100, 1000, 5000.]
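For reference, simulating data from this model takes a few lines; the parameter values below are illustrative, not those behind the plots.

```python
import numpy as np

# Stochastic volatility model from the slide:
# x_{t+1} = theta1 * x_t + v_t, v_t ~ N(0, theta2),
# y_t = e_t * exp(x_t / 2),     e_t ~ N(0, 1).
theta1, theta2, T = 0.9, 0.1, 1000   # illustrative values
rng = np.random.default_rng(3)
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = theta1 * x[t] + np.sqrt(theta2) * rng.standard_normal()
y = rng.standard_normal(T) * np.exp(x / 2)
```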

Maximum likelihood inference

Back to the frequentist objective,
$$\hat\theta_{\mathrm{ML}} = \arg\max_\theta\, p_\theta(y_{1:T}).$$

Expectation maximization (EM). Iterate;
(E) $Q(\theta, \theta[r-1]) = \mathrm{E}_{\theta[r-1]}\left[\log p_\theta(x_{1:T}, y_{1:T}) \mid y_{1:T}\right]$;
(M) $\theta[r] = \arg\max_\theta Q(\theta, \theta[r-1])$.

Problem: The E-step requires us to solve a smoothing problem, i.e. to compute an expectation under $p_\theta(x_{1:T} \mid y_{1:T})$.


Particle smoother EM

Idea: Use a particle smoother (PS) for the E-step.
$$p_{\theta'}(x_{1:T} \mid y_{1:T}) \approx \frac{1}{N} \sum_{j=1}^N \delta_{\tilde{x}^j_{1:T}}(x_{1:T}).$$
The E-step is approximated with
$$\widehat{Q}_N(\theta, \theta') \triangleq \frac{1}{N} \sum_{j=1}^N \log p_\theta(\tilde{x}^j_{1:T}, y_{1:T}).$$

Problems with PS-EM

Problems with PS-EM,
• Doubly asymptotic – requires $N \to \infty$ and $R \to \infty$ simultaneously to converge.
• Relies on large $N$ to be successful.
• A lot of wasted computations.

Stochastic approximation EM

Assume for the time being that we can sample from $p_\theta(x_{1:T} \mid y_{1:T})$.

Stochastic approximation EM (SAEM): Replace the E-step with
$$\widehat{Q}_r(\theta) = \widehat{Q}_{r-1}(\theta) + \gamma_r \left( \frac{1}{M} \sum_{j=1}^M \log p_\theta(\tilde{x}^j_{1:T}, y_{1:T}) - \widehat{Q}_{r-1}(\theta) \right),$$
where $\tilde{x}^j_{1:T} \overset{\text{i.i.d.}}{\sim} p_\theta(x_{1:T} \mid y_{1:T})$ for $j = 1, \dots, M$.

SAEM converges to a maximum of $p_\theta(y_{1:T})$ for any $M \geq 1$ under standard stochastic approximation conditions.

B. Delyon, M. Lavielle and E. Moulines, "Convergence of a stochastic approximation version of the EM algorithm", The Annals of Statistics, 27:94–128, 1999.
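In practice, when the complete-data likelihood belongs to an exponential family, the $\widehat{Q}_r$ recursion is usually carried on sufficient statistics rather than on the function itself. A hedged sketch under that assumption, with draw_states, suff_stats and argmax_theta as hypothetical model-specific routines and $\gamma_r = 1/r$ as one standard step-size choice:

```python
import numpy as np

def saem(y, theta0, draw_states, suff_stats, argmax_theta, n_iters, rng, M=1):
    """SAEM sketch for an exponential-family complete-data likelihood:
    the Q-recursion reduces to S_r = S_{r-1} + gamma_r*(mean stats - S_{r-1}),
    followed by a closed-form M-step theta[r] = argmax_theta(S_r)."""
    theta, S = theta0, None
    for r in range(1, n_iters + 1):
        gamma = 1.0 / r     # satisfies sum gamma_r = inf, sum gamma_r^2 < inf
        stats = np.mean(
            [suff_stats(draw_states(theta, y, rng), y) for _ in range(M)], axis=0)
        S = stats if S is None else S + gamma * (stats - S)
        theta = argmax_theta(S)   # M-step from the accumulated statistics
    return theta
```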


Stochastic approximation EM

Bad news: We cannot sample from $p_\theta(x_{1:T} \mid y_{1:T})$.

Good news: It is enough to sample from a uniformly ergodic Markov kernel leaving $p_\theta(x_{1:T} \mid y_{1:T})$ invariant.

We can use CPF-AS to sample the states!


SAEM-AS

Maximum likelihood identification: SAEM + CPF-AS = SAEM-AS

Algorithm: SAEM-AS (particle SAEM with ancestor sampling)
1. Initialize: Set $\{\theta[0], x_{1:T}[0]\}$ arbitrarily.
2. For $r \geq 1$, iterate:
   (a) Run CPF-AS($x_{1:T}[r-1]$), targeting $p(x_{1:T} \mid \theta[r-1], y_{1:T})$.
   (b) Compute
   $$\widehat{Q}_r(\theta) = \widehat{Q}_{r-1}(\theta) + \gamma_r \left( \sum_{i=1}^N w^i_T \log p_\theta(x^i_{1:T}, y_{1:T}) - \widehat{Q}_{r-1}(\theta) \right).$$
   (c) Compute $\theta[r] = \arg\max_\theta \widehat{Q}_r(\theta)$.
   (d) Sample with $P(x_{1:T}[r] = x^i_{1:T}) \propto w^i_T$.
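Combining the pieces, a skeleton of this loop under the same sufficient-statistics assumption as the SAEM sketch above. Here run_cpf_as is assumed to return all $N$ weighted paths rather than a single draw, and weighted_stats forms the $w_T$-weighted statistics corresponding to step (b); all three callables are hypothetical stand-ins.

```python
def saem_as(y, theta0, x_ref0, run_cpf_as, weighted_stats, argmax_theta,
            n_iters, rng):
    """SAEM-AS sketch: exact smoothing draws are replaced by one CPF-AS
    sweep; all N weighted paths enter the stochastic approximation."""
    theta, x_ref, S = theta0, x_ref0, None
    for r in range(1, n_iters + 1):
        paths, w = run_cpf_as(y, x_ref, theta, rng)  # (a): one CPF-AS sweep
        gamma = 1.0 / r
        stats = weighted_stats(paths, w, y)          # (b): sum_i w_T^i * stats(path i)
        S = stats if S is None else S + gamma * (stats - S)
        theta = argmax_theta(S)                      # (c): M-step
        x_ref = paths[rng.choice(len(w), p=w)]       # (d): new reference path
    return theta
```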

ex) Stochastic volatility

SAEM-AS with $N = 10$ for the stochastic volatility model,
$$x_{t+1} = \theta_1 x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta_2), \qquad y_t = e_t \exp\left(\tfrac{1}{2} x_t\right), \quad e_t \sim \mathcal{N}(0, 1).$$

[Figure: trace plots of $\theta_1$ and $\theta_2$ against iteration number.]

Conclusions

Conclusions,
• Conditional particle filters are useful for identification!
• CPF-AS defines a kernel on $\mathsf{X}^T$ leaving $p_\theta(x_{1:T} \mid y_{1:T})$ invariant.
• CPF-AS consists of two parts:
  – Conditioning: ensures the correct stationary distribution for any $N$.
  – Ancestor sampling: mitigates path degeneracy and enables movement around the conditioned path.
• PG-AS for Bayesian inference and SAEM-AS for maximum likelihood inference. Both work with few particles.

Future work includes finding. . .
• . . . stronger ergodicity results for CPF-AS (uniformly ergodic?).
• . . . exact conditions for almost sure convergence of SAEM-AS.
