**Conditional particle filters for** **system identification**

**Fredrik Lindsten**

Division of Automatic Control, Linköping University, Sweden

**Identification of state-space models**

Consider a nonlinear, discrete-time state-space model,

$$x_{t+1} = f_t(x_t; \theta) + v_t(\theta), \qquad y_t = h_t(x_t; \theta) + e_t(\theta).$$

We observe $y_{1:T} = \{y_1, \ldots, y_T\}$.

**Aim: Estimate $\theta$ given $y_{1:T}$.**

**Latent variable model**

**Frequentists:**

• Find $\hat{\theta}_{\text{ML}} = \arg\max_\theta p_\theta(y_{1:T})$.
• Use e.g. the EM algorithm.

**Bayesians:**

• Find $p(\theta \mid y_{1:T})$.
• Use e.g. Gibbs sampling.

In both cases the states $x_{1:T}$ are latent variables, and the methods alternate between updating $\theta$ and updating $x_{1:T}$.

**Invariant distribution**

Definition. Let $K(y \mid x)$ be a Markovian transition kernel and let $\pi(x)$ be some probability density function. $K$ is said to leave $\pi$ invariant if

$$\int K(y \mid x)\,\pi(x)\,dx = \pi(y).$$

• A Markov chain $\{X_n\}_{n \ge 1}$ with transition kernel $K$ has $\pi$ as stationary distribution.
• If the chain is ergodic, then $\pi$ is its limiting distribution.

**ex) Invariant distribution**

[Figures: simulated realizations of a Markov chain, $x_t$ over $t = 1, \ldots, 500$ and over $t = 1, \ldots, 10000$, together with the corresponding empirical densities; as the chain runs longer, the empirical density settles at the invariant distribution.]
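The plots themselves are not recoverable from the extraction, but the phenomenon they illustrate is easy to reproduce. A minimal sketch in Python, assuming (my choice, not necessarily the model behind the plots) an AR(1) chain $x_{t+1} = a x_t + v_t$ with $v_t \sim \mathcal{N}(0, q)$, whose invariant distribution is $\mathcal{N}(0, q/(1-a^2))$:

```python
import random, math

# AR(1) chain x_{t+1} = a*x_t + v_t, v_t ~ N(0, q); its invariant
# distribution is N(0, q / (1 - a^2)). Simulate and check the variance.
def simulate(a, q, n, seed=0):
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
    return xs

xs = simulate(a=0.9, q=1.0, n=200000)
xs = xs[1000:]                          # discard burn-in
var = sum(v * v for v in xs) / len(xs)  # empirical variance
# compare with the theoretical value q / (1 - a^2) = 1 / 0.19
```

Running the chain longer makes the empirical moments match the invariant law more and more closely, mirroring the two simulation lengths on the slide.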

**Markov chain Monte Carlo**

**Objective: Sample from an intractable target distribution $\pi(x)$.**

Markov chain Monte Carlo (MCMC):

• Find a kernel $K(y \mid x)$ which leaves $\pi(x)$ invariant (key step).
• Sample a Markov chain with kernel $K$.
• Discard the transient (burn-in) and use the sample path as approximate samples from $\pi$.

**Gibbs sampler**

**ex) Sample from,**

$$\mathcal{N}\!\left(\begin{bmatrix} x \\ y \end{bmatrix};\; \begin{bmatrix} 10 \\ 10 \end{bmatrix},\; \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}\right).$$

Gibbs sampler:

• Draw $x' \sim \pi(x \mid y)$;
• Draw $y' \sim \pi(y \mid x')$.

[Figure: the resulting Gibbs trajectory in the $(X, Y)$-plane, moving one coordinate at a time.]
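A compact sketch of this two-step sweep in Python (standard library only); the conditional distributions below follow from the usual Gaussian conditioning formulas applied to the stated mean and covariance:

```python
import random, math

# Gibbs sweep for the bivariate Gaussian above: mean (10, 10),
# covariance [[2, 1], [1, 1]]. Gaussian conditioning gives
#   x | y  ~ N(10 + (1/1)*(y - 10), 2 - 1/1)   = N(y, 1)
#   y | x' ~ N(10 + (1/2)*(x' - 10), 1 - 1/2)  = N(10 + 0.5*(x' - 10), 0.5)
def gibbs(n_iter, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0                      # arbitrary starting point
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(y, 1.0)                                # x' ~ pi(x | y)
        y = rng.gauss(10 + 0.5 * (x - 10), math.sqrt(0.5))   # y' ~ pi(y | x')
        samples.append((x, y))
    return samples

samples = gibbs(20000)[1000:]            # discard burn-in
mx = sum(s[0] for s in samples) / len(samples)
my = sum(s[1] for s in samples) / len(samples)
```

After the burn-in, the empirical means approach $(10, 10)$ even though each update only moves one coordinate at a time.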

**Gibbs sampler for SSMs**

**Aim: Find $p(\theta, x_{1:T} \mid y_{1:T})$.** Alternate between updating $\theta$ and updating $x_{1:T}$.

**MCMC: Gibbs sampling for state-space models. Iterate,**

• Draw $\theta[r] \sim p(\theta \mid x_{1:T}[r-1], y_{1:T})$;
• Draw $x_{1:T}[r] \sim p(x_{1:T} \mid \theta[r], y_{1:T})$.

The above procedure results in a Markov chain, $\{\theta[r], x_{1:T}[r]\}_{r \ge 1}$, with stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.
**Linear Gaussian state-space model**

**ex) Gibbs sampling for linear system identification.**

$$\begin{bmatrix} x_{t+1} \\ y_t \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} v_t \\ e_t \end{bmatrix}.$$

Iterate,

• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$;
• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$.

[Figure: Bode plot (magnitude and phase vs. frequency) comparing the true system with the posterior mean and a 95% credibility region.]

**Gibbs sampler for general SSM?**

What about the general nonlinear/non-Gaussian case?

• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$; **OK!**
• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$. **Hard!**

**Problem:** $p(x_{1:T} \mid \theta, y_{1:T})$ is not available!

**Idea: Approximate $p(x_{1:T} \mid \theta, y_{1:T})$ using a particle filter.**

**The particle filter**

• Resampling: $\{x^i_{1:t-1}, w^i_{t-1}\}_{i=1}^N \to \{\tilde{x}^i_{1:t-1}, 1/N\}_{i=1}^N$.
• Propagation: $x^i_t \sim R_t^\theta(dx_t \mid \tilde{x}^i_{1:t-1})$ and $x^i_{1:t} = \{\tilde{x}^i_{1:t-1}, x^i_t\}$.
• Weighting: $w^i_t = W_t^\theta(x^i_{1:t})$.

$\Rightarrow \{x^i_{1:t}, w^i_t\}_{i=1}^N$

The cycle repeats: weighting, resampling, propagation, weighting, resampling, . . .

**The particle filter**

Equivalently, resampling and propagation can be combined into a single step:

• Resampling + Propagation: $(a^i_t, x^i_t) \sim M_t^\theta(a_t, x_t) = \dfrac{w^{a_t}_{t-1}}{\sum_l w^l_{t-1}}\, R_t^\theta(x_t \mid x^{a_t}_{1:t-1})$.
• Weighting: $w^i_t = W_t^\theta(x^i_{1:t})$.

$\Rightarrow \{x^i_{1:t}, w^i_t\}_{i=1}^N$

**The particle filter**

**Algorithm: Particle filter (PF)**

1. **Initialize ($t = 1$):**
   (a) Draw $x^i_1 \sim R^\theta_1(x_1)$ for $i = 1, \ldots, N$.
   (b) Set $w^i_1 = W^\theta_1(x^i_1)$ for $i = 1, \ldots, N$.
2. **For $t = 2, \ldots, T$:**
   (a) Draw $(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t)$ for $i = 1, \ldots, N$.
   (b) Set $x^i_{1:t} = \{x^{a^i_t}_{1:t-1}, x^i_t\}$ and $w^i_t = W^\theta_t(x^i_{1:t})$ for $i = 1, \ldots, N$.
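To make the notation concrete, here is a minimal bootstrap instance of the algorithm in Python: the proposal $R_t^\theta$ is taken to be the state transition density and $W_t^\theta$ the observation likelihood. The model is the stochastic volatility model from the later examples; all function and variable names are mine, and normalizing constants are dropped from the weights.

```python
import random, math

def bootstrap_pf(y, theta1, theta2, N, seed=0):
    """Bootstrap PF for x_{t+1} = theta1*x_t + v_t, v_t ~ N(0, theta2),
    y_t | x_t ~ N(0, exp(x_t)). Proposal R_t = transition density,
    weight W_t = observation likelihood (constants dropped)."""
    rng = random.Random(seed)
    lik = lambda yt, xt: math.exp(-0.5 * (xt + yt * yt * math.exp(-xt)))
    # initialize (t = 1)
    x = [rng.gauss(0.0, math.sqrt(theta2)) for _ in range(N)]
    w = [lik(y[0], xi) for xi in x]
    paths = [[xi] for xi in x]
    for t in range(1, len(y)):
        # resampling: ancestor indices drawn with prob. proportional to w_{t-1}
        anc = rng.choices(range(N), weights=w, k=N)
        # propagation: x_t^i ~ R_t(. | x_{1:t-1}^{a_t^i})
        x = [rng.gauss(theta1 * paths[a][-1], math.sqrt(theta2)) for a in anc]
        paths = [paths[a] + [xi] for a, xi in zip(anc, x)]
        # weighting: w_t^i = W_t(x_{1:t}^i)
        w = [lik(y[t], xi) for xi in x]
    return paths, w

# quick run on simulated data from the same model
rng = random.Random(42)
ys, xcur = [], 0.0
for _ in range(50):
    ys.append(rng.gauss(0.0, math.exp(0.5 * xcur)))
    xcur = 0.9 * xcur + rng.gauss(0.0, math.sqrt(0.1))
paths, w = bootstrap_pf(ys, 0.9, 0.1, N=100)
```

The output is exactly the weighted particle system $\{x^i_{1:t}, w^i_t\}_{i=1}^N$ of the algorithm above.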

**The particle filter**

[Figure: particle trajectories (state vs. time) produced by the PF.]

**Sampling based on the PF**

• With $P(x'_{1:T} = x^i_{1:T}) \propto w^i_T$ we get $x'_{1:T} \overset{\text{approx.}}{\sim} p(x_{1:T} \mid \theta, y_{1:T})$.

[Figure: particle trajectories (state vs. time) with one path sampled according to the final weights.]

**Problems**

Problems with this approach,

• Based on a PF $\Rightarrow$ approximate sample.
• Does not leave $p(\theta, x_{1:T} \mid y_{1:T})$ invariant!
• Relies on large $N$ to be successful.
• A lot of wasted computations.

To get around these problems, use a conditional particle filter (CPF): one prespecified path is retained throughout the sampler.

C. Andrieu, A. Doucet and R. Holenstein, "Particle Markov chain Monte Carlo methods", Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.

**Conditional PF with ancestor sampling**

**Algorithm: CPF with ancestor sampling (CPF-AS), conditioned on $x^\star_{1:T}$**

1. **Initialize ($t = 1$):**
   (a) Draw $x^i_1 \sim R^\theta_1(x_1)$ for $i \neq N$ and set $x^N_1 = x^\star_1$.
   (b) Set $w^i_1 = W^\theta_1(x^i_1)$ for $i = 1, \ldots, N$.
2. **For $t = 2, \ldots, T$:**
   (a) Draw $(a^i_t, x^i_t) \sim M^\theta_t(a_t, x_t)$ for $i \neq N$ and set $x^N_t = x^\star_t$.
   (b) Draw $a^N_t$ with $P(a^N_t = i) \propto w^i_{t-1}\, p(x^\star_t \mid \theta, x^i_{t-1})$.
   (c) Set $x^i_{1:t} = \{x^{a^i_t}_{1:t-1}, x^i_t\}$ and $w^i_t = W^\theta_t(x^i_{1:t})$ for $i = 1, \ldots, N$.

F. Lindsten, M. I. Jordan and T. B. Schön, "Ancestor sampling for particle Gibbs", in Proceedings of the 2012 Conference on Neural Information Processing Systems (NIPS), accepted.
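A hedged Python sketch of CPF-AS for the stochastic volatility model used in the examples (bootstrap proposal, constants dropped from the weights, names of my choosing). The only changes relative to a plain bootstrap PF are pinning particle $N$ to the reference path and the ancestor-sampling draw in step 2(b):

```python
import random, math

def cpf_as(y, x_ref, theta1, theta2, N, seed=0):
    """CPF-AS sketch for x_{t+1} = theta1*x_t + v_t, v_t ~ N(0, theta2),
    y_t | x_t ~ N(0, exp(x_t)), with a bootstrap proposal. Particle N is
    pinned to the reference path x_ref; its ancestor is resampled.
    Returns one path drawn with P(x'_{1:T} = x^i_{1:T}) prop. to w_T^i."""
    rng = random.Random(seed)
    lik = lambda yt, xt: math.exp(-0.5 * (xt + yt * yt * math.exp(-xt)))
    trans = lambda xn, xp: math.exp(-0.5 * (xn - theta1 * xp) ** 2 / theta2)
    x = [rng.gauss(0.0, math.sqrt(theta2)) for _ in range(N - 1)] + [x_ref[0]]
    w = [lik(y[0], xi) for xi in x]
    paths = [[xi] for xi in x]
    for t in range(1, len(y)):
        anc = rng.choices(range(N), weights=w, k=N - 1)          # i != N
        xnew = [rng.gauss(theta1 * paths[a][-1], math.sqrt(theta2)) for a in anc]
        # ancestor sampling: P(a_t^N = i) prop. to w_{t-1}^i p(x_ref[t] | x_{t-1}^i)
        asw = [wi * trans(x_ref[t], p[-1]) for wi, p in zip(w, paths)]
        anc.append(rng.choices(range(N), weights=asw, k=1)[0])
        xnew.append(x_ref[t])                                     # x_t^N = x_ref[t]
        paths = [paths[a] + [xi] for a, xi in zip(anc, xnew)]
        w = [lik(y[t], xi) for xi in xnew]
    return paths[rng.choices(range(N), weights=w, k=1)[0]]

# one sweep, conditioning on an arbitrary reference trajectory
y_demo = [0.3, -0.5, 1.2, 0.1, -0.8]
out = cpf_as(y_demo, [0.0] * 5, theta1=0.9, theta2=0.1, N=5)
```

Without step 2(b) the conditioned path would keep its original ancestry; resampling the ancestor is what lets the surviving paths break away from $x^\star_{1:T}$.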

**Conditional PF with ancestor sampling**

Theorem. For any $N \ge 2$, the procedure:

(i) Run CPF-AS($x^\star_{1:T}$);
(ii) Sample with $P(x'_{1:T} = x^i_{1:T}) \propto w^i_T$;

defines a Markov kernel on $\mathsf{X}^T$ which leaves $p(x_{1:T} \mid \theta, y_{1:T})$ invariant.

Proof. Ask me later. . .

**CPF vs. CPF-AS**

[Figures: particle genealogies (state vs. time, $T = 50$) for the CPF and for CPF-AS.]

**Particle Gibbs with ancestor sampling**

**Bayesian identification: Gibbs + CPF-AS = PG-AS**

**Algorithm: PG-AS, Particle Gibbs with ancestor sampling**

1. **Initialize:** Set $\{\theta[0], x_{1:T}[0]\}$ arbitrarily.
2. **For $r \ge 1$, iterate:**
   (a) Draw $\theta[r] \sim p(\theta \mid x_{1:T}[r-1], y_{1:T})$.
   (b) Run CPF-AS($x_{1:T}[r-1]$), targeting $p(x_{1:T} \mid \theta[r], y_{1:T})$.
   (c) Sample with $P(x_{1:T}[r] = x^i_{1:T}) \propto w^i_T$.

For any number of particles $N \ge 2$, the Markov chain $\{\theta[r], x_{1:T}[r]\}_{r \ge 1}$ has stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.
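A self-contained sketch of the PG-AS loop on a toy linear Gaussian model of my choosing (not one of the slide examples): $x_{t+1} = a x_t + v_t$, $y_t = x_t + e_t$ with unit noise variances, so that with a flat prior the draw of $\theta = a$ given the states is a standard Gaussian linear-regression posterior. The inner CPF-AS uses a bootstrap proposal.

```python
import random, math

rng = random.Random(1)

def cpf_as(y, x_ref, a, N):
    """Bootstrap CPF-AS for x_{t+1} = a*x_t + v_t, y_t = x_t + e_t (unit noise).
    Particle N is the reference path; its ancestor is drawn with probability
    proportional to w_{t-1}^i * N(x_ref[t]; a*x_{t-1}^i, 1)."""
    T = len(y)
    x = [rng.gauss(0.0, 1.0) for _ in range(N - 1)] + [x_ref[0]]
    w = [math.exp(-0.5 * (y[0] - xi) ** 2) for xi in x]
    paths = [[xi] for xi in x]
    for t in range(1, T):
        anc = rng.choices(range(N), weights=w, k=N - 1)
        xnew = [rng.gauss(a * paths[j][-1], 1.0) for j in anc]
        asw = [wi * math.exp(-0.5 * (x_ref[t] - a * p[-1]) ** 2)
               for wi, p in zip(w, paths)]
        anc.append(rng.choices(range(N), weights=asw, k=1)[0])
        xnew.append(x_ref[t])
        paths = [paths[j] + [xi] for j, xi in zip(anc, xnew)]
        w = [math.exp(-0.5 * (y[t] - xi) ** 2) for xi in xnew]
    return paths[rng.choices(range(N), weights=w, k=1)[0]]

# simulate data with true a = 0.8
a_true, T = 0.8, 100
xcur, ys = 0.0, []
for _ in range(T):
    ys.append(xcur + rng.gauss(0.0, 1.0))
    xcur = a_true * xcur + rng.gauss(0.0, 1.0)

# PG-AS: alternate a conjugate draw of theta = a with a CPF-AS state update;
# flat prior => p(a | x_{1:T}) = N(sum x_t x_{t+1} / S, 1/S), S = sum x_t^2.
a, x_ref, trace = 0.0, list(ys), []
for r in range(150):
    S = sum(v * v for v in x_ref[:-1]) + 1e-9
    m = sum(x_ref[t] * x_ref[t + 1] for t in range(T - 1)) / S
    a = rng.gauss(m, 1.0 / math.sqrt(S))      # (a) theta[r] ~ p(theta | x, y)
    x_ref = cpf_as(ys, x_ref, a, N=10)        # (b)+(c) CPF-AS + path draw
    trace.append(a)
post_mean = sum(trace[50:]) / len(trace[50:])  # posterior mean of a
```

Note how few particles the inner filter needs ($N = 10$ here); that is the practical payoff of ancestor sampling claimed on the later slides.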

**ex) Stochastic volatility**

PG-AS for the stochastic volatility model,

$$x_{t+1} = \theta_1 x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta_2), \qquad y_t = e_t \exp\!\left(\tfrac{1}{2} x_t\right), \quad e_t \sim \mathcal{N}(0, 1).$$

[Figure: trace plots of $\theta_1$ and $\theta_2$ over 1000 iterations, for $N = 5, 20, 100, 1000, 5000$.]

[Figure: estimated posterior densities of $\theta_1$ and $\theta_2$, for $N = 5, 20, 100, 1000, 5000$.]

**Maximum likelihood inference**

Back to the frequentist objective,

$$\hat{\theta}_{\text{ML}} = \arg\max_\theta\, p_\theta(y_{1:T}).$$

Expectation maximization (EM). Iterate;

**(E)** $Q(\theta, \theta[r-1]) = \mathrm{E}_{\theta[r-1]}\!\left[\log p_\theta(x_{1:T}, y_{1:T}) \mid y_{1:T}\right]$;
**(M)** $\theta[r] = \arg\max_\theta Q(\theta, \theta[r-1])$.

**Problem: The E-step requires us to solve a smoothing problem, i.e. to compute an expectation under $p_\theta(x_{1:T} \mid y_{1:T})$.**

**Particle smoother EM**

**Idea: Use a particle smoother (PS) for the E-step.**

$$p_{\theta'}(x_{1:T} \mid y_{1:T}) \approx \frac{1}{N} \sum_{j=1}^{N} \delta_{\tilde{x}^j_{1:T}}(x_{1:T}).$$

The E-step is approximated with

$$\widehat{Q}^N(\theta, \theta') \triangleq \frac{1}{N} \sum_{j=1}^{N} \log p_\theta(\tilde{x}^j_{1:T}, y_{1:T}).$$

**Problems with PS-EM**

Problems with PS-EM,

• Doubly asymptotic – requires $N \to \infty$ and $R \to \infty$ (the number of EM iterations) simultaneously to converge.
• Relies on large $N$ to be successful.
• A lot of wasted computations.

**Stochastic approximation EM**

Assume for the time being that we can sample from $p_\theta(x_{1:T} \mid y_{1:T})$.

**Stochastic approximation EM (SAEM): Replace the E-step with,**

$$\widehat{Q}_r(\theta) = \widehat{Q}_{r-1}(\theta) + \gamma_r \left( \frac{1}{M} \sum_{j=1}^{M} \log p_\theta(\tilde{x}^j_{1:T}, y_{1:T}) - \widehat{Q}_{r-1}(\theta) \right),$$

where $\tilde{x}^j_{1:T} \overset{\text{i.i.d.}}{\sim} p_\theta(x_{1:T} \mid y_{1:T})$ for $j = 1, \ldots, M$.

SAEM converges to a maximum of $p_\theta(y_{1:T})$ for any $M \ge 1$ under standard stochastic approximation conditions.

B. Delyon, M. Lavielle and E. Moulines, "Convergence of a stochastic approximation version of the EM algorithm", The Annals of Statistics, 27:94-128, 1999.
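To see the recursion at work, here is a deliberately simple Python sketch on a model of my choosing where exact sampling from $p_\theta(x_{1:T} \mid y_{1:T})$ is possible, as the slide assumes: latent $x_t \sim \mathcal{N}(\theta, 1)$ and $y_t = x_t + e_t$, $e_t \sim \mathcal{N}(0, 1)$. Since the complete-data maximizer is $\theta = \operatorname{mean}(x_{1:T})$, the stochastic approximation can be run directly on that sufficient statistic.

```python
import random, math

# SAEM sketch: x_t ~ N(theta, 1) latent, y_t = x_t + e_t, e_t ~ N(0, 1).
# Here p_theta(x | y) is tractable: x_t | y_t ~ N((theta + y_t)/2, 1/2),
# so we can draw exact samples. M-step: argmax of Q_r is the running
# statistic S (complete-data MLE is the mean of the sampled states).
rng = random.Random(0)
theta_true = 2.0
y = [theta_true + rng.gauss(0.0, math.sqrt(2.0)) for _ in range(500)]

theta, S = 0.0, 0.0
M = 1                                   # any M >= 1 suffices for convergence
for r in range(1, 1001):
    gamma = 1.0 / r                     # standard stochastic approximation steps
    draws = []
    for _ in range(M):                  # x~_j ~ p_theta(x_{1:T} | y_{1:T})
        x = [rng.gauss(0.5 * (theta + yt), math.sqrt(0.5)) for yt in y]
        draws.append(sum(x) / len(x))
    S += gamma * (sum(draws) / M - S)   # SA update of the sufficient statistic
    theta = S                           # M-step

mle = sum(y) / len(y)                   # here the ML estimate is mean(y)
```

Even with $M = 1$ sample per iteration, the $\gamma_r$-averaging drives $\theta$ toward the maximum likelihood estimate, which is the point of the Delyon, Lavielle and Moulines result cited above.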


**Stochastic approximation EM**

• **Bad news:** We cannot sample from $p_\theta(x_{1:T} \mid y_{1:T})$.
• **Good news:** It is enough to sample from a uniformly ergodic Markov kernel leaving $p_\theta(x_{1:T} \mid y_{1:T})$ invariant.

We can use CPF-AS to sample the states!

**SAEM-AS**

**Maximum likelihood identification: SAEM + CPF-AS = SAEM-AS**

**Algorithm: SAEM-AS, Particle SAEM with ancestor sampling**

1. **Initialize:** Set $\{\theta[0], x_{1:T}[0]\}$ arbitrarily.
2. **For $r \ge 1$, iterate:**
   (a) Run CPF-AS($x_{1:T}[r-1]$), targeting $p(x_{1:T} \mid \theta[r-1], y_{1:T})$.
   (b) Compute $\widehat{Q}_r(\theta) = \widehat{Q}_{r-1}(\theta) + \gamma_r \left( \sum_{i=1}^{N} w^i_T \log p_\theta(x^i_{1:T}, y_{1:T}) - \widehat{Q}_{r-1}(\theta) \right)$.
   (c) Compute $\theta[r] = \arg\max_\theta \widehat{Q}_r(\theta)$.
   (d) Sample with $P(x_{1:T}[r] = x^i_{1:T}) \propto w^i_T$.

**ex) Stochastic volatility**

SAEM-AS with $N = 10$ for the stochastic volatility model,

$$x_{t+1} = \theta_1 x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta_2), \qquad y_t = e_t \exp\!\left(\tfrac{1}{2} x_t\right), \quad e_t \sim \mathcal{N}(0, 1).$$

[Figure: trace plots of $\theta_1$ and $\theta_2$ over 1000 iterations.]

**Conclusions**

Conclusions,

• Conditional particle filters are useful for identification!
• CPF-AS defines a kernel on $\mathsf{X}^T$ leaving $p_\theta(x_{1:T} \mid y_{1:T})$ invariant.
• CPF-AS consists of two parts:
  • **Conditioning:** Ensures the correct stationary distribution for any $N$.
  • **Ancestor sampling:** Mitigates path degeneracy and enables movement around the conditioned path.
• PG-AS for Bayesian inference and SAEM-AS for maximum likelihood inference. Both work with few particles.

Future work includes finding. . .

• . . . stronger ergodicity results for CPF-AS (uniformly ergodic?).
• . . . exact conditions for almost sure convergence of SAEM-AS.
