Particle Gibbs with Ancestor Sampling

(1)

Particle Gibbs with Ancestor Sampling

Fredrik Lindsten ^? , Michael I. Jordan ^† , Thomas B. Schön ^‡

Chamonix, January 6, 2014

?

Division of Automatic Control Linköping University, Sweden

†

Departments of EECS and Statistics University of California, Berkeley, USA

‡

Department of Information Technology Uppsala University, Sweden

AUTOMATIC CONTROL

(2)

Identification of state-space models 2(24)

Consider a nonlinear discrete-time state-space model,

x _t ∼ f _θ ( x _t | x _t ₋ ₁ ) , y _t ∼ g _θ ( y _t | x _t ) , and x ₁ ∼ π ( x ₁ ) .

We observe y _1:T = ( y ₁ , . . . , y T ) and wish to estimate θ .

• Frequentists: Find ˆθ ML = arg max _θ p _θ ( y _1:T ) . - Use e.g. the Monte Carlo EM algorithm.

• Bayesians: Find p ( θ | y _1:T ) .

- Use e.g. Gibbs sampling.

(3)

Gibbs sampler for SSMs 3(24)

Aim: Find p ( θ, x _1:T | y _1:T ) .

MCMC: Gibbs sampling for state-space models. Iterate,

• Draw θ [ k ] ∼ p ( θ | x _1:T [ k − 1 ] , y _1:T ) ; OK!

• Draw x _1:T [ k ] ∼ p _θ _[ _k _] ( x _1:T | y _1:T ) . Hard!

Problem: p _θ ( x _1:T | y _1:T ) not available!

Idea: Approximate p _θ ( x _1:T | y _1:T ) using a particle filter.

AUTOMATIC CONTROL

(4)

The particle filter 4(24)

• Resampling: P ( a ⁱ _t = j | F _t ^N ₋ ₁ ) = w ^j _t ₋ ₁ / ∑ l w ^l _t ₋ ₁ .

• Propagation: x ⁱ _t ∼ R _t ^θ ( x _t | x ^a _1:t

ⁱ^t

₋ ₁ ) and x ⁱ _1:t = { x ^a _1:t

ⁱ^t

₋ ₁ , x ⁱ _t } .

• Weighting: w ⁱ _t = W _t ^θ ( x ⁱ _1:t ) .

⇒ { x ⁱ _1:t , w ⁱ _t } ^N _i ₌ ₁

Weighting

Resampling Propagation Weighting

Resampling

(5)

The particle filter 5(24)

Algorithm Particle filter (PF) 1. Initialize ( t = 1 ):

(a) Draw x ⁱ ₁ ∼ R ^θ ₁ ( x ₁ ) for i = 1, . . . , N . (b) Set w ⁱ ₁ = W ^θ ₁ ( x ⁱ ₁ ) ^for i = 1, . . . , N ^. 2. for t = 2, . . . , T :

(a) Draw a ⁱ _t ∼ Discrete ({ w ^j _t−1 } ^N _j=1 ) for i = 1, . . . , N . (b) Draw x ⁱ _t ∼ R ^θ _t ( x t | x ^a _1:t−1

ⁱ^t

) for i = 1, . . . , N .

(c) Set x ⁱ _1:t = { x ^a _1:t−1

ⁱ^t

, x ⁱ _t } and w ⁱ _t = W _t ^θ ( x ⁱ _1:t ) for i = 1, . . . , N .

AUTOMATIC CONTROL

(6)

The particle filter 6(24)

5 10 15 20 25

−4

−3

−2

−1 0 1

Time

State

(7)

Sampling based on the PF 7(24) With P ( x ^? _1:T = x ⁱ _1:T | F _T ^N ) ∝ w ⁱ _T ^{we get} x ^? _1:T ^approx. ∼ p _θ ( x _1:T | y _1:T ) .

5 10 15 20 25

−4

−3.5

−3

−2.5

−2

−1.5

−1

−0.5 0 0.5 1

Time

State

AUTOMATIC CONTROL

(8)

Conditional particle filter with ancestor sampling 8(24)

Problems with this approach,

• Based on a PF ⇒ approximate sample.

• Does not leave p ( θ, x _1:T | y _1:T ) invariant!

• Relies on large N to be successful.

• A lot of wasted computations.

Conditional particle filter with ancestor sampling (CPF-AS) Let x ⁰ _1:T = ( x ⁰ ₁ , . . . , x ⁰ _T ) be a fixed reference trajectory.

• At each time t , sample only N − 1 particles in the standard way.

• Set the N th particle deterministically: x ^N _t = x ⁰ _t .

• Generate an artificial history for x ^N _t by ancestor sampling.

(9)

Conditional particle filter with ancestor sampling 9(24)

Algorithm CPF-AS, conditioned on x ⁰ _1:T 1. Initialize ( t = 1 ):

(a) Draw x ⁱ ₁ ∼ R ^θ ₁ ( x 1 ) for i = 1, . . . , N − 1 . (b) Set x ^N ₁ = x ⁰ ₁ .

(c) Set w ⁱ ₁ = W ^θ ₁ ( x ⁱ ₁ ) for i = 1, . . . , N . 2. for t = 2, . . . , T :

(a) Draw a ⁱ _t ∼ Discrete ({ w ^j _t−1 } ^N _j=1 ) for i = 1, . . . , N − 1 . (b) Draw x ⁱ _t ∼ R ^θ _t ( x t | x ^a _1:t−1

ⁱ^t

) for i = 1, . . . , N − 1 . (c) Set x ^N _t = x ⁰ _t .

(d) Draw a ^N _t with P ( a ^N _t = i | F _t−1 ^N ) ∝ w ⁱ _t−1 f _θ ( x ⁰ _t | x ⁱ _t−1 ) . (e) Set x ⁱ _1:t = { x ^a _1:t−1

ⁱ^t

, x ⁱ _t } and w ⁱ _t = W _t ^θ ( x ⁱ _1:t ) for i = 1, . . . , N .

AUTOMATIC CONTROL

(10)

The PGAS Markov kernel (I/II) 10(24)

Consider the procedure:

1. Run CPF-AS ( N, x ⁰ _1:T ) targeting p _θ ( x _1:T | y _1:T ) , 2. Sample x ^? _1:T with P ( x ^? _1:T = x ⁱ _1:T | F _T ^N ) ∝ w ⁱ _T ^.

5 10 15 20 25 30 35 40 45 50

−3

−2

−1 0 1 2 3

Time

State

(11)

The PGAS Markov kernel (II/II) 11(24)

This procedure:

• Maps x ⁰ _1:T stochastically into x ^? _1:T .

• Implicitly defines a Markov kernel ( P ^N _θ ) on ( X ^T , X ^T ) , referred to as the PGAS (Particle Gibbs with ancestor sampling) kernel.

Theorem

For any number of particles N ≥ 1 and for any θ ∈ _Θ , the PGAS kernel P ^N _θ leaves p _θ ( x _1:T | y _1:T ) invariant,

p _θ ( dx ^? _1:T | y _1:T ) =

Z

P ^N _θ ( x _1:T ⁰ , dx ^? _1:T ) p _θ ( dx ⁰ _1:T | y _1:T ) .

F. Lindsten, M. I. Jordan and T. B. Schön, P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger (Eds.), Ancestor Sampling for Particle Gibbs Advances in Neural Information Processing Systems (NIPS) 25, 2600-2608, 2012.

F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor sampling, arXiv:1401.0604, 2014.

AUTOMATIC CONTROL

(12)

Ergodicity 12(24)

Theorem

Assume that there exist constants ε > 0 and κ < ∞ such that, for any θ ∈ _Θ , t ∈ { 1, . . . , T } and x _1:t ∈ X ^t , W _t ^θ ( x _1:t ) ≤ κ and p _θ ( y _1:T ) ≥ ε . ^a

Then, for any N ≥ 2 the PGAS kernel P ^N _θ is uniformly ergodic. That is, there exist constants R < ∞ ^and ρ ∈ [ 0, 1 ) such that

k( P ^N _θ ) ^k ( x _1:T ⁰ , ·) − p _θ ( · | y _1:T )k TV ≤ Rρ ^k , ∀ x _1:T ⁰ ∈ X ^T .

a

N.B. These conditions are simple, but unnecessarily strong; see (Lindsten, Douc, and Moulines. 2014).

F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor sampling, arXiv:1401.0604, 2014.

(13)

PGAS for Bayesian identification 13(24)

Bayesian identification: PGAS + Gibbs Algorithm PGAS for Bayesian identification

1. Initialize: Set { θ [ 0 ] , x _1:T [ 0 ]} arbitrarily.

2. For k ≥ 1 , iterate:

(a) Draw x 1:T [ k ] ∼ P ^N _θ[k−1] ( x 1:T [ k − 1 ] , · ) . (b) Draw θ [ k ] ∼ p ( θ | x 1:T [ k ] , y 1:T ) .

For any number of particles N ≥ 2 , the Markov chain { θ [ k ] , x _1:T [ k ]} _k _≥ ₁ has limiting distribution p ( θ, x _1:T | y _1:T ) .

AUTOMATIC CONTROL

(14)

ex) Stochastic volatility model 14(24)

Stochastic volatility model,

x _t + 1 = 0.9x _t + v _t , v _t ∼ N ( 0, θ ) , y t = e t exp (

¹₂

x t ) , e t ∼ N ( 0, 1 ) . Consider the ACF of θ [ k ] − E [ θ | y _1:T ] .

0 50 100 150 200

0 0.2 0.4 0.6 0.8 1

PG-AS, T = 1000

Lag

ACF

N=5 N=20 N=100 N=1000

0 50 100 150 200

0 0.2 0.4 0.6 0.8 1

PG, T = 1000

Lag

ACF

N=5 N=20 N=100 N=1000

(15)

PGAS vs. PG 15(24)

5 10 15 20 25 30 35 40 45 50

−3

−2

−1 0 1 2 3

Time

State

5 10 15 20 25 30 35 40 45 50

−3

−2

−1 0 1 2 3

Time

State

5 10 15 20 25 30 35 40 45 50

−3

−2

−1 0 1 2 3

Time

State

AUTOMATIC CONTROL

PGAS PG

(16)

Examples 16(24)

• F. Lindsten, T. B. Schön and M. I. Jordan, Bayesian semiparametric Wiener system identification, Automatica, 49(7): 2053-2063, July 2013.

−10 0 10 20

Magnitude(dB)

0 0.5 1 1.5 2 2.5 3

−50 0 50 100

Frequency (rad/s)

Phase(deg)

TruePosterior mean 99 % credibility

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2

−1

−0.8

−0.6

−0.4

−0.2 0 0.2 0.4 0.6 0.8 1

z

h(z)

TruePosterior mean 99 % credibility

• R. Frigola, F. Lindsten, T. B. Schön and C. E. Rasmussen, Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC, Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 2013.

−20 −10 0 10 20

−1

−0.5 0 0.5 1

−20

−10 0 10 20

x u

f(x,u)

10 20 30 40 50 60 70 80 90

−20

−15

−10

−5 0 5 10 15 20

t

x

(17)

Maximum likelihood identification 17(24)

Back to frequentistic objective: ˆθ ML = arg max _θ p θ ( y 1:T ) .

• Common strategy: Particle smoothing + EM (PSEM)

• Alternative strategy: PGAS + stochastic approximation EM (PSAEM)

EM MCEM PSEM

SAEM PSAEM

F. Lindsten, An efficient stochastic approximation EM algorithm using conditional particle filters, Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.

AUTOMATIC CONTROL

(18)

PGAS for maximum likelihood identification 18(24)

Maximum likelihood identification: PGAS + SAEM = PSAEM Algorithm PSAEM

1. Initialize: Set { θ [ 0 ] , x _1:T [ 0 ]} arbitrarily. Set Q b ₀ ≡ 0 . 2. For k ≥ 1 , iterate:

(a) Draw x 1:T [ k ] ∼ P ^N _θ[k−1] ( x 1:T [ k − 1 ] , · ) . (b) Compute

Q b _k ( θ ) = Q b _k−1 ( θ ) + γ _k

log p _θ ( x _1:T [ k ] , y _1:T ) − Q b _k−1 ( θ ) .

(c) Set θ [ k ] = arg max _θ Q b k ( θ ) .

(19)

ex) Nonlinear time series 19(24)

Consider,

x _t + 1 = ax _t + b x t

1 + x ² _t + c cos ( 1.2t ) + v _t , v _t ∼ N ( 0, σ _v ² ) , y _t = 0.05x ² _t + e _t , e _t ∼ N ( 0, σ _e ² ) .

• Parameterization: θ = ( a, b, c, σ _v ² , σ _e ² )

• Relative error: k( θ [ k ] − b θ ML ) ./b θ ML k ₂ .

AUTOMATIC CONTROL

(20)

ex) Nonlinear time series 20(24)

10⁻² 10⁻¹ 10⁰ 10¹ 10² 10³ 10⁴

10⁻³ 10⁻² 10⁻¹ 10⁰ 10¹ 10²

Computational time (seconds)

Average relative error

PSAEM,N=15 PSEM, N=15 PSEM, N=50 PSEM, N=100 PSEM, N=500

(21)

Summary 21(24)

Particle Gibbs with ancestor sampling (PGAS):

• Uses SMC within MCMC in a systematic manner.

• Defines an ergodic Markov kernel on ( X ^T , X ^T ) leaving p _θ ( x _1:T | y _1:T ) invariant for any number of particles N ≥ 2 .

• Seems to work well with a moderate N (say 5–50).

• Consists of two parts

• Conditioning: Ensures correct stationary distribution for any N .

• Ancestor sampling: Mitigates path degeneracy and enables movement around the conditioned path.

• Useful for both Bayesian and maximum-likelihood-based parameter inference as well as for state inference.

AUTOMATIC CONTROL

(22)

Ongoing and future work 22(24)

Ongoing and future work:

• Adapting PGAS to other types of models:

• General probabilistic graphical models (factor graphs).

• Non-Markovian latent variable models.

• Ancestor sampling for models with intractable transitions.

• How to scale N with T ? — establish more informative rates.

(23)

A few selected references 23(24)

Particle Gibbs with ancestor sampling:

• F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor sampling, arXiv:1401.0604, 2014.

• F. Lindsten, M. I. Jordan and T. B. Schön, P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou and K. Q.

Weinberger (Eds.), Ancestor Sampling for Particle Gibbs Advances in Neural Information Processing Systems (NIPS) 25, 2600-2608, 2012.

Particle Gibbs with backward simulation (related procedure):

• N. Whiteley, Discussion on Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society:

Series B, 72(3), 306–307, 2010.

• N. Whiteley, C. Andrieu and A. Doucet, Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo methods, Bristol Statistics Research Report 10:04, 2010.

• F. Lindsten and T. B. Schön, On the use of backward simulation in the particle Gibbs sampler, Proceedings of the 37th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.

Seminal PMCMC paper:

• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods Journal of the Royal Statistical Society: Series B, 72(3), 269–342, 2010.

AUTOMATIC CONTROL

(24)

A few selected references 24(24)

Ergodicity of Particle Gibbs:

• F. Lindsten, R. Douc and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler, arXiv:1401.0683, 2014.

• C. Andrieu, A. Lee and M. Vihola, Uniform Ergodicity of the Iterated Conditional SMC and Geometric Ergodicity of Particle Gibbs samplers, arXiv:1312.6432, 2013.

• N. Chopin, S. S. Singh, On the particle Gibbs sampler, arXiv:1304.1887, 2013.

PMCMC and SAEM:

• F. Lindsten, An efficient stochastic approximation EM algorithm using conditional particle filters, Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.

• C. Andrieu and M. Vihola, Markovian stochastic approximation with expanding projections, arXiv.org, arXiv:1111.5421, 2011.

• S. Donnet and A. Samson, EM algorithm coupled with particle filter for maximum likelihood parameter estimation of stochastic differential mixed-effects models, Tech. Rep. hal-00519576, v2, Université Paris Descartes, MAP5, 2011.

Particle Gibbs with Ancestor Sampling

Particle Gibbs with Ancestor Sampling

Fredrik Lindsten ? , Michael I. Jordan † , Thomas B. Schön ‡

Chamonix, January 6, 2014

Division of Automatic Control Linköping University, Sweden

Departments of EECS and Statistics University of California, Berkeley, USA

Department of Information Technology Uppsala University, Sweden

Identification of state-space models 2(24)

Consider a nonlinear discrete-time state-space model,

x t ∼ f θ ( x t | x t − 1 ) , y t ∼ g θ ( y t | x t ) , and x 1 ∼ π ( x 1 ) .

We observe y 1:T = ( y 1 , . . . , y T ) and wish to estimate θ .

• Frequentists: Find ˆθ ML = arg max θ p θ ( y 1:T ) . - Use e.g. the Monte Carlo EM algorithm.

• Bayesians: Find p ( θ | y 1:T ) .

- Use e.g. Gibbs sampling.

Gibbs sampler for SSMs 3(24)

Aim: Find p ( θ, x 1:T | y 1:T ) .

MCMC: Gibbs sampling for state-space models. Iterate,

• Draw θ [ k ] ∼ p ( θ | x 1:T [ k − 1 ] , y 1:T ) ; OK!

• Draw x 1:T [ k ] ∼ p θ [ k ] ( x 1:T | y 1:T ) . Hard!

Problem: p θ ( x 1:T | y 1:T ) not available!

Idea: Approximate p θ ( x 1:T | y 1:T ) using a particle filter.

The particle filter 4(24)

• Resampling: P ( a i t = j | F t N − 1 ) = w j t − 1 / ∑ l w l t − 1 .

• Propagation: x i t ∼ R t θ ( x t | x a 1:t

− 1 ) and x i 1:t = { x a 1:t

− 1 , x i t } .

• Weighting: w i t = W t θ ( x i 1:t ) .

⇒ { x i 1:t , w i t } N i = 1

Resampling Propagation Weighting

The particle filter 5(24)

Algorithm Particle filter (PF) 1. Initialize ( t = 1 ):

(a) Draw x i 1 ∼ R θ 1 ( x 1 ) for i = 1, . . . , N . (b) Set w i 1 = W θ 1 ( x i 1 ) for i = 1, . . . , N . 2. for t = 2, . . . , T :

(a) Draw a i t ∼ Discrete ({ w j t−1 } N j=1 ) for i = 1, . . . , N . (b) Draw x i t ∼ R θ t ( x t | x a 1:t−1

) for i = 1, . . . , N .

(c) Set x i 1:t = { x a 1:t−1

, x i t } and w i t = W t θ ( x i 1:t ) for i = 1, . . . , N .

The particle filter 6(24)

5 10 15 20 25

−4

−3

−2

−1 0 1

Time

State

Sampling based on the PF 7(24) With P ( x ? 1:T = x i 1:T | F T N ) ∝ w i T we get x ? 1:T approx. ∼ p θ ( x 1:T | y 1:T ) .

Conditional particle filter with ancestor sampling 8(24)

Problems with this approach,

• Based on a PF ⇒ approximate sample.

• Does not leave p ( θ, x 1:T | y 1:T ) invariant!

• Relies on large N to be successful.

• A lot of wasted computations.

Conditional particle filter with ancestor sampling (CPF-AS) Let x 0 1:T = ( x 0 1 , . . . , x 0 T ) be a fixed reference trajectory.

• At each time t , sample only N − 1 particles in the standard way.

• Set the N th particle deterministically: x N t = x 0 t .

• Generate an artificial history for x N t by ancestor sampling.

Conditional particle filter with ancestor sampling 9(24)

Algorithm CPF-AS, conditioned on x 0 1:T 1. Initialize ( t = 1 ):

(a) Draw x i 1 ∼ R θ 1 ( x 1 ) for i = 1, . . . , N − 1 . (b) Set x N 1 = x 0 1 .

(c) Set w i 1 = W θ 1 ( x i 1 ) for i = 1, . . . , N . 2. for t = 2, . . . , T :

(a) Draw a i t ∼ Discrete ({ w j t−1 } N j=1 ) for i = 1, . . . , N − 1 . (b) Draw x i t ∼ R θ t ( x t | x a 1:t−1

) for i = 1, . . . , N − 1 . (c) Set x N t = x 0 t .

(d) Draw a N t with P ( a N t = i | F t−1 N ) ∝ w i t−1 f θ ( x 0 t | x i t−1 ) . (e) Set x i 1:t = { x a 1:t−1

, x i t } and w i t = W t θ ( x i 1:t ) for i = 1, . . . , N .

The PGAS Markov kernel (I/II) 10(24)

Consider the procedure:

1. Run CPF-AS ( N, x 0 1:T ) targeting p θ ( x 1:T | y 1:T ) , 2. Sample x ? 1:T with P ( x ? 1:T = x i 1:T | F T N ) ∝ w i T .

The PGAS Markov kernel (II/II) 11(24)

This procedure:

• Maps x 0 1:T stochastically into x ? 1:T .

• Implicitly defines a Markov kernel ( P N θ ) on ( X T , X T ) , referred to as the PGAS (Particle Gibbs with ancestor sampling) kernel.

Theorem

For any number of particles N ≥ 1 and for any θ ∈ Θ , the PGAS kernel P N θ leaves p θ ( x 1:T | y 1:T ) invariant,

p θ ( dx ? 1:T | y 1:T ) =

Z

P N θ ( x 1:T 0 , dx ? 1:T ) p θ ( dx 0 1:T | y 1:T ) .

Ergodicity 12(24)

Theorem

Assume that there exist constants ε > 0 and κ < ∞ such that, for any θ ∈ Θ , t ∈ { 1, . . . , T } and x 1:t ∈ X t , W t θ ( x 1:t ) ≤ κ and p θ ( y 1:T ) ≥ ε . a

Then, for any N ≥ 2 the PGAS kernel P N θ is uniformly ergodic. That is, there exist constants R < ∞ and ρ ∈ [ 0, 1 ) such that

k( P N θ ) k ( x 1:T 0 , ·) − p θ ( · | y 1:T )k TV ≤ Rρ k , ∀ x 1:T 0 ∈ X T .

Fredrik Lindsten ^? , Michael I. Jordan ^† , Thomas B. Schön ^‡

x _t ∼ f _θ ( x _t | x _t ₋ ₁ ) , y _t ∼ g _θ ( y _t | x _t ) , and x ₁ ∼ π ( x ₁ ) .

We observe y _1:T = ( y ₁ , . . . , y T ) and wish to estimate θ .

• Frequentists: Find ˆθ ML = arg max _θ p _θ ( y _1:T ) . - Use e.g. the Monte Carlo EM algorithm.

• Bayesians: Find p ( θ | y _1:T ) .

Aim: Find p ( θ, x _1:T | y _1:T ) .

• Draw θ [ k ] ∼ p ( θ | x _1:T [ k − 1 ] , y _1:T ) ; OK!

• Draw x _1:T [ k ] ∼ p _θ _[ _k _] ( x _1:T | y _1:T ) . Hard!

Problem: p _θ ( x _1:T | y _1:T ) not available!

Idea: Approximate p _θ ( x _1:T | y _1:T ) using a particle filter.

• Resampling: P ( a ⁱ _t = j | F _t ^N ₋ ₁ ) = w ^j _t ₋ ₁ / ∑ l w ^l _t ₋ ₁ .

• Propagation: x ⁱ _t ∼ R _t ^θ ( x _t | x ^a _1:t

₋ ₁ ) and x ⁱ _1:t = { x ^a _1:t

₋ ₁ , x ⁱ _t } .

• Weighting: w ⁱ _t = W _t ^θ ( x ⁱ _1:t ) .

⇒ { x ⁱ _1:t , w ⁱ _t } ^N _i ₌ ₁

(a) Draw x ⁱ ₁ ∼ R ^θ ₁ ( x ₁ ) for i = 1, . . . , N . (b) Set w ⁱ ₁ = W ^θ ₁ ( x ⁱ ₁ ) ^for i = 1, . . . , N ^. 2. for t = 2, . . . , T :

(a) Draw a ⁱ _t ∼ Discrete ({ w ^j _t−1 } ^N _j=1 ) for i = 1, . . . , N . (b) Draw x ⁱ _t ∼ R ^θ _t ( x t | x ^a _1:t−1

(c) Set x ⁱ _1:t = { x ^a _1:t−1

, x ⁱ _t } and w ⁱ _t = W _t ^θ ( x ⁱ _1:t ) for i = 1, . . . , N .

Sampling based on the PF 7(24) With P ( x ^? _1:T = x ⁱ _1:T | F _T ^N ) ∝ w ⁱ _T ^{we get} x ^? _1:T ^approx. ∼ p _θ ( x _1:T | y _1:T ) .

• Does not leave p ( θ, x _1:T | y _1:T ) invariant!

Conditional particle filter with ancestor sampling (CPF-AS) Let x ⁰ _1:T = ( x ⁰ ₁ , . . . , x ⁰ _T ) be a fixed reference trajectory.

• Set the N th particle deterministically: x ^N _t = x ⁰ _t .

• Generate an artificial history for x ^N _t by ancestor sampling.

Algorithm CPF-AS, conditioned on x ⁰ _1:T 1. Initialize ( t = 1 ):

(a) Draw x ⁱ ₁ ∼ R ^θ ₁ ( x 1 ) for i = 1, . . . , N − 1 . (b) Set x ^N ₁ = x ⁰ ₁ .

(c) Set w ⁱ ₁ = W ^θ ₁ ( x ⁱ ₁ ) for i = 1, . . . , N . 2. for t = 2, . . . , T :

(a) Draw a ⁱ _t ∼ Discrete ({ w ^j _t−1 } ^N _j=1 ) for i = 1, . . . , N − 1 . (b) Draw x ⁱ _t ∼ R ^θ _t ( x t | x ^a _1:t−1

) for i = 1, . . . , N − 1 . (c) Set x ^N _t = x ⁰ _t .

(d) Draw a ^N _t with P ( a ^N _t = i | F _t−1 ^N ) ∝ w ⁱ _t−1 f _θ ( x ⁰ _t | x ⁱ _t−1 ) . (e) Set x ⁱ _1:t = { x ^a _1:t−1

, x ⁱ _t } and w ⁱ _t = W _t ^θ ( x ⁱ _1:t ) for i = 1, . . . , N .

1. Run CPF-AS ( N, x ⁰ _1:T ) targeting p _θ ( x _1:T | y _1:T ) , 2. Sample x ^? _1:T with P ( x ^? _1:T = x ⁱ _1:T | F _T ^N ) ∝ w ⁱ _T ^.

• Maps x ⁰ _1:T stochastically into x ^? _1:T .

• Implicitly defines a Markov kernel ( P ^N _θ ) on ( X ^T , X ^T ) , referred to as the PGAS (Particle Gibbs with ancestor sampling) kernel.

For any number of particles N ≥ 1 and for any θ ∈ _Θ , the PGAS kernel P ^N _θ leaves p _θ ( x _1:T | y _1:T ) invariant,

p _θ ( dx ^? _1:T | y _1:T ) =

P ^N _θ ( x _1:T ⁰ , dx ^? _1:T ) p _θ ( dx ⁰ _1:T | y _1:T ) .

Assume that there exist constants ε > 0 and κ < ∞ such that, for any θ ∈ _Θ , t ∈ { 1, . . . , T } and x _1:t ∈ X ^t , W _t ^θ ( x _1:t ) ≤ κ and p _θ ( y _1:T ) ≥ ε . ^a

Then, for any N ≥ 2 the PGAS kernel P ^N _θ is uniformly ergodic. That is, there exist constants R < ∞ ^and ρ ∈ [ 0, 1 ) such that

k( P ^N _θ ) ^k ( x _1:T ⁰ , ·) − p _θ ( · | y _1:T )k TV ≤ Rρ ^k , ∀ x _1:T ⁰ ∈ X ^T .

1. Initialize: Set { θ [ 0 ] , x _1:T [ 0 ]} arbitrarily.

(a) Draw x 1:T [ k ] ∼ P ^N _θ[k−1] ( x 1:T [ k − 1 ] , · ) . (b) Draw θ [ k ] ∼ p ( θ | x 1:T [ k ] , y 1:T ) .

For any number of particles N ≥ 2 , the Markov chain { θ [ k ] , x _1:T [ k ]} _k _≥ ₁ has limiting distribution p ( θ, x _1:T | y _1:T ) .

x _t + 1 = 0.9x _t + v _t , v _t ∼ N ( 0, θ ) , y t = e t exp (

x t ) , e t ∼ N ( 0, 1 ) . Consider the ACF of θ [ k ] − E [ θ | y _1:T ] .

Back to frequentistic objective: ˆθ ML = arg max _θ p θ ( y 1:T ) .

1. Initialize: Set { θ [ 0 ] , x _1:T [ 0 ]} arbitrarily. Set Q b ₀ ≡ 0 . 2. For k ≥ 1 , iterate:

(a) Draw x 1:T [ k ] ∼ P ^N _θ[k−1] ( x 1:T [ k − 1 ] , · ) . (b) Compute

Q b _k ( θ ) = Q b _k−1 ( θ ) + γ _k

log p _θ ( x _1:T [ k ] , y _1:T ) − Q b _k−1 ( θ ) .

(c) Set θ [ k ] = arg max _θ Q b k ( θ ) .

x _t + 1 = ax _t + b x t

1 + x ² _t + c cos ( 1.2t ) + v _t , v _t ∼ N ( 0, σ _v ² ) , y _t = 0.05x ² _t + e _t , e _t ∼ N ( 0, σ _e ² ) .

• Parameterization: θ = ( a, b, c, σ _v ² , σ _e ² )

• Relative error: k( θ [ k ] − b θ ML ) ./b θ ML k ₂ .

• Defines an ergodic Markov kernel on ( X ^T , X ^T ) leaving p _θ ( x _1:T | y _1:T ) invariant for any number of particles N ≥ 2 .