Particle Gibbs with Ancestor Sampling
Fredrik Lindsten*, Michael I. Jordan†, Thomas B. Schön‡
Chamonix, January 6, 2014
* Division of Automatic Control, Linköping University, Sweden
† Departments of EECS and Statistics, University of California, Berkeley, USA
‡ Department of Information Technology, Uppsala University, Sweden
AUTOMATIC CONTROL
Identification of state-space models
Consider a nonlinear discrete-time state-space model,
x_t ∼ f_θ(x_t | x_{t−1}),  y_t ∼ g_θ(y_t | x_t),  and x_1 ∼ π(x_1).
We observe y_{1:T} = (y_1, ..., y_T) and wish to estimate θ.
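As a running illustration, the model can be simulated for a hypothetical linear-Gaussian instance; the concrete choices f_θ = N(0.9 x_{t−1}, θ), g_θ = N(x_t, 1), and π = N(0, 1) are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def simulate_ssm(theta, T, rng):
    """Simulate x_{1:T} and y_{1:T} from a toy linear-Gaussian instance
    of the state-space model above (hypothetical choices)."""
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal()                    # x_1 ~ pi(x_1) = N(0, 1)
    y[0] = x[0] + rng.normal()             # y_t ~ g_theta = N(x_t, 1)
    for t in range(1, T):
        x[t] = 0.9 * x[t - 1] + np.sqrt(theta) * rng.normal()  # f_theta
        y[t] = x[t] + rng.normal()
    return x, y

x, y = simulate_ssm(theta=0.5, T=100, rng=np.random.default_rng(0))
```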
• Frequentists: find θ̂_ML = arg max_θ p_θ(y_{1:T}).
  - Use e.g. the Monte Carlo EM algorithm.
• Bayesians: find p(θ | y_{1:T}).
  - Use e.g. Gibbs sampling.
Gibbs sampler for SSMs
Aim: find p(θ, x_{1:T} | y_{1:T}).
MCMC: Gibbs sampling for state-space models. Iterate:
• Draw θ[k] ∼ p(θ | x_{1:T}[k−1], y_{1:T}); OK!
• Draw x_{1:T}[k] ∼ p_{θ[k]}(x_{1:T} | y_{1:T}). Hard!
Problem: p_θ(x_{1:T} | y_{1:T}) is not available!
Idea: approximate p_θ(x_{1:T} | y_{1:T}) using a particle filter.
The particle filter
• Resampling: P(a_t^i = j | F_{t−1}^N) = w_{t−1}^j / Σ_l w_{t−1}^l.
• Propagation: x_t^i ∼ R_t^θ(x_t | x_{1:t−1}^{a_t^i}) and x_{1:t}^i = {x_{1:t−1}^{a_t^i}, x_t^i}.
• Weighting: w_t^i = W_t^θ(x_{1:t}^i).
⇒ {x_{1:t}^i, w_t^i}_{i=1}^N
(The cycle resampling → propagation → weighting is repeated for each t.)
The particle filter
Algorithm: Particle filter (PF)
1. Initialize (t = 1):
   (a) Draw x_1^i ∼ R_1^θ(x_1) for i = 1, ..., N.
   (b) Set w_1^i = W_1^θ(x_1^i) for i = 1, ..., N.
2. For t = 2, ..., T:
   (a) Draw a_t^i ∼ Discrete({w_{t−1}^j}_{j=1}^N) for i = 1, ..., N.
   (b) Draw x_t^i ∼ R_t^θ(x_t | x_{1:t−1}^{a_t^i}) for i = 1, ..., N.
   (c) Set x_{1:t}^i = {x_{1:t−1}^{a_t^i}, x_t^i} and w_t^i = W_t^θ(x_{1:t}^i) for i = 1, ..., N.
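A minimal sketch of this algorithm in Python, using the bootstrap proposal R_t = f_θ (so the weight W_t reduces to the observation likelihood) on the same hypothetical linear-Gaussian model used for illustration throughout; all model choices are assumptions:

```python
import numpy as np

def bootstrap_pf(y, theta, N, rng):
    """Bootstrap PF for the toy model x_t = 0.9 x_{t-1} + v_t, v_t ~ N(0, theta),
    y_t = x_t + e_t, e_t ~ N(0, 1). Returns trajectories (N, T) and final
    normalized weights."""
    T = len(y)
    X = np.zeros((N, T))                   # stored trajectories x_{1:t}^i
    # Initialization (t = 1): draw from pi and weight by the likelihood.
    X[:, 0] = rng.normal(size=N)
    logw = -0.5 * (y[0] - X[:, 0]) ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # (a) Resampling: ancestor indices a_t^i ~ Discrete({w_{t-1}^j}).
        a = rng.choice(N, size=N, p=w)
        X = X[a]                           # keep whole ancestral paths
        # (b) Propagation: x_t^i ~ f_theta(. | x_{t-1}^{a_t^i}).
        X[:, t] = 0.9 * X[:, t - 1] + np.sqrt(theta) * rng.normal(size=N)
        # (c) Weighting: w_t^i = g_theta(y_t | x_t^i), up to a constant.
        logw = -0.5 * (y[t] - X[:, t]) ** 2
    w = np.exp(logw - logw.max())
    return X, w / w.sum()
```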
The particle filter
[Figure: particle trajectories (state vs. time, t = 1, ..., 25).]
Sampling based on the PF
With P(x*_{1:T} = x_{1:T}^i | F_T^N) ∝ w_T^i we get x*_{1:T} approximately distributed according to p_θ(x_{1:T} | y_{1:T}).
[Figure: a trajectory x*_{1:T} sampled from the particle filter (state vs. time, t = 1, ..., 25).]
Conditional particle filter with ancestor sampling
Problems with this approach:
• Based on a PF ⇒ only an approximate sample.
• Does not leave p(θ, x_{1:T} | y_{1:T}) invariant!
• Relies on a large N to be successful.
• A lot of wasted computation.
Conditional particle filter with ancestor sampling (CPF-AS)
Let x′_{1:T} = (x′_1, ..., x′_T) be a fixed reference trajectory.
• At each time t, sample only N − 1 particles in the standard way.
• Set the Nth particle deterministically: x_t^N = x′_t.
• Generate an artificial history for x_t^N by ancestor sampling.
Conditional particle filter with ancestor sampling
Algorithm: CPF-AS, conditioned on x′_{1:T}
1. Initialize (t = 1):
   (a) Draw x_1^i ∼ R_1^θ(x_1) for i = 1, ..., N − 1.
   (b) Set x_1^N = x′_1.
   (c) Set w_1^i = W_1^θ(x_1^i) for i = 1, ..., N.
2. For t = 2, ..., T:
   (a) Draw a_t^i ∼ Discrete({w_{t−1}^j}_{j=1}^N) for i = 1, ..., N − 1.
   (b) Draw x_t^i ∼ R_t^θ(x_t | x_{1:t−1}^{a_t^i}) for i = 1, ..., N − 1.
   (c) Set x_t^N = x′_t.
   (d) Draw a_t^N with P(a_t^N = i | F_{t−1}^N) ∝ w_{t−1}^i f_θ(x′_t | x_{t−1}^i).
   (e) Set x_{1:t}^i = {x_{1:t−1}^{a_t^i}, x_t^i} and w_t^i = W_t^θ(x_{1:t}^i) for i = 1, ..., N.
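The algorithm above can be sketched as follows, again with the bootstrap proposal and the hypothetical linear-Gaussian model (both assumptions); step (d) is the ancestor-sampling draw that distinguishes CPF-AS from a plain conditional PF:

```python
import numpy as np

def cpf_as(y, x_ref, theta, N, rng):
    """CPF-AS conditioned on x_ref, for the toy model x_t = 0.9 x_{t-1} + v_t,
    v_t ~ N(0, theta), y_t = x_t + e_t, e_t ~ N(0, 1) (illustrative choices)."""
    T = len(y)
    X = np.zeros((N, T))
    X[:-1, 0] = rng.normal(size=N - 1)     # (a) standard init for N-1 particles
    X[-1, 0] = x_ref[0]                    # (b) deterministic N-th particle
    logw = -0.5 * (y[0] - X[:, 0]) ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # (a)-(b): resample and propagate only the first N-1 particles.
        a = rng.choice(N, size=N - 1, p=w)
        Xnew = X[a]
        Xnew[:, t] = 0.9 * Xnew[:, t - 1] + np.sqrt(theta) * rng.normal(size=N - 1)
        # (c)-(d): condition on the reference and sample its ancestor,
        # P(a_t^N = i) propto w_{t-1}^i f_theta(x_ref[t] | x_{t-1}^i).
        logp = np.log(w + 1e-300) - 0.5 * (x_ref[t] - 0.9 * X[:, t - 1]) ** 2 / theta
        p = np.exp(logp - logp.max())
        p /= p.sum()
        ref = X[rng.choice(N, p=p)].copy()  # artificial history for x_t^N
        ref[t] = x_ref[t]
        X = np.vstack([Xnew, ref])
        # (e): weighting.
        logw = -0.5 * (y[t] - X[:, t]) ** 2
    w = np.exp(logw - logw.max())
    return X, w / w.sum()
```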
The PGAS Markov kernel (I/II)
Consider the procedure:
1. Run CPF-AS(N, x′_{1:T}) targeting p_θ(x_{1:T} | y_{1:T}).
2. Sample x*_{1:T} with P(x*_{1:T} = x_{1:T}^i | F_T^N) ∝ w_T^i.
[Figure: CPF-AS particle system and sampled trajectory (state vs. time, t = 1, ..., 50).]
The PGAS Markov kernel (II/II)
This procedure:
• Maps x′_{1:T} stochastically into x*_{1:T}.
• Implicitly defines a Markov kernel P_θ^N on (X^T, 𝒳^T), referred to as the PGAS (particle Gibbs with ancestor sampling) kernel.
Theorem
For any number of particles N ≥ 1 and any θ ∈ Θ, the PGAS kernel P_θ^N leaves p_θ(x_{1:T} | y_{1:T}) invariant:
p_θ(dx*_{1:T} | y_{1:T}) = ∫ P_θ^N(x′_{1:T}, dx*_{1:T}) p_θ(dx′_{1:T} | y_{1:T}).
F. Lindsten, M. I. Jordan and T. B. Schön, Ancestor sampling for particle Gibbs, Advances in Neural Information Processing Systems (NIPS) 25, 2600–2608, 2012.
F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor Sampling, arXiv:1401.0604, 2014.
Ergodicity
Theorem
Assume that there exist constants ε > 0 and κ < ∞ such that, for any θ ∈ Θ, t ∈ {1, ..., T} and x_{1:t} ∈ X^t, W_t^θ(x_{1:t}) ≤ κ and p_θ(y_{1:T}) ≥ ε.¹
Then, for any N ≥ 2, the PGAS kernel P_θ^N is uniformly ergodic. That is, there exist constants R < ∞ and ρ ∈ [0, 1) such that
‖(P_θ^N)^k(x′_{1:T}, ·) − p_θ(· | y_{1:T})‖_TV ≤ R ρ^k,  for all x′_{1:T} ∈ X^T.
¹ N.B. These conditions are simple but unnecessarily strong; see (Lindsten, Douc, and Moulines, 2014).
F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor Sampling, arXiv:1401.0604, 2014.
PGAS for Bayesian identification
Bayesian identification: PGAS + Gibbs.
Algorithm: PGAS for Bayesian identification
1. Initialize: set {θ[0], x_{1:T}[0]} arbitrarily.
2. For k ≥ 1, iterate:
   (a) Draw x_{1:T}[k] ∼ P_{θ[k−1]}^N(x_{1:T}[k−1], ·).
   (b) Draw θ[k] ∼ p(θ | x_{1:T}[k], y_{1:T}).
For any number of particles N ≥ 2, the Markov chain {θ[k], x_{1:T}[k]}_{k≥1} has limiting distribution p(θ, x_{1:T} | y_{1:T}).
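A compact sketch of this sampler for the toy linear-Gaussian model. The CPF-AS routine is repeated inline so the block is self-contained; the inverse-gamma prior IG(a0, b0) on θ (giving a conjugate step (b)) is an illustrative assumption:

```python
import numpy as np

def cpf_as(y, x_ref, theta, N, rng):
    """CPF-AS with bootstrap proposal for x_t = 0.9 x_{t-1} + v_t,
    v_t ~ N(0, theta), y_t = x_t + e_t, e_t ~ N(0, 1) (toy model)."""
    T = len(y)
    X = np.zeros((N, T))
    X[:-1, 0] = rng.normal(size=N - 1)
    X[-1, 0] = x_ref[0]
    logw = -0.5 * (y[0] - X[:, 0]) ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        a = rng.choice(N, size=N - 1, p=w)          # resample N-1 particles
        Xnew = X[a]
        Xnew[:, t] = 0.9 * Xnew[:, t - 1] + np.sqrt(theta) * rng.normal(size=N - 1)
        logp = np.log(w + 1e-300) - 0.5 * (x_ref[t] - 0.9 * X[:, t - 1]) ** 2 / theta
        p = np.exp(logp - logp.max()); p /= p.sum()
        ref = X[rng.choice(N, p=p)].copy()          # ancestor sampling
        ref[t] = x_ref[t]
        X = np.vstack([Xnew, ref])
        logw = -0.5 * (y[t] - X[:, t]) ** 2
    w = np.exp(logw - logw.max())
    return X, w / w.sum()

def pgas_gibbs(y, K, N, rng, a0=1.0, b0=1.0):
    """PGAS + Gibbs: alternate a PGAS kernel draw of x_{1:T} with a conjugate
    inverse-gamma update of theta (prior IG(a0, b0) is an assumption)."""
    T = len(y)
    theta, x = 1.0, np.zeros(T)                     # arbitrary initialization
    thetas = np.zeros(K)
    for k in range(K):
        X, w = cpf_as(y, x, theta, N, rng)          # step (a): PGAS kernel
        x = X[rng.choice(N, p=w)]
        res = x[1:] - 0.9 * x[:-1]                  # step (b): theta | x, y
        theta = (b0 + 0.5 * res @ res) / rng.gamma(a0 + 0.5 * (T - 1))
        thetas[k] = theta
    return thetas
```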
ex) Stochastic volatility model
Stochastic volatility model,
x_{t+1} = 0.9 x_t + v_t,  v_t ∼ N(0, θ),
y_t = e_t exp(x_t / 2),  e_t ∼ N(0, 1).
Consider the ACF of θ[k] − E[θ | y_{1:T}].
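A minimal simulator for this model, useful for reproducing the experiment; initializing x_1 from the stationary distribution of the AR(1) state is an assumption on my part:

```python
import numpy as np

def simulate_sv(theta, T, rng):
    """Simulate the stochastic volatility model:
    x_{t+1} = 0.9 x_t + v_t, v_t ~ N(0, theta); y_t = e_t exp(x_t / 2)."""
    x = np.zeros(T)
    # Stationary AR(1) start: Var(x) = theta / (1 - 0.9^2) (assumed init).
    x[0] = np.sqrt(theta / (1.0 - 0.9 ** 2)) * rng.normal()
    for t in range(1, T):
        x[t] = 0.9 * x[t - 1] + np.sqrt(theta) * rng.normal()
    y = rng.normal(size=T) * np.exp(0.5 * x)   # e_t ~ N(0, 1)
    return x, y
```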
[Figure: empirical ACF of θ[k] − E[θ | y_{1:T}] for PGAS (left) and PG (right), T = 1000, N ∈ {5, 20, 100, 1000}.]
PGAS vs. PG
[Figure: particle trajectories (state vs. time, t = 1, ..., 50) for PGAS and PG, illustrating how ancestor sampling mitigates path degeneracy.]
Examples
• F. Lindsten, T. B. Schön and M. I. Jordan, Bayesian semiparametric Wiener system identification, Automatica, 49(7): 2053-2063, July 2013.
[Figure: estimated linear system (Bode magnitude and phase vs. frequency, rad/s) and static nonlinearity h(z): true value, posterior mean, and 99% credibility intervals.]
• R. Frigola, F. Lindsten, T. B. Schön and C. E. Rasmussen, Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC, Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 2013.
[Figure: learned GP-SSM transition function f(x, u) and state trajectory x_t over time.]
Maximum likelihood identification
Back to the frequentist objective: θ̂_ML = arg max_θ p_θ(y_{1:T}).
• Common strategy: Particle smoothing + EM (PSEM)
• Alternative strategy: PGAS + stochastic approximation EM (PSAEM)
[Diagram: EM → MCEM → PSEM; EM → SAEM → PSAEM.]
F. Lindsten, An efficient stochastic approximation EM algorithm using conditional particle filters, Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
PGAS for maximum likelihood identification
Maximum likelihood identification: PGAS + SAEM = PSAEM.
Algorithm: PSAEM
1. Initialize: set {θ[0], x_{1:T}[0]} arbitrarily. Set Q̂_0 ≡ 0.
2. For k ≥ 1, iterate:
   (a) Draw x_{1:T}[k] ∼ P_{θ[k−1]}^N(x_{1:T}[k−1], ·).
   (b) Compute Q̂_k(θ) = Q̂_{k−1}(θ) + γ_k (log p_θ(x_{1:T}[k], y_{1:T}) − Q̂_{k−1}(θ)).
   (c) Set θ[k] = arg max_θ Q̂_k(θ).
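For exponential-family models, steps (b)-(c) reduce to a stochastic-approximation update of sufficient statistics with a closed-form M-step. A sketch for the toy model x_{t+1} = 0.9 x_t + v_t, v_t ∼ N(0, θ), where the only unknown is the noise variance θ (this simplification is mine, not from the slides):

```python
import numpy as np

def saem_step(S_prev, x, gamma):
    """One PSAEM stochastic-approximation step. For the toy model the
    complete-data log-likelihood in theta has the single sufficient statistic
    S(x) = mean_t (x_t - 0.9 x_{t-1})^2, the Q-hat update (b) becomes a
    running average of S, and step (c) gives theta[k] = S_k in closed form."""
    S_new = np.mean((x[1:] - 0.9 * x[:-1]) ** 2)   # statistic of x_{1:T}[k]
    S = S_prev + gamma * (S_new - S_prev)          # step (b) on the statistic
    theta = S                                      # step (c): arg max of Q-hat_k
    return S, theta
```

In practice γ_k is a step size with Σ γ_k = ∞ and Σ γ_k² < ∞, e.g. γ_k = k^(−0.7).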
ex) Nonlinear time series
Consider,
x_{t+1} = a x_t + b x_t / (1 + x_t^2) + c cos(1.2 t) + v_t,  v_t ∼ N(0, σ_v^2),
y_t = 0.05 x_t^2 + e_t,  e_t ∼ N(0, σ_e^2).
• Parameterization: θ = (a, b, c, σ_v^2, σ_e^2).
• Relative error: ‖(θ[k] − θ̂_ML) ./ θ̂_ML‖_2.
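A simulator for this benchmark model and the relative-error metric; the initial state distribution x_1 ∼ N(0, 1) is an assumption, and time is treated as 1-based to match the cos(1.2 t) term in the slides:

```python
import numpy as np

def simulate_benchmark(theta, T, rng):
    """Simulate the benchmark time series; theta = (a, b, c, sigma_v^2, sigma_e^2)."""
    a, b, c, sv2, se2 = theta
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal()                    # x_1 ~ N(0, 1) (assumed)
    for t in range(T):
        y[t] = 0.05 * x[t] ** 2 + np.sqrt(se2) * rng.normal()
        if t + 1 < T:
            # 1-based time index in cos(1.2 t), matching the model statement.
            x[t + 1] = (a * x[t] + b * x[t] / (1.0 + x[t] ** 2)
                        + c * np.cos(1.2 * (t + 1)) + np.sqrt(sv2) * rng.normal())
    return x, y

def relative_error(theta_k, theta_ml):
    """||(theta[k] - theta_ML) ./ theta_ML||_2 with elementwise division."""
    theta_k, theta_ml = np.asarray(theta_k), np.asarray(theta_ml)
    return np.linalg.norm((theta_k - theta_ml) / theta_ml)
```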
ex) Nonlinear time series
[Figure: average relative error vs. computational time (seconds, log-log scale) for PSAEM (N = 15) and PSEM (N = 15, 50, 100, 500).]
Summary
Particle Gibbs with ancestor sampling (PGAS):
• Uses SMC within MCMC in a systematic manner.
• Defines an ergodic Markov kernel on (X^T, 𝒳^T) leaving p_θ(x_{1:T} | y_{1:T}) invariant for any number of particles N ≥ 2.
• Seems to work well with a moderate N (say 5–50).
• Consists of two parts:
• Conditioning: Ensures correct stationary distribution for any N .
• Ancestor sampling: Mitigates path degeneracy and enables movement around the conditioned path.
• Useful for both Bayesian and maximum-likelihood-based parameter inference as well as for state inference.
Ongoing and future work
Ongoing and future work:
• Adapting PGAS to other types of models:
• General probabilistic graphical models (factor graphs).
• Non-Markovian latent variable models.
• Ancestor sampling for models with intractable transitions.
• How should N scale with T? Establish more informative rates.
A few selected references
Particle Gibbs with ancestor sampling:
• F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor Sampling, arXiv:1401.0604, 2014.
• F. Lindsten, M. I. Jordan and T. B. Schön, Ancestor sampling for particle Gibbs, Advances in Neural Information Processing Systems (NIPS) 25, 2600–2608, 2012.
Particle Gibbs with backward simulation (related procedure):
• N. Whiteley, Discussion on Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B, 72(3), 306–307, 2010.
• N. Whiteley, C. Andrieu and A. Doucet, Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo methods, Bristol Statistics Research Report 10:04, 2010.
• F. Lindsten and T. B. Schön, On the use of backward simulation in the particle Gibbs sampler, Proceedings of the 37th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
Seminal PMCMC paper:
• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B, 72(3), 269–342, 2010.
A few selected references
Ergodicity of Particle Gibbs:
• F. Lindsten, R. Douc and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler, arXiv:1401.0683, 2014.
• C. Andrieu, A. Lee and M. Vihola, Uniform Ergodicity of the Iterated Conditional SMC and Geometric Ergodicity of Particle Gibbs samplers, arXiv:1312.6432, 2013.
• N. Chopin and S. S. Singh, On the particle Gibbs sampler, arXiv:1304.1887, 2013.
PMCMC and SAEM:
• F. Lindsten, An efficient stochastic approximation EM algorithm using conditional particle filters, Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
• C. Andrieu and M. Vihola, Markovian stochastic approximation with expanding projections, arXiv:1111.5421, 2011.
• S. Donnet and A. Samson, EM algorithm coupled with particle filter for maximum likelihood parameter estimation of stochastic differential mixed-effects models, Tech. Rep. hal-00519576, v2, Université Paris Descartes, MAP5, 2011.