Blocking Strategies and Stability of Particle Gibbs Samplers
Fredrik Lindsten
Uppsala University
February 10, 2016
Outline
1. Background: Particle Gibbs
2. Uniform ergodicity
F. Lindsten, R. Douc, and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler. Scandinavian Journal of Statistics, 42(3): 775-797, 2015.
3. Blocking strategies and stability
S. S. Singh, F. Lindsten, and E. Moulines, Blocking Strategies and Stability of Particle Gibbs Samplers. arXiv:1509.08362, 2015.
Inference in state-space models
Consider a nonlinear discrete-time state-space model,
X_t | X_{t−1} ∼ m_θ(X_{t−1}, ·),
Y_t | X_t ∼ g_θ(X_t, ·),
and X_1 ∼ µ.
We observe y_{1:T} = (y_1, . . . , y_T) and wish to estimate θ and/or X_{1:T}.
Gibbs sampler for SSMs
Let
φ_{T,θ}(dx_{1:T}) = p(x_{1:T} | θ, y_{1:T}) dx_{1:T}
denote the joint smoothing distribution.
MCMC: Gibbs sampling for state-space models. Iterate,
I Draw θ[k] ∼ p(θ | X_{1:T}[k − 1], y_{1:T}); OK!
I Draw X_{1:T}[k] ∼ φ_{T,θ[k]}(·). Hard!
One-at-a-time: X_t[k] ∼ p(x_t | θ[k], X_{t−1}[k], X_{t+1}[k − 1], y_t)
Particle Gibbs: approximate φ_{T,θ}(dx_{1:T}) using a particle filter.
The particle filter
The particle filter approximates φ_{t,θ}(dx_{1:t}), t = 1, . . . , T, by

φ̂_{t,θ}^N(dx_{1:t}) := ∑_{i=1}^N (ω_t^i / ∑_ℓ ω_t^ℓ) δ_{X_{1:t}^i}(dx_{1:t}).

I Resampling: {X_{1:t−1}^i, ω_{t−1}^i}_{i=1}^N → {X̃_{1:t−1}^i, 1/N}_{i=1}^N.
I Propagation: X_t^i ∼ q_{t,θ}(X̃_{t−1}^i, ·) and X_{1:t}^i = (X̃_{1:t−1}^i, X_t^i).
I Weighting: ω_t^i = W_{t,θ}(X̃_{t−1}^i, X_t^i).
⇒ {X_{1:t}^i, ω_t^i}_{i=1}^N

[Diagram: repeated weighting → resampling → propagation cycle.]
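As a minimal sketch of the three steps above, assuming a toy linear-Gaussian model (X_t = φX_{t−1} + σ_x V_t, Y_t = X_t + σ_y E_t) and the bootstrap proposal q_{t,θ} = m_θ, so that the weight W_{t,θ} reduces to the observation density; the function name and parameter values are illustrative, not from the talk:

```python
import numpy as np

def particle_filter(y, N, phi=0.9, sig_x=1.0, sig_y=0.5, seed=None):
    """Bootstrap particle filter for the toy model
    X_t = phi*X_{t-1} + sig_x*V_t,  Y_t = X_t + sig_y*E_t  (V, E iid N(0,1)).
    With q = m, the weight W_t reduces to the observation density g."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))                        # column i stores trajectory X^i
    X[0] = sig_x * rng.standard_normal(N)       # X_1 ~ mu = N(0, sig_x^2)
    logw = -0.5 * ((y[0] - X[0]) / sig_y) ** 2  # weighting (up to a constant)
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        anc = rng.choice(N, size=N, p=w)        # resampling: multinomial ancestors
        X[:t] = X[:t, anc]                      # equal-weight trajectories
        X[t] = phi * X[t - 1] + sig_x * rng.standard_normal(N)  # propagation
        logw = -0.5 * ((y[t] - X[t]) / sig_y) ** 2              # weighting
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X, w                                 # weighted system {X^i_{1:T}, w^i}
```

Resampling here acts on whole trajectories, which is what makes the output an approximation of the joint smoothing distribution rather than only the filter marginals.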
MCMC using particle filters
In MCMC we need a Markov kernel with invariant distribution φ_T. (From now on we drop θ from the notation.)
Conditional particle lter (CPF)
Let x′_{1:T} = (x′_1, . . . , x′_T) be a given reference trajectory.
I Sample only N − 1 particles in the standard way.
I Set the Nth particle deterministically:
X_t^N = x′_t and A_t^N = N.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods.
Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.
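The CPF can be sketched by modifying the toy bootstrap filter: only N − 1 particles are sampled, and the Nth column is pinned to the reference trajectory and always survives resampling. The model, function name, and parameters are illustrative assumptions:

```python
import numpy as np

def conditional_pf(y, x_ref, N, phi=0.9, sig_x=1.0, sig_y=0.5, seed=None):
    """CPF for the toy linear-Gaussian model: N-1 particles are sampled in
    the standard way; particle N is set to the reference x_ref (A_t^N = N)."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))
    X[0, :N - 1] = sig_x * rng.standard_normal(N - 1)
    X[0, N - 1] = x_ref[0]                       # deterministic Nth particle
    logw = -0.5 * ((y[0] - X[0]) / sig_y) ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        anc = rng.choice(N, size=N - 1, p=w)     # resample only N-1 ancestors
        X[:t, :N - 1] = X[:t, anc]               # reference column untouched
        X[t, :N - 1] = phi * X[t - 1, :N - 1] + sig_x * rng.standard_normal(N - 1)
        X[t, N - 1] = x_ref[t]                   # X_t^N = x'_t
        logw = -0.5 * ((y[t] - X[t]) / sig_y) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X, w
```

Note that the reference may still be chosen as an ancestor by the other N − 1 particles; only its own path is deterministic.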
The PG Markov kernel (I/II)
Consider the procedure:
1. Run CPF(N, x′_{1:T}) targeting φ_T(dx_{1:T}).
2. Sample X_{1:T}^⋆ with P(X_{1:T}^⋆ = X_{1:T}^i | F_T^N) ∝ ω_T^i.
[Figure: three panels plotting State vs. Time (t = 1, . . . , 50), illustrating the CPF particle system over successive iterations.]
The PG Markov kernel (II/II)
This procedure:
I Maps x′_{1:T} stochastically into X_{1:T}^⋆.
I Implicitly defines a Markov kernel P_N on (X^T, 𝒳^T) (the PG kernel),
P_N(x′_{1:T}, A) = E[1_A(X_{1:T}^⋆)].
I P_N is φ_T-invariant for any number of particles N ≥ 1.
What about ergodicity?
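Step 2 of the two-step procedure is just a categorical draw over the weighted trajectories returned by a CPF run. A minimal sketch (the function name is hypothetical; column i of X is assumed to hold X^i_{1:T}):

```python
import numpy as np

def pg_select(X, w, seed=None):
    """Step 2 of the PG kernel: given weighted trajectories from a CPF run
    (column i of X is X^i_{1:T}, with final-time weights w), draw
    X*_{1:T} = X^i_{1:T} with probability proportional to w^i."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    i = rng.choice(X.shape[1], p=w / w.sum())
    return X[:, i]
```

Composing one CPF run started at x′_{1:T} with this draw gives one sample from P_N(x′_{1:T}, ·).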
Outline
2. Uniform ergodicity
Minorisation
Assume ‖W_t‖_∞ < ∞ and define {B_{t,T}}_{t=1}^T by

B_{t,T} = sup_{0≤ℓ≤T−t} ‖W_t‖_∞ · sup_{x_t} p(y_{t+1:t+ℓ} | x_t) / p(y_{t:t+ℓ} | y_{1:t−1}).

Theorem
The PG kernel is minorised by φ_T:
P_N(x′_{1:T}, A) ≥ (1 − ε_{T,N}) φ_T(A),
where
ε_{T,N} := 1 − ∏_{t=1}^T (N − 1)/(2B_{t,T} + N − 2) ≤ (1/(N − 1)) ∑_{t=1}^T (2B_{t,T} − 1) + O(N^{−2}).
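The product form of ε_{T,N} and its leading O(1/N) term are easy to evaluate numerically. A small sketch, with illustrative values of B_{t,T} (not from the talk):

```python
import numpy as np

def eps_TN(B, N):
    """eps_{T,N} = 1 - prod_{t=1}^T (N-1) / (2*B_t + N - 2),
    for a vector B = (B_{1,T}, ..., B_{T,T}) of bounds."""
    B = np.asarray(B, dtype=float)
    return 1.0 - np.prod((N - 1) / (2.0 * B + N - 2))

B = np.full(5, 2.0)                      # illustrative: B_{t,T} = 2, T = 5
N = 100
eps = eps_TN(B, N)                       # exact product form
lead = np.sum(2.0 * B - 1.0) / (N - 1)   # leading O(1/N) term of the bound
```

For these values the exact ε_{T,N} indeed sits below the first-order term, and both shrink at rate 1/N as the number of particles grows.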
Mixing conditions
Under strong mixing conditions:
I 1 − ε_{T,N} ≥ (1 − 1/(c(N − 1) + 1))^T for c ∈ (0, 1] (depending on mixing).
I Stable as T → ∞ if N ∼ γT.
Under (weaker) moment conditions:
I (1 − ε_{T,N})^{−1} bounded in probability as T → ∞, provided N ∼ T^{1/γ} for γ ∈ (0, 1) (depending on mixing).
I Generalised to the case of a misspecified model (unknown θ).
I Verifiable conditions (also for non-compact state spaces).
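The strong-mixing lower bound makes the N ∼ γT scaling easy to see numerically: with fixed N the bound collapses as T grows, while with N proportional to T it stabilises near exp(−1/(cγ)). The value of c below is illustrative:

```python
def minorisation_lb(T, N, c=0.5):
    """Strong-mixing lower bound (1 - 1/(c*(N-1)+1))**T on 1 - eps_{T,N};
    the mixing constant c in (0, 1] is an illustrative choice."""
    return (1.0 - 1.0 / (c * (N - 1) + 1)) ** T

fixed_N  = [minorisation_lb(T, 50) for T in (50, 200, 800)]     # decays to 0
scaled_N = [minorisation_lb(T, 2 * T) for T in (50, 200, 800)]  # stabilises
```

With c = 0.5 and γ = 2 the scaled sequence hovers near e^{−1} ≈ 0.37, independent of T.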
Gibbs sampling for state-space models
Alternative Gibbs sampling strategies:
Particle Gibbs: X_{1:T}^⋆ ∼ P_N(x_{1:T}, ·).
I Samples X_{1:T} in one block.
I Requires N ∝ T as T → ∞ for stability (strong mixing)
⇒ O(T²) computational cost!
One-at-a-time: X_t^⋆ ∼ p(x_t | x_{−t}, y_t), t = 1, . . . , T.
I Slow mixing/convergence speed!
I Stable as T → ∞?
Outline
3. Blocking strategies and stability
Blocking strategy
[Diagram: overlapping blocks J_1, J_3, J_5 (one row) and J_2, J_4 (interleaved row) covering the time indices 1, . . . , T.]

Intermediate strategy, blocked Particle Gibbs:
P_N^J(x_{J⁺}, dx_J^⋆): PG kernel for p(x_J | x_{∂J}, y_J).

Trade-off:
(1) Mixing of the ideal blocked Gibbs sampler improves (↗) as |J| ↗ (how fast? stable?)
(2) Mixing of P_N^J = (1 − 1/(c(N − 1) + 1))^{|J|}, i.e., ↘ as |J| ↗

∂J = {t ∈ J^c : t + 1 ∈ J or t − 1 ∈ J} (boundary points for block J)
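The boundary set ∂J defined above is a one-liner to compute; a small sketch with a hypothetical helper name:

```python
def boundary(J, T):
    """Boundary of block J within {1, ..., T}:
    dJ = {t in J^c : t+1 in J or t-1 in J}."""
    Jset = set(J)
    return sorted(t for t in range(1, T + 1)
                  if t not in Jset and (t + 1 in Jset or t - 1 in Jset))
```

For an interior block the boundary is the two time points flanking it; for a block touching t = 1 or t = T it is a single point.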
Stability of ideal blocked Gibbs sampler
Theorem
Let J = {J_1, . . . , J_m} be a cover of {1, . . . , T} and let P = P_{J_1} · · · P_{J_m} be the ideal Gibbs kernel for one complete sweep. Let all blocks have common size L and common overlap p. Then

|µP^k(f) − φ_T(f)| ≤ 2λ^{k−1} ∑_{t=1}^T osc_t(f),

where λ = α^{p+1} + α^{L−p} and α ∈ [0, 1) is a constant depending on the mixing coefficients of the model (assuming strong mixing).

Def: osc_t(f) = sup{ |f(x) − f(z)| : x, z ∈ X^T, x_{−t} = z_{−t} }
Stability of ideal blocked Gibbs sampler
[Diagram: consecutive blocks J_{k−1}, J_k, J_{k+1}, each of size L, overlapping by p.]

I If L ≥ 2p + 1 the odd/even blocks can be updated in parallel!
I To control the rate λ = α^{p+1} + α^{L−p} we need to increase both the block size L and the overlap p!
ex) with L = 2p + 1 (≈ 50% overlapping blocks) we get λ < 1 if L > log 4 / log(α^{−1}) − 1.
I For left-to-right and parallel blocking the rate is ∼ λ².
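The rate λ = α^{p+1} + α^{L−p} and the L = 2p + 1 threshold can be checked directly (with L = 2p + 1, λ = 2α^{p+1} < 1 exactly when L exceeds log 4 / log(α^{−1}) − 1). The value of α below is illustrative:

```python
import math

def contraction_rate(alpha, L, p):
    """lambda = alpha**(p+1) + alpha**(L-p) for block size L, overlap p."""
    return alpha ** (p + 1) + alpha ** (L - p)

alpha = 0.8                                    # illustrative mixing constant
L_min = math.log(4) / math.log(1 / alpha) - 1  # threshold for L = 2p+1 blocks
```

For α = 0.8 the threshold is L_min ≈ 5.21, so L = 7 (p = 3) gives λ ≈ 0.82 < 1 while L = 5 (p = 2) gives λ ≈ 1.02 > 1.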
Stability of blocked Particle Gibbs sampler
The blocked Particle Gibbs sampler P_N can be seen as a perturbation of the ideal blocked Gibbs sampler P.

Theorem
|µP_N^k(f) − φ_T(f)| ≤ 2λ_N^{k−1} ∑_{t=1}^T osc_t(f),
λ_N = λ + const. × ε_{L,N},   ε_{L,N} ≤ 1 − (1 − 1/(c(N − 1) + 1))^L.

I λ → 0 with increasing block size L and overlap p.
I ε_{L,N} ↘ as N ↗; ε_{L,N} ↗ as L ↗.
I Bound independent of T for marginals of φ_T(dx_{1:T}).
I ‖µP_N^k(·) − φ_T(·)‖_TV ≤ 2T λ_N^{k−1}.
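The key point of the perturbation bound is that ε_{L,N} involves L but not T, so the N needed for λ_N < 1 is set by the block size alone. A sketch with an illustrative c:

```python
def eps_LN(L, N, c=0.5):
    """Upper bound on the CPF perturbation: 1 - (1 - 1/(c*(N-1)+1))**L.
    Depends on the block size L but not on T; c is illustrative."""
    return 1.0 - (1.0 - 1.0 / (c * (N - 1) + 1)) ** L
```

Increasing N shrinks the perturbation for a fixed L, while enlarging L (to improve λ) inflates it, which is the trade-off resolved by fixing L and p first and then choosing N.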
Summary
I Particle Gibbs mimics sampling from φ_{T,θ}(dx_{1:T})
(e.g., in a Gibbs sampler or stochastic approximation method).
I Uniformly ergodic under weak conditions.
I Strong mixing conditions: stable if N = γT.
I (Weaker) moment conditions: stable if N = T^{1/γ}.
I Blocking ⇒ stable as T → ∞ for constant N.
I Set the block size L and overlap p s.t. the ideal sampler is stable.
I Set N large enough to obtain a stable Particle Gibbs sampler (depends only on L, not T).
I Opens up for parallelisation!
I Requires evaluation of m_θ(x_{t−1}, x_t)!
Wasserstein estimates
Def: For f : X^T → R, the oscillation in the i-th coordinate is
osc_i(f) = sup{ |f(x) − f(z)| : x, z ∈ X^T, x_{−i} = z_{−i} }

Def: W is a Wasserstein matrix for a Markov kernel P if
osc_i(Pf) ≤ ∑_{j=1}^T W_{ij} osc_j(f).
Wasserstein matrix for blocked Gibbs sampler
Under strong mixing,

        ⎡ 1                                      ⎤
        ⎢    ⋱                                   ⎥
        ⎢      1                                 ⎥
        ⎢  α        0 · · · 0   α^{|J|}          ⎥
  W^J = ⎢  α²       0 · · · 0   α^{|J|−1}        ⎥
        ⎢  ⋮        ⋮   ⋱   ⋮      ⋮             ⎥
        ⎢  α^{|J|}  0 · · · 0   α                ⎥
        ⎢                             1          ⎥
        ⎢                                ⋱       ⎥
        ⎣                                   1    ⎦

(rows i ∈ J have their only nonzero entries in the two boundary columns ∂J; all other rows are identity rows) is a Wasserstein matrix for the ideal Gibbs kernel updating block J,

P_J(x_{1:T}, dx_{1:T}^⋆):  X_J^⋆ ∼ p(x_J | x_{∂J}, y_J) dx_J,  X_{J^c}^⋆ = x_{J^c},

where α ∈ [0, 1) is a constant depending on the mixing coefficients.
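The structure of W^J can be built explicitly; the sketch below assumes an interior block with 1-based time indices (the k-th state of J couples to the left boundary with weight α^k and to the right boundary with weight α^{|J|−k+1}, matching the displayed matrix):

```python
import numpy as np

def wasserstein_matrix(J, T, alpha):
    """W^J for the ideal kernel updating block J = {j_1 < ... < j_m} within
    {1, ..., T}: identity rows outside J; row j_k has its only nonzero
    entries alpha**k and alpha**(m-k+1) in the two boundary columns."""
    J = sorted(J)
    m = len(J)
    lo, hi = J[0] - 1, J[-1] + 1        # boundary points (1-based), if inside
    W = np.eye(T)
    for k, t in enumerate(J, start=1):
        W[t - 1] = 0.0                  # rows in J: no self-dependence
        if lo >= 1:
            W[t - 1, lo - 1] = alpha ** k
        if hi <= T:
            W[t - 1, hi - 1] = alpha ** (m - k + 1)
    return W
```

The geometric decay in the distance to the boundary is what makes the row sums, and hence the sweep rate λ, controllable by growing the block size and overlap.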
Stability of blocked Gibbs sampler
Theorem
Let J = {J_1, . . . , J_m} be a cover of {1, . . . , T} and let P = P_{J_1} · · · P_{J_m} be the Gibbs kernel for one complete sweep. Let ∂ = ⋃_{J∈J} ∂J. Then, if

sup_{i∈J∩∂} ∑_{j=1}^T W_{ij}^J ≤ λ < 1  ∀J ∈ J,   (⋆)

it follows that

|µP^k(f) − φ_T(f)| ≤ 2λ^{k−1} ∑_{i=1}^T osc_i(f).

I With ≈ 50% overlapping equally sized blocks, (⋆) is satisfied if the block size satisfies |J| > log 4 / log(α^{−1}) − 1.
I For left-to-right and parallel blocking the rate is ∼ λ².