Blocking Strategies and Stability of Particle Gibbs Samplers
Fredrik Lindsten
Uppsala University
February 10, 2016
Outline
1. Background: Particle Gibbs
2. Uniform ergodicity
F. Lindsten, R. Douc, and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler. Scandinavian Journal of Statistics, 42(3): 775-797, 2015.
3. Blocking strategies and stability
S. S. Singh, F. Lindsten, and E. Moulines, Blocking Strategies and Stability of Particle Gibbs Samplers. arXiv:1509.08362, 2015.
Inference in state-space models
Consider a nonlinear discrete-time state-space model,
X_t | X_{t−1} ∼ m_θ(X_{t−1}, ·),
Y_t | X_t ∼ g_θ(X_t, ·),
and X_1 ∼ µ.
We observe y_{1:T} = (y_1, . . . , y_T) and wish to estimate θ and/or X_{1:T}.
Gibbs sampler for SSMs
Let
φ_{T,θ}(dx_{1:T}) = p(x_{1:T} | θ, y_{1:T}) dx_{1:T}
denote the joint smoothing distribution.
MCMC: Gibbs sampling for state-space models. Iterate,
I Draw θ[k] ∼ p(θ | X_{1:T}[k − 1], y_{1:T}); OK!
I Draw X_{1:T}[k] ∼ φ_{T,θ[k]}(·). Hard!
One-at-a-time: X_t[k] ∼ p(x_t | θ[k], X_{t−1}[k], X_{t+1}[k − 1], y_t)
Particle Gibbs: approximate φ_{T,θ}(dx_{1:T}) using a particle filter.
The particle filter
The particle filter approximates φ_{t,θ}(dx_{1:t}), t = 1, . . . , T, by

φ̂_{t,θ}^N(dx_{1:t}) := ∑_{i=1}^N (ω_t^i / ∑_ℓ ω_t^ℓ) δ_{X_{1:t}^i}(dx_{1:t}).

I Resampling: {X_{1:t−1}^i, ω_{t−1}^i}_{i=1}^N → {X̃_{1:t−1}^i, 1/N}_{i=1}^N.
I Propagation: X_t^i ∼ q_{t,θ}(X̃_{t−1}^i, ·) and X_{1:t}^i = (X̃_{1:t−1}^i, X_t^i).
I Weighting: ω_t^i = W_{t,θ}(X̃_{t−1}^i, X_t^i).
⇒ {X_{1:t}^i, ω_t^i}_{i=1}^N

[Diagram: repeated weighting → resampling → propagation cycle.]
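As a minimal sketch of the three steps above, assuming a toy linear-Gaussian model (X_t = φX_{t−1} + σ_x V_t, Y_t = X_t + σ_y E_t) and the bootstrap proposal q_{t,θ} = m_θ, so that the weight W_{t,θ} reduces to the observation density; the function name and parameter values are illustrative, not from the talk:

```python
import numpy as np

def particle_filter(y, N, phi=0.9, sig_x=1.0, sig_y=0.5, seed=None):
    """Bootstrap particle filter for the toy model
    X_t = phi*X_{t-1} + sig_x*V_t,  Y_t = X_t + sig_y*E_t  (V, E iid N(0,1)).
    With q = m, the weight W_t reduces to the observation density g."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))                        # column i stores trajectory X^i
    X[0] = sig_x * rng.standard_normal(N)       # X_1 ~ mu = N(0, sig_x^2)
    logw = -0.5 * ((y[0] - X[0]) / sig_y) ** 2  # weighting (up to a constant)
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        anc = rng.choice(N, size=N, p=w)        # resampling: multinomial ancestors
        X[:t] = X[:t, anc]                      # equal-weight trajectories
        X[t] = phi * X[t - 1] + sig_x * rng.standard_normal(N)  # propagation
        logw = -0.5 * ((y[t] - X[t]) / sig_y) ** 2              # weighting
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X, w                                 # weighted system {X^i_{1:T}, w^i}
```

Resampling here acts on whole trajectories, which is what makes the output an approximation of the joint smoothing distribution rather than only the filter marginals.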
MCMC using particle filters
In MCMC we need a Markov kernel with invariant distribution φ_T. (From now on we drop θ from the notation.)
Conditional particle lter (CPF)
Let x′_{1:T} = (x′_1, . . . , x′_T) be a given reference trajectory.
I Sample only N − 1 particles in the standard way.
I Set the Nth particle deterministically:
X_t^N = x′_t and A_t^N = N.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods.
Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.
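The CPF can be sketched by modifying the toy bootstrap filter: only N − 1 particles are sampled, and the Nth column is pinned to the reference trajectory and always survives resampling. The model, function name, and parameters are illustrative assumptions:

```python
import numpy as np

def conditional_pf(y, x_ref, N, phi=0.9, sig_x=1.0, sig_y=0.5, seed=None):
    """CPF for the toy linear-Gaussian model: N-1 particles are sampled in
    the standard way; particle N is set to the reference x_ref (A_t^N = N)."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))
    X[0, :N - 1] = sig_x * rng.standard_normal(N - 1)
    X[0, N - 1] = x_ref[0]                       # deterministic Nth particle
    logw = -0.5 * ((y[0] - X[0]) / sig_y) ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        anc = rng.choice(N, size=N - 1, p=w)     # resample only N-1 ancestors
        X[:t, :N - 1] = X[:t, anc]               # reference column untouched
        X[t, :N - 1] = phi * X[t - 1, :N - 1] + sig_x * rng.standard_normal(N - 1)
        X[t, N - 1] = x_ref[t]                   # X_t^N = x'_t
        logw = -0.5 * ((y[t] - X[t]) / sig_y) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X, w
```

Note that the reference may still be chosen as an ancestor by the other N − 1 particles; only its own path is deterministic.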
The PG Markov kernel (I/II)
Consider the procedure:
1. Run CPF(N, x′_{1:T}) targeting φ_T(dx_{1:T}).
2. Sample X_{1:T}^⋆ with P(X_{1:T}^⋆ = X_{1:T}^i | F_T^N) ∝ ω_T^i.
[Figure: three panels plotting State vs. Time (t = 1, . . . , 50), illustrating the CPF particle system over successive iterations.]
The PG Markov kernel (II/II)
This procedure:
I Maps x′_{1:T} stochastically into X_{1:T}^⋆.
I Implicitly defines a Markov kernel P_N on (X^T, 𝒳^T) (the PG kernel),
P_N(x′_{1:T}, A) = E[1_A(X_{1:T}^⋆)].
I P_N is φ_T-invariant for any number of particles N ≥ 1.
What about ergodicity?
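Step 2 of the two-step procedure is just a categorical draw over the weighted trajectories returned by a CPF run. A minimal sketch (the function name is hypothetical; column i of X is assumed to hold X^i_{1:T}):

```python
import numpy as np

def pg_select(X, w, seed=None):
    """Step 2 of the PG kernel: given weighted trajectories from a CPF run
    (column i of X is X^i_{1:T}, with final-time weights w), draw
    X*_{1:T} = X^i_{1:T} with probability proportional to w^i."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    i = rng.choice(X.shape[1], p=w / w.sum())
    return X[:, i]
```

Composing one CPF run started at x′_{1:T} with this draw gives one sample from P_N(x′_{1:T}, ·).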
Outline
2. Uniform ergodicity
Minorisation
Assume ‖W_t‖_∞ < ∞ and define {B_{t,T}}_{t=1}^T by

B_{t,T} = sup_{0≤ℓ≤T−t} ‖W_t‖_∞ · sup_{x_t} p(y_{t+1:t+ℓ} | x_t) / p(y_{t:t+ℓ} | y_{1:t−1}).

Theorem
The PG kernel is minorised by φ_T:
P_N(x′_{1:T}, A) ≥ (1 − ε_{T,N}) φ_T(A),
where
ε_{T,N} := 1 − ∏_{t=1}^T (N − 1)/(2B_{t,T} + N − 2) ≤ (1/(N − 1)) ∑_{t=1}^T (2B_{t,T} − 1) + O(N^{−2}).
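The product form of ε_{T,N} and its leading O(1/N) term are easy to evaluate numerically. A small sketch, with illustrative values of B_{t,T} (not from the talk):

```python
import numpy as np

def eps_TN(B, N):
    """eps_{T,N} = 1 - prod_{t=1}^T (N-1) / (2*B_t + N - 2),
    for a vector B = (B_{1,T}, ..., B_{T,T}) of bounds."""
    B = np.asarray(B, dtype=float)
    return 1.0 - np.prod((N - 1) / (2.0 * B + N - 2))

B = np.full(5, 2.0)                      # illustrative: B_{t,T} = 2, T = 5
N = 100
eps = eps_TN(B, N)                       # exact product form
lead = np.sum(2.0 * B - 1.0) / (N - 1)   # leading O(1/N) term of the bound
```

For these values the exact ε_{T,N} indeed sits below the first-order term, and both shrink at rate 1/N as the number of particles grows.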
Mixing conditions
Under strong mixing conditions:
I 1 − ε_{T,N} ≥ (1 − 1/(c(N − 1) + 1))^T for c ∈ (0, 1] (depending on mixing).
I Stable as T → ∞ if N ∼ γT.
Under (weaker) moment conditions:
I (1 − ε_{T,N})^{−1} bounded in probability as T → ∞, provided N ∼ T^{1/γ} for γ ∈ (0, 1) (depending on mixing).
I Generalised to the case of a misspecified model (unknown θ).
I Verifiable conditions (also for non-compact state spaces).
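The strong-mixing lower bound makes the N ∼ γT scaling easy to see numerically: with fixed N the bound collapses as T grows, while with N proportional to T it stabilises near exp(−1/(cγ)). The value of c below is illustrative:

```python
def minorisation_lb(T, N, c=0.5):
    """Strong-mixing lower bound (1 - 1/(c*(N-1)+1))**T on 1 - eps_{T,N};
    the mixing constant c in (0, 1] is an illustrative choice."""
    return (1.0 - 1.0 / (c * (N - 1) + 1)) ** T

fixed_N  = [minorisation_lb(T, 50) for T in (50, 200, 800)]     # decays to 0
scaled_N = [minorisation_lb(T, 2 * T) for T in (50, 200, 800)]  # stabilises
```

With c = 0.5 and γ = 2 the scaled sequence hovers near e^{−1} ≈ 0.37, independent of T.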
Gibbs sampling for state-space models
Alternative Gibbs sampling strategies:
Particle Gibbs: X_{1:T}^⋆ ∼ P_N(x_{1:T}, ·).
I Samples X_{1:T} in one block.
I Requires N ∝ T as T → ∞ for stability (strong mixing)
⇒ O(T²) computational cost!
One-at-a-time: X_t^⋆ ∼ p(x_t | x_{−t}, y_t), t = 1, . . . , T.
I Slow mixing/convergence speed!
I Stable as T → ∞?
Outline
3. Blocking strategies and stability
Blocking strategy
[Diagram: overlapping blocks J_1, J_3, J_5 (one row) and J_2, J_4 (interleaved row) covering the time indices 1, . . . , T.]

Intermediate strategy, blocked Particle Gibbs:
P_N^J(x_{J⁺}, dx_J^⋆): PG kernel for p(x_J | x_{∂J}, y_J).

Trade-off:
(1) Mixing of the ideal blocked Gibbs sampler improves (↗) as |J| ↗ (how fast? stable?)
(2) Mixing of P_N^J = (1 − 1/(c(N − 1) + 1))^{|J|}, i.e., ↘ as |J| ↗

∂J = {t ∈ J^c : t + 1 ∈ J or t − 1 ∈ J} (boundary points for block J)
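The boundary set ∂J defined above is a one-liner to compute; a small sketch with a hypothetical helper name:

```python
def boundary(J, T):
    """Boundary of block J within {1, ..., T}:
    dJ = {t in J^c : t+1 in J or t-1 in J}."""
    Jset = set(J)
    return sorted(t for t in range(1, T + 1)
                  if t not in Jset and (t + 1 in Jset or t - 1 in Jset))
```

For an interior block the boundary is the two time points flanking it; for a block touching t = 1 or t = T it is a single point.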
Stability of ideal blocked Gibbs sampler
Theorem
Let J = {J_1, . . . , J_m} be a cover of {1, . . . , T} and let P = P_{J_1} · · · P_{J_m} be the ideal Gibbs kernel for one complete sweep. Let all blocks have common size L and common overlap p. Then

|µP^k(f) − φ_T(f)| ≤ 2λ^{k−1} ∑_{t=1}^T osc_t(f),

where λ = α^{p+1} + α^{L−p} and α ∈ [0, 1) is a constant depending on the mixing coefficients of the model (assuming strong mixing).

Def: osc_t(f) = sup{ |f(x) − f(z)| : x, z ∈ X^T, x_{−t} = z_{−t} }
Stability of ideal blocked Gibbs sampler
[Diagram: consecutive blocks J_{k−1}, J_k, J_{k+1}, each of size L, overlapping by p.]

I If L ≥ 2p + 1 the odd/even blocks can be updated in parallel!
I To control the rate λ = α^{p+1} + α^{L−p} we need to increase both the block size L and the overlap p!
ex) with L = 2p + 1 (≈ 50% overlapping blocks) we get λ < 1 if L > log 4 / log(α^{−1}) − 1.
I For left-to-right and parallel blocking the rate is ∼ λ².
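The rate λ = α^{p+1} + α^{L−p} and the L = 2p + 1 threshold can be checked directly (with L = 2p + 1, λ = 2α^{p+1} < 1 exactly when L exceeds log 4 / log(α^{−1}) − 1). The value of α below is illustrative:

```python
import math

def contraction_rate(alpha, L, p):
    """lambda = alpha**(p+1) + alpha**(L-p) for block size L, overlap p."""
    return alpha ** (p + 1) + alpha ** (L - p)

alpha = 0.8                                    # illustrative mixing constant
L_min = math.log(4) / math.log(1 / alpha) - 1  # threshold for L = 2p+1 blocks
```

For α = 0.8 the threshold is L_min ≈ 5.21, so L = 7 (p = 3) gives λ ≈ 0.82 < 1 while L = 5 (p = 2) gives λ ≈ 1.02 > 1.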
Stability of blocked Particle Gibbs sampler
The blocked Particle Gibbs sampler P_N can be seen as a perturbation of the ideal blocked Gibbs sampler P.

Theorem
|µP_N^k(f) − φ_T(f)| ≤ 2λ_N^{k−1} ∑_{t=1}^T osc_t(f),
λ_N = λ + const. × ε_{L,N},   ε_{L,N} ≤ 1 − (1 − 1/(c(N − 1) + 1))^L.

I λ → 0 with increasing block size L and overlap p.
I ε_{L,N} ↘ as N ↗; ε_{L,N} ↗ as L ↗.
I Bound independent of T for marginals of φ_T(dx_{1:T}).
I ‖µP_N^k(·) − φ_T(·)‖_TV ≤ 2T λ_N^{k−1}.
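The key point of the perturbation bound is that ε_{L,N} involves L but not T, so the N needed for λ_N < 1 is set by the block size alone. A sketch with an illustrative c:

```python
def eps_LN(L, N, c=0.5):
    """Upper bound on the CPF perturbation: 1 - (1 - 1/(c*(N-1)+1))**L.
    Depends on the block size L but not on T; c is illustrative."""
    return 1.0 - (1.0 - 1.0 / (c * (N - 1) + 1)) ** L
```

Increasing N shrinks the perturbation for a fixed L, while enlarging L (to improve λ) inflates it, which is the trade-off resolved by fixing L and p first and then choosing N.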
Summary
I Particle Gibbs mimics sampling from φ_{T,θ}(dx_{1:T})
(e.g., in a Gibbs sampler or stochastic approximation method).
I Uniformly ergodic under weak conditions.
I Strong mixing conditions: stable if N = γT.
I (Weaker) moment conditions: stable if N = T^{1/γ}.
I Blocking ⇒ stable as T → ∞ for constant N.
I Set the block size L and overlap p s.t. the ideal sampler is stable.
I Set N large enough to obtain a stable Particle Gibbs sampler (depends only on L, not T).
I Opens up for parallelisation!
I Requires evaluation of m_θ(x_{t−1}, x_t)!
Wasserstein estimates
Def: For f : X^T → R, the oscillation in the i-th coordinate is
osc_i(f) = sup{ |f(x) − f(z)| : x, z ∈ X^T, x_{−i} = z_{−i} }

Def: W is a Wasserstein matrix for a Markov kernel P if
osc_i(Pf) ≤ ∑_{j=1}^T W_{ij} osc_j(f).
Wasserstein matrix for blocked Gibbs sampler
Under strong mixing,

        ⎡ 1                                      ⎤
        ⎢    ⋱                                   ⎥
        ⎢      1                                 ⎥
        ⎢  α        0 · · · 0   α^{|J|}          ⎥
  W^J = ⎢  α²       0 · · · 0   α^{|J|−1}        ⎥
        ⎢  ⋮        ⋮   ⋱   ⋮      ⋮             ⎥
        ⎢  α^{|J|}  0 · · · 0   α                ⎥
        ⎢                             1          ⎥
        ⎢                                ⋱       ⎥
        ⎣                                   1    ⎦

(rows i ∈ J have their only nonzero entries in the two boundary columns ∂J; all other rows are identity rows) is a Wasserstein matrix for the ideal Gibbs kernel updating block J,

P_J(x_{1:T}, dx_{1:T}^⋆):  X_J^⋆ ∼ p(x_J | x_{∂J}, y_J) dx_J,  X_{J^c}^⋆ = x_{J^c},

where α ∈ [0, 1) is a constant depending on the mixing coefficients.
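The structure of W^J can be built explicitly; the sketch below assumes an interior block with 1-based time indices (the k-th state of J couples to the left boundary with weight α^k and to the right boundary with weight α^{|J|−k+1}, matching the displayed matrix):

```python
import numpy as np

def wasserstein_matrix(J, T, alpha):
    """W^J for the ideal kernel updating block J = {j_1 < ... < j_m} within
    {1, ..., T}: identity rows outside J; row j_k has its only nonzero
    entries alpha**k and alpha**(m-k+1) in the two boundary columns."""
    J = sorted(J)
    m = len(J)
    lo, hi = J[0] - 1, J[-1] + 1        # boundary points (1-based), if inside
    W = np.eye(T)
    for k, t in enumerate(J, start=1):
        W[t - 1] = 0.0                  # rows in J: no self-dependence
        if lo >= 1:
            W[t - 1, lo - 1] = alpha ** k
        if hi <= T:
            W[t - 1, hi - 1] = alpha ** (m - k + 1)
    return W
```

The geometric decay in the distance to the boundary is what makes the row sums, and hence the sweep rate λ, controllable by growing the block size and overlap.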
Stability of blocked Gibbs sampler
Theorem
Let J = {J_1, . . . , J_m} be a cover of {1, . . . , T} and let P = P_{J_1} · · · P_{J_m} be the Gibbs kernel for one complete sweep. Let ∂ = ⋃_{J∈J} ∂J. Then, if

sup_{i∈J∩∂} ∑_{j=1}^T W_{ij}^J ≤ λ < 1  ∀J ∈ J,   (⋆)

it follows that

|µP^k(f) − φ_T(f)| ≤ 2λ^{k−1} ∑_{i=1}^T osc_i(f).

I With ≈ 50% overlapping equally sized blocks, (⋆) is satisfied if the block size satisfies |J| > log 4 / log(α^{−1}) − 1.
I For left-to-right and parallel blocking the rate is ∼ λ².