(1)

Blocking Strategies and Stability of Particle Gibbs Samplers

Fredrik Lindsten

Uppsala University

February 10, 2016

(2)

Outline

1. Background – Particle Gibbs

2. Uniform ergodicity

F. Lindsten, R. Douc, and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler. Scandinavian Journal of Statistics, 42(3): 775-797, 2015.

3. Blocking strategies and stability

S. S. Singh, F. Lindsten, and E. Moulines, Blocking Strategies and Stability of Particle Gibbs Samplers. arXiv:1509.08362, 2015.

(3)

Inference in state-space models

Consider a nonlinear discrete-time state-space model,

Xt | Xt−1 ∼ mθ(Xt−1, ·),

Yt | Xt ∼ gθ(Xt, ·), and X1 ∼ µ.

We observe Y1:T = (y1, . . . , yT) and wish to estimate θ and/or X1:T.
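As a running illustration, such a model can be simulated directly; a minimal sketch assuming a scalar linear-Gaussian instance (the choices of mθ, gθ, µ and all parameter values are illustrative, not from the talk):

```python
import numpy as np

def simulate_ssm(T, phi=0.9, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Simulate (X_1:T, Y_1:T) from a scalar linear-Gaussian instance:
    X_1 ~ N(0, sigma_x^2), X_t | X_{t-1} ~ N(phi * X_{t-1}, sigma_x^2)  (m_theta),
    Y_t | X_t ~ N(X_t, sigma_y^2)  (g_theta)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    x[0] = rng.normal(0.0, sigma_x)                 # X_1 ~ mu
    for t in range(1, T):
        x[t] = rng.normal(phi * x[t - 1], sigma_x)  # transition kernel m_theta
    y = rng.normal(x, sigma_y)                      # observation density g_theta
    return x, y

x, y = simulate_ssm(50)
```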

(4)

Gibbs sampler for SSMs

Let

φT,θ(dx1:T) = p(x1:T | θ, y1:T) dx1:T

denote the joint smoothing distribution.

MCMC: Gibbs sampling for state-space models. Iterate,

I Draw θ[k] ∼ p(θ | X1:T[k − 1], y1:T); OK!

I Draw X1:T[k] ∼ φT,θ[k](·). Hard!

One-at-a-time: Xt[k] ∼ p(xt | θ[k], Xt−1[k], Xt+1[k − 1], yt)

Particle Gibbs: Approximate φT,θ(dx1:T) using a particle filter.

(5)

The particle filter

The particle filter approximates φt,θ(dx1:t), t = 1, . . . , T by

φ̂^N_t,θ(dx1:t) := Σ^N_{i=1} (ω^i_t / Σ_ℓ ω^ℓ_t) δ_{X^i_1:t}(dx1:t).

I Resampling: {X^i_1:t−1, ω^i_t−1}^N_{i=1} → {X̃^i_1:t−1, 1/N}^N_{i=1}.

I Propagation: X^i_t ∼ qt,θ(X̃^i_t−1, ·) and X^i_1:t = (X̃^i_1:t−1, X^i_t).

I Weighting: ω^i_t = Wt,θ(X̃^i_t−1, X^i_t).

⇒ {X^i_1:t, ω^i_t}^N_{i=1}

[Diagram: the weighting → resampling → propagation cycle of the particle filter]
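The weighting/resampling/propagation cycle can be sketched as a bootstrap filter (proposal qt,θ = mθ, so the incremental weight is the observation density); a minimal sketch under the same illustrative linear-Gaussian model, tracking only marginal particles (the genealogy needed for full paths X1:t is omitted for brevity):

```python
import numpy as np

def bootstrap_pf(y, N=200, phi=0.9, sigma_x=1.0, sigma_y=0.5, seed=1):
    """Bootstrap particle filter: proposal q = m_theta, so the incremental
    weight W_t is (proportional to) the Gaussian observation density."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))
    X[0] = rng.normal(0.0, sigma_x, N)                        # init from mu
    w = np.exp(-0.5 * ((y[0] - X[0]) / sigma_y) ** 2)         # weighting
    w /= w.sum()
    for t in range(1, T):
        anc = rng.choice(N, N, p=w)                           # resampling
        X[t] = rng.normal(phi * X[t - 1, anc], sigma_x)       # propagation
        w = np.exp(-0.5 * ((y[t] - X[t]) / sigma_y) ** 2)     # weighting
        w /= w.sum()
    return X, w

y_obs = np.sin(np.linspace(0.0, 3.0, 30))   # stand-in observations
X, w = bootstrap_pf(y_obs)
```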

(6)

MCMC using particle filters

In MCMC we need a Markov kernel with invariant distribution φT. (From now on we drop θ from the notation.)

Conditional particle filter (CPF)

Let x′1:T = (x′1, . . . , x′T) be a given reference trajectory.

I Sample only N − 1 particles in the standard way.

I Set the Nth particle deterministically:

X^N_t = x′t and A^N_t = N.

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods.

Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.
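A sketch of the CPF recursion, assuming the same illustrative linear-Gaussian model; the last particle slot plays the role of the Nth particle, pinned to the reference trajectory together with its ancestor index:

```python
import numpy as np

def cpf(y, x_ref, N=100, phi=0.9, sigma_x=1.0, sigma_y=0.5, seed=2):
    """Conditional particle filter: the last slot is pinned to the reference
    trajectory (X_t^N = x'_t) with ancestor A_t^N = N, while the other N - 1
    particles are resampled and propagated in the standard way."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))
    A = np.zeros((T, N), dtype=int)
    X[0, : N - 1] = rng.normal(0.0, sigma_x, N - 1)
    X[0, N - 1] = x_ref[0]                                # deterministic Nth particle
    w = np.exp(-0.5 * ((y[0] - X[0]) / sigma_y) ** 2)
    w /= w.sum()
    for t in range(1, T):
        A[t, : N - 1] = rng.choice(N, N - 1, p=w)         # resample N - 1 ancestors
        A[t, N - 1] = N - 1                               # A_t^N = N (0-indexed here)
        X[t] = rng.normal(phi * X[t - 1, A[t]], sigma_x)  # propagate
        X[t, N - 1] = x_ref[t]                            # X_t^N = x'_t
        w = np.exp(-0.5 * ((y[t] - X[t]) / sigma_y) ** 2)
        w /= w.sum()
    return X, A, w

y_obs = np.linspace(-1.0, 1.0, 20)
x_ref = np.zeros(20)
X, A, w = cpf(y_obs, x_ref, N=50)
```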

(7)

The PG Markov kernel (I/II)

Consider the procedure:

1. Run CPF(N, x′1:T) targeting φT(dx1:T),

2. Sample X⋆1:T with P(X⋆1:T = X^i_1:T | F^N_T) ∝ ω^i_T.

[Three figures: particle system trajectories, state plotted against time]

(8)

The PG Markov kernel (II/II)

This procedure:

I Maps x′1:T stochastically into X⋆1:T.

I Implicitly defines a Markov kernel PN on (X^T, X^T) (the PG kernel),

PN(x′1:T, A) = E[1A(X⋆1:T)].

I PN is φT-invariant for any number of particles N ≥ 1.

What about ergodicity?

(9)

Outline

1. Background – Particle Gibbs

2. Uniform ergodicity

F. Lindsten, R. Douc, and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler. Scandinavian Journal of Statistics, 42(3): 775-797, 2015.

3. Blocking strategies and stability

S. S. Singh, F. Lindsten, and E. Moulines, Blocking Strategies and Stability of Particle Gibbs Samplers. arXiv:1509.08362, 2015.

(10)

Minorisation

Assume ‖Wt‖∞ < ∞ and define {Bt,T}^T_{t=1} by

Bt,T = sup_{0≤ℓ≤T−t} ‖Wt‖∞ sup_{xt} p(yt+1:t+ℓ | xt) / p(yt:t+ℓ | y1:t−1).

Theorem

The PG kernel is minorised by φT:

PN(x′1:T, A) ≥ (1 − εT,N) φT(A), where

εT,N := 1 − Π^T_{t=1} (N − 1)/(2Bt,T + N − 2) ≤ (1/(N − 1)) Σ^T_{t=1} (2Bt,T − 1) + O(N^{−2}).
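The behaviour of εT,N is easy to probe numerically; a sketch assuming a constant bound Bt,T ≡ B (hypothetical, purely for illustration):

```python
def eps_TN(B, T, N):
    """eps_{T,N} = 1 - prod_{t=1}^T (N-1)/(2*B_t + N - 2), here with B_t = B
    constant (a hypothetical uniform bound, for illustration only)."""
    return 1.0 - ((N - 1) / (2.0 * B + N - 2)) ** T

def first_order(B, T, N):
    """Leading term of the expansion: (1/(N-1)) * sum_t (2*B_t - 1)."""
    return T * (2.0 * B - 1.0) / (N - 1)

# The minorisation constant 1 - eps degrades as T grows for fixed N,
# and recovers when N grows with T:
eps_small_N = eps_TN(B=2.0, T=100, N=100)
eps_large_N = eps_TN(B=2.0, T=100, N=1000)
```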

(11)

Mixing conditions

Under strong mixing conditions:

I 1 − εT,N ≥ (1 − 1/(c(N − 1) + 1))^T for c ∈ (0, 1] (depending on mixing).

I Stable as T → ∞ if N ∼ γT.

Under (weaker) moment conditions:

I (1 − εT,N)^{−1} bounded in probability as T → ∞, provided N ∼ T^{1/γ} for γ ∈ (0, 1) (depending on mixing).

I Generalised to the case with a misspecified model (unknown θ).

I Verifiable conditions (also for non-compact state spaces).

(12)

Gibbs sampling for state-space models

Alternative Gibbs sampling strategies:

Particle Gibbs: X⋆1:T ∼ PN(x1:T, ·).

I Samples X1:T in one block.

I Requires N ∝ T as T → ∞ for stability (strong mixing)

⇒ O(T^2) computational cost!

One-at-a-time: X⋆t ∼ p(xt | x−t, yt), t = 1, . . . , T.

I Slow mixing/convergence speed!

I Stable as T → ∞?

(13)

Outline

1. Background – Particle Gibbs

2. Uniform ergodicity

F. Lindsten, R. Douc, and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler. Scandinavian Journal of Statistics, 42(3): 775-797, 2015.

3. Blocking strategies and stability

S. S. Singh, F. Lindsten, and E. Moulines, Blocking Strategies and Stability of Particle Gibbs Samplers. arXiv:1509.08362, 2015.

(14)

Blocking strategy

[Diagram: overlapping blocks J1, . . . , J5 covering {1, . . . , T}; odd blocks J1, J3, J5 and even blocks J2, J4 drawn on separate rows]

Intermediate strategy – blocked Particle Gibbs:

P^J_N(xJ+, dx⋆J): the PG kernel for p(xJ | x∂J, yJ).

Trade-off:

(1) Mixing of the ideal blocked Gibbs sampler ↗ as |J| ↗ (how fast? stable?)

(2) Mixing of P^J_N: (1 − 1/(c(N − 1) + 1))^{|J|}, i.e., ↘ as |J| ↗

∂J = {t ∈ Jc : t + 1 ∈ J or t − 1 ∈ J} (boundary points for block J)
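The block and boundary bookkeeping can be sketched as follows (function names and the covering scheme are illustrative; blocks are contiguous and 1-based):

```python
def make_blocks(T, L, p):
    """Cover {1, ..., T} with blocks of common size L and common overlap p:
    consecutive blocks share p indices, so block starts step by L - p."""
    blocks, s = [], 1
    while True:
        blocks.append(list(range(s, min(s + L, T + 1))))
        if s + L - 1 >= T:
            break
        s += L - p
    return blocks

def boundary(J, T):
    """Boundary dJ = {t in J^c : t + 1 in J or t - 1 in J} of a contiguous block."""
    return [t for t in (J[0] - 1, J[-1] + 1) if 1 <= t <= T]

blocks = make_blocks(T=20, L=5, p=2)
bnds = [boundary(J, 20) for J in blocks]
# e.g. blocks[1] = [4, 5, 6, 7, 8] with boundary [3, 9]
```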

(15)

Stability of ideal blocked Gibbs sampler

Theorem

Let J = {J1, . . . , Jm} be a cover of {1, . . . , T} and let P = PJ1 · · · PJm be the ideal Gibbs kernel for one complete sweep. Let all blocks have common size L and common overlap p. Then

|µP^k(f) − φT(f)| ≤ 2λ^{k−1} Σ^T_{t=1} osct(f),

where λ = α^{p+1} + α^{L−p} and α ∈ [0, 1) is a constant depending on the mixing coefficients of the model (assuming strong mixing).

Def: osct(f) = sup_{x,z∈X^T : x−t=z−t} |f(x) − f(z)|

(16)

Stability of ideal blocked Gibbs sampler

[Diagram: overlapping blocks Jk−1, Jk, Jk+1 with common size L and overlap p]

I If L ≥ 2p + 1 the odd/even blocks can be updated in parallel!

I To control the rate λ = α^{p+1} + α^{L−p} we need to increase both the block size L and the overlap p!

ex) With L = 2p + 1 (≈ 50% overlapping blocks) we get λ < 1 if L > log 4 / log(α^{−1}) − 1.

I For left-to-right and parallel blocking the rate is ∼ λ^2.
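The rate λ = α^{p+1} + α^{L−p} and the block-size threshold for L = 2p + 1 can be checked numerically (the rounding helper is hypothetical, not from the talk):

```python
import math

def contraction_rate(alpha, L, p):
    """lambda = alpha^(p+1) + alpha^(L-p) from the theorem."""
    return alpha ** (p + 1) + alpha ** (L - p)

def min_block_size(alpha):
    """Smallest odd L = 2p + 1 with lambda < 1, i.e. L > log 4 / log(1/alpha) - 1."""
    threshold = math.log(4.0) / math.log(1.0 / alpha) - 1.0
    L = math.ceil(threshold)
    if L <= threshold:   # threshold hit exactly: the inequality is strict
        L += 1
    if L % 2 == 0:       # force the L = 2p + 1 (odd) form
        L += 1
    return L

alpha = 0.8
L = min_block_size(alpha)                       # smallest admissible odd block size
lam = contraction_rate(alpha, L, (L - 1) // 2)  # with ~50% overlap: 2 * alpha^((L+1)/2)
```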

(17)

Stability of blocked Particle Gibbs sampler

The blocked Particle Gibbs sampler PN can be seen as a perturbation of the ideal blocked Gibbs sampler P.

Theorem

|µPN^k(f) − φT(f)| ≤ 2λN^{k−1} Σ^T_{t=1} osct(f), where

λN = λ + const. × εL,N,   εL,N ≤ 1 − (1 − 1/(c(N − 1) + 1))^L.

I λ → 0 with increasing block size L and overlap p.

I εL,N ↘ as N ↗; εL,N ↗ as L ↗.

I Bound independent of T for marginals of φT(dx1:T).

I ‖µPN^k(·) − φT(·)‖TV ≤ 2T λN^{k−1}.

(18)

Summary

I Particle Gibbs – mimics sampling from φT,θ(dx1:T)

(e.g., in a Gibbs sampler or stochastic approximation method).

I Uniformly ergodic under weak conditions.

I Strong mixing conditions: stable if N = γT.

I (Weaker) moment conditions: stable if N = T^{1/γ}.

I Blocking ⇒ stable as T → ∞ for constant N.

I Set the block size L and overlap p s.t. the ideal sampler is stable.

I Set N large enough to obtain a stable Particle Gibbs sampler (depends only on L, not T).

I Opens up for parallelisation!

I Requires evaluation of mθ(xt−1, xt)!

(19)

Wasserstein estimates

Def: For f : X^T → R, the oscillation in the i-th coordinate is

osci(f) = sup_{x,z∈X^T : x−i=z−i} |f(x) − f(z)|

Def: W is a Wasserstein matrix for a Markov kernel P if

osci(P f) ≤ Σ^T_{j=1} Wij oscj(f).

(20)

Wasserstein matrix for blocked Gibbs sampler

Under strong mixing,

        | 1                                        |
        |   ⋱                                      |
        |     1                                    |
        |       α        0 · · · 0   α^{|J|}       |
W^J  =  |       α^2      0 · · · 0   α^{|J|−1}     |
        |       ⋮        ⋱           ⋮             |
        |       α^{|J|}  0 · · · 0   α             |
        |                               1          |
        |                                 ⋱        |
        |                                   1      |

is a Wasserstein matrix for the ideal Gibbs kernel updating block J,

PJ(x1:T, dx⋆1:T): X⋆J ∼ p(xJ | x∂J, yJ) dxJ, X⋆Jc = xJc,

where α ∈ [0, 1) is a constant depending on the mixing coefficients.
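Constructing W^J and inspecting its row sums is mechanical; a sketch for a single interior block (1-based contiguous J; the function name is hypothetical):

```python
import numpy as np

def wasserstein_matrix(T, J, alpha):
    """W^J for the ideal Gibbs kernel updating a contiguous interior block J:
    rows outside J are identity rows; the row at position k inside J has mass
    alpha^k in the left boundary column and alpha^(|J|+1-k) in the right one."""
    W = np.eye(T)
    lo, hi = J[0], J[-1]
    for k, t in enumerate(J, start=1):
        W[t - 1, :] = 0.0
        if lo - 1 >= 1:
            W[t - 1, lo - 2] = alpha ** k                 # left boundary column
        if hi + 1 <= T:
            W[t - 1, hi] = alpha ** (len(J) + 1 - k)      # right boundary column
    return W

W = wasserstein_matrix(T=10, J=list(range(3, 8)), alpha=0.5)  # J = {3, ..., 7}
row_sums = W[2:7].sum(axis=1)   # inside the block: alpha^k + alpha^(|J|+1-k)
```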

(21)

Stability of blocked Gibbs sampler

Theorem

Let J = {J1, . . . , Jm} be a cover of {1, . . . , T} and let P = PJ1 · · · PJm be the Gibbs kernel for one complete sweep. Let ∂ = ∪_{J∈J} ∂J. Then, if

sup_{i∈J∩∂} Σ^T_{j=1} W^J_ij ≤ λ < 1 for all J ∈ J,   (⋆)

it follows that

|µP^k(f) − φT(f)| ≤ 2λ^{k−1} Σ^T_{i=1} osci(f).

I With ≈ 50% overlapping, equally sized blocks, (⋆) is satisfied if the block size satisfies |J| > log 4 / log(α^{−1}) − 1.

I For left-to-right and parallel blocking the rate is ∼ λ^2.
