(1)

Monte Carlo methods for inference in dynamical systems

Fredrik Lindsten

Uppsala University

March 30, 2016


(2)

Identification of state-space models

Consider a nonlinear, discrete-time state-space model,
$$x_{t+1} \mid x_t \sim f_\theta(x_{t+1} \mid x_t),$$
$$y_t \mid x_t \sim g_\theta(y_t \mid x_t),$$
and $x_1 \sim \mu(x_1)$.

We observe $y_{1:T} := (y_1, \dots, y_T)$ and wish to estimate $\theta$.

Bayesian model: $\theta$ is a random variable with prior density $p(\theta)$.

Aim: Compute $p(\theta \mid y_{1:T})$.

(3)

Gibbs sampler for SSMs

Aim: Find $p(\theta \mid y_{1:T})$.

Alternate between updating $\theta$ and updating $x_{1:T}$.

⇒ Gibbs sampling


(4)

Crash course in MCMC (I/IV)

Markov kernel

Let $\{X_k\}_{k \ge 0}$ be a Markov chain on a general state space.

The Markov transition kernel is given by
$$P(x_{k-1}, A) = \mathbb{P}(X_k \in A \mid X_{k-1} = x_{k-1}).$$

Stationary distribution

If $\pi$ is a probability distribution such that
$$\pi(A) = \int P(x, A)\,\pi(x)\,dx,$$
we say that $\pi$ is a stationary distribution for $P$.

(5)

Crash course in MCMC (II/IV)

Ergodic theorem

If $\{X_k\}_{k \ge 0}$ is ergodic with stationary distribution $\pi$, then
$$\mathbb{E}_\pi[g(X)] = \underbrace{\int g(x)\,\pi(x)\,dx}_{\text{average over space}} = \underbrace{\lim_{m \to \infty} \frac{1}{m}\sum_{k=1}^{m} g(X_k)}_{\text{average over ``time''}}.$$

Markov chain Monte Carlo: we want to compute $\mathbb{E}_\pi[g(X)]$.

1. Construct an ergodic Markov kernel $P$ with stationary distribution $\pi$.

2. Simulate $\{X_k\}_{k=1}^{m}$ according to $P$ for large $m$.

3. Approximate $\mathbb{E}_\pi[g(X)] \approx \frac{1}{m}\sum_{k=1}^{m} g(X_k)$.
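As a complement to the recipe above, here is a minimal sketch (not from the slides) of a random-walk Metropolis-Hastings sampler, one standard way to construct such a kernel $P$; the function and parameter names (`log_pi`, `step`) are assumptions made for the example.

```python
import numpy as np

def rw_metropolis_hastings(log_pi, x0, n_iters=10_000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings: builds a Markov chain whose stationary
    distribution is pi, given only an unnormalized log-density log_pi."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_pi(x)
    chain = np.empty((n_iters, x.size))
    for k in range(n_iters):
        x_prop = x + step * rng.standard_normal(x.size)   # symmetric Gaussian proposal
        logp_prop = log_pi(x_prop)
        if np.log(rng.uniform()) < logp_prop - logp:       # accept w.p. min(1, pi(x')/pi(x))
            x, logp = x_prop, logp_prop
        chain[k] = x
    return chain

# Approximate E_pi[g(X)] with g(x) = x^2 for a standard normal target (true value 1).
chain = rw_metropolis_hastings(lambda x: -0.5 * np.sum(x**2), x0=0.0)
print(np.mean(chain**2))
```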


(6)

Crash course in MCMC (III/IV)

Two common methods for constructing $P$:

1. Metropolis-Hastings

2. Gibbs sampling

Update components $X_j$ of the vector $X = (X_1, \dots, X_d)$ in turn,
$$X_j \sim \pi(x_j \mid x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_d), \quad j = 1, \dots, d.$$

(7)

Crash course in MCMC (IV/IV)

ex) Sample from
$$\pi(x, y) = \mathcal{N}\!\left( \begin{pmatrix} x \\ y \end{pmatrix}; \begin{pmatrix} 10 \\ 10 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \right).$$

Gibbs sampler:
• Draw $x' \sim \pi(x \mid y)$;
• Draw $y' \sim \pi(y \mid x')$.
(A code sketch follows below.)


[Figure: Gibbs samples from $\pi(x, y)$, plotted in the $(X, Y)$ plane.]
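A minimal sketch (not from the slides) of this Gibbs sweep in Python; the full conditionals below are derived from the stated mean $(10, 10)$ and covariance $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$.

```python
import numpy as np

def gibbs_bivariate_normal(n_iters=5_000, rng=None):
    """Gibbs sampler for pi(x, y) = N((10, 10), [[2, 1], [1, 1]]),
    alternating exact draws from the two full conditionals."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = 0.0, 0.0                      # arbitrary initialization
    samples = np.empty((n_iters, 2))
    for k in range(n_iters):
        # x | y ~ N(10 + (1/1)*(y - 10), 2 - 1^2/1) = N(y, 1)
        x = rng.normal(y, np.sqrt(1.0))
        # y | x ~ N(10 + (1/2)*(x - 10), 1 - 1^2/2) = N(10 + 0.5*(x - 10), 0.5)
        y = rng.normal(10.0 + 0.5 * (x - 10.0), np.sqrt(0.5))
        samples[k] = (x, y)
    return samples

samples = gibbs_bivariate_normal()
print(samples.mean(axis=0))   # roughly (10, 10)
print(np.cov(samples.T))      # roughly [[2, 1], [1, 1]]
```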

(8)

Gibbs sampler for SSMs, cont'd

Aim: Find $p(\theta, x_{1:T} \mid y_{1:T})$.

Alternate between updating $\theta$ and updating $x_{1:T}$.

MCMC: Gibbs sampling for state-space models. Iterate,
• Draw $\theta[k] \sim p(\theta \mid x_{1:T}[k-1], y_{1:T})$;
• Draw $x_{1:T}[k] \sim p(x_{1:T} \mid \theta[k], y_{1:T})$.

The above procedure results in a Markov chain $\{\theta[k], x_{1:T}[k]\}_{k \ge 1}$ with stationary distribution $p(\theta, x_{1:T} \mid y_{1:T})$.

(9)

Linear Gaussian state-space model (I/II)

ex) Gibbs sampling for linear system identification.
$$x_{t+1} = A x_t + v_t, \quad v_t \sim \mathcal{N}(0, Q),$$
$$y_t = C x_t + e_t, \quad e_t \sim \mathcal{N}(0, R),$$
with $\theta = (A, C, Q, R)$.

Gibbs sampling: Iterate,
• Draw $\theta' \sim p(\theta \mid x_{1:T}, y_{1:T})$;
• Draw $x'_{1:T} \sim p(x_{1:T} \mid \theta', y_{1:T})$:
  • Run a Kalman filter for $t = 1, \dots, T$.
  • Simulate backward, $x'_T, x'_{T-1}, \dots, x'_1$ (see the sketch below).

[Figure: Bode plot (magnitude and phase versus frequency) of the identified system: true system, posterior mean, and 95 % credibility region.]

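A minimal sketch (not from the slides) of the backward-simulation step for the linear Gaussian model, assuming the Kalman filter has already produced filtered means `m[t]` and covariances `P[t]`; the function and variable names are assumptions made for the example.

```python
import numpy as np

def backward_simulate(m, P, A, Q, rng=None):
    """Draw x'_{1:T} ~ p(x_{1:T} | theta, y_{1:T}) for a linear Gaussian SSM,
    given filtered means m[t] (shape (T, n)) and covariances P[t] (shape (T, n, n))
    from a Kalman filter, transition matrix A and process noise covariance Q."""
    rng = np.random.default_rng() if rng is None else rng
    T, n = m.shape
    x = np.empty((T, n))
    # Sample the last state from the filtering distribution at t = T.
    x[-1] = rng.multivariate_normal(m[-1], P[-1])
    for t in range(T - 2, -1, -1):
        # p(x_t | x_{t+1}, y_{1:t}) is Gaussian, with a "backward" gain J.
        J = P[t] @ A.T @ np.linalg.inv(A @ P[t] @ A.T + Q)
        mean = m[t] + J @ (x[t + 1] - A @ m[t])
        cov = P[t] - J @ A @ P[t]
        x[t] = rng.multivariate_normal(mean, cov)
    return x
```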

(10)

Gibbs sampler for general SSM?

What about the general nonlinear/non-Gaussian case?

• Draw $\theta[k] \sim p(\theta \mid x_{1:T}[k-1], y_{1:T})$; OK!
• Draw $x_{1:T}[k] \sim p(x_{1:T} \mid \theta[k], y_{1:T})$. Hard!

One-at-a-time: $x_t[k] \sim p(x_t \mid \theta[k], x_{t-1}[k], x_{t+1}[k-1], y_t)$.

Idea: Approximate $p(x_{1:T} \mid \theta, y_{1:T})$ using a particle filter.

(11)

The particle filter

Consider state inference for a fixed value of $\theta$.

The particle filter approximates $p(x_t \mid \theta, y_{1:t})$, $t = 1, \dots, T$, by
$$\hat{p}^N(x_t \mid \theta, y_{1:t}) := \sum_{i=1}^{N} \frac{w_t^i}{\sum_\ell w_t^\ell}\,\delta_{x_t^i}(x_t).$$

$t = 1$, importance sampling from the prior: for $i = 1, \dots, N$,
• Sample $x_1^i \sim \mu(x_1)$;
• Compute weights $w_1^i = g_\theta(y_1 \mid x_1^i)$.


(12)

The particle filter

$t > 1$:
• Resampling: $\{x_{t-1}^i, w_{t-1}^i\}_{i=1}^N \to \{\tilde{x}_{t-1}^i, 1/N\}_{i=1}^N$.
• Propagation: $x_t^i \sim f_\theta(x_t \mid \tilde{x}_{t-1}^i)$.
• Weighting: $w_t^i = g_\theta(y_t \mid x_t^i)$.
$\Rightarrow \{x_t^i, w_t^i\}_{i=1}^N$

N.B. This is the simplest PF; many extensions and improvements are available.

[Figure: one cycle of the particle filter: weighting, resampling, propagation, weighting, resampling.]

(13)

The particle filter

Algorithm: Bootstrap particle filter (all steps for $i = 1, \dots, N$)

1. Initialize ($t = 1$):
   (a) Draw $x_1^i \sim \mu(x_1)$.
   (b) Set $w_1^i = g_\theta(y_1 \mid x_1^i)$.
2. For $t = 2, 3, \dots$:
   (a) Draw $a_t^i \sim \textrm{Discrete}(\{w_{t-1}^j\}_{j=1}^N)$.
   (b) Draw $x_t^i \sim f_\theta(x_t \mid x_{t-1}^{a_t^i})$.
   (c) Set $w_t^i = g_\theta(y_t \mid x_t^i)$.
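A minimal sketch (not from the slides) of the bootstrap particle filter above for a scalar state; `mu_sample`, `f_sample` and `g_logpdf` stand in for $\mu$, $f_\theta$ and $g_\theta$ and are assumptions made for the example.

```python
import numpy as np

def bootstrap_pf(y, mu_sample, f_sample, g_logpdf, N=500, rng=None):
    """Bootstrap particle filter. Returns particles x[t, i], normalized weights
    w[t, i] and ancestor indices a[t, i] (row a[0] is unused)."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    x = np.empty((T, N))
    logw = np.empty((T, N))
    a = np.zeros((T, N), dtype=int)
    # Initialization: sample from the prior and weight by the likelihood.
    x[0] = mu_sample(N, rng)
    logw[0] = g_logpdf(y[0], x[0])
    for t in range(1, T):
        w = np.exp(logw[t - 1] - logw[t - 1].max())
        w /= w.sum()
        a[t] = rng.choice(N, size=N, p=w)          # (a) resample ancestors
        x[t] = f_sample(x[t - 1][a[t]], rng)       # (b) propagate
        logw[t] = g_logpdf(y[t], x[t])             # (c) weight
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    return x, w / w.sum(axis=1, keepdims=True), a
```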

Theoretical justification:
• Strongly consistent; central limit theorem at rate $\sqrt{N}$.
• Stable as $t \to \infty$ under mixing assumptions.
• Unbiased estimate of the normalizing constant $p(y_{1:t} \mid \theta)$.
• ...


(14)

PF illustration

[Figure: particle filter illustration, particle positions (state) over time.]

(15)

Filtering or smoothing?

• Possible to reconstruct state trajectories by tracing the genealogy of the particles (see the sketch below),
$$\big(\dots, x_{T-2}^{a_{T-1}^{a_T^i}}, x_{T-1}^{a_T^i}, x_T^i\big) =: x_{1:T}^i.$$
• At $t = T$ the particle filter provides an approximation of the joint smoothing distribution $p(x_{1:T} \mid \theta, y_{1:T})$:
$$\hat{p}^N(x_{1:T} \mid \theta, y_{1:T}) := \sum_{i=1}^{N} \frac{w_T^i}{\sum_\ell w_T^\ell}\,\delta_{x_{1:T}^i}(x_{1:T}).$$
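A minimal sketch (not from the slides) of tracing the genealogy backwards to recover the trajectories $x^i_{1:T}$; it assumes particles `x[t, i]` and ancestor indices `a[t, i]` in the format produced by the particle-filter sketch earlier.

```python
import numpy as np

def trace_genealogy(x, a):
    """Reconstruct the surviving trajectories x^i_{1:T} from particles x[t, i]
    and ancestor indices a[t, i] (the parent of particle i at time t lives at
    index a[t, i] at time t-1)."""
    T, N = x.shape
    paths = np.empty((T, N))
    idx = np.arange(N)                 # at the final time each particle is its own endpoint
    for t in range(T - 1, -1, -1):
        paths[t] = x[t, idx]
        if t > 0:
            idx = a[t, idx]            # step back to the ancestors
    return paths                       # paths[:, i] is the trajectory x^i_{1:T}

# Sampling x*_{1:T} with probability proportional to w_T then amounts to
# picking column i of `paths` with probability w[-1, i].
```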


(16)

Path degeneracy

[Figure: particle genealogies (state versus time), illustrating path degeneracy.]

(17)

Path degeneracy

[Figure: particle genealogies (state versus time), illustrating path degeneracy, continued.]


(18)

Sampling based on the PF

Recall Gibbs sampler:

Want to sample from $p(x_{1:T} \mid \theta, y_{1:T})$ for a fixed value of $\theta$.

With $\mathbb{P}(x^\star_{1:T} = x^i_{1:T}) \propto w_T^i$ we get $x^\star_{1:T} \overset{\text{approx.}}{\sim} p(x_{1:T} \mid \theta, y_{1:T})$.

[Figure: a trajectory drawn from the particle approximation of the joint smoothing distribution (state versus time).]

(19)

Particle MCMC idea

Problems with this approach,

• Based on a PF ⇒ approximate sample.
• $N \to \infty$ required to get a correct MCMC.
• A lot of wasted computations.

Analyze PF + MCMC together ⇒ Particle MCMC

Particle Gibbs sampling
Let $x'_{1:T} = (x'_1, \dots, x'_T)$ be a fixed reference trajectory.
• Sample only $N - 1$ particles in the standard way.
• Set the $N$th particle deterministically: $x_t^N = x'_t$ and $a_t^N = N$ (see the sketch below).

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods.

Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.
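A minimal sketch (not from the slides) of one conditional resampling/propagation/weighting step of the Particle Gibbs kernel, reusing the conventions of the bootstrap-PF sketch earlier; `x_ref_t` denotes the reference state $x'_t$ and the helper names are assumptions made for the example.

```python
import numpy as np

def conditional_pf_step(x_prev, logw_prev, x_ref_t, f_sample, g_logpdf, y_t, rng):
    """One step of the conditional PF used by Particle Gibbs: particles
    1..N-1 are resampled and propagated as usual, while particle N is set
    deterministically to the reference state with ancestor N."""
    N = x_prev.size
    w = np.exp(logw_prev - logw_prev.max())
    w /= w.sum()
    a = np.empty(N, dtype=int)
    a[:-1] = rng.choice(N, size=N - 1, p=w)   # standard resampling for N-1 particles
    a[-1] = N - 1                             # reference particle keeps itself as ancestor
    x = np.empty(N)
    x[:-1] = f_sample(x_prev[a[:-1]], rng)    # propagate the free particles
    x[-1] = x_ref_t                           # deterministic reference particle
    logw = g_logpdf(y_t, x)                   # weight all N particles
    return x, logw, a
```

After a full sweep over $t = 1, \dots, T$, a new trajectory is drawn with probability proportional to $w_T^i$ by tracing the genealogy, exactly as in the earlier sketch.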


(20)

Particle Gibbs

[Figure: Particle Gibbs illustration (state versus time).]

(21)

Particle Gibbs

[Figure: Particle Gibbs illustration, continued (state versus time).]

• Sampling $x^\star_{1:T}$ with $\mathbb{P}(x^\star_{1:T} = x^i_{1:T}) \propto w_T^i$ stochastically maps $x'_{1:T}$ into $x^\star_{1:T}$.
• Implicitly defines a Markov kernel (the PG kernel) on $\mathsf{X}^T$, the space of state trajectories.


(22)

Particle Gibbs

Theorem (Stationary distribution)

The joint smoothing distribution $p(x_{1:T} \mid \theta, y_{1:T})$ is a stationary distribution of the PG kernel for any $N \ge 1$.

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods.

Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.

Theorem (Uniform ergodicity)

The PG kernel is uniformly ergodic under weak conditions.

Additionally, under strong mixing conditions the convergence rate does not degenerate as $T \to \infty$, provided that $N \propto T$.

F. Lindsten, R. Douc, and E. Moulines, Uniform ergodicity of the Particle Gibbs sampler.

Scandinavian Journal of Statistics, 42(3): 775-797, 2015.

(23)

ex) Particle Gibbs

Stochastic volatility model,

$$x_{t+1} = 0.9\,x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta),$$
$$y_t = e_t \exp(\tfrac{1}{2} x_t), \quad e_t \sim \mathcal{N}(0, 1).$$

Consider the ACF of $\theta[k] - \mathbb{E}[\theta \mid y_{1:T}]$.

[Figure: ACF of $\theta[k] - \mathbb{E}[\theta \mid y_{1:T}]$ for PG with $T = 100$ (left) and $T = 1000$ (right), using $N = 5, 20, 100, 1000$ particles.]

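A minimal sketch (not from the slides) that simulates data from this stochastic volatility model, for instance to feed the PG sampler above; the value of $\theta$ is arbitrary and chosen only for illustration.

```python
import numpy as np

def simulate_sv(T=1000, theta=0.5, rng=None):
    """Simulate x_{t+1} = 0.9 x_t + v_t with v_t ~ N(0, theta),
    and y_t = e_t * exp(x_t / 2) with e_t ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(T)
    for t in range(T - 1):
        x[t + 1] = 0.9 * x[t] + np.sqrt(theta) * rng.standard_normal()
    y = rng.standard_normal(T) * np.exp(0.5 * x)
    return x, y
```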

(24)

Ancestor Sampling

Particle Gibbs:

Let $x'_{1:T} = (x'_1, \dots, x'_T)$ be a fixed reference trajectory.
• Sample only $N - 1$ particles in the standard way.
• Set the $N$th particle deterministically: $x_t^N = x'_t$.
• PG: set $a_t^N = N$.
• Ancestor sampling: instead, sample $a_t^N \in \{1, \dots, N\}$ with
$$\mathbb{P}(a_t^N = j) \propto w_{t-1}^j\,f_\theta(x'_t \mid x_{t-1}^j)$$
(see the sketch below).

F. Lindsten, M. I. Jordan and T. B. Schön, Particle Gibbs with Ancestor sampling, Journal of Machine Learning Research, 15: 2145-2184, 2014.
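A minimal sketch (not from the slides) of the ancestor-sampling draw for the reference particle at time $t$; the transition log-density `f_logpdf` is an assumed helper name.

```python
import numpy as np

def sample_reference_ancestor(x_prev, logw_prev, x_ref_t, f_logpdf, rng):
    """Ancestor sampling: draw a_t^N with
    P(a_t^N = j) proportional to w_{t-1}^j * f_theta(x'_t | x_{t-1}^j)."""
    logp = logw_prev + f_logpdf(x_ref_t, x_prev)   # log of w_{t-1}^j * f(x'_t | x_{t-1}^j)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return rng.choice(len(x_prev), p=p)
```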

(25)

Particle Gibbs with Ancestor Sampling


[Figure: PGAS illustration (state versus time).]

(26)

Particle Gibbs with Ancestor Sampling

[Figure: PGAS illustration, continued (state versus time).]

(27)

PGAS vs. PG

[Figure: particle genealogies for PGAS (top) and PG (bottom), state versus time.]


(28)

ex) Stochastic volatility model, cont'd

Stochastic volatility model,

$$x_{t+1} = 0.9\,x_t + v_t, \quad v_t \sim \mathcal{N}(0, \theta),$$
$$y_t = e_t \exp(\tfrac{1}{2} x_t), \quad e_t \sim \mathcal{N}(0, 1).$$

Consider the ACF of $\theta[k] - \mathbb{E}[\theta \mid y_{1:T}]$.

[Figure: ACF of $\theta[k] - \mathbb{E}[\theta \mid y_{1:T}]$ for PG (left) and PGAS (right) with $T = 1000$, using $N = 5, 20, 100, 1000$ particles.]

(29)

ex) Wiener system identification

[Block diagram: input $u_t$ enters a linear dynamical system $G$ (with process noise $v_t$), followed by a static nonlinearity $h(\cdot)$ and additive measurement noise $e_t$, giving the output $y_t$.]

Semi-parametric model: State-space model for G, Gaussian process model for h(·).

[Figures: Bode plot (magnitude and phase) of the identified linear block, and the estimated nonlinearity $h(z)$: true, posterior mean, and 99 % credibility regions.]

F. Lindsten, T. B. Schön and M. I. Jordan, Bayesian semiparametric Wiener system identication, Automatica, 49(7): 2053-2063, 2013.


(30)

ex) Gaussian process state-space models

A GP-SSM is a flexible nonparametric dynamical systems model,
$$f(\cdot) \sim \mathcal{GP}\big(m_\theta(x_t), k_\theta(x_t, x'_t)\big),$$
$$x_{t+1} \mid f, x_t \sim \mathcal{N}(f(x_t), Q),$$
$$y_t \mid x_t \sim g_\theta(y_t \mid x_t).$$

[Figure: posterior over the transition function $f(x, u)$.]

Idea: Marginalize out $f(\cdot)$ and infer $x_{1:T}$ (and $\theta$) directly.

Marginalization of $f$ introduces non-Markovian dependencies in $\{x_t\}$.

R. Frigola, F. Lindsten, T. B. Schön and C. E. Rasmussen, Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC, Conference on Neural Information Processing Systems (NIPS), 2013.

(31)

ex) Epidemiological forecasting


Figure: Disease activity over an eight year period. First half used for inference, second half for one-month predictions.

Inference (and prediction) for an SIR model with environmental noise and seasonal fluctuations,
$$S_{t+dt} = S_t + \mu P\,dt - \mu S_t\,dt - (1 + F v_t)\,\beta_t S_t P^{-1} I_t\,dt,$$
$$I_{t+dt} = I_t - (\gamma + \mu) I_t\,dt + (1 + F v_t)\,\beta_t S_t P^{-1} I_t\,dt,$$
$$R_{t+dt} = R_t + \gamma I_t\,dt - \mu R_t\,dt.$$
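A minimal sketch (not from the slides) of simulating these discretized SIR dynamics forward. The seasonal transmission rate $\beta_t$, the noise scale $F$ and all numeric values below are assumptions chosen only for illustration; the slides do not specify them.

```python
import numpy as np

def simulate_sir(S0, I0, R0, P, mu, gamma, F, beta, dt=1/365, n_steps=8*365, rng=None):
    """Euler-type forward simulation of the discretized SIR dynamics above.
    The transmission term is scaled by (1 + F * v_t) with v_t ~ N(0, 1)
    (environmental noise); beta is a user-supplied, e.g. seasonal, rate function."""
    rng = np.random.default_rng() if rng is None else rng
    S, I, R = float(S0), float(I0), float(R0)
    traj = np.empty((n_steps, 3))
    for k in range(n_steps):
        v = rng.standard_normal()                                  # environmental noise v_t
        infections = (1 + F * v) * beta(k * dt) * S * I / P * dt   # (1 + F v_t) beta_t S_t P^-1 I_t dt
        dS = mu * P * dt - mu * S * dt - infections
        dI = -(gamma + mu) * I * dt + infections
        dR = gamma * I * dt - mu * R * dt
        S, I, R = S + dS, I + dI, R + dR
        traj[k] = (S, I, R)
    return traj

# Illustrative (assumed) parameter values; time is measured in years.
beta = lambda t: 15.0 * (1 + 0.2 * np.cos(2 * np.pi * t))   # assumed seasonal transmission rate
traj = simulate_sir(S0=9_000, I0=100, R0=900, P=10_000,
                    mu=0.02, gamma=10.0, F=0.05, beta=beta)
```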


(32)

Applications of PGAS

• Oceanography [Marcos et al., JGR, 2015]
• Econometrics [Nonejad, Economics Letters, 2015]
• Synaptic plasticity models [Linderman et al., NIPS, 2014]
• Probabilistic programming [van de Meent et al., AISTATS, 2015]
• Power disaggregation [Valera et al., NIPS, 2015]
• ...

(33)

Ongoing: Blocking

[Diagram: the time interval $1, \dots, T$ covered by blocks $J_1, \dots, J_5$, with $J_1, J_3, J_5$ and $J_2, J_4$ drawn on alternating rows.]

• Update blocks of states using PG(AS).
• Stable as $T \to \infty$ using fixed $N$.
• Opens up for parallelisation.
• Possible to apply blocking also over components of the state vector (spatial blocking).

S. S. Singh, F. Lindsten, and E. Moulines, Blocking Strategies and Stability of Particle Gibbs Samplers, arXiv preprint, arXiv:1509.08362, 2015.


(34)

Ongoing: Spatio-temporal models

Spatio-temporal model:

I State xt spatially structured

I High dimensionality prevents conventional methods!

I Using multiple Nested Particle Filters we can handle dimensions ∼ 100  1 000.

Figure: Drought probability in the Sahel region in 1989.

C. A. Naesseth, F. Lindsten, and T. B. Schön, Nested Sequential Monte Carlo Methods.

(35)

Ongoing: Probabilistic graphical models

[Diagram: recursive decomposition of a probabilistic graphical model (factor nodes $\phi$) into smaller, structurally similar subgraphs, e.g. $G_3 \cong G_2$ and $G_{(4,4)} \cong G_{(2,4)}$.]

• Solving static problems using sequential methods.
• Divide-and-Conquer PF: extending particle filtering from chains to trees.
• In particular: hierarchical Bayesian models.
• Applications in genetics, linguistics, social science, ...

F. Lindsten, A. M. Johansen, C. A. Naesseth, B. Kirkpatrick, T. B. Schön, J. Aston, and A. Bouchard-Côté, Divide-and-Conquer with Sequential Monte Carlo, arXiv preprint, arXiv:1406.4993v2, 2015.


(36)

References

S. Linderman, C. H. Stock, and R. P. Adams.

A framework for studying synaptic plasticity with neural spike train data.

In Advances in Neural Information Processing Systems (NIPS) 27. 2014.

M. Marcos, F. M. Calafat, A. Berihuete, and S. Dangendorf.

Long-term variations in global sea level extremes.

Journal of Geophysical Research, 120(12):8115-8134, 2015.

N. Nonejad.

Flexible model comparison of unobserved components models using particle Gibbs with ancestor sampling.

Economics Letters, 133:35-39, 2015.

I. Valera, F. Ruiz, L. Svensson, and F. Perez-Cruz.

Infinite factorial dynamical model.

In Advances in Neural Information Processing Systems (NIPS) 28. 2015.

J.-W. van de Meent, H. Yang, V. Mansinghka, and F. Wood.

Particle Gibbs with ancestor sampling for probabilistic programs.

In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
