Technical report from Automatic Control at Linköpings universitet
Variational Iterations for Smoothing with
Unknown Process and Measurement
Noise Covariances
Tohid Ardeshiri, Emre Özkan, Umut Orguner, Fredrik
Gustafsson
Division of Automatic Control
E-mail: tohid@isy.liu.se, emre@isy.liu.se,
umut@metu.edu.tr, fredrik@isy.liu.se
30th August 2015
Report no.: LiTH-ISY-R-3086
Address:
Department of Electrical Engineering, Linköpings universitet
SE-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Technical reports from the Automatic Control group in Linköping are available from http://www.control.isy.liu.se/publications.
Abstract
In this technical report, some derivations for the smoother proposed in [1] are presented. More specifically, the derivations for the cyclic iteration needed to solve the variational Bayes smoother for linear state-space models with unknown process and measurement noise covariances in [1] are presented. Further, the variational iterations are compared with iterations of the Expectation Maximization (EM) algorithm for smoothing linear state-space models with unknown noise covariances.
Keywords: Adaptive smoothing, variational Bayes, sensor calibration, Rauch-Tung-Striebel smoother, Kalman filtering, noise covariance
1 Problem formulation
A Bayesian smoother using the variational Bayes method is given in [1]. The algorithm computes an approximation of the smoothing distribution for the state variable and the unknown noise covariances.
The dynamical model for the covariance matrix Σ_k adopted in [1] is taken from [2], where the matrix Beta-Bartlett stochastic evolution model was proposed for estimating multivariate stochastic volatility. Let 0 < λ ≤ 1 be a covariance discount factor and p(Σ_{k−1}) = IW(Σ_{k−1}; ν_{k−1}, Ψ_{k−1}). The forward predictive model p(Σ_k|Σ_{k−1}) is such that the forward prediction marginal density becomes the inverse Wishart density p(Σ_k) = IW(Σ_k; ν_k, Ψ_k), where

Ψ_k = λΨ_{k−1},    (1a)
ν_k = λν_{k−1} + (1 − λ)(2d + 2).    (1b)
Furthermore, the backward smoothing recursion is given as [2]

Ψ_k^{−1} ← (1 − λ)Ψ_k^{−1} + λΨ_{k+1}^{−1},    (2a)
ν_k ← (1 − λ)ν_k + λν_{k+1}.    (2b)
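As a concrete illustration, the discount recursions (1) and (2) act only on the inverse Wishart statistics (ν, Ψ); a minimal numerical sketch (the dimension d = 2, the discount factor λ = 0.9, and the function names are illustrative, not taken from [1] or [2]):

```python
import numpy as np

def bb_predict(nu, Psi, lam, d):
    """Forward Beta-Bartlett prediction (1): discount the IW statistics."""
    return lam * nu + (1.0 - lam) * (2 * d + 2), lam * Psi

def bb_smooth(nu_filt, Psi_filt, nu_next, Psi_next, lam):
    """Backward recursion (2): blend filtered and next-step smoothed statistics
    in the information (inverse) domain."""
    Psi_inv = ((1.0 - lam) * np.linalg.inv(Psi_filt)
               + lam * np.linalg.inv(Psi_next))
    nu = (1.0 - lam) * nu_filt + lam * nu_next
    return nu, np.linalg.inv(Psi_inv)

# toy example with d = 2 and discount factor lambda = 0.9
d, lam = 2, 0.9
nu0, Psi0 = 10.0, np.eye(d)
nu1, Psi1 = bb_predict(nu0, Psi0, lam, d)   # nu1 = 9.6, Psi1 = 0.9 * I
```

Note that λ = 1 makes both recursions the identity, which is the fixed-parameter case discussed later in the report.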
∗ U. Orguner is with the Department of Electrical and Electronics Engineering, Middle East Technical University.

Here, we derive the expectations needed for the cyclic iterations of the variational Bayes smoother given in [1], which approximates the joint smoothing posterior density for the states and the process and measurement noise covariance matrices. The joint smoothing posterior density
p(x_{0:K}, Q_{0:K−1}, R_{0:K} | y_{0:K}) ∝ p(x_{0:K}, Q_{0:K−1}, R_{0:K}, y_{0:K})    (3)

= p(x_0, Q_0, R_0) p(y_K|x_K, R_K) ∏_{l=0}^{K−2} p(Q_{l+1}|Q_l) ∏_{k=0}^{K−1} p(y_k|x_k, R_k) p(x_{k+1}|x_k, Q_k) p(R_{k+1}|R_k)    (4)
is approximated in [1] by a factorized probability density function (PDF) of the form

p(x_{0:K}, R_{0:K}, Q_{0:K−1} | y_{0:K}) ≈ q_x(x_{0:K}) q_Q(Q_{0:K−1}) q_R(R_{0:K}).    (5)
The analytical solutions for q̂_x, q̂_Q and q̂_R can be obtained by cyclic iteration of the following form:

log q̂_x(x_{0:K}) ← E_{q̂_Q q̂_R}[log p(x_{0:K}, Q_{0:K−1}, R_{0:K}, y_{0:K})] + c_x,    (6a)
log q̂_Q(Q_{0:K−1}) ← E_{q̂_x q̂_R}[log p(x_{0:K}, Q_{0:K−1}, R_{0:K}, y_{0:K})] + c_Q,    (6b)
log q̂_R(R_{0:K}) ← E_{q̂_x q̂_Q}[log p(x_{0:K}, Q_{0:K−1}, R_{0:K}, y_{0:K})] + c_R,    (6c)

where the expected values on the right-hand sides of (6) are taken with respect to the current q_x, q_Q and q_R, and c_x, c_Q and c_R are constants with respect to the variables x_k, Q_k and R_k, respectively [3, Chapter 10], [4].
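Structurally, (6) is a coordinate-ascent loop: each factor is refreshed while the other two are held fixed. A minimal Python skeleton of this loop (the update functions are hypothetical placeholders standing in for (6a)–(6c); the concrete updates are derived in Section 2):

```python
# placeholder updates standing in for (6a)-(6c); each records which
# factors it conditioned on, so the cyclic dependency structure is visible
def update_qx(y, qQ, qR):
    return {"factor": "qx", "uses": (qQ["factor"], qR["factor"])}

def update_qQ(y, qx):
    return {"factor": "qQ", "uses": (qx["factor"],)}

def update_qR(y, qx):
    return {"factor": "qR", "uses": (qx["factor"],)}

def vb_smoother(y, n_iters=10):
    """Skeleton of the cyclic iteration (6): each factor is refreshed in
    turn while the other two are held fixed."""
    qx, qQ, qR = {"factor": "qx"}, {"factor": "qQ"}, {"factor": "qR"}
    for _ in range(n_iters):
        qx = update_qx(y, qQ, qR)  # (6a)
        qQ = update_qQ(y, qx)      # (6b)
        qR = update_qR(y, qx)      # (6c)
    return qx, qQ, qR
```

In the actual algorithm, (6a) becomes an RTS smoother run and (6b)–(6c) become inverse Wishart forward-backward recursions, as derived next.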
2 Derivations for the smoother
In Subsections 2.1 to 2.3, we derive the expressions needed to complete one iteration of the algorithm. For brevity, all constant terms are denoted by c in the derivations. Starting from the latest estimates of the distributions (i.e., the ith iterates), we derive the (i + 1)th iterates, denoted q_x^{(i+1)}(·), q_Q^{(i+1)}(·) and q_R^{(i+1)}(·). The joint density p(x_{0:K}, Q_{0:K−1}, R_{0:K}, y_{0:K}) needed for the derivations is given as follows:

p(x_{0:K}, Q_{0:K−1}, R_{0:K}, y_{0:K}) = p(x_0, Q_0, R_0) p(y_K|x_K, R_K) ∏_{l=0}^{K−2} p(Q_{l+1}|Q_l) ∏_{k=0}^{K−1} p(y_k|x_k, R_k) p(x_{k+1}|x_k, Q_k) p(R_{k+1}|R_k)    (7)

= N(x_0; m_0, P_0) IW(Q_0; ν_0, V_0) IW(R_0; µ_0, M_0) N(y_K; C_K x_K, R_K) ∏_{k=0}^{K−1} N(y_k; C_k x_k, R_k) N(x_{k+1}; A_k x_k, Q_k) p(R_{k+1}|R_k) ∏_{l=0}^{K−2} p(Q_{l+1}|Q_l).    (8)
2.1 Derivations for the approximate posterior q_x^{(i+1)}(·)
Using (6a) and (8), we obtain
log q_x^{(i+1)}(x_{0:K}) = log N(x_0; m_0, P_0) − (1/2) ∑_{k=0}^{K−1} E_{q_Q^{(i)}}[Tr Q_k^{−1} (x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T] − (1/2) ∑_{k=0}^{K} E_{q_R^{(i)}}[Tr R_k^{−1} (y_k − C_k x_k)(y_k − C_k x_k)^T] + c    (9)

= log N(x_0; m_0, P_0) + ∑_{k=0}^{K−1} log N(x_{k+1}; A_k x_k, (E_{q_Q^{(i)}}[Q_k^{−1}])^{−1}) + ∑_{k=0}^{K} log N(y_k; C_k x_k, (E_{q_R^{(i)}}[R_k^{−1}])^{−1}) + c.    (10)

Hence, (10) has the same form as the logarithm of the joint posterior distribution of the state trajectory in a linear-Gaussian state-space model with the process noise covariance Q̃_k ≜ (E_{q_Q^{(i)}}[Q_k^{−1}])^{−1} and the measurement noise covariance R̃_k ≜ (E_{q_R^{(i)}}[R_k^{−1}])^{−1}. The approximate posterior density q_x^{(i+1)}(x_{0:K}) can therefore be computed using the well-known RTS smoother [5].
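Thus one (6a) update is just a Kalman filter followed by an RTS backward pass, run with Q̃_k and R̃_k in place of the true covariances. A self-contained sketch for a time-invariant model (the constant Q̃, R̃ and the toy numbers are illustrative assumptions, not values from [1]):

```python
import numpy as np

def rts_smoother(y, A, C, m0, P0, Qt, Rt):
    """Kalman filter + RTS backward pass for x_{k+1} = A x_k + w_k,
    y_k = C x_k + v_k, run with the modified covariances Qt = Qtilde,
    Rt = Rtilde from (10)."""
    K, n = len(y), m0.shape[0]
    mf = np.zeros((K, n)); Pf = np.zeros((K, n, n))   # filtered moments
    m, P = m0, P0
    for k in range(K):
        S = C @ P @ C.T + Rt                          # innovation covariance
        G = P @ C.T @ np.linalg.inv(S)                # Kalman gain
        m = m + G @ (y[k] - C @ m)
        P = P - G @ S @ G.T
        mf[k], Pf[k] = m, P
        m, P = A @ m, A @ P @ A.T + Qt                # predict to k + 1
    ms, Ps = mf.copy(), Pf.copy()                     # smoothed moments
    for k in range(K - 2, -1, -1):
        Pp = A @ Pf[k] @ A.T + Qt                     # predicted covariance
        J = Pf[k] @ A.T @ np.linalg.inv(Pp)           # smoother gain
        ms[k] = mf[k] + J @ (ms[k + 1] - A @ mf[k])
        Ps[k] = Pf[k] + J @ (Ps[k + 1] - Pp) @ J.T
    return ms, Ps

# one-dimensional toy run with a vague prior and constant measurements
A, C = np.eye(1), np.eye(1)
y = np.array([[2.0], [2.0], [2.0]])
ms, Ps = rts_smoother(y, A, C, np.zeros(1), 10.0 * np.eye(1),
                      np.eye(1), np.eye(1))
```

The smoother gains J_k computed in the backward pass also give the cross-covariances needed later in Section 2.4 (Cov(x_k, x_{k+1}|y_{0:K}) = J_k P_{k+1|K}); they are omitted from the return value here for brevity.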
2.2 Derivations for the approximate posterior q_Q^{(i+1)}(·)
The variational form for q_Q(·), using (6b) and (8), obeys

log q_Q^{(i+1)}(Q_{0:K−1}) = log IW(Q_0; ν_0, V_0) + ∑_{k=0}^{K−2} log p(Q_{k+1}|Q_k) + ∑_{k=0}^{K−1} E_{q_x^{(i)}}[log N(x_{k+1}; A_k x_k, Q_k)] + c    (11)

= log IW(Q_0; ν_0, V_0) + ∑_{k=0}^{K−2} log p(Q_{k+1}|Q_k) − (1/2) ∑_{k=0}^{K−1} E_{q_x^{(i)}}[Tr Q_k^{−1} (x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T] − (1/2) ∑_{k=0}^{K−1} log |Q_k| + c.    (12)

Taking the exponential of both sides, we get

q_Q^{(i+1)}(Q_{0:K−1}) ∝ IW(Q_0; ν_0, V_0) ∏_{k=0}^{K−2} p(Q_{k+1}|Q_k) ∏_{k=0}^{K−1} L_{Q,k}^{(i+1)}(Q_k)    (13)

where

L_{Q,k}^{(i+1)}(Q_k) ≜ |Q_k|^{−1/2} exp(−(1/2) Tr Q_k^{−1} E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T])    (14)

for k = 0, …, K − 1. Notice that the posterior density given in (13) corresponds to a smoothing problem with the following Markov model:

Q_0 ∼ IW(Q_0; ν_0, V_0),    (15a)
Q_{k+1}|Q_k ∼ p(Q_{k+1}|Q_k),   k = 0, …, K − 2,    (15b)
Z_{Q,k}^{(i+1)} ∼ p(Z_{Q,k}^{(i+1)}|Q_k) ≜ L_{Q,k}^{(i+1)}(Q_k),   k = 0, …, K − 1,    (15c)

where the Z_{Q,k}^{(i+1)} are pseudo-measurements with the pseudo-likelihood L_{Q,k}^{(i+1)}(·). Since the problem is a standard smoothing problem for a Markov model, it can be solved using a forward-backward recursion of the following form.
• Forward recursion:

q_{Q,0|−1}^{(i+1)}(Q_0) = IW(Q_0; ν_0, V_0),    (16a)
q_{Q,k|k}^{(i+1)}(Q_k) ∝ L_{Q,k}^{(i+1)}(Q_k) q_{Q,k|k−1}^{(i+1)}(Q_k),    (16b)
q_{Q,k+1|k}^{(i+1)}(Q_{k+1}) = ∫ p(Q_{k+1}|Q_k) q_{Q,k|k}^{(i+1)}(Q_k) dQ_k.    (16c)

• Backward recursion:

q_{Q,k|K}^{(i+1)}(Q_k) = ∫ q_{Q,k|k+1,k}^{(i+1)}(Q_k|Q_{k+1}, Z_{Q,0:k}^{(i+1)}) q_{Q,k+1|K}^{(i+1)}(Q_{k+1}) dQ_{k+1}    (17)

where

q_{Q,k|k+1,k}^{(i+1)}(Q_k|Q_{k+1}, Z_{Q,0:k}^{(i+1)}) = p(Q_{k+1}|Q_k) q_{Q,k|k}^{(i+1)}(Q_k) / q_{Q,k+1|k}^{(i+1)}(Q_{k+1}).    (18)

Note here that the conditioning on K in the smoothed density q_{Q,k|K}^{(i+1)}(·) pertains to y_{0:K} in the original smoothing problem, which corresponds to the pseudo-measurements Z_{Q,0:K−1}^{(i+1)} in the artificial problem (15).
The forward recursion starts with the prior density q_{Q,0|−1}^{(i+1)}(Q_0) = IW(Q_0; ν_0, V_0), which is an inverse Wishart density. Suppose now that the intermediate predicted density q_{Q,k|k−1}^{(i+1)}(·) is inverse Wishart of the following form:

q_{Q,k|k−1}^{(i+1)}(Q_k) = IW(Q_k; ν_{k|k−1}, V_{k|k−1}^{(i+1)}).    (19)

Thanks to the form of the pseudo-likelihood function (14), when q_{Q,k|k−1}^{(i+1)}(·) in (19) is substituted into the update expression (16b), the posterior q_{Q,k|k}^{(i+1)}(·) is also inverse Wishart, as given below:

q_{Q,k|k}^{(i+1)}(Q_k) = IW(Q_k; ν_{k|k}, V_{k|k}^{(i+1)}),    (20)

where

V_{k|k}^{(i+1)} = V_{k|k−1}^{(i+1)} + E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T],    (21a)
ν_{k|k} = ν_{k|k−1} + 1.    (21b)

When the posterior (20) is substituted into the prediction update expression (16c), thanks to the Beta-Bartlett transition density whose prediction updates are given by (1), the predicted density q_{Q,k+1|k}^{(i+1)}(·) also turns out to be inverse Wishart, as given below:

q_{Q,k+1|k}^{(i+1)}(Q_{k+1}) = IW(Q_{k+1}; ν_{k+1|k}, V_{k+1|k}^{(i+1)}),    (22)

where

V_{k+1|k}^{(i+1)} = λ_Q V_{k|k}^{(i+1)},    (23a)
ν_{k+1|k} = λ_Q ν_{k|k} + (1 − λ_Q)(2n_x + 2).    (23b)

As a result, by induction, all forward predicted and posterior densities are inverse Wishart.
The backward recursion starts with the final posterior density, which is inverse Wishart as discussed above, i.e., we have

q_{Q,K−1|K}^{(i+1)}(Q_{K−1}) = IW(Q_{K−1}; ν_{K−1|K}, V_{K−1|K}^{(i+1)}).    (24)

Note again that the conditioning on K in the smoothed density q_{Q,K−1|K}^{(i+1)}(·) and its parameters ν_{K−1|K}, V_{K−1|K}^{(i+1)} pertains to y_{0:K} in the original smoothing problem, which corresponds to the pseudo-measurements Z_{Q,0:K−1}^{(i+1)} in the artificial problem (15). Suppose now that an intermediate smoothed density q_{Q,k+1|K}^{(i+1)}(·) is inverse Wishart, as given below:

q_{Q,k+1|K}^{(i+1)}(Q_{k+1}) = IW(Q_{k+1}; ν_{k+1|K}, V_{k+1|K}^{(i+1)}).    (25)

When the smoothed density (25) is substituted into the backward update expression (17), thanks to the Beta-Bartlett transition density whose backward smoothing updates are given by (2), the smoothed density q_{Q,k|K}^{(i+1)}(·) also turns out to be inverse Wishart:

q_{Q,k|K}^{(i+1)}(Q_k) = IW(Q_k; ν_{k|K}, V_{k|K}^{(i+1)}),    (26)

where

V_{k|K}^{(i+1)} ← ((1 − λ_Q)(V_{k|k}^{(i+1)})^{−1} + λ_Q (V_{k+1|K}^{(i+1)})^{−1})^{−1},    (27a)
ν_{k|K} ← (1 − λ_Q)ν_{k|k} + λ_Q ν_{k+1|K}.    (27b)
2.2.1 Summary
Combining (21), (23) and (27), the marginals of the approximate joint smoothing density q_Q^{(i+1)}(Q_{0:K−1}) in (13) can be found as the inverse Wishart density

q_Q^{(i+1)}(Q_k) = IW(Q_k; ν_{k|K}, V_{k|K}^{(i+1)}),    (28)

whose parameters can be computed using

V_{0|0}^{(i+1)} = V_0 + E_{q_x^{(i)}}[(x_1 − A_0 x_0)(x_1 − A_0 x_0)^T],    (29a)
ν_{0|0} = ν_0 + 1,    (29b)

for k = 0, along with the forward (filtering) recursion

V_{k|k}^{(i+1)} = λ_Q V_{k−1|k−1}^{(i+1)} + E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T],    (30a)
ν_{k|k} = λ_Q ν_{k−1|k−1} + (1 − λ_Q)(2n_x + 2) + 1,    (30b)

for 1 ≤ k ≤ K − 1, followed by the backward (smoothing) recursion

V_{k|K}^{(i+1)} ← ((1 − λ_Q)(V_{k|k}^{(i+1)})^{−1} + λ_Q (V_{k+1|K}^{(i+1)})^{−1})^{−1},    (31a)
ν_{k|K} ← (1 − λ_Q)ν_{k|k} + λ_Q ν_{k+1|K},    (31b)

for 0 ≤ k ≤ K − 2.
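Given the expected outer products Ξ_k ≜ E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T] from the state smoother, the recursions (29)–(31) are straightforward to implement; a sketch (the function and variable names are our own, not from [1]):

```python
import numpy as np

def q_noise_posterior(Xi, V0, nu0, lam, n_x):
    """Forward recursion (29)-(30) and backward recursion (31) for the
    inverse Wishart statistics of Q_k.  Xi[k] is the expected outer
    product of the state prediction error, k = 0..K-1."""
    K = len(Xi)
    V = [None] * K
    nu = np.zeros(K)
    V[0] = V0 + Xi[0]                        # (29a)
    nu[0] = nu0 + 1.0                        # (29b)
    for k in range(1, K):                    # forward filtering (30)
        V[k] = lam * V[k - 1] + Xi[k]
        nu[k] = lam * nu[k - 1] + (1 - lam) * (2 * n_x + 2) + 1.0
    Vs, nus = list(V), nu.copy()             # backward smoothing (31)
    for k in range(K - 2, -1, -1):
        Vs[k] = np.linalg.inv((1 - lam) * np.linalg.inv(V[k])
                              + lam * np.linalg.inv(Vs[k + 1]))
        nus[k] = (1 - lam) * nu[k] + lam * nus[k + 1]
    return Vs, nus
```

With λ_Q = 1 the backward pass collapses to copying the last filtered statistics, so the output reproduces the batch statistics (35) of the fixed-parameter case below.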
2.2.2 Fixed Parameter Case
When Q_k is a fixed parameter, i.e., λ_Q = 1 or Q_k = Q_{k−1}, the forward filtering recursion (30) takes the form

V_{k|k}^{(i+1)} = V_{k−1|k−1}^{(i+1)} + E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T],    (32a)
ν_{k|k} = ν_{k−1|k−1} + 1,    (32b)

and the backward smoothing recursion (31) becomes

V_{k|K}^{(i+1)} = V_{k+1|K}^{(i+1)},    (33a)
ν_{k|K} = ν_{k+1|K}.    (33b)

Hence, the smoothing posterior for Q_k is the same for all time instances and is given by

q_Q^{(i+1)}(Q_k) = IW(Q_k; ν, V^{(i+1)}),    (34)

whose parameters are

V^{(i+1)} = V_0 + ∑_{k=0}^{K−1} E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T],    (35a)
ν = ν_0 + K.    (35b)

When the variational recursions converge, the expected value of Q_k can be obtained from the stationary values of V and ν via

Q̂_k ≜ E[Q_k] = V / (ν − 2n_x − 2).    (36)
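A quick numerical check of (35)–(36): if the expected outer products have converged to the true Q, the point estimate approaches it as K grows. All numbers below (dimension, horizon, prior statistics, Q_true) are illustrative assumptions:

```python
import numpy as np

n_x, K = 2, 50
Q_true = np.diag([1.0, 2.0])
# stand-in for converged expectations E[(x_{k+1} - A x_k)(.)^T] = Q_true
Xi = [Q_true] * K
V0, nu0 = 1e-3 * np.eye(n_x), 2.0 * n_x + 3.0   # vague but valid IW prior
V = V0 + sum(Xi)                  # (35a)
nu = nu0 + K                      # (35b)
Q_hat = V / (nu - 2 * n_x - 2)    # (36), point estimate E[Q_k]
```

Here ν_0 = 2n_x + 3 keeps the degrees of freedom above 2n_x + 2, so the inverse Wishart mean in (36) exists.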
2.3 Derivations for the approximate posterior q_R^{(i+1)}(·)
Using (6c) and (8), q_R^{(i+1)}(·) is given as

log q_R^{(i+1)}(R_{0:K}) = log IW(R_0; µ_0, M_0) + ∑_{k=0}^{K} E_{q_x^{(i)}}[log N(y_k; C_k x_k, R_k)] + ∑_{k=0}^{K−1} log p(R_{k+1}|R_k) + c    (37)

= log IW(R_0; µ_0, M_0) + ∑_{k=0}^{K−1} log p(R_{k+1}|R_k) − (1/2) ∑_{k=0}^{K} log |R_k| − (1/2) ∑_{k=0}^{K} E_{q_x^{(i)}}[Tr R_k^{−1} (y_k − C_k x_k)(y_k − C_k x_k)^T] + c.    (38)

Taking the exponential of both sides, we get

q_R^{(i+1)}(R_{0:K}) ∝ IW(R_0; µ_0, M_0) ∏_{k=0}^{K−1} p(R_{k+1}|R_k) ∏_{k=0}^{K} L_{R,k}^{(i+1)}(R_k)    (39)

where

L_{R,k}^{(i+1)}(R_k) ≜ |R_k|^{−1/2} exp(−(1/2) Tr R_k^{−1} E_{q_x^{(i)}}[(y_k − C_k x_k)(y_k − C_k x_k)^T])    (40)

for k = 0, …, K. Notice that the posterior density given in (39) corresponds to a smoothing problem with the following Markov model:

R_0 ∼ IW(R_0; µ_0, M_0),    (41a)
R_{k+1}|R_k ∼ p(R_{k+1}|R_k),   k = 0, …, K − 1,    (41b)
Z_{R,k}^{(i+1)} ∼ p(Z_{R,k}^{(i+1)}|R_k) ≜ L_{R,k}^{(i+1)}(R_k),   k = 0, …, K,    (41c)

where the Z_{R,k}^{(i+1)} are pseudo-measurements with the pseudo-likelihood L_{R,k}^{(i+1)}(·). Since the problem is a standard smoothing problem for a Markov model, it can be solved using a forward-backward recursion. The rest of the derivation follows exactly the same steps as those in Section 2.2 and is therefore not repeated here. A summary of the results is given in the next subsection.

2.3.1 Summary
The marginals of the approximate joint smoothing density q_R^{(i+1)}(R_{0:K}) in (39) can be found as the inverse Wishart density

q_R^{(i+1)}(R_k) = IW(R_k; µ_{k|K}, M_{k|K}^{(i+1)}),    (42)

whose parameters can be computed using

M_{0|0}^{(i+1)} = M_0 + E_{q_x^{(i)}}[(y_0 − C_0 x_0)(y_0 − C_0 x_0)^T],    (43a)
µ_{0|0} = µ_0 + 1,    (43b)

for k = 0, along with the forward (filtering) recursion

M_{k|k}^{(i+1)} = λ_R M_{k−1|k−1}^{(i+1)} + E_{q_x^{(i)}}[(y_k − C_k x_k)(y_k − C_k x_k)^T],    (44a)
µ_{k|k} = λ_R µ_{k−1|k−1} + (1 − λ_R)(2n_y + 2) + 1,    (44b)

for 1 ≤ k ≤ K, followed by the backward (smoothing) recursion

M_{k|K}^{(i+1)} ← ((1 − λ_R)(M_{k|k}^{(i+1)})^{−1} + λ_R (M_{k+1|K}^{(i+1)})^{−1})^{−1},    (45a)
µ_{k|K} ← (1 − λ_R)µ_{k|k} + λ_R µ_{k+1|K},    (45b)

for 0 ≤ k ≤ K − 1.
2.3.2 Fixed Parameter Case
When R_k is a fixed parameter, i.e., λ_R = 1 or R_k = R_{k−1}, the forward filtering recursion (44) takes the form

M_{k|k}^{(i+1)} = M_{k−1|k−1}^{(i+1)} + E_{q_x^{(i)}}[(y_k − C_k x_k)(y_k − C_k x_k)^T],    (46a)
µ_{k|k} = µ_{k−1|k−1} + 1,    (46b)

and the backward smoothing recursion (45) becomes

M_{k|K}^{(i+1)} = M_{k+1|K}^{(i+1)},    (47a)
µ_{k|K} = µ_{k+1|K}.    (47b)

Hence, the smoothing posterior for R_k is the same for all time instances and is given by

q_R^{(i+1)}(R_k) = IW(R_k; µ, M^{(i+1)}),    (48)

whose parameters are given by

M^{(i+1)} = M_0 + ∑_{k=0}^{K} E_{q_x^{(i)}}[(y_k − C_k x_k)(y_k − C_k x_k)^T],    (49a)
µ = µ_0 + K + 1.    (49b)

When the variational recursions converge, the expected value of R_k can be obtained from the stationary values of M and µ via

R̂_k ≜ E[R_k] = M / (µ − 2n_y − 2).    (50)
2.4 Calculation of the expected values
Now we can calculate the expected values needed for the iterations in Sections 2.1 to 2.3. The approximate distributions of the random matrices Q_k and R_k are inverse Wishart. Therefore their inverses are Wishart distributed, with expectations

E_{q_Q^{(i)}}[Q_k^{−1}] = (ν_{k|K} − n_x − 1)(V_{k|K}^{(i)})^{−1},    (51)
E_{q_R^{(i)}}[R_k^{−1}] = (µ_{k|K} − n_y − 1)(M_{k|K}^{(i)})^{−1}.    (52)

At each recursion of the algorithm, the RTS smoother provides the approximate joint posterior for p(x_{k+1}, x_k|y_{0:K}), denoted q_x^{(i)}(x_{k+1}, x_k) and parametrized (with stacked vectors and a 2 × 2 block covariance) as

q_x^{(i)}(x_{k+1}, x_k) = N([x_{k+1}; x_k]; [m_{k+1|K}^{(i)}; m_{k|K}^{(i)}], [P_{k+1|K}^{(i)}, P_{k+1,k|K}^{(i)}; P_{k,k+1|K}^{(i)}, P_{k|K}^{(i)}]).    (53)

Using (53), we can calculate the following expected values:

E_{q_x^{(i)}}[(y_k − C_k x_k)(y_k − C_k x_k)^T] = C_k P_{k|K}^{(i)} C_k^T + (y_k − C_k m_{k|K}^{(i)})(y_k − C_k m_{k|K}^{(i)})^T,    (54)

E_{q_x^{(i)}}[(x_{k+1} − A_k x_k)(x_{k+1} − A_k x_k)^T] = P_{k+1|K}^{(i)} + A_k P_{k|K}^{(i)} A_k^T − P_{k+1,k|K}^{(i)} A_k^T − A_k P_{k,k+1|K}^{(i)} + (m_{k+1|K}^{(i)} − A_k m_{k|K}^{(i)})(m_{k+1|K}^{(i)} − A_k m_{k|K}^{(i)})^T.    (55)
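These two expectations can be read directly off the RTS output; a sketch, where `Pcross[k]` stands for the cross-covariance P_{k+1,k|K}^{(i)} (the naming is ours):

```python
import numpy as np

def expected_outer_products(ms, Ps, Pcross, A, C, y, k):
    """Expected outer products (54)-(55) from the RTS moments (53).
    ms[k], Ps[k]: smoothed mean and covariance of x_k;
    Pcross[k]: Cov(x_{k+1}, x_k | y_{0:K})."""
    dy = y[k] - C @ ms[k]
    E_R = C @ Ps[k] @ C.T + np.outer(dy, dy)                  # (54)
    dx = ms[k + 1] - A @ ms[k]
    E_Q = (Ps[k + 1] + A @ Ps[k] @ A.T
           - Pcross[k] @ A.T - A @ Pcross[k].T
           + np.outer(dx, dx))                                # (55)
    return E_R, E_Q
```

Note that P_{k,k+1|K} = (P_{k+1,k|K})^T, which is why the single array `Pcross` suffices for both cross terms in (55).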
3 Comparison with Expectation Maximization
Consider the following linear time-invariant state-space representation:

x_{k+1} = A x_k + w_k,   w_k ∼iid N(w_k; 0, Q),    (56a)
y_k = C x_k + v_k,   v_k ∼iid N(v_k; 0, R),    (56b)

where {x_k ∈ R^{n_x} | 0 ≤ k ≤ K} is the state trajectory, also denoted x_{0:K}; {y_k ∈ R^{n_y} | 0 ≤ k ≤ K} is the measurement sequence, denoted more compactly as y_{0:K}; A ∈ R^{n_x×n_x} and C ∈ R^{n_y×n_x} are the known state transition and measurement matrices, respectively; and {w_k ∈ R^{n_x} | 0 ≤ k ≤ K − 1} and {v_k ∈ R^{n_y} | 0 ≤ k ≤ K} are mutually independent white Gaussian noise sequences. The initial state x_0 is assumed to have a Gaussian prior, i.e., p(x_0) = N(x_0; m_0, P_0). Q and R are the unknown (deterministic) fixed positive definite process and measurement noise covariance matrices.

The expectation-maximization (EM) method [6] can be used, as in [7–10], to compute the maximum likelihood (ML) estimate of the noise covariance matrices. In the E (expectation) step of the EM algorithm, the conditional expectation of the joint log-likelihood is computed using the last estimates of the unknown parameters Q^{(i)} and R^{(i)} as

Q = E[log p(x_{0:K}, y_{0:K}) | y_{0:K}, R^{(i)}, Q^{(i)}],    (57)

where

log p(x_{0:K}, y_{0:K}) = log N(x_0; m_0, P_0) − ((K + 1)/2) log |R| − (1/2) ∑_{k=0}^{K} Tr R^{−1}(y_k − C x_k)(y_k − C x_k)^T − (K/2) log |Q| − (1/2) ∑_{k=0}^{K−1} Tr Q^{−1}(x_{k+1} − A x_k)(x_{k+1} − A x_k)^T + c.    (58)

Therefore,

Q = −(1/2) E[(x_0 − m_0)^T P_0^{−1}(x_0 − m_0) + log |P_0|] − ((K + 1)/2) log |R| − (1/2) Tr(R^{−1} ∑_{k=0}^{K} E[(y_k − C x_k)(y_k − C x_k)^T | y_{0:K}]) − (K/2) log |Q| − (1/2) Tr(Q^{−1} ∑_{k=0}^{K−1} E[(x_{k+1} − A x_k)(x_{k+1} − A x_k)^T | y_{0:K}]) + c,    (59)

where the expectations can be computed using the posterior from the RTS smoother, which uses Q^{(i)} and R^{(i)}.
Now the expressions for the M (maximization) step of the EM algorithm can be computed. Taking the partial derivatives of Q with respect to Q^{−1} and R^{−1}, we get¹

∂Q/∂R^{−1} = ((K + 1)/2) R − (1/2) ∑_{k=0}^{K} E[(y_k − C x_k)(y_k − C x_k)^T | y_{0:K}],    (60a)
∂Q/∂Q^{−1} = (K/2) Q − (1/2) ∑_{k=0}^{K−1} E[(x_{k+1} − A x_k)(x_{k+1} − A x_k)^T | y_{0:K}],    (60b)

and equating the results to zero gives

R^{(i+1)} = (1/(K + 1)) ∑_{k=0}^{K} E[(y_k − C x_k)(y_k − C x_k)^T | y_{0:K}, R^{(i)}, Q^{(i)}],    (61a)
Q^{(i+1)} = (1/K) ∑_{k=0}^{K−1} E[(x_{k+1} − A x_k)(x_{k+1} − A x_k)^T | y_{0:K}, R^{(i)}, Q^{(i)}],    (61b)

where R^{(i+1)} and Q^{(i+1)} are the (i + 1)th estimates of R and Q, respectively. These estimates are then used in the next E step of the EM algorithm, running the RTS smoother again, until convergence.
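The resulting M step is simply an average of the smoothed outer products; a minimal sketch (the lists of expectations are assumed to come from an RTS smoother run with Q^{(i)}, R^{(i)}, and the function name is ours):

```python
import numpy as np

def em_m_step(E_yy, E_xx):
    """M-step updates (61).  E_yy holds the K+1 expectations
    E[(y_k - C x_k)(.)^T | y_{0:K}] for k = 0..K, and E_xx holds the K
    expectations E[(x_{k+1} - A x_k)(.)^T | y_{0:K}] for k = 0..K-1."""
    R_new = sum(E_yy) / len(E_yy)   # (61a): average over K + 1 terms
    Q_new = sum(E_xx) / len(E_xx)   # (61b): average over K terms
    return R_new, Q_new
```

Alternating this M step with an RTS smoother run (the E step) implements one full EM iteration for the noise covariances.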
Note that the iterations of the EM algorithm for the estimation of the noise covariances are not identical to the iterations of the VB algorithm, but they have closely related functional forms. In particular, the RTS smoother in the EM solution uses the last estimates R^{(i)} and Q^{(i)} when calculating the expected values in (61), while the RTS smoother in the VB solution uses the harmonic means of these random matrices, i.e., Q̃ ≜ (E_{q_Q}[Q^{−1}])^{−1} for the process noise covariance and R̃ ≜ (E_{q_R}[R^{−1}])^{−1} for the measurement noise covariance, where

(E_{q_Q}[Q^{−1}])^{−1} = V/(ν − n_x − 1),    (62)
(E_{q_R}[R^{−1}])^{−1} = M/(µ − n_y − 1).    (63)

Note that the matrices (62) and (63) are strictly smaller (in the positive definite sense) than the VB point estimates given in (36) and (50), respectively, which is a consequence of Jensen's inequality. Hence, VB does not use the last point estimates when calculating the involved expected values.

Expressions for V and ν are given in (35), and expressions for M and µ are given in (49). When V_0 and M_0 are small and

ν − n_x − 1 = K ⇒ ν_0 = n_x + 1,    (64)
µ − n_y − 1 = K + 1 ⇒ µ_0 = n_y + 1,    (65)

the noise covariances used in the RTS smoothers of the EM and VB solutions coincide. However, such values for the initial degrees of freedom of the inverse Wishart distributions are not valid, since the degrees of freedom must always be greater than twice the dimension of the random matrix in the inverse Wishart distribution; see [11, page 111]. Hence, the VB recursions do not reduce to the EM recursions for any valid initial (hyper-)parameter selection.
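The gap between the plug-in covariance (62) and the point estimate (36) can be checked numerically: with this inverse Wishart parametrization, Jensen's inequality gives (E[Q^{−1}])^{−1} ⪯ E[Q], and indeed V/(ν − n_x − 1) ≺ V/(ν − 2n_x − 2) whenever the mean exists. The values below are illustrative:

```python
import numpy as np

# compare the harmonic-mean matrix (62) with the VB point estimate (36)
# for an inverse Wishart with statistics (V, nu)
n_x = 3
V = np.diag([2.0, 4.0, 6.0])
nu = 2.0 * n_x + 4.0                 # must exceed 2*n_x + 2 for (36) to exist
Q_harmonic = V / (nu - n_x - 1)      # (62): (E[Q^{-1}])^{-1}, used by VB
Q_mean = V / (nu - 2 * n_x - 2)      # (36): E[Q], the VB point estimate
gap = Q_mean - Q_harmonic            # positive definite by Jensen's inequality
```

Since ν − n_x − 1 > ν − 2n_x − 2 for any n_x ≥ 1, the ordering holds for every valid (ν, V), not just this example.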
¹Since Q and R are positive definite, taking the partial derivatives with respect to Q^{−1} and R^{−1} is well defined.
References
[1] T. Ardeshiri, E. Özkan, U. Orguner, and F. Gustafsson, "Approximate Bayesian smoothing with unknown process and measurement noise covariances," Signal Processing Letters, submitted 2015.

[2] C. M. Carvalho and M. West, "Dynamic matrix-variate graphical models," Bayesian Analysis, vol. 2, pp. 69–98, 2007.

[3] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.

[4] D. Tzikas, A. Likas, and N. Galatsanos, "The variational approximation for Bayesian inference," IEEE Signal Process. Mag., vol. 25, no. 6, pp. 131–146, Nov. 2008.

[5] H. E. Rauch, C. T. Striebel, and F. Tung, "Maximum likelihood estimates of linear dynamic systems," Journal of the American Institute of Aeronautics and Astronautics, vol. 3, no. 8, pp. 1445–1450, Aug. 1965.

[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977. [Online]. Available: http://www.jstor.org/stable/2984875

[7] R. H. Shumway and D. S. Stoffer, "An approach to time series smoothing and forecasting using the EM algorithm," Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.

[8] S. Gibson and B. Ninness, "Robust maximum-likelihood estimation of multivariable dynamic systems," Automatica, vol. 41, no. 10, pp. 1667–1682, 2005. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0005109805001810

[9] Z. Ghahramani and G. E. Hinton, "Parameter estimation for linear dynamical systems," Department of Computer Science, University of Toronto, Tech. Rep., 1996.

[10] S. Särkkä, Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press, 2013.

[11] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC, 2000.
Report no.: LiTH-ISY-R-3086, ISSN 1400-3902.