

Technical report from Automatic Control at Linköpings universitet

Variational Iterations for Filtering and Smoothing with skew-t measurement noise

Tohid Ardeshiri, Henri Nurminen, Robert Piché, Fredrik Gustafsson

Division of Automatic Control

E-mail: tohid@isy.liu.se, henri.nurminen@tut.fi, robert.piche@tut.fi, fredrik@isy.liu.se

19th March 2015

Report no.: LiTH-ISY-R-3083

Submitted to

Address:

Department of Electrical Engineering, Linköpings universitet

SE-581 83 Linköping, Sweden

WWW: http://www.control.isy.liu.se


Technical reports from the Automatic Control group in Linköping are available from http://www.control.isy.liu.se/publications.


Abstract

In this technical report, some derivations for the filter and smoother proposed in [1] are presented. More specifically, the derivations for the cyclic iteration needed to solve the variational Bayes filter and smoother for state-space models with the skew-t likelihood proposed in [1] are presented.

Keywords: skew t-distribution, skewness, t-distribution, robust filtering, Kalman filter, RTS smoother, variational Bayes


Variational Iterations for Filtering and Smoothing with skew-t measurement noise

Tohid Ardeshiri, Henri Nurminen, Robert Piché, and Fredrik Gustafsson

2015-03-24

Abstract

In this technical report, some derivations for the filter and smoother proposed in [1] are presented. More specifically, the derivations for the cyclic iteration needed to solve the variational Bayes filter and smoother for state-space models with the skew-t likelihood proposed in [1] are presented.

1 Problem formulation

A Bayesian filter and a Bayesian smoother using the variational Bayes method for a normal prior and skew-t measurement noise are given in [1]. These algorithms compute approximations of the filtering distribution $p(x_k \mid y_{1:k})$ and the smoothing distribution $p(x_k \mid y_{1:K})$, respectively. Here, we derive the expectations needed for the cyclic iterations of the variational Bayes smoother, which approximates the joint smoothing posterior density given in [1]. The joint smoothing posterior density
\begin{align}
p(x_{1:K}, u_{1:K}, \Lambda_{1:K} \mid y_{1:K}) \propto{}& p(x_{1:K}, u_{1:K}, \Lambda_{1:K}, y_{1:K}) \tag{1} \\
={}& p(x_1) \prod_{l=1}^{K-1} p(x_{l+1} \mid x_l) \prod_{k=1}^{K} p(y_k \mid x_k, u_k, \Lambda_k)\, p(u_k \mid \Lambda_k)\, p(\Lambda_k) \tag{2} \\
={}& \mathcal{N}(x_1; x_{1|0}, P_{1|0}) \prod_{l=1}^{K-1} \mathcal{N}(x_{l+1}; A x_l, Q) \notag \\
&\times \prod_{k=1}^{K} \biggl\{ \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R)\, \mathcal{N}_+(u_k; 0, \Lambda_k^{-1}) \prod_{i=1}^{n_y} \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2}, \tfrac{\nu_i}{2}\Bigr) \biggr\} \tag{3}
\end{align}

is approximated in [1] by a factorized probability density function (PDF) of the form
\begin{align}
p(x_{1:K}, u_{1:K}, \Lambda_{1:K} \mid y_{1:K}) \approx q_x(x_{1:K})\, q_u(u_{1:K})\, q_\Lambda(\Lambda_{1:K}). \tag{4}
\end{align}

H. Nurminen and R. Piché are with the Department of Automation Science and Engineering, Tampere University of Technology (TUT), PO Box 692, 33101 Tampere, Finland (e-mails: henri.nurminen@tut.fi, robert.piche@tut.fi). H. Nurminen receives funding from the TUT Graduate School, the Finnish Doctoral Programme in Computational Sciences (FICS), and the Foundation of Nokia Corporation.
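To make the hierarchical skew-t measurement model in (3) concrete, the following sketch draws a single measurement from it. This is only an illustration (it is not part of [1]); the matrices C, Δ, R, the degrees of freedom ν, and the state x are hypothetical placeholders, and R and Λ_k are assumed diagonal as in the rest of this report.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_skew_t_measurement(x, C, Delta, R, nu, rng):
    """Draw one measurement y_k from the hierarchical model in (3):
    [Lambda_k]_ii ~ Gamma(nu_i/2, rate=nu_i/2), u_k ~ N_+(0, Lambda_k^{-1}),
    y_k ~ N(C x_k + Delta u_k, Lambda_k^{-1} R). Diagonal R and Lambda_k assumed."""
    lam = rng.gamma(shape=nu / 2, scale=2 / nu)       # diagonal of Lambda_k (prior mean 1)
    u = np.abs(rng.normal(0.0, 1.0 / np.sqrt(lam)))   # positively truncated normal N_+(0, Lambda_k^{-1})
    mean = C @ x + Delta @ u
    cov = np.diag(np.diag(R) / lam)                   # Lambda_k^{-1} R
    return rng.multivariate_normal(mean, cov), u, lam

# Hypothetical example: 2-D state, 1-D skewed measurement
C = np.array([[1.0, 0.0]]); Delta = np.array([[2.0]]); R = np.array([[0.5]])
nu = np.array([4.0]); x = np.array([0.3, -0.1])
y, u, lam = sample_skew_t_measurement(x, C, Delta, R, nu, rng)
```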


The analytical solutions for $\hat{q}_x$, $\hat{q}_u$ and $\hat{q}_\Lambda$ can be obtained by cyclic iteration of
\begin{align}
\log q_x(x_{1:K}) &\leftarrow \mathrm{E}_{q_u q_\Lambda}[\log p(y_{1:K}, x_{1:K}, u_{1:K}, \Lambda_{1:K})] + c_x, \tag{5a} \\
\log q_u(u_{1:K}) &\leftarrow \mathrm{E}_{q_x q_\Lambda}[\log p(y_{1:K}, x_{1:K}, u_{1:K}, \Lambda_{1:K})] + c_u, \tag{5b} \\
\log q_\Lambda(\Lambda_{1:K}) &\leftarrow \mathrm{E}_{q_x q_u}[\log p(y_{1:K}, x_{1:K}, u_{1:K}, \Lambda_{1:K})] + c_\Lambda, \tag{5c}
\end{align}
where the expected values on the right-hand sides of (5) are taken with respect to the current $q_x$, $q_u$ and $q_\Lambda$, and $c_x$, $c_u$ and $c_\Lambda$ are constants with respect to the variables $x_k$, $u_k$ and $\Lambda_k$, respectively [2, Chapter 10], [3].
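The updates (5a)-(5c) form a coordinate-ascent scheme over the three factors in (4). The following generic sketch shows only this cyclic structure; the three update callables stand for the closed-form solutions derived in Sections 2.1-2.3, and the fixed number of sweeps is an assumption made here for illustration, not a prescription from [1].

```python
from typing import Any, Callable, Tuple

def cyclic_vb(update_qx: Callable[[Any, Any], Any],
              update_qu: Callable[[Any, Any], Any],
              update_qlam: Callable[[Any, Any], Any],
              qu0: Any, qlam0: Any, n_iter: int = 10) -> Tuple[Any, Any, Any]:
    """Cyclic iteration of (5a)-(5c): each factor is refreshed while the
    other two are held fixed at their current values."""
    qu, qlam, qx = qu0, qlam0, None
    for _ in range(n_iter):          # fixed sweep count (assumption); a convergence test could be used
        qx = update_qx(qu, qlam)     # (5a): uses E[u_k] and E[Lambda_k]
        qu = update_qu(qx, qlam)     # (5b)
        qlam = update_qlam(qx, qu)   # (5c)
    return qx, qu, qlam
```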

2 Derivations for the smoother

In Sections 2.1-2.3 the derivations for the variational solution (5) are given. For brevity, all constant terms are denoted by $c$ in the derivations. The logarithm of the joint smoothing posterior, which is needed for the derivations, is given by
\begin{align}
\log p(x_{1:K}, u_{1:K}, \Lambda_{1:K}, y_{1:K}) ={}& \log \mathcal{N}(x_1; x_{1|0}, P_{1|0}) + \sum_{l=1}^{K-1} \log \mathcal{N}(x_{l+1}; A x_l, Q) \notag \\
&+ \sum_{k=1}^{K} \bigl[ \log \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R) + \log \mathcal{N}_+(u_k; 0, \Lambda_k^{-1}) \bigr] \notag \\
&+ \sum_{k=1}^{K} \sum_{i=1}^{n_y} \log \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2}, \tfrac{\nu_i}{2}\Bigr). \tag{6}
\end{align}


2.1 Derivations for $q_x$

Using equation (5a) we obtain

\begin{align}
\log q_x(x_{1:K}) ={}& \log \mathcal{N}(x_1; x_{1|0}, P_{1|0}) + \sum_{l=1}^{K-1} \log \mathcal{N}(x_{l+1}; A x_l, Q) \notag \\
&+ \sum_{k=1}^{K} \mathrm{E}_{q_u q_\Lambda}\bigl[\log \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R)\bigr] + c \tag{7} \\
={}& \log \mathcal{N}(x_1; x_{1|0}, P_{1|0}) + \sum_{l=1}^{K-1} \log \mathcal{N}(x_{l+1}; A x_l, Q) \notag \\
&- \tfrac{1}{2} \sum_{k=1}^{K} \mathrm{E}_{q_u q_\Lambda}\bigl[(y_k - C x_k - \Delta u_k)^\mathsf{T} R^{-1} \Lambda_k (y_k - C x_k - \Delta u_k)\bigr] + c \tag{8} \\
={}& \log \mathcal{N}(x_1; x_{1|0}, P_{1|0}) + \sum_{l=1}^{K-1} \log \mathcal{N}(x_{l+1}; A x_l, Q) \notag \\
&- \tfrac{1}{2} \sum_{k=1}^{K} (y_k - C x_k - \Delta \bar{u}_k)^\mathsf{T} R^{-1} \bar{\Lambda}_k (y_k - C x_k - \Delta \bar{u}_k) + c \tag{9} \\
={}& \log \mathcal{N}(x_1; x_{1|0}, P_{1|0}) + \sum_{l=1}^{K-1} \log \mathcal{N}(x_{l+1}; A x_l, Q) \notag \\
&+ \sum_{k=1}^{K} \log \mathcal{N}(y_k - \Delta \bar{u}_k; C x_k, \bar{\Lambda}_k^{-1} R) + c, \tag{10}
\end{align}
where $\bar{u}_k \triangleq \mathrm{E}_{q_u}[u_k]$ and $\bar{\Lambda}_k \triangleq \mathrm{E}_{q_\Lambda}[\Lambda_k]$ are derived in Sections 2.2 and 2.3, respectively. Hence, $\log q_x(x_{1:K})$ in (10) has the same form as the logarithm of the joint posterior of a linear state-space model with measurements $\tilde{y}_k \triangleq y_k - \Delta \bar{u}_k$ and Gaussian measurement noise covariance $\tilde{R}_k \triangleq \bar{\Lambda}_k^{-1} R$. Therefore, $q_x(x_{1:K})$ can be computed using the Rauch-Tung-Striebel smoother's recursion [4]. The approximate marginal distribution of $x_k$ turns out to be
\begin{align}
q_x(x_k) = \mathcal{N}(x_k; x_{k|K}, P_{k|K}). \tag{11}
\end{align}
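In code, the $q_x$ update therefore reduces to preparing the pseudo-measurements of (10) and handing them to a standard Kalman filter/RTS smoother pass. The helper below is a sketch under the assumption that each $\bar{\Lambda}_k$ is diagonal and stored as a row of a (K, n_y) array; the RTS recursions themselves are standard and are not repeated here.

```python
import numpy as np

def pseudo_measurements(y, u_bar, lam_bar, Delta, R):
    """Build y_tilde_k = y_k - Delta u_bar_k and R_tilde_k = Lambda_bar_k^{-1} R from (10).
    y, u_bar: (K, ny) arrays; lam_bar: (K, ny) diagonals of E[Lambda_k]; Delta, R: (ny, ny)."""
    y_tilde = y - u_bar @ Delta.T                   # row k equals y_k - Delta u_bar_k
    R_tilde = R[None, :, :] / lam_bar[:, :, None]   # row-wise scaling gives Lambda_bar_k^{-1} R
    return y_tilde, R_tilde
```

Running the Rauch-Tung-Striebel recursions with these quantities (and the given $A$, $Q$, $x_{1|0}$, $P_{1|0}$) yields the moments $x_{k|K}$ and $P_{k|K}$ in (11).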


2.2 Derivations for $q_u$

Using equation (5b) we obtain

\begin{align}
\log q_u(u_{1:K}) = \sum_{k=1}^{K} \mathrm{E}_{q_\Lambda q_x}\bigl[\log \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R) + \log \mathcal{N}_+(u_k; 0, \Lambda_k^{-1})\bigr] + c. \tag{12}
\end{align}
Therefore, $q_u(u_{1:K}) = \prod_{k=1}^{K} q_u(u_k)$, where
\begin{align}
\log q_u(u_k) ={}& \mathrm{E}_{q_\Lambda q_x}\bigl[\log \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R) + \log \mathcal{N}_+(u_k; 0, \Lambda_k^{-1})\bigr] + c \tag{13} \\
={}& -\tfrac{1}{2} \mathrm{E}_{q_\Lambda q_x}\bigl[(y_k - C x_k - \Delta u_k)^\mathsf{T} R^{-1} \Lambda_k (y_k - C x_k - \Delta u_k)\bigr] - \tfrac{1}{2} \mathrm{E}_{q_\Lambda}\bigl[u_k^\mathsf{T} \Lambda_k u_k\bigr] + c \tag{14} \\
={}& -\tfrac{1}{2} (y_k - C \bar{x}_k - \Delta u_k)^\mathsf{T} R^{-1} \bar{\Lambda}_k (y_k - C \bar{x}_k - \Delta u_k) - \tfrac{1}{2} u_k^\mathsf{T} \bar{\Lambda}_k u_k + c, \tag{15}
\end{align}
where $\bar{x}_k \triangleq \mathrm{E}_{q_x}[x_k] = x_{k|K}$. Therefore,
\begin{align}
q_u(u_k) = \mathcal{N}_+(u_k; u_{k|K}, U_{k|K}), \tag{16}
\end{align}
where
\begin{align}
u_{k|K} &= K_u (y_k - C x_{k|K}), \tag{17} \\
U_{k|K} &= (I - K_u \Delta) \bar{\Lambda}_k^{-1}, \tag{18} \\
K_u &= \bar{\Lambda}_k^{-1} \Delta \bigl(\Delta \bar{\Lambda}_k^{-1} \Delta + \bar{\Lambda}_k^{-1} R\bigr)^{-1} = \Delta (\Delta^2 + R)^{-1}. \tag{19}
\end{align}
The expectation $\bar{u}_k$ is needed in (10) and can be calculated using, e.g., [5]. Note that the cumulative distribution function of the univariate normal distribution (or some approximation of it) is required in the computation of the moments of the truncated normal distribution.
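For diagonal $\Delta$, $R$ and $\bar{\Lambda}_k$, the parameters (17)-(19) and the first two moments of each univariate truncated normal component can be computed with the standard normal PDF and CDF, as in the following sketch (the moment formulas are the standard results used in, e.g., [5]; the function name and argument layout are illustrative only).

```python
import numpy as np
from scipy.stats import norm

def qu_moments(y, Cx, d, r, lam_bar):
    """Parameters (17)-(19) of q_u(u_k) = N_+(u_k; u_{k|K}, U_{k|K}) and the moments
    E[u_k], diag E[u_k u_k^T] of the truncated normal. `d`, `r`, `lam_bar` are the
    diagonals of Delta, R and E[Lambda_k]; `Cx` is C x_{k|K}."""
    Ku = d / (d**2 + r)                       # eq (19)
    mu = Ku * (y - Cx)                        # eq (17): u_{k|K}
    var = (1.0 - Ku * d) / lam_bar            # eq (18): diagonal of U_{k|K}
    sig = np.sqrt(var)
    alpha = -mu / sig                         # truncation point 0 in standardized units
    haz = norm.pdf(alpha) / norm.sf(alpha)    # phi(alpha) / (1 - Phi(alpha))
    u_mean = mu + sig * haz                   # E[u_k]
    u_var = var * (1.0 + alpha * haz - haz**2)
    u_second = u_var + u_mean**2              # diagonal of E[u_k u_k^T]
    return mu, var, u_mean, u_second
```

The returned `u_mean` is the $\bar{u}_k$ used in (10), and `u_second` supplies the diagonal of $\mathrm{E}[u_k u_k^\mathsf{T}]$ needed in Section 2.3.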

2.3 Derivations for $q_\Lambda$

Using equation (5c) we obtain

\begin{align}
\log q_\Lambda(\Lambda_{1:K}) ={}& \sum_{k=1}^{K} \mathrm{E}_{q_u q_x}\bigl[\log \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R) + \log \mathcal{N}_+(u_k; 0, \Lambda_k^{-1})\bigr] \notag \\
&+ \sum_{k=1}^{K} \sum_{i=1}^{n_y} \log \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2}, \tfrac{\nu_i}{2}\Bigr) + c. \tag{20}
\end{align}
Therefore, $q_\Lambda(\Lambda_{1:K}) = \prod_{k=1}^{K} q_\Lambda(\Lambda_k)$, where
\begin{align}
\log q_\Lambda(\Lambda_k) ={}& \mathrm{E}_{q_u q_x}\bigl[\log \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R) + \log \mathcal{N}_+(u_k; 0, \Lambda_k^{-1})\bigr] \notag \\
&+ \sum_{i=1}^{n_y} \log \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2}, \tfrac{\nu_i}{2}\Bigr) + c \tag{21} \\
={}& -\tfrac{1}{2} \mathrm{E}_{q_u q_x}\bigl[\mathrm{tr}\bigl(\Lambda_k R^{-1} (y_k - C x_k - \Delta u_k)(y_k - C x_k - \Delta u_k)^\mathsf{T}\bigr)\bigr] \notag \\
&- \tfrac{1}{2} \mathrm{E}_{q_u}\bigl[\mathrm{tr}\bigl(\Lambda_k u_k u_k^\mathsf{T}\bigr)\bigr] + \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c \tag{22} \\
={}& -\tfrac{1}{2} \mathrm{E}_{q_x}\bigl[\mathrm{tr}\bigl(\Lambda_k R^{-1} (y_k - C x_k)(y_k - C x_k)^\mathsf{T}\bigr)\bigr] - \tfrac{1}{2} \mathrm{E}_{q_u}\bigl[\mathrm{tr}\bigl(\Lambda_k \Delta R^{-1} \Delta u_k u_k^\mathsf{T}\bigr)\bigr] \notag \\
&+ \tfrac{1}{2} (y_k - C \bar{x}_k)^\mathsf{T} \Lambda_k R^{-1} \Delta \bar{u}_k + \tfrac{1}{2} \bar{u}_k^\mathsf{T} \Delta \Lambda_k R^{-1} (y_k - C \bar{x}_k) \notag \\
&- \tfrac{1}{2} \mathrm{E}_{q_u}\bigl[\mathrm{tr}\bigl(\Lambda_k u_k u_k^\mathsf{T}\bigr)\bigr] + \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c \tag{23} \\
={}& -\tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k R^{-1} \bigl((y_k - C x_{k|K})(y_k - C x_{k|K})^\mathsf{T} + C P_{k|K} C^\mathsf{T}\bigr)\bigr) \notag \\
&- \tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k (\Delta R^{-1} \Delta + I)\, \mathrm{E}_{q_u}[u_k u_k^\mathsf{T}]\bigr) + \tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k R^{-1} \Delta \bar{u}_k (y_k - C x_{k|K})^\mathsf{T}\bigr) \notag \\
&+ \tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k \Delta R^{-1} (y_k - C x_{k|K}) \bar{u}_k^\mathsf{T}\bigr) + \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c \tag{24} \\
={}& \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i + [\Psi_k]_{ii}}{2} [\Lambda_k]_{ii} \Bigr) + c, \tag{25}
\end{align}
where the commutativity of products of the diagonal matrices $\Delta$, $\Lambda_k$ and $R$ is used on several occasions and
\begin{align}
\Psi_k ={}& R^{-1}\bigl((y_k - C x_{k|K})(y_k - C x_{k|K})^\mathsf{T} + C P_{k|K} C^\mathsf{T}\bigr) + (\Delta R^{-1} \Delta + I)\, \mathrm{E}_{q_u}[u_k u_k^\mathsf{T}] \notag \\
&- R^{-1} \Delta \bar{u}_k (y_k - C x_{k|K})^\mathsf{T} - \Delta R^{-1} (y_k - C x_{k|K}) \bar{u}_k^\mathsf{T}. \tag{26}
\end{align}
Therefore,
\begin{align}
q_\Lambda(\Lambda_k) = \prod_{i=1}^{n_y} \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2} + 1, \tfrac{\nu_i + [\Psi_k]_{ii}}{2}\Bigr). \tag{27}
\end{align}
Note that only the diagonal elements of the matrix $\Psi_k$ are required. As a consequence, provided that $\Delta$ and $R$ are diagonal, only the diagonal elements of $\mathrm{E}[u_k u_k^\mathsf{T}]$ are required. These are second moments of univariate truncated normal distributions that can be computed using, e.g., [5]. In the derivations above, $\mathrm{E}_{q_\Lambda}[\Lambda_k]$ is required. The diagonal elements of $\mathrm{E}_{q_\Lambda}[\Lambda_k]$ are
\begin{align}
\mathrm{E}_{q_\Lambda}\bigl[[\Lambda_k]_{ii}\bigr] = \frac{\nu_i + 2}{\nu_i + [\Psi_k]_{ii}}. \tag{28}
\end{align}
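Under this diagonality assumption, the diagonal of $\Psi_k$ in (26), the Gamma parameters in (27) and the expectation (28) reduce to elementwise operations, for example as in the sketch below (function name and argument layout are illustrative only).

```python
import numpy as np

def qlambda_update(y, Cx, CPC_diag, d, r, u_mean, u_second, nu):
    """Diagonal of Psi_k (26), Gamma parameters of q_Lambda (27) and E[Lambda_k] (28),
    assuming diagonal Delta and R. `Cx` is C x_{k|K}, `CPC_diag` is diag(C P_{k|K} C^T),
    `u_mean` and `u_second` are E[u_k] and diag E[u_k u_k^T]."""
    res = y - Cx
    psi_diag = (res**2 + CPC_diag) / r \
        + (d**2 / r + 1.0) * u_second \
        - 2.0 * d * u_mean * res / r          # diagonal of Psi_k, eq (26)
    shape = nu / 2.0 + 1.0                    # Gamma shape, eq (27)
    rate = (nu + psi_diag) / 2.0              # Gamma rate, eq (27)
    lam_bar = (nu + 2.0) / (nu + psi_diag)    # E[Lambda_k]_ii, eq (28)
    return psi_diag, shape, rate, lam_bar
```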


3 Derivations for the Filter

The filtering recursions are similar to those of the smoother given in Section 2. However, since the notation used in the filtering algorithm [1, Table II] differs from the notation used in the smoothing algorithm, the derivations for the filter are given separately.

Suppose that at time index $k$ the measurement $y_k$ is available and the prediction PDF $p(x_k \mid y_{1:k-1})$ is
\begin{align}
p(x_k \mid y_{1:k-1}) = \mathcal{N}(x_k; x_{k|k-1}, P_{k|k-1}). \tag{29}
\end{align}
Then, using Bayes' theorem, the joint filtering posterior PDF can be written as
\begin{align}
p(x_k, u_k, \Lambda_k \mid y_{1:k}) &\propto p(y_k, x_k, u_k, \Lambda_k \mid y_{1:k-1}) \tag{30} \\
&= p(y_k \mid x_k, u_k, \Lambda_k)\, p(x_k \mid y_{1:k-1})\, p(u_k \mid \Lambda_k)\, p(\Lambda_k) \tag{31} \\
&= \mathcal{N}(y_k; C x_k + \Delta u_k, \Lambda_k^{-1} R)\, \mathcal{N}(x_k; x_{k|k-1}, P_{k|k-1}) \notag \\
&\quad \times \mathcal{N}_+(u_k; 0, \Lambda_k^{-1}) \prod_{i=1}^{n_y} \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2}, \tfrac{\nu_i}{2}\Bigr). \tag{32}
\end{align}

This posterior is not analytically tractable. We seek an approximation in the form

\begin{align}
p(x_k, u_k, \Lambda_k \mid y_{1:k}) \approx q_x(x_k)\, q_u(u_k)\, q_\Lambda(\Lambda_k). \tag{33}
\end{align}

In the VB approach, the Kullback-Leibler divergence (KLD) of the factorized approximation from the true posterior is minimized:
\begin{align}
\hat{q}_x, \hat{q}_u, \hat{q}_\Lambda = \operatorname*{argmin}_{q_x, q_u, q_\Lambda} D_{\mathrm{KL}}\bigl(q_x(x_k)\, q_u(u_k)\, q_\Lambda(\Lambda_k) \,\|\, p(x_k, u_k, \Lambda_k \mid y_{1:k})\bigr),
\end{align}
where $D_{\mathrm{KL}}(q(\cdot) \,\|\, p(\cdot)) \triangleq \int q(x) \log \frac{q(x)}{p(x)}\, \mathrm{d}x$ is the KLD [6]. The analytical solutions for $\hat{q}_x$, $\hat{q}_u$ and $\hat{q}_\Lambda$ can be obtained by cyclic iteration of

solu-tions for ˆqx, ˆquand ˆqΛ can be obtained by cyclic iteration of

\begin{align}
\log q_x(x_k) &\leftarrow \mathrm{E}_{q_u q_\Lambda}[\log p(y_k, x_k, u_k, \Lambda_k \mid y_{1:k-1})] + c_x, \tag{34a} \\
\log q_u(u_k) &\leftarrow \mathrm{E}_{q_x q_\Lambda}[\log p(y_k, x_k, u_k, \Lambda_k \mid y_{1:k-1})] + c_u, \tag{34b} \\
\log q_\Lambda(\Lambda_k) &\leftarrow \mathrm{E}_{q_x q_u}[\log p(y_k, x_k, u_k, \Lambda_k \mid y_{1:k-1})] + c_\Lambda, \tag{34c}
\end{align}
where the expected values on the right-hand sides of (34) are taken with respect to the current $q_x$, $q_u$ and $q_\Lambda$, and $c_x$, $c_u$ and $c_\Lambda$ are constants with respect to the variables $x_k$, $u_k$ and $\Lambda_k$, respectively [2, Chapter 10], [3]. This recursion converges to a local optimum [2, Chapter 10]. In Sections 3.1-3.3 the derivations for the variational solution (34) are given. For brevity, all constant terms are denoted by $c$ in the derivations. The logarithm of the joint filtering posterior, which is needed for the derivations, is given by
\begin{align}
\log p(y_k, x_k, u_k, \Lambda_k \mid y_{1:k-1}) ={}& -\tfrac{1}{2} (y_k - C x_k - \Delta u_k)^\mathsf{T} \Lambda_k R^{-1} (y_k - C x_k - \Delta u_k) \notag \\
&- \tfrac{1}{2} (x_k - x_{k|k-1})^\mathsf{T} P_{k|k-1}^{-1} (x_k - x_{k|k-1}) - \tfrac{1}{2} u_k^\mathsf{T} \Lambda_k u_k \notag \\
&+ \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c. \tag{35}
\end{align}


3.1 Derivations for $q_x$

Using equation (34a) we obtain
\begin{align}
\log q_x(x_k) ={}& -\tfrac{1}{2} \mathrm{E}_{q_u q_\Lambda}\bigl[(y_k - C x_k - \Delta u_k)^\mathsf{T} \Lambda_k R^{-1} (y_k - C x_k - \Delta u_k)\bigr] \notag \\
&- \tfrac{1}{2} (x_k - x_{k|k-1})^\mathsf{T} P_{k|k-1}^{-1} (x_k - x_{k|k-1}) + c \tag{36} \\
={}& -\tfrac{1}{2} (y_k - C x_k - \Delta \bar{u}_k)^\mathsf{T} \bar{\Lambda}_k R^{-1} (y_k - C x_k - \Delta \bar{u}_k) \notag \\
&- \tfrac{1}{2} (x_k - x_{k|k-1})^\mathsf{T} P_{k|k-1}^{-1} (x_k - x_{k|k-1}) + c, \tag{37}
\end{align}
where $\bar{u}_k \triangleq \mathrm{E}_{q_u}[u_k]$ and $\bar{\Lambda}_k \triangleq \mathrm{E}_{q_\Lambda}[\Lambda_k]$ are derived in Sections 3.2 and 3.3, respectively. Hence,
\begin{align}
q_x(x_k) \propto \mathcal{N}(y_k - \Delta \bar{u}_k; C x_k, \bar{\Lambda}_k^{-1} R)\, \mathcal{N}(x_k; x_{k|k-1}, P_{k|k-1}), \tag{38}
\end{align}
which can be computed using the Kalman filter's [7] measurement update. Therefore,
\begin{align}
q_x(x_k) = \mathcal{N}(x_k; x_{k|k}, P_{k|k}), \tag{39}
\end{align}
where
\begin{align}
x_{k|k} &= x_{k|k-1} + K_x (y_k - C x_{k|k-1} - \Delta \bar{u}_k), \tag{40} \\
P_{k|k} &= (I - K_x C) P_{k|k-1}, \tag{41} \\
K_x &= P_{k|k-1} C^\mathsf{T} \bigl(C P_{k|k-1} C^\mathsf{T} + \bar{\Lambda}_k^{-1} R\bigr)^{-1}. \tag{42}
\end{align}
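In code, (40)-(42) are a standard Kalman measurement update applied to the pseudo-measurement $y_k - \Delta \bar{u}_k$ with noise covariance $\bar{\Lambda}_k^{-1} R$; a sketch, again assuming a diagonal $\bar{\Lambda}_k$ stored as a vector, is given below.

```python
import numpy as np

def qx_update(y, x_pred, P_pred, C, Delta, R, u_bar, lam_bar):
    """Kalman measurement update (40)-(42) for q_x(x_k) = N(x_k; x_{k|k}, P_{k|k}).
    `u_bar` is E[u_k] and `lam_bar` the diagonal of E[Lambda_k]."""
    R_tilde = R / lam_bar[:, None]                          # Lambda_bar_k^{-1} R
    S = C @ P_pred @ C.T + R_tilde                          # innovation covariance
    Kx = np.linalg.solve(S, C @ P_pred).T                   # K_x = P C^T S^{-1}, eq (42)
    x_post = x_pred + Kx @ (y - C @ x_pred - Delta @ u_bar) # eq (40)
    P_post = (np.eye(len(x_pred)) - Kx @ C) @ P_pred        # eq (41)
    return x_post, P_post
```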

3.2 Derivations for $q_u$

Using equation (34b) we obtain
\begin{align}
\log q_u(u_k) ={}& -\tfrac{1}{2} \mathrm{E}_{q_x q_\Lambda}\bigl[(y_k - C x_k - \Delta u_k)^\mathsf{T} \Lambda_k R^{-1} (y_k - C x_k - \Delta u_k)\bigr] - \tfrac{1}{2} \mathrm{E}_{q_\Lambda}\bigl[u_k^\mathsf{T} \Lambda_k u_k\bigr] + c \tag{43} \\
={}& -\tfrac{1}{2} (y_k - C \bar{x}_k - \Delta u_k)^\mathsf{T} \bar{\Lambda}_k R^{-1} (y_k - C \bar{x}_k - \Delta u_k) - \tfrac{1}{2} u_k^\mathsf{T} \bar{\Lambda}_k u_k + c, \tag{44}
\end{align}
where $\bar{x}_k \triangleq \mathrm{E}_{q_x}[x_k] = x_{k|k}$. Therefore,
\begin{align}
q_u(u_k) = \mathcal{N}_+(u_k; u_{k|k}, U_{k|k}), \tag{45}
\end{align}
where
\begin{align}
u_{k|k} &= K_u (y_k - C x_{k|k}), \tag{46} \\
U_{k|k} &= (I - K_u \Delta) \bar{\Lambda}_k^{-1}, \tag{47} \\
K_u &= \bar{\Lambda}_k^{-1} \Delta \bigl(\Delta \bar{\Lambda}_k^{-1} \Delta + \bar{\Lambda}_k^{-1} R\bigr)^{-1} = \Delta (\Delta^2 + R)^{-1}. \tag{48}
\end{align}


3.3 Derivations for $q_\Lambda$

Using equation (34c) we obtain

\begin{align}
\log q_\Lambda(\Lambda_k) ={}& -\tfrac{1}{2} \mathrm{E}_{q_u q_x}\bigl[\mathrm{tr}\bigl(\Lambda_k R^{-1} (y_k - C x_k - \Delta u_k)(y_k - C x_k - \Delta u_k)^\mathsf{T}\bigr)\bigr] \notag \\
&- \tfrac{1}{2} \mathrm{E}_{q_u}\bigl[\mathrm{tr}\bigl(\Lambda_k u_k u_k^\mathsf{T}\bigr)\bigr] + \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c \tag{49} \\
={}& -\tfrac{1}{2} \mathrm{E}_{q_x}\bigl[\mathrm{tr}\bigl(\Lambda_k R^{-1} (y_k - C x_k)(y_k - C x_k)^\mathsf{T}\bigr)\bigr] - \tfrac{1}{2} \mathrm{E}_{q_u}\bigl[\mathrm{tr}\bigl(\Lambda_k \Delta R^{-1} \Delta u_k u_k^\mathsf{T}\bigr)\bigr] \notag \\
&+ \tfrac{1}{2} (y_k - C \bar{x}_k)^\mathsf{T} \Lambda_k R^{-1} \Delta \bar{u}_k + \tfrac{1}{2} \bar{u}_k^\mathsf{T} \Delta \Lambda_k R^{-1} (y_k - C \bar{x}_k) \notag \\
&- \tfrac{1}{2} \mathrm{E}_{q_u}\bigl[\mathrm{tr}\bigl(\Lambda_k u_k u_k^\mathsf{T}\bigr)\bigr] + \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c \tag{50} \\
={}& -\tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k R^{-1} \bigl((y_k - C x_{k|k})(y_k - C x_{k|k})^\mathsf{T} + C P_{k|k} C^\mathsf{T}\bigr)\bigr) \notag \\
&- \tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k (\Delta R^{-1} \Delta + I)\, \mathrm{E}_{q_u}[u_k u_k^\mathsf{T}]\bigr) + \tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k R^{-1} \Delta \bar{u}_k (y_k - C x_{k|k})^\mathsf{T}\bigr) \notag \\
&+ \tfrac{1}{2} \mathrm{tr}\bigl(\Lambda_k \Delta R^{-1} (y_k - C x_{k|k}) \bar{u}_k^\mathsf{T}\bigr) + \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i}{2} [\Lambda_k]_{ii} \Bigr) + c \tag{51} \\
={}& \sum_{i=1}^{n_y} \Bigl( \tfrac{\nu_i}{2} \log [\Lambda_k]_{ii} - \tfrac{\nu_i + [\Psi_k]_{ii}}{2} [\Lambda_k]_{ii} \Bigr) + c, \tag{52}
\end{align}
where
\begin{align}
\Psi_k ={}& R^{-1}\bigl((y_k - C x_{k|k})(y_k - C x_{k|k})^\mathsf{T} + C P_{k|k} C^\mathsf{T}\bigr) + (\Delta R^{-1} \Delta + I)\, \mathrm{E}_{q_u}[u_k u_k^\mathsf{T}] \notag \\
&- R^{-1} \Delta \bar{u}_k (y_k - C x_{k|k})^\mathsf{T} - \Delta R^{-1} (y_k - C x_{k|k}) \bar{u}_k^\mathsf{T}. \tag{53}
\end{align}
Therefore,
\begin{align}
q_\Lambda(\Lambda_k) = \prod_{i=1}^{n_y} \mathcal{G}\Bigl([\Lambda_k]_{ii}; \tfrac{\nu_i}{2} + 1, \tfrac{\nu_i + [\Psi_k]_{ii}}{2}\Bigr). \tag{54}
\end{align}
Note that only the diagonal elements of the matrix $\Psi_k$ are required. As a consequence, provided that $\Delta$ and $R$ are diagonal, only the diagonal elements of $\mathrm{E}[u_k u_k^\mathsf{T}]$ are required. These are second moments of univariate truncated normal distributions that can be computed using, e.g., [5]. In the derivations above, $\mathrm{E}_{q_\Lambda}[\Lambda_k]$ is required. The diagonal elements of $\mathrm{E}_{q_\Lambda}[\Lambda_k]$ are
\begin{align}
\mathrm{E}_{q_\Lambda}\bigl[[\Lambda_k]_{ii}\bigr] = \frac{\nu_i + 2}{\nu_i + [\Psi_k]_{ii}}. \tag{55}
\end{align}
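Putting Sections 3.1-3.3 together, one VB measurement update of the filter interleaves the three closed-form updates until approximate convergence. The sketch below assembles the pieces for diagonal Δ and R; the initialization of the moments of q_u and q_Λ and the fixed number of iterations are assumptions made here for illustration and do not reproduce [1, Table II] verbatim.

```python
import numpy as np
from scipy.stats import norm

def vb_skew_t_filter_update(y, x_pred, P_pred, C, Delta, R, nu, n_iter=10):
    """One VB measurement update (34a)-(34c) for the skew-t filter, assuming
    diagonal Delta and R. Returns the moments of q_x and q_u and the Gamma
    shape/rate parameters of q_Lambda."""
    ny = len(y)
    d, r = np.diag(Delta), np.diag(R)
    lam_bar = np.ones(ny)    # E[Lambda_k], initialized at the prior mean 1 (assumption)
    u_bar = np.zeros(ny)     # E[u_k], initialized at zero skewness shift (assumption)
    for _ in range(n_iter):
        # q_x update, eqs (40)-(42): Kalman update with pseudo-measurement y - Delta*u_bar
        S = C @ P_pred @ C.T + np.diag(r / lam_bar)
        Kx = np.linalg.solve(S, C @ P_pred).T             # K_x = P C^T S^{-1}
        x_post = x_pred + Kx @ (y - C @ x_pred - d * u_bar)
        P_post = (np.eye(len(x_pred)) - Kx @ C) @ P_pred
        # q_u update, eqs (45)-(48): moments of the truncated normal N_+(u_{k|k}, U_{k|k})
        Ku = d / (d**2 + r)
        mu_u = Ku * (y - C @ x_post)
        var_u = (1.0 - Ku * d) / lam_bar
        sig_u = np.sqrt(var_u)
        alpha = -mu_u / sig_u
        haz = norm.pdf(alpha) / norm.sf(alpha)
        u_bar = mu_u + sig_u * haz                        # E[u_k]
        uu_bar = var_u * (1.0 + alpha * haz - haz**2) + u_bar**2   # diag E[u_k u_k^T]
        # q_Lambda update, eqs (52)-(55)
        res = y - C @ x_post
        psi = (res**2 + np.diag(C @ P_post @ C.T)) / r \
            + (d**2 / r + 1.0) * uu_bar - 2.0 * d * u_bar * res / r
        lam_bar = (nu + 2.0) / (nu + psi)                 # E[Lambda_k]_ii
    return (x_post, P_post), (mu_u, var_u), (nu / 2.0 + 1.0, (nu + psi) / 2.0)
```

Between measurements, the usual prediction step with the dynamics $A$ and $Q$ produces $x_{k|k-1}$ and $P_{k|k-1}$ in (29), which completes the filter recursion.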


References

[1] H. Nurminen, T. Ardeshiri, R. Piché, and F. Gustafsson, "Robust inference for state-space models with skewed measurement noise," Signal Processing Letters, submitted, 2015. [Online]. Available: http://arxiv.org/abs/1503.06606

[2] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.

[3] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, "The variational approximation for Bayesian inference," IEEE Signal Processing Magazine, vol. 25, no. 6, pp. 131–146, Nov. 2008.

[4] H. E. Rauch, C. T. Striebel, and F. Tung, "Maximum likelihood estimates of linear dynamic systems," Journal of the American Institute of Aeronautics and Astronautics, vol. 3, no. 8, pp. 1445–1450, Aug. 1965.

[5] D. R. Barr and E. T. Sherrill, "Mean and variance of truncated normal distributions," The American Statistician, vol. 53, no. 4, pp. 357–361, 1999.

[6] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons, 2006.

[7] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.


Avdelning, Institution (Division, Department): Division of Automatic Control, Department of Electrical Engineering

Datum (Date): 2015-03-19

Språk (Language): Engelska/English

URL för elektronisk version: http://www.control.isy.liu.se

Serietitel och serienummer (Title of series, numbering): ISSN 1400-3902, LiTH-ISY-R-3083

Titel (Title): Variational Iterations for Filtering and Smoothing with skew-t measurement noise

Författare (Author): Tohid Ardeshiri, Henri Nurminen, Robert Piché, Fredrik Gustafsson

Sammanfattning (Abstract): In this technical report, some derivations for the filter and smoother proposed in [1] are presented. More specifically, the derivations for the cyclic iteration needed to solve the variational Bayes filter and smoother for state-space models with the skew-t likelihood proposed in [1] are presented.
