Supplementary Material for “Recent results on Bayesian
Cramér-Rao bounds for jump Markov systems”.
Carsten Fritsche, Umut Orguner
Division of Automatic Control
E-mail: carsten@isy.liu.se, umut@metu.edu.tr
12th February 2016
Report no.: LiTH-ISY-R-3089
Address:
Department of Electrical Engineering Linköpings universitet
SE-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Technical reports from the Automatic Control group in Linköping are available from http://www.control.isy.liu.se/publications.
This report contains supplementary material for the paper [1], and gives detailed proofs of all lemmas and propositions that could not be included in the paper due to space limitations. The notation is adapted from the paper.
Keywords: Bayesian Cramér-Rao Lower Bounds, jump Markov nonlinear systems, Monte Carlo simulations, Rao-Blackwellized Particle Filter
Department of Electrical & Electronics Engineering, Middle East Technical University, 06800 Ankara, Turkey
1 Computing the BCRB
1.1 Bayesian Information Matrix Computation for Model 1
1.1.1 Proof of Lemma 1
The proof of Lemma 1 starts by representing the (temporal) dependencies between the different stochastic variables $x_{0:k+1}$, $r_{1:k+1}$ and $z_{1:k+1}$ for Model 1 using the corresponding directed acyclic graph (DAG) shown in
Fig. 1.
Figure 1: DAG representation of jump Markov nonlinear system given by Model 1
From the above figure we can determine the recurrence relations for $p(x_{0:k}, z_{1:k}|r_k)$ and $\nabla_{x_{0:k}} p(x_{0:k}, z_{1:k}|r_k)$. If $k = 1$, we have
\begin{align}
p(x_{0:1}, z_1|r_1) &= \frac{p(x_{0:1}, z_1, r_1)}{\Pr\{r_1\}} = \frac{p(z_1|x_{0:1}, r_1)\, p(x_1|x_0, r_1)\, p(x_0|r_1) \Pr\{r_1\}}{\Pr\{r_1\}} \nonumber\\
&= p(z_1|x_1, r_1)\, p(x_1|x_0, r_1)\, p(x_0), \tag{1a}\\
\nabla_{x_{0:1}} p(x_{0:1}, z_1|r_1) &= \Big[ [\nabla_{x_{0:1}} p(z_1|x_1, r_1)]\, p(x_1|x_0, r_1) + p(z_1|x_1, r_1)\, [\nabla_{x_{0:1}} p(x_1|x_0, r_1)] \Big] p(x_0) \nonumber\\
&\quad + p(z_1|x_1, r_1)\, p(x_1|x_0, r_1)\, [\nabla_{x_{0:1}} p(x_0)]. \tag{1b}
\end{align}
For all other cases, we have
\begin{align}
p(x_{0:k}, z_{1:k}|r_k) &= \frac{p(x_{0:k}, z_{1:k}, r_k)}{\Pr\{r_k\}} = \frac{\sum_{r_{k-1}} p(x_{0:k}, z_{1:k}, r_k, r_{k-1})}{\Pr\{r_k\}} \nonumber\\
&= \sum_{r_{k-1}} p(z_k|x_{0:k}, z_{1:k-1}, r_k, r_{k-1})\, p(x_k|x_{0:k-1}, z_{1:k-1}, r_k, r_{k-1}) \Pr\{r_k|r_{k-1}, x_{0:k-1}, z_{1:k-1}\} \nonumber\\
&\qquad \times \frac{p(x_{0:k-1}, z_{1:k-1}|r_{k-1}) \Pr\{r_{k-1}\}}{\Pr\{r_k\}} \nonumber\\
&= \sum_{r_{k-1}} p(z_k|x_k, r_k)\, p(x_k|x_{k-1}, r_k)\, p(x_{0:k-1}, z_{1:k-1}|r_{k-1}) \frac{\Pr\{r_k|r_{k-1}\} \Pr\{r_{k-1}\}}{\Pr\{r_k\}} \nonumber\\
&= p(z_k|x_k, r_k)\, p(x_k|x_{k-1}, r_k) \sum_{r_{k-1}} \Pr\{r_{k-1}|r_k\}\, p(x_{0:k-1}, z_{1:k-1}|r_{k-1}), \tag{2a}\\
\nabla_{x_{0:k}} p(x_{0:k}, z_{1:k}|r_k) &= \Big[ [\nabla_{x_{0:k}} p(z_k|x_k, r_k)]\, p(x_k|x_{k-1}, r_k) + p(z_k|x_k, r_k)\, [\nabla_{x_{0:k}} p(x_k|x_{k-1}, r_k)] \Big] \nonumber\\
&\qquad \times \sum_{r_{k-1}} \Pr\{r_{k-1}|r_k\}\, p(x_{0:k-1}, z_{1:k-1}|r_{k-1}) \nonumber\\
&\quad + p(z_k|x_k, r_k)\, p(x_k|x_{k-1}, r_k) \sum_{r_{k-1}} \Pr\{r_{k-1}|r_k\}\, [\nabla_{x_{0:k}} p(x_{0:k-1}, z_{1:k-1}|r_{k-1})], \tag{2b}
\end{align}
where
\begin{equation}
\Pr\{r_{k-1}|r_k\} = \frac{\Pr\{r_k|r_{k-1}\} \Pr\{r_{k-1}\}}{\Pr\{r_k\}}. \tag{3}
\end{equation}
This concludes the proof of Lemma 1.
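The mixing weights $\Pr\{r_{k-1}|r_k\}$ in (3) are just Bayes' rule applied to the mode chain. A minimal sketch, assuming a hypothetical two-mode chain with illustrative transition matrix and prior:

```python
import numpy as np

# Hypothetical two-mode example: Pi[i, j] = Pr{r_k = j | r_{k-1} = i}.
Pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi_prev = np.array([0.5, 0.5])  # Pr{r_{k-1}}

def backward_mode_probs(Pi, pi_prev):
    """Return Pr{r_{k-1} | r_k} as in (3), from Pr{r_k | r_{k-1}} and Pr{r_{k-1}}."""
    pi_curr = pi_prev @ Pi                       # Pr{r_k} by marginalization
    # Bayes' rule: Pr{r_{k-1}=i | r_k=j} = Pi[i, j] * pi_prev[i] / pi_curr[j]
    return Pi * pi_prev[:, None] / pi_curr[None, :]

B = backward_mode_probs(Pi, pi_prev)
# Each column of B is a probability distribution over r_{k-1}.
```

Each column of `B` sums to one, which is what (2a) requires for the mixture over $r_{k-1}$ to stay normalized.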
1.2 Bayesian Information Matrix Computation for Model 2
1.2.1 Proof of Lemma 2
For the case that the process equation is independent of the discrete mode variable rk, the state xk is a Markov
process. In this case, the pdf of x0:k can be decomposed as follows:
\begin{equation}
p(x_{0:k}) = p(x_0) \prod_{n=1}^{k} p(x_n|x_{n-1}). \tag{4}
\end{equation}
Inserting (4) into the definition of the Bayesian information matrix of the prior yields
\begin{align}
J^{x}_{0:k} &= \mathrm{E}_{p(x_{0:k})}\{-\Delta^{x_{0:k}}_{x_{0:k}} \log p(x_{0:k})\} \nonumber\\
&= \mathrm{E}_{p(x_{0:k})}\{-\Delta^{x_{0:k}}_{x_{0:k}} \log p(x_0)\} + \sum_{n=1}^{k} \mathrm{E}_{p(x_{0:k})}\{-\Delta^{x_{0:k}}_{x_{0:k}} \log p(x_n|x_{n-1})\}. \tag{5}
\end{align}
Let $\otimes$ denote the Kronecker product and let $\delta_k(i,j)$ denote a $k \times k$ matrix whose elements are all zero except for the element in the $i$-th row and $j$-th column, which is one. Then, the expression in (5) can be further rewritten as
\begin{align}
J^{x}_{0:k} &= \mathrm{E}_{p(x_{0:k})}\{-\delta_{k+1}(1,1) \otimes \Delta^{x_0}_{x_0} \log p(x_0)\} + \sum_{n=1}^{k} \Big[ \mathrm{E}_{p(x_{0:k})}\{-\delta_{k+1}(n,n) \otimes \Delta^{x_{n-1}}_{x_{n-1}} \log p(x_n|x_{n-1})\} \nonumber\\
&\quad + \mathrm{E}_{p(x_{0:k})}\{-\delta_{k+1}(n,n+1) \otimes \Delta^{x_{n-1}}_{x_n} \log p(x_n|x_{n-1})\} + \mathrm{E}_{p(x_{0:k})}\{-\delta_{k+1}(n+1,n) \otimes \Delta^{x_n}_{x_{n-1}} \log p(x_n|x_{n-1})\} \nonumber\\
&\quad + \mathrm{E}_{p(x_{0:k})}\{-\delta_{k+1}(n+1,n+1) \otimes \Delta^{x_n}_{x_n} \log p(x_n|x_{n-1})\} \Big] \nonumber\\
&= \delta_{k+1}(1,1) \otimes J_{x_0} + \sum_{n=1}^{k} \Big[ \delta_{k+1}(n,n) \otimes D^{11}_n + \delta_{k+1}(n,n+1) \otimes D^{12}_n + \delta_{k+1}(n+1,n) \otimes D^{21}_n + \delta_{k+1}(n+1,n+1) \otimes D^{22,a}_n \Big] \tag{6}
\end{align}
with
\begin{equation}
J_{x_0} = \mathrm{E}_{p(x_0)}\{-\Delta^{x_0}_{x_0} \log p(x_0)\} \tag{7}
\end{equation}
and
\begin{align}
D^{11}_n &= \mathrm{E}_{p(x_{0:n})}\{-\Delta^{x_{n-1}}_{x_{n-1}} \log p(x_n|x_{n-1})\}, \tag{8a}\\
D^{12}_n &= \mathrm{E}_{p(x_{0:n})}\{-\Delta^{x_{n-1}}_{x_n} \log p(x_n|x_{n-1})\} = [D^{21}_n]^{\mathrm{T}}, \tag{8b}\\
D^{22,a}_n &= \mathrm{E}_{p(x_{0:n})}\{-\Delta^{x_n}_{x_n} \log p(x_n|x_{n-1})\}, \tag{8c}
\end{align}
which concludes the proof of Lemma 2.
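The block placement in (6) can be made concrete in a few lines. The sketch below (illustrative, scalar blocks are used in the example) assembles the block-tridiagonal prior information matrix from $J_{x_0}$ and the blocks of (8):

```python
import numpy as np

def prior_information_matrix(J0, D11, D12, D22, k, nx):
    """Assemble the (k+1)nx x (k+1)nx block-tridiagonal matrix of (6).

    D11[n-1], D12[n-1], D22[n-1] are the nx x nx blocks (8a)-(8c) for n = 1..k."""
    J = np.zeros(((k + 1) * nx, (k + 1) * nx))
    J[:nx, :nx] = J0
    for n in range(1, k + 1):
        a, b = (n - 1) * nx, n * nx
        J[a:b, a:b] += D11[n - 1]            # delta(n, n) block
        J[a:b, b:b + nx] += D12[n - 1]       # delta(n, n+1) block
        J[b:b + nx, a:b] += D12[n - 1].T     # delta(n+1, n) block
        J[b:b + nx, b:b + nx] += D22[n - 1]  # delta(n+1, n+1) block
    return J

# Example: scalar Gaussian random walk x_n = x_{n-1} + w_n, w_n ~ N(0, q),
# with x_0 ~ N(0, p0); all numbers are illustrative.
q, p0 = 1.0, 0.5
D = [np.array([[1.0 / q]])] * 2
J = prior_information_matrix(np.array([[1.0 / p0]]), D,
                             [np.array([[-1.0 / q]])] * 2, D, k=2, nx=1)
```

For this linear-Gaussian example, `J` coincides with the inverse covariance of the joint Gaussian vector $(x_0, x_1, x_2)$, as one would expect.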
1.2.2 Proof of Lemma 3
The proof of Lemma 3 starts by representing the (temporal) dependencies between the different stochastic variables $x_{0:k+1}$, $r_{1:k+1}$ and $z_{1:k+1}$ for Model 2 using the corresponding directed acyclic graph (DAG) shown in
Fig. 2.
Figure 2: DAG representation of jump Markov nonlinear system given by Model 2
From the above figure we can determine the recurrence relations for $p(z_{1:k}, r_k|x_{0:k})$ and $\nabla_{x_{0:k}} p(z_{1:k}, r_k|x_{0:k})$. If $k = 1$, we have
\begin{align}
p(z_1, r_1|x_{0:1}) &= \frac{p(z_1, r_1, x_{0:1})}{p(x_{0:1})} = \frac{p(z_1|x_{0:1}, r_1)\, p(x_1|x_0, r_1)\, p(x_0|r_1) \Pr\{r_1\}}{p(x_{0:1})} \nonumber\\
&= \frac{p(z_1|x_1, r_1)\, p(x_1|x_0)\, p(x_0) \Pr\{r_1\}}{p(x_1|x_0)\, p(x_0)} = p(z_1|x_1, r_1) \Pr\{r_1\}, \tag{9a}\\
\nabla_{x_{0:1}} p(z_1, r_1|x_{0:1}) &= [\nabla_{x_{0:1}} p(z_1|x_1, r_1)] \Pr\{r_1\}. \tag{9b}
\end{align}
For all other cases, we have
\begin{align}
p(z_{1:k}, r_k|x_{0:k}) &= \frac{p(z_{1:k}, r_k, x_{0:k})}{p(x_{0:k})} = \frac{\sum_{r_{k-1}} p(z_{1:k}, r_k, x_{0:k}, r_{k-1})}{p(x_{0:k})} \nonumber\\
&= \sum_{r_{k-1}} p(z_k|x_{0:k}, z_{1:k-1}, r_k, r_{k-1})\, p(x_k|x_{0:k-1}, z_{1:k-1}, r_k, r_{k-1}) \Pr\{r_k|r_{k-1}, x_{0:k-1}, z_{1:k-1}\} \nonumber\\
&\qquad \times \frac{p(z_{1:k-1}, r_{k-1}|x_{0:k-1})\, p(x_{0:k-1})}{p(x_{0:k})} \nonumber\\
&= \sum_{r_{k-1}} p(z_k|x_k, r_k)\, p(x_k|x_{k-1})\, p(z_{1:k-1}, r_{k-1}|x_{0:k-1}) \frac{\Pr\{r_k|r_{k-1}\}\, p(x_{0:k-1})}{p(x_k|x_{k-1})\, p(x_{0:k-1})} \nonumber\\
&= p(z_k|x_k, r_k) \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{1:k-1}, r_{k-1}|x_{0:k-1}), \tag{10a}\\
\nabla_{x_{0:k}} p(z_{1:k}, r_k|x_{0:k}) &= \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\} \Big[ [\nabla_{x_{0:k}} p(z_k|x_k, r_k)]\, p(z_{1:k-1}, r_{k-1}|x_{0:k-1}) \nonumber\\
&\qquad + p(z_k|x_k, r_k)\, [\nabla_{x_{0:k}} p(z_{1:k-1}, r_{k-1}|x_{0:k-1})] \Big]. \tag{10b}
\end{align}
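The forward recursion (9a)/(10a) is a standard HMM-style sum-product pass. A sketch for a hypothetical two-mode Model 2 system (scalar state, mode-dependent measurement offset; all parameters are made up for illustration):

```python
import numpy as np

def gauss(x, m, v):
    """Scalar Gaussian pdf, vectorized over the mode-dependent mean m."""
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# Hypothetical two-mode system: the mode only enters the measurement.
Pi = np.array([[0.9, 0.1], [0.2, 0.8]])   # Pr{r_k | r_{k-1}}
pi1 = np.array([0.6, 0.4])                # Pr{r_1}
b, R = np.array([0.0, 1.0]), 0.2          # p(z_k|x_k,r_k) = N(z_k; x_k + b_r, R)

def alpha(x, z):
    """Recursion (9a)/(10a): a[r] = p(z_{1:k}, r_k = r | x_{0:k})."""
    a = gauss(z[0], x[1] + b, R) * pi1                  # (9a)
    for n in range(2, len(z) + 1):
        a = gauss(z[n - 1], x[n] + b, R) * (Pi.T @ a)   # (10a)
    return a
```

Summing `alpha(x, z)` over the modes gives the likelihood $p(z_{1:k}|x_{0:k})$, which can be checked against brute-force enumeration of all mode sequences for short horizons.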
1.3 Recursive Computation of the BCRB
1.3.1 Proof of Lemma 4
The proof begins by showing that the first equality of the lemma holds. The assumption of conditional independence in the lemma yields that
\begin{equation}
p(x_{0:k}, z_{1:k}) = p(x_{0:d}, z_{1:d}) \prod_{n=d+1}^{k} p(x_n, z_n|x_{n-d:n-1}, z_{1:n-1}). \tag{11}
\end{equation}
For $k_1 > d$, the gradient is given by
\begin{align}
\nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_{0:k}, z_{1:k}) &= \underbrace{\nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_{0:d}, z_{1:d})}_{=0} + \sum_{n=k_1}^{k} \nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_n, z_n|x_{n-d:n-1}, z_{1:n-1}) \nonumber\\
&= \sum_{n=k_1}^{k} \nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_n, z_n|x_{n-d:n-1}, z_{1:n-1}). \tag{12}
\end{align}
Obviously, $\nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_{0:k}, z_{1:k})$ does not depend on $x_{0:k_1-d-1}$, so that we have
\begin{equation}
\Delta^{x_{k_1}}_{x_{0:k_1-d-1}} \log p(x_{0:k}, z_{1:k}) = \big(\Delta^{x_{0:k_1-d-1}}_{x_{k_1}} \log p(x_{0:k}, z_{1:k})\big)^{\mathrm{T}} = 0. \tag{13}
\end{equation}
As a result, the corresponding information matrices are given by
\begin{equation}
[J_{0:k}]_{0:k_1-d-1 \times k_1:k_1} = \big([J_{0:k}]_{k_1:k_1 \times 0:k_1-d-1}\big)^{\mathrm{T}} = 0, \tag{14}
\end{equation}
which proves the first equality in the lemma. In the next step, it is shown that the second equality of the lemma holds. The Bayesian information matrix can be written as
\begin{align}
[J_{0:k+1}]_{0:k-d \times 0:k-d} &= \mathrm{E}\big\{-\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k+1}, z_{1:k+1})\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k}) + \Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{k+1}, z_{k+1}|x_{0:k}, z_{1:k})\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k}) + \Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{k+1}, z_{k+1}|x_{k-d+1:k}, z_{1:k})\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k})\big\} = [J_{0:k}]_{0:k-d \times 0:k-d}, \tag{15}
\end{align}
where the third equality follows from the conditional independence assumption in the lemma. Clearly, the second term in the expectation of the third equality is independent of x0:k−d, which proves the second equality
of Lemma 4. This concludes the proof of Lemma 4.
1.3.2 Proof of Proposition 1
The proof starts with decomposing the state vector $x_{0:k}$ as follows: $x_{0:k} = [x^{\mathrm{T}}_{0:k-d-1}, x^{\mathrm{T}}_{k-d:k-1}, x^{\mathrm{T}}_k]^{\mathrm{T}}$. Then, the Bayesian information matrix can be written as
\begin{equation}
J_{0:k} = \begin{bmatrix}
[J_{0:k}]_{0:k-d-1 \times 0:k-d-1} & [J_{0:k}]_{0:k-d-1 \times k-d:k-1} & [J_{0:k}]_{0:k-d-1 \times k:k} \\
[J_{0:k}]_{k-d:k-1 \times 0:k-d-1} & [J_{0:k}]_{k-d:k-1 \times k-d:k-1} & [J_{0:k}]_{k-d:k-1 \times k:k} \\
[J_{0:k}]_{k:k \times 0:k-d-1} & [J_{0:k}]_{k:k \times k-d:k-1} & [J_{0:k}]_{k:k \times k:k}
\end{bmatrix}. \tag{16}
\end{equation}
Assuming that the joint vector (xk, zk) and x0:k−d−1 are conditionally independent given xk−d:k−1, z1:k−1, it
follows from Lemma 4 that
\begin{align}
[J_{0:k}]_{0:k-d-1 \times k:k} &= 0_{n_x(k-d) \times n_x}, \tag{17a}\\
[J_{0:k}]_{k:k \times 0:k-d-1} &= 0_{n_x \times n_x(k-d)}. \tag{17b}
\end{align}
By introducing
\begin{align}
A_k &= [J_{0:k}]_{0:k-d-1 \times 0:k-d-1}, \tag{18a}\\
B_k &= [J_{0:k}]_{0:k-d-1 \times k-d:k-1}, \tag{18b}\\
C_k &= [J_{0:k}]_{k-d:k-1 \times k-d:k-1}, \tag{18c}\\
D_k &= [J_{0:k}]_{k-d:k-1 \times k:k}, \tag{18d}\\
E_k &= [J_{0:k}]_{k:k \times k:k}, \tag{18e}
\end{align}
the Bayesian information matrix can be written as
\begin{equation}
J_{0:k} = \begin{bmatrix} A_k & B_k & 0_{n_x(k-d) \times n_x} \\ B_k^{\mathrm{T}} & C_k & D_k \\ 0_{n_x \times n_x(k-d)} & D_k^{\mathrm{T}} & E_k \end{bmatrix}. \tag{19}
\end{equation}
Note that the dimension of the matrix $A_k$ is $(n_x(k-d) \times n_x(k-d))$, $B_k$ is $(n_x(k-d) \times n_x d)$, $C_k$ is $(n_x d \times n_x d)$, $D_k$ is $(n_x d \times n_x)$ and $E_k$ is $(n_x \times n_x)$. In order to compute the $(n_x \times n_x)$ matrix $J_k$, we have to find an expression for the submatrix located in the lower-right corner of $[J_{0:k}]^{-1}$. The relation between $[J_{0:k}]^{-1}$ and $[J_k]^{-1}$ is given as follows:
\begin{equation}
[J_{0:k}]^{-1} = \begin{bmatrix} \square & \square & \square \\ \square & \square & \square \\ \square & \square & [J_k]^{-1} \end{bmatrix}, \tag{20}
\end{equation}
where $\square$ denotes matrix elements we are not interested in. We define
\begin{equation}
H_{11} = \begin{bmatrix} A_k & B_k \\ B_k^{\mathrm{T}} & C_k \end{bmatrix}, \qquad H_{12} = \begin{bmatrix} 0_{n_x(k-d) \times n_x} \\ D_k \end{bmatrix}, \qquad H_{22} = E_k, \tag{21}
\end{equation}
such that
\begin{equation}
J_{0:k} = \begin{bmatrix} H_{11} & H_{12} \\ H_{12}^{\mathrm{T}} & H_{22} \end{bmatrix}. \tag{22}
\end{equation}
Note that the dimension of $H_{11}$ is $(n_x k \times n_x k)$, $H_{12}$ is $(n_x k \times n_x)$ and $H_{22}$ is $(n_x \times n_x)$. Then, the submatrix located in the lower-right corner of $[J_{0:k}]^{-1}$ can be found by block-matrix inversion, yielding
\begin{equation}
[J_k]^{-1} = \big[H_{22} - H_{12}^{\mathrm{T}} [H_{11}]^{-1} H_{12}\big]^{-1}. \tag{23}
\end{equation}
By following the same approach for computing
\begin{equation}
[H_{11}]^{-1} = \begin{bmatrix} \square & \square \\ \square & [C_k - B_k^{\mathrm{T}} [A_k]^{-1} B_k]^{-1} \end{bmatrix}, \tag{24}
\end{equation}
we end up with
\begin{align}
J_k &= E_k - \begin{bmatrix} 0_{n_x \times n_x(k-d)} & D_k^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \square & \square \\ \square & [C_k - B_k^{\mathrm{T}} [A_k]^{-1} B_k]^{-1} \end{bmatrix} \begin{bmatrix} 0_{n_x(k-d) \times n_x} \\ D_k \end{bmatrix} \nonumber\\
&= E_k - D_k^{\mathrm{T}} \big[C_k - B_k^{\mathrm{T}} [A_k]^{-1} B_k\big]^{-1} D_k = E_k - D_k^{\mathrm{T}} [H_k]^{-1} D_k, \tag{25}
\end{align}
where we have introduced
\begin{equation}
H_k = C_k - B_k^{\mathrm{T}} [A_k]^{-1} B_k. \tag{26}
\end{equation}
Note that $H_k$ is of dimension $(n_x d \times n_x d)$. The expression in (25) can be used to compute $J_k$. However, it does not give a recursive relationship. Furthermore, it relies on the inversion of the matrix $A_k$, whose dimension increases with time $k$. For large values of $k$, this leads to a tremendous increase in computational complexity and should be avoided.
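As a numerical sanity check of (25), the following sketch uses a synthetic, well-conditioned information matrix with the zero blocks of (17) imposed (the dimensions $n_x = 1$, $d = 2$, $k = 5$ and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
nx, d, k = 1, 2, 5
N = nx * (k + 1)

# Hypothetical information matrix with the block structure of (19):
M = rng.standard_normal((N, N))
J0k = M @ M.T + 50.0 * np.eye(N)         # symmetric, strongly positive definite
J0k[:nx * (k - d), nx * k:] = 0.0        # zero block (17a)
J0k[nx * k:, :nx * (k - d)] = 0.0        # zero block (17b)

A = J0k[:nx * (k - d), :nx * (k - d)]
B = J0k[:nx * (k - d), nx * (k - d):nx * k]
C = J0k[nx * (k - d):nx * k, nx * (k - d):nx * k]
D = J0k[nx * (k - d):nx * k, nx * k:]
E = J0k[nx * k:, nx * k:]

H = C - B.T @ np.linalg.solve(A, B)      # (26)
Jk = E - D.T @ np.linalg.solve(H, D)     # (25)
# [Jk]^{-1} equals the lower-right block of [J0:k]^{-1}, as claimed in (20).
```

The nested Schur complements in (24)-(25) are what make the explicit inverse of the growing matrix $A_k$ unnecessary once the recursion of Proposition 1 is available.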
In the following, we develop a recursive relationship for computing $H_k$ which does not require the inversion of a large matrix such as $A_k$. We decompose the state vector $x_{0:k+1}$ as follows: $x_{0:k+1} = [x^{\mathrm{T}}_{0:k-d-1}, x^{\mathrm{T}}_{k-d:k-1}, x^{\mathrm{T}}_k, x^{\mathrm{T}}_{k+1}]^{\mathrm{T}}$. Then, the Bayesian information matrix $J_{0:k+1} \triangleq \widetilde{J}$ can be written as follows:
\begin{equation}
\widetilde{J} = \begin{bmatrix}
[\widetilde{J}]_{0:k-d-1 \times 0:k-d-1} & [\widetilde{J}]_{0:k-d-1 \times k-d:k-1} & [\widetilde{J}]_{0:k-d-1 \times k:k} & [\widetilde{J}]_{0:k-d-1 \times k+1:k+1} \\
[\widetilde{J}]_{k-d:k-1 \times 0:k-d-1} & [\widetilde{J}]_{k-d:k-1 \times k-d:k-1} & [\widetilde{J}]_{k-d:k-1 \times k:k} & [\widetilde{J}]_{k-d:k-1 \times k+1:k+1} \\
[\widetilde{J}]_{k:k \times 0:k-d-1} & [\widetilde{J}]_{k:k \times k-d:k-1} & [\widetilde{J}]_{k:k \times k:k} & [\widetilde{J}]_{k:k \times k+1:k+1} \\
[\widetilde{J}]_{k+1:k+1 \times 0:k-d-1} & [\widetilde{J}]_{k+1:k+1 \times k-d:k-1} & [\widetilde{J}]_{k+1:k+1 \times k:k} & [\widetilde{J}]_{k+1:k+1 \times k+1:k+1}
\end{bmatrix}. \tag{27}
\end{equation}
Assuming that the joint vector $(x_k, z_k)$ is conditionally independent of $x_{0:k-d-1}$ given $x_{k-d:k-1}$ and $z_{1:k-1}$, as well as that $(x_{k+1}, z_{k+1})$ is conditionally independent of $x_{0:k-d}$ given $x_{k-d+1:k}$ and $z_{1:k}$, it follows from Lemma 4 that
\begin{align}
[J_{0:k+1}]_{k+1:k+1 \times 0:k-d-1} &= \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_{k+1}} \log p(x_{0:k+1}, z_{1:k+1})\} \nonumber\\
&= \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_{k+1}} \log p(x_{0:k}, z_{1:k})\} + \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_{k+1}} \log p(x_{k+1}, z_{k+1}|x_{k-d+1:k}, z_{1:k})\} \nonumber\\
&= 0_{n_x \times n_x(k-d)}, \tag{28a}\\
[J_{0:k+1}]_{k:k \times 0:k-d-1} &= \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_k} \log p(x_{0:k+1}, z_{1:k+1})\} \nonumber\\
&= \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_k} \log p(x_{0:k-1}, z_{1:k-1})\} + \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_k} \log p(x_{k+1}, z_{k+1}|x_{k-d+1:k}, z_{1:k})\} \nonumber\\
&\quad + \mathrm{E}\{-\Delta^{x_{0:k-d-1}}_{x_k} \log p(x_k, z_k|x_{k-d:k-1}, z_{1:k-1})\} \nonumber\\
&= 0_{n_x \times n_x(k-d)}, \tag{28b}\\
[J_{0:k+1}]_{0:k-d-1 \times k+1:k+1} &= 0_{n_x(k-d) \times n_x}, \tag{28c}\\
[J_{0:k+1}]_{0:k-d-1 \times k:k} &= 0_{n_x(k-d) \times n_x}. \tag{28d}
\end{align}
Furthermore, we refactor the submatrix
\begin{equation}
\begin{bmatrix} [J_{0:k+1}]_{k-d:k-1 \times k+1:k+1} \\ [J_{0:k+1}]_{k:k \times k+1:k+1} \end{bmatrix} = \begin{bmatrix} [J_{0:k+1}]_{k-d:k-d \times k+1:k+1} \\ [J_{0:k+1}]_{k-d+1:k \times k+1:k+1} \end{bmatrix} = \begin{bmatrix} 0_{n_x \times n_x} \\ D_{k+1} \end{bmatrix}. \tag{29}
\end{equation}
Using the same approach, we can write
\begin{equation}
\begin{bmatrix} [J_{0:k+1}]_{k+1:k+1 \times k-d:k-1} & [J_{0:k+1}]_{k+1:k+1 \times k:k} \end{bmatrix} = \begin{bmatrix} 0_{n_x \times n_x} & D_{k+1}^{\mathrm{T}} \end{bmatrix}. \tag{30}
\end{equation}
We further define
\begin{align}
\widetilde{C}_{k+1} &= [J_{0:k+1}]_{k-d:k-1 \times k-d:k-1}, \tag{31a}\\
\widetilde{D}_{k+1} &= [J_{0:k+1}]_{k-d:k-1 \times k:k}, \tag{31b}\\
\widetilde{E}_{k+1} &= [J_{0:k+1}]_{k:k \times k:k}. \tag{31c}
\end{align}
Note that the dimension of $\widetilde{C}_{k+1}$ is $(n_x d \times n_x d)$, $\widetilde{D}_{k+1}$ is $(n_x d \times n_x)$ and $\widetilde{E}_{k+1}$ is $(n_x \times n_x)$. From Lemma 4, we know that
\begin{equation}
[J_{0:k}]_{0:k-d \times 0:k-d} = [J_{0:k+1}]_{0:k-d \times 0:k-d} \tag{32}
\end{equation}
holds. Since (32) also holds for any submatrix, we can write
\begin{equation}
[J_{0:k}]_{0:k-d-1 \times 0:k-d-1} = [J_{0:k+1}]_{0:k-d-1 \times 0:k-d-1} = A_k. \tag{33}
\end{equation}
We can further show that
\begin{align}
[J_{0:k+1}]_{0:k-d-1 \times k-d:k-1} &= \mathrm{E}\{-\Delta^{x_{k-d:k-1}}_{x_{0:k-d-1}} \log p(x_{0:k+1}, z_{1:k+1})\} \nonumber\\
&= \mathrm{E}\{-\Delta^{x_{k-d:k-1}}_{x_{0:k-d-1}} \log p(x_{0:k}, z_{1:k})\} + \mathrm{E}\{-\Delta^{x_{k-d:k-1}}_{x_{0:k-d-1}} \log p(x_{k+1}, z_{k+1}|x_{k-d+1:k}, z_{1:k})\} \nonumber\\
&= \mathrm{E}\{-\Delta^{x_{k-d:k-1}}_{x_{0:k-d-1}} \log p(x_{0:k}, z_{1:k})\} = [J_{0:k}]_{0:k-d-1 \times k-d:k-1} = B_k. \tag{34}
\end{align}
For the Bayesian information matrix, we can finally write
\begin{equation}
J_{0:k+1} = \begin{bmatrix}
A_k & \begin{bmatrix} B_k & 0_{n_x(k-d) \times n_x} \end{bmatrix} & 0_{n_x(k-d) \times n_x} \\
\begin{bmatrix} B_k^{\mathrm{T}} \\ 0_{n_x \times n_x(k-d)} \end{bmatrix} & \begin{bmatrix} \widetilde{C}_{k+1} & \widetilde{D}_{k+1} \\ \widetilde{D}_{k+1}^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix} & \begin{bmatrix} 0_{n_x \times n_x} \\ D_{k+1} \end{bmatrix} \\
0_{n_x \times n_x(k-d)} & \begin{bmatrix} 0_{n_x \times n_x} & D_{k+1}^{\mathrm{T}} \end{bmatrix} & E_{k+1}
\end{bmatrix}. \tag{35}
\end{equation}
The quantity of interest $[J_{k+1}]^{-1}$ is again located in the lower-right corner of $[J_{0:k+1}]^{-1}$, i.e.
\begin{equation}
[J_{0:k+1}]^{-1} = \begin{bmatrix} \square & \square & \square & \square \\ \square & \square & \square & \square \\ \square & \square & \square & \square \\ \square & \square & \square & [J_{k+1}]^{-1} \end{bmatrix}. \tag{36}
\end{equation}
We define
\begin{equation}
N_{11} = \begin{bmatrix} A_k & \begin{bmatrix} B_k & 0_{n_x(k-d) \times n_x} \end{bmatrix} \\ \begin{bmatrix} B_k^{\mathrm{T}} \\ 0_{n_x \times n_x(k-d)} \end{bmatrix} & \begin{bmatrix} \widetilde{C}_{k+1} & \widetilde{D}_{k+1} \\ \widetilde{D}_{k+1}^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix} \end{bmatrix}, \qquad
N_{12} = \begin{bmatrix} 0_{n_x(k-d) \times n_x} \\ 0_{n_x \times n_x} \\ D_{k+1} \end{bmatrix}, \qquad N_{22} = E_{k+1}, \tag{37}
\end{equation}
such that
\begin{equation}
[J_{k+1}]^{-1} = \big[N_{22} - N_{12}^{\mathrm{T}} [N_{11}]^{-1} N_{12}\big]^{-1} \tag{38}
\end{equation}
or, equivalently,
\begin{equation}
E_{k+1} - J_{k+1} = N_{12}^{\mathrm{T}} [N_{11}]^{-1} N_{12} \tag{39}
\end{equation}
holds. Note that the dimension of $N_{11}$ is $(n_x(k+1) \times n_x(k+1))$, $N_{12}$ is $(n_x(k+1) \times n_x)$ and $N_{22}$ is $(n_x \times n_x)$.
As will be seen later, the lower-right corner of the inverse $[N_{11}]^{-1}$ is of main interest. It can be computed according to
\begin{equation}
[N_{11}]^{-1} = \begin{bmatrix} \square & \square \\ \square & [\widetilde{N}_{22} - \widetilde{N}_{12}^{\mathrm{T}} [\widetilde{N}_{11}]^{-1} \widetilde{N}_{12}]^{-1} \end{bmatrix}, \tag{40}
\end{equation}
where we have defined
\begin{equation}
\widetilde{N}_{11} = A_k, \qquad \widetilde{N}_{12} = \begin{bmatrix} B_k & 0_{n_x(k-d) \times n_x} \end{bmatrix}, \qquad \widetilde{N}_{22} = \begin{bmatrix} \widetilde{C}_{k+1} & \widetilde{D}_{k+1} \\ \widetilde{D}_{k+1}^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix}, \tag{41}
\end{equation}
which are of dimensions $(n_x(k-d) \times n_x(k-d))$, $(n_x(k-d) \times n_x(d+1))$ and $(n_x(d+1) \times n_x(d+1))$, respectively. Evaluation of the component of interest yields
\begin{align}
\widetilde{N}_{22} - \widetilde{N}_{12}^{\mathrm{T}} [\widetilde{N}_{11}]^{-1} \widetilde{N}_{12} &= \begin{bmatrix} \widetilde{C}_{k+1} & \widetilde{D}_{k+1} \\ \widetilde{D}_{k+1}^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix} - \begin{bmatrix} B_k^{\mathrm{T}} [A_k]^{-1} B_k & 0_{n_x d \times n_x} \\ 0_{n_x \times n_x d} & 0_{n_x \times n_x} \end{bmatrix} \nonumber\\
&= \begin{bmatrix} \widetilde{C}_{k+1} - B_k^{\mathrm{T}} [A_k]^{-1} B_k & \widetilde{D}_{k+1} \\ \widetilde{D}_{k+1}^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix}. \tag{42}
\end{align}
By further defining
\begin{equation}
\breve{D}_{k+1} = \begin{bmatrix} 0_{n_x \times n_x} \\ D_{k+1} \end{bmatrix}, \tag{43}
\end{equation}
with dimension $(n_x(d+1) \times n_x)$, we can express (42) as follows:
\begin{align}
E_{k+1} - J_{k+1} &= \begin{bmatrix} 0_{n_x \times n_x(k-d)} & \breve{D}_{k+1}^{\mathrm{T}} \end{bmatrix} [N_{11}]^{-1} \begin{bmatrix} 0_{n_x(k-d) \times n_x} \\ \breve{D}_{k+1} \end{bmatrix} \nonumber\\
&= \breve{D}_{k+1}^{\mathrm{T}} \big[\widetilde{N}_{22} - \widetilde{N}_{12}^{\mathrm{T}} [\widetilde{N}_{11}]^{-1} \widetilde{N}_{12}\big]^{-1} \breve{D}_{k+1} \nonumber\\
&= \breve{D}_{k+1}^{\mathrm{T}} \begin{bmatrix} \widetilde{C}_{k+1} - B_k^{\mathrm{T}} [A_k]^{-1} B_k & \widetilde{D}_{k+1} \\ \widetilde{D}_{k+1}^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix}^{-1} \breve{D}_{k+1}. \tag{44}
\end{align}
In order to obtain a recursive relationship for the computation of $H_k$, we define
\begin{equation}
\widetilde{H}_{k+1} = \widetilde{C}_{k+1} - B_k^{\mathrm{T}} [A_k]^{-1} B_k. \tag{45}
\end{equation}
With $H_k = C_k - B_k^{\mathrm{T}} [A_k]^{-1} B_k$, we can rewrite $\widetilde{H}_{k+1}$ as follows:
\begin{equation}
\widetilde{H}_{k+1} = H_k + \widetilde{C}_{k+1} - C_k. \tag{46}
\end{equation}
By letting
\begin{equation}
\widetilde{H}_{k+1} = \begin{bmatrix} \widetilde{H}_{11} & \widetilde{H}_{12} \\ \widetilde{H}_{12}^{\mathrm{T}} & \widetilde{H}_{22} \end{bmatrix}, \qquad \widetilde{D}_{k+1} = \begin{bmatrix} \widetilde{D}^1_{k+1} \\ \widetilde{D}^2_{k+1} \end{bmatrix}, \tag{47}
\end{equation}
the expression in (44) can be rewritten as
\begin{equation}
E_{k+1} - J_{k+1} = \breve{D}_{k+1}^{\mathrm{T}} \begin{bmatrix} \widetilde{H}_{11} & \begin{bmatrix} \widetilde{H}_{12} & \widetilde{D}^1_{k+1} \end{bmatrix} \\ \begin{bmatrix} \widetilde{H}_{12}^{\mathrm{T}} \\ [\widetilde{D}^1_{k+1}]^{\mathrm{T}} \end{bmatrix} & \begin{bmatrix} \widetilde{H}_{22} & \widetilde{D}^2_{k+1} \\ [\widetilde{D}^2_{k+1}]^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix} \end{bmatrix}^{-1} \breve{D}_{k+1}. \tag{48}
\end{equation}
Note that the dimension of $\widetilde{H}_{11}$ and $\widetilde{D}^1_{k+1}$ is $(n_x \times n_x)$, $\widetilde{H}_{12}$ and $[\widetilde{D}^2_{k+1}]^{\mathrm{T}}$ is $(n_x \times n_x(d-1))$ and $\widetilde{H}_{22}$ is $(n_x(d-1) \times n_x(d-1))$. We define
\begin{equation}
\widetilde{M}_{11} = \widetilde{H}_{11}, \qquad \widetilde{M}_{12} = \begin{bmatrix} \widetilde{H}_{12} & \widetilde{D}^1_{k+1} \end{bmatrix}, \qquad \widetilde{M}_{22} = \begin{bmatrix} \widetilde{H}_{22} & \widetilde{D}^2_{k+1} \\ [\widetilde{D}^2_{k+1}]^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix}, \tag{49}
\end{equation}
so that the inverse in (48) can be equivalently expressed as
\begin{equation}
E_{k+1} - J_{k+1} = \begin{bmatrix} 0_{n_x \times n_x} & D_{k+1}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \square & \square \\ \square & [\widetilde{M}_{22} - \widetilde{M}_{12}^{\mathrm{T}} [\widetilde{M}_{11}]^{-1} \widetilde{M}_{12}]^{-1} \end{bmatrix} \begin{bmatrix} 0_{n_x \times n_x} \\ D_{k+1} \end{bmatrix}. \tag{50}
\end{equation}
This finally yields
\begin{equation}
J_{k+1} = E_{k+1} - D_{k+1}^{\mathrm{T}} \left[ \begin{bmatrix} \widetilde{H}_{22} & \widetilde{D}^2_{k+1} \\ [\widetilde{D}^2_{k+1}]^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix} - \begin{bmatrix} \widetilde{H}_{12}^{\mathrm{T}} \\ [\widetilde{D}^1_{k+1}]^{\mathrm{T}} \end{bmatrix} [\widetilde{H}_{11}]^{-1} \begin{bmatrix} \widetilde{H}_{12} & \widetilde{D}^1_{k+1} \end{bmatrix} \right]^{-1} D_{k+1}. \tag{51}
\end{equation}
By taking into account that
\begin{equation}
J_{k+1} = E_{k+1} - D_{k+1}^{\mathrm{T}} [H_{k+1}]^{-1} D_{k+1} \tag{52}
\end{equation}
holds, we can identify a relation for $H_{k+1}$, which is given by
\begin{equation}
H_{k+1} = \begin{bmatrix} \widetilde{H}_{22} & \widetilde{D}^2_{k+1} \\ [\widetilde{D}^2_{k+1}]^{\mathrm{T}} & \widetilde{E}_{k+1} \end{bmatrix} - \begin{bmatrix} \widetilde{H}_{12}^{\mathrm{T}} \\ [\widetilde{D}^1_{k+1}]^{\mathrm{T}} \end{bmatrix} [\widetilde{H}_{11}]^{-1} \begin{bmatrix} \widetilde{H}_{12} & \widetilde{D}^1_{k+1} \end{bmatrix}, \tag{53}
\end{equation}
which finally concludes the proof of Proposition 1.
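To make the recursion concrete, the following sketch grows a synthetic banded information matrix (random PSD increments on the trailing $d+1$ entries stand in for the actual BCRB increments; scalar blocks $n_x = 1$, $d = 2$ and all numbers are illustrative) and propagates $H_k$ via (46) and (53), which can then be compared against the direct Schur complement (26):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2   # assumed memory depth; scalar blocks (n_x = 1) keep the indexing simple

def schur_H(J, k):
    """Direct H_k = C_k - B_k^T A_k^{-1} B_k, cf. (26), from the full matrix."""
    A, B, C = J[:k - d, :k - d], J[:k - d, k - d:k], J[k - d:k, k - d:k]
    return C - B.T @ np.linalg.solve(A, B)

# Grow a banded information matrix: each step pads J_{0:k} with zeros and adds
# a PSD increment on the trailing d+1 indices, mimicking the structure of (35).
J = 10.0 * np.eye(5)       # J_{0:4}, i.e. k = 4 (hypothetical starting horizon)
k = 4
H = schur_H(J, k)          # initialize H_k once by direct computation
for _ in range(5):
    Jn = np.zeros((k + 2, k + 2))
    Jn[:k + 1, :k + 1] = J
    L = rng.standard_normal((d + 1, d + 1))
    Jn[k - d + 1:, k - d + 1:] += L @ L.T + np.eye(d + 1)
    # Proposition 1, eqs. (45)-(53):
    Ct = Jn[k - d:k, k - d:k]                      # C~_{k+1}
    Dt = Jn[k - d:k, k:k + 1]                      # D~_{k+1}
    Et = Jn[k:k + 1, k:k + 1]                      # E~_{k+1}
    Ht = H + Ct - J[k - d:k, k - d:k]              # (46)
    top = np.hstack([Ht[:1, 1:], Dt[:1]])          # [H~12  D~1]
    M = np.block([[Ht[1:, 1:], Dt[1:]], [Dt[1:].T, Et]])
    H = M - top.T @ np.linalg.solve(Ht[:1, :1], top)   # (53)
    J, k = Jn, k + 1

# H now equals C_k - B_k^T A_k^{-1} B_k computed directly on the grown matrix,
# without ever inverting the growing leading block A_k.
```

The recursion only ever inverts matrices of the fixed size $n_x d \times n_x d$ (and the $n_x \times n_x$ corner $\widetilde{H}_{11}$), which is the computational point of Proposition 1.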
1.3.3 Proof of Lemma 5
The proof begins by showing that the first equality of the lemma holds. The assumption of conditional independence in the lemma yields that
\begin{align}
p(x_{0:k}, z_{1:k}) &= p(x_{0:d}, z_{1:d}) \prod_{n=d+1}^{k} p(x_n, z_n|x_{n-d:n-1}, z_{1:n-1}) \nonumber\\
&= p(x_{0:d}, z_{1:d}) \prod_{n=d+1}^{k} p(z_n|x_{n-d:n}, z_{1:n-1})\, p(x_n|x_{n-1}). \tag{54}
\end{align}
For $k_1 > d$, the gradient is given by
\begin{align}
\nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_{0:k}, z_{1:k}) &= \underbrace{\nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_{0:d}, z_{1:d})}_{=0} + \sum_{n=k_1}^{k} \nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_n, z_n|x_{n-d:n-1}, z_{1:n-1}) \nonumber\\
&= \sum_{n=k_1}^{k} \nabla^{\mathrm{T}}_{x_{k_1}} \log p(z_n|x_{n-d:n}, z_{1:n-1}) + \sum_{n=k_1}^{k} \nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_n|x_{n-1}). \tag{55}
\end{align}
Obviously, $\nabla^{\mathrm{T}}_{x_{k_1}} \log p(x_{0:k}, z_{1:k})$ does not depend on $x_{0:k_1-d-1}$, so that we have
\begin{equation}
\Delta^{x_{k_1}}_{x_{0:k_1-d-1}} \log p(x_{0:k}, z_{1:k}) = \big(\Delta^{x_{0:k_1-d-1}}_{x_{k_1}} \log p(x_{0:k}, z_{1:k})\big)^{\mathrm{T}} = 0. \tag{56}
\end{equation}
As a result, the corresponding information matrices are given by
\begin{equation}
[J_{0:k}]_{0:k_1-d-1 \times k_1:k_1} = \big([J_{0:k}]_{k_1:k_1 \times 0:k_1-d-1}\big)^{\mathrm{T}} = 0, \tag{57}
\end{equation}
which proves the first equality in the lemma. In the next step, it is shown that the second equality of the lemma holds. The Bayesian information matrix can be written as
\begin{align}
[J_{0:k+1}]_{0:k-d \times 0:k-d} &= \mathrm{E}\big\{-\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k+1}, z_{1:k+1})\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k}) + \Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{k+1}, z_{k+1}|x_{0:k}, z_{1:k})\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k}) + \Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{k+1}, z_{k+1}|x_{k-d+1:k}, z_{1:k})\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k}) + \Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(z_{k+1}|x_{k-d+1:k+1}, z_{1:k}) + \Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{k+1}|x_k)\big\} \nonumber\\
&= -\mathrm{E}\big\{\Delta^{x_{0:k-d}}_{x_{0:k-d}} \log p(x_{0:k}, z_{1:k})\big\} = [J_{0:k}]_{0:k-d \times 0:k-d}, \tag{58}
\end{align}
where the third equality follows from the conditional independence assumption in the lemma. Clearly, the second and third term in the expectation of the fourth equality are independent of x0:k−d, which proves the
second equality of Lemma 5. This concludes the proof of Lemma 5.
2 Algorithms for Conditional Independence Assumption Verification
2.1 Model 1
2.1.1 Proof of Lemma 6
We can decompose the prediction probability as follows:
\begin{align}
\Pr\{r_k|x_{0:k-1}, z_{1:k-1}\} &= \sum_{r_{k-1}} \Pr\{r_k, r_{k-1}|x_{0:k-1}, z_{1:k-1}\} = \sum_{r_{k-1}} \frac{\Pr\{r_k, r_{k-1}, x_{0:k-1}, z_{1:k-1}\}}{p(x_{0:k-1}, z_{1:k-1})} \nonumber\\
&= \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}, r_{k-1}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\} \frac{p(x_{0:k-2}, z_{1:k-2})}{p(x_{0:k-1}, z_{1:k-1})} \nonumber\\
&= \frac{\sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}, r_{k-1}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\}}{\sum_{r_k} \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}, r_{k-1}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\}}. \tag{59}
\end{align}
The last equality gives us the desired recursion for calculating $\Pr\{r_k|x_{0:k-1}, z_{1:k-1}\}$. The recursion is initiated with $\Pr\{r_1|x_0, z_{1:0}\} = \Pr\{r_1\}$, where $z_{1:0}$ is to be read as the empty set (no measurements) and $r_1$ is statistically independent of $x_0$ according to the assumptions made in Section 2 of [1]; see also Fig. 1. The recursion can be rewritten in algorithmic form as follows:
• At $k = 1$, initialize the recursion with $\Pr\{r_1|x_0, z_{1:0}\} = \Pr\{r_1\}$.
• For $n = 1, \ldots, k-1$, evaluate the prediction probability from the following recursion:
\begin{equation}
\Pr\{r_{n+1}|x_{0:n}, z_{1:n}\} = \frac{\sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n)\, p(x_n|x_{n-1}, r_n) \Pr\{r_n|x_{0:n-1}, z_{1:n-1}\}}{\sum_{r_{n+1}} \sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n)\, p(x_n|x_{n-1}, r_n) \Pr\{r_n|x_{0:n-1}, z_{1:n-1}\}}. \tag{60}
\end{equation}
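The steps above can be sketched directly in code. The example below assumes a hypothetical two-mode scalar Model 1 system (mode-dependent dynamics gain and measurement offset; all parameters are illustrative) and evaluates (60) along a given state and measurement trajectory:

```python
import numpy as np

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# Hypothetical two-mode Model 1 system (scalar state).
Pi = np.array([[0.95, 0.05], [0.10, 0.90]])   # Pr{r_{n+1} | r_n}
a = np.array([1.0, 0.5])                       # mode-dependent dynamics gain
b = np.array([0.0, 1.0])                       # mode-dependent measurement offset
Q, R = 0.1, 0.2

def prediction_probs(x, z, pi1):
    """Recursion (60): Pr{r_{n+1} | x_{0:n}, z_{1:n}} for n = 0, ..., len(z)."""
    probs = [pi1]                               # Pr{r_1 | x_0, z_{1:0}} = Pr{r_1}
    for n in range(1, len(z) + 1):
        w = (gauss(z[n - 1], x[n] + b, R)       # p(z_n | x_n, r_n)
             * gauss(x[n], a * x[n - 1], Q)     # p(x_n | x_{n-1}, r_n)
             * probs[-1])                       # Pr{r_n | x_{0:n-1}, z_{1:n-1}}
        p = Pi.T @ w                            # sum over r_n
        probs.append(p / p.sum())               # normalize over r_{n+1}
    return probs

probs = prediction_probs(np.array([0.1, 0.2, 0.15, 0.3]),
                         np.array([0.2, 0.1, 0.4]), np.array([0.5, 0.5]))
```

Each entry of `probs` is a normalized distribution over the modes, as the denominator of (60) enforces.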
2.1.2 Proof of Lemma 7
We can decompose the modified prediction probability as follows:
\begin{align}
\Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\} &= \sum_{r_{k-1}} \Pr\{r_k, r_{k-1}|x_{k-d:k-1}, z_{1:k-1}\} = \sum_{r_{k-1}} \frac{\Pr\{r_k, r_{k-1}, x_{k-d:k-1}, z_{1:k-1}\}}{p(x_{k-d:k-1}, z_{1:k-1})} \nonumber\\
&= \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}, r_{k-1}) \Pr\{r_{k-1}|x_{k-d:k-2}, z_{1:k-2}\} \frac{p(x_{k-d:k-2}, z_{1:k-2})}{p(x_{k-d:k-1}, z_{1:k-1})} \nonumber\\
&= \frac{\sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}, r_{k-1}) \Pr\{r_{k-1}|x_{k-d:k-2}, z_{1:k-2}\}}{\sum_{r_k} \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}, r_{k-1}) \Pr\{r_{k-1}|x_{k-d:k-2}, z_{1:k-2}\}}. \tag{61}
\end{align}
The last equality gives us the desired recursion for calculating $\Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\}$. The recursion is initiated with $\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$, which has to be calculated separately; see Section 2.1.3. The recursion can be rewritten in algorithmic form as follows:
• Initialize the recursion with $\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$.
• For $n = k-d+1, \ldots, k-1$, evaluate the modified prediction probability from the following recursion:
\begin{equation}
\Pr\{r_{n+1}|x_{k-d:n}, z_{1:n}\} = \frac{\sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n)\, p(x_n|x_{n-1}, r_n) \Pr\{r_n|x_{k-d:n-1}, z_{1:n-1}\}}{\sum_{r_{n+1}} \sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n)\, p(x_n|x_{n-1}, r_n) \Pr\{r_n|x_{k-d:n-1}, z_{1:n-1}\}}. \tag{62}
\end{equation}
It is worth noting that the above recursion has to be re-evaluated at each time step. This concludes the proof of Lemma 7.
2.1.3 Computing the initial probability $\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$
The initial probability can be rewritten as follows:
\begin{align}
\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} &= \sum_{r_{k-d}} \sum_{r_{k-d-1}} \int \Pr\{r_{k-d+1}, r_{k-d}, r_{k-d-1}, x_{k-d-1}|x_{k-d}, z_{1:k-d}\}\, \mathrm{d}x_{k-d-1} \nonumber\\
&= \sum_{r_{k-d}} \sum_{r_{k-d-1}} \frac{\int \Pr\{r_{k-d+1}, r_{k-d}, r_{k-d-1}, x_{k-d-1}, x_{k-d}, z_{1:k-d}\}\, \mathrm{d}x_{k-d-1}}{p(x_{k-d}, z_{1:k-d})} \tag{63}\\
&= \sum_{r_{k-d}} \sum_{r_{k-d-1}} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \nonumber\\
&\qquad \times \frac{\int p(x_{k-d}|x_{k-d-1}, r_{k-d})\, p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}\; p(z_{1:k-d-1})}{p(x_{k-d}, z_{1:k-d})} \nonumber\\
&= \frac{\sum_{r_{k-d}} \sum_{r_{k-d-1}} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \int p(x_{k-d}|x_{k-d-1}, r_{k-d})\, p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}}{\sum_{r_{k-d+1}} \sum_{r_{k-d}} \sum_{r_{k-d-1}} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \int p(x_{k-d}|x_{k-d-1}, r_{k-d})\, p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}}, \tag{64}
\end{align}
and the idea is to numerically approximate $p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1})$ using Rao-Blackwellized particle filters (RBPFs). In the following, we distinguish between jump Markov linear Gaussian systems (JMLGSs) and jump Markov nonlinear systems (JMNLSs), and derive the solution for JMLGSs, as it is required to reproduce the simulation results in the paper. For JMLGSs, the RBPF aims at approximating the following decomposed density:
\begin{equation}
p(x_{k-d-1}, r_{1:k-d-1}|z_{1:k-d-1}) = p(x_{k-d-1}|r_{1:k-d-1}, z_{1:k-d-1})\, p(r_{1:k-d-1}|z_{1:k-d-1}). \tag{65}
\end{equation}
The first density is obtained analytically using conditional Kalman filters, while the second density is approximated using SMC techniques [2]. A particle-based approximation of the density $p(x_{k-d-1}, r_{1:k-d-1}|z_{1:k-d-1})$ is thus
given as follows:
\begin{equation}
p(x_{k-d-1}, r_{1:k-d-1}|z_{1:k-d-1}) \approx \sum_{i=1}^{N_p} \widetilde{w}^{(i)}_{k-d-1}\, \delta_{r^{(i)}_{1:k-d-1}}(r_{1:k-d-1}), \tag{66}
\end{equation}
with weights $\widetilde{w}^{(i)}_{k-d-1} = w^{(i)}_{k-d-1}\, p(x_{k-d-1}|r^{(i)}_{1:k-d-1}, z_{1:k-d-1})$, where $N_p$ denotes the number of particles and $\delta_y(x)$ is a Dirac point mass located at the point $y$. By dropping the past sequence of modes $r_{1:k-d-2}$ in (66), we arrive at the SMC approximation of the desired density, which is given by
\begin{equation}
p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1}) \approx \sum_{i=1}^{N_p} \widetilde{w}^{(i)}_{k-d-1}\, \delta_{r^{(i)}_{k-d-1}}(r_{k-d-1}). \tag{67}
\end{equation}
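For the mode marginal, the collapse from (66) to (67) amounts to summing the weights of all particles that share the same current mode. A small synthetic sketch (particle trajectories and weights are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
Np, n_modes = 200, 2
modes = rng.integers(0, n_modes, size=(Np, 5))   # hypothetical r^{(i)}_{1:k-d-1}
w = rng.random(Np)
w /= w.sum()                                     # normalized particle weights

# Dropping the past mode sequence in (66): the marginal probability of the
# current mode r_{k-d-1} is the sum of the weights of particles in that mode.
marginal = np.bincount(modes[:, -1], weights=w, minlength=n_modes)
```

Since the weights are normalized, the resulting mode marginal is again a proper probability distribution.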
Inserting (67) into (64), the unnormalized probability $\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$ can be approximated as follows:
\begin{align}
\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} &\approx \sum_{r_{k-d}} \sum_{i=1}^{N_p} w^{(i)}_{k-d-1} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r^{(i)}_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \nonumber\\
&\qquad \times \int p(x_{k-d}|x_{k-d-1}, r_{k-d})\, p(x_{k-d-1}|r^{(i)}_{1:k-d-1}, z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}. \tag{68}
\end{align}
Since for JMLGSs both densities $p(x_{k-d}|x_{k-d-1}, r_{k-d})$ and $p(x_{k-d-1}|r^{(i)}_{1:k-d-1}, z_{1:k-d-1})$ are Gaussian, the integral in (68) can be solved analytically:
\begin{equation}
\int p(x_{k-d}|x_{k-d-1}, r_{k-d})\, p(x_{k-d-1}|r^{(i)}_{1:k-d-1}, z_{1:k-d-1})\, \mathrm{d}x_{k-d-1} = \mathcal{N}\big(x_{k-d};\, \mu^{(i)}(r_{k-d}),\, \Sigma^{(i)}(r_{k-d})\big). \tag{69}
\end{equation}
Thus, for the unnormalized probability we arrive at
\begin{align}
\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} &\approx \sum_{r_{k-d}} \sum_{i=1}^{N_p} w^{(i)}_{k-d-1} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r^{(i)}_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d})\, \mathcal{N}\big(x_{k-d};\, \mu^{(i)}(r_{k-d}),\, \Sigma^{(i)}(r_{k-d})\big), \tag{70}
\end{align}
so that finally we can compute
\begin{equation}
\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} = \frac{\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}}{\sum_{r_{k-d+1}} \widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}}. \tag{71}
\end{equation}
The approach introduced above can be used as long as $k-d-1 > 0$. In case $k-d-1 = 0$, the initial probability is evaluated from
\begin{align}
\Pr\{r_2|x_1, z_1\} &= \sum_{r_1} \int \Pr\{r_2, r_1, x_0|x_1, z_1\}\, \mathrm{d}x_0 = \sum_{r_1} \frac{\int \Pr\{r_2, r_1, x_0, x_1, z_1\}\, \mathrm{d}x_0}{p(x_1, z_1)} \nonumber\\
&= \sum_{r_1} \frac{\Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \int p(x_1|x_0, r_1)\, p(x_0)\, \mathrm{d}x_0\, \Pr\{r_1\}}{p(x_1, z_1)} \nonumber\\
&= \frac{\sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \int p(x_1|x_0, r_1)\, p(x_0)\, \mathrm{d}x_0\, \Pr\{r_1\}}{\sum_{r_2} \sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \int p(x_1|x_0, r_1)\, p(x_0)\, \mathrm{d}x_0\, \Pr\{r_1\}}. \tag{72}
\end{align}
Since for JMLGSs both densities $p(x_1|x_0, r_1)$ and $p(x_0)$ are Gaussian, the integral appearing in the above expression can be solved analytically, yielding
\begin{equation}
\Pr\{r_2|x_1, z_1\} = \frac{\sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1)\, \mathcal{N}(x_1; \mu(r_1), \Sigma(r_1)) \Pr\{r_1\}}{\sum_{r_2} \sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1)\, \mathcal{N}(x_1; \mu(r_1), \Sigma(r_1)) \Pr\{r_1\}}. \tag{73}
\end{equation}
For JMNLS, a very similar solution for approximating Pr{rk−d+1|xk−d, z1:k−d} can be derived using the RBPF
introduced in [3, 4], see also [5].
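For the JMLGS case, (73) can be sketched in a few lines. The example assumes a hypothetical scalar two-mode system with made-up parameters; the closed-form marginal $\mathcal{N}(x_1; a\,m_0, a^2 P_0 + Q)$ follows from standard Gaussian marginalization:

```python
import numpy as np

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# Hypothetical scalar JMLGS with two modes (all parameters illustrative).
Pi = np.array([[0.9, 0.1], [0.2, 0.8]])             # Pr{r_2 | r_1}
pi1 = np.array([0.5, 0.5])                          # Pr{r_1}
a, Q = np.array([1.0, 0.5]), np.array([0.1, 0.3])   # p(x_1 | x_0, r_1)
b, R = np.array([0.0, 1.0]), 0.2                    # p(z_1 | x_1, r_1)
m0, P0 = 0.0, 1.0                                   # p(x_0) = N(m0, P0)

def init_prob(x1, z1):
    """Eq. (73): the integral over x_0 gives N(x_1; a*m0, a^2*P0 + Q) per mode."""
    lik = (gauss(z1, x1 + b, R)                 # p(z_1 | x_1, r_1)
           * gauss(x1, a * m0, a**2 * P0 + Q)   # N(x_1; mu(r_1), Sigma(r_1))
           * pi1)                               # Pr{r_1}
    p = Pi.T @ lik                              # numerator of (73), summed over r_1
    return p / p.sum()
```

Calling `init_prob` at any $(x_1, z_1)$ returns a normalized distribution over $r_2$, which is what initializes the recursions of Lemma 7.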
2.2 Model 2
In case the jump Markov system follows Model 2, the transition pdf simplifies to p(xk|xk−1, rk) = p(xk|xk−1).
For the likelihood and its approximation we can thus write
\begin{align}
p(z_k|x_{0:k}, z_{1:k-1}) &= \sum_{r_k} p(z_k, r_k|x_{0:k}, z_{1:k-1}) = \sum_{r_k} \frac{p(z_k, r_k, x_{0:k}, z_{1:k-1})}{p(x_{0:k}, z_{1:k-1})} \nonumber\\
&= \sum_{r_k} \frac{p(z_k|x_k, r_k)\, p(x_k|x_{k-1}) \Pr\{r_k|x_{0:k-1}, z_{1:k-1}\}\, p(x_{0:k-1}, z_{1:k-1})}{p(x_k|x_{k-1})\, p(x_{0:k-1}, z_{1:k-1})} \nonumber\\
&= \sum_{r_k} p(z_k|x_k, r_k) \Pr\{r_k|x_{0:k-1}, z_{1:k-1}\} \tag{74}
\end{align}
and
\begin{align}
p(z_k|x_{k-d:k}, z_{1:k-1}) &= \sum_{r_k} p(z_k, r_k|x_{k-d:k}, z_{1:k-1}) = \sum_{r_k} \frac{p(z_k, r_k, x_{k-d:k}, z_{1:k-1})}{p(x_{k-d:k}, z_{1:k-1})} \nonumber\\
&= \sum_{r_k} \frac{p(z_k|x_k, r_k)\, p(x_k|x_{k-1}) \Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\}\, p(x_{k-d:k-1}, z_{1:k-1})}{p(x_k|x_{k-1})\, p(x_{k-d:k-1}, z_{1:k-1})} \nonumber\\
&= \sum_{r_k} p(z_k|x_k, r_k) \Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\}. \tag{75}
\end{align}
Hence, we can use the same approach as in Model 1 and compute the average Jensen-Shannon divergence (AJSD) from the prediction probabilities Pr{rk|x0:k−1, z1:k−1} and Pr{rk|xk−d:k−1, z1:k−1}.
2.2.1 Derivation of the recursion for $\Pr\{r_k|x_{0:k-1}, z_{1:k-1}\}$
We can modify the recursion for the prediction probability as follows
\begin{align}
\Pr\{r_k|x_{0:k-1}, z_{1:k-1}\} &= \frac{\sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\}}{\sum_{r_k} \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1})\, p(x_{k-1}|x_{k-2}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\}} \nonumber\\
&= \frac{\sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\}}{\sum_{r_k} \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1}) \Pr\{r_{k-1}|x_{0:k-2}, z_{1:k-2}\}}, \tag{76}
\end{align}
which is initialized with Pr{r1|x0, z1:0} = Pr{r1}. The recursion can be further rewritten in an algorithmic
structure as follows:
• For $n = 1, \ldots, k-1$, evaluate the prediction probability from the following recursion:
\begin{equation}
\Pr\{r_{n+1}|x_{0:n}, z_{1:n}\} = \frac{\sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n) \Pr\{r_n|x_{0:n-1}, z_{1:n-1}\}}{\sum_{r_{n+1}} \sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n) \Pr\{r_n|x_{0:n-1}, z_{1:n-1}\}}. \tag{77}
\end{equation}
2.2.2 Derivation of the recursion for $\Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\}$
Similarly, we can rewrite the modified prediction probability as follows
\begin{equation}
\Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\} = \frac{\sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1}) \Pr\{r_{k-1}|x_{k-d:k-2}, z_{1:k-2}\}}{\sum_{r_k} \sum_{r_{k-1}} \Pr\{r_k|r_{k-1}\}\, p(z_{k-1}|x_{k-1}, r_{k-1}) \Pr\{r_{k-1}|x_{k-d:k-2}, z_{1:k-2}\}}, \tag{78}
\end{equation}
and the recursion is initiated with $\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$. The recursion rewritten in algorithmic form gives:
• Initialize the recursions with Pr{rk−d+1|xk−d, z1:k−d}
• For n = k − d + 1, . . . , k − 1 evaluate the modified prediction probability from the following recursion
\begin{equation}
\Pr\{r_{n+1}|x_{k-d:n}, z_{1:n}\} = \frac{\sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n) \Pr\{r_n|x_{k-d:n-1}, z_{1:n-1}\}}{\sum_{r_{n+1}} \sum_{r_n} \Pr\{r_{n+1}|r_n\}\, p(z_n|x_n, r_n) \Pr\{r_n|x_{k-d:n-1}, z_{1:n-1}\}}. \tag{79}
\end{equation}
2.2.3 Computing the initial probability $\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$
The approach for computing the initial probability closely follows the approach introduced for Model 1. The necessary modifications are as follows. The initial probability can be rewritten
\begin{equation}
\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} = \frac{\sum_{r_{k-d}} \sum_{r_{k-d-1}} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \int p(x_{k-d}|x_{k-d-1})\, p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}}{\sum_{r_{k-d+1}} \sum_{r_{k-d}} \sum_{r_{k-d-1}} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \int p(x_{k-d}|x_{k-d-1})\, p(x_{k-d-1}, r_{k-d-1}|z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}}. \tag{80}
\end{equation}
For JMLGSs, the unnormalized probability $\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}$ can be approximated using the RBPF as follows:
\begin{align}
\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} &\approx \sum_{r_{k-d}} \sum_{i=1}^{N_p} w^{(i)}_{k-d-1} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r^{(i)}_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d}) \nonumber\\
&\qquad \times \int p(x_{k-d}|x_{k-d-1})\, p(x_{k-d-1}|r^{(i)}_{1:k-d-1}, z_{1:k-d-1})\, \mathrm{d}x_{k-d-1}. \tag{81}
\end{align}
The integral in (81) can be solved analytically:
\begin{equation}
\int p(x_{k-d}|x_{k-d-1})\, p(x_{k-d-1}|r^{(i)}_{1:k-d-1}, z_{1:k-d-1})\, \mathrm{d}x_{k-d-1} = \mathcal{N}\big(x_{k-d};\, \mu^{(i)}, \Sigma^{(i)}\big), \tag{82}
\end{equation}
with $\mu^{(i)}$ and $\Sigma^{(i)}$ now independent of $r_{k-d}$. We arrive at
\begin{align}
\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} &\approx \sum_{r_{k-d}} \sum_{i=1}^{N_p} w^{(i)}_{k-d-1} \Pr\{r_{k-d+1}|r_{k-d}\} \Pr\{r_{k-d}|r^{(i)}_{k-d-1}\}\, p(z_{k-d}|x_{k-d}, r_{k-d})\, \mathcal{N}\big(x_{k-d};\, \mu^{(i)}, \Sigma^{(i)}\big), \tag{83}
\end{align}
and finally
\begin{equation}
\Pr\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\} = \frac{\widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}}{\sum_{r_{k-d+1}} \widetilde{\Pr}\{r_{k-d+1}|x_{k-d}, z_{1:k-d}\}}. \tag{84}
\end{equation}
In case $k-d-1 = 0$, the initial probability is evaluated from
\begin{align}
\Pr\{r_2|x_1, z_1\} &= \frac{\sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \int p(x_1|x_0)\, p(x_0)\, \mathrm{d}x_0\, \Pr\{r_1\}}{\sum_{r_2} \sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \int p(x_1|x_0)\, p(x_0)\, \mathrm{d}x_0\, \Pr\{r_1\}} \nonumber\\
&= \frac{\sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \Pr\{r_1\}}{\sum_{r_2} \sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \Pr\{r_1\}}. \tag{85}
\end{align}
2.2.4 Special case when $\Pr\{r_k|x_{0:k-1}, z_{1:k-1}\}$ and $\Pr\{r_k|x_{k-d:k-1}, z_{1:k-1}\}$ are equal
In order to simplify the exposition, we assume that d = 2 and we want to evaluate the prediction probabilities for k = 1, 2, 3. From the recursions introduced in the previous subsections we obtain:
\begin{align}
\Pr\{r_1|x_0, z_{1:0}\} &= \Pr\{r_1\}, \tag{86}\\
\Pr\{r_2|x_{0:1}, z_1\} &= \frac{\sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \Pr\{r_1\}}{\sum_{r_2} \sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \Pr\{r_1\}}, \tag{87}\\
\Pr\{r_3|x_{0:2}, z_{1:2}\} &= \frac{\sum_{r_2} \Pr\{r_3|r_2\}\, p(z_2|x_2, r_2) \Pr\{r_2|x_{0:1}, z_1\}}{\sum_{r_3} \sum_{r_2} \Pr\{r_3|r_2\}\, p(z_2|x_2, r_2) \Pr\{r_2|x_{0:1}, z_1\}}. \tag{88}
\end{align}
A depth of d = 2 means that we keep the previous two states in the calculations of Pr{rk|xk−d:k−1, z1:k−1}.
Thus, we arrive at
\begin{align}
\Pr\{r_1|x_{-1:0}, z_{1:0}\} &= \Pr\{r_1\}, \tag{89}\\
\Pr\{r_2|x_{0:1}, z_1\} &= \frac{\sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \Pr\{r_1\}}{\sum_{r_2} \sum_{r_1} \Pr\{r_2|r_1\}\, p(z_1|x_1, r_1) \Pr\{r_1\}}, \tag{90}\\
\Pr\{r_3|x_{1:2}, z_{1:2}\} &= \frac{\sum_{r_2} \Pr\{r_3|r_2\}\, p(z_2|x_2, r_2) \Pr\{r_2|x_1, z_1\}}{\sum_{r_3} \sum_{r_2} \Pr\{r_3|r_2\}\, p(z_2|x_2, r_2) \Pr\{r_2|x_1, z_1\}}. \tag{91}
\end{align}
We evaluate the initial probability $\Pr\{r_2|x_1, z_1\}$ from (85) and observe that it is equivalent to $\Pr\{r_2|x_{0:1}, z_1\}$ given in (87). Hence, $\Pr\{r_3|x_{0:2}, z_{1:2}\} = \Pr\{r_3|x_{1:2}, z_{1:2}\}$ must hold, and thus AJSD $= 0$ for $k = 1, 2, 3$. This result can be generalized to arbitrary depths $d$, meaning that for $k = 1, \ldots, d+1$ we have AJSD $= 0$. Note that the above result is a special case and only holds for Model 2. The reason is the process of integrating out $x_0$ according to $\int p(x_1|x_0)\, p(x_0)\, \mathrm{d}x_0$, which yields a result that is independent of the mode variable $r_1$.
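For reference, the per-time-step Jensen-Shannon divergence underlying the AJSD can be sketched as follows; averaging it over time steps and Monte Carlo runs gives the AJSD (the base-2 logarithm is an assumption here). Identical prediction probabilities, as in the special case above, give a divergence of exactly zero:

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete mode distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0                   # 0 * log 0 is treated as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With the base-2 logarithm the divergence is bounded by 1, attained for distributions with disjoint support.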
References
[1] C. Fritsche, U. Orguner, L. Svensson, and F. Gustafsson, “Recent results on Bayesian Cramér-Rao bounds for jump Markov systems,” in International conference on Information Fusion (FUSION), Heidelberg, Germany, Jul. 2016, submitted.
[2] A. Doucet, N. J. Gordon, and V. Krishnamurthy, “Particle filters for state estimation of jump Markov linear systems,” IEEE Trans. Signal Process., vol. 49, no. 3, pp. 613–624, 2001.
[3] E. Özkan, F. Lindsten, C. Fritsche, and F. Gustafsson, “Recursive maximum likelihood identification of jump Markov nonlinear systems,” IEEE Trans. Signal Process., vol. 63, no. 3, pp. 754–765, Mar. 2015.
[4] S. Saha and G. Hendeby, “Rao-Blackwellized particle filter for Markov modulated nonlinear dynamic systems,” in IEEE Workshop on Statistical Signal Processing (SSP), Gold Coast, Australia, Jun. 2014, pp. 272–275.
[5] C. Fritsche and F. Gustafsson, “The marginal Bayesian Cramér-Rao bound for jump Markov systems,” IEEE
ISSN: 1400-3902, Report no.: LiTH-ISY-R-3089