On the Cramer-Rao bound for Terrain-Aided Navigation
Niclas Bergman
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
www: http://www.control.isy.liu.se
email: niclas@isy.liu.se
LiTH-ISY-R-1970, August 26, 1997
Technical reports from the Automatic Control group in Linköping are available as UNIX-compressed PostScript files by anonymous ftp at the address 130.236.20.24 (ftp.control.isy.liu.se).
1 Introduction
The Cramer-Rao bound sets a lower limit on the error covariance matrix of any unbiased estimator. It is calculated from the Fisher information matrix, which depends on the joint probability density function (PDF) of the estimated parameters and the observations. Since the Cramer-Rao bound depends on the true value of the parameters, it is mainly used for analysis. Because it bounds the mean square error performance of any unbiased estimator from below, it can also be used to test how far from optimal a given estimator performs.
This paper is organized as follows. The next section reviews and derives the Cramer-Rao bound. The last section gives a brief background to the terrain-aided navigation (TAN) problem and derives the Cramer-Rao bound for TAN.
2 Theory
This section serves as a background to state estimation and reviews some relations involving the Fisher information matrix and the Cramer-Rao bound. We mainly follow the presentation and notation in [Sch91].
2.1 Maximum likelihood estimation
Let $p_z(x; \theta)$ denote the PDF of the stochastic vector $z$, parameterized by the parameter vector $\theta$. We are interested in estimating the unknown parameter $\theta$ from measurements, or samples, $\hat z$ of $z$. One natural estimate is the maximum likelihood (ML) estimate
$$\hat\theta = \arg\max_\theta \; p_z(\hat z; \theta).$$
This estimate has some interesting properties, such as being asymptotically normal and unbiased.
For ease of notation we make some definitions.

Definition 1
The likelihood function $l(\theta; \hat z)$ is
$$l(\theta; \hat z) \triangleq p_z(\hat z; \theta),$$
the log-likelihood function $L(\theta; \hat z)$ is
$$L(\theta; \hat z) \triangleq \ln p_z(\hat z; \theta),$$
and the score function $s(\theta)$ is
$$s(\theta) \triangleq \frac{\partial}{\partial \theta} L(\theta; \hat z).$$
If $s(\theta)$ is continuously differentiable, the ML estimate satisfies
$$s(\hat\theta) = 0. \qquad (1)$$
We will assume that this equation has only one solution.
Now we extend the idea of ML to the case of stochastic parameters $\theta$. Let $p_{z\theta}(x, \theta)$ denote the joint PDF of the stochastic variables $z$ and $\theta$. For an observed value $\hat z$ of $z$, the ML estimate of $\theta$ is
$$\hat\theta = \arg\max_\theta \; p_{z\theta}(\hat z, \theta).$$
Using
$$p_{z\theta}(x, \theta) = p_{z|\theta}(x | \theta)\, p_\theta(\theta) \qquad (2)$$
it is easy to incorporate the prior information, available before observing $\hat z$, into the ML estimate. Using (1) we see that the ML estimate must satisfy
$$s(\hat\theta) = \frac{\partial}{\partial\theta} \left[ \ln p_{z|\theta}(\hat z | \theta) + \ln p_\theta(\theta) \right] \Big|_{\theta = \hat\theta} = 0.$$
The conditional PDF $p_{z|\theta}(x|\theta)$ plays the same role in the stochastic-parameter case as $p_z(x;\theta)$ does in the non-stochastic case. Alternatively, the conditioning in (2) can be interchanged, yielding
$$p_{z\theta}(x, \theta) = p_{\theta|z}(\theta | x)\, p_z(x),$$
and the ML estimate thus also satisfies
$$s(\hat\theta) = \frac{\partial}{\partial\theta} \ln p_{\theta|z}(\theta | \hat z) \Big|_{\theta = \hat\theta} = 0.$$
The PDF $p_{\theta|z}(\theta|x)$ is the posterior PDF of $\theta$ given $z$; this explains why this ML estimate is often called the maximum a posteriori (MAP) estimate of $\theta$.
In the sequel of this section we work with non-stochastic parameters, but from the discussion above it follows that the stochastic case is handled by substituting $p_{z\theta}(x,\theta)$ for $p_z(x;\theta)$.
2.2 Fisher's information matrix
The first and second central moments are two fundamental properties of an estimator,
$$\mathrm{E}\,\hat\theta \qquad \text{and} \qquad \mathrm{E}\left[(\hat\theta - \mathrm{E}\,\hat\theta)(\hat\theta - \mathrm{E}\,\hat\theta)^T\right].$$
A more interesting measure of the performance of an estimator is the mean deviation from the estimated quantity.
Definition 2
The error covariance matrix $C$ of an estimator is
$$C \triangleq \mathrm{E}\left[(\theta - \hat\theta)(\theta - \hat\theta)^T\right].$$
Using the definition above, the error covariance can be written
$$C = \mathrm{E}\left[(\hat\theta - \mathrm{E}\,\hat\theta)(\hat\theta - \mathrm{E}\,\hat\theta)^T\right] + \left[\mathrm{E}(\theta - \hat\theta)\right]\left[\mathrm{E}(\theta - \hat\theta)\right]^T.$$
For an unbiased estimator the second term vanishes and the error covariance matrix equals the estimator covariance matrix,
$$\mathrm{E}\,\hat\theta = \theta \;\Rightarrow\; C = \mathrm{E}\left[(\hat\theta - \mathrm{E}\,\hat\theta)(\hat\theta - \mathrm{E}\,\hat\theta)^T\right].$$
Let $\|x\|^2$ denote the usual squared 2-norm of a vector, $\|x\|^2 = x^T x$. The diagonal elements of $C$ are the mean square errors of each scalar estimator $\hat\theta_n$; the sum of these is the mean square error of the total estimator,
$$\operatorname{tr} C = \mathrm{E}\left[\|\theta - \hat\theta\|^2\right].$$
The performance of an estimator can be evaluated using Monte Carlo analysis to find this mean square error, evaluating the estimator $M$ times for different noise realizations,
$$\frac{1}{M} \sum_{k=1}^{M} \left\|\theta - \hat\theta^{(k)}\right\|^2 \to \operatorname{tr} C \qquad \text{as } M \to \infty.$$
Above, $\hat\theta^{(k)}$ denotes the estimate at Monte Carlo run $k$. The Fisher information matrix will give us a lower bound on $C$ that can be compared with the Monte Carlo result when evaluating an estimator.
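As a concrete illustration of this Monte Carlo evaluation, the following sketch (not from the report; the toy estimation problem, sample size, and noise level are assumptions of this example) estimates tr C for the sample-mean estimator of a Gaussian mean and compares it with the exact value.

```python
import numpy as np

# Monte Carlo estimate of tr C for a toy problem: estimate the mean
# theta of z ~ N(theta, sigma^2 I) from N samples. The sample-mean
# estimator is unbiased, so its mean square error should approach
# tr C = n * sigma^2 / N as the number of runs M grows.

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])    # true parameter (n = 2)
sigma, N, M = 0.5, 10, 20000     # noise level, samples per run, MC runs

sq_err = 0.0
for _ in range(M):
    z = theta + sigma * rng.standard_normal((N, 2))  # one noise realization
    theta_hat = z.mean(axis=0)                       # estimate for this run
    sq_err += np.sum((theta - theta_hat) ** 2)

mse = sq_err / M          # Monte Carlo estimate of tr C
tr_C = 2 * sigma**2 / N   # exact tr C for the sample-mean estimator
print(mse, tr_C)          # the two values should be close
```

For this simple estimator the bound is attained, so the Monte Carlo sum converges to the exact trace of the error covariance.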
Replacing the realization by the stochastic variable in the score function, we get the stochastic score function
$$s(\theta) = \frac{\partial}{\partial\theta} L(\theta; z) = \frac{\partial}{\partial\theta} \ln p_z(z; \theta).$$
Theorem 1
$$\mathrm{E}\, s(\theta) = 0.$$
Proof:
$$\mathrm{E}\, s(\theta) = \mathrm{E}\, \frac{\partial}{\partial\theta} \ln p_z(z;\theta) = \int \left[\frac{\partial}{\partial\theta} \ln p_z(x;\theta)\right] p_z(x;\theta)\, dx = \int \frac{\partial}{\partial\theta} p_z(x;\theta)\, dx = \frac{\partial}{\partial\theta} \int p_z(x;\theta)\, dx = 0. \qquad \Box$$
Definition 3 (The Fisher information matrix)
The Fisher information matrix $J(\theta)$ is the covariance matrix of the score function,
$$J(\theta) \triangleq \mathrm{E}\left[s(\theta)\, s^T(\theta)\right] = \mathrm{E}\left[\left(\frac{\partial}{\partial\theta} \ln p_z(z;\theta)\right)\left(\frac{\partial}{\partial\theta} \ln p_z(z;\theta)\right)^T\right].$$
There is an alternative, and in some cases simpler, expression for the information matrix.
Theorem 2
$$J(\theta) = -\mathrm{E}\left[\frac{\partial}{\partial\theta}\left(\frac{\partial}{\partial\theta} \ln p_z(z;\theta)\right)^T\right].$$
Proof:
Since
$$\frac{\partial}{\partial\theta} \ln p_z(x;\theta) = \frac{1}{p_z(x;\theta)} \frac{\partial}{\partial\theta} p_z(x;\theta) \qquad (3)$$
the second gradient of $\ln p_z(x;\theta)$ is
$$\frac{\partial}{\partial\theta}\left(\frac{\partial}{\partial\theta} \ln p_z(x;\theta)\right)^T = \frac{p_z(x;\theta)\, \frac{\partial}{\partial\theta}\left(\frac{\partial}{\partial\theta} p_z(x;\theta)\right)^T - \left(\frac{\partial}{\partial\theta} p_z(x;\theta)\right)\left(\frac{\partial}{\partial\theta} p_z(x;\theta)\right)^T}{\left(p_z(x;\theta)\right)^2}$$
$$= \frac{\frac{\partial}{\partial\theta}\left(\frac{\partial}{\partial\theta} p_z(x;\theta)\right)^T}{p_z(x;\theta)} - \left(\frac{\partial}{\partial\theta} \ln p_z(x;\theta)\right)\left(\frac{\partial}{\partial\theta} \ln p_z(x;\theta)\right)^T.$$
Taking expectation on both sides yields the desired result since, using (3), we get
$$\mathrm{E}\left[\frac{\frac{\partial}{\partial\theta}\left(\frac{\partial}{\partial\theta} p_z(z;\theta)\right)^T}{p_z(z;\theta)}\right] = \int \frac{\partial}{\partial\theta}\left(\frac{\partial}{\partial\theta} p_z(x;\theta)\right)^T dx = \frac{\partial}{\partial\theta} \int p_z(x;\theta)\left(\frac{\partial}{\partial\theta} \ln p_z(x;\theta)\right)^T dx = \frac{\partial}{\partial\theta}\, \mathrm{E}\, s^T(\theta) = 0$$
by Theorem 1. $\Box$
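The two expressions for the information matrix can be checked numerically in a simple case. The sketch below (an illustration, not from the report) uses a scalar Gaussian $z \sim \mathcal{N}(\theta, \sigma^2)$, for which the score is $s(\theta) = (z - \theta)/\sigma^2$ and $J(\theta) = 1/\sigma^2$; the grid width and resolution are arbitrary choices.

```python
import numpy as np

# Numeric sanity check of Theorem 1, Definition 3 and Theorem 2 for a
# scalar Gaussian z ~ N(theta, sigma^2): the score is
# s(theta) = (z - theta)/sigma^2, so E s = 0 and
# J = E[s^2] = -E[ds/dtheta] = 1/sigma^2.

theta, sigma = 1.5, 0.7
x = np.linspace(theta - 8 * sigma, theta + 8 * sigma, 200001)
dx = x[1] - x[0]
p = np.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
s = (x - theta) / sigma**2                 # score as a function of the sample

mean_score = np.sum(s * p) * dx            # Theorem 1: should be ~0
J_outer = np.sum(s**2 * p) * dx            # Definition 3: E[s s^T]
J_hess = np.sum((1 / sigma**2) * p) * dx   # Theorem 2: -E[second gradient]
print(mean_score, J_outer, J_hess)         # ~0, ~1/sigma^2, ~1/sigma^2
```

Both Riemann sums agree with $1/\sigma^2$, illustrating that the outer-product and negative-Hessian forms of $J(\theta)$ coincide.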
2.3 The Cramer-Rao bound
For the derivation of the Cramer-Rao bound an intermediate result is needed.
Theorem 3
The cross covariance matrix between the score function and any unbiased estimator is the identity matrix, i.e., if $\mathrm{E}\,\hat\theta = \theta$ then
$$\mathrm{E}\left[s(\theta)(\hat\theta - \theta)^T\right] = I.$$
Proof:
The unbiasedness may be expressed as
$$\int \left(\hat\theta(x) - \theta\right)^T p_z(x;\theta)\, dx = 0.$$
Taking the gradient of this expression with respect to $\theta$ gives
$$\int \frac{\partial}{\partial\theta}\left[\left(\hat\theta(x) - \theta\right)^T p_z(x;\theta)\right] dx = 0$$
$$\int \left[\frac{\partial}{\partial\theta} p_z(x;\theta)\right] \left(\hat\theta(x) - \theta\right)^T dx - \int I\, p_z(x;\theta)\, dx = 0.$$
Using (3), we get
$$\int \left[\frac{\partial}{\partial\theta} \ln p_z(x;\theta)\right] \left(\hat\theta(x) - \theta\right)^T p_z(x;\theta)\, dx = I. \qquad \Box$$
The famous Cramer-Rao lower bound can now be proven.
Theorem 4 (The Cramer-Rao bound)
Let $z$ be an $n$-dimensional stochastic variable with PDF $p_z(x;\theta)$. For any unbiased estimator $\hat\theta(z)$, the error covariance matrix is bounded by
$$C = \mathrm{E}\left[(\theta - \hat\theta)(\theta - \hat\theta)^T\right] \geq J^{-1}(\theta)$$
if $J(\theta) > 0$.
Proof:
(From [Sch91].) Study the stochastic vector
$$\begin{bmatrix} \hat\theta - \theta \\ s(\theta) \end{bmatrix};$$
by Theorem 3 its covariance matrix is
$$Q = \mathrm{E} \begin{bmatrix} \hat\theta - \theta \\ s(\theta) \end{bmatrix} \begin{bmatrix} (\hat\theta - \theta)^T & s^T(\theta) \end{bmatrix} = \begin{bmatrix} C & I \\ I & J \end{bmatrix},$$
which can be diagonalized if $J > 0$:
$$\begin{bmatrix} I & -J^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} C & I \\ I & J \end{bmatrix} \begin{bmatrix} I & 0 \\ -J^{-1} & I \end{bmatrix} = \begin{bmatrix} C - J^{-1} & 0 \\ 0 & J \end{bmatrix}.$$
Since $Q$ is a covariance matrix it is positive semidefinite by definition. Further, $Q \geq 0 \Rightarrow A^T Q A \geq 0$, and we get
$$C \geq J^{-1}. \qquad \Box$$
3 Application - Terrain-Aided Navigation
Both civil and military aircraft navigation is based on inertial navigation systems (INS). These dead-reckoning systems measure the accelerations and attitude angle rates of the aircraft. The position and velocity of the aircraft are found by integrating these measurements from a given initial position and velocity. Due to the open-loop configuration of these dead-reckoning systems, their error drifts with time. Using terrain information is one among several ways to limit the errors in an INS.
The principle of terrain-aided navigation is as follows. A combination of the INS and a barometric pressure sensor gives the aircraft's absolute altitude over mean sea-level. With a vertical radar the aircraft measures the distance to the ground below it; the difference between the absolute altitude and this ground clearance measurement is an estimate of the terrain altitude over mean sea-level. By comparing measurements of these terrain altitude variations with a pre-stored map of the terrain, it is possible to estimate the aircraft position. For a more detailed background on the terrain-aided navigation problem, see [Ber96].
3.1 The estimation model
The estimation of the aircraft position in the map is a nonlinear problem. The INS gives estimates of the aircraft movement between measurements, and the map gives the nonlinear relation between the measurements and the position of the aircraft. Let $x_k$ denote the two-dimensional position of the aircraft over the map. Assuming no bias errors in the absolute aircraft altitude, the model of the estimation problem is
$$\begin{aligned} x_{k+1} &= x_k + u_k + v_k \\ y_k &= h(x_k) + e_k \end{aligned} \qquad k = 0, 1, \ldots \qquad (4)$$
The INS provides the estimate $u_k$ of heading and distance flown from the last measurement to the next; the process noise $v_k$ models the error in this estimate. The nonlinear function $h(\cdot)$ is the on-board map, stored as a mesh of terrain-height values at known positions. The terrain height over mean sea-level at any point inside the map can be found through interpolation of the neighboring values. The additive measurement error $e_k$ models errors in the vertical radar, errors in the map, and errors in the absolute aircraft altitude.
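The interpolation step can be sketched as follows. The report does not specify which interpolation scheme is used, so the bilinear scheme, the map array, and the grid spacing below are all assumptions of this illustration.

```python
import numpy as np

# Sketch of evaluating a stored terrain map h(.) at an arbitrary 2-D
# position by bilinear interpolation of the four neighboring grid values.

def terrain_height(pos, height_grid, cell_size):
    """Bilinearly interpolate the terrain height at a 2-D position.

    pos         : (x, y) position in the same units as cell_size
    height_grid : 2-D array of terrain heights at the grid nodes
    cell_size   : grid spacing (assumed equal along both axes)
    """
    gx, gy = pos[0] / cell_size, pos[1] / cell_size
    i, j = int(np.floor(gx)), int(np.floor(gy))   # lower-left grid node
    tx, ty = gx - i, gy - j                       # fractional offsets in [0,1)
    h = height_grid
    return ((1 - tx) * (1 - ty) * h[i, j] + tx * (1 - ty) * h[i + 1, j]
            + (1 - tx) * ty * h[i, j + 1] + tx * ty * h[i + 1, j + 1])

# Made-up example: a 3x3 height mesh with 100 m grid spacing.
grid = np.array([[10.0, 20.0, 30.0],
                 [15.0, 25.0, 35.0],
                 [20.0, 30.0, 40.0]])
print(terrain_height((50.0, 50.0), grid, 100.0))  # prints 17.5 (cell center)
```

At a cell center the result is simply the average of the four corner heights, and at a grid node the stored value is recovered exactly.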
As seen in (4), the problem is linear in the state update but nonlinear in the measurements. Further, the nonlinearity has no structure, since the map is just a set of points on a grid. Thus the estimation problem is non-trivial, and any algorithm solving it must be approximate. For this reason there is a need to know just how well an algorithm can perform over a certain map. The Cramer-Rao bound gives a lower bound on the mean square error performance of any unbiased state estimation algorithm. Comparing the Cramer-Rao bound with Monte Carlo simulations will give an indication of how "approximate" the algorithm is.
Let $\mathcal{N}(m, P)$ denote the $n$-dimensional Gaussian distribution with mean vector $m$ and covariance matrix $P$,
$$\mathcal{N}(m, P) = \frac{1}{\sqrt{(2\pi)^n |P|}} \exp\left(-\frac{1}{2}(x - m)^T P^{-1}(x - m)\right).$$
The noises in the model (4) are assumed to be white, independent of each other, and distributed as
$$v_k \sim p_{v_k}(x) = \mathcal{N}(0, Q_k), \qquad e_k \sim p_{e_k}(x) = \mathcal{N}(0, R_k).$$
The prior distribution of the position is described by
$$x_0 \sim p_{x_0}(x) = \mathcal{N}(\bar x_0, P_0)$$
and is assumed to be independent of both measurement and process noise. Note that the model is two-dimensional with scalar measurements, but we will keep the matrix notation so that the results below are valid for general dimensions of both $x_k$ and $y_k$. We will assume that both $Q_k$ and $R_k$ are strictly positive definite so that their inverses always exist.
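A simulation of the model (4) under these assumptions can be sketched as follows. A smooth analytic terrain function stands in for the stored map $h(\cdot)$, and all numerical values (noise levels, trajectory, terrain shape) are made-up choices of this illustration.

```python
import numpy as np

# Sketch of simulating the TAN model (4):
#   x_{k+1} = x_k + u_k + v_k,   y_k = h(x_k) + e_k
# with white Gaussian v_k ~ N(0, Q) and e_k ~ N(0, R).

rng = np.random.default_rng(1)

def h(x):  # stand-in terrain height [m] at 2-D position x
    return 100.0 + 20.0 * np.sin(x[0] / 500.0) + 10.0 * np.cos(x[1] / 300.0)

Q = np.diag([25.0, 25.0])       # process noise covariance (INS drift)
R = 4.0                         # measurement noise variance (radar, map, ...)
x = np.array([1000.0, 2000.0])  # true initial position
u = np.array([50.0, 10.0])      # INS-reported movement per sample

xs, ys = [], []
for k in range(100):
    ys.append(h(x) + np.sqrt(R) * rng.standard_normal())   # y_k
    x = x + u + rng.multivariate_normal(np.zeros(2), Q)    # x_{k+1}
    xs.append(x.copy())

print(len(xs), len(ys))  # 100 states and 100 measurements generated
```

Runs like this one provide the noise realizations needed for the Monte Carlo evaluation discussed in Section 2.2.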
3.2 The Cramer-Rao bound of the TAN problem
We will consider the one-step prediction problem, finding a lower bound on
$$\mathrm{E}\left[(x_k - \hat x_{k|k-1})(x_k - \hat x_{k|k-1})^T\right];$$
the filter result follows analogously.
The key to the Cramer-Rao bound is the likelihood. Due to the nonlinear measurement equation in (4), the likelihood of all measurements gathered until time $k-1$, given that the actual position at time $k$ is $x_k$, is hard to find. Even though the state update is linear, and thus the states are Gaussian distributed, the unstructured nonlinearity in the measurement equation makes the distribution of the measurements impossible to find. The trick here is to consider the whole sequence of states from the initial time $k = 0$ until the present time and derive the bound for the estimation of this whole sequence. The bound on the one-step predictor performance can then be found from this result.
Denote the complete state history by $X_k = [x_0^T\; x_1^T\; \cdots\; x_k^T]^T$ and similarly $Y_k$ for all measurements taken until time $k$. Using (2) and the whiteness of $v_k$ repeatedly, the prior for $X_k$ is
$$p_{X_k}(X_k) = p_{x_k|X_{k-1}}(x_k | X_{k-1}) \cdots p_{x_1|x_0}(x_1 | x_0)\, p_{x_0}(x_0) = p_{x_0}(x_0) \prod_{i=1}^{k} p_{v_{i-1}}(x_i - x_{i-1} - u_{i-1}). \qquad (5)$$
Likewise, with (2) and the model (4), the likelihood is
$$p_{Y_{k-1}|X_k}(Y_{k-1} | X_k) = p_{y_{k-1}|Y_{k-2}, X_k}(y_{k-1} | Y_{k-2}, X_k) \cdots p_{y_0|X_k}(y_0 | X_k) = \prod_{i=0}^{k-1} p_{e_i}(y_i - h(x_i)) \qquad (6)$$
since $e_k$ is white.
The stochastic score function for the estimation of the complete state history $X_k$ from all measurements up to and including time $k-1$ is
$$s(X_k; Y_{k-1}) = \frac{\partial}{\partial X_k} \ln p_{Y_{k-1}|X_k}(Y_{k-1} | X_k) + \frac{\partial}{\partial X_k} \ln p_{X_k}(X_k). \qquad (7)$$
Denoting the weighted quadratic norm by $\|x\|^2_Q = x^T Q x$ and inserting (5) and (6) in (7), we have
$$s(X_k; Y_{k-1}) = -\frac{\partial}{\partial X_k} \left[ \frac{1}{2} \|x_0 - \bar x_0\|^2_{P_0^{-1}} + \sum_{i=0}^{k-1} \frac{1}{2} \|y_i - h(x_i)\|^2_{R_i^{-1}} + \sum_{i=1}^{k} \frac{1}{2} \|x_i - x_{i-1} - u_{i-1}\|^2_{Q_{i-1}^{-1}} \right].$$
In Appendix B some common formulas for vector gradients are summarized.
Using these we get
$$s(X_k; Y_{k-1}) = \begin{bmatrix} D_0 - P_0^{-1}(x_0 - \bar x_0) \\ D_1 - Q_0^{-1}(x_1 - x_0 - u_0) \\ \vdots \\ D_{k-1} - Q_{k-2}^{-1}(x_{k-1} - x_{k-2} - u_{k-2}) \\ -Q_{k-1}^{-1}(x_k - x_{k-1} - u_{k-1}) \end{bmatrix}$$
where
$$D_i = \frac{\partial h^T(x_i)}{\partial x_i} R_i^{-1}\left(y_i - h(x_i)\right) + Q_i^{-1}(x_{i+1} - x_i - u_i).$$
Taking the gradient of the stochastic score function above yields
$$\frac{\partial}{\partial X_k} s^T(X_k; Y_{k-1}) = \begin{bmatrix} D_0' - P_0^{-1} & Q_0^{-1} & & & \\ Q_0^{-1} & D_1' - Q_0^{-1} & Q_1^{-1} & & \\ & \ddots & \ddots & \ddots & \\ & & Q_{k-2}^{-1} & D_{k-1}' - Q_{k-2}^{-1} & Q_{k-1}^{-1} \\ & & & Q_{k-1}^{-1} & -Q_{k-1}^{-1} \end{bmatrix} \qquad (8)$$
where
$$D_i' = \frac{\partial}{\partial x_i} \left[ \frac{\partial h^T(x_i)}{\partial x_i} R_i^{-1}\left(y_i - h(x_i)\right) \right] - \frac{\partial h^T(x_i)}{\partial x_i} R_i^{-1} \left(\frac{\partial h^T(x_i)}{\partial x_i}\right)^T - Q_i^{-1}.$$
Theorem 2 says that
$$J(X_k) = -\mathrm{E}\left[\frac{\partial}{\partial X_k} s^T(X_k; Y_{k-1})\right],$$
i.e., taking the negative expectation of (8) yields the Fisher information matrix for the complete state history. Since the $y_i$ are the only stochastic entities in (8) and $\mathrm{E}\, y_i = h(x_i)$, we get
$$J(X_k) = \begin{bmatrix} S_0 + P_0^{-1} & -Q_0^{-1} & & & \\ -Q_0^{-1} & S_1 + Q_0^{-1} & -Q_1^{-1} & & \\ & \ddots & \ddots & \ddots & \\ & & -Q_{k-2}^{-1} & S_{k-1} + Q_{k-2}^{-1} & -Q_{k-1}^{-1} \\ & & & -Q_{k-1}^{-1} & Q_{k-1}^{-1} \end{bmatrix} \qquad (9)$$
where
$$S_i = H_i R_i^{-1} H_i^T + Q_i^{-1} \qquad \text{and} \qquad H_i = \frac{\partial h^T(x_i)}{\partial x_i}.$$
Note that the information matrix (9) will depend on the actual state values $X_k$. The information matrix can be rewritten compactly as a recursion in $k$. Introduce the abbreviated notation $J_k = J(X_k)$ and the partitioning
$$J_k = \begin{bmatrix} J_k^{11} & J_k^{12} \\ (J_k^{12})^T & J_k^{22} \end{bmatrix} = \begin{bmatrix} J_{k-1}^{11} & J_{k-1}^{12} & 0 \\ (J_{k-1}^{12})^T & J_{k-1}^{22} + S_{k-1} & -Q_{k-1}^{-1} \\ 0 & -Q_{k-1}^{-1} & Q_{k-1}^{-1} \end{bmatrix} \qquad (10)$$
starting the recursion with
$$J_0 = J_0^{22} = P_0^{-1}.$$
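The growth of the information matrix described by (10) is easy to implement: each step adds $S_{k-1}$ to the current lower-right block and appends one block row and column. The sketch below (an illustration; the dimensions and the random positive definite stand-ins for $P_0$, $Q_i$ and $S_i$ are made-up) builds $J_k$ this way and checks its expected properties.

```python
import numpy as np

# Sketch of the information-matrix recursion (10): J_k is obtained from
# J_{k-1} by adding S_{k-1} to the lower-right block and appending one
# block row/column with -Q_{k-1}^{-1} off-diagonal and Q_{k-1}^{-1} corner.

n = 2
rng = np.random.default_rng(2)

def rand_spd(n):  # random symmetric positive definite stand-in matrix
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

k_max = 4
P0 = rand_spd(n)
Qs = [rand_spd(n) for _ in range(k_max)]
Ss = [rand_spd(n) for _ in range(k_max)]  # stand-ins for H_i R_i^-1 H_i^T + Q_i^-1

J = np.linalg.inv(P0)                     # J_0 = P_0^{-1}
for k in range(1, k_max + 1):
    Qinv = np.linalg.inv(Qs[k - 1])
    m = J.shape[0]
    J_new = np.zeros((m + n, m + n))
    J_new[:m, :m] = J
    J_new[m - n:m, m - n:m] += Ss[k - 1]  # add S_{k-1} to lower-right block
    J_new[m:, m - n:m] = -Qinv            # new off-diagonal blocks
    J_new[m - n:m, m:] = -Qinv
    J_new[m:, m:] = Qinv                  # new corner block Q_{k-1}^{-1}
    J = J_new

# J should be symmetric and, with positive definite building blocks,
# positive definite, matching the block-tridiagonal structure of (9).
print(J.shape, np.allclose(J, J.T), bool(np.all(np.linalg.eigvalsh(J) > 0)))
```

The final matrix has the block-tridiagonal shape of (9), with $(k+1)$ block rows for states $x_0, \ldots, x_k$.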
Alternatively, the information matrix for the filter estimate of $X_k$ given $Y_k$ can be derived. It is straightforward to verify that, with obvious notation, the information matrix obeys the following two-step (measurement update, time update) recursion:
$$J_{k|k} = \begin{bmatrix} J_{k|k-1}^{11} & J_{k|k-1}^{12} \\ (J_{k|k-1}^{12})^T & J_{k|k-1}^{22} + H_k R_k^{-1} H_k^T \end{bmatrix} \qquad (11)$$
$$J_{k+1|k} = \begin{bmatrix} J_{k|k}^{11} & J_{k|k}^{12} & 0 \\ (J_{k|k}^{12})^T & J_{k|k}^{22} + Q_k^{-1} & -Q_k^{-1} \\ 0 & -Q_k^{-1} & Q_k^{-1} \end{bmatrix}. \qquad (12)$$
The Cramer-Rao bound, Theorem 4, says that
$$C_{k+1|k} = \mathrm{E}\left[(X_{k+1} - \hat X_{k+1|k})(X_{k+1} - \hat X_{k+1|k})^T\right] \geq J_{k+1|k}^{-1}.$$
Using the corollary to the block matrix inversion lemma in Appendix A, Corollary 1, we have
$$J_{k+1|k}^{-1} = M_{k+1|k} = \begin{bmatrix} M_{k|k}^{11} & M_{k|k}^{12} & M_{k|k}^{12} \\ (M_{k|k}^{12})^T & M_{k|k}^{22} & M_{k|k}^{22} \\ (M_{k|k}^{12})^T & M_{k|k}^{22} & M_{k|k}^{22} + Q_k \end{bmatrix}$$
where $M_{k|k} = J_{k|k}^{-1}$ is partitioned conformably with (11).
As mentioned above, we assume that $Q_k > 0$. Making the same partitioning of the error covariance,
$$C_{k+1|k} = \mathrm{E}\begin{bmatrix} \tilde X_{k|k} \tilde X_{k|k}^T & \tilde X_{k|k} \tilde x_{k+1|k}^T \\ \tilde x_{k+1|k} \tilde X_{k|k}^T & \tilde x_{k+1|k} \tilde x_{k+1|k}^T \end{bmatrix} = \begin{bmatrix} C_{k|k} & \star \\ \star & C_{k+1|k} \end{bmatrix}$$
where $\tilde X_{k|k} = X_k - \hat X_{k|k}$ and $\tilde x_{k+1|k} = x_{k+1} - \hat x_{k+1|k}$. Summarizing the relations above, the Cramer-Rao bound says that
$$\begin{bmatrix} C_{k|k} & \star \\ \star & C_{k+1|k} \end{bmatrix} \geq \begin{bmatrix} M_{k|k} & \star \\ \star & M_{k|k}^{22} + Q_k \end{bmatrix}.$$
Thus, introducing the notation $P_{k+1}$ for the Cramer-Rao bound on the one-step-ahead predictor error covariance, we have
$$C_{k+1|k} \geq M_{k|k}^{22} + Q_k = P_{k+1}.$$
From (11), (12) and Lemma 1 we get
$$P_{k+1}^{-1} = J_{k+1|k}^{22} - (J_{k+1|k}^{12})^T (J_{k+1|k}^{11})^{-1} J_{k+1|k}^{12} = Q_k^{-1} - \begin{bmatrix} 0 & -Q_k^{-1} \end{bmatrix} (J_{k+1|k}^{11})^{-1} \begin{bmatrix} 0 \\ -Q_k^{-1} \end{bmatrix}$$
$$= Q_k^{-1} - Q_k^{-1} \Big( Q_k^{-1} + H_k R_k^{-1} H_k^T + \underbrace{J_{k|k-1}^{22} - (J_{k|k-1}^{12})^T (J_{k|k-1}^{11})^{-1} J_{k|k-1}^{12}}_{P_k^{-1}} \Big)^{-1} Q_k^{-1}.$$
That is, the Cramer-Rao bound for the complete state sequence says that the one-step-ahead predictor error covariance is bounded by
$$C_{k+1|k} \geq P_{k+1} = \left[ Q_k^{-1} - Q_k^{-1} \left( H_k R_k^{-1} H_k^T + Q_k^{-1} + P_k^{-1} \right)^{-1} Q_k^{-1} \right]^{-1}.$$
Using the matrix inversion lemma, Lemma 2 in Appendix A, twice on the expression above, a familiar recursion for $P_k$ is found:
$$P_{k+1} = \left[ Q_k^{-1} - Q_k^{-1} \left( H_k R_k^{-1} H_k^T + Q_k^{-1} + P_k^{-1} \right)^{-1} Q_k^{-1} \right]^{-1} = Q_k + \left( P_k^{-1} + H_k R_k^{-1} H_k^T \right)^{-1} = P_k - P_k H_k \left( H_k^T P_k H_k + R_k \right)^{-1} H_k^T P_k + Q_k.$$
The last expression coincides with the recursion for the error covariance in the discrete-time Kalman filter; see, e.g., [AM79]. Thus, the expression above is the Riccati recursion for the Kalman filter state error covariance of the model (4), with the nonlinear function $h(\cdot)$ replaced by its gradient evaluated at the true state value at time $k$. This result is rather reassuring since, if the nonlinearity in fact is linear, we can never outperform the Kalman filter, which is known to be the optimal state estimator for the linear model.
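The chain of results above can be checked numerically: the lower-right block of $J_{k+1|k}^{-1}$ built with the updates (11) and (12) should coincide with the Riccati recursion for $P_k$. The sketch below (an illustration; $H_k$, $Q_k$, $R_k$ and $P_0$ are made-up values, with a fixed linearization standing in for the terrain gradient) performs this comparison.

```python
import numpy as np

# Compare two routes to the one-step-ahead Cramer-Rao bound P_k:
#  (a) the Riccati recursion
#      P_{k+1} = P_k - P_k H_k (H_k^T P_k H_k + R_k)^{-1} H_k^T P_k + Q_k
#  (b) the lower-right block of the inverse of the predictor information
#      matrix J_{k+1|k} built with updates (11) and (12).

n, steps = 2, 5
rng = np.random.default_rng(3)
P0 = np.diag([100.0, 100.0])
Q = np.diag([4.0, 4.0])
R = np.array([[1.0]])
Hs = [rng.standard_normal((n, 1)) for _ in range(steps)]  # terrain gradients

# (a) Riccati recursion for the bound.
P = P0.copy()
for H in Hs:
    P = P - P @ H @ np.linalg.inv(H.T @ P @ H + R) @ H.T @ P + Q

# (b) Information matrix of the full state history.
J = np.linalg.inv(P0)        # J_{0|-1} = P_0^{-1}
Qinv = np.linalg.inv(Q)
for H in Hs:
    m = J.shape[0]
    J[m - n:, m - n:] += H @ np.linalg.inv(R) @ H.T   # (11) measurement update
    J_new = np.zeros((m + n, m + n))                  # (12) time update
    J_new[:m, :m] = J
    J_new[m - n:m, m - n:m] += Qinv
    J_new[m:, m - n:m] = -Qinv
    J_new[m - n:m, m:] = -Qinv
    J_new[m:, m:] = Qinv
    J = J_new

P_from_J = np.linalg.inv(J)[-n:, -n:]  # lower-right block of M_{k+1|k}
print(np.allclose(P, P_from_J))        # the two routes should agree
```

Agreement between the two routes mirrors the derivation in the text, where the Schur complement of $J_{k+1|k}$ reduces to the Riccati recursion.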
4 Conclusion
To summarize the result of this text, any unbiased estimator of the states in (4) has an error covariance that, at time $k$, is bounded by the relation
$$\mathrm{E}\left[(x_k - \hat x_{k|k-1})(x_k - \hat x_{k|k-1})^T\right] \geq P_k$$
where
$$P_{k+1} = P_k - P_k H_k \left( H_k^T P_k H_k + R_k \right)^{-1} H_k^T P_k + Q_k$$
and the recursion is initialized with the initial state covariance $P_0$. Further, $H_k$ is the gradient of $h(\cdot)$, evaluated at the true state value at time $k$,
$$H_k = \frac{\partial h^T(x_k)}{\partial x_k}.$$
We refer to [Ber97] for some simulation analysis using the Cramer-Rao bound for TAN.
A Some Matrix Relations
In this appendix some useful matrix identities are summarized; the relations are taken from [Kai80]. The inverse of a block matrix is described in the following lemma.
Lemma 1 (Block matrix inversion)
If $A^{-1}$ exists,
$$\begin{bmatrix} A & D \\ C & B \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1} D \Delta^{-1} C A^{-1} & -A^{-1} D \Delta^{-1} \\ -\Delta^{-1} C A^{-1} & \Delta^{-1} \end{bmatrix}$$
where $\Delta = B - C A^{-1} D$.
Proof:
Multiplying the original block matrix with the right-hand-side expression yields an identity block matrix. $\Box$
Note that the matrix $\Delta$ is known as the Schur complement of $A$ in $\begin{bmatrix} A & D \\ C & B \end{bmatrix}$.
Corollary 1
If $B^{-1}$ exists,
$$\begin{bmatrix} A & D \\ C & B \end{bmatrix}^{-1} = \begin{bmatrix} \Delta^{-1} & -\Delta^{-1} D B^{-1} \\ -B^{-1} C \Delta^{-1} & B^{-1} + B^{-1} C \Delta^{-1} D B^{-1} \end{bmatrix}$$
where $\Delta = A - D B^{-1} C$.
Proof:
Apply the permutation matrix
$$P = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix} = P^{-1}$$
on both sides of the expression in the lemma above. $\Box$
A very useful identity in systems theory is a relation for the inverse of a sum of matrices, known as the matrix inversion lemma.
Lemma 2 (The matrix inversion lemma)
If $A^{-1}$ and $C^{-1}$ exist,
$$(A + BCD)^{-1} = A^{-1} - A^{-1} B \left( D A^{-1} B + C^{-1} \right)^{-1} D A^{-1}.$$
Proof:
Multiplying the right-hand side with $A + BCD$ yields an identity matrix. $\Box$
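The matrix inversion lemma is easy to spot-check numerically. The sketch below (an illustration; the matrix sizes and the well-conditioned random matrices are arbitrary choices) verifies the identity on a random instance.

```python
import numpy as np

# Numeric spot check of the matrix inversion lemma:
# (A + B C D)^{-1} = A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1}

rng = np.random.default_rng(4)
n, m = 4, 2
# Shifted random matrices keep A, C and A + BCD comfortably invertible.
A = 5 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
C = 5 * np.eye(m) + 0.3 * rng.standard_normal((m, m))
B = 0.5 * rng.standard_normal((n, m))
D = 0.5 * rng.standard_normal((m, n))

Ainv, Cinv = np.linalg.inv(A), np.linalg.inv(C)
lhs = np.linalg.inv(A + B @ C @ D)
rhs = Ainv - Ainv @ B @ np.linalg.inv(D @ Ainv @ B + Cinv) @ D @ Ainv
print(np.allclose(lhs, rhs))  # the identity should hold to rounding error
```

This is exactly the form used twice in Section 3.2 to turn the inverse-covariance expression for $P_{k+1}$ into the Riccati recursion.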
B Vector Gradients
Here we summarize some definitions and relations for vector gradients; more details can be found in the appendix of Chapter 6 in [Sch91].
Let $a(x)$ denote a scalar function of the $p \times 1$ vector $x$. The gradient of $a(\cdot)$ with respect to $x$ is the $p \times 1$ vector
$$\frac{\partial a(x)}{\partial x} = \begin{bmatrix} \frac{\partial a(x)}{\partial x_1} \\ \frac{\partial a(x)}{\partial x_2} \\ \vdots \\ \frac{\partial a(x)}{\partial x_p} \end{bmatrix}.$$
The generalization to vector-valued functions is straightforward: let $a^T(x)$ be a $1 \times n$ row vector; then
$$\frac{\partial a^T(x)}{\partial x} = \begin{bmatrix} \frac{\partial a_1(x)}{\partial x_1} & \cdots & \frac{\partial a_n(x)}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial a_1(x)}{\partial x_p} & \cdots & \frac{\partial a_n(x)}{\partial x_p} \end{bmatrix}.$$