On $H_2$ and $H_\infty$ Optimal Estimation

Urban Forssell
Department of Electrical Engineering, Linköping University, S-581 83 Linköping. Email: ufo@isy.liu.se

July 3, 1996
Abstract

We review some existing results on $H_2$ and $H_\infty$ estimation and explore possible connections between the optimal algorithms. For instance, in order to relate the $H_2$ optimal Kalman filter to the $H_\infty$ filters, we show that, with special choices of the covariance matrices, the Kalman filter is $H_\infty$ optimal. Moreover, by studying the matrix operator relating the estimation errors and the disturbances, we obtain simple and useful interpretations of both the $H_2$ and the $H_\infty$ results. Finally, an $H_\infty$ error bound for the RLS algorithm is derived.
1 Introduction
Assume you have a state-space model of a system and you want to estimate the states given measurements of the output. A standard approach to this problem is to minimize some quadratic criterion involving the estimation errors. This least-squares approach is attractive from many points of view; one is that it frequently enables the use of extremely efficient methods for finding the optimizing estimate.
The LMS algorithm [1, 2], for instance, was conceived as an approximate solution to the following problem: given a sequence $\{\varphi_i\}$ of $n \times 1$ input vectors and a corresponding sequence of desired outputs $\{y_i\}$, find the estimate of the $n \times 1$ parameter vector $\theta$ that minimizes the squared error
$$\sum_{i=0}^{N} |y_i - \varphi_i^T \theta|^2.$$
In the solution the estimate is recursively updated in the direction of the instantaneous gradient of the squared error. LMS is a very simple recursive algorithm and it is considered very robust. However, since LMS only provides an approximate solution to the least-squares problem (the exact solution can be computed using the RLS algorithm [3, 4]), it is interesting to note that, in [5], it is shown that LMS actually gives an exact solution of another problem, namely a certain minimax problem. The standard name for this kind of problem in the literature today is $H_\infty$ problems. The aim in $H_\infty$ estimation is to minimize the maximal energy gain from the disturbances to the estimation errors. The $H_\infty$ criterion can thus be understood as a worst-case criterion: the estimator will be robust against the worst possible disturbances. This is a completely different, and not very well known, approach to the estimation problem compared to the least-squares, or $H_2$, approaches that are the standard tools today.
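To make the instantaneous-gradient update concrete, here is a minimal LMS sketch in Python (the step size and the zero initial estimate are illustrative choices, not taken from [1, 2]):

```python
import numpy as np

def lms(Phi, y, mu=0.1):
    """Least Mean Squares: step along the instantaneous gradient of |y_i - phi_i^T theta|^2.

    Phi : (N, n) array whose rows are the input vectors phi_i
    y   : (N,)  array of desired outputs
    mu  : step size (assumed small enough for stability)
    """
    N, n = Phi.shape
    theta = np.zeros(n)                   # initial estimate, theta_0 = 0
    for i in range(N):
        e = y[i] - Phi[i] @ theta         # a priori prediction error
        theta = theta + mu * Phi[i] * e   # instantaneous-gradient (LMS) update
    return theta
```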
In this contribution we will therefore review some existing results on both $H_2$ and $H_\infty$ estimation and also illustrate various connections between the optimal algorithms.
Returning to LMS, we may also note that in [5] it is shown that LMS is not only $H_\infty$ optimal but that it is in fact the central $H_\infty$ filter, implying that LMS also minimizes a risk-sensitive criterion under certain assumptions and that it is the minimum entropy filter in the case of steady-state LTI filtering [6]. Furthermore, the version of LMS called Normalized LMS is shown to be the central $H_\infty$ a posteriori filter, as opposed to LMS which, more correctly, is the central $H_\infty$ a priori filter (the vocabulary will be explained below).
In Sections 2 and 3 we will, for completeness and ease of reference, state the solutions to the $H_2$ optimal and the $H_\infty$ optimal state estimation problems, respectively. The material in these sections is well known to most readers and much discussed in the literature. This is especially true for Section 2, which therefore will be very brief. Section 3 contains perhaps less familiar results; here we focus on the $H_\infty$ estimation problem and we will give a thorough statement of both the $H_\infty$ criterion and the optimizing solution. Then, in Section 4, we will narrow the scope a bit and consider the problem of tracking a time-varying system. We will then assume that the parameters are time varying according to a random walk model and that the output can be described by a linear regression. Within this framework we will discuss various aspects of the two approaches in order to link them together. As we will see, the solutions are in some respects closely related, while in others they are not. Finally, in Section 5 we derive an $H_\infty$ error bound for RLS.
2 $H_2$ Optimal Estimation
In this section we present two versions of the celebrated Kalman filter, which is known to be the best linear estimator in the least-squares ($H_2$) sense. The Kalman filter is very well known and much discussed in the literature (see e.g. [4, 7, 8]). We will therefore keep the presentation very brief and mainly use this section to introduce some notation.

Since we will mainly be interested in the predicted, or a priori, estimates, we first state the following result (cf. [7]).
Theorem 1 (The Kalman Filter Equations for Predicted Estimates)
Consider the state-space equations
$$\begin{cases} x_{i+1} = F_i x_i + G_i w_i \\ y_i = H_i x_i + v_i \end{cases} \qquad i \geq 0 \qquad (1)$$
with $\{w_i, v_i, x_0\}$ zero-mean random variables such that
$$E \begin{bmatrix} w_i \\ v_i \\ x_0 \end{bmatrix} \begin{bmatrix} w_j \\ v_j \\ x_0 \end{bmatrix}^T = \begin{bmatrix} Q_i \delta_{ij} & 0 & 0 \\ 0 & R_i \delta_{ij} & 0 \\ 0 & 0 & \Pi_0 \end{bmatrix} \qquad (2)$$
and where the matrices $\{F_i, G_i, H_i, Q_i, R_i, \Pi_0\}$ are assumed known. The one-step predicted state estimate of $x_i$ given $\{y_0, \ldots, y_{i-1}\}$,
$$\hat{x}_i \triangleq \hat{x}_{i|i-1}, \qquad (3)$$
can be recursively computed via the equations
$$\hat{x}_{i+1} = F_i \hat{x}_i + K_{p,i}(y_i - H_i \hat{x}_i), \qquad i \geq 0, \qquad \hat{x}_0 = 0 \qquad (4)$$
where the Kalman gain $K_{p,i}$ is given by
$$K_{p,i} = F_i P_i H_i^T R_{e,i}^{-1} \qquad \text{with} \qquad R_{e,i} = H_i P_i H_i^T + R_i \qquad (5)$$
and where $P_i$ obeys the discrete-time Riccati recursion (DRE)
$$P_{i+1} = F_i P_i F_i^T + G_i Q_i G_i^T - K_{p,i} R_{e,i} K_{p,i}^T, \qquad i \geq 0, \qquad P_0 = \Pi_0. \qquad (6)$$
Furthermore, $P_i$ is the covariance matrix of the instantaneous error in the predicted state estimate:
$$P_i \triangleq E\, \tilde{x}_i \tilde{x}_i^T, \qquad \tilde{x}_i \triangleq x_i - \hat{x}_i. \qquad (7)$$

Instead of computing the estimate of $x_i$ given $\{y_0, \ldots, y_{i-1}\}$ one may want to use measurements up to, and including, time $i$. The Kalman filter is still the best linear estimator, but the filter equations will now involve the filtered quantities $\hat{x}_{i|i}$, i.e. the estimate of $x_i$ given $\{y_0, \ldots, y_i\}$. To formalize the discussion, we state the following corollary to the previous theorem.
Corollary 1 (The Kalman Filter Equations for Filtered Estimates)
When the assumptions in Theorem 1 hold, the filtered state estimates of $x_i$ given $\{y_0, \ldots, y_i\}$ can be computed via the recursion
$$\hat{x}_{i|i} = F_{i-1} \hat{x}_{i-1|i-1} + P_i H_i^T (H_i P_i H_i^T + R_i)^{-1} (y_i - H_i F_{i-1} \hat{x}_{i-1|i-1}) \qquad (8)$$
where $P_i$ obeys the same DRE as in Theorem 1.
The proof consists in the observation that the predicted and filtered state estimates are related through (cf. [7])
$$\hat{x}_{i+1} = F_i \hat{x}_{i|i}. \qquad (9)$$
We may also introduce the filtered Kalman gain
$$K_{f,i} \triangleq P_i H_i^T R_{e,i}^{-1} \qquad (10)$$
and note the following simple relation between the two Kalman gains:
$$K_{p,i} = F_i K_{f,i}. \qquad (11)$$
We make one last remark on Kalman filtering before turning to the $H_\infty$ filters, and that is on how to estimate a different process than the state sequence. Suppose you want to estimate $\{z_i\}$ and that $z_i$ and the states $x_i$ are related through
$$z_i = L_i x_i. \qquad (12)$$
The best estimate of $z_i$ is then given by
$$\hat{z}_i = L_i \hat{x}_i \qquad (13)$$
where $\hat{x}_i$ are the state estimates outputted by the Kalman filter.
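The recursions (4)-(6) translate directly into code. A minimal Python sketch is given below, assuming a time-invariant model with known matrices F, G, H, Q, R and initial covariance Pi0 (the paper itself gives no implementation); it returns the predicted state estimates:

```python
import numpy as np

def kalman_predicted(F, G, H, Q, R, Pi0, ys):
    """One-step-ahead Kalman filter, equations (4)-(6), for a time-invariant model."""
    n = F.shape[0]
    x_hat = np.zeros(n)                  # \hat{x}_0 = 0
    P = Pi0.copy()                       # P_0 = Pi_0
    estimates = []
    for y in ys:
        Re = H @ P @ H.T + R                             # innovation covariance, eq. (5)
        Kp = F @ P @ H.T @ np.linalg.inv(Re)             # Kalman gain, eq. (5)
        estimates.append(x_hat.copy())                   # \hat{x}_i = \hat{x}_{i|i-1}
        x_hat = F @ x_hat + Kp @ (y - H @ x_hat)         # state update, eq. (4)
        P = F @ P @ F.T + G @ Q @ G.T - Kp @ Re @ Kp.T   # Riccati recursion, eq. (6)
    return np.array(estimates)
```

An estimate of $z_i = L_i x_i$ as in (13) is then obtained simply by multiplying each returned estimate by $L_i$.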
3 $H_\infty$ Optimal Estimation
The $H_\infty$ filters, to be presented in this section, are interesting alternatives to the famed Kalman filter in most estimation problems. As we shall see, the filter equations are very similar even though the underlying ideas are completely different.

The optimality of the Kalman filter relies on knowledge of the covariance matrices $Q_i$ and $R_i$. In most real-world applications this kind of a priori information is not available and one has to use, more or less, ad hoc choices of $Q_i$ and $R_i$. Is the resulting filter guaranteed to achieve a certain level of performance? The answer is no, although the effects of different choices of $Q_i$ and $R_i$ are well understood and frequently utilized.
The $H_\infty$ filters, on the other hand, give hard upper bounds on the estimation errors, no matter what the disturbances are (as long as they are of finite energy). We will now formulate the $H_\infty$ problem and then present two $H_\infty$ optimal filters. We will not give much background material; instead the reader is referred to the papers [6, 9, 10, 11, 12, 13, 14, 15, 16] and the references therein. One may also want to consult some textbook on $H_\infty$ control, e.g. [17, 18], for a presentation of the dual control problem.
3.1 Formulation of the $H_\infty$ Problem
Consider a state-space model of the form
$$\begin{cases} x_{i+1} = F_i x_i + G_i w_i \\ y_i = H_i x_i + v_i \end{cases} \qquad i \geq 0 \qquad (14)$$
with $x_0$, $\{w_i\}$ and $\{v_i\}$ unknown quantities and $\{F_i, G_i, H_i\}$ known matrices of appropriate sizes.

We may now pose the following problem: estimate some linear combination of the states, say $z_i = L_i x_i$, using the measured output $\{y_i\}$. Let $\hat{z}_i = \mathcal{K}_p(y_0, \ldots, y_{i-1})$ denote the estimate of $z_i$ given $\{y_0, \ldots, y_{i-1}\}$, i.e. the predicted, or a priori, estimate, and $\hat{z}_{i|i} = \mathcal{K}_f(y_0, \ldots, y_i)$ the filtered, or a posteriori, estimate given measurements $\{y_i\}$ up to, and including, time $i$.
Definition 1 The $H_\infty$ norm of an operator $T$ is defined as
$$\|T\|_\infty = \sup_{u \in l_2,\, u \neq 0} \frac{\|Tu\|_2}{\|u\|_2}$$
where $\|\cdot\|_2$ is the usual $l_2$ norm of the causal sequence $\{u_k\}$, i.e. $\|u\|_2^2 = \sum_{i=0}^{\infty} |u_i|^2$.

Remark: If $T$ is a matrix, then the $H_\infty$ norm of $T$ is the maximum singular value of $T$, $\bar{\sigma}(T)$.
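For the matrix case in the remark, the norm can be checked directly; a small illustrative example in Python (the matrix T is arbitrary, chosen only for illustration):

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# H-infinity norm of a constant matrix = maximum singular value = induced 2-norm
sigma_max = np.linalg.svd(T, compute_uv=False)[0]
print(sigma_max, np.linalg.norm(T, 2))   # the two values coincide
```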
Let $T_N(\mathcal{K}_p)$ be the transfer operator that maps the disturbances $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N-1}, \{v_i\}_{i=0}^{N-1}\}$ ($\Pi_0$ denotes the penalty on the initial error) onto the predicted estimation errors $\{z_i - \hat{z}_i\}_{i=0}^{N}$ and, similarly, $T_N(\mathcal{K}_f)$ the operator that maps the disturbances $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N}, \{v_i\}_{i=0}^{N}\}$ onto the filtered estimation errors $\{z_i - \hat{z}_{i|i}\}_{i=0}^{N}$. The $H_\infty$ optimal estimators minimize the $H_\infty$ norm of the operators $T_N(\mathcal{K}_p)$ and $T_N(\mathcal{K}_f)$, respectively. The corresponding $H_\infty$ optimal transfer operators will be denoted $T_N(\mathcal{K}_p^\infty)$ and $T_N(\mathcal{K}_f^\infty)$ as in Figure 1. We may interpret the $H_\infty$ norm as the maximal energy gain from the disturbances to the estimation errors. Hence, the $H_\infty$ estimators can be viewed as worst-case estimators that will be robust against the worst possible disturbances.
[Figure: block diagrams of $T_N(\mathcal{K}_p^\infty)$, mapping $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N-1}, \{v_i\}_{i=0}^{N-1}\}$ to $\{L_i x_i - \hat{z}_i\}_{i=0}^{N}$, and of $T_N(\mathcal{K}_f^\infty)$, mapping $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N}, \{v_i\}_{i=0}^{N}\}$ to $\{L_i x_i - \hat{z}_{i|i}\}_{i=0}^{N}$.]

Figure 1: $H_\infty$ optimal transfer operators from disturbances to predicted and filtered estimation errors.
Our problem may now formally be stated as follows (we only treat the finite horizon case; the infinite horizon case follows by taking limits).
Problem 1 (Optimal $H_\infty$ Problem) Find estimators, $\mathcal{K}_p$ and $\mathcal{K}_f$, that minimize the $H_\infty$ norm of the transfer operators $T_N(\mathcal{K}_p)$ and $T_N(\mathcal{K}_f)$, respectively, and obtain the corresponding
$$\gamma_{p,\mathrm{opt}}^2 = \inf_{\mathcal{K}_p} \|T_N(\mathcal{K}_p)\|_\infty^2 = \inf_{\mathcal{K}_p} \sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_i|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N-1} |w_i|^2 + \sum_{i=0}^{N-1} |v_i|^2}$$
and
$$\gamma_{f,\mathrm{opt}}^2 = \inf_{\mathcal{K}_f} \|T_N(\mathcal{K}_f)\|_\infty^2 = \inf_{\mathcal{K}_f} \sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_{i|i}|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N} |w_i|^2 + \sum_{i=0}^{N} |v_i|^2}.$$

Remark: We may also write $\gamma_{p,\mathrm{opt}}^2 = \|T_N(\mathcal{K}_p^\infty)\|_\infty^2$ and $\gamma_{f,\mathrm{opt}}^2 = \|T_N(\mathcal{K}_f^\infty)\|_\infty^2$ using our previous definitions of $T_N(\mathcal{K}_p^\infty)$ and $T_N(\mathcal{K}_f^\infty)$.
Closed-form solutions to the optimal $H_\infty$ problem are available only in some special cases (cf. [5]), and it is common in the literature to settle for a sub-optimal solution.
Problem 2 (Sub-optimal $H_\infty$ Problem) Given $\gamma_p > 0$ and $\gamma_f > 0$, find estimation strategies that achieve
$$\sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_i|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N-1} |w_i|^2 + \sum_{i=0}^{N-1} |v_i|^2} < \gamma_p^2$$
and
$$\sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_{i|i}|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N} |w_i|^2 + \sum_{i=0}^{N} |v_i|^2} < \gamma_f^2.$$
Note: this requires checking whether $\gamma_p \geq \gamma_{p,\mathrm{opt}}$ and $\gamma_f \geq \gamma_{f,\mathrm{opt}}$.
3.2 Solution of the Sub-optimal $H_\infty$ Problem
We now give solutions to the sub-optimal $H_\infty$ problem stated in the previous section. The results are presented as two theorems (cf. [15, 16]).
Theorem 2 (An $H_\infty$ A Priori Filter) For a given $\gamma > 0$, if the matrices $[F_i \; G_i]$ have full rank, then an estimator that achieves $\|T_N(\mathcal{K}_p)\|_\infty < \gamma$ exists if, and only if,
$$\tilde{P}_i^{-1} = P_i^{-1} - \gamma^{-2} L_i^T L_i > 0, \qquad i = 0, \ldots, N \qquad (15)$$
where $P_0 = \Pi_0$ and where $P_i$ obeys the Riccati recursion
$$P_{i+1} = F_i P_i F_i^T + G_i G_i^T - F_i P_i \begin{bmatrix} H_i^T & L_i^T \end{bmatrix} R_{e,i}^{-1} \begin{bmatrix} H_i \\ L_i \end{bmatrix} P_i F_i^T \qquad (16)$$
with
$$R_{e,i} = \begin{bmatrix} I & 0 \\ 0 & -\gamma^2 I \end{bmatrix} + \begin{bmatrix} H_i \\ L_i \end{bmatrix} P_i \begin{bmatrix} H_i^T & L_i^T \end{bmatrix}. \qquad (17)$$
If this is the case, then one possible level-$\gamma$ $H_\infty$ filter is given by
$$\hat{z}_i = L_i \hat{x}_i \qquad (18)$$
$$\hat{x}_{i+1} = F_i \hat{x}_i + K_{a,i} (y_i - H_i \hat{x}_i) \qquad (19)$$
where
$$K_{a,i} = F_i \tilde{P}_i H_i^T (I + H_i \tilde{P}_i H_i^T)^{-1}. \qquad (20)$$
This filter is the central level-$\gamma$ $H_\infty$ a priori filter and the corresponding transfer operator, from the disturbances to the prediction errors, will be denoted $T_N(\mathcal{K}_p^{\mathrm{cen}})$.
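A minimal sketch of the central a priori filter of Theorem 2, propagating (16)-(17) and checking the existence condition (15) at each step, is given below in Python (a time-invariant model and a user-chosen gamma are assumed for illustration):

```python
import numpy as np

def hinf_apriori(F, G, H, L, Pi0, gamma, ys):
    """Central level-gamma H-infinity a priori filter, equations (15)-(20)."""
    n = F.shape[0]
    x_hat = np.zeros(n)
    P = Pi0.copy()                                           # P_0 = Pi_0
    z_hat = []
    for y in ys:
        # existence condition (15): P_i^{-1} - gamma^{-2} L^T L > 0
        P_tilde_inv = np.linalg.inv(P) - gamma**(-2) * L.T @ L
        if np.any(np.linalg.eigvalsh(P_tilde_inv) <= 0):
            raise ValueError("condition (15) violated: no level-gamma filter exists")
        P_tilde = np.linalg.inv(P_tilde_inv)

        z_hat.append(L @ x_hat)                              # eq. (18)
        Ka = F @ P_tilde @ H.T @ np.linalg.inv(np.eye(H.shape[0]) + H @ P_tilde @ H.T)  # eq. (20)
        x_hat = F @ x_hat + Ka @ (y - H @ x_hat)             # eq. (19)

        # Riccati recursion (16)-(17); C stacks H and L
        C = np.vstack([H, L])
        Re = np.block([
            [np.eye(H.shape[0]), np.zeros((H.shape[0], L.shape[0]))],
            [np.zeros((L.shape[0], H.shape[0])), -gamma**2 * np.eye(L.shape[0])],
        ]) + C @ P @ C.T
        P = F @ P @ F.T + G @ G.T - F @ P @ C.T @ np.linalg.inv(Re) @ C @ P @ F.T
    return z_hat
```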
Theorem 3 (An $H_\infty$ A Posteriori Filter) For a given $\gamma > 0$, if the matrices $[F_i \; G_i]$ have full rank, then an estimator that achieves $\|T_N(\mathcal{K}_f)\|_\infty < \gamma$ exists if, and only if,
$$P_i^{-1} + H_i^T H_i - \gamma^{-2} L_i^T L_i > 0, \qquad i = 0, \ldots, N \qquad (21)$$
where $P_i$ is the same as in Theorem 2.
If this is the case, then one possible level-$\gamma$ $H_\infty$ a posteriori filter is given by
$$\hat{z}_{i|i} = L_i \hat{x}_{i|i} \qquad (22)$$
$$\hat{x}_{i+1|i+1} = F_i \hat{x}_{i|i} + K_{s,i+1} (y_{i+1} - H_{i+1} F_i \hat{x}_{i|i}) \qquad (23)$$
where
$$K_{s,i+1} = P_{i+1} H_{i+1}^T (I + H_{i+1} P_{i+1} H_{i+1}^T)^{-1}. \qquad (24)$$
This filter is the central level-$\gamma$ $H_\infty$ a posteriori filter and the corresponding transfer operator, from the disturbances to the filtered errors, will be denoted $T_N(\mathcal{K}_f^{\mathrm{cen}})$.
Remarks:

1. The above level-$\gamma$ filters are not unique, but all possible level-$\gamma$ filters can be parameterized using these central filters.

2. The structure of the estimator depends, via the Riccati recursion, on the $L_i$.

3. We have additional conditions, (15) and (21), that must be satisfied for the estimators to exist.

4. We have indefinite (covariance) matrices. Besides this complication, the central $H_\infty$ filters are just Kalman filters (but now in an abstract indefinite space called Krein space, cf. [19]).

5. As $\gamma \to \infty$, the Riccati recursion (16) reduces to the Kalman filter recursion (6). This indicates that the robustness of the Kalman filter might be poor.
4 Connecting the Two Approaches
After having reviewed the existing $H_2$ and $H_\infty$ optimal estimation strategies, we now turn to the question of how to relate the approaches to each other. From now on we will assume a state-space model of the form
$$\begin{cases} \theta_{i+1} = \theta_i + w_i \\ y_i = \varphi_i^T \theta_i + v_i \end{cases} \qquad i \geq 0. \qquad (25)$$
Consider the problem of recursively estimating the parameters $\theta_i$, given measurements of the output $y_i$. This is a special case of the estimation problem discussed in the previous sections, corresponding to a state-space model with $F_i = I$, $G_i = I$, $H_i = \varphi_i^T$ and the choice $L_i = I$ in, e.g., (12). It is thus clear that we may use both the Kalman filter and the $H_\infty$ filters to obtain estimates of $\theta_i$. The question is then whether our choice of algorithm matters. In this section we will try to answer this question, e.g. by trying to relate the Kalman filter and the $H_\infty$ filters through the Riccati recursion and the filter gains. To simplify the discussion we will first reformulate the Kalman filter and the $H_\infty$ filter equations using the simplified model (25).
4.1 Reformulation of the Filters
For the Kalman filter we start by noting that $F_i = I$ implies that
$$\hat{\theta}_{i+1} = \hat{\theta}_{i|i} \qquad (26)$$
and that
$$K_i \triangleq K_{p,i} = K_{f,i} = P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \qquad (27)$$
where $P_i$ is given by
$$P_{i+1} = P_i + Q_i - P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \varphi_i^T P_i \qquad (28)$$
with $P_0 = \Pi_0$. Thus there is no longer any difference between the Kalman filter in a priori form and in a posteriori form. The update equation can now be written as
$$\hat{\theta}_{i+1} = \hat{\theta}_i + K_i (y_i - \varphi_i^T \hat{\theta}_i). \qquad (29)$$
We may also use the following two-step procedure to update $P_i$, instead of the DRE (28):
$$\begin{cases} P_{i|i} = P_i - P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \varphi_i^T P_i \\ P_{i+1} = P_{i|i} + Q_i. \end{cases} \qquad (30)$$
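The reformulated recursions (27)-(30) give a compact parameter tracker. A minimal Python sketch, assuming scalar measurements and constant design weights Q, R (illustrative choices only, not prescribed by the text):

```python
import numpy as np

def kalman_tracker(Phi, y, Q, R, Pi0):
    """Kalman filter for the random-walk regression model (25), using (27)-(30)."""
    N, n = Phi.shape
    theta = np.zeros(n)
    P = Pi0.copy()                                   # P_0 = Pi_0
    thetas = []
    for i in range(N):
        phi = Phi[i]
        K = P @ phi / (R + phi @ P @ phi)            # gain, eq. (27)
        theta = theta + K * (y[i] - phi @ theta)     # update, eq. (29)
        P_filt = P - np.outer(P @ phi, phi @ P) / (R + phi @ P @ phi)  # eq. (30), first step
        P = P_filt + Q                               # eq. (30), second step
        thetas.append(theta.copy())
    return np.array(thetas)
```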
The recursions for the $H_\infty$ filters also simplify, but before we give the reformulated versions of the filter equations we first present a revised version of Problem 2.
Problem 3 (Reformulation of the Sub-optimal $H_\infty$ Problem) Given $\gamma_p > 0$ and $\gamma_f > 0$, find estimation strategies that achieve
$$\sup_{\theta_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |\theta_i - \hat{\theta}_i|^2}{(\theta_0 - \hat{\theta}_0)^T \Pi_0^{-1} (\theta_0 - \hat{\theta}_0) + \sum_{i=0}^{N-1} w_i^T Q_i^{-1} w_i + \sum_{i=0}^{N-1} v_i^T R_i^{-1} v_i} < \gamma_p^2 \qquad (31)$$
and
$$\sup_{\theta_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |\theta_i - \hat{\theta}_{i|i}|^2}{(\theta_0 - \hat{\theta}_0)^T \Pi_0^{-1} (\theta_0 - \hat{\theta}_0) + \sum_{i=0}^{N} w_i^T Q_i^{-1} w_i + \sum_{i=0}^{N} v_i^T R_i^{-1} v_i} < \gamma_f^2. \qquad (32)$$

$T(\mathcal{K})$ will from now on denote the transfer operator from the weighted disturbances $\{\Pi_0^{-1/2}(\theta_0 - \hat{\theta}_0), \{Q_i^{-1/2} w_i\}, \{R_i^{-1/2} v_i\}\}$ to the estimation errors. Note also that Problem 2 is a special case of Problem 3, corresponding to the choices $Q_i = I$ and $R_i = I$. We may now reformulate the results in Section 3.2 as follows.
Corollary 2 (Reformulation of Theorem 2) An estimator that achieves $\|T_N(\mathcal{K}_p)\|_\infty < \gamma$, for a given $\gamma > 0$, exists if, and only if,
$$\tilde{P}_i^{-1} = P_i^{-1} - \gamma^{-2} I > 0, \qquad i = 0, \ldots, N \qquad (33)$$
where $P_0 = \Pi_0$ and where $P_i$ obeys the Riccati recursion
$$P_{i+1} = P_i + Q_i - P_i \begin{bmatrix} \varphi_i & I \end{bmatrix} R_{e,i}^{-1} \begin{bmatrix} \varphi_i^T \\ I \end{bmatrix} P_i \qquad (34)$$
with
$$R_{e,i} = \begin{bmatrix} R_i & 0 \\ 0 & -\gamma^2 I \end{bmatrix} + \begin{bmatrix} \varphi_i^T \\ I \end{bmatrix} P_i \begin{bmatrix} \varphi_i & I \end{bmatrix}. \qquad (35)$$
If this is the case, then one possible level-$\gamma$ $H_\infty$ filter is given by
$$\hat{\theta}_{i+1} = \hat{\theta}_i + K_{a,i} (y_i - \varphi_i^T \hat{\theta}_i) \qquad (36)$$
where
$$K_{a,i} = \tilde{P}_i \varphi_i (R_i + \varphi_i^T \tilde{P}_i \varphi_i)^{-1}. \qquad (37)$$
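For comparison with the Kalman tracker above, here is a corresponding sketch of the central a priori filter of Corollary 2, i.e. (33)-(37), again assuming scalar measurements and constant Q, R and a user-chosen gamma (illustrative choices):

```python
import numpy as np

def hinf_apriori_tracker(Phi, y, Q, R, Pi0, gamma):
    """Central level-gamma H-infinity a priori filter for model (25), eqs. (33)-(37)."""
    N, n = Phi.shape
    theta = np.zeros(n)
    P = Pi0.copy()                                   # P_0 = Pi_0
    thetas = []
    for i in range(N):
        phi = Phi[i]
        # existence condition (33): P_i^{-1} - gamma^{-2} I > 0
        P_tilde_inv = np.linalg.inv(P) - gamma**(-2) * np.eye(n)
        if np.any(np.linalg.eigvalsh(P_tilde_inv) <= 0):
            raise ValueError("condition (33) violated: no level-gamma filter exists")
        P_tilde = np.linalg.inv(P_tilde_inv)

        K = P_tilde @ phi / (R + phi @ P_tilde @ phi)        # gain, eq. (37)
        theta = theta + K * (y[i] - phi @ theta)             # update, eq. (36)
        thetas.append(theta.copy())

        # Riccati recursion (34)-(35); C stacks phi^T and I
        C = np.vstack([phi.reshape(1, -1), np.eye(n)])
        Re = np.block([[np.array([[R]]), np.zeros((1, n))],
                       [np.zeros((n, 1)), -gamma**2 * np.eye(n)]]) + C @ P @ C.T
        P = P + Q - P @ C.T @ np.linalg.inv(Re) @ C @ P
    return np.array(thetas)
```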
Corollary 3 (Reformulation of Theorem 3) An estimator that achieves $\|T_N(\mathcal{K}_f)\|_\infty < \gamma$, for a given $\gamma > 0$, exists if, and only if,
$$P_i^{-1} + \varphi_i R_i^{-1} \varphi_i^T - \gamma^{-2} I > 0, \qquad i = 0, \ldots, N \qquad (38)$$
where $P_i$ is the same as in Corollary 2.
If this is the case, then one possible level-$\gamma$ $H_\infty$ a posteriori filter is given by
$$\hat{\theta}_{i+1|i+1} = \hat{\theta}_{i|i} + K_{s,i+1} (y_{i+1} - \varphi_{i+1}^T \hat{\theta}_{i|i}) \qquad (39)$$
where
$$K_{s,i+1} = P_{i+1} \varphi_{i+1} (R_{i+1} + \varphi_{i+1}^T P_{i+1} \varphi_{i+1})^{-1}. \qquad (40)$$
4.2 Kalman Filter Interpretation of the $H_\infty$ Filters
In this section we will show how the $H_\infty$ filters can be seen as Kalman filters with particular choices of the design variables $Q_i$ and $R_i$. A standing assumption in this section will be that $\gamma > 0$ is such that (33) (or (38)) holds.
We will start with the $H_\infty$ a posteriori filter, since we immediately can make the observation that
$$K_{s,i} = P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} = K_i, \qquad (41)$$
i.e. that the filter has the same gain as the Kalman filter, given that you select the same $R_i$ in the two filters. To find the corresponding expression for $Q_i$ we may rewrite the Riccati recursion (34) as follows. First, let
$$P_{i|i} \triangleq P_i - P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \varphi_i^T P_i \qquad (42)$$
and
$$\Delta_i \triangleq P_{i|i} - \gamma^2 I. \qquad (43)$$
Now, using Schur complements we can write
$$\begin{bmatrix} R_i + \varphi_i^T P_i \varphi_i & \varphi_i^T P_i \\ P_i \varphi_i & P_i - \gamma^2 I \end{bmatrix}^{-1} = \begin{bmatrix} I & -K_i^T \\ 0 & I \end{bmatrix} \begin{bmatrix} (R_i + \varphi_i^T P_i \varphi_i)^{-1} & 0 \\ 0 & \Delta_i^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -K_i & I \end{bmatrix} \qquad (44)$$
and thus (34) can be rewritten as
$$\begin{aligned} P_{i+1} &= P_i + Q_i - P_i \begin{bmatrix} \varphi_i & I \end{bmatrix} \begin{bmatrix} R_i + \varphi_i^T P_i \varphi_i & \varphi_i^T P_i \\ P_i \varphi_i & P_i - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} \varphi_i^T \\ I \end{bmatrix} P_i \\ &= P_i + Q_i - \begin{bmatrix} P_i \varphi_i & P_{i|i} \end{bmatrix} \begin{bmatrix} (R_i + \varphi_i^T P_i \varphi_i)^{-1} & 0 \\ 0 & (P_{i|i} - \gamma^2 I)^{-1} \end{bmatrix} \begin{bmatrix} \varphi_i^T P_i \\ P_{i|i} \end{bmatrix} \\ &= P_{i|i} + Q_i - P_{i|i} (P_{i|i} - \gamma^2 I)^{-1} P_{i|i}. \end{aligned} \qquad (45)$$
So if we replace the covariance matrix $Q_i$ in the Kalman filter recursions with the quantity $Q_i - P_{i|i} (P_{i|i} - \gamma^2 I)^{-1} P_{i|i}$, the resulting filter is in fact $H_\infty$ optimal.
We may summarize the above calculations in a small lemma.

Lemma 1 If we run the Kalman filter with $Q_i$ chosen as
$$Q_i - P_{i|i} (P_{i|i} - \gamma^2 I)^{-1} P_{i|i} \qquad (46)$$
the resulting filter is $H_\infty$ optimal in the sense that it guarantees that the a posteriori bound (32) holds.
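Lemma 1 can be checked numerically: one step of the $H_\infty$ Riccati recursion (34)-(35) should coincide with one Kalman step (30) run with the modified $Q_i$ of (46). A quick Python sketch with arbitrary illustrative numbers (not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, R, gamma = 2, 1.0, 5.0
Q = 0.1 * np.eye(n)
P = np.eye(n)                                   # current P_i
phi = rng.standard_normal(n)

# H-infinity Riccati step, eqs. (34)-(35)
C = np.vstack([phi.reshape(1, -1), np.eye(n)])
Re = np.block([[np.array([[R]]), np.zeros((1, n))],
               [np.zeros((n, 1)), -gamma**2 * np.eye(n)]]) + C @ P @ C.T
P_hinf = P + Q - P @ C.T @ np.linalg.inv(Re) @ C @ P

# Kalman Riccati step (30) with the modified Q_i of eq. (46)
P_filt = P - np.outer(P @ phi, phi @ P) / (R + phi @ P @ phi)   # P_{i|i}, eq. (42)
Q_mod = Q - P_filt @ np.linalg.inv(P_filt - gamma**2 * np.eye(n)) @ P_filt
P_kalman = P_filt + Q_mod

print(np.allclose(P_hinf, P_kalman))            # True: the two recursions coincide
```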
Furthermore, the condition (38) can equivalently be rewritten as
$$P_{i|i}^{-1} - \gamma^{-2} I > 0.$$