On $H_2$ and $H_\infty$ Optimal Estimation

Urban Forssell
Department of Electrical Engineering, Linköping University, S-581 83 Linköping. Email: ufo@isy.liu.se

July 3, 1996
Abstract

We review some existing results on $H_2$ and $H_\infty$ estimation and explore possible connections between the optimal algorithms. For instance, in order to relate the $H_2$ optimal Kalman filter to the $H_\infty$ filters, we show that, with special choices of the covariance matrices, the Kalman filter is $H_\infty$ optimal. Moreover, by studying the matrix operator relating the estimation errors and the disturbances, we obtain simple and useful interpretations of both the $H_2$ and the $H_\infty$ results. Finally, an $H_\infty$ error bound for the RLS algorithm is derived.
1 Introduction
Assume you have a state-space model of a system and you want to estimate the states given measurements of the output. A standard approach to this problem is to minimize some quadratic criterion involving the estimation errors. This least-squares approach is attractive from many points of view; one is that it frequently enables the use of extremely efficient methods for finding the optimizing estimate.
The LMS algorithm [1, 2], for instance, was conceived as an approximate solution to the following problem: given a sequence $\{\varphi_i\}$ of $n \times 1$ input vectors and a corresponding sequence of desired outputs $\{y_i\}$, find the estimate of the $n \times 1$ parameter vector $\theta$ that minimizes the squared error
$$\sum_{i=0}^{N} |y_i - \varphi_i^T \theta|^2.$$
In the solution the estimate is recursively updated in the direction of the instantaneous gradient of the squared error. LMS is a very simple recursive algorithm and it is considered very robust. However, since LMS only provides an approximate solution to the least-squares problem (the exact solution can be computed using the RLS algorithm [3, 4]), it is interesting to note that, in [5], it is shown that LMS actually gives an exact solution of another problem, namely a certain minimax problem. The standard name for this kind of problem in the literature today is $H_\infty$ problems. The aim in $H_\infty$ estimation is to minimize the maximal energy gain from the disturbances to the estimation errors. The $H_\infty$ criterion can thus be understood as a worst-case criterion: the estimator will be robust against the worst possible disturbances. This is a completely different, and not very well known, approach to the estimation problem compared to the least-squares, or $H_2$, approaches that are the standard tools today.
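To make the instantaneous-gradient update concrete, here is a minimal LMS sketch in Python (the step size and the zero initial estimate are illustrative choices, not taken from [1, 2]):

```python
import numpy as np

def lms(Phi, y, mu=0.1):
    """Least Mean Squares: step along the instantaneous gradient of |y_i - phi_i^T theta|^2.

    Phi : (N, n) array whose rows are the input vectors phi_i
    y   : (N,)  array of desired outputs
    mu  : step size (assumed small enough for stability)
    """
    N, n = Phi.shape
    theta = np.zeros(n)                   # initial estimate, theta_0 = 0
    for i in range(N):
        e = y[i] - Phi[i] @ theta         # a priori prediction error
        theta = theta + mu * Phi[i] * e   # instantaneous-gradient (LMS) update
    return theta
```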
In this contribution we will therefore review some existing results on both $H_2$ and $H_\infty$ estimation and also illustrate various connections between the optimal algorithms.
Returning to LMS, we may also note that in [5] it is shown that LMS is not only $H_\infty$ optimal but that it is in fact the central $H_\infty$ filter, implying that LMS also minimizes a risk-sensitive criterion under certain assumptions and that it is the minimum entropy filter in the case of steady-state LTI filtering [6]. Furthermore, the version of LMS called Normalized LMS is shown to be the central $H_\infty$ a posteriori filter, as opposed to LMS which, more correctly, is the central $H_\infty$ a priori filter (the vocabulary will be explained below).
In Sections 2 and 3 we will, for completeness and ease of reference, state the solutions to the $H_2$ optimal and the $H_\infty$ optimal state estimation problems, respectively. The material in these sections is well known to most readers and much discussed in the literature. This is especially true for Section 2, which therefore will be very brief. Section 3 contains perhaps less familiar results; here we focus on the $H_\infty$ estimation problem and we will give a thorough statement of both the $H_\infty$ criterion and the optimizing solution. Then, in Section 4, we will narrow the scope a bit and consider the problem of tracking a time-varying system. We will then assume that the parameters are time varying according to a random walk model and that the output can be described by a linear regression. Within this framework we will discuss various aspects of the two approaches in order to link them together. As we will see, the solutions are in some respects closely related, while in others they are not. Finally, in Section 5 we derive an $H_\infty$ error bound for RLS.
2 $H_2$ Optimal Estimation
In this section we present two versions of the celebrated Kalman filter, which is known to be the best linear estimator in the least-squares ($H_2$) sense. The Kalman filter is very well known and much discussed in the literature (see e.g. [4, 7, 8]). We will therefore keep the presentation very brief and mainly use this section to introduce some notation.

Since we will mainly be interested in the predicted, or a priori, estimates, we first state the following result (cf. [7]).
Theorem 1 (The Kalman Filter Equations for Predicted Estimates)
Consider the state-space equations
$$\begin{cases} x_{i+1} = F_i x_i + G_i w_i \\ y_i = H_i x_i + v_i \end{cases} \qquad i \geq 0 \qquad (1)$$
with $\{w_i, v_i, x_0\}$ zero-mean random variables such that
$$E \begin{bmatrix} w_i \\ v_i \\ x_0 \end{bmatrix} \begin{bmatrix} w_j \\ v_j \\ x_0 \end{bmatrix}^T = \begin{bmatrix} Q_i \delta_{ij} & 0 & 0 \\ 0 & R_i \delta_{ij} & 0 \\ 0 & 0 & \Pi_0 \end{bmatrix} \qquad (2)$$
and where the matrices $\{F_i, G_i, H_i, Q_i, R_i, \Pi_0\}$ are assumed known. The one-step predicted state estimate of $x_i$ given $\{y_0, \ldots, y_{i-1}\}$,
$$\hat{x}_i \triangleq \hat{x}_{i|i-1}, \qquad (3)$$
can be recursively computed via the equations
$$\hat{x}_{i+1} = F_i \hat{x}_i + K_{p,i}(y_i - H_i \hat{x}_i), \qquad i \geq 0, \qquad \hat{x}_0 = 0 \qquad (4)$$
where the Kalman gain $K_{p,i}$ is given by
$$K_{p,i} = F_i P_i H_i^T R_{e,i}^{-1} \qquad \text{with} \qquad R_{e,i} = H_i P_i H_i^T + R_i \qquad (5)$$
and where $P_i$ obeys the discrete-time Riccati recursion (DRE)
$$P_{i+1} = F_i P_i F_i^T + G_i Q_i G_i^T - K_{p,i} R_{e,i} K_{p,i}^T, \qquad i \geq 0, \qquad P_0 = \Pi_0. \qquad (6)$$
Furthermore, $P_i$ is the covariance matrix of the instantaneous error in the predicted state estimate:
$$P_i \triangleq E\, \tilde{x}_i \tilde{x}_i^T, \qquad \tilde{x}_i \triangleq x_i - \hat{x}_i. \qquad (7)$$

Instead of computing the estimate of $x_i$ given $\{y_0, \ldots, y_{i-1}\}$ one may want to use measurements up to, and including, time $i$. The Kalman filter is still the best linear estimator, but the filter equations will now involve the filtered quantities $\hat{x}_{i|i}$, i.e. the estimate of $x_i$ given $\{y_0, \ldots, y_i\}$. To formalize the discussion, we state the following corollary to the previous theorem.
Corollary 1 (The Kalman Filter Equations for Filtered Estimates)
When the assumptions in Theorem 1 hold, the filtered state estimates of $x_i$ given $\{y_0, \ldots, y_i\}$ can be computed via the recursion
$$\hat{x}_{i|i} = F_{i-1} \hat{x}_{i-1|i-1} + P_i H_i^T (H_i P_i H_i^T + R_i)^{-1} (y_i - H_i F_{i-1} \hat{x}_{i-1|i-1}) \qquad (8)$$
where $P_i$ obeys the same DRE as in Theorem 1.
The proof consists in the observation that the predicted and filtered state estimates are related through (cf. [7])
$$\hat{x}_{i+1} = F_i \hat{x}_{i|i}. \qquad (9)$$
We may also introduce the filtered Kalman gain
$$K_{f,i} \triangleq P_i H_i^T R_{e,i}^{-1} \qquad (10)$$
and note the following simple relation between the two Kalman gains:
$$K_{p,i} = F_i K_{f,i}. \qquad (11)$$
We make one last remark on Kalman filtering before turning to the $H_\infty$ filters, and that is on how to estimate a different process than the state sequence. Suppose you want to estimate $\{z_i\}$ and that $z_i$ and the states $x_i$ are related through
$$z_i = L_i x_i. \qquad (12)$$
The best estimate of $z_i$ is then given by
$$\hat{z}_i = L_i \hat{x}_i \qquad (13)$$
where $\hat{x}_i$ are the state estimates outputted by the Kalman filter.
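The recursions (4)-(6) translate directly into code. A minimal Python sketch is given below, assuming a time-invariant model with known matrices F, G, H, Q, R and initial covariance Pi0 (the paper itself gives no implementation); it returns the predicted state estimates:

```python
import numpy as np

def kalman_predicted(F, G, H, Q, R, Pi0, ys):
    """One-step-ahead Kalman filter, equations (4)-(6), for a time-invariant model."""
    n = F.shape[0]
    x_hat = np.zeros(n)                  # \hat{x}_0 = 0
    P = Pi0.copy()                       # P_0 = Pi_0
    estimates = []
    for y in ys:
        Re = H @ P @ H.T + R                             # innovation covariance, eq. (5)
        Kp = F @ P @ H.T @ np.linalg.inv(Re)             # Kalman gain, eq. (5)
        estimates.append(x_hat.copy())                   # \hat{x}_i = \hat{x}_{i|i-1}
        x_hat = F @ x_hat + Kp @ (y - H @ x_hat)         # state update, eq. (4)
        P = F @ P @ F.T + G @ Q @ G.T - Kp @ Re @ Kp.T   # Riccati recursion, eq. (6)
    return np.array(estimates)
```

An estimate of $z_i = L_i x_i$ as in (13) is then obtained simply by multiplying each returned estimate by $L_i$.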
3 $H_\infty$ Optimal Estimation
The $H_\infty$ filters, to be presented in this section, are interesting alternatives to the famed Kalman filter in most estimation problems. As we shall see, the filter equations are very similar even though the underlying ideas are completely different.

The optimality of the Kalman filter relies on knowledge of the covariance matrices $Q_i$ and $R_i$. In most real-world applications this kind of a priori information is not available and one has to use, more or less, ad hoc choices of $Q_i$ and $R_i$. Is the resulting filter guaranteed to achieve a certain level of performance? The answer is no, although the effects of different choices of $Q_i$ and $R_i$ are well understood and frequently utilized.
The $H_\infty$ filters, on the other hand, give hard upper bounds on the estimation errors, no matter what the disturbances are (as long as they are of finite energy). We will now formulate the $H_\infty$ problem and then present two $H_\infty$ optimal filters. We will not give much background material; instead the reader is referred to the papers [6, 9, 10, 11, 12, 13, 14, 15, 16] and the references therein. One may also want to consult some textbook on $H_\infty$ control, e.g. [17, 18], for a presentation of the dual control problem.
3.1 Formulation of the $H_\infty$ Problem
Consider a state-space model of the form
$$\begin{cases} x_{i+1} = F_i x_i + G_i w_i \\ y_i = H_i x_i + v_i \end{cases} \qquad i \geq 0 \qquad (14)$$
with $x_0$, $\{w_i\}$ and $\{v_i\}$ unknown quantities and $\{F_i, G_i, H_i\}$ known matrices of appropriate sizes.

We may now pose the following problem: estimate some linear combination of the states, say $z_i = L_i x_i$, using the measured output $\{y_i\}$. Let $\hat{z}_i = \mathcal{K}_p(y_0, \ldots, y_{i-1})$ denote the estimate of $z_i$ given $\{y_0, \ldots, y_{i-1}\}$, i.e. the predicted, or a priori, estimate, and $\hat{z}_{i|i} = \mathcal{K}_f(y_0, \ldots, y_i)$ the filtered, or a posteriori, estimate given measurements $\{y_i\}$ up to, and including, time $i$.
Definition 1 The $H_\infty$ norm of an operator $T$ is defined as
$$\|T\|_\infty = \sup_{u \in l_2,\, u \neq 0} \frac{\|Tu\|_2}{\|u\|_2}$$
where $\|\cdot\|_2$ is the usual $l_2$ norm of the causal sequence $\{u_k\}$, i.e. $\|u\|_2^2 = \sum_{i=0}^{\infty} |u_i|^2$.

Remark: If $T$ is a matrix, then the $H_\infty$ norm of $T$ is the maximum singular value of $T$, $\bar{\sigma}(T)$.
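For the matrix case in the remark, the norm can be checked directly; a small illustrative example in Python (the matrix T is arbitrary, chosen only for illustration):

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# H-infinity norm of a constant matrix = maximum singular value = induced 2-norm
sigma_max = np.linalg.svd(T, compute_uv=False)[0]
print(sigma_max, np.linalg.norm(T, 2))   # the two values coincide
```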
Let $T_N(\mathcal{K}_p)$ be the transfer operator that maps the disturbances $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N-1}, \{v_i\}_{i=0}^{N-1}\}$ ($\Pi_0$ denotes the penalty on the initial error) onto the predicted estimation errors $\{z_i - \hat{z}_i\}_{i=0}^{N}$ and, similarly, $T_N(\mathcal{K}_f)$ the operator that maps the disturbances $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N}, \{v_i\}_{i=0}^{N}\}$ onto the filtered estimation errors $\{z_i - \hat{z}_{i|i}\}_{i=0}^{N}$. The $H_\infty$ optimal estimators minimize the $H_\infty$ norm of the operators $T_N(\mathcal{K}_p)$ and $T_N(\mathcal{K}_f)$, respectively. The corresponding $H_\infty$ optimal transfer operators will be denoted $T_N(\mathcal{K}_p^\infty)$ and $T_N(\mathcal{K}_f^\infty)$ as in Figure 1. We may interpret the $H_\infty$ norm as the maximal energy gain from the disturbances to the estimation errors. Hence, the $H_\infty$ estimators can be viewed as worst-case estimators that will be robust against the worst possible disturbances.
[Figure: block diagrams of $T_N(\mathcal{K}_p^\infty)$, mapping $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N-1}, \{v_i\}_{i=0}^{N-1}\}$ to $\{L_i x_i - \hat{z}_i\}_{i=0}^{N}$, and of $T_N(\mathcal{K}_f^\infty)$, mapping $\{\Pi_0^{-1/2}(x_0 - \hat{x}_0), \{w_i\}_{i=0}^{N}, \{v_i\}_{i=0}^{N}\}$ to $\{L_i x_i - \hat{z}_{i|i}\}_{i=0}^{N}$.]

Figure 1: $H_\infty$ optimal transfer operators from disturbances to predicted and filtered estimation errors.
Our problem may now formally be stated as follows (we only treat the finite horizon case; the infinite horizon case follows by taking limits).
Problem 1 (Optimal $H_\infty$ Problem) Find estimators, $\mathcal{K}_p$ and $\mathcal{K}_f$, that minimize the $H_\infty$ norm of the transfer operators $T_N(\mathcal{K}_p)$ and $T_N(\mathcal{K}_f)$, respectively, and obtain the corresponding
$$\gamma_{p,\mathrm{opt}}^2 = \inf_{\mathcal{K}_p} \|T_N(\mathcal{K}_p)\|_\infty^2 = \inf_{\mathcal{K}_p} \sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_i|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N-1} |w_i|^2 + \sum_{i=0}^{N-1} |v_i|^2}$$
and
$$\gamma_{f,\mathrm{opt}}^2 = \inf_{\mathcal{K}_f} \|T_N(\mathcal{K}_f)\|_\infty^2 = \inf_{\mathcal{K}_f} \sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_{i|i}|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N} |w_i|^2 + \sum_{i=0}^{N} |v_i|^2}.$$

Remark: We may also write $\gamma_{p,\mathrm{opt}}^2 = \|T_N(\mathcal{K}_p^\infty)\|_\infty^2$ and $\gamma_{f,\mathrm{opt}}^2 = \|T_N(\mathcal{K}_f^\infty)\|_\infty^2$ using our previous definitions of $T_N(\mathcal{K}_p^\infty)$ and $T_N(\mathcal{K}_f^\infty)$.
Closed-form solutions to the optimal $H_\infty$ problem are available only in some special cases (cf. [5]), and it is common in the literature to settle for a sub-optimal solution.
Problem 2 (Sub-optimal $H_\infty$ Problem) Given $\gamma_p > 0$ and $\gamma_f > 0$, find estimation strategies that achieve
$$\sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_i|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N-1} |w_i|^2 + \sum_{i=0}^{N-1} |v_i|^2} < \gamma_p^2$$
and
$$\sup_{x_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |z_i - \hat{z}_{i|i}|^2}{(x_0 - \hat{x}_0)^T \Pi_0^{-1} (x_0 - \hat{x}_0) + \sum_{i=0}^{N} |w_i|^2 + \sum_{i=0}^{N} |v_i|^2} < \gamma_f^2.$$
Note: this requires checking whether $\gamma_p \geq \gamma_{p,\mathrm{opt}}$ and $\gamma_f \geq \gamma_{f,\mathrm{opt}}$.
3.2 Solution of the Sub-optimal $H_\infty$ Problem
We now give solutions to the sub-optimal $H_\infty$ problem stated in the previous section. The results are presented as two theorems (cf. [15, 16]).
Theorem 2 (An $H_\infty$ A Priori Filter) For a given $\gamma > 0$, if the matrices $[F_i \; G_i]$ have full rank, then an estimator that achieves $\|T_N(\mathcal{K}_p)\|_\infty < \gamma$ exists if, and only if,
$$\tilde{P}_i^{-1} = P_i^{-1} - \gamma^{-2} L_i^T L_i > 0, \qquad i = 0, \ldots, N \qquad (15)$$
where $P_0 = \Pi_0$ and where $P_i$ obeys the Riccati recursion
$$P_{i+1} = F_i P_i F_i^T + G_i G_i^T - F_i P_i \begin{bmatrix} H_i^T & L_i^T \end{bmatrix} R_{e,i}^{-1} \begin{bmatrix} H_i \\ L_i \end{bmatrix} P_i F_i^T \qquad (16)$$
with
$$R_{e,i} = \begin{bmatrix} I & 0 \\ 0 & -\gamma^2 I \end{bmatrix} + \begin{bmatrix} H_i \\ L_i \end{bmatrix} P_i \begin{bmatrix} H_i^T & L_i^T \end{bmatrix}. \qquad (17)$$
If this is the case, then one possible level-$\gamma$ $H_\infty$ filter is given by
$$\hat{z}_i = L_i \hat{x}_i \qquad (18)$$
$$\hat{x}_{i+1} = F_i \hat{x}_i + K_{a,i} (y_i - H_i \hat{x}_i) \qquad (19)$$
where
$$K_{a,i} = F_i \tilde{P}_i H_i^T (I + H_i \tilde{P}_i H_i^T)^{-1}. \qquad (20)$$
This filter is the central level-$\gamma$ $H_\infty$ a priori filter and the corresponding transfer operator, from the disturbances to the prediction errors, will be denoted $T_N(\mathcal{K}_p^{\mathrm{cen}})$.
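A minimal sketch of the central a priori filter of Theorem 2, propagating (16)-(17) and checking the existence condition (15) at each step, is given below in Python (a time-invariant model and a user-chosen gamma are assumed for illustration):

```python
import numpy as np

def hinf_apriori(F, G, H, L, Pi0, gamma, ys):
    """Central level-gamma H-infinity a priori filter, equations (15)-(20)."""
    n = F.shape[0]
    x_hat = np.zeros(n)
    P = Pi0.copy()                                           # P_0 = Pi_0
    z_hat = []
    for y in ys:
        # existence condition (15): P_i^{-1} - gamma^{-2} L^T L > 0
        P_tilde_inv = np.linalg.inv(P) - gamma**(-2) * L.T @ L
        if np.any(np.linalg.eigvalsh(P_tilde_inv) <= 0):
            raise ValueError("condition (15) violated: no level-gamma filter exists")
        P_tilde = np.linalg.inv(P_tilde_inv)

        z_hat.append(L @ x_hat)                              # eq. (18)
        Ka = F @ P_tilde @ H.T @ np.linalg.inv(np.eye(H.shape[0]) + H @ P_tilde @ H.T)  # eq. (20)
        x_hat = F @ x_hat + Ka @ (y - H @ x_hat)             # eq. (19)

        # Riccati recursion (16)-(17); C stacks H and L
        C = np.vstack([H, L])
        Re = np.block([
            [np.eye(H.shape[0]), np.zeros((H.shape[0], L.shape[0]))],
            [np.zeros((L.shape[0], H.shape[0])), -gamma**2 * np.eye(L.shape[0])],
        ]) + C @ P @ C.T
        P = F @ P @ F.T + G @ G.T - F @ P @ C.T @ np.linalg.inv(Re) @ C @ P @ F.T
    return z_hat
```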
Theorem 3 (An $H_\infty$ A Posteriori Filter) For a given $\gamma > 0$, if the matrices $[F_i \; G_i]$ have full rank, then an estimator that achieves $\|T_N(\mathcal{K}_f)\|_\infty < \gamma$ exists if, and only if,
$$P_i^{-1} + H_i^T H_i - \gamma^{-2} L_i^T L_i > 0, \qquad i = 0, \ldots, N \qquad (21)$$
where $P_i$ is the same as in Theorem 2.
If this is the case, then one possible level-$\gamma$ $H_\infty$ a posteriori filter is given by
$$\hat{z}_{i|i} = L_i \hat{x}_{i|i} \qquad (22)$$
$$\hat{x}_{i+1|i+1} = F_i \hat{x}_{i|i} + K_{s,i+1} (y_{i+1} - H_{i+1} F_i \hat{x}_{i|i}) \qquad (23)$$
where
$$K_{s,i+1} = P_{i+1} H_{i+1}^T (I + H_{i+1} P_{i+1} H_{i+1}^T)^{-1}. \qquad (24)$$
This filter is the central level-$\gamma$ $H_\infty$ a posteriori filter and the corresponding transfer operator, from the disturbances to the filtered errors, will be denoted $T_N(\mathcal{K}_f^{\mathrm{cen}})$.
Remarks:

1. The above level-$\gamma$ filters are not unique, but all possible level-$\gamma$ filters can be parameterized using these central filters.

2. The structure of the estimator depends, via the Riccati recursion, on the $L_i$.

3. We have additional conditions, (15) and (21), that must be satisfied for the estimators to exist.

4. We have indefinite (covariance) matrices. Besides this complication, the central $H_\infty$ filters are just Kalman filters (but now in an abstract indefinite space called Krein space, cf. [19]).

5. As $\gamma \to \infty$, the Riccati recursion (16) reduces to the Kalman filter recursion (6). This indicates that the robustness of the Kalman filter might be poor.
4 Connecting the Two Approaches
After having reviewed the existing $H_2$ and $H_\infty$ optimal estimation strategies, we now turn to the question of how to relate the approaches to each other. From now on we will assume a state-space model of the form
$$\begin{cases} \theta_{i+1} = \theta_i + w_i \\ y_i = \varphi_i^T \theta_i + v_i \end{cases} \qquad i \geq 0. \qquad (25)$$
Consider the problem of recursively estimating the parameters $\theta_i$, given measurements of the output $y_i$. This is a special case of the estimation problem discussed in the previous sections, corresponding to a state-space model with $F_i = I$, $G_i = I$, $H_i = \varphi_i^T$ and the choice $L_i = I$ in, e.g., (12). It is thus clear that we may use both the Kalman filter and the $H_\infty$ filters to obtain estimates of $\theta_i$. The question is then whether our choice of algorithm matters. In this section we will try to answer this question, e.g. by trying to relate the Kalman filter and the $H_\infty$ filters through the Riccati recursion and the filter gains. To simplify the discussion we will first reformulate the Kalman filter and the $H_\infty$ filter equations using the simplified model (25).
4.1 Reformulation of the Filters
For the Kalman filter we start by noting that $F_i = I$ implies that
$$\hat{\theta}_{i+1} = \hat{\theta}_{i|i} \qquad (26)$$
and that
$$K_i \triangleq K_{p,i} = K_{f,i} = P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \qquad (27)$$
where $P_i$ is given by
$$P_{i+1} = P_i + Q_i - P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \varphi_i^T P_i \qquad (28)$$
with $P_0 = \Pi_0$. Thus there is no longer any difference between the Kalman filter in a priori form and in a posteriori form. The update equation can now be written as
$$\hat{\theta}_{i+1} = \hat{\theta}_i + K_i (y_i - \varphi_i^T \hat{\theta}_i). \qquad (29)$$
We may also use the following two-step procedure to update $P_i$, instead of the DRE (28):
$$\begin{cases} P_{i|i} = P_i - P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \varphi_i^T P_i \\ P_{i+1} = P_{i|i} + Q_i. \end{cases} \qquad (30)$$
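The reformulated recursions (27)-(30) give a compact parameter tracker. A minimal Python sketch, assuming scalar measurements and constant design weights Q, R (illustrative choices only, not prescribed by the text):

```python
import numpy as np

def kalman_tracker(Phi, y, Q, R, Pi0):
    """Kalman filter for the random-walk regression model (25), using (27)-(30)."""
    N, n = Phi.shape
    theta = np.zeros(n)
    P = Pi0.copy()                                   # P_0 = Pi_0
    thetas = []
    for i in range(N):
        phi = Phi[i]
        K = P @ phi / (R + phi @ P @ phi)            # gain, eq. (27)
        theta = theta + K * (y[i] - phi @ theta)     # update, eq. (29)
        P_filt = P - np.outer(P @ phi, phi @ P) / (R + phi @ P @ phi)  # eq. (30), first step
        P = P_filt + Q                               # eq. (30), second step
        thetas.append(theta.copy())
    return np.array(thetas)
```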
The recursions for the $H_\infty$ filters also simplify, but before we give the reformulated versions of the filter equations we first present a revised version of Problem 2.
Problem 3 (Reformulation of the Sub-optimal $H_\infty$ Problem) Given $\gamma_p > 0$ and $\gamma_f > 0$, find estimation strategies that achieve
$$\sup_{\theta_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |\theta_i - \hat{\theta}_i|^2}{(\theta_0 - \hat{\theta}_0)^T \Pi_0^{-1} (\theta_0 - \hat{\theta}_0) + \sum_{i=0}^{N-1} w_i^T Q_i^{-1} w_i + \sum_{i=0}^{N-1} v_i^T R_i^{-1} v_i} < \gamma_p^2 \qquad (31)$$
and
$$\sup_{\theta_0,\, w \in l_2,\, v \in l_2} \frac{\sum_{i=0}^{N} |\theta_i - \hat{\theta}_{i|i}|^2}{(\theta_0 - \hat{\theta}_0)^T \Pi_0^{-1} (\theta_0 - \hat{\theta}_0) + \sum_{i=0}^{N} w_i^T Q_i^{-1} w_i + \sum_{i=0}^{N} v_i^T R_i^{-1} v_i} < \gamma_f^2. \qquad (32)$$

$T(\mathcal{K})$ will from now on denote the transfer operator from the weighted disturbances $\{\Pi_0^{-1/2}(\theta_0 - \hat{\theta}_0), \{Q_i^{-1/2} w_i\}, \{R_i^{-1/2} v_i\}\}$ to the estimation errors. Note also that Problem 2 is a special case of Problem 3, corresponding to the choices $Q_i = I$ and $R_i = I$. We may now reformulate the results in Section 3.2 as follows.
Corollary 2 (Reformulation of Theorem 2) An estimator that achieves $\|T_N(\mathcal{K}_p)\|_\infty < \gamma$, for a given $\gamma > 0$, exists if, and only if,
$$\tilde{P}_i^{-1} = P_i^{-1} - \gamma^{-2} I > 0, \qquad i = 0, \ldots, N \qquad (33)$$
where $P_0 = \Pi_0$ and where $P_i$ obeys the Riccati recursion
$$P_{i+1} = P_i + Q_i - P_i \begin{bmatrix} \varphi_i & I \end{bmatrix} R_{e,i}^{-1} \begin{bmatrix} \varphi_i^T \\ I \end{bmatrix} P_i \qquad (34)$$
with
$$R_{e,i} = \begin{bmatrix} R_i & 0 \\ 0 & -\gamma^2 I \end{bmatrix} + \begin{bmatrix} \varphi_i^T \\ I \end{bmatrix} P_i \begin{bmatrix} \varphi_i & I \end{bmatrix}. \qquad (35)$$
If this is the case, then one possible level-$\gamma$ $H_\infty$ filter is given by
$$\hat{\theta}_{i+1} = \hat{\theta}_i + K_{a,i} (y_i - \varphi_i^T \hat{\theta}_i) \qquad (36)$$
where
$$K_{a,i} = \tilde{P}_i \varphi_i (R_i + \varphi_i^T \tilde{P}_i \varphi_i)^{-1}. \qquad (37)$$
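For comparison with the Kalman tracker above, here is a corresponding sketch of the central a priori filter of Corollary 2, i.e. (33)-(37), again assuming scalar measurements and constant Q, R and a user-chosen gamma (illustrative choices):

```python
import numpy as np

def hinf_apriori_tracker(Phi, y, Q, R, Pi0, gamma):
    """Central level-gamma H-infinity a priori filter for model (25), eqs. (33)-(37)."""
    N, n = Phi.shape
    theta = np.zeros(n)
    P = Pi0.copy()                                   # P_0 = Pi_0
    thetas = []
    for i in range(N):
        phi = Phi[i]
        # existence condition (33): P_i^{-1} - gamma^{-2} I > 0
        P_tilde_inv = np.linalg.inv(P) - gamma**(-2) * np.eye(n)
        if np.any(np.linalg.eigvalsh(P_tilde_inv) <= 0):
            raise ValueError("condition (33) violated: no level-gamma filter exists")
        P_tilde = np.linalg.inv(P_tilde_inv)

        K = P_tilde @ phi / (R + phi @ P_tilde @ phi)        # gain, eq. (37)
        theta = theta + K * (y[i] - phi @ theta)             # update, eq. (36)
        thetas.append(theta.copy())

        # Riccati recursion (34)-(35); C stacks phi^T and I
        C = np.vstack([phi.reshape(1, -1), np.eye(n)])
        Re = np.block([[np.array([[R]]), np.zeros((1, n))],
                       [np.zeros((n, 1)), -gamma**2 * np.eye(n)]]) + C @ P @ C.T
        P = P + Q - P @ C.T @ np.linalg.inv(Re) @ C @ P
    return np.array(thetas)
```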
Corollary 3 (Reformulation of Theorem 3) An estimator that achieves $\|T_N(\mathcal{K}_f)\|_\infty < \gamma$, for a given $\gamma > 0$, exists if, and only if,
$$P_i^{-1} + \varphi_i R_i^{-1} \varphi_i^T - \gamma^{-2} I > 0, \qquad i = 0, \ldots, N \qquad (38)$$
where $P_i$ is the same as in Corollary 2.
If this is the case, then one possible level-$\gamma$ $H_\infty$ a posteriori filter is given by
$$\hat{\theta}_{i+1|i+1} = \hat{\theta}_{i|i} + K_{s,i+1} (y_{i+1} - \varphi_{i+1}^T \hat{\theta}_{i|i}) \qquad (39)$$
where
$$K_{s,i+1} = P_{i+1} \varphi_{i+1} (R_{i+1} + \varphi_{i+1}^T P_{i+1} \varphi_{i+1})^{-1}. \qquad (40)$$
4.2 Kalman Filter Interpretation of the $H_\infty$ Filters
In this section we will show how the $H_\infty$ filters can be seen as Kalman filters with particular choices of the design variables $Q_i$ and $R_i$. A standing assumption in this section will be that $\gamma > 0$ is such that (33) (or (38)) holds.
We will start with the $H_\infty$ a posteriori filter, since we immediately can make the observation that
$$K_{s,i} = P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} = K_i, \qquad (41)$$
i.e. that the filter has the same gain as the Kalman filter, given that you select the same $R_i$ in the two filters. To find the corresponding expression for $Q_i$ we may rewrite the Riccati recursion (34) as follows. First, let
$$P_{i|i} \triangleq P_i - P_i \varphi_i (R_i + \varphi_i^T P_i \varphi_i)^{-1} \varphi_i^T P_i \qquad (42)$$
and
$$\Delta_i \triangleq P_{i|i} - \gamma^2 I. \qquad (43)$$
Now, using Schur complements we can write
$$\begin{bmatrix} R_i + \varphi_i^T P_i \varphi_i & \varphi_i^T P_i \\ P_i \varphi_i & P_i - \gamma^2 I \end{bmatrix}^{-1} = \begin{bmatrix} I & -K_i^T \\ 0 & I \end{bmatrix} \begin{bmatrix} (R_i + \varphi_i^T P_i \varphi_i)^{-1} & 0 \\ 0 & \Delta_i^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -K_i & I \end{bmatrix} \qquad (44)$$
and thus (34) can be rewritten as
$$\begin{aligned} P_{i+1} &= P_i + Q_i - P_i \begin{bmatrix} \varphi_i & I \end{bmatrix} \begin{bmatrix} R_i + \varphi_i^T P_i \varphi_i & \varphi_i^T P_i \\ P_i \varphi_i & P_i - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} \varphi_i^T \\ I \end{bmatrix} P_i \\ &= P_i + Q_i - \begin{bmatrix} P_i \varphi_i & P_{i|i} \end{bmatrix} \begin{bmatrix} (R_i + \varphi_i^T P_i \varphi_i)^{-1} & 0 \\ 0 & (P_{i|i} - \gamma^2 I)^{-1} \end{bmatrix} \begin{bmatrix} \varphi_i^T P_i \\ P_{i|i} \end{bmatrix} \\ &= P_{i|i} + Q_i - P_{i|i} (P_{i|i} - \gamma^2 I)^{-1} P_{i|i}. \end{aligned} \qquad (45)$$
So if we replace the covariance matrix $Q_i$ in the Kalman filter recursions with the quantity $Q_i - P_{i|i} (P_{i|i} - \gamma^2 I)^{-1} P_{i|i}$, the resulting filter is in fact $H_\infty$ optimal.
We may summarize the above calculations in a small lemma.

Lemma 1 If we run the Kalman filter with $Q_i$ chosen as
$$Q_i - P_{i|i} (P_{i|i} - \gamma^2 I)^{-1} P_{i|i} \qquad (46)$$
the resulting filter is $H_\infty$ optimal in the sense that it guarantees that the a posteriori bound (32) holds.
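Lemma 1 can be checked numerically: one step of the $H_\infty$ Riccati recursion (34)-(35) should coincide with one Kalman step (30) run with the modified $Q_i$ of (46). A quick Python sketch with arbitrary illustrative numbers (not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, R, gamma = 2, 1.0, 5.0
Q = 0.1 * np.eye(n)
P = np.eye(n)                                   # current P_i
phi = rng.standard_normal(n)

# H-infinity Riccati step, eqs. (34)-(35)
C = np.vstack([phi.reshape(1, -1), np.eye(n)])
Re = np.block([[np.array([[R]]), np.zeros((1, n))],
               [np.zeros((n, 1)), -gamma**2 * np.eye(n)]]) + C @ P @ C.T
P_hinf = P + Q - P @ C.T @ np.linalg.inv(Re) @ C @ P

# Kalman Riccati step (30) with the modified Q_i of eq. (46)
P_filt = P - np.outer(P @ phi, phi @ P) / (R + phi @ P @ phi)   # P_{i|i}, eq. (42)
Q_mod = Q - P_filt @ np.linalg.inv(P_filt - gamma**2 * np.eye(n)) @ P_filt
P_kalman = P_filt + Q_mod

print(np.allclose(P_hinf, P_kalman))            # True: the two recursions coincide
```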
Furthermore, the condition (38) can equivalently be rewritten as
$$P_{i|i}^{-1} - \gamma^{-2} I > 0.$$