Performance Analysis of General Tracking Algorithms
Lei Guo
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China
Lennart Ljung
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
Abstract: A general family of tracking algorithms for linear regression models is studied. It includes the familiar LMS (gradient approach), RLS (recursive least squares) and KF (Kalman filter) based estimators. The exact expressions for the quality of the obtained estimates are complicated. Approximate, and easy-to-use, expressions for the covariance matrix of the parameter tracking error are developed. These are applicable over the whole time interval, including the transient, and the approximation error can be explicitly calculated.
I. Introduction
Tracking is the key factor in adaptive algorithms of all kinds. We shall in this contribution study the special case where the underlying model is a linear regression, i.e., the observations are related by
$$ y_k = \varphi_k^T\theta_k + v_k, \qquad k \ge 0. \tag{1} $$
Here $y_k$ is an observation made at time $k$, $\varphi_k$ is a $d$-dimensional vector that is known at time $k$, $v_k$ represents a disturbance, and the parameter vector $\theta_k$ describes how the components of $\varphi_k$ relate to the observation $y_k$. The objective is to estimate the vector $\theta_k$ from the measurements $\{y_t, \varphi_t;\ t \le k\}$.
Many technical problem formulations fit the structure (1) by choosing $\varphi_k$ and $y_k$ appropriately. See, among many references, for example [15] and [22].
In order to come up with good algorithms for estimating $\theta_k$, it is natural to introduce some assumptions about the time variation of this parameter vector. In general we may write
$$ \theta_k = \theta_{k-1} + \gamma w_k \tag{2} $$
where $\gamma$ is a scaling constant and $w_k$ is an as yet undefined variable.
The tracking algorithms will provide us with an estimate
$$ \hat\theta_k = \hat\theta_k(y^k, \varphi^k) \tag{3} $$
where the superscript denotes the whole time history: $y^k = \{y_0, y_1, \ldots, y_k\}$, etc.
L. Guo was supported by the National Natural Science Foundation of China. L. Ljung was supported by the Swedish Research Council for Engineering Sciences (TFR).
A prime question concerns of course the quality of such an estimate. We shall evaluate the quality in terms of the covariance matrix of the tracking error
$$ \tilde\theta_k = \theta_k - \hat\theta_k. \tag{4} $$
This covariance matrix will be denoted by
$$ \Pi^0_k = E[\tilde\theta_k\tilde\theta_k^T] \tag{5} $$
where the expectation is taken over all relevant stochastic variables. A precise definition will be given later.
An exact expression for $\Pi^0_k$ will be very complicated (except in some trivial cases) and it will not be possible to derive it explicitly in closed form. However, the practical importance of having good tracking algorithms and estimates of their quality still makes it vital to be able to work with $\Pi^0_k$.
For that reason, there is a quite substantial literature on the problem of how to approximate $\Pi^0_k$ with expressions $\Pi_k$ that are simple to work with. This literature is partly surveyed in [2], [1], [12], and [20].
The current paper has the ambition to give a general result that subsumes and extends most of the earlier results.
Example 1.1 (A Preview Example): Consider the model (1)-(2) under the assumptions that
a) $\varphi_k$ and $\theta_k$ are scalars;
b) $\{\varphi_k\}$, $\{v_k\}$ and $\{w_k\}$ are mutually independent sequences of independent random variables with zero mean values and variances $R_\varphi$, $R_v$ and $Q_w$, respectively;
c) the fourth moment of $\varphi_k$ is $R_4$.
Assume also that the estimate $\hat\theta_k$ is computed by the simple LMS algorithm
$$ \hat\theta_{k+1} = \hat\theta_k + \mu\varphi_k(y_k - \varphi_k\hat\theta_k). \tag{6} $$
This case is one (essentially the only one) where a simple exact expression for $\Pi^0_k$ can be calculated. Straightforward calculations give
$$ \tilde\theta_{k+1} = (1 - \mu\varphi_k^2)\tilde\theta_k - \mu\varphi_k v_k + \gamma w_{k+1}. \tag{7} $$
Squaring and taking expectations gives
$$ \Pi^0_{k+1} = (1 - 2\mu R_\varphi + \mu^2 R_4)\Pi^0_k + \mu^2 R_\varphi R_v + \gamma^2 Q_w. \tag{8} $$
This is a linear, time-invariant difference equation for $\Pi^0_k$, and can be explicitly solved. In particular, if
$$ |1 - 2\mu R_\varphi + \mu^2 R_4| < 1, $$
the solution of (8) will converge to
$$ \Pi^* = \frac{1}{1 - \mu R_4/(2R_\varphi)}\cdot\frac{1}{2\mu R_\varphi}\left[\mu^2 R_\varphi R_v + \gamma^2 Q_w\right]. \tag{9} $$
With
$$ \bar\Pi = \frac{1}{2\mu R_\varphi}\left[\mu^2 R_\varphi R_v + \gamma^2 Q_w\right], $$
simple manipulations then give
$$ |\Pi^* - \bar\Pi| \le \delta(\mu)\,\bar\Pi, \qquad \delta(\mu) = \frac{\mu R_4/(2R_\varphi)}{1 - \mu R_4/(2R_\varphi)}. $$
Thus $\Pi^*$ can be well approximated by $\bar\Pi$ for small $\mu$, since $\delta(\mu) \to 0$ as $\mu \to 0$. $\square$
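The difference equation (8) and its limit (9) can be checked numerically. The sketch below is purely illustrative (the numerical values of $\mu$, $\gamma$, $R_\varphi$, $R_v$, $Q_w$ and $R_4$ are our own assumptions, not taken from the example); it iterates (8) to its limit and compares with $\Pi^*$ and the simple approximation $\bar\Pi$.

```python
# Numerical check of Example 1.1; all parameter values are illustrative.
mu, gamma = 0.01, 0.01            # adaptation rate and drift scaling
R_phi, R_v, Q_w = 1.0, 0.5, 1.0   # variances of phi_k, v_k, w_k
R_4 = 3.0                         # fourth moment of phi_k (Gaussian case)

# Iterate the difference equation (8) until it reaches its limit.
Pi = 0.0
for _ in range(5000):
    Pi = (1 - 2*mu*R_phi + mu**2*R_4) * Pi + mu**2*R_phi*R_v + gamma**2*Q_w

# Closed-form limit Pi_star from (9) and the simple approximation Pi_bar.
Pi_bar = (mu**2*R_phi*R_v + gamma**2*Q_w) / (2*mu*R_phi)
Pi_star = Pi_bar / (1 - mu*R_4/(2*R_phi))

print(Pi, Pi_star, abs(Pi_star - Pi_bar) / Pi_bar)
```

As predicted, the relative gap between $\Pi^*$ and $\bar\Pi$ is of order $\mu R_4/(2R_\varphi)$, here about 1.5 percent.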
Now, this example was particularly easy, primarily because of the assumed independence among $\{\varphi_k, v_k, w_k\}$, which makes $\varphi_k$ and $\tilde\theta_k$ independent.
In more general cases we have to deal with dependence among $\{\varphi_k\}$, and that is actually at the root of the problem. Generally speaking, if $\{\varphi_k\}$ are weakly dependent, then so should $\varphi_k$ and $\tilde\theta_k$ be, since $\hat\theta_k$ in (3) depends only to a small extent on the "latest" $\varphi_k$, provided that the adaptation rate ($\mu$ in the example) is small and the error equation ((7) in the example) is stable.
The extra term caused by the dependence in the equation corresponding to (8) in the example should then have negligible influence. Indeed, it is the purpose of this contribution to establish this for a fairly general family of tracking algorithms. Despite the simple idea, it turns out to be surprisingly difficult to prove technically. This paper could be said to mark the end of a series of results on performance analysis, starting with Theorem 1 in [12] and then followed by [14], [13] and [10]. There are many related, relevant results using other approaches; we may point to [20], [2], [5], [6], [4], [16], [3], [18], and to the references in these books and papers.
The bottom line of the analysis is a result of the character
$$ \left\|E[\tilde\theta_k\tilde\theta_k^T] - \Pi_k\right\| \le \delta(\mu)\,\|\Pi_k\| \tag{10} $$
where $\delta(\mu) \to 0$ as $\mu \to 0$, $\mu$ is a measure of the adaptation rate in the algorithm, and $\Pi_k$ obeys a simple linear, deterministic difference equation (like (8) without the term $\mu^2 R_4$).
The point of a result of the character (10) is, clearly, that we can arbitrarily well approximate the actual tracking error covariance matrix with a simple expression that can be easily evaluated and analyzed. The essence of this paper does not lie in the expression for $\Pi_k$ itself (it is not difficult to conjecture that such an approximation should be reasonable). Our contribution is rather to establish the connection in the explicit fashion (10) for a wide family of the most common tracking algorithms. One important step in achieving such results is to first establish that the underlying algorithm is exponentially stable. This is a major problem in itself, and a companion paper [9] is devoted to this step, for the same family of algorithms.
The paper is organized as follows. In Section 2 the tracking algorithms are briefly described. Section 3 gives the main result: that (10) holds under the same general conditions for all algorithms in the family. There we also briefly discuss the practical consequences of the result. In the following section, a more general theorem is presented, which is the basis for the analysis. This theorem is more general, and uses weaker but less explicit conditions. The proof of the main result is then given in Section 5, by showing that the general theorem can be applied to our family of algorithms. Notice that this analysis is of independent interest in that, for each individual algorithm, the conditions can be somewhat weakened in different ways.
II. The Family of Tracking Algorithms
We shall consider the general adaptation algorithm
$$ \hat\theta_{k+1} = \hat\theta_k + \mu L_k(y_k - \varphi_k^T\hat\theta_k), \qquad \mu \in (0,1) \tag{11} $$
where the gain $L_k$ is chosen in one of the following ways:
Case 1: Least Mean Squares (LMS):
$$ L_k = \varphi_k \tag{12} $$
This is a standard algorithm [21], [22], and has been used in numerous adaptive signal processing applications.
Case 2: Recursive Least Squares (RLS):
$$ L_k = P_k\varphi_k \tag{13} $$
$$ P_k = \frac{1}{1-\mu}\left[P_{k-1} - \frac{\mu P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1-\mu+\mu\varphi_k^T P_{k-1}\varphi_k}\right] \tag{14} $$
$$ P_0 > 0. \tag{15} $$
This gives an estimate $\hat\theta_k$ that minimizes
$$ \sum_{t=1}^k (1-\mu)^{k-t}(y_t - \varphi_t^T\theta)^2 $$
where $(1-\mu)$ is the "forgetting factor".
Case 3: Kalman Filter (KF) Based Algorithm:
$$ L_k = \frac{P_{k-1}\varphi_k}{R + \mu\varphi_k^T P_{k-1}\varphi_k} \tag{16} $$
$$ P_k = P_{k-1} - \frac{\mu P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{R + \mu\varphi_k^T P_{k-1}\varphi_k} + \mu Q \tag{17} $$
$$ (R > 0,\quad Q > 0). \tag{18} $$
Here $R$ is a positive number and $Q$ is a positive definite matrix. The choice of $L_k$ corresponds to Kalman filter state estimation for (1)-(2), and is optimal in the a posteriori mean square sense if $v_k$ and $w_k$ are Gaussian white noises with covariance matrices $R$ and $Q$, respectively, and if the scaling $\gamma$ in (2) is chosen as $\gamma = \mu$.
If $\{\varphi_k, y_k, \theta_k\}$ obey (1)-(2) and $\hat\theta_k$ is found using (11), we can write the estimation error $\tilde\theta_k$ as
$$ \tilde\theta_{k+1} = (I - \mu F_k)\tilde\theta_k - \mu L_k v_k + \gamma w_{k+1}, \qquad F_k = L_k\varphi_k^T. \tag{19} $$
This is a purely algebraic consequence of (1)-(2) and (11), and holds for whatever sequences $v_k$ and $w_k$.
If we introduce stochastic assumptions about $\{v_k\}$ and $\{w_k\}$, we can use (19) to express the covariance matrix $E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T]$. That will however be quite complex, primarily due to the dependence among $\{L_k, \varphi_k, \tilde\theta_k\}$. The basic approximating expression will instead be based on the following recursion:
$$ \Pi_{k+1} = (I - \mu G_k)\Pi_k(I - \mu G_k)^T + \mu^2 R_v(k)M_k + \gamma^2 Q_w(k+1) \tag{20} $$
where $G_k = EF_k$, $M_k = E[L_kL_k^T]$, $R_v(k) = Ev_k^2$ and $Q_w(k) = E[w_kw_k^T]$. As follows from Example 1.1, this would be the correct expression for the covariance matrix of $\tilde\theta_{k+1}$ if $v_k$ and $w_k$ were white noises, $L_k\varphi_k^T$ were independent of $\tilde\theta_k$, and a term of size $\mu^2\Pi_k$ were neglected.
Indeed, we shall prove that (20) provides a good approximation of the true covariance matrix in the sense that (10) holds. Note that $\Pi_k$ obeys a simple linear difference equation, and can easily be calculated and examined.
III. The Main Result
A. The Assumptions
We shall now consider the algorithm (11) with any of the three choices of the gain $L_k$ discussed in the previous section. For the analysis we shall impose some conditions on the variables involved. These are of the following character:
C1. The regressors $\{\varphi_k\}$ span the regressor space (in order to ensure that the whole parameter vector can be estimated).
C2. The dependence between the regressor $\varphi_k$ and $(\varphi_i, v_{i-1}, w_i)$ decays to zero as the time distance $(k-i)$ tends to infinity.
C3. The measurement error $v_k$ and the parameter drift $w_k$ are of white noise character.
In more exact terms, the three assumptions take the following form:
P1. Let $S_t = E[\varphi_t\varphi_t^T]$, and assume that there exist constants $h > 0$ and $\delta > 0$ such that
$$ \sum_{t=k+1}^{k+h} S_t \ge \delta I, \qquad \forall k. $$
P2. Let $\mathcal{G}_k = \sigma\{\varphi_j;\ j \ge k\}$ and $\mathcal{F}_k = \sigma\{\varphi_i, v_{i-1}, w_i;\ i \le k\}$. Assume that $\{\varphi_k\}$ is weakly dependent ($\phi$-mixing) in the sense that there is a function $\psi(m)$ with $\psi(m) \to 0$ as $m \to \infty$, such that
$$ \sup_{A\in\mathcal{G}_{k+m},\ B\in\mathcal{F}_k} |P(A\mid B) - P(A)| \le \psi(m), \qquad \forall k,\ \forall m. \tag{21} $$
Also, assume that there is a constant $c_\varphi > 0$ such that $\|\varphi_k\| \le c_\varphi$ a.s., $\forall k$.
P3. Let $\mathcal{F}_k$ be the $\sigma$-algebra defined in P2, and assume that
$$ E[v_k\mid\mathcal{F}_k] = 0, \qquad E[w_{k+1}\mid\mathcal{F}_k] = E[w_{k+1}v_k\mid\mathcal{F}_k] = 0, $$
$$ E[v_k^2\mid\mathcal{F}_k] = R_v(k), \qquad E[w_kw_k^T] = Q_w(k), $$
$$ \sup_k\left\{E[|v_k|^r\mid\mathcal{F}_k] + E\|w_k\|^r\right\} \le M $$
for some $r > 2$, $M > 0$.
B. The Result
Now, let $\Pi_k$ be defined by the following linear, deterministic difference equation:
$$ \Pi_{k+1} = (I - \mu R_kS_k)\Pi_k(I - \mu R_kS_k)^T + \mu^2 R_v(k)R_kS_kR_k + \gamma^2 Q_w(k+1) \tag{22} $$
where $S_k = E[\varphi_k\varphi_k^T]$, and $R_k$ is defined as follows:
LMS case:
$$ R_k = I \tag{23} $$
RLS case:
$$ R_k = R_{k-1} - \mu R_{k-1}S_kR_{k-1} + \mu R_{k-1} \qquad (R_0 = P_0) \tag{24} $$
KF case:
$$ R_k = R_{k-1} - \mu R_{k-1}S_kR_{k-1} + \mu Q/R \qquad (R_0 = P_0/R) \tag{25} $$
We then have the following main result.
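The deterministic recursion (22) together with the gain recursion (24) is easy to iterate. The sketch below does so for the scalar RLS case with a constant $S_k \equiv S$ (the parameter values are our own illustrative assumptions), and compares the limits with $R_k \to S^{-1}$ and the RLS stationary value given later in Section III-G.

```python
# Iterating (22) with the RLS choice (24), scalar case; values illustrative.
mu, gamma = 0.02, 0.02
S, R_v, Q_w = 2.0, 1.0, 1.0
R, Pi = 1.0, 0.0                 # R_0 = P_0 and Pi_0

for _ in range(20000):
    R = R - mu * R * S * R + mu * R                                   # (24)
    Pi = (1 - mu*R*S)**2 * Pi + mu**2 * R_v * R * S * R + gamma**2 * Q_w  # (22)

# Stationary values predicted in Section III-G (up to O(mu) corrections):
R_limit = 1.0 / S
Pi_limit = 0.5 * (mu * R_v / S + gamma**2 * Q_w / mu)
print(R, R_limit, Pi, Pi_limit)
```

The iterated $R_k$ settles at $S^{-1}$ exactly, and the iterated $\Pi_k$ agrees with the first-order stationary expression up to a relative error of order $\mu$.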
Theorem 3.1: Consider any of the three basic algorithms in Section 2. Assume that P1, P2 and P3 hold, and let $\Pi_k$ be defined as above. Then for all $\mu \in (0, \mu^*)$ and all $k \ge 1$,
$$ \left\|E[\tilde\theta_k\tilde\theta_k^T] - \Pi_k\right\| \le c\left[\mu\delta(\mu) + \mu^2 + (1-\alpha\mu)^k\right] \tag{26} $$
where $\delta(\mu) \to 0$ (as $\mu \to 0$) is defined by
$$ \delta(\mu) = \min_{m\ge 1}\left\{\sqrt{\mu}\,m + \psi(m)\right\} \tag{27} $$
$\psi(m)$ was defined in P2, and $\alpha \in (0,1)$, $\mu^* \in (0,1)$, $c > 0$ are constants which may be computed from properties of $\{\varphi_k, v_k, w_k\}$.
The proof is given in Section 5. Let us now discuss the conditions used in the above theorem.
C. The Degree of Approximation
First of all, it is clear that the quantity $\delta(\mu)$ plays an important role. The faster it tends to zero, the better the approximation. The rate by which it tends to zero is, according to (27), a reflection of how fast $\psi(m)$ (that is, the dependence among the regressors) tends to zero as $m$ increases. For example, if the regressors are $m$-dependent, so that $\varphi_k$ and $\varphi_\ell$ are independent for $|k-\ell| > m$, then $\psi(n) = 0$ for $n > m$ and $\delta(\mu)$ will behave like $\sqrt{\mu}$. Also, if the dependence is exponentially decaying ($\psi(m) \le Ce^{-\alpha m}$), then we find that
$$ \delta(\mu) < C'\mu^{0.5-\varepsilon} $$
for arbitrarily small, positive $\varepsilon$. This gives a good picture of typical decay rates of $\delta$.
D. Persistence of Excitation: Condition P1
Condition P1 is quite natural and weak, requiring only that the regressor covariance matrices add up to a full rank matrix over a given time span of arbitrary length. It is known to be a necessary condition (in a certain sense) for boundedness of $E\|\tilde\theta_k\|^2$ generated by LMS (cf. [8]); it is also known to be the minimum excitation condition needed for the stability analysis of RLS (cf. [10]).
E. Boundedness and $\phi$-Mixing of the Regressors: Condition P2
Condition P2 requires boundedness and $\phi$-mixing of the regressors. Although such conditions are standard in the literature (e.g. [11]), they can still be considered restrictive. As seen in several of the results in Section 5, both the $\phi$-mixing and the boundedness can be weakened considerably when we deal with specific algorithms.
It may also be remarked that when $\{\varphi_k\}$ is unbounded, we can modify the algorithm so that Theorem 3.1 still holds true. Introduce the normalized signals
$$ (\bar y_k, \bar\varphi_k, \bar v_k) = \frac{1}{\sqrt{1+\|\varphi_k\|^2}}\,(y_k, \varphi_k, v_k). $$
Then we have from (1)
$$ \bar y_k = \theta_k^T\bar\varphi_k + \bar v_k. $$
Thus $\{\theta_k\}$ may be estimated based on this normalized linear regression, and Theorem 3.1 can be applied to this case if only $S_k$ and $R_v(k)$ in (22)-(25) are replaced by
$$ E\left[\frac{\varphi_k\varphi_k^T}{1+\|\varphi_k\|^2}\right] \quad\text{and}\quad E\left[\frac{1}{1+\|\varphi_k\|^2}\right]R_v(k), $$
respectively.
F. The Parameter Drift Model: Condition P3
There are two things to mention about Condition P3. First, we note that the martingale difference property of $w_k$ essentially means that the true parameters, according to the model (2), are assumed to follow a random walk. Although this model is quite standard, it has also been criticized as being too restrictive. We believe that a random walk model, in the context of slow adaptation (small $\mu$), captures the tracking behavior of the algorithm very well. This is, in a sense, a worst case analysis, since the future behavior of the model is unpredictable.
We may also note that time-varying covariances $Q_w(k)$ and $R_v(k)$ are allowed. Several of the special model drift cases described in [12] are therefore covered by P3. Other drift models, where the driving noise is colored, can be put into a similar Kalman filter framework. However, covering that case with our techniques requires more work.
Condition P3 also introduces assumptions about moments higher than 2. We remark that if we only assume that $\{v_k\}$ and $\{w_k\}$ are bounded in, e.g., the mean square sense, then upper bounds for the mean square tracking errors can be established (cf. [8] and [7]). The strengthened assumption in P3 allows us to obtain performance values that are much more accurate than upper bounds.
G. The Practical Use of the Theorem
The practical consequence of Theorem 3.1 is that a very simple object, the linear, deterministic difference equation (22), will describe the tracking behavior. Now, this equation is quite easy to analyze. In fact, there is an extensive literature on such analysis, in particular for the special case of LMS. Among many references, we may refer to [12] for a survey of such results. In essence, all these results capture the dilemma between the tracking error ($\Pi$ is large because $\mu$ is small) and the noise sensitivity ($\Pi$ is large because $\mu$ is large), and may point to the best compromises between these requirements.
For example, under weak stationarity of the regressors, $S_k \equiv S$, we find that $R_k$ converges to $\tilde R$ as $k \to \infty$, where $\tilde R = I$ in the LMS case, $\tilde R = S^{-1}$ in the RLS case, and in the KF case $\tilde R$ solves
$$ \tilde RS\tilde R = Q/R. $$
Inserted into (22), this gives the following stationary values $\Pi$ for the tracking error covariance matrix (neglecting terms of order $\mu^2$):
LMS: $\ \mu(\Pi S + S\Pi) = \mu^2 R_vS + \gamma^2 Q_w$
RLS: $\ \Pi = \frac{1}{2}\left[\mu R_vS^{-1} + (\gamma^2/\mu)Q_w\right]$
KF: $\ \mu\left[\tilde RS\Pi + (\tilde RS\Pi)^T\right] = \mu^2 R_vQ/R + \gamma^2 Q_w$
Note that if $Q = Q_w$ and $R = R_v$, then the latter equation can be solved as
$$ \Pi = \frac{R}{2}\left(\mu + \frac{\gamma^2}{\mu}\right)\tilde R. $$
From these expressions the trade-offs between tracking ability and noise sensitivity are clearly visible.
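The trade-off can be made concrete: in the scalar RLS expression the first term grows with $\mu$ (noise sensitivity) and the second grows as $\mu$ shrinks (tracking lag), and the two balance at $\mu^* = \gamma\sqrt{Q_wS/R_v}$. The numbers below are our own illustrative assumptions.

```python
# Tracking/noise trade-off for the scalar RLS stationary value
# Pi(mu) = (1/2)(mu R_v / S + (gamma^2/mu) Q_w); values illustrative.
import math

gamma, S, R_v, Q_w = 0.01, 1.0, 1.0, 1.0

def Pi(mu):
    return 0.5 * (mu * R_v / S + gamma**2 * Q_w / mu)

# Grid search versus the closed-form balance point.
grid = [i * 1e-4 for i in range(1, 2000)]
mu_best = min(grid, key=Pi)
mu_star = gamma * math.sqrt(Q_w * S / R_v)
print(mu_best, mu_star, Pi(mu_best))
```

The grid minimizer coincides with the analytic balance point, illustrating how (22) can be used directly for step-size design.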
IV. A General Theorem
In this section, we shall present a general theorem on the performance of the tracking algorithm (11) when the gain $L_k$ is not specified, from which our main result, Theorem 3.1, will follow. The general theorem has weaker, but less explicit, assumptions. From now on the treatment and discussion will be more technical. However, the main line of thought in the proofs follows the outline given after Example 1.1 in the Introduction.
A. Notations
The following notations will be used in the remainder of the paper. They are the same as in the companion paper [9].
a) The minimum and maximum eigenvalues of a matrix $X$ are denoted by $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$, respectively, and
$$ \|X\| \triangleq \left\{\lambda_{\max}(XX^T)\right\}^{1/2}, \qquad \|X\|_p \triangleq \left\{E(\|X\|^p)\right\}^{1/p}, \quad p \ge 1. $$
b) Let $x = \{x_k(\mu),\ k \ge 1\}$ be a random sequence parameterized by $\mu \in (0,1)$. Denote
$$ L_p(\mu^*) = \Big\{x :\ \sup_{\mu\in(0,\mu^*]}\ \sup_{k\ge 1}\ \|x_k(\mu)\|_p < \infty\Big\}. \tag{28} $$
c) Let $F = \{F_k(\mu)\}$ be any (square) matrix random process parameterized by $\mu \in (0,1)$. For any $p \ge 1$ and $\mu^* \in (0,1)$, define
$$ S_p(\mu^*) = \Big\{F :\ \Big\|\prod_{j=i+1}^k (I - \mu F_j(\mu))\Big\|_p \le M(1-\alpha\mu)^{k-i},\ \forall\mu\in(0,\mu^*],\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \text{ and } \alpha\in(0,1)\Big\}; $$
similarly,
$$ S(\mu^*) = \Big\{F :\ \Big\|\prod_{j=i+1}^k (I - \mu E[F_j(\mu)])\Big\| \le M(1-\alpha\mu)^{k-i},\ \forall\mu\in(0,\mu^*],\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \text{ and } \alpha\in(0,1)\Big\}. $$
In what follows, it will be convenient to introduce the sets
$$ S_p \triangleq \bigcup_{\mu^*\in(0,1)} S_p(\mu^*), \qquad S \triangleq \bigcup_{\mu^*\in(0,1)} S(\mu^*). \tag{29} $$
We may call these stability sets. They are related to the stability of the random equation (19) and of the deterministic equation (20), respectively. For simplicity, we shall sometimes suppress the parameter $\mu$ in $F_k(\mu)$ when there is no risk of confusion.
d) For scalar random sequences $a = (a_k,\ k \ge 0)$, we set
$$ S^0(\lambda) = \Big\{a :\ a_k \in [0,1],\ E\prod_{j=i+1}^n (1-a_j) \le M\lambda^{n-i},\ \forall n \ge i \ge 0,\ \text{for some } M > 0\Big\}. $$
Also,
$$ S^0 \triangleq \bigcup_{\lambda\in(0,1)} S^0(\lambda). \tag{30} $$
e) Let $p \ge 1$ and let $x \triangleq \{x_i\}$ be any random process. Set
$$ \mathcal{M}_p = \Big\{x :\ \Big\|\sum_{i=m+1}^{m+n} x_i\Big\|_p \le C_p\,n^{1/2},\ \forall n \ge 1,\ m \ge 0,\ \text{for some } C_p \text{ depending only on } p \text{ and } x\Big\}. $$
As is known, for example from [10], martingale difference sequences, $\phi$- and $\alpha$-mixing sequences, and linear processes (processes generated from a white noise source via a linear filter with absolutely summable impulse response) all belong to the set $\mathcal{M}_p$.
In particular, when $\{x_i\}$ is a martingale difference sequence, by the Burkholder inequality we have, for $p > 1$,
$$ \Big\|\sum_{i=m+1}^{m+n} x_i\Big\|_p \le (B_p\,\bar x_p)\,n^{1/2}, \qquad \forall n \ge 1,\ m \ge 0 \tag{31} $$
where $\bar x_p \triangleq \sup_k \|x_k\|_p$, and $B_p$ is a constant depending on $p$ only (cf. [11]). (This fact will be used frequently in the sequel without further explanation.)
f) Let $\{A_k\}$ be a matrix sequence and $b_k \ge 0$, $\forall k \ge 0$. Then by $A_k = O(b_k)$ we mean that there exists a constant $M < \infty$ such that
$$ \|A_k\| \le Mb_k, \qquad \forall k \ge 0. $$
The constant $M$ may be called the ordo-constant. Throughout the sequel, the ordo-constant does not depend on $\mu$, even if $\{A_k\}$ or $\{b_k\}$ does.
B. Assumptions
We will first show how, given exponential stability of the homogeneous part of (19) and a certain weak dependence property of the adaptation gains, the tracking performance can be analyzed; more detailed discussions of these properties are then presented.
In the sequel, unless otherwise stated, $\mathcal{F}_k$ denotes the $\sigma$-algebra generated by $\{\varphi_i, w_i, v_{i-1};\ i \le k\}$, and $\{F_k\}$ is defined in (19).
To establish the general theorem, we need the following assumptions:
(A1) (Exponential stability): There are $\mu^* \in (0,1)$ and $p \ge 2$ such that
$$ \{F_k\} \in S_p(\mu^*) \cap S(\mu^*). $$
(A2) (Weak dependence): There is a real number $q \ge 3$ together with a bounded function $\psi_\mu(m) \ge 0$ satisfying
$$ \lim_{\mu\to 0}\ \lim_{m\to\infty}\ \psi_\mu(m) = 0 $$
(taking first $m$ to infinity and then $\mu$ to zero) such that for all $m$, all $k$ and all $\mu \in (0,\mu^*]$,
$$ \left\|E[F_k\mid\mathcal{F}_{k-m}] - E[F_k]\right\|_q \le \psi_\mu(m). $$
(A3): $L_i \in \mathcal{F}_i$, $\forall i \ge 1$, and there is $\mu^* \in (0,1)$ such that
$$ \{L_i\} \in L_r(\mu^*), \qquad \{F_i\} \in L_{2q}(\mu^*) $$
with $r = \left(\tfrac{1}{2} - \tfrac{1}{p} - \tfrac{3}{2q}\right)^{-1}$, and with $p$ and $q$ defined as in (A1) and (A2).
(A4): For all $k \ge 1$ we have
$$ E[v_k\mid\mathcal{F}_k] = 0, \qquad E[w_{k+1}\mid\mathcal{F}_k] = E[w_{k+1}v_k\mid\mathcal{F}_k] = 0, $$
$$ E[v_k^2\mid\mathcal{F}_k] = R_v(k), \qquad E[w_{k+1}w_{k+1}^T] = Q_w(k+1), $$
$$ E[|v_k|^r\mid\mathcal{F}_k] + E\|w_{k+1}\|^r \le M < \infty, \qquad \forall k \ge 1, $$
for deterministic quantities $R_v(k)$, $Q_w(k+1)$ and $M$, where $r$ is defined as in (A3).
The key conditions are (A1) and (A2). In general, (A1) can be guaranteed by a certain type of stochastic persistence of excitation condition, which is studied in the companion paper [9], while (A2) can be guaranteed by imposing a certain weak dependence condition on the regressors $\{\varphi_i\}$. More detailed discussions will be given later. At the moment, we just remark that if (A1) and (A2) hold for all $p \ge 1$ and all $q \ge 1$, then in (A3) and (A4) the number $r$ need only satisfy $r > 2$.
C. The General Theorem
Now, recursively dene a matrix sequence
f^ k
gas fol- lows:
^ k
+1= ( I
;E F k ])^ k ( I
;E F k ])
+
2R v ( k ) E L k L k ] +
2Q w ( k + 1) (32) where ^
0= E
e0e0
], and R v ( k ) and Q w ( k +1) are dened in Assumption (A4). Note that this denition is very close to the denition of k in (22). We now have a result that is the "mother-theorem" of Theorem 3.1:
Theorem 4.1 Let Assumptions (A1)-(A4) hold. Let the tracking error
ek be dened by (11) (or (19)), and let ^ k
dened by (32). Then
82(0
]
8k 1:
k
E
ek
+1ek
+1]
;^ k
+1kc ( ) +
2+ (1
;) k ] where c > 0 and
2(0 1) are constants and ( ) is a function that tends to zero as tends to zero. It is dened by ( )
4= min m
1 fp
m + ( m )
g: The proof is given in Appendix A.
Next, we show that under further conditions, the expression for $\hat\Pi_k$ in (32) can be simplified.
Corollary 4.1: Under the conditions of Theorem 4.1, if $F_k = P_k\varphi_k\varphi_k^T$ with $\left\|\,\|\varphi_k\|^2\right\|_t = O(1)$ and $\|F_k\|_t = O(1)$ for some $t > 1$, and if there are a function $\eta(\mu)$, tending to zero as $\mu$ tends to zero, and a deterministic sequence $\{R_k\}$ such that
$$ \|P_k - R_k\|_s = O(\eta(\mu)), \qquad \forall k,\ \forall\mu\in(0,\mu^*], \quad s = (1 - t^{-1})^{-1}, $$
then for all $\mu \in (0,\mu^*]$ and all $k \ge 1$,
$$ \left\|E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] - \Pi_{k+1}\right\| \le c\left\{\mu[\delta(\mu) + \eta(\mu)] + \mu^2 + (1-\alpha\mu)^k\right\} \tag{33} $$
for some constants $c > 0$ and $\alpha \in (0,1)$, where $\Pi_k$ is recursively defined by
$$ \Pi_{k+1} = (I - \mu R_kS_k)\Pi_k(I - \mu R_kS_k)^T + \mu^2 R_v(k)R_kS_kR_k + \gamma^2 Q_w(k+1) \tag{34} $$
with $S_k = E[\varphi_k\varphi_k^T]$ and $\Pi_0 = \hat\Pi_0$.
Proof: By Theorem 4.1, we need only show that
$$ \|\hat\Pi_{k+1} - \Pi_{k+1}\| = O\left(\mu\eta(\mu) + \mu^2 + (1-\alpha\mu)^k\right). $$
This can be derived by straightforward calculations based on the equations for $\hat\Pi_k$ and $\Pi_k$, and hence Corollary 4.1 is true. $\blacksquare$
Remark: If, in Condition (A2), $\psi_\mu(m) = O(\psi(m) + \eta(\mu))$, and we set $\bar\delta(\mu) = \min_{m\ge 1}[\sqrt{\mu}\,m + \psi(m)]$, then $\delta(\mu)$ defined in Theorem 4.1 satisfies $\delta(\mu) = O(\bar\delta(\mu) + \eta(\mu))$. This will be the case for the RLS and KF algorithms in Theorem 3.1, as can be seen from Section V.
The following result also follows directly from Theorem 4.1.
Corollary 4.2: If, in addition to the conditions of Theorem 4.1, $R_v(k) \equiv R_v$, $Q_w(k) \equiv Q_w$, and there are matrices $F$, $G$ and a function $\eta(\mu)$, tending to zero as $\mu$ tends to zero, such that for all $\mu \in (0,\mu^*]$,
$$ \|EF_k - F\| + \|E(L_kL_k^T) - G\| \le \eta(\mu), \qquad \forall k, $$
then for some $\alpha \in (0,1)$ and for all $\mu \in (0,\mu^*]$, $k \ge 1$,
$$ E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] = \Pi + O\left(\mu[\delta(\mu) + \eta(\mu)] + \mu^2\right) + O\left((1-\alpha\mu)^k\right) \tag{35} $$
where $\Pi$ satisfies the following Lyapunov equation:
$$ \mu(F\Pi + \Pi F^T) = \mu^2 R_vG + \gamma^2 Q_w. \tag{36} $$
Now denote
$$ \bar\Pi_{R_v} = R_v\int_0^\infty e^{-Ft}Ge^{-F^Tt}\,dt, \qquad \bar\Pi_{Q_w} = \int_0^\infty e^{-Ft}Q_we^{-F^Tt}\,dt; $$
then the solution of the Lyapunov equation (36) can be expressed as
$$ \Pi = \mu\bar\Pi_{R_v} + \frac{\gamma^2}{\mu}\bar\Pi_{Q_w}, $$
in which there is a reminiscence of the result obtained in the simple example discussed in Section 1 (see (9)).
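In the scalar case both integrals evaluate in closed form ($\int_0^\infty e^{-2Ft}\,dt = 1/(2F)$), so the decomposition of $\Pi$ can be verified directly against (36). The numbers below are our own illustrative choices.

```python
# Scalar sanity check of Pi = mu*Pi_Rv + (gamma^2/mu)*Pi_Qw solving the
# Lyapunov equation (36); F, G, R_v, Q_w are illustrative scalars.
mu, gamma = 0.05, 0.05
F, G, R_v, Q_w = 2.0, 1.5, 1.0, 0.8

# In the scalar case the integrals reduce to 1/(2F).
Pi_Rv = R_v * G / (2 * F)
Pi_Qw = Q_w / (2 * F)
Pi = mu * Pi_Rv + (gamma**2 / mu) * Pi_Qw

# Check (36): mu (F Pi + Pi F) = mu^2 R_v G + gamma^2 Q_w.
lhs = mu * (F * Pi + Pi * F)
rhs = mu**2 * R_v * G + gamma**2 * Q_w
print(lhs, rhs)
```

The two sides agree exactly, mirroring the split of (9) into a noise-sensitivity part (proportional to $\mu$) and a tracking part (proportional to $\gamma^2/\mu$).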
D. Discussion of the Assumptions
Now, let us discuss the key assumptions (A1) and (A2). First, Assumption (A1) has been studied in the companion paper [9]; here we only give some results concerning $\{F_k\} \in S$, which will be used shortly in the next section.
Proposition 4.1: Let $\{G_k\}$ be a random matrix process, possibly dependent on $\mu$, with the property
$$ E\|G_k\| \le \eta(\mu) \quad\text{for all small } \mu \text{ and all } k \tag{37} $$
where $\eta(\mu) \to 0$ as $\mu \to 0$. Then
$$ \{F_k\} \in S \iff \{F_k + G_k\} \in S. $$
Proof: (Sufficiency) Recursively define, for any $x$ with $\|x\| = 1$,
$$ x_{k+1} = (I - \mu E[F_k + G_k])x_k, \quad \forall k \ge m, \qquad x_m = x. $$
Then
$$ x_{k+1} = (I - \mu E(F_k))x_k - \mu E(G_k)x_k, $$
so that
$$ x_{n+1} = \prod_{i=m}^n\left[I - \mu E(F_i)\right]x_m - \mu\sum_{i=m}^n\,\prod_{j=i+1}^n\left[I - \mu E(F_j)\right]E(G_i)x_i. $$
Consequently, similarly to the proof of Theorem 3.1 in [9], by the Gronwall inequality we have
$$ \|x_{n+1}\| \le 2M(1-\alpha\mu)^{n-m+1}\Big\{1 + \sum_{i=m}^n\,\prod_{j=i+1}^n\big(1 + \mu E\|G_j\|\big)\,\mu E\|G_i\|\Big\}. $$
From this and the condition (37), it is not difficult to convince oneself that $\{F_k + G_k\} \in S$.
(Necessity) By the fact just proved, and noting that $F_k = (F_k + G_k) - G_k$, we know that $\{F_k\} \in S$. This completes the proof. $\blacksquare$
The following useful result follows immediately from Proposition 4.1.
Proposition 4.2: Let $F_k = P_kH_k$ and let the following conditions be satisfied:
(i) $\{H_k\} \in L_t(\mu^*)$, $\mu^* \in (0,1)$, $t \ge 1$;
(ii) $\|P_k - \bar P_k\|_s \le \eta(\mu)$, $\forall\mu \in (0,\mu^*]$, where $\eta(\mu) \to 0$ as $\mu \to 0$, $s = (1 - t^{-1})^{-1}$, and $\{\bar P_k\}$ is a deterministic process.
Then
$$ \{F_k\} \in S \iff \{\bar P_kH_k\} \in S. $$
Proof: The result follows directly from Proposition 4.1 if we note that
$$ F_k = \bar P_kH_k + (P_k - \bar P_k)H_k. \qquad\blacksquare $$
We now turn to the weak dependence condition (A2).
Example 4.1: Let $\{\varphi_i\}$ satisfy (21), and let $L(\cdot): \mathbb{R}^d \to \mathbb{R}^{d\times d}$ be a real matrix function with $\|L(\varphi_k)\|_q = O(1)$ for some $1 \le q \le \infty$. Then we have the following inequality (cf. [19]):
$$ \left\|E[L(\varphi_k)\mid\mathcal{F}_{k-m}] - EL(\varphi_k)\right\|_q = O\left([\psi(m)]^{1-\frac{1}{q}}\right), \qquad \forall k, m. \tag{38} $$
Hence, if $F_k = L(\varphi_k)$, then condition (A2) holds.
Note that when $\{\varphi_k\}$ satisfies condition P2 in Section 3, we have, by taking $q = \infty$ in (38),
$$ \left\|E[\varphi_k\varphi_k^T\mid\mathcal{F}_{k-m}] - E[\varphi_k\varphi_k^T]\right\|_\infty = O(\psi(m)). \tag{39} $$
This fact will be used in the next section in the proof of Theorem 3.1. $\square$
Example 4.2: Let $\{\varphi_k\}$ be generated by
$$ x_k = Ax_{k-1} + B\xi_k \quad (A \text{ stable}), \qquad \varphi_k = Cx_k + \xi_k, $$
where $\{\xi_j;\ j \ge k+1\}$ and $\{v_{j-1}, w_j;\ j \le k\}$ are independent, and $\{\xi_j\}$ is an independent sequence. Assume that
$$ \sup_k E\|\xi_k\|^{(b+1)q} < \infty \quad\text{for some } b \ge 0,\ q \ge 1. $$
Then for any function $L(\cdot): \mathbb{R}^d \to \mathbb{R}^{d\times d}$ with
$$ \|L(x) - L(x')\| \le M(\|x\| + \|x'\| + 1)^b\,\|x - x'\|, \qquad \forall x, x', $$
there is a constant $\lambda \in (0,1)$ such that (cf. [14]), for all $m \ge 0$ and $k \ge 0$,
$$ \left\|E[L(\varphi_{k+m})\mid\mathcal{F}_k] - EL(\varphi_{k+m})\right\|_q = O(\lambda^m). $$
Hence, if $F_k = L(\varphi_k)$, then again condition (A2) holds. $\square$
The following simple result will be useful in the sequel.
Proposition 4.3: Let $F_k = P_kL(\varphi_k)$, and let the following two conditions hold:
(i) There is a bounded deterministic matrix sequence $\{\bar P_k\}$, and a function $\eta(\mu)$ tending to zero as $\mu$ tends to zero, such that
$$ \|P_k - \bar P_k\|_s \le \eta(\mu), \qquad \forall\mu\in(0,\mu^*], \quad\text{for some } s > 1. $$
(ii) There is a number $r > 1$ such that $\|L(\varphi_k)\|_r = O(1)$, together with a function $\psi(m)$ tending to zero as $m$ tends to infinity, such that
$$ \left\|E[L(\varphi_{k+m})\mid\mathcal{F}_k] - EL(\varphi_{k+m})\right\|_q \le \psi(m), \qquad \forall k,\ \forall m \quad \left(q = (r^{-1} + s^{-1})^{-1}\right). $$
Then condition (A2) holds with $\psi_\mu(m) = O(\psi(m) + \eta(\mu))$.
Proof: The result follows directly from the following identity:
$$ E[F_{k+m}\mid\mathcal{F}_k] - EF_{k+m} = E\left[(P_{k+m} - \bar P_{k+m})L(\varphi_{k+m})\mid\mathcal{F}_k\right] - E\left[(P_{k+m} - \bar P_{k+m})L(\varphi_{k+m})\right] + \bar P_{k+m}\left\{E[L(\varphi_{k+m})\mid\mathcal{F}_k] - EL(\varphi_{k+m})\right\}. \qquad\blacksquare $$
V. Analysis of the Basic Algorithms
In this section we shall show that, for the basic LMS, RLS and KF algorithms, conditions (A1)-(A3) of the previous section can be guaranteed by imposing some explicit (stochastic excitation and weak dependence) conditions on the regressors $\{\varphi_k\}$, and at the same time we prove Theorem 3.1.
A. Analysis of LMS
For the LMS algorithm defined by (11)-(12), let us introduce the following two kinds of weak dependence conditions:
L1) Condition P2 of Section 3 is satisfied, but with the boundedness condition on $\{\varphi_k\}$ relaxed to the following: there exist positive constants $\varepsilon$, $M$ and $K$ such that
$$ E\exp\Big\{\varepsilon\sum_{j=i+1}^n\|\varphi_j\|^2\Big\} \le M\exp\{K(n-i)\}, \qquad \forall n \ge i \ge 0. $$
L1') The random process $F_k \triangleq \varphi_k\varphi_k^T$ has the following expansion:
$$ F_k = \sum_{j=0}^\infty A_jZ_{k-j} + D_k, \qquad \sum_{j=0}^\infty\|A_j\| < \infty, $$
where $\{Z_k\}$ is an independent process such that $\{Z_j;\ j \ge k+1\}$ and $\{v_{j-1}, w_j;\ j \le k\}$ are independent, and which satisfies
$$ \sup_k E\exp\left\{\varepsilon\|Z_k\|^{1+\delta}\right\} < \infty \quad\text{for some } \varepsilon > 0,\ \delta > 0, $$
and where $\{D_k\}$ is a bounded deterministic process.
Theorem 5.1: Let Conditions P1 and P3 of Section 3 be satisfied. If either L1) or L1') above holds, then Conditions (A1)-(A4) of Theorem 4.1 hold (for all $p \ge 1$, $q \ge 1$) and Theorem 3.1 is true for the LMS case.
Proof: First, in the LMS case, Conditions P1 and L1) (or L1')) ensure that Condition (A1) of Theorem 4.1 holds for all $p \ge 1$ (cf. [9], Theorem 3.3). Next, when L1) holds, by Example 4.1 we know that Condition (A2) is true for all $q \ge 1$. Also, when L1') holds, by the assumed independence we have, for all $q \ge 1$,
$$ \left\|E[F_k\mid\mathcal{F}_{k-m}] - EF_k\right\|_q = \Big\|\sum_{j=m}^\infty\left[A_jZ_{k-j} - E(A_jZ_{k-j})\right]\Big\|_q = O\Big(\sum_{j=m}^\infty\|A_j\|\Big), \qquad \forall m \ge 1. $$
Hence (A2) holds again for all $q \ge 1$.
Moreover, Conditions (A3) and (A4) obviously hold in the present case. Finally, by (39), the result of Theorem 3.1 (in the LMS case) follows directly from Theorem 4.1. This completes the proof. $\blacksquare$
B. Analysis of RLS
For the RLS algorithm defined by (11), (13) and (14), let us introduce the following two kinds of excitation conditions:
R1) There exist constants $h > 0$, $c > 0$ and $\varepsilon > 0$ such that
$$ P\Big(\lambda_{\min}\Big(\sum_{i=k+1}^{k+h}\varphi_i\varphi_i^T\Big) \ge c\ \Big|\ \mathcal{F}_k\Big) > \varepsilon, \qquad \forall k. $$
R1') There exists $h > 0$ such that
$$ \sup_k E\Big[\lambda_{\min}\Big(\sum_{i=k+1}^{k+h}\varphi_i\varphi_i^T\Big)\Big]^{-t} < \infty, \qquad \forall t \ge 1. $$
The following weak dependence condition will also be used:
R2) There exists a number $t \ge 5$ such that $\|\varphi_k\|_{4t} = O(1)$, and
$$ \left\|E[\varphi_k\varphi_k^T\mid\mathcal{F}_{k-m}] - E[\varphi_k\varphi_k^T]\right\|_{2t} \le \psi(m), \qquad \forall k, m, $$
where $\psi(m) \to 0$ as $m \to \infty$.
Remark 5.1: Detailed discussions and investigations of the first two conditions above can be found in [10] and [17]. It has been shown in [10] that if Condition P1 and (21) in Section 3 hold, then R1) is true. Also, if $\{\varphi_k\}$ is generated by a linear state space model as in Example 4.2, then R1') can be verified (cf. [17]). Moreover, Condition R2) was discussed in the last section.
Theorem 5.2: Let Conditions R1) (or R1')) and R2) above be satisfied. Then Conditions (A1)-(A3) of Theorem 4.1 hold (for any $p < 2t$, $q < t$) and Theorem 3.1 is true for the RLS case.
Proof: First, note that
$$ \prod_{j=i+1}^k (I - \mu F_j) = (1-\mu)^{k-i}P_{k+1}P_{i+1}^{-1}, \qquad \forall k \ge i, \tag{40} $$
and
$$ P_k^{-1} = (1-\mu)P_{k-1}^{-1} + \mu\varphi_k\varphi_k^T. \tag{41} $$
From this and condition R2) it follows that
$$ \|P_k^{-1}\|_{2t} = O(1), \qquad \forall\mu\in(0,1). \tag{42} $$
Also, by Theorem 1 in [10], there is $\mu^* \in (0,1)$ such that
$$ \{P_k\} \in L_s(\mu^*), \qquad \forall s \ge 1. \tag{43} $$
Combining (40), (42) and (43), we get
$$ \{F_k\} \in S_p, \qquad \forall p < 2t. \tag{44} $$
Now, define ($\bar P_0 = P_0$)
$$ \bar P_k^{-1} = (1-\mu)\bar P_{k-1}^{-1} + \mu E(\varphi_k\varphi_k^T). \tag{45} $$
Since either R1) or R1') implies P1 in Section 3 (cf. [10]), by a similar (actually simpler) argument to that used in the proof of (43), we know that $\|\bar P_k\| = O(1)$. We next prove that
$$ \|P_k^{-1} - \bar P_k^{-1}\|_{2t} = O(\delta(\mu)), \qquad \delta(\mu) = \min_{m\ge 1}\left\{\sqrt{\mu}\,m + \psi(m)\right\}. \tag{46} $$
First, by (41) and (45),
$$ P_k^{-1} - \bar P_k^{-1} = \mu\sum_{i=1}^k (1-\mu)^{k-i}\left[\varphi_i\varphi_i^T - E\varphi_i\varphi_i^T\right]. \tag{47} $$
j ( i ) = E ' i ' i
jFi
;j ]
;E ' i ' i
jFi
;j
;1] 0
j
m
;1 we have
' i ' i
;E' i ' i
= m
X;1j
=0j ( i ) +
fE ' i ' i
jFi
;m ]
;E ' i ' i ]
g(48) Now, since for each j , the sequence
fj ( i ) i 1
gis a mar- tingale dierence, we can apply Lemma A.2 in the Ap- pendix to each such
fj ( i ) i 1
gto obtain
kXk
i
=1(1
;) k
;i m
X;1j
=0j ( i )
k2t = O (
pm ) (49) Also, by our assumption
kXk
i
=1(1
;) k
;i
fE ' i ' i
jFi
;m ]
;E ' i ' i ]
gk2t
( m ) Hence, (46) follows from (47)-(50) immediately. (50)
Similarly to the proof of (44), it is evident that
$$ \{\bar P_k\varphi_k\varphi_k^T\} \in S. \tag{51} $$
Now,
$$ \|P_k - \bar P_k\| \le \|P_k\|\cdot\|P_k^{-1} - \bar P_k^{-1}\|\cdot\|\bar P_k\|; $$
from this, (43) and (46) it follows that
$$ \|P_k - \bar P_k\|_s = O(\delta(\mu)), \qquad \forall s < 2t \ \text{(for small } \mu\text{)}. \tag{52} $$
Hence, by Proposition 4.2 and (51), we know that $\{F_k\} \in S$. This, in conjunction with (44), verifies Condition (A1).
Now, by (52) and R2), it is evident from Proposition 4.3 that Condition (A2) holds for any $q < t$.
To prove (A3), first note that for any $q < t$, (44) implies
$$ \{F_k\} \in L_{2q}(\mu^*) \quad\text{for some } \mu^* > 0. $$
So we need only prove that $\{L_i\} \in L_r(\mu^*)$ for
$$ r > \Big(\frac{1}{2} - \frac{1}{2t} - \frac{3}{2t}\Big)^{-1} = \frac{2t}{t-4}. $$
This is true, since by (43) and $\|\varphi_k\|_{4t} = O(1)$,
$$ \{L_i\} = \{P_i\varphi_i\} \in L_r(\mu^*), \qquad \forall r < 4t, $$
and since $4t > 2t/(t-4)$ for $t \ge 5$. Hence (A3) holds.
Thus, by taking $t = \infty$ in the above argument, we see that Conditions (A1) and (A2) hold for all $p \ge 1$ and all $q \ge 1$. Hence Theorem 4.1 can be applied to prove Theorem 3.1 for the RLS case, while the expression for $\Pi_k$ will follow from Corollary 4.1 if we can prove that
$$ \|P_k - R_k\|_s = O(\delta(\mu)), \qquad s = \frac{t}{t-1}, \tag{53} $$
where $P_k$ and $R_k$ are defined by (14) and (24), respectively.
Furthermore, by (52), it is clear that (53) will be true if
$$ \|R_k - \bar P_k\| = O(\delta(\mu)) $$
holds. However, this can be verified by using the definitions of $R_k$ and $\bar P_k$ (see Appendix B). Hence the proof is complete. $\blacksquare$
C. Analysis of the KF Algorithm
Among the three basic algorithms described in Section 2, the KF algorithm defined by (11), (16) and (17) is the most complicated one to analyze. Let us now introduce the following two conditions on stochastic excitation and weak dependence.
K1) There are constants $h > 0$ and $\lambda \in (0,1)$ (independent of $\mu$) such that
$$ \Big\{\frac{\mu\alpha_k}{1 + \mu b_{kh+1}}\Big\} \in S^0(\lambda), $$
where $S^0(\lambda)$ is defined by (30), and $\alpha_k$ and $b_k$ are defined as follows ($\mathcal{G}_k$ is, as before, the $\sigma$-algebra generated by $\{\varphi_i;\ i \le k\}$):
$$ \alpha_k \triangleq \lambda_{\min}\,E\Big[\frac{1}{1+h}\sum_{i=kh+1}^{(k+1)h}\frac{\varphi_i\varphi_i^T}{1+\|\varphi_i\|^2}\ \Big|\ \mathcal{G}_{kh}\Big], $$
$$ b_k = (1-\mu)b_{k-1} + \mu(\|\varphi_k\|^2 + 1), \qquad \mu\in(0,1). $$
K2) There exists a number $t \ge 7$, together with a function $\psi(m) \to 0$ (as $m \to \infty$), such that $\|\varphi_k\|_{4t} = O(1)$, and that