Performance Analysis of General Tracking Algorithms
Lei Guo
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China
Lennart Ljung
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
Abstract: A general family of tracking algorithms for linear regression models is studied. It includes the familiar LMS (gradient approach), RLS (recursive least squares) and KF (Kalman filter) based estimators. The exact expressions for the quality of the obtained estimates are complicated. Approximate, and easy-to-use, expressions for the covariance matrix of the parameter tracking error are developed. These are applicable over the whole time interval, including the transient, and the approximation error can be explicitly calculated.
I. Introduction
Tracking is the key factor in adaptive algorithms of all kinds. We shall in this contribution study the special case where the underlying model is a linear regression, i.e., the observations are related by
$$ y_k = \varphi_k^T\theta_k + v_k, \qquad k \ge 0. \tag{1} $$
Here $y_k$ is an observation made at time $k$, $\varphi_k$ is a $d$-dimensional vector that is known at time $k$, $v_k$ represents a disturbance, and the parameter vector $\theta_k$ describes how the components of $\varphi_k$ relate to the observation $y_k$. The objective is to estimate the vector $\theta_k$ from the measurements $\{y_t, \varphi_t;\ t \le k\}$.
Many technical problem formulations fit the structure (1) by choosing $\varphi_k$ and $y_k$ appropriately. See, among many references, for example [15] and [22].
In order to come up with good algorithms for estimating $\theta_k$, it is natural to introduce some assumptions about the time variation of this parameter vector. In general we may write
$$ \theta_k = \theta_{k-1} + \gamma w_k \tag{2} $$
where $\gamma$ is a scaling constant and $w_k$ is an as yet undefined variable.
The tracking algorithms will provide us with an estimate
$$ \hat\theta_k = \hat\theta_k(y^k, \varphi^k) \tag{3} $$
where the superscript denotes the whole time history: $y^k = \{y_0, y_1, \ldots, y_k\}$, etc.
L. Guo was supported by the National Natural Science Foundation of China. L. Ljung was supported by the Swedish Research Council for Engineering Sciences (TFR).
A prime question concerns of course the quality of such an estimate. We shall evaluate the quality in terms of the covariance matrix of the tracking error
$$ \tilde\theta_k = \theta_k - \hat\theta_k. \tag{4} $$
This covariance matrix will be denoted by
$$ \Pi^0_k = E[\tilde\theta_k\tilde\theta_k^T] \tag{5} $$
where the expectation is taken over all relevant stochastic variables. A precise definition will be given later.
An exact expression for $\Pi^0_k$ will be very complicated (except in some trivial cases) and it will not be possible to derive it explicitly in closed form. However, the practical importance of having good tracking algorithms and estimates of their quality still makes it vital to be able to work with $\Pi^0_k$.
For that reason, there is a quite substantial literature on the problem of how to approximate $\Pi^0_k$ with expressions $\Pi_k$ that are simple to work with. This literature is partly surveyed in [2], [1], [12], and [20].
The current paper has the ambition to give a general result that subsumes and extends most of the earlier results.
Example 1.1 (A Preview Example): Consider the model (1)-(2) under the assumptions that
a) $\varphi_k$ and $\theta_k$ are scalars;
b) $\{\varphi_k\}$, $\{v_k\}$ and $\{w_k\}$ are mutually independent sequences of independent random variables with zero mean values and variances $R_\varphi$, $R_v$ and $Q_w$, respectively;
c) the fourth moment of $\varphi_k$ is $R_4$.
Assume also that the estimate $\hat\theta_k$ is computed by the simple LMS algorithm
$$ \hat\theta_{k+1} = \hat\theta_k + \mu\varphi_k(y_k - \varphi_k\hat\theta_k). \tag{6} $$
This case is one (essentially the only one) where a simple exact expression for $\Pi^0_k$ can be calculated. Straightforward calculations give
$$ \tilde\theta_{k+1} = (1 - \mu\varphi_k^2)\tilde\theta_k - \mu\varphi_k v_k + \gamma w_{k+1}. \tag{7} $$
Squaring and taking expectations gives
$$ \Pi^0_{k+1} = (1 - 2\mu R_\varphi + \mu^2 R_4)\Pi^0_k + \mu^2 R_\varphi R_v + \gamma^2 Q_w. \tag{8} $$
This is a linear, time-invariant difference equation for $\Pi^0_k$, and can be explicitly solved. In particular, if
$$ |1 - 2\mu R_\varphi + \mu^2 R_4| < 1, $$
the solution of (8) will converge to
$$ \Pi^* = \frac{1}{1 - \mu R_4/(2R_\varphi)}\cdot\frac{1}{2\mu R_\varphi}\left[\mu^2 R_\varphi R_v + \gamma^2 Q_w\right]. \tag{9} $$
With
$$ \bar\Pi = \frac{1}{2\mu R_\varphi}\left[\mu^2 R_\varphi R_v + \gamma^2 Q_w\right], $$
simple manipulations then give
$$ |\Pi^* - \bar\Pi| \le \delta(\mu)\,\bar\Pi, \qquad \delta(\mu) = \frac{\mu R_4/(2R_\varphi)}{1 - \mu R_4/(2R_\varphi)}. $$
Thus $\Pi^*$ can be well approximated by $\bar\Pi$ for small $\mu$, since $\delta(\mu) \to 0$ as $\mu \to 0$. $\square$
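The difference equation (8) and its limit (9) can be checked numerically. The sketch below is purely illustrative (the numerical values of $\mu$, $\gamma$, $R_\varphi$, $R_v$, $Q_w$ and $R_4$ are our own assumptions, not taken from the example); it iterates (8) to its limit and compares with $\Pi^*$ and the simple approximation $\bar\Pi$.

```python
# Numerical check of Example 1.1; all parameter values are illustrative.
mu, gamma = 0.01, 0.01            # adaptation rate and drift scaling
R_phi, R_v, Q_w = 1.0, 0.5, 1.0   # variances of phi_k, v_k, w_k
R_4 = 3.0                         # fourth moment of phi_k (Gaussian case)

# Iterate the difference equation (8) until it reaches its limit.
Pi = 0.0
for _ in range(5000):
    Pi = (1 - 2*mu*R_phi + mu**2*R_4) * Pi + mu**2*R_phi*R_v + gamma**2*Q_w

# Closed-form limit Pi_star from (9) and the simple approximation Pi_bar.
Pi_bar = (mu**2*R_phi*R_v + gamma**2*Q_w) / (2*mu*R_phi)
Pi_star = Pi_bar / (1 - mu*R_4/(2*R_phi))

print(Pi, Pi_star, abs(Pi_star - Pi_bar) / Pi_bar)
```

As predicted, the relative gap between $\Pi^*$ and $\bar\Pi$ is of order $\mu R_4/(2R_\varphi)$, here about 1.5 percent.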
Now, this example was particularly easy, primarily because of the assumed independence among $\{\varphi_k, v_k, w_k\}$, which makes $\varphi_k$ and $\tilde\theta_k$ independent.
In more general cases we have to deal with dependence among $\{\varphi_k\}$, and that is actually at the root of the problem. Generally speaking, if $\{\varphi_k\}$ are weakly dependent, then so should $\varphi_k$ and $\tilde\theta_k$ be, since $\hat\theta_k$ in (3) depends only to a small extent on the "latest" $\varphi_k$, provided that the adaptation rate ($\mu$ in the example) is small and the error equation ((7) in the example) is stable.
The extra term caused by the dependence in the equation corresponding to (8) in the example should then have negligible influence. Indeed, it is the purpose of this contribution to establish this for a fairly general family of tracking algorithms. Despite the simple idea, it turns out to be surprisingly difficult to prove technically. This paper could be said to mark the end of a series of results on performance analysis, starting with Theorem 1 in [12] and then followed by [14], [13] and [10]. There are many related, relevant results using other approaches; we may point to [20], [2], [5], [6], [4], [16], [3], [18], and to the references in these books and papers.
The bottom line of the analysis is a result of the character
$$ \left\|E[\tilde\theta_k\tilde\theta_k^T] - \Pi_k\right\| \le \delta(\mu)\,\|\Pi_k\| \tag{10} $$
where $\delta(\mu) \to 0$ as $\mu \to 0$, $\mu$ is a measure of the adaptation rate in the algorithm, and $\Pi_k$ obeys a simple linear, deterministic difference equation (like (8) without the term $\mu^2 R_4$).
The point of a result of the character (10) is, clearly, that we can arbitrarily well approximate the actual tracking error covariance matrix with a simple expression that can be easily evaluated and analyzed. The essence of this paper does not lie in the expression for $\Pi_k$ itself (it is not difficult to conjecture that such an approximation should be reasonable). Our contribution is rather to establish the connection in the explicit fashion (10) for a wide family of the most common tracking algorithms. One important step in achieving such results is to first establish that the underlying algorithm is exponentially stable. This is a major problem in itself, and a companion paper [9] is devoted to this step, for the same family of algorithms.
The paper is organized as follows. In Section 2 the tracking algorithms are briefly described. Section 3 gives the main result: that (10) holds under the same general conditions for all algorithms in the family. There we also briefly discuss the practical consequences of the result. In the following section, a more general theorem is presented, which is the basis for the analysis. This theorem is more general, and uses weaker but less explicit conditions. The proof of the main result is then given in Section 5, by showing that the general theorem can be applied to our family of algorithms. Notice that this analysis is of independent interest in that, for each individual algorithm, the conditions can be somewhat weakened in different ways.
II. The Family of Tracking Algorithms
We shall consider the general adaptation algorithm
$$ \hat\theta_{k+1} = \hat\theta_k + \mu L_k(y_k - \varphi_k^T\hat\theta_k), \qquad \mu \in (0,1) \tag{11} $$
where the gain $L_k$ is chosen in one of the following ways:
Case 1: Least Mean Squares (LMS):
$$ L_k = \varphi_k \tag{12} $$
This is a standard algorithm [21], [22], and has been used in numerous adaptive signal processing applications.
Case 2: Recursive Least Squares (RLS):
$$ L_k = P_k\varphi_k \tag{13} $$
$$ P_k = \frac{1}{1-\mu}\left[P_{k-1} - \frac{\mu P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1-\mu+\mu\varphi_k^T P_{k-1}\varphi_k}\right] \tag{14} $$
$$ P_0 > 0. \tag{15} $$
This gives an estimate $\hat\theta_k$ that minimizes
$$ \sum_{t=1}^k (1-\mu)^{k-t}(y_t - \varphi_t^T\theta)^2 $$
where $(1-\mu)$ is the "forgetting factor".
Case 3: Kalman Filter (KF) Based Algorithm:
$$ L_k = \frac{P_{k-1}\varphi_k}{R + \mu\varphi_k^T P_{k-1}\varphi_k} \tag{16} $$
$$ P_k = P_{k-1} - \frac{\mu P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{R + \mu\varphi_k^T P_{k-1}\varphi_k} + \mu Q \tag{17} $$
$$ (R > 0,\quad Q > 0). \tag{18} $$
Here $R$ is a positive number and $Q$ is a positive definite matrix. The choice of $L_k$ corresponds to Kalman filter state estimation for (1)-(2), and is optimal in the a posteriori mean square sense if $v_k$ and $w_k$ are Gaussian white noises with covariance matrices $R$ and $Q$, respectively, and if the scaling $\gamma$ in (2) is chosen as $\gamma = \mu$.
If $\{\varphi_k, y_k, \theta_k\}$ obey (1)-(2) and $\hat\theta_k$ is found using (11), we can write the estimation error $\tilde\theta_k$ as
$$ \tilde\theta_{k+1} = (I - \mu F_k)\tilde\theta_k - \mu L_k v_k + \gamma w_{k+1}, \qquad F_k = L_k\varphi_k^T. \tag{19} $$
This is a purely algebraic consequence of (1)-(2) and (11), and holds for whatever sequences $v_k$ and $w_k$.
If we introduce stochastic assumptions about $\{v_k\}$ and $\{w_k\}$, we can use (19) to express the covariance matrix $E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T]$. That will however be quite complex, primarily due to the dependence among $\{L_k, \varphi_k, \tilde\theta_k\}$. The basic approximating expression will instead be based on the following recursion:
$$ \Pi_{k+1} = (I - \mu G_k)\Pi_k(I - \mu G_k)^T + \mu^2 R_v(k)M_k + \gamma^2 Q_w(k+1) \tag{20} $$
where $G_k = EF_k$, $M_k = E[L_kL_k^T]$, $R_v(k) = Ev_k^2$ and $Q_w(k) = E[w_kw_k^T]$. As follows from Example 1.1, this would be the correct expression for the covariance matrix of $\tilde\theta_{k+1}$ if $v_k$ and $w_k$ were white noises, $L_k\varphi_k^T$ were independent of $\tilde\theta_k$, and a term of size $\mu^2\Pi_k$ were neglected.
Indeed, we shall prove that (20) provides a good approximation of the true covariance matrix in the sense that (10) holds. Note that $\Pi_k$ obeys a simple linear difference equation, and can easily be calculated and examined.
III. The Main Result
A. The Assumptions
We shall now consider the algorithm (11) with any of the three choices of the gain $L_k$ discussed in the previous section. For the analysis we shall impose some conditions on the variables involved. These are of the following character:
C1. The regressors $\{\varphi_k\}$ span the regressor space (in order to ensure that the whole parameter vector can be estimated).
C2. The dependence between the regressor $\varphi_k$ and $(\varphi_i, v_{i-1}, w_i)$ decays to zero as the time distance $(k-i)$ tends to infinity.
C3. The measurement error $v_k$ and the parameter drift $w_k$ are of white noise character.
In more exact terms, the three assumptions take the following form:
P1. Let $S_t = E[\varphi_t\varphi_t^T]$, and assume that there exist constants $h > 0$ and $\delta > 0$ such that
$$ \sum_{t=k+1}^{k+h} S_t \ge \delta I, \qquad \forall k. $$
P2. Let $\mathcal{G}_k = \sigma\{\varphi_j;\ j \ge k\}$ and $\mathcal{F}_k = \sigma\{\varphi_i, v_{i-1}, w_i;\ i \le k\}$. Assume that $\{\varphi_k\}$ is weakly dependent ($\phi$-mixing) in the sense that there is a function $\psi(m)$ with $\psi(m) \to 0$ as $m \to \infty$, such that
$$ \sup_{A\in\mathcal{G}_{k+m},\ B\in\mathcal{F}_k} |P(A\mid B) - P(A)| \le \psi(m), \qquad \forall k,\ \forall m. \tag{21} $$
Also, assume that there is a constant $c_\varphi > 0$ such that $\|\varphi_k\| \le c_\varphi$ a.s., $\forall k$.
P3. Let $\mathcal{F}_k$ be the $\sigma$-algebra defined in P2, and assume that
$$ E[v_k\mid\mathcal{F}_k] = 0, \qquad E[w_{k+1}\mid\mathcal{F}_k] = E[w_{k+1}v_k\mid\mathcal{F}_k] = 0, $$
$$ E[v_k^2\mid\mathcal{F}_k] = R_v(k), \qquad E[w_kw_k^T] = Q_w(k), $$
$$ \sup_k\left\{E[|v_k|^r\mid\mathcal{F}_k] + E\|w_k\|^r\right\} \le M $$
for some $r > 2$, $M > 0$.
B. The Result
Now, let $\Pi_k$ be defined by the following linear, deterministic difference equation:
$$ \Pi_{k+1} = (I - \mu R_kS_k)\Pi_k(I - \mu R_kS_k)^T + \mu^2 R_v(k)R_kS_kR_k + \gamma^2 Q_w(k+1) \tag{22} $$
where $S_k = E[\varphi_k\varphi_k^T]$, and $R_k$ is defined as follows:
LMS case:
$$ R_k = I \tag{23} $$
RLS case:
$$ R_k = R_{k-1} - \mu R_{k-1}S_kR_{k-1} + \mu R_{k-1} \qquad (R_0 = P_0) \tag{24} $$
KF case:
$$ R_k = R_{k-1} - \mu R_{k-1}S_kR_{k-1} + \mu Q/R \qquad (R_0 = P_0/R) \tag{25} $$
We then have the following main result.
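The deterministic recursion (22) together with the gain recursion (24) is easy to iterate. The sketch below does so for the scalar RLS case with a constant $S_k \equiv S$ (the parameter values are our own illustrative assumptions), and compares the limits with $R_k \to S^{-1}$ and the RLS stationary value given later in Section III-G.

```python
# Iterating (22) with the RLS choice (24), scalar case; values illustrative.
mu, gamma = 0.02, 0.02
S, R_v, Q_w = 2.0, 1.0, 1.0
R, Pi = 1.0, 0.0                 # R_0 = P_0 and Pi_0

for _ in range(20000):
    R = R - mu * R * S * R + mu * R                                   # (24)
    Pi = (1 - mu*R*S)**2 * Pi + mu**2 * R_v * R * S * R + gamma**2 * Q_w  # (22)

# Stationary values predicted in Section III-G (up to O(mu) corrections):
R_limit = 1.0 / S
Pi_limit = 0.5 * (mu * R_v / S + gamma**2 * Q_w / mu)
print(R, R_limit, Pi, Pi_limit)
```

The iterated $R_k$ settles at $S^{-1}$ exactly, and the iterated $\Pi_k$ agrees with the first-order stationary expression up to a relative error of order $\mu$.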
Theorem 3.1: Consider any of the three basic algorithms in Section 2. Assume that P1, P2 and P3 hold, and let $\Pi_k$ be defined as above. Then for all $\mu \in (0, \mu^*)$ and all $k \ge 1$,
$$ \left\|E[\tilde\theta_k\tilde\theta_k^T] - \Pi_k\right\| \le c\left[\mu\delta(\mu) + \mu^2 + (1-\alpha\mu)^k\right] \tag{26} $$
where $\delta(\mu) \to 0$ (as $\mu \to 0$) is defined by
$$ \delta(\mu) = \min_{m\ge 1}\left\{\sqrt{\mu}\,m + \psi(m)\right\} \tag{27} $$
$\psi(m)$ was defined in P2, and $\alpha \in (0,1)$, $\mu^* \in (0,1)$, $c > 0$ are constants which may be computed from properties of $\{\varphi_k, v_k, w_k\}$.
The proof is given in Section 5. Let us now discuss the conditions used in the above theorem.
C. The Degree of Approximation
First of all, it is clear that the quantity $\delta(\mu)$ plays an important role. The faster it tends to zero, the better the approximation. The rate by which it tends to zero is, according to (27), a reflection of how fast $\psi(m)$ (that is, the dependence among the regressors) tends to zero as $m$ increases. For example, if the regressors are $m$-dependent, so that $\varphi_k$ and $\varphi_\ell$ are independent for $|k-\ell| > m$, then $\psi(n) = 0$ for $n > m$ and $\delta(\mu)$ will behave like $\sqrt{\mu}$. Also, if the dependence is exponentially decaying ($\psi(m) \le Ce^{-\alpha m}$), then we find that
$$ \delta(\mu) < C'\mu^{0.5-\varepsilon} $$
for arbitrarily small, positive $\varepsilon$. This gives a good picture of typical decay rates of $\delta$.
D. Persistence of Excitation: Condition P1
Condition P1 is quite natural and weak, requiring only that the regressor covariance matrices add up to a full rank matrix over a given time span of arbitrary length. It is known to be a necessary condition (in a certain sense) for boundedness of $E\|\tilde\theta_k\|^2$ generated by LMS (cf. [8]); it is also known to be the minimum excitation condition needed for the stability analysis of RLS (cf. [10]).
E. Boundedness and $\phi$-Mixing of the Regressors: Condition P2
Condition P2 requires boundedness and $\phi$-mixing of the regressors. Although such conditions are standard in the literature (e.g. [11]), they can still be considered restrictive. As seen in several of the results in Section 5, both the $\phi$-mixing and the boundedness can be weakened considerably when we deal with specific algorithms.
It may also be remarked that when $\{\varphi_k\}$ is unbounded, we can modify the algorithm so that Theorem 3.1 still holds true. Introduce the normalized signals
$$ (\bar y_k, \bar\varphi_k, \bar v_k) = \frac{1}{\sqrt{1+\|\varphi_k\|^2}}\,(y_k, \varphi_k, v_k). $$
Then we have from (1)
$$ \bar y_k = \theta_k^T\bar\varphi_k + \bar v_k. $$
Thus $\{\theta_k\}$ may be estimated based on this normalized linear regression, and Theorem 3.1 can be applied to this case if only $S_k$ and $R_v(k)$ in (22)-(25) are replaced by
$$ E\left[\frac{\varphi_k\varphi_k^T}{1+\|\varphi_k\|^2}\right] \quad\text{and}\quad E\left[\frac{1}{1+\|\varphi_k\|^2}\right]R_v(k), $$
respectively.
F. The Parameter Drift Model: Condition P3
There are two things to mention about Condition P3. First, we note that the martingale difference property of $w_k$ essentially means that the true parameters, according to the model (2), are assumed to follow a random walk. Although this model is quite standard, it has also been criticized as being too restrictive. We believe that a random walk model, in the context of slow adaptation (small $\mu$), captures the tracking behavior of the algorithm very well. This is, in a sense, a worst case analysis, since the future behavior of the model is unpredictable.
We may also note that time-varying covariances $Q_w(k)$ and $R_v(k)$ are allowed. Several of the special model drift cases described in [12] are therefore covered by P3. Other drift models, where the driving noise is colored, can be put into a similar Kalman filter framework. However, covering that case with our techniques requires more work.
Condition P3 also introduces assumptions about moments higher than 2. We remark that if we only assume that $\{v_k\}$ and $\{w_k\}$ are bounded in, e.g., the mean square sense, then upper bounds for the mean square tracking errors can be established (cf. [8] and [7]). The strengthened assumption in P3 allows us to obtain performance values that are much more accurate than upper bounds.
G. The Practical Use of the Theorem
The practical consequence of Theorem 3.1 is that a very simple object, the linear, deterministic difference equation (22), will describe the tracking behavior. Now, this equation is quite easy to analyze. In fact, there is an extensive literature on such analysis, in particular for the special case of LMS. Among many references, we may refer to [12] for a survey of such results. In essence, all these results capture the dilemma between the tracking error ($\Pi$ is large because $\mu$ is small) and the noise sensitivity ($\Pi$ is large because $\mu$ is large), and may point to the best compromises between these requirements.
For example, under weak stationarity of the regressors, $S_k \equiv S$, we find that $R_k$ converges to $\tilde R$ as $k \to \infty$, where $\tilde R = I$ in the LMS case, $\tilde R = S^{-1}$ in the RLS case, and in the KF case $\tilde R$ solves
$$ \tilde RS\tilde R = Q/R. $$
Inserted into (22), this gives the following stationary values $\Pi$ for the tracking error covariance matrix (neglecting terms of order $\mu^2$):
LMS: $\ \mu(\Pi S + S\Pi) = \mu^2 R_vS + \gamma^2 Q_w$
RLS: $\ \Pi = \frac{1}{2}\left[\mu R_vS^{-1} + (\gamma^2/\mu)Q_w\right]$
KF: $\ \mu\left[\tilde RS\Pi + (\tilde RS\Pi)^T\right] = \mu^2 R_vQ/R + \gamma^2 Q_w$
Note that if $Q = Q_w$ and $R = R_v$, then the latter equation can be solved as
$$ \Pi = \frac{R}{2}\left(\mu + \frac{\gamma^2}{\mu}\right)\tilde R. $$
From these expressions the trade-offs between tracking ability and noise sensitivity are clearly visible.
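The trade-off can be made concrete: in the scalar RLS expression the first term grows with $\mu$ (noise sensitivity) and the second grows as $\mu$ shrinks (tracking lag), and the two balance at $\mu^* = \gamma\sqrt{Q_wS/R_v}$. The numbers below are our own illustrative assumptions.

```python
# Tracking/noise trade-off for the scalar RLS stationary value
# Pi(mu) = (1/2)(mu R_v / S + (gamma^2/mu) Q_w); values illustrative.
import math

gamma, S, R_v, Q_w = 0.01, 1.0, 1.0, 1.0

def Pi(mu):
    return 0.5 * (mu * R_v / S + gamma**2 * Q_w / mu)

# Grid search versus the closed-form balance point.
grid = [i * 1e-4 for i in range(1, 2000)]
mu_best = min(grid, key=Pi)
mu_star = gamma * math.sqrt(Q_w * S / R_v)
print(mu_best, mu_star, Pi(mu_best))
```

The grid minimizer coincides with the analytic balance point, illustrating how (22) can be used directly for step-size design.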
IV. A General Theorem
In this section, we shall present a general theorem on the performance of the tracking algorithm (11) when the gain $L_k$ is not specified, from which our main result, Theorem 3.1, will follow. The general theorem has weaker, but less explicit, assumptions. From now on the treatment and discussion will be more technical. However, the main line of thought in the proofs follows the outline given after Example 1.1 in the Introduction.
A. Notations
The following notations will be used in the remainder of the paper. They are the same as in the companion paper [9].
a) The minimum and maximum eigenvalues of a matrix $X$ are denoted by $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$, respectively, and
$$ \|X\| \triangleq \left\{\lambda_{\max}(XX^T)\right\}^{1/2}, \qquad \|X\|_p \triangleq \left\{E(\|X\|^p)\right\}^{1/p}, \quad p \ge 1. $$
b) Let $x = \{x_k(\mu),\ k \ge 1\}$ be a random sequence parameterized by $\mu \in (0,1)$. Denote
$$ L_p(\mu^*) = \Big\{x :\ \sup_{\mu\in(0,\mu^*]}\ \sup_{k\ge 1}\ \|x_k(\mu)\|_p < \infty\Big\}. \tag{28} $$
c) Let $F = \{F_k(\mu)\}$ be any (square) matrix random process parameterized by $\mu \in (0,1)$. For any $p \ge 1$ and $\mu^* \in (0,1)$, define
$$ S_p(\mu^*) = \Big\{F :\ \Big\|\prod_{j=i+1}^k (I - \mu F_j(\mu))\Big\|_p \le M(1-\alpha\mu)^{k-i},\ \forall\mu\in(0,\mu^*],\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \text{ and } \alpha\in(0,1)\Big\}; $$
similarly,
$$ S(\mu^*) = \Big\{F :\ \Big\|\prod_{j=i+1}^k (I - \mu E[F_j(\mu)])\Big\| \le M(1-\alpha\mu)^{k-i},\ \forall\mu\in(0,\mu^*],\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \text{ and } \alpha\in(0,1)\Big\}. $$
In what follows, it will be convenient to introduce the sets
$$ S_p \triangleq \bigcup_{\mu^*\in(0,1)} S_p(\mu^*), \qquad S \triangleq \bigcup_{\mu^*\in(0,1)} S(\mu^*). \tag{29} $$
We may call these stability sets. They are related to the stability of the random equation (19) and of the deterministic equation (20), respectively. For simplicity, we shall sometimes suppress the parameter $\mu$ in $F_k(\mu)$ when there is no risk of confusion.
d) For scalar random sequences $a = (a_k,\ k \ge 0)$, we set
$$ S^0(\lambda) = \Big\{a :\ a_k \in [0,1],\ E\prod_{j=i+1}^n (1-a_j) \le M\lambda^{n-i},\ \forall n \ge i \ge 0,\ \text{for some } M > 0\Big\}. $$
Also,
$$ S^0 \triangleq \bigcup_{\lambda\in(0,1)} S^0(\lambda). \tag{30} $$
e) Let $p \ge 1$ and let $x \triangleq \{x_i\}$ be any random process. Set
$$ \mathcal{M}_p = \Big\{x :\ \Big\|\sum_{i=m+1}^{m+n} x_i\Big\|_p \le C_p\,n^{1/2},\ \forall n \ge 1,\ m \ge 0,\ \text{for some } C_p \text{ depending only on } p \text{ and } x\Big\}. $$
As is known, for example from [10], martingale difference sequences, $\phi$- and $\alpha$-mixing sequences, and linear processes (processes generated from a white noise source via a linear filter with absolutely summable impulse response) all belong to the set $\mathcal{M}_p$.
In particular, when $\{x_i\}$ is a martingale difference sequence, by the Burkholder inequality we have, for $p > 1$,
$$ \Big\|\sum_{i=m+1}^{m+n} x_i\Big\|_p \le (B_p\,\bar x_p)\,n^{1/2}, \qquad \forall n \ge 1,\ m \ge 0 \tag{31} $$
where $\bar x_p \triangleq \sup_k \|x_k\|_p$, and $B_p$ is a constant depending on $p$ only (cf. [11]). (This fact will be used frequently in the sequel without further explanation.)
f) Let $\{A_k\}$ be a matrix sequence and $b_k \ge 0$, $\forall k \ge 0$. Then by $A_k = O(b_k)$ we mean that there exists a constant $M < \infty$ such that
$$ \|A_k\| \le Mb_k, \qquad \forall k \ge 0. $$
The constant $M$ may be called the ordo-constant. Throughout the sequel, the ordo-constant does not depend on $\mu$, even if $\{A_k\}$ or $\{b_k\}$ does.
B. Assumptions
We will first show how, given exponential stability of the homogeneous part of (19) and a certain weak dependence property of the adaptation gains, the tracking performance can be analyzed; more detailed discussions of these properties are then presented.
In the sequel, unless otherwise stated, $\mathcal{F}_k$ denotes the $\sigma$-algebra generated by $\{\varphi_i, w_i, v_{i-1};\ i \le k\}$, and $\{F_k\}$ is defined in (19).
To establish the general theorem, we need the following assumptions:
(A1) (Exponential stability): There are $\mu^* \in (0,1)$ and $p \ge 2$ such that
$$ \{F_k\} \in S_p(\mu^*) \cap S(\mu^*). $$
(A2) (Weak dependence): There is a real number $q \ge 3$ together with a bounded function $\psi_\mu(m) \ge 0$ satisfying
$$ \lim_{\mu\to 0}\ \lim_{m\to\infty}\ \psi_\mu(m) = 0 $$
(taking first $m$ to infinity and then $\mu$ to zero) such that for all $m$, all $k$ and all $\mu \in (0,\mu^*]$,
$$ \left\|E[F_k\mid\mathcal{F}_{k-m}] - E[F_k]\right\|_q \le \psi_\mu(m). $$
(A3): $L_i \in \mathcal{F}_i$, $\forall i \ge 1$, and there is $\mu^* \in (0,1)$ such that
$$ \{L_i\} \in L_r(\mu^*), \qquad \{F_i\} \in L_{2q}(\mu^*) $$
with $r = \left(\tfrac{1}{2} - \tfrac{1}{p} - \tfrac{3}{2q}\right)^{-1}$, and with $p$ and $q$ defined as in (A1) and (A2).
(A4): For all $k \ge 1$ we have
$$ E[v_k\mid\mathcal{F}_k] = 0, \qquad E[w_{k+1}\mid\mathcal{F}_k] = E[w_{k+1}v_k\mid\mathcal{F}_k] = 0, $$
$$ E[v_k^2\mid\mathcal{F}_k] = R_v(k), \qquad E[w_{k+1}w_{k+1}^T] = Q_w(k+1), $$
$$ E[|v_k|^r\mid\mathcal{F}_k] + E\|w_{k+1}\|^r \le M < \infty, \qquad \forall k \ge 1, $$
for deterministic quantities $R_v(k)$, $Q_w(k+1)$ and $M$, where $r$ is defined as in (A3).
The key conditions are (A1) and (A2). In general, (A1) can be guaranteed by a certain type of stochastic persistence of excitation condition, which is studied in the companion paper [9], while (A2) can be guaranteed by imposing a certain weak dependence condition on the regressors $\{\varphi_i\}$. More detailed discussions will be given later. At the moment, we just remark that if (A1) and (A2) hold for all $p \ge 1$ and all $q \ge 1$, then in (A3) and (A4) the number $r$ need only satisfy $r > 2$.
C. The General Theorem
Now, recursively dene a matrix sequence
f^ k
gas fol- lows:
^ k
+1= ( I
;E F k ])^ k ( I
;E F k ])
+
2R v ( k ) E L k L k ] +
2Q w ( k + 1) (32) where ^
0= E
e0e0
], and R v ( k ) and Q w ( k +1) are dened in Assumption (A4). Note that this denition is very close to the denition of k in (22). We now have a result that is the "mother-theorem" of Theorem 3.1:
Theorem 4.1 Let Assumptions (A1)-(A4) hold. Let the tracking error
ek be dened by (11) (or (19)), and let ^ k
dened by (32). Then
82(0
]
8k 1:
k
E
ek
+1ek
+1]
;^ k
+1kc ( ) +
2+ (1
;) k ] where c > 0 and
2(0 1) are constants and ( ) is a function that tends to zero as tends to zero. It is dened by ( )
4= min m
1 fp
m + ( m )
g: The proof is given in Appendix A.
Next, we show that under further conditions, the expression for $\hat\Pi_k$ in (32) can be simplified.
Corollary 4.1: Under the conditions of Theorem 4.1, if $F_k = P_k\varphi_k\varphi_k^T$ with $\left\|\,\|\varphi_k\|^2\right\|_t = O(1)$ and $\|F_k\|_t = O(1)$ for some $t > 1$, and if there are a function $\eta(\mu)$, tending to zero as $\mu$ tends to zero, and a deterministic sequence $\{R_k\}$ such that
$$ \|P_k - R_k\|_s = O(\eta(\mu)), \qquad \forall k,\ \forall\mu\in(0,\mu^*], \quad s = (1 - t^{-1})^{-1}, $$
then for all $\mu \in (0,\mu^*]$ and all $k \ge 1$,
$$ \left\|E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] - \Pi_{k+1}\right\| \le c\left\{\mu[\delta(\mu) + \eta(\mu)] + \mu^2 + (1-\alpha\mu)^k\right\} \tag{33} $$
for some constants $c > 0$ and $\alpha \in (0,1)$, where $\Pi_k$ is recursively defined by
$$ \Pi_{k+1} = (I - \mu R_kS_k)\Pi_k(I - \mu R_kS_k)^T + \mu^2 R_v(k)R_kS_kR_k + \gamma^2 Q_w(k+1) \tag{34} $$
with $S_k = E[\varphi_k\varphi_k^T]$ and $\Pi_0 = \hat\Pi_0$.
Proof: By Theorem 4.1, we need only show that
$$ \|\hat\Pi_{k+1} - \Pi_{k+1}\| = O\left(\mu\eta(\mu) + \mu^2 + (1-\alpha\mu)^k\right). $$
This can be derived by straightforward calculations based on the equations for $\hat\Pi_k$ and $\Pi_k$, and hence Corollary 4.1 is true. $\blacksquare$
Remark: If, in Condition (A2), $\psi_\mu(m) = O(\psi(m) + \eta(\mu))$, and we set $\bar\delta(\mu) = \min_{m\ge 1}[\sqrt{\mu}\,m + \psi(m)]$, then $\delta(\mu)$ defined in Theorem 4.1 satisfies $\delta(\mu) = O(\bar\delta(\mu) + \eta(\mu))$. This will be the case for the RLS and KF algorithms in Theorem 3.1, as can be seen from Section V.
The following result also follows directly from Theorem 4.1.
Corollary 4.2: If, in addition to the conditions of Theorem 4.1, $R_v(k) \equiv R_v$, $Q_w(k) \equiv Q_w$, and there are matrices $F$, $G$ and a function $\eta(\mu)$, tending to zero as $\mu$ tends to zero, such that for all $\mu \in (0,\mu^*]$,
$$ \|EF_k - F\| + \|E(L_kL_k^T) - G\| \le \eta(\mu), \qquad \forall k, $$
then for some $\alpha \in (0,1)$ and for all $\mu \in (0,\mu^*]$, $k \ge 1$,
$$ E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] = \Pi + O\left(\mu[\delta(\mu) + \eta(\mu)] + \mu^2\right) + O\left((1-\alpha\mu)^k\right) \tag{35} $$
where $\Pi$ satisfies the following Lyapunov equation:
$$ \mu(F\Pi + \Pi F^T) = \mu^2 R_vG + \gamma^2 Q_w. \tag{36} $$
Now denote
$$ \bar\Pi_{R_v} = R_v\int_0^\infty e^{-Ft}Ge^{-F^Tt}\,dt, \qquad \bar\Pi_{Q_w} = \int_0^\infty e^{-Ft}Q_we^{-F^Tt}\,dt; $$
then the solution of the Lyapunov equation (36) can be expressed as
$$ \Pi = \mu\bar\Pi_{R_v} + \frac{\gamma^2}{\mu}\bar\Pi_{Q_w}, $$
in which there is a reminiscence of the result obtained in the simple example discussed in Section 1 (see (9)).
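In the scalar case both integrals evaluate in closed form ($\int_0^\infty e^{-2Ft}\,dt = 1/(2F)$), so the decomposition of $\Pi$ can be verified directly against (36). The numbers below are our own illustrative choices.

```python
# Scalar sanity check of Pi = mu*Pi_Rv + (gamma^2/mu)*Pi_Qw solving the
# Lyapunov equation (36); F, G, R_v, Q_w are illustrative scalars.
mu, gamma = 0.05, 0.05
F, G, R_v, Q_w = 2.0, 1.5, 1.0, 0.8

# In the scalar case the integrals reduce to 1/(2F).
Pi_Rv = R_v * G / (2 * F)
Pi_Qw = Q_w / (2 * F)
Pi = mu * Pi_Rv + (gamma**2 / mu) * Pi_Qw

# Check (36): mu (F Pi + Pi F) = mu^2 R_v G + gamma^2 Q_w.
lhs = mu * (F * Pi + Pi * F)
rhs = mu**2 * R_v * G + gamma**2 * Q_w
print(lhs, rhs)
```

The two sides agree exactly, mirroring the split of (9) into a noise-sensitivity part (proportional to $\mu$) and a tracking part (proportional to $\gamma^2/\mu$).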
D. Discussion of the Assumptions
Now, let us discuss the key assumptions (A1) and (A2). First, Assumption (A1) has been studied in the companion paper [9]; here we only give some results concerning $\{F_k\} \in S$, which will be used shortly in the next section.
Proposition 4.1: Let $\{G_k\}$ be a random matrix process, possibly dependent on $\mu$, with the property
$$ E\|G_k\| \le \eta(\mu) \quad\text{for all small } \mu \text{ and all } k \tag{37} $$
where $\eta(\mu) \to 0$ as $\mu \to 0$. Then
$$ \{F_k\} \in S \iff \{F_k + G_k\} \in S. $$
Proof: (Sufficiency) Recursively define, for any $x$ with $\|x\| = 1$,
$$ x_{k+1} = (I - \mu E[F_k + G_k])x_k, \quad \forall k \ge m, \qquad x_m = x. $$
Then
$$ x_{k+1} = (I - \mu E(F_k))x_k - \mu E(G_k)x_k, $$
so that
$$ x_{n+1} = \prod_{i=m}^n\left[I - \mu E(F_i)\right]x_m - \mu\sum_{i=m}^n\,\prod_{j=i+1}^n\left[I - \mu E(F_j)\right]E(G_i)x_i. $$
Consequently, similarly to the proof of Theorem 3.1 in [9], by the Gronwall inequality we have
$$ \|x_{n+1}\| \le 2M(1-\alpha\mu)^{n-m+1}\Big\{1 + \sum_{i=m}^n\,\prod_{j=i+1}^n\big(1 + \mu E\|G_j\|\big)\,\mu E\|G_i\|\Big\}. $$
From this and the condition (37), it is not difficult to convince oneself that $\{F_k + G_k\} \in S$.
(Necessity) By the fact just proved, and noting that $F_k = (F_k + G_k) - G_k$, we know that $\{F_k\} \in S$. This completes the proof. $\blacksquare$
The following useful result follows immediately from Proposition 4.1.
Proposition 4.2: Let $F_k = P_kH_k$ and let the following conditions be satisfied:
(i) $\{H_k\} \in L_t(\mu^*)$, $\mu^* \in (0,1)$, $t \ge 1$;
(ii) $\|P_k - \bar P_k\|_s \le \eta(\mu)$, $\forall\mu \in (0,\mu^*]$, where $\eta(\mu) \to 0$ as $\mu \to 0$, $s = (1 - t^{-1})^{-1}$, and $\{\bar P_k\}$ is a deterministic process.
Then
$$ \{F_k\} \in S \iff \{\bar P_kH_k\} \in S. $$
Proof: The result follows directly from Proposition 4.1 if we note that
$$ F_k = \bar P_kH_k + (P_k - \bar P_k)H_k. \qquad\blacksquare $$
We now turn to the weak dependence condition (A2).
Example 4.1: Let $\{\varphi_i\}$ satisfy (21), and let $L(\cdot): \mathbb{R}^d \to \mathbb{R}^{d\times d}$ be a real matrix function with $\|L(\varphi_k)\|_q = O(1)$ for some $1 \le q \le \infty$. Then we have the following inequality (cf. [19]):
$$ \left\|E[L(\varphi_k)\mid\mathcal{F}_{k-m}] - EL(\varphi_k)\right\|_q = O\left([\psi(m)]^{1-\frac{1}{q}}\right), \qquad \forall k, m. \tag{38} $$
Hence, if $F_k = L(\varphi_k)$, then condition (A2) holds.
Note that when $\{\varphi_k\}$ satisfies condition P2 in Section 3, we have, by taking $q = \infty$ in (38),
$$ \left\|E[\varphi_k\varphi_k^T\mid\mathcal{F}_{k-m}] - E[\varphi_k\varphi_k^T]\right\|_\infty = O(\psi(m)). \tag{39} $$
This fact will be used in the next section in the proof of Theorem 3.1. $\square$
Example 4.2: Let $\{\varphi_k\}$ be generated by
$$ x_k = Ax_{k-1} + B\xi_k \quad (A \text{ stable}), \qquad \varphi_k = Cx_k + \xi_k, $$
where $\{\xi_j;\ j \ge k+1\}$ and $\{v_{j-1}, w_j;\ j \le k\}$ are independent, and $\{\xi_j\}$ is an independent sequence. Assume that
$$ \sup_k E\|\xi_k\|^{(b+1)q} < \infty \quad\text{for some } b \ge 0,\ q \ge 1. $$
Then for any function $L(\cdot): \mathbb{R}^d \to \mathbb{R}^{d\times d}$ with
$$ \|L(x) - L(x')\| \le M(\|x\| + \|x'\| + 1)^b\,\|x - x'\|, \qquad \forall x, x', $$
there is a constant $\lambda \in (0,1)$ such that (cf. [14]), for all $m \ge 0$ and $k \ge 0$,
$$ \left\|E[L(\varphi_{k+m})\mid\mathcal{F}_k] - EL(\varphi_{k+m})\right\|_q = O(\lambda^m). $$
Hence, if $F_k = L(\varphi_k)$, then again condition (A2) holds. $\square$
The following simple result will be useful in the sequel.
Proposition 4.3: Let $F_k = P_kL(\varphi_k)$, and let the following two conditions hold:
(i) There is a bounded deterministic matrix sequence $\{\bar P_k\}$, and a function $\eta(\mu)$ tending to zero as $\mu$ tends to zero, such that
$$ \|P_k - \bar P_k\|_s \le \eta(\mu), \qquad \forall\mu\in(0,\mu^*], \quad\text{for some } s > 1. $$
(ii) There is a number $r > 1$ such that $\|L(\varphi_k)\|_r = O(1)$, together with a function $\psi(m)$ tending to zero as $m$ tends to infinity, such that
$$ \left\|E[L(\varphi_{k+m})\mid\mathcal{F}_k] - EL(\varphi_{k+m})\right\|_q \le \psi(m), \qquad \forall k,\ \forall m \quad \left(q = (r^{-1} + s^{-1})^{-1}\right). $$
Then condition (A2) holds with $\psi_\mu(m) = O(\psi(m) + \eta(\mu))$.
Proof: The result follows directly from the following identity:
$$ E[F_{k+m}\mid\mathcal{F}_k] - EF_{k+m} = E\left[(P_{k+m} - \bar P_{k+m})L(\varphi_{k+m})\mid\mathcal{F}_k\right] - E\left[(P_{k+m} - \bar P_{k+m})L(\varphi_{k+m})\right] + \bar P_{k+m}\left\{E[L(\varphi_{k+m})\mid\mathcal{F}_k] - EL(\varphi_{k+m})\right\}. \qquad\blacksquare $$
V. Analysis of the Basic Algorithms
In this section we shall show that, for the basic LMS, RLS and KF algorithms, conditions (A1)-(A3) of the previous section can be guaranteed by imposing some explicit (stochastic excitation and weak dependence) conditions on the regressors $\{\varphi_k\}$, and at the same time we prove Theorem 3.1.
A. Analysis of LMS
For the LMS algorithm defined by (11)-(12), let us introduce the following two kinds of weak dependence conditions:
L1) Condition P2 of Section 3 is satisfied, but with the boundedness condition on $\{\varphi_k\}$ relaxed to the following: there exist positive constants $\varepsilon$, $M$ and $K$ such that
$$ E\exp\Big\{\varepsilon\sum_{j=i+1}^n\|\varphi_j\|^2\Big\} \le M\exp\{K(n-i)\}, \qquad \forall n \ge i \ge 0. $$
L1') The random process $F_k \triangleq \varphi_k\varphi_k^T$ has the following expansion:
$$ F_k = \sum_{j=0}^\infty A_jZ_{k-j} + D_k, \qquad \sum_{j=0}^\infty\|A_j\| < \infty, $$
where $\{Z_k\}$ is an independent process such that $\{Z_j;\ j \ge k+1\}$ and $\{v_{j-1}, w_j;\ j \le k\}$ are independent, and which satisfies
$$ \sup_k E\exp\left\{\varepsilon\|Z_k\|^{1+\delta}\right\} < \infty \quad\text{for some } \varepsilon > 0,\ \delta > 0, $$
and where $\{D_k\}$ is a bounded deterministic process.
Theorem 5.1: Let Conditions P1 and P3 of Section 3 be satisfied. If either L1) or L1') above holds, then Conditions (A1)-(A4) of Theorem 4.1 hold (for all $p \ge 1$, $q \ge 1$) and Theorem 3.1 is true for the LMS case.
Proof: First, in the LMS case, Conditions P1 and L1) (or L1')) ensure that Condition (A1) of Theorem 4.1 holds for all $p \ge 1$ (cf. [9], Theorem 3.3). Next, when L1) holds, by Example 4.1 we know that Condition (A2) is true for all $q \ge 1$. Also, when L1') holds, by the assumed independence we have, for all $q \ge 1$,
$$ \left\|E[F_k\mid\mathcal{F}_{k-m}] - EF_k\right\|_q = \Big\|\sum_{j=m}^\infty\left[A_jZ_{k-j} - E(A_jZ_{k-j})\right]\Big\|_q = O\Big(\sum_{j=m}^\infty\|A_j\|\Big), \qquad \forall m \ge 1. $$
Hence (A2) holds again for all $q \ge 1$.
Moreover, Conditions (A3) and (A4) obviously hold in the present case. Finally, by (39), the result of Theorem 3.1 (in the LMS case) follows directly from Theorem 4.1. This completes the proof. $\blacksquare$
B. Analysis of RLS
For the RLS algorithm defined by (11), (13) and (14), let us introduce the following two kinds of excitation conditions:
R1) There exist constants $h > 0$, $c > 0$ and $\varepsilon > 0$ such that
$$ P\Big(\lambda_{\min}\Big(\sum_{i=k+1}^{k+h}\varphi_i\varphi_i^T\Big) \ge c\ \Big|\ \mathcal{F}_k\Big) > \varepsilon, \qquad \forall k. $$
R1') There exists $h > 0$ such that
$$ \sup_k E\Big[\lambda_{\min}\Big(\sum_{i=k+1}^{k+h}\varphi_i\varphi_i^T\Big)\Big]^{-t} < \infty, \qquad \forall t \ge 1. $$
The following weak dependence condition will also be used:
R2) There exists a number $t \ge 5$ such that $\|\varphi_k\|_{4t} = O(1)$, and
$$ \left\|E[\varphi_k\varphi_k^T\mid\mathcal{F}_{k-m}] - E[\varphi_k\varphi_k^T]\right\|_{2t} \le \psi(m), \qquad \forall k, m, $$
where $\psi(m) \to 0$ as $m \to \infty$.
Remark 5.1: Detailed discussions and investigations of the first two conditions above can be found in [10] and [17]. It has been shown in [10] that if Condition P1 and (21) in Section 3 hold, then R1) is true. Also, if $\{\varphi_k\}$ is generated by a linear state space model as in Example 4.2, then R1') can be verified (cf. [17]). Moreover, Condition R2) was discussed in the last section.
Theorem 5.2: Let Conditions R1) (or R1')) and R2) above be satisfied. Then Conditions (A1)-(A3) of Theorem 4.1 hold (for any $p < 2t$, $q < t$) and Theorem 3.1 is true for the RLS case.
Proof: First, note that
$$ \prod_{j=i+1}^k (I - \mu F_j) = (1-\mu)^{k-i}P_{k+1}P_{i+1}^{-1}, \qquad \forall k \ge i, \tag{40} $$
and
$$ P_k^{-1} = (1-\mu)P_{k-1}^{-1} + \mu\varphi_k\varphi_k^T. \tag{41} $$
From this and condition R2) it follows that
$$ \|P_k^{-1}\|_{2t} = O(1), \qquad \forall\mu\in(0,1). \tag{42} $$
Also, by Theorem 1 in [10], there is $\mu^* \in (0,1)$ such that
$$ \{P_k\} \in L_s(\mu^*), \qquad \forall s \ge 1. \tag{43} $$
Combining (40), (42) and (43), we get
$$ \{F_k\} \in S_p, \qquad \forall p < 2t. \tag{44} $$
Now, define ($\bar P_0 = P_0$)
$$ \bar P_k^{-1} = (1-\mu)\bar P_{k-1}^{-1} + \mu E(\varphi_k\varphi_k^T). \tag{45} $$
Since either R1) or R1') implies P1 in Section 3 (cf. [10]), by a similar (actually simpler) argument to that used in the proof of (43), we know that $\|\bar P_k\| = O(1)$. We next prove that
$$ \|P_k^{-1} - \bar P_k^{-1}\|_{2t} = O(\delta(\mu)), \qquad \delta(\mu) = \min_{m\ge 1}\left\{\sqrt{\mu}\,m + \psi(m)\right\}. \tag{46} $$
First, by (41) and (45),
$$ P_k^{-1} - \bar P_k^{-1} = \mu\sum_{i=1}^k (1-\mu)^{k-i}\left[\varphi_i\varphi_i^T - E\varphi_i\varphi_i^T\right]. \tag{47} $$
j ( i ) = E ' i ' i
jFi
;j ]
;E ' i ' i
jFi
;j
;1] 0
j
m
;1 we have
' i ' i
;E' i ' i
= m
X;1j
=0j ( i ) +
fE ' i ' i
jFi
;m ]
;E ' i ' i ]
g(48) Now, since for each j , the sequence
fj ( i ) i 1
gis a mar- tingale dierence, we can apply Lemma A.2 in the Ap- pendix to each such
fj ( i ) i 1
gto obtain
kXk
i
=1(1
;) k
;i m
X;1j
=0j ( i )
k2t = O (
pm ) (49) Also, by our assumption
kXk
i
=1(1
;) k
;i
fE ' i ' i
jFi
;m ]
;E ' i ' i ]
gk2t
( m ) Hence, (46) follows from (47)-(50) immediately. (50)
Similarly to the proof of (44), it is evident that
$$ \{\bar P_k\varphi_k\varphi_k^T\} \in S. \tag{51} $$
Now,
$$ \|P_k - \bar P_k\| \le \|P_k\|\cdot\|P_k^{-1} - \bar P_k^{-1}\|\cdot\|\bar P_k\|; $$
from this, (43) and (46) it follows that
$$ \|P_k - \bar P_k\|_s = O(\delta(\mu)), \qquad \forall s < 2t \ \text{(for small } \mu\text{)}. \tag{52} $$
Hence, by Proposition 4.2 and (51), we know that $\{F_k\} \in S$. This, in conjunction with (44), verifies Condition (A1).
Now, by (52) and R2), it is evident from Proposition 4.3 that Condition (A2) holds for any $q < t$.
To prove (A3), first note that for any $q < t$, (44) implies
$$ \{F_k\} \in L_{2q}(\mu^*) \quad\text{for some } \mu^* > 0. $$
So we need only prove that $\{L_i\} \in L_r(\mu^*)$ for
$$ r > \Big(\frac{1}{2} - \frac{1}{2t} - \frac{3}{2t}\Big)^{-1} = \frac{2t}{t-4}. $$
This is true, since by (43) and $\|\varphi_k\|_{4t} = O(1)$,
$$ \{L_i\} = \{P_i\varphi_i\} \in L_r(\mu^*), \qquad \forall r < 4t, $$
and since $4t > 2t/(t-4)$ for $t \ge 5$. Hence (A3) holds.
Thus, by taking $t = \infty$ in the above argument, we see that Conditions (A1) and (A2) hold for all $p \ge 1$ and all $q \ge 1$. Hence Theorem 4.1 can be applied to prove Theorem 3.1 for the RLS case, while the expression for $\Pi_k$ will follow from Corollary 4.1 if we can prove that
$$ \|P_k - R_k\|_s = O(\delta(\mu)), \qquad s = \frac{t}{t-1}, \tag{53} $$
where $P_k$ and $R_k$ are defined by (14) and (24), respectively.
Furthermore, by (52), it is clear that (53) will be true if
$$ \|R_k - \bar P_k\| = O(\delta(\mu)) $$
holds. However, this can be verified by using the definitions of $R_k$ and $\bar P_k$ (see Appendix B). Hence the proof is complete. $\blacksquare$
C. Analysis of the KF Algorithm
Among the three basic algorithms described in Section 2, the KF algorithm defined by (11), (16) and (17) is the most complicated one to analyze. Let us now introduce the following two conditions on stochastic excitation and weak dependence.
K1) There are constants $h > 0$ and $\lambda \in (0,1)$ (independent of $\mu$) such that
$$ \Big\{\frac{\mu\alpha_k}{1 + \mu b_{kh+1}}\Big\} \in S^0(\lambda), $$
where $S^0(\lambda)$ is defined by (30), and $\alpha_k$ and $b_k$ are defined as follows ($\mathcal{G}_k$ is, as before, the $\sigma$-algebra generated by $\{\varphi_i;\ i \le k\}$):
$$ \alpha_k \triangleq \lambda_{\min}\,E\Big[\frac{1}{1+h}\sum_{i=kh+1}^{(k+1)h}\frac{\varphi_i\varphi_i^T}{1+\|\varphi_i\|^2}\ \Big|\ \mathcal{G}_{kh}\Big], $$
$$ b_k = (1-\mu)b_{k-1} + \mu(\|\varphi_k\|^2 + 1), \qquad \mu\in(0,1). $$
K2) There exists a number $t \ge 7$, together with a function $\psi(m) \to 0$ (as $m \to \infty$), such that $\|\varphi_k\|_{4t} = O(1)$, and that