Necessary and Sufficient Conditions for Stability of LMS
Lei Guo†, Lennart Ljung‡ and Guan-Jun Wang§
First Version: November 9, 1995. Revised Version: August 28, 1996.
Abstract. In a recent work [7], some general results on exponential stability of random linear equations were established. They can be applied directly to the performance analysis of a wide class of adaptive algorithms, including the basic LMS ones, without requiring stationarity, independence or boundedness assumptions on the system signals. The current paper attempts to give a complete characterization of the exponential stability of the LMS algorithms, by providing a necessary and sufficient condition for such stability in the case of possibly unbounded, nonstationary and non-\phi-mixing signals. The results of this paper can be applied to a very large class of signals, including those generated from, e.g., a Gaussian process via a time-varying linear filter. As an application, several novel and extended results on the convergence and tracking performance of LMS are derived under various assumptions. Neither stationarity nor Markov chain assumptions are required in the paper.
This work was supported by the National Natural Science Foundation of China and the Swedish Research Council for Engineering Sciences (TFR).
† Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, P. R. China. Email: Lguo@iss03.iss.ac.cn.
‡ Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden. Email: Ljung@isy.liu.se.
§ Department of Mathematics, The Central University for Nationalities, Beijing 100081, P. R. China.
1 Introduction
1.1 The Contribution
The well-known least mean squares (LMS) algorithm, aiming at tracking the "best linear fit" of an observed (or desired) signal {y_k} based on a measured d-dimensional (input) signal {\phi_k}, is defined recursively by

    x_{k+1} = x_k + \mu \phi_k ( y_k - \phi_k^\tau x_k ),  x_0 \in R^d   (1)

where \mu > 0 is a step-size.
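As a purely illustrative aside, the recursion (1) is straightforward to simulate; the sketch below (data model, dimensions and step-size are our own choices, not from the paper) runs LMS on noise-free Gaussian regressors:

```python
import numpy as np

def lms(phi, y, mu, x0=None):
    """LMS recursion (1): x_{k+1} = x_k + mu * phi_k * (y_k - phi_k^T x_k)."""
    x = np.zeros(phi.shape[1]) if x0 is None else np.array(x0, dtype=float)
    for phi_k, y_k in zip(phi, y):
        x = x + mu * phi_k * (y_k - phi_k @ x)
    return x

rng = np.random.default_rng(0)
phi = rng.standard_normal((2000, 3))     # Gaussian (hence unbounded) regressors
x_true = np.array([1.0, -2.0, 0.5])
x_hat = lms(phi, phi @ x_true, mu=0.05)  # noise-free observations y_k
```

Under persistent excitation and a small step-size, the iterate approaches the true parameter; the analysis in this paper makes precise when, and in what sense, such behavior can be guaranteed.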
Due to its simplicity, robustness and ease of implementation, the LMS algorithm is known to be one of the most basic adaptive algorithms in many areas, including adaptive signal processing, system identification and adaptive control, and it has received considerable attention in both theory and applications over the past several decades (see, among many others, the books [20], [19] and [2], the survey [14], and the references therein). Also, it has been found recently that the LMS is H^\infty-optimal, in the sense that it minimizes the energy gain from the disturbances to the predicted errors; it is also risk-sensitive optimal and minimizes a certain exponential cost function (see [11]).
In many situations, it is desirable to know at least the answers to the following questions:

- Is the LMS stable in the mean square sense?
- Does the LMS have good tracking ability?
- How can the tracking errors be calculated and minimized?
Now, for a given sequence {\phi_k}, (1) is a linear, time-varying difference equation. The properties of this equation are essentially determined by the homogeneous equation

    x_{k+1} = ( I - \mu \phi_k \phi_k^\tau ) x_k   (2)

with fundamental matrix

    \Phi(t, k) = \prod_{j=k}^{t} ( I - \mu \phi_j \phi_j^\tau ).   (3)
The expression for the tracking errors will then be of the form

    \sum_{k=1}^{t} \Phi(t, k) v(k)   (4)

where {v(k)} describes the error sources (measurement noise, parameter variations, etc.). As elaborated in, e.g., [8] and [6], the essential key to the analysis of (4) is to prove exponential stability of (3). This was also the motivation behind the work of [1]. We shall establish such exponential stability in the sense that for any p \ge 1 there exist positive constants M, \alpha and \mu^* such that

    \{ E \| \Phi(t, k) \|^p \}^{1/p} \le M (1 - \alpha\mu)^{t-k},  \forall t \ge k,  \forall \mu \in (0, \mu^*].   (5)

The expectation E here is with respect to the sequence {\phi_k}.
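The decay asserted in (5) can also be observed numerically. The following Monte Carlo sketch (i.i.d. standard Gaussian regressors, with illustrative values d = 2 and \mu = 0.1 chosen by us) estimates E\|\Phi(t, 0)\| at two horizons:

```python
import numpy as np

rng = np.random.default_rng(1)
d, mu = 2, 0.1

def mean_phi_norm(t, reps=200):
    """Monte Carlo estimate of E||Phi(t,0)|| where
    Phi(t,0) = prod_{j=0}^{t} (I - mu * phi_j phi_j^T), phi_j ~ N(0, I_d)."""
    total = 0.0
    for _ in range(reps):
        P = np.eye(d)
        for _ in range(t + 1):
            p = rng.standard_normal(d)
            P = (np.eye(d) - mu * np.outer(p, p)) @ P
        total += np.linalg.norm(P, 2)  # spectral norm
    return total / reps

# The estimate shrinks geometrically with the horizon, as (5) predicts.
n25, n50 = mean_phi_norm(25), mean_phi_norm(50)
```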
Clearly, the property (5) is a property of the sequence {\phi_k} only. We shall here establish (5) under very general conditions on {\phi_k}. These are of the kind (precise conditions are given in Theorem 2):

- Restrictions on the dependence among the \phi_k. This takes the form that \phi_k is formed by possibly time-varying, but uniformly stable, filtering of a noise source \varepsilon_j which is \phi-mixing and obeys an additional condition on the rate of decay of dependence.

- Restrictions on the tail of the distribution of \phi_k. This takes the form that

    E[ \exp( \eta \|\varepsilon_k\|^2 ) ] \le C,  \forall k   (6)

for some \eta > 0 and some constant C. Here \varepsilon_k is the "source" from which \phi_k was formed.

Both these restrictions are very mild, and allow for example the Gaussian, dependent case (unlike most previous treatments). Now, for sequences \phi_k subject to these two restrictions, the necessary and sufficient condition for (5) to hold is that

    \sum_{i=k+1}^{k+h} E[ \phi_i \phi_i^\tau ] \ge \delta I,  \forall k \ge 0   (7)

for some h > 0 and \delta > 0. This is the "persistence of excitation" or "full rank" condition on \phi_k.
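Condition (7) is easy to verify numerically once the moment matrices E[\phi_i \phi_i^\tau] are available. A small illustrative sketch (the alternating rank-one example below is our own): each single matrix is singular, yet every window of length h = 2 is uniformly positive definite.

```python
import numpy as np

def pe_margin(S_list, h):
    """Smallest eigenvalue of sum_{i=k}^{k+h-1} S_i over all windows of
    length h, where S_i stands for E[phi_i phi_i^T]; condition (7) asks
    for this margin to be bounded away from zero for some fixed h."""
    margins = []
    for k in range(len(S_list) - h + 1):
        block = sum(S_list[k:k + h])
        margins.append(np.linalg.eigvalsh(block)[0])  # eigenvalues ascending
    return min(margins)

# Alternating one-directional excitation: each step is rank deficient,
# but every window of length 2 sums to the identity matrix.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
S_seq = [np.outer(e1, e1) if i % 2 == 0 else np.outer(e2, e2) for i in range(20)]
```

Here pe_margin(S_seq, 1) is zero (no excitation in one step), while pe_margin(S_seq, 2) equals one, so (7) holds with h = 2 and \delta = 1.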
This result is the main contribution of this paper. Furthermore, several direct applications of the stability result to adaptive tracking will be given under various noise assumptions; in particular, these yield more general results on LMS than those established recently in [8].
Most of the existing work related to the exponential stability of (2) is concerned with the case where the signals {\phi_k} are independent or M-dependent (cf., e.g., [20], [19], [4], [1], [2]). This independence assumption can be relaxed considerably if we assume that the signals {\phi_k} are bounded, as in, e.g., [6], [18] and [12].

Note that the boundedness assumption is suitable for the study of the so-called normalized LMS algorithms (cf. [19], [6] and [15]), since the normalized signals are automatically bounded. In this case, some general results, together with a very weak (probably the weakest known) excitation condition for guaranteeing the exponential stability of LMS, can be found in [6]. Moreover, in the bounded \phi-mixing case, a complete characterization of the exponential stability can also be given. Indeed, in that case it has been shown in [6] that (7) is the necessary and sufficient condition for (2) to be exponentially stable.
For general unbounded and correlated random signals, the stability analysis of the standard LMS algorithm (1) becomes considerably more complex, and has defied a complete solution for over 30 years. Recently, some general stability results applicable to unbounded, nonstationary, dependent signals were established in [7], based on which a number of results on the tracking performance of the LMS algorithms can be derived (see [8]). In particular, the result of [7] can be applied to a typical situation where the signal process is generated from a white noise sequence through a stable linear filter:

    \phi_k = \sum_{j=-\infty}^{\infty} A_j \varepsilon_{k-j} + \xi_k,  \sum_{j=-\infty}^{\infty} \|A_j\| < \infty   (8)

where {\varepsilon_k} is an independent sequence satisfying

    \sup_k E[ \exp( \eta \|\varepsilon_k\|^\beta ) ] < \infty  for some \eta > 0, \beta > 2,   (9)

and {\xi_k} is a bounded deterministic process.
It is obvious that the expression (8) has a form similar to the well-known Wold decomposition of wide-sense stationary processes. Note, however, that the signal process {\phi_k} defined by (8) need not in general be a stationary process, nor a Markov chain.
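A process of the form (8) is easy to generate; e.g., Gaussian white noise passed through a (here causal, finite-length) filter, plus a bounded deterministic term. The filter taps and dimensions below are arbitrary illustrative choices of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2, 5000
A = [np.eye(d), 0.5 * np.eye(d), 0.25 * np.eye(d)]  # summable taps A_j
eps = rng.standard_normal((n + len(A), d))          # Gaussian source: satisfies (10)
xi = np.ones(d)                                     # bounded deterministic term

# phi_k = sum_j A_j eps_{k-j} + xi  (a causal, finite-length case of (8))
phi = np.array([sum(Aj @ eps[k + len(A) - 1 - j] for j, Aj in enumerate(A)) + xi
                for k in range(n)])

# Empirical moment matrix; its smallest eigenvalue being positive is the
# h = 1 form of the excitation condition (7) for this stationary example.
S_hat = phi.T @ phi / n
```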
Unfortunately, the condition (9) with \beta > 2 excludes the case where {\varepsilon_k} is a Gaussian process, since such signals can only satisfy the weaker condition

    \sup_k E[ \exp( \eta \|\varepsilon_k\|^2 ) ] < \infty  for some \eta > 0.   (10)

The motivation of this paper has thus been to relax the moment condition (9) so that, at least, the signal processes {\phi_k} defined by (8) and (10) can be included. This will be done in a more general setting, based on a relaxation of the moment condition used in Theorem 3.2 of [7].
2 The Main Results
2.1 Notations
Here we adopt the following notations introduced in [7].

a). The maximum eigenvalue of a matrix X is denoted by \lambda_{max}(X), and the Euclidean norm of X is defined as its maximum singular value, i.e.,

    \|X\| \triangleq \{ \lambda_{max}( X X^\tau ) \}^{1/2},

and the L_p-norm of a random matrix X is defined as

    \|X\|_p \triangleq \{ E( \|X\|^p ) \}^{1/p},  p \ge 1.
b). For any square random matrix sequence F = {F_k} and real numbers p \ge 1, \mu^* \in (0,1), the L_p-exponentially stable family S_p(\mu^*) is defined by

    S_p(\mu^*) = { F : \| \prod_{j=i+1}^{k} ( I - \mu F_j ) \|_p \le M (1 - \alpha\mu)^{k-i}, \forall \mu \in (0, \mu^*], \forall k \ge i \ge 0, for some M > 0 and \alpha \in (0,1) }.

Likewise, the averaged exponentially stable family S(\mu^*) is defined by

    S(\mu^*) = { F : \| \prod_{j=i+1}^{k} ( I - \mu E[F_j] ) \| \le M (1 - \alpha\mu)^{k-i}, \forall \mu \in (0, \mu^*], \forall k \ge i \ge 0, for some M > 0 and \alpha \in (0,1) }.

In what follows, it will be convenient to set

    S_p \triangleq \bigcup_{\mu^* \in (0,1)} S_p(\mu^*),  S \triangleq \bigcup_{\mu^* \in (0,1)} S(\mu^*).   (11)
c). Let p \ge 1 and F \triangleq {F_i}. Set

    M_p = { F : \sup_i \| S_i(T) \|_p = o(T) as T \to \infty }   (12)

where

    S_i(T) = \sum_{j=iT}^{(i+1)T-1} ( F_j - E[F_j] ).   (13)

The definition of M_p is reminiscent of the law of large numbers. As shown by Lemma 3 of [9], it includes a large class of random processes.
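For intuition about (12)-(13): for F_j = w_j w_j^\tau with i.i.d. Gaussian w_j, the block sums S_0(T) grow like \sqrt{T}, hence are o(T). A Monte Carlo sketch (a toy example of our own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def block_norm(T, reps=400):
    """L_2-norm estimate of S_0(T) = sum_{j=0}^{T-1} (F_j - E F_j)
    for F_j = w_j w_j^T with w_j i.i.d. N(0, I_2), so that E F_j = I_2."""
    sq = 0.0
    for _ in range(reps):
        w = rng.standard_normal((T, 2))
        S = w.T @ w - T * np.eye(2)        # block sum of F_j minus its mean
        sq += np.linalg.norm(S, 2) ** 2
    return (sq / reps) ** 0.5

# ||S_0(T)||_2 grows like sqrt(T), i.e. o(T), so this F belongs to M_2.
r100, r400 = block_norm(100), block_norm(400)
```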
2.2 The Main Results
We first present a preliminary theorem.

Theorem 1. Let {F_k} be a random matrix process. Then

    {F_k} \in S \Longrightarrow {F_k} \in S_p,  \forall p \ge 1,

provided that the following two conditions are satisfied:

(i). There exist positive constants \varepsilon, M and K such that for any n \ge 1,

    E[ \exp( \varepsilon \sum_{i=1}^{n} \|F_{j_i}\| ) ] \le M \exp(Kn)

holds for any integer sequence 0 \le j_1 < j_2 < \cdots < j_n.

(ii). There exist a constant M and a nondecreasing function g(T), with g(T) = o(T) as T \to \infty, such that for any fixed T, all small \lambda > 0 and any n \ge i \ge 0,

    E\{ \exp( \lambda \sum_{j=i+1}^{n} \|S_j(T)\| ) \} \le M \exp\{ [ \lambda g(T) + o(\lambda) ]( n - i ) \},

where S_j(T) is defined by (13).
The proof is given in Section 4.
Remark 1. The form of Theorem 1 is similar to that of Theorem 3.2 in [7]. The key difference lies in condition (i). This condition was introduced in [5], p.112, and is, in a certain sense, a relaxation of the corresponding condition used in Theorem 3.2 of [7]. Such a relaxation enables us to include Gaussian signals as a special case when the LMS algorithms are under consideration, as will be shown shortly.
Based on Theorem 1, we may prove that for a large class of unbounded nonstationary signals including (8), the condition (7) is also necessary and sufficient for the exponential stability of LMS.

Let us start with the following decomposition, which is more general than that in (8):

    \phi_k = \sum_{j=-\infty}^{\infty} A(k,j) \varepsilon_{k-j} + \xi_k,  \sum_{j=-\infty}^{\infty} \sup_k \|A(k,j)\| < \infty   (14)

where {\xi_k} is a d-dimensional bounded deterministic process, and {\varepsilon_k} is now a general m-dimensional \phi-mixing sequence. The weighting matrices A(k,j) \in R^{d \times m} are assumed to be deterministic.

We remark that the summability condition in (14) is precisely the standard definition of uniform stability for time-varying linear filters (cf., e.g., [13]). Also, recall that a random sequence {\varepsilon_k} is called \phi-mixing if there exists a nonincreasing function \phi(m) (called the mixing rate), with \phi(m) \in [0, 1], \forall m \ge 0, and \phi(m) \to 0 as m \to \infty, such that

    \sup_{A \in F_{-\infty}^{k}} \sup_{B \in F_{k+m}^{\infty}} | P( B | A ) - P( B ) | \le \phi(m),  \forall m \ge 0,  \forall k \in (-\infty, \infty),

where by definition F_i^j, -\infty \le i \le j \le \infty, is the \sigma-algebra generated by {\varepsilon_k, i \le k \le j}.

The \phi-mixing concept is a standard one in the literature for describing weakly dependent random processes. As is well known, the \phi-mixing property is satisfied by, for example, any M-dependent sequence, sequences generated from bounded white noises via a stable linear filter, and aperiodic Markov chains which are Markov ergodic and satisfy Doeblin's condition (cf. [3]).
The main result of this paper is then stated as follows.

Theorem 2. Consider the random linear equation (2). Let the signal process {\phi_k} be generated by (14), where {\xi_k} is a bounded deterministic sequence and {\varepsilon_k} is a \phi-mixing process which satisfies, for any n \ge 1 and any integer sequence j_1 < j_2 < \cdots < j_n,

    E[ \exp( \eta \sum_{i=1}^{n} \|\varepsilon_{j_i}\|^2 ) ] \le M \exp(Kn)   (15)

where \eta, M and K are positive constants. Then for any p \ge 1, there exist constants \mu^* > 0, M > 0 and \alpha \in (0,1) such that for all \mu \in (0, \mu^*],

    [ E \| \prod_{j=k+1}^{t} ( I - \mu \phi_j \phi_j^\tau ) \|^p ]^{1/p} \le M (1 - \alpha\mu)^{t-k},  \forall t \ge k \ge 0,   (16)

if and only if there exist an integer h > 0 and a constant \delta > 0 such that

    \sum_{i=k+1}^{k+h} E[ \phi_i \phi_i^\tau ] \ge \delta I,  \forall k \ge 0.   (17)
The proof is also given in Section 4.
Remark 2. By taking A(k, 0) = I, A(k, j) = 0, \forall k, \forall j \ne 0, and \xi_k = 0, \forall k, in (14), we see that {\phi_k} coincides with {\varepsilon_k}, which means that Theorem 2 is applicable to any \phi-mixing sequence. Furthermore, if {\varepsilon_k} is bounded, then (15) is automatically satisfied. This shows that Theorem 2 includes the corresponding result in [6] as a special case.

Note, however, that a linearly filtered \phi-mixing process like (14) will in general no longer be a \phi-mixing sequence (because of the possible unboundedness of {\varepsilon_k}). In fact, Theorem 2 is applicable also to a quite large class of processes other than \phi-mixing ones, as shown by the following corollary.
Corollary 1. Let the signal process {\phi_k} be generated by (14), where {\xi_k} is a bounded deterministic sequence and {\varepsilon_k} is an independent sequence satisfying condition (10). Then {\phi_k \phi_k^\tau} \in S_p for all p \ge 1 if and only if there exist an integer h > 0 and a constant \delta > 0 such that (17) holds.

Proof. By Theorem 2, we need only show that condition (15) is true. This is obvious since {\varepsilon_k} is an independent sequence satisfying (10). □
Remark 3. Corollary 1 continues to hold if the independence assumption on {\varepsilon_k} is weakened to M-dependence. Moreover, the moment condition (10) used in Corollary 1 may be further relaxed if additional conditions are imposed. This is the case when, for example, {\phi_k} is a stationary process generated by a stable finite-dimensional linear state space model with the innovation process {\varepsilon_k} being an i.i.d. sequence (see [16]).
3 Performance of Adaptive Tracking
Let us now assume that {y_k} and {\phi_k} are related by a linear regression

    y_k = \phi_k^\tau x_k^0 + v_k   (18)

where {x_k^0} is the true or "fictitious" time-varying parameter process, and {v_k} represents the disturbance or unmodeled dynamics.
The objective of the LMS algorithm (1) is then to track the time-varying unknown parameter process {x_k^0}. The tracking error will depend on the parameter variation process {\Delta_k} defined by

    \Delta_k = x_k^0 - x_{k-1}^0   (19)

through the following error equation, obtained by substituting (18)-(19) into (1):

    \tilde{x}_{k+1} = ( I - \mu \phi_k \phi_k^\tau ) \tilde{x}_k + \mu \phi_k v_k - \Delta_{k+1}   (20)

where \tilde{x}_k \triangleq x_k - x_k^0.

Obviously, the quality of tracking will essentially depend on the properties of {\phi_k, \Delta_k, v_k}. The homogeneous part of (20) is exactly the equation (2), and can be dealt with by Theorem 2. Hence, we need only consider the nonhomogeneous terms in (20). Different assumptions on {\Delta_k, v_k} will give different tracking error bounds or expressions, and we shall treat three cases separately in the following.
3.1 First Performance Analysis
By this, we mean that the tracking performance analysis is carried out in a "worst case" situation, i.e., the parameter variations and the disturbances are only assumed to be bounded in an averaged sense. To be specific, let us make the following assumption:

A1). There exists r > 2 such that

    \sigma \triangleq \sup_k \|v_k\|_r < \infty  and  \delta \triangleq \sup_k \|\Delta_k\|_r < \infty.

Note that this condition includes any "unknown but bounded" deterministic disturbances and parameter variations as a special case.
Theorem 3. Consider the LMS algorithm (1) applied to (18). Let condition A1) be satisfied. Also, let {\phi_k} be as in Theorem 2 with (17) satisfied. Then for all t \ge 1 and all small \mu > 0,

    E \| x_t - x_t^0 \|^2 = O( \sigma^2 + \delta^2 / \mu^2 ) + O( [1 - \alpha\mu]^t ),

where \alpha \in (0,1) is a constant.
This result follows immediately from Theorem 2, (20) and the Hölder inequality. We remark that various such "worst case" results for other commonly used algorithms (e.g., RLS and KF) may be found in [6]. The main implication of Theorem 3 is that the tracking error will be small if both the parameter variation (\delta) and the disturbance (\sigma) are small.
3.2 Second Performance Analysis
By this, we mean that the tracking performance analysis is carried out for zero-mean random parameter variations and disturbances, which may in general be correlated processes. To be specific, we introduce the following set for r \ge 1:

    N_r = { w : \sup_k \| \sum_{i=k+1}^{k+n} w_i \|_r \le c_{w,r} \sqrt{n},  \forall n \ge 1 }   (21)

where c_{w,r} is a constant depending only on r and the distribution of {w_i}.

Obviously, N_r is a subset of M_r defined by (12). It is known (see [9]) that martingale difference, zero-mean \phi-mixing and \alpha-mixing sequences can all be included in N_r. Also, from the proof of Lemma 3 in [9], it is known that the constant c_{w,r} can be dominated by \sup_k \|w_k\|_r in the first two cases, and by \sup_k \|w_k\|_{r+\varepsilon} (\varepsilon > 0) in the last case.
Moreover, it is interesting to note that N_r is invariant under linear transformations. This means that if {\phi_k} and {\varepsilon_k} are related by (8) with \xi_k \equiv 0, then {\varepsilon_k} \in N_r implies that {\phi_k} \in N_r. This can easily be seen from the following inequality:

    \| \sum_{i=k+1}^{k+n} \phi_i \|_r = \| \sum_{j=-\infty}^{\infty} A_j \sum_{i=k+1}^{k+n} \varepsilon_{i-j} \|_r \le \sum_{j=-\infty}^{\infty} \|A_j\| \, \| \sum_{i=k+1}^{k+n} \varepsilon_{i-j} \|_r.

Thus, random processes generated from martingale differences, or from \phi- or \alpha-mixing sequences, via an infinite-order linear filter can all be included in N_r.
Now, we are in a position to introduce the following condition for the second performance analysis.

A2). For some r > 2,  {\Delta_k} \in N_r  and  {\phi_k v_k} \in N_r.
Theorem 4. Consider the LMS algorithm (1) applied to the model (18). Let {\phi_k} be defined as in Theorem 2 with (17) satisfied, and let condition A2) hold for a certain r. Then for all t \ge 1 and all small \mu > 0,

    E \| x_t - x_t^0 \|^2 = O( \mu (c_{v,r})^2 + (c_{\Delta,r})^2 / \mu ) + O( [1 - \alpha\mu]^t ),

where c_{v,r} and c_{\Delta,r} are the constants defined in (21), which depend on the distributions of {\phi_k v_k} and {\Delta_k} respectively. Moreover, \alpha is the same constant as in Theorem 3.

Proof. By Lemma A.2 of [8] and Theorem 2, it is easy to see from (20) that the desired result is true. □

Note that the upper bound in Theorem 4 significantly improves the "crude" bound given in Theorem 3 for small \mu, and it roughly indicates the familiar trade-off between noise sensitivity and tracking ability.
Theorem 4 can be applied directly to the convergence analysis of some standard filtering problems (cf. [20], [4] and [2]). For example, let {y_k} and {\phi_k} be two stationary processes, and assume that our purpose is to track the least mean squares solution

    x^* = [ E( \phi_k \phi_k^\tau ) ]^{-1} E( \phi_k y_k )

of

    \min_x E( y_k - \phi_k^\tau x )^2,

recursively, based on the real-time measurements {y_i, \phi_i, i \le k}. Now, define {v_k} by

    y_k = \phi_k^\tau x^* + v_k.

It is then obvious that E[ \phi_k v_k ] = 0. Furthermore, in many standard situations it can be verified that {\phi_k v_k} \in N_r for some r > 2. Thus Theorem 4, applied to the above linear regression, gives

    E \| x_t - x^* \|^2 = O( \mu ) + O( [1 - \alpha\mu]^t ),

which tends to zero as t \to \infty and \mu \to 0.
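The stationary filtering example above is easy to reproduce numerically. In the sketch below (all numerical values are illustrative choices of ours), LMS is run on stationary data and the squared distance to the least mean squares solution x^* becomes small, consistent with the O(\mu) bound:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, mu = 20000, 2, 0.02
phi = rng.standard_normal((n, d))                 # stationary regressors
x_star = np.array([0.7, -0.3])                    # least mean squares solution
y = phi @ x_star + 0.1 * rng.standard_normal(n)   # v_k with E[phi_k v_k] = 0

x = np.zeros(d)
for phi_k, y_k in zip(phi, y):
    x = x + mu * phi_k * (y_k - phi_k @ x)        # LMS recursion (1)

err = float(np.linalg.norm(x - x_star) ** 2)      # squared tracking error
```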
Clearly, Theorem 4 is also applicable to nonstationary signals {y_k} and {\phi_k}.
3.3 Third Performance Analysis
By this, we mean that the analysis aims at an explicit (approximate) expression for the tracking performance, rather than just an upper bound as in the previous two cases. This is usually carried out under white noise assumptions on {\Delta_k, v_k}. Roughly speaking, the parameter process in this case will behave like a random walk; some detailed interpretations of this parameter model may be found in [14] and [8]. We make the following assumptions:

A3). The regressor process is generated by a time-varying causal filter

    \phi_k = \sum_{j=0}^{\infty} A(k,j) \varepsilon_{k-j} + \xi_k,  \sum_{j=0}^{\infty} \sup_k \|A(k,j)\| < \infty   (22)

where {\xi_k} is a bounded deterministic sequence, and {\varepsilon_k, \Delta_k, v_{k-1}} is a \phi-mixing process with mixing rate denoted by \phi(m). Assume also that (15) and (17) hold.

A4). The process {\Delta_k, v_k} satisfies the following conditions:

(i): E[ v_k | F_k ] = 0,  E[ \Delta_{k+1} | F_k ] = E[ \Delta_{k+1} v_k | F_k ] = 0;
(ii): E[ v_k^2 | F_k ] = R_v(k),  E[ \Delta_k \Delta_k^\tau ] = Q_\Delta(k);
(iii): \sup_k E[ |v_k|^r | F_k ] \le M,  \delta \triangleq \sup_k \|\Delta_k\|_r < \infty,

where r > 2 and M > 0 are constants, and F_k denotes the \sigma-algebra generated by {\varepsilon_i, \Delta_i, v_{i-1}, i \le k}.
Theorem 5. Consider the LMS algorithm (1) applied to the model (18). Let conditions A3) and A4) be satisfied. Then the tracking error covariance matrix has the following expansion for all t \ge 1 and all small \mu > 0:

    E[ \tilde{x}_t \tilde{x}_t^\tau ] = \Pi_t + O( \mu [ \varepsilon(\mu) + \delta^2 ] + (1 - \alpha\mu)^t ),

where the function \varepsilon(\mu) \to 0 as \mu \to 0, and \Pi_t is recursively defined by

    \Pi_{t+1} = ( I - \mu S_t ) \Pi_t ( I - \mu S_t ) + \mu^2 R_v(t) S_t + Q_\Delta(t+1)

with S_t = E[ \phi_t \phi_t^\tau ], and with R_v(t) and Q_\Delta(t) defined as in condition A4).

This theorem relaxes and unifies the conditions used in Theorem 5.1 of [8]. The proof is given in Section 4. The expression for the function \varepsilon(\mu) may be found from the proof, and from the related formula in Theorem 4.1 of [8] (see (45)).

Note that in the (wide-sense) stationary case, S_t \equiv S, R_v(t) \equiv R_v, Q_\Delta(t) \equiv Q_\Delta, and \Pi_t will converge to a matrix \Pi defined by the Lyapunov equation (cf. [8])

    S \Pi + \Pi S = \mu R_v S + \mu^{-1} Q_\Delta.

In this case, the trace of the matrix \Pi, which represents the dominating part of the tracking error E\|\tilde{x}_t\|^2 for small \mu and large t, can be expressed as

    tr(\Pi) = \frac{1}{2} [ \mu R_v d + \mu^{-1} tr( S^{-1} Q_\Delta ) ],

where d \triangleq \dim(\phi_k). Minimizing tr(\Pi) with respect to \mu, one obtains the following formula for the optimal step-size:

    \mu = \sqrt{ tr( S^{-1} Q_\Delta ) / ( R_v d ) }.
4 Proof of Theorems 1, 2 and 5
Proof of Theorem 1.

By the proof of Lemma 5.2 in [7], we know that Theorem 1 will be true if (32) in [7] can be established. However, by (34) in [7] and condition (ii), it is easy to see that we need only show that for any fixed c \ge 1, t \ge 1 and T > 1, and for all small \lambda > 0,

    \| \prod_{j=i+1}^{n} ( 1 + 2\lambda^2 c \|H_j\| ) \|_t \le M [ 1 + O(\lambda^{3/2}) ]^{n-i},  \forall n > i,   (23)

where M > 0 is a constant and H_j is given by the expansion

    \lambda^2 H_j = \lambda^2 H_j(2) + \lambda^3 H_j(3) + \cdots + \lambda^T H_j(T)

with

    H_j(k) = \sum_{jT \le j_1 < j_2 < \cdots < j_k \le (j+1)T-1} F_{j_k} \cdots F_{j_1},  k = 2, \ldots, T.

Now, let us set

    f_j = \exp\{ \lambda^{1/4} \sum_{s=jT}^{(j+1)T-1} \|F_s\| \}.

Then, for any 2 \le k \le T and jT \le j_1 < \cdots < j_k \le (j+1)T - 1, by using the inequalities k \ge 3/2 + k/4 (valid for k \ge 2) and x \le \exp(x), we have for \lambda \in (0,1)

    \lambda^k \|F_{j_k}\| \cdots \|F_{j_1}\| \le \lambda^{3/2} ( \lambda^{1/4}\|F_{j_k}\| ) \cdots ( \lambda^{1/4}\|F_{j_1}\| )
     \le \lambda^{3/2} \exp\{ \lambda^{1/4} ( \|F_{j_1}\| + \cdots + \|F_{j_k}\| ) \}
     \le \lambda^{3/2} f_j.

Consequently,

    ( 1 + 2\lambda^2 c \|H_j\| ) \le \prod_{k=2}^{T} ( 1 + \lambda^k c \|H_j(k)\| )( 1 + O(\lambda^2) )
     \le \prod_{k=2}^{T} \prod_{jT \le j_1 < j_2 < \cdots < j_k \le (j+1)T-1} ( 1 + \lambda^k c \|F_{j_k}\| \cdots \|F_{j_1}\| )( 1 + O(\lambda^2) )
     \le ( 1 + \lambda^{3/2} c f_j )^{2^T} ( 1 + O(\lambda^2) ).   (24)

Note that

    \prod_{j=i+1}^{n} ( 1 + \lambda^{3/2} c f_j ) = \sum_{k=0}^{n-i} ( \lambda^{3/2} c )^k \sum_{i+1 \le j_1 < \cdots < j_k \le n} f_{j_1} \cdots f_{j_k}.

Now, applying the Minkowski inequality to the above identity, noting the disjointness of the index blocks { j : j_i T \le j \le (j_i + 1)T - 1 }, i = 1, 2, \ldots, for j_1 < j_2 < \cdots, taking \lambda small enough so that \lambda^{1/4} 2^T t \le \varepsilon, and using condition (i), it is evident that

    \| \prod_{j=i+1}^{n} ( 1 + \lambda^{3/2} c f_j ) \|_{2^T t} \le \sum_{k=0}^{n-i} ( \lambda^{3/2} c )^k \sum_{i+1 \le j_1 < \cdots < j_k \le n} M^{1/(2^T t)} \exp\{ ( KT / (2^T t) ) k \}
     \le M^{1/(2^T t)} [ 1 + c \lambda^{3/2} \exp( KT / (2^T t) ) ]^{n-i}.

Finally, from this and (24), we have for any n > i

    \| \prod_{j=i+1}^{n} ( 1 + 2\lambda^2 c \|H_j\| ) \|_t \le \| \prod_{j=i+1}^{n} ( 1 + \lambda^{3/2} c f_j ) \|_{2^T t}^{2^T} [ 1 + O(\lambda^2) ]^{n-i}
     \le M \{ 1 + c \lambda^{3/2} \exp( KT / (2^T t) ) \}^{2^T (n-i)} [ 1 + O(\lambda^2) ]^{n-i}
     \le M [ 1 + O(\lambda^{3/2}) ]^{n-i}

for all small \lambda > 0, which is (23). This completes the proof of Theorem 1. □
The proof of Theorem 2 is rather involved, and so it is prefaced with several lemmas.

For the analysis to follow, it is convenient to rewrite (14) as

    \phi_k = \sum_{j=-\infty}^{\infty} a_j \varepsilon(k, j) + \xi_k,  \sum_{j=-\infty}^{\infty} a_j < \infty,   (25)

where by definition

    a_j \triangleq \sup_k \|A(k,j)\|,  \varepsilon(k, j) \triangleq a_j^{-1} A(k,j) \varepsilon_{k-j}.   (26)

(We set \varepsilon(k, j) = 0, \forall k, if a_j = 0 for some j.)

The new process {\varepsilon(k, j)} has the following simple properties:

(i). For any k and j, \|\varepsilon(k, j)\| \le \|\varepsilon_{k-j}\|;

(ii). For any fixed j, the process {\varepsilon(k, j)} is \phi-mixing with the same mixing rate as {\varepsilon_k};

(iii). For any k and j, \varepsilon(k, j) is {\varepsilon_{k-j}}-measurable.

These three properties will be frequently used in the sequel without further explanation.
Lemma 1. Let {F_t} be a \phi-mixing d \times d matrix process with mixing rate {\phi(m)}. Then

    \sup_i \| S_i(T) \|_2 \le 2cd \{ T \sum_{m=0}^{T-1} \sqrt{\phi(m)} \}^{1/2},  \forall T \ge 1,

where S_i(T) is defined by (13) and c is defined by c \triangleq \sup_i \|F_i - E F_i\|_2.

Proof. Denote G_k = F_k - E F_k. Then by Theorem A.6 in [10] (p.278) we have

    \| E[ G_j^\tau G_k ] \| \le 2 d c^2 \sqrt{ \phi( |j - k| ) },  \forall j, k.

Consequently, by using the inequality

    | tr F | \le d \|F\|,  \forall F \in R^{d \times d},

we get

    \| S_i(T) \|_2^2 = E \| \sum_{j=iT}^{(i+1)T-1} G_j \|^2
     \le tr\{ \sum_{j,k=iT}^{(i+1)T-1} E[ G_j^\tau G_k ] \}
     \le d \sum_{j,k=iT}^{(i+1)T-1} \| E[ G_j^\tau G_k ] \|
     \le 2 c^2 d^2 \sum_{j,k=iT}^{(i+1)T-1} \sqrt{ \phi( |j - k| ) }
     \le 4 c^2 d^2 T \sum_{m=0}^{T-1} \sqrt{ \phi(m) }.

This gives the desired result. □
Lemma 2. Let F_k = \phi_k \phi_k^\tau, where {\phi_k} is defined by (14) with \sup_k \|\varepsilon_k\|_4 < \infty. Then {F_k} \in M_2, where M_2 is defined by (12).

Proof. First of all, we may assume that the process {\varepsilon_k} is of zero mean (otherwise, the mean can be included in \xi_k). Then by (25),

    \| S_i(T) \|_2 = \| \sum_{t=iT}^{(i+1)T-1} ( \phi_t \phi_t^\tau - E[ \phi_t \phi_t^\tau ] ) \|_2
     \le \sum_{k,j=-\infty}^{\infty} a_k a_j \| \sum_{t=iT}^{(i+1)T-1} ( \varepsilon(t,k) \varepsilon(t,j)^\tau - E[ \varepsilon(t,k) \varepsilon(t,j)^\tau ] ) \|_2
     + 2 \sum_{j=-\infty}^{\infty} a_j \| \sum_{t=iT}^{(i+1)T-1} \varepsilon(t,j) \xi_t^\tau \|_2.   (27)

Note that for any fixed k and j, the processes {\varepsilon(t,k)\varepsilon(t,j)^\tau} and {\varepsilon(t,j)} are \phi-mixing with mixing rates \phi( m - |k - j| ) and \phi(m) respectively (where by definition \phi(m) \triangleq 1, \forall m < 0).

By Lemma 1, it is easy to see that the last term in (27) is of order o(T). For dealing with the second last term, we denote

    f_{kj}(T) = 2cd \{ T \sum_{m=0}^{T-1} \sqrt{ \phi( m - |k - j| ) } \}^{1/2},   (28)

where c is defined as in Lemma 1. Consequently, by \phi(m) \le 1, \forall m, it is not difficult to see that

    \sup_{k,j} f_{kj}(T) \le 2cdT   (29)

and

    \sup_{|k-j| < \sqrt{T}} f_{kj}(T) = o(T).   (30)

Now, by the summability of {a_j},

    \sum_{|k-j| \ge \sqrt{T}} a_k a_j \to 0,  as T \to \infty.

Hence by (29)

    \sum_{|k-j| \ge \sqrt{T}} a_k a_j f_{kj}(T) = o(T)   (31)

and by (30)

    \sum_{|k-j| < \sqrt{T}} a_k a_j f_{kj}(T) = o(T).   (32)

Combining (31) and (32) gives

    \sum_{k,j=-\infty}^{\infty} a_k a_j f_{kj}(T) = o(T).   (33)

By this and Lemma 1, we know that the second last term in (27) is also of order o(T) uniformly in i. Hence {F_k} \in M_2 by the definition (12). □
Lemma 3. Let \sup_k E\|\phi_k\|^2 < \infty. Then {\phi_k \phi_k^\tau} \in S if and only if condition (17) holds, where S is defined in (11).

Proof. Let us first assume that (17) is true. Take \mu^* = ( 1 + \sup_k E\|\phi_k\|^2 )^{-1}. Then, applying Theorem 2.1 in [6] to the deterministic sequence A_k = E[ \phi_k \phi_k^\tau ] for any \mu \in (0, \mu^*], it is easy to see that {\phi_k \phi_k^\tau} \in S(\mu^*).

Conversely, if {\phi_k \phi_k^\tau} \in S, then there exists \mu^* \in ( 0, ( 1 + \sup_k E\|\phi_k\|^2 )^{-1} ] such that {\phi_k \phi_k^\tau} \in S(\mu^*). Now, applying Theorem 2.2 in [6] to the deterministic sequence A_k = E[ \phi_k \phi_k^\tau ], it is easy to see that (17) holds. This completes the proof. □
Lemma 4. Let F_k = \phi_k \phi_k^\tau, where {\phi_k} is defined by (14) with (15) satisfied. Then {F_k} satisfies Condition (i) of Theorem 1.

Proof. Without loss of generality, assume that \xi_k \equiv 0. Let us denote

    A = \sum_{j=-\infty}^{\infty} a_j,   (34)

where {a_j} is defined by (26). Then, by the Schwarz inequality, from (25) we have

    \|\phi_k\|^2 \le A \sum_{j=-\infty}^{\infty} a_j \|\varepsilon_{k-j}\|^2.

Consequently, by the Hölder inequality and (15), we have for \varepsilon \le \eta A^{-2}:

    E \exp\{ \varepsilon \sum_{i=1}^{n} \|F_{j_i}\| \} \le E \exp\{ \varepsilon A \sum_{j=-\infty}^{\infty} a_j \sum_{i=1}^{n} \|\varepsilon_{j_i - j}\|^2 \}
     = E \prod_{j=-\infty}^{\infty} \exp\{ \varepsilon A a_j \sum_{i=1}^{n} \|\varepsilon_{j_i - j}\|^2 \}
     \le \prod_{j=-\infty}^{\infty} ( E \exp\{ \varepsilon A^2 \sum_{i=1}^{n} \|\varepsilon_{j_i - j}\|^2 \} )^{a_j / A}
     \le \prod_{j=-\infty}^{\infty} ( M \exp\{ Kn \} )^{a_j / A} = M \exp\{ Kn \}.

This completes the proof. □
The following lemma was originally proved in [5] (p.113).

Lemma 5. Let {z_k} be a nonnegative random sequence such that for some a > 0, b > 0 and for all i_1 < i_2 < \cdots < i_n, \forall n \ge 1,

    E \exp\{ \sum_{k=1}^{n} z_{i_k} \} \le \exp\{ an + b \}.   (35)

Then for any L > 0 and any n \ge i \ge 0,

    E \exp\{ \frac{1}{2} \sum_{j=i+1}^{n} z_j I( z_j \ge L ) \} \le \exp\{ e^{a - L/2} ( n - i ) + b \},

where I(\cdot) is the indicator function.

Proof. Denote

    f_j = \exp( \tfrac{1}{2} z_j ) I( z_j \ge L ).

Then, by first applying the simple inequality I( x \ge L ) \le e^{x/2} / e^{L/2} and then using (35), we have for any subsequence j_1 < j_2 < \cdots < j_k:

    E[ f_{j_1} \cdots f_{j_k} ] = E[ \exp( \tfrac{1}{2} \sum_{i=1}^{k} z_{j_i} ) \prod_{i=1}^{k} I( z_{j_i} \ge L ) ]
     \le E[ \exp( \sum_{i=1}^{k} z_{j_i} ) ] / \exp( kL/2 )
     \le \exp\{ ( a - L/2 ) k + b \}.

By this we have

    E \exp\{ \sum_{j=i+1}^{n} \tfrac{1}{2} z_j I( z_j \ge L ) \} = E \prod_{j=i+1}^{n} \exp\{ \tfrac{1}{2} z_j I( z_j \ge L ) \}
     \le E \prod_{j=i+1}^{n} \{ 1 + \exp( \tfrac{1}{2} z_j ) I( z_j \ge L ) \}
     = E \prod_{j=i+1}^{n} \{ 1 + f_j \}
     = E\{ \sum_{k=0}^{n-i} \sum_{i+1 \le j_1 < \cdots < j_k \le n} f_{j_1} \cdots f_{j_k} \}
     \le e^b \{ \sum_{k=0}^{n-i} \sum_{i+1 \le j_1 < \cdots < j_k \le n} \exp\{ ( a - L/2 ) k \} \}
     = e^b \prod_{j=i+1}^{n} \{ 1 + \exp( a - L/2 ) \}
     \le \exp\{ ( n - i ) \exp( a - L/2 ) + b \}.

This completes the proof of Lemma 5. □
Lemma 6. Let F_k = \phi_k \phi_k^\tau, where {\phi_k} is defined by (14) with (15) satisfied. Then {F_k} satisfies Condition (ii) of Theorem 1.

Proof. Set, for any fixed k and l,

    z_j \triangleq z_j(k, l) = \| \sum_{t=jT}^{(j+1)T-1} ( \varepsilon(t,k) \varepsilon(t,l)^\tau - E[ \varepsilon(t,k) \varepsilon(t,l)^\tau ] ) \|.

Then, similarly to (27), from (25) we have

    \sum_{j=i+1}^{n} \| S_j(T) \| \le \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j + 2 \sum_{k=-\infty}^{\infty} a_k \sum_{j=i+1}^{n} \| \sum_{t=jT}^{(j+1)T-1} \varepsilon(t,k) \xi_t^\tau \|.   (36)

We first consider the second last term in (36). By the Hölder inequality,

    E \exp\{ \lambda \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j \}