Necessary and Sufficient Conditions for Stability of LMS

Lei Guo*
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China

Lennart Ljung†
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden

G. J. Wang‡
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China

October 30, 1995
Abstract. In a recent work [7], some general results on exponential stability of random linear equations are established, which can be applied directly to the performance analysis of a wide class of adaptive algorithms, including the basic LMS ones, without requiring stationarity, independence and boundedness assumptions on the system signals. The main purpose of this paper is to provide further results on exponential stability of the LMS algorithms, in particular, to provide a necessary and sufficient condition for such stability in the case of possibly unbounded and non-φ-mixing signals. The results of this paper can be applied to a fairly large class of signals, including those generated from, e.g., a Gaussian process via a stable linear filter. As an application, several refined and extended results on convergence and tracking performance of LMS are derived under various assumptions. Neither stationarity nor Markov chain assumptions are necessarily required in the paper.

*Supported by the National Natural Science Foundation of China.
†Supported by the Swedish Research Council for Engineering Sciences (TFR).
‡Supported by the National Natural Science Foundation of China.
1 Introduction
The well-known least mean squares (LMS) algorithm, aiming at tracking the "best linear fit" of an observed (or desired) signal {y_k} based on a measured d-dimensional (input) signal {φ_k}, is defined recursively by [2]

    x_{k+1} = x_k + μ φ_k (y_k − φ_k^T x_k),   x_0 ∈ R^d,   (1)

where μ > 0 is a step-size.
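As a concrete illustration of the recursion (1), the following sketch runs LMS on simulated data (the dimension, step-size and signal model below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, mu = 2, 2000, 0.05
x_true = np.array([1.0, -0.5])   # hypothetical "best linear fit" to be estimated

x = np.zeros(d)                  # x_0
for k in range(n):
    phi = rng.normal(size=d)                 # measured d-dimensional input
    y = phi @ x_true + 0.1 * rng.normal()    # observed signal with small noise
    x = x + mu * phi * (y - phi @ x)         # the LMS update (1)

err = float(np.linalg.norm(x - x_true))
```

With a persistently exciting input and a small step-size, the iterate settles near the underlying parameter vector.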
Due to its simplicity, robustness and ease of implementation, the LMS algorithm is known to be one of the most basic adaptive algorithms in many areas, including adaptive signal processing, system identification and adaptive control, and it has received considerable attention in both theory and applications over the past several decades (see, among many others, the books [18], [17] and [2], the survey [13], and the references therein). Also, it has been found recently that the LMS is H^∞-optimal in the sense that it minimizes the energy gain from the disturbances to the predicted errors; it is also risk-sensitive optimal and minimizes a certain exponential cost function (see [11]).
In many situations, we would like to know at least the answers to the following questions: Is the LMS stable in the mean square sense? Does the LMS have good tracking ability? And how can the tracking errors be calculated and minimized?
It is shown in [9] that the study of the last two questions essentially depends on the first one, which in turn necessarily depends on the exponential stability of the following homogeneous equation of LMS (cf. [6]):

    x_{k+1} = (I − μ φ_k φ_k^T) x_k.   (2)

This equation is in essence a product of random matrices, and its stability depends mainly on the properties of the measured signals {φ_k}. Most of the early works in this direction concern the case where the signals {φ_k} are independent or M-dependent (cf. [18], [4], [1]). This independence assumption can be relaxed considerably if we assume that the signals {φ_k} are bounded, as in e.g. [17], [6] and [12]. Note that the boundedness assumption is suitable for the study of the so-called normalized LMS algorithms (cf. [17], [6] and [14]), since the normalized signals are automatically bounded. In this case, some general results, together with a very weak (probably the weakest known) excitation condition guaranteeing the exponential stability of LMS, can be found in [6]. Moreover, in the bounded φ-mixing case, a complete characterization of the exponential stability can also be given. Indeed, in that case it has been shown in [6] that the necessary and sufficient condition for (2) to be exponentially stable is that there exist an integer h > 0 and a constant δ > 0 such that

    Σ_{i=k+1}^{k+h} E[φ_i φ_i^T] ≥ δ I,   ∀k.   (3)
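Condition (3) can be checked numerically by estimating the moment matrices E[φ_iφ_i^T] and examining the smallest eigenvalue of each sum over h consecutive indices; the sketch below does this for a hypothetical i.i.d. Gaussian regressor (the values of h, the window count and the Monte Carlo sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, N, draws = 2, 4, 40, 4000

# Monte Carlo estimates of E[phi_i phi_i^T] for i = 0..N-1 (here iid Gaussian,
# so every moment matrix is close to the identity)
S = [np.mean([np.outer(p, p) for p in rng.normal(size=(draws, d))], axis=0)
     for _ in range(N)]

# smallest eigenvalue of each sum of h consecutive moment matrices;
# condition (3) asks for a uniform positive lower bound delta over all windows
min_eigs = [float(np.linalg.eigvalsh(sum(S[k:k + h])).min()) for k in range(N - h)]
delta = min(min_eigs)
```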
For general unbounded and correlated signals, the stability analysis (for the unnormalized LMS algorithm (1)) becomes more complex, and has defied a complete solution for over 30 years. Recently, some general stability results applicable to possibly unbounded, nonstationary, weakly dependent signals were established in [8], and based on these, a number of results on the tracking performance of the LMS algorithms can be derived (see [?]). In particular, the result of [8] can be applied to a typical situation where the signal process is generated from a white noise sequence through a stable linear filter:
    φ_k = Σ_{j=−∞}^{∞} A_j ε_{k−j} + ξ_k,   Σ_{j=−∞}^{∞} ‖A_j‖ < ∞,   (4)

where {ε_k} is an independent sequence satisfying

    sup_k E exp{η ‖ε_k‖^β} < ∞ for some η > 0, β > 2,   (5)

and where {ξ_k} is a bounded deterministic process.
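A process of the form (4) is easy to simulate by filtering Gaussian white noise; the sketch below uses a hypothetical one-sided filter A_j = 0.5^j (truncated), which satisfies the summability requirement Σ_j ‖A_j‖ = 2 < ∞:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5000, 30                          # signal length, filter truncation order
A = 0.5 ** np.arange(m)                  # summable filter coefficients
eps = rng.normal(size=n)                 # Gaussian white noise {eps_k}
xi = 0.1 * np.cos(0.01 * np.arange(n))   # bounded deterministic component {xi_k}

# phi_k = sum_{j>=0} A_j eps_{k-j} + xi_k  (causal convolution, truncated)
phi = np.convolve(eps, A)[:n] + xi
```

The stationary part of φ has variance about Σ_j A_j² = 4/3 here, and, being Gaussian, it satisfies (6) below but not (5).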
It is obvious that the expansion (4) has a similar form to the well-known Wold decomposition for wide-sense stationary processes. Note, however, that the signal process {φ_k} defined by (4) may in general be neither a stationary process nor a Markov chain.
Unfortunately, the condition (5) with β > 2 excludes the case where {ε_k} is a Gaussian process, since such signals can only satisfy the weaker condition

    sup_k E exp{η ‖ε_k‖^2} < ∞ for some η > 0.   (6)

The motivation of this paper has been to relax the moment condition (5) so that, at least, the signal process {φ_k} defined by (4) and (6) can be included. This will be done based on a relaxation of the moment condition used in Theorem 3.2 of [8]. Moreover, we will show that for a large class of weakly dependent nonstationary signals, the condition (3) is also necessary and sufficient for the exponential stability of (2), even in the case where the signal process {φ_k} is unbounded and non-φ-mixing. Furthermore, several direct applications of the stability result to adaptive tracking will be given, which yield more general results than those established previously.
2 The Main Results
2.1 Notations
Here we adopt the following notations introduced in [8].
a). The maximum eigenvalue of a matrix X is denoted by λ_max(X), and the Euclidean norm of X is defined as its maximum singular value, i.e.,

    ‖X‖ ≜ {λ_max(X X^T)}^{1/2},

and the L_p-norm of a random matrix X is defined as

    ‖X‖_p ≜ {E(‖X‖^p)}^{1/p},   p ≥ 1.
b). For any square random matrix sequence F = {F_k} and real numbers p ≥ 1, λ ∈ (0,1), the stochastic exponentially stable family is defined by

    S_p(λ) = { F : ‖ ∏_{j=i+1}^{k} (I − μ F_j) ‖_p ≤ M(1 − αμ)^{k−i}, ∀μ ∈ (0, λ], ∀k ≥ i ≥ 0, for some M > 0 and α ∈ (0,1) }.
Likewise, the corresponding deterministic exponentially stable family is defined by

    S(λ) = { F : ‖ ∏_{j=i+1}^{k} (I − μ E[F_j]) ‖ ≤ M(1 − αμ)^{k−i}, ∀μ ∈ (0, λ], ∀k ≥ i ≥ 0, for some M > 0 and α ∈ (0,1) }.
In what follows, it will be convenient to set

    S_p ≜ ∪_{λ∈(0,1)} S_p(λ),   S ≜ ∪_{λ∈(0,1)} S(λ).   (7)
c). Let p ≥ 1, F ≜ {F_i}. Set

    M_p = { F : sup_i ‖S_i(T)‖_p = o(T) as T → ∞ },   (8)

where

    S_i(T) = Σ_{j=iT}^{(i+1)T−1} (F_j − E[F_j]).   (9)

The definition of M_p is reminiscent of the law of large numbers. As shown by Lemma 3 of [10], it includes a large class of random processes.
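The law-of-large-numbers flavor of M_p can be seen in a small Monte Carlo experiment: for an i.i.d. scalar sequence (an illustrative choice, not from the paper), the centered block sum S_0(T) grows like √T, so its size per unit T shrinks:

```python
import numpy as np

rng = np.random.default_rng(3)

def block_norm(T, reps=400):
    """Monte Carlo estimate of E|S_0(T)|, where S_0(T) = sum_{j<T}(F_j - E F_j)
    for iid F_j ~ N(0, 1); the exact value is sqrt(2*T/pi)."""
    F = rng.normal(size=(reps, T))
    return float(np.abs(F.sum(axis=1)).mean())

r1, r2 = block_norm(100), block_norm(10000)
ratio = (r2 / 10000) / (r1 / 100)   # normalized block norm shrinks: o(T) behavior
```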
2.2 The Main Results
We first present a preliminary theorem.
Theorem 1. Let {F_k} be a random matrix process. Then

    {F_k} ∈ S  ⟹  {F_k} ∈ S_p,   ∀p ≥ 1,

provided that the following two conditions are satisfied:

(i). There exist positive constants ε, M and K such that for any n ≥ 1,

    E exp{ ε Σ_{i=1}^{n} ‖F_{j_i}‖ } ≤ M exp{Kn}

holds for any integer sequence 0 ≤ j_1 < j_2 < ⋯ < j_n.

(ii). There exist a constant M and a nondecreasing function g(T), with g(T) = o(T) as T → ∞, such that for any fixed T, all small λ > 0 and any n ≥ i ≥ 0,

    E exp{ λ Σ_{j=i+1}^{n} ‖S_j(T)‖ } ≤ M exp{ [λ g(T) + o(λ)](n − i) },

where S_j(T) is defined by (9).
The proof is given in Section 4.
Remark 1. The form of Theorem 1 is similar to that of Theorem 3.2 in [8]. The key difference lies in condition (i). This condition was introduced in [5, p.112] and is, in a certain sense, a relaxation of the corresponding condition used in Theorem 3.2 of [8]. Such a relaxation enables us to include Gaussian signals as a special case when the LMS algorithms are under consideration, as will be shown shortly.
Based on Theorem 1, we may prove that for a large class of unbounded nonstationary signals including (4), the condition (3) is also a necessary and sufficient condition for the exponential stability of LMS. Let us start with the decomposition (4):

    φ_k = Σ_{j=−∞}^{∞} A_j ε_{k−j} + ξ_k,   Σ_{j=−∞}^{∞} ‖A_j‖ < ∞,   (10)

where {ξ_k} is a bounded deterministic process, and {ε_k} is now a general φ-mixing sequence.
Recall that a random sequence {ε_k} is called φ-mixing if there exists a positive nonincreasing function φ(m), with φ(m) → 0 as m → ∞, such that

    sup_{A ∈ F_{−∞}^{k}, B ∈ F_{k+m}^{∞}} |P(B|A) − P(B)| ≤ φ(m),   ∀m ≥ 0, k ∈ (−∞, ∞),

where by definition

    F_i^j = σ{ε_k, i ≤ k ≤ j},   −∞ ≤ i ≤ j ≤ ∞.

The φ-mixing concept is a standard one in the literature for describing weakly dependent random processes. As is well known, it is satisfied by, for example, M-dependent sequences, sequences generated from bounded white noise processes via a stable linear filter, and stationary aperiodic Markov chains which are Markov ergodic and satisfy Doeblin's condition (cf. [3]).
The main result of this paper is then stated as follows.
Theorem 2. Consider the random linear equation (2). Let the signal process {φ_k} be generated by (10), where {ξ_k} is a bounded deterministic sequence and {ε_k} is a φ-mixing process which satisfies, for any n ≥ 1 and any integer sequence j_1 < j_2 < ⋯ < j_n,

    E exp{ η Σ_{i=1}^{n} ‖ε_{j_i}‖^2 } ≤ M exp{Kn},   (11)

where η, M and K are positive constants. Then {φ_k φ_k^T} ∈ S_p for all p ≥ 1 if and only if there exist an integer h > 0 and a constant δ > 0 such that

    Σ_{i=k+1}^{k+h} E[φ_i φ_i^T] ≥ δ I,   ∀k ≥ 0.   (12)
The proof is also given in Section 4.
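The conclusion of Theorem 2 can be observed in simulation: under excitation such as (12), the norm of the random matrix product in (2) decays geometrically in the mean square sense. A small sketch (i.i.d. Gaussian regressors and the particular μ are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(4)
d, mu, steps, reps = 2, 0.05, 400, 200

norms = np.zeros(reps)
for r in range(reps):
    P = np.eye(d)
    for _ in range(steps):
        phi = rng.normal(size=d)   # E[phi phi^T] = I, so (12) holds
        P = (np.eye(d) - mu * np.outer(phi, phi)) @ P   # product in equation (2)
    norms[r] = np.linalg.norm(P, 2)

decay = float(np.sqrt((norms ** 2).mean()))   # L2-type norm of the product
```

Each individual factor has spectral norm 1 (the direction orthogonal to φ is untouched), yet the product contracts because the excited direction changes randomly from step to step.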
Remark 2. By taking A_0 = I, A_k = 0 for k ≠ 0, and ξ_k = 0, ∀k, in (10), we see that {φ_k} coincides with {ε_k}, which means that Theorem 2 is applicable to any φ-mixing sequence. Furthermore, if {ε_k} is bounded, then (11) is automatically satisfied. This shows that Theorem 2 may include the corresponding result in [6] as a special case.

Note, however, that a linearly filtered φ-mixing process like (10) will in general no longer be a φ-mixing sequence. In fact, Theorem 2 is applicable also to a large class of processes other than φ-mixing ones, as shown by the following corollary.
Corollary 1. Let the signal process {φ_k} be generated by (4), where {ξ_k} is a bounded deterministic sequence and {ε_k} is an independent sequence satisfying condition (6). Then {φ_k φ_k^T} ∈ S_p for all p ≥ 1 if and only if there exist an integer h > 0 and a constant δ > 0 such that (12) holds.

Proof. By Theorem 2, we need only show that condition (11) is true. This is obvious since {ε_k} is an independent sequence satisfying (6).
Remark 3. The moment condition (6) used in Corollary 1 may be further relaxed if more conditions are imposed. This is the case when, for example, the regressor process is a Markov chain generated by a finite dimensional linear stable state space model with the innovation process being an i.i.d. sequence (see [15]).
3 Applications to Adaptive Tracking
Let us now assume that {y_k} and {φ_k} are related by a linear regression

    y_k = φ_k^T x_k* + v_k,   (13)

where {x_k*} is the true or "fictitious" time-varying parameter process, and {v_k} represents the disturbance or unmodeled dynamics.

The objective of the LMS algorithm (1) is then to track the time-varying unknown parameter process {x_k*}. The tracking error will depend on the parameter variation process {Δ_k}, defined by

    Δ_k = x_k* − x_{k−1}*,   (14)

through the following error equation, obtained by substituting (13)-(14) into (1):

    x̃_{k+1} = (I − μ φ_k φ_k^T) x̃_k + μ φ_k v_k − Δ_{k+1},   (15)

where x̃_k ≜ x_k − x_k*.
Obviously, the quality of tracking will essentially depend on the properties of {φ_k, Δ_k, v_k}. The homogeneous part of (15) is exactly the equation (2), and can be dealt with by Theorem 2. Hence, we need only consider the forcing terms. Different assumptions on {Δ_k, v_k} will give different tracking error bounds or expressions, and we shall treat three cases separately in the following.
3.1 First Performance Analysis
By this, we mean that the tracking performance analysis is carried out in a "worst case" situation, i.e., the parameter variations and the disturbances are only assumed to be bounded in an averaging sense. To be specific, let us make the following assumption:

A1). There exists r > 2 such that

    σ_v ≜ sup_k ‖v_k‖_r < ∞   and   σ_Δ ≜ sup_k ‖Δ_k‖_r < ∞.

Note that this condition also includes any "unknown but bounded" deterministic disturbances and parameter variations.
Theorem 3. Consider the LMS algorithm (1) applied to (13). Let condition A1) be satisfied. Also, let {φ_k} be as in Theorem 2 with (12) satisfied. Then for all t ≥ 1 and all small μ > 0,

    E‖x_t − x_t*‖^2 = O(σ_v^2 + σ_Δ^2/μ^2) + O((1 − αμ)^t),

where α ∈ (0,1) is a constant.

This result follows immediately from Theorem 2, (15) and the Hölder inequality. We remark that various such "worst case" results for other commonly used algorithms (e.g., RLS and KF) may be found in [6]. The main implication of Theorem 3 is that the tracking error will be small if both the parameter variation (σ_Δ) and the disturbance (σ_v) are small.
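A simulation in the spirit of Theorem 3: LMS is run against a slowly drifting parameter with a bounded disturbance, and the steady-state mean square error stays small (all signal sizes and models below are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, mu = 2, 4000, 0.1
drift, noise = 1e-3, 0.05        # bounded sizes of Delta_k and v_k

x_star = np.array([1.0, -1.0])   # time-varying true parameter x_k*
x = np.zeros(d)
sq_err = []
for k in range(n):
    x_star = x_star + drift * np.sin(0.01 * k)   # bounded parameter variation
    phi = rng.normal(size=d)
    y = phi @ x_star + noise * rng.normal()      # regression model (13)
    x = x + mu * phi * (y - phi @ x)             # LMS update (1)
    sq_err.append(float(np.sum((x - x_star) ** 2)))

steady = float(np.mean(sq_err[n // 2:]))   # steady-state E||x_t - x_t*||^2 estimate
```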
3.2 Second Performance Analysis
By this, we mean that the tracking performance analysis is carried out for zero-mean parameter variations and disturbances, which may be correlated random processes in general. To be specific, we introduce the following set for r ≥ 1:

    N_r = { w : sup_k ‖ Σ_{i=k+1}^{k+n} w_i ‖_r ≤ c_r(w) √n, ∀n ≥ 1 },

where c_r(w) is a constant depending on r and {w_i} only.
Obviously, N_r is a subset of M_r defined by (8). It is known (see [10]) that martingale difference, zero-mean φ-mixing and α-mixing sequences can all be included in N_r. Also, from the proof of Lemma 3 in [10], it is known that the constant c_r(w) can be dominated by sup_k ‖w_k‖_r in the first two cases, and by sup_k ‖w_k‖_{r+η} (η > 0) in the last case.
Moreover, it is interesting to note that N_r is invariant under linear transformations. This means that if {φ_k} and {ε_k} are related by (10) with ξ_k ≡ 0, then {ε_k} ∈ N_r implies that {φ_k} ∈ N_r. This can easily be seen from the following inequality:

    ‖ Σ_{i=k+1}^{k+n} φ_i ‖_r = ‖ Σ_{j=−∞}^{∞} A_j Σ_{i=k+1}^{k+n} ε_{i−j} ‖_r ≤ Σ_{j=−∞}^{∞} ‖A_j‖ ‖ Σ_{i=k+1}^{k+n} ε_{i−j} ‖_r.

Thus, random processes generated from martingale differences, or from φ- or α-mixing sequences, via an infinite order linear filter can all be included in N_r.
Now, we are in a position to introduce the following condition for the second performance analysis.

A2). For some r > 2, {Δ_k} ∈ N_r and {φ_k v_k} ∈ N_r.
Theorem 4. Consider the LMS algorithm (1) applied to the model (13). Let {φ_k} be defined as in Theorem 2 with (12) satisfied, and let condition A2) hold. Then for all t ≥ 1 and all small μ > 0,

    E‖x_t − x_t*‖^2 = O( μ c_r^2(φv) + c_r^2(Δ)/μ ) + O((1 − αμ)^t),

where c_r(φv) and c_r(Δ) are constants depending on {φ_k v_k} and {Δ_k} respectively, which may be found in condition A2) through the definition of N_r, and where α is the same constant as in Theorem 3.
Proof. By Lemma A.2 of [9] and Theorem 2, it is easy to see from (15) that the desired result is true.
Note that the upper bound in Theorem 4 significantly improves the "crude" bound given in Theorem 3 for small μ, and it roughly indicates the familiar trade-off between noise sensitivity and tracking ability.
Theorem 4 can be applied directly to the convergence analysis of some standard filtering problems (cf. [18], [4] and [2]). For example, let {y_k} and {φ_k} be two stationary processes, and assume that our purpose is to track the least mean squares solution

    x* = (E φ_k φ_k^T)^{−1} E φ_k y_k

of

    min_x E(y_k − φ_k^T x)^2,

recursively, based on the real-time measurements {y_i, φ_i, i ≤ k}. Now, define {v_k} by

    y_k = φ_k^T x* + v_k.
It is then obvious that E φ_k v_k = 0. Furthermore, in many standard situations it can be verified that {φ_k v_k} ∈ N_r for some r > 2. Thus Theorem 4, applied to the above linear regression, gives

    E‖x_t − x*‖^2 = O(μ) + O((1 − αμ)^t),

which tends to zero as t → ∞ and μ → 0.
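The following sketch illustrates this filtering application: with stationary signals and E[φ_kφ_k^T] = I, the least mean squares solution x* coincides with the hypothetical coefficient vector used to generate y_k, and LMS approaches it (the particular signals and step-size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, mu = 2, 20000, 0.02

w = np.array([0.7, -0.3])                 # hypothetical underlying coefficients
phi_all = rng.normal(size=(n, d))         # stationary regressors, E[phi phi^T] = I
y_all = phi_all @ w + 0.2 * rng.normal(size=n)   # here x* = (E phi phi^T)^{-1} E phi y = w

x = np.zeros(d)
for phi, y in zip(phi_all, y_all):
    x = x + mu * phi * (y - phi @ x)      # track x* from real-time measurements

gap = float(np.linalg.norm(x - w))
```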
Apparently, Theorem 4 is also applicable to nonstationary signals {y_k} and {φ_k}.
3.3 Third Performance Analysis
By this, we mean that the analysis aims to obtain an explicit (approximate) expression for the tracking performance, rather than just an upper bound as in the previous two cases. This is usually carried out under white noise assumptions on {Δ_k, v_k}. Roughly speaking, the parameter process in this case will behave like a random walk, and some detailed interpretations of this parameter model may be found in [13] and [9]. We make the following assumptions:

A3). The regressor process is generated by a causal filter

    φ_k = Σ_{j=0}^{∞} A_j ε_{k−j} + ξ_k,   Σ_{j=0}^{∞} ‖A_j‖ < ∞,   (16)

where {ξ_k} is a bounded deterministic sequence, and {ε_k, Δ_k, v_{k−1}} is a φ-mixing process with mixing rate denoted by φ(m). Assume also that (11) and (12) hold.
A4). The process {Δ_k, v_k} satisfies the following conditions:

(i): E[v_k | F_k] = 0, E[Δ_{k+1} | F_k] = E[Δ_{k+1} v_k | F_k] = 0;

(ii): E[v_k^2 | F_k] = R_v(k), E[Δ_k Δ_k^T] = Q_Δ(k);

(iii): sup_k E[|v_k|^r | F_k] ≤ M, σ ≜ sup_k ‖Δ_k‖_r < ∞,

where r > 2 and M > 0 are constants, and F_k denotes the σ-algebra generated by {ε_i, Δ_i, v_{i−1}, i ≤ k}.
Theorem 5. Consider the LMS algorithm (1) applied to the model (13). Let conditions A3) and A4) be satisfied. Then for all t ≥ 1 and all small μ > 0,

    E[x̃_t x̃_t^T] = Π_t + O( μ[ε(μ) + σ^2 + (1 − αμ)^t] ),

where the function ε(μ) → 0 as μ → 0, and Π_t is recursively defined by

    Π_{t+1} = (I − μ S_t) Π_t (I − μ S_t) + μ^2 R_v(t) S_t + Q_Δ(t+1),

with S_t = E[φ_t φ_t^T], and R_v(t) and Q_Δ(t) defined as in condition A4).
This theorem relaxes and unifies the conditions used in Theorem 5.1 of [9]. The proof is given in Section 4, and is based on a general result established in [9]. The expression for the function ε(μ) may also be found from the analysis, and from the related formula in Theorem 4.1 of [9].
Note that in the (wide-sense) stationary case, S_t ≡ S, R_v(t) ≡ R_v, Q_Δ(t) ≡ Q_Δ, and Π_t will converge to a matrix Π defined by the Lyapunov equation (cf. [9])

    μ(S Π + Π S) = μ^2 R_v S + Q_Δ.

In this case, the trace of the matrix Π, which represents the dominating part of the tracking error E‖x̃_t‖^2 for small μ, can be expressed as

    tr(Π) = (1/2)[ μ R_v d + tr(S^{−1} Q_Δ)/μ ],
where d ≜ dim(φ_k). Minimizing tr(Π) with respect to μ, one obtains the following formula for the optimal step-size:

    μ* = [ tr(S^{−1} Q_Δ) / (R_v d) ]^{1/2}.
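The trade-off captured by tr(Π) and the optimal step-size can be verified numerically; the sketch below compares the closed-form minimizer with a grid search (the matrices S, Q_Δ and the scalar R_v are illustrative assumptions, not from the paper):

```python
import numpy as np

# illustrative stationary quantities (assumed for this sketch)
S = np.diag([2.0, 1.0])        # S = E[phi_k phi_k^T]
Q = np.diag([1e-4, 4e-4])      # Q = E[Delta_k Delta_k^T]
Rv, d = 0.5, 2                 # R_v = E[v_k^2 | F_k], d = dim(phi_k)

def tr_Pi(mu):
    # dominating part of the tracking error: (1/2)[mu*Rv*d + tr(S^{-1}Q)/mu]
    return 0.5 * (mu * Rv * d + np.trace(np.linalg.inv(S) @ Q) / mu)

# closed-form optimal step-size mu* = sqrt(tr(S^{-1}Q) / (Rv*d))
mu_star = float(np.sqrt(np.trace(np.linalg.inv(S) @ Q) / (Rv * d)))

# brute-force minimizer on a fine grid, for comparison
grid = np.linspace(1e-3, 0.2, 2000)
mu_grid = float(grid[np.argmin([tr_Pi(m) for m in grid])])
```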
4 Proof of Theorems 1, 2 and 5
Proof of Theorem 1.

Going through the proof of Theorem 3.2 in Section V of [8], we find that it suffices to show that for any fixed T > 1 and all small λ > 0,

    ‖ ∏_{j=i+1}^{n} (1 + 2cλ‖H_j‖) ‖_t ≤ M[1 + O(λ^{3/2})]^{n−i},   ∀n ≥ i,   (17)

where c ≥ 1, t ≥ 1 and M > 0 are constants, and

    λ^2 H_j = λ^2 H_j(2) + λ^3 H_j(3) + ⋯ + λ^T H_j(T) + O(λ^2),

with

    H_j(k) = Σ_{jT ≤ j_1 < j_2 < ⋯ < j_k ≤ (j+1)T−1} F_{j_k} ⋯ F_{j_1},   k = 2, …, T.
Now, let us set

    f_j = exp{ λ^{1/4} Σ_{s=jT}^{(j+1)T−1} ‖F_s‖ }.

Then for any 2 ≤ k ≤ T and jT ≤ j_1 < ⋯ < j_k ≤ (j+1)T − 1, by using the inequalities λ^k ≤ λ^{3/2 + k/4} and x ≤ e^x, we have for λ ∈ (0,1)

    λ^k ‖F_{j_k}‖ ⋯ ‖F_{j_1}‖ ≤ λ^{3/2} (λ^{1/4}‖F_{j_k}‖) ⋯ (λ^{1/4}‖F_{j_1}‖)
        ≤ λ^{3/2} exp{ λ^{1/4}(‖F_{j_1}‖ + ⋯ + ‖F_{j_k}‖) } ≤ λ^{3/2} f_j.
Consequently,

    (1 + 2cλ‖H_j‖) ≤ ∏_{k=2}^{T} (1 + cλ^k‖H_j(k)‖)(1 + O(λ^2))
        ≤ ∏_{k=2}^{T} ∏_{jT ≤ j_1 < j_2 < ⋯ < j_k ≤ (j+1)T−1} (1 + cλ^k‖F_{j_k}‖ ⋯ ‖F_{j_1}‖)(1 + O(λ^2))
        ≤ (1 + cλ^{3/2} f_j)^{2^T} (1 + O(λ^2)).   (18)
Note that

    ∏_{j=i+1}^{n} (1 + cλ^{3/2} f_j) = Σ_{k=0}^{n−i} (cλ^{3/2})^k Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} f_{j_1} ⋯ f_{j_k}.

Now, applying the Minkowski inequality to the above identity, taking λ small enough so that λ^{1/4} 2^T t ≤ ε, and using condition (i), it is evident that

    ‖ ∏_{j=i+1}^{n} (1 + cλ^{3/2} f_j) ‖_{2^T t} ≤ Σ_{k=0}^{n−i} (cλ^{3/2})^k Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} M^{1/(2^T t)} exp{ (KT/(2^T t)) k }
        ≤ M^{1/(2^T t)} [ 1 + cλ^{3/2} exp(KT/(2^T t)) ]^{n−i}.

Finally, combining this with (18), it is not difficult to see that (17) is true. This completes the proof.
The proof of Theorem 2 is rather involved, and so it is prefaced with several lemmas.
Lemma 1. Let {F_t} be a φ-mixing d × d dimensional matrix process with mixing rate {φ(m)}. Then

    sup_i ‖S_i(T)‖_2 ≤ 2cd { T Σ_{m=0}^{T−1} √φ(m) }^{1/2},   ∀T ≥ 1,

where S_i(T) is defined by (9), and c is defined by c = sup_i ‖F_i − E F_i‖_2.
Proof. Denote G_k = F_k − E F_k. Then by Theorem A.6 in [12, p.278] we have

    ‖E[G_j^T G_k]‖ ≤ 2dc^2 √φ(|j − k|),   ∀j, k.

Consequently, by using the inequality

    |tr A| ≤ d‖A‖,   ∀A ∈ R^{d×d},

we get

    ‖S_i(T)‖_2^2 = E ‖ Σ_{j=iT}^{(i+1)T−1} G_j ‖^2 ≤ tr{ Σ_{j,k=iT}^{(i+1)T−1} E G_j^T G_k }
        ≤ d Σ_{j,k=iT}^{(i+1)T−1} ‖E G_j^T G_k‖ ≤ 2c^2 d^2 Σ_{j,k=iT}^{(i+1)T−1} √φ(|j − k|)
        ≤ 4c^2 d^2 T Σ_{m=0}^{T−1} √φ(m).

This gives the desired result.
Lemma 2. Let F_k = φ_k φ_k^T, where {φ_k} is defined by (10) with sup_k ‖ε_k‖_2 < ∞. Then {F_k} ∈ M_2, where M_2 is defined by (8).
Proof. First of all, we may assume that the process {ε_k} is of zero mean (otherwise, the mean can be included in ξ_k). Then by (10),

    ‖S_i(T)‖_2 ≤ Σ_{k,j=−∞}^{∞} ‖A_k‖ ‖A_j‖ ‖ Σ_{t=iT}^{(i+1)T−1} [ε_{t−k} ε_{t−j}^T − E ε_{t−k} ε_{t−j}^T] ‖_2
        + 2 Σ_{j=−∞}^{∞} ‖A_j‖ ‖ Σ_{t=iT}^{(i+1)T−1} ε_{t−j} ξ_t^T ‖_2.   (19)

Note that for any fixed k and j, both the processes {ε_{t−k} ε_{t−j}^T} and {ε_{t−j}} are φ-mixing, with mixing rates φ(m − |k − j|) and φ(m) respectively (where by definition φ(m) = 1, ∀m < 0).
By Lemma 1, it is easy to see that the last term in (19) is of order o(T). For dealing with the second last term, we denote

    f_{kj}(T) = 2cd { T Σ_{m=0}^{T−1} √φ(m − |k − j|) }^{1/2}.   (20)

Also, assume without loss of generality that φ(m) ≤ 1, ∀m ≥ 0. Then it is obvious that

    sup_{k,j} f_{kj}(T) ≤ 2cdT,   (21)

and

    sup_{|k−j| < √T} f_{kj}(T) = o(T).   (22)

Now, by the summability of {A_i},

    Σ_{|k−j| ≥ √T} ‖A_k‖ ‖A_j‖ → 0 as T → ∞.

Hence by (21),

    Σ_{|k−j| ≥ √T} ‖A_k‖ ‖A_j‖ f_{kj}(T) = o(T),   (23)

and by (22),

    Σ_{|k−j| < √T} ‖A_k‖ ‖A_j‖ f_{kj}(T) = o(T).   (24)

Combining (23) and (24) gives

    Σ_{k,j=−∞}^{∞} ‖A_k‖ ‖A_j‖ f_{kj}(T) = o(T).   (25)

By this and Lemma 1, we know that the second last term in (19) is also of order o(T), uniformly in i. Hence, {F_k} ∈ M_2 by definition.
Lemma 3. Let sup_k E‖φ_k‖^2 < ∞. Then {φ_k φ_k^T} ∈ S if and only if condition (12) holds, where S is defined by (7).

Proof. Let us first assume that (12) is true. Take λ = (1 + sup_k E‖φ_k‖^2)^{−1}. Then, applying Theorem 2.3 in [6] to the deterministic sequence A_k = E[φ_k φ_k^T] for any μ ∈ (0, λ], it is easy to see that {φ_k φ_k^T} ∈ S(λ).

Conversely, if {φ_k φ_k^T} ∈ S, then there exists λ ∈ (0, (1 + sup_k E‖φ_k‖^2)^{−1}] such that {φ_k φ_k^T} ∈ S(λ). Now, applying Theorem 2.3 in [6] to the deterministic sequence A_k = E[φ_k φ_k^T], it is easy to see that (12) holds. This completes the proof.
Lemma 4. Let F_k = φ_k φ_k^T, where {φ_k} is defined by (10) with (11) satisfied. Then {F_k} satisfies condition (i) of Theorem 1.

Proof. Without loss of generality, assume that ξ_k ≡ 0. Let us denote

    A = Σ_{j=−∞}^{∞} ‖A_j‖.   (26)

Then by the Schwarz inequality, from (10) we have

    ‖φ_k‖^2 ≤ A Σ_{j=−∞}^{∞} ‖A_j‖ ‖ε_{k−j}‖^2.

Consequently, by the Hölder inequality and (11), we have for ε ≤ η A^{−2}

    E exp{ ε Σ_{i=1}^{n} ‖F_{j_i}‖ } ≤ E exp{ εA Σ_{j=−∞}^{∞} ‖A_j‖ Σ_{i=1}^{n} ‖ε_{j_i − j}‖^2 }
        = E ∏_{j=−∞}^{∞} exp{ εA ‖A_j‖ Σ_{i=1}^{n} ‖ε_{j_i − j}‖^2 }
        ≤ ∏_{j=−∞}^{∞} ( E exp{ εA^2 Σ_{i=1}^{n} ‖ε_{j_i − j}‖^2 } )^{‖A_j‖/A}
        ≤ ∏_{j=−∞}^{∞} ( M exp{Kn} )^{‖A_j‖/A} = M exp{Kn}.

This completes the proof.
The following lemma originally appeared in [10, p.113].

Lemma 5. Let {z_k} be a nonnegative sequence such that for some a > 0, b > 0, and for all i_1 < i_2 < ⋯ < i_n, ∀n ≥ 1,

    E exp{ Σ_{k=1}^{n} z_{i_k} } ≤ exp{an + b}.   (27)

Then for any L > 0 and any n ≥ i ≥ 0,

    E exp{ (1/2) Σ_{j=i+1}^{n} z_j I(z_j ≥ L) } ≤ exp{ e^{a − L/2}(n − i) + b },

where I(·) is the indicator function.
Proof. Denote

    f_j = exp( (1/2) z_j ) I(z_j ≥ L).

Then, by first applying the simple inequality I(x ≥ L) ≤ e^{x/2}/e^{L/2} and then using (27), we have for any subsequence j_1 < j_2 < ⋯ < j_k

    E[f_{j_1} ⋯ f_{j_k}] = E exp( (1/2) Σ_{i=1}^{k} z_{j_i} ) I( ∩_{i=1}^{k} {z_{j_i} ≥ L} )
        ≤ e^{−kL/2} E exp( Σ_{i=1}^{k} z_{j_i} )
        ≤ exp{ (a − L/2)k + b }.

By this we have

    E exp{ Σ_{j=i+1}^{n} (1/2) z_j I(z_j ≥ L) } = E ∏_{j=i+1}^{n} exp{ (1/2) z_j I(z_j ≥ L) }
        ≤ E ∏_{j=i+1}^{n} { 1 + exp( (1/2) z_j ) I(z_j ≥ L) }
        = E{ Σ_{k=0}^{n−i} Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} f_{j_1} ⋯ f_{j_k} }
        ≤ e^b { Σ_{k=0}^{n−i} Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} exp{ (a − L/2)k } }
        = e^b ∏_{j=i+1}^{n} { 1 + exp(a − L/2) }
        ≤ exp{ (n − i) exp(a − L/2) + b }.

This completes the proof of Lemma 5.
Lemma 6. Let F_k = φ_k φ_k^T, where {φ_k} is defined by (10) with (11) satisfied. Then {F_k} satisfies condition (ii) of Theorem 1.

Proof. Set, for any fixed k and l,

    z_j = z_j(k, l) = ‖ Σ_{t=jT}^{(j+1)T−1} [ε_{t−k} ε_{t−l}^T − E ε_{t−k} ε_{t−l}^T] ‖.

Then, similar to (19), we have

    Σ_{j=i+1}^{n} ‖S_j(T)‖ ≤ Σ_{k,l=−∞}^{∞} ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j
        + 2 Σ_{k=−∞}^{∞} ‖A_k‖ Σ_{j=i+1}^{n} ‖ Σ_{t=jT}^{(j+1)T−1} ε_{t−k} ξ_t^T ‖.   (28)

We first consider the second last term in (28). By the Hölder inequality,
    E exp{ λ Σ_{k,l=−∞}^{∞} ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j } = E ∏_{k,l=−∞}^{∞} exp{ λ ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j }
        ≤ ∏_{k,l=−∞}^{∞} ( E exp{ λ A^2 Σ_{j=i+1}^{n} z_j } )^{‖A_k‖‖A_l‖/A^2},   (29)

where A is defined by (26).
Now, let c = sup_k E‖ε_k‖^2, and note that

    ‖ε_{t−k} ε_{t−l}^T‖ ≤ (1/2)( ‖ε_{t−k}‖^2 + ‖ε_{t−l}‖^2 );

we have

    z_j ≤ (1/2) Σ_{t=jT}^{(j+1)T−1} ( ‖ε_{t−k}‖^2 + ‖ε_{t−l}‖^2 ) + cT.

By this and (11), it is easy to prove that the sequence {η z_j} satisfies condition (27) with a = (K + ηc)T and b = log M, where η is defined as in (11). Consequently, by Lemma 5 we have for any L > 0

    E exp{ (η/2) Σ_{j=i+1}^{n} z_j I(z_j ≥ LT) } ≤ M exp{ e^{(K + ηc − ηL/2)T}(n − i) }.   (30)

Now, in view of (30), taking λ < η A^{−2}/4 and L > 2η^{−1}(K + ηc), and applying the Hölder inequality, we have

    E exp{ 2λ A^2 Σ_{j=i+1}^{n} z_j I(z_j ≥ LT) } ≤ M exp{ ε(T)(n − i) },   (31)

where ε(T) → 0 as T → ∞ is defined by

    ε(T) = 4η^{−1} λ A^2 exp{ (K + ηc − ηL/2)T }.

Next, we consider the term x_j ≜ z_j I(z_j ≤ LT). By the inequality e^x ≤ 1 + 2x, 0 ≤ x ≤ log 2, we have for small λ > 0

    exp{ 2λ A^2 Σ_{j=i+1}^{n} x_j } ≤ ∏_{j=i+1}^{n} (1 + 4λ A^2 x_j).   (32)

As noted before, for any fixed k and l, the process {ε_{t−k} ε_{t−l}^T} is φ-mixing with mixing rate φ(m − |k − l|). Hence, similar to the proof of Corollary 3.1 in [1, p.1383], we have

    E ∏_{j=i+1}^{n} (1 + 4λ A^2 x_j) ≤ 2{ 1 + 8λ A^2 [f_{kl}(T) + 2LT φ(T + 1 − |k − l|)] }^{n−i}
        ≤ 2 exp{ 8λ A^2 [f_{kl}(T) + 2LT φ(T + 1 − |k − l|)](n − i) },   (33)

where f_{kl}(T) is defined by (20).
Finally, combining (31)-(33) and using the Schwarz inequality, we get

    E exp{ λ A^2 Σ_{j=i+1}^{n} z_j } ≤ { E exp{ 2λ A^2 Σ_{j=i+1}^{n} z_j I(z_j ≥ LT) } }^{1/2} { E exp{ 2λ A^2 Σ_{j=i+1}^{n} x_j } }^{1/2}
        ≤ √(2M) exp{ [ε(T) + 8λ A^2 f_{kl}(T) + 16λ LT A^2 φ(T + 1 − |k − l|)](n − i) }.

Substituting this into (29) and noting (25), it is not difficult to see that there exists a function g(T) = o(T) such that for all small λ > 0,

    E exp{ λ Σ_{k,l=−∞}^{∞} ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j } ≤ √(2M) exp{ λ g(T)(n − i) }.

Obviously, a similar bound can also be derived for the last term in (28) by a similar treatment. Hence, it is easy to see that the lemma is true.
Proof of Theorem 2.

Necessity: Let {φ_k φ_k^T} ∈ S_p for p = 2. Then by Lemma 2 and Theorem 3.1 in [8], we know that {φ_k φ_k^T} ∈ S. Consequently, by Lemma 3, we know that (12) holds.

Sufficiency: If condition (12) holds, then by Lemma 3 we have {φ_k φ_k^T} ∈ S