
Performance Analysis of General Tracking Algorithms

Lei Guo
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China

Lennart Ljung
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden

Abstract: A general family of tracking algorithms for linear regression models is studied. It includes the familiar LMS (gradient approach), RLS (recursive least squares) and KF (Kalman filter) based estimators. The exact expressions for the quality of the obtained estimates are complicated. Approximate, and easy-to-use, expressions for the covariance matrix of the parameter tracking error are developed. These are applicable over the whole time interval, including the transient, and the approximation error can be explicitly calculated.

I. Introduction

Tracking is the key factor in adaptive algorithms of all kinds. We shall in this contribution study the special case where the underlying model is a linear regression, i.e., the observations are related by

$$y_k = \varphi_k^T \theta_k + v_k, \qquad k \ge 0. \qquad (1)$$

Here $y_k$ is an observation made at time $k$, $\varphi_k$ is a $d$-dimensional vector that is known at time $k$, $v_k$ represents a disturbance, and the parameter vector $\theta_k$ describes how the components of $\varphi_k$ relate to the observation $y_k$. The objective is to estimate the vector $\theta_k$ from the measurements $\{y_t, \varphi_t,\ t \le k\}$.

Many technical problem formulations fit the structure (1) by choosing $\varphi_k$ and $y_k$ appropriately. See, among many references, for example, [15] and [22].

In order to come up with good algorithms for estimating $\theta_k$, it is natural to introduce some assumptions about the time-variation of this parameter vector. In general we may write

$$\theta_k = \theta_{k-1} + \gamma w_k \qquad (2)$$

where $\gamma$ is a scaling constant and $w_k$ is an as yet undefined variable.

The tracking algorithms will provide us with an estimate

$$\hat\theta_k = \hat\theta_k(y^k, \varphi^k, \hat\theta_0) \qquad (3)$$

where the superscript denotes the whole time history: $y^k = \{y_0, y_1, \ldots, y_k\}$, etc.

Supported by the National Natural Science Foundation of China. Supported by the Swedish Research Council for Engineering Sciences (TFR).

A prime question concerns of course the quality of such an estimate. We shall evaluate the quality in terms of the covariance matrix of the tracking error

$$\tilde\theta_k = \theta_k - \hat\theta_k. \qquad (4)$$

This covariance matrix will be denoted by

$$\Pi_k^0 = E[\tilde\theta_k\tilde\theta_k^T] \qquad (5)$$

where the expectation is taken over all relevant stochastic variables. A precise definition will be given later.

An exact expression for $\Pi_k^0$ will be very complicated, except in some trivial cases, and it will not be possible to derive it explicitly in closed form. However, the practical importance of having good tracking algorithms and estimates of their quality still makes it vital to be able to work with $\Pi_k^0$.

For that reason, there is a quite substantial literature on the problem of how to approximate $\Pi_k^0$ with expressions $\Pi_k$ that are simple to work with. This literature is, partly, surveyed in [2], [1], [12], and [20].

The current paper has the ambition to give a general result that subsumes and extends most of the earlier results.

Example 1.1 (A Preview Example).

Consider the model (1)-(2) under the assumptions that

a). $\varphi_k$ and $\theta_k$ are scalars;

b). $\{\varphi_k\}$, $\{v_k\}$ and $\{w_k\}$ are independent sequences of independent random variables with zero mean values and variances $R_\varphi$, $R_v$ and $Q_w$, respectively;

c). the fourth moment of $\varphi_k$ is $R_4$.

Assume also that the estimate $\hat\theta_k$ is computed by the simple LMS algorithm

$$\hat\theta_{k+1} = \hat\theta_k + \mu\varphi_k(y_k - \varphi_k\hat\theta_k). \qquad (6)$$

This case is one, essentially the only one, where a simple exact expression for $\Pi_k^0$ can be calculated. Straightforward calculations give

$$\tilde\theta_{k+1} = (1 - \mu\varphi_k^2)\tilde\theta_k - \mu\varphi_k v_k + \gamma w_{k+1}. \qquad (7)$$

Squaring and taking expectations gives

$$\Pi_{k+1}^0 = (1 - 2\mu R_\varphi + \mu^2 R_4)\Pi_k^0 + \mu^2 R_\varphi R_v + \gamma^2 Q_w. \qquad (8)$$


This is a linear time-invariant difference equation for $\Pi_k^0$, and can be explicitly solved. In particular, if $|1 - 2\mu R_\varphi + \mu^2 R_4| < 1$, the solution of (8) will converge to $\Pi^\mu$ with

$$\Pi^\mu = \frac{1}{1 - \mu R_4/(2R_\varphi)}\,\bar\Pi, \qquad \bar\Pi = \frac{1}{2\mu R_\varphi}\left[\mu^2 R_\varphi R_v + \gamma^2 Q_w\right]. \qquad (9)$$

Simple manipulations then give

$$|\Pi^\mu - \bar\Pi| \le \delta(\mu)\,\bar\Pi, \qquad \delta(\mu) = \frac{\mu R_4/(2R_\varphi)}{1 - \mu R_4/(2R_\varphi)}.$$

Thus, $\Pi^\mu$ can be well approximated by $\bar\Pi$ for small $\mu$, since $\delta(\mu) \to 0$ as $\mu \to 0$. □
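To make the example concrete, the recursion (8) is easy to check against simulation. The following Python sketch is our own illustration (the values of $\mu$, $\gamma$, $R_\varphi$, $R_v$, $Q_w$ are hypothetical, not from the paper): it runs the scalar LMS algorithm (6) over many independent realizations and compares the empirical mean square tracking error with (8) and with the stationary value $\bar\Pi$ from (9).

```python
import numpy as np

# Hypothetical scalar example (our own choice of constants, not from the paper).
rng = np.random.default_rng(0)
mu, gamma = 0.05, 0.01            # adaptation rate and drift scaling
R_phi, R_v, Q_w = 1.0, 0.1, 1.0   # variances of phi_k, v_k, w_k
R_4 = 3.0 * R_phi**2              # fourth moment of a Gaussian phi_k
N, M = 2000, 5000                 # time horizon and number of Monte Carlo runs

theta = np.zeros(M)               # true parameters, one per realization
theta_hat = np.zeros(M)           # LMS estimates
emp = np.zeros(N)                 # empirical E[(theta_k - theta_hat_k)^2]
Pi = np.zeros(N)                  # deterministic recursion (8)

for k in range(1, N):
    phi = rng.normal(0.0, np.sqrt(R_phi), M)
    v = rng.normal(0.0, np.sqrt(R_v), M)
    w = rng.normal(0.0, np.sqrt(Q_w), M)
    y = phi * theta + v
    theta_hat += mu * phi * (y - phi * theta_hat)      # LMS update (6)
    theta += gamma * w                                 # random-walk drift (2)
    emp[k] = np.mean((theta - theta_hat) ** 2)
    Pi[k] = (1 - 2*mu*R_phi + mu**2*R_4) * Pi[k-1] + mu**2*R_phi*R_v + gamma**2*Q_w

Pi_bar = (mu**2 * R_phi * R_v + gamma**2 * Q_w) / (2 * mu * R_phi)   # bar-Pi in (9)
print(emp[-1], Pi[-1], Pi_bar)    # the three values should be close for small mu
```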

Now, this example was particularly easy, primarily because of the assumed independence among $\{\varphi_k, v_k, w_k\}$, which makes $\varphi_k$ and $\tilde\theta_k$ independent.

In more general cases we have to deal with dependence among $\{\varphi_k\}$, and that is actually at the root of the problem. Generally speaking, if $\{\varphi_k\}$ are weakly dependent, then so should $\varphi_k$ and $\tilde\theta_k$ be, provided that $\hat\theta_k$ in (3) depends only to a small extent on the "latest" $y_k$ and $\varphi_k$, i.e., if the adaptation rate ($\mu$ in the example) is small and the error equation ((7) in the example) is stable.

The extra term caused by the dependence in the equation corresponding to (8) in the example should then have negligible influence. Indeed, it is the purpose of this contribution to establish this for a fairly general family of tracking algorithms. Despite the simple idea, it turns out to be surprisingly technically difficult to prove. This paper could be said to mark the end of a series of results on performance analysis, starting with Theorem 1 in [12] and then followed by [14], [13] and [10]. There are many related, relevant results using other approaches. We may point to [20], [2], [5], [6], [4], [16], [3], [18], and to the references in these books and papers.

The bottom line of the analysis is a result of the character

$$\left\| E[\tilde\theta_k\tilde\theta_k^T] - \Pi_k \right\| \le \delta(\mu)\,\|\Pi_k\| \qquad (10)$$

where $\delta(\mu) \to 0$ as $\mu \to 0$, and $\mu$ is a measure of the adaptation rate in the algorithm. Here $\Pi_k$ obeys a simple linear, deterministic difference equation (like (8) without the term $\mu^2 R_4$).

The point with a result of the character (10) is, clearly, that we can arbitrarily well approximate the actual tracking error covariance matrix with a simple expression that can be easily evaluated and analyzed. The essence of this paper does not lie in the expression for $\Pi_k$ itself; it is not difficult to conjecture that such an approximation should be reasonable. Our contribution is rather to establish the connection in the explicit fashion (10) for a wide family of the most common tracking algorithms. One important step in achieving such results is to first establish that the underlying algorithm is exponentially stable. This is a major problem in itself, and a companion paper [9] is devoted to this step, for the same family of algorithms.

The paper is organized as follows. In Section 2 the tracking algorithms are briefly described. Section 3 gives the main result: that (10) holds under the same general conditions for all algorithms in the family. There we also briefly discuss the practical consequences of the result. In the following section, a more general theorem is presented, which is the basis for the analysis. This theorem is more general, and uses weaker but less explicit conditions. The proof of the main result is then given in Section 5, by showing that the general theorem can be applied to our family of algorithms. Notice that this analysis is of independent interest in that, for each individual algorithm, the conditions can be somewhat weakened in different ways.

II. The Family of Tracking Algorithms

We shall consider the general adaptation algorithm

$$\hat\theta_{k+1} = \hat\theta_k + \mu L_k(y_k - \varphi_k^T\hat\theta_k), \qquad \mu \in (0,1), \qquad (11)$$

where the gain $L_k$ is chosen in some different ways:

Case 1: Least Mean Squares (LMS):

$$L_k = \varphi_k \qquad (12)$$

This is a standard algorithm, [21], [22], and has been used in numerous adaptive signal processing applications.

Case 2: Recursive Least Squares (RLS):

$$L_k = P_k\varphi_k \qquad (13)$$

$$P_k = \frac{1}{1-\mu}\left[ P_{k-1} - \frac{\mu P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1 - \mu + \mu\varphi_k^T P_{k-1}\varphi_k} \right] \qquad (14)$$

$$P_0 > 0. \qquad (15)$$

This gives an estimate $\hat\theta_k$ that minimizes

$$\sum_{t=1}^{k}(1-\mu)^{k-t}(y_t - \varphi_t^T\theta)^2$$

where $(1-\mu)$ is the "forgetting factor".

Case 3: Kalman Filter (KF) Based Algorithm:

$$L_k = \frac{P_{k-1}\varphi_k}{R + \mu\varphi_k^T P_{k-1}\varphi_k} \qquad (16)$$

$$P_k = P_{k-1} - \frac{\mu P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{R + \mu\varphi_k^T P_{k-1}\varphi_k} + \mu Q \qquad (17)$$

$$(R > 0, \quad Q > 0). \qquad (18)$$

Here $R$ is a positive number and $Q$ is a positive definite matrix. The choice of $L_k$ corresponds to a Kalman filter state estimate for (1)-(2), and is optimal in the a posteriori mean square sense if $v_k$ and $w_k$ are Gaussian white noises with covariance matrices $R$ and $Q$, respectively, and if $\mu$ is chosen as $\gamma$ in (2).
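For concreteness, the three gain choices can be collected in a single update routine. The following Python sketch is a minimal implementation of (11) with the gains (12)-(17); the function and variable names are ours, and the final loop only illustrates the call pattern on synthetic data.

```python
import numpy as np

def tracking_step(theta_hat, P, phi, y, mu, method="LMS", R=1.0, Q=None):
    """One step of the general algorithm (11) with the gain L_k from (12)-(17).

    theta_hat : current estimate (d,), P : current P-matrix (d, d),
    phi : regressor (d,), y : scalar observation, mu : adaptation rate.
    """
    if Q is None:
        Q = np.eye(theta_hat.size)
    if method == "LMS":                                  # gain (12)
        L = phi
    elif method == "RLS":                                # update (14), then gain (13)
        P = (P - mu * np.outer(P @ phi, phi @ P)
                 / (1 - mu + mu * phi @ P @ phi)) / (1 - mu)
        L = P @ phi
    elif method == "KF":                                 # gain (16), then update (17)
        denom = R + mu * phi @ P @ phi
        L = P @ phi / denom
        P = P - mu * np.outer(P @ phi, phi @ P) / denom + mu * Q
    else:
        raise ValueError("unknown method: " + method)
    theta_hat = theta_hat + mu * L * (y - phi @ theta_hat)   # update (11)
    return theta_hat, P

# Illustration on synthetic data with a fixed true parameter.
rng = np.random.default_rng(1)
d, theta_true = 2, np.array([1.0, -0.5])
theta_hat, P = np.zeros(d), np.eye(d)
for _ in range(200):
    phi = rng.normal(size=d)
    y = phi @ theta_true + 0.1 * rng.normal()
    theta_hat, P = tracking_step(theta_hat, P, phi, y, mu=0.05, method="RLS")
print(theta_hat)
```

Note that, following (13)-(14), the RLS gain uses the already updated $P_k$, while the KF gain (16) uses $P_{k-1}$.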

If $\{\varphi_k, y_k, \theta_k\}$ obey (1)-(2) and $\hat\theta_k$ is found using (11), we can write the estimation error $\tilde\theta_k$ as

$$\tilde\theta_{k+1} = (I - \mu F_k)\tilde\theta_k - \mu L_k v_k + \gamma w_{k+1}, \qquad F_k = L_k\varphi_k^T. \qquad (19)$$


This is a purely algebraic consequence of (1)-(2) and (11), and holds for whatever sequences $v_k$ and $w_k$.

If we introduce stochastic assumptions about $\{v_k\}$ and $\{w_k\}$, we can use (19) to express the covariance matrix $E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T]$. That will however be quite complex, primarily due to the dependence between $\{L_k, \varphi_k, \tilde\theta_k\}$. The basic approximating expression will instead be based on the following expression

$$\Pi_{k+1} = (I - \mu G_k)\Pi_k(I - \mu G_k)^T + \mu^2 R_v(k) M_k + \gamma^2 Q_w(k+1) \qquad (20)$$

where $G_k = EF_k$, $M_k = E[L_k L_k^T]$, $R_v(k) = Ev_k^2$ and $Q_w(k) = E[w_k w_k^T]$. As follows from Example 1.1, this would be the correct expression for the covariance matrix of $\tilde\theta_{k+1}$, if $v_k$ and $w_k$ were white noises and $L_k\varphi_k^T$ was independent of $\tilde\theta_k$, and if a term of size $\mu^2\Pi_k$ was neglected.

Indeed, we shall prove that (20) provides a good approximation of the true covariance matrix in the sense that (10) holds. Note that $\Pi_k$ obeys a simple linear difference equation, and can easily be calculated and examined.
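Since (20) is a plain linear matrix recursion, it can be evaluated directly once the moments $G_k$, $M_k$, $R_v(k)$ and $Q_w(k)$ are available. A minimal Python sketch follows; the interface, with the moments supplied by the user as callables, is our own choice.

```python
import numpy as np

def approx_covariance(Pi0, G, M, R_v, Q_w, mu, gamma, N):
    """Iterate the approximating recursion (20),

        Pi_{k+1} = (I - mu*G_k) Pi_k (I - mu*G_k)^T
                   + mu^2 * R_v(k) * M_k + gamma^2 * Q_w(k+1),

    where G(k) = E[F_k], M(k) = E[L_k L_k^T], R_v(k) = E[v_k^2] and
    Q_w(k) = E[w_k w_k^T] are supplied by the caller as callables.
    """
    I = np.eye(Pi0.shape[0])
    Pi, out = Pi0.copy(), [Pi0.copy()]
    for k in range(N):
        A = I - mu * G(k)
        Pi = A @ Pi @ A.T + mu**2 * R_v(k) * M(k) + gamma**2 * Q_w(k + 1)
        out.append(Pi.copy())
    return out

# Example with time-invariant moments (our own numbers); for LMS, G_k = M_k = S.
S = np.array([[2.0, 0.3], [0.3, 1.0]])
traj = approx_covariance(np.zeros((2, 2)), lambda k: S, lambda k: S,
                         lambda k: 0.1, lambda k: np.eye(2), mu=0.05, gamma=0.01, N=500)
print(traj[-1])
```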

III. The Main Result

A. The Assumptions

We shall now consider the algorithm (11) with either of the three choices of the gain $L_k$ discussed in the previous section. For the analysis we shall impose some conditions on the involved variables. These are of the following character.

C1. The regressors $\{\varphi_k\}$ span the regressor space (in order to ensure that the whole parameter vector $\theta$ can be estimated).

C2. The dependence between the regressors $\varphi_k$ and $(\varphi_i, v_{i-1}, w_i)$ decays to zero as the time distance $(k - i)$ tends to infinity.

C3. The measured error $v_k$ and the parameter drift $w_k$ are of white noise character.

In more exact terms, the three assumptions take the following form:

P1. Let $S_t = E[\varphi_t\varphi_t^T]$, and assume that there exist constants $h > 0$ and $\delta > 0$ such that

$$\sum_{t=k+1}^{k+h} S_t \ge \delta I, \qquad \forall k.$$

P2. Let $\mathcal{G}_k$ denote the $\sigma$-algebra generated by $\varphi_k$, and $\mathcal{F}_k$ the one generated by $\{\varphi_i, v_{i-1}, w_i,\ i \le k\}$. Assume that $\{\varphi_k\}$ is weakly dependent ($\phi$-mixing) in the sense that there is a function $\phi(m)$ with $\phi(m) \to 0$ as $m \to \infty$, such that

$$\sup_{A \in \mathcal{G}_{k+m},\ B \in \mathcal{F}_k} |P(A \mid B) - P(A)| \le \phi(m), \qquad \forall k,\ \forall m. \qquad (21)$$

Also, assume that there is a constant $c_\varphi > 0$ such that $\|\varphi_k\| \le c_\varphi$ a.s. $\forall k$.

P3. Let $\mathcal{F}_k$ be the $\sigma$-algebra defined in P2, and assume that

$$E[v_k \mid \mathcal{F}_k] = 0, \qquad E[w_{k+1} \mid \mathcal{F}_k] = E[w_{k+1}v_k \mid \mathcal{F}_k] = 0,$$
$$E[v_k^2 \mid \mathcal{F}_k] = R_v(k), \qquad E[w_k w_k^T] = Q_w(k),$$
$$\sup_k\left\{ E[|v_k|^r \mid \mathcal{F}_k] + E\|w_k\|^r \right\} \le M \quad \text{for some } r > 2,\ M > 0.$$

B. The Result

Now, let $\Pi_k$ be defined by the following linear, deterministic difference equation:

$$\Pi_{k+1} = (I - \mu R_k S_k)\Pi_k(I - \mu R_k S_k)^T + \mu^2 R_v(k) R_k S_k R_k + \gamma^2 Q_w(k+1) \qquad (22)$$

where $S_k = E[\varphi_k\varphi_k^T]$, and $R_k$ is defined as follows:

LMS-case:
$$R_k = I \qquad (23)$$

RLS-case:
$$R_k = R_{k-1} - \mu R_{k-1}S_k R_{k-1} + \mu R_{k-1}, \qquad (R_0 = P_0) \qquad (24)$$

KF-case:
$$R_k = R_{k-1} - \mu R_{k-1}S_k R_{k-1} + \mu Q/R, \qquad (R_0 = P_0/R). \qquad (25)$$

We then have the following main result.

Theorem 3.1: Consider any of the three basic algorithms in Section 2. Assume that P1, P2 and P3 hold. Let $\Pi_k$ be defined as above. Then, $\forall \mu \in (0, \mu^*)$, $\forall k \ge 1$,

$$\left\| E[\tilde\theta_k\tilde\theta_k^T] - \Pi_k \right\| \le c\left[ \delta(\mu)\mu + \mu^2 + (1 - \alpha\mu)^k \right] \qquad (26)$$

where $\delta(\mu) \to 0$ (as $\mu \to 0$) is defined by

$$\delta(\mu) = \min_{m \ge 1}\left\{ \sqrt{\mu}\,m + \phi(m) \right\}, \qquad (27)$$

$\phi(m)$ was defined in P2, and $\alpha \in (0,1)$, $\mu^* \in (0,1)$, $c > 0$ are constants which may be computed using properties of $\{\varphi_k, v_k, w_k\}$.

The proof is given in Section 5. Let us now discuss the conditions used in the above theorem.

C. The Degree of Approximation

First of all, it is clear that the quantity $\delta(\mu)$ plays an important role. The faster it tends to zero, the better the approximation obtained. The rate by which it tends to zero is according to (27) a reflection of how fast $\phi(m)$ (that is, the dependence among the regressors) tends to zero as $m$ increases. For example, if the regressors are $m$-dependent, so that $\varphi_k$ and $\varphi_\ell$ are independent for $|k - \ell| > m$, then $\phi(n) = 0$ for $n > m$ and $\delta(\mu)$ will behave like $\sqrt{\mu}$. Also, if the dependence is exponentially decaying ($\phi(m) \le Ce^{-\alpha m}$), then we can find that

$$\delta(\mu) < C\mu^{0.5 - \varepsilon}$$

for arbitrarily small, positive $\varepsilon$. This gives a good picture of typical decay rates of $\delta$.
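As an illustration of these decay rates, (27) can be evaluated numerically for a given mixing profile. A small sketch, assuming an exponential mixing bound $\phi(m) = Ce^{-\alpha m}$ with constants chosen by us:

```python
import numpy as np

def delta(mu, phi, m_max=10_000):
    """delta(mu) = min_{m >= 1} { sqrt(mu)*m + phi(m) }, cf. (27)."""
    m = np.arange(1, m_max + 1)
    return np.min(np.sqrt(mu) * m + phi(m))

phi_exp = lambda m: 2.0 * np.exp(-0.5 * m)   # exponentially decaying mixing coefficients
for mu in (1e-2, 1e-3, 1e-4):
    # decays roughly like sqrt(mu)*log(1/mu), i.e. faster than mu^(0.5 - eps)
    print(mu, delta(mu, phi_exp))
```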


D. Persistence of Excitation: Condition P1

Condition P1 is quite natural and weak, just requiring the regressor covariance matrix to add up to full rank over a given time span of arbitrary length. It has been known to be a necessary condition (in a certain sense) for boundedness of $E\|\tilde\theta_k\|^2$ generated by LMS (cf. [8]); it is also known to be the minimum excitation condition needed for the stability analysis of RLS (cf. [10]).

E. Boundedness and φ-mixing of the Regressors: Condition P2

Condition P2 requires boundedness and φ-mixing of the regressors. Although such conditions are standard ones in the literature (e.g. [11]), they can still be considered as restrictive. As seen in several of the results in Section 5, both the φ-mixing and the boundedness can be weakened considerably when we deal with specific algorithms.

It may also be remarked that when $\{\varphi_k\}$ is unbounded, we can modify the algorithm and make Theorem 3.1 hold true: introduce the normalized signals

$$(\bar y_k, \bar\varphi_k, \bar v_k) = \frac{1}{\sqrt{1 + \|\varphi_k\|^2}}\,(y_k, \varphi_k, v_k).$$

Then we have from (1)

$$\bar y_k = \theta_k^T\bar\varphi_k + \bar v_k.$$

Thus, $\{\theta_k\}$ may be estimated based on this normalized linear regression. In this case, Theorem 3.1 can be applied if only $S_k$ and $R_v(k)$ in (22)-(25) are replaced by

$$E\left[\frac{\varphi_k\varphi_k^T}{1 + \|\varphi_k\|^2}\right] \quad \text{and} \quad E\left[\frac{1}{1 + \|\varphi_k\|^2}\right]R_v(k),$$

respectively.
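A small sketch of this normalization (our own illustration, with heavy-tailed regressors standing in for an unbounded $\{\varphi_k\}$):

```python
import numpy as np

def normalize(y, phi, v=0.0):
    """Normalized signals (y, phi, v) / sqrt(1 + ||phi||^2)."""
    s = 1.0 / np.sqrt(1.0 + phi @ phi)
    return y * s, phi * s, v * s

# The matrices entering (22)-(25) become the normalized moments, e.g.
# S_k = E[phi_k phi_k^T / (1 + ||phi_k||^2)], estimated here by a sample average.
rng = np.random.default_rng(2)
samples = rng.standard_t(df=3, size=(10_000, 2))   # heavy-tailed (unbounded) regressors
S_bar = np.mean([np.outer(p, p) / (1.0 + p @ p) for p in samples], axis=0)
print(S_bar)
```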

F. The Parameter Drift Model: Condition P3

There are two things to mention around Condition P3. First, we note that the martingale difference property of $w_k$ essentially means that the true parameters, according to the model (2), are assumed to follow a random walk. Although this model is quite standard, it has also been criticized as being too restrictive. We believe that a random walk model, in the context of slow adaptation (small $\mu$), captures the tracking behavior of the algorithm very well. This is, in a sense, a worst case analysis, since the future behavior of the model is unpredictable.

We may also note that time-varying covariances $Q_w(k)$ and $R_v(k)$ are allowed. Several of the special model drift cases described in [12] are therefore covered by P3. Other drift models, where the driving noise is colored, can be put into a similar Kalman filter framework. However, covering also that case with our techniques requires more work.

Condition P3 also introduces assumptions about moments higher than 2. We remark that if we only assume that $\{v_k\}$ and $\{w_k\}$ are bounded in, e.g., the mean square sense, then upper bounds for the mean square tracking errors can be established (cf. [8] and [7]). The strengthened assumption in P3 allows us to obtain performance values much more accurate than upper bounds.

G. The Practical Use of the Theorem

The practical consequence of Theorem 3.1 is that a very simple algorithm, the linear, deterministic difference equation (22), will describe the tracking behavior. Now, this equation is quite easy to analyze. In fact, there is an extensive literature on such analysis, in particular for the special case of LMS. Among many references, we may refer to [12] for a survey of such results. In essence, all these results capture the dilemma between the tracking error ($\Pi$ is large because $\mu$ is small) and the noise sensitivity ($\Pi$ is large because $\mu$ is large) and may point to the best compromises between these requirements.

For example, under weak stationarity of the regressors, $S_k \equiv S$, we find that $R_k$ will converge to $\tilde R$ as $k \to \infty$, where $\tilde R = I$ in the LMS-case, $\tilde R = S^{-1}$ in the RLS-case, and for the KF-case we have to solve

$$\tilde R S\tilde R = Q/R$$

for $\tilde R$. Inserted into (22) this gives the following stationary values $\Pi$ for the tracking error covariance matrix (neglecting the term $\mu^2$):

LMS: $\quad S\Pi + \Pi S = \mu R_v S + \dfrac{\gamma^2}{\mu}Q_w$

RLS: $\quad \Pi = \dfrac{1}{2}\left[\mu R_v S^{-1} + \dfrac{\gamma^2}{\mu}Q_w\right]$

KF: $\quad \tilde R S\Pi + (\tilde R S\Pi)^T = \mu R_v Q/R + \dfrac{\gamma^2}{\mu}Q_w$

Note that if we have $Q = Q_w$ and $R = R_v$, then the latter equation can be solved as

$$\Pi = \frac{R}{2}\left(\mu + \frac{\gamma^2}{\mu}\right)\tilde R.$$

From these expressions the trade-offs between tracking ability and noise sensitivity are clearly visible.
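Under this stationarity assumption the three equations above can also be solved numerically. The sketch below uses hypothetical values of $S$, $R_v$, $Q_w$, $R$, $Q$ chosen by us; the LMS and KF Lyapunov equations are solved by Kronecker vectorization, which is adequate for the small dimensions of an illustration.

```python
import numpy as np

def lyap(A, C):
    """Solve A X + X A^T = C by Kronecker vectorization (fine for small d)."""
    d = A.shape[0]
    K = np.kron(np.eye(d), A) + np.kron(A, np.eye(d))
    return np.linalg.solve(K, C.flatten(order="F")).reshape((d, d), order="F")

def psd_sqrt(A):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

mu, gamma = 0.05, 0.01
S = np.array([[2.0, 0.3], [0.3, 1.0]])     # stationary E[phi_k phi_k^T]
R_v, Q_w = 0.1, np.eye(2)                  # noise variance and drift covariance
R, Q = R_v, Q_w                            # KF algorithm "tuned" to the true noises

rhs = mu * R_v * S + (gamma**2 / mu) * Q_w

Pi_lms = lyap(S, rhs)                                               # S Pi + Pi S = rhs
Pi_rls = 0.5 * (mu * R_v * np.linalg.inv(S) + (gamma**2 / mu) * Q_w)

# KF case: solve R_tilde S R_tilde = Q/R for a symmetric R_tilde, then the Lyapunov eq.
Sh = psd_sqrt(S)
Shi = np.linalg.inv(Sh)
R_tilde = Shi @ psd_sqrt(Sh @ (Q / R) @ Sh) @ Shi
Pi_kf = lyap(R_tilde @ S, mu * R_v * (Q / R) + (gamma**2 / mu) * Q_w)
print(Pi_lms, Pi_rls, Pi_kf, sep="\n")
```

With $Q = Q_w$ and $R = R_v$ as above, `Pi_kf` reproduces the closed-form value $(R/2)(\mu + \gamma^2/\mu)\tilde R$.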

IV. A General Theorem

In this section, we shall present a general theorem on the performance of the tracking algorithm (11) when the gain $L_k$ is not specified, from which our main result, Theorem 3.1, will follow. The general theorem has weaker, but less explicit, assumptions. From now on the treatment and discussion will be more technical. However, the main line of thought in the proofs follows the outline given after Example 1.1 in the Introduction.

A. Notations

The following notations will be used in the remainder of the paper. These notations are the same as in the companion paper [9].


a). The minimum and maximum eigenvalues of a matrix $X$ are denoted by $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$, respectively, and

$$\|X\| \triangleq \{\lambda_{\max}(XX^T)\}^{1/2}, \qquad \|X\|_p \triangleq \{E(\|X\|^p)\}^{1/p}, \quad p \ge 1.$$

b). Let $x = \{x_k(\mu),\ k \ge 1\}$ be a random sequence parameterized by $\mu \in (0,1)$. Denote

$$L_p(\mu^*) = \Big\{ x : \sup_{\mu \in (0,\mu^*]}\ \sup_{k \ge 1}\ \|x_k(\mu)\|_p < \infty \Big\}. \qquad (28)$$

c). Let $F = \{F_k(\mu)\}$ be any (square) matrix random process parameterized by $\mu \in (0,1)$. For any $p \ge 1$, $\mu^* \in (0,1)$, define

$$\mathcal{S}_p(\mu^*) = \Big\{ F : \Big\|\prod_{j=i+1}^{k}(I - \mu F_j(\mu))\Big\|_p \le M(1 - \alpha\mu)^{k-i},\ \forall \mu \in (0,\mu^*],\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \text{ and } \alpha \in (0,1) \Big\};$$

similarly,

$$\mathcal{S}(\mu^*) = \Big\{ F : \Big\|\prod_{j=i+1}^{k}(I - \mu E[F_j(\mu)])\Big\| \le M(1 - \alpha\mu)^{k-i},\ \forall \mu \in (0,\mu^*],\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \text{ and } \alpha \in (0,1) \Big\}.$$

In what follows, it will be convenient to introduce the sets

$$\mathcal{S}_p \triangleq \bigcup_{\mu^* \in (0,1)} \mathcal{S}_p(\mu^*), \qquad \mathcal{S} \triangleq \bigcup_{\mu^* \in (0,1)} \mathcal{S}(\mu^*). \qquad (29)$$

We may call these stability sets. They are related to the stability of the random equation (19) and the deterministic equation (20), respectively. For simplicity, we shall sometimes suppress the argument $\mu$ in $F_k(\mu)$ when there is no risk of confusion.

d). For scalar random sequences $a = (a_k,\ k \ge 0)$, we set

$$\mathcal{S}^0(\lambda) = \Big\{ a : a_k \in [0,1],\ E\prod_{j=i+1}^{k}(1 - a_j) \le M\lambda^{k-i},\ \forall k \ge i \ge 0,\ \text{for some } M > 0 \Big\}.$$

Also,

$$\mathcal{S}^0 \triangleq \bigcup_{\lambda \in (0,1)} \mathcal{S}^0(\lambda). \qquad (30)$$

e). Let $p \ge 1$ and let $x \triangleq \{x_i\}$ be any random process. Set

$$\mathcal{M}_p = \Big\{ x : \Big\|\sum_{i=m+1}^{m+n} x_i\Big\|_p \le C_p\,n^{1/2},\ \forall n \ge 1,\ m \ge 0,\ \text{for some } C_p \text{ depending only on } p \text{ and } x \Big\}.$$

As is known, for example from [10], martingale difference sequences, $\phi$- and $\alpha$-mixing sequences, and linear processes (processes generated from a white noise source via a linear filter with absolutely summable impulse response) all belong to the set $\mathcal{M}_p$.

In particular, when $\{x_i\}$ is a martingale difference sequence, by the Burkholder inequality we have ($p > 1$)

$$\Big\|\sum_{i=m+1}^{m+n} x_i\Big\|_p \le (B_p\,x_p^*)\,n^{1/2}, \qquad \forall n \ge 1,\ m \ge 0, \qquad (31)$$

where $x_p^* \triangleq \sup_k\|x_k\|_p$, and $B_p$ is a constant depending on $p$ only (cf. [11]). (This fact will be frequently used in the sequel without further explanation.)

f). Let $\{A_k\}$ be a matrix sequence and $b_k \ge 0$, $\forall k \ge 0$. Then by $A_k = O(b_k)$ we mean that there exists a constant $M < \infty$ such that

$$\|A_k\| \le Mb_k, \qquad \forall k \ge 0.$$

The constant $M$ may be called the ordo-constant. Throughout the sequel, the ordo-constant does not depend on $\mu$, even if $\{A_k\}$ or $\{b_k\}$ does.

B. Assumptions

We will first show how, given the exponential stability of the homogeneous part of (19) and a certain weak dependence property of the adaptation gains, the tracking performance can be analyzed; then we present more detailed discussions of such properties.

In the sequel, unless otherwise stated, $\mathcal{F}_k$ denotes the $\sigma$-algebra generated by $\{\varphi_i, w_i, v_{i-1},\ i \le k\}$, and $\{F_k\}$ is defined in (19).

To establish the general theorem, we need the following assumptions:

(A1). (Exponential stability) There are $\mu^* \in (0,1)$ and $p \ge 2$ such that $\{F_k\} \in \mathcal{S}_p(\mu^*) \cap \mathcal{S}(\mu^*)$.

(A2). (Weak dependence) There is a real number $q \ge 3$ together with a bounded function $\psi(m) \ge 0$ (which in general may also depend on $\mu$) with

$$\lim_{m \to \infty,\ \mu \to 0}\psi(m) = 0$$

(taking first $m$ to infinity and then $\mu$ to zero), such that $\forall m$, $\forall k$, $\forall \mu \in (0,\mu^*]$,

$$\|E[F_k \mid \mathcal{F}_{k-m}] - E[F_k]\|_q \le \psi(m).$$

(A3). $L_i \in \mathcal{F}_i$, $\forall i \ge 1$, and there is $\mu^* \in (0,1)$ such that

$$\{L_i\} \in L_r(\mu^*), \qquad \{F_i\} \in L_{2q}(\mu^*),$$

with $r = \left(\tfrac{1}{2} - \tfrac{1}{p} - \tfrac{3}{2q}\right)^{-1}$, and with $p$ and $q$ defined as in (A1) and (A2).

in (A1) and (A2).


(A4). For all $k \ge 1$ we have

$$E[v_k \mid \mathcal{F}_k] = 0, \qquad E[w_{k+1} \mid \mathcal{F}_k] = E[w_{k+1}v_k \mid \mathcal{F}_k] = 0,$$
$$E[v_k^2 \mid \mathcal{F}_k] = R_v(k), \qquad E[w_{k+1}w_{k+1}^T] = Q_w(k+1),$$
$$E[|v_k|^r \mid \mathcal{F}_k] + E[\|w_{k+1}\|^r] \le M < \infty, \qquad \forall k \ge 1,$$

for deterministic quantities $R_v(k)$, $Q_w(k+1)$ and $M$, where $r$ is defined as in (A3).

The key conditions are (A1) and (A2). In general, (A1) can be guaranteed by a certain type of stochastic persistence of excitation condition, which is studied in the companion paper [9], while (A2) can be guaranteed by imposing a certain weak dependence condition on the regressors $\{\varphi_i\}$. More detailed discussions will be given later. At the moment, we just remark that if (A1) and (A2) hold for all $p \ge 1$ and all $q \ge 1$, then in (A3) and (A4) the number $r$ need only satisfy $r > 2$.

C. The General Theorem

Now, recursively define a matrix sequence $\{\hat\Pi_k\}$ as follows:

$$\hat\Pi_{k+1} = (I - \mu E[F_k])\hat\Pi_k(I - \mu E[F_k])^T + \mu^2 R_v(k)E[L_k L_k^T] + \gamma^2 Q_w(k+1), \qquad (32)$$

where $\hat\Pi_0 = E[\tilde\theta_0\tilde\theta_0^T]$, and $R_v(k)$ and $Q_w(k+1)$ are defined in Assumption (A4). Note that this definition is very close to the definition of $\Pi_k$ in (22). We now have a result that is the "mother theorem" of Theorem 3.1:

Theorem 4.1: Let Assumptions (A1)-(A4) hold. Let the tracking error $\tilde\theta_k$ be defined by (11) (or (19)), and let $\hat\Pi_k$ be defined by (32). Then, $\forall \mu \in (0,\mu^*]$, $\forall k \ge 1$,

$$\left\| E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] - \hat\Pi_{k+1} \right\| \le c\left[ \delta(\mu)\mu + \mu^2 + (1 - \alpha\mu)^k \right]$$

where $c > 0$ and $\alpha \in (0,1)$ are constants and $\delta(\mu)$ is a function that tends to zero as $\mu$ tends to zero. It is defined by

$$\delta(\mu) \triangleq \min_{m \ge 1}\left\{ \sqrt{\mu}\,m + \psi(m) \right\}.$$

The proof is given in Appendix A.

Next, we show that under additional conditions, the expression for $\hat\Pi_k$ in (32) can be further simplified.

Corollary 4.1: Under the conditions of Theorem 4.1, if $F_k = P_k\varphi_k\varphi_k^T$ with $\|\varphi_k\|_{2t} = O(1)$ and $\|F_k\|_t = O(1)$ for some $t > 1$, and if there are some function $\delta_1(\mu)$, tending to zero as $\mu$ tends to zero, and some deterministic sequence $\{R_k\}$ such that

$$\|P_k - R_k\|_s = O(\delta_1(\mu)), \qquad \forall k,\ \forall \mu \in (0,\mu^*], \quad s = (1 - t^{-1})^{-1},$$

then we have ($\forall \mu \in (0,\mu^*]$, $\forall k \ge 1$)

$$\left\| E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] - \Pi_{k+1} \right\| \le c\left\{ [\delta(\mu) + \delta_1(\mu)]\mu + \mu^2 + (1 - \alpha\mu)^k \right\} \qquad (33)$$

for some constants $c > 0$ and $\alpha \in (0,1)$, where $\Pi_k$ is recursively defined by

$$\Pi_{k+1} = (I - \mu R_k S_k)\Pi_k(I - \mu R_k S_k)^T + \mu^2 R_v(k)R_k S_k R_k + \gamma^2 Q_w(k+1), \qquad (34)$$

with $S_k = E[\varphi_k\varphi_k^T]$ and $\Pi_0 = \hat\Pi_0$.

Proof: By Theorem 4.1, we need only show that

$$\|\hat\Pi_{k+1} - \Pi_{k+1}\| = O\left( \delta_1(\mu)\mu + \mu^2 + (1 - \alpha\mu)^k \right).$$

This can be derived by straightforward calculations based on the equations for $\hat\Pi_k$ and $\Pi_k$, and hence Corollary 4.1 is true.

Remark: If, in Condition (A2),

$$\psi(m) = O(\phi(m) + \delta_1(\mu)), \qquad \delta_2(\mu) = \min_{m \ge 1}\left[\sqrt{\mu}\,m + \phi(m)\right],$$

then $\delta(\mu)$ defined in Theorem 4.1 satisfies $\delta(\mu) = O(\delta_2(\mu) + \delta_1(\mu))$. This will be the case for the RLS and KF algorithms in Theorem 3.1, as can be seen from Section V.

The following result also follows directly from Theorem 4.1.

Corollary 4.2: If, in addition to the conditions of Theorem 4.1, $R_v(k) \equiv R_v$, $Q_w(k) \equiv Q_w$, and there are $F$, $G$ and a function $\delta_1(\mu)$, tending to zero as $\mu$ tends to zero, such that $\forall \mu \in (0,\mu^*]$,

$$\|EF_k - F\| + \|E(L_k L_k^T) - G\| \le \delta_1(\mu), \qquad \forall k,$$

then for some $\alpha \in (0,1)$ and for all $\mu \in (0,\mu^*]$, $k \ge 1$,

$$E[\tilde\theta_{k+1}\tilde\theta_{k+1}^T] = \Pi + O\left( [\delta(\mu) + \delta_1(\mu)]\left[\mu + \frac{\gamma^2}{\mu}\right] \right) + O\left( (1 - \alpha\mu)^k \right) \qquad (35)$$

where $\Pi$ satisfies the following Lyapunov equation:

$$F\Pi + \Pi F^T = \mu R_v G + \frac{\gamma^2}{\mu}Q_w. \qquad (36)$$

Now denote

$$\Pi_{R_v} = R_v\int_0^\infty e^{-Ft}Ge^{-F^Tt}\,dt, \qquad \Pi_{Q_w} = \int_0^\infty e^{-Ft}Q_w e^{-F^Tt}\,dt;$$

then the solution to the Lyapunov equation (36) can be expressed as

$$\Pi = \mu\,\Pi_{R_v} + \frac{\gamma^2}{\mu}\,\Pi_{Q_w},$$

in which there is a reminiscence of the results obtained in the simple example discussed in Section 1 (see (9)).


D. Discussion on the Assumptions

Now, let us discuss the key assumptions (A1) and (A2).

First, assumption (A1) has been studied in the companion paper [9], and here we only give some results concerning $\{F_k\} \in \mathcal{S}$, which will be used shortly in the next section.

Proposition 4.1: Let $\{G_k\}$ be a random matrix process, possibly dependent on $\mu$, with the property

$$E\|G_k\| \le \delta(\mu) \quad \text{for all small } \mu \text{ and all } k, \qquad (37)$$

where $\delta(\mu) \to 0$ as $\mu \to 0$. Then $\{F_k\} \in \mathcal{S} \iff \{F_k + G_k\} \in \mathcal{S}$.

Proof: Sufficiency. For any $x$ with $\|x\| = 1$, recursively define

$$x_{k+1} = (I - \mu E[F_k + G_k])x_k, \quad \forall k \ge m, \qquad x_m = x.$$

Then

$$x_{k+1} = (I - \mu E(F_k))x_k - \mu E(G_k)x_k,$$

and hence

$$x_{n+1} = \prod_{i=m}^{n}\left[I - \mu E(F_i)\right]x_m - \mu\sum_{i=m}^{n}\left\{\prod_{j=i+1}^{n}\left[I - \mu E(F_j)\right]\right\}E(G_i)x_i.$$

Consequently, similarly to the proof of Theorem 3.1 in [9], by the Gronwall inequality we have

$$\|x_{n+1}\| \le 2M(1 - \alpha\mu)^{n-m+1}\Big\{ 1 + \mu\sum_{i=m}^{n}\prod_{j=i+1}^{n}\big(1 + \mu E\|G_j\|\big)E\|G_i\| \Big\}.$$

From this and condition (37), it is not difficult to convince oneself that $\{F_k + G_k\} \in \mathcal{S}$.

Necessity: by using the fact proved above, and noting that $F_k = (F_k + G_k) - G_k$, we know that $\{F_k\} \in \mathcal{S}$. This completes the proof. □

The following useful result follows immediately from Proposition 4.1.

Proposition 4.2: Let $F_k = P_k H_k$ and let the following conditions be satisfied:

(i). $\{H_k\} \in L_t(\mu^*)$, $\mu^* \in (0,1)$, $t \ge 1$.

(ii). $\|P_k - \bar P_k\|_s \le \delta(\mu)$, $\forall \mu \in (0,\mu^*]$, where $\delta(\mu) \to 0$ as $\mu \to 0$, $s = (1 - t^{-1})^{-1}$, and $\{\bar P_k\}$ is a deterministic process.

Then $\{F_k\} \in \mathcal{S} \iff \{\bar P_k H_k\} \in \mathcal{S}$.

Proof: The result follows directly from Proposition 4.1, if we note that

$$F_k = \bar P_k H_k + (P_k - \bar P_k)H_k. \qquad \square$$

We now turn to discussing the weak dependence condition (A2).

Example 4.1: Let $\{\varphi_i\}$ satisfy (21), and let $L(\cdot) : R^d \to R^{d\times d}$ be a real matrix function with $\|L(\varphi_k)\|_q = O(1)$ for some $1 \le q \le \infty$. Then we have the following inequality (cf. [19]):

$$\|E[L(\varphi_k) \mid \mathcal{F}_{k-m}] - EL(\varphi_k)\|_q = O\left([\phi(m)]^{1 - 1/q}\right), \qquad \forall k, m. \qquad (38)$$

Hence, if $F_k = L(\varphi_k)$, then condition (A2) holds.

Note that when $\{\varphi_k\}$ satisfies condition P2 in Section 3, we have by taking $q = \infty$ in (38)

$$\|E[\varphi_k\varphi_k^T \mid \mathcal{F}_{k-m}] - E\varphi_k\varphi_k^T\|_\infty = O(\phi(m)). \qquad (39)$$

This fact will be used in the next section in the proof of Theorem 3.1. □

Example 4.2: Let $\{\varphi_k\}$ be generated by

$$x_k = Ax_{k-1} + B\xi_k \quad (A \text{ stable}), \qquad \varphi_k = Cx_k + \xi_k,$$

where $\{\xi_j,\ j \ge k+1\}$ and $\{v_{j-1}, w_j,\ j \le k\}$ are independent, and $\{\xi_j\}$ is an independent sequence. Assume that

$$\sup_k E\|\xi_k\|^{(b+1)q} < \infty, \quad \text{for some } b \ge 0,\ q \ge 1.$$

Then for any function $L(\cdot) : R^d \to R^{d\times d}$ with

$$\|L(x) - L(x')\| \le M(\|x\| + \|x'\| + 1)^b\,\|x - x'\|, \qquad \forall x, x',$$

there is a constant $\lambda \in (0,1)$ such that (cf. [14])

$$\|E[L(\varphi_{k+m}) \mid \mathcal{F}_k] - EL(\varphi_{k+m})\|_q = O(\lambda^m), \qquad \forall m \ge 0,\ \forall k \ge 0.$$

Hence, if $F_k = L(\varphi_k)$, then again, condition (A2) holds. □

The following simple result will be useful in the sequel.

Proposition 4.3: Let $F_k = P_k L(\varphi_k)$, and let the following two conditions hold:

(i). There is a bounded deterministic matrix sequence $\{\bar P_k\}$, and a function $\delta_1(\mu)$ tending to zero as $\mu$ tends to zero, such that

$$\|P_k - \bar P_k\|_s \le \delta_1(\mu), \qquad \forall \mu \in (0,\mu^*], \quad \text{for some } s > 1.$$

(ii). There is a number $r > 1$ such that $\|L(\varphi_k)\|_r = O(1)$, together with a function $\phi(m)$ tending to 0 as $m$ tends to infinity, such that

$$\|E[L(\varphi_{k+m}) \mid \mathcal{F}_k] - EL(\varphi_{k+m})\|_q \le \phi(m), \qquad \forall k,\ \forall m \quad \big(q = (r^{-1} + s^{-1})^{-1}\big).$$

Then condition (A2) holds with $\psi(m) = O(\phi(m) + \delta_1(\mu))$.

Proof: The result follows directly from the following identity:

$$E[F_{k+m} \mid \mathcal{F}_k] - EF_{k+m} = E\left[(P_{k+m} - \bar P_{k+m})L(\varphi_{k+m}) \mid \mathcal{F}_k\right] - E\left[(P_{k+m} - \bar P_{k+m})L(\varphi_{k+m})\right] + \bar P_{k+m}\left\{ E[L(\varphi_{k+m}) \mid \mathcal{F}_k] - EL(\varphi_{k+m}) \right\}. \qquad \square$$


V. Analysis of the Basic Algorithms

In this section, we shall show that, for the basic LMS, RLS and KF algorithms, conditions (A1)-(A3) of the previous section can be guaranteed by imposing some explicit (stochastic excitation and weak dependence) conditions on the regressors $\{\varphi_k\}$, and at the same time prove Theorem 3.1.

A. Analysis of LMS

For the LMS algorithm defined by (11)-(12), let us introduce the following two kinds of weak dependence conditions:

L1). Condition P2 of Section 3 is satisfied, but with the boundedness condition on $\{\varphi_k\}$ relaxed to the following: there exist positive constants $\varepsilon$, $M$ and $K$ such that

$$E\exp\Big\{ \sum_{j=i+1}^{n}\varepsilon\|\varphi_j\|^2 \Big\} \le M\exp\{K(n-i)\}, \qquad \forall n \ge i \ge 0.$$

L1'). The random process $F_k \triangleq \varphi_k\varphi_k^T$ has the following expansion:

$$F_k = \sum_{j=0}^{\infty}A_j Z_{k-j} + D_k, \qquad \sum_{j=0}^{\infty}\|A_j\| < \infty,$$

where $\{Z_k\}$ is an independent process such that $\{Z_j,\ j \ge k+1\}$ and $\{v_{j-1}, w_j,\ j \le k\}$ are independent, and which satisfies

$$\sup_k E\exp\{\varepsilon\|Z_k\|^{1+\delta}\} < \infty \quad \text{for some } \varepsilon > 0,\ \delta > 0,$$

and where $\{D_k\}$ is a bounded deterministic process.

Theorem 5.1: Let Conditions P1 and P3 of Section 3 be satisfied. If either L1) or L1') above holds, then Conditions (A1)-(A4) of Theorem 4.1 hold (for all $p \ge 1$, $q \ge 1$) and Theorem 3.1 is true for the LMS case.

Proof: First, in the LMS case, Conditions P1 and L1) (or L1')) ensure that Condition (A1) of Theorem 4.1 holds for all $p \ge 1$ (cf. [9], Theorem 3.3). Next, when L1) holds, by Example 4.1 we know that Condition (A2) is true for all $q \ge 1$. Also, when L1') holds, by the assumed independence we have, for all $q \ge 1$,

$$\|E[F_k \mid \mathcal{F}_{k-m}] - EF_k\|_q = \Big\|\sum_{j=m}^{\infty}\big[A_j Z_{k-j} - EA_j Z_{k-j}\big]\Big\|_q = O\Big(\sum_{j=m}^{\infty}\|A_j\|\Big), \qquad \forall m \ge 1.$$

Hence (A2) holds again for all $q \ge 1$. Moreover, Conditions (A3) and (A4) hold obviously in the present case. Finally, by (39), the result of Theorem 3.1 (in the LMS case) follows directly from Theorem 4.1. This completes the proof.

B. Analysis of RLS

For the RLS algorithm defined by (11), (13) and (14), let us introduce the following two kinds of excitation conditions:

R1). There exist constants $h > 0$, $c > 0$, $\varepsilon > 0$ such that

$$P\Big( \lambda_{\min}\Big(\sum_{i=k+1}^{k+h}\varphi_i\varphi_i^T\Big) \ge c \,\Big|\, \mathcal{F}_k \Big) > \varepsilon, \qquad \forall k.$$

R1'). There exists $h > 0$ such that

$$\sup_k E\Big[ \lambda_{\min}\Big(\sum_{i=k+1}^{k+h}\varphi_i\varphi_i^T\Big) \Big]^{-t} < \infty, \qquad \forall t \ge 1.$$

The following weak dependence condition will also be used:

R2). There exists a number $t \ge 5$ such that $\|\varphi_k\|_{4t} = O(1)$, and such that

$$\|E[\varphi_k\varphi_k^T \mid \mathcal{F}_{k-m}] - E\varphi_k\varphi_k^T\|_{2t} \le \phi(m), \qquad \forall k, m,$$

where $\phi(m) \to 0$ as $m \to \infty$.

Remark 5.1: Detailed discussions and investigations of the above first two conditions can be found in [10] and [17]. It has been shown in [10] that if Condition P1 and (21) in Section 3 hold, then R1) is true. Also, if $\{\varphi_k\}$ is generated by a linear state space model as in Example 4.2, then R1') can be verified (cf. [17]). Moreover, Condition R2) has been discussed in the last section.

Theorem 5.2: Let Conditions R1) (or R1')) and R2) above be satisfied. Then Conditions (A1)-(A3) of Theorem 4.1 hold (for any $p < 2t$, $q < t$) and Theorem 3.1 is true for the RLS case.

Proof: First, note that

$$\prod_{j=i+1}^{k}(I - \mu F_j) = (1-\mu)^{k-i}P_k P_i^{-1}, \qquad \forall k \ge i, \qquad (40)$$

and

$$P_k^{-1} = (1-\mu)P_{k-1}^{-1} + \mu\varphi_k\varphi_k^T. \qquad (41)$$

From this and condition R2) it follows that

$$\|P_k^{-1}\|_{2t} = O(1), \qquad \forall \mu \in (0,1). \qquad (42)$$

Also, by Theorem 1 in [10], there is $\mu^* \in (0,1)$ such that

$$\{P_k\} \in L_s(\mu^*), \qquad \forall s \ge 1. \qquad (43)$$

Combining (40), (42) and (43), we get

$$\{F_k\} \in \mathcal{S}_p, \qquad \forall p < 2t. \qquad (44)$$


Now, define ($\bar P_0 = P_0$)

$$\bar P_k^{-1} = (1-\mu)\bar P_{k-1}^{-1} + \mu E(\varphi_k\varphi_k^T). \qquad (45)$$

Since either R1) or R1') implies P1 in Section 3 (cf. [10]), by a similar (actually simpler) argument to that used for the proof of (43) we know that $\|\bar P_k\| = O(1)$. We next prove that

$$\|P_k^{-1} - \bar P_k^{-1}\|_{2t} = O(\delta(\mu)), \qquad \delta(\mu) = \min_{m \ge 1}\{\sqrt{\mu}\,m + \phi(m)\}. \qquad (46)$$

First, by (41) and (45),

$$P_k^{-1} - \bar P_k^{-1} = \mu\sum_{i=1}^{k}(1-\mu)^{k-i}\left[\varphi_i\varphi_i^T - E\varphi_i\varphi_i^T\right]. \qquad (47)$$

For any fixed $m \ge 1$, by denoting

$$\Delta_j(i) = E[\varphi_i\varphi_i^T \mid \mathcal{F}_{i-j}] - E[\varphi_i\varphi_i^T \mid \mathcal{F}_{i-j-1}], \qquad 0 \le j \le m-1,$$

we have

$$\varphi_i\varphi_i^T - E\varphi_i\varphi_i^T = \sum_{j=0}^{m-1}\Delta_j(i) + \left\{ E[\varphi_i\varphi_i^T \mid \mathcal{F}_{i-m}] - E[\varphi_i\varphi_i^T] \right\}. \qquad (48)$$

Now, since for each $j$ the sequence $\{\Delta_j(i),\ i \ge 1\}$ is a martingale difference, we can apply Lemma A.2 in the Appendix to each such $\{\Delta_j(i),\ i \ge 1\}$ to obtain

$$\Big\|\mu\sum_{i=1}^{k}(1-\mu)^{k-i}\sum_{j=0}^{m-1}\Delta_j(i)\Big\|_{2t} = O(\sqrt{\mu}\,m). \qquad (49)$$

Also, by our assumption,

$$\Big\|\mu\sum_{i=1}^{k}(1-\mu)^{k-i}\left\{ E[\varphi_i\varphi_i^T \mid \mathcal{F}_{i-m}] - E[\varphi_i\varphi_i^T] \right\}\Big\|_{2t} \le \phi(m). \qquad (50)$$

Hence, (46) follows from (47)-(50) immediately.

Similarly to the proof of (44), it is evident that

$$\left\{ \bar P_k\varphi_k\varphi_k^T \right\} \in \mathcal{S}. \qquad (51)$$

Now

$$\|P_k - \bar P_k\| \le \|P_k\|\,\|P_k^{-1} - \bar P_k^{-1}\|\,\|\bar P_k\|;$$

from this, (43) and (46) it follows that

$$\|P_k - \bar P_k\|_s = O(\delta(\mu)), \qquad \forall s < 2t \ \text{(for small } \mu\text{)}. \qquad (52)$$

Hence, by Proposition 4.2 and (51), we know that $\{F_k\} \in \mathcal{S}$. This, in conjunction with (44), verifies Condition (A1).

Now, by (52) and R2), from Proposition 4.3 it is evident that Condition (A2) holds for any $q < t$.

To prove (A3), first note that for any $q < t$, (44) implies

$$\{F_k\} \in L_{2q}(\mu^*), \qquad \text{for some } \mu^* > 0.$$

So we need only prove that

$$\{L_i\} \in L_r(\mu^*), \qquad \text{for } r > \Big(\frac{1}{2} - \frac{1}{2t} - \frac{3}{2t}\Big)^{-1} = \frac{2t}{t-4}.$$

This is true since, by (43) and $\|\varphi_k\|_{4t} = O(1)$,

$$\{L_i\} = \{P_i\varphi_i\} \in L_r(\mu^*), \qquad \forall r < 4t,$$

and since $4t > 2t/(t-4)$. Hence (A3) holds.

Thus, by taking $t = \infty$ in the above argument, we see that Conditions (A1) and (A2) hold for all $p \ge 1$ and all $q \ge 1$. Hence Theorem 4.1 can be applied to prove Theorem 3.1 for the RLS case, while the expression for $\Pi_k$ will follow from Corollary 4.1 if we can prove that

$$\|P_k - R_k\|_s = O(\delta(\mu)), \qquad s = \frac{t}{t-1}, \qquad (53)$$

where $P_k$ and $R_k$ are respectively defined by (14) and (24). Furthermore, by (52), it is clear that (53) will be true if

$$\|R_k - \bar P_k\| = O(\delta(\mu))$$

holds. However, this can be verified by using the definitions of $R_k$ and $\bar P_k$ (see Appendix B). Hence the proof is complete.

C. Analysis of the KF algorithm

Among the three basic algorithms described in Section 2, the KF algorithm defined by (11), (16) and (17) is the most complicated one to analyze. Let us now introduce the following two conditions on stochastic excitation and weak dependence.

K1). There are constants $h > 0$ and $\lambda \in (0,1)$ (independent of $\mu$) such that

$$\left\{ \frac{\mu\,\sigma_k}{1 + \mu\,b_{kh+1}} \right\} \in \mathcal{S}^0(\lambda),$$

where $\mathcal{S}^0(\lambda)$ is defined by (30), and $\sigma_k$ and $b_k$ are defined as follows ($\mathcal{G}_k$ is, as before, the $\sigma$-algebra generated by $\{\varphi_i,\ i \le k\}$):

$$\sigma_k \triangleq \lambda_{\min}\left\{ E\left[ \frac{1}{1+h}\sum_{i=kh+1}^{(k+1)h}\frac{\varphi_i\varphi_i^T}{1 + \|\varphi_i\|^2} \,\Big|\, \mathcal{G}_{kh} \right] \right\},$$

$$b_k = (1-\mu)b_{k-1} + \mu(\|\varphi_k\|^2 + 1), \qquad \mu \in (0,1).$$

K2). There exists a number $t \ge 7$ together with a function $\phi(m) \to 0$ (as $m \to \infty$) such that $\|\varphi_k\|_{4t} = O(1)$, and such that

$$\|E[\varphi_k\varphi_k^T \mid \mathcal{F}_{k-m}] - E\varphi_k\varphi_k^T\|_t \le \phi(m), \qquad \forall k, m.$$

Remark 5.2: If Conditions P1 and P2 of Section 3 are satisfied, then both K1) and K2) above hold (cf. [10]). When P2 is replaced by, for example, the situation discussed in Example 4.2, then again, both K1) and K2) can be verified (cf. [8]).

References
