Necessary and Sufficient Conditions for Stability of LMS
Lei Guo†, Lennart Ljung‡ and Guan-Jun Wang§
First Version: November 9, 1995. Revised Version: August 28, 1996.
Abstract. In a recent work [7], some general results on exponential stability of random linear equations were established. They can be applied directly to the performance analysis of a wide class of adaptive algorithms, including the basic LMS ones, without requiring stationarity, independence or boundedness assumptions on the system signals. The current paper attempts to give a complete characterization of the exponential stability of the LMS algorithms, by providing a necessary and sufficient condition for such stability in the case of possibly unbounded, nonstationary and non-\phi-mixing signals. The results of this paper can be applied to a very large class of signals, including those generated from, e.g., a Gaussian process via a time-varying linear filter. As an application, several novel and extended results on the convergence and tracking performance of LMS are derived under various assumptions. Neither stationarity nor Markov chain assumptions are required in the paper.
This work was supported by the National Natural Science Foundation of China and the Swedish Research Council for Engineering Sciences (TFR).
† Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, P. R. China. Email: Lguo@iss03.iss.ac.cn.
‡ Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden. Email: Ljung@isy.liu.se.
§ Department of Mathematics, The Central University for Nationalities, Beijing 100081, P. R. China.
1 Introduction
1.1 The Contribution
The well-known least mean squares (LMS) algorithm, aiming at tracking the "best linear fit" of an observed (or desired) signal {y_k} based on a measured d-dimensional (input) signal {\phi_k}, is defined recursively by

    x_{k+1} = x_k + \mu \phi_k ( y_k - \phi_k^\tau x_k ),  x_0 \in R^d   (1)

where \mu > 0 is a step-size.
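As a purely illustrative aside, the recursion (1) is straightforward to simulate; the sketch below (data model, dimensions and step-size are our own choices, not from the paper) runs LMS on noise-free Gaussian regressors:

```python
import numpy as np

def lms(phi, y, mu, x0=None):
    """LMS recursion (1): x_{k+1} = x_k + mu * phi_k * (y_k - phi_k^T x_k)."""
    x = np.zeros(phi.shape[1]) if x0 is None else np.array(x0, dtype=float)
    for phi_k, y_k in zip(phi, y):
        x = x + mu * phi_k * (y_k - phi_k @ x)
    return x

rng = np.random.default_rng(0)
phi = rng.standard_normal((2000, 3))     # Gaussian (hence unbounded) regressors
x_true = np.array([1.0, -2.0, 0.5])
x_hat = lms(phi, phi @ x_true, mu=0.05)  # noise-free observations y_k
```

Under persistent excitation and a small step-size, the iterate approaches the true parameter; the analysis in this paper makes precise when, and in what sense, such behavior can be guaranteed.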
Due to its simplicity, robustness and ease of implementation, the LMS algorithm is known to be one of the most basic adaptive algorithms in many areas, including adaptive signal processing, system identification and adaptive control, and it has received considerable attention in both theory and applications over the past several decades (see, among many others, the books [20], [19] and [2], the survey [14], and the references therein). Also, it has been found recently that the LMS is H^\infty-optimal, in the sense that it minimizes the energy gain from the disturbances to the predicted errors; it is also risk-sensitive optimal and minimizes a certain exponential cost function (see [11]).
In many situations, it is desirable to know at least the answers to the following questions:

- Is the LMS stable in the mean square sense?
- Does the LMS have good tracking ability?
- How can the tracking errors be calculated and minimized?
Now, for a given sequence {\phi_k}, (1) is a linear, time-varying difference equation. The properties of this equation are essentially determined by the homogeneous equation

    x_{k+1} = ( I - \mu \phi_k \phi_k^\tau ) x_k   (2)

with fundamental matrix

    \Phi(t, k) = \prod_{j=k}^{t} ( I - \mu \phi_j \phi_j^\tau ).   (3)
The expression for the tracking errors will then be of the form

    \sum_{k=1}^{t} \Phi(t, k) v(k)   (4)

where {v(k)} describes the error sources (measurement noise, parameter variations, etc.). As elaborated in, e.g., [8] and [6], the essential key to the analysis of (4) is to prove exponential stability of (3). This was also the motivation behind the work of [1]. We shall establish such exponential stability in the sense that for any p \ge 1 there exist positive constants M, \alpha and \mu^* such that

    \{ E \| \Phi(t, k) \|^p \}^{1/p} \le M (1 - \alpha\mu)^{t-k},  \forall t \ge k,  \forall \mu \in (0, \mu^*].   (5)

The expectation E here is with respect to the sequence {\phi_k}.
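The decay asserted in (5) can also be observed numerically. The following Monte Carlo sketch (i.i.d. standard Gaussian regressors, with illustrative values d = 2 and \mu = 0.1 chosen by us) estimates E\|\Phi(t, 0)\| at two horizons:

```python
import numpy as np

rng = np.random.default_rng(1)
d, mu = 2, 0.1

def mean_phi_norm(t, reps=200):
    """Monte Carlo estimate of E||Phi(t,0)|| where
    Phi(t,0) = prod_{j=0}^{t} (I - mu * phi_j phi_j^T), phi_j ~ N(0, I_d)."""
    total = 0.0
    for _ in range(reps):
        P = np.eye(d)
        for _ in range(t + 1):
            p = rng.standard_normal(d)
            P = (np.eye(d) - mu * np.outer(p, p)) @ P
        total += np.linalg.norm(P, 2)  # spectral norm
    return total / reps

# The estimate shrinks geometrically with the horizon, as (5) predicts.
n25, n50 = mean_phi_norm(25), mean_phi_norm(50)
```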
Clearly, the property (5) is a property of the sequence {\phi_k} only. We shall here establish (5) under very general conditions on {\phi_k}. These are of the kind (precise conditions are given in Theorem 2):

- Restrictions on the dependence among the \phi_k. This takes the form that \phi_k is formed by possibly time-varying, but uniformly stable, filtering of a noise source \varepsilon_j which is \phi-mixing and obeys an additional condition on the rate of decay of dependence.

- Restrictions on the tail of the distribution of \phi_k. This takes the form that

    E[ \exp( \eta \|\varepsilon_k\|^2 ) ] \le C,  \forall k   (6)

for some \eta > 0 and some constant C. Here \varepsilon_k is the "source" from which \phi_k was formed.

Both these restrictions are very mild, and allow for example the Gaussian, dependent case (unlike most previous treatments). Now, for sequences \phi_k subject to these two restrictions, the necessary and sufficient condition for (5) to hold is that

    \sum_{i=k+1}^{k+h} E[ \phi_i \phi_i^\tau ] \ge \delta I,  \forall k \ge 0   (7)

for some h > 0 and \delta > 0. This is the "persistence of excitation" or "full rank" condition on \phi_k.
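Condition (7) is easy to verify numerically once the moment matrices E[\phi_i \phi_i^\tau] are available. A small illustrative sketch (the alternating rank-one example below is our own): each single matrix is singular, yet every window of length h = 2 is uniformly positive definite.

```python
import numpy as np

def pe_margin(S_list, h):
    """Smallest eigenvalue of sum_{i=k}^{k+h-1} S_i over all windows of
    length h, where S_i stands for E[phi_i phi_i^T]; condition (7) asks
    for this margin to be bounded away from zero for some fixed h."""
    margins = []
    for k in range(len(S_list) - h + 1):
        block = sum(S_list[k:k + h])
        margins.append(np.linalg.eigvalsh(block)[0])  # eigenvalues ascending
    return min(margins)

# Alternating one-directional excitation: each step is rank deficient,
# but every window of length 2 sums to the identity matrix.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
S_seq = [np.outer(e1, e1) if i % 2 == 0 else np.outer(e2, e2) for i in range(20)]
```

Here pe_margin(S_seq, 1) is zero (no excitation in one step), while pe_margin(S_seq, 2) equals one, so (7) holds with h = 2 and \delta = 1.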
This result is the main contribution of this paper. Furthermore, several direct applications of the stability result to adaptive tracking will be given under various noise assumptions; in particular, these yield more general results on LMS than those established recently in [8].
Most of the existing work related to the exponential stability of (2) is concerned with the case where the signals {\phi_k} are independent or M-dependent (cf., e.g., [20], [19], [4], [1], [2]). This independence assumption can be relaxed considerably if we assume that the signals {\phi_k} are bounded, as in, e.g., [6], [18] and [12].

Note that the boundedness assumption is suitable for the study of the so-called normalized LMS algorithms (cf. [19], [6] and [15]), since the normalized signals are automatically bounded. In this case, some general results, together with a very weak (probably the weakest known) excitation condition for guaranteeing the exponential stability of LMS, can be found in [6]. Moreover, in the bounded \phi-mixing case, a complete characterization of the exponential stability can also be given. Indeed, in that case it has been shown in [6] that (7) is the necessary and sufficient condition for (2) to be exponentially stable.
For general unbounded and correlated random signals, the stability analysis of the standard LMS algorithm (1) becomes considerably more complex, and has defied a complete solution for over 30 years. Recently, some general stability results applicable to unbounded, nonstationary, dependent signals were established in [7], based on which a number of results on the tracking performance of the LMS algorithms can be derived (see [8]). In particular, the result of [7] can be applied to a typical situation where the signal process is generated from a white noise sequence through a stable linear filter:

    \phi_k = \sum_{j=-\infty}^{\infty} A_j \varepsilon_{k-j} + \xi_k,  \sum_{j=-\infty}^{\infty} \|A_j\| < \infty   (8)

where {\varepsilon_k} is an independent sequence satisfying

    \sup_k E[ \exp( \eta \|\varepsilon_k\|^\beta ) ] < \infty  for some \eta > 0, \beta > 2,   (9)

and {\xi_k} is a bounded deterministic process.
It is obvious that the expression (8) has a form similar to the well-known Wold decomposition of wide-sense stationary processes. Note, however, that the signal process {\phi_k} defined by (8) need not in general be a stationary process, nor a Markov chain.
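A process of the form (8) is easy to generate; e.g., Gaussian white noise passed through a (here causal, finite-length) filter, plus a bounded deterministic term. The filter taps and dimensions below are arbitrary illustrative choices of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2, 5000
A = [np.eye(d), 0.5 * np.eye(d), 0.25 * np.eye(d)]  # summable taps A_j
eps = rng.standard_normal((n + len(A), d))          # Gaussian source: satisfies (10)
xi = np.ones(d)                                     # bounded deterministic term

# phi_k = sum_j A_j eps_{k-j} + xi  (a causal, finite-length case of (8))
phi = np.array([sum(Aj @ eps[k + len(A) - 1 - j] for j, Aj in enumerate(A)) + xi
                for k in range(n)])

# Empirical moment matrix; its smallest eigenvalue being positive is the
# h = 1 form of the excitation condition (7) for this stationary example.
S_hat = phi.T @ phi / n
```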
Unfortunately, the condition (9) with \beta > 2 excludes the case where {\varepsilon_k} is a Gaussian process, since such signals can only satisfy the weaker condition

    \sup_k E[ \exp( \eta \|\varepsilon_k\|^2 ) ] < \infty  for some \eta > 0.   (10)

The motivation of this paper has thus been to relax the moment condition (9) so that, at least, the signal processes {\phi_k} defined by (8) and (10) can be included. This will be done in a more general setting, based on a relaxation of the moment condition used in Theorem 3.2 of [7].
2 The Main Results
2.1 Notations
Here we adopt the following notations introduced in [7].

a). The maximum eigenvalue of a matrix X is denoted by \lambda_{max}(X), and the Euclidean norm of X is defined as its maximum singular value, i.e.,

    \|X\| \triangleq \{ \lambda_{max}( X X^\tau ) \}^{1/2},

and the L_p-norm of a random matrix X is defined as

    \|X\|_p \triangleq \{ E( \|X\|^p ) \}^{1/p},  p \ge 1.
b). For any square random matrix sequence F = {F_k} and real numbers p \ge 1, \mu^* \in (0,1), the L_p-exponentially stable family S_p(\mu^*) is defined by

    S_p(\mu^*) = { F : \| \prod_{j=i+1}^{k} ( I - \mu F_j ) \|_p \le M (1 - \alpha\mu)^{k-i}, \forall \mu \in (0, \mu^*], \forall k \ge i \ge 0, for some M > 0 and \alpha \in (0,1) }.

Likewise, the averaged exponentially stable family S(\mu^*) is defined by

    S(\mu^*) = { F : \| \prod_{j=i+1}^{k} ( I - \mu E[F_j] ) \| \le M (1 - \alpha\mu)^{k-i}, \forall \mu \in (0, \mu^*], \forall k \ge i \ge 0, for some M > 0 and \alpha \in (0,1) }.

In what follows, it will be convenient to set

    S_p \triangleq \bigcup_{\mu^* \in (0,1)} S_p(\mu^*),  S \triangleq \bigcup_{\mu^* \in (0,1)} S(\mu^*).   (11)
c). Let p \ge 1 and F \triangleq {F_i}. Set

    M_p = { F : \sup_i \| S_i(T) \|_p = o(T) as T \to \infty }   (12)

where

    S_i(T) = \sum_{j=iT}^{(i+1)T-1} ( F_j - E[F_j] ).   (13)

The definition of M_p is reminiscent of the law of large numbers. As shown by Lemma 3 of [9], it includes a large class of random processes.
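For intuition about (12)-(13): for F_j = w_j w_j^\tau with i.i.d. Gaussian w_j, the block sums S_0(T) grow like \sqrt{T}, hence are o(T). A Monte Carlo sketch (a toy example of our own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def block_norm(T, reps=400):
    """L_2-norm estimate of S_0(T) = sum_{j=0}^{T-1} (F_j - E F_j)
    for F_j = w_j w_j^T with w_j i.i.d. N(0, I_2), so that E F_j = I_2."""
    sq = 0.0
    for _ in range(reps):
        w = rng.standard_normal((T, 2))
        S = w.T @ w - T * np.eye(2)        # block sum of F_j minus its mean
        sq += np.linalg.norm(S, 2) ** 2
    return (sq / reps) ** 0.5

# ||S_0(T)||_2 grows like sqrt(T), i.e. o(T), so this F belongs to M_2.
r100, r400 = block_norm(100), block_norm(400)
```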
2.2 The Main Results
We first present a preliminary theorem.

Theorem 1. Let {F_k} be a random matrix process. Then

    {F_k} \in S \Longrightarrow {F_k} \in S_p,  \forall p \ge 1,

provided that the following two conditions are satisfied:

(i). There exist positive constants \varepsilon, M and K such that for any n \ge 1,

    E[ \exp( \varepsilon \sum_{i=1}^{n} \|F_{j_i}\| ) ] \le M \exp(Kn)

holds for any integer sequence 0 \le j_1 < j_2 < \cdots < j_n.

(ii). There exist a constant M and a nondecreasing function g(T), with g(T) = o(T) as T \to \infty, such that for any fixed T, all small \lambda > 0 and any n \ge i \ge 0,

    E\{ \exp( \lambda \sum_{j=i+1}^{n} \|S_j(T)\| ) \} \le M \exp\{ [ \lambda g(T) + o(\lambda) ]( n - i ) \},

where S_j(T) is defined by (13).
The proof is given in Section 4.
Remark 1. The form of Theorem 1 is similar to that of Theorem 3.2 in [7]. The key difference lies in condition (i). This condition was introduced in [5], p.112, and is, in a certain sense, a relaxation of the corresponding condition used in Theorem 3.2 of [7]. Such a relaxation enables us to include Gaussian signals as a special case when the LMS algorithms are under consideration, as will be shown shortly.
Based on Theorem 1, we may prove that for a large class of unbounded nonstationary signals including (8), the condition (7) is also necessary and sufficient for the exponential stability of LMS.

Let us start with the following decomposition, which is more general than that in (8):

    \phi_k = \sum_{j=-\infty}^{\infty} A(k,j) \varepsilon_{k-j} + \xi_k,  \sum_{j=-\infty}^{\infty} \sup_k \|A(k,j)\| < \infty   (14)

where {\xi_k} is a d-dimensional bounded deterministic process, and {\varepsilon_k} is now a general m-dimensional \phi-mixing sequence. The weighting matrices A(k,j) \in R^{d \times m} are assumed to be deterministic.

We remark that the summability condition in (14) is precisely the standard definition of uniform stability for time-varying linear filters (cf., e.g., [13]). Also, recall that a random sequence {\varepsilon_k} is called \phi-mixing if there exists a nonincreasing function \phi(m) (called the mixing rate), with \phi(m) \in [0, 1], \forall m \ge 0, and \phi(m) \to 0 as m \to \infty, such that

    \sup_{A \in F_{-\infty}^{k}} \sup_{B \in F_{k+m}^{\infty}} | P( B | A ) - P( B ) | \le \phi(m),  \forall m \ge 0,  \forall k \in (-\infty, \infty),

where by definition F_i^j, -\infty \le i \le j \le \infty, is the \sigma-algebra generated by {\varepsilon_k, i \le k \le j}.

The \phi-mixing concept is a standard one in the literature for describing weakly dependent random processes. As is well known, the \phi-mixing property is satisfied by, for example, any M-dependent sequence, sequences generated from bounded white noises via a stable linear filter, and aperiodic Markov chains which are Markov ergodic and satisfy Doeblin's condition (cf. [3]).
The main result of this paper is then stated as follows.

Theorem 2. Consider the random linear equation (2). Let the signal process {\phi_k} be generated by (14), where {\xi_k} is a bounded deterministic sequence and {\varepsilon_k} is a \phi-mixing process which satisfies, for any n \ge 1 and any integer sequence j_1 < j_2 < \cdots < j_n,

    E[ \exp( \eta \sum_{i=1}^{n} \|\varepsilon_{j_i}\|^2 ) ] \le M \exp(Kn)   (15)

where \eta, M and K are positive constants. Then for any p \ge 1, there exist constants \mu^* > 0, M > 0 and \alpha \in (0,1) such that for all \mu \in (0, \mu^*],

    [ E \| \prod_{j=k+1}^{t} ( I - \mu \phi_j \phi_j^\tau ) \|^p ]^{1/p} \le M (1 - \alpha\mu)^{t-k},  \forall t \ge k \ge 0,   (16)

if and only if there exist an integer h > 0 and a constant \delta > 0 such that

    \sum_{i=k+1}^{k+h} E[ \phi_i \phi_i^\tau ] \ge \delta I,  \forall k \ge 0.   (17)
The proof is also given in Section 4.
Remark 2. By taking A(k, 0) = I, A(k, j) = 0, \forall k, \forall j \ne 0, and \xi_k = 0, \forall k, in (14), we see that {\phi_k} coincides with {\varepsilon_k}, which means that Theorem 2 is applicable to any \phi-mixing sequence. Furthermore, if {\varepsilon_k} is bounded, then (15) is automatically satisfied. This shows that Theorem 2 includes the corresponding result in [6] as a special case.

Note, however, that a linearly filtered \phi-mixing process like (14) will in general no longer be a \phi-mixing sequence (because of the possible unboundedness of {\varepsilon_k}). In fact, Theorem 2 is applicable also to a quite large class of processes other than \phi-mixing ones, as shown by the following corollary.
Corollary 1. Let the signal process {\phi_k} be generated by (14), where {\xi_k} is a bounded deterministic sequence and {\varepsilon_k} is an independent sequence satisfying condition (10). Then {\phi_k \phi_k^\tau} \in S_p for all p \ge 1 if and only if there exist an integer h > 0 and a constant \delta > 0 such that (17) holds.

Proof. By Theorem 2, we need only show that condition (15) is true. This is obvious since {\varepsilon_k} is an independent sequence satisfying (10). □
Remark 3. Corollary 1 continues to hold if the independence assumption on {\varepsilon_k} is weakened to M-dependence. Moreover, the moment condition (10) used in Corollary 1 may be further relaxed if additional conditions are imposed. This is the case when, for example, {\phi_k} is a stationary process generated by a stable finite-dimensional linear state space model with the innovation process {\varepsilon_k} being an i.i.d. sequence (see [16]).
3 Performance of Adaptive Tracking
Let us now assume that {y_k} and {\phi_k} are related by a linear regression

    y_k = \phi_k^\tau x_k^0 + v_k   (18)

where {x_k^0} is the true or "fictitious" time-varying parameter process, and {v_k} represents the disturbance or unmodeled dynamics.
The objective of the LMS algorithm (1) is then to track the time-varying unknown parameter process {x_k^0}. The tracking error will depend on the parameter variation process {\Delta_k} defined by

    \Delta_k = x_k^0 - x_{k-1}^0   (19)

through the following error equation, obtained by substituting (18)-(19) into (1):

    \tilde{x}_{k+1} = ( I - \mu \phi_k \phi_k^\tau ) \tilde{x}_k + \mu \phi_k v_k - \Delta_{k+1}   (20)

where \tilde{x}_k \triangleq x_k - x_k^0.

Obviously, the quality of tracking will essentially depend on the properties of {\phi_k, \Delta_k, v_k}. The homogeneous part of (20) is exactly the equation (2), and can be dealt with by Theorem 2. Hence, we need only consider the nonhomogeneous terms in (20). Different assumptions on {\Delta_k, v_k} will give different tracking error bounds or expressions, and we shall treat three cases separately in the following.
3.1 First Performance Analysis
By this, we mean that the tracking performance analysis is carried out in a "worst case" situation, i.e., the parameter variations and the disturbances are only assumed to be bounded in an averaged sense. To be specific, let us make the following assumption:

A1). There exists r > 2 such that

    \sigma \triangleq \sup_k \|v_k\|_r < \infty  and  \delta \triangleq \sup_k \|\Delta_k\|_r < \infty.

Note that this condition includes any "unknown but bounded" deterministic disturbances and parameter variations as a special case.
Theorem 3. Consider the LMS algorithm (1) applied to (18). Let condition A1) be satisfied. Also, let {\phi_k} be as in Theorem 2 with (17) satisfied. Then for all t \ge 1 and all small \mu > 0,

    E \| x_t - x_t^0 \|^2 = O( \sigma^2 + \delta^2 / \mu^2 ) + O( [1 - \alpha\mu]^t ),

where \alpha \in (0,1) is a constant.
This result follows immediately from Theorem 2, (20) and the Hölder inequality. We remark that various such "worst case" results for other commonly used algorithms (e.g., RLS and KF) may be found in [6]. The main implication of Theorem 3 is that the tracking error will be small if both the parameter variation (\delta) and the disturbance (\sigma) are small.
3.2 Second Performance Analysis
By this, we mean that the tracking performance analysis is carried out for zero-mean random parameter variations and disturbances, which may in general be correlated processes. To be specific, we introduce the following set for r \ge 1:

    N_r = { w : \sup_k \| \sum_{i=k+1}^{k+n} w_i \|_r \le c_{w,r} \sqrt{n},  \forall n \ge 1 }   (21)

where c_{w,r} is a constant depending only on r and the distribution of {w_i}.

Obviously, N_r is a subset of M_r defined by (12). It is known (see [9]) that martingale difference, zero-mean \phi-mixing and \alpha-mixing sequences can all be included in N_r. Also, from the proof of Lemma 3 in [9], it is known that the constant c_{w,r} can be dominated by \sup_k \|w_k\|_r in the first two cases, and by \sup_k \|w_k\|_{r+\varepsilon} (\varepsilon > 0) in the last case.
Moreover, it is interesting to note that N_r is invariant under linear transformations. This means that if {\phi_k} and {\varepsilon_k} are related by (8) with \xi_k \equiv 0, then {\varepsilon_k} \in N_r implies that {\phi_k} \in N_r. This can easily be seen from the following inequality:

    \| \sum_{i=k+1}^{k+n} \phi_i \|_r = \| \sum_{j=-\infty}^{\infty} A_j \sum_{i=k+1}^{k+n} \varepsilon_{i-j} \|_r \le \sum_{j=-\infty}^{\infty} \|A_j\| \, \| \sum_{i=k+1}^{k+n} \varepsilon_{i-j} \|_r.

Thus, random processes generated from martingale differences, or from \phi- or \alpha-mixing sequences, via an infinite-order linear filter can all be included in N_r.
Now, we are in a position to introduce the following condition for the second performance analysis.

A2). For some r > 2,  {\Delta_k} \in N_r  and  {\phi_k v_k} \in N_r.
Theorem 4. Consider the LMS algorithm (1) applied to the model (18). Let {\phi_k} be defined as in Theorem 2 with (17) satisfied, and let condition A2) hold for a certain r. Then for all t \ge 1 and all small \mu > 0,

    E \| x_t - x_t^0 \|^2 = O( \mu (c_{v,r})^2 + (c_{\Delta,r})^2 / \mu ) + O( [1 - \alpha\mu]^t ),

where c_{v,r} and c_{\Delta,r} are the constants defined in (21), which depend on the distributions of {\phi_k v_k} and {\Delta_k} respectively. Moreover, \alpha is the same constant as in Theorem 3.

Proof. By Lemma A.2 of [8] and Theorem 2, it is easy to see from (20) that the desired result is true. □

Note that the upper bound in Theorem 4 significantly improves the "crude" bound given in Theorem 3 for small \mu, and it roughly indicates the familiar trade-off between noise sensitivity and tracking ability.
Theorem 4 can be applied directly to the convergence analysis of some standard filtering problems (cf. [20], [4] and [2]). For example, let {y_k} and {\phi_k} be two stationary processes, and assume that our purpose is to track the least mean squares solution

    x^* = [ E( \phi_k \phi_k^\tau ) ]^{-1} E( \phi_k y_k )

of

    \min_x E( y_k - \phi_k^\tau x )^2,

recursively, based on the real-time measurements {y_i, \phi_i, i \le k}. Now, define {v_k} by

    y_k = \phi_k^\tau x^* + v_k.

It is then obvious that E[ \phi_k v_k ] = 0. Furthermore, in many standard situations it can be verified that {\phi_k v_k} \in N_r for some r > 2. Thus Theorem 4, applied to the above linear regression, gives

    E \| x_t - x^* \|^2 = O( \mu ) + O( [1 - \alpha\mu]^t ),

which tends to zero as t \to \infty and \mu \to 0.
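The stationary filtering example above is easy to reproduce numerically. In the sketch below (all numerical values are illustrative choices of ours), LMS is run on stationary data and the squared distance to the least mean squares solution x^* becomes small, consistent with the O(\mu) bound:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, mu = 20000, 2, 0.02
phi = rng.standard_normal((n, d))                 # stationary regressors
x_star = np.array([0.7, -0.3])                    # least mean squares solution
y = phi @ x_star + 0.1 * rng.standard_normal(n)   # v_k with E[phi_k v_k] = 0

x = np.zeros(d)
for phi_k, y_k in zip(phi, y):
    x = x + mu * phi_k * (y_k - phi_k @ x)        # LMS recursion (1)

err = float(np.linalg.norm(x - x_star) ** 2)      # squared tracking error
```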
Clearly, Theorem 4 is also applicable to nonstationary signals {y_k} and {\phi_k}.
3.3 Third Performance Analysis
By this, we mean that the analysis aims at an explicit (approximate) expression for the tracking performance, rather than just an upper bound as in the previous two cases. This is usually carried out under white noise assumptions on {\Delta_k, v_k}. Roughly speaking, the parameter process in this case will behave like a random walk; some detailed interpretations of this parameter model may be found in [14] and [8]. We make the following assumptions:

A3). The regressor process is generated by a time-varying causal filter

    \phi_k = \sum_{j=0}^{\infty} A(k,j) \varepsilon_{k-j} + \xi_k,  \sum_{j=0}^{\infty} \sup_k \|A(k,j)\| < \infty   (22)

where {\xi_k} is a bounded deterministic sequence, and {\varepsilon_k, \Delta_k, v_{k-1}} is a \phi-mixing process with mixing rate denoted by \phi(m). Assume also that (15) and (17) hold.

A4). The process {\Delta_k, v_k} satisfies the following conditions:

(i): E[ v_k | F_k ] = 0,  E[ \Delta_{k+1} | F_k ] = E[ \Delta_{k+1} v_k | F_k ] = 0;
(ii): E[ v_k^2 | F_k ] = R_v(k),  E[ \Delta_k \Delta_k^\tau ] = Q_\Delta(k);
(iii): \sup_k E[ |v_k|^r | F_k ] \le M,  \delta \triangleq \sup_k \|\Delta_k\|_r < \infty,

where r > 2 and M > 0 are constants, and F_k denotes the \sigma-algebra generated by {\varepsilon_i, \Delta_i, v_{i-1}, i \le k}.
Theorem 5. Consider the LMS algorithm (1) applied to the model (18). Let conditions A3) and A4) be satisfied. Then the tracking error covariance matrix has the following expansion for all t \ge 1 and all small \mu > 0:

    E[ \tilde{x}_t \tilde{x}_t^\tau ] = \Pi_t + O( \mu [ \varepsilon(\mu) + \delta^2 ] + (1 - \alpha\mu)^t ),

where the function \varepsilon(\mu) \to 0 as \mu \to 0, and \Pi_t is recursively defined by

    \Pi_{t+1} = ( I - \mu S_t ) \Pi_t ( I - \mu S_t ) + \mu^2 R_v(t) S_t + Q_\Delta(t+1)

with S_t = E[ \phi_t \phi_t^\tau ], and with R_v(t) and Q_\Delta(t) defined as in condition A4).

This theorem relaxes and unifies the conditions used in Theorem 5.1 of [8]. The proof is given in Section 4. The expression for the function \varepsilon(\mu) may be found from the proof, and from the related formula in Theorem 4.1 of [8] (see (45)).

Note that in the (wide-sense) stationary case, S_t \equiv S, R_v(t) \equiv R_v, Q_\Delta(t) \equiv Q_\Delta, and \Pi_t will converge to a matrix \Pi defined by the Lyapunov equation (cf. [8])

    S \Pi + \Pi S = \mu R_v S + \mu^{-1} Q_\Delta.

In this case, the trace of the matrix \Pi, which represents the dominating part of the tracking error E\|\tilde{x}_t\|^2 for small \mu and large t, can be expressed as

    tr(\Pi) = \frac{1}{2} [ \mu R_v d + \mu^{-1} tr( S^{-1} Q_\Delta ) ],

where d \triangleq \dim(\phi_k). Minimizing tr(\Pi) with respect to \mu, one obtains the following formula for the optimal step-size:

    \mu = \sqrt{ tr( S^{-1} Q_\Delta ) / ( R_v d ) }.
4 Proof of Theorems 1, 2 and 5
Proof of Theorem 1.

By the proof of Lemma 5.2 in [7], we know that Theorem 1 will be true if (32) in [7] can be established. However, by (34) in [7] and condition (ii), it is easy to see that we need only show that for any fixed c \ge 1, t \ge 1 and T > 1, and for all small \lambda > 0,

    \| \prod_{j=i+1}^{n} ( 1 + 2\lambda^2 c \|H_j\| ) \|_t \le M [ 1 + O(\lambda^{3/2}) ]^{n-i},  \forall n > i,   (23)

where M > 0 is a constant and H_j is given by the expansion

    \lambda^2 H_j = \lambda^2 H_j(2) + \lambda^3 H_j(3) + \cdots + \lambda^T H_j(T)

with

    H_j(k) = \sum_{jT \le j_1 < j_2 < \cdots < j_k \le (j+1)T-1} F_{j_k} \cdots F_{j_1},  k = 2, \ldots, T.

Now, let us set

    f_j = \exp\{ \lambda^{1/4} \sum_{s=jT}^{(j+1)T-1} \|F_s\| \}.

Then, for any 2 \le k \le T and jT \le j_1 < \cdots < j_k \le (j+1)T - 1, by using the inequalities k \ge 3/2 + k/4 (valid for k \ge 2) and x \le \exp(x), we have for \lambda \in (0,1)

    \lambda^k \|F_{j_k}\| \cdots \|F_{j_1}\| \le \lambda^{3/2} ( \lambda^{1/4}\|F_{j_k}\| ) \cdots ( \lambda^{1/4}\|F_{j_1}\| )
     \le \lambda^{3/2} \exp\{ \lambda^{1/4} ( \|F_{j_1}\| + \cdots + \|F_{j_k}\| ) \}
     \le \lambda^{3/2} f_j.

Consequently,

    ( 1 + 2\lambda^2 c \|H_j\| ) \le \prod_{k=2}^{T} ( 1 + \lambda^k c \|H_j(k)\| )( 1 + O(\lambda^2) )
     \le \prod_{k=2}^{T} \prod_{jT \le j_1 < j_2 < \cdots < j_k \le (j+1)T-1} ( 1 + \lambda^k c \|F_{j_k}\| \cdots \|F_{j_1}\| )( 1 + O(\lambda^2) )
     \le ( 1 + \lambda^{3/2} c f_j )^{2^T} ( 1 + O(\lambda^2) ).   (24)

Note that

    \prod_{j=i+1}^{n} ( 1 + \lambda^{3/2} c f_j ) = \sum_{k=0}^{n-i} ( \lambda^{3/2} c )^k \sum_{i+1 \le j_1 < \cdots < j_k \le n} f_{j_1} \cdots f_{j_k}.

Now, applying the Minkowski inequality to the above identity, noting the disjointness of the index blocks { j : j_i T \le j \le (j_i + 1)T - 1 }, i = 1, 2, \ldots, for j_1 < j_2 < \cdots, taking \lambda small enough so that \lambda^{1/4} 2^T t \le \varepsilon, and using condition (i), it is evident that

    \| \prod_{j=i+1}^{n} ( 1 + \lambda^{3/2} c f_j ) \|_{2^T t} \le \sum_{k=0}^{n-i} ( \lambda^{3/2} c )^k \sum_{i+1 \le j_1 < \cdots < j_k \le n} M^{1/(2^T t)} \exp\{ ( KT / (2^T t) ) k \}
     \le M^{1/(2^T t)} [ 1 + c \lambda^{3/2} \exp( KT / (2^T t) ) ]^{n-i}.

Finally, from this and (24), we have for any n > i

    \| \prod_{j=i+1}^{n} ( 1 + 2\lambda^2 c \|H_j\| ) \|_t \le \| \prod_{j=i+1}^{n} ( 1 + \lambda^{3/2} c f_j ) \|_{2^T t}^{2^T} [ 1 + O(\lambda^2) ]^{n-i}
     \le M \{ 1 + c \lambda^{3/2} \exp( KT / (2^T t) ) \}^{2^T (n-i)} [ 1 + O(\lambda^2) ]^{n-i}
     \le M [ 1 + O(\lambda^{3/2}) ]^{n-i}

for all small \lambda > 0, which is (23). This completes the proof of Theorem 1. □
The proof of Theorem 2 is rather involved, and so it is prefaced with several lemmas.

For the analysis to follow, it is convenient to rewrite (14) as

    \phi_k = \sum_{j=-\infty}^{\infty} a_j \varepsilon(k, j) + \xi_k,  \sum_{j=-\infty}^{\infty} a_j < \infty,   (25)

where by definition

    a_j \triangleq \sup_k \|A(k,j)\|,  \varepsilon(k, j) \triangleq a_j^{-1} A(k,j) \varepsilon_{k-j}.   (26)

(We set \varepsilon(k, j) = 0, \forall k, if a_j = 0 for some j.)

The new process {\varepsilon(k, j)} has the following simple properties:

(i). For any k and j, \|\varepsilon(k, j)\| \le \|\varepsilon_{k-j}\|;

(ii). For any fixed j, the process {\varepsilon(k, j)} is \phi-mixing with the same mixing rate as {\varepsilon_k};

(iii). For any k and j, \varepsilon(k, j) is {\varepsilon_{k-j}}-measurable.

These three properties will be frequently used in the sequel without further explanation.
Lemma 1. Let {F_t} be a \phi-mixing d \times d matrix process with mixing rate {\phi(m)}. Then

    \sup_i \| S_i(T) \|_2 \le 2cd \{ T \sum_{m=0}^{T-1} \sqrt{\phi(m)} \}^{1/2},  \forall T \ge 1,

where S_i(T) is defined by (13) and c is defined by c \triangleq \sup_i \|F_i - E F_i\|_2.

Proof. Denote G_k = F_k - E F_k. Then by Theorem A.6 in [10] (p.278) we have

    \| E[ G_j^\tau G_k ] \| \le 2 d c^2 \sqrt{ \phi( |j - k| ) },  \forall j, k.

Consequently, by using the inequality

    | tr F | \le d \|F\|,  \forall F \in R^{d \times d},

we get

    \| S_i(T) \|_2^2 = E \| \sum_{j=iT}^{(i+1)T-1} G_j \|^2
     \le tr\{ \sum_{j,k=iT}^{(i+1)T-1} E[ G_j^\tau G_k ] \}
     \le d \sum_{j,k=iT}^{(i+1)T-1} \| E[ G_j^\tau G_k ] \|
     \le 2 c^2 d^2 \sum_{j,k=iT}^{(i+1)T-1} \sqrt{ \phi( |j - k| ) }
     \le 4 c^2 d^2 T \sum_{m=0}^{T-1} \sqrt{ \phi(m) }.

This gives the desired result. □
Lemma 2. Let F_k = \phi_k \phi_k^\tau, where {\phi_k} is defined by (14) with \sup_k \|\varepsilon_k\|_4 < \infty. Then {F_k} \in M_2, where M_2 is defined by (12).

Proof. First of all, we may assume that the process {\varepsilon_k} is of zero mean (otherwise, the mean can be included in \xi_k). Then by (25),

    \| S_i(T) \|_2 = \| \sum_{t=iT}^{(i+1)T-1} ( \phi_t \phi_t^\tau - E[ \phi_t \phi_t^\tau ] ) \|_2
     \le \sum_{k,j=-\infty}^{\infty} a_k a_j \| \sum_{t=iT}^{(i+1)T-1} ( \varepsilon(t,k) \varepsilon(t,j)^\tau - E[ \varepsilon(t,k) \varepsilon(t,j)^\tau ] ) \|_2
     + 2 \sum_{j=-\infty}^{\infty} a_j \| \sum_{t=iT}^{(i+1)T-1} \varepsilon(t,j) \xi_t^\tau \|_2.   (27)

Note that for any fixed k and j, the processes {\varepsilon(t,k)\varepsilon(t,j)^\tau} and {\varepsilon(t,j)} are \phi-mixing with mixing rates \phi( m - |k - j| ) and \phi(m) respectively (where by definition \phi(m) \triangleq 1, \forall m < 0).

By Lemma 1, it is easy to see that the last term in (27) is of order o(T). For dealing with the second last term, we denote

    f_{kj}(T) = 2cd \{ T \sum_{m=0}^{T-1} \sqrt{ \phi( m - |k - j| ) } \}^{1/2},   (28)

where c is defined as in Lemma 1. Consequently, by \phi(m) \le 1, \forall m, it is not difficult to see that

    \sup_{k,j} f_{kj}(T) \le 2cdT   (29)

and

    \sup_{|k-j| < \sqrt{T}} f_{kj}(T) = o(T).   (30)

Now, by the summability of {a_j},

    \sum_{|k-j| \ge \sqrt{T}} a_k a_j \to 0,  as T \to \infty.

Hence by (29)

    \sum_{|k-j| \ge \sqrt{T}} a_k a_j f_{kj}(T) = o(T)   (31)

and by (30)

    \sum_{|k-j| < \sqrt{T}} a_k a_j f_{kj}(T) = o(T).   (32)

Combining (31) and (32) gives

    \sum_{k,j=-\infty}^{\infty} a_k a_j f_{kj}(T) = o(T).   (33)

By this and Lemma 1, we know that the second last term in (27) is also of order o(T) uniformly in i. Hence {F_k} \in M_2 by the definition (12). □
Lemma 3. Let \sup_k E\|\phi_k\|^2 < \infty. Then {\phi_k \phi_k^\tau} \in S if and only if condition (17) holds, where S is defined in (11).

Proof. Let us first assume that (17) is true. Take \mu^* = ( 1 + \sup_k E\|\phi_k\|^2 )^{-1}. Then, applying Theorem 2.1 in [6] to the deterministic sequence A_k = E[ \phi_k \phi_k^\tau ] for any \mu \in (0, \mu^*], it is easy to see that {\phi_k \phi_k^\tau} \in S(\mu^*).

Conversely, if {\phi_k \phi_k^\tau} \in S, then there exists \mu^* \in ( 0, ( 1 + \sup_k E\|\phi_k\|^2 )^{-1} ] such that {\phi_k \phi_k^\tau} \in S(\mu^*). Now, applying Theorem 2.2 in [6] to the deterministic sequence A_k = E[ \phi_k \phi_k^\tau ], it is easy to see that (17) holds. This completes the proof. □
Lemma 4. Let F_k = \phi_k \phi_k^\tau, where {\phi_k} is defined by (14) with (15) satisfied. Then {F_k} satisfies Condition (i) of Theorem 1.

Proof. Without loss of generality, assume that \xi_k \equiv 0. Let us denote

    A = \sum_{j=-\infty}^{\infty} a_j,   (34)

where {a_j} is defined by (26). Then, by the Schwarz inequality, from (25) we have

    \|\phi_k\|^2 \le A \sum_{j=-\infty}^{\infty} a_j \|\varepsilon_{k-j}\|^2.

Consequently, by the Hölder inequality and (15), we have for \varepsilon \le \eta A^{-2}:

    E \exp\{ \varepsilon \sum_{i=1}^{n} \|F_{j_i}\| \} \le E \exp\{ \varepsilon A \sum_{j=-\infty}^{\infty} a_j \sum_{i=1}^{n} \|\varepsilon_{j_i - j}\|^2 \}
     = E \prod_{j=-\infty}^{\infty} \exp\{ \varepsilon A a_j \sum_{i=1}^{n} \|\varepsilon_{j_i - j}\|^2 \}
     \le \prod_{j=-\infty}^{\infty} ( E \exp\{ \varepsilon A^2 \sum_{i=1}^{n} \|\varepsilon_{j_i - j}\|^2 \} )^{a_j / A}
     \le \prod_{j=-\infty}^{\infty} ( M \exp\{ Kn \} )^{a_j / A} = M \exp\{ Kn \}.

This completes the proof. □
The following lemma was originally proved in [5] (p.113).

Lemma 5. Let {z_k} be a nonnegative random sequence such that for some a > 0, b > 0 and for all i_1 < i_2 < \cdots < i_n, \forall n \ge 1,

    E \exp\{ \sum_{k=1}^{n} z_{i_k} \} \le \exp\{ an + b \}.   (35)

Then for any L > 0 and any n \ge i \ge 0,

    E \exp\{ \frac{1}{2} \sum_{j=i+1}^{n} z_j I( z_j \ge L ) \} \le \exp\{ e^{a - L/2} ( n - i ) + b \},

where I(\cdot) is the indicator function.

Proof. Denote

    f_j = \exp( \tfrac{1}{2} z_j ) I( z_j \ge L ).

Then, by first applying the simple inequality I( x \ge L ) \le e^{x/2} / e^{L/2} and then using (35), we have for any subsequence j_1 < j_2 < \cdots < j_k:

    E[ f_{j_1} \cdots f_{j_k} ] = E[ \exp( \tfrac{1}{2} \sum_{i=1}^{k} z_{j_i} ) \prod_{i=1}^{k} I( z_{j_i} \ge L ) ]
     \le E[ \exp( \sum_{i=1}^{k} z_{j_i} ) ] / \exp( kL/2 )
     \le \exp\{ ( a - L/2 ) k + b \}.

By this we have

    E \exp\{ \sum_{j=i+1}^{n} \tfrac{1}{2} z_j I( z_j \ge L ) \} = E \prod_{j=i+1}^{n} \exp\{ \tfrac{1}{2} z_j I( z_j \ge L ) \}
     \le E \prod_{j=i+1}^{n} \{ 1 + \exp( \tfrac{1}{2} z_j ) I( z_j \ge L ) \}
     = E \prod_{j=i+1}^{n} \{ 1 + f_j \}
     = E\{ \sum_{k=0}^{n-i} \sum_{i+1 \le j_1 < \cdots < j_k \le n} f_{j_1} \cdots f_{j_k} \}
     \le e^b \{ \sum_{k=0}^{n-i} \sum_{i+1 \le j_1 < \cdots < j_k \le n} \exp\{ ( a - L/2 ) k \} \}
     = e^b \prod_{j=i+1}^{n} \{ 1 + \exp( a - L/2 ) \}
     \le \exp\{ ( n - i ) \exp( a - L/2 ) + b \}.

This completes the proof of Lemma 5. □
Lemma 6. Let F_k = \phi_k \phi_k^\tau, where {\phi_k} is defined by (14) with (15) satisfied. Then {F_k} satisfies Condition (ii) of Theorem 1.

Proof. Set, for any fixed k and l,

    z_j \triangleq z_j(k, l) = \| \sum_{t=jT}^{(j+1)T-1} ( \varepsilon(t,k) \varepsilon(t,l)^\tau - E[ \varepsilon(t,k) \varepsilon(t,l)^\tau ] ) \|.

Then, similarly to (27), from (25) we have

    \sum_{j=i+1}^{n} \| S_j(T) \| \le \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j + 2 \sum_{k=-\infty}^{\infty} a_k \sum_{j=i+1}^{n} \| \sum_{t=jT}^{(j+1)T-1} \varepsilon(t,k) \xi_t^\tau \|.   (36)

We first consider the second last term in (36). By the Hölder inequality,

    E \exp\{ \lambda \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j \}