Necessary and Sufficient Conditions for Stability of LMS

Lei Guo*
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China

Lennart Ljung†
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden

G. J. Wang‡
Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China

October 30, 1995
Abstract. In a recent work [7], some general results on exponential stability of random linear equations are established, which can be applied directly to the performance analysis of a wide class of adaptive algorithms, including the basic LMS ones, without requiring stationarity, independence and boundedness assumptions on the system signals. The main purpose of this paper is to provide further results on exponential stability of the LMS algorithms, in particular, to provide a necessary and sufficient condition for such stability in the case of possibly unbounded and non-φ-mixing signals. The results of this paper can be applied to a fairly large class of signals, including those generated from, e.g., a Gaussian process via a stable linear filter. As an application, several refined and extended results on convergence and tracking performance of LMS are derived under various assumptions. Neither stationarity nor Markov chain assumptions are necessarily required in the paper.

*Supported by the National Natural Science Foundation of China.
†Supported by the Swedish Research Council for Engineering Sciences (TFR).
‡Supported by the National Natural Science Foundation of China.
1 Introduction
The well-known least mean squares (LMS) algorithm, aiming at tracking the "best linear fit" of an observed (or desired) signal {y_k} based on a measured d-dimensional (input) signal {φ_k}, is defined recursively by [2]

    x_{k+1} = x_k + μ φ_k (y_k − φ_k^T x_k),   x_0 ∈ R^d,   (1)

where μ > 0 is a step-size.
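As a concrete illustration of the recursion (1), the following sketch runs LMS on simulated data (the dimension, step-size and signal model below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, mu = 2, 2000, 0.05
x_true = np.array([1.0, -0.5])   # hypothetical "best linear fit" to be estimated

x = np.zeros(d)                  # x_0
for k in range(n):
    phi = rng.normal(size=d)                 # measured d-dimensional input
    y = phi @ x_true + 0.1 * rng.normal()    # observed signal with small noise
    x = x + mu * phi * (y - phi @ x)         # the LMS update (1)

err = float(np.linalg.norm(x - x_true))
```

With a persistently exciting input and a small step-size, the iterate settles near the underlying parameter vector.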
Due to its simplicity, robustness and ease of implementation, the LMS algorithm is known to be one of the most basic adaptive algorithms in many areas, including adaptive signal processing, system identification and adaptive control, and it has received considerable attention in both theory and applications over the past several decades (see, among many others, the books [18], [17] and [2], the survey [13], and the references therein). Also, it has been found recently that the LMS is H^∞-optimal in the sense that it minimizes the energy gain from the disturbances to the predicted errors; it is also risk-sensitive optimal and minimizes a certain exponential cost function (see [11]).
In many situations, we would like to know at least the answers to the following questions: Is the LMS stable in the mean square sense? Does the LMS have good tracking ability? And how can the tracking errors be calculated and minimized?
It is shown in [9] that the study of the last two questions essentially depends on the first one, which in turn necessarily depends on the exponential stability of the following homogeneous equation of LMS (cf. [6]):

    x_{k+1} = (I − μ φ_k φ_k^T) x_k.   (2)

This equation is in essence a product of random matrices, and its stability depends mainly on the properties of the measured signals {φ_k}. Most of the early works in this direction concern the case where the signals {φ_k} are independent or M-dependent (cf. [18], [4], [1]). This independence assumption can be relaxed considerably if we assume that the signals {φ_k} are bounded, as in e.g. [17], [6] and [12]. Note that the boundedness assumption is suitable for the study of the so-called normalized LMS algorithms (cf. [17], [6] and [14]), since the normalized signals are automatically bounded. In this case, some general results, together with a very weak (probably the weakest known) excitation condition guaranteeing the exponential stability of LMS, can be found in [6]. Moreover, in the bounded φ-mixing case, a complete characterization of the exponential stability can also be given. Indeed, in that case it has been shown in [6] that the necessary and sufficient condition for (2) to be exponentially stable is that there exist an integer h > 0 and a constant δ > 0 such that

    Σ_{i=k+1}^{k+h} E[φ_i φ_i^T] ≥ δ I,   ∀k.   (3)
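Condition (3) can be checked numerically by estimating the moment matrices E[φ_iφ_i^T] and examining the smallest eigenvalue of each sum over h consecutive indices; the sketch below does this for a hypothetical i.i.d. Gaussian regressor (the values of h, the window count and the Monte Carlo sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, N, draws = 2, 4, 40, 4000

# Monte Carlo estimates of E[phi_i phi_i^T] for i = 0..N-1 (here iid Gaussian,
# so every moment matrix is close to the identity)
S = [np.mean([np.outer(p, p) for p in rng.normal(size=(draws, d))], axis=0)
     for _ in range(N)]

# smallest eigenvalue of each sum of h consecutive moment matrices;
# condition (3) asks for a uniform positive lower bound delta over all windows
min_eigs = [float(np.linalg.eigvalsh(sum(S[k:k + h])).min()) for k in range(N - h)]
delta = min(min_eigs)
```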
For general unbounded and correlated signals, the stability analysis (for the unnormalized LMS algorithm (1)) becomes more complex, and has defied a complete solution for over 30 years. Recently, some general stability results applicable to possibly unbounded, nonstationary, weakly dependent signals were established in [8], and based on these, a number of results on the tracking performance of the LMS algorithms can be derived (see [?]). In particular, the result of [8] can be applied to a typical situation where the signal process is generated from a white noise sequence through a stable linear filter:
    φ_k = Σ_{j=−∞}^{∞} A_j ε_{k−j} + ξ_k,   Σ_{j=−∞}^{∞} ‖A_j‖ < ∞,   (4)

where {ε_k} is an independent sequence satisfying

    sup_k E exp{η ‖ε_k‖^β} < ∞ for some η > 0, β > 2,   (5)

and where {ξ_k} is a bounded deterministic process.
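A process of the form (4) is easy to simulate by filtering Gaussian white noise; the sketch below uses a hypothetical one-sided filter A_j = 0.5^j (truncated), which satisfies the summability requirement Σ_j ‖A_j‖ = 2 < ∞:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5000, 30                          # signal length, filter truncation order
A = 0.5 ** np.arange(m)                  # summable filter coefficients
eps = rng.normal(size=n)                 # Gaussian white noise {eps_k}
xi = 0.1 * np.cos(0.01 * np.arange(n))   # bounded deterministic component {xi_k}

# phi_k = sum_{j>=0} A_j eps_{k-j} + xi_k  (causal convolution, truncated)
phi = np.convolve(eps, A)[:n] + xi
```

The stationary part of φ has variance about Σ_j A_j² = 4/3 here, and, being Gaussian, it satisfies (6) below but not (5).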
It is obvious that the expansion (4) has a similar form to the well-known Wold decomposition for wide-sense stationary processes. Note, however, that the signal process {φ_k} defined by (4) may in general be neither a stationary process nor a Markov chain.
Unfortunately, the condition (5) with β > 2 excludes the case where {ε_k} is a Gaussian process, since such signals can only satisfy the weaker condition

    sup_k E exp{η ‖ε_k‖^2} < ∞ for some η > 0.   (6)

The motivation of this paper has been to relax the moment condition (5) so that, at least, the signal process {φ_k} defined by (4) and (6) can be included. This will be done based on a relaxation of the moment condition used in Theorem 3.2 of [8]. Moreover, we will show that for a large class of weakly dependent nonstationary signals, the condition (3) is also necessary and sufficient for the exponential stability of (2), even in the case where the signal process {φ_k} is unbounded and non-φ-mixing. Furthermore, several direct applications of the stability result to adaptive tracking will be given, which yield more general results than those established previously.
2 The Main Results
2.1 Notations
Here we adopt the following notations introduced in [8].
a). The maximum eigenvalue of a matrix X is denoted by λ_max(X), and the Euclidean norm of X is defined as its maximum singular value, i.e.,

    ‖X‖ ≜ {λ_max(X X^T)}^{1/2},

and the L_p-norm of a random matrix X is defined as

    ‖X‖_p ≜ {E(‖X‖^p)}^{1/p},   p ≥ 1.
b). For any square random matrix sequence F = {F_k} and real numbers p ≥ 1, λ ∈ (0,1), the stochastic exponentially stable family is defined by

    S_p(λ) = { F : ‖ ∏_{j=i+1}^{k} (I − μ F_j) ‖_p ≤ M(1 − αμ)^{k−i}, ∀μ ∈ (0, λ], ∀k ≥ i ≥ 0, for some M > 0 and α ∈ (0,1) }.
Likewise, the corresponding deterministic exponentially stable family is defined by

    S(λ) = { F : ‖ ∏_{j=i+1}^{k} (I − μ E[F_j]) ‖ ≤ M(1 − αμ)^{k−i}, ∀μ ∈ (0, λ], ∀k ≥ i ≥ 0, for some M > 0 and α ∈ (0,1) }.
In what follows, it will be convenient to set

    S_p ≜ ∪_{λ∈(0,1)} S_p(λ),   S ≜ ∪_{λ∈(0,1)} S(λ).   (7)
c). Let p ≥ 1, F ≜ {F_i}. Set

    M_p = { F : sup_i ‖S_i(T)‖_p = o(T) as T → ∞ },   (8)

where

    S_i(T) = Σ_{j=iT}^{(i+1)T−1} (F_j − E[F_j]).   (9)

The definition of M_p is reminiscent of the law of large numbers. As shown by Lemma 3 of [10], it includes a large class of random processes.
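The law-of-large-numbers flavor of M_p can be seen in a small Monte Carlo experiment: for an i.i.d. scalar sequence (an illustrative choice, not from the paper), the centered block sum S_0(T) grows like √T, so its size per unit T shrinks:

```python
import numpy as np

rng = np.random.default_rng(3)

def block_norm(T, reps=400):
    """Monte Carlo estimate of E|S_0(T)|, where S_0(T) = sum_{j<T}(F_j - E F_j)
    for iid F_j ~ N(0, 1); the exact value is sqrt(2*T/pi)."""
    F = rng.normal(size=(reps, T))
    return float(np.abs(F.sum(axis=1)).mean())

r1, r2 = block_norm(100), block_norm(10000)
ratio = (r2 / 10000) / (r1 / 100)   # normalized block norm shrinks: o(T) behavior
```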
2.2 The Main Results
We first present a preliminary theorem.
Theorem 1. Let {F_k} be a random matrix process. Then

    {F_k} ∈ S  ⟹  {F_k} ∈ S_p,   ∀p ≥ 1,

provided that the following two conditions are satisfied:

(i). There exist positive constants ε, M and K such that for any n ≥ 1,

    E exp{ ε Σ_{i=1}^{n} ‖F_{j_i}‖ } ≤ M exp{Kn}

holds for any integer sequence 0 ≤ j_1 < j_2 < ⋯ < j_n.

(ii). There exist a constant M and a nondecreasing function g(T), with g(T) = o(T) as T → ∞, such that for any fixed T, all small λ > 0 and any n ≥ i ≥ 0,

    E exp{ λ Σ_{j=i+1}^{n} ‖S_j(T)‖ } ≤ M exp{ [λ g(T) + o(λ)](n − i) },

where S_j(T) is defined by (9).
The proof is given in Section 4.
Remark 1. The form of Theorem 1 is similar to that of Theorem 3.2 in [8]. The key difference lies in condition (i). This condition was introduced in [5, p.112] and is, in a certain sense, a relaxation of the corresponding condition used in Theorem 3.2 of [8]. Such a relaxation enables us to include Gaussian signals as a special case when the LMS algorithms are under consideration, as will be shown shortly.
Based on Theorem 1, we may prove that for a large class of unbounded nonstationary signals including (4), the condition (3) is also a necessary and sufficient condition for the exponential stability of LMS. Let us start with the decomposition (4):

    φ_k = Σ_{j=−∞}^{∞} A_j ε_{k−j} + ξ_k,   Σ_{j=−∞}^{∞} ‖A_j‖ < ∞,   (10)

where {ξ_k} is a bounded deterministic process, and {ε_k} is now a general φ-mixing sequence.
Recall that a random sequence {ε_k} is called φ-mixing if there exists a positive nonincreasing function φ(m), with φ(m) → 0 as m → ∞, such that

    sup_{A ∈ F_{−∞}^{k}, B ∈ F_{k+m}^{∞}} |P(B|A) − P(B)| ≤ φ(m),   ∀m ≥ 0, k ∈ (−∞, ∞),

where by definition

    F_i^j = σ{ε_k, i ≤ k ≤ j},   −∞ ≤ i ≤ j ≤ ∞.

The φ-mixing concept is a standard one in the literature for describing weakly dependent random processes. As is well known, it is satisfied by, for example, M-dependent sequences, sequences generated from bounded white noise processes via a stable linear filter, and stationary aperiodic Markov chains which are Markov ergodic and satisfy Doeblin's condition (cf. [3]).
The main result of this paper is then stated as follows.
Theorem 2. Consider the random linear equation (2). Let the signal process {φ_k} be generated by (10), where {ξ_k} is a bounded deterministic sequence and {ε_k} is a φ-mixing process which satisfies, for any n ≥ 1 and any integer sequence j_1 < j_2 < ⋯ < j_n,

    E exp{ η Σ_{i=1}^{n} ‖ε_{j_i}‖^2 } ≤ M exp{Kn},   (11)

where η, M and K are positive constants. Then {φ_k φ_k^T} ∈ S_p for all p ≥ 1 if and only if there exist an integer h > 0 and a constant δ > 0 such that

    Σ_{i=k+1}^{k+h} E[φ_i φ_i^T] ≥ δ I,   ∀k ≥ 0.   (12)
The proof is also given in Section 4.
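The conclusion of Theorem 2 can be observed in simulation: under excitation such as (12), the norm of the random matrix product in (2) decays geometrically in the mean square sense. A small sketch (i.i.d. Gaussian regressors and the particular μ are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(4)
d, mu, steps, reps = 2, 0.05, 400, 200

norms = np.zeros(reps)
for r in range(reps):
    P = np.eye(d)
    for _ in range(steps):
        phi = rng.normal(size=d)   # E[phi phi^T] = I, so (12) holds
        P = (np.eye(d) - mu * np.outer(phi, phi)) @ P   # product in equation (2)
    norms[r] = np.linalg.norm(P, 2)

decay = float(np.sqrt((norms ** 2).mean()))   # L2-type norm of the product
```

Each individual factor has spectral norm 1 (the direction orthogonal to φ is untouched), yet the product contracts because the excited direction changes randomly from step to step.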
Remark 2. By taking A_0 = I, A_k = 0 for k ≠ 0, and ξ_k = 0, ∀k, in (10), we see that {φ_k} coincides with {ε_k}, which means that Theorem 2 is applicable to any φ-mixing sequence. Furthermore, if {ε_k} is bounded, then (11) is automatically satisfied. This shows that Theorem 2 may include the corresponding result in [6] as a special case.

Note, however, that a linearly filtered φ-mixing process like (10) will in general no longer be a φ-mixing sequence. In fact, Theorem 2 is applicable also to a large class of processes other than φ-mixing ones, as shown by the following corollary.
Corollary 1. Let the signal process {φ_k} be generated by (4), where {ξ_k} is a bounded deterministic sequence and {ε_k} is an independent sequence satisfying condition (6). Then {φ_k φ_k^T} ∈ S_p for all p ≥ 1 if and only if there exist an integer h > 0 and a constant δ > 0 such that (12) holds.

Proof. By Theorem 2, we need only show that condition (11) is true. This is obvious since {ε_k} is an independent sequence satisfying (6).
Remark 3. The moment condition (6) used in Corollary 1 may be further relaxed if more conditions are imposed. This is the case when, for example, the regressor process is a Markov chain generated by a finite dimensional linear stable state space model with the innovation process being an i.i.d. sequence (see [15]).
3 Applications to Adaptive Tracking
Let us now assume that {y_k} and {φ_k} are related by a linear regression

    y_k = φ_k^T x_k* + v_k,   (13)

where {x_k*} is the true or "fictitious" time-varying parameter process, and {v_k} represents the disturbance or unmodeled dynamics.

The objective of the LMS algorithm (1) is then to track the time-varying unknown parameter process {x_k*}. The tracking error will depend on the parameter variation process {Δ_k}, defined by

    Δ_k = x_k* − x_{k−1}*,   (14)

through the following error equation, obtained by substituting (13)-(14) into (1):

    x̃_{k+1} = (I − μ φ_k φ_k^T) x̃_k + μ φ_k v_k − Δ_{k+1},   (15)

where x̃_k ≜ x_k − x_k*.
Obviously, the quality of tracking will essentially depend on the properties of {φ_k, Δ_k, v_k}. The homogeneous part of (15) is exactly the equation (2), and can be dealt with by Theorem 2. Hence, we need only consider the forcing terms. Different assumptions on {Δ_k, v_k} will give different tracking error bounds or expressions, and we shall treat three cases separately in the following.
3.1 First Performance Analysis
By this, we mean that the tracking performance analysis is carried out in a "worst case" situation, i.e., the parameter variations and the disturbances are only assumed to be bounded in an averaging sense. To be specific, let us make the following assumption:

A1). There exists r > 2 such that

    σ_v ≜ sup_k ‖v_k‖_r < ∞   and   σ_Δ ≜ sup_k ‖Δ_k‖_r < ∞.

Note that this condition also includes any "unknown but bounded" deterministic disturbances and parameter variations.
Theorem 3. Consider the LMS algorithm (1) applied to (13). Let condition A1) be satisfied. Also, let {φ_k} be as in Theorem 2 with (12) satisfied. Then for all t ≥ 1 and all small μ > 0,

    E‖x_t − x_t*‖^2 = O(σ_v^2 + σ_Δ^2/μ^2) + O((1 − αμ)^t),

where α ∈ (0,1) is a constant.

This result follows immediately from Theorem 2, (15) and the Hölder inequality. We remark that various such "worst case" results for other commonly used algorithms (e.g., RLS and KF) may be found in [6]. The main implication of Theorem 3 is that the tracking error will be small if both the parameter variation (σ_Δ) and the disturbance (σ_v) are small.
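A simulation in the spirit of Theorem 3: LMS is run against a slowly drifting parameter with a bounded disturbance, and the steady-state mean square error stays small (all signal sizes and models below are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, mu = 2, 4000, 0.1
drift, noise = 1e-3, 0.05        # bounded sizes of Delta_k and v_k

x_star = np.array([1.0, -1.0])   # time-varying true parameter x_k*
x = np.zeros(d)
sq_err = []
for k in range(n):
    x_star = x_star + drift * np.sin(0.01 * k)   # bounded parameter variation
    phi = rng.normal(size=d)
    y = phi @ x_star + noise * rng.normal()      # regression model (13)
    x = x + mu * phi * (y - phi @ x)             # LMS update (1)
    sq_err.append(float(np.sum((x - x_star) ** 2)))

steady = float(np.mean(sq_err[n // 2:]))   # steady-state E||x_t - x_t*||^2 estimate
```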
3.2 Second Performance Analysis
By this, we mean that the tracking performance analysis is carried out for zero-mean parameter variations and disturbances, which may be correlated random processes in general. To be specific, we introduce the following set for r ≥ 1:

    N_r = { w : sup_k ‖ Σ_{i=k+1}^{k+n} w_i ‖_r ≤ c_r(w) √n, ∀n ≥ 1 },

where c_r(w) is a constant depending on r and {w_i} only.
Obviously, N_r is a subset of M_r defined by (8). It is known (see [10]) that martingale difference, zero-mean φ-mixing and α-mixing sequences can all be included in N_r. Also, from the proof of Lemma 3 in [10], it is known that the constant c_r(w) can be dominated by sup_k ‖w_k‖_r in the first two cases, and by sup_k ‖w_k‖_{r+η} (η > 0) in the last case.
Moreover, it is interesting to note that N_r is invariant under linear transformations. This means that if {φ_k} and {ε_k} are related by (10) with ξ_k ≡ 0, then {ε_k} ∈ N_r implies that {φ_k} ∈ N_r. This can easily be seen from the following inequality:

    ‖ Σ_{i=k+1}^{k+n} φ_i ‖_r = ‖ Σ_{j=−∞}^{∞} A_j Σ_{i=k+1}^{k+n} ε_{i−j} ‖_r ≤ Σ_{j=−∞}^{∞} ‖A_j‖ ‖ Σ_{i=k+1}^{k+n} ε_{i−j} ‖_r.

Thus, random processes generated from martingale differences, or from φ- or α-mixing sequences, via an infinite order linear filter can all be included in N_r.
Now, we are in a position to introduce the following condition for the second performance analysis.

A2). For some r > 2, {Δ_k} ∈ N_r and {φ_k v_k} ∈ N_r.
Theorem 4. Consider the LMS algorithm (1) applied to the model (13). Let {φ_k} be defined as in Theorem 2 with (12) satisfied, and let condition A2) hold. Then for all t ≥ 1 and all small μ > 0,

    E‖x_t − x_t*‖^2 = O( μ c_r^2(φv) + c_r^2(Δ)/μ ) + O((1 − αμ)^t),

where c_r(φv) and c_r(Δ) are constants depending on {φ_k v_k} and {Δ_k} respectively, which may be found in condition A2) through the definition of N_r, and where α is the same constant as in Theorem 3.
Proof. By Lemma A.2 of [9] and Theorem 2, it is easy to see from (15) that the desired result is true.
Note that the upper bound in Theorem 4 significantly improves the "crude" bound given in Theorem 3 for small μ, and it roughly indicates the familiar trade-off between noise sensitivity and tracking ability.
Theorem 4 can be applied directly to the convergence analysis of some standard filtering problems (cf. [18], [4] and [2]). For example, let {y_k} and {φ_k} be two stationary processes, and assume that our purpose is to track the least mean squares solution

    x* = (E φ_k φ_k^T)^{−1} E φ_k y_k

of

    min_x E(y_k − φ_k^T x)^2,

recursively, based on the real-time measurements {y_i, φ_i, i ≤ k}. Now, define {v_k} by

    y_k = φ_k^T x* + v_k.
It is then obvious that E φ_k v_k = 0. Furthermore, in many standard situations it can be verified that {φ_k v_k} ∈ N_r for some r > 2. Thus Theorem 4, applied to the above linear regression, gives

    E‖x_t − x*‖^2 = O(μ) + O((1 − αμ)^t),

which tends to zero as t → ∞ and μ → 0.
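The following sketch illustrates this filtering application: with stationary signals and E[φ_kφ_k^T] = I, the least mean squares solution x* coincides with the hypothetical coefficient vector used to generate y_k, and LMS approaches it (the particular signals and step-size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, mu = 2, 20000, 0.02

w = np.array([0.7, -0.3])                 # hypothetical underlying coefficients
phi_all = rng.normal(size=(n, d))         # stationary regressors, E[phi phi^T] = I
y_all = phi_all @ w + 0.2 * rng.normal(size=n)   # here x* = (E phi phi^T)^{-1} E phi y = w

x = np.zeros(d)
for phi, y in zip(phi_all, y_all):
    x = x + mu * phi * (y - phi @ x)      # track x* from real-time measurements

gap = float(np.linalg.norm(x - w))
```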
Apparently, Theorem 4 is also applicable to nonstationary signals {y_k} and {φ_k}.
3.3 Third Performance Analysis
By this, we mean that the analysis aims to obtain an explicit (approximate) expression for the tracking performance, rather than just an upper bound as in the previous two cases. This is usually carried out under white noise assumptions on {Δ_k, v_k}. Roughly speaking, the parameter process in this case will behave like a random walk, and some detailed interpretations of this parameter model may be found in [13] and [9]. We make the following assumptions:

A3). The regressor process is generated by a causal filter

    φ_k = Σ_{j=0}^{∞} A_j ε_{k−j} + ξ_k,   Σ_{j=0}^{∞} ‖A_j‖ < ∞,   (16)

where {ξ_k} is a bounded deterministic sequence, and {ε_k, Δ_k, v_{k−1}} is a φ-mixing process with mixing rate denoted by φ(m). Assume also that (11) and (12) hold.
A4). The process {Δ_k, v_k} satisfies the following conditions:

(i): E[v_k | F_k] = 0, E[Δ_{k+1} | F_k] = E[Δ_{k+1} v_k | F_k] = 0;

(ii): E[v_k^2 | F_k] = R_v(k), E[Δ_k Δ_k^T] = Q_Δ(k);

(iii): sup_k E[|v_k|^r | F_k] ≤ M, σ ≜ sup_k ‖Δ_k‖_r < ∞,

where r > 2 and M > 0 are constants, and F_k denotes the σ-algebra generated by {ε_i, Δ_i, v_{i−1}, i ≤ k}.
Theorem 5. Consider the LMS algorithm (1) applied to the model (13). Let conditions A3) and A4) be satisfied. Then for all t ≥ 1 and all small μ > 0,

    E[x̃_t x̃_t^T] = Π_t + O( μ[ε(μ) + σ^2 + (1 − αμ)^t] ),

where the function ε(μ) → 0 as μ → 0, and Π_t is recursively defined by

    Π_{t+1} = (I − μ S_t) Π_t (I − μ S_t) + μ^2 R_v(t) S_t + Q_Δ(t+1),

with S_t = E[φ_t φ_t^T], and R_v(t) and Q_Δ(t) defined as in condition A4).
This theorem relaxes and unifies the conditions used in Theorem 5.1 of [9]. The proof is given in Section 4, and is based on a general result established in [9]. The expression for the function ε(μ) may also be found from the analysis, and from the related formula in Theorem 4.1 of [9].
Note that in the (wide-sense) stationary case, S_t ≡ S, R_v(t) ≡ R_v, Q_Δ(t) ≡ Q_Δ, and Π_t will converge to a matrix Π defined by the Lyapunov equation (cf. [9])

    μ(S Π + Π S) = μ^2 R_v S + Q_Δ.

In this case, the trace of the matrix Π, which represents the dominating part of the tracking error E‖x̃_t‖^2 for small μ, can be expressed as

    tr(Π) = (1/2)[ μ R_v d + tr(S^{−1} Q_Δ)/μ ],
where d ≜ dim(φ_k). Minimizing tr(Π) with respect to μ, one obtains the following formula for the optimal step-size:

    μ* = [ tr(S^{−1} Q_Δ) / (R_v d) ]^{1/2}.
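The trade-off captured by tr(Π) and the optimal step-size can be verified numerically; the sketch below compares the closed-form minimizer with a grid search (the matrices S, Q_Δ and the scalar R_v are illustrative assumptions, not from the paper):

```python
import numpy as np

# illustrative stationary quantities (assumed for this sketch)
S = np.diag([2.0, 1.0])        # S = E[phi_k phi_k^T]
Q = np.diag([1e-4, 4e-4])      # Q = E[Delta_k Delta_k^T]
Rv, d = 0.5, 2                 # R_v = E[v_k^2 | F_k], d = dim(phi_k)

def tr_Pi(mu):
    # dominating part of the tracking error: (1/2)[mu*Rv*d + tr(S^{-1}Q)/mu]
    return 0.5 * (mu * Rv * d + np.trace(np.linalg.inv(S) @ Q) / mu)

# closed-form optimal step-size mu* = sqrt(tr(S^{-1}Q) / (Rv*d))
mu_star = float(np.sqrt(np.trace(np.linalg.inv(S) @ Q) / (Rv * d)))

# brute-force minimizer on a fine grid, for comparison
grid = np.linspace(1e-3, 0.2, 2000)
mu_grid = float(grid[np.argmin([tr_Pi(m) for m in grid])])
```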
4 Proof of Theorems 1, 2 and 5
Proof of Theorem 1.

Going through the proof of Theorem 3.2 in Section V of [8], we find that it suffices to show that for any fixed T > 1 and all small λ > 0,

    ‖ ∏_{j=i+1}^{n} (1 + 2cλ‖H_j‖) ‖_t ≤ M[1 + O(λ^{3/2})]^{n−i},   ∀n ≥ i,   (17)

where c ≥ 1, t ≥ 1 and M > 0 are constants, and

    λ^2 H_j = λ^2 H_j(2) + λ^3 H_j(3) + ⋯ + λ^T H_j(T) + O(λ^2),

with

    H_j(k) = Σ_{jT ≤ j_1 < j_2 < ⋯ < j_k ≤ (j+1)T−1} F_{j_k} ⋯ F_{j_1},   k = 2, …, T.
Now, let us set

    f_j = exp{ λ^{1/4} Σ_{s=jT}^{(j+1)T−1} ‖F_s‖ }.

Then for any 2 ≤ k ≤ T and jT ≤ j_1 < ⋯ < j_k ≤ (j+1)T − 1, by using the inequalities λ^k ≤ λ^{3/2 + k/4} and x ≤ e^x, we have for λ ∈ (0,1)

    λ^k ‖F_{j_k}‖ ⋯ ‖F_{j_1}‖ ≤ λ^{3/2} (λ^{1/4}‖F_{j_k}‖) ⋯ (λ^{1/4}‖F_{j_1}‖)
        ≤ λ^{3/2} exp{ λ^{1/4}(‖F_{j_1}‖ + ⋯ + ‖F_{j_k}‖) } ≤ λ^{3/2} f_j.
Consequently,

    (1 + 2cλ‖H_j‖) ≤ ∏_{k=2}^{T} (1 + cλ^k‖H_j(k)‖)(1 + O(λ^2))
        ≤ ∏_{k=2}^{T} ∏_{jT ≤ j_1 < j_2 < ⋯ < j_k ≤ (j+1)T−1} (1 + cλ^k‖F_{j_k}‖ ⋯ ‖F_{j_1}‖)(1 + O(λ^2))
        ≤ (1 + cλ^{3/2} f_j)^{2^T} (1 + O(λ^2)).   (18)
Note that

    ∏_{j=i+1}^{n} (1 + cλ^{3/2} f_j) = Σ_{k=0}^{n−i} (cλ^{3/2})^k Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} f_{j_1} ⋯ f_{j_k}.

Now, applying the Minkowski inequality to the above identity, taking λ small enough so that λ^{1/4} 2^T t ≤ ε, and using condition (i), it is evident that

    ‖ ∏_{j=i+1}^{n} (1 + cλ^{3/2} f_j) ‖_{2^T t} ≤ Σ_{k=0}^{n−i} (cλ^{3/2})^k Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} M^{1/(2^T t)} exp{ (KT/(2^T t)) k }
        ≤ M^{1/(2^T t)} [ 1 + cλ^{3/2} exp(KT/(2^T t)) ]^{n−i}.

Finally, combining this with (18), it is not difficult to see that (17) is true. This completes the proof.
The proof of Theorem 2 is rather involved, and so it is prefaced with several lemmas.
Lemma 1. Let {F_t} be a φ-mixing d × d dimensional matrix process with mixing rate {φ(m)}. Then

    sup_i ‖S_i(T)‖_2 ≤ 2cd { T Σ_{m=0}^{T−1} √φ(m) }^{1/2},   ∀T ≥ 1,

where S_i(T) is defined by (9), and c is defined by c = sup_i ‖F_i − E F_i‖_2.
Proof. Denote G_k = F_k − E F_k. Then by Theorem A.6 in [12, p.278] we have

    ‖E[G_j^T G_k]‖ ≤ 2dc^2 √φ(|j − k|),   ∀j, k.

Consequently, by using the inequality

    |tr A| ≤ d‖A‖,   ∀A ∈ R^{d×d},

we get

    ‖S_i(T)‖_2^2 = E ‖ Σ_{j=iT}^{(i+1)T−1} G_j ‖^2 ≤ tr{ Σ_{j,k=iT}^{(i+1)T−1} E G_j^T G_k }
        ≤ d Σ_{j,k=iT}^{(i+1)T−1} ‖E G_j^T G_k‖ ≤ 2c^2 d^2 Σ_{j,k=iT}^{(i+1)T−1} √φ(|j − k|)
        ≤ 4c^2 d^2 T Σ_{m=0}^{T−1} √φ(m).

This gives the desired result.
Lemma 2. Let F_k = φ_k φ_k^T, where {φ_k} is defined by (10) with sup_k ‖ε_k‖_2 < ∞. Then {F_k} ∈ M_2, where M_2 is defined by (8).
Proof. First of all, we may assume that the process {ε_k} is of zero mean (otherwise, the mean can be included in ξ_k). Then by (10),

    ‖S_i(T)‖_2 ≤ Σ_{k,j=−∞}^{∞} ‖A_k‖ ‖A_j‖ ‖ Σ_{t=iT}^{(i+1)T−1} [ε_{t−k} ε_{t−j}^T − E ε_{t−k} ε_{t−j}^T] ‖_2
        + 2 Σ_{j=−∞}^{∞} ‖A_j‖ ‖ Σ_{t=iT}^{(i+1)T−1} ε_{t−j} ξ_t^T ‖_2.   (19)

Note that for any fixed k and j, both the processes {ε_{t−k} ε_{t−j}^T} and {ε_{t−j}} are φ-mixing, with mixing rates φ(m − |k − j|) and φ(m) respectively (where by definition φ(m) = 1, ∀m < 0).
By Lemma 1, it is easy to see that the last term in (19) is of order o(T). For dealing with the second last term, we denote

    f_{kj}(T) = 2cd { T Σ_{m=0}^{T−1} √φ(m − |k − j|) }^{1/2}.   (20)

Also, assume without loss of generality that φ(m) ≤ 1, ∀m ≥ 0. Then it is obvious that

    sup_{k,j} f_{kj}(T) ≤ 2cdT,   (21)

and

    sup_{|k−j| < √T} f_{kj}(T) = o(T).   (22)

Now, by the summability of {A_i},

    Σ_{|k−j| ≥ √T} ‖A_k‖ ‖A_j‖ → 0 as T → ∞.

Hence by (21),

    Σ_{|k−j| ≥ √T} ‖A_k‖ ‖A_j‖ f_{kj}(T) = o(T),   (23)

and by (22),

    Σ_{|k−j| < √T} ‖A_k‖ ‖A_j‖ f_{kj}(T) = o(T).   (24)

Combining (23) and (24) gives

    Σ_{k,j=−∞}^{∞} ‖A_k‖ ‖A_j‖ f_{kj}(T) = o(T).   (25)

By this and Lemma 1, we know that the second last term in (19) is also of order o(T), uniformly in i. Hence, {F_k} ∈ M_2 by definition.
Lemma 3. Let sup_k E‖φ_k‖^2 < ∞. Then {φ_k φ_k^T} ∈ S if and only if condition (12) holds, where S is defined by (7).

Proof. Let us first assume that (12) is true. Take λ = (1 + sup_k E‖φ_k‖^2)^{−1}. Then, applying Theorem 2.3 in [6] to the deterministic sequence A_k = E[φ_k φ_k^T] for any μ ∈ (0, λ], it is easy to see that {φ_k φ_k^T} ∈ S(λ).

Conversely, if {φ_k φ_k^T} ∈ S, then there exists λ ∈ (0, (1 + sup_k E‖φ_k‖^2)^{−1}] such that {φ_k φ_k^T} ∈ S(λ). Now, applying Theorem 2.3 in [6] to the deterministic sequence A_k = E[φ_k φ_k^T], it is easy to see that (12) holds. This completes the proof.
Lemma 4. Let F_k = φ_k φ_k^T, where {φ_k} is defined by (10) with (11) satisfied. Then {F_k} satisfies condition (i) of Theorem 1.

Proof. Without loss of generality, assume that ξ_k ≡ 0. Let us denote

    A = Σ_{j=−∞}^{∞} ‖A_j‖.   (26)

Then by the Schwarz inequality, from (10) we have

    ‖φ_k‖^2 ≤ A Σ_{j=−∞}^{∞} ‖A_j‖ ‖ε_{k−j}‖^2.

Consequently, by the Hölder inequality and (11), we have for ε ≤ η A^{−2}

    E exp{ ε Σ_{i=1}^{n} ‖F_{j_i}‖ } ≤ E exp{ εA Σ_{j=−∞}^{∞} ‖A_j‖ Σ_{i=1}^{n} ‖ε_{j_i − j}‖^2 }
        = E ∏_{j=−∞}^{∞} exp{ εA ‖A_j‖ Σ_{i=1}^{n} ‖ε_{j_i − j}‖^2 }
        ≤ ∏_{j=−∞}^{∞} ( E exp{ εA^2 Σ_{i=1}^{n} ‖ε_{j_i − j}‖^2 } )^{‖A_j‖/A}
        ≤ ∏_{j=−∞}^{∞} ( M exp{Kn} )^{‖A_j‖/A} = M exp{Kn}.

This completes the proof.
The following lemma originally appeared in [10, p.113].

Lemma 5. Let {z_k} be a nonnegative sequence such that for some a > 0, b > 0, and for all i_1 < i_2 < ⋯ < i_n, ∀n ≥ 1,

    E exp{ Σ_{k=1}^{n} z_{i_k} } ≤ exp{an + b}.   (27)

Then for any L > 0 and any n ≥ i ≥ 0,

    E exp{ (1/2) Σ_{j=i+1}^{n} z_j I(z_j ≥ L) } ≤ exp{ e^{a − L/2}(n − i) + b },

where I(·) is the indicator function.
Proof. Denote

    f_j = exp( (1/2) z_j ) I(z_j ≥ L).

Then, by first applying the simple inequality I(x ≥ L) ≤ e^{x/2}/e^{L/2} and then using (27), we have for any subsequence j_1 < j_2 < ⋯ < j_k

    E[f_{j_1} ⋯ f_{j_k}] = E exp( (1/2) Σ_{i=1}^{k} z_{j_i} ) I( ∩_{i=1}^{k} {z_{j_i} ≥ L} )
        ≤ e^{−kL/2} E exp( Σ_{i=1}^{k} z_{j_i} )
        ≤ exp{ (a − L/2)k + b }.

By this we have

    E exp{ Σ_{j=i+1}^{n} (1/2) z_j I(z_j ≥ L) } = E ∏_{j=i+1}^{n} exp{ (1/2) z_j I(z_j ≥ L) }
        ≤ E ∏_{j=i+1}^{n} { 1 + exp( (1/2) z_j ) I(z_j ≥ L) }
        = E{ Σ_{k=0}^{n−i} Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} f_{j_1} ⋯ f_{j_k} }
        ≤ e^b { Σ_{k=0}^{n−i} Σ_{i+1 ≤ j_1 < ⋯ < j_k ≤ n} exp{ (a − L/2)k } }
        = e^b ∏_{j=i+1}^{n} { 1 + exp(a − L/2) }
        ≤ exp{ (n − i) exp(a − L/2) + b }.

This completes the proof of Lemma 5.
Lemma 6. Let F_k = φ_k φ_k^T, where {φ_k} is defined by (10) with (11) satisfied. Then {F_k} satisfies condition (ii) of Theorem 1.

Proof. Set, for any fixed k and l,

    z_j = z_j(k, l) = ‖ Σ_{t=jT}^{(j+1)T−1} [ε_{t−k} ε_{t−l}^T − E ε_{t−k} ε_{t−l}^T] ‖.

Then, similar to (19), we have

    Σ_{j=i+1}^{n} ‖S_j(T)‖ ≤ Σ_{k,l=−∞}^{∞} ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j
        + 2 Σ_{k=−∞}^{∞} ‖A_k‖ Σ_{j=i+1}^{n} ‖ Σ_{t=jT}^{(j+1)T−1} ε_{t−k} ξ_t^T ‖.   (28)

We first consider the second last term in (28). By the Hölder inequality,
    E exp{ λ Σ_{k,l=−∞}^{∞} ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j } = E ∏_{k,l=−∞}^{∞} exp{ λ ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j }
        ≤ ∏_{k,l=−∞}^{∞} ( E exp{ λ A^2 Σ_{j=i+1}^{n} z_j } )^{‖A_k‖‖A_l‖/A^2},   (29)

where A is defined by (26).
Now, let c = sup_k E‖ε_k‖^2, and note that

    ‖ε_{t−k} ε_{t−l}^T‖ ≤ (1/2)( ‖ε_{t−k}‖^2 + ‖ε_{t−l}‖^2 );

we have

    z_j ≤ (1/2) Σ_{t=jT}^{(j+1)T−1} ( ‖ε_{t−k}‖^2 + ‖ε_{t−l}‖^2 ) + cT.

By this and (11), it is easy to prove that the sequence {η z_j} satisfies condition (27) with a = (K + ηc)T and b = log M, where η is defined as in (11). Consequently, by Lemma 5 we have for any L > 0

    E exp{ (η/2) Σ_{j=i+1}^{n} z_j I(z_j ≥ LT) } ≤ M exp{ e^{(K + ηc − ηL/2)T}(n − i) }.   (30)

Now, in view of (30), taking λ < η A^{−2}/4 and L > 2η^{−1}(K + ηc), and applying the Hölder inequality, we have

    E exp{ 2λ A^2 Σ_{j=i+1}^{n} z_j I(z_j ≥ LT) } ≤ M exp{ ε(T)(n − i) },   (31)

where ε(T) → 0 as T → ∞ is defined by

    ε(T) = 4η^{−1} λ A^2 exp{ (K + ηc − ηL/2)T }.

Next, we consider the term x_j ≜ z_j I(z_j ≤ LT). By the inequality e^x ≤ 1 + 2x, 0 ≤ x ≤ log 2, we have for small λ > 0

    exp{ 2λ A^2 Σ_{j=i+1}^{n} x_j } ≤ ∏_{j=i+1}^{n} (1 + 4λ A^2 x_j).   (32)

As noted before, for any fixed k and l, the process {ε_{t−k} ε_{t−l}^T} is φ-mixing with mixing rate φ(m − |k − l|). Hence, similar to the proof of Corollary 3.1 in [1, p.1383], we have

    E ∏_{j=i+1}^{n} (1 + 4λ A^2 x_j) ≤ 2{ 1 + 8λ A^2 [f_{kl}(T) + 2LT φ(T + 1 − |k − l|)] }^{n−i}
        ≤ 2 exp{ 8λ A^2 [f_{kl}(T) + 2LT φ(T + 1 − |k − l|)](n − i) },   (33)

where f_{kl}(T) is defined by (20).
Finally, combining (31)-(33) and using the Schwarz inequality, we get

    E exp{ λ A^2 Σ_{j=i+1}^{n} z_j } ≤ { E exp{ 2λ A^2 Σ_{j=i+1}^{n} z_j I(z_j ≥ LT) } }^{1/2} { E exp{ 2λ A^2 Σ_{j=i+1}^{n} x_j } }^{1/2}
        ≤ √(2M) exp{ [ε(T) + 8λ A^2 f_{kl}(T) + 16λ LT A^2 φ(T + 1 − |k − l|)](n − i) }.

Substituting this into (29) and noting (25), it is not difficult to see that there exists a function g(T) = o(T) such that for all small λ > 0,

    E exp{ λ Σ_{k,l=−∞}^{∞} ‖A_k‖ ‖A_l‖ Σ_{j=i+1}^{n} z_j } ≤ √(2M) exp{ λ g(T)(n − i) }.

Obviously, a similar bound can also be derived for the last term in (28) by a similar treatment. Hence, it is easy to see that the lemma is true.
Proof of Theorem 2.

Necessity: Let {φ_k φ_k^T} ∈ S_p for p = 2. Then by Lemma 2 and Theorem 3.1 in [8], we know that {φ_k φ_k^T} ∈ S. Consequently, by Lemma 3, we know that (12) holds.

Sufficiency: If condition (12) holds, then by Lemma 3 we have {φ_k φ_k^T} ∈ S