case of no exogenous inputs
Dietmar Bauer
Department of Electrical Engineering
Linköping University, SE-58183 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: Dietmar.Bauer@tuwien.ac.at
June 7, 2000
Report no.: LiTH-ISY-R-2262
Submitted to Journal of Time Series Analysis
Technical reports from the Automatic Control group in Linköping are available by anonymous ftp
case of no exogenous inputs

Dietmar Bauer
Institute f. Econometrics, Operations Research and System Theory
TU Wien, E119
Argentinierstr. 8, A-1040 Wien
e-mail: Dietmar.Bauer@tuwien.ac.at
June 7, 2000
Abstract

In this paper one of the main open questions in the area of subspace methods is partly answered. One particular algorithm, sometimes termed CCA, is shown to be asymptotically equivalent to estimates obtained by optimizing the pseudo maximum likelihood. Here asymptotically equivalent means that the difference of the two estimators times the square root of the sample size tends to zero.

Keywords: linear systems, discrete time systems, subspace methods, asymptotic efficiency
1 Introduction

The subspace method sometimes termed CCA has been proposed in (Larimore, 1983). It has attained a great deal of attention due to its good numerical properties and its good performance on real and simulated data. The asymptotic properties of this method have been analyzed in a series of papers: (Peternell et al., 1996) establish consistency of the estimates of the transfer function. (Bauer et al., 1999) provide the proof of the asymptotic normality of the system matrix estimates. This paper also demonstrates a method to approximate the asymptotic variance numerically. These expressions have been further investigated in (Bauer et al., 1997b; Bauer, 1998; Bauer et al., 2000), where the first paper shows some plots for some low dimensional systems, indicating that a specific choice of user parameters leads to estimates which are indistinguishable from the Cramer-Rao bound. The latter paper reduces the number of user parameters which affect the asymptotic accuracy of the transfer function estimates. Examples given in the above references indicate that the performance of CCA is 'at least close to maximum likelihood', however no formal verification of this statement has been given in general. The aim of this paper is to fill this gap, i.e. to prove the proposition that CCA is in fact a realization of a (generalized pseudo) maximum likelihood procedure and thus asymptotically efficient. Here generalized refers to the definition that a maximum likelihood estimate is any estimate which 'essentially' is equal to the global optimum of the likelihood; the meaning of essentially will be defined below. Pseudo refers to the fact that the Gaussian likelihood is optimized, however the true noise distribution will not be assumed to be Gaussian. The author wants to note that this paper only deals with the case of no observed exogenous inputs.

The organization of this paper is as follows: In the next section the model set and the assumptions are stated. In this section also a short description of the considered algorithm is given. Section 3 states the main result, i.e. the asymptotic equivalence of pseudo maximum likelihood estimates and CCA estimates. Section 4 then gives the proof. Finally section 5 concludes.
Throughout, $X'$ will be used to denote the transpose of $X$, $\mathrm{tr}[X]$ denotes the trace operator and $X_{i,j}$ the $(i,j)$-th element of the matrix $X$.
2 Model set and assumptions

In this paper finite dimensional, discrete time, linear state space systems without observed inputs of the form

$$x_{t+1} = A x_t + K \varepsilon_t, \qquad y_t = C x_t + \varepsilon_t \eqno{(1)}$$

are considered. Here $(y_t; t \in \mathbb{Z})$ denotes the $s$-dimensional observed output process and $(\varepsilon_t; t \in \mathbb{Z})$ denotes $s$-dimensional ergodic white noise of mean zero and positive definite variance $\Sigma$. $(x_t; t \in \mathbb{Z})$ denotes the $n$-dimensional state sequence. It will be assumed throughout that the system is stable, i.e. that $|\lambda_i(A)| < 1$ holds for all eigenvalues $\lambda_i$ of $A$, and that the system is strictly minimum-phase, i.e. $|\lambda_i(\bar A)| < 1$ holds for all eigenvalues of $\bar A = (A - KC)$. These assumptions in particular imply that the system is in innovation form and that the innovation variance is nonsingular. In order to guarantee asymptotic normality of the parameter estimates, we also impose some additional conditions:

$$E\{\varepsilon_t \mid \mathcal F_{t-1}\} = 0, \quad E\{\varepsilon_t \varepsilon_t' \mid \mathcal F_{t-1}\} = \Sigma = E \varepsilon_t \varepsilon_t', \quad E\{\varepsilon_{t,a} \varepsilon_{t,b} \varepsilon_{t,c} \mid \mathcal F_{t-1}\} = \omega_{a,b,c}, \quad E\{\varepsilon_{t,a}^4\} < \infty.$$
Here $E$ denotes expectation, $\mathcal F_t$ denotes the $\sigma$-algebra spanned by $(y_s; s \le t)$ and $\varepsilon_{t,a}$ denotes the $a$-th component of the vector $\varepsilon_t$. Note that these assumptions coincide with the assumptions used in the proof of the asymptotic normality of the ML estimates in (Hannan and Deistler, 1988, Theorem 4.3.2). It has been argued there that weaker conditions are sufficient for the asymptotic normality. However such conditions will not be considered here, since the arguments already are quite complicated.
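As a concrete illustration of a system of the form (1) satisfying these assumptions, the following sketch simulates a stable, strictly minimum-phase innovation form model. The matrices $A$, $K$, $C$ and the variance $\Sigma$ are illustrative choices made here, not taken from the paper:

```python
import numpy as np

def simulate_innovation_form(A, K, C, Sigma, T, rng):
    """Simulate x_{t+1} = A x_t + K eps_t, y_t = C x_t + eps_t (equation (1))."""
    n, s = A.shape[0], C.shape[0]
    L = np.linalg.cholesky(Sigma)          # eps_t = L w_t, w_t standard normal
    x = np.zeros(n)
    y = np.empty((T, s))
    for t in range(T):
        eps = L @ rng.standard_normal(s)
        y[t] = C @ x + eps
        x = A @ x + K @ eps
    return y

# illustrative system matrices (n = 2, s = 1)
A = np.array([[0.7, 0.2], [0.0, 0.5]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Sigma = np.array([[1.0]])
# the stability and strict minimum-phase assumptions, checked explicitly
assert max(abs(np.linalg.eigvals(A))) < 1
assert max(abs(np.linalg.eigvals(A - K @ C))) < 1
y = simulate_innovation_form(A, K, C, Sigma, 2000, np.random.default_rng(0))
```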
The CCA subspace algorithm has been introduced by (Larimore, 1983). The basic fact that is exploited is the property of the state to be an interface between the past and the future of the time series in a certain sense: Let $Y^+_{t,f} = [y_t', y_{t+1}', \dots, y_{t+f-1}']'$ and let $Y^-_{t,p} = [y_{t-1}', \dots, y_{t-p}']'$. Here $f$ and $p$ are two integer parameters to be chosen by the user. In this paper it will always be assumed that $f = p$ is chosen as $\hat f = \hat p = \lceil d \hat p_{BIC} \rceil, d > 1$, where $\hat p_{BIC}$ is equal to the order estimate obtained by using BIC in a long autoregression $y_t = \hat a_1 y_{t-1} + \dots + \hat a_p y_{t-p} + \hat\varepsilon_t$. Here $\lceil x \rceil$ denotes the smallest integer larger than $x$. It is known (see e.g. Hannan and Deistler, 1988, Theorem 6.6.3) that in the present setting $\hat p_{BIC}$ tends to infinity and fulfills $\hat p_{BIC} \frac{-2 \log \rho_0}{\log T} \to 1$ a.s., and thus $f = p$ tend to infinity at a rate proportional to $\log T$. Here $\rho_0 = \max\{|\lambda_i(\bar A)|, i = 1, \dots, n\}$. Define $E^+_{t,f}$ analogously to $Y^+_{t,f}$ for the noise $\varepsilon_t$ in place of $y_t$. It follows from the system equations (1) that

$$Y^+_{t,f} = \mathcal O_f \mathcal K_p Y^-_{t,p} + \mathcal O_f \bar A^p x_{t-p} + \mathcal E_f E^+_{t,f}. \eqno{(2)}$$

Here $\mathcal O_f = [C', (CA)', \dots, (CA^{f-1})']'$ and $\mathcal K_p = [K, \bar A K, \dots, \bar A^{p-1} K]$. Furthermore $\mathcal E_f$ denotes the lower triangular block Toeplitz matrix whose $i$-th block row equals $[CA^{i-2}K, \dots, CK, I_s, 0_{s \times (f-i)s}]$, where $I_s$ is the $s$-dimensional identity and $0_{a \times b}$ is an $a \times b$ null matrix. Note that in this equation the future of the output is decomposed into three terms, where the first corresponds to the finite past of the outputs, the second to the far past (i.e. the term involving $x_{t-p}$), and the third to the future of the noise. Since $p$ tends to infinity it seems reasonable to neglect the second term in the equation. The first and the last term are uncorrelated by assumption. This is the (somewhat heuristic) motivation for the following three step identification scheme:
1. Obtain an estimate $\hat\beta$ of $\beta = \mathcal O_f \mathcal K_p$ by regressing $Y^+_{t,f}$ on $Y^-_{t,p}$.

2. Typically $\hat\beta$ will be of full rank, whereas $\beta = \mathcal O_f \mathcal K_p$ is of rank $n$. Thus find a rank $n$ approximation of $\hat\beta$ using a weighted singular value decomposition $\hat W^+_f \hat\beta \hat W^-_p = \hat U_n \hat\Sigma_n \hat V_n' + \hat R$, where $\hat U_n$ denotes the matrix whose columns are the left singular vectors corresponding to the dominating $n$ singular values, the latter being the diagonal entries of the diagonal matrix $\hat\Sigma_n$. $\hat V_n$ corresponds to the respective right singular vectors and $\hat R$ accounts for the remaining singular values. An estimate of $\mathcal K_p$ is defined as $\hat{\mathcal K}_p = \hat T \hat V_n' (\hat W^-_p)^{-1}$. The weightings $\hat W^-_p$, $\hat W^+_f$ and the matrix $\hat T$ are explained in more detail below. For the moment it suffices to state that $\hat T$ is related to a basis change in the state space and thus is required to be nonsingular.

3. Compute an estimate of the state as $\hat x_t = \hat{\mathcal K}_p Y^-_{t,p}$ and then use the system equations to obtain estimates of the various system matrices: Estimate $C$ by regression of $y_t$ onto $\hat x_t$. Let $\hat C$ denote this estimate; then $\hat\varepsilon_t = y_t - \hat C \hat x_t$ is an estimate of the noise. Finally $[A, K]$ is estimated by regressing $\hat x_{t+1}$ on $[\hat x_t', \hat\varepsilon_t']'$, and the variance $\Sigma$ is estimated by the sample covariance of $\hat\varepsilon_t$.
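The three steps above can be sketched in a few lines of numerical code. This is a minimal illustration, not the author's implementation: the way the stacked vectors $Y^+_{t,f}$ and $Y^-_{t,p}$ are built, the Cholesky-based square roots for the weightings, and the choice $\hat T = \hat\Sigma_n^{1/2}$ are all assumptions made here for concreteness.

```python
import numpy as np

def cca_sketch(y, f, p, n):
    """Minimal sketch of the three-step CCA scheme for an output sample y of shape (T, s)."""
    T, s = y.shape
    rows = list(range(p, T - f + 1))
    # stacked future Y+_{t,f} and past Y-_{t,p}, one column per time point t
    Yf = np.column_stack([y[t:t + f].ravel() for t in rows])
    Yp = np.column_stack([y[t - p:t][::-1].ravel() for t in rows])
    N = len(rows)
    Gf, Gp = Yf @ Yf.T / N, Yp @ Yp.T / N
    # step 1: beta_hat from regression of Y+ on Y-
    beta = (Yf @ Yp.T / N) @ np.linalg.inv(Gp)
    # step 2: weighted SVD; Wf Wf' = Gf^{-1} and Wp Wp' = Gp (square roots, Y Y' = X)
    Wf = np.linalg.inv(np.linalg.cholesky(Gf)).T
    Wp = np.linalg.cholesky(Gp)
    U, sv, Vt = np.linalg.svd(Wf @ beta @ Wp)
    Kp = np.diag(np.sqrt(sv[:n])) @ Vt[:n] @ np.linalg.inv(Wp)  # T_hat = Sigma_n^{1/2}
    # step 3: state estimate, then regressions for C, (A, K) and Sigma
    x = Kp @ Yp
    yv = y[rows].T
    Ch = (yv @ x.T) @ np.linalg.inv(x @ x.T)
    eps = yv - Ch @ x
    xe = np.vstack([x[:, :-1], eps[:, :-1]])
    AK = (x[:, 1:] @ xe.T) @ np.linalg.inv(xe @ xe.T)
    return AK[:, :n], AK[:, n:], Ch, eps @ eps.T / N

# usage on data simulated from a simple illustrative first order system
rng = np.random.default_rng(0)
T = 3000
e = rng.standard_normal(T)
y = np.empty((T, 1))
x = 0.0
for t in range(T):
    y[t, 0] = x + e[t]
    x = 0.8 * x + 0.5 * e[t]
Ah, Kh, Ch, Sh = cca_sketch(y, f=6, p=6, n=1)
```

Any nonsingular $\hat T$ would give the same transfer function estimate; the square-root choice above merely fixes a convenient state space basis.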
The choice of the weighting matrices $\hat W^+_f$ and $\hat W^-_p$ will play a crucial role in the following. It has been shown in (Bauer et al., 2000) that the choice of the weighting matrix $\hat W^-_p$ does not have any effect on the asymptotic distribution of the estimated system. Thus we might choose $\hat W^-_p = (\hat\Gamma^-_p)^{1/2}$ w.r.o.g. Here $\hat\Gamma^-_p$ denotes the sample covariance matrix of $Y^-_{t,p}$ and $Y = X^{1/2}$ is defined such that $YY' = X$. The choice of $\hat W^+_f$ however has been seen to have an influence on the asymptotic distribution. The term CCA will be reserved for the choice $\hat W^+_f = (\hat\Gamma^+_f)^{-1/2}$, where $\hat\Gamma^+_f$ denotes the sample covariance of $Y^+_{t,f}$. Methods that use a different choice of $f$ and/or $\hat W^+_f$ are sometimes termed Larimore type procedures. In this paper only the CCA case will be dealt with.
The matrix $\hat T$ obviously refers to a change in the basis of the estimated state $\hat x_t$. It is straightforward to show that the estimated transfer function is not affected by such a change. Therefore it might be assumed w.r.o.g. that $\hat T$ is chosen such that the estimated system $(\hat A, \hat K, \hat C)$ lies in an appropriate overlapping form. Here the meaning of appropriate will become clear below.
Note that with these choices there exists a remarkable symmetry: In the single-input single-output case e.g. the matrix which is decomposed in the SVD will be symmetric. This symmetry builds the core of the stochastic balancing ideas of (Desai et al., 1985). In that paper two different state space realizations attached to a covariance sequence are constructed, one of which is termed the forward representation and the other one is called backward representation. In the present framework the forward representation is given by the equations (1), while the backward representation is of the form

$$z_{t-1} = A^b z_t + K^b \nu_t, \qquad y_t = C^b z_t + \nu_t \eqno{(3)}$$

Here $\nu_t$ denotes the backward innovation sequence. The backward representation stems from a realization of the sequence of transposes of the original covariance sequence. For the forward system we have, denoting $E y_t y_{t+j}' = \gamma(j)$, that $\gamma(j) = M'(A')^{j-1}C', j > 0$; $\gamma(0) = C \Sigma_x C' + \Sigma$; $\gamma(j) = \gamma(-j)', j < 0$. Here $M' = E y_t x_{t+1}' = C \Sigma_x A' + \Sigma K'$ and $\Sigma_x = E x_t x_t' = A \Sigma_x A' + K \Sigma K'$. For the backward representation $\gamma(j) = C^b (A^b)^{j-1} M^b, j > 0$; $\gamma(0) = C^b \Sigma_z (C^b)' + \Sigma^b$; $\gamma(j) = \gamma(-j)', j < 0$, with $(M^b)' = E y_t z_{t-1}' = C^b \Sigma_z (A^b)' + \Sigma^b (K^b)'$ and $\Sigma_z = E z_t z_t' = A^b \Sigma_z (A^b)' + K^b \Sigma^b (K^b)'$. Here $\Sigma^b = E \nu_t \nu_t'$ denotes the backward innovation variance matrix. From a decomposition of the covariance Hankel matrix it follows that there exists a nonsingular matrix $T$ such that $M^b = TC', C^b = M'T^{-1}, A^b = TA'T^{-1}$. Elementary calculations also show that

$$[M, AM, \dots](\Gamma^-_\infty)^{-1} = [K, \bar A K, \dots], \qquad [M^b, A^b M^b, \dots](\Gamma^+_\infty)^{-1} = [TC', TA'C', \dots](\Gamma^+_\infty)^{-1} = T \mathcal O_\infty' (\Gamma^+_\infty)^{-1} = [K^b, (A^b - K^b C^b) K^b, \dots]$$

Note that fixing $T = I_n$ there exists a one-one relation between the forward representation $(A, K, C)$ and $\Sigma$ and the backward representation $(A^b, K^b, C^b)$ and $\Sigma^b$. This symmetry will lie at the core of the paper, as it allows us to transfer each result obtained for the forward representation to a result corresponding to the backward representation and vice versa.
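The forward covariance formulas and the backward construction with $T = I_n$ can be checked numerically. The block below uses an illustrative scalar system (not from the paper): it computes $\Sigma_x$ from the Lyapunov fixed point, forms $M' = C\Sigma_x A' + \Sigma K'$, checks that the backward triple $A^b = A', C^b = M', M^b = C'$ reproduces the same $\gamma(j)$, and compares $\gamma(j)$ with sample covariances of a long simulation:

```python
import numpy as np

# illustrative forward system (n = s = 1)
A = np.array([[0.6]]); K = np.array([[0.4]]); C = np.array([[1.0]]); Sig = np.array([[1.0]])

# Sigma_x = A Sigma_x A' + K Sig K' via fixed point iteration
Sx = np.zeros((1, 1))
for _ in range(500):
    Sx = A @ Sx @ A.T + K @ Sig @ K.T
M = A @ Sx @ C.T + K @ Sig            # so that M' = C Sigma_x A' + Sig K'

def gamma(j):
    """gamma(j) = M'(A')^{j-1}C' for j > 0, gamma(0) = C Sigma_x C' + Sig."""
    if j == 0:
        return C @ Sx @ C.T + Sig
    return M.T @ np.linalg.matrix_power(A.T, j - 1) @ C.T

# backward representation with T = I_n: A^b = A', C^b = M', M^b = C'
Ab, Cb, Mb = A.T, M.T, C.T
for j in range(1, 4):
    assert np.allclose(Cb @ np.linalg.matrix_power(Ab, j - 1) @ Mb, gamma(j))

# sample covariances of a long simulation agree with gamma(j)
rng = np.random.default_rng(1)
T = 200_000
e = rng.standard_normal(T)
y = np.empty(T)
x = 0.0
for t in range(T):
    y[t] = x + e[t]
    x = 0.6 * x + 0.4 * e[t]
for j in range(3):
    assert abs(np.mean(y[:T - j] * y[j:]) - gamma(j)[0, 0]) < 0.05
```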
3 Main result
In (Bauer et al., 1999) it has been shown that in generic cases the estimates are asymptotically normal for a wide range of choices for $f, p$ and $\hat W^+_f, \hat W^-_p$, leading to expressions for the asymptotic variance which do not rely on special properties of the weighting matrices and the integer $f$. The disadvantage however emanating from this framework is the difficulty to compare different approaches according to their asymptotic variance and relative efficiency. In this paper a different approach is pursued, leading to another derivation of the asymptotic normality result in the CCA case and moreover to a proof of the asymptotic equivalence to a generalized pseudo maximum likelihood estimate, which is defined as follows: $\hat\theta_{ML}$ essentially optimizes the (pseudo) likelihood $L(Y^+_{1,T}; \theta)$ according to the model defined by (1) using the Gaussian likelihood, i.e. $L(Y^+_{1,T}; \hat\theta_{ML}) - \min_{\theta \in \Theta} L(Y^+_{1,T}; \theta) \to 0$ a.s. and $\sqrt T \partial_i L(Y^+_{1,T}; \hat\theta_{ML}) \to 0, \forall i$, in probability, where $\partial_i$ denotes the derivative with respect to the $i$-th coordinate of $\theta$. This is stated in the main theorem of this paper:
Theorem 3.1 Let $(y_t; t \in \mathbb{Z})$ be generated by a system of the form (1), where the ergodic white noise $\varepsilon_t$ fulfills the assumptions of section 2. Assume that the true order of the system $n$ is known and that $f = p = \lceil d \hat p_{BIC} \rceil, d > 1$, is used. Then the estimates of the transfer function $k(z)$ obtained by using CCA are generalised pseudo maximum likelihood estimates.

The author wants to stress that the assumptions put forth in the theorem are exactly the same as used in the proof of the asymptotic normality of pseudo maximum likelihood estimates given in (Hannan and Deistler, 1988). Note however that the analogous result for the innovation variance is not provided.
In the theorem it is assumed that the order of the system is known a priori. This is an unrealistic situation in practice. Thus the order has to be estimated. One possible method to do this could be the following: The innovation variance is usually estimated as $\hat\Sigma_n = \langle y_t, y_t \rangle - \hat C_n \langle \hat x^n_t, \hat x^n_t \rangle \hat C_n'$. Here $\langle a_t, b_t \rangle = T^{-1} \sum_{t=p+1}^{T-f} a_t b_t'$ and $\hat x^n_t = \hat{\mathcal K}_p(n) Y^-_{t,p}$ denotes the estimated state using the order $n$. Similarly $\hat C_n$ denotes the estimate of $C$ using order $n$. The matrices $\hat\Sigma_n$ can be calculated with extremely low computational costs (see Bauer, 1998, Chapter 5, for details). Thus it seems tempting to estimate $n$ by minimizing a criterion which is constructed analogously to the information criteria introduced by (Akaike, 1976), inserting the estimates $\hat\Sigma_n$. In (Bauer, 1998) some theoretical problems with this method are discussed and an example is given which shows clearly that this method might in some situations lead to bad estimates, which are significantly different from BIC estimates in the ML framework. Note however that the result given above also has direct implications for the order estimation issue, since it reduces the numerical costs for the optimization of the likelihood significantly: it makes it possible to estimate the system matrices for a range of orders in a computationally feasible way and then to estimate the innovation variance in the standard way as the sample covariance of the estimated residuals obtained from filtering the output with the inverse of the transfer function. More on order estimation in the context of subspace methods can be found in (Bauer, 2000).
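The order selection rule discussed above can be made concrete as $IC(n) = \log\det \hat\Sigma_n + c_T \, d(n)/T$ with $d(n) = 2ns$ parameters. The criterion form and the BIC-type penalty $c_T = \log T$ follow the information-criterion analogy; the $\hat\Sigma_n$ values below are purely illustrative numbers, not estimates obtained from data:

```python
import numpy as np

def ic(Sigma_n, n, s, T, c_T):
    """Information criterion analogue: log det Sigma_n plus penalty for 2*n*s parameters."""
    return np.log(np.linalg.det(Sigma_n)) + c_T * 2 * n * s / T

# illustrative sequence of innovation variance estimates for orders n = 0..4 (s = 1):
# the variance drops sharply up to the 'true' order and only marginally beyond it
T = 1000
sigmas = [2.0, 1.2, 1.0, 0.99, 0.985]
crit = [ic(np.array([[v]]), n, 1, T, np.log(T)) for n, v in enumerate(sigmas)]
n_hat = int(np.argmin(crit))   # order estimate
```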
The author also wants to stress that the result is more general than the corresponding result in (Bauer et al., 1999), as it is not restricted to generic cases.
4 Proof of the main result
In this section the proof of the result of this paper is given. Since the details are rather complicated, first an outline of the strategy is provided: The proof orientates itself very closely at the standard proof of the asymptotic normality of pseudo maximum likelihood estimators. Let $L(Y^+_{1,T}; \theta)$ denote the pseudo likelihood according to the model (1), where $\theta \in \mathbb{R}^{2ns}$ denotes the vector of parameters according to the overlapping forms described in (Hannan and Deistler, 1988, Chapter 2). Note that $\theta$ does not contain any parameters corresponding to $\Sigma$. Any overlapping canonical form parameterizing the set of all rational transfer functions of McMillan degree $n$ could be used. It is assumed without restriction of generality that the parameters refer to a particular coordinate neighbourhood, such that the true parameter, $\theta_\circ$ say, is an interior point of this coordinate system. It will also be assumed that the estimates fulfill the restrictions on the system matrices imposed on this special neighbourhood, which in the case of the echelon overlapping forms amounts to zero or one restrictions of certain elements. Let $\hat\theta_{ML}$ denote a generalised maximum likelihood estimate, i.e. a parameter value essentially optimizing the likelihood. Further let the corresponding system be denoted as $(\hat A_{ML}, \hat K_{ML}, \hat C_{ML})$. Then the proof of the asymptotic normality in the likelihood case relies on a linearisation of the pseudo likelihood:

$$\partial L(Y^+_{1,T}; \hat\theta_{ML}) = \partial L(Y^+_{1,T}; \theta_\circ) + \partial^2 L(Y^+_{1,T}; \bar\theta)(\hat\theta_{ML} - \theta_\circ)$$

In this equation $\bar\theta$ denotes a convex combination (not necessarily the same in each row) of $\hat\theta_{ML}$ and $\theta_\circ$, and $\partial$ denotes the derivative with respect to $\theta$ evaluated at the parameter vector in the argument of the function. Here $\sqrt T \partial L(Y^+_{1,T}; \hat\theta_{ML}) \to 0$ in probability according to the definition of the estimate, and $\sqrt T \partial L(Y^+_{1,T}; \theta_\circ)$ is shown to be asymptotically normal. The Hessian is shown to be convergent to a nonsingular limit, which is ensured by the assumption that the true state dimension is used for the estimation.
The main idea of this proof is to mimic this argument. Let $\hat\theta_{cca}$ denote the vector of parameter estimates obtained by using the CCA method. The corresponding system matrices will be denoted with $(\hat A_{cca}, \hat K_{cca}, \hat C_{cca})$. Suppose we find a scalar function $M_f(Y^+_{1,T}; \theta)$ such that

$$\sqrt T \partial M_f(Y^+_{1,T}; \hat\theta_{ML}) \to 0 \text{ in probability}, \qquad \sqrt T \partial M_f(Y^+_{1,T}; \hat\theta_{cca}) \to 0 \text{ in probability},$$
$$\partial^2 M_f(Y^+_{1,T}; \theta) \to H_\infty(\theta) \text{ uniformly on some compact subset such that } \theta_\circ \text{ is an interior point},$$

where $H_\infty$ is continuous at $\theta_\circ$ and $H_\infty(\theta_\circ)$ is nonsingular. Here the function also depends on the noise covariance $\Sigma$, but this dependence is not reflected in the notation. Then it follows from

$$\partial M_f(Y^+_{1,T}; \hat\theta_{cca}) = \partial M_f(Y^+_{1,T}; \hat\theta_{ML}) + \partial^2 M_f(Y^+_{1,T}; \bar\theta)(\hat\theta_{cca} - \hat\theta_{ML})$$

that $\sqrt T (\hat\theta_{cca} - \hat\theta_{ML}) \to 0$ in probability, and thus the asymptotic distribution of the two estimates will be the same. Therefore the structure of the proof is to propose a function $M_f(Y^+_{1,T}; \theta)$ and then verify the above mentioned properties. This is done in the following subsections, after dealing with some preliminaries.
4.1 Preliminaries

Define $\langle a_t, b_t \rangle = T^{-1} \sum_{t=p+1}^{T-f} a_t b_t'$. For a scalar random sequence $g_T$ the notation $g_T = o(h_T)$ will be used to denote the fact that $g_T / h_T \to 0$ a.s. Further $g_T = O(h_T)$ means $\limsup_{T \to \infty} |g_T / h_T| \le M < \infty$ a.s. for some constant $M < \infty$. For vector or matrix valued random variables $g_T = O(h_T)$ means $\limsup_{T \to \infty} \max_{i,j} |g_{T;i,j} / h_T| \le M < \infty$, i.e. the bound is assumed to hold uniformly. Here $g_{T;i,j}$ denotes the $(i,j)$-th element of $g_T$. This notation will also be used in the case that the dimension of $g_T$ depends on the sample size $T$. Note that this notation is nonstandard, as usually the norm of the matrix is compared to $h_T$. This is a difference for sequences of matrices whose dimensions grow with the sample size $T$. Furthermore, in the case where $g_T$ depends on some other quantities, e.g. $g_T = g_{T,p}$, it will be said that $g_{T,p} = O(h_T)$ uniformly in $p$ if the constant $M$ involved in the definition of the order $O(h_T)$ can be chosen independent of $p$. Furthermore $\doteq$ will mean essential equivalence in the sense that the equality holds up to terms of order $o(1/\sqrt T)$, again componentwise but uniformly. In this case the difference will be called neglectable. Finally $Q_T = \sqrt{\log\log T / T}$ will be used.

The following obvious lemma will be used widely in the remaining sections:
Lemma 4.1 Let $\hat A \in \mathbb{R}^{a \times \hat b}$ and $\hat B \in \mathbb{R}^{\hat b \times c}$ such that $\hat A = O(Q_T)$ and $\hat B - B = O(Q_T)$ for some matrix $B \in \mathbb{R}^{\hat b \times c}$, for possibly data dependent $\hat b$, possibly tending to infinity. Then $\hat A \hat B - \hat A B = O(Q_T^2 \hat b)$ and thus $\hat A \hat B$ and $\hat A B$ are essentially equivalent for $\hat b Q_T^2 \sqrt T \to 0$ a.s.
For the proof an error bound on $\hat{\mathcal K}_p$ will be needed. This is provided in the next lemma:
Lemma 4.2 Let the conditions of Theorem 3.1 hold. Then $\hat{\mathcal K}_p - \mathcal K^\circ_p = O(Q_T)$ uniformly in $p$, where $\mathcal K^\circ_p$ corresponds to the realisation of the true system in the overlapping canonical form, i.e. $\mathcal K^\circ_p = [K^\circ, (A^\circ - K^\circ C^\circ) K^\circ, \dots, (A^\circ - K^\circ C^\circ)^{p-1} K^\circ]$, using the realisation $(A^\circ, K^\circ, C^\circ)$ of the true system. Also

$$\|\hat A_{ML} - A^\circ\| = O(Q_T), \quad \|\hat K_{ML} - K^\circ\| = O(Q_T), \quad \|\hat C_{ML} - C^\circ\| = O(Q_T),$$
$$\|\hat A_{cca} - A^\circ\| = O(Q_T), \quad \|\hat K_{cca} - K^\circ\| = O(Q_T), \quad \|\hat C_{cca} - C^\circ\| = O(Q_T).$$

It follows that $\mathcal K_p - \mathcal K^\circ_p$ and $\mathcal O_f - \mathcal O^\circ_f$ are of the order $O(Q_T)$, uniformly in $p$ and $f$ respectively. Here $\mathcal K_p$ and $\mathcal O_f$ can be defined using $(\hat A_{ML}, \hat K_{ML}, \hat C_{ML})$ or using $(\hat A_{cca}, \hat K_{cca}, \hat C_{cca})$.
Proof: The norm bound on the error for the ML estimate follows from Theorem 4.3.2 in (Hannan and Deistler, 1988). It follows from Lemma 2 in (Bauer et al., 2000) that

$$\tilde{\mathcal K}_p - \tilde{\mathcal K}^\circ_p = \left[ \mathcal O_f' (\Gamma^+_f)^{-1} \mathcal O_f \right]^{-1} \mathcal O_f' (\Gamma^+_f)^{-1} (\hat\beta - \beta) \left( I - \begin{bmatrix} I_n \\ 0 \end{bmatrix} \tilde{\mathcal K}^\circ_p \right) + o(T^{-1/2})$$

Here $\tilde{\mathcal K}_p$ denotes the matrix $[\hat{\mathcal K}_p]_n^{-1} \hat{\mathcal K}_p$, where $[\hat{\mathcal K}_p]_n$ denotes the first $n$ columns of $\hat{\mathcal K}_p$. Further $\tilde{\mathcal K}^\circ_p = [\mathcal K^\circ_p]_n^{-1} \mathcal K^\circ_p$. These matrices are well defined if $[\mathcal K^\circ_p]_n$ is nonsingular. In the nongeneric case that this is violated, there exists a different normalisation using different columns of $\mathcal K^\circ_p$ such that an analogous result holds. In the proof however we will only deal with the generic case. The differences for the nongeneric cases are minor and thus omitted. In the equation we have used $\hat\beta = \langle Y^+_{t,f}, Y^-_{t,p} \rangle \langle Y^-_{t,p}, Y^-_{t,p} \rangle^{-1}$, and $\beta = \mathcal O^\circ_\infty \mathcal K^\circ_\infty$ denotes the corresponding limit for $T \to \infty$ and thus also $f = p \to \infty$. The main fact used here is the uniform convergence of sample covariances as stated in (Hannan and Deistler, 1988, Theorem 5.3.2): If $\hat\gamma_j = \langle y_t, y_{t-j} \rangle$ and $\gamma_j = E y_t y_{t-j}'$, then $\max_{|j| < (\log T)^a} \|\hat\gamma_j - \gamma_j\| = O(Q_T)$ for $a < \infty$. Let $H_{f,p} = E Y^+_{t,f} (Y^-_{t,p})'$; then $H_{f,p} - \langle Y^+_{t,f}, Y^-_{t,p} \rangle$ is of order $O(Q_T)$ uniformly in $f$ and $p$, and the same is true for $\langle Y^-_{t,p}, Y^-_{t,p} \rangle = \hat\Gamma^-_p$ and the corresponding expectation $\Gamma^-_p$. This shows that

$$\hat\beta - \beta \doteq \left( \langle Y^+_{t,f}, Y^-_{t,p} \rangle - H_{f,p} \right) (\Gamma^-_p)^{-1} + H_{f,p} (\Gamma^-_p)^{-1} (\Gamma^-_p - \hat\Gamma^-_p) (\Gamma^-_p)^{-1}$$

is of order $O(Q_T)$, using the results of Lemma 4.1 and Theorem 6.6.11 of (Hannan and Deistler, 1988), which states $\|(\Gamma^-_p)^{-1}\| \le c$ uniformly in $p$. Therefore $\tilde{\mathcal K}_p - \tilde{\mathcal K}^\circ_p$ is of the same order.
Consider the estimation of the state matrices: $\tilde C_{cca} = \langle y_t, Y^-_{t,p} \rangle \tilde{\mathcal K}_p' (\tilde{\mathcal K}_p \hat\Gamma^-_p \tilde{\mathcal K}_p')^{-1}$. The same arguments as given above show that $\tilde C_{cca} - \tilde C^\circ = O(Q_T)$. Similar arguments also show the same result for $\tilde A_{cca} - \tilde A^\circ$ and $\tilde K_{cca} - \tilde K^\circ$. The overlapping canonical forms can be obtained from a decomposition of the Hankel matrix of the impulse response coefficients $CA^{j-1}K$ into observability and controllability matrices, such that the matrix built of certain rows of the observability matrix described by the structural indices is equal to the identity matrix. Thus the transformation matrix that transforms the given realisation $(\tilde A_{cca}, \tilde K_{cca}, \tilde C_{cca})$ to the overlapping canonical form is the inverse of the corresponding submatrix of the observability matrix $\tilde{\mathcal O}_f$ built of $\tilde A_{cca}$ and $\tilde C_{cca}$. Thus consider $\tilde C_{cca} \tilde A^j_{cca} - \tilde C^\circ (\tilde A^\circ)^j = (\tilde C_{cca} - \tilde C^\circ) \tilde A^j_{cca} + \tilde C^\circ (\tilde A^j_{cca} - (\tilde A^\circ)^j)$. The first term here is obviously of order $O(Q_T)$, and the second term is equal to $\tilde C^\circ ((\tilde A_{cca} - \tilde A^\circ) \tilde A^{j-1}_{cca} + \tilde A^\circ (\tilde A^{j-1}_{cca} - (\tilde A^\circ)^{j-1}))$. Thus it can be shown by induction that the second term is also of the same order, since both $\tilde A_{cca}$ and $\tilde A^\circ$ are stable, i.e. have all their eigenvalues smaller than one, and $j \lambda^j \to 0$, where $1 > \lambda > \max\{|\lambda_{max}(\tilde A_{cca})|, |\lambda_{max}(\tilde A^\circ)|\}$. This shows that the error in the transformation, $\tilde T - \tilde T^\circ$ say, is of order $O(Q_T)$. Thus $\hat{\mathcal K}_p - \mathcal K^\circ_p = \tilde T \tilde{\mathcal K}_p - \tilde T^\circ \tilde{\mathcal K}^\circ_p = (\tilde T - \tilde T^\circ) \tilde{\mathcal K}_p + \tilde T^\circ (\tilde{\mathcal K}_p - \tilde{\mathcal K}^\circ_p)$ is of order $O(Q_T)$. In all the above evaluations the constant involved in the definition of the order can be chosen to be independent of $p$. This follows from an investigation of the steps above.
Note that on the route to the proof of the first claim we also proved the second claim, since the transformed matrices are obtained as $(\tilde T \tilde A_{cca} \tilde T^{-1}, \tilde T \tilde K_{cca}, \tilde C_{cca} \tilde T^{-1})$. The remaining claims follow from the arguments given above, since the blocks in $\mathcal O_f - \mathcal O^\circ_f$ and $\mathcal K_p - \mathcal K^\circ_p$ consist of terms of the form $\hat C_{cca} \hat A^j_{cca} - C^\circ (A^\circ)^j$ or $(\hat A_{cca} - \hat K_{cca} \hat C_{cca})^j \hat K_{cca} - (A^\circ - K^\circ C^\circ)^j K^\circ$ respectively. Terms of this form have been dealt with before. $\Box$
The key to the proof lies in imposing more of the structure on the estimates of the state obtained from the subspace procedure. That this can be done without changing the asymptotic distribution of the error in some cases is shown in the next lemma:

Lemma 4.3 Let $\hat x_t = \hat{\mathcal K}_p Y^-_{t,p}$ and let $\tilde x_t = \hat{\mathcal K}_\infty Y^-_{t,\infty}$ with $\hat{\mathcal K}_\infty = [\hat K_{cca}, \hat{\bar A}_{cca} \hat K_{cca}, \dots]$, where $y_t = 0, t < 1$ is used and $\hat{\bar A}_{cca} = \hat A_{cca} - \hat K_{cca} \hat C_{cca}$. Then

$$\langle \hat x_t - \tilde x_t, \hat\varepsilon_{t-1} \rangle \doteq 0, \quad \langle \hat x_t - \tilde x_t, \hat x_{t-1} \rangle \doteq 0, \quad \langle \hat x_t - \tilde x_t, y_{t+j} \rangle \doteq 0, \quad \langle \hat x_t - \tilde x_t, \varepsilon_{t+j} \rangle \doteq 0, \quad j \ge 0$$
Proof: Start with the first claim: Let $\delta_{t+1} = \hat x_{t+1} - \tilde x_{t+1} = (\hat{\mathcal K}_p - \mathcal K_p) Y^-_{t+1,p} - \hat{\bar A}^p_{cca} \tilde x_{t-p+1}$. Also $\delta_{t+1} = \hat{\bar A}_{cca} \delta_t + \phi_t$, where $\phi_t = \hat x_{t+1} - \hat{\bar A}_{cca} \hat x_t - \hat K_{cca} y_t$. Thus $\langle \delta_{t+1}, \hat\varepsilon_t \rangle = \hat{\bar A}_{cca} \langle \delta_t, \hat\varepsilon_t \rangle$. Note that $\langle \delta_t, \hat\varepsilon_t - \varepsilon_t \rangle \doteq 0$, which follows from Lemma 4.1 using the error bound on the estimated system matrices, since

$$\langle \delta_t, \hat\varepsilon_t - \varepsilon_t \rangle = \langle (\hat{\mathcal K}_p - \mathcal K_p) Y^-_{t,p}, C^\circ x^\circ_t - \hat C_{cca} \hat x_t \rangle - \langle \hat{\bar A}^p_{cca} \tilde x_{t-p}, C^\circ x^\circ_t - \hat C_{cca} \hat x_t \rangle$$

Each of the terms in the above expression is equal to a product of three matrices, two of which are of order $O(Q_T)$. As an example investigate the first term: $(\hat{\mathcal K}_p - \mathcal K_p) \langle Y^-_{t,p}, C^\circ x^\circ_t - \hat C_{cca} \hat x_t \rangle = (\hat{\mathcal K}_p - \mathcal K_p) \langle Y^-_{t,p}, x^\circ_t \rangle (C^\circ - \hat C_{cca})' + (\hat{\mathcal K}_p - \mathcal K_p) \langle Y^-_{t,p}, x^\circ_t - \hat x_t \rangle \hat C_{cca}'$. The entries of $\hat{\mathcal K}_p - \mathcal K_p$ are of the required order due to Lemma 4.2. Also $\hat x_t - x^\circ_t = (\hat{\mathcal K}_p - \mathcal K^\circ_p) Y^-_{t,p} - (A^\circ - K^\circ C^\circ)^p x^\circ_{t-p}$. The conjecture then follows from $\hat{\bar A}^p_{cca} = O(Q_T)$ and the analogous result for $(A^\circ - K^\circ C^\circ)^p$, due to the choice of $p = \lceil d \hat p_{BIC} \rceil$. Finally $\langle y_{t-j}, \varepsilon_t \rangle = O(Q_T)$, due to the uniform convergence of sample covariances, shows that $\langle \delta_t, \varepsilon_t \rangle \doteq 0$ and thus $\langle \delta_t, \hat\varepsilon_t \rangle \doteq 0$, implying $\langle \delta_{t+1}, \hat\varepsilon_t \rangle \doteq 0$.

For the second claim note that $\langle \delta_t, \hat x_t \rangle \doteq \langle \delta_t, \tilde x_t \rangle = \langle \delta_t, \tilde x_{t-1} \rangle \hat A_{cca}' + \langle \delta_t, \tilde\varepsilon_{t-1} \rangle \hat K_{cca}' \doteq \langle \delta_{t+1}, \tilde x_t \rangle \hat A_{cca}'$, where we used $\langle \delta_{t+1}, \tilde\varepsilon_t - \hat\varepsilon_t \rangle \doteq 0$ and the first claim. Thus

$$\langle \delta_{t+1}, \hat x_t \rangle = \hat{\bar A}_{cca} \langle \delta_t, \hat x_t \rangle \doteq \hat{\bar A}_{cca} \langle \delta_{t+1}, \hat x_t \rangle \hat A_{cca}'$$

which shows the second claim. $\langle \delta_t, \hat x_t - x^\circ_t \rangle \doteq 0$ analogously to $\langle \delta_t, \hat\varepsilon_t - \varepsilon_t \rangle \doteq 0$. Thus $\langle \delta_t, x^\circ_t \rangle \doteq 0$. This shows $\langle \delta_t, y_{t+j} \rangle \doteq 0$, since $y_{t+j} = \varepsilon_{t+j} + \sum_{i=1}^{j-1} C^\circ (A^\circ)^{i-1} K^\circ \varepsilon_{t+j-i} + C^\circ (A^\circ)^j x^\circ_t$ and $\langle \delta_t, \varepsilon_{t+j} \rangle \doteq 0, j \ge 0$, as is straightforward to show using the uncorrelatedness of $\varepsilon_{t+j}$ and $Y^-_{t,p}$ in that case. $\Box$
It has been stated already that the symmetry between forward and backward representation will play a major role in the proof. The definition of the SVD in the CCA case leads to

$$\hat{\mathcal K}_p \langle Y^-_{t,p}, Y^-_{t,p} \rangle = \hat\Sigma_z^{-1} \check{\mathcal O}_f' \langle Y^+_{t,f}, Y^-_{t,p} \rangle, \qquad \check{\mathcal O}_f' \langle Y^+_{t,f}, Y^+_{t,f} \rangle = \hat\Sigma_x^{-1} \hat{\mathcal K}_p \langle Y^-_{t,p}, Y^+_{t,f} \rangle \eqno{(4)}$$

since $\hat\Gamma^+_f = \langle Y^+_{t,f}, Y^+_{t,f} \rangle$, $\hat\Gamma^-_p = \langle Y^-_{t,p}, Y^-_{t,p} \rangle$ and the choice of the weightings $\hat W^+_f$ and $\hat W^-_p$. Here $\hat\Sigma_x = \langle \hat x_t, \hat x_t \rangle$ and $\hat\Sigma_z = \langle \hat z_t, \hat z_t \rangle$, using $\hat z_t = \hat{\mathcal O}_f' (\hat\Gamma^+_f)^{-1} Y^+_{t+1,f}$, where $\hat{\mathcal O}_f = (\hat W^+_f)^{-1} \hat U_n \hat\Sigma_n \hat T^{-1}$ and $\check{\mathcal O}_f = (\hat\Gamma^+_f)^{-1} \hat{\mathcal O}_f$. In fact, one might ask why the system matrices are estimated from the forward state estimate rather than using the backward state estimate $\hat z_t$ and calculating the forward representation from these estimates. Thus compare the two estimated systems, obtained from the forward and the backward state estimates respectively: Using the state estimates $\hat x_t$ and $\hat z_t$ respectively, two different systems could be estimated:

$$\left( [\hat{\bar A}_{cca}, \hat K_{cca}], \hat C_{cca} \right) = \left( \langle \hat x_{t+1}, \begin{bmatrix} \hat x_t \\ y_t \end{bmatrix} \rangle \langle \begin{bmatrix} \hat x_t \\ y_t \end{bmatrix}, \begin{bmatrix} \hat x_t \\ y_t \end{bmatrix} \rangle^{-1}, \; \langle y_t, \hat x_t \rangle \hat\Sigma_x^{-1} \right)$$
$$\left( [\hat{\bar A}^b_{cca}, \hat K^b_{cca}], \hat C^b_{cca} \right) = \left( \langle \hat z_{t-1}, \begin{bmatrix} \hat z_t \\ y_t \end{bmatrix} \rangle \langle \begin{bmatrix} \hat z_t \\ y_t \end{bmatrix}, \begin{bmatrix} \hat z_t \\ y_t \end{bmatrix} \rangle^{-1}, \; \langle y_t, \hat z_t \rangle \hat\Sigma_z^{-1} \right)$$
There are a couple of interesting links between these two estimates. One result which will be needed further on is the following:

$$\hat C_{cca} = \langle y_t, \hat x_t \rangle \hat\Sigma_x^{-1} = [I, 0] \langle Y^+_{t,f}, Y^-_{t,p} \rangle \hat{\mathcal K}_p' \hat\Sigma_x^{-1} = [I, 0] \langle Y^+_{t,f}, Y^+_{t,f} \rangle \check{\mathcal O}_f = \langle y_t, \hat z_{t-1} \rangle \eqno{(5)}$$

This implies $\hat C^b_{cca} = \langle y_t, \hat x_{t+1} \rangle$. Next consider the estimation of $A$:

$$\hat A_{cca} = \langle \hat x_{t+1}, \hat x_t \rangle \hat\Sigma_x^{-1} = \hat{\mathcal K}_p \langle Y^-_{t+1,p}, Y^-_{t+1,p+1} \rangle [0, \hat{\mathcal K}_p]' \hat\Sigma_x^{-1} \doteq \hat\Sigma_z^{-1} \check{\mathcal O}_f' \langle Y^+_{t+1,f}, Y^-_{t+1,p+1} \rangle [0, \hat{\mathcal K}_p]' \hat\Sigma_x^{-1} = \hat\Sigma_z^{-1} \langle \hat z_t, \hat x_t \rangle \hat\Sigma_x^{-1}$$
$$\hat A^b_{cca} = \langle \hat z_{t-1}, \hat z_t \rangle \hat\Sigma_z^{-1} = \check{\mathcal O}_f' \langle Y^+_{t,f}, Y^+_{t,f+1} \rangle \begin{bmatrix} 0 \\ \check{\mathcal O}_f \end{bmatrix} \hat\Sigma_z^{-1} \doteq \hat\Sigma_x^{-1} \hat{\mathcal K}_p \langle Y^-_{t,p}, Y^+_{t,f+1} \rangle \begin{bmatrix} 0 \\ \check{\mathcal O}_f \end{bmatrix} \hat\Sigma_z^{-1} = \hat\Sigma_x^{-1} \langle \hat x_t, \hat z_t \rangle \hat\Sigma_z^{-1}$$

Here the essential equivalence is due to the neglection of the terms $\langle [\hat{\mathcal K}_p]_p y_{t-p}, \hat x_t \rangle$ and $\langle [\check{\mathcal O}_f']_f y_{t+f}, \hat z_t \rangle$ respectively, where $[X]_l$ denotes the $l$-th block column of the matrix $X$. These are neglectable due to Lemma 4.1 and the increase of $f$ and $p$ respectively of order $\log T$. Therefore the difference $\hat A_{cca}' - \hat A^b_{cca}$ is neglectable, and the estimates of the dynamics for the forward and the backward system are essentially identical. The following lemma clarifies the difference between calculated and estimated backward system. The proof will be omitted, as only a weaker bound on the error is needed in the following.
Lemma 4.4 Under the assumptions of Theorem 3.1 let $(\hat A^b_{cca}, \hat K^b_{cca}, \hat C^b_{cca})$ denote the system estimated from the estimate of the backward state $\hat z_t$. Let $(\tilde A^b_{cca}, \tilde K^b_{cca}, \tilde C^b_{cca})$ denote the system calculated from the estimated forward system $(\hat A_{cca}, \hat K_{cca}, \hat C_{cca})$. Then

$$\hat A^b_{cca} \doteq \tilde A^b_{cca}, \quad \hat K^b_{cca} \doteq \tilde K^b_{cca}, \quad \hat C^b_{cca} \doteq \tilde C^b_{cca}$$
In fact, in the proof below not the neglectability of the error is needed, but only an error bound of the order $O(Q_T)$, which can be established easily by bounding the difference to the true backward representation: For the calculated system this follows from the differentiability of the backward system matrices with respect to the entries in the forward representation, together with the respective error bound for the estimated forward system. For the estimated backward system an analysis similar to the arguments of Lemma 4.2 can be applied also for the backward analysis, leading to the required result. Finally it is noted that due to the symmetry of backward and forward representations the results of Lemma 4.3 also imply analogous results for the backward state, e.g. $\langle \tilde z_t - \hat z_t, y_{t-j} \rangle \doteq 0, j > 0$, or $\langle \tilde z_t - \hat z_t, \hat z_{t+j} \rangle \doteq 0, j = 0, 1$. Here $\tilde z_t$ can stand for both the backward state calculated from $(\hat A^b_{cca}, \hat K^b_{cca}, \hat C^b_{cca})$ or from $(\tilde A^b_{cca}, \tilde K^b_{cca}, \tilde C^b_{cca})$ respectively, due to Lemma 4.4. The proof of this statement is completely analogous to the proof of Lemma 4.3.
4.2 The criterion function $M_f(Y^+_{1,T}; \theta)$
The main trouble with maximum likelihood estimates is that they are only given implicitly. Therefore the criterion function has to be adapted to the equation defining the ML estimate in order to derive its properties at the ML estimate. On the other hand there has to be a connection to the structure of the subspace estimates. This motivates the choice

$$M_f(Y^+_{1,T}; \theta) = \frac{1}{f} \mathrm{tr}\left[ (\bar\Sigma_f)^{-1} \langle Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}, Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} \rangle \right] \eqno{(6)}$$

Here $\Sigma_f = I_f \otimes \Sigma$, where $\otimes$ denotes the Kronecker product, i.e. $\Sigma_f$ denotes the block diagonal matrix with $f$ diagonal blocks equal to $\Sigma$, and the notation $\bar\Sigma_f = \mathcal E_f \Sigma_f \mathcal E_f'$ is used. For $f = 1$ and $p = \infty$ this is identical to the prediction error criterion function. It is not hard to show that under the assumptions on $f$ and $p$ given in Theorem 3.1 the function $\bar M_f(Y^+_{1,T}; \theta) = (\log\det \bar\Sigma_f)/f + M_f(Y^+_{1,T}; \theta)$ converges to the asymptotic likelihood. On the other hand it is straightforward to show that if no restrictions on the entries of $\mathcal E_f, \mathcal O_f, \mathcal K_p$ and $\Sigma_f$ are imposed, the matrices $\hat{\mathcal E}_f, \hat{\mathcal O}_f, \hat{\mathcal K}_p$ and $\hat\Sigma_f$ obtained as the CCA choices optimize $M_f(Y^+_{1,T}; \theta)$. This follows from the properties of the singular value decomposition (cf. Stoorvogel and Van Schuppen, 1997, for a discussion of this issue). The difficult part of the proof lies in imposing the full structure on the matrices $\mathcal E_f, \Sigma_f, \mathcal O_f$ and $\mathcal K_p$ for the CCA estimates.
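Criterion (6) can be written down directly. The helper below is a sketch under the assumption that the stacked data matrices and the structured matrices $\mathcal O_f$, $\mathcal K_p$ and $\bar\Sigma_f$ are already available, with columns of Yf and Yp indexing $t$:

```python
import numpy as np

def M_f(f, Yf, Yp, O_f, K_p, Sigbar_f):
    """Criterion (6): (1/f) tr[ Sigbar_f^{-1} <e_t, e_t> ], e_t = Y+_{t,f} - O_f K_p Y-_{t,p}."""
    E = Yf - O_f @ K_p @ Yp
    G = E @ E.T / E.shape[1]              # sample second moment <e_t, e_t>
    return np.trace(np.linalg.solve(Sigbar_f, G)) / f

# degenerate check with f = 1, s = 1, O_f K_p = 0, Sigbar_f = I:
# the criterion reduces to the mean of y_t^2
Yf = np.array([[1.0, 2.0, 3.0]])
Yp = np.zeros((1, 3))
val = M_f(1, Yf, Yp, np.eye(1), np.zeros((1, 1)), np.eye(1))
```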
4.3 Properties of the derivative at $\hat\theta_{ML}$
The proof is based on the fact that

$$0 \doteq \partial_i L(Y^+_{1,T}; \hat\theta_{ML}) \doteq \left\{ \frac{1}{T} \sum_{t=1}^T \varepsilon_t(\hat\theta_{ML})' \hat\Sigma(\hat\theta_{ML})^{-1} \sum_{j=1}^\infty \partial_i \mathcal K(j; \hat\theta_{ML}) y_{t-j} \right\}$$

Here $\partial_i \mathcal K(j; \hat\theta_{ML})$ denotes the derivative of the $j$-th coefficient of the inverse transfer function $k(z; \hat\theta_{ML})^{-1}$ with respect to the $i$-th coordinate of $\theta$. This equality follows from the definition of the estimate. Straightforward calculations show that

$$\partial_i M_f(Y^+_{1,T}; \theta) = -\frac{2}{f} \mathrm{tr}\left[ \bar\Sigma_f^{-1} (\partial_i \mathcal E_f) \mathcal E_f^{-1} \left\{ \langle Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}, Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} \rangle \right\} \right] - \frac{2}{f} \mathrm{tr}\left[ \bar\Sigma_f^{-1} \langle (\partial_i \mathcal O_f) \mathcal K_p Y^-_{t,p} + \mathcal O_f (\partial_i \mathcal K_p) Y^-_{t,p}, Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} \rangle \right] \eqno{(7)}$$
Corresponding to the terms involving $\partial_i \mathcal O_f$ and $\partial_i \mathcal K_p$ respectively, note that $Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} = \mathcal E^\circ_f E^{+,\circ}_{t,f} + \mathcal O^\circ_f x^\circ_t - \mathcal O_f \mathcal K_p Y^-_{t,p}$. Here quantities using the true system $\theta_\circ$ are denoted with the additional superscript $\circ$. Also note that $\langle y_{t-j}, \varepsilon_t \rangle = O(Q_T)$ due to the uniform convergence of the sample covariances (see e.g. Hannan and Deistler, 1988, Theorem 5.3.2). It is easy to see that $\mathcal O_f, \mathcal K_p, \partial_i \mathcal O_f, \partial_i \mathcal K_p$ all have entries decreasing geometrically, since $\hat\theta_{ML}$ will enter any neighbourhood of $\theta_\circ$ a.s. for $T$ large enough according to Lemma 4.2. The geometric decrease then follows from the stability and the strict minimum-phase assumption on the system corresponding to $\theta_\circ$. Therefore the sum of the absolute values of the entries of these matrices is bounded uniformly in $f, p$ and $T$ large enough. It follows from easily established results about the multiplication with block Toeplitz matrices that the premultiplication with $(\mathcal E_f \Sigma_f \mathcal E_f')^{-1}$ does not change these properties (see e.g. Bauer, 1998, Chapter 4). Therefore the term due to $\mathcal E^\circ_f E^{+,\circ}_{t,f}$ contributes as

$$-\frac{2}{f} \mathrm{tr}\left[ \mathcal K_p \langle Y^-_{t,p}, E^{+,\circ}_{t,f} \rangle (\mathcal E^\circ_f)' \bar\Sigma_f^{-1} (\partial_i \mathcal O_f) + (\partial_i \mathcal K_p) \langle Y^-_{t,p}, E^{+,\circ}_{t,f} \rangle (\mathcal E^\circ_f)' \bar\Sigma_f^{-1} \mathcal O_f \right]$$

Here $\bar\Sigma_f = \mathcal E_f \Sigma_f \mathcal E_f'$ is used. It is easy to see that the 1-norm of this matrix is of the order $O(Q_T/f) = O(\sqrt{\log\log T}/(\sqrt T f))$ and thus $o(1/\sqrt T)$ if $\sqrt{\log\log T} = o(f)$. This is true for the choice of $f$ stated in the theorem, which is of order $\log T$ a.s. Hence this term may be neglected.
For the term due to $\mathcal O^\circ_f x^\circ_t - \mathcal O_f \mathcal K_p Y^-_{t,p}$ the essential equivalence $\mathcal O^\circ_f x^\circ_t - \mathcal O_f \mathcal K_p Y^-_{t,p} \doteq (\mathcal O^\circ_f - \mathcal O_f) \mathcal K^\circ_p Y^-_{t,p} + \mathcal O_f (\mathcal K^\circ_p - \mathcal K_p) Y^-_{t,p}$ holds, as follows from the size of $p = p(T)$, i.e. the state reconstruction can be truncated without affecting the asymptotic behaviour. Here $\mathcal O^\circ_f - \mathcal O_f$ and $\mathcal K^\circ_p - \mathcal K_p$ are of order $O(Q_T)$ as shown in Lemma 4.2. Analogously to above, investigate the contribution of $\mathcal O^\circ_f - \mathcal O_f$ to the second summand of the derivative:

$$-\frac{2}{f} \mathrm{tr}\left[ \mathcal K_p \hat\Gamma^-_p (\mathcal K^\circ_p)' (\mathcal O^\circ_f - \mathcal O_f)' \bar\Sigma_f^{-1} (\partial_i \mathcal O_f) + (\partial_i \mathcal K_p) \hat\Gamma^-_p (\mathcal K^\circ_p)' (\mathcal O^\circ_f - \mathcal O_f)' \bar\Sigma_f^{-1} \mathcal O_f \right]$$
$$\doteq -\frac{2}{f} \mathrm{tr}\left[ E\{x^\circ_t (x^\circ_t)'\} (\mathcal O^\circ_f - \mathcal O_f)' (\bar\Sigma^\circ_f)^{-1} (\partial_i \mathcal O^\circ_f) + (\partial_i \mathcal K_p) E\{Y^-_{t,p} (x^\circ_t)'\} (\mathcal O^\circ_f - \mathcal O_f)' (\bar\Sigma^\circ_f)^{-1} \mathcal O_f \right]$$

From the error bound $\mathcal O^\circ_f - \mathcal O_f = O(Q_T)$ the neglectability of this contribution is obtained. The second term, due to $\mathcal K^\circ_p - \mathcal K_p$, can be treated analogously. Thus the essential term in (7) is equal to

$$\partial_i M_f(Y^+_{1,T}; \theta) \doteq -\frac{2}{f} \mathrm{tr}\left[ \Sigma_f^{-1} \mathcal E_f^{-1} (\partial_i \mathcal E_f) \mathcal E_f^{-1} \left\{ \langle Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}, \mathcal E_f^{-1} (Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}) \rangle \right\} \right] \eqno{(8)}$$
Note that up to now no properties of the ML estimate except for the norm bound on the error have been used. Thus also for the estimate $\hat\theta_{cca}$ the essential term is given by equation (8). This follows from Lemma 4.2.
For the investigation of the essential term the special properties of the ML estimate are used. Analogously to the arguments given above it follows that the term in equation (8) involving $\mathcal{O}_f\mathcal{K}_p\langle Y^-_{t,p},\mathcal{E}_f^{-1}(Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p})\rangle$ is neglectable, since $\sqrt{\log\log T}=o(f)$. Thus we are led to examine $\frac{2}{f}\mathrm{tr}\big[\Omega_f^{-1}\,\partial_i(\mathcal{E}_f^{-1})\langle Y^+_{t,f},E^+_{t,f}(\hat\theta_{ML})\rangle\big]$ evaluated at $\hat\theta_{ML}$. Here $\partial_i(\mathcal{E}_f^{-1})$ is equal to the block Toeplitz matrix containing the derivatives of the coefficients of the inverse transfer function $k(z;\hat\theta_{ML})^{-1}=(I+z\hat C_{ML}(I-z\hat A_{ML})^{-1}\hat K_{ML})^{-1}$ as its blocks. Here $z$ denotes the backward shift operator. Furthermore $E^+_{t,f}(\hat\theta_{ML})=\mathcal{E}_f^{-1}(Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_\infty Y^-_{t,\infty})$ has been used, where $E^+_{t,f}(\hat\theta_{ML})$ denotes the vector obtained from stacking the estimated residuals $y_t-\hat C_{ML}\tilde x_t$ into a vector analogously as the vector $Y^+_{t,f}$ is built of $y_t$. Here $\tilde x_t$ denotes the estimate of the state using the system $\hat\theta_{ML}$. Thus the $(l,j)$-th block in the matrix whose trace is calculated in equation (8) is essentially
equal to the inner product of a truncated version of $\partial_i\varepsilon^\infty_{t+j}(\hat\theta_{ML})=\sum_{r=1}^{\infty}\partial_iK(r;\hat\theta_{ML})y_{t+j-r}$ (i.e. $\partial_i\varepsilon^j_{t+j}(\hat\theta_{ML})=\sum_{r=1}^{j-1}\partial_iK(r;\hat\theta_{ML})y_{t+j-r}$) with $\varepsilon_{t+l}(\hat\theta_{ML})$ for $0\le j,l\le f-1$. Since the trace is examined, only the diagonal blocks, i.e. the blocks for $j=l$, have to be considered. Note that $\langle\partial_i\varepsilon^\infty_{t+j}(\hat\theta_{ML}),\varepsilon_{t+j}(\hat\theta_{ML})\rangle\doteq 0$ for all $j\le f$, as follows from a neglection of the initial values, which is straightforward to justify. Therefore the crucial term is essentially equal to
$$\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\hat\Omega^{-1}\big\langle\partial_i\varepsilon^j_{t+j}(\hat\theta_{ML}),\varepsilon_{t+j}(\hat\theta_{ML})\big\rangle\Big]\doteq-\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\hat\Omega^{-1}\Big\langle\sum_{r=j+1}^{\infty}\partial_iK(r;\hat\theta_{ML})y_{t-r},\,\varepsilon_t(\hat\theta_{ML})\Big\rangle\Big]$$
Again the essential equality is due to the neglection of initial effects. Note that $\langle y_{t-r},\varepsilon_t(\hat\theta_{ML})\rangle=O(Q_T)$ uniformly in $r$ for $r=O(a\log T)$ for $a<\infty$, since in this case $\langle y_{t-r},\varepsilon_t\rangle=O(Q_T)$ and also $\langle y_{t-r},\varepsilon_t(\hat\theta_{ML})-\varepsilon_t\rangle=O(Q_T)$ due to the norm bound on the estimation error in the system matrices shown in Lemma 4.2. Terms of larger $r$ are premultiplied by $\partial_iK(r;\hat\theta_{ML})=O(\rho^r)=O(T^{a\log\rho})=o(T^{-1})$ for $a\log\rho<-1$. Therefore the above expression is of order
$$O\Big(\frac{2}{f}\sum_{j=1}^{f}\sum_{r=j+1}^{a\log T}\|\partial_iK(r;\hat\theta_{ML})\|Q_T\Big)=O\Big(\frac{2Q_T}{f}\sum_{j=1}^{f}\sum_{r=j+1}^{a\log T}\rho^r\Big)=O(Q_T/f)$$
for some $0<\rho<1$. This follows from the stability and the strict minimum-phase assumption, which also holds for the derivative of the transfer function. Therefore $\partial_iM_f(Y^+_{1,T};\hat\theta_{ML})\doteq 0$ for all $i$, as is required.
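The coefficients $K(r;\theta)$ of the inverse transfer function used throughout this argument have an explicit state-space form: since $k(z)=I+zC(I-zA)^{-1}K$, one has $k(z)^{-1}=I-zC(I-z(A-KC))^{-1}K$, so the $r$-th coefficient of the inverse is $-C(A-KC)^{r-1}K$, which decays geometrically under the strict minimum-phase assumption. A minimal numerical sketch (the matrices are hypothetical illustration values, not taken from the paper):

```python
import numpy as np

# Coefficients of k(z) = I + z C (I - zA)^{-1} K and of its inverse
# k(z)^{-1} = I - z C (I - z(A - KC))^{-1} K; convolving the two
# coefficient sequences must give 1 at lag 0 and zero afterwards.
A = np.array([[0.5, 0.1], [0.0, 0.3]])   # stable (eigenvalues 0.5, 0.3)
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Abar = A - K @ C                          # stable by strict minimum-phase

R = 30
k = [1.0] + [(C @ np.linalg.matrix_power(A, r - 1) @ K).item() for r in range(1, R)]
kinv = [1.0] + [(-C @ np.linalg.matrix_power(Abar, r - 1) @ K).item() for r in range(1, R)]

# Convolution of k and k^{-1}: should be [1, 0, 0, ...].
conv = [sum(k[j] * kinv[r - j] for j in range(r + 1)) for r in range(R)]
```

The geometric decay of `kinv` is exactly the uniform stability of the family of filters invoked in the proof.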
4.4 Properties of the derivative at $\hat\theta_{cca}$
The evaluations in this section are more involved. The main strategy is to reduce the expression of the derivative of the function $M_f(Y^+_{1,T};\hat\theta_{cca})$ to terms which are known to be neglectable, i.e. $o(T^{-1/2})$. The results of Lemma 4.3 and Lemma 4.4 will serve as the main basis for this, i.e. the derivative will be related to terms of the form $\langle\hat x_t-\tilde x_t,y_{t+j}\rangle$ and $\langle\tilde z_t-\hat z_t,y_{t-j}\rangle$ respectively.
Recall that the first part of the proof is unchanged for $\hat\theta_{cca}$ replacing $\hat\theta_{ML}$. Therefore the interesting term is
$$\partial_iM_f(Y^+_{1,T};\hat\theta_{cca})\doteq\frac{2}{f}\,\mathrm{tr}\Big[\Gamma_f^{-1}(\partial_i\mathcal{E}_f)\mathcal{E}_f^{-1}\big\{\big\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},\,Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\big\rangle\big\}\Big] \qquad (9)$$
In this section terms like $\partial_i\mathcal{E}_f$, $\mathcal{E}_f$, $\mathcal{O}_f$, $\mathcal{K}_p$, $\Gamma_f$ and $\Omega_f$ are always formed using $(\hat A_{cca},\hat K_{cca},\hat C_{cca})$ and $\hat\Omega$ or the corresponding backward system. Consider the matrix $\partial_i\mathcal{E}_f$, where the derivative is with respect to an entry in $A$ or $K$ first. The derivatives with respect to entries in $C$ will be dealt with later on. It follows from the form of $\mathcal{E}_f$ that the $j$-th block column of the derivative $\partial_i\mathcal{E}_f$ is given by
$$\begin{bmatrix}0_{sj\times n}\\ \partial_i(\mathcal{O}_{f-j}K)\end{bmatrix}=\begin{bmatrix}0_{sj\times n}\\ \mathcal{O}_{f-j}(\partial_iK)\end{bmatrix}+\begin{bmatrix}0_{s(j+1)\times n}\\ (\partial_i\mathcal{O}_{f-j-1})\hat A_{cca}\hat K_{cca}+\mathcal{O}_{f-j-1}(\partial_iA)\hat K_{cca}\end{bmatrix}$$
Therefore terms like $\frac{2}{f}\langle E^+_{t,f}(\hat\theta_{cca}),Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\rangle\Gamma_f^{-1}\big[0_{js\times n}',\mathcal{O}_{f-j}'\big]'$ have to be considered. Here $E^+_{t,f}(\hat\theta_{cca})=\mathcal{E}_f^{-1}(Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p})$. Using the structure of $\mathcal{E}_f$ one obtains
$$\Gamma_f^{-1}\begin{bmatrix}0_{js\times n}\\\mathcal{O}_{f-j}\end{bmatrix}=\begin{bmatrix}(\mathcal{E}_j')^{-1}&-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_{f-j}'(\mathcal{E}_{f-j}')^{-1}\\0&(\mathcal{E}_{f-j}')^{-1}\end{bmatrix}\Omega_f^{-1}\begin{bmatrix}0_{js\times n}\\\mathcal{E}_{f-j}^{-1}\mathcal{O}_{f-j}\end{bmatrix}=\begin{bmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_{f-j}'\bar{\mathcal{O}}_{f-j}\\\bar{\mathcal{O}}_{f-j}\end{bmatrix}$$
Here $(\mathcal{E}_{f-j}')^{-1}\Omega_{f-j}^{-1}\mathcal{E}_{f-j}^{-1}\mathcal{O}_{f-j}=\bar{\mathcal{O}}_{f-j}$, where this defines $\bar{\mathcal{O}}_{f-j}$, and $\mathcal{C}_j=\big[\hat A^{j-1}_{cca}\hat K_{cca},\hat A^{j-2}_{cca}\hat K_{cca},\ldots,\hat K_{cca}\big]$.
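The block Toeplitz structure exploited in this computation is easy to verify numerically: $\mathcal{E}_f$ is block lower triangular Toeplitz with blocks $I$ and $CA^{r-1}K$, and its inverse is the analogous Toeplitz matrix built from the coefficients $-C(A-KC)^{r-1}K$ of the inverse transfer function. A small sketch with hypothetical matrices ($s=1$, $f=6$; not the paper's code):

```python
import numpy as np

# E_f: lower triangular Toeplitz matrix with entries k_0 = 1 and
# k_r = C A^{r-1} K; its inverse is the analogous Toeplitz matrix
# built from the inverse transfer function coefficients -C (A-KC)^{r-1} K.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
f = 6

k = [1.0] + [(C @ np.linalg.matrix_power(A, r - 1) @ K).item() for r in range(1, f)]
E_f = np.zeros((f, f))
for i in range(f):
    for j in range(i + 1):
        E_f[i, j] = k[i - j]

Abar = A - K @ C
kinv = [1.0] + [(-C @ np.linalg.matrix_power(Abar, r - 1) @ K).item() for r in range(1, f)]
E_f_inv = np.linalg.inv(E_f)
```

That the inverse of a lower triangular Toeplitz matrix is again lower triangular Toeplitz is what makes the partitioned computations above tractable.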
An analogous partitioning leads to:
$$\big[\Gamma_\infty^{-1}\mathcal{O}_\infty\big]_f=\left[\begin{bmatrix}(\mathcal{E}_f')^{-1}&-(\mathcal{E}_f')^{-1}\mathcal{C}_f'\mathcal{O}_\infty'(\mathcal{E}_\infty')^{-1}\\0&(\mathcal{E}_\infty')^{-1}\end{bmatrix}\Omega_\infty^{-1}\begin{bmatrix}\mathcal{E}_f^{-1}\mathcal{O}_f\\\mathcal{E}_\infty^{-1}\big(\mathcal{O}_\infty\hat A^f_{cca}-\mathcal{O}_\infty\mathcal{C}_f\mathcal{E}_f^{-1}\mathcal{O}_f\big)\end{bmatrix}\right]_f=-(\mathcal{E}_f')^{-1}\mathcal{C}_f'\mathcal{O}_\infty'\Gamma_\infty^{-1}\mathcal{O}_\infty(\hat A_{cca}-\hat K_{cca}\hat C_{cca})^f+\bar{\mathcal{O}}_f$$
Here $[X]_f$ denotes the matrix of the first $f$ block rows of the matrix $X$. Thus the norm of $[\Gamma_\infty^{-1}\mathcal{O}_\infty]_f-\bar{\mathcal{O}}_f$ is smaller than $C(\rho)\rho^f$ for all $f$ for some $C<\infty$, where $|\lambda_{\max}(\hat A_{cca}-\hat K_{cca}\hat C_{cca})|<\rho<1$. Note that $(\mathcal{K}^b_\infty)'=(\Gamma^+_\infty)^{-1}\mathcal{O}_\infty=(\mathcal{E}_\infty\Omega_\infty\mathcal{E}_\infty')^{-1}\mathcal{O}_\infty S$, since $\Gamma^+_\infty=(\mathcal{E}_\infty\Omega_\infty\mathcal{E}_\infty')+\mathcal{O}_\infty\Sigma_x\mathcal{O}_\infty'$. Thus the above result implies that $(\mathcal{K}^b_f)'S^{-1}-\bar{\mathcal{O}}_f=O(\rho^f)$.
In the next step the matrix $\bar{\mathcal{O}}_{f-j}$ will be replaced with $(\mathcal{K}^b_{f-j})'S^{-1}$, where $\mathcal{K}^b_f$ denotes the matrix consisting of the first $f$ block columns of $\mathcal{K}^b$. The error introduced by this replacement is of the form
$$\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,Y^+_{t,f+j}-\mathcal{O}_{f+j}\mathcal{K}_pY^-_{t,p}\big\rangle\left[\begin{pmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_{f-j}'\bar{\mathcal{O}}_{f-j}\\\bar{\mathcal{O}}_{f-j}\end{pmatrix}-\begin{pmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_f'(\mathcal{K}^b_f)'S^{-1}\\(\mathcal{K}^b_{f-j})'S^{-1}-(\mathcal{K}^b_j)'((\bar A^b)^{f-j})'S^{-1}\end{pmatrix}\right]$$
where $l<j$ and $\bar A^b=\tilde A^b_{cca}-\tilde K^b_{cca}\tilde C^b_{cca}$. Note that $\langle\varepsilon_{t+l}(\hat\theta_{cca}),\varepsilon_{t+j}(\hat\theta_{cca})\rangle=O(Q_T)$ for $j\neq l$, as follows from $\langle\varepsilon_{t+l},\varepsilon_{t+j}\rangle=O(Q_T)$
and the fact that the estimation errors in the system matrix estimates are of order $O(Q_T)$. The terms $\langle\varepsilon_{t+l},\varepsilon_{t+l}\rangle$ can be replaced by $\langle\varepsilon_{t+l},\varepsilon_{t+l}\rangle-\hat\Omega$ due to the lower triangular block structure of $\mathcal{E}_f$ and $\partial_i\mathcal{E}_f$, and this term is also of order $O(Q_T)$. Recalling the bound on the norm of $\bar{\mathcal{O}}_{f-j}-(\mathcal{K}^b_{f-j})'S^{-1}$ it follows that the norm of the expression above is of order $O(Q_T\|\mathcal{K}^b_{f-j+1}\|/f)=O(|\rho|^{f-j}Q_T/f)$, where the $j$-th block column of $\mathcal{K}^b$ equals $(\bar A^b)^{j-1}\tilde K^b_{cca}$. It remains to assess
the total contribution of the replacement. The matrices $\bar{\mathcal{O}}_{f-j}$ occur in $j$ places: once for the derivative of $K$ in the $j$-th column of $\mathcal{E}_f$, and postmultiplied with $\hat A^{j-i-1}_{cca}$ in the $i$-th column, $1\le i\le j-1$, for the derivative with respect to an entry in $A$. Due to the stability of $\hat A_{cca}$ the contribution is thus of order $O(|\rho|^{f-j}Q_T/f)$. Summing this contribution over $1\le j\le f-1$ results in a total effect of order $O(Q_T/f)=o(T^{-1/2})$. Therefore the difference can be neglected and the factor $S$ can be dropped, as it is not essential for the analysis. The investigation thus focusses on
$$\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,Y^+_{t,f+j}-\mathcal{O}_{f+j}\mathcal{K}_pY^-_{t,p}\big\rangle\begin{bmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_f'(\mathcal{K}^b_f)'\\(\mathcal{K}^b_f)'\end{bmatrix}$$
$$=\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\mathcal{K}^b_f\big(Y^+_{t+j,f}-\mathcal{O}_f\big(\hat A^j_{cca}\mathcal{K}_pY^-_{t,p}+\mathcal{C}_j\mathcal{E}_j^{-1}\big[Y^+_{t,j}-\mathcal{O}_j\mathcal{K}_pY^-_{t,p}\big]\big)\big)\big\rangle$$
$$\doteq\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\mathcal{K}^b_f\big(Y^+_{t+j,f}-\mathcal{O}_f\tilde x_{t+j}\big)\big\rangle\doteq\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\big(Y^+_{t+j,f}-\mathcal{O}_f\tilde x_{t+j}\big)\big\rangle$$
Here $\tilde x_t=\mathcal{K}_pY^-_{t,p}$
. The second last equality follows from $\hat A^j_{cca}\mathcal{K}_pY^-_{t,p}+\mathcal{C}_j\mathcal{E}_j^{-1}[Y^+_{t,j}-\mathcal{O}_j\mathcal{K}_pY^-_{t,p}]\doteq\tilde x_{t+j}$, as is straightforward to prove. The last equation follows from the fact that both $\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}-\mathcal{O}_f'(\Gamma^+_f)^{-1}$ and $\langle\varepsilon_{t+l}(\hat\theta_{cca}),Y^+_{t+j,f}-\mathcal{O}_f\tilde x_{t+j}\rangle$ are of order $O(Q_T)$ and that the error in the replacement of $\mathcal{O}_f'(\Gamma^+_f)^{-1}$ by $\mathcal{K}^b_f$ is bounded in norm of order $O(\rho^f)=o(T^{-1/2})$, as has been used before.
Thus consider $\langle\varepsilon_{t+l}(\hat\theta_{cca}),\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}Y^+_{t+j,f}\rangle\doteq\langle\varepsilon_{t+l-j}(\hat\theta_{cca}),\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}Y^+_{t,f}\rangle$. According to equation (4) it follows that $\langle y_{t-s},\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}Y^+_{t,f}\rangle=\langle y_{t-s},\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\hat{\mathcal{O}}_f\hat x_t\rangle$ for $s\le p$. Note that $\varepsilon_{t+l-j}=[I,-\hat C_{cca}\mathcal{K}_\infty]Y^-_{t+l-j+1,\infty}$. Therefore $\langle\varepsilon_{t+l-j}(\hat\theta_{cca}),\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}(Y^+_{t,f}-\hat{\mathcal{O}}_f\hat x_t)\rangle$ is of order $O(Q_T\rho^{p-j+l})$. Again accounting for all occurring terms leads to the neglectability of the difference. Thus we have reduced the crucial term to
$$\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\big(\hat{\mathcal{O}}_f\hat x_{t+j}-\mathcal{O}_f\tilde x_{t+j}\big)\big\rangle=\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_{t+j}+\mathcal{O}_f(\hat x_{t+j}-\tilde x_{t+j})\big)\big\rangle$$
where $\hat x_t=\hat{\mathcal{K}}_pY^-_{t,p}$
has been used. Consider $\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}(\hat{\mathcal{O}}_f-\mathcal{O}_f)\doteq\mathcal{K}^b_f(\hat{\mathcal{O}}_f-\mathcal{O}_f)$ first: Note that $\mathcal{K}^b_f\mathcal{O}_f=\hat K^b_{cca}\hat C_{cca}+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\mathcal{K}^b_f\mathcal{O}_f\hat A_{cca}+O(\rho_0^f\rho_p^f)$, where $|\lambda_{\max}(\hat A_{cca})|<\rho_p<1$ and $|\lambda_{\max}(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})|<\rho_0<1$. Here $\lambda_{\max}(\cdot)$ denotes an eigenvalue of maximum modulus. Furthermore
$$\mathcal{K}^b_f\hat{\mathcal{O}}_f=\mathcal{K}^b_f\big\langle Y^+_{t,f},Y^+_{t,f}\big\rangle(\hat\Gamma^+_f)^{-1}\hat{\mathcal{O}}_f=\hat K^b_{cca}\langle y_t,\hat z_{t-1}\rangle+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\mathcal{K}^b_{f-1}\langle Y^+_{t+1,f-1},\hat z_{t-1}\rangle$$
$$\doteq\hat K^b_{cca}\hat C_{cca}+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\langle\tilde z_t,\hat z_{t-1}\rangle\doteq\hat K^b_{cca}\hat C_{cca}+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\mathcal{K}^b_f\hat{\mathcal{O}}_f\hat A_{cca}$$
Here the last equation follows from $\langle\tilde z_t,\hat z_{t-1}\rangle\doteq\langle\hat z_t,\hat z_{t-1}\rangle=\langle\hat z_t,\hat z_t\rangle(\hat A^b_{cca})'$ and $\langle\hat z_t,\hat z_t\rangle\doteq\langle\tilde z_t,\hat z_t\rangle$ according to the backward version of Lemma 4.3. Also $\hat A^b_{cca}\doteq\hat A_{cca}'$ and $\hat C_{cca}=\langle y_t,\hat z_{t-1}\rangle$ according to (5) have been used. This shows that $\mathcal{K}^b_f\mathcal{O}_f-\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\hat{\mathcal{O}}_f$ is of order $o(T^{-1/2})$ and thus neglectable.
It remains to bound the norm of the contribution of the crucial terms to the derivative. Recall equation (9), which states the essential term in the derivative of $M_f(Y^+_{1,T};\theta)$. Consider the derivative with respect to an entry in $K$ first: In this case the partial derivative is equal to
$$\partial_iM_f(Y^+_{1,T};\hat\theta_{cca})=\frac{2}{f}\sum_{j=1}^{f-1}\mathrm{tr}\Big[\big\langle\varepsilon_{t+j-1}(\hat\theta_{cca}),Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\big\rangle(\mathcal{E}_f\Omega_f\mathcal{E}_f')^{-1}\begin{bmatrix}0_{js\times n}\\\mathcal{O}_{f-j}\end{bmatrix}\partial_iK\Big]$$
$$\doteq\frac{2}{f}\sum_{j=1}^{f-1}\mathrm{tr}\Big[\big\langle\varepsilon_{t+j-1}(\hat\theta_{cca}),\,\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_{t+j}+\mathcal{O}_f(\hat x_{t+j}-\tilde x_{t+j})\big)\big\rangle S^{-1}(\partial_iK)\Big]$$
$$\doteq 2\,\mathrm{tr}\Big[E\varepsilon_{t-1}x_t'\,\big(\mathcal{K}^b_f(\hat{\mathcal{O}}_f-\mathcal{O}_f)\big)'S^{-1}(\partial_iK)\Big]\doteq 0$$
where the results derived above have been used.
For the derivative with respect to an entry in $A$ the corresponding contribution equals
$$\frac{2}{f}\sum_{l=0}^{f-j-1}\mathrm{tr}\Big[\hat K_{cca}'(\hat A^l_{cca})'(\partial_iA)'\begin{bmatrix}0_{(j+l)s\times n}\\\mathcal{O}_{f-j-l}\end{bmatrix}'(\mathcal{E}_f')^{-1}\Omega_f^{-1}\mathcal{E}_f^{-1}\big\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},\,\varepsilon_{t+j-1}(\hat\theta_{cca})\big\rangle\Big]$$
$$\doteq\frac{2}{f}\sum_{l=0}^{f-j-1}\mathrm{tr}\Big[(\partial_iA)'S'\big\langle\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_{t+j+l}+\mathcal{O}_f(\hat x_{t+j+l}-\tilde x_{t+j+l})\big),\,\hat A^l_{cca}\hat K_{cca}\varepsilon_{t+j-1}(\hat\theta_{cca})\big\rangle\Big]$$
$$\doteq\frac{2}{f}\sum_{l=0}^{f-j-1}\mathrm{tr}\Big[(\partial_iA)'S'\big\langle\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_t+\mathcal{O}_f(\hat x_t-\tilde x_t)\big),\,\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})\big\rangle\Big]$$
$$=\frac{2}{f}\mathrm{tr}\Big[(\partial_iA)'S'\Big\langle\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_t+\mathcal{O}_f(\hat x_t-\tilde x_t)\big),\,\sum_{l=0}^{f-j-1}\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})\Big\rangle\Big]$$
$$\doteq\frac{2}{f}\mathrm{tr}\Big[(\partial_iA)'S'\Big\langle\mathcal{K}^b_f\mathcal{O}_f(\hat x_t-\tilde x_t),\,\sum_{l=0}^{f-j-1}\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})\Big\rangle\Big]$$
Noting that $\tilde x_t\doteq\sum_{l=0}^{f-j-1}\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})+\hat A^{f-j}_{cca}\tilde x_{t-f+j}$ and that $\langle\hat x_t-\tilde x_t,\tilde x_t\rangle\doteq\langle\hat x_t-\tilde x_t,\hat x_t\rangle\doteq 0$, as has been shown in Lemma 4.3, the norm of the above expression is of order $O(Q_T\|\hat A^{f-j}_{cca}\|/f)$. Summing over all columns $1\le j\le f-2$ then amounts to a total error of order $O(Q_T/f)=o(T^{-1/2})$. Therefore also in this case one obtains $\partial_iM_f(Y^+_{1,T};\hat\theta_{cca})\doteq 0$ as required.
Finally consider the derivative with respect to entries in $C$: The essential part of the derivative is equal to $\frac{2}{f}\mathrm{tr}\big[\Omega_f^{-1}\mathcal{E}_f^{-1}(\partial_i\mathcal{E}_f)\langle E^+_{t,f}(\hat\theta_{cca}),E^+_{t,f}(\hat\theta_{cca})\rangle\big]$. Here the $j$-th row of the matrix $\partial_i\mathcal{E}_f$ is equal to $\partial_iC\big[\hat A^{j-2}_{cca}\hat K_{cca},\ldots,\hat K_{cca},0_{n\times(f-j+1)s}\big]$. The contribution of the $j$-th diagonal block to the trace for $j\ge 2$ is equal to
$$\frac{2}{f}\Big\langle\sum_{r=0}^{j-2}\hat A^r_{cca}\hat K_{cca}\varepsilon_{t+j-r-1}(\hat\theta_{cca}),\,\tilde{\mathcal{K}}^b_{f-j}\big(Y^+_{t+j,f-j}-\mathcal{O}_{f-j}\tilde x_{t+j}\big)\Big\rangle$$
Here $\tilde{\mathcal{K}}^b_{f-j}$ is equal to $(\mathcal{E}_{f-j}\Omega_{f-j}\mathcal{E}_{f-j}')^{-1}[I_{s\times s},0_{s\times(f-j-1)s}]'$. This follows from the block Toeplitz structure of $\mathcal{E}_f$ considering $\Gamma_f^{-1}[0_{s\times js},I_s,0_{s\times(f-j-1)s}]'$, using analogous arguments as in the case of $\Gamma_f^{-1}\mathcal{O}_f$.
Note that from
$$[I_s,-\hat C^b_{cca}\mathcal{K}^b_\infty]\Gamma^+_\infty=E[I_s,-\hat C^b_{cca}\mathcal{K}^b_\infty]Y^+_{t,\infty}(Y^+_{t,\infty})'=[\Omega^b,0_{s\times\infty}]$$
it follows that $(\Gamma^+_\infty)^{-1}[I_s,0_{s\times\infty}]'=[I,-\hat C^b_{cca}\mathcal{K}^b_\infty]'(\Omega^b)^{-1}$. Using the matrix inversion lemma one obtains
$$(\Gamma^+_\infty)^{-1}=\begin{bmatrix}0_{fs\times fs}&0_{fs\times\infty}\\0_{\infty\times fs}&(\Gamma^+_\infty)^{-1}\end{bmatrix}+\begin{bmatrix}I\\-(\Gamma^+_\infty)^{-1}\tilde H_{f,\infty}'\end{bmatrix}\bar\Gamma_f^{-1}\big[I,\,-\tilde H_{f,\infty}(\Gamma^+_\infty)^{-1}\big]$$
since $\bar\Gamma_f=\Gamma^+_f-\tilde H_{f,\infty}(\Gamma^+_\infty)^{-1}\tilde H_{f,\infty}'$. Here $\tilde H_{f,\infty}=EY^+_{t,f}(Y^+_{t+f,\infty})'$. Therefore it is observed that $(\mathcal{E}_{f-j}\Omega_{f-j}\mathcal{E}_{f-j}')^{-1}[I_{s\times s},0_{s\times(f-j-1)s}]'$ is equal to the first $f-j$ block rows of $(\Gamma^+_\infty)^{-1}[I_{s\times s},0_{s\times\infty}]'$. Thus the essential term is of the form
$$\frac{2}{f}\Big\langle\sum_{r=0}^{j-2}\hat A^r_{cca}\hat K_{cca}\varepsilon_{t+j-r-1}(\hat\theta_{cca}),\,\hat\varepsilon_{t+j}-\hat C^b_{cca}\mathcal{K}^b_{f-j}\big(Y^+_{t+j+1,f-j}-\mathcal{O}_{f-j}\tilde x_{t+j+1}\big)\Big\rangle(\Omega^b)^{-1}$$
Again computing the error in the truncation of the state reconstruction shows that the resulting error may be neglected. Thus the second summand is neglectable, since
$$\sum_{j=1}^{f-1}\frac{2}{f}\big\langle\tilde x_{t+j},\,\mathcal{K}^b_{f-j}\big(Y^+_{t+j+1,f-j}-\mathcal{O}_{f-j}\tilde x_{t+j+1}\big)\big\rangle\doteq 2\big\langle\hat x_t,\,\mathcal{K}^b_f\big(\hat{\mathcal{O}}_f\hat x_{t+1}-\mathcal{O}_f\tilde x_{t+1}\big)\big\rangle\doteq 0$$
as this term has been analysed already. The other summand is neglectable, since $\langle\tilde x_{t+j},\varepsilon_{t+j}(\hat\theta_{cca})\rangle\doteq\langle\hat x_{t+j},\varepsilon_{t+j}(\hat\theta_{cca})\rangle\doteq\langle\hat x_{t+j},\varepsilon_{t+j}(\hat\theta_{cca})-\hat\varepsilon_{t+j}\rangle=\langle\hat x_{t+j},\hat x_{t+j}-\tilde x_{t+j}\rangle\hat C_{cca}'\doteq 0$ according to Lemma 4.3. Here we used, aside from the equations given in Lemma 4.3, also $y_t=\hat C_{cca}\tilde x_t+\varepsilon_t(\hat\theta_{cca})=\hat C_{cca}\hat x_t+\hat\varepsilon_t$.
It remains to show that the Hessian is asymptotically nonsingular. This will be done by referring to the same property of the pseudo maximum-likelihood method. Simple but cumbersome calculations of the derivatives with respect to the $i$-th and the $j$-th component of $\theta$ respectively lead to the fact that the only essential term equals $\frac{2}{f}\mathrm{tr}\big[\partial_i(\mathcal{E}_f^{-1})\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\rangle\partial_j(\mathcal{E}_f')^{-1}\Omega_f^{-1}\big]$. This is since all terms which include some derivative of $\mathcal{O}_f$ or $\mathcal{K}_p$ include terms of the form $\langle y_{t-j},\varepsilon_t(\hat\theta_{cca})\rangle$, which converge to zero due to the uniform convergence of the estimates of the sample covariances and the consistency of the system matrix estimates. The remaining term corresponding to the second derivative of $\mathcal{E}_f$ is equal to $\frac{2}{f}\mathrm{tr}\big[\Omega_f^{-1}\partial^2_{i,j}(\mathcal{E}_f^{-1})\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},E^+_{t,f}(\hat\theta_{cca})\rangle\big]$. Here $\partial^2_{i,j}(\mathcal{E}_f)^{-1}$ is block lower triangular with zeroes on the block diagonal, whereas $\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},E^+_{t,f}\rangle\to\mathcal{E}^\circ_f\Omega^\circ_f$. This shows that the contribution of this term is zero. As for the first derivative the convergence of the contribution of the terms $\mathcal{O}_f\mathcal{K}_pY^-_{t,p}$ to zero is immediate from the Toeplitz structure and the fact that $1/f\to 0$. Therefore the essential term is equal to $\frac{2}{f}\mathrm{tr}\big[\langle\partial_i(\mathcal{E}_f^{-1})Y^+_{t,f},\partial_j(\mathcal{E}_f^{-1})Y^+_{t,f}\rangle\Omega_f^{-1}\big]$. Note that $\partial_i\varepsilon_t(\theta)=\sum_{j=1}^{\infty}\partial_iK(j;\theta)y_{t-j}$ and that the family of filters corresponding to $\partial_iK(j;\theta)$ is uniformly stable due to the strict minimum-phase assumption on $\theta^\circ$ for a compact neighbourhood of $\theta^\circ$. Here $\partial_iK(j;\theta)$ is equal to the $(l+j,l)$-th block entry ($1\le l\le f-j$) in $\partial_i(\mathcal{E}_f^{-1})$, as is immediate from the definition of $\mathcal{E}_f$. Therefore the difference $\partial^2_{a,b}M_f(Y^+_{1,T};\theta^\circ)-\partial^2_{a,b}L(Y^+_{1,T};\theta^\circ)$ is essentially equal to
$$\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\Big\langle\sum_{l=1}^{j}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=1}^{j}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}-\Big\langle\sum_{l=1}^{\infty}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=1}^{\infty}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}\Big]$$
$$=-\frac{4}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\Big\langle\sum_{l=1}^{j}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=j+1}^{\infty}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}\Big]-\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\Big\langle\sum_{l=j+1}^{\infty}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=j+1}^{\infty}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}\Big]$$
The evaluations to show that this tends to zero are standard (see e.g. Hannan and Deistler, 1988, Chapter 4) and therefore omitted. Since the two Hessians are asymptotically equal, the nonsingularity of the Hessian of $M_f(Y^+_{1,T};\theta^\circ)$ can be inferred from the corresponding result for the Hessian of $L(Y^+_{1,T};\theta^\circ)$. This finally concludes the proof.
5 Conclusions
In this paper the long standing question of the asymptotic efficiency of CCA has been answered affirmatively in the case of no exogenous inputs. The implication of this result is that, in the case where the system order is known, the CCA subspace algorithm is an implicit implementation of a generalised pseudo maximum-likelihood procedure, which does not require the numerical optimisation of the pseudo likelihood and thus is noniterative. The proof also leads to a new central limit theorem for the CCA subspace estimates, as it does not only apply in generic cases, but on the whole set of transfer functions of McMillan degree $n$, which are stable and strictly minimum-phase. The author wants to stress that the proof only contains the sufficiency of the assumptions. No statement has been made that different schemes are suboptimal, although the examples treated so far in the literature support this conjecture. The paper only treats the case of no exogenous inputs. The case including exogenous inputs is still largely unsolved. Whether the method used above leads to a similar result in the case of additional observed inputs is a matter of future research. Finally note that the proof only considers the case where the true order of the system is used for estimation and does in particular not imply that the ML methods and the CCA subspace algorithm are asymptotically equivalent when the order has to be estimated.
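For reference, the noniterative CCA procedure discussed in this paper can be sketched as follows. This is a minimal illustrative implementation under my own assumptions (a simulated first-order system, fixed truncation indices $f=p=10$ instead of the $O(\log T)$ choice of the theorem), not the code used in the paper:

```python
import numpy as np

# Minimal CCA subspace sketch for y_t = C x_t + e_t, x_{t+1} = A x_t + K e_t
# (no exogenous inputs): CCA-weighted regression of stacked futures on
# stacked pasts gives the state estimate; (A, C, K) follow by least squares.
rng = np.random.default_rng(1)
A_true, K_true, C_true = 0.8, 0.5, 1.0
T, f, p, n = 20_000, 10, 10, 1
e = rng.standard_normal(T)
x, y = 0.0, np.empty(T)
for t in range(T):
    y[t] = C_true * x + e[t]
    x = A_true * x + K_true * e[t]

rows = T - f - p + 1
Ym = np.column_stack([y[p - 1 - j : p - 1 - j + rows] for j in range(p)])  # past Y-
Yp = np.column_stack([y[p + j : p + j + rows] for j in range(f)])          # future Y+
Lf = np.linalg.cholesky(Yp.T @ Yp / rows)
Lp = np.linalg.cholesky(Ym.T @ Ym / rows)
H = Yp.T @ Ym / rows
# CCA weighting: whiten future and past, SVD, keep n canonical directions.
U, s, Vt = np.linalg.svd(np.linalg.solve(Lf, H) @ np.linalg.inv(Lp).T)
K_p = Vt[:n] @ np.linalg.inv(Lp)         # estimate of K_p
xhat = Ym @ K_p.T                        # state estimate, t = p .. T-f
# Least-squares estimates of the system matrices from the state sequence.
C_hat = np.linalg.lstsq(xhat, y[p : p + rows], rcond=None)[0][0]
eps = y[p : p + rows] - C_hat * xhat[:, 0]
Z = np.column_stack([xhat[:-1, 0], eps[:-1]])
A_hat, K_hat = np.linalg.lstsq(Z, xhat[1:, 0], rcond=None)[0]
print(A_hat, C_hat * K_hat)
```

The state basis is only determined up to a similarity transformation, so only invariants such as $A$ and the product $CK$ are meaningfully compared with the true values.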
Acknowledgements

The author would like to thank Manfred Deistler and Wolfgang Scherrer for the valuable support and many useful remarks during the time while this work has been done. Also the financial support from the Austrian 'FWF' under project number P-11213-MAT and from the EU TMR project 'SI' in form of a post-doc position at the Department of Automatic Control, Linkoping University, Linkoping, Sweden, is gratefully acknowledged.
References
Akaike, H. (1976). Canonical correlation analysis of time series and the use of an information criterion. In: System Identification: Advances and Case Studies (R. Mehra and D. Lainiotis, Eds.). pp. 27-96. Academic Press Inc.
Bauer, D. (1998). Some Asymptotic Theory for the Estimation of Linear Systems Using Maximum Likelihood Methods or Subspace Algorithms. PhD thesis. TU Wien.
Bauer, D. (2000). Order estimation for subspace methods. Technical report. Dept. of Automatic Control, Linkoping University. Submitted to Automatica.
Bauer, D., M. Deistler and W. Scherrer (1997b). The analysis of the asymptotic variance of subspace algorithms. Proceedings of the 11th IFAC Symposium on System Identification, Fukuoka, Japan. pp. 1087-1091.
Bauer, D., M. Deistler and W. Scherrer (1999). Consistency and asymptotic normality of some subspace algorithms for systems without observed inputs. Automatica 35, 1243-1254.
Bauer, D., M. Deistler and W. Scherrer (2000). On the impact of weighting matrices in subspace algorithms. In: Proceedings of the IFAC Conference 'SYSID'. Santa Barbara, California.
Desai, U. B., D. Pal and R. D. Kirkpatrick (1985). A realization approach to stochastic model reduction. International Journal of Control 42(4), 821-838.
Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear Systems. John Wiley. New York.
Larimore, W. E. (1983). System identification, reduced order filters and modeling via canonical variate analysis. In: Proc. 1983 Amer. Control Conference 2 (H. S. Rao and P. Dorato, Eds.). Piscataway, NJ. pp. 445-451. IEEE Service Center.
Peternell, K., W. Scherrer and M. Deistler (1996). Statistical analysis of novel subspace identification methods. Signal Processing 52, 161-177.
Stoorvogel, A. and J. Van Schuppen (1997). Approximation problems with the divergence criterion for gaussian variables and gaussian processes. Technical Report BS-R9616. CWI. Department