case of no exogenous inputs
Dietmar Bauer
Department of Electrical Engineering
Linköping University, SE-58183 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: Dietmar.Bauer@tuwien.ac.at
June 7, 2000
Report no.: LiTH-ISY-R-2262
Submitted to Journal of Time Series Analysis
Technical reports from the Automatic Control group in Linköping are available by anonymous ftp
case of no exogenous inputs

Dietmar Bauer
Institute f. Econometrics, Operations Research and System Theory
TU Wien, E119
Argentinierstr. 8, A-1040 Wien
e-mail: Dietmar.Bauer@tuwien.ac.at
June 7, 2000
Abstract

In this paper one of the main open questions in the area of subspace methods is partly answered. One particular algorithm, sometimes termed CCA, is shown to be asymptotically equivalent to estimates obtained by optimizing the pseudo maximum likelihood. Here asymptotically equivalent means that the difference of the two estimators times the square root of the sample size tends to zero.

Keywords: linear systems, discrete time systems, subspace methods, asymptotic efficiency
1 Introduction

The subspace method sometimes termed CCA has been proposed in (Larimore, 1983). It has attained a great deal of attention due to its good numerical properties and its good performance on real and simulated data. The asymptotic properties of this method have been analyzed in a series of papers: (Peternell et al., 1996) establish consistency of the estimates of the transfer function. (Bauer et al., 1999) provide the proof of the asymptotic normality of the system matrix estimates. This paper also demonstrates a method to approximate the asymptotic variance numerically. These expressions have been further investigated in (Bauer et al., 1997b; Bauer, 1998; Bauer et al., 2000), where the first paper shows some plots for some low dimensional systems, indicating that a specific choice of user parameters leads to estimates which are indistinguishable from the Cramer-Rao bound. The latter paper reduces the number of user parameters which affect the asymptotic accuracy of the transfer function estimates. Examples given in the above references indicate that the performance of CCA is 'at least close to maximum likelihood', however no formal verification of this statement has been given in general. The aim of this paper is to fill this gap, i.e. to prove the proposition that CCA is in fact a realization of a (generalized pseudo) maximum likelihood procedure and thus asymptotically efficient. Here generalized refers to the definition that a maximum likelihood estimate is any estimate which 'essentially' is equal to the global optimum of the likelihood; the meaning of essentially will be defined below. Pseudo refers to the fact that the Gaussian likelihood is optimized, however the true noise distribution will not be assumed to be Gaussian. The author wants to note that this paper only deals with the case of no observed exogenous inputs.

The organization of this paper is as follows: In the next section the model set and the assumptions are stated. In this section also a short description of the considered algorithm is given. Section 3 states the main result, i.e. the asymptotic equivalence of pseudo maximum likelihood estimates and CCA estimates. Section 4 then gives the proof. Finally section 5 concludes.
Throughout, $X'$ will be used to denote the transpose of $X$, $\mathrm{tr}[X]$ denotes the trace operator and $X_{i,j}$ the $(i,j)$-th element of the matrix $X$.
2 Model set and assumptions

In this paper finite dimensional, discrete time, linear state space systems without observed inputs of the form

$$x_{t+1} = A x_t + K \varepsilon_t, \qquad y_t = C x_t + \varepsilon_t \eqno{(1)}$$

are considered. Here $(y_t; t \in \mathbb{Z})$ denotes the $s$-dimensional observed output process and $(\varepsilon_t; t \in \mathbb{Z})$ denotes $s$-dimensional ergodic white noise of mean zero and positive definite variance $\Sigma$. $(x_t; t \in \mathbb{Z})$ denotes the $n$-dimensional state sequence. It will be assumed throughout that the system is stable, i.e. that $|\lambda_i(A)| < 1$ holds for all eigenvalues $\lambda_i$ of $A$, and that the system is strictly minimum-phase, i.e. $|\lambda_i(\bar A)| < 1$ holds for all eigenvalues of $\bar A = (A - KC)$. These assumptions in particular imply that the system is in innovation form and that the innovation variance is nonsingular. In order to guarantee asymptotic normality of the parameter estimates, we also impose some additional conditions:

$$E\{\varepsilon_t \mid \mathcal F_{t-1}\} = 0, \quad E\{\varepsilon_t \varepsilon_t' \mid \mathcal F_{t-1}\} = \Sigma = E \varepsilon_t \varepsilon_t', \quad E\{\varepsilon_{t,a} \varepsilon_{t,b} \varepsilon_{t,c} \mid \mathcal F_{t-1}\} = \omega_{a,b,c}, \quad E\{\varepsilon_{t,a}^4\} < \infty.$$
Here $E$ denotes expectation, $\mathcal F_t$ denotes the $\sigma$-algebra spanned by $(y_s; s \le t)$ and $\varepsilon_{t,a}$ denotes the $a$-th component of the vector $\varepsilon_t$. Note that these assumptions coincide with the assumptions used in the proof of the asymptotic normality of the ML estimates in (Hannan and Deistler, 1988, Theorem 4.3.2). It has been argued there that weaker conditions are sufficient for the asymptotic normality. However such conditions will not be considered here, since the arguments already are quite complicated.
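As a concrete illustration of a system of the form (1) satisfying these assumptions, the following sketch simulates a stable, strictly minimum-phase innovation form model. The matrices $A$, $K$, $C$ and the variance $\Sigma$ are illustrative choices made here, not taken from the paper:

```python
import numpy as np

def simulate_innovation_form(A, K, C, Sigma, T, rng):
    """Simulate x_{t+1} = A x_t + K eps_t, y_t = C x_t + eps_t (equation (1))."""
    n, s = A.shape[0], C.shape[0]
    L = np.linalg.cholesky(Sigma)          # eps_t = L w_t, w_t standard normal
    x = np.zeros(n)
    y = np.empty((T, s))
    for t in range(T):
        eps = L @ rng.standard_normal(s)
        y[t] = C @ x + eps
        x = A @ x + K @ eps
    return y

# illustrative system matrices (n = 2, s = 1)
A = np.array([[0.7, 0.2], [0.0, 0.5]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Sigma = np.array([[1.0]])
# the stability and strict minimum-phase assumptions, checked explicitly
assert max(abs(np.linalg.eigvals(A))) < 1
assert max(abs(np.linalg.eigvals(A - K @ C))) < 1
y = simulate_innovation_form(A, K, C, Sigma, 2000, np.random.default_rng(0))
```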
The CCA subspace algorithm has been introduced by (Larimore, 1983). The basic fact that is exploited is the property of the state to be an interface between the past and the future of the time series in a certain sense: Let $Y^+_{t,f} = [y_t', y_{t+1}', \dots, y_{t+f-1}']'$ and let $Y^-_{t,p} = [y_{t-1}', \dots, y_{t-p}']'$. Here $f$ and $p$ are two integer parameters to be chosen by the user. In this paper it will always be assumed that $f = p$ is chosen as $\hat f = \hat p = \lceil d \hat p_{BIC} \rceil, d > 1$, where $\hat p_{BIC}$ is equal to the order estimate obtained by using BIC in a long autoregression $y_t = \hat a_1 y_{t-1} + \dots + \hat a_p y_{t-p} + \hat\varepsilon_t$. Here $\lceil x \rceil$ denotes the smallest integer larger than $x$. It is known (see e.g. Hannan and Deistler, 1988, Theorem 6.6.3) that in the present setting $\hat p_{BIC}$ tends to infinity and fulfills $\hat p_{BIC} \frac{-2 \log \rho_0}{\log T} \to 1$ a.s., and thus $f = p$ tend to infinity at a rate proportional to $\log T$. Here $\rho_0 = \max\{|\lambda_i(\bar A)|, i = 1, \dots, n\}$. Define $E^+_{t,f}$ analogously to $Y^+_{t,f}$ for the noise $\varepsilon_t$ in place of $y_t$. It follows from the system equations (1) that

$$Y^+_{t,f} = \mathcal O_f \mathcal K_p Y^-_{t,p} + \mathcal O_f \bar A^p x_{t-p} + \mathcal E_f E^+_{t,f}. \eqno{(2)}$$

Here $\mathcal O_f = [C', (CA)', \dots, (CA^{f-1})']'$ and $\mathcal K_p = [K, \bar A K, \dots, \bar A^{p-1} K]$. Furthermore $\mathcal E_f$ denotes the lower triangular block Toeplitz matrix whose $i$-th block row equals $[CA^{i-2}K, \dots, CK, I_s, 0_{s \times (f-i)s}]$, where $I_s$ is the $s$-dimensional identity and $0_{a \times b}$ is an $a \times b$ null matrix. Note that in this equation the future of the output is decomposed into three terms, where the first corresponds to the finite past of the outputs, the second to the far past (i.e. the term involving $x_{t-p}$), and the third to the future of the noise. Since $p$ tends to infinity it seems reasonable to neglect the second term in the equation. The first and the last term are uncorrelated by assumption. This is the (somewhat heuristic) motivation for the following three step identification scheme:
1. Obtain an estimate $\hat\beta$ of $\beta = \mathcal O_f \mathcal K_p$ by regressing $Y^+_{t,f}$ on $Y^-_{t,p}$.

2. Typically $\hat\beta$ will be of full rank, whereas $\beta = \mathcal O_f \mathcal K_p$ is of rank $n$. Thus find a rank $n$ approximation of $\hat\beta$ using a weighted singular value decomposition $\hat W^+_f \hat\beta \hat W^-_p = \hat U_n \hat\Sigma_n \hat V_n' + \hat R$, where $\hat U_n$ denotes the matrix whose columns are the left singular vectors corresponding to the dominating $n$ singular values, the latter being the diagonal entries of the diagonal matrix $\hat\Sigma_n$. $\hat V_n$ corresponds to the respective right singular vectors and $\hat R$ accounts for the remaining singular values. An estimate of $\mathcal K_p$ is defined as $\hat{\mathcal K}_p = \hat T \hat V_n' (\hat W^-_p)^{-1}$. The weightings $\hat W^-_p$, $\hat W^+_f$ and the matrix $\hat T$ are explained in more detail below. For the moment it suffices to state that $\hat T$ is related to a basis change in the state space and thus is required to be nonsingular.

3. Compute an estimate of the state as $\hat x_t = \hat{\mathcal K}_p Y^-_{t,p}$ and then use the system equations to obtain estimates of the various system matrices: Estimate $C$ by regression of $y_t$ onto $\hat x_t$. Let $\hat C$ denote this estimate; then $\hat\varepsilon_t = y_t - \hat C \hat x_t$ is an estimate of the noise. Finally $[A, K]$ is estimated by regressing $\hat x_{t+1}$ on $[\hat x_t', \hat\varepsilon_t']'$, and the variance $\Sigma$ is estimated by the sample covariance of $\hat\varepsilon_t$.
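The three steps above can be sketched in a few lines of numerical code. This is a minimal illustration, not the author's implementation: the way the stacked vectors $Y^+_{t,f}$ and $Y^-_{t,p}$ are built, the Cholesky-based square roots for the weightings, and the choice $\hat T = \hat\Sigma_n^{1/2}$ are all assumptions made here for concreteness.

```python
import numpy as np

def cca_sketch(y, f, p, n):
    """Minimal sketch of the three-step CCA scheme for an output sample y of shape (T, s)."""
    T, s = y.shape
    rows = list(range(p, T - f + 1))
    # stacked future Y+_{t,f} and past Y-_{t,p}, one column per time point t
    Yf = np.column_stack([y[t:t + f].ravel() for t in rows])
    Yp = np.column_stack([y[t - p:t][::-1].ravel() for t in rows])
    N = len(rows)
    Gf, Gp = Yf @ Yf.T / N, Yp @ Yp.T / N
    # step 1: beta_hat from regression of Y+ on Y-
    beta = (Yf @ Yp.T / N) @ np.linalg.inv(Gp)
    # step 2: weighted SVD; Wf Wf' = Gf^{-1} and Wp Wp' = Gp (square roots, Y Y' = X)
    Wf = np.linalg.inv(np.linalg.cholesky(Gf)).T
    Wp = np.linalg.cholesky(Gp)
    U, sv, Vt = np.linalg.svd(Wf @ beta @ Wp)
    Kp = np.diag(np.sqrt(sv[:n])) @ Vt[:n] @ np.linalg.inv(Wp)  # T_hat = Sigma_n^{1/2}
    # step 3: state estimate, then regressions for C, (A, K) and Sigma
    x = Kp @ Yp
    yv = y[rows].T
    Ch = (yv @ x.T) @ np.linalg.inv(x @ x.T)
    eps = yv - Ch @ x
    xe = np.vstack([x[:, :-1], eps[:, :-1]])
    AK = (x[:, 1:] @ xe.T) @ np.linalg.inv(xe @ xe.T)
    return AK[:, :n], AK[:, n:], Ch, eps @ eps.T / N

# usage on data simulated from a simple illustrative first order system
rng = np.random.default_rng(0)
T = 3000
e = rng.standard_normal(T)
y = np.empty((T, 1))
x = 0.0
for t in range(T):
    y[t, 0] = x + e[t]
    x = 0.8 * x + 0.5 * e[t]
Ah, Kh, Ch, Sh = cca_sketch(y, f=6, p=6, n=1)
```

Any nonsingular $\hat T$ would give the same transfer function estimate; the square-root choice above merely fixes a convenient state space basis.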
The choice of the weighting matrices $\hat W^+_f$ and $\hat W^-_p$ will play a crucial role in the following. It has been shown in (Bauer et al., 2000) that the choice of the weighting matrix $\hat W^-_p$ does not have any effect on the asymptotic distribution of the estimated system. Thus we might choose $\hat W^-_p = (\hat\Gamma^-_p)^{1/2}$ w.r.o.g. Here $\hat\Gamma^-_p$ denotes the sample covariance matrix of $Y^-_{t,p}$ and $Y = X^{1/2}$ is defined such that $YY' = X$. The choice of $\hat W^+_f$ however has been seen to have an influence on the asymptotic distribution. The term CCA will be reserved for the choice $\hat W^+_f = (\hat\Gamma^+_f)^{-1/2}$, where $\hat\Gamma^+_f$ denotes the sample covariance of $Y^+_{t,f}$. Methods that use a different choice of $f$ and/or $\hat W^+_f$ are sometimes termed Larimore type procedures. In this paper only the CCA case will be dealt with.
The matrix $\hat T$ obviously refers to a change in the basis of the estimated state $\hat x_t$. It is straightforward to show that the estimated transfer function is not affected by such a change. Therefore it might be assumed w.r.o.g. that $\hat T$ is chosen such that the estimated system $(\hat A, \hat K, \hat C)$ lies in an appropriate overlapping form. Here the meaning of appropriate will become clear below.
Note that with these choices there exists a remarkable symmetry: In the single-input single-output case e.g. the matrix which is decomposed in the SVD will be symmetric. This symmetry builds the core of the stochastic balancing ideas of (Desai et al., 1985). In that paper two different state space realizations attached to a covariance sequence are constructed, one of which is termed the forward representation and the other one is called backward representation. In the present framework the forward representation is given by the equations (1), while the backward representation is of the form

$$z_{t-1} = A^b z_t + K^b \nu_t, \qquad y_t = C^b z_t + \nu_t \eqno{(3)}$$

Here $\nu_t$ denotes the backward innovation sequence. The backward representation stems from a realization of the sequence of transposes of the original covariance sequence. For the forward system we have, denoting $E y_t y_{t+j}' = \gamma(j)$, that $\gamma(j) = M'(A')^{j-1}C', j > 0$; $\gamma(0) = C \Sigma_x C' + \Sigma$; $\gamma(j) = \gamma(-j)', j < 0$. Here $M' = E y_t x_{t+1}' = C \Sigma_x A' + \Sigma K'$ and $\Sigma_x = E x_t x_t' = A \Sigma_x A' + K \Sigma K'$. For the backward representation $\gamma(j) = C^b (A^b)^{j-1} M^b, j > 0$; $\gamma(0) = C^b \Sigma_z (C^b)' + \Sigma^b$; $\gamma(j) = \gamma(-j)', j < 0$, with $(M^b)' = E y_t z_{t-1}' = C^b \Sigma_z (A^b)' + \Sigma^b (K^b)'$ and $\Sigma_z = E z_t z_t' = A^b \Sigma_z (A^b)' + K^b \Sigma^b (K^b)'$. Here $\Sigma^b = E \nu_t \nu_t'$ denotes the backward innovation variance matrix. From a decomposition of the covariance Hankel matrix it follows that there exists a nonsingular matrix $T$ such that $M^b = TC', C^b = M'T^{-1}, A^b = TA'T^{-1}$. Elementary calculations also show that

$$[M, AM, \dots](\Gamma^-_\infty)^{-1} = [K, \bar A K, \dots], \qquad [M^b, A^b M^b, \dots](\Gamma^+_\infty)^{-1} = [TC', TA'C', \dots](\Gamma^+_\infty)^{-1} = T \mathcal O_\infty' (\Gamma^+_\infty)^{-1} = [K^b, (A^b - K^b C^b) K^b, \dots]$$

Note that fixing $T = I_n$ there exists a one-one relation between the forward representation $(A, K, C)$ and $\Sigma$ and the backward representation $(A^b, K^b, C^b)$ and $\Sigma^b$. This symmetry will lie at the core of the paper, as it allows us to transfer each result obtained for the forward representation to a result corresponding to the backward representation and vice versa.
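The forward covariance formulas and the backward construction with $T = I_n$ can be checked numerically. The block below uses an illustrative scalar system (not from the paper): it computes $\Sigma_x$ from the Lyapunov fixed point, forms $M' = C\Sigma_x A' + \Sigma K'$, checks that the backward triple $A^b = A', C^b = M', M^b = C'$ reproduces the same $\gamma(j)$, and compares $\gamma(j)$ with sample covariances of a long simulation:

```python
import numpy as np

# illustrative forward system (n = s = 1)
A = np.array([[0.6]]); K = np.array([[0.4]]); C = np.array([[1.0]]); Sig = np.array([[1.0]])

# Sigma_x = A Sigma_x A' + K Sig K' via fixed point iteration
Sx = np.zeros((1, 1))
for _ in range(500):
    Sx = A @ Sx @ A.T + K @ Sig @ K.T
M = A @ Sx @ C.T + K @ Sig            # so that M' = C Sigma_x A' + Sig K'

def gamma(j):
    """gamma(j) = M'(A')^{j-1}C' for j > 0, gamma(0) = C Sigma_x C' + Sig."""
    if j == 0:
        return C @ Sx @ C.T + Sig
    return M.T @ np.linalg.matrix_power(A.T, j - 1) @ C.T

# backward representation with T = I_n: A^b = A', C^b = M', M^b = C'
Ab, Cb, Mb = A.T, M.T, C.T
for j in range(1, 4):
    assert np.allclose(Cb @ np.linalg.matrix_power(Ab, j - 1) @ Mb, gamma(j))

# sample covariances of a long simulation agree with gamma(j)
rng = np.random.default_rng(1)
T = 200_000
e = rng.standard_normal(T)
y = np.empty(T)
x = 0.0
for t in range(T):
    y[t] = x + e[t]
    x = 0.6 * x + 0.4 * e[t]
for j in range(3):
    assert abs(np.mean(y[:T - j] * y[j:]) - gamma(j)[0, 0]) < 0.05
```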
3 Main result
In (Bauer et al., 1999) it has been shown that in generic cases the estimates are asymptotically normal for a wide range of choices for $f, p$ and $\hat W^+_f, \hat W^-_p$, leading to expressions for the asymptotic variance which do not rely on special properties of the weighting matrices and the integer $f$. The disadvantage however emanating from this framework is the difficulty to compare different approaches according to their asymptotic variance and relative efficiency. In this paper a different approach is pursued, leading to another derivation of the asymptotic normality result in the CCA case and moreover to a proof of the asymptotic equivalence to a generalized pseudo maximum likelihood estimate, which is defined as follows: $\hat\theta_{ML}$ essentially optimizes the (pseudo) likelihood $L(Y^+_{1,T}; \theta)$ according to the model defined by (1) using the Gaussian likelihood, i.e. $L(Y^+_{1,T}; \hat\theta_{ML}) - \min_{\theta \in \Theta} L(Y^+_{1,T}; \theta) \to 0$ a.s. and $\sqrt T \partial_i L(Y^+_{1,T}; \hat\theta_{ML}) \to 0, \forall i$, in probability, where $\partial_i$ denotes the derivative with respect to the $i$-th coordinate of $\theta$. This is stated in the main theorem of this paper:
Theorem 3.1 Let $(y_t; t \in \mathbb{Z})$ be generated by a system of the form (1), where the ergodic white noise $\varepsilon_t$ fulfills the assumptions of section 2. Assume that the true order of the system $n$ is known and that $f = p = \lceil d \hat p_{BIC} \rceil, d > 1$, is used. Then the estimates of the transfer function $k(z)$ obtained by using CCA are generalised pseudo maximum likelihood estimates.

The author wants to stress that the assumptions put forth in the theorem are exactly the same as used in the proof of the asymptotic normality of pseudo maximum likelihood estimates given in (Hannan and Deistler, 1988). Note however that the analogous result for the innovation variance is not provided.
In the theorem it is assumed that the order of the system is known a priori. This is an unrealistic situation in practice. Thus the order has to be estimated. One possible method to do this could be the following: The innovation variance is usually estimated as $\hat\Sigma_n = \langle y_t, y_t \rangle - \hat C_n \langle \hat x^n_t, \hat x^n_t \rangle \hat C_n'$. Here $\langle a_t, b_t \rangle = T^{-1} \sum_{t=p+1}^{T-f} a_t b_t'$ and $\hat x^n_t = \hat{\mathcal K}_p(n) Y^-_{t,p}$ denotes the estimated state using the order $n$. Similarly $\hat C_n$ denotes the estimate of $C$ using order $n$. The matrices $\hat\Sigma_n$ can be calculated with extremely low computational costs (see Bauer, 1998, Chapter 5, for details). Thus it seems tempting to estimate $n$ by minimizing a criterion which is constructed analogously to the information criteria introduced by (Akaike, 1976), inserting the estimates $\hat\Sigma_n$. In (Bauer, 1998) some theoretical problems with this method are discussed and an example is given which shows clearly that this method might in some situations lead to bad estimates, which are significantly different from BIC estimates in the ML framework. Note however that the result given above also has direct implications for the order estimation issue, since it reduces the numerical costs for the optimization of the likelihood significantly: it makes it possible to estimate the system matrices for a range of orders in a computationally feasible way and then to estimate the innovation variance in the standard way as the sample covariance of the estimated residuals obtained from filtering the output with the inverse of the transfer function. More on order estimation in the context of subspace methods can be found in (Bauer, 2000).
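The order selection rule discussed above can be made concrete as $IC(n) = \log\det \hat\Sigma_n + c_T \, d(n)/T$ with $d(n) = 2ns$ parameters. The criterion form and the BIC-type penalty $c_T = \log T$ follow the information-criterion analogy; the $\hat\Sigma_n$ values below are purely illustrative numbers, not estimates obtained from data:

```python
import numpy as np

def ic(Sigma_n, n, s, T, c_T):
    """Information criterion analogue: log det Sigma_n plus penalty for 2*n*s parameters."""
    return np.log(np.linalg.det(Sigma_n)) + c_T * 2 * n * s / T

# illustrative sequence of innovation variance estimates for orders n = 0..4 (s = 1):
# the variance drops sharply up to the 'true' order and only marginally beyond it
T = 1000
sigmas = [2.0, 1.2, 1.0, 0.99, 0.985]
crit = [ic(np.array([[v]]), n, 1, T, np.log(T)) for n, v in enumerate(sigmas)]
n_hat = int(np.argmin(crit))   # order estimate
```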
The author also wants to stress that the result is more general than the corresponding result in (Bauer et al., 1999), as it is not restricted to generic cases.
4 Proof of the main result
In this section the proof of the result of this paper is given. Since the details are rather complicated, first an outline of the strategy is provided: The proof orientates itself very closely at the standard proof of the asymptotic normality of pseudo maximum likelihood estimators. Let $L(Y^+_{1,T}; \theta)$ denote the pseudo likelihood according to the model (1), where $\theta \in \mathbb{R}^{2ns}$ denotes the vector of parameters according to the overlapping forms described in (Hannan and Deistler, 1988, Chapter 2). Note that $\theta$ does not contain any parameters corresponding to $\Sigma$. Any overlapping canonical form parameterizing the set of all rational transfer functions of McMillan degree $n$ could be used. It is assumed without restriction of generality that the parameters refer to a particular coordinate neighbourhood, such that the true parameter, $\theta_\circ$ say, is an interior point of this coordinate system. It will also be assumed that the estimates fulfill the restrictions on the system matrices imposed on this special neighbourhood, which in the case of the echelon overlapping forms amounts to zero or one restrictions of certain elements. Let $\hat\theta_{ML}$ denote a generalised maximum likelihood estimate, i.e. a parameter value essentially optimizing the likelihood. Further let the corresponding system be denoted as $(\hat A_{ML}, \hat K_{ML}, \hat C_{ML})$. Then the proof of the asymptotic normality in the likelihood case relies on a linearisation of the pseudo likelihood:

$$\partial L(Y^+_{1,T}; \hat\theta_{ML}) = \partial L(Y^+_{1,T}; \theta_\circ) + \partial^2 L(Y^+_{1,T}; \bar\theta)(\hat\theta_{ML} - \theta_\circ)$$

In this equation $\bar\theta$ denotes a convex combination (not necessarily the same in each row) of $\hat\theta_{ML}$ and $\theta_\circ$, and $\partial$ denotes the derivative with respect to $\theta$ evaluated at the parameter vector in the argument of the function. Here $\sqrt T \partial L(Y^+_{1,T}; \hat\theta_{ML}) \to 0$ in probability according to the definition of the estimate, and $\sqrt T \partial L(Y^+_{1,T}; \theta_\circ)$ is shown to be asymptotically normal. The Hessian is shown to be convergent to a nonsingular limit, which is ensured by the assumption that the true state dimension is used for the estimation.
The main idea of this proof is to mimic this argument. Let $\hat\theta_{cca}$ denote the vector of parameter estimates obtained by using the CCA method. The corresponding system matrices will be denoted with $(\hat A_{cca}, \hat K_{cca}, \hat C_{cca})$. Suppose we find a scalar function $M_f(Y^+_{1,T}; \theta)$ such that

$$\sqrt T \partial M_f(Y^+_{1,T}; \hat\theta_{ML}) \to 0 \text{ in probability}, \qquad \sqrt T \partial M_f(Y^+_{1,T}; \hat\theta_{cca}) \to 0 \text{ in probability},$$
$$\partial^2 M_f(Y^+_{1,T}; \theta) \to H_\infty(\theta) \text{ uniformly on some compact subset such that } \theta_\circ \text{ is an interior point},$$

where $H_\infty$ is continuous at $\theta_\circ$ and $H_\infty(\theta_\circ)$ is nonsingular. Here the function also depends on the noise covariance $\Sigma$, but this dependence is not reflected in the notation. Then it follows from

$$\partial M_f(Y^+_{1,T}; \hat\theta_{cca}) = \partial M_f(Y^+_{1,T}; \hat\theta_{ML}) + \partial^2 M_f(Y^+_{1,T}; \bar\theta)(\hat\theta_{cca} - \hat\theta_{ML})$$

that $\sqrt T (\hat\theta_{cca} - \hat\theta_{ML}) \to 0$ in probability, and thus the asymptotic distribution of the two estimates will be the same. Therefore the structure of the proof is to propose a function $M_f(Y^+_{1,T}; \theta)$ and then verify the above mentioned properties. This is done in the following subsections, after dealing with some preliminaries.
4.1 Preliminaries

Define $\langle a_t, b_t \rangle = T^{-1} \sum_{t=p+1}^{T-f} a_t b_t'$. For a scalar random sequence $g_T$ the notation $g_T = o(h_T)$ will be used to denote the fact that $g_T / h_T \to 0$ a.s. Further $g_T = O(h_T)$ means $\limsup_{T \to \infty} |g_T / h_T| \le M < \infty$ a.s. for some constant $M < \infty$. For vector or matrix valued random variables $g_T = O(h_T)$ means $\limsup_{T \to \infty} \max_{i,j} |g_{T;i,j} / h_T| \le M < \infty$, i.e. the bound is assumed to hold uniformly. Here $g_{T;i,j}$ denotes the $(i,j)$-th element of $g_T$. This notation will also be used in the case that the dimension of $g_T$ depends on the sample size $T$. Note that this notation is nonstandard, as usually the norm of the matrix is compared to $h_T$. This is a difference for sequences of matrices whose dimensions grow with the sample size $T$. Furthermore, in the case where $g_T$ depends on some other quantities, e.g. $g_T = g_{T,p}$, it will be said that $g_{T,p} = O(h_T)$ uniformly in $p$ if the constant $M$ involved in the definition of the order $O(h_T)$ can be chosen independent of $p$. Furthermore $\doteq$ will mean essential equivalence in the sense that the equality holds up to terms of order $o(1/\sqrt T)$, again componentwise but uniformly. In this case the difference will be called neglectable. Finally $Q_T = \sqrt{\log\log T / T}$ will be used.

The following obvious lemma will be used widely in the remaining sections:
Lemma 4.1 Let $\hat A \in \mathbb{R}^{a \times \hat b}$ and $\hat B \in \mathbb{R}^{\hat b \times c}$ such that $\hat A = O(Q_T)$ and $\hat B - B = O(Q_T)$ for some matrix $B \in \mathbb{R}^{\hat b \times c}$, for possibly data dependent $\hat b$, possibly tending to infinity. Then $\hat A \hat B - \hat A B = O(Q_T^2 \hat b)$ and thus $\hat A \hat B$ and $\hat A B$ are essentially equivalent for $\hat b Q_T^2 \sqrt T \to 0$ a.s.
For the proof an error bound on $\hat{\mathcal K}_p$ will be needed. This is provided in the next lemma:
Lemma 4.2 Let the conditions of Theorem 3.1 hold. Then $\hat{\mathcal K}_p - \mathcal K^\circ_p = O(Q_T)$ uniformly in $p$, where $\mathcal K^\circ_p$ corresponds to the realisation of the true system in the overlapping canonical form, i.e. $\mathcal K^\circ_p = [K^\circ, (A^\circ - K^\circ C^\circ) K^\circ, \dots, (A^\circ - K^\circ C^\circ)^{p-1} K^\circ]$, using the realisation $(A^\circ, K^\circ, C^\circ)$ of the true system. Also

$$\|\hat A_{ML} - A^\circ\| = O(Q_T), \quad \|\hat K_{ML} - K^\circ\| = O(Q_T), \quad \|\hat C_{ML} - C^\circ\| = O(Q_T),$$
$$\|\hat A_{cca} - A^\circ\| = O(Q_T), \quad \|\hat K_{cca} - K^\circ\| = O(Q_T), \quad \|\hat C_{cca} - C^\circ\| = O(Q_T).$$

It follows that $\mathcal K_p - \mathcal K^\circ_p$ and $\mathcal O_f - \mathcal O^\circ_f$ are of the order $O(Q_T)$, uniformly in $p$ and $f$ respectively. Here $\mathcal K_p$ and $\mathcal O_f$ can be defined using $(\hat A_{ML}, \hat K_{ML}, \hat C_{ML})$ or using $(\hat A_{cca}, \hat K_{cca}, \hat C_{cca})$.
Proof: The norm bound on the error for the ML estimate follows from Theorem 4.3.2 in (Hannan and Deistler, 1988). It follows from Lemma 2 in (Bauer et al., 2000) that

$$\tilde{\mathcal K}_p - \tilde{\mathcal K}^\circ_p = \left[ \mathcal O_f' (\Gamma^+_f)^{-1} \mathcal O_f \right]^{-1} \mathcal O_f' (\Gamma^+_f)^{-1} (\hat\beta - \beta) \left( I - \begin{bmatrix} I_n \\ 0 \end{bmatrix} \tilde{\mathcal K}^\circ_p \right) + o(T^{-1/2})$$

Here $\tilde{\mathcal K}_p$ denotes the matrix $[\hat{\mathcal K}_p]_n^{-1} \hat{\mathcal K}_p$, where $[\hat{\mathcal K}_p]_n$ denotes the first $n$ columns of $\hat{\mathcal K}_p$. Further $\tilde{\mathcal K}^\circ_p = [\mathcal K^\circ_p]_n^{-1} \mathcal K^\circ_p$. These matrices are well defined if $[\mathcal K^\circ_p]_n$ is nonsingular. In the nongeneric case that this is violated, there exists a different normalisation using different columns of $\mathcal K^\circ_p$ such that an analogous result holds. In the proof however we will only deal with the generic case. The differences for the nongeneric cases are minor and thus omitted. In the equation we have used $\hat\beta = \langle Y^+_{t,f}, Y^-_{t,p} \rangle \langle Y^-_{t,p}, Y^-_{t,p} \rangle^{-1}$, and $\beta = \mathcal O^\circ_\infty \mathcal K^\circ_\infty$ denotes the corresponding limit for $T \to \infty$ and thus also $f = p \to \infty$. The main fact used here is the uniform convergence of sample covariances as stated in (Hannan and Deistler, 1988, Theorem 5.3.2): If $\hat\gamma_j = \langle y_t, y_{t-j} \rangle$ and $\gamma_j = E y_t y_{t-j}'$, then $\max_{|j| < (\log T)^a} \|\hat\gamma_j - \gamma_j\| = O(Q_T)$ for $a < \infty$. Let $H_{f,p} = E Y^+_{t,f} (Y^-_{t,p})'$; then $H_{f,p} - \langle Y^+_{t,f}, Y^-_{t,p} \rangle$ is of order $O(Q_T)$ uniformly in $f$ and $p$, and the same is true for $\langle Y^-_{t,p}, Y^-_{t,p} \rangle = \hat\Gamma^-_p$ and the corresponding expectation $\Gamma^-_p$. This shows that

$$\hat\beta - \beta \doteq \left( \langle Y^+_{t,f}, Y^-_{t,p} \rangle - H_{f,p} \right) (\Gamma^-_p)^{-1} + H_{f,p} (\Gamma^-_p)^{-1} (\Gamma^-_p - \hat\Gamma^-_p) (\Gamma^-_p)^{-1}$$

is of order $O(Q_T)$, using the results of Lemma 4.1 and Theorem 6.6.11 of (Hannan and Deistler, 1988), which states $\|(\Gamma^-_p)^{-1}\| \le c$ uniformly in $p$. Therefore $\tilde{\mathcal K}_p - \tilde{\mathcal K}^\circ_p$ is of the same order.
Consider the estimation of the state matrices: $\tilde C_{cca} = \langle y_t, Y^-_{t,p} \rangle \tilde{\mathcal K}_p' (\tilde{\mathcal K}_p \hat\Gamma^-_p \tilde{\mathcal K}_p')^{-1}$. The same arguments as given above show that $\tilde C_{cca} - \tilde C^\circ = O(Q_T)$. Similar arguments also show the same result for $\tilde A_{cca} - \tilde A^\circ$ and $\tilde K_{cca} - \tilde K^\circ$. The overlapping canonical forms can be obtained from a decomposition of the Hankel matrix of the impulse response coefficients $CA^{j-1}K$ into observability and controllability matrices, such that the matrix built of certain rows of the observability matrix described by the structural indices is equal to the identity matrix. Thus the transformation matrix that transforms the given realisation $(\tilde A_{cca}, \tilde K_{cca}, \tilde C_{cca})$ to the overlapping canonical form is the inverse of the corresponding submatrix of the observability matrix $\tilde{\mathcal O}_f$ built of $\tilde A_{cca}$ and $\tilde C_{cca}$. Thus consider $\tilde C_{cca} \tilde A^j_{cca} - \tilde C^\circ (\tilde A^\circ)^j = (\tilde C_{cca} - \tilde C^\circ) \tilde A^j_{cca} + \tilde C^\circ (\tilde A^j_{cca} - (\tilde A^\circ)^j)$. The first term here is obviously of order $O(Q_T)$, and the second term is equal to $\tilde C^\circ ((\tilde A_{cca} - \tilde A^\circ) \tilde A^{j-1}_{cca} + \tilde A^\circ (\tilde A^{j-1}_{cca} - (\tilde A^\circ)^{j-1}))$. Thus it can be shown by induction that the second term is also of the same order, since both $\tilde A_{cca}$ and $\tilde A^\circ$ are stable, i.e. have all their eigenvalues smaller than one, and $j \lambda^j \to 0$, where $1 > \lambda > \max\{|\lambda_{max}(\tilde A_{cca})|, |\lambda_{max}(\tilde A^\circ)|\}$. This shows that the error in the transformation, $\tilde T - \tilde T^\circ$ say, is of order $O(Q_T)$. Thus $\hat{\mathcal K}_p - \mathcal K^\circ_p = \tilde T \tilde{\mathcal K}_p - \tilde T^\circ \tilde{\mathcal K}^\circ_p = (\tilde T - \tilde T^\circ) \tilde{\mathcal K}_p + \tilde T^\circ (\tilde{\mathcal K}_p - \tilde{\mathcal K}^\circ_p)$ is of order $O(Q_T)$. In all the above evaluations the constant involved in the definition of the order can be chosen to be independent of $p$. This follows from an investigation of the steps above.
Note that on the route to the proof of the first claim we also proved the second claim, since the transformed matrices are obtained as $(\tilde T \tilde A_{cca} \tilde T^{-1}, \tilde T \tilde K_{cca}, \tilde C_{cca} \tilde T^{-1})$. The remaining claims follow from the arguments given above, since the blocks in $\mathcal O_f - \mathcal O^\circ_f$ and $\mathcal K_p - \mathcal K^\circ_p$ consist of terms of the form $\hat C_{cca} \hat A^j_{cca} - C^\circ (A^\circ)^j$ or $(\hat A_{cca} - \hat K_{cca} \hat C_{cca})^j \hat K_{cca} - (A^\circ - K^\circ C^\circ)^j K^\circ$ respectively. Terms of this form have been dealt with before. $\Box$
The key to the proof lies in imposing more of the structure on the estimates of the state obtained from the subspace procedure. That this can be done without changing the asymptotic distribution of the error in some cases is shown in the next lemma:

Lemma 4.3 Let $\hat x_t = \hat{\mathcal K}_p Y^-_{t,p}$ and let $\tilde x_t = \hat{\mathcal K}_\infty Y^-_{t,\infty}$ with $\hat{\mathcal K}_\infty = [\hat K_{cca}, \hat{\bar A}_{cca} \hat K_{cca}, \dots]$, where $y_t = 0, t < 1$ is used and $\hat{\bar A}_{cca} = \hat A_{cca} - \hat K_{cca} \hat C_{cca}$. Then

$$\langle \hat x_t - \tilde x_t, \hat\varepsilon_{t-1} \rangle \doteq 0, \quad \langle \hat x_t - \tilde x_t, \hat x_{t-1} \rangle \doteq 0, \quad \langle \hat x_t - \tilde x_t, y_{t+j} \rangle \doteq 0, \quad \langle \hat x_t - \tilde x_t, \varepsilon_{t+j} \rangle \doteq 0, \quad j \ge 0$$
Proof: Start with the first claim: Let $\delta_{t+1} = \hat x_{t+1} - \tilde x_{t+1} = (\hat{\mathcal K}_p - \mathcal K_p) Y^-_{t+1,p} - \hat{\bar A}^p_{cca} \tilde x_{t-p+1}$. Also $\delta_{t+1} = \hat{\bar A}_{cca} \delta_t + \phi_t$, where $\phi_t = \hat x_{t+1} - \hat{\bar A}_{cca} \hat x_t - \hat K_{cca} y_t$. Thus $\langle \delta_{t+1}, \hat\varepsilon_t \rangle = \hat{\bar A}_{cca} \langle \delta_t, \hat\varepsilon_t \rangle$. Note that $\langle \delta_t, \hat\varepsilon_t - \varepsilon_t \rangle \doteq 0$, which follows from Lemma 4.1 using the error bound on the estimated system matrices, since

$$\langle \delta_t, \hat\varepsilon_t - \varepsilon_t \rangle = \langle (\hat{\mathcal K}_p - \mathcal K_p) Y^-_{t,p}, C^\circ x^\circ_t - \hat C_{cca} \hat x_t \rangle - \langle \hat{\bar A}^p_{cca} \tilde x_{t-p}, C^\circ x^\circ_t - \hat C_{cca} \hat x_t \rangle$$

Each of the terms in the above expression is equal to a product of three matrices, two of which are of order $O(Q_T)$. As an example investigate the first term: $(\hat{\mathcal K}_p - \mathcal K_p) \langle Y^-_{t,p}, C^\circ x^\circ_t - \hat C_{cca} \hat x_t \rangle = (\hat{\mathcal K}_p - \mathcal K_p) \langle Y^-_{t,p}, x^\circ_t \rangle (C^\circ - \hat C_{cca})' + (\hat{\mathcal K}_p - \mathcal K_p) \langle Y^-_{t,p}, x^\circ_t - \hat x_t \rangle \hat C_{cca}'$. The entries of $\hat{\mathcal K}_p - \mathcal K_p$ are of the required order due to Lemma 4.2. Also $\hat x_t - x^\circ_t = (\hat{\mathcal K}_p - \mathcal K^\circ_p) Y^-_{t,p} - (A^\circ - K^\circ C^\circ)^p x^\circ_{t-p}$. The conjecture then follows from $\hat{\bar A}^p_{cca} = O(Q_T)$ and the analogous result for $(A^\circ - K^\circ C^\circ)^p$, due to the choice of $p = \lceil d \hat p_{BIC} \rceil$. Finally $\langle y_{t-j}, \varepsilon_t \rangle = O(Q_T)$, due to the uniform convergence of sample covariances, shows that $\langle \delta_t, \varepsilon_t \rangle \doteq 0$ and thus $\langle \delta_t, \hat\varepsilon_t \rangle \doteq 0$, implying $\langle \delta_{t+1}, \hat\varepsilon_t \rangle \doteq 0$.

For the second claim note that $\langle \delta_t, \hat x_t \rangle \doteq \langle \delta_t, \tilde x_t \rangle = \langle \delta_t, \tilde x_{t-1} \rangle \hat A_{cca}' + \langle \delta_t, \tilde\varepsilon_{t-1} \rangle \hat K_{cca}' \doteq \langle \delta_{t+1}, \tilde x_t \rangle \hat A_{cca}'$, where we used $\langle \delta_{t+1}, \tilde\varepsilon_t - \hat\varepsilon_t \rangle \doteq 0$ and the first claim. Thus

$$\langle \delta_{t+1}, \hat x_t \rangle = \hat{\bar A}_{cca} \langle \delta_t, \hat x_t \rangle \doteq \hat{\bar A}_{cca} \langle \delta_{t+1}, \hat x_t \rangle \hat A_{cca}'$$

which shows the second claim. $\langle \delta_t, \hat x_t - x^\circ_t \rangle \doteq 0$ analogously to $\langle \delta_t, \hat\varepsilon_t - \varepsilon_t \rangle \doteq 0$. Thus $\langle \delta_t, x^\circ_t \rangle \doteq 0$. This shows $\langle \delta_t, y_{t+j} \rangle \doteq 0$, since $y_{t+j} = \varepsilon_{t+j} + \sum_{i=1}^{j-1} C^\circ (A^\circ)^{i-1} K^\circ \varepsilon_{t+j-i} + C^\circ (A^\circ)^j x^\circ_t$ and $\langle \delta_t, \varepsilon_{t+j} \rangle \doteq 0, j \ge 0$, as is straightforward to show using the uncorrelatedness of $\varepsilon_{t+j}$ and $Y^-_{t,p}$ in that case. $\Box$
It has been stated already that the symmetry between forward and backward representation will play a major role in the proof. The definition of the SVD in the CCA case leads to

$$\hat{\mathcal K}_p \langle Y^-_{t,p}, Y^-_{t,p} \rangle = \hat\Sigma_z^{-1} \check{\mathcal O}_f' \langle Y^+_{t,f}, Y^-_{t,p} \rangle, \qquad \check{\mathcal O}_f' \langle Y^+_{t,f}, Y^+_{t,f} \rangle = \hat\Sigma_x^{-1} \hat{\mathcal K}_p \langle Y^-_{t,p}, Y^+_{t,f} \rangle \eqno{(4)}$$

since $\hat\Gamma^+_f = \langle Y^+_{t,f}, Y^+_{t,f} \rangle$, $\hat\Gamma^-_p = \langle Y^-_{t,p}, Y^-_{t,p} \rangle$ and the choice of the weightings $\hat W^+_f$ and $\hat W^-_p$. Here $\hat\Sigma_x = \langle \hat x_t, \hat x_t \rangle$ and $\hat\Sigma_z = \langle \hat z_t, \hat z_t \rangle$, using $\hat z_t = \hat{\mathcal O}_f' (\hat\Gamma^+_f)^{-1} Y^+_{t+1,f}$, where $\hat{\mathcal O}_f = (\hat W^+_f)^{-1} \hat U_n \hat\Sigma_n \hat T^{-1}$ and $\check{\mathcal O}_f = (\hat\Gamma^+_f)^{-1} \hat{\mathcal O}_f$. In fact, one might ask why the system matrices are estimated from the forward state estimate rather than using the backward state estimate $\hat z_t$ and calculating the forward representation from these estimates. Thus compare the two estimated systems, obtained from the forward and the backward state estimates respectively: Using the state estimates $\hat x_t$ and $\hat z_t$ respectively, two different systems could be estimated:

$$\left( [\hat{\bar A}_{cca}, \hat K_{cca}], \hat C_{cca} \right) = \left( \langle \hat x_{t+1}, \begin{bmatrix} \hat x_t \\ y_t \end{bmatrix} \rangle \langle \begin{bmatrix} \hat x_t \\ y_t \end{bmatrix}, \begin{bmatrix} \hat x_t \\ y_t \end{bmatrix} \rangle^{-1}, \; \langle y_t, \hat x_t \rangle \hat\Sigma_x^{-1} \right)$$
$$\left( [\hat{\bar A}^b_{cca}, \hat K^b_{cca}], \hat C^b_{cca} \right) = \left( \langle \hat z_{t-1}, \begin{bmatrix} \hat z_t \\ y_t \end{bmatrix} \rangle \langle \begin{bmatrix} \hat z_t \\ y_t \end{bmatrix}, \begin{bmatrix} \hat z_t \\ y_t \end{bmatrix} \rangle^{-1}, \; \langle y_t, \hat z_t \rangle \hat\Sigma_z^{-1} \right)$$
There are a couple of interesting links between these two estimates. One result which will be needed further on is the following:

$$\hat C_{cca} = \langle y_t, \hat x_t \rangle \hat\Sigma_x^{-1} = [I, 0] \langle Y^+_{t,f}, Y^-_{t,p} \rangle \hat{\mathcal K}_p' \hat\Sigma_x^{-1} = [I, 0] \langle Y^+_{t,f}, Y^+_{t,f} \rangle \check{\mathcal O}_f = \langle y_t, \hat z_{t-1} \rangle \eqno{(5)}$$

This implies $\hat C^b_{cca} = \langle y_t, \hat x_{t+1} \rangle$. Next consider the estimation of $A$:

$$\hat A_{cca} = \langle \hat x_{t+1}, \hat x_t \rangle \hat\Sigma_x^{-1} = \hat{\mathcal K}_p \langle Y^-_{t+1,p}, Y^-_{t+1,p+1} \rangle [0, \hat{\mathcal K}_p]' \hat\Sigma_x^{-1} \doteq \hat\Sigma_z^{-1} \check{\mathcal O}_f' \langle Y^+_{t+1,f}, Y^-_{t+1,p+1} \rangle [0, \hat{\mathcal K}_p]' \hat\Sigma_x^{-1} = \hat\Sigma_z^{-1} \langle \hat z_t, \hat x_t \rangle \hat\Sigma_x^{-1}$$
$$\hat A^b_{cca} = \langle \hat z_{t-1}, \hat z_t \rangle \hat\Sigma_z^{-1} = \check{\mathcal O}_f' \langle Y^+_{t,f}, Y^+_{t,f+1} \rangle \begin{bmatrix} 0 \\ \check{\mathcal O}_f \end{bmatrix} \hat\Sigma_z^{-1} \doteq \hat\Sigma_x^{-1} \hat{\mathcal K}_p \langle Y^-_{t,p}, Y^+_{t,f+1} \rangle \begin{bmatrix} 0 \\ \check{\mathcal O}_f \end{bmatrix} \hat\Sigma_z^{-1} = \hat\Sigma_x^{-1} \langle \hat x_t, \hat z_t \rangle \hat\Sigma_z^{-1}$$

Here the essential equivalence is due to the neglection of the terms $\langle [\hat{\mathcal K}_p]_p y_{t-p}, \hat x_t \rangle$ and $\langle [\check{\mathcal O}_f']_f y_{t+f}, \hat z_t \rangle$ respectively, where $[X]_l$ denotes the $l$-th block column of the matrix $X$. These are neglectable due to Lemma 4.1 and the increase of $f$ and $p$ respectively of order $\log T$. Therefore the difference $\hat A_{cca}' - \hat A^b_{cca}$ is neglectable, and the estimates of the dynamics for the forward and the backward system are essentially identical. The following lemma clarifies the difference between calculated and estimated backward system. The proof will be omitted, as only a weaker bound on the error is needed in the following.
Lemma 4.4 Under the assumptions of Theorem 3.1 let $(\hat A^b_{cca}, \hat K^b_{cca}, \hat C^b_{cca})$ denote the system estimated from the estimate of the backward state $\hat z_t$. Let $(\tilde A^b_{cca}, \tilde K^b_{cca}, \tilde C^b_{cca})$ denote the system calculated from the estimated forward system $(\hat A_{cca}, \hat K_{cca}, \hat C_{cca})$. Then

$$\hat A^b_{cca} \doteq \tilde A^b_{cca}, \quad \hat K^b_{cca} \doteq \tilde K^b_{cca}, \quad \hat C^b_{cca} \doteq \tilde C^b_{cca}$$
In fact, in the proof below not the neglectability of the error is needed, but only an error bound of the order $O(Q_T)$, which can be established easily by bounding the difference to the true backward representation: For the calculated system this follows from the differentiability of the backward system matrices with respect to the entries in the forward representation, together with the respective error bound for the estimated forward system. For the estimated backward system an analysis similar to the arguments of Lemma 4.2 can be applied also for the backward analysis, leading to the required result. Finally it is noted that due to the symmetry of backward and forward representations the results of Lemma 4.3 also imply analogous results for the backward state, e.g. $\langle \tilde z_t - \hat z_t, y_{t-j} \rangle \doteq 0, j > 0$, or $\langle \tilde z_t - \hat z_t, \hat z_{t+j} \rangle \doteq 0, j = 0, 1$. Here $\tilde z_t$ can stand for both the backward state calculated from $(\hat A^b_{cca}, \hat K^b_{cca}, \hat C^b_{cca})$ or from $(\tilde A^b_{cca}, \tilde K^b_{cca}, \tilde C^b_{cca})$ respectively, due to Lemma 4.4. The proof of this statement is completely analogous to the proof of Lemma 4.3.
4.2 The criterion function $M_f(Y^+_{1,T}; \theta)$
The main trouble with maximum likelihood estimates is that they are only given implicitly. Therefore the criterion function has to be adapted to the equation defining the ML estimate in order to derive its properties at the ML estimate. On the other hand there has to be a connection to the structure of the subspace estimates. This motivates the choice

$$M_f(Y^+_{1,T}; \theta) = \frac{1}{f} \mathrm{tr}\left[ (\bar\Sigma_f)^{-1} \langle Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}, Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} \rangle \right] \eqno{(6)}$$

Here $\Sigma_f = I_f \otimes \Sigma$, where $\otimes$ denotes the Kronecker product, i.e. $\Sigma_f$ denotes the block diagonal matrix with $f$ diagonal blocks equal to $\Sigma$, and the notation $\bar\Sigma_f = \mathcal E_f \Sigma_f \mathcal E_f'$ is used. For $f = 1$ and $p = \infty$ this is identical to the prediction error criterion function. It is not hard to show that under the assumptions on $f$ and $p$ given in Theorem 3.1 the function $\bar M_f(Y^+_{1,T}; \theta) = (\log\det \bar\Sigma_f)/f + M_f(Y^+_{1,T}; \theta)$ converges to the asymptotic likelihood. On the other hand it is straightforward to show that if no restrictions on the entries of $\mathcal E_f, \mathcal O_f, \mathcal K_p$ and $\Sigma_f$ are imposed, the matrices $\hat{\mathcal E}_f, \hat{\mathcal O}_f, \hat{\mathcal K}_p$ and $\hat\Sigma_f$ obtained as the CCA choices optimize $M_f(Y^+_{1,T}; \theta)$. This follows from the properties of the singular value decomposition (cf. Stoorvogel and Van Schuppen, 1997, for a discussion of this issue). The difficult part of the proof lies in imposing the full structure on the matrices $\mathcal E_f, \Sigma_f, \mathcal O_f$ and $\mathcal K_p$ for the CCA estimates.
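Criterion (6) can be written down directly. The helper below is a sketch under the assumption that the stacked data matrices and the structured matrices $\mathcal O_f$, $\mathcal K_p$ and $\bar\Sigma_f$ are already available, with columns of Yf and Yp indexing $t$:

```python
import numpy as np

def M_f(f, Yf, Yp, O_f, K_p, Sigbar_f):
    """Criterion (6): (1/f) tr[ Sigbar_f^{-1} <e_t, e_t> ], e_t = Y+_{t,f} - O_f K_p Y-_{t,p}."""
    E = Yf - O_f @ K_p @ Yp
    G = E @ E.T / E.shape[1]              # sample second moment <e_t, e_t>
    return np.trace(np.linalg.solve(Sigbar_f, G)) / f

# degenerate check with f = 1, s = 1, O_f K_p = 0, Sigbar_f = I:
# the criterion reduces to the mean of y_t^2
Yf = np.array([[1.0, 2.0, 3.0]])
Yp = np.zeros((1, 3))
val = M_f(1, Yf, Yp, np.eye(1), np.zeros((1, 1)), np.eye(1))
```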
4.3 Properties of the derivative at $\hat\theta_{ML}$
The proof is based on the fact that

$$0 \doteq \partial_i L(Y^+_{1,T}; \hat\theta_{ML}) \doteq \left\{ \frac{1}{T} \sum_{t=1}^T \varepsilon_t(\hat\theta_{ML})' \hat\Sigma(\hat\theta_{ML})^{-1} \sum_{j=1}^\infty \partial_i \mathcal K(j; \hat\theta_{ML}) y_{t-j} \right\}$$

Here $\partial_i \mathcal K(j; \hat\theta_{ML})$ denotes the derivative of the $j$-th coefficient of the inverse transfer function $k(z; \hat\theta_{ML})^{-1}$ with respect to the $i$-th coordinate of $\theta$. This equality follows from the definition of the estimate. Straightforward calculations show that

$$\partial_i M_f(Y^+_{1,T}; \theta) = -\frac{2}{f} \mathrm{tr}\left[ \bar\Sigma_f^{-1} (\partial_i \mathcal E_f) \mathcal E_f^{-1} \left\{ \langle Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}, Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} \rangle \right\} \right] - \frac{2}{f} \mathrm{tr}\left[ \bar\Sigma_f^{-1} \langle (\partial_i \mathcal O_f) \mathcal K_p Y^-_{t,p} + \mathcal O_f (\partial_i \mathcal K_p) Y^-_{t,p}, Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} \rangle \right] \eqno{(7)}$$
Corresponding to the terms involving $\partial_i \mathcal O_f$ and $\partial_i \mathcal K_p$ respectively, note that $Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p} = \mathcal E^\circ_f E^{+,\circ}_{t,f} + \mathcal O^\circ_f x^\circ_t - \mathcal O_f \mathcal K_p Y^-_{t,p}$. Here quantities using the true system $\theta_\circ$ are denoted with the additional superscript $\circ$. Also note that $\langle y_{t-j}, \varepsilon_t \rangle = O(Q_T)$ due to the uniform convergence of the sample covariances (see e.g. Hannan and Deistler, 1988, Theorem 5.3.2). It is easy to see that $\mathcal O_f, \mathcal K_p, \partial_i \mathcal O_f, \partial_i \mathcal K_p$ all have entries decreasing geometrically, since $\hat\theta_{ML}$ will enter any neighbourhood of $\theta_\circ$ a.s. for $T$ large enough according to Lemma 4.2. The geometric decrease then follows from the stability and the strict minimum-phase assumption on the system corresponding to $\theta_\circ$. Therefore the sum of the absolute values of the entries of these matrices is bounded uniformly in $f, p$ and $T$ large enough. It follows from easily established results about the multiplication with block Toeplitz matrices that the premultiplication with $(\mathcal E_f \Sigma_f \mathcal E_f')^{-1}$ does not change these properties (see e.g. Bauer, 1998, Chapter 4). Therefore the term due to $\mathcal E^\circ_f E^{+,\circ}_{t,f}$ contributes as

$$-\frac{2}{f} \mathrm{tr}\left[ \mathcal K_p \langle Y^-_{t,p}, E^{+,\circ}_{t,f} \rangle (\mathcal E^\circ_f)' \bar\Sigma_f^{-1} (\partial_i \mathcal O_f) + (\partial_i \mathcal K_p) \langle Y^-_{t,p}, E^{+,\circ}_{t,f} \rangle (\mathcal E^\circ_f)' \bar\Sigma_f^{-1} \mathcal O_f \right]$$

Here $\bar\Sigma_f = \mathcal E_f \Sigma_f \mathcal E_f'$ is used. It is easy to see that the 1-norm of this matrix is of the order $O(Q_T/f) = O(\sqrt{\log\log T}/(\sqrt T f))$ and thus $o(1/\sqrt T)$ if $\sqrt{\log\log T} = o(f)$. This is true for the choice of $f$ stated in the theorem, which is of order $\log T$ a.s. Hence this term may be neglected.
For the term due to $\mathcal O^\circ_f x^\circ_t - \mathcal O_f \mathcal K_p Y^-_{t,p}$ the essential equivalence $\mathcal O^\circ_f x^\circ_t - \mathcal O_f \mathcal K_p Y^-_{t,p} \doteq (\mathcal O^\circ_f - \mathcal O_f) \mathcal K^\circ_p Y^-_{t,p} + \mathcal O_f (\mathcal K^\circ_p - \mathcal K_p) Y^-_{t,p}$ holds, as follows from the size of $p = p(T)$, i.e. the state reconstruction can be truncated without affecting the asymptotic behaviour. Here $\mathcal O^\circ_f - \mathcal O_f$ and $\mathcal K^\circ_p - \mathcal K_p$ are of order $O(Q_T)$ as shown in Lemma 4.2. Analogously to above, investigate the contribution of $\mathcal O^\circ_f - \mathcal O_f$ to the second summand of the derivative:

$$-\frac{2}{f} \mathrm{tr}\left[ \mathcal K_p \hat\Gamma^-_p (\mathcal K^\circ_p)' (\mathcal O^\circ_f - \mathcal O_f)' \bar\Sigma_f^{-1} (\partial_i \mathcal O_f) + (\partial_i \mathcal K_p) \hat\Gamma^-_p (\mathcal K^\circ_p)' (\mathcal O^\circ_f - \mathcal O_f)' \bar\Sigma_f^{-1} \mathcal O_f \right]$$
$$\doteq -\frac{2}{f} \mathrm{tr}\left[ E\{x^\circ_t (x^\circ_t)'\} (\mathcal O^\circ_f - \mathcal O_f)' (\bar\Sigma^\circ_f)^{-1} (\partial_i \mathcal O^\circ_f) + (\partial_i \mathcal K_p) E\{Y^-_{t,p} (x^\circ_t)'\} (\mathcal O^\circ_f - \mathcal O_f)' (\bar\Sigma^\circ_f)^{-1} \mathcal O_f \right]$$

From the error bound $\mathcal O^\circ_f - \mathcal O_f = O(Q_T)$ the neglectability of this contribution is obtained. The second term, due to $\mathcal K^\circ_p - \mathcal K_p$, can be treated analogously. Thus the essential term in (7) is equal to

$$\partial_i M_f(Y^+_{1,T}; \theta) \doteq -\frac{2}{f} \mathrm{tr}\left[ \Sigma_f^{-1} \mathcal E_f^{-1} (\partial_i \mathcal E_f) \mathcal E_f^{-1} \left\{ \langle Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}, \mathcal E_f^{-1} (Y^+_{t,f} - \mathcal O_f \mathcal K_p Y^-_{t,p}) \rangle \right\} \right] \eqno{(8)}$$
Note that up to now no properties of the ML estimate except for the norm bound on the error have been used. Thus also for the estimate $\hat\theta_{cca}$ the essential term is given by equation (8). This follows from Lemma 4.2.
For the investigation of the essential term the special properties of the ML estimate are used. Analogously to the arguments given above it follows that the term in equation (8) involving $\mathcal{O}_f\mathcal{K}_p\langle Y^-_{t,p},\mathcal{E}_f^{-1}(Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p})\rangle$ is neglectable, since $\sqrt{\log\log T}=o(f)$. Thus we are led to examine $\frac{2}{f}\mathrm{tr}\big[\Omega_f^{-1}\,\partial_i(\mathcal{E}_f^{-1})\langle Y^+_{t,f},E^+_{t,f}(\hat\theta_{ML})\rangle\big]$ evaluated at $\hat\theta_{ML}$. Here $\partial_i(\mathcal{E}_f^{-1})$ is equal to the block Toeplitz matrix containing the derivatives of the coefficients of the inverse transfer function $k(z;\hat\theta_{ML})^{-1}=(I+z\hat C_{ML}(I-z\hat A_{ML})^{-1}\hat K_{ML})^{-1}$ as its blocks. Here $z$ denotes the backward shift operator. Furthermore $E^+_{t,f}(\hat\theta_{ML})=\mathcal{E}_f^{-1}(Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_\infty Y^-_{t,\infty})$ has been used, where $E^+_{t,f}(\hat\theta_{ML})$ denotes the vector obtained from stacking the estimated residuals $y_t-\hat C_{ML}\tilde x_t$ into a vector analogously as the vector $Y^+_{t,f}$ is built of $y_t$. Here $\tilde x_t$ denotes the estimate of the state using the system $\hat\theta_{ML}$. Thus the $(l,j)$-th block in the matrix whose trace is calculated in equation (8) is essentially
equal to the inner product of a truncated version of $\partial_i\varepsilon^\infty_{t+j}(\hat\theta_{ML})=\sum_{r=1}^{\infty}\partial_iK(r;\hat\theta_{ML})y_{t+j-r}$ (i.e. $\partial_i\varepsilon^j_{t+j}(\hat\theta_{ML})=\sum_{r=1}^{j-1}\partial_iK(r;\hat\theta_{ML})y_{t+j-r}$) with $\varepsilon_{t+l}(\hat\theta_{ML})$ for $0\le j,l\le f-1$. Since the trace is examined, only the diagonal blocks, i.e. the blocks for $j=l$, have to be considered. Note that $\langle\partial_i\varepsilon^\infty_{t+j}(\hat\theta_{ML}),\varepsilon_{t+j}(\hat\theta_{ML})\rangle\doteq 0$ for all $j\le f$, as follows from a neglection of the initial values, which is straightforward to justify. Therefore the crucial term is essentially equal to
$$\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\hat\Omega^{-1}\big\langle\partial_i\varepsilon^j_{t+j}(\hat\theta_{ML}),\varepsilon_{t+j}(\hat\theta_{ML})\big\rangle\Big]\doteq-\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\hat\Omega^{-1}\Big\langle\sum_{r=j+1}^{\infty}\partial_iK(r;\hat\theta_{ML})y_{t-r},\,\varepsilon_t(\hat\theta_{ML})\Big\rangle\Big]$$
Again the essential equality is due to the neglection of initial effects. Note that $\langle y_{t-r},\varepsilon_t(\hat\theta_{ML})\rangle=O(Q_T)$ uniformly in $r$ for $r=O(a\log T)$ for $a<\infty$, since in this case $\langle y_{t-r},\varepsilon_t\rangle=O(Q_T)$ and also $\langle y_{t-r},\varepsilon_t(\hat\theta_{ML})-\varepsilon_t\rangle=O(Q_T)$ due to the norm bound on the estimation error in the system matrices shown in Lemma 4.2. Terms of larger $r$ are premultiplied by $\partial_iK(r;\hat\theta_{ML})=O(\rho^r)=O(T^{a\log\rho})=o(T^{-1})$ for $a\log\rho<-1$. Therefore the above expression is of order
$$O\Big(\frac{2}{f}\sum_{j=1}^{f}\sum_{r=j+1}^{a\log T}\|\partial_iK(r;\hat\theta_{ML})\|Q_T\Big)=O\Big(\frac{2Q_T}{f}\sum_{j=1}^{f}\sum_{r=j+1}^{a\log T}\rho^r\Big)=O(Q_T/f)$$
for some $0<\rho<1$. This follows from the stability and the strict minimum-phase assumption, which also holds for the derivative of the transfer function. Therefore $\partial_iM_f(Y^+_{1,T};\hat\theta_{ML})\doteq 0$ for all $i$, as is required.
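The coefficients $K(r;\theta)$ of the inverse transfer function used throughout this argument have an explicit state-space form: since $k(z)=I+zC(I-zA)^{-1}K$, one has $k(z)^{-1}=I-zC(I-z(A-KC))^{-1}K$, so the $r$-th coefficient of the inverse is $-C(A-KC)^{r-1}K$, which decays geometrically under the strict minimum-phase assumption. A minimal numerical sketch (the matrices are hypothetical illustration values, not taken from the paper):

```python
import numpy as np

# Coefficients of k(z) = I + z C (I - zA)^{-1} K and of its inverse
# k(z)^{-1} = I - z C (I - z(A - KC))^{-1} K; convolving the two
# coefficient sequences must give 1 at lag 0 and zero afterwards.
A = np.array([[0.5, 0.1], [0.0, 0.3]])   # stable (eigenvalues 0.5, 0.3)
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Abar = A - K @ C                          # stable by strict minimum-phase

R = 30
k = [1.0] + [(C @ np.linalg.matrix_power(A, r - 1) @ K).item() for r in range(1, R)]
kinv = [1.0] + [(-C @ np.linalg.matrix_power(Abar, r - 1) @ K).item() for r in range(1, R)]

# Convolution of k and k^{-1}: should be [1, 0, 0, ...].
conv = [sum(k[j] * kinv[r - j] for j in range(r + 1)) for r in range(R)]
```

The geometric decay of `kinv` is exactly the uniform stability of the family of filters invoked in the proof.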
4.4 Properties of the derivative at $\hat\theta_{cca}$
The evaluations in this section are more involved. The main strategy is to reduce the expression of the derivative of the function $M_f(Y^+_{1,T};\hat\theta_{cca})$ to terms which are known to be neglectable, i.e. $o(T^{-1/2})$. The results of Lemma 4.3 and Lemma 4.4 will serve as the main basis for this, i.e. the derivative will be related to terms of the form $\langle\hat x_t-\tilde x_t,y_{t+j}\rangle$ and $\langle\tilde z_t-\hat z_t,y_{t-j}\rangle$ respectively.
Recall that the first part of the proof is unchanged for $\hat\theta_{cca}$ replacing $\hat\theta_{ML}$. Therefore the interesting term is
$$\partial_iM_f(Y^+_{1,T};\hat\theta_{cca})\doteq\frac{2}{f}\,\mathrm{tr}\Big[\Gamma_f^{-1}(\partial_i\mathcal{E}_f)\mathcal{E}_f^{-1}\big\{\big\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},\,Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\big\rangle\big\}\Big] \qquad (9)$$
In this section terms like $\partial_i\mathcal{E}_f$, $\mathcal{E}_f$, $\mathcal{O}_f$, $\mathcal{K}_p$, $\Gamma_f$ and $\Omega_f$ are always formed using $(\hat A_{cca},\hat K_{cca},\hat C_{cca})$ and $\hat\Omega$ or the corresponding backward system. Consider the matrix $\partial_i\mathcal{E}_f$, where the derivative is with respect to an entry in $A$ or $K$ first. The derivatives with respect to entries in $C$ will be dealt with later on. It follows from the form of $\mathcal{E}_f$ that the $j$-th block column of the derivative $\partial_i\mathcal{E}_f$ is given by
$$\begin{bmatrix}0_{sj\times n}\\ \partial_i(\mathcal{O}_{f-j}K)\end{bmatrix}=\begin{bmatrix}0_{sj\times n}\\ \mathcal{O}_{f-j}(\partial_iK)\end{bmatrix}+\begin{bmatrix}0_{s(j+1)\times n}\\ (\partial_i\mathcal{O}_{f-j-1})\hat A_{cca}\hat K_{cca}+\mathcal{O}_{f-j-1}(\partial_iA)\hat K_{cca}\end{bmatrix}$$
Therefore terms like $\frac{2}{f}\langle E^+_{t,f}(\hat\theta_{cca}),Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\rangle\Gamma_f^{-1}\big[0_{js\times n}',\mathcal{O}_{f-j}'\big]'$ have to be considered. Here $E^+_{t,f}(\hat\theta_{cca})=\mathcal{E}_f^{-1}(Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p})$. Using the structure of $\mathcal{E}_f$ one obtains
$$\Gamma_f^{-1}\begin{bmatrix}0_{js\times n}\\\mathcal{O}_{f-j}\end{bmatrix}=\begin{bmatrix}(\mathcal{E}_j')^{-1}&-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_{f-j}'(\mathcal{E}_{f-j}')^{-1}\\0&(\mathcal{E}_{f-j}')^{-1}\end{bmatrix}\Omega_f^{-1}\begin{bmatrix}0_{js\times n}\\\mathcal{E}_{f-j}^{-1}\mathcal{O}_{f-j}\end{bmatrix}=\begin{bmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_{f-j}'\bar{\mathcal{O}}_{f-j}\\\bar{\mathcal{O}}_{f-j}\end{bmatrix}$$
Here $(\mathcal{E}_{f-j}')^{-1}\Omega_{f-j}^{-1}\mathcal{E}_{f-j}^{-1}\mathcal{O}_{f-j}=\bar{\mathcal{O}}_{f-j}$, where this defines $\bar{\mathcal{O}}_{f-j}$, and $\mathcal{C}_j=\big[\hat A^{j-1}_{cca}\hat K_{cca},\hat A^{j-2}_{cca}\hat K_{cca},\ldots,\hat K_{cca}\big]$.
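The block Toeplitz structure exploited in this computation is easy to verify numerically: $\mathcal{E}_f$ is block lower triangular Toeplitz with blocks $I$ and $CA^{r-1}K$, and its inverse is the analogous Toeplitz matrix built from the coefficients $-C(A-KC)^{r-1}K$ of the inverse transfer function. A small sketch with hypothetical matrices ($s=1$, $f=6$; not the paper's code):

```python
import numpy as np

# E_f: lower triangular Toeplitz matrix with entries k_0 = 1 and
# k_r = C A^{r-1} K; its inverse is the analogous Toeplitz matrix
# built from the inverse transfer function coefficients -C (A-KC)^{r-1} K.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
f = 6

k = [1.0] + [(C @ np.linalg.matrix_power(A, r - 1) @ K).item() for r in range(1, f)]
E_f = np.zeros((f, f))
for i in range(f):
    for j in range(i + 1):
        E_f[i, j] = k[i - j]

Abar = A - K @ C
kinv = [1.0] + [(-C @ np.linalg.matrix_power(Abar, r - 1) @ K).item() for r in range(1, f)]
E_f_inv = np.linalg.inv(E_f)
```

That the inverse of a lower triangular Toeplitz matrix is again lower triangular Toeplitz is what makes the partitioned computations above tractable.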
An analogous partitioning leads to:
$$\big[\Gamma_\infty^{-1}\mathcal{O}_\infty\big]_f=\left[\begin{bmatrix}(\mathcal{E}_f')^{-1}&-(\mathcal{E}_f')^{-1}\mathcal{C}_f'\mathcal{O}_\infty'(\mathcal{E}_\infty')^{-1}\\0&(\mathcal{E}_\infty')^{-1}\end{bmatrix}\Omega_\infty^{-1}\begin{bmatrix}\mathcal{E}_f^{-1}\mathcal{O}_f\\\mathcal{E}_\infty^{-1}\big(\mathcal{O}_\infty\hat A^f_{cca}-\mathcal{O}_\infty\mathcal{C}_f\mathcal{E}_f^{-1}\mathcal{O}_f\big)\end{bmatrix}\right]_f=-(\mathcal{E}_f')^{-1}\mathcal{C}_f'\mathcal{O}_\infty'\Gamma_\infty^{-1}\mathcal{O}_\infty(\hat A_{cca}-\hat K_{cca}\hat C_{cca})^f+\bar{\mathcal{O}}_f$$
Here $[X]_f$ denotes the matrix of the first $f$ block rows of the matrix $X$. Thus the norm of $[\Gamma_\infty^{-1}\mathcal{O}_\infty]_f-\bar{\mathcal{O}}_f$ is smaller than $C(\rho)\rho^f$ for all $f$ for some $C<\infty$, where $|\lambda_{\max}(\hat A_{cca}-\hat K_{cca}\hat C_{cca})|<\rho<1$. Note that $(\mathcal{K}^b_\infty)'=(\Gamma^+_\infty)^{-1}\mathcal{O}_\infty=(\mathcal{E}_\infty\Omega_\infty\mathcal{E}_\infty')^{-1}\mathcal{O}_\infty S$, since $\Gamma^+_\infty=(\mathcal{E}_\infty\Omega_\infty\mathcal{E}_\infty')+\mathcal{O}_\infty\Sigma_x\mathcal{O}_\infty'$. Thus the above result implies that $(\mathcal{K}^b_f)'S^{-1}-\bar{\mathcal{O}}_f=O(\rho^f)$.
In the next step the matrix $\bar{\mathcal{O}}_{f-j}$ will be replaced with $(\mathcal{K}^b_{f-j})'S^{-1}$, where $\mathcal{K}^b_f$ denotes the matrix consisting of the first $f$ block columns of $\mathcal{K}^b$. The error introduced by this replacement is of the form
$$\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,Y^+_{t,f+j}-\mathcal{O}_{f+j}\mathcal{K}_pY^-_{t,p}\big\rangle\left[\begin{pmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_{f-j}'\bar{\mathcal{O}}_{f-j}\\\bar{\mathcal{O}}_{f-j}\end{pmatrix}-\begin{pmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_f'(\mathcal{K}^b_f)'S^{-1}\\(\mathcal{K}^b_{f-j})'S^{-1}-(\mathcal{K}^b_j)'((\bar A^b)^{f-j})'S^{-1}\end{pmatrix}\right]$$
where $l<j$ and $\bar A^b=\tilde A^b_{cca}-\tilde K^b_{cca}\tilde C^b_{cca}$. Note that $\langle\varepsilon_{t+l}(\hat\theta_{cca}),\varepsilon_{t+j}(\hat\theta_{cca})\rangle=O(Q_T)$ for $j\neq l$, as follows from $\langle\varepsilon_{t+l},\varepsilon_{t+j}\rangle=O(Q_T)$
and the fact that the estimation errors in the system matrix estimates are of order $O(Q_T)$. The terms $\langle\varepsilon_{t+l},\varepsilon_{t+l}\rangle$ can be replaced by $\langle\varepsilon_{t+l},\varepsilon_{t+l}\rangle-\hat\Omega$ due to the lower triangular block structure of $\mathcal{E}_f$ and $\partial_i\mathcal{E}_f$, and this term is also of order $O(Q_T)$. Recalling the bound on the norm of $\bar{\mathcal{O}}_{f-j}-(\mathcal{K}^b_{f-j})'S^{-1}$ it follows that the norm of the expression above is of order $O(Q_T\|\mathcal{K}^b_{f-j+1}\|/f)=O(|\rho|^{f-j}Q_T/f)$, where the $j$-th block column of $\mathcal{K}^b$ equals $(\bar A^b)^{j-1}\tilde K^b_{cca}$. It remains to assess
the total contribution of the replacement. The matrices $\bar{\mathcal{O}}_{f-j}$ occur in $j$ places: once for the derivative of $K$ in the $j$-th column of $\mathcal{E}_f$, and postmultiplied with $\hat A^{j-i-1}_{cca}$ in the $i$-th column, $1\le i\le j-1$, for the derivative with respect to an entry in $A$. Due to the stability of $\hat A_{cca}$ the contribution is thus of order $O(|\rho|^{f-j}Q_T/f)$. Summing this contribution over $1\le j\le f-1$ results in a total effect of order $O(Q_T/f)=o(T^{-1/2})$. Therefore the difference can be neglected and the factor $S$ can be dropped, as it is not essential for the analysis. The investigation thus focusses on
$$\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,Y^+_{t,f+j}-\mathcal{O}_{f+j}\mathcal{K}_pY^-_{t,p}\big\rangle\begin{bmatrix}-(\mathcal{E}_j')^{-1}\mathcal{C}_j'\mathcal{O}_f'(\mathcal{K}^b_f)'\\(\mathcal{K}^b_f)'\end{bmatrix}$$
$$=\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\mathcal{K}^b_f\big(Y^+_{t+j,f}-\mathcal{O}_f\big(\hat A^j_{cca}\mathcal{K}_pY^-_{t,p}+\mathcal{C}_j\mathcal{E}_j^{-1}\big[Y^+_{t,j}-\mathcal{O}_j\mathcal{K}_pY^-_{t,p}\big]\big)\big)\big\rangle$$
$$\doteq\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\mathcal{K}^b_f\big(Y^+_{t+j,f}-\mathcal{O}_f\tilde x_{t+j}\big)\big\rangle\doteq\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\big(Y^+_{t+j,f}-\mathcal{O}_f\tilde x_{t+j}\big)\big\rangle$$
Here $\tilde x_t=\mathcal{K}_pY^-_{t,p}$
. The second last equality follows from $\hat A^j_{cca}\mathcal{K}_pY^-_{t,p}+\mathcal{C}_j\mathcal{E}_j^{-1}[Y^+_{t,j}-\mathcal{O}_j\mathcal{K}_pY^-_{t,p}]\doteq\tilde x_{t+j}$, as is straightforward to prove. The last equation follows from the fact that both $\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}-\mathcal{O}_f'(\Gamma^+_f)^{-1}$ and $\langle\varepsilon_{t+l}(\hat\theta_{cca}),Y^+_{t+j,f}-\mathcal{O}_f\tilde x_{t+j}\rangle$ are of order $O(Q_T)$ and that the error in the replacement of $\mathcal{O}_f'(\Gamma^+_f)^{-1}$ by $\mathcal{K}^b_f$ is bounded in norm of order $O(\rho^f)=o(T^{-1/2})$, as has been used before.
Thus consider $\langle\varepsilon_{t+l}(\hat\theta_{cca}),\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}Y^+_{t+j,f}\rangle\doteq\langle\varepsilon_{t+l-j}(\hat\theta_{cca}),\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}Y^+_{t,f}\rangle$. According to equation (4) it follows that $\langle y_{t-s},\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}Y^+_{t,f}\rangle=\langle y_{t-s},\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\hat{\mathcal{O}}_f\hat x_t\rangle$ for $s\le p$. Note that $\varepsilon_{t+l-j}=[I,-\hat C_{cca}\mathcal{K}_\infty]Y^-_{t+l-j+1,\infty}$. Therefore $\langle\varepsilon_{t+l-j}(\hat\theta_{cca}),\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}(Y^+_{t,f}-\hat{\mathcal{O}}_f\hat x_t)\rangle$ is of order $O(Q_T\rho^{p-j+l})$. Again accounting for all occurring terms leads to the neglectability of the difference. Thus we have reduced the crucial term to
$$\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\big(\hat{\mathcal{O}}_f\hat x_{t+j}-\mathcal{O}_f\tilde x_{t+j}\big)\big\rangle=\frac{2}{f}\big\langle\varepsilon_{t+l}(\hat\theta_{cca}),\,\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_{t+j}+\mathcal{O}_f(\hat x_{t+j}-\tilde x_{t+j})\big)\big\rangle$$
where $\hat x_t=\hat{\mathcal{K}}_pY^-_{t,p}$
has been used. Consider $\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}(\hat{\mathcal{O}}_f-\mathcal{O}_f)\doteq\mathcal{K}^b_f(\hat{\mathcal{O}}_f-\mathcal{O}_f)$ first: Note that $\mathcal{K}^b_f\mathcal{O}_f=\hat K^b_{cca}\hat C_{cca}+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\mathcal{K}^b_f\mathcal{O}_f\hat A_{cca}+O(\rho_0^f\rho_p^f)$, where $|\lambda_{\max}(\hat A_{cca})|<\rho_p<1$ and $|\lambda_{\max}(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})|<\rho_0<1$. Here $\lambda_{\max}(\cdot)$ denotes an eigenvalue of maximum modulus. Furthermore
$$\mathcal{K}^b_f\hat{\mathcal{O}}_f=\mathcal{K}^b_f\big\langle Y^+_{t,f},Y^+_{t,f}\big\rangle(\hat\Gamma^+_f)^{-1}\hat{\mathcal{O}}_f=\hat K^b_{cca}\langle y_t,\hat z_{t-1}\rangle+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\mathcal{K}^b_{f-1}\langle Y^+_{t+1,f-1},\hat z_{t-1}\rangle$$
$$\doteq\hat K^b_{cca}\hat C_{cca}+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\langle\tilde z_t,\hat z_{t-1}\rangle\doteq\hat K^b_{cca}\hat C_{cca}+(\hat A^b_{cca}-\hat K^b_{cca}\hat C^b_{cca})\mathcal{K}^b_f\hat{\mathcal{O}}_f\hat A_{cca}$$
Here the last equation follows from $\langle\tilde z_t,\hat z_{t-1}\rangle\doteq\langle\hat z_t,\hat z_{t-1}\rangle=\langle\hat z_t,\hat z_t\rangle(\hat A^b_{cca})'$ and $\langle\hat z_t,\hat z_t\rangle\doteq\langle\tilde z_t,\hat z_t\rangle$ according to the backward version of Lemma 4.3. Also $\hat A^b_{cca}\doteq\hat A_{cca}'$ and $\hat C_{cca}=\langle y_t,\hat z_{t-1}\rangle$ according to (5) have been used. This shows that $\mathcal{K}^b_f\mathcal{O}_f-\hat{\mathcal{O}}_f'(\hat\Gamma^+_f)^{-1}\hat{\mathcal{O}}_f$ is of order $o(T^{-1/2})$ and thus neglectable.
It remains to bound the norm of the contribution of the crucial terms to the derivative. Recall equation (9), which states the essential term in the derivative of $M_f(Y^+_{1,T};\theta)$. Consider the derivative with respect to an entry in $K$ first: In this case the partial derivative is equal to
$$\partial_iM_f(Y^+_{1,T};\hat\theta_{cca})=\frac{2}{f}\sum_{j=1}^{f-1}\mathrm{tr}\Big[\big\langle\varepsilon_{t+j-1}(\hat\theta_{cca}),Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\big\rangle(\mathcal{E}_f\Omega_f\mathcal{E}_f')^{-1}\begin{bmatrix}0_{js\times n}\\\mathcal{O}_{f-j}\end{bmatrix}\partial_iK\Big]$$
$$\doteq\frac{2}{f}\sum_{j=1}^{f-1}\mathrm{tr}\Big[\big\langle\varepsilon_{t+j-1}(\hat\theta_{cca}),\,\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_{t+j}+\mathcal{O}_f(\hat x_{t+j}-\tilde x_{t+j})\big)\big\rangle S^{-1}(\partial_iK)\Big]$$
$$\doteq 2\,\mathrm{tr}\Big[E\varepsilon_{t-1}x_t'\,\big(\mathcal{K}^b_f(\hat{\mathcal{O}}_f-\mathcal{O}_f)\big)'S^{-1}(\partial_iK)\Big]\doteq 0$$
where the results derived above have been used.
For the derivative with respect to an entry in $A$ the corresponding contribution equals
$$\frac{2}{f}\sum_{l=0}^{f-j-1}\mathrm{tr}\Big[\hat K_{cca}'(\hat A^l_{cca})'(\partial_iA)'\begin{bmatrix}0_{(j+l)s\times n}\\\mathcal{O}_{f-j-l}\end{bmatrix}'(\mathcal{E}_f')^{-1}\Omega_f^{-1}\mathcal{E}_f^{-1}\big\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},\,\varepsilon_{t+j-1}(\hat\theta_{cca})\big\rangle\Big]$$
$$\doteq\frac{2}{f}\sum_{l=0}^{f-j-1}\mathrm{tr}\Big[(\partial_iA)'S'\big\langle\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_{t+j+l}+\mathcal{O}_f(\hat x_{t+j+l}-\tilde x_{t+j+l})\big),\,\hat A^l_{cca}\hat K_{cca}\varepsilon_{t+j-1}(\hat\theta_{cca})\big\rangle\Big]$$
$$\doteq\frac{2}{f}\sum_{l=0}^{f-j-1}\mathrm{tr}\Big[(\partial_iA)'S'\big\langle\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_t+\mathcal{O}_f(\hat x_t-\tilde x_t)\big),\,\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})\big\rangle\Big]$$
$$=\frac{2}{f}\mathrm{tr}\Big[(\partial_iA)'S'\Big\langle\mathcal{K}^b_f\big((\hat{\mathcal{O}}_f-\mathcal{O}_f)\hat x_t+\mathcal{O}_f(\hat x_t-\tilde x_t)\big),\,\sum_{l=0}^{f-j-1}\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})\Big\rangle\Big]$$
$$\doteq\frac{2}{f}\mathrm{tr}\Big[(\partial_iA)'S'\Big\langle\mathcal{K}^b_f\mathcal{O}_f(\hat x_t-\tilde x_t),\,\sum_{l=0}^{f-j-1}\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})\Big\rangle\Big]$$
Noting that $\tilde x_t\doteq\sum_{l=0}^{f-j-1}\hat A^l_{cca}\hat K_{cca}\varepsilon_{t-l-1}(\hat\theta_{cca})+\hat A^{f-j}_{cca}\tilde x_{t-f+j}$ and that $\langle\hat x_t-\tilde x_t,\tilde x_t\rangle\doteq\langle\hat x_t-\tilde x_t,\hat x_t\rangle\doteq 0$, as has been shown in Lemma 4.3, the norm of the above expression is of order $O(Q_T\|\hat A^{f-j}_{cca}\|/f)$. Summing over all columns $1\le j\le f-2$ then amounts to a total error of order $O(Q_T/f)=o(T^{-1/2})$. Therefore also in this case one obtains $\partial_iM_f(Y^+_{1,T};\hat\theta_{cca})\doteq 0$ as required.
Finally consider the derivative with respect to entries in $C$: The essential part of the derivative is equal to $\frac{2}{f}\mathrm{tr}\big[\Omega_f^{-1}\mathcal{E}_f^{-1}(\partial_i\mathcal{E}_f)\langle E^+_{t,f}(\hat\theta_{cca}),E^+_{t,f}(\hat\theta_{cca})\rangle\big]$. Here the $j$-th row of the matrix $\partial_i\mathcal{E}_f$ is equal to $\partial_iC\big[\hat A^{j-2}_{cca}\hat K_{cca},\ldots,\hat K_{cca},0_{n\times(f-j+1)s}\big]$. The contribution of the $j$-th diagonal block to the trace for $j\ge 2$ is equal to
$$\frac{2}{f}\Big\langle\sum_{r=0}^{j-2}\hat A^r_{cca}\hat K_{cca}\varepsilon_{t+j-r-1}(\hat\theta_{cca}),\,\tilde{\mathcal{K}}^b_{f-j}\big(Y^+_{t+j,f-j}-\mathcal{O}_{f-j}\tilde x_{t+j}\big)\Big\rangle$$
Here $\tilde{\mathcal{K}}^b_{f-j}$ is equal to $(\mathcal{E}_{f-j}\Omega_{f-j}\mathcal{E}_{f-j}')^{-1}[I_{s\times s},0_{s\times(f-j-1)s}]'$. This follows from the block Toeplitz structure of $\mathcal{E}_f$ considering $\Gamma_f^{-1}[0_{s\times js},I_s,0_{s\times(f-j-1)s}]'$, using analogous arguments as in the case of $\Gamma_f^{-1}\mathcal{O}_f$.
Note that from
$$[I_s,-\hat C^b_{cca}\mathcal{K}^b_\infty]\Gamma^+_\infty=E[I_s,-\hat C^b_{cca}\mathcal{K}^b_\infty]Y^+_{t,\infty}(Y^+_{t,\infty})'=[\Omega^b,0_{s\times\infty}]$$
it follows that $(\Gamma^+_\infty)^{-1}[I_s,0_{s\times\infty}]'=[I,-\hat C^b_{cca}\mathcal{K}^b_\infty]'(\Omega^b)^{-1}$. Using the matrix inversion lemma one obtains
$$(\Gamma^+_\infty)^{-1}=\begin{bmatrix}0_{fs\times fs}&0_{fs\times\infty}\\0_{\infty\times fs}&(\Gamma^+_\infty)^{-1}\end{bmatrix}+\begin{bmatrix}I\\-(\Gamma^+_\infty)^{-1}\tilde H_{f,\infty}'\end{bmatrix}\bar\Gamma_f^{-1}\big[I,\,-\tilde H_{f,\infty}(\Gamma^+_\infty)^{-1}\big]$$
since $\bar\Gamma_f=\Gamma^+_f-\tilde H_{f,\infty}(\Gamma^+_\infty)^{-1}\tilde H_{f,\infty}'$. Here $\tilde H_{f,\infty}=EY^+_{t,f}(Y^+_{t+f,\infty})'$. Therefore it is observed that $(\mathcal{E}_{f-j}\Omega_{f-j}\mathcal{E}_{f-j}')^{-1}[I_{s\times s},0_{s\times(f-j-1)s}]'$ is equal to the first $f-j$ block rows of $(\Gamma^+_\infty)^{-1}[I_{s\times s},0_{s\times\infty}]'$. Thus the essential term is of the form
$$\frac{2}{f}\Big\langle\sum_{r=0}^{j-2}\hat A^r_{cca}\hat K_{cca}\varepsilon_{t+j-r-1}(\hat\theta_{cca}),\,\hat\varepsilon_{t+j}-\hat C^b_{cca}\mathcal{K}^b_{f-j}\big(Y^+_{t+j+1,f-j}-\mathcal{O}_{f-j}\tilde x_{t+j+1}\big)\Big\rangle(\Omega^b)^{-1}$$
Again computing the error in the truncation of the state reconstruction shows that the resulting error may be neglected. Thus the second summand is neglectable, since
$$\sum_{j=1}^{f-1}\frac{2}{f}\big\langle\tilde x_{t+j},\,\mathcal{K}^b_{f-j}\big(Y^+_{t+j+1,f-j}-\mathcal{O}_{f-j}\tilde x_{t+j+1}\big)\big\rangle\doteq 2\big\langle\hat x_t,\,\mathcal{K}^b_f\big(\hat{\mathcal{O}}_f\hat x_{t+1}-\mathcal{O}_f\tilde x_{t+1}\big)\big\rangle\doteq 0$$
as this term has been analysed already. The other summand is neglectable, since $\langle\tilde x_{t+j},\varepsilon_{t+j}(\hat\theta_{cca})\rangle\doteq\langle\hat x_{t+j},\varepsilon_{t+j}(\hat\theta_{cca})\rangle\doteq\langle\hat x_{t+j},\varepsilon_{t+j}(\hat\theta_{cca})-\hat\varepsilon_{t+j}\rangle=\langle\hat x_{t+j},\hat x_{t+j}-\tilde x_{t+j}\rangle\hat C_{cca}'\doteq 0$ according to Lemma 4.3. Here we used, aside from the equations given in Lemma 4.3, also $y_t=\hat C_{cca}\tilde x_t+\varepsilon_t(\hat\theta_{cca})=\hat C_{cca}\hat x_t+\hat\varepsilon_t$.
It remains to show that the Hessian is asymptotically nonsingular. This will be done by referring to the same property of the pseudo maximum-likelihood method. Simple but cumbersome calculations of the derivatives with respect to the $i$-th and the $j$-th component of $\theta$ respectively lead to the fact that the only essential term equals $\frac{2}{f}\mathrm{tr}\big[\partial_i(\mathcal{E}_f^{-1})\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p}\rangle\partial_j(\mathcal{E}_f')^{-1}\Omega_f^{-1}\big]$. This is since all terms which include some derivative of $\mathcal{O}_f$ or $\mathcal{K}_p$ include terms of the form $\langle y_{t-j},\varepsilon_t(\hat\theta_{cca})\rangle$, which converge to zero due to the uniform convergence of the estimates of the sample covariances and the consistency of the system matrix estimates. The remaining term corresponding to the second derivative of $\mathcal{E}_f$ is equal to $\frac{2}{f}\mathrm{tr}\big[\Omega_f^{-1}\partial^2_{i,j}(\mathcal{E}_f^{-1})\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},E^+_{t,f}(\hat\theta_{cca})\rangle\big]$. Here $\partial^2_{i,j}(\mathcal{E}_f)^{-1}$ is block lower triangular with zeroes on the block diagonal, whereas $\langle Y^+_{t,f}-\mathcal{O}_f\mathcal{K}_pY^-_{t,p},E^+_{t,f}\rangle\to\mathcal{E}^\circ_f\Omega^\circ_f$. This shows that the contribution of this term is zero. As for the first derivative the convergence of the contribution of the terms $\mathcal{O}_f\mathcal{K}_pY^-_{t,p}$ to zero is immediate from the Toeplitz structure and the fact that $1/f\to 0$. Therefore the essential term is equal to $\frac{2}{f}\mathrm{tr}\big[\langle\partial_i(\mathcal{E}_f^{-1})Y^+_{t,f},\partial_j(\mathcal{E}_f^{-1})Y^+_{t,f}\rangle\Omega_f^{-1}\big]$. Note that $\partial_i\varepsilon_t(\theta)=\sum_{j=1}^{\infty}\partial_iK(j;\theta)y_{t-j}$ and that the family of filters corresponding to $\partial_iK(j;\theta)$ is uniformly stable due to the strict minimum-phase assumption on $\theta^\circ$ for a compact neighbourhood of $\theta^\circ$. Here $\partial_iK(j;\theta)$ is equal to the $(l+j,l)$-th block entry ($1\le l\le f-j$) in $\partial_i(\mathcal{E}_f^{-1})$, as is immediate from the definition of $\mathcal{E}_f$. Therefore the difference $\partial^2_{a,b}M_f(Y^+_{1,T};\theta^\circ)-\partial^2_{a,b}L(Y^+_{1,T};\theta^\circ)$ is essentially equal to
$$\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\Big\langle\sum_{l=1}^{j}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=1}^{j}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}-\Big\langle\sum_{l=1}^{\infty}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=1}^{\infty}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}\Big]$$
$$=-\frac{4}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\Big\langle\sum_{l=1}^{j}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=j+1}^{\infty}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}\Big]-\frac{2}{f}\sum_{j=1}^{f}\mathrm{tr}\Big[\Big\langle\sum_{l=j+1}^{\infty}\partial_aK(l;\theta)y_{t+j-l},\sum_{l=j+1}^{\infty}\partial_bK(l;\theta)y_{t+j-l}\Big\rangle\hat\Omega^{-1}\Big]$$
The evaluations to show that this tends to zero are standard (see e.g. Hannan and Deistler, 1988, Chapter 4) and therefore omitted. Since the two Hessians are asymptotically equal, the nonsingularity of the Hessian of $M_f(Y^+_{1,T};\theta^\circ)$ can be inferred from the corresponding result for the Hessian of $L(Y^+_{1,T};\theta^\circ)$. This finally concludes the proof.
5 Conclusions
In this paper the long standing question of the asymptotic efficiency of CCA has been answered affirmatively in the case of no exogenous inputs. The implication of this result is that, in the case where the system order is known, the CCA subspace algorithm is an implicit implementation of a generalised pseudo maximum-likelihood procedure, which does not require the numerical optimisation of the pseudo likelihood and thus is noniterative. The proof also leads to a new central limit theorem for the CCA subspace estimates, as it does not only apply in generic cases, but on the whole set of transfer functions of McMillan degree $n$, which are stable and strictly minimum-phase. The author wants to stress that the proof only contains the sufficiency of the assumptions. No statement has been made that different schemes are suboptimal, although the examples treated so far in the literature support this conjecture. The paper only treats the case of no exogenous inputs. The case including exogenous inputs is still largely unsolved. Whether the method used above leads to a similar result in the case of additional observed inputs is a matter of future research. Finally note that the proof only considers the case where the true order of the system is used for estimation and does in particular not imply that the ML methods and the CCA subspace algorithm are asymptotically equivalent when the order has to be estimated.
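For reference, the noniterative CCA procedure discussed in this paper can be sketched as follows. This is a minimal illustrative implementation under my own assumptions (a simulated first-order system, fixed truncation indices $f=p=10$ instead of the $O(\log T)$ choice of the theorem), not the code used in the paper:

```python
import numpy as np

# Minimal CCA subspace sketch for y_t = C x_t + e_t, x_{t+1} = A x_t + K e_t
# (no exogenous inputs): CCA-weighted regression of stacked futures on
# stacked pasts gives the state estimate; (A, C, K) follow by least squares.
rng = np.random.default_rng(1)
A_true, K_true, C_true = 0.8, 0.5, 1.0
T, f, p, n = 20_000, 10, 10, 1
e = rng.standard_normal(T)
x, y = 0.0, np.empty(T)
for t in range(T):
    y[t] = C_true * x + e[t]
    x = A_true * x + K_true * e[t]

rows = T - f - p + 1
Ym = np.column_stack([y[p - 1 - j : p - 1 - j + rows] for j in range(p)])  # past Y-
Yp = np.column_stack([y[p + j : p + j + rows] for j in range(f)])          # future Y+
Lf = np.linalg.cholesky(Yp.T @ Yp / rows)
Lp = np.linalg.cholesky(Ym.T @ Ym / rows)
H = Yp.T @ Ym / rows
# CCA weighting: whiten future and past, SVD, keep n canonical directions.
U, s, Vt = np.linalg.svd(np.linalg.solve(Lf, H) @ np.linalg.inv(Lp).T)
K_p = Vt[:n] @ np.linalg.inv(Lp)         # estimate of K_p
xhat = Ym @ K_p.T                        # state estimate, t = p .. T-f
# Least-squares estimates of the system matrices from the state sequence.
C_hat = np.linalg.lstsq(xhat, y[p : p + rows], rcond=None)[0][0]
eps = y[p : p + rows] - C_hat * xhat[:, 0]
Z = np.column_stack([xhat[:-1, 0], eps[:-1]])
A_hat, K_hat = np.linalg.lstsq(Z, xhat[1:, 0], rcond=None)[0]
print(A_hat, C_hat * K_hat)
```

The state basis is only determined up to a similarity transformation, so only invariants such as $A$ and the product $CK$ are meaningfully compared with the true values.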
Acknowledgements

The author would like to thank Manfred Deistler and Wolfgang Scherrer for the valuable support and many useful remarks during the time while this work has been done. Also the financial support from the Austrian 'FWF' under project number P-11213-MAT and from the EU TMR project 'SI' in form of a post-doc position at the Department of Automatic Control, Linkoping University, Linkoping, Sweden, is gratefully acknowledged.
References
Akaike, H. (1976). Canonical correlation analysis of time series and the use of an information criterion. In: System Identification: Advances and Case Studies (R. Mehra and D. Lainiotis, Eds.). pp. 27-96. Academic Press Inc.
Bauer, D. (1998). Some Asymptotic Theory for the Estimation of Linear Systems Using Maximum Likelihood Methods or Subspace Algorithms. PhD thesis. TU Wien.
Bauer, D. (2000). Order estimation for subspace methods. Technical report. Dept. of Automatic Control, Linkoping University. Submitted to Automatica.
Bauer, D., M. Deistler and W. Scherrer (1997b). The analysis of the asymptotic variance of subspace algorithms. Proceedings of the 11th IFAC Symposium on System Identification, Fukuoka, Japan. pp. 1087-1091.
Bauer, D., M. Deistler and W. Scherrer (1999). Consistency and asymptotic normality of some subspace algorithms for systems without observed inputs. Automatica 35, 1243-1254.
Bauer, D., M. Deistler and W. Scherrer (2000). On the impact of weighting matrices in subspace algorithms. In: Proceedings of the IFAC Conference 'SYSID'. Santa Barbara, California.
Desai, U. B., D. Pal and R. D. Kirkpatrick (1985). A realization approach to stochastic model reduction. International Journal of Control 42(4), 821-838.
Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear Systems. John Wiley. New York.
Larimore, W. E. (1983). System identification, reduced order filters and modeling via canonical variate analysis. In: Proc. 1983 Amer. Control Conference 2 (H. S. Rao and P. Dorato, Eds.). Piscataway, NJ. pp. 445-451. IEEE Service Center.
Peternell, K., W. Scherrer and M. Deistler (1996). Statistical analysis of novel subspace identification methods. Signal Processing 52, 161-177.
Stoorvogel, A. and J. Van Schuppen (1997). Approximation problems with the divergence criterion for gaussian variables and gaussian processes. Technical Report BS-R9616. CWI. Department