Weighting Matrices in Larimore Type of Subspace
Algorithms

Dietmar Bauer and Lennart Ljung
Department of Electrical Engineering
Linköping University, S-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: ljung@isy.liu.se
2000-10-13
Report no.: LiTH-ISY-R-2305
Submitted to Automatica

Technical reports from the Automatic Control group in Linköping are available
by anonymous ftp at the address ftp.control.isy.liu.se.
Weighting matrices in Larimore type of subspace algorithms

Dietmar Bauer
Institute f. Econometrics, Operations Research and System Theory
TU Wien, Argentinierstr. 8, A-1040 Wien

Lennart Ljung
Division of Automatic Control, Department of Electrical Engineering,
Linköping University, SE-581 83 Linköping, Sweden,
e-mail: ljung@isy.liu.se

October 4, 2000
Abstract
In this paper the effect of some weighting matrices on the asymptotic variance of the
estimates of linear discrete time state space systems estimated using subspace methods
is investigated. The analysis deals with systems with white observed inputs or without
observed inputs and refers to the Larimore type of subspace procedures. The main result
expresses the asymptotic variance of the system matrix estimates in canonical form as a
function of some of the user choices, clarifying the question of how to choose them
optimally. It is shown that the CCA weighting scheme leads to optimal accuracy. The
expressions for the asymptotic variance can be implemented more efficiently than the
ones previously published.

Keywords: linear systems, discrete time systems, subspace methods, asymptotic variance
1 Introduction

Subspace algorithms are used for the estimation of linear, time invariant, discrete time,
finite dimensional black box state space models. The algorithms can be roughly divided
into Larimore type algorithms (Larimore, 1983) (one algorithm in this class is usually
called CCA, canonical correlation analysis), which estimate the state in the first step
and then extract the estimates of the system matrices from this state estimate, and MOESP
(multivariable output error state space) type algorithms (Verhaegen, 1994), which
estimate the observability matrix and use this estimate to obtain estimates of the system
matrices. The asymptotic properties of the Larimore type of approach have been derived in
a series of papers: (Peternell et al., 1996) derive the consistency, (Bauer et al., 1999)
prove asymptotic normality in the case of no observed inputs, and (Bauer, 1998) deals
with the general case. For the MOESP type of procedure, consistency and asymptotic
normality are dealt with in (Bauer and Jansson, 2000), while preliminary results on
consistency can also be found in (Jansson and Wahlberg, 1998; Verhaegen, 1994). The
asymptotic normality proof is very constructive in both cases, which led to formulas for
the asymptotic variance. However, these expressions were too complicated to directly
provide insight into the effect of certain user choices. Recently, simplifications of
these formulas have been found independently in (Jansson, 2000) for the MOESP case and in
(Bauer et al., 2000) for the Larimore type of procedures. These simpler expressions lie
at the heart of this paper, which derives the corresponding variance expressions as a
function of a certain weighting matrix. This expression can be used to optimize the user
choice with respect to the asymptotic accuracy of the estimated system.

The paper is organised as follows: In the next section the model set and the assumptions
are stated, and a short overview of the estimation algorithms is given. Section 3
presents the main results. Section 4 demonstrates the results in some numerical examples.
Finally, Section 5 concludes.
Throughout the paper the following notation will be used: $I_n$ denotes the $n \times n$
identity matrix, $0_{a \times b}$ the null matrix of the respective dimensions. Further,
$f_T = O(g_T)$, $g_T > 0$, means that $f_T/g_T$ remains bounded as $T \to \infty$, and
$f_T = o(g_T)$ means that $f_T/g_T \to 0$ a.s. Here $T$ is used to denote the sample
size. Convergence is denoted as usual with $\to$ and is always meant to be a.s. if not
stated explicitly. Prime is used to denote transposition of matrices. The Kronecker
product between two matrices $A$ and $B$ is denoted as $A \otimes B$. Finally,
$Q_T = \sqrt{T^{-1}\log\log T}$ is used, and $\doteq$ denotes equality up to terms of
order $o(T^{-1/2})$.
2 Model Set, Assumptions and Algorithm
This paper deals with linear, finite dimensional, discrete time, time invariant, state
space systems of the form

$$x_{t+1} = A x_t + B u_t + K \varepsilon_t, \qquad y_t = C x_t + D u_t + \varepsilon_t \qquad (1)$$

where $y_t \in \mathbb{R}^s$ denotes the observed output process, $u_t \in \mathbb{R}^m$
denotes the observed input process, and $\varepsilon_t \in \mathbb{R}^s$ the unobserved
white noise sequence. $x_t \in \mathbb{R}^n$ is the state sequence. Thus the true order
of the system is denoted by $n$. Here $A \in \mathbb{R}^{n \times n}$,
$B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{s \times n}$,
$D \in \mathbb{R}^{s \times m}$, $K \in \mathbb{R}^{n \times s}$ are real matrices. The
system is assumed to be stable, i.e. all eigenvalues of $A$ are assumed to lie inside the
unit circle, and strictly minimum-phase, i.e. the eigenvalues of $A - KC$ are assumed to
lie inside the unit circle. The system matrices correspond to a pair of transfer
functions: Let $H(q) = I_s + C(qI_n - A)^{-1}K$ and let $G(q) = D + C(qI_n - A)^{-1}B$,
where $q$ denotes the forward shift operator. Furthermore, let $M_n$ denote the set of
all pairs of transfer functions that permit a state space representation of the form (1)
fulfilling the stability and the strict minimum-phase assumption on $H(q)$.
The white noise $\varepsilon_t$ is for simplicity assumed to be i.i.d. with mean zero,
nonsingular variance and finite fourth moments. More general assumptions in a martingale
difference framework can be found in (Bauer et al., 1999). The input is assumed to be
i.i.d. with mean zero and nonsingular variance, also having finite fourth moments. Input
and noise are assumed to be independent.
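As an illustration, the innovations form (1) can be simulated directly. The following
minimal sketch uses arbitrary placeholder matrices (not values from this paper) chosen so
that both the stability and the strict minimum-phase assumptions hold:

```python
import numpy as np

# Sketch: simulate model (1),
#   x_{t+1} = A x_t + B u_t + K e_t,   y_t = C x_t + D u_t + e_t,
# with illustrative (not paper-given) matrices and white input/noise.
rng = np.random.default_rng(0)
n, m, s, T = 2, 1, 1, 500
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
K = np.array([[0.4], [0.1]])

# both eigenvalue assumptions of the model set hold for this choice
assert max(abs(np.linalg.eigvals(A))) < 1
assert max(abs(np.linalg.eigvals(A - K @ C))) < 1

def simulate(A, B, C, D, K, u, e):
    """Run (1) forward from x_0 = 0 and return the output sequence."""
    x = np.zeros(A.shape[0])
    y = np.empty((len(e), C.shape[0]))
    for t in range(len(e)):
        y[t] = C @ x + D @ u[t] + e[t]
        x = A @ x + B @ u[t] + K @ e[t]
    return y

u = rng.standard_normal((T, m))  # white observed input
e = rng.standard_normal((T, s))  # white innovations
y = simulate(A, B, C, D, K, u, e)
```

Data generated this way is used in the sketches of the algorithm steps below.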
The subspace algorithms are based on the following stacked notation (see e.g. Bauer,
1998, Chapter 3): Let $Y^+_{t,f} = [y_t', y_{t+1}', \ldots, y_{t+f-1}']'$ and let
$U^+_{t,f}$ and $E^+_{t,f}$ be constructed analogously using $u_t$ and $\varepsilon_t$
respectively in the place of $y_t$. Let
$Z_{t,p} = [y_{t-1}', u_{t-1}', \ldots, y_{t-p}', u_{t-p}']'$. Here $f$ and $p$ are two
integer parameters, which have to be chosen by the user. See below for assumptions on the
choice of these integers. Then it follows from the system equations (1) that

$$Y^+_{t,f} = \mathcal{O}_f \mathcal{K}_p Z_{t,p} + \mathcal{U}_f U^+_{t,f} + \mathcal{E}_f E^+_{t,f} + \mathcal{O}_f (A - KC)^p x_{t-p}$$

Here $\mathcal{O}_f' = [C', A'C', \ldots, (A^{f-1})'C']$ and
$\mathcal{K}_p = [[K, B - KD], (A - KC)[K, B - KD], \ldots, (A - KC)^{p-1}[K, B - KD]]$.
Further, $\mathcal{U}_f$ is the matrix containing
$[CA^{j-2}B, \ldots, CB, D, 0_{s \times (f-j)m}]$ as its $j$-th block row, and
$\mathcal{E}_f$ contains $[CA^{j-2}K, \ldots, CK, I_s, 0_{s \times (f-j)s}]$ as its
$j$-th block row.
This equation builds the basis for all subspace algorithms, which can be described as
follows:

1. Regress $Y^+_{t,f}$ onto $U^+_{t,f}$ and $Z_{t,p}$ to obtain an estimate
$\hat\beta_z$ of $\mathcal{O}_f \mathcal{K}_p$ and an estimate $\hat\beta_u$ of
$\mathcal{U}_f$ respectively. Due to finite sample effects, $\hat\beta_z$ will typically
be of full rank.

2. For given $n$, find a rank $n$ approximation of $\hat\beta_z$ by using the SVD of
$\hat W^+_f \hat\beta_z \hat W_p = \hat U_n \hat\Sigma_n \hat V_n' + \hat R$. Here
$\hat\Sigma_n$ denotes the diagonal matrix containing the largest $n$ singular values in
decreasing order, $\hat U_n$ contains the corresponding left singular vectors as columns
and $\hat V_n$ the corresponding right singular vectors. Finally, $\hat R$ accounts for
the neglected singular values. The matrices $\hat W^+_f$ and $\hat W_p$ are weighting
matrices, which are chosen by the user. Further details are given below; for the moment
it is sufficient to note that these possibly data dependent matrices are assumed to be
nonsingular (a.s.). This leads to an approximation
$\hat{\mathcal{O}}_f \hat{\mathcal{K}}_p = (\hat W^+_f)^{-1} \hat U_n \hat\Sigma_n \hat V_n' (\hat W_p)^{-1}$.
The actual decomposition of this matrix into $\hat{\mathcal{O}}_f$ and
$\hat{\mathcal{K}}_p$ has no influence on the estimated transfer functions.

3. Using the estimates $\hat{\mathcal{O}}_f$, $\hat{\mathcal{K}}_p$ and $\hat\beta_u$,
obtain the system matrix estimates.
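To make steps 1-2 concrete, the following sketch runs the regression and the weighted SVD
for the no-input case (so $U^+_{t,f}$ drops out and the weights reduce to the sample
covariances of the stacked future and past). The system, all sizes, and the helper
`sym_pow` are illustrative assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
s, n, f, p, T = 1, 2, 5, 5, 3000

# illustrative stable, strictly minimum-phase system without observed input
A = np.array([[0.7, 0.3], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
K = np.array([[0.5], [0.2]])

e = rng.standard_normal((T, s))
y = np.empty((T, s))
x = np.zeros(n)
for t in range(T):
    y[t] = C @ x + e[t]
    x = A @ x + K @ e[t]

def sym_pow(M, a):
    """Symmetric power M^a of a positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** a) @ V.T

# columns of Yf are Y^+_{t,f}; columns of Zp are Z_{t,p} (most recent lag first)
idx = range(p, T - f + 1)
Yf = np.column_stack([y[t:t + f].ravel() for t in idx])
Zp = np.column_stack([y[t - p:t][::-1].ravel() for t in idx])
N = Yf.shape[1]

Gf, Gp = Yf @ Yf.T / N, Zp @ Zp.T / N
beta_z = (Yf @ Zp.T / N) @ np.linalg.inv(Gp)   # step 1: OLS estimate of O_f K_p

# step 2: SVD of W_f^+ beta_z W_p with CCA-type weights (no-input case)
U, sv, Vt = np.linalg.svd(sym_pow(Gf, -0.5) @ beta_z @ sym_pow(Gp, 0.5))
K_hat = Vt[:n] @ sym_pow(Gp, -0.5)             # rank-n estimate of K_p (in some basis)
x_hat = K_hat @ Zp                             # estimated state sequence
```

With these weights, the singular values are the sample canonical correlations between
stacked future and past, so they lie in $[0, 1]$; the state estimate is then passed to
step 3.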
In the second step an order has to be specified. Also the matrices $\hat W^+_f$ and
$\hat W_p$ have to be chosen. Corresponding to the matrix $\hat W_p$, typical choices are
$(\hat\Gamma_p)^{1/2}$ and $(\hat\Gamma^\perp_p)^{1/2}$, where
$\hat\Gamma_p = \frac{1}{T}\sum_{t=p+1}^{T} Z_{t,p}(Z_{t,p})'$ denotes the sample
variance of $Z_{t,p}$ and $X^{1/2}$ denotes the uniquely defined symmetric square root
of a matrix $X$. Further,
$\hat\Gamma^\perp_p = \hat\Gamma_p - \hat\Gamma_{u,z}'\hat\Gamma_u^{-1}\hat\Gamma_{u,z}$.
Here $\hat\Gamma_u$ denotes the sample variance of $U^+_{t,f}$ and $\hat\Gamma_{u,z}$
the sample covariance of $U^+_{t,f}$ and $Z_{t,p}$. Let
$\Gamma^\perp_p = E\hat\Gamma^\perp_p$ denote the expectation of the weighting matrix.
It follows from the assumptions on the inputs and the noise stated above that for any
fixed $p$ the estimate $\hat\Gamma^\perp_p$ converges to $\Gamma^\perp_p$. Furthermore,
the results stated e.g. in (Hannan and Deistler, 1988) imply that the two norm of these
matrices is bounded from below and from above a.s. uniformly for
$p = O((\log T)^a)$, $a < \infty$, i.e. for moderately growing size.
Corresponding to $\hat W^+_f$, typical choices include the identity matrix and
$(\hat\Gamma^{+\perp}_f)^{-1/2}$, using

$$\hat\Gamma^{+\perp}_f = \hat\Gamma^+_y - \hat\Gamma_{y,u}\hat\Gamma_u^{-1}\hat\Gamma_{u,y} \qquad (2)$$

where $\hat\Gamma^+_y$ stands for the sample variance of $Y^+_{t,f}$ and
$\hat\Gamma_{y,u} = \hat\Gamma_{u,y}'$ denotes the sample covariance of $Y^+_{t,f}$ and
$U^+_{t,f}$. In this paper the choice of the weighting $\hat W^+_f$ will be restricted
depending on the choice of the integer $f$: If $f$ is chosen to be fixed and finite, then
$\hat W^+_f$ is assumed to be chosen such that $\|\hat W^+_f - W^+_f\| = O(Q_T)$ for
some nonsingular matrix $W^+_f$. For $f \to \infty$, only
$\hat W^+_f = (\hat\Gamma^{+\perp}_f)^{-1/2}$ or a weighting matrix attached to a
frequency weighting transfer function (cf. Bauer, 1998) are considered. Let the
expectation be denoted with $\Gamma^{+\perp}_f$. Then analogous results hold true: The
error $\|\hat\Gamma^{+\perp}_f - \Gamma^{+\perp}_f\|_2 \to 0$, and the two norm of
$\Gamma^{+\perp}_f$ and thus of $\hat\Gamma^{+\perp}_f$ is bounded and bounded away from
zero for $f = O((\log T)^a)$, $a < \infty$. The name CCA (canonical correlation
analysis) will be reserved for the procedure using

$$\hat W_p = (\hat\Gamma^\perp_p)^{1/2} \quad \text{and} \quad \hat W^+_f = (\hat\Gamma^{+\perp}_f)^{-1/2} \qquad (3)$$
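The corrected covariance in (2) is just the sample variance of the stacked future with
the part explained by the stacked input removed. A sketch with random placeholder data
matrices (column $t$ holding $Y^+_{t,f}$ resp. $U^+_{t,f}$; dimensions are assumptions):

```python
import numpy as np

# Sketch of equation (2): remove the part of Y^+ explained by U^+ before weighting.
rng = np.random.default_rng(2)
fs, fm, N = 6, 3, 2000           # illustrative dimensions of Y^+_{t,f}, U^+_{t,f}
Yf = rng.standard_normal((fs, N))
Uf = rng.standard_normal((fm, N))

Gy = Yf @ Yf.T / N               # sample variance of Y^+_{t,f}
Gyu = Yf @ Uf.T / N              # sample covariance of Y^+ and U^+
Gu = Uf @ Uf.T / N               # sample variance of U^+_{t,f}

G_perp = Gy - Gyu @ np.linalg.inv(Gu) @ Gyu.T   # \hat\Gamma_f^{+perp}, eq. (2)
```

For $N$ much larger than the stacked dimensions, `G_perp` is symmetric positive definite
a.s., so its symmetric inverse square root (the CCA weight) is well defined.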
In the third step the difference between the two classes of procedures appears: Whereas
the Larimore type of procedures use $\hat{\mathcal{K}}_p$ to continue, the MOESP type of
procedures uses $\hat{\mathcal{O}}_f$ (for details see Bauer, 1998, Chapter 3). In this
paper only the Larimore type of procedures is considered.
3 Main Results

The main idea of the considered class of algorithms is to estimate the state in a first
step and to obtain the estimate of the system using this state estimate. Consider the
estimate $\hat{\mathcal{K}}_p = \hat S \hat V_n'(\hat W_p)^{-1}$. Here
$\hat S = ([\hat V_n'(\hat W_p)^{-1}]_n)^{-1}$ appears to be a convenient choice of
$\hat S$, where $[X]_n$ denotes the submatrix containing the first $n$ columns of $X$.
This is possible (a.s. asymptotically) if the first $n$ columns of $\mathcal{K}_p$ are
linearly independent (in one and thus in any representation). This holds true on a
generic subset of $M_n$, which is denoted with $M^+_n$. Let
$(\hat A_c, \hat B_c, \hat C_c, \hat D_c, \hat K_c)$ denote the estimated system, which
has been converted into the canonical form induced by the restriction that
$[\mathcal{K}_p]_n = I_n$, and let $(A_c, B_c, C_c, D_c, K_c)$ denote the corresponding
representation of the true system. Since the entries in a canonical form are system
invariants, the estimation accuracy of two procedures can be assessed by comparing the
asymptotic covariance matrix of the vectorisation of the estimated system in a canonical
form. This is done in the main result of this paper:
Theorem 3.1 Let $y_t$ be generated by a system $(A_c, B_c, C_c, D_c, K_c)$, such that
the corresponding pair of transfer functions is in $M^+_n$. The noise is assumed to be
i.i.d. with zero mean, nonsingular variance and finite fourth moments. The observed
input is restricted to be i.i.d. with zero mean, nonsingular variance and finite fourth
moments. Furthermore, it is assumed that noise and observed input are independent.
Assume that the Larimore type of procedure using $\hat W_p = (\hat\Gamma^\perp_p)^{1/2}$
is used to estimate the system and no delay is assumed, i.e. the entries in $D_c$ are
estimated and not restricted to be zero. Additionally, it is assumed that
$p \geq -d\log T/(2\log|\lambda_0|)$, $1 < d < \infty$, and $p = o((\log T)^a)$ holds
for some $a < \infty$, where $T$ denotes the sample size and
$\lambda_0 = \lambda_{\max}(A_c - K_cC_c)$, where $\lambda_{\max}$ denotes an eigenvalue
of maximum modulus. Corresponding to $\hat W^+_f$, it is assumed that either $f$ is
fixed and $\hat W^+_f$ is chosen such that there exists a nonsingular matrix $W^+_f$
with $\|\hat W^+_f - W^+_f\| = O(Q_T)$, or that $f \to \infty$ and $\hat W^+_f$ is
chosen according to equation (2). Then the asymptotic variance of
$\mathrm{vec}\,[\hat A_c - A_c, \hat B_c - B_c, \hat C_c - C_c, \hat D_c - D_c, \hat K_c - K_c]$
is of the form

$$M_1\left(\begin{bmatrix}\Gamma_x & 0\\ 0 & \Gamma_u\end{bmatrix}\otimes\Sigma\right)M_1' + M_2\left\{\Gamma_z \otimes (\mathcal{O}_f'W_2\mathcal{O}_f)^{-1}\mathcal{O}_f'W_2\left[\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f'\right]W_2\mathcal{O}_f(\mathcal{O}_f'W_2\mathcal{O}_f)^{-1}\right\}M_2' \qquad (4)$$

where $W_2 = \lim_{T\to\infty}(W^+_f)'W^+_f$. Here the matrices
$M_1 \in \mathbb{R}^{[(n+s)(n+m)+ns]\times s(n+m)}$ and
$M_2 \in \mathbb{R}^{[(n+s)(n+m)+ns]\times\infty}$ do not depend on $f$ or $W^+_f$.
Finally, $\Sigma$ denotes the noise covariance matrix and
$\Gamma_z = EZ_{t,\infty}(Z_{t,\infty})'$.

The expression (4) as a function of $W^+_f$ is minimized by the CCA choice of the
weighting $W^+_f = (\Gamma^{+\perp}_f)^{-1/2}$ for each value of $f$. The minimum
variance decreases monotonically in $f$ for the CCA case.
The implications of the theorem are that it is always (i.e. for any choice of $f$)
optimal to use the CCA weighting scheme. The theorem also suggests to use $f \to \infty$
at some rate, which is in accordance with earlier simulation studies (cf. Bauer, 1998).
It also shows that no finite choice of $f$ can achieve the optimal accuracy in all
cases, since the decrease with respect to $f$ is in general strict. Furthermore, the
theorem provides a measure of how much of the attainable accuracy one loses by using any
method other than the optimal one. Note that in the theorem it has been assumed that $p$
tends to infinity as a function of the sample size. Therefore the above expression
should be viewed as the limit of the respective quantities for $p \to \infty$. It will
be part of the proof to demonstrate that this limit exists.
PROOF: Consider the estimation of the system matrices using the estimate of the state
sequence $\hat x_t = \hat{\mathcal{K}}_p Z_{t,p}$: This is done using ordinary least
squares. The expressions for the estimation error are easily derived to be the
following. Here $(A, B, C, D, K)$ denotes the true system in the representation
according to $[\mathcal{K}_p]_n = I_n$ and so does the true state $x_t$. Let
$\xi_t = \hat x_t - x_t$. Introduce the notation
$\langle a_t, b_t\rangle = T^{-1}\sum_{t=p+1}^{T-f} a_tb_t'$. Then:

$$[\hat C - C, \hat D - D] = [\langle\varepsilon_t - C\xi_t, \hat x_t\rangle, \langle\varepsilon_t - C\xi_t, u_t\rangle]\begin{bmatrix}\langle\hat x_t, \hat x_t\rangle & \langle\hat x_t, u_t\rangle\\ \langle u_t, \hat x_t\rangle & \langle u_t, u_t\rangle\end{bmatrix}^{-1}$$

$$[\hat A - A, \hat B - B] = [\langle\xi_{t+1} + K\varepsilon_t - A\xi_t, \hat x_t\rangle, \langle\xi_{t+1} + K\varepsilon_t - A\xi_t, u_t\rangle]\begin{bmatrix}\langle\hat x_t, \hat x_t\rangle & \langle\hat x_t, u_t\rangle\\ \langle u_t, \hat x_t\rangle & \langle u_t, u_t\rangle\end{bmatrix}^{-1}$$

$$\hat K - K = \langle\xi_{t+1} - A\xi_t + Bu_t - K(\hat\varepsilon_t - \varepsilon_t), \hat\varepsilon_t\rangle\langle\hat\varepsilon_t, \hat\varepsilon_t\rangle^{-1} \qquad (5)$$

Since $\hat\varepsilon_t - \varepsilon_t = -(\hat C - C)\hat x_t - C\xi_t$, it is
observed that a number of terms appear in these expressions:
$\langle\varepsilon_t, \hat x_t\rangle$, $\langle\xi_t, \hat x_t\rangle$,
$\langle\xi_{t+1}, \hat x_t\rangle$, $\langle\xi_{t+1}, u_t\rangle$,
$\langle\xi_t, u_t\rangle$, $\langle\xi_t, \hat\varepsilon_t\rangle$,
$\langle\xi_{t+1}, \hat\varepsilon_t\rangle$ and $\langle\varepsilon_t, u_t\rangle$.
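The regressions behind these error expressions can be sketched for the no-input case:
regress $y_t$ on the state estimate to get $\hat C$, form residuals $\hat\varepsilon_t$,
then regress $\hat x_{t+1}$ on $(\hat x_t, \hat\varepsilon_t)$ to get $\hat A$ and
$\hat K$. As an illustrative simplification, the exact simulated state stands in for
the subspace state estimate $\hat{\mathcal K}_pZ_{t,p}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, s, T = 2, 1, 20000

# illustrative system; x_hat below is the exact simulated state, standing in
# for the subspace state estimate (an assumption for this sketch)
A = np.array([[0.6, 0.0], [0.0, 0.3]])
C = np.array([[1.0, 0.8]])
K = np.array([[0.5], [0.4]])

e = rng.standard_normal((T, s))
x = np.zeros((T + 1, n))
y = np.empty((T, s))
for t in range(T):
    y[t] = C @ x[t] + e[t]
    x[t + 1] = A @ x[t] + K @ e[t]
x_hat = x[:T]

# OLS of y_t on x_hat_t gives C; the residuals estimate the innovations
C_hat = y.T @ x_hat @ np.linalg.inv(x_hat.T @ x_hat)
eps_hat = y - x_hat @ C_hat.T

# OLS of x_hat_{t+1} on (x_hat_t, eps_hat_t) gives A and K jointly
R = np.hstack([x_hat, eps_hat])
AK = x[1:T + 1].T @ R @ np.linalg.inv(R.T @ R)
A_hat, K_hat = AK[:, :n], AK[:, n:]
```

With the exact state, the recovered matrices are close to the true ones; with the
subspace state estimate, the extra terms involving $\xi_t$ in (5) appear.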
Note that $\xi_t = (\hat{\mathcal{K}}_p - \mathcal{K}_p)Z_{t,p} - (A - KC)^px_{t-p}$.
Let
$P_{\mathcal K} = I_{p(s+m)} - \begin{bmatrix}I_n\\ 0\end{bmatrix}\mathcal{K}_p$.
Here all system matrices are assumed to be in the canonical form introduced by the
restriction $[\mathcal{K}_p]_n = I_n$. Further, let
$\mathcal{O}^\dagger_f = (\mathcal{O}_f'W_2\mathcal{O}_f)^{-1}\mathcal{O}_f'W_2$ denote
a generalized left inverse of $\mathcal{O}_f$. Recall that $W_2 = (W^+_f)'W^+_f$. In
the notation we refer to the case $f$ fixed and finite. In the CCA case with
$f \to \infty$ we obtain $W_2 = (\Gamma^{+\perp}_\infty)^{-1}$. This introduces no
further complications, however, since it is straightforward to show that all necessary
limits exist. For details on this see (Bauer, 1998). Then it has been shown in (Bauer
et al., 2000) that

$$(\hat{\mathcal{K}}_p - \mathcal{K}_p) = \mathcal{O}^\dagger_f(\hat\beta_z - \beta_z)P_{\mathcal K} + O(\|\hat\beta_z - \beta_z\|\,\|\hat W_p - W_p\|) + O(\|\hat\beta_z - \beta_z\|^2) \qquad (6)$$

where $\beta_z = EY^+_{t,f}(Z_{t,p})'\Gamma_p^{-1}$.
For any of the proposed weighting matrices,
$\|\hat W_p - W_p\| = O(Q_T\sqrt{fp})$. This follows from the uniform convergence of the
sample covariances as stated e.g. in (Hannan and Deistler, 1988, Theorem 5.3.2). It also
follows that $\|\hat\beta_z - \beta_z\|^2 = o(T^{-1/2})$. Therefore, for the asymptotic
distribution the term $\mathcal{O}^\dagger_f(\hat\beta_z - \beta_z)P_{\mathcal K}$ is
the essential one; the remaining terms do not show up in the asymptotic distribution, as
they are $o(T^{-1/2})$. Here the asymptotic distribution is to be understood for the
vectorisation of the matrix in the sense of (Lewis and Reinsel, 1985), as the dimensions
of the matrix increase according to $p \to \infty$: i.e. a zero mean vector
$x_T \in \mathbb{R}^{p(T)}$ is said to be distributed asymptotically normal if, for any
vector $l_T \in \mathbb{R}^{p(T)}$ such that $\|l_T\|_2 \leq M$ for some $M < \infty$,

$$\|[l_T', 0, \ldots]' - l\|_2 \to 0 \text{ for some vector } l \in \ell_2, \qquad E(l_T'x_T)^2 \to c \text{ for some } 0 \leq c < \infty,$$

the scalar product $l_T'x_T$ converges in distribution to a normal random variable.
$P_{\mathcal K}$ depends on $p$, but this is not reflected in the notation. Note the
fact that this expression does not depend on the weighting $\hat W_p$ and that regarding
the weighting $\hat W^+_f$ only the expectation $W^+_f$ has an influence. Thus it is
seen that the only essential stochastic terms in the equations above are
$\langle\varepsilon_t, x_t\rangle$, $\langle\varepsilon_t, u_t\rangle$ and
$\hat{\mathcal{K}}_p - \mathcal{K}_p \doteq \mathcal{O}^\dagger_f(\hat\beta_z - \beta_z)P_{\mathcal K}$,
where $\doteq$ denotes equality up to terms of order $o(T^{-1/2})$. From the assumptions
on $p$ in relation to the sample size, it follows that
$\|[\beta_z, 0] - \mathcal{O}_f\mathcal{K}_\infty\|_2 = O(\sqrt{p}\,|\lambda_0|^p) = o(T^{-1/2})$,
where $\lambda_0$ denotes a zero of $H(q)$ of maximum modulus. Then let
$Z^\perp_{t,p} = Z_{t,p} - \hat\Gamma_{z,u}\hat\Gamma_u^{-1}U^+_{t,f}$. Therefore

$$\hat\beta_z - \beta_z \doteq \langle\mathcal{E}_fE^+_{t,f}, Z^\perp_{t,p}\rangle\langle Z^\perp_{t,p}, Z^\perp_{t,p}\rangle^{-1}$$
as follows from straightforward calculations. The main idea of the proof now is to find
matrices $M_1 \in \mathbb{R}^{[(n+s)(n+m)+ns]\times s(n+m)}$ and
$M_{2,p} \in \mathbb{R}^{[(n+s)(n+m)+ns]\times np(m+s)}$ such that

$$\mathrm{vec}\,[\hat A_c - A, \hat B_c - B, \hat C_c - C, \hat D_c - D, \hat K_c - K] \doteq M_1\,\mathrm{vec}\left[\left\langle\varepsilon_t, \begin{bmatrix}x_t\\ u_t\end{bmatrix}\right\rangle\right] + M_{2,p}\,\mathrm{vec}\left[\mathcal{O}^\dagger_f\langle\mathcal{E}_fE^+_{t,f}, Z^\perp_{t,p}\rangle\right]$$

where $[M_{2,p}, 0_{[(n+s)(n+m)+ns]\times\infty}]$ converges in $\ell_1$ norm to a
matrix whose entries decrease exponentially with the column index. In this case the
sufficient conditions for asymptotic normality given in (Lewis and Reinsel, 1985) are
fulfilled (see e.g. Bauer et al., 1999). It will be clear from the derivation below that
$M_1$ and $M_2 = \lim_{T\to\infty}[M_{2,p}, 0_{[(n+s)(n+m)+ns]\times\infty}]$ do not
depend on $f$ or $W^+_f$. Note that

$$E\,\mathrm{vec}[\langle\varepsilon_t, x_t\rangle]\,\mathrm{vec}[\langle\varepsilon_t, x_t\rangle]' = \frac{1}{T^2}\sum_{t,s=p+1}^{T-f}E(x_tx_s'\otimes\varepsilon_t\varepsilon_s') \doteq \frac{1}{T}(\Gamma_x\otimes\Sigma)$$

where $\Gamma_x = \mathcal{K}\Gamma_z\mathcal{K}' = Ex_tx_t'$ and $\doteq$ again denotes
equality up to terms of order $o(T^{-1/2})$. Analogously,
$E\,\mathrm{vec}[\langle\varepsilon_t, u_t\rangle]\,\mathrm{vec}[\langle\varepsilon_t, u_t\rangle]' \doteq \frac{1}{T}(\Gamma_u\otimes\Sigma)$,
where $\Gamma_u = Eu_tu_t'$. Also
$E\,\mathrm{vec}[\langle\varepsilon_t, x_t\rangle]\,\mathrm{vec}[\langle\varepsilon_t, u_t\rangle]' = 0$.
Further note that for any component of
$\mathrm{vec}[\langle\varepsilon_t, x_t\rangle] \doteq \mathrm{vec}[\langle\varepsilon_t, \mathcal{K}_pZ_{t,p}\rangle]$
and any linear combination
$x'(\hat{\mathcal{K}}_p - \mathcal{K}_p)X_p \doteq x'\mathcal{O}^\dagger_f\mathcal{E}_f\langle E^+_{t,f}, Z^\perp_{t,p}\rangle\Gamma_p^{-1}P_{\mathcal K}X_p$
for some vectors $x \in \mathbb{R}^n$ and $X_p \in \mathbb{R}^{p(s+m)}$ such that
$\|[X_p', 0, \ldots]' - X\|_1 \to 0$, where $X$ is a vector in $\ell_1$ having elements
decreasing exponentially, one obtains that

$$E\frac{1}{T^2}\sum_{s,t=p+1}^{T-f}\varepsilon_{t,i}(\mathcal{K}_{p,j}Z_{t,p})(x'\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{s,f})(X_p'P_{\mathcal K}'\Gamma_p^{-1}Z^\perp_{s,p})$$
$$= \frac{1}{T^2}\sum_{t=p+1}^{T-f}\sum_{s=\underline s}^{t}E\varepsilon_{t,i}(x'\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{s,f})\,E\mathcal{K}_{p,j}Z_{t,p}(Z^\perp_{s,p})'\Gamma_p^{-1}P_{\mathcal K}X_p$$
$$= \frac{1}{T^2}\sum_{t=p+1}^{T-f}\sum_{s=\underline s}^{t}E\varepsilon_{t,i}(x'\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{s,f})[\mathcal{K}_{p,j}, 0_{1\times(m+s)(t-s)}]\begin{bmatrix}H_{t-s,p}\\ \Gamma_p\end{bmatrix}\Gamma_p^{-1}P_{\mathcal K}X_p$$
$$= \frac{1}{T^2}\sum_{t=p+1}^{T-f}\sum_{s=\underline s}^{t}E\varepsilon_{t,i}(x'\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{s,f})[\mathcal{K}_{p,j}, 0_{1\times(m+s)(t-s)}]\begin{bmatrix}\tilde{\mathcal{O}}_{t-s}\mathcal{K}_p\\ I_p\end{bmatrix}P_{\mathcal K}X_p + o(T^{-1}) = o(T^{-1})$$

where $H_{j,p} = EZ_{t,j}(Z_{t-j,p})'$ and $\underline s = \max\{p+1, t-f+1\}$. Here
mostly $\mathcal{K}_pP_{\mathcal K} = 0$ and
$\|H_{j,p}\Gamma_p^{-1} - \tilde{\mathcal{O}}_j\mathcal{K}_p\| = o(T^{-1/2})$ for $p$ as
specified in the Theorem is used (for a proof of the latter statement see e.g. Bauer,
1998). The latter fact is used in the replacement involved in the third equality. The
convergence follows from the convergence assumptions on $X_p$ and the analogous property
of $\mathcal{K}_p$. These properties allow the replacement of the limit for
$p \to \infty$ by the expression obtained for $p = \infty$, which will be done
frequently in the following in order to simplify notation. Thus let $H_j$ denote
$H_{j,\infty}$. For the covariance of elements of $\langle\varepsilon_t, u_t\rangle$
with $x'(\hat{\mathcal{K}}_p - \mathcal{K}_p)X_p$
analogous arguments hold. Therefore the two terms are asymptotically uncorrelated. The
first term, due to $\langle\varepsilon_t, x_t\rangle$ and
$\langle\varepsilon_t, u_t\rangle$, has been shown to have variance
$M_1(\mathrm{diag}(\Gamma_x, \Gamma_u)\otimes\Sigma)M_1'$, where $M_1$ can be shown to
be independent of $f$ and $W^+_f$ because of equation (5); see also below. Thus the only
effect of these choices is hidden in the second term. Consider the variance of
$\mathcal{O}^\dagger_f\langle\mathcal{E}_fE^+_{t,f}, Z^\perp_{t,p}\rangle\Gamma_p^{-1}P_{\mathcal K}x_p$
for some vector $x_p \in \mathbb{R}^{p(m+s)}$ such that
$\|[x_p', 0, \ldots]' - x\|_1 \to 0$ for some vector $x$ in $\ell_1$ having elements
decreasing exponentially:

$$\frac{1}{T^2}\sum_{s,t=p+1}^{T-f}E\,\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{t,f}(Z^\perp_{t,p})'\Gamma_p^{-1}P_{\mathcal K}x_p\,x_p'P_{\mathcal K}'\Gamma_p^{-1}Z^\perp_{s,p}(E^+_{s,f})'\mathcal{E}_f'(\mathcal{O}^\dagger_f)'$$
$$= \frac{1}{T}\sum_{l=1-f}^{f-1}\left[\mathcal{O}^\dagger_f\mathcal{E}_f\,EE^+_{t,f}(E^+_{t+l,f})'\,\mathcal{E}_f'(\mathcal{O}^\dagger_f)'\right]x_p'P_{\mathcal K}'\Gamma_p^{-1}EZ^\perp_{t,p}(Z^\perp_{t+l,p})'\Gamma_p^{-1}P_{\mathcal K}x_p + o(T^{-1}) \qquad (7)$$
Note that for $l = 0$ the part due to $E^+_{t,f}$ is equal to
$\mathcal{O}^\dagger_f\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f'(\mathcal{O}^\dagger_f)'$.
This is the central term in the expression for the asymptotic variance given in the
theorem. From the description of the estimation error in equation (5) it follows that a
matrix $M_2$ as used above exists. The construction of this matrix will be clarified
below. The theorem then is proved if, for any vectors $x_p, y_p$ postmultiplying
$\hat{\mathcal{K}}_p - \mathcal{K}_p$ in equations (5), it holds that
$Ex_p'P_{\mathcal K}'\Gamma_p^{-1}Z_{t,p}(Z_{t-j,p})'\Gamma_p^{-1}P_{\mathcal K}y_p \to 0$
for $j \neq 0$. This will be done below.
Consider the case $n < (s+m)$ first. Examining the expressions for the estimation errors
given in equation (5), one observes that only a number of terms are multiplying
$\hat{\mathcal{K}}_p - \mathcal{K}_p$: These terms converge to
$\Gamma_z\mathcal{K}'$, $[\tilde H_1', \Gamma_z]'\mathcal{K}'$,
$[0_{m\times s}, I_m, 0_{m\times\infty}]'$ and $[I_s, 0_{s\times\infty}]'$, where
convergence is in $\ell_1$ norm as required above for the vectors $x_p$. This follows
from the facts that $EZ_{t,\infty}u_t' = 0$, $EZ_{t,\infty}\varepsilon_t' = 0$, the
error bound $\|\hat{\mathcal{K}}_p - \mathcal{K}_p\| = O(Q_T\sqrt{pf})$ and the uniform
convergence of the sample covariances. Consider

$$EP_{\mathcal K}'\Gamma_z^{-1}Z_{t,\infty}(Z_{t-j,\infty})'\Gamma_z^{-1}P_{\mathcal K} = P_{\mathcal K}'\Gamma_z^{-1}\begin{bmatrix}H_j\\ \Gamma_z\end{bmatrix}\Gamma_z^{-1}P_{\mathcal K} = P_{\mathcal K}'\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K} = P_{\mathcal K}'\begin{bmatrix}0_{j(s+m)\times\infty}\\ \Gamma_z^{-1}\end{bmatrix}P_{\mathcal K} = \begin{bmatrix}0_{j(s+m)\times\infty}\\ \Gamma_z^{-1}\end{bmatrix}P_{\mathcal K} \qquad (8)$$

evaluating the expression for $p = \infty$ rather than dealing with the limit. Using the
exponential decrease, however, it is straightforward to show that in all situations
where the expression occurs, the limit and the expression for $p = \infty$ coincide. The
next to last equality follows from the block matrix inversion formula

$$\Gamma_z^{-1} = \begin{bmatrix}0_{(s+m)\times(s+m)} & 0_{(s+m)\times\infty}\\ 0_{\infty\times(s+m)} & \Gamma_z^{-1}\end{bmatrix} + \begin{bmatrix}I_{s+m}\\ -\Gamma_z^{-1}H_1'\end{bmatrix}\left(\Gamma_z(0) - H_1\Gamma_z^{-1}H_1'\right)^{-1}\left[I_{s+m}, -H_1\Gamma_z^{-1}\right] \qquad (9)$$

The projection $P_{\mathcal K}' = I_\infty - \mathcal{K}'[I_n, 0_{n\times\infty}]$, and
thus the last equality follows from $n < (s+m)$. Premultiplying with the above mentioned
terms from the left shows that the terms for $l \neq 0$ in equation (7) do not matter in
the case $n < (s+m)$. Take e.g.

$$E\mathcal{K}[H_1', \Gamma_z]P_{\mathcal K}'\Gamma_z^{-1}Z_{t,\infty}(Z_{t-j,\infty})'\Gamma_z^{-1}P_{\mathcal K} = \mathcal{K}[H_1', \Gamma_z]\begin{bmatrix}0_{j(s+m)\times\infty}\\ \Gamma_z^{-1}\end{bmatrix}P_{\mathcal K} = 0$$

This shows that in the case $n < (s+m)$ only the term for $l = 0$ in equation (7) is
nonzero, and thus the theorem holds in this case. Therefore assume $n \geq (s+m)$ from
now on.
In the theorem the asymptotic variance of
$(\hat A_c, \hat B_c, \hat C_c, \hat D_c, \hat K_c)$ is given. In order to show that
only the term for $l = 0$ in equation (7) is nonzero, it is sufficient to show this fact
for any invertible (possibly nonlinear) transformation of these matrices. It proves to
be convenient to consider $(\bar A_c, \bar B_c, \hat C_c, \hat D_c, \hat K_c)$, where
$\bar A_c = \hat A_c - \hat K_c\hat C_c$ and $\bar B_c = \hat B_c - \hat K_c\hat D_c$.
These estimates are obtained by transforming the estimates of the subspace algorithm
$(\bar A, \bar B, \hat C, \hat D, \hat K)$ into the particular canonical form defined by
$[\mathcal{K}_p]_n = I_n$, which is done using the transformation matrix $\hat S$,
defined as $\hat S = [[\hat K, \bar B], \ldots, \bar A^{n-1}[\hat K, \bar B]]_n$. Then
e.g. $\hat C_c - C = \hat C\hat S - C = (\hat C - C)\hat S + C(\hat S - I_n)$, where
$I_n$ is the limit of $\hat S$, as follows in a straightforward fashion from the
consistency results for $\mathcal{K}_p$ and the sample covariances used in the
regression.
Thus consider the estimation errors in the system matrix estimates more closely: Due to
the white noise assumption on the inputs, one obtains
$\langle\hat x_t, u_t\rangle \to 0$, and thus the estimation errors of $\hat C$ and
$\hat D$ can be treated independently. Consider $\hat C$ first:

$$\hat C - C \doteq \langle\varepsilon_t - C\xi_t, \hat x_t\rangle\langle\hat x_t, \hat x_t\rangle^{-1} \doteq \langle\varepsilon_t, x_t\rangle\Gamma_x^{-1} - \langle C\xi_t, x_t\rangle\Gamma_x^{-1} \doteq \langle\varepsilon_t, x_t\rangle\Gamma_x^{-1} - C(\hat{\mathcal{K}}_p - \mathcal{K}_p)\Gamma_p\mathcal{K}_p'\Gamma_x^{-1}$$

Here the error bound
$\|\hat{\mathcal{K}}_p - \mathcal{K}_p\| = O(Q_T\sqrt{pf})$ has been used to show e.g.
that $\langle\varepsilon_t, \hat x_t\rangle \doteq \langle\varepsilon_t, x_t\rangle$.
Next deal with $K$:

$$\hat K - K \doteq \langle\xi_{t+1}, \varepsilon_t\rangle\Sigma^{-1} + \langle-A\xi_t - K(\hat\varepsilon_t - \varepsilon_t), \varepsilon_t\rangle\Sigma^{-1} \doteq (\hat{\mathcal{K}}_p - \mathcal{K}_p)\begin{bmatrix}I_s\\ 0_{[p(s+m)-s]\times s}\end{bmatrix}$$
This follows from the error bound cited above and the uniform convergence of the sample
covariances, as e.g.
$\langle\xi_t, \varepsilon_t\rangle = (\hat{\mathcal{K}}_p - \mathcal{K}_p)\langle Z_{t,p}, \varepsilon_t\rangle - (A - KC)^p\langle x_{t-p}, \varepsilon_t\rangle \doteq 0$.
Also the fact that $\langle u_t, \hat\varepsilon_t\rangle \doteq 0$ has been used. For
$n \geq (s+m)$ the expression for $\hat K - K$ is zero, since the first $n$ columns of
$\hat{\mathcal{K}}_p - \mathcal{K}_p$ vanish in the canonical form. In this case simple
manipulations lead to

$$\bar A - \underline A \doteq \langle\xi_{t+1} - \underline A\xi_t, x_t\rangle\Gamma_x^{-1} \doteq [\hat{\mathcal{K}}_p - \mathcal{K}_p, 0_{n\times(m+s)}]\begin{bmatrix}H_{1,p}\\ \Gamma_p\end{bmatrix}\mathcal{K}_p'\Gamma_x^{-1} - \underline A(\hat{\mathcal{K}}_p - \mathcal{K}_p)\Gamma_p\mathcal{K}_p'\Gamma_x^{-1}$$

where $\underline A = A - KC$, $\underline B = B - KD$, and where again $\doteq$ denotes
equality up to terms of order $o(T^{-1/2})$. The same arguments show that
$\bar B = \hat B - \hat K\hat D \doteq \underline B$, where again $n \geq (s+m)$ is
used. Therefore, denoting $\hat T_i = \bar A^i[\hat K, \bar B]$ and
$T_i = \underline A^i[K, \underline B]$, the following recursion is obtained for
$i = 0, 1, \ldots$:

$$\hat T_{i+1} - T_{i+1} = \bar A(\hat T_i - T_i) + (\bar A - \underline A)T_i \doteq \tilde{\mathcal{K}}\begin{bmatrix}H_1\\ \Gamma_z\end{bmatrix}\mathcal{K}'\Gamma_x^{-1}T_i - \underline A\tilde{\mathcal{K}}\Gamma_z\mathcal{K}'\Gamma_x^{-1}T_i + \underline A(\hat T_i - T_i)$$
$$\doteq \tilde{\mathcal{K}}\begin{bmatrix}H_1\\ \Gamma_z\end{bmatrix}\mathcal{K}'\Gamma_x^{-1}T_i - \underline A^{i+1}\tilde{\mathcal{K}}\Gamma_z\mathcal{K}'\Gamma_x^{-1}T_0 + \sum_{j=1}^{i}\underline A^j\tilde{\mathcal{K}}\left\{\begin{bmatrix}H_1\\ \Gamma_z\end{bmatrix}\mathcal{K}'\Gamma_x^{-1} - \Gamma_z\mathcal{K}'\Gamma_x^{-1}\underline A\right\}T_{i-j}$$

where $\tilde{\mathcal{K}} = [\hat{\mathcal{K}}_p, 0] - \mathcal{K}$ and where
$\mathcal{K} = \mathcal{K}_\infty$ is used. The recursion is started at
$\hat T_0 = T_0$.
Note that due to the normalisation of $\mathcal{K}_p$, it follows that each column of
$I_n$ is equal to a column of $T_i$ for some index $i$. Let
$\underline n = \lfloor n/(s+m)\rfloor$, where $\lfloor x\rfloor$ denotes the greatest
integer smaller than $x$. Further, let $\underline m = n - \underline n(s+m)$. Then the
columns of $T_i$, $0 \leq i < \underline n$, and the first $\underline m$ columns of
$T_{\underline n}$ are vectors of the canonical basis. Therefore the columns of
$\hat C_c - C$ are equal to the columns of the following expressions for the respective
integers $i$ (recall $\underline A = A - KC$ and
$\tilde{\mathcal{K}} = [\hat{\mathcal{K}}_p, 0] - \mathcal{K}$):

$$(\hat C - C)T_{i-1} + C(\hat T_{i-1} - T_{i-1}) \doteq \langle\varepsilon_t, x_t\rangle\Gamma_x^{-1}T_{i-1} - C\underline A^{i-1}\tilde{\mathcal{K}}\Gamma_z\mathcal{K}'\Gamma_x^{-1}T_0 + \sum_{j=0}^{i-2}C\underline A^j\tilde{\mathcal{K}}\left\{\begin{bmatrix}H_1\\ \Gamma_z\end{bmatrix}\mathcal{K}'\Gamma_x^{-1} - \Gamma_z\mathcal{K}'\Gamma_x^{-1}\underline A\right\}T_{i-j-2} \qquad (10)$$

Thus the columns of
$\left\{\begin{bmatrix}H_1\\ \Gamma_z\end{bmatrix}\mathcal{K}'\Gamma_x^{-1} - \Gamma_z\mathcal{K}'\Gamma_x^{-1}\underline A\right\}$
and of $\Gamma_z\mathcal{K}'\Gamma_x^{-1}T_0$ appear as the vectors $x_p$ or $y_p$ in
equation (7). Now for $j > 0$ consider

$$ET_0'\Gamma_x^{-1}\mathcal{K}\Gamma_zP_{\mathcal K}'\Gamma_z^{-1}Z_{t,\infty}(Z_{t-j,\infty})'\Gamma_z^{-1}P_{\mathcal K} = T_0'\Gamma_x^{-1}\mathcal{K}\Gamma_zP_{\mathcal K}'\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K}$$
$$= T_0'\Gamma_x^{-1}\left(\mathcal{K} - \Gamma_x[I_n, 0_{n\times\infty}]\Gamma_z^{-1}\right)\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K} = -T_0'[I_n, 0_{n\times\infty}]\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K}$$

which is zero due to the block matrix inversion (9), since
$T_0'[I_n, 0_{n\times\infty}] = [I_{s+m}, 0_{(s+m)\times\infty}]$. Here
$\mathcal{K}P_{\mathcal K} = 0$ is used throughout.
The second term appearing above has the same property:

$$E\left(\Gamma_x^{-1}\mathcal{K}[H_1', \Gamma_z] - \underline A'\Gamma_x^{-1}\mathcal{K}\Gamma_z\right)P_{\mathcal K}'\Gamma_z^{-1}Z_{t,\infty}(Z_{t-j,\infty})'\Gamma_z^{-1}P_{\mathcal K}$$
$$= \left(\Gamma_x^{-1}\mathcal{K}[H_1', \Gamma_z] - \underline A'\Gamma_x^{-1}\mathcal{K}\Gamma_z\right)\left(I_\infty - \mathcal{K}'[I_n, 0_{n\times\infty}]\right)\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K}$$
$$= -\left(\Gamma_x^{-1}\mathcal{K}[H_1', \Gamma_z] - \underline A'\Gamma_x^{-1}\mathcal{K}\Gamma_z\right)\mathcal{K}'[I_n, 0_{n\times\infty}]\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K}$$
$$= -[C', 0_{n\times m}]T_0'[I_n, 0_{n\times\infty}]\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K} = 0$$

where the term not involving $\mathcal{K}'[I_n, 0_{n\times\infty}]$ vanishes by the same
shift argument as above. Here
$\mathcal{K}[H_1', \Gamma_z]\mathcal{K}' = Ex_t(x_{t+1})' = \Gamma_xA'$ has been used.
Corresponding to $D$, the situation is much easier, since it is easily seen that
$\hat D_c - D = \hat D - D \doteq \langle\varepsilon_t, u_t\rangle(Eu_tu_t')^{-1}$,
since $\langle\xi_t, u_t\rangle \doteq 0$. This follows from the bound on
$\|\hat{\mathcal{K}}_p - \mathcal{K}_p\|$ and the uncorrelatedness of $u_t$ and
$Z_{t,p}$ in the white noise case. Thus the asymptotic distribution of $\hat D_c$ is not
affected by the choice of $f$ or $W^+_f$.
Due to the definition of $\hat K_c$ and $\bar B_c = \hat B_c - \hat K_c\hat D_c$, it
follows that $\hat K_c \doteq K$ and $\bar B_c \doteq \underline B$ for $n \geq (s+m)$.
Thus it remains to consider the estimation of $\bar A_c$. Note that
$\bar A_c - \underline A \doteq -(\hat S - I_n)\underline A + (\bar A - \underline A) + \underline A(\hat S - I_n)$.
If $i$ is such that $T_i$ is a block column of the identity matrix, then
$\hat S T_i = \hat T_i$ and thus

$$-(\hat S - I_n)\underline AT_{i-1} + (\bar A - \underline A)T_{i-1} + \underline A(\hat S - I_n)T_{i-1} = -(\hat S - I_n)T_i + (\bar A - \underline A)T_{i-1} + \underline A(\hat T_{i-1} - T_{i-1}) \doteq 0$$

This is also true for the first $\underline m$ columns of $T_{\underline n - 1}$.
Therefore it remains to deal with the matrix
$X = [x_0', \ldots, x_{\underline n - 1}', \tilde x_{\underline n}']'$, where $X$
denotes the matrix built of the last $s+m$ columns of $\underline A$. Here
$x_i \in \mathbb{R}^{(s+m)\times(s+m)}$, $i = 0, \ldots, \underline n - 1$, and
$\tilde x_{\underline n} \in \mathbb{R}^{\underline m\times(s+m)}$. In order to unify
the notation, let
$x_{\underline n} = [\tilde x_{\underline n}', 0_{(s+m)\times(s+m-\underline m)}]'$.
Let $s_{\underline n - 1} = [0_{(s+m-\underline m)\times\underline m}, I_{s+m-\underline m}]'$,
$s_{\underline n} = [I_{\underline m}, 0_{\underline m\times(s+m-\underline m)}]'$,
$\bar s_{\underline n - 1} = [I_{s+m-\underline m}, 0_{(s+m-\underline m)\times\underline m}]'$
and $\bar s_{\underline n} = [0_{\underline m\times(s+m-\underline m)}, I_{\underline m}]'$
respectively. Finally, define
$Y = \begin{bmatrix}H_1\\ \Gamma_z\end{bmatrix}\mathcal{K}'\Gamma_x^{-1}$. Then for
$i = \underline n - 1$ and $i = \underline n$ one obtains

$$\left(\hat T_i - T_i - (\hat S - I_n)\underline AT_{i-1}\right)s_i = -\sum_{j=0}^{\underline n}(\hat T_j - T_j)x_js_i + \hat T_i - T_i$$
$$\doteq -\sum_{j=1}^{\underline n}\left[\tilde{\mathcal{K}}YT_{j-1}x_js_i - \underline A^j\tilde{\mathcal{K}}\Gamma_z\mathcal{K}'\Gamma_x^{-1}T_0x_js_i\right] + \tilde{\mathcal{K}}YT_{i-1}s_i - \underline A^i\tilde{\mathcal{K}}\Gamma_z\mathcal{K}'\Gamma_x^{-1}T_0s_i$$
$$\quad - \sum_{j=1}^{\underline n}\sum_{l=1}^{j-1}\underline A^l\tilde{\mathcal{K}}\left[Y - \Gamma_z\mathcal{K}'\Gamma_x^{-1}\underline A\right]T_{j-l-1}x_js_i + \sum_{j=1}^{i-1}\underline A^j\tilde{\mathcal{K}}\left[Y - \Gamma_z\mathcal{K}'\Gamma_x^{-1}\underline A\right]T_{i-j-1}s_i$$

It has been shown before that for the terms postmultiplied by $T_0$ and the terms
including $Y - \Gamma_z\mathcal{K}'\Gamma_x^{-1}\underline A$, only the covariance of
the respective term matters in the asymptotic distribution. It remains to consider

$$\tilde{\mathcal{K}}Y\left(T_{i-1}s_i - \sum_{j=1}^{\underline n}T_{j-1}x_js_i\right) = \tilde{\mathcal{K}}Y\left(\begin{bmatrix}0_{(s+m)\times(s+m)}\\ \vdots\\ 0_{\underline m\times(s+m)}\\ I_{s+m}\end{bmatrix} + \begin{bmatrix}x_1\\ x_2\\ \vdots\\ \tilde x_{\underline n}\\ 0_{(s+m)\times(s+m)}\end{bmatrix}\right)s_i$$

Denoting the matrix in brackets on the right hand side with $\tilde X$, we obtain
$\underline A\tilde X = T_0x_1$. Also
$A\tilde X = \underline A\tilde X + T_0\begin{bmatrix}C\\ 0_{m\times n}\end{bmatrix}\tilde X = T_0V$
for some matrix $V$. Therefore

$$E\tilde X'\Gamma_x^{-1}\mathcal{K}[H_1', \Gamma_z]P_{\mathcal K}'\Gamma_z^{-1}Z_{t,\infty}(Z_{t-j,\infty})'\Gamma_z^{-1}P_{\mathcal K} = \tilde X'\Gamma_x^{-1}\mathcal{K}[H_1', \Gamma_z]P_{\mathcal K}'\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K}$$
$$= -\tilde X'\Gamma_x^{-1}\mathcal{K}[H_1', \Gamma_z]\mathcal{K}'[I_n, 0_{n\times\infty}]\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K} = -\tilde X'A'[I_n, 0_{n\times\infty}]\Gamma_z^{-1}\begin{bmatrix}0_{j(s+m)\times\infty}\\ I_\infty\end{bmatrix}P_{\mathcal K} = 0$$

Here the second last equation follows from
$\mathcal{K}[H_1', \Gamma_z]\mathcal{K}' = Ex_t(x_{t+1})' = \Gamma_xA'$.
Summing up the findings up to now, it follows that

$$\mathrm{vec}\,[\bar A_c - \underline A, \bar B_c - \underline B, \hat C_c - C, \hat D_c - D, \hat K_c - K] \doteq M_1\,\mathrm{vec}\left\langle\varepsilon_t, \begin{bmatrix}x_t\\ u_t\end{bmatrix}\right\rangle + M_{2,p}\,\mathrm{vec}\left[\mathcal{O}^\dagger_f\mathcal{E}_f\langle E^+_{t,f}, Z^\perp_{t,p}\rangle\right] \qquad (11)$$

Here the matrices $M_1$ and $M_{2,p}$ can be found by tracing the computations so far.
Additionally, it has been shown that

$$E\left[M_1\left(\begin{bmatrix}x_t\\ u_t\end{bmatrix}\otimes\varepsilon_t\right)\right]\left[M_{2,p}\left(P_{\mathcal K}'\Gamma_p^{-1}Z_{t+j,p}\otimes\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{t+j,f}\right)\right]' \to 0 \quad \forall j,$$
$$E\left[M_{2,p}\left(P_{\mathcal K}'\Gamma_p^{-1}Z_{t,p}\otimes\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{t,f}\right)\right]\left[M_{2,p}\left(P_{\mathcal K}'\Gamma_p^{-1}Z_{t+j,p}\otimes\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{t+j,f}\right)\right]' \to 0, \quad j \neq 0.$$

From the definition of $M_1$ and $M_{2,p}$, it follows that these matrices do not depend
on $W_2$ or $f$.
It can further be shown by straightforward operations that the terms given above are of
order $o(T^{-1/2})$ uniformly in $f = O((\log T)^a)$. Here the form of the weighting
matrices is used to show e.g. that the variance of
$\mathcal{O}^\dagger_f\mathcal{E}_fE^+_{t,f}$ is bounded uniformly in $f$. Therefore the
expression for the asymptotic variance of the estimated system matrices also holds in
the case $f \to \infty$ for CCA weights.

Note that

$$\mathcal{O}^\dagger_f\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f'(\mathcal{O}^\dagger_f)' = (\mathcal{O}_f'W_2\mathcal{O}_f)^{-1}\mathcal{O}_f'W_2\left[\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f'\right]W_2\mathcal{O}_f(\mathcal{O}_f'W_2\mathcal{O}_f)^{-1}.$$

This is minimized by $W_2^\circ = [\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f']^{-1}$
with minimum
$(\mathcal{O}_f'(\mathcal{E}_f')^{-1}(I_f\otimes\Sigma^{-1})\mathcal{E}_f^{-1}\mathcal{O}_f)^{-1}$.
Some matrix algebra shows that this gives the identical variance for any choice
$W_2 = W_2^\circ + W_2^\circ\mathcal{O}_fX\mathcal{O}_f'W_2^\circ$ such that $W_2$ is
invertible. Thus the CCA weightings minimize the variance of the estimated system, since
in this case $W_2 = (\Gamma^{+\perp}_y)^{-1}$, where
$\Gamma^{+\perp}_y = \mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f' + \mathcal{O}_f\Gamma_x\mathcal{O}_f'$.
Therefore, due to the matrix inversion lemma,

$$(\Gamma^{+\perp}_y)^{-1} = (\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f')^{-1} + (\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f')^{-1}\mathcal{O}_fX\mathcal{O}_f'(\mathcal{E}_f(I_f\otimes\Sigma)\mathcal{E}_f')^{-1}$$

for a suitable matrix $X$. The lower triangular block Toeplitz structure of
$\mathcal{E}_f$ then shows that this minimum variance decreases monotonically in $f$,
since it ensures that $\mathcal{E}_f^{-1}\mathcal{O}_f$ is a submatrix of
$\mathcal{E}_\infty^{-1}\mathcal{O}_\infty$. This completes the proof.
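The minimization step used in the last part of the proof is the standard generalized
least squares ("sandwich") optimality. It can be checked numerically with random
placeholders for $\mathcal O_f$ and the noise term (all names and sizes here are
illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
fs, n = 8, 3                               # illustrative sizes (f*s and order n)
O = rng.standard_normal((fs, n))           # placeholder for O_f
Rt = rng.standard_normal((fs, fs))
S = Rt @ Rt.T + fs * np.eye(fs)            # placeholder for E_f (I_f kron Sigma) E_f'

def sandwich(W2):
    """(O' W2 O)^{-1} O' W2 S W2 O (O' W2 O)^{-1} for a weighting W2."""
    P = np.linalg.inv(O.T @ W2 @ O) @ (O.T @ W2)
    return P @ S @ P.T

V_opt = sandwich(np.linalg.inv(S))         # optimal weighting W2 = S^{-1}
V_id = sandwich(np.eye(fs))                # e.g. the N4SID-type choice W2 = I
diff = V_id - V_opt                        # should be positive semidefinite
```

The optimum equals $(\mathcal O_f'S^{-1}\mathcal O_f)^{-1}$, and any other weighting
gives a covariance that exceeds it in the positive semidefinite ordering, which is
exactly the ordering used in the theorem.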
This theorem clarifies a longstanding question about the optimal choices of the
weighting matrices for the algorithms dealt with in this contribution. It shows clearly
that CCA is the optimal choice in any situation where no input is present or the
observed input is white noise. It also shows that finite $f$ leads to nonefficient
estimation in the generic case. The subset of $M_n$ where finite $f$ also leads to
optimal estimates consists of ARX systems, as is easily seen from the form of the
essential term of the asymptotic variance of the parameter estimates. The amount of
accuracy which is lost by using a small $f$ is thus, in the case of using optimal
weights, determined by the magnitude of the noise zeros, as they govern the rate of
exponential decrease in the matrix $\mathcal{E}_f^{-1}\mathcal{O}_f$.
The theorem also has implications for different weightings: On the basis of the theorem
it is possible to assess the loss of efficiency due to using a nonoptimal weighting. One
could, for a particular weighting, calculate the optimal $f$ by minimizing the matrix
given in the theorem. The benefits of such methods must be doubted, however, since the
derivation of the theorem has been given on the grounds of knowing the order of the
system. In this case it does not seem to make sense to use any weighting other than the
optimal one.
Another question is also quite natural: the observed input has been assumed to be
white. Some of the analysis did not need this assumption and thus also holds for more
general inputs. In particular, equation (7) holds for more general inputs. This equation
could in principle be used to obtain optimal weightings with respect to specific measures.
Note that the equation defines a matrix, and there need not be a minimum for the matrix
minimization problem. However, we could for example minimize the trace of the matrix,
so that a scalar minimization problem is obtained, which could be solved using nonlinear
search methods. Such a process, however, involves optimisation issues, which one wants to
avoid in the first place by using subspace rather than prediction error methods. Thus such
a procedure is also not recommended, although possible. It is believed, however, that the
expressions given in this paper might be valuable also for the analysis of this general input
case.
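The trace scalarization mentioned above can be sketched in a few lines. Here asymptotic_variance is a hypothetical stand-in for the matrix V_f(W_f^+) of the theorem, not the paper's actual expression; the point is only how the matrix minimization becomes a scalar search over f:

```python
import numpy as np

# Hypothetical stand-in for the asymptotic variance matrix V_f(W_f^+);
# in practice this would be evaluated from the expressions of the theorem.
def asymptotic_variance(f: int) -> np.ndarray:
    # toy model: the variance shrinks geometrically towards a limit
    return np.eye(6) * (1.0 + 0.5 ** f)

# Scalarize via the trace and pick the best f over a grid
# (a crude alternative to a nonlinear search).
candidates = range(3, 21)
traces = {f: np.trace(asymptotic_variance(f)) for f in candidates}
f_opt = min(traces, key=traces.get)
```

For the toy model the trace is monotonically decreasing, so the grid search returns the largest candidate; with a real V_f(W_f^+) and a nonoptimal weighting the minimum can occur at a finite f.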
4 Numerical Illustration
In this section two examples will be given which illustrate the findings of the last section.
As a first case consider the following single-input single-output system without exogenous
inputs having state dimension three:
A = [ 0.532  0.4639  0.2855 ; 1  0  0.2568 ; 0  1  0.0054 ],   K = [ 1 ; 0 ; 0 ],   C = [ 0.532  0.4639  0.0413 ]
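The claim of the previous section, that the noise zeros govern the rate of decrease, can be checked numerically for this system. A minimal sketch (NumPy assumed; the entries are transcribed from the display above, where minus signs may have been lost in the typeset original):

```python
import numpy as np

# First example system, entries as transcribed from the text.
A = np.array([[0.532, 0.4639, 0.2855],
              [1.0,   0.0,    0.2568],
              [0.0,   1.0,    0.0054]])
K = np.array([[1.0], [0.0], [0.0]])
C = np.array([[0.532, 0.4639, 0.0413]])

# Eigenvalues of A are the system poles; eigenvalues of A - K C are the
# noise zeros, whose magnitudes govern the exponential rate discussed above.
poles = np.linalg.eigvals(A)
noise_zeros = np.linalg.eigvals(A - K @ C)
rho = max(abs(noise_zeros))
```

The largest noise-zero modulus rho is the quantity that determines how quickly optimality is approached as f grows.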
The noise is assumed to be white with variance equal to 1. For this system the asymptotic
variance is compared to the Cramer-Rao bound using the following measure: let F_I denote
Figure 1: The no input case: the left plot shows the measure E_f for the two weighting
schemes CCA and N4SID for the range of values f = 3, ..., 20. The right picture shows
a plot of the determinant of (O_f' W_2 O_f)^{-1} O_f' W_2 E_f E_f' W_2 O_f (O_f' W_2 O_f)^{-1} for the same two
procedures and the same range of integers f.
the Fisher information matrix with respect to the echelon canonical form in this case. Then
it is well known that the Cramer-Rao bound for the estimation is equal to F_I^{-1}. Thus let
V_f(W_f^+) denote the asymptotic variance of parameter estimates obtained from the
subspace procedure using the integer f and the weighting matrix W_f^+. Then the measure
E_f = tr[V_f(W_f^+) F_I] - 2ns - (n+s)m is used. For an efficient estimation method this is
equal to zero, otherwise positive. The left plot in Figure 1 shows this measure for the two
weighting schemes CCA (where the weighting is the inverse square root of the sample
covariance of the future outputs, cf. (3)) and N4SID (i.e. W_f^+ = I_{fs}). The right plot of
this figure shows det[(O_f' W_2 O_f)^{-1} O_f' W_2 E_f E_f' W_2 O_f (O_f' W_2 O_f)^{-1}], where W_2 = (W_f^+)' W_f^+.
The plots clearly reveal the identical behaviour of the two measures. It can also be seen
that for the CCA weights the measure E_f decreases to zero for f → ∞, whereas for the
N4SID weights the choice of f = n is optimal. For both weightings a converging behaviour
is observed for large f, which is also in accordance with the theory.
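The determinant shown in the right plots can be evaluated directly once O_f, E_f and the weighting are available. A minimal sketch with a synthetic E_f (the actual E_f of the theorem is not reproduced here; the hypothetical (A, C) pair serves only to build an extended observability matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, f = 3, 1, 10  # state dimension, number of outputs, future horizon

# Hypothetical system used only to build the extended observability matrix O_f.
A = np.diag([0.5, 0.3, 0.1])
C = rng.standard_normal((s, n))
O_f = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(f)])

# Synthetic stand-in for the (fs x n) matrix E_f appearing in the theorem.
E_f = rng.standard_normal((f * s, n))

def sandwich_det(W_plus):
    """det[(O' W2 O)^-1 O' W2 E E' W2 O (O' W2 O)^-1] with W2 = W_plus' W_plus."""
    W2 = W_plus.T @ W_plus
    G = np.linalg.inv(O_f.T @ W2 @ O_f)
    M = G @ O_f.T @ W2 @ E_f @ E_f.T @ W2 @ O_f @ G
    return float(np.linalg.det(M))

# N4SID-type choice: identity weighting.
val_identity = sandwich_det(np.eye(f * s))
```

Substituting a different W_plus (e.g. an estimated inverse square root of the future output covariance for the CCA case) and sweeping f reproduces curves of the kind shown in the figures.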
Figure 2: The white noise input case: the left plot shows the measure E_f for the two
weighting schemes CCA and N4SID for the range of values f = 2, ..., 20. The right picture
shows a plot of the determinant of (O_f' W_2 O_f)^{-1} O_f' W_2 E_f E_f' W_2 O_f (O_f' W_2 O_f)^{-1} for the same
two procedures and the same range of integers f.

The second example considers a system with an additional observed white noise input,
given by the system matrices
A = [ 0.393  2.022 ; 0.208  0.685 ],   B = [ 0.95 ; 1.00 ],   C = [ 0.326  0.743 ],   D = 0.95,   K = [ 1 ; 0 ]
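The count of 7 parameters stated below follows from the same formula 2ns + (n+s)m used in the measure of the first example; a quick check (NumPy assumed, entries transcribed from the display above):

```python
import numpy as np

# Second example system, values as transcribed from the text.
A = np.array([[0.393, 2.022],
              [0.208, 0.685]])
B = np.array([[0.95], [1.00]])
C = np.array([[0.326, 0.743]])
D = np.array([[0.95]])
K = np.array([[1.0], [0.0]])

n = A.shape[0]  # state dimension
s = C.shape[0]  # number of outputs
m = B.shape[1]  # number of observed inputs

# Free parameters in the canonical form: (n+s)m for (B, D) and
# 2ns for the remaining system matrices.
n_params = 2 * n * s + (n + s) * m
```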
The observed input and the unobserved noise are assumed to have mean zero and variance 1.
Thus there are a total of 7 parameters to be estimated. Analogously to the case of no
inputs define E_f = tr[V_f(W_f^+) F_I] - 7. Figure 2 shows the result of the calculation. The
pictures demonstrate again the identical behaviour of the two measures of accuracy. Again the
CCA weighting scheme is superior to the N4SID weighting scheme, and again it reaches the
Cramer-Rao lower bound for f → ∞. This illustrates the significance of the expressions
derived above.
5 Conclusions

In this paper the dependence of the asymptotic accuracy of the Larimore type of subspace
methods on the choice of the integer f and the weighting matrix W_f^+ has been
explored. It has been shown that the effects of these choices in the case of no observed
inputs or white observed inputs can be summarized in the term (O_f' W_2 O_f)^{-1} O_f' W_2 E_f (I - ...) E_f' W_2 O_f (O_f' W_2 O_f)^{-1},
as has been shown in Theorem 3.1. This term shows that the CCA
choice of the weighting according to (3) is optimal with respect to the asymptotic variance
for each f. It also follows that for this optimal choice the variance decreases with increasing
f, achieving the optimal accuracy for the choice f → ∞. For other weighting procedures
the expression can be used to optimise the choice of f. Finally, the new expressions for
the asymptotic variance might also lead to an efficient implementation of the computation
of the asymptotic variance, which could be used for practical implementation rather than
only for academic purposes.
Acknowledgements
This work was done while the first author was holding a post-doc position at the
Division of Automatic Control in Linkoping. Financial support from the EU TMR project
'SI' is gratefully acknowledged.
References
Bauer, D. (1998). Some Asymptotic Theory for the Estimation of Linear Systems Using
Maximum Likelihood Methods or Subspace Algorithms. PhD thesis. TU Wien.
Bauer, D. and M. Jansson (2000). Analysis of the asymptotic properties of the MOESP
type of subspace algorithms. Automatica 36(4), 497-509.
Bauer, D., M. Deistler and W. Scherrer (1999). Consistency and asymptotic normality of
some subspace algorithms for systems without observed inputs. Automatica 35, 1243-1254.
Bauer, D., M. Deistler and W. Scherrer (2000). On the impact of weighting matrices in
subspace algorithms. In: Proceedings of the IFAC Conference SYSID 2000. Santa
Barbara, California.
Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear Systems. John
Wiley. New York.
Jansson, M. (2000). Asymptotic variance analysis of subspace identification methods. In:
Proceedings of the SYSID 2000 Conference. Santa Barbara, California.
Jansson, M. and B. Wahlberg (1998). On consistency of subspace methods for system
identification. Automatica 34(12), 1507-1519.
Larimore, W. E. (1983). System identification, reduced order filters and modeling via
canonical variate analysis. In: Proc. 1983 Amer. Control Conference 2 (H. S. Rao
and P. Dorato, Eds.). Piscataway, NJ. pp. 445-451. IEEE Service Center.
Lewis, R. and G. Reinsel (1985). Prediction of multivariate time series by autoregressive
model fitting. Journal of Multivariate Analysis 16, 393-411.
Peternell, K., W. Scherrer and M. Deistler (1996). Statistical analysis of novel subspace
identification methods. Signal Processing 52, 161-177.
Verhaegen, M. (1994). Identification of the deterministic part of MIMO state space models