Parallel QR Implementation of Subspace Identification with Parsimonious Models

(1)

Parallel QR-implementation of Subspace

Identification with Parsimonious Models

S. Joe Qin

,

Lennart Ljung

Division of Automatic Control

Department of Electrical Engineering

Link¨

opings universitet

, SE-581 83 Link¨

oping, Sweden

WWW:

http://www.control.isy.liu.se

E-mail:

ljung@isy.liu.se

,

@isy.liu.se

3rd December 2003

AUTOMATIC CONTROL

COM

MUNICATION SYSTEMS

LINKÖPING

Report no.:

LiTH-ISY-R-2563

Submitted to The 13th IFAC Symposium on System Identification,

Rotterdam, The Netherlands

Technical reports from the Control & Communication group in Link¨oping are available athttp://www.control.isy.liu.se/publications.

(2)

PARALLEL QR IMPLEMENTATION OF SUBSPACE IDENTIFICATION WITH PARSIMONIOUS MODELS

S. Joe Qin∗ _{and Lennart Ljung}∗∗

∗_{Department of Chemical Engineering} The University of Texas at Austin

Austin, TX 78712 e-mail:qin@che.utexas.edu. ∗∗_{Department of Electrical Engineering}

Linkoping University Linkoping, Sweden

Abstract:

In this paper we reveal that the typical subspace identification algorithms use non-parsimonious model formulations, with extra terms in the model that appear to be non-causal. These terms are the causes for inflated variance in the estimates and partially responsible for the loss of closed-loop identifiability. We then propose a parallel parsimonious formulation of a new subspace identification algorithm and demonstrate the effectiveness of the proposed algorithm via simulation.

Keywords: subspace identification; parsimonious formulation; parallel projections

1. INTRODUCTION

Subspace identification methods (SIM) are attrac-tive not only because of their numerical simplicity and stability, but also for their state space form that is very convenient for estimation, filtering, prediction, and control. A few drawbacks, how-ever, have been experienced with SIMs:

(1) The estimation accuracy is in general not as good as the prediction error methods (PEM), represented by large variance.

(2) The application of SIMs to closed-loop data is still a challenge, even though the data sat-isfy identifiability conditions for traditional methods such as PEMs.

(3) The estimation of B and D is more problem-atic than that of A and C, which is reflected in the poor estimation of zeros and steady state gains.

In this paper, we are concerned with the reasons why subspace identification approaches exhibit these drawbacks and propose parsimonious SIMs

for open-loop applications. First of all, we start with the analysis of existing subspace formulation using the linear regression formulation (Jansson and Wahlberg, 1998; Knudsen, 2001). From this analysis we reveal that the typical SIM algorithms actually use non-parsimonious model formulation, with extra terms in the model that appear to be non-causal. These terms, although conveniently included for performing subspace projections, are the causes for inflated variance in the estimates and partially responsible for the loss of closed-loop identifiability.

2. ANALYSIS OF SUBSPACE MODEL FORMULATION

Parsimoniousness is a general rule is regression analysis and system identification. The typical subspace identification models, however, are not parsimonious and even non-causal as will be re-vealed in this section. We begin with an innova-tion model formulainnova-tion,

(3)

xk+1= Axk+ Buk+ Kek (1a)

yk= Cxk+ Duk+ ek (1b)

where yk ∈ Rny, xk ∈ Rn, uk ∈ Rnu, and ek ∈ Rny are the system output, state, input, and innovation, respectively. A,B,C,D and K are system matrices with appropriate dimensions. An extended state space model can be formulated as

Yf= ΓfXk+ HfUf+ GfEf (2a) Yp= ΓpXk−p+ HpUp+ GpEp (2b) where the extended observability matrix

Γf =      C CA .. . CAf −1      (3)

and the Toeplitz matrices are

Hf=      D 0 · · · 0 CB D · · · 0 .. . ... . .. ... CAf −2B CAf −3B · · · D      (4a) Gf=      I 0 · · · 0 CK I · · · 0 .. . ... . .. ... CAf −2K CAf −3K · · · I      (4b)

The input and output data are arranged in the following Hankel form:

Uf=      uk uk+1 · · · uk+N −1 uk+1 uk+2 · · · uk+N .. . ... . .. ... u_{k+f −1} uk+f · · · uk+f +N −2      (5a) ∆ = [uf(k) · · · uf(k + N − 1)] (5b) Up=      u_k−p u_k−p+1 · · · u_k−p+N−1 u_k−p+1 u_k−p+2 · · · u_k−p+N .. . ... . .. ... u_k−1 uk · · · uk+N −2      (5c) ∆ = up(k − p) · · · up(k − p + N − 1) (5d) Similar formulations are made for Yf, Yp, Ef, and Ep. Subspace identification methods minimize the following objective function (Overschee and Moor, 1996), [ ˆL1 _ˆ L2 _ˆ L3 ] = arg min{ Yf− L1Yp− L2Up− L3Uf 2 F} (6) = arg min{ N −1 X j=0 yf(k + j) − L1 L2 L3 "_y p(k − p + j) up(k − p + j) uf(k + j) # 2 } Denoting L1₌      L111 L 1 12 · · · L 1 1p L1 21 L 1 22 · · · L 1 2p .. . . .. L1 f 1 L1f 1 L1f p      ∆ =      L1 1 L1 2 .. . L1f      (7a) L2₌      L211 L 2 12 · · · L 2 1p L2 21 L 2 22 · · · L 2 2p .. . . .. L2f 1 L 2 f 1 L 2 f p      ∆ =      L21 L2 2 .. . L2 f      (7b) L3₌      L3 11 L 3 12 · · · L 3 1f L3 21 L322 · · · L32f .. . . .. L3 f 1 L 3 f 1 L 3 f f      ∆ =      L31 L3 2 .. . L3 f      (7c)

the above problem is equivalent to f separate sub-problems: _ˆ L1i Lˆ 2 i Lˆ 3 i = arg min{ N −1 X j=0 y(k + j + i − 1) −L1i L 2 i L 3 i "_y p(k − p + j) up(k − p + j) uf(k + j) # 2 for i = 1, 2, . . . , f (8)

For the moment consider the first subproblem, that is, i = 1. In this case the problem implies that the following model is specified:

y(k) = L1 1 L 2 1 L 3 1   yp(k − p) up(k − p) uf(k)  + v(k) = L1 1 L 2 1 yp(k − p) up(k − p) +L3 11u(k) + f X j=2 L3 1ju(k + j − 1) + v(k) (9)

Note that the third term on the RHS of the above equation is non-causal and unnecessary. Therefore, we can make the following statements about the typical SIM formulation in general.

(1) The model format used in SIM during the projection step is non-causal. This would result in non-causal models in the projec-tion step. Although the non-causal terms are ignored at the step to estimate B, D, all the model parameters estimate have inflated variance due to the fact that extra and un-necessary terms are included in the model, making the model nonparsimonious.

(2) Because of the extra terms that turn out to be ‘future’ inputs, SIMs in general have

(4)

problems with closed-loop data using direct identification methods. Most SIMs usually project out Uf as follows:

Yf/Π⊥Uf = ΓfXk/Π⊥Uf + GfEf/Π⊥Uf (10) where Π⊥

Uf = I − U T

f(UfUfT)−1Uf. Be-cause of the non-causal terms in the model,

1

NEfUfT 6= 0 as N → ∞ for closed-loop data. As a consequence, many SIMs fail to work on closed loop data, except for a few SIM algo-rithms that avoid this projection (Chou and Verhaegen, 1997; Wang and Qin, 2001; Wang and Qin, 2002).

(3) Because Uf contains extra rows due to the extra terms, the projection in Eq. 10 tends to reduce the information content unncessarily even for open-loop data, leading to inefficient use of the data.

(4) These non-causal terms will have negligible coefficients only when the number of data is very large and process is very well excited. For limited number of samples or non-white input signals, SIM algorithms tend to have large estimation errors.

To avoid these problems the SIM model must not include these non-causal terms. We propose a parallel QR implementation of a parsimonious subspace identification method (PARSIM) which removes these non-causal terms by enforcing tri-angular structure of the Toeplitz matrix Hf at every step of the SIM procedure. The parallel PARSIM (PARSIM-P) method involves a bank of least squares problems in parallel. This idea was presented in (Qin et al., 2002), In this paper, the bank of least squares problems is implemented via QR factorization. Optimal weighting is derived for this PARSIM-P method. An optimal estimate of the B, D matrices is given using the Kalman filter structure. Numerical simulation is used to demon-strate the effectiveness of the proposed method.

3. PARALLEL PARSIM METHOD The key idea in the proposed method is to exclude those non-causal terms of Uf. To accomplish this we partition the extended state space model row-wise as follows: Yf =      Yf 1 Yf 2 .. . Yf f      ; Yi ∆ =      Yf 1 Yf 2 .. . Yf i      ; i = 1, 2, . . . , f (11) Partition Uf and Ef in a similar way to define Uf i, Ui, Ef i, and Ei, respectively, for i = 1, 2, . . . , f . Denote further Γf=      Γf 1 Γf 2 .. . Γf f      (12a) Hf i ∆ = CAi−2_{B · · · CB D} (12b) ∆ = H_i−1 · · · H1 H0 (12c) Gf i ∆ = CAi−2_{K · · · CK I} (12d) ∆ = G_i−1 · · · G1 G0 (12e) ∀i = 1, 2, · · · , f where Hi and Gi are the Markov parameters for the deterministic input and innovation sequence, respectively. We have the following partitioned equations:

Yf i= Γf iXk+ Hf iUi+ Gf iEi ∀i = 1, 2, · · · , f

(13) Note that each of the above equation is guaran-teed causal.

3.1 Parallel Estimation of Γf i andHf i

By eliminating e(k) in the innovation model through iteration, it is straightforward to derive the following relation (Knudsen, 2001),

Xk= LzZp+ ApKXk−p (14) where Lz ∆ = ∆p(AK, K) ∆p(AK, BK)(15a) ∆p(A, B) ∆ = Ap−1_{B · · · AB B} (15b) AK ∆ = A − KC (15c) BK ∆ = B − KD (15d)

Substituting this equation into Eq. 13, we obtain Yf i= Γf iLzZp+ Γf iApKXk−p+ Hf iUi+ Gf iEi ∀i = 1, 2, · · · , f

(16) Since the second term in the RHS of Eq. 16 tends to zero as p tends to infinity, we have the following least squares estimates:

_ˆ Γf iLz Hˆf i = Yf i Z_Up i + ∀i = 1, 2, · · · , f (17)

Augmenting all estimates ˆΓf iLz, ∀i = 1, 2, · · · , f , we have      ˆ Γf 1Lz ˆ Γf 2Lz .. . ˆ Γf fLz      = ˆΓfLz (18)

(5)

Now we have the following parallel PARSIM algo-rithm to estimate Γf iand Hf i, for i = 1, 2, · · · , f . [Algorithm 1] Parallel PARSIM (PARSIM-P)

(1) Perform the following least squares esti-mates, _ˆ Γf iLz Hˆf i = Yf i Z_Up i + ∀i = 1, 2, · · · , f (19) (2) Perform SVD for the following weighted

ma-trix

W1(ˆΓfLz)W2= U SVT (20) where W1 is nonsingular and LzW2 does not lose rank. We choose

ˆ

Γf = W1−1U S

1/2 ₍₂₁₎

which will yield the estimate for A and C (Verhaegen, 1994).

(3) The estimates for Hi−1, ∀i = 1, 2, · · · , f can be calculated by averaging repeated esti-mates ˆ Hi−1= 1 f− i + 1 f X j=i ˆ Hf j(:, (j − i)lf + 1 :(j − i + 1)lf ) ∀i = 1, 2, · · · , f (22)

which can be used to estimate B and D. [Theorem 1]

Algorithm 1 gives consistent estimates for Γf and Hi−1, ∀i = 1, 2, · · · , f under the following conditions:

(1) The past horizon p → ∞.

(2) The input u(k) and innovation sequence e(k) are uncorrelated, i.e.,

1 NEiU

T

i → 0 as N → ∞.

(3) {C, A} is observable and {A, B K } reach-able.

[Proof ] See Appendix A.

[Remark 1] For finite past horizon p the al-gorithm is biased, but the bias decays to zero exponentially with p. If p is too large in practice, however, large variance is expected for the esti-mates. Therefore, it is necessary in practice to use a finite p for the best trade-off. Cross-validation can be used to select an optimal p.

[Remark 2] Because of Condition 2 in Theorem 1 where the EiUiT term requires no correlation between future uk and past ek, this PARSIM-P algorithm is biased for direct closed loop identifi-cation.

3.2 Optimal Weighting and QR Implementation 3.2.1. Optimal Weighting In the conventional SIM formulation under open-loop conditions,

EfΠ⊥Uf → Ef as N → ∞ (23) since Ef is uncorrelated with Uf. Therefore Eq. 10 becomes:

YfΠ⊥Uf = ΓfXkΠ⊥Uf + GfEf (24) Post-multiplying 1

NZpT to eliminate the noise term,

YfΠ⊥UfZpT = ΓfXkΠ⊥UfZpT (25) Van Overshee and De Moor (1995) show that all SIM methods do SVD on the following weighted matrix:

WrYfΠ⊥UfZpTWc= WrΓfXkΠ⊥UfZpTWc (26)

where Wr and Wc are the row and column

weighting matrices, respectively. In CVA Wr = (YfΠ⊥UfYfT)−1/2 which basically normalizes the output variables to achieve a maximum likelihood type of weighting. Gustafsson (Gustafsson, 2002) shows that an approximately optimal weighting for Wc is

Wc= (ZpZpT − ZpUfT(UfUfT)UfZpT)−1/2

= (ZpΠ⊥UfZpT)−1/2 (27)

which is used in CVA and MOESP. Substituting Eq. 27 into Eq. 26, and replacing Xk with LzZp as an instrumental variable, we obtain,

WrYfΠ⊥UfZpTWc= WrΓfLzZpΠ⊥UfZpT(ZpΠTUfZpT)−1/2 = WrΓfLz(ZpΠ⊥UfZpT)1/2 (28) Comparing Eq. 28 with Eq.20, the equivalent weightings for the PARSIM-P algorithm is

W1= Wr= (YfΠ⊥UpYfT)−1/2 (29a) W2= (ZpΠ⊥UfZ

T

p)1/2 (29b)

Jansson and Wahlberg (1996) show that the row-weighting W1 (or Wr) has no influence on the asymptotic accuracy of the estimate.

3.2.2. QR Implementation for Γf To implement the PARSIM-P algorithm efficiently, we perform the following QR decomposition

  Zp Ui Yf i  = R11,i R21,i R22,i Q1,i Q2,i (30) Using this relation in Eq. 19, we obtain

_ˆ

Γf iLz Hˆf i = R21,iR+11,i (31) Notice that the QR decomposition in Eq. 30 can be implemented recursively. Given that the ith QR decomposition is performed, the next QR decomposition can be performed on

(6)

  Zp Ui+1 Yf,i+1  =     Zp Ui Uf,i+1 Yf,i+1     =   R11,iQ1,i Uf,i+1 Yf,i+1   (32)

which is in QR decomposition form except for the last two rows.

3.2.3. QR Implementation for K Once ˆΓf is known in Step 2 of Algorithm 1, the Kalman filter gain K can be estimated similar to Wang and Qin (Wang and Qin, 2002).

With large p, substituting Eq. 14 into Eq. 2 leads to: Yf = ΓfLzZp+ HfUf+ GfEf (33) Therefore, YfΠ⊥h_Zp Uf i= GfEfΠ⊥h_Zp Uf i= GfEf (34)

since Ef is not correlated with Zpand Ufin open-loop. Performing QR decomposition,   Zp Uf Yf  =   R11 R21 R22 R31 R32 R33     Q1 Q2 Q3   (35) then R33Q3= GfEf (36)

Denoting ek = F e∗k such that cov(e∗k) = I, GfEf = (Gf⊗ F )Ef∗

∆ = G∗

fEf∗ (37) where ⊗ denotes Kronecker product and

G∗ f =     F 0 · · · 0 CKF F · · · 0 . .. ... . .. ... CAf −2KF CAf −3KF · · · F     ∈ <m.f ×m.f (38)

From Eqs. 36 and 37 and using the fact that Q3 is an orthonormal matrix, we choose

ˆ Ef∗= Q3 (39a) ˆ G∗ f= R33 (39b) Therefore, ˆ F = R33(1 : m, 1 : m) (40)

and K can be calculated from G∗

f using Γf.

3.2.4. Optimal Estimation for B and D With A and C estimates, Section 10.6 in (Ljung, 1999) gives an effective approach to estimate B and D with an output error formulation. Here we give a modified approach to estimating B, D optimally using A, C, K and F for the general innovation form.

From the innovation form of the system we have: xk+1= AK xk+ BK uk+ K yk (41) The process output can be represented as

yk= C(qI − AK)−1x0+ [C(qI − AK)−1BK+ D]uk

+C(qI − AK)−1Kyk+ ek (42)

or:

[I − C(qI − AK)−1K]yk= C(qI − AK)−1x0 +[C(qI − AK)−1BK+ D]uk+ ek (43) using ek = F e∗k where e∗k has an identity covari-ance matrix, and defining

˜

yk= F−1[I − C(qI − AK)−1K]yk (44a) G(q) = F−1_{C(qI − A} K)−1 (44b) D∗_{= F}−1_D _(44c) we obtain, ˜ yk= G(q)BKuk+ D∗uk+ G(q)x0δk+ e∗k = G(q) ⊗ uTk vec(BK) + Im⊗ uTk vec(D∗) +G(q)x0δk+ e∗k (45) where vec(BK) and vec(D∗) are vectorized BK and D∗ _{matrices along the rows. δ}

k is the Kro-necker delta function. Now vec(BK), vec(D∗) and x0 can be estimated using least squares from the above equation. The B, D matrices can be backed out as:

ˆ

D = F ˆD∗ (46a)

ˆ

B = ˆBK+ K ˆD (46b)

3.2.5. PARSIM-P with QR Implementation We summarize the above procedure into following algorithm that implements PARSIM-P with QR decomposition.

[Algorithm 2] PARSIM-P with QR decomposi-tion

(1) Calculate Γf iLz as in Eq. 31 via a bank of QR decompositions for i = 1, 2, · · · , f . Form ΓfLz and calculate A and C estimates from SVD of W1ˆΓfLzW2as in Algorithm 1.

(7)

(2) Find the Kalman filter gain K and matrix F using A and C estimates.

(3) Calculate B, D estimates from A, C, K and F estimates.

Using these three steps, it is straightforward to show that the estimates are consistent if the prior steps are consistent.

4. SIMULATION RESULTS

The counter example proposed in (Jansson and Wahlberg, 1998) is used here to test the effec-tiveness of the proposed parallel parsimonious method. The case of K = [−0.21 − 0.559]T _is used here. For comparison any standard SIM al-gorithms can be used, but we choose the MOESP algorithm in (Verhaegen, 1994). Figure 1 shows the poles and zero estimates using the PARSIM-P algorithm and MOESPARSIM-P. In this simulation, we choose N=2000, f=3, p=5. It can be seen from the results that the PARSIM-P gives better estimates of the poles than the MOESP algorithm, which knows the benefit of the parsimonious formula-tion. The zero estimates from the PARSIM-P is much better than the MOESP estimates.

0.7 0.8 0.9 1 1.1 −0.2 −0.1 0 0.1 0.2

Pole Estimates from PARSIM_P

−1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1

Zero Estimates from PARSIM_P

0.7 0.8 0.9 1 1.1 −0.2

−0.1 0 0.1

0.2 Pole Estimates from MEOSP

−20 −15 −10 −5 0 −5

0 5

Zero Estimates from MOESP

Fig. 1. Poles and zero estimates for the counter example.

5. CONCLUSIONS

The conventional subspace identification models are non-parsimonious with extra non-causal input terms. The extra terms in the models are respon-sible for large variance in SIM estimates and par-tially responsible for the loss of closed-loop iden-tifiability. The proposed parallel subspace identi-fication algorithm overcomes these problems.

ACKNOWLEDGMENTS

Financial support from National Science Founda-tion under CTS-9985074 and a Faculty Research Assignment grant from University of Texas is gratefully acknowledged.

6. REFERENCES

Chou, C.T. and Michel Verhaegen (1997). Sub-space algorithms for the identification of mul-tivariable dynamic errors-in-variables models. Automatica33(10), 1857–1869.

Gustafsson, Tony (2002). Subspace-based system identification: weighting and pre-filtering of instruments. Automatica 38, 433–443. Jansson, Magnus and Bo Wahlberg (1998). On

consistency of subspace methods for system identification. Automatica 34(12), 1507–1519. Knudsen, Torben (2001). Consistency analysis of subspace identification methods based on a linear regression approach. Automatica 37, 81–89.

Ljung, L. (1999). System Identification: Theory for the User. Prentice-Hall, Inc.. Englewood Cliffs, New Jersey.

Overschee, Peter Van and Bart De Moor (1996). Subspace Identification for Linear Systems. Kluwer Academic Publishers.

Qin, S.J., J. Wang and L. Ljung (2002). Sub-space identification methods using parsimo-nious model formulation. In: AIChE Annual Meeting. Indianapolis, IN.

Verhaegen, Michel (1994). Identification of the de-terministic part of MIMO state space models given in innovations form from input-output data. Automatica 30(1), 61–74.

Wang, Jin and S. Joe Qin (2001). Principal com-ponent analysis for errors-in-variables sub-space identification. In: Proc. of the 40th IEEE Conf. On Decision and Control. Or-lando, FL. pp. 3936–3941.

Wang, Jin and S. Joe Qin (2002). A new subspace identification approach based on principal component analysis. J. Proc. Cont. 12, 841– 855.

APPENDIX A

Condition 1 implies the initial state has zero effect on the estimate since AK is always stable. From Condition 2 we have 1

NEiZpT → 0 as N → ∞, that is, the effect of noise is zero under condition 2. It is straightforward to show that Condition 3 is need for Lz to have full row rank and Γf full column rank.

(8)

Abstract

(9)

Avdelning, Institution

Division, Department

Division of Automatic Control

Department of Electrical Engineering

Datum Date

2003-12-03

Spr˚ak Language 2 Svenska/Swedish 2 X Engelska/English 2 ... Rapporttyp Report category 2 Licentiatavhandling 2 Examensarbete 2 C-uppsats 2 D-uppsats 2 X ¨Ovrig rapport 2 ...

URL f¨or elektronisk version

http://www.control.isy.liu.se

ISBN

...

ISRN

...

Serietitel och serienummer Title of series, numbering

LiTH-ISY-R-2563

ISSN

1400-3902

...

Titel

Title _{Parallel QR-implementation of Subspace Identification with Parsimonious Models} F¨orfattare

Author _{S. Joe Qin, Lennart Ljung,}

Sammanfattning Abstract

.

Nyckelord Keywords