A Tutorial on Multiple Model Least-Squares and Augmented UD Identification

Steve S. Niu [1], D. Grant Fisher [2], Lennart Ljung [3] and Sirish L. Shah [4]

December 21, 1994

[1] Department of Electrical Engineering, Linköping University, Linköping, S58183, Sweden. Email: niu@isy.liu.se, Fax: (+46) 13 282622
[2] Department of Chemical Engineering, University of Alberta, Edmonton, Canada T6G 2G6. Email: fisher@cupid.eche.ualberta.ca, Phone/Fax: (403) 492-3301
[3] Department of Electrical Engineering, Linköping University, Linköping, S58183, Sweden. Email: ljung@isy.liu.se, Fax: (+46) 13 282622
[4] Department of Chemical Engineering, University of Alberta, Edmonton, Canada T6G 2G6. Email: shah@prancer.eche.ualberta.ca, Fax: (403) 492-2881

Abstract

Augmented UD identification (AUDI) is a family of new identification algorithms based on well-known matrix decomposition and updating techniques. Compared with conventional least-squares methods, the AUDI methods are conceptually more concise, computationally more efficient, numerically more robust and, application-wise, more complete. As a result, AUDI is recommended as a complete replacement for conventional recursive least-squares in all parameter estimation and system identification applications.

This tutorial paper presents an overview of the AUDI concept, implementation and applications. The multiple model least-squares (MMLS) method, which is a fundamental reformulation and an efficient implementation of the basic least-squares method, is discussed first. Some application examples of MMLS/AUDI are presented to demonstrate the versatility and reliability of this type of algorithm.

Keywords: Parameter Estimation, System Identification, Least-Squares, UD Factorization, Multiple Model Least-Squares (MMLS), Augmented UD Identification (AUDI)


Chapter 1

Introduction

The least-squares principle (Gauss 1809) is widely used for data processing in almost all scientific and engineering fields and applications, such as statistics, equation solving, signal processing and system identification. It has been the dominant method for parameter estimation for many decades, due to its theoretical simplicity and implementational convenience. However, at least two problems exist with the conventional least-squares formulation. One is the well-known poor numerical performance in the case of an ill-conditioned coefficient matrix. The other, which is less widely recognized, is that the conventional least-squares method does not efficiently utilize the information contained in the data matrix. In other words, the conventional least-squares method only extracts a small portion of the information contained in the data matrix.

A simple reformulation of the least-squares method leads to the decomposition least-squares method. Decomposition least-squares attains improved numerical properties by exploiting various decomposition techniques, and it extracts the information contained in the data matrix more fully via rearrangement and augmentation of the data matrix. As a fundamental numerical tool, decomposition least-squares can be used in many fields and applications. In this paper, however, only its application to parameter estimation is considered, which yields the multiple model least-squares estimator. The multiple model least-squares estimator preserves the numerical properties of decomposition least-squares while, at the same time, revealing a clear physical significance for the extra information extracted from the data. In addition, the special structure provides several other useful properties and practical features that make application easier.

Based on the multiple model least-squares principle, a family of new recursive identification algorithms, the augmented UD identification (AUDI) algorithms, has been developed (Niu et al. 1990, Niu et al. 1992, Niu et al. 1993, Niu 1994). The AUDI family of algorithms integrates many useful features and pieces of information into a simple and compact, yet very versatile and flexible, structure, and provides more "completeness" of information for identification applications.

The MMLS/AUDI hierarchy is depicted in Figure 1.1, where it is shown that the MMLS estimator is the basis for many potential applications including system identification. This paper focuses on the least-squares application in parameter estimation and system identification, which is the shaded area in the figure.

Figure 1.1: The MMLS/AUDI Application Tree (decomposition least-squares supports the MMLS estimator and AUDI identification, with application branches in statistics, signal processing, adaptive control, prediction, fault detection, dynamic modeling and solving equations)

This paper is organized as follows. Section 2 reviews the basic least-squares method and then introduces the decomposition least-squares methods, from a general point of view and with generic notation. In Section 3, the decomposition least-squares method is applied to parameter estimation and leads to the multiple model least-squares method, which is based on an ARX model structure. MMLS for the FIR model structure is also briefly discussed. The advantages of using the MMLS structure are demonstrated and justified by examples. In Section 4, the MMLS method is further narrowed down to recursive system identification. The AUDI family of algorithms, which includes the AUDI-LS, AUDI-ELS, AUDI-IV, AUDI-FIR and MIMO AUDI algorithms, is briefly discussed. Section 5 presents several control-oriented application examples of the AUDI algorithms. For clarity of presentation, the mathematical derivations and proofs are left to the appendices, and readers who are not interested in the mathematical details can simply skip the appendices.

Chapter 2

Decomposition Least-Squares Method

The least-squares method can be traced back almost two hundred years, to when Gauss used this simple method to calculate the orbits of the planets (Gauss 1809, Barnard 1963). It has ever since been one of the most widely used and most useful numerical tools.

Many real-world problems ultimately reduce to solving a set of overdetermined linear simultaneous equations of the form

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_m \end{bmatrix}
=
\begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3n} \\
\vdots & \vdots & \vdots &        & \vdots \\
x_{m1} & x_{m2} & x_{m3} & \cdots & x_{mn}
\end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \vdots \\ \theta_n \end{bmatrix}
\qquad (2.1)
$$

or, in compact form,

$$ y = X\theta \qquad (2.2) $$

where X ∈ R^{m×n} is the coefficient matrix, y ∈ R^{m×1} is the observation vector and θ ∈ R^{n×1} is the vector of unknowns to be solved. This is one of the most commonly encountered least-squares problems, though the method of least-squares applies to a much wider range of problems.

The least-squares method aims to find the solution θ̂ = [θ̂_1, θ̂_2, ..., θ̂_n]^T that minimizes the sum of squares of the differences between the observed data (the observation vector) and their estimated values, that is,

$$ \hat\theta = \arg\min_{\theta} \|y - X\theta\|^2 \qquad (2.3) $$

where ||·||_2 denotes the Euclidean vector norm. According to Gauss's theorem of least-squares (Barnard 1963), the least-squares solution θ̂ to equation (2.2) is given by the following normal equation

$$ [X^T X]\,\hat\theta = X^T y \qquad (2.4) $$

Conceptually, the least-squares solution to (2.2) can be trivially obtained by solving the above normal equation, that is, by a simple inversion of X^T X if it is invertible,

$$ \hat\theta = [X^T X]^{-1} X^T y \qquad (2.5) $$

and the loss function (sum of squared residuals) is given by

$$ J = [\,y - X\hat\theta\,]^T [\,y - X\hat\theta\,] \qquad (2.6) $$

In practice, however, the least-squares problem is almost never solved in this way, due to the poor numerical performance when the covariance matrix involved is ill-conditioned (Lawson & Hanson 1974, Golub & Loan 1989). Much effort has been devoted to improving the numerical properties of the least-squares method, among which matrix decomposition is one of the most successful endeavors. The Cholesky decomposition, QR decomposition and singular value decomposition (SVD) (Bjorck 1994, Golub & Loan 1989, Forsyth et al. 1977, Dahlquist & Bjorck 1974, Stewart 1973) are some of the commonly used decomposition techniques, among which the QR and SVD decompositions are the most celebrated.

For instance, the data matrix X in equation (2.2) can be decomposed with the QR decomposition as

$$ X = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \quad \text{and} \quad Q^T y = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \qquad (2.7) $$

and the least-squares estimate is then given by solving Rθ̂ = z_1, while the loss function is given by

$$ J = \|y - X\hat\theta\|^2 = \|z_2\|^2 $$
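To make (2.5) and (2.7) concrete, here is a minimal numpy sketch (hypothetical data, not from the paper) that solves a small overdetermined problem both through the normal equation and through the QR route, and checks that the two estimates and loss functions agree.

```python
import numpy as np

rng = np.random.default_rng(0)                    # hypothetical test data
X = rng.standard_normal((20, 3))
y = X @ np.array([3.0, 2.0, 1.0]) + 0.1 * rng.standard_normal(20)

# normal-equation route (2.4)-(2.5)
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# QR route (2.7): X = Q [R; 0], Q'y = [z1; z2], then solve R theta = z1
Q, R = np.linalg.qr(X)                            # economy-size Q (20x3), R (3x3)
z1 = Q.T @ y
theta_qr = np.linalg.solve(R, z1)
J = y @ y - z1 @ z1                               # loss ||z2||^2 = ||y||^2 - ||z1||^2

print(np.allclose(theta_ne, theta_qr))            # True
print(np.round(theta_qr, 3), round(float(J), 4))
```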

A summary of some of the decomposition methods that can be used for solving least-squares problems is given in Table 2.1, where G is lower triangular, L is unit lower triangular, U is unit upper triangular, D is diagonal, and Q is orthogonal.

Table 2.1: Decomposition Methods for Solving Least-Squares

  Method                   Decomposition        Least-Squares Solution
  LU decomposition         X^T X = L U          θ̂ = U^{-1} L^{-1} (X^T y)
  LDL^T decomposition      X^T X = L D L^T      θ̂ = L^{-T} D^{-1} L^{-1} (X^T y)
  Cholesky decomposition   X^T X = G G^T        θ̂ = G^{-T} G^{-1} (X^T y)
  QR decomposition         X = Q R              θ̂ = R^{-1} Q^T y

Obviously, these decomposition methods are used solely as a numerical enhancement to the least-squares method. In this paper, however, the matrix decomposition techniques for solving least-squares problems are investigated from a different standpoint. The following two questions are the main concerns:

1. With equation (2.5), the least-squares solution is only available when X^T X is invertible. Are there numerically more reliable methods that can handle both well- and ill-conditioned (and singular) matrices X^T X? This is the topic that hundreds of papers have focused on in past decades, including the methods listed in Table 2.1. In this section, two different reformulations are proposed which take advantage of these well-known techniques while, at the same time, preserving the simplicity of the least-squares method and producing more information than the original least-squares.

2. The second question concerns the information that is contained in the data: what other useful information is contained in the matrix X^T X in addition to the solution vector θ̂? This section shows that the data contain much more information than just the least-squares solution to equation (2.2). The extra information can be obtained without extra computation and, more importantly, it can be very useful in many applications.

The least-squares problem defined in (2.1) or (2.2) can be represented in a slightly different form

$$ [\,X, \; -y\,] \begin{bmatrix} \theta \\ 1 \end{bmatrix} = e \qquad (2.8) $$

or

$$ \bar X \bar\theta = e \qquad (2.9) $$

This is called the augmented system; X̄ is called the augmented data matrix and θ̄ the augmented solution vector. In this paper we focus on the augmented data matrix X̄ and show that the least-squares problem defined in (2.2) actually contains much more information than just the solution vector θ. By appropriate decomposition of the augmented data matrix, all this information can be extracted and made explicit in useful forms. Many of the well-established decomposition methods such as LU/LDL^T, Cholesky, QR and UDU^T can be used to extract the information. However, for simplicity of notation and convenience of presentation, this section uses the LDL^T decomposition (for symmetric matrices) and the LDU decomposition (for non-symmetric matrices) to illustrate the concept of this type of decomposition least-squares method. For practical implementation, which is discussed in Section 2.4, the QR decomposition is recommended because of its superior numerical reliability.

2.1 Forward Decomposition Least-Squares

Rewrite the augmented data matrix X̄ into columns, X̄ = [x_1, x_2, ..., x_n, -y], and consider the following n sets of simultaneous equations

$$
\begin{aligned}
u_{12}x_1 &= -x_2 + e_1 \\
u_{13}x_1 + u_{23}x_2 &= -x_3 + e_2 \\
&\;\;\vdots \\
u_{1n}x_1 + u_{2n}x_2 + \cdots + u_{n-1,n}x_{n-1} &= -x_n + e_{n-1} \\
\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_{n-1}x_{n-1} + \theta_n x_n &= +y + e_n
\end{aligned}
\qquad (2.10)
$$

Obviously, the last equation is the original least-squares problem defined in (2.2) or (2.1). The other rows define additional equations. The equations can be rewritten in matrix form as

$$
\begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_n & -y \end{bmatrix}
\begin{bmatrix}
1 & u_{12} & u_{13} & \cdots & u_{1n} & \theta_1 \\
  & 1      & u_{23} & \cdots & u_{2n} & \theta_2 \\
  &        & 1      & \cdots & u_{3n} & \theta_3 \\
  &        &        & \ddots & \vdots & \vdots   \\
  &        &        &        & 1      & \theta_n \\
  &        &        &        &        & 1
\end{bmatrix}
= E
\qquad (2.11)
$$

or, in compact form,

$$ \bar X U = E \qquad (2.12) $$

The least-squares solutions for all the above n sets of linear equations can be produced, simultaneously, by a simple decomposition. This is stated as the following theorem.

Theorem 2.1 (Forward Decomposition Least-Squares)
Assume that the augmented coefficient matrix X̄ defined in (2.12) is non-singular, and define the following LDL^T decomposition

$$ S = \bar X^T \bar X = L D L^T \qquad (2.13) $$

where L is unit lower triangular and D is diagonal. Then L and D simultaneously provide all the least-squares solutions and loss functions for the n sets of linear equations defined by (2.10) or (2.11), in the form

$$ U = L^{-T} \;\;\text{(unit upper triangular)}, \qquad D = D \;\;\text{(diagonal)} \qquad (2.14) $$

where the superscript "-T" stands for matrix transpose and inverse, and U and D are called the solution matrix and the loss function matrix, respectively.

Obviously, the conventional least-squares problem (2.3) is only a subset of the augmented least-squares problem. The proof of this theorem can be found in Appendix A.1.

The solution matrix U in (2.14) contains the least-squares solutions to all the equations in (2.10), with its last column being the solution vector (2.5) of conventional least-squares. The loss function matrix D is analogous to the (scalar) loss function in (2.6): it contains the loss functions (sums of squared residuals) for the least-squares solutions to all the equations in (2.10). The n additional sets of linear simultaneous equations are thus solved without any additional computational cost. Although here the extra sets of equations do not seem to have a very specific practical significance, in parameter estimation, with an appropriate formulation of the data vector, they constitute all the lower-order models. This is discussed in detail in Section 3.

Example 2.1 (Forward Decomposition Least-Squares)

Consider the following linear simultaneous equations

$$ X\theta = y \qquad (2.15) $$

with

$$ X = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 0 \end{bmatrix} = [\,x_1, \; x_2, \; x_3\,], \qquad y = \begin{bmatrix} 10 \\ 28 \\ 37 \end{bmatrix} $$

where rank(X) = 3 and cond(X) = 35; this is a well-conditioned matrix. Apply the standard LDL^T decomposition to the augmented data product matrix

$$ S = [\,X, \; -y\,]^T [\,X, \; -y\,] = L D L^T $$

Then the solution and loss function matrices are given by

$$ U = L^{-T} = \begin{bmatrix} 1 & -1.1818 & 5.5000 & 3.0000 \\ & 1 & -5.0000 & 2.0000 \\ & & 1 & 1.0000 \\ & & & 1 \end{bmatrix}, \qquad D = \mathrm{diag}(66.000, \; 0.8182, \; 13.500, \; 0.0000) $$

which can be written as three sets of equations

$$
\begin{aligned}
-1.1818\,x_1 &= -x_2 & (0.8182) \\
+5.5000\,x_1 - 5.0000\,x_2 &= -x_3 & (13.500) \\
+3.0000\,x_1 + 2.0000\,x_2 + 1.0000\,x_3 &= +y & (0.0000)
\end{aligned}
\qquad (2.16)
$$

The numbers in brackets are the corresponding loss functions. The last set of equations is the original linear system defined by (2.15). The least-squares solution provided in the last column of U, namely θ̂ = [3.0000, 2.0000, 1.0000]^T, is the best linear least-squares fit of y by x_1, x_2, x_3. The least-squares fitting error (the loss function) is given by the last element of D as zero, since the solution θ̂ agrees with the true solution [3, 2, 1]^T.

Note that the third column of matrix U gives the best linear least-squares fit of x_3 with x_1 and x_2, which is the second set of equations in (2.16); its loss function is given by the third element of D. Similarly, the second column of U gives the fit of x_2 with x_1, and the corresponding loss function is given by the second element of D.

Conventional methods can work equally well. However, with approximately the same amount of computational effort, the conventional implementation of the least-squares method does not produce any information other than the least-squares solution θ̂.
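The following is a minimal numpy sketch of Theorem 2.1 applied to the data of Example 2.1. The unit LDL^T factorization is coded by hand (without pivoting) purely for illustration, and the helper name forward_decomposition_ls is just for this sketch; the printed U and D should match the matrices quoted above.

```python
import numpy as np

def forward_decomposition_ls(Xa):
    """Forward decomposition least-squares (Theorem 2.1), illustrative only.

    Xa is the augmented data matrix [x1 ... xn  -y].  Forms S = Xa' Xa,
    factors S = L D L' (unit lower triangular L, diagonal D, no pivoting)
    and returns the solution matrix U = L^{-T} and the loss functions D.
    """
    S = Xa.T @ Xa
    m = S.shape[0]
    L, D = np.eye(m), np.zeros(m)
    for j in range(m):
        D[j] = S[j, j] - L[j, :j] ** 2 @ D[:j]
        for i in range(j + 1, m):
            L[i, j] = (S[i, j] - (L[i, :j] * D[:j]) @ L[j, :j]) / D[j]
    return np.linalg.inv(L).T, D

# data of Example 2.1
X = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 0.]])
y = np.array([10., 28., 37.])
U, D = forward_decomposition_ls(np.column_stack([X, -y]))
print(np.round(U, 4))   # last column [3, 2, 1, 1]: theta_hat plus the trailing 1
print(np.round(D, 4))   # diag(66, 0.8182, 13.5, 0)
```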

2.2 Backward Decomposition Least-Squares

If the augmented data matrix X̄ is instead formulated as X̄ = [-y, x_1, x_2, ..., x_n], consider the following n sets of equations

$$
\begin{aligned}
y + e_1 &= u_{12}x_1 \\
y + e_2 &= u_{13}x_1 + u_{23}x_2 \\
&\;\;\vdots \\
y + e_{n-1} &= u_{1n}x_1 + u_{2n}x_2 + \cdots + u_{n-1,n}x_{n-1} \\
y + e_n &= \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_{n-1}x_{n-1} + \theta_n x_n
\end{aligned}
\qquad (2.17)
$$

That is, each set of equations uses the x vectors to fit the observation vector y, which obviously makes more sense than those in (2.10). The sets of equations differ in the number of x vectors (ranging from 1 to n) used for fitting y. Again, the last equation is the original least-squares problem defined in (2.2) or (2.1).

Rewriting equation (2.17) in matrix form gives

$$
\begin{bmatrix} -y & x_1 & x_2 & \cdots & x_{n-1} & x_n \end{bmatrix}
\begin{bmatrix}
1 & 1      & 1      & \cdots & 1         & 1        \\
  & u_{12} & u_{13} & \cdots & u_{1n}    & \theta_1 \\
  &        & u_{23} & \cdots & u_{2n}    & \theta_2 \\
  &        &        & \ddots & \vdots    & \vdots   \\
  &        &        &        & u_{n-1,n} & \theta_{n-1} \\
  &        &        &        &           & \theta_n
\end{bmatrix}
= E
\qquad (2.18)
$$

where E is the residual matrix of appropriate dimension. In more compact form,

$$ \bar X U = E \qquad (2.19) $$

The solution matrix U is then given by the following theorem.

Theorem 2.2 (Backward Decomposition Least-Squares)
For the least-squares problem defined in (2.2), define the (non-symmetric) augmented data product matrix S as

$$ S = [\,-y, \; X\,]^T [\,X, \; -y\,] $$

and decompose S with the LDU decomposition into S = LDU. The least-squares solution matrix U and the corresponding loss function matrix D for the n sets of linear equations in (2.17) or (2.19) are then given by

$$ U = \tilde L^{-1}, \qquad D = \mathrm{diag}(E^T E) \qquad (2.20) $$

where the residual matrix E is calculated from (2.19), and L̃ is the normalized matrix whose inverse U takes the upper triangular form in (2.18).

A simple procedure for calculating U from L is provided in Section 2.4. The proof of this theorem can be found in Appendix A.2.

Example 2.2 (Backward Decomposition Least-Squares)

Consider the same data as in Example 2.1. Apply the LDU decomposition to the augmented data product matrix

$$ S = [\,-y, \; X\,]^T [\,X, \; -y\,] = LDU $$

Then the solution and loss function matrices U and D are given by

$$ U = \begin{bmatrix} 1 & 1 & 1 & 1 \\ & 5.7727 & -2.5000 & 3.0000 \\ & & 7.0000 & 2.0000 \\ & & & 1.0000 \end{bmatrix}, \qquad D = \mathrm{diag}(2253.0, \; 53.591, \; 13.500, \; 0.0000) $$

which provides the least-squares solutions as the following three sets of equations

$$
\begin{aligned}
y &= +5.7727\,x_1 & (53.591) \\
y &= -2.5000\,x_1 + 7.0000\,x_2 & (13.500) \\
y &= +3.0000\,x_1 + 2.0000\,x_2 + 1.0000\,x_3 & (0.0000)
\end{aligned}
\qquad (2.21)
$$

That is, the last column of U gives the least-squares solution to the original equations, which is the third set of equations in (2.21). The third column of U gives the observation vector y as the best linear combination, in the least-squares sense, of the two vectors x_1 and x_2, which is the second set of equations in (2.21). The second column of U gives the least-squares fit of y with the single vector x_1. The corresponding loss functions for the three equations are given by the 2nd, 3rd and 4th elements of D, respectively.

A comparison of Example 2.2 with Example 2.1 reveals that the results here are more natural and meaningful.
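The backward results in (2.21) can also be checked directly, without any decomposition, by solving the three growing least-squares fits with a generic solver; a minimal numpy sketch using the same data:

```python
import numpy as np

X = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 0.]])   # columns x1, x2, x3
y = np.array([10., 28., 37.])

# fit y with x1, then with x1..x2, then with x1..x3 (the three rows of (2.21))
for k in (1, 2, 3):
    coef, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
    loss = np.sum((y - X[:, :k] @ coef) ** 2)
    print(k, np.round(coef, 4), round(float(loss), 4))
# expected: 5.7727 (53.591); [-2.5, 7.0] (13.5); [3, 2, 1] (0.0)
```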

2.3 Generalization

With the augmented equations, the least-squares problem in (2.3) can be redefined as

$$ \hat{\bar\theta} = \arg\min_{\bar\theta} \|\bar X \bar\theta\|^2 \quad \text{subject to} \quad \bar\theta_{n+1} = 1 \qquad (2.22) $$

This is a more general form of the least-squares problem. The least-squares method with QR decomposition then becomes

$$ \bar X = QR = QDU \qquad (2.23) $$

where D and U are the further decomposition of the triangular matrix R into a diagonal and a unit upper triangular factor, respectively. The least-squares solution can then be read directly from the last column of U^{-1}, and the corresponding loss function from the last diagonal element of D^2. In addition, the matrix Q is not explicitly needed, so both computation and storage can be saved. For example, in the QR decomposition

$$ \bar X = [\,Q_1, \; Q_2\,] \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R $$

only the Q_1 part of the orthogonal matrix Q needs to be calculated.
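In numpy, for instance, the economy-size factorization that returns only Q1 (or only R) is available directly; the data matrix below is hypothetical.

```python
import numpy as np

Xa = np.random.default_rng(1).standard_normal((100, 4))   # hypothetical augmented data
Q1, R = np.linalg.qr(Xa, mode='reduced')                  # Q1: 100x4, R: 4x4 upper triangular
R_only = np.linalg.qr(Xa, mode='r')                       # when Q is not needed at all
```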

As a generalization, the following three least-squares problems can be defined with the same cost function (2.22):

1. Forward prediction least-squares:

$$ \hat{\bar\theta} = \arg\min_{\bar\theta} \|\bar X\bar\theta\|^2 \quad \text{subject to} \quad \bar\theta_{n+1} = 1 \qquad (2.24) $$

2. Backward prediction least-squares:

$$ \hat{\bar\theta} = \arg\min_{\bar\theta} \|\bar X\bar\theta\|^2 \quad \text{subject to} \quad \bar\theta_{1} = 1 \qquad (2.25) $$

3. Total least-squares:

$$ \hat{\bar\theta} = \arg\min_{\bar\theta} \|\bar X\bar\theta\|^2 \quad \text{subject to} \quad \|\bar\theta\|_2 = 1 \qquad (2.26) $$

In fact, we can go one step further and extend the conventional single model least-squares method defined in (2.22) into a multiple model form, without increasing the computational effort.

Consider the following extension of the least-squares problem (2.2),

$$ \bar X \Theta = E \qquad (2.27) $$

where X̄ ∈ R^{m×(n+1)} is defined in (2.12) and Θ is a square solution matrix of dimension (n+1) × (n+1). Compared with the conventional least-squares problem (2.2), which solves one set of equations, equation (2.27) defines n sets of equations and is referred to as multiple models.

The following multiple model least-squares problem can then be defined.

Definition 2.1 (Multiple Model Least-Squares)
For the multiple equations defined in (2.27), the least-squares estimate of the solution matrix is defined as

$$ U = \arg\min_{\Theta} \|\bar X \Theta\|^2 \qquad (2.28) $$

where X̄ is the augmented data matrix and can take any form, and Θ is the solution matrix with a user-specified special structure.

The structures of conventional least-squares and multiple model least-squares are compared in Figure 2.1, where it is clear that conventional least-squares is only a subset of the multiple model least-squares problem.

Figure 2.1: Least-Squares vs Multiple Model Least-Squares

Orthogonal transformations are the natural choices for the multiple model least-squares problem. Here, in this paper, we consider the QR and singular value decomposition methods. For the QR decomposition shown in (2.23), the MMLS criterion (2.28) becomes

$$ U = \arg\min_{\Theta} \|\bar X \Theta\|^2 = \arg\min_{\Theta} \|R\,\Theta\|^2 $$

which indicates that the R matrix contains the same amount of information as X̄ in terms of meeting the MMLS criterion (2.28). The following are the two special structures for Θ that are discussed in Sections 2.1 and 2.2:

1. Θ is restricted to an upper triangular matrix with all its diagonal elements equal to unity:

$$ U = \begin{bmatrix}
1 & u_{12} & u_{13} & \cdots & u_{1n} & \theta_1 \\
  & 1      & u_{23} & \cdots & u_{2n} & \theta_2 \\
  &        & 1      & \cdots & u_{3n} & \theta_3 \\
  &        &        & \ddots & \vdots & \vdots   \\
  &        &        &        & 1      & \theta_n \\
  &        &        &        &        & 1
\end{bmatrix} \qquad (2.29) $$

This is the structure discussed in Section 2.1 and will be referred to as the forward MMLS method.

2. Θ is assumed to be an upper triangular matrix with its first row equal to unity:

$$ U = \begin{bmatrix}
1 & 1      & 1      & \cdots & 1         & 1        \\
  & u_{12} & u_{13} & \cdots & u_{1n}    & \theta_1 \\
  &        & u_{23} & \cdots & u_{2n}    & \theta_2 \\
  &        &        & \ddots & \vdots    & \vdots   \\
  &        &        &        & u_{n-1,n} & \theta_{n-1} \\
  &        &        &        &           & \theta_n
\end{bmatrix} \qquad (2.30) $$

This is the structure discussed in Section 2.2 and will be referred to as the backward MMLS method.

Consequently, with QR decomposition, the multiple model least-squares problem becomes that of extracting the useful information from R in an efficient and explicit form such as (2.29) or (2.30). What is left of the R matrix after the extraction, which is a diagonal matrix, is the corresponding loss function matrix.

Obviously, an appropriate combination of the data matrix structure and the decomposition method can lead to several useful structures. For example:

1. If the augmented data matrix is arranged in the form X̄ = [x_1, x_2, ..., x_n, -y], then applying the forward and backward decompositions (Theorem 2.1 and Theorem 2.2) respectively leads to the results shown in Figures 2.2a and 2.2b, where each arrow represents a set of equations. For instance, the bottom arrow in Figure 2.2a implies fitting y with x_1, x_2, ..., x_n.

2. If the order of the columns in the data matrix is reversed, as X̄ = [-y, x_n, ..., x_2, x_1], then applying Theorem 2.1 and Theorem 2.2 produces the solutions and loss functions for the equations represented by Figures 2.2c and 2.2d.

Figure 2.2: Different Model Structures (panels (a)-(d) show, for the two column orderings above and the two decomposition directions, which subsets of x_1, ..., x_n and y are fitted at each model order from 1 to n)

Another obvious choice of orthogonal decomposition is the singular value decomposition (Golub & Loan 1989), where the augmented data matrix X̄ is decomposed into

$$ \bar X = U S V^T \qquad (2.31) $$

U and V are orthogonal matrices and S is a diagonal matrix containing the singular values of X̄ as its diagonal elements. If the parameter matrix Θ in (2.27) is assumed to be orthogonal, then the best parameter matrix U is given by

$$ U = V^{-T} = V \qquad (2.32) $$

From Huffel & Vandewalle (1991) it is known that the last column of U gives the total least-squares solution to (2.27). In parameter estimation, the total least-squares problem is also called the errors-in-variables problem. In the least-squares method, the variable to be fitted (corresponding to the 1 in the augmented parameter vector) is assumed to be noise-corrupted while the other variables are assumed noise-free. The total least-squares method, on the other hand, assumes that all the variables are contaminated by noise. Details on total least-squares can be found in Huffel & Vandewalle (1991) and Golub & Loan (1989) and are not discussed further in this paper.
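As a small illustration of (2.31)-(2.32), the following numpy sketch (hypothetical errors-in-variables data, not from the paper) reads a total least-squares estimate from the right singular vector associated with the smallest singular value of the augmented matrix [X, -y], rescaled so that its last entry equals 1.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = np.array([2.0, -1.0, 0.5])
X0 = rng.standard_normal((200, 3))                     # noise-free regressors
X = X0 + 0.05 * rng.standard_normal(X0.shape)          # all variables observed with noise
y = X0 @ theta_true + 0.05 * rng.standard_normal(200)

Xa = np.column_stack([X, -y])                          # augmented data matrix [X, -y]
_, _, Vt = np.linalg.svd(Xa, full_matrices=False)
v = Vt[-1]                                             # direction of the smallest singular value
theta_tls = v[:-1] / v[-1]                             # rescale so the last entry equals 1
print(np.round(theta_tls, 3))                          # close to theta_true
```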

Notice that up to this stage no restrictions have been placed on the structure of the data matrix. This implies that, given an arbitrary matrix X̄, the forward and backward MMLS provide information on various forms of linear dependence among its columns, as shown in Figure 2.2. The linear dependence is uniquely and conveniently determined and interpreted by equation (2.27).

2.4 Implementation

In the previous sections, the LDU/LDL^T decomposition methods were used to explain the principle of the multiple model least-squares method, since they are structurally more appealing. However, forming the data product matrix may lose precision due to finite word length and roundoff errors. As a result, the QR decomposition, which works directly on the data matrix, is always recommended for practical applications. This section presents the implementation of MMLS with QR decomposition.

First of all, the QR decomposition is related to the LU/LDL^T and Cholesky decompositions as follows. Given the symmetric matrix S = X̄^T X̄, the decompositions

  LDL^T decomposition:     S = L D L^T
  LU decomposition:        S = L U
  Cholesky decomposition:  S = G G^T
  QR decomposition:        X̄ = Q R

are related to each other by

$$ U = D L^T, \qquad G = L D^{1/2}, \qquad R = G^T $$

Theoretically, all these decompositions produce exactly the same results; QR is chosen here solely for its superior numerical properties over the others.

The general procedure for MMLS with QR is thus as follows. First perform a standard QR decomposition (Bjorck 1994, for instance) of the augmented data matrix. Then extract the parameter and loss function matrices according to the special structure of the parameter matrix. For the forward MMLS, the extraction is done trivially by

$$ D = [\,\mathrm{diag}(R)\,]^2, \qquad U = R^{-1}\,\mathrm{diag}(R) $$
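A minimal numpy sketch of this forward extraction on hypothetical data (batch, non-recursive): the last column of U then carries θ̂ and the last diagonal entry of D the corresponding loss function, with the lower-order fits in the earlier columns.

```python
import numpy as np

rng = np.random.default_rng(3)                       # hypothetical data
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(50)

Xa = np.column_stack([X, -y])                        # augmented data matrix [X, -y]
R = np.linalg.qr(Xa, mode='r')                       # triangular factor only
d = np.diag(R)
D = d ** 2                                           # loss function matrix (diagonal)
U = np.linalg.inv(R) * d                             # U = R^{-1} diag(R), unit upper triangular
print(np.round(U[:-1, -1], 3))                       # theta_hat, agrees with lstsq(X, y)
print(np.round(D, 3))
```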

For the backward MMLS, the stepwise procedure shown in Table 2.2 can be used.

    d(n) = R(n,n)^2
    z = R(1:n-1, n)
    for k = n-1 downto 1
        d(k) = d(k+1) + z(k)^2
        y = z(1:k)
        y(k) = z(k) / R(k,k)
        for i = k-1 downto 1
            y(i) = ( z(i) - R(i, i+1:k) y(i+1:k) ) / R(i,i)
        U(1:k, k) = y

Table 2.2: Backward MMLS Procedure
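Below is a rough numpy transcription of the Table 2.2 loop (0-based indexing; the helper name backward_mmls is just for this sketch). As an assumption of the sketch, R is taken from the QR factorization of the augmented matrix [x1 ... xn y] with +y as the last column, so that the back-substituted coefficients fit y directly; with -y as the last column the signs of the coefficients simply flip.

```python
import numpy as np

def backward_mmls(R):
    """Backward MMLS extraction from the triangular factor R of [x1 ... xn  y].

    Column k-1 of the returned U holds the least-squares fit of y with x1..xk;
    d[k] is the loss of the k-regressor model, with d[0] = ||y||^2.
    """
    N = R.shape[0]
    n = N - 1
    z = R[:n, n]                       # last column of R above the diagonal
    d = np.zeros(N)
    d[n] = R[n, n] ** 2                # loss of the full n-regressor model
    U = np.zeros((n, n))
    for k in range(n, 0, -1):
        c = np.zeros(k)                # back-substitution: R[:k, :k] c = z[:k]
        for i in range(k - 1, -1, -1):
            c[i] = (z[i] - R[i, i + 1:k] @ c[i + 1:k]) / R[i, i]
        U[:k, k - 1] = c
        d[k - 1] = d[k] + z[k - 1] ** 2
    return U, d

# hypothetical check against the ordinary least-squares solution
rng = np.random.default_rng(4)
X = rng.standard_normal((60, 3))
y = X @ np.array([3.0, 2.0, 1.0]) + 0.1 * rng.standard_normal(60)
R = np.linalg.qr(np.column_stack([X, y]), mode='r')
U, d = backward_mmls(R)
print(np.round(U, 3))    # last column ~ [3, 2, 1], same as lstsq(X, y)
print(np.round(d, 3))    # d[3] ~ residual sum of squares of the full model
```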


Chapter 3

Multiple Model Least-Squares Estimator

This section first reviews the conventional least-squares estimator for parameter estimation and then proposes and discusses the multiple model least-squares estimator, which is based on the multiple model least-squares principle discussed in Section 2. As the focus narrows from general least-squares to parameter estimation, some notational changes are made to stay consistent with the notation widely accepted in parameter estimation and system identification; the change should not cause any problems of interpretation.

3.1 Parameter Estimation with Least-Squares

Assume that input/output data from a system are available up to time step t as

$$ \{u(1), z(1)\},\; \{u(2), z(2)\},\; \ldots,\; \{u(t), z(t)\} \qquad (3.1) $$

where u(·) and z(·) are the system input and output, respectively. The general problem of least-squares parameter estimation is then to find a linear model that minimizes the quadratic criterion

$$ J(t) = \sum_{k=1}^{t} \left[\, z(k) - \hat z(k) \,\right]^2 $$

where ẑ(k) is the predicted system output using the assumed model. Assume that the system under investigation can be described by the following linear difference equation (ARX) model

$$ z(t) + a_1 z(t-1) + \cdots + a_n z(t-n) = b_1 u(t-1) + \cdots + b_n u(t-n) + v(t) \qquad (3.2) $$

or, in a more compact format,

$$ z(t) = h^T(t)\,\theta + v(t) \qquad (3.3) $$

where n is the model order and v(t) is the unmodeled part of the system, assumed to be a zero-mean white noise sequence with variance σ_v².

The data vector h(t) and the parameter vector θ are defined as

$$ h(t) = [\,-z(t-1),\;\ldots,\;-z(t-n),\;u(t-1),\;\ldots,\;u(t-n)\,]^T \qquad (3.4) $$

$$ \theta = [\,a_1,\;\ldots,\;a_n,\;b_1,\;\ldots,\;b_n\,]^T \qquad (3.5) $$

For simplicity of notation, assuming that the observed input/output sequence {u(i), z(i)} is available for time i = -n+1 to t, the data matrix is defined, in terms of its rows and its columns respectively, as

$$ H(t) = \begin{bmatrix} h^T(1) \\ h^T(2) \\ \vdots \\ h^T(t) \end{bmatrix} \qquad (3.6) $$

and

$$ H(t) = [\,-\mathbf{z}(t-1),\;\ldots,\;-\mathbf{z}(t-n),\;\mathbf{u}(t-1),\;\ldots,\;\mathbf{u}(t-n)\,] \qquad (3.7) $$

where the bold z(t-k) and u(t-k) denote the columns of lagged output and input samples. The observation vector is defined as

$$ \mathbf{z}(t) = [\,z(1),\;z(2),\;z(3),\;\ldots,\;z(t-1),\;z(t)\,]^T \qquad (3.8) $$

The least-squares solution to this problem is then equivalent to finding the best estimate θ̂(t) for

$$ H(t)\,\theta(t) = \mathbf{z}(t) \qquad (3.9) $$

in the sense of minimizing the quadratic criterion

$$ J(t) = [\,\mathbf{z}(t) - H(t)\theta(t)\,]^T [\,\mathbf{z}(t) - H(t)\theta(t)\,] \qquad (3.10) $$

Note that equation (3.9) has the standard least-squares format defined in (2.2). Setting the derivative of J(t) with respect to θ(t) to zero leads to

$$ \hat\theta(t) = [\,H^T(t)H(t)\,]^{-1} H^T(t)\,\mathbf{z}(t) \qquad (3.11) $$

which is the least-squares estimate for system (3.3) and agrees with (2.5). In another form,

$$ \hat\theta(t) = \left[ \sum_{j=1}^{t} h(j)h^T(j) \right]^{-1} \sum_{j=1}^{t} h(j)z(j) \qquad (3.12) $$

Obviously, the availability of the least-squares estimate θ̂(t) depends on the invertibility of the matrix H^T(t)H(t).

For many decades, the above least-squares estimator (3.11) or (3.12) has been the cornerstone of parameter estimation (Box & Jenkins 1970, Lawson & Hanson 1974, Ljung 1987). However, since a straightforward implementation of the least-squares method suffers from numerical problems, more stable implementations have constantly been sought. In addition to decomposition methods such as the LU/LDL^T, QR, Cholesky and singular value decomposition techniques (Golub & Loan 1989), Potter's square-root least-squares method (Potter & Stern 1963, Bennet 1963, Andrews 1968) and later Bierman's UDU^T factorization method (Bierman 1977, Thornton & Bierman 1980) are examples of successful endeavors in the signal processing and system identification communities. The least-squares lattice (LSL) method (Morf et al. 1977, Aling 1993), which has been widely used in signal processing, especially for recursive implementation, is another example of an improved least-squares estimator.

In this section, the multiple model least-squares principle proposed in Section 2 is used to develop the multiple model least-squares estimator, which is characterized by its improved numerical properties, due to the decomposition technique exploited, and by the explicit multiple model structure resulting from the rearrangement and augmentation of the regressor.

In addition, the basic least-squares method of Gauss (1809) only produces the least-squares estimate of the highest-order model, whereas it has been shown in the literature, through order-recursive structures, that the data matrix contains much more information than just the parameter estimate of the highest-order model, and that this additional information can be obtained without extra computation.

3.2 The Multiple Model Least-Squares Method

The MMLS method is implicitly an order-recursive least-squares method, which is an "old" topic that has attracted much attention in past decades. For instance, both Levinson's method (Levinson 1947, Morf et al. 1977, Whittle 1963) and the least-squares lattice method (Morf et al. 1977, Lee 1980, Lee et al. 1981, Makhoul 1977, Friedlander 1983, Aling 1993) can produce all the lower-order models as by-products (Ljung 1987), and mathematically it can be proven that both are equivalent to the MMLS formulation.

However, the MMLS method proposed here differs from previous contributions in that it "integrates" many well-established techniques, favorable properties and features into a simple, flexible and very reliable structure in a natural way, and provides more "completeness" of information and "convenience" of implementation for parameter estimation and signal processing.

Assuming that the order of the actual system (3.1) is set to a maximum possible order (MPO) of n, define the augmented data vector as

$$ \varphi(t) = [\,-z(t-n),\;u(t-n),\;\ldots,\;-z(t-1),\;u(t-1),\;-z(t)\,]^T \qquad (3.13) $$

Note that, in comparison with the data vector (3.4) of the conventional least-squares estimator, the elements of the augmented data vector (3.13) are interleaved and rearranged in input/output pairs, and the current system output z(t) is also included in the data vector. This special structure is the basis of the MMLS method and is the fundamental difference between the MMLS formulation and the conventional least-squares formulation.

Define the augmented data matrix as

$$ \Phi(t) = \begin{bmatrix} \varphi^T(1) \\ \varphi^T(2) \\ \vdots \\ \varphi^T(t) \end{bmatrix} \qquad (3.14) $$

and then define the data product matrix, in analogy with (2.13), as

$$ S(t) = \Phi^T(t)\,\Phi(t) = \sum_{j=1}^{t} \varphi(j)\varphi^T(j) \qquad (3.15) $$

This matrix is called the augmented information matrix (AIM) in the context of parameter estimation and system identification. Decomposing the AIM according to Theorem 2.1, that is, S(t) = L(t)D(t)L^T(t), the resulting matrices U(t) = L^{-T}(t) and D(t) = D(t) are the parameter and loss function matrices for all the equations (models) defined by

$$ \varphi^T(t)\,U(t) = e^T(t) \qquad (3.16) $$
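To tie (3.13)-(3.16) together, here is a batch (non-recursive) numpy sketch on simulated second-order ARX data: it builds the interleaved data vectors φ(t), forms the triangular factor of the augmented data matrix, and extracts the parameter and loss function matrices with the QR-based forward MMLS of Section 2.4, comparing the full-order parameters with an ordinary least-squares fit. The data and parameter values are hypothetical, and the recursive UD-updating AUDI algorithms themselves are not shown.

```python
import numpy as np

rng = np.random.default_rng(5)                   # simulated ARX(2) data (hypothetical)
a, b, N = [-1.5, 0.7], [1.0, 0.5], 500
u = rng.standard_normal(N)
z = np.zeros(N)
for t in range(2, N):
    z[t] = -a[0]*z[t-1] - a[1]*z[t-2] + b[0]*u[t-1] + b[1]*u[t-2] \
           + 0.02*rng.standard_normal()

# interleaved augmented data vectors (3.13): [-z(t-2)  u(t-2)  -z(t-1)  u(t-1)  -z(t)]
Phi = np.column_stack([-z[:-2], u[:-2], -z[1:-1], u[1:-1], -z[2:]])

# QR-based forward MMLS extraction (Section 2.4)
R = np.linalg.qr(Phi, mode='r')
D = np.diag(R) ** 2                  # loss functions of the embedded models
U = np.linalg.inv(R) * np.diag(R)    # parameter matrix, unit upper triangular
print(np.round(U[:-1, -1], 3))       # full-order parameters, interleaved: ~[a2, b2, a1, b1]

# conventional least-squares estimate (3.11) for comparison
H = np.column_stack([-z[1:-1], -z[:-2], u[1:-1], u[:-2]])
print(np.round(np.linalg.lstsq(H, z[2:], rcond=None)[0], 3))   # ~[a1, a2, b1, b2]
```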
