• No results found

Regularization of singular least squares problems

N/A
N/A
Protected

Academic year: 2021

Share "Regularization of singular least squares problems"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

Regularization of singular least squares problems

P. Carrette

Department of Electrical Engineering Linkping University, S-581 83 Linkping, Sweden

WWW:

http://www.control.isy.liu.se

Email:

[email protected]

March 11, 1998

REGLERTEKNIK

AUTOMATIC CONTROL LINKÖPING

Report no.: LiTH-ISY-R-2019 Submitted to BIT journal

Technical reports from the Automatic Control group in Linkping are available by anony-

mous ftp at the address

ftp.control.isy.liu.se

. This report is contained in the com-

pressed postscript le

2019.ps.Z

.

(2)

Regularization of singular least squares problems

P. Carrette

Department of Electrical Engineering, Linkoping University S-58183 Linkoping, Sweden

Abstract

In this note, we analyze the inuence of the regularization procedure applied to singular LS square problems.

It appears that, due to nite numerical accuracy within the computer calculations, the regularization parameter has to belong to a particular range of values in order to have the regularized solution close to that associated to the singular LS problem. Surprisingly enough, this range essentially depends on the square root of the computer precision while the deciency (or singularity) of the regularized LS problem is governed by this precision.

The analysis is based on matrix perturbation theory for which the paper 12] is an utmost reference.

Keywords

: Matrix perturbation, Tikhonov regularization, Singular value decompo- sition.

1 Introduction

In this contribution, we present results concerning the use of Tikhonov regularization (see 13, 10] and references therein) while solving singular least squares (LS) problems, i.e.

x

0

= min

x

kAx;bk

2

subject to min

kxk2

(1)

for which the matrix

A2RN n

(for

N n

) is (column) rank decient. The corresponding regularized LS problem is as follows

~

x



= min

x

kAx;bk 2

2

+

2kxk22

(2)

for some



value.

Here, we intend to investigate the inuence of the regularization parameter



upon the deviation between the regularized solution ~

x

and the vector

x0

(solution to problem (1)), i.e. ~

x;x0

. The reason for this study is that we want to nd a good approximation of

x

0

without solving the singular LS problem itself (by, e.g., performing the pseudo-inverse

1

(3)

10−12 10−10 10−8 10−6 10−4 10−2 100 10−8

10−6 10−4 10−2 100 102 104 106 108

Figure 1: Quantities

kx

~

 ;x0k2

(

|

) and

kx

~

k2

(

;;

) as functions of .

2.2 2.25 2.3 2.35 2.4 2.45 2.5 2.55 2.6 2.65 2.7

101 102

kAx~;bk2

k~xk2

Figure 2:

L

-curve example with the three points appearing in Figure 1.

11] of the matrix

A

, i.e.

x0

=

Ayb

).

As it appears in the literature (see 1, 5, 7, 4, 14] and the Matlab package presented in 8]), Tikhonov regularization gives us a grip for such an approximation while solving the ordinary (full column rank) LS problem (2). The available results about Tikhonov regularization generally consider the links between the two contributions to the cost of the regularized problem, i.e.

kAx;bk2

and

kxk2

, only. Indeed, roughly speaking, the solution to the original problem (1) can be obtained in making these two quantities simultaneously small for appropriate value of the regularization parameter



. By this, we have in mind the reasoning that leads to the selection of \best"



values by inspecting the

L

-curve that is the representation of these two quantities at regularized solutions, i.e.

kx

~

k2

as a function of

kAx

~

;bk2

for dierent values of



, (see 7] for more details).

Unfortunately, the \corner" in the

L

-curve does not provide



values that are robust with respect to the 2-norm of the deviation we are interested in, i.e.

kx

~

 ;x0k2

. Let us illustrate this fact by a simple example:

A

is a 50



5 matrix with rank 4 and unit nonzero singular values while the 50 elements of the column

b

are samples of an uniformly distributed (in 0



1]) random variable. In Figure 1, we have presented the 2-norm of the deviation between the regularized solution ~

x

and the original solution

x0

as well as the 2-norm of the regularized solution ~

x

as functions of the parameter



. The graph of the primer deviation 2-norm can be divided into two parts in agreement with the decreasing and increasing behaviors of this error with



. Three points have been enlightened on the straight line curve. In Figure 2, we have displayed the corresponding

L

-curve, i.e.

(

kAx

~

;bk2kx

~

k2

) for dierent values of



. The preceding three points are concentrated in the lower-left corner of this curve so that it does not succeed in giving a high preference to the \o" point that is associated to the \best" approximation of the reference solution

x

0

, i.e. for



5

:

6 10

;5

.

In order to overcome the poor capability of the

L

-curve (as well as of the usual studies of Tikhonov regularization) to end up with a



value corresponding to a solution ~

x

close

2

(4)

to the reference vector

x0

, we here provide the analysis of the deviation between these two solutions, i.e. ~

x ;x0

, on the basis of matrix perturbation theory. Therefore, we make an intensive use of the paper of Stewart 12] that presents a complete analysis of the perturbation of the singular value decomposition (SVD) of matrices. Note that in 5], Hansen provides expressions for the upperbound on the relative deviation between a perturbed and the unperturbed solution, denoted

x

, of a LS problem similar to (2), i.e.

kx

~

;xk2=kxk2

.

As a by-product of our analysis, intervals over the regularization parameter values are found for any given admissible accuracy imposed on the regularized solution ~

x

, i.e.

 2

;+

] implying that

kx

~

;x0k2 

.

The paper is organized as follows. In Section 2, we introduce the notations we will be dealing with in the sequel and we discuss on how numerical errors should be taken into account in order to make the analysis of the regularized solution deviation explain its behavior (as seen in Figure 1). In Section 3, we decompose the deviation between the regularized solution ~

x

and the original

x0

into two components, i.e. either in or out of the kernel of the problem matrix

A

. In Section 4, we gives expressions for the singular value decomposition (SVD) of the perturbed version of the matrix

A

that enters in the resolution of the regularized LS problem (2). This is achieved on the basis of results presented in 12].

In Section 5, we end up with a closed form expression of the 2-norm of the regularized solution error. Its intrinsic characteristics with respect to the regularization parameter



are commented in some details. Finally, simulation examples are presented in Section 6.

They completely agree with the results derived in the preceding sections.

2 Notations and numerical error discussion

First, let us introduce notations for the two LS problems, i.e. minimization (1) and (2), respectively.

The matrix

A

has a column rank identical to

r <n

while

b

=

Ax0

+

e

where

e 2

ker

AT

originates from the inconsistency of the LS problem (1). We also dene the matrix

A 2R

(N+n) n

as

A

=



A

0



and



=



b

0



The SVD of

A

is written as

A

=

U1 U2

]



S

1

0 0 0



V T

1

V T

2



=

U1S1V1T

(3)

where

S1

= diag(

s1  sr

) with

si

, the

i

-th (in decreasing order) nonzero singular value of

A

while the matrices

U1 U2

] and

V1 V2

] are orthogonal of dimension (

N

+

n

) and

n

, respectively. Then, the orthogonal projector onto the range of

AT

is

Kk

=

V1V1T

while its orthogonal counterpart, i.e. the orthogonal projector onto the kernel of

A

, is denoted

3

(5)

K

?

=

I;Kk

(=

V2V2T

).

With these notations in mind, we can write the solution to the original LS problem as

x

0

=

Ay

where

Ay

is the pseudo-inverse of

A

, i.e.

Ay

=

V1S1;1U1T

(uniquely dened). It is worth mentioning that

x0

=

Kkx0

so that

x0

all lies in the range of

AT

.

Remark 1 The \

n

" operator designed in Matlab for solving LS problems, i.e.

An

, leads to solutions that generally contains a component in the kernel space of

A

(when it is non-trivial), i.e.

x

LS

=

x0

+

K?

with nonzero

2 Rn

. The reason for this is that the procedure uses a

QR

decompo- sition of the matrix

A

and, in case

R

has its (

n ;r

) last rows that contain negligible elements regarding the numerical accuracy of the computer, the (

n;r

) last elements of the associated LS solution are xed to zero. In fact, this corresponds to choosing

so that

K?

exactly compensates the (

n;r

) last elements of the reference solution

x0

(after permutations if necessary in the

QR

decomposition process).

2

Now, let us turn to the solution of the regularized LS problem presented in (2). This problem can be expressed in terms of a particular perturbation of the matrix

A

. Obviously, the second term in the brackets of (2) makes the bottom part of the corresponding matrix

A

, denoted

A

, become an identity matrix scaled by the regularization parameter



, i.e.

A



=



A

I



The corresponding solution is

x



=

Ay

(or

An

)

It is unique because

A

is full column rank for

>

0. From a numerical point of view, we must ask for



(e.g.

>

10

2

) where

denotes the accuracy of the computer.

Unfortunately, when we simulate the deviation between this \theoretically" regularized solution

x

and the solution

x0

, it is impossible to explain the deviation generally associ- ated with the \practically" regularized solution (see the straight line in Figure 8 compared to that in Figure 1). This means that our assumption concerning the inuence of the reg- ularization upon the matrix

A

(leading to

A

) is not correct. This can be viewed as a bad model of the regularization eect because it cannot reveal its intrinsic characteristics. It actually appears that additional perturbations must be considered: namely, the numerical errors associated to the computation of the regularized solution. More precisely, we deal with the following regularized matrix

~

A



=

A

+

E

(4)

4

(6)

where

denotes the numerical accuracy, i.e.

= 2

:

2 10

;16

in Matlab 5.1, and

E 2

R

( N+n) n

stands for the (normalized) numerical error matrix (whose structure is detailed below). The SVD of ~

A

is denoted

~

A



= ~

U1 U

~

2 U

~

3

]

2

4

~

S

1

0 0 ~

S2

0 0

3

5

 V

~

1T

~

V T

2



= ~

U1S

~

1V

~

1T

+ ~

U2S

~

2V

~

2T

where ~

S1

is a diagonal containing the

r

largest singular values of ~

A

(in increasing order!) and ~

S2

is a diagonal containing its (

n ;r

) last nonzero singular values. The matrices ~

U1 U

~

2 U

~

3

] and ~

V1 V

~

2

] are orthogonal of dimension (

N

+

n

) and

n

, respectively.

The solution of the corresponding LS problem is written as

~

x



= ~

Ay

(or ~

An

) (5)

It is unique in case ~

A

is full column rank, i.e.



for instance. The purpose of the paper is then to analyze in details the deviation between this solution ~

x

and the reference solution

x0

dened for the original LS problem.

Finally, let us give a reasonable structure for the numerical error matrix

E

. Therefore, we consider the left singular subspace associated to

U1

and we show how the associated singular values may induce a particular scaling of the elements of this matrix. We can write

~

A T

 U

1

=

V1S1

+

ETU1

(6)

The

i

-th column of this matrix has a 2-norm that is approximately identical to

si

. This means that the related numerical perturbation must take this scale into account, i.e.

(

V1

)

isi

+

(

ETU1

)

i

=

si;

(

V1

)

i

+

(

X1T

)

i

for an appropriate matrix

X1

whose denition is made regardless of the singular values of

A

. This leads to

ETU1

=

X1TS1

. For what concerns the

ETU2

counterpart, it can only be said that the largest elements within

E

will induce large contributions to this matrix product. Hence, we globally propose that the numerical error matrix

E

can be written as

E

=

U1S1X1

+

s1 U2X2

(7) for which we point out that the matrices

X1

and

X2

have normalized elements (indepen- dently of the

si

's and of



).

3 Deviation of the regularized solution

The deviation between the regularized solution ~

x

and the reference solution

x0

can be decomposed into two parts, i.e.

~

x



;x

0

=

Kkx

~

;x0

+

K?x

~



(8)

5

(7)

The rst term in the right hand side belongs to the range of the transpose of the matrix

A

while the second completely lies in its kernel (see denition of the orthogonal projector

Kk

and

K?

, respectively). Because of these orthogonal projectors, each of these contributions is orthogonal to the other, i.e.

kx

~

;x0k22

=

kKkx

~

;x0k22

+

kK?x

~

k22

In order to analyze these contributions, let us develop the links that exist between the two solution vectors.

From its denition in equation (5), the regularized solution ~

x

can be written as

~

x



= ~

V1 V

~

2

]

 S

~

1

0 0 ~

S2



;2

 V

~

1T

~

V T

2

 A

~

T



=

V

~

1S

~

1;2V

~

1T

+ ~

V2S

~

2;2V

~

2T



;

V

1 S

2

1 V

T

1 x

0

+

ET

where we have used the fact that

~

A T





=

AT

+

ET

=

V1S12V1Tx0

+

ET

with

x0

=

V1S1;1 U1T

. With the structure we have introduced for the numerical error matrix

E

, we also have that

E T



= (

U1S1X1

+

s1 U2X2

)

T

=

X1TS12V1Tx0

+

s1X2T

(

U2T

)

where it is worth noting that

kU2Tk22

(identical to

kek2

) is the value of the original LS cost at

x0

.

While coming back to the two contributions to the deviation of this regularized solution, we can write

K

kx

~

;x0

=

V1h

(

V1TV

~

1

)~

S1;2

(~

V1TV1

)

S12;IiV1Tx0

+

V

1

(

V1TV

~

1

)~

S1;2 h

(~

V1TX1T

)

S12V1Tx0

+

s1

(~

V1TX2T

) (

U2T

)

i

+

V

1

(

V1TV

~

2

)~

S2;2h

(~

V2TV1

) +

(~

V2TX1T

)]

S12V1Tx0

+

s1

(~

V2TX2T

) (

U2T

)

i

(9) and

K

?x

~



=

V2

(

V2TV

~

1

)~

S1;2h

(~

V1TV1

) +

(~

V1TX1T

)]

S12V1Tx0

+

s1

(~

V1TX2T

) (

U2T

)

i

+

V

2

(

V2TV

~

2

)~

S2;2h

(~

V2TV1

) +

(~

V2TX1T

)]

S12V1Tx0

+

s1

(~

V2TX2T

) (

U2T

)

i

(10) Let us give an interpretation for these two expressions. Because of the regularization of the singular LS problem, the right singular vectors of

A

rotate a little bit, i.e. leading to ~

V1 V

~

2

], and its two singular value subsets (i.e.

si

for

i

= 1

r

as well as the remaining

6

(8)

~

K k

~

K

?

K k K

?

~

x



x

0

Figure 3: Schematic representation of the compo- nents of the regularized solution ~

x

.

zero singular values) are also slightly altered giving rise to ~

S1

and ~

S2

. The regularized solution ~

x

is naturally expressed in terms of these perturbed singular pairs, i.e. (~

S1V

~

1

) and (~

S2V

~

2

). In other words, two components are found for it according to the related sub- spaces for which the orthogonal projectors are ~

Kk

= ~

V1V

~

1T

and ~

K?

= ~

V2V

~

2T

, respectively.

Hence, the (above) expressions for the contributions to the deviation of the regularized solution show up the projection of this solution back to the subspaces associated to the original matrix

A

, i.e. by use of the orthogonal projector

Kk

and

K?

. In Figure 3, we have schematically drawn this back projection for a 2-dimensional case.

It is also worth noticing that the components in the perturbed subspaces are inuenced by the inverse of the square of the corresponding singular values, i.e. ~

S1;2

and ~

S2;2

, respec- tively. As the latter will be seen to behave similarly to



, the corresponding component will show an extreme sensitivity to this regularization parameter, i.e.

1

=2

.

In order to analyze these expressions, we must evaluate the role of the regularization parameter



and of the numerical error matrix

E

on the following quantities

V T

1 V

~

1 V1TV

~

2 V2TV

~

1

and

V2TV

~

2

as well as ~

S1

and ~

S2

. A simple way to achieve this goal is to use results concerning the SVD of perturbed matrices.

4 SVD of perturbed matrices

In 12], Stewart shows that the right singular vectors of a perturbed matrix, ~

A

say, can be expressed in terms of those of the original matrix,

A

say, as follows

~

V1 V

~

2

] =

V1 V2

]



I ;P T

P I



(

I

+

PTP

)

;1=2

0 0 (

I

+

PPT

)

;1=2



(11) while

P 2R( n;r) r

is a matrix satisfying the equation system

(

Q

(

S1

+

E11

)

;

(





22

+

E22

)

P

= (





21

+

E21

)

; QE12P

P

(

S1

+

E11T

)

;

(





T22

+

E22T

)

Q

=

E12T ;P

(





T21

+

E21T

)

Q

(12)

7

(9)

for

Q2R( N+n;r) r

and 

2j

=

U2T

0

I

]

TVj

while

Eij

=

UiTEVj

.

Note that 

T2j



2j

=

I

and 

T21



22

= 0 because of ( 0

I

]

U2

)( 0

I

]

U2

)

T

=

I

together with the orthogonality of the original right singular vectors, i.e.

V1 V2

].

From expression (11), we immediately have that

V T

1 V

~

1

= (

I

+

PTP

)

;1=2 V1TV

~

2

=

;PT

(

I

+

PPT

)

;1=2

V T

2

~

V

1

=

P

(

I

+

PTP

)

;1=2 V2TV

~

2

= (

I

+

PPT

)

;1=2

(13) He also states that the perturbed singular values belong to disjoint sets so that

i

(~

S1

) =

i;

(

I

+

QTQ

)

1=2

(

S1

+

(

E11

+

E12P

))(

I

+

PTP

)

;1=2

i

(~

S2

) =

i;

(

I

+

QQT

)

;1=2

(





22

+

(

E22;QE12

))(

I

+

PPT

)

1=2

(14) where

i

(

X

) denotes the

i

-th singular value of the matrix

X

(in increasing order).

After straightforward derivations, we end up with the following result.

Proposition 1 Under the assumption that

 sr

, we have that the solution for the

P

and

Q

matrices in the equation system (12) respectively satisfy

P  V T

2 X

T

1

and

Q



21S1;1

where the symbol \



" should be understood in the spectral sense, i.e.

B C

is equivalent to

kB;Ck2

1. Furthermore, we have that

V T

1

~

V

1

I V T

2

~

V

2

I

and

V2TV

~

1

;

V1TV

~

2

]

T  V2TX1T

as well as

i

(~

S1

)

=si 

1 +

2=

2

s2i

and

i

(~

S2

)



for appropriate

i

.

Proof: In case of small perturbations, the last quadratic terms in equations (12) are generally not considered. Thus, we only have to solve

(

Q

(

S1

+

E11

)

;

(





22

+

E22

)

P

= (





21

+

E21

)

P

(

S1

+

E11T

)

;

(





T22

+

E22T

)

Q

=

E12T

As the numerical accuracy

is negligible compared to the diagonal elements in

S1

, we get

(

QS

1

;

(





22

+

E22

)

P

= (





21

+

E21

)

P

=

E12TS1;1

+ (





T22

+

E22T

)

QS1;1

Then, we can develop the rst equation as follows

QS 2

1

;

(





22

+

E22

)(





T22

+

E22T

)

Q

=



(

21S1

+



22 E12T

) +

(

E21S1

+

E22E12T

) leading to

QS 2

1

;

2



22



T22Q

(





21

+

E21

)

S1

8

(10)

while considering



. Hence, for

 sr

, we end up with

Q 

(





21

+

E21

)

S1;1 





21S1;1

as well as

P  E T

12 S

;1

1

+ (





T22

+

E22T

)(





21

+

E21

)

S1;2  E12TS1;1

where we used the fact that 

T22



21

= 0. So, from the structure of the numerical error matrix

E

in expression (7), we get that

P 

(

V2T

X1TS1U1T

+

s1 X2TU2T

]

U1

)

S1;1

=

V2TX1T

which means that the approximation of

P

is expressed regardless of the singular values of the original matrix

A

.

Finally, we are able to evaluate the expressions (13) and (14), respectively.

Therefore, we rst consider approximations of the square root matrices as follows (

I

+

QTQ

)

1=2  I

+

QTQ=

2

I

+

2S1;2=

2

(

I

+

QQT

)

1=2  I

+

QQT=

2

I

+

2



21S1;2



T21=

2 as 

T21



21

=

I

, and

(

I

+

PTP

)

1=2  I

+

PTP=

2

I

+

2X1

(

V2V2T

)

X1T=

2

I

(

I

+

PPT

)

1=2  I

+

PPT=

2

I

+

2V2T

(

X1TX1

)

V2=

2

I

Moreover, we have

(

I

+

QTQ

)

1=2S1

(

I

+

PTP

)

;1=2 

(

I

+

2S1;2=

2)

S1

and (

I

+

QQT

)

;1=2



22

(

I

+

PPT

)

1=2 

(

I

+

2



21S1;2



T21=

2)

22

=





22

because 

T21



22

= 0. Hence, from the denition (14), these expressions lead to approxi- mations of the diagonal elements of the perturbed singular value matrices, i.e.

i

(~

S1

)

=si 

1 +

2=

2

s2i

and

i

(~

S2

)



as the diagonal elements of ~

S1

are increasingly ordered and 

T22



22

=

I

.

From this proposition, it is immediately seen that the right singular vectors are robust regarding the perturbation related to the regularization parameter



. In fact, this is a well known result concerning the robustness of eigenvectors of symmetric matrices additively perturbed by a scaled identity matrix, e.g.

2I

for instance.

9

References

Related documents

46 Konkreta exempel skulle kunna vara främjandeinsatser för affärsänglar/affärsängelnätverk, skapa arenor där aktörer från utbuds- och efterfrågesidan kan mötas eller

Exakt hur dessa verksamheter har uppstått studeras inte i detalj, men nyetableringar kan exempelvis vara ett resultat av avknoppningar från större företag inklusive

The increasing availability of data and attention to services has increased the understanding of the contribution of services to innovation and productivity in

Av tabellen framgår att det behövs utförlig information om de projekt som genomförs vid instituten. Då Tillväxtanalys ska föreslå en metod som kan visa hur institutens verksamhet

Närmare 90 procent av de statliga medlen (intäkter och utgifter) för näringslivets klimatomställning går till generella styrmedel, det vill säga styrmedel som påverkar

Den förbättrade tillgängligheten berör framför allt boende i områden med en mycket hög eller hög tillgänglighet till tätorter, men även antalet personer med längre än

På många små orter i gles- och landsbygder, där varken några nya apotek eller försälj- ningsställen för receptfria läkemedel har tillkommit, är nätet av

While firms that receive Almi loans often are extremely small, they have borrowed money with the intent to grow the firm, which should ensure that these firm have growth ambitions even