Regularization of singular least squares problems
P. Carrette
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: [email protected]
March 11, 1998
Report no.: LiTH-ISY-R-2019
Submitted to BIT journal
Technical reports from the Automatic Control group in Linköping are available by anonymous ftp at the address ftp.control.isy.liu.se. This report is contained in the compressed postscript file 2019.ps.Z.
Abstract
In this note, we analyze the influence of the regularization procedure applied to singular LS problems. It appears that, due to the finite numerical accuracy of computer calculations, the regularization parameter has to belong to a particular range of values in order for the regularized solution to be close to that associated to the singular LS problem. Surprisingly enough, this range essentially depends on the square root of the computer precision, while the deficiency (or singularity) of the regularized LS problem is governed by this precision itself.
The analysis is based on matrix perturbation theory, for which the paper [12] is a key reference.
Keywords: Matrix perturbation, Tikhonov regularization, Singular value decomposition.
1 Introduction
In this contribution, we present results concerning the use of Tikhonov regularization (see [13, 10] and references therein) while solving singular least squares (LS) problems, i.e.

$$x_0 = \arg\min_x \|Ax - b\|_2 \quad \text{subject to minimal } \|x\|_2 \qquad (1)$$

for which the matrix $A \in \mathbb{R}^{N \times n}$ (for $N \geq n$) is (column) rank deficient. The corresponding regularized LS problem is as follows

$$\tilde{x}_\mu = \arg\min_x \|Ax - b\|_2^2 + \mu^2 \|x\|_2^2 \qquad (2)$$

for some value of the regularization parameter $\mu > 0$.
Here, we intend to investigate the influence of the regularization parameter $\mu$ upon the deviation between the regularized solution $\tilde{x}_\mu$ and the vector $x_0$ (solution to problem (1)), i.e. $\tilde{x}_\mu - x_0$. The reason for this study is that we want to find a good approximation of $x_0$ without solving the singular LS problem itself (by, e.g., computing the pseudo-inverse [11] of the matrix $A$, i.e. $x_0 = A^\dagger b$).
[Figure 1: Quantities $\|\tilde{x}_\mu - x_0\|_2$ (solid line) and $\|\tilde{x}_\mu\|_2$ (dashed line) as functions of $\mu$.]
[Figure 2: L-curve example ($\|\tilde{x}_\mu\|_2$ versus $\|A\tilde{x}_\mu - b\|_2$) with the three points appearing in Figure 1.]
As it appears in the literature (see [1, 5, 7, 4, 14] and the Matlab package presented in [8]), Tikhonov regularization gives us a grip on such an approximation while solving the ordinary (full column rank) LS problem (2). The available results about Tikhonov regularization generally consider only the links between the two contributions to the cost of the regularized problem, i.e. $\|Ax - b\|_2$ and $\|x\|_2$. Indeed, roughly speaking, the solution to the original problem (1) can be obtained by making these two quantities simultaneously small for an appropriate value of the regularization parameter $\mu$. By this, we have in mind the reasoning that leads to the selection of "best" $\mu$ values by inspecting the L-curve, that is, the representation of these two quantities at regularized solutions, i.e. $\|\tilde{x}_\mu\|_2$ as a function of $\|A\tilde{x}_\mu - b\|_2$ for different values of $\mu$ (see [7] for more details).

Unfortunately, the "corner" of the L-curve does not provide $\mu$ values that are robust with respect to the 2-norm of the deviation we are interested in, i.e. $\|\tilde{x}_\mu - x_0\|_2$. Let us illustrate this fact by a simple example:
$A$ is a $50 \times 5$ matrix with rank 4 and unit nonzero singular values, while the 50 elements of the column $b$ are samples of a uniformly distributed (in $[0, 1]$) random variable. In Figure 1, we have presented the 2-norm of the deviation between the regularized solution $\tilde{x}_\mu$ and the original solution $x_0$, as well as the 2-norm of the regularized solution $\tilde{x}_\mu$, as functions of the parameter $\mu$. The graph of the former deviation 2-norm can be divided into two parts, in agreement with the decreasing and increasing behaviors of this error with $\mu$. Three points have been highlighted on the solid-line curve. In Figure 2, we have displayed the corresponding L-curve, i.e. $(\|A\tilde{x}_\mu - b\|_2, \|\tilde{x}_\mu\|_2)$ for different values of $\mu$. The preceding three points are concentrated in the lower-left corner of this curve, so that the L-curve does not succeed in giving a high preference to the "o" point that is associated to the "best" approximation of the reference solution $x_0$, i.e. for $\mu \approx 5.6 \times 10^{-5}$.
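For concreteness, this experiment can be reproduced along the following lines; a minimal numpy sketch, assuming one particular random instance of the described setup (the report does not specify the actual draws of $A$ and $b$):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n, r = 50, 5, 4

    # Rank-4 matrix with unit nonzero singular values: A = U1 @ V1.T.
    U1, _ = np.linalg.qr(rng.standard_normal((N, r)))
    V1, _ = np.linalg.qr(rng.standard_normal((n, r)))
    A = U1 @ V1.T
    b = rng.uniform(0.0, 1.0, N)          # uniform samples in [0, 1]

    x0 = np.linalg.pinv(A) @ b            # minimum-norm solution of (1)

    for mu in np.logspace(-12, 0, 25):
        # Tikhonov solution via the stacked, full-column-rank LS problem (2).
        A_mu = np.vstack([A, mu * np.eye(n)])
        b_mu = np.concatenate([b, np.zeros(n)])
        x_mu = np.linalg.lstsq(A_mu, b_mu, rcond=None)[0]
        print(f"mu = {mu:8.1e}  ||x_mu - x0||_2 = {np.linalg.norm(x_mu - x0):9.3e}"
              f"  ||x_mu||_2 = {np.linalg.norm(x_mu):9.3e}")

Plotting the two printed quantities against $\mu$ should reproduce the qualitative shape of Figure 1, and plotting $\|\tilde{x}_\mu\|_2$ against $\|A\tilde{x}_\mu - b\|_2$ gives the L-curve of Figure 2.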
In order to overcome the poor capability of the L-curve (as well as of the usual studies of Tikhonov regularization) to end up with a $\mu$ value corresponding to a solution $\tilde{x}_\mu$ close to the reference vector $x_0$, we here provide an analysis of the deviation between these two solutions, i.e. $\tilde{x}_\mu - x_0$, on the basis of matrix perturbation theory. To this end, we make intensive use of the paper of Stewart [12], which presents a complete analysis of the perturbation of the singular value decomposition (SVD) of matrices. Note that in [5], Hansen provides expressions for the upper bound on the relative deviation between a perturbed and the unperturbed solution, denoted $x$, of a LS problem similar to (2), i.e. $\|\tilde{x} - x\|_2 / \|x\|_2$.
As a by-product of our analysis, intervals for the regularization parameter are found for any given admissible accuracy $\delta$ imposed on the regularized solution $\tilde{x}_\mu$, i.e. $\mu \in [\mu_-, \mu_+]$ implying that $\|\tilde{x}_\mu - x_0\|_2 \leq \delta$.
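A small helper sketch (names hypothetical) showing how such an interval can be read off from sampled deviations over a grid of $\mu$ values:

    import numpy as np

    def admissible_interval(mus, devs, delta):
        """Return (mu_minus, mu_plus), the extreme sampled mu values whose
        deviation ||x_mu - x0||_2 stays below the admissible accuracy delta."""
        mus, devs = np.asarray(mus), np.asarray(devs)
        idx = np.flatnonzero(devs <= delta)
        return (mus[idx[0]], mus[idx[-1]]) if idx.size else None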
The paper is organized as follows. In Section 2, we introduce the notations we will be dealing with in the sequel, and we discuss how numerical errors should be taken into account in order to make the analysis of the regularized solution deviation explain its behavior (as seen in Figure 1). In Section 3, we decompose the deviation between the regularized solution $\tilde{x}_\mu$ and the original $x_0$ into two components, i.e. either in or out of the kernel of the problem matrix $A$. In Section 4, we give expressions for the singular value decomposition (SVD) of the perturbed version of the matrix $A$ that enters in the resolution of the regularized LS problem (2). This is achieved on the basis of results presented in [12]. In Section 5, we end up with a closed-form expression for the 2-norm of the regularized solution error. Its intrinsic characteristics with respect to the regularization parameter $\mu$ are commented on in some detail. Finally, simulation examples are presented in Section 6. They completely agree with the results derived in the preceding sections.
2 Notations and numerical error discussion
First, let us introduce notations for the two LS problems, i.e. the minimizations (1) and (2), respectively.

The matrix $A$ has a column rank identical to $r < n$, while $b = A x_0 + e$ where $e \in \ker A^T$ originates from the inconsistency of the LS problem (1). We also define the matrix $\mathcal{A} \in \mathbb{R}^{(N+n) \times n}$ and the vector $\beta \in \mathbb{R}^{N+n}$ as

$$\mathcal{A} = \begin{bmatrix} A \\ 0 \end{bmatrix} \qquad \text{and} \qquad \beta = \begin{bmatrix} b \\ 0 \end{bmatrix}$$
The SVD of $\mathcal{A}$ is written as

$$\mathcal{A} = [U_1 \ U_2] \begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = U_1 S_1 V_1^T \qquad (3)$$

where $S_1 = \mathrm{diag}(s_1, \ldots, s_r)$ with $s_i$ the $i$-th (in decreasing order) nonzero singular value of $\mathcal{A}$, while the matrices $[U_1 \ U_2]$ and $[V_1 \ V_2]$ are orthogonal of dimension $(N+n)$ and $n$, respectively. Then, the orthogonal projector onto the range of $\mathcal{A}^T$ is $K_{\parallel} = V_1 V_1^T$, while its orthogonal counterpart, i.e. the orthogonal projector onto the kernel of $\mathcal{A}$, is denoted $K_{\perp} = I - K_{\parallel}$ ($= V_2 V_2^T$).
With these notations in mind, we can write the solution to the original LS problem as $x_0 = \mathcal{A}^\dagger \beta$, where $\mathcal{A}^\dagger$ is the pseudo-inverse of $\mathcal{A}$, i.e. $\mathcal{A}^\dagger = V_1 S_1^{-1} U_1^T$ (uniquely defined). It is worth mentioning that $x_0 = K_{\parallel} x_0$, so that $x_0$ lies entirely in the range of $\mathcal{A}^T$.
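In code, these quantities follow directly from the SVD of the stacked matrix; a sketch under the conventions above (the helper name is hypothetical):

    import numpy as np

    def projectors_and_solution(A, b):
        """Return x0 = pinv(cal_A) @ beta together with the projectors
        K_par (onto range(cal_A^T)) and K_perp (onto ker(cal_A))."""
        N, n = A.shape
        cal_A = np.vstack([A, np.zeros((n, n))])       # stacked matrix
        beta = np.concatenate([b, np.zeros(n)])
        U, s, Vt = np.linalg.svd(cal_A, full_matrices=False)
        tol = s[0] * max(cal_A.shape) * np.finfo(float).eps
        r = int(np.sum(s > tol))                       # numerical rank
        V1 = Vt[:r].T
        K_par = V1 @ V1.T
        K_perp = np.eye(n) - K_par
        x0 = V1 @ ((U[:, :r].T @ beta) / s[:r])        # V1 S1^{-1} U1^T beta
        # K_par @ x0 agrees with x0 up to roundoff, as noted above.
        return x0, K_par, K_perp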
Remark 1. The "$\backslash$" operator designed in Matlab for solving LS problems, i.e. $\mathcal{A} \backslash \beta$, leads to solutions that generally contain a component in the kernel space of $\mathcal{A}$ (when it is non-trivial), i.e. $x_{LS} = x_0 + K_{\perp} \nu$ with nonzero $\nu \in \mathbb{R}^n$. The reason for this is that the procedure uses a QR decomposition of the matrix $\mathcal{A}$ and, in case the $(n-r)$ last rows of $R$ contain negligible elements regarding the numerical accuracy of the computer, the $(n-r)$ last elements of the associated LS solution are fixed to zero. In fact, this corresponds to choosing $\nu$ so that $K_{\perp} \nu$ exactly compensates the $(n-r)$ last elements of the reference solution $x_0$ (after permutations if necessary in the QR decomposition process).
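The mechanism of Remark 1 can be imitated with a pivoted QR factorization; a hypothetical numpy/scipy sketch (Matlab's actual backslash code path may differ in its details):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def basic_lstsq(A, b, r):
        """Basic (not minimum-norm) LS solution: QR with column pivoting,
        dropping the (n - r) negligible trailing rows of R as in Remark 1."""
        n = A.shape[1]
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        z = np.zeros(n)
        z[:r] = solve_triangular(R[:r, :r], (Q.T @ b)[:r])
        x = np.empty(n)
        x[piv] = z        # undo the column permutation; trailing entries stay 0
        return x

The returned solution differs from the minimum-norm $x_0$ by a kernel vector $K_{\perp} \nu$, as stated in the remark.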
Now, let us turn to the solution of the regularized LS problem presented in (2). This problem can be expressed in terms of a particular perturbation of the matrix $\mathcal{A}$. Obviously, the second term in the brackets of (2) makes the bottom part of the corresponding matrix $\mathcal{A}$, denoted $\mathcal{A}_\mu$, become an identity matrix scaled by the regularization parameter $\mu$, i.e.

$$\mathcal{A}_\mu = \begin{bmatrix} A \\ \mu I \end{bmatrix}$$

The corresponding solution is

$$x_\mu = \mathcal{A}_\mu^\dagger \beta \quad (\text{or } \mathcal{A}_\mu \backslash \beta)$$

It is unique because $\mathcal{A}_\mu$ has full column rank for $\mu > 0$. From a numerical point of view, we must ask for $\mu \gg \epsilon$ (e.g. $\mu > 10^2 \epsilon$), where $\epsilon$ denotes the accuracy of the computer.
Unfortunately, when we simulate the deviation between this "theoretically" regularized solution $x_\mu$ and the solution $x_0$, it is impossible to explain the deviation generally associated with the "practically" regularized solution (see the solid line in Figure 8 compared to that in Figure 1). This means that our assumption concerning the influence of the regularization upon the matrix $\mathcal{A}$ (leading to $\mathcal{A}_\mu$) is not correct. This can be viewed as a bad model of the regularization effect because it cannot reveal its intrinsic characteristics. It actually appears that additional perturbations must be considered: namely, the numerical errors associated to the computation of the regularized solution. More precisely, we deal with the following regularized matrix

$$\tilde{\mathcal{A}}_\mu = \mathcal{A}_\mu + \epsilon E \qquad (4)$$
where $\epsilon$ denotes the numerical accuracy, i.e. $\epsilon = 2.2 \times 10^{-16}$ in Matlab 5.1, and $E \in \mathbb{R}^{(N+n) \times n}$ stands for the (normalized) numerical error matrix (whose structure is detailed below). The SVD of $\tilde{\mathcal{A}}_\mu$ is denoted

$$\tilde{\mathcal{A}}_\mu = [\tilde{U}_1 \ \tilde{U}_2 \ \tilde{U}_3] \begin{bmatrix} \tilde{S}_1 & 0 \\ 0 & \tilde{S}_2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \tilde{V}_1^T \\ \tilde{V}_2^T \end{bmatrix} = \tilde{U}_1 \tilde{S}_1 \tilde{V}_1^T + \tilde{U}_2 \tilde{S}_2 \tilde{V}_2^T$$

where $\tilde{S}_1$ is a diagonal matrix containing the $r$ largest singular values of $\tilde{\mathcal{A}}_\mu$ (in increasing order!) and $\tilde{S}_2$ is a diagonal matrix containing its $(n-r)$ last nonzero singular values. The matrices $[\tilde{U}_1 \ \tilde{U}_2 \ \tilde{U}_3]$ and $[\tilde{V}_1 \ \tilde{V}_2]$ are orthogonal of dimension $(N+n)$ and $n$, respectively.
The solution of the corresponding LS problem is written as

$$\tilde{x}_\mu = \tilde{\mathcal{A}}_\mu^\dagger \beta \quad (\text{or } \tilde{\mathcal{A}}_\mu \backslash \beta) \qquad (5)$$

It is unique in case $\tilde{\mathcal{A}}_\mu$ has full column rank, i.e. for $\mu \gg \epsilon$ for instance. The purpose of the paper is then to analyze in detail the deviation between this solution $\tilde{x}_\mu$ and the reference solution $x_0$ defined for the original LS problem.
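The gap between the "theoretical" solution $x_\mu$ and the practically computed one can be probed experimentally. A hedged sketch: the filter-factor formula $s_i/(s_i^2 + \mu^2)$ used as reference below is standard Tikhonov algebra rather than a formula from this report, and the size of the observed deviation depends on the LS solver at hand:

    import numpy as np

    rng = np.random.default_rng(1)
    N, n, r = 50, 5, 4
    U1, _ = np.linalg.qr(rng.standard_normal((N, r)))
    V1, _ = np.linalg.qr(rng.standard_normal((n, r)))
    A = U1 @ V1.T                         # unit nonzero singular values
    b = rng.uniform(size=N)

    for mu in [1e-3, 1e-6, 1e-9, 1e-12]:
        # "Theoretical" x_mu from the exactly known SVD (filter factor
        # s/(s^2 + mu^2), with s_i = 1 here).
        x_th = V1 @ ((U1.T @ b) / (1.0 + mu ** 2))
        # "Practical" x_mu computed in floating point from the stacked system.
        A_mu = np.vstack([A, mu * np.eye(n)])
        b_mu = np.concatenate([b, np.zeros(n)])
        x_num = np.linalg.lstsq(A_mu, b_mu, rcond=None)[0]
        print(f"mu = {mu:7.0e}  ||x_num - x_th||_2 = "
              f"{np.linalg.norm(x_num - x_th):.2e}")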
Finally, let us give a reasonable structure for the numerical error matrix $E$. To this end, we consider the left singular subspace associated to $U_1$ and we show how the associated singular values may induce a particular scaling of the elements of this matrix. We can write

$$\tilde{\mathcal{A}}_\mu^T U_1 = V_1 S_1 + \epsilon E^T U_1 \qquad (6)$$

The $i$-th column of this matrix has a 2-norm that is approximately identical to $s_i$. This means that the related numerical perturbation must take this scale into account, i.e.

$$(V_1)_i s_i + \epsilon (E^T U_1)_i = s_i \left[ (V_1)_i + \epsilon (X_1^T)_i \right]$$

for an appropriate matrix $X_1$ whose definition is made regardless of the singular values of $\mathcal{A}$. This leads to $E^T U_1 = X_1^T S_1$. For what concerns the $E^T U_2$ counterpart, it can only be said that the largest elements within $E$ will induce large contributions to this matrix product. Hence, we globally propose that the numerical error matrix $E$ can be written as

$$E = U_1 S_1 X_1 + s_1 U_2 X_2 \qquad (7)$$

for which we point out that the matrices $X_1$ and $X_2$ have normalized elements (independently of the $s_i$'s and of $\epsilon$).
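As a quick consistency check of model (7), one can form such an $E$ and verify that $E^T U_1 = X_1^T S_1$; a sketch with randomly drawn normalized $X_1$ and $X_2$ (names hypothetical):

    import numpy as np

    def error_matrix(U1, U2, s, n, rng):
        """Numerical-error model (7): E = U1 S1 X1 + s1 U2 X2, with X1, X2
        having order-one entries (here drawn at random)."""
        s = np.asarray(s, dtype=float)
        r = len(s)
        X1 = rng.uniform(-1.0, 1.0, (r, n))
        X2 = rng.uniform(-1.0, 1.0, (U2.shape[1], n))
        E = U1 @ (s[:, None] * X1) + s[0] * U2 @ X2
        assert np.allclose(E.T @ U1, X1.T * s)   # E^T U1 = X1^T S1, as derived
        return E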
3 Deviation of the regularized solution
The deviation between the regularized solution $\tilde{x}_\mu$ and the reference solution $x_0$ can be decomposed into two parts, i.e.

$$\tilde{x}_\mu - x_0 = (K_{\parallel} \tilde{x}_\mu - x_0) + K_{\perp} \tilde{x}_\mu \qquad (8)$$

The first term in the right-hand side belongs to the range of the transpose of the matrix $\mathcal{A}$, while the second lies completely in its kernel (see the definitions of the orthogonal projectors $K_{\parallel}$ and $K_{\perp}$, respectively). Because of these orthogonal projectors, each of these contributions is orthogonal to the other, i.e.

$$\|\tilde{x}_\mu - x_0\|_2^2 = \|K_{\parallel} \tilde{x}_\mu - x_0\|_2^2 + \|K_{\perp} \tilde{x}_\mu\|_2^2$$
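This Pythagorean split is easy to verify numerically; a small sketch (the projectors being those of Section 2, e.g. from the earlier hypothetical helper):

    import numpy as np

    def check_split(x_reg, x0, K_par, K_perp):
        """Check (8): since x0 = K_par x0, the two error components are
        orthogonal and their squared norms add up."""
        lhs = np.linalg.norm(x_reg - x0) ** 2
        rhs = (np.linalg.norm(K_par @ x_reg - x0) ** 2
               + np.linalg.norm(K_perp @ x_reg) ** 2)
        assert np.isclose(lhs, rhs)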
In order to analyze these contributions, let us develop the links that exist between the two solution vectors. From its definition in equation (5), the regularized solution $\tilde{x}_\mu$ can be written as

$$\tilde{x}_\mu = [\tilde{V}_1 \ \tilde{V}_2] \begin{bmatrix} \tilde{S}_1 & 0 \\ 0 & \tilde{S}_2 \end{bmatrix}^{-2} \begin{bmatrix} \tilde{V}_1^T \\ \tilde{V}_2^T \end{bmatrix} \tilde{\mathcal{A}}_\mu^T \beta = \left( \tilde{V}_1 \tilde{S}_1^{-2} \tilde{V}_1^T + \tilde{V}_2 \tilde{S}_2^{-2} \tilde{V}_2^T \right) \left( V_1 S_1^2 V_1^T x_0 + \epsilon E^T \beta \right)$$

where we have used the fact that

$$\tilde{\mathcal{A}}_\mu^T \beta = \mathcal{A}^T \beta + \epsilon E^T \beta = V_1 S_1^2 V_1^T x_0 + \epsilon E^T \beta$$

with $x_0 = V_1 S_1^{-1} U_1^T \beta$. With the structure we have introduced for the numerical error matrix $E$, we also have that

$$E^T \beta = (U_1 S_1 X_1 + s_1 U_2 X_2)^T \beta = X_1^T S_1^2 V_1^T x_0 + s_1 X_2^T (U_2^T \beta)$$

where it is worth noting that $\|U_2^T \beta\|_2^2$ (identical to $\|e\|_2^2$) is the value of the original LS cost at $x_0$.
While coming back to the two contributions to the deviation of this regularized solution, we can write

$$K_{\parallel} \tilde{x}_\mu - x_0 = V_1 \left[ (V_1^T \tilde{V}_1) \tilde{S}_1^{-2} (\tilde{V}_1^T V_1) S_1^2 - I \right] V_1^T x_0$$
$$\qquad + \ \epsilon V_1 (V_1^T \tilde{V}_1) \tilde{S}_1^{-2} \left[ (\tilde{V}_1^T X_1^T) S_1^2 V_1^T x_0 + s_1 (\tilde{V}_1^T X_2^T)(U_2^T \beta) \right]$$
$$\qquad + \ V_1 (V_1^T \tilde{V}_2) \tilde{S}_2^{-2} \left[ \left[ (\tilde{V}_2^T V_1) + \epsilon (\tilde{V}_2^T X_1^T) \right] S_1^2 V_1^T x_0 + \epsilon s_1 (\tilde{V}_2^T X_2^T)(U_2^T \beta) \right] \qquad (9)$$

and

$$K_{\perp} \tilde{x}_\mu = V_2 (V_2^T \tilde{V}_1) \tilde{S}_1^{-2} \left[ \left[ (\tilde{V}_1^T V_1) + \epsilon (\tilde{V}_1^T X_1^T) \right] S_1^2 V_1^T x_0 + \epsilon s_1 (\tilde{V}_1^T X_2^T)(U_2^T \beta) \right]$$
$$\qquad + \ V_2 (V_2^T \tilde{V}_2) \tilde{S}_2^{-2} \left[ \left[ (\tilde{V}_2^T V_1) + \epsilon (\tilde{V}_2^T X_1^T) \right] S_1^2 V_1^T x_0 + \epsilon s_1 (\tilde{V}_2^T X_2^T)(U_2^T \beta) \right] \qquad (10)$$

Let us give an interpretation of these two expressions. Because of the regularization of the singular LS problem, the right singular vectors of $\mathcal{A}$ rotate a little bit, i.e. leading to $[\tilde{V}_1 \ \tilde{V}_2]$, and its two singular value subsets (i.e. $s_i$ for $i = 1, \ldots, r$ as well as the remaining zero singular values) are also slightly altered, giving rise to $\tilde{S}_1$ and $\tilde{S}_2$.
[Figure 3: Schematic representation of the components of the regularized solution $\tilde{x}_\mu$: projections of $\tilde{x}_\mu$ and $x_0$ onto the subspaces associated with $K_{\parallel}$, $K_{\perp}$ and $\tilde{K}_{\parallel}$, $\tilde{K}_{\perp}$.]
The regularized solution $\tilde{x}_\mu$ is naturally expressed in terms of these perturbed singular pairs, i.e. $(\tilde{S}_1, \tilde{V}_1)$ and $(\tilde{S}_2, \tilde{V}_2)$. In other words, two components are found for it, according to the related subspaces for which the orthogonal projectors are $\tilde{K}_{\parallel} = \tilde{V}_1 \tilde{V}_1^T$ and $\tilde{K}_{\perp} = \tilde{V}_2 \tilde{V}_2^T$, respectively. Hence, the above expressions for the contributions to the deviation of the regularized solution exhibit the projection of this solution back onto the subspaces associated to the original matrix $\mathcal{A}$, i.e. by use of the orthogonal projectors $K_{\parallel}$ and $K_{\perp}$. In Figure 3, we have schematically drawn this back projection for a 2-dimensional case.

It is also worth noticing that the components in the perturbed subspaces are influenced by the inverse of the square of the corresponding singular values, i.e. $\tilde{S}_1^{-2}$ and $\tilde{S}_2^{-2}$, respectively. As the latter will be seen to behave similarly to $\mu$, the corresponding component will show an extreme sensitivity to this regularization parameter, i.e. $1/\mu^2$.
In order to analyze these expressions, we must evaluate the role of the regularization parameter $\mu$ and of the numerical error matrix $E$ on the quantities $V_1^T \tilde{V}_1$, $V_1^T \tilde{V}_2$, $V_2^T \tilde{V}_1$ and $V_2^T \tilde{V}_2$, as well as $\tilde{S}_1$ and $\tilde{S}_2$. A simple way to achieve this goal is to use results concerning the SVD of perturbed matrices.
4 SVD of perturbed matrices
In [12], Stewart shows that the right singular vectors of a perturbed matrix, $\tilde{\mathcal{A}}_\mu$ say, can be expressed in terms of those of the original matrix, $\mathcal{A}$ say, as follows

$$[\tilde{V}_1 \ \tilde{V}_2] = [V_1 \ V_2] \begin{bmatrix} I & -P^T \\ P & I \end{bmatrix} \begin{bmatrix} (I + P^T P)^{-1/2} & 0 \\ 0 & (I + P P^T)^{-1/2} \end{bmatrix} \qquad (11)$$

where $P \in \mathbb{R}^{(n-r) \times r}$ is a matrix satisfying the equation system

$$\begin{cases} Q (S_1 + \epsilon E_{11}) - (\mu \Gamma_{22} + \epsilon E_{22}) P = (\mu \Gamma_{21} + \epsilon E_{21}) - \epsilon Q E_{12} P \\ P (S_1 + \epsilon E_{11}^T) - (\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q = \epsilon E_{12}^T - P (\mu \Gamma_{21}^T + \epsilon E_{21}^T) Q \end{cases} \qquad (12)$$

for $Q \in \mathbb{R}^{(N+n-r) \times r}$ and $\Gamma_{2j} = U_2^T [0 \ I]^T V_j$, while $E_{ij} = U_i^T E V_j$.
Note that $\Gamma_{2j}^T \Gamma_{2j} = I$ and $\Gamma_{21}^T \Gamma_{22} = 0$, because of $([0 \ I] U_2)([0 \ I] U_2)^T = I$ together with the orthogonality of the original right singular vectors, i.e. $[V_1 \ V_2]$.
From expression (11), we immediately have that

$$V_1^T \tilde{V}_1 = (I + P^T P)^{-1/2}, \qquad V_1^T \tilde{V}_2 = -P^T (I + P P^T)^{-1/2},$$
$$V_2^T \tilde{V}_1 = P (I + P^T P)^{-1/2}, \qquad V_2^T \tilde{V}_2 = (I + P P^T)^{-1/2} \qquad (13)$$

Stewart also states that the perturbed singular values belong to disjoint sets, so that

$$\sigma_i(\tilde{S}_1) = \sigma_i \left( (I + Q^T Q)^{1/2} \, (S_1 + \epsilon (E_{11} + E_{12} P)) \, (I + P^T P)^{-1/2} \right)$$
$$\sigma_i(\tilde{S}_2) = \sigma_i \left( (I + Q Q^T)^{-1/2} \, (\mu \Gamma_{22} + \epsilon (E_{22} - Q E_{12})) \, (I + P P^T)^{1/2} \right) \qquad (14)$$

where $\sigma_i(X)$ denotes the $i$-th singular value of the matrix $X$ (in increasing order).
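Note that (13) also provides a direct numerical handle on $P$: since $V_1^T \tilde{V}_1 = (I + P^T P)^{-1/2}$ is invertible, $P = (V_2^T \tilde{V}_1)(V_1^T \tilde{V}_1)^{-1}$ can be recovered from computed subspaces. A sketch of this sanity check:

    import numpy as np

    def recover_P(V1, V2, V1_tilde):
        """Recover Stewart's P from (13): V2^T Vt1 = P (I + P^T P)^{-1/2}
        and V1^T Vt1 = (I + P^T P)^{-1/2}, so P = (V2^T Vt1)(V1^T Vt1)^{-1}."""
        return (V2.T @ V1_tilde) @ np.linalg.inv(V1.T @ V1_tilde)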
After straightforward derivations, we end up with the following result.

Proposition 1. Under the assumption that $\epsilon \ll \mu \ll s_r$, the solutions for the $P$ and $Q$ matrices in the equation system (12) respectively satisfy

$$P \approx \epsilon V_2^T X_1^T \qquad \text{and} \qquad Q \approx \mu \Gamma_{21} S_1^{-1}$$

where the symbol "$\approx$" should be understood in the spectral sense, i.e. $B \approx C$ is equivalent to $\|B - C\|_2 \ll 1$. Furthermore, we have that

$$V_1^T \tilde{V}_1 \approx I, \qquad V_2^T \tilde{V}_2 \approx I \qquad \text{and} \qquad V_2^T \tilde{V}_1 \approx -[V_1^T \tilde{V}_2]^T \approx \epsilon V_2^T X_1^T$$

as well as

$$\sigma_i(\tilde{S}_1) \approx s_i \left( 1 + \frac{\mu^2}{2 s_i^2} \right) \qquad \text{and} \qquad \sigma_i(\tilde{S}_2) \approx \mu$$

for appropriate $i$.
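Before turning to the proof, the singular value estimates are easy to test; a sketch, again for the rank-4 example with unit nonzero singular values, where the prediction reads $\sigma_i(\tilde{S}_1) \approx 1 + \mu^2/2$ and $\sigma(\tilde{S}_2) \approx \mu$:

    import numpy as np

    rng = np.random.default_rng(2)
    N, n, r = 50, 5, 4
    U1, _ = np.linalg.qr(rng.standard_normal((N, r)))
    V1, _ = np.linalg.qr(rng.standard_normal((n, r)))
    A = U1 @ V1.T                                  # s_1 = ... = s_4 = 1

    for mu in [1e-2, 1e-4, 1e-6]:                  # eps << mu << s_r
        A_mu = np.vstack([A, mu * np.eye(n)])
        s = np.linalg.svd(A_mu, compute_uv=False)  # decreasing order
        print(f"mu = {mu:6.0e}"
              f"  max |s_large - (1 + mu^2/2)| = {abs(s[:r] - (1 + mu**2 / 2)).max():.1e}"
              f"  s_small / mu = {s[r] / mu:.6f}")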
Proof: In case of small perturbations, the last quadratic terms in equations (12) are generally not considered. Thus, we only have to solve

$$\begin{cases} Q (S_1 + \epsilon E_{11}) - (\mu \Gamma_{22} + \epsilon E_{22}) P = \mu \Gamma_{21} + \epsilon E_{21} \\ P (S_1 + \epsilon E_{11}^T) - (\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q = \epsilon E_{12}^T \end{cases}$$

As the numerical accuracy $\epsilon$ is negligible compared to the diagonal elements in $S_1$, we get

$$\begin{cases} Q S_1 - (\mu \Gamma_{22} + \epsilon E_{22}) P = \mu \Gamma_{21} + \epsilon E_{21} \\ P = \epsilon E_{12}^T S_1^{-1} + (\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q S_1^{-1} \end{cases}$$

Then, we can develop the first equation as follows

$$Q S_1^2 - (\mu \Gamma_{22} + \epsilon E_{22})(\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q = \mu (\Gamma_{21} S_1 + \epsilon \Gamma_{22} E_{12}^T) + \epsilon (E_{21} S_1 + \epsilon E_{22} E_{12}^T)$$

leading to

$$Q S_1^2 - \mu^2 \Gamma_{22} \Gamma_{22}^T Q \approx (\mu \Gamma_{21} + \epsilon E_{21}) S_1$$

while considering $\epsilon \ll \mu$. Hence, for $\mu \ll s_r$, we end up with

$$Q \approx (\mu \Gamma_{21} + \epsilon E_{21}) S_1^{-1} \approx \mu \Gamma_{21} S_1^{-1}$$

as well as

$$P \approx \epsilon E_{12}^T S_1^{-1} + (\mu \Gamma_{22}^T + \epsilon E_{22}^T)(\mu \Gamma_{21} + \epsilon E_{21}) S_1^{-2} \approx \epsilon E_{12}^T S_1^{-1}$$

where we used the fact that $\Gamma_{22}^T \Gamma_{21} = 0$. So, from the structure of the numerical error matrix $E$ in expression (7), we get that

$$P \approx \epsilon \left( V_2^T [X_1^T S_1 U_1^T + s_1 X_2^T U_2^T] U_1 \right) S_1^{-1} = \epsilon V_2^T X_1^T$$

which means that the approximation of $P$ is expressed regardless of the singular values of the original matrix $\mathcal{A}$.
Finally, we are able to evaluate the expressions (13) and (14), respectively. Therefore, we first consider approximations of the square root matrices as follows

$$(I + Q^T Q)^{1/2} \approx I + Q^T Q / 2 \approx I + \mu^2 S_1^{-2} / 2$$
$$(I + Q Q^T)^{1/2} \approx I + Q Q^T / 2 \approx I + \mu^2 \Gamma_{21} S_1^{-2} \Gamma_{21}^T / 2$$

as $\Gamma_{21}^T \Gamma_{21} = I$, and

$$(I + P^T P)^{1/2} \approx I + P^T P / 2 \approx I + \epsilon^2 X_1 (V_2 V_2^T) X_1^T / 2 \approx I$$
$$(I + P P^T)^{1/2} \approx I + P P^T / 2 \approx I + \epsilon^2 V_2^T (X_1^T X_1) V_2 / 2 \approx I$$

Moreover, we have

$$(I + Q^T Q)^{1/2} \, S_1 \, (I + P^T P)^{-1/2} \approx (I + \mu^2 S_1^{-2} / 2) \, S_1$$

and

$$(I + Q Q^T)^{-1/2} \, \mu \Gamma_{22} \, (I + P P^T)^{1/2} \approx (I - \mu^2 \Gamma_{21} S_1^{-2} \Gamma_{21}^T / 2) \, \mu \Gamma_{22} = \mu \Gamma_{22}$$

because $\Gamma_{21}^T \Gamma_{22} = 0$. Hence, from the definition (14), these expressions lead to approximations of the diagonal elements of the perturbed singular value matrices, i.e. $\sigma_i$