Regularization of singular least squares problems
P. Carrette
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: [email protected]
March 11, 1998
Report no.: LiTH-ISY-R-2019
Submitted to BIT journal
Technical reports from the Automatic Control group in Linköping are available by anonymous ftp at the address ftp.control.isy.liu.se. This report is contained in the compressed postscript file 2019.ps.Z.
Abstract
In this note, we analyze the influence of the regularization procedure applied to singular LS problems. It appears that, due to the finite numerical accuracy of computer calculations, the regularization parameter has to belong to a particular range of values in order for the regularized solution to be close to that associated to the singular LS problem. Surprisingly enough, this range essentially depends on the square root of the computer precision, while the deficiency (or singularity) of the regularized LS problem is governed by this precision itself.
The analysis is based on matrix perturbation theory, for which the paper [12] is a key reference.
Keywords: Matrix perturbation, Tikhonov regularization, Singular value decomposition.
1 Introduction
In this contribution, we present results concerning the use of Tikhonov regularization (see [13, 10] and references therein) while solving singular least squares (LS) problems, i.e.

$$x_0 = \arg\min_x \|Ax - b\|_2 \quad \text{subject to minimal } \|x\|_2 \qquad (1)$$

for which the matrix $A \in \mathbb{R}^{N \times n}$ (for $N \geq n$) is (column) rank deficient. The corresponding regularized LS problem is as follows

$$\tilde{x}_\mu = \arg\min_x \|Ax - b\|_2^2 + \mu^2 \|x\|_2^2 \qquad (2)$$

for some value of the regularization parameter $\mu > 0$.
Here, we intend to investigate the influence of the regularization parameter $\mu$ upon the deviation between the regularized solution $\tilde{x}_\mu$ and the vector $x_0$ (solution to problem (1)), i.e. $\tilde{x}_\mu - x_0$. The reason for this study is that we want to find a good approximation of $x_0$ without solving the singular LS problem itself (by, e.g., computing the pseudo-inverse [11] of the matrix $A$, i.e. $x_0 = A^\dagger b$).
[Figure 1: Quantities $\|\tilde{x}_\mu - x_0\|_2$ (solid line) and $\|\tilde{x}_\mu\|_2$ (dashed line) as functions of $\mu$.]
[Figure 2: L-curve example ($\|\tilde{x}_\mu\|_2$ versus $\|A\tilde{x}_\mu - b\|_2$) with the three points appearing in Figure 1.]
As it appears in the literature (see [1, 5, 7, 4, 14] and the Matlab package presented in [8]), Tikhonov regularization gives us a grip on such an approximation while solving the ordinary (full column rank) LS problem (2). The available results about Tikhonov regularization generally consider only the links between the two contributions to the cost of the regularized problem, i.e. $\|Ax - b\|_2$ and $\|x\|_2$. Indeed, roughly speaking, the solution to the original problem (1) can be obtained by making these two quantities simultaneously small for an appropriate value of the regularization parameter $\mu$. By this, we have in mind the reasoning that leads to the selection of "best" $\mu$ values by inspecting the L-curve, that is, the representation of these two quantities at regularized solutions, i.e. $\|\tilde{x}_\mu\|_2$ as a function of $\|A\tilde{x}_\mu - b\|_2$ for different values of $\mu$ (see [7] for more details).

Unfortunately, the "corner" of the L-curve does not provide $\mu$ values that are robust with respect to the 2-norm of the deviation we are interested in, i.e. $\|\tilde{x}_\mu - x_0\|_2$. Let us illustrate this fact by a simple example:
$A$ is a $50 \times 5$ matrix with rank 4 and unit nonzero singular values, while the 50 elements of the column $b$ are samples of a uniformly distributed (in $[0, 1]$) random variable. In Figure 1, we have presented the 2-norm of the deviation between the regularized solution $\tilde{x}_\mu$ and the original solution $x_0$, as well as the 2-norm of the regularized solution $\tilde{x}_\mu$, as functions of the parameter $\mu$. The graph of the former deviation 2-norm can be divided into two parts, in agreement with the decreasing and increasing behaviors of this error with $\mu$. Three points have been highlighted on the solid-line curve. In Figure 2, we have displayed the corresponding L-curve, i.e. $(\|A\tilde{x}_\mu - b\|_2, \|\tilde{x}_\mu\|_2)$ for different values of $\mu$. The preceding three points are concentrated in the lower-left corner of this curve, so that the L-curve does not succeed in giving a high preference to the "o" point that is associated to the "best" approximation of the reference solution $x_0$, i.e. for $\mu \approx 5.6 \times 10^{-5}$.
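For concreteness, this experiment can be reproduced along the following lines; a minimal numpy sketch, assuming one particular random instance of the described setup (the report does not specify the actual draws of $A$ and $b$):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n, r = 50, 5, 4

    # Rank-4 matrix with unit nonzero singular values: A = U1 @ V1.T.
    U1, _ = np.linalg.qr(rng.standard_normal((N, r)))
    V1, _ = np.linalg.qr(rng.standard_normal((n, r)))
    A = U1 @ V1.T
    b = rng.uniform(0.0, 1.0, N)          # uniform samples in [0, 1]

    x0 = np.linalg.pinv(A) @ b            # minimum-norm solution of (1)

    for mu in np.logspace(-12, 0, 25):
        # Tikhonov solution via the stacked, full-column-rank LS problem (2).
        A_mu = np.vstack([A, mu * np.eye(n)])
        b_mu = np.concatenate([b, np.zeros(n)])
        x_mu = np.linalg.lstsq(A_mu, b_mu, rcond=None)[0]
        print(f"mu = {mu:8.1e}  ||x_mu - x0||_2 = {np.linalg.norm(x_mu - x0):9.3e}"
              f"  ||x_mu||_2 = {np.linalg.norm(x_mu):9.3e}")

Plotting the two printed quantities against $\mu$ should reproduce the qualitative shape of Figure 1, and plotting $\|\tilde{x}_\mu\|_2$ against $\|A\tilde{x}_\mu - b\|_2$ gives the L-curve of Figure 2.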
In order to overcome the poor capability of the L-curve (as well as of the usual studies of Tikhonov regularization) to end up with a $\mu$ value corresponding to a solution $\tilde{x}_\mu$ close to the reference vector $x_0$, we here provide an analysis of the deviation between these two solutions, i.e. $\tilde{x}_\mu - x_0$, on the basis of matrix perturbation theory. To this end, we make intensive use of the paper of Stewart [12], which presents a complete analysis of the perturbation of the singular value decomposition (SVD) of matrices. Note that in [5], Hansen provides expressions for the upper bound on the relative deviation between a perturbed and the unperturbed solution, denoted $x$, of a LS problem similar to (2), i.e. $\|\tilde{x} - x\|_2 / \|x\|_2$.
As a by-product of our analysis, intervals for the regularization parameter are found for any given admissible accuracy $\delta$ imposed on the regularized solution $\tilde{x}_\mu$, i.e. $\mu \in [\mu_-, \mu_+]$ implying that $\|\tilde{x}_\mu - x_0\|_2 \leq \delta$.
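A small helper sketch (names hypothetical) showing how such an interval can be read off from sampled deviations over a grid of $\mu$ values:

    import numpy as np

    def admissible_interval(mus, devs, delta):
        """Return (mu_minus, mu_plus), the extreme sampled mu values whose
        deviation ||x_mu - x0||_2 stays below the admissible accuracy delta."""
        mus, devs = np.asarray(mus), np.asarray(devs)
        idx = np.flatnonzero(devs <= delta)
        return (mus[idx[0]], mus[idx[-1]]) if idx.size else None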
The paper is organized as follows. In Section 2, we introduce the notations we will be dealing with in the sequel, and we discuss how numerical errors should be taken into account in order to make the analysis of the regularized solution deviation explain its behavior (as seen in Figure 1). In Section 3, we decompose the deviation between the regularized solution $\tilde{x}_\mu$ and the original $x_0$ into two components, i.e. either in or out of the kernel of the problem matrix $A$. In Section 4, we give expressions for the singular value decomposition (SVD) of the perturbed version of the matrix $A$ that enters in the resolution of the regularized LS problem (2). This is achieved on the basis of results presented in [12]. In Section 5, we end up with a closed-form expression for the 2-norm of the regularized solution error. Its intrinsic characteristics with respect to the regularization parameter $\mu$ are commented on in some detail. Finally, simulation examples are presented in Section 6. They completely agree with the results derived in the preceding sections.
2 Notations and numerical error discussion
First, let us introduce notations for the two LS problems, i.e. the minimizations (1) and (2), respectively.

The matrix $A$ has a column rank identical to $r < n$, while $b = A x_0 + e$ where $e \in \ker A^T$ originates from the inconsistency of the LS problem (1). We also define the matrix $\mathcal{A} \in \mathbb{R}^{(N+n) \times n}$ and the vector $\beta \in \mathbb{R}^{N+n}$ as

$$\mathcal{A} = \begin{bmatrix} A \\ 0 \end{bmatrix} \qquad \text{and} \qquad \beta = \begin{bmatrix} b \\ 0 \end{bmatrix}$$
The SVD of $\mathcal{A}$ is written as

$$\mathcal{A} = [U_1 \ U_2] \begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = U_1 S_1 V_1^T \qquad (3)$$

where $S_1 = \mathrm{diag}(s_1, \ldots, s_r)$ with $s_i$ the $i$-th (in decreasing order) nonzero singular value of $\mathcal{A}$, while the matrices $[U_1 \ U_2]$ and $[V_1 \ V_2]$ are orthogonal of dimension $(N+n)$ and $n$, respectively. Then, the orthogonal projector onto the range of $\mathcal{A}^T$ is $K_{\parallel} = V_1 V_1^T$, while its orthogonal counterpart, i.e. the orthogonal projector onto the kernel of $\mathcal{A}$, is denoted $K_{\perp} = I - K_{\parallel}$ ($= V_2 V_2^T$).
With these notations in mind, we can write the solution to the original LS problem as $x_0 = \mathcal{A}^\dagger \beta$, where $\mathcal{A}^\dagger$ is the pseudo-inverse of $\mathcal{A}$, i.e. $\mathcal{A}^\dagger = V_1 S_1^{-1} U_1^T$ (uniquely defined). It is worth mentioning that $x_0 = K_{\parallel} x_0$, so that $x_0$ lies entirely in the range of $\mathcal{A}^T$.
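In code, these quantities follow directly from the SVD of the stacked matrix; a sketch under the conventions above (the helper name is hypothetical):

    import numpy as np

    def projectors_and_solution(A, b):
        """Return x0 = pinv(cal_A) @ beta together with the projectors
        K_par (onto range(cal_A^T)) and K_perp (onto ker(cal_A))."""
        N, n = A.shape
        cal_A = np.vstack([A, np.zeros((n, n))])       # stacked matrix
        beta = np.concatenate([b, np.zeros(n)])
        U, s, Vt = np.linalg.svd(cal_A, full_matrices=False)
        tol = s[0] * max(cal_A.shape) * np.finfo(float).eps
        r = int(np.sum(s > tol))                       # numerical rank
        V1 = Vt[:r].T
        K_par = V1 @ V1.T
        K_perp = np.eye(n) - K_par
        x0 = V1 @ ((U[:, :r].T @ beta) / s[:r])        # V1 S1^{-1} U1^T beta
        # K_par @ x0 agrees with x0 up to roundoff, as noted above.
        return x0, K_par, K_perp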
Remark 1. The "$\backslash$" operator designed in Matlab for solving LS problems, i.e. $\mathcal{A} \backslash \beta$, leads to solutions that generally contain a component in the kernel space of $\mathcal{A}$ (when it is non-trivial), i.e. $x_{LS} = x_0 + K_{\perp} \nu$ with nonzero $\nu \in \mathbb{R}^n$. The reason for this is that the procedure uses a QR decomposition of the matrix $\mathcal{A}$ and, in case the $(n-r)$ last rows of $R$ contain negligible elements regarding the numerical accuracy of the computer, the $(n-r)$ last elements of the associated LS solution are fixed to zero. In fact, this corresponds to choosing $\nu$ so that $K_{\perp} \nu$ exactly compensates the $(n-r)$ last elements of the reference solution $x_0$ (after permutations if necessary in the QR decomposition process).
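The mechanism of Remark 1 can be imitated with a pivoted QR factorization; a hypothetical numpy/scipy sketch (Matlab's actual backslash code path may differ in its details):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def basic_lstsq(A, b, r):
        """Basic (not minimum-norm) LS solution: QR with column pivoting,
        dropping the (n - r) negligible trailing rows of R as in Remark 1."""
        n = A.shape[1]
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        z = np.zeros(n)
        z[:r] = solve_triangular(R[:r, :r], (Q.T @ b)[:r])
        x = np.empty(n)
        x[piv] = z        # undo the column permutation; trailing entries stay 0
        return x

The returned solution differs from the minimum-norm $x_0$ by a kernel vector $K_{\perp} \nu$, as stated in the remark.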
Now, let us turn to the solution of the regularized LS problem presented in (2). This problem can be expressed in terms of a particular perturbation of the matrix $\mathcal{A}$. Obviously, the second term in the brackets of (2) makes the bottom part of the corresponding matrix $\mathcal{A}$, denoted $\mathcal{A}_\mu$, become an identity matrix scaled by the regularization parameter $\mu$, i.e.

$$\mathcal{A}_\mu = \begin{bmatrix} A \\ \mu I \end{bmatrix}$$

The corresponding solution is

$$x_\mu = \mathcal{A}_\mu^\dagger \beta \quad (\text{or } \mathcal{A}_\mu \backslash \beta)$$

It is unique because $\mathcal{A}_\mu$ has full column rank for $\mu > 0$. From a numerical point of view, we must ask for $\mu \gg \epsilon$ (e.g. $\mu > 10^2 \epsilon$), where $\epsilon$ denotes the accuracy of the computer.
Unfortunately, when we simulate the deviation between this "theoretically" regularized solution $x_\mu$ and the solution $x_0$, it is impossible to explain the deviation generally associated with the "practically" regularized solution (see the solid line in Figure 8 compared to that in Figure 1). This means that our assumption concerning the influence of the regularization upon the matrix $\mathcal{A}$ (leading to $\mathcal{A}_\mu$) is not correct. This can be viewed as a bad model of the regularization effect because it cannot reveal its intrinsic characteristics. It actually appears that additional perturbations must be considered: namely, the numerical errors associated to the computation of the regularized solution. More precisely, we deal with the following regularized matrix

$$\tilde{\mathcal{A}}_\mu = \mathcal{A}_\mu + \epsilon E \qquad (4)$$
where $\epsilon$ denotes the numerical accuracy, i.e. $\epsilon = 2.2 \times 10^{-16}$ in Matlab 5.1, and $E \in \mathbb{R}^{(N+n) \times n}$ stands for the (normalized) numerical error matrix (whose structure is detailed below). The SVD of $\tilde{\mathcal{A}}_\mu$ is denoted

$$\tilde{\mathcal{A}}_\mu = [\tilde{U}_1 \ \tilde{U}_2 \ \tilde{U}_3] \begin{bmatrix} \tilde{S}_1 & 0 \\ 0 & \tilde{S}_2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \tilde{V}_1^T \\ \tilde{V}_2^T \end{bmatrix} = \tilde{U}_1 \tilde{S}_1 \tilde{V}_1^T + \tilde{U}_2 \tilde{S}_2 \tilde{V}_2^T$$

where $\tilde{S}_1$ is a diagonal matrix containing the $r$ largest singular values of $\tilde{\mathcal{A}}_\mu$ (in increasing order!) and $\tilde{S}_2$ is a diagonal matrix containing its $(n-r)$ last nonzero singular values. The matrices $[\tilde{U}_1 \ \tilde{U}_2 \ \tilde{U}_3]$ and $[\tilde{V}_1 \ \tilde{V}_2]$ are orthogonal of dimension $(N+n)$ and $n$, respectively.
The solution of the corresponding LS problem is written as

$$\tilde{x}_\mu = \tilde{\mathcal{A}}_\mu^\dagger \beta \quad (\text{or } \tilde{\mathcal{A}}_\mu \backslash \beta) \qquad (5)$$

It is unique in case $\tilde{\mathcal{A}}_\mu$ has full column rank, i.e. for $\mu \gg \epsilon$ for instance. The purpose of the paper is then to analyze in detail the deviation between this solution $\tilde{x}_\mu$ and the reference solution $x_0$ defined for the original LS problem.
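The gap between the "theoretical" solution $x_\mu$ and the practically computed one can be probed experimentally. A hedged sketch: the filter-factor formula $s_i/(s_i^2 + \mu^2)$ used as reference below is standard Tikhonov algebra rather than a formula from this report, and the size of the observed deviation depends on the LS solver at hand:

    import numpy as np

    rng = np.random.default_rng(1)
    N, n, r = 50, 5, 4
    U1, _ = np.linalg.qr(rng.standard_normal((N, r)))
    V1, _ = np.linalg.qr(rng.standard_normal((n, r)))
    A = U1 @ V1.T                         # unit nonzero singular values
    b = rng.uniform(size=N)

    for mu in [1e-3, 1e-6, 1e-9, 1e-12]:
        # "Theoretical" x_mu from the exactly known SVD (filter factor
        # s/(s^2 + mu^2), with s_i = 1 here).
        x_th = V1 @ ((U1.T @ b) / (1.0 + mu ** 2))
        # "Practical" x_mu computed in floating point from the stacked system.
        A_mu = np.vstack([A, mu * np.eye(n)])
        b_mu = np.concatenate([b, np.zeros(n)])
        x_num = np.linalg.lstsq(A_mu, b_mu, rcond=None)[0]
        print(f"mu = {mu:7.0e}  ||x_num - x_th||_2 = "
              f"{np.linalg.norm(x_num - x_th):.2e}")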
Finally, let us give a reasonable structure for the numerical error matrix $E$. To this end, we consider the left singular subspace associated to $U_1$ and we show how the associated singular values may induce a particular scaling of the elements of this matrix. We can write

$$\tilde{\mathcal{A}}_\mu^T U_1 = V_1 S_1 + \epsilon E^T U_1 \qquad (6)$$

The $i$-th column of this matrix has a 2-norm that is approximately identical to $s_i$. This means that the related numerical perturbation must take this scale into account, i.e.

$$(V_1)_i s_i + \epsilon (E^T U_1)_i = s_i \left[ (V_1)_i + \epsilon (X_1^T)_i \right]$$

for an appropriate matrix $X_1$ whose definition is made regardless of the singular values of $\mathcal{A}$. This leads to $E^T U_1 = X_1^T S_1$. For what concerns the $E^T U_2$ counterpart, it can only be said that the largest elements within $E$ will induce large contributions to this matrix product. Hence, we globally propose that the numerical error matrix $E$ can be written as

$$E = U_1 S_1 X_1 + s_1 U_2 X_2 \qquad (7)$$

for which we point out that the matrices $X_1$ and $X_2$ have normalized elements (independently of the $s_i$'s and of $\epsilon$).
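As a quick consistency check of model (7), one can form such an $E$ and verify that $E^T U_1 = X_1^T S_1$; a sketch with randomly drawn normalized $X_1$ and $X_2$ (names hypothetical):

    import numpy as np

    def error_matrix(U1, U2, s, n, rng):
        """Numerical-error model (7): E = U1 S1 X1 + s1 U2 X2, with X1, X2
        having order-one entries (here drawn at random)."""
        s = np.asarray(s, dtype=float)
        r = len(s)
        X1 = rng.uniform(-1.0, 1.0, (r, n))
        X2 = rng.uniform(-1.0, 1.0, (U2.shape[1], n))
        E = U1 @ (s[:, None] * X1) + s[0] * U2 @ X2
        assert np.allclose(E.T @ U1, X1.T * s)   # E^T U1 = X1^T S1, as derived
        return E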
3 Deviation of the regularized solution
The deviation between the regularized solution $\tilde{x}_\mu$ and the reference solution $x_0$ can be decomposed into two parts, i.e.

$$\tilde{x}_\mu - x_0 = (K_{\parallel} \tilde{x}_\mu - x_0) + K_{\perp} \tilde{x}_\mu \qquad (8)$$

The first term in the right-hand side belongs to the range of the transpose of the matrix $\mathcal{A}$, while the second lies completely in its kernel (see the definitions of the orthogonal projectors $K_{\parallel}$ and $K_{\perp}$, respectively). Because of these orthogonal projectors, each of these contributions is orthogonal to the other, i.e.

$$\|\tilde{x}_\mu - x_0\|_2^2 = \|K_{\parallel} \tilde{x}_\mu - x_0\|_2^2 + \|K_{\perp} \tilde{x}_\mu\|_2^2$$
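This Pythagorean split is easy to verify numerically; a small sketch (the projectors being those of Section 2, e.g. from the earlier hypothetical helper):

    import numpy as np

    def check_split(x_reg, x0, K_par, K_perp):
        """Check (8): since x0 = K_par x0, the two error components are
        orthogonal and their squared norms add up."""
        lhs = np.linalg.norm(x_reg - x0) ** 2
        rhs = (np.linalg.norm(K_par @ x_reg - x0) ** 2
               + np.linalg.norm(K_perp @ x_reg) ** 2)
        assert np.isclose(lhs, rhs)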
In order to analyze these contributions, let us develop the links that exist between the two solution vectors. From its definition in equation (5), the regularized solution $\tilde{x}_\mu$ can be written as

$$\tilde{x}_\mu = [\tilde{V}_1 \ \tilde{V}_2] \begin{bmatrix} \tilde{S}_1 & 0 \\ 0 & \tilde{S}_2 \end{bmatrix}^{-2} \begin{bmatrix} \tilde{V}_1^T \\ \tilde{V}_2^T \end{bmatrix} \tilde{\mathcal{A}}_\mu^T \beta = \left( \tilde{V}_1 \tilde{S}_1^{-2} \tilde{V}_1^T + \tilde{V}_2 \tilde{S}_2^{-2} \tilde{V}_2^T \right) \left( V_1 S_1^2 V_1^T x_0 + \epsilon E^T \beta \right)$$

where we have used the fact that

$$\tilde{\mathcal{A}}_\mu^T \beta = \mathcal{A}^T \beta + \epsilon E^T \beta = V_1 S_1^2 V_1^T x_0 + \epsilon E^T \beta$$

with $x_0 = V_1 S_1^{-1} U_1^T \beta$. With the structure we have introduced for the numerical error matrix $E$, we also have that

$$E^T \beta = (U_1 S_1 X_1 + s_1 U_2 X_2)^T \beta = X_1^T S_1^2 V_1^T x_0 + s_1 X_2^T (U_2^T \beta)$$

where it is worth noting that $\|U_2^T \beta\|_2^2$ (identical to $\|e\|_2^2$) is the value of the original LS cost at $x_0$.
While coming back to the two contributions to the deviation of this regularized solution, we can write

$$K_{\parallel} \tilde{x}_\mu - x_0 = V_1 \left[ (V_1^T \tilde{V}_1) \tilde{S}_1^{-2} (\tilde{V}_1^T V_1) S_1^2 - I \right] V_1^T x_0$$
$$\qquad + \ \epsilon V_1 (V_1^T \tilde{V}_1) \tilde{S}_1^{-2} \left[ (\tilde{V}_1^T X_1^T) S_1^2 V_1^T x_0 + s_1 (\tilde{V}_1^T X_2^T)(U_2^T \beta) \right]$$
$$\qquad + \ V_1 (V_1^T \tilde{V}_2) \tilde{S}_2^{-2} \left[ \left[ (\tilde{V}_2^T V_1) + \epsilon (\tilde{V}_2^T X_1^T) \right] S_1^2 V_1^T x_0 + \epsilon s_1 (\tilde{V}_2^T X_2^T)(U_2^T \beta) \right] \qquad (9)$$

and

$$K_{\perp} \tilde{x}_\mu = V_2 (V_2^T \tilde{V}_1) \tilde{S}_1^{-2} \left[ \left[ (\tilde{V}_1^T V_1) + \epsilon (\tilde{V}_1^T X_1^T) \right] S_1^2 V_1^T x_0 + \epsilon s_1 (\tilde{V}_1^T X_2^T)(U_2^T \beta) \right]$$
$$\qquad + \ V_2 (V_2^T \tilde{V}_2) \tilde{S}_2^{-2} \left[ \left[ (\tilde{V}_2^T V_1) + \epsilon (\tilde{V}_2^T X_1^T) \right] S_1^2 V_1^T x_0 + \epsilon s_1 (\tilde{V}_2^T X_2^T)(U_2^T \beta) \right] \qquad (10)$$

Let us give an interpretation of these two expressions. Because of the regularization of the singular LS problem, the right singular vectors of $\mathcal{A}$ rotate a little bit, i.e. leading to $[\tilde{V}_1 \ \tilde{V}_2]$, and its two singular value subsets (i.e. $s_i$ for $i = 1, \ldots, r$ as well as the remaining zero singular values) are also slightly altered, giving rise to $\tilde{S}_1$ and $\tilde{S}_2$.
[Figure 3: Schematic representation of the components of the regularized solution $\tilde{x}_\mu$: projections of $\tilde{x}_\mu$ and $x_0$ onto the subspaces associated with $K_{\parallel}$, $K_{\perp}$ and $\tilde{K}_{\parallel}$, $\tilde{K}_{\perp}$.]
The regularized solution $\tilde{x}_\mu$ is naturally expressed in terms of these perturbed singular pairs, i.e. $(\tilde{S}_1, \tilde{V}_1)$ and $(\tilde{S}_2, \tilde{V}_2)$. In other words, two components are found for it, according to the related subspaces for which the orthogonal projectors are $\tilde{K}_{\parallel} = \tilde{V}_1 \tilde{V}_1^T$ and $\tilde{K}_{\perp} = \tilde{V}_2 \tilde{V}_2^T$, respectively. Hence, the above expressions for the contributions to the deviation of the regularized solution exhibit the projection of this solution back onto the subspaces associated to the original matrix $\mathcal{A}$, i.e. by use of the orthogonal projectors $K_{\parallel}$ and $K_{\perp}$. In Figure 3, we have schematically drawn this back projection for a 2-dimensional case.

It is also worth noticing that the components in the perturbed subspaces are influenced by the inverse of the square of the corresponding singular values, i.e. $\tilde{S}_1^{-2}$ and $\tilde{S}_2^{-2}$, respectively. As the latter will be seen to behave similarly to $\mu$, the corresponding component will show an extreme sensitivity to this regularization parameter, i.e. $1/\mu^2$.
In order to analyze these expressions, we must evaluate the role of the regularization parameter $\mu$ and of the numerical error matrix $E$ on the quantities $V_1^T \tilde{V}_1$, $V_1^T \tilde{V}_2$, $V_2^T \tilde{V}_1$ and $V_2^T \tilde{V}_2$, as well as $\tilde{S}_1$ and $\tilde{S}_2$. A simple way to achieve this goal is to use results concerning the SVD of perturbed matrices.
4 SVD of perturbed matrices
In [12], Stewart shows that the right singular vectors of a perturbed matrix, $\tilde{\mathcal{A}}_\mu$ say, can be expressed in terms of those of the original matrix, $\mathcal{A}$ say, as follows

$$[\tilde{V}_1 \ \tilde{V}_2] = [V_1 \ V_2] \begin{bmatrix} I & -P^T \\ P & I \end{bmatrix} \begin{bmatrix} (I + P^T P)^{-1/2} & 0 \\ 0 & (I + P P^T)^{-1/2} \end{bmatrix} \qquad (11)$$

where $P \in \mathbb{R}^{(n-r) \times r}$ is a matrix satisfying the equation system

$$\begin{cases} Q (S_1 + \epsilon E_{11}) - (\mu \Gamma_{22} + \epsilon E_{22}) P = (\mu \Gamma_{21} + \epsilon E_{21}) - \epsilon Q E_{12} P \\ P (S_1 + \epsilon E_{11}^T) - (\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q = \epsilon E_{12}^T - P (\mu \Gamma_{21}^T + \epsilon E_{21}^T) Q \end{cases} \qquad (12)$$

for $Q \in \mathbb{R}^{(N+n-r) \times r}$ and $\Gamma_{2j} = U_2^T [0 \ I]^T V_j$, while $E_{ij} = U_i^T E V_j$.
Note that $\Gamma_{2j}^T \Gamma_{2j} = I$ and $\Gamma_{21}^T \Gamma_{22} = 0$, because of $([0 \ I] U_2)([0 \ I] U_2)^T = I$ together with the orthogonality of the original right singular vectors, i.e. $[V_1 \ V_2]$.
From expression (11), we immediately have that

$$V_1^T \tilde{V}_1 = (I + P^T P)^{-1/2}, \qquad V_1^T \tilde{V}_2 = -P^T (I + P P^T)^{-1/2},$$
$$V_2^T \tilde{V}_1 = P (I + P^T P)^{-1/2}, \qquad V_2^T \tilde{V}_2 = (I + P P^T)^{-1/2} \qquad (13)$$

Stewart also states that the perturbed singular values belong to disjoint sets, so that

$$\sigma_i(\tilde{S}_1) = \sigma_i \left( (I + Q^T Q)^{1/2} \, (S_1 + \epsilon (E_{11} + E_{12} P)) \, (I + P^T P)^{-1/2} \right)$$
$$\sigma_i(\tilde{S}_2) = \sigma_i \left( (I + Q Q^T)^{-1/2} \, (\mu \Gamma_{22} + \epsilon (E_{22} - Q E_{12})) \, (I + P P^T)^{1/2} \right) \qquad (14)$$

where $\sigma_i(X)$ denotes the $i$-th singular value of the matrix $X$ (in increasing order).
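Note that (13) also provides a direct numerical handle on $P$: since $V_1^T \tilde{V}_1 = (I + P^T P)^{-1/2}$ is invertible, $P = (V_2^T \tilde{V}_1)(V_1^T \tilde{V}_1)^{-1}$ can be recovered from computed subspaces. A sketch of this sanity check:

    import numpy as np

    def recover_P(V1, V2, V1_tilde):
        """Recover Stewart's P from (13): V2^T Vt1 = P (I + P^T P)^{-1/2}
        and V1^T Vt1 = (I + P^T P)^{-1/2}, so P = (V2^T Vt1)(V1^T Vt1)^{-1}."""
        return (V2.T @ V1_tilde) @ np.linalg.inv(V1.T @ V1_tilde)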
After straightforward derivations, we end up with the following result.

Proposition 1. Under the assumption that $\epsilon \ll \mu \ll s_r$, the solutions for the $P$ and $Q$ matrices in the equation system (12) respectively satisfy

$$P \approx \epsilon V_2^T X_1^T \qquad \text{and} \qquad Q \approx \mu \Gamma_{21} S_1^{-1}$$

where the symbol "$\approx$" should be understood in the spectral sense, i.e. $B \approx C$ is equivalent to $\|B - C\|_2 \ll 1$. Furthermore, we have that

$$V_1^T \tilde{V}_1 \approx I, \qquad V_2^T \tilde{V}_2 \approx I \qquad \text{and} \qquad V_2^T \tilde{V}_1 \approx -[V_1^T \tilde{V}_2]^T \approx \epsilon V_2^T X_1^T$$

as well as

$$\sigma_i(\tilde{S}_1) \approx s_i \left( 1 + \frac{\mu^2}{2 s_i^2} \right) \qquad \text{and} \qquad \sigma_i(\tilde{S}_2) \approx \mu$$

for appropriate $i$.
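Before turning to the proof, the singular value estimates are easy to test; a sketch, again for the rank-4 example with unit nonzero singular values, where the prediction reads $\sigma_i(\tilde{S}_1) \approx 1 + \mu^2/2$ and $\sigma(\tilde{S}_2) \approx \mu$:

    import numpy as np

    rng = np.random.default_rng(2)
    N, n, r = 50, 5, 4
    U1, _ = np.linalg.qr(rng.standard_normal((N, r)))
    V1, _ = np.linalg.qr(rng.standard_normal((n, r)))
    A = U1 @ V1.T                                  # s_1 = ... = s_4 = 1

    for mu in [1e-2, 1e-4, 1e-6]:                  # eps << mu << s_r
        A_mu = np.vstack([A, mu * np.eye(n)])
        s = np.linalg.svd(A_mu, compute_uv=False)  # decreasing order
        print(f"mu = {mu:6.0e}"
              f"  max |s_large - (1 + mu^2/2)| = {abs(s[:r] - (1 + mu**2 / 2)).max():.1e}"
              f"  s_small / mu = {s[r] / mu:.6f}")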
Proof: In case of small perturbations, the last quadratic terms in equations (12) are generally not considered. Thus, we only have to solve

$$\begin{cases} Q (S_1 + \epsilon E_{11}) - (\mu \Gamma_{22} + \epsilon E_{22}) P = \mu \Gamma_{21} + \epsilon E_{21} \\ P (S_1 + \epsilon E_{11}^T) - (\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q = \epsilon E_{12}^T \end{cases}$$

As the numerical accuracy $\epsilon$ is negligible compared to the diagonal elements in $S_1$, we get

$$\begin{cases} Q S_1 - (\mu \Gamma_{22} + \epsilon E_{22}) P = \mu \Gamma_{21} + \epsilon E_{21} \\ P = \epsilon E_{12}^T S_1^{-1} + (\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q S_1^{-1} \end{cases}$$

Then, we can develop the first equation as follows

$$Q S_1^2 - (\mu \Gamma_{22} + \epsilon E_{22})(\mu \Gamma_{22}^T + \epsilon E_{22}^T) Q = \mu (\Gamma_{21} S_1 + \epsilon \Gamma_{22} E_{12}^T) + \epsilon (E_{21} S_1 + \epsilon E_{22} E_{12}^T)$$

leading to

$$Q S_1^2 - \mu^2 \Gamma_{22} \Gamma_{22}^T Q \approx (\mu \Gamma_{21} + \epsilon E_{21}) S_1$$

while considering $\epsilon \ll \mu$. Hence, for $\mu \ll s_r$, we end up with

$$Q \approx (\mu \Gamma_{21} + \epsilon E_{21}) S_1^{-1} \approx \mu \Gamma_{21} S_1^{-1}$$

as well as

$$P \approx \epsilon E_{12}^T S_1^{-1} + (\mu \Gamma_{22}^T + \epsilon E_{22}^T)(\mu \Gamma_{21} + \epsilon E_{21}) S_1^{-2} \approx \epsilon E_{12}^T S_1^{-1}$$

where we used the fact that $\Gamma_{22}^T \Gamma_{21} = 0$. So, from the structure of the numerical error matrix $E$ in expression (7), we get that

$$P \approx \epsilon \left( V_2^T [X_1^T S_1 U_1^T + s_1 X_2^T U_2^T] U_1 \right) S_1^{-1} = \epsilon V_2^T X_1^T$$

which means that the approximation of $P$ is expressed regardless of the singular values of the original matrix $\mathcal{A}$.
Finally, we are able to evaluate the expressions (13) and (14), respectively. Therefore, we first consider approximations of the square root matrices as follows

$$(I + Q^T Q)^{1/2} \approx I + Q^T Q / 2 \approx I + \mu^2 S_1^{-2} / 2$$
$$(I + Q Q^T)^{1/2} \approx I + Q Q^T / 2 \approx I + \mu^2 \Gamma_{21} S_1^{-2} \Gamma_{21}^T / 2$$

as $\Gamma_{21}^T \Gamma_{21} = I$, and

$$(I + P^T P)^{1/2} \approx I + P^T P / 2 \approx I + \epsilon^2 X_1 (V_2 V_2^T) X_1^T / 2 \approx I$$
$$(I + P P^T)^{1/2} \approx I + P P^T / 2 \approx I + \epsilon^2 V_2^T (X_1^T X_1) V_2 / 2 \approx I$$

Moreover, we have

$$(I + Q^T Q)^{1/2} \, S_1 \, (I + P^T P)^{-1/2} \approx (I + \mu^2 S_1^{-2} / 2) \, S_1$$

and

$$(I + Q Q^T)^{-1/2} \, \mu \Gamma_{22} \, (I + P P^T)^{1/2} \approx (I - \mu^2 \Gamma_{21} S_1^{-2} \Gamma_{21}^T / 2) \, \mu \Gamma_{22} = \mu \Gamma_{22}$$

because $\Gamma_{21}^T \Gamma_{22} = 0$. Hence, from the definition (14), these expressions lead to approximations of the diagonal elements of the perturbed singular value matrices, i.e. $\sigma_i$