
Faculty of Mechatronics,

Informatics and Interdisciplinary Studies

Technical University of Liberec, Liberec

Institute of Computer Science

Academy of Sciences of the Czech Republic, Prague

Generalized Gram–Schmidt Process: Its Analysis and

Use in Preconditioning

Jiří Kopal


PhD Thesis

Generalized Gram–Schmidt Process: Its Analysis and Use in Preconditioning

Author: Jiří Kopal (jiri.kopal@tul.cz)
Supervisor: Miroslav Tůma (tuma@cs.cas.cz)
Supervisor specialist: Miroslav Rozložník (miro@cs.cas.cz)

Study programme: P2612 Electrotechnics and informatics
Field of study: 3901V025 Science engineering

Departments: Faculty of Mechatronics, Informatics and Interdisciplinary Studies,
Institute of Novel Technologies and Applied Informatics, Technical University of Liberec,
Studentská 2, 461 17 Liberec 1, Czech Republic
and Institute of Computer Science, Department of Computational Methods,
Academy of Sciences of the Czech Republic,
Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic

Copyright © 2014 by Jiří Kopal.
Typeset by AMS-LaTeX.


Declaration:

I have been informed that my dissertation is fully covered by Act No. 121/2000 Coll., the Copyright Act, in particular §60 – school work.

I acknowledge that the Technical University of Liberec (TUL) does not infringe my copyright by using my dissertation for the internal needs of TUL.

Should I use my dissertation or grant a licence for its use, I am aware of my obligation to inform TUL of this fact; in such a case TUL has the right to demand from me reimbursement of the costs it incurred in creating the work, up to their actual amount.

I have written the dissertation independently, using the cited literature and on the basis of consultations with my supervisor and my supervisor specialist.

At the same time, I honestly declare that the printed version of the thesis is identical to the electronic version uploaded into IS STAG.

Date:


Acknowledgements

First, I want to express my deepest gratitude to my supervisor Mirek Tůma and my supervisor specialist Miro Rozložník, who introduced me to the area of numerical linear algebra. Both of them are also an excellent example for me on the personal as well as the professional level. I also want to give special thanks to Mirek for computing the large test problems for this thesis using his Fortran codes.

I am also indebted to Pavel Jiránek for careful reading of the thesis and valuable suggestions which improved its presentation, and to Martin Plešinger, my roommate from the department of mathematics, for his advice during the writing of the thesis.

I wish to thank my wife Veronika for personal support throughout my postgraduate study. I also wish to thank my parents. Without their assistance I would never have been able to start my postgraduate study.

My thanks also go to Jaroslav Mlýnek, the head of our department of mathematics, for his support during the last semester before finishing the thesis, and also to Lucie Bartošová for language correction.

Financial support of the work presented in this thesis was provided in part by the projects 13-06684S and 108/11/0853 of the Grant Agency of the Czech Republic.


Abstract

Many problems arising from mathematical modelling in science and technology lead to solving large and sparse systems of linear algebraic equations. Then a natural way to solve them is to use iterative methods based on Krylov subspaces such as the method of conjugate gradients. In order to be efficient, such methods often need good preconditioners.

We focus on a particular strategy to obtain such preconditioners based on the generalized Gram–Schmidt process. We study both its theoretical properties as well as algorithms to compute it approximately and to use the resulting factors as preconditioners. In this way, the thesis couples together two different areas of numerical linear algebra: numerical analysis and computational mathematics, the latter being more focused on real-world computations.

From the theoretical point of view, the generalized Gram–Schmidt process in exact arithmetic computes a factorization of the inverse of the corresponding matrix. In other words, it can be formally considered as a direct method. Here we prefer to see the process as an incomplete scheme that produces a factorized sparse approximate inverse of the matrix. The generalized Gram–Schmidt process was introduced in important classical papers such as [29, 41]. Its use as an incomplete scheme for computing approximate inverse preconditioning was proposed by Benzi et al. [5] and later enhanced in [7, 4]. The incompleteness is achieved by a dropping strategy based on discarding entries that are in some sense small.

We analyze the generalized Gram–Schmidt process in finite precision arithmetic.

This analysis is a continuation of the paper by Rozložník et al. [60] (2012). For example, we extend the generalized Gram–Schmidt process by pivoting, which enables an improvement of some error bounds. In particular, differences between component-wise and norm-wise error bounds are highlighted.

Both theoretical considerations as well as experimental observations lead to a new dropping technique that behaves similarly to rounding in finite precision arithmetic. The new dropping technique, introduced by Kopal et al. [49] (2013), is studied here in more detail and derived in a different way.

The main goal is to present a new dropping technique that may generally help to better understand the interplay between floating-point analysis and numerical aspects of preconditioning by incomplete decompositions.

Keywords: approximate inverse preconditioning, Gram–Schmidt process, pivot- ing, sparse matrices, incomplete algorithms, dropping techniques, iterative methods


Abstrakt

Mnoho úloh matematického modelování ve vědě a technice vede na řešení velkých a řídkých soustav lineárních algebraických rovnic. Takové úlohy je přirozené řešit pomocí iteračních metod založených na Krylovovských podprostorech, což je například metoda sdružených gradientů. Aby dané řešení bylo efektivní, iterační metody potřebují dobré předpodmínění.

Zaměříme se na konkrétní strategii výpočtu takových předpodmínění, která jsou založena na zobecněném Gram–Schmidtově procesu. Zabýváme se jak studiem teoretických vlastností úplného algoritmu, tak i jeho neúplnou verzí, kterou používáme pro výpočet faktorů předpodmínění. Tímto způsobem se zde prolínají dvě různé disciplíny numerické lineární algebry, numerická analýza a výpočetní matematika se zaměřením na reálné výpočty.

Z teoretického hlediska Gram–Schmidtův algoritmus v přesné aritmetice počítá faktorizaci inverze příslušné matice. Může být proto formálně považován za přímou metodu. Zde se budeme zabývat především jeho neúplnou verzí, coby algoritmem pro výpočet přibližné řídké faktorizace inverze matice. Zobecněný Gram–Schmidtův proces byl představen v článcích [29, 41]. Jako neúplný algoritmus pro výpočet přibližné inverze předpodmínění byl navržen Benzim et al. [5] a později byl koncept rozšířen v [7, 4]. Neúplnost je realizována technikou odvrhování prvků, které jsou v nějakém smyslu malé.

Zabýváme se analýzou zobecněného Gram–Schmidtova procesu v aritmetice s konečnou přesností. Analýza navazuje na článek Rozložníka et al. [60] (2012); zde je rozšířena například o pivotaci, která umožnila vylepšit některé horní odhady chyb. Zejména jsou zdůrazněny rozdíly mezi horními odhady chyb po prvcích a v normě.

Jak teoretické, tak experimentální úvahy daly základ nové technice odvrhování prvků, která vykazuje stejné vlastnosti jako zaokrouhlování v aritmetice s konečnou přesností. Tato nová technika odvrhování, představená Kopalem et al. [49] (2013), je zde studována detailněji a odvozena jiným způsobem.

Hlavním cílem je prezentovat techniku odvrhování prvků, která může obecně napomoci lepšímu pochopení spojitosti mezi analýzou v aritmetice s konečnou přesností a numerickými aspekty předpodmínění počítaných pomocí neúplných rozkladů.

Klíčová slova: předpodmínění přibližnými inverzemi, Gram–Schmidtův proces, pivotace, řídké matice, neúplné algoritmy, odvrhovací techniky, iterační metody


Contents

Acknowledgements
Abstract
Abstrakt (in Czech)
Contents

I Introduction

Notation

1 Introduction
  1.1 A brief background
  1.2 Scientific framework of the thesis
  1.3 Outline of the thesis
  1.4 Finite precision arithmetic

II Solution methods for linear systems with symmetric and positive definite matrices

2 Direct methods
  2.1 Cholesky factorization
    2.1.1 Computational issues
    2.1.2 Cholesky factorization in finite precision arithmetic

3 Iterative methods
  3.1 Stationary iterative methods
  3.2 Non-Stationary iterative methods

4 Preconditioning algorithms for CG
  4.1 Standard preconditioning techniques
    4.1.1 Preconditioning based on stationary iterative methods
    4.1.2 Incomplete Cholesky factorization
  4.2 Further preconditioning techniques
    4.2.1 Domain decomposition
    4.2.2 Multigrid
    4.2.3 Sparse Approximate Inverse
    4.2.4 Approach by Chow & Saad
    4.2.5 Factorized approximate inverse preconditioning

III The Generalized Gram–Schmidt process

5 Gram–Schmidt in exact arithmetic
  5.1 Variants of the GGS process
    5.1.1 Pivoting in the GGS process
    5.1.2 Pivoting and the theoretical magnitude of the growth factor

6 Gram–Schmidt in finite precision arithmetic
  6.1 Left residuals
  6.2 Error bounds in the inverse |Ū^{-1} − Z̄|
  6.3 Right residuals
  6.4 Norm-wise vs. component-wise error bounds
    6.4.1 Skeel's condition number
    6.4.2 Right residuals from the point of view of the Skeel's condition number
  6.5 Rounding errors in A-orthogonal projections
  6.6 Error in Cholesky factorization
    6.6.1 Classical GGS
    6.6.2 Modified GGS
  6.7 Loss of A-orthogonality
  6.8 Summary of the error bounds

7 Numerical experiments
  7.1 Description of the experiments
  7.2 Numerical results

8 Gram–Schmidt as an incomplete scheme
  8.1 Motivation
  8.2 Incomplete algorithms, error bounds and dropping strategies
  8.3 Construction of the incomplete scheme
    8.3.1 Summary of theory and numerical experiments
    8.3.2 Adaptive dropping by a posteriori filtering
  8.4 Numerical aspects of a posteriori filtering
    8.4.1 Minimizing of the local condition numbers
    8.4.2 Monitoring of the local condition numbers
  8.5 Incomplete GGS based on the a posteriori filtering

9 Incomplete schemes: numerical experiments
  9.1 Definition of the test problems
  9.2 Stopping criterion and computational cost
  9.3 Small test problem
    9.3.1 Approximate inverse: effects of pivoting and scaling
  9.4 Large test problems

IV Conclusions and open questions

10 Conclusions and open questions
  10.1 Conclusions
  10.2 Open questions
  10.3 List of related publications

References


Part I

Introduction


Notation

Scalars, vectors and matrices

R set of real numbers (scalars);
R^m linear vector space of real vectors of length m;
R^{m×n} linear vector space of real m by n matrices.

Small Greek letters α, β, etc. usually denote scalars. Small Roman letters x, b, etc. denote real vectors, matrix components, and integers. Capital Roman and Greek letters A, Z, U, etc. denote real matrices.

M_k, [M]_k leading principal submatrix corresponding to the row indices i = 1, . . . , k and column indices j = 1, . . . , k of the matrix M;
[M]_{*,k} kth column of the matrix M;
[M]_{k,*} kth row of the matrix M;
[M]_{i:j,k:l} submatrix corresponding to the row indices i, . . . , j and column indices k, . . . , l of the matrix M;
I identity matrix (of a suitable dimension clear from the context);
e_j jth column of the identity matrix (of a suitable dimension clear from the context);
[m_{i,j}] matrix that contains the entries m_{i,j} in the corresponding positions;
m_{i,j} ith component of the jth column of the matrix M, i.e. m_{i,j} ≡ e_i^T M e_j.

In particular, we use the following notation:

A, x, b coefficient matrix, vector of unknowns, and right-hand side vector in the system of linear algebraic equations Ax = b, respectively;
n, k dimensions of the vectors and matrices: A ∈ R^{n×n}.

Computation in finite precision arithmetic

fl[α op β] result of an arithmetic operation (op) between operands α and β in finite precision arithmetic;
O(n) low degree polynomial in n;
u unit roundoff, u ≈ 1.1 · 10^{-16} for the IEEE 754 double-precision format;
ε machine precision, ε ≈ 2.2 · 10^{-16} for the IEEE 754 double-precision format.

An extra bar notation, e.g., ᾱ_{j,i}, z̄_k, Ū, etc., is used for quantities (scalars, vectors, matrices, respectively) that are computed in finite precision arithmetic. Symbols δ and Δ used with scalars (e.g., Δα, δα), vectors (e.g., Δv, δv), and matrices (e.g., ΔM, δM) denote error coefficients, vectors, and matrices that arise from finite precision computation.

Computation of the incomplete algorithms

τ_k local dropping parameter of the preconditioner;
τ accuracy of the preconditioner.

An extra tilde notation, e.g., α̃_{j,i}, z̃_k, Z̃, etc., is used for quantities (scalars, vectors, matrices, respectively) that are computed in finite precision arithmetic by an incomplete algorithm.

Vector and matrix norms, absolute value of vector and matrix

For a given matrix M ∈ R^{n×n} and a real vector v ∈ R^n we define:

‖v‖ 2-norm (Euclidean norm) of the vector v, ‖v‖ ≡ (v^T v)^{1/2};
‖v‖_A energy norm (A-norm) of the vector v, induced by a symmetric and positive definite matrix A, ‖v‖_A ≡ (v^T A v)^{1/2};
‖v‖_∞ infinity norm of the vector v, ‖v‖_∞ ≡ max_i |[v]_i|;
|v| absolute value of the vector v;
‖M‖ 2-norm (spectral norm) of the matrix M, ‖M‖ ≡ max_{‖v‖=1} ‖Mv‖ = σ_1(M);
‖M‖_∞ ∞-norm (infinity norm) of the matrix M, ‖M‖_∞ ≡ max_{‖v‖_∞=1} ‖Mv‖_∞;
‖M‖_max max-norm (maximum norm) of the matrix M, ‖M‖_max ≡ max |M|;
‖M‖_F Frobenius norm of the matrix M, ‖M‖_F ≡ (Σ_{i,j} m_{i,j}^2)^{1/2} = (Σ_j σ_j^2(M))^{1/2};
|M| absolute value of the matrix M.

Singular values, eigenvalues and condition numbers

σj(U ) jth largest singular value of the matrix U ∈ Rm×n, for j = 1, . . . , min{m, n};

λj(A) jth largest eigenvalue value of the square matrix A ∈ Rn×n, for j = 1, . . . , n;

κ(A) condition number with respect singular values of the matrix A, κ(A) ≡ σ1(A)/σn(A);

κS(A) spectral condition number (with respect eigenvalues) of the matrix A, κS(A) ≡ λ1(A)/λn(A);

κ(A) condition number with respect infinity norm of the matrix A, κ(A) ≡ kAkkA−1k;

cond(A) Skeel’s condition number of the matrix A, cond(A) ≡ min

DRdiagonal(DRB));


Standard matrix operations and properties

M^T transpose of the matrix M;
K^{-1} inverse of the square nonsingular matrix K;
nnz(M) number of nonzero entries in the matrix M ∈ R^{m×n};
z_k ∘ s_k Hadamard (component-wise) product of the vectors z_k, s_k ∈ R^n, [z_k ∘ s_k]_i ≡ [z_k]_i · [s_k]_i;
diag(M) diagonal part of the matrix M;
triu(M) upper triangular part of the matrix M;
striu(M) strict upper triangular part of the matrix M;
tril(M) lower triangular part of the matrix M;
stril(M) strict lower triangular part of the matrix M;
⟨u, v⟩_A energetic inner product of the vectors u and v induced by an SPD matrix A, ⟨u, v⟩_A ≡ v^T A u.

Matrix associated subspaces, subspace operations

R(M) range of the matrix M ∈ R^{m×n}, R(M) ≡ {y : y = Mx, x ∈ R^n} ⊂ R^m;
span(v_j) span of vectors v_j, span(v_1, . . . , v_n) ≡ R([v_1, . . . , v_n]).

Other notation

sgn(·) signum function;

iters PCG iteration count.

List of abbreviations and acronyms

AINV Approximate INVerse;

CG Conjugate Gradient (method);

GGS Generalized Gram–Schmidt (process);

GS Gram–Schmidt (process);

PCG Preconditioned Conjugate Gradient (method);

QR QR decomposition, A = QR, where R is upper triangular;

LU LU decomposition, A = LU , where L is lower triangular and U is upper triangular;

PDE Partial Differential Equations;

SAINV Stabilized Approximate INVerse;

SPD Symmetric and Positive Definite (matrix);

SPAI Sparse Approximate Inverse;

FSAI Factorized Sparse Approximate Inverse;

IC(·) Incomplete Cholesky factorization and its variants (depending on the parameter);

MIC Modified Incomplete Cholesky factorization;


Chapter 1

Introduction

1.1 A brief background

A lot of challenging problems in science and industrial applications lead to large systems of linear algebraic equations. It happens very often that these systems are sparse. Such applications include, for example, modeling in climate forecasts, geology, quantum chemistry, circuit design, aerospace computations and many others. One needs to get a solution of acceptable accuracy within a reasonably short time. This is especially true in real-time simulations. An example is the problem of weather forecast, where the need to obtain results very fast is absolutely crucial. Solving large and sparse systems of linear algebraic equations that arise from the discretization of partial differential equations (PDEs) is another important area with the need to solve systems very quickly, in particular if they arise as sequences of systems from solving nonlinear problems.

There are two basic classes of methods for solving large and sparse systems of linear algebraic equations: direct and iterative ones. Direct methods have been developed over time from the basic scheme of Gaussian elimination. Their main idea is to solve the system as accurately as the computational hardware and software allow in a finite number of steps. This often requires a very large amount of work. Iterative methods represent a wide class of methods that obtain the solution by generating a sequence of successive approximations. They are appropriate when the direct methods are prohibitively expensive and, sometimes, they are the only possible choice in practice. An important subclass of iterative methods is represented by the Krylov subspace methods. This subclass is based on forming a basis of the subspace that involves successive powers of the system matrix and gets an approximate solution typically from solving a certain low-dimensional optimization problem. The complicated nonlinear nature of the process implies that the convergence behavior may not be fully understood. In general, one can say that direct methods come with robustness, but at the expense of additional computational work. On the other hand, iterative methods very often provide a sequence of cheap steps, but they may suffer from slow convergence or even stagnation.

Numerical methods are also strongly linked with computer architectures. Although different computers perform various operations of linear algebra with different efficiencies, a common feature shared by the vast majority of computers is that quantities are computed only approximately, with finite precision. Knowledge of the arithmetic is often important in the design and application of numerical methods.


1.2 Scientific framework of the thesis

In many applications there is no need to find highly accurate solutions of their subproblems. Instead, obtaining their rough approximations is sufficient. This is an important reason why iterative methods may be methods of choice. However, in order to increase their robustness, the problems should be suitably transformed.

Such a transformation is called preconditioning and it should lead to solving problems that are more tractable by the iterative method than the original ones. Note that preconditioning may mean completely different transformations for different problems with their specific properties.

Properties of systems of linear algebraic equations depend on several factors, e.g., on physics of the considered application or methods of the problem discretization.

In our description we concentrate on a compact introduction into preconditioning that covers just the case when the system matrix is symmetric and positive definite (SPD). The main contributions of this thesis are restricted to this case.

We consider the generalized Gram–Schmidt process (GGS), that is, the Gram–Schmidt process with the energetic inner product induced by an SPD matrix A. The numerical behavior of all main orthogonalization schemes including Householder, Givens QR, and standard modified and classical Gram–Schmidt decompositions is well understood for A = I [42, 14, 31, 32, 66]. This is not the case, however, for generalized schemes with A ≠ I. For example, the QR decomposition for solving weighted least squares problems is studied in [38, 36, 37]. The modified QR decomposition with a non-standard inner product induced by an SPD matrix, where the matrix is given in the factored form A = LL^T (its Cholesky factor is known a priori), was studied in [71], see also [70, 69]. Several orthogonalization schemes are analyzed in [60] and also in [56].

Independently of the papers on the Gram–Schmidt process, an incomplete decomposition based on the general oblique GGS orthogonalization process has been introduced in [5]. It is called AINV (Approximate INVerse) and is based on a generalization of the orthogonalization scheme that is close to the classical Gram–Schmidt process. Further development of AINV has led to its stabilized variant, the so-called SAINV [4]. The stabilization was performed both in terms of the orthogonalization scheme (changed to the modified GGS process) and in terms of appropriate computation of the normalization coefficients (one-sided (non-stabilized) versus stabilized [4, 9]). It was demonstrated that both AINV and SAINV may provide successful preconditioning, in particular for some classes of problems. On the other hand, the process of obtaining appropriate dropping parameters that balance the sparsity and accuracy of the approximate inverse has been restricted to a trial-and-error approach.

Up to now, no sufficiently general theory for preconditioning by incomplete factorizations has been proposed. Research in this field, even for preconditioning of general SPD problems, is focused either on analyzing the behavior of algorithms in finite precision arithmetic or on the development of incomplete schemes, and it very rarely covers both theoretical and practical aspects of the problem. This remark applies to approaches based on both the standard and the generalized Gram–Schmidt process. It may be caused by the fact that an analysis of the most common dropping techniques that control discarding of the small entries is difficult. Typically, errors that arise from dropping modify the schemes in a different way than errors due to rounding in floating-point arithmetic.

In contrast to the general SPD case, there exist nice contributions devoted to special cases (such as finite difference matrices). An important work devoted to the modified incomplete Cholesky factorization [39] that analyzes the condition number of the preconditioned system was presented by Ivar Gustafsson. An even earlier pioneering work is due to Dupont et al. [27]. Another theoretical approach deals with support graph preconditioning [16, 10].


The lack of theory for preconditioning by incomplete decompositions has been our main motivation. We apply the approach based on rounding error analysis to preconditioning, although it uses only one specific preconditioning strategy, namely the one based on the generalized Gram–Schmidt process. The theory and experimental observations result in a new dropping strategy that seems to be rather efficient. A set of numerical experiments then demonstrates that additional techniques such as scaling and pivoting may significantly improve the properties of the preconditioner. Last but not least, we believe that analyzing floating-point properties of decompositions may also lead to the development of other types of algebraic preconditioners.

1.3 Outline of the thesis

The thesis is divided into four parts. The first part is the introduction; it also contains this introductory chapter, which gives an insight into the main topic of the thesis. The second part is devoted to solving systems of linear algebraic equations with a symmetric and positive definite matrix. We provide a survey of direct and iterative solution methods and preconditioning techniques. The third part is focused on the generalized Gram–Schmidt process. We focus on the exact arithmetic identities and pivoting, we also deal with the analysis in finite precision arithmetic, we verify theoretical results on simple numerical experiments, we introduce a new approach to the construction of incomplete schemes, and finally we present numerical experiments that verify the properties of the previously introduced incomplete scheme. The fourth part contains a summary, possible directions of future research, open questions, and also the list of related publications of the author.

1.4 Finite precision arithmetic

The representation of real numbers on a computer is restricted to a finite subset. More details can be found in the IEEE 754 standard (current version IEEE 754-2008) that defines the floating point arithmetic. Probably the most widely used data type for scientific applications is the double-precision type that uses a 64-bit data representation and corresponds to the relative machine precision ε ≈ 2.2 · 10^{-16} (and the unit roundoff u = ε/2 ≈ 1.1 · 10^{-16}). The IEEE 754 standard also guarantees that the result of each elementary operation (+, −, ×, /) and of the square root √· is the correctly rounded value of the exact operation.

Consider two real numbers α and β and an elementary floating point operation fl[· op ·] in finite precision arithmetic. If the operation is well defined, it holds that

fl[α op β] = (α op β)(1 + δ), |δ| ≤ u, (1.1)

and similarly for the square root. The IEEE 754 standard also includes extensive recommendations for advanced exception handling, additional operations (such as trigonometric functions), expression evaluation, and for achieving reproducible results.

Using (1.1) one can establish error bounds for various operations of linear algebra. E.g., for the matrix–vector multiplication Mv, where M ∈ R^{m×n} and v ∈ R^n, we have

|fl[Mv] − Mv| ≤ O(n) u |M| |v|, (1.2)

where |·| denotes the absolute value. Note that this notation can be used not only for scalars, but also for vectors and matrices. The expression O(n) denotes a low-degree polynomial in n. Rounding error bounds for the matrix–vector multiplication can also be formulated by using the spectral and Euclidean norms of the matrices and vectors, respectively (both denoted by ‖·‖), as

‖fl[Mv] − Mv‖ ≤ O(n) u ‖M‖ ‖v‖. (1.3)

Rounding errors for additional linear algebra operations are developed in Chapter 6. In general, the coefficients of the polynomials in the O(·) terms of upper bounds for rounding errors (e.g., in (1.2), (1.3)) are very small. Norm-wise error bounds such as (1.3) may overestimate the actual result. Throughout the thesis, we use only bounds for the rounding errors that are linear in the unit roundoff u and do not consider the higher order terms.
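The component-wise bound (1.2) can easily be checked numerically. The following sketch (a minimal illustration assuming NumPy; the polynomial O(n) is simply taken as n here) forms a matrix–vector product in single precision, uses the double-precision product of the same data as the reference, and compares the error with the bound evaluated at the single-precision unit roundoff.

import numpy as np

rng = np.random.default_rng(0)
n = 200
# Work in single precision so that rounding errors are easily visible;
# the double-precision product of the same data serves as the reference M v.
M = rng.standard_normal((n, n)).astype(np.float32)
v = rng.standard_normal(n).astype(np.float32)

exact = M.astype(np.float64) @ v.astype(np.float64)    # reference M v
computed = (M @ v).astype(np.float64)                   # fl[M v] in the working precision

u = np.finfo(np.float32).eps / 2                        # unit roundoff of the working precision
bound = n * u * (np.abs(M).astype(np.float64) @ np.abs(v).astype(np.float64))

print(np.all(np.abs(computed - exact) <= bound))        # expected: True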


Part II

Solution methods for linear systems with symmetric and positive definite matrices


System of linear algebraic equations

A system of linear algebraic equations can be written in the form

Σ_{j=1}^{n} a_{i,j} x_j = b_i,  i = 1, . . . , m,

where the coefficients a_{i,j} are scalars. The solution components are denoted by x_1, x_2, . . . , x_n, and b_1, b_2, . . . , b_m denote the components of the right-hand side vector. The system can be written in a more compact form using the matrix notation as

Ax = b,  A ∈ R^{m×n},  x ∈ R^n,  b ∈ R^m, (1.4)

A = [a_{i,j}] =
[ a_{1,1}  a_{1,2}  · · ·  a_{1,n} ]
[ a_{2,1}  a_{2,2}  · · ·  a_{2,n} ]
[    ⋮        ⋮      ⋱       ⋮    ]
[ a_{m,1}  a_{m,2}  · · ·  a_{m,n} ],

x = [x_1, x_2, . . . , x_n]^T,  b = [b_1, b_2, . . . , b_m]^T.

Here we will always deal with the case where A is a real, square (m ≡ n) and nonsingular matrix (R(A) = R^n). Let us characterize the three main classes of such matrices by their properties and the properties of their singular values σ_i(A) and eigenvalues λ_i(A):

1. A is symmetric and positive definite (SPD)
• (∀i, j)(a_{i,j} = a_{j,i}) and
• (∀i)(λ_i(A) > 0) ⇔ (∀x ≠ 0)(x^T A x > 0),
• note that in this case we always have (∀i)(σ_i(A) = λ_i(A)).

2. A is symmetric and generally indefinite
• (∀i, j)(a_{i,j} = a_{j,i}), thus
• (∀i)(λ_i(A) ∈ R \ {0}),
• similarly as above, we have (∀i)(σ_i(A) = |λ_i(A)|).

3. A is non-symmetric
• (∀i, j)(a_{i,j} ⋛ a_{j,i}),
• (∀i)(λ_i(A) ∈ C \ {0}),
• (∀i)(σ_i(A) ⋛ |λ_i(A)|).

We mostly consider solving systems of equations with symmetric and positive definite matrices.


Chapter 2

Direct methods

Direct methods represent a general class of robust techniques that often provide high numerical accuracy and that are suitable for efficiently solving a wide range of problems with sparse or dense matrices. These methods are based on a factorization of the matrix A, but they may be expensive in terms of floating-point operation counts (flops) or memory consumption. A benefit is that the factorization of the original matrix can be reused; direct methods are therefore suitable for solving problems with multiple right-hand sides at once. Direct methods provide the solution at the end of the computational process.

Direct methods are based on the Gaussian elimination and are tightly connected with the LU factorization (in the nonsymmetric case) and with the Cholesky fac- torization (in the SPD case). More details including numerical aspects related to the direct methods can be found in [25].

2.1 Cholesky factorization

The Cholesky factorization is a variant of the LU factorization for SPD systems. It provides the triangular factorization of an SPD matrix A in the form

A = LL^T, (2.1)

where L = [l_{i,j}] is the Cholesky factor, which is lower triangular (its nonzeros are only in its lower triangular part) with positive diagonal entries. It is well known that the Cholesky factorization is unique for SPD matrices. Algorithm 1 describes the Cholesky factorization of the matrix A.

Algorithm 1 Cholesky factorization
for i := 1, . . . , n do
    l_{i,i} = (a_{i,i} − Σ_{k=1}^{i−1} l_{i,k}^2)^{1/2}
    for j := i + 1, . . . , n do
        l_{j,i} = (1 / l_{i,i}) (a_{j,i} − Σ_{k=1}^{i−1} l_{j,k} l_{i,k})
    end for
end for

Note that the argument in the square root in Algorithm 1 is always positive [32]. Algorithm 1 is in the so-called ijk form [23], but other variants are possible.

The Cholesky factorization leads to solving the system LL^T x = b. The solution x is then obtained from the forward and backward substitution, solving Ly = b for y, and L^T x = y for x, respectively.
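A direct transcription of Algorithm 1, followed by the forward and backward substitutions just described, might look as follows in Python/NumPy (a minimal dense sketch for illustration only; production codes exploit sparsity and use optimized kernels).

import numpy as np

def cholesky(A):
    # Algorithm 1 (ijk form): A = L L^T for an SPD matrix A.
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        L[i, i] = np.sqrt(A[i, i] - np.dot(L[i, :i], L[i, :i]))
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - np.dot(L[j, :i], L[i, :i])) / L[i, i]
    return L

def solve_spd(A, b):
    # Solve A x = b via A = L L^T: L y = b (forward), L^T x = y (backward).
    L = cholesky(A)
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                    # forward substitution
        y[i] = (b[i] - np.dot(L[i, :i], y[:i])) / L[i, i]
    x = np.zeros(n)
    for i in reversed(range(n)):          # backward substitution
        x[i] = (y[i] - np.dot(L[i + 1:, i], x[i + 1:])) / L[i, i]
    return x

# A small SPD example
A = np.array([[4., 2., 0.], [2., 5., 1.], [0., 1., 3.]])
b = np.array([1., 2., 3.])
print(np.allclose(A @ solve_spd(A, b), b))   # expected: True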


Remark 2.1. General symmetric indefinite matrices may have a Cholesky-like factorization, but generally more sophisticated decompositions (of Bunch–Parlett or Bunch–Kaufman type) are needed. Such decompositions have the form A = LDL^T, where L is again lower triangular and D is a block diagonal matrix with 1 × 1 and 2 × 2 blocks.

2.1.1 Computational issues

It is a well-known fact that the triangular factor of a sparse matrix may have a significantly larger number of nonzero entries than the original matrix (the extra entries are known as fill-in). The factorization is then more expensive and a higher amount of memory is required. Therefore, the fill-in in the factor has to be predicted (using elimination trees and symbolic factorization) and reduced (using ordering algorithms).

The elimination tree is a rooted tree computed from the adjacency graph of the matrix A. This tree contains a lot of information important for the decomposition. In particular, the communication efficiency of factorizations on parallel computing architectures may be roughly described by the height and width of the associated elimination tree. E.g., a high average width of the elimination tree may enable good granularity of the computations. The granularity of the computations could then be improved by matrix reorderings that are often based on graph algorithms. Ordering algorithms are important preprocessing techniques for direct methods. They can be computed with respect to

• minimizing the storage requirements,

• minimizing the computational expenses,

• maximizing the parallelism available for computing the factor L.

Due to the fact that finding the optimal reordering of the input matrix with respect to the efficiency of the factorization is an NP-complete problem, heuristic reordering methods are used. There exist several approximate algorithms that permute a matrix in an affordable time on average, e.g., the minimum degree algorithm and its variants, see, e.g., [3, 2, 21]. Various algorithms of this type provide permutations of the original matrix that imply rather sparse direct factors. In order to also enhance the parallelism, the method of choice may be the nested dissection algorithm [24] and its subsequent enhancements [45].
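As a small illustration of the effect of a fill-reducing ordering, the following sketch (assuming SciPy is available; the reverse Cuthill–McKee ordering is used here simply because it ships with SciPy, it is only one of the heuristics mentioned above) reorders a sparse SPD "arrowhead" matrix and compares the number of nonzeros in the Cholesky factors of the original and permuted matrices.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

n = 50
# Arrowhead SPD matrix: dense first row/column plus a strong diagonal.
A = sp.lil_matrix((n, n))
A[0, :] = 1.0
A[:, 0] = 1.0
A.setdiag(float(n))
A = A.tocsr()

perm = reverse_cuthill_mckee(A, symmetric_mode=True)   # fill-reducing permutation
Ap = A[perm, :][:, perm]

def chol_nnz(S):
    # Number of nonzeros in the Cholesky factor (dense computation, for illustration).
    L = np.linalg.cholesky(S.toarray())
    return np.count_nonzero(np.abs(L) > 1e-14)

print("nnz(L), original ordering:", chol_nnz(A))    # close to n*(n+1)/2 (full fill-in)
print("nnz(L), RCM ordering:     ", chol_nnz(Ap))   # close to nnz of tril(A) (little fill-in)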

2.1.2 Cholesky factorization in finite precision arithmetic

The Cholesky factorization is numerically stable even without pivoting, unlike the LU factorization, which usually needs pivoting schemes. In finite precision arithmetic we get the computed Cholesky factor L̄ that does not satisfy the identity (2.1), but it can be shown that

L̄ L̄^T = A + ΔA,  ‖ΔA‖_∞ ≤ O(n^2) u ‖A‖_∞, (2.2)

where ΔA is a small perturbation of the original problem and ‖ · ‖_∞ denotes the infinity norm.

Remark 2.2. The error bound for the LU factorization is even more pessimistic, since its magnitude is proportional to the so-called growth factor, and this factor can be rather large (≈ 2^n).


Chapter 3

Iterative methods

Iterative methods represent a counterpart to the direct methods. Their key feature, and the main difference with respect to direct methods, is that they provide an approximation of the solution in every step. The accuracy of the solution is improved in every iteration (in some sense, e.g., minimization of the A-norm of the error, of the residual, etc.). From the point of view of storage requirements, iterative methods are much less demanding; in addition, their requirements are limited, since they work with the original matrix A. On the other hand, in their plain form, their robustness is not guaranteed. Convergence of the iterative methods may be improved by a preconditioner. Early methods based on matrix splittings are called stationary. There exist more sophisticated methods (non-stationary ones), in particular the Krylov subspace methods.

3.1 Stationary iterative methods

Assume a splitting of the matrix A in the form

A = L_S + D_S + L_S^T, (3.1)

where L_S = stril(A) is the strictly lower triangular matrix, D_S = diag(A) is the diagonal matrix and L_S^T = striu(A) is the strictly upper triangular matrix. We can distinguish the following basic cases.

Jacobi and symmetric Gauss–Seidel method

Both methods were originally proposed for the nonsymmetric case. In order to be consistent with the topic of the thesis, we will use their SPD variations only.

Assume x^(0) is an initial guess; the Jacobi iteration is then defined by

x^(k+1) = D_S^{-1} (b − (L_S + L_S^T) x^(k)). (3.2)

For the convergence of the Jacobi method one can formulate a necessary and sufficient condition based on the spectral radius, ρ(D_S^{-1}(L_S + L_S^T)) < 1. The symmetric Gauss–Seidel method uses the iteration in the form

x^(k+1/2) = (L_S + D_S)^{-1} (b − L_S^T x^(k)),
x^(k+1) = (L_S^T + D_S)^{-1} (b − L_S x^(k+1/2)).

Positive definiteness or diagonal dominance are sufficient for the convergence of the symmetric Gauss–Seidel method.
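Both iterations can be written down compactly with the splitting (3.1). The following sketch (a minimal NumPy illustration using dense solves for the triangular-plus-diagonal systems) performs a fixed number of Jacobi and symmetric Gauss–Seidel steps on a small diagonally dominant SPD matrix.

import numpy as np

def jacobi(A, b, x, num_iters=50):
    # Jacobi iteration (3.2): x^(k+1) = D_S^{-1} (b - (L_S + L_S^T) x^(k)).
    R = A - np.diag(np.diag(A))                 # L_S + L_S^T
    for _ in range(num_iters):
        x = (b - R @ x) / np.diag(A)
    return x

def sym_gauss_seidel(A, b, x, num_iters=50):
    # Symmetric Gauss-Seidel: forward sweep followed by a backward sweep.
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)                          # strictly lower part L_S
    for _ in range(num_iters):
        x_half = np.linalg.solve(L + D, b - L.T @ x)     # (L_S + D_S) x = b - L_S^T x
        x = np.linalg.solve(L.T + D, b - L @ x_half)     # (L_S^T + D_S) x = b - L_S x_half
    return x

A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
b = np.array([1., 2., 3.])
x0 = np.zeros(3)
print(jacobi(A, b, x0), sym_gauss_seidel(A, b, x0), np.linalg.solve(A, b))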


Symmetric successive over-relaxation method

This method is an acceleration of the symmetric Gauss–Seidel method and uses the iteration in the form

x^(k+1/2) = (ωL_S + D_S)^{-1} {ωb + [(1 − ω)D_S − ωL_S^T] x^(k)},
x^(k+1) = (ωL_S^T + D_S)^{-1} {ωb + [(1 − ω)D_S − ωL_S] x^(k+1/2)},

where ω ∈ (0, 2) is the relaxation parameter. This improvement may lead to faster convergence than the symmetric Gauss–Seidel method, which corresponds to putting ω = 1. For more details, we refer to [76, 79].

3.2 Non-Stationary iterative methods

Conjugate gradient method

The conjugate gradient method (CG) was developed by Hestenes and Stiefel around 1951 and introduced in [41]. Originally, the CG algorithm was derived using conjugacy and minimization of the quadratic functional f(x) = (1/2) x^T A x − x^T b.

CG is closely related to the Lanczos process [50, 51]. The algorithms are also very interesting numerically and their behavior in the floating-point arithmetic can be explained by an elegant mathematical theory. CG represents a very interesting as well as important general mathematical method with highly nonlinear properties.

It can be shown that in exact arithmetic CG converges in at most m steps, where m denotes the number of distinct eigenvalues. From this point of view it can also be seen as a direct (finite) method, but this is only a theoretical concept. From the point of view of its numerical properties it is more important to consider it as an iterative method in finite precision arithmetic. The behavior of CG in finite precision arithmetic has been extensively studied in [34, 57].

CG is used mainly for SPD matrices. The CG method builds up (in exact arithmetic) an orthogonal basis of the Krylov subspace K_k(A, r^(0)) ≡ span(r^(0), A r^(0), . . . , A^{k−1} r^(0)), k = 1, 2, . . . , where r^(0) = b − A x^(0) is the initial residual and x^(0) is an initial guess of the solution. The CG algorithm can be written down as Algorithm 2.

Algorithm 2 The conjugate gradient method
r^(0) = b − A x^(0)
p^(0) = r^(0)
for k := 0, 1, 2, . . . until convergence do
    α_k = (p^(k))^T r^(k) / (p^(k))^T A p^(k)
    x^(k+1) = x^(k) + α_k p^(k)
    r^(k+1) = r^(k) − α_k A p^(k)
    β_k = (p^(k))^T A r^(k+1) / (p^(k))^T A p^(k)
    p^(k+1) = r^(k+1) − β_k p^(k)
end for
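A direct transcription of Algorithm 2 in NumPy might read as follows (a minimal sketch; the coefficient formulas follow the variant written above, and a simple relative residual-norm test is used as the convergence criterion).

import numpy as np

def cg(A, b, x0, tol=1e-10, max_iters=1000):
    # Conjugate gradient method, Algorithm 2.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    for _ in range(max_iters):
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        beta = (p @ (A @ r)) / (p @ Ap)
        p = r - beta * p
    return x

A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
b = np.array([1., 2., 3.])
print(np.allclose(cg(A, b, np.zeros(3)), np.linalg.solve(A, b)))  # expected: True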

In [34] it has been shown that the convergence of CG is determined by the spectrum of the matrix A. The well-known simple bound is derived for the Chebyshev semi-iterative method. Namely, it has been shown in [28] for the Chebyshev semi-iterative method that

‖x − x^(k)‖_A / ‖x − x^(0)‖_A ≤ 2 ( (κ^{1/2}(A) − 1) / (κ^{1/2}(A) + 1) )^k,  k ≥ 0, (3.3)


where ‖ · ‖_A denotes the energy norm induced by the matrix A and κ(A) is the condition number with respect to singular values of the matrix A (κ(A) ≡ σ_1(A)/σ_n(A)). The bound (3.3) describes just linear convergence. In practice, CG convergence depends on the distribution of all the eigenvalues of A and not just on the condition number. Consequently, it can be faster, superlinear. For the difference between the convergence of the Chebyshev semi-iterative method (3.3) and realistic CG convergence, see [30].

As it has been mentioned above, the spectrum of the matrix A crucially influences the convergence of CG. Preconditioning is a transformation of the original problem into a more favorable one (with respect to the convergence of CG). If we assume Z̃ Z̃^T ≈ A^{-1}, CG in preconditioned form (PCG) can be written down as Algorithm 3. From the description it is clear that a preconditioner does not need to be in a factorized form.

Algorithm 3 The preconditioned conjugate gradient method
r^(0) = b − A x^(0)
p^(0) = Z̃ Z̃^T r^(0)
for k := 0, 1, 2, . . . until convergence do
    α_k = (p^(k))^T r^(k) / (p^(k))^T A p^(k)
    x^(k+1) = x^(k) + α_k p^(k)
    r^(k+1) = r^(k) − α_k A p^(k)
    β_k = (p^(k))^T A Z̃ Z̃^T r^(k+1) / (p^(k))^T A p^(k)
    p^(k+1) = Z̃ Z̃^T r^(k+1) − β_k p^(k)
end for

CG (and iterative methods in general) is the most useful if it is able to quickly achieve a desired accuracy of the solution. Convergence of CG may really be greatly improved by a good preconditioner. There exist a lot of different preconditioning techniques for CG, and some of them we briefly discuss in the following chapter.
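Algorithm 3 differs from Algorithm 2 only in the places where the factored approximate inverse Z̃ Z̃^T is applied to the residual. A minimal NumPy sketch, assuming the factor Z̃ (called Zt below) is available as a matrix; the simple diagonal (Jacobi-type) factor used in the example is only a placeholder for an approximate inverse factor computed, e.g., by the incomplete GGS process.

import numpy as np

def pcg(A, b, x0, Zt, tol=1e-10, max_iters=1000):
    # Preconditioned CG, Algorithm 3, with preconditioner Zt @ Zt.T ~ inv(A).
    x = x0.copy()
    r = b - A @ x
    z = Zt @ (Zt.T @ r)            # preconditioned residual Z~ Z~^T r
    p = z.copy()
    for _ in range(max_iters):
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = Zt @ (Zt.T @ r)
        beta = (p @ (A @ z)) / (p @ Ap)
        p = z - beta * p
    return x

A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
Zt = np.diag(1.0 / np.sqrt(np.diag(A)))          # Jacobi-type factor as the simplest Z~
b = np.array([1., 2., 3.])
print(np.allclose(pcg(A, b, np.zeros(3), Zt), np.linalg.solve(A, b)))  # expected: True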


Chapter 4

Preconditioning algorithms for CG

In this chapter, we briefly recall some preconditioning techniques for the conjugate gradient method. As it has been already mentioned, the convergence of CG is determined by the spectral properties of the coefficient matrix A. Transformation to a form with better spectral properties is what is actually often achieved by the preconditioning. From the point of view of its application, we distinguish left, right, and two-sided preconditioning. The left preconditioning is defined as

M̃^{-1} A x = M̃^{-1} b, (4.1)

and the right preconditioning is defined as

A M̃^{-1} y = b,  y = M̃ x. (4.2)

Assume a factorized form of the preconditioner M̃ = M̃_1 M̃_2. One can then introduce two-sided preconditioning in the form

M̃_1^{-1} A M̃_2^{-1} y = M̃_1^{-1} b,  y = M̃_2 x. (4.3)

In order to preserve the symmetry of the preconditioned matrix, assume that M̃_2 = M̃_1^T. The preconditioning matrices (M̃, M̃^{-1} or their factors) do not need to be known explicitly. They may only represent some numerical (often iterative) method for solving an auxiliary system of linear equations.

4.1 Standard preconditioning techniques

Here we will mention some classical preconditioning techniques based on the stationary iterative methods and also the techniques based on incomplete factorization algorithms.

Incomplete factorization algorithms usually arise from complete algorithms combined with a dropping rule that controls discarding of the entries that are considered to be small in some sense. In this way incomplete algorithms preserve the sparsity of the computed factorization.

4.1.1 Preconditioning based on stationary iterative methods

Jacobi preconditioning is one of the simplest preconditioners; using (3.1), it is defined as

M̃ = M̃_1 M̃_1^T = D_S^{1/2} D_S^{1/2},  M̃_1 = D_S^{1/2}.


It is efficient mainly for diagonally dominant matrices. Symmetric Gauss–Seidel preconditioning is defined similarly to the corresponding iterative method by

M̃ = M̃_1 M̃_1^T = (L_S + D_S) D_S^{-1/2} D_S^{-1/2} (D_S + L_S^T),  M̃_1 = (L_S + D_S) D_S^{-1/2}.

This preconditioning has been studied in more detail in [58]. A modification of the symmetric Gauss–Seidel preconditioner (similarly as for the corresponding iterative method) has led to the symmetric successive over-relaxation (SSOR) preconditioner, where

M̃_ω = M̃_{1,ω} M̃_{1,ω}^T = (1 / (2 − ω)) (L_S + D_S/ω) (D_S/ω)^{-1/2} (D_S/ω)^{-1/2} (D_S/ω + L_S^T).
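The three classical preconditioners above differ only in the definition of the factor M̃_1. The sketch below (NumPy, dense matrices for readability) assembles M̃_1 for each choice from the splitting (3.1); in PCG one would solve with M̃_1 and M̃_1^T rather than forming M̃ or its inverse explicitly.

import numpy as np

def splitting(A):
    # L_S (strictly lower part) and D_S (diagonal part) of the splitting (3.1).
    return np.tril(A, -1), np.diag(np.diag(A))

def jacobi_factor(A):
    _, D_S = splitting(A)
    return np.sqrt(D_S)                                       # M~_1 = D_S^{1/2}

def sgs_factor(A):
    L_S, D_S = splitting(A)
    return (L_S + D_S) @ np.diag(1.0 / np.sqrt(np.diag(A)))   # (L_S + D_S) D_S^{-1/2}

def ssor_factor(A, omega):
    L_S, D_S = splitting(A)
    return ((L_S + D_S / omega)
            @ np.diag(np.sqrt(omega / np.diag(A)))            # (D_S/omega)^{-1/2}
            / np.sqrt(2.0 - omega))

A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
for M1 in (jacobi_factor(A), sgs_factor(A), ssor_factor(A, 1.5)):
    # M~ = M~_1 M~_1^T approximates A; compare with A itself.
    print(np.linalg.norm(A - M1 @ M1.T))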

4.1.2 Incomplete Cholesky factorization

The term incomplete Cholesky factorization (IC) stands for a large class of sparse approximate factorizations for SPD matrices. Incomplete (Cholesky) factorization algorithms involve dropping rules that control the fill-in in the incomplete factors. All the variants of IC produce an approximate triangular factorization of a matrix, A ≈ M̃ = L̃ L̃^T.

In [19], the norms ‖A − L̃ L̃^T‖, called the accuracy of the preconditioner, and ‖L̃^{-1} A L̃^{-T} − I‖, called the stability of the preconditioner, are discussed as measures of the quality of the preconditioner. The magnitudes of these norms play an important role. A good IC preconditioner should keep both norms small.

Variant: IC(0)

Let S be the set of positions of nonzero entries, S = {(i, j) : a_{i,j} ≠ 0}. In IC(0) all the fill-in with indices outside S is dropped. Such a version of IC does not allow any fill-in: only entries at positions (i, j) ∈ S can be nonzero. This approach is efficient in special cases such as M-matrices and diagonally dominant matrices, but it may not be the best choice for solving more complicated problems.

Variant: IC(ℓ)

This factorization is a generalization of IC(0). The underlying concept of levels has been formalized in [39] for finite difference problems and in [78] for more general problems. In the formal treatment (as given in [62]), the level of fill ℓ is a non-negative integer. Entries of the fill-in are dropped based on the value of the level. Initially, all levels ℓ_{i,j} are set to 0 for the nonzero entries of the matrix and to ∞ for the zero entries. Each time an entry is modified by this type of IC process, its level must be updated according to the update rule

ℓ_{i,j} = min(ℓ_{i,j}, ℓ_{i,k} + ℓ_{k,j} + 1).

All entries with ℓ_{i,j} > ℓ are dropped. The computational cost for higher values of ℓ may be rather high. This preconditioner has a lot of variations, differing sometimes also by the update rules.

Variant: IC(τ )

A counterpart to the incomplete factorizations based on the structure of nonzero entries and levels are the drop tolerance-based incomplete factorizations, e.g., IC(τ). These decompositions drop entries by using a criterion involving the magnitudes of the matrix entries. There exist two main approaches: dropping based on the absolute value of the dropping parameter τ_A, and dropping based on the relative value of the dropping parameter τ_R with respect to some intermediate quantities computed throughout the factorization process. It means that entries that are absolutely smaller than τ_A, or relatively smaller than τ_R multiplied by some intermediate quantity, are discarded. For badly scaled problems the absolute dropping criterion may not work well, and the relative dropping criterion can be preferable. In both cases, finding the optimal value of τ_A or τ_R may be highly problem dependent.

Variant: ICT(τ_R, p)

Incomplete factorization based on a dual threshold has been introduced in [61]. The parameter τ_R has the same meaning as in IC(τ) with the relative dropping strategy. In addition, p is a non-negative integer. This approach combines the relative dropping strategy, where fill-in is allowed if its magnitude is larger than τ_R multiplied by the Euclidean norm of the current row, with a second constraint: the preconditioner also keeps (at most) the p nonzeros largest in magnitude in a row of the factor.
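The dual-threshold rule itself can be stated in a few lines: within one computed row of the factor, discard the entries smaller than τ_R times the row norm, and then keep only the (at most) p largest remaining entries. A possible sketch of such a filter (NumPy; the function name and interface are illustrative, not taken from any particular implementation):

import numpy as np

def dual_threshold_filter(row, tau_r, p):
    # Keep entries of `row` larger than tau_r * ||row|| (relative dropping),
    # then retain at most the p survivors largest in magnitude (dual threshold).
    kept = row.copy()
    kept[np.abs(kept) <= tau_r * np.linalg.norm(row)] = 0.0
    nonzero = np.flatnonzero(kept)
    if len(nonzero) > p:
        # indices of the survivors ordered by increasing magnitude
        order = nonzero[np.argsort(np.abs(kept[nonzero]))]
        kept[order[:-p]] = 0.0
    return kept

row = np.array([0.9, -0.05, 0.3, 0.001, -0.2, 0.02])
print(dual_threshold_filter(row, tau_r=0.1, p=2))   # only the two largest entries survive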

Variant: MIC

Modified incomplete Cholesky factorization (MIC) methods were considered in special cases in [27, 43]. They are based on the idea of preserving the row sums of the original matrix in the preconditioner. This can be accomplished by adding the dropped entries to the diagonal entries (pivots) in the course of the elimination. This factorization belongs to the methods (IC(·)) that have been partly theoretically analyzed in [39, 40], where a relation between the condition numbers of the original and the preconditioned matrix has been shown.

Remark 4.1. In general, none of the previously mentioned variants of IC is breakdown-free. In order to avoid breakdowns one can increase the diagonal dominance of A by diagonal shifts (either locally or globally, before or during the factorization). A trial-and-error heuristic is often the only feasible way to find an optimal value of the dropping or shift parameters.

4.2 Further preconditioning techniques

This section briefly describes some advanced preconditioning techniques such as hierarchical methods and approximate inverses.

4.2.1 Domain decomposition

Domain decomposition (DD) methods are techniques originally introduced for solving PDE problems. They are based on decomposing the spatial domain of the problem into several subdomains and solving the problems separately on these subdomains. The earliest known domain decomposition method was proposed by H. A. Schwarz in 1870. There are two main approaches, with overlapping subdomains (multiplicative and additive Schwarz methods) or non-overlapping subdomains (Neumann–Dirichlet, Neumann–Neumann preconditioners). The primary technique consists of solving subproblems on various subdomains where suitable continuity requirements on the boundaries are enforced. An early survey of DD methods can be found in [17]. A more up-to-date survey that also includes the popular BDDC and


4.2.2 Multigrid

Multigrid methods [75] are very useful tools for solving partial differential equations.

They were originally designed as purely iterative methods. These methods use a hierarchy of progressively coarser levels and improve the approximation of the solution on a fine level by combining smoothing (a simple iterative scheme such as Jacobi or Gauss–Seidel) and coarse grid correction (solving the problem approximately on a coarser level). The hierarchy can be constructed by coarsening the discretization grids of the given PDE with the transfer operators induced by the natural injection of the associated functional spaces, which leads to the so-called geometric multigrid. An algebraic generalization of this method is represented by the algebraic multigrid (AMG).

4.2.3 Sparse Approximate Inverse

The sparse approximate inverse (SPAI) algorithm has been developed by Grote and Huckle in [35]. The idea of the algorithm is to compute M̃ ≈ A^{-1} by a technique based on a constrained minimization problem

min_{M̃ ∈ S} ‖I − A M̃‖_F, (4.4)

where S is the set of matrices with a prescribed sparsity pattern. Problem (4.4) can be decomposed as

‖I − A M̃‖_F^2 = Σ_{j=1}^{n} ‖e_j − A m̃_j‖_2^2,

where e_j denotes the jth column of the identity matrix and m̃_j is the jth column of the computed matrix M̃. This reformulation leads to solving n independent least squares problems. The technique is targeted at general square (and nonsingular) matrices. In the simplest formulations, the symmetry of the matrix is not exploited, but it could be enforced by some postprocessing. The sparsity pattern is often prescribed by the sparsity pattern of a certain power of A.
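For a fixed sparsity pattern, each column m̃_j is thus obtained from a small least squares problem that involves only the columns of A selected by the pattern of m̃_j. A minimal dense sketch (NumPy; for simplicity the pattern is taken to be that of A itself, a common initial choice):

import numpy as np

def spai(A, pattern):
    # Compute M~ column by column: minimize ||e_j - A m_j||_2 with m_j restricted
    # to the nonzero pattern given by the boolean matrix `pattern`.
    n = A.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        cols = np.flatnonzero(pattern[:, j])          # allowed nonzeros of column j
        e_j = np.zeros(n)
        e_j[j] = 1.0
        # small least squares problem with the selected columns of A
        m_small, *_ = np.linalg.lstsq(A[:, cols], e_j, rcond=None)
        M[cols, j] = m_small
    return M

A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
M = spai(A, pattern=(A != 0))                         # pattern of A as the initial pattern
print(np.linalg.norm(np.eye(3) - A @ M, 'fro'))       # the minimized Frobenius norm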

4.2.4 Approach by Chow & Saad

The serial cost of computing the SPAI preconditioner can be very high. In order to alleviate this problem, Chow and Saad in [20] proposed an iterative method that reduces the residuals corresponding to each column of the approximate inverse M̃ = [m̃_1, . . . , m̃_n]. These authors thus introduced the minimal residual algorithm to approximately solve the n independent linear subproblems

A m̃_j = e_j,  j = 1, . . . , n.

The algorithm starts with a sparse initial guess. The m̃_j vectors remain sparse after several iterations of the minimal residual algorithm when, in addition, less profitable nonzeros (as described in [20]) are regularly discarded. The m̃_j vectors computed so far can be used for the so-called self-preconditioning of the minimal residual algorithm.

4.2.5 Factorized approximate inverse preconditioning

There are two main approaches for computing factorized approximate inverse preconditioners targeted at SPD matrices. The first one is the factorized sparse approximate inverse (FSAI). This is an algorithm that has been introduced by Kolotilina and Yeremin in [47]. The algorithm is based on minimization of the Frobenius norm ‖I − M̃_1^{-1} L‖_F, where M̃_1^{-1} is the approximate inverse of the Cholesky factor and LL^T = A is the Cholesky factorization of the matrix A, which is not assumed to be known explicitly. The sparsity pattern of M̃_1^{-1} has to be prescribed. The matrix M̃_1^{-1} is then obtained as the solution of a constrained minimization problem. The serial cost for the construction of this type of preconditioner is usually very high, although its theoretical parallel complexity can be quite moderate [48]. The second class of algorithms for computing factorized approximate inverse preconditioners is based on the incomplete generalized Gram–Schmidt process. This approach has been introduced by Benzi and Tůma in [6]. This algorithm does not require a prescription of the sparsity pattern for the approximate inverse factor. Instead, the sparsity is preserved in a similar way as in the IC(τ) algorithm. We investigate this approach in the following part of the thesis.


Part III

The Generalized Gram–Schmidt process


The Gram–Schmidt process

The history of the Gram–Schmidt (GS) process is extensively summarized in [53]. The first mention of an algorithm related to the modified Gram–Schmidt process dates back to 1820, in [52] by P. S. Laplace. Then in 1883, J. P. Gram published a paper [33] with an orthogonalization procedure for functions. Finally, in 1907, E. Schmidt published a paper [63] introducing essentially the same algorithm as J. P. Gram, but it was his version that became popular and widely used. The Gram–Schmidt process as an algorithm for solving systems of linear equations was also introduced by Erhard Schmidt in [64].

Although the classical and modified versions of the GS process produce the same results in exact arithmetic, in finite precision arithmetic their results may differ. There are several papers [14, 1, 15, 71, 31] devoted to the analysis of the GS process in finite precision arithmetic.

This part of the thesis deals with the generalized GS (GGS) process, which can be seen as a standard orthogonalization with a non-standard inner product induced by an SPD matrix. The GGS process in finite precision arithmetic has already been studied in [60] and [56].


Chapter 5

Generalized Gram–Schmidt process in exact arithmetic

Let Z^(0) = [z_1^(0), . . . , z_n^(0)] ∈ R^{n×n} be a nonsingular matrix. The non-generalized GS process (orthogonalization) provides a decomposition of the matrix Z^(0) in the form Z^(0) = QR, where Q ∈ R^{n×n} is an orthogonal matrix, Q^T Q = I, and R ∈ R^{n×n} is an upper triangular matrix that can be seen as the Cholesky factor of the matrix (Z^(0))^T Z^(0). The matrix Q is in this case computed by orthogonalizing the columns of the matrix Z^(0) (using orthogonal projections that employ the Euclidean inner product) against the previously computed column vectors in Q.

The GGS process uses the non-standard inner product induced by an SPD matrix A ∈ R^{n×n}. The special case A = I corresponds to the (standard) Euclidean inner product. Turning to the identities for GGS, we have a decomposition of the matrix Z^(0) in the form Z^(0) = ZU, where Z ∈ R^{n×n} has A-orthogonal columns such that Z^T A Z = I. The matrix U ∈ R^{n×n} is an upper triangular matrix and can be seen as the Cholesky factor of the matrix (Z^(0))^T A Z^(0), with the norm and minimum singular value

‖U‖ = ‖A^{1/2} Z^(0)‖,  σ_n(U) = σ_n(A^{1/2} Z^(0)),

and for the condition number κ(U) it holds that κ(U) = κ(A^{1/2} Z^(0)). Due to the orthogonality relation (A^{1/2} Z)^T (A^{1/2} Z) = I, we have for the norm and minimum singular value of Z

‖Z‖ = ‖A^{-1/2}‖ = ‖A^{-1}‖^{1/2},  σ_n(Z) = λ_n^{1/2}(A^{-1}) = ‖A‖^{-1/2}.

Since Z = Z^(0) U^{-1}, the product Z Z^T can be written as

Z Z^T = Z^(0) [(Z^(0))^T A Z^(0)]^{-1} (Z^(0))^T.

Then A Z Z^T represents the oblique projector onto R(A Z^(0)) and orthogonal to R(Z^(0)). Similarly, Z Z^T A is the oblique projector onto R(Z^(0)) and orthogonal to R(A Z^(0)). In the considered case we always have Z Z^T = A^{-1} and A Z Z^T = Z Z^T A = I.

In order to have all computations in the GGS process well defined, the matrix A has to be symmetric and positive definite, but note that this type of decomposition can be analyzed in the symmetric and indefinite case as well, see, e.g., [59].
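The identities above are easy to verify numerically. The following sketch (NumPy; a classical GGS variant with dense matrices and no dropping) A-orthogonalizes the columns of Z^(0) = I and checks that Z^T A Z = I, Z Z^T = A^{-1}, and Z^(0) = Z U.

import numpy as np

def ggs(A, Z0):
    # Generalized Gram-Schmidt: Z0 = Z U with Z^T A Z = I (classical variant).
    n = Z0.shape[1]
    Z = Z0.astype(float).copy()
    U = np.zeros((n, n))
    for k in range(n):
        z = Z0[:, k].astype(float)
        for i in range(k):
            U[i, k] = Z[:, i] @ (A @ Z0[:, k])      # energetic inner product <z_k^(0), z_i>_A
            z = z - U[i, k] * Z[:, i]
        U[k, k] = np.sqrt(z @ (A @ z))              # A-norm of the remaining vector
        Z[:, k] = z / U[k, k]
    return Z, U

A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
Z, U = ggs(A, np.eye(3))
print(np.allclose(Z.T @ A @ Z, np.eye(3)))          # A-orthogonality
print(np.allclose(Z @ Z.T, np.linalg.inv(A)))       # factorized inverse Z Z^T = A^{-1}
print(np.allclose(np.eye(3), Z @ U))                # Z^(0) = Z U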
