
UPTEC F 17001

Degree project, 30 credits (Examensarbete 30 hp), January 2017

Preconditioned iterative methods for PDE-constrained optimization problems with pointwise state constraints

Anders Ström


Preconditioned iterative methods for PDE-constrained optimization problems with pointwise state constraints

Anders Ström

Optimization problems constrained by partial differential equations (PDEs) arise in a variety of fields when one wants to optimize a system governed by a PDE. The goal is to compute a control variable such that a state variable is as close as possible to some desired state, when control and state are coupled by some PDE. The control and state may have additional conditions acting on them, such as the so-called box constraints, which define upper and lower bounds on these variables. Here we study the optimal control of the Poisson equation with pointwise inequality constraints on the state variable, handled via Moreau-Yosida regularization. The state constraints make the optimality system nonlinear, and a primal-dual active set method is used to solve it. In each nonlinear step a large saddle point system has to be solved. This system is generally ill-conditioned, and preconditioning is required to solve it efficiently with an iterative solution method. The preconditioner should also be efficiently realizable. The convergence rate also depends on model and discretization parameters; for this reason, a preconditioning technique needs to be applicable over a wide range of parameters. Three preconditioners are tested for the problem and compared in terms of iteration counts and execution time for a wide range of problem and discretization parameters.

ISSN: 1401-5757, UPTEC F 17001
Examiner: Tomas Nyberg
Subject reviewer: Sverker Holmgren
Supervisor: Maya Neytcheva


Sammanfattning (Summary)

Optimization problems with partial differential equations (PDEs) as constraints appear in many different types of applications, such as medical image analysis, aerodynamics and financial mathematics. The goal in this type of problem is to compute a control variable such that a state variable matches a desired state as closely as possible. The problem is formulated with a cost functional to be minimized and a constraint that couples the control and state variables through a PDE. In many practical applications there are additional restrictions that the control and state must satisfy. One such restriction can, for example, be expressed by the so-called box constraints, which pointwise define upper and lower bounds on the variables.

In this work, optimization constrained by the Poisson equation with box constraints imposed on the state variable is studied. In order to define the optimal solution via the Karush-Kuhn-Tucker conditions, the problem must be regularized. This is done by replacing the box constraints with a Moreau-Yosida penalty term in the cost functional. The constraints on the state variable also make the problem nonlinear, since the points where the penalty terms are applied change during the solution process. In its original form the discrete optimization problem also contains a Lagrange multiplier, but if this is discretized with the same basis functions as the control variable the system can be reduced.

The nonlinear solver is a primal-dual active set method. In each iteration a linear problem is first solved, and the penalty terms are then updated so that they act at the points where the box constraints are violated, the active set. Convergence is reached when the active set does not change between two iterations. The linear system is solved with an iterative method; since the goal is to solve large systems of equations, a Krylov method is used. Discretization of the optimization problem leads to algebraic systems of equations of saddle point form. These systems are often ill-conditioned, and efficient preconditioning methods are therefore required to solve them.

The idea of preconditioning is to transform the system into one with the same solution but with a better condition number. The preconditioner must also be efficient to apply in practice. Furthermore, the system is affected by model and discretization parameters, and the preconditioning method should ideally be independent of these.

Three different preconditioning methods ($P_I$, $P_{II}$, $P_{III}$) for the reduced system are tested and compared with respect to iteration counts and solution time for a wide range of parameters. The lowest iteration counts are obtained with $P_{III}$, but since it could not be applied efficiently it also has the highest solution time. Both $P_I$ and $P_{II}$ can be applied efficiently, and overall $P_{II}$ is faster both in terms of iterations and solution time.

The system preconditioned with $P_{II}$ is poorly scaled, which in some cases leads to a large error in the solution. This could be improved by scaling the preconditioned system so that its matrix has unit diagonal. Further investigation is also required to fully understand the interplay between the different model and discretization parameters and how they affect the performance of the preconditioners.


Contents

Nomenclature

1 Introduction
1.1 PDE-constrained optimization problems
1.2 Goal of study
1.3 Problem setting and available tools
1.4 Layout of the thesis

2 Mathematical framework
2.1 Saddle point problems
2.1.1 Two-by-two block matrices
2.2 State constrained optimal control problems with Moreau-Yosida regularization
2.2.1 Discrete system
2.2.2 Reduced system

3 Numerical solution methods
3.1 Newton-type solution methods for nonlinear problems
3.1.1 Semismooth Newton method
3.1.2 Primal-dual active set method
3.2 Krylov subspace iteration methods for linear problems
3.2.1 Conjugate Gradient (CG) Method
3.2.2 Generalized Minimal Residual (GMRES) Method
3.2.3 Preconditioned Krylov subspace methods
3.2.4 Multigrid methods

4 Preconditioners for OPT-PDE
4.1 State constrained OPT-PDE with Moreau-Yosida penalty function
4.1.1 A nonstandard inner product preconditioner
4.1.2 Structure utilizing preconditioners
4.2 Implementation of preconditioners
4.2.1 Implementation of P_I and P_III
4.2.2 Implementation of P_II
4.2.3 Numerical realization of preconditioners

5 Result
5.1 Solver implementation
5.2 Numerical results for P_III
5.3 Numerical results for P_I
5.4 Numerical results for P_II
5.5 Discussion

6 Conclusion
6.1 Outlook and future work

Bibliography


Nomenclature

Abbreviations

AMG — Algebraic multigrid
CG — Conjugate gradient method
FEM — Finite element method
FGMRES — Flexible generalized minimal residual method
GCR — Generalized conjugate residual method
GMRES — Generalized minimal residual method
KKT — Karush-Kuhn-Tucker
MINRES — Minimal residual method
OPT-PDE — Optimization problems constrained by partial differential equations
SPD — Symmetric positive definite
SPSD — Symmetric and positive semidefinite

Symbols

$\mathbb{R}^n$ — Real coordinate space of $n$ dimensions
$\beta$ — Tikhonov regularization parameter
$\hat y$ — Discrete desired state
$u$ — Discrete control variable
$y$ — Discrete state variable
$\Delta$ — Laplace operator
$\hat y$ — Desired state
$\kappa$ — Condition number
$\nabla$ — Nabla operator
$\Omega$ — Domain
$\overline y$ — Upper state constraint
$C$ — Constraint functional
$J$ — Cost functional
$\mathcal{K}_k$ — Krylov subspace of order $k$
$\underline y$ — Lower state constraint
$\varepsilon$ — Moreau-Yosida regularization parameter
$K$ — Stiffness matrix
$M$ — Mass matrix
$u$ — Control variable
$y$ — State variable
$\underline y, \overline y$ — Discrete state constraints


Chapter 1 Introduction

1.1 PDE-constrained optimization problems

Optimization problems constrained by partial differential equations (OPT-PDE) can be applied to formulate problems in a wide range of applications such as medical imaging, aerodynamics and financial mathematics.

OPT-PDE problems are formulated as follows,

$$\min_{y,u}\ J(y,u) \quad \text{s.t.}\quad C(y,u) = 0, \qquad (1.1)$$

where y and u are the state and control variables, J (y, u) is the cost functional and C (y, u) is the PDE-constraint that couples the state and control.

To offer an intuitive understanding of this type of problem we consider the static optimal heat control problem on a domain Ω,

$$\min_{y,u}\ J(y,u) = \frac{1}{2}\int_\Omega (y - \hat y)^2\,dx + \frac{\beta}{2}\int_\Omega u^2\,dx, \quad \text{s.t.}\ -\Delta y = u \ \text{in }\Omega, \qquad (1.2)$$

where $\hat y$ is a desired heat distribution in $\Omega$, the control $u$ is a heat source and $y$ is the resulting heat distribution. It is assumed that $u$ and $y$ are coupled through the Poisson equation and the goal is to compute $u$ such that $y$ is as close as possible to the given desired heat distribution. The term $\frac{\beta}{2}\int_\Omega u^2\,dx$ is the Tikhonov regularization term, which is added due to the problem being ill-posed. The parameter $\beta$ is the Tikhonov regularization parameter, also referred to as the control cost parameter.


The so-called box constraints define maximum and minimum values on the state and the control, namely, we require $u$ and $y$ to be within certain bounds

$$\underline u \le u \le \overline u, \qquad \underline y \le y \le \overline y. \qquad (1.3)$$

Here we focus only on the state constrained case.

The control variable $u$ in (1.2) acts on the entire domain $\Omega$, and this is hence referred to as a distributed control problem. There is also a class of problems with local control, where $u$ acts only in a limited part of $\Omega$. In this case the control can be applied, e.g., to the boundary of the domain or in a number of discrete points.

1.2 Goal of study

The discrete counterpart of a continuous OPT-PDE problem leads to the solution of linear systems of algebraic equations of very large scale, where iterative solution methods are the only viable choice. As a rule, the convergence of such methods needs to be accelerated via some preconditioning technique. The aim of this work is to examine preconditioning of state-constrained optimal control problems with Moreau-Yosida penalty functions.

1.3 Problem setting and available tools

The open source C++ finite element library deal.II [1] is used for constructing the meshes and the finite element matrices. Deal.II also provides iterative solvers and preconditioners, as well as an interface to Trilinos [2], which is used in this study. Deal.II also has interfaces to other packages such as PETSc [3] and METIS [4]. The study is limited to two space dimensions.

1.4 Layout of the thesis

In Chapter 2 the state constrained optimal control problem constrained by the Poisson equation is presented and the discrete nonlinear system is derived. In Chapter 3 an overview of nonlinear and linear solution methods is given, with a focus on Krylov subspace methods for large sparse linear systems. In Chapter 4 three new preconditioners for the discrete linear system are presented, as well as a short description of one previously studied preconditioner from [5]. In Chapter 5 we present and discuss results of numerical experiments for the three proposed preconditioners, followed by concluding remarks in Chapter 6.


Chapter 2

Mathematical framework

After discretization, OPT-PDE problems lead to algebraic linear or nonlinear systems of equations with structured block matrices. Most often these matrices are real, indefinite and of saddle point form. Therefore, below we consider the basic properties of saddle point matrices in some more detail.

2.1 Saddle point problems

Consider a linear system of equations,

$$\begin{bmatrix} A & B_1^T \\ B_2 & -C\end{bmatrix}\begin{bmatrix} x \\ y\end{bmatrix} = \begin{bmatrix} f \\ g\end{bmatrix}, \qquad (2.1)$$

where $A \in \mathbb{R}^{n\times n}$, $B_1, B_2 \in \mathbb{R}^{m\times n}$ and $C \in \mathbb{R}^{m\times m}$. Following [6], the matrix in (2.1) is said to be of saddle point form if one or more of the following conditions are satisfied:

1. $A = A^T$ ($A$ is symmetric).

2. The symmetric part of $A$, $H \equiv \frac{1}{2}(A + A^T)$, is positive semidefinite.

3. $B_1 = B_2 = B$.

4. $C$ is symmetric and positive semidefinite (SPSD).

5. $C = 0$.

If all the conditions are satisfied we obtain a symmetric linear system which has the form,

$$\begin{bmatrix} A & B^T \\ B & 0\end{bmatrix}\begin{bmatrix} x \\ y\end{bmatrix} = \begin{bmatrix} f \\ g\end{bmatrix}. \qquad (2.2)$$


If all but the last condition are met, we have a linear system of the form

$$\begin{bmatrix} A & B^T \\ B & -C\end{bmatrix}\begin{bmatrix} x \\ y\end{bmatrix} = \begin{bmatrix} f \\ g\end{bmatrix}. \qquad (2.3)$$

The matrices in saddle point systems are indefinite and, in general, have poor spectral properties, which makes finding an appropriate solution method a challenging task. We note that, in this work, we consider saddle point matrices of a more specific form, to be introduced below.

2.1.1 Two-by-two block matrices

For two-by-two block matrices

$$\mathcal{A} = \begin{bmatrix} A & B \\ C & D\end{bmatrix}, \qquad (2.4)$$

the Schur complement of the block $A$ is defined by

$$S_1 = D - CA^{-1}B \qquad (2.5)$$

and the Schur complement of the block $D$ by

$$S_2 = A - BD^{-1}C. \qquad (2.6)$$

The matrices of the form (2.4) define the more general class of two-by-two block matrices.

Straightforward computation shows that, assuming A and/or D are nonsingular, these matrices can be written in a block-factorized form, such as

$$\mathcal{A} = \begin{bmatrix} A & 0 \\ C & S_1\end{bmatrix}\begin{bmatrix} I_1 & A^{-1}B \\ 0 & I_2\end{bmatrix} = \begin{bmatrix} S_2 & B \\ 0 & D\end{bmatrix}\begin{bmatrix} I_1 & 0 \\ D^{-1}C & I_2\end{bmatrix} \qquad (2.7)$$

or

$$\mathcal{A} = \begin{bmatrix} I_1 & 0 \\ CA^{-1} & I_2\end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & S_1\end{bmatrix}\begin{bmatrix} I_1 & A^{-1}B \\ 0 & I_2\end{bmatrix}. \qquad (2.8)$$

We can write analogous factorizations using $S_2$. The factorized forms are used to construct various approximations of $\mathcal{A}$ that act as preconditioners when solving systems with it.

Some examples are

• $P_1 = \begin{bmatrix} \widetilde A & 0 \\ 0 & \widetilde S_1\end{bmatrix}$ (block-diagonal form),

• $P_2 = \begin{bmatrix} \widetilde A & 0 \\ C & \widetilde S_1\end{bmatrix}$ (block lower-triangular form),

• $P_3 = \begin{bmatrix} I_1 & 0 \\ C\overline A & I_2\end{bmatrix}\begin{bmatrix} \widetilde A & 0 \\ 0 & \widetilde S_1\end{bmatrix}\begin{bmatrix} I_1 & \overline A B \\ 0 & I_2\end{bmatrix}$ (full block-factorized form),

where $\widetilde A$, $\widetilde S_1$ and $\overline A$ are some approximations of $A$, $S_1$ and $A^{-1}$, respectively.

When choosing and implementing some of the preconditioners $P_1$, $P_2$ or $P_3$, the important question is how to choose the approximations so that the eigenvalues of the preconditioned system $P_i^{-1}\mathcal{A}$, $i = 1, 2, 3$, are tightly clustered, possibly around one. To illustrate the analysis we consider the generalized eigenvalue problem

$$\mathcal{A}v = \lambda P_2 v, \qquad \begin{bmatrix} A & B \\ C & D\end{bmatrix}\begin{bmatrix} v_1 \\ v_2\end{bmatrix} = \lambda\begin{bmatrix} \widetilde A & 0 \\ C & \widetilde S_1\end{bmatrix}\begin{bmatrix} v_1 \\ v_2\end{bmatrix}. \qquad (2.9)$$

As a first step we assume that $\widetilde A = A$ and $\widetilde S_1 = S_1$. From (2.9) we then obtain

$$P_2^{-1}\mathcal{A} = \begin{bmatrix} A^{-1} & 0 \\ -S_1^{-1}CA^{-1} & S_1^{-1}\end{bmatrix}\begin{bmatrix} A & B \\ C & D\end{bmatrix} = \begin{bmatrix} I_1 & A^{-1}B \\ 0 & I_2\end{bmatrix} \qquad (2.10)$$

and we see that the eigenvalues cluster around 1 (in the complex plane). Thus, with exact $A$ and $S_1$, the preconditioner $P_2$ approximates $\mathcal{A}$ with very high quality ($P_2$ is referred to as the 'ideal preconditioner').

As computing $S_1$ is expensive, we now keep $\widetilde A = A$ and use some approximation $\widetilde S_1$ of $S_1$. We obtain

$$\begin{bmatrix} A^{-1} & 0 \\ -\widetilde S_1^{-1}CA^{-1} & \widetilde S_1^{-1}\end{bmatrix}\begin{bmatrix} A & B \\ C & D\end{bmatrix} = \begin{bmatrix} I_1 & A^{-1}B \\ 0 & \widetilde S_1^{-1}S_1\end{bmatrix}. \qquad (2.11)$$

From (2.11) we see that we still have a cluster of eigenvalues around 1, but the clustering of the rest of the spectrum depends on how well $\widetilde S_1$ approximates $S_1$.

Next, we use $\widetilde A$ and let $\widetilde S_1 = S_1$:

$$\begin{bmatrix} \widetilde A^{-1} & 0 \\ -S_1^{-1}C\widetilde A^{-1} & S_1^{-1}\end{bmatrix}\begin{bmatrix} A & B \\ C & D\end{bmatrix} = \begin{bmatrix} \widetilde A^{-1}A & \widetilde A^{-1}B \\ S_1^{-1}C(I - \widetilde A^{-1}A) & S_1^{-1}(D - C\widetilde A^{-1}B)\end{bmatrix}. \qquad (2.12)$$

Thus, even if we use the exact Schur complement, if $\widetilde A$ is not a good approximation of $A$, the quality of $P_2$ as a preconditioner for $\mathcal{A}$ can be destroyed. To ensure good accuracy for the block $A$, we usually use an inner solver with a controlled stopping tolerance.

Preconditioners for the considered class of problems, based on the above techniques, are used in [5].
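To make (2.10)-(2.11) concrete, the following small numpy sketch (not from the thesis; the random blocks and the diagonal Schur complement approximation are illustrative assumptions) builds a two-by-two block matrix, applies the block lower-triangular preconditioner $P_2$ with the exact and with an approximate Schur complement, and prints the range of the resulting eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 20

# Build a small two-by-two block matrix [[A, B], [C, D]] with A SPD.
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)); D = D @ D.T + m * np.eye(m)

Amat = np.block([[A, B], [C, D]])
S1 = D - C @ np.linalg.solve(A, B)          # exact Schur complement of A

def P2(S):
    """Block lower-triangular preconditioner [[A, 0], [C, S]]."""
    return np.block([[A, np.zeros((n, m))], [C, S]])

# Exact Schur complement: all eigenvalues of P2^{-1} A equal 1.
ev_exact = np.linalg.eigvals(np.linalg.solve(P2(S1), Amat))
# Approximate Schur complement (diagonal of S1): a cluster at 1 plus spec(S1_tilde^{-1} S1).
S1_tilde = np.diag(np.diag(S1))
ev_approx = np.linalg.eigvals(np.linalg.solve(P2(S1_tilde), Amat))

print("exact  S1: eigenvalues in [%.3f, %.3f]" % (ev_exact.real.min(), ev_exact.real.max()))
print("approx S1: eigenvalues in [%.3f, %.3f]" % (ev_approx.real.min(), ev_approx.real.max()))
```

With the exact Schur complement all eigenvalues equal one; with the diagonal approximation a cluster at one remains and the rest of the spectrum is governed by $\widetilde S_1^{-1}S_1$, as stated above.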


2.2 State constrained optimal control problems with Moreau-Yosida regularization

Consider the PDE-constrained optimization problem $\min_{y,u} J$ with the cost functional

$$J(y,u) = \frac{1}{2}\|y - \hat y\|^2_{L_2(\Omega_1)} + \frac{\beta}{2}\|u\|^2_{L_2(\Omega)} \qquad (2.13)$$

constrained by the Poisson equation with Dirichlet boundary conditions,

$$-\Delta y = u \ \text{ in } \Omega, \qquad y = g \ \text{ on } \partial\Omega, \qquad (2.14)$$

and with upper and lower state constraints

$$\underline y \le y \le \overline y. \qquad (2.15)$$

One approach to deal with the state constraints is to add the so-called Moreau-Yosida penalty term to the cost functional (2.13), see for instance [5], [7]. Thus, we minimize

$$J(y,u) = \frac{1}{2}\|y - \hat y\|^2_{L_2(\Omega_1)} + \frac{\beta}{2}\|u\|^2_{L_2(\Omega)} + \frac{1}{2\varepsilon}\|\max\{0, y - \overline y\}\|^2_{L_2(\Omega)} + \frac{1}{2\varepsilon}\|\min\{0, y - \underline y\}\|^2_{L_2(\Omega)}. \qquad (2.16)$$

Another way of regularizing pure state constraints is by using mixed constraints [8],

$$\underline y \le \varepsilon u + y \le \overline y. \qquad (2.17)$$

2.2.1 Discrete system

As a discretization technique we consider here the Finite Element Method (FEM). To derive the discrete problem one can use either the discretize-then-optimize or the optimize-then-discretize framework. This refers to whether the optimality conditions are derived before or after the equations are discretized. In the case of control problems constrained by the Poisson equation it has been shown that the order does not impact the solution, though there exist problems where this is not true. See, e.g., [9] and the references therein.

The finite element discretization of the continuous optimal control problem with the cost functional in (2.16) gives the discrete optimization problem

$$\begin{aligned}
\min\ & \frac{1}{2}(y - \hat y)^T M(y - \hat y) + \frac{\beta}{2}u^T M u + \frac{1}{2\varepsilon}\max\{0, y - \overline y\}^T M\max\{0, y - \overline y\} \\
&+ \frac{1}{2\varepsilon}\min\{0, y - \underline y\}^T M\min\{0, y - \underline y\} \qquad \text{s.t.}\quad Ky = Mu + d,
\end{aligned} \qquad (2.18)$$

where $K$ is the stiffness matrix and $M$ is the mass matrix. We also point out that $y$, $\hat y$, $\underline y$, $\overline y$, $u$ and $d$ are now vectors.

The usual approach to solve (2.18) is to formulate the first order optimality, or Karush-Kuhn-Tucker (KKT), conditions, namely,

$$M y - K^T\lambda - M\hat y + \varepsilon^{-1}\chi_{\mathcal{A}_+} M\max\{0, y-\overline y\} + \varepsilon^{-1}\chi_{\mathcal{A}_-} M\min\{0, y-\underline y\} = 0, \qquad (2.19)$$
$$\beta M u + M\lambda = 0, \qquad (2.20)$$
$$-K y + M u + d = 0, \qquad (2.21)$$

where $\chi_{\mathcal{A}_+}$ and $\chi_{\mathcal{A}_-}$ contain the indices where $y > \overline y$ and $y < \underline y$, respectively. Considering the KKT conditions it follows that, in matrix-vector form, the following system has to be solved,

$$\mathcal{K}\begin{bmatrix} y \\ u \\ \lambda\end{bmatrix} = \begin{bmatrix} L & 0 & -K^T \\ 0 & \beta M & M \\ -K & M & 0\end{bmatrix}\begin{bmatrix} y \\ u \\ \lambda\end{bmatrix} = \begin{bmatrix} c_A \\ 0 \\ d\end{bmatrix}, \qquad (2.22)$$

where $L = M + \varepsilon^{-1}G_A M G_A$ and $c_A = M\hat y + \varepsilon^{-1}(G_{A_+} M G_{A_+}\overline y + G_{A_-} M G_{A_-}\underline y)$, with $G_A$ representing a projection onto the active set of constraints.

The system in (2.22) is of saddle point form,

$$\mathcal{K} = \begin{bmatrix} A & B^T \\ B & 0\end{bmatrix}, \qquad (2.23)$$

where the blocks $A$ and $B$ are block matrices themselves, $A = \begin{bmatrix} L & 0 \\ 0 & \beta M\end{bmatrix}$ and $B = \begin{bmatrix} -K & M\end{bmatrix}$. We note that in this case $K = K^T$, but we keep the notation as we consider below methods that can be applied in more general cases.

The system (2.22) is nonlinear and of very large size. In Chapter 3 we describe first the nonlinear iterative solution method and, then, the linear solution method, applied at each nonlinear iteration.
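For illustration only, the sketch below assembles a matrix with the block structure of (2.22) using scipy; the one-dimensional mass and stiffness matrices and the randomly chosen active set are stand-ins for the deal.II-generated FEM matrices, so the numbers themselves carry no meaning.

```python
import numpy as np
import scipy.sparse as sp

n, h, beta, eps = 50, 1.0 / 51, 1e-2, 1e-4

# 1D lumped mass and stiffness matrices as simple stand-ins for the FEM blocks.
M = sp.identity(n, format="csr") * h
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr") / h

# G_A: projection onto a (here randomly chosen) active set of state constraints.
rng = np.random.default_rng(1)
active = rng.random(n) < 0.3
G_A = sp.diags(active.astype(float))

L = M + (1.0 / eps) * (G_A @ M @ G_A)       # L = M + eps^{-1} G_A M G_A

# Saddle point matrix of (2.22): [[L, 0, -K^T], [0, beta*M, M], [-K, M, 0]].
Kmat = sp.bmat([[L, None, -K.T],
                [None, beta * M, M],
                [-K, M, None]], format="csr")
print(Kmat.shape)   # (3n, 3n)
```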

2.2.2 Reduced system

From (2.20) we observe that λ = −βu and the second equation in (2.22) can thus be eliminated, which gives rise to the reduced system

$$\mathcal{A}x = \begin{bmatrix} L & \beta K^T \\ -K & M\end{bmatrix}\begin{bmatrix} y \\ u\end{bmatrix} = \begin{bmatrix} c_A \\ d\end{bmatrix}. \qquad (2.24)$$

We now consider some transformations of $\mathcal{A}$ that can be useful when considering the solution of the system in (2.24). By replacing $u$ with $\tilde u = -u$ we get

$$\mathcal{A}_I x = \begin{bmatrix} L & -\beta K^T \\ K & M\end{bmatrix}\begin{bmatrix} y \\ \tilde u\end{bmatrix} = \begin{bmatrix} c_A \\ -d\end{bmatrix}. \qquad (2.25)$$

We note that the matrix in (2.24) is also of saddle point form.

Following [5] we use a lumped mass matrix. Thus, $M$ is diagonal and $L = M + \varepsilon^{-1}D_0$ is a diagonal matrix, where $D_0$, the active set projection of $M$, has nonzero diagonal elements corresponding to the active set and is zero elsewhere. By definition both $L$ and $M$ are positive definite, and $K$, arising from the Poisson constraint, is symmetric and positive definite.

We further introduce $\hat u = \sqrt\beta\,\tilde u$, i.e. $\tilde u = \frac{1}{\sqrt\beta}\hat u$. Elementary computation shows that we can then instead solve the system

$$\mathcal{A}_{II}x = \begin{bmatrix} L & -\sqrt\beta K^T \\ \sqrt\beta K & M\end{bmatrix}\begin{bmatrix} y \\ \hat u\end{bmatrix} = \begin{bmatrix} c_A \\ -\sqrt\beta d\end{bmatrix} \qquad (2.26)$$

and then recover $u = -\frac{1}{\sqrt\beta}\hat u$.

The system in (2.25) with matrix A I can be further transformed by utilizing an idea from [5] where it is noted that since L is diagonal it can be seen as

$$L = \begin{bmatrix} M_I & 0 \\ 0 & (1+\varepsilon^{-1})M_A\end{bmatrix}, \qquad (2.27)$$

where $M_I$ corresponds to the inactive points and $M_A$ to the active points. We now transform $L$ into the form $\beta M$ by diagonal scaling, and we do this by seeking parameters $\alpha_I$ and $\alpha_A$ such that

$$D_\alpha L D_\alpha = \begin{bmatrix} \alpha_I I_1 & 0 \\ 0 & \alpha_A I_2\end{bmatrix}\begin{bmatrix} M_I & 0 \\ 0 & (1+\varepsilon^{-1})M_A\end{bmatrix}\begin{bmatrix} \alpha_I I_1 & 0 \\ 0 & \alpha_A I_2\end{bmatrix} = \beta\begin{bmatrix} M_I & 0 \\ 0 & M_A\end{bmatrix} \qquad (2.28)$$

or

$$\begin{bmatrix} \alpha_I^2 M_I & 0 \\ 0 & (1+\varepsilon^{-1})\alpha_A^2 M_A\end{bmatrix} = \beta\begin{bmatrix} M_I & 0 \\ 0 & M_A\end{bmatrix}. \qquad (2.29)$$

Choosing $\alpha_I = \sqrt\beta$ and $\alpha_A = \sqrt{\frac{\beta}{1+\varepsilon^{-1}}}$ is seen to satisfy (2.29). Now we construct the matrix

$$D = \begin{bmatrix} \alpha_I I_1 & 0 & 0 \\ 0 & \alpha_A I_2 & 0 \\ 0 & 0 & I\end{bmatrix} \qquad (2.30)$$

and apply it to the system matrix $\mathcal{A}_I$ in (2.25),

$$\widetilde{\mathcal{A}} = D\mathcal{A}_I D = \begin{bmatrix} D_\alpha L D_\alpha & -\beta D_\alpha K^T \\ K D_\alpha & M\end{bmatrix} = \begin{bmatrix} \beta M & -\beta\widetilde K^T \\ \widetilde K & M\end{bmatrix}, \qquad (2.31)$$

where $\widetilde K = K D_\alpha$. After scaling the first block row with $\beta^{-1}$ we obtain the new system matrix of the form

$$\mathcal{A}_{III} = \begin{bmatrix} M & -\widetilde K^T \\ \widetilde K & M\end{bmatrix}. \qquad (2.32)$$


The new system to be solved is now

$$\mathcal{A}_{III}w = \tilde b, \qquad (2.33)$$

where $\begin{bmatrix} y \\ \tilde u\end{bmatrix} = Dw$ and $\tilde b = D\begin{bmatrix} c_A \\ -d\end{bmatrix}$.

As the block matrices $L$, $M$ and $K$ originate from FEM discretizations, we possess much information about their extreme eigenvalues and condition numbers. Since $M$ is a lumped mass matrix we know that its eigenvalues are bounded by

$$c_1 h^2 \le \lambda(M) \le c_2 h^2, \qquad (2.34)$$

where $c_1$ and $c_2$ are constants independent of $h$. With $L$ being the mass matrix plus some scaled components of the mass matrix we analogously have

$$c_1 h^2 \le \lambda(L) \le c_2(1+\varepsilon^{-1})h^2. \qquad (2.35)$$

With $K$ being the stiffness matrix we have, from Gershgorin's circle theorem, that

$$c_3 h^2 \le \lambda(K) \le 8, \qquad (2.36)$$

where $c_3$ again is a constant independent of $h$ [10].
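Before proceeding, the diagonal scaling (2.28)-(2.32) introduced above can be checked numerically. The sketch below (with small random diagonal mass entries as assumed stand-ins for the lumped FEM matrices) verifies that $D_\alpha L D_\alpha = \beta M$ and that $D\mathcal{A}_I D$, after scaling the first block row by $\beta^{-1}$, reproduces $\mathcal{A}_{III}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, n_a, beta, eps = 6, 4, 1e-2, 1e-4
n = n_i + n_a

# Lumped (diagonal) mass matrix split into inactive and active parts.
m = rng.uniform(0.5, 1.5, n)
M = np.diag(m)
L = M.copy()
L[n_i:, n_i:] *= (1.0 + 1.0 / eps)          # L = diag(M_I, (1 + eps^{-1}) M_A)

K = rng.standard_normal((n, n)); K = K + K.T + 2 * n * np.eye(n)   # SPD stand-in for the stiffness matrix

# Scaling parameters alpha_I = sqrt(beta), alpha_A = sqrt(beta / (1 + eps^{-1})).
d = np.concatenate([np.full(n_i, np.sqrt(beta)),
                    np.full(n_a, np.sqrt(beta / (1.0 + 1.0 / eps)))])
D_alpha = np.diag(d)

assert np.allclose(D_alpha @ L @ D_alpha, beta * M)     # relation (2.28)

K_tilde = K @ D_alpha
A_I   = np.block([[L, -beta * K.T], [K, M]])
A_III = np.block([[M, -K_tilde.T], [K_tilde, M]])

# D A_I D with D = diag(D_alpha, I), then scale the first block row by 1/beta.
Dfull = np.block([[D_alpha, np.zeros((n, n))], [np.zeros((n, n)), np.eye(n)]])
scaled = Dfull @ A_I @ Dfull
scaled[:n, :] /= beta
assert np.allclose(scaled, A_III)
print("scaling reproduces A_III")
```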

We recall briefly the definition of a symmetric positive definite (SPD) (real) matrix, namely, $A$ is SPD if

$$x^T A x > 0 \quad \forall x \ne 0. \qquad (2.37)$$

$A$ is SPSD if

$$x^T A x \ge 0 \quad \forall x \ne 0. \qquad (2.38)$$

Below the notation $A \ge B$ is used in the positive definite sense, i.e.

$$A \ge B \ \text{ is equivalent to } \ x^T A x \ge x^T B x \quad \forall x \ne 0. \qquad (2.39)$$

Further we introduce the following matrix relation, used in the construction of various preconditioners for the matrix $\mathcal{A}$ in (2.24). Let $A$ and $B$ be given, $\alpha > 0$ and $A$ be SPD; then we see that

$$(A - \alpha B)A^{-1}(A - \alpha B^T) = A - \alpha(B + B^T) + \alpha^2 B A^{-1}B^T > 0 \qquad (2.40)$$

and thus

$$A + \alpha^2 B A^{-1}B^T > \alpha(B + B^T). \qquad (2.41)$$

(Here $A$ and $B$ are generic matrix notations.)


Chapter 3

Numerical solution methods for PDE-constrained problems with pointwise state constraints

3.1 Newton-type solution methods for nonlinear problems

Here we discuss methods to find the solution of a system of nonlinear equations. Given a nonlinear functional

$$F: \mathbb{R}^n \to \mathbb{R}^n, \qquad (3.1)$$

the task is to find

$$x^* \in \mathbb{R}^n \ \text{ such that } \ F(x^*) = 0. \qquad (3.2)$$

A well-established solution approach for nonlinear problems is Newton's method and its modifications. The basic Newton method is defined by using a linearization of $F$ about a current estimate $x_k$ of $x^*$ such that

$$F(x_k + d) \approx F(x_k) + \nabla F(x_k)d = m_k(x_k + d), \qquad (3.3)$$

where $\nabla F$ is the Jacobian matrix of $F$. The estimate $x_k$ can now be improved by computing $d_k$ such that

$$m_k(x_k + d_k) = 0. \qquad (3.4)$$

The new estimate is $x_{k+1} = x_k + d_k$. Algorithm 1 below shows the basic Newton method for smooth $F$, where it is required that $\nabla F(x_k)$ is nonsingular.

Locally, Algorithm 1 has quadratic convergence for Lipschitz continuous functions and superlinear convergence for functions that are only Hölder continuous. The proof can be found, for example, in [11].


Algorithm 1 Newton’s method for smooth systems

1: k := 0

2: while stopping criteria is not satisfied do

3: solve: ∇F (x k )d k = −F (x k ) for d k

4: set x k+1 = x k + d k , k = k + 1

5: end while

The convergence rate is only guaranteed locally, i.e., the initial guess $x_0$ must be sufficiently close to the solution $x^*$. For general problems we do not possess any additional knowledge of how to choose $x_0$. Therefore, global convergence is achieved by using a reduced step size. The step in (3.4) then becomes

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (3.5)$$

where $0 < \alpha_k \le 1$ is called a damping parameter. Various techniques to choose $\alpha_k$ in an appropriate way have been used, leading to damped Newton methods [12].
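As an illustration of Algorithm 1 with the damping in (3.5), the sketch below solves a small smooth test system with a simple backtracking rule for choosing $\alpha_k$; both the test function and the backtracking strategy are illustrative choices, not part of the thesis.

```python
import numpy as np

def F(x):
    # A small smooth test system (illustrative choice).
    return np.array([x[0]**2 + x[1] - 2.0,
                     x[0] + x[1]**2 - 2.0])

def JF(x):
    # Jacobian of F.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, 2.0 * x[1]]])

def damped_newton(x0, tol=1e-10, max_it=50):
    x = x0.copy()
    for _ in range(max_it):
        r = F(x)
        if np.linalg.norm(r) < tol:
            break
        d = np.linalg.solve(JF(x), -r)            # Newton direction
        alpha = 1.0
        # Backtracking: halve alpha until the residual norm decreases.
        while np.linalg.norm(F(x + alpha * d)) >= np.linalg.norm(r) and alpha > 1e-8:
            alpha *= 0.5
        x = x + alpha * d
    return x

print(damped_newton(np.array([10.0, -5.0])))      # prints one root of the system
```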

3.1.1 Semismooth Newton method

If F : R n → R n is not differentiable, which is often true in the nonlinear case, one can still apply a Newton type method by constructing a generalized Jacobian ∂F (x) of F (x).

Algorithm 2 Newton’s method for semismooth systems

1: k := 0

2: while stopping criteria is not satisfied do

3: solve: G(x k )d k = −F (x k ) for d k , (G(x k ) is an arbitrary element of ∂F (x k ))

4: set x k+1 = x k + d k , k = k + 1

5: end while

If F is only locally Lipschitz continuous one can expect at most linear convergence. With the property of semismoothness of F (x) at x, the generalized Newton method is shown to be well-defined and super-linearly convergent. Semismoothness can be defined by the following equivalent statements [11],

1. F is semismooth at x.

2. F is locally Lipschitz continuous at x, F 0 (x; ·) exists, and for any G ∈ ∂F (x + d), kGd − F 0 (x, d)k = o(kdk) as d → 0.

3. F is locally Lipschitz continuous at x, F 0 (x; ·) exists, and for any G ∈ ∂F (x + d),

kF (x + d) − F (x) − Gdk = o(kdk) as d → 0.

To solve the system $G(x_k)d_k = -F(x_k)$ exactly can be an expensive operation. Instead, one can use an approximate method to solve the system, e.g. a Krylov subspace iteration method. This is referred to as an inexact semismooth Newton method.

3.1.2 Primal-dual active set method

Optimization problems with inequality constraints give rise to nonlinear systems. The primal-dual active set method is a way to treat the inequality constraints and is equivalent to a semismooth Newton method under certain conditions [13]. For the problem in (2.18) it has been shown that the functions $\max\{0, y - \overline y\}$ and $\min\{0, y - \underline y\}$ have semismooth generalized derivatives [11].

The primal-dual active set method can be described as follows: first calculate the active set of points where the inequality constraint is violated, then solve the linear system that corresponds to the equality constraints, and update the active set. This is repeated until the active set does not change between iterations, which is the convergence criterion used in, e.g., [5] and [14].

Algorithm 3 Primal-dual active set method

1: Initialize solution

2: for k = [1, max iterations] do

3: update active set and linear system

4: if active set did not change and k ≠ 1 then

5: Solver converged: break

6: end if

7: solve: linear system

8: end for
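A schematic Python version of the loop in Algorithm 3 could look as follows; the assumed callback assemble_and_solve stands for the assembly and solution of the linear system (2.22)/(2.24) for the current active sets, which is problem specific and therefore left abstract here.

```python
import numpy as np

def primal_dual_active_set(assemble_and_solve, y0, y_lower, y_upper, max_it=30):
    """Primal-dual active set loop: update active sets, solve, repeat until the sets freeze.

    assemble_and_solve(active_plus, active_minus) is an assumed user-supplied callback that
    builds and solves the linear system for the given active sets and returns the new state
    iterate y.
    """
    y = y0.copy()
    active_plus = np.zeros_like(y, dtype=bool)
    active_minus = np.zeros_like(y, dtype=bool)
    for k in range(1, max_it + 1):
        new_plus = y > y_upper              # points violating the upper bound
        new_minus = y < y_lower             # points violating the lower bound
        if k > 1 and np.array_equal(new_plus, active_plus) \
                 and np.array_equal(new_minus, active_minus):
            return y, k                     # active sets unchanged: converged
        active_plus, active_minus = new_plus, new_minus
        y = assemble_and_solve(active_plus, active_minus)
    return y, max_it
```

In the actual solver the callback would assemble $L$ and $c_A$ from the active sets and call a preconditioned Krylov method, as described in the following section.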

3.2 Krylov subspace iteration methods for linear problems

Some of the most popular methods to solve large and sparse linear systems are the so- called Krylov subspace methods. A detailed description can be found in [15], on which this section is largely based.

Considering a linear system of equations,

$$\mathcal{A}x = b, \qquad (3.6)$$

a Krylov method looks for the solution $x_k$ in a vector space of the form

$$\mathcal{K}_k(\mathcal{A}, c) \equiv \operatorname{span}\{c, \mathcal{A}c, \ldots, \mathcal{A}^{k-1}c\}, \qquad (3.7)$$

built by performing repeated matrix-vector multiplications with $\mathcal{A}$. $\mathcal{K}_k$ is referred to as the search space, and the solution is chosen such that the residual satisfies the Petrov-Galerkin condition $b - \mathcal{A}x_k \perp \mathcal{L}_k$, where $\mathcal{L}_k$ is another subspace, referred to as the space of constraints. The choice of $\mathcal{L}_k$ defines different Krylov subspace methods.

The vector $c = b - \mathcal{A}x_0$ is determined by the initial guess $x_0$. Choosing $x_0 = 0$ gives $c = b$, which is a common choice and allows convergence estimates.

The matrix-vector multiplication $\mathcal{A}c$ can be treated as a black box, i.e., the matrix $\mathcal{A}$ does not need to be explicitly available, since no manipulation of individual matrix elements is required.

For nonsingular matrices $\mathcal{A}$, the solution to (3.6) is guaranteed to lie in a Krylov subspace of dimension $k$ equal to the degree of the minimal polynomial $q(t)$ of $\mathcal{A}$ ($q(\mathcal{A}) = 0$). This means that the exact solution will be produced in at most $k$ iterations. In practice, though, a good enough approximation, determined by some convergence criterion, can normally be found in $j \ll k$ iterations.

For singular matrices there exists a class of right hand sides b, where the solution again lies in a Krylov subspace [15].

Among the most used Krylov subspace methods are:

• Conjugate Gradient (CG) method, which can be applied to systems with SPD matrices ($\mathcal{L}_k = \mathcal{K}_k$).

• Minimal residual (MINRES) method, which can solve systems with symmetric indefinite matrices ($\mathcal{L}_k = \mathcal{A}\mathcal{K}_k$).

• Generalized minimal residual (GMRES) method, which can solve systems with nonsymmetric indefinite matrices ($\mathcal{L}_k = \mathcal{A}\mathcal{K}_k$).

3.2.1 Conjugate Gradient (CG) Method

Algorithm 4 shows the computational procedure of the conjugate gradient method, one of the most often used methods to solve systems with SPD matrices. We see that per iteration CG requires one matrix-vector multiplication, three vector updates and two scalar products. The memory requirements and the arithmetic cost per iteration do not grow.

Algorithm 4 Conjugate Gradient method

1: x_0 = 0, r_0 = b, p_1 = b
2: for k = 1, 2, . . . , until ||r_k||_2 is small enough do
3: z = A p_k
4: ν_k = (r_{k-1}^T r_{k-1})/(p_k^T z)
5: x_k = x_{k-1} + ν_k p_k
6: r_k = r_{k-1} − ν_k z
7: µ_{k+1} = (r_k^T r_k)/(r_{k-1}^T r_{k-1})
8: p_{k+1} = r_k + µ_{k+1} p_k

9: end for
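For reference, a direct transcription of Algorithm 4 into Python is given below (a minimal sketch; the tridiagonal SPD test matrix is only an example).

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_it=1000):
    """Conjugate gradient method for SPD A, following Algorithm 4."""
    x = np.zeros_like(b)
    r = b.copy()                  # r_0 = b - A x_0 with x_0 = 0
    p = b.copy()
    rr = r @ r
    for _ in range(max_it):
        if np.sqrt(rr) < tol:
            break
        z = A @ p
        nu = rr / (p @ z)
        x = x + nu * p
        r = r - nu * z
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Small SPD test problem (1D Laplacian).
n = 100
A = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))
```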

3.2.2 Generalized Minimal Residual (GMRES) Method

In each iteration GMRES finds a solution $x_k$ in the Krylov space $\mathcal{K}_k(\mathcal{A}, b)$ that minimizes the residual by solving the least squares problem

$$\min_{z\in\mathcal{K}_k(\mathcal{A},b)} \|b - \mathcal{A}z\|. \qquad (3.8)$$

The problem (3.8) is solved by creating an orthonormal basis $\{v_1, v_2, \ldots, v_k\}$ for $\mathcal{K}_k(\mathcal{A}, b)$ using Arnoldi's method. The new basis vector is constructed by first orthogonalizing the vector $\mathcal{A}v_j$ against the previous subspace,

$$\hat v_{j+1} = \mathcal{A}v_j - (h_{1j}v_1 + \cdots + h_{jj}v_j), \qquad (3.9)$$

where $h_{ij} = v_i^T\mathcal{A}v_j$, and then normalizing it,

$$v_{j+1} = \hat v_{j+1}/\|\hat v_{j+1}\|. \qquad (3.10)$$

The vectors in the orthonormal basis for $\mathcal{K}_j(\mathcal{A}, b)$ can be collected in a matrix $V_j = (v_1, \ldots, v_j)$ to give

$$\mathcal{A}V_j = V_{j+1}H_j, \qquad (3.11)$$

where $H_j$ is an upper Hessenberg matrix of size $(j+1)\times j$. For (3.8) this gives that if $z\in\mathcal{K}_k(\mathcal{A}, b)$ then $z = V_k y$ for some $y$, hence

$$\mathcal{A}z = \mathcal{A}V_k y = V_{k+1}H_k y, \qquad b = \beta v_1 = \beta V_{k+1}e_1, \qquad (3.12)$$

for $\beta = \|b\|$ and $e_1 = [1, 0, \ldots]^T$. The least squares problem can then be reduced to

$$\min_{z\in\mathcal{K}_k(\mathcal{A},b)} \|b - \mathcal{A}z\| = \min_y \|\beta e_1 - H_k y\|. \qquad (3.13)$$


Algorithm 5 GMRES

1: Initialize: x_0 = 0, v_1 = b/β, V_1 = (v_1)
2: for k = 1, 2, . . . do
3: Orthogonalize: v̂_{k+1} = A v_k − V_k h_k, where h_k = V_k^T A v_k
4: Normalize: v_{k+1} = v̂_{k+1}/||v̂_{k+1}||
5: Update: V_{k+1} = (V_k, v_{k+1}), H_k = [ H_{k-1}  h_k ; 0  ||v̂_{k+1}|| ]
6: Solve the least squares problem min_y ||βe_1 − H_k y|| and call the solution y_k
7: Compute the solution x_k = V_k y_k

8: end for

GMRES could be run until it produces a zero vector, i.e. ˆ v k+1 = 0 which would indicate that the exact solution to (3.6) has been found. In practice though one would set a convergence criterion and terminate when this is reached.

Memory usage grows with each iteration since all the vectors building the subspace need to be stored. Restarting is a way to keep the memory usage bounded but the restarted method may not have the same convergence as full GMRES and may even stall. We will not use restart and instead aim to keep the iterations small enough that memory usage is not an issue.
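The construction in (3.9)-(3.13) and Algorithm 5 translates into the following bare-bones sketch (no restarts, simple breakdown handling, and numpy's least-squares routine for the small Hessenberg problem; all choices are illustrative).

```python
import numpy as np

def gmres(A, b, tol=1e-10, max_it=200):
    """Plain (full) GMRES following Algorithm 5: Arnoldi + small least squares problem."""
    n = b.size
    beta = np.linalg.norm(b)
    V = np.zeros((n, max_it + 1))
    H = np.zeros((max_it + 1, max_it))
    V[:, 0] = b / beta
    for k in range(max_it):
        w = A @ V[:, k]
        for i in range(k + 1):                      # orthogonalize against previous basis vectors
            H[i, k] = V[:, i] @ w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2); e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        if np.linalg.norm(H[:k + 2, :k + 1] @ y - e1) < tol or H[k + 1, k] < 1e-14:
            return V[:, :k + 1] @ y
        V[:, k + 1] = w / H[k + 1, k]
    return V[:, :max_it] @ y

# Nonsymmetric test problem.
rng = np.random.default_rng(3)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = gmres(A, b)
print(np.linalg.norm(A @ x - b))
```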

3.2.3 Preconditioned Krylov subspace methods

To explain the idea of preconditioning we introduce the condition number of a generic matrix A,

$$\kappa(A) = \|A^{-1}\|\cdot\|A\|, \qquad (3.14)$$

where $\|\cdot\|$ represents some norm. If $\kappa(A) \gg 1$, the matrix $A$ is said to be ill-conditioned. This leads to high iteration counts when solving systems with $A$ by an iterative method. This is a major problem of the Krylov subspace methods, compared to direct solvers, since $\kappa(A)$, and hence the efficiency, is highly dependent on the problem at hand. The convergence rate, as a result, can vary with mesh size as well as with equation parameters. The estimated relative error for CG in iteration $k$ is

$$\tau_k = \left(\frac{\sqrt\kappa - 1}{\sqrt\kappa + 1}\right)^k, \qquad (3.15)$$

where one can see that the error decreases faster if $\kappa$ is close to one.

The idea of preconditioning is to transform the system

Ax = b (3.16)

to a system with the same solution but with a matrix that has better conditioning. As an additional requirement, we want to do this in a way that is robust with respect to the various problem and discretization parameters.

Preconditioning can be seen as applying a matrix $P$ to the problem (3.16), where $P$ is constructed such that the preconditioned system $P^{-1}A$ has a condition number close to 1 and $P^{-1}v$ is easy to compute for a given vector $v$. Constructing a preconditioner is often a trade-off between these two requirements. If we choose $P = A$ the system $P^{-1}A = A^{-1}A = I$ is perfectly conditioned, but computing $P^{-1}v = A^{-1}v$ is equivalent to solving the original problem (3.16).

There are several ways to apply preconditioning to a system. If the preconditioner is applied from the left the resulting system looks as follows

P −1 Ax = P −1 b. (3.17)

One can also use right preconditioning, in which case the preconditioned system becomes

AP −1 u = b, x ≡ P −1 u. (3.18)

If a factorization P = P L P R of the preconditioner is available, split preconditioning can be applied,

P L −1 AP R −1 u = P L −1 b, x = P R −1 u, (3.19) where P L and P R are typically triangular matrices.

For systems with symmetric matrices, where the eigenvalues are real, the conditioning is described by the matrix eigenvalue spectrum. For SPD matrices we have the spectral condition number

$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}. \qquad (3.20)$$

For nonsymmetric systems the eigenvalues alone may not be enough to describe the convergence of the iterations. However, if the preconditioned matrix is not too nonsymmetric, clustered eigenvalues often result in fast convergence.

Standard preconditioning methods often perform poorly on saddle point problems due to the lack of diagonal dominance and the indefiniteness. Instead, we usually construct preconditioners based on some knowledge of the specific problem.

When the preconditioner is applied approximately with an iterative method, the resulting preconditioned system changes between iterations. This is called variable preconditioning, and one has to choose an outer solver that can handle it. The flexible GMRES (FGMRES) method is one such solver [16]. Another appropriate choice is the Generalized Conjugate Residual (GCR) method, described, e.g., in [19].

We note that with left preconditioning only the preconditioned residual is available at each step, while with right preconditioning the actual residual norm is available.
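With scipy, for instance, the action of $P^{-1}$ is passed to a Krylov solver as a LinearOperator; the sketch below uses an incomplete LU factorization as an illustrative choice of preconditioner for a small nonsymmetric test matrix.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Nonsymmetric sparse test matrix (1D convection-diffusion stand-in).
n = 200
A = sp.diags([-1.0, 2.2, -1.2], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Preconditioner: an incomplete LU factorization of A (illustrative choice).
ilu = spla.spilu(A)
P_inv = spla.LinearOperator((n, n), matvec=ilu.solve)   # action of P^{-1}

x_plain, info_plain = spla.gmres(A, b)                  # unpreconditioned
x_prec, info_prec = spla.gmres(A, b, M=P_inv)           # preconditioned
print(info_plain, np.linalg.norm(A @ x_plain - b))
print(info_prec, np.linalg.norm(A @ x_prec - b))
```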


3.2.4 Multigrid methods

Some preconditioned Krylov subspace solvers, for instance those using incomplete factorization preconditioning, may still experience slower convergence when the mesh is refined. Multigrid methods can theoretically reach convergence rates that are independent of the mesh size.

The idea behind multigrid methods is to move between coarse and fine grids while doing only a few iterations on each grid. The reason is that relaxation-type iterative methods, like the Jacobi and Gauss-Seidel methods, remove high-frequency error components in only a few iterations, while low-frequency components can take very long to converge. By moving to a coarse grid, some of the low-frequency components on a fine grid behave like high-frequency components and can thus be removed with only a few iterations. To move from a fine to a coarse grid a restriction operator is used; this can, e.g., be an injection operator, which for a coarser grid with twice the grid spacing can be defined as $v_i^{2h} = v_{2i}^h$. To move from coarse to fine grids a prolongation operator is used; this can be a simple linear interpolation operator. On each grid, smoothing is performed with a few iterations of an iterative method to remove the high-frequency components of the error. The smoother can be, e.g., Jacobi, Richardson or Gauss-Seidel iteration.

Moving between grid levels can be performed in different ways as shown in Figure 3.1.

The V-cycle starts at the finest grid level and moves to the coarsest and then back up to the finest. A W-cycle similarly starts at the finest level and proceeds to the coarsest, it can then repeatedly go between the coarser grid levels before returning to the finest grid.

Full multigrid starts at the coarsest grid to create an initial guess for the next finer grid, then a V-cycle is performed on this grid creating an initial guess for the next level, this is repeated until a full V-cycle is performed on the finest grid level.

[Figure 3.1: Example of how the V-cycle, W-cycle and full multigrid move between grid levels (levels h, 2h, 4h, 8h).]
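To make the grid transfer and smoothing steps explicit, here is a minimal two-grid cycle for a 1D Poisson problem (an illustrative toy with weighted Jacobi smoothing, linear interpolation and full-weighting restriction; it is not the AMG preconditioner used later).

```python
import numpy as np

def poisson_1d(n):
    """1D Poisson matrix with n interior points."""
    return (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
            + np.diag(-np.ones(n - 1), -1))

def jacobi(A, x, b, sweeps=3, omega=2.0 / 3.0):
    # Weighted Jacobi smoothing.
    D = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / D
    return x

def two_grid(A, b, x, n):
    """One two-grid cycle: pre-smooth, coarse-grid correction, post-smooth."""
    x = jacobi(A, x, b)
    r = b - A @ x
    nc = (n - 1) // 2
    # Linear-interpolation prolongation P and full-weighting restriction R.
    P = np.zeros((n, nc))
    for j in range(nc):
        i = 2 * j + 1
        P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
    R = 0.5 * P.T
    Ac = R @ A @ P
    x = x + P @ np.linalg.solve(Ac, R @ r)      # exact solve on the coarse grid
    return jacobi(A, x, b)

n = 63                                          # chosen so the coarse grid nests
A = poisson_1d(n)
b = np.ones(n)
x = np.zeros(n)
for _ in range(10):
    x = two_grid(A, b, x, n)
print(np.linalg.norm(b - A @ x))
```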

Algebraic multigrid (AMG) is a generalization of geometric multigrid ideas to the situation when the problem is given only as a system of equations $Ax = b$ and there is no grid hierarchy readily available. One then uses the nonzero matrix entries $A_{i,j}$ to determine which components are connected, and constructs coarse level matrices, prolongation and restriction operators in an algebraic way.

Multigrid methods can in practice be used as solvers, but most commonly act as preconditioners for, e.g., Krylov subspace methods.

Multigrid works best on symmetric matrices. For nonsymmetric matrices, the M-matrix property is sufficient for convergence [17]. Mifune et al. showed in [18] that AMG outperformed incomplete LU (ILU) factorization as a preconditioner for systems with nonsymmetric matrices arising in electromagnetic finite element analyses.

The interested reader can find a detailed description of the multigrid method in, e.g., [19].


Chapter 4

Preconditioners for PDE-constrained optimization problems with state constraints

4.1 State constrained OPT-PDE with Moreau-Yosida penalty function

4.1.1 A nonstandard inner product preconditioner

Here the full system from (2.22) is considered,

$$\mathcal{K}\begin{bmatrix} y \\ u \\ \lambda\end{bmatrix} = \begin{bmatrix} L & 0 & -K^T \\ 0 & \beta M & M \\ -K & M & 0\end{bmatrix}\begin{bmatrix} y \\ u \\ \lambda\end{bmatrix} = \begin{bmatrix} c_A \\ 0 \\ d\end{bmatrix}. \qquad (4.1)$$

In [5], Wathen et al. suggest a block triangular preconditioner of the form

$$P = \begin{bmatrix} A_0 & 0 & 0 \\ 0 & A_1 & 0 \\ -K & M & -S_0\end{bmatrix}, \qquad (4.2)$$

where $A_0$, $A_1$ and $S_0$ approximate the (1,1) and (2,2) system matrix blocks and the Schur complement, respectively. Since a lumped mass matrix is used, $A_0 = L$ and $A_1 = \beta M$ can be used without approximation. The Schur complement is approximated by

$$\widehat S = (K + \widehat M)L^{-1}(K + \widehat M), \qquad (4.3)$$

where the matrix

$$L = \begin{bmatrix} M_I & 0 \\ 0 & (1+\varepsilon^{-1})M_A\end{bmatrix} \qquad (4.4)$$

is split such that $M_I$ corresponds to the free variables and $M_A$ to the active set of variables.

The matrix $\widehat M$ in the Schur complement approximation is structured similarly,

$$\widehat M = \begin{bmatrix} \alpha M_I & 0 \\ 0 & \gamma M_A\end{bmatrix}, \qquad (4.5)$$

and it is shown that choosing the parameters

$$\alpha = \frac{1}{\sqrt\beta} \quad\text{and}\quad \gamma = \frac{\sqrt{1+\varepsilon^{-1}}}{\sqrt\beta} \qquad (4.6)$$

gives eigenvalue bounds $\lambda \in \left[\frac{1}{2}, 1\right]$ for the matrix $\widehat S^{-1}S$, which determine the eigenvalues of the preconditioned matrix $P^{-1}\mathcal{K}$. However, the preconditioner $P$ is not fully robust with respect to the problem parameters; see, e.g., Table V in [5].
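Applying the block lower-triangular preconditioner (4.2) amounts to three block solves in sequence; the sketch below spells out that order of operations with dense stand-in blocks and a randomly chosen active set, so it is purely illustrative of the structure.

```python
import numpy as np

rng = np.random.default_rng(4)
n, h, beta, eps = 30, 1.0 / 31, 1e-2, 1e-4

M = h * np.eye(n)                                        # lumped mass matrix (stand-in)
K = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1)) / h                 # stiffness matrix (stand-in)
active = rng.random(n) < 0.3
D0 = np.diag(active * np.diag(M))                        # active-set part of M
L = M + D0 / eps

# Schur complement approximation S = (K + M_hat) L^{-1} (K + M_hat), eqs. (4.3)-(4.6).
alpha, gamma = 1.0 / np.sqrt(beta), np.sqrt(1.0 + 1.0 / eps) / np.sqrt(beta)
M_hat = np.diag(np.where(active, gamma, alpha) * np.diag(M))
S = (K + M_hat) @ np.linalg.solve(L, K + M_hat)

def apply_P_inv(r):
    """Forward substitution with the block triangular P of (4.2): A0 = L, A1 = beta*M, S0 = S."""
    r1, r2, r3 = np.split(r, 3)
    z1 = np.linalg.solve(L, r1)
    z2 = np.linalg.solve(beta * M, r2)
    z3 = np.linalg.solve(S, M @ z2 - K @ z1 - r3)
    return np.concatenate([z1, z2, z3])

print(apply_P_inv(np.ones(3 * n))[:5])
```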

4.1.2 Structure utilizing preconditioners

Below we discuss some preconditioners that utilize the fact that the arising matrices consist of square blocks.

Here we first consider the reduced system in (2.25),

$$\mathcal{A}_I\begin{bmatrix} y \\ \tilde u\end{bmatrix} = \begin{bmatrix} L & -\beta K^T \\ K & M\end{bmatrix}\begin{bmatrix} y \\ \tilde u\end{bmatrix} = \begin{bmatrix} c_A \\ -d\end{bmatrix} \qquad (4.7)$$

and the transformation (2.32) that leads to the matrix

$$\mathcal{A}_{III} = \begin{bmatrix} M & -\widetilde K^T \\ \widetilde K & M\end{bmatrix}. \qquad (4.8)$$

We know from earlier results that, under certain conditions,

$$\begin{bmatrix} A & -\beta B_2 \\ \alpha B_1 & A\end{bmatrix} \qquad (4.9)$$

can be very efficiently preconditioned by

$$\begin{bmatrix} A & -\beta B_2 \\ \alpha B_1 & A + \sqrt{\alpha\beta}(B_1 + B_2)\end{bmatrix} \qquad (4.10)$$

and all eigenvalues of the preconditioned matrix

$$\begin{bmatrix} A & -\beta B_2 \\ \alpha B_1 & A + \sqrt{\alpha\beta}(B_1 + B_2)\end{bmatrix}^{-1}\begin{bmatrix} A & -\beta B_2 \\ \alpha B_1 & A\end{bmatrix} \qquad (4.11)$$

are real and lie in the interval $\left[\frac{1}{2}, 1\right]$. The conditions for the latter result to hold require $A$ to be SPD and $B_1 + B_2$ to be SPSD [23]. The matrix $\mathcal{A}_{III}$ is of the form (4.9) and the preconditioner is

$$P_{III} = \begin{bmatrix} M & -\widetilde K^T \\ \widetilde K & M + (\widetilde K + \widetilde K^T)\end{bmatrix}. \qquad (4.12)$$

However, $\widetilde K + \widetilde K^T$ is not positive definite, and this will affect both the eigenvalues of the preconditioned system and the ease of solving a system with the preconditioner.

We now show that the lower bound of the spectrum of $P_{III}^{-1}\mathcal{A}_{III}$ is preserved. To simplify the derivation, we first apply a diagonal scaling to $\mathcal{A}_{III}$ to make its diagonal blocks identity matrices.

Consider the matrix

$$\mathcal{A}'_{III} = \begin{bmatrix} I & -B^T \\ B & I\end{bmatrix}, \qquad (4.13)$$

where $I$ is the identity matrix and $B$ has full rank; however, $B \ne B^T$ and $B + B^T$ is not positive definite. Consider a preconditioner to $\mathcal{A}'_{III}$ of the form

$$P'_{III} = \begin{bmatrix} I & -B^T \\ B & I + B + B^T\end{bmatrix} \qquad (4.14)$$

and analyze the generalized eigenvalue problem $\lambda P'_{III}v = \mathcal{A}'_{III}v$. We have

$$\lambda\begin{bmatrix} I & -B^T \\ B & I + B + B^T\end{bmatrix}\begin{bmatrix} v \\ w\end{bmatrix} = \begin{bmatrix} I & -B^T \\ B & I\end{bmatrix}\begin{bmatrix} v \\ w\end{bmatrix}. \qquad (4.15)$$

After some algebraic transformations we obtain

$$\begin{bmatrix} 0 & 0 \\ 0 & B + B^T\end{bmatrix}\begin{bmatrix} v \\ w\end{bmatrix} = \left(\frac{1}{\lambda} - 1\right)\begin{bmatrix} I & -B^T \\ B & I\end{bmatrix}\begin{bmatrix} v \\ w\end{bmatrix}. \qquad (4.16)$$

As $\mathcal{A}'_{III}$ is nonsingular, so is $P'_{III}$, and $\lambda$ is not equal to zero. Below we follow the logic of the derivations in [23].

We see from (4.16) that for vectors $\begin{bmatrix} v \\ 0\end{bmatrix}$, $v \ne 0$, we have $\left(\frac{1}{\lambda} - 1\right)v = 0$; thus, $\lambda = 1$ with multiplicity $n$, where $n$ is the size of the blocks.

If $\lambda \ne 1$, then $v = B^T w$ for $w \ne 0$. Again from (4.16) we see that the following equality must hold,

$$w^T(B + B^T)w = \left(\frac{1}{\lambda} - 1\right)w^T(Bv + w) = \left(\frac{1}{\lambda} - 1\right)w^T(I + BB^T)w. \qquad (4.17)$$

Using the relations in (2.40) and (2.41) we have

$$0 \le w^T(I - B)(I - B^T)w = w^T(I + BB^T)w - w^T(B + B^T)w. \qquad (4.18)$$

The left inequality is true because the matrix $(I - B)(I - B^T)$ is SPSD. Thus,

$$w^T(B + B^T)w \le w^T(I + BB^T)w \qquad (4.19)$$

for any $w \ne 0$. Using the latter relation in (4.17) we obtain

$$\left(\frac{1}{\lambda} - 1\right)w^T(I + BB^T)w \le w^T(I + BB^T)w. \qquad (4.20)$$

Thus, $\frac{1}{\lambda} - 1 \le 1$ and $\lambda \ge \frac{1}{2}$.

In Figure 4.1 we see that the lower bound of the spectrum is preserved, while the maximum eigenvalue depends on the parameters $\beta$ and $\varepsilon$, increases when the mesh is refined, and may become very large. Most eigenvalues are still clustered in the interval $\left[\frac{1}{2}, 1\right]$.
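The bound $\lambda \ge \frac{1}{2}$ can also be probed numerically; the sketch below builds a random full-rank $B$ (an illustrative choice for which $B + B^T$ is generally indefinite) and checks the spectrum of $P_{III}'^{-1}\mathcal{A}'_{III}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
I = np.eye(n)
B = rng.standard_normal((n, n))          # full rank, B != B^T, B + B^T indefinite in general

A_prime = np.block([[I, -B.T], [B, I]])
P_prime = np.block([[I, -B.T], [B, I + B + B.T]])

lam = np.linalg.eigvals(np.linalg.solve(P_prime, A_prime))
print("min real part:", lam.real.min())   # bounded below by 1/2
print("max real part:", lam.real.max())   # may grow large, cf. Figure 4.1
```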

[Figure 4.1: Eigenvalues of the preconditioned matrix $P_{III}^{-1}\mathcal{A}_{III}$ for (a) $\beta=10^{-2}$, $\varepsilon=10^{-6}$, $h=2^{-4}$; (b) $\beta=10^{-6}$, $\varepsilon=10^{-2}$, $h=2^{-4}$; (c) $\beta=10^{-6}$, $\varepsilon=10^{-6}$, $h=2^{-4}$; (d) $\beta=10^{-2}$, $\varepsilon=10^{-6}$, $h=2^{-5}$; (e) $\beta=10^{-6}$, $\varepsilon=10^{-2}$, $h=2^{-5}$; (f) $\beta=10^{-6}$, $\varepsilon=10^{-6}$, $h=2^{-5}$.]

For the system (4.7), we now consider the preconditioner

$$\mathcal{B}_I = \begin{bmatrix} L & -\beta K^T \\ K & L\end{bmatrix}. \qquad (4.21)$$

We analyze the generalized eigenvalue problem

$$\mathcal{A}_I x = \lambda\mathcal{B}_I x \qquad (4.22)$$

or, equivalently,

$$\frac{1}{\lambda}\mathcal{A}_I x = \mathcal{B}_I x. \qquad (4.23)$$

To ease the analysis, we modify (4.22) as follows,

$$(\mathcal{A}_I - \mathcal{B}_I)x = (\lambda - 1)\mathcal{B}_I x. \qquad (4.24)$$

In detail,

$$\begin{bmatrix} 0 & 0 \\ 0 & -\varepsilon^{-1}D\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = (\lambda - 1)\begin{bmatrix} L & -\beta K^T \\ K & L\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}, \qquad (4.25)$$

with $D = M_A$ being the diagonal matrix with nonzero elements corresponding to the active set. Then the following relations hold,

$$0 = (\lambda - 1)(Lx_1 - \beta K^T x_2), \qquad (4.26)$$
$$-\varepsilon^{-1}Dx_2 = (\lambda - 1)(Kx_1 + Lx_2). \qquad (4.27)$$

From (4.26) we see that for all vectors $\hat x$ such that $L\hat x_1 - \beta K^T\hat x_2 \ne 0$ we have $\lambda = 1$. As $L$ is SPD and $K$ has full rank, there are $n$ such vectors and $\lambda = 1$ has multiplicity $n$.

If $\lambda \ne 1$, then $Lx_1 = \beta K^T x_2$, or $x_1 = \beta L^{-1}K^T x_2$. We substitute the latter in (4.27) and obtain

$$-\varepsilon^{-1}Dx_2 = (\lambda - 1)(\beta KL^{-1}K^T + L)x_2 \qquad (4.28)$$

or, equivalently,

$$\varepsilon^{-1}Dx_2 = (1 - \lambda)(L + \beta KL^{-1}K^T)x_2. \qquad (4.29)$$

As an immediate result from (4.29) we see that $0 < \lambda < 1$, due to the fact that $\varepsilon > 0$, $D$ is SPSD and $L + \beta KL^{-1}K^T$ is SPD.

We use the corresponding Rayleigh quotient to estimate $1 - \lambda$,

$$\min\frac{\varepsilon^{-1}x_2^T D x_2}{x_2^T(L + \beta KL^{-1}K^T)x_2} \le 1 - \lambda \le \max\frac{\varepsilon^{-1}x_2^T D x_2}{x_2^T(L + \beta KL^{-1}K^T)x_2}. \qquad (4.30)$$

The left part of (4.30) does not bring any new insight, as $\min x_2^T D x_2 = 0$. From the right part of the inequality we obtain

$$\max\frac{\varepsilon^{-1}x_2^T D x_2}{x_2^T(L + \beta KL^{-1}K^T)x_2} \le \frac{\varepsilon^{-1}h^2}{\varepsilon^{-1}h^2 + \beta h^2(\varepsilon^{-1} + h^2)^{-1}h^2} = \frac{\varepsilon^{-1}h^2}{\varepsilon^{-1}h^2 + \beta h^2\left(\frac{1+\varepsilon h^2}{\varepsilon}\right)^{-1}h^2} = \frac{1 + \varepsilon h^2}{1 + \varepsilon h^2 + \varepsilon^2 h^2\beta} = 1 - \frac{\varepsilon^2 h^2\beta}{1 + \varepsilon h^2 + \varepsilon^2 h^2\beta}. \qquad (4.31)$$

Combining (4.31) with (4.30) we get the eigenvalue bounds

$$\frac{\varepsilon^2 h^2\beta}{1 + \varepsilon h^2 + \varepsilon^2 h^2\beta} \le \lambda \le 1 \qquad (4.32)$$

for the preconditioned system $\mathcal{B}_I^{-1}\mathcal{A}_I$.

We see from (4.32) that $\lambda$ can become very small and act numerically as zero; thus, the preconditioned system becomes numerically singular. We also note that the small eigenvalues become tightly clustered, as shown in Figure 4.2.

If we precondition $\mathcal{A}_I$ in (4.7) by

$$P_I = \begin{bmatrix} L & -\beta K^T \\ K & L + \sqrt\beta(K + K^T)\end{bmatrix}, \qquad (4.33)$$

the effect of this is as follows,

$$\frac{1}{2}\,\frac{\varepsilon^2 h^2\beta}{1 + \varepsilon h^2 + \varepsilon^2 h^2\beta} \le \lambda(P_I^{-1}\mathcal{A}_I) = \lambda(P_I^{-1}\mathcal{B}_I\,\mathcal{B}_I^{-1}\mathcal{A}_I) \le 1. \qquad (4.34)$$

Thus, roughly twice as many iterations could be expected. The eigenvalues are shown in Figure 4.3.

[Figure 4.2: Eigenvalues of the preconditioned matrix $\mathcal{B}_I^{-1}\mathcal{A}_I$ for (a) $\beta=10^{-2}$, $\varepsilon=10^{-6}$, $h=2^{-4}$; (b) $\beta=10^{-6}$, $\varepsilon=10^{-2}$, $h=2^{-4}$; (c) $\beta=10^{-6}$, $\varepsilon=10^{-6}$, $h=2^{-4}$; (d) $\beta=10^{-2}$, $\varepsilon=10^{-6}$, $h=2^{-5}$; (e) $\beta=10^{-6}$, $\varepsilon=10^{-2}$, $h=2^{-5}$; (f) $\beta=10^{-6}$, $\varepsilon=10^{-6}$, $h=2^{-5}$.]

[Figure 4.3: Eigenvalues of the preconditioned matrix $P_I^{-1}\mathcal{A}_I$ for (a) $\beta=10^{-2}$, $\varepsilon=10^{-6}$, $h=2^{-4}$; (b) $\beta=10^{-6}$, $\varepsilon=10^{-2}$, $h=2^{-4}$; (c) $\beta=10^{-6}$, $\varepsilon=10^{-6}$, $h=2^{-4}$; (d) $\beta=10^{-2}$, $\varepsilon=10^{-6}$, $h=2^{-5}$; (e) $\beta=10^{-6}$, $\varepsilon=10^{-2}$, $h=2^{-5}$; (f) $\beta=10^{-6}$, $\varepsilon=10^{-6}$, $h=2^{-5}$.]

We next consider the reduced system of the form (2.26), with the matrix

$$\mathcal{A}_{II} = \begin{bmatrix} L & -\sqrt\beta K^T \\ \sqrt\beta K & M\end{bmatrix} \qquad (4.35)$$

and a preconditioner

$$P_{II} = \begin{bmatrix} M + \sigma D + \sqrt\beta(K^T + K) & -\sqrt\beta K^T \\ \sqrt\beta K & M + \sigma D\end{bmatrix}, \qquad (4.36)$$

where $\sigma$ is a parameter and $D = \varepsilon^{-1}M_A$. We analyze the generalized eigenvalue problem

$$\begin{bmatrix} L & -\sqrt\beta K^T \\ \sqrt\beta K & M\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \lambda\begin{bmatrix} M + \sigma D + \sqrt\beta(K^T + K) & -\sqrt\beta K^T \\ \sqrt\beta K & M + \sigma D\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} \qquad (4.37)$$

and note the relation

$$M + \sigma D = M + D - D + \sigma D = L - (1 - \sigma)D. \qquad (4.38)$$

By dividing (4.37) by $\lambda$ and subtracting the left-hand side we get

$$\left(\frac{1}{\lambda} - 1\right)\begin{bmatrix} L & -\sqrt\beta K^T \\ \sqrt\beta K & M\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} -(1-\sigma)D + \sqrt\beta(K^T + K) & 0 \\ 0 & \sigma D\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}. \qquad (4.39)$$
