
Doctoral Thesis in Applied and Computational Mathematics

Approaches to accelerate methods for solving systems of equations arising in nonlinear optimization

DAVID EK

Stockholm, Sweden 2020 www.kth.se

ISBN 978-91-7873-713-0 TRITA-SCI-FOU 2020:37

kth royal institute of technology


Approaches to accelerate methods for solving systems of equations arising in nonlinear optimization

DAVID EK

Doctoral Thesis in Applied and Computational Mathematics
KTH Royal Institute of Technology

Stockholm, Sweden 2020

Academic Dissertation which, with due permission of the KTH Royal Institute of Technology,

is submitted for public defence for the Degree of Doctor of Philosophy on Friday the 18th of

December 2020, at 10:00 a.m. in U1, KTH Royal Institute of Technology, Brinellvägen 28A,

Stockholm, Sweden.


© David Ek

Cover page photo: Photographer name
ISBN 978-91-7873-713-0

TRITA-SCI-FOU 2020:37

Printed by: Universitetsservice US-AB, Sweden 2020


Abstract

Methods for solving nonlinear optimization problems typically involve solving systems of equations. This thesis concerns approaches for accelerating some of those methods. In our setting, accelerating involves finding a trade-off between the computational cost of an iteration and the quality of the computed search direction. We design approaches for which theoretical results in ideal settings are derived. We also investigate the practical performance of the approaches within and beyond the boundaries of the theoretical frameworks with numerical simulations.

Paper A concerns methods for solving strictly convex unconstrained quadratic optimization problems. This is equivalent to solving systems of linear equations where the matrices are symmetric positive definite. The main focus is exact linesearch limited-memory quasi-Newton methods which generate search directions parallel to those of the method of preconditioned conjugate gradients. We give a class of limited-memory quasi-Newton methods. In addition, we provide a dynamic framework for the construction of these methods.

The methods are meant to be particularly useful for solving sequences of related systems of linear equations. Such sequences typically arise as methods for solving systems of nonlinear equations converge.

Paper B deals with solving systems of nonlinear equations that arise in interior-point methods for bound-constrained nonlinear programming. Application of Newton's method gives sequences of systems of linear equations. We propose partial and full approximate solutions to the Newton systems. The partial approximate solutions are computationally inexpensive, whereas each full approximate solution typically requires a reduced Newton system to be solved. In addition, we suggest two Newton-like approaches, which are based upon the proposed partial approximate solutions, for solving the systems of nonlinear equations.

Paper C is focused on interior-point methods for quadratic programming.

We propose a structured modified Newton approach to solve the systems of nonlinear equations that arise. The modified Jacobians are composed of a previous Jacobian, plus one low-rank update matrix per succeeding iteration.

For a given rank restriction, we construct a low-rank update matrix such that the modified Jacobian becomes closest to the current Jacobian, in both 2-norm and Frobenius norm. The approach is structured in the sense that it preserves the nonzero pattern of the Jacobian.

The approaches suggested in Paper B and Paper C are motivated by asymptotic results in ideal theoretical frameworks. In particular, it is shown that the approaches become increasingly accurate as primal-dual interior-point methods converge. A demonstration of the practical performance is given by numerical results. The results indicate the performance and limitations of the approaches suggested.

We envisage that the approaches of Paper A, Paper B and Paper C can be useful in parallel, or in combination, with existing solvers in order to accelerate methods.

Paper D is more pedagogical in nature. We give a derivation of the method of conjugate gradients in an optimization framework. The result itself is well known but the derivation has, to the best of our knowledge, not been presented before.

Keywords: Nonlinear optimization, mathematical programming, interior-point methods, approximate solutions to systems of linear equations, method of conjugate gradients, quasi-Newton methods, modified Newton methods.


Sammanfattning (Summary in Swedish)

Methods for solving nonlinear optimization problems typically involve solving systems of equations. This thesis concerns approaches for accelerating some of these methods. In our framework, acceleration means finding a trade-off between the computational cost of an iteration and the quality of the computed search direction. We design approaches for which theoretical results in ideal settings are derived. We also investigate the practical performance of the approaches, within and beyond the boundaries of the theoretical settings, with numerical simulations.

Paper A treats methods for solving strictly convex unconstrained quadratic optimization problems. This corresponds to solving systems of linear equations where the matrices are symmetric and positive definite. The main focus is quasi-Newton methods, with exact linesearch and limited memory, that generate search directions parallel to those of the method of conjugate gradients. We give a class of limited-memory quasi-Newton methods. In addition, we propose a dynamic framework for constructing these methods. The methods are intended to be particularly useful for solving sequences of related systems of linear equations. Such sequences typically occur as methods for solving systems of nonlinear equations converge.

Paper B deals with solving the systems of nonlinear equations that arise in interior-point methods for bound-constrained nonlinear optimization. Application of Newton's method gives sequences of systems of linear equations. We propose partial and full approximate solutions to the Newton systems. The partial approximate solutions are computationally inexpensive, whereas each full approximate solution typically requires a reduced Newton system to be solved. In addition, we propose two Newton-like approaches, based on the proposed partial approximate solutions, for solving the systems of nonlinear equations.

Paper C concerns interior-point methods for quadratic optimization with linear inequality constraints. We propose a structured modified Newton approach to solve the systems of nonlinear equations that arise. The modified Jacobians are composed of a previous Jacobian plus one low-rank update matrix per succeeding iteration. For a given rank restriction, we construct a low-rank update matrix such that the modified Jacobian becomes closest to the current Jacobian, in both 2-norm and Frobenius norm. The approach is structured in the sense that it preserves the sparsity pattern of the Jacobian.

The approaches proposed in Paper B and Paper C are motivated by asymptotic results in ideal theoretical settings. In particular, it is shown that the approaches become increasingly accurate as primal-dual interior-point methods converge. An investigation of their practical performance is given by numerical results. The results illustrate the practical performance of the proposed approaches as well as their limitations.

We envisage that the approaches of Paper A, Paper B and Paper C can be useful in parallel, or in combination, with existing solvers in order to accelerate methods.

Paper D is more pedagogical in nature. We give a derivation of the method of conjugate gradients in an optimization framework. The result itself is well known but the derivation has, to our knowledge, not been presented before.

Keywords: Nonlinear optimization, mathematical programming, interior-point methods, approximate solutions to systems of linear equations, method of conjugate gradients, quasi-Newton methods, modified Newton methods.


Acknowledgements

First and foremost, I would like to give my sincerest thanks to my advisor Anders Forsgren for granting me this opportunity. Thanks for sharing your knowledge, all the encouragement, guidance and support. Your enthusiasm, positive spirit and sense of humor have made this experience not only a privilege but also a pleasure. I am truly grateful for everything that I have learned from you, both academically and on a personal level, and for all the freedom that you have granted me throughout the process.

Also, many thanks to the other members of the faculty at the division, Krister Svanberg, Xiaoming Hu, Per Enqvist and Johan Karlsson, for sharing your knowledge and providing a friendly atmosphere which have made my time at KTH more pleasant.

I also want to thank my friends and colleagues at KTH for the stimulating and friendly atmosphere. In particular, and in no particular order, thanks to Julian, Thomas, Parikshit, Samuel, Gerard and Simon for the sports and other activities outside work, which helped make this time extra enjoyable. I hope this continues.

Last but not least, thanks to all my friends and family for contributing to a lively and active free time. I am particularly grateful to John, Erik, Ani and my parents Anna-Maria and Roger, for all the unconditional support and encouragement.

Finally, I would also like to acknowledge and thank the Swedish Research Council (VR) which partly financed the research included in this thesis.

Stockholm, November 2020 David Ek


Contents

Abstract

Acknowledgements

Contents

Part I: Introduction and Summary

1 Introduction
  1.1 Unconstrained nonlinear programming
    1.1.1 Linesearch methods
    1.1.2 Quadratic problems
  1.2 Constrained nonlinear programming
    1.2.1 Primal-dual interior-point methods
      Bound-constrained nonlinear programming
      Quadratic programming
2 Thesis summary and contributions
  2.1 Summary of appended papers
  2.2 Main contribution
References

Part II: Scientific Papers

Paper A

Exact linesearch limited-memory quasi-Newton methods for minimizing a quadratic function

(joint with Anders Forsgren)

Preprint: https://arxiv.org/abs/1809.10590

(submitted to Computational Optimization and Applications)


Paper B

Approximate solution of system of equations arising in interior-point methods for bound-constrained optimization

(joint with Anders Forsgren)

Preprint: https://arxiv.org/abs/2004.04057

(submitted to Computational Optimization and Applications)

Paper C

A structured modified Newton approach for solving systems of nonlinear equations arising in interior-point methods for quadratic programming

(joint with Anders Forsgren)

Preprint: https://arxiv.org/abs/2009.07913
(submitted to SIAM Journal on Optimization)

Paper D

An optimization derivation of the method of conjugate gradients

(joint with Anders Forsgren)

Preprint: http://arxiv.org/abs/2011.02337
(submitted to SIAM Review)


1 Introduction

Terminology associated with optimization tends to be ambiguous among the general public. Frequently, it is used in contexts of optimizing an experience, a strategy or some performance, etc. The goal is often to select the best decisions, according to some subjective measure, among some available alternatives through trial and error, or simply to make qualitative decisions which improve some previous state.

In these cases, optimization may be without regard to optimality as it may not take into account all feasible alternatives or may not include a well-defined measure of the outcome. In contrast, this thesis concerns mathematical optimization, which typically includes selecting the elements from a feasible set of alternatives, which are the best with respect to a certain well-defined measure.

The history of mathematical optimization is rich and intertwined with many subfields emerging and evolving over time. Optimization as a field began to emerge as mathematical programming gained traction in the 1940’s, although its origin may be traced back further to work on continuous functions using calculus. As mathematical programming expanded and developed, it gave rise to a variety of subfields like linear programming, nonlinear programming, integer programming, stochastic programming, and more. During certain periods of time, some of these subfields were considered more independent of each other and evolved in this way as well. As theory and numerical methods developed across the subfields, these grew to be more interconnected and nowadays all of these go under the umbrella of optimization. The problems arise in a great variety of areas ranging from finance to healthcare to physics and more. A general finite real-valued optimization problem can be posed on the form

    minimize_{x∈F}  f(x)    (1.0.1)

where x is a vector of decision variables, F is the feasible set and f : D ⊆ R n → R is

the objective function. Depending on the nature of the objective function f and the

feasible set F these problems may be more or less challenging to solve. As problems

become more complex it quickly becomes nonviable to find analytic solutions and

instead numerical methods are employed. On a high level a numerical method is a

complete set of unambiguous protocols aimed to find the solution of a problem. The

goal in every optimization problem is to find globally optimal solutions, i.e., x* ∈ F such that f(x*) ≤ f(x) for all x ∈ F. However, in general it may be intractable to determine the global properties of a problem. In consequence numerical methods often seek locally optimal solutions, i.e., x* ∈ int(S) ∩ F, for a compact set S, such that f(x*) ≤ f(x) for all x ∈ S ∩ F. A typical tool to establish global properties of a solution is the concept of convexity. Indeed, locally optimal solutions are also globally optimal solutions if the problem is a convex problem, i.e., if F is a convex set and f(x) is a convex function for all x ∈ F [13, Ch. 4].

Notation

Throughout, ∇f and ∇²f denote the gradient and Hessian of a function f, respectively. Moreover, R(M) denotes the range of a matrix M and C² denotes the class of functions which are twice continuously differentiable. The notations "≻ 0" and "⪰ 0" mean positive definite and positive semidefinite, respectively.

1.1 Unconstrained nonlinear programming

A twice continuously differentiable unconstrained optimization problem can be formulated as

    minimize_{x∈Rⁿ}  f(x)    (P)

where f : Rⁿ → R is assumed to be in C². A vector x* ∈ Rⁿ that is a locally optimal solution of (P) satisfies

    ∇f(x*) = 0,    (1.1.1a)
    ∇²f(x*) ⪰ 0.   (1.1.1b)

Moreover, x* is a strict locally optimal solution if it in addition satisfies

    ∇²f(x*) ≻ 0.   (1.1.1c)

The stationarity condition (1.1.1a) constitutes a first-order necessary condition for a locally optimal solution of (P). This condition combined with (1.1.1b) and (1.1.1c) forms second-order necessary conditions and second-order sufficient conditions, respectively. Numerical methods typically aim to find solutions which satisfy the stationarity condition (1.1.1a). For strictly convex problems stationary solutions are also globally optimal solutions.
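As a simple numerical illustration of (1.1.1a)-(1.1.1c) (a sketch with a hypothetical quadratic example, not material from the thesis), the conditions can be checked by evaluating the gradient and the eigenvalues of the Hessian at a candidate point:

```python
import numpy as np

def check_local_optimality(grad, hess, x, tol=1e-8):
    """Check the conditions (1.1.1a)-(1.1.1c) at a candidate point x.

    grad and hess are callables returning the gradient and Hessian of f.
    Returns (stationary, second_order_necessary, second_order_sufficient).
    """
    g = grad(x)
    eigvals = np.linalg.eigvalsh(hess(x))              # Hessian is symmetric
    stationary = np.linalg.norm(g) <= tol              # (1.1.1a)
    necessary = stationary and eigvals.min() >= -tol   # (1.1.1b)
    sufficient = stationary and eigvals.min() > tol    # (1.1.1c)
    return stationary, necessary, sufficient

# Hypothetical example: f(x) = 1/2 x^T H x + c^T x with H positive definite.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([-1.0, 0.5])
x_star = np.linalg.solve(H, -c)                        # stationary point of f
print(check_local_optimality(lambda x: H @ x + c, lambda x: H, x_star))
```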

The focus of this thesis is iterative methods where the aim at each iteration k is to improve a current solution x_k by moving to x_{k+1}. Two fundamental techniques to advance from x_k to x_{k+1} are linesearch methods and trust-region methods. A typical iteration in a trust-region method takes the form x_{k+1} = x_k + d_k, where the size of d_k is restricted by a parameter that depends on k. The step d_k is typically chosen such that x_{k+1} minimizes a local approximation of the objective function, see e.g., [17] [62, Ch. 4] [67, Ch. 6] for further details on trust-region methods.


1.1.1 Linesearch methods

In a linesearch method, the search direction is calculated prior to the step length.

For a thorough introduction to these methods, see e.g., [62, Ch. 3] [67, Ch. 2].

A general iteration k takes the form

    x_{k+1} = x_k + α_k p_k,

where α_k is the step length and p_k is the search direction. The step length is typically determined such that sufficient decrease of some merit function is achieved at x_{k+1}. In general, the search direction is chosen such that p_k is a descent direction, i.e., ∇f(x_k)ᵀp_k < 0. This guarantees the existence of a step such that the objective function is decreased, i.e., the existence of α_k such that f(x_{k+1}) < f(x_k). The ideal strategy is exact linesearch, which at iteration k means choosing α_k as the solution of

    minimize_{α≥0}  f(x_k + αp_k).    (1.1.2)

The exact linesearch strategy involves solving an optimization problem on its own and may, in general, be too computationally expensive. More computationally tractable approaches include finding α_k such that a certain condition is satisfied, e.g., the Armijo, Wolfe or Goldstein conditions [62, Ch. 3].
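As an illustration of such an inexact strategy, the following minimal sketch implements a backtracking linesearch that enforces the Armijo sufficient-decrease condition; the parameter values and the test function are hypothetical defaults, not choices made in the thesis.

```python
import numpy as np

def armijo_backtracking(f, grad, x, p, alpha0=1.0, c1=1e-4, rho=0.5, max_iter=50):
    """Backtrack from alpha0 until f(x + alpha p) <= f(x) + c1 * alpha * grad(x)^T p."""
    fx = f(x)
    slope = grad(x) @ p          # directional derivative; negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= rho             # shrink the step and try again
    return alpha                 # fall back to the last (small) trial step

# Illustrative use on f(x) = x_1^2 + 4 x_2^2 with the steepest-descent direction.
f = lambda x: x[0]**2 + 4.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])
x = np.array([1.0, 1.0])
print(armijo_backtracking(f, grad, x, -grad(x)))
```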

Linesearch methods can be separated into subclasses which are determined by the order of the information that is utilized. The order of information affects the computational cost of an iteration. In the last couple of decades, there has been an increased interest in large-scale problems, partly due to the evolution of machine learning and the increased availability of large quantities of data. This has increased the demand for methods where each iteration is computationally inexpensive. Consequently, a substantial amount of research has been focused on first-order methods in recent decades. The perhaps most basic first-order method is gradient descent, in which the search direction at iteration k is chosen as the negative gradient, i.e., p_k = −∇f(x_k) [18]. Even though this is a descent direction, ∇f(x_k)ᵀp_k = −‖∇f(x_k)‖₂² < 0, it is well known that gradient descent may converge slowly [13, Ch. 9]. Other popular first-order methods are versions of coordinate descent, the alternating direction method of multipliers, stochastic gradient methods, etc. [3, 12]. In this work a particular focus is put on the method of conjugate gradients, in particular its linear version, which is described in detail in Section 1.1.2 below. For a thorough treatment of methods of conjugate gradients and their properties, see e.g., [50, 65] and [49] for a survey on nonlinear versions.

In regard to convergence rates, a general first-order method where each search direction is composed of a linear combination of observed gradients possesses at most a linear convergence rate in the vicinity of a strict locally optimal solution [60, Ch. 2].

The perhaps most well-known second-order method to find zeros of nonlinear equations is Newton's method, see e.g., Ortega and Rheinboldt [63]. At each iteration k, Newton's method finds a zero along a linearization of ∇f(x_k). This means solving

    ∇²f(x_k) p_k = −∇f(x_k).    (1.1.3)


If ∇²f(x_k) is positive definite then (1.1.3) gives the step, in both length and direction, to the minimizer of a local quadratic approximation at x_k. As iterates approach a solution the local quadratic approximations become increasingly accurate and a step length of one becomes natural. If ∇²f(x_k) is positive definite then (1.1.3) also generates descent directions and linesearch may be used as a globalization strategy. Many methods take unit steps when applicable and adjust otherwise, e.g., if an iterate becomes infeasible or does not produce sufficient decrease of a merit function.

In general the matrix of (1.1.3) may have undesirable properties, such as being singular, indefinite or expensive to factorize. Approaches which modify the matrix of (1.1.3) to attain certain properties are called modified Newton methods, see e.g., [62, Ch. 3] for an introduction to these methods. Paper C concerns a structured modified Newton approach where the matrix of (1.1.3) is modified so that the system becomes more computationally tractable to solve.

Newton's method may achieve rapid local convergence. If the Hessian is locally Lipschitz continuous around a strict locally optimal solution then Newton's method attains local quadratic convergence [63]. However, each iteration is, in general, computationally expensive compared to a first-order method as it requires the evaluation of second-order derivatives and the solution of the system (1.1.3).
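A minimal sketch of the resulting scheme is given below, assuming that ∇²f(x_k) is positive definite at every iterate so that (1.1.3) yields a descent direction; the simple halving fallback stands in for the globalization strategies discussed above and is not a method from the appended papers.

```python
import numpy as np

def newton_linesearch(f, grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method with a unit step and a crude backtracking fallback."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        p = np.linalg.solve(hess(x), -g)   # Newton direction from (1.1.3)
        alpha = 1.0                        # natural Newton step length
        while f(x + alpha * p) > f(x) and alpha > 1e-12:
            alpha *= 0.5                   # adjust when the unit step is not acceptable
        x = x + alpha * p
    return x

# Illustrative use: minimize f(x) = exp(x_1) + x_1^2 + x_2^2 (hypothetical example).
f = lambda x: np.exp(x[0]) + x[0]**2 + x[1]**2
grad = lambda x: np.array([np.exp(x[0]) + 2.0 * x[0], 2.0 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]) + 2.0, 0.0], [0.0, 2.0]])
print(newton_linesearch(f, grad, hess, np.array([1.0, 1.0])))
```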

Computationally tractable alternatives to second-order approaches include quasi-Newton methods. These were initially presented as variable metric methods in the 1950s by Davidon [20], and were a few years later characterized by Fletcher and Powell [33]. Quasi-Newton methods are first-order methods which sequentially build an approximation of local curvature as iterations proceed. Each iteration k may be interpreted as finding a zero along an approximate linearization of ∇f(x_k). The search direction is given by

    B_k p_k = −∇f(x_k),    (1.1.4)

where B_k is an approximation of the true Hessian ∇²f(x_k). In comparison to Newton's method, the solution of (1.1.4) is the step to a zero along a linearization of ∇f(x_k) where the curvature information is approximated by B_k. If B_k is positive definite then (1.1.4) also gives the step to the minimizer of a local quadratic approximation at x_k with curvature information given by B_k. In particular, the Hessian approximation is sequentially built up as iterations proceed as B_{k−1} is updated by an update matrix U_k to

    B_k = B_{k−1} + U_k.

The matrix U k is typically of low rank, composed of observed information of at most first-order and chosen such that the iterates satisfy some kind of secant condition.

The iterates of the symmetric two-parameter Huang class satisfy the scaled secant condition

    B_k s_{k−1} = σ_k y_{k−1},

where s_{k−1} = x_k − x_{k−1}, y_{k−1} = ∇f(x_k) − ∇f(x_{k−1}) and σ_k is one of the free parameters [52]. The choice σ_k = 1 gives the well-known Broyden class, where B_{k−1} is updated by a rank-two matrix to

    B_k = B_{k−1} − (1 / (s_{k−1}ᵀ B_{k−1} s_{k−1})) B_{k−1} s_{k−1} s_{k−1}ᵀ B_{k−1} + (1 / (y_{k−1}ᵀ s_{k−1})) y_{k−1} y_{k−1}ᵀ + φ_{k−1} ω_{k−1} ω_{k−1}ᵀ,

with

    ω_{k−1} = (s_{k−1}ᵀ B_{k−1} s_{k−1})^{1/2} ( (1 / (y_{k−1}ᵀ s_{k−1})) y_{k−1} − (1 / (s_{k−1}ᵀ B_{k−1} s_{k−1})) B_{k−1} s_{k−1} ),

and φ_{k−1} as a free parameter [32]. The parameter φ_{k−1} = 0 gives the Broyden-Fletcher-Goldfarb-Shanno (BFGS) scheme and φ_{k−1} = 1 gives the Davidon-Fletcher-Powell (DFP) scheme. Quasi-Newton methods in the Broyden class generate identical iterates under exact linesearch [25]. Under reasonable assumptions certain quasi-Newton schemes attain local superlinear convergence [24, Ch. 8-9] [14, 23] [62, Ch. 6] [31, Ch. 3].
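For concreteness, the Broyden class update can be stated in a few lines of code. The sketch below is illustrative only: it forms B_k from B_{k−1}, s_{k−1}, y_{k−1} and the free parameter φ_{k−1}, with φ = 0 giving the BFGS update and φ = 1 the DFP update.

```python
import numpy as np

def broyden_class_update(B, s, y, phi=0.0):
    """Rank-two Broyden class update of a Hessian approximation B.

    B : current symmetric approximation B_{k-1}
    s : step s_{k-1} = x_k - x_{k-1}
    y : gradient change y_{k-1} = grad f(x_k) - grad f(x_{k-1})
    phi : free parameter; 0.0 gives BFGS, 1.0 gives DFP.
    """
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    w = np.sqrt(sBs) * (y / ys - Bs / sBs)      # the vector omega_{k-1}
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * np.outer(w, w)

# Illustrative use with a quadratic f(x) = 1/2 x^T H x: then y = H s exactly.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.eye(2)
s = np.array([1.0, -0.5])
B_new = broyden_class_update(B, s, H @ s, phi=0.0)
print(B_new @ s, H @ s)   # the updated B satisfies the secant condition B_new s = y
```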

1.1.2 Quadratic problems

We will be particularly interested in unconstrained quadratic optimization problems as such problems are fundamentally important. These problems can be formulated as

    minimize_{x∈Rⁿ}  (1/2) xᵀHx + cᵀx    (QP)

where H = Hᵀ ∈ R^{n×n} and c ∈ Rⁿ. From (1.1.1) it follows that (QP) has a locally optimal solution only if H ⪰ 0 and there exists a vector x* ∈ Rⁿ such that ∇f(x*) = 0, or equivalently,

    Hx* + c = 0,    (1.1.5)

i.e., if −c ∈ R(H). Moreover, the solution is unique if H ≻ 0. The focus in this work is on the case where H ≻ 0. Solving (QP) is then equivalent to solving the system of linear equations in (1.1.5). As problems become large, direct methods typically become computationally intractable and iterative methods are considered. Herein, a particular emphasis is put on the method of conjugate gradients as proposed by Hestenes and Stiefel [51]. A derivation of the method in an optimization framework is given in Paper D. The method is an exact linesearch method that generates search directions which are mutually conjugate with respect to H, i.e., at iteration k the step length α_k is the solution of (1.1.2) and the search directions satisfy p_iᵀHp_j = 0, i ≠ j. For quadratic problems it is rather natural to use exact linesearch since an explicit expression for the solution of (1.1.2) is attainable. The search directions and step lengths are given by


    p_k = −∇f(x_k),                                                          k = 0,
    p_k = −∇f(x_k) + ( ∇f(x_k)ᵀ∇f(x_k) / ∇f(x_{k−1})ᵀ∇f(x_{k−1}) ) p_{k−1},   k ≥ 1,

and

    α_k = −∇f(x_k)ᵀp_k / ( p_kᵀHp_k ),    k ≥ 0,

respectively. Note that the method does not require an explicit representation of the matrix H as it only requires matrix-vector products with H to be computable.
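A minimal sketch of the method applied to (1.1.5), i.e., to Hx = −c, is given below; it follows the formulas above and accesses H only through matrix-vector products. It is an illustration, not one of the implementations studied in the appended papers.

```python
import numpy as np

def conjugate_gradients(H, c, x0=None, tol=1e-10, max_iter=None):
    """Method of conjugate gradients for minimizing 1/2 x^T H x + c^T x, H symmetric positive definite."""
    n = len(c)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    g = H @ x + c                      # gradient of the quadratic
    p = -g
    for _ in range(max_iter or n):
        if np.linalg.norm(g) <= tol:
            break
        Hp = H @ p
        alpha = -(g @ p) / (p @ Hp)    # exact linesearch step length
        x = x + alpha * p
        g_new = H @ x + c
        beta = (g_new @ g_new) / (g @ g)
        p = -g_new + beta * p          # next direction, conjugate to the previous ones
        g = g_new
    return x

# Illustrative use on a small positive definite system.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([-1.0, -2.0])
print(conjugate_gradients(H, c), np.linalg.solve(H, -c))
```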

In exact arithmetic, the method of conjugate gradients converges in at most n steps. Specifically, convergence is achieved in the number of steps equal to the number of distinct eigenvalues of H [65, Ch. 1]. However, in finite precision arithmetic round-off errors typically deteriorate the performance of the method [53].

Under exact linesearch and with B_0 = I, quasi-Newton methods in the Huang class generate search directions which are parallel to those of the method of conjugate gradients [37, 52]. The analogous result also holds for an arbitrary symmetric positive definite matrix B_0 and the method of preconditioned conjugate gradients with preconditioning matrix B_0 [38]. In essence, the goal of preconditioning is to linearly transform the original system of linear equations so that it may be solved more efficiently, see e.g., [66, Ch. 10] [42, Ch. 11] [68] for an introduction to preconditioning.

The performance of quasi-Newton methods is, in comparison to the method of conjugate gradients, typically not as sensitive to round-off errors. However, each iteration in a quasi-Newton method is more computationally expensive. A natural question is then how methods may be constructed that are computationally inexpensive compared to quasi-Newton methods and less affected by round-off errors than the method of conjugate gradients, e.g., methods and solution techniques which use limited memory [27–29, 61]. This is in essence the concern of Paper A.

In the paper, a particular focus is put on the case where a natural preconditioner exists and the aim is to solve a sequence of related systems of linear equations. Two examples where such sequences arise in optimization are: i) when Newton's method is used to solve unconstrained nonlinear optimization problems on the form (P). Close to a strict locally optimal solution this means solving sequences of related systems of linear equations on the form (1.1.3) where each ∇²f(x_k) ≻ 0. A natural preconditioner is then ∇²f(x_k̂) ≻ 0 for some k̂ < k; ii) when solving sequences of condensed systems of linear equations on the form (1.2.7) as interior-point methods converge to a solution, see Section 1.2.1. In certain situations, the matrices of the condensed systems are positive definite and changes in the matrices between iterations are small. Similarly here, a natural preconditioner is a matrix of the condensed system at a previous iteration.


1.2 Constrained nonlinear programming

A general twice continuously differentiable optimization problem can be formulated as

    minimize_{x∈Rⁿ}  f(x)
    subject to  g_i(x) = 0,  i ∈ E,
                g_i(x) ≥ 0,  i ∈ I,    (NLP)

where f : Rⁿ → R and g_i : Rⁿ → R, i = 1, . . . , m, are assumed to be in C². The sets E and I contain the indices of equality and inequality constraints, respectively.

These sets are such that E ∩ I = ∅ and E ∪ I = {1, . . . , m}. Moreover, let A(x) be defined as the Jacobian of the constraints, i.e., A(x) = (∇g_1(x), . . . , ∇g_m(x))ᵀ. As may be anticipated, the situation with nonlinear constraints may be rather complicated. In general, it is useful to distinguish inequality constraints that are active at a current solution x from those that are inactive. For a given x, constraint g_i(x), i ∈ I, is said to be active if g_i(x) = 0 and inactive if g_i(x) > 0.

Local optimality conditions can be formed by combining the concepts of Karush-Kuhn-Tucker conditions (KKT-conditions) and constraint qualifications. A vector x* ∈ Rⁿ is a second-order KKT-solution of (NLP) if there exists a Lagrange multiplier vector λ* ∈ Rᵐ such that

    ∇f(x*) − A(x*)ᵀλ* = 0,    (1.2.1a)
    g_i(x*) = 0,  i ∈ E,    (1.2.1b)
    g_i(x*) ≥ 0,  λ_i* ≥ 0,  g_i(x*)λ_i* = 0,  i ∈ I,    (1.2.1c)
    Z_A(x*)ᵀ∇²xxL(x*, λ*)Z_A(x*) ⪰ 0,    (1.2.1d)

where L(x, λ) = f(x) − λᵀg(x) is the Lagrangian and Z_A(x*) is a matrix whose columns span the nullspace of the Jacobian corresponding to active constraints, i.e., constraints g_i(x), i ∈ {i ∈ I : g_i(x*) = 0} ∪ E. The KKT-conditions of (NLP) are given by (1.2.1a)-(1.2.1c) [54, Ch. 11]. The KKT-conditions are based upon properties of local first-order approximations of the constraints. Consequently, KKT-conditions constitute local optimality conditions only if local linear approximations of the constraints at x* describe the feasible region sufficiently well. This is formalized with the notion of constraint qualification. Linear independence constraint qualification (LICQ) holds for a feasible vector x if the gradients of the active constraints at x are linearly independent. Other constraint qualifications also exist, see e.g., [2, Ch. 5] [64, Ch. 5]. In particular, x* satisfies first-order necessary conditions for a locally optimal solution of (NLP) if LICQ holds and there exist multipliers λ* such that the KKT-conditions are fulfilled [54, Ch. 11]. Moreover, x* and λ* satisfy second-order necessary conditions if in addition (1.2.1d) holds. Second-order sufficient conditions are satisfied for x* and λ* if they fulfill second-order necessary conditions and in addition Z_+(x*)ᵀ∇²xxL(x*, λ*)Z_+(x*) ≻ 0, where Z_+(x*) is a matrix whose columns span the nullspace of the Jacobian corresponding to constraints g_i(x), i ∈ {i ∈ I : g_i(x*) = 0, λ_i* > 0} ∪ E [54, Ch. 11].
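As an illustration (not part of the thesis), the first-order KKT-conditions (1.2.1a)-(1.2.1c) can be monitored numerically through their residuals. The function below assumes callables for the objective gradient, the constraint values and the constraint Jacobian; a point is a first-order KKT point when all returned residuals are (numerically) zero.

```python
import numpy as np

def kkt_residuals(grad_f, g, jac_g, eq_idx, ineq_idx, x, lam):
    """Residuals of the first-order KKT-conditions (1.2.1a)-(1.2.1c).

    grad_f(x): gradient of the objective; g(x): constraint values;
    jac_g(x): Jacobian A(x) with rows grad g_i(x)^T;
    eq_idx, ineq_idx: index lists for the sets E and I.
    """
    gx, A = g(x), jac_g(x)
    stationarity = grad_f(x) - A.T @ lam                   # (1.2.1a)
    eq_violation = gx[eq_idx]                               # (1.2.1b)
    ineq_violation = np.minimum(gx[ineq_idx], 0.0)          # g_i(x) >= 0
    dual_violation = np.minimum(lam[ineq_idx], 0.0)         # lambda_i >= 0
    complementarity = gx[ineq_idx] * lam[ineq_idx]          # (1.2.1c)
    return stationarity, eq_violation, ineq_violation, dual_violation, complementarity
```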


Methods for solving problems on the form (NLP) typically aim to find vectors x* and λ* which satisfy the KKT-conditions (1.2.1a)-(1.2.1c). Examples of such methods are: sequential quadratic programming (SQP) methods, which may be interpreted as a form of Newton iterations on the KKT-conditions of an equality-constrained program adapted to a program that also contains inequality constraints, see e.g., [48, Ch. 15] [62, Ch. 18] [46]; active-set methods, in which certain inequality constraints may be treated temporarily as equality constraints, e.g., [54, Ch. 15];

penalty or barrier transformation methods, where the approach is to construct a function whose unconstrained optimal solution coincides with the solution of the original problem, either directly or by a known relation, see e.g., [41, Ch. 6] [54, Ch. 13] [48, Ch. 16] [62, Ch. 17]; augmented Lagrangian methods, which may be interpreted as a combined penalty-duality method, see e.g., [54, Ch. 14] [41, Ch. 6];

interior-point methods, which may be viewed as a logarithmic barrier transformation or as Newton iterations on a perturbed version of the KKT-conditions (1.2.1a)-(1.2.1c), see e.g., [48, Ch. 16] [62, Ch. 18] [46, 70]. The main focus in this thesis is primal-dual interior-point methods. These methods are described in more detail in the section below.

In contrast to the unconstrained case, a fundamental difficulty in constrained nonlinear programming is how to handle the two competing goals of decreasing the objective function and satisfying the constraints. It might for instance be nonviable to generate a sequence of iterates that is feasible with respect to all constraints.

Fundamental techniques to balance the competing goals include merit functions and filters [62, Ch. 15]. Merit functions typically weigh decrease of the objective function and constraint violation. A step is only accepted if it satisfies sufficient decrease of some predefined merit function. Filters, on the other hand, may be interpreted as a bi-objective, or two-component, measure which minimizes the objective function and the constraint violation, respectively. A step is only accepted if it sufficiently improves both measures.

1.2.1 Primal-dual interior-point methods

A traditional way of deriving interior-point methods is through logarithmic barrier transformations. Such transformations for nonlinear programming were first introduced in 1955 by Frisch [39]. These were later studied in detail by Fiacco and McCormick who also introduced the notion of interior-point techniques [30].

For a thorough introduction to interior-point methods, see e.g., [36,43] [62, Ch. 19].

These methods were specifically designed to handle the combinatorial difficulty that

arises from inequality constraints. In particular, by the complementarity condition

g_i(x*)λ_i* = 0, i ∈ I, in (1.2.1c) it must hold that for each i ∈ I, either g_i(x*) = 0 or

λ_i* = 0, or both. To ease the notation in the description of primal-dual interior-point

methods, we consider problems (NLP) with inequality constraints only. However,

equality constraints compose a natural part of constrained nonlinear programming

and need to be dealt with. In practice, this may be done by different techniques

such as barrier-SQP approaches, penalty-barrier approaches or slack and shift reformulations [36]. An inequality-constrained nonlinear program may be posed as

    minimize_{x∈Rⁿ}  f(x)
    subject to  g(x) ≥ 0,    (iNLP)

where g(x) = (g_1(x), . . . , g_m(x))ᵀ. As mentioned, a traditional way of deriving interior-point methods is through a logarithmic barrier transformation of the constraints in (iNLP). This requires that {x ∈ Rⁿ : g(x) > 0} ≠ ∅ and gives an unconstrained problem with objective function B_µ(x) = f(x) − µ Σ_{i=1}^{m} ln g_i(x), µ > 0, and the implicit condition g(x) > 0. Newton iterations on ∇B_µ(x) = 0 then give the primal view of interior-point methods. The primal approach has long been disputed since it is impaired by ill-conditioning of the nonlinear equations due to ∇²B_µ(x) being singular in the limit as a solution is approached [62, Ch. 19] [48, Ch. 16].

The primal-dual view may be obtained by introducing multipliers, λ_i = µ/g_i(x), i = 1, . . . , m, in ∇B_µ(x) = 0, followed by Newton iterations with x and λ as variables. Herein, we will start from a slightly different angle which is based directly on the KKT-conditions and does not require an initial feasible solution. By introducing slack variables s ∈ Rᵐ for the inequality constraints in (1.2.1), the conditions for a second-order KKT-solution of (iNLP) become

    ∇f(x*) − A(x*)ᵀλ* = 0,    (1.2.2a)
    g(x*) − s* = 0,    (1.2.2b)
    s_i*λ_i* = 0,  i = 1, . . . , m,    (1.2.2c)
    s* ≥ 0,  λ* ≥ 0,    (1.2.2d)
    Z_A(x*)ᵀ∇²xxL(x*, λ*)Z_A(x*) ⪰ 0.    (1.2.2e)

Similarly as in the general case, (1.2.2a)-(1.2.2d) constitute the KKT-conditions of (iNLP). Moreover, a vector x* satisfies first-order necessary conditions for a locally optimal solution of (iNLP) if LICQ holds and there exist vectors λ* and s* such that the KKT-conditions are fulfilled. Primal-dual interior-point methods may be interpreted as Newton iterations on the perturbed KKT-conditions of (iNLP), i.e., (1.2.2a)-(1.2.2d) where each complementarity condition (1.2.2c) is perturbed by a barrier parameter µ. Each µ gives a system of nonlinear equations described by the function F_µ : R^{n+2m} → R^{n+2m},

    F_µ(x, λ, s) = [ ∇f(x) − A(x)ᵀλ
                     g(x) − s
                     SΛe − µe ],    (1.2.3)

where S = diag(s), Λ = diag(λ) and e is a vector of ones of appropriate size.

By construction, vectors (x, λ, s), with s ≥ 0 and λ ≥ 0, which fulfill LICQ and

F µ (x, λ, s) = 0, for µ = 0, satisfy first-order necessary conditions for a locally

optimal solution of (iNLP). Interior-point methods aim to solve or approximately


solve F µ (x, λ, s) = 0 for a decreasing sequence of µ > 0, while preserving λ > 0 and s > 0. Newton iterations on F µ (x, λ, s) = 0 give a system of linear equations on the form

    F′(x, λ, s) [ ∆x
                  ∆λ
                  ∆s ] = −F_µ(x, λ, s),    (1.2.4)

to be solved at each iteration. The Jacobian F′ : R^{2m+n} → R^{(2m+n)×(2m+n)} is defined by

    F′(x, λ, s) = [ ∇²xxL(x, λ)   −A(x)ᵀ    0
                    A(x)           0       −I
                    0              S        Λ ],

where the subscript µ has been omitted since the Jacobian is independent of the barrier parameter. To preserve λ > 0 and s > 0 the step length is often chosen as the minimum of one, which is the natural step length in Newton’s method, and a fraction of the maximum feasible step length. To improve efficiency many methods seek approximate solutions of F µ (x, λ, s) = 0 for each µ, see e.g., [1, 4, 44]

for theoretical results on inexact solutions. One approach to seek such approximate solutions is to keep µ fixed until sufficient decrease of a merit function is achieved, thereupon µ is decreased and the process repeated. Another approach to improve efficiency is to dynamically update µ at each iteration. In either case, there are numerous techniques to improve the solution for the specific µ. Newton systems on the form (1.2.4) can be solved directly. Alternatively, (1.2.4) may be reduced to the 2-by-2 block system

    [ ∇²xxL(x, λ)   −A(x)ᵀ ] [ ∆x ]       [ ∇f(x) − A(x)ᵀλ ]
    [ ΛA(x)           S    ] [ ∆λ ]  = −  [ Λg(x) − µe     ],    (1.2.5)

together with ∆s = −(s − µΛ⁻¹e) − Λ⁻¹S∆λ. If x is strictly feasible, then (1.2.5) with S = diag(g(x)) becomes equivalent to Newton iterations on the perturbed KKT-conditions of (iNLP) without slack variables. Hence, there is a trade-off between the introduction of more variables and the requirement of a strictly feasible initial solution. Under the assumption that Λ is nonsingular, (1.2.5) can be made symmetric and formulated as the reduced system

    [ ∇²xxL(x, λ)    A(x)ᵀ  ] [  ∆x ]       [ ∇f(x) − A(x)ᵀλ ]
    [ A(x)          −Λ⁻¹S   ] [ −∆λ ]  = −  [ g(x) − µΛ⁻¹e   ].    (1.2.6)

If in addition S is nonsingular then the Schur complement of −Λ⁻¹S in (1.2.6) gives the condensed system

    ( ∇²xxL(x, λ) + A(x)ᵀS⁻¹ΛA(x) ) ∆x = −( ∇f(x) − A(x)ᵀλ ) − A(x)ᵀS⁻¹( Λg(x) − µe ),    (1.2.7)

in combination with ∆λ = −S⁻¹(Λg(x) − µe) − S⁻¹ΛA(x)∆x. Symmetric systems may be desirable for several reasons, e.g., existing efficient solution techniques.
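As a concrete illustration of the equivalence between the unreduced system (1.2.4) and the condensed system (1.2.7), the sketch below forms both for small hypothetical data and verifies that they yield the same ∆x. The data and function names are illustrative assumptions, not code from the appended papers.

```python
import numpy as np

def newton_step_full(hess_L, A, grad_f, g, lam, s, mu):
    """Solve the unreduced Newton system (1.2.4) for (dx, dlam, ds)."""
    n, m = hess_L.shape[0], A.shape[0]
    S, Lam = np.diag(s), np.diag(lam)
    J = np.block([[hess_L,           -A.T,              np.zeros((n, m))],
                  [A,                 np.zeros((m, m)), -np.eye(m)],
                  [np.zeros((m, n)),  S,                 Lam]])
    F = np.concatenate([grad_f - A.T @ lam, g - s, s * lam - mu])
    d = np.linalg.solve(J, -F)
    return d[:n], d[n:n + m], d[n + m:]

def newton_step_condensed(hess_L, A, grad_f, g, lam, s, mu):
    """Solve the condensed system (1.2.7) for dx and recover dlam."""
    Sinv_Lam = lam / s                                   # diagonal of S^{-1} Lambda
    M = hess_L + A.T @ (Sinv_Lam[:, None] * A)
    rhs = -(grad_f - A.T @ lam) - A.T @ ((lam * g - mu) / s)
    dx = np.linalg.solve(M, rhs)
    dlam = -(lam * g - mu) / s - Sinv_Lam * (A @ dx)
    return dx, dlam

# Hypothetical data with strictly positive s and lam.
rng = np.random.default_rng(0)
n, m, mu = 3, 2, 0.1
hess_L = np.eye(n) * 2.0
A = rng.standard_normal((m, n))
grad_f, g = rng.standard_normal(n), rng.standard_normal(m)
s, lam = rng.random(m) + 0.5, rng.random(m) + 0.5
dx_full, _, _ = newton_step_full(hess_L, A, grad_f, g, lam, s, mu)
dx_cond, _ = newton_step_condensed(hess_L, A, grad_f, g, lam, s, mu)
print(np.allclose(dx_full, dx_cond))      # the two formulations agree
```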


However, a toll of the transformation to a symmetric system is ill-conditioning. The ill-conditioning arises due to the complementarity condition, which forces either s_i or λ_i, i = 1, . . . , m, or both, to approach zero as µ → 0. In consequence, some elements in the symmetric matrices of (1.2.6) and (1.2.7) become unbounded. Aspects regarding stability of these systems have been studied in [35, 69] and related effects of finite precision arithmetic in [71]. It should be noted that the inherent ill-conditioning of the symmetric matrices is not always harmful for direct solvers, e.g., for diagonal ill-conditioning [35]. However, ill-conditioning typically deteriorates the performance of iterative solvers. Therefore, in practice preconditioned iterative methods are frequently employed [11, 15, 19, 34, 55]. The construction of a suitable preconditioner that is computationally inexpensive is by itself not straightforward. The development of such preconditioners is an active area of research in itself [5–10, 40].

As mentioned briefly, Paper A concerns preconditioned iterative methods for sequences of related systems of linear equations where the matrices are symmetric positive definite. A more precise example of where such sequences arise in interior-point methods is as they converge. In particular, as they converge to a solution that satisfies second-order sufficient conditions for a locally optimal solution of (iNLP) and strict complementarity, s* + λ* > 0. The matrix in (1.2.7) is then positive definite. Consequently, the method of preconditioned conjugate gradients may be used to solve the systems that arise. Moreover, close to such a solution the changes in the iterates may be anticipated to be small. Given that a previous factorization of the matrix in (1.2.7) is available, it may be used as a natural preconditioner. This situation is hence one of the possible applications of the results of Paper A.
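A minimal sketch of this setting is given below, assuming that the current matrix is symmetric positive definite and that a Cholesky factorization of a nearby previous matrix is available to act as the preconditioner. It illustrates the situation only; it is not the limited-memory methods developed in Paper A.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def pcg_with_previous_factorization(H, b, chol_prev, tol=1e-10, max_iter=None):
    """Preconditioned CG for Hx = b, preconditioned by a Cholesky factorization of a previous matrix."""
    n = len(b)
    x = np.zeros(n)
    r = b - H @ x
    z = cho_solve(chol_prev, r)            # apply the preconditioner M^{-1}
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter or n):
        if np.linalg.norm(r) <= tol:
            break
        Hp = H @ p
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        z = cho_solve(chol_prev, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Illustrative use: H_new is a small perturbation of H_prev, whose factor is reused.
rng = np.random.default_rng(1)
n = 20
Q = rng.standard_normal((n, n))
H_prev = Q @ Q.T + n * np.eye(n)
H_new = H_prev + 0.05 * np.eye(n)          # a nearby matrix, as in consecutive iterations
b = rng.standard_normal(n)
x = pcg_with_previous_factorization(H_new, b, cho_factor(H_prev))
print(np.linalg.norm(H_new @ x - b))
```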

The solution of the Newton system (1.2.4), or equivalently the solution of (1.2.5)-(1.2.7), is typically of high quality. However, these systems may be rather complex and computationally expensive to solve. As inexact solutions of F_µ(x, λ, s) = 0 often are sufficient for each µ, alternative approaches may be considered, e.g., approximate solutions of the systems (1.2.4)-(1.2.7), or quasi-Newton and modified Newton approaches to approximately solve F_µ(x, λ, s) = 0 at each µ. Approximate solutions of the systems (1.2.4)-(1.2.7) are also related to termination control of iterative methods for the systems, which is a topic in itself [16, 19]; see also truncated Newton methods and inexact Newton methods [21, 22, 26, 58, 59]. Each iteration in these alternative approaches typically gives lower-quality solutions, but conceivably at a lower computational cost compared to solving (1.2.4)-(1.2.7) exactly. Thus, there is a trade-off between finding solutions to Newton systems on the form (1.2.4) and performing more iterations with alternative approaches.

For certain subclasses of constrained nonlinear programming the Newton system (1.2.4) simplifies and gives rise to specific characteristics. These characteristics may be utilized to construct alternative approaches to approximately solve F_µ(x, λ, s) = 0. The remaining part of this section will be devoted to two important subclasses which are considered in Paper B and Paper C.


Bound-constrained nonlinear programming

The discussion in this section will be given for nonlinear problems subject to non-negativity bounds only. This will ease the notation and make the description more comprehensible. However, note that these problems in themselves constitute an important subclass, for example as they are frequent subproblems of augmented Lagrangian methods [46]. An analogous discussion can also be given for general bound-constrained programs, i.e., for (iNLP) with constraints on the form l ≤ x ≤ u, where l, u ∈ {R ∪ {−∞, ∞}}ⁿ. Nonlinear problems subject to non-negativity bounds can be posed as

    minimize_{x∈Rⁿ}  f(x)
    subject to  x ≥ 0.    (BP)

Problem (BP) is thus (iNLP) with g(x) = x. The function of interest F_µ : R^{2n} → R^{2n} is in this case

    F_µ(x, λ) = [ ∇f(x) − λ
                  XΛe − µe ],    (1.2.8)

where X = diag(x).

Newton iterations on F µ (x, λ) = 0 give systems of linear equations on the form

    [ ∇²f(x)   −I ] [ ∆x ]
    [ Λ         X ] [ ∆λ ]  = −F_µ(x, λ).    (1.2.9)

If X is nonsingular then the solution of (1.2.9) can also be obtained from the Schur complement of X

    ( ∇²f(x) + X⁻¹Λ ) ∆x = −∇f(x) + µX⁻¹e,    (1.2.10)

in combination with ∆λ = −( λ − µX⁻¹e ) − X⁻¹Λ∆x. Paper B concerns partial and full approximate solutions of (1.2.9). These approximate solutions are derived by exploiting the asymptotic behavior of x and λ as interior-point methods converge to a solution, together with the specific structure of (1.2.9) and (1.2.10). In particular, as µ → 0 either x_i → 0 or λ_i → 0 for strict complementarity products. Consequently, some diagonal elements in X⁻¹Λ of (1.2.10) will dominate the coefficients of the matrix as µ → 0. Partial approximate solutions can hence be formed by neglecting off-diagonal elements in the matrix of (1.2.10). These partial approximate solutions become increasingly accurate as interior-point methods converge. Similarly, by exploiting the asymptotic behavior of strict complementarity products, partial approximate solutions may also be formed from the last block equation of (1.2.9). Full approximate solutions may then also be derived by utilizing the structure of the systems (1.2.9)-(1.2.10) in combination with the aforementioned partial approximate solutions. The calculation of such full approximate solutions typically involves solving a reduced system on the form (1.2.9) or (1.2.10).

In certain ideal situations, the matrix of the reduced version of the system (1.2.10) does not become increasingly ill-conditioned as µ approaches zero, in contrast to the matrix of (1.2.10). As mentioned, diagonal ill-conditioning may not be harmful for direct methods; however, it typically deteriorates the performance of iterative solvers. See Paper B for further details on partial and full approximate solutions of (1.2.9).
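To make the diagonal-dominance argument concrete, the sketch below compares the exact solution of (1.2.10) with the partial approximation obtained by neglecting the off-diagonal elements of its matrix, for decreasing µ. It is an illustration of the mechanism only; the actual partial and full approximate solutions of Paper B rely on estimates of the active constraints and are more refined.

```python
import numpy as np

def condensed_step_exact(hess_f, grad_f, x, lam, mu):
    """Exact dx from the condensed system (1.2.10)."""
    M = hess_f + np.diag(lam / x)                 # hess f(x) + X^{-1} Lambda
    return np.linalg.solve(M, -grad_f + mu / x)

def condensed_step_diagonal(hess_f, grad_f, x, lam, mu):
    """Partial approximation: keep only the diagonal of the matrix in (1.2.10)."""
    d = np.diag(hess_f) + lam / x
    return (-grad_f + mu / x) / d

# Hypothetical late-iteration data: for the component active at the solution,
# x_i -> 0 while lambda_i stays positive, so lambda_i / x_i dominates that row.
hess_f = np.array([[2.0, 0.3], [0.3, 1.5]])
grad_f = np.array([0.8, -0.4])
for mu in (1e-1, 1e-3, 1e-5):
    x = np.array([mu / 0.8, 1.0])                 # x_1 approaches its bound
    lam = np.array([0.8, mu / 1.0])               # complementarity: x_i * lambda_i ~ mu
    exact = condensed_step_exact(hess_f, grad_f, x, lam, mu)
    approx = condensed_step_diagonal(hess_f, grad_f, x, lam, mu)
    print(mu, np.linalg.norm(exact - approx))
```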

Quadratic programming

A particularly important subclass of constrained nonlinear programming is quadratic programming. Quadratic programs appear by themselves but are also subprograms in both SQP and trust-region methods. Each iteration of these methods requires the solution of one quadratic program. A quadratic problem with linear inequality constraints can be formulated as

    minimize_{x∈Rⁿ}  (1/2) xᵀHx + cᵀx
    subject to  Ax ≥ b,    (IQP)

where H = Hᵀ ∈ R^{n×n}, c ∈ Rⁿ, A ∈ R^{m×n} and b ∈ Rᵐ. Problem (IQP) is thus the special case of (iNLP) with f(x) = (1/2)xᵀHx + cᵀx and g(x) = Ax − b.

In consequence A(x) = A and ∇²xxL(x, λ) = H. The function of interest F_µ(x, λ, s) : R^{n+2m} → R^{n+2m} becomes

    F_µ(x, λ, s) = [ Hx + c − Aᵀλ
                     Ax − b − s
                     SΛe − µe ].    (1.2.11)

Newton iterations on F µ (x, λ, s) = 0 give systems on the form (1.2.4) with F µ (x, λ, s) as in (1.2.11) and Jacobian

    F′(x, λ, s) = [ H    −Aᵀ    0
                    A     0    −I
                    0     S     Λ ].    (1.2.12)

In contrast to the general case, the Jacobian of (1.2.12) is computationally inexpensive to evaluate. The most computationally expensive part of an iteration is thus to find the solution of the Newton system. As mentioned, this may be done by either solving the Newton system directly or solving the corresponding reduced or condensed system; see [57] for a comparison of the solutions of the different systems for quadratic programs on standard form. It is well known that the unreduced block 3-by-3 matrix has favorable spectral properties [47, 56]. The change of the Jacobian given by (1.2.12) in the step to (x + ∆x, λ + ∆λ, s + ∆s) is linear and only depends on ∆λ and ∆s since

    F′(x + ∆x, λ + ∆λ, s + ∆s) = F′(x, λ, s) + ∆F′(∆x, ∆λ, ∆s),


where

    ∆F′(∆x, ∆λ, ∆s) = [ 0    0    0
                        0    0    0
                        0   ∆S   ∆Λ ],    (1.2.13)

with ∆Λ = diag(∆λ) and ∆S = diag(∆s). The change of the Jacobian is thus in general of rank m. A natural question is then if there are situations when it may be viable to consider certain low-rank updates. Quasi-Newton approaches with Broyden rank-one updates have been studied by Gondzio and Sobral [45]. A particularly interesting situation occurs as interior-point methods converge to a solution, specifically to a solution that satisfies second-order sufficient conditions for a locally optimal solution of (IQP) and strict complementarity. It may then be anticipated that the changes in the Jacobian matrix (1.2.12) are small. If in addition a factorization of a previous Jacobian is available, then low-rank updates may be a tractable alternative to performing re-factorizations of the full Jacobian. Paper C concerns modified Jacobians composed of a previous Jacobian, whose factorization is assumed to be available, plus one low-rank update matrix per succeeding iteration. The suggested update matrices preserve the sparsity pattern of the true Jacobian. Consequently, each modified Jacobian may be viewed as a Jacobian at a different point. Such an approach may be interpreted as a modified Newton approach in the following sense.

The Jacobian of each Newton system is modified to a Jacobian at a different point such that the system becomes less computationally expensive to solve. Alternatively, the approach may be interpreted as a quasi-Newton method which updates the previous Jacobian approximation by a low-rank matrix. The initial Jacobian approximation is then one at a previous iteration whose factorization is assumed to be available. See Paper C for further details.
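The sketch below illustrates the structure (1.2.13) for (IQP): the change between consecutive Jacobians is confined to the last block row, and a modification of limited rank can be formed by updating only a subset of its diagonal entries. The selection rule used here (keeping the indices with the largest changes) is a hypothetical simplification, not the best-approximation update constructed in Paper C.

```python
import numpy as np

def jacobian_iqp(H, A, lam, s):
    """Unreduced Jacobian (1.2.12) for the quadratic program (IQP)."""
    n, m = H.shape[0], A.shape[0]
    return np.block([[H,                 -A.T,              np.zeros((n, m))],
                     [A,                  np.zeros((m, m)), -np.eye(m)],
                     [np.zeros((m, n)),   np.diag(s),        np.diag(lam)]])

def limited_rank_modification(J_prev, dlam, ds, r):
    """Add a rank-at-most-r part of the change (1.2.13) to a previous Jacobian.

    Only the r indices with the largest combined change |ds_i| + |dlam_i| are
    updated; an illustrative choice, not the update of Paper C.
    """
    m = len(ds)
    n = J_prev.shape[0] - 2 * m
    J_mod = J_prev.copy()
    keep = np.argsort(np.abs(ds) + np.abs(dlam))[-r:]
    for i in keep:
        J_mod[n + m + i, n + i] += ds[i]          # update in the S block (sparsity preserved)
        J_mod[n + m + i, n + m + i] += dlam[i]    # update in the Lambda block
    return J_mod

# Hypothetical data: near convergence only a few multipliers and slacks still change.
rng = np.random.default_rng(2)
n, m, r = 3, 4, 2
H, A = np.eye(n), rng.standard_normal((m, n))
lam, s = rng.random(m) + 0.5, rng.random(m) + 0.5
dlam, ds = np.zeros(m), np.zeros(m)
dlam[:2], ds[:2] = 0.3, -0.2                      # large changes in two components only
J_prev = jacobian_iqp(H, A, lam, s)
J_curr = jacobian_iqp(H, A, lam + dlam, s + ds)
J_mod = limited_rank_modification(J_prev, dlam, ds, r)
print(np.linalg.norm(J_curr - J_mod))             # zero here, since only two components changed
```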


2 Thesis summary and contributions

The main content of this thesis includes the four appended papers. This section contains a summary of each of these papers, in which the core contribution is highlighted. Thereafter, the main contribution of the thesis is described in connection to the development of effective numerical methods.

2.1 Summary of appended papers

The bulk of the work in papers A, B and C, including the preparation of the manuscripts, has been carried out by Ek who is therefore the main author.

Paper D is more pedagogical in nature. It is the result of discussions between Ek and Forsgren on Paper A. The co-author Forsgren has throughout acted as an academic adviser in the classical sense, suggesting directions of research and supervising the work.

Paper A: Exact linesearch limited-memory quasi-Newton methods for minimizing a quadratic function

The paper is co-authored with Anders Forsgren and has been submitted to Computational Optimization and Applications.

This paper concerns unconstrained strictly convex quadratic optimization problems. This is equivalent to solving systems of linear equations where the matrix is symmetric positive definite. The focus is on exact linesearch limited-memory methods which generate search directions parallel to those of the method of preconditioned conjugate gradients. We give a class of limited-memory Hessian approximations which, in a quasi-Newton setting, generate search directions with the desired property. It is shown that this class in combination with concepts of reduced-Hessian methods provides a dynamic framework for the construction of new methods. In essence, we propose a class of limited-memory quasi-Newton methods and provide a framework for the construction of these methods including results on complexity.


In order to give an initial indication of the practical performance of the methods in the class, we construct two members and give numerical results. We present results when the techniques are applied to sequences of related systems of linear equations. The linear systems are constructed by Newton’s method as it converges to solutions of unconstrained nonlinear optimization problems. The results indicate that the proposed class of methods provides potential to decrease the number of iterations necessary for convergence in comparison to the method of preconditioned conjugate gradients.

The overall practicability of the methods in the class is dependent on the CPU time. The time depends on the specific choice of method in the class and its implementation which can be tailored depending on the application. Although such studies are of utmost importance to evaluate the practicability of the methods, they are beyond the scope of this first paper.

The paper also contains compact representations of quasi-Newton Hessian approximations. A representation is introduced for members of the full Broyden class while considering general unconstrained nonlinear problems. The representations consist of explicit matrices, with gradients as the only vector components.

The main contribution of the paper is the characterization of a class of limited-memory Hessian approximations together with the framework for constructing methods. This combination provides a dynamic framework for a tailored construction of limited-memory quasi-Newton methods.

Paper B: Approximate solution of system of equations arising in interior-point methods for bound-constrained optimization

The paper is co-authored with Anders Forsgren and has been submitted to Computational Optimization and Applications.

This paper concerns primal-dual interior-point methods for bound-constrained nonlinear optimization, in particular when Newton's method is applied to solve the systems of nonlinear equations that arise. We propose partial and full approximate solutions to the Newton systems. The approximate solutions depend on estimates of the active constraints at the solution of the problem considered. The partial approximate solutions are computationally inexpensive, whereas each full approximate solution requires a reduced Newton system to be solved. The size of the reduced system depends on the estimated number of active constraints at the optimal solution. It is shown that the approximate solutions become increasingly accurate as interior-point methods converge in an ideal theoretical framework.

In addition, we motivate and propose two Newton-like approaches which involve an intermediate step combined with the solution of a Newton-like system. The intermediate step is computationally inexpensive as it consists of the proposed partial approximate solutions.


Numerical results are given in order to give an initial indication of the practical performance within and beyond the ideal theoretical framework. The initial numerical results were produced with a primitive primal-dual interior-point implementation.

The results indicate that the suggested approaches may be useful as interior-point methods converge. However, future work is needed to further assess the practicability. In particular, the approximate solutions need to be implemented with care in more sophisticated interior-point solvers.

The main contribution is the approximate solutions motivated by theoretical results in the asymptotic region of our framework. We envisage that the suggested approach can be useful in parallel, or in combination, with a direct solver to accelerate interior-point methods as they converge to a solution.

Paper C: A structured modified Newton approach for solving systems of nonlinear equations arising in interior-point methods for quadratic programming

The paper is co-authored with Anders Forsgren and has been submitted to SIAM Journal on Optimization.

This paper concerns primal-dual interior-point methods for quadratic problems with linear inequality constraints. We propose a structured modified Newton approach to solve the systems of nonlinear equations that arise. The modified Jacobians are in general composed of a previous Jacobian plus one low-rank update matrix per succeeding iteration. Each update matrix is, for a given rank, chosen such that the modified Jacobian becomes closest to the current Jacobian, in both 2-norm and Frobenius norm. The approach is structured in the sense that it preserves the sparsity structure of the true Jacobian. Consequently, each modified Jacobian may be viewed as a Jacobian at a different point. The analysis is done on the unsymmetric block 3-by-3 Newton system. However, we also show that an analogous update may be done in the reduced and the condensed system without affecting the rank of the update.

The choice of update matrix is motivated by asymptotic results in an ideal theoretical framework. We produce numerical results with a basic primal-dual interior-point implementation to investigate the practical performance within and beyond the theoretical framework.

The general ideas of the approach show potential, but further work is needed to assess the practicability of the approach. In particular, the approach has to be implemented in a more sophisticated interior-point solver with a precise way of solving the modified Newton systems.


The main contribution is the specific choice of low-rank update matrix which is motivated by theoretical results in the asymptotic region of our setting. We envisage that the structured modified Newton approach can be useful in parallel, or in combination, with a Newton solver to accelerate interior-point methods as they converge.

Paper D: An optimization derivation of the method of conjugate gradients

The paper is co-authored with Anders Forsgren and has been submitted to SIAM Review.

We give a derivation of the method of conjugate gradients in an optimization framework. To the best of our knowledge, the derivation proposed has not been presented before. The derivation is based on the requirement that each iterate minimizes a strictly convex quadratic on the space spanned by the previously observed gradients. We show that the generation of such iterates is equivalent to the generation of orthogonal gradients. Rather than verifying that the search direction has the correct properties, we show that the generation of orthogonal gradients gives the description of the direction.

In addition, it is shown that this approach provides a straightforward geometrical interpretation of the successive search directions as they are parallel to gradients of minimum Euclidean norm.

The main contribution is the pedagogic nature of the derivation itself.

2.2 Main contribution

The development of effective numerical procedures typically involves different stages.

Most approaches are initially described in a mathematical framework from which theoretical results for the approaches in certain settings are derived. Thereafter, the practical performance of the approaches is investigated, often within and beyond the theoretical setting. Depending upon the initial results, the approaches may be implemented with greater care in more sophisticated solvers and evaluated further before being finalized in a practicable version. The contribution of this thesis lies in the earlier stages of this process. In essence, the contribution lies in the ideas and the approaches themselves, which are aimed to improve classes of important methods for nonlinear optimization. We envisage that these can facilitate accelerating methods for solving systems of equations arising in nonlinear optimization, either by acting as procedures which can be run in parallel to existing methods or as methods themselves.


References

[1] P. Armand, J. Benoist, and J.-P. Dussault. Local path-following property of inexact interior methods in nonlinear programming. Comput. Optim. Appl., 52(1):209–238, 2012.

[2] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear programming: Theory and algorithms. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, third edition, 2006.

[3] A. Beck. First-order methods in optimization, volume 25 of MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA, 2017.

[4] S. Bellavia. Inexact interior-point method. J. Optim. Theory Appl., 96(1):109–121, 1998.

[5] S. Bellavia, V. De Simone, D. di Serafino, and B. Morini. A preconditioning framework for sequences of diagonally modified linear systems arising in optimization. SIAM J. Numer. Anal., 50(6):3280–3302, 2012.

[6] S. Bellavia, V. De Simone, D. di Serafino, and B. Morini. Updating constraint preconditioners for KKT systems in quadratic programming via low-rank corrections. SIAM J. Optim., 25(3):1787–1808, 2015.

[7] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numer., 14:1–137, 2005.

[8] L. Bergamaschi, V. De Simone, D. di Serafino, and A. Martínez. BFGS-like updates of constraint preconditioners for sequences of KKT linear systems in quadratic programming. Numer. Linear Algebra Appl., 25(5):e2144, 19, 2018.

[9] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli. Inexact constraint preconditioners for linear systems arising in interior point methods. Comput. Optim. Appl., 36(2-3):137–147, 2007.

[10] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli. Erratum to: Inexact constraint preconditioners for linear systems arising in interior point methods [mr2312004]. Comput. Optim. Appl., 49(2):401–406, 2011.
