
Feasible Direction Methods for Constrained Nonlinear Optimization

Suggestions for Improvements

Maria Mitradjieva-Daneva


Linköping Studies in Science and Technology. Dissertations, No. 1095

Feasible Direction Methods for Constrained Nonlinear Optimization

Suggestions for Improvements

Maria Mitradjieva-Daneva

Division of Optimization, Department of Mathematics, Linköping University, SE-581 83 Linköping, Sweden

Copyright © 2007 Maria Mitradjieva-Daneva, unless otherwise noted. All rights reserved.

ISBN: 978-91-85715-11-4 ISSN 0345-7524

Typeset in LaTeX 2ε

Printed by LiU-Tryck, Linköping University, SE-581 83 Linköping, Sweden, 2007


To the men of my life,

Stefan, Petter, Martin and Danyo.


Acknowledgments

There are many people who have made this work possible.

First of all, my sincere thanks go to Professor Maud Göthe-Lundgren for all her encouragement and advice. My deepest thanks for her support during my times of difficulty.

Many thanks go to Clas Rydergren for all help and support he has given me over the years. It has been a great pleasure to work with him.

Special thanks to Torbjörn Larsson for his guidance in optimization theory, research and writing methodology. The interesting discussions with him and his profound remarks were very helpful.

I would like to thank Professor Per Olov Lindberg for giving me the opportunity to work within the Optimization group in Linköping. I have to thank him for his importance to my work, for his enthusiasm and endless finickiness.

I also gratefully acknowledge the financial support from KFB (the Swedish Transportation & Communications Research Board) and later Vinnova under the project "Mathematical Models for Complex Problems within the Road and Traffic Area".

Special acknowledgments to Leonid Engelson from Inregia, Stockholm, for introducing me to an interesting research topic.

Many thanks go to all my colleagues at the Division of Optimization. There are many who, from behind the scenes, have encouraged me and made my work pleasant and easier. I am especially grateful to Helene, Andreas and Oleg for all the support and discussions. Sometimes only a few words can mean a lot!


Many heartfelt thanks to the girls in LiTH Doqtor, especially to Linnéa. There is one man in my life who urged me on with his unbelievable generosity and love. To Danyo, I send all my love.

Last, but absolutely not least, I would like to express my deepest gratitude to my parents, my sister Rumi and my friends, simply for being there.

Thank you to my lovely sons, Petter, Stefan and Martin, who made hard times seem brighter with their cheerful laughter.

To all of you, I send my deepest gratitude!

Linköping, May 2007
Maria Mitradjieva-Daneva


Sammanfattning

This thesis concerns the development of new, efficient optimization methods. Optimization based on mathematical models is used in a wide range of applications, such as traffic planning, telecommunications, scheduling, production planning, finance, and the pulp and paper industry. The thesis studies solution methods for nonlinear optimization problems.

The optimization methods developed in the thesis are applicable to a large number of problem types. Among other things, the thesis studies the traffic equilibrium problem, which is central to the analysis and planning of traffic systems. This type of model can be used to simulate route choice for commuting trips in urban areas. We have studied several types of traffic equilibria, for example equilibria that take the travelers' valuation of time into account when computing road tolls based on social marginal costs. The thesis describes new concepts for faster and more accurate solution methods. Speed and accuracy are especially important for optimization problems with a large number of decision variables. The methods developed have in common that they are based on feasible search directions. The methodological development proposed in the thesis builds on improvements in the computation of these feasible directions.

The Swedish title of the thesis is: Tillåtnariktningsmetoder för begränsad olinjär optimering – några förslag till förbättringar.


Abstract

This thesis concerns the development of novel feasible direction type algorithms for constrained nonlinear optimization. The new algorithms are based upon enhancements of the search direction determination and the line search steps.

The Frank–Wolfe method is popular for solving certain structured linearly constrained nonlinear problems, although its rate of convergence is often poor. We develop improved Frank–Wolfe type algorithms based on conjugate directions. In the conjugate direction Frank–Wolfe method a line search is performed along a direction which is conjugate to the previous one with respect to the Hessian matrix of the objective. A further refinement of this method is derived by applying conjugation with respect to the last two directions, instead of only the last one.

The new methods are applied to the single-class user traffic equilibrium problem, the multi-class user traffic equilibrium problem under social marginal cost pricing, and the stochastic transportation problem. In a limited set of computational tests the algorithms turn out to be quite efficient. Additionally, a feasible direction method with multi-dimensional search for the stochastic transportation problem is developed.

We also derive a novel sequential linear programming algorithm for general constrained nonlinear optimization problems, with the intention of being able to attack problems with large numbers of variables and constraints. The algorithm is based on inner approximations of both the primal and the dual spaces, which yields a method combining column and constraint generation in the primal space.


Contents

Sammanfattning

Abstract

Contents

PART I: INTRODUCTION AND OVERVIEW

1 Introduction
2 Selected topics in nonlinear optimization
   2.1 Prerequisites
      2.1.1 Descent directions
      2.1.2 Line search
   2.2 Linearly constrained optimization
      2.2.1 The Frank–Wolfe method
      2.2.2 Simplicial decomposition
   2.3 General constrained optimization
      2.3.1 Sequential linear programming
      2.3.2 Sequential quadratic programming
   2.4 The Lagrangian dual problem
   2.5 Convergence
3 Outline of the thesis and contribution
4 Chronology and publication status
Bibliography

PART II: APPENDED PAPERS

PAPER I: The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment
1 Introduction
2 The Frank–Wolfe method and modifications
   2.1 The Frank–Wolfe method
3 Conjugate direction Frank–Wolfe methods
   3.1 The conjugate Frank–Wolfe method, CFW
   3.2 Outline of the CFW algorithm
   3.3 The bi-conjugate Frank–Wolfe method, BFW
   3.4 Convergence of the CFW method
4 Applications to traffic assignment problems
   4.1 The fixed demand traffic assignment problem
   4.2 Computational experiments
   4.3 A comparison with origin-based and DSD methods
A Derivation of the coefficients $\beta_k^i$ in BFW
B Closedness of the mapping $A(D_k, N_k)$
C Closedness of the conjugation map $D_{CFW}$
Bibliography

PAPER II: Multi-Class User Equilibria under Social Marginal Cost Pricing
1 Overview
2 Multi-class user equilibria
3 Equilibria under social marginal cost pricing
4 A two-link example
5 A Frank–Wolfe algorithm for the SMC equilibrium
6 Some experimental results
   6.1 The two-link network
   6.2 The Sioux Falls network
Bibliography

PAPER III: A Conjugate Direction Frank–Wolfe Method for Nonconvex Problems
1 Introduction
2 Conjugate directions
3 The extended conjugate Frank–Wolfe method
   3.1 Outline of the ECFW algorithm
4 Applications to marginal cost congestion tolls
   4.1 Multi-class traffic equilibria under SMC pricing
   4.2 Computational experiments
Bibliography

PAPER IV: A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem
1 Introduction
2 The stochastic transportation problem
3 Feasible direction methods for STP
   3.1 The Frank–Wolfe method, FW
   3.2 The diagonalized Newton method, DN
   3.3 The conjugate Frank–Wolfe method, CFW
   3.4 Frank–Wolfe with multi-dimensional search, MdFW
   3.5 The heuristic Frank–Wolfe method, FWh
4 Test problems
5 Numerical results
6 Conclusions
Bibliography

PAPER V: A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence
1 Introduction
2 Related methods
   2.1 Sequential linear programming algorithms
   2.2 Simplicial decomposition
3 Preliminaries
4 SLP algorithm with multi-dimensional search
   4.1 Derivation of the multi-dimensional SLP algorithm
   4.2 Convergence to KKT points
5 MdSLP in the convex case
6 An illustrational example
7 Numerical experiments
   7.1 Termination criteria
   7.2 Numerical results
8 Conclusions
Bibliography
A Computation of the first-order optimality conditions


PART I


1 Introduction

The field of nonlinear programming has a very broad range of applications and has experienced major developments in the last few decades. Nonlinear models arise in various fields of real life and there is a wide variety of approaches for solving the resulting nonlinear optimization programs. Nonlinear optimization problems appear, for example, in routing problems in traffic [9, 48, 52, 63, 65] and telecommunications [29], in the oil [35] and chemical industries [47], in design optimization of large-scale structures [43, 44], variational inequalities [62], applications in structural optimization [66], economics [42], marketing [54, 58] and business applications [36], in solving systems of equations [77], and in scientific applications such as biology, chemistry, physics and mechanics, protein structure prediction [59], etc.

The traffic assignment problem is a nonlinear model which describes how each traveler minimizes his or her own travel cost for reaching the desired destination. Modeling the travel times, congestion and differences in the travelers' value of time leads to nonlinearities. In the management of investment portfolios, the goal might be to determine a mix of investments so as to maximize return while minimizing risk. The nonlinearity in the model comes from taking risk into account. Although there is a variety of portfolio selection models, a widely used one is the quadratic optimization problem that minimizes the risk. In physics, for example, minimizing the potential energy function to determine a stable configuration of a system of atoms, or determining the configuration of largest terminal or kinetic energy, is also a nonlinear programming model. A related problem in chemistry is to determine the molecular structure that minimizes Gibbs free energy, known also as chemical equilibrium [17]. In recent years much research has been devoted to the development of nonlinear optimization for atomic and molecular physics, to solve difficult molecular configuration problems like cluster problems, protein folding problems, etc. [25, 41, 59].

Some important recent developments in nonlinear optimization are in least squares [60], neural networks [4, 73] and interior point methods for linear and nonlinear programs [4, 26]. Karmarkar [40] introduced a polynomial-time linear programming method, and this work started the revolution of interior-point methods. These algorithms are especially efficient for convex optimization problems [1]. One can show that the number of iterations that an interior point algorithm needs in order to achieve a specified accuracy is bounded by a polynomial function of the size of the problem. For more details on interior-point methods, see [1, 4, 26]. Other important recent developments are the increased emphasis on large-scale problems [6, 13, 33, 48, 67, 79], and algorithms that take advantage of problem structures as well as parallel computation [10, 56, 76].

When modeling real-world problems, different types of optimization problems occur. They can be linear or nonlinear, with or without constraints, continuous, integer or mixed-integer. The functions in an optimization problem can be differentiable or non-differentiable, convex or non-convex. Sometimes we consider optimization under uncertainty, known as stochastic optimization, where the functions are only given probabilistically. Good references on fundamental theory, methods, algorithm analysis and advice on how to obtain good implementations in nonlinear optimization include [1, 3, 4, 51, 60, 77].

The focus of this thesis is on algorithms that solve nonlinear constrained optimization problems. Our concern is with algorithms which iteratively generate a sequence of points $\{x^k\}_{k=1}^{\infty}$ which either terminates at or converges to a solution of the problem under consideration. Only in very special cases, such as linear programming (LP) and convex quadratic programming (QP) [77, Chapter 1.6], does finite termination at an optimal point occur.

We consider a general constrained nonlinear optimization problem
\[
  \min_{x \in X} f(x), \qquad (1)
\]
where the objective function $f: X \mapsto \mathbb{R}$ is differentiable and the feasible set $X \subset \mathbb{R}^n$ is non-empty and compact. A generic form of a primal feasible descent algorithm is stated as Algorithm 1.


Algorithm 1: A Generic Primal Feasible Descent Algorithm

Step 0 (Initialization): Choose an initial point $x^0 \in X$ and let $k = 0$.

Step 1 (Termination check): If a termination criterion is satisfied, then stop, else $k = k + 1$.

Step 2 (Direction determination): Determine a feasible descent search direction $d^k$.

Step 3 (Step length determination): Determine a step length $t_k > 0$ such that $f(x^k + t_k d^k) < f(x^k)$ and $x^k + t_k d^k \in X$.

Step 4 (Update): Update $x^{k+1} = x^k + t_k d^k$ and go to Step 1.
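The generic scheme in Algorithm 1 can be summarized in a few lines of code. The following is a minimal sketch, assuming that the user supplies a feasible starting point together with problem-specific routines for Steps 2 and 3; the names direction_fn and step_fn are illustrative and not taken from the thesis.

import numpy as np

def feasible_descent(f, x0, direction_fn, step_fn, tol=1e-6, max_iter=1000):
    # Generic primal feasible descent loop in the spirit of Algorithm 1.
    # direction_fn(x) is assumed to return a feasible descent direction at x;
    # step_fn(x, d) is assumed to return t > 0 with f(x + t d) < f(x) and x + t d feasible.
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        d = direction_fn(x)                  # Step 2: direction determination
        if np.linalg.norm(d) <= tol:         # Step 1: termination check
            break
        t = step_fn(x, d)                    # Step 3: step length determination
        x = x + t * d                        # Step 4: update
    return x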

The development of better performing algorithms can be made through mod-ifications in both the direction determination and the step length determi-nation steps of Algorithm 1.

2 Selected topics in nonlinear optimization

We here present some fundamental concepts of nonlinear optimization problems and methods.

2.1 Prerequisites

We consider the nonlinear minimization problem
\[
  \min_{x} \; f(x), \qquad (2)
\]
where the objective $f: \mathbb{R}^n \mapsto \mathbb{R}$ is a differentiable function. Below, we describe basic methods for solving unconstrained optimization problems. An interesting aspect of these approaches is whether they converge globally. By a globally convergent algorithm we mean that the method generates a sequence that converges to a stationary point $x^*$, i.e. $\|\nabla f(x^*)\| = 0$, for any starting point.

2.1.1 Descent directions

How descent directions are generated depends on the particular optimization problem. A sufficient condition for $d^k$ to be a descent direction with respect to $f$ at $x^k$ is given by $\nabla f(x^k)^T d^k < 0$. In unconstrained optimization a search direction often has the form
\[
  d^k = -B_k^{-1} \nabla f(x^k), \qquad (3)
\]
where $B_k$ is a symmetric and nonsingular matrix. If $B_k$ additionally is positive definite, $d^k$ becomes a descent direction. In the steepest descent method, $B_k$ is simply the identity matrix, thus $d^k = -\nabla f(x^k)$, which is a descent direction for $f$ at $x^k$. Global convergence of the steepest descent method is shown under convexity requirements on the problem [60, Chapter 3]. The steepest descent method is important from a theoretical point of view, but it is quite slow in practice.

The Newton method, which performs a second-order approximation of the objective function and enjoys a better convergence rate [51, Ch. 7], is obtained from (3) when $B_k$ is the Hessian matrix $\nabla^2 f(x^k)$ of the objective function. The Newton method converges rapidly when started close enough to a local optimum. A drawback is that it may not converge when started at an arbitrary point. The Newton method acts as a descent method at the iterate $x^k$ if the Hessian matrix $\nabla^2 f(x^k)$ is positive definite, and as an ascent method if it is negative definite. The lack of positive definiteness of the Hessian may be cured by adding to $\nabla^2 f(x^k)$ a diagonal matrix $D_k$, such that $\nabla^2 f(x^k) + D_k$ becomes positive definite.
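As a concrete illustration of this modification, the sketch below, which is not taken from the thesis, uses the common special case $D_k = \tau I$ and increases $\tau$ until a Cholesky factorization of the modified Hessian succeeds, i.e. until $\nabla^2 f(x^k) + \tau I$ is positive definite.

import numpy as np

def modified_newton_direction(grad, hess, tau0=1e-3):
    # Returns d solving (H + tau I) d = -grad, where tau >= 0 is increased
    # until H + tau I is positive definite (Cholesky succeeds).
    n = hess.shape[0]
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(hess + tau * np.eye(n))
            break
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, tau0)
    y = np.linalg.solve(L, -grad)       # forward substitution with the Cholesky factor
    return np.linalg.solve(L.T, y)      # backward substitution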

Accelerating the steepest descent method while avoiding the evaluation, storage and inversion of the Hessian matrix motivates the existence of quasi-Newton methods as well as conjugate direction methods. In conjugate direction methods (see e.g. [51, Ch. 8]) for unconstrained convex quadratic optimization, one performs line searches consecutively in a set of directions, $d^1, \ldots, d^n$, mutually conjugate with respect to the Hessian $\nabla^2 f(x)$ of the objective (i.e. fulfilling $(d^i)^T \nabla^2 f(x) d^j = 0$ for $i \ne j$). In $\mathbb{R}^n$ the optimum is then identified after $n$ line searches [51, p. 241, Expanding Subspace Theorem]. In conjugate gradient methods, one obtains conjugate directions by "conjugating" the gradient direction with respect to the previous search direction, that is, $d^k = -\nabla f(x^k) + \beta_k d^{k-1}$, with $\beta_k$ chosen so that $d^k$ is conjugate to $d^{k-1}$, which is accomplished by the choice
\[
  \beta_k = \frac{\nabla f(x^k)^T \nabla f(x^k)}{\nabla f(x^{k-1})^T \nabla f(x^{k-1})}.
\]
In the quadratic case, $d^k$ then in fact becomes conjugate to all previous search directions.

In 1964, Fletcher and Reeves [23] introduced the nonlinear conjugate gradient method, known as the Fletcher–Reeves (FR) method. The method is shown to be globally convergent when all the search directions are descent directions. The method can produce a poor search direction in the sense that the search direction $d^k$ is almost orthogonal to $-\nabla f(x^k)$, which results in a small improvement in the objective value. Therefore, whenever this happens, using a steepest descent direction is advisable. The search direction may fail to be a descent direction unless the step length satisfies the strong Wolfe conditions [60]. The FR method may take small steps and thus have bad numerical performance (see [10]).

The Polak–Ribière method has proved to be more efficient in practice. The two methods differ in the formula for calculating $\beta_k$:
\[
  \beta_k = \frac{\nabla f(x^k)^T \left( \nabla f(x^k) - \nabla f(x^{k-1}) \right)}{\nabla f(x^{k-1})^T \nabla f(x^{k-1})}.
\]
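For illustration, a minimal nonlinear conjugate gradient loop using the Polak–Ribière formula might look as follows; the restart to the steepest descent direction when the computed direction fails to be a descent direction follows the advice above, and the line search routine is a user-supplied argument assumed to satisfy suitable step length conditions. This sketch is not code from the thesis.

import numpy as np

def polak_ribiere_cg(f, grad, x0, line_search, max_iter=200, tol=1e-8):
    # Nonlinear conjugate gradient method with the Polak-Ribiere choice of beta.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        t = line_search(f, grad, x, d)            # step length t_k
        x = x + t * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)      # Polak-Ribiere formula
        d = -g_new + beta * d
        if g_new @ d >= 0.0:                      # not a descent direction: restart
            d = -g_new
        g = g_new
    return x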

Quasi-Newton methods are based on approximations of the inverse of the Hessian. In these methods, the search direction is chosen as $d^k = -D_k \nabla f(x^k)$, where $D_k$ is an approximation of the inverse Hessian. The quasi-Newton methods use information gathered from the iterates, $x^k$ and $x^{k+1}$, and the gradients, $\nabla f(x^k)$ and $\nabla f(x^{k+1})$. The well-known Davidon–Fletcher–Powell method, see e.g. [1, 51, 60], has the property that in the quadratic case it generates the same directions as the conjugate direction method, while constructing the inverse of the Hessian. The method starts with a symmetric and positive definite matrix $D_0$ and iteratively updates the approximate inverse Hessian by the formula
\[
  D_{k+1} = D_k + \frac{p^k (p^k)^T}{(p^k)^T q^k} - \frac{(D_k q^k)\,(q^k)^T D_k}{(q^k)^T D_k q^k},
\]
where $q^k = \nabla f(x^{k+1}) - \nabla f(x^k)$ and $p^k = t_k d^k$, with $d^k = -D_k \nabla f(x^k)$ and $t_k = \arg\min_{t} f(x^k + t d^k)$.
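A direct transcription of the DFP update, added here only as an illustration, maintains the approximate inverse Hessian as a NumPy array.

import numpy as np

def dfp_update(D, p, q):
    # Davidon-Fletcher-Powell update of the inverse Hessian approximation:
    #   D_{k+1} = D_k + p p^T / (p^T q) - (D_k q)(D_k q)^T / (q^T D_k q),
    # where p = x^{k+1} - x^k and q = grad f(x^{k+1}) - grad f(x^k).
    Dq = D @ q
    return D + np.outer(p, p) / (p @ q) - np.outer(Dq, Dq) / (q @ Dq)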


2.1.2 Line search

To ensure global convergence of a descent method, a line search can be performed. A step length that gives a substantial reduction of the objective value is obtained, usually by the minimization of a one-dimensional function. The one-dimensional optimization problem can be formulated as
\[
  \min_{t > 0} \; \varphi(t) = f(x^k + t d^k),
\]
where $t$ is the step length to be determined. In practice, an exact line search is not recommended since it is often too time consuming. As a matter of fact, an effective step $t_k$ need not be near a minimizer of $\varphi(t)$. A typical line search procedure requires an initial estimate $t_k^0$ and generates a sequence $\{t_k^i\}$ that terminates when the step length satisfies certain conditions. An obvious condition on $t_k$ is a reduction of the objective value, i.e. $f(x^k + t_k d^k) < f(x^k)$.

To get convergence, the line search needs to obtain a sufficient decrease, i.e. to satisfy a condition like the strong Wolfe conditions, which are
\[
  f(x^k + t_k d^k) \le f(x^k) + \eta_1 t_k \nabla f(x^k)^T d^k, \qquad (7a)
\]
\[
  \eta_2 \left| \nabla f(x^k)^T d^k \right| \ge \left| \nabla f(x^k + t_k d^k)^T d^k \right|, \qquad (7b)
\]
where $0 < \eta_1 < \eta_2 < 1$. In practice $\eta_1$ is chosen to be quite small, usually $\eta_1 = 10^{-4}$. A typical value for $\eta_2$ is 0.9 if $d^k$ is obtained by a Newton or quasi-Newton method and 0.1 if $d^k$ is obtained by a nonlinear conjugate gradient method (see e.g. [60]). The condition (7a) is also known as the Armijo condition. The strong condition (7b) does not allow the directional derivative $\nabla f(x^k + t_k d^k)^T d^k$ to be too positive. A nice discussion of practical line searches can be found in [60].
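As an illustration, a simple backtracking procedure enforcing the Armijo condition (7a) is sketched below; a practical strong Wolfe line search would in addition check (7b) and possibly increase the step, see [60]. The routine and its parameter names are illustrative and not taken from the thesis.

def armijo_backtracking(f, grad, x, d, t0=1.0, eta1=1e-4, shrink=0.5, max_iter=50):
    # Backtracking line search: shrink t until the Armijo condition (7a) holds.
    # x and d are assumed to be NumPy arrays; grad(x) @ d is the directional derivative.
    g_dot_d = grad(x) @ d              # negative for a descent direction
    t = t0
    for _ in range(max_iter):
        if f(x + t * d) <= f(x) + eta1 * t * g_dot_d:
            return t
        t *= shrink
    return t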

Another way to affect the convergence properties of an optimization algorithm is to use a trust region. Trust region methods avoid line searches by bounding the length of the search direction $d$. In the context of a Newton type method, the second-order approximation
\[
  f(x^k) + \nabla f(x^k)^T d + \tfrac{1}{2} d^T \nabla^2 f(x^k) d
\]
is trusted only in a neighborhood of $x^k$, i.e. if $\|d\|_2 \le \Delta_k$ for some positive scalar $\Delta_k$, even when $\nabla^2 f(x^k)$ is not positive semi-definite. The idea is that when $\nabla^2 f(x^k)$ is badly conditioned, $\Delta_k$ should be kept low, and thereby the algorithm turns into a steepest descent-like method. Even if $\nabla f(x^k) = 0$, progress is made if the Hessian $\nabla^2 f(x^k)$ is not positive definite, i.e. trust region algorithms move away from stationary points that are saddle points or local maxima.

Line search methods and trust region methods differ in the order in which they choose the direction and the step length of the move to the next iterate. Trust region methods first choose the maximum distance and then determine the new direction. The choice of the trust region size is crucial and it is based on the ratio between the actual and the predicted reduction of the objective value. The robustness and strong convergence characteristics have made trust regions popular, especially for non-convex optimization [1, 4, 11, 57, 60].

2.2 Linearly constrained optimization

The direction determination step of Algorithm 1 produces a feasible descent direction. A direction $d^k$ is feasible if there is a scalar $\alpha > 0$ such that $x^k + t d^k \in X$ for all nonnegative $t \le \alpha$. A steepest descent direction or a Newton direction does not guarantee that feasibility is maintained.

We briefly discuss methods that solve linearly constrained nonlinear optimization problems, that is,
\[
  f^* = \min_{x \in X} f(x), \qquad (LCP)
\]
where $f: \mathbb{R}^n \mapsto \mathbb{R}$ is continuously differentiable and the feasible set $X = \{x : Ax \le b\}$ is a nonempty polytope. The Frank–Wolfe algorithm is one of the most popular methods for solving some instances of such problems.

2.2.1 The Frank–Wolfe method

The Frank–Wolfe, FW, method [28] was originally suggested for quadratic programming problems, but in the original paper it was noted that the method could be applied also to linearly constrained convex programs.

The FW method approximates the objective $f$ of (LCP) by a first-order Taylor expansion (linearization) at the current iterate $x^k$, giving an affine minorant $f_k$ to $f$, i.e.
\[
  f_k(x) = f(x^k) + \nabla f(x^k)^T (x - x^k).
\]
Then, the FW method determines a feasible descent direction by minimizing $f_k$ over $X$:
\[
  f_k^* = \min_{x \in X} f_k(x). \qquad (FWSUB)
\]

We denote by $y^k$ the solution of this linear program, which is called the FW subproblem. The Frank–Wolfe direction is $d^k = y^k - x^k$. Note that if $f$ is convex, $f_k^*$ is a lower bound on $f^*$, a fact that may be used for terminating the method.

The next step of the method is to perform a line search in the FW direction, i.e. a one-dimensional minimization of $f$ along the line segment between the current iterate $x^k$ and the point $y^k$. The point where this minimum is attained (at least approximately) is chosen as the next iterate, $x^{k+1}$. Note that $f(x^{k+1})$ is an upper bound on $f^*$.

The algorithm generally makes good progress towards an optimum during the first few iterations, but convergence often slows down substantially when close to an optimum. The reason for this is that the search directions of the FW method, in late iterations, tend to become orthogonal to the gradient of the objective function, leading to extreme zigzagging (e.g. [63, p. 102]). For this reason the algorithm is perhaps best used to find an approximate solution. It can be shown that the worst case convergence rate is sublinear [4]. In order to improve the performance of the algorithm, there are many suggestions for modifications of the direction finding [30, 49, 53] and the line search steps [64, 72]. There are also other more complex extensions of the FW method, such as simplicial decomposition, introduced by von Hohenbalken [70].
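To make the scheme concrete, the sketch below, which is an illustration and not code from the thesis, applies the FW method to a differentiable objective over the unit simplex. There the LP subproblem (FWSUB) is solved in closed form by placing all weight on the coordinate with the smallest partial derivative, and the line search is replaced by the simple predetermined step length 2/(k+2).

import numpy as np

def frank_wolfe_simplex(grad, x0, max_iter=500):
    # Frank-Wolfe method on the unit simplex {x >= 0, sum(x) = 1}.
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        y = np.zeros_like(x)
        y[np.argmin(g)] = 1.0           # closed-form solution of the LP subproblem (FWSUB)
        d = y - x                       # Frank-Wolfe direction
        t = 2.0 / (k + 2.0)             # predetermined step length
        x = x + t * d
    return x

Since each iterate is a convex combination of feasible points, the iterates remain feasible throughout.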

2.2.2 Simplicial decomposition

The idea of simplicial decomposition is to build up an inner approximation of the feasible set $X$, founded on Carathéodory's theorem (e.g. [3]), which states that any point in the convex hull of a set $X \subset \mathbb{R}^n$ can be expressed as a convex combination of at most $1 + \dim X$ points of the set $X$. Thus, any feasible solution of (LCP) can be represented as a convex combination of the extreme points of the set $X$. The simplicial decomposition algorithm alternates between a master problem, which minimizes the objective $f$ over the convex hull of a number of extreme points of $X$, and a subproblem that generates a new extreme point of the feasible set $X$ and, if $f$ is convex on $X$, also provides a lower bound on the optimal value.


Given the current iterate $x^k$ and the extreme points $y^i$, $i = 1, \ldots, k+1$, generated by the subproblem, the next iterate is obtained from the master problem
\[
  \begin{array}{ll}
  \min & f\!\left(x^k + \sum_{i=0}^{k+1} \lambda_i (y^i - x^k)\right) \\[4pt]
  \text{s.t.} & \sum_{i=0}^{k+1} \lambda_i \le 1, \qquad (10) \\[4pt]
  & \lambda_i \ge 0, \quad i = 0, \ldots, k+1,
  \end{array}
\]
where $y^0 = x^0$. This problem is typically of lower dimension than the problem (LCP).

The advantage of using an inner representation of $X$ is that it is much easier to deal with the linear constraints. The disadvantage is that the number of extreme points can be very large for a large-scale problem. The algorithm may also need a large number of them in order to span an optimal solution to (LCP). In [70] von Hohenbalken shows finite convergence of the simplicial decomposition algorithm, in the number of master problems, even if extreme points with zero weights are removed from one master problem to the next [71]. This result allows for the use of column dropping, which is essential to gain computational efficiency in large-scale applications.

When the algorithm throws away every previously generated point, we are back to the Frank–Wolfe algorithm. The number of stored extreme points is crucial for the convergence properties, since if it is too small the behavior can be as bad as that of the Frank–Wolfe algorithm. We refer to [1] for further information about column dropping and simplicial decomposition.

Hearn et al. [37] extend the simplicial decomposition concept to the restricted simplicial decomposition algorithm [38, 69], in which the number of stored extreme points is bounded by a parameter $r$. Convergence to an optimal solution is obtained provided that $r$ is greater than the dimension of the optimal face of the feasible set. Another extension of the simplicial decomposition strategy, known as disaggregate simplicial decomposition, is made by Larsson and Patriksson [45], who take advantage of Cartesian product structures. The simplicial decomposition strategy has been applied mainly to certain classes of structured linearly constrained convex programs, where it has been shown to be successful.
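As an illustration only, and not the implementation used in the thesis, the loop below alternates between the FW subproblem as a column generator and a master problem of the form (10) over the retained points; the oracle lp_oracle and the use of SciPy's SLSQP solver for the master problem are assumptions of this sketch, and column dropping is omitted for brevity.

import numpy as np
from scipy.optimize import minimize

def simplicial_decomposition(f, grad, lp_oracle, x0, n_iter=30):
    # lp_oracle(c) is assumed to return an extreme point of X minimizing c^T y
    # (i.e. it solves the FW subproblem for the linearized objective).
    x = np.asarray(x0, dtype=float)
    cols = [x.copy()]                                   # y^0 = x^0
    for _ in range(n_iter):
        cols.append(lp_oracle(grad(x)))                 # new extreme point from the subproblem
        Y = np.array(cols)

        def master_obj(lam):                            # x(lam) = x + sum_i lam_i (y^i - x)
            return f(x + (Y - x).T @ lam)

        cons = {"type": "ineq", "fun": lambda lam: 1.0 - lam.sum()}
        res = minimize(master_obj, np.zeros(len(cols)), method="SLSQP",
                       bounds=[(0.0, 1.0)] * len(cols), constraints=[cons])
        x = x + (Y - x).T @ res.x                       # next iterate from the master problem
    return x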


2.3 General constrained optimization

We here consider the constrained nonlinear optimization problem
\[
  \min_{x} \; \{\, f(x) \mid g(x) \le 0 \,\}, \qquad (NLP)
\]
where $f: \mathbb{R}^n \mapsto \mathbb{R}$ and $g: \mathbb{R}^n \mapsto \mathbb{R}^m$ are continuously differentiable functions. There are plenty of methods that attempt to solve optimization programs with general constraints (see e.g. [34, 60]). A frequently employed solution principle is to alternate between the solution of an approximate problem and a line search with respect to a merit function. The merit function measures the degree of non-optimality of any tentative solution. The sequential linear programming (SLP) and sequential quadratic programming (SQP) approaches are methods based on this principle.

2.3.1 Sequential linear programming

The sequential linear programming methods have become popular because of their simplicity and robustness for large-scale problems. They are based on the application of first-order Taylor series expansions. The idea is to linearize all nonlinear parts (objective and/or constraints) and, thereafter, to solve the resulting linear programming problem. The solution to this LP problem is used as a new iterate. The scheme is continued until some stopping criterion is met.

The SLP approach originates from Griffith and Stewart [35]. Their method is called the Method of Approximation Programming, and utilizes an LP approximation of the type
\[
  \begin{array}{ll}
  \min & \nabla f(x^k)^T (x - x^k) \qquad (SLPSUB) \\[4pt]
  \text{s.t.} & g(x^k) + \nabla g(x^k)(x - x^k) \le 0, \\[4pt]
  & \|x - x^k\|_2 \le \Delta_k,
  \end{array}
\]
where $\Delta_k$ is some positive scalar. The linearity of the subproblem makes the choice of the step size crucial. It is necessary to impose trust regions on the steps taken in order to ensure convergence and numerical efficiency of an SLP algorithm. The trust regions must be neither too large nor too small. If they are too small, the procedure will terminate prematurely or move slowly towards an optimum; if they are too large, infeasibility or oscillation may occur. The SLP methods are most successful when curvature effects are negligible. For problems which are highly nonlinear, SLP methods may converge slowly and become unreliable. A variety of numerical methods have been proposed [7, 11, 12, 21, 24, 57, 61, 78] to improve the convergence properties of SLP algorithms.
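For illustration, one iteration of an SLP scheme of the above type can be coded as follows, with an infinity-norm (box) trust region in place of the Euclidean ball so that the subproblem is a plain LP. This is a sketch under these assumptions, not the algorithm developed in the thesis; the callables f_grad, g_val and g_jac are hypothetical user-supplied routines.

import numpy as np
from scipy.optimize import linprog

def slp_step(x, delta, f_grad, g_val, g_jac):
    # Linearized subproblem in the step d = x - x^k:
    #   minimize   grad f(x)^T d
    #   subject to g(x) + Jg(x) d <= 0  and  |d_i| <= delta (box trust region).
    c = f_grad(x)
    A_ub = g_jac(x)                                  # m x n Jacobian of g at x
    b_ub = -np.asarray(g_val(x), dtype=float)        # Jg(x) d <= -g(x)
    bounds = [(-delta, delta)] * len(x)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError("linearized subproblem was not solved; adjust the trust region")
    return x + res.x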

One of the milestones in the development of the SLP concept is the work of Fletcher and Sainz de la Maza [24]. They describe an algorithm that solves a linear program to identify an active set of constraints, followed by the solution of an equality constrained quadratic problem (EQP). This sequential linear programming - EQP (SLP-EQP) method is motivated by the fact that solving quadratic subproblems with inequality constraints can be expensive. The cost of solving one linear program followed by an equality constrained quadratic problem would be much lower.

2.3.2 Sequential quadratic programming

The method of sequential quadratic programming, suggested by Wilson [74] in 1963 for the special case of convex optimization, has been of great interest for solving large-scale constrained optimization problems with nonlinear objective and constraints. An SQP method obtains search directions from a sequence of QP subproblems. Each QP subproblem minimizes a quadratic approximation of the Lagrangian function subject to linear constraints. At the primal-dual point $(x^k, u^k)$ the SQP subproblem can be written as
\[
  \begin{array}{ll}
  \min & \nabla f(x^k)^T (x - x^k) + \frac{1}{2}(x - x^k)^T \nabla^2_{xx} L(x^k, u^k)(x - x^k) \\[4pt]
  \text{s.t.} & g(x^k) + \nabla g(x^k)(x - x^k) \le 0, \qquad (SQPSUB)
  \end{array}
\]
where $\nabla^2_{xx} L(x^k, u^k)$ denotes the Hessian of the Lagrangian. The SQP algorithm in this form is a local algorithm. If the algorithm starts at a point in the vicinity of a local minimum, it has quadratic local convergence. A line search or a trust region method is used to achieve global convergence from a distant starting point. In the line search case, the new iterate is obtained by searching along the direction generated by solving (SQPSUB) until a certain merit function is sufficiently decreased. A variety of merit functions are described in e.g. [60, Chapter 15]. Another way to find the next iterate is to use trust regions. SQP methods have proved to be efficient in practice. They typically require fewer function evaluations than some of the other methods. For an overview of SQP methods, see [5].

A more recent development is the introduction of the filter concept by Fletcher and Leyffer [20]. The main advantage of using the filter concept is to avoid using a merit function. The filter allows a trial step to be accepted if it reduces either the objective function or a constraint violation function. The filter is used in trust region type algorithms as a criterion for accepting or rejecting a trial step. Global convergence of an SLP-filter algorithm is shown in [12, 21] and the global convergence properties of an SQP-filter algorithm are discussed in [19, 22, 68].

2.4 The Lagrangian dual problem

Suppose that (NLP) has a non-empty and compact set of optimal solutions. Let $u \in \mathbb{R}^m_+$ be a vector of Lagrangian multipliers associated with the constraints $g(x) \le 0$, and consider the Lagrangian function $L(x, u) = f(x) + u^T g(x)$.

Under a suitable constraint qualification the problem (NLP) can be restated as the saddle point problem (e.g. [3])
\[
  \max_{u \ge 0} \; \min_{x} \; L(x, u) = f(x) + u^T g(x). \qquad (SPP)
\]
If a point $(x^*, u^*)$ solves (SPP), then, according to the saddle point theorem ([4, p. 427]), $x^*$ is a local minimum of (NLP). Furthermore, if the problem (NLP) is convex, then $x^*$ is a global optimal solution to the problem (NLP) (see [3]).

The saddle point theorem gives sufficient conditions for optimality. By introducing the Lagrangian function for the (NLP) problem with slack variables in the constraints, $g_i(x) + s_i^2 = 0$, $i = 1, \ldots, m$, necessary conditions for a local optimum of a general constrained optimization problem can be established. A point $(x^*, u^*, s^*)$ is a stationary point of (SPP) if it satisfies $\nabla L(x^*, u^*, s^*) = 0$ and the Hessian with respect to $x$ and $s$ is positive semidefinite. These requirements can be written as
\[
  \nabla f(x^*) + \nabla g(x^*)^T u^* = 0, \qquad (14a)
\]
\[
  u^{*T} g(x^*) = 0, \qquad (14b)
\]
\[
  g(x^*) \le 0, \qquad (14c)
\]
\[
  u^* \ge 0. \qquad (14d)
\]

The conditions (14a)–(14d) are known as the Karush–Kuhn–Tucker (KKT) conditions, and a point that satisfies them is known as a KKT point. The condition (14a) means that there is no descent direction, with respect to $x$, for $L(x, u^*)$ from $x^*$. Additionally it is required that the complementarity condition $u^{*T} g(x^*) = 0$ is fulfilled. The equation (14b) says that $u_i^*$ can be strictly positive only when the corresponding constraint $g_i(x^*)$ is active, that is, $g_i(x^*) = 0$ holds. The KKT conditions are first-order necessary conditions, and they may be satisfied by local maxima, local minima and other vectors. The second-order condition
\[
  d^T \nabla^2_{xx} L(x^*, u^*)\, d > 0, \quad \text{for all } d \ne 0 \text{ with } \nabla g(x^*) d = 0,
\]
is used to guarantee that a given point $x^*$ is a local minimum.
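As a small worked example, added here only for illustration, consider minimizing $f(x) = x_1^2 + x_2^2$ subject to the single constraint $g(x) = 1 - x_1 - x_2 \le 0$. The conditions (14a)–(14d) read
\[
\begin{aligned}
& 2x_1^* - u^* = 0, \qquad 2x_2^* - u^* = 0, \\
& u^*(1 - x_1^* - x_2^*) = 0, \\
& 1 - x_1^* - x_2^* \le 0, \qquad u^* \ge 0,
\end{aligned}
\]
and are satisfied by $x^* = (1/2, 1/2)^T$ with $u^* = 1$; since the problem is convex, this KKT point is a global minimum.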

The methods that solve (NLP) problems can be divided into methods that work in the primal, dual and primal-dual spaces. The primal algorithms work with feasible solutions and improve the value of the objective function. Computational difficulties may arise from the necessity to remain within the feasible region, particularly for problems with nonlinear constraints. For problems with linear constraints they enjoy fast convergence.

The dual methods attempt to solve the dual problem. In this case a direction determination step should find an ascent direction for the dual objective function, which is always concave even when the primal problem is non-convex. This means that a local optimum of (SPP) is also a global one. The main difficulty of the dual problem is that it may be non-differentiable and is not explicitly available.

Primal-dual methods [27, 31, 32, 50] are methods that simultaneously work in the primal and dual spaces. This principle is widespread in the field of interior point methods. A nice book that covers the theoretical properties as well as the practical and computational aspects of primal-dual interior-point methods is the one by Stephen J. Wright [75].

2.5 Convergence

An important subject when considering methods in nonlinear optimization is their local and global convergence properties. Local convergence properties measure the ultimate speed of convergence, and can be used to determine the relative advantage of one algorithm to another. If, for arbitrary starting points, an algorithm generates a sequence of points converging to a solution,


then the algorithm is said to be globally convergent. Many algorithms for solving nonlinear programming problems are not globally convergent, but it is often possible to modify such algorithms so as to achieve global convergence.

The subject of global convergence is treated by Zangwill [77]. We here think of an algorithm as a mapping; that is, the algorithm is represented as a point-to-set map $A$ that maps the iteration point $x^k$ to a set $A(x^k)$ to which $x^{k+1}$ will belong, i.e. $x^{k+1} \in A(x^k)$.

Definition. A point-to-set map $A$ is closed at $x$ if for all sequences $\{x^k\}_{k=1}^{\infty} \to x$ and $\{y^k\}_{k=1}^{\infty} \to y$ with $y^k \in A(x^k)$, we have $y \in A(x)$.

The Convergence Theorem [77, p. 91] establishes global convergence of closed algorithmic point-to-set maps.

Convergence theorem. Let $A$ be an algorithm on $X$, and suppose that, given $x^1$, the sequence $\{x^k\}_{k=1}^{\infty}$ is generated satisfying $x^{k+1} \in A(x^k)$. Let a solution set $\Gamma \subset X$ be given and suppose that

i) all points $x^k$ are contained in a compact set $S \subset X$;

ii) there is a continuous function $Z$ on $X$ such that

   a) if $x \notin \Gamma$, then $Z(y) < Z(x)$ for all points $y \in A(x)$,
   b) if $x \in \Gamma$, then $Z(y) \le Z(x)$ for all points $y \in A(x)$;

iii) the mapping $A(x)$ is closed at points outside $\Gamma$.

Then the limit of any convergent subsequence of $\{x^k\}_{k=1}^{\infty}$ is a solution.

Requirement ii) amounts to the existence of a merit function, which can be used to measure the progress of an algorithm.

3 Outline of the thesis and contribution

The attention in this thesis is on the development of feasible descent direction algorithms. The thesis consists of five papers.

The first paper, ”The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment”, treats the traffic assignment problem [63]. In this problem, travelers between different origin–destination pairs in a congested urban transportation network want to travel along their shortest routes (in time). However, the travel times depend on the congestion levels, which, in turn, depend on the route choices. The problem is to find the equilibrium traffic flows, where each traveler indeed travels along his shortest route. It is well known that this equilibrium problem can be stated as a linearly constrained convex minimization problem of the form (LCP), see e.g. [63, Ch. 2].

The conventional Frank–Wolfe, FW, method is frequently used for solving structured linearly constrained optimization problems. We improve the performance of the Frank–Wolfe method by choosing better search directions, based on conjugate directions. In conjugate gradient methods, one obtains search directions by conjugating the gradient direction with respect to the previous search direction. The same trick can be applied to the FW direction.

In the conjugate direction FW method, CFW, we choose the search direction $\tilde d^k$ as
\[
  \tilde d^k = d^k + \beta_k \tilde d^{k-1},
\]
where $d^k$ is the FW direction found by solving the (FWSUB) problem and $\beta_k$ is chosen to make $\tilde d^k$ conjugate to $\tilde d^{k-1}$ with respect to the Hessian $\nabla^2 f(x^k)$.
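For completeness, and as a sketch added here rather than a statement of the thesis's exact formula, the conjugacy requirement $(\tilde d^{k-1})^T \nabla^2 f(x^k)\, \tilde d^k = 0$ together with $\tilde d^k = d^k + \beta_k \tilde d^{k-1}$ gives
\[
  \beta_k = -\,\frac{(\tilde d^{k-1})^T \nabla^2 f(x^k)\, d^k}{(\tilde d^{k-1})^T \nabla^2 f(x^k)\, \tilde d^{k-1}},
\]
provided that the denominator is nonzero.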

Global convergence of the CFW method using an inexact line search is proved. A further refinement of the conjugate direction Frank–Wolfe method is derived by applying conjugation with respect to the last two directions instead of only the last one. The computations in the bi-conjugate Frank–Wolfe method, BFW, are slightly more complicated. This modification outperforms CFW, at least for high iteration counts. The CFW and BFW algorithms were first implemented in the Matlab [55] environment. The promising results spurred us to implement the two algorithms, as well as FW, in the programming language C, to be able to make more detailed investigations on larger networks.

In a limited set of computational tests the new algorithms, applied to the single-class traffic equilibrium problem, turned out to be quite efficient. Our results indicate that CFW and BFW algorithms outperform, for accuracy requirements suggested by Boyce et al. [8], the pure and “PARTANized” Frank–Wolfe, disaggregate simplicial decomposition [45] and origin-based algorithms [2].


We extend the conjugate Frank–Wolfe method to non-convex optimization problems with linear constraints and apply this extension to the multi-class traffic equilibrium problem under social marginal cost (SMC) pricing. In the second paper, ”Multi-Class User Equilibria under Social Marginal Cost Pricing”, we study the model in which the cost of a link may differ between the different classes of users in the same transportation network [15]. Under SMC pricing, the users have to pay a toll for the delays they cause other users. We show that, depending on the formulation, the multi-class SMC pricing equilibrium problem (with different time values) can be stated either as an asymmetric or as a symmetric equilibrium problem. In the latter case, the corresponding optimization problem is in general non-convex. For this non-convex problem, we devise descent methods of Frank–Wolfe type. We apply these methods to a synthetic case based on the Sioux Falls network.

The third paper, ”A Conjugate Direction Frank–Wolfe Method for Non-convex Problems”, generalizes the conjugate Frank–Wolfe method, examines some of its properties for non-convex problems, and shows through limited testing that it seems to be more efficient than Frank–Wolfe, at least for high iteration counts.

Further, we exploit the conjugate Frank–Wolfe algorithm for solving the stochastic transportation problem, for which Frank–Wolfe type methods have been claimed to be efficient [14, 39, 49]. The stochastic transportation problem, first described by Elmaghraby [18] in 1960, can be considered as the problem of determining the shipping volumes from supply points to demand points with uncertain demands that yield the minimal expected total cost. In the fourth paper, ”A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem”, we compare several feasible direction methods for solving this problem.

Besides the conjugate Frank–Wolfe algorithm, we also apply the diagonalized Newton, DN, approach [46]. In this method the direction generation subproblem of the Frank–Wolfe method is replaced by a diagonalized Newton subproblem, based on a second-order approximation of the objective function. The CFW and DN methods do not introduce any further parameters in the solution algorithm; they have a better practical rate of convergence than the Frank–Wolfe algorithm, and they take full advantage of the structure of the problem.

Additionally, an algorithm of FW type but with multi-dimensional search is described in this paper. In the previously discussed approaches for the stochastic transportation problem, the direction finding subproblem is modified in order to improve upon the FW algorithm. Numerical results for the proposed methods, applied to two types of test problems presented in Cooper and LeBlanc [14] and LeBlanc et al. [49], show a performance that is superior to that of the Frank–Wolfe method, and to the heuristic variation of the Frank–Wolfe algorithm used in LeBlanc et al. [49], whenever solutions of moderate or high accuracy are sought.

In paper five, ”A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence”, we utilize ideas from simplicial decomposition (see Section 2.2.2), sequential linear programming (see Section 2.3.1) and duality (see Section 2.4). This results in a novel SLP algorithm for solving problems with a large number of variables and constraints. In particular, the line search step is replaced by a multi-dimensional search. The algorithm is based on inner approximations of both the primal and the dual spaces, it yields both column and constraint generation in the primal space, and its linear programming subproblem differs from the one obtained in traditional SLP methods.

A linear approximation of (SPP) (see Section 2.4) at the current primal and dual points gives a column generation problem which reduces and separates into a primal and a dual column generation problem. These are used to find better approximations of the inner primal and dual spaces. The line search problem of a traditional SLP algorithm is replaced by a minimization problem of the same type as the original one, but with typically fewer variables and fewer constraints. Because of the smaller number of variables and constraints, it should be computationally less demanding than the original problem.

The theoretical results presented in this paper show the convergence of the new method to a point that satisfies the KKT conditions, and thus to a global optimal solution for a convex problem. In the presented algorithm it is not necessary to introduce rules to control the move limits $\Delta_k$, and we may abandon the merit function as well, while still guaranteeing convergence. In the paper, the suggested idea of using multi-dimensional search is also outlined for the case of sequential quadratic programming algorithms.

We apply the new method to a selection of the Hock–Schittkowski nonlinear test problems and report preliminary computational results in a Matlab environment.

The appended papers are co-authored; my contribution consists of active involvement in the development of the solution methods, in the writing process and in the analysis of the results. My contributions also include the implementation and testing of the solution algorithms described in the papers.

4 Chronology and publication status

The papers that have contributed to the contents of the thesis arose in the following order.

”A Conjugate Direction Frank–Wolfe Method with Applications to the Traffic Assignment Problem”, co-authored with Per Olov Lindberg.

Published in Operations Research Proceedings 2002, pp. 133-138, Springer, 2003. The paper is also presented in my licentiate thesis [16].

”Improved Frank–Wolfe Directions through Conjugation with Applications to the Traffic Assignment Problem”, co-authored with Per Olov Lindberg.

Published as Technical Report LiTH-MAT-R-2003-6, Department of Mathematics, Linköping University. The paper is part of my licentiate thesis [16].

”Multi-Class User Equilibria under Social Marginal Cost Pricing”, co-authored with Leonid Engelson and Per Olov Lindberg.

Published in Operations Research Proceedings 2002, pp. 174-179, Springer, 2003. This paper is presented as paper II in the thesis and is also presented in my licentiate thesis [16].


”A Conjugate Direction Frank–Wolfe Method for Non-convex Problems”, co-authored with Per Olov Lindberg.

Published as Technical Report LiTH-MAT-R-2003-09, Department of Mathematics, Linköping University. The paper is presented as paper III in this thesis and is also in my licentiate thesis [16].

”The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment”, co-authored with Per Olov Lindberg.

The paper is under review for publication in the journal Transportation Science. This paper is presented as paper I in this thesis and is an extension of the first two papers above.

”A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence”, co-authored with Maud Göthe-Lundgren, Torbjörn Larsson, Michael Patriksson and Clas Rydergren.

The paper is submitted for publication and is presented as paper V in this thesis.

”A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem”, co-authored with Torbjörn Larsson, Michael Patriksson and Clas Rydergren.

The paper is submitted for publication and is presented as paper IV in this thesis.


Bibliography

[1] N. Andréasson, A. Evgrafov, and M. Patriksson. An Introduction to Continuous Optimization: Foundations and fundamental algorithms. Studentlitteratur, 2005.

[2] H. Bar-Gera. Origin-based algorithms for the traffic assignment problem. Transportation Sci., 36(4):398–417, 2002.

[3] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, New York, NY, second edition, 1993.

[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, second edition, 1999.

[5] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, pages 1–51, 1995.

[6] P. T. Boggs, J. W. Tolle, and A. J. Kearsley. A truncated SQP algorithm for large scale nonlinear programming problems. In Advances in optimization and numerical analysis (Oaxaca, 1992), volume 275 of Math. Appl., pages 69–77. Kluwer Acad. Publ., Dordrecht, 1994.

[7] J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal. Numerical Optimization – Theoretical and Practical Aspects. Universitext. Springer Verlag, Berlin, second edition, 2006.

[8] D. Boyce, B. Ralevic-Dekic, and H. Bar-Gera. Convergence of traffic assignments: How much is enough? In 16th Annual International EMME/2 Users' Group Conference, Albuquerque, NM, 2002.

[9] M. Bruynooghe, A. Gibert, and M. Sakarovitch. Une méthode d'affectation du trafic. In Proceedings of the 4th International Symposium on the Theory of Road Traffic Flow, pages 198–204. Bundesministerium für Verkehr, Bonn, Karlsruhe, 1969.

[10] Y. Censor and S. A. Zenios. Parallel optimization. Numerical Mathematics and Scientific Computation. Oxford University Press, New York, 1997.

[11] T. Y. Chen. Calculation of the move limits for the sequential linear programming method. Internat. J. Numer. Methods Engrg., 36(15):2661–2679, 1993.

[12] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Math. Programming, 96(1):161–177, 2003.

[13] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. LANCELOT: a Fortran package for large-scale nonlinear optimization (Release A). In Springer Series in Computational Mathematics, volume 17, 1992.

[14] L. Cooper and L. J. LeBlanc. Stochastic transportation problems and other network related convex problems. Naval. Res. Logist. Quart., 24(2):327–337, 1977.

[15] S. Dafermos. Toll patterns for multiclass-user transportation networks. Transportation Sci., 7:211–223, 1973.

[16] M. Daneva. Improved Frank-Wolfe directions with applications to the traffic assignment problem. Linköping Studies in Science and Technology. Theses No. 1023. Department of Mathematics, Linköping University, 2003.

[17] G. B. Dantzig. Linear programming and extensions. Princeton University Press, Princeton, N.J., 1963.

[18] S. E. Elmaghraby. Allocation under uncertainty when the demand has continuous d.f. Management Sci., 6:270–294, 1960.

[19] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM J. Optim., 13(3):635–659, 2002.

[20] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Technical Report 171, Department of Mathematics, University of Dundee, Scotland, 1996.

[21] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SLP-filter algorithm. Technical Report 183, Department of Mathematics, University of Dundee, Scotland, 1998.

[22] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM J. Optim., 13(1):44–59, 2002.

[23] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. Comput. J., 7:149–154, 1964.

[24] R. Fletcher and E. Sáinz de la Maza. Nonlinear programming and nonsmooth optimization by successive linear programming. Math. Programming, 43(3):235–256, 1989.

[25] C. A. Floudas. A global optimization approach for Lennard-Jones microclusters. Journal of Chemical Physics, 97:7667–7678, 1992.

[26] A. Forsgren and Ph. E. Gill. Interior methods for nonlinear optimization. SIAM Rev., 44:525–597, 2002.

[27] A. Forsgren, Ph. E. Gill, and M. H. Wright. Primal-dual interior methods for nonconvex nonlinear programming. SIAM J. Optim., 8:1132–1152, 1998.

[28] M. Frank and Ph. Wolfe. An algorithm for quadratic programming. Naval Res. Logist. Quart., 3:95–110, 1956.

[29] L. Fratta, M. Gerla, and L. Kleinrock. The flow deviation method: An approach to store-and-forward communication network design. Networks, 3:97–133, 1973.

[30] M. Fukushima. A modified Frank-Wolfe algorithm for solving the traffic assignment problem. Transportation Res. Part B, 18(2):169–177, 1984.

[31] E. M. Gertz and Ph. E. Gill. A primal-dual trust region algorithm for nonlinear optimization. Math. Program., 100(1):49–94, 2004.

[32] Ph. E. Gill, W. Murray, D. B. Ponceleón, and M. A. Saunders. Primal-dual methods for linear programming. Math. Programming, 70(3, Ser. A):251–277, 1995.

[33] Ph. E. Gill, W. Murray, and M. A. Saunders. SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev., 47(1):99–131, 2005.


[34] N. Gould, D. Orban, and Ph. Toint. Numerical methods for large-scale nonlinear optimization. Acta Numerica, pages 299–361, 2005.

[35] R. E. Griffith and R. A. Stewart. A nonlinear programming technique for the optimization of continuous processing systems. Management Sci., 7:379–392, 1960/1961.

[36] R. Haugen. In Modern Investment Theory, pages 92–130. 1997.

[37] D. W. Hearn, S. Lawphongpanich, and J. A. Ventura. Finiteness in restricted simplicial decomposition. Oper. Res. Lett., 4(3):125–130, 1985.

[38] D. W. Hearn, S. Lawphongpanich, and J. A. Ventura. Restricted simplicial decomposition: computation and extensions. Math. Programming Study, 31:99–118, 1987.

[39] K. Holmberg. Efficient decomposition and linearization methods for the stochastic transportation problem. Comput. Optim. Appl., 4(4):293– 316, 1995.

[40] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984.

[41] V. G. Kartavenko, K. A. Gridnev, and W. Greiner. Nonlinear effects in nuclear cluster problem. Int. J. Mod. Phys., E7:287–299, 1998.

[42] D. M. Kreps. Course in Microeconomic Theory. Princeton University Press, New Jersey, 1990.

[43] L. Lamberti and C. Pappalettere. Move limits definition in structural optimization with sequential linear programming. I. Optimization algorithm. Comput. & Structures, 81(4):197–213, 2003.

[44] L. Lamberti and C. Pappalettere. Move limits definition in structural optimization with sequential linear programming. II. Numerical examples. Comput. & Structures, 81(4):215–238, 2003.

[45] T. Larsson and M. Patriksson. Simplicial decomposition with disaggregated representation for the traffic assignment problem. Transportation Sci., 26:4–17, 1992.

[46] T. Larsson, M. Patriksson, and C. Rydergren. An efficient solution method for the stochastic transportation problem. Linköping Studies in Science and Technology. Theses No. 702. Department of Mathematics, Linköping University, 1998.

[47] L. S. Lasdon and A. D. Waren. Large scale nonlinear programming. Computers and Chemical Engineering, 7(5):595–613, 1983.

[48] L. J. LeBlanc. Mathematical programming algorithms for large scale network equilibrium and network design problems. PhD thesis, IE/MS Dept, Northwestern University, Evanston, IL, 1973.

[49] L. J. LeBlanc, R. V. Helgason, and D. E. Boyce. Improved efficiency of the Frank-Wolfe algorithm for convex network programs. Transportation Sci., 19(4):445–462, 1985.

[50] X. Liu and J. Sun. A robust primal-dual interior-point algorithm for nonlinear programs. SIAM J. Optim., 14(4):1163–1186, 2004.

[51] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, 1984.

[52] J. T. Lundgren. Optimization approaches to travel demand modelling. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 1989.

[53] M. Lupi. Convergence of the Frank-Wolfe algorithm in transportation network. Civil Engineering Systems, 3:7–15, 1986.

[54] R. Markland and J. Sweigart. Quantitative Methods: Applications to Managerial Decision Making. John Wiley & Sons, New York, 1987.

[55] The MathWorks, Inc., Natick, MA. Matlab User's Guide, 1996.

[56] A. Migdalas, G. Toraldo, and V. Kumar. Nonlinear optimization and parallel computing. Parallel Comput., 29(4):375–391, 2003.

[57] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3):553–572, 1983.

[58] R. M. Nauss and R. E. Markland. Optimization of bank transit check clearing operations. Management Sci., 31(9):1072–1083, 1985.

[59] A. Neumaier. Molecular modeling of proteins and mathematical prediction of protein structure. SIAM Rev., 39(3):407–460, 1997.

[60] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999. Springer series in operations research.

[61] J. Nocedal and Y. Yuan. Combining trust region and line search techniques. Advances in Nonlinear Programming, pages 153–175, 1998.

[62] M. Patriksson. A unified framework of descent algorithms for nonlinear programs and variational inequalities. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 1993.

[63] M. Patriksson. The Traffic Assignment Problem - Models and Methods. VSP, Utrecht, 1994.

[64] W. B. Powell and Y. Sheffi. The convergence of equilibrium algorithms with predetermined step sizes. Transportation Sci., 16(1):45–55, 1982.

[65] C. Rydergren. Decision support for strategic traffic management: an optimization-based methodology. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 2001.

[66] M. Rönnqvist. Applications of Lagrangean dual schemes to structural optimization. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 1993.

[67] K. Schittkowski and C. Zillober. Nonlinear programming: algorithms, software, and applications. From small to very large scale optimization. In System modeling and optimization, volume 166 of IFIP Int. Fed. Inf. Process., pages 73–107. Kluwer Acad. Publ., Boston, MA, 2005.

[68] S. Ulbrich. On the superlinear local convergence of a filter-SQP method. Math. Programming, 100(1, Ser. B):217–245, 2004.

[69] J. A. Ventura and D. W. Hearn. Restricted simplicial decomposition for convex constrained problems. Math. Programming, 59(1):71–85, 1993.

[70] B. von Hohenbalken. A finite algorithm to maximize certain pseudo-concave functions on polytopes. Math. Programming, 9:189–206, 1975.

[71] B. von Hohenbalken. Simplicial decomposition in nonlinear programming algorithms. Math. Programming, 13:49–68, 1977.

[72] A. Weintraub, C. Ortiz, and J. Gonz´alez. Accelerating convergence of the Frank-Wolfe algorithm. Transportation Res. Part B, 19(2):113–122, 1985.

[73] Y. Wen, M. A. Moreno-Armendariz, and E. Gomez-Ramirez. Modelling of gasoline blending via discrete-time neural networks. In Proceedings. 2004 IEEE International Joint Conference on Neural Networks, volume 2, pages 1291–1296. 2004.

[74] R. B. Wilson. A simplicial method for concave programming. PhD thesis, Harvard University, Cambridge, Mass., 1963.

[75] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.

[76] G. L. Xue, R. S. Maier, and J. B. Rosen. Minimizing the Lennard-Jones potential function on a massively parallel computer. In ICS '92: Proceedings of the 6th international conference on Supercomputing, pages 409–416, New York, NY, USA, 1992. ACM Press.

[77] W. I. Zangwill. Nonlinear programming: a unified approach. Prentice-Hall Inc., Englewood Cliffs, N.J., 1969.

[78] J. Z. Zhang, N-H. Kim, and L. Lasdon. An improved successive linear programming algorithm. Management Sci., 31(10):1312–1331, 1985.

[79] Ch. Zillober, K. Schittkowski, and K. Moritzen. Very large scale optimization by sequential convex programming. Optim. Methods Softw., 19(1):103–120, 2004.
