### Parallel Optimization in Matlab

### Joakim Agnarsson, Mikael Sunde, Inna Ermilova Project in Computational Science: Report January 2013

## PROJECT REPORT

### Contents

1 Introduction

1.1 Hot Rolling

1.2 Optimization

1.3 Goal

2 Theory

2.1 Gradient based methods

2.1.1 Gradient based local optimization methods

2.1.2 Global optimization using fmincon

2.2 Simulated annealing

2.3 Genetic algorithms

2.4 Pattern search

2.4.1 Pattern search: important options

3 Method of solving the optimizational problem

3.1 Matlab framework

3.1.1 Matlab toolboxes

3.1.2 Our implementation of optimization framework

3.2 Approach to parameter selection

3.2.1 MultiStart

3.2.2 GlobalSearch

3.2.3 Hybrid Simulated Annealing

3.2.4 Genetic Algorithm

3.2.5 Pattern Search

4 Results

4.1 Gradient based methods

4.2 Hybrid simulated annealing

4.3 Genetic algorithms

4.4 Pattern search

4.4.1 Pattern search on Windows-cluster

4.4.2 Pattern search on a Linux-cluster

4.5 Comparison of methods

5 Discussion

5.1 Gradient based solvers

5.2 Hybrid simulated annealing

5.3 Genetic algorithms

5.4 Pattern search

5.5 Improving speedup

6 Conclusions

6.1 Current state

6.2 Future work

### 1 Introduction

In this report we explore different mathematical optimization methods applied to a production process called hot rolling. This is done by comparing accuracy, serial speed and parallel speedup for various optimization methods using Matlab’s Optimization Toolbox, Global Optimization Toolbox and Parallel Computing Toolbox.

### 1.1 Hot Rolling

Hot rolling is a process in metalworking where metal slabs, blocks of metal, are processed into a product of suitable dimensions and material quality. This is done by heating the material in a furnace to make it malleable, passing it through various mills to shape the material, and then cooling it down under controlled conditions. In the hot rolling process the settings for the mills, such as the rolling speed, are referred to as the rolling schedule.

In production processes the goal is always to achieve an end-product with sufficient quality while minimizing the cost of the production. Performing physical tests to find the best rolling schedule is very time consuming and, most importantly, very expensive. Instead, as there are mathematical models that describe how the rolling schedule affects the material, it is possible to use computers to calculate the optimal rolling schedule using optimization.

### 1.2 Optimization

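Optimization is a mathematical technique for finding extreme values of a given objective function, f(x), subject to some constraints on which points x are acceptable. Without loss of generality we consider minimization, since maximizing f(x) is equivalent to minimizing −f(x). Such an optimization problem can be defined as in equation (1).

$$
\begin{aligned}
\min_{x} \quad & f(x), \\
\text{such that} \quad & G_i(x) = 0, \quad i = 1, \dots, m_e, \\
& G_i(x) \le 0, \quad i = m_e + 1, \dots, m.
\end{aligned}
\tag{1}
$$

Here the first $m_e$ constraints are equality constraints and the remaining $m - m_e$ constraints are inequality constraints.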

A point with the lowest objective value is called an optimizer and the corresponding objective value is called the optimal value; together they are the optimum. A point is called feasible if it satisfies all the constraints and the set of all feasible points is called the feasible set or the feasible region.
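To make the notion of feasibility concrete, the following minimal Python sketch checks whether a point satisfies constraints of the form used in equation (1). The function name `is_feasible`, the tolerance `tol` and the toy constraints are illustrative assumptions and are not part of the hot rolling model.

```python
def is_feasible(x, eq_constraints, ineq_constraints, tol=1e-8):
    """Return True if G_i(x) = 0 and G_i(x) <= 0 hold within the tolerance tol."""
    eq_ok = all(abs(g(x)) <= tol for g in eq_constraints)
    ineq_ok = all(g(x) <= tol for g in ineq_constraints)
    return eq_ok and ineq_ok


# Hypothetical toy problem: points x = (x1, x2) with x1 + x2 = 1 and x1 >= 0.
eq = [lambda x: x[0] + x[1] - 1.0]   # equality constraint   G_1(x) = 0
ineq = [lambda x: -x[0]]             # inequality constraint G_2(x) <= 0
```

With these definitions `is_feasible((0.25, 0.75), eq, ineq)` holds, while a point such as `(-0.5, 1.5)` violates the inequality constraint and is therefore outside the feasible set.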

For the hot rolling process, the optimization objectives, i.e. objective functions f(x), depend on certain parameters, x, such as rolling speed, transfer bar temperature and thickness of the metal slabs. Furthermore, some constraints are imposed on the process, for instance minimum or maximum temperature of the slabs entering the rougher mill, dimensions in some particular production machine and physical restrictions on the process.

One major issue with optimization is how to determine the difference between a local and a global optimizer. A local optimizer is a point which has the best objective value in some small region around that optimizer while a global optimizer is the point which has the lowest objective value of all feasible points.

For discontinuous functions it is in general not possible to detect whether a local optimum is also a global optimum and the hot rolling objective function may be discontinuous. Thus in this report a global optimum will refer to the best local optimum found during all optimizations.

### 1.3 Goal

Our goal is to investigate Matlab’s implemented global optimization methods regarding accuracy, serial speed and parallel speedup when applied to the hot rolling schedule provided by ABB. Accuracy is measured by the method’s ability to consistently find the global minimum. These studies are important but preparatory to the long term goal, which is to be able to perform online or real time optimization, i.e. to optimize the hot rolling schedule during the industrial process.

### 2 Theory

Below we briefly summarize some of the most often used optimization techniques.

### 2.1 Gradient based methods

2.1.1 Gradient based local optimization methods

As the name suggests, gradient-based methods use first and second order derivatives, gradients and Hessians, to find local minima. Two important gradient based methods for solving constrained nonlinear optimization problems as defined in (1) are called Sequential Quadratic Programming (SQP) and interior-point methods (IP). They both reduce this quite complicated problem into an easier set of sub-problems and solve them successively until an optimum is found.

Matlab has two gradient-based global optimization solvers: MultiStart and GlobalSearch. These two methods make use of a function in Matlab called fmincon that finds a local minimum. fmincon iterates from a given starting point towards a local minimum using one of four implemented optimization techniques: trust-region-reflective, sqp, active-set and interior-point.

We do not consider trust-region-reflective in this report since the user has to supply the solver with a pre-defined gradient for the objective function and the constraints, which we do not have in this case.

SQP methods solve a sequence of Quadratic Programs (QP) to find the descent direction, hence the name sequential quadratic programming. Optimality is reached when the Karush-Kuhn-Tucker (KKT) conditions, (2), are fulfilled.

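$$
\begin{aligned}
\nabla f(x^{*}) + \sum_{i=1}^{m_e} \mu_i \nabla G_i(x^{*}) + \sum_{i=m_e+1}^{m} \lambda_i \nabla G_i(x^{*}) &= 0, \\
G_i(x^{*}) &= 0, \quad i = 1, \dots, m_e, \\
G_i(x^{*}) &\le 0, \quad i = m_e + 1, \dots, m, \\
\lambda_i &\ge 0, \quad i = m_e + 1, \dots, m, \\
\lambda_i G_i(x^{*}) &= 0, \quad i = m_e + 1, \dots, m.
\end{aligned}
\tag{2}
$$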

The Karush-Kuhn-Tucker conditions are first order necessary conditions for a point to be an optimum and describe the relation between the gradient of the objective function and the gradient of the active constraints. If the optimum lies in the interior of the feasible space the first order necessary condition is simply the same as in the unconstrained case, namely, that the gradient of the objective function is equal to zero.

Matlab’s function fmincon provides two variants of the SQP method, called active-set and sqp. These two algorithms are very similar and make use of a quasi-Newton method to iterate toward a solution that satisfies the Karush-Kuhn-Tucker equations. It is called a quasi-Newton method since the Hessian is not computed exactly but rather approximated using an update scheme, in this case a BFGS (appendix) update. This approximation is made since the computational cost of computing the Hessian directly is often too high. Both active-set and sqp ensure that the Hessian approximation is positive definite by initializing the BFGS method with a positive definite matrix. This property is maintained by the algorithm using different matrix operations during the BFGS updates; for more information see MathWorks Optimization Toolbox User’s Guide (2012). A positive definite Hessian together with the first order optimality conditions described by the Karush-Kuhn-Tucker equations is a sufficient condition for x to be a local minimum.

At each iteration, the gradient is computed using finite differences and the Hessian approximation is updated. This information is then used to set up a Quadratic Program (QP) that is minimized to find the descent direction. At the point $x_k$ the QP is stated as follows:

$$
\begin{aligned}
\min_{d \in \mathbb{R}^{n}} \quad & \tfrac{1}{2} d^{T} H_k d + \nabla f(x_k)^{T} d, \\
\text{s.t.} \quad & \nabla G_i(x_k)^{T} d + G_i(x_k) = 0, \quad i = 1, \dots, m_e, \\
& \nabla G_i(x_k)^{T} d + G_i(x_k) \le 0, \quad i = m_e + 1, \dots, m,
\end{aligned}
\tag{3}
$$

where $H_k$ is the BFGS approximation of $\nabla^{2} L(x_k, \lambda_k)$, the Hessian of the Lagrangian function. The QP is then solved using an active-set method to find a descent direction, $d$.

The active-set method transforms the problem to only work with active constraints and then moves between constraints to find a local minimum, see MathWorks Optimization Toolbox User’s Guide (2012) for further reading. The nonlinear constraints are linearized by a first order Taylor expansion around the point $x_k$. The step length $\alpha_k$ is then determined such that a merit function produces a better value at the next point, $x_{k+1}$, and the Lagrange multipliers, $\lambda_{i,k+1}$, are updated at that point.

In summary, sqp and active-set follow these three steps:

1. Compute the gradient and update the Hessian

2. Set up and solve the QP to find a descent direction

3. Perform a line search with a merit function to find a proper step length
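The first of these steps can be illustrated with a short, language-agnostic Python sketch of a forward-difference gradient and the standard BFGS update of a Hessian approximation. This is not fmincon's implementation: the function names, the step size `h` and the rule of simply skipping the update when the curvature condition fails are assumptions made for illustration.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference approximation of the gradient of f at x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xh = list(x)
        xh[i] += h                      # perturb one coordinate at a time
        grad.append((f(xh) - fx) / h)
    return grad


def bfgs_update(H, s, y):
    """BFGS update: H + y y^T / (y^T s) - (H s)(H s)^T / (s^T H s).

    s = x_{k+1} - x_k and y = grad_{k+1} - grad_k. The update is skipped
    (an assumed, simplified safeguard) when y^T s is not positive, so that
    a positive definite H stays positive definite.
    """
    n = len(s)
    ys = sum(y[i] * s[i] for i in range(n))
    if ys <= 1e-12:
        return H
    Hs = [sum(H[i][j] * s[j] for j in range(n)) for i in range(n)]
    sHs = sum(s[i] * Hs[i] for i in range(n))
    return [[H[i][j] + y[i] * y[j] / ys - Hs[i] * Hs[j] / sHs
             for j in range(n)] for i in range(n)]
```

Each call to `fd_gradient` costs n + 1 objective evaluations, which are independent of each other; this is why the gradient estimation is a natural candidate for parallel evaluation.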

As mentioned, active-set and sqp work in mostly the same way; nevertheless, they differ from each other in some important aspects. First of all, sqp only takes steps in the region constrained by bounds while active-set can take intermediate infeasible steps outside bounds. Note that if the objective function is complex or undefined outside the bounds, it is preferable to never go outside the bounds, as any such point would yield an unusable value. Furthermore, sqp uses some look-ahead techniques. If for some step length the objective function returns NaN, Inf or a complex value, then the algorithm reduces the step length and tries again. In addition to these differences, sqp uses more efficient linear algebra routines that both require less memory and are faster than the routines used in the active-set method.

Finally, sqp has another approach when constraints are not satisfied. The approach is to combine the objective function and relaxed constraints into a merit function that is the sum of the objective function and the amount of infeasibility of the constraints. Thus, if the new point reduces the objective function and infeasibility the merit value will decrease. Also, if the next point is infeasible with respect to the non-linear constraints sqp makes a better approximation of the non-linear constraints using a second order Taylor expansion around the current point. The downside with these approaches is that the problem size becomes somewhat larger than in the active-set algorithm which might slow down the solver. See MathWorks Optimization Toolbox User’s Guide (2012) for more information.

The interior-point method, also called barrier method, successively solves a sequence of approximate minimization problems. The approach is to rearrange the original problem, defined in (1), using a barrier function, usually a logarithmic or inverse function, and then to minimize the resulting merit function for a decreasing sequence of barrier parameters µ > 0. fmincon uses a logarithmic barrier function and sets up the following problem:

$$
\begin{aligned}
\min_{x,s} f_{\mu}(x, s) = \min_{x,s} \quad & f(x) - \mu \sum_{i=m_e+1}^{m} \ln(s_i), \\
\text{s.t.} \quad & G_i(x) = 0, \quad i = 1, \dots, m_e, \\
& G_i(x) + s_i = 0, \quad i = m_e + 1, \dots, m,
\end{aligned}
\tag{4}
$$

where the $s_i$ are so-called slack variables, which are used to transform the inequality constraints into equality constraints. The minimum of this approximate problem will approach the minimum of the original problem as µ decreases.
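A toy example, which is not taken from the hot rolling problem, may illustrate how the barrier minimum approaches the true minimum. Assume the one-dimensional problem of minimizing $f(x) = x$ subject to $G(x) = 1 - x \le 0$, whose solution is $x^{*} = 1$. Eliminating the slack variable $s = x - 1 > 0$ gives

$$
\begin{aligned}
f_{\mu}(x) &= x - \mu \ln(x - 1), \\
\frac{d f_{\mu}}{dx} &= 1 - \frac{\mu}{x - 1} = 0
\quad \Longrightarrow \quad x_{\mu} = 1 + \mu, \\
x_{\mu} &\to 1 = x^{*} \quad \text{as } \mu \to 0.
\end{aligned}
$$

Every $x_{\mu}$ lies strictly inside the feasible region, and the distance to the constrained minimum is exactly µ in this example.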

The problem defined in (4) is solved by defining the Lagrange function of $f_{\mu}(x, s)$ and then solving the corresponding Karush-Kuhn-Tucker equations.

The fmincon algorithm interior-point first tries to solve these equations directly by Cholesky factorization. There are cases when a direct step might be inappropriate, e.g. when the approximate problem is not locally convex around the current iterate. In this case problem (4) is solved by a conjugate gradient method. The approach is to minimize a quadratic approximation to the approximate problem in a trust region subject to linearized constraints. Conjugate gradient methods might work better when solving large and sparse systems.

Factoring large, sparse matrices is often too time-consuming and results in heavy memory usage. Since no matrix inverses need to be computed and stored, the conjugate gradient method outperforms direct methods for large and sparse systems of equations, see MathWorks Optimization Toolbox User’s Guide (2012) for more information.

The interior-point method, in contrast to active-set and sqp, generates a sequence of strictly feasible iterates that converge to a solution from the interior of the feasible region.

Matlab’s Optimization User’s Guide has some recommendations on when to use which algorithm. The first recommendation is to use the interior-point method, which is able to handle both large, sparse problems and small, dense problems, i.e. it can use both Large-Scale and Medium-Scale algorithms. It also satisfies bounds at all iterations and handles NaN and Inf results. If the problem is small, a better choice could be to run sqp or active-set, which are faster on smaller problems.

In Matlab the Large-Scale algorithm uses sparse matrix data structures instead of the naïve matrix data structure the Medium-Scale algorithm uses. Choosing the Medium-Scale algorithm could possibly lead to better accuracy but could also result in computations that are limited in speed due to many memory accesses (MathWorks Optimization Toolbox User’s Guide 2012).

Gradient based methods have some limitations. The most obvious issue is that they require the objective function and constraints to be continuous and have continuous first derivatives. If discontinuities exist, a gradient free optimization method might be used to pass these discontinuities; a gradient based method could then be used to quickly detect a local minimum. One could also introduce several starting points and then locate a minimum by approaching it from many different directions.

Matlab provides the user with an option to compute the gradient of the objective function and the constraints in parallel. The evaluations of the objective function and the constraints in the finite difference scheme are distributed between the processors. For more information about the parallel implementation see MathWorks Optimization Toolbox User’s Guide (2012).

2.1.2 Global optimization using fmincon

fmincon is only capable of finding a local minimum; it cannot determine whether a minimum is also a global minimum. Matlab tries to locate a global minimum with gradient based optimization techniques using two algorithms: MultiStart and GlobalSearch. Both algorithms generate many starting points and then call fmincon from these points to find the corresponding local minima. The global minimum is then chosen to be the feasible point that has the lowest objective value.

MultiStart is an easy and straightforward algorithm that initiates a local solver from a set of starting points and then creates a vector containing the local minima found, returning the best of these points as the global minimum. The algorithm goes as follows:

1. Generate starting points

2. Run fmincon from these starting points

3. Create a vector of the results and return the best value as global minimum.
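The three steps above can be sketched in a few lines of Python. This is an illustration of the multi start idea only, not Matlab's MultiStart: a crude derivative-free local search stands in for fmincon, and the one-dimensional test function `double_well`, the names and all parameter values are assumptions.

```python
import random


def local_descent(f, x0, lo, hi, step=0.1, tol=1e-8):
    """Crude 1-D local search on [lo, hi]; a stand-in for a local solver."""
    x, fx = x0, f(x0)
    while step > tol:
        improved = False
        for cand in (x - step, x + step):
            cand = min(max(cand, lo), hi)      # stay inside the bounds
            fc = f(cand)
            if fc < fx:
                x, fx, improved = cand, fc, True
                break
        if not improved:
            step *= 0.5                        # refine when no neighbour is better
    return x, fx


def multistart(f, lo, hi, n_starts, seed=0):
    rng = random.Random(seed)
    starts = [rng.uniform(lo, hi) for _ in range(n_starts)]     # step 1
    results = [local_descent(f, x0, lo, hi) for x0 in starts]   # step 2
    return min(results, key=lambda r: r[1]), results            # step 3


def double_well(x):
    """Two local minima: a global one near x = -1.04 and a local one near x = 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


(best_x, best_f), all_results = multistart(double_well, -2.0, 2.0, n_starts=20)
```

Since the local runs in step 2 are independent of each other, they can be distributed over several workers without any communication until the final comparison in step 3.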

By default Matlab distributes the starting points uniformly within the bounds; MultiStart also handles user-supplied starting points. Note that fmincon is not the only local solver that MultiStart can use; other local solvers are fminunc, lsqcurvefit and lsqnonlin. Nevertheless, we consider only fmincon in this report since it is the only gradient based local solver that handles constrained optimization problems.

GlobalSearch works in a slightly more complicated way. The starting points are generated by a scatter-search mechanism. GlobalSearch then tries to analyze these starting points and throw away points that are unlikely to generate a better minimum than the best minimum found so far. The algorithm performs as follows:

1. Run fmincon from x0

2. Generate trial points, generate a score function

3. Generate a set of points, among these points, choose the one with best score function value to be a ”Stage 1” point, run fmincon

4. Initialize basins, counters, threshold. A basin is a part of the domain for which it is assumed that all points, when used as starting points, will converge to the same minimum. An assumption is made that the basins are spherical. Threshold is the smallest objective value from x0 or ”Stage 1”. Two types of counters are initialized; number of consecutive points that lie within a basin of attraction and number of consecutive points that have score function larger than the threshold.

5. Main loop where GS examines a remaining trial point, p, from the list

• Run fmincon if i and ii are true (and, optionally, iii):

i p is not in an existing basin, within some basin radius

ii p has lower score than the threshold

iii (optional) p satisfies the bounds and/or the inequality constraints

• If fmincon runs

i Reset counters for the basins and the threshold

ii If fmincon converges, update GlobalOptimSolution vector

iii Set a new threshold to score value at p and a new radius for this basin

• If fmincon does not run

i Increment the counter for every basin containing p and set all other to 0

ii Increment the counter for the threshold if score of p is larger than the threshold and set all other to 0

iii If the basin counter is too large, decrease the basin radius and reset the counter. If the threshold counter is too large, increase the threshold and set the counter to 0

6. Create GlobalOptimSolution vector of optimum points.

GlobalSearch tries to determine in advance whether a point will result in an improvement of the best minimum found so far. This analysis is done by checking the current point’s score value and also whether the point already lies within an existing basin. The score value is a merit function that punishes constraint violations, thus the score of a feasible point is simply the value of the objective function.

The radius of a basin is defined to be the distance between a starting point and the point to which fmincon converged. If no point has a better score value or lies outside a basin, the score threshold is increased and the radius of existing basins is decreased until a point is found. GlobalSearch will stop when no points are left to examine or when a certain user-defined max time has been reached. See MathWorks Global Optimization Toolbox User’s Guide (2012) for more information about how GlobalSearch is implemented.
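The filtering of trial points described above can be sketched as follows in Python. The sketch assumes spherical basins stored as (center, radius) pairs and a score of the form objective plus a multiple of the summed constraint violations; the function names and the `penalty` value are hypothetical and do not mirror GlobalSearch's internals.

```python
import math


def score(f, ineq_constraints, x, penalty=1000.0):
    """Objective value plus a penalty on violated constraints G_i(x) <= 0."""
    violation = sum(max(0.0, g(x)) for g in ineq_constraints)
    return f(x) + penalty * violation


def in_any_basin(p, basins):
    """True if p lies within the radius of some basin center."""
    return any(math.dist(p, center) <= radius for center, radius in basins)


def should_run_local_solver(p, score_p, basins, threshold):
    """Start a local solve only from promising points outside known basins."""
    return (not in_any_basin(p, basins)) and score_p < threshold
```

For a feasible point the violation term vanishes, so `score` reduces to the objective value, in line with the description of the score function above.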

While MultiStart provides the user with a choice regarding the local solver, GlobalSearch uses only fmincon. The most important difference is that Matlab has implemented a parallel version of MultiStart. MultiStart distributes its starting points to multiple processors that then run fmincon locally and return the result. This parallelism is not implemented in the GS algorithm, thus GS works best on single core machines while MS on the other hand works better on a multi-core machine (MathWorks Global Optimization Toolbox User’s Guide 2012). In our case we have tried to implement a naïve parallelization of GlobalSearch by distributing the trial points between different cores and then running GlobalSearch locally. One could also run MultiStart and GlobalSearch using the parallel implementation of fmincon, computing the gradients in parallel.

### 2.2 Simulated annealing

Simulated annealing, SA, described by Kirkpatrick, Gelatt & Vecchi (1983), is a stochastic optimization method which takes inspiration from the physical process of annealing: heating and cooling a material under controlled circumstances to reduce defects. A general description of the SA algorithm is shown in Fig. 1.

    input : Objective function f(x) : R^n → R
            Annealing function f_Ann(T) : R → R^n
            Acceptance function f_Acc(T, ∆f) : R^2 → [0, 1]
            Temperature function f_T(k) : N^0 → [0, ∞)
            Initial temperature T_0 ∈ R
            Initial point x_0 ∈ R^n
    output: A point x_n ∈ R^n

    k = 0
    while stopping criteria are not met do
        // Generate a trial point and test it
        x_t = x_i + f_Ann(T)
        if f(x_t) < f(x_i) then
            x_{i+1} = x_t
        else
            x_{i+1} = x_t with probability f_Acc(T, f(x_i) − f(x_t))
            x_{i+1} = x_i otherwise
        end
        // Check for reannealing
        if reannealing then
            k = 0
        else
            k = k + 1
        end
        // Update the temperature
        T = f_T(k, T_0)
    end

Figure 1: Algorithm for simulated annealing for some general objective function f, annealing function f_Ann, acceptance function f_Acc, temperature function f_T, initial temperature T_0 and initial point x_0.
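A runnable Python sketch of the loop in Fig. 1 is given below for a one-dimensional problem with bounds. It is not Matlab's simulannealbnd: the Gaussian annealing function, the Metropolis-type acceptance function, the exponential temperature schedule and all parameter values are assumptions chosen for illustration, and reannealing is left out.

```python
import math
import random


def simulated_annealing(f, x0, lo, hi, t0=1.0, n_iter=5000, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for k in range(n_iter):
        temp = t0 * 0.99 ** k                        # temperature function (assumed)
        step = rng.gauss(0.0, 1.0) * temp * (hi - lo)  # annealing function (assumed)
        xt = min(max(x + step, lo), hi)              # move trial point inside bounds
        ft = f(xt)
        delta = ft - fx
        # Better points are always accepted, worse ones with a probability
        # that shrinks as the temperature decreases (Metropolis rule, assumed).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, fx = xt, ft
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def double_well(x):
    """Hypothetical test objective with one local and one global minimum."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best_x, best_f = simulated_annealing(double_well, x0=1.5, lo=-2.0, hi=2.0)
```

Note that both the step size and the probability of accepting a worse point are tied to the temperature, so the search gradually changes from a global exploration to a local refinement.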

As Fig. 1 illustrates, SA stores a current point x_i and creates a trial point x_t, given by interpreting the stochastic annealing function f_Ann as a step from x_i. The trial point is accepted as the next point x_{i+1} with probability 1 if it has a lower objective value than the current point and with some probability, defined by the acceptance function f_Acc, if it has a higher objective value. It is important to note that the step size and the probability to accept worse points both decrease with the temperature and that the temperature decreases with increasing annealing parameter k.

Simulated annealing may thus be described as one type of direct search method and as such will have a slow asymptotic convergence rate (Tamara G. Kolda, Robert Michael Lewis & Virginia Torczon 2003) but it requires no information about the gradient or higher order differentials of the objective function meaning it is more robust to non-smooth or discontinuous objectives than gradient based methods. It should however be noted that there are significant problems for SA when applying penalty methods (Pedamallu & Ozdamar 2008) and as such the SA algorithm implemented in MATLAB is only capable of handling bounds constraints. However, both of these issues might to some degree be alleviated by implementing a hybrid solver and adjusting the stopping criteria. A hybrid solver refers to running a second solver after the primary solver; for instance running fmincon after simulated annealing. Matlab has built-in support for this functionality and it will be referred to as hybrid simulated annealing, HSA.

MATLAB’s implementation of simulated annealing is a part of the global optimization toolbox (GADS) and allows only bounds constraints. To satisfy other constraints a constrained hybrid function can be used, such as fmincon. As the optimization problem studied in this report has both linear and nonlinear constraints, we will only consider hybrid simulated annealing (HSA) making use of fmincon. For detailed information on how to set the options and call the functions, refer to MathWorks Global Optimization Toolbox User’s Guide (2012).

When adjusting the settings for simulated annealing, it is important to configure the temperature in an efficient way. If the point, generated by the annealing function, is outside bounds, each coordinate will be separately adjusted so that the trial point fulfills the bounds. This means that if the temperature is too large, few of the trial points will actually fall in the interior of the domain until the temperature has been decreased. If the domain is very elongated, it may be of interest to transform the domain so that the bounds are normalized.

From a high-performance computation and parallelization viewpoint it is important to note that each execution of simulated annealing of the form given in Fig. 1 cannot be parallelized, since every iteration depends on the result of the previous one. A naïve method of using parallel resources is thus to let each computational core run a unique instance of simulated annealing, gather up the results from each core and choose the solution with the lowest objective value.

### 2.3 Genetic algorithms

Genetic algorithms, GA, take inspiration from the process of natural selection by interpreting the objective function as a measure of fitness and the coordinates of each point as the genes of an individual. Each iteration of the method is called a generation and consists of a set of individuals. Every new generation is created by randomly selecting individuals, called parents, from the current generation and in some fashion creating new individuals, children, from the genes of the parents. Three common ways to do this are by elitism, mutation and crossover. Elite children are not changed from one generation to the next; mutation children introduce a random change in the genes of a single parent; crossover children randomly combine the genes of two parents according to some algorithm, called crossover or recombination. See Fig. 2 for a description of GA incorporating these three types of children.

    input : Fitness function f(x) : R^n → R
            Rank function f_R(f(x)) : R → R
            Selection function (possibly stochastic) f_S : → R^n
            Mutation function (stochastic) f_M : ∅ → R^n
            Recombination function f_Rec(x_a, x_b) : R^2n → R^n
            Population size n ∈ N, n ≥ 2
            Number of elite children E ∈ N^+
            Number of mutation children M ∈ N^0
    output: A point x_n ∈ R^n

    while stopping criteria not met do
        // Evaluate current generation
        for each individual i do
            evaluate the fitness function
        end
        for each individual i do
            evaluate the rank function
        end
        // Create next generation
        for each elite child do
            from the parents that have not yet been selected:
            select the parent x_a with the best rank
            child = x_a
        end
        for each mutation child do
            select a parent x_b with the selection function
            child = x_b + f_M
        end
        for each crossover child do
            select two parents x_c, x_d with the selection function
            child = f_Rec(x_c, x_d)
        end
        Set the new generation as the current generation
    end

Figure 2: Algorithm for GA implementing elitism, mutation and crossover for some general fitness/objective function f .

Elite children are always chosen as the individuals with the best objective value. This means that as long as there is at least one elite child the best solution found so far will always be kept to the next generation and the sequence of the best solution in each generation will be a non-increasing sequence of numbers.
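The scheme in Fig. 2 and the effect of elitism can be illustrated with the following Python sketch. It is not Matlab's ga: truncation selection from the better half of the population, Gaussian mutation, gene-wise crossover and all parameter values are assumptions made to keep the example short.

```python
import random


def genetic_algorithm(f, lo, hi, dim, pop_size=20, n_elite=2, n_mut=6,
                      n_gen=100, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best_history = []
    for _ in range(n_gen):
        ranked = sorted(pop, key=f)                 # rank: best individual first
        best_history.append(f(ranked[0]))
        parents = ranked[:pop_size // 2]            # selection (assumed: better half)
        nxt = [ind[:] for ind in ranked[:n_elite]]  # elite children, unchanged
        for _ in range(n_mut):                      # mutation children
            p = rng.choice(parents)
            nxt.append([min(max(g + rng.gauss(0.0, 0.1 * (hi - lo)), lo), hi)
                        for g in p])
        while len(nxt) < pop_size:                  # crossover children
            a, b = rng.sample(parents, 2)
            nxt.append([rng.choice(pair) for pair in zip(a, b)])
        pop = nxt
    best = min(pop, key=f)
    return best, f(best), best_history


def sphere(x):
    """Hypothetical fitness function with its minimum at (1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


best, best_f, history = genetic_algorithm(sphere, -5.0, 5.0, dim=2)
```

Because the elite children are copied unchanged, the values stored in `history` can never increase from one generation to the next. The fitness evaluations inside `sorted(pop, key=f)` are independent of each other, which is the part a parallel implementation would distribute.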

When running GA it is important to make sure that the genetic diversity of the population is large. If the recombination function is limited to simple crossover, picking each gene from a random parent, then the only way to introduce entirely new genes into the population is by mutation. This means that if a global minimum is not reachable from the initial set of genes, then the convergence of the method to that global optimum is at best slow, which is consistent with the description of direct search methods given by Tamara G. Kolda et al. (2003).

A simple method of parallelization of GA, the one implemented in MATLAB, is to evaluate the fitness function of the different individuals of a generation in parallel; this produces an increase in performance if the fitness function is sufficiently computationally heavy.

MATLAB’s implementation of GA uses penalty functions to handle nonlinear constraints. By setting penalties for violating the constraints and successively increasing them the solution found will hopefully approach a feasible global minimum. This is done by solving a series of subproblems where each problem has a higher penalty value than the last. As this requires several generations for each subproblem, GA may require a substantially higher number of total generations to converge when using nonlinear constraints. Furthermore, when talking about generations for such problems, one often refers to each subproblem as one generation of the complete problem, even though each subproblem has several generations of GA in them.
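The idea of solving a series of subproblems with increasing penalty can be sketched in Python for a one-dimensional toy problem. The quadratic penalty used here is an assumption chosen for simplicity and is not claimed to be the penalty formulation used by MATLAB; a ternary search stands in for the inner GA run, and all names are hypothetical.

```python
def penalized(f, ineq_constraints, rho):
    """Return the objective f plus rho times the squared constraint violations."""
    def fp(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in ineq_constraints)
        return f(x) + rho * violation
    return fp


def ternary_min(f, lo, hi, iters=200):
    """Minimize a unimodal 1-D function on [lo, hi]; stands in for the inner solver."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Toy problem: minimize x subject to 1 - x <= 0; the constrained minimum is x = 1.
objective = lambda x: x
constraints = [lambda x: 1.0 - x]
minimizers = [ternary_min(penalized(objective, constraints, rho), -5.0, 5.0)
              for rho in (1.0, 10.0, 100.0)]
```

For this toy problem the minimizer of each subproblem is 1 − 1/(2ρ), i.e. 0.5, 0.95 and 0.995 for the three penalty values, so the iterates are slightly infeasible and approach the feasible minimum only as the penalty grows.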

The implementation also allows flexibility in choosing the selection, mutation and recombination functions (the latter being called crossover in MATLAB) as well as several extra options, such as multiple populations, migration and multi objective optimization. For more information, refer to the MathWorks Global Optimization Toolbox User’s Guide (2012).

### 2.4 Pattern search

Pattern search belongs to the family of direct search algorithms. Direct search methods do not use any derivatives, which makes them very useful when the objective function is not differentiable. The disadvantage is that these methods can be computationally very expensive: unlike gradient based methods they do not know in which direction to search for lower objective values. Instead they test multiple points in the vicinity of the current point, possibly leading to iterations where little or no improvement is seen. Pattern search chooses its search directions according to a specified Poll Method. Polling means evaluating the objective function at a set of points around the current point, selected according to the chosen method. The simplest algorithm for pattern search is shown in Fig. 3.

Figure 3: Algorithm of Pattern Search.
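A basic pattern search of this kind can be sketched in Python as follows. The sketch is not Matlab's patternsearch: it assumes an unconstrained problem, the 2N coordinate directions, a mesh that is doubled after a successful poll and halved after an unsuccessful one, and a `complete_poll` flag that decides whether polling stops at the first improvement.

```python
def pattern_search(f, x0, mesh=1.0, tol=1e-6, max_iter=10000,
                   complete_poll=False):
    n = len(x0)
    # Poll directions: +e_i and -e_i for every coordinate (2N directions).
    directions = []
    for i in range(n):
        for sign in (1.0, -1.0):
            d = [0.0] * n
            d[i] = sign
            directions.append(d)
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if mesh < tol:                       # stop when the mesh is fine enough
            break
        best_x, best_f = None, fx
        for d in directions:                 # poll step
            trial = [xi + mesh * di for xi, di in zip(x, d)]
            ft = f(trial)
            if ft < best_f:
                best_x, best_f = trial, ft
                if not complete_poll:        # stop at the first improvement
                    break
        if best_x is not None:               # successful poll: move, expand mesh
            x, fx = best_x, best_f
            mesh *= 2.0
        else:                                # unsuccessful poll: contract mesh
            mesh *= 0.5
    return x, fx


def quadratic(x):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_first, f_first = pattern_search(quadratic, [0.0, 0.0])
x_full, f_full = pattern_search(quadratic, [0.0, 0.0], complete_poll=True)
```

The objective evaluations within one poll are independent of each other, so with a complete poll they can be evaluated in parallel.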

2.4.1 Pattern search: important options

Pattern search has many options which can affect the performance and the result of a single computation. First of all, there are three Poll Methods for pattern search:

- GSS (Generalized Set Search)

- GPS (Generalized Pattern Search)

- MADS (Mesh Adaptive Direct Search)

Every Poll Method has two basis sets: PositiveBasisNp1 and PositiveBasis2N. These basis sets define the pattern of directions used in the search. More information about the different basis sets can be found in MathWorks Global Optimization Toolbox User’s Guide (2012).
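For the GPS poll method the two basis sets consist of fixed direction vectors, which the following Python sketch generates for a problem with n variables. The function names are hypothetical; the sketch does not cover MADS, where the directions are chosen randomly, or the adjustment of directions to linear constraints in GSS.

```python
def positive_basis_2n(n):
    """2N directions: the unit vectors e_1..e_n and their negatives."""
    dirs = []
    for sign in (1, -1):
        for i in range(n):
            d = [0] * n
            d[i] = sign
            dirs.append(d)
    return dirs


def positive_basis_np1(n):
    """N+1 directions: the unit vectors e_1..e_n and -(e_1 + ... + e_n)."""
    dirs = []
    for i in range(n):
        d = [0] * n
        d[i] = 1
        dirs.append(d)
    dirs.append([-1] * n)
    return dirs
```

For three variables `positive_basis_2n(3)` contains six directions and `positive_basis_np1(3)` four, so an Np1 poll needs fewer objective evaluations per iteration but explores fewer directions.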

Thus, the Poll Step in the simplest algorithm refers to one of these Poll Methods. Secondly, pattern search has the following six search methods:

- GSS (Generalized Set Search);

- GPS (Generalized Pattern Search);

- MADS (Mesh Adaptive Direct Search);

- searchlhs (search using Latin Hypercube Algorithm);

- searchga (search using Genetic Algorithm);

- searchneldermead (search with Nelder-Mead algorithm).

A detailed description of these algorithms is given in MathWorks Global Optimization Toolbox User’s Guide (2012). The only detail we mention is that it was not possible to use searchneldermead, because it cannot handle constraints and our objective function was given with constraints.

If both a poll method and a search method are specified, the algorithm looks like the simplest algorithm for pattern search, shown in Fig. 3, with the addition of a search step before the poll step. The search step, like the poll step, attempts to find a better point than the current point and if the search method manages to improve the solution, then the poll step is skipped for that iteration. Note that if the same method is used in the search and poll step, the result from both steps would be identical so the poll step is skipped. As a default setting we have specified the poll method GPSPositiveBasis2N. To perform a search step one has to specify a search method explicitly, because there is no default setting for this.

The third important option is ‘Complete Poll’. As a default setting we had ‘Complete Poll’ disabled (‘off’), which means that the algorithm stops polling as soon as it finds a point whose objective value is less than that of the current point. When this happens the poll is called successful and the point found becomes the starting point of the next iteration (MathWorks Global Optimization Toolbox User’s Guide 2012). If ‘Complete Poll’ is enabled (‘on’) the chosen algorithm computes the objective value at all mesh points. The method then compares the smallest of these values to the value at the current point. If the mesh point has the smaller value, the poll is called successful (MathWorks Global Optimization Toolbox User’s Guide 2012).

The fourth important option is ‘TolBind’ (binding tolerance), which specifies the tolerance for linear and nonlinear constraints. The default value is ‘TolBind’=1e-3. However, with the default binding tolerance patternsearch could find strictly infeasible points with significantly lower objective value than any truly feasible point, leading to results which may be misinterpreted. A large binding tolerance also enlarges the region which may be searched, possibly increasing the time needed to find a solution inside the domain, as more iterations may be required to find the optimum. Our results in Section 4.4.1 show the difference.

Pattern search also handles nonlinear constraints. It formulates a subproblem by combining the objective function and the nonlinear constraint function, using the Lagrangian and some penalty parameters. It is important to note that the nonlinear constraints are handled separately from the linear constraints and bounds. At every iteration a new subproblem is formed and solved (MathWorks Global Optimization Toolbox User’s Guide 2012).

By specifying the use of parallelism in pattern search, the pattern search function computes the values of the objective function and the constraint function in parallel. For more details, see the MathWorks Global Optimization Toolbox User’s Guide (2012).

### 3 Method of solving the optimizational problem

### 3.1 Matlab framework

3.1.1 Matlab toolboxes

There are a number of different toolboxes available in Matlab; among these are the Global Optimization Toolbox (GADS) and the Parallel Computing Toolbox (PCT).

GADS adds support for a number of different global optimization methods; see Table 1 for a list of the methods used in this report.

Table 1: The Matlab implementations of the used global optimization methods.

| Optimization method | Matlab implementation |
|---|---|
| Multi start | MultiStart, fmincon |
| Global search | GlobalSearch, fmincon |
| Simulated annealing | simulannealbnd |
| Genetic algorithm | ga |
| Pattern search | patternsearch |

PCT lets a user start a local parallel environment in Matlab using a master-worker model. In such a local environment the user is limited to accessing the computational resources on the local computer, with a maximum of 12 workers.

The Matlab Distributed Computing Server (MDCS) extends this functionality to allow clusters of any size to be used for the computations.

3.1.2 Our implementation of optimization framework

Despite the different global optimization methods introduced with GADS and the parallel computing possibilities of PCT and MDCS, there is no standardized framework which allows for simple switching between different optimization methods. By implementing such a framework we achieve two things. First, we make it easier for the user to call the different optimization functions in GADS in combination with the parallel environment from PCT and MDCS. Second, we introduce a structure to the program which means that performance optimization can be done once by a programmer and all users benefit from that optimization, regardless of their knowledge of high performance computing.

The framework has two levels. The topmost level is globalOptimization, the function which the user will call. This defines a number of default parameters for the method-independent settings and supports calling the function either on a specific struct, containing all the data, or by using a standard string-value pair method of input. The second level is the algorithm-specific optimization methods. These are essentially shells for each global optimization method, making sure to call the corresponding GADS function with the correct syntax.

After the GADS optimization functions have finished, the data they provide is stored in a result struct. Using a struct makes the framework output-agnostic, meaning that any output entered into the result struct will be given to the user.

This structure also makes the framework method-agnostic, meaning that new methods can be added just by implementing a method-specific optimization function with the correct name. The correct name here means that the method name given in the settings must be spelled exactly as the corresponding method-specific function.
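The two-level structure can be sketched as follows. The report’s framework is written in Matlab and its source is not reproduced here, so this Python analogue is only an illustration of the design: all names other than the idea of a top-level entry point, the default values and the placeholder shells are assumptions.

```python
SHELLS = {}


def register(shell):
    """Register a method-specific shell under its own function name."""
    SHELLS[shell.__name__] = shell
    return shell


@register
def multistart(settings):
    # Hypothetical shell: a real one would call the solver with these settings
    # and copy every output it returns into the result.
    return {"method": "multistart", "workers": settings["workers"]}


@register
def patternsearch(settings):
    # Hypothetical shell, same contract as above.
    return {"method": "patternsearch", "workers": settings["workers"]}


def global_optimization(settings=None, **pairs):
    """Top level: merge defaults, a settings dict and name-value pairs, then dispatch."""
    merged = {"method": "multistart", "workers": 0}  # assumed defaults
    merged.update(settings or {})
    merged.update(pairs)
    name = merged["method"]
    if name not in SHELLS:
        raise ValueError(f"no optimization shell named {name!r}")
    return SHELLS[name](merged)


result = global_optimization({"method": "patternsearch"}, workers=4)
print(result)
```

In this sketch a new method is added by writing one more registered shell; the top level does not need to change, which is the sense in which the framework is method-agnostic.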

### 3.2 Approach to parameter selection

It should be noted that the feasible point with the lowest objective value found during all our tests was found by all methods except GA and had an objective value of approximately 0.0036. This is taken as the global minimum.

3.2.1 MultiStart

Since both MultiStart and GlobalSearch use fmincon, an investigation was made to determine which of the optimization techniques used by fmincon was the most accurate and the fastest for this particular problem. A run was made with MultiStart using nine randomly distributed starting points and one predefined starting point, ten starting points altogether. All three algorithms had the same tolerances as convergence criteria.

In Fig. 4 one clearly sees that both sqp and active-set outperform interior-point in speed. Furthermore, sqp and active-set were able to find the global minimum, which interior-point wasn’t able to do. An attempt was made to improve the interior-point algorithm by comparing large scale vs. medium scale and ldl-factorization vs. the conjugate gradient method, but no improvement was detected. With this in mind we decided to omit the interior-point optimization technique from further studies.

The methods sqp and active-set were then analyzed in more detail to find the optimal setup for each of them. First of all the tolerances were varied to investigate how much the termination tolerance on x, the termination tolerance on the function value and the termination tolerance on the constraint violation influence accuracy and speed. A run was performed that varied these tolerances between 10^{−1} and 10^{−10}. All the runs found the global minimum but varied in time, as expected. Here one should keep in mind how many decimals are present in the given data, i.e. constraints and starting points. The tolerances should be at least somewhat tighter than the precision of this data. The given data had a precision of the order of 10^{−3}; we therefore chose to set the tolerances to 10^{−4} and were able to gain some time while still converging to the global optimum, compare Fig. 4 and Fig. 5.

Figure 4: Comparison with MultiStart using sqp, active-set and interior-point with default settings.

When using sqp one can choose to normalize all constraints and the objective function, which would be appropriate for this particular problem since the feasible set is elongated in one of the dimensions. However, no improvement in accuracy or speed was noted when using this feature of the sqp algorithm. Furthermore, we point out that while both active-set and sqp found the same global minimum, active-set was slightly faster.

Matlab also provides the user with a choice of which starting points MultiStart should run fmincon from. One can choose to run from all starting points, only from starting points that satisfy bounds, or only from starting points that satisfy both bounds and inequality constraints. A test was made to investigate how these choices impacted the result. Restricting MultiStart to run fmincon only from starting points that are within bounds is only interesting when the starting points are generated in a different way than MultiStart does by default, which is to distribute them uniformly within bounds. However, when MultiStart was confined to run fmincon only from starting points that satisfy both bounds and inequality constraints, the run was considerably faster but the global minimum wasn’t found.

Further studies on MultiStart were made to investigate how it scaled when using multiple processors. This study was made using first and foremost active-set, since it was the fastest optimization technique on this particular problem. Nevertheless, we kept sqp in the study since the results could be used as a comparison with active-set.

Running MultiStart from 16 starting points in parallel on 8 cores with 0 to 8 workers resulted in the speedup presented in Fig. 6. The sqp method seems to be the better of the two methods since it scales better than active-set. On the other hand, active-set is always faster than sqp as shown in Fig. 7. Thus, active-set with tolerances set to 10^{−4} is the fastest fmincon algorithm.

Figure 5: Comparing MultiStart using sqp, active-set and interior-point with tolerances set to 10^{−4}.

Speed and accuracy to locate a minimum depend heavily on which starting points MultiStart uses. If the starting point is far from the local minimum it will take more time and if the starting point lies in a basin which doesn’t contain the global minimum the accuracy will be worse. Since the starting points are generated in a stochastic process a statistical study on MultiStart was made.

MultiStart was run using 1500 randomly distributed starting points, and the result was then analyzed. The run showed that 10 % of the points didn’t converge at all. In Matlab this is indicated by a negative exit flag, in this case exit flag -2, which means that no feasible point was found. Furthermore, 60 % of the points did converge to a minimum but not to the global minimum. Finally, 30 % converged to the global minimum.

More importantly, we noticed that of all the starting points that converged to the global minimum, a majority needed very few iterations, see Fig. 8.

More specifically, if a starting point converges to the global minimum the probability that it will need 5 iterations or less is approximately 0.6, see Fig. 9.

Given this result we limited the maximum number of iterations that fmincon was allowed to take to 5. This resulted in a decrease in time by a factor of 7 and also in a better speedup, see Figures 23 and 24.

A serial run of 4 points and a parallel run with 4 points per worker using 8 workers were profiled with Matlab’s profiler. The serial run took 174.2 seconds, of which 172.4 were spent in the MEX-file FinalPassCalc. The parallel run took 199.5 seconds, of which 198.0 were spent either in FinalPassCalc or in a certain Java method (java.util.concurrent.LinkedBlockingQueue) which is used for Matlab’s parallelism. From this data it is clear that the majority of the time is spent in FinalPassCalc, and any future attempts to speed up the code should be focused there.

Figure 6: Speedup plot for MultiStart using sqp and active-set on an eight core machine.

Figure 7: Time taken for MultiStart using sqp and active-set on an eight core machine.
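The profiler timings for the serial and parallel runs (174.2 s and 199.5 s in total) can be turned into time shares with a few lines of Python. The variable names are ours; note that the parallel figure of 198.0 s also contains time spent in the Java queue method, so it is not pure computation.

```python
# Timings quoted from the profiler runs, in seconds.
serial_total, serial_hot = 174.2, 172.4
parallel_total, parallel_hot = 199.5, 198.0

serial_share = serial_hot / serial_total
parallel_share = parallel_hot / parallel_total

# Upper bound on the gain from removing everything except FinalPassCalc
# in the serial run.
best_gain_elsewhere = serial_total / serial_hot

print(f"serial share:   {serial_share:.3f}")
print(f"parallel share: {parallel_share:.3f}")
print(f"largest gain from optimizing the rest: {best_gain_elsewhere:.3f}x")
```

About 99 % of the serial run is spent in FinalPassCalc, so even removing all other work entirely would shorten the serial run by only about 1 %.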

Figure 8: Number of iterations needed for a starting point to find the global minimum.

Figure 9: Cumulative mass distribution showing the probability that a starting point will need a certain number of iterations or less to find the global minimum.

3.2.2 GlobalSearch

GlobalSearch was investigated using the same choice of parameters for fmincon as in MultiStart. In this case, the default settings weren’t able to find the global minimum. We therefore increased the penalty threshold factor and the penalty basin factor until the global minimum was found using 200 starting points and 50 stage 1 points. With GlobalSearch tuned to always find the global minimum, the parallel implementation was studied. There is no specific parallelism implemented in the GlobalSearch algorithm itself. It is however possible to compute the finite differences in parallel when fmincon is used. Trying this approach on a computer with 8 cores using 0 to 8 workers gave only a small speedup, see Fig. 10.

In the time needed to find the global minimum on 8 cores, GlobalSearch is by far outperformed by MultiStart. Due to the limited time available, most of the effort was spent on investigating and improving MultiStart, since it gave the most promising results in both speed and accuracy.

Figure 10: GlobalSearch shows almost no speedup when using up to 8 workers.

3.2.3 Hybrid Simulated Annealing

First, recall that hybrid simulated annealing (HSA) refers to running two solvers in series: first simulated annealing (SA), then some other solver which can satisfy all necessary constraints. The hybrid solver for constrained problems is Matlab’s fmincon, which can be used by setting the hybridfcn option for SA in Matlab. It should, however, be noted that the framework implementation did not use this functionality, but instead called fmincon directly on simulated annealing’s solution, in order to get more detailed data for analysis. This design choice gives identical solutions, and the difference in time taken between the two choices is insignificant.

An important issue that might influence the performance of the methods is the length of the interval each coordinate is bounded in. From Table 2 it is evident that the domain is very elongated in the 11th coordinate.

Table 2: The magnitude of the difference between upper and lower bounds varies by a large amount.

| Coordinate | Upper bound | Lower bound | Difference |
|---|---|---|---|
| 1 | 0.036 | 0.003 | 0.033 |
| 2 | 0.036 | 0.003 | 0.033 |
| 3 | 0.036 | 0.003 | 0.033 |
| 4 | 0.036 | 0.002 | 0.034 |
| 5 | 0.036 | 0.002 | 0.034 |
| 6 | 2.762 | 2.260 | 0.502 |
| 7 | 3.020 | 2.471 | 0.549 |
| 8 | 3.258 | 2.666 | 0.592 |
| 9 | 3.597 | 2.943 | 0.654 |
| 10 | 3.850 | 3.150 | 0.700 |
| 11 | 15.000 | -15.000 | 30.000 |
| 12 | 0.167 | 0.137 | 0.030 |

The elongated domain can be dealt with in two ways. The first is to adjust the initial scalar temperature such that the annealing steps taken are small enough compared to the smallest bound interval. The second is to normalize the annealing step with respect to the bounds, which Matlab supports by setting a vector valued temperature. To compare these two approaches, a sweep of the normalized initial temperature factor was first performed to find the optimal value for the default settings. Fig. 11 - 13 show that a factor of 1.5 was statistically significantly faster than any other setting and found the global minimum as often as all other settings, though without significance.
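To make the effect of the bound intervals concrete, the following Python sketch compares a scalar temperature with a temperature vector that is proportional to the bound intervals of Table 2. It is an illustration only: the step rule (a random unit direction scaled per coordinate by the temperature) and the factor 0.1 are assumptions, the report does not state how its vector temperature was constructed, and enforcing the bounds is left out.

```python
import math
import random

# Bounds copied from Table 2 (coordinates 1 to 12).
upper = [0.036, 0.036, 0.036, 0.036, 0.036, 2.762,
         3.020, 3.258, 3.597, 3.850, 15.000, 0.167]
lower = [0.003, 0.003, 0.003, 0.002, 0.002, 2.260,
         2.471, 2.666, 2.943, 3.150, -15.000, 0.137]
ranges = [u - l for u, l in zip(upper, lower)]


def annealing_step(x, temperatures, rng):
    """Move x along a random unit direction, scaled per coordinate by temperatures."""
    direction = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(d * d for d in direction))
    return [xi + t * d / norm for xi, t, d in zip(x, temperatures, direction)]


factor = 0.1  # hypothetical normalization factor, chosen only for illustration
normalized = [factor * r for r in ranges]
scalar = [0.03] * len(ranges)

rng = random.Random(0)
x0 = [(u + l) / 2 for u, l in zip(upper, lower)]
x_norm = annealing_step(x0, normalized, rng)
x_scalar = annealing_step(x0, scalar, rng)

# Largest possible move per coordinate as a fraction of its bound interval.
print([round(t / r, 3) for t, r in zip(scalar, ranges)])
print([round(t / r, 3) for t, r in zip(normalized, ranges)])
print(round(max(ranges) / min(ranges)))
```

The bound intervals differ by a factor of about 1000 between coordinates 11 and 12. With a scalar temperature the largest possible move is therefore a very different fraction of the interval in each coordinate, while the normalized temperature makes that fraction the same for all coordinates.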

Comparing this to using a non-normalized initial temperature of 0.03 showed a significantly lower time taken for the non-normalized initial temperature. For the probability of finding the global minimum, no statistical significance could be shown, but using the non-normalized initial temperature did find the minimum more times than using the normalized temperature did. To limit the scope of this project we therefore focus on using non-normalized initial temperatures.

Once the initial temperature is calibrated, the temperature function and the annealing function are the primary remaining parts of SA. Using the initial temperature 0.03 for all combinations of these functions showed that, among the combinations that found the global minimum the most times, the combination of @annealingboltz and @temperatureexp showed a statistically significantly lower time of execution.

Limited to a single combination of annealing and temperature function it is not prohibitive to perform a more detailed sweep over initial temperature. Fig. 14 and Fig. 15 show that the already used initial temperature of 0.03 is likely to be the best choice, giving a probability of 0.15 to 0.47 of finding the global minimum.

Finally, parallelization is implemented and 16 hybrid SA runs are executed in parallel using up to 8 workers. The attained speedup is shown in Figure 16. Measurements of the run statistics are shown in Table 3. The probability of finding the global minimum was 0.76(12) using a 95% confidence interval.
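The mean times reported in Table 3 (4439 s in serial and 845 s on 8 workers) can be converted into speedup and parallel efficiency. A short Python sketch of the arithmetic, with function names of our own choosing:

```python
def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel wall-clock time."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, workers):
    """Speedup divided by the number of workers; 1.0 means linear scaling."""
    return speedup(serial_time, parallel_time) / workers


s = speedup(4439.0, 845.0)
e = efficiency(4439.0, 845.0, 8)
print(f"speedup {s:.2f}, efficiency {e:.2f}")
```

This gives a speedup of about 5.25 on 8 workers, i.e. a parallel efficiency of about 0.66, which is significant but not linear.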

It is also worth mentioning that the settings for the hybrid solver, as is reasonable, strongly affect both the accuracy and the speed of the execution.

Using the results of Section 3.2.1 as a guideline, active-set was chosen as the method for fmincon. It was also found that limiting the first few steps in magnitude was highly beneficial. See Table 4 for the hybrid solver settings.

Figure 11: The probability of finding the global minimum with different normalized temperature factors has not been statistically shown to be better for any one choice.

Table 3: Measurements of the run time for hybrid SA using the final settings.

| 16 hybrid SA runs | Serial | Parallel (8 workers) |
|---|---|---|
| Mean time (s) | 4439 | 845 |
| Confidence interval (s) | ±64 | ±37 |
| Standard deviation of time (s) | 232 | 102 |

Table 4: Settings used for fmincon as a hybrid solver.

| Parameter | Choice |
|---|---|
| Algorithm | active-set |
| RelLineSrchBnd | 0.000001 |
| RelLineSrchBndDuration | 3 |

Figure 12: A closer examination around normalized temperature factor 1.4 indicates that a factor of 1.4 - 1.6 is statistically significantly more likely to find the minimum than a factor of 1.2.

Figure 13: A closer examination around normalized temperature factor 1.4 indicates that a factor of 1.5 is statistically significantly faster than other settings.

Figure 14: It is likely, though not completely statistically proven, that an initial scalar temperature of 0.03 is on average faster than most other choices from 0.005 to 0.08.

Figure 15: An initial scalar temperature of 0.03 or 0.055 shows a higher probability of finding the global minimum than most other choices in the interval 0.005 to 0.08.

Figure 16: When running 8 hybrid SA on 1 to 8 workers the speedup is significant but not linear.

3.2.4 Genetic Algorithm

For genetic algorithms the creation of children is of high importance. Choosing how to generate new mutation or crossover children can strongly affect how efficient the algorithm is. Consider for instance an unconstrained 2D problem whose objective function has low, negative values in a circular trench around the origin and quickly approaches 0 as the distance from the trench increases.

In such a case, choosing crossover children by picking coordinates from each parent at random would generally not create good children from good parents; a large part of the crossover children would be created in vain, requiring a lot of work.
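The circular trench example can be made concrete with a small Python sketch. The particular objective function (a Gaussian trench of radius 1 and width 0.1) and the two parents are assumptions chosen for illustration; the crossover rule is the coordinate-picking rule described in the text.

```python
import math


def trench(x, y, radius=1.0, width=0.1):
    """Hypothetical objective: about -1 on the circle of given radius, about 0 elsewhere."""
    r = math.hypot(x, y)
    return -math.exp(-((r - radius) / width) ** 2)


def coordinate_crossover_children(p1, p2):
    """Children that take one coordinate from each parent (clones excluded)."""
    return [(p1[0], p2[1]), (p2[0], p1[1])]


parent_a = (1.0, 0.0)  # on the trench
parent_b = (0.0, 1.0)  # on the trench
children = coordinate_crossover_children(parent_a, parent_b)

for point in [parent_a, parent_b] + children:
    print(point, trench(*point))
```

Both parents lie at the bottom of the trench, but the two mixed children land at (1, 1) and (0, 0), outside and inside the circle, where the objective is practically zero. Evaluating them costs as much as evaluating any other point, which is what makes such children wasteful when the objective function is expensive.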

As the specifics of the objective function and constraints are unknown we cannot a priori tell how the different crossover functions will work for the given problem.

For this reason it is of interest to test the different built-in crossover functions.

See Fig. 17 and Fig. 18 for the time taken and objective value found for 5 generations of GA with nonlinear constraints. The exceptions are the three runs that converged in less than 250 seconds; these attempts converged in 3, 2 and 2 generations respectively, despite stringent convergence criteria.

It’s clearly visible that the time taken for each generation doesn’t vary much, while there may be some advantage to the objective function in making an informed selection of the crossover function. Heuristic crossover and intermediate crossover show a high probability of improvement, although the improvement is small and none of the runs were close to finding the global minimum of 0.0036.

Figure 17: The time taken per generation is not strongly influenced by the choice of crossover function.

For details on how these crossover functions work, refer to the MathWorks Global Optimization Toolbox User’s Guide (2012).

Figure 18: None of the default crossover functions with default settings manages to find the global optimum in 5 generations.

The next phase is to investigate how these results vary with the crossover fraction (the fraction of each new generation, excluding elite children, that is created by crossover rather than mutation) and, for heuristic crossover, the ratio, which is an additional setting for the generation of crossover children using heuristic crossover. See Fig. 19 - 21.

Again, these figures show the objective value after 5 generations of GA. While the result has improved, it is still far from finding the global minimum even once.

Furthermore, runs of up to 15 generations of GA show slow or insignificant improvement for further generations, up to a point where the time required is so much larger than for other reliable methods, such as MultiStart, that any further attempts to improve GA seem futile; a serial MultiStart can complete the optimization in the same time as serial GA can perform 10 generations, which is not enough for finding the global minimum. As a final nail in the coffin, Fig. 22 shows that GA does not even scale well, so for parallel execution GA will be comparatively even worse.

For this reason GA will not be considered in more depth in this report.

Figure 19: Varying the crossover fraction for intermediate crossover improves the solution but still does not find the global optimum.

Figure 20: Varying the crossover fraction for heuristic crossover improves the solution but still does not find the global optimum.

Figure 21: Varying the ratio for heuristic crossover improves the solution but still does not find the global optimum.

Figure 22: There is a measurable speedup for GA, but it is much less than linear.

3.2.5 Pattern Search

Pattern search has many options, which made it difficult to choose the best ones for our tests. In the beginning our aim was to test all options and find suitable settings for the objective function which we got from ABB. After doing some tests on smaller problems we found that the MADS methods are not meaningful for the objective function of interest. They were too slow, so it wasn’t possible to test all settings for MADS as poll and search methods. We came to the same conclusion about pattern search using a genetic algorithm as the search method. Both MADS and searchga showed very bad performance and were not very accurate. The main interest of our experiments was therefore GSS and GPS as poll and search methods, and searchlhs as a search method. The cases were created in this way:

1. Running the default case with default settings, in serial and in parallel.

2. Running other cases in which we chose the tolerance for changes in x (‘TolX’), set ‘CompletePoll’ to ‘on’, specified search and poll methods, used parallelism when we specified more than 0 workers, and in some cases set the binding tolerance ‘TolBind’, to check whether the constraints are active.

The reason for this choice is time: there wasn’t enough time to try everything, so extra settings were only tried for the fastest and most accurate cases, in order to improve the result.

### 4 Results

### 4.1 Gradient based methods

The best result in speed, accuracy and parallel speedup was obtained when using MultiStart with the active-set optimization technique for fmincon. The convergence tolerances were relaxed to 1e-4. Furthermore, the maximum allowed number of iterations was limited to 5. The resulting speedup and time taken are shown in Figures 23 and 24.

Figure 23: Final speedup for MultiStart with active-set and maximum allowed iterations five on an eight core machine. Comparison is also made with the default case, when no limit on maximum allowed iterations is imposed.

Figure 24: Final time plot for MultiStart, with active-set and maximum allowed iterations five, on an eight core machine. Comparison is also made with the default case, when no limit on maximum allowed iterations is imposed.


### 4.2 Hybrid simulated annealing

The simulated annealing part of hybrid SA can be very fast compared to the other solvers, due to the fact that it does not use the computationally heavy nonlinear constraints shown in Fig. 34. It can also be moderately accurate, with a probability of 0.76(12) (95% confidence) of finding the global minimum when running 16 instances. The naive implementation of parallelism, letting each worker run one or more instances of hybrid SA, gives close to linear speedup when using two or more instances per worker. Using more instances would make the speedup approach linearity, but would increase the total time taken.

### 4.3 Genetic algorithms

Genetic algorithms took approximately 60 seconds per generation and didn’t manage to find the global optimum even once when using 15 or fewer generations. The speedup on up to 8 workers was low, up to only twice the speed.

### 4.4 Pattern search

Pattern search showed accurate results during experiments on all tested objective functions. The biggest problem with pattern search was its computational speed. All experiments on all objective functions took a very long time, which determined the choice of test parameters for the main objective function. That is why we didn’t try many options for MADS as search and poll methods, or for searchga and searchlhs as search methods.

4.4.1 Pattern search on Windows-cluster

In this section we present the results for the main objective function. Because the MEX-file was built for Windows, we had no opportunity to run our optimization on a machine with more than 8 cores. This is the reason why most of the simulations were done on the main function. After several experiments with different methods on different numbers of cores, the resulting optimal values of the main objective function are shown in Fig. 25.

Figure 25 shows that the best value of the objective function is 0.0026, but this value was produced by an optimization with the default binding tolerance, ‘TolBind’=1e-3. To get the right result it is important to specify the binding tolerance, because it determines when the constraints are considered active; otherwise the point found may lie outside the feasible region. By specifying the binding tolerance as ‘TolBind’=1e-6 or 1e-10 the result changed to 0.0036, as can be seen for sequence number 18. The binding tolerance can also affect the computational time: when it is set smaller than the default value the method can converge faster, because it will not evaluate points outside the feasible region.

Computational time is indeed a very important parameter when choosing the right method for finding the correct result. In pattern search some methods gave us the desired values faster than others. GPSPositiveBasis2N is the fastest search method of all the deterministic methods. We also had some stochastic methods: search using a genetic algorithm, MADS, and search using the Latin Hypercube algorithm. For these, using the same starting point gives a different result each time the code is run. MADS showed different results for the same starting point, and it took almost the same time to compute the values regardless of the number of processors used. The results for MADSPositiveBasisNp1 are shown in Fig. 26.

Figure 25: The smallest values of the objective function for different sequences of runs (a sequence here is several runs of the same method on different numbers of cores).

From Figure 26 we can see that MADSPositiveBasisNp1 is an expensive poll method, which is not suitable if computational time matters, and that it is not very accurate. One can also notice that no speedup can be seen (blue bars represent 2 workers, green bars 6 workers and red bars 8 workers).

MADSPositiveBasis2N is faster than the previous MADS method, but even with a big cluster no time is gained: as Fig. 27 shows, the time is almost the same for all our computations with this method. Depending on how the algorithm generates its vectors we get different results every time we run the computation. The results are not accurate either.

In this figure blue represents runs on 2 workers, green on 6 workers and red on 8 workers. It is easy to see that increasing the number of workers gives no gain in time in the second sequence, and very little between 6 and 8 workers in the first sequence.

Figure 26: Computational time for different sequences of runs using MADSPositiveBasisNp1 as a poll method.

Figure 27: Computational time for MADSPositiveBasis2N as a poll method.

The best method of all is GPSPositiveBasis2N: it gives an accurate result fast, and it shows speedup when running on the Windows cluster.

When the search method is chosen to be the same as the poll method, the algorithm performs only the search step, which saves time. The best speedup of all the methods used, observed for GPSPositiveBasis2N, is shown in Fig. 28.

Figure 28 shows the best speedup observed. One might argue that some speedup can be seen for the MADS computation in the first sequence, but when the method is run several times the observations are not reproduced. No pattern search algorithm except GPSPositiveBasis2N showed a stable speedup. Accurate optimal values can of course be found with the other algorithms as well, but if one considers all the desirable properties of a method for performing an “online” optimization, GPSPositiveBasis2N can be the one.

Figure 28: Speedup for GPSPositiveBasis2N as poll and search methods.

Another part of the study was the search method using the Latin Hypercube algorithm, which is a stochastic method. Compared to the MADS methods, searchlhs always gave the same correct result and was faster, but it wasn’t possible to draw any conclusions about its speedup. Fig. 29 shows that the computational time differs between runs with the same initial conditions.

It is important to mention that for running the searchlhs algorithm we chose the fastest and most accurate poll method. With MADS polling the results might differ. Here the optimal value was the same after every run, probably because of the chosen poll method.

Figure 29: Computational time for different runs of searchlhs with GPSPositiveBasis2N polling on a single core.

4.4.2 Pattern search on a Linux-cluster

Here the objective function was different from the one used in the previous experiments on the Windows cluster. The goal was to see whether the use of Matlab’s Distributed Computing Server software could improve the results, i.e. whether one could get an accurate result faster. The resource had 32 cores available. All the methods showed the same behaviour as in the previous experiments: no speedup or accuracy for the MADS and searchga methods, while GPSPositiveBasis2N showed the best results concerning accuracy and speedup when we chose the same search and poll methods, which disabled polling. The best value of the objective function was approximately -400.

The speedup for GPSPositiveBasis2N is shown in Fig. 30.

As the beginning of Fig. 30 shows, specifying 0 workers gives the result faster than specifying 1 worker, because of the time spent on “unnecessary” communication when sending the job to a single worker. On 32 workers we got a speedup of 7.1631, which is a good value compared with the optimization on 8 cores, where the speedup was around 2.

Figure 30: Speedup for GPSPositiveBasis2N specified as search and poll methods.

### 4.5 Comparison of methods

For serial evaluation patternsearch is on average the fastest method but MultiStart is only slightly slower and performs much more consistently, as shown in Fig. 31.

If any parallel processing is available, MultiStart scales better than patternsearch, as shown in Fig. 32 and at 8 workers it’s almost 3 times as fast, as shown in Fig. 33. It should be noted that for this data, all methods consistently found the global minimum. Genetic algorithms are not represented in Fig. 31 and Fig. 33 since they failed to find the global optimum.

Figure 31: For serial evaluation patternsearch is on average the fastest method, while MultiStart is almost as fast but much more consistent.

Figure 32: MultiStart and hybrid SA show decent speedup, though not linear, while the other methods show much worse scaling.

Figure 33: For parallel evaluation using 8 workers, MultiStart is 3 times as fast as the second fastest method, patternsearch.

### 5 Discussion

### 5.1 Gradient based solvers

First, why are active-set and sqp faster than the interior-point method? This is probably because the global minimum lies on the border of the feasible set, i.e. at least one of the constraints is active at that point. This should make the interior-point method slow, since it initially forces the solution to the interior of the feasible region and hence needs to iterate over a longer distance than sqp and active-set. Furthermore, the global minimum may only be reachable when approaching it along the border of the feasible set or, even worse for interior-point, from the infeasible side of the inequality constraints.

Even though sqp has more efficient linear algebra routines for solving the resulting system of equations, active-set is consistently the fastest method. In this case, the problem is probably too small for the efficiency of the sqp algorithm to be felt in the computations. Also, as shown in Fig. 34, evaluating the nonlinear constraints is very expensive. Therefore, in this particular case, the performance of the algorithms relies heavily on how they treat the nonlinear constraints. Since the sqp method, in contrast to active-set, sometimes approximates the nonlinear constraints using a second-order Taylor expansion, this should slow down sqp compared to active-set.

Figure 34: The time required to evaluate the nonlinear constraints dominates the time it takes to evaluate the objective function.

Regarding the assumption we made for MultiStart, we said that if a starting point needs more than 5 iterations to find the global minimum, it probably won’t find it and is therefore not interesting. This assumption came from the results of our statistical investigation, which showed that most of the points that converged to the global minimum did not need more than 5 iterations. If more iterations were needed, the run would probably either not find the global minimum or not converge at all. The computational effort needed to take iterations beyond the 5th is thus most likely unnecessary.

We also made a statistical investigation of how the restricted MultiStart behaved. By investigating a run with MultiStart using 100,000 uniformly distributed starting points, we found that 82 % of the starting points did not find any minimum: they either stopped when reaching their 5th iteration or could not find a feasible point. Furthermore, 16.9 % of all the starting points converged to the global minimum. Finally, 1.1 % did converge, but not to the global minimum. The result is shown in Fig. 35. Thus, when using MultiStart with a maximum of 5 allowed iterations, you should use at least 17 starting points to find the global minimum with a probability larger than 0.95.
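The figure of 17 starting points can be reproduced with a short calculation. It assumes that the starting points behave as independent trials, each converging to the global minimum with probability $p = 0.169$ (the fraction observed above); the independence is an assumption of this sketch:

$$
P(\text{at least one of } n \text{ starts succeeds}) = 1 - (1 - p)^n \ge 0.95
\iff
n \ge \frac{\ln 0.05}{\ln(1 - p)} = \frac{\ln 0.05}{\ln 0.831} \approx 16.2 .
$$

The smallest integer satisfying this is $n = 17$, which gives $1 - 0.831^{17} \approx 0.957$, whereas $n = 16$ gives only about $0.948$.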

Figure 35: Convergence study on MultiStart using active-set and maximum 5 iterations from each starting point.

Another result of limiting the number of allowed iterations is that we obtained better parallel speedup, see Fig. 23. This is because the load balance improves, since the work done for each starting point cannot exceed 5 iterations. When no limitation was imposed, the number of iterations could vary a lot, see Fig. 8.
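The restricted setup discussed above can be sketched in Matlab roughly as follows. This is a minimal illustration, not the code used in the project: the objective, the nonlinear constraint, the bounds and the starting point are hypothetical stand-ins for the hot rolling model, and only the iteration cap, the active-set algorithm and the 17 starting points are taken from the text.

```matlab
% Hypothetical stand-ins: the real objective and constraints come from the
% hot rolling model and are not reproduced here.
objective = @(x) (x(1) - 1)^2 + (x(2) - 2)^2;
nonlcon   = @(x) deal(x(1)^2 + x(2)^2 - 4, []);  % c(x) <= 0, no equalities

% Cap every local fmincon run at 5 iterations (the restricted variant).
opts = optimset('Algorithm', 'active-set', 'MaxIter', 5, 'Display', 'off');

problem = createOptimProblem('fmincon', ...
    'objective', objective, ...
    'x0', [0 0], ...
    'lb', [-3 -3], 'ub', [3 3], ...
    'nonlcon', nonlcon, ...
    'options', opts);

% Distribute the starting points over the open pool of workers.
% Assumption: older releases expect the string 'always' instead of true.
ms = MultiStart('UseParallel', true, 'Display', 'off');

% 17 starting points: x0 plus 16 drawn uniformly within the bounds.
[xBest, fBest, exitflag, output, solutions] = run(ms, problem, 17);
```

Because no local run can take more than 5 iterations, the cost per starting point is bounded, which is what evens out the work between the workers.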

Regarding GlobalSearch, we were able to find the global minimum, at least when the radius of the assumed basins was sufficiently decreased and the threshold value sufficiently increased. Nevertheless, GlobalSearch wasn’t fast enough when compared to the restricted variant of MultiStart. GlobalSearch is more complicated in how it generates its starting points and in how it selects good starting points to run fmincon from.

### 5.2 Hybrid simulated annealing

There are, undeniably, issues with implementing simulated annealing for optimization problems with constraints other than bounds. When using SA on the given objective function and constraints, it tends to quickly leave the feasible region and find infeasible points that are better than any feasible ones. Imple-