Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis (Examensarbete)

Tuning of Metaheuristics for Systems Biology Applications

Master's thesis carried out in Automatic Control
at the Institute of Technology, Linköping University

by

Anton Höghäll

LiTH-ISY-EX--10/4387--SE

Linköping 2010

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden


Tuning of Metaheuristics for Systems Biology Applications

Master's thesis carried out in Automatic Control
at the Institute of Technology, Linköping University

by

Anton Höghäll

LiTH-ISY-EX--10/4387--SE

Supervisors: Christian Lyzell, isy, Linköpings universitet
             Gunnar Cedersund, ike, Linköpings universitet
             Jan Brugård, MathCore

Examiner: Martin Enqvist, isy, Linköpings universitet


Division, Department: Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2010-08-19
Language: English
Report category: Examensarbete (Master's thesis)

URL for electronic version: http://www.control.isy.liu.se
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-58842

ISRN: LiTH-ISY-EX--10/4387--SE

Title: Fininställning av metaheuristiker för systembiologiska tillämpningar (Tuning of Metaheuristics for Systems Biology Applications)

Author: Anton Höghäll


Keywords: optimization, systems biology, metaheuristics, scatter search, simulated annealing, particle swarm optimization, core-box modeling


Abstract

In the field of systems biology, the task of finding optimal model parameters is a common procedure. The optimization problems encountered are often multi-modal, i.e., with several local optima. In this thesis, a class of algorithms for multi-modal problems called metaheuristics is studied. A downside of metaheuristic algorithms is that they depend on algorithm settings in order to yield ideal performance.

This thesis studies an approach to tuning these algorithm settings using user-constructed test functions which are faster to evaluate than an actual biological model. A statistical procedure is constructed in order to distinguish differences in performance between different configurations. Three optimization algorithms are examined more closely, namely scatter search, particle swarm optimization, and simulated annealing. However, the statistical procedure used can be applied to any algorithm that has configurable options.

The results are inconclusive in the sense that performance advantages between configurations on the test functions are not necessarily transferred onto real biological models. However, of the algorithms studied, a scatter search implementation was the clear top performer in general. The set of test functions used must be studied further if any future work is to build on this thesis.

Sammanfattning

In the field of systems biology, the task of finding optimal model parameters is common. These optimization problems often have several optima. This thesis studies metaheuristics, a class of algorithms suited to problems with several optima. A drawback of metaheuristics is that they depend on algorithm-specific settings for best performance.

This thesis treats an approach of tuning the algorithms with the help of test functions, which are faster to evaluate than model simulations. A statistical framework is used to distinguish performance differences between configurations. Three algorithms have been studied more closely: scatter search, particle swarm optimization, and simulated annealing. The statistical framework can, however, be applied to any algorithm that has configurable settings.

The results can be summarized as follows: good performance on the test functions does not guarantee good performance on biological models. Of the three algorithms studied, scatter search was the clear winner when general performance was considered. The test functions need to be developed and studied more closely if further work is to be done.


Acknowledgments

First I would like to thank, in no particular order, Gunnar Cedersund, Christian Lyzell, Martin Enqvist, Jan Brugård, David Gomez-Cabrero, and Oscar Samuelsson for all the support during this thesis. I would also like to thank everyone at Plan 12 for all the laughs.


Contents

1 Introduction
  1.1 Objective
  1.2 Related Work
  1.3 Outline
  1.4 Computer Software
  1.5 Acronyms and Terminology

2 Background
  2.1 Modeling
    2.1.1 State Space Representation
    2.1.2 Biological Modeling
  2.2 Cost Function
    2.2.1 Ad Hoc Punishments
  2.3 Core-box Modeling
  2.4 Global Optimization
    2.4.1 Multi-start Methods
    2.4.2 Metaheuristics

3 Evaluation Methods
  3.1 Test Function Suite
    3.1.1 The Instance Sets
  3.2 Performance Measures
    3.2.1 Speed
    3.2.2 Robustness
    3.2.3 Spread of Search
    3.2.4 Max-min Spread
  3.3 Discarding Configurations
    3.3.1 Assigned Configurations
  3.4 Test Setup

4 Results and Analysis
  4.1 Evaluation of Test Function Suite
    4.1.1 Set 1
    4.1.2 Set 2
    4.1.3 Set 3
    4.1.4 Summary of the Test Function Suite
  4.2 Discarding of Configurations
    4.2.1 Scatter Search
    4.2.2 Particle Swarm Optimization
  4.3 Analysis of Measures
    4.3.1 Validation of the Max-min Spread Measure
  4.4 Evaluation on Real Biological Models
    4.4.1 Benchmark Problem 1
    4.4.2 Benchmark Problem 2

5 Conclusions
  5.1 Suggestions for Future Work

Bibliography

A Statistics
  A.1 Friedman Test
  A.2 Pairwise Comparisons


Chapter 1

Introduction

Throughout the ages man has always strived for an understanding of his surroundings. Models of the solar system were made by the ancient Greeks and, with refinements from the likes of Copernicus, Kepler, Newton, and Einstein, finally fulfilled the requisites for a good model, providing predictive properties and physical insights.

Instead of gazing out into space some scientists have turned their focus inwards, to the human body. In the emerging field of systems biology, researchers are trying to model the human body and its intricate functions as a system at different levels. The primary goal in this field of research is understanding of the dynamics influencing the body. Examples range from cell metabolism to insulin signaling where the latter is the primary field of the research group where the work for this thesis has been performed.

Mathematical modeling in systems biology is the tool used in order to identify nontrivial correlations in data and generate hypotheses for testing. A reason why this is a fairly new research field is that, up until recently, simulation of such complex systems was infeasible. Simulations of these systems require a substantial computational effort, resulting in optimizations taking several hours up to days to complete. This forces us to stop and reflect: are we optimizing optimally?

1.1 Objective

The primary objective of this thesis is to evaluate the approach of tuning optimization algorithms using test functions. The test functions are problems that are faster to evaluate than regular biological models. The following objectives are stated:

• Construct a test function suite.

• Tune algorithms using the aforementioned test functions.

• Evaluate performance on models from real applications.


• Devise a measure to evaluate parameter spread.

• Investigate the impact of ad hoc punishments on the optimization results.

What constitutes a good test function, how the different algorithms should be tuned, and which properties, if any, are transferred from the test functions to the real applications are questions to be answered in this thesis.

The goal of this thesis is to enable a workflow where one can tune algorithms using the test functions while remaining confident in their performance in real-world applications, thus saving time by not having to do the tuning with lengthy model simulations.

Finally, by request of the research group at ike, an investigation regarding the usage of ad hoc punishments on the objective function should be conducted. These punishments are a way of restricting the search space by introducing severe penalties for parameters which do not describe data according to some qualitative aspect, see Section 2.2.1 for a description.

1.2 Related Work

This thesis is based on and extends the work by Oscar Samuelsson in [33]. Problems encountered in [33] include too few test functions to allow statistical analysis, no measure to evaluate parameter spread, only one model used for evaluations, and an incomplete investigation of the ad hoc punishments.

The measures and the discard procedure used in this thesis are slightly refined from [33]. However, the basics remain the same and credit should be given to the original author for inspiration.

For full disclosure, the work in [33] was carried out in the same research group, and Oscar Samuelsson has been a valuable resource for information and discussions during the work on this thesis.

1.3 Outline

The thesis is written in such a manner that it should be understandable for an engineering student with basic knowledge of control theory and statistics.

The background for this thesis is treated in Chapter 2, describing modeling in general and core-box modeling in particular, and the optimization algorithms used. The methodology used is detailed in Chapter 3, and the results are presented in Chapter 4. Conclusions made are summarized in Chapter 5 along with suggestions for future work and extensions of this thesis.

1.4 Computer Software

The coding of the algorithms mentioned in this thesis has primarily been done in Matlab, and simulations of the biological models were done with the help of Systems Biology Toolbox 2 [35]. Part of the calculations and simulations required the use of resources at NSC, the National Supercomputer Centre in Linköping, Sweden. Even though Matlab would have sufficed, the statistical tests were performed using R and Minitab due to their user-friendliness.

1.5 Acronyms and Terminology

Configuration - Distinct set of algorithm settings
Cost function - Objective function
DHC - Dynamic hill climbing
Instance - Test function
ODE - Ordinary differential equation
PR - Peak ratio
PSO - Particle swarm optimization
SA - Simulated annealing
SSm - Scatter search for Matlab [25]
SP - Success performance


Chapter 2

Background

Mathematical modeling is an effective tool for gaining knowledge in a wide range of subjects, ranging from describing the motion of celestial bodies through space to the movement of a robotic arm on an industrial assembly line.

With the advent of increasingly powerful computers and software such as MathModelica and Matlab, mathematical modeling is a possibility in problems where an analysis by hand is simply infeasible. Also, by offering a way to quickly alter model settings, different external and internal conditions can be considered and evaluated.

2.1 Modeling

A terse description of a system might be an object where signals interact. External signals to the system are called inputs if they are manipulable by the observer, and disturbances if they are not. Disturbances can either be directly measurable, or observable only through their influence on the output. The external stimuli to the system produce an output which is measurable.

A model of a system is a formulation of the relationship between input and output signals, based on either a priori knowledge or even a guess. The field of system identification deals with the problem of building a model using measured data.

There exist several modeling approaches and two of them are briefly explained below. The interested reader is referred to [21] for an extensive treatment of the topic.

Black-box modeling considers the system unknown and serves only as a tool to obtain a good fit to measured data. Black-box modeling requires no knowledge of the system in order to predict model output, but at the same time offers limited physical insight.

An alternative approach is white-box modeling, where one instead describes the system without the need of input/output data. However, this approach requires knowledge of the system extensive enough to precisely state the interactions between input and output signals. This prerequisite knowledge of the system severely restricts this approach to areas where the underlying principles are fully known.

The perfect model will not only possess excellent predictive qualities but also offer complete insight into the interactions within the system. However, black-box modeling mainly offers predictive qualities, whereas white-box modeling requires the system to be known. To overcome these complications, the two modeling approaches can be combined into a third approach, gray-box modeling.

Commonly in the field of systems biology one deals with gray-box models where a basic model structure is known, but the specific parameter values remain unknown.

2.1.1 State Space Representation

Using x as an auxiliary vector, a general mathematical model of a system in continuous time can be written in a succinct way,

\dot{x} = f(t, x(t), u(t), w(t) \mid \theta),
y = g(t, x(t), u(t), v(t) \mid \theta).    (2.1)

The system in (2.1) has input u, output y, process noise w, measurement noise v, time vector t, and θ represents the vector of unknown model parameters. A simple model of a water tank will exemplify the usage of (2.1).

Example 2.1

Consider a cistern where the rate of change of the water level, \dot{x}, is the difference between inflow and outflow. With the help of Bernoulli's principle [34] this system can be modeled as

\dot{x} = \frac{1}{A}\left(u - \theta\sqrt{2gx}\right), \quad y_1 = \theta\sqrt{2gx}, \quad y_2 = x,    (2.2)

where A is the area of the water surface, u is the inflow, \theta\sqrt{2gx} describes the outflow, and g is the standard gravity. Here \theta represents the unknown area of the opening, and is a model parameter. This parameter may be estimated by

\hat{\theta} = \frac{y_1}{\sqrt{2g\,y_2}}.

This example is tailor-made for illustrational purposes. General problems are not necessarily as straightforward, with parameters often being hard to estimate precisely.


Figure 2.1: A simple biological model.

2.1.2 Biological Modeling

A biological analogue of the system in Example 2.1 is presented below as a segue between the physical realm of modeling into biology.

Example 2.2

Figure 2.1 illustrates a simple interaction between three proteins, denoted x1, x2, and x3. The arrows in Figure 2.1 indicate that x1 can form x2, x2 can form x3, etc. The parameters k1, k2, and k3 are called kinetic parameters, and they determine the speed of the reactions. For example, k1 determines the rate at which x1 forms x2.

This, together with the fact that the rate of change is the difference between the flow into a node and the outflow (cf. Example 2.1), means that the system can be described in a more familiar form,

\dot{x}_1 = k_3 x_3 - k_1 x_1, \quad \dot{x}_2 = k_1 x_1 - k_2 x_2, \quad \dot{x}_3 = k_2 x_2 - k_3 x_3.    (2.3)

Each state in (2.3) corresponds to a concentration of a protein in the system. The interactions can of course be made more realistic by adding for example Michaelis-Menten kinetics [26], yet the fundamentals remain the same.
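To make concrete what simulating such an ODE model involves, the following Python sketch integrates (2.3) numerically. It is only an illustration: the kinetic parameter values and initial concentrations are made-up, and the thesis work itself used Matlab with the Systems Biology Toolbox 2 rather than Python.

import numpy as np
from scipy.integrate import solve_ivp

def protein_model(t, x, k1, k2, k3):
    # Right-hand side of the ODE system (2.3).
    x1, x2, x3 = x
    return [k3 * x3 - k1 * x1,
            k1 * x1 - k2 * x2,
            k2 * x2 - k3 * x3]

theta = (0.5, 0.3, 0.1)        # hypothetical kinetic parameters k1, k2, k3
x0 = [1.0, 0.0, 0.0]           # hypothetical initial concentrations
sol = solve_ivp(protein_model, (0.0, 50.0), x0, args=theta,
                t_eval=np.linspace(0.0, 50.0, 200))
print(sol.y[:, -1])            # concentrations at the final time point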

The model in (2.3) is in state-space form as a system of ordinary differential equations (an ODE model). There exist many modeling frameworks with varying complexity, ranging from boolean modeling, an interaction graph approach with nodes only having two distinct values (one or zero), to systems of stochastic differential equations where Monte Carlo-based methods are required in simulations [14]. More information regarding different modeling frameworks in systems biology can be found in [16].

The models studied in this thesis are ODE models since it is the primary modeling technique currently being used by Gunnar Cedersund’s research group at ike, Linköping University.


2.2 Cost Function

A natural desire in modeling is good predictive qualities of the model. One way to achieve this is to minimize the sum of norms between the predicted model output and the measured data. Furthermore, by weighting measurements using their variance, one can place more confidence in measurements with low variance. Stated mathematically, a possible objective function is given by

V(\theta) = \sum_{k=1}^{N} \frac{(y(t_k) - \hat{y}(t_k \mid \theta))^2}{\sigma_k^2},    (2.4)

where y(t_1), . . . , y(t_N) are the measured values, σ_k^2 is the variance in measurement k, and ŷ(t_k | θ) are the predicted values using parameter vector θ. Under the assumption y(t_k) = ŷ(t_k | θ) + ε(t_k), where ε(t) ∼ N(0, σ^2(t)), this is asymptotically equivalent to the maximum likelihood method [21].
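As an illustration, a direct translation of (2.4) into Python could look as follows. The function simulate_model is a placeholder for any routine that returns the predicted output ŷ(t_k | θ); it is not part of the thesis code.

import numpy as np

def cost(theta, t, y_meas, sigma2, simulate_model):
    # Weighted least-squares cost V(theta) from (2.4).
    y_hat = simulate_model(theta, t)               # predicted output at the time points t_k
    return np.sum((y_meas - y_hat) ** 2 / sigma2)  # variance-weighted squared residuals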

A useful property of (2.4) is that one can reject the parameter vector θ̂ using a chi-square test. A rejection means that the parameter vector is not describing data sufficiently well, and is possible if V(θ̂) exceeds the critical value of a χ²_α(d)-distribution, where α is the significance level and d is the degrees of freedom. Given two independent sets of data, one for parameter estimation and one for model validation, the number of degrees of freedom is the number of data points in the validation set. However, this only applies as long as the parameters in the model are identifiable, where identifiability implies a theoretical possibility to identify the true value of a parameter.

Unfortunately, due to unidentifiability being commonplace in systems biology, the value of d is difficult to determine. Since this thesis is focused neither on identifiability nor on model rejections, further information regarding this topic can be found in [31] and references therein.

2.2.1 Ad Hoc Punishments

A common practice among the system biologists at ike is the usage of so called ad hoc punishments on the objective function,

\tilde{V}(\theta) = V(\theta) + w(\theta),    (2.5)

where V is the objective function found in (2.4) and w represents the ad hoc punishment.

The ad hoc punishments impose certain requirements on the model output, e.g., rise times or steady-state levels. These requirements are often related to core predictions, which will be described in Section 2.3.

If a parameter vector cannot produce a satisfactory output, the parameters are discarded and a penalty is added to the cost function. The rationale behind the ad hoc punishments is discouraging search in the neighboring space.

The ad hoc punishments are quite simple in their current implementation, adding penalties of up to 10^14 to the cost function, thereby altering the appearance of the cost function surface. Whether or not these modifications actually help the optimization algorithms has not been properly investigated.
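A minimal sketch of how such a penalized cost (2.5) can be wrapped around the cost function sketched earlier is given below. The qualitative requirement and the penalty size 10^14 are illustrative; the actual requirements used at ike are model-specific.

def penalized_cost(theta, t, y_meas, sigma2, simulate_model, requirement):
    # V~(theta) = V(theta) + w(theta), with w(theta) an ad hoc punishment.
    v = cost(theta, t, y_meas, sigma2, simulate_model)
    y_hat = simulate_model(theta, t)
    if not requirement(y_hat):   # e.g. a rise-time or steady-state criterion
        return v + 1e14          # large penalty discourages search nearby
    return v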



Figure 2.2: Two different parameter sets giving approximately the same model behavior. Figure courtesy of Oscar Samuelsson [33].

2.3 Core-box Modeling

Models of biological systems are often complex, consisting of coupled processes with multiple feedback loops and varied dynamics. Parameter values in these systems depend on several factors such as cell types and experimental conditions. Furthermore, measuring biological systems is quite difficult and one often has to work with a small number of samples with a high noise level. These dependencies, combined with the identifiability issues discussed in [31], often make parameter estimates non-unique.

Not accounting for these circumstances, in a worst case scenario, one could end up with model estimates and subsequent conclusions which are arbitrarily unreliable. Since the parameters cannot be uniquely determined, an alternative is to investigate the set of all feasible parameter vectors,

Θ = {θ : V (θ) < δ}. (2.6)

By adjusting δ, Θ can be chosen as all parameters that describe the data sufficiently well, in a statistical sense. Given the uncertainties in the parameters due to unidentifiability, there exists no basis for favoring one particular parameter vector in Θ over another. Instead, by analyzing the model output given from all vectors in Θ one can establish any shared properties among these vectors, e.g., an overshoot behavior. Such a shared characteristic denotes a core prediction. This is illustrated in Figure 2.2.
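A sketch of how the set (2.6) can be collected and screened for a shared property is given below. The threshold delta and the overshoot criterion are illustrative assumptions, not the criteria used in the thesis.

import numpy as np

def feasible_set(candidates, cost_fun, delta):
    # All candidate parameter vectors that describe the data sufficiently well.
    return [theta for theta in candidates if cost_fun(theta) < delta]

def shared_overshoot(thetas, simulate_model, t):
    # True if every acceptable parameter vector predicts an overshoot,
    # i.e. a candidate core prediction.
    return all(np.max(simulate_model(theta, t)) > simulate_model(theta, t)[-1]
               for theta in thetas)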

Core-box modeling [3] is a method developed by Gunnar Cedersund comprising steps of experiments, data analysis, and rejections, therefrom iteratively constructing a valid model structure [2].

Firstly, a proposed model structure is derived using physiological knowledge and assumptions about the system. Secondly, a parameter estimation is performed, usually minimizing the prediction errors in some respect, i.e., (2.4). This also serves as a basis for a statistical test determining the likelihood of the data being generated by the model. If the model passes this test it is likely that the model structure captures at least some of the true dynamics of the system.

By analyzing the acceptable parameter vectors found in the parameter estimation, the set in (2.6) is formed and the shared properties can be identified.


The end result is a core-box model, which is the accepted model and any physiological insights made, i.e., core predictions. One specific trait of this process is its independence from individual parameter vectors, instead relying on a set of acceptable parameters.

Determining the set in (2.6) does not comply with most current optimization algorithms since they are mostly focused on finding the optimum, which is the topic of the next section.

2.4 Global Optimization

Given a model structure, how does one actually go about finding these elusive optimal parameters that supposedly explain the data and offer physiological insights?

The search space, the space of parameter vectors, in systems biology applications is often high dimensional due to a large number of parameters. The varying dynamics in the systems also make the range of feasible parameter values very large. Luckily, there often exists a physical interpretation of the parameters, implying positive parameter values. This is a vast space to search, even with the help of computers, and in this thesis the parameter values have been restricted to [10^-6, 10^6]. Nevertheless, efficient methods to locate these parameter vectors are crucial.

Global optimization is the method of searching a space to locate the optimal solution vector producing the best value with regard to some objective function, e.g., (2.4). There are two kinds of optimization problems, convex or non-convex. In a convex problem, a local minimum is equivalent to a global optimum. Thus, if a local minimum has been found the global optimum is also known. Non-convex optimization problems do not have the same useful property. Successfully locating a local minimum does not imply global optimality and this results in a harder problem to solve.

In general, there are two classes of algorithms, deterministic and stochastic methods. The stochastic methods will be treated in Section 2.4.2.

Deterministic algorithms aim to guarantee locating the global optimum. For example, a multi-start algorithm can locate optima given a dense enough grid of starting points. However, in practice this guarantee is often pointless because the required computational effort increases rapidly (often exponentially) with the problem size. For example, the number of starting points using multi-start must increase exponentially with the number of dimensions in order to retain the same density of points. This is commonly known as the curse of dimensionality.

Most optimization algorithms are focused on finding the best solution vector. As discussed in Section 2.3, this is not of primary utility for the core-box modeling framework. Deterministic algorithms have the drawback of always making the same choice in a given situation, hence exploring the same path given a fixed starting point. One deterministic method, multi-start, was used in the thesis to serve as a comparison in the benchmarks.


2.4.1 Multi-start Methods

A multi-start method consists of two parts, a diversification method generating random points in the search space and a local solver. Given a starting point (generated by the diversification method) the local solver will search “downhill” until it no longer improves the objective function value.

This is a simple approach, which can be very effective when dealing with convex problems. Drawbacks include inefficiency in non-convex problems, since starting points in the same basin of attraction will locate the same optimum. There exist clustering methods to remedy this situation [30]. However, these methods have issues with systems with a large number of parameters [25].

There exists a wide variety of local solvers to deploy. These are often dependent on problem-specific properties, e.g., the availability of analytical gradients or a least squares problem structure, in order to optimize performance. In this thesis, the local phase of the Dynamic Hill Climbing (DHC) [37] algorithm is used. This algorithm allows discontinuous objective functions, which is the case when using ad hoc punishments.

In the realm of core-box modeling, the exploration of the search space using a multi-start method is highly dependent on choosing a dense grid of starting points. Otherwise, feasible parameter sets could be missed.
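A minimal multi-start sketch is given below. The thesis uses the local phase of DHC as its local solver; here the Nelder-Mead method from SciPy is substituted purely for illustration, and the number of starting points is an arbitrary choice.

import numpy as np
from scipy.optimize import minimize

def multi_start(cost_fun, lower, upper, n_starts=50, seed=0):
    rng = np.random.default_rng(seed)
    found = []
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)                        # diversification step
        res = minimize(cost_fun, x0, method="Nelder-Mead")    # local solver
        found.append((res.fun, res.x))
    found.sort(key=lambda pair: pair[0])                      # best local optimum first
    return found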

2.4.2 Metaheuristics

Metaheuristics are a class of optimization algorithms that use heuristic methods in a “higher sense”, from Greek meta meaning beyond. This term was coined by Glover in 1986 coinciding with his work on tabu search [10]. There exist several definitions of what constitutes a metaheuristic, though such a discussion is not relevant for this text.

Heuristics were first devised to solve hard, in the sense of computational effort, combinatorial problems like the Traveling Salesman Problem, with for example simulated annealing practically “solving” this problem [29]. They have since then been adapted to suit continuous problems as well. A well-constructed heuristic algorithm can find the vicinity of a global optimum with relative ease compared to an exact method. However, the trade-off is that global optimality cannot be guaranteed. Heuristic algorithms contain an element of randomness and are therefore stochastic methods. Heuristics are quite forgiving in the sense that they only need an objective function and require no additional information such as gradients.

One common attribute among heuristics is that they only serve as general algorithmic templates whose individual parameters need to be properly configured to yield ideal performance. A distinct set of parameters will be referred to as a configuration. Tuning of heuristics by hand is a process based on trial and error and since it involves decisions based on personal experience with a mixture of rules of thumb, it is not a solid foundation for an unbiased selection. Thus, reliable frameworks for tuning heuristics are needed, with [27], [13], and [1] containing possible strategies.


Scatter Search

Introduced by Glover in 1977 as a heuristic for integer programming [9], the scatter search algorithm aims to systematically explore the search space. The central part of scatter search is the reference set, which contains a number of promising solution vectors. Combinations of the vectors in the reference set are used to generate new solutions, ideally producing increasingly better objective function values.

The scatter search template presented by Glover [11] serves as the foundation for most implementations to date. Laguna describes a “five-method template”, detailed in [19], where the components of scatter search could be stated as:

• A diversification generation method that generates a set of diverse trial solutions.

• For each of the trial solutions, a local search is performed with the objective to improve the solution.

• A reference set is built using the improved trial solutions. Both quality and diversity are considered.

• A subset generation method is applied to the reference set to produce subsets of solutions, serving as a basis for creating combined solutions.

• Finally, a solution combination method transforms the subsets generated in the previous step into new trial solutions.

The last four steps in this procedure are then looped, keeping the number of solutions in the reference set constant between iterations, until satisfactory solutions are available in the reference set. It is important to note that the performance of scatter search is highly dependent on the implementation of the individual steps above and not the steps themselves.

The implementation of scatter search used in this thesis is described in more detail in [7].
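To make the five-method template concrete, a deliberately simplified Python sketch is given below. It is not the SSm implementation from [7]: the reference set is updated on quality only, Nelder-Mead stands in for the local search, and the combination step uses simple line combinations of solution pairs.

import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def scatter_search(cost_fun, lower, upper, ref_size=10, iterations=20, seed=0):
    rng = np.random.default_rng(seed)

    def improve(x):                                    # improvement method (local search)
        return minimize(cost_fun, x, method="Nelder-Mead").x

    # Diversification generation followed by improvement of each trial solution.
    trial = [improve(rng.uniform(lower, upper)) for _ in range(5 * ref_size)]
    ref = sorted(trial, key=cost_fun)[:ref_size]       # initial reference set

    for _ in range(iterations):
        new_trials = []
        for a, b in combinations(ref, 2):              # subset generation: all pairs
            lam = rng.uniform(-0.5, 1.5)               # solution combination method
            new_trials.append(np.clip(a + lam * (b - a), lower, upper))
        improved = [improve(x) for x in new_trials]
        ref = sorted(ref + improved, key=cost_fun)[:ref_size]   # keep the set size constant
    return ref[0], cost_fun(ref[0])                    # best solution and its cost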

Particle Swarm Optimization

Introduced by Eberhart and Kennedy in 1995 [17], Particle Swarm Optimization (PSO) is based on the notion of swarm intelligence, i.e., the collective behavior in decentralized, self-organized systems. The swarm is made up of individuals without any centralized control structure, where local interactions between individuals influence the whole swarm. Analogues can be found in nature, such as ant colonies, flocking of birds, etc.

A particle is a point in the search space and therefore represents a parameter vector. In PSO, particles are moving around in the search space with their search directions and step lengths decided by building linear combinations using other particles in the swarm. The general idea in PSO is that particles share information amongst each other, and thus move into more promising regions of the search space.


These linear combinations are built using two factors, individuality and sociality. Individuality describes the influence from the particle's own best position. Sociality represents the best position found in its neighborhood.

Defining the neighborhood as the whole swarm, the swarm strives to find the global optimum. By introducing smaller neighborhoods in the swarm, smaller clusters instead search independently for an optimum. The choice of neighborhoods is in itself a research field, see for instance [20].

In this thesis, a simple implementation is made, keeping the number of algorithm parameters low. A pseudocode description of the PSO algorithm is given in Pseudocode 1 using the definitions below.

Given a search space R^n with a cost function f : R^n → R, let particle i have position x_i ∈ R^n and velocity v_i. Define the particle's best known position as p_i and let g be the best known position in the entire swarm. Finally, let I be the inertia weight of the particle. The inertia weight decides how prone the particle is to change its current search direction. Therefore, a high inertia weight results in a global search and a low inertia weight results in a local search, comparable to the concept of temperature in simulated annealing.

The inertia weight is typically decreased (while remaining positive) during the search. A usage of negative initial inertia weight to diversify the search can be seen in [22].

The parameters to be tuned later on are: φP, deciding the number of particles in the swarm, φT, deciding the number of neighboring particles, and φI, defining the inertia weight reduction factor.

Simulated Annealing

Like many other heuristic algorithms, simulated annealing is inspired by a phenomenon in nature. In this case, it is the behavior of atoms in metal serving as inspiration, specifically in the process of annealing. When heating a metal, the atoms are dislodged from their original positions (which are local minima of internal energy) and are free to travel through arrangements with higher internal energy. By systematically cooling the metal one can obtain an arrangement of lower internal energy than before, ending up with a crystal structure with other properties.

Simulated annealing as an optimization algorithm is often attributed to [18] but the fundamental principles are treated as far back as Metropolis’ work in 1953 [23]. As mentioned before, simulated annealing was originally an algorithm for solving combinatorial problems. An adaptation to continuous problems is presented below based on the Nelder-Mead Downhill Simplex algorithm where a simplex is transformed in the search space looking for the best objective function value [29].

A simplex is a geometric figure which serves as the n-dimensional generalization of a triangle or tetrahedron, i.e., in N dimensions a simplex consists of N + 1 vertices. In simulated annealing for continuous problems, the main device is letting one or many simplexes move around in the search space.


Pseudocode 1 Particle Swarm Optimization

Require: Cost function f. Parameters φP, φT, and φI. Stopping thresholds maxEval and maxIter.
  I ← 1
  evals ← 0
  iters ← 0
  numberOfParticles ← round(φP · n) + 1
  numberOfNeighbors ← round(φT · numberOfParticles) + 1
  if numberOfNeighbors > numberOfParticles then
    numberOfNeighbors ← numberOfParticles
  end if
  initParticles()  {Allocate memory for the particle matrix.}
  for all particles do
    xi ← U(blower, bupper)  {b being the bounds of the search space.}
    pi ← xi
    vi ← U(−(bupper − blower), (bupper − blower))
  end for
  for all particles do
    defineNeighbors(numberOfNeighbors)
  end for
  while evals < maxEval and iters < maxIter do
    for all particles do
      Calculate f(xi)
      evals ← evals + 1
      if f(xi) < f(pi) then
        pi ← xi
        if f(xi) < f(g) then
          g ← pi  {Update global optimum.}
        end if
      end if
    end for
    for all particles do
      n ← getNeighborWithBestObjectiveFunctionValue()
      rp, rn ← U(0, 1)
      {Elementwise product, i.e., x ◦ y = (x1 · y1, x2 · y2, . . . , xn · yn).}
      vi ← I · vi + rp ◦ (pi − xi) + rn ◦ (pn − xi)
      xi ← xi + vi
    end for
    iters ← iters + 1
    I ← φI · I
  end while


Figure 2.3: Illustration of the simplex transformations in R²: (a) Reflection, (b) Reflection and expansion, (c) Contraction, (d) Multiple contraction. The dashed simplex represents the simplex at the beginning of the step, with the top vertex having the worst objective function value.

The simplex moves freely in the search space at high temperatures; as the search progresses and the temperature cools, the simplex is less prone to erratic movement and displays behavior more akin to a local search. The movement is defined by the four transformations depicted in Figure 2.3.

Since the rate of cooling is largely problem-dependent [29], the main tuning parameters in simulated annealing are the start temperature T1, the number of iterations for each temperature Nmax, and the cooling scheme, here represented by the function c.

The process used to select restarting points is fairly complicated and covered in [28]. This addition is made to encourage exploration of the search space for the possibility of finding several admissible solutions with regard to (2.6). The simulated annealing algorithm is given in Pseudocode 2.


Pseudocode 2 Simulated Annealing

Require: Cost function f. Cooling function c. Start guess x0. Start temperature T1. Stop temperature Ts. Iteration limit at each temperature Nmax.
  Initiate the simplex using x0.
  k ← 1
  iter ← 0
  while Tk > Ts do
    while iter < Nmax do
      for all vertices do
        Calculate f(x)
        valueOfVertex ← f(x) + L  {L ∝ Tk is a random logarithmically distributed number.}
      end for
      Compare valueOfVertex.
      Reflect, expand, or contract the simplex. Eliminate the vertices with the worst value(s).
      for all new vertices do
        Calculate f(x)
        valueOfVertex ← f(x) − L  {L ∝ Tk is a random logarithmically distributed number.}
      end for
      iter ← iter + 1
    end while
    Tk+1 ← c(Tk)
    Decide new starting points.
    iter ← 0
    k ← k + 1
  end while


Chapter 3

Evaluation Methods

This chapter treats the evaluation methods used in this thesis, starting with an overview of the individual components, then connecting these together into the test setup.

3.1 Test Function Suite

In order to readily tune the metaheuristics, a suite of problems with known properties is required. The first step taken here is to base the problems on analytical functions which are fast to evaluate instead of time-consuming evaluations of simulated models. Moreover, making the test functions flexible to customizations is also a desired feature.

An aspiration is to create some resemblance to real problems since the objective is trying to map beneficial algorithm settings from the test functions to real applications. Therefore, plots of objective functions from [2] were studied. The high dimensionality of the problems at hand makes visualization of the objective function surfaces difficult. The procedure used here is picking a point and plotting the surrounding neighborhood by altering the value of two parameters while keeping the rest fixed. A representative image of the landscape is illustrated in Figure 3.1, which also depicts the impact of using the ad hoc punishments from Section 2.2.1.

The test functions used here are based on the quadratic family described in [32], and are defined by the following expression,

f(x) = \min_{i=1,2,\ldots,n} (x - p_i)^T B_i (x - p_i) + v_i,    (3.1)

where p_i specifies the location of an optimum, n is the number of optima, v_i specifies the objective function value of the optimum, and B_i is a positive-definite matrix.

The definition in (3.1) enables construction of problems with different numbers of dimensions and optima among other features. The global optima were given value v_i = 0 and local optima v_i = 0.001. Furthermore, the positive-definiteness of B_i implies that f(x) = 0 if and only if x is a global optimum.
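A sketch of how instances of the family (3.1) can be generated is given below. The way B_i is randomized (a diagonal positive-definite matrix) is a simplifying assumption; the thesis shapes B_i more carefully to control steepness and basin size.

import numpy as np

def make_quadratic_instance(dim, optima_pos, optima_val, seed=0):
    # optima_pos is a list of p_i vectors, optima_val the corresponding v_i values.
    rng = np.random.default_rng(seed)
    B = [np.diag(rng.uniform(0.5, 5.0, size=dim)) for _ in optima_pos]

    def f(x):
        x = np.asarray(x)
        return min((x - p) @ Bi @ (x - p) + v
                   for p, Bi, v in zip(optima_pos, B, optima_val))
    return f

# A 2-dimensional instance with one global (v = 0) and one local (v = 0.001) optimum.
f = make_quadratic_instance(2, [np.zeros(2), 3.0 * np.ones(2)], [0.0, 0.001])
print(f(np.zeros(2)))   # 0.0 at the global optimum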


Figure 3.1: An illustration of the objective function in (3.1), from the “Mfia” model in [2], close to an optimum, as seen in the p1-p2 plane. The right plot shows what is “left” after using the ad hoc punishments from Section 2.2.1. In the empty space the objective function is defined as 10^14.


Thus, the test function surface is made up of distorted convex quadratic forms, where the appearance of the landscape can be modified by adjusting the matrix B_i. Making the instances suitable in difficulty, neither too hard nor too easy for a given number of evaluations, is difficult, turning instance creation into a process with a great deal of trial and error.

A total of three different sets of test functions were finally chosen with features described in Section 3.1.1. These sets of test functions can be viewed as extensions of the first set used in [33]. Each set consists of 81 instances with a varying number of dimensions, global and local optima, and a steepness flag which defines how steep the descents down to the optima are. See Table B.1 in Appendix B for a complete reference.

3.1.1 The Instance Sets

Set 1 - Regular In the first set, the test functions are characterized by large basins of attraction and smooth appearance. This is the same set used in [33].

Set 2 - Extra Steepness The second set of test functions is constructed with a modification in the normalization of the elements in the matrices B_i to create steeper descents to the optima while at the same time making the basin of attraction smaller. The goal is to create a larger contrast in the cost function landscape.

Set 3 - Multiple Objective Functions In the last set, the test functions have two objective functions of the form (3.1) added onto each other. Each objective function has its own local optima, but the global optima between the two are shared. This creates an even more dynamic landscape with larger differences between what can be considered normal values and the optimum. This design further challenges the algorithms and hopefully excites the qualities of the better configurations.

3.2 Performance Measures

The three distinct properties investigated in this thesis are speed, robustness, and spread of search. The objective is to evaluate if these characteristics are transferred from the test functions to real applications.

3.2.1 Speed

The number of function evaluations it takes to find a feasible solution is an important measure, since it shows how effective a specific algorithm or configuration is.

The use of the number of function evaluations as a measure is motivated by its independence from specifics like CPU-frequencies and memory subsystems.


Any differences in algorithm overhead are negligible compared to the cost function evaluation in the real applications. The speed measure is defined as

NFE_avg = the average number of function evaluations needed to find a global optimum.

When used in real applications, a global optimum is defined as an objective function value close to the best result, with the exact limits detailed later on.

3.2.2 Robustness

Robustness, i.e., reliability in locating an optimum, serves as the basis for the next measure. In the test functions, the global optima are zero and a successful run is therefore defined by finding a value below the threshold of a local optimum, i.e., a cost below 0.001. Averaging this over the total number of trials gives the success rate (SR),

SR = (number of successful trials) / (total number of trials),

where a successful trial in the biological applications is defined in the same way as for the speed measure.

Given n trials, SR can only take n + 1 distinct values. If the number of trials is low, the probability of tied values between configurations is high. The number of trials used in this thesis is limited to five, resulting in complications when used in conjunction with the pairwise comparisons described in Section A.2. Therefore, to create some separation, SR is combined with the speed measure, creating what is called the success performance (SP),

SP = NFE_avg / SR.

By punishing the speed measure NFE_avg, dividing it by the success rate SR, robustness is encouraged instead of only speed, since a configuration that fails half of its trials must be twice as fast in order to maintain the same SP.

Note that SP is independent of NFE_max, assuming the limit is set high enough not to halt a potentially successful trial [32], hence demanding a high evaluation budget.
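As an illustration, the sketch below computes NFE_avg, SR, and SP for one configuration. The bookkeeping format, one (success flag, evaluations) pair per trial, is an assumption made for the example only.

import numpy as np

def success_performance(trials):
    successes = [n_eval for ok, n_eval in trials if ok]
    if not successes:
        return np.inf                      # no successful trial: SP is undefined (infinite)
    nfe_avg = np.mean(successes)           # NFE_avg over successful trials
    sr = len(successes) / len(trials)      # SR
    return nfe_avg / sr                    # SP

print(success_performance([(True, 8000), (True, 12000), (False, 30000),
                           (True, 9500), (False, 30000)]))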

3.2.3 Spread of Search

The core-box modeling approach is dependent on finding several feasible parameter vectors in order to determine uniquely identifiable model properties. An evaluation of how extensively the search space is scanned is therefore the last measure used.

PR = (number of unique global optima found over all trials) / (total number of global optima)

The peak ratio (PR) [36] defines the ratio between the number of found global optima and the total number of global optima. The peak ratio could be measured over time, to study how fast several optima are found or even lost.


Figure 3.2: The pitfall of using mean distance between points as spread measure. The two sets of points × and ◦ have comparable average distances between points, but the crosses (×) are more spread.

In this thesis, no consideration has been made of time. This measure is only applicable when the locations of the optima are known. In order to measure the spread of the search on the benchmark problems, where the exact locations of the optima are unknown, another measure has to be used, called the max-min spread.

3.2.4 Max-min Spread

Given two searches, could one be considered more extensive than the other? The results from an optimization are points in the search space corresponding to feasible parameter vectors. One natural approach is maximizing the average distance between points; however, Figure 3.2 illustrates a clear drawback. The two searches in Figure 3.2 have almost identical average distances between the points, but the crosses are in some sense more spread.

Instead, inspired by Figure 3.2, given the points θ ∈ Θ, maximize the sum of the minimum Euclidean distances between K of these points. Note that if K = 2 in Figure 3.2, the circles are more spread, so the max-min spread is a function of K. For a point i, let y_i be equal to 1 if point i is selected and 0 otherwise. Let x_i denote the minimum distance to the rest of the selected points if point i is selected, and 0 otherwise. Furthermore, let K denote the number of points to be selected, d_ij the distance between points i and j, and M_i and Δ_i sufficiently big constants. The constants bound the search space, which speeds up the optimization.


Using previous definitions, the max-min spread can be stated as the following integer programming problem [12],

maximize    \sum_{i=1}^{n} x_i
subject to  \sum_{i=1}^{n} y_i = K
            x_i ≤ M_i y_i,                  i ∈ {1, . . . , n}
            x_i ≤ d_ij + Δ_i (1 − y_j),     i, j ∈ {1, . . . , n}
            y_i ∈ {0, 1},                   i ∈ {1, . . . , n}
            x_i ≥ 0,                        i ∈ {1, . . . , n}    (3.2)

The problem in (3.2) is solved using a binary genetic algorithm (not detailed here) from The Optimization Toolbox in Matlab. It should be noted that this measure is under development and should not be considered finalized. In its current implementation it is very expensive to evaluate, limiting its utility. See [12] for more information regarding this measure.
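To make the idea behind the measure concrete, the greedy farthest-point sketch below approximates the max-min spread. It is not the integer program (3.2) solved with the binary genetic algorithm, only a cheaper heuristic shown for illustration (it assumes K ≥ 2).

import numpy as np

def greedy_max_min_spread(points, K):
    # Pick K points, each as far as possible from the already chosen ones,
    # and return the chosen indices and the sum of minimum distances.
    points = np.asarray(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    chosen = [0]                                   # arbitrary starting point
    while len(chosen) < K:
        min_dist = dist[:, chosen].min(axis=1)     # distance to the nearest chosen point
        min_dist[chosen] = -1.0                    # never re-pick a chosen point
        chosen.append(int(np.argmax(min_dist)))
    spread = sum(dist[i, [j for j in chosen if j != i]].min() for i in chosen)
    return chosen, spread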

3.3 Discarding Configurations

The discarding procedure comprises two steps, a Friedman test and pairwise comparisons.

The Friedman test is a non-parametric statistical test used to detect differences among configurations over multiple test functions. In this thesis it is applied to compare the configurations over the test functions. Non-parametric statistics are a useful tool when Gaussian distributed data cannot be assumed, which is the case in this thesis.

The pairwise comparisons are multiple hypothesis tests determining whether configuration A is indeed significantly better than configuration B.

The Friedman test and the pairwise comparisons are detailed in Appendix A.1 and A.2, respectively.
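A small sketch of the first step, using the Friedman test implementation in SciPy, is given below; the numbers are made up, and the actual tests in the thesis were run in R and Minitab. Rows are test functions (blocks) and columns are configurations (treatments).

import numpy as np
from scipy.stats import friedmanchisquare

results = np.array([[12000, 15000, 9000],     # e.g. NFE_avg on instance 1
                    [11000, 18000, 9500],     # instance 2
                    [13000, 14000, 8000],     # instance 3
                    [12500, 16000, 7000]])    # instance 4

# friedmanchisquare expects one argument per configuration (one column each).
stat, p_value = friedmanchisquare(*results.T)
print(stat, p_value)   # reject "all configurations perform equally" if p is small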

3.3.1 Assigned Configurations

The selection of relevant settings to adjust is delicate, since the default values often are set based on experience and are considered good. The changes made must be significant enough to induce a difference in performance yet stay clear of unnecessarily extreme values. The number of levels for each option has to be somewhat limited as well, avoiding a combinatorial explosion.

Scatter Search

The number of options in scatter search is extensive and not as clear-cut as in the PSO, ranging from micromanaging tolerances for the local search to how to treat the reference set containing the good solutions. A selection of what could be deemed relevant options had to be made to limit the number of configurations, and any other options are left at default values.


Table 3.1 presents the options with their corresponding levels, amounting to 2 × 2 × 3 × 2 × 2 = 48 different configurations. All configurations are listed for reference in Table B.2.

Table 3.1: Summary of scatter search options with their levels.

  Option               Levels
  Reference set size   auto, 40
  Combination          hyper-rectangle, linear
  Regenerate           distance, direction, alternate
  Delete               aggressive, standard
  Merit filter         on, off

The reference set is essential in scatter search since new solution vectors are created using its members. A maximum size of 20 solution vectors is recommended in [8]. However, [6] suggests that larger set sizes yield solutions of higher quality. Therefore, ‘40’ was chosen as an interesting reference set size whereas ‘auto’ computes an appropriate size in relation to the problem, i.e., the number of parameters, etc.

The combination option determines how the new solutions are generated, either using linear combinations of reference set vectors or by constructing hyper-rectangles, the latter allowing for more freedom when constructing new vectors.

Regenerate decides the diversity criterion used when admitting new solution vectors in the reference set. The setting ‘distance’ considers Euclidian distance, ‘direction’ diversifies in terms of min-max cosine between new solutions and the vectors already in the reference set, encouraging searches in new directions, and ‘alternate’ alternates between these two criteria.

The delete option regulates how many solutions in the reference set are kept between iterations. The ‘standard’ setting discards half of the solution vectors in the reference set, and ‘aggressive’ only keeps one (the best) solution vector.

With the merit filter enabled, local search is disabled for solutions deemed bad, thus saving evaluations.

The local search procedure used in scatter search is DHC [37]. Any possible performance gains using other local search methods could be investigated, as will be discussed in Section 5.1.

Particle Swarm Optimization

The parameters to tune in the used implementation of PSO are clearly defined and are φP, φT, and φI. The configurations are all possible combinations of the elements in the sets seen in Table 3.2, with the total number of configurations being 4 × 3 × 3 = 36. The assigned configurations are listed in Table B.3.


Table 3.2: Particle swarm optimization parameters.

  Parameter   Influences             Values
  φP          Number of particles    {0.5, 1, 1.5, 2}
  φT          Number of neighbors    {0.5, 0.75, 1}
  φI          Inertia reduction      {0.75, 0.9, 0.99}

Some remarks could be made about the implications of the different options even though they are quite intuitive. Setting φP = 2 results in more particles in the swarm, which could help locate an optimum, but at the same time these extra particles consume evaluations. As discussed earlier, the inertia reduction factor, φI, decides the nature of the search, hence possibly affecting convergence speeds. Note that φT = 1 means that all particles are neighbors, which results in the original implementation of PSO without niching. How this works out in, for example, the spread of search is treated in the next chapter when presenting the results.
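As a small illustration, the 36 configurations can be enumerated as the Cartesian product of the levels in Table 3.2; the Python sketch below is not part of the thesis code.

from itertools import product

phi_P = [0.5, 1, 1.5, 2]
phi_T = [0.5, 0.75, 1]
phi_I = [0.75, 0.9, 0.99]

configurations = list(product(phi_P, phi_T, phi_I))
print(len(configurations))   # 36
print(configurations[0])     # (0.5, 0.5, 0.75)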

3.4 Test Setup

With each of the individual components explained, the general test setup used can be presented.

(1) Run all assigned configurations over the test function suite.

(2) Calculate the performance measures.

(3) Use the discarding framework to discover any poorly performing configurations.

(4) Benchmark the configurations on two systems biological applications.

In (1), each configuration is given a total of five trials with 30,000 function evaluations each. This is done in order to make sure they get a chance to explore the search space and thus, hopefully, locate multiple optima.

Step (2) processes the output from the five trials in (1) and calculates the performance measures described in Section 3.2 resulting in a single value for each performance measure.

In step (3), the discarding framework is applied to uncover any poorly performing configurations.

In step (4), four configurations are selected: the best and worst configurations together with two others serving as controls. These four configurations are run over two systems biology benchmark problems. After running the chosen configurations over the benchmarks, their mutual relation with regard to test function performance will hopefully be preserved.


The simulated annealing implementation from [28], currently in use at ike, will serve as a control algorithm, using configurations determined in [33] to be solid performers for the respective measures. A multi-start algorithm using the diversification method from scatter search and DHC as local solver will also be used in order to gauge how well a deterministic method performs.


Chapter 4

Results and Analysis

This chapter presents and analyzes the main results of this thesis: how well does the test function suite facilitate discarding of configurations, do any properties transfer from the test functions to real applications, and what are the consequences of using the ad hoc punishments?

4.1 Evaluation of Test Function Suite

The purpose of the test function suite is to use it as a sieve, i.e., to challenge the algorithms enough to distinguish poorly performing configurations from the rest, and to do so in a timely fashion, not requiring excessively time-consuming calculations.

4.1.1 Set 1

As can be seen in Figure 4.1, both algorithms find an optimum fairly easily, far from the cap of 30,000 evaluations. While the number of function evaluations to find an optimum is quite low for PSO, its success rate is not comparable to that of scatter search, which has a close to perfect success rate. The exploration of the search space is extensive, with scatter search averaging over half of the optima found in the 25-dimensional instances. The PSO is solid overall, yet shows some signs of performance deterioration when dimensionality is increased, with scatter search less affected.

4.1.2 Set 2

The fact that this instance set is clearly more challenging can be seen in Figure 4.2, with success rates close to zero after instance number 54, when the instances step up to 25 dimensions. Interestingly, this applies to both scatter search and PSO to almost the same extent, with the PSO having a success rate identical to zero. This is most likely the curse of dimensionality manifesting itself. Compared to set 1, both PSO and scatter search perform noticeably worse, even when ignoring the 25-dimensional instances.

Figure 4.1: Performance on each instance in set 1, mean value taken over all configurations (panels: speed in evaluations, robustness as success rate, and spread of search as ratio of optima found, for scatter search and particle swarm optimization).

Figure 4.2: Performance on each instance in set 2, mean value taken over all configurations (panels: speed, robustness, and spread of search for scatter search and particle swarm optimization).


4.1.3 Set 3

Although different in design, set 3 is very similar to set 2 in performance, which is disappointing (plots not shown). The underlying reason behind this similarity is hard to pinpoint since care was taken to give the sets different properties. One possible explanation is the fact that both sets stem from the same quadratic family based on (3.1). Again, the jump from 10 to 25 dimensions in the test functions results in an insurmountable increase in difficulty for both algorithms, given the evaluation budget.

4.1.4 Summary of the Test Function Suite

Viewing Figures 4.1-4.2 one could argue that PSO is being outperformed, but remember that this is not primarily about comparing the algorithms over the test function suite; the suite is only supposed to provide enough material to discern differences among the configurations within each algorithm. One should be cautious drawing any early conclusions when comparing different algorithms over a set of problems, see [38].

Of the instance attributes, dimensionality has the largest impact, making the 25-dimensional instances too hard for both algorithms to solve. The number of optima intuitively affects the effort needed to locate an optimum, with a lone optimum being the hardest to find. An increase in difficulty can be seen when altering the steepness, represented by the peaks at every third instance in the speed plots of Figures 4.1 and 4.2. The number of local optima does not seem to influence the results radically.

Before moving on to the step of discarding configurations, the 25-dimensional problems from sets 2 and 3 were removed due to their low success rates. Looking closer, the best performing configuration finds an optimum in only 5 % of the trials, which is far from reliable. The final instance pool consists of all instances from set 1 and the 5- and 10-dimensional instances from sets 2 and 3, for a total of 189 instances.

Transforming the data from the final instance pool using rank statistics (see Section A.1) and taking mean values over all instances results in Figure 4.3, a comparison of configurations.

A brief inspection of Figure 4.3 is enough to suggest differences between configurations. A Friedman test, Table 4.1, confirms the suspicion that there indeed are statistically significant differences between configurations for all three measures.


[Figure 4.3: Average rank deviation from the median (normalized), taken over all instances, plotted per configuration for scatter search and particle swarm optimization, with one panel each for speed, robustness, and spread of search. The values have been normalized in such a way that -1 means being permanently ranked number one, 0 means having median rank on average, and 1 means being permanently ranked last.]
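As an illustration, the sketch below shows one way to compute such a normalized rank deviation: per-instance ranks are shifted by the median rank and scaled so that rank 1 maps to -1 and the last rank maps to +1, then averaged over instances. The exact formula used in the thesis is inferred from the figure caption and is an assumption here.

```python
# Minimal sketch of the normalization interpreted from the caption of Figure 4.3.
import numpy as np
from scipy.stats import rankdata

def normalized_rank_deviation(scores):
    """scores: (n_instances, n_configs), lower is better (e.g. evaluations used)."""
    n_instances, n_configs = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)   # rank 1 = best, per instance
    median_rank = (n_configs + 1) / 2.0
    normalized = (ranks - median_rank) / ((n_configs - 1) / 2.0)  # maps ranks to [-1, 1]
    return normalized.mean(axis=0)                     # one value per configuration

# Example with placeholder data: 189 instances, 36 PSO configurations
rng = np.random.default_rng(1)
deviations = normalized_rank_deviation(rng.random((189, 36)))
print(deviations.round(2))
```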



Table 4.1: Results from the Friedman test. Size of the test statistic, T, and the critical value, CV, given by the χ^2_{0.01} distribution, with df denoting the degrees of freedom.

Algorithm   Measure            T             CV            df
SSm         Speed              3.659 × 10^3  7.244 × 10^1  47
SSm         Robustness         3.204 × 10^3  7.244 × 10^1  47
SSm         Spread of search   3.032 × 10^3  7.244 × 10^1  47
PSO         Speed              4.714 × 10^2  5.734 × 10^1  35
PSO         Robustness         7.677 × 10^2  5.734 × 10^1  35
PSO         Spread of search   1.878 × 10^3  5.734 × 10^1  35
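For reference, the sketch below shows how a Friedman test of this kind can be computed with SciPy. The placeholder data, the array shapes, and the comparison against the χ^2_{0.01} critical value follow the setup described here (instances as blocks, configurations as treatments), but the thesis' own implementation may differ.

```python
# Minimal sketch of the Friedman test reported in Table 4.1 (placeholder data).
import numpy as np
from scipy.stats import friedmanchisquare, chi2

rng = np.random.default_rng(0)
n_instances, n_configs = 189, 48                 # 48 scatter search configurations
scores = rng.random((n_instances, n_configs))    # placeholder for, e.g., the speed measure

# friedmanchisquare expects one argument per treatment (configuration)
T, p_value = friedmanchisquare(*[scores[:, j] for j in range(n_configs)])

df = n_configs - 1                               # 47 for scatter search, 35 for PSO
critical_value = chi2.ppf(0.99, df)              # chi^2 critical value at the 0.01 level

print(f"T = {T:.3e}, CV = {critical_value:.3e}, df = {df}")
print("significant differences" if T > critical_value else "no significant differences")
```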

In summary, the test function suite has performed its task, eliciting disparity between the configurations. However, about a fifth of the instances ended up unused, which, aside from being a waste of resources, also exemplifies the difficulty of designing a qualified test function.

4.2 Discarding of Configurations

With the Friedman test indicating differences between configurations, further investigation can be made: how many configurations can be discarded, and at what performance level? Applying the pairwise comparison procedure described in Section A.2 with an increasing P_perc generates Figure 4.4.

For a fixed P_perc, the resulting sets of discarded configurations are shown in Table 4.2. It is important to note that the significance levels are only formally correct for a fixed P_perc chosen from Figure 4.4. Here, P_perc between 0.60 and 0.85 was chosen. Lower levels of P_perc were chosen in order to obtain larger sets of discarded configurations to analyze or, in some cases, any sets at all.
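As a rough illustration of a pairwise-comparison discarding rule of this kind, the sketch below discards a configuration whenever some other configuration beats it on at least a fraction P_perc of the instances. The actual procedure in Section A.2 additionally involves significance levels, so this should be read only as an approximation of the idea, not as the thesis' method.

```python
# Illustrative pairwise-comparison discarding rule (assumed form, not the exact
# procedure of Section A.2).
import numpy as np

def discard_configurations(scores, p_perc):
    """scores: (n_instances, n_configs), lower is better. Returns discarded indices."""
    n_instances, n_configs = scores.shape
    discarded = set()
    for j in range(n_configs):
        for i in range(n_configs):
            if i == j:
                continue
            wins = np.sum(scores[:, i] < scores[:, j])   # instances where i beats j
            if wins / n_instances >= p_perc:
                discarded.add(j)
                break
    return sorted(discarded)

rng = np.random.default_rng(2)
scores = rng.random((189, 48))                           # placeholder benchmark results
for p in (0.60, 0.75, 0.85):
    print(p, len(discard_configurations(scores, p)))
```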

Table 4.2: The best configurations for respective measure and the discarded set, with the worst performers in bold.

Algorithm   Measure            Best   P_perc   Set of discarded configurations
SSm         Speed              2      0.85     {25, ..., 41, ..., 48}
SSm         Robustness         10     0.85     {25, ..., 41, ..., 48}
SSm         Spread of search   28     0.75     {13, ..., 24}
PSO         Speed              31     0.60     {1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36}
PSO         Robustness         31     0.65     {1, 3, 4, 6, 7, 9, 15, 18, 27, 30, 36}
PSO         Spread of search   30     0.85     {1, ..., 7, 8, 9, 16}


[Figure 4.4: Number of discarded configurations versus P_perc for scatter search and particle swarm optimization, shown separately for the speed, robustness, and spread of search measures.]



4.2.1 Scatter Search

Inspecting Figure 4.4, scatter search seemingly has many poorly performing configurations, since discarding half of the configurations is possible using the speed and robustness measures. The spread of search plot indicates that most scatter search configurations are quite similar in performance with regard to spread of search, probably due to the measure being discrete.

Perhaps the most remarkable result is that half of the scatter search configurations, configurations 25–48, can be discarded with regard to both speed and robustness. The critical setting in this case seems to be the size of the reference set: the last 24 configurations have the size of this set fixed at 40 solution vectors instead of 'auto', which apparently impacts performance severely.

Although not at the chosen fixed P_perc level, the spread of search plot in Figure 4.4 at P_perc = 0.75 indicates configurations 13–24 as poor performers. Referring to Table B.2, it seems like the combination method is influencing the results. This outcome suggests that care should be taken when deciding both the reference set size and the combination method.

4.2.2 Particle Swarm Optimization

Interpreting Figure 4.4 could suggest that PSO is either less dependent on its settings, since the number of discarded configurations is so low, or that the parameter values used are not distinct enough to elicit radically altered performance. The latter seems unlikely since, for example, the number of particles spans a factor of four between configurations.

Further study revealed that the individual instance sets had different configurations as top performers. This explains the behavior seen in Figure 4.3, with the tighter grouping around zero, i.e., the tendency towards average median rank seen over all instances. This tendency also explains the lack of discarded configurations for higher values of P_perc.

Analyzing Table 4.2 from the PSO point of view reveals some interesting results. For example, using the speed measure one can discern that almost all of the discarded configurations are multiples of three. Referring to Table B.3, one can suspect the setting φ_I = 0.99 as a possible source. It is an appealing idea, since φ_I = 0.99 implies slower inertia reduction, resulting in particles prone to global search and giving subsequent convergence problems. However, the P_perc level at which this observation is made is low, which lessens its impact.
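To illustrate why φ_I close to one keeps particles in global-search mode longer, the short sketch below assumes a multiplicative inertia update, w ← φ_I · w, applied once per iteration; the exact update rule used by the PSO implementation in the thesis is an assumption here.

```python
# Inertia weight after a fixed number of iterations under multiplicative decay
# (assumed update rule, for illustration only).
w0, iterations = 0.9, 200
for phi_I in (0.95, 0.99):
    w = w0 * phi_I ** iterations
    print(f"phi_I = {phi_I}: inertia after {iterations} iterations = {w:.3f}")
# phi_I = 0.95 decays to essentially zero, while phi_I = 0.99 leaves the
# inertia at roughly 0.12, so the particles still roam the search space.
```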

When discarding using the spread of search measure, all discarded configurations except one share φ_P = 0.5 (resulting in fewer particles in the swarm). This is the only result obtained using the discarding procedure for PSO that holds for P_perc = 0.85. If spread of search is crucial in the application, one should review this setting.


4.3 Analysis of Measures

Reviewing the measures, the success performance gives similar discarded configurations as the speed measure. Since, by design, SP depends on the speed measure, this result is not surprising, but it is disappointing nonetheless, since the robustness aspect is possibly suppressed. Whether or not the implementation and inclusion of this measure was correct is a valid objection.
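For context, the form of success performance commonly used in the optimization benchmarking literature makes this dependence explicit, dividing the average number of evaluations of the successful runs by the success rate; whether the thesis uses exactly this definition is an assumption.

```latex
% Common form of "success performance" (assumed here, not taken from the thesis):
\[
  \mathrm{SP} \;=\; \frac{\overline{\mathrm{FE}}_{\text{successful runs}}}{p_{\text{success}}}
\]
```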

The spread of search measure provided insight into which configurations are better at locating multiple optima, highlighting the fact that spread of search and speed do not necessarily favor the same configurations.

4.3.1 Validation of the Max-min Spread Measure

Due to the computational challenge posed by the max-min spread measure, using it directly for discarding is not feasible. Before applying the measure to the results from the benchmarks, it is important to verify that it also measures the spread of search, albeit in a different way.

Since the optima in the test functions are known, the max-min distances between them are easily calculated. For the max-min spread measure, the situation is different: given a selection of k points, if fewer than k optima have been located, at least one point has to be picked from the same basin of attraction as another point, thus reducing the potential maximum value.
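For illustration, the sketch below approximates a max-min spread of this kind: select k points from the visited points such that the smallest pairwise distance within the selection is as large as possible, here via a greedy farthest-point heuristic. The exact definition in (3.2) is not reproduced in this chapter, so its general form is an assumption.

```python
# Greedy approximation of a max-min spread over visited points (assumed form of (3.2)).
import numpy as np

def greedy_max_min_spread(points, k):
    """points: (n, dim) array of evaluated points; returns approximate max-min distance."""
    selected = [points[0]]
    while len(selected) < k:
        # distance from every point to its nearest already selected point
        dists = np.min([np.linalg.norm(points - s, axis=1) for s in selected], axis=0)
        selected.append(points[int(np.argmax(dists))])   # add the farthest point
    sel = np.array(selected)
    pairwise = np.linalg.norm(sel[:, None, :] - sel[None, :, :], axis=-1)
    return pairwise[np.triu_indices(k, 1)].min()         # smallest distance in the selection

rng = np.random.default_rng(3)
print(greedy_max_min_spread(rng.random((500, 5)), k=4))  # placeholder 5-dimensional data
```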

Plotting the norm of the difference between the value calculated by (3.2) and the known distance between the optima, versus the number of found optima, results in Figure 4.5. The figure shows a correlation between a decreasing difference and the ratio of optima found. This correlation is statistically significant; a linear regression indicates a strictly negative slope at a 95 % confidence level. Similar plots (not shown) verified the anticipated behavior, i.e., that an increasing ratio of found optima reduces the difference mentioned above.

With the measure satisfactorily validated, comparisons between selected discarded and non-discarded configurations could be made. A Friedman test is used to assess the presence of any configurational differences in the max-min spread results. A Wilcoxon-Nemenyi-McDonald-Thompson test is then used for post hoc comparisons to distinguish which configurations differ [15].

Table 4.3 shows that most configurations retain their rank from the spread of search measure. In summary, the max-min spread measure performs in a predictable manner, showing correlation with spread of search.

4.4 Evaluation on Real Biological Models

The concluding experiments are made on biological applications instead of the synthetic analytical test functions. Using the test setup described in Section 3.4, the best configuration will be compared against the worst performer for each measure, with two additional configurations acting as experimental controls. The ideal outcome is a retention of the mutual rankings between the configurations.
