
Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis (Examensarbete)

Parameter estimation of biological pathways

Master's thesis carried out in Automatic Control
at Linköping Institute of Technology

by

Emil Svensson

LITH-ISY-EX--06/3845--SE

Linköping 2007

Department of Electrical Engineering, Linköpings universitet


Parameter estimation of biological pathways

Master's thesis carried out in Automatic Control
at Linköping Institute of Technology

by

Emil Svensson

LITH-ISY-EX--06/3845--SE

Supervisors: Christina Grönwall
             isy, Linköpings universitet
             Henning Schmidt
             Fraunhofer Chalmers Research Center, Gothenburg

Examiner: Jacob Roll
          isy, Linköpings universitet


Avdelning, Institution / Division, Department:
Division of Automatic Control, Department of Electrical Engineering,
Linköpings universitet, S-581 83 Linköping, Sweden

Datum / Date: 2007-02-14
Språk / Language: English
Rapporttyp / Report category: Examensarbete (Master's thesis)
URL för elektronisk version / URL for electronic version:
http://www.control.isy.liu.se
http://www.ep.liu.se
ISRN: LITH-ISY-EX--06/3845--SE

Titel / Title: Parameterskattning av biologiska system / Parameter estimation of biological pathways
Författare / Author: Emil Svensson


Abstract

To determine parameter values for models of reactions in the human body, such as glycolysis, good methods for parameter estimation are needed. These models are often non-linear, and estimation of the parameters can be very time consuming, if it is possible at all. The goal of this work is to test different methods to improve the calculation speed of the parameter estimation for an example system. If the parameter estimation speed for the example system can be improved, it is likely that the method could also be useful for systems similar to the example system.

One approach to improve the calculation speed is to construct a new cost function whose evaluation does not require any simulation of the system. Simulation-free parameter estimation can be much quicker than using simulations to evaluate the cost function, since the cost function is evaluated many times. A modification of the simulated annealing optimization method has also been implemented and tested.

It turns out that some of the methods significantly reduced the time needed for the parameter estimations. However, the quick methods have disadvantages in the form of reduced robustness. The most successful method used a spline approximation together with a separation of the model into several submodels, and repeated use of the simulated annealing optimization algorithm to estimate the parameters.


Acknowledgements

I would like to thank Henning Schmidt for all the support he has given me. Henning was the one who came up with the idea for this master's thesis. I would also like to thank my examiner Jacob Roll and my supervisor Christina Grönwall for their support.


Contents

1 Introduction
  1.1 Parameter estimation of biological systems
  1.2 Problem description
  1.3 The Mendes problem
    1.3.1 The model
    1.3.2 Optimization algorithm - SRES
  1.4 Outline

2 Parameter estimation
  2.1 Using time series measurements for parameter fitting
  2.2 Brief discussion of optimization algorithms
    2.2.1 Global optimization algorithms
    2.2.2 Local optimization algorithms
    2.2.3 Hybrid algorithms
  2.3 SRES
  2.4 Simulated annealing
    2.4.1 Introduction
    2.4.2 Simulated annealing pseudo-code
  2.5 Simplifications of the cost function
  2.6 Identification and parameter correlation

3 Using simulations to evaluate the cost function
  3.1 Introduction
  3.2 Repeating the Mendes experiment
  3.3 The impact of randomness
  3.4 Adding noise
  3.5 Differences in noise impact
  3.6 Introducing a small test system
    3.6.1 Results with simulation-based cost function and the small system

4 Results
  4.1 Introduction
  4.2 Test structure
  4.3 Implementation
  4.4 Applying spline fit on the small system
  4.5 Test on the Mendes system
  4.6 Using other curve fits
    4.6.1 Fitting using exponential curve
    4.6.2 Smoothing splines
    4.6.3 Other curve fits
    4.6.4 Conclusions on curve fits
  4.7 Separation of the system
    4.7.1 Implementation
    4.7.2 Estimation of the parameters of the subsystems using approximations of the derivatives
    4.7.3 Brutalsolve
  4.8 Deciding what sampling time to use
  4.9 Modifying simulated annealing to examine multiple local optima
    4.9.1 Implementation and results

5 Discussion

Bibliography

A The Numbers Assigned to the Parameters, and the Parameter Bounds
B Cost Function for State 1 Implemented in Matlab
C Brutalsolve Matlab Code
D Spline generation
E Implementation of cost function
F Parameter initial values and S and P values
  F.1 S- and P-values used in the different experiments


Chapter 1

Introduction

1.1 Parameter estimation of biological systems

Parameter estimation is used to give estimates of unknown parameters in various applications. Before the parameter estimation takes place, a model of the phenomenon of interest must be constructed. If that model has parameters that cannot be determined directly, by measurement or from laws of nature, other measurements must be collected so that the unmeasured (unknown) parameters can be calculated indirectly by minimizing a cost function.

One area of use for parameter estimation is to estimate parameters in models of biological pathways, like the glycolysis in the human body. This work will focus on an example model of a biological pathway introduced by Mendes et al. [1] and test new methods of parameter estimation on this system. While Mendes succeeded in estimating all parameters correctly, the estimation required a substantial amount of computation. If the same results could be obtained (or even just a better starting guess for the slow method) using a different method that requires less computational power, a step forward would be made.

1.2 Problem description

Pedro Mendes¹ and his partners did in [1] successfully estimate the 36 parameters of an example non-linear biological pathway problem. However, the estimation required a substantial computational effort, and the goal of this work is to evaluate different means to reduce the computational effort for this and similar problems. The Mendes problem will be used as a reference throughout the work, but the methods investigated will potentially be useful on other biological pathway problems as well as problems in other areas.

¹ Ph.D. in biochemistry, works at Virginia Bioinformatics Institute.


1.3 The Mendes problem

Here a brief review of the Mendes problem will be made. For all the details please see [1]. The Mendes problem is non-linear and consists of estimating 36 parameters for a biological pathway problem through minimization of a cost function,

\[ J(p) = \sum_{h=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} w_{hij}\,\{[y_{pred}(i,j,p) - y_{exp}(i,j)]_h\}^2, \qquad (1.1) \]

where

• h is the experiment index, i is the state variable index and j is the data index.
• k is the number of experiments.
• n is the number of state variables.
• m is the number of measurement values for every state variable.
• w_hij is a weight matrix. w_hij = 1 is used in this thesis, and in [2] Banga claims that using a scaling weight matrix does not make a difference when stochastic methods are used.
• y_pred contains the state variable values predicted by the model from a set of parameters p.
• y_exp is the experimental data obtained by measuring the states at certain intervals.

Equation (1.1) is the original cost function used by Mendes; later in this work, in Section 2.5, a different cost function will be introduced. We will later look at the model which is used to calculate y_pred. Mendes performed 16 experiments with different starting settings (k = 16). The parameters have been given bounds in the example problem, typically [10^-12, 10^6], see [1] for details.
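As an illustration, a simulation-based cost function of the type (1.1) could be written as in the sketch below. The helper simulateStates(p, k) is hypothetical and stands for a simulation of the model for experiment k with parameter set p; yexp{k} holds the corresponding measured data, and all weights w are 1, as in this thesis.

    % Sketch of a simulation-based sum-of-squares cost function of type (1.1).
    function J = simulationCost(p, yexp, nExp)
        J = 0;
        for k = 1:nExp
            ypred = simulateStates(p, k);            % hypothetical simulation of experiment k
            J = J + sum(sum((ypred - yexp{k}).^2));  % squared residuals over states and samples
        end
    end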

1.3.1 The model

The model for the Mendes example system consists of eight states and 36 parameters. Starting values for all states are also included. The model is in state-space form, i.e., for every state there is an equation with the state derivative on the left-hand side and a function of the corresponding parameters on the right-hand side of the equals sign. Equations (1.2) to (1.9) are the eight equations corresponding to states one through eight. The names of the parameters that are to be estimated are: p1 = V1, p2 = Ki1, p3 = ni1, p4 = Ka1, p5 = na1, p6 = k1, p7 = V2, p8 = Ki2, p9 = ni2, p10 = Ka2, p11 = na2, p12 = k2, p13 = V3, p14 = Ki3, p15 = ni3, p16 = Ka3, p17 = na3, p18 = k3, p19 = V4, p20 = K4, p21 = k4, p22 = V5, p23 = K5, p24 = k5, p25 = V6, p26 = K6, p27 = k6, p28 = kcat1, p29 = Km1, p30 = Km2, p31 = kcat2, ... (the complete numbering is listed in Appendix A).

State 1:
\[ \frac{dM_1}{dt} = \frac{k_{cat1}\,E_1\,\frac{1}{K_{m1}}\,(S - M_1)}{1 + \frac{S}{K_{m1}} + \frac{M_1}{K_{m2}}} - \frac{k_{cat2}\,E_2\,\frac{1}{K_{m3}}\,(M_1 - M_2)}{1 + \frac{M_1}{K_{m3}} + \frac{M_2}{K_{m4}}} \qquad (1.2) \]

State 2:
\[ \frac{dM_2}{dt} = \frac{k_{cat2}\,E_2\,\frac{1}{K_{m3}}\,(M_1 - M_2)}{1 + \frac{M_1}{K_{m3}} + \frac{M_2}{K_{m4}}} - \frac{k_{cat3}\,E_3\,\frac{1}{K_{m5}}\,(M_2 - P)}{1 + \frac{M_2}{K_{m5}} + \frac{P}{K_{m6}}} \qquad (1.3) \]

State 3:
\[ \frac{dE_1}{dt} = \frac{V_4\,G_1}{K_4 + G_1} - k_4\,E_1 \qquad (1.4) \]

State 4:
\[ \frac{dE_2}{dt} = \frac{V_5\,G_2}{K_5 + G_2} - k_5\,E_2 \qquad (1.5) \]

State 5:
\[ \frac{dE_3}{dt} = \frac{V_6\,G_3}{K_6 + G_3} - k_6\,E_3 \qquad (1.6) \]

State 6:
\[ \frac{dG_1}{dt} = \frac{V_1}{1 + \left(\frac{P}{K_{i1}}\right)^{n_{i1}} + \left(\frac{K_{a1}}{S}\right)^{n_{a1}}} - k_1\,G_1 \qquad (1.7) \]

State 7:
\[ \frac{dG_2}{dt} = \frac{V_2}{1 + \left(\frac{P}{K_{i2}}\right)^{n_{i2}} + \left(\frac{K_{a2}}{M_1}\right)^{n_{a2}}} - k_2\,G_2 \qquad (1.8) \]

State 8:
\[ \frac{dG_3}{dt} = \frac{V_3}{1 + \left(\frac{P}{K_{i3}}\right)^{n_{i3}} + \left(\frac{K_{a3}}{M_2}\right)^{n_{a3}}} - k_3\,G_3 \qquad (1.9) \]

1.3.2 Optimization algorithm - SRES

To minimize the cost function an optimization algorithm is needed. Mendes used the SRES [5] (Stochastic Ranking Evolution Strategy) algorithm. SRES works by first generating a set of solutions and then using the best ranked of those solutions to generate a new generation of solutions. The ranking of solutions is done by a bubble-sort-like procedure that considers both the cost function value and whether the solution violates the parameter bounds. The bubble-sort-like procedure also uses randomness, so sometimes a solution with parameters that violate the parameter bounds can rank better than a legal solution and be part of the solutions used to generate the next generation. SRES does not use any derivatives.

By using SRES, Mendes was able to solve the example problem in 40 hours according to [1]. Mendes used simulations to evaluate the cost function. Simulating a system can be rather time consuming, so it is likely that most of the estimation time was used for simulations in Mendes' case.

1.4 Outline

In this section a short guide to the different chapters is given. In Chapter 2 there is a brief discussion of optimization algorithms, followed by a section about how to obtain a widely used cost function. After that a method section follows which discusses how to introduce a cost function based on differentiating splines that interpolate the state variable measurement points to approximate the state variable derivatives. The new cost function would require less computational effort to compute compared to the cost function used by Mendes.

In Chapter 3 the estimation performed by Mendes is repeated and also a test is made to see the effects of adding noise to the measurements. A small test system is also introduced as a reference.

In Chapter 4 the results obtained using the spline based cost function are presented, as well as the results obtained from separating the Mendes system into several small systems. The results from a method to decide a suitable sampling time and a modification of simulated annealing to examine multiple local optima are also presented in Chapter 4.


Chapter 2

Parameter estimation

2.1 Using time series measurements for parameter fitting

A very common approach to parameter estimation is to make a time series measurement which can be used in a cost function. The cost function is a function of the parameters that are to be estimated. The cost function should be constructed so that it has its global minimum at the point where the parameters have their correct values. See (2.1), where f is the cost function, p is an arbitrary set of parameters and p0 denotes the optimal parameter values. In a perfect world p0 would always be obtained as the estimated parameters, but as the function f in (2.1) can be difficult to minimize, it is not always possible to find the global minimum. Furthermore, due to noisy measurements, the global optimum will not in practice correspond exactly to the true parameter values.

\[ \min_p f(p) = f(p_0) \qquad (2.1) \]

2.2 Brief discussion of optimization algorithms

2.2.1 Global optimization algorithms

Global optimization algorithms are designed with the purpose of finding the global minimum even if the cost function has several other local minima. There is no global optimization algorithm that can always find the global minimum; it depends both on the function that is optimized and on the optimization algorithm. SRES and simulated annealing are global optimization algorithms. Both SRES and simulated annealing are stochastic, which means they are random or probabilistic but with some direction¹. For more details about SRES and simulated annealing, see Sections 2.3 and 2.4.

¹ According to W. B. Langdon's (University College, London) glossary at http://www.cs.bham.ac.uk/~wbl/thesis.glossary.html


2.2.2 Local optimization algorithms

Local optimization algorithms are designed to find a local minimum. The local minimum may or may not be the global optimum. For example, simplex is a local optimization algorithm. Local optimization algorithms generally require significantly less computational power than global optimization algorithms.

2.2.3 Hybrid algorithms

A hybrid optimization algorithm is a mix of a global and a local optimization algorithm. The hybrid algorithm first uses the global optimization algorithm, and when a criterion is met which indicates that the global method is close to the global optimum, the hybrid algorithm switches to the local optimization algorithm to speed up the last phase of the optimization. The difficulty in making a successful hybrid algorithm lies not only in choosing a good global and a good local optimization algorithm, but also in choosing a criterion that is fulfilled at an appropriate distance from the global optimum. If the criterion is fulfilled too far away from the global optimum, the local optimization algorithm might find a local minimum instead of the global one, and if it is fulfilled too close to the global optimum, the benefits of using the hybrid algorithm get smaller.

2.3 SRES

SRES, Stochastic Ranking Evolution Strategy, is a global optimization algorithm that first generates an initial population, consisting of λ individuals which constitute the first generation, according to a uniform n-dimensional probability distribution over the search space. Each individual is a pair of real-valued vectors (x_i, σ_i), ∀ i ∈ {1, ..., λ}, where x_i can be seen as the set of parameters to be estimated and σ_i is a step size used by the SRES algorithm to create new individuals. The idea with SRES is to use the µ best individuals in the first generation to generate a new λ-sized population which is better than the first generation. From the second generation a third generation will be created, and the procedure is continued in the same manner until a preset generation number is reached; then the algorithm is stopped and the best individual is saved. SRES also uses a penalty function φ that is specified by the user.

The individuals must be sorted somehow in order to decide which µ individuals are to be considered the best ones and used to create the next generation. During the sorting one more search parameter is used by SRES, namely Pf, which is the probability that only the cost function value is compared when comparing two individuals and the penalty function value, φ, is ignored. In SRES a bubble-sort-like procedure is used to sort the individuals, see Figure 2.1.

When using SRES on the Mendes problem there is no need for a penalty function since the only constraints are the linear ones defining the search space.


I_j = j   ∀ j ∈ {1, ..., λ}
for i = 1 to N do
    for j = 1 to λ − 1 do
        sample u ∈ U(0, 1)
        if (φ(I_j) = φ(I_{j+1}) = 0) or (u < Pf) then
            if (f(I_j) > f(I_{j+1})) then
                swap(I_j, I_{j+1})
            fi
        else
            if (φ(I_j) > φ(I_{j+1})) then
                swap(I_j, I_{j+1})
            fi
        fi
    od
    if no swap done break fi
od

Figure 2.1. Stochastic ranking using a bubble-sort-like procedure, where U(0, 1) is a uniformly distributed random number generator and N is the number of sweeps through the whole population. When Pf = 0 the ranking is an over-penalization and for Pf = 1 the ranking is an under-penalization.

Each of the µ selected individuals is then used to create λ/µ new individuals,

\[ x_{h,j}^{(g+1)} = x_{i,j}^{(g)} + \sigma_{h,j}^{(g+1)} N_j(0,1), \qquad (2.2) \]

which in total gives a new generation of λ new individuals. N(0,1) is a normally distributed one-dimensional random variable with zero mean, and (2.2) is valid ∀ h ∈ {1, ..., λ/µ}. The complete information about how new individuals are generated, in particular how the step sizes σ are updated, can be found in [5]. The description of SRES in this section is not complete; it is only intended to give the reader an overview of the algorithm. The full description of SRES can be found in [5].

To summarize, the main steps of SRES are given below:

1. Generate the initial population, which is the first generation.
2. Sort the individuals with the bubble-sort-like procedure.
3. Select the µ best ranked individuals.
4. Generate a new generation of λ individuals from the µ selected individuals.
5. Repeat from step 2 until the preset number of generations is reached, then return the best individual found.
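A minimal Matlab sketch of the stochastic ranking in Figure 2.1 is given below. It assumes that f and phi are vectors holding the cost function and penalty function values of the λ individuals; the returned index vector ranks the individuals, best first.

    % Stochastic ranking (Figure 2.1) as a bubble-sort-like procedure.
    function idx = stochasticRanking(f, phi, Pf, N)
        lambda = numel(f);
        idx = 1:lambda;                                   % I_j = j
        for i = 1:N                                       % N sweeps through the population
            swapped = false;
            for j = 1:lambda-1
                u = rand;                                 % u ~ U(0,1)
                if (phi(idx(j)) == 0 && phi(idx(j+1)) == 0) || u < Pf
                    doSwap = f(idx(j)) > f(idx(j+1));     % compare cost function values
                else
                    doSwap = phi(idx(j)) > phi(idx(j+1)); % compare penalty values
                end
                if doSwap
                    idx([j j+1]) = idx([j+1 j]);          % swap(I_j, I_{j+1})
                    swapped = true;
                end
            end
            if ~swapped, break; end                       % stop when a sweep does no swap
        end
    end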


2.4 Simulated annealing

2.4.1 Introduction

Simulated annealing is a global optimization algorithm which is inspired by how the atoms in a metal align themselves as the metal cools down. The basic idea of simulated annealing is to introduce a search parameter T (called the temperature) which decides how likely the algorithm is to accept an uphill move. The temperature starts at a high level and then gradually decreases until it reaches zero. If the temperature is high the algorithm moves almost randomly, and as the temperature is decreased the algorithm becomes more and more greedy, until at temperature zero it becomes a pure downhill algorithm. Since the algorithm allows uphill moves it is less likely to get stuck in a local minimum.

2.4.2 Simulated annealing pseudo-code

Here the pseudo-code for simulated annealing, see Figure 2.2², is explained. The focus here lies on how the method is applied to the Mendes problem using simannealingSBAO in this thesis. simannealingSBAO is an implementation of the simulated annealing method presented in Section 10.9 of Numerical Recipes in C [6], with the modification that it is able to take simple box constraints on the parameter values into account. simannealingSBAO is a part of the SBtoolbox. For a more general and complete view of the simulated annealing method see [9] and [6]. Below follows a list with a comment for every line of the pseudo-code presented in Figure 2.2.

1.  s := s0; e := E(s)                              // Initial state, energy.
2.  sb := s; eb := e                                // Initial 'best' solution.
3.  k := 0                                          // Energy evaluation count.
4.  while k < kmax and e > emax                     // While time remains & not good enough:
5.      sn := neighbour(s)                          //   Pick some neighbour.
6.      en := E(sn)                                 //   Compute its energy.
7.      if en < eb then                             //   Is this a new best?
8.          sb := sn; eb := en                      //     Yes, save it.
9.      if random() < P(e, en, temp(k/kmax)) then   //   Should we move to it?
10.         s := sn; e := en                        //     Yes, change state.
11.     k := k + 1                                  //   One more evaluation done.
12. return sb                                       // Return the best solution found.

Figure 2.2. Pseudo-code for simulated annealing.

² The pseudo-code in Figure 2.2 is taken from the wikipedia.com article about simulated annealing and is free to distribute according to the wikipedia.com copyright rules.


1. First the initial state and the initial energy are defined. In the Mendes case s0 is the parameter starting guess, and e is the cost function value evaluated in s0.

2. The best solution, sb, and energy, eb, are always saved. Initially only the initial state has been looked at, so sb = s = s0 and eb = e in the beginning of the algorithm.

3. k is the number of times the energy has been calculated and is used as a counter for how many iterations of the algorithm have been computed.

4. The algorithm can be set to continue until either a set number of iterations has been reached or the energy reaches a low enough value. In simannealingSBAO there is a number of predefined temperature levels, and at every temperature level a chosen number of iterations are made until the temperature is lowered to the next level.

5. Now the algorithm picks a neighbour of the current state s. In the case of simannealingSBAO the neighbour is chosen by making an iteration with a simplex consisting of M + 1 points, where M is the dimension of the estimation problem. The simplex is also modified to make uphill moves possible, see [6] for details.

6. Compute the energy, en, of the neighbour sn.

7. Is en lower than the best energy eb?

8. If so, it should be saved: sb = sn and eb = en.

9. Should the algorithm move to sn? For every implementation of the simulated annealing algorithm there is a probability function P(e, en, temp(k, kmax)). In simannealingSBAO, P is implemented through the simplex mentioned earlier.

10. If the algorithm should move, then s = sn and e = en.

11. One more iteration has now been completed, k = k + 1.

12. When the while clause is finished, return sb as the solution.
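As a complement to Figure 2.2, the sketch below implements the same loop in Matlab. It is only an illustration: the linear cooling schedule, the random-walk neighbour and the Metropolis acceptance rule are generic choices, not how simannealingSBAO (which uses a modified simplex) works.

    % Generic simulated annealing loop following Figure 2.2 (illustrative
    % choices of neighbour generation, acceptance rule and cooling schedule).
    function [sb, eb] = simpleAnneal(costFun, s0, kmax, emax, T0)
        s = s0; e = costFun(s);                  % 1. initial state and energy
        sb = s; eb = e;                          % 2. initial best solution
        for k = 0:kmax-1                         % 3.-4. counter and stop criteria
            if e <= emax, break; end
            T = T0*(1 - k/kmax);                 %    temperature decreases towards zero
            sn = s + 0.1*randn(size(s));         % 5. pick a random neighbour
            en = costFun(sn);                    % 6. compute its energy
            if en < eb, sb = sn; eb = en; end    % 7.-8. save a new best solution
            if en < e || rand < exp((e - en)/max(T, eps))  % 9. accept uphill moves with
                s = sn; e = en;                  % 10. a temperature-dependent probability
            end
        end                                      % 11. one more evaluation done
    end                                          % 12. sb is returned as the solution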

2.5 Simplifications of the cost function

A common method for getting a cost function is by using single shot simulations and sums of squares. The simulation is performed using a model of the system that is to be simulated and a simulation program. The simulation program can either be implemented by the user, or algorithm packages like the SBtoolbox can be used. The cost function is often a sum of squares when using single shot simulations, with the sum of squares containing the difference between measured state values and the state values obtained by simulating the system with a test set of parameters ((1.1) is an example of such a cost function).

If the simulations can be avoided or reduced in extent when evaluating the cost function, a lot of time would be saved. As can be seen in the equations (1.2) to (1.9), the state derivatives are functions of the parameters. This means that a new cost function can be constructed if the values of the state derivatives can be obtained through an approximation. The basis of the new cost function would be the absolute value of the difference between the left and right hand side in (1.2) to (1.9). The new cost function is shown in (2.3). Splines and other methods will be used to generate the approximate derivatives. More information on splines can be found in [7]. The spline cost function,

\[ c_{exp}(p) = \sum_i \sum_j \left( \frac{dS_{ij}}{dt} - f_{ij}(p) \right)^2, \qquad (2.3) \]

is here based on multiple states and a single experiment. dS/dt is a matrix containing the derivatives of the splines, which in turn approximate the derivatives of the state variables in the model, with columns representing each state variable and rows containing the derivatives of those state variables. f is a matrix containing the derivatives as functions of the parameter values, where i represents states and j measurements.

Since (1.2) to (1.9) do not depend on the same parameters, it is possible to separate the system into eight subsystems and attempt to solve one small problem for every state instead. It could be easier to solve several small problems than one large one. The cost function obtained by separating the system can be seen in (2.4).

\[ c_i(p) = \sum_j \left( \frac{dS_{ij}}{dt} - f_{ij}(p) \right)^2, \qquad (2.4) \]

is the cost function for state i for a single experiment. dS/dt is a vector containing the approximated derivatives of the spline, with the column representing the state and the rows containing the derivatives of the state. f is a vector containing the derivatives as functions of the parameter values for state i and measurement j.

Using simulated annealing as described in [6] on the separated system could be helpful since that optimization algorithm may be more efficient than using SRES on problems with fewer parameters. Simulated annealing will be tried on the separated estimation problem. Modifying simulated annealing to keep track of more than one local optimum could also decrease computation time.

It is useful to know whether there are enough measurements of the states to correctly estimate the parameters, since one might get erroneous results when using too few measurements. Introducing a new parameter with a known parameter value could give a hint: if the introduced parameter is correctly estimated, it is an indication that there are enough samples. This is discussed in Section 4.8.


2.6 Identification and parameter correlation

The correlation ρ_{X,Y} between two random variables is defined as

\[ \rho_{X,Y} = \frac{C(X,Y)}{D(X)\,D(Y)}, \qquad (2.5) \]

where C(X, Y ) is the covariance between X and Y and D(X) is the standard deviation of X, see [8] for more details.

In [4], a correlation measure is defined as follows: if there is a function, F(X, Y), and X and Y are completely dependent upon each other, i.e. the correlation ρ_{X,Y} = ±1, it means a line can be drawn in the two-dimensional space with X and Y as coordinates along which F has the same value.

If a system has parameters that are heavily correlated it can be difficult to identify the system, i.e. difficult to estimate the correct parameter values.

In reality it is often impossible to determine the true correlation; instead an approximate correlation called the sample correlation, based on a limited number of measurements, is calculated. A common sample correlation coefficient is the Pearson product-moment correlation coefficient. In this work the script SBAOparamcorr written by Henning Schmidt, based on the method described in [4], is used to calculate the sample correlation. Later in this work the sample correlation coefficient will be referred to simply as 'the correlation'.
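As a small illustration of the sample analogue of (2.5), the Pearson sample correlation coefficient between two data vectors can be computed in Matlab with corrcoef; the data below is made up for the example only.

    % Pearson sample correlation coefficient between two made-up data vectors.
    x = (0:0.1:10)';
    y = 2*x + 0.5*randn(size(x));   % strongly, but not perfectly, correlated with x
    R = corrcoef(x, y);             % 2-by-2 matrix of sample correlation coefficients
    rho = R(1, 2)                   % sample estimate of the correlation between x and y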


Chapter 3

Using simulations to evaluate the cost function

3.1 Introduction

To verify Mendes' results and to get something to refer to, the Mendes experiment will be repeated in this chapter. There will also be a small investigation of how noise affects the estimation results. Finally, a small system with only two states will be introduced, as something less time consuming to estimate on which new approaches can be tried.

3.2 Repeating the Mendes experiment

The Mendes experiment was repeated, although only with 4000 generations of the SRES algorithm, to get a reference. It took 125 hours to complete the estimation, compared to 40 hours for Mendes, who let SRES run 8000 generations. Also worth noting is that Mendes used a P3 866 MHz, while an Athlon 1.2 GHz was used here. The time difference occurs because of some difference between the implementations. Perhaps Mendes implemented everything in a faster environment than Matlab or used a more effective simulation method for this particular problem. Also, the simulation function used here, SBAOsimulate from SBtoolbox, could not complete the simulation when given strange parameter values, and thus a number of individuals could not be simulated in every SRES generation. Mendes generated measurement data from 16 simulated experiments, and data from 16 experiments is used here as well. To generate 16 different experiments, the values of the S and P parameters were varied as can be seen in Appendix F. The S and P parameters can be chosen rather freely and do not have to be estimated, which allows them to be varied to create different experiments without changing the values of the other parameters. Here the cost function SRES uses simulates the system 16 times for every evaluation, since data is available from 16 different experiments.


In Figure 3.1 the results can be seen. See also Table 3.1, which gives an idea of how many parameters were correctly estimated. Of the 36 parameters, 34 were within 90% of the nominal value and 2 parameters had an even lower accuracy.

Accuracy   No. of parameters
5%         12
10%        16
25%        24
90%        34

Table 3.1. Number of parameters with a certain accuracy after 4000 generations.

Figure 3.1. Parameter estimations after 4000 generations (nominal parameter values and estimations).

3.3 The impact of randomness

Randomness has an impact on how quickly the SRES method converges and on whether it finds the correct optimum. To study this, a few shorter test runs were made, see Figure 3.2. Only 400 generations were used; perhaps it would have been better with more generations, but due to the long simulation time this number was chosen.

In Figure 3.2 we can see that 400 generations are not enough to get close to the real parameter values. However, a pattern can be seen: some of the parameters are easier for the algorithm to estimate than others, since several of the runs got close to the right values for those parameters. It is more interesting to look at the cost function values after 400 generations, see Figure 3.3.

Figure 3.2. Parameter estimations after 400 generations (nominal parameter values and runs 1-5). Some symbols are missing since some estimations were outside the plot window.

Figure 3.3. Cost function values for the 5 test runs, each with 400 generations.

In Figure 3.3 we can see that the cost function values are spread out roughly between 25 and 65. This is a substantial difference between runs, since the function value decreases slowly after 400 generations. Thus, randomness can have a significant impact on the amount of time needed to achieve a certain cost function value.

3.4 Adding noise

What happens when noise is added to the measurement data? In Figure 3.4 the results after 4000 generations can be seen. If w is a random variable with a uniform distribution U(−0.5, 0.5), the noise used is epsilon · w · (measurement data). The noise was generated with the following Matlab code, with epsilon set to 0.05:

    % referenceData contains 16 matrices, each with data from 1 experiment.
    for k = 1:16,
        A = rand(size(referenceData{k}));
        B = 0.5*ones(size(referenceData{k}));
        A = A - B;
        referenceData{k} = referenceData{k} + epsilon*referenceData{k}.*A;
    end

Figure 3.4. Estimations after 4000 generations with noise of between ±5% (based on a uniform random distribution) added to all measurement data (nominal parameter values, estimations without noise and estimations with noise).

Nothing certain can be concluded about the impact of the noise; longer simulations would be needed for that. However, the noise does not seem to have a drastic impact. The results after 8000 generations can be seen in Figure 3.5 and Table 3.3.

Accuracy   No. of parameters
5%         12
10%        17
25%        23
90%        33

Table 3.2. Number of parameters with a certain accuracy after 4000 generations with 5% uniformly distributed random noise.

Figure 3.5. Estimations after 8000 generations with noise (nominal parameter values and estimations with noise).

Accuracy   No. of parameters
5%         13
10%        23
25%        27
90%        36

Table 3.3. Number of parameters with a certain accuracy after 8000 generations with 5% noise.


Most of the estimated parameters are close to the nominal values, but not as close as they were in Mendes' estimation. It would be interesting to see how good the estimations would be after a larger number of generations, but since that is time consuming and the focus of this work is to develop faster cost functions for the optimization algorithm, it is left out. It took 290.55 hours to finish the 8000 generations on a university computer, a P4 1.7 GHz, but Matlab was run over LAN so the speed may have decreased due to that.

3.5 Differences in noise impact

Since the noise is randomized for every measurement, it can have different impact on different experiments. To get a picture of this effect, 5 estimations of 400 generations were run, see Figure 3.6.

Figure 3.6. Parameter estimations after 400 generations with noise (nominal values and runs 1-5).

Is it possible to say anything about the difference in noise impact? Not really; longer simulations would be needed to decide that. Also, measurement values from 16 different experiments are used, so the differences in noise impact between experiments probably cancel each other out. However, when Figure 3.6 is compared to Figure 3.2, it seems like more of the estimations are considerably off target with noise. Not all parameter estimations are shown in the figures, since some of them were really large, of order 10^5, but almost all are shown. It is not possible to draw any real conclusions based on this, though.

3.6 Introducing a small test system

Now let us introduce a small test system for which we can try different cost functions without needing so much simulation time, see (3.1).

State variables:
\[ \frac{dx_1}{dt} = k_1 - \frac{V_1 x_1}{K_1 + x_1}, \qquad \frac{dx_2}{dt} = \frac{V_1 x_1}{K_1 + x_1} - k_2 x_2 \qquad (3.1) \]
Initial conditions: x1(0) = 5, x2(0) = 0.

The nominal parameter values for the small system are: k1 = 0.1, V1 = 1, K1 = 1, k2 = 0.5.
3.6.1 Results with simulation-based cost function and the small system

An estimation of the parameters of the small system was made using a time vector of [0:1:30]. The measurements of the states were noise free and the bounds on all parameters were [10^-12, 10^6]. SRES was used together with a cost function of the same type as in the Mendes problem, which was evaluated with single shot simulations. A P4 1.7 GHz was used for this estimation.

The obtained parameter estimations after 60 generations of SRES were [k1 V1 K1 k2] = [0.1000 1.0000 1.0000 0.5000] and the cost function value was zero.


Chapter 4

Results

4.1 Introduction

The idea is to use splines to interpolate the data measurement points for all states and experiments and then differentiate the splines to get an approximation of the true derivatives. In the given model the equations for computing the derivatives from the parameters are given, and a cost function comparing the derivatives computed from the splines and from the equations can be constructed. In this way the time-consuming simulations used in Chapter 3 can be avoided. This chapter describes what 'the spline method' is and how it is implemented, how the Mendes system is separated into 8 small systems, the results of applying the spline method to the Mendes problem, how to decide a good sampling time for the separated problem, and an attempt to modify the optimization algorithm simulated annealing to examine multiple local optima.

4.2 Test structure

First, the method of using splines to approximate state derivatives, 'the spline method', will be applied to the small test system (3.1) to get an impression of how well the method works. Then the spline method will be applied to the Mendes system.

After that, curve fits other than splines will be evaluated, for example exponential curves and smoothing splines.

The Mendes system will then be separated into one small system for every state, which gives eight subsystems. The spline method will be used unless a more suitable curve fit is found. When the separation has been implemented, different methods will be tested to estimate the parameters. Simulated annealing will be the primary optimization algorithm for the separated system.

For the separated system, a method to see whether the sampling time is adequate will be tested. Simulated annealing will also be modified with the aim of becoming more efficient at finding the global optimum.


4.3 Implementation

The splines were generated with the Matlab function csapi and differentiated with the Matlab functions fnder and fnval; the details can be seen in Appendix D.
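A sketch of the derivative approximation is given below, assuming that tMeas and yMeas hold the sampling times and the measured values of one state from one experiment (the variable names are only placeholders).

    % Fit a cubic interpolating spline to one state and differentiate it.
    sp   = csapi(tMeas, yMeas);    % cubic spline interpolating the measurement points
    spd  = fnder(sp);              % derivative of the spline
    dSdt = fnval(spd, tMeas);      % approximate state derivatives at the sampling times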

The last piece of the puzzle is the cost function which, given a test set of parameters, computes the state derivatives for those parameters. As output argument, the squared sum of the differences between the derivatives for the test set of parameters and the spline derivatives is given. Also φ, which is a penalty function value, is given as an output argument, but it is always zero unless there are non-linear constraints, which is not the case here. An example of such a cost function for the small test system can be found in Appendix E. The general form of a cost function for one experiment based on splines is described in (4.1). Equation (4.1) is based on the equations (1.2) to (1.9). If more than one experiment is used, the cost functions for each experiment are added together. (4.1) is the cost function used in the spline method, and it is also used when the state derivatives are approximated by other curve fits or methods. The spline cost function,

\[ c_{exp}(p) = \sum_i \sum_j \left( \frac{dS_{ij}}{dt} - f_{ij}(p) \right)^2, \qquad (4.1) \]

is here based on multiple states for a single experiment. dS/dt is a matrix containing the approximated derivatives of the splines, with columns representing each state and rows containing the derivatives of those states. f is a matrix containing the derivatives as functions of the parameter values; i represents states and j measurements.

4.4 Applying spline fit on the small system

The sampling points t = [0:1:30] and the bounds [10^-12, 10^6] were used. Ten estimations, each with 300 generations of SRES, were evaluated. Every estimation took between 45 and 140 seconds, but only one took more than one minute. Six times out of ten the correct parameter values were obtained.

In all the estimations where the correct parameter values were obtained, the parameter values were [k1 V1 K1 k2] = [0.0989 0.9977 1.0041 0.4980], which can be compared to the nominal parameter values of [k1 V1 K1 k2] = [0.1 1 1 0.5]. The cost function value when the correct parameter values were obtained was 0.000131 in all six cases. The cost function value when incorrect parameter values were obtained varied from 0.23 to 0.31.

One possible reason that the correct parameters were obtained in only six out of ten estimations is that the parameters K1 and V1 are highly correlated.


4.5 Test on the Mendes system

A parameter estimation was made using the spline method and the same search parameters as in [1] (Table 4.1), except that 24000 generations were used instead of 8000. The parameter bounds are [10^-12, 10^6] for 30 of the parameters and [10^-1, 10^1] for the rest; complete information can be found in Appendix A.

G            8000
λ            350
µ            30
Pf           0.45
Varphi       1
Time vector  [0:6:120]
Bounds       (see text)

Table 4.1. Search parameters used in [1], and other settings.

The results of the estimations can be seen in Table 4.2.

Accuracy   No. of parameters
5%         15
10%        17
25%        19
90%        30

Table 4.2. Results after 24000 generations with spline cost function on the Mendes problem.

The average time needed per generation with the spline cost function was 3.67 seconds for this estimation, while with the simulation-based cost function the average time was 130.3 seconds per generation on the same computer. This means that the computation time was reduced by a factor of 35.5 per generation, but the parameters were poorly estimated with the spline cost function. The spline cost function value at the obtained parameter set after 24000 generations was 0.09, and its value with the nominal parameters was 1.66. This means the cost function was inadequate in this case.

So how well did the splines approximate the states in this case? A few typical state trajectories can be seen in Figure 4.1. Those states that have a steep slope in the beginning are not approximated well by the splines. To improve the accuracy of the cost function more sampling points would be needed in the beginning. Also, after 40 seconds most of the states seem to stabilize and the curves do not contain so much more information after that point. It would probably be a good idea to increase the number of samples in the beginning and use fewer samples for the last part of the trajectories.

A simulation with time vector [0:1:40] was run as well, see Table 4.3. The results here are better than the earlier ones after only 12000 generations. The cost function has a better value (0.018) for the nominal values than for the estimated values (0.2642), so in this case the optimization method was inadequate and not the cost function. Looking at the data from the last generations, the function values were still declining, so a longer simulation of 57 000 generations was run too, but with no improvements. The local optimization algorithm simplexSB could not improve the function value much when initialized with the parameters received from SRES, so the optimum found was indeed a local one.

Figure 4.1. Selected states approximated by cubic splines; G1, G2 and M2 are state names. Nominal state values are represented by solid lines and the splines by dashed lines. The time vector used was [0:6:120].

Accuracy   No. of parameters
5%         15
10%        21
25%        24
90%        31

Table 4.3. Results after 12000 generations with spline cost function on the Mendes problem, time vector = [0:1:40].

Another test run was performed with tight limits, [0 5] for all parameters, see Table 4.4. All 36 parameters are within 10% of the nominal value here. This indicates that the spline cost function is good enough to identify the global optimum, but that SRES has difficulties avoiding local optima when the bounds are less tight.

Accuracy   No. of parameters
5%         31
10%        36

Table 4.4. Results after 8000 generations with spline cost function on the Mendes problem, time vector = [0:1:60].

4.6 Using other curve fits

Other ways to fit a curve to the measurement data have been evaluated using the curve fit toolbox in Matlab. The implementation was almost identical to the spline implementation except that the corresponding curve fit toolbox functions were used instead of spline toolbox functions.

4.6.1 Fitting using exponential curve

Using the second degree exponential curve fit exp2 from the curve fit toolbox gave the fits in Figure 4.2. While exp2 gave a better fit than the cubic spline on Experiment 1, state G1, it performed worse on the other states. Using more sampling points does improve performance, just as for the cubic splines. Since exp2 does not interpolate the data points, it might be more robust than cubic splines when there is much noise on the measurements.

Figure 4.2. Exp2 (second degree exponential function) curve fit fitted to chosen states. The time vector used was [0:6:120].
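A sketch of the exp2 alternative is shown below, assuming that tMeas and yMeas hold the sampling times and the measurements of one state; fit and differentiate are Curve Fitting Toolbox functions.

    % Fit a two-term exponential (exp2) to one state and differentiate the fit.
    fo   = fit(tMeas(:), yMeas(:), 'exp2');   % y ~ a*exp(b*t) + c*exp(d*t)
    dSdt = differentiate(fo, tMeas(:));       % approximate derivatives at the sampling times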

4.6.2 Smoothing splines

Using smoothing splines, see Figure 4.3, all the fits were worse than with cubic splines. This is expected, since in this case the algorithm tries to smooth something that is exact, which increases the error. Compared to exp2, the smoothing splines give somewhat better approximations on all states with not so steep trajectories, while exp2 gives a much better approximation on very steep trajectories.

Figure 4.3. Smoothing splines fitted to chosen states. The time vector used was [0:6:120].

4.6.3 Other curve fits

Most other curve fits in the curve fit toolbox were studied briefly, and two other interesting curve fits were found, namely the gaussian fit (degree 6) and the rational fit (degree 5/5). What the degrees mean can be found in the curve fit toolbox; they are only provided here to make it possible for the reader to look up and use the same fits that were used here. While these fits did provide very good fits to the data points, they showed some strange behaviour between the data points. Matlab also gave a lot of warnings when making the fits. Both these methods gave fair approximations of the derivatives when used on the small system, but due to their strange behaviour it seems better not to use those fits.

4.6.4 Conclusions on curve fits

When the measurement noise is small, it is probably best to use cubic splines, unless a lot of the states decline or rise very rapidly in the beginning. In the latter case exp2 could be best. Smoothing splines or exp2 may be better to use than cubic splines when the noise level is high. However, when the sampling points are as far apart as in this case, it is possible that exp2 is best, since the big error the other fits showed on experiment 1, state G1, may be worse than the errors of the exp2 fits on the other states. To get a good fit, more sampling points are needed than the [0:6:120] used in the Mendes experiment, regardless of the fitting method used.

4.7 Separation of the system

Instead of estimating all the parameters at once, it is possible to estimate only the parameters in the right-hand side of the differential equation corresponding to a single state at a time, i.e. we consider only one of the equations (1.2)-(1.9) at a time. Since the Mendes system contains eight states, this means it is possible to get eight smaller estimation problems instead of one big one. When there are only three to six parameters to be estimated at a time, there may be better optimization algorithms to use than SRES, for example simulated annealing.

Separating the system can lead to the same parameters being included in different differential equations that now belong to different new subsystems. This could lead to a risk that false new solutions are created. However, no such false solution was discovered in this thesis that resulted in a better cost function value than that obtained with the optimal parameters.

4.7.1 Implementation

To construct a cost function for a single state, the right-hand side of

\[ \frac{dS}{dt} = f(p), \qquad (4.2) \]

where S is the state and p is the parameter vector, is computed for the specified state using the set of test parameters taken as input arguments (see also (1.2) to (1.9)). f is an arbitrary function, here one of those in (1.2) to (1.9).

The state derivative values obtained are then compared to the nominal derivatives, and the squared sum of the differences is given as output argument. The definition of the cost function for state i for a single experiment is

\[ c_i(p) = \sum_j \left( \frac{dS_{ij}}{dt} - f_{ij}(p) \right)^2, \qquad (4.3) \]

where dS/dt is a vector containing the approximated derivatives of the spline, with the column representing the state and the rows containing the derivatives of the state. f is a vector containing the derivatives as functions of the parameter values. i represents the state and j the measurements.

The Matlab implementation of the cost function for state 3 (E1) is provided in Appendix B.

dS/dt is approximated by fitting a curve to the measurements of the state and then taking the derivative of the fitted curve, since the true value of dS/dt is unknown. Because of (4.2) we know that the optimal value of c(p) is zero. Even with an approximation of dS/dt, the optimized value of c(p) should be close to zero (due to numerical errors it will not be exactly zero), but the estimated parameters will have an error that is proportional to the approximation error.
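As an illustration, a separated cost function of type (4.3) for state 3 (E1), whose right-hand side according to (1.4) is V4·G1/(K4 + G1) − k4·E1, could be sketched as below; dE1dt holds the approximated derivatives of E1, while E1 and G1 hold the measured state values at the sampling instants (the actual implementation used in the thesis is given in Appendix B).

    % Separated cost function for state 3 (E1), with p = [V4 K4 k4].
    function c = costState3(p, dE1dt, E1, G1)
        f = p(1)*G1./(p(2) + G1) - p(3)*E1;   % right-hand side of (1.4)
        c = sum((dE1dt - f).^2);              % squared differences over all samples
    end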

4.7.2 Estimation of the parameters of the subsystems using approximations of the derivatives

Both SRES and simulated annealing were unable to consistently find the correct minima of the state cost functions, regardless of the sampling frequency.

Simulated annealing did find the correct minima for the states containing only three parameters quite often, but it had serious problems finding the correct optima for the states containing six parameters. Sometimes simulated annealing could find the correct optima of some of the six-parameter states, but not often. On the bright side, only roughly 30 seconds were needed to make an estimation with simulated annealing using a [0:1:60] time vector. Different settings for simulated annealing were tested, but no significant improvement compared to the default settings was found. For example, different starting temperatures and numbers of iterations were tried.

Another approach was also tested, still on the Mendes system and with simulated annealing. The estimations obtained from the state cost functions using approximated derivatives were used as starting guesses for new estimations using simulated cost functions for the states. This leads to much better correctness in the estimations; the correct values were obtained in the majority of the runs. However, it took much longer to complete those optimizations, and since correct values were still not obtained often enough, this approach was abandoned. Finally, another attempt was made with simulated annealing using only cost functions based on spline-approximated derivatives on the separated system, now with the parameters obtained from the first estimation used as initialization for the next, and so on until no further improvement of the cost function value was obtained. In other words: repeat the same estimation many times but use the parameter values obtained in the previous estimation as starting guess. Since the default starting temperature of the simulated annealing algorithm used in this work is ten times the cost function value at the starting point, there is a chance that the algorithm can escape a local minimum if the minimum is used as a new starting point. For this attempt Brutalsolve was constructed, see Section 4.7.3.

4.7.3 Brutalsolve

Below is a description of the Brutalsolve algorithm. The Matlab implementation can be found in Appendix C.

1. Initialize simulated annealing with given initial values (The first parameter set) if it is the first iteration of the algorithm. Otherwise use the previous estimation to initialize simulated annealing.

2. The obtained cost function value with estimated parameters is compared to the cost function value for the previous parameter set.

3. If the new cost function value is better than the old one by at least 1%, or by an absolute amount of at least 0.001, go to step 1. Otherwise the algorithm is ended and the best obtained parameter set is considered the solution.

Comment: The values 1% and 0.001 can be modified to chosen values.
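The sketch below shows the Brutalsolve idea in Matlab, under the assumption that annealFun(costFun, p0) runs one simulated annealing estimation from the starting guess p0 and returns the estimated parameter set (the thesis uses simannealingSBAO for this step; its exact call syntax is not reproduced here).

    % Sketch of Brutalsolve: restart simulated annealing from the previous
    % estimate until the cost function value no longer improves significantly.
    function [p, c] = brutalsolveSketch(costFun, annealFun, p0)
        p = p0; c = costFun(p);
        while true
            pNew = annealFun(costFun, p);                         % re-anneal from previous estimate
            cNew = costFun(pNew);
            improved = (c - cNew > 0.01*c) || (c - cNew > 0.001); % better by 1% or by 0.001
            if cNew < c, p = pNew; c = cNew; end                  % keep the best parameter set
            if ~improved, break; end                              % stop when no real improvement
        end
    end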

This approach turned out to be more successful; for the cost functions tested first it gave the correct estimation most of the time, but not always. Since the estimations did not take a long time, an idea would be to repeat the procedure several times to get good robustness for the method. Even if it had to be run three to five times it would be very quick compared to other methods. Table 4.5 shows the data from ten Brutalsolve runs, i.e. Brutalsolve was run ten times for each state and then the estimations with the lowest cost function values were chosen as the final estimations.

State   Number
1       5
2       5
3       9
4       10
5       8
6       9
7       2
8       9

Table 4.5. Number of times the correct global optimum for each state was found when Brutalsolve was run ten times for each state, time vector = [0:1:60].

However, state seven caused problems. Only two out of ten times did the method find the correct solution here, see Tables 4.5 and 4.6. After looking at the state seven cost function, it was found to have two parameters (k2 and V2) that had a very high correlation. When estimating the parameters for state seven with one of those troublesome parameters replaced with its nominal value, the correct solution was found nine out of ten times.

State 7 correct: 20 times out of 100

Table 4.6. Number of times out of 100 the parameters of state seven were correctly estimated, time vector = [0:1:60].

It seems that the Brutalsolve method can have problems with robustness when the parameters are closely correlated. The two parameters of state 2 that were the most correlated had a correlation of 0.9931, and Brutalsolve found the correct optimum 5 out of ten times. For state 6 the highest correlation was 0.9990, between the parameters V1 and k1; however, all other parameters had low correlation values. The mean correlations were 0.5387 for state 2, 0.4875 for state 6 and 0.5667 for state 7. State 7 did have the highest correlated parameters, but state 6 had a parameter pair that was almost as correlated. However, the distance to 100% correlation was 0.0003 for state seven compared to 0.001 for state 6, so perhaps the difference in correlation is significant. State 2 had five parameters that were closely correlated with at least one other parameter, while state 6 only had 2 parameters that were highly correlated with each other, which explains why the parameters of state 6 were easier to estimate even though the maximal correlation was higher than for the parameters of state 2.

Since the probability of getting state seven wrong is 0.8, running Brutalsolve 15 times gives a 0.8^15 ≈ 0.035 risk of not getting the correct solution even once. Running PEst(i=15) (PEst(i=15) means running Brutalsolve 15 times for every state and picking out the best solution) gives the method a robustness that is greater than 95% for this particular system, since estimating the parameters of the other states does not contribute to the risk of failure. When this was done, it took 81.05 minutes to perform the estimation and all the parameters were correct, within a 6% margin of error from the correct values, see Table 4.8. The vectors with the nominal values and the estimations are provided below. Running SimplexSB for 1 hour with a simulated cost function improved the estimations even further, and if allowed to run longer it would have gotten even closer to the global optimum. SimplexSB was also tested on the non-separated cost function based on spline-approximated derivatives, but then it failed to converge to the correct optimum. There are probably quicker algorithms than SimplexSB to use if one wants to improve the results from PEst and Brutalsolve, though. For example dn2gb, a local optimization algorithm which was used in [3], seems to be very good, but since it was only implemented in Fortran it was not used here. The cost function value obtained with PEst(15) was 0.0055 and after running SimplexSB for 1 hour it was 8.6·10^-4. The results of running PEst(i=10) can be found in Table 4.7.

An easy modification to PEst that would improve its usability would be to save the results in a file every time the parameters for all states have been estimated. This would mean that PEst(15) would have the results saved 15 times. The main reason for doing this is that PEst can be initiated without any termination condition; the user would manually end the execution of the function when he is



Accuracy   No of parameters
5%         34
10%        36

Table 4.7. Number of correctly estimated parameters within a certain margin of error when executing PEst(10), time vector = [0:1:60]. The execution of PEst(10) took 1 hour.

satisfied with the results. He would then have the results saved in a file.
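A minimal sketch of this incremental saving is given below; estimateAllStates is a hypothetical placeholder for one complete Brutalsolve pass over all states, and the file name is an arbitrary example.

    % Sketch: save the collected estimates after every completed pass over all
    % states, so that the run can be aborted manually without losing results.
    % estimateAllStates is a hypothetical placeholder.
    allEstimates = [];
    pass = 0;
    while true                                     % no termination condition
        pass = pass + 1;
        allEstimates(:, pass) = estimateAllStates();
        save('pest_results.mat', 'allEstimates', 'pass');
    end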

A modification to Brutalsolve that could improve the results would be to, instead of ending the algorithm when the ending criterion is met, run simulated annealing again and continue until the ending criterion has been met a chosen number of times. This would reduce the risk of getting stuck in a local optimum with less computational effort than running the entire Brutalsolve algorithm many times, as PEst does. However, if the default setting for the initial temperature (10 · magnitude of the cost function value at the initial guess) is used, this could lead to too low an initial temperature to get away from the current local optimum. Compared to running the entire Brutalsolve algorithm many times, this modification would therefore probably reduce both the computation time and the robustness. There was not enough time to try this approach.
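A sketch of the idea, under the assumption that the annealing routine can be restarted from a given point with an explicitly chosen initial temperature, could look as follows; runAnnealing and costFun are hypothetical placeholders, not functions from the toolbox.

    % Sketch of the proposed Brutalsolve modification: restart simulated
    % annealing from the current best point until the ending criterion has been
    % met a chosen number of times. runAnnealing is a hypothetical wrapper
    % around the annealing routine; T0 is set explicitly instead of relying on
    % the default 10 * |cost at initial guess|, which may be too low here.
    nRestarts = 3;
    bestEst   = p0;                          % initial parameter guess (assumed given)
    bestCost  = costFun(bestEst);            % costFun: hypothetical simulation-free cost
    for r = 1:nRestarts
        T0 = 100 * abs(bestCost);            % deliberately higher than the default
        [est, cost] = runAnnealing(costFun, bestEst, T0);
        if cost < bestCost
            bestCost = cost;
            bestEst  = est;
        end
    end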

Nominal  = nominal parameter values
PEst     = parameter values obtained running PEst(15)
Simplex1 = parameter values obtained after running simplexSB 1 hour, initialized with the PEst estimations
Simplex2 = parameter values obtained after running simplexSB 5.92 hours, initialized with the PEst estimations

Parameter   Nominal   PEst     Simplex1   Simplex2
1           1.0000    0.9429   0.9924     0.9979
2           1.0000    0.9996   1.0002     0.9998
3           2.0000    2.0003   2.0013     2.0006
4           1.0000    1.0006   1.0000     1.0000
5           2.0000    2.0016   1.9992     1.9997
6           1.0000    0.9427   0.9924     0.9979
7           1.0000    0.9635   0.9572     0.9974
8           1.0000    1.0001   0.9999     0.9998
9           2.0000    1.9998   2.0012     2.0001
10          1.0000    1.0009   1.0004     0.9993
11          2.0000    2.0017   2.0034     2.0010
12          1.0000    0.9634   0.9574     0.9977
13          1.0000    0.9528   0.9676     1.0014
14          1.0000    1.0002   1.0006     0.9996
15          2.0000    2.0012   2.0075     2.0025
16          1.0000    1.0002   1.0006     1.0022
17          2.0000    2.0025   1.9998     1.9961
18          1.0000    0.9529   0.9675     0.9996
19          0.1000    0.1000   0.1002     0.1001
20          1.0000    0.9996   0.9994     1.0003
21          0.1000    0.1000   0.1002     0.1000
22          0.1000    0.1000   0.0993     0.0998
23          1.0000    0.9997   1.0003     0.9975
24          0.1000    0.1000   0.0993     0.1000
25          0.1000    0.1000   0.1003     0.1001
26          1.0000    0.9997   0.9992     0.9986
27          0.1000    0.1000   0.1003     0.1001
28          1.0000    0.9984   0.9961     0.9995
29          1.0000    1.0082   1.0044     1.0040
30          1.0000    1.0233   1.0239     1.0107
31          1.0000    1.0014   1.0060     1.0064
32          1.0000    1.0069   1.0153     1.0109
33          1.0000    1.0137   1.0062     0.9990
34          1.0000    0.9982   1.0005     1.0027
35          1.0000    0.9978   1.0010     1.0035
36          1.0000    1.0018   1.0024     1.0023

Cost function values:
Nominal:  3.54e-025
PEst:     0.0055
Simplex1: 8.6048e-004
Simplex2: 1.1892e-004

Table 4.8. Parameter values obtained when executing PEst(15) and initialising simplexSB with the obtained estimations.

Another possible use of the method would be to obtain good initial values for a different optimization algorithm that uses a simulation-based cost function.

An attempt was made with the simplex method. First PEst, which runs Brutalsolve a chosen number of times for every state, was used to run Brutalsolve once for every state. The parameters in all states except states three and seven were correctly estimated, and 28 parameters were correctly estimated within a 10% margin of error. No more parameters were found in the 10-20% error interval. The estimations obtained were then used as initial values for the Matlab function simplexSB (downhill simplex method in multiple dimensions) with the simulated cost function. The simplex method did improve the cost function value from 16.1 to 4.4, but the number of parameters within 10% of the correct values did not improve. In fact, only 20 of the parameters obtained from simplexSB were within 10% of the correct values, and 27 within 20%. It seems that the estimation obtained from PEst was not in the direct vicinity of the global optimum; a different method than simplexSB would be needed to find the correct solution given the starting guess from PEst. However, for SRES with the simulated cost function it would take many hours to reach a cost function value of 16.1, which was obtained in a few minutes with PEst.
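A hedged sketch of this two-stage approach is given below. simulatedCost and pestEstimate are hypothetical names for the simulation-based cost function and the PEst result, and the simplexSB call signature is an assumption that should be checked against the toolbox documentation.

    % Sketch: refine the quick spline-based estimate with a simulation-based
    % local search. simulatedCost and pestEstimate are hypothetical names and
    % the simplexSB call signature is an assumption.
    p0   = pestEstimate;                     % parameters obtained from PEst
    pOpt = simplexSB(@simulatedCost, p0);    % downhill simplex refinement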



4.8 Deciding what sampling time to use

When using splines or other methods to approximate the derivatives it is important that the sampling time is short enough, otherwise the approximations will be too inaccurate. One way to ensure this is to introduce new parameters for a chosen set of derivatives and estimate them together with the original parameters. If the estimates of the introduced parameters are close to the approximated values, it can be assumed that the sampling time is sufficient. This method has some connections to leave-one-out cross-validation.

Appendix G gives an example of how a new parameter representing the state derivative of state seven in experiment five at the first measurement point is introduced. The new parameter replaces one nominal derivative in the cost function, and the part of the total cost function that represents the new parameter is

c_ij(p_new, p) = (p_new - f_ij(p))^2,     (4.4)

where i represents the state and j represents the measurement point, p_new is the introduced parameter that represents a chosen state variable derivative, and p is the vector of original parameters.

The total cost function was calculated using (2.4) for every experiment, with the term that corresponds to the selected state variable derivative replaced by (4.4). The cost function is minimized when the original parameters have their original values (because of the part of the original cost function that remains unchanged) and when p_new equals f_ij, which by definition equals the state derivative.
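As an illustration, a sketch of how the term (4.4) can replace one ordinary residual in a simulation-free cost function for a single experiment is given below. It assumes the cost is a plain sum of squared differences between spline-approximated derivatives and the model right-hand sides; splineDeriv, modelRHS, nStates and nMeas are hypothetical names, and the actual (2.4) may for instance include weighting.

    % Sketch: total cost with the residual for state 7 at the first measurement
    % point replaced by the term (4.4). splineDeriv(i,j) holds the spline-
    % approximated derivative, modelRHS(p,i,j) the model right-hand side f_ij(p);
    % both are hypothetical placeholders. pnew is the introduced parameter.
    cost = 0;
    for i = 1:nStates
        for j = 1:nMeas
            if i == 7 && j == 1                                   % selected derivative
                cost = cost + (pnew - modelRHS(p, i, j))^2;       % term (4.4)
            else
                cost = cost + (splineDeriv(i, j) - modelRHS(p, i, j))^2;
            end
        end
    end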

The following tests were made on the separated system, using spline approximations to obtain estimates of the state derivatives. As long as a short enough sampling interval is used and the parameters are correctly estimated, the estimate of the derivative is very accurate, for example when the time vector is [0:1:60]. With this sampling interval six out of eight derivatives were correctly estimated. Even for state two the derivative was correctly estimated, although only a local minimum was found there. When the sampling interval was increased to [0:6:120], two out of eight derivatives were close to the correct ones and the others were far off. For [0:12:120], three out of eight derivatives were somewhat closely estimated.
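A small self-contained sketch of the spline-based derivative approximation for two different sampling intervals is given below; the synthetic signal is a hypothetical curve used only for illustration, whereas in the thesis the measured state trajectories are used instead.

    % Sketch: fit a cubic spline to sampled data and differentiate the spline
    % analytically, for a dense and a coarse time vector.
    signal = @(t) 2*exp(-0.05*t).*sin(0.3*t);               % illustrative signal
    for tv = {0:1:60, 0:6:120}
        t  = tv{1};
        pp = spline(t, signal(t));                          % cubic spline, pp-form
        [breaks, coefs, l, k, d] = unmkpp(pp);
        dpp = mkpp(breaks, coefs(:, 1:k-1) .* repmat(k-1:-1:1, l, 1), d);
        dApprox = ppval(dpp, t);                            % approximated derivative
    end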

One problem when using this method is to decide what should be considered close to the true value of the derivative. Using only a relative tolerance could be problematic, since when a derivative is very close to zero even a small difference can result in a very large relative difference. Using an absolute difference might be difficult as well, since its size would have to depend on how large the derivative is.
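One common compromise, sketched below with arbitrary example values, is to combine an absolute and a relative tolerance so that neither criterion dominates when the derivative is close to zero.

    % Sketch: accept the identified derivative dId as close to the spline-
    % approximated derivative dSp if it is within a combined absolute and
    % relative tolerance. All values here are arbitrary examples.
    dSp    = -0.12;            % spline-approximated derivative
    dId    = -0.118;           % identified derivative (introduced parameter)
    absTol = 1e-3;
    relTol = 0.05;
    isClose = abs(dId - dSp) <= absTol + relTol*abs(dSp);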

The method of estimating derivatives to decide whether the sampling time is good enough can probably not be seen as an exact indicator. If a very good estimate can be obtained for all of the chosen derivatives, one can be fairly sure that the sampling time is good enough. It is also important to choose derivatives from regions where it can be assumed to be more difficult to get good estimates of the derivatives, since the most difficult regions are the ones that decide what sampling time is necessary.




Even though it might be a little difficult to get a good numerical measure of how good the approximations of the derivatives are, a plot of the measurement data, the spline, the derivative approximation and the identified derivative can give an indication. Let us try with state seven (G2) from experiment five. As can be seen from some of the curves in Figure 4.1, this is a state that requires a high sampling frequency to get a good cubic spline fit. The result of the derivative identification with time vector [0:6:120] can be seen in Figure 4.4. The identified derivative and the spline-approximated derivative point in totally different directions, as could be expected since the sampling time was too long. The optimum was found in [p7 p8 p9 p10 p11 p12] = [0.3169 0.9949 1.9788 1.0345 2.0079 0.3145], which should be compared with the correct global optimum [p7 p8 p9 p10 p11 p12] = [1 1 2 1 2 1]. The obtained estimation is fairly close to the correct one, but the derivative of the spline for state G2 from experiment five is significantly different from the true state variable derivative, compare the solid red line and the dashed red line in Figure 4.1, which indicates that the sampling frequency should be increased to get an even better estimation of the parameters.

With a time vector of [0:1:60], see Figure 4.5, the identified derivative is close to the one approximated by splines. State seven from experiment five is one of the most difficult states to get a good approximation for at t=0. For almost all other states an even better fit between the approximated derivative and the identified derivative can be found, if the correct optimum is found.

4.9 Modifying simulated annealing to examine multiple local optima

Modifying the simulated annealing algorithm so that it keeps track of several local optima at the same time could help to increase robustness for cost functions with heavily correlated parameters.

How can this be done? To know whether two sets of parameter values (let us call such a set a point) belong to the vicinity of the same optimum, a local optimization algorithm could be used: start the local optimization algorithm at both points and see if they converge to the same optimum. While this is a possible method, it does take extra time to perform the local searches for all necessary comparisons. One would then go through the best points and select a chosen number of them that do not converge to the same minimum. It could be a good method, but the local searches take time, and since the function to be optimized is non-linear and has parameters that are highly correlated it is not certain that it would work. But it might.
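A sketch of such a basin check is shown below, using MATLAB's fminsearch as the local search; costFun, pointA and pointB are hypothetical names and the tolerance is an arbitrary example.

    % Sketch: decide whether two parameter points belong to the basin of the
    % same local optimum by running a local search from both and comparing the
    % results. costFun, pointA and pointB are hypothetical placeholders.
    xA = fminsearch(costFun, pointA);
    xB = fminsearch(costFun, pointB);
    sameOptimum = norm(xA - xB) < 1e-3 * max(1, norm(xA));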

4.9.1 Implementation and results

The method I used was to select the best point and then apply a penalty function to all points evaluated by the optimization algorithm at that temperature. The penalty function I used,

f(d) = d^b
