
Institutionen för systemteknik

Department of Electrical Engineering

Examensarbete (Master's thesis)

Evaluation of two Methods for Identifiability Testing

Master's thesis in Automatic Control, performed at Linköping Institute of Technology

by

Peter Nyberg

LiTH-ISY-EX--09/4278--SE

Linköping 2009

Department of Electrical Engineering, Linköpings universitet


Supervisors: Gunnar Cedersund, IKE, Linköpings universitet
Christian Lyzell, ISY, Linköpings universitet
Jan Brugård, MathCore

Examiner: Martin Enqvist, ISY, Linköpings universitet


Division, Department: Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2009-10-07
Language: English
Report category: Examensarbete (Master's thesis)

URL for electronic version: http://www.control.isy.liu.se , http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-ZZZZ
ISRN: LiTH-ISY-EX--09/4278--SE

Title: Utvärdering av två metoder för identifierbarhetstestning / Evaluation of two Methods for Identifiability Testing
Author: Peter Nyberg


Keywords: Identifiability, Mean optimal transformation approach, Multistart simulated annealing, Sedoglavic observability test


Abstract

This thesis concerns the identifiability issue: which, if any, parameters can be deduced from the input and output behavior of a model? The two types of identifiability concepts, a priori and practical, will be addressed and explained. Two methods for identifiability testing are evaluated, and the results show that the two methods work well if they are combined. The first method is for a priori identifiability analysis and it can determine the a priori identifiability of a system in polynomial time. The result from the method is probabilistic, with a high probability of a correct answer. The other method takes the simulation approach to determine whether the model is practically identifiable. Non-identifiable parameters manifest themselves as a functional relationship between the parameters, and the method uses transformations of the parameter estimates to conclude whether the parameters are linked. The two methods are verified on models with known identifiability properties and then tested on some examples from systems biology. Although the output from one of the methods is cumbersome to interpret, the results show that the number of parameters that can be determined in practice (practical identifiability) is far smaller than the number that can be determined in theory (a priori identifiability). The reason for this is the lack of quality of the measurements, i.e., noise and lack of excitation.

Sammanfattning

The focus of this report is the identifiability problem: which parameters can be uniquely determined from a model? There are two types of identifiability concepts, a priori and practical identifiability, which will be explained. Two methods for identifiability testing are evaluated, and the results show that the two methods work well if they are combined with each other. The first method is for a priori identifiability analysis, and it can determine the identifiability of a system in polynomial time. The result from the method is probabilistic, with a high probability of a correct answer. The second method uses simulations to determine whether the model is practically identifiable. Non-identifiable parameters manifest themselves as functional couplings between parameters, and the method uses transformations of the parameter estimates to determine whether the parameters are coupled. The two methods are verified on models where the identifiability is known, and are then tested on some examples from systems biology. Although the results from one of the methods are difficult to interpret, they show that the number of parameters that can be determined in practice (practically identifiable) is considerably smaller than the number that can be determined in theory (a priori identifiable). The reason is the lack of quality, both noise and lack of excitation, in the measurements.


Acknowledgments

First, I would like to thank my examiner Martin Enqvist and my co-supervisor Christian Lyzell for their help with the report. I would also like to thank my co-supervisor Gunnar Cedersund for all the good answers to my questions and for the various models from systems biology. Thanks to Jan Brugård, who gave me the opportunity to do my thesis at MathCore. Last but not least, I would like to thank my wonderful girlfriend Eva for sticking with me.


Contents

1 Introduction . . . 1
1.1 Thesis Objectives . . . 1
1.2 Computer Software . . . 2
1.2.1 Mathematica . . . 2
1.2.2 MathModelica . . . 2
1.2.3 Matlab . . . 3
1.2.4 Maple . . . 3
1.3 Organization of the Report . . . 3
1.4 Limitations . . . 3
1.5 Acronyms . . . 3

2 Theoretical Background . . . 5
2.1 Identifiability . . . 7
2.2 Multistart Simulated Annealing . . . 7
2.2.1 Cost Function . . . 8
2.2.2 Settings . . . 10
2.2.3 Input . . . 11
2.2.4 Output . . . 11
2.3 Mean Optimal Transformation Approach . . . 12
2.3.1 Alternating Conditional Expectation . . . 12
2.3.2 Input . . . 13
2.3.3 The Idea Behind MOTA . . . 13
2.3.4 Test-function . . . 14
2.3.5 Output . . . 15
2.4 Observability . . . 18
2.5 Sedoglavic's Method . . . 18
2.5.1 A Quick Introduction . . . 19
2.5.2 Usage . . . 20
2.5.3 Drawbacks of the Algorithm . . . 20

3 Results . . . 21
3.1 Verifying MOTA . . . 21
3.1.1 Example 1 . . . 21
3.1.2 Example 2 . . . 24
3.2 Verifying Sedoglavic Observability Test . . . 26
3.2.1 A Linear Model . . . 26
3.2.2 Goodwin's Napkin Example . . . 27
3.2.3 A Nonlinear Model . . . 27
3.2.4 Compartmental Model . . . 27
3.3 Implementation Aspects and Computational Complexity . . . 28
3.3.1 Simulation . . . 28
3.3.2 ACE . . . 29
3.3.3 Selection Algorithm . . . 29
3.4 Evaluation on Real Biological Data . . . 29
3.4.1 Model 1: SimplifiedModel . . . 30
3.4.2 Model 2: Addition of insulin to the media . . . 35
3.4.3 Model 3: A model for the insulin receptor signaling, including internalization . . . 41

4 Conclusions and Future Work . . . 47
4.1 Conclusions . . . 47
4.2 Proposal for Future Work . . . 48

A Programming Examples . . . 51
A.1 Mathematica . . . 51
A.2 MathModelica . . . 51

Chapter 1

Introduction

A common question in control theory and systems biology is whether or not a model is identifiable. The reason is that the parameters of the model cannot be uniquely determined if the model is non-identifiable. Why is this important? The answer is that the parameters can have some physical meaning, or the search procedures for the parameter estimates may suffer if these are not unique (Ljung and Glad, 1994). In systems biology this is a big problem due to the small number of measurements compared to the number of parameters in the model. The models in systems biology are often described by systems of differential equations (Hengl et al., 2007), also known as the state-space representation

\dot{x} = f(x, p, u)
y = g(x, p, u)

where x are the states, p the parameters, y the outputs, and u the inputs of the model. The dynamics are described by \dot{x} = dx(t)/dt = f(x, p, u). The parameters are, e.g., reaction rates which have to be determined with the help of the measured data. As we will see, there are two types of identifiability concepts. The first one regards only the model structure, and the other one originates from the lack of quality of the measurements. In this thesis, two methods will be evaluated with regard to the identifiability issue. The first method focuses on the model equations, and the second method also takes the quality of the measurements into account.

1.1 Thesis Objectives

In this thesis we will describe the difference between the two types of identifiability concepts, which are explained in Chapter 2. The main purpose of this thesis is to investigate methods for determining the identifiability properties of a given model. This includes both implementation and evaluation of existing methods in the MathModelica software environment. The algorithms which have been translated from Matlab to MathModelica/Mathematica are described in Section 2.2 and Section 2.3.

1.2 Computer Software

During the work on this thesis, several programming languages have been used. A translation of two algorithms from Matlab to Mathematica has been done. For the simulation, the Modelica-based language MathModelica and the Systems Biology Toolbox (SBTB) have been used, and finally Maple has been required because another algorithm is written in that language. These four languages are briefly described in this section. Some syntax examples are shown in Appendix A.

1.2.1 Mathematica

Mathematica is a program originally developed by Stephen Wolfram. The largest part of the Mathematica user community consists of technical professionals but the language is also used in education around the globe.

One useful feature of Mathematica is that you can choose the programming style that suits you: ordinary procedural programming goes hand in hand with functional and rule-based programming. The downside is that you can write one line of code that does almost anything, but that a reader understands nothing; this can be remedied with proper comments and basic knowledge of the language. An example in Appendix A illustrates the different ways to write a piece of code that sums up all the even numbers from 1 to 1000. For more information about Mathematica, see Wolfram (1999).
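As a rough analogue (sketched in Python here rather than Mathematica, since Appendix A is not reproduced in this section), the same task can be written in a procedural and in a functional style:

# Procedural style: accumulate the sum of the even numbers from 1 to 1000.
total = 0
for n in range(1, 1001):
    if n % 2 == 0:
        total += n

# Functional style: the same computation as a single expression.
total_functional = sum(n for n in range(1, 1001) if n % 2 == 0)

assert total == total_functional == 250500

The one-liner is shorter but, as noted above, denser to read without comments.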

1.2.2 MathModelica

Due to the demand of experiment fitting, that is, fitting the model several times to data to obtain estimates of the parameters, there is a need for simulation environments, and in this thesis MathModelica has partly been used for the simulation. MathModelica System Designer Professional from MathCore has been used; it provides a graphical environment for modeling and an environment for simulation tasks. With the Mathematica link, which connects MathModelica with Mathematica, one can make use of the Mathematica notebook environment and its facilities (Modelica, 2009). For more information about MathModelica see Mathcore (2009), and for the Modelica language see Fritzson (2003).

One of the main advantages of a Modelica-based simulation environment is that it is acausal. There is no predetermined input or output for the components (if you do not force it to be), meaning that, e.g., for a resistor component (see Appendix A) either R, i or v can serve as input or output, depending on the structure of the whole system.


1.2.3 Matlab

Matlab is a language for numerical computing. It is developed by The MathWorks and is used heavily around the world for technical and educational purposes. Matlab can be extended with several toolboxes, and in this thesis the SBTB has been used together with the package SBADDON for the purpose of simulating in the Matlab environment.

1.2.4 Maple

Maple is a technical computing and documentation environment, based on a computer algebra system, which originated from the Symbolic Computation Group at the University of Waterloo, Canada. Maple has been used because one of the algorithms for testing identifiability is written in Maple. The algorithm is explained in Section 2.5.

1.3 Organization of the Report

The organization of this thesis is as follows. The background theory and the methods for testing identifiability are explained in Chapter 2. In Chapter 3 the results are presented: the two methods are first verified with the help of some examples; furthermore, the implementation aspects of the translation from Matlab to MathModelica/Mathematica are discussed; thereafter, the methods are tested on examples from systems biology. In Chapter 4 the conclusions and proposals for future work can be found.

1.4 Limitations

One limitation of this thesis is that we focus on systems biology, and the examples are all taken from that field of science. Another limitation is that the second method for determining the identifiability of a model, see Section 2.5, has not been translated to MathModelica/Mathematica and therefore has not been studied as thoroughly.

1.5 Acronyms

MOTA Mean Optimal Transformation Approach . . . 6

MSA Multistart Simulated Annealing . . . 6

ACE Alternating Conditional Expectation . . . 6

SOT Sedoglavic’s Observability Test . . . 5


Chapter 2

Theoretical Background

In systems biology, but also within many other areas, over-parametrization and non-identifiability are big problems. These problems are present when there are parts of a model that cannot be identified (or observed) uniquely from the given measurements. There are two types of identifiability: a priori and practical. Practical identifiability implies a priori identifiability, but not vice versa. For a model with unknown parameters there is a possibility to determine these parameters from the input and output signals. If the model is a priori non-identifiable, however, the parameters cannot be determined even if the measured signals are free from noise.

Analyzing the model structure itself with respect to non-identifiability is called a priori (structural) identifiability analysis, since the analysis is done before any experiment fitting and simulation. In this thesis, the a priori or structural identifiability regards the local identifiability aspects of a model and not the global property. Global identifiability implies local identifiability, but not the other way around; see Section 2.1 for the definitions of global and local identifiability. For nonlinear models numerous approaches have been proposed, e.g., the power series expansion (Pohjanpalo, 1978), differential algebra (Ljung and Glad, 1994) and the similarity transform (Vajda et al., 1989). However, with increasing model complexity these methods become mathematically intractable (Hengl et al., 2007; Audoly et al., 2001). Due to the importance of time-efficient algorithms, a method proposed by Alexandre Sedoglavic has been used in this thesis. The algorithm, Sedoglavic's Observability Test (SOT), is polynomial in time, with the drawback that the result is probabilistic (Sedoglavic, 2002). The probabilistic nature of the algorithm originates from simplifications in the algorithm to speed up the calculations (Sedoglavic, 2002). In Section 2.5 the SOT will be further explained.

There is another way to detect non-identifiable parameters, besides a priori identifiability analysis, and this is done with the help of simulation and parameter fitting. Non-identifiable parameters manifest themselves as functionally related parameters; in other words, the parameters depend on each other. If the parameters are functionally related, it may, for example, only be their sum or quotient that can be determined. As a result, the individual parameters cannot be determined uniquely and are non-identifiable.


Figure 2.1. Block diagram of the different paths used by the two algorithms. The simulation approach is the upper one: from the model equations, the MSA algorithm produces estimates of the parameters, which are used in MOTA to determine which of the parameters are identifiable. The a priori identifiability analysis is more straightforward: from the model equations, the parameters which are a priori identifiable are determined with the help of the SOT.

Non-identifiable parameters can be detected by fitting the model to data consecutively, several times, to obtain estimates of the parameters, which are then examined. One such method is presented in Hengl et al. (2007). Their method, the Mean Optimal Transformation Approach (MOTA), is based on the simulation approach and is developed from the Alternating Conditional Expectation (ACE) algorithm (Breiman and Friedman, 1985). The simulation approach needs a parameter estimator that produces estimates of the parameters. One such algorithm is Potterswheel (Maiwald and Timmer, 2008), which Hengl et al. used. Another one is the well-known Gauss-Newton algorithm. However, in this thesis the Multistart Simulated Annealing (MSA) algorithm (Pettersson, 2008) is used instead. One of the advantages of the latter is that it only needs function evaluations, not derivatives of the function. An advantage over the Gauss-Newton method is that the MSA algorithm can find parameter estimates that describe the model fairly well even if the initial guess is far from the minimum. In Figure 2.1 the a priori identifiability analysis and the simulation approach are illustrated.

This chapter contains an introduction to some basic results about identifiability and observability. Furthermore, the two algorithms that have been used in this thesis are explained.



2.1 Identifiability

To be more specific and formal about identifiability, study the following case. Given a constrained model structure, a structure with constraints given by the model,

\frac{dx(t, p)}{dt} = f(x(t, p), u(t), t, p)    (2.1a)
y(t, p) = g(x(t, p), p)    (2.1b)
x_0 = x(t_0, p)    (2.1c)
h(x(t, p), u(t); p) \geq 0    (2.1d)
t_0 \leq t \leq t_f,    (2.1e)

where x denotes the state variables, u the externally given input signals, p the system parameters, and y the observations. The initial values are x_0 = x(t_0, p), and h denotes all additional constraints formulated as explicit or implicit algebraic equations.

A single parameter p_i in (2.1) is globally identifiable if there exists a unique solution for p_i from the constrained model structure. A parameter with a countable or uncountable number of solutions is locally identifiable or non-identifiable, respectively.

In theory one can assume that the measurements are ideal, e.g., noise-free and continuous, but in many situations this is not the case; biological measurements, for instance, include observational noise. To take this into consideration we can generalize (2.1b) to

y(t, p) = g(x(t, p), p) + \epsilon(t),

where \epsilon(t) represents the noise.

As mentioned before, practical identifiability implies a priori identifiability, but not the other way around. Practical identifiability of a given model structure holds when we can determine the values of the parameters, with a small enough variance, from measurements of the input and output signals. Due to the noise, a priori identifiable parameters can become practically non-identifiable (Hengl et al., 2007).
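As a minimal illustration of these notions (a constructed example, not one of the thesis models), consider the one-state model

\dot{x} = -(p_1 + p_2)\,x, \qquad y(t) = x(t), \qquad x(t_0) = x_0,

with output y(t) = x_0 e^{-(p_1 + p_2)(t - t_0)}. Only the sum p_1 + p_2 affects the output, so p_1 and p_2 each have an uncountable number of solutions and are a priori non-identifiable, while the combination p_1 + p_2 is globally identifiable. Once the noise \epsilon(t) is included, even the sum can become practically non-identifiable if the noise is large compared to the excitation of the output.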

2.2 Multistart Simulated Annealing

To determine the functional relationships between parameters, there is a need for an algorithm that produces estimates of these parameters that can be further analyzed. In this thesis the MSA algorithm has been used. Another algorithm that can be used is the multi-experiment fitting presented by Maiwald and Timmer (2008). From now on we shall concentrate on the first one and use it to obtain the estimates of the parameters. These estimates are then used by the algorithm that is presented in Section 2.3; see also Figure 2.1.


The MSA algorithm is a random search method that tries to mimic the behavior of atoms in equilibrium at a given temperature. The algorithm starts at one temperature and then cools down until the stop temperature is reached. The temperature is proportional to the randomness of the search: the higher the temperature, the more random the search, and at lower temperatures the algorithm behaves more like a local search algorithm. One of the advantages of this algorithm is that it only needs function evaluations, not derivatives of the function. For more information about the technical aspects of the MSA algorithm, see Chapter 3 in Pettersson (2008) and the references therein. A heavily simplified sketch of the temperature loop follows Algorithm 1.

Algorithm 1: Multistart Simulated Annealing

Requirements: Initial starting guess X_0, start temperature T_1, stop temperature T_s, temperature factor \nu_{temp}, cost function, low bound lb, high bound hb, a positive number \sigma, and the number of iterations used for each temperature, N.

1: Initiate parameters and put k = 1.
2: while T_k > T_s do
3:   Perform N iterations of Annealed Downhill-Simplex, a random search (Pettersson, 2008), at temperature T_k. Each iteration gives a point \bar{p}, and N iterations give the points P = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_N].
4:   Set T_{k+1} lower than T_k, T_{k+1} = \nu_{temp} T_k, and put k = k + 1.
5:   The suitable restart points R are calculated with the help of clustering analysis. From the N points a critical distance,

     r_k = \pi^{-1/2} \left( \Gamma\!\left(1 + \frac{k}{2}\right) F(\sigma, lb, hb)\, \frac{\log(kN)}{kN} \right)^{1/k},

     is calculated. The restart points R are the points \bar{p} that are sufficiently far from the other points according to

     R = \left\{ \bar{p} \;\middle|\; \| \mathrm{col}(P) - \bar{p} \|_2 > r_k \right\},

     where F is a rational function, \Gamma is the Gamma function, and \mathrm{col}(P) = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_N] denotes all columns in P except \bar{p}.
6: end while
7: return The best points from each valley that have been found during the last temperature iteration.
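To make the structure of Algorithm 1 concrete, the following is a heavily simplified Python sketch, not the thesis implementation: the Annealed Downhill-Simplex step and the clustering-based restart selection are replaced by a plain random-perturbation step with annealed acceptance, so only the temperature loop is illustrated.

import numpy as np

def msa_sketch(cost, p0, lb, hb, T1=1.0, Ts=1e-3, nu_temp=0.9, n_iter=50, seed=0):
    # Toy annealing loop: random steps whose size shrinks with the temperature;
    # worse points are accepted with probability exp(-increase / T).
    rng = np.random.default_rng(seed)
    lb, hb = np.asarray(lb, float), np.asarray(hb, float)
    p = np.clip(np.asarray(p0, float), lb, hb)
    c = cost(p)
    p_best, c_best = p.copy(), c
    T = T1
    while T > Ts:                              # temperature loop, cf. step 2
        for _ in range(n_iter):                # N iterations at temperature T
            cand = np.clip(p + T * (hb - lb) * rng.normal(size=p.size), lb, hb)
            c_cand = cost(cand)
            if c_cand < c or rng.random() < np.exp(-(c_cand - c) / T):
                p, c = cand, c_cand            # accept improvement or uphill move
            if c < c_best:
                p_best, c_best = p.copy(), c
        T *= nu_temp                           # cool down, cf. step 4
    return p_best, c_best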

2.2.1 Cost Function

To determine which parameter vector \bar{p} gives the best fit, the fit is evaluated against measured data points from the system whose parameters we want to estimate. For a parameter vector \bar{p}, the constrained model structure given by (2.1a)-(2.1e) is simulated and the outcome is a prediction \hat{y}(\bar{p})_i. The input u is known and is used when we simulate the model for different parameter vectors. The prediction error \epsilon(\bar{p})_i = y_i - \hat{y}(\bar{p})_i is used to measure how good the fit is between the measured data points and the simulated ones. From the measured output data points y_i and the simulated ones \hat{y}(\bar{p})_i, the cost function calculates the cost, e.g., by

V = \sum_{i=1}^{N} \frac{(y_i - \hat{y}(\bar{p})_i)^2}{\mathrm{std}(\bar{y})^2}, \qquad (2.2)

where std denotes the standard deviation. When the algorithm minimizes V in (2.2), we obtain estimates of parameter vectors that hopefully explain the output from the system fairly well. The length of the simulation is determined by the measured data points y_i, which have been collected before the experiment fitting. The time samples determine which predictions \hat{y}_i will be used in, e.g., the cost function (2.2).

There are also possibilities to take other things into consideration than just the prediction error normalized by the standard deviation. We can choose how the weight is distributed between the prediction error, \tilde{V}, and an ad-hoc extra cost, \bar{V}, by

V = \alpha \tilde{V} + \bar{V}, \qquad (2.3)

where \alpha determines the weight between the two terms. For instance, if the measured signal has an overshoot, all \bar{p} that do not reproduce this overshoot are given a larger cost in (2.3).

For models with more than one output, one choice of cost function is the trivial expansion of (2.2),

V = \sum_{i_1=1}^{N_1} \frac{(y_{i_1} - \hat{y}_{i_1})^2}{\mathrm{std}(y_1)^2} + \sum_{i_2=1}^{N_2} \frac{(y_{i_2} - \hat{y}_{i_2})^2}{\mathrm{std}(y_2)^2} + \ldots \qquad (2.4)

The model, or the constrained model structure, is simulated numerous times in the MSA algorithm to calculate the cost function, which the algorithm minimizes with respect to the parameter vector \bar{p}. In this thesis all models have been simulated either with Mathematica and MathModelica or with Matlab and SBTB. In the first case the model has been written in MathModelica and then transferred to Mathematica via the Mathematica Link.
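A minimal Python sketch of the normalized least-squares cost (2.2) and its multi-output expansion (2.4), assuming a user-supplied helper simulate(p) that returns one prediction array per output at the measurement times:

import numpy as np

def cost(p, simulate, y_meas):
    # Normalized sum of squared prediction errors, cf. (2.2) and (2.4).
    # y_meas: list of 1-D arrays, one per model output; simulate(p) is
    # assumed to return predictions with the same shapes.
    V = 0.0
    for y, yhat in zip(y_meas, simulate(p)):
        V += np.sum((y - yhat) ** 2) / np.std(y) ** 2   # one term per output
    return V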

Acceptable Parameters

The MSA algorithm is an optimization algorithm that searches for an optimal parameter vector \bar{p} that minimizes the cost. How can this be of any use regarding the issue of identifiability? The reason is thoroughly explained in Section 2.3, where the MOTA algorithm is presented. The MOTA algorithm takes an n × q matrix K = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_q] as input,

K = \begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,q} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,q} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n,1} & p_{n,2} & \cdots & p_{n,q}
\end{pmatrix}, \qquad (2.5)

where each row represents one estimate of the parameters. The parameters are represented by the columns; e.g., the element in the third row and fifth column is the third estimate of the fifth parameter of the model. Here lies the answer to the question above: the MSA algorithm is used to produce this matrix K, and this is done by searching for acceptable parameters. The acceptable parameters are all parameter combinations that have a cost near the cost of the best parameter vector so far, i.e., the one with the lowest cost so far. In other words, all parameter vectors that produce an acceptable cost are regarded as acceptable parameters. In this thesis we have used the threshold 110 percent of the best cost so far to determine whether a cost is near or not. For each cost function evaluation, the current parameter vector \bar{p} with its cost is taken as acceptable if the cost is near the best cost so far. These acceptable parameters are then taken as estimates of the parameters and are used to determine whether there exist any functional relationships between them.
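A sketch of the acceptance rule (the 110 percent threshold is the one used in this thesis; the surrounding bookkeeping is an assumption):

import numpy as np

def collect_acceptable(evaluations, threshold=1.1):
    # Keep every parameter vector whose cost is within `threshold` times the
    # best cost seen so far; the kept rows form the matrix K used by MOTA.
    # evaluations: iterable of (parameter_vector, cost) pairs in evaluation order.
    best = np.inf
    rows = []
    for p, c in evaluations:
        best = min(best, c)
        if c <= threshold * best:
            rows.append(np.asarray(p, float))
    return np.vstack(rows)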

2.2.2 Settings

The MSA algorithm can be controlled by a number of settings that affect the random search. This section explains what these settings are and what they do. The settings of the algorithm are presented in Table 2.1.

Table 2.1: Settings in the MSA algorithm

Temp-start: the temperature at which the algorithm starts
Temp-end: the temperature at which the search does its last iteration
Temp-factor: the factor of the cool-down process; a higher factor implies a faster cool-down after each temperature loop
Maxitertemp: the number of iterations for each temperature
Maxitertemp0: the number of iterations once temp-end has been reached
Max-time: the maximum time the search can proceed until termination
Tolx: a critical distance between the evaluated points in the optimization
Tolfun: a tolerance between the maximum and minimum function value
Maxrestartpoints: the number of possible restart points after each iteration in the temperature loop
Low bound: a low bound for the parameters which are optimized
High bound: a high bound for the parameters which are optimized

The algorithm consists mainly of two nested loops: one outer loop called the temperature loop and one inner loop that contains the main algorithm. The temperature loop runs as long as the temperature is above the critical stop-temperature setting. For each temperature, this loop constructs a simplex, a geometrical figure, for each restart point. From the restart point, which is a q-dimensional vector, the simplex is created by modifying the restart point element by element. This modification is either relative or absolute. In the current implementation the relative change is 1.25 times the element (a 25 percent increase of the value of the element), and if the relative change results in an element which is larger than the high bound (or lower than the low bound), then an absolute change of ±0.5 (the sign depends on the settings of the low and high bounds in the MSA) is done instead. The result is a geometrical figure which consists of q + 1 corners; in the two-dimensional case the simplex is a triangle. Let R = \bar{p} = [p_1, p_2] denote a restart point. The simplex,

simplex = \begin{pmatrix} p_1 & 1.25 p_1 & p_1 \\ p_2 & p_2 & 1.25 p_2 \end{pmatrix}, \qquad (2.6)

is constructed by copying the restart point and also modifying its elements. For example, if the modification is only relative, the simplex is the one shown in (2.6). The idea is to update the worst point (the column with the highest function value) in the simplex with a better one. In the inner loop the simplex is contracted, reflected, and/or expanded, and the new point that has been retrieved is compared with the rest of the points in the simplex. When the outer loop has iterated through all the restart points, the next restart points are calculated with clustering techniques that take all the evaluated points with their function values. Depending on a critical distance, the output is new restart points that will be used at the next temperature. More information can be found in Pettersson (2008).
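A sketch of the simplex construction around a restart point, following the relative/absolute rule described above (the sign logic of the ±0.5 fallback is simplified here):

import numpy as np

def build_simplex(p, lb, hb, rel=1.25, abs_step=0.5):
    # q+1 simplex corners around restart point p, cf. (2.6): corner 0 is p
    # itself; corner j+1 scales element j by `rel`, falling back to an
    # absolute +/-0.5 step when the scaled value leaves [lb, hb].
    p = np.asarray(p, float)
    corners = [p.copy()]
    for j in range(p.size):
        c = p.copy()
        c[j] *= rel
        if not (lb[j] <= c[j] <= hb[j]):     # relative step left the box
            c[j] = p[j] + (abs_step if p[j] + abs_step <= hb[j] else -abs_step)
        corners.append(c)
    return np.column_stack(corners)          # q x (q+1); columns are the corners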

2.2.3 Input

The algorithm needs two sorts of input. The first is a cost function, e.g. (2.2), that determines how the function evaluations are conducted for each parameter vector \bar{p}. The second is a start guess of the parameters, \bar{p}_0, from which the algorithm starts the search. This start guess is similar to the internal restart points, with the exception that it, in the current version, has to be a single vector (a single restart point).

2.2.4 Output

The original outputs from the algorithm are the best points in each valley together with their costs. However, in this thesis we are not solely interested in the optimal point: we want to get hold of several parameter estimates that can later be analyzed with respect to functional relationships between the parameters. These estimates, or acceptable parameters, form a matrix J that contains m estimates of q parameters. This matrix is then used, after some modification, in the MOTA algorithm, which is explained in the next section. Usually the matrix J contains a large number of estimates; due to the computational complexity, further discussed in Section 3.3, some problems would occur if we were to use J directly as input to MOTA.

2.3 Mean Optimal Transformation Approach

The Mean Optimal Transformation Approach (MOTA) was proposed by Hengl et al. (2007) and is a non-parametric, bootstrap-based identifiability testing algorithm. It uses optimal transformations that are estimated with the Alternating Conditional Expectation (ACE) algorithm (Breiman and Friedman, 1985). The MOTA algorithm finds linear and/or nonlinear relations between the parameters regardless of the model complexity or size. This functional relationship between the parameters is then mapped to the identifiability problem: a parameter that can be expressed in terms of other parameters is not identifiable, and vice versa.

2.3.1 Alternating Conditional Expectation

The Alternating Conditional Expectation (ACE) algorithm was developed by Breiman and Friedman (1985). It was first intended for regression analysis but has also been applied in several other fields; e.g., Wang and Murphy (2005) used it to identify nonlinear relationships. The algorithm estimates optimal transformations non-parametrically. In the bivariate case, the algorithm estimates the optimal transformations \hat{\Theta}(p_1) and \hat{\Phi}_1(p_2) which maximize the linear correlation R between \hat{\Theta}(p_1) and \hat{\Phi}_1(p_2),

\{\hat{\Theta}, \hat{\Phi}\}_{p_1, p_2} = \sup_{\tilde{\Theta}, \tilde{\Phi}} \left| R\big(\tilde{\Theta}(p_1), \tilde{\Phi}(p_2)\big) \right|. \qquad (2.7)

At the core of the algorithm is a simple iterative scheme that uses bivariate conditional expectations. When the conditional expectations are estimated from a finite data set, they are replaced by smoothing techniques (Breiman and Friedman, 1985).

The two-dimensional case can easily be extended to any size. Let K denote an n × m matrix where n is the number of estimates and m is the number of parameters. Suppose that the m parameters have an unknown functional relationship, and let \Theta and \Phi_j denote the true transformations between the parameters,

\Theta(p_i) = \sum_{j \neq i}^{m} \Phi_j(p_j) + \epsilon,

where \epsilon is normally distributed noise. The algorithm estimates optimal transformations \hat{\Theta}(p_i) and \hat{\Phi}_j(p_j), where j \neq i, such that

\hat{\Theta}(p_i) = \sum_{j \neq i}^{m} \hat{\Phi}_j(p_j), \qquad (2.8)

where optimal means in the sense of (2.7).

The ACE algorithm distinguishes between the left- and right-hand side terms. The left-hand term is denoted the response and the right-hand terms the predictors. The calculation of (2.8) is done iteratively by the algorithm: new estimates of the transformation of the response serve as input to new estimates of the transformations of the predictors, and vice versa.

The ACE algorithm is summarized in Algorithm 2. For further information about the algorithm see Breiman and Friedman (1985) and Hengl et al. (2007) and the references therein.

Algorithm 2: Alternating Conditional Expectation

ACE minimizes the unexplained variance between the response and the predictors. With e^2(\Theta, \Phi_1, \ldots, \Phi_p) = E\big[\Theta(Y) - \sum_{j=1}^{p} \Phi_j(X_j)\big]^2, the algorithm is the following (Breiman and Friedman, 1985):

1: Initiate \Theta(Y) = Y / \|Y\| and \Phi_1(X_1), \ldots, \Phi_p(X_p) = 0.
2: while e^2(\Theta, \Phi_1, \ldots, \Phi_p) decreases do
3:   while e^2(\Theta, \Phi_1, \ldots, \Phi_p) decreases do
4:     for k = 1 to p do: \Phi_{k,1}(X_k) = E\big[\Theta(Y) - \sum_{i \neq k} \Phi_i(X_i) \mid X_k\big]; replace \Phi_k(X_k) with \Phi_{k,1}(X_k). The conditional expectation is replaced by smoothing techniques (Breiman and Friedman, 1985).
5:     end for loop
6:   end inner while
7:   \Theta_1(Y) = E\big[\sum_i \Phi_i(X_i) \mid Y\big] / \big\| E\big[\sum_i \Phi_i(X_i) \mid Y\big] \big\|; replace \Theta(Y) with \Theta_1(Y).
8: end outer while
9: \Theta, \Phi_1, \ldots, \Phi_p are the solutions \Theta^*, \Phi_1^*, \ldots, \Phi_p^*.
10: return
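A compact numerical sketch of Algorithm 2 in Python, with a crude moving-average smoother standing in for the supersmoother used by Breiman and Friedman, and a fixed number of sweeps instead of the convergence tests:

import numpy as np

def smooth(x, z, window=11):
    # Stand-in for the conditional expectation E[z | x]: moving average of z
    # over the x-sorted samples, mapped back to the original sample order.
    order = np.argsort(x)
    sm = np.convolve(z[order], np.ones(window) / window, mode="same")
    out = np.empty_like(sm)
    out[order] = sm
    return out

def ace(y, X, n_sweeps=50):
    # Alternating Conditional Expectation on n samples: y is (n,), X is (n, p).
    theta = (y - y.mean()) / np.std(y)              # initiate Theta(Y), step 1
    phi = np.zeros_like(X, dtype=float)             # Phi_j(X_j) = 0
    for _ in range(n_sweeps):
        for k in range(X.shape[1]):                 # inner loop, step 4
            resid = theta - phi.sum(axis=1) + phi[:, k]
            phi[:, k] = smooth(X[:, k], resid)      # Phi_k = E[Theta - sum Phi_i | X_k]
        theta = smooth(y, phi.sum(axis=1))          # step 7 ...
        theta = (theta - theta.mean()) / np.std(theta)   # ... with normalization
    return theta, phi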

2.3.2 Input

The input to the MOTA algorithm is an n × q matrix K containing n estimates of the q parameters. This matrix is then analyzed with respect to functional relationships between the parameters. How the algorithm finds these relations is the topic of the next section.

2.3.3 The Idea Behind MOTA

Non-identifiability manifests itself as functionally related parameters. These relationships can be estimated by ACE, and the idea is to use these estimates to investigate the identifiability of the parameters. If there exist relationships between the parameters, the optimal transformations are quite stable from one sample to another, i.e., from a matrix K_1 to a new draw of the matrix K_2. If the first matrix K_1 renders one optimal transformation, then K_2 will render a similar one if there exists a relation between the parameters. If there is no functional relationship, these transformations will differ from sample to sample. This depends on the data smoother/filter applied by the ACE algorithm (Breiman and Friedman, 1985; Hengl et al., 2007), and it is what distinguishes the parameters that are linked with each other from the independent ones. The process of drawing new matrices K is replaced by bootstrapping. Bootstrapping is a re-sampling method that creates re-samples from a dataset; in this case the dataset is the input matrix K. The outcome of the bootstrapping is a new matrix created by random sampling with replacement from the matrix K. The bootstrapping speeds up the algorithm significantly. Each such matrix is denoted a single fitting sequence.
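A sketch of the resampling step, assuming K is the n × q matrix of estimates:

import numpy as np

def fitting_sequence(K, rng):
    # One bootstrap fitting sequence: rows of K resampled with replacement.
    # The default in this thesis (Section 3.1) draws half the number of estimates.
    n = K.shape[0]
    return K[rng.integers(0, n, size=n // 2)]

rng = np.random.default_rng(0)
K = rng.uniform(0, 5, size=(200, 4))   # toy 200 x 4 estimate matrix
K_boot = fitting_sequence(K, rng)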

2.3.4 Test-function

A well-behaved test-function is of greatest importance, and for robustness all estimated optimal transformations are ranked. The following definition is presented in Hengl et al. (2007).

Definition 1. Let \hat{\Phi}_k^i(p_{kr}) denote the value of the optimal transformation of parameter p_k at its r-th estimate (r-th row) in the i-th fitting sequence, and let card denote the cardinal number of a given set, i.e., the number of elements contained in the set. Then we define \alpha_k^i(p_{kr}) as the function which maps each parameter estimate of a certain fitting sequence onto its cardinality divided by the total number N of fits conducted within one fitting sequence,

\alpha_k^i(p_{kr}) = \frac{1}{N}\,\mathrm{card}\left(\left\{ \hat{\Phi}_k^i(p_{kr'}) \;\middle|\; r' \in \{1, \ldots, N\},\; \hat{\Phi}_k^i(p_{kr'}) \leq \hat{\Phi}_k^i(p_{kr}) \right\}\right).

This function is then used to calculate the average optimal transformation. Note that the values of \alpha_k^i(p_{kr}) lie in the range [0, 1]. Let M denote the number of fitting sequences. The test-function used is the following:

H_k := \widehat{\mathrm{var}}_r\left[ \frac{1}{M} \sum_{i=1}^{M} \alpha_k^i(p_{kr}) \right], \qquad (2.9)

where \widehat{\mathrm{var}} is the empirical variance, \widehat{\mathrm{var}} = \frac{1}{q-p}\sum(\ldots)^2. A motivation is needed here. For parameters that have a strong functional relationship, the average transformation

\bar{\alpha}_k(p_{kr}) = \frac{1}{M} \sum_{i=1}^{M} \alpha_k^i(p_{kr})

is independent of M, meaning the variance is constant. For parameters without any functional relationship, the function \alpha_k^i(p_{kr}) is not stable from one fitting sequence to another, so \alpha_k^i(p_{kr}) varies from one fitting sequence to another one. This implies that \bar{\alpha}_k(p_{kr}) \to 0.5 when M \to \infty, in other words zero variance. In the supplementary material of Hengl et al. (2007) the test-function is thoroughly explained. There it is shown that E[H_k] = \left(\frac{1}{12} - O(N^{-1/2})\right)\frac{1}{M} holds for parameters that are independent, while for parameters which have a functional relationship it holds that E[H_k] = \frac{1}{12}\left(1 - \frac{1}{N^2}\right).

When the test-function H_k (2.9) is low, this indicates that there is no functional relationship between the current response and the predictors. The other case, when H_k is not low, is more difficult. There are three threshold values, T_1, T_2 and T_3, that are used to determine whether there exists any functional relationship between the parameters or not. If the test-function falls below T_1, this is interpreted as the response parameter having no functional relationship with the predictors. If the test-function is between T_1 and T_2, there is not enough information to establish whether there exists a functional relationship between the parameters. When the test-function is above the threshold T_2, there is enough information to confirm that there exists a strong relation between the response and the predictors. The third threshold, T_3, concerns the performance of the algorithm and is not important for the functionality of MOTA. In Hengl et al. (2007) and Hengl (2007) these things are thoroughly explained.
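A sketch of the rank statistic of Definition 1 and the test-function (2.9), assuming phi holds the transformed estimates for one parameter, with one row per fitting sequence (ties are broken arbitrarily in this simplified version):

import numpy as np

def test_function(phi):
    # H_k of (2.9): empirical variance over r of the mean normalized rank.
    # phi: (M, N) array; phi[i, r] is the transformed r-th estimate in the
    # i-th fitting sequence.
    M, N = phi.shape
    ranks = np.argsort(np.argsort(phi, axis=1), axis=1) + 1   # ranks 1..N per row
    alpha = ranks / N                                          # Definition 1
    alpha_bar = alpha.mean(axis=0)                             # average over M sequences
    return alpha_bar.var(ddof=1)                               # empirical variance over r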

2.3.5 Output

MOTA determines which parameters are linked with each other. These relationships are one of the outputs from MOTA and can be seen in Table 2.2. The table has a simple structure: the rows represent which parameter is taken as the response parameter, and the columns represent the predictors. For example, the first row is when the parameter p1 is taken as the response. Due to the simple structure of the table, one can just as well show these relations in matrix form instead, as in (2.10). In this thesis the matrix in (2.10) is denoted the output q × q matrix S, where q denotes the number of parameters. The matrix contains only zeros and ones; in each row, ones indicate which parameters have a functional relationship. The matrix (2.10) indicates that the first parameter p1 is independent of the other parameters. This is also true for p5. The second row shows that when p2 is taken as the response, the MOTA algorithm finds that p2 is related to p2 (trivially) but also to p3 and p4. Rows three and four also display this.

Table 2.2: The parameter relations from MOTA

 *    p1  p2  p3  p4  p5
 p1    1   0   0   0   0
 p2    0   1   1   1   0
 p3    0   1   1   1   0
 p4    0   1   1   1   0
 p5    0   0   0   0   1

S = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix} \qquad (2.10)

This is an ideal output, a symmetric matrix. However, this is not the case for all matrices S. The reason is that not all parameters have equal contribution strength when taken as a predictor for a certain response. The less a parameter (a predictor) contributes to the response, the noisier the transformation \Phi_j(p_j) becomes; eventually the noise is so high that the algorithm cannot distinguish it from an independent parameter (Hengl et al., 2007). A simple example of this is a low gradient, p_1 = 0.001 p_2 + \epsilon: the transformation \Phi_2(p_2) will be noisy, and the algorithm will have difficulties concluding that the two parameters are functionally related. Uneven contribution strength can result in output matrices that have a non-symmetric shape. A problem arises from this: if the matrix is non-symmetric, is this due to uneven contribution strength of the parameters, or is the result incorrect for some parameters? MOTA also gives the r2-values,

r^2 = 1 - \frac{\sum \left(\Theta - \sum \Phi\right)^2}{\sum \left(\Theta - \mathrm{mean}(\Theta)\right)^2}, \qquad (2.11)

i.e., the fractional amount of variance of the response explained by the predictors, as output. This will be used in the next section when we analyze the output from MOTA. The coefficient of variation,

cv = \mathrm{std}(\bar{p}) / \mathrm{mean}(\bar{p}), \qquad (2.12)

will also be used, following the recommendation given by Hengl (2007). Another tool in the investigation of the identifiability of a model can be seen in Table 2.3. The table contains information about the parameter estimates and also information from MOTA from a single run. The first column of the table is the index, ix, of the response parameter. The table also contains the output matrix described above. Next are the r2-values, i.e., the fractional amount of variance of the response explained by the predictors; the larger the value (the maximum is one), the more variance is explained by the predictors. The cv-value is the coefficient of variation; note that the cv-value is not expressed as a percentage. The #-column is the number of special parameter combinations that have been found by the MOTA algorithm; in this case the output matrix contains two identical rows, the first and the third, which show that p1 and p3 are related, and this information is stored in that column. The last column shows the parameter combinations to which the response parameter ix has been found related. The difference between the r2-value of the second row and that of the fourth is that the test-function (2.9) was below threshold T1 when the parameter p4 was taken as response, but between thresholds T1 and T2 when parameter p2 was taken as response. When the test-function drops below threshold T1, the r2-value is never calculated, which the zero value in row 4 shows. When the test-function has a value between T1 and T2, there is not enough information to establish whether there exists a functional relationship between the parameters; the r2-value is then calculated and the algorithm does another loop with more predictors. If the test-function does not reach T2, the result is the one shown in row 2 of Table 2.3: the output matrix shows that parameter p2 is not linked with any other parameters, but the r2-value is nonzero.

In Hengl (2007) there are recommendations on how to interpret the output from MOTA: functional relations with an r2-value greater than 0.9 and a cv-value greater than 0.1 are recommended. If a functional relation has been found more than once, this is a strong indication that the parameters are linked.

Table 2.3: Output from a MOTA run and properties of the estimates

 ix  p1 p2 p3 p4    r2      cv     #  pars
 1    1  0  1  0  0.9936  0.5836   2  p1, p3
 2    0  1  0  0  0.5203  0.6592   1  p2
 3    1  0  1  0  0.9936  0.8413   2  p1, p3
 4    0  0  0  1  0.0000  0.6263   1  p4
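A sketch of the two diagnostics (2.11) and (2.12), assuming theta and phi come from an ACE fit (as in the sketch in Section 2.3.1) and p_est is the column of estimates for one parameter:

import numpy as np

def r_squared(theta, phi):
    # Fraction of the response variance explained by the predictors, cf. (2.11).
    resid = theta - phi.sum(axis=1)
    return 1.0 - np.sum(resid ** 2) / np.sum((theta - theta.mean()) ** 2)

def coeff_of_variation(p_est):
    # cv of (2.12) for one parameter's estimates (note: not a percentage).
    return np.std(p_est) / np.mean(p_est)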

Accumulated Output Matrix

Due to the bootstrap-based technique used to speed up MOTA, the output can vary from one run to another. The number of estimates from MSA also affects the output matrix. In this thesis we have taken this into account, and we use several MOTA runs in the hope that the result will be more robust. An accumulated output matrix is an ordinary output matrix whose elements have been summed up, accumulated, over numerous MOTA runs. An example of an accumulated output matrix is

S_{100}^{200} = \begin{pmatrix}
100 & 0 & 100 & 0 \\
0 & 100 & 0 & 0 \\
100 & 0 & 100 & 0 \\
0 & 0 & 0 & 100
\end{pmatrix}, \qquad (2.13)

where the lower index stands for how many times we have run the MOTA algorithm and the upper index for how many estimates serve as input to the algorithm. In this case MOTA has been run 100 times (lower index) and each run has been conducted with 200 estimates (upper index) of each parameter. The elements of the matrix can be as large as the number of runs of MOTA, since the ordinary output matrix only contains ones and zeros. From the matrix (2.13) one can conclude that parameters p1 and p3 seem to have some relation with each other.

2.4 Observability

The observability of a system S is determined by the state variables and their impact on the output. If a state variable has no effect on the output, directly or indirectly through other state variables, the system is unobservable. A definition is given in Ljung and Glad (2006).

Definition 2. A state vector x^* \neq 0 is said to be unobservable if the output is identically zero when the initial value is x^* and the input is identically zero. The system S is said to be observable if it lacks unobservable state vectors.

Observability is related to identifiability. Parameters can be considered as state variables with time derivative zero. By doing so the identifiability of the parameters is mapped to the observability rank test. This observability rank test is performed by rank calculation of the Jacobian (Anguelova, 2007).

For a linear system

\dot{x} = Ax + Bu, \qquad (2.14a)
y = Cx + Du, \qquad (2.14b)

the observability can be determined by the well-known observability matrix

\mathcal{O}(A, C) = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{pmatrix}, \qquad (2.15)

where n denotes the number of state variables. The unobservable states form the null-space of \mathcal{O} (Ljung and Glad, 2006). There is an equivalent test, the PBH test: the system (2.14) is observable if and only if

\begin{pmatrix} A - \lambda I \\ C \end{pmatrix}

has full rank for all \lambda (Kailath, 1980). When the parameters are considered as state variables, the model often becomes nonlinear, and therefore a tool that can determine the observability of nonlinear systems is needed. One such tool, the SOT, is presented in the next section. For more information about the observability rank test, see Anguelova (2007).
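A sketch of the linear tests in Python: the observability matrix (2.15) with a rank check, and the PBH condition evaluated at a single \lambda (in practice only the eigenvalues of A need to be checked, since the rank can only drop there):

import numpy as np

def observability_matrix(A, C):
    # Stack C, CA, ..., CA^(n-1) as in (2.15).
    n = A.shape[0]
    blocks, M = [], C.copy()
    for _ in range(n):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

A = np.array([[0.0, 1.0], [0.0, 0.0]])          # double integrator
C = np.array([[1.0, 0.0]])                      # position is measured
O = observability_matrix(A, C)
print(np.linalg.matrix_rank(O) == A.shape[0])   # True: (A, C) is observable

lam = 0.0                                        # the (repeated) eigenvalue of A
pbh = np.vstack([A - lam * np.eye(2), C])        # PBH block [A - lam*I; C]
print(np.linalg.matrix_rank(pbh) == 2)           # True: full rank at lam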

2.5 Sedoglavic's Method

Besides the Mean Optimal Transformation Approach (MOTA), which was presented in Section 2.3, the other main algorithm used in this thesis is Sedoglavic's Observability Test (SOT) (Sedoglavic, 2002). SOT is an algorithm that calculates the identifiability of a system in polynomial time. A Maple implementation of the algorithm can be retrieved from Sedoglavic's homepage (Sedoglavic, 2009). SOT is an a priori (local) observability algorithm that focuses only on the model structure. Due to its polynomial-time properties, this algorithm is of interest when investigating the identifiability of fairly large models. With the default settings, the SOT is quicker than MOTA: with a 14 × 233 input matrix K, a single run of MOTA takes around an hour, whereas the a priori identifiability analysis with the SOT takes a couple of seconds (Intel Celeron M processor 410, 1.46 GHz). In this thesis the algorithm has only been used in its current Maple form.

2.5.1 A Quick Introduction

The algorithm is based on differential algebra and it is related to the power series approach of Pohjanpalo (1978). In this section we will present the main result from Sedoglavic (2002).

Let \Sigma denote an algebraic system of the following form,

\Sigma: \quad \dot{\bar{p}} = 0, \quad \dot{\bar{x}} = f(\bar{x}, \bar{p}, \bar{u}), \quad \bar{y} = g(\bar{x}, \bar{p}, \bar{u}), \qquad (2.16)

and assume that there are l parameters p = [p_1, p_2, \ldots, p_l], n state variables x = [x_1, x_2, \ldots, x_n], m output variables y = [y_1, y_2, \ldots, y_m] and r input variables u = [u_1, u_2, \ldots, u_r]. Let the system \Sigma also be represented by a straight-line program which requires L arithmetic operations; e.g., the expression e = (x + 1)^3 is represented as the instructions t_1 := x + 1, t_2 := t_1^2, t_3 := t_2 t_1, so that L = 3. The following theorem is the main result (Sedoglavic, 2002).

Theorem 2.1. Let \Sigma be a differential system as described in (2.16). There exists a probabilistic algorithm which determines the set of observable variables of \Sigma and gives the number of unobservable variables which should be assumed to be known in order to obtain an observable system.

The arithmetic complexity of this algorithm is bounded by

O\big( M(\nu)\,( N(n + l) + (n + m)L ) + m\,\nu\,N(n + l) \big),

with \nu \leq n + l and with M(\nu) (resp. N(\nu)) the cost of power series multiplication at order \nu + 1 (resp. of \nu \times \nu matrix multiplication).

Let \mu be a positive integer, let D = 4(n + l)^2 (n + m) d, and let D' = \big( 2 \ln(n + l + r + 1) + \ln(\mu D) \big) D + 4 (n + l)^2 \big( (n + m) h + \ln(2nD) \big).

If the computations are done modulo a prime number p > 2D\mu, then the probability of a correct answer is at least (1 - 1/\mu)^2.

The detected observable variables are observable for certain; it is the unobservable variables that are unobservable with high probability. If we choose \mu = 3000, the probability of a correct answer is at least 0.9993 and the modulus is 10859887151 (Sedoglavic, 2002). As pointed out earlier, the algorithm is fairly complicated, and more information can be found in Sedoglavic (2002) and Sedoglavic (2009).


2.5.2 Usage

This section describes how to use the SOT, which is written in Maple. For more information about the usage and the syntax of the algorithm, see Sedoglavic (2009). Given an algebraic system \Sigma (2.16), we write down the equations as in the following example.

Example 2.1: Calling Observability Test in Maple

f := [x*(a-b*x)-c*x];
x := [x];
g := [x];
p := [a,b,c];
u := [];
observabilityTest(f,x,g,p,u);

f a list of algebraic expressions representing a vector field.

x a list of names such that diff(x[i],t) = f[i].

g a list of algebraic expressions representing outputs.

p a list of the names of the parameters.

u a list of the names of the inputs.

All parameters are regarded as states with \dot{p} = 0, and the algorithm tests whether the states are a priori observable. If a parameter, regarded as a state variable, is a priori observable, then the parameter is a priori identifiable, as discussed in Section 2.4. The output of the algorithm is a vector that contains information about which parameters/states are observable, which parameters/states are unobservable, and also the transcendence degree: how many of the unobservable variables need to be assumed known for the system to become observable/identifiable.
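To illustrate the idea of treating parameters as extra states and applying the observability rank test of Section 2.4, here is a sketch in Python with sympy for the model of Example 2.1. This is the generic rank test via Lie derivatives, not Sedoglavic's algorithm; the random rational specialization only mimics its probabilistic flavor:

import sympy as sp

x, a, b, c = sp.symbols("x a b c")
z = sp.Matrix([x, a, b, c])                  # extended state: x plus the parameters
f = sp.Matrix([x*(a - b*x) - c*x, 0, 0, 0])  # dynamics; parameters have derivative 0
y = x                                        # output g = x

# Successive Lie derivatives of the output along the extended vector field.
lies = [y]
for _ in range(len(z) - 1):
    lies.append(sp.Matrix([lies[-1]]).jacobian(z).dot(f))

J = sp.Matrix(lies).jacobian(z)              # observability Jacobian
# Generic rank via a random rational point (cheap, in the spirit of the SOT).
point = {s: sp.Rational(v) for s, v in zip(z, (2, 3, 5, 7))}
print(J.subs(point).rank())                  # 3 < 4: a and c enter only via a - c

The rank deficit of one matches a transcendence degree of one: fixing, e.g., c makes the remaining variables observable.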

2.5.3 Drawbacks of the Algorithm

The SOT implementation is a pilot implementation and therefore contains some defects: the variable t must be unassigned; the list x of the names of the state variables has to be ordered such that diff(x[i],t) = f[i] represents the vector field associated with the model; and a division by zero can occur if the chosen initial conditions cancel the separant. Some functions cannot be handled. For example, the use of a square root implies that we work in an algebraic extension of a finite field; some variables and some equations have to be added in order to handle this case. The implementation is efficient for one output. If there are many outputs and the computation time is too long, then some tests can be done in the main loop and some useless computations avoided (Sedoglavic, 2009).


Chapter 3

Results

In this chapter the results are presented. First, the algorithms that have been used are verified. This is done with the help of examples for which the identifiability/observability properties are already known: the two algorithms have been applied to these examples and the outputs compared with the correct ones. Furthermore, a comparison between the MOTA algorithm written in Matlab and in Mathematica is presented. Finally, the algorithms MOTA and SOT have been applied to three models of different complexity and size.

3.1 Verifying MOTA

In the article by Hengl et al. (2007) there are two examples that are used to illustrate and demonstrate MOTA. Those examples have been reused, and the results are presented below. In the first example the amount of Gaussian noise is not stated in Hengl et al. (2007), and in this thesis we have used \epsilon \in N(0, 0.1). The second example is exactly the one that Hengl et al. (2007) used. All MOTA runs have the default settings, which are: T1 = 0.01, T2 = 0.07, T3 = 0.08; the number of bootstrap samples drawn from the input matrix K is set to half of the number of estimates of the matrix; and the number of bootstraps, or fitting sequences, is set to 35 in the algorithm.

3.1.1 Example 1

The first example contains four parameters. The parameters p2, p3 and p4 have a uniform distribution on the interval I = [0, 5]. The parameter p1 depends on two of the other parameters according to p_1 = p_2^2 + \sin(p_3) + \epsilon, where \epsilon \in N(0, 0.1). To verify the MOTA algorithm we will draw 100 and 200 estimates independently of each other and compare the results. The correct solution is that the first three parameters p1, p2 and p3 are functionally related and that p4 lacks any relationship with the other parameters. In other words, the parameter relations are as shown in Table 3.1 or in the output matrix form (3.1):


Table 3.1: Relationships between the parameters

 *    p1  p2  p3  p4
 p1    1   1   1   0
 p2    1   1   1   0
 p3    1   1   1   0
 p4    0   0   0   1

S = \begin{pmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \qquad (3.1)

From now on we will prefer the latter representation when we show the relationships between the parameters. As already explained, a symmetric matrix is not always obtained, due to the different contribution strengths from the predictors to the response parameter (Hengl et al., 2007; Hengl, 2007). We will also become aware that the number of estimates taken as input to MOTA affects the outcome considerably.
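A sketch of how such a synthetic estimate matrix K can be generated (reading the 0.1 in \epsilon \in N(0, 0.1) as the standard deviation, which is an assumption):

import numpy as np

def example1_estimates(n, seed=0):
    # n synthetic "estimates" for Example 1: p2, p3, p4 ~ U(0, 5) and
    # p1 = p2**2 + sin(p3) + eps, eps ~ N(0, 0.1) (0.1 read as std. dev.).
    rng = np.random.default_rng(seed)
    p2, p3, p4 = rng.uniform(0, 5, size=(3, n))
    p1 = p2 ** 2 + np.sin(p3) + rng.normal(0.0, 0.1, size=n)
    return np.column_stack([p1, p2, p3, p4])    # the n x 4 matrix K

K100 = example1_estimates(100)
K200 = example1_estimates(200, seed=1)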

100 estimates

When 100 estimates are drawn, the accumulated output matrix, described in Section 2.3.5, of 100 MOTA runs has the following composition (100 runs and 100 estimates):

S_{100}^{100} = \begin{pmatrix}
100 & 0 & 100 & 0 \\
0 & 100 & 0 & 0 \\
100 & 0 & 100 & 0 \\
0 & 0 & 0 & 100
\end{pmatrix}. \qquad (3.2)

If one only looked at this matrix, one would assume that p2 and p4 are independent and that a functional relationship exists only between p1 and p3. The reason for this faulty behavior of the MOTA algorithm is that only 100 estimates are drawn in this case; more estimates have to be drawn due to the low contribution strength. The number of fits required by the algorithm depends on the functional relations of the parameters (Hengl et al., 2007).

In Table 3.2, which is the outcome from a single MOTA run, we can see that the parameters p1 and p3 are functionally related and also meet the recommendations from Hengl (2007), r2 ≥ 0.9 and cv ≥ 0.1. However, since the number of fits is too low, the algorithm does not reveal that the parameter p2 is also related to p1 and p3.

Table 3.2: Output from a MOTA run and properties of the estimates from Example 1, 100 estimates

 ix  p1 p2 p3 p4    r2      cv     #  pars
 1    1  0  1  0  0.9936  0.5836   2  p1, p3
 2    0  1  0  0  0.5203  0.6592   1  p2
 3    1  0  1  0  0.9936  0.8413   2  p1, p3
 4    0  0  0  1  0.0000  0.6263   1  p4

200 estimates

In this experiment, 200 estimates are drawn instead of the 100 estimates of the previous experiment. The accumulated output matrix from 100 MOTA runs is shown below (100 runs and 200 estimates):

S_{100}^{200} = \begin{pmatrix}
100 & 99 & 100 & 0 \\
0 & 100 & 0 & 0 \\
100 & 99 & 100 & 0 \\
0 & 0 & 0 & 100
\end{pmatrix}. \qquad (3.3)

The second row of S_{100}^{200} is the same as in S_{100}^{100}. If one only looked at this row, one could assume that parameter p2 lacks any relationship with the other parameters. However, rows one and three indicate that the parameters p1, p2 and p3 have a strong functional relationship (the correct conclusion). This situation, where some of the rows contradict each other, is a common result from MOTA. One reason is the contribution strength; another is the bootstrap technique, which introduces some random behavior into the MOTA algorithm. A third reason, as mentioned previously, is the number of estimates taken as input. It can be a bit tricky to choose how many estimates are needed for a specific run, since the underlying relationship between the parameters is in general not known. The problem is that the more estimates that are used as input, the longer the algorithm takes to calculate the relations between the parameters. However, if the number of estimates is too low, as seen above when 100 estimates are drawn, the output matrix can differ considerably from the correct one.

In Table 3.3, which is from a single MOTA run, one can see that the parameters p1, p2 and p3 are related. Rows 1 and 3 show the same result, that all of them are functionally related, and because of the nonzero r2-value in the second row it is likely that p2 is related to the other parameters. The test-function, when parameter p2 is taken as response, is between the first and second thresholds, which indicates that the algorithm cannot conclude whether there exists any relationship between the parameters in that case. However, since the test-function does not drop below threshold 1, and there are other rows showing that p2 is functionally related to p1 and p3, the parameters can be considered linked.


Table 3.3: Output from a MOTA run and properties of the estimates from Example 1, 200 estimates

    ix  p1  p2  p3  p4    r2      cv     #  pars
     1   1   1   1   0  0.9992  0.5590   2  p1, p2, p3
     2   0   1   0   0  0.4728  0.6062   1  p2
     3   1   1   1   0  0.9992  0.8621   2  p1, p2, p3
     4   0   0   0   1  0.0000  0.5788   1  p4

3.1.2 Example 2

The second example, also taken from Hengl et al. (2007), contains seven parameters which are related as

\[
\begin{aligned}
p_1 &= -p_2 + 10 \\
p_3 &= \frac{5}{p_4 p_5} \\
p_6 &= \eta \\
p_7 &= 0.1,
\end{aligned} \tag{3.4}
\]

where p2, p4, p5 and η are all uniformly distributed, drawn independently from the interval I = [0, 5]. The input to the MOTA algorithm is a matrix $K = [\bar{p}_1, \ldots, \bar{p}_7]$ and the output matrix should look something like this

\[
S = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}. \tag{3.5}
\]

Two draws, one consisting of 100 estimates and the other one of 200 estimates, are presented below.
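As a minimal sketch, assuming NumPy and freely chosen variable names, the estimate matrix K for this example can be generated by drawing the free parameters uniformly from [0, 5] and building the rest from (3.4):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                          # number of estimates to draw

# Free parameters, drawn independently from I = [0, 5].
p2  = rng.uniform(0.0, 5.0, n)
p4  = rng.uniform(0.0, 5.0, n)
p5  = rng.uniform(0.0, 5.0, n)
eta = rng.uniform(0.0, 5.0, n)

# Dependent parameters, created from (3.4).
p1 = -p2 + 10.0
p3 = 5.0 / (p4 * p5)
p6 = eta
p7 = np.full(n, 0.1)

# The n x 7 input matrix K = [p1, ..., p7] for the MOTA algorithm.
K = np.column_stack([p1, p2, p3, p4, p5, p6, p7])
```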

100 estimates

100 estimates of p2, p4, p5 and η are drawn from the interval I = [0, 5] and the other parameters are created from (3.4). The MOTA algorithm is run 100 times, resulting in the accumulated output matrix

\[
S_{100}^{100} = \begin{pmatrix}
100 & 100 & 0 & 0 & 0 & 0 & 0 \\
100 & 100 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 100 & 100 & 100 & 0 & 0 \\
0 & 0 & 100 & 100 & 100 & 0 & 0 \\
0 & 0 & 100 & 100 & 100 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 100 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 100
\end{pmatrix}. \tag{3.6}
\]


This is clearly the correct output for each run. Due to the strong contribution strength, the 100 × 7 input matrix K is sufficient to reveal these relationships between the parameters. The parameters p1 and p2 have a connection, p3, p4 and p5 show a functional relationship, and p6 and p7 are independent of all other parameters. From Table 3.4 one can see that MOTA finds the relationships between the parameters and there is no ambiguity about which parameters are linked and which are independent. The high r2-values for ix ∈ [1, 5] indicate that the predictors explain the variance very well. The independent parameters p6 and p7 are identified fairly fast, which can be seen in the zero r2-values, indicating that the test-function drops below threshold 1 in its first iteration. During the first iteration, if the predictor that gives the largest value of the test-function is not good enough, the test-function is below T1 and the algorithm concludes that the response parameter is independent.

Table 3.4: Output from a MOTA run and properties of the estimates from Example 2, 100 estimates

    ix  p1  p2  p3  p4  p5  p6  p7    r2      cv     #  pars
     1   1   1   0   0   0   0   0  1.0000  0.1924   2  p1, p2
     2   1   1   0   0   0   0   0  1.0000  0.6134   2  p1, p2
     3   0   0   1   1   1   0   0  0.9832  2.4188   3  p3, p4, p5
     4   0   0   1   1   1   0   0  0.9778  0.6452   3  p3, p4, p5
     5   0   0   1   1   1   0   0  0.9692  0.5753   3  p3, p4, p5
     6   0   0   0   0   0   1   0  0.0000  0.5999   1  p6
     7   0   0   0   0   0   0   1  0.0000  0.0000   1  p7

200 estimates

When 200 estimates are drawn for the 200 × 7 matrix K, the result is the same, as it should be, as in the 100-estimate case. The accumulated output matrix is shown below,

\[
S_{100}^{200} = \begin{pmatrix}
100 & 100 & 0 & 0 & 0 & 0 & 0 \\
100 & 100 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 100 & 100 & 100 & 0 & 0 \\
0 & 0 & 100 & 100 & 100 & 0 & 0 \\
0 & 0 & 100 & 100 & 100 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 100 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 100
\end{pmatrix}. \tag{3.7}
\]

The number of estimates, or in the general case acceptable parameters, that the MOTA algorithm needs to behave well obviously depends on the structure of the underlying system. In this example a change from 100 to 200 estimates does not change the behavior of the algorithm, as it does for the first example. If one suspects that the number of estimates is too few, then a re-run of the algorithm with more estimates is a good idea, as sketched below.
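One simple heuristic for judging such a re-run, which is our own suggestion rather than part of MOTA, is to binarize the two accumulated output matrices and check whether the detected relations agree:

```python
import numpy as np

def relations_stable(S_small, S_large, n_runs, frac=0.95):
    """Compare two accumulated output matrices (e.g. for 100 and 200
    estimates).  An entry counts as a detected relation if it was
    found in at least frac of the n_runs runs; True means that
    increasing the number of estimates left the relations unchanged."""
    B_small = np.asarray(S_small) / n_runs >= frac
    B_large = np.asarray(S_large) / n_runs >= frac
    return np.array_equal(B_small, B_large)
```

For Example 1 the matrices (3.2) and (3.3) disagree in rows one and three, so more estimates are needed; for Example 2 the matrices (3.6) and (3.7) agree.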


In Table 3.5 there is, however, one change. The r2-value when ix = 6 is no longer zero. The algorithm has problems determining that parameter p6 is independent of all other parameters in this case. During the first iteration the test-function is between T1 and T2 and the r2-value is calculated. However, there are no other rows that indicate that p6 would be linked to any other parameter. Even if the algorithm gives a nonzero r2-value, this does not automatically mean that the parameter has an undiscovered functional relationship.

Table 3.5: Output from a MOTA run and properties of the estimates from Example 2, 200 estimates

    ix  p1  p2  p3  p4  p5  p6  p7    r2      cv     #  pars
     1   1   1   0   0   0   0   0  1.0000  0.2045   2  p1, p2
     2   1   1   0   0   0   0   0  1.0000  0.5690   2  p1, p2
     3   0   0   1   1   1   0   0  0.9934  2.9393   3  p3, p4, p5
     4   0   0   1   1   1   0   0  0.9885  0.4898   3  p3, p4, p5
     5   0   0   1   1   1   0   0  0.9894  0.5615   3  p3, p4, p5
     6   0   0   0   0   0   1   0  0.6531  0.6175   1  p6
     7   0   0   0   0   0   0   1  0.0000  0.0000   1  p7

3.2 Verifying Sedoglavic Observability Test

To verify SOT (Sedoglavic, 2002, 2009) the algorithm has been tested on linear and nonlinear models. The properties of these models are known and the output from SOT can therefore be verified. Most of the examples have been taken from Ljung and Glad (1994).

3.2.1 A Linear Model

The linear model

\[
\begin{aligned}
\dot{x}_1 &= -x_1 \\
\dot{x}_2 &= -2x_2 \\
\dot{x}_3 &= -3x_3 \\
y &= 2x_1 + x_2
\end{aligned}
\]

has been taken from Ljung and Glad (2006) page 174. The observability of the equations above can be analyzed with the observability matrix, (2.15). With

\[
A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -3 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 2 & 1 & 0 \end{pmatrix}
\]


the observability matrix is

\[
\mathcal{O}(A, C) = \begin{pmatrix} 2 & 1 & 0 \\ -2 & -2 & 0 \\ 2 & 4 & 0 \end{pmatrix}.
\]

Since the third column consists only of zeros, one can conclude that the state x3 is unobservable. SOT applied to the model equations above gives that the third state is unobservable, which is correct.
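As a quick numerical cross-check, assuming the standard construction of the observability matrix from the stacked rows C, CA and CA², a short Python sketch reproduces this conclusion:

```python
import numpy as np

A = np.diag([-1.0, -2.0, -3.0])
C = np.array([[2.0, 1.0, 0.0]])

# Observability matrix with rows C, CA, CA^2.
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(3)])

print(O)                         # the third column is all zeros
print(np.linalg.matrix_rank(O))  # rank 2 < 3, so x3 is unobservable
```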

3.2.2 Goodwin’s Napkin Example

Goodwin’s Napkin example, from Ljung and Glad (1994) page 9, is described by

\[
\ddot{y} + 2\theta\dot{y} + \theta^2 y = 0. \tag{3.8}
\]

In its current form one cannot use SOT; however, with the help of the following transformations,

\[
\begin{aligned}
x_1 &= y \quad&\Rightarrow\quad \dot{x}_1 &= \dot{y} = x_2 \\
x_2 &= \dot{y} \quad&\Rightarrow\quad \dot{x}_2 &= \ddot{y} = -2\theta\dot{y} - \theta^2 y = -2\theta x_2 - \theta^2 x_1,
\end{aligned}
\]

one can use SOT. When observing y, the output of the system, SOT gives that all parameters and states are observable. This is also the conclusion made by Ljung and Glad (1994) page 9.
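The same conclusion can be reproduced with a generic observability rank test, treating θ as an extra state with zero derivative and differentiating the output symbolically. The sketch below, using SymPy, is our own illustration of this idea and not the actual SOT implementation:

```python
import sympy as sp

x1, x2, th = sp.symbols('x1 x2 theta')
z = sp.Matrix([x1, x2, th])                  # extended state, theta' = 0
f = sp.Matrix([x2, -2*th*x2 - th**2*x1, 0])  # state-space form of (3.8)
h = x1                                       # observed output y

# Successive Lie derivatives of h along f.
L = [h]
for _ in range(2):
    L.append((sp.Matrix([L[-1]]).jacobian(z) * f)[0])

# Generic rank of the Jacobian of (h, L_f h, L_f^2 h).
print(sp.Matrix(L).jacobian(z).rank())       # 3: x1, x2 and theta observable
```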

3.2.3 A Nonlinear Model

In Ljung and Glad (1994), page 9, a simple nonlinear model is presented. The model is described by the following equations

\[
\begin{aligned}
\dot{x}_1 &= \theta x_2^2 \\
\dot{x}_2 &= u \\
y &= x_1.
\end{aligned}
\]

With the algorithm presented in Ljung and Glad (1994) the output is that the system is a priori identifiable. SOT gives the same result.
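For completeness, the same extended-state rank test as above, with the input treated as a constant symbol, supports this conclusion; again, this is an illustrative sketch and not the algorithm from the paper:

```python
import sympy as sp

x1, x2, th, u = sp.symbols('x1 x2 theta u')
z = sp.Matrix([x1, x2, th])        # extended state, theta' = 0
f = sp.Matrix([th*x2**2, u, 0])    # model dynamics for a constant input u
h = x1                             # observed output y

L = [h]
for _ in range(2):
    L.append((sp.Matrix([L[-1]]).jacobian(z) * f)[0])

print(sp.Matrix(L).jacobian(z).rank())  # 3: states and theta identifiable
```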

3.2.4 Compartmental Model

A compartmental model is given as Example 4 in Ljung and Glad (1994), page 10. The model equations are

\[
\dot{x}(t) = \frac{V_m x(t)}{k_m + x(t)} - k_{01} x(t), \qquad x(0) = D
\]
