
Numerical Algorithms for Optimization Problems in Genetical Analysis

To my family.


Contents

List of papers
Acknowledgments
Abstract
Sammanfattning på svenska

I Introduction
    This Thesis

1 QTL Models and Numerical Models
    1.1 QTL Analysis
        1.1.1 Genetic Distance
        1.1.2 QTL Mapping using Linear Regression and Maximum Likelihood Models
        1.1.3 ML and REML
    1.2 The Least Squares Method for QTL Mapping
        1.2.1 Local optimization
    1.3 The Restricted Maximum Likelihood Method for QTL Mapping
        1.3.1 Models with a Single QTL
        1.3.2 Local and global searches
        1.3.3 Models with Forward Selection for an Additional QTL and Interaction Effects
        1.3.4 The AI-REML Method for Log-Likelihood Maximization

2 Numerical Optimization
    2.1 Algorithms
    2.2 Global and Local Optimization
    2.3 Local Optimization Methods
        2.3.1 Unconstrained Newton-Based Optimization Methods
        2.3.2 The Steepest Descent Method
        2.3.3 Quasi–Newton Methods

    2.4 Methods for Constrained Optimization
        2.4.1 The Active Set Method
    2.5 Downhill Simplex method

3 Numerical Linear Algebra Algorithms for Variance Component Estimation

4 Summary of the papers
    4.1 Paper A
    4.2 Paper B
    4.3 Paper C
    4.4 Paper D
    4.5 Paper E
    4.6 Paper F
    4.7 Appendix I: Paper G
    4.8 Appendix II: Paper H

5 Summary and Future work

Bibliography

II Papers

List of papers

This thesis contains the following eight papers:

Paper A Ljungberg, K., Mishchenko, K. and Holmgren, S. Efficient Algorithms for Multi-Dimensional Global Optimization in Genetic Mapping of Complex Traits. Journal of Computational Methods in Sciences and Engineering (Submitted). Technical Report 2005-035, Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden.

Paper B Mishchenko, K. Local Optimization for Genetic Mapping of Multiple Quantitative Trait Loci. Journal of Optimization and Engineering (Submitted). Research Report 2008-3, UKK, Mälardalen University, Sweden.

Paper C Mishchenko, K., Holmgren, S. and Rönnegård, L. Efficient Implementation of the AI-REML Iteration for Variance Component QTL Analysis. Computational Statistics and Data Analysis (Submitted). Research Report 2007-4, UKK, Mälardalen University, Sweden.

Paper D Mishchenko, K. and Neytcheva, M. New Algorithms for Evaluating the Log-likelihood Function Derivatives in the AI-REML Method. Communications in Statistics (Submitted). Research Report 2008-2, UKK, Mälardalen University, Sweden.

Paper E Mishchenko, K., Holmgren, S. and Rönnegård, L. Newton-type Methods for REML Estimation in Genetic Analysis of Quantitative Traits. Journal of Computational Methods in Sciences and Engineering, 7 (2007) (In Press). Research Report 2008-1, UKK, Mälardalen University, Sweden.

Paper F Mishchenko, K., Rönnegård, L., Holmgren, S. and Mishchenko, V. Assessing a Multiple QTL Search Using the Variance Component Model. Manuscript (2008).

Appendix I: Paper G Mishchenko, K., Mishchenko, V. and Malyarenko, A. Adapted Downhill Simplex Method for Pricing Convertible Bonds. Theory of Stochastic Processes, 14(30): 130–147 (2008).

Appendix II: Paper H Rönnegård, L., Mishchenko, K., Holmgren, S. and Carlborg, Ö. Increasing the Efficiency of Variance Component Quantitative Trait Loci Analysis by Using Reduced-Rank Identity-by-Descent Matrices. Genetics, 176: 1935–1938 (2007).

Parts of this thesis have been presented at the following conferences:

1. The 8th SIAM Conference on Optimization. Stockholm, May 2005.
2. The 4th Annual Meeting of the Complex Trait Consortium. Groningen, the Netherlands, June 2005.
3. The 15th International Workshop on Matrices and Statistics. Uppsala, Sweden, June 2006.
4. The International Conference on Non-convex Programming (NCP07). Rouen, France, December 2007.

Acknowledgements

During the years of my PhD studies a lot of things have happened. I have met a lot of new people, learned a lot of new things, and gained a lot of new experience while working and communicating in different fields. In my private life a lot of changes have also occurred. As a result, I have changed a lot, too. This period was not easy, especially the first two years. Nevertheless, I am happy to have had such an experience, because I have learned a very important lesson: whatever happens, you will always get help, even in unpredictable ways. There are many people I am greatly obliged to.

Firstly, I would like to thank my supervisor Sverker Holmgren, who has given me the possibility to participate in such an interesting scientific project. During my PhD studies I have had the freedom to choose the direction of my research. At the same time I knew that I would get support, guidance and help in scientific writing from Sverker. He always helped me to find easy solutions to problems which seemed rather difficult to me.

I really enjoyed all the time spent working in collaboration with my co-supervisor, Lars Rönnegård, who gave me an excellent introduction to the area of QTL analysis from both genetic and statistical perspectives. I am deeply grateful to Lars for helping me to understand the application problems, for interesting discussions, and for answering the million questions I had during my studies. I am also thankful to Örjan Carlborg and Kajsa Ljungberg for supporting and encouraging me at the initial stages.

I would like to express special thanks to Lennart Egnesund, who helped me to get in contact with Uppsala University and to initiate the project presented in this thesis. Benjamin Baumslag, Jesper Oppelstrup and Stefan Ljung are acknowledged for their great support at the beginning of my studies at MDH. I am grateful to Jan-Erik Käck, my research fellow during my first three years of study, for all his help, his interest in my research, for scientific discussions and for keeping such nice company.

I am deeply obliged to Maya Neytcheva for her invaluable help with the numerical linear algebra issues, her true scientific interest and her wonderful support during my last months as a PhD student. Richard Bonner is greatly appreciated for his kind advice, unforgettable discussions, great interest, help, support, assistance with English writing and friendship during the years of my stay in Västerås. Anatoliy Malyarenko is also one of the people who gave me a lot of help and support during my teaching and studying. Being an excellent expert in LaTeX, he provided valuable help during the preparation of this thesis. He was the initiator and co-author of the subproject dealing with optimization in finance.

My thanks are due to Dmitrii Silvestrov and Peter Gustafsson for their help and support in

administrative issues. Kicki, Eija and Mona are appreciated for their assistance with arranging lots of practical issues during my time as a PhD student. Additionally, my fellows at the former Department of Mathematics and Physics are acknowledged for the warm atmosphere and hospitality. I am also very grateful to all my friends in Ukraine and Sweden for their love, friendship and support.

Most of my gratitude is due to the whole of my big family. My father taught me to love mathematics, helped me step into the scientific world and supported me during the whole period of my studies in Sweden, especially during the time when I wrote this thesis. For me, from my very childhood, he has been the best example of a person devoted to science with all his heart. I thank my mother and my beloved sister for their belief in me, their patience, taking care of my mood all the time, and for their assistance with English writing. The distance between us is great, but I always feel their boundless love and support. I am grateful to my husband, who was also my research fellow and co-author, for long hours of discussions, for his support, for managing some of the programming, participating in testing and evaluating the results, taking care of the kids, and for his love. Finally, I would like to thank my dear Aleks and Anastasiya, who are everything to me. Whatever I do, I do it for them.

Västerås, April 2008
Kateryna Mishchenko

Abstract

The focus of this thesis is on numerical algorithms for efficient solution of QTL analysis problems in genetics.

Firstly, we consider QTL mapping problems where a standard least-squares model is used for computing the model fit. We develop optimization methods for the local problems in a hybrid global-local optimization scheme for determining the optimal set of QTL locations. Here, the local problems have constant bound constraints and may be non-convex and/or flat in one or more directions. We propose an enhanced quasi-Newton method and also implement several schemes for constrained optimization. The algorithms are adapted to the QTL optimization problems. We show that it is possible to use the new schemes to solve problems with up to 6 QTLs efficiently and accurately, and that the work is reduced by up to two orders of magnitude compared to using only global optimization.

Secondly, we study numerical methods for QTL mapping where variance component estimation and a REML model are used. This results in a non-linear optimization problem for computing the model fit at each set of QTL locations. Here, we compare different optimization schemes and adapt them to the specifics of the problem. The results show that our version of the active set method is efficient and robust, which is not the case for methods used earlier. We also study the matrix operations performed inside the optimization loop, and develop more efficient algorithms for the REML computations. We develop a scheme for reducing the number of objective function evaluations, and we accelerate the computation of the derivatives of the log-likelihood by introducing an efficient scheme for computing the inverse of the variance-covariance matrix and other components of the derivatives of the log-likelihood.

Sammanfattning på svenska (Summary in Swedish)

Most traits of medical or economic importance in plants, animals and humans vary on a continuous scale. Examples are crop yield, milk production, blood pressure and cholesterol level. Such traits are influenced both by the environment and by several interacting genetic factors. Regions of the genome that affect quantitative traits are called QTL, an abbreviation of "quantitative trait loci". Since quantitative traits are important, it is interesting to study their genetic background. A statistical analysis of the relations between individuals' genetic differences and differences in the trait can reveal where in the genome there are QTL affecting the trait. Such an analysis can be complicated, and often populations with simple pedigree structures must be used as the basis for the investigations.

A central part of the statistical analysis is to maximize a mathematical function that describes how well the variation in the trait can be explained by the genetic variation. This maximization is carried out by a computational program on a computer. In the thesis, new, efficient computational methods are developed so that the statistical analysis can be carried out in a fast and robust way in cases where one searches for several interacting QTL. This type of analysis has previously often been very demanding, and one has had to settle for searching only for the dominant QTL position.

From a computational point of view, the problem consists of two parts: the first is to compute, as quickly as possible, the degree of model fit for a single combination of QTL positions (function evaluation), and the second is to search through the possible combinations of QTL positions and find the combination that gives the best model fit (global optimization). The thesis studies efficient computational methods for several optimization problems that arise as subtasks in these computations; for example, robust algorithms are developed for the optimization problems that must be solved for each combination of QTL positions when a more advanced statistical model (variance component analysis) is used. For this type of analysis, new efficient numerical linear algebra methods are also described for certain computationally heavy subtasks. For some important problem settings in QTL analysis there were previously no robust and efficient computational methods, and the algorithms described in the thesis now make it possible for, e.g., researchers in genetics to carry out these types of analyses.

Part I

Introduction

This Thesis

"It is unworthy of excellent men to lose hours like slaves in the labour of calculation." — Leibniz

In this thesis, efficient and robust computational procedures for solving optimization problems arising in the area of QTL mapping are derived. The schemes enable fast and accurate determination of a local minimizer for optimization problems with bound constraints and an objective function which may be non-convex or flat in one or more directions.

In the first two papers in the thesis, standard linear regression models are considered. Here, the local optimization problems arising in a hybrid global-local scheme for determining the location of several QTL are studied. Different constrained optimization methods are adapted to the specific problem settings, and the schemes enable up to two orders of magnitude faster solution than the previously used methods.

In the other four papers, more advanced models based on variance component estimation are studied. In two of these papers, local optimization schemes for maximizing the restricted likelihood function at a single position in the genome for genetic models with a single QTL are considered, and the study is also extended to forward selection models including additive and interaction effects of two QTLs. We again consider constrained optimization schemes, and focus on how to approximate the derivatives of the objective function. We show that the optimization schemes derived result in robust performance for two- to five-dimensional problems. On average, about 20 iterations are required for five-dimensional problems. For practical genetic analysis, this is an important result. Previously used methods perform in an unpredictable way, and sometimes do not converge. In the two remaining papers, new numerical schemes for efficient computation of the operations within the optimization loop are considered.

In this thesis summary, some material that has not been considered in detail in the papers is added. Also, the introduction is written in a form that should be easily understood by readers who are not acquainted with the problems of QTL mapping and optimization. It describes the goal of QTL mapping and gives a review of optimization and numerical linear algebra methods. Chapter 1 provides the basics of the QTL mapping problem and gives a brief review of the computational methods used in the thesis. Also, the underlying statistical models used in the thesis are introduced and described. Chapter 2 concerns the foundations of local optimization methods and explains the choice of schemes for the solution of the problems in the QTL mapping field. Moreover, the new features that have been introduced for enhancing the performance of the optimization procedures are pointed out. Chapter 3 gives an overview of the numerical linear algebra methods which are developed to speed up computations within the optimization loop for the solution of the QTL problem. Chapter 4 is devoted to a brief summary of the papers in the thesis. The introduction is concluded by a discussion of possible directions of future research on numerical methods for QTL mapping problems.

Chapter 1

QTL Models and Numerical Models

1.1 QTL Analysis

In most cases, naturally inherited traits in humans and animals are characterized by quantitative variation, i.e. they are measured on a continuous scale. Quantitative traits are normally affected by a number of genes acting together. The contribution of each individual gene may be small, and the variation caused by environmental factors may be large. For instance, height and weight in humans are traits that are affected by the genetic composition, but there is no particular gene that is solely responsible for determining them, and environmental factors are very important for the height or weight of individuals.

A QTL, or quantitative trait locus, is a location in the genome which affects a quantitative trait. The process of identifying QTLs is called QTL mapping. This is an important topic in biology today. Finding QTLs and estimating their effects is a first step towards understanding the biochemical basis of these traits and how they evolve in populations in the course of time. It should also be emphasized that identification of the key regions responsible for variation of traits such as overweight, predisposition to cancer, diabetes and cardiovascular diseases in humans, or production traits in animals, is of great significance and has a large economic impact in medicine and animal breeding.

In QTL mapping, statistical models which describe the relation between genotypes (genetic constitution) and phenotypes (visible properties of the organism) are employed. In the models, the conditional probability of a particular QTL genotype given an observed marker genotype is constructed. The analysis is simplified if some additional information about the genomes in the population is available. In this thesis, we focus on experimental crosses where animals from two divergent breeds have been mated for two generations according to standard schemes, producing a large number of second-generation offspring that supplied the data for the analysis.

1.1.1 Genetic Distance

The distance between two loci in the genome can be expressed either as a physical or a genetic distance. The physical distance is the number of nucleotide pairs between the loci. It is time-consuming and expensive to measure, since it requires sequencing of the DNA. The genetic distance m, which is easier to measure, is based on the average recombination frequency c between the loci (recombination is the process of exchanging sections of DNA between two chromosomes when germ cells are produced), and is expressed in Morgan, M, or centi-Morgan, cM. It is defined by the following equation,

    m = -ln(1 - 2c) / 2,    (1.1)

which is called Haldane's mapping function. A distance of 1 cM corresponds to a 1% probability that a recombination occurs.
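As a small numerical illustration (our addition, with function names of our own choosing), Haldane's mapping function (1.1) and its inverse can be evaluated directly; a minimal Python sketch:

    import numpy as np

    def haldane_distance(c):
        # Genetic distance in Morgan from recombination frequency c, 0 <= c < 0.5 (eq. 1.1)
        return -np.log(1.0 - 2.0 * c) / 2.0

    def haldane_recombination(m):
        # Inverse mapping: recombination frequency from genetic distance m (in Morgan)
        return (1.0 - np.exp(-2.0 * m)) / 2.0

    # 10 cM = 0.1 M corresponds to a recombination frequency of about 0.0906
    print(haldane_recombination(0.10))

Note that c approaches 0.5 as m grows, reflecting that distant loci recombine essentially at random.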

There is no easily determined relationship between the genetic and physical distances between loci. Moreover, the recombination rate can vary between sexes, and cross-over is often suppressed in particular regions of chromosomes. Models for QTL mapping work with genetic distances, which puts a restriction on the possible resolution in determining the QTLs. In general, it is not viable to determine the locations more accurately than to within one cM.

1.1.2 QTL Mapping using Linear Regression and Maximum Likelihood Models

There are two principal classes of methods used for QTL detection and estimation: linear regression models evaluated by ANOVA (analysis of variance), and variance component models evaluated by employing mixed linear models.

In its simplest form, QTL analysis via ANOVA assumes that the animals from the two divergent breeds have no genetic variation within the founder breeds [13]. Moreover, for the ANOVA estimates to be accurate, the sample sizes should be well balanced, with the number of observations for each set of conditions being equal. The variances are estimated by equating observed mean squares to expressions describing their expected values, which are functions of the variance components. These models are the basis for the linear regression method [27] used in this thesis. From a computational standpoint, evaluating this type of model involves solving a single linear least-squares problem. This is a rather cheap computation, and using linear regression methods may make models including multiple QTL computationally feasible.

However, in reality substantial genetic variation within the breeds may be present. This can be taken into account by using a variance component QTL model [6, 9, 35]. Here, a mixed linear model with fixed non-QTL effects and a random QTL effect is used. Either maximum likelihood (ML) or restricted maximum likelihood (REML) interval mapping is used, and the QTLs are given by the regions that have the largest likelihood ratio. ML and REML schemes provide powerful approaches for estimating variance components in populations with a complex known pedigree and may be applied to a wide variety of issues, including the estimation of non-additive genetic interaction. Unlike ANOVA, these methods do not put any special demands on the design or balance of the data. However, for evaluating maximum likelihood models, a nonlinear optimization problem must be solved iteratively, which is more computationally demanding.

1.1.3 ML and REML

Maximum likelihood (ML) and restricted maximum likelihood (REML) [40] methods are both used in QTL mapping.

The ML method estimates the parameters of the distribution that maximize the likelihood of the observed data and tests alternative hypotheses with likelihood ratios. The distribution is assumed to be multivariate normal and it is supposed that all fixed effects are known without error. A strong feature of this method is its statistical efficiency, since it utilizes all available data, while a weak side is that ML methods may produce biased variance estimates and are often rather computationally expensive. This bias occurs because the observed deviations of individual phenotypic values from an estimated population mean tend to be smaller than their deviations from the true mean [26]. Moreover, such bias can become large when a model contains many fixed effects, especially for small sample sizes. This problem is avoided by using REML estimators, which maximize only the portion of the likelihood that does not depend on the fixed effects. The REML approach does not always remove all of the bias in the parameter estimation; however, it is the preferred method for analyzing large data sets with complex structures. An introduction to ML and REML schemes applied to QTL mapping problems is given e.g. in [26, 28].

1.2 The Least Squares Method for QTL Mapping

A linear model for QTL mapping in experimental populations has the following standard form,

    y = Ab + ε,    (1.2)

where the matrix A is the n × k design matrix, b is the vector of k genetic effects, y is the vector of n phenotypic observations and ε is the vector of n residual errors. Here n is equal to the number of individuals and k ≤ n is the total number of parameters in the model.

The main computational problem in QTL mapping is to find the positions of a set of d putative QTL. For a given model, any set of d positions x can be used to set up the design matrix A(x). The goal is to optimize the model fit over all possible positions x and to compute the corresponding residual sum of squares, RSS_opt, defined as

    RSS_opt = min_{b,x} ||A(x)b - y||_2^2.    (1.3)

In some models, the system of equations also includes a diagonal matrix of weights. The position x that minimizes (1.3) is denoted x_opt, and is the most probable set of QTL positions for the given model.

The expression (1.3) is a separable non-linear least-squares problem where the model is a linear combination of non-linear functions. According to [11], the solution of (1.3) can be separated into two parts: the inner (linear) and outer (nonlinear) problems. The inner problem is called the evaluation of the objective function and is given by

    RSS(x) = min_b ||A(x)b - y||_2^2.    (1.4)

Methods for solving (1.4) are presented in [24, 23]. The outer problem, which is a global optimization problem,

    min_{x ∈ G_d} RSS(x),    (1.5)

has traditionally been solved using an exhaustive grid search. However, a more efficient global optimization scheme, based on the DIRECT method [21], was introduced in [25] and then refined and further improved in Paper A in this thesis.
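For a fixed position vector x, the inner problem (1.4) is an ordinary linear least-squares problem. A minimal sketch of one function evaluation in Python (the routine that builds A(x) is application-specific and not shown):

    import numpy as np

    def rss(A, y):
        # Evaluate the inner problem (1.4): min_b ||A b - y||^2 for a given design matrix A = A(x)
        b, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
        r = A @ b - y
        return r @ r   # recompute the residual norm; robust also when A is rank-deficient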

It is always possible to solve the problem (1.5) and find a set of hypothetical QTL positions x_opt. However, the question whether this corresponds to positions of true QTL remains. To evaluate the result, a statistical test is performed: RSS_opt is compared to a significance threshold, which may be derived theoretically. However, a theoretical derivation depends on a large number of assumptions that may not be valid. An alternative method is to compute empirical significance thresholds for each experimental population and model using permutation tests [3, 5]. This method is robust but computationally demanding, since thousands of global optimization problems of the type (1.5) must be solved.

1.2.1 Local optimization

The search space for the global optimization problem (1.5) is a d-dimensional hypercube G_d consisting of a set of C_d d-dimensional chromosome combination boxes, called cc-boxes. Each cc-box in turn consists of a set of d-dimensional marker boxes, or m-boxes, defined by the marker positions and the endpoints of the chromosome. The objective function RSS(x) defined in (1.4) is smooth within an m-box, but it has discontinuous derivatives across the m-box boundaries. Also, the function itself is discontinuous across cc-box boundaries. The main idea in the hybrid global-local optimization algorithm presented in Paper A is to use DIRECT to locate m-boxes with promising function values, and then switch to a local scheme for performing the optimization within these. The local optimization problem is formulated as

    min_x RSS(x)    (1.6)
    s.t. x ∈ m-box.    (1.7)

The d-dimensional m-box is defined by its boundaries x_i^L and x_i^U,

    x_i^L ≤ x_i ≤ x_i^U,  i = 1, ..., d.    (1.8)

The initial guess for the local optimization is the center of the m-box (this is the point where DIRECT performs the function evaluation at the outer, global stage of the algorithm). In Paper B, an investigation of the local optimization problems for typical QTL mapping problems is performed. It shows that the objective function is often non-convex, and in many cases it is also very flat in one or more directions. Also, in a majority of cases, the minima are located on the boundary of the marker box.

Below, the approach based on the model (1.2) and model fit (1.3) will be referred to as the QTL model in LS formulation, and the objective function RSS(x) will be denoted by f(x). Optimization methods for the solution of this problem are the topic of Chapter 2.
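The structure of the local problem (1.6)-(1.8) can be illustrated with an off-the-shelf bound-constrained solver; this is only a sketch of the problem setup, not one of the schemes developed in the thesis:

    import numpy as np
    from scipy.optimize import minimize

    def optimize_in_mbox(rss_fun, x_lo, x_hi):
        # Minimize RSS(x) subject to the m-box bounds (1.8),
        # starting from the m-box center as in the hybrid DIRECT-local scheme.
        x_lo, x_hi = np.asarray(x_lo, float), np.asarray(x_hi, float)
        x0 = 0.5 * (x_lo + x_hi)
        res = minimize(rss_fun, x0, method="L-BFGS-B", bounds=list(zip(x_lo, x_hi)))
        return res.x, res.fun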

1.3 The Restricted Maximum Likelihood Method for QTL Mapping

In this section, we describe how REML models are applied to QTL mapping problems, including one or several individual QTL effects.

1.3.1 Models with a Single QTL

The analysis of variance for a single QTL is based on the following general linear mixed model,

    y = Xb + Z_1 u + e,    (1.9)

where y is a vector of n phenotypic observations, X is the n × n_f design matrix for n_f fixed effects, Z_1 is the n × n_r design matrix for n_r random effects, b is the vector of n_f unknown fixed effects, u is the vector of n_r unknown random effects, and e is a vector of n random residuals. In the QTL analysis setting, the additional assumptions that the entries of e are identically and independently distributed and that there is a single observation for each individual are imposed. The variance-covariance matrix associated with (1.9) then becomes

    V = σ1² Π_1 + σ2² I,    (1.10)

where σ1² is the variance of the random QTL effect and σ2² is the residual variance. The matrix Π_1 is referred to as the Identity-By-Descent (IBD) matrix.

In REML estimation, the task at hand is to determine estimates of σ1² and σ2² in (1.9) as well as the value of the likelihood function l at these points. Here, the parameters σ1², σ2² are obtained as maximizers of the restricted likelihood of the observed data y. The log-likelihood function for the REML estimation approach based on model (1.9) is

    L ≡ -2 ln(l) = C + ln(det(V)) + ln(det(X^T V^{-1} X)) + y^T P y,    (1.11)

where l is the likelihood function to be maximized and the projection matrix P is defined by

    P = V^{-1} - V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1}.    (1.12)

For this case, L is a function of two variables, L(Σ) = L(σ1², σ2²), and the problem of maximizing the likelihood function l is equivalent to the problem of minimizing the log-likelihood function L, which has a simpler representation. In summary, we determine the estimates of σ1², σ2² by solving the local optimization problem

    min L(Σ)    (1.13)
    s.t. σ1² ≥ 0, σ2² > 0.    (1.14)
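Evaluating the objective (1.11) for the two-component model (1.10) amounts to a few dense matrix operations. A direct sketch with explicit inversion, feasible only for small n (the thesis papers develop more efficient algorithms for these computations), dropping the constant C:

    import numpy as np

    def neg2_restricted_loglik(s1, s2, Pi1, X, y):
        # L = ln det(V) + ln det(X^T V^{-1} X) + y^T P y  (eq. 1.11, constant dropped),
        # with V = s1 * Pi1 + s2 * I as in (1.10) and P as in (1.12).
        n = len(y)
        V = s1 * Pi1 + s2 * np.eye(n)
        Vinv = np.linalg.inv(V)
        XtVX = X.T @ Vinv @ X
        P = Vinv - Vinv @ X @ np.linalg.solve(XtVX, X.T @ Vinv)
        _, logdetV = np.linalg.slogdet(V)
        _, logdetX = np.linalg.slogdet(XtVX)
        return logdetV + logdetX + y @ P @ y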

The generalized least squares estimates of the fixed effects b and the random effects u may be computed by first solving the optimization problem (1.13)-(1.14) and then computing b̂ and û using the following expressions:

    b̂ = (X^T V^{-1} X)^{-1} X^T V^{-1} y,
    û = σ1² Π_1 Z_1^T V^{-1} (y - X b̂),    (1.15)

where σ1², σ2² in V are the optimizers of the problem (1.13)-(1.14). In Papers C-F, we concentrate on estimating the variance components; computing b̂ and û at the optimum is a minor task. Numerical methods for solving the optimization problem (1.13)-(1.14) are the topic of Paper E.
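Given the optimizers, the estimates (1.15) follow by direct substitution; a sketch for the single-QTL model (1.10), assuming V has been assembled with the optimal variance components:

    import numpy as np

    def gls_estimates(s1, Pi1, Z1, X, y, V):
        # Compute b_hat and u_hat from (1.15) at the REML optimum.
        Vinv = np.linalg.inv(V)
        b_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
        u_hat = s1 * Pi1 @ Z1.T @ Vinv @ (y - X @ b_hat)
        return b_hat, u_hat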

From a genetics perspective, it is often more interesting to study a model with a single QTL and an additional polygenic effect. This term models the combined effect of many genes spread throughout the genome, where each gene has a small effect on the trait under consideration. A linear mixed model for such a multivariate model is described by the following mixed model equations,

    y = Xb + Z_1 u_1 + Z_a a + e,    (1.16)

where Z_a is the n × n_a design matrix for the polygenic effects, a is the vector of n_a random polygenic effects, and all other entries are given according to the model (1.9). Additionally, all random effects are assumed to be normally distributed. The variance-covariance matrix for the model (1.16) is

    V(y) = Π_1 σ1² + I σ2² + A σ3²,    (1.17)

where σ1² and σ2² are again the variance of the random QTL effect and the residual variance, while σ3² is the variance of the polygenic effects. The matrix A is referred to as the additive relationship matrix. Again, we determine the estimates of Σ = (σ1², σ2², σ3²) by solving the optimization problem

    min L(Σ)    (1.18)
    s.t. σ1² ≥ 0, σ2² > 0, σ3² ≥ 0.    (1.19)

1.3.2 Local and global searches

By solving the local optimization problems (1.13)-(1.14) and (1.18)-(1.19) we compute the maximal likelihood function at a given position τ_i in the genome. For variance component-based QTL mapping, this is the inner problem. A main QTL effect is searched for by scanning the whole genome for the position τ* that results in the largest maximum likelihood value. We refer to this search as a genome scan, or the outer problem. Computationally, this is again a global optimization problem.

The dimensionality of the inner problem is determined by the number of variance components included in the underlying model. The dimensionality of the outer problem depends on the number of QTLs included in the model. So, the outer problem associated with models (1.9) and (1.16), including a single QTL, corresponds to a one-dimensional scan. To solve the problem of finding several QTLs, possibly including interaction between them, a simultaneous search over several dimensions should be performed at the outer level. This multi-dimensional global search drastically increases the computational complexity of the method, since the inner problem must then be solved at many locations in a hypercube. In practice, a forward selection scheme is widely used for searching for several QTLs. Here, a one-dimensional search for τ* is first done, and then the corresponding QTL effect is included in the model as an additional random effect. Then a new search for another QTL can be made. Using forward selection, only one-dimensional global optimization problems are solved, and the computational complexity is smaller in comparison to a simultaneous search for multiple QTL. However, it has been shown (e.g. in [2]) that schemes based on forward selection can be ineffective in detecting interacting QTL.

In Papers E and F, we develop efficient optimization schemes for solving the inner problem, i.e. maximizing the likelihood function at a given marker position. The outer, global optimization problem is solved using one-dimensional exhaustive grid searches combined with the forward selection technique. However, to be able to solve general problems with several QTL, efficient global searches should be exploited. Here, the DIRECT-based scheme developed in Paper A could be employed.

1.3.3 Models with Forward Selection for an Additional QTL and Interaction Effects

A forward selection model associated with model (1.9) is given by

    y = Xb + Z_1 u_1 + Z_a a + Z_2 u_2 + e.    (1.20)

The variance-covariance matrix is given by

    V(y) = Π_1 σ1² + I σ2² + A σ3² + Π_2 σ4²,    (1.21)

where Π_2 is the IBD matrix for the main QTL at position τ_0 and σ4² is the variance of the random QTL effect u_2. The estimates for the forward selection problem (1.20) are obtained as minimizers of the problem

    min L(Σ)    (1.22)
    s.t. σ1² ≥ 0, σ2² > 0, σ3² ≥ 0, σ4² ≥ 0,    (1.23)

where Σ = (σ1², σ2², σ3², σ4²).

Finally, the most complex variance-component model considered in this thesis also includes an interaction effect between the two QTLs. This is included by adding pair-wise epistasis between the positions τ_0 and τ in the forward selection scheme. Such a model is represented by the following mixed model equation,

    y = Xb + Z_12 u_12 + Z_1 u_1 + Z_2 u_2 + Z_a a + e,    (1.24)

with the variance-covariance matrix of y given by

    V(y) = Π_1 σ1² + I σ2² + A σ3² + Π_2 σ4² + Π_12 σ5²,    (1.25)

where Π_12 is calculated as the Hadamard product of Π_2 and Π_1 (see [37, 36]). The minimization problem has the same formulation as (1.22), with the vector of estimates Σ determined as Σ = (σ1², σ2², σ3², σ4², σ5²).
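Assembling the variance-covariance matrix (1.25) requires only scaled matrix sums and one elementwise product; a minimal sketch:

    import numpy as np

    def assemble_V(sig, Pi1, Pi2, A):
        # Variance-covariance matrix (1.25) for the forward-selection model with epistasis.
        # sig = (s1, s2, s3, s4, s5); Pi1, Pi2: IBD matrices; A: additive relationship matrix.
        s1, s2, s3, s4, s5 = sig
        Pi12 = Pi1 * Pi2   # Hadamard (elementwise) product, as in [37, 36]
        return s1 * Pi1 + s2 * np.eye(Pi1.shape[0]) + s3 * A + s4 * Pi2 + s5 * Pi12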

1.3.4 The AI-REML Method for Log-Likelihood Maximization

The standard numerical methods used for REML estimation are computationally demanding and involve explicit inversion of large matrices. The optimization schemes used can be divided into several classes: derivative-free methods, EM methods, derivative-based methods, and combinations of these schemes [28, 30, 26].

Derivative-free methods [12, 29], such as Downhill Simplex and Graser's method, have been popular because of their computational feasibility. However, they have questionable numerical properties (slow convergence and often poor accuracy). Another type of method widely used for likelihood maximization is the EM (expectation-maximization) scheme [4, 39]. These methods utilize only information about the first derivative of the objective function. However, the EM optimization often takes a considerably larger number of iterations than if Newton's method is used. Also, they are sensitive to the choice of the initial guess, which may result in multiple solutions for different initial points.

In this thesis, we use Newton-based methods as the tool for solving the inner problems in variance component-based QTL analysis. Such methods require the gradient and Hessian of the log-likelihood function and are generally known to be good schemes for local optimization. For animal breeding QTL mapping problems with variance component-based models, Newton-type methods have been the standard scheme for REML estimation. A common version of the Newton algorithm used for REML estimation is called Fisher scoring. It is based on replacing the true Hessian by its negative expectation value, i.e. the Fisher information matrix F [38]. The inverse of F contains the standard errors for the estimates; moreover, it is easier to compute the inverse of F than the inverse of the Hessian. Another popular version of Newton's algorithm is the average information, or AI-REML, method. Here, the Hessian is substituted by the average of the observed and expected Hessian [8, 20]. This approach is even more powerful than Fisher's scoring method, since the expression for the matrix of second derivatives, denoted by AI, does not contain trace computations. As a result, the AI matrix is cheaper to evaluate than F. This motivated us to use the AI-REML method as the main tool for solving the variance component estimation problem in this thesis. Other authors have also employed methods based on modifications of the standard Newton method, introducing line searches, linearization techniques etc., see [15, 19, 14, 1].

The AI-REML method described in [20] is based on the Newton method for function minimization (see Section 2.3.1), where the Hessian of the log-likelihood function (1.11), given by

    H(σi², σj²) = -tr(P (∂V/∂σi²) P (∂V/∂σj²)) + 2 y^T P (∂V/∂σi²) P (∂V/∂σj²) P y,  i, j = 1, ..., n,    (1.26)

is substituted by the average of itself and its expectation value,

    AI(σi², σj²) = (1/2) (∂²L/∂σi²∂σj² + E(∂²L/∂σi²∂σj²)) = y^T P (∂V/∂σi²) P (∂V/∂σj²) P y,  i, j = 1, ..., n,    (1.27)

where n is equal to the number of variance components in the model.

In [20], a special procedure for computing the derivatives of the log-likelihood is also described. For the problems considered there, the IBD matrices are sparse and positive-definite. This made

it possible to simplify the computations. In short, the key idea is to avoid direct inversion of the variance-covariance matrix and computation of the factors included in the derivatives of the log-likelihood by using the mixed model equations [16]. However, this type of algorithm cannot be applied to QTL mapping problems, since the IBD matrices in the underlying models are singular and rather dense. In Papers C and D we consider new approaches for computing the derivatives of the log-likelihood for the case of singular IBD matrices.

The AI-REML methods described and implemented in Papers E and F build on solving a constrained optimization problem for determining the maximal likelihood. For these problems, a Newton-type method can provide a sufficiently good search direction, but for cases when the optimum is on the boundary of the feasible region it can produce infeasible solutions. In Paper E we suggest an improvement of AI-REML that utilizes a line search procedure preventing constraint violation. However, the convergence of this scheme was shown to be slow, which led us to introduce constrained optimization methods that produce solutions in a robust and efficient way. Besides using the AI matrix for approximating the Hessian, we also use several BFGS-type methods for updating the approximation.

Finally, we note that the components of the gradient of the log-likelihood function are given by the following analytical formulas,

    ∂L/∂σi² = tr((∂V/∂σi²) P) - y^T P (∂V/∂σi²) P y,  i = 1, ..., n,    (1.28)

and should be computed in each iteration. It is clear that the derivatives ∂V/∂σi², i = 1, ..., n, only need to be computed once, which simplifies the computational procedure within the optimization loop. This is also the reason for using the squares of the variance components as the basis for the optimization.

The problems of type (1.22)-(1.23) will be referred to as the QTL model in VC formulation. They are given in the form

    min f(x)  s.t. x ≥ 0, x_2 > 0,    (1.29)

where f(x) denotes the log-likelihood, x is the vector of squares of the variance components, x = (σ1², ..., σn²), and x_2 is the strictly positive residual variance.
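Given the projection matrix P and the constant derivative matrices ∂V/∂σi² (here the matrices Π_1, I, A, and so on), the gradient (1.28) and the AI matrix (1.27) can be assembled as follows. This is a direct sketch that forms P explicitly, which the algorithms developed in Papers C and D deliberately avoid for large n:

    import numpy as np

    def gradient_and_AI(P, y, dV):
        # Gradient (1.28) and average-information matrix (1.27);
        # P: symmetric projection matrix (1.12), dV: list of the matrices dV/dsigma_i^2.
        Py = P @ y
        t = [dVi @ Py for dVi in dV]    # dV_i P y
        w = [P @ ti for ti in t]        # P dV_i P y
        grad = np.array([np.trace(dVi @ P) - Py @ ti for dVi, ti in zip(dV, t)])
        AI = np.array([[ti @ wj for wj in w] for ti in t])   # y^T P dV_i P dV_j P y
        return grad, AI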

Chapter 2

Numerical Optimization

This chapter is devoted to a review of the main topic of this thesis, which is algorithms for numerical optimization. We start by describing the general notions of the area and discuss the differences between global and local optimization problems. The most general form of a gradient-based numerical method for local optimization is presented [32, 34, 7], and the QTL mapping applications as well as the new approaches developed in the thesis are described. Also, the Downhill Simplex method [33], which is exploited for the convertible bond problem in Paper G in the appendix, is briefly described.

2.1 Algorithms

Optimization algorithms are in general iterative: they start with an initial guess x_0 and generate a sequence of improved iterates x_i until they terminate, according to a pre-set termination criterion. The algorithms differ in what strategy is used to move from one iterate to the next. Many algorithms use information about the values of the objective function and its first and second derivatives. Some algorithms accumulate information collected at previous iterations, while others use only local information obtained at the current point. According to [34], a good algorithm should have the following properties:

Robustness. It should perform well on a wide class of problems, irrespective of the value of the starting point.

Efficiency. It should not require excessive computer time or storage.

Accuracy. It should identify a solution with some pre-set precision, and should not be overly sensitive to errors in the data or to the arithmetic rounding errors that occur when the algorithm is executed.

These requirements may (and often do) conflict. Tradeoffs between robustness and speed, and between convergence rate and storage requirements, are key issues in numerical optimization.

2.2 Global and Local Optimization

The goal of global optimization is to find a point x_gl ∈ S such that

    f(x_gl) ≤ f(x)  ∀x ∈ S,    (2.1)

where S is the feasible domain bounded by a set of constraints. The case when S is the whole space R^n is referred to as an unconstrained optimization problem. The function f(x) is called the objective function and the point x_gl is said to be the global minimizer of the problem (2.1). Moreover, if x_gl satisfies the strict inequality in (2.1) then it is the strict global minimizer.

It is often hard to solve a global optimization problem, in the sense that it is hard to prove that the computed solution really is a global minimizer. Not all functions have a finite number of global minimizers, and quite often the properties of the objective function are not sufficiently well known for a proof to be constructed. From a numerical standpoint this is manifested in the problem of choosing an appropriate termination criterion. Thus, global solvers can in unlucky cases terminate at a local minimum. This is more common for functions with many local minima, and for these problems the choice of initial guess is especially important.

One special case of a global optimization problem is when a convex objective function is minimized over a convex feasible domain. For such a problem, a global minimizer exists and coincides with the local minimizer. Hence, it can be proved that such problems can be solved using local optimization methods. Unfortunately, the optimization problems occurring in QTL analysis are rarely convex. On the contrary, they are often non-convex, may contain flat or almost flat regions in the optimization landscape, or they may be noisy and have a lot of local minima.

Global optimization for general problems is an active area of research, but with few commonly applicable results. However, for special types of problems, such as linearly constrained global problems, very efficient algorithms have been developed. Numerical methods for global optimization are much more complicated than local methods, and there is a wide range of different classes of methods applicable for different classes of problems [18]. The two main classes are deterministic and stochastic methods. Deterministic methods are commonly based on the idea of an exhaustive grid search, where the computations are completely determined by the information about the objective function obtained so far. Stochastic methods include some randomness in the search process as a means of avoiding getting stuck in local minima.

In local optimization, the problem is solved in the neighborhood of the initial guess x_0. Under some assumptions on the objective function, the local search is guaranteed to find a point x_loc such that

    f(x_loc) ≤ f(x)  ∀x ∈ B,    (2.2)

where B is a neighborhood of x_loc including x_0. The point x_loc is called a strict local minimizer provided that the strict inequality in (2.2) is satisfied. Local optimization problems are normally much easier to solve than global ones. The assumptions on the objective function needed to guarantee that a local optimization problem can be solved are normally continuity of the objective function and its first two derivatives. Local methods exploit these properties of the objective function at a chosen point and within some small neighborhood of it. Gradient-based methods are the common tool for solving local problems; however, derivative-free methods also exist. Methods that use only the gradient of the objective function are only guaranteed to find a stationary point x*. For unconstrained problems, the point x* is a solution to ∇f(x) = 0. Information about the second derivative at this point is required to confirm that the stationary point is actually a local minimizer.

Newton-type methods are reliable and widely used for constrained and unconstrained

optimization. They are also used for solving the optimization problems occurring in QTL mapping in this thesis. Below, we describe Newton-based methods for solving both constrained and unconstrained optimization problems.

2.3 Local Optimization Methods

The local optimization problems appearing in QTL analysis are non-linear problems with linear inequality constraints,

    min f(x),    (2.3)
    s.t. Gx ≥ b.    (2.4)

For least-squares models, x is the d-dimensional vector representing the position of the set of hypothetical QTL, and the constraints (2.4) represent the boundaries of a d-dimensional marker box. For the variance component-based problems, x is the d-dimensional vector of the variance components (or rather the squares of these entities) and (2.4) represents the d-dimensional non-negative half-space. In both cases, the constraints have a quite simple structure which can be exploited when the schemes are set up.

2.3.1 Unconstrained Newton-Based Optimization Methods

The necessary optimality conditions for the unconstrained version of the problem (2.3) are

    ∇f(x*) = 0,    (2.5)
    ∇²f(x*) ≥ 0,    (2.6)

where ∇f(x*) and ∇²f(x*) are the gradient vector and the Hessian matrix of f(x) evaluated at the minimizer x*. The basic Newton method is derived from the first-order necessary condition (2.5) and has the following form:

    x_{k+1} = x_k + p_k,    (2.7)
    ∇²f(x_k) p_k = -∇f(x_k),    (2.8)

where x_{k+1} depends on the previous iterate x_k and the direction p_k. The vector p_k is called the Newton direction and is the solution of the Newton equation (2.8). The idea behind the Newton method is that the nonlinear function f(x) is approximated by a quadratic model Q = Q(p) in each step of the iterative procedure (2.7)-(2.8). This quadratic model corresponds to the first three terms of the Taylor series expansion of f in the neighborhood of x_k.

Although the classical Newton method has a quadratic rate of convergence [34], it is rarely used in practice. There are two reasons for this. The first is that the Hessian matrix ∇²f(x) may not be positive-definite, so descent cannot be guaranteed. The second is the sensitivity of the method to the choice of the initial guess x_0 and the possible slow convergence when the current iterate is far from the solution. In addition, it is not always feasible to determine the Hessian analytically.
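In its unsafeguarded form, one Newton iteration (2.7)-(2.8) is just a linear solve; a minimal sketch (the safeguards discussed below are what make the method practical):

    import numpy as np

    def newton_step(grad, hess, x):
        # One step of the basic Newton iteration: solve (2.8) for the
        # Newton direction and take the full step (2.7).
        p = np.linalg.solve(hess(x), -grad(x))
        return x + p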

Guaranteeing descent

The Newton direction is reliable only if the difference between f(x_k + p_k) and the quadratic model Q(p) is small. As a consequence, the classical Newton iteration is not guaranteed to produce a descent direction. For a step ε taken along the direction p_k, the function value should decrease,

    f(x_k + ε p_k) < f(x_k),    (2.9)

where 0 < ε << 1. A descent direction satisfies the inequality

    p^T ∇f(x) < 0,    (2.10)

which follows from the Taylor series expansion of f in the neighborhood of the point (x + p). To fulfil the descent condition (2.10), the Newton direction defined by (2.8) has to satisfy the condition -∇f(x)^T [∇²f(x)]^{-1} ∇f(x) < 0, which is equivalent to the Hessian ∇²f(x) being positive-definite.

The Newton method with Hessian modification [34] is designed to maintain the positive-definiteness of the Hessian. These methods iteratively modify the analytical Hessian using different approaches. For example, some matrix can be added to the Hessian, making the sum positive-definite even if the Hessian is not. Other algorithms are based on modifying the eigenvalues or a factorization of the Hessian.

Guaranteeing convergence

The Newton method is based on a Taylor series expansion and may produce a poor approximation of the objective function far from the solution. To improve the performance far from the optimum, global features [32] can be added to the algorithm. Such auxiliary techniques are used to control and assess the current solution, but not to define the direction. In the early stages of the algorithm, when the current solution is far from the optimum, these global strategies play an important role in preventing the algorithm from moving away from the optimum. Near the solution these strategies gradually become inactive. There are two fundamental global strategies for guaranteeing convergence: line search and trust region.

Line search

The line search strategy adjusts the length of the step from the current iterate x_k to the new iterate x_{k+1} such that a smaller objective function value really is obtained. The step length α is found as an approximate solution of the one-dimensional problem

    min_{α>0} f(x_k + α p_k).    (2.11)

An exact solution of (2.11) would give the best value of the step along the direction p_k. However, solving the minimization problem exactly can be costly and is usually unnecessary. In practice, a line search strategy generates a sequence of trial step lengths α_k until it finds some appropriate function value. Such line searches are called inexact. A popular inexact search condition, the Armijo condition, requires a sufficient decrease of the objective function as measured by the inequality

    f(x_k + α p_k) ≤ f(x_k) + µ α ∇f(x_k)^T p_k,    (2.12)

where 0 < µ << 1. There are many different variants of line search based on the Armijo condition. In the thesis, we use the following Armijo line search:

Algorithm 1 Armijo line search
    µ = 0.01
    α = 1
    while f(x_k + α p_k) > f(x_k) + µ α ∇f(x_k)^T p_k do
        α := α/2
    end while

More advanced algorithms are based on the idea of making a sufficient reduction in the objective function while preventing the algorithm from taking too short steps. One possibility is to use the curvature condition,

    ∇f(x_k + α p_k)^T p_k ≥ c ∇f(x_k)^T p_k,    (2.13)

where c ∈ (µ, 1). A line search scheme based on the conditions (2.12) and (2.13) is called the Wolfe line search. The curvature condition (2.13) can be strengthened to the strong Wolfe condition by forcing α to lie in a neighborhood of a local minimizer of f(x_k + α p_k),

    |∇f(x_k + α p_k)^T p_k| ≤ c |∇f(x_k)^T p_k|.    (2.14)

Another algorithm, similar to the Wolfe line search, is the Goldstein line search, see [34].

In this thesis we work with local problems of type (2.3) where the evaluation of the objective function is rather costly. This puts restrictions on the optimization methods, and in particular on the procedure for adjusting the step lengths. This procedure has to achieve an adequate reduction of the objective function at a minimal cost, and the number of trial points in the line search should be minimized. A trust region type method is not feasible because of its complexity. A line search based on the Wolfe conditions (2.13) and (2.14) is considered to be the best choice when using a quasi-Newton method for non-convex objective functions [34]. However, this approach requires additional function evaluations, and was also considered to be too costly. Instead, we use the Armijo line search (see Algorithm 1) because of its modest computational cost. This method requires one function evaluation f(x_k + α p_k) for each trial step length α.
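In code, Algorithm 1 is a short backtracking loop; a sketch, where the cap on the number of halvings is a safeguard of our own and not part of Algorithm 1:

    def armijo_step(f, x, fx, g, p, mu=0.01, max_halvings=30):
        # Halve alpha until the Armijo condition (2.12) holds.
        # f: objective, x: current iterate, fx = f(x), g = grad f(x), p: search direction.
        alpha = 1.0
        slope = mu * (g @ p)   # mu * grad^T p; negative for a descent direction
        for _ in range(max_halvings):
            if f(x + alpha * p) <= fx + alpha * slope:
                break
            alpha *= 0.5
        return alpha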

2.3.2 The Steepest Descent Method

As pointed out earlier, the classical Newton method is rarely applied in practice. Introducing a line search technique controls the convergence, but the question of choosing a descent direction remains unsolved. There are two potential problems: the Hessian is not necessarily positive-definite, so a descent direction cannot be guaranteed, and the Hessian computation is often computationally demanding or even infeasible. For example, for the local optimization problems arising from the least-squares model for QTL mapping, the objective function is only available as a "black-box" routine. This means that the point where the function is to be evaluated is sent to a computational routine that gives back the value of the function at this point. Neither the gradient nor the Hessian are directly available, and they must be approximated in some way.

The simplest way of providing a positive-definite Hessian is to use the steepest descent scheme. In this method, the Hessian is substituted by the identity matrix. The computational cost is that of a gradient computation, since the search direction is then given for free. Despite this attractive feature it is normally not used in practice because of its slow convergence. In Paper A, we investigate the performance of steepest descent combined with the Armijo line search, defined by

    x_{k+1} = x_k + α p_k,
    p_k = -∇f(x_k),    (2.15)

as a first test, since the algorithm is easy to implement.
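The complete iteration (2.15) then only adds a stationarity test around the line search; a sketch reusing the armijo_step function above:

    import numpy as np

    def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
        # Steepest descent (2.15) with Armijo backtracking: simple, but slow.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:   # approximate stationary point reached
                break
            x = x + armijo_step(f, x, f(x), g, -g) * (-g)
        return x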

Algorithm 2 Inverse BFGS updating formula
  s = x_{k+1} − x_k
  y = ∇f_{k+1} − ∇f_k
  if (k = 0) or (s^T y ≤ 0) then
      B_k = I    (2.18)
  else
      ρ = 1/(y^T s)
      B_k = (I − ρ s y^T) B_{k−1} (I − ρ y s^T) + ρ s s^T    (2.19)
  end if

Restarting the updating procedure whenever the curvature condition is not fulfilled does not result in an efficient scheme. This is not a new problem; it has been discussed in e.g. [34, 22]. A straightforward improvement is to substitute B_k by some matrix that provides better information about the curvature of the objective function. For example,

  B_k = B_{k−1}    (2.21)

can be used. For the problems studied in Paper B, the inverse BFGS scheme with the update (2.21) provides better results than using the identity matrix, although the convergence is still quite slow.

The BFGS update and BFGS with damping. To overcome the problem of a poor approximation of the Hessian in non-convex cases, a damping technique can be applied. A proper choice of the damping parameter θ_k ensures that the new Hessian approximation is accurate while still positive-definite, see Algorithm 3.

Algorithm 3 Hessian updating by BFGS with damping formula
  H_0 = I
  s = x_{k+1} − x_k
  y = ∇f_{k+1} − ∇f_k
  if s^T y ≥ 0.2 s^T H_{k−1} s    (2.22)
  then
      θ_k = 1
  else
      θ_k = (0.8 s^T H_{k−1} s)/(s^T H_{k−1} s − s^T y)
  end if
  r_k = θ_k y + (1 − θ_k) H_{k−1} s    (2.23)
  H_k = H_{k−1} − (H_{k−1} s s^T H_{k−1})/(s^T H_{k−1} s) + (r_k r_k^T)/(s^T r_k)    (2.24)

Note that (2.24) is the standard BFGS updating formula with y substituted by r_k. This approach was used in Papers B and F and gave good results for non-convex objective functions.
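A direct transcription of Algorithm 3 (our Python sketch, using dense NumPy matrices for simplicity):

    import numpy as np

    def damped_bfgs_update(H, s, y):
        # Damped BFGS (Algorithm 3): blend y with H*s so that the update
        # stays positive-definite even when s^T y is small or negative.
        Hs = H @ s
        sHs = s @ Hs
        if s @ y >= 0.2 * sHs:                    # condition (2.22)
            theta = 1.0
        else:
            theta = 0.8 * sHs / (sHs - s @ y)
        r = theta * y + (1.0 - theta) * Hs        # (2.23)
        # Standard BFGS formula (2.24) with y replaced by r:
        return H - np.outer(Hs, Hs) / sHs + np.outer(r, r) / (s @ r)

By construction s^T r = 0.2 s^T H s > 0 in the damped branch, which is exactly what keeps the updated matrix positive-definite.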

A hybrid BFGS approach. In the thesis, a hybrid BFGS approach is introduced. This is based on the observation that the inverse BFGS scheme is effective for convex problems, while for non-convex cases the BFGS method with damping should be exploited.

Enhancing the BFGS method by choosing the initial value for the Hessian. The initial value for the Hessian approximation is important for the performance of quasi-Newton methods. In Paper B, a technique developed in [34] is used, where the matrix B_0 is computed as

  B_0 = (y^T s)/(y^T y) · I.    (2.25)

Another choice, used in Papers E and F, is to use an analytical formula for an approximation of the Hessian.

2.4 Methods for Constrained Optimization

The problems of type (2.3)–(2.4) that are studied in this thesis have simple bound constraints. Some of the schemes used are based on simple modifications of methods for local, unconstrained optimization. However, the results in the thesis show that it is essential to treat the constraints in a proper way, since the optima are often found at the boundary of the feasible region. The unconstrained methods, even with special fixes for dealing with the bound constraints, do not behave in a robust way for multi-dimensional problems. In the thesis, several methods for constrained optimization are therefore also considered, and they are adapted to the simple structure of the constraints in the QTL mapping problems.

Already for linear constraints, several simplifications are possible in the optimality conditions. First, the constraint qualification holds at every feasible point. Second, the second-order optimality conditions involve only the Hessian of the objective, since the constraints have zero curvature. This also results in a simpler matrix structure for the system of equations solved in the Primal-Dual method. In Papers B, E and F we implemented the Active Set scheme as well as two interior point methods, the Barrier and Primal-Dual algorithms. A thorough description of the Barrier and Primal-Dual methods, including implementation details, is found in [31]. The general theory of interior point methods is discussed in [32, 34, 7]. Below, we give a condensed review of the main issues in constrained optimization and give some more details on the Active Set method.

The linear inequality constraints (2.4) are defined by a system of inequalities of the form g_i x ≥ b_i, i = 1, ..., m, where m is the number of constraints and g_i is the i-th row of the matrix G. The set is divided into the active and the inactive constraints. For the active constraints Ĝ and the inactive constraints Ḡ, the following relations are valid:

  Ĝ x̂ = 0,    (2.26)
  Ḡ x̄ > 0,    (2.27)

where x̂, x̄ ∈ x. The active constraints play a crucial role in many algorithms for constrained optimization. They are used for the construction of the reduced gradient and the reduced Hessian matrix. Here, the null space matrix Z of the matrix Ĝ is first constructed. The columns of Z are orthogonal to the rows of Ĝ; the computation of Z is considered e.g. in [32].
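For the simple bound constraints treated in the thesis, each row of Ĝ is (up to sign) a coordinate unit vector, so Z can be formed without any factorization; the sketch below assumes this bound-constrained case, whereas general linear constraints would require e.g. a QR factorization of Ĝ^T:

    import numpy as np

    def null_space_for_bounds(n, active):
        # Each active bound fixes one coordinate, so the null space of G_hat
        # is spanned by the identity columns of the remaining free variables.
        free = [i for i in range(n) if i not in set(active)]
        Z = np.zeros((n, len(free)))
        for col, var in enumerate(free):
            Z[var, col] = 1.0
        return Z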

Then the reduced gradient Z^T ∇f(x) and the reduced Hessian matrix Z^T ∇²f(x) Z are computed and used as substitutes for the full gradient and Hessian in the basic algorithms.

The iterate x* is a local minimizer for the nonlinear problem (2.3) with linear inequality constraints (2.4) if the following conditions hold:

  Z^T ∇f(x*) = 0,    (2.28)
  λ* ≥ 0,    (2.29)
  λ*^T (G x* − b) = 0,    (2.30)
  Z^T ∇²f(x*) Z is positive semi-definite,    (2.31)

where λ* is the vector of Lagrangian multipliers at x*. The condition (2.28) is equivalent to

  ∇f(x*) − G^T λ* = 0,    (2.32)

which is the well-known KKT (Karush-Kuhn-Tucker) condition of optimality.

2.4.1 The Active Set Method

The Active Set method implemented in Papers B, E and F is based on the standard iterative scheme

  x_{k+1} = x_k + α p_k,    (2.33)

where α and p_k are a step length and a search direction. The Active Set method defines p_k using a working set of constraints. The goal is to identify the constraints which are active at a solution, which is done by solving a sequence of equality-constrained problems subject to the working set constraints. Below we consider the methods for computing the direction p_k and the step length α, and sketch a procedure for forming the working set W.

Search direction
The search direction should be a direction of descent and should also result in a feasible step. The vector p_k is called a feasible descent direction at a point x_k if the conditions

  ∇f(x_k)^T p_k < 0 and Ĝ p_k ≥ 0    (2.34)

hold. For the Active Set methods we used a reduced Newton descent direction p_k, obtained as the solution of a constrained version of the Newton method, the reduced Newton method:

  [Z^T ∇²f(x_k) Z] v = −Z^T ∇f(x_k),    (2.35)
  p_k = Z v,    (2.36)

where x_k is a feasible point and v is a vector in the reduced space. To produce a descent direction, the reduced Hessian Z^T ∇²f(x_k) Z in the Active Set method should be positive-definite (as in unconstrained optimization). To ensure this, we use BFGS-like methods that generate sequences of positive-definite approximations of the reduced Hessian. In Paper F, the damped BFGS method is used, see Algorithm 3.
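A sketch of (2.35)–(2.36) in Python; H_red stands for the positive-definite BFGS approximation of the reduced Hessian (our naming, not the thesis's):

    import numpy as np

    def reduced_newton_direction(grad, H_red, Z):
        # Solve the reduced system (2.35), then map back to the full space via (2.36).
        v = np.linalg.solve(H_red, -(Z.T @ grad))
        return Z @ v

Since H_red is kept positive-definite by the damped BFGS updates, the resulting p_k is guaranteed to be a descent direction in the reduced space.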

Line Search
The inactive constraints are used to find an upper bound on α in the iteration (2.33), producing a maximum step length α_max that maintains feasibility. Here, the following ratio test was used,

  α_max = min_{1≤i≤m} { σ_i : σ_i = (g_i^T x_k − b_i)/(−g_i^T p_k), g_i^T p_k < 0 },    (2.37)

where the minimum is taken over all the inactive constraints. The step length computation takes the maximal step α_max into account, so that the length of the step satisfies the conditions

  f(x_k + α p_k) ≤ f(x_k), α ≤ min(1, α_max),    (2.38)

where α is computed using e.g. the Armijo line search. This kind of line search was employed in Paper B for the solution of the local optimization problems for the least-squares QTL models. In Papers E and F, we used a scheme where the Armijo line search is omitted, since the objective function is expensive to evaluate.

Working Set Update
In each iteration of the Active Set scheme, a minimization problem with equality constraints is solved. These constraints are determined by the working set, which can be considered as a tool for approximating the set of constraints that are active at the solution. The working set is updated iteratively, and all other constraints are temporarily ignored. If the current solution satisfies the first-order optimality condition (2.28), then the Lagrangian multipliers at this point are checked. This is done using

  λ* = Ĝ_r^T ∇f(x*),    (2.39)

where Ĝ_r is the right inverse (or generalized inverse) of the matrix of gradients of the active constraints Ĝ, see [32]. The algorithm terminates if the optimality condition (2.29) holds. However, if some Lagrangian multiplier λ_i is negative, then this constraint is dropped from the working set W and the whole procedure is repeated. A constraint is added to the working set if the current step arrives at some point on that constraint.
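The ratio test (2.37) in Python form; returning infinity when no constraint blocks the direction is an implementation convention of ours, not part of the thesis:

    import numpy as np

    def max_feasible_step(G, b, x, p, tol=1e-12):
        # Ratio test (2.37): largest alpha with G(x + alpha*p) >= b,
        # taken over the constraints that decrease along p.
        Gp = G @ p
        slack = G @ x - b                 # nonnegative at a feasible point
        blocking = Gp < -tol
        if not np.any(blocking):
            return np.inf                 # no constraint limits the step
        return float(np.min(slack[blocking] / (-Gp[blocking])))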

2.5 The Downhill Simplex Method

We end this chapter with a brief review of the downhill simplex method, which is used in Paper G in Appendix I. This derivative-free method was first described in [33] and has been very popular in many areas, including variance component estimation by REML [29] and financial applications. Its main attractive feature is that the algorithm is simple and uses only objective function evaluations. However, for problems with costly objective function evaluations, the downhill simplex scheme is not efficient enough. For such problems, the scheme can still be useful as a method for finding a good initial point for more efficient schemes.

The method uses the concept of a simplex, which is a polytope of N + 1 vertices in N dimensions. The scheme generates a new test point by extrapolating the behavior of the objective function measured at the vertices of the simplex, and then replaces one of the vertices with the new test point, and so the algorithm progresses. In each iteration, the worst point is replaced with a point reflected through the centroid of the remaining N points. If this point is better than the current best point, the algorithm tries stretching out further along this line. On the other hand, if the new point is not much better than the previous value, the simplex shrinks towards the best point. The algorithm involves several parameters that determine how far to move along the line and how much to expand or contract the current simplex, depending on whether the search is successful or not.

The downhill simplex method is implemented in Paper G. Here, it is able to find a local minimum rather quickly. However, it was also noted that the method is sensitive to the choice of the position and size of the initial simplex.
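For illustration only (this is not the thesis implementation), SciPy ships a standard implementation of the method; a small example on the Rosenbrock test function:

    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(x):
        return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

    result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]),
                      method='Nelder-Mead',
                      options={'xatol': 1e-8, 'fatol': 1e-8})
    print(result.x)   # approximately (1, 1), the minimizer

The initial_simplex option of this routine makes it possible to reproduce the sensitivity to the starting simplex noted above.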

Chapter 3
Numerical Linear Algebra Algorithms for Variance Component Estimation

The efficiency of the solution of the variance component estimation problem depends both on the efficiency of the optimization scheme as such, i.e. the number of iterations required, and on the efficiency of the algorithms used for computing the derivatives evaluated inside the optimization loop. To optimize the overall procedure, we develop efficient algorithms for the derivative computations. The algorithms used for the numerical linear algebra operations needed for computing the gradient and the Hessian approximation of the log-likelihood function should be adapted to the formulation of the problem, the properties of the variance-covariance matrices, and the number of variance components included in the model.

Computation of derivatives of the log-likelihood involves the inverse of a variance-covariance matrix, matrix-matrix products and trace computations. Below, we review the formulas for the gradient of the log-likelihood and the average information matrix for the model (1.9), including two variance components:

  V = σ_1² Π_1 + σ_2² I,    (3.1)
  P = V^{−1} − V^{−1} X (X^T V^{−1} X)^{−1} X^T V^{−1},    (3.2)
  DL_i = ∂L/∂σ_i² = tr((∂V/∂σ_i²) P) − y^T P (∂V/∂σ_i²) P y,    (3.3)
  AI(σ_i², σ_j²) = y^T P (∂V/∂σ_i²) P (∂V/∂σ_j²) P y,  i, j = 1, 2.    (3.4)

Here, Π_1 is the IBD matrix and X is the design matrix of size n × r, where r ≡ n_f is the number of fixed effects in the model (1.9). In the optimization scheme, the estimates of σ_1², σ_2² are updated in each iteration. This implies that all the quantities (3.1)–(3.4) also need to be recomputed in each step. Here, the most computationally demanding operation is the computation of the inverse of the variance-covariance matrix V in (3.1). Using a standard algorithm, like Cholesky factorization, the computational complexity of computing V^{−1} directly is of the order (4/3) n³. However, for some classes of problems it is possible to avoid a direct computation of V^{−1}. This is done by using the mixed model equations (MME) [16]. In [20, 1], efficient approaches have been developed for performing the computations using the MME. Below, we briefly describe this approach applied to the model (1.9).
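As a concrete reference point, a naive dense NumPy evaluation of (3.1)–(3.4), exactly as the formulas are stated above (the function and variable names are ours); it forms V^{−1} explicitly and therefore carries the O(n³) cost per iteration that the MME-based approach avoids:

    import numpy as np

    def reml_gradient_and_ai(y, X, Pi1, sigma1_sq, sigma2_sq):
        # Dense evaluation of (3.1)-(3.4) for the two-component model.
        n = len(y)
        V = sigma1_sq * Pi1 + sigma2_sq * np.eye(n)           # (3.1)
        Vinv = np.linalg.inv(V)                               # the O(n^3) bottleneck
        VX = Vinv @ X
        P = Vinv - VX @ np.linalg.solve(X.T @ VX, VX.T)       # (3.2)
        dV = [Pi1, np.eye(n)]                                 # dV/d(sigma_i^2), i = 1, 2
        Py = P @ y
        grad = np.array([np.trace(dVi @ P) - Py @ dVi @ Py    # (3.3)
                         for dVi in dV])
        AI = np.array([[Py @ dVi @ P @ (dVj @ Py)             # (3.4)
                        for dVj in dV] for dVi in dV])
        return grad, AI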
