
Development of Variance Component Methods for Genetic Dissection of Complex Traits


Academic year: 2021



Department of Cell and Molecular Biology, Bioinformatics, Uppsala University


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I. Besnier, F., Carlborg, Ö. (2007) A general and efficient method for estimating continuous IBD functions for use in genome scans for QTL. BMC Bioinformatics, 8:440

II. Rönnegård, L., Besnier, F., Carlborg, Ö. (2007) An improved method for Quantitative Trait Loci detection and identification of within-line segregation in F2 intercross designs. Genetics, 178:2315-2326

III. Besnier, F., Carlborg, Ö. (2009) A genetic algorithm based haplotyping method provides better control on haplotype error rate. (Manuscript)

IV. Besnier, F., Wahlberg, P., Rönnegård, L., Ek, W., Andersson, L., Siegel, L., Carlborg, Ö. (2009) Fine mapping and replication of QTL in a chicken Advance Intercross Line. (Manuscript)

Reprints were made with permission from the respective publishers.

Contents

INTRODUCTION
    Introduction to QTL mapping, a linear regression approach
    QTL mapping by a variance component approach
AIMS
SUMMARY OF INCLUDED REPORTS
    Further developments of the variance component model (paper I and II)
        A general and efficient method for estimating continuous IBD functions for use in genome scans for QTL (Paper I)
        An improved method for quantitative trait loci detection and identification of within-line segregation in F2 intercross designs (Paper II)
    Variance component analysis of deep pedigrees (paper III and IV)
        A genetic algorithm based haplotyping method provides better control on haplotype error rate (paper III)
        Fine mapping and replication of QTL in a chicken Advance Intercross Line (paper IV)
CONCLUSION
FUTURE PROSPECTS
    Ongoing work
    Near future
    Looking ahead
Acknowledgments
References

INTRODUCTION

All the individuals of any given species, with the exception of monozygotic twins, differ in many phenotypic traits (size, skin or hair color, behavior…). Since the rediscovery of Mendel's work in 1900, it has been understood that the differences observed between individuals, within and between species, are due to inherited genetic factors, and that understanding how and why individuals differ from each other requires the study of their genes.

Phenotypic traits are usually divided into two categories. On the one hand, discrete traits are measured on a scale containing a pre-determined number of classes (e.g. blue / green / brown for eye color, or A/B/O for blood groups). On the other hand, continuous traits take a virtually infinite number of possible values on a continuous scale (e.g. body weight, hormone content in blood…). This apparently arbitrary distinction between discrete and continuous traits is particularly pertinent for understanding the underlying genetic mechanism, as most discrete phenotypes are determined by a single gene, whereas continuous phenotypes are driven by the combined effects of several genes.

The association of discrete traits with monogenic factors on the one side, and of continuous traits with polygenic factors on the other, holds in most situations, although some exceptions exist. Monogenic traits can appear continuous when they are largely influenced by the environment, and continuous phenotypes can be "discretized" by applying a relevant threshold. An example of the latter is given by epidemiological traits that are indicators of patients' health: phenotypes below the threshold are considered healthy, whereas values above it are interpreted as symptoms of disease. For instance, blood glucose level is a continuous trait, and diabetes is the disease diagnosed when the glucose level exceeds the threshold value; there is a similar relationship between blood cholesterol level and coronary disease.

A summary of the different possible traits and their underlying genetic architecture is given in Table 1.

Table 1. Examples of discrete/continuous traits and their underlying genetic architecture (monogenic/polygenic).

                Mendelian (monogenic trait)    Complex (polygenic trait)
    Discrete    Wrinkled / smooth peas         Diabetes, litter size
    Continuous  X                              Body weight, cholesterolemia

In the case of monogenic traits, genes exist in several versions, or alleles; thus allele A would generate one phenotype (e.g. smooth surface in peas) whereas allele B would generate a second phenotype (wrinkled surface). This well-known example in peas was first described by Gregor Mendel in 1866, and laid the basis for understanding genetic inheritance. Hence, traits driven by a single gene are called Mendelian traits in recognition of his contribution.

Continuous traits, in contrast, are phenotypic traits that are influenced by the combined effect of several genes. The study of those traits is therefore more challenging than the study of Mendelian ones, and they are thus often referred to as "complex" or multifactorial traits. As many complex traits are of economic value, for example animal body weight for meat production or milk yield, the genetics of complex phenotypes have been extensively studied since the early 20th century. Although the resemblance between members of the same family had already been investigated by means of statistical correlation by Galton (1822-1911) and Pearson (1857-1936), the first theoretical framework of quantitative genetics was the infinitesimal model [Fisher, 1918]. In this model, the phenotype is assumed to be determined by an infinite number of unlinked and non-interacting loci, each contributing an infinitesimal effect [Falconer and Mackay 1996]. All loci are assumed to have equivalent effects and allele frequencies and to be interchangeable, so that the effect of all loci can be summarized in a single variable. This model allows the genetic contribution to phenotypic variance to be calculated, as well as the population's response to artificial selection.
Although it has proven to be very successful when applied to the genetic improvement of domestic breeds, this model gives little insight into the actual mechanisms underlying complex traits. One alternative for improving our understanding of multifactorial traits consists in dissecting the genetics underlying those traits, first by isolating genes that have a notable influence on the quantitative phenotype (major genes), and second by quantifying the contribution of every gene to the phenotype in terms of individual gene effects and possible interaction effects. The idea of major genes therefore represents an exception to the infinitesimal framework, since the phenotype is no longer determined only by loci with infinitesimal effects. Instead, a given proportion of the phenotypic variance is explained by one or several major genes, whereas the remaining proportion is assumed to be determined by a large number of genes having a small effect. The distinction between major and minor genes does not rely on any functional difference between two kinds of genes, but on the magnitude of their effects, which determines whether those loci can be detected [Falconer and Mackay 1996].

As the effect of a major gene often accounts for only a small proportion of the phenotypic variance, the power to detect major genes remains an issue. The dissection of complex genetic traits represents a challenge that, considering current knowledge, can only be met by combining several disciplines, including molecular biology, experimental population design, statistics and computer science. The first step, which consists in detecting a chromosome region or locus that has a significant influence on a quantitative trait, is referred to as Quantitative Trait Locus (QTL) mapping.

Introduction to QTL mapping, a linear regression approach

The general principle of QTL mapping consists in measuring a given phenotype for each individual in a population, and in collecting information about the genotype of those individuals at several locations in the genome. The correlation between genetic and phenotypic variance is assessed at regular intervals along the genome, with a high correlation indicating a higher likelihood of the presence of a QTL affecting the measured phenotype. The resolution of QTL mapping depends on two factors: the availability of dense sets of polymorphic markers that provide genotype information at regular intervals along the genome, and the availability of populations with sufficiently strong Linkage Disequilibrium (LD) for the information provided by a given marker to be extrapolated to the neighboring chromosome region.

Recent advances in molecular genetics have increased the density and abundance of available genetic markers. Together with the reduced cost of high-throughput genotyping, this makes it possible to obtain high-resolution QTL scans using dense genotype information for hundreds or thousands of individuals. A well-known way to obtain a level of linkage disequilibrium that matches the density of the markers used is to generate an experimental population. The most widely used procedure starts with a cross between groups of individuals from two inbred lines. As described in Fig. 1a, animals from the same line are expected to share a similar genetic background. After the first generation of intercrossing, the resulting individuals (F1) are heterozygous at all loci, with one chromosome of each pair coming from each parental line. After the second generation of intercross (F2), recombination in the parental gametes generates offspring with fragmented chromosomes that show either of the two possible line origins at different chromosome locations. Three different genotypes are then possible for the F2 individuals: A/A for homozygotes of line A origin, B/B for homozygotes of line B origin, and A/B for heterozygotes.

Figure 1a. Representation of chromosomes in an inbred cross population experiment. The founders are sampled from two inbred lines of animals with different genetic backgrounds. Genetic material from line A is colored in light grey, and material from line B is colored in dark grey.

Since the genotype can vary between individuals and between chromosome positions, tracing back the line origin of each allele in the F2 pedigree allows the genetic variability at each locus to be measured. The correlation between genotypic and phenotypic variability can subsequently be estimated as illustrated in Fig. 1b.

Figure 1b. Regression of phenotype against genotype. For each analyzed position, the phenotype of each individual is plotted against the genotype (A/A, A/B or B/B). The fit of the regression is quantified by the least-squares correlation coefficient (r). A high value of r indicates a high correlation between genetic and phenotypic variance, and therefore a high likelihood for the presence of a QTL.

The correlation between genotype and phenotype, measured as the least-squares correlation coefficient (r), can be reported for each tested position on the chromosome as in Fig. 1c. The curve representing the likelihood for the presence of a QTL at all tested positions along the chromosome is often referred to as a chromosome scan.
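The per-position regression described above can be sketched in a few lines. This is a minimal illustration rather than the procedure of any cited paper: the additive coding A/A = 0, A/B = 1, B/B = 2, the function names and the toy data are all assumptions made for the example, and genotypes are taken as known without error.

```python
from math import sqrt

# Assumed additive coding of the three F2 genotypes (illustrative choice).
GENOTYPE_CODE = {"AA": 0, "AB": 1, "BB": 2}


def pearson_r(x, y):
    """Least-squares correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation at this position: no information
    return sxy / sqrt(sxx * syy)


def chromosome_scan(genotypes_by_position, phenotypes):
    """Return r between coded genotype and phenotype at every tested position."""
    return [
        pearson_r([GENOTYPE_CODE[g] for g in genotypes], phenotypes)
        for genotypes in genotypes_by_position
    ]


# Hypothetical toy data: six F2 individuals, two tested positions.
phenotypes = [10.0, 12.0, 14.0, 11.0, 13.0, 15.0]
genotypes_by_position = [
    ["AA", "AB", "BB", "AA", "AB", "BB"],  # tracks the phenotype closely
    ["AB", "AA", "AB", "BB", "AB", "AA"],  # weakly related to the phenotype
]
scan = chromosome_scan(genotypes_by_position, phenotypes)
peak = max(range(len(scan)), key=lambda i: abs(scan[i]))
```

Plotting `scan` against position would give a curve of the kind shown in Fig. 1c; `peak` is the index of the position with the strongest genotype-phenotype correlation.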

Figure 1c. Chromosome scan for QTL. The correlation between genotype and phenotype is reported at all tested positions as an indicator of the presence of a QTL.

The previous example corresponds to QTL mapping based on linear regression of the phenotype on the genotype in an experimental population, as described in [Haley and Knott 1992, Martinez and Curnow 1992]. The genotype is modeled as a fixed effect as in equation 1,

$$ y = X\beta + e \tag{1} $$

where $y$ is the vector of phenotypes, $\beta$ is the vector of fixed effects, $X$ is the incidence matrix for the fixed effects, which include at least the genetic effects and possibly other effects such as sex or generation of the animals, and $e$ represents the residuals. An equivalent model is fitted at each tested chromosome position as described in the previous section. The hypothesis that a QTL affecting the trait is present can be tested by ANOVA at each position and corrected afterward for multiple testing [Lander and Kruglyak 1995, De Koning et al 1999], or by means of permutation [Churchill and Doerge 1994].

QTL mapping by a variance component approach

Linear regression based QTL mapping is a powerful approach that relies on the availability of an experimental population in which the two founder lines are assumed to be fixed for alternative QTL alleles. When this condition is not fulfilled, i.e. when QTL alleles are not fixed within the founder lines, alternative and more flexible approaches are often preferred.

A flexible method for detecting QTL in a broader range of cases and pedigree structures is the variance component (VC) approach, where the QTL effect is modeled as a random effect [Fernando and Grossman 1989, Goldgar 1990]. Since fixed effects like sex or generation of the animals are often included, a mixed model is fitted as in equation 2,

$$ y = X\beta + v + e \tag{2} $$

where $y$ is the vector of phenotypes, $\beta$ is the vector of fixed effects, $X$ is the incidence matrix for the fixed effects, $v$ is the random genetic effect, and $e$ the residuals. $y$ is then assumed to follow a multivariate normal distribution, $y \sim MVN(X\beta, V)$, where $V$ is the variance-covariance matrix of $y$. $V$ is given by

$$ V(y) = A\sigma_v^2 + I\sigma_e^2 \tag{3} $$

where $A$ is the genotype IBD (Identity By Descent) matrix, $\sigma_v^2$ is the genetic variance due to the QTL, $I$ is the identity matrix, and $\sigma_e^2$ is the residual variance. As in linear regression based QTL mapping, the model is fitted at each tested position in the genome. The likelihood is given by the multivariate normal density, as in equation 4:

$$ l(y \mid X\beta, \sigma_v^2, \sigma_e^2) = \frac{1}{\sqrt{(2\pi)^n |V|}} \exp\left( -\frac{1}{2} (y - X\beta)^T V^{-1} (y - X\beta) \right) \tag{4} $$

The QTL scan thus consists in finding, at each tested position, the estimates of $\sigma_v^2$ and $\sigma_e^2$ that maximize the likelihood of $y$. An estimate of $\sigma_v^2$ that is significantly greater than zero indicates the presence of a QTL affecting the phenotype $y$. Maximizing equation 4 is not trivial, and numerical approaches are required to estimate the variance components $\sigma_v^2$ and $\sigma_e^2$ [Lynch and Walsh 1998, Johnson and Thompson 1995, Dempster et al 1977, Harville 1977]. The different possible methods will not be discussed in detail here. It is however important to note that, among genome positions, the only change in the likelihood expression of equation 4 is in the matrix $A$, which is included in $V$. $A$ is the genotype IBD matrix estimated at each tested genome position. It is because equation 4 is solved with a different $A$ matrix at each position that the variance component estimates $\sigma_v^2$ and $\sigma_e^2$ vary among positions. $A$ is therefore a key element of the variance component approach: after being estimated from genetic marker and pedigree information [Wang et al 1995, Pong-Wong et al 2001, Heath 1997, Abecasis et al 2002], it is via the $A$ matrix that the individuals' genotype information is included in the VC model. The following section describes the structure of the IBD matrix in more detail.

The genotype IBD matrix is a square matrix of dimension n, where n is the number of individuals in the population. The value at row i and column j of the matrix corresponds to the expected proportion of the genotype shared Identical By Descent (IBD) between individual i and individual j, i.e. half the expected number of alleles shared IBD; $A$ is therefore a symmetric matrix. Identical By Descent (IBD) refers to alleles that have been inherited from the same known ancestor in the pedigree, in contrast to Identical By State (IBS), which refers to alleles that display the same molecular genetic signature but cannot necessarily be traced back to the same ancestor within the pedigree, as illustrated in Figure 2.

In Figure 2a the pedigree consists of two families of two generations each. Individuals 2.1 and 2.2 share the same allele (in green), inherited from individual 1.2. This allele is thus shared IBD between 2.1 and 2.2. Individuals 2.1 and 2.3 also share an allele (in blue), but since no common ancestor can be found between those individuals, 2.1 and 2.3 share this allele IBS but not IBD. If the information about the previous generation is included in the pedigree, as in Figure 2b, we can see that all the blue alleles can be traced back to the same ancestor (individual 0.1). From this new information, we can conclude that all the "blue alleles" of the pedigree are IBD. The IBD coefficient between 2.1 and 2.3, which was zero in Figure 2a, therefore becomes 0.5 in Figure 2b, where one additional generation has been included in the pedigree.

The IBD matrices are reported for each pedigree in Figure 2. A value equal to zero means that no alleles are shared IBD between a pair of individuals, 0.5 means that one allele is shared IBD, i.e. one half of the genotype, and 1 means that the individuals share two alleles IBD. In case of inbreeding the IBD coefficient can be greater than 1, and it is equal to 2 when a homozygous individual has two copies of the same allele IBD (e.g. individual 2.4 in Figure 2b).
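The coefficient values listed above (0, 0.5, 1 and 2) can be reproduced with a small sketch. It assumes the idealized case in which the founder origin of every allele is known with certainty; in practice the IBD matrix holds expectations over uncertain inheritance. The allele labels and individual names below are hypothetical and only loosely mirror Figure 2.

```python
def ibd_coefficient(alleles_i, alleles_j):
    """Half the number of allele pairs (one allele from each individual)
    that carry the same founder label, i.e. that are identical by descent."""
    matches = sum(1 for a in alleles_i for b in alleles_j if a == b)
    return matches / 2


def ibd_matrix(founder_labels):
    """Build the symmetric IBD matrix for a dict: individual -> (allele, allele)."""
    ids = sorted(founder_labels)
    matrix = [
        [ibd_coefficient(founder_labels[i], founder_labels[j]) for j in ids]
        for i in ids
    ]
    return ids, matrix


# Hypothetical two-generation view: the two "blue" alleles have different
# labels because no common ancestor is recorded in the pedigree.
shallow = {
    "2.1": ("green", "blue_a"),
    "2.2": ("green", "red"),
    "2.3": ("blue_b", "yellow"),
    "2.4": ("black", "black"),  # inbred: both copies from the same ancestor
}
# Adding one generation shows both blue alleles descend from one ancestor.
deep = dict(shallow)
deep["2.1"] = ("green", "blue")
deep["2.3"] = ("blue", "yellow")

ids, A_shallow = ibd_matrix(shallow)
_, A_deep = ibd_matrix(deep)
```

With these labels, the entry for the pair (2.1, 2.3) moves from 0 in `A_shallow` to 0.5 in `A_deep`, and the diagonal entry of the inbred individual 2.4 equals 2, matching the cases described in the text.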

Figure 2. Schematic representation of the allelic inheritance and corresponding IBD matrix through a two generation pedigree (Fig. 2a) and through a three generation pedigree (Fig. 2b).

Compared with the linear regression method, the variance component method is a more general and flexible approach for detecting QTL in various types of populations. However, two computationally demanding steps must be performed at each genome position tested for the presence of a QTL: first, estimation of the IBD matrix at the given chromosome location, and second, computation of the maximum likelihood estimates of the variance components. The variance component approach is thus considerably more computationally demanding than linear regression QTL mapping.
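To make the second step concrete, the sketch below evaluates the logarithm of the multivariate normal likelihood of equation 4 for V = Aσv² + Iσe² and picks the best pair of variance components on a coarse grid. The grid search is a deliberately naive stand-in for the numerical methods cited in the text, the residuals y − Xβ are assumed to be given, and all names and toy values are hypothetical.

```python
from math import log, pi, sqrt


def cholesky(M):
    """Lower-triangular L with L L^T = M, for a positive definite matrix M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L


def log_likelihood(resid, A, var_v, var_e):
    """Log of equation 4, with resid = y - X beta and V = A*var_v + I*var_e."""
    n = len(resid)
    V = [[A[i][j] * var_v + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(V)
    # Solve L z = resid by forward substitution; then resid^T V^-1 resid = z.z
    z = []
    for i in range(n):
        z.append((resid[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(log(L[i][i]) for i in range(n))
    return -0.5 * (n * log(2 * pi) + log_det + sum(v * v for v in z))


def grid_search(resid, A, var_v_grid, var_e_grid):
    """Return (log-likelihood, var_v, var_e) maximizing equation 4 on the grid.
    var_e values must be positive so that V stays positive definite."""
    best = None
    for var_v in var_v_grid:
        for var_e in var_e_grid:
            ll = log_likelihood(resid, A, var_v, var_e)
            if best is None or ll > best[0]:
                best = (ll, var_v, var_e)
    return best


# Hypothetical toy data: two unrelated pairs of individuals sharing one allele.
A = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
resid = [1.0, 0.8, -0.9, -1.1]
best = grid_search(resid, A, [0.0, 0.5, 1.0, 2.0], [0.25, 0.5, 1.0, 2.0])
```

In a genome scan this evaluation would be repeated with a different `A` at every tested position, which is why both the cost of building `A` and the cost of each likelihood maximization matter.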

AIMS

The aims of this thesis are to further develop the variance component QTL mapping framework into a more computationally efficient and statistically powerful tool when applied to experimental data, and to use the new methods to dissect the genetic architecture of complex traits in outbred intercross populations. The reports included in this thesis have addressed this by:

I. Improving the computational efficiency of the IBD matrix estimation.

II. Developing the VC model to include hypothesis testing about the level of fixation of the alleles in outbred populations.

III. Developing a haplotype estimation method adapted for the analysis of deep pedigrees.

IV. Applying the newly developed methods in the analysis of an experimental chicken pedigree consisting of nine successive generations after intercross between two phenotypically divergent lines.

SUMMARY OF INCLUDED REPORTS

Further developments of the variance component model (paper I and II)

One limitation of variance component based QTL mapping is the computational work of testing multiple genomic positions in large pedigrees. Because the IBD matrix is a square array of dimension n (the number of individuals), the time needed to calculate the matrix and the memory space needed to store it are proportional to the square of n. Consequently, the computational work needed to carry out a genome scan for a single QTL on a given pedigree increases linearly with the number of tested positions, but with the square of the number of individuals. It is therefore important to provide fast and efficient methods to compute IBD matrices. A solution to this problem is presented in the first paper of this thesis.

A second limitation of the VC approach is that the effects of each allele from the base generation of the pedigree are modeled independently [Fernando and Grossman 1989, Goldgar 1990]; no particular assumption is therefore made about the population structure. In experimental populations, however, the line origin of the QTL alleles is expected to have a significant effect on the allelic effects. To address this problem, alternative models mixing a fixed line effect and a random QTL effect have been proposed [Perez-Enciso and Varona 2000, Wang et al 1998]. In the second paper of this thesis, we develop a variance component method to estimate the genetic correlation within founder lines from the marker, pedigree and phenotype information of the experimental cross.

A general and efficient method for estimating continuous IBD functions for use in genome scans for QTL (Paper I)

A new method for estimating IBD matrices, inspired by [Pong-Wong et al 2001], has been implemented. As in [Pong-Wong et al 2001], the IBD coefficients are estimated recursively from ancestors to descendants through the pedigree, in a deterministic way. This method combines two advantages: it is faster than stochastic approaches that require multiple iterations [Heath 1997, Abecasis et al 2002], and it is flexible, e.g. it can easily include a genetic covariance structure for the individuals in the founder population. The covariance among founders can be computed independently, based on population history [Meuwissen and Goddard 2000].

From [Pong-Wong et al 2001] it is easy to show that the IBD probabilities at a given location can be expressed as a continuous function of the distance to the next flanking informative marker. We show how to calculate this function, either by derivation of the exact IBD function, or by approximating it using a limited number of single-locus IBD values. In Figure 3 we illustrate the resemblance of our estimated IBD function to the results obtained when estimating several single-locus IBD probabilities in the same marker interval using the LOKI program [Heath 1997].

Figure 3. Estimation of two polynomial IBD functions for two different pairs of individuals in a marker bracket, using the IBD values at the right and left flanking markers and at the mid-point of the interval as input.

If several IBD matrices have to be estimated within the same marker interval, computing the IBD function results in a better use of computer resources: the IBD probabilities are summarized in an IBD function file that requires less memory space than storing several single-locus IBD matrices. Moreover, each IBD matrix can be rapidly calculated from the IBD function file without needing to be stored. This last point can improve the efficiency of genome scans for multiple interacting QTL, where every pair of loci must be tested. Pairwise testing of each locus implies importing the same IBD matrix several times into the variance component program, which slows the flow of information between the storage drives and the CPU of a computer. Estimating an IBD function instead of single-point IBD values thus increases the efficiency of IBD matrix estimation in genome scans for QTL, and facilitates further improvements by resolving methodological bottlenecks in algorithms that scan for multiple QTL.
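The approximation described in the caption of Figure 3 can be illustrated as follows. The sketch assumes that the polynomial is the unique second-degree polynomial passing through the three input values (left marker, mid-point, right marker); the degree is an assumption made here, since the text above only states that the function is polynomial. The function name and numerical values are hypothetical.

```python
def ibd_function(ibd_left, ibd_mid, ibd_right, interval_length):
    """Return f(x): a quadratic through the IBD values at x = 0,
    x = interval_length / 2 and x = interval_length (Lagrange form)."""

    def f(x):
        t = x / interval_length  # rescale the position to [0, 1]
        basis_left = 2.0 * (t - 0.5) * (t - 1.0)
        basis_mid = -4.0 * t * (t - 1.0)
        basis_right = 2.0 * t * (t - 0.5)
        return (ibd_left * basis_left
                + ibd_mid * basis_mid
                + ibd_right * basis_right)

    return f


# Hypothetical IBD values for one pair of individuals in a 10 cM bracket.
f = ibd_function(ibd_left=0.5, ibd_mid=0.3, ibd_right=0.4, interval_length=10.0)

# The IBD value at any tested position is then computed on demand,
# instead of being stored as one matrix per position.
values = [f(x) for x in (0.0, 2.5, 5.0, 7.5, 10.0)]
```

Storing three coefficients per pair of individuals and per marker bracket, rather than one value per tested position, is what makes the function file smaller than a collection of single-locus matrices.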

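An improved method for quantitative trait loci detection and identification of within-line segregation in F2 intercross designs (Paper II)

In order to estimate the genetic correlation within founder lines in an experimental cross, a new variance component model has been developed. An alternative notation for equation 2 is

$$ y = X\beta + Zv^* + e \tag{5} $$

[Rönnegård and Carlborg 2007], where $v^*$ is the vector of $m$ independent normally distributed base generation QTL alleles with variance $\frac{1}{2}\sigma_v^2$, and $Z$ is an incidence matrix of size $n \times m$ relating individuals to the base generation alleles.

In an experimental line cross population we set up a mixed linear model as in equation 5, where $v^*$ is a random effect with $m$ levels. $m$ can be decomposed into $m_A + m_B$, $m_A$ being twice the number of individuals in line A, and $m_B$ twice the number of individuals in line B. An unknown correlation $\rho$ is expected between the alleles of the same line origin, such that $v^* \sim MVN(0, G)$. For a pedigree consisting of one founder in line A and three founders in line B, $G$ is written as

$$ G = \frac{1}{2} \begin{pmatrix}
\sigma_v^2 & \sigma_c^2 & 0 & 0 & 0 & 0 & 0 & 0 \\
\sigma_c^2 & \sigma_v^2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_v^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 \\
0 & 0 & \sigma_c^2 & \sigma_v^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 \\
0 & 0 & \sigma_c^2 & \sigma_c^2 & \sigma_v^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 \\
0 & 0 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_v^2 & \sigma_c^2 & \sigma_c^2 \\
0 & 0 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_v^2 & \sigma_c^2 \\
0 & 0 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_c^2 & \sigma_v^2
\end{pmatrix} \tag{6a} $$

or alternatively, with $\rho = \sigma_c^2 / \sigma_v^2$,

$$ G = \frac{\sigma_v^2}{2} \begin{pmatrix}
1 & \rho & 0 & 0 & 0 & 0 & 0 & 0 \\
\rho & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \rho & \rho & \rho & \rho & \rho \\
0 & 0 & \rho & 1 & \rho & \rho & \rho & \rho \\
0 & 0 & \rho & \rho & 1 & \rho & \rho & \rho \\
0 & 0 & \rho & \rho & \rho & 1 & \rho & \rho \\
0 & 0 & \rho & \rho & \rho & \rho & 1 & \rho \\
0 & 0 & \rho & \rho & \rho & \rho & \rho & 1
\end{pmatrix} \tag{6b} $$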
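The block structure of G for one founder in line A and three founders in line B can be built programmatically, which also makes the equivalence of the two notations (6a and 6b) easy to check. The sketch assumes that base alleles are ordered line by line and that the correlation is the ratio of the within-line covariance to the allelic variance; function names and numerical values are hypothetical.

```python
def line_block(n_alleles, diagonal, off_diagonal):
    """Square block for the base alleles of one founder line."""
    return [[diagonal if i == j else off_diagonal for j in range(n_alleles)]
            for i in range(n_alleles)]


def block_diagonal(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


def g_from_variances(n_founders_a, n_founders_b, var_v, var_c):
    """Equation 6a: G = 1/2 * blockdiag, with var_v on and var_c off the diagonal."""
    blocks = [line_block(2 * n, var_v, var_c) for n in (n_founders_a, n_founders_b)]
    return [[0.5 * x for x in row] for row in block_diagonal(blocks)]


def g_from_correlation(n_founders_a, n_founders_b, var_v, rho):
    """Equation 6b: G = var_v / 2 * blockdiag, with 1 on and rho off the diagonal."""
    blocks = [line_block(2 * n, 1.0, rho) for n in (n_founders_a, n_founders_b)]
    return [[0.5 * var_v * x for x in row] for row in block_diagonal(blocks)]


# Hypothetical variance values; rho follows as var_c / var_v.
var_v, var_c = 2.0, 0.5
G_a = g_from_variances(1, 3, var_v, var_c)
G_b = g_from_correlation(1, 3, var_v, var_c / var_v)
same = all(abs(G_a[i][j] - G_b[i][j]) < 1e-12 for i in range(8) for j in range(8))
```

Setting `var_c` to zero recovers independent base alleles, while `var_c` equal to `var_v` corresponds to alleles that are fixed within each line.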

Hence, $y \sim MVN(X\beta, V)$, where

$$ V = ZGZ' + I\sigma_e^2 \tag{7} $$

Estimating $\sigma_v^2$ and $\sigma_c^2$ is achieved by decomposing $ZGZ'$ into $\Pi_I\sigma_v^2 + (\Pi_J - \Pi_I)\sigma_c^2$. Thus

$$ V = \Pi_I\sigma_v^2 + (\Pi_J - \Pi_I)\sigma_c^2 + I\sigma_e^2 \tag{8} $$

where $\Pi_I$ and $\Pi_J$ are the IBD matrices calculated under the hypothesis of independent or fixed alleles within lines, respectively.

The power to detect QTL has been compared between our new Flexible Intercross Analysis (FIA) and a linear regression based QTL mapping method [Haley and Knott 1992] in four different simulated pedigrees with four levels of fixation (Figure 4). The power to detect QTL is equivalent between the two methods when the alleles are fixed within lines, and decreases for both methods when the level of fixation decreases. The power of the FIA method is always at least as high as that of the linear regression method. The difference is larger for pedigrees with a small number of founders and a large F2 population, whereas for pedigrees generated from a large number of founders the power of the two methods is affected similarly. Tested on experimental data [Lundström et al. 1995], the FIA method showed a substantial gain in power compared to linear regression QTL mapping [Haley and Knott 1992], as the QTL responsible for meat quality in a Wild Boar × Large White F2 cross was detected using FIA but not with Haley-Knott regression.
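The decomposition of ZGZ' used in equation 8 can be checked numerically on a small case. The sketch below takes Π_I = ½ZZ' (base alleles independent) and Π_J = ½ZBZ' (alleles fixed within lines), where B is an indicator matrix with ones for pairs of base alleles from the same line. These explicit constructions, the matrix B and the toy incidence matrix are assumptions made for the example.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(row) for row in zip(*X)]


def same_line_indicator(line_sizes):
    """B[a][b] = 1 if base alleles a and b come from the same founder line."""
    labels = [line for line, size in enumerate(line_sizes) for _ in range(size)]
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def scaled(X, factor):
    return [[factor * x for x in row] for row in X]


# One founder in line A (base alleles 0-1), three in line B (base alleles 2-7).
B = same_line_indicator([2, 6])
m = len(B)
var_v, var_c = 2.0, 0.5

# G as in equation 6a: var_v/2 on the diagonal, var_c/2 within a line, 0 elsewhere.
G = [[0.5 * (var_v if a == b else var_c * B[a][b]) for b in range(m)]
     for a in range(m)]

# Hypothetical incidence matrix Z: each row marks the two base alleles
# carried by one individual (n = 3 individuals, m = 8 base alleles).
Z = [[1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
Zt = transpose(Z)

pi_i = scaled(matmul(Z, Zt), 0.5)             # independent base alleles
pi_j = scaled(matmul(matmul(Z, B), Zt), 0.5)  # alleles fixed within lines

left = matmul(matmul(Z, G), Zt)
right = [[pi_i[i][j] * var_v + (pi_j[i][j] - pi_i[i][j]) * var_c
          for j in range(len(Z))] for i in range(len(Z))]
max_diff = max(abs(left[i][j] - right[i][j])
               for i in range(len(Z)) for j in range(len(Z)))
```

Here `max_diff` is zero up to rounding error, so the two variance components enter V linearly through two fixed matrices, which is the form that standard variance component estimation expects.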

Figure 4. Power to detect QTL with Haley-Knott regression (HK) and FIA for four simulated cases, from total fixation (case 1) to complete segregation (case 4). Four different F2 pedigrees were simulated with a large (50 founders) or small (4 founders) base generation, and a large (800 individuals) or small (200 individuals) F2 generation. A QTL was simulated at a fully informative marker (solid lines), or between two fully informative markers 40 cM apart (dashed lines).

Variance component analysis of deep pedigrees (paper III and IV)

The F2 intercross design is a powerful approach for detecting QTL on broad chromosome regions. Because the chromosomes in the F2 generation have undergone a single round of recombination (in the gametes of the F1 generation), the linkage disequilibrium (LD) is strong in the F2 population [Lynch and Walsh 1998]. As a consequence, the genotype information given by molecular markers is strongly correlated along the chromosomes, and the marker density does not need to be high in order to detect QTL: one marker per 10 or 20 centimorgan (cM) is sufficient [Jensen 1989]. However, the resulting confidence intervals for the QTL locations are large [Darvasi et al 1993], with QTL regions covering several hundred genes.

(167) In order to improve the resolution in the QTL scan, it is necessary to use a denser genetic marker map together with a population where the level of LD is lower. To reduce LD in the F2, one can breed additional generations. Repeated intercrossing starting with F2 individuals to generate an F3 population and so on will generate an Advanced Intercross Line (AIL) pedigree [Darvasi and Soller 1995, Yu et al 2007, Jerzy et al 2006]. Each generation of intercross will introduce new recombination events and decrease the LD. QTL analysis of AIL pedigrees will thus provide better resolution than a QTL scan based on an F2 population. It is important to note that when the number of successive generations increases in the pedigree, the task of tracing the inheritance pathways of the alleles from the last generation to the founders becomes more challenging as the number of possible pathways increases. Moreover, the genetic map that will provide marker information at cM or sub-cM interval will in practice be based on Single Nucleotide Polymorphism (SNP) markers, which normally only display a binary polymorphism. Those markers are therefore less informative than e.g. microsatellite markers that might have a larger number of alleles at each locus [Grapes et al 2006]. A multigenerational pedigree together with binary markers will contribute to increase the difficulty in computing accurate estimates of the IBD relationship between individuals from their genotype. Our deterministic algorithm (paper I) computes IBD matrices using data from fully informative markers, i.e. markers where the alleles can be unambiguously traced back from offspring to grandparents [Pong Wong et al 2001], and discards markers that do not fulfill this criteria. It is then expected that when analyzing a deep pedigree with SNP marker information only, the algorithm will discard many markers, and thus make a poor use of the available information. 
To address this problem, we propose a new implementation of the algorithm that computes IBD probabilities based on haplotype information in addition to genotype and pedigree. By assigning a parental origin to each allele, the haplotype adds an extra level of information that increases the number of informative markers used to estimate the IBD. In the third paper of this thesis, we describe a new haplotype estimation algorithm that was developed to minimize the risk of haplotyping errors in deep pedigrees. In the fourth paper, we describe the analysis of a nine-generation AIL pedigree resulting from crossing two chicken lines subjected to more than 40 generations of bi-directional selection for increased and decreased body weight.
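The decrease of LD with each additional generation of intercrossing, described above, can be illustrated with a small calculation. The sketch below is not taken from the thesis: it assumes random mating in a large population, so that LD between two loci shrinks by a factor (1 - r) per round of recombination, and it uses the Haldane map function to convert a map distance into the recombination fraction r. The function names and the generation counts in the loop are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Convert a map distance in cM to a recombination fraction (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def ld_remaining(distance_cm: float, recombination_rounds: int) -> float:
    """Fraction of the initial LD left after a number of recombination rounds.

    Assumes random mating and no drift: D_t = D_0 * (1 - r) ** t.
    """
    r = haldane_recombination_fraction(distance_cm)
    return (1.0 - r) ** recombination_rounds


if __name__ == "__main__":
    # Illustrative marker spacings and numbers of recombination rounds.
    for distance in (1.0, 10.0, 20.0):
        fractions = [ld_remaining(distance, t) for t in (1, 3, 8)]
        print(distance, [round(f, 3) for f in fractions])
```

Under these assumptions, LD between loosely linked loci erodes quickly over generations while LD between tightly linked loci persists, which is why a denser marker map is needed to exploit the extra recombination events in an AIL.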

(168) A genetic algorithm based haplotyping method provides better control of the haplotype error rate (paper III)

A new algorithm has been implemented to estimate the chromosome haplotypes in a population, based on marker genotype and pedigree information. The haplotyping method relies on a mixture of a rule-based deterministic approach [Qian and Beckmann 2002] and a genetic algorithm approach [Tapadar et al 2000, Levine 1996]. The marker phases that can be inferred with certainty are resolved with the deterministic rules, whereas uncertain cases are resolved iteratively via a genetic algorithm procedure. Each uncertain marker phase is inferred several times in order to assess the robustness of the haplotype at a given location. Less robust haplotypes are discarded in order to minimize the risk of propagating erroneous marker phases through the consecutive generations of the pedigree.

The accuracy of our new method has been tested on a simulated version of the AIL pedigree analyzed in paper IV. In Figure 5, the proportion of correct marker phases (Figure 5.a) and the proportions of haplotype errors and unresolved cases (Figure 5.b) are plotted against the stringency criterion value used to determine whether a marker phase is robust enough to be trusted. Increasing the stringency reduces the number of inferred haplotypes and the number of erroneous marker phases, but increases the proportion of unresolved cases. Comparison of our algorithm with two other haplotyping methods indicated that, with the stringency criterion equal to 0.75, results from our algorithm are comparable to those of a deterministic method [Hernández-Sánchez and Knott 2009], with 97.7% correct haplotypes, whereas MERLIN, a likelihood-based method [Abecasis et al 2002], seems to leave a higher number of cases unresolved.

This haplotype estimation algorithm represents an important step before starting the analysis of deep experimental pedigrees for QTL detection.
Adapted to multigenerational pedigrees, the algorithm provides robust haplotype information that increases the information content of the marker data used by our IBD estimation program.
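The idea of inferring each uncertain marker phase several times and discarding the less robust results can be sketched as a simple consensus filter. This section does not give the exact definition of the stringency criterion, so the sketch below assumes, purely for illustration, that it is the minimum fraction of replicate inferences that must agree on a phase. The function name and the 0/1 phase encoding are hypothetical and are not the implementation of paper III.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_phase(replicate_phases: Sequence[int],
                    stringency: float) -> Optional[int]:
    """Return the majority phase if its support reaches the stringency.

    replicate_phases holds one inferred phase (0 or 1) per replicate run.
    None marks the phase as unresolved, so that it is not propagated to
    later generations of the pedigree.
    """
    if not replicate_phases:
        return None
    phase, count = Counter(replicate_phases).most_common(1)[0]
    support = count / len(replicate_phases)
    return phase if support >= stringency else None


if __name__ == "__main__":
    # Three of four replicates agree: accepted at a stringency of 0.75.
    print(consensus_phase([0, 0, 0, 1], stringency=0.75))
    # An even split is left unresolved.
    print(consensus_phase([0, 0, 1, 1], stringency=0.75))
```

With this kind of filter, raising the threshold trades fewer erroneous phases for more unresolved ones, which is the behavior reported for Figure 5.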

(169) Figure 5. Accuracy of the haplotyping algorithm as a function of the stringency criterion. a) Percentage of correct heterozygous marker phases in the estimated haplotypes. b) Percentage of haplotype errors (plain dark line) and percentage of non-estimated marker phases (dashed line).

Fine mapping and replication of QTL in a chicken Advanced Intercross Line (paper IV)

A nine-generation AIL pedigree was bred from two chicken lines divergently selected for body weight [Dunnington and Siegel 1996]. All animals in the pedigree were genotyped at approximately 1 cM intervals in nine genomic regions where significant or suggestive QTL signals were detected in an earlier F2 population from the same lines [Jacobsson et al 2005]. Haplotype information was inferred for each chromosome segment with the haplotyping algorithm described in paper III, and a QTL scan was performed in each region using the FIA approach of paper II. All nine regions showed a significant QTL signal for body weight at 56 days of age (BW56) (Figure 6). All QTL regions except Growth9 [Jacobsson et al 2005] on chromosome 7 displayed a single QTL signal. Further fine mapping in the segment on chromosome 7 revealed that the original Growth9 QTL was due to two independent, linked loci.

The QTL peak in the AIL pedigree is often narrower than the original F2 QTL, illustrating the increased resolution in QTL mapping when using an AIL. In Figure 7, we compare the width of the QTL peaks in the F2 and the AIL for the two strongest QTL, Growth1 and Growth9. The width of Growth1 in the AIL is about 1/3 of that in the F2 (Figure 7a). For Growth9 on chromosome 7, the increased resolution in the AIL allowed the single QTL peak in the F2 pedigree to be separated into two QTL (Figure 7b). This study illustrates the power of using Advanced Intercross Lines for replication and fine mapping of QTL in divergent line crosses.
Our strategy to use both genotype and phenotype information from all individuals in the entire pedigree clearly makes efficient use of all available genotype information provided in the AIL, where the use of outbred founders results in very few SNP marker alleles being fixed within the lines.

(170) Figure 6. Chromosome scan based on the score statistic for the 9 regions genotyped in the AIL pedigree. The dashed horizontal line marks the 1% significance threshold.

Figure 7. Comparison between the width of the QTL peak in the F2 pedigree and in the AIL pedigree analysis for Growth1 (a) and Growth9 (b).

(171) CONCLUSION.

The work presented in this thesis focused on developing and applying variance component based QTL mapping tools in experimental pedigrees. We further developed several aspects of the VC QTL scan methodology by implementing a new fast and efficient method for estimating IBD matrices and a new VC model adapted for the analysis of experimental populations with outbred founders. We then focused on the prospect of analyzing multigenerational pedigrees with the VC approach. Our IBD estimation algorithm was extended to include haplotype information in addition to genotype and pedigree, to improve the accuracy of the IBD estimates in complex pedigrees. These newly developed methods were subsequently applied to analyze a nine-generation AIL pedigree obtained after crossing two chicken lines divergently selected for body weight. Nine QTL originally detected in an F2 population were replicated in the AIL pedigree, and our strategy to use both genotype and phenotype information from all individuals in the entire pedigree clearly made efficient use of the available genotype information provided in the AIL.

(172) FUTURE PROSPECTS.

Ongoing work

Several interacting QTL were described in an F2 pedigree derived from the same chicken lines as the AIL presented in paper IV [Carlborg et al 2006]. It is thus a priority to further investigate the genetic architecture of body weight in the chicken AIL, focusing on the detection of possible gene-by-gene interactions (epistasis). An ongoing project consists of re-analyzing the AIL pedigree for epistatic interactions, using both variance component and linear regression approaches.

Near future

The work presented in this thesis focuses on QTL detection, which is the first step in the dissection of the genetic architecture of complex traits. One step further in the dissection of complex traits consists of quantifying the genotypic contribution to the phenotype in terms of single QTL effects and possible interactions. In that regard, the variance component approach is still outperformed by linear regression methods, where recent developments facilitate the orthogonal decomposition of the genetic effects into additive and dominance contributions, as well as interaction effects, e.g. additive × additive and additive × dominance effects [Alvarez-Castro et al 2007, Alvarez-Castro et al 2008]. Variance component estimation of dominance effects has been investigated [Xu 1996], but so far, the ways to obtain independent estimates of additive and dominance effects within the VC framework are unclear. However, epistatic interactions have been described in experimental data sets consisting of outbred line crosses [Le Rouzic et al 2008, Carlborg et al 2006], where the conditions for linear regression QTL mapping are not fulfilled, i.e. the alleles are not fixed within the parental lines. A model like FIA, which allows for allelic segregation within founder lines, would therefore be better suited to such data.
Projects in the near future thus involve exploring the possibility of computing independent VC estimates of additive and dominance effects, in order to decompose the genetic effects into independent partitions.
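As a point of comparison for the regression-based decomposition mentioned above, the following sketch shows a classical additive/dominance coding for a single biallelic locus in an ideal F2 with genotype frequencies 1:2:1. It is a textbook-style illustration under that stated assumption, not the model of Alvarez-Castro et al and not part of FIA. The genotype labels "11", "12", "22" and all function names are hypothetical. Because the model has three parameters for three genotype classes, the least-squares estimates can be written directly from the class means.

```python
from statistics import mean

# Index variables for the three genotype classes (ideal F2 assumed).
ONES = {"11": 1.0, "12": 1.0, "22": 1.0}
ADDITIVE = {"11": -1.0, "12": 0.0, "22": 1.0}
DOMINANCE = {"11": -0.5, "12": 0.5, "22": -0.5}
F2_FREQ = {"11": 0.25, "12": 0.5, "22": 0.25}


def weighted_cross_product(x, y):
    """Frequency-weighted cross product of two index variables."""
    return sum(F2_FREQ[g] * x[g] * y[g] for g in F2_FREQ)


def f2_effects(genotypes, phenotypes):
    """Estimate (mean, additive effect, dominance effect) from class means.

    Equivalent to least squares on y = mu + a * x_a + d * x_d, because the
    model is saturated for three genotype classes.
    """
    by_class = {g: [] for g in F2_FREQ}
    for g, y in zip(genotypes, phenotypes):
        by_class[g].append(y)
    if any(len(values) == 0 for values in by_class.values()):
        raise ValueError("all three genotype classes must be observed")
    m11, m12, m22 = (mean(by_class[g]) for g in ("11", "12", "22"))
    a = (m22 - m11) / 2.0
    d = m12 - (m11 + m22) / 2.0
    mu = (m11 + 2.0 * m12 + m22) / 4.0
    return mu, a, d


if __name__ == "__main__":
    # The index variables are mutually orthogonal under 1:2:1 frequencies.
    print(weighted_cross_product(ADDITIVE, DOMINANCE))
    print(f2_effects(["11", "12", "12", "22"], [10.0, 14.0, 14.0, 16.0]))
```

The orthogonality holds only when the genotype frequencies match the assumed 1:2:1 ratio and the alleles are fixed within the parental lines. These are the conditions that outbred line crosses violate, as noted in the text.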

(173) Looking ahead

Many traits of scientific interest cannot be investigated by means of experimental populations; the study of the genetic architecture underlying those traits therefore has to be carried out in natural populations. In some cases, natural conditions give the population an LD structure similar to that of an experimental design. This is for example the case in natural hybridization [Slate 2005]: if two sub-species are in contact within their respective habitats and hybridize, the resulting individuals display the same characteristics as the F1 individuals in an experimental cross of outbred lines. The next generation of hybrids is therefore similar to an F2 or a backcross population. For other populations, however, no such phenomenon happens and the LD level remains low. Studying the genetic architecture of continuous variation in natural populations is therefore a very challenging area, where the theoretical approach must be able to handle mapping populations consisting of hybrids of different origins, i.e. F2 and backcross individuals possibly co-existing in the same population, or even populations with low LD. The development of tools able to take the population structure and history into account can be an important contribution to such projects.

(174) Acknowledgments.

The work presented in this thesis was performed at the Linnaeus Center for Bioinformatics at Uppsala University and at the Department of Animal Breeding and Genetics at SLU Uppsala. We gratefully acknowledge funding from the Knut and Alice Wallenberg Foundation, the Swedish Foundation for Strategic Research, FORMAS and EURYI.

I would like to thank my current and former collaborators for a stimulating and friendly working environment:

-The OLD Carlborg group: José, Arnaud, Byeong-Woo and Lars, for everything I learned by working with them. For the NOIA discussions with Arnaud and José in the office, around a dark beer or in Lapland. Thanks to Arnaud for the table tennis games and music discussions. And thanks to Lars for his infinite patience in explaining Variance Component models.

-The NEW Carlborg group: Anna, Lucy, Weronica, Linda, Xia Shen, Mats and Stefan, for always being helpful and ready to share experience, knowledge and good cakes. Thanks to Weronica, whose overload of energy makes my working environment an everyday entertainment.

Thanks to my main supervisor Örjan Carlborg, who helped me search in the right direction during these four years, for his wise advice about scientific work and interesting discussions about life in general. For believing in everyone’s potential to do good science, provided they are given a favorable environment. And thanks especially for the sincere care he takes of people who work with him.

I would also like to thank my second supervisor Patrik Magnusson at Karolinska Institute, Per Wahlberg and Leif Andersson at Uppsala University for their collaboration on the AIL pedigree, Paul Siegel at Virginia Polytechnic Institute for all his work on the chicken lines, and Frank Albert for his enjoyable visits to Uppsala and the QTL mapping on rat behavior. Marcin Kierczak for the high quality Polish vodka supply. Rebecca, Beni, Norbert, Steffi and Anna for being my first friends in Sweden.
Jan-Åke and his birds, who made me discover the world of parrots. My family, for all their support. María, for sharing with me life and a common taste for the company of Psittacidae. Irraciblae (aka: Mauricette & Mafalda).

(175) References.

Abecasis G R, Cherny S S, Cookson W O, Cardon L R. Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nature Genetics 2002, 30: 97-101
Alvarez-Castro J M, Le Rouzic A, Carlborg O. How to perform meaningful estimates of genetic effects. PLoS Genet (2008) vol. 4 (5) pp. e1000062
Alvarez-Castro J M, Carlborg O. A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics (2007) vol. 176 (2) pp. 1151-67
Carlborg O, Jacobsson L, Ahgren P, Siegel P, Andersson L. Epistasis and the release of genetic variation during long-term selection. Nat Genet (2006) vol. 38 (4) pp. 418-20
Churchill G A, Doerge R W, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
Darvasi A, Soller M. Advanced Intercross Lines, an Experimental Population for Fine Genetic Mapping. Genetics (1995) vol. 141 pp. 1199-1207
Darvasi A, Weinreb A, Minke V, Weller J I, Soller M. Detecting marker-QTL linkage and estimating QTL gene effect and map location using a saturated genetic map. Genetics (1993) vol. 134 (3) pp. 943-51
De Koning D J, Janss L L, Rattink A P, van Oers P A, de Vries B J, Groenen M A, van der Poel J J, de Groot P N, Brascamp E W, van Arendonk J A. Detection of quantitative trait loci for backfat thickness and intramuscular fat content in pigs (Sus scrofa). Genetics (1999) vol. 152 (4) pp. 1679-90
Dempster A P, Laird N M, Rubin D B, 1977 Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc. 39: 1-38
Dunnington E A, Siegel P B. Long-term divergent selection for eight-week body weight in white Plymouth rock chickens. Poult Sci (1996) vol. 75 (10) pp. 1168-79
Falconer D S, Mackay T F C. 1996. Introduction to quantitative genetics, 4th edition. Longman Group Limited, Essex, England.
Fernando R L, Grossman M, 1989 Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21: 467–477.
Fisher R A (1918) The correlation between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinburgh 52: 399–433
Goldgar D E, 1990 Multipoint analysis of human quantitative genetic variation. Am. J. Hum. Genet. 47: 957–967.
Grapes L, Firat M Z, Dekkers J C M, Rothschild M F, Fernando R L. Optimal haplotype structure for linkage disequilibrium-based fine mapping of quantitative trait loci using identity by descent. Genetics (2006) vol. 172 (3) pp. 1955-65
Harville D A. 1977. Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 72: 320-338.
Haley C S, Knott S A. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity (1992) vol. 69 pp. 315-324

(176) Heath S C. Markov Chain Monte Carlo Segregation and Linkage Analysis for Oligogenic Models. Am. J. Hum. Genet. 1997, 61: 748-760
Hernández-Sánchez J, Knott S. Haplotyping via minimum recombinant paradigm. BMC Proceedings 2009, 3 (Suppl 1): S7
Jacobsson L, Park H B, Wahlberg P, Fredriksson R, Perez-Enciso M, Siegel P, Andersson L. Many QTLs with minor additive effects are associated with a large difference in growth between two selection lines in chickens. Genet Res (2005) vol. 86 (2) pp. 115-25
Jensen J. Estimation of recombination parameters between a quantitative trait locus (QTL) and two marker gene loci. Theor Appl Genet (1989) vol. 78 pp. 613-618
Jerzy M Behnke, Fuad A Iraqi, John M Mugambi, Simon Clifford, Sonal Nagda, Derek Wakelin, Stephen J Kemp, R Leyden Baker, John P Gibson. High resolution mapping of chromosomal regions controlling resistance to gastrointestinal nematode infections in an advanced intercross line of mice. Mamm Genome (2006) vol. 17 (6) pp. 584-97
Johnson D L, Thompson R, 1995 Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. J. Dairy Sci. 78: 449–456.
Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet (1995) vol. 11 (3) pp. 241-7
Le Rouzic A, Alvarez-Castro J M, Carlborg O. Dissection of the genetic architecture of body weight in chicken reveals the impact of epistasis on domestication traits. Genetics (2008) 179: 1591–1599.
Levine D. PGAPack Parallel Genetic Algorithm Library. Online document, January 1996. Available: http://www-fp.mcs.anl.gov/CCST/research/reports_pre1998/comp_bio/stalk/pgapack.htm
Lundström K, Karlsson A, Håkansson J, Hansson I, Johansson M et al., 1995 Production, carcass and meat quality traits of F2-crosses between European wild pigs and domestic pigs including halothane gene carriers. Anim. Sci. 61: 325–331.
Lynch M, Walsh B, 1998 Genetics and analysis of quantitative traits. Sinauer Associates Inc., Sunderland, MA
Martinez O, Curnow R N. Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor Appl Genet (1992) vol. 85 pp. 480-488
Meuwissen T H E, Goddard M E. 2000. Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155: 421–430
Perez-Enciso M, Varona L, 2000 Quantitative trait loci mapping in F2 crosses between outbred lines. Genetics 155: 391–405.
Pong-Wong R, George A W, Woolliams J A, Haley C S. 2001. A simple and rapid method for calculating identity-by-descent matrices using multiple markers. Genet. Sel. Evol. 33: 453–471.
Qian D, Beckmann L. Minimum-Recombinant Haplotyping in Pedigrees. Am J Hum Genet 2002, 70: 1434-1445
Rönnegård L, Carlborg O, 2007. Separation of base allele and sampling term effects gives new insights in variance component QTL analysis. BMC Genet. 8: 1.
Slate J. Quantitative trait locus mapping in natural populations: progress, caveats and future directions. Mol Ecol (2005) vol. 14 (2) pp. 363-379
Tapadar P, Ghosh S, Majumder P P. Haplotyping in pedigrees via a genetic algorithm. Hum Hered 2000, 50: 43-56.

(177) Wang T, Fernando R L, Grossman M, 1998 Genetic evaluation by best linear unbiased prediction using marker and trait information in a multibreed population. Genetics 148: 507–515.
Wang T, Fernando R L, Van der Beek S, Grossman M, Van Arendonk J A M. Covariance between relatives for a marked quantitative trait locus. Genet. Sel. Evol. 1995, 27: 251-274
Xu S. Computation of the full likelihood function for estimating variance at a quantitative trait locus. Genetics (1996) vol. 144 (4) pp. 1951-60
Yu X, Bauer K, Wernhoff P, Ibrahim S M. Using an advanced intercross line to identify quantitative trait loci controlling immune response during collagen-induced arthritis. Genes Immun (2007) vol. 8 (4) pp. 296-301
