• No results found

Population Genetic Analysis and Demographic Inference of FourSpruce Species from the Qinghai-Tibetan Plateau

N/A
N/A
Protected

Academic year: 2022

Share "Population Genetic Analysis and Demographic Inference of FourSpruce Species from the Qinghai-Tibetan Plateau"

Copied!
38
0
0

Loading.... (view fulltext now)

Full text

(1)

Population Genetic Analysis and Demographic Inference of Four Spruce Species from the Qinghai-Tibetan Plateau

Michael Stocks

Degree project inbiology, Master ofscience (2years), 2009 Examensarbete ibiologi 45 hp tillmasterexamen, 2009

Biology Education Centre and Department ofEvolutionary Functional Genomics, Uppsala University

(2)

Population Genetic Analysis and Demographic Inference of Four Spruce Species from the

Qinghai-Tibetan Plateau

Michael Stocks

Abstract

Geological upheaval has altered the topology and climate of the Qinghai-Tibetan Plateau considerably over the last 10 million years.

12-14 nuclear loci were sampled from four conifers in the genus Picea that occupy regions on and around the plateau. P. likiangensis, P.

purpurea & P. wilsonii are found on the eastern edge of the plateau with ranges that overlap to varying degrees, whilst P. schrenkiana is found in isolated mountain ranges northwest of the Qinghai-Tibetan Plateau. Nucleotide diversity amongst the three eastern spruce species was higher than in previously studied Picea from North America and Europe, implying that recent glaciations have not had as big an effect on Asian species as has been observed in boreal species. Further anal- ysis of population structure and demographic history using coalescent based methods imply a complex relationship amongst recently diverged species, with evidence for post-divergent gene flow being detected by MIMAR. The Isolation with Migration model implemented in MIMAR gives divergence times which coincide with studies of stable isotopes in marine and land fossils that approximate when the plateau began to dry out.

Department of Evolutionary Functional Genomics, Uppsala University

(3)

Introduction

The origin of new species is never simply defined by gargantuan events that establish impenetrable mountain ranges, oceans or deserts that tidily split a species in two. In reality, even the most dramatic of geological events take place on relatively long time scales, with local environmental and ecological conditions changing gradually. The fate of a species is intrinsically linked to its environment, shaping its demographic history. For example, favourable conditions may lead to a range expansion, while non-favourable conditions may lead to more restricted natural ranges. Differing demographic scenarios, such as population expansions or bottlenecks, leave genome-wide patterns of polymorphism in DNA sequences that can be analysed to make population genetic inferences.

The vast highlands of the Qinghai-Tibetan Plateau (QTP) have under- gone considerable altitudinal and climatic change over the course of the last 60 million years (Wang, Y. et al., 2006; Wang, Y. et al., 2008). While the exact timing and speed of the uplift of the central part of the plateau and of the ridges are still disputed (Wang, C., 2008), the drastic change in elevation and ensuing shift in climatic conditions have certainly had a profound im- pact on the flora and fauna. Picea likiangensis, P. purpurea, P. schrenkiana and P. wilsonii are four montane to subalpine spruce species that occur on or around the QTP. Geographically (see Figure 1), P. schrenkiana shows the greatest degree of isolation of the species, being distributed in fragmented populations in the Tian Shan range (China) and in the mountains north of the Naryn River (Kyrgyzstan) (Farj´on, 1990; Farj´on, 2001). P. likiangensis, P. purpurea and P. wilsonii are found along the eastern edge of the QTP, with ranges that overlap to varying degrees.

Morphologically, the Chinese spruce species have been difficult to clas- sify, but generally P. likiangensis and P. purpurea have been classified sep- arately from P. wilsonii and P. schrenkiana (Farj´on, 1990; Farj´on, 2001).

Ran et al. (2006) used chloroplast DNA to construct a phylogeny of the 34 spruce species described in Farj´on (1990, 2001) which, conversely, grouped P. purpurea and P. wilsonii within the same clade, with P. likiangensis and P. schrenkiana together in a sister clade. This genetic discordance with morphological data was recently emphasised in chloroplast (cpDNA) and mitochondrial (mtDNA) DNA analyses performed by Lanzhou Univer- sity (Gansu, China) (Jianquan Liu, unpublished data). The northernmost spruce, P. wilsonii, was fixed for a chlorotype and mitotype that was highly divergent from the haplotypes present amongst the P. likiangensis popula- tions to the south. Intriguingly, the P. purpurea chlorotypes and mitotypes were split between whole populations that were fixed for the P. wilsonii hap- lotype, and individuals that had one of the P. likiangensis haplotypes. The presence, for both cpDNA and mtDNA, of two very different haplotypes in P. purpurea suggests, given the slow substitution rates observed in cytoplas-

(4)

mic organelles in conifers (Jaramillo-Correa et al., 2003), that asymmetrical introgression of organelle haplotypes may be occurring.

Asymmetrical introgression has been observed in a number of studies on

Figure 1: Map showing the locations of P. likiangensis (yellow), P. purpurea (purple), P.

schrenkiana (red) & P. wilsonii (green) on and around the Qinghai-Tibetan Plateau.

natural populations (e.g. Slotte et al., 2008), and could easily be attributed to selection or a bias inherent in the estimation process. A recent study by Currat et al. (2008), however, used simulations to show that asymmetrical introgression could occur, in the absence of selection, due to purely demo- graphic reasons. When a population invades the range of another resident population, asymmetrical gene flow would occur from the local to the in- vading species, as interbreeding events occurring at the leading edge of the advance would gradually dilute the invading population.

If interspecific asymmetrical introgression were to have occurred between previously divergent populations it illustrates the potential ebb and flow of parapatric and sympatric speciation. While allopatric models of speciation can be mathematically and intuitively simpler, there is mounting evidence that the more complex processes involved in parapatric and sympatric spe- ciation might be more common than once thought. There are, naturally, difficulties when the traditional bifurcating pattern of evolution is blurred due to post-divergent or interspecific gene flow, but it is clear that some of these complicated scenarios can leave informative patterns of variation and

(5)

introgression in sequences, and it is of relevance to try to understand how to interpret these signals in polymorphism data. Coalescent simulations of cryptic speciation, however, offer a quantitative and statistical framework with which to answer some of these more testing questions.

Kingman’s (1982a, 1982b) n-coalescent is a mathematical approximation of the ancestral process, whereby lineages coalesce backwards in time until the Most Recent Common Ancestor (MRCA) of the sample is reached. By modelling only those lineages ancestral to the sample, it makes for a more efficient approach than forward-in-time models that require modelling each individual. Another advantageous aspect of the coalescent is that large sam- ple sizes are not necessarily needed as the probability of the MRCA of the sample being the MRCA of the entire population is (n−1)/(n+1) (Saunders et al., 1984). Further extensions of the coalescent have led to the modelling of the ancestral process with, for example, recombination (Hudson, 1983;

Griffiths, 1991; Griffiths & Marjoram, 1996), selection (Krone & Neuhauser, 1997; Neuhauser & Krone, 1997) or substructure (Notohara, 1990; Herbots, 1997), allowing the hypothesis testing of a range of evolutionary and demo- graphic scenarios. Specifically, the inclusion of coalescent based Bayesian statistical methods (e.g. Nielsen & Wakeley, 2001; Beaumont et al., 2002) in estimating population divergence parameters has allowed a fresh look at speciation.

Allopatric speciation has been commonly depicted in population genet- ics using the isolation model, where a panmictic ancestral population of effective population size NA splits t generations ago into two descendant populations of effective population size N1 and N2, with no gene flow occur- ring after divergence. The polymorphisms observed between species under the isolation model have been categorised by Wakeley & Hey (1997) into dis- tinct groups depending on their frequencies in the descendant populations.

Sites for which polymorphisms are present in both descendent populations are referred to as shared polymorphisms (Ss), whereas polymorphisms con- fined solely to a single population are referred to as private polymorphisms (SX1 or SX2, where 1 and 2 refer to populations 1 and 2 respectively). And finally, if a polymorphism becomes fixed in one of the populations and is absent from the other then it is referred to as a fixed polymorphism (Sf).

By counting the number of segregating sites present in each of the two popu- lations, irrespective of the frequency in the other population, you also get S1

and S2(the number of segregating sites in populations 1 and 2 respectively), which allows us to relate each of the groups of polymorphic sites with the others:

S1 = SX1+ SS S2 = SX2+ SS

S = SX1+ SX2+ SS+ SF

(6)

Formally, the isolation model is described by the following parameters: θA= 4NAµ, θ1= 4N1µ and θ2= 4N2µ, where NA, N1and N2refer to the effective population sizes of the ancestral and descendent populations respectively, µ is the per generation mutation rate and τ is the time in the past (in generations) when the two populations diverged. The population scaled mutation parameter, θ, can be estimated by the number of segregating sites divided by a coefficient reflecting the structure of the neutral genealogy through the estimate provided by Watterson (1975):

θW = Sn

1 +12. . . + n−11 (1)

where Sn is the number of segregating sites and n is the number of individ- uals sampled. Using this estimation for θ and considering the relationships between the groups of segregating sites it is possible to derive the expected values for SX1, SX2, Ss and Sf, and the number of those expected to occur before and after the time of divergence. Given observed values for the seg- regating sites it is then possible to solve for the parameters θ1, θ2, θAand τ that define the isolation model. To incorporate gene flow between diverging populations Nielsen & Wakeley (2001) described the isolation with migra- tion (IM) model by adding the parameters M1 = 2N1m and M2 = 2N2m, where Nie is the effective population size of the respective populations and m1 is the proportion of population 1 that is replaced each generation by migrants from population 2.

In order to statistically evaluate different values for the parameters within the IM model, Markov chain Monte Carlo (MCMC) methods were imple- mented in a Bayesian and likelihood framework (Nielsen & Wakeley, 2001;

Hey & Nielsen, 2004; Hey & Nielsen, 2007). Given the observed data (X) and a set of parameters of interest (Θ), Bayesian inference sets up a joint distribution (P (X, Θ)), that is simply the product of the prior distribution (P (Θ)) and the likelihood (P (X|Θ)) (Gilks, Richardson & Spiegelhalter, 1996). The prior distribution represents prior knowledge that may be avail- able for the data, and together with the likelihood gives a full probability model: P (X|Θ) = P (X|Θ)P (Θ). Bayes theorem, after the observation of the data (X), allows the determination of the distribution of the parameters (Θ) conditional on the data (X), the posterior distribution:

P (Θ|X) = P (Θ)P (X|Θ)

P (X) (2)

The likelihood function contained in the posterior distribution (P (X|Θ)) is the probability of observing the data given a set of unknown parameters, but is often difficult to explicitly define for a given model (Stephens, 2001).

Felsenstein (1988), by treating the genealogy as a nuisance variable, defined the likelihood of the parameters given the data (proportional to P (X|Θ))

(7)

as:

L(Θ|X) ∝ P r(X|Θ) = X

G∈ψ

P r(X|G)P (G|Θ) (3)

where G is the genealogy and ψ refers to all the possible gene genealogies.

However, given that the number of gene genealogies increases exponentially with the number of samples, the calculation of the likelihood for all possible genealogies is extremely difficult for large samples. By considering a prior distribution (P (Θ)) in a Bayesian framework, a joint posterior density,

P (G, Θ|X) = P (X|G)P (G|Θ)P (Θ)

P (X) (4)

can be used to simulate a Markov chain for a sample of the genealogies to give an estimate of P (Θ|X) (Hey & Nielsen, 2007).

The use of a uniformly distributed prior then allows P (Θ|X) to be esti- mated as:

P (Θ|X) ≈ 1 k

k

X

i=1

P (Θ|Gi) (5)

where Gi refers to samples of the marginal distribution

P (G|X) = P (X|G)P (G)/P (X) (6)

generated by MCMC simulations. This posterior density is then, assuming a uniform prior, proportional to the likelihood of the parameters (Hey &

Nielsen, 2007).

Samples are drawn from the marginal distribution using a Markov chain, whereby the next state depends only on the current state, and not on the history of the chain (Gilks, Richardson & Spiegelhalter, 1996). The MCMC approach updates the parameters of the model in accordance with the Metropolis-Hastings criterion:

min



1,P (X|G)P (G|Θ)P (G → G) P (X|G)P (G|Θ)P (G → G)



(7) where P (G → G) refers to the probability of G* being proposed as the next state from G and vice versa. Hey & Nielsen (2004) originally imple- mented this approach in their program IM, whereby the model used assumed no intralocus recombination. In samples where intralocus recombination is known to occur, loci (or sections of loci) can be removed to avoid breaking this assumption, but this can drastically reduce the amount of data anal- ysed. The program MIMAR (Becquet & Przeworski, 2007), in summarizing the polymorphism data, allows for intralocus recombination by using ances- tral recombination graphs (Hudson, 1983) to generate genealogies under the coalescent according to the parameters proposed by the MCMC approach.

Values for the parameters of the IM model are randomly selected and used to

(8)

perform coalescent simulations. After each simulation, polymorphism data is calculated for the simulated values and compared to the statistics from the observed data. The posterior density is calculated for each step of the simulations and used to determine the parameter values for the simulations in the following step. After a large number of simulations, a distribution of the parameter posterior densities can be obtained with the mode giving an estimate of each parameter.

MIMAR estimates several parameters that are of interest in this study.

Firstly, the divergence time (τ ) between P. purpurea, P. schrenkiana and P. wilsonii gives an estimate of the timing of speciation and can also be used as a measure of relatedness amongst the species. Secondly, estimates of the rates of post-divergent gene flow can potentially tell something about the mode of speciation. If there is no migration detected occurring after divergence then a more sudden, allopatric form of speciation could be in- ferred. If migration is detected occurring after divergence then it could be inferred that a more gradual, sympatric form of speciation took place. The migration rate estimates between P. purpurea and P. wilsonii are of special interest in light of evidence from organelle data that introgression has po- tentially occurred, and an estimate of gene flow in autosomal DNA would help to clarify the relationship between these two species.

(9)

Materials & Methods

Obtaining & Sequencing the DNA: Seeds were collected from 15, 6, 4 and 6 populations of Picea likiangensis, P. purpurea, P. schrenkiana and P.

wilsonii respectively, with up to 16 individuals sampled per population by colleagues at Lanzhou University (Gansu, China) (see Appendix: Table 5).

The collected seeds were stored in a cold room and then soaked in wa- ter at 4C overnight before the haploid megagametophytes were extracted.

Altogether 172 megagametophytes were used for DNA extraction carried out with Qiagen PCR Purification kits using the CTAB method (Doyle &

Doyle, 1990) (see Table 6 for a list of primers). Fourteen (P. likiangensis, P. purpurea and P. wilsonii ) and twelve (P. schrenkiana) loci (PtIF2009, 4cl, col2, ebs, FT3, GI, M002, M007D1, PCH, Sb16, Sb29, Sb62, sel1364, sel1390, xy1420 and ztl) were sequenced on an ABI 3130XL DNA Sequencer using the ABI Prism Bigdye Terminator Cycle Sequencing Ready Reaction Kit at Lanzhou University (Gansu, China) or by Macrogen (Seoul, Korea).

Contig assembly and base calling was performed using PHRED and PHRAP (Ewing & Green, 1998; Ewing et al., 1998), and CONSED (Gordon et al., 1998) was used to edit the sequences for analysis.

Nucleotide Diversity & Tests of the Standard Neutral Model: The number of segregating sites (S), number of singletons (η1), average num- ber of pairwise nucleotide differences (π) (Tajima, 1983), minimum number of recombination events (Rm) (Hudson & Kaplan, 1985), θW (Watterson, 1975), Tajima’s D (Tajima, 1989) and Fu & Li’s D* and F* statistics (Fu &

Li, 1993) were calculated using compute which is part of the analysis pack- age that utilises the libsequence C++ class library for evolutionary genetic analysis (Thornton, 2003). The expected number of singletons (η1) of the folded site frequency spectrum was calculated using E(ηi) = θ

1 i+n−i1 1+δi,n−1 (Fu, 1995; Griffiths & Tavar´e, 1998), where n is the number of samples, i is the number of samples containing the mutation (i.e. 1 in the case of a singleton) and δi,n−1is the Kronecker delta whose value is 1 if i = j and 0 if i 6= j. Fay

& Wu’s H (Zeng et al., 2006) was calculated using DnaSP ver. 4.50 (Rozas et al., 2003) and required the use of an out-group to determine the derived allele. The list of out-groups used for each population is given in Appendix:

Table 4. Tests of significance for Tajima’s D, Fu & Li’s D* and F* and Fay

& Wu’s H for each locus were performed in DnaSP assuming a standard neutral model with recombination (using the output from rhothetapost, see below).

Expected distributions for the summary statistics were obtained using the program ms (Hudson, 2002). Tajima’s D, Fu & Li’s F* and Fay & Wu’s H were chosen as the summary statistics that represented the data in the most informative manner. D and F* were calculated by the program msstats

(10)

(Thornton, 2003) from the data simulated in ms and the normalised version of Fay & Wu’s H (Zeng et al., 2006) was calculated using a Python code from the estimate of θH (Fay & Wu, 2000) given by msstats (some of the scripts used in the analysis are available at http://www.mspopgen.com). 1,000,000 simulations were performed under the standard coalescent with recombina- tion (using the output from rhothetapost, see below), with π used as an estimate of θ. Frequency distributions of the expected summary statistics were plotted for each of the four species using the statistical package R (R Development Core Team, 2007) alongside the observed values for compari- son and 5% quantiles to test for significance.

Population Structure: Wright’s fixation index (FST; Wright, 1951), as applied in this study, assesses population structure by comparing the prob- ability of identity by descent of pairs picked from within a deme to the probability of identity by descent picked randomly from the total popula- tion (Gillespie, 2004). Estimates of FST within and between species were carried out using the population genetics data analysis package Arlequin ver. 3.11 (Excoffier, 2005) and can be used as an indicator of population structure. To further understand the genetic structure of the populations a clustering algorithm was used as implemented in the software package Struc- ture ver. 2.2 (Pritchard et al., 2000; Falush et al., 2003; Falush et al., 2007).

Structure uses multilocus data to create clusters of individuals that are in both Hardy-Weinberg and linkage equilibrium. The admixture model was used with a burnin period of 150,000 steps followed by a further 8,000,000 steps, and repeated multiple times to ensure consistency in the results. The likelihoods of different values of K (the inferred number of populations) was assessed with K = 3 having the highest likelihood, and so this value was used to perform the longer runs described above.

Recombination: The population scaled recombination rate, ρ, was first estimated per locus per species using the composite likelihood method (Hud- son, 2001) as implemented by the program pairwise in LDhat 2.1 (McVean et al., 2002). ρ for diploid species is equal to 4Ner, where 4Ne is the popu- lation scalar and r is the product of the per site recombination rate and the physical distance across the region analysed. The loci with the minimum and maximum estimated values for ρ were then used as bounds for the mul- tilocus estimation of ρ that was performed using the libsequence program rhothetapost (Haddrill et al., 2005; Thornton & Andolfatto, 2006). rho- thetapost is an approximate Bayesian and rejection sampling method that gives joint multilocus estimates of ρ and θ. Random values of ρ and θ were drawn from a uniform prior and used to simulate 10 loci for which the mean π, S and Rm were calculated and compared with the observed values. If the simulated values were within tolerance  of the observed values then the randomly picked values for ρ and θ were accepted. Two runs were performed

(11)

in rhothetapost. The first run collected 12,000 accepted values for ρ and θ where  was 0.2 and the bounds for the prior were those given by LDhat.

The statistical package R was then used to determine the 0.5% and 99.5%

quantiles of the marginal distributions for ρ and θ, and these were then used as the bounds of a new uniform prior for a second rhothetapost run.

In the second run  was set to 0.05 and 12,000 more acceptances were col- lected, from which the modes were used to give the final estimates of ρ and θ.

Polymorphism data: A Python code was used to count the number of fixed differences and shared and private polymorphisms. The different groups of polymorphisms described by Wakeley & Hey (1997) were counted using out-groups to discern the derived from the ancestral allele as posited by Becquet & Przeworski (2007), and these were counted between each pair of populations to input into MIMAR. An additional count was done between the three populations P. likiangensis, P. purpurea and P. wilsonii as it was deemed important in assessing the extent of polymorphism sharing between each pair of populations. The categories described in Wakeley & Hey (1997) were extended to incorporate the combinations required for three popula- tions. Private polymorphisms (SX1, SX2 & SX3) refer to those that occur only in one population, exclusively shared polymorphisms (SS12, SS23 &

SS13) refer to those that are present in specific pairs of populations and uni- versally shared polymorphisms (SS123) refer to those that are shared by all three populations. There are also fixed differences that can be observed for individual populations only (Sf 1, Sf 2 & Sf 3), although none were present in the three populations studied here.

MIMAR: Input files were prepared for MIMAR incorporating the length (L), number of individuals (n) and the numbers of fixed differences and shared and private polymorphisms for each locus for each pair of species compared. For each species pair the respective estimates of ρ given by rhothetapost were averaged to give a mean ρ specific to each pairwise com- parison. An estimation of the per generation mutation rate per site, µ, was required to assign mutations to the genealogy in MIMAR and two sources were considered for this. Bouill´e & Bousquet (2005) used morphological fos- sil evidence and a molecular clock estimate based on three nuclear loci to calculate the mutation rate in the genus Picea. Willyard et al. (2007), on the other hand, used 11 nuclear loci and 4 chloroplast loci from the genus Pinus calibrated using fossil evidence to fix the divergence time of the Pinus subgenera. Due to the higher number of loci used and the better calibration in Willyard et al. (2007) it was decided that the mutation rate from this study of 1.01 × 10−9 (mean of the lower and upper bounds) per site per year would be used. Given a 25 year generation time (Brown et al., 2004), the per site per generation mutation rate is 2.5 × 10−8. It may seem counterintuitive to use an estimation taken from Pinus instead of Picea but any differences

(12)

in the mutation rates between these two genera were judged to be small compared to the potential variance given by the small number of loci used in the study by Bouill´e & Bousquet (2005), that could have distorted the estimate.

Preliminary runs were carried out to optimise, for each parameter, the variances of the kernel distributions used to determine new values for the next step of the MCMC, and this was assessed by the level of convergence and the acceptance rates. Convergence was monitored by referring to the mixing and autocorrelation graphs produced using a modified version of the R script provided with MIMAR. Due to time constraints, only two of the pairwise species comparisons were performed: P. purpurea with P.

schrenkiana, and P. purpurea with P. wilsonii. These two comparisons were important as they could 1) give an idea of the divergence times between P.

schrenkiana and the three other species, and 2) determine whether there has been introgression occurring between P. purpurea and P. wilsonii. Due to problems obtaining good estimates of migration in the P. purpurea / P.

wilsonii comparison, runs were performed with the migration rates fixed to symmetrical. In earlier runs the estimated rates of migration were approxi- mately symmetrical, so it was deemed to be more important to obtain good estimates under this assumption. After exploratory simulations, 20 million steps were run for the P. purpurea / P. schrenkiana and P. purpurea / P.

wilsonii comparisons. A burnin period equivalent to 10% of the total num- ber of steps was discarded to ensure that the posterior distributions were independent of their starting points. To further ensure that convergence had been reached, two runs of each comparison were performed simultaneously with different starting seeds to certify that the same result was reached each time. Subsequent posterior distributions were attained and smoothed using R.

(13)

Results

Nucleotide Diversity: In total 33,663 base pairs of nuclear DNA were analysed from Picea likiangensis, P. purpurea, P. schrenkiana and P. wilsonii, with 256, 211, 49 and 164 segregating sites identified respectively. Estimates of nucleotide diversity are shown in Table . The average number of pair- wise differences per base pair, π, indicated that nucleotide diversity was highest in P. purpurea (0.006240), followed by P. wilsonii (0.005744) and P. likiangensis (0.005584), with P. schrenkiana having the lowest diversity (0.002205). Accordingly, estimates of nucleotide diversity given by θW were highest in P. purpurea (0.007784) and lowest in P. schrenkiana (0.002246), but with estimates of P. likiangensis (0.006332) in this case being slightly higher than that of P. wilsonii (0.006106). Estimates of ρ (per base pair) given by rhothetapost are shown in Appendix: Table 14 and were of the same order of magnitude in P. likiangensis (0.01193), P. purpurea (0.01019) and P. wilsonii (0.01059), but smaller in P. schrenkiana (0.0075).

Tests of the standard neutral model: Tajima’s D (D) and Fu & Li’s D* (D*) and F* (F*) (see Table ) were negative in all four species, and only P. wilsonii had a positive Fay & Wu’s H. None of the neutrality tests were significant when averaged across all loci, but there were a number of loci that departed significantly from neutrality when tested individually (see Appendix: Tables 7 to 10). Fay & Wu’s H was significantly positive for 7 separate loci in P. wilsonii but as the pattern seemed mostly consistent across all loci it was deemed to be more likely due to demographics than selection. A significant excess of higher frequency variants was detected at the ft3 and sb29 loci in P. schrenkiana, but this can be explained by the thinner sampling undertaken for this individual as relatively few individuals were sampled at these loci (16 and 13 respectively). Other significantly neg- ative loci for D, D* and F* in P. purpurea and P. likiangensis appear to be more extreme values of a genome wide pattern of negative values and were not excluded from further analyses. Locus 2009 was consistently positive for values of D, D* and F* in P. likiangensis, P. purpurea and P. wilsonii but, as only one value (Tajima’s D in P. purpurea) was significant, the locus was not excluded from the analyses. To further assess whether the loci were compatible with the standard neutral model, coalescent simulations were carried out under the standard neutral model with recombination. Sum- mary statistics were calculated for the simulated data and compared to the observed data, and the distributions are shown in figure 2. Observed values of D, F* and H do not deviate significantly (5%) from what would be ex- pected under the standard neutral model with recombination. The observed value of F* for P. purpurea showed an excess of singletons when compared with π, further indicating that demographic factors, rather than selection, are responsible for the significance in the locus specific analyses. For illus-

(14)

tration purposes (see Figure 2), values of Fay & Wu’s H less than -7 were not included in the distributions. While not statistically correct, this did not noticeably alter the shape of the distributions or change any of the con- clusions reached.

Population Structure: FST was calculated between each pair of species (see Table 2) and similar levels of differentiation (0.52864, 0.52578 and 0.5134) were observed between P. schrenkiana and P. likiangensis, P. pur- purea and P. wilsonii respectively. FST was lowest between P. likiangen- sis and P. purpurea (0.06856), followed by P. purpurea and P. wilsonii (0.10501), and P. likiangensis and P. wilsonii. Structure ver. 2.2 was used to infer cryptic population structure within the four species consid- ered jointly, with three being the number of inferred populations with the highest likelihood. The three species P. likiangensis, P. schrenkiana and P. wilsonii appear to form three genetically differentiated clusters (see Fig- ure 15), with P. purpurea being made up of variation attributable to each of these clusters (see Appendix: Table 15). Interestingly P. wilsonii seems to be the most structurally differentiated, while the more geographically distant P. schrenkiana appears to cluster closer to P. likiangensis. The ma- jority of P. purpurea individuals cluster with those of P. likiangensis rather than P. wilsonii, however there are P. purpurea individuals that cluster with P. wilsonii as well as P. wilsonii individuals that group closer to P. likian- gensis.

Shared Polymorphism and MIMAR Analyses: No fixed differences were observed between P. likiangensis, P. purpurea and P. wilsonii, that

Table 1: Nucleotide diversity and tests of the standard neutral model for the four spruce species where ¯n is the average number of individuals sampled at each locus, ¯L is the average length of the loci, S is the total number of segregating sites observed for all loci, η1 and E(η1) are the total number of observed and expected singletons, θW is Watterson’s estimate of θ per base pair averaged across all loci and π is the average number of pairwise nucleotide differences per base pair for all loci. Tests of the standard neutral model based on the frequency spectrum are averaged over all loci.

Species n¯ L¯ S η1[E(η1)] θW π

P. likiangensis 34 645 256 98 [59] 0.006332 0.005584 P. purpurea 19 576 211 115 [53] 0.007784 0.006240 P. schrenkiana 17 601 49 18 [17] 0.002246 0.002205 P. wilsonii 19 577 164 65 [49] 0.006106 0.005744 Species Tajima’s D Fu & Li’s D Fu & Li’s F Fay & Wu’s H P. lik. -0.505634 -0.739421 -0.817759 -0.072349 P. pur. -0.802918 -1.257076 -1.324762 -0.683949 P. sch. -0.461984 -0.397038 -0.463501 -1.186889 P. wil. -0.381706 -0.690811 -0.725818 0.799697

(15)

Tajima's D

−3 −2 −1 0 1 2 3 4

0 10000 20000 30000 40000 50000

P. likiangensis

Fu & Li's F*

−4 −2 0 2

0 10000 20000 30000 40000

Fay & Wu's H

−6 −4 −2 0 2

0 10000 20000 30000

Tajima's D

−3 −2 −1 0 1 2 3 4

0 10000 20000 30000 40000 50000

P. purpurea

Fu & Li's F*

−4 −3 −2 −1 0 1 2 3

0 10000 20000 30000 40000

Fay & Wu's H

−6 −4 −2 0 2

0 5000 10000 15000 20000 25000 30000 35000

Tajima's D

−3 −2 −1 0 1 2 3

0 10000 20000 30000 40000

P. schrenkiana

Fu & Li's F*

−4 −3 −2 −1 0 1 2 3

0 10000 20000 30000

Fay & Wu's H

−6 −4 −2 0 2

0 10000 20000 30000 40000 50000 60000

Tajima's D

−3 −2 −1 0 1 2 3 4

0 10000 20000 30000 40000 50000

P. wilsonii

Fu & Li's F*

−4 −3 −2 −1 0 1 2 3

0 10000 20000 30000 40000

Fay & Wu's H

−6 −4 −2 0 2

0 10000 20000 30000

Figure 2: Expected distributions of Tajima’s D, Fu & Li’s F* and Fay & Wu’s H for each of the four spruce species under the standard neutral model with recombination, using π as an estimate for θ, and with ρ estimated using rhothetapost. Solid vertical lines represent the observed values of each statistic averaged over the loci, with the dotted lines representing the 2.5 and 97.5% confidence intervals.

are located on or around the QTP. There were, however, fixed differences observed between each of these species and the more isolated P. schrenkiana (see Appendix: Tables 12 & 13). When polymorphism data was calculated for the three QTP species simultaneously (see Figure 4 and Appendix: Ta- ble 11) the majority of the segregating sites were classified as either being private to individual populations (87 in P. likiangensis, 71 in P. purpurea and 65 in P. wilsonii ) or shared between all populations (67). There were less segregating sites that were shared exclusively between pairs of popula- tions (32 between P. likiangensis and P. purpurea, 23 between P. purpurea and P. wilsonii and 4 between P. likiangensis and P. wilsonii ).

Table 2: Between species estimates of FSTfor each pair of spruce species, with the diagonal values representing within species estimates of FST.

P. lik. P. pur. P. sch. P. wil.

P. lik. 0.157 - - -

P. pur. 0.069 0.048 - -

P. sch. 0.529 0.526 0.148 -

P. wil. 0.165 0.105 0.513 0.083

(16)

Figure 3: Clustering of individuals from each spruce species for each of the k = 3 clusters performed in the Structure analysis.

Table 3: Modes of the posterior densities from one of the two independent runs for each of the parameters estimated in MIMAR.

θ1 θ2 θA Tgen M1 M2

P. pur./P. sch. 0.00822 0.00127 0.00605 64300 0.48 0.5 P. pur./P. wil. 0.00941 0.00456 0.00615 21625 0.69 0.69

Estimates of population divergence parameters from MIMAR are shown in Table 3 and the posterior densities, for each pairwise compari- son, of one of the two independent runs are shown in Appendix: Figures 5

& 6. The divergence times show P. purpurea and P. wilsonii to be the most recently diverged (21,625 generations ago), with P. purpurea and P.

schrenkiana separating 64,300 generations ago. Approximately symmetrical migration was detected between P. purpurea and P. schrenkiana, indicat- ing that gene flow had occurred after speciation. For the P. purpurea and P. wilsonii pairwise comparison, problems were encountered obtaining good estimates for the migration rates. In order to improve the resolution the migration rates were fixed to be symmetrical, and subsequently a low rate of migration was observed between the two species. The values for ances- tral θ were estimated to be 0.00615 for the comparison between P. purpurea and P. wilsonii, and 0.00605 in the P. purpurea and P. schrenkiana pairing.

Appendix: Table 16 shows a comparison of the values of θ given by MIMAR

(17)

against estimates from other methods and can give an indication of how well the IM model approximates the system studied. Estimates from MIMAR for P. purpurea were closest to Watterson’s estimate of θ, whilst estimates for both P. schrenkiana and P. wilsonii were lower than those produced by any of the other methods.

Figure 4: Venn diagram illustration of the number of polymorphisms shared between and private to three of the spruce species.

(18)

Discussion

The distribution of variation amongst diverging populations can be used to make inferences about the demographic and evolutionary origins of the species studied. Polymorphisms shared between two diverging populations occur, assuming an infinite sites model, either because of the retention of ancestral polymorphisms or as a result of post-divergent gene flow. In re- cently diverged populations, where lineage sorting may still be occurring, it can be very difficult to infer which shared polymorphisms are retained from an ancestral population and which ones are the result of introgression. In their mathematical treatment of the genealogical species concept, Hudson &

Coyne (2002) calculated the probability of observing reciprocal monophyly for differing numbers of loci. Reciprocal monophyly applies if all sampled loci from a genealogical species are phylogenetically shown to be more closely related to each other than to alleles of the same locus in other genealogical species. For 15 nuclear loci the time it would take to have a 50% probability of observing reciprocal monophyly is 8.9 Ne generations (where Ne is the effective size of the population). Using π as an estimate for θ, a mutation rate of 2.5 × 10−8 per base pair and assuming a 25 year generation time it would take 13.9 and 4.9 million years for P. purpurea and P. schrenkiana respectively to have a 50% probability of observing reciprocal monophyly. If the studied species’ morphological similarity is consistent with a relatively recent divergence time then we would expect a large number of alleles in the four species to have been retained from a common ancestral population.

Variation within the three species of the QTP was considerably higher than recent estimates of nucleotide diversity per site (π) observed in bo- real spruce species such as P. abies (0.003838), P. glauca (0.004824) and P.

mariana (0.003199) (Chen et al., submitted), all of which have much larger natural ranges. This most likely reflects that the boreal species have been more severely affected by the glacial patterns in the northern hemisphere, and this is to some extent reflected in the neutrality tests. Estimates of Tajima’s D for the two boreal species P. abies (-0.897) and P. mariana (-0.779) were more negative than those obtained for all the Asian spruce species considered here except P. purpurea (-0.803). The low diversity ob- served in P. schrenkiana is somewhat expected given its limited and frag- mented distribution, and is lower than the widely distributed boreal species.

Both its diversity and Tajima’s D are, on the other hand, similar to those observed in P. breweriana, a Tertiary relict with a very small distribution range confined to Northern California (Chen et al., submitted).

Analysis of population structure performed in the program Structure clustered P. wilsonii individuals quite distinctly from the other species, with P. likiangensis, instead of grouping with the other two eastern species, hav- ing a cline of individuals clustering towards P. schrenkiana. One possible explanation for this is that P. schrenkiana shares a more recent common an-

(19)

cestor with P. likiangensis than P. likiangensis shares with P. wilsonii, in- dicating that P. wilsonii may not be as closely related as originally thought.

This explanation, however, is not reflected in the shared polymorphism data.

Differences between P. likiangensis and P. schrenkiana are such that 12 polymorphisms have gone to fixation since their divergence, whereas no fixed differences were observed between P. likiangensis and P. wilsonii.

The Structure algorithm assigned P. purpurea in equal proportions to each of the populations inferred using the program, with individuals clus- tering close to P. likiangensis, P. schrenkiana and P. wilsonii. The clus- tering pattern and the relatively high level of genetic diversity observed in P. purpurea could suggest that it is the result of hybridisation between P.

likiangensis and P. wilsonii, but if that were true then one would not expect as many private polymorphisms, as its genetic variation would be made up of contributions from P. likiangensis and P. wilsonii, leading to more shared polymorphisms. However, many of the polymorphisms are singletons so it is possible that the private alleles are the result of demographic factors, such as population growth. It is also contradicted by estimates of FST between species which, in accordance with morphological classifications, indicate that P. purpurea has diverged least from P. likiangensis, although this could also be as a result of substructure within P. likiangensis that can artificially lower estimates (Gillespie, 2004). P. purpurea’s intermediate clustering could be due to the retention of ancestral alleles, but it could also be due to intro- gression from one, or possibility two of the species.

Lanzhou’s study of cpDNA and mtDNA (Jianquan Liu, unpublished data) showed a similar pattern whereby some of the P. purpurea popula- tions were fixed for the divergent P. wilsonii haplotypes, while some had organelle haplotypes more like P. likiangensis. Interestingly, there were also some P. likiangensis individuals that had the P. wilsonii organelle haplo- types which, because of the large difference between each of the haplotypes, strongly indicates that introgression of organelle haplotypes may have oc- curred from P. wilsonii to P. purpurea. It could be that the patterns are caused by adaptation to local environmental conditions, but given the num- ber of mutational differences and the low mutation rate observed in conifers (Jaramillo-Correa et al., 2003), this explanation seems unlikely. Whilst se- lection can never be ruled out, it is important to consider whether certain non-selective processes could be responsible for the patterns observed in the data.

Recent simulations of introgression patterns performed by Currat et al.

(2008) could explain the interesting patterns observed in organelle DNA.

Their simulations showed that neutral demographic factors could be used to explain patterns of asymmetrical introgression, whereby gene flow occurs from the local into the invading species. This asymmetrical introgression is due to the “progressive dilution of the gene pool of the invading species by the few interbreeding events occurring at the wave front” (Currat et al.,

(20)

2008), and requires only minimal interbreeding success for complete intro- gression to take place. In situations where the two species are competing for local resources and the invading species (e.g. P. purpurea) replaces the local species (e.g. P. wilsonii ), then complete introgression of the invading species occurs even if the interbreeding success rate is low. In ranges that P. purpurea and P. wilsonii both inhabit, P. wilsonii is generally the rarer of the two (Farj´on, 1990), suggesting that even if it is being outcompeted for resources, complete introgression has still occurred.

Whilst the organelle data showed evidence for extensive introgression of haplotypes, the estimates of post-divergent gene flow between P. purpurea and P. wilsonii from MIMAR were not so clear. While there is evidence of gene flow in both directions, the migration rate is low and it is possible that this could simply be due to their recent divergence, or as a result of the vio- lation of one or more assumptions of the IM model. However, several studies have shown evidence for the higher introgression of organelle over autosomal DNA. For example, Bachtrog et al. (2006) showed extensive introgression of mitochondrial haplotypes in Drosophila yakuba, and Takahata & Slatkin (1984), when considering a neutral mtDNA genotype and non-lethal hy- brids, showed that it only takes a small amount of immigration to establish an mtDNA haplotype in a new population. Under the model described by Currat et al. (2008), introgression occurs most when intraspecific gene flow between demes is low, as introgressed haplotypes will compete more with those of the invading species. This has implications for loci that are inher- ited uniparentally because their restricted gene flow would lead to greater introgression compared to nuclear DNA, and could explain why evidence for migration is stronger for organelle DNA than for autosomal DNA. It is also important to consider that the estimates of migration from MIMAR are averaged over the time since divergence, and so if gene flow occurred relatively recently then the estimates would still appear to be small.

The divergence time between P. purpurea and P. wilsonii was also es- timated using MIMAR and gave a divergence time of 21,625 generations, which equates to 540,625 and 1,081,250 years assuming a generation time of 25 or 50 years respectively. Within-species patterns of nucleotide diversity could help to explain these complex patterns of divergence and convergence.

The segregating sites observed in P. purpurea contain a large proportion of singletons (double the expected number) and, as a result, estimates of Tajima’s D and Fu & Li’s F* are negative. This implies that the genealogy of P. purpurea is represented, to a greater extend than in the other species, by long external branches that are characteristic of population growth. This could suggest that, after diverging from a lineage ancestral to P. likiangensis, P. purpurea expanded into the range of P. wilsonii, leading to introgression of organelle haplotypes from the local species (P. wilsonii ) into the invad- ing species (P. purpurea). Potentially, this could account for P. purpurea’s phylogenetic and morphological grouping with P. likiangensis, as well as the

(21)

patterns of introgression observed in Lanzhou’s organelle study.

It is important to treat the MIMAR results with a certain amount of care, as the IM model is based on certain assumptions that do not hold in this case. Firstly, the two diverging populations are assumed to be the two most closely related with no other populations contributing genes. When dealing with extensively studied species it may be known whether this as- sumption holds, but for the spruce species on and around the QTP the evolutionary relationship amongst them is not fully understood. There are also other significant spruce species in the area (e.g. P. crassifolia, P. asper- ata or P. meyeri ) that may be contributing to the gene-pool. Secondly, the divergence time between P. purpurea and P. wilsonii is so recent that no fixed differences were observed. It is difficult to understand fully how this could affect the estimates, but it is entirely possible that the gene flow that we observe could be as a result of breaking one or more of these assumptions.

An indication of how these assumptions have affected the results from MIMAR can be ascertained by comparing the values of θ estimated from each of the different methods. Estimates of θ obtained for P. purpurea showed values closest to θW, which is another method that does not con- sider the frequency of each polymorphism. Statistics that do not consider the frequency of polymorphisms are sensitive to low frequency variants, such as singletons, that can make the estimate higher than would be expected. P.

purpurea shows an excess of singletons, so it is expected that estimates from MIMAR would inflate θ in a similar way. θW was also formulated based on the standard neutral model (Watterson, 1975), so it could be considered to be the statistic most suited to comparisons with MIMAR. Estimates of θ for P. schrenkiana and P. wilsonii from MIMAR were lower than any of the values given by other methods, which implies that other estimates given by the MIMAR analysis should be considered with care.

The estimates of post-divergent gene flow between P. purpurea and P.

schrenkiana indicated that there has been symmetrical introgression occur- ring after divergence. P. schrenkiana’s geographical isolation prevents any present day introgression, but the suggestion of post-divergent introgression is perhaps better understood with reference to the results from Structure, where P. purpurea was made up of equal amounts of variation from each of the inferred clusters. These results imply that the ancestral lineages were represented more by a panmictic metapopulation that would have occupied areas across the present day QTP, when the altitude and climatic conditions were more favourable for spruce species. While there has been considerable disagreement over estimates for the uplift of the plateau (Wang, C., 2008), studies of stable isotopes in marine and land fossils in the Kunlun Mountain area (Yang et al., 2008) have concluded that the climate was warm and wet during the late Pliocene (2-3 mya), and that since then there is evidence for a drying out towards the more arid climate we see today. The diver- gence time between P. purpurea and P. schrenkiana estimated by MIMAR

(22)

of 64,300 generations is equivalent to 1.61 or 3.22 million years, assuming a generation time of 25 or 50 years respectively, which corresponds with the period in which the drying up of the QTP began. Interestingly, this implies that the geographic barrier between P. schrenkiana and the other spruce species developed only as an indirect result of the uplift of the plateau, as the area gradually dried out. In this context, if geographic isolation pro- ceeded relatively slowly, post-divergent migration could have occurred.

Before coming to more confident conclusions it would be important to consider methods that allow for the simultaneous analysis of more than two species, such as Approximate Bayesian Computation (Beaumont et al., 2002), and to more specifically investigate whether the standard neutral model describes the variation observed in the spruce species better than other demographic scenarios. Other spruce species inhabit the area of study so it is necessary to investigate if they contribute significantly to the sampled gene pool. However, the patterns observed in the sequence data illustrate to a certain extent how the QTP uplift, and a constantly changing environ- ment, acts to diverge and converge differing populations to give the complex picture we see today. By considering analyses performed on molecular se- quence data together with knowledge of changing climatic conditions, an understanding of the interplay between environment and genotype can be reached.

(23)

Acknowledgements

I would first like to thank my supervisor Martin Lascoux for giving me the opportunity to do the project within the department, helping me with vari- ous aspects of the theory and analysis and for introducing me to population genetics in the first place. P´adraic Corcoran, Pontus Skoglund and Thomas K¨allman read through early versions of the manuscript and participated in endless discussions on various aspects of population genetics theory. I would like to thank Li Yuan and Sofia Hemmil¨a for doing a lot of the lab work, and Jun Chen for giving me sound advice in how to best implement MIMAR.

I would also like to thank Lanzhou University for supplying the seeds used in the extractions and would finally like to thank my parents for generously funding me throughout my education.

(24)

References

Bachtrog, D., Thornton, K., Clark, A. & Andolfatto, P. (2006). Extensive introgression of mitochondrial DNA relative to nuclear genes in Drosophila yakuba species group. Evolution 60(2): 292-302.

Beaumont, M. A., Zhang, W. & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics 162: 2025-2035.

Becquet, C. & Przeworski, M. (2007). A new method to estimate parame- ters of speciation models, with application to apes. Genome Research 17:

1505-1519.

Bouill´e, M. & Bousquet, J. (2005). Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany 92(1): 63-73.

Brown, G. R., Gill, G. P., Kuntz, R. J., Langley, C. H. & Neale, D. B. (2004).

Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc. Natl.

Acad. Sci. USA 101: 15255-15260.

Currat, M., Ruedi, M., Petit, R. J. & Excoffier, L. (2008). The hidden side of invasions: massive introgression by local genes. Evolution 62-8: 1908- 1920.

Doyle, J. & Doyle, J. (1999). Isolation of plant DNA from fresh tissue.

BRL Focus 12: 13-15.

Excoffier, L., Laval, G. & Schneider, S. (2005). Arlequin, version 3.0: an integrated software package for population genetics data analysis. Evolu- tionary Bioinformatics Online 1: 47-50.

Ewing, B. & Green, P. (1998). Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Research 8: 186-194.

Ewing, B., Hillier, L., Webdl, M. & Green, P. (1998). Basecalling of au- tomated sequencer traces using phred . I. Accuracy assessment. Genome Research 8: 175-185.

Falush, D., Stephens, M. & Pritchard, J. K. (2003). Inference of population structure using multilocus data: linked loci and correlated allele frequencies.

Genetics 164: 1567-1587.

(25)

Falush, D., Stephens, M. & Pritchard, J. K. (2007). Inference of popu- lation structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes 7: 574-578.

Farj´on, A. (1990). Pinaceae: drawings and descriptions of the genera Abies, Cedrus, Pseudolarix, Keteleeria, Nothotsuga, Tsuga, Cathaya, Pseudotsuga, Larix and Picea. [Regnum Vegetable 121]. Koenigstein : Koeltz Scientific Books.

Farj´on, A. (2001). World checklist and bibliography of Conifers, Second edn. Royal Bot. Gard., Kew, England.

Fay, J. C. & Wu, C.-I. (2000). Hitchhiking under positive Darwinian se- lection. Genetics 155: 1405-1413.

Felsenstein, J. (1988). Phylogenies from molecular sequences-inference and reliability. Ann. Rev. Genet. 22: 521-565.

Fu, Y.-X. (1995). Statistical properties of segregating sites. Theoret. Pop.

Biol. 48: 172-197.

Fu, Y.-X. & Li, W.-H. (1993). Statistical tests of neutrality of mutations.

Genetics 133: 693-709.

Gilks, W., Richardson, S. & Spiegelhalter, D. (1996). Markov Chain Monte Carlo in practice. (Chapman and Hall/CRC, Boca Raton, FL).

Gillespie, J. H. (2004). Population genetics - A Concise Guide. The John Hopkins University Press, Baltimor, Maryland, USA.

Gordon, D., Abajian, C. & Green, P. (1998). Consed: a graphical tool for sequence finishing. Genome Research 8: 195-202.

Griffiths, R. C. (1991). The two-locus ancestral graph, in Selected Pro- ceedings of the Symposium on Applied Probability (I. V. Basawa and R. L.

Taylor, eds), pp. 100-117, Institute of Mathematical Statistics, Hayward, CA, USA.

Griffiths, R. C. & Marjoram, P. (1996). Ancestral inference from samples of DNA sequences with recombination. J. Comp. Biol. 3: 479-502.

Griffiths, R. C. & Tavar´e, S. (1998). The age of a mutation in a general coalescent tree. Commun. Statist.-Stochastic Models 14: 373-295.

(26)

Haddrill, P. R., Thornton, K. R., Charlesworth, B. & Andolfatto, P. (2005).

Multilocus patterns of nucleotide variability and the demographic and se- lection history of Drosophila melanogaster populations. Genome Res. 15:

790-799.

Herbots, H. M. (1997). The structured coalescent. in Progress in Popu- lation Genetics and Human Evolution (IMA Volumes in Mathematics and Its Applications, vol. 87) (P. Donnelly and S. Tavar´e, eds), pp. 231-255, Springer-Verlag, New York.

Heuertz, M., De Paoli, E., Kllman, T., Larsson, H., Jurman, I., Morgante, M., Lascoux, M. & Gyllenstrand, N. (2006). Multilocus patterns of nu- cleotide diversity, linkage disequilibrium and demographic history of Norway Spruce [Picea abies (L.) Karst]. Genetics 174: 2095-2105.

Hey, J. & Nielsen, R. (2004). Multilocus methods for estimating popula- tion sizes, migration rates and divergence times, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167:

747-760.

Hey, J. & Nielsen, R. (2007). Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc.

Natl. Acad. Sci. U.S.A. 104(8): 2785-2790.

Hudson, R. R. (1983). Properties of a neutral allele model with intragenic recombination. Theor. Pop. Biol. 23: 183-201.

Hudson, R. R. (2001). Two-locus sampling distributions and their appli- cation. Genetics 159: 1805-1817.

Hudson, R. R. (2002). Generating samples under a Wright-Fisher neutral model. Bioinformatics 18: 337-338.

Hudson, R. R. & Coyne, J. A. (2002). Mathematical consequences of the genealogical species concept. Evolution 56(8): 1557-1565.

Hudson, R. R. & Kaplan, N. L. (1985). Statistical properties of the number of recombination events in the history of a sample of DNA-sequences. Ge- netics 111: 147-164.

Jaramillo-Correa, J. P., Bousquet, J., Beaulieu, J., Isabel, N., Perron, M.

& Bouill´e, M. (2003). Cross-species amplification of mitochondrial DNA sequence-tagged-site markers in conifers: the nature of polymorphism and variation within and among species in Picea. Theor. Appl. Genet. 106:

(27)

1353-1367.

Kingman, J. F. C. (1982a). The coalescent. Stochast. Proc. Appl. 13:

235-248.

Kingman, J. F. C. (1982b). On the genealogy of large populations. J.

Appl. Prob. 19A: 27-43.

Krone, S. M. & Neuhauser, C. (1997). Ancestral processes with selection.

Theoret. Pop. Biol. 51: 210-237.

Lamothe, M., Meirmans, P. & Isabel, N. (2006). A set of polymorphic EST-derived markers for Picea species. Molecular Ecology Notes 6: 237-240.

McVean, G., Awadalla, P. & Fearnhead, P. (2002). A coalescent-based method for detecting and estimating recombination from gene sequences.

Genetics 160: 1231-1241.

Neuhauser, C. & Krone, S. M. (1997). The genealogy of samples in models with selection. Genetics 145: 519-534.

Nielsen, R. & Wakeley, J. (2001). Distinguishing migration from isolation:

a Markov chain Monte Carlo approach. Genetics 158: 885-896.

Notohara, M. (1990). The coalescent and the genealogical process in ge- ographically structured populations. J. Math. Biol. 29: 59-75.

Perry, D.J. & Bousquet, J. (1998). Sequence-tagged-site (STS) markers of arbitrary genes: development, characterization and analysis of linkage in black spruce. Genetics 149: 1089-1098.

Pritchard, J. K., Stephens, M. & Donnelly, P. (2000). Inference of pop- ulation structure using multilocus genotype data. Genetics 155: 945-959.

Ran, J-H., Wei, X-X. & Wang, X-Q. (2006). Molecular phylogeny and biogeography of Picea (Pinaceae): Implications for phylogeographical stud- ies using cytoplasmic haplotypes. Mol. Phylogenet. Evol. 41: 405-419.

R Developments Core Team (2007). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.

Rozas, J., Snchez-DelBarrio, J.C., Messeguer, X. & Rozas, R. (2003). DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioin- formatics 19: 2496-2497.

(28)

Saunders, I. W., Tavar´e, S. & Watterson, G. A. (1984). On the geneal- ogy of nested subsamples from a haploid population. Adv. Appl. Prob. 16:

471-491.

Slotte, T., Huang, H., Lascoux, M. & Ceplitis, A. (2008). Polyploid specia- tion did not confer instant reproductive isolation in Capsella (Brassicaceae).

Mol. Bio. Evol. 25: 1472-1481.

Stephens, M. (2001). Inference under the coalescent in Handbook of Sta- tistical Genetics, eds Balding D.J., Bishop, M. & Cannings, C. (Wiley, West Sussex, UK).

Syring, J., Willyard, A., Cronn, R.C. & Liston, A. (2005). Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany 92: 2086-2100.

Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437-460.

Tajima, F. (1989). Statistical method for testing the neutral mutation hy- pothesis by DNA polymorphism. Genetics 123: 585-595.

Takahata, N. & Slatkin, M. (1984). Mitochondrial gene flow. Proc. Natl.

Sci. USA. 81: 1764-1767.

Temesgen. B., Brown, G. R., Harry, D. E., Kinlaw, C. S., Sewell, M. M.

& Neale, D. B. (2001). Genetic mapping of expressed sequence tag poly- morphism (ESTP) markers in loblolly pine (Pinus taeda L.). Theor. Appl.

Genet. 102: 664-675.

Thornton, K. (2003). libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19 (17): 2325-2327.

Thornton, K. & Andolfatto, P. (2006). Approximate Bayesian Inference reveals evidence for a recent, severe, bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172: 1607-1619.

Wakeley, J. & Hey, J. (1997). Estimating ancestral population parame- ters. Genetics 145: 847-855.

Wang, C., Zhao, X., Liu, Z., Lippert, P. C., Graham, S. A., Coe, R. S., Yi, H., Zhu, L., Liu, S. & Li, Y. (2008). Constraints on the early uplift his- tory of the Tibetan Plateau. Proc. Natl. Acad. Sci. U.S.A. 105: 4987-4992.

(29)

Wang, Y., Deng, T. & Biasetti, D. (2006). Ancient diets indicate signif- icant uplift of southern Tibet after ca. 7 Ma. Geology 34: 309-312.

Wang, Y., Wang, X., Xu, Y., Zhang, C., Li, Q., Tseng, Z. J., Takeuchi, G.

& Deng, T. (2008). Stable isotopes in fossil mammals, fish and shells from Kunlun Pass Basin, Tibetan Plateau: Paleo-climatic and paleo-elevation implications. Earth and Planetary Science Letters 270: 73-85.

Watterson, G. A. (1975). On the number of segregating sites in the ge- netical models without recombination. Theoret. Pop. Biol. 7: 256-276.

Willyard, A., Syring, J., Gernandt, D. S., Liston, A. & Cronn, R. (2007).

Fossil calibration of molecular divergence infers a moderate mutation rate and recent radiations for Pinus. Mol. Biol. Evol. 24(1): 90-101.

Wright, S. (1951). The genetical structure of populations. Ann. Eugen- ics 15: 323-354.

Zeng, K., Fu, Y-X., Shi, S. & Wu, C-I. (2006). Statistical tests for de- tecting positive selection by utilizing high-frequency variants. Genetics 174:

1431-1439.

(30)

Appendix

Table 4: List of the outgroups used to infer the ancestral allele, with accession numbers and references supplied.

Locus Outgroups Accessions/References

sel1364 P. abies, P. breweriana, P. glauca, P. mar- iana, Pinus taeda †

Heuertz et al. (2006), † AM269163 sel1390 P. abies, P. breweriana, P. glauca, P. mar-

iana

Heuertz et al. (2006) xy1420 P. abies, P. breweriana †, P. glauca †, P.

mariana †

Heuertz et al. (2006), † Chen et al. (sub- mitted)

Ft3 P. abies, P. breweriana, P. glauca, P. mar- iana

Chen et al. (submitted) Gi P. abies, P. breweriana †, P. glauca †, P.

mariana †

Heuertz et al. (2006), † Chen et al. (sub- mitted)

Sb16 P. abies, P. breweriana, P. glauca, P. mar- iana

Chen et al. (submitted) Sb29 P. abies, P. breweriana, P. glauca, P. mar-

iana

Chen et al. (submitted) 2009 Cathaya argyrophylla, Pinus densata DQ424832, DQ232990 4cl Cathaya argyrophylla, Larix sukaczewii AF144505, EU280880

ebs P. abies AM267810.1

Pch P. abies, P. breweriana, Cathaya argy- rophylla, Pinus taeda

Chen et al. (submitted), DQ424820, X66727 S54300

M002 P. glauca, Pinus halepensis DQ120066.1, AY705798

M007D1 Pinus monticola AY596275

References

Related documents

The distribution of these mutations along the branches of the genealogy form different patterns at the tips of the trees, and by comparing the patterns pro- duced by the

Re-examination of the actual 2 ♀♀ (ZML) revealed that they are Andrena labialis (det.. Andrena jacobi Perkins: Paxton & al. -Species synonymy- Schwarz & al. scotica while

46 Konkreta exempel skulle kunna vara främjandeinsatser för affärsänglar/affärsängelnätverk, skapa arenor där aktörer från utbuds- och efterfrågesidan kan mötas eller

Both Brazil and Sweden have made bilateral cooperation in areas of technology and innovation a top priority. It has been formalized in a series of agreements and made explicit

Generella styrmedel kan ha varit mindre verksamma än man har trott De generella styrmedlen, till skillnad från de specifika styrmedlen, har kommit att användas i större

Närmare 90 procent av de statliga medlen (intäkter och utgifter) för näringslivets klimatomställning går till generella styrmedel, det vill säga styrmedel som påverkar

• Utbildningsnivåerna i Sveriges FA-regioner varierar kraftigt. I Stockholm har 46 procent av de sysselsatta eftergymnasial utbildning, medan samma andel i Dorotea endast

Industrial Emissions Directive, supplemented by horizontal legislation (e.g., Framework Directives on Waste and Water, Emissions Trading System, etc) and guidance on operating