
Conifer Evolution, from Demography and Local Adaptation to Evolutionary Rates: Examples from the Picea genus


Academic year: 2022


"The way was long, and wrapped in gloom did seem, As I urged on to seek the hidden truth."

Qu, Yuan


List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Chen, J., T. Källman, N. Gyllenstrand, and M. Lascoux, 2010. New insights on the speciation history and nucleotide diversity of three boreal spruce species and a Tertiary relict. Heredity 104(1): 3-14.

II Li, Y., M. Stocks, S. Hemmila, T. Källman, H. Zhu, Y. Zhou, J. Chen, J. Liu, and M. Lascoux, 2010. Demographic histories of four spruce (Picea) species of the Qinghai-Tibetan Plateau and neighboring areas inferred from multiple nuclear loci. Molecular Biology & Evolution 27(5): 1001-1014.

III Chen, J., T. Källman, XF. Ma, N. Gyllenstrand, G. Zaina, M. Morgante, J. Bousquet, A. Eckert, J. Wegrzyn, D. Neale, U. Lagercrantz, and M. Lascoux, 2012. Disentangling the roles of history and local selection in shaping clinal variation in allele frequency and gene expression in Norway spruce (Picea abies). Genetics 191(3): 865-881.

IV Chen, J., Y. Tsuda, M. Stocks, T. Källman, VL. Semerikov, GG. Vendramin, and M. Lascoux, 2012. Clinal variation in allele frequency in photoperiodic genes in Siberian spruce (Picea obovata): an example of convergent evolution? Manuscript.

V Chen, J., S. Uebbing, N. Gyllenstrand, U. Lagercrantz, M. Lascoux, and T. Källman, 2012. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates but similar selective constraints in conifers compared to angiosperms. Submitted manuscript.

VI Källman, T., J. Chen, N. Gyllenstrand, U. Lagercrantz, 2012. Major difference in short RNA populations among plants – the result of interactions with pathogens? Manuscript.

Reprints were made with permission from the publishers.


Contents

1 Introduction
1.1 Speciation, Geographic Variation and Clines
1.1.1 Ecology and phylogeny of spruce
1.1.2 Speciation and population history
1.1.3 Identifying local adaptation along a cline
1.2 Molecular evolution of Norway spruce
1.2.1 Transcriptome
1.2.2 Small noncoding RNAs and their regulatory role in plants
2 Results and discussion
2.1 Estimate of speciation history of spruce based on multilocus patterns - Paper I and II
2.1.1 Nucleotide diversity and gene flow
2.1.2 Demographic history of eight spruce species
2.2 Clinal variation and local selection - Paper III and IV
2.2.1 Latitudinal gradient patterns in phenology
2.2.2 Population genetic structures
2.2.3 Allele frequency spectrum analysis for local adaptation
2.2.4 Differentiation in gene expression
2.2.5 Parallel evolution and the selection on Gigantea
2.3 Norway spruce transcriptome and regulatory small RNAs - Paper V and VI
2.3.1 Norway spruce needle transcriptome sequencing
2.3.2 Gene expression differences
2.3.3 Comparative molecular evolution in conifers
2.3.4 Norway spruce small regulatory RNA sequencing
2.3.5 MicroRNA prediction
2.3.6 NBS-LRR gene family and relevant siRNAs
3 Conclusion
4 Svensk sammanfattning
5 Acknowledgements
References


1. Introduction

"Nothing in evolution makes sense except in the light of population genetics"

M. Lynch (2007)

The evolutionary process can be inferred at three different levels: at the species level, at the population level or at the molecular level. The first level is the domain of phylogenetics and systematics, while the second and third levels are the domains of population genetics and molecular evolution, respectively. The three levels have often been studied separately, which has led to multifaceted, and sometimes divergent, perceptions of the evolutionary process. For example, the inherent stochasticity of the evolutionary process, which is one of the great obsessions of population geneticists and with which we will grapple in the work presented here, was until recently largely neglected in systematics. And while molecular biologists have tended to embrace gene regulation – rather than structural changes – as the backbone of evolutionary changes, population geneticists have been more reserved (Hoekstra and Coyne, 2007; Stern and Orgogozo, 2009). In spite of obvious differences in time and spatial scales between the three levels, it is also evident that they are highly interconnected. A first unifying factor is the environment since, at all three levels, the outcome of the evolutionary process is strongly contingent on the biotic and physical environments in which the organisms live. Incidentally, the organisms thereby also define the environment in which they live. A second unifying factor is of course genes, and perhaps more importantly genes in populations. Notwithstanding recent findings of a heritable contribution of epigenetic processes to quantitative traits (e.g. Johannes et al., 2009), genes remain the main evolutionary unit.

The present thesis is an attempt to look at the evolutionary process in spruce species at all three levels and, when possible, to try to highlight the connection between the different levels. The first two studies (Papers I and II) tackle aspects of speciation with population genetics tools. As we shall see, spruce species sometimes still exchange genes, and when they do not they still look surprisingly young if their age is measured on the right timescale. The next two studies (Papers III and IV) are related to adaptive variation and firmly rooted in population genetics. Yet they focus on specific genes and compare two species, thereby constituting a natural bridge with the last two manuscripts (Papers V and VI) that constitute some initial forays in the genomic world. Yet again, even in this last group of studies, there is a connection with the species and population levels as divergence data are used to infer genome-wide patterns of selection.


1.1 Speciation, Geographic Variation and Clines

While it is generally accepted that evolution occurs by "descent with modification" and species originate from other species, the details of the speciation process are still hotly debated and, altogether, remain poorly understood. For example, can species evolve in sympatry? How are species maintained in the face of gene flow? Is natural selection necessary for speciation to take place or can speciation occur simply through mutation and random drift? If natural selection does indeed play an overwhelming role in evolution as argued by Hahn (2008), how then is genetic diversity maintained? In the present thesis I am going to address some of these questions using spruce species as an example.

1.1.1 Ecology and phylogeny of spruce

Distribution and ecology

The genus Picea currently consists of around 34 species. They are usually tall, sometimes large, evergreen trees with a monopodial, straight, and columnar trunk (Farjon, 1990). Spruce grows from the polar circle down to just south of the Tropic of Cancer (see Figure 1.1). Some species are widely distributed across the northern conifer forest biome: P. abies and its close relative P. obovata grow from the Atlantic coast of Norway eastwards to the Sea of Okhotsk, and the combined ranges of P. glauca and P. mariana cover most of Canada and Alaska. At southern latitudes spruce species are generally confined to small and scattered populations at high altitude: P. martinezii and P. chihuahuana in Mexico or P. morrisonicola in Taiwan. The Tertiary relict P. breweriana also grows in scattered montane populations in Northern California. Between these two extremes we find species with intermediate distribution areas. Some are coastal, like P. sitchensis, or montane to subalpine, like P. schrenkiana, P. likiangensis, P. wilsonii, and P. purpurea in the Sino-Himalayan area. Northern and southern range limits could be caused primarily by the deleterious effects of cold and hot air temperatures, respectively, on individual tree growth, although their importance has been disputed in P. mariana (Bonan and Sirois, 1992). A combination of biotic and abiotic factors, such as the frequency of outbreaks of insect defoliators or pathogenic fungi, the duration of the growing season and moisture supply, likely all play a role in setting these limits.

Phylogeny

Figure 1.1. Natural distribution ranges of ten spruce species studied in this thesis. In North America: P. mariana and P. glauca (blue), P. sitchensis (orange) and P. breweriana (purple); in Eurasia: P. abies and P. obovata (green), P. schrenkiana (shaded green), P. likiangensis (shaded pink), P. purpurea (shaded orange), and P. wilsonii (shaded blue). Distribution data were collected from Farjon (1990), USGS (http://esp.cr.usgs.gov/) and EUFORGEN (http://www.euforgen.org/).

As for other conifer genera, there is not yet any satisfying phylogeny of Picea. The best available phylogenies of the Picea genus are based on chloroplast and mitochondrial markers (e.g. Ran et al., 2006; Bouillé et al., 2010), both of which are non-recombining and therefore equivalent to single loci. These phylogenies based on cytoplasmic markers neither match morphological classifications nor are congruent with each other. These discrepancies could reflect the different inheritance and dispersal modes of the two cytoplasmic genomes as well as incomplete lineage sorting. Even if the effective population size of cytoplasmic markers is one half that of nuclear markers in a monoecious species, there are good grounds to believe that incomplete lineage sorting also matters in this case, as it seems pervasive for nuclear loci. For instance, Bouillé and Bousquet (2005) recorded a vast number of trans-species shared polymorphisms. They suggested that this could be the result of incomplete lineage sorting, which is consistent with the large effective population size of spruce (in a simple split model it takes approximately 9 to 12 $N_e$ generations for a pair of taxa to reach reciprocal monophyly at 95% of nuclear loci, Hudson and Coyne, 2002), and/or of inter-species gene flow, which is indeed reflected by the presence of hybrid zones, for instance between Norway and Siberian spruces (Tollefsrud, 2008). Because of its smaller effective population size, organelle DNA will also become monophyletic much more rapidly and fail to recover information on ancient speciation events (Hudson and Coyne, 2002). Nonetheless the phylogenies based on cytoplasmic markers provide a first glimpse at the general history of the genus. Markers from both the mitochondria and the chloroplast, for instance, indicate that P. breweriana is basal and that other species belong to three clades: a glauca clade, an Asian clade and an abies–mariana clade. But eventually, a multilocus phylogeny will be required. A multilocus approach does not ensure the retrieval of the true species tree (Degnan and Rosenberg, 2006, 2009; Degnan et al., 2009), especially if the number of nuclear loci used is limited, but it will certainly reduce the level of uncertainty. Computational problems associated with large multilocus trees remain formidable but new methods are emerging and seem promising (e.g. Bryant et al., 2012).
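To give a sense of scale for the 9 to 12 $N_e$ generations cited from Hudson and Coyne (2002), the short Python sketch below converts that waiting time into years. The effective population size (100,000) and the generation time (50 years) are illustrative assumptions of the order discussed for spruce in this chapter, and the function name is hypothetical.

```python
def years_to_reciprocal_monophyly(ne, generation_time_years, multiples=(9, 12)):
    """Convert a waiting time given in multiples of Ne generations into years."""
    return tuple(k * ne * generation_time_years for k in multiples)


# Illustrative assumptions only, not estimates taken from the thesis papers.
low, high = years_to_reciprocal_monophyly(ne=100_000, generation_time_years=50)
print(f"{low / 1e6:.0f} to {high / 1e6:.0f} million years")
```

With these inputs the waiting time comes to 45 to 60 million years, which illustrates why widespread shared polymorphism among spruce species is not surprising.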


1.1.2 Speciation and population history

The main goal of phylogenetics is to retrieve the species tree linking the different taxa under study. In a genus such as Picea, where incomplete lineage sorting is the rule rather than the exception and where hybridization does occur, going from gene tree to species tree is not an easy task and will require numerous independent loci. While the presence of a large proportion of shared polymorphisms is a cause for worry for species tree inference, it can also be viewed as a positive thing if one is instead interested in inferring ancient population demographics. Indeed the fact that ancient polymorphisms that coalesce in the ancestral population still segregate in current populations implies that the SNP variation detected in current data also contains information on the ancestral population. Hence, one can use polymorphism and divergence data to estimate key demographic parameters associated with the divergence of populations or species. Since the seminal paper of Wakeley and Hey (1997) and the introduction of the "Isolation with migration" model, different approaches have been developed to estimate those demographic parameters. It should be noted that this large body of work is direct evidence of the convergence of methods in population genetics and phylogenetics.

"Isolation with migration" model

In its generic form, the "Isolation with migration" (IM) model describes an ancestral population that splits into two descendant populations that are connected by gene flow. The IM model assumes neutrality and has six parameters: the scaled mutation rates of the ancestral and descendant populations ($\theta_A$, $\theta_1$ and $\theta_2$), where $\theta = 4 N_e \mu$, $N_e$ is the effective population size and $\mu$ is the mutation rate; the split time ($T$) between the descendant populations; and the migration rates between the descendant populations after the split ($m_{12}$ and $m_{21}$) (Figure 1.2).
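As a minimal sketch of how the six IM parameters fit together, the Python snippet below groups them in one record and inverts $\theta = 4 N_e \mu$ to recover effective population sizes. The class name, field names and all numerical values are hypothetical, and the units of the split time and migration rates are deliberately left to the caller because they depend on the estimation software used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IMParameters:
    """The six parameters of the IM model (names chosen for illustration)."""
    theta_a: float     # ancestral population, theta_A = 4 * N_A * mu
    theta_1: float     # descendant population 1
    theta_2: float     # descendant population 2
    split_time: float  # T, in whatever scaling the caller uses
    m12: float         # migration rate, one direction
    m21: float         # migration rate, the other direction

    def effective_sizes(self, mu):
        """Invert theta = 4 * Ne * mu for each of the three populations."""
        return {
            "ancestral": self.theta_a / (4.0 * mu),
            "pop1": self.theta_1 / (4.0 * mu),
            "pop2": self.theta_2 / (4.0 * mu),
        }


# Hypothetical values, chosen only to make the arithmetic easy to follow.
params = IMParameters(theta_a=0.008, theta_1=0.004, theta_2=0.002,
                      split_time=1.0, m12=0.5, m21=0.0)
sizes = params.effective_sizes(mu=1e-8)
print(sizes)
```

Note that the conversion depends linearly on the assumed mutation rate, so any uncertainty in $\mu$ carries over directly to the estimates of $N_e$.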

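Figure 1.2. "Isolation with migration" model. Diagram labels: $\theta_A = 4 N_A \mu$ (ancestral population), $\theta_1 = 4 N_1 \mu$ and $\theta_2 = 4 N_2 \mu$ (descendant populations), split time $T$, and migration rates $m_1$ and $m_2$.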

Different statistical methods have been developed to estimate those parameters (Hey and Nielsen, 2004; Becquet and Przeworski, 2007), all of which are based on the coalescent model presented in Nielsen and Wakeley (2001) for a single locus and Hey and Nielsen (2004) for multiple loci.


Briefly, the methods generally rely on a comparison between polymorphism and divergence and use summary statistics initially proposed by Wakeley and Hey (1997), or a slightly modified version of those (Becquet and Przeworski, 2007). Following Wakeley and Hey (1997), segregating sites ($S$) are classified into four categories, $S_1$, $S_2$, $S_s$, and $S_f$. For each locus, $S_1$ and $S_2$ are the numbers of polymorphic sites unique to populations 1 and 2, respectively; $S_s$ is the number of polymorphic sites shared between the two samples; and $S_f$ is the number of fixed differences, that is, sites at which neither sample is polymorphic but the two samples carry different alleles (Figure 1.3). These summary statistics contain information about the demographic parameters of the IM model. For example, a large number of shared polymorphisms is expected when the split time is short and the effective population sizes are large, as most coalescent events would then take place in the ancestral population. The presence of gene flow complicates matters and makes them less intuitive, since gene flow will also lead to shared polymorphisms. However, gene flow will also lead to an excess of variation in coalescence time among loci compared to a pure split model (Becquet and Przeworski, 2009).

Figure 1.3. Different categories of segregating sites. Diagram labels: mutations leading to a fixed difference, to a shared polymorphism, or to a polymorphism in only one population, placed on a genealogy with parameters $\theta_1$, $\theta_2$, $\theta_A$ and $T$.
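The four categories of segregating sites can be counted directly from two aligned samples. The Python sketch below is a minimal illustration that assumes biallelic sites, no missing data and sequences of equal length; the function name and the toy alignments are hypothetical and are not taken from the thesis data.

```python
def classify_segregating_sites(pop1, pop2):
    """Count S1, S2, Ss and Sf for two samples of aligned sequences."""
    counts = {"S1": 0, "S2": 0, "Ss": 0, "Sf": 0}
    for site in range(len(pop1[0])):
        alleles1 = {seq[site] for seq in pop1}
        alleles2 = {seq[site] for seq in pop2}
        poly1 = len(alleles1) > 1
        poly2 = len(alleles2) > 1
        if poly1 and poly2:
            counts["Ss"] += 1   # polymorphism shared by both samples
        elif poly1:
            counts["S1"] += 1   # polymorphic only in population 1
        elif poly2:
            counts["S2"] += 1   # polymorphic only in population 2
        elif alleles1 != alleles2:
            counts["Sf"] += 1   # fixed difference between the samples
        # otherwise the site is monomorphic overall and is not counted
    return counts


# Toy alignments (hypothetical): three sequences per population, six sites.
pop1 = ["ACGTAG", "ACGTCG", "ATGTCG"]
pop2 = ["GCGAAG", "GCGACT", "GCGACG"]
counts = classify_segregating_sites(pop1, pop2)
print(counts)
```

In this toy example sites 1 and 4 (counting from 1) are fixed differences, site 2 is polymorphic only in population 1, site 5 is a shared polymorphism and site 6 is polymorphic only in population 2.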

The two descendant populations can be two populations of the same species, two incipient species or two well-defined species. In contrast to the prevailing allopatric model of speciation, a meta-analysis by Pinho and Hey (2010) suggests that speciation often occurs in the presence of gene flow. These results should, however, be interpreted carefully. For example, as shown by Becquet and Przeworski (2009), current methods to estimate gene flow under the IM model will have limited power to detect gene flow early on in the divergence process, since it may not increase the variation in coalescence time to a detectable level. Unfortunately it is precisely gene flow occurring early on in the divergence process that one would like to detect in order to tell apart sympatric and allopatric speciation. Finally, we note that neutral models, even though they remain irreplaceable as null models, are unlikely to capture all the complexity of the speciation process, which is the result of interactions between natural selection, genetic drift, gene flow and recombination, all of which can cause accumulation of fixed mutations and finally lead to genetic incompatibility between the two closely related populations (see reviews by Coyne and Orr, 2004; Hey, 2006; Pinho and Hey, 2010). So eventually, one will have to better understand the interplay of demography and selection within each species.

Demographic history of spruce species

Like most other plant and animal species, spruces experienced cycles of contractions and expansions in population size in response to climate changes during the late Quaternary (Davis and Shaw, 2001). These fluctuations were particularly strong for species living at high latitudes in North America and Eurasia, but species at lower latitudes were also affected, though probably to a lesser degree. This is supported by a combination of palynological data, macrofossils and surveys of genetic diversity. The available data, as well as the data presented in the present thesis, however, suggest a large variation in the way individual species reacted to climatic fluctuations.

In North America, paleoecological studies of fossil pollen showed that P. mariana and P. glauca went through a rapid population expansion that lasted over 3000 years since the Last Glacial Maximum (LGM) (McLeod and MacDonald, 1997) and that P. mariana replaced P. glauca as the dominant species at ∼ 14,000 cal yr BP (Lindbladh et al., 2007). P. sitchensis is also thought to have experienced a rapid expansion from southern refugia in coastal California (Daubenmire, 1968; Soltis et al., 1997; Mimura and Aitken, 2007). However, the Californian spruce species P. breweriana had a fairly stable population size during the same period (Ledig and Hodgskiss, 2005).

A similar variation in plant reactions to climatic changes is observed in Eurasian spruce species. Pollen data (Giesecke and Bennett, 2004; Knaap, 2006) indicate that P. abies expanded strongly in Northwest Europe. After the LGM Norway spruce recolonized Scandinavia from populations located in central Russia and most likely also from cryptic refugia in the northern part of European Russia (Väliranta et al., 2011; Chen et al., 2012). Spruces seem to have also survived in small glacial refugia on the Norwegian coast, but their contribution to current populations appears to have been limited (Parducci et al., 2012). The spread westwards followed at least two main recolonization routes (Tollefsrud et al., 2009), today reflected in a complex population genetic structure in spite of extensive gene flow (Heuertz et al., 2006; Chen et al., 2012). Norway spruce reached eastern Finland about 6,500 years ago and eastern central Sweden about 2,700 years ago (Giesecke and Bennett, 2004; Seppä et al., 2010). Thus the current Scandinavian latitudinal clines in bud set (see below) were established during a short evolutionary period and from different source populations. Interestingly, multilocus studies in both conifers (Picea abies, Pinus sylvestris) and angiosperm trees (Populus tremula) found that the three species experienced a severe bottleneck long before the LGM (Pyhäjärvi et al., 2007; Ingvarsson, 2008). In the three species, inferences were based on the intensive use of coalescent simulations and summary statistics of the site frequency spectrum (Tajima's D and Fay & Wu's H). In Norway spruce, Heuertz et al. (2006) used 22 loci from a sample of seven populations representing the Baltico-Nordic, Alpine and Romanian domains. It should be noted that Namroud et al. (2010), using a smaller dataset but longer sequences, detected a bottleneck corresponding roughly to the LGM.
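For readers unfamiliar with the summary statistics mentioned here, the Python sketch below computes Tajima's D from three quantities: the sample size, the number of segregating sites and the mean number of pairwise differences. It follows the standard published formula (Tajima, 1989) and is not the pipeline used in the studies cited; the function name is hypothetical and the input values are a textbook-style example rather than spruce data.

```python
import math


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences pi."""
    if s == 0:
        return float("nan")  # D is undefined when there is no variation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


d = tajimas_d(n=10, s=16, pi=3.888889)
print(round(d, 3))
```

A negative value, as in this example, reflects an excess of low-frequency variants relative to the standard neutral model. Such a skew can arise from population growth as well as from selection, which is why the studies above relied on coalescent simulations to interpret these statistics.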

As one moves eastwards into the Siberian plain and onto the Qinghai-Tibetan Plateau (QTP), paleoecological data become scarcer. In both places the available data suggest a different impact of climate changes compared to western Europe. In Siberia, glaciation was much less extensive than in Northwestern Europe (MacDonald et al., 2008; Stauch and Gualtieri, 2008). While the treeline certainly moved during the Holocene, some conifer stands remained at high latitude in the Northeast (Mueller et al., 2010). Much less is known about central Siberia. A recent study indicates that most of the area was a cold desert (Velichko et al., 2011), as westerly moisture, which is today the source of both rain and snow in this region, was blocked by the Scandinavian Ice Sheet during glacial times. The depth of permafrost during the LGM was also much greater than it is today. However, macrofossil remains collected on lower river terraces and high floodplains suggest that while most of the region consisted of aeolian dunes, the river valleys were a separate habitat where spruces were able to survive (Binney et al., 2009). Altogether, paleoecological and genetic data hence suggest that the impact of climate changes on the population size of the main components of the boreal forest has probably been more limited in Siberia than for their Scandinavian counterparts.

The mountain ranges surrounding the QTP are home to a very large diversity of plant species (López-Pujol et al., 2011). In particular, a third of the spruce species grow there, a proportion strikingly different from that of boreal species, which occupy a much larger area. This high species diversity is probably a consequence of the orogenesis of the highly rugged and dissected topography of the area. In the words of López-Pujol et al. (2011), the eastern fringe of the Tibetan Plateau clearly constitutes the "evolutionary front" of China. Palynological studies suggest that conifers were present in the region some 50 mya but that spruce species only started to be an important fraction of the pollen record on the QTP around 38 mya (Wang et al., 1990; Dupont-Nivet et al., 2008). Spruce species remained common until ca. 20 mya, at which time their contribution to the pollen record started to decline, and it stayed low until 17 mya. The ensuing years are unfortunately poorly documented, but the continuing elevation and climatic changes through which the QTP went during this period certainly led to further fluctuations in abundance. In particular, the significant uplift of southern Tibet some 7 mya inferred by some authors (e.g. Wang et al., 2006, 2008) would certainly have affected spruce distribution. As for Siberia, the impact of the Quaternary full-glacial periods on species genetic diversity seems to have been more limited than in Western Europe, even though there is indirect evidence that species also retreated to glacial refugia. Altogether the Quaternary history of conifers on the fringes of the QTP appears very dynamic, with major population shifts but also species staying put, secondary contacts and the presence of homoploid hybrids (Ma et al., 2006; Li et al., 2010b, 2011).

In summary, paleoecological and genetic data suggest that spruce species experienced very different demographic histories in various parts of the world. A comprehensive description of the history of spruce species should account for this diversity. This is especially true for reconstructions based on standing genetic variation, which is going to be influenced by the recent demographic changes reviewed above. Further, as we noted in the Phylogeny section, the presence of incomplete lineage sorting and hybridization means that retrieving species trees in such a group of species is going to be hard. A multilocus population genetic analysis of smaller groups of species therefore seems more promising, at least in the short term. When this thesis was initiated, the only available study (Bouillé and Bousquet, 2005) considered three distantly related species and gave estimates of $N_e$ varying from 96,000 to 182,000 and a divergence time ranging from 10 to 18 million years, depending on the species pair considered (assuming a mutation rate $\mu$ of 1 to 2 $\times 10^{-8}$ and a generation time of 50 years). These estimates were simply derived from pairwise nucleotide divergence ($\pi$), without any consideration of the demographic history of the pair of species analyzed.
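To make the simplicity of such an estimate concrete, the Python sketch below converts a pairwise divergence into a split time. It assumes a strict molecular clock, $T = d / (2\mu)$ generations, with $\mu$ expressed per site per generation and with ancestral polymorphism ignored. This formula is an assumption made here for illustration, since the text does not spell out the exact calculation used by Bouillé and Bousquet (2005), and the divergence value is hypothetical.

```python
def divergence_time_years(divergence, mu, generation_time_years):
    """Split time under a strict clock: T = d / (2 * mu) generations, then years."""
    generations = divergence / (2.0 * mu)
    return generations * generation_time_years


d_hypothetical = 0.006  # per-site pairwise divergence, invented for illustration
for mu in (1e-8, 2e-8):
    years = divergence_time_years(d_hypothetical, mu, generation_time_years=50)
    print(f"mu = {mu:.0e}: about {years / 1e6:.1f} million years")
```

Doubling the assumed mutation rate halves the estimated split time, and because the calculation attributes all divergence to time since the split, any polymorphism inherited from the ancestral population inflates the estimate.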

The first two studies of the thesis (Papers I and II) constitute a first attempt in spruce species to use multilocus data and IM methods to:

i Estimate divergence time and gene flow between different species

ii Identify putative hybrids

iii Resolve the phylogenetic relationships between groups of spruce species

1.1.3 Identifying local adaptation along a cline

Geographic variation and clines

Speciation, like all evolutionary processes, depends in the end on a balance between evolutionary forces. As a matter of fact, populations can diverge genetically and yet not lead to new species. Instead the various evolutionary forces acting on the populations, like natural selection, mutation, gene flow and genetic drift, might reach an equilibrium which is reflected by different forms of population genetic structure. Geographic variation is found in nearly every group of organisms, and a given species may exhibit within its range any of a large array of geographical patterns such as "Disjunction", "Overlap", "Hybrid zone", "Conjunction" and "Gradation" (Endler, 1977). The latter two are called "clines" and have long fascinated biologists (Endler, 1977) as they can provide striking examples of natural selection. At a simpler level, a cline can be regarded as a gradient of characters controlled by either a single major gene or groups of linked or interacting genes, the spatial distribution of whose frequencies can be used as a quantitative description of the cline itself. The correlation of genotype or phenotype frequency with climatic variables (such as temperature and precipitation) or physical variables (such as latitude, altitude and longitude) has been used as a powerful method of detecting spatially varying natural selection and has been reported for a vast number of genes across species, for example pigmentation in humans (Roberts, 1977), the Adh gene in Drosophila (Berry and Kreitman, 1993), and the FRI and FLC genes in Arabidopsis (Caicedo et al., 2004). The large distribution ranges of forest trees such as spruces span a broad range of environments, and populations are thereby exposed to a highly diverse set of selective environments. Savolainen et al. (2007) and Eriksson (2008) have summarized a large number of studies showing that phenotypic variation is often associated with environmental variables in forest trees. Classical examples of clines in forest trees are traits related to phenology such as bud burst, bud set, growth cessation or cold tolerance. While there are certainly good grounds to believe that these clines are adaptive, proving it at the gene level has turned out to be hard. This is in part because gene flow and population history can also create clines in allele frequency, but also because these traits are inherently quantitative.

Detecting local adaptation in a cline

A cline can be generated by selection (Ingvarsson et al., 2006) as well as by many other forces such as population demography (Holliday et al., 2010) and random genetic drift (Polechová and Barton, 2011). Various groups of plant and animal species recolonized Scandinavia from different LGM refugia (e.g. Jaarola et al., 1999; De Carvalho et al., 2010; Chen et al., 2012), resulting in a contact zone between the different ancestral lineages of varying breadth and location or in a more gradual change in allele frequency across Scandinavia. In wind-pollinated forest trees in general, and in Norway spruce in particular, another difficulty comes from extensive gene flow that results in low population genetic differentiation (average $F_{ST}$ among populations ∼ 0.025) and limited linkage disequilibrium (LD). Therefore only a slight shift in allele frequency can be observed at most loci, even across a wide geographical range. Addressing these issues requires careful sampling and choice of statistics. It is also necessary to combine patterns from multilocus genetic markers, since it can be difficult to detect clinal variation at individual loci if the clines in phenotypic traits are due to small frequency differences at many loci (Barton, 1983, 1999; Le Corre and Kremer, 2003, 2012; Kremer and Le Corre, 2012).
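The simplest quantitative description of a cline at a single locus is the regression of population allele frequency on latitude. The Python sketch below is a minimal illustration of that idea and is not the statistical procedure used in Papers III and IV; the function name, latitudes and allele frequencies are all hypothetical.

```python
import math


def cline_fit(latitudes, frequencies):
    """Least-squares slope and Pearson correlation of allele frequency on latitude."""
    n = len(latitudes)
    mean_x = sum(latitudes) / n
    mean_y = sum(frequencies) / n
    sxx = sum((x - mean_x) ** 2 for x in latitudes)
    syy = sum((y - mean_y) ** 2 for y in frequencies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(latitudes, frequencies))
    slope = sxy / sxx                 # change in frequency per degree of latitude
    r = sxy / math.sqrt(sxx * syy)    # strength of the linear association
    return slope, r


# Hypothetical populations sampled along a south-to-north transect.
latitudes = [47.0, 52.0, 57.0, 62.0, 67.0]
frequencies = [0.12, 0.18, 0.33, 0.41, 0.52]
slope, r = cline_fit(latitudes, frequencies)
print(f"slope = {slope:.4f} per degree, r = {r:.3f}")
```

A steep slope and a strong correlation are not by themselves evidence of selection. As noted above, recolonization from different refugia can produce the same pattern, so the fit at a candidate locus has to be compared with what is observed at presumably neutral loci.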

Parallel evolution can provide even more compelling evidence of local adaptation if similar geographic variation is observed at the same genetic markers along parallel clines with different population histories. Orr (2005) pointed out that in a single-step substitution model parallel evolution occurs twice as often under natural selection as under neutrality. However, evolution through multi-step substitution becomes more complex and unpredictable, since many factors such as pleiotropy, epistasis or population history (Stern and Orgogozo, 2009; Unckless and Orr, 2009; Chevin et al., 2010) can influence the distribution of beneficial mutants. Empirical data suggest that adaptive convergence may be more common than previously accepted at the level of genes and functional complexes (e.g. Tenaillon et al., 2012; Domingues et al., 2012; Jones et al., 2012), although there are also examples of studies that failed to find common genes in different parts of the range (e.g. Fournier-Level et al., 2011).


Comparison of molecular and quantitative genetics data suggests that recent selection, occurring after postglacial recolonization, is the predominant factor that shapes standing quantitative trait variation (Collignon et al., 2002; Kremer et al., 2002). While there is a large number of studies at the phenotypic level, studies at the genotypic level are less frequent and are mainly based on QTL or association mapping. This may be due in part to the fact that differentiation among populations of forest trees is much lower at nucleotide markers than for morphological or other polygenic traits ($F_{ST} < Q_{ST}$), and also to the dearth of physiological studies highlighting putative candidate genes. The increasing ease of obtaining sequence data has, however, led to a renewed focus on approaches to identify local adaptation at the gene level in forest trees (Eckert et al., 2010; Hall et al., 2011; Keller et al., 2012).

In Norway spruce, Gyllenstrand et al. (2007) showed that the expression of the FT-homologue paFT4 (more correctly named paFTL2 after Karlgren et al. 2011) is significantly correlated with bud set and that populations from latitudes 66.7°N and 47.3°N are differentiated. These first results are a promising beginning and we can now start to address more detailed questions:

i Can clinal variation be observed at single nucleotide polymorphisms (SNPs)?

ii How can we disentangle the effect of natural selection from that of population history when analyzing clinal variation at SNPs?

iii Can we observe adaptive convergence in a parallel cline with a similar geographic range in a phylogenetically close species, P. obovata? And more importantly, at which level: genes or single substitutions?

1.2 Molecular evolution of Norway spruce

Had contemporaries of Darwin been aware that human beings share on average 95-98% of their genome sequence with chimpanzees (Human Genome Project: http://genomics.energy.gov/), they would probably have found it less provocative that we descend from apes. The main question today has shifted and is more focused on explaining how this limited 2-5% nucleotide divergence between the two species could lead to such dramatic anatomical and cognitive differences. Or, stated differently: what makes a given species' appearance and abilities so specific? The question was first raised by King and Wilson (1975), who highlighted two possible places to search for answers: the proteome and the regulatory mechanisms controlling gene expression (see Carroll, 2005 for a review). To address this in spruce we carried out comparative studies of the protein-coding transcriptome (Paper V) and of a group of small noncoding regulatory RNAs (Paper VI).

1.2.1 Transcriptome

Proteins are ultimately responsible for an organism's anatomical, physiological, and behavioral characteristics. A fully characterized transcriptome is therefore a key step towards understanding the diversity of life. It is also crucial for genome annotation and for gaining new insights into the temporal and spatial patterns of gene expression. It has recently become evident that transcriptomes are highly diverse and complex in many organisms (e.g. Mortazavi et al., 2008; Pan et al., 2008; Filichkin et al., 2010; Zhang et al., 2010).

In conifers, however, enormous genome sizes and a very large fraction of repetitive elements have so far hindered genomic studies. Transcriptome studies are mainly based on re-sequencing or microarrays. In spruce, EST sequences based on Sanger sequencing are still the main resource for genetic studies. Large EST libraries have been constructed in P. glauca (Ralph et al. 2008) and in P. sitchensis (Rigault et al. 2011). In P. glauca, 27,720 unique transcribed genes were sequenced and 23,589 were annotated as full-length cDNA. In P. sitchensis, the number of annotated full-length cDNAs is around 8,000. Compared to those two species, P. abies still requires intensive effort: only 8,715 Putative Unique Transcripts (PUTs) are available and they are poorly annotated (http://www.plantgdb.org).

mRNA-Seq and gene expression profiling

Application of next-generation sequencing (NGS) provides full access to a species' transcriptome. For example, mRNA-Sequencing (mRNA-Seq) leads to deep coverage and base-level resolution (Nagalakshmi et al., 2008). Compared to conventional methods such as microarrays, mRNA-Seq gives a much finer characterization of key aspects of the transcriptome and has already been instrumental for full annotation of protein-coding genes, gene expression profiling, noncoding RNA discovery and detection, transcript rearrangement discovery and single-nucleotide variation profiling in many model species (e.g. Zea mays, Emrich et al., 2007; Medicago truncatula, Cheung et al., 2006; Arabidopsis thaliana, Weber et al., 2007).

Gene expression profiling has been the main method for gene functional studies ever since the development of Northern blots, microarrays (or gene chips) and other hybridization-based technologies. mRNA-Seq offers both sequence data and estimates of gene expression level in a single experiment. Compared to microarrays, mRNA-Seq requires no prior annotation of the genome for probe selection, avoids biases introduced during hybridization, has a relatively low cost for the amount of data it produces and is the method of choice in projects using non-model organisms. Thus mRNA-Seq has become more and more popular for characterization of transcriptomes in a number of organisms and in various tissues (e.g. Mortazavi et al., 2008; Zhang et al., 2010; Gan et al., 2011).

A circadian rhythm is the response to the light-dark cycle found in most living organisms. It is one of the most important and complicated biological processes in plants and affects thousands of physiological processes, including the timing of flowering, leaf movement, germination and gas exchange. In angiosperms, about 20% of the genes exhibit diurnal patterns at the expression level (estimates of the percentage vary with experimental treatment, species and tissue, see e.g. Jiao et al., 2005; Ma et al., 2005; Filichkin et al., 2011). However, due to the lack of both genetic and physiological data, no clear diurnal patterns have yet been reported in key photosynthesis genes in gymnosperms (Alosi et al., 1990; Gustafsson et al., 1991).

Comparative evolution of coding sequences

Comparative genomics exploits both similarities and differences in the proteins, RNA and regulatory regions of different organisms to compare orthologues and test for selection (e.g. Clark et al., 2003 for primates; Li et al., 2010a for vascular plants). To identify selection and its directionality, one can compare subsets of key genes between closely related species by assuming that they could explain the evolution of species in the context of their phylogenetic order. Therefore, most of the first genome-wide studies of natural selection have focused on coding sequences and on estimates of the ratio of non-synonymous to synonymous substitution rates (dN/dS) as the signal of selection intensity (e.g. Yang, 2006).
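To make the dN/dS ratio concrete, here is a minimal sketch of its final step, assuming that the numbers of synonymous and non-synonymous sites and differences between two aligned coding sequences have already been counted upstream (for instance by a Nei-Gojobori-style counting procedure, which is not shown). The Jukes-Cantor correction is one common choice for converting proportions of differences into substitutions per site. It is not necessarily the method used in Paper V, and all names and counts below are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from pre-computed site and difference counts."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, the ratio is undefined")
    return dn, ds, dn / ds


# Hypothetical counts for one pair of orthologues
dn, ds, omega = dn_ds(nonsyn_diffs=10, nonsyn_sites=700, syn_diffs=30, syn_sites=300)
print(round(omega, 3))
```

A ratio well below 1 is usually read as purifying selection, a ratio near 1 as neutrality, and a ratio above 1 as evidence of positive selection.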

Gymnosperms separated from angiosperms about 300 million years ago (Renner, 2009) and they differ from the latter in many aspects, such as recombination (Jaramillo-Correa et al., 2010; Moritsuka et al., 2011), genome duplication (Ahuja and Neale, 2005) and genome size (Morgante and De Paoli, 2011). We would therefore expect a different mode and tempo of evolution in these two groups. Early studies (e.g. Palmé et al., 2008a; Buschiazzo et al., 2012) indeed found a lower synonymous mutation rate in gymnosperms than in angiosperms, particularly annual angiosperms. Buschiazzo et al. (2012) also reported a different pattern in average evolutionary rates (measured by the dN/dS ratio) between gymnosperms (0.314, between P. sitchensis and P. taeda) and between angiosperms (0.092, between Arabidopsis thaliana and Populus trichocarpa).

Our assembly is the first release of massively parallel sequencing applied to the needle transcriptome of P. abies, even though it is limited by its small sample size. Still, together with the extant PUT sequences, the P. abies transcriptome has now reached a size of the same order of magnitude as that of the North American species. It becomes possible to do genome-wide protein-coding gene annotation, gene expression profiling, orthologue comparison and tests for selection by comparing to P. glauca and P. sitchensis using Pinus taeda or Taxus mairei as outgroup. The goal of Paper V is to address the following questions using full transcriptome data for the first time:

i How much of the transcriptome is differentially expressed between dark and light treatments?

ii How reproducible are the estimates of mutation and evolutionary rates previously reported?

iii Are gene evolutionary rate and expression level correlated?

1.2.2 Small noncoding RNAs and their regulatory role in plants

Small noncoding RNAs (sRNAs) of 20-30 nucleotides (nt) guide regulatory processes at the DNA or RNA level in a wide range of eukaryotes and prokaryotes. Although small-RNA-mediated silencing had already been observed in the late 1980s, the underlying mechanism remained unknown until the breakthrough study of Fire et al. (1998) in Caenorhabditis elegans. Small RNAs can cause either transcriptional gene silencing, by guiding heterochromatin formation at homologous loci, or post-transcriptional gene silencing, through mRNA degradation or translational inhibition. In plants, a large portion of them have important roles in the regulation of diverse biological processes, including patterning and development, response to the environment, defense against pathogens and silencing of endogenous transposable elements (Bonnet et al., 2006).

Biogenesis

Small RNAs can be either endogenous or pathogen-derived. They can be classified according to their mode of biogenesis into three major types: short interfering RNA (siRNA), microRNA (miRNA) and piwi-interacting RNA (piRNA). miRNAs and siRNAs are processed from precursors by the RNase III endonuclease Dicer, which acts on double-stranded substrates to release small RNA duplexes with 2-nt 3' overhangs.

Dicer-like proteins have distinct, hierarchical and overlapping functions in small RNA biogenesis. In Arabidopsis, four such Dicer-like proteins (DCL1-4) are responsible for processing specifically sized sRNA duplexes: DCL1 synthesizes 18- to 21-nt-long sRNAs including most miRNAs (though a few miRNAs appear to be DCL4 dependent, see Rajagopalan et al., 2006; Fahlgren et al., 2007), whereas the products of DCL2, DCL3 and DCL4 are 22-nt, 24-nt and 21-nt long, respectively (Xie et al., 2004). Numerous studies of model and non-model plants have shown that the most abundant class of plant sRNA is siRNAs, which are typically 24-nt long in angiosperms and guide DNA methylation and heterochromatin formation at repetitive and transposable elements (Xie et al., 2004; Rajagopalan et al., 2006). Gymnosperms seem to lack both 24-nt siRNAs and DCL3 (Morin et al., 2008). DCL4 and DCL2 carry out the phased dicing of trans-acting siRNA (ta-siRNA) precursors that are produced upon miRNA-guided cleavage of non-coding primary transcripts (reviewed in Chapman and Carrington 2007).
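Because the DCL products differ mainly in length, a read-length distribution is a common first look at an sRNA library. The sketch below tallies the fraction of reads at each length; the function name, the 18-30 nt window and the toy reads are assumptions made for illustration and do not reproduce the pipeline used in Paper VI.

```python
from collections import Counter


def length_distribution(reads, min_len=18, max_len=30):
    """Fraction of reads at each length within [min_len, max_len]."""
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}


# Toy reads: two 21-mers, one 24-mer and one fragment that is too short
reads = ["A" * 21, "C" * 21, "G" * 24, "T" * 10]
print(length_distribution(reads))
```

In an angiosperm library one would expect a prominent 24-nt class, whereas its absence would be consistent with the gymnosperm pattern described above.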

Response to biotic stress

Besides their roles in developmental patterning and maintenance of genome integrity, many sRNAs are also components of plant responses to adverse environmental conditions, including biotic stress (reviewed by Ruiz-Ferrer and Voinnet, 2009 and Chen, 2009). All four Arabidopsis DCLs play critical and redundant antiviral roles. 21-nt-long siRNAs produced by DCL2 and DCL4 move between cells and likely immunize tissues just ahead of the infection by amplifying a systemic silencing response (Moissiard et al., 2007; Donaire et al., 2008). DCL1-dependent miRNAs can target DCL4 transcripts for degradation or negatively control transcription factors required for DCL4 expression (Qu et al., 2008). Finally, DCL3-dependent, 24-nt-long sRNAs might dampen viral transcription by inducing chromatin condensation of nuclear viral episomes and mini-chromosomes (Raja et al., 2008).

Recent research in Medicago truncatula and Solanum lycopersicum has shown that high levels of siRNAs match a large number of defense-related nucleotide binding site-leucine-rich repeat (NBS-LRR) coding genes, with the siRNAs spaced at 21-nt "phased" intervals (Zhai et al., 2011; Shivaprasad et al., 2012). The production of these phased siRNAs is triggered by three groups of 22-nt miRNA (miRNA2118, miRNA173, and miRNA390). No similar patterns have been reported in model species like Arabidopsis thaliana or Oryza sativa. This observation might reflect evolutionary divergence in strategies of pathogen defense between species.
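The notion of 21-nt phasing can be illustrated with a small sketch: if siRNAs are diced successively from one cleavage site, their start positions on the transcript share the same remainder modulo 21. The functions and positions below are hypothetical, and the sketch ignores strand offsets and the statistical tests that real phasing analyses apply.

```python
from collections import Counter


def phase_registers(start_positions, phase_length=21):
    """Count how many sRNA start positions fall in each phase register."""
    return Counter(pos % phase_length for pos in start_positions)


def fraction_in_top_register(start_positions, phase_length=21):
    """Fraction of sRNAs whose start lies in the most populated register."""
    registers = phase_registers(start_positions, phase_length)
    if not registers:
        return 0.0
    return max(registers.values()) / sum(registers.values())


# Hypothetical start coordinates on one transcript: four in phase, one not
starts = [5, 26, 47, 68, 10]
print(fraction_in_top_register(starts))
```

A fraction close to 1 suggests phased production, while randomly positioned siRNAs would spread over all 21 registers.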

Characterization of sRNA using NGS and Bioinformatics

Next-generation sequencing has been used to characterize sRNAs in many species (e.g. Axtell et al., 2006). Conventional methods like direct cloning and sequencing can only provide sequences, and microarrays are extensively used for expression profiling but are restricted to known miRNA and siRNA genes. NGS, by contrast, provides both sequences and expression estimates in a single experiment and has been highly efficient for the discovery of novel miRNA, siRNA and piRNA genes on a genome-wide scale (e.g. Barakat et al., 2007; Axtell et al., 2006; Girard et al., 2006).

The aim of most sRNA studies is to profile miRNAs and their targets and thereby characterize miRNA biogenesis and miRNA-target-recognition mechanisms. miRNAs comprise a large family of small, about 21-nucleotide-long, non-coding RNAs that have emerged as key post-transcriptional regulators of gene expression in metazoan animals, plants and protozoa (Jones-Rhoades et al., 2006).

In plants, a microRNA gene (MIR) is transcribed into a capped and polyadenylated primary microRNA (pri-miRNA). The pri-miRNA is processed into a stem-loop precursor (pre-miRNA), which is further processed into a duplex of miRNA and miRNA*. Plant miRNAs regulate target mRNAs through two major mechanisms: transcript cleavage and translation inhibition. They are high-level regulators of gene expression that affect numerous aspects of plant biology, especially developmental patterning. Mutants impaired in miRNA biogenesis exhibit severe, pleiotropic abnormalities, and plants over-expressing particular miRNAs or their targets exhibit a wide array of unusual phenotypes (see the review by Jones-Rhoades et al., 2006). Many miRNAs and targets have been successfully identified by computational approaches based on empirical rules (e.g. Dai et al., 2011).

In spruce, Yakovlev et al. (2010) performed direct cloning and sequencing and reported 44 miRNAs, 24 of which were novel in Norway spruce. 25 target groups were also identified and the majority were annotated as TIR-NBS-LRR genes or transcription factors. All their predictions were based on hairpin stability and sequence complementarity. For the description of whole sRNA populations, little has been accomplished even in model systems. Our goal in this part of the thesis is to:

i Obtain a detailed characterization of the sRNA populations from P. abies.

ii Identify putative miRNAs and their targets.

iii Compare sRNA populations between different plant species with emphasis on features identified in P. abies.


2. Results and discussion

2.1 Estimate of speciation history of spruce based on multilocus patterns - Paper I and II

In Papers I and II we investigated the multilocus pattern of polymorphism and divergence in two groups of spruce species. We sequenced fragments of 10 and 12 genes, covering 8,000-10,000 base pairs of sequence, in boreal and montane species, respectively. The first group included one Eurasian species, P. abies, and three North American species, P. glauca, P. mariana and P. breweriana (Paper I), and the second group comprised four species from the Qinghai-Tibetan Plateau: P. schrenkiana, P. likiangensis, P. purpurea and P. wilsonii (Paper II). Averages of 35 and 44 individuals were studied across genes and species. In both cases, nucleotide diversity and divergence between pairs of species were summarized and used to estimate the speciation parameters (effective population size N_e, divergence time T and migration rate M) under the IM model. We examined the standard neutral model as well as three other demographic models (see Figure 2.1), and the goodness-of-fit of the models was examined using a set of summary statistics including nucleotide diversity (π), private, shared and fixed polymorphisms (S_1, S_2, S_s and S_f), population differentiation (F_ST) and Tajima's D.
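The four polymorphism classes can be illustrated with a minimal sketch that walks along an alignment of two species samples and classifies each site as private to species 1 (S_1), private to species 2 (S_2), shared (S_s) or a fixed difference (S_f). This is a simplified illustration assuming aligned sequences of equal length without missing data; the function name and the toy sequences are hypothetical and this is not the software used in Papers I and II.

```python
def classify_sites(sample1, sample2):
    """Count private (S1, S2), shared (Ss) and fixed (Sf) sites.

    sample1, sample2: lists of aligned sequences of equal length,
    one list per species.
    """
    counts = {"S1": 0, "S2": 0, "Ss": 0, "Sf": 0}
    for i in range(len(sample1[0])):
        alleles1 = {seq[i] for seq in sample1}
        alleles2 = {seq[i] for seq in sample2}
        poly1 = len(alleles1) > 1
        poly2 = len(alleles2) > 1
        if poly1 and poly2:
            counts["Ss"] += 1      # segregating in both species
        elif poly1:
            counts["S1"] += 1      # segregating only in species 1
        elif poly2:
            counts["S2"] += 1      # segregating only in species 2
        elif alleles1 != alleles2:
            counts["Sf"] += 1      # each species fixed for a different allele
    return counts


# Toy alignment: one site of each class, plus one invariant site
species1 = ["ACAGT", "TCCGT"]
species2 = ["AGATT", "ATCTT"]
print(classify_sites(species1, species2))
```

Many shared and few fixed sites, as reported below for most species pairs, point to recent divergence, ongoing gene flow, or both.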

Figure 2.1. Diagram of the different demographic models evaluated in Paper I and Paper II (panels SNM, PEM, BNM and BEM; time runs from past to present). The SNM is determined by the population mutation rate θ alone, whereas the PEM has two parameters, θ and α, the exponential growth rate. The BNM is characterized by four parameters: θ, f (size of bottleneck), d (duration of bottleneck) and T (time after bottleneck). The BEM has four parameters: θ, α, T and f.
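The size histories named in the Figure 2.1 caption can be sketched as functions of time before present. The code below is one possible reading of the SNM, PEM and BNM, with population size expressed relative to the present size. The time unit, the function name and the exact parameterization are assumptions, and the BEM is left out because the caption does not specify how growth and bottleneck are combined.

```python
import math


def relative_size(t, model, alpha=0.0, f=1.0, d=0.0, T=0.0):
    """Population size at time t before present, relative to the present size.

    alpha: exponential growth rate (PEM)
    f, d, T: bottleneck size, duration and time since its end (BNM)
    """
    if t < 0:
        raise ValueError("t is measured backwards from the present and must be >= 0")
    if model == "SNM":
        return 1.0                      # constant size
    if model == "PEM":
        return math.exp(-alpha * t)     # smaller in the past when alpha > 0
    if model == "BNM":
        # reduced to fraction f between T and T + d before present
        return f if T <= t < T + d else 1.0
    raise ValueError(f"unknown model: {model}")


for t in (0.0, 0.5, 1.2, 2.0):
    print(t, relative_size(t, "BNM", f=0.1, d=0.5, T=1.0))
```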


2.1.1 Nucleotide diversity and gene flow

In agreement with Bouillé and Bousquet (2005), we observed a strikingly large number of shared polymorphisms (S_s). The mean values are generally higher between Chinese species (61.5) than between Eurasian and North American species (21). Conversely, the average number of fixed sites between these species (S_f) is low (5), although it is much higher (34) when they are paired with P. breweriana.

The analysis of polymorphism and divergence in these eight species led to interesting results. First, Picea species generally have fairly large effective population sizes (85,000-206,000 depending on species), with two notable exceptions: the effective population size of P. schrenkiana is about 30,000 and that of P. breweriana is even smaller, around 12,000. Our estimates reflected quite well the differences in nucleotide diversity between P. breweriana and the other three North American and Eurasian spruces studied here. Interestingly, this low effective population size of P. breweriana is not observed for allozyme loci, which have similar diversity levels (Ledig et al., 1997). Such a discrepancy between levels of polymorphism at allozyme and nucleotide loci has been previously noted in P. abies (Heuertz et al., 2006) and Pinus sylvestris (Pyhäjärvi et al., 2007). The latter suggested that selection could be the cause. Second, the three spruce species from the QTP have a surprisingly large nucleotide diversity (> 0.005) considering their relatively small distribution compared to continent-wide species such as P. glauca and P. mariana. The low diversity in the boreal species could be due to a stronger impact of Quaternary glaciations. Third, gene flow occurs between species, even species found today on different continents. We found evidence of gene flow from P. glauca to P. abies and from P. schrenkiana to P. wilsonii, although most shared polymorphisms likely reflect incomplete lineage sorting. Because today these species cannot be crossed easily, our estimate of gene flow should be regarded as an average value dating back to the time when the new species formed. The presence of gene flow between P. glauca and P. abies could trace back to a warm period when the two species were sympatric on Greenland (De Vernal and Hillaire-Marcel, 2008).

Overall, our results are consistent with previous estimates of N_e and T based on nucleotide diversity (Bouillé and Bousquet, 2005). Additionally, our studies highlighted the possibility of shared ancestral polymorphisms between species that separated earlier than 15 mya. This result is actually not too surprising if one considers an effective population size timescale. Considering the large population sizes of conifer species and assuming a generation time of 50 years, 15 million years translate into a mere 4N_e generations, a value much smaller than the 9-12N_e generations required for 95% of loci to become reciprocally monophyletic (Hudson and Coyne, 2002).
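The rescaling behind this argument is simple arithmetic and can be sketched as follows. The function name is hypothetical, and the N_e values are taken from the range reported in section 2.1.1 purely for illustration; the exact N_e underlying the figure quoted above is not stated here.

```python
def time_in_ne_units(years, generation_time, ne):
    """Express a divergence time in units of N_e generations."""
    generations = years / generation_time
    return generations / ne


# 15 million years at 50 years per generation is 300,000 generations
for ne in (85_000, 100_000, 206_000):
    print(ne, round(time_in_ne_units(15e6, 50, ne), 2))
```

Across this range the divergence time amounts to only a few N_e generations, well short of the 9-12N_e needed for most loci to reach reciprocal monophyly, so retained ancestral polymorphism is expected.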

2.1.2 Demographic history of eight spruce species

Despite an overall fit to the standard IM model, negative values of both Tajima's D and Fay & Wu's H suggested a relatively ancient bottleneck (Heuertz et al., 2006; Pyhäjärvi et al., 2007; Ingvarsson, 2008) for most spruce species that have been studied. However, we expected a recent population expansion in P. likiangensis and a rather stable population history in P. breweriana. Therefore, we examined two other demographic scenarios, the exponential growth model (PEM) and the bottleneck model (BNM), in Paper II, and we estimated parameters under a model of bottleneck with exponential growth (BEM) in Paper I (all models are illustrated in Figure 2.1). For most pairwise comparisons, the SNM could not be rejected and the demographic models were only marginally preferred based on approximate Bayesian computation (Paper II). Simulations under the growth scenario did, however, significantly improve the model fit to the observed data (Paper I, Figure 2.2). Our estimates implied an ancient bottleneck dating back to 2 mya followed by a slow population expansion. This is in stark contrast to the traditional view of a recent bottleneck and quick expansion at the end of the Last Glacial Maximum (LGM) based on palynology and fossil studies (McLeod and MacDonald, 1997; Knaap, 2006; Lindbladh et al., 2007). The latter are heavily limited by their data type and can only reveal the effect of quite recent history. Our results on population demography are close to those obtained by Heuertz et al. (2006) (0.15-0.3 mya), which were also based on multilocus data but considered only one species.
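For readers unfamiliar with Tajima's D, the statistic contrasts two estimators of θ, one based on the mean number of pairwise differences and one based on the number of segregating sites. The sketch below follows the standard formula of Tajima (1989); the function name and the input values are illustrative and are not taken from Papers I or II.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites and
    mean number of pairwise differences pi (all for one locus)."""
    if n < 4 or seg_sites < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Estimated variance of the difference between the two theta estimators
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Illustrative values: 10 sequences, 16 segregating sites
print(round(tajimas_d(10, 16, 3.888889), 3))
```

Negative values indicate an excess of rare variants relative to the standard neutral model, the pattern expected after population growth or recovery from a bottleneck.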

In general, the research in Papers I and II has highlighted the importance of using multilocus sequences and species comparisons when inferring population history. Multilocus data should have an obvious advantage over cytoplasmic DNA in inferring demography further back in time. More and more phylogenetic studies have adopted multilocus data, population size, recombination rate and other population genetic concepts to help resolve incongruence between species trees and gene trees (e.g. Liu 2008; Liu et al. 2008). Additionally, with the development of Approximate Bayesian Computation (ABC) approaches, we now have the ability to implement and assess much more complex demographic models using data from a larger number of genes and species (see the review by Beaumont, 2010). However, questions still remain, as pointed out by Pinho and Hey (2010): what are the details of the interaction between disruptive selection and linkage, and how often can disruptive selection itself be the initial trigger for divergence with gene flow? Answering these questions requires identification of the genes that are the targets of selection, a topic that we discuss in the following sections.

2.2 Clinal variation and local selection - Paper III and IV

After obtaining a global picture of spruce speciation history, we focused on more recent history in Papers III and IV. Also, while Papers I and II focused on neutral processes, both Papers III and IV deal with adaptation. More specifically we addressed the following questions: how did conifer trees adapt to the local environment during their postglacial recolonization? And does parallel evolution occur in Norway and Siberian spruces, two species with different recent histories? In Norway spruce, data on latitudinal clinal variation at the levels of phenotype, genotype and gene expression were combined and a three-step approach was applied to disentangle the effect of local selection from that of population history. The study in Siberian spruce had, at that stage, a more restricted scope and focused on allele frequencies at candidate genes and on a more limited set of phenotypic measurements. Together the two datasets are the first evidence of parallel evolution in conifer species.

Figure 2.2. An example of goodness-of-fit plots from P. abies × P. glauca under both the neutral (left) and growth (right) models. Panels show the S statistics (S1, S2, Ss, Sf), Fst and D (D1, D2).

2.2.1 Latitudinal gradient patterns in phenology

While clinal variation in bud set has been amply demonstrated previously (e.g. Dormling, 1973; Gyllenstrand et al., 2007), we confirmed it in a subset of populations in both species. Seedlings were exposed to photoperiodic treatments of increasing night length. Bud set percentages of Norway spruce and growth cessation of Siberian spruce were summarized in 5 and 3 populations, respectively, ranging from 47°N or 54°N in the south to 68°N in the north. As would be expected in the natural environment, northern populations were more sensitive to the increase of night length under experimental conditions (see Figure 2.3A). In Norway spruce, nearly 50% of individuals from the northernmost populations started to set bud under the 6.5-h darkness treatment, while in the southernmost populations the 9.5-h darkness treatment was required to reach a similar percentage. We also observed similar clinal variation in growth cessation in P. obovata: the average growing period was significantly shorter in the northernmost population, Igarka (14.3 days), than in the two southern populations (27.4 and 28.8 days, see Figure 2.3B).

2.2.2 Population genetic structures

We genotyped 137 SNPs from 19 candidate genes as well as 308 control SNPs from genes a priori unrelated to bud set in 18 Norway spruce populations. The candidate genes are putatively involved in the photoperiodic pathway, circadian clock and shoot apical development. The population genetic structure inferred using the program STRUCTURE (Figure 2.4A, B) confirmed the differentiation of populations into three main clusters corresponding to the Alpine domain (Germany and the Saleby population), populations from central Sweden and Finland showing a high level of admixture, and populations from Northern Finland (latitude > 66°N). These data may reflect the existence of at least two recolonization routes into Scandinavia after the Last Glacial Maximum (Giesecke and Bennett, 2004; Tollefsrud et al., 2009) and the presence of high-latitude refugia in Northwestern Russia (Väliranta et al., 2011).

Figure 2.3. Latitudinal gradient at the phenotype level. (A) Percentage of individuals setting bud in five Norway spruce populations (SE-58.3, SE-61.6, SE-62.7, SE-64.1, SE-66.4) under different photoperiodic treatments (6.5 h, 8 h and two 9.5 h darkness treatments); (B) Days before growth cessation in three Siberian spruce populations, plotted against latitude (°N).

In Siberian spruce, post-glacial history did not generate any hierarchical structure among the populations investigated. The overall F_ST value was 0.0152 and STRUCTURE results based on 14 SSR markers failed to delineate any meaningful clusters: all individuals appeared admixed, reflecting the lack of population genetic structure (see Figure 2.4C, D). "Isolation by distance" (IBD) was significant but weak. Estimates based on 80 independent silent SNPs gave slightly different results: each population was divided into two clusters and the proportions of the two clusters were rather similar across populations. This again indicated a lack of latitudinal population genetic structure, but might also suggest the possibility of some longitudinal population genetic structure.

2.2.3 Allele frequency spectrum analysis for local adaptation

The admixture zone detected in Norway spruce certainly complicates the detection of clinal variation at candidate SNPs as even neutral polymorphisms could show clines in allele frequencies simply because of population history (Novembre and Di Rienzo, 2009). Therefore to tell selection apart from demography we performed a three-step approach that combined linear regression,


Figure 2.4. Clustering analysis conducted in STRUCTURE. Results were plotted for K = 2 and K = 3 in Norway spruce (A, B) based on 308 control SNPs, and in Siberian spruce (C, D) based on 14 SSR loci. Norway spruce populations: GE-47.0, RU-53.3, SE-58.3, FI-61.5, SE-61.6, SE-61.8, FI-62.0, SE-62.6, SE-62.7, FI-63.0, SE-63.4, SE-63.7, SE-64.1, SE-65.3, FI-66.4, FI-67.0, FI-67.7, FI-68.0. Siberian spruce populations: IGA-67, TYR-65, BOR-61, YAR-60, ENI-58, KRQ-56, KOS-54.
