Molecular evolution of voltage-gated calcium channels of L and N types and their genomic regions


Academic year: 2021


UPTEC X 12 006

Degree project, 30 credits

April 2012

Molecular evolution of voltage-gated

calcium channels of L and N types

and their genomic regions


Molecular Biotechnology Programme

Uppsala University School of Engineering

UPTEC X 12 006

Date of issue 2012-04

Author

Jenny Widmark

Title (English)

Molecular evolution of voltage-gated calcium channels of L and N

types and their genomic regions

Title (Swedish)

Abstract

The expansion of the voltage-gated calcium channel alpha 1 subunit families (CACNA1) of L and N types was investigated by combining phylogenetic analyses (neighbour-joining and maximum likelihood) with chromosomal data. Neighbouring gene families were analysed to see if the chromosomal regions duplicated through whole genome doublings in vertebrates. Results show that both types of CACNA1 expanded in two ancient whole genome duplications as parts of larger genomic regions. Many gene families in these regions obtained copies in an additional teleost-specific genome duplication. This diversification of CACNA1 genes probably contributed to evolutionary innovations in nervous system function.

Keywords

Evolution, vertebrate, voltage-gated calcium channel, whole genome duplication

Supervisors

Dan Larhammar, Uppsala University

Scientific reviewer

Tatjana Haitina, Uppsala University

Project name

Sponsors

Language

English

Security

ISSN 1401-2138

Classification

Supplementary bibliographical information

Pages

57

Biology Education Centre, Biomedical Center, Husargatan 3, Uppsala


Molecular evolution of voltage-gated calcium channels of L and N

types and their genomic regions

Jenny Widmark

Popular science summary

All cells in an organism contain the same set of genetic material (DNA). This set is called a genome, and the genome contains all the genes. The genomes of several species are known because they have been sequenced, that is, the order of their building blocks has been read. It has turned out that a gene present as a single copy in invertebrates is often present in several copies in vertebrates. This is explained by two genome duplications (genome doublings) in the latter, meaning that all genes were copied in two rounds. In this way gene families arose: similar genes that encode proteins resembling each other. Teleosts (true bony fishes) have undergone one further genome duplication in addition to the first two. This means that we humans should have four times as many genes as invertebrates, and that teleosts should have twice as many genes as we do. However, many of the new copies have mutated so that several of them have been destroyed and therefore no longer remain in the genome.

Because several copies of one and the same gene have arisen, there is greater scope for some of the copies to change, since at least one copy remains that continues to perform the original function. In this way some of the gene copies can mutate and evolve new functions. The genome duplications are therefore thought to have been of great evolutionary importance, among other things for the development of the vertebrate nervous system. In this report I describe the evolution of two gene families that encode two different types of calcium channels. Calcium channels have several important functions in the nervous system, and my studies show that both families expanded in the two genome duplications and that only one of the duplicates has been lost in mammals. Teleosts have also retained most of the copies from the third genome duplication. The high degree of retained gene copies in both mammals and teleosts supports the theory that genes important for the nervous system of organisms are more likely to be retained.


Contents

1 Introduction

1.1 Calcium channels

1.2 Vertebrate evolution

1.3 Phylogenetic tree analysis

1.4 Neighbour-joining and maximum likelihood methods

2 Materials and methods

2.1 Collection of CACNA1 gene families

2.2 Identifying neighbouring gene families

2.3 Editing of genes and protein sequences

2.4 Sequence alignment and phylogenetic analyses

3 Results

3.1 CACNA1 L-type family

3.1.1 Chromosomal regions

3.2 CACNA1 N-type family


Key terms and abbreviations

Amniote: Tetrapods with amniotic sacs surrounding the embryo. Includes reptiles, birds and mammals but not amphibians, which do not have land-adapted eggs.

CACNA1: Calcium channel subunit α1.

Clade: A group which includes one common ancestor and all its descendants. In a phylogenetic tree a clade includes all branches that radiate from one single node.

Conserved synteny: Preserved co-localisation of genes on chromosomes of different species.

Homologues: Genes derived from a common ancestral DNA sequence.

Orthologues: Homologous genes in different species that diverged from a common ancestral gene by speciation.

Paralogues: Homologous genes within the same species, related by duplication within a genome.

Paralogon: A set of homologous chromosomal regions containing groups of paralogous genes, derived from a common ancestral region.

Parsimonious: From parsimony: according to this principle the hypothesis that makes the fewest assumptions, the simplest explanation, is the most parsimonious.

Teleosts: True bony fishes; the largest clade within ray-finned fishes. In the present study, the zebrafish, medaka, three-spined stickleback and spotted green pufferfish are teleosts.

Tetraploidisation: The process of genome doubling, resulting in four homologous sets of chromosomes (rather than the diploid set of two).

Tetrapods: Four-limbed vertebrates; includes amphibians, reptiles, birds and mammals.


1 Introduction

A large number of human diseases are caused by ion channel malfunctions, so-called channelopathies. One type of ion channel is the voltage-gated calcium channel, and malfunction or lack of expression of these channels causes diseases such as epilepsy, blindness, paralysis, cardiac arrhythmias and migraine (Jegla et al., 2009). The ion channels have crucial functions in the nervous system and it has previously been suggested (Jegla et al., 2009) that many of the ion channel families have expanded in vertebrate-specific tetraploidisation events, called 2R and 3R (Panopoulou and Poustka, 2005). These whole genome duplications might have played a major role in the diversification of vertebrates and the evolution of the nervous system (Holland, 2009). Dan Larhammar's research group has previously described that several gene families as well as their chromosomal regions have expanded in this way, such as the HOX cluster region (Sundström et al., 2008b), the neuropeptide Y system (Larsson et al., 2008; Sundström et al., 2008a), the opioid system (Dreborg et al., 2008; Sundström, 2010) and also other ion channels such as the voltage-gated sodium channels (Widmark et al., 2011).

Here I have investigated the expansion of the calcium channel α1 subunits of L and N type by combining two types of phylogenetic analysis with chromosomal data to see if they are consistent with whole genome duplication events. In addition, I have analysed neighbouring gene families to further investigate the duplications of larger chromosomal regions.

1.1 Calcium channels

Voltage-gated calcium channels are made up of four to five different subunits, α1, β, α2, δ and γ. This study is focused on the calcium channel subunit α1 (CACNA1), which is the largest subunit and forms the actual channel. The α1 subunit genes have about 47 exons which encode proteins of approximately 2500 amino acids. The proteins have four domains (I-IV) with six transmembrane regions (S1-S6) each, the fourth of which functions as a voltage sensor, see figure 1.


Figure 1: Schematic picture of a voltage-gated calcium channel. The channel has four domains with six transmembrane regions each. The fourth transmembrane region (in yellow) functions as a voltage-sensor. Adapted from Widmark et al. 2011.

pharmacological and electrophysiological diversity among the calcium channels while the other subunits mainly modulate their properties (Catterall, 2011). Three different CACNA1 genes are present in the fruit fly and the tunicate genomes, one ortholog for each CACNA1 family. This indicates that three CACNA1 genes were present before the divergence of the vertebrate lineage.

The L-type family of the CACNA1 subunits includes four different proteins in humans: CACNA1S, 1C, 1D and 1F. They typically have a long-lasting activation time and were previously thought to be dependent on a strong depolarisation for activation. However, lately it has been found that some members can be activated by smaller changes in membrane potential (Lipscombe et al., 2004). They initiate the contraction of muscle cells and the secretion from endocrine cells but are also involved in hearing, retinal signalling and urinary bladder functions.


Table 1: Summary of the ten voltage-gated calcium channels in human (Catterall, 2011; Jansen et al., 2011).

Type | Channel name | Gene name | Human chromosome | Involved in
L (long-lasting) | Cav 1.1 | CACNA1S | 1 | Skeletal muscle excitation
L (long-lasting) | Cav 1.2 | CACNA1C | 12 | Cardiac muscle contraction, insulin secretion, urinary bladder function, spatial memory
L (long-lasting) | Cav 1.3 | CACNA1D | 3 | Sinoatrial node pacemaking, hearing, brain function (mood behavior)
L (long-lasting) | Cav 1.4 | CACNA1F | X | Retinal signaling
P/Q (purkinje) | Cav 2.1 | CACNA1A | 19 | Fast synaptic transmission and synaptic plasticity, neuromuscular transmission
N (neural) | Cav 2.2 | CACNA1B | 9 | Sympathetic functions, long-term memory, synaptic transmission, neuronal migration during development
R (residual) | Cav 2.3 | CACNA1E | 1 | Synaptic transmission
T (transient, short-lived) | Cav 3.1 | CACNA1G | 17 | Low threshold Ca2+ spike that mediates burst firing of neurons, depolarization of sinoatrial nodal cells
T (transient, short-lived) | Cav 3.2 | CACNA1H | 16 | Nociception, relaxation of coronary arteries
T (transient, short-lived) | Cav 3.3 | CACNA1I | 22 |


1.2 Vertebrate evolution

Two rounds (2R) of whole genome duplication (WGD) occurred early in the evolution of vertebrates, at least before the emergence of jawed vertebrates (see figure 2) (Ohno, 1970; Dehal and Boore, 2005; Nakatani et al., 2007; Putnam et al., 2008). The timing of these events is still uncertain, but they are thought to have happened about 500 million years ago, around the divergence of the hagfish and lamprey lineages. Due to limited genome data, no thorough whole genome investigation of the lamprey genome and its relation to 2R has been made. However, several limited studies of known gene families have been conducted, which suggest divergence of the lamprey and hagfish lineages after 2R (Kuraku, 2010; Kuraku et al., 2009). A third round of whole genome duplication (3R) occurred early in the evolution of teleost fishes (Jaillon et al., 2004; Meyer and Van de Peer, 2005).

Figure 2: Estimated timing of the whole genome duplication events in vertebrate evolution (in Mya). 2R is thought to have happened around 500 Mya, before the divergence of the lamprey and hagfish lineages. Independent WGDs in some amphibians and teleost fishes are not shown. Adapted from Sundström (2010).


Figure 3: Schematic representation of whole genome duplications. One ancestral chromosome duplicates into four chromosomes in 2R and eight chromosomes in 3R. Species diverging after 2R, like tetrapods, should therefore ideally have four gene copies of each gene located on four chromosomes. In the teleost lineage an additional whole genome duplication, 3R, occurred, resulting in eight gene copies located on eight chromosomes.

it has been argued that this radiation cannot be easily ascribed to one single event (Santini et al., 2009). Additionally, more recent WGDs have occurred in some vertebrate lineages, including the salmonid fish lineage, some cyprinid fishes and some species of frog (e.g. Xenopus laevis) (Hordvik et al., 1997; David et al., 2003; Evans et al., 2004; Van de Peer et al., 2009).

As a result of the two rounds of WGDs, one gene becomes four genes located on four different homologous chromosomes, as depicted in figure 3. The third round of WGD then results in eight genes located on eight different chromosomes. This means that species diverging after 2R (cartilaginous fishes and tetrapods) could have at most four gene copies generated through WGD, while teleost fishes, diverging after 3R, could have up to eight copies. In this way genome duplication events give rise to gene families.
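The copy-number reasoning above amounts to a doubling per WGD round. A minimal sketch of that arithmetic follows; the function name and the lineage table are illustrative only, and the result is an upper bound that ignores gene losses and local duplications.

```python
def max_wgd_copies(rounds: int) -> int:
    """Upper bound on the copies of one ancestral gene after `rounds` WGDs.

    Each whole genome duplication doubles the copy number, so the bound is
    2 ** rounds. Gene losses and local duplications are ignored.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return 2 ** rounds


# Number of WGD rounds experienced by each lineage, as described in the text.
WGD_ROUNDS = {"tunicate": 0, "tetrapod": 2, "teleost": 3}

for lineage, rounds in WGD_ROUNDS.items():
    print(lineage, max_wgd_copies(rounds))
```

Observed copy numbers below these bounds are then interpreted as losses after the duplication.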


them; they can for example be expressed in different tissues or at different time points (Force et al., 1999; Hahn, 2009). The least common outcome is so-called neofunctionalisation, where genes acquire completely new functions (Lynch and Conery, 2000; Conant and Wolfe, 2008).

Since whole chromosomes have been duplicated, one can expect to find several related parts of the genome with conserved gene organisation, so-called paralogons (Coulier et al., 2000). Positional data can therefore be included when studying the evolution of gene families. The observation that several gene families are located in the vicinity of each other on several chromosomes in several species supports an expansion in WGD events. However, over time the genes might have been rearranged or undergone local duplication events.

1.3 Phylogenetic tree analysis

To investigate the evolutionary history of gene families it is useful to make sequence-based phylogenetic trees in combination with chromosomal analyses. The phylogenetic analyses use the aligned amino acid sequences to infer the most likely evolutionary relationships between the included sequences and represent this as a phylogenetic tree. By combining this with data on the chromosomal locations of the studied genes one can draw conclusions about the evolutionary events that have shaped the gene families.

Ideally, a phylogenetic tree of a gene family that has expanded in 2R and 3R would display a two-times-two topology (Hughes and Friedman, 2003) with equal branch lengths, assuming identical evolutionary rates (see figure 4A; the WGDs are marked with red dots). In the more realistic figure 4B the gene families have been subjected to evolutionary events, such as gene losses and translocations (Taylor et al., 2001). The branch lengths also differ between members of each family, indicating different evolutionary rates. The gene family marked in green has undergone at least one gene loss and one rearrangement, and the red gene family has undergone at least two gene losses and one local duplication event, or possibly a translocation of one of the original four red genes to the same chromosome as another member. The phylogenetic analysis has not been able to resolve all of the branches of the purple gene family, which may be due to a too weak phylogenetic signal in the dataset as a result of very well-conserved or short sequences.


Figure 4: Schematic representation of ideal (A) and realistic (B) phylogenies of gene families that have expanded in 2R. In (A) the evolutionary rates of all genes are equal and the time points of the WGDs are marked with red dots. In B the green gene family has undergone a gene loss as well as a rearrangement and the red gene family has undergone two gene losses and one local duplication event.

than they themselves are to each other. When investigating the evolution within the vertebrates it is suitable to choose a related invertebrate gene as an outgroup. To provide relative dating for the evolutionary events that shaped the gene family, and rule out an expansion before any of the WGDs, it is necessary to include a species that diverged between the outgroup and the vertebrates. The tunicates provide such a reference since they are close relatives of vertebrates and genome sequences for three species are available.

Since many different events could have shaped the evolution of the vertebrate gene families, it is important to include sequences representing as many of the major vertebrate lineages as is possible and practical (teleost fishes, amphibians, birds, reptiles and mammals). This also increases the phylogenetic signal of the analysis as well as the reliability of the trees. Due to the state of the current genome assembly of the western clawed frog (Silurana tropicalis) which provides very little chromosomal data I chose to not include any amphibians in the study.

1.4 Neighbour-joining and maximum likelihood methods

method of inferring phylogenetic relationships by identifying the sequences that are closest neighbours and joining them together in the tree one by one (Saitou and Nei, 1987). To validate the NJ trees they are commonly subjected to a statistical evaluation called bootstrapping. This method repeats the NJ analyses a given number of times and produces a consensus tree based on all replicates, a so-called bootstrap consensus tree. The bootstrap values at each branch point represent the probability for the branch to be placed in the same way if the test was repeated. The bootstrap value is therefore not a measurement of the validity of the tree but a measure of its reproducibility. Since the NJ method is very fast, as many as a thousand bootstrap replicates can be used without making the analysis time prohibitive.
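The resampling step behind bootstrapping can be sketched as follows. This is a minimal illustration, assuming the standard procedure in which alignment columns are drawn with replacement to build each replicate; the toy sequences and function names are hypothetical, and the tree-building step that would be run on each replicate is omitted.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    Columns are sampled with replacement, so the replicate has the same
    dimensions as the input but a reshuffled set of sites.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def support_percent(times_recovered, replicates):
    """Share of replicates (in percent) in which a given branch was recovered."""
    return 100.0 * times_recovered / replicates


# Toy aligned amino acid sequences (hypothetical, for illustration only).
toy = ["MKTAL", "MKSAL", "MRTGL"]
replicate = bootstrap_columns(toy, random.Random(1))
print(replicate)
print(support_percent(500, 1000))
```

A tree would be inferred from every replicate, and the support for a branch is the share of replicate trees that contain it.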

The various phylogenetic methods have different shortcomings and strengths. Therefore it is common to replicate the phylogenetic analyses using several methods. There are many methods that provide more powerful and reliable alternatives than the relatively outdated NJ. Maximum likelihood (ML) methods are more time consuming but are considered much more reliable since they do not only consider sequence similarities but search for the best tree out of all possible trees that can explain the dataset (Felsenstein, 1981). The best tree is the tree that makes the observed sequence alignment most likely, taking into account a given model of sequence evolution as well as sequence similarities. ML analyses make a starting tree which is then rearranged and tested, seeking the final topology that maximizes the statistical likelihood. The sequence evolution model can be standard for each type of ML analysis, but it is better to test different models against the dataset to find the one that applies best.
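The idea of choosing the parameter value that makes the observed alignment most likely can be shown on the smallest possible case: the distance between two sequences. The sketch below is only a toy illustration and not the procedure used in this study. It assumes nucleotide data and the Jukes-Cantor (JC69) model, both chosen here for simplicity, and all names are hypothetical.

```python
import math


def jc69_log_likelihood(n_same, n_diff, d):
    """Log-likelihood of a pairwise nucleotide alignment under JC69.

    d is the distance in expected substitutions per site; n_same and n_diff
    are the numbers of identical and differing sites. d must be positive
    when n_diff > 0.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # joint probability of one identical site
    p_diff = 0.25 * (0.25 - 0.25 * e)  # joint probability of one specific mismatch
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def jc69_ml_distance(n_same, n_diff):
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    p = n_diff / (n_same + n_diff)  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for a finite JC69 distance")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


d_hat = jc69_ml_distance(90, 10)
print(round(d_hat, 4))
print(jc69_log_likelihood(90, 10, d_hat))
```

An ML tree search applies the same principle to whole trees: every candidate topology with its branch lengths receives a likelihood, and the candidate with the highest value is kept.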

Testing maximum likelihood trees by bootstrapping makes the analysis time even longer since the same process has to be replicated a number of times. Recently other statistical methods have been developed to make maximum likelihood analysis quicker. One such method is to do an approximate likelihood-ratio (aLRT) test coupled with the PhyML (Phylogenetic Maximum Likelihood) tree-calculating algorithm. This method avoids replicating the maximum likelihood procedure and when compared to bootstrap tests it produces similarly reliable trees (Guindon et al., 2010). The aLRT values describe the probability that each branch position


2 Materials and methods

2.1 Collection of CACNA1 gene families

Protein sequences for the CACNA1 L-type family (CACNA1C, D, F and S) were collected from the Ensembl database version 61 by using the built-in protein family predictions in the following species: human (Homo sapiens), mouse (Mus musculus), grey short-tailed opossum (Monodelphis domestica), chicken (Gallus gallus), anole lizard (Anolis carolinensis), zebrafish (Danio rerio), medaka (Oryzias latipes), three-spined stickleback (Gasterosteus aculeatus), spotted green pufferfish (Tetraodon nigroviridis), transparent sea squirt (Ciona intestinalis) and fruit fly (Drosophila melanogaster). In general, the longest transcript prediction for each gene was chosen for the study. For proteins with diverging longest transcripts, alternate transcript predictions or automatic GenScan predictions (Burge and Karlin, 1997) from the Ensembl database were used instead. To identify additional proteins that were not included in Ensembl's automatic family predictions, Basic Local Alignment Search Tool (BLAST) searches (Altschul et al., 1990) were performed using identified amino acid sequences as search terms with standard settings on the Ensembl database and the National Center for Biotechnology Information (NCBI) database. For some families Hidden Markov Model searches (HMMER) (Finn et al., 2011) were made to identify genes in species not included in the primary selection. The procedure was then repeated for the CACNA1 N-type family.
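The default rule of keeping the longest transcript prediction per gene can be sketched as below. This is an illustrative helper, not the tooling actually used; the record layout and the identifiers are hypothetical, and the manual replacement of diverging transcripts described above is not covered.

```python
def pick_longest_transcripts(records):
    """Keep the longest protein per gene.

    records: iterable of (gene_id, transcript_id, protein_sequence) tuples.
    Returns a dict mapping gene_id to (transcript_id, protein_sequence).
    Ties are resolved in favour of the transcript seen first.
    """
    best = {}
    for gene_id, transcript_id, sequence in records:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (transcript_id, sequence)
    return best


# Hypothetical identifiers and toy sequences, for illustration only.
records = [
    ("geneA", "geneA-t1", "MKTALLV"),
    ("geneA", "geneA-t2", "MKTALLVAGQW"),
    ("geneB", "geneB-t1", "MSSPR"),
]
chosen = pick_longest_transcripts(records)
print(chosen["geneA"][0])
```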

2.2 Identifying neighbouring gene families

For the CACNA1 L-type family, a list of all identified genes located in the genomic region 5 Mb upstream and 5 Mb downstream of each CACNA1 gene was collected from the Ensembl database version 61. Genes belonging to Ensembl protein families with members located within the selected region on at least two of the chromosomes were considered for the analysis of conserved synteny. The amino acid sequences of members included in the Ensembl families were downloaded from human (Homo sapiens), mouse (Mus musculus), grey short-tailed opossum (Monodelphis domestica), chicken (Gallus gallus), anole lizard (Anolis carolinensis), zebrafish (Danio rerio), medaka (Oryzias latipes), three-spined stickleback (Gasterosteus aculeatus), spotted green pufferfish (Tetraodon nigroviridis), transparent sea squirt (Ciona intestinalis) and fruit fly (Drosophila melanogaster). In general, the longest transcript of each gene was chosen for the


the CACNA1 families. The procedure was then repeated for the CACNA1 N-type family.
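The selection criterion for neighbouring families (members within 5 Mb of a CACNA1 gene on at least two chromosomes) can be sketched as follows. The data layout, the use of a single coordinate per gene, and all names and coordinates are assumptions made for illustration; they do not reproduce the Ensembl queries used in the study.

```python
from collections import defaultdict

WINDOW = 5_000_000  # 5 Mb upstream and 5 Mb downstream of each CACNA1 gene


def candidate_families(anchors, genes, window=WINDOW):
    """Return families with members near CACNA1 genes on >= 2 chromosomes.

    anchors: list of (chromosome, position) for the CACNA1 genes.
    genes:   list of (family, chromosome, position) for all other genes.
    Each gene is reduced to one coordinate, which is a simplification.
    """
    chromosomes_per_family = defaultdict(set)
    for family, chrom, pos in genes:
        for anchor_chrom, anchor_pos in anchors:
            if chrom == anchor_chrom and abs(pos - anchor_pos) <= window:
                chromosomes_per_family[family].add(chrom)
                break
    return sorted(
        family
        for family, chroms in chromosomes_per_family.items()
        if len(chroms) >= 2
    )


# Hypothetical coordinates and family names, for illustration only.
anchors = [("1", 200_000_000), ("12", 2_000_000), ("3", 53_000_000)]
genes = [
    ("FAM_A", "1", 198_000_000),
    ("FAM_A", "12", 4_000_000),
    ("FAM_B", "1", 201_000_000),
    ("FAM_B", "1", 202_000_000),
    ("FAM_C", "3", 60_000_000),
    ("FAM_C", "12", 3_000_000),
]
selected = candidate_families(anchors, genes)
print(selected)
```

In this toy input only FAM_A qualifies: FAM_B has members near a single CACNA1 gene, and one FAM_C member lies outside the window.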

2.3 Editing of genes and protein sequences

For short, incomplete or diverging sequences, the full intronic and flanking nucleotide sequence was collected and the Genscan gene prediction server (http://genes.mit.edu/GENSCAN.html) (Burge and Karlin, 1997) was used to identify exons that had not been previously predicted. Sequences that were still diverged with regard to exon-intron boundaries were curated manually by following consensus for splice donor and acceptor sites as well as sequence identities to other family members. Remaining highly divergent regions were removed. Some short protein sequences that did not provide enough sequence information in the alignments were removed; however, the chromosomal position of the gene was registered in figures 6 and 8.

2.4 Sequence alignment and phylogenetic analyses

The sequences were aligned using ClustalX 2.0.12 (Larkin et al., 2007) with standard settings. All alignments were manually inspected to optimize poorly aligned sequences by using BioEdit Sequence Alignment Editor (Hall, 1999). The sequences were named with the three-letter abbreviation for the species followed by the chromosome on which the gene is located. When more than one gene located on the same chromosome was present in an alignment, an additional number was added, e.g. Mmu.2_2 means gene 2 of 2 located on mouse (Mus musculus) chromosome 2.
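The naming scheme can be written down as a small helper. The rule used here for the three-letter abbreviation (first letter of the genus plus the first two letters of the species name) is an assumption inferred from the single example Mmu; the function names are hypothetical.

```python
def species_code(binomial):
    """Three-letter species abbreviation, e.g. 'Mus musculus' -> 'Mmu'.

    Assumed rule: first letter of the genus plus the first two letters of
    the species epithet.
    """
    genus, species = binomial.split()[:2]
    return (genus[0] + species[:2]).capitalize()


def sequence_label(binomial, chromosome, index=None):
    """Build a label such as 'Mmu.2' or, with an index, 'Mmu.2_2'."""
    label = f"{species_code(binomial)}.{chromosome}"
    return label if index is None else f"{label}_{index}"


print(sequence_label("Mus musculus", "2", 2))
print(sequence_label("Homo sapiens", "X"))
```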


and gamma distribution parameter estimated from the dataset, eight substitution rate categories, the tree topology and branch length were optimized using both the NNI and SPR topology optimisation methods. A non-parametric SH-like approximate likelihood ratio (aLRT) test was selected for the statistical support of the topology.

All genes in the same cluster were coloured relative to the human orthologues' chromosomal position. The phylogenetic trees for the CACNA1 gene families have been rooted using the fruit fly sequences. All bootstrap and aLRT values below 50 percent (bootstrap under 500 and aLRT under 0.5) were considered non-supportive.
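The support cut-off can be expressed as a single check. The sketch assumes 1000 bootstrap replicates, which is what a threshold of 500 at 50 percent implies; the function name and the string labels for the two kinds of support value are hypothetical.

```python
BOOTSTRAP_REPLICATES = 1000  # assumption: 500 replicates correspond to 50 percent
SUPPORT_THRESHOLD = 0.5      # values below 50 percent are non-supportive


def is_supported(value, kind):
    """Return True if a node support value reaches the 50 percent cut-off.

    kind is 'bootstrap' (a count out of BOOTSTRAP_REPLICATES) or
    'alrt' (a value between 0 and 1).
    """
    if kind == "bootstrap":
        return value / BOOTSTRAP_REPLICATES >= SUPPORT_THRESHOLD
    if kind == "alrt":
        return value >= SUPPORT_THRESHOLD
    raise ValueError(f"unknown support kind: {kind}")


print(is_supported(499, "bootstrap"), is_supported(0.87, "alrt"))
```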

3 Results

3.1 CACNA1 L-type family

3.1.1 Chromosomal regions

From the selection criteria, described in materials and methods, thirteen neighbouring gene families to CACNA1 of L-type were identified. Two of the gene families, voltage-gated calcium channel subunit alpha-2 (CACNA2D) and RNA-binding motif proteins (RBM), are being studied by others in our lab and were therefore not included in this study.

Two of the gene families, solute carrier family 26 (SLC26A) and WNT, are complex multi-member families that could not be divided into subfamilies and they were therefore excluded from the study. The kinase anchor proteins (AKAP) could only be found in amniotes and were hence excluded from the study.

The families included in the study are: cell division kinase (CDK), ELKS/RAB6-interacting/CAST family (ERC), IQ motif and Sec7 domain containing (IQSEC), lysine specific demethylase (KDM), kelch domain containing (KELCH), translocase of inner mitochondrial membrane 17 (TIMM17), ubiquitin modifier activating enzyme (UBA) and ubiquitin specific processing protease (USP). One of the predicted gene families, CDK, was divided into two subfamilies of which one was included in this study. All phylogenetic trees for the neighbouring families to the CACNA1 L-type family can be found in Supplementary figure S1.

For six of the eight studied families the phylogenetic trees as well as the positional data (figure 6) are in accordance with expansion in 2R. For two of the families, CDK and ERC, the topologies of the ML trees do not support an expansion in 2R; however, both the NJ trees as well as the positional data are consistent with an expansion in the vertebrate lineage.

No orthologue to the genes located on human chromosome X could be identified in the chicken genome. Missing genes in the chicken genome database are not uncommon, and this is considered further in the discussion.

An additional orthologue to human UBA1 was identified on mouse chromosome Y. KDM5D could only be identified in the human, mouse and opossum genomes, located on the Y chromosome in all species.


Four of the eight studied neighbouring gene families have some members with two teleost co-orthologues to one amniote gene, which is consistent with an additional duplication in 3R. The zebrafish genome has three ERC1 genes, one located on chromosome 25 and two shorter genes located close to each other on chromosome 4. These two genes are located in opposite directions. Neither of them is complete, but if put together they form a full-length gene. Even though no full-length protein could be identified in a protein database (the non-redundant "nr" database at NCBI), a combination of the two parts was used in the analyses.

Four of the CACNA1 L-type genes are located on chromosome 8 in the zebrafish genome. When the genes on chromosome 8 are sorted according to their position along the chromosome they form four distinct regions corresponding to different paralogon members, two of which appear to be 3R copies. CACNA1S and KDM5B are located in between these two parts and CACNA1D and ERC2 are located in one end.

One CDK gene present in the zebrafish, medaka and pufferfish genomes (located on zebrafish chromosome 9) does not cluster with any human genes in any of the phylogenies and it is therefore not possible to assign its orthologous relationship to the other genes in the family. However, the chromosomal positions in pufferfish and medaka are in regions that form an established 3R pair together with the chromosomes housing the genes coding for CDK18 (Kasahara et al., 2007). The genes could also be a fourth retained 2R copy that has been lost in the tetrapod lineage.

3.2 CACNA1 N-type family

3.2.1 Chromosomal regions

From the selection criteria described in materials and methods, twelve neighbouring gene families to CACNA1 of N-type were identified. One of these families, NOTCH, has been previously studied by Theodosiou et al. (2009) and was therefore not included in our phylogenetic study. However, the chromosomal position data were included in figure 8.

The families included in the study are: angiopoietin (ANGPTL), bromodomain (BRD), collagen alpha chain precursor (COL), glycosyltransferase 25 (GLT), LIM homeobox (LHX), olfactomedin (OLFM), sulfhydryl oxidase precursor (QSOX), ral guanine nucleotide dissociation stimulator (RALGDS), RAS protein activator like (RASAL), SEC16 homolog (SEC) and syntaxin (STX). One of the predicted gene families, RALGDS, was divided into two subfamilies of which one was included in this study. All phylogenetic trees for the neighbouring families to the N-type of CACNA1 can be found in Supplementary figure S2.

For six of the eleven studied families the phylogenetic trees as well as the positional data (see fig. 8) are in accordance with an expansion in 2R. For two of the families, GLT and RALGDS, no orthologue could be identified in the tunicate or amphioxus lineages, and although the topologies from the phylogenetic trees as well as the positional data support an expansion in 2R, the families might have expanded already before 2R.

For the BRD and SEC families the topology from one of the phylogenetic methods does not support an expansion in 2R; however, the topology from the other phylogenetic tree as well as the positional data are consistent with expansions in the vertebrate lineage. The topologies from the phylogenetic studies of the RASAL family do not support an expansion within the vertebrate lineage, but according to the positional data the genes still seem to be located within the paralogon.

Only a few orthologues to the genes located on human chromosomes 6 and 19 could be identified in the chicken genome, see discussion.


Seven of the eleven studied neighbouring gene families have some members with two orthologous teleost sequences to one amniote gene which is consistent with an additional expansion in 3R.

Possible traces of an unknown member of the RALGDS family were found in the zebrafish genome on chromosome 21. However, due to its short length it was not included in the study. NOTCH2 located on zebrafish chromosome 13 was not included in the phylogenetic study made by Theodosiou et al. (2009) and is therefore shown in white in figure 8.

Three partial COL5A3 genes are located next to each other on chromosome 3 in the zebrafish genome, one of which is reversed relative to the other two. Their nucleotide sequences have a high degree of identity. Even though local duplicates are not uncommon for the collagen gene family, this could also be due to assembly errors. Only the gene coding for the longest protein was used in the studies. Three short collagen genes were identified in the pufferfish genome, one in the medaka genome and one in the anole lizard genome. The protein sequences encoded by these genes were all too short to include in the study and their orthologous relationships could therefore not be established.

4 Discussion

The presence of three types of CACNA1 genes in the fruit fly genome suggests that three ancestral CACNA1 genes were present before the emergence of the vertebrate lineage (Jegla et al., 2009). The CACNA1 genes of L and N type seem to have expanded before the emergence of vertebrates in 2R followed by an additional teleost-specific expansion in 3R, which is described in more detail in this report. An analysis of the T-type family (work in progress) indicates that this family also expanded in the same manner.

The existence of the paralogons housing the L and N-type families is supported by the reconstructed linkages showing similarities between the amphioxus and human genomes (Putnam et al., 2008) as well as the reconstruction of the vertebrate ancestral karyotype (Nakatani et al., 2007). A more extensive investigation of the L-type paralogon is currently in progress.


expan-sion in 2R. The phylogenies for the remaning two were ambiguous, however, the positional data support an expansion in 2R.

For the CACNA1 N-type gene family, six of the eleven neighbouring gene families support an expansion in 2R. Two of the families lack sequences in species serving as a point of relative dating. However, both the topologies and the chromosomal data support expansions in 2R. The phylogenies for two of the families were ambiguous; however, as for the L-type genes, the positional data support an expansion in 2R. The remaining neighbouring gene family shows topologies that do not support an expansion in 2R, although such an expansion is supported by the chromosomal data. The deviating phylogenetic trees are probably due to uneven evolutionary rates. The previous study of the neighbouring family NOTCH (not included in our phylogenetic studies) suggests that this family also expanded in 2R (Theodosiou et al., 2009).

The high level of retained 2R and 3R copies of CACNA1 genes suggests sub- or neofunctionalisation of the genes and, as shown in table 1, the ten different CACNA1 genes display a wide range of functions. One could conceivably speculate that these expansions contributed to the increased complexity of the vertebrate nervous system. This study, together with the previous study of the related voltage-gated sodium channels, which have retained all 2R and 3R copies (Widmark et al., 2011), further supports the theory that important developmental and neuronal gene copies are more likely to be retained (Santini et al., 2009).

From the chromosomal data in fig. 6 and fig. 8 it appears that some rearrangements have occurred in the zebrafish genome, which makes evolutionary studies more difficult. Four CACNA1 genes are located on zebrafish chromosome 8 (fig. 6). However, when the genes on this chromosome are sorted according to their position, it seems as if regions from one 3R pair have been joined together on one chromosome, separated by a small region of genes orthologous to another human chromosome. Genome rearrangements in the teleost lineage have been proposed previously (Kasahara et al., 2007) and were recently proven by an analysis of the spotted gar (Lepisosteus oculatus) genome (Amores et al., 2011). The spotted gar diverged from the teleost lineage before 3R and was found to share a more similar gene organisation with the human genome than with that of teleost fish.
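The positional reasoning used for zebrafish chromosome 8 can be illustrated with a minimal sketch. It is not the procedure used in this study: the gene labels, positions and orthologous chromosome names below are hypothetical placeholders, and the function name `ortholog_blocks` is introduced only for illustration. Genes are sorted by their position, and consecutive genes whose orthologues lie on the same human chromosome are merged into blocks, so that an interrupting region becomes visible as a separate block.

```python
from itertools import groupby

# Hypothetical records, NOT data from this study:
# (gene label, start position in bp, human chromosome carrying the orthologue)
genes = [
    ("gene_d", 41_000_000, "chrA"),
    ("gene_a", 2_000_000, "chrA"),
    ("gene_c", 22_000_000, "chrB"),
    ("gene_b", 9_500_000, "chrA"),
    ("gene_e", 47_300_000, "chrA"),
]


def ortholog_blocks(records):
    """Sort genes by position and merge consecutive genes whose
    orthologues are located on the same human chromosome."""
    ordered = sorted(records, key=lambda r: r[1])
    blocks = []
    # groupby only merges adjacent items, which is what is needed here:
    # a region that interrupts a run shows up as its own block.
    for chrom, run in groupby(ordered, key=lambda r: r[2]):
        run = list(run)
        blocks.append((chrom, run[0][1], run[-1][1], [g[0] for g in run]))
    return blocks


for chrom, start, end, names in ortholog_blocks(genes):
    print(chrom, start, end, names)
```

With these placeholder values the sketch yields two blocks with orthologues on the same chromosome, separated by a short block pointing to a different one, which is the pattern described above. Real data would additionally require checking which member of a 3R pair each block corresponds to.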

have been found in other avian species which further suggests that not all of these genes are actually lost in the chicken genome.

It is interesting that two gene families with genes located on human chromosome X (KDM and UBA) also have members on the Y chromosome. KDM5D was identified in the human, mouse and opossum genomes, while UBA1 was only identified in the mouse genome, all on the Y chromosome. The human KDM genes located on the X and Y chromosomes are not part of the pseudoautosomal regions that recombine during meiosis. However, the mammalian sex chromosomes are thought to have emerged from one pair of autosomes (Graves, 2006). Therefore, it is possible that the two UBA1 genes in mouse, as well as KDM5C and 5D in several mammals, originate from two different alleles that were both retained after the emergence of the sex chromosomes.

This study is based on gene predictions, which may involve some uncertainties. To reduce their impact, several species were included in the study, the sequences were compared with each other, and the sequence alignments were inspected and manually curated and edited where necessary. The phylogenies are based on algorithms that are sometimes optimised for each dataset. In order to reduce the risk of bias, two different phylogenetic algorithms were used.

5 Acknowledgements

6 References

ABASCAL, F., ZARDOYA, R., AND POSADA, D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105.

ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W., AND LIPMAN, D. J. 1990. Basic local alignment search tool. J Mol Biol 215:403–410.

AMORES, A. 1998. Zebrafish hox Clusters and Vertebrate Genome Evolution. Science 282:1711–1714.

AMORES, A., CATCHEN, J., FERRARA, A., FONTENOT, Q., AND POSTLETHWAIT, J. H. 2011. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication. Genetics 188:799–808.

BURGE, C. AND KARLIN, S. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94.

CATTERALL, W. A. 2011. Voltage-gated calcium channels. Cold Spring Harb Perspect Biol 3:a003947.

CONANT, G. C. AND WOLFE, K. H. 2008. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet 9:938–950.

COULIER, F., POPOVICI, C., VILLET, R., AND BIRNBAUM, D. 2000. MetaHox gene clusters. J Exp Zool 288:345–351.

DALLOUL, R. A., LONG, J. A., ZIMIN, A. V., ASLAM, L., BEAL, K., BLOMBERG LE, A., BOUFFARD, P., BURT, D. W., CRASTA, O., CROOIJMANS, R. P., COOPER, K., COULOMBE, R. A., DE, S., DELANY, M. E., DODGSON, J. B., DONG, J. J., EVANS, C., FREDERICKSON, K. M., FLICEK, P., FLOREA, L., FOLKERTS, O., GROENEN, M. A., HARKINS, T. T., HERRERO, J., HOFFMANN, S., MEGENS, H. J., JIANG, A., DE JONG, P., KAISER, P., KIM, H., KIM, K. W., KIM, S., LANGENBERGER, D., LEE, M. K., LEE, T., MANE, S., MARCAIS, G., MARZ, M., MCELROY, A. P., MODISE, T., NEFEDOV, M., NOTREDAME, C., PATON, I. R., PAYNE, W. S., PERTEA, G., PRICKETT, D., PUIU, D., QIOA, D., RAINERI, E., RUFFIER, M., SALZBERG, S. L., SCHATZ, M. C., SCHEURING, C., SCHMIDT, C. J., SCHROEDER, S., SEARLE, S. M., SMITH, E. J., SMITH, J., SONSTEGARD, T. S., STADLER, P. F., TAFER, H., TU, Z. J., VAN TASSELL, C. P., VILELLA,

DAVID, L., BLUM, S., FELDMAN, M. W., LAVI, U., AND HILLEL, J. 2003. Recent duplication of the common carp (Cyprinus carpio L.) genome as revealed by analyses of microsatellite loci. Mol Biol Evol 20:1425–1434.

DEHAL, P. AND BOORE, J. L. 2005. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3:e314.

DREBORG, S., SUNDSTRÖM, G., LARSSON, T. A., AND LARHAMMAR, D. 2008. Evolution of vertebrate opioid receptors. Proc Natl Acad Sci U S A 105:15487–15492.

EVANS, B. J., KELLEY, D. B., TINSLEY, R. C., MELNICK, D. J., AND CANNATELLA, D. C. 2004. A mitochondrial DNA phylogeny of African clawed frogs: phylogeography and implications for polyploid evolution. Mol Phylogenet Evol 33:197–213.

FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376.

FINN, R. D., CLEMENTS, J., AND EDDY, S. R. 2011. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39:W29–37.

FORCE, A., LYNCH, M., PICKETT, F. B., AMORES, A., YAN, Y. L., AND POSTLETHWAIT, J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545.

GRAVES, J. A. M. 2006. Sex chromosome specialization and degeneration in mammals. Cell 124:901–914.

GUINDON, S., DUFAYARD, J. F., LEFORT, V., ANISIMOVA, M., HORDIJK, W., AND GASCUEL, O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321.

HAHN, M. W. 2009. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered 100:605–617.

HOLLAND, L. Z. 2009. Chordate roots of the vertebrate nervous system: expanding the molecular toolkit. Nat Rev Neurosci 10:736–746.

HORDVIK, I., DE VRIES LINDSTROM, C., VOIE, A. M., LILYBERT, A., JACOB, J., AND ENDRESEN, C. 1997. Structure and organization of the immunoglobulin M heavy chain genes in Atlantic salmon, Salmo salar. Mol

HUGHES, A. L. AND FRIEDMAN, R. 2003. 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J Struct Funct Genomics 3:85–93.

JAILLON, O., AURY, J. M., BRUNET, F., PETIT, J. L., STANGE-THOMANN, N., MAUCELI, E., BOUNEAU, L., FISCHER, C., OZOUF-COSTAZ, C., BERNOT, A., NICAUD, S., JAFFE, D., FISHER, S., LUTFALLA, G., DOSSAT, C., SEGURENS, B., DASILVA, C., SALANOUBAT, M., LEVY, M., BOUDET, N., CASTELLANO, S., ANTHOUARD, V., JUBIN, C., CASTELLI, V., KATINKA, M., VACHERIE, B., BIEMONT, C., SKALLI, Z., CATTOLICO, L., POULAIN, J., DE BERARDINIS, V., CRUAUD, C., DUPRAT, S., BROTTIER, P., COUTANCEAU, J. P., GOUZY, J., PARRA, G., LARDIER, G., CHAPPLE, C., MCKERNAN, K. J., MCEWAN, P., BOSAK, S., KELLIS, M., VOLFF, J. N., GUIGO, R., ZODY, M. C., MESIROV, J., LINDBLAD-TOH, K., BIRREN, B., NUSBAUM, C., KAHN, D., ROBINSON-RECHAVI, M., LAUDET, V., SCHACHTER, V., QUETIER, F., SAURIN, W., SCARPELLI, C., WINCKER, P., LANDER, E. S., WEISSENBACH, J., AND ROEST CROLLIUS, H. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431:946–957.

JANSEN, R., TIMMERMAN, J., LOOS, M., SPIJKER, S., VAN OOYEN, A., BRUSSAARD, A. B., MANSVELDER, H. D., SMIT, A. B., DE GUNST, M., AND LINKENKAER-HANSEN, K. 2011. Novel candidate genes associated with hippocampal oscillations. PLoS One 6:e26586.

JEGLA, T. J., ZMASEK, C. M., BATALOV, S., AND NAYAK, S. K. 2009. Evolution of the human ion channel set. Combinatorial chemistry & high throughput screening 12:2–23.

KASAHARA, M., NARUSE, K., SASAKI, S., NAKATANI, Y., QU, W., AHSAN, B., YAMADA, T., NAGAYASU, Y., DOI, K., KASAI, Y., JINDO, T., KOBAYASHI, D., SHIMADA, A., TOYODA, A., KUROKI, Y., FUJIYAMA, A., SASAKI, T., SHIMIZU, A., ASAKAWA, S., SHIMIZU, N., HASHIMOTO, S.-I., YANG, J., LEE, Y., MATSUSHIMA, K., SUGANO, S., SAKAIZUMI, M., NARITA, T., OHISHI, K., HAGA, S., OHTA, F., NOMOTO, H., NOGATA, K., MORISHITA, T., ENDO, T., SHIN-I, T., TAKEDA, H., MORISHITA, S., AND KOHARA, Y. 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447:714–9.

KURAKU, S. 2010. Palaeophylogenomics of the vertebrate ancestor - impact of hidden paralogy on hagfish and lamprey gene phylogeny. Integrative and

KURAKU, S., MEYER, A., AND KURATANI, S. 2009. Timing of genome duplications relative to the origin of the vertebrates: Did cyclostomes diverge before or after? Molecular Biology and Evolution 26:47–59.

LARKIN, M. A., BLACKSHIELDS, G., BROWN, N. P., CHENNA, R., MCGETTIGAN, P. A., MCWILLIAM, H., VALENTIN, F., WALLACE, I. M., WILM, A., LOPEZ, R., THOMPSON, J. D., GIBSON, T. J., AND HIGGINS, D. G. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948.

LARSSON, T. A., OLSSON, F., SUNDSTRÖM, G., LUNDIN, L. G., BRENNER, S., VENKATESH, B., AND LARHAMMAR, D. 2008. Early vertebrate chromosome duplications and the evolution of the neuropeptide Y receptor gene regions. BMC Evol Biol 8:184.

LIPSCOMBE, D., HELTON, T. D., AND XU, W. 2004. L-type calcium channels: the low down. J Neurophysiol 92:2633–2641.

LYNCH, M. AND CONERY, J. S. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155.

MEYER, A. AND VAN DE PEER, Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

NAKATANI, Y., TAKEDA, H., KOHARA, Y., AND MORISHITA, S. 2007. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res 17:1254–1265.

OHNO, S. 1970. Evolution by gene duplication. Springer-Verlag.

PANOPOULOU, G. AND POUSTKA, A. J. 2005. Timing and mechanism of ancient vertebrate genome duplications – the adventure of a hypothesis. Trends in genetics: TIG 21:559–67.

PUTNAM, N. H., BUTTS, T., FERRIER, D. E., FURLONG, R. F., HELLSTEN, U., KAWASHIMA, T., ROBINSON-RECHAVI, M., SHOGUCHI, E., TERRY, A., YU, J. K., BENITO-GUTIERREZ, E. L., DUBCHAK, I., GARCIA-FERNANDEZ, J., GIBSON-BROWN, J. J., GRIGORIEV, I. V., HORTON, A. C., DE JONG, P. J., JURKA, J., KAPITONOV, V. V., KOHARA, Y., KUROKI, Y., LINDQUIST, E., LUCAS, S., OSOEGAWA, K., PENNACCHIO, L. A., SALAMOV, A. A., SATOU, Y., SAUKA-SPENGLER, T., SCHMUTZ, J., SHIN, I. T., TOYODA, A., BRONNER-FRASER, M., FUJIYAMA, A., HOLLAND, L. Z., HOLLAND, P. W., SATOH, N., AND ROKHSAR, D. S. 2008. The

SAITOU, N. AND NEI, M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.

SANTINI, F., HARMON, L. J., CARNEVALE, G., AND ALFARO, M. E. 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. 9.

SUNDSTRÖM, G. 2010. Evolution of the Neuropeptide Y and Opioid Systems and their Genomic Regions, volume 585. Acta Universitatis Upsaliensis, Uppsala.

SUNDSTRÖM, G., LARSSON, T. A., BRENNER, S., VENKATESH, B., AND LARHAMMAR, D. 2008a. Evolution of the neuropeptide Y family: new genes by chromosome duplications in early vertebrates and in teleost fishes. Gen Comp Endocrinol 155:705–716.

SUNDSTRÖM, G., LARSSON, T. A., AND LARHAMMAR, D. 2008b. Phylogenetic and chromosomal analyses of multiple gene families syntenic with vertebrate Hox clusters. BMC Evol Biol 8:254.

TAYLOR, J. S., VAN DE PEER, Y., BRAASCH, I., AND MEYER, A. 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos Trans R Soc Lond B Biol Sci 356:1661–1679.

THEODOSIOU, A., ARHONDAKIS, S., BAUMANN, M., AND KOSSIDA, S. 2009. Evolutionary scenarios of Notch proteins. Mol Biol Evol 26:1631–1640.

VAN DE PEER, Y., MAERE, S., AND MEYER, A. 2009. The evolutionary significance of ancient genome duplications. Nat Rev Genet 10:725–732.

Supplementary figure S1

Supplementary figure S2
