Phylogenetic analysis of aquatic microbiomes: Evolution of the brackish microbiome

Academic year: 2022


DEGREE PROJECT IN BIOTECHNOLOGY, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2020

Phylogenetic analysis of aquatic microbiomes

Evolution of the brackish microbiome

ZILING DENG


Author

Ziling Deng <ziling@kth.se>

School of Engineering Sciences in Chemistry, Biotechnology and Health KTH Royal Institute of Technology

Place for Project

Department of Gene Technology, Science for Life Laboratory KTH Royal Institute of Technology

Examiner

Peter Savolainen

Supervisor

Anders Andersson

Science for Life Laboratory

KTH Royal Institute of Technology

Date of Submission

26/05/2020


Abstract

Microorganisms play crucial roles in aquatic environments in determining ecosystem stability and driving the turnover of elements essential to life. Understanding the distribution and evolution of aquatic microorganisms will help us predict how aquatic ecosystems will respond to Global Change, and such understanding can be gained by studying these processes of the past. In this project, we investigate the evolutionary relationship between brackish water bacteria from the Baltic Sea and Caspian Sea with freshwater and marine bacteria, with the goal of understanding how brackish water bacteria have evolved. 11,276 bacterial metagenome-assembled genomes (MAGs) from seven metagenomic datasets were used to conduct a comparative analysis of freshwater, brackish and marine bacteria. When clustering the genomes by pairwise average nucleotide identity (ANI) at the approximate species level (96.5% ANI), the Baltic Sea genomes were more likely to form clusters with the Caspian Sea genomes than with Swedish lakes genomes, even though geographic distances between Swedish lakes and the Baltic Sea are much smaller. Phylogenomic analysis and ancestral state reconstruction showed that approximately half of the brackish MAGs had freshwater ancestors and half had marine ancestors. Phylogenetic distances were on average shorter to freshwater ancestors, but when subsampling the tree to the same number of freshwater and marine MAG clusters, the distances were not significantly different. Brackish genomes belonging to Acidimicrobiia, Actinobacteria and Cyanobacteriia tended to originate from freshwater bacteria, while those of Alphaproteobacteria and Bacteroidia mainly had evolved from marine bacteria.

Keywords

Metagenome, Aquatic environments, Biodiversity, Evolution, Phylogenetic tree, Ancestral state reconstruction, Biome origin


Sammanfattning (Summary)

Microorganisms play crucial roles in aquatic ecosystems, where they drive the cycling of nutrients. A better understanding of how microorganisms adapt to environmental change is important for predicting how aquatic ecosystems will change as a consequence of global warming, and such understanding can be gained by studying past events in evolution. In this project, we investigate the evolutionary relationship between brackish water bacteria from the Baltic Sea and the Caspian Sea and freshwater and marine bacteria, with the goal of understanding how brackish water bacteria have evolved. 11,276 bacterial genomes reconstructed by metagenomics from seven datasets were used to carry out a comparative analysis of bacterial genomes from freshwater, brackish and marine water. Clustering of the genomes based on pairwise average nucleotide identity (ANI) at the approximate species level (96.5% ANI) grouped the Baltic Sea bacteria with the Caspian Sea bacteria more often than with bacteria from Swedish lakes, even though the geographic distance between Swedish lakes and the Baltic Sea is much smaller. Phylogenetic analysis showed that about half of the brackish water species had freshwater ancestors and half had marine ancestors. The phylogenetic distances were on average shorter to the freshwater ancestors, but when the tree was reduced to contain the same number of freshwater and marine species, the distances were no longer significantly different. Brackish water species belonging to Acidimicrobiia, Actinobacteria and Cyanobacteriia tended to descend from freshwater bacteria, while those from Alphaproteobacteria and Bacteroidia mainly descended from marine bacteria.

Keywords (Nyckelord)

Metagenome, aquatic environments, biodiversity, evolution, phylogenetic tree, ancestral state reconstruction, biome origin


Table of Contents

1 Introduction
1.1 Microorganisms in aquatic environments
1.1.1 Marine environments
1.1.2 Freshwater environments
1.1.3 Brackish environment
1.2 Evolutionary relationships of bacteria in different aquatic environments
1.3 Objectives and goals
2 Materials and Methods
2.1 Genome datasets
2.2 Phylogenetic tree construction
2.2.1 Genomes quality assessment
2.2.2 Taxonomic annotation
2.2.3 Phylogenetic tree construction
2.3 Phylogenetic analyses
2.3.1 Clustering by ANI value and pruning of tree
2.3.2 Ancestor states reconstruction
2.4 Gene prediction and annotation
3 Results
3.1 Bacterial diversity
3.2 ANI analysis
3.2.1 Distribution of ANI value
3.2.2 Cluster generation by different ANI limit
3.2.3 Cluster at 96.5% ANI value
3.2.4 Relationship of ANI value and salinity difference in Baltic Sea clusters
3.3 Phylogenetic analyses
3.3.1 Phylogenetic tree and ancestor state reconstruction
3.3.2 Biome origin of brackish MAGs
4 Discussion
5 Future perspectives
Acknowledgements
Reference
Tables and figures
Appendixes
Appendix I: Taxonomic subsections of the pruned phylogenetic tree
Appendix II: R codes for phylogenetic analyses
Appendix III: raw data and results on Uppmax


1 Introduction

1.1 Microorganisms in aquatic environments

Microorganisms are ubiquitous in aquatic environments and play crucial roles by being both the main primary producers and by carrying out the bulk of the turnover and recycling of life-essential elements, such as carbon, nitrogen, phosphorus and trace metals. The diversity and abundance of microorganisms in the environment directly affect the stability of the whole ecosystem, and thus have an impact on the survival of life on the planet, because aquatic microorganisms supply roughly 50% of the atmospheric oxygen as well as 15% of the protein content of the world [1].

In light of Global Change, it is important to understand how microorganisms adapt or redistribute in response to changes in their environment, in order to predict how ecosystems will change. To this end, comprehensive analyses of microbial adaptation, distribution and evolution under different environmental conditions are important. Such analyses can be realized by sequencing genomes of bacterioplankton in aquatic environments and performing phylogenomic analyses.

1.1.1 Marine environments

About 70% of Earth’s surface is covered by oceans. Thus, the marine environment is the major habitat of the planet. Many global studies, such as the Global Ocean Sampling Expedition [2] and the Tara Oceans Survey [3,4], are discovering and analysing the ecology and evolution of marine microorganisms with the help of rapidly developing molecular technologies. There are more than 10^8 microorganisms/L in surface water, of which most are bacteria. Previous research has shown that bacterial communities in the pelagic zones are mainly dominated by Alphaproteobacteria, Gammaproteobacteria, Flavobacteria, Cyanobacteria, Actinobacteria and Betaproteobacteria [5].

1.1.2 Freshwater environments

There are two types of freshwater environments, running water and standing water, which have different features and different microbial communities [1]. Lakes are highly complex standing water environments that can vary widely in chemical conditions, e.g. in ionic composition [6]. The diversity and abundance of bacterial communities in lakes are related to the trophic state of the lake. In most cases, Actinobacteria, Proteobacteria, Cyanobacteria, Planctomycetes and Verrucomicrobia are the dominant taxa [7].

1.1.3 Brackish environment

Brackish environments have a salinity in between that of freshwater and marine water. The salinity of marine water is 30-40 PSU (practical salinity units), that of brackish water is 0.5-30 PSU, and that of freshwater is less than 0.5 PSU. Thus, microorganisms that survive in brackish water often have the ability to adapt to salinity fluctuations. Recent research has indicated that typical brackish microorganisms exist and that they are potentially globally distributed [8].
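As a minimal sketch, the salinity ranges above can be written as a small Python function. The function name and the handling of the shared boundary at exactly 30 PSU (treated here as marine) and of values above 40 PSU (also treated as marine) are assumptions made for illustration only; the text does not specify them.

```python
def classify_salinity(psu: float) -> str:
    """Return a biome label for a salinity value given in PSU.

    Thresholds follow the ranges quoted in the text:
    freshwater < 0.5 PSU, brackish 0.5-30 PSU, marine 30-40 PSU.
    Assumption: exactly 30 PSU and anything above it count as marine.
    """
    if psu < 0:
        raise ValueError("salinity cannot be negative")
    if psu < 0.5:
        return "freshwater"
    if psu < 30:
        return "brackish"
    return "marine"


# Example: surface salinities typical of the innermost Bothnian Bay and the Kattegat
labels = [classify_salinity(s) for s in (3.0, 20.0)]
```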

The Baltic Sea is the second largest brackish water body in the world, an arm of the North Atlantic Ocean connected to it by narrow straits. The horizontal salinity gradient in the surface water ranges from 18-26 PSU in the Kattegat to 2-4 PSU in the innermost Bothnian Bay [9]. Seasonal temperature variations and large spatial fluctuations in salinity cause great differences in ecological conditions and thus affect biodiversity spatiotemporally [10]. Therefore, the Baltic Sea is one of the best ecosystems in which to investigate the relationship between environment and biodiversity. Bacterial communities in the Baltic Sea have been shown to consist of a mixture of lineages most closely related to freshwater and marine counterparts. The surface water of the central Baltic Sea, where native bacterial communities are largely unaffected, is dominated by Bacteroidetes and influenced by typical freshwater groups within Verrucomicrobia, Actinobacteria and Betaproteobacteria [11].

The Caspian Sea is the largest enclosed water body on Earth. It is an endorheic system (it has no outflows) and is disconnected from the ocean. Other major brackish water bodies, like the Baltic Sea and the Black Sea, are connected with surrounding marine habitats, and their bacterial communities, like those in estuaries, are affected by this marine intrusion, which makes it rather difficult to distinguish native species from transient species [12].

However, this problem does not exist for the Caspian Sea, where there is no mixing with marine water. In addition, according to salinity, the Caspian Sea can be roughly divided into three parts: freshwater, brackish water with a salinity gradient, and brackish water with stable salinity [13]. While the Baltic Sea is a young ecosystem (<10,000 years), the Caspian Sea is millions of years old. Previous research has indicated that the microorganisms in the Caspian Sea are related to both freshwater and marine lineages, for example within phylogenetic groups of Alphaproteobacteria and Actinobacteria [12].

1.2 Evolutionary relationships of bacteria in different aquatic environments

For prokaryotes, it is generally believed that environmental conditions are more important than geographic distance in shaping their distribution, owing to their high dispersal potential. In addition, lateral gene transfer among prokaryotes helps them cross environmental boundaries more easily than multicellular organisms [14,15].

Many studies make it clear that marine and freshwater bacteria lineages are evolutionarily distinct, suggesting that few marine-freshwater transition events have occurred. From an environmental perspective, there are many factors that can influence habitats and bacterial colonization, such as salinity, temperature, pH and organic matter composition. Of these, the most influential factor in shaping bacterial distributions appears to be salinity [16]. Different salt concentrations in the environment can result in different energetic costs of various metabolic pathways and dissimilatory reactions, which makes it difficult for an organism to adapt to a new salinity level [17].

For young brackish ecosystems, specialised brackish strains are more likely to have been transported from other brackish environments, for example by wind, than to have evolved from nearby freshwater and marine strains. The Baltic Sea became brackish around 8000 years ago. However, whole-genome comparisons have indicated that Baltic Sea genomes differ from freshwater and marine genomes by >1% of nucleotides. Such differences likely require much longer than 8000 years to evolve [8], which suggests that the bacteria inhabiting the Baltic Sea adapted to brackish conditions before the Baltic Sea was formed, and likely immigrated to the Baltic Sea from other brackish water bodies.


1.3 Objectives and goals

In all aquatic environments, microorganisms play an irreplaceable role in maintaining ecosystem stability, including the biogeochemical cycling of nutrients and the formation of food webs. Understanding how the distribution and evolution of bacterial communities differ as a consequence of global change can help in predicting ecosystem changes. In addition, knowledge of biodiversity and the corresponding functions is conducive to the management and utilization of aquatic resources. The aim of this project was to investigate the evolutionary history and biogeography of brackish water bacteria by analysing nucleotide identity and constructing a phylogenetic tree using brackish water genomes from different geographic regions, as well as genomes from marine and freshwater environments. More specifically, the aim was to investigate: 1) how similar genomes from freshwater, brackish and marine environments are to each other in terms of nucleotide identity, 2) to what extent brackish genomes are derived from freshwater and marine genomes, respectively, and how this differs between taxa, and 3) the phylogenetic distances to freshwater and marine ancestors, to see whether the transition to brackish conditions has happened continuously or appears to be concentrated in certain time intervals.


2 Materials and Methods

2.1 Genome datasets

Metagenome-assembled genomes (MAGs) used in this project were derived from brackish, freshwater and marine environments, and obtained from seven different studies, including one unpublished and six published (Table 2.1.1). The freshwater MAGs were obtained from North American lakes (Lake Mendota and Trout Bog Lake), Swedish lakes and Lake Baikal. The studies for North American lakes and Lake Baikal have been published [18,19], while the dataset for Swedish lakes (44 boreal lakes, mainly from Sweden) is unpublished and was provided by collaborators at the Swedish University of Agricultural Sciences (SLU; Prof. Stefan Bertilsson, Dr. Moritz Buck).

The brackish MAGs were sampled in the Baltic Sea and the Caspian Sea. The Baltic Sea set has been published by Alneberg et al. [20]. The Caspian Sea set is a mixture of previously published MAGs by Mehrshad et al. [12] and new, unpublished MAGs from the same samples, assembled and provided by our collaborator Dr. Mehrshad at Uppsala University. The marine MAGs are from the Tara Oceans Survey, and were published by Tully et al. [2] and Delmont et al. [3].

Table 2.1.1 Number of MAGs in each dataset from different studies.

Dataset                             #MAGs   Reference
North American lakes (Freshwater)     193   Linz et al. (2018) [18]
Swedish lakes (Freshwater)           7654   Unpublished
Lake Baikal (Freshwater)               35   Cabello-Yeves et al. (2018) [19]
Baltic Sea (Brackish)                1989   Alneberg et al. (2020) [20]
Caspian Sea (Brackish)                324   Mehrshad et al. (2016) [12]
Global Ocean T (Marine)              2631   Tully et al. (2019) [2]
Global Ocean D (Marine)               957   Delmont et al. (2019) [3]


2.2 Phylogenetic tree construction

We have collected a fraction of the reconstructed MAGs used in this study from public databases together with the MAGs from in-house studies [20] and some obtained from collaborators (see Table 2.1.1) in FASTA format and uploaded them to UPPMAX (Uppsala Multidisciplinary Center for Advanced Computational Science).

2.2.1 MAGs quality assessment

In order to select MAGs suitable for constructing a phylogenetic tree, it is necessary to estimate their completeness and contamination. Draft MAGs in this project were filtered with CheckM, which estimates completeness and contamination with the help of a reference genome tree and marker gene sets, and whose estimates have been evaluated on models of randomly fragmented genomes and of genomes contaminated with foreign DNA [21]. Because the bias in these estimates is small (<3%) for genomes with >70% completeness and medium contamination (≤10%) [20], MAGs with ≥75% completeness and ≤5% contamination were considered good-quality MAGs in this project and were used in the rest of the study.
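The selection criterion can be illustrated with a short Python sketch. It assumes that the completeness and contamination estimates are already available as percentages for each MAG; the `MagQuality` record and the function names are hypothetical and are not part of CheckM.

```python
from typing import List, NamedTuple


class MagQuality(NamedTuple):
    """Hypothetical record holding quality estimates for one MAG."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_quality(mag: MagQuality,
                   min_completeness: float = 75.0,
                   max_contamination: float = 5.0) -> bool:
    """True if the MAG has >=75% completeness and <=5% contamination."""
    return (mag.completeness >= min_completeness
            and mag.contamination <= max_contamination)


def filter_mags(mags: List[MagQuality]) -> List[MagQuality]:
    """Keep only the MAGs that meet both thresholds."""
    return [m for m in mags if passes_quality(m)]


# Made-up values, for illustration only
example = [
    MagQuality("mag_a", 92.1, 1.3),
    MagQuality("mag_b", 74.9, 0.5),   # fails: completeness too low
    MagQuality("mag_c", 88.0, 6.2),   # fails: contamination too high
]
good = filter_mags(example)
```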

2.2.2 Taxonomic annotation

Taxonomic annotation was performed with the “classify_wf” workflow of the Genome Taxonomy Database Toolkit (GTDB-tk) (v 0.3.2), which assigns taxonomic classifications to bacterial and archaeal genomes based on the GTDB reference tree [22,23]. The classify_wf workflow accepts MAGs in FASTA format and consists of three steps: identify, align and classify [22].

2.2.3 Phylogenetic tree construction

The “de_novo_wf” workflow of GTDB-tk (v 0.3.2) was used to construct a phylogenetic tree. The de_novo_wf workflow accepts nucleotide sequence files in FASTA format as input and consists of five steps: identify, align, infer, root and decorate [22]. In order to construct a rooted phylogenetic tree, the WAG (Whelan and Goldman) amino acid evolution model was selected in the infer step and p__Patescibacteria was selected as the outgroup in the root step.


2.3 Phylogenetic analyses

Most downstream phylogenetic analyses were performed in RStudio (V3.6.2) with the R packages ape, phytools, phangorn and ggplot, using the tree file produced by GTDB-tk and a text file with MAG information as input.

In order to analyse evolutionary relationships among MAGs, the phylogenetic tree should be binary. In a rooted bifurcating tree, each interior node has exactly two descendants. Thus, the multi2di function in R was used to resolve multichotomies (which appear because some MAGs are nearly identical) in the phylogenetic tree. However, this function can produce branches of zero length. Therefore, we arbitrarily changed all zero-length branches to 5E-04 (the smallest length among all non-zero branches) to make sure that the smallest branch length in the phylogenetic tree was >0.
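The branch-length adjustment can be sketched in a few lines of Python. This is only an illustration on a plain list of branch lengths; it is not the R code used in the project, and the function name is hypothetical.

```python
from typing import List


def fix_zero_branches(lengths: List[float]) -> List[float]:
    """Replace zero-length branches by the smallest non-zero branch length."""
    positive = [length for length in lengths if length > 0]
    if not positive:
        raise ValueError("tree has no branch with positive length")
    smallest = min(positive)
    return [smallest if length == 0 else length for length in lengths]


# Made-up branch lengths; zeros stand for branches created when
# multichotomies are resolved into bifurcations.
fixed = fix_zero_branches([0.0, 0.002, 0.0005, 0.0])
```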

The MAG information file consisted of sampling information and taxonomic annotation for each MAG, including biome (brackish, freshwater or marine), salinity, geographic region and taxonomy.

2.3.1 Clustering by ANI value and pruning of tree

The ANI (Average Nucleotide Identity) between all pairs of MAGs was estimated with FastANI (V1.2). A cutoff of 96.5% was chosen for clustering the MAGs, because previous studies have shown that pairs of genomes with an ANI of ≥95% can be classified as the same species [24].

In this project, clustering of all MAGs was done with the hclust and cutree functions in RStudio. Hierarchical clustering groups objects based on their dissimilarities using one of several agglomeration methods. Here, we chose the UPGMA (unweighted pair group method with arithmetic mean) method (method = “average” in the hclust function). The cutree function cuts the dendrogram from the hierarchical clustering into groups at a set height; it was used to generate clusters of MAGs at a height of 0.035 (corresponding to 96.5% ANI).
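To make the clustering step concrete, the following is a minimal pure-Python sketch of average-linkage (UPGMA) clustering cut at a height of 0.035. It is a naive illustration, not the hclust/cutree implementation used in the project. It assumes that distances are computed as 1 - ANI/100 and that MAG pairs without a reported ANI get a distance of 1.0; both are assumptions of this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma_clusters(names: List[str],
                   dist: Dict[FrozenSet[str], float],
                   cutoff: float = 0.035) -> List[List[str]]:
    """Merge clusters by average linkage until the closest pair exceeds cutoff.

    For average linkage the merge heights never decrease, so stopping here
    gives the same groups as cutting the full dendrogram at `cutoff`.
    """
    clusters = [[name] for name in names]

    def avg_dist(c1: List[str], c2: List[str]) -> float:
        # Mean of all pairwise distances between the two clusters;
        # missing pairs are treated as maximally distant (assumption).
        total = sum(dist.get(frozenset((a, b)), 1.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Made-up ANI values (percent) for four hypothetical MAGs
ani = {("A", "B"): 99.0, ("A", "C"): 97.0, ("B", "C"): 96.8}
distances = {frozenset(pair): 1 - value / 100 for pair, value in ani.items()}
result = upgma_clusters(["A", "B", "C", "D"], distances)
```

The naive search over all cluster pairs is quadratic per merge, so this form is only suitable for small examples.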

To simplify the phylogenetic analyses, the phylogenetic tree was pruned to remove redundant information and reduce complexity. We kept one random MAG per biome per cluster to get a pruned tree, and randomly removed some freshwater MAGs to get a subsampled tree with the same number of freshwater and marine MAGs.
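The selection of one random MAG per biome per cluster could look roughly like the sketch below. The input layout (tuples of MAG identifier, cluster identifier and biome), the function name and the fixed random seed are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pick_representatives(mags: Iterable[Tuple[str, str, str]],
                         seed: int = 0) -> Dict[Tuple[str, str], str]:
    """Pick one random MAG for every (cluster, biome) combination.

    mags: tuples of (mag_id, cluster_id, biome).
    Returns a mapping (cluster_id, biome) -> chosen mag_id.
    """
    rng = random.Random(seed)  # fixed seed so the choice is reproducible
    groups = defaultdict(list)
    for mag_id, cluster_id, biome in mags:
        groups[(cluster_id, biome)].append(mag_id)
    return {key: rng.choice(sorted(ids)) for key, ids in sorted(groups.items())}


# Hypothetical MAGs: cluster c1 contains brackish and freshwater members
example = [
    ("m1", "c1", "brackish"),
    ("m2", "c1", "brackish"),
    ("m3", "c1", "freshwater"),
    ("m4", "c2", "marine"),
]
representatives = pick_representatives(example, seed=1)
```

The tips that are not among the chosen representatives would then be dropped from the tree.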

2.3.2 Ancestor states reconstruction

The ace (Ancestral Character Estimation) function in R was used to estimate the biome state of each internal node in the phylogenetic tree. Maximum likelihood with an ER (Equal Rates) model for discrete characters was used to calculate the likelihood of each node being freshwater, brackish or marine. The ancestor state for each brackish MAG (the biome it originated from) was defined by finding its closest ancestor node that had a likelihood of >50% of being either freshwater or marine. The biome with the highest likelihood for this node was assigned as the ancestor state of the MAG.
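The rule for assigning an ancestor state can be sketched as a walk from a brackish tip towards the root. The sketch assumes that the per-node likelihoods have already been estimated (for example with ace) and are supplied as dictionaries; the tree representation as a child-to-parent mapping and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def ancestral_biome(tip: str,
                    parent: Dict[str, str],
                    likelihoods: Dict[str, Dict[str, float]],
                    threshold: float = 0.5) -> Tuple[Optional[str], Optional[str]]:
    """Find the closest ancestor that is >50% likely freshwater or marine.

    parent: maps each node to its parent (the root has no entry).
    likelihoods: maps each internal node to the likelihoods of the
    three states 'freshwater', 'brackish' and 'marine'.
    Returns (node, biome), or (None, None) if no ancestor qualifies.
    """
    node = parent.get(tip)
    while node is not None:
        lik = likelihoods[node]
        if lik["freshwater"] > threshold or lik["marine"] > threshold:
            # A state above 50% is necessarily the most likely one.
            biome = max(("freshwater", "marine"), key=lambda state: lik[state])
            return node, biome
        node = parent.get(node)
    return None, None


# Toy tree: tip t1 -> n1 -> n2 -> root, with made-up likelihoods
parent = {"t1": "n1", "n1": "n2", "n2": "root"}
likelihoods = {
    "n1": {"freshwater": 0.15, "brackish": 0.80, "marine": 0.05},
    "n2": {"freshwater": 0.70, "brackish": 0.20, "marine": 0.10},
    "root": {"freshwater": 0.40, "brackish": 0.10, "marine": 0.50},
}
origin = ancestral_biome("t1", parent, likelihoods)
```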

2.4 Gene prediction and annotation

Prodigal (v2.6.3) was used to identify protein-coding genes in MAGs, running the program on each MAG with the help of snakemake. Prodigal accepts contig sequence in nucleotide FASTA format as input and outputs amino acid sequences of the predicted genes in FASTA format [25].

Gene function annotation was performed with eggNOG-mapper (v2.0.0), based on the eggNOG v5.0 database, using the protein sequences (output of Prodigal) as input and gathering all annotation information in a tabular output file [26,27]. A Python script (anno_table.py, see Appendix III) was written and used to count the number of occurrences of each KEGG Orthology (KO) identifier in the annotation file for each MAG. The script writes the counts per MAG per KO to a tabular output file.
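The counting step can be illustrated as follows. This is not the anno_table.py script from Appendix III but a hypothetical sketch. It assumes a simplified input of (MAG identifier, KO field) pairs in which the KO field is a comma-separated list such as "ko:K00001,ko:K00002" and "-" marks a gene without a KO assignment.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_kos(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each KO occurs among the annotated genes of each MAG."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for mag_id, ko_field in rows:
        for ko in ko_field.split(","):
            ko = ko.strip()
            if not ko or ko == "-":
                continue  # gene without KO annotation
            if ko.startswith("ko:"):
                ko = ko[3:]  # drop the assumed 'ko:' prefix
            counts[mag_id][ko] += 1
    return counts


# Made-up annotation rows, one per predicted gene
rows = [
    ("mag1", "ko:K00001,ko:K00002"),
    ("mag1", "ko:K00001"),
    ("mag1", "-"),
    ("mag2", "ko:K00002"),
]
ko_counts = count_kos(rows)
```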


3 Results

We collected 13,783 MAGs in total from seven different studies. The quality of the MAGs from North American lakes (FW) [18], Lake Baikal (FB) [19], Global Ocean T (MT) [2] and Global Ocean D (MD) [3] was assessed with CheckM, and MAGs were selected by the criteria of having ≥75% completeness and ≤5% contamination. The quality of the Swedish lakes MAGs (FS) (unpublished), the Baltic Sea MAGs (BB) [20] and the Caspian Sea MAGs (BC; [12] and unpublished) had already been estimated with CheckM and filtered using the same criteria. In the end, there were 11,509 MAGs of good quality, of which 11,276 were bacterial MAGs and 233 were archaeal MAGs. Of the bacterial MAGs, 7643 were from freshwater, 2268 from brackish water and 1365 from marine water (Table 3.0.1).

Table 3.0.1 Number of MAGs in each dataset before CheckM filtering (#Total) and after CheckM filtering (#Filtered Total, #Filtered Bacterial, #Filtered Archaeal).

Dataset                #Total   #Filtered Total   #Filtered Bacterial   #Filtered Archaeal
North American lakes      193                84                    84                    0
Swedish lakes            7654              7654                  7554                  100
Lake Baikal                35                 7                     5                    2
Baltic Sea               1989              1989                  1954                   35
Caspian Sea               324               324                   314                   10
Global Ocean T           2631               990                   932                   58
Global Ocean D            957               461                   433                   28

3.1 Bacterial diversity

The MAGs were taxonomically annotated using the Genome Taxonomy Database Toolkit (GTDB-tk). In order to get an overview of the taxonomic composition in the three different environments, we selected the eight most frequent bacterial classes among the 11,276 bacterial MAGs and compared their frequencies in the MAG sets of the different biomes (Figure 3.1.1). The most abundant class in the brackish datasets was c__Bacteroidia, which was also the second largest bacterial group in the freshwater and the third largest in the marine datasets. c__Bacteroidia belongs to the phylum Bacteroidetes, which has colonized all kinds of habitats on Earth, such as soil, ocean, freshwater and animal guts, and carries functions for the degradation of organic matter, such as phosphorylases [28]. In addition, Bacteroidia species have a good ability for Na+ translocation, due to the expression of NQR, Rnf and Oad genes [29]. Interestingly, sodium motive force genes, particularly NQR, are considered to be genes gained along the freshwater-marine genome transition [15].

Both c__Gammaproteobacteria and c__Alphaproteobacteria belong to the phylum Proteobacteria, and they were the most abundant bacterial classes in the freshwater and marine datasets, respectively. c__Alphaproteobacteria are good at amino acid transport and metabolism and c__Gammaproteobacteria are good at carboxylate degradation. The different distribution of these two classes can be explained by pH and nutrients. The relative abundance of the phylum Actinobacteria, including c__Actinobacteria and c__Acidimicrobiia, was highest in the brackish and lowest in the marine datasets. Actinobacteria have a high G+C content and are known in most soil, freshwater and marine habitats to decompose organic matter, supplying nutrients to other organisms [30].

c__Cyanobacteriia are bacteria that conduct oxygenic photosynthesis. c__Cyanobacteriia had a higher relative abundance in the brackish than in the freshwater and marine datasets. Cyanobacteria are considered to have originated on land or in freshwater and then colonized the marine environment [31], suggesting that brackish Cyanobacteria might have evolved from freshwater ancestors. c__Verrucomicrobiae was the third most abundant class in the freshwater datasets. It belongs to the phylum Verrucomicrobia, whose members encode many carbohydrate-metabolizing enzymes and can respond to phytoplankton blooms [32]. c__Planctomycetes have a special cell structure with cell compartments, and they were abundant in the brackish and less abundant in the freshwater datasets.


Figure 3.1.1 Distribution of the eight most dominant bacterial classes among all MAGs. The y-axis gives relative abundance (100%) for each class. Each bar represents a biome; F(freshwater), B(brackish) and M(marine), and each color a bacterial class.

When it came to the most dominant bacterial orders among the MAGs, the differences among the biomes were more obvious. The most abundant order in the brackish datasets was o__Flavobacteriales (19% relative abundance), which was also the most dominant in the marine datasets (14% relative abundance), while its relative abundance in freshwater was around 1%. In addition, although both o__Flavobacteriales and o__Bacteroidales belong to c__Bacteroidia, there were almost no o__Bacteroidales in the brackish and marine datasets but 3% in the freshwater datasets. o__Burkholderiales was the most abundant order in freshwater (18% relative abundance) and the second most abundant order in the brackish datasets (10% relative abundance), but made up less than 2% in marine. However, o__Pseudomonadales, which is in the same class as o__Burkholderiales, was the second most abundant order in marine. The most abundant order in c__Acidimicrobiia was o__Microtrichales in brackish and marine, but o__Acidimicrobiales in the freshwater datasets. The relative abundances of o__Chlorobiales and o__Methylococcales were around 5% in freshwater, but almost 0% in brackish and marine. Around 6% of the bacteria were o__Nanopelagicales in the freshwater and brackish datasets but just 0.5% in marine.

[Figure 3.1.1: bar chart titled "Bacteria Diversity". x-axis: Biome (F, B, M); y-axis: Relative abundance (%), 0-100. Legend: c__Acidimicrobiia, c__Actinobacteria, c__Alphaproteobacteria, c__Bacteroidia, c__Cyanobacteriia, c__Gammaproteobacteria, c__Planctomycetes, c__Verrucomicrobiae, others.]


Figure 3.1.2 Distribution of the twelve most dominant bacterial orders among all MAGs. The y-axis gives the relative abundance (%) of each order. The x-axis is divided into three parts, showing the relative abundance of orders in F (freshwater), B (brackish) and M (marine), respectively. Each bar represents one bacterial order.

3.2 ANI analysis

Since the MAGs were assembled separately from several different datasets, and in some cases from multiple samples within each dataset, some MAGs might represent the same species with almost identical DNA sequences. We ran FastANI to calculate the average nucleotide identity (ANI) among all MAGs, with the aim of comparing the genetic similarity of MAGs between and within different environments, and of clustering the MAGs into non-redundant species-level clusters.

3.2.1 Distribution of ANI value

The distribution of ANI values among all MAGs was similar to the distribution among freshwater MAGs (Figure 3.2.1), with a peak at 100% and a large number of MAG pairs with >96.5% ANI. That is because around 68% of the MAGs used in this project were from freshwater, which caused a high redundancy among the freshwater MAGs. The frequency of ANI values below 100% was lower in the marine dataset than in the others, possibly because the two marine datasets both consisted of non-redundant MAGs [18,19]. Interestingly, the ANI distribution among the brackish MAGs differed from the other datasets in that it peaked at 99.4% and not at 100% as the others did. This could be a consequence of the heterogeneous environmental conditions in the brackish samples, such as the salinity gradient of the Baltic Sea, favouring more genetic diversity (e.g. single nucleotide polymorphisms) among different MAGs of the same species.

[Figure 3.1.2: bar chart titled "Bacteria Diversity". x-axis: Biome (F, B, M); y-axis: Relative abundance (%), 0-100. Legend: c__Acidimicrobiia;o__Acidimicrobiales, c__Acidimicrobiia;o__Microtrichales, c__Actinobacteria;o__Actinomycetales, c__Actinobacteria;o__Nanopelagicales, c__Alphaproteobacteria;o__Rhodobacterales, c__Bacteroidia;o__Bacteroidales, c__Bacteroidia;o__Chitinophagales, c__Bacteroidia;o__Flavobacteriales, c__Chlorobia;o__Chlorobiales, c__Gammaproteobacteria;o__Methylococcales, c__Gammaproteobacteria;o__Pseudomonadales, c__Gammaproteobacteria;o__Burkholderiales, c__Planctomycetes;o__Pirellulales, c__Phycisphaerae;o__Phycisphaerales, c__Verrucomicrobiae;o__Pedosphaerales, c__Verrucomicrobiae;o__Opitutales, others.]

Figure 3.2.1 Distribution of pairwise inter-MAG ANI values between 90% and 100%. The minimum nucleotide identity among all genomes, among brackish genomes, among freshwater genomes and among marine genomes is around 96%. The peak of the distribution for brackish genomes is around 99.4%, while it is at 100% for freshwater and marine genomes.

3.2.2 Cluster generation by different ANI limit

In order to compare the genetic similarity among MAGs of the different biomes (brackish, freshwater and marine), we clustered the MAGs at a range of different cut-off levels and recorded the number of clusters containing MAGs from different biomes (Figure 3.2.2). As the maximum average within-cluster nucleotide distance increased (the average within-cluster ANI decreased), the number of clusters that consisted of a single biome decreased, and the number of clusters that consisted of mixtures of several biomes increased. The number of single-biome clusters almost stopped decreasing at a distance of 0.035 (96.5% ANI), in agreement with the ANI distributions (Figure 3.2.1). When the max distance was less than 0.02, there were 4 clusters containing both freshwater and brackish MAGs (line F+B), but no clusters for other mixtures of biomes (lines B+M, M+F, B+M+F). All four clusters are composed of c__Actinobacteria;o__Nanopelagicales. In addition, the number of clusters containing both freshwater and brackish MAGs was higher than the number of clusters containing both marine and brackish MAGs at all cut-off levels, although B+M increased sharply at distances >0.2 and almost reached F+B at a distance of 0.25. These results suggest that bacterial genomes are more similar between freshwater and brackish water than between marine and brackish water.

[Figure 3.2.1: four histograms of pairwise ANI. x-axis: ANI (90-100); y-axis: Frequency. Panel titles: "ANI value among Genome", "ANI value among Brackish Genome", "ANI value among Freshwater Genome", "ANI value among Marine Genome".]

Figure 3.2.2 Number of MAG clusters as a function of the nucleotide distance cut-off level. The x-axis gives the maximum within-cluster nucleotide distance (average-linkage clustering); the y-axis gives the number of clusters. F, B and M depict clusters consisting purely of freshwater, brackish and marine MAGs, while the others show clusters containing different mixtures of these types. The numbers of B+M, F+B, M+F and F+B+M clusters are indicated by the right y-axis, the others by the left y-axis.

[Figure 3.2.2: line plot. x-axis: Max Distance (0.00-0.25); y-axis: Number (left axis 0-8000, right axis 0-80). Colour legend: B, B+M, F, F+B, F+B+M, M, M+F.]

3.2.3 Cluster at 96.5% ANI value

A cut-off at 96.5% ANI (0.035 height) was applied to group all MAGs into 3639 distinct clusters, with 516 brackish clusters, 2093 freshwater clusters, 1019 marine clusters and 11 mixture clusters. The detailed cluster distribution among the different datasets is shown in Table 3.2.1. Geographically, the Baltic Sea is much closer to the Swedish lakes than to the Caspian Sea. Also, the Swedish lakes dataset (7554 MAGs) is much larger than the Caspian Sea dataset (314 MAGs). Still, the number of clusters containing both BB (Brackish Baltic) and BC (Brackish Caspian) was larger than the number of clusters containing BB and FS (Freshwater Sweden). This indicates that environmental factors are much more important to bacterial biodiversity than geographic distance. However, the influence of geographic distance cannot be ignored, because there were 7 BB+FS clusters and no BC+FS clusters, although there were only twice as many BB as BC clusters. Four of these seven BB+FS clusters belong to c__Actinobacteria;o__Nanopelagicales; the other three clusters belong to c__Alphaproteobacteria;o__Sphingomonadales, c__Verrucomicrobiae;o__Chthoniobacterales and c__Bacteroidia;o__NS11-12g.

Table 3.2.1 Number of clusters consisting of individual datasets and mixtures of datasets when clustering MAGs at 96.5% ANI. BB: Brackish Baltic, BC: Brackish Caspian, FB: Freshwater Baikal, FW: Freshwater Wisconsin (North American), FS: Freshwater Sweden, MT: Marine Tully, MD: Marine Delmont.

Dataset   Cluster number   Mixture of datasets   Cluster number
BB        329              BB+BC                 40
BC        147              BB+FS                 7
FB        5                BB+MT                 1
FS        2004             BC+MT                 1
FW        84               MT+MD                 184
MT        603              BB+BC+MT              1
MD        232              MT+MD+FS              1
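The rows of Table 3.2.1 can be cross-checked against the totals quoted in the text. The short Python sketch below is only an arithmetic check: it assumes that the first letter of each dataset code gives its biome (B, F or M), and all variable names are illustrative. Note that the MT and MD rows alone sum to 835, whereas also counting the 184 MT+MD clusters as marine gives 1019, which is the value needed for all rows to add up to 3639.

```python
from collections import defaultdict

# Cluster counts copied from Table 3.2.1 (dataset combination -> clusters).
TABLE = {
    "BB": 329, "BC": 147, "FB": 5, "FS": 2004, "FW": 84, "MT": 603, "MD": 232,
    "BB+BC": 40, "BB+FS": 7, "BB+MT": 1, "BC+MT": 1,
    "MT+MD": 184, "BB+BC+MT": 1, "MT+MD+FS": 1,
}

def biomes_of(combination):
    """Assumption: the first letter of each dataset code is its biome."""
    return frozenset(code[0] for code in combination.split("+"))

clusters_by_biomes = defaultdict(int)  # clusters per biome combination
representatives = defaultdict(int)     # one MAG per biome per cluster
for combination, n_clusters in TABLE.items():
    biomes = biomes_of(combination)
    clusters_by_biomes[biomes] += n_clusters
    for biome in biomes:
        representatives[biome] += n_clusters

total = sum(TABLE.values())
single = {b: clusters_by_biomes[frozenset(b)] for b in "BFM"}
mixed = total - sum(single.values())

print(total, single, mixed)
print(dict(representatives))
```

Keeping one representative per biome per cluster gives 526 brackish, 1023 marine and 2101 freshwater MAGs, which matches the pruned tree row of Table 3.3.1.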


3.2.4 Relationship of ANI value and salinity difference in Baltic Sea clusters

MAGs from the Baltic Sea had been assembled from 123 water samples along the horizontal salinity gradient. To investigate whether the larger diversity of brackish MAGs observed above may reflect genetic diversification into different niches, we examined how pairwise ANI values were related to the salinity difference between the samples from which the MAGs were assembled. For each cluster that included more than two Baltic Sea MAGs and at least one pair of MAGs with a salinity difference of >10 PSU, we plotted ANI against the absolute salinity difference (Figure 3.2.3).

There were a total of 40 eligible clusters, of which 34 had a negative Spearman correlation between ANI and salinity difference for pairs of MAGs. Moreover, after false discovery rate adjustment, 16 of the clusters had a significant negative correlation (p value <0.05), while only 1 had a significant positive correlation. This indicates that the lower genetic similarity of brackish MAGs could indeed be related to niche differentiation.
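The per-cluster statistics described above can be sketched as follows. This is a minimal pure-Python illustration, not the analysis code of this project: the function names are hypothetical, the toy values are invented, and the Benjamini-Hochberg procedure is an assumption, since the text only states that a false discovery rate adjustment was applied.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy cluster: ANI drops as the salinity difference between samples grows.
salinity_diff = [0.0, 5.0, 10.0, 20.0]
ani = [100.0, 99.0, 98.0, 95.0]
rho = spearman_rho(salinity_diff, ani)

# Invented raw p values for four clusters, adjusted across clusters.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In the analysis described here, one rho and one p value would be computed per eligible cluster, and the adjustment would be applied across all 40 clusters.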

[Figure 3.2.3: 40 scatter-plot panels, one per cluster; x-axis: absolute salinity difference (0 to 30); y-axis: ANI (94 to 100). Spearman rho per panel, in panel order: −0.49, −0.41, −0.42, −0.1, −0.09, −0.15, −0.18, −0.15, −0.12, −0.19, −0.42, 0.04, 0.56, −0.03, −0.32, 0.51, 0.21, −0.3, 0.07, −0.23, −0.52, −0.25, −0.66, −0.27, −0.1, −0.27, −0.42, 0.4, −0.05, −0.34, −0.06, −0.04, −0.22, −0.77, −0.14, −0.16, −0.38, −0.37, −0.34, −0.25. Color legend: c__Acidimicrobiia, c__Actinobacteria, c__Alphaproteobacteria, c__Bacteroidia, c__Cyanobacteriia, c__Gammaproteobacteria, c__Planctomycetes, c__Verrucomicrobiae, others]

Figure 3.2.3 ANI vs. salinity difference for pairs of MAGs within MAG clusters from the Baltic Sea. Each box is one cluster; only clusters that have more than two Baltic Sea MAGs and at least one pair of MAGs with an absolute salinity difference of more than 10 were included. The number above each plot is the Spearman correlation coefficient (rho); 34 of the 40 clusters have a negative Spearman correlation coefficient. The different colors represent different classes, the same as in Figure 3.1.1.

3.3 Phylogenetic analyses

3.3.1 Phylogenetic tree and ancestral state reconstruction

We constructed a phylogenetic tree of the 11,276 bacterial MAGs from a set of conserved housekeeping genes using GTDB-Tk. This tree was pruned by keeping only one random MAG per biome per cluster, using the MAG clustering at 96.5% ANI described earlier. We then constructed a subsampled tree containing the same number of freshwater and marine MAGs, to reduce biases caused by dataset size, by randomly removing freshwater MAGs from the pruned tree until the numbers of freshwater and marine MAGs were equal (Table 3.3.1).

Table 3.3.1 Number of MAGs in pruned tree and subsampled tree

                  # Total MAGs   # Brackish   # Marine   # Freshwater
All datasets      11276          2268         1365       7643
Pruned tree       3650           526          1023       2101
Subsampled tree   2572           526          1023       1023
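The pruning and subsampling steps can be illustrated with a small Python sketch that operates on lists of MAGs rather than on an actual tree. The function names, the tuple layout and the toy data are hypothetical, and the fixed random seed is used only to make the example reproducible.

```python
import random

def prune_one_per_biome_per_cluster(mags, rng):
    """Keep one random MAG for every (cluster, biome) combination.

    mags: list of (mag_id, cluster_id, biome) tuples (hypothetical layout).
    Returns a dict mapping (cluster_id, biome) -> chosen mag_id.
    """
    groups = {}
    for mag_id, cluster_id, biome in mags:
        groups.setdefault((cluster_id, biome), []).append(mag_id)
    return {key: rng.choice(members) for key, members in sorted(groups.items())}

def subsample_freshwater(kept, rng):
    """Randomly drop freshwater MAGs until they equal the marine count."""
    fresh = [key for key in kept if key[1] == "F"]
    n_marine = sum(1 for key in kept if key[1] == "M")
    n_drop = max(0, len(fresh) - n_marine)
    dropped = set(rng.sample(fresh, n_drop))
    return {key: mag for key, mag in kept.items() if key not in dropped}

toy_mags = [
    ("m1", "c1", "F"), ("m2", "c1", "F"), ("m3", "c1", "B"),
    ("m4", "c2", "F"), ("m5", "c3", "F"),
    ("m6", "c4", "M"), ("m7", "c4", "M"), ("m8", "c5", "B"),
]
rng = random.Random(0)
pruned = prune_one_per_biome_per_cluster(toy_mags, rng)
subsampled = subsample_freshwater(pruned, rng)
```

In the toy data, cluster c1 contributes two tips to the pruned set (one freshwater and one brackish), which is why the pruned tree in Table 3.3.1 has more tips than there are clusters.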

Ancestral state reconstruction was performed on the phylogenetic trees to infer the biome of origin of the brackish genomes, using maximum likelihood estimation as implemented in the ace function of the R package ape (Figure 3.3.1 and Appendix I).

[Figure 3.3.1: two tree panels, (a) and (b); legend: Biome (B, F, M); Taxon (c__Acidimicrobiia, c__Actinobacteria, c__Alphaproteobacteria, c__Bacteroidia, c__Cyanobacteriia, c__Gammaproteobacteria, c__Planctomycetes, c__Verrucomicrobiae, others)]

Figure 3.3.1 Phylogenetic tree of the aquatic MAGs. (a) Pruned tree; (b) subsampled tree. The fill colors of the circles at the tips indicate the biome state of the genomes, and the branch colors indicate the taxonomy of the MAGs. The pie charts represent the ancestral biome state likelihoods for nodes: red is brackish, green is freshwater and blue is marine. Pie charts are added to nodes that have a likelihood of >0.5 for one biome state and where this state differs from the most likely biome state of the node's ancestor or of one of its descendants.

3.3.2 Biome origin of brackish MAGs

To learn more about the biome of origin of the brackish MAGs, we analyzed each MAG's closest non-brackish ancestor. In this project, this was defined as the closest ancestor with a likelihood of >0.5 of being either freshwater or marine. Figure 3.3.2 provides an overview of the biome state of the closest non-brackish ancestor for the brackish MAGs.
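The definition above amounts to walking from a brackish tip towards the root until a node is found whose reconstructed likelihood exceeds 0.5 for freshwater or marine. A minimal Python sketch under that reading is given below; the parent map, the likelihood dictionaries and the node names are hypothetical stand-ins for the tree and for the per-node likelihoods produced by the ancestral state reconstruction.

```python
def closest_non_brackish_ancestor(tip, parent, likelihoods, threshold=0.5):
    """Return (node, biome) for the nearest ancestor that is likely F or M.

    parent:      dict mapping each node to its parent (the root has no entry)
    likelihoods: dict mapping each internal node to {"B": p, "F": p, "M": p}
    Returns (None, None) if no ancestor passes the threshold.
    """
    node = parent.get(tip)
    while node is not None:
        state = likelihoods[node]
        for biome in ("F", "M"):
            if state[biome] > threshold:
                return node, biome
        node = parent.get(node)
    return None, None

# Toy lineage: tip -> n2 (mostly brackish) -> n1 (mostly freshwater) -> root.
toy_parent = {"tip": "n2", "n2": "n1", "n1": "root"}
toy_likelihoods = {
    "n2": {"B": 0.8, "F": 0.15, "M": 0.05},
    "n1": {"B": 0.2, "F": 0.7, "M": 0.1},
    "root": {"B": 0.1, "F": 0.3, "M": 0.6},
}
ancestor, biome = closest_non_brackish_ancestor("tip", toy_parent, toy_likelihoods)
```

Because the three likelihoods at a node sum to 1, at most one biome can exceed the 0.5 threshold, so the order in which freshwater and marine are tested does not affect the result.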

[Figure 3.3.2: two bar-chart panels, (a) and (b); x-axis: brackish MAGs; y-axis: likelihood (0.0 to 1.0)]

Figure 3.3.2 The closest non-brackish ancestor biome state for brackish MAGs. (a) Pruned tree; (b) subsampled tree. The y-axis is the likelihood. Each bar represents the biome likelihoods of the closest non-brackish ancestor of one brackish MAG. Red gives the likelihood of a brackish ancestor, green of a freshwater ancestor and blue of a marine ancestor.

Table 3.3.2 shows the distribution of the closest non-brackish ancestors for brackish MAGs belonging to the most abundant bacterial classes. Since several brackish MAGs may share the same non-brackish ancestor, the number of unique closest non-brackish ancestors in each class is also given. Taking the results from the pruned and subsampled trees together, it appears that Acidimicrobiia, Actinobacteria and Cyanobacteriia from brackish water have to a large extent evolved from freshwater bacteria, while brackish Alphaproteobacteria and Bacteroidia mainly originate from marine bacteria.

