
The metagenomes of root nodules in actinorhizal plants: A bioinformatic study of endophytic bacterial communities


Academic year: 2022


Bachelor thesis, 15 ECTS

The metagenomes of root nodules in actinorhizal plants

A bioinformatic study of endophytic bacterial communities

Ellen Fasth


Abstract

Actinorhizal plants live in symbiosis with the nitrogen-fixing soil bacterium Frankia, which forms nodules in the plant root. However, several studies also report other endophytic bacteria in the nodules, whose function and interaction with the host plant or Frankia are not yet understood. This thesis used a bioinformatic approach to investigate the metagenomes of eighteen actinorhizal nodule samples to find out which bacteria are present, how the microbiomes differ from each other, and whether the genomes of non-Frankia inhabitants could give indications of any functions. The results showed that the bacterial composition, richness, and diversity differed among the samples, especially between the samples sequenced from the field and those cultivated in a greenhouse. All samples had a substantial number of sequencing reads belonging to non-Frankia endophytes, such as strains of Enterobacteria, Pseudomonas, Streptomyces, Micromonospora, Mycobacteria and Pseudonocardia. There seemed to be a common microbial community shared among the plants at the family level, since no significant difference was found in the core microbiomes between the field and greenhouse groups. Some sequences found in the metagenomes were annotated with potential functions of the fellow travellers, such as antibiotic synthesis and proteins involved in regulating abiotic stresses, but also probable plant-damaging compounds associated with pathogens rather than symbionts.

Key words: Actinorhizal plants, Frankia, endosymbiont, root nodules, metagenomes, bioinformatics

Preface

I would like to thank Katharina Pawlowski and Fede Berckx from the Department of Ecology, Environment and Plant Sciences at Stockholm University, for letting me in on this project and assisting me through it.


Table of Contents

Abstract
Preface
1 Introduction and background
1.1 The need for nitrogen
1.2 Actinorhizal plants
1.3 The Frankia genus
1.4 Undetermined symbioses – the fellow travellers
1.5 The metagenomic workflow
1.6 Challenges with metagenomic data
1.7 Purpose and thesis questions
2 Material and method
2.1 Field sampling, propagation, and sequencing
2.2 MGX taxonomic analysis
2.3 Data normalization
2.4 Functional genomic analysis
2.5 Statistical analysis
3 Results
3.1 Rarefaction result
3.2 Bacteria composition and diversity
3.3 Functional annotation
4 Discussion
4.1 Potential functions of the fellow travellers
4.2 Conclusion
5 References
6 Appendix
Appendix 1. Biodiversity indices
Appendix 2. Top ten families per sample
Appendix 3. Core microbiome reads


1 Introduction and background

All plant species are associated with a microbial community, and the symbiotic interactions between them have received increasing attention during recent decades (Abdelfattah et al. 2021).

The microbiome is known to have advantageous effects on the health of its host plant, contributing for example to increased growth or better stress tolerance, while receiving a carbohydrate-rich refuge in return (Franco et al. 2007; Trivedi et al. 2020). The improved understanding of microbes is largely due to modern DNA sequencing techniques, which have enabled the identification of non-cultivatable bacteria, accounting for around 99% of the total, by sequencing whole metagenomes taken directly from their natural environment (Solden, Lloyd, and Wrighton 2016). This has allowed the formerly hidden micro-world of many different niches, such as the concealed symbionts occupying plants, to be unravelled and plant-microbe interactions to be seen in a new light.

The microbiome’s impact on its host organism is of growing interest, and understanding its influence on plants can have many vital applications, for example in agriculture. The current conventional agricultural system is fully dependent on industrial fertilizers to provide crops with sufficient nutrients, such as phosphorus and nitrogen, to the extent that about half of the human population today could not exist without them (Hoffman et al. 2014). Indeed, nitrogen is the main element that limits plant growth. Despite being abundant in the atmosphere, it exists in a form that is inaccessible to plants, a problem that some plant species solve by hosting nitrogen-fixing prokaryotes that provide it for them. Nitrogen-fixing symbiosis with endophytic bacteria, such as the one between legumes (Fabaceae) and rhizobia, is well known, but less is known about the one between actinorhizal plants and the soil bacterium Frankia. Just as in the legume-rhizobia interaction, the host plant forms root nodules that house Frankia intracellularly, where nitrogen fixation takes place to convert the inert N2 to the much-needed NH3. However, growing evidence indicates that Frankia is not alone. Metagenomic and isolation studies repeatedly detect other endophytic bacteria inhabiting the nodule, including other actinobacteria such as Streptomyces, Micromonospora and Nocardia, whose function is not entirely understood (Qin et al. 2009; Trujillo et al. 2015). Since several of these bacteria lack the mechanisms needed for nitrogen fixation, the question arises whether these fellow travellers have any other beneficial functions for the plant hosting them.

Some argue that the plant and its microbiome should be seen as one entity, a holobiont, given the large impact different microorganisms have on plant growth and fitness (Guerrero, Margulis, and Berlanga 2013). It could therefore prove fruitful to investigate not only a single species, but the whole ecosystem found in the nodule environment. An increased understanding of important new plant symbionts has many applications in both biotechnology and agriculture, for example in improving nutrient acquisition, since the intense use of fertilizers comes with serious ecological consequences such as eutrophication (Canfield, Glazer, and Falkowski 2010). Facing an increasing world population in combination with climate change, alternative, sustainable solutions need to be examined in order to guarantee global food security. Further exploration of whole metagenomes derived from plants can give new insights into the relations of endophytic bacteria with each other and into the genes or metabolites that influence plant fitness.


1.1 The need for nitrogen

All living organisms are dependent on biomolecules containing nitrogen, which is abundant in gaseous form (N2) in the atmosphere. However, N2 is inert and needs to be converted to other forms such as ammonia (NH3) or nitrate (NO3-) before living organisms can utilize it (Hoffman et al. 2014). This inaccessibility makes nitrogen one of the main factors limiting agricultural production worldwide. Until the 1950s and the spread of the Haber-Bosch process, which provided industrially produced nitrogen for fertilization, crops depended on nitrogen supplied by natural processes (Hoffman et al. 2014). This supply comes through geochemical routes such as bedrock weathering, from decomposing organic material, or from biological nitrogen fixation, in which prokaryotes use the nitrogenase enzyme complex to fix atmospheric nitrogen into accessible forms (Dynarski et al. 2019; Nguyen 2017). The bacteria and archaea that can utilize and survive on atmospheric nitrogen are called diazotrophs (Dixon and Kahn 2004). The nitrogenase enzyme complex in diazotrophs is a two-component system comprising an Fe protein and a MoFe protein, which associate and dissociate during the catalytic cycle. The cycle includes electron transfers and hydrolyses ATP for energy supply, in the reaction written as Eq. 1 (Hoffman et al. 2014). The enzyme complex is very sensitive to oxygen and is denatured upon exposure (Nguyen 2017).

(Eq. 1) $\mathrm{N_2} + 8e^- + 16\,\mathrm{ATP} + 8\,\mathrm{H^+} \rightarrow 2\,\mathrm{NH_3} + \mathrm{H_2} + 16\,\mathrm{ADP} + 16\,\mathrm{P_i}$

1.2 Actinorhizal plants

While nitrogen is usually the most growth-limiting element for plants, some species have solved this issue with the support of microorganisms. Mutualistic plant-microbe relationships go far back in time: already in the early evolution of land plants, key genes acquired from soil bacteria through horizontal gene transfer increased both abiotic and biotic resistance in plants transitioning to terrestrial areas (Cheng et al. 2019). Because plants perform photosynthesis, they provide rich carbon sources for microbial communities, which in return can offer desired nutrients, as the nitrogen-fixing bacteria in the root nodules of some plant species do (Levy et al. 2018). The nitrogen-fixing symbiosis between legumes and the Gram-negative soil bacteria known as rhizobia (such as Rhizobium, Mesorhizobium and Ensifer) is well recognised. However, less attention has been given to the symbiosis between actinorhizal plants and the Gram-positive soil bacterium Frankia, which was referred to simply as “non-legume” up until 1978 (Huss-Danell 1997; Pawlowski and Demchenko 2012). Both nodule symbioses evolved from a single common ancestor, followed by multiple independent losses among the orders, and both are included in the monophyletic Rosid I clade (Griesmann et al. 2018; Pawlowski and Demchenko 2012).

Actinorhizal plants are perennial dicotyledons and are, with the exception of Datisca sp., trees or shrubs. They are found worldwide, but are less abundant in the tropics, and are often pioneers in colonizing empty soils (Huss-Danell 1997). The orders that harbour actinorhizal plants are Rosales (Elaeagnaceae, Rhamnaceae, Rosaceae), Fagales (Betulaceae, Casuarinaceae, Myricaceae) and Cucurbitales (Datiscaceae, Coriariaceae). Surprisingly, the ability to form a Frankia symbiosis is not found in all genera within a family (Huss-Danell 1997; Sellstedt and Richau 2013). The closely related alder and birch trees (Betulaceae) are one example, where only Alnus is actinorhizal. The yellow mountain avens Dryas drummondii is the sole actinorhizal species in its genus (Dryas, Rosaceae), while the members of the small families Datiscaceae and Coriariaceae (2 species and 17 species respectively) are all described as symbiotic (Huss-Danell 1997).

In the rhizobium-legume symbiosis, flavonoids, which are plant secondary metabolites, are used to establish the symbiosis by both attracting the bacteria and activating the canonical nod genes (Sellstedt and Richau 2013). The excreted flavonoids activate the expression of the bacterial nodulation genes (nodABC), which encode three specific enzymes: NodA (acyl transferase), NodB (chitin deacetylase) and NodC (chitin synthase). These are required for the synthesis of Nod factors, the lipo-chito-oligosaccharide (LCO) signalling molecules, which are produced when the nod genes are induced. The LCO compounds trigger a signalling pathway when plant kinases belonging to the LysM-RLK family recognise them. This signalling pathway is called the common symbiosis pathway, since it is also operative in the evolutionarily older arbuscular mycorrhizal symbioses. The common symbiosis pathway is also the mechanism in actinorhizal symbioses; however, no signalling molecules similar to the LCOs have yet been identified (Persson et al. 2015). In addition, most Frankia genomes do not contain nod genes. The only exceptions are the uncultured cluster-2 strains found in Datisca glomerata and one cluster-3 strain (Frankia sp. NRRL B-16219), which all contain the canonical nodABC genes (Ktari et al. 2017; Persson et al. 2015). Despite the absence of nod gene clusters in most Frankia genomes, flavonoid signalling appears to be similar, since eight genes related to the flavonoid biosynthetic pathway have been discovered and their expression has been observed to increase after the plant was inoculated with Frankia, linking the pathway to the symbiosis process (Auguy et al. 2011). The chemical compounds that the infecting Frankia releases in response to the plant’s flavonoids, and which trigger the common symbiosis pathway, are to date unknown. After the common symbiosis pathway is induced, the growth hormone auxin accumulates to stimulate nodule formation. Other factors involved are lectin proteins, produced either by the host plant or the symbiont. They are believed to assist in the binding of the endosymbiont to plant structures such as root hairs, and genes with lectin-rich domains have been found in, for instance, Frankia alni (Sellstedt and Richau 2013).

The infection mechanism of Frankia can be either intracellular or intercellular. Intracellular infection occurs via infection threads in curled root hairs of the plant, which Frankia hyphae enter by penetrating the cell wall. The infecting hyphae then grow into the nodule primordium cells. In legumes, as mentioned, the deformation of root hairs has been explained as a response to the LCO Nod factors that bind to receptors in the plant, which also explains the host specificity. However, the mechanism is not yet fully understood in actinorhizal symbioses, in which non-symbiotic bacteria can also induce root hair curling. In the intercellular pathway, infection occurs between the root epidermal cells into the root cortex. The infection induces the root pericycle to generate a nodule primordium, which the hyphae enter (Pawlowski and Demchenko 2012).

1.3 The Frankia genus

The nitrogen-fixing symbionts of actinorhizal plants are soil bacteria of the genus Frankia, belonging to the family Frankiaceae in the order Actinomycetales (Normand et al. 1996). Frankia is found in several species of woody angiosperms; however, it is not an obligate symbiont and can live freely in the soil (Huss-Danell 1997). Frankia was first discovered in the roots of alder trees in 1886 and was named after the biologist A. B. Frank, who also coined the word “symbiosis” (Pawlowski and Nguyen 2017). Today, Frankia is known to induce nodules in eight families from the orders Fagales, Cucurbitales and Rosales, which account for around 220 plant species (Persson et al. 2011). It is hypothesized that the bacteria switched from a saprophytic lifestyle to a symbiotic one, and at the same time experienced a decrease in diversity (Sellstedt and Richau 2013). The members of Frankia are phylogenetically divided into four clusters, and each cluster is able to engage in symbiosis with different groups of plants (Pawlowski and Demchenko 2012). Cluster 1 contains strains that can nodulate plants in the order Fagales, which includes the actinorhizal families Betulaceae, Casuarinaceae (exception: Gymnostoma) and Myricaceae. Cluster 2 has the broadest host range: two families from the order Rosales, Rosaceae and Rhamnaceae (only the genus Ceanothus), and two families from the order Cucurbitales, Datiscaceae and Coriariaceae. This cluster also stands out since, except for two cases, it cannot be successfully cultured. Strains in cluster 3 can nodulate the families Elaeagnaceae and Rhamnaceae (exception: Ceanothus) from Rosales, and Gymnostoma and Morella from the order Fagales. Lastly, cluster 4 comprises Frankia strains that are found in actinorhizal plants but lack the ability to independently fix nitrogen or form nodules (Pawlowski and Nguyen 2017).

The genome sizes among the Frankia clusters range from 5 to 10.45 Mbp. The genus is mesophilic and slow-growing, with growth taking up to 60 days. Its cell wall is of type III, which contains meso-diaminopimelic acid (Normand et al. 1996). Frankia can grow in three different cell forms: branched hyphae of around 0.5-2 µm, multilocular sporangia, and vesicles (Pawlowski and Demchenko 2012). The vesicle, also called a diazo vesicle, is a highly specialized cell type that Frankia produces at the ends of hyphae or branches. It is a spherical, lipid-enveloped structure that protects the nitrogenase enzyme complex from oxygen exposure (Nguyen 2017; Normand et al. 1996). While nitrogen fixation in legumes depends on the host plant providing an anaerobic environment to shelter the nitrogenase, in actinorhizal symbioses both the plant and Frankia itself can protect the complex, through nodule and vesicle formation respectively (Nguyen 2017). This enables Frankia to fix nitrogen also in a free-living state (Sellstedt and Richau 2013). The vesicle morphology varies and depends on the host plant; in Datiscaceae and Coriariaceae the vesicles form filamentous structures. In some genera, such as Casuarina spp., no vesicles are formed at all (Huss-Danell 1997).

1.4 Undetermined symbioses – the fellow travellers

Although Frankia is the established symbiont of actinorhizal plants, several studies have revealed other bacterial taxa occurring in the nodule environment. Examples are Pseudonocardia, Mycobacterium, Microbacterium, Micromonospora, Streptomyces, Streptosporangium, Thermomonosporaceae and Nocardiaceae (Carro et al. 2013; Franco et al. 2007; Ghodhbane-Gtari et al. 2019; Qin et al. 2009; Trujillo et al. 2015; Valdés et al. 2005). Some of these (e.g., species of Streptomyces, Mycobacterium and Micromonospora) have nitrogen-fixing capacity, while others lack the nitrogenase enzyme complex and hence are unable to form a nitrogen-fixing symbiotic relationship like that of the genus Frankia (Sellstedt and Richau 2013). Yet their recurrent presence suggests that they are established microsymbionts and could have other functions in the nodule. These bacteria will be referred to as “fellow travellers” throughout the thesis. Data are scarce and the understanding of these fellow travellers is still limited. Actinobacteria overall are well known for producing a wide array of secondary metabolites involved in nutrient acquisition and in direct defence against pathogens via antibiosis (Franco et al. 2007). Since increased growth and performance have been observed in plants inoculated with these endophytes, they are thought to have important roles for the host plant that are yet to be determined (Franco et al. 2007; Trujillo et al. 2015).

Some functions of the non-Frankia partners have been studied and proposed. For example, some isolated Frankia strains are in later trials unable to re-infect the host they were isolated from but can, conversely, infect a non-host plant. Frankia irregularis, for instance, was isolated from Casuarina glauca (a cluster-1 host plant) but was able to re-infect members of the Rhamnales (cluster-3 hosts) (Nouioui et al. 2018). Ghodhbane-Gtari et al. hypothesize that “fellow travellers” such as Nocardiaceae enable cluster-3 Frankia to successfully nodulate incompatible host plants. Nocardiaceae are also believed to have other assisting functions in actinorhizal symbiosis, such as producing secondary metabolites and phytohormones like auxin, and inducing nodulation in Casuarina glauca. They have been suggested to be plant-growth-promoting bacteria, since they increase biomass as well as root and shoot lengths (Ghodhbane-Gtari et al. 2019). Increased plant growth and uptake of nutrients such as iron have also been reported for Streptomyces, Micromonospora and Pseudonocardia (Trujillo et al. 2015).

Other studies have demonstrated effects on the plant immune system; for example, inoculation with two Streptomyces spp. strains induced systemic acquired resistance (SAR) and the jasmonate-ethylene pathway against infection by a soilborne fungus (Franco et al. 2007). Streptomyces is also known for generating a wide array of antibiotics and other secondary metabolites, which is likewise reported for other fellow travellers such as Micromonospora, Pseudonocardia and Streptosporangium (Alexander et al. 2003; Qin et al. 2009). The actinorhizal nodule is thus a microbiome with many putative inhabitants that have a broad spectrum of potential functions. Since many of the bacteria found are difficult to cultivate in the lab, metagenomics provides a valuable tool for finding out who is there and what they are doing.

1.5 The metagenomic workflow

Metagenomics can be defined as the study of the multiple genomes found in a sample derived from the environment (Thomas, Gilbert, and Meyer 2012). The rapid development of faster and cheaper sequencing techniques has enabled high-throughput sequencing (HTS), which has allowed researchers to study the metagenomes of non-cultivatable microbial communities, accounting for around 99% of all microorganisms (Kristiansson, Hugenholtz, and Dalevi 2009). A metagenomic sample is thus taken directly from the environment and consists of all genomes of the microorganisms present at the sampled location, giving a unique insight into the community composition in its natural habitat. Compared to regular genomics, which usually focuses on a single organism’s DNA and function, the metagenomic approach deals with numerous genomes from a wide array of different domains or other taxonomic groups. In other words, instead of investigating a “bag-of-genes”, metagenomic studies deal with a “bag-of-genomes” (Frioux et al. 2020).

Following sample collection and DNA sequencing, a metagenomic workflow typically consists of genome assembly, mapping, annotation and binning using multiple different software tools, before the data is further analysed and presented (Thomas et al. 2012). Depending on the sequencing technique, the resulting data consists of a very large number of short reads that need to be reassembled into longer genomic sequences. The first step is usually an assembly, which can be done de novo, using a reference genome, or with a combination of both. The assembly merges overlapping sequences and produces contigs or scaffolds as output, that is, short fragments joined into longer, continuous sequences. These can afterwards be compared to a reference genome to improve the ordering of the contigs and lower the error rate, which is called mapping (Edwards and Holt 2009). To investigate the taxonomy of the sample, the contigs or the raw reads are sorted into different bins based on their relatedness to certain taxa, a step referred to as binning. With the help of different algorithms, the sequences are either analysed by their conserved nucleotide composition or compared by similarity to already identified genes, where NCBI and its BLAST function are commonly used as a standard reference (Altschul et al. 1990; Raes, Foerstner, and Bork 2007).
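The idea of merging overlapping reads into contigs can be illustrated with a minimal sketch. It assumes error-free toy reads and a simple greedy strategy; real assemblers use far more elaborate graph-based methods, and the function names `overlap` and `greedy_assemble` are hypothetical and do not come from any tool used in this thesis.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads (assumed data) that tile one short sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Reads without a sufficient overlap are left as separate contigs, which mirrors why an assembly of a mixed community yields many fragments rather than one genome.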

Similarity-based algorithms, which compare the reads to already known genes in a database, are also frequently used in functional annotation (Meyer et al. 2008). Functional annotation of a genome is the process of placing the DNA sequences found in the samples in a biological context by identifying the genome’s functional properties (Abril and Castellano 2018). This process faces several challenges, and it is estimated that roughly 20-50% of metagenomic sequences cannot be annotated (Thomas et al. 2012). Since annotation depends on mapping the sequences to databases housing already recognized genes, the identification is completely reliant on the contents of the specific database. Consequently, organisms of greater economic interest (e.g., microorganisms causing human diseases) may be more studied and constitute a larger share of the genomic references in databases. This should be considered when annotating non-cultivatable, environmental samples that have a high probability of containing novel genes or microbial strains.

1.6 Challenges with metagenomic data

The development and improvement of high-throughput sequencing techniques have provided insight into many previously unknown microbial ecosystems. However, the interpretation of the huge amount of data obtained from these samples remains challenging, since several sources of systematic variability are present. First, the data is highly dependent on the sequencing technique used and on how the samples are handled, both of which affect its quality. Second, metagenomic samples can differ considerably in GC content and in genome size among species and, most importantly, in sequencing depth (number of reads) (Pereira et al. 2018; Weiss et al. 2017). Differences in sequencing depth, also referred to as library size, can be compared to differences in sampling effort in traditional ecological studies, in which the sample size or effort affects the observed richness (Chao et al. 2014). In other words, this can lead to misinterpretation of the results, for instance by assuming higher alpha or beta diversity in samples with more reads (i.e., more available data, which have a higher probability of containing more taxa). Unequal library sizes have also been shown to yield high type I error rates (Pereira et al. 2018; Stier, Bolker, and Osenberg 2016; Weiss et al. 2017; Willis 2019). Consequently, normalization of data with various library sizes is often necessary to be able to draw meaningful conclusions about the true biological diversity between them.

One of the standard normalization approaches for comparing richness in ecology and microbial ecology is rarefaction, a method first developed by H. L. Sanders (Chao et al. 2014; Hsieh and Chao 2017; Saary et al. 2017; Sanders 1968; Weiss et al. 2017). The rarefying method normalizes the different library sizes by randomly drawing reads without replacement into new sub-samples until all samples reach the same size, using the sample with the smallest library size as the threshold (Weiss et al. 2017; Willis 2019). Thus, rarefaction adjusts the samples to a uniform depth by simulating equal sampling effort in the new subsampled datasets (Saary et al. 2017). However, despite being commonly used, the use of rarefaction on microbiome data is debated. Some claim rarefying to be an unsuitable and statistically inefficient approach for investigating species abundance, mainly pointing to the method’s discarding of available data, which results in underestimating the true variance (McMurdie and Holmes 2014; Willis 2019). Statistically, using only a fraction of the available data leads to both a loss of power and increased uncertainty, including an “artificial uncertainty” introduced by the random drawing step performed in rarefaction (McMurdie and Holmes 2014). Nonetheless, when rarefying is compared with other available methods for normalizing library sizes, such as logUQ, CSS, DESeqVS and edgeR-TMM, its accuracy is equal to or above that of the other approaches. In addition, rarefaction was shown to have among the lowest false discovery rates (type I error) of the compared methods. Hence, it is still considered an effective normalization technique in the absence of better alternatives (Weiss et al. 2017).
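The rarefying step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used in this thesis: the per-family counts are invented toy data, and the function names `rarefy` and `rarefy_all` are hypothetical.

```python
import random


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Draw `depth` reads without replacement from one sample's per-taxon counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the library size of the sample")
    rng = random.Random(seed)  # fixed seed makes the draw repeatable
    # One list entry per read, labelled by taxon. Simple, but memory-hungry
    # for real library sizes.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def rarefy_all(samples: dict[str, dict[str, int]], seed: int = 0) -> dict[str, dict[str, int]]:
    """Rarefy every sample to the size of the smallest library."""
    depth = min(sum(c.values()) for c in samples.values())
    return {name: rarefy(c, depth, seed) for name, c in samples.items()}


# Toy counts (assumed data): sample "B" has the smallest library (25 reads).
toy = {
    "A": {"Frankiaceae": 60, "Streptomycetaceae": 30, "Pseudonocardiaceae": 10},
    "B": {"Frankiaceae": 20, "Streptomycetaceae": 5, "Pseudonocardiaceae": 0},
}
for name, counts in rarefy_all(toy).items():
    print(name, counts, sum(counts.values()))
```

After rarefying, every sample has the same total number of reads, so differences in observed richness no longer reflect differences in sequencing depth alone. The reads that are not drawn are discarded, which is the loss of data criticized by McMurdie and Holmes (2014).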

1.7 Purpose and thesis questions

The purpose of this study is to analyse the metagenomes of 18 actinorhizal plants’ root nodules to map and compare their bacterial composition and diversity. The thesis will especially focus on the “fellow travellers” that are hypothesized to constitute an established part of the nodule environment. Compared to contaminants, these would make up considerably more sequencing reads and have a continuous presence in the samples. Is there a common microbial composition shared among the samples? The metagenomes will also be analysed for genes that could indicate any putative functions of commonly occurring endophytes. Using bioinformatic tools and computational approaches to investigate the metagenomes, three thesis questions will be answered and discussed throughout this thesis:

i. Does the population diversity differ among the nodule samples, and does it differ between field and greenhouse metagenomes?

ii. Which bacterial families, other than the Frankiaceae symbiont, are present in the root nodules?

iii. Is there any evidence in the metagenome for these fellow travellers to have a function in the nodule?

2 Material and method

2.1 Field sampling, propagation, and sequencing

The actinorhizal nodule samples used in this study derive from previous work performed by T. Van Nguyen, F. Berckx and K. Pawlowski, following the method described in Nguyen et al. (2019). They were collected from the roots of various actinorhizal plants at different geographical locations, and a detailed list of the samples’ origins is found in Table 1.

Some samples (denoted “field”) had their nodule metagenomes analysed directly, while others were first inoculated into a new plant (denoted “greenhouse”) before DNA isolation. The greenhouse samples have the propagation plant appended to their name after an underscore. The plants were grown in a greenhouse at Stockholm University, except for Cv1_Ct, which was grown in a greenhouse at the University of California. All nodules were surface-sterilized to prevent contamination by non-endophytic bacteria. Sequencing was performed on an Illumina MiSeq sequencer using 2x250 bp paired-end reads. A more detailed description of the sampling, inoculation, greenhouse cultivation and sequencing processes can be found in Nguyen et al. (2016, 2019).

Table 1. Geographical origin, sample name, host plant from which the sample was isolated, Frankia cluster, collection site, sample type (greenhouse/field) and the plant used for propagation if cultivated in a greenhouse. *Grown in a Californian greenhouse.

| Origin | Sample | Host plant | Frankia cluster | Collected | Type | Propagated in |
|---|---|---|---|---|---|---|
| Eurasia | Cj1_Ci | Coriaria japonica | 2 | Japan | Greenhouse | Coriaria intermedia |
| Eurasia | Cj1_Dg | Coriaria japonica | 2 | Japan | Greenhouse | Datisca glomerata |
| Eurasia | Cm1_Dg | Coriaria myrtifolia | 2 | France | Greenhouse | Datisca glomerata |
| New Zealand | Cas1 | Coriaria arborea | 2 | New Zealand (Christchurch) | Field | - |
| New Zealand | Cas2 | Coriaria arborea | 2 | New Zealand (Christchurch) | Field | - |
| New Zealand | Cas4 | Coriaria arborea | 2 | New Zealand (Christchurch) | Field | - |
| New Zealand | Cas5 | Coriaria arborea | 2 | New Zealand (Christchurch) | Field | - |
| New Zealand | Cas7 | Coriaria arborea | 2 | New Zealand (Christchurch) | Field | - |
| New Zealand | Cas8 | Coriaria arborea | 2 | New Zealand (Christchurch) | Field | - |
| North America | Cv1_Ct | Ceanothus velutinus | 2 | USA (California) | Greenhouse* | Ceanothus thyrsiflorus |
| North America | Dd1_Dg | Dryas drummondii | 2 | USA (Alaska) | Greenhouse | Datisca glomerata |
| Pacific | CiP1 | Coriaria intermedia | 2 | Philippines | Field | - |
| Pacific | CiP2 | Coriaria intermedia | 2 | Philippines | Field | - |
| Pacific | CiP3 | Coriaria intermedia | 2 | Philippines | Field | - |
| Pacific | CiP4 | Coriaria intermedia | 2 | Philippines | Field | - |
| Pacific | CiT1_Ca | Coriaria intermedia | 2 | Taiwan | Greenhouse | Coriaria arborea |
| Pacific | Cppng1_Ca | Coriaria papuana | 2 | Papua New Guinea | Greenhouse | Coriaria arborea |
| Pacific | Cppng1_Ct | Coriaria papuana | 2 | Papua New Guinea | Greenhouse | Coriaria terminalis |

2.2 MGX taxonomic analysis

The raw sequencing reads were first analysed with default settings in MGX (https://mgx-metagenomics.github.io/), a metagenome analysis software (Jaenicke et al. 2018). The MGX pipeline assigns raw reads to taxonomic bins, relying on the sequence classification programs Kraken and DIAMOND. Only reads assigned to Bacteria were included, under the attribute NCBI_Family, which matches reads to the taxonomic level of family using the NCBI database. The resulting table of reads per family was used for further downstream analyses.


2.3 Data normalization

To normalize the data for differences in library size (the number of reads per sample) and allow a more accurate statistical analysis, the rarefaction method was used. In other words, a random dataset containing exactly the same number of reads for each sample was generated from the MGX taxonomic pipeline results classified as “Bacteria” and binned into “NCBI_Family”. The total number of reads was calculated for each sample. The smallest of these datasets, Cas1 with 746906 reads, was used as the threshold. For each of the other samples, a random dataset of reads belonging to the different families was generated by randomly selecting reads until the threshold of 746906 counts was reached. This was performed using a customized Python script available at https://github.com/andraghetti/bioelfa. The seed was fixed to 0 to ensure reproducibility, in the sense that the random sampling procedure can be repeated and yields the same output. Rarefaction curves before and after normalization were plotted with the R-package Vegan (Oksanen et al. 2012) to confirm a successful rarefaction.
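The subsampling step described above can be illustrated with a minimal Python sketch. It is not the script from the repository named above, whose internals are not shown here; the function name `rarefy`, the toy counts and the choice to sample without replacement are assumptions made for illustration only.

```python
import random
from collections import Counter


def rarefy(family_counts, depth, seed=0):
    """Randomly subsample a reads-per-family table down to `depth` reads.

    Hypothetical helper: reads are drawn without replacement, and the
    fixed seed makes the draw repeatable.
    """
    total = sum(family_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the table into one family label per read (sorted for a stable order).
    reads = [fam for fam, n in sorted(family_counts.items()) for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


# Toy counts, not thesis data.
sample = {"Frankiaceae": 700, "Streptomycetaceae": 200, "Enterobacteriaceae": 100}
rarefied = rarefy(sample, depth=500)
print(sum(rarefied.values()))  # total equals the chosen depth
```

Running the function twice with the same seed returns the same table, which is the sense of reproducibility described in the text.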

Rarefaction curves also indicate sampling coverage, in the sense that the curves reach a plateau if a sufficient amount of data has been collected (De Meyer et al. 2015).

2.4 Functional genomic analysis

To investigate any potential functions of the fellow travellers discovered in the metagenomes, a brief functional analysis of certain samples with high presence of fellow travellers was performed. The forward and reverse files were first quality checked using FastQC through the open-source Galaxy webpage, available at https://usegalaxy.org (Afgan et al. 2018). If needed, low quality reads (Q < 20) and adapters were removed using Trimmomatic on default settings.

The metagenomic reads were thereafter annotated using MG-RAST, a bioinformatic software developed for taxonomic and functional annotation of shotgun metagenomic samples, available at https://mg-rast.org (Meyer et al. 2008). The software uses similarity-based annotation and compares the sequences against several databases, such as NCBI RefSeq. The three most common fellow travellers (Streptomycetaceae, Enterobacteriaceae, Pseudonocardiaceae) in the two core microbiomes were analyzed, using the samples available from the supervisor (Cas2, Cas4, Cas7, Cj1_Ci). The annotation was run using 80% identity as a threshold and an E-value set at 5. The gene sequences sorted by MG-RAST into a specific taxonomic group were manually checked, and selected genes that potentially could be more than basic housekeeping genes were noted. The sites UniProt, InterPro and EMBL-EBI were used to search for gene descriptions. The results were also compared with previously published literature.

2.5 Statistical analysis

All statistical analyses and data visualisation were performed in R and RStudio, using the normalized data with reads per family in each sample (R Core Team 2021). Graphs and plots were produced with the R-package ggplot2 (Wickham 2016). To view the samples’ bacterial composition, a heatmap (Method = NMDS, Distance = Bray-Curtis) was generated using the R-package Phyloseq (McMurdie and Holmes 2013). A principal component analysis (PCA), which is an established method for microbiome data (Luz Calle 2019), was performed to visualize the differences between samples. The PCA plot was created with the R-package ggplot2.
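As an illustration of the Bray-Curtis distance named above, the following is a minimal Python sketch of the dissimilarity between two reads-per-family tables. The analysis itself was done with Phyloseq in R; the function name `bray_curtis` and the toy counts are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count tables (dicts).

    0 means identical composition, 1 means no shared families.
    """
    families = set(a) | set(b)
    # Sum of the lesser count for every family present in either sample.
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in families)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


# Toy counts, not thesis data.
x = {"Frankiaceae": 6, "Streptomycetaceae": 4}
y = {"Frankiaceae": 2, "Streptomycetaceae": 8}
print(bray_curtis(x, y))
```

Because the measure is based on shared counts relative to total counts, it is most meaningful when both samples have been brought to the same library size first.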

For estimating the richness and evenness of bacterial families, biodiversity indices measuring the alpha diversity (within-sample diversity) were calculated from the number of reads per family for each sample, and plotted with the package Phyloseq in R. These biodiversity values, classified into field or greenhouse groups, were statistically tested with an unpaired Wilcoxon rank sum test in R.
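For readers unfamiliar with the alpha diversity indices, the sketch below shows how Shannon, Simpson and Chao1 values can be computed from a list of reads per family. It assumes the natural logarithm for Shannon, the 1 - sum(p^2) form for Simpson and one common bias-corrected form of Chao1; the R packages used in the thesis may apply different variants, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over families with reads."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the form 1 - sum(p^2)."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of families seen exactly once and twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy counts, not thesis data.
even = [10, 10, 10, 10]
skewed = [37, 1, 1, 1]
print(shannon(even), shannon(skewed))
print(simpson(even), simpson(skewed))
print(chao1([5, 1, 1, 2]))
```

In the toy data both samples contain four families, but the evenly distributed one scores higher on Shannon and Simpson, while Chao1 reacts only to the families seen once or twice.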

The “core microbiome” can be defined to identify the most prevalent taxa that have a consistent presence in the nodules (Abdelfattah et al. 2021). To determine the core microbiome for the greenhouse and field samples, all families with a minimum of 300 reads in at least 75% of the samples were included for each group. These were calculated and visualized using Microsoft Excel. A Wilcoxon rank sum test was performed in R to examine any significant difference between the bacterial abundances in the core microbiomes.
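The selection rule for the core microbiome (at least 300 reads in at least 75% of the samples of a group) can be sketched as follows. The thesis performed this step in Microsoft Excel; the Python function `core_families` and the toy group below are hypothetical and only illustrate the rule.

```python
def core_families(samples, min_reads=300, min_fraction=0.75):
    """Return families with >= min_reads reads in >= min_fraction of samples.

    `samples` maps a sample name to its reads-per-family dict.
    """
    n = len(samples)
    families = set().union(*[counts.keys() for counts in samples.values()])
    core = []
    for fam in sorted(families):
        hits = sum(1 for counts in samples.values()
                   if counts.get(fam, 0) >= min_reads)
        if hits / n >= min_fraction:
            core.append(fam)
    return core


# Toy group of four samples, not thesis data.
group = {
    "S1": {"Frankiaceae": 5000, "Streptomycetaceae": 400, "Clostridiaceae": 350},
    "S2": {"Frankiaceae": 6000, "Streptomycetaceae": 800, "Clostridiaceae": 20},
    "S3": {"Frankiaceae": 4000, "Streptomycetaceae": 100, "Clostridiaceae": 900},
    "S4": {"Frankiaceae": 7000, "Streptomycetaceae": 350},
}
core = core_families(group)
print(core)
```

In this toy group Streptomycetaceae passes because it reaches the read threshold in three of four samples, while Clostridiaceae does so in only two and is excluded.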

3 Results

3.1 Rarefaction result

To be able to do meaningful comparisons between the samples, the MGX-taxonomic binned reads were first normalized using the rarefaction method in a customized Python script.

The rarefying of the data was successfully performed, which is revealed by the rarefaction curves, produced with the R-package Vegan, changing from different to equal sample sizes in Fig. 1. The curves also seem to reach a plateau, which would indicate adequate species coverage.

Figure 1. Rarefaction curves before (top) and after (bottom) normalization using the rarefying method. The y-axis shows the total number of species, and the x-axis shows the sample size on a logarithmic scale.

3.2 Bacteria composition and diversity

The OTU (operational taxonomic unit) table retrieved from rarefying the data (Fig. 1) was used in downstream analyses. A total of 498 bacterial families were obtained from the 18 actinorhizal nodule samples. Around 300 families were present in each sample, with Dd1_Dg containing the lowest number of families (206 families in total) and Cv1_Ct being the richest (447 families in total) (Tab. 2).

Table 2. The total number of bacteria families detected in each sample after rarefying.

Sample | Families
Cj1_Ci | 359
Cj1_Dg | 263
Cm1_Dg | 247
Cas1 | 359
Cas2 | 278
Cas4 | 350
Cas5 | 348
Cas7 | 396
Cas8 | 295
Cv1_Ct | 447
Dd1_Dg | 206
Cppng1_Ca | 259
Cppng1_Ct | 277
CiT1_Ca | 384
CiP1 | 219
CiP2 | 342
CiP3 | 394
CiP4 | 368

The majority of reads belonged to Frankiaceae, the most abundant family, accounting for around 300k to 700k reads (Fig. 2, appendix 2). It made up around 80% or more of the top ten families in CiP4, CiP3, CiP2, CiP1, CiT1_Ca, Cppng1_Ca, Dd1_Dg, Cv1_Ct, Cas1, Cm1_Dg, Cj1_Ci and Cj1_Dg (Fig. 3). However, this was not true for the samples Cppng1_Ct, Cas8, Cas7, Cas5, Cas4 and Cas2, where Frankiaceae reads only covered around 50% or less of the ten most common families and reads belonging to Enterobacteriaceae were in principle equally abundant (Fig. 3). These reads belonged mostly to the genera Klebsiella and Salmonella, according to the MGX binning. In addition, all samples had a clear presence of the fellow travellers that have previously been found in actinorhizal nodules, such as Streptomycetaceae, Mycobacterium, Micromonosporaceae, Thermomonosporaceae, Pseudonocardia, Nocardiaceae and Microbacterium, which are all Actinobacteria in the order Actinomycetales (Fig. 2, 3). Other fellow travellers that made up a considerable share were, as mentioned, Enterobacteriaceae, as well as Pseudomonadaceae, Bradyrhizobiaceae and Moraxellaceae, which are all Proteobacteria (Appendix 2). Some samples contained outliers, which occurred only once or twice among the top ten of the samples, such as Acidobacteria (CiP3, Cv1_Ct), Clostridiaceae (Cas2), and Chitinophagaceae (Cv1_Ct).

Figure 2. Bacteria families that represent at least 0.1% of the reads. The bacterial families constitute the y-axis, and the x-axis shows metagenomes. Method is NMDS and distance is Bray Curtis.


Figure 3. A bar chart showing the 10 most frequent families in each sample (x-axis) on a percentage scale (y-axis).

The full list with exact numbers of reads is found in appendix 2.

Based on principal component analysis (Fig. 4), the samples Cas1, CiP2, CiP3, CiP4 and CiT1_Ca all cluster together, while the other Cas-samples together with Cppng1_Ct diverge from the rest. Dd1_Dg and Cv1_Ct seem to be similar in their composition, while Cppng1_Ca, Cj1_Dg, Cj1_Ci, CiP1 and Cm1_Dg remain rather separated from the rest. There seems to be no clear distinction between field and greenhouse samples in their composition of fellow travellers in figure 4.

Figure 4. The beta diversity of the fellow travellers (bacteria families without the Frankiaceae reads) with Euclidean distance, calculated from the number of reads per family in each sample.

[Figure 3 image: "10 most common bacterial families per sample". Y-axis: 0% to 100%; x-axis: samples. Legend: Frankiaceae, Streptomycetaceae, Micromonosporaceae, Pseudonocardiaceae, Mycobacteriaceae, Nocardiaceae, Microbacteriaceae, Thermomonosporaceae, Nocardioidaceae, Streptosporangiaceae, Acidobacteriaceae, Rhizobiaceae, Pseudomonadaceae, Enterobacteriaceae, Pseudoalteromonadaceae, Moraxellaceae, Enterococcaceae, Bradyrhizobiaceae, Micrococcaceae, Phyllobacteriaceae, Hyphomicrobiaceae, Sphingomonadaceae, Chitinophagaceae, Clostridiaceae, Comamonadaceae.]

[Figure 4 image: samples marked as Field or Greenhouse.]


The alpha diversity differed a fair amount between the samples, as seen in Fig. 5. All Coriaria arborea derivatives showed high diversity according to the Shannon and Simpson indices. Cas7, closely followed by Cas4, scored the highest value in the Shannon index, and the samples Cas4, Cas5, Cas7 and Cas8 all scored a high Simpson index. The samples Dd1_Dg, Cm1_Dg and Cj1_Dg, which are all greenhouse samples propagated in Datisca glomerata, had the lowest diversity measured in both the Shannon and Simpson indices. The Chao1 values, which are more focused on rare and low-abundance species (Kim et al. 2017), differed somewhat from the other indices. The Cas-samples scored much lower, while CiP3, CiP4, CiT1_Ca and Cj1_Ci got a remarkably higher score than in the other biodiversity measures. All of these are derivatives of Coriaria intermedia, except for the latter (Cj1_Ci), which is a Coriaria japonica derivative but propagated in C. intermedia. One sample, Dd1_Dg, received a low index in all three measures and is the only derivative of Dryas drummondii, propagated in Datisca glomerata. The biodiversity measures for the greenhouse samples were compared to the field samples as two groups in an unpaired Wilcoxon rank sum test. The Shannon index had a p-value = 0.034 and the Simpson index a p-value = 0.021, hence both indicate a significant difference in diversity between the greenhouse and field groups (p < 0.05). The test on the Chao1 values gave a p-value = 0.274 and therefore showed no significant difference between the groups.
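The group comparison can be illustrated with a small sketch of a two-sided Wilcoxon rank sum test that computes an exact p-value by enumerating all possible group assignments. This is an assumption-laden illustration: the thesis used R, whose test may switch to a normal approximation (for example when ties are present), and the function names and toy values below are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_exact(group_a, group_b):
    """Return (rank sum of group_a, exact two-sided p-value)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(ranks) + 1) / 2
    extreme = total = 0
    # Enumerate every way of assigning n_a of the pooled ranks to group A.
    for combo in combinations(ranks, n_a):
        total += 1
        if abs(sum(combo) - expected) >= abs(observed - expected) - 1e-9:
            extreme += 1
    return observed, extreme / total


# Toy index values, not thesis data.
w, p = rank_sum_exact([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(w, p)
```

With groups of ten and eight samples, as in this study, the enumeration covers 43758 assignments, which is still fast enough for an exact calculation.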

Figure 5. The alpha diversity of the bacterial composition (family as taxonomic unit) within the samples. A full table with exact values for the richness estimates is found in appendix 1.

Out of all normalized reads, 41 families in the field samples and 27 families in the greenhouse samples were represented by a minimum of 300 reads in at least 75% of the samples, which can be seen as the core microbiome of the root nodules (Fig. 6). For the field microbiome, Frankiaceae had a total of 4 239 286 reads, followed by Enterobacteriaceae and Streptomycetaceae accounting for 1 515 482 and 463 717 reads respectively. A large part of the greenhouse microbiome consisted of the same families, but with Frankiaceae accounting for a higher total of 4 772 687 reads, Enterobacteriaceae for 293 965 reads and Streptomycetaceae for 221 446 reads (Appendix 3). Several of the other “fellow travellers” mentioned earlier were included in both core microbiomes, like Micromonospora, Mycobacteria and Pseudonocardia. The Wilcoxon rank sum test gave a p-value = 0.737, which indicates that there is no significant difference (p > 0.05) between the bacterial composition of the two core microbiomes.

[Figure 5 legend: Field, Greenhouse]


Figure 6. The core microbiome of nodules retrieved from greenhouse (A) and field (B) samples, comprising a total of 27 and 41 families respectively. Only bacterial families with a relative abundance of 1% or more are labeled. The full list of families with the exact number of reads is found in appendix 3.

A. Greenhouse core microbiome (labeled families): Frankiaceae 83%, Enterobacteriaceae 5%, Streptomycetaceae 4%, Nocardiaceae 1%, Pseudonocardiaceae 1%, Mycobacteriaceae 1%, Micromonosporaceae 1%, Bradyrhizobiaceae 1%, Pseudomonadaceae 1%.

Legend (A): Moraxellaceae, Thermomonosporaceae, Burkholderiaceae, Microbacteriaceae, Nocardiaceae, Enterobacteriaceae, Rhodobacteraceae, Pseudonocardiaceae, Mycobacteriaceae, Nocardioidaceae, Nocardiopsaceae, Clostridiaceae, Comamonadaceae, Sphingomonadaceae, Micromonosporaceae, Frankiaceae, Propionibacteriaceae, Streptosporangiaceae, Rhizobiaceae, Streptomycetaceae, Gordoniaceae, Geodermatophilaceae, Bradyrhizobiaceae, Pseudomonadaceae, Intrasporangiaceae, Micrococcaceae, Phyllobacteriaceae.

B. Field core microbiome (labeled families): Frankiaceae 59%, Enterobacteriaceae 21%, Streptomycetaceae 6%, Pseudonocardiaceae 2%, Mycobacteriaceae 2%, Micromonosporaceae 2%, Moraxellaceae 1%, Microbacteriaceae 1%, Nocardiaceae 1%, Nocardioidaceae 1%, Bradyrhizobiaceae 1%, Pseudomonadaceae 1%.

Legend (B): Moraxellaceae, Thermomonosporaceae, Oxalobacteraceae, Acidobacteriaceae, Burkholderiaceae, Microbacteriaceae, Nocardiaceae, Alcaligenaceae, Enterobacteriaceae, Rhodobacteraceae, Catenulisporaceae, Pseudonocardiaceae, Mycobacteriaceae, Nocardioidaceae, Nocardiopsaceae, Clostridiaceae, Corynebacteriaceae, Comamonadaceae, Sphingomonadaceae, Caulobacteraceae, Nakamurellaceae, Promicromonosporaceae, Methylobacteriaceae, Micromonosporaceae, Frankiaceae, Propionibacteriaceae, Streptosporangiaceae, Hyphomicrobiaceae, Cellulomonadaceae, Rhizobiaceae, Streptomycetaceae, Gordoniaceae, Geodermatophilaceae, Bradyrhizobiaceae, Pseudomonadaceae, Rhodospirillaceae, Actinomycetaceae, Intrasporangiaceae, Micrococcaceae, Phyllobacteriaceae, Xanthomonadaceae.


3.3 Functional annotation

The potential functions of the annotated fellow travellers in the metagenomes, obtained with the MG-RAST software, are listed in table 3. A large part of the annotation results consisted of housekeeping genes, encoding things such as cell structures, metabolism, ribosomal units, DNA helicases, transcription factors and other basic survival functions, which, as described earlier, were not noted. In addition, a large number were only annotated as “hypothetical protein”, which are sequences that the pipeline recognizes as protein-coding genes, while the function of any homolog has not been confirmed experimentally. However, some genes that seemed to encode enzymes for the synthesis of secondary metabolites, either for the plant or for the surrounding microbiome, were detected. These were related to, for example, antibiotics and penicillins, heavy metal resistance proteins or different stress-related proteins. Some potentially plant-damaging functions were also detected, such as pectate lyase and a putative avirulence (plant pathogenic) protein.

Table 3. The functional annotation results of the fellow travellers Streptomyces, Pseudonocardia and Enterobacteria, using MG-RAST.

Sample(s) | Annotated function | Organism(s)

Streptomyces
Cas2, Cj1_Ci, Cas4, Cas7 | MarR-family regulatory protein | Streptomyces sp. e14, Streptomyces lividans TK24
Cas2 | organic hydroperoxide resistance protein | Streptomyces avermitilis MA-4680, Streptomyces pristinaespiralis ATCC 25486
Cas2 | high-affinity nickel-transport protein | Streptomyces avermitilis MA-4680
Cas2 | RelE family toxin-antitoxin system | Streptomyces griseoflavus Tu4000
Cj1_Ci | PhoR, phosphate starvation-induced protein | Streptomyces roseosporus NRRL 15998, Streptomyces avermitilis MA-4680
Cj1_Ci | oleandomycin polyketide synthase | Streptomyces hygroscopicus ATCC 53653
Cj1_Ci | arsenate reductase | Streptomyces coelicolor A3(2)
Cj1_Ci, Cas4 | secreted pectate lyase | Streptomyces griseoflavus Tu4000, Streptomyces sviceus ATCC 29083
Cj1_Ci | phenazine antibiotic biosynthesis protein | Streptomyces scabiei 87.22
Cj1_Ci | cold-shock domain containing protein | Streptomyces sp. Mg1
Cj1_Ci | ribose/xylose/arabinose/galactoside ABC-type transporter permease | Streptomyces sviceus ATCC 29083
Cj1_Ci | penicillin-binding protein, penicillin acylase | Streptomyces avermitilis MA-4680
Cj1_Ci | siderophore biosynthesis protein | Streptomyces scabiei 87.22
Cj1_Ci | nitrite reductase (NAD(P)H) large subunit | Streptomyces avermitilis MA-4680
Cas4, Cas7 | Tex-like protein-like protein (involved in control of toxin biosynthesis operons) | Streptomyces violaceusniger Tu 4113
Cas4 | heat shock protein HtpX, heat shock protein 70 | Streptomyces griseus, Streptomyces scabiei 87.22
Cas4 | amino acid transporter transmembrane protein | Streptomyces sviceus ATCC 29083
Cas4 | chitinase A, exochitinase 1 | Streptomyces avermitilis MA-4680, Streptomyces sviceus ATCC 29083

Pseudonocardia
Cas7 | beta lactamase | Saccharopolyspora erythraea NRRL 2338
Cas7, Cj1_Ci | arsenic-transport integral membrane protein | Amycolatopsis mediterranei U32
Cas7, Cj1_Ci | heavy metal / copper-silver translocating P-type ATPase | Amycolatopsis mediterranei U32
Cas7 | molybdenum cofactor biosynthesis protein A | Amycolatopsis mediterranei U32
Cas7, Cas4 | PhoH-like phosphate starvation-induced protein | Saccharopolyspora erythraea NRRL 2338
Cas7 | RHS repeat-containing protein | Amycolatopsis mediterranei U32
Cas4 | amino acid transporter | Amycolatopsis mediterranei U32
Cj1_Ci | trehalose synthase | Thermobispora bispora DSM 43833
Cj1_Ci | glyoxalase/bleomycin resistance protein/dioxygenase | Saccharopolyspora erythraea NRRL 2338

Enterobacteria
Cj1_Ci | macrolide-specific ABC-type efflux carrier | Pectobacterium atrosepticum SCRI1043
Cj1_Ci | multidrug efflux system subunit MdtB | Escherichia coli IAI39, Salmonella enterica subsp. enterica serovar Gallinarum str. 287/91
Cj1_Ci | chloramphenicol resistance | Salmonella enterica subsp. enterica serovar Tennessee str. CDC07-0191
Cj1_Ci, Cas7 | arsenical-resistance protein | Salmonella enterica subsp. enterica serovar Newport str. SL317
Cj1_Ci | Acriflavin resistance protein | Erwinia billingiae Eb661
Cj1_Ci | nickel-resistant membrane protein-like protein NcrC | Enterobacter cloacae subsp. cloacae ATCC 13047
Cj1_Ci | putative avirulence protein (plant pathogen factor) | Dickeya dadantii Ech586
Cj1_Ci | Auxin Efflux Carrier | Dickeya dadantii Ech703
Cas4 | penicillin amidase | Serratia proteamaculans 568
Cas7 | Heat shock protein 90 | Yersinia pestis Antiqua
Cas7 | peroxiredoxin | Escherichia coli MS 198-1
Cas7 | putative stress-response protein | Klebsiella pneumoniae 342

4 Discussion

Analysing the metagenomes of the 18 actinorhizal nodule samples, a vast diversity of bacterial families was encountered. Following normalization to account for the different metagenome sizes (Fig. 1), the richness varied between 206 and 447 bacterial families detected per sample (Tab. 2). While it is necessary to normalize data in order to properly compare samples, the rarefaction method does involve discarding data. In other words, the number of families is not exactly the number that might actually occur in the nodules, and the true variance could be underestimated (McMurdie and Holmes 2014). Interestingly, while the library size of Cas1 was used as the threshold, as described in the method, it still has a high number of total families (359) relative to the other samples. While other samples could have up to six million reads, such as Cm1_Dg (Fig. 1), this sample did not contain a greater bacterial richness after rarefying, where Cm1_Dg had a total of 247 families (Tab. 2). Indeed, one could argue that rarefaction results in a severe loss of abundance, seeing that the largest sample, with respect to library size, ended up with fewer taxa than the smallest sample.

At the same time, avoiding normalization would also affect the observed richness and carry a high risk of incorrectly concluding a non-existent difference when comparing samples, as discussed earlier (Chao et al. 2014; Weiss et al. 2017). The total number of families found after rarefying can also be explained by the fact that many reads in a sample can belong to the same family, so a large library size does not always mean that many species are present. While the largest sample Cm1_Dg resulted in 247 families, the fourth largest sample Cv1_Ct resulted in the highest number, with a total of 447 families (Fig. 1, Tab. 2).

Nonetheless, the established actinorhizal symbiont Frankiaceae was present with many reads in all samples, constituting around 50-80% of the total (Fig. 2). However, in accordance with previous actinorhiza studies, all examined nodules showed a consistent presence of other bacteria, which occurred in different abundances among the samples (Fig. 2, 3) (Carro et al. 2013; Franco et al. 2007; Ghodhbane-Gtari et al. 2019; Trujillo et al. 2015). The fellow travellers that recurred in almost all samples were Enterobacteriaceae, Pseudonocardiaceae, Mycobacteriaceae, Streptomycetaceae, Micromonosporaceae, Nocardiaceae, Nocardioidaceae, Thermomonosporaceae and Pseudomonadaceae, and there seems to be a common microbial composition between the samples. For instance, one study found Micromonospora strains to be common endophytes in several actinorhizal plants from different families (Carro et al. 2013). This could reflect a core microbiota of actinorhizal plants, a term used for the recurrent microorganisms associated with a certain host (Trivedi et al. 2020). Knowing that at least Frankia among the bacteria shows host specificity in which plants it engages symbiotically with, there are some indications that the fellow travellers also follow a similar pattern (Sellstedt and Richau 2013). It has been suggested that the similar microbiota in plants of the same species is governed by the plant's genetic and morphological traits, but also that vital endosymbionts are inherited through the seeds (Abdelfattah et al. 2021). For instance, all metagenomes from nodules of Coriaria intermedia (CiP2;3;4 and CiT1_Ca), except one (CiP1), showed a similar composition of their fellow travellers based on principal component analysis (Fig. 4). This is also true for the Coriaria arborea derived Cas-samples (except for Cas1), which seem to share some similarity in their composition (Tab. 1, Fig. 4). C. intermedia and C. japonica are evolutionarily close and, together with C. terminalis and C. myrtifolia, belong to the Northern hemisphere lineage (Renner et al. 2020). The samples derived from these host plants, such as CiP2;3;4 and CiT1_Ca, and Cj1_Dg, Cj1_Ci and Cm1_Dg, respectively, are relatively close in their composition of non-Frankia bacteria (Fig. 4). C. arborea is evolutionarily closer to C. papuana, and both are included in the Southern hemisphere lineage (Renner et al. 2020). Indeed, most Cas-samples together with one C. papuana sample (Cppng1_Ct) group rather closely together, distinct from the others (Fig. 4). The host plant species also seemed to affect the diversity. All C. arborea metagenomes showed a high diversity according to the Shannon and Simpson indices (Fig. 5). Cas7, closely followed by Cas4, scored the highest value in the Shannon index, and the samples Cas4, Cas5, Cas7 and Cas8 all scored a high Simpson index. The samples Dd1_Dg, Cm1_Dg and Cj1_Dg, which are all greenhouse samples propagated in Datisca glomerata, had the lowest diversity measured in both the Shannon and Simpson indices. The two C. papuana samples were propagated in different plants and indeed show a dissimilar microbial composition (Fig. 3, 4). The propagation method could mean that bacterial strains are inoculated on a plant that they are not compatible with, assuming that the fellow travellers show a host specificity similar to that of Frankia strains.

Just like any ecosystem, the root nodule appears to be a dynamic environment. An interesting result is that sequences of Frankia sp. had a lower abundance in the field samples than in the greenhouse ones (Fig. 2, 3). Instead, the field samples seemed to have a greater biodiversity and had much higher Shannon and Simpson indices than most greenhouse samples (Fig. 5).

The Wilcoxon rank sum test comparing field and greenhouse Shannon indices as two groups had a p-value = 0.034, and the Simpson indices a p-value = 0.021. In other words, the groups differed significantly in their biodiversity. The samples Cas4;5;7;8 had the highest measure in both Shannon and Simpson indices (Fig. 5). For measuring the diversity of microorganisms, the Shannon and Simpson indices are considered the most suitable. Both estimate species richness and evenness, but Shannon is more skewed towards richness (number of species present) while Simpson leans towards evenness (uniformity of the population sizes of the species present) (Kim et al. 2017). Viewing these indices in fig. 5, the sample Cm1_Dg had the lowest diversity, while at the same time it has the highest presence of Frankia (< 700k reads) of all samples. Most other samples have around 20k to 60k reads per fellow traveller, while the second largest bacterial family after Frankiaceae in Cm1_Dg only has around 13k reads (Appendix 2). This indicates that the diversity differences may be strongly affected by the abundance of Frankia sp. in the nodule. Further studies investigating the impact of Frankia spp. on other bacteria constituting the microbiome would give a better insight into its influence on the nodule’s ecosystem. However, the Chao1 values did not differ between the field and greenhouse groups (p-value = 0.274). The least rich sample, Dd1_Dg with 206 families detected, had a low Chao1 value, while the richest sample, Cv1_Ct with 447 families (Tab. 2), got the highest Chao1 value (Fig. 5); both are greenhouse samples. Since this index focuses on low-abundance species, this explains why these values depend strongly on the number of bacterial families present (Kim et al. 2017).

The differences in diversity between field and greenhouse metagenomes could be explained by common ecological factors: competition and predation constantly affect an ecosystem’s composition, and these influences are weaker in the more controlled, enclosed greenhouse location. The plant is well pampered with a regular water supply and non-nitrogenous fertilizer (P, K, Mg, Ca), and lacks abiotic stresses. The host plant also suffers less from predators and competition that otherwise would affect its survival and susceptibility to different microorganisms. Equally, the field samples could have a higher risk of temporary contamination from, for example, surrounding plant pathogens, which could be one explanation for the high frequency of Enterobacteriaceae in many field samples (Fig. 3), while the greenhouse ones would be able to stay more unspoiled.

Another important factor that would affect both abundance and diversity of bacteria is the difference in soil, remembering that Frankia and the fellow travellers are indeed soil bacteria. With the soil being the richest and most heterogeneous microbial reserve on Earth, this will most probably have a considerable effect on the metagenomic composition (Daniel 2005). The high Chao1 value in Cv1_Ct can be explained by the large number of reads for rare families that are not as abundant in any other sample, like Hyphomicrobiaceae, Phyllobacteriaceae and Chitinophagaceae (Appendix 2). This was also the single sample grown in a Californian greenhouse, which very likely could explain its divergent endophytome. The two sole North
