Uncovering the genetic organisation of Claroideoglomus candidum


George B Cheng

Degree project in biology, Master of science (2 years), 2019 Examensarbete i biologi 45 hp till masterexamen, 2019

Biology Education Centre and Department of Evolutionary Biology, Uppsala University Supervisors: Anna Rosling and Marisol Sanchez Garcia

External opponents: Jente Ottenburghs and Boel Olsson

Acknowledgements

I would like to thank my supervisors Dr. Anna Rosling and Dr. Marisol Sanchez Garcia from the Department of Ecology and Genetics at Uppsala University. If I ran into trouble or had a question about my research or writing, their doors were always open for me. They were both very supportive and kept me grounded throughout the project. I would also like to extend my gratitude to the rest of the Rosling research group at Uppsala University for being supportive and teaching me about their ongoing research. I would like to thank my two external opponents, Dr. Jente Ottenburghs and Boel Olsson from Uppsala University. I’m grateful for their comments on this thesis, which helped shape the final product. Finally, I would like to thank my parents and my brother for supporting and encouraging me throughout these two years leading up to the end of the thesis.

Abstract

Arbuscular mycorrhizal (AM) fungi are hypothesized to have been key players in facilitating the transition from aquatic to terrestrial plants and continue to benefit plants through their symbiotic association after 450 million years. These fungi form mycelia that can contain hundreds of nuclei within one aseptate cytoplasm, which has led to an ongoing debate on whether these multinucleated fungi are homokaryotic or heterokaryotic. There is evidence to support both the hypothesis that the nuclei are genetically identical and the hypothesis that divergent nuclei coexist within a single strain. There has been no evidence of sexual reproduction; however, specialized genomic regions specific to meiosis and a putative mating-type (MAT) locus have recently been identified and may help resolve the debate between homokaryosis and heterokaryosis.

In this study I applied de novo genome assembly and annotation to 24 individual nuclei from a single spore of Claroideoglomus candidum. The full length of the de novo genome assembly was 87.6 Mb with 17,542 genes. Estimated polymorphism between the nuclei was very low. I identified the MAT locus in C. candidum using a previously sequenced MAT locus from a congeneric species. Only one MAT locus allele was found in the examined spore. The evidence points towards homokaryosis as the genetic organization of Claroideoglomus candidum.

Acknowledgements
Abstract
Introduction
  AM Fungal Symbiosis
  Evolutionary Persistence of AMF
  Genome Sequencing
  Project Aims
Methods
  Origin of reads
  De novo genome assembly
  Genome Annotations
  Variant Calling
  MAT Locus
Results
  Reference Genome
  Individual Nuclei Assemblies
  Single Nucleotide Polymorphisms
  MAT locus
Discussion
References

Introduction

AM Fungal Symbiosis

Symbiotic associations can be formed between a vast range of different organisms in different environments, from the red-billed oxpecker picking ticks off large mammals, to the bacteria that facilitate the tube worms living on deep hydrothermal vents, or the bacteria and fungi that sustain plants within their roots (Cordes et al. 2005, Mikula et al. 2018). These symbiotic relationships can be in the form of mutualistic, parasitic, or commensalistic associations, which can be further divided into facultative or obligatory alliances. The obligatory symbiosis occurs when one or both symbionts completely depend on the other to survive, whereas the facultative symbiosis is an optional relationship between the symbionts capable of surviving independently. One of the oldest and often overlooked obligate symbiotic relationships is found between terrestrial plants and arbuscular mycorrhizal (AM) fungi. This 450-million-year-old relationship can be found in nearly 80% of all land plants (Martin 2016).

AM fungal symbiosis, which was established before mutualistic interactions evolved between insects and vertebrates, was arguably the essential driving force for successful plant colonization on land (Kiers et al. 2011, Redecker et al. 2000, Heckman et al. 2001). This symbiotic relationship can have a deep impact on agricultural production. Achieving more environmentally friendly agricultural processes requires a better understanding of how to harmonize all aspects of the agricultural environment, including this plant-fungal relationship. Not only do we need to understand which crops would be best suited for a plot of land, but we also need to be aware of the microbiota that thrive beneath the surface. The presence of AM fungi facilitates nutrient uptake by capturing and directing nitrogen and phosphorus to the plant, in exchange for carbon sources essential for growth and survival of the fungus. Aside from nutrient acquisition for the host plant, AM fungi facilitate mineral and water absorption (Souza 2015). The benefits of this symbiotic relationship are manifold: it also helps to promote resistance and tolerance towards abiotic stresses (e.g., drought) and biotic stresses (e.g., pathogens and herbivores) (Campos-Soriano et al. 2012, Kiers et al. 2011, Souza 2015). AM fungi can also improve photosynthesis by protecting the photosystems within the chloroplasts against heavy metal toxicity, forming compounds that bind heavy metals and inhibit their movement to above-ground structures (Zhang et al. 2018). As climate change advances, plants will be exposed to changes in temperature and other abiotic stresses. AM fungi improve plant cold tolerance by inducing higher enzymatic activity and increasing the content of secondary metabolites (e.g., flavonoids, lignin) in plants (Chen et al. 2013). Under high temperatures, AM fungi can help the plant cope by protecting its photosystems and increasing plant growth (Mathur et al. 2018).

Understanding and utilizing AM fungi in agricultural practices could reduce the use of chemical fertilizers and pesticides. One challenge, however, is that AM fungi express species-dependent host preferences, which can make it difficult to pair them with crop species (Angelard et al. 2014, Kim et al. 2017). In a field study, Hijri (2016) evaluated potato yield in plots inoculated with AM fungi and found an overall increase in yield compared to uninoculated plots. However, some inoculated plots experienced a decrease in yield compared to the uninoculated plots, pointing to other factors that can reduce yield. Hijri (2016) advanced several hypotheses that could explain this reduction: poor application of the inoculum with insufficient agitation, pathogen attacks, and competition between the AM fungi in the inoculum and indigenous AM fungal populations. Understanding the local soil community dynamics and how they affect AM fungi can be crucial to improving yield.

AM symbiosis can elicit two different community dynamics: positive feedback, which strengthens the mutualism between plant and fungal species but decreases community diversity, and negative feedback, which weakens the mutualism but contributes to the maintenance of plant and fungal diversity (Bever 2002). AM fungi can potentially experience genotypic plasticity due to a change in host plants or their environment (Angelard et al. 2014). The study by Angelard et al. (2014) suggests that the fungi show potential for adaptability through their ability to alter nucleotype frequencies to better suit their environment or host. If AM fungi associate with different plant species simultaneously, the nuclei within the hyphal network may be genotypically different.

Evolutionary Persistence of AMF

An important mechanism for long-term persistence and adaptation in eukaryotic species has been sexual reproduction. In asexual reproducers, the accumulation of deleterious mutations and loss of adaptability often lead to extinction. Most fungi are known to reproduce both sexually and asexually. However, AM fungi have long been thought to reproduce only asexually; many consider them to be ancient asexuals that defy the basis of evolutionary theory by persisting for 450 million years (Parniske 2008). While sexual reproduction has not been directly observed in AM fungi, it has been inferred to occur from the presence of a putative “mating-type” locus (MAT locus) similar to the mating-type locus of basidiomycete fungi (Ropars et al. 2016).

The MAT locus is a specialized region of the genome that codes for the establishment of cell-type identity and orchestrates the sexual cycle. It encodes global transcription factors that establish cell-type identity by controlling the expression of developmental cascades, and it commonly involves homeodomain or other classic transcriptional regulatory elements (Fraser and Heitman 2003). Genes in the MAT locus can encode homeodomain transcription factors and can also control the fusion of cells from different individuals (Fraser and Heitman 2003). The recent identification of the MAT locus (Ropars et al. 2016) may help describe the genetic structure between nuclei in AM fungi. The genetic organization of AM fungi could hold the answer to how they have been able to keep up with the changes in their hosts and environments.

The AM fungal mycelium is organized as one continuous cytoplasm of aseptate hyphae with multinucleated spores, holding hundreds to thousands of nuclei that flow through the entire structure (Marleau et al. 2011). There are two views on the genetic organization of the nuclei: the heterokaryotic hypothesis, which states that AM fungi have genetically different nuclei, and the homokaryotic hypothesis, which states that the nuclei are genetically highly similar. It is still unclear whether the nuclei differ significantly from each other, that is, whether AM fungi are homokaryotic or heterokaryotic. One method to determine which hypothesis suits these fungal species involves the identification of AM fungal genes related to mating, specifically the “mating-type locus” (MAT locus) (Ropars et al. 2016). The MAT locus was located in Rhizophagus irregularis isolates, revealing that R. irregularis produces either homokaryotic or heterokaryotic mycelia. Within the MAT locus there are two open reading frames containing homeodomain-like regions, designated HD1-like and HD2. Heterokaryotic isolates have two alleles of HD1-like and HD2, whereas homokaryotic isolates have only one (Ropars et al. 2016).

Genome Sequencing

Genome sequencing methods have expanded continually, especially since the breakthrough of the human genome (Liu et al. 2012). According to the National Human Genome Research Institute (NHGRI 2016), these methods have been constantly improving, drastically lowering the cost compared to 2001 and making genome sequencing more accessible. This has opened avenues of new research in many fields of biology. Assembling genomes unlocks more information about the species of interest, such as identifying proteins, uncovering regulatory pathways, or evaluating the differences between or within species (Sharman 2001).

When constructing genome assemblies, two approaches can be used: reference-based assembly and de novo assembly. De novo assembly uses only the sequenced reads to construct a genome, comparing the reads and using overlapping reads to form longer contiguous sequences (contigs). These contigs are then positioned to create scaffolds that are combined to form the final assembly. Reference-based assembly aligns or maps each read to a previously generated genome sequence of a closely related individual to construct a new genome or identify single nucleotide variations.
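The overlap idea behind de novo assembly can be illustrated with a toy sketch. The code below is a minimal greedy overlap merge on three invented reads; the function names, the reads, and the minimum overlap of 3 bases are assumptions made for illustration only. It is not how SPAdes or MaSuRCA work internally, and real assemblers must also deal with sequencing errors, repeats, and reverse complements.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences that shares the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, -1, -1)  # (overlap size, index of left, index of right)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


# Hypothetical reads, invented for this example.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
toy_contigs = greedy_assemble(toy_reads)
print(toy_contigs)
```

With these three reads the overlaps chain into a single contig; reads that share no overlap of at least `min_overlap` bases would remain as separate contigs.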

Determining whether the species is heterokaryotic or homokaryotic depends on how each term is defined. One strict definition is that homokaryosis means the genetic composition of the individual nuclei is exactly the same, with no single nucleotide polymorphisms (SNPs). In contrast, if a significant number of SNPs is present, then heterokaryosis is observed. Another possible definition is based on the density of SNPs observed in the genome. In the pathogenic fungus Puccinia striiformis f. sp. tritici, homokaryotic and heterokaryotic isolates have on average 0.41 SNPs/kb and 5.29 SNPs/kb, respectively (Cantu et al. 2013). Using the heterokaryotic SNP density from Cantu et al. (2013) as the threshold, single spores with a SNP density over 5.29 SNPs/kb will be considered heterokaryotic and those with a SNP density below that threshold will be considered homokaryotic.
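The density-based definition amounts to a simple threshold rule, sketched below. The function name is hypothetical, and treating a density exactly equal to the threshold as homokaryotic is an assumption, since the text only specifies the cases above and below 5.29 SNPs/kb.

```python
# Reference densities for P. striiformis f. sp. tritici (Cantu et al. 2013), in SNPs/kb.
HOMOKARYOTIC_REFERENCE = 0.41
HETEROKARYOTIC_THRESHOLD = 5.29


def classify_by_snp_density(snps_per_kb: float,
                            threshold: float = HETEROKARYOTIC_THRESHOLD) -> str:
    """Label a spore from its SNP density using a single threshold."""
    if snps_per_kb < 0:
        raise ValueError("SNP density cannot be negative")
    # Assumption: a density exactly at the threshold counts as homokaryotic.
    return "heterokaryotic" if snps_per_kb > threshold else "homokaryotic"


print(classify_by_snp_density(HOMOKARYOTIC_REFERENCE))
print(classify_by_snp_density(6.0))  # hypothetical density above the threshold
```

Under this rule any density between the two reference values is labelled homokaryotic, which is why the choice of threshold matters for the interpretation.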

Project Aims

The aim of this master thesis project is to assemble the genome of Claroideoglomus candidum and determine whether its genomic organization is heterokaryotic or homokaryotic. To do this, the sequences of several individual nuclei from a single spore will be compared. Knowing whether the nuclei are similar or different may help us understand how AM fungi can propagate and reproduce specific nuclei based on the plant species they are or will be colonizing, and whether they have specific traits that can benefit specific species of plants.

Methods

Origin of reads

Claroideoglomus candidum CCK pot B6-9 was isolated from a single spore collected from old field soil in North Carolina, USA. The strain is part of the James Bever collection. From the culture, a single spore was isolated and crushed to release nuclei, which were then collected using fluorescence-activated cell sorting (FACS). Twenty-four nuclei were extracted from the spore, amplified through multiple displacement amplification, and then sequenced with Illumina HiSeq X (Montoliu-Nerin et al. 2019).

In order to compare results and patterns in this study with those of previously studied fungal genomes, the parameters for variant calling were replicated from Chen et al. (2018), followed by a stricter filter for repeats. This was done to avoid potential discrepancies, standardize the approach, and allow comparison with other genomic data. Concerns about comparability between studies were expressed by Ropars and Corradi (2015): because there are many different SNP-calling techniques, each could produce different results and conclusions about SNP detection.

De novo genome assembly

The raw reads from each nucleus were normalized before constructing the assembly using bbnorm of BBMap v. 38.08 (Bushnell 2014) with an average depth of 100x to reduce potential errors downstream. De novo assemblies for each nucleus were made using SPAdes assembler v. 3.11.1 (Bankevich et al. 2012) with default parameters. The individual assemblies were of good quality and represented the majority of reads, but issues were encountered when attempting to construct the reference assembly using the Lingon pipeline (Montoliu-Nerin et al. 2019). The individual nuclei were therefore reassembled from the raw reads using MaSuRCA (Zimin et al. 2013), and these assemblies were used in the Lingon pipeline (Grabherr 2018) to create the reference genome assembly. The quality of the individual nuclei assemblies and the reference assembly was assessed using BUSCO v. 3.0.2b (Simão et al. 2015) to evaluate completeness and Quast v. 4.5.4 (Gurevich et al. 2013) to obtain statistical metrics of the assembly. Based on the metrics from the individual assemblies, two of the nuclei (4 and 7) were removed from further analysis due to poor assembly quality (Table 1).

KmerGenie v. 1.7039 (Chikhi & Medvedev 2014) was used to estimate the genome size. Assemblies were generated from combinations of different numbers of nuclei to assess their quality and determine how many nuclei should be used to produce a full genome assembly.
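One of the assembly metrics reported in the tables of this thesis is the N50 contig length. As a reminder of what that statistic measures, the sketch below computes it from a list of contig lengths using the usual definition (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length). The function name and the toy lengths are assumptions for illustration; this is not Quast's implementation.

```python
def n50(contig_lengths: list[int]) -> int:
    """Contig length at which the running total reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    raise ValueError("at least one contig is required")


# Hypothetical contig lengths in base pairs.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is read alongside completeness rather than on its own.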

Genome Annotations

Annotations were done using a Snakemake workflow of different programs that was specifically developed for the larger arbuscular mycorrhizal genomic project ongoing in the lab. RepeatModeler v. 1.0.8_RM4.0.7 (Smit 2008) was used to predict repeats and create a repeat library that was used by RepeatMasker v. 4.0.7 (Smit 2015) to mask the genome assembly. GeneMark v. 4.33-es (Ter-Hovhannisyan 2008) was used to predict the protein-coding genes from the repeat-masked assembly. InterProScan v. 5.30-69.0 (Jones et al. 2014), GenomeTools v. 1.5.9 (Gremme et al. 2013), blast v. 2.6.0+ (Camacho et al. 2009), and MAKER v. 3.01.1-beta (Cantarel et al. 2008) were used for gene predictions and locations.

Variant Calling

Burrows-Wheeler Aligner (BWA-MEM) (Li & Durbin 2009) with the -M parameter was used to map the reads of each nucleus back to the whole genome assembly. Freebayes (Garrison & Marth 2012) was used to detect and filter variants in the reads using the following parameters, which were also used by Chen et al. (2018): -K -m 30 -C 2 -q 20 -p 1. These parameters set a minimum mapping quality of 30, a minimum of two reads supporting the alternative allele, a minimum base quality of 20, and a ploidy of one. A second filter was applied on top of the first using vcffilter from the vcflib package (Garrison 2018), with the following parameters: QUAL > 1, removing bad sites; QUAL / AO > 10 (quality / alternate allele observation count); SAF > 0 and SAR > 0, removing alleles that are observed on only one strand; RPR > 1 and RPL > 1, requiring at least two reads “balanced” to each side of the site, which removes alleles supported only by reads placed to the left or to the right; and RO > 1. BCFtools (Li 2011) stats with default parameters was used to determine the number of SNPs found in the whole genome, in the genome without repeats, and in coding regions only.
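The logic of the second filter can be written out as a single predicate. The sketch below restates the listed criteria in Python on invented site records; it is not vcflib code, the dictionary layout and function name are assumptions, and the explicit AO > 0 check is added here only to avoid a division by zero.

```python
def passes_second_filter(site: dict) -> bool:
    """Apply the QUAL, AO, SAF/SAR, RPR/RPL and RO criteria to one variant site."""
    return (
        site["QUAL"] > 1                         # remove bad sites
        and site["AO"] > 0                       # guard added for the division below
        and site["QUAL"] / site["AO"] > 10       # quality per alternate observation
        and site["SAF"] > 0 and site["SAR"] > 0  # alternate allele seen on both strands
        and site["RPR"] > 1 and site["RPL"] > 1  # reads placed to both sides of the site
        and site["RO"] > 1                       # reference observations
    )


# Hypothetical sites, invented for this example.
good_site = {"QUAL": 250.0, "AO": 12, "RO": 3, "SAF": 6, "SAR": 6, "RPR": 5, "RPL": 7}
one_strand_site = dict(good_site, SAR=0)      # alternate allele on one strand only
low_quality_site = dict(good_site, QUAL=60.0)  # 60 / 12 = 5, below the cutoff of 10

for name, site in [("good", good_site), ("one strand", one_strand_site),
                   ("low quality", low_quality_site)]:
    print(name, passes_second_filter(site))
```

Writing the criteria as one conjunction makes it explicit that a site must satisfy all of them to be retained.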

OrthoMCL v. 2.0.9 (Li et al. 2003) was used to identify single-copy orthologs among the 22 nuclei. Single-copy orthologs allow the amino acid or nucleotide sequences of a region present in all 22 nuclei to be compared and convey the level of polymorphism in each nucleus. Freebayes (Garrison & Marth 2012) was used with the aforementioned parameters to detect and filter variants among the single-copy orthologs.
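What "single copy in all nuclei" means can be made concrete with a short sketch. It assumes that ortholog groups have already been parsed into a mapping from a group identifier to (nucleus, gene) pairs; that layout, the function name, and the toy identifiers are hypothetical and do not represent OrthoMCL's own output format.

```python
from collections import Counter


def single_copy_groups(groups: dict[str, list[tuple[str, str]]],
                       nuclei: set[str]) -> list[str]:
    """Return groups with exactly one gene from every nucleus and no extra copies."""
    selected = []
    for group_id, members in groups.items():
        copies = Counter(nucleus for nucleus, _gene in members)
        if set(copies) == nuclei and all(count == 1 for count in copies.values()):
            selected.append(group_id)
    return selected


# Hypothetical example with three nuclei instead of 22.
toy_nuclei = {"n1", "n2", "n3"}
toy_groups = {
    "OG1": [("n1", "g10"), ("n2", "g20"), ("n3", "g30")],                 # single copy
    "OG2": [("n1", "g11"), ("n1", "g12"), ("n2", "g21"), ("n3", "g31")],  # duplicated in n1
    "OG3": [("n1", "g13"), ("n2", "g22")],                                # missing in n3
}
print(single_copy_groups(toy_groups, toy_nuclei))
```

Both conditions are strict, which helps explain why only a small number of groups can qualify when every nucleus assembly is incomplete.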

MAT Locus

An HD2 sequence from Claroideoglomus claroideum, a species in the same genus as C. candidum (GenBank accession number MH445375), was used as the query sequence in blast v. 2.7.1+ against all 24 nuclei to determine the presence and location of the MAT locus in C. candidum. The two low-quality nuclei were included in the search to see if the MAT locus was present in their fragmented sequences. The HD2 sequence specific to C. candidum was extracted together with the contigs containing the MAT locus, and these were then aligned using MAFFT v. 7.407 (Katoh & Standley 2013) with default settings, followed by manual inspection of the alignment.
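Manual inspection of an alignment can be complemented by a simple programmatic check for variable columns. The sketch below assumes the aligned sequences are already loaded as equal-length strings with "-" for gaps; the function name and the toy sequences are invented, and gaps are ignored rather than counted as differences, which is an assumption of this example.

```python
def variable_columns(aligned: dict[str, str], gap: str = "-") -> list[int]:
    """Return 0-based alignment columns where the non-gap bases differ."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    variable = []
    for column in range(lengths.pop()):
        bases = {seq[column].upper() for seq in aligned.values()} - {gap}
        if len(bases) > 1:
            variable.append(column)
    return variable


# Hypothetical alignment fragments, not data from this study.
identical = {"nucleus_a": "ATGC-A", "nucleus_b": "ATGCTA", "nucleus_c": "--GCTA"}
with_variant = dict(identical, nucleus_d="ATGCTG")

print(variable_columns(identical))
print(variable_columns(with_variant))
```

An empty result corresponds to an alignment without differences among the sequences, apart from positions that are missing in some of them.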

Results

Reference Genome

The nuclear genome of C. candidum was sequenced and assembled. The whole genome assembly was 87.60 Mb in size with 17,542 genes (Table 2). Repeats make up 44.7% of the full assembly. Adding the percentages of complete and fragmented genes, the full assembly had a BUSCO score of 86.2% (Table 2). When constructing the full genome assembly, only the MaSuRCA assemblies of the 8 best-quality nuclei were used (Table 3). As the number of assembled single nuclei increased, the size of the genome continued to inflate, as seen in Figure 1. However, the quality of the genome, as estimated by BUSCO completeness, did not improve with an increasing number of nuclei. The 8-nuclei assembly had a higher completeness, with a high number of single-copy genes and a low number of duplicated genes, compared to the 24-nuclei assembly (Figure 2). The size of the 8-nuclei assembly was very close to the genome size estimated by KmerGenie (87 Mb). The choice of eight nuclei was based on a combination of criteria: among the combinations tested, it had the highest number of single-copy core genes together with high completeness and a low number of duplicated genes (Table 4).
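The BUSCO percentages quoted here follow directly from the gene counts in Table 4. The sketch below recomputes them for the 8-nuclei combination (196 single, 33 duplicated, 21 fragmented, 40 missing); the function name is hypothetical, and the formula assumes that completeness counts single plus duplicated genes over all searched genes, which is consistent with the values in Table 4.

```python
def busco_summary(single: int, duplicated: int, fragmented: int,
                  missing: int) -> tuple[float, float, float]:
    """Return (% complete, % fragmented, % complete + fragmented)."""
    total = single + duplicated + fragmented + missing
    complete_pct = 100 * (single + duplicated) / total
    fragmented_pct = 100 * fragmented / total
    return complete_pct, fragmented_pct, complete_pct + fragmented_pct


# Counts for the 8-nuclei combination, taken from Table 4.
complete, fragmented, combined = busco_summary(196, 33, 21, 40)
print(f"C: {complete:.1f}%  F: {fragmented:.1f}%  C + F: {combined:.1f}%")
```

The combined value rounds to the 86.2% reported for the full assembly.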

Figure 1 Overview of the assembly size for each nuclei combination. The number of bases in the assembly (y-axis, total length) continues to increase with each additional nucleus (x-axis, number of nuclei, 5 to 24).

Figure 2 Comparison of assembly stats for different number combinations of nuclei. The numbers of single (blue), duplicated (orange), and fragmented (gray) genes were used as criteria to determine the best number of nuclei for the whole genome assembly. As the number of nuclei increases, the number of duplicated genes increases while the number of single genes decreases. The red line shows the N50 contig length, which is highest between 7 and 14 nuclei. Axes: number of nuclei (x), number of genes and contig length (y).

Individual Nuclei Assemblies

Two of the 24 nuclei, 4 and 7, were excluded from subsequent analysis due to the poor quality of their assemblies (Table 1). These two nuclei had assembly sizes of 4.43 Mb and 2.18 Mb and BUSCO scores (complete plus fragmented) of 7.9% and 4.8%, well below both the average completeness (51.41%) and the lowest completeness (29.7%) of the other nuclei (Table 1). The average assembly size of the other nuclei was 41.55 Mb, ranging from 28.32 Mb to 52.59 Mb. Among these nuclei, BUSCO completeness ranged from 29.7% to 69.3%, with an average of 51.41%.

Single Nucleotide Polymorphisms

Eleven single-copy orthologs were shared between the 22 nuclei and had between 0 and 2 SNPs per ortholog (0.0013 SNPs/kb). The SNP densities for the whole genome assembly (0.96 SNPs/kb) and for the assembly without repeats (0.98 SNPs/kb) were lower than the SNP density in coding regions alone (1.22 SNPs/kb) (Table 5). Figure 3 shows an example of the SNPs observed between nuclei on one contig: grey marks nucleotides identical to the reference, blue marks nucleotides that differ from the reference, and white indicates positions not found in that nucleus.
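SNP density is the number of SNPs divided by the sequence length in kilobases. The sketch below recomputes it from the sizes and SNP counts listed in Table 5, assuming 1 Mb = 1,000 kb; the function name is hypothetical. The computed values can differ from the reported ones in the last digit, depending on how the reported figures were rounded.

```python
def snp_density_per_kb(snp_count: int, size_mb: float) -> float:
    """SNPs per kilobase for a region of the given size in megabases."""
    if size_mb <= 0:
        raise ValueError("region size must be positive")
    return snp_count / (size_mb * 1000.0)  # assumes 1 Mb = 1,000 kb


# Region sizes (Mb) and SNP counts as listed in Table 5.
table5 = {
    "Full": (87.60, 84901),
    "Without repeats": (48.43, 47787),
    "Coding regions": (15.81, 19249),
}
densities = {name: snp_density_per_kb(snps, size) for name, (size, snps) in table5.items()}
for name, value in densities.items():
    print(f"{name}: {value:.3f} SNPs/kb")
```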

MAT locus

Using the HD2-like region of the MAT locus from C. claroideum, part of the MAT locus was identified in C. candidum. However, only one allele of the MAT locus was found, in 20 of the 24 nuclei. The 20 contigs containing the MAT locus were aligned and showed no differences among them (Figure 4). These two observations indicate that C. candidum is homokaryotic for the mating-type locus.

Figure 3 Several SNP variations seen in the reads among the 24 nuclei. The reference assembly sequence is located at the bottom. SNPs are shown as light blue markers. Gray markers are nucleotides that match the reference sequence. White markers indicate that the nucleotide is missing in that nucleus.

Figure 4 Comparison of a segment of the MAT locus. This segment showed no variation in the HD2 region of the MAT locus among the nuclei. The nuclei with empty rows (11, 12, 20) had matches further down the sequence. This segment was chosen because it captured the most overlap among all the nuclei for better visualization.

Table 1 Assembly stats of the individual nuclei generated with the SPAdes assembler. Nuclei 4 and 7 (shown in red in the original table) were excluded due to low-quality assemblies.

Nucleus  Assembly Size (Mb)  No. of Contigs  N50 Contig  Largest Contig (bp)  GC %   BUSCO %          No. of Genes  Repeats (Mb)
1        38.63               10431           7367        52986                27.81  C: 49.3 F: 12.1  10798         12.25
2        52.59               11952           10698       86679                27.8   C: 69.3 F: 9.0   12889         20.59
3        41.85               10228           9033        106112               27.82  C: 52.1 F: 11.4  10646         15.35
4        4.43                1087            8762        47580                27.86  C: 5.5 F: 2.4    --            --
5        45.96               11412           8891        70965                27.81  C: 56.2 F: 10.0  11855         17.18
6        37.68               10296           7742        61796                27.8   C: 45.5 F: 10.7  10362         13.36
7        2.18                617             7469        65913                27.86  C: 3.1 F: 1.7    --            --
8        48.65               11616           10114       84385                27.81  C: 61.0 F: 10.0  12113         18.56
9        40.33               10349           8255        70206                27.79  C: 46.9 F: 13.1  10815         13.09
10       40.73               10396           8403        73020                27.8   C: 51.1 F: 11.0  10897         13.1
11       47.64               11310           10034       69919                27.8   C: 58.3 F: 8.6   11619         18.25
12       28.32               8491            6046        45615                27.82  C: 29.7 F: 13.4  8541          8.3
13       52.56               11559           10831       76762                27.8   C: 68.3 F: 7.6   12507         20.2
14       51.04               11780           10006       76861                27.82  C: 59.7 F: 12.1  12496         19.8
15       44.98               10949           9631        60484                27.8   C: 54.8 F: 12.1  11598         15.28
16       35.46               9424            7588        63419                27.82  C: 42.4 F: 13.1  9800          11.11
17       42.52               9696            9529        56725                27.82  C: 56.2 F: 9.0   10847         13.7
18       41.69               10441           8208        58320                27.82  C: 52.8 F: 12.1  11090         13.69
19       38.63               9558            8366        51615                27.82  C: 44.5 F: 12.4  10292         12.2
20       35.58               9977            7006        52471                27.81  C: 44.8 F: 9.7   10075         10.84
21       29.72               9361            5630        45369                27.85  C: 35.8 F: 14.5  9312          8.57
22       38.03               7556            12024       74356                27.82  C: 45.8 F: 7.2   9166          11.95
23       45.56               10617           9499        61888                27.82  C: 61.4 F: 9.0   11814         14.67
24       36.06               8476            9251        70117                27.91  C: 45.2 F: 12.4  9435          11.09

Table 2 Assembly stats for the full genome assembly built from the MaSuRCA individual nuclei assemblies

Assembly      Assembly Size (Mb)  No. of Contigs  N50 Contig  Largest Contig (bp)  GC %   BUSCO %          No. of Genes  Repeats (Mb)
Whole Genome  87.59               11334           11355       104398               27.65  C: 78.9 F: 7.24  17542         39.16

Table 3 Individual nuclei MaSuRCA assemblies. * Nuclei used for reference genome assembly.

Nuclei  # contigs  Largest contig  Total length  N50   Single  Duplicated  Total single and duplicated
1*      8106       18987           29489618      3766  20      12          32
2*      11388      30797           46219418      4329  42      11          53
3*      8627       26962           32096789      3875  25      10          35
4*      585        15627           2090392       3570  3       1           4
5*      9769       29656           38988991      4289  26      11          37
6*      8304       25787           32011304      4077  26      11          37
7*      129        9349            439228        3557  0       1           1
8*      10270      25362           42205745      4480  34      9           43
9*      8391       21426           32372008      4096  23      12          35
10*     8335       24135           31974802      4033  28      9           37
11*     10001      22941           39924909      4297  30      14          44
12*     5604       18777           19121213      3490  15      11          26
13*     11447      30619           47577698      4533  44      12          56
14*     10920      30682           43835385      4272  34      16          50
15*     8781       34668           35233894      4309  30      13          43
16*     7113       20841           25697597      3743  22      9           31
17*     8858       26845           34123824      4050  25      10          35
18*     8862       20865           33217182      3930  30      11          41
19*     7998       28571           29685933      3873  21      18          39
20*     7410       17016           26649828      3727  23      8           31
21*     6201       15985           20860658      3434  16      8           24
22*     7177       27037           28499372      4276  23      9           32
23*     9678       26744           37318671      4057  34      11          45
24*     7234       26211           27976344      4099  25      11          36

Table 4 Reference assemblies for different nuclei combinations. The numbers of single and duplicated genes and the completeness were considered when choosing the number of nuclei to use. * Number of nuclei chosen for the reference genome assembly.

Nuclei combination  Single  Duplicated  Fragmented  Missing  % Completeness (C)  % Fragmented (F)  % C & F  N50
5                   195     32          21          42       78.275              7.241             85.517   10496
6                   195     32          21          42       78.275              7.241             85.517   10496
7                   191     37          18          44       78.620              6.206             84.827   11314
8*                  196     33          21          40       78.965              7.241             86.206   11355
9                   194     36          19          41       79.310              6.551             85.862   11221
10                  190     39          21          40       78.965              7.241             86.206   11373
11                  183     40          18          49       76.896              6.206             83.103   11510
12                  180     42          22          46       76.551              7.586             84.137   11479
13                  175     46          22          47       76.206              7.586             83.793   11334
14                  175     48          23          44       76.896              7.931             84.827   11187
15                  166     53          23          48       75.517              7.931             83.448   11068
16                  167     45          24          54       73.103              8.275             81.379   10609
17                  167     53          19          51       75.862              6.551             82.413   10472
18                  166     55          24          45       76.206              8.275             84.482   10499
19                  177     49          25          39       77.931              8.620             86.551   10266
20                  174     47          23          46       76.206              7.931             84.137   9854
21                  161     49          25          55       72.413              8.620             81.034   9658
22                  158     51          33          48       72.068              11.379            83.448   9517
23                  159     51          29          51       72.413              10.000            82.413   9399
24                  161     50          28          51       72.758              9.655             82.413   9469

Table 5 SNP density for the 8-nuclei assembly

Assembly         Size (Mbp)  # of SNPs  SNP Density (SNPs/kb)
Full             87.60       84901      0.96
Without repeats  48.43       47787      0.98
Coding regions   15.81       19249      1.22

Discussion

The full genome and individual nucleus assemblies give insight into the genetic structure of C. candidum. Although 86.2% genome completeness may not seem very high, it gives a good representation of the genome given the small amount of DNA that was available for sequencing from individual nuclei. To put the 86.2% completeness of C. candidum in perspective with other known AM fungal species, the Rhizophagus irregularis genome sequenced by Lin (2014), who used monoxenic cultures, reached a completeness of 97%. The C. candidum assembly was based on an 8-nuclei combination that had higher completeness than other combinations such as the 24-nuclei assembly (82.4%). The 8-nuclei assembly had a size of 87.6 Mb, the highest number of single-copy core genes (196), and a low number of duplicated core genes (33), while the 24-nuclei assembly, at 107.5 Mb, had 161 single-copy genes and 50 duplicated genes.

Here we see that using more nuclei for creating an assembly did not contribute to the quality of the assembly. The additional nuclei were inflating the assembly size and increasing duplications of supposedly single copy genes instead of adding new single copy genes.

The SNP density for C. candidum was 1.22 SNPs/kb in its coding regions and even lower, at 0.98 SNPs/kb, in the whole assembly without repeats. As mentioned previously, the repeats were removed to obtain a SNP density with fewer variant-calling errors. Even among the 11 single-copy orthologs found, the number of SNPs observed ranged from 0 to 2 per orthologous gene among the nuclei. Whichever density is considered, there is still genetic variation among the nuclei, which brings us back to how we define homokaryosis and heterokaryosis. If we use a strict definition that all the nuclei must be genetically identical, with no SNPs, to be homokaryotic, then C. candidum would be heterokaryotic. However, there is a fungal species with known heterokaryotic and homokaryotic isolates, the pathogenic fungus Puccinia striiformis f. sp. tritici. The heterokaryotic isolate has an average of 5.29 SNPs/kb, whereas its homokaryotic counterpart has 0.41 SNPs/kb on average (Cantu et al. 2013). The variations seen in the homokaryotic isolates are from non-synonymous mutations (Cantu et al. 2013). Using the SNP densities of P. striiformis as a reference, C. candidum would be classified as homokaryotic rather than heterokaryotic. Even though the SNP density of C. candidum is greater than that of the homokaryotic P. striiformis isolate, it does not exceed the density of the heterokaryotic isolate. But certain cases complicate the genetic classification. For example, Tuber melanosporum is a homokaryotic fungus with 0.06 SNPs/kb, while Laccaria bicolor is a dikaryotic fungus with 0.78 SNPs/kb (Tisserant et al. 2013). Each species has different SNP densities that correspond to heterokaryosis or homokaryosis. These two examples demonstrate that it is not possible to determine a single clear threshold for heterokaryosis.

Locating and confirming the presence of the MAT locus in the remaining nuclei would help determine whether C. candidum is homokaryotic or heterokaryotic for this locus. However, the definitions of heterokaryosis used across studies are not fully aligned. In the Ropars et al. (2016) study, the heterokaryotic isolates harboring two alleles of the MAT locus also have SNP variation of less than one SNP/kb. This does not follow the categorization based on P. striiformis, which would classify such isolates as homokaryons. Using BLAST, HD2 was found in 20 of the 24 nuclei without variation.

All the evidence we gathered indicates that C. candidum is a homokaryon, based on the presence of a single MAT locus allele, as per the definition in Ropars et al. (2016). The low level of polymorphism falls below that of the heterokaryotic P. striiformis isolate, further supporting homokaryosis. But when compared to L. bicolor, C. candidum would be considered heterokaryotic, which leads to the problem of how to define these classifications.

Determining which type of organization is observed in this genome depends on which definition is followed. Under the strict definition, a homokaryon contains genetically identical nuclei, so any observed variation would make the individual a heterokaryon. The other definition follows the thresholds of variation observed in known homokaryons and heterokaryons. Bever (2008) defined the genetic organization on a spectrum of heterokaryosity instead of having two clear-cut definitions. This would help explain the small variation seen in species known to be homokaryotic. I think that using Bever’s degree of heterokaryosity may be a better approach to describing the genetic organization of AM fungi. Compared to other known heterokaryotic species, the genetic variation among nuclei within a single spore is low in C. candidum.

It is not certain that C. candidum is a homokaryon, since this study focused on the variation within a single spore and there could be variation between spores. It would be interesting to see whether there is more variation between nuclei from a different spore of the same strain. It is possible that C. candidum shares the pattern of having both homokaryotic and heterokaryotic isolates, as in Ropars et al. (2016) and Cantu et al. (2013). Understanding where these variations occur within the genome can reveal how impactful they are. This can also be used to compare C. candidum with other species and see whether they share the same level of variation. We could then place these species on the spectrum of heterokaryosity and see how they compare with each other. Knowing the genetic structure of AM fungi could play a crucial part in agriculture. It would be interesting to see whether a heterokaryotic AM fungus contains different nuclei that are specific to multiple different hosts, as seen in Angelard et al. (2014), and whether a homokaryotic one specializes in a single host. This information could help with planning which crops to plant and which AM fungal inoculum to pair with them to increase production, as seen with the potato yield in Hijri (2016). With the growing human population and potential food shortages, efficient food production is especially necessary.

References

Angelard C, Tanner CJ, Fontanillas P, Niculita-Hirzel H, Masclaux F, Sanders IR. 2014. Rapid genotypic change and plasticity in arbuscular mycorrhizal fungi is caused by a host shift and enhanced by segregation. The ISME Journal 8: 284–294.

Bever JD. 2002. Negative Feedback within a Mutualism: Host-Specific Growth of Mycorrhizal Fungi Reduces Plant Benefit. Proceedings: Biological Sciences 269: 2595–2601.

Bushnell B. 2014. BBMap short read aligner. Joint Genome Institute, Department of Energy. doi:10.1016/j.avsg.2010.03.022

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

Campos-Soriano L, García-Martínez J, Segundo BS. 2012. The arbuscular mycorrhizal symbiosis promotes the systemic induction of regulatory defence-related genes in rice leaves and confers resistance to pathogen infection. Molecular Plant Pathology 13: 579–592.

Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Alvarado AS, Yandell M. 2008. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18: 188–196.

Cantu D, Segovia V, MacLean D, Bayles R, Chen X, Kamoun S, Dubcovsky J, Saunders DG, Uauy C. 2013. Genome analyses of the wheat yellow (stripe) rust pathogen Puccinia striiformis f. sp. tritici reveal polymorphic and haustorial expressed secreted proteins as candidate effectors. BMC Genomics 14: 270.

Chen S, Jin W, Liu A, Zhang S, Liu D, Wang F, Lin X, He C. 2013. Arbuscular mycorrhizal fungi (AMF) increase growth and secondary metabolism in cucumber subjected to low temperature stress. Scientia Horticulturae 160: 222–229.

Chen EC, Mathieu S, Hoffrichter A, Sedzielewska-Toro K, Peart M, Pelin A, Ndikumana S, Ropars J, Dreissig S, Fuchs J, Brachmann A, Corradi N. 2018. Single nucleus sequencing reveals evidence of inter-nucleus recombination in arbuscular mycorrhizal fungi. eLife 7: e39813.

Chikhi R, Medvedev P. 2014. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30: 31–37.

Cordes EE, Arthur MA, Shea K, Arvidson RS, Fisher CR. 2005. Modeling the Mutualistic Interactions between Tubeworms and Microbial Consortia. PLOS Biology 3: e77.

Fraser JA, Heitman J. 2003. Fungal mating-type loci. Current Biology 13: R792–R795.

Garrison E, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio]

Garrison E. 2018. Vcflib, a simple C++ library for parsing and manipulating VCF files. https://github.com/vcflib/vcflib.

Grabherr MG. 2018. Lingon: A d-mer based genome assembly pipeline.

Gremme G, Steinbiss S, Kurtz S. 2013. GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations. IEEE/ACM Trans Comput Biol Bioinformatics 10: 645–656.

Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29: 1072–1075.

Heckman DS, Geiser DM, Eidell BR, Stauffer RL, Kardos NL, Hedges SB. 2001. Molecular Evidence for the Early Colonization of Land by Fungi and Plants. Science 293: 1129–1133.

Hijri M. 2016. Analysis of a large dataset of mycorrhiza inoculation field trials on potato shows highly significant increases in yield. Mycorrhiza 26: 209–214.

Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236–1240.

Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30: 772–780.

Kiers ET, Duhamel M, Beesetty Y, Mensah JA, Franken O, Verbruggen E, Fellbaum CR, Kowalchuk GA, Hart MM, Bago A, Palmer TM, West SA, Vandenkoornhuyse P, Jansa J, Bücking H. 2011. Reciprocal Rewards Stabilize Cooperation in the Mycorrhizal Symbiosis. Science 333: 880–882.

Kim SJ, Eo J-K, Lee E-H, Park H, Eom A-H. 2017. Effects of Arbuscular Mycorrhizal Fungi and Soil Conditions on Crop Plant Growth. Mycobiology 45: 20–24.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25: 1754–1760.

Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993.

Li L, Stoeckert CJ, Roos DS. 2003. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Research 13: 2178–2189.

Lin K, Limpens E, Zhang Z, Ivanov S, Saunders DGO, Mu D, Pang E, Cao H, Cha H, Lin T, Zhou Q, Shang Y, Li Y, Sharma T, van Velzen R, de Ruijter N, Aanen DK, Win J, Kamoun S, Bisseling T, Geurts R, Huang S. 2014. Single Nucleus Genome Sequencing Reveals High Similarity among Nuclei of an Endomycorrhizal Fungus. PLoS Genetics 10: e1004078.

Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology 2012:

Marleau J, Dalpé Y, St-Arnaud M, Hijri M. 2011. Spore development and nuclear inheritance in arbuscular mycorrhizal fungi. BMC Evolutionary Biology 11: 51.

Martin F. 2016. Molecular Mycorrhizal Symbiosis, 1st ed. John Wiley & Sons, Incorporated.

Mathur S, Sharma MP, Jajoo A. 2018. Improved photosynthetic efficacy of maize (Zea mays) plants with arbuscular mycorrhizal fungi (AMF) under high temperature stress. Journal of Photochemistry and Photobiology B: Biology 180: 149–154.

Mikula P, Hadrava J, Albrecht T, Tryjanowski P. 2018. Large-scale assessment of commensalistic–mutualistic associations between African birds and herbivorous mammals using internet photos. PeerJ 6: e4520.

Montoliu-Nerin M, Sánchez-García M, Bergin C, Grabherr M, Ellis B, Kutschera VE, Kierczak M, Johannesson H, Rosling A. 2019. From single nuclei to whole genome assemblies. bioRxiv 625814.

NHGRI. 2016. The Cost of Sequencing a Human Genome | NHGRI. online July 6, 2016: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost. Accessed May 14, 2019.

Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. 2015. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Frontiers in Genetics, doi 10.3389/fgene.2015.00235.

Parniske M. 2008. Arbuscular mycorrhiza: the mother of plant root endosymbioses. Nature Reviews Microbiology 6: 763–775.

Redecker D, Kodner R, Graham LE. 2000. Glomalean Fungi from the Ordovician. Science 289: 1920–1921.

Ropars J, Corradi N. 2015. Homokaryotic vs heterokaryotic mycelium in arbuscular mycorrhizal fungi: different techniques, different results? New Phytologist 208: 638–641.

Ropars J, Toro KS, Noel J, Pelin A, Charron P, Farinelli L, Marton T, Krüger M, Fuchs J, Brachmann A, Corradi N. 2016. Evidence for the sexual origin of heterokaryosis in arbuscular mycorrhizal fungi. Nature Microbiology 1: 16033.

Sharman A. 2001. The many uses of a genome sequence. Genome Biology 2: reports4013.1.

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31: 3210–3212.

Smit AFA, Hubley R. 2008-2015. RepeatModeler Open-1.0. <http://www.repeatmasker.org>.

Smit AFA, Hubley R, Green P. 2013-2015. RepeatMasker Open-4.0. <http://www.repeatmasker.org>.

Souza T. 2015. Overview. In: Souza T (ed.). Handbook of Arbuscular Mycorrhizal Fungi, pp. 1–8. Springer International Publishing, Cham.

Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Research, doi 10.1101/gr.081612.108.

Tisserant E, Malbreil M, Kuo A, Kohler A, Symeonidi A, Balestrini R, Charron P, Duensing N, Frei dit Frey N, Gianinazzi-Pearson V, Gilbert LB, Handa Y, Herr JR, Hijri M, Koul R, Kawaguchi M, Krajinski F, Lammers PJ, Masclaux FG, Murat C, Morin E, Ndikumana S, Pagni M, Petitpierre D, Requena N, Rosikiewicz P, Riley R, Saito K, San Clemente H, Shapiro H, van Tuinen D, Bécard G, Bonfante P, Paszkowski U, Shachar-Hill YY, Tuskan GA, Young JPW, Sanders IR, Henrissat B, Rensing SA, Grigoriev IV, Corradi N, Roux C, Martin F. 2013. Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis. Proceedings of the National Academy of Sciences of the United States of America 110: 20117–20122.

Treangen TJ, Salzberg SL. 2011. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics 13: 36–46.

Zhang H, Xu N, Li X, Long J, Sui X, Wu Y, Li J, Wang J, Zhong H, Sun GY. 2018. Arbuscular Mycorrhizal Fungi (Glomus mosseae) Improves Growth, Photosynthesis and Protects Photosystem II in Leaves of Lolium perenne L. in Cadmium Contaminated Soil. Frontiers in Plant Science, doi 10.3389/fpls.2018.01156.


Abstract

In this study I applied de novo genome assembly and annotation to 24 individual nuclei from a single spore of Claroideoglomus candidum. The full length of the de novo genome assembly was 87.6 Mb with 17,542 genes. Estimated polymorphism between the nuclei was very low. I identified the MAT locus in C. candidum, using a previously sequenced MAT locus from a congeneric species. Only one of the MAT locus alleles was found in the examined spore. The evidence points towards homokaryosis as the genetic organization of Claroideoglomus candidum.

Contents

Acknowledgements

Abstract

Introduction

AM Fungal Symbiosis

Evolutionary Persistence of AMF

Genome Sequencing

Project Aims

Methods

Origin of reads

De novo genome assembly

Genome Annotations

Variant Calling

MAT Locus

Results

Reference Genome

Individual Nuclei Assemblies

Single Nucleotide Polymorphisms

MAT locus

Discussion

References


Introduction

AM Fungal Symbiosis

AM fungal symbiosis, which was established before mutualistic interactions evolved between insects and vertebrates, was arguably the essential driving force for successful plant colonization on land (Kiers et al. 2011, Redecker et al. 2000, Heckman et al. 2001). This symbiotic relationship can have a deep impact on agricultural production. In order to achieve more environmentally friendly agricultural processes, we need a better understanding of how to harmonize all aspects of the agricultural environment, including this plant-fungal relationship. Not only do we need to understand which crops would be best suited for the plot of land, but we also need to be aware of the microbiota that thrive beneath the surface. The presence of AM fungi facilitates nutrient uptake by capturing and directing nitrogen and phosphorus to the plant, in exchange for

2018).

Understanding and utilizing AM fungi in agricultural practices could reduce the use of chemical fertilizers and pesticides. However, one challenge is that AM fungi express species-dependent host preferences, which can make it difficult to pair them with crop species (Angelard et al. 2014, Kim et al. 2017). In a field study, Hijri (2016) evaluated potato yield in plots inoculated with AM fungi and found an overall increase in yield compared to uninoculated plots. However, some inoculated plots experienced a decrease in yield compared to the uninoculated plots, revealing other potential causes of reduction. Hijri (2016) advanced several hypotheses that could explain this reduction: poor application of the inoculum with insufficient agitation, pathogen attacks, and competition between the AM fungi in the inoculum and indigenous AM fungal populations. Understanding the local soil community dynamics and how they affect AM fungi can be crucial to improving product yield.

(2014) suggests that the fungi show potential for adaptability due to their ability to alter their nucleotype frequencies to better suit their environment or host. If the AM fungi fuse with different plant species simultaneously, the nuclei within the hyphal network may be genotypically different.

Evolutionary Persistence of AMF

many consider them to be ancient asexuals that defy the basis of evolutionary theory by persisting for 450 million years (Parniske 2008). While sexual reproduction has not been explicitly observed in AM fungi, it has been inferred to occur due to the presence of a putative “mating-type” locus (MAT locus) similar to the mating-type locus of Basidiomycete fungi (Ropars et al. 2016).

“mating-type locus” (MAT locus) (Ropars et al. 2016). The MAT locus was located in Rhizophagus irregularis isolates, revealing that R. irregularis produces either homokaryotic or heterokaryotic mycelia. Within the MAT locus there are two open reading frames that contain a homeodomain-like region, designated HD1-like and HD2. The heterokaryotic isolates have two alleles of HD1-like and HD2, whereas homokaryotic isolates have only one (Ropars et al. 2016).

Genome Sequencing

This opened avenues of new research for many fields of biology. Assembling genomes unlocks more information about the species of interest, such as identifying proteins, uncovering regulatory pathways, or evaluating the differences between or within species (Sharman 2001).

Determining whether the species is heterokaryotic or homokaryotic will depend on how each is defined. One strict definition is that homokaryosis is when genetic composition among the