SEQUENCING THE GENOMIC DNA OF ANODONTA ANATINA USING OXFORD NANOPORE TECHNOLOGY

Academic year: 2021

Master Degree Project in Bioscience, One-Year Level, 60 ECTS

Saad Shikh Khaled, A19saask@student.his.se
Supervisor: Mikael Ejdebäck, Mikael.ejdeback@his.se
Examiner: Erik Gustafsson, Erik.gustafsson@his.se

Abstract

Freshwater mussels are members of the phylum Mollusca that live in freshwater habitats such as lakes and rivers. They are ecologically essential in aquatic ecosystems: they have a high capacity for water purification and play a significant role in calcium recycling. The genomic DNA of many freshwater mussel species has not yet been sequenced. Such a sequence could be useful in developing a multi-biomarker panel to identify water pollution, and it could also help in developing a method to identify freshwater mussel species from their genomic DNA. This study aims to use nanopore sequencing technology to sequence the genomic DNA of Anodonta anatina, a freshwater mussel species common in Europe. The DNA used in this experiment was extracted from foot tissue, and two tissue homogenization methods were tested to determine the better approach. The genomic DNA was sequenced using an Oxford Nanopore MinION device, and the reads were assembled and polished using multiple software tools. The sequencing reads cover 3.5x of the estimated genome size of Anodonta anatina. Because about 20x coverage is required for a complete genome assembly, only a partial sequence of the genomic DNA was obtained in this experiment. The results indicate that nanopore sequencing could be used to sequence the genomes of freshwater mussels, but further sequencing runs are required to reach enough coverage to assemble the whole genome.

Popular scientific summary

Water pollution is a global issue that affects most countries around the world. It occurs when harmful substances such as chemicals contaminate a stream, river, lake, ocean, or other body of water, degrading water quality and making it toxic to humans or the environment. Because water is a natural solvent, most pollutants dissolve in it easily. The leading causes of water pollution are industrial waste, sewage and wastewater, mining activity, marine dumping, and accidental oil leakage. Early detection is critical for fighting water pollution and preventing it from increasing further. Two types of methods are currently used to detect the presence of a pollutant in water.

The first type, chemical methods, relies on tests that check for the presence of a specific chemical contaminant in water, such as magnesium or zinc. The second type, biological methods, relies on tests performed on organisms that are particularly vulnerable to contaminants, to check whether any contaminant in the water is affecting these organisms.

Freshwater mussels are essential in maintaining the aquatic environment; they have a high capacity for water purification and play a crucial role in calcium recycling in lakes. There are sixteen species of freshwater mussels in Europe, and seven of these species can be found in rivers and lakes across Sweden. In ecological monitoring, mussels have previously been used to indicate pollutants in water, such as copper, cadmium, and zinc. A previous study at the University of Skövde suggests that freshwater mussels could be used to develop a method that identifies pollution in water and gives an early warning of it. Such a method has the potential to replace chemical tests for detecting water pollution. Chemical tests lack robustness: only the chemicals that are tested for will be found or not found, so unknown and unexpected pollutants go undetected. Many laboratories and researchers around the world are trying to develop a multi-biomarker panel to identify water pollution. A multi-biomarker panel is a set of biological tests that determine the presence of multiple markers in a living organism. What makes these markers useful is that they are observed only in organisms living under specific environmental conditions. A marker can be a DNA sequence, a protein, or an enzyme that is found in organisms living under normal environmental conditions but not in organisms living in polluted conditions, or the reverse.

This study aims to determine the composition and order of the DNA of a freshwater mussel species living under standard environmental conditions. Once what is normal has been defined, it can be compared with the DNA of freshwater mussels living under abnormal conditions, to check whether the DNA of mussels living in clean water differs from that of mussels living in polluted water. Because of numerous challenges faced during this study, only part of the composition and order of the DNA of this freshwater mussel species was obtained. This partial sequence could be used to study the possibility of using freshwater mussels to develop a multi-biomarker panel; however, the complete DNA sequence is still required to confirm that this is possible. The study also involved a considerable effort to identify the optimal methods for obtaining the correct order and composition of the DNA of freshwater mussels.

Table of Contents

Abbreviations
Introduction
DNA extraction
Library preparation
Bioinformatic processing
Multi-biomarker panel
Aim
Materials and Methods
Sample preparation
Sequencing
Bioinformatic processing
Results
Sample preparation
Tissue homogenization
DNA extraction
Sequencing and Base-calling
Bioinformatic processing
Assembly
Polishing
Discussion
Sample preparation
Sequencing
Bioinformatic processing
Multi-biomarker panel
Conclusion
Ethical aspects, gender perspectives, and impact on the society
Future perspectives
Acknowledgments
References

Abbreviations

Bp     Base pair
CNS    Central nervous system
EPA    Environmental Protection Agency
Gbp    Giga base pair
IUCN   International Union for Conservation of Nature
Kbp    Kilo base pair
LFB    Long Fragment Buffer
Mbp    Mega base pair
NGS    Next-generation sequencing
ONT    Oxford Nanopore Technologies
PCR    Polymerase chain reaction
RFLP   Restriction fragment length polymorphism
SSR    Simple sequence repeats
WGS    Whole-genome shotgun sequencing

Introduction

Freshwater mussels are members of the phylum Mollusca that live in freshwater habitats such as lakes and rivers (Rosenberg, 2014). They are ecologically essential in aquatic ecosystems (Naimo, 1995). Freshwater mussels have a high capacity for water purification in lakes and rivers (Vaughn, 2018) and play a significant role in calcium recycling in water habitats (Green, Singh, & Bailey, 1985). They also perform other essential functions, such as particle filtration and processing, nutrient release, and sediment mixing (Vaughn & Hakenkamp, 2001). Freshwater mussels help maintain the ecosystems of the surrounding environment by purifying water through a process called bioremediation (Vaughn, 2018). Bioremediation is a waste management process in which waste in a contaminated environment is neutralized or removed by an organism (Prince, 2000). The Environmental Protection Agency (EPA) defines bioremediation as "a treatment that utilizes naturally occurring organisms to break-down a dangerous substance into a less hazardous or non-toxic substance."

There are sixteen species of unionid freshwater mussels in Europe, and seven of these can be found in Sweden (Osterling, Zulsdorff, & Schneider, 2012). The best-known freshwater mussels in Sweden are the freshwater pearl mussel (Margaritifera margaritifera) and the thick-shelled river mussel (Unio crassus). Both species are categorized as endangered on the IUCN Red List and the Swedish Red List (Osterling et al., 2012). Knowledge of freshwater mussel genomics is limited because genomic sequences for these species are lacking from the databases.

Nearly five decades have passed since the invention of methods that allow us to sequence the DNA of living organisms (Shendure, Mitra, Varma, & Church, 2004). DNA sequencing is the process of determining the order of the nucleotides (cytosine, thymine, adenine, and guanine) in a given DNA fragment, and it can be performed on animals, plants, and microorganisms. DNA sequencing can be used to determine the sequence of a single gene, whole chromosomes, or an entire genome.

Sequencing is also essential in molecular biology because it allows researchers to identify differences in genes that are associated with diseases and abnormal conditions. Sequencing enables the identification and diagnosis of viral infections as well as drug resistance testing, which is very helpful when designing medicines (Fei & Ng, 2019). For example, sequence information can be used to determine which regions of the DNA contain genes and which provide regulatory instructions. Most importantly, sequence data can highlight changes in a gene that may cause inherited diseases (França, Carrilho, & Kist, 2002).

The first DNA sequencing methods, which emerged in the 1970s, were the Maxam-Gilbert method, developed by and named for the American molecular biologists Allan M. Maxam and Walter Gilbert, and the Sanger method (or dideoxy method), developed by the English biochemist Frederick Sanger (Sanger, Nicklen, & Coulson, 1977). In the Sanger method, the more commonly used of the two approaches, a DNA chain is synthesized from a template strand, but chain extension stops whenever one of four possible dideoxynucleotides is incorporated, because these lack the 3' hydroxyl group needed to attach the next nucleotide. The method therefore produces many truncated DNA molecules, each ending at a site where a particular nucleotide occurs in the template DNA. The fragments are then separated by size using electrophoresis, and the nucleotide sequence is deduced by a computer (Griffiths, 2012). This method has several limitations: it can sequence only short pieces of DNA, it is expensive, and its throughput is low (Ari & Arikan, 2016). Because of these limitations, multiple DNA sequencing methods were developed that lower the cost and increase sequencing accuracy, precision, and throughput, such as next-generation sequencing (NGS) and MinION nanopore sequencing (Ku & Roukos, 2013).
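The chain-termination principle can be illustrated with a small Python simulation. This is a hypothetical teaching sketch, not a model of the chemistry: it assumes that every possible terminated product forms, that the template is written in the order in which the polymerase copies it, and that each of the four reactions (one per dideoxynucleotide) can be represented as a list of product lengths. Sorting all products by length, as electrophoresis does, reads out the synthesized strand.

```python
# Hypothetical sketch of Sanger chain termination (not a chemistry model).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_lanes(template):
    """Return, for each dideoxy base, the lengths of the products ending in it.

    The new strand is complementary to the template, so a product that stops
    at template position i has length i + 1 and ends with COMPLEMENT[template[i]].
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for i, base in enumerate(template.upper()):
        lanes[COMPLEMENT[base]].append(i + 1)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all products from shortest to longest."""
    base_at_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = sanger_lanes("TACGGATC")
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG
```

Each product length appears in exactly one lane, which is why the four separate reactions together are enough to reconstruct the full sequence.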

Nanopore sequencing was introduced by David Deamer at the University of California and by George Church and Daniel Branton, both at Harvard University. Beginning in the 1990s, academic laboratories reached a series of milestones toward a functional nanopore sequencing platform (Deamer et al., 2010). These milestones include the translocation of individual nucleic acid strands, processive enzymatic control of DNA at single-nucleotide precision, and single-nucleotide resolution (Jain, Olsen, Paten, & Akeson, 2016). Multiple companies have proposed nanopore-based sequencing methods, but only MinION-based sequencing has been used successfully by independent genomics laboratories around the world (Jain et al., 2016). When DNA passes through the nanopore, the ion current through the pore changes; this change depends on the shape, size, and length of the DNA sequence, and each type of nucleotide blocks the ion flow through the pore for a different period of time, which makes it possible to identify the order and position of the nucleotides (Rusk, 2009). The advantages of nanopore sequencing are that it requires no labeling of the DNA, no expensive fluorescent reagents, and no charge-coupled device (Rusk, 2009).

Despite all the scientific achievements of the past 50 years, sequencing the whole genome of an organism remains a complicated task. Sequencing an entire genome requires breaking the DNA into small fragments, sequencing the fragments, and assembling them into a long consensus sequence (Jung, Winefield, Bombarely, Prentis, & Waterhouse, 2019). The main steps of genomic DNA sequencing are shown in Figure 1.

Figure 1. The main steps involved in sequencing the whole genomic DNA of an organism.

The three most important steps in genomic DNA sequencing are DNA extraction, library preparation, and bioinformatic processing. A summary of these steps is given below.

DNA extraction

DNA extraction is the process of separating DNA from proteins, membranes, and other cellular materials (Kelly & Elkins, 2013). It requires careful handling of the samples to prevent contamination with other DNA or chemicals (Elkins & Kelly, 2013). DNA extraction has three necessary steps: cell lysis, DNA purification, and DNA precipitation. Cell lysis breaks the cell membrane to expose the DNA; it can be done physically, for example with a tissue homogenizer, or chemically, with a cell lysis buffer. Purification separates the DNA from the surrounding cellular components; it can be done chemically, using proteinase K, RNase, or the phenol-chloroform method, or physically, by centrifugation. The last step is precipitation of the DNA, which can be done with either ethanol or isopropanol (Kelly & Elkins, 2013).

Library preparation

Library preparation is the process of repairing and preparing the DNA for sequencing. It is achieved by attaching artificial DNA segments (adapters) to both ends of the DNA fragments (Gansauge & Meyer, 2013). These adapters allow priming of the sequencing reaction and, more importantly, enable amplification of the DNA library by polymerase chain reaction (PCR). By amplifying all the fragments that were successfully converted into library molecules, the library is effectively immortalized. This makes the library available both for whole-genome shotgun sequencing and for enriching genomic regions of interest via hybridization capture (Gansauge & Meyer, 2013). Finally, after library preparation, the DNA is conditioned and ready to be loaded into the sequencing device (Mikheyev & Tin, 2014).

Library preparation usually takes about half a day, but it depends on the protocol and the sequencing device being used (Mikheyev & Tin, 2014).

According to Oxford Nanopore Technologies, there are two types of DNA adapters that allow for DNA priming and DNA amplification through PCR. The first type, 1D adapters, allows the template DNA strand and the complementary strand to be sequenced as individual strands, which results in a very high data yield (reads) from a wide variety of sample types. The second type, 1D2 adapters, is a special type of adapter that increases the probability that the complementary strand will immediately follow the template strand during sequencing. When used with 1D2 analysis methods, this type of adapter produces fewer sequencing errors and more accurate reads. The choice of sequencing adapters is therefore determined by the purpose of the sequencing experiment.

If the sequencing experiment requires a high data yield, as in whole-genome sequencing, 1D sequencing adapters can be used. If the experiment requires high base accuracy and a low error rate, as in gene sequencing, 1D2 sequencing adapters can be used.

Bioinformatic processing

Bioinformatic processing is the most complicated and time-consuming part of genomic DNA sequencing. A flowchart of the steps of bioinformatic processing is shown in Figure 2. After the raw reads are obtained from the Oxford Nanopore sequencing device, the first step is base-calling, the computational process of translating raw electrical signals into a nucleotide sequence (Wick, Judd, & Holt, 2019). Base-calling can be done with different software tools, but the preferred software for raw MinION data is Guppy, which was developed by Oxford Nanopore Technologies. The next step after base-calling is genome assembly, the process of merging the small sequenced fragments into larger fragments (contigs) using assembly software such as Canu or Falcon (Foxman, 2012). The next step, contig polishing, aligns the base-called reads against the consensus to find possible errors and fixes them using the original reads or new reads (Zimin & Aleksey, 2019). After polishing, scaffolding can be done, which is the process of bridging the gaps between contigs to build a scaffold (Waterston, Lander, & Sulston, 2002).

Figure 2. Flowchart of the major steps needed to process raw nanopore sequencing data into a high-quality scaffold.
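To make the idea of merging fragments into contigs concrete, the following Python sketch performs a naive greedy overlap assembly: it repeatedly merges the pair of sequences with the longest exact suffix-prefix overlap. This is a deliberate simplification; assemblers such as Canu use an overlap-layout-consensus approach that tolerates sequencing errors, which this exact-match sketch does not. The minimum overlap length (min_len) is a hypothetical parameter chosen for illustration. Reads without sufficient overlap remain separate contigs, which is why low coverage leaves an assembly fragmented.

```python
# Toy greedy overlap assembler (illustration only, exact matches only).


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no pair overlaps; return the contigs."""
    contigs = list(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGA"]))  # ['ATGGCGTACGGA']
print(greedy_assemble(["ATGGCGT", "CCCAAA"]))             # ['ATGGCGT', 'CCCAAA']
```

The second call shows the effect of missing overlaps: with no shared sequence, the two reads cannot be joined and stay as separate contigs.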

Multi-biomarker panel

Obtaining the genomic sequence of the freshwater mussel Anodonta anatina will increase our understanding of the genomics and transcriptomics of this species. It will also help in developing a multi-biomarker panel for identifying water pollution. Water and air pollution are known to change various metabolites produced during metabolism in an organism. Many of these metabolites are reactive electrophilic compounds that can interact with DNA and cause many types of DNA damage, which can give rise to DNA mutations. DNA damage may ultimately contribute to the transformation of cells. Therefore, detecting DNA damage can provide a tool for monitoring and recognizing the genotoxicity of pollutants in a water environment (Liyan, Ying, & Guangxing, 2005). Such DNA mutations can be detected by various methods, including restriction fragment length polymorphism (RFLP) and variable number of tandem repeats (VNTR) analysis (Liyan et al., 2005). The genomic DNA of freshwater mussels living under normal environmental conditions can therefore be sequenced and compared with the genomic DNA of freshwater mussels living under polluted conditions. This comparison would allow identification of DNA mutations caused by water pollution and evaluation of whether these mutations can serve as biomarkers for water pollution. The genomic DNA sequence of Anodonta anatina could also be useful for determining which genes and proteins this species carries under normal and abnormal environmental conditions, which may lead to transcriptomic biomarkers for identifying water pollution.

[Figure 2 flowchart: Raw data → Base-calling → Quality control → Assembly → Polishing → Scaffolding]
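The principle behind RFLP can be sketched in silico: a restriction enzyme cuts DNA at a specific recognition sequence, so a point mutation that creates or destroys a site changes the pattern of fragment lengths. The Python sketch below uses the EcoRI site (GAATTC, cut between G and AATTC) as an example; the sequences are invented for illustration, and the sketch considers only one strand and ignores sticky ends.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the cut position within the site; for EcoRI (G^AATTC) it is 1.
    Only the given strand is considered.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Invented sequences: the variant carries an A->T point mutation (GAATTC -> GATTTC)
# that destroys the second EcoRI site.
reference = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
variant = "AAAA" + "GAATTC" + "CCCCC" + "GATTTC" + "TT"
print(digest(reference))  # [5, 11, 7]
print(digest(variant))    # [5, 18]
```

On a gel, the variant would show one band fewer, which is the kind of pattern difference RFLP analysis looks for.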

Aim

Knowledge of a genomic sequence can aid in identifying freshwater mussel species and, ultimately, in developing a multi-biomarker panel that identifies pollution in water and gives an early warning when pollution is present. This can be done by comparing the genomic DNA of freshwater mussels living under normal and abnormal environmental conditions to identify any DNA mutations resulting from pollution. These mutations can then be studied to determine whether they can be used as biomarkers for water pollution. The genomic sequence will also give a better understanding of the genomics of this organism and will allow analysis of its transcriptomics.

In this thesis, the aim is to sequence the genomic DNA of the freshwater mussel Anodonta anatina.

Mussels will be collected from a local source, and their DNA will be extracted and sequenced using a MinION sequencing device.

Materials and Methods

Sample preparation

Freshwater mussels were collected on 4 March from Näsbadet at Skärvalången lake near Skövde, Sweden. Two species of freshwater mussels (Anodonta anatina and Unio tumidus) are abundant in this lake. Nine freshwater mussels were collected from the lake, placed in a dry bucket, and covered with a wet towel. The water temperature in the lake was 2 °C, and the pH of the water was 6.8.

The superior and inferior valves of each mussel were cut with a scalpel, the shell was opened, and the mantle was removed with a scalpel. The foot, digestive gland, and gills were excised with a scalpel, placed in a 1.5 ml Eppendorf tube containing 1 ml of RNAlater solution (Invitrogen), and stored in a fridge at 4 °C until DNA extraction.

Two tissue homogenization methods were tested to determine which method yields longer DNA fragments. In the first method, 19.1 mg of foot tissue was ground in liquid nitrogen. In the second method, 16.9 mg of foot tissue was homogenized in a TissueLyser (Qiagen) for 40 seconds at 40 Hz, according to the TissueLyser handbook (Qiagen). The DNA was extracted using the Blood & Cell Culture DNA Midi Kit (Qiagen) with a Genomic-tip 20/G (Qiagen). The DNA concentration was measured with a Qubit 4.0 fluorometer following the dsDNA High Sensitivity assay kit protocol (Invitrogen). DNA purity was tested with a Nanodrop spectrophotometer (Saveen Werner). Fragment lengths were determined with a Fragment Analyzer (Advanced Analytical) following the DNF-915 dsDNA reagent kit protocol.

After the better homogenization method had been chosen, DNA for sequencing was extracted from a new sample. A 96.2 mg piece of foot tissue was homogenized using the TissueLyser (Qiagen), and the DNA was then extracted with a Genomic-tip 100/G (Qiagen) following the instructions in the Qiagen Genomic DNA Handbook. DNA concentration was measured with a Qubit 4.0 fluorometer (Thermo Fisher), purity was tested with a Nanodrop (Saveen Werner), and fragment lengths were determined with a fragment analyzer (Advanced Analytical).

The extracted DNA was prepared for sequencing following the Genomic DNA by Ligation (SQK-LSK109) protocol (ONT). One microgram of the extracted DNA was transferred into a 1.5 ml Eppendorf tube, the volume was adjusted to 49 μl with distilled water, and the sample was used for library preparation according to the protocol. After the preparation step, 1 μl of the eluted sample was quantified with a Qubit fluorometer using the dsDNA High Sensitivity assay kit (Invitrogen). Adapter ligation and clean-up were then performed following the same SQK-LSK109 protocol (ONT). Long Fragment Buffer (LFB) was used to retain DNA fragments longer than 3 kb and wash off smaller fragments. One microliter of the eluted sample was again quantified with a Qubit fluorometer using the dsDNA High Sensitivity assay kit (Invitrogen). The prepared library was stored on ice until it was loaded into the flow cell.

Sequencing

The library was loaded into a new flow cell, and the number of available pores was checked before and after the sequencing run. In total, 50 fmol of the final library was loaded into an R9.4.1 flow cell following the protocol. After priming the flow cell and loading the library, the flow cell was connected to a laptop running MinKNOW version 19.12.5, and sequencing was started in MinKNOW. The run duration was set to 13 hours, and the output format was set to fast5 files. Live base-calling was turned off during this stage to avoid overwhelming the computer's memory. After the run, the R9.4.1 flow cell was washed following the Flow Cell Wash Kit (EXP-WSH003) protocol (ONT) and stored at 4 °C.
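Loading amounts for nanopore libraries are given in moles (here 50 fmol), while DNA is measured by mass with the Qubit. The conversion below is a sketch assuming the common approximation of 660 g/mol per base pair of double-stranded DNA and a single representative fragment length. The 5000 bp value in the example is hypothetical, chosen for illustration; real libraries contain a distribution of fragment lengths, so the result is only an estimate.

```python
# Sketch: convert between mass and moles of dsDNA, assuming ~660 g/mol per bp.
AVG_MASS_PER_BP = 660.0  # g/mol per base pair (approximation)


def ng_to_fmol(mass_ng, length_bp):
    """fmol = ng * 1e6 / (660 * length in bp)."""
    return mass_ng * 1e6 / (AVG_MASS_PER_BP * length_bp)


def fmol_to_ng(amount_fmol, length_bp):
    """ng = fmol * 660 * length in bp / 1e6."""
    return amount_fmol * AVG_MASS_PER_BP * length_bp / 1e6


# Hypothetical example: a library whose fragments average 5000 bp.
print(round(fmol_to_ng(50, 5000), 1))    # 165.0 (ng needed for 50 fmol)
print(round(ng_to_fmol(1000, 5000), 1))  # 303.0 (fmol in 1 microgram)
```

Because the molar amount scales inversely with fragment length, the same 50 fmol corresponds to more DNA mass when the fragments are longer.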

Bioinformatic processing

The genome size of Anodonta anatina was estimated using Jellyfish version 2.2.3, a tool for counting k-mers in sequencing reads. Jellyfish was run with default parameters and seven threads, with the k-mer length set to 19, and its output was converted into a histogram using the same software. The histogram was analyzed with GenomeScope version 1.0 using default parameters. The resulting genome size was compared with the genome sizes of other freshwater mussel species on GenBank to obtain an estimated genome size for Anodonta anatina.
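The idea behind k-mer-based genome size estimation can be sketched as follows: if a genome of size G is sequenced to depth c, each distinct k-mer is seen about c times, so G is roughly the total number of k-mers counted divided by the depth of the main peak in the k-mer histogram. The Python sketch below illustrates only that idea, not Jellyfish or GenomeScope themselves: it counts k-mers on one strand only (Jellyfish is often run with canonical k-mers), treats depths below a hypothetical min_depth cutoff as sequencing errors, and does not fit the heterozygosity and repeat models that GenomeScope uses. The sequences are invented.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring in every read (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Total k-mers divided by the depth of the main peak.

    Depths below min_depth are assumed to come from sequencing errors.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy example: three error-free copies of a 16 bp sequence plus one erroneous read.
genome = "ACGTTGCAAGGCTTAC"
reads = [genome, genome, genome, "GGGGAC"]
hist = kmer_histogram(count_kmers(reads, k=5))
print(estimate_genome_size(hist))  # 12.0 (= 16 - 5 + 1 distinct 5-mers)
```

With noisy long reads many k-mers contain errors, which spreads counts into the low-depth region and is one reason such estimates become less reliable.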

The fast5 files generated by MinKNOW were base-called using Guppy (Wick, Judd, & Holt, 2020) with default parameters and the GPU version of the base-caller. Guppy offers two modes (fast and accurate), and the accurate mode was used in this experiment. The resulting fastq files were sorted into pass and fail folders using a quality score threshold of 7. The Guppy base-calling results were analyzed using pycoQC (Leger & Leonardi, 2019).

pycoQC was run with default parameters, using the base-calling summary file as input. The results of base-calling and quality control were saved on an external hard drive for further analysis. The passed fastq reads were corrected, trimmed, and then assembled using Canu (Koren & Walenz, 2020). Canu was run with seven threads and default assembly parameters, with the genome size set to 1.8 Gbp.
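The pass/fail split of base-called reads at a quality score of 7 can be illustrated with a short Python sketch. Each character in a FASTQ quality string encodes a Phred score Q (ASCII code minus 33), where the error probability is 10^(-Q/10). The sketch computes a read's mean quality by averaging error probabilities and converting back to the Phred scale, which is one common convention; whether Guppy uses exactly this calculation is an assumption here, and the read records are invented for illustration.

```python
import math


def mean_qscore(quality, offset=33):
    """Mean read quality: average the per-base error probabilities, then
    convert the mean error back to a Phred score (assumed convention)."""
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(errors) / len(errors))


def split_pass_fail(records, min_q=7.0):
    """Sort (read_id, sequence, quality) records into pass and fail lists."""
    passed, failed = [], []
    for record in records:
        target = passed if mean_qscore(record[2]) >= min_q else failed
        target.append(record)
    return passed, failed


reads = [
    ("read1", "ACGT", "++++"),  # '+' encodes Q10
    ("read2", "ACGT", "&&&&"),  # '&' encodes Q5
    ("read3", "ACGT", "++55"),  # mixture of Q10 and Q20 bases
]
passed, failed = split_pass_fail(reads)
print([r[0] for r in passed], [r[0] for r in failed])  # ['read1', 'read3'] ['read2']
```

Averaging error probabilities gives read3 a mean quality of about 12.6 rather than the arithmetic mean of 15, because the low-quality bases dominate the expected number of errors.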

The contigs produced by Canu (Koren & Walenz, 2020) were aligned to the golden mussel (Limnoperna fortunei) reference genome using minimap2 (Li, 2018). minimap2 was run with default parameters to map the sequences to the reference genome, producing a mapping file in SAM format.
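A SAM mapping file can be summarized with a few lines of Python. In SAM format each alignment line has tab-separated fields, the second of which is a bitwise FLAG; per the SAM specification, bit 0x4 marks an unmapped record, and bits 0x100 and 0x800 mark secondary and supplementary alignments. The sketch below counts each query once through its primary record; it is an illustration rather than a replacement for dedicated tools such as samtools, and the example records are invented.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_summary(sam_lines):
    """Count primary mapped and unmapped records in SAM-format lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second field
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each query once, via its primary record
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    total = mapped + unmapped
    return {
        "mapped": mapped,
        "unmapped": unmapped,
        "fraction_mapped": mapped / total if total else 0.0,
    }


example = [
    "@HD\tVN:1.6",
    "contig1\t0\tref\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "contig2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
    "contig1\t2048\tref\t500\t60\t5M5H\t*\t0\t0\tACGTA\t*",
]
print(mapping_summary(example))
# {'mapped': 1, 'unmapped': 1, 'fraction_mapped': 0.5}
```

Because a file object iterates over its lines, the same function could be applied to an opened SAM file (the file name would be whatever the mapping step produced).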

The Canu contigs (Koren & Walenz, 2020) were polished against the mapping file using Racon (Vaser, Sovic, Nagarajan, & Šikić, 2020), run with default parameters and seven threads. The Racon-polished contigs (Vaser et al., 2020) were polished again using Medaka (ONT, 2018) with default parameters to increase the accuracy of the consensus sequences. The trimmed reads produced by Canu were aligned to the Racon-polished contigs to increase the polishing accuracy. Finally, the Canu contigs (Koren & Walenz, 2020) and the sequences polished with Racon (Vaser et al., 2020) and Medaka (ONT, 2018) were assessed using Quast (Gurevich, Saveliev, Vyahhi, & Tesler, 2020).
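The core idea of polishing, replacing each draft base with the consensus of the reads aligned to it, can be shown with a simplified majority-vote sketch in Python. This is not how Racon (partial-order alignment) or Medaka (a neural network) work; it assumes the reads are already aligned to the draft without insertions or deletions, which real nanopore data rarely allows, and the sequences are invented for illustration.

```python
from collections import Counter


def polish(draft, aligned_reads):
    """Majority-vote polishing of a draft sequence.

    aligned_reads holds (start, sequence) pairs, each assumed to align to
    draft[start:start + len(sequence)] without insertions or deletions.
    A covered base becomes the base supported by a strict majority of the
    reads covering it; bases without a strict majority or coverage are kept.
    """
    columns = [Counter() for _ in draft]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(draft):
                columns[pos][base] += 1
    polished = []
    for original, column in zip(draft, columns):
        if column:
            base, votes = column.most_common(1)[0]
            if 2 * votes > sum(column.values()):
                polished.append(base)
                continue
        polished.append(original)
    return "".join(polished)


draft = "ACGTTCGA"  # invented draft with one error at position 4
reads = [(0, "ACGTAC"), (2, "GTACGA"), (3, "TACG")]
print(polish(draft, reads))  # ACGTACGA
```

Each round of polishing can only fix errors where enough reads cover a position, so low coverage also limits how much polishing can improve the contigs.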

Results

Sample preparation

Tissue homogenization

Two tissue homogenization methods were tested to determine which method caused less fragmentation of the DNA and gave a higher concentration and purity. The first DNA sample was extracted after homogenization with the TissueLyser, and the second after grinding in liquid nitrogen. The concentration of the genomic DNA was measured three times for each method using a Qubit 4.0 fluorometer, and the purity was measured using a Nanodrop spectrophotometer, as shown in Table 1. The table gives the concentration and purity obtained with the liquid nitrogen and TissueLyser homogenization methods.

Table 1. Concentration and purity of the DNA obtained with the two tissue homogenization methods, measured with a Nanodrop spectrophotometer and a Qubit 4.0 fluorometer.

Homogenization method           TissueLyser    Liquid nitrogen
DNA concentration (Nanodrop)    18.7 ng/μL     10.9 ng/μL
A260/230                        0.041          0.003
A260/280                        1.85           2.30
Qubit, reading 1                15.4 ng/μL     10.5 ng/μL
Qubit, reading 2                17.5 ng/μL     10.1 ng/μL
Qubit, reading 3                14.5 ng/μL     9.70 ng/μL
Qubit, average                  15.8 ng/μL     10.1 ng/μL
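The Qubit averages in Table 1 and a basic purity check can be reproduced with a short Python sketch. The purity windows used here (about 1.7 to 2.0 for A260/280 and 2.0 to 2.2 for A260/230) are commonly cited guidelines for pure DNA, not thresholds set in this thesis, so they are labeled as assumptions in the code.

```python
from statistics import mean

# Values copied from Table 1.
qubit_readings = {
    "TissueLyser": [15.4, 17.5, 14.5],
    "Liquid nitrogen": [10.5, 10.1, 9.70],
}
nanodrop_ratios = {  # method -> (A260/280, A260/230)
    "TissueLyser": (1.85, 0.041),
    "Liquid nitrogen": (2.30, 0.003),
}


def purity_flags(a260_280, a260_230):
    """Compare absorbance ratios with assumed, commonly cited windows for
    pure DNA: 1.7-2.0 for A260/280 and 2.0-2.2 for A260/230."""
    return {
        "A260/280 ok": 1.7 <= a260_280 <= 2.0,
        "A260/230 ok": 2.0 <= a260_230 <= 2.2,
    }


for method, readings in qubit_readings.items():
    print(method, round(mean(readings), 1), "ng/uL", purity_flags(*nanodrop_ratios[method]))
```

The check only restates the comparison in Table 1 in a repeatable form; it does not replace inspection of the full absorbance spectrum.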

The fragment size of the DNA from each homogenization method was measured using a fragment analyzer. The DNA from the TissueLyser method showed one population of fragments between 35 bp and 100 bp and another between 800 bp and 5000 bp, whereas the DNA from the liquid nitrogen method showed only fragments between 800 bp and 5000 bp.

DNA extraction

The genomic DNA extracted from a new freshwater mussel sample using the selected homogenization method was analyzed using a Nanodrop spectrophotometer and a fragment analyzer.

Nanodrop analysis of the genomic DNA showed a concentration of 92.9 ng/μL, an A260/230 ratio of 0.193, and an A260/280 ratio of 1.85. The concentration measured with the Qubit 4.0 fluorometer was 58.8 ng/μL.

The size of the DNA fragments extracted using the Genomic-tip 100/G (Qiagen) was analyzed using a fragment analyzer. The results show one population of fragments between 50 bp and 150 bp and another between 1200 bp and 5000 bp, as shown in Figure 3.

Figure 3. Fragment analyzer results. The x-axis shows the size of each fragment in bp, and the y-axis shows relative fluorescence units (RFU), corresponding to the amount of DNA for each fragment size. LM, lower marker; UM, upper marker (both used for calibration).

Sequencing and Base-calling

Before the sequencing run, the R9.4.1 flow cell had an estimated 1627 pores available for sequencing; after the run, the estimate was 827 pores. The 13-hour run in MinKNOW version 19.12.5 produced 4.12 million reads and an estimated 5.91 Gbp.

Estimating the genome size with Jellyfish, using default parameters and a k-mer length of 21, gave a genome size between 1.6 Gbp and 1.9 Gbp. Because k-mer-based estimates from noisy long reads are not very accurate, and because no next-generation sequencing reads were available, the genome sizes of many freshwater mussel species on GenBank were also compared. This comparison indicates that freshwater mussel genomes fall within the range of 1.5 Gbp to 2 Gbp.

Base-calling with the GPU accurate mode of Guppy (Wick et al., 2020) produced 4.12 million base-called reads with an estimated 5.6 Gbp and a median read quality of 10.81. Of these, 3.6 million reads passed the quality score threshold of 7, with an estimated 5.1 Gbp and a median read quality of 11.06.

Analysis of read quality scores with pycoQC (Leger & Leonardi, 2019) indicates that more than 90% of the reads passed the quality score threshold of 7 and that the median read quality score is 13.65, as shown in Figure 4. The median read length was 496 bp for all reads and 498 bp for the passed reads. The software also indicates that both read quality and read length remained constant over time during the sequencing run.


Figure 4. Quality scores of the base-called reads, analyzed with pycoQC version 3.0.

The x-axis shows the read quality score, and the y-axis shows the density of reads.

Bioinformatic processing

Assembly

According to Canu (Koren & Walenz, 2020), the input reads provided a total coverage of 3.1x of the estimated genome size. Correcting, trimming, and assembling the reads with Canu produced 3194 contigs with a total length of 14 Mbp, while 67253 sequences were left unassembled because of poor trimming or too few overlaps.

Polishing

The contigs were polished with Racon (Vaser et al., 2020) and then with Medaka (ONT, 2018), and each step was assessed with Quast (Gurevich et al., 2020). After polishing with Racon, 3073 contigs were retained, the largest being 193 Kbp. Further polishing with Medaka retained 3164 contigs, the largest being 194 Kbp. Table 3 shows the number of contigs, the largest contig, and the total length reported by Quast for the Canu assembly (Koren & Walenz, 2020) and for each polishing step.

Table 3. Quast assessment of the Canu assembly and of the Racon and Medaka polishing steps.

Metric | Canu | Racon | Medaka
Number of contigs | 3194 | 3037 | 3164
Largest contig | 194 Kbp | 193 Kbp | 194 Kbp
Total length | 14.6 Mbp | 14.1 Mbp | 13.9 Mbp
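Table 3 reports three Quast metrics: number of contigs, largest contig, and total length. The sketch below computes the same three values from a FASTA string to illustrate what they mean; it is not a re-implementation of Quast. Quast by default ignores contigs shorter than 500 bp (its `--min-contig` option), and the optional `min_length` parameter here mimics that; the toy contigs are invented. The last line relates the Canu total length in Table 3 to the 1.8 Gbp genome size estimate used in this thesis.

```python
def contig_stats(fasta_text: str, min_length: int = 0) -> dict:
    """Number of contigs, largest contig and total length, as in Table 3."""
    lengths, current, in_record = [], 0, False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if in_record:
                lengths.append(current)  # close the previous record
            current, in_record = 0, True
        elif in_record:
            current += len(line.strip())  # sequence may span several lines
    if in_record:
        lengths.append(current)
    kept = [n for n in lengths if n >= min_length]
    return {
        "contigs": len(kept),
        "largest": max(kept, default=0),
        "total_length": sum(kept),
    }


toy_fasta = ">c1\nACGTACGT\nACGT\n>c2\nACG\n>c3\nACGTAC\n"
print(contig_stats(toy_fasta))                # 3 contigs, largest 12, total 21
print(contig_stats(toy_fasta, min_length=5))  # c2 (3 bp) is dropped

# Canu total length (Table 3) as a share of the 1.8 Gbp estimated genome size.
print(f"{14.6e6 / 1.8e9:.2%}")  # 0.81%
```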


Discussion

Nanopore sequencing technology was developed at the beginning of the 1990s and allows genomic DNA to be sequenced in a short time. The MinION platform allows real-time analysis because data from each DNA strand become available as the strand translocates through the nanopore. This allows decisions to be made during the sequencing run, which is an essential advantage of nanopore sequencing, especially in clinical use (Jain, Olsen, Paten, & Akeson, 2016). Before sequencing, adapters are ligated to both ends of the genomic DNA fragments. These adapters facilitate strand capture and the loading of a processive enzyme at the 5′ end of one strand. The adapters also concentrate the DNA substrate at the membrane surface near the nanopore, which boosts the DNA capture rate several thousand-fold (Jain et al., 2016). By covalently linking one strand of a duplex molecule to the other, the adapters also allow long, continuous sequencing of both strands (Jain et al., 2016). As DNA passes through the pore, the sensor detects changes in the ionic current caused by the different nucleotides in the pore, and these changes are used to infer the sequence of the DNA loaded into the device (Jain et al., 2016).

Sample preparation

Tissue homogenization can be done in different ways, depending on the type of tissue.

The quality of the DNA used in library preparation is a key determinant of the outcome of every nanopore sequencing run (Schalamun et al., 2019). DNA quality can be assessed from NanoDrop absorbance ratios at 260/280 nm and 260/230 nm. Pure dsDNA shows a 260/230 ratio between 2 and 2.2 and a 260/280 ratio between 1.8 and 2 (Desjardins & Conklin, 2010). The 260/230 ratio, shown in table 1, is very low for both homogenization methods. The 260/280 ratio of the DNA extracted using the TissueLyser falls within the optimal interval (Boesenberg-Smith, Pessarakli, & Wolk, 2012), while the 260/280 ratio of the DNA extracted using liquid nitrogen is 2.30, which is above the optimal range (table 1). The low 260/230 ratio of both homogenization methods is likely due to a contaminant absorbing at 230 nm or below, such as residual phenol or guanidine from the DNA extraction (Matlock, 2015). In contrast, the high 260/280 ratio of the liquid nitrogen sample cannot be explained by a contaminant absorbing at 280 nm, because such a contaminant would lower rather than raise the ratio (Matlock, 2015). A more likely cause is residual RNA in the sample, which has a higher 260/280 ratio than DNA and therefore raises the ratio of the sample (Wilfinger & Chomczynski, 1997). According to Gallagher (2017), high 260/280 ratios are generally not indicative of problems in downstream DNA applications.
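The purity criteria above can be written as a small check. The sketch assumes raw absorbance readings at 230, 260, and 280 nm (10 mm path length) and uses the standard conversion of 50 ng/μL of dsDNA per A260 unit; the purity intervals are those cited from Desjardins and Conklin (2010). The absorbance values in the example are hypothetical, not the measurements behind table 1.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # dsDNA conversion factor, 10 mm path length


def assess_dna(a230: float, a260: float, a280: float) -> dict:
    """DNA concentration and purity checks from three absorbance readings."""
    r260_280 = a260 / a280
    r260_230 = a260 / a230
    return {
        "conc_ng_per_ul": a260 * DSDNA_NG_PER_UL_PER_A260,
        "ratio_260_280": round(r260_280, 2),
        "ratio_260_230": round(r260_230, 2),
        # Intervals from the text: 1.8-2.0 (260/280) and 2.0-2.2 (260/230).
        "pure_260_280": 1.8 <= r260_280 <= 2.0,
        "pure_260_230": 2.0 <= r260_230 <= 2.2,
    }


# Hypothetical readings: acceptable 260/280 ratio, low 260/230 ratio.
print(assess_dna(a230=0.30, a260=0.20, a280=0.105))
```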

The DNA concentration obtained with the TissueLyser was 17.5 ng/μL, and that obtained with liquid nitrogen was 10.5 ng/μL (table 1). Because different amounts of tissue were used in the two methods, the concentrations were normalized to tissue weight. After normalization, each milligram of tissue yielded 0.54 ng/μL of DNA with the liquid nitrogen method and 1.03 ng/μL with the TissueLyser method. The liquid nitrogen method therefore gave roughly half the normalized DNA yield of the TissueLyser method.
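The normalization is a simple ratio of concentration to tissue weight. The sketch below shows the calculation with hypothetical inputs and then compares the two normalized yields reported in the text (0.54 and 1.03 ng/μL per mg), showing that the liquid nitrogen yield is roughly half the TissueLyser yield. The function names are illustrative.

```python
def yield_per_mg(conc_ng_per_ul: float, tissue_mg: float) -> float:
    """DNA concentration normalized to the weight of tissue homogenized."""
    return conc_ng_per_ul / tissue_mg


def relative_yield(method_a: float, method_b: float) -> float:
    """Normalized yield of method A as a fraction of method B."""
    return method_a / method_b


# Hypothetical example: 20 ng/uL obtained from 25 mg of tissue.
print(yield_per_mg(20.0, 25.0))  # 0.8

# Reported values: liquid nitrogen 0.54, TissueLyser 1.03 ng/uL per mg.
print(round(relative_yield(0.54, 1.03), 2))  # 0.52
```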

Smith, Li, Andersen, Slotved, and Krogfelt (2011), who compared three tissue homogenization methods, found that the TissueLyser method did not cause excessive fragmentation of genomic DNA. In contrast, the fragment analyzer results in this study suggest that over-fragmentation did occur: figure 3 shows multiple fragments shorter than 100 bp. These small fragments are most likely the result of mechanical shearing by the instrument. This problem could be avoided in the future by reducing the run time and power of the TissueLyser (Smith et al., 2011) or by using a homogenization method that causes less mechanical shearing of the tissue, such as a chemical method. Oxford Nanopore Technologies (ONT) states that the optimal DNA fragment size for the MinION device is between 200 bp and 8000 bp (Lopez et al., 2019). According to ONT, fragments shorter than 200 bp are of no use, because event detection and base-calling are not possible at that size (Lopez et al., 2019). The DNA loaded into the sequencer therefore contained a mixture of optimal and suboptimal fragment sizes.

This may have reduced the output of the sequencing run, because pores occupied by short fragments are not available for longer fragments, which lowers the number of reads that are base-called and pass the quality score threshold.
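A fragment analyzer signal reflects the mass of DNA at each size, whereas pores capture individual molecules. Converting mass to a relative number of molecules (mass divided by length) shows how a small mass of very short fragments can make up a large share of the molecules in a library. The sketch below assumes peaks given as (size in bp, mass in ng) pairs and uses the 200 bp and 8000 bp limits cited above; the two peaks are invented for illustration. It is a simplification, since capture efficiency may also depend on fragment length.

```python
def molecule_fractions(peaks, low_bp: int = 200, high_bp: int = 8000) -> dict:
    """Share of molecules below, within, and above the stated size range."""
    counts = {"short": 0.0, "in_range": 0.0, "long": 0.0}
    for size_bp, mass_ng in peaks:
        molecules = mass_ng / size_bp  # proportional to the number of molecules
        if size_bp < low_bp:
            counts["short"] += molecules
        elif size_bp <= high_bp:
            counts["in_range"] += molecules
        else:
            counts["long"] += molecules
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented example: 5% of the mass at 80 bp, 95% at 4000 bp.
fractions = molecule_fractions([(80, 5.0), (4000, 95.0)])
print({k: round(v, 2) for k, v in fractions.items()})
```

In this invented example, 5% of the DNA mass accounts for about 72% of the molecules.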

In summary, the best tissue homogenization method is the one that yields pure DNA with the longest possible fragments. The results in table 1 indicate that the TissueLyser method gave a higher DNA concentration and purity than the liquid nitrogen method, while the fragment sizes obtained with the two methods were approximately the same, around 4000 bp. The TissueLyser method was therefore chosen for the rest of the experiment to homogenize the tissue used to sequence the genomic DNA of the freshwater mussel with the Oxford Nanopore MinION.

Sequencing

Sequencing the DNA with nanopore technology generated 4.12 million reads, totaling 5.91 Gbp, after 13 hours. This amount of sequence covers about 3.5x of the estimated genome size of the freshwater mussel, whereas the minimum coverage recommended for assembling a genome with Canu is 20x. Before the run, 1623 active pores were identified in the flow cell, well above the minimum of 800 pores suggested by ONT for starting a sequencing run (McIntyre et al., 2016), and 70% of the pores were still actively sequencing at the end of the run. This indicates that the low coverage was not caused by a problem with the loaded DNA library or the flow cell. Oxford Nanopore Technologies states that a MinION flow cell can produce up to 30 Gbp if the run continues until all pores stop functioning. Since 20x coverage of an estimated 1.8 Gbp genome corresponds to 36 Gbp, one flow cell is not enough, and at least two flow cells would be required to obtain sufficient coverage to assemble the genome of Anodonta anatina with the MinION. A higher number of reads gives higher coverage of the sequenced genome, which results in a more accurate genome assembly (Jung et al., 2019).
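Coverage is the total number of sequenced bases divided by the genome size, and the number of flow cells needed follows from the target coverage, the genome size, and the expected yield per flow cell. The sketch below applies this to figures from the text: 5.91 Gbp of sequence, the 1.8 Gbp genome size estimate, a 20x target, and ONT's stated maximum of 30 Gbp per MinION flow cell. The resulting coverage depends on which base count and genome size are used: 5.91 Gbp over 1.8 Gbp gives about 3.3x, while 3.5x corresponds to a genome of about 1.7 Gbp. The function names are illustrative.

```python
import math


def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: sequenced bases / genome size."""
    return total_bases / genome_size


def flow_cells_needed(target_cov: float, genome_size: float,
                      bases_per_flow_cell: float) -> int:
    """Whole flow cells required to reach the target coverage."""
    return math.ceil(target_cov * genome_size / bases_per_flow_cell)


GENOME_SIZE = 1.8e9  # estimated genome size used in this thesis (bp)

print(round(coverage(5.91e9, GENOME_SIZE), 2))     # 3.28
print(flow_cells_needed(20, GENOME_SIZE, 30e9))    # 2, at the 30 Gbp maximum
print(flow_cells_needed(20, GENOME_SIZE, 5.91e9))  # 7, at this run's yield
```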

Bioinformatic processing

Base-calling, the computational process of translating raw electrical signals into a nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (Wick et al., 2019). Base-calling with Guppy (Wick et al., 2020) showed that 3.6 million of the 4.12 million reads, approximately 87%, passed the quality score threshold. This high proportion of passed reads suggests that DNA extraction, library preparation, and sequencing were successful.


In a genome sequencing experiment, the DNA of an organism is broken into many small pieces that are read by a sequencing machine. Depending on the sequencing method, the reads may vary in length from 200 bp to 8000 bp (Baker, 2012). Most large-scale genome sequencing projects today use the whole-genome shotgun (WGS) sequencing strategy.

This strategy works by randomly breaking the DNA into many fragments of varying size, which can then be sequenced from both ends (Pop et al., 2004).

The resulting reads can be assembled into longer contiguous sequences (contigs) with any of several assembly programs, such as Canu or Falcon (Giordano et al., 2017). These contigs can then be ordered and oriented relative to one another to form scaffolds (Sahlin, Chikhi & Arvestad, 2016). Genome assembly is thus the process of placing the sequenced reads in the correct order to reconstruct the original sequence. Assembly is required because read lengths are shorter than genomes and most genes (Foxman, 2010). De novo genome assembly assumes no prior knowledge of the length, layout, or composition of the DNA.
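The overlap idea behind assembly can be illustrated with a toy greedy assembler that repeatedly merges the pair of reads with the longest exact suffix-prefix overlap. This is a deliberately simplified sketch, not how Canu or Falcon work: real long-read assemblers correct errors, tolerate mismatches in overlaps, and build an overlap graph, whereas this example assumes error-free reads from a single strand and no read contained within another. The reads and the minimum overlap length are invented.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len: int = 3) -> list:
    """Merge the best-overlapping pair until no overlaps remain; return contigs."""
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads  # remaining sequences are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


print(greedy_assemble(["ACGTTG", "TTGCAT", "CATGA"]))   # ['ACGTTGCATGA']
print(greedy_assemble(["ACGTTG", "TTGCAT", "GGAACC"]))  # two separate contigs
```

When no read bridges two regions, the assembler stops with several separate contigs, loosely analogous to the fragmented assembly obtained at low coverage.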

Third-generation long-read sequencing methods, such as nanopore and PacBio, have a significant advantage over second-generation sequencing in de novo assembly studies. However, because of their low base accuracy, raw third-generation reads are poorly suited to k-mer counting and to estimating the genome profile from k-mer frequencies (Wang et al., 2020). In previous genome projects, second-generation data were used to determine the genome size of the organism (Wang et al., 2020). Because no second-generation reads were available in this study, Jellyfish was used to obtain an estimate of the genome size. Jellyfish estimated the genome size to be 1.8 Gbp, and a comparison of the genome sizes of other mussel species on GenBank showed that mussel genomes are between 1.5 and 2 Gbp. For this reason, and for consistency throughout the thesis, 1.8 Gbp was chosen as the estimated genome size of Anodonta anatina. This estimate could be improved in the future by using short reads from second-generation sequencing.

Canu is a long-read assembler derived from the Celera Assembler and designed specifically for noisy single-molecule sequences. Canu works on both nanopore and PacBio sequences, and its main advantages over the Celera Assembler are that it halves the depth-of-coverage requirement and improves assembly continuity while reducing runtime by an order of magnitude on large genomes (Koren et al., 2017). The minimum coverage recommended for a Canu assembly is 20x of the genome size of the sequenced organism (Koren et al., 2017). In this study, sequencing the genomic DNA of the freshwater mussel for 13 hours gave a coverage of 3.5x of the genome size, which is not enough for Canu to produce a highly contiguous de novo assembly. Running Canu at this coverage produced only 3194 contigs with a total length of 14 Mbp, far short of the estimated genome size of Anodonta anatina. This problem can be avoided in the future by extending the sequencing run to obtain more reads or by performing multiple sequencing runs until the coverage reaches at least 20x of the genome size (Xu, Wu, Zhang, Shen, & Deng, 2017).

Minei, Hoshina and Ogura (2018) showed that assembly quality can be improved by a hybrid assembly that combines short-read data from Illumina sequencing with long-read data from Oxford Nanopore sequencing. This approach produced longer and more accurate contigs than assembling the data from either method separately, and it establishes a pipeline for de novo assembly of middle-sized genomes using Nanopore and Illumina sequencers. A hybrid assembly could be an alternative solution to the coverage issue, as it would lower the amount of long-read data needed to assemble the genomic DNA of Anodonta anatina. Another solution is to use the GridION sequencing device instead of the MinION. The GridION is capable of producing up to 150 Gbp of reads, while a MinION flow cell is capable of producing only 30 Gbp.

Contig polishing is the process of aligning the sequenced reads to the assembled contigs in order to increase the quality and accuracy of the contigs. There are two approaches to polishing the contigs of an existing assembly. The first is to recover the multiple alignment of the reads by aligning them to the assembly and then recomputing the consensus from the original or additional read data. The second is to align the reads to the assembly, find possible errors, and correct them using the reads (Zimin & Salzberg, 2019). The best way to evaluate the polishing step is to align the polished contigs to a reference genome and determine the percentage similarity between the genome and the contigs (Zimin & Salzberg, 2019). This method could not be used in this experiment because no suitable reference genome is available for Anodonta anatina and because the coverage obtained during the sequencing run was low.
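The first polishing approach, recomputing a consensus from reads aligned to the assembly, can be illustrated with a per-position majority vote. The sketch assumes the reads are already aligned to the draft contig and padded to its length, with '-' marking positions a read does not cover, and it handles substitutions only. Racon and Medaka use more sophisticated methods (partial-order alignment and a neural network, respectively) that also correct insertions and deletions; the example is not a description of either tool.

```python
from collections import Counter


def majority_consensus(draft: str, aligned_reads: list) -> str:
    """Replace each draft base with the most common read base at that position."""
    polished = []
    for pos, draft_base in enumerate(draft):
        # Count only reads that actually cover this position.
        votes = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        if votes:
            # Ties go to the base encountered first (Counter.most_common order).
            polished.append(votes.most_common(1)[0][0])
        else:
            polished.append(draft_base)  # no coverage: keep the draft base
    return "".join(polished)


draft = "ACGTTA"
reads = ["ACGATA", "ACGAT-", "--GATA"]  # invented aligned reads
print(majority_consensus(draft, reads))  # ACGATA
```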

Multi-biomarker panel

Liyan et al. (2005) suggest that DNA damage in aquatic organisms living in a polluted environment can be used as a biomarker of water pollution, and propose using Restriction Fragment Length Polymorphism (RFLP) or Simple Sequence Repeats (SSR) to identify DNA damage caused by a polluted environment. Detecting DNA damage with the RFLP method requires very high base accuracy, which is hard to obtain with Oxford Nanopore sequencing because of the approximately 15% error rate of this technology. Detecting DNA damage with the SSR method, on the other hand, may be possible with Oxford Nanopore technology, because this method relies on the length and number of repeats rather than on high base accuracy. Whether genomic data can be used to detect water pollution therefore depends on the kind of DNA damage the pollution causes and on the method used to detect it. For example, if the DNA damage appears as changes in SSRs, Oxford Nanopore sequencing could be used to develop a multi-biomarker panel that identifies water pollution. If, instead, the damage appears as changes in one or a few nucleotides, second-generation sequencing methods would have to be used to develop the multi-biomarker panel because of their high base accuracy. Further studies are required to determine which type of DNA damage water pollution causes in freshwater mussels, and thus whether nanopore sequencing could be used to develop the multi-biomarker panel.

Conclusion

A partial sequence of the genome of Anodonta anatina was obtained during this experiment.

This sequence has the potential to serve as a genomic marker for the species Anodonta anatina and can be used to analyze genomics and transcriptomics within this species. The sequencing and base-calling steps were performed correctly. However, the number of reads and the resulting coverage were too low to assemble the whole genome of Anodonta anatina. The sequencing run should be extended to obtain enough reads to cover 20x of the freshwater mussel's genome size. Another solution is to perform multiple sequencing runs with the MinION device or to use the GridION device, which produces a higher number of reads than the MinION.

Ethical aspects, gender perspectives, and impact on society

The three Rs, the ethical guiding principles of replacement, reduction, and refinement, were considered during this experiment. The replacement principle, which refers to methods that avoid or replace the use of animals in areas where animals would otherwise be used, could not be followed because no sequencing reads for freshwater mussels have been published. The reduction principle refers to methods that allow researchers to obtain as much information as possible from as few animals as possible. This principle was followed during the experiment: the smallest amount of tissue that could give the expected results was used, so that no more than one animal was needed. The refinement principle refers to methods that minimize potential pain, suffering, and distress and increase animal welfare. This principle was not applicable in this experiment, because freshwater mussels lack a central nervous system (CNS), which suggests that they lack the sense of pain and suffering. The genomic DNA of freshwater mussels that this project aims to obtain can be used to develop a multi-biomarker panel to identify water pollution.

The genomic DNA of freshwater mussels can also be used to develop a fast and accurate method to identify freshwater mussel species and to give a better understanding of freshwater mussel genomics and transcriptomics.


Future perspectives

The main objectives of the current research were partially accomplished. Sequencing yielded only 3.5x coverage of the genome, which is not enough to assemble a sequence that covers the whole genome. More sequencing runs are required to obtain at least 20x coverage of the genome size. Following Minei et al. (2018), the reads obtained in this study could be combined with second-generation sequencing reads to assemble a highly accurate sequence covering the whole genome of Anodonta anatina. This hybrid approach would increase the base accuracy of the sequence and lower the read coverage needed to assemble the genome of Anodonta anatina.

Liyan et al. (2005) showed that water pollution can cause DNA damage in organisms that live in polluted water. Detecting this DNA damage can provide a tool for monitoring and recognizing the genotoxicity of pollutants in aquatic environments. Once the genomic DNA of Anodonta anatina has been obtained, this study could therefore be extended to analyze DNA damage caused by water pollution in Anodonta anatina. Further studies are also required to determine whether this DNA damage can be used to develop a multi-biomarker panel that helps to identify water pollution.

After the genomic sequence of Anodonta anatina has been obtained, further studies of gene expression in this species are also necessary to validate the possibility of finding a transcriptional biomarker for water pollution. Ugge et al. (2020) showed that freshwater mussels display transcriptional and biochemical biomarker responses under environmentally relevant Cu exposure. More studies are required to determine whether similar biomarkers can be observed under exposure to other types of pollutants and whether these biomarkers can be used to develop methods for identifying these pollutants.


Acknowledgments

First and foremost, I am thankful to the School of Bioscience at the University of Skövde, Sweden, for providing the necessary facilities and resources for this thesis. Special thanks go to my supervisor, Mikael Ejdebäck, for his constant support, motivation, and patience during this thesis. I also thank John Baxter, lecturer at the School of Bioscience, for his help during the laboratory work and his continued guidance during this thesis, and Associate Professor Annie Jonsson for her help during the sample collection stage. I am incredibly grateful to my student colleagues at the School of Bioscience for the encouraging discussions and support throughout the thesis. Finally, I am thankful to my family and friends for their motivation and support.


References

Ari, Ş., & Arikan, M. (2016). Next-generation sequencing: advantages, disadvantages, and future. In Plant omics: Trends and applications (pp. 109-135). Springer, Cham.

Baker, M. (2012). De novo genome assembly: what every biologist should know. Nature Methods, 9(4), 333-337.

Boesenberg-Smith, K. A., Pessarakli, M. M., & Wolk, D. M. (2012). Assessment of DNA yield and purity: an overlooked detail of PCR troubleshooting. Clinical Microbiology Newsletter, 34(1), 1-6.

Branton, D., Deamer, D. W., Marziali, A., Bayley, H., Benner, S. A., Butler, T., ... & Jovanovich, S. B. (2010). The potential and challenges of Nanopore sequencing. In Nanoscience and technology: A collection of reviews from Nature Journals (pp. 261-268).

Castro, C. J., Marine, R. L., Ramos, E., & Ng, T. F. F. (2019). The effect of variant interference on de novo assembly for viral deep sequencing. bioRxiv, 815480.

Deamer, D., Akeson, M., & Branton, D. (2016). Three decades of nanopore sequencing. Nature biotechnology, 34(5), 518.

Desjardins, P., & Conklin, D. (2010). NanoDrop microvolume quantitation of nucleic acids. JoVE (Journal of Visualized Experiments), (45), e2565.

Elkins, K. M. (2012). Forensic DNA biology: a laboratory manual. Academic Press.

Foxman, B. (2010). Molecular tools and infectious disease epidemiology. Academic Press.

França, L., Carrilho, E., & Kist, T. (2002). A review of DNA sequencing techniques. Quarterly Reviews of Biophysics, 35(2), 169-200. doi:10.1017/S0033583502003797

Gallagher, S. R. (2017). Quantitation of DNA and RNA with absorption and fluorescence spectroscopy. Current Protocols in Immunology, 116(1), A-3L.

Giordano, F., Aigrain, L., Quail, M. A., Coupland, P., Bonfield, J. K., Davies, R. M., ... & Yue, J. X. (2017). De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Scientific Reports, 7(1), 1-10.

Green, M. R., & Sambrook, J. (2017). Precipitation of DNA with isopropanol. Cold Spring Harbor Protocols, 2017(8), pdb-prot093385.

Green, R. H., Singh, S. M., & Bailey, R. C. (1985). Bivalve molluscs as response systems for modeling spatial and temporal environmental patterns. The Science of the Total Environment, 46, 147–169.

Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. (2020). QUAST: quality assessment tool for genome assemblies (version 5.1.0) [Computer software]. Retrieved from https://github.com/ablab/quast.

Jain, M., Olsen, H. E., Paten, B., & Akeson, M. (2016). The Oxford Nanopore MinION: delivery of Nanopore sequencing to the genomics community. Genome Biology, 17(1), 1–11. https://doi.org/10.1186/s13059-016-1103-0

Jung, H., Winefield, C., Bombarely, A., Prentis, P., & Waterhouse, P. (2019). Tools and strategies for long-read sequencing and de novo assembly of plant genomes. Trends in plant science.


Kasianowicz, J. J., Brandin, E., Branton, D., & Deamer, D. W. (1996). Characterization of individual polynucleotide molecules using a membrane channel. Proceedings of the National Academy of Sciences, 93(24), 13770-13773.

Koren, S., & Walenz, B. (2020). Marbl/Canu (Version 2.0) [Computer software]. Retrieved from https://github.com/marbl/Canu.

Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman, N. H., & Phillippy, A. M. (2017). Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, 27(5), 722-736.

Leger, A., & Leonardi, T. (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing (Version 3.0) [Computer software]. Retrieved from https://github.com/a-slide/pycoQC.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34, 3094-3100. doi:10.1093/bioinformatics/bty191.

Liyan, Z., Ying, H., & Guangxing, L. (2005). Using DNA damage to monitor water environment. Chinese Journal of Oceanology and Limnology, 23(3), 340-348.

Lopez, R., Chen, Y., Ang, S., Yekhanin, S., Makarychev, K., Racz, M., . . . Ceze, L. (2019, July 03). DNA assembly for Nanopore data storage readout. Retrieved May 21, 2020, from https://www.nature.com/articles/s41467-019-10978-4.

Lu, H., Giordano, F., & Ning, Z. (2016). Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics, Proteomics and Bioinformatics, 14(5), 265–279. https://doi.org/10.1016/j.gpb.2016.05.004.

Matlock, B. (2015). Assessment of nucleic acid purity. Thermo Fisher Scientific. [Online publication]. [Cited 2017-11-23.] Available at: https://tools.thermofisher.com/content/sfs/brochures/TN52646-E-0215M-NucleicAcid.pdf.

McIntyre, A. B., Rizzardi, L., Angela, M. Y., Alexander, N., Rosen, G. L., Botkin, D. J., ... & Burton, A. S. (2016). Nanopore sequencing in microgravity. npj Microgravity, 2(1), 1-9.

Minei, R., Hoshina, R., & Ogura, A. (2018). De novo assembly of middle-sized genome using MinION and Illumina sequencers. BMC genomics, 19(1), 700.

Naimo, T. J. (1995). A review of the effects of heavy metals on freshwater mussels. Ecotoxicology, 4(6), 341-362.

ONT. (2018). Medaka Sequence correction provided by ONT research (Version 1.0.1) [Computer software]. Retrieved from https://github.com/ONT/medaka.

Osterling, M., Zulsdorff, V., & Schneider, L. (2012). Host fish species of freshwater mussels in seven Swedish river systems (The thick-shelled river mussel brings back life to rivers. Series No. 46). Retrieved from the website of UCforlife: http://www.ucforlife.se/wp-content/uploads/2012/12/7.2.2_TECHNICAL-REPORT_C1a.pdf.

Pop, M., Phillippy, A., Delcher, A. L., & Salzberg, S. L. (2004). Comparative genome assembly. Briefings in bioinformatics, 5(3), 237-248.

Prince, R. C. (2000). Bioremediation. Kirk-Othmer Encyclopedia of Chemical Technology.


Rosenberg, G. (2014). A new critical estimate of named species-level diversity of the recent Mollusca. American Malacological Bulletin, 32(2), 308-322.

Rusk, N. (2009). Cheap third-generation sequencing. Nature Methods, 6(4), 244-244.

Sahlin, K., Chikhi, R., & Arvestad, L. (2016). Assembly scaffolding with PE-contaminated mate-pair libraries. Bioinformatics, 32(13), 1925-1932.

Sanger, F., & Coulson, A. R. (1975). A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology, 94(3), 441–448. https://doi.org/10.1016/0022-2836(75)90213-2.

Sanger, F., Nicklen, S., & Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12), 5463-5467.

Schalamun, M., Nagar, R., Kainer, D., Beavan, E., Eccles, D., Rathjen, J. P., ... & Schwessinger, B. (2019). Harnessing the MinION: An example of how to establish long-read sequencing in a laboratory using challenging plant tissue from Eucalyptus pauciflora. Molecular Ecology Resources, 19(1), 77-89.

Shendure, J., Mitra, R. D., Varma, C., & Church, G. M. (2004). Advanced sequencing technologies: methods and goals. Nature Reviews Genetics, 5(5), 335–344. https://doi.org/10.1038/nrg1325.

Smith, B., Li, N., Andersen, A. S., Slotved, H. C., & Krogfelt, K. A. (2011). Optimising bacterial DNA extraction from faecal samples: comparison of three methods. The Open microbiology journal, 5, 14.

Vaughn, C. C. (2018). Ecosystem services provided by freshwater mussels. Hydrobiologia, 810(1), 15-27.

Vaughn, C. C., & Hakenkamp, C. C. (2001). The functional role of burrowing bivalves in freshwater ecosystems. Freshwater Biology, 46(11), 1431-1446.

Wang, H., Liu, B., Zhang, Y., Jiang, F., Ren, Y., Yin, L., ... & Fan, W. (2020). Estimation of genome size using k-mer frequencies from corrected long reads. arXiv preprint arXiv:2003.11817.

Wick, R. R., Judd, L. M., & Holt, K. E. (2019). Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome biology, 20(1), 129.

Wick, R. R., Judd, L. M., and Holt, K. E. (2020). Guppy Local accelerated base calling for Nanopore data (Version 3.6) [Computer software]. Retrieved from https://community.ONT.com/downloads.

Wilfinger, W. W., & Chomczynski, M. K. (1997). P 260/280 and 260/230 Ratios Nanodrop® ND-1000 and ND-8000 8-Sample Spectrophotometers. BioTechniques, 22, 474-481.

Xu, C., Wu, K., Zhang, J. G., Shen, H., & Deng, H. W. (2017). Low-, high-coverage, and two-stage DNA sequencing in the design of the genetic association study. Genetic epidemiology, 41(3), 187-197.

Zimin, A. V., & Salzberg, S. L. (2019). The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. bioRxiv.
