Advancing Evolutionary Biology: Genomics, Bayesian Statistics, and Machine Learning

Academic year: 2021

DEPARTMENT OF BIOLOGICAL AND ENVIRONMENTAL SCIENCES

Advancing Evolutionary Biology: Genomics, Bayesian Statistics, and Machine Learning

Tobias Andermann

Department of Biological and Environmental Sciences

Faculty of Science

Opponent: Dr. Tracy A. Heath

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA

Examiner: Mari Källersjö

Department of Biological and Environmental Sciences, University of Gothenburg

Academic thesis for the degree of Doctor of Philosophy in Natural Science, specializing in Biology, which, with the permission of the Faculty of Science, will be publicly defended on Friday, 18 December 2020 at 14:00 in Hörsalen, Botanhuset, Department of Biological and Environmental Sciences, Carl Skottsbergs gata 22B, Gothenburg.

ISBN 978-91-8009-136-7 (print) ISBN 978-91-8009-137-4 (PDF) Available at http://hdl.handle.net/2077/66848

ABSTRACT

Over recent decades, the field of evolutionary biology has entered the era of big data, which has transformed it into an increasingly computational discipline. In this thesis I present novel computational methods, together with their application in empirical case studies. The chapters are divided into three fields of computational biology: genomics, Bayesian statistics, and machine learning. While these categories are not mutually exclusive, they do represent different domains of methodological expertise.

Within the field of genomics, I focus on the computational processing and analysis of DNA data produced with target capture, a pre-sequencing enrichment method commonly used in phylogenetic studies. Using an empirical case study, I demonstrate how common computational processing workflows introduce biases into phylogenetic results, and I present an improved workflow to address these issues. Next, I introduce a novel computational pipeline for processing target capture data, intended for general use. In an in-depth review paper on target capture, I provide general guidelines and considerations for successfully carrying out a target capture project. Within the context of Bayesian statistics, I develop a new computer program to predict future extinctions, which uses custom-made Bayesian components. In a separate chapter, I apply this program to model future extinctions of mammals and contrast these predictions with estimates of past extinction rates produced by a set of recently developed Bayesian algorithms. Finally, I touch upon newly emerging machine learning algorithms and investigate how their utility for biological problems can be improved, particularly by explicitly modeling uncertainty in the predictions these models make.

The empirical results presented here shed new light on the evolutionary dynamics of different organism groups and showcase the utility of the methods and workflows developed in this thesis. To make these methodological advancements accessible to the whole research community, I embed them in well-documented open-access programs. This will hopefully foster the use of these methods in future studies and contribute to more informed decision-making when applying computational methods to a given biological problem.

KEYWORDS

Computational biology, bioinformatics, phylogenetics, neural networks, NGS, target capture, Illumina sequencing, fossils, IUCN conservation status, extinction rates
