INVESTIGATING STREPTOCOCCAL BIODIVERSITY IN SEPSIS USING NEXT-GENERATION SEQUENCING


Academic year: 2021


Master Degree Project in Systems Biology A2E

Author: Daniel Shahbazi, 950120-6016, A13dansh@student.his.se

Examiner: Björn Olsson, Bjorn.olsson@his.se

Supervisor: Helena Enroth, Helena.enroth@unilabs.com

School of Bioscience, University of Skövde, 541 28 Skövde


Abstract

Sepsis is one of the leading causes of death in the intensive care unit and one of the biggest health problems worldwide. It is caused primarily by bacterial infections but can also be caused by viral or fungal infections. Because sepsis is such a large health problem, and because longer stays in the intensive care unit are associated with an increased risk of infection, fast diagnosis and treatment are very important. Culture is currently the leading diagnostic method for identifying bacteria, although other methods are being tested to shorten identification time and reduce cost and workload. Next-generation sequencing (NGS) can output several million reads in a single experiment, making it very fast and relatively cheap compared with older sequencing methods such as Sanger sequencing. The ability to analyze genes and even whole genomes makes it possible to identify factors such as bacterial species, virulence genes and antibiotic resistance genes. The aim of this study was to find possible correlations between 16 species of streptococci and clinical data in patients with suspected sepsis. Initial species identification was performed using MALDI-TOF before the samples were sequenced using NGS. Sequence files were quality controlled and trimmed before being assembled. Following assembly, coverage was checked for all assembled genomes before the downstream analysis started. Tools for 16S rRNA species identification, multi-locus sequence typing and antibiotic resistance gene detection were used, among others. The results were mixed: the overall quality of the sequence data was good, but the assembly and downstream analysis were of poorer quality. The most consistent species was S. pyogenes. No correlation between sepsis patients and relevant clinical data was found. The mixed quality of the assembly and downstream analysis results was most likely attributable to difficulties in culturing and sequencing the streptococci. Finding ways to circumvent these problems would most likely aid general sequencing of streptococcal species, and hopefully clinical applications as well.


Popular science

When people hear “big, global health problems”, most think of cancer, obesity, diabetes and heart attacks. Sepsis, one of the biggest health problems worldwide, is mentioned less often than these but is a bigger problem than most of them. Our skin provides a strong defense against most foreign organisms, but sometimes it fails to keep them out. When foreign organisms such as bacteria enter your body, your body reacts in a specific way to get rid of them.

This is an immune response. The immune cells in your blood move towards the infection to fend it off. They send signals to surrounding specialized immune cells, telling them to consume and break down the invaders. The immune cells can also create antibodies against the invaders, which help break them down. Sometimes the immune system overreacts and goes into overdrive, for example because of bacteria in the blood. This is known as sepsis. During septic shock, the immune response is mounted throughout the body, causing blood pressure to drop and heart rate to increase. As a result, organs do not get enough blood, and therefore not enough oxygen, to function properly. If left untreated, this leads to serious complications and, in most cases, death. Since sepsis is the leading cause of death in the intensive care unit, fast treatment is extremely important. Since bacteria can develop resistance to antibiotics, it is important to know whether bacteria are the cause of the sepsis and, if so, which species they are.

Knowing the species tells healthcare workers which antibiotic to use. Current methods for identifying the bacteria are robust but relatively slow. Many techniques are being tested, modified and optimized for use in different fields. One of these is next-generation sequencing, which makes it possible to read the DNA (the blueprint of all living organisms) of, for example, bacteria. The method is extremely fast and generates large amounts of data in a single experiment, allowing many analyses to be performed. Looking at an organism at the genetic level can reveal a great deal of information about it: genes that help bacteria resist antibiotics, genes that help them infect hosts more easily, which species they belong to, and so on. In this study, bacteria belonging to 16 different streptococcal species were isolated from patients with suspected sepsis and analyzed using next-generation sequencing. The sequencing data were then analyzed for traits that the bacteria might have, such as genes that help them resist certain antibiotics, to confirm which species each isolate really is, and to see whether the bacteria might have picked up DNA from other bacteria in the form of so-called plasmids. The results of these analyses were then compared with potentially useful information about the sepsis patients, such as age and sex, to see whether any of these factors correlate with one another. Problems arose during the experiment, such as errors when trying to put the pieces of the bacteria’s DNA together into one whole unit. This, in turn, led to the rest of the analyses returning doubtful results. Some of the results, such as the absence of genes that might give Streptococcus pyogenes resistance against certain antibiotics, were in line with previous studies, but further identification and confirmation of species and sequence types came back either as unknown or as faulty. These problems are likely attributable to streptococci being difficult to culture, and also difficult to sequence properly when using certain methods, such as automatic DNA extraction.


List of abbreviations

CRP C-Reactive Protein

FISH Fluorescence In Situ Hybridization

ICU Intensive Care Unit

MALDI-TOF MS Matrix Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry

MLS Macrolides, Lincosamides, Streptogramins

MLST Multi-Locus Sequence Typing

NGS Next Generation Sequencing

QUAST Quality ASsessment Tool (for Genome Assemblies)

SBS Sequencing by Synthesis

SGB Streptococcus Group B

SGC Streptococcus Group C

SGG Streptococcus Group G

SOFA Sequential Organ Failure Assessment

SPAdes St. Petersburg Genome Assembler

UTI Urinary Tract Infection


Table of contents

Introduction
  Streptococcus
  Next-Generation Sequencing and Genomic studies of bacteria
Methods
  Culturing and DNA extraction, next generation sequencing
  Quality control and trimming
  Assembly and coverage
  Analysis of genomes
Results
  Quality control and trimming
  Assembly and coverage
  SpeciesFinder
  MLST
  ResFinder
  PathogenFinder and PlasmidFinder
  Traitar
  Clinical data
Discussion
Ethical aspects
Future perspectives
Acknowledgements
References
Appendix A
Appendix B
Appendix C


Introduction

Because of its high mortality rate, sepsis is considered one of the biggest health problems globally. Despite optimal care for patients, the chances of survival are still low (Shukeri et al., 2017). Sepsis is also the main cause of death in the intensive care unit (Ndjom et al., 2017). This is mainly because a majority of these patients also present with other diseases, making diagnosis difficult in an intensive care setting (Novosad et al., 2016). Studies have also suggested associations between length of stay in intensive care units and risk of infection, putting this group of patients at greater risk (Vincent et al., 2009). Sepsis-related maternal mortality seems to be relatively high in the UK compared to other high-income countries, and sepsis is the leading cause of maternal mortality there (Acosta and Knight, 2013). Even though the fatality rate has decreased over the last several years, several studies suggest that the incidence is still increasing (Dombrovskiy et al., 2007). The increased incidence of sepsis has been attributed to aging populations and increased recognition (Dellinger et al., 2013). Other studies suggest that the reported increase in sepsis might be incorrect. In a study by Rhee and colleagues (Rhee et al., 2017), clinical data from 409 hospitals were analyzed. The results indicated no change in either sepsis incidence over a 5-year period (2009-2014) or the combined outcome of discharge from hospital or death. Given the high fatality and morbidity associated with sepsis, developing optimal treatment and faster, more effective ways of diagnosing sepsis is of the utmost importance.

Contrary to popular belief, sepsis is not only caused by bacteria entering the bloodstream. It can have many different causes, such as fungal or viral infections (Deutschman and Tracey, 2014). Bacterial infections do, however, cause the majority of septic cases. When these microorganisms enter the bloodstream, the body responds with an overwhelming and powerful inflammatory response. If this response is not regulated properly, organ failure is likely to follow (Hawiger et al., 2015). Singer et al. (2016) suggested that this organ dysfunction should be represented by a point-based system called the Sequential Organ Failure Assessment (SOFA, see Figure 1). Two points or more in this system indicates organ dysfunction for clinical purposes. Septic shock should also be considered a subgroup of sepsis. It poses a greater risk of mortality than sepsis alone, which is mainly associated with cellular and metabolic abnormalities (Singer et al., 2016).

Since sepsis is such a big health issue, mainly in the intensive care unit (ICU), and since length of stay in the ICU is associated with an increased risk of infection, fast and effective diagnosis is of great importance. Earlier treatment has been shown to increase the chances of survival, since delayed treatment increases the risk of multiple organ failure (Blanco et al., 2008). Currently, a few guidelines are used to diagnose sepsis patients. Clinical definitions play a partial role, mainly following the guidelines from the aforementioned SOFA system. The main bedside criteria, which Singer et al. (2016) term quick SOFA (qSOFA), are an altered level of consciousness, a systolic blood pressure below 100 mmHg, and a respiratory rate above 22 breaths per minute.
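The three bedside criteria and the two-point cut-off described above can be expressed as a small scoring function. The following is a minimal sketch for illustration only: the function and parameter names are hypothetical, and the thresholds are taken directly from the text rather than from a clinical guideline.

```python
def count_criteria(altered_mental_status, respiratory_rate, systolic_bp):
    """Count how many of the three bedside criteria are met (0 to 3).

    Thresholds follow the wording of the text (assumption):
    respiratory rate above 22 per minute, systolic pressure below 100 mmHg.
    """
    return sum([
        bool(altered_mental_status),
        respiratory_rate > 22,  # breaths per minute
        systolic_bp < 100,      # mmHg
    ])


def meets_cutoff(altered_mental_status, respiratory_rate, systolic_bp):
    """True when two or more criteria are met, the cut-off named in the text."""
    return count_criteria(altered_mental_status, respiratory_rate, systolic_bp) >= 2
```

For example, a patient with normal mental status, a respiratory rate of 24 and a systolic pressure of 95 mmHg meets two criteria and therefore reaches the cut-off.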


Figure 1. The Sequential Organ Failure Assessment (SOFA) 3 system used for classifying sepsis patients.

Culture-based diagnosis is one of the more reliable techniques currently in use, although it is relatively slow (Bursle and Robson, 2017). Faster ways of diagnosis, such as early biomarkers or other molecular tests, are therefore of great interest. Two biomarkers are presently in use for diagnosis: C-reactive protein (CRP) and procalcitonin. CRP is a well-known biomarker of inflammation, and its plasma levels have been shown to correlate with organ failure in critically ill patients (Lobo et al., 2002). Studies have suggested that the CRP level on its own is a better marker of infection than patient temperature, but that the combination of CRP and temperature gives a more accurate infection diagnosis in sepsis patients (Póvoa et al., 2005). Biomarkers or diagnostic methods more specific to sepsis would be more beneficial, however, since CRP is relatively unspecific. It can, for example, be used to monitor patients after surgery, since it is associated with the degree of surgical injury (Watt et al., 2015).

Using biomarkers to identify septic cases quickly and easily would benefit the patient through earlier treatment. It would also avoid the need to administer broad-spectrum antibiotics as a preventative treatment before culture-based diagnosis can confirm or rule out the presence of bacteria in the blood (Dupuy et al., 2013).

Several studies have suggested different treatments and routines for septic patients. One example is a study by Dellinger and colleagues (Dellinger et al., 2013), in which the recommended actions included blood culturing before administration of antibiotic therapy, confirming the potential source of infection by imaging studies, and administration of broad-spectrum antibiotics within certain timeframes after the recognition of septic shock or severe sepsis.

[Figure 1 content: the criteria are altered mental status, respiratory rate >22/min and systolic blood pressure <100 mmHg. A score of 2 or more criteria indicates a greater risk of poor outcome; a score of less than 2 criteria indicates a greater chance of good outcome.]

Streptococcus

The Streptococci are classified as Gram-positive, nonmotile and non-spore-forming cocci (Baron and Patterson, 1996). One way of differentiating the human pathogenic streptococci from each other is “Lancefield Grouping”. This method uses surface antigens that are specific to different species to sort them into groups such as A, B, C and G (Reglinski and Sriskandan, 2015).

Streptococcus pneumoniae is a Gram-positive bacterium and one of the major human pathogens among the Streptococci (Martner et al., 2008). It often causes diseases such as meningitis, sinusitis and otitis media (Lynch and Zhanel, 2009). Since it most frequently colonizes the respiratory tract, especially in children (Greenberg, 2009), it is often exposed to antibiotics, which could lead to the development of resistance to these agents (Jacobs, 2004). S. pneumoniae possesses a few traits that contribute to its pathogenicity, such as the hydrophilic polysaccharide capsule surrounding the bacterium, which protects it against phagocytes in the host (Hyams et al., 2010). The bacterium also has another, peculiar property: when it reaches the stationary growth phase, it tends to undergo autolysis by means of autolysins. This has been hypothesized to be a way of altering the behavior of neutrophils by inducing production of intracellular reactive oxygen species in them (Martner et al., 2008). It has also been suggested that autolysis is a mechanism for avoiding the induction of phagocyte-activating cytokines, and therefore phagocytosis (Martner et al., 2009).

Group A Streptococci (also known as SGA) are a group of Streptococcus that can cause infections such as necrotizing fasciitis, streptococcal toxic shock syndrome and pharyngitis (Kansal et al., 2010). These bacteria are associated with quicker sepsis progression, as well as more severe cases of sepsis (Acosta et al., 2014). SGA are covered in a protein called M protein, which serves as a key virulence factor by providing diversity in the bacterial surface coat, allowing the bacteria to evade antibodies produced during earlier infections (Metzgar and Zampolli, 2011).

The Group B Streptococcus (or SGB) is a group of Gram-positive bacteria that can be found in most healthy women’s vaginal and gastrointestinal flora and has also been identified as the leading cause of neonatal sepsis (Barcaite et al., 2008).

Three species, S. anginosus, S. constellatus and S. intermedius, make up the Streptococcus anginosus group (Summanen, 1993). These bacteria have been isolated from the gastrointestinal tract and the pharynx. They often cause abscess-forming infections of the oral cavity, skin, liver and soft tissue (Fazili et al., 2017). Pyogenic infections caused by S. intermedius have been associated with a less favorable prognosis than those caused by S. constellatus, and with longer hospital stays than those caused by S. anginosus (Junckerstorff et al., 2014), while S. anginosus infections have been shown to be less associated with abscesses than infections with the other two species (Claridge et al., 2001).

Next-Generation Sequencing and Genomic studies of bacteria

Next-generation sequencing (NGS) is a relatively new but revolutionary technique that has made extremely fast sequencing possible. NGS offers many more possibilities and advantages than previous sequencing methods such as Sanger sequencing. With NGS, millions of sequence reads can be processed in parallel, generating a huge amount of data from a single experiment (Mardis, 2008).

Sanger sequencing is a relatively slow method that can only identify small base changes in the DNA, such as substitutions and deletions. NGS can identify these small base changes and also larger changes such as translocations, which typically require technologies such as fluorescence in situ hybridization (FISH) (Behjati and Tarpey, 2013). NGS has been used in previous studies to identify bacterial infections in the blood, and also to determine the taxonomic profiles of those bacteria (Gosiewski et al., 2017). The researchers in that study found that healthy volunteers mostly had anaerobic bacteria (76%), while septic patients had more aerobic or microaerophilic microorganisms (75.1%) (Gosiewski et al., 2017). Their conclusion was that, with the help of NGS, bacterial DNA and taxonomic composition can be detected directly from the blood.

In an earlier study, Ljungström et al. describe the “Sepsis study Skaraborg”, in which 2475 patients with sepsis were examined. The study was performed over a nine-month period during 2011-2012. Approximately 1800 bacterial isolates were successfully collected. Some were isolated from patients’ blood, while others were isolated from urine and many other sites. Species identification was done using MALDI-TOF MS. A range of bacteria were isolated, including Escherichia coli, Streptococci and Klebsiella, with E. coli being the most common. The isolates were sequenced, assembled and then examined. Bacterial traits were predicted beforehand, and the sequences were analyzed for characteristics such as phylogenetic grouping, species identification, virulence genes and resistance genes.

Another study, performed on patients suffering from orthopedic-device-related infections, used whole-genome sequencing to study relationships between genomic variation and features associated with “cured patient” and “not cured patient” status. The authors found that bacterial traits such as aminoglycoside resistance and the ability to form a strong biofilm were associated with “not cured patient” status (Post et al., 2017). Identifying bacterial traits of this kind and associating them with specific outcomes in diseases such as sepsis would provide not only starting points for future research, but also increased knowledge about how to treat the disease. It also shows that comparative genomics and whole-genome sequencing give better insight into how to tackle certain diseases or conditions.

S. pyogenes, also referred to as Streptococcus Group A, is another pathogen that can cause sepsis (among other diseases such as necrotizing fasciitis and streptococcal toxic shock syndrome). In previous studies these bacteria have been screened using NGS, which enabled researchers to examine important traits such as resistance, genomic recombination and virulence genes (Ibrahim et al., 2016).

The next step would be to use NGS not only to find microbial infections and diagnose sepsis, but also to identify bacteria present in septic patients and then determine whether they are the cause of the sepsis or just happen to be present in the patient at the time. Considering that NGS has been used to check for factors such as virulence genes and resistance, this should be possible, and also beneficial, since it could help tailor the right treatment for the patient. Further work could also address the methods of analyzing the data, with a focus on optimizing the methods and the pipeline overall.

The current study is inspired by previous work on whole-genome sequencing and bacterial traits. However, the combination used here has not previously been applied to all of these groups of Streptococci: NGS, genome assembly and identification of bacterial traits, followed by comparison of those traits between species of Streptococci and, above all, with possibly relevant clinical data such as sex, age and type of sample.

The aim of this master thesis was to investigate potential antibiotic resistance genes, acquired plasmid DNA, sequence types and phenotypic traits in clinical streptococcal isolates from sepsis patients through broad analysis of NGS data, using tools provided by the Center for Genomic Epidemiology and the software Traitar, together with clinical data such as sex, age and clinical diagnosis. The clinical data were coupled with the NGS data to see whether the bacterial traits mentioned above differ within the same species, between different types of samples, between sequence types and between clinical factors. A secondary aim was to create a basis for a functioning pipeline for bacterial NGS data, which can later be improved to optimize similar future work. Finding possible correlations between bacterial traits and clinical data could aid faster diagnosis of patients, or reduce the extensive use of broad-spectrum antibiotics by narrowing down the responsible bacteria to a smaller number of groups or species.


Methods

Culturing and DNA extraction, next generation sequencing

The streptococcal isolates analyzed in this study came from the “Sepsis study Skaraborg”. Bacterial isolates were cultured on blood-agar plates overnight by personnel at the Clinical Microbiology Laboratory, Unilabs, Skaraborg Hospital, Skövde, using conventional culturing methods. Species identification was performed using MALDI-TOF MS. DNA from S. pneumoniae was extracted using the small volume kit, while the rest of the DNA isolation was performed using the large volume kit. Samples were also diluted to different volumes for the two kits. S. pneumoniae samples were diluted to an input volume of 200 µl (output volume 100 µl), while 96 other samples had an input volume of 1000 µl and the 16 remaining samples had an input volume of 500 µl. DNA concentration was then measured for some samples using Qubit 2.0 (Thermo Fisher Scientific). Library preparation was also performed by personnel at Unilabs by enzymatic tagmentation, PCR clean-up and index ligation, according to the Nextera XT guidelines (Illumina).

Out of 185 isolates, a total of 169 Streptococci isolates were successfully cultured and extracted. The remaining 16 samples could not be cultured properly.

Samples were transported to SciLifeLab in Solna, where the DNA was sequenced by SciLifeLab personnel using the Sequencing by Synthesis (SBS) technique on an Illumina HiSeq 2500 instrument with a high-throughput protocol for bacteria. The samples were: Group A Streptococci (37), Group B Streptococci (22), Group C Streptococci (7), Group G Streptococci (13), S. pneumoniae (57), S. anginosus (6), S. constellatus (3), S. equisimilis (1), S. gallolyticus (2), S. gordonii (2), S. intermedius (2), S. mitis (8), S. oralis (1), S. parasanguis (4), S. salivarius (3) and S. sanguinis (1), as identified by MALDI-TOF.
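As a quick consistency check, the per-group isolate counts listed above can be tallied to confirm that they add up to the 169 sequenced isolates. The sketch below is illustrative only; the dictionary name and the helper function are hypothetical and not part of the original workflow.

```python
# Isolate counts per group or species, as identified by MALDI-TOF (from the text).
ISOLATE_COUNTS = {
    "Group A Streptococci": 37,
    "Group B Streptococci": 22,
    "Group C Streptococci": 7,
    "Group G Streptococci": 13,
    "S. pneumoniae": 57,
    "S. anginosus": 6,
    "S. constellatus": 3,
    "S. equisimilis": 1,
    "S. gallolyticus": 2,
    "S. gordonii": 2,
    "S. intermedius": 2,
    "S. mitis": 8,
    "S. oralis": 1,
    "S. parasanguis": 4,
    "S. salivarius": 3,
    "S. sanguinis": 1,
}


def share_percent(name):
    """Percentage of all isolates that belong to the given group."""
    total = sum(ISOLATE_COUNTS.values())
    return 100 * ISOLATE_COUNTS[name] / total


total_isolates = sum(ISOLATE_COUNTS.values())
```

The tally gives 169 isolates across 16 groups, with S. pneumoniae making up roughly a third of them.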

Quality control and trimming

The workflow starting from quality control and trimming has been visualized in a flowchart figure (Figure 2).

All sequence files for all 169 isolates were downloaded from SciLifeLab’s servers to the University of Skövde, where all the bioinformatics was performed. The quality of the FASTQ files was first checked using FastQC version 0.11.5 (footnote 1). FastQC is a quality control tool suitable for Illumina-derived NGS data in FASTQ format and is well suited to summarizing details about the data, such as adapter content and read quality by position (Brown et al., 2017). Per-base sequence quality, per-sequence quality scores, per-base sequence content, per-base and per-sequence GC content, per-base N content, sequence length distribution, duplication levels and overrepresented sequences were examined for every FASTQ file. Following quality control, the FASTQ files were trimmed using Trimmomatic version 0.36 (footnote 2) with the following parameters. Sliding window 4:20: the software scans each read in windows of four bases and cuts where the average base quality within the window falls below 20. Leading 3: removes leading low-quality or N bases below quality 3. Trailing 10: removes trailing low-quality or N bases below quality 10. Minimum length 30: excludes reads shorter than 30 bases. Adapter sequences were trimmed using the Illumina clip option with the parameter NexteraPE-PE.fa:2:30:10, which removes Nextera paired-end adapter sequences. The first value allows a maximum of 2 mismatches in a seed while still allowing a full match to be performed, the second (30) is the threshold that the match between two adapter-ligated reads must exceed, and the third (10) is the accuracy threshold for matches between adapter sequences and reads. After trimming was completed, quality control was repeated on the FASTQ files. Files or sequences of inferior quality after trimming were deemed unusable and discarded.

1: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

2: http://www.usadellab.org/cms/?page=trimmomatic
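To make the quality-trimming parameters above concrete, the following is a simplified Python sketch of how the leading, trailing, sliding-window and minimum-length rules act on the Phred quality scores of a single read. It is not Trimmomatic's actual implementation: the function name is hypothetical, the order in which the steps are applied is an assumption (the text does not state it), and adapter clipping is left out.

```python
def trim_read(quals, leading=3, trailing=10, window=4, min_avg=20, min_len=30):
    """Return (start, end) of the kept slice of a read, or None if it is dropped.

    quals is a list of per-base Phred quality scores. Defaults mirror the
    parameters in the text; the step order is an assumption.
    """
    start, end = 0, len(quals)

    # Leading: drop bases from the start while their quality is below the threshold.
    while start < end and quals[start] < leading:
        start += 1

    # Trailing: drop bases from the end while their quality is below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # Sliding window: cut the read at the first window whose mean quality is too low.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # Minimum length: discard reads that became too short.
    if end - start < min_len:
        return None
    return start, end
```

A read whose last ten bases have quality 10 is not touched by the trailing rule (10 is not below 10), but the sliding window cuts it once the four-base average drops under 20.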

Assembly and coverage

The trimmed sequences were then assembled using the SPAdes assembler version 3.11.1 (footnote 3). SPAdes is an open-source assembler for both single-cell and multicell data, although it is built mainly for smaller genomes such as bacterial genomes (Bankevich et al., 2012). The coverage of the assembled genomes was then checked using QUAST version 4.6.3 (footnote 4).
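The assembly and assessment steps can be sketched as command lines built in Python. This is a minimal sketch under stated assumptions: the data are assumed to be paired-end (suggested by the Nextera paired-end adapter file, but not stated explicitly), all file and directory names are hypothetical, and only basic options are shown because the text does not list the exact options that were used.

```python
def spades_command(reads_1, reads_2, out_dir):
    """Argument list for a basic paired-end SPAdes run (assumed invocation)."""
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]


def quast_command(contigs_fasta, out_dir):
    """Argument list for a basic QUAST run on an assembled FASTA file."""
    return ["quast.py", contigs_fasta, "-o", out_dir]


# Hypothetical file names for one isolate.
assembly_cmd = spades_command(
    "isolate01_R1.trimmed.fastq.gz",
    "isolate01_R2.trimmed.fastq.gz",
    "isolate01_spades",
)
report_cmd = quast_command("isolate01_spades/contigs.fasta", "isolate01_quast")
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed; building the arguments as a list avoids shell quoting problems with unusual file names.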

Analysis of genomes

The genomes were analyzed using tools available at the Center for Genomic Epidemiology (footnote 5). ResFinder 3.0 was used to find potential antibiotic resistance genes in the genomes, with the option “Acquired antimicrobial resistance genes”. PathogenFinder 1.1 was used to predict the bacteria’s pathogenicity towards humans, using “Automatic Model Selection”. MLST 1.8 (Multi-Locus Sequence Typing) was used to screen the sequences against housekeeping genes in order to help identify the bacterial clones. PlasmidFinder 1.3 was used to identify potential plasmids, with the %ID threshold set to 95% and the database for Gram-positive bacteria. SpeciesFinder 1.2 was used to predict the species. The option “Assembled Genome/Contigs” was chosen for all tools mentioned above.

The software Traitar was used to predict the phenotypic traits of the bacteria. It uses two classification methods (phypat and phypat+PGL), whose results are combined into consensus votes that predict the phenotype more accurately than either classification alone (Weimann et al., 2016). Traitar analyzes and predicts 67 different traits.

3: http://bioinf.spbau.ru/spades

4: http://quast.sourceforge.net/quast.html

5: http://www.genomicepidemiology.org/

Figure 2. Workflow for bacterial NGS analysis. FASTQ files were quality controlled and trimmed before assembly into FASTA files. These files were then analysed using tools hosted on the Center for Genomic Epidemiology’s webpage, and the software Traitar.

[Figure 2 flowchart: FASTQ files, then quality control and trimming, then assembly (FASTA files), then coverage check, then data analysis with ResFinder, PathogenFinder, PlasmidFinder, SpeciesFinder, MLST and Traitar.]


Results

Quality control and trimming

The raw FASTQ files were not considered to be of high enough quality and required trimming before assembly and analysis could be performed. None of the single read files were removed, resulting in no singletons remaining after trimming. The average number of reads for each species and the average read length, both before and after trimming, were calculated and are presented in Table 1 together with the standard deviation within each species. The standard deviation is presented as a percentage deviation from the mean. The number of reads varied considerably within most species, with deviations as high as 132.3% before trimming (135.8% after trimming). More detailed statistics for each individual isolate are presented in Appendix C, Table C1.
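The percentage deviation described above is the standard deviation divided by the mean, multiplied by 100. The sketch below shows the calculation in Python; the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def percent_deviation(read_counts):
    """Standard deviation expressed as a percentage of the mean.

    Returns None for fewer than two values, where a sample standard
    deviation is undefined (shown as "-" in Table 1 for species with n=1).
    """
    if len(read_counts) < 2:
        return None
    return 100 * stdev(read_counts) / mean(read_counts)
```

With made-up read counts of 10, 20 and 30, the mean is 20 and the sample standard deviation is 10, giving a percentage deviation of 50%.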

After trimming, all files were analyzed again using FastQC. Adapter content and quality by position, by tile and by count were at acceptable levels for all FASTQ files, and no adapter warnings were displayed for any file. All files did, however, show small warnings for “per base sequence content”. As seen in Figure 3, the example file, an S. pyogenes FASTQ file, showed higher proportions of thymine and adenine and lower proportions of guanine and cytosine among the first ~14-15 bases, which is likely due to biased fragmentation during library preparation. This fluctuation was seen in all FASTQ files.

Figure 3. Example screenshot from FastQC. The picture shows the “Per base sequence content” results from an S. pyogenes isolate, with the fluctuation in base content among the first ~14-15 bases.


Table 1. Average number of reads before and after trimming for each species (phenotype). Standard deviation is also presented as a percentage deviation, as well as average length before and after trimming.

Reads and lengths are averages; "before" and "after" refer to trimming.

Species (phenotype)         Reads before   Reads after   SD before (%)   SD after (%)   Length before   Length after
S. pyogenes, n=37           2230868.9      1777490.1     66.2            59.7           101             71
S. agalactiae, n=22         536399.5       398220.9      119.8           114.6          101             71
S. dysgalactiae SGC, n=7    1095597        906862.7      87.7            87.2           101             71
S. dysgalactiae SGG, n=13   1308949.5      1033535.5     90.1            85.3           101             71
S. pneumoniae, n=57         775894.6       572619.9      128.7           135.8          101             71
S. anginosus, n=6           1692326        1250903       67.5            64.5           101             71
S. constellatus, n=3        2184702.3      1499811.7     83.2            77.8           101             71
S. equisimilis, n=1         3152970        2436806       -               -              101             71
S. gallolyticus, n=2        1348258        889253        101.9           93             101             71
S. gordonii, n=2            1273596        1065027.5     8.6             6.2            101             71
S. intermedius, n=2         1829379.5      1473183       88.5            87.3           101             71
S. mitis, n=7               877431         623364.5      125.5           105.6          101             71
S. oralis, n=1              1598202        1304622       -               -              101             71
S. parasanguis, n=4         1344291.8      915964.5      132.3           120.4          101             71
S. salivarius, n=3          1743813        1360461.7     86.8            87.2           101             71
S. sanguinis, n=1           2868237        2052292       -               -              101             71

Assembly and coverage

After the sequences were trimmed and quality controlled, assembly was performed in Linux using the SPAdes assembly software, version 3.11.1. Assemblies for most species were completed without any errors (S. pyogenes, S. dysgalactiae (C), S. dysgalactiae (G), S. pneumoniae, S. anginosus, S. constellatus, S. equisimilis, S. gallolyticus, S. gordonii, S. intermedius, S. oralis and S. sanguinis), as seen in Table 1. In total, 24.2% of all 169 assemblies finished with errors stating that too many erroneous k-mers were present. One assembly, for an S. mitis isolate, failed completely and produced no assembled FASTA file. This isolate was excluded from all further analysis. Following the assembly, all FASTA files were run through the QUAST software in order to identify the average coverage for every assembled isolate. As seen in Table A1, 41.6% of the assembled genomes had an average coverage greater than 30, 13.1% had an average coverage between 20 and 30, and 45.2% had an average coverage of less than 20. A coverage greater than 20 but less than 30 was considered somewhat acceptable, since 20 has been reported as sufficient coverage for successful assembly (Chen et al., 2013), while anything lower than 20 was considered too low to be reliable. Coverage greater than 30 was considered ideal (Rieber et al., 2013). S. pyogenes showed an overall higher average coverage, while a majority of e.g. S. pneumoniae and S. agalactiae isolates had low average coverage (below 20). The remaining species showed mixed results, with S. equisimilis, S. gordonii, S. oralis and S. sanguinis having good coverage, while the rest were divided somewhat equally between high, medium and low average coverage.

The average N50, size of the largest contig, number of contigs and total length varied considerably within the species (see Table 2). A good example is S. agalactiae, with an average N50 of 62918 and a percentage deviation of 94%, meaning that the N50 values within that species are widely spread. The average largest contig size for S. agalactiae was 177585 with a percentage deviation of 92%, and the average number of contigs was 251.5 with an extremely high percentage deviation of 153.1%. S. pneumoniae also showed a very low average N50 of 22066, a relatively low average largest contig size and a very high number of contigs (401.8 contigs on average). The percentage deviations for these statistics were also very high, all being higher than 78%, meaning a very wide spread of data points. The low sample size for twelve of the species should also be kept in mind.
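The coverage thresholds described above can be sketched as follows. The estimate uses the common approximation coverage = number of reads × read length / genome length, which is an assumption here and not necessarily how the per-isolate coverage in this study was obtained. The handling of a coverage of exactly 20 or 30 is also an assumption, since the text does not specify the boundaries.

```python
def estimate_coverage(n_reads, read_length, genome_length):
    """Approximate average coverage as total sequenced bases per genome base."""
    return n_reads * read_length / genome_length


def coverage_class(coverage):
    """Classify coverage using the thresholds applied in this study."""
    if coverage > 30:
        return "ideal"
    if coverage >= 20:  # assumption: exactly 20 counts as acceptable
        return "acceptable"
    return "too low"


# Species-level averages from Tables 1 and 2 (reads after trimming,
# trimmed read length, average total assembly length).
pyogenes_cov = estimate_coverage(1777490.1, 71, 1823175)
agalactiae_cov = estimate_coverage(398220.9, 71, 1722872)
```

With these averaged inputs the estimate is roughly 69 for S. pyogenes and roughly 16 for S. agalactiae. These are rough species-level figures only; per-isolate values can differ substantially given the large percentage deviations in read counts.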

Table 2. Statistics for each species (phenotype) post-assembly. Average N50, number of contigs, total length and largest contig, as well as their standard deviation presented as percentage deviation.

| Species (phenotype) | Average N50 | Average largest contig | Average # of contigs | Average total length | SD N50 (%) | SD largest contig (%) | SD # of contigs (%) | SD total length (%) |
|---|---|---|---|---|---|---|---|---|
| S. pyogenes, n=37 | 135967 | 376247 | 35.6 | 1823175 | 22.3 | 38.3 | 30.8 | 4.9 |
| S. agalactiae, n=22 | 62918 | 177585 | 251.5 | 1722872 | 94 | 92 | 153.1 | 39.9 |
| S. dysgalactiae SGC, n=7 | 60938 | 151377 | 74.1 | 2143271 | 13.4 | 17.6 | 17 | 3.1 |
| S. dysgalactiae SGG, n=13 | 69945 | 167549 | 71.9 | 2147515 | 39.4 | 33.5 | 16.7 | 3.8 |
| S. pneumoniae, n=57 | 22066 | 71764 | 401.8 | 1749863 | 82.6 | 78.7 | 99.1 | 29.8 |
| S. anginosus, n=6 | 174359 | 345915 | 43.3 | 1961550 | 79.3 | 51.6 | 53.9 | 2.1 |
| S. constellatus, n=3 | 176839 | 361968 | 29.7 | 1887435 | 20 | 4.1 | 10.3 | 1.3 |
| S. equisimilis, n=1 | 55933 | 221455 | 77 | 2192112 | - | - | - | - |
| S. gallolyticus, n=2 | 174887 | 310389 | 52 | 2217196 | 2 | 17.1 | 29.9 | 6.2 |
| S. gordonii, n=2 | 204750 | 379200 | 26 | 2320198 | 25.6 | 5.9 | 21.8 | 3.3 |
| S. intermedius, n=2 | 274414 | 363258 | 22.5 | 2023172 | 30.4 | 6.9 | 53.4 | 0.5 |
| S. mitis, n=7 | 213694 | 279078 | 274 | 1917039 | 193.9 | 140.2 | 138.2 | 7.6 |
| S. oralis, n=1 | 135907 | 542913 | 45 | 2139235 | - | - | - | - |
| S. parasanguis, n=4 | 53484 | 202020 | 350.5 | 2468546 | 65.8 | 74.8 | 129.2 | 36.7 |
| S. salivarius, n=3 | 101873 | 174238 | 379.7 | 1785677 | 92.2 | 85.2 | 154.3 | 41.9 |
| S. sanguinis, n=1 | 183666 | 404153 | 31 | 2359428 | - | - | - | - |
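For readers unfamiliar with the N50 statistic in Table 2, a minimal sketch of the usual definition is given here: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The contig lengths are hypothetical and the function is an illustration, not the QUAST implementation.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    raise ValueError("at least one contig is required")


# Hypothetical contig lengths with a total assembly length of 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
```

An assembly fragmented into many short contigs reaches half of its total length only after adding many small pieces, which gives a low N50, as observed for S. pneumoniae.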

SpeciesFinder

The 168 FASTA files were then uploaded to the Centre for Genomic Epidemiology’s servers, where they were analyzed. First, the files were analyzed using the SpeciesFinder tool, which identifies the bacterial species by use of the 16S rRNA gene. As can be seen in Appendix A, Table A4, results were mixed. Eight species were identified genotypically as the same species that MALDI-TOF predicted. The genotype was decided by SpeciesFinder, while the phenotype was decided by MALDI-TOF. Isolates marked as (f) or failed are isolates that could not be matched against a particular species in the database, prompting the SpeciesFinder tool to report the nearest matching species. If the 16S rRNA gene has an average coverage and percentage identity of lower than 98% against the nearest species, the species identification is marked as “failed” (Larsen et al, 2014). For the remaining species, either most isolates had the expected genotype with a few other species mixed in, or none of the isolates had the expected genotype. All unexpected genotypes were other species of streptococci, except for one S. pneumoniae isolate which was returned as Truepera radiovictrix. MALDI-TOF was only able to identify the streptococcal group for some of the isolates, while SpeciesFinder was able to identify the exact species to which the isolates belonged. SGA isolates were mainly identified as S. pyogenes, SGB isolates as S. agalactiae, and SGC and SGG isolates as S. dysgalactiae group C and group G, respectively.

MLST

The results from SpeciesFinder were then used when running the MLST analysis. Ideally, the MLST results were expected to return housekeeping genes 100% identical to their template counterparts. However, as shown in Appendix A, Table A3, most results did not match the expected housekeeping genes for the identified species. S. pyogenes, S. dysgalactiae (SGC) and S. equisimilis had higher rates of matching species. MLST was unavailable for seven species. Overall, 61.3% of all isolates were 100% matches with the housekeeping genes of their respective genotypes according to SpeciesFinder. The sequence types were spread somewhat evenly among the different species. Most notably, ST28 seemed more prevalent than the others in S. pyogenes. No sequence type could be established for almost half of the S. pneumoniae isolates (40.3%). For all isolates belonging to the species S. anginosus, S. constellatus, S. gordonii, S. intermedius, S. mitis, S. oralis, S. parasanguis, S. salivarius and S. sanguinis, all results came back as unknown sequence types. This is illustrated in Appendix A, Table A5.
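The principle behind assigning a sequence type can be sketched as a lookup of the combination of allele numbers found at the housekeeping loci. The locus names, allele numbers and sequence type labels here are hypothetical placeholders, not entries from any real MLST scheme, and the sketch is not the implementation of the MLST tool used in this study.

```python
# Hypothetical scheme: three loci and two known allele profiles.
HYPOTHETICAL_LOCI = ("locus1", "locus2", "locus3")
HYPOTHETICAL_ST_TABLE = {
    (1, 4, 2): "ST-A",
    (1, 4, 3): "ST-B",
}


def assign_sequence_type(alleles, loci=HYPOTHETICAL_LOCI,
                         st_table=HYPOTHETICAL_ST_TABLE):
    """Return the sequence type for an allele profile, or 'unknown'.

    `alleles` maps locus name to the allele number of a 100% match;
    a locus without a perfect match is given as None or left out.
    """
    profile = tuple(alleles.get(locus) for locus in loci)
    if None in profile:
        return "unknown"  # at least one housekeeping gene did not match
    return st_table.get(profile, "unknown")  # profile absent from the table


known = assign_sequence_type({"locus1": 1, "locus2": 4, "locus3": 2})
imperfect = assign_sequence_type({"locus1": 1, "locus2": None, "locus3": 2})
novel = assign_sequence_type({"locus1": 9, "locus2": 9, "locus3": 9})
```

Under this simplified view, an isolate ends up as an unknown sequence type either because a locus lacks a perfect match, for example due to a poorly assembled region, or because the allele combination is not in the database.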

ResFinder

Isolates were also analyzed using the ResFinder tool to identify any possible antibiotic resistance genes. The isolates were screened for known resistance genes for aminoglycosides, beta-lactams, colistin, fluoroquinolones, fosfomycin, fusidic acid, glycopeptides, MLS (macrolides, lincosamides, streptogramins), nitroimidazole, oxazolidinone, phenicol, rifampicin, sulphonamides, tetracyclines and trimethoprim. As illustrated in Appendix A, Table A2, very few resistance genes were found, distributed among four groups of antibiotics: MLS, phenicol, aminoglycosides and tetracyclines. S. equisimilis and S. oralis had the highest ratio of isolates with MLS and tetracycline resistance genes, although the sample size for both species was only one. S. agalactiae had a markedly higher proportion of tetracycline resistance (54.5%) than the other species. Both S. pyogenes and S. pneumoniae had very few resistance genes overall, with less than 10% of isolates carrying any given resistance gene. All S. mitis isolates had resistance genes, with more than half being MLS resistance genes.

PathogenFinder and PlasmidFinder

The isolates were also screened for plasmids using the PlasmidFinder tool and assessed as potential human pathogens using the PathogenFinder tool. All 168 isolates were returned as human pathogens, and no plasmids were found.


Traitar

All FASTA files were also analyzed using the software Traitar to predict the phenotypical traits of the bacteria. The isolates were grouped by species and analyzed. All isolates were expected to be Gram-positive cocci (Baron and Patterson, 1996), which Traitar accurately predicted (see Appendix A, Figures A1-A7). For S. pyogenes (Appendix A, Figure A1), most phenotypical traits looked as expected across all isolates. All isolates were predicted to be catalase negative, which was expected (Brenot et al, 2004). All isolates except one (#37) were negative for Voges-Proskauer. β-hemolysis was an exception, with seven isolates being predicted as non-β-hemolytic. Isolate 27, which was identified as S. dysgalactiae by SpeciesFinder, also showed the most divergent results compared to the rest of the isolates.

For S. agalactiae (SGB, Appendix A, Figure A2), the results were a bit more mixed. Samples 50 and 54 seemed to deviate more than the rest, and only had positive votes for respective traits from either phypat alone, or phypat and phypat+PGL combined. Furthermore, ten out of 22 samples were positive for citrate, one of them being classified as such by both classification methods, and the remaining nine by phypat. The remaining twelve samples were classified as citrate negative.

Melibiose and DNase activity were both also very mixed, with almost half the isolates being classified as negative for melibiose by phypat+PGL, and four isolates being classified as DNase positive by phypat+PGL. S. dysgalactiae (SGC, Appendix A, Figure A3) showed a more uniform prediction of traits than S. agalactiae, with only a few exceptions where one sample deviated from the rest. S. dysgalactiae (SGG, Appendix A, Figure A4) also looked very uniform in its predictions, with the exception of β-hemolysis, where 6 out of 13 isolates were predicted not to be β-hemolytic.

For S. pneumoniae, the results were extremely mixed for quite a few traits, as illustrated in Figures A5 and A6 (Appendix A). Sample 96 deviated the most, showing negative predictions for most traits where the rest, or a majority, of the S. pneumoniae isolates were predicted to be positive, such as citrate, esculin hydrolysis and coagulase production. Salicin, acetate utilization and DNase were some of the notable traits where predictions varied considerably within the species.

In Figure A7 (Appendix A), trait prediction for the remaining eleven species is illustrated. The biggest deviations among these species were in the S. mitis group (samples 153-159), where the isolates differed from each other in a few traits, e.g. arginine dihydrolase, where four samples were predicted as negative, and acetate utilization, where half the samples were predicted as negative.

Clinical data

Out of 155 patients, a total of 21 patients (22 isolates) suffered from sepsis according to their clinical diagnosis. The rest had a wide range of different diseases and infections, such as urinary tract infection (UTI), unspecified bacterial pneumonia and erysipelas. The sepsis patients were spread approximately evenly between the different streptococcal species. The isolates from sepsis patients comprised 4/37 S. pyogenes, 2/22 S. agalactiae, 2/7 S. dysgalactiae (C), 4/13 S. dysgalactiae (G), 3/57 S. pneumoniae, 2/6 S. anginosus, 1/2 S. constellatus, 1/2 S. gallolyticus, 1/2 S. gordonii, 1/8 S. mitis and 1/3 S. salivarius isolates. When looking at culture types, the isolates that belonged to sepsis patients were found among four different culture types: anaerobic blood culture (5), aerobic blood culture (11), general culture (3) and upper respiratory tract culture (3). None of the isolates that came from sepsis patients were S. equisimilis, S. intermedius, S. parasanguis or S. sanguinis, nor were there any isolates from throat cultures, urine cultures, spot cultures or lower respiratory tract cultures. 15 of the patients were male, with an average age of 69 years, and 7 were female, with an average age of 78 years.
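The counts above can be tabulated as a simple consistency check. The sketch below uses only the numbers reported in this section, with variable names chosen for illustration; it computes the fraction of sepsis isolates per species and confirms that both breakdowns add up to the 22 sepsis isolates. It is purely descriptive and is not a statistical test.

```python
# (sepsis isolates, isolates of that species) as reported in the text.
sepsis_by_species = {
    "S. pyogenes": (4, 37),
    "S. agalactiae": (2, 22),
    "S. dysgalactiae (C)": (2, 7),
    "S. dysgalactiae (G)": (4, 13),
    "S. pneumoniae": (3, 57),
    "S. anginosus": (2, 6),
    "S. constellatus": (1, 2),
    "S. gallolyticus": (1, 2),
    "S. gordonii": (1, 2),
    "S. mitis": (1, 8),
    "S. salivarius": (1, 3),
}

# Sepsis isolates per culture type as reported in the text.
sepsis_by_culture = {
    "anaerobic blood culture": 5,
    "aerobic blood culture": 11,
    "general culture": 3,
    "upper respiratory tract culture": 3,
}

total_by_species = sum(sepsis for sepsis, _ in sepsis_by_species.values())
total_by_culture = sum(sepsis_by_culture.values())
sepsis_fraction = {
    species: sepsis / total
    for species, (sepsis, total) in sepsis_by_species.items()
}
```

With such small counts per species, the fractions are sensitive to single isolates, which is worth keeping in mind when comparing species.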


Discussion

Considering the trimming and quality control of the sequences, the overall quality of the samples was good. No isolates were discarded, and with only warnings about fluctuations of base-content among the first few bases, the trimming seemed to have been successful as well. The fluctuation in base-content (Figure 3) could likely be attributed to biased fragmentation during the library preparation, which seems to be a known issue for some types of libraries, especially Illumina (Hansen, et al, 2010). This issue does not seem to be correctable by trimming, but on the other hand, it does not seem to have any severe effects on any downstream analysis either (Babraham Informatics, 2018).

Assembly of the genomes was challenging, considering the number of warnings regarding erroneous k-mers present in most sequence files. Assemblies were, however, completed. Coverage varied immensely between the isolates, as can be observed in Appendix A, Table A3, with 45.2% of the isolates having an average coverage below 20. A coverage of 30 or higher was chosen as the threshold for a higher likelihood of reliable assembly results, since 30 is used in quite a few studies (Rieber et al., 2013; Kisand and Lettieri, 2013; Sims et al., 2014; Ellington et al., 2017) and is considered the threshold needed for correct assembly without any gaps (Chisatz, et al, 2011). Another study has reported that a coverage of 20 was sufficient for de novo assembly and identification of SNPs (Chang et al., 2013), allowing an average coverage of 20 to be the lowest acceptable threshold for this study, although a coverage of >30 is likely more reliable than a coverage of 20-30. Higher coverage is favorable, since it means an increased likelihood of successful assembly (Koren and Phillippy, 2015). However, higher coverage does not guarantee that the results are correct or that the genome has been assembled 100% correctly in all cases. It simply increases the likelihood (Sims et al, 2014; Pablo et al, 2018; Koren and Phillippy, 2015). Likewise, low coverage does not guarantee that the results are wrong. Studies have shown that combining reference-guided assembly with de novo assembly can significantly improve the quality of the assembly (Lischer & Shimizu, 2017), which could potentially have improved the quality of the assembled genomes used in this study. This is perhaps something to consider in future studies when assembling NGS reads from streptococci.

When comparing the coverage to the rest of the analyses, there seems to be a trend towards less favorable results for assemblies with low coverage and/or warnings about erroneous k-mers during assembly. For example, in the S. pneumoniae group (which had the highest percentage of low-coverage assemblies and displayed the most warnings during assembly), all isolates that showed warnings during assembly either did not return a 100% match for S. pneumoniae in the MLST analysis or showed extremely low coverage. All four assembled genomes that did not pass as S. pneumoniae in SpeciesFinder (Appendix A, Table A4) were also among those that returned errors during the assembly.

Looking at the dataset as a whole, especially at the analyses that did not turn out as expected (i.e. MLST, SpeciesFinder), the same trend continues. Isolates with low coverage and/or assembly errors commonly either did not match 100% in the MLST analysis or were not identified as the expected species. Although the unexpected SpeciesFinder results could be due to the isolates simply being another species, the low coverage of these isolates means that they may well belong to the predicted species but were not assembled correctly.

MLST was unavailable for 30 out of 168 samples. These samples belonged to S. anginosus, S. constellatus, S. equisimilis, S. gordonii, S. intermedius, S. mitis, S. parasanguis, S. salivarius and S. sanguinis. As such, the sequence type could not be established for these samples. Future studies on these species would perhaps benefit from using other tools or databases that have MLST profiles for these species, or from waiting for existing databases to be updated.

Low coverage and incorrect assembly of parts of the genomes would explain other results as well, e.g. the sequence types. S. pyogenes, as mentioned earlier, had good coverage across all isolates, with a couple of exceptions. The sequence types were spread, although ST28 (see Appendix A, Table A5) was more prevalent than the others. Only two of the isolates were unknown sequence types. Compare this to S. pneumoniae, where the coverage was extremely poor for the species as a whole, along with extremely mixed results from the MLST, with 23 out of 57 isolates being unknown sequence types and one being identified as a non-streptococcal species.

S. pyogenes was the species that returned the best coverage and no errors during assembly. All of the isolates except for one were identified as S. pyogenes, which is a part of Streptococcus Group A (Reglinski and Sriskandan, 2015). All isolates also returned a 100% match from the MLST against S. pyogenes, except for the isolate identified as S. dysgalactiae. Looking at the ResFinder results, only two of the isolates seemed to have any antibiotic resistance genes. Both isolates had genes for resistance towards tetracyclines, while the other 35 isolates had none. This seems to correspond with earlier research on S. pyogenes, where it is reported that only resistance against macrolides and tetracyclines is found in clinical isolates of S. pyogenes (Cattoir, 2016). None of the S. pyogenes isolates seemed to contain any plasmid DNA, which also seems to correspond with earlier research (Horn et al., 1998). S. pyogenes does not seem to be as intrinsically competent as S. pneumoniae, and does not seem to commonly transfer genes by conjugation, since plasmids are almost never observed in clinical isolates (Horn et al., 1998).

The S. agalactiae isolates included a high number of samples that carried resistance genes against tetracyclines. Out of the 21 isolates, 12 harbored tet(M) genes. This seems to be in agreement with previous studies on S. agalactiae, where tetracycline resistance has been very prevalent and the most prominent tetracycline resistance gene has been the tet(M) gene (Hraoui et al., 2012; Emaneini et al., 2014; Usein et al., 2012). In these studies, however, tetracycline resistance was even more prevalent (>90%) among the samples than in the current study (57%). This could be attributed to a couple of different factors. The first is sample size, since the previously mentioned studies included more S. agalactiae samples. The second could be that coverage for the S. agalactiae isolates was extremely low (15/21 isolates with <20 coverage). Three of the isolates that had tetracycline resistance genes expressed any resistance towards tetracyclines phenotypically based on laboratory data. The remaining 9 samples were not tested for tetracycline resistance in the lab, for unknown reasons. This could be due to isolates being collected from different sample types and therefore being tested in different labs, which do not all test for the same antibiotics. Eight of the isolates also displayed errors during assembly, and the same 8 samples did not match 100% with S. agalactiae in the MLST.

S. pneumoniae had two isolates carrying tetracycline resistance genes (Appendix A, Table A2), both of them being tet(M) genes. This is very low compared to previous studies, which show numbers closer to 70% of S. pneumoniae carrying tetracycline resistance genes (Reinert, 2009). Both samples also carried resistance genes against MLS. Despite the low sample size of two isolates with tetracycline resistance, both having tet(M) genes is in agreement with previous studies, which have shown that the tet(M) gene is the most common gene coding for resistance against tetracycline in S. pneumoniae (Doherty et al., 2000). The gene mediates tetracycline resistance by ribosomal protection. It allows the tRNA and the acceptor site in the ribosome to bind, even in the presence of tetracycline (Burdett, 1996). The low occurrence of resistance genes in these samples could possibly be attributed to either of two factors. The first is that the genomes have not been assembled correctly, considering that such a large portion of the S. pneumoniae assemblies had very low average coverage. As mentioned earlier, many of the assemblies also returned errors during the assembly process. The second could simply be that the genomes were assembled correctly, but the bacteria have not acquired any resistance genes.

S. mitis also had a high ratio of resistance genes, mainly for MLS, which has been reported in earlier studies as well (Seppälä et al., 2003).

The results from Traitar (Appendix A, Figures A1-A7) were difficult to interpret. Overall, S. pyogenes looked like the most accurately predicted group, with a few exceptions, such as seven isolates being predicted as non-β-hemolytic, while S. pyogenes is a β-hemolytic species (Facklam, 2002). The expected result would have been uniform traits within the same species, e.g. all S. pyogenes isolates being Voges-Proskauer negative. However, some phenotypical traits varied a lot within the same species, e.g. DNase activity in S. agalactiae (Appendix A, Figure A2), acetate utilization in S. pneumoniae (Appendix A, Figures A5 and A6) and β-hemolysis in S. dysgalactiae (SGG, see Appendix A, Figure A4). This could, as previously mentioned, be attributed to the errors during assembly and small FASTQ file sizes.

Possible explanations for the extremely low coverage among all samples could be errors either during the sequencing or during the DNA extractions performed. SciLifeLab in Stockholm stated that the yield for certain species is a lot worse when using MagNA Pure extraction than with other extraction methods. It was also stated that other clients had been able to increase the yield by switching from automated to manual DNA extraction (SciLifeLab, 2017, personal communication, November 2). However, manual extraction would require more time and higher cost than automated extraction. Furthermore, no optimal protocol for manual extraction of bacteria was available at UniLabs at the time of DNA extraction (Enroth H., 2018, personal communication, May 30). Better refined methods that circumvent the need for manual extraction for certain species would allow labs to continue using automated extraction.

When looking at the sizes of the FASTQ files, almost all genomes that returned errors during assembly had FASTQ files that were considerably smaller than those that did not return any errors. File sizes before quality control ranged from 3 MB to about 600 MB, meaning that the number of sequences contained in each file varied by a factor of up to 20 within the same species. As seen in Table 2, the average number and size of contigs vary between the different species (e.g. an average of 251.5 contigs for S. agalactiae and 35.6 for S. pyogenes), and the percentage deviation is high for a majority of the species. Average N50 also varies both between and within species. For S. agalactiae, the average N50 was less than half that of S. pyogenes. The average number of contigs was also ~7 times as high, and the average size of the largest contig was less than half that of S. pyogenes. N50 is regularly used as a metric of assembly quality, since it represents how much of the genome is covered by larger contigs (Schatz et al, 2010). The values for e.g. S. agalactiae in this study were worse than those from previous studies (de Aguiar et al, 2016), suggesting relatively poor assembly quality for a majority of the samples. This could very well affect coverage depth, since having shorter sequences to work with means more possible combinations and lower coverage depth during assembly, which in turn reduces the reliability of the assembled genome.

A notable portion of the assembled genomes also deviated from their expected genome sizes. For example, two assembled S. agalactiae genomes were only 0.3 Mb and 0.086 Mb (Appendix C, Table C1), while the genome should be around 2 Mb (de Aguiar et al, 2016). The average total length for each species, its percentage deviation and its expected genome size are presented in Appendix A, Table A6. Most species seem to be relatively close to their expected assembly sizes, with a relatively low percentage deviation of ~1-6%. For S. agalactiae, S. pneumoniae, S. parasanguis and S. salivarius, however, the assembled genome size tends to deviate from the expected genome size, with a high percentage deviation (up to 41.9%). Coupled with the quality control and trimming, the number of usable sequences decreases even further. Another reason might be the difficulty of growing streptococci, particularly S. pneumoniae. Because of their tendency to undergo autolysis when certain conditions are met, mainly when staying in the stationary phase for too long, acquiring viable log-phase cells is quite difficult (Restrepo et al., 2005; Martner et al., 2009).

Gaining more knowledge about optimal culturing conditions for Streptococci would ideally yield increased amounts of viable cells, hopefully improving the sequencing results after DNA extraction.

Only four of the 22 isolates that came from sepsis patients had any resistance genes. One SGB isolate, which was from a 90-year-old female, had genes for resistance against both MLS and tetracyclines. The isolate was from an aerobic blood culture. Two of the isolates were SGG isolates which both had genes for resistance against tetracyclines. One of those two isolates was from a general culture from an 82-year-old female, while the other was from an aerobic blood culture from an 82-year-old male. The fourth isolate was an S. pneumoniae isolate taken from a 66-year-old male. This isolate had resistance genes against MLS and was from an aerobic blood culture. All four patients only had single isolates extracted.

When comparing the phenotypical and genotypical antibiotic resistances, only three samples that expressed resistance towards antibiotics had genes for those resistances. Two of the samples were S. pneumoniae samples, both with resistance against tetracyclines, and the third sample was the single S. oralis sample, which also expressed resistance against tetracyclines. The remaining 25 samples that possessed resistance genes against tetracycline did not express this trait phenotypically. None of the 12 samples that had resistance genes against MLS showed any phenotypical resistance against MLS. The results were the same for the single resistance gene found against aminoglycosides and the two S. mitis isolates that had resistance genes against phenicol; no phenotypical resistance was found against either of these antibiotic groups.

Whether the species identifications or the rest of the results are correct is extremely difficult to tell with the data at hand. 16S rRNA sequencing is one of the gold standards in modern medicine for identifying bacterial species (Janda and Abbott, 2007). The MALDI-TOF procedure is a fast, accurate and relatively cheap alternative, able to discern between different species with little to no difficulty. Its drawback is, however, that it relies on having access to databases that are as complete as possible. With insufficient data to work with, the bacteria might only be identified down to e.g. genus level (Singhal et al., 2015). A majority of the bacteria in this study were only identified down to their respective groups using MALDI-TOF, e.g. Streptococcus Group A, B, C, G. It is also uncertain whether the S. salivarius identification refers to the S. salivarius group or to the species S. salivarius, which is one of three species in the S. salivarius group, together with S. vestibularis and S. thermophilus (Thompson et al., 2013), which SpeciesFinder identified the samples as. In the scenario where MALDI-TOF identified the isolates down to the group level only, NGS would prove superior in terms of species identification. Further contributions to the MALDI-TOF database used could potentially improve these classifications. Even though MALDI-TOF has its advantages, the ideal situation would be to apply NGS as soon as possible after the patient is admitted and suspected of having sepsis. Being able to use NGS directly on patient samples would not only help determine the species quickly, but also help identify sequence types and antibiotic resistances more quickly. This would help

References
