
THE B CELL TRANSCRIPTIONAL LANDSCAPE UPON MUCOSAL HEALING IN EXPERIMENTAL COLITIS


Academic year: 2022


A SINGLE CELL TRANSCRIPTOMICS ANALYSIS

Claudio Novella Rausell

Degree project in bioinformatics, 2020

Degree project in bioinformatics, 45 credits, for the master's degree, 2020

Biology Education Centre, Uppsala University, and Karolinska Institute, Center for Molecular


Abstract

Upon damage of the intestinal barrier, a complex interaction between the immune system and the epithelium is required to promote mucosal healing. However, which cell types are involved, and how the immune system orchestrates changes in the tissue transcriptomic landscape to eventually modulate mucosal healing, is yet to be investigated. Here, using scRNA-seq, I identified a particular cluster of cells that increased the most during mucosal healing and is characterized by the expression of serine protease inhibitors and by a higher JAK/STAT activation signature. Further characterization of the stromal and intestinal epithelial cell networks corroborated that B cells modulate transcriptomic programs related to tissue remodelling in these compartments. All


Divide and conquer

Popular Science Summary Claudio Novella Rausell

Inflammatory Bowel Diseases affect more than 2.5 million people worldwide and no cure has been found yet. These diseases affect the gastroenteric tissue and are characterized by phases of inflammation followed by recovery.

The human body is compartmentalized. First we have organs (such as the liver or the heart), each specialized in a specific function (filtering and pumping blood, respectively).

These organs are made up of tissues (such as muscular or nervous tissue), and each tissue is formed by groups of organized cells. These cells are the simplest form of life, but they need instructions on how to behave. These instructions are written in a special language, ribonucleic acid (RNA).

Thanks to the improvements in technology, today we are able to examine these cells one by one and read their instructions (i.e. RNA). Once we’ve read their instructions we can group cells based on how similar their instructions are. For example, if we were to read the instructions of all our home appliances we would end up with groups such as cooking, cleaning, music or video. But if we read the instructions of our friend’s home, we might end up with slightly different groups: mechanics, bikes, video and cooking, for instance. We can do the same with our cells, reading their instructions when we are healthy and in disease to see differences in these groupings that might be causing a disease.

In this study I’ve followed this approach to study Inflammatory Bowel Diseases using a mouse model. I’ve found that the instructions of some cells change in the recovery phase of the disease.

These findings open the door to new studies that will try to find a possible cure.


Contents

Abstract
Popular Science Summary
Abbreviations
1 Introduction
1.1 Inflammatory Bowel Diseases
1.1.1 Generalities
1.1.2 Relevance
1.1.3 Current treatment
1.2 The immune system and IBD
1.2.1 Innate immune system
1.2.2 Adaptive immune system
1.2.3 B cells, the missing link
1.3 Single cell technologies
1.4 Computational analyses of scRNA-seq
1.5 Objectives of the degree project
2 Materials and Methods
2.1 Biological material
2.1.1 Mice
2.1.2 Colitis model in mice
2.2 Isolation and cell sorting
2.3 Single cell sequencing protocol
2.3.1 Library preparation
2.3.2 Sequencing and preprocessing
2.4 scRNA-seq analysis of B cells during tissue repair
2.4.1 Enrichment analysis
2.5 scRNA-seq analysis of Stromal and Epithelial cells in B cell-depleted mice
2.5.1 Topic Modeling
3 Results
3.1 scRNA-seq reveals B cell heterogeneity dynamics during recovery
3.2 Cell heterogeneity uncovers pathway activity perturbations in B cells during experimental colitis
3.3 B cells present a shifted transcriptional fingerprint amid recovery
3.4 B cells impact tissue remodeling programs during mucosal healing
4 Discussion
4.1 Ethical statement
5 Conclusions and future perspectives
6 Acknowledgements
References
Appendix

(9)
(10)

Abbreviations

In alphabetical order, a list of all abbreviations used in the text.

APC Antigen Presenting Cell
API Application Programming Interface
B1b B1 B cell
BCL Binary base Call format
Bmem Memory B cell
CD Crohn’s Disease
cDNA complementary DNA
CLI Command Line Interface
DC Dendritic Cell
DEG Differentially Expressed Gene
DNA Deoxyribonucleic Acid
DT Diphtheria toxin
DTT Dithiothreitol
EDTA Ethylenediamine Tetraacetic Acid
FCS Fetal Calf Serum
GC Germinal Center
GWAS Genome Wide Association Study
HBSS Hanks’ Balanced Salt Solution
HEPES 4-(2-Hydroxyethyl)-1-Piperazineethanesulfonic Acid
IBD Inflammatory Bowel Disease
IEL Intraepithelial lymphocyte
IL Interleukin
ILF Intestinal Lymphoid Follicle
IVT In vitro transcription
JAK-STAT Janus kinase signal transducer and activator of transcription
LDA Latent Dirichlet Allocation
LP Lamina propria
LPL Lamina Propria Lymphocyte
MAPK Mitogen-activated protein kinase
MLN Mesenteric Lymph Node
mRNA messenger RNA
NGS Next Generation Sequencing
PBS Phosphate-Buffered Saline
PCR Polymerase Chain Reaction
RAG Recombination-activating Gene
RNA Ribonucleic Acid
scRNA-seq single cell RNA sequencing
SNN Shared Nearest Neighbor
TF Transcription Factor
TNF Tumor Necrosis Factor
UC Ulcerative Colitis
UMI Unique Molecular Identifier


1 Introduction

1.1 Inflammatory Bowel Diseases

The term Inflammatory Bowel Disease (IBD) refers to a set of chronic inflammatory diseases of the gastroenteric tissue. These include Ulcerative Colitis (UC) and Crohn’s disease (CD), among others. Both diseases are defined by cycling phases of relapse and remission after inflammation. While CD can affect any part of the gastrointestinal tract, UC typically involves the colon (Wallace, 2014).

1.1.1 Generalities

Like many other immune-mediated diseases, IBDs are characterized by immune deregulation. In this case, it induces an abnormal response against the commensal microbiota in patients with a predisposing genetic background. Recent studies trying to unravel the pathogenesis of the disease have revealed its association with environmental factors, genetics and the microbiome.

Among the environmental factors involved, antibiotic use (Hviid et al., 2011) and diet (De Filippo et al., 2010) have been reported to be associated with the disease. Furthermore, the surge of Genome Wide Association Studies (GWAS) in the late 2000s unveiled genetic variants that are associated with an increased susceptibility to IBDs. These variants play a role in innate and adaptive immunity regulatory networks (Burton et al., 2007; Duerr et al., 2006; Hampe et al., 2007).

At birth, the human gastrointestinal tract is colonized by a broad range of microorganisms. These microorganisms outnumber host cells approximately 10-fold (Saleh and Elson, 2011), creating a vast, complex network where immune cells and the microbiota interact with each other. The tolerance as well as the regulation of these interactions is key for intestinal homeostasis (Nell et al., 2010). It has been widely described how the host microbiota serves as a stimulus for an inflammatory response leading to IBD (D’Haens et al., 1998).

It is still unclear whether the cause of IBD is an imbalance in bacterial content or a deregulated immune response against the microbiota, but it is probably an interplay of these and other unknown factors.

1.1.2 Relevance

During the second half of the 20th century, an increase in the incidence of UC and CD took place in western countries (Molodecky et al., 2012). Recent studies have shown that the same shift is happening in countries with rapid socioeconomic growth (Kaplan and Ng, 2016).

Nowadays, more than 2.5 million people suffer from the disease worldwide (Burisch et al., 2013). In Europe, the reported prevalence of IBD reached 505 per 100,000 persons for UC and 322 per 100,000 persons for CD. In North America, the prevalence of UC was reported to be 286.3 cases per 100,000 persons, and that of CD 318.5 cases per 100,000 persons (Ng et al., 2017).


1.1.3 Current treatment

Since the pathogenesis of the disease has not been fully characterised yet, today’s treatment tries to maintain the patient in remission and mitigate secondary effects, rather than correcting or reversing the mechanism. The different treatments are classified based on their mechanism of action. Broadly speaking we can find aminosalicylates (5-aminosalicylic acid; 5-ASA), which try to keep the patient in remission in UC; corticosteroids and immunosuppressants (azathioprine, AZA; mercaptopurine, 6-MP), used to induce and maintain remission; and monoclonal antibodies such as anti-TNF agents (Pithadia and Jain, 2011).

However, subgroups of patients fail to respond to anti-inflammatory therapies such as anti-TNF or 6-MP treatment, while others develop adverse effects. This, together with the disease prevalence presented above, highlights the importance of IBDs in our society and the urgent need to find alternative therapies.

1.2 The immune system and IBD

As stated above, IBDs are characterised by immune deregulation. This phenomenon comprises epithelial damage (Korzenik and Podolsky, 2006), usually related to anomalous mucus production or defective repair; the development of inflammation induced by the microbiota and by immune cells infiltrating the lamina propria (LP) (Choy et al., 2017); and the failure of the immune regulation that controls the aforementioned response (Ince and Elliott, 2007).

The gastrointestinal immune system consists of innate and adaptive immunity. The innate immune system comprises the intestinal mucin, the epithelium, an acidic pH, macrophages, neutrophils, Dendritic cells (DCs) and Innate Lymphoid Cells (ILCs), as well as cytokines and other immune molecules such as interleukins (ILs) and tumor necrosis factors (TNFs). When this innate response is not able to prevent the pathogenesis, the adaptive immune system starts producing pathogen-specific B and T cells that will help overcome the disease.

It has been shown that mice lacking Recombination-activating genes (RAG-/-), and hence lacking an adaptive immune system, do not develop spontaneous colitis. However, when these mice are treated with chemicals or pathogens that induce colitis-like conditions, they do develop colitis (Buonocore et al., 2010). This suggests that the components of the innate immune system, in the absence of an adaptive immune system, can induce IBD. Nonetheless, the adaptive immune system is thought to play a role in the immunopathology of the disease, either through a surplus of pro-inflammatory cytokines or through defective anti-inflammatory effector cells.

All in all, the pathology of IBD is tightly regulated. Whether an immune tolerance or a defensive inflammatory response takes place highly depends on the appropriate functioning of this regulation.


1.2.1 Innate immune system

The innate immune system is the first defensive measure against foreign pathogens and anomalies and triggers the adaptive immune response. This response is non-specific and has no memory.

One of the first effectors of the innate immune response is the epithelium. There are different subtypes of Intestinal Epithelial Cells (IECs), each specialized in its function; these functions include production of mucus, secretion of hormones and synthesis of antimicrobial peptides and proteins. The epithelium is supported by connective tissue cells such as stromal cells, which can influence the activity of immune cells by producing cytokines and other signalling molecules. This can, for instance, modulate the activity of the epithelial stem cell niche, affecting growth and renewal of epithelial cells (Roberts et al., 2017). Epithelial cells are renewed every two to three days in a process of proliferation and apoptosis. Disruptions in this process enable organisms to break through the layer and trigger an immune response that, in turn, can lead to IBD (Sartor, 2006).

The next effectors in the innate response chain are the innate immune cells present in the LP. Among these we can find Antigen Presenting Cells (APCs) such as DCs and B cells. B cells sense antigens and present them to T cells, polarizing their response (Batista and Harwood, 2009).

1.2.2 Adaptive immune system

The adaptive immune system is pathogen-specific and confers lasting immunity. It is triggered when the innate immune system has not been able to clear the pathogen. It is thought to be the main contributor to IBD (Wallace, 2014).

The main components of the adaptive immune system are lymphocytes. In the gastroenteric tissue we can find two groups of lymphocytes depending on how they differentiate, how they mature and where they reside: Intraepithelial Lymphocytes (IELs) and Lamina Propria Lymphocytes (LPLs). Although there are two main types of lymphocytes in the adaptive immune system (B and T cells), lymphocytes in the healthy intestine are mainly T cells. While the effect of different types of T cells has been widely studied in both UC and CD (Hardenberg et al., 2011; Sakuraba et al., 2009), the contribution of other adaptive immune cell types, such as B cells, to the pathology of the disease remains unknown.

1.2.3 B cells, the missing link

In the colon, B cells are mainly located inside intestinal lymphoid follicles (ILFs) and have been associated with inflammation status (Brandtzaeg et al., 2006). It has been shown that UC patients have a distinct phenotype characterized by an increased number of CD19+ B cells compared with healthy controls, unlike CD patients, who show no significant difference (Lee et al., 1997; van Unen et al., 2016).

Using a chemically induced colitis mouse model, it has been shown that there is a B cell expansion upon intestinal repair after inflammation (Frede, A. and Czarnewski, P., unpublished observations), suggesting that these cells might be a promising target for novel therapies.

1.3 Single cell technologies

The heterogeneity of cell populations can give us clues about biological processes such as development, gene expression and regulation, among others. Moreover, this heterogeneity is important in fluctuating conditions, where plasticity is imperative to adapt; on the contrary, in stable environments this heterogeneity is reduced to small shifts around the optimal phenotype.

For years, bulk RNA sequencing (RNA-seq) has been the methodology of choice to characterize cell populations at the molecular level. This approach averages gene expression across all cells, resulting in a loss of heterogeneity information. Single cell technologies aim to characterise this heterogeneity in the cellular context. Specifically, single cell RNA sequencing (scRNA-seq) does this at the transcriptional level. One of the most well-established applications of single cell transcriptomics is the discovery of new sub-populations of cells (Wagner et al., 2016).

In order to capture and amplify the RNA at single cell resolution, several methodologies have been developed in recent years. Among them, the most popular are: Quartz-seq2 (Sasagawa et al., 2018), Drop-seq (Macosko et al., 2015), SMART-seq3 (Hagemann-Jensen et al., 2019), CEL-seq2 (Hashimshony et al., 2016) and MARS-seq (Jaitin et al., 2014).

Although the processing differs between methodologies, all of them need a first step of single cell isolation. Usually, the starting biological material is raw tissue; in this case the material needs to be dissociated into single cell suspensions. Single cells are then isolated with one of the available techniques: manual pipetting, characterised by its low throughput and intense labor; microfluidics, separating cells according to morphology, DNA expression (with fluorescent reporters) (Zhang et al., 2012) or electric and magnetic properties; and fluorescence-activated cell sorting (FACS), which has recently gained popularity and stands as one of the most used sorting methodologies for scRNA-seq studies (Grün and van Oudenaarden, 2015).

Once the single cells have been isolated, cell lysis, mRNA extraction and capture, reverse transcription into the first strand of cDNA, second cDNA strand synthesis, cDNA amplification and library preparation for Next Generation Sequencing (NGS) platforms are performed (Hwang et al., 2018). All steps until the second strand synthesis are roughly the same across the previously mentioned methodologies. These include cell lysis with a hypotonic buffer containing RNAse inhibitors, polyA capture and synthesis of cDNA by reverse transcription. What makes every methodology different is the second cDNA strand synthesis and the amplification steps.

For the second strand synthesis, two main approaches are used: template-switching (Drop-seq and SMART-seq3) and polyA tailing (Quartz-seq2) (Hwang et al., 2018). The latter is based on the ligation of a short A homopolymer together with a universal primer at the 3’ end of the newly synthesized first cDNA strand. This will serve as a primer binding site for the synthesis of the second cDNA strand. This approach lacks strand specificity and uniform coverage across the transcript, because the polymerase can stop before reaching the 5’ end of the transcript. In contrast, template-switching covers the whole transcript thanks to the activity of the M-MuLV reverse transcriptase, which adds cytosines at the 3’ end of the newly synthesized first cDNA strand only when it reaches the 5’ end of the transcript. The enzyme then 'switches' templates and uses the primed Template Switching Oligo (TSO) instead of the mRNA as its template. This results in a first cDNA strand with a known TSO sequence at its 3’ end that can be used for amplification and second strand synthesis. This approach allows for full-length transcript sequencing and confers strand specificity. Sequencing full-length libraries (SMART-seq3 and Quartz-seq2) is costly, which has been addressed by sequencing only the 5’ or 3’ ends of the transcripts (Drop-seq, CEL-seq2 and MARS-seq). On the other hand, having information about the full-length transcript allows for allelic and isoform expression resolution (Hwang et al., 2018).

To amplify the cDNA, either PCR (Drop-seq and SMART-seq3) or in vitro transcription (IVT; MARS-seq and CEL-seq2) are usually chosen (Hwang et al., 2018). The latter can linearly amplify the transcripts at the cost of being time consuming, while the former produces amplification bias due to the different affinities of the primers to the template. That is, low affinity primers will have underrepresented amplicons while high affinity primers will yield overrepresented amplicons. In order to overcome this amplification bias, 5’ and 3’ end sequencing methodologies have incorporated unique molecular identifiers (UMIs) (Macosko et al., 2015) which are different for every transcript and can be used as a reference of the absolute amount of transcripts before the amplification takes place.
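The role of UMIs in removing amplification bias can be illustrated with a minimal sketch. It assumes that each read has already been assigned a cell barcode, a UMI and a gene; the function name and the toy reads are hypothetical, and real pipelines additionally correct sequencing errors in UMIs, which is ignored here.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse PCR duplicates: count distinct UMIs per (cell barcode, gene)."""
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Hypothetical toy reads: (cell barcode, UMI, gene)
reads = [
    ("CELL_A", "AACGT", "Cd19"),
    ("CELL_A", "AACGT", "Cd19"),   # PCR duplicate of the read above
    ("CELL_A", "GGTCA", "Cd19"),   # second molecule of the same gene
    ("CELL_B", "AACGT", "Cd19"),   # same UMI in another cell: separate molecule
    ("CELL_B", "TTAGC", "Malat1"),
]
counts = count_molecules(reads)
```

Reads sharing the same cell barcode, gene and UMI are treated as copies of one original molecule, so the count no longer depends on how efficiently each molecule was amplified.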

With scRNA-seq, researchers aim to analyse the biological variation that may lead to new hypotheses. However, technical noise remains a major problem that hinders our ability to study this variation. This noise can be introduced in every step of the workflow, from the experimental design to the library sequencing. In order to tackle this, spike-in mRNAs are a common approach that helps to correct the signal by identifying the basal level of expression due to technical noise. In addition, owing to the low amount of messenger RNA (mRNA) present in single eukaryotic cells (10-50 pg) (Kang et al., 2011) and the Poisson sampling of transcripts (Islam et al., 2014), transcript dropouts (genes transcribed but not captured) are frequent and lead to extremely sparse matrices. This is a hallmark of scRNA-seq data and poses a vital analytical challenge that must be taken into consideration (Kharchenko et al., 2014).
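To give an intuition for why dropouts are so frequent, the sketch below assumes that the number of captured molecules of a gene in a cell follows a Poisson distribution. This is a simplification of real capture noise, and the function name and the mean values are illustrative only.

```python
import math


def dropout_probability(expected_captured):
    """P(zero captured molecules) when capture counts are Poisson distributed."""
    return math.exp(-expected_captured)


# Illustrative mean numbers of captured molecules per cell for three genes
for mean_count in (0.1, 1.0, 5.0):
    print(mean_count, round(dropout_probability(mean_count), 3))
```

Under this assumption, a gene with an expected count of one captured molecule per cell is missed in more than a third of the cells, which is consistent with the sparsity described above.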

1.4 Computational analyses of scRNA-seq

The number of tools available for scRNA-seq analyses is ever-increasing. Databases such as www.scrna-tools.org (Zappia et al., 2018) list more than 600 available tools (as of April 17th, 2020), highlighting how rapidly this field is growing. It is worth mentioning that among these tools there are toolkits such as Scanpy (Python-based, Wolf et al. (2018)) and Seurat (R-based, Butler et al. (2018)) that include Application Programming Interfaces (APIs) containing the necessary functions for performing the computational analyses of scRNA-seq data. These analyses include, but are not limited to, Quality Control (QC), Normalization, Gene and cell filtering, Dimensionality Reduction, Clustering and Differential Expression.

The first step in the workflow is to pre-process the data. This includes converting base call format (BCL) files to fastq, demultiplexing reads into samples, collapsing identical UMIs to get absolute counts and trimming reads. Technology-specific and more general tool suites have been developed to automate this process: Cell Ranger (10X Genomics), Drop-seq tools and zUMIs combine these steps into ready-to-use command line interface (CLI) tools (Macosko et al., 2015; Parekh et al., 2018; Zheng et al., 2017).

Just like in bulk RNA-seq experiments, in order to quantify these reads and get gene counts, the sequences are mapped to a reference genome or transcriptome if available. Here, short-read, splice-aware aligners such as STAR (Dobin et al., 2013) and HISAT2 (Kim et al., 2015) are common in their scRNA-seq iterations; pseudo-aligners such as Alevin (Srivastava et al., 2019) and Kallisto (Bray et al., 2016) can be used too.

Single cell technologies can break, kill and stress living cells, yielding non-informative sequencing data from these low quality cells. In addition, droplets or capture sites containing multiple cells or no cells at all can lead to misinterpretation. Because of this, one of the most critical steps to generate informative and accurate results is QC. Several metrics must be taken into account to filter low quality cells and draw proper conclusions from the data. These metrics include the mitochondrial and ribosomal gene fractions, the number of genes and the number of reads, among others.

In practice, there is no single metric that can discriminate low quality cells. However, once the data is normalised, overlaying quality metrics by sample on dimensionality reduction plots (e.g. Principal Component Analysis, PCA) can help expose low quality cells and possible batch effects. In fact, it has been shown that these have a strong association with the leading principal components (Wagner et al., 2016). These batch effects can be technical, mainly due to experimental procedures, or biological, such as cycling state or donor in human samples.

Although RNA-seq global scale methods such as DESeq2 (Love et al., 2014) can be used to normalize single cell data (Hwang et al., 2018), specific strategies have been developed in order to overcome the sparsity and variable sequencing depth across cells of scRNA-seq (Svensson et al., 2017). Methods such as SCTransform and SCnorm (Bacher et al., 2017; Hagemann-Jensen et al., 2019), using different strategies, learn gene-specific factors that are used to normalize the expression across cells.

However, sometimes batch effects cannot be removed by normalization or detected by visually inspecting patterns in a low dimensional space. In this case, several computational methods and techniques have been established. These include Canonical Correlation Analysis (CCA, implemented in Seurat) and k-nearest Neighbor (kNN) classification (implemented in kBET; Büttner et al., 2019), among others. In addition to these methods, quality control metrics might be regressed out if suspected to contribute to batch effects.

In order to identify sub-populations of cells, unsupervised clustering of cells followed by visualisation in a lower dimensional subspace is performed. Although there is a wide spectrum of algorithms for unsupervised clustering to choose from, graph-based community detection algorithms such as Leiden (Traag et al., 2019) are the most efficient (Zhang et al., 2020).

When it comes to dimensionality reduction, the methods are applied to a subset of genes that are known a priori to define populations, or to statistically determined highly variable genes. This reduces the use of computational resources while keeping valuable biological heterogeneity. The most commonly used methods for dimensionality reduction are PCA, t-distributed Stochastic Neighbour Embedding (tSNE) (van der Maaten and Hinton, 2008) and Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2018).

Although unsupervised clustering is one of the most popular methodologies to characterize single cell heterogeneity, other algorithms based on unsupervised learning can be used, such as topic modeling with latent Dirichlet allocation (LDA). LDA was originally developed to obtain topics that explain the words found in a collection of texts (Blei et al., 2003). In scRNA-seq, each cell can be thought of as a document, and each topic as a biological program consisting of genes instead of words. With the number of topics fixed as input, the fitted model assigns every cell a weight on every topic. Moreover, it also infers the weight every gene has on every topic.
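The two outputs of LDA described above (cell-by-topic and topic-by-gene weights) can be illustrated with a small collapsed Gibbs sampler. This is a didactic sketch and not the implementation used in this project: the function name, the hyperparameters `alpha` and `beta`, and the toy count matrix are all assumptions, and dedicated libraries should be preferred for real data.

```python
import random


def fit_lda(counts, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a cells x genes count matrix."""
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    cell_topic = [[0] * n_topics for _ in range(n_cells)]
    topic_gene = [[0] * n_genes for _ in range(n_topics)]
    topic_total = [0] * n_topics
    # One token per observed transcript, each with a random initial topic
    tokens = []
    for c in range(n_cells):
        for g in range(n_genes):
            for _ in range(counts[c][g]):
                t = rng.randrange(n_topics)
                tokens.append([c, g, t])
                cell_topic[c][t] += 1
                topic_gene[t][g] += 1
                topic_total[t] += 1
    for _ in range(n_iter):
        for token in tokens:
            c, g, t = token
            # Remove the token's current assignment before resampling it
            cell_topic[c][t] -= 1
            topic_gene[t][g] -= 1
            topic_total[t] -= 1
            weights = [
                (cell_topic[c][k] + alpha)
                * (topic_gene[k][g] + beta) / (topic_total[k] + n_genes * beta)
                for k in range(n_topics)
            ]
            t = rng.choices(range(n_topics), weights=weights)[0]
            token[2] = t
            cell_topic[c][t] += 1
            topic_gene[t][g] += 1
            topic_total[t] += 1
    # Smoothed estimates: weight of every topic per cell, and of every gene per topic
    theta = [[(n + alpha) / (sum(row) + n_topics * alpha) for n in row]
             for row in cell_topic]
    phi = [[(n + beta) / (topic_total[k] + n_genes * beta) for n in topic_gene[k]]
           for k in range(n_topics)]
    return theta, phi


# Hypothetical toy data: 4 cells x 4 genes with two expression programs
toy_counts = [[8, 7, 0, 0], [9, 6, 0, 0], [0, 0, 7, 9], [0, 0, 8, 6]]
theta, phi = fit_lda(toy_counts, n_topics=2)
```

Here `theta[c]` holds the weights of cell `c` on every topic and `phi[k]` the weights of every gene on topic `k`; both sum to one, so a cell can be described as a mixture of programs rather than being forced into a single cluster.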

Checking for the expression of known cell type markers can help annotate the clusters. However, obtaining a list of signature genes for the clusters allows for the functional characterization of unknown sub-populations. These genes can be obtained by computing differentially expressed genes (DEGs). Here, methodologies already described for bulk RNA-seq as well as newly developed scRNA-seq tools can be applied to obtain cluster-to-cluster DEGs or globally DEGs per cluster. Among these, MAST (Finak et al., 2015) and DESeq2 (Love et al., 2014) are the most efficient (Dal Molin et al., 2017).

To functionally annotate these genes, pathway enrichment analysis, transcription factor (TF) analysis and pathway activity analysis can be done (Holland et al., 2020). In these methods, functional gene sets, TFs and pathways are statistically tested for over-representation in the empirical gene list compared to what is expected by chance.
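One common way to test for over-representation is a one-sided hypergeometric test (equivalent to Fisher's exact test). The sketch below is an assumption about how such a test can be computed, not necessarily the statistic used by the tools cited above; the function and argument names are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_universe, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_universe genes, n_in_set of which belong to the gene set."""
    total = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    favourable = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total


# Hypothetical example: 10 genes in total, a gene set of 5,
# and a list of 5 DEGs that all fall inside the gene set
p_value = enrichment_pvalue(n_universe=10, n_in_set=5, n_selected=5, n_overlap=5)
```

A small p-value indicates that the overlap between the gene list and the gene set is larger than expected by chance; when many gene sets are tested, the p-values should be corrected for multiple testing.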

1.5 Objectives of the degree project

The main goal of this master’s thesis was to characterise a previously overlooked population of immune cell types in the context of experimental colitis. To this end, single cell RNA sequencing was performed on isolated B cells. The data analysis allowed us to find a novel subpopulation of B cells that increases the most upon mucosal healing, pointing towards a probable novel function in this process.

Another goal of this work was to deeply characterise scRNA-seq data while considering cell plasticity. To this end, a Latent Dirichlet Allocation algorithm was implemented following Topic Modelling literature standards in other fields such as linguistics. This implementation let me get a more nuanced view of each cell’s functional fate while keeping biological information.


2 Materials and Methods

The following steps (2.1, 2.2) were carried out by other members of the lab with expertise in the field. However, they are included in this report to portray the overall workflow of the project.

2.1 Biological material

Despite the fact that immunological processes are not always analogous to those in humans, Mus musculus models have been widely used to study immunological disorders such as IBD in vivo. Because of this, and due to its vast genetic catalogue, a colitis mouse model was chosen as the source of biological material for this project.

2.1.1 Mice

All mice used for experiments were 6-16 weeks of age and were of C57BL/6 background. Wild-type mice were purchased from Taconic (Taconic, Ry, Denmark). CD19-CrexiDTR mice (Buch et al., 2005) were kindly provided by Nadine Hövelmeyer (University of Mainz, Germany).

Animals were maintained under specific pathogen-free conditions at the KM-A animal facility and handled according to protocols approved by the Institutional Animal Care and Use Committee at the Karolinska Institutet (Stockholm, Sweden).

2.1.2 Colitis model in mice

Among all the available chemically induced colitis models, Dextran sulfate sodium (DSS) was chosen for this study. DSS is provided at a given concentration in drinking water, inducing severe colitis that resembles the human UC phenotype (Okayasu et al., 1990).

Although its mechanism of action is not known, it is a widely held view that DSS is toxic to epithelial cells in the basal crypts of the gut. This toxicity damages epithelial cells, exposing luminal content to LPLs and IELs and generating a rapid and acute inflammatory response. This rapid, acute response is extremely useful for our study, allowing for the characterization of the resolution and regeneration of acute inflammation.

Colitis was induced in male or female mice by adding 2% DSS (Affymetrix ThermoFisher, Germany) to the drinking water for 7 days followed by a 7 day recovery phase with normal drinking water. For B cell depletion experiments Diphtheria toxin (Sigma Aldrich; DT) was administered i.p. on days 9, 10, 12 and 13. Severity of colitis was assessed throughout the experiments by measuring body weight loss. On day 14, colon length was measured and histological analysis was performed for both experiments to assess the recovery phenotype.

2.2 Isolation and cell sorting

Colon and small intestine lamina propria cells were isolated as previously described (Villablanca et al., 2011) with slight modifications. Briefly, intestines were harvested and placed in ice-cold PBS. After removal of residual mesenteric fat tissue the intestines were opened longitudinally. Tissues were then washed in ice-cold PBS and cut into 1 cm pieces. After 3 more washes in PBS, tissues were incubated on a shaking incubator at 37 °C for 30 minutes in 20 mL of HBSS containing 5% FCS, 5 mM EDTA, 1 mM DTT and 15 mM HEPES. Tissue pieces were washed with 20 mL PBS with 5% FCS and 1 mM EDTA followed by PBS with 1% FCS and 15 mM HEPES. Small intestine pieces were next digested in 10 mL of serum-free HBSS containing Liberase TL (0.15 mg/mL, Roche) and 0.1 mg/mL DNase I (Roche) at 37 °C at 600 rpm for 45 min. Cells were washed and passed through a 100 µm cell strainer. Cells were resuspended in 4.5 mL of 44% Percoll (Sigma Aldrich) and 2.3 mL of 67% Percoll were then underlaid. Percoll gradient separation was performed by centrifugation for 20 min at 600 x g at room temperature. Lymphoid fractions were collected at the interface of the Percoll gradient, washed once, and resuspended in FACS buffer (5% FCS in DPBS) or culture medium.

Single cell suspensions were incubated for 10 min at 4 °C with Fc-blocking (CD16/32) antibody (eBioscience) and fixable viability dye AmCyan or Pacific Blue (eBioscience) prior to staining with fluorochrome-conjugated antibodies for 15 min at 4 °C. B cells from d0 and d14 after DSS treatment were sorted as viable CD45+CD3-CD64-Ly6G-CD19+. Epithelial cells were sorted as viable CD45-CD31-CD326+ and stromal cells were sorted as viable CD45-CD31-Ter119-CD324-CD326-CD90+ cells.

2.3 Single cell sequencing protocol

Freshly sorted cells were processed following a droplet-based protocol using the Chromium Controller (10X Genomics). The scRNA-seq libraries were constructed according to the 10X Genomics protocol using the Chromium Single-Cell 3’ Gel Bead and Library V2 Kit (10X Genomics).

2.3.1 Library preparation

Briefly, single cells were combined with reverse transcriptase (RT) and separated into Gel Bead-In-EMulsions (GEMs). Inside these beads, transcripts are barcoded with an Illumina R1 sequence (read 1 primer), a 16 bp 10X barcode (unique for each individual cell), and a 10 bp Unique Molecular Identifier (UMI; unique for each transcript). P5, P7, a sample index, and R2 (read 2 primer) were added during library construction as per manufacturer’s instructions.

2.3.2 Sequencing and preprocessing

The resulting libraries were sequenced using Illumina NovaSeq 6000 with paired-end 150bp read length. Using Cell Ranger (version 3.0.1; 10x Genomics) function mkfastq (default parameters) samples were demultiplexed and FASTQ files corresponding to read 1, read 2 and indexes were generated. Cell Ranger count function was used to align the reads to the mouse reference transcriptome (mm10) and collapse UMIs, generating gene-barcode matrices for both samples.
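The barcode layout and the UMI collapsing step described above can be illustrated with a minimal Python sketch. It assumes that read 1 begins with the 16 bp cell barcode followed directly by the 10 bp UMI, and that each read has already been assigned to a gene. The function names and the toy reads are hypothetical, and the sketch omits the barcode and UMI error correction that a production pipeline such as Cell Ranger performs.

```python
from collections import defaultdict

BARCODE_LEN = 16  # 10X cell barcode length stated in the text
UMI_LEN = 10      # UMI length stated in the text


def split_read1(read1):
    """Split a read 1 sequence into (cell barcode, UMI)."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than barcode + UMI")
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def collapse_umis(assigned_reads):
    """Count distinct UMIs per (barcode, gene) pair.

    `assigned_reads` is an iterable of (read1_sequence, gene) tuples.
    Reads sharing barcode, gene and UMI are counted only once.
    """
    seen = defaultdict(set)
    for read1, gene in assigned_reads:
        barcode, umi = split_read1(read1)
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical toy reads: two PCR duplicates and one distinct molecule.
toy_reads = [
    ("AAAACCCCGGGGTTTT" + "ACGTACGTAC", "Cd19"),
    ("AAAACCCCGGGGTTTT" + "ACGTACGTAC", "Cd19"),
    ("AAAACCCCGGGGTTTT" + "TTTTTTTTTT", "Cd19"),
]
counts = collapse_umis(toy_reads)
```

In this toy example the two identical reads collapse into a single molecule, so the entry of the gene-barcode matrix for this cell and gene would be 2 rather than 3.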


2.4 scRNA-seq analysis of B cells during tissue repair

A total of 3376 and 4780 cells, for the d0 and d14 samples respectively, were analysed using the R package Seurat (Butler et al., 2018) version 3.1.4, together with R version 3.6.1. Genes annotated as long non-coding RNAs (lincRNA), such as Malat1, and pseudogenes were filtered out of the dataset. Cells with fewer than 200 detectable genes were removed. Genes were kept if expressed in at least 3 cells. As a doublet-filtering criterion, a threshold of 2200 detected genes per cell was used, removing cells surpassing this threshold. Ribosomal and mitochondrial gene fractions were calculated and, based on visual inspection of their distributions across cells, a threshold of 10% was chosen. Cells were considered broken (i.e. artifacts) if more than 10% of their gene content corresponded to mitochondrial genes. Cells with less than 10% ribosomal content were also removed.
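The filtering thresholds above can be summarised in a short Python sketch. This is not the Seurat code used in the thesis; it is a hedged illustration in which the function names, the per-cell metric tuples and the toy cells are hypothetical, while the numeric thresholds are those stated in the text.

```python
MIN_GENES = 200      # cells with fewer detected genes are removed
MAX_GENES = 2200     # cells above this are treated as putative doublets
MAX_PCT_MITO = 10.0  # cells above this are treated as broken
MIN_PCT_RIBO = 10.0  # cells below this are removed
MIN_CELLS = 3        # genes must be expressed in at least this many cells


def passes_qc(n_genes, pct_mito, pct_ribo):
    """Return True if a cell satisfies all per-cell thresholds."""
    return (
        MIN_GENES <= n_genes <= MAX_GENES
        and pct_mito <= MAX_PCT_MITO
        and pct_ribo >= MIN_PCT_RIBO
    )


def keep_gene(n_cells_expressing):
    """Return True if a gene is expressed in enough cells to be kept."""
    return n_cells_expressing >= MIN_CELLS


# Hypothetical cells: (detected genes, % mitochondrial, % ribosomal).
toy_cells = {
    "cell_a": (1500, 3.0, 25.0),  # passes every filter
    "cell_b": (2500, 4.0, 30.0),  # too many genes, putative doublet
    "cell_c": (900, 15.0, 20.0),  # too much mitochondrial content
}
kept = [name for name, metrics in toy_cells.items() if passes_qc(*metrics)]
```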

Cycling cells were removed, as these may bias downstream analyses. Briefly, known mouse cycling genes were downloaded from tinyatlas (https://github.com/hbc/tinyatlas), and gene names were extracted from Ensembl release 99 (Hunt et al., 2018) using AnnotationHub version 2.18.0 (Morgan and Shepherd, 2020). Each cell was scored on the expression of its G2/M and S phase marker genes following the scoring algorithm of Tirosh et al. Based on these scores, cells with extremely high cycling scores (i.e. distribution outliers) were removed.

Counts in each cell were divided by the total counts of that cell, multiplied by a scale factor of 10,000 and log-transformed using NormalizeData with default parameters. Highly variable genes were selected based on their dispersion across cells while controlling for average expression, using FindVariableFeatures with the option selection.method set to "mean.var.plot". Counts were scaled, and mitochondrial percentage, cycling scores and the number of counts were regressed out using ScaleData with default parameters.
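The log-normalization step can be written out explicitly. The following Python sketch assumes the usual definition, in which each count is divided by the total counts of its cell, multiplied by the scale factor, and transformed with the natural logarithm of one plus the value. The function name and the toy counts are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw counts of one cell.

    `counts` maps gene name -> raw count for a single cell.
    Returns gene name -> log(1 + count / total * scale_factor).
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical cell with four counts spread over three genes.
normalized = log_normalize({"Cd79a": 3, "Ighm": 1, "Jchain": 0})
```

Genes with zero counts remain at zero after the transformation, which preserves the sparsity of the matrix.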

For downstream analyses a low-dimensional representation was used. To this end, RunPCA was executed to compute 50 principal components (PCs). The number of PCs to keep was calculated rather than chosen by subjective visual inspection. Briefly, a first estimate was obtained by selecting the PC at which the cumulative variance exceeded 90% while the variation associated with that PC was below 5%. A second estimate was made by choosing the PC that showed a difference of less than 0.05 in the percent of variation with its subsequent PC. Finally, the most explanatory PC out of these two was chosen. To account for possible batch effects between the two samples, a CCA correction was performed using IntegrateData with the previously computed PCs and other parameters as default. A UMAP dimensionality reduction was obtained using RunUMAP with the previously computed PCs, 30 neighboring points and other parameters as default.
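The two estimates used to choose the number of PCs can be sketched in Python as follows. Several details are assumptions rather than statements from the thesis: each criterion is taken to select the first PC that satisfies it, the final choice is taken to be the smaller of the two estimates, and the function name and toy percentages are hypothetical.

```python
def select_n_pcs(pct_var, cum_cutoff=90.0, pct_cutoff=5.0, diff_cutoff=0.05):
    """Return (first estimate, second estimate, chosen number of PCs).

    `pct_var` lists the percent of variation explained by each PC,
    in decreasing order. PC numbers are 1-based.
    """
    n = len(pct_var)

    # Cumulative percent of variation up to and including each PC.
    cumulative = []
    running = 0.0
    for pct in pct_var:
        running += pct
        cumulative.append(running)

    # Estimate 1: cumulative variation > 90% and own variation < 5%.
    first = next(
        (i + 1 for i in range(n)
         if cumulative[i] > cum_cutoff and pct_var[i] < pct_cutoff),
        n,
    )

    # Estimate 2: drop in variation to the next PC is below 0.05.
    second = next(
        (i + 1 for i in range(n - 1)
         if pct_var[i] - pct_var[i + 1] < diff_cutoff),
        n,
    )

    # Assumption: keep the smaller of the two estimates.
    return first, second, min(first, second)


# Hypothetical percent-of-variation profile for ten PCs.
toy_pct = [40, 20, 10, 8, 6, 4, 3.98, 3, 2.52, 2.5]
estimates = select_n_pcs(toy_pct)
```

For the toy profile, the first criterion is met at PC 7 and the second at PC 6, so 6 PCs would be kept under the stated assumptions.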

For subpopulation discovery, a graph was computed for all the cells, followed by graph-based clustering. Briefly, a Shared Nearest Neighbor (SNN) graph was computed using FindNeighbors with the aforementioned PCs and k set to 20. Unsupervised clustering was performed using FindClusters and the Louvain graph-based clustering algorithm with multilevel refinement. The resolution parameter was set to 0.3. Cells expressing T cell marker genes (Cd8b1, Trbc2 or Ms4a4b) clustered together and were removed. The described workflow was then repeated with the same parameters but with the resolution of FindClusters decreased to 0.16. Conserved genes (i.e. differentially expressed genes with similar expression across conditions) for every cluster were obtained with FindConservedMarkers. Differentially expressed genes per cluster and per condition were also calculated using FindAllMarkers and FindMarkers, respectively. In both cases MAST version 1.12.0 (Finak et al., 2015) was used for differential expression testing.

2.4.1 Enrichment analysis

In order to functionally characterize clusters of interest, pathway enrichment analysis was performed on the cluster’s signature genes with enrichR version 2.1 (Kuleshov et al., 2016).

Briefly, lists of upregulated and downregulated genes were extracted from the analysis mentioned above and an enrichment analysis was carried out against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa, 2000) and Gene Ontology (GO) (Ashburner et al., 2000). To calculate per-cluster pathway activity scores, the normalized counts per cell were provided to PROGENy version 2.0 (Schubert et al., 2018) using the mouse model matrix containing 14 pathways and their components, as well as the top 650 most responsive genes upon pathway perturbation according to the adjusted p-value. Pathway scores are computed as a weighted sum, over the responsive genes of a pathway, of the product of each gene's expression and its weight.
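The weighted sum behind the pathway scores can be made concrete with a small Python sketch. It only illustrates the arithmetic: the function name is hypothetical, the gene weights are invented for the example and are not taken from the PROGENy model matrix, and any scaling that the package applies to the resulting scores is omitted.

```python
def pathway_scores(expression, weights):
    """Compute one score per pathway for a single cell.

    `expression` maps gene -> normalized expression in the cell.
    `weights` maps pathway -> {gene: weight of that responsive gene}.
    Genes absent from `expression` contribute zero.
    """
    scores = {}
    for pathway, gene_weights in weights.items():
        scores[pathway] = sum(
            weight * expression.get(gene, 0.0)
            for gene, weight in gene_weights.items()
        )
    return scores


# Invented expression values and weights, for illustration only.
toy_expression = {"Socs3": 2.0, "Stat1": 1.0}
toy_weights = {
    "JAK-STAT": {"Socs3": 1.5, "Stat1": -1.0, "Irf1": 3.0},
    "MAPK": {"Fos": 2.0},
}
scores = pathway_scores(toy_expression, toy_weights)
```

Here the JAK-STAT score is 2.0 x 1.5 + 1.0 x (-1.0) = 2.0, and the MAPK score is 0 because its only responsive gene is not detected in the toy cell.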

2.5 scRNA-seq analysis of Stromal and Epithelial cells in B cell-depleted mice

To gain insights into the B cell modulatory effect in the process of mucosal healing, we performed single-cell transcriptomic analysis in the presence or absence of B cells. The analysis was carried out following the described procedure. Due to time constraints, I did not perform these analyses but the processed matrices were provided by my supervisor Kumar Parijat Tripathi.

2.5.1 Topic Modeling

In order to characterize the heterogeneity of stromal and epithelial cells, a topic model was applied to each dataset. A model was fitted to the sparse count matrix using the FitGoM function from the package CountClust version 1.14.0 (Dey et al., 2017). Ribosomal genes were removed from the count matrix prior to fitting the model. To determine the optimal number of topics, I computed the Bayesian Information Criterion (BIC), the estimated likelihood and the Akaike Information Criterion (AIC) for a range of numbers of topics k, and chose the k at which both information criteria were minimal. In this case k was set to 10. The tolerance parameter was set to 0.1. I focused on topics with significantly different weights between the B cell-containing control and B cell-depleted groups. Because the weights follow a Dirichlet distribution, a non-parametric test, the Wilcoxon rank-sum test, was used to test for differences in weights. The top genes to highlight for each topic were selected using the ExtractTopFeatures() function.
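The model selection step can be sketched in Python using the textbook definitions AIC = 2p - 2 log L and BIC = p ln(n) - 2 log L, where p is the number of free parameters, n the number of observations and log L the log-likelihood of the fitted model. How CountClust counts parameters and observations is not stated in the text, so both are left as inputs; the function names and the toy fits are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) for one fitted model."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


def best_k(fits, n_obs):
    """Return the k that minimizes both AIC and BIC, or None.

    `fits` maps k -> (log_likelihood, n_params) of the fitted model.
    None is returned when the two criteria disagree.
    """
    criteria = {
        k: information_criteria(loglik, n_params, n_obs)
        for k, (loglik, n_params) in fits.items()
    }
    k_aic = min(criteria, key=lambda k: criteria[k][0])
    k_bic = min(criteria, key=lambda k: criteria[k][1])
    return k_aic if k_aic == k_bic else None


# Hypothetical log-likelihoods and parameter counts for three models.
toy_fits = {5: (-1200.0, 50), 10: (-1000.0, 100), 15: (-990.0, 150)}
chosen_k = best_k(toy_fits, n_obs=500)
```

In the toy example the gain in likelihood from 10 to 15 topics is too small to offset the extra parameters, so both criteria are minimal at k = 10.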


3 Results

3.1 scRNA-seq reveals B cell heterogeneity dynamics during recovery

In order to characterize the heterogeneity of B cells in the LP during recovery after DSS-induced colitis, droplet-based scRNA-seq libraries were generated and sequenced at both day 0 (homeostasis) and day 14 (repair) (Fig. 1A). After pre-processing and filtering the data (see Methods), a total of 8005 cells and 12726 genes were analyzed. Of these cells, 3247 corresponded to day 0 and 4655 to day 14 (Fig. 1B). Unsupervised graph-based clustering of cells yielded 7 different B cell subsets. Co-expression of Ighd and Ighm confirmed the B cell identity of all clusters (Fig. S1).

Figure 1. (A), Schematic of DSS-induced colitis model. (B), number of cells analyzed after processing on day 0 and day 14

Clusters were manually annotated based on their conserved markers and the ImmGen database (Fig. 2A and B), irrespective of day.

Most cells were localized in cluster 1. Given the lack of specific markers for B cell subpopulations or activation and the aforementioned ubiquitous expression of IgD (Ighd) and IgM (Ighm), we described this cluster as naïve B cells. Cells in cluster 7 resemble plasma cells based on their expression of Jchain, Aicda and IgA (Igha). Cluster 6 was annotated as B1 B cells (B1b) based on the overall pattern of expression and the ImmGen database.

Cluster 5 comprises cells expressing immediate early response genes like Egr1, Egr2, Fos and Myc, suggesting a recent response after B cell receptor engagement. Cluster 2 shows a high prevalence of IFN-induced genes like Ifi27a, Ifi47 and Irf7, which, together with the expression of Stat1 and Ly6a, might indicate a response to foreign antigens. Genes related to cell cycle progression, differentiation and cell growth (Ska2, S100a6) were found in cluster 4. Cluster 3 is characterized by cells expressing genes involved in a stress response (Hspa1a, Dnaja3 and Ndrg1). Furthermore, cells within this cluster express Ccl4, a chemokine that attracts other immune cells.

Clusters 1, 2, 4 and 5 showed an increase in the number of cells during the recovery phase, whereas clusters 3, 6 and 7 had fewer cells on day 14 (Fig. 2C). Cluster 2 was the group of cells with the highest increase during recovery.

Figure 2. (A), Dot plot visualizing normalized and scaled counts of the top 3 conserved markers (ranked by minimum adjusted p-value) by cluster. (B), UMAP embedding of cells colored by cluster. (C), Bar plot of the number of cells per day and per cluster on a logarithmic scale


Because cluster 2 showed the highest increase of cells on day 14, we decided to further characterize these cells. First, we performed GO enrichment analysis using the DEGs between clusters. DEGs in cluster 2 were enriched in genes associated with the IFN response. In agreement with an ongoing IFN response, cluster 2 B cells co-express Sell (Evans et al., 1993), a homing receptor required to reach lymphoid structures, and genes involved in antigen processing and presentation (Kiefer et al., 2012), suggesting an ongoing activation by invading pathogens.

Overall, these results resolve the heterogeneity underlying the previously reported increase of B cells upon recovery in experimental colitis. Because not all the cell types in the population increase upon recovery, I formulated a new hypothesis: do these cell types have different functions in the recovery phase?

3.2 Cell heterogeneity uncovers pathway activity perturbations in B cells during experimental colitis

To test whether the different clusters of cells presented an altered pathway activity compared to that expected at steady state, gene expression was mapped to known mouse pathway components using the integrated dataset (see Methods).

Figure 3. Heatmap of 14 mouse pathway activities (represented as z-scores) in the integrated dataset inferred from the top 650 most responsive genes’ expression using PROGENy

This analysis predicted a higher activation of the Janus kinase (JAK)-signal transducer and activator of transcription (STAT) pathway in cluster 2 compared to the rest of the clusters (Fig. 3), which is in agreement with a pathogen-mediated activated state. As expected, plasma cells show an overall altered pathway activity, reflecting their natural activation state (these are, in fact, activated B cells that produce copious amounts of antibody).

Interestingly, cluster 5 shows a uniquely increased mitogen-activated protein kinase (MAPK) pathway activity.

3.3 B cells present a shifted transcriptional fingerprint amid recovery

As described above, the heterogeneity of B cells is maintained between homeostasis (d0) and repair (d14). As no single subtype of B cells emerges or disappears during tissue repair, we tried to further describe the recovery phase functionally. To this end, I searched for DEGs in a bulk-like setting. Treating all the cells as a single cluster did not reveal any significant (p.adj < 0.05) difference in expression.

In order to fine-tune the differences in expression, clusters showing an increase of cells in the recovery phase (i.e. clusters 1, 2, 4 and 5) were further characterized and tested between day 0 and day 14. Clusters 1, 4 and 5 did not have any significantly (p.adj < 0.05) differentially expressed genes between day 0 and day 14. In contrast, the analysis of cluster 2 cells showed 5 DEGs, namely Serpina3g, Pkib, Ly6a, Zbp1 and Samhd1 (Fig. 4A-E). Moreover, a higher proportion of cells express these genes during the recovery phase compared to the control sample (Fig. 4F).

Figure 4. (A-E), Violin plots visualizing the level of expression of cells in cluster 2 as normalized counts for every cell (dots not shown), on day 0 and day 14. (F), Bar plot of the percentage of cells (expressed as a fraction of the total number of cells in that cluster) expressing each gene, on day 0 and day 14 separately

These genes are associated with IFN-γ signalling, protein kinase A (PKA) regulation (Fig. S2) as well as innate immune function, suggesting contact with, and possibly clearance of, bacterial species in the gut.


3.4 B cells impact tissue remodeling programs during mucosal healing

It has previously been shown that B cells have a positive effect during the inflammatory phase of DSS-induced colitis (Yanaba et al., 2011). As shown above, we see a pronounced increase of B cells in the recovery phase; therefore, we sought to explore the consequences of B cell depletion during the recovery rather than the inflammatory phase.

Moreover, the effect of B cells on the pathology of experimental colitis might not be explained by changes in their own function but rather by their influence on other cell types present in the LP. To this end, we made use of the CD19cre x iDTR mouse line, which allowed us to specifically deplete CD19+ cells (B and plasma cells) upon exposure to diphtheria toxin (DT) (Demircik et al., 2013). Libraries were generated and sequenced from sorted stromal cells and IECs from B cell depleted mice and B cell sufficient mice (see Methods) at day 14 of experimental colitis (Fig. 5).

Figure 5. Schematic of DSS-induced colitis model in CD19cre x iDTR mice. B cell depletion was induced with i.p.-injected DT at days 9, 10, 12 and 13 (yellow)

So far, our analysis has been based on the classical clustering approach, which assigns an exclusive identity to each cell. This does not consider biological signals that might be shared by distinct cell types. To further explore unanticipated biological insights, I applied Topic Modeling with the LDA algorithm (see Methods) to these cells. The number of topics chosen for the model was 10, which captured relevant biological programs and did not result in apparent overfitting (Fig. S3, Methods). LDA assumes the existence of a number of underlying gene programs ("topics") that are weighted on every cell and explain, at least in part, each cell's transcriptional activity (Fig. 6A-I).

Cells from B cell depleted mice had a significantly different weight distribution for Topic 1 compared to cells from B cell sufficient mice (Fig. 6A, p = 9.98 × 10^-110, Wilcoxon rank-sum test; Fig. 6B). This topic is characterized by genes involved in tissue remodeling (Fig. S4A) such as Col4a5 and Col18a1 (Fig. 6C), as well as genes defining the intestinal stem cell niche, such as Bmp5 and Wnt4a (Fig. S4A) (Eliazer et al., 2019; Kosinski et al., 2007). Moreover, Topic 3 showed a significantly different distribution of weights (Fig. 6D, p = 7.83 × 10^-168, Wilcoxon rank-sum test) in cells proximal to those of Topic 1, suggesting a novel transcriptional gradient (Fig. 6E). This difference was in the opposite direction to that of Topic 1, with cells from B cell depleted mice being less prominent in this topic than cells from B cell sufficient mice. Nevertheless, this topic was also heavily weighted on genes related to tissue remodeling such as Col14a1 (Fig. 6F, Fig. S4B). Following this trend, Topic 4 was also weighted significantly differently between depletion and control conditions, with a behaviour analogous to that of Topic 3 (Fig. 6G, p = 3.55 × 10^-106, Wilcoxon rank-sum test). The cells weighted for this topic followed the "trajectory" determined by the previously mentioned topics (Fig. 6H). This topic is defined by cancer-related genes such as Ptn and Serpine2 (Özdemir et al., 2014) (Fig. 6I, Fig. S4B).

Figure 6. (A,D,G), cumulative density function (y axis) of topic weights (x axis) for cells from B cell depleted mice (red curve) or B cell sufficient mice (blue curve). Adjusted p-values were determined using a Wilcoxon rank-sum test. (B,E,H), UMAP 2D embedding of cells coloured by topic weight. (C,F,I), UMAP 2D embedding of cells coloured by expression (as normalized counts) of the top features per topic

Altogether, topic analysis indicated that B cell depletion during mucosal healing impacts the transcriptomic programs of stromal and intestinal epithelial cells, which were mostly associated with tissue remodeling and the epithelial stem cell niche.


4 Discussion

Although the expansion of B cells in experimental colitis was previously known, the role of these cells in the process has not yet been described. Here, with an unbiased single cell analysis, we set out to identify a specific B cell subtype that might explain this expansion. Our results indicate that not one specific subpopulation but several are increasing, and that B cells remain heterogeneous. Even though several subtypes are expanding, we identified cluster 2 (Section 3.1) as the one increasing the most during tissue repair, pointing towards a possible function in the process of mucosal healing.

Further analysis of these cells showed an increased activity of the JAK-STAT pathway (Sections 3.2 and 3.3). This pathway is associated with immune cell division, activation and survival (de Prati et al., 2005), and is important for several functions within immune cells, including the response to cytokines and growth factors (Harrison, 2012). These results suggest an activated immune response compared to the other clusters. Furthermore, cluster 2 is characterized by genes related to IFN signalling, which are associated with immune responses against pathogens, indicating a recent activation.

These cells can be associated with damage and act as an early response to inflammation in an innate-like fashion. After breakage of the intestinal barrier, the host microbiota and foreign microorganisms penetrate the epithelial layer, which might result in the recruitment of these damage-associated B cells to the affected area.

B cells are generally considered to be part of the adaptive immune system, as they bind antigens and secrete antibodies specific for them. While it has been shown that B cells exert functions besides the adaptive immune response, for instance antigen presentation, these functions have not gained much attention in the context of the intestinal tract. An innate function of B cells, involving the secretion of IgM and anti-inflammatory IL-10 without the need for a memory response, has been previously described (Zhang, 2013), but not in the context of IBDs.

However, no conclusions about the role of these cells in the mucosal healing process could be drawn from these results. Nevertheless, with the Topic Modeling analysis of IECs and stromal cells (Section 3.4), a novel function of B cells in the process of mucosal healing seems plausible. Cells heavily weighted on Topic 1 express genes related to the epithelial basement membrane (Col4 and Col6) and other factors that support epithelial growth (Wnt, Bmp), suggesting that B cells might be detrimental to the process of mucosal healing. This function might not be due to a direct effect of these cells but rather to their interaction with surrounding cells such as IECs and stromal cells, which can be further studied in future experiments.

The use of Topic Modeling in single cell data has been shown to be an alternative approach to functionally characterise populations of cells (Hagemann-Jensen et al., 2019). Although classical clustering and Topic Modeling are both based on unsupervised learning, their fundamentals differ. While Topic Modeling can work with the whole dataset of cells and genes, graph-based clustering is done on a lower-dimensional representation, which might or might not capture the biology we are interested in. In addition, Topic Modeling allows cells to have more than one identity, unlike clustering, where each cell is assigned to a single cluster. All in all, Topic Modeling is able to capture fluid transcriptional states while leveraging the biological programs explaining each cell's functional fate.

4.1 Ethical statement

It is worth mentioning that this thesis work was carried out on Mus musculus, a model organism widely used in biomedical research due to its high homology with humans. Although the treatment of these mice and the number of animals used in studies have steadily improved over the years, whether these studies are as ethical as they can be remains an open question in science.

However, a new trend in animal ethics has arisen, which proposes the "3Rs" principle (reduction, refinement, replacement). Following these guidelines, higher-level animals are being replaced with lower organisms or alternative methodologies when possible; moreover, experiments need to be as refined (i.e. planned) as possible, reducing animal suffering and distress as much as possible. Finally, the number of animals is being reduced to the minimum compatible with statistical significance (Doke and Dhawale, 2015).

Beyond the efforts of the scientific community to be as ethical as possible in their experiments, computational resources and methodologies are more accessible than ever. Because of this, computational models are increasingly being used, especially in the field of drug discovery and validation.


5 Conclusions and future perspectives

The maturity of scRNA-seq and its associated computational methods has allowed immunologists to pursue the hidden mechanisms behind previously unanswered questions. In this project I showcased the efforts made to understand the pathology of IBDs with the help of these technologies. In summary:

• B cell heterogeneity does not change amid recovery during experimental colitis

• The transcriptional landscape of B cells changes during recovery in mice undergoing experimental colitis

• Pathway activity inference analysis can functionally characterize clusters of B cells

• The JAK-STAT and MAPK pathways have an increased activity in B cell sub-populations during experimental colitis

• Topic Modeling accurately portrays fluid biological programs in single cell RNA-seq data

• B cell deficient mice show altered remodeling programs in IECs and Stromal cells during mucosal healing

These findings are fascinating, but equally challenging biological and technical research questions remain to be addressed.

In order to demonstrate the suggested role of B cells during mucosal healing after DSS-induced colitis, further in vivo experiments need to be done. These experiments will help find a distinct phenotype associated with the proposed role of B cells, which will allow us to demonstrate causality. Moreover, for this study to gain medical relevance, a similar analysis of already published human data from similar conditions could be carried out.

Due to the nature of scRNA-seq protocols, the spatial dimension of the cells is lost, and so cells in this study were given different identities based on their transcriptomic profile. However, it is widely known that the spatial context of a cell plays an important role in determining its function. Because of this, the use of complementary novel technologies such as Visium will help resolve the spatial expression profile of these cells and add a contextual interpretation of their function. The integration of other single cell technologies such as scATAC-seq, scChIP-seq or CITE-seq is computationally challenging but might help us obtain a more detailed view of these cells.

6 Acknowledgements

I’d like to thank both my supervisors and subject reader for their support along the way.

Parijat, thank you for being my mentor and helping me overcome difficult times, both academically and personally. I hope your dedication and good science help you reach all your goals with the ones you care the most about by your side.

Annika, thanks for your comprehensive explanations on immunology and your overall support. A little immunologist has grown in me after all these months. I truly hope this new career path brings you academic satisfaction and personal growth.

Åsa, you've been the best subject reader a student could get; I was not expecting this much support and supervision. Thanks for helping me overcome my student insecurities when contacting other investigators. I've been writing a lot of emails lately and it has never been this easy, thanks.

This master thesis wouldn't have been possible without the approval of Eduardo Villablanca, to whom I'd also like to give a special 'thank you'.

I have to admit that we had a rough start, but your change of mind really grew on me. The amount of care and attention you put into your team is noteworthy. I've never felt that my research was as important as it is now. I'm glad I could help the lab in whatever way was possible. Finally, I think I speak for everyone that has been under your supervision when I say that, no matter who comes in or leaves, your lab will always be part of the scientists we are today and in the future.

Last but not least, I’d like to thank the ones that have been closer to me during this time.

Kathi, thanks for being the best part of my everyday life. Thanks for challenging me to be a better person on a daily basis. I'm sure we have a bright future in front of us. Thanks to my mom, whom I wish to see soon with all my will. Thanks to my father, for fighting against all odds and making me feel the luckiest son ever, and finally, thanks to my sister, whose hard work and joy for life have helped me become who I am today.


References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G, 2000. Gene Ontology: tool for the unification of biology. Nature Genetics 25(1):25–29. doi:10.1038/75556.

Bacher R, Chu LF, Leng N, Gasch AP, Thomson JA, Stewart RM, Newton M, Kendziorski C, 2017. SCnorm: robust normalization of single-cell RNA-seq data. Nature Methods 14(6):584–586. doi:10.1038/nmeth.4263.

Batista FD, Harwood NE, 2009. The who, how and where of antigen presentation to B cells. Nature Reviews Immunology 9(1):15–27. doi:10.1038/nri2454.

Blei DM, Ng A, Jordan M, 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3:29.

Brandtzaeg P, Carlsen HS, Halstensen TS, 2006. The B-Cell System in Inflammatory Bowel Disease. In Back N, Cohen IR, Kritchevsky D, Lajtha A, Paoletti R, Blumberg RS, Neurath MF, editors, Immune Mechanisms in Inflammatory Bowel Disease, volume 579. Springer New York, New York, NY, pp. 149–167. doi:10.1007/0-387-33778-4_10. Series Title: Advances in Experimental Medicine and Biology.

Bray NL, Pimentel H, Melsted P, Pachter L, 2016. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology 34(5):525–527. doi:10.1038/nbt.3519.

Buch T, Heppner FL, Tertilt C, Heinen TJAJ, Kremer M, Wunderlich FT, Jung S, Waisman A, 2005. A Cre-inducible diphtheria toxin receptor mediates cell lineage ablation after toxin administration. Nature Methods 2(6):419–426. doi:10.1038/nmeth762.

Buonocore S, Ahern PP, Uhlig HH, Ivanov II, Littman DR, Maloy KJ, Powrie F, 2010. Innate lymphoid cells drive interleukin-23-dependent innate intestinal pathology. Nature 464(7293):1371–1375. doi:10.1038/nature08949.

Burisch J, Jess T, Martinato M, Lakatos PL, 2013. The burden of inflammatory bowel disease in Europe. Journal of Crohn’s and Colitis 7(4):322–337. doi:10.1016/j.crohns.2013.01.010.

Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani, 2007. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nature Genetics 39(11):1329–1337. doi:10.1038/ng.2007.17.

Butler A, Hoffman P, Smibert P, Papalexi E, Satija R, 2018. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology 36(5):411–420. doi:10.1038/nbt.4096.

ing single-cell RNA-seq batch correction. Nature Methods 16(1):43–49. doi:10.1038/s41592-018-0254-1.

Choy MC, Visvanathan K, De Cruz P, 2017. An Overview of the Innate and Adaptive Immune System in Inflammatory Bowel Disease. Inflammatory Bowel Diseases 23(1):2–13. doi:10.1097/MIB.0000000000000955.

Dal Molin A, Baruzzo G, Di Camillo B, 2017. Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods. Frontiers in Genetics 8:62. doi:10.3389/fgene.2017.00062.

De Filippo C, Cavalieri D, Di Paola M, Ramazzotti M, Poullet JB, Massart S, Collini S, Pieraccini G, Lionetti P, 2010. Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa. Proceedings of the National Academy of Sciences 107(33):14691–14696. doi:10.1073/pnas.1005963107.

de Prati A, Ciampa A, Cavalieri E, Zaffini R, Darra E, Menegazzi M, Suzuki H, Mariotto S, 2005. STAT1 as a New Molecular Target of Anti-Inflammatory Treatment. Current Medicinal Chemistry 12(16):1819–1828. doi:10.2174/0929867054546645.

Demircik F, Buch T, Waisman A, 2013. Efficient B Cell Depletion via Diphtheria Toxin in CD19-Cre/iDTR Mice. PLoS ONE 8(3):e60643. doi:10.1371/journal.pone.0060643.

Dey KK, Hsiao CJ, Stephens M, 2017. Visualizing the structure of RNA-seq expression data using grade of membership models. PLOS Genetics 13(3):e1006599. doi:10.1371/journal.pgen.1006599.

D’Haens GR, Geboes K, Peeters M, Baert F, Penninckx F, Rutgeerts P, 1998. Early lesions of recurrent Crohn’s disease caused by infusion of intestinal contents in excluded ileum. Gastroenterology 114(2):262–267. doi:10.1016/s0016-5085(98)70476-7.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR, 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England) 29(1):15–21. doi:10.1093/bioinformatics/bts635.

Doke SK, Dhawale SC, 2015. Alternatives to animal testing: A review. Saudi Pharmaceutical Journal 23(3):223–229. doi:10.1016/j.jsps.2013.11.002.

Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, Dassopoulos T, Bitton A, Yang H, Targan S, Datta LW, Kistner EO, Schumm LP, Lee AT, Gregersen PK, Barmada MM, Rotter JI, Nicolae DL, Cho JH, 2006. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science (New York, NY) 314(5804):1461–1463. doi:10.1126/science.1135245.

Eliazer S, Muncie JM, Christensen J, Sun X, D’Urso RS, Weaver VM, Brack AS, 2019. Wnt4 from the Niche Controls the Mechano-Properties and Quiescent State of Muscle Stem Cells. Cell Stem Cell 25(5):654–665.e4. doi:10.1016/j.stem.2019.08.007.

Evans SS, Collea RP, Appenheimer MM, Gollnick SO, 1993. Interferon-alpha induces the expression of the L-selectin homing receptor in human B lymphoid cells. The Journal of Cell Biology 123(6 Pt 2):1889–1898. doi:10.1083/jcb.123.6.1889.

Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, Slichter CK, Miller HW, McElrath MJ, Prlic M, Linsley PS, Gottardo R, 2015. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biology 16(1):278. doi:10.1186/s13059-015-0844-5.

Grün D, van Oudenaarden A, 2015. Design and Analysis of Single-Cell Sequencing Experiments. Cell 163(4):799–810. doi:10.1016/j.cell.2015.10.039.

Hagemann-Jensen M, Ziegenhain C, Chen P, Ramsköld D, Hendriks GJ, Larsson AJ, Faridani OR, Sandberg R, 2019. Single-cell RNA counting at allele- and isoform-resolution using Smart-seq3. preprint, Genomics. doi:10.1101/817924.

Hampe J, Franke A, Rosenstiel P, Till A, Teuber M, Huse K, Albrecht M, Mayr G, De La Vega FM, Briggs J, Günther S, Prescott NJ, Onnie CM, Häsler R, Sipos B, Fölsch UR, Lengauer T, Platzer M, Mathew CG, Krawczak M, Schreiber S, 2007. A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nature Genetics 39(2):207–211. doi:10.1038/ng1954.

Hardenberg G, Steiner TS, Levings MK, 2011. Environmental influences on T regulatory cells in inflammatory bowel disease. Seminars in Immunology 23(2):130–138. doi:10.1016/j.smim.2011.01.012.

Harrison DA, 2012. The JAK/STAT Pathway. Cold Spring Harbor Perspectives in Biology 4(3):a011205. doi:10.1101/cshperspect.a011205.

Hashimshony T, Senderovich N, Avital G, Klochendler A, de Leeuw Y, Anavy L, Gennert D, Li S, Livak KJ, Rozenblatt-Rosen O, Dor Y, Regev A, Yanai I, 2016. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biology 17(1):77. doi:10.1186/s13059-016-0938-8.

Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, Joughin BA, Stegle O, Lauffenburger DA, Heyn H, Szalai B, Saez-Rodriguez J, 2020. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biology 21(1):36. doi:10.1186/s13059-020-1949-z.

Hunt SE, McLaren W, Gil L, Thormann A, Schuilenburg H, Sheppard D, Parton A, Armean IM, Trevanion SJ, Flicek P, Cunningham F, 2018. Ensembl variation resources. Database: The Journal of Biological Databases and Curation 2018. doi:10.1093/database/bay119.

Hviid A, Svanstrom H, Frisch M, 2011. Antibiotic use and inflammatory bowel diseases in childhood. Gut 60(1):49–54. doi:10.1136/gut.2010.219683.

Hwang B, Lee JH, Bang D, 2018. Single-cell RNA sequencing technologies and bioinformatics pipelines. Experimental & Molecular Medicine 50(8):96. doi:10.1038/s12276-018-0071-8.

Ince MN, Elliott DE, 2007. Immunologic and Molecular Mechanisms in Inflammatory Bowel Disease. Surgical Clinics of North America 87(3):681–696. doi:10.1016/j.suc.2007.03.005.

Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lönnerberg P, Linnarsson S, 2014. Quantitative single-cell RNA-seq with unique molecular identifiers. Nature Methods 11(2):163–166. doi:10.1038/nmeth.2772.

Jaitin DA, Kenigsberg E, Keren-Shaul H, Elefant N, Paul F, Zaretsky I, Mildner A, Cohen N, Jung S, Tanay A, Amit I, 2014. Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types. Science 343(6172):776–779. doi:10.1126/science.1247651.

Kanehisa M, 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28(1):27–30. doi:10.1093/nar/28.1.27.

Kang Y, Norris MH, Zarzycki-Siek J, Nierman WC, Donachie SP, Hoang TT, 2011. Transcript amplification from single bacterium for transcriptome analysis. Genome Research 21(6):925–935. doi:10.1101/gr.116103.110.

Kaplan GG, Ng SC, 2016. Globalisation of inflammatory bowel disease: perspectives from the evolution of inflammatory bowel disease in the UK and China. The Lancet Gastroenterology & Hepatology 1(4):307–316. doi:10.1016/S2468-1253(16)30077-2.

Kharchenko PV, Silberstein L, Scadden DT, 2014. Bayesian approach to single-cell differential expression analysis. Nature Methods 11(7):740–742. doi:10.1038/nmeth.2967.

Kiefer K, Oropallo MA, Cancro MP, Marshak-Rothstein A, 2012. Role of type I interferons in the activation of autoreactive B cells. Immunology and Cell Biology 90(5):498–504. doi:10.1038/icb.2012.10.

Kim D, Langmead B, Salzberg SL, 2015. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12(4):357–360. doi:10.1038/nmeth.3317.

Korzenik JR, Podolsky DK, 2006. Evolving knowledge and therapy of inflammatory bowel disease. Nature Reviews Drug Discovery 5(3):197–209. doi:10.1038/nrd1986.

Kosinski C, Li VSW, Chan ASY, Zhang J, Ho C, Tsui WY, Chan TL, Mifflin RC, Powell DW, Yuen ST, Leung SY, Chen X, 2007. Gene expression patterns of human colon tops and basal crypts and BMP antagonists as intestinal stem cell niche factors. Proceedings of the National Academy of Sciences 104(39):15418–15423. doi:10.1073/pnas.0707210104.

Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott MG, Monteiro CD, Gundersen GW, Ma’ayan A, 2016. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 44(W1):W90–97. doi:10.1093/nar/gkw377.

Lee HB, Kim JH, Yim CY, Kim DG, Ahn DS, 1997. Differences in Immunophenotyping of Mucosal Lymphocytes Between Ulcerative Colitis and Crohn’s Disease. The Korean Journal of Internal Medicine 12(1):7–15. doi:10.3904/kjim.1997.12.1.7.
