• No results found

TRANSCRIPTIONAL CHANGES IN STEM CELL-DERIVED CARDIOMYOCYTES DURING EXTENDED CELL CULTURES

N/A
N/A
Protected

Academic year: 2021

Share "TRANSCRIPTIONAL CHANGES IN STEM CELL-DERIVED CARDIOMYOCYTES DURING EXTENDED CELL CULTURES"

Copied!
67
0
0

Loading.... (view fulltext now)

Full text

(1)

TRANSCRIPTIONAL CHANGES IN STEM CELL-DERIVED

CARDIOMYOCYTES DURING EXTENDED CELL CULTURES

Bachelor Degree Project in Bioscience G2E 30 ECTS

Spring term 2020 Michel Jonathan Speh a17jonsp@student.his.se Supervisor: Jane Synnergren Examiner: Zelmina Lubovac

(2)

Abstract

The recent advancement in the field of stem cell research gave rise to high expectations from both general and scientific audiences, and indeed, stem cells and stem cell-derived cells have an enormous potential to forward the fields of medical research and regenerative medicine.

Nevertheless, there are still many obstacles and limitations associated with stem cell technology.

One of those limitations is that currently no method to derive fully mature cardiomyocytes from human pluripotent stem cell is available. Instead, stem cell-derived cardiomyocytes only show basic characteristics of their adult counterparts and are thus only of limited use. However, there is experimental evidence that those cells may increase in maturity during extended culture times.

To improve the understanding of those processes, a characterisation of the transcriptional changes in human embryonic stem cell-derived cardiomyocytes over two weeks was made. Using standard bioinformatics methods, including differential and enrichment analysis as well as basic machine learning technologies, changes in mRNA and miRNA transcription and several functions related to those changes could be identified. The analyses indicated increasing structural organisation in the cell cultures, increasing expression of cardiac ion channels and decreasing proliferative capacity in the cardiomyocyte cultures. Furthermore, potential dysregulations of important signalling pathways were observed. The results of this project may aid in developing protocols for differentiating stem cells into cardiomyocytes with features that are more mature than those of the currently available stem cell derived-cardiomyocytes.

(3)

List of abbreviations

B-H Benjamini-Hochberg procedure

BP Biological process

B-Y Benjamini-Yekutieli procedure

CC Cellular component

Chip-seq Chromatin Immunoprecipitation Sequencing

CM cardiomyocytes

cTnT cardiac troponin t

DEGs differentially expressed genes

DE-miRNAs differentially expressed miRNAs

Epo erythropoietin

FC fold change

FDR false discovery rate

GO gene ontology

GSEA gene set enrichment analysis

GUI graphical user interface

HO null hypothesis

H1 alternative hypothesis

hESC human embryonic stem cells

hiPSCs human induced pluripotent stem cells

hPSC human pluripotent stem cells

hsa Homo sapiens

Log2 logarithm with the base 2

LOWESS locally weighted scatterplot smoothing

MF Molecular function

miRNA micro ribonucleic acid

mRNA messenger ribonucleic acid

RA Retinoic acid

RMA robust multichip average algorithm

RNA-seq RNA sequencing

SAM significance analysis of microarray

TF transcription factor

WSS within-cluster sum of squares

(4)

Table of Contents

Introduction ... 1

Biological Background ... 1

High throughput analysis of gene expression data ... 3

Analysis of differentially expressed genes ... 5

Transcription factor analysis ... 6

Analysis of differentially expressed miRNAs ... 7

Clustering ... 7

Problem formulation and Aim ... 8

Materials and methods ... 8

Origin of the data ... 8

Differential expression analysis ... 9

Functional analysis of the DEGs ... 9

miRNA downstream analysis ... 10

Analysis of genes associated with cardiac contraction ... 10

Ethical Considerations ... 11

Results ... 12

Data processing and quality control ... 12

Differential expression analysis ... 15

DEG functional analysis ... 17

miRNA downstream analysis ... 22

Analysis of genes associated with cardiac contraction ... 25

Discussion ... 29

Changes in the cardiomyocytes’ expression profiles over time ... 29

Changes in the expression of genes for ion channels and calcium handling ... 32

Transcription factor analysis ... 33

Discussion of the used methods ... 35

Strengths and limitations in the study design ... 40

Main findings and implications ... 41

Conclusion ... 42

Future prospects ... 42

Acknowledgements ... 43

References ... 44

Appendices ... 51

Appendix A – Enriched terms and pathways ... 51

Appendix B – K-means clustering... 60

(5)

Page: 1

Introduction

Biological Background

Since the scientific discovery on the possibility to induce of pluripotency by overexpressing certain factors in somatic cells by Takahashi and Yamanaka (2006), the field of stem cell research has brought astonishing new knowledge and provided unique technologies for medical research.

Nowadays, many protocols for differentiating pluripotent stem cells, including human induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs), into various kinds of human somatic cells are established. The differentiated cells possess characteristics comparable to their natural equivalents and thus have a vast potential to serve as disease models or as platforms for development and testing of new drugs (Machiraju & Greenway, 2019). One kind of stem cell derivatives widely used for pharmacological research are stem cell-derived cardiomyocytes (CMs). They promise new therapy options for patients with cardiovascular diseases – according to the World Health Organisation, the leading cause of death worldwide (World Health Organisation, 2017).

Furthermore, there is a significant number of drugs that have shown toxic effects on the heart muscle. To address and potentially prevent these side-effects stem cell-derived CMs can be used to model states of the healthy and diseased heart. Such models give an immense number of opportunities for medical and pharmacological research. The human pluripotent stem cell- derived cardiomyocytes (hPSC-CMs) are, for example, used to assess beneficial and adverse effects of drug candidates, to identify and evaluate cardiac-specific biomarkers or to discover disease mechanisms and potential targets for novel treatments. Even the use of stem cells in regenerative medicine, to restore the function of a damaged or diseased heart muscle has been proposed and is under investigation (Denning et al., 2016; Holmgren et al., 2015; Kishino et al., 2020).

Stem cell-derived cardiomyocytes can be obtained from human pluripotent stem cells by defined biochemical modulation of the Glykogensynthase-Kinase (GSK) and the canonical WNT-signalling pathway and without the need for genetic manipulation (Lian et al., 2013). This differentiation strategy allows the creation of cells that are capable of spontaneous contractions and express core cardiac markers, such as cardiac Troponin T (cTnT). The protocol by Lian et al. (2013) can yield cell cultures with a purity of > 80% cTnT+ cells, which has been further increased to above 99%

(Lian et al., 2013; Machiraju & Greenway, 2019). Such highly pure cultures of hPSC-CMs are used for the aforementioned purposes and have a potential even for applications in regenerative medicine (Machiraju & Greenway, 2019). While the possible applications for hPSC-CMs are almost unlimited, their practical use yet remains limited. Indeed, hPSC-CMs show the basic features of mature cardiomyocytes (cTnT expression, contractility). However, they still differ significantly from mature CMs in various aspects on transcriptional, morphological/functional and metabolic level. Most prominently those differences can be seen in the expression of genes coding for structural elements of the cells and consequently in the corresponding cell structures. Adult CMs express structural genes such as MYH7, N2B, cTNi and SERCA2 at comparatively high levels and thus show a high level of organisation, express T-Tubules and may appear bi-nucleated. Stem cell- derived CMs, on the other hand, are less organised and express MYH6 and N2A rather than the genes mentioned above. Other genes that are expressed at lower levels in hPSC-CMs are S100A1, and SCN5A, coding for calcium-binding protein 1 and sodium voltage-gated channel alpha subunit 5, respectively (Machiraju & Greenway, 2019). Functionally, hPSC-CMs and adult CMs differ in their electrophysiological and contractile properties, with the hPSC-CMs having slower calcium handling, and weaker contractions that are in opposition to adult CMs spontaneous. Even the metabolism of the hESC-CMs differs compared to primary CMs, as hPSC-CMs tend to prefer

(6)

Page: 2 glucose and lactose as energy source, while primary CMs usually rely on fatty acids (Denning et al., 2016).

Considering the complex mechanisms that regulate the development of the embryonic, foetal and neonatal heart, it is no surprise that most efforts to differentiate stem cells into fully mature cardiomyocytes so far provide only partially satisfactory results. In vivo, the development of cardiomyocytes from the pluripotent cells of the mesoderm, via distinct cardiac progenitor cells towards mature, contractile and highly organised cardiac muscle cells takes place over several weeks of gestation and relies on a temporarily and spatially well-defined cascade of signalling events (Günthel, Barnett, & Christoffels, 2018). Several evolutionarily well-conserved transcription factor (TF) domains including GATA zinc finger, T-box, homeobox, MADS-box and helix-loop-helix motifs, have been identified as essential parts of the regulatory mechanisms of cardiac genesis (Cui, Wang, Bassel-Duby, & Olson, 2018). Another crucial aspect of the early cardiomyocyte development is the interaction of the cardiac progenitor cells with their surroundings. Here are the cells of the endoderm adjacent to cardiac progenitor cells of particular relevance, as those cells release factors such as sonic hedgehog (SHH), fibroblast growth factor (FGF8), bone morphogenetic protein (BMP2) and ligands of the WNT pathway, and thereby create the environment that is necessary for the further cardiac development (Cui et al., 2018; Paige, Plonowska, Xu, & Wu, 2015). Already at very early embryonic stages, the developing cardiomyocytes show the capacity for basic calcium handling which is later followed by the first contractions. The onset of calcium handling and the resulting electrical excitation and contraction provide the next level of development cues for the young cardiomyocytes (Tyser et al., 2016).

While in the earliest stages of cardiac development, the differentiation of cardiac progenitor cells into cardiomyocytes is responsible for cardiac growth, this shifts later towards a hyperplastic growth were the increase in myocardial volume is caused by the proliferation of existing cardiomyocytes. Experiment on chicken embryos showed that the regulation of cardiomyocytes proliferation is separated in an early, FGF dependent phase and a later phase in which the dependency on FGF ceases (Mima, Ueno, Fischman, Williams, & Mikawa, 1995). In 2003, Stuckmann, Evans, & Lassar could show, that in this FGF-independent phase, erythropoietin (Epo) and retinoic acid (RA) play a crucial regulatory role. Furthermore, they could prove that the epicardium, the outermost layer of the heart provides essential functionality in regulating the proliferation. Due to more recent findings, it is nowadays believed, that RA and Epo do not act directly on the cardiomyocytes, but instead induce the secretion of cardiac-specific mitogens such as IGF2 from the epicardium (Foglia & Poss, 2016). The hyperplastic growth of the myocardium continues until after birth and ceases within a short period afterwards.

During this period, the cardiomyocytes show a high rate of DNA synthesis without a corresponding rate of cell divisions, leading to a high number of polyploid cells and a relatively large population of binucleated cells (~25 % in humans) amongst the CMs. With that, the majority of the CMs leaves the mitotic cell cycle and thus loses the ability for proliferation (Cui et al., 2018).

The halting of the cell cycle seems to be mediated by the classic and well-known cell-cycle regulatory mechanism, which include increased cell cycle inhibitor activity and decreased expression of cyclins and cyclin-dependent kinases. (Foglia & Poss, 2016). A next key process during postnatal CM maturation is the isoform switch of contractility genes such as troponin or myosin. Furthermore, the metabolism of the cardiomyocyte’s changes from the foetal, carbohydrate-based metabolism towards the more efficient metabolism of fatty acids.

At this point, the cardiomyocyte’s functions and the overall organisation of the heart are complete enough to maintain the bodies blood supply continuously and could be considered as functionally mature. However, the increasing demand on the heart muscle during human childhood and youth results in ongoing maturation processes that do not end until the onset of adulthood (Cui et al.,

(7)

Page: 3 2018; Quaife-Ryan et al., 2017). In fact, the transcriptional differences between neonatal and adult cardiomyocytes are so substantial that Quaife-Ryan et al. (2017) claimed, “that they could be considered as 2 distinct cell types” (p. 1128). It remains to say that this section contains only a very brief summary of the complex regulatory mechanisms that are required to form an adult heart or even a functional neonatal CM out of the pluripotent cells of the mesoderm. However, even here, the challenge to replicate this process in vitro becomes apparent.

Comparing the earlier described properties of hPSC-CMs with those of cardiomyocytes in vivo, it appears that the stem cell-derived CMs resemble foetal CMs rather than neonatal or adult CMs.

Pluripotent stem cells are, customarily differentiated into CMs by modulating GSK and Wnt pathway and by employing FGF signalling, resulting in an approach that resembles the very early stages of in vivo CM developments (Burridge et al., 2014). However, this technique cannot mimic all signalling processes and environmental conditions that an in vivo developing CM is subjected to. Processes such as the RA and Epo dependent release of mitogens from the epicardium, or the increasing mechanical forces on the cells through the onset of blood flow cannot be easily replicated in vitro and thus, the differentiation techniques fail to provide such essential development cues, resulting in essentially immature cells.

Those immaturities significantly impair the usefulness of the data obtained from hPSC-CM-based assays, as an in vitro observed effect might not translate to the conditions in vivo. Some mechanisms, for example, the effect of a drug or agent that targets one of the genes that is not or only sparely expressed might not even be visible in the cell-based assay even though they are present in mature CMs. To improve the usefulness of hPSC-CMs, methods to increase the maturity of those cells are needed. Several promising approaches for that have been introduced. Those methods include modifications of the growth conditions like altering the carbon source of the cells, use of different growth substrates, co-culturing the CMs with other types of cells or genetic manipulation of the cells. Another very promising method for increasing maturity is to subject the cells to electrical or mechanical stimulation during growth (Machiraju & Greenway, 2019; Tu, Chao, & Wu, 2018). Other studies have shown that even a simple extension of the culture time of hPSC-CMs can alter certain properties of the cell and increase their maturity. A study by Ross, Rizvi, Emelyanova, Tajik and Jahangir (2019) showed that hiPSC-CMs, cultured for more than 30 days showed changes in both, expression of ion channels and electrophysiological properties compared to cells cultured for 10-15 days. However, the cells cultured for longer times still remained at an immature state, characterised by spontaneous action potentials.

Furthermore, Kamakura et al. (2013) have shown that, even after 360 days of culture, stem cell- derived CMs show spontaneous action potentials and are still steps away from being considered as mature. Additionally, culturing cells for a year or more just to bring them into a mature state does not seem feasible for many of the proposed applications. Henceforth, there is an obvious need for the development of more efficient stem cell differentiation protocols to obtain actual mature cardiomyocytes. Such development requires an in-depth knowledge of the genetic and molecular mechanisms that increase the maturity of hPSC-CMs. This includes the temporal changes in gene expression that the cells undergo during extended culturing times.

High throughput analysis of gene expression data

A large number of genes and pathways are involved in the development of cardiomyocytes in vivo.

To see how those important genes may affect the maturation of stem cell-derived cardiomyocytes, a method for large scale screening of gene expression data is needed. One widely established method that provides such a large-scale screening is the DNA microarray technology (Schena, Shalon, Davis & Brown, 1995). Although this technology has been developed and improved in the last 25 years, it still relies on the principle mechanisms that were employed by Schena et al.

(8)

Page: 4 (1995). Briefly, DNA microarray measures gene expression from extracted RNA by creating a library of fluorescently labelled cDNA, which then is hybridised to defined probes with known positions on a microarray plate. Those probes are DNA oligonucleotides that are complementary to unique regions in the target gene so that only a specific gene binds to a specific probe on the plate. After removal of unbound cDNA, the fluorescence of each probe is measured by a scanner, and the obtained intensities are converted into relative expression values for the different genes (Van Hal et al., 2000). Since its emergence, the DNA microarray technology has established as valuable tool for general research, genotyping, toxicology and safety assessment and even helped to forward the field of personalised medicine (Fumagalli et al., 2014; Van Hal et al., 2000).

Nowadays, however, a more recent technology has widely replaced the use of DNA microarrays for large scale gene expression analysis. This technology, called RNA sequencing (RNA-seq), uses next-generation sequencing for analysis and quantification of transcripts rather than the previously described on-chip hybridisation technique (Zhao, Fung-Leung, Bittner, Ngo, & Liu, 2014). This approach omits the need of pre-designed probes and thus provides a broader range of detection. Furthermore, RNA-seq appears to be more sensitive than DNA microarrays, and since it involves sequencing of the transcripts, it can give additional information about the transcriptome such as alternative splicing of RNA transcripts (Rao et al., 2019; Zhao et al., 2014).

Even though RNA-seq also comes with some disadvantages, especially regarding cost and increased requirements for data handling and analysis, those aforementioned benefits have resulted in RNA-seq being nowadays the gold standard for large scale gene expression analysis.

Despite that, the microarray technology has still their niches of application, and data previously obtained using DNA microarray is not invalid just because of the emergence of more powerful tools.

The processing and analysis of microarray results are made using bioinformatics and statistical methods. A common workflow for the analysis consists of data pre-processing, quality control and identification of differentially expressed genes (DEGs), which then can be used for further functional analysis. During pre-processing nonspecific background signals are removed (background correction) to omit this source of noise that might impair the data analysis and interpretation. Afterwards, the expression values are normalised. Here, the scale of all measured values is equalised to remove the impact of systematic effects that are not associated with actual biological differences. After normalisation, the data is generally prepared for the identification of DEGs; however, there are additional steps that should be done before this. Firstly, it is recommended to perform quality control by visually inspecting parameters such as central tendency, spread and distribution of the data and to investigate the general relationship or correlation between the different samples by performing hierarchical clustering. Furthermore, it should be considered whether probes that show continuously low expression or variance throughout the experiment should be removed prior to the differential analysis. After completing the pre-processing, and given that the quality control did not reveal any problems (e.g. variation that cannot be biologically explained or batch effects that could not be resolved by normalisation), the actual identification of DEGs can be made. Here, the ratios between genes for the selected comparisons are calculated (fold change), and statistical tests are applied to determine whether those changes appear to be statistically significant or not. A statistical method that is well established and commonly employed here is linear modelling, as described by Ritchie et al.

(2015). In this method, a linear regression model is fit through the whole experiment and moderated t-statistics are used to determine the p-value for each comparison. By that, a list of genes with their associated fold changes and statistics is generated, which then can be filtered for those genes that are considered as significantly differentially expressed. For this filtering, most commonly a p-value threshold (α) is used, which gives the maximum risk for a false positive.

However, since the statistical analysis of genomic or transcriptomics data involves an extremely

(9)

Page: 5 high number of comparisons, even a rather low α can lead to the accumulation of false positives that is too high to be acceptable. To address this issue, methods that adjust the p-value for multiple comparisons have been developed and by that, the accumulation of false-positive results can be avoided (Noble, 2009). A commonly used threshold for filtering results from linear modelling is a p-value (after correction for multiple testing) < 0.05. Additionally, the fold change can be used as another filtering criterion to retain only genes that have a fold change above a certain magnitude (Ritchie et al., 2015).

Analysis of differentially expressed genes

While a differential expression analysis, such as the aforementioned linear modelling can rather easily provide information about the differences in gene expression between different experimental conditions or sampling points, it usually does not give much information about the biological effects of those differences. A differential expression analysis can, depending on the experimental setups and the filtering parameters, results in several hundred to thousands of DEGs so that a manual retrieval and analysis of the gene products and their function is usually not a feasible option. Instead, a method called gene enrichment analysis is used to determine which functions are associated with genes in a list. This analysis takes advantage of the fact that a given biological function usually requires the interaction of proteins encoded by several genes. During the enrichment analysis, the fraction of genes in the sample list that are associated with a term is compared to the fraction of genes in a reference or background list (e.g. genome-wide list of annotated genes) associated with the same term. By that, terms that are overrepresented in the sample gene list compared to the background list can be identified. Additionally, statistical tests, such as Fisher’s exact test are used to determine whether this overrepresentation is statistically significant or not (Bauer, Gagneur, & Robinson, 2010). The terms that are identified by such analysis can be used in different ways to explain the biological functions of genes associated with the term. The probably most widely used source of such terms is the gene ontology consortium, a collection of several bioinformatics resources that strives to standardise the functional annotation of gene products. It does that, by providing a set of common and controlled vocabulary to describe different functional aspects of a gene product, i.e. to annotated a gene with a descriptive ontology.

As of now, the consortium provides three different root ontologies – Biological process (BP), Molecular function (MF) and Cellular component (CC) – that describe those three different key characteristics of a gene. Within each root ontology, several more specific levels of ontology describe those characteristics with increasing specificity. The consortium even provides information on how the different ontologies related to their parent and child terms (the adjacent more general and more specific terms respectively) (Harris et al., 2008; Jones & Paton, 1999). By providing this well-structured and hierarchical approach towards the functional annotation of gene or gene products, and thanks to continuous development and improvement, the gene ontology consortium has become one of the most comprehensive resources for the large-scale bioinformatics analysis of gene functions (Carbon et al., 2019). However, in many cases, not only the biological function of a given set of genes but also how those genes actually provide this functionality is of interest. As it is mentioned earlier, extensive interactions between gene products are often required for the correct regulation and establishment of certain functionality.

While the gene ontology project provides a large and wide-ranging collection of functional annotations, it does not generally provide much information regarding how the genes within an ontology interact to achieve the described functions. For that, other resources are needed. One resource that provides this information is the Kyoto Encyclopaedia of Genes and Genomes (KEGG).

This knowledgebase provides a collection of networks that describe and visualise the interactions of proteins and other molecules contributing to different physiological and pathological processes. As of now, KEGG contains pathway collections related to metabolism, genetic information processing, environmental information processing, cellular processes, organismal

(10)

Page: 6 systems, human diseases and drug development (Kanehisa, Sato, Furumichi, Morishima, &

Tanabe, 2019). The information on this knowledgebase is provided as descriptions of the pathways, and more importantly, as networks or maps, which visually show the interactions between proteins and other molecules in the pathways. In many cases, the maps also show modes of interaction (e.g. activation, repression), the cellular or extracellular location of that interaction and even points of interactions with other functional pathways (Kanehisa et al., 2019). By that, KEGG pathways provide important resources for the understanding of functions and effects of genes that were previously identified as of interest in a certain context, such as DEGs.

Using those two resources, the KEGG knowledge base and the gene ontology consortium as references for enrichment analysis, this method becomes a powerful tool for the identification of potential functions and interactions of large lists of differentially expressed genes.

Transcription factor analysis

After identifying DEGs and the functions and pathways that may be associated with those DEGs, the next question might be in which way, and by which molecules the expression of genes associated with those functions and pathways are regulated. All cells and organisms rely on complex regulatory mechanisms, on transcriptional, translational and post-translational levels to ensure that protein synthesis and activity appropriately meet the current cell’s or organism’s demand. Notable mechanisms for this regulation included the post-translational activation and inhibition of proteins, miRNA mediated inhibition of translation and transcription factor- mediated regulation of mRNA synthesis. Transcription factors are regulatory, DNA-binding proteins which recognise and bind to specific sequences (regulatory motifs) and induce or enhance transcription upon binding to those motifs. As such, transcription factors play a key role in almost all cellular processes and functions and are moreover involved in a huge variety of pathologies, including cancer, cardiovascular diseases and immunological diseases (Papavassiliou

& Papavassiliou, 2016). Considering the great impact of transcription factors on physiology and pathology of cells and organisms, it is obvious that transcription factors are seen as interesting targets to modify gene expression, both in cell-based experiments and in vivo as a new therapeutic option. Here, two approaches are discussed, and at least experimentally exploited. Firstly, it is possible to provide molecules that alter certain properties of the transcription factor itself, or its binding sites and thus modify (usually inhibit) the activity of the targeted TF (Hagenbuchner &

Ausserlechner, 2016). Secondly, some approaches use engineered artificial transcription factors, with defined properties of their DNA binding and activity domains (Yaghmai & Cutting, 2002).

However, both of those approaches require a deep understanding of the mechanisms by which the transcription factors act on their target genes and how and where they bind to DNA. The detection of transcription factor bindings sites or regulatory motifs in vitro is usually done using chromatin immunoprecipitation (ChIP) followed by a microarray experiment (ChIP-chip) or nowadays more common by second-generation sequencing (ChIP-seq) (Yan, Chen, & Costa, 2004).

The ChIP method identifies protein-DNA interactions by crosslinking proteins and DNA in vivo using chemicals such as formaldehyde before harvesting any lysing the treated cells. Then, the DNA is fragmented using sonication. Afterwards, antibodies specific for the protein of interest are used to precipitate fragments that are crosslinked to the target protein while all other fragments are washed away. The so obtained chromatin (which in this context refers to any DNA-protein complex) is then eluted and can (after reversal of the crosslinks) be used for further analysis such as qPCR, microarray or sequencing (Bruno, 2019; Yan et al., 2004). The numerous regulatory motifs and their associated TFs that have been identified in such Chip-seq experiments and that are collected in databases such as encode (Davis et al., 2018) provide valuable resources for the in silico prediction of transcription factors that may be affecting a set of genes of interest. Here, computational methods can be used to screen the region around the exons of the genes of interest to identify known regulatory motifs in this region and to predict TFs that are likely to bind to these

(11)

Page: 7 regions (Janky et al., 2014). Results from such an analysis can help to understand the regulatory mechanism behind the functions of a set of genes. They may even give interesting directions towards the search of new ways of influencing gene expression in vitro to change certain characteristics of cultured cells and in vivo as targets for new therapeutic agents (Gonda &

Ramsay, 2015; Lambert, Jambon, Depauw, & David-Cordonnier, 2018; Papavassiliou &

Papavassiliou, 2016).

Analysis of differentially expressed miRNAs

Another important regulator of gene expression are miRNAs – a class of short, non-coding RNA molecules which regulate gene expression post-transcriptionally by forming an RNA inducing silencing complex (RISC), consisting of at least an argonaut protein and a miRNA. The miRNA then guides the RISC to a miRNA responsive element – a sequence on the target mRNA that is complementary to the miRNA. There, the argonaut protein can either directly exert its endonuclease activity (AGO2) or recruits other proteins that facilitate the degradation or inhibit the translation of the mRNA (Catalanotto, Cogoni, & Zardo, 2016). Thus, miRNAs mainly inhibit the expression of their target genes, unlike TFs that increase or promote expression (de Sousa, Gjorgjieva, Dolicka, Sobolewski, & Foti, 2019). However, it has been reported, that the activity of miRNAs can – dependent on whether cells are under cell cycle arrest or not – switch towards upregulating translation (Vasudevan, Tong, & Steitz, 2007). In their experiments, Vasudevan et al.

showed, that hsa-miR-369 and hsa-let-7 upregulate translation in cells in cell-cycle arrest while they repress it in proliferating ones. Since their discovery, miRNAs have risen more and more attention, especially in medical research, and nowadays, there are used as biomarkers to identify diseases and disease states, for targeted gene silencing in vitro and even their use as drugs is discussed (Hanna, Hossain, & Kocerha, 2019). Hence, a good understanding of the miRNA function in cells and organisms is not only necessary to get a better picture of the whole gene expression regulation, but it also opens the possibility for directly affecting this expression by targeting miRNA expression or by providing synthetical miRNAs (Eguchi et al., 2016). Since miRNAs exhibit their function by regulating the translation of mRNA into proteins, a functional analysis of DE- miRNAs needs to be done by investigating their target genes. This can be done by using predicted or experimentally determined miRNA-mRNA interactions to create a set of genes that are affected by a given list of miRNAs. This set of genes can then further be analysed in a similar way as DEGs.

Clustering

An alternative approach to the analysis of gene expression profiles, and an approach that significantly differs from the more traditional ways that are mentioned above, can be taken by borrowing methods out of the toolbox of unsupervised machine learning. Unsupervised learning refers to methods, where an algorithm draws interferences out of an unlabelled dataset, i.e. the algorithm tries to discover patterns in a dataset without being in any way trained or prepared for the dataset. This approach may allow for an analysis that is not or less biased by presumption or expected results. The most basic of those unsupervised machine learning algorithms simply aim to assign the observations in a dataset into different groups, i.e. to cluster the observations. One of such algorithms is the k-means clustering algorithm introduced by MacQueen et al. in 1967.

This algorithm assigns each observation into one of k clusters with k being the predefined number of clusters. The algorithm then clusters the data in the following way (Ray & Turi, 1999):

1. Set k initial centres (either predefined or randomly generated positions) 2. Assign each data point to its closest initial centre.

3. Calculate new centres based on the previously made assignments 4. Assign each data point to the closes new centre

5. Repeat step 3 and 4 until the centres stop changing with subsequent iterations (or a predefined maximum of iterations is reached)

(12)

Page: 8 Using this iterative approach, the algorithm tries to create a clustering that minimises the total sum of squared distances in all clusters. As the distance measure, the algorithm usually employs the Euclidean distance, however, other, depending on the nature of the input data more appropriate measures, such as Pearson’s correlation can be used as well.

The rationale behind using such a clustering algorithm for the analysis of gene expression data lies within the assumption, that genes that act together at a functional level often have similar expression profiles, and change together over time. By using clustering, a large list of potentially interesting genes can be broken down into smaller clusters with similar expression profiles which then can be further analysed for common functions or common regulatory mechanisms.

Problem formulation and Aim

While stem cell-derived cardiomyocytes are expected to provide great opportunities to forward research and medicine in the field of cardiac diseases, the currently available hPSC-CMs frequently fail to meet their theoretical potential due to inefficient differentiation protocols. These protocols result in cells that resemble foetal or neonatal cardiomyocytes rather than adult ones. Thanks to numerous experiments and observations, it is known that stem cell-derived cardiomyocytes change gene expression, structure and function during extended culture times. Furthermore, there is evidence that those changes increase the maturity of the cells. However, not all of these changes and their contribution to the increased maturity of stem cell cardiomyocytes are fully known and understood yet.

The project aimed to use the aforementioned high-throughput data analysis methods to create a characterisation of the temporal changes in the gene expression profile of human embryonic stem cell-derived cardiomyocytes after 1, 2, 7 and 14 days of culturing. An additional aim was to evaluate if any of the identified changes include genes or pathways that have the potential to act as targets for a treatment that increases the maturity of stem cell-derived cardiomyocytes. With respect to this aim, the following research questions were formulated:

1. How does the expression profile of the hESC-CMs change over time?

2. How does the expression of genes for ion channels and calcium handling change?

a. Which transcription factors / miRNAs are associated with those changes?

b. Are there any known drugs/ agents that target those genes?

Materials and methods

The data analysis was, unless otherwise specified, conducted using R Version 3.6.2 (R Core Team, 2019) and the RStudio integrated development environment (RStudio Team, 2016). R base functions and the tidyverse package collection (Wickham et al., 2019) were used for data preparation, manipulation, and to visualise the results. Packages that were used for specific analysis are introduced at the corresponding parts of the report, and parameters that are not explicitly mentioned were left on default. The network analyses have been done in Cytoscape 3.7.2 (Shannon et al., 2003), references for the plugins used within Cytoscape can be found in the respective sections.

Origin of the data

The expression data used in the project originates from a study performed by Holmgren et al.

(2015), which aimed to identify novel biomarkers for the assessment of Doxorubicin-induced cardiotoxicity. In their study, the researchers used Cellartis® Pure hESC-CM (human embryonic stem cell-derived cardiomyocytes) obtained from Takara Bio Europe AB. The cells were thawed for four days, seeded and cultured for 14 days following the manufactures instructions (37 °C, 5

% CO2) and the culture medium was changed every second day. The experiment was performed in triplicates. During the 14 days of doxorubicin incubation, samples were collected at day 0, day

(13)

Page: 9 1, day 2, day 7 and day 14. Expression data was generated from these samples for the identification of potential biomarkers, including mRNA and miRNA expression. The mRNA expression levels were measured using the HuGene ST 2.0 array (Affymetrix, www.affymetrix.com) and for the miRNA expression the miRCURY LNA™ microRNA Hi-Power Labeling Kit, Hy3™/ Hy5™ (Exiqon, http://exiqon.gene-quantification.info, nowadays distributed by Qiagen, http://www.qiagen.com) was used. A more thorough description of cell culturing, sampling and mRNA microarray profiling can be found in Holmgren et al. (2015), and the miRNA profiling is described in Holmgren et al. (2016). The expression data are available on ArrayExpress under the accession numbers E-MTAB-6045 (mRNA) and E-MTAB-6052 (miRNA).

Only the untreated samples, i.e. the control samples in the experiments by Holmgren et al., are considered in this thesis project. For the mRNA analysis, the raw data available on Array Express were used, the miRNA data, however, were already processed, i.e. normalised using the global LOWESS (locally weighted scatterplot smoothing) regression algorithm (see processing protocol P-MTAB-68108) (Cleveland, 1979).

Differential expression analysis

Retrieval of the microarray data, quality control and identification of differentially expressed genes (DEGs) and differentially expressed miRNA (DE-miRNAs) were performed using R and the miodin package (Ulfenborg, 2019). This package was mainly created for the integration of multi- omics data, but it also provides a user-friendly and reproducible workflow for standard microarray analysis and was thus found to be suitable for this project.

The raw mRNA microarray data and the corresponding chip annotations were imported to miodin, and the data were normalised using the robust multi-array average algorithm (RMA). The threshold for the removal of low expressed probes was set to 5 (on the log2 scale), and all probes with an expression above this threshold in at least one sample were retained. The decision for this threshold was made based on the average expression values of stem cell marker genes that are expected to show no expression in differentiated cells. During these processing steps, miodin also generates visualisation plots for assessment of the quality of the datasets. After processing the data, the performOmicsModelling function was used to apply linear modelling, provided by limma (Ritchie et al., 2015) on the datasets. The study contrasts were defined setting day 0 as the baseline and comparing the subsequent days to this baseline. Genes with an absolute value of the log2 fold-change > 1 (i.e. a 2-fold up- or downregulation) and a false discovery rate (FDR) < 0.05 were considered as significantly differentially expressed and retained for subsequent analysis.

The miRNA differential analysis was performed similarly; however, since this dataset was already processed, no normalisation or filtering of probes was needed. Once again, results with an absolute log2 fold change > 1 and an FDR < 0.05 were considered as significantly differentially expressed.

The results of the differential analysis were then filtered for duplicates, and non-annotated probes were removed from the results. The remaining DEGs and DE-miRNAs were used subsequently.

Functional analysis of the DEGs

GO and pathway enrichment analysis

The functional analysis of the DEGs started with an annotation enrichment analysis (overrepresentation analysis) which was performed using the enrichR package (Jawaid, 2019), an interface to the web-based Enrichr gene set enrichment analysis tool (E. Y. Chen et al., 2013;

Kuleshov et al., 2016). Using an overrepresentation analysis for this part of the project was found suitable, as this type of analysis only requires a list of genes of interests (such as DEGs) as input and provides a good and easily interpretable overview over functions that are associated with

(14)

Page: 10 those genes of interest. The analysis was targeted on gene ontology terms (GO_Cellular_Compartment_2015, GO_Molecular_Function_2015, GO_Biological_Process_2015) as well as on KEGG pathways (KEGG_2019). Terms and pathways with a Benjamini-Hochberg adjusted p-value < 0.05 were considered significantly enriched. To identify terms and pathways that may be affected by the extended culture time, the enrichment analysis was made separately for the up- and downregulated genes at each day. Afterwards, the REVIGO tool (Supek, Bošnjak, Škunca, & Šmuc, 2011) was used to remove redundant GO terms from the results of the enrichment analysis. The identified GO-terms and corresponding FDR values were input to the tool, and terms with a similarity score based on the simRel method as described by Schlicker, Domingues, Rahnenführer, & Lengauer (2006) above 0.5 were filtered from the list.

Transcription factor analysis

The filtered list of GO terms was further analysed, and terms associated with cardiomyocyte function were selected for a more in-depth investigation of the regulation of their associated genes. The selected GO-terms and their associated DEGs were imported into Cytoscape, and the iRegulon plugin (Janky et al., 2014) was used to identify regulatory motifs found on those DEGs and to predict transcription factors that bind to those motifs. For the motif identification, the motif collection provided in the iRegulon plugin (collection available at https://aertslab.org/#data- resources-all) and Chip-seq tracks collected by the ENCODE: Encyclopaedia of DNA Elements (Davis et al., 2018) were used. The FDR threshold for the transcription factor prediction was left at default (0.001). The five motifs with the highest number of targets among the DEGs were selected, and for each motif, the transcription factor with the highest similarity has been mapped onto the gene-GO-term network.

Pathway mapping

Out of the significantly enriched KEGG pathways identified in the enrichment analysis described above amongst the upregulated genes, three pathways have been selected for a more in-depth analysis. Those three pathways have been mainly selected based on their FDR. However, their relevance for the given project has been used as a secondary selection criterion, so that the one pathway was excluded from the mapping. For the three pathways, the relative expression values of their associated DEGs were mapped onto the nodes of the native pathway maps using the Pathview package (Luo, Weijun, Brouwer, & Cory, 2013).

miRNA downstream analysis

The identified differentially expressed miRNAs (DE-miRNA) were further analysed using the DIANA mirPath v.3 tool (Vlachos et al., 2015). This tool was used to predict target genes of the DE- miRNAs, and to perform a pathway enrichment analysis of those targets. For the target prediction, the MicroT threshold, i.e. the threshold for the target prediction score, was set to 0.8 and only genes that were targeted by at least two of the DE-miRNAs were retained. Pathways with an FDR

< 0.05 were considered significantly enriched. The analysis was separated in up- and downregulated miRNAs and only DE-miRNAs identified on day 7 and 14 were considered since the number of DE-miRNAs at the earlier time points was too low for performing further downstream analysis.

Analysis of genes associated with cardiac contraction

Ion channels and calcium handling genes regulate crucial functionalities of cardiomyocytes. Thus, a focused analysis of the gene expression profiles of such genes was performed. For that, a list of genes of interested has been created. This list consisted of genes for cardiac ion channel subunits and calcium handling that have been identified by Synnergren, Améen, Jansson, & Sartipy (2012) as well as additional genes that are associated with the GO terms GO:0086091 - regulation of heart rate by cardiac conduction, GO:0086091 - cardiac muscle cell action potential or GO:0086001 -

(15)

Page: 11 regulation of cardiac muscle contraction by calcium ion signalling. Out of those 185 identified genes, 126 were present in the normalised mRNA expression dataset and were included in the subsequent analysis. A complete list of these 185 candidate genes is available in Appendix B, table 1. After identifying this list of genes, the k-means clustering approach has been used to create clusters of genes with similar expression profiles. At first, the optimal number of clusters was estimated using the elbow method, provided by the R package factoextra (Kassambara & Mundt, 2020). In this method, the optimal number of clusters is determined as the number of clusters after which total within-cluster sum of squares (WSS) ceases to decrease notably when another cluster is added. A number of four clusters was found to be suitable and used for the clustering analysis.

The k-means clustering itself was done using the amap package (Lucas, 2019). Pearson’s correlation was used as a distance measure, k was set to 4, and 100 different start configurations were used to improve the clustering. The results of the clustering were extracted, and the centres of each cluster were plotted to identify clusters of genes sharing a distinct gene expression profile over time. To get an indication of the functionality of the genes in the separate clusters, these genes have then been imported into Cytoscape. The Cytoscape string app (Doncheva, Morris, Gorodkin,

& Jensen, 2019) was then used to visualise whether the genes are coding for ion channels/ion channel subunits or for proteins that are involved in the regulation of ion transport. Regulators of the ion transport are here defined to genes that are listed under the GO_BP term regulation of ion transmembrane transport, and ion channels have been identified based on their UniProt keywords.

Ethical Considerations

While stem cells provide the great opportunities for medical research that are previously mentioned, the ethicality of their use is continuously and controversially debated. Since an in- depth discussion of all the ethical aspects could easily exceed the extent of this thesis project, the ethical discussion here will focus on two critical issues concerning the use of human pluripotent stem cells. In the following, these two aspects, the origin of the stem cells and the downstream research will be introduced and discussed in the context of the project.

The pluripotent stem cells that are used in research nowadays originate either from human embryos (hESCs) or from human somatic cells that have been de-differentiated into pluripotent stem cells (hiPSCs). The most ethical concerns arise from work with hESCs. These cell lines originated from the inner cell mass of a 5 to 7 days old human blastocyst. The ethical problem here comes from the necessary destruction of the blastocysts to obtain the inner cell mass. The major question here is, if the blastocysts, which has undoubtfully the potential to give rise to human life, already comprises a human life itself and thus must be protected as such. Despite the many years this has been debated, there is still no consensus to this question. However, many scientists agree on a moderate view on that matter. While they agree that a human embryo deserves certain protection and is more than just a clump of cells, they claim that research on embryos is acceptable given a good scientific justification and informed consent from the donors of the embryo. A more comprehensive discussion of the ethics of the different sources of human embryos for research can be found in the article by Lo & Parham, (2009) and in the entry “Ethics of Stem Cell research” in the Stanford Encyclopaedia of Philosophy (Siegel, 2018). The origin of hiPSCs is less controversial since those cells can be obtained from adult donors by only minimal- invasive methods. Provided a given informed consent by the donor, no ethical issues arise from getting those cells. However, the following concerns regarding downstream research on stem cells concern both hESCs and hiPSCs. Here, ethical problems may appear when stem cells are used for purposes that the original donor would not agree to. Common examples here are large-scale genome sequencing, genetic modification and injection into animals.

(16)

Page: 12 In this project, data that were obtained from hESC-CMs were analysed. They were collected in an experiment by Holmgren et al. (2015) and are publicly available. In the study, hESC-CMs provided by Takara Bio Europe AB were used, originating from the pluripotent stem cell line SA121, which is registered at the human pluripotent stem cell registry (hPSC reg) which oversees the ethical origin of its entries. The experiment yielded expression data of mRNA, miRNA and proteins, no whole-genome sequencing that could allow conclusions over the blastocyst donors was made.

The data analysis in this thesis project has been made using open-access tools; all of those tools and all analysis steps have been reported with the aim to achieve reproducibility of the performed study. Almost any kind of scientific work relies partly on the results and efforts of others, and this thesis project is no exception form that. The accomplishments of those researcher and software developers that through their work made this project possible are acknowledged by providing references to their relevant publications.

The results of this thesis project may help to improve the understanding of stem cell-derived cardiomyocytes, and of course, there is always a strive to apply an improved understanding to future research and application of stem cells and stem cell-derived cells which may give rise to new ethical considerations. However, since the nature of such potential future uses of stem cells and hESC-CMs is unknown, the associated ethical issues cannot be conclusively discussed here.

Thus, it can be concluded that there are no significant ethical concerns associated with the performed project or its results. No need to declare a conflict of interest could be identified.

Results

Data processing and quality control

After accessing and normalising the mRNA expression data using the RMA algorithm, the expression set contained 15 samples (5 sampling points with three replicates per day) with 53,614 features per sample. After applying the aforementioned criteria for filtering of low expressed genes, 21,16 features remained and were used for subsequent differential analysis. The distribution, as a measurement of the obtained data, was assessed before and after filtering using boxplot, histogram and density plot (Figure 1).

(17)

Page: 13 Figure 1. Quality control plots for the normalised mRNA expression data before and after filtering out low expressed genes. The boxplot (A) allows the comparison of range and spread of the data, and histogram (B) and density plot (C) show the distribution of the expression values.

In the boxplot (Figure 1A), it can be seen that all 15 samples have both similar central tendencies and spreads while the histogram (Figure 1B) indicates an approximately normal distribution. All lines of the density plot (Figure 1C) are close to each other, follow the same trends, and the bandwidth, i.e. the parameter that defines the smoothing of the curves with 0 equals no smoothing is low with values of 0.1254 and 0.1085, respectively. That means that not too much smoothing had to be done to plot the densities and thus, it can be concluded that the observed densities are close to the theoretical/ expected densities. In the plots created after filtering, it is visible, that probes with continuously low expression values are removed which leads to a higher median value for the arrays (visible in the boxplot) and a steeper increase of the frequency to the left of the central values (i.e. fewer values there). Based on those plots, the normalisation of the mRNA

A

B

C

N: 804255 Bandwith:

0.1254

N: 315240 Bandwith:

0.1085

(18)

Page: 14 data has been successful, and the normalised and filtered dataset is appropriate to use for the subsequent linear modelling.

The miRNA dataset, which, as previously mentioned, has already been normalised and not subjected to any filtering, contained 2,099 probes per sample. The quality control plots for the miRNA data are shown in figure 2.

Figure 2. Quality control plots for the miRNA expression data.

The spread of the miRNA expression data (Figure 2A) is not as equally distributed as in the mRNA data. However, there are no apparent outliers, and all the samples have reasonably similar median values. The histogram (Figure 2B) shows a normal distribution centred around 0 and the density plot confirms this distribution and shows no lines that significantly divert from the others. Thus, the three plots allow the conclusion that this dataset is suitable for linear modelling as well.

Another way of inspecting the dataset before starting the differential analysis is to use hierarchical clustering between the different samples to assess that replicated samples cluster together. The hierarchical clustering, provided by miodin, is based on Pearson’s correlation. The heatmaps of these clusterings are shown in Figure 3 (mRNA) and 4 (miRNA).

Figure 3. Hierarchical clustering of the mRNA samples. The colours in the heatmap show Pearson’s correlation between the different samples.

A B C

N: 31385 Bandwith:

0.03967

v

(19)

Page: 15 Figure 4. Hierarchical clustering of the miRNA samples. The colours in the heatmap show Pearson’s correlation between the different samples.

For both datasets, it can be seen that the clustering does not separate the samples only based on the sampling time points. Both, the time point and the origin of the sample, i.e. the replicate it originates from have a weight on the clustering, which indicates rather small time-dependent differences between these groups. However, there is a clear separation between the early (day 0 and day 1) and the late (day 7 and day 14) sampling points. Already the first branching of the dendrogram separates the first two (day 0 and day 1) and three (day 0, day 1 and day 2) sampling points (respectively for mRNA and miRNA) from the later ones.

Differential expression analysis

The linear modelling resulted in 17, 124, 516 and 628 DEGs and 4,2,14 and 40 DE-miRNAs for Day 1, 2, 7 and 14 respectively. After removal of non-annotated probes and duplicates, the numbers of the DEGs were reduced to 16, 105, 408 and 471. No duplicates or non-annotated probes had to be removed from the DE-miRNAs. Table 1 shows the number of DEGs and DE-miRNAs and the fractions of up- and downregulated genes and miRNAs for each day.

Table 1. Numbers of genes and miRNAs that were found to be significantly differentially expressed (absolute value of the log2FC > 1, FDR < 0.05).

Day 1 Day 2 Day 7 Day 14

Total DEGs 17 124 516 628

Duplicates / NAs removed 16 105 408 471

Upregulated 13 46 193 283

Downregulated 3 59 215 188

Total DE-miRNAs 4 2 14 40

Upregulated 1 1 10 27

Downregulated 3 1 4 13

Out of the DEGs, 15 genes were identified as differentially expressed at all four time points (Table 2). For most of those genes, the expression profile changed in a time-dependent manner, i.e. the log2 FC continuously increased or decreased with time (Figure 5).

(20)

Page: 16 Table 2. The gene symbol, the protein name and the log2 FC for the 15 genes that are differentially

expressed in all four time points are shown. Protein names were retrieved from UniProt KB.

Symbol Protein name Day 1 Day 2 Day 7 Day 14

ACTA1 Alpha-actin-1 -1.52736 -2.04395 -2.6567 -3.5801 AREG Amphiregulin -1.31303 -1.53064 -2.06911 -2.60376 MAL2 Protein MAL2 -1.18102 -1.5167 -1.76318 -1.79028 PIEZO2 Piezo-type mechanosensitive ion

channel component 2 1.088052 1.067495 1.218217 1.33752 PDGFRB Platelet-derived growth factor

receptor beta 1.090659 1.084502 1.177017 1.265146

LGALS3BP Galectin-3-binding protein 1.091451 1.385145 1.578666 2.098835 COL1A2 Collagen alpha-2(I) chain 1.27358 1.323212 2.122839 2.303734 SPON1 Spondin-1 1.276094 1.284207 2.179903 2.479685

MGP Matrix Gla protein 1.299893 2.056639 2.561292 2.730433

LUM Lumican 1.306391 2.155926 2.977291 3.126295

MFAP4 Microfibril-associated glycoprotein

4 1.34443 1.917514 3.073336 3.902462

COL3A1 Collagen alpha-1(III) chain 1.458547 1.636566 2.274655 2.397198 PTGIS Prostacyclin synthase 1.601192 2.168102 2.667669 2.456928

BGN Biglycan 1.808831 1.619201 2.328946 2.60282

ITGA8 Integrin alpha-8 1.973358 2.059407 1.337771 1.055318

Figure 5. The expression profiles of the 15 genes that were differentially expressed at all four time points.

Most genes follow a trend of having either a steadily increasing positive log2 FC or a constantly decreasing negative log2 FC with the only exception of ITGA8 which has a decreasing positive log2 FC after Day 2.

Among the DE-miRNAs, hsa-mir-665 was the only one found to be differentially expressed at all time points. The log2 FC of this miRNA changes from -1.163 (Day 1) to -1.385 (Day 14) and thus only minorly throughout the experiment.

Expression profiles of the 15 DEGs

(21)

Page: 17

DEG functional analysis

A list with all significantly enriched terms (after removal of redundant terms) and pathways can be found in Appendix A. Corresponding to the results of the differential analysis, where the number of DEGs increased over time, it could be seen that the number of significantly enriched GO terms and KEGG pathways increased correspondingly. Additionally, a trend that the terms at the later sampling point show lower adjusted p-values and higher counts, i.e. more DEGs associated with the respective terms could be observed.

Looking at the enriched GO terms identified amongst the upregulated genes (Figure 6 and Table 1 in Appendix A), the most noticeable results are the high number of terms associated with morphological properties of the cells. Mainly terms that relate to the organisation of the extracellular structures such as the extracellular matrix are enriched. At the later time points, terms related to cell surface and cell-substrate junction complement those findings. Other frequently occurring terms are those associated with growth factor activity and general metabolic processes. During the later time points (day 7 and 14), several of the enriched terms are rather nonspecific considering that the expression data originated from stem cell-derived cardiomyocytes. Examples of such nonspecific findings are terms related to, e.g. stress response of multicellular organisms (e.g. GO:0002920: regulation of humoral immune response and GO:0042060: wound healing) or even to the development of the neural system (e.g. GO:0007411:

axon guidance or GO:0001504: neurotransmitter uptake).

Figure 6. Gene ontology terms and KEGG pathways identified amongst the upregulated DEGs. For each day and source, the five terms with the lowest FDR have been selected for this graphics. If less than five terms were identified, all of those are shown.

(22)

Page: 18 A few GO terms that are especially interesting for this project, and the research question regarding the cardiomyocyte’s maturity were selected for further analysis. Table 3 gives an overview of those terms

Table 3. Selected GO terms associated with the function of mature cardiomyocytes. The table shows FDR (Benjamini-Hochberg adjusted p-value), Source (GO sub-ontologies), and Overlap (DEGs/total genes within the GO-term) for each of the terms.

Day Source Term FDR Overlap

Day 7 BP regulation of cardiac muscle contraction by regulation of the release of sequestered calcium ion

0.032965 3/17

Day 7 MF calcium ion binding 0.003 19/698

Day 14 BP cardiac muscle cell action potential 0.004407 5/30

Day 14 BP cardiac muscle cell action potential involved in contraction

0.027731 4/29 Day 14 BP regulation of cardiac muscle tissue growth 0.046252 4/35

Day 14 CC cation channel complex 0.017742 8/147

Day 14 CC voltage-gated potassium channel complex 0.040639 5/70

Day 14 MF calcium ion binding 5.41E-07 32/698

To get an understanding of candidate transcription factors that might affect those functions, the terms and their associated DEGs mapped onto a Cytoscape network. iRegulon was then used to add potential transcription factors of the mapped DEGs to the network. The resulting network for genes and functions on day 7 can be found in figure 7, and the network for day 14 is shown in figure 8.

Figure 7. Regulatory network for the relevant GO terms on day 7 made with Cytoscape and iRegulon. The blue circles show the selected GO-terms and their associated genes are displayed in the purple ovals. The green octagons show the five transcription factors that have the most interactions with the genes.

The selected terms, and thus the mapped genes in the network for day 7 (figure 7) are all associated with calcium handling. Most of the 19 mapped genes are associated with GO:0005509:

calcium ion binding in general. Only three of them are associated with the more specific GO:001088: regulation of cardiac muscle contraction by regulation of the release of sequestered calcium ion. The genes with the highest degree of interaction are RYR2 and DMD; both have seven

(23)

Page: 19 direct neighbours in the map which means they have predicted interactions with all five of the mapped transcription factors and are associated with both GO terms. Other genes that have predicted interactions with all five transcription factors are SMOC2 and SULF1. ATP1A2 is the only gene in the network that is only associated with the more specific GO term, and it is only predicted to be affected by NUCB1. The mapped transcription factors, their motifs and their degree of interactions are listed in table 4.

Figure 8. Regulatory network for the relevant GO terms on day 14 made with Cytoscape and iRegulon. The blue circles show the selected GO-terms and their associated genes are displayed in the purple ovals. Red ovals denote genes that show no interaction with the mapped transcription factors. The green octagons show the five transcription factors that have the most interactions with the genes.

For the network on day 14, 40 genes with functions related to calcium handling, cation channelling, cardiac muscle action potential and cardiac muscle tissue growth have been mapped.

An overview of the motifs and transcription factors is given in table 4. Five of the calcium ion binding genes (SULF1, GSN, CDH8, THBS2 and DMD), as well as KCND2 (potassium channel) and FGF12 (cardiac muscle cell action potential and cardiac muscle tissue growth), have been identified as targets of all five transcription factors. The genes RYR2 and DMD are the only mapped genes that are associated with both, calcium ion binding and with more cardiac-specific terms.

Both genes are involved in cardiac muscle cell action potential, and RYR2 is in addition to that also associated with the regulation of cardiac muscle tissue growth. While DMD, as aforementioned, is predicted to be regulated by all five transcription factors, RYR2 seems to be a target of TEAD1, NFIX and RPL6.

References

Related documents

Towards gaining better understanding of the differentiation and maturation process, we employed a standardized protocol to differentiate six hPSC lines into hepatocytes and

With their proliferative capacity and differentiation potential, human induced pluripotent stem cells (iPSCs) represent an appealing cell source for such a model system

Interestingly, germ cells from both the 1- and 7-year-old samples consisted of only undifferentiated spermatogonia (corresponding to States 0–1) (Figures 2A, 2D, and 2E) that

Keywords: aging, cell adhesion, cell-material interaction, exosomes, extracellular vesicles, mesenchymal stem cells, osteogenic differentiation, proliferation,

The findings presented in this thesis demonstrate the regenerative effects of MSC-derived EVs/exosomes, in terms of stimulating proliferation, osteogenic differentiation,

Studies have shown that it takes approximately one year for patients undergoing autologous or allogeneic HSCT to improve their HRQL to the baseline value (64, 65) , and that

Have you during the course been subjected to negative

Regenerative effects of mesenchymal stem cell- derived exosomes.