
Transcription response in the TGF-beta pathway

Francisco Manuel Sánchez de Oria

Degree project in biology, Master of Science (2 years), 30 credits, 2008

Biology Education Centre and Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University


List of abbreviations

Abstract

Introduction

The TGF-β superfamily

Role of TGF-β in tumor pathogenesis

TGF-β signal transduction and the Smad proteins

Studying transcription factors binding: Chromatin ImmunoPrecipitation (ChIP) assays

ChIP-seq: next generation ChIP assays

Results

In vivo mapping of binding sites for Smad4 transcription factor

Verification of known binding sites

Analysis of Smad4 target genes

Smad4 enriched regions contained an over representation of Smad binding sites

Discussion

Global mapping of Transcription Factor Binding Sites in the postgenomic era

Unraveling the secrets of the complex network of TFs in the TGF-β pathway

Materials and methods

Experimental procedures

Cell cultures

Antibodies

ChIP and DNA template preparation for sequence analysis

PCR confirmation

Data analysis

Acknowledgments

References


List of abbreviations

BMPs     Bone Morphogenetic Proteins

CDK      Cyclin-Dependent Kinases

ChIP     Chromatin Immunoprecipitation

CLB      Cell Lysis Buffer

FBS      Fetal Bovine Serum

FoxG1    Forkhead Box G1B

I-Smads  Inhibitory Smads

MAPK     Mitogen-Activated Protein Kinases

PAI-1    Plasminogen Activator Inhibitor-1

R-Smads  Receptor-Regulated Smads

RIPA     RadioImmunoPrecipitation Assay

SARA     Smad Anchor for Receptor Activation

SBE      Smad Binding Element

Smurf    Smad ubiquitination regulatory factor

TF       Transcription Factor

TFBS     Transcription Factor Binding Site

TGF-β    Transforming Growth Factor beta

TSS      Transcription Start Site


Summary

Transforming growth factor beta (TGF-β) is a multifunctional cytokine involved in the regulation of numerous cellular responses, including cell proliferation, differentiation and apoptosis. Escaping from TGF-β-induced apoptosis is one of the hallmarks that characterize cancer cells. The aim of this project was to identify genes bound and regulated by the Smad2 and Smad4 transcription factors, which directly mediate TGF-β signaling. In this project I used ChIP-seq, a state-of-the-art method for analyzing protein interactions with DNA. ChIP-seq combines Chromatin Immunoprecipitation, a powerful method that selectively enriches for DNA sequences bound by a particular protein in vivo, with massively parallel DNA sequencing of the ChIP-enriched DNA. The DNA bound by the protein is thereby identified and mapped to the human reference genome, so that the position of every DNA fragment can be located visually. Using HepG2 cells (a human hepatocellular liver carcinoma cell line) as a model system, I identified well-known target genes of the Smad4 protein, such as PAI-1 and JUNB, as well as some candidate genes that could potentially be targets for therapeutic intervention, such as FoxG1 and HNF4α.


Introduction

Transforming growth factor beta (TGF-β) is a multifunctional cytokine involved in the regulation of numerous cellular responses, such as cell growth and proliferation, differentiation, cellular matrix production, migration and apoptosis (Jennings & Pietenpol, 1998; Verrecchia & Mauviel, 2002). TGF-β is a secreted homodimeric protein and a member of the TGF-β superfamily. Deregulation of TGF-β expression or signaling is involved in a variety of diseases, including cancer and fibrosis (Blobe et al., 2000).

The TGF-β superfamily

The TGF-β superfamily includes more than 30 pleiotropic cytokines with similar structure that play key roles in development and tissue homeostasis. The first member of the TGF-β superfamily, TGF-β1, was discovered in the late 1970s. Its name refers to its ability to induce growth and morphological transformation of rat kidney fibroblasts (DeLarco and Todaro, 1976). However, shortly after its discovery it was shown that TGF-β1 also acts as an inhibitor of cell proliferation (Tucker et al., 1984). This duality in cell growth regulation is cell-type dependent and imprinted during embryonic development (Sporn and Roberts, 1990). Other members of the TGF-β superfamily are TGF-β2 and TGF-β3, bone morphogenetic proteins (BMPs), anti-Müllerian hormone (AMH), activins and nodal (Piek et al., 1999).

Role of TGF-β in tumor pathogenesis

TGF-β plays important roles in tumor pathogenesis, contributing to cell growth, invasion and metastasis, and angiogenesis, and also decreasing host tumor-specific immune responses (Jennings & Pietenpol, 1998). Although TGF-β initially acts as a tumor suppressor, inhibiting cell growth in most cell types via the Smad pathway, once a tumor has been established most of its cells become resistant to TGF-β and TGF-β turns pro-oncogenic (Elliott and Blobe, 2005) (Figure 1). Escaping from TGF-β growth inhibition is a hallmark of many cancer cells (Massagué et al., 2000).


Figure 1. The dual role of TGF-β in tumor pathogenesis.

TGF-β arrests cell cycle progression at early G1 by controlling a number of important cell cycle regulators (Hanahan and Weinberg, 2000). Regulation of cyclin-dependent kinases (CDKs) is essential for TGF-β-mediated cell growth inhibition. This regulation can occur either through direct downregulation of CDK levels (Zhang et al., 2001) or through upregulation of CDK inhibitors (Alexandrow and Moses, 1995).

Alterations of the TGF-β pathway can increase cancer risk. A common example is TGFBR1*6A, a variant of the TGFBR1 gene with a 9-bp in-frame deletion. This variant is present in approximately 14% of the general population and results in decreased TGF-β-mediated growth inhibition. Population studies have shown that this allele is associated with an increase in breast cancer risk of 31% for heterozygotes and 169% for homozygotes (Zhang et al., 2005). Nonetheless, the most common cause of altered TGF-β signalling is mutational inactivation of TGFBR2, present in about 20-30% of all colon cancers (Biswas et al., 2004).

TGF-β signal transduction and the Smad proteins

The Smad proteins directly mediate the biological effects of TGF-β. The Smad proteins are homologs of both the Drosophila mothers against decapentaplegic (MAD) protein and the Caenorhabditis elegans SMA protein; their name is a combination of the two.

TGF-β binds to and activates type I and type II serine/threonine kinase receptors present on the surface of the cell. The receptor-regulated Smads (R-Smads), namely Smad1, Smad2, Smad3, Smad5 and Smad8, directly mediate TGF-β signalling upon receptor activation. The Smad anchor for receptor activation (SARA) or endofin mediates Smad activation by delivering the R-Smads to the receptor, which results in phosphorylation of the R-Smads (Tsukazaki et al., 1998; Shi et al., 2007). Once R-Smads are activated they form heterodimeric complexes together with Smad4. These complexes are translocated to the nucleus, where they regulate the expression of target genes through interaction with other transcription factors, coactivators and corepressors (Massagué et al., 2005). Such genes mediate the biological effects of TGF-β. Some of the activated target genes stimulate tumorigenesis, while others suppress it. Although Smad4 is not required for translocation into the nucleus, it seems to be needed for the Smad complex to act as a transcription factor (Liu et al., 1997).

Besides the TGF-β superfamily receptors, there are other kinases, such as cyclin-dependent kinases (CDKs) and mitogen-activated protein kinases (MAPKs), that can phosphorylate Smad proteins and thus regulate their capacity to control transcription of their target genes (Matsuura et al., 2004; Kamaraju and Roberts, 2005) (Figure 2).

Figure 2. The TGF-β/Smad pathway. TGF-β binds to and activates type I and type II serine/threonine kinase receptors present on the surface of the cell. The receptor-regulated Smads (R-Smads) directly mediate TGF-β signalling upon receptor activation. SARA or endofin mediates Smad activation by delivering the R-Smads to the receptor, which results in their phosphorylation. Once R-Smads are activated they form heterodimeric complexes together with Smad4 and are translocated to the nucleus, where they control target genes. Smad7/Smurf1/2 represents the negative loop of the cycle, ending signaling.

Modified with permission from ten Dijke and Hill, 2004.


Smad6 and Smad7, the I-Smads, constitute a subclass of inhibitory Smads that act in direct opposition to R-Smad signalling, forming a negative feedback loop. Originally this subclass was shown to compete with R-Smads for binding to the activated type I receptor (Moustakas et al., 2001). Later they were found to cause ubiquitination and degradation of the activated type I receptor by recruiting the E3 ubiquitin ligases Smad ubiquitination regulatory factor 1 (Smurf1) and Smurf2, thus ending signalling (Shi and Massagué, 2003). Shortly after this discovery it was demonstrated that I-Smads associate with phosphatases, dephosphorylating and therefore inactivating type I receptors (Shi et al., 2004). A possible role for I-Smads in transcriptional regulation has also been postulated, as Smad6 has been shown to repress BMP-induced transcription by recruiting the corepressor CtBP (C-terminal binding protein) and Smad7 disrupts Smad2/Smad3 complexes in the nucleus (Lin et al., 2003).

Although the TGF-β superfamily has numerous members that produce a vast diversity of cellular responses, only two different Smad pathways are known, raising many questions about how signaling specificity and diversity are produced (Attisano and Wrana, 2002; Miyazawa et al., 2002).

Studying transcription factors binding: Chromatin ImmunoPrecipitation (ChIP) assays

Transcription is controlled by the association of transcription factors (TFs) with their target DNA sequences in gene regulatory regions and by additional recruitment of activators of the transcription machinery; hence it is of great importance to be able to study protein-DNA associations in vivo. These associations are fine-tuned by epigenetic modifications, including methylation of CpG dinucleotides (Antequera, 2003), post-translational modifications of histones (Strahl and Allis, 2000; Jenuwein and Allis, 2001) and incorporation of histone variants (Mito et al., 2007). Such modifications are used by the transcription factors to modulate transcription and constitute the epigenetic code (Cosgrove and Wolberger, 2005).

Chromatin Immunoprecipitation (ChIP) assays are cutting-edge techniques for studying protein-DNA interactions in vivo on a large scale. The ChIP technique involves reversible crosslinking of proteins to DNA, a procedure in which the interacting protein and DNA are covalently linked using formaldehyde. The purpose of the crosslinking is to ensure that the DNA-protein link is maintained during the ChIP procedure. The chromatin is fragmented into smaller pieces, usually around 200 base pairs in length, using either enzymatic digestion or sonication of the nuclei. The sheared chromatin is then immunoprecipitated with an antibody recognizing the protein of interest. In the last steps the crosslink is reversed, proteins are digested and the enriched ChIP-DNA is purified (Figure 3). For a recent review of the current state and applications of ChIP, see Collas and Dahl, 2008.


Figure 3. ChIP assay experimental outline. Modified with permission from Collas and Dahl, 2008

For several years a strong limitation of the ChIP technology was that analysis of the ChIP-selected DNA material was restricted to a set of predetermined target sequences, using PCR with chosen primers. This method introduces a strong bias towards the sequence of interest. Array technology extended the power of ChIP, enabling the discovery of novel target sites for TFs and the construction of maps of post-translationally modified histones across the genome. This approach is known as ChIP-on-chip or ChIP-chip, and was first successfully applied to yeast in three papers published in 2000 and 2001 (Ren et al., 2000; Iyer et al., 2001; Lieb et al., 2001). Recent advances in microarray technology have made it possible to study TFs genome-wide in human cells (Rada-Iglesias et al., 2008). Microarray hybridization overcomes the limitations of regular ChIP-PCR analysis and has permitted analysis on a genome-wide scope. Nonetheless, the advent of next-generation sequencing technologies has led ChIP assays to the next frontier.

ChIP-seq: next generation ChIP assays

The so-called next-generation sequencing machines are capable of producing tens to hundreds of millions of short sequence reads during a single instrument run (Shendure and Ji, 2008). This unprecedented sequencing capacity is being applied in many fields of biology, enabling striking scientific advances at dizzying speed.

ChIP-sequencing, also known as ChIP-seq, uses this novel technology to sequence ChIP-DNA fragments in a massively parallel manner. Some of the advantages of ChIP-seq over ChIP-chip assays are lower cost, lower requirements for input DNA or amplification, no limitation by microarray content and more accurate mapping (Barski et al., 2007; Johnson et al., 2007; Mikkelsen et al., 2007; Robertson et al., 2007). ChIP-seq has recently been used to study epigenetic changes in the DNA and target sites for TFs and other related chromosome-associated proteins across the entire genome, making it possible to build a high-resolution genome-wide map for gene expression and genome function (Barski et al., 2007; Johnson et al., 2007; Robertson et al., 2007; Wederell et al., 2008).

The Illumina sequencing technology (Figure 5) relies on a proprietary reversible-terminator-based sequencing chemistry. The first step prior to sequencing is the library preparation. Adaptor sequences are ligated to the DNA fragments. The ligated fragments are then amplified and immobilized on a flow cell surface, where they are directly amplified (solid-phase amplification) to create up to 1000 clones of each single molecule in very close proximity. The clusters of clones are then sequenced using fluorescently labeled modified nucleotides (sequencing-by-synthesis). One important property of those nucleotides is reversible termination, which allows all 4 nucleotides (A, C, T, G) to be present simultaneously during sequencing and results in higher accuracy than methods where only one nucleotide is present at a time. In each sequencing cycle, a laser excites the fluorescently labeled nucleotides and the image is captured, determining the identity of the base for each cluster. The cycle is repeated to obtain the sequence of bases in a given fragment. In the last steps the Illumina Pipeline software maps the sequence reads to a reference genome in order to obtain the genomic coordinates of every ChIP-DNA fragment (aligned reads). The resulting file contains the sequence of every DNA fragment and its location in the genome, and it can be formatted and uploaded to the University of California Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/) to locate visually the position of every DNA fragment in the genome and to compare the different samples. Regions of the genome where several aligned ChIP-DNA reads overlap form peaks. Each step in a peak represents the position of an aligned ChIP-DNA read in the human reference genome.
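How overlapping aligned reads build up into peaks can be illustrated with a minimal sketch. It is not the Illumina Pipeline or UCSC implementation; the function names, the half-open `(start, end)` read representation and the toy coordinates are assumptions made only for this example.

```python
from collections import defaultdict


def coverage_profile(reads):
    """Return {position: depth} for reads given as half-open (start, end) intervals."""
    depth = defaultdict(int)
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1  # every read adds one "step" at each base it covers
    return dict(depth)


def call_peaks(reads, min_height):
    """Group contiguous covered positions into regions and keep those whose
    maximum depth (number of overlapping reads) is at least min_height.

    Returns a list of (region_start, region_end, height) tuples.
    """
    depth = coverage_profile(reads)
    peaks = []
    region_start = prev = None
    height = 0
    for pos in sorted(depth):
        if prev is not None and pos != prev + 1:
            # A gap in coverage closes the current region.
            if height >= min_height:
                peaks.append((region_start, prev + 1, height))
            region_start, height = None, 0
        if region_start is None:
            region_start = pos
        height = max(height, depth[pos])
        prev = pos
    if prev is not None and height >= min_height:
        peaks.append((region_start, prev + 1, height))
    return peaks


# Toy example: three overlapping reads and one isolated read (hypothetical coordinates).
toy_reads = [(100, 136), (110, 146), (120, 156), (500, 536)]
toy_peaks = call_peaks(toy_reads, min_height=2)
```

With these toy reads, only the region covered by the three overlapping reads passes the height threshold; the isolated read is discarded.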


Figure 5. Scheme of the Illumina sequencing technology. Modified with permission from Illumina Inc., www.illumina.com (2008)

Aim

The aim of this project was to identify genes bound and regulated by the Smad2 and Smad4 transcription factors, which directly mediate TGF-β signaling, in HepG2 cells. For that purpose I used chromatin immunoprecipitation and high-throughput parallel sequencing (ChIP-seq), a method employed to determine the in vivo genomic localization of transcription factors and other chromatin-related proteins.


Results

In vivo mapping of binding sites for Smad4 transcription factor

Chromatin immunoprecipitation coupled to high-throughput sequencing technology (ChIP-seq) can be used to profile whole-genome binding sites for a chosen transcription factor (Barski et al., 2007; Johnson et al., 2007; Robertson et al., 2007; Wederell et al., 2008). In this study I used chromatin immunoprecipitation to isolate DNA bound by Smad4 in TGF-β-treated and control HepG2 cells. All ChIP samples were checked for the presence of known binding sites using semiquantitative or quantitative PCR. This step is required to evaluate the efficiency of the ChIP procedure before further analysis using the Illumina genome analyzer. Samples in which the PCR showed low or no enrichment in known binding sites were discarded. The immunoprecipitated Smad-bound DNA samples were then sequenced using the Illumina 1G genome analyzer and mapped to the human genome using the Illumina Analysis Pipeline, thus identifying target genes of the TGF-β pathway. The output text files were converted to browser extensible data (BED) format in order to visualize the data in the University of California Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/). Table 1 summarizes the statistical information in the output files obtained:

Table 1. Sequencing statistics obtained for each sequenced ChIP-DNA sample.

Name          Antibody     TGF-β treated   # Aligned reads   Peaks^a
Smad2         anti-Smad2   Yes             3176304           586
Smad4         anti-Smad4   Yes             2851319           667
Smad4C_Last   anti-Smad4   No              4846500           17330
Smad4T_Last   anti-Smad4   Yes             4707083           3117

a Since the Smad2 and Smad4 samples had fewer sequences, peaks above 6 overlapping reads are counted for them, while for Smad4C_Last and Smad4T_Last only peaks above 8 overlapping reads are counted.
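The conversion of aligned reads to BED format can be sketched as follows. The column layout of the Illumina output files is not described here, so the input records below (chromosome, 1-based start, read length, strand) are a hypothetical representation, and the function names are illustrative only. The sketch relies on the BED convention of 0-based, half-open coordinates and on a simple UCSC `track` header line.

```python
def to_bed_line(chrom, start_1based, length, strand, name="read", score=0):
    """Format one aligned read as a 6-column, tab-separated BED line.

    BED coordinates are 0-based and half-open, so a read starting at
    1-based position p with length n becomes chromStart = p - 1 and
    chromEnd = p - 1 + n.
    """
    chrom_start = start_1based - 1
    chrom_end = chrom_start + length
    fields = [chrom, str(chrom_start), str(chrom_end), name, str(score), strand]
    return "\t".join(fields)


def to_bed_track(records, track_name):
    """Return the text of a BED custom track for a list of aligned reads."""
    lines = ['track name="%s"' % track_name]
    for chrom, start_1based, length, strand in records:
        lines.append(to_bed_line(chrom, start_1based, length, strand))
    return "\n".join(lines) + "\n"


# Hypothetical aligned reads: (chromosome, 1-based start, read length, strand).
example_records = [("chr7", 100, 36, "+"), ("chr19", 2500, 36, "-")]
example_track = to_bed_track(example_records, "Smad4T_Last")
```

The resulting text can be saved to a file and loaded as a custom track; the off-by-one shift in `chromStart` is the detail most easily missed in such a conversion.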

Verification of known binding sites

In order to validate my data in silico, I looked for peaks located in well-known and characterized promoters of Smad target genes. I extracted 3 known target genes from the literature and analyzed them in the UCSC Genome Browser (Mar. 2006 Assembly): plasminogen activator inhibitor-1 or PAI-1 (SERPINE1), the JUNB proto-oncogene (JUNB) and SMAD family member 7 (Smad7).

For the PAI-1 gene, a Smad binding region has been located at −586 to −551 upstream of the gene. This region contains 3 Smad Binding Elements (SBEs) and an E-box, and the 3 bp spacer between the E-box and an SBE has been shown to be essential to mediate TGF-β-induced transcription (Hua et al., 1999).

An E-box is a short DNA sequence, typically located upstream of a gene promoter, that contains the palindromic canonical sequence CACGTG. Transcription factors containing the basic helix-loop-helix protein structural motif typically bind to E-boxes or related variant sequences and enhance transcription of the downstream gene. TGF-β activates JUNB through binding of a nuclear factor to a promoter-distal element, a 22 bp sequence located between nucleotides −2813 and −2792 relative to the JunB gene, where an SBE for this gene has been characterized (Jonk et al., 1998). The Smad7 gene has been shown to be regulated by the Smad3-Smad4 complex in the presence of TGF-β treatment. The Smad7 promoter is located at −471 to −275, and it contains a perfect 8 bp SBE (GTCTAGAC) (Nagarajan et al., 1999; Stopa et al., 2000).
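The palindromic nature of the E-box (CACGTG) and of the 8 bp SBE (GTCTAGAC) can be checked with a short sketch: a DNA sequence is palindromic when it equals its own reverse complement. The helper names and the test sequence below are made up for illustration and are not part of the analysis performed in this study.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A DNA motif is palindromic if it reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)


def find_motif(sequence, motif):
    """Return the 0-based start of every exact (possibly overlapping) match."""
    sequence, motif = sequence.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == motif]


E_BOX = "CACGTG"
SBE_8BP = "GTCTAGAC"

# Hypothetical sequence containing one E-box and one 8 bp SBE.
toy_sequence = "TTCACGTGAAGTCTAGACTT"
e_box_hits = find_motif(toy_sequence, E_BOX)
sbe_hits = find_motif(toy_sequence, SBE_8BP)
```

Because both motifs are palindromic, scanning one strand is enough to find their occurrences on either strand.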

The first ChIP carried out included the Smad2 and Smad4 transcription factors in TGF-β-treated HepG2 cells. The samples were sequenced and the sequencing data were analyzed in the UCSC genome browser. The number of peaks in those samples was similar, and in many cases the peaks overlapped at the same position. However, I did not find peaks at known binding sites.

The second set of samples consisted of Smad4 control and TGF-β-treated samples (Smad4C_Last and Smad4T_Last). Those samples confirmed known binding sites for PAI-1 (Figure 6A and 6B) and JUNB (Figure 7).

However, my data did not support Smad4 binding at −471 to −275 for the Smad7 gene, although there was a peak at around +750 bp (Figure 8). Further studies are necessary to determine whether the −471 to −275 region of the Smad7 gene is in fact negative for Smad4 binding in vivo.

Figure 6. (A) The PAI-1 promoter in the UCSC genome browser showing the genome localization of the sequences precipitated with anti-Smad4 antibody. The upper panel (Smad4T_Last) represents the sequence tags (black) from the TGF-β-treated sample, while the lower panel (Smad4C_Last) represents the sequence tags (black) for the control sample. The Y axis represents the peak height, whereas the X axis represents the localization in the genome. Peaks are scaled according to the tallest peak of each panel, so that different scaling is used in each panel. (B) A closer look at the PAI-1 binding site. SBEs are shown in red, the E-box sequence in black and the 3 bp spacer between the E-box and an SBE shadowed in green.


Figure 7. The JUNB distal element in the UCSC genome browser showing the genome localization of the sequences precipitated with anti-Smad4 antibody. The upper panel (Smad4T_Last) represents the sequence tags (black) from the TGF-β-treated sample, while the lower panel (Smad4C_Last) represents the sequence tags (black) for the control sample. The Y axis represents the peak height, whereas the X axis represents the localization in the genome. Peaks are scaled according to the tallest peak of each panel, so that different scaling is used in each panel.

Figure 8. The Smad7 promoter in the UCSC genome browser showing the genome localization of the sequences precipitated with anti-Smad4 antibody. The upper panel (Smad4T_Last) represents the sequence tags (black) from the TGF-β-treated sample, while the lower panel (Smad4C_Last) represents the sequence tags (black) for the control sample. The Y axis represents the peak height, whereas the X axis represents the localization in the genome. Peaks are scaled according to the tallest peak of each panel, so that different scaling is used in each panel.

The overall success at identifying known target genes suggests that my data have good coverage of known Smad binding sites across the genome.

Analysis of Smad4 target genes

In an attempt to extract relevant biological information from the enormous amount of data, the treated sample was filtered according to the following criteria (see methods section): peaks below 8 hits, and peaks located at the same position as peaks above 5 hits in the control sample, were removed. Regions were extended ±250 bp from the center and only those that were within 10 kb of a transcription start site (TSS) were kept. In this way, both weak peaks and peaks located at the same position in the control and treated samples were filtered away, leaving only strong peaks in the proximity of a TSS, which could potentially be gene promoters. Out of the 590 regions analyzed, Table 2 shows the top 20 most enriched regions, with a description of the closest gene and the distance from the peak to the TSS.
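The filtering criteria described above can be expressed as a minimal sketch. This is not the script used in this study: the peak representation `(chrom, start, end, height)`, the interpretation of "same position" as any overlap, and the strand-agnostic distance to the nearest TSS are all assumptions made for illustration.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_treated_peaks(treated, control, tss_by_chrom,
                         min_height=8, control_height=5,
                         half_width=250, max_tss_distance=10_000):
    """Apply the three filters to peaks given as (chrom, start, end, height).

    1. Drop treated peaks with fewer than min_height hits.
    2. Drop treated peaks overlapping a control peak above control_height hits.
    3. Re-centre each remaining peak to +/- half_width bp and keep it only if
       its centre lies within max_tss_distance of a TSS.
    Returns (chrom, region_start, region_end, height, distance_to_tss), where
    the distance is centre minus TSS and ignores strand (an assumption).
    """
    strong_control = [c for c in control if c[3] > control_height]
    kept = []
    for chrom, start, end, height in treated:
        if height < min_height:
            continue
        if any(c[0] == chrom and overlaps(start, end, c[1], c[2])
               for c in strong_control):
            continue
        center = (start + end) // 2
        distances = [center - tss for tss in tss_by_chrom.get(chrom, [])]
        if not distances:
            continue
        nearest = min(distances, key=abs)
        if abs(nearest) <= max_tss_distance:
            kept.append((chrom, center - half_width, center + half_width,
                         height, nearest))
    return kept


# Hypothetical peaks and TSS positions, for illustration only.
treated_peaks = [("chr1", 1000, 1100, 9), ("chr1", 5000, 5100, 7),
                 ("chr1", 20000, 20100, 12), ("chr2", 300, 400, 10),
                 ("chr1", 90000, 90100, 15)]
control_peaks = [("chr1", 20050, 20150, 6), ("chr2", 300, 400, 5)]
tss_positions = {"chr1": [1500, 21000], "chr2": [900]}
filtered = filter_treated_peaks(treated_peaks, control_peaks, tss_positions)
```

In this toy data set one peak is removed for being too weak, one for coinciding with a strong control peak and one for lying too far from any TSS, leaving two candidate regions.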

Table 2. Top 20 enriched regions within 10 kb of a TSS.

Peak height  Gene id    Description  Distance from TSS (bp)
17           NM_002985  CHEMOKINE (C-C MOTIF) LIGAND 5  9322
15           NM_032514  MICROTUBULE-ASSOCIATED PROTEIN 1 LIGHT CHAIN 3 ALPHA  7289
14           BX538238   HYPOTHETICAL PROTEIN DKFZP686B0790  30
14           AF346307   CHROMOSOME 19 F379 RETINA SPECIFIC PROTEIN  -4555
14           NM_005484  POLY (ADP-RIBOSE) POLYMERASE FAMILY, MEMBER 2  -356
14           NM_080833  CHROMOSOME 20 OPEN READING FRAME 151  -6130
14           BC050331   HYPOTHETICAL PROTEIN DKFZP434K191  -50
14           AK123337   HYPOTHETICAL PROTEIN MGC12760  -95
14           AK125239   SIMILAR TO RIKEN CDNA 4632412N22 GENE  -8043
13           X87871     HEPATOCYTE NUCLEAR FACTOR 4, ALPHA  3317
13           AK094414   ACYL-COA SYNTHETASE SHORT-CHAIN FAMILY MEMBER 1  8863
13           NM_148961  OTOSPIRALIN  7343
13           NM_152837  SORTING NEXIN 16  434
13           BX161415   TETRATRICOPEPTIDE REPEAT DOMAIN 6  5781
13           NM_198441  FLJ40296 PROTEIN  -419
13           NM_198951  TRANSGLUTAMINASE 2 (C POLYPEPTIDE, PROTEIN-GLUTAMINE-GAMMA-GLUTAMYLTRA...  759
12           NM_203448  HYPOTHETICAL PROTEIN LOC286286  -653
12           NM_000088  COLLAGEN, TYPE I, ALPHA 1  -1895
12           AK131425   CDNA FLJ16545 FIS, CLONE OCBBF3004972  4979
12           NM_004455  EXOSTOSES (MULTIPLE)-LIKE 1  532
12           NM_001054  SULFOTRANSFERASE FAMILY, CYTOSOLIC, 1A, PHENOL-PREFERRING, MEMBER 2  2228

It is important to mention that among the most enriched regions is the Forkhead Box G1B (FoxG1) gene, with peak height 9. FoxG1 has been shown to regulate the expression of p21 (Seoane et al., 2004), a gene whose regulation determines TGF-β-mediated growth inhibition.

A histogram of the distance from all 590 peaks to the TSS reveals that most were immediately downstream of the TSS. More than 50 peaks were located in the 0 to +500 bp region of a TSS, suggesting that the peaks were preferentially situated close to TSSs and not randomly distributed (Figure 9).

Figure 9. Histogram of distance of peaks to TSSs.
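The binning behind such a histogram can be sketched as below. The full set of 590 distances is not reproduced in this text, so the example uses only the distances listed in Table 2; the bin width of 500 bp and the function name are assumptions chosen for illustration.

```python
def distance_histogram(distances, bin_width=500, limit=10_000):
    """Count peak-to-TSS distances in fixed-width bins.

    Each key is the lower edge of a bin, so key 0 covers 0 to +499 bp and
    key -500 covers -500 to -1 bp. Distances outside +/- limit are ignored.
    """
    counts = {}
    for d in distances:
        if -limit <= d < limit:
            # Floor division keeps negative distances in the correct bin,
            # e.g. -356 // 500 == -1, giving the bin starting at -500.
            bin_start = (d // bin_width) * bin_width
            counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Distances from TSS (bp) as listed in Table 2.
table2_distances = [9322, 7289, 30, -4555, -356, -6130, -50, -95, -8043,
                    3317, 8863, 7343, 434, 5781, -419, 759, -653, -1895,
                    4979, 532, 2228]
table2_histogram = distance_histogram(table2_distances)
```

Even in this small subset, the bins immediately flanking the TSS collect more regions than any bin further away.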

Smad4 enriched regions contained an overrepresentation of Smad binding sites

To determine whether the regions occupied by the filtered peaks contained Smad binding motif sequences, the data were analyzed using the RegionMiner software (Genomatix, www.genomatix.de). The software engine searched for all known TF binding motifs potentially contained in the data submitted, and the result was sorted by overrepresentation of those motifs compared to a set of background promoters (Table 3). The overrepresentation reflects the fold factor of match numbers in the regions compared to an equally sized sample of the background (i.e. found versus expected). The Smad family of transcription factors was reported among the top families, with an overrepresentation of 1.95, suggesting that the Smad binding motifs in my data occur almost twice as often as expected by chance.
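The fold factor and Z-score reported for each motif family in Table 3 can be reproduced from the number of matches, the expected number and the standard deviation. The sketch below is an illustration, not the Genomatix implementation; in particular, the 0.5 continuity correction is an assumption inferred from the tabulated values (it reproduces the listed Z-scores, whereas the uncorrected formula gives slightly larger numbers).

```python
import math


def over_representation(found, expected):
    """Fold factor: matches found in the regions versus matches expected."""
    return found / expected


def z_score(found, expected, std_dev, continuity=0.5):
    """Distance of the observed count from the expected count, in units of
    the standard deviation. The continuity correction is an assumption."""
    return (found - expected - continuity) / std_dev


def two_sided_p_value(z):
    """Two-sided p-value of a Z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# V$SMAD row of Table 3: 416 matches, 212.82 expected, std. dev. 14.58.
smad_fold = over_representation(416, 212.82)
smad_z = z_score(416, 212.82, 14.58)
```

For the V$SMAD family this gives a fold factor of about 1.95 and a Z-score of about 13.9, in line with Table 3, and `two_sided_p_value(2)` evaluates to roughly 0.046, which matches the significance threshold of about 0.05 quoted for Z-scores beyond ±2.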

Table 3. Overrepresentation of TF motifs contained in the sequenced samples

TF Families   Number of Matches   Expected   Std. Dev.   Overrepresentation    Z-Score^a
V$HAML        535                 201.13     14.18       2.66                  23.51
V$GABF        502                 214.32     14.63       2.34                  19.62
V$GZF1        254                 109.85     10.48       2.31                  13.71
V$HIFF        528                 256.55     16.01       2.06                  16.92
V$SMAD        416                 212.82     14.58       1.95                  13.9
V$BNCF        183                 97.28      9.86        1.88                  8.64
V$RBP2        167                 89.09      9.44        1.87                  8.2
V$MEF3        102                 55.22      7.43        1.85                  6.23
V$AHRR        625                 339.47     18.41       1.84                  15.48
V$MITF        190                 104.09     10.2        1.83                  8.37
V$OAZF        278                 153.3      12.38       1.81                  10.03
V$CHRE        194                 109.15     10.45       1.78                  8.08
V$MTF1        151                 85.63      9.25        1.76                  7.01
V$HESF        692                 408.61     20.2        1.69                  14
V$SREB        252                 154.11     12.41       1.64                  7.85
V$RREB        411                 257.71     16.05       1.59                  9.52
V$SP1F        1703                1075.08    32.73       1.58                  19.17
V$P53F        604                 390.16     19.74       1.55                  10.81
V$EBOX        814                 536.33     23.14       1.52                  11.98

a Z-scores measure the distance from the population mean in units of the population standard deviation. Z-scores below −2 or above 2 are considered statistically significant, corresponding to a p-value of about 0.05.


Discussion

Global mapping of transcription factor binding sites in the postgenomic era

Global mapping of transcription factor binding sites (TFBSs) and histone modifications has become widely available thanks to ultra-high-throughput sequencing technology. This innovative technology is rapidly being adopted to decipher the complex network of TFBSs that regulate the mammalian genome (Barski et al., 2007; Johnson et al., 2007; Robertson et al., 2007; Wederell et al., 2008).

However, not all TFBSs identified in a whole-genome study (using either ChIP-chip or ChIP-seq) are functional TFBSs. Although ChIP-seq allows for a more accurate and unbiased mapping of TFBSs, there is an increased need for tools that are able to differentiate true regulatory elements from those that are the consequence of biological noise (Struhl, 2007) or simply of indirect TF-DNA interactions crosslinked through protein-protein interactions during the ChIP process. Certain factors, such as the high interdependency among transcriptional networks or functional redundancy, make it difficult to recognize true binding sites. Moreover, it is important to note that it is difficult to attribute any transcriptional regulatory function to binding sites located far from a TSS (Carroll et al., 2005). A given TF with a large number of binding sites located in such areas may act on enhancers, silencers or other distal regulatory elements, which leads to a high false discovery rate of target genes.

The data presented here show somewhat low peak height. This could be caused by technical factors related to the efficiency of the antibody recognizing the Smad4 protein or the ChIP protocol itself, or due to biological factors such as the way transcription factors form complexes with Smad proteins.

Nonetheless, this study attempted to overcome those limitations by restricting the analysis to regions proximal to TSSs. Functional studies of the potential candidate genes are necessary in order to validate their involvement in the TGF-β pathway.

Unraveling the secrets of the complex network of TFs in the TGF-β pathway

In this study I used HepG2 cells to identify targets of Smad4 in the TGF-β pathway. The number of genes detected may be underestimated due to the filtering process. Along with known target genes, my data suggest interesting genes that could potentially be good candidates for follow-up studies.

Interestingly, the FoxG1 gene appears as a candidate gene with peak height 9. FoxG1 binds to FoxO-Smad complexes and blocks p21 expression (Seoane et al., 2004). p21 is thought to be one of the most important genes whose regulation determines TGF-β-mediated growth inhibition. It has been shown previously that p21 is constitutively expressed in HepG2 cells through Smad4 and Sp1 and is not upregulated further by treating the cells with TGF-β (Moustakas and Kardassis, 1998). These findings are supported by the fact that in my data the control sample showed a peak for Smad4 at about +175 bp of p21 and there was no signal in the treated sample, suggesting a basal state of p21 expression.

Nonetheless, there was no binding for the FoxG1 gene in the control sample, but there was binding in the treated sample. Taken together, these results indicate that, in contrast to other cancer cell lines, p21 is not further upregulated during TGF-β treatment in HepG2 cells.

Another interesting candidate gene is the hepatocyte nuclear factor 4 alpha (HNF4α). HNF4α belongs to the nuclear hormone receptor family of transcription factors. This gene is recognized as a key regulator of hepatocyte differentiation and function (Watt et al., 2003). Alterations of HNF4α activity have very serious consequences, such as maturity-onset diabetes of the young 1 (Fajans et al., 2001). It has been shown recently that TGF-β represses HNF4α expression via the ALK5/Smad3/HMGA2 signaling pathway in mouse epithelial cells (Ishikawa et al., 2008). Those findings are supported by my data, since there was a peak covering the TSS of the HNF4α gene, suggesting that the same mechanism may act in human cells.

Detailed knowledge of the molecular basis of cancer provides many unexploited targets for therapeutic intervention. This study expands that knowledge a little, allowing for future projects derived from the candidate genes presented here. Emerging gene-based therapies targeting cancer cells, such as gene regulation by enhancing or suppressing expression, or gene insertion (tumor suppressors, apoptosis-inducing genes, etc.), are being tested in a growing number of clinical trials worldwide. Those innovative approaches, combined with conventional therapies such as chemotherapy, radiotherapy and surgery, can lead to more effective and less invasive ways of treating cancer. I sincerely hope that these findings will one day contribute to winning the race against cancer.


Materials and methods

Cell cultures

HepG2 cells were grown at 37 °C and 5% CO2 in 175 cm² cell culture plates with 50 ml RPMI 1640 medium (Sigma-Aldrich, cat# R0883) supplemented with 10% heat-inactivated fetal bovine serum (FBS) (Sigma-Aldrich, cat# F7524). Before each ChIP assay, cells were incubated overnight with starvation medium (RPMI 1640, 1% FBS) at 37 °C and 5% CO2.

Antibodies

The antibodies used for ChIP assays are presented in Table 4.

Table 4. Antibodies used in the ChIP assay.

Antibody                    Type        Company                                                               Catalog number
Anti-Smad2                  Monoclonal  Santa Cruz Biotech                                                    SC6200
Anti-Smad3                  Monoclonal  Zymed                                                                 511500
Anti-Smad4                  Monoclonal  Santa Cruz Biotech                                                    SC7154
Anti-Phosphorylated Smad2   Polyclonal  Kindly provided by Aris Moustakas, Ludwig Institute, Uppsala, Sweden  N/A
Anti-Immunoglobulin G       Monoclonal  Upstate                                                               12370

ChIP and DNA template preparation for sequence analysis

Around 10⁸ cells in starvation medium were used per ChIP experiment. TGF-β1 (PeproTech, cat# 100-21R) was added to a final concentration of 2.5 ng/ml for 1 hour. The nuclear proteins were cross-linked to the DNA using 0.37% formaldehyde for 10 minutes on a rocking bed at room temperature. The cross-linking was stopped by adding glycine to a final concentration of 0.125 M and incubating for another 5 minutes on a rocking bed at room temperature. The cells were then resuspended in cell lysis buffer (CLB) (0.01 M Tris-HCl pH 8, 0.01 M NaCl, 0.20% Nonidet P-40 [NP-40]) with protease inhibitors (PIs) added (10 µl/ml sodium butyrate, 1 µl/ml leupeptin, 5 µl/ml 100x phenylmethylsulfonyl fluoride). The cell-CLB+PIs mix was incubated on ice for 10 minutes and centrifuged at 600 g at 4 °C to collect the nuclei. The supernatant was discarded and the nuclei pellet was resuspended in RadioImmunoPrecipitation Assay (RIPA) buffer (1x Phosphate Buffered Saline [PBS] [Sigma-Aldrich, cat# 2318345], 1% NP-40, 0.5% sodium deoxycholate, 0.1% sodium dodecyl sulfate [SDS], 0.004% sodium azide) containing PIs. The nuclei were incubated for 10 minutes on ice and transferred to a 15 ml Falcon tube for sonication.

Chromatin was fragmented using a Bioruptor (Diagenode, cat# UCD200) to a length of approximately 200 base pairs. The sheared chromatin was pre-cleared by adding 75 µl of protein G-agarose (Roche Diagnostics, cat# 11 243 233 001) and incubating for 1 hour at 4 °C on a rotating wheel. The agarose beads were centrifuged at 16000 g and the supernatant, which contains the pre-cleared chromatin, was used to set up the ImmunoPrecipitation (IP) reactions, aiming to use DNA from around 5×10⁶ cells per reaction. A small fraction of pre-cleared chromatin was kept as input DNA. Every IP reaction was incubated at 4 °C overnight with 10 µg of antibody recognizing the protein of interest. The following day, 100 µl of protein G-agarose was added to every IP and incubated for 2 hours at 4 °C with rotation to allow the antibody-protein-DNA complex to bind the agarose beads.

The beads were then collected by centrifugation at 16000 g for 2 minutes at 4 °C and washed four times with RIPA buffer, once with ImmunoPrecipitation Washing Buffer 2 (IPWB2) (0.01 M Tris-HCl [pH 8], 0.25 M LiCl, 0.001 M EDTA, 1% NP-40, 1% Na deoxycholate) and once with TE buffer (0.01 M Tris-HCl [pH 8], 0.001 M EDTA). The immunocomplexes were eluted from the beads by adding 450 µl of freshly prepared IP-Elution Buffer (IPEB) (0.1 M NaHCO₃, 1% SDS) to every IP reaction and incubating for 30 minutes with gentle agitation at room temperature. The enriched DNA was then transferred to a new tube and incubated with 15 µg of RNase A (Amersham Biosciences, cat# E70194Y) and NaCl at a final concentration of 0.3 M at 65 °C for 6 hours to reverse the cross-linking. Proteins were degraded by adding 90 µg of proteinase K (Amersham Biosciences, cat# E76230Y) and incubating at 45 °C overnight. DNA was extracted by standard phenol/chloroform/isoamyl alcohol extraction, ethanol precipitated (Sambrook et al., 1989) and resuspended in water.
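The final concentrations given for the cross-linking and quenching steps can be turned into pipetting volumes with a simple dilution calculation. The sketch below is only an illustration: the stock concentrations (37% formaldehyde, 2.5 M glycine) and the 50 ml starting volume are assumptions that are not stated in the protocol, and the helper name is hypothetical.

```python
def stock_volume_ml(final_conc, stock_conc, start_volume_ml):
    """Volume of stock to add so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (start_volume_ml + v) for v.
    Both concentrations must be expressed in the same unit.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock")
    return final_conc * start_volume_ml / (stock_conc - final_conc)


MEDIUM_ML = 50.0  # assumed: same volume as the growth medium per plate

# Formaldehyde: 0.37% final, from an assumed 37% stock solution.
formaldehyde_ml = stock_volume_ml(0.37, 37.0, MEDIUM_ML)

# Glycine: 0.125 M final, from an assumed 2.5 M stock solution,
# added on top of the medium that already contains the formaldehyde.
glycine_ml = stock_volume_ml(0.125, 2.5, MEDIUM_ML + formaldehyde_ml)

print(f"formaldehyde: {formaldehyde_ml:.2f} ml, glycine: {glycine_ml:.2f} ml")
```

The formula accounts for the volume of the added stock itself, which matters little for the 1:100 formaldehyde dilution but more for the glycine step.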

Polymerase chain reaction confirmation

Polymerase chain reaction (PCR) was performed on each sample to confirm the presence of known binding sites in the immunoprecipitated material, as a quality control before sending the samples for sequencing. The enriched DNA was amplified for 35 to 40 cycles of PCR using primers for promoters of known early target genes. The PCR mix was composed of 2 µl 10x PCR buffer (Invitrogen, cat# Y02028), 0.6 µl MgCl₂ (Invitrogen, cat# Y02016), 0.2 µl dNTP (10 mM) (Promega, cat# C1141), 0.2 µl Taq Platinum DNA polymerase (5U/ml) (Invitrogen, cat# Y02016), 1 µl of 10 µM reverse+forward primer mix (Table 5), 0.5 µl DNA sample and 15.5 µl water, to a total volume of 20 µl per sample. The program used was as follows: 95 °C for 3 minutes; 35 or 40 cycles of 95 °C for 30 seconds, 59 °C for 30 seconds and 72 °C for 45 seconds; and 72 °C for 7 minutes.
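As a quick consistency check, the component volumes listed for the PCR mix add up to 20 µl per reaction. The following sketch sums them and scales them into a master mix for several samples amplified with one primer pair. The function name and the 10% pipetting excess are assumptions made for illustration and are not part of the protocol.

```python
# Per-reaction volumes in µl, as listed in the text.
REACTION_UL = {
    "10x PCR buffer": 2.0,
    "MgCl2": 0.6,
    "dNTP (10 mM)": 0.2,
    "Taq Platinum DNA polymerase": 0.2,
    "primer mix (10 µM)": 1.0,
    "DNA sample": 0.5,
    "water": 15.5,
}


def master_mix(n_samples, excess=0.1):
    """Scale every component except the DNA template for n_samples reactions.

    excess is a hypothetical pipetting allowance (10% by default).
    """
    factor = n_samples * (1 + excess)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if name != "DNA sample"
    }


total_ul = round(sum(REACTION_UL.values()), 2)
print(f"total per reaction: {total_ul} µl")

for name, volume in master_mix(8).items():
    print(f"{name}: {volume} µl")
```

The DNA sample is left out of the master mix because it differs between reactions; 19.5 µl of mix would then be dispensed per tube before adding 0.5 µl of template.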


Table 5. Oligonucleotides used in the PCR reactions.

Gene name  Forward primer        Reverse primer
PAI-1      CAGAGGGCAGAAAGGTCAAG  CTCTGGGAGTCCGTCTGAAC
JunB       GTTAGCTTCCCAAGGTGCTG  GGTCCCTGTGACCCCTAAAT
Smad7      TCGGACAGCTCAATTCGGAC  GGTAACTGCTGCGGTTGTAA

Data analysis

A Perl script written by Ola Wallerman was used to filter out peaks in the treated sample that were either also present in the control sample or not located within 10 kb of a TSS. Briefly, the script compares the positions of the peaks to all TSSs downloaded from the UCSC genome browser and reports the distance to the nearest one.
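The filtering logic described above can be sketched in a few lines. This is not the original Perl script, which is not reproduced in this report, but a minimal Python illustration under stated assumptions: peaks are (chromosome, start, end) tuples, a treated peak counts as present in the control if the two intervals overlap, distance is measured from the peak midpoint, and strand is ignored. All function names and the toy coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(position, sorted_tss):
    """Distance in bp from position to the closest TSS in a sorted list."""
    i = bisect_left(sorted_tss, position)
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - tss) for tss in candidates)


def overlaps(peak, others):
    """True if peak (chrom, start, end) overlaps any interval in others."""
    chrom, start, end = peak
    return any(c == chrom and s <= end and start <= e for c, s, e in others)


def filter_peaks(treated, control, tss_by_chrom, max_dist=10_000):
    """Keep treated-only peaks whose midpoint lies within max_dist of a TSS.

    tss_by_chrom maps a chromosome name to a sorted list of TSS positions.
    Each kept peak is returned with its distance to the nearest TSS.
    """
    kept = []
    for peak in treated:
        chrom, start, end = peak
        tss = tss_by_chrom.get(chrom)
        if not tss or overlaps(peak, control):
            continue
        distance = nearest_tss_distance((start + end) // 2, tss)
        if distance <= max_dist:
            kept.append((chrom, start, end, distance))
    return kept


# Toy coordinates for illustration only; they are not taken from the data.
treated = [("chr7", 1000, 1200), ("chr7", 50000, 50200), ("chr19", 300, 500)]
control = [("chr19", 400, 600)]
tss_by_chrom = {"chr7": [1500, 90000], "chr19": [1000]}

print(filter_peaks(treated, control, tss_by_chrom))
```

In this toy example only the first peak is kept: the second lies more than 10 kb from any TSS and the third overlaps a control peak.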


Acknowledgments

I would like to thank Claes Wadelius for giving me the chance to work in his group. Many thanks to Ola Wallerman for teaching me the deep secrets of ChIP and for his support. I am really grateful to Mehdi Motallebipour for his unconditional patience answering my questions, to Mahdu Bysani and Kalicharan Patra for their generosity in lending reagents, to Patricia Respuela for her "ChIP tips and tricks", to Katerina Pardali for useful comments on the manuscript and TGF-beta wisdom, and to Aris Moustakas for providing the homemade Smad2 antibody.

Lastly, this work is dedicated to the amazing friends I made along the way: Ammar, Gucci, Jelena, Jessica, Millaray and Sara. The moments we shared together made my experience at Rudbeck unforgettable.


References

Alexandrow, M. G. & Moses, H. L. (1995). Transforming growth factor beta and cell cycle regulation. Cancer Res 55(7), 1452-1457.

Antequera, F. (2003). Structure, function and evolution of CpG island promoters. Cell Mol Life Sci 60(8), 1647-1658.

Attisano, L. & Wrana, J. L. (2002). Signal transduction by the TGF-beta superfamily. Science 296(5573), 1646-1647.

Barski, A.; Cuddapah, S.; Cui, K.; Roh, T.; Schones, D. E.; Wang, Z.; Wei, G.; Chepelev, I. & Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129(4), 823-837.

Biswas, S.; Chytil, A.; Washington, K.; Romero-Gallo, J.; Gorska, A. E.; Wirth, P. S.; Gautam, S.; Moses, H. L. & Grady, W. M. (2004). Transforming growth factor beta receptor type II inactivation promotes the establishment and progression of colon cancer. Cancer Res 64(14), 4687-4692.

Blobe, G. C.; Schiemann, W. P. & Lodish, H. F. (2000). Role of transforming growth factor beta in human disease. N Engl J Med 342(18), 1350-1358.

Carroll, J. S.; Liu, X. S.; Brodsky, A. S.; Li, W.; Meyer, C. A.; Szary, A. J.; Eeckhoute, J.; Shao, W.; Hestermann, E. V.; Geistlinger, T. R.; Fox, E. A.; Silver, P. A. & Brown, M. (2005). Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell 122(1), 33-43.

Collas, P. & Dahl, J. A. (2008). Chop it, ChIP it, check it: the current status of chromatin immunoprecipitation. Front Biosci 13, 929-943.

Cosgrove, M. S. & Wolberger, C. (2005). How does the histone code work? Biochem Cell Biol 83(4), 468-476.

DeLarco, J. & Todaro, G. J. (1976). Membrane receptors for murine leukemia viruses: characterization using the purified viral envelope glycoprotein, gp71. Cell 8(3), 365-371.

Elliott, R. L. & Blobe, G. C. (2005). Role of transforming growth factor beta in human cancer. J Clin Oncol 23(9), 2078-2093.

Fajans, S. S.; Bell, G. I. & Polonsky, K. S. (2001). Molecular mechanisms and clinical pathophysiology of maturity-onset diabetes of the young. N Engl J Med 345(13), 971-980.

Hanahan, D. & Weinberg, R. A. (2000). The hallmarks of cancer. Cell 100(1), 57-70.

Hua, X.; Miller, Z. A.; Wu, G.; Shi, Y. & Lodish, H. F. (1999). Specificity in transforming growth factor beta-induced transcription of the plasminogen activator inhibitor-1 gene: interactions of promoter DNA, transcription factor muE3, and Smad proteins. Proc Natl Acad Sci U S A 96(23), 13130-13135.

Ishikawa, F.; Nose, K. & Shibanuma, M. (2008). Downregulation of hepatocyte nuclear factor-4alpha and its role in regulation of gene expression by TGF-beta in mammary epithelial cells. Exp Cell Res 314(10), 2131-2140.

Iyer, V. R.; Horak, C. E.; Scafe, C. S.; Botstein, D.; Snyder, M. & Brown, P. O. (2001). Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409(6819), 533-538.

Jennings, M. T. & Pietenpol, J. A. (1998). The role of transforming growth factor beta in glioma progression. J Neurooncol 36(2), 123-140.

Jenuwein, T. & Allis, C. D. (2001). Translating the histone code. Science 293(5532), 1074-1080.

Johnson, D. S.; Mortazavi, A.; Myers, R. M. & Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316(5830), 1497-1502.

Jonk, L. J.; Itoh, S.; Heldin, C. H.; ten Dijke, P. & Kruijer, W. (1998). Identification and functional characterization of a Smad binding element (SBE) in the JunB promoter that acts as a transforming growth factor-beta, activin, and bone morphogenetic protein-inducible enhancer. J Biol Chem 273(33), 21145-21152.

Kamaraju, A. K. & Roberts, A. B. (2005). Role of Rho/ROCK and p38 MAP kinase pathways in transforming growth factor-beta-mediated Smad-dependent growth inhibition of human breast carcinoma cells in vivo. J Biol Chem 280(2), 1024-1036.

Lieb, J. D.; Liu, X.; Botstein, D. & Brown, P. O. (2001). Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nat Genet 28(4), 327-334.

Lin, X.; Liang, Y.; Sun, B.; Liang, M.; Shi, Y.; Brunicardi, F. C.; Shi, Y. & Feng, X. (2003). Smad6 recruits transcription corepressor CtBP to repress bone morphogenetic protein-induced transcription. Mol Cell Biol 23(24), 9081-9093.

Liu, F.; Pouponnot, C. & Massagué, J. (1997). Dual role of the Smad4/DPC4 tumor suppressor in TGF-beta-inducible transcriptional complexes. Genes Dev 11(23), 3157-3167.

Massagué, J.; Blain, S. W. & Lo, R. S. (2000). TGF-beta signaling in growth control, cancer, and heritable disorders. Cell 103(2), 295-309.

Massagué, J.; Seoane, J. & Wotton, D. (2005). Smad transcription factors. Genes Dev 19(23), 2783-2810.

Matsuura, I.; Denissova, N. G.; Wang, G.; He, D.; Long, J. & Liu, F. (2004). Cyclin-dependent kinases regulate the antiproliferative function of Smads. Nature 430(6996), 226-231.

Mito, Y.; Henikoff, J. G. & Henikoff, S. (2007). Histone replacement marks the boundaries of cis-regulatory domains. Science 315(5817), 1408-1411.

Miyazawa, K.; Shinozaki, M.; Hara, T.; Furuya, T. & Miyazono, K. (2002). Two major Smad pathways in TGF-beta superfamily signalling. Genes Cells 7(12), 1191-1204.

Moustakas, A. & Kardassis, D. (1998). Regulation of the human p21/WAF1/Cip1 promoter in hepatic cells by functional interactions between Sp1 and Smad family members. Proc Natl Acad Sci U S A 95(12), 6733-6738.

Moustakas, A.; Souchelnytskyi, S. & Heldin, C. H. (2001). Smad regulation in TGF-beta signal transduction. J Cell Sci 114(Pt 24), 4359-4369.

Nagarajan, R. P.; Zhang, J.; Li, W. & Chen, Y. (1999). Regulation of Smad7 promoter by direct association with Smad3 and Smad4. J Biol Chem 274(47), 33412-33418.

Piek, E.; Heldin, C. H. & Dijke, P. T. (1999). Specificity, diversity, and regulation in TGF-beta superfamily signaling. FASEB J 13(15), 2105-2124.

Rada-Iglesias, A.; Ameur, A.; Kapranov, P.; Enroth, S.; Komorowski, J.; Gingeras, T. R. & Wadelius, C. (2008). Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Res 18(3), 380-392.

Ren, B.; Robert, F.; Wyrick, J. J.; Aparicio, O.; Jennings, E. G.; Simon, I.; Zeitlinger, J.; Schreiber, J.; Hannett, N.; Kanin, E.; Volkert, T. L.; Wilson, C. J.; Bell, S. P. & Young, R. A. (2000). Genome-wide location and function of DNA binding proteins. Science 290(5500), 2306-2309.

Robertson, G.; Hirst, M.; Bainbridge, M.; Bilenky, M.; Zhao, Y.; Zeng, T.; Euskirchen, G.; Bernier, B.; Varhol, R.; Delaney, A.; Thiessen, N.; Griffith, O. L.; He, A.; Marra, M.; Snyder, M. & Jones, S. (2007). Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4(8), 651-657.

Sambrook, J.; Fritsch, E. F. & Maniatis, T. (1989). Molecular cloning: a laboratory manual. 2nd edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.

Seoane, J.; Le, H.; Shen, L.; Anderson, S. A. & Massagué, J. (2004). Integration of Smad and forkhead pathways in the control of neuroepithelial and glioblastoma cell proliferation. Cell 117(2), 211-223.

Shendure, J. & Ji, H. (2008). Next-generation DNA sequencing. Nat Biotechnol 26(10), 1135-1145.

Shi, W.; Chang, C.; Nie, S.; Xie, S.; Wan, M. & Cao, X. (2007). Endofin acts as a Smad anchor for receptor activation in BMP signaling. J Cell Sci 120(Pt 7), 1216-1224.

Shi, W.; Sun, C.; He, B.; Xiong, W.; Shi, X.; Yao, D. & Cao, X. (2004). GADD34-PP1c recruited by Smad7 dephosphorylates TGF-beta type I receptor. J Cell Biol 164(2), 291-300.

Shi, Y. & Massagué, J. (2003). Mechanisms of TGF-beta signaling from cell membrane to the nucleus. Cell 113(6), 685-700.

Sporn, M. B. & Roberts, A. B. (1990). TGF-beta: problems and prospects. Cell Regul 1(12), 875-882.

Stopa, M.; Anhuf, D.; Terstegen, L.; Gatsios, P.; Gressner, A. M. & Dooley, S. (2000). Participation of Smad2, Smad3, and Smad4 in transforming growth factor beta (TGF-beta)-induced activation of Smad7. The TGF-beta response element of the promoter requires functional Smad binding element and E-box sequences for transcriptional regulation. J Biol Chem 275(38), 29308-29317.

Strahl, B. D. & Allis, C. D. (2000). The language of covalent histone modifications. Nature 403(6765), 41-45.

Struhl, K. (2007). Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol 14(2), 103-105.

ten Dijke, P. & Hill, C. S. (2004). New insights into TGF-beta-Smad signalling. Trends Biochem Sci 29(5), 265-273.

Tsukazaki, T.; Chiang, T. A.; Davison, A. F.; Attisano, L. & Wrana, J. L. (1998). SARA, a FYVE domain protein that recruits Smad2 to the TGF-beta receptor. Cell 95(6), 779-791.

Tucker, R. F.; Shipley, G. D.; Moses, H. L. & Holley, R. W. (1984). Growth inhibitor from BSC-1 cells closely related to platelet type beta transforming growth factor. Science 226(4675), 705-707.

Verrecchia, F. & Mauviel, A. (2002). Transforming growth factor-beta signaling through the Smad pathway: role in extracellular matrix gene expression and regulation. J Invest Dermatol 118(2), 211-215.

Watt, A. J.; Garrison, W. D. & Duncan, S. A. (2003). HNF4: a central regulator of hepatocyte differentiation and function. Hepatology 37(6), 1249-1253.

Wederell, E. D.; Bilenky, M.; Cullum, R.; Thiessen, N.; Dagpinar, M.; Delaney, A.; Varhol, R.; Zhao, Y.; Zeng, T.; Bernier, B.; Ingham, M.; Hirst, M.; Robertson, G.; Marra, M. A.; Jones, S. & Hoodless, P. A. (2008). Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing. Nucleic Acids Res 36(14), 4549-4564.

Wong, C.; Rougier-Chapman, E. M.; Frederick, J. P.; Datto, M. B.; Liberati, N. T.; Li, J. M. & Wang, X. F. (1999). Smad3-Smad4 and AP-1 complexes synergize in transcriptional activation of the c-Jun promoter by transforming growth factor beta. Mol Cell Biol 19(3), 1821-1830.

Zhang, H.; Zhao, J.; Zheng, S. & Chen, X. (2005). Is TGFBR1*6A really associated with increased risk of cancer? J Clin Oncol 23(30), 7743-7744; author reply 7744-7746.

Zhang, W.; Hoffman, B. & Liebermann, D. A. (2001). Ectopic expression of MyD118/Gadd45/CR6 (Gadd45beta/alpha/gamma) sensitizes neoplastic cells to genotoxic stress-induced apoptosis. Int J Oncol 18(4), 749-757.


