
During the last ten years, the development of increasingly powerful techniques for differential gene expression studies has provided entirely new insights into the molecular mechanisms underlying biological processes. In this thesis, we applied two different methods, cDNA RDA and cDNA microarray analysis, to characterize the prostate hyperplasia of PRL transgenic mice at the molecular level. These two methods are discussed in more detail below.

cDNA REPRESENTATIONAL DIFFERENCE ANALYSIS (RDA)

In paper II, we used cDNA-RDA to identify differentially expressed transcripts in the hyperplastic prostate of the Mt-PRL transgenic mice compared to wildtype control littermates. RDA has been successfully adapted to identify genes that are differentially expressed between two populations of cells [212]. Representative cDNA fragments from each population are first generated by restriction endonuclease digestion of cDNAs followed by PCR amplification. The resulting mixtures, termed ‘representations’, are then subjected to successive rounds of subtractive cross-hybridization followed by differential PCR amplification. This leads to progressive enrichment of cDNA fragments that are more abundant in one population than in the other. Figure 1 shows a schematic description of the RDA procedure. The PCR products after each RDA round are termed differential products (DP). Theoretically, consecutive DPs should contain more stringently selected gene fragments and less noise from non-differentially expressed genes. To allow isolation of both up- and down-regulated genes, each sample is used as tester in one procedure and as driver in the other. In paper II, we aimed to identify both up- and down-regulated transcripts in the hyperplastic prostates of Mt-PRL transgenic mice compared to controls, and we therefore ran two such parallel procedures.

cDNA-RDA is a powerful technique for isolation of differentially expressed genes, but it also has limitations in that not all of the differentially expressed genes are necessarily enriched during the procedure. The lack of four-base-pair restriction sites in a messenger RNA may result in less than 100% coverage of expressed genes in the representations. Alternatively, the restriction fragments may be too big for efficient amplification by PCR. The PCR amplification that generates the starting representations is a critical step for a successful RDA. In order to generate representations that truly represent the original cDNA pool with respect to fragment distribution while avoiding a size bias, the PCR needs to be carefully titrated for each sample.

[Figure 1 diagram: mRNA from PRL-transgenic prostate and control prostate (samples); cDNA; restriction digestion; linker ligation; PCR; representations (tester and driver); tester: digest and ligate new linker; driver: digest; mix, melt and hybridize; PCR with exponential amplification of tester:tester hybrids, linear amplification of tester:driver hybrids and no amplification of driver:driver hybrids; repeat procedure.]

Figure 1. A schematic description of the cDNA-RDA procedure. The first step is to synthesize cDNA using purified mRNA as template. The double-stranded cDNA is cut with a 4-base-pair restriction enzyme. A linker, complementary to the generated overhangs, is ligated onto the cDNA fragments. This generates a pool, which is amplified by PCR using primers complementary to the linker. This procedure generates a representation, and one such representation is made from each of the two mRNA pools to be compared. The linker is then removed using the same restriction enzyme as before. A new adapter is ligated onto the tester fragments only. The tester is then mixed with driver in excess, and the mix is heat denatured and allowed to hybridize. A PCR amplification using primers complementary to the new adapters is performed. During this step, only tester:tester hybrids are amplified exponentially. Tester:driver hybrids are linearly amplified and can be removed by nuclease treatment. Driver:driver hybrids are not amplified at all. The procedure is then repeated with increasing ratios of fresh driver. After a few rounds, distinct bands can be visualized on an agarose gel. These bands are isolated, and the products are cloned into vectors and characterized. Reproduced from Hubank et al. [212].

The sensitivity of RDA remains to be defined. Whether a transcript is detected is thought to depend mainly on the relative difference in its expression level between the tester and driver populations. The flexibility of the RDA methodology can be exploited here, since varying the stringency of hybridization influences the detection of small differences in gene expression between tester and driver populations. By lowering the stringency of subtraction (increasing the amount of tester cDNA relative to the driver cDNA), cDNA-RDA can enrich for genes with subtle differences in gene expression [212]. However, too much driver cDNA can cause insufficient enrichment of the targets, rendering differences invisible, whereas too little driver cDNA may cause insufficient exhaustion of common (but differentially expressed) sequences in the tester cDNA, generating background. The risk of cloning non-differentially expressed genes is also higher under less stringent enrichment.

Furthermore, the relative expression level of the corresponding gene may affect the degree of enrichment in cDNA-RDA. Not all differentially expressed genes are equally enriched in the process, which favors fragments with high levels of differential expression, especially if RDA is performed for 3-4 rounds (DP3 and DP4). There is an inverse relation between the degree of enrichment for differentially expressed genes and the complexity of the RDA output. If RDA is performed for 1-2 rounds, a broader spectrum of differentially expressed genes (including those with lower levels of differential expression) is obtained together with many non-differentially expressed genes. This necessitates large-scale screening of the output, for example by microarray analysis, to remove those transcripts.

Sequence analysis

To follow up the RDA output, the RDA clones were sequenced by routine cycle sequencing with dye-labeled nucleotides, followed by running of the purified products on an automated sequencing machine. Subsequently, the Staden sequence analysis package [213] was used for vector clipping, redundancy, and assembly analysis. Sequences were annotated and assigned an accession number by searching for homologies with published sequences in the non-redundant and expressed sequence tag (EST) divisions of the public databases of NCBI (National Center for Biotechnology Information) using the BLAST (N/X) software [214]. More than 85% homology over a region of at least 50 base pairs was required to annotate sequences based on homology to known genes or ESTs. Clones that failed to match any existing database entry in the BLAST (N/X) search were denoted unknowns. Functional prediction was performed in silico using the information in the UniGene, The Institute for Genomic Research (TIGR)-EGAD, Online Mendelian Inheritance in Man (OMIM), and Medline databases.
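The annotation rule above (more than 85% homology over at least 50 base pairs, otherwise "unknown") can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Hit` record, the `annotate` function and the choice of the highest-identity hit as the annotation are hypothetical and are not part of the Staden or BLAST software.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One database match for a clone (hypothetical record layout)."""
    subject_id: str
    percent_identity: float
    alignment_length: int  # aligned region in base pairs


def annotate(hits: List[Hit], min_identity: float = 85.0,
             min_length: int = 50) -> str:
    """Return the subject of the best acceptable hit, or "unknown"."""
    # Keep hits with more than 85% identity over at least 50 bp.
    passing = [h for h in hits
               if h.percent_identity > min_identity
               and h.alignment_length >= min_length]
    if not passing:
        return "unknown"
    # Assumption: the highest-identity (then longest) hit is used.
    best = max(passing, key=lambda h: (h.percent_identity, h.alignment_length))
    return best.subject_id
```

A clone whose only matches are short or weak is thus reported as "unknown", mirroring the treatment of clones without a database match.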

cDNA MICROARRAY ANALYSIS

The DNA microarray, also called a chip, has become an important tool in gene expression studies for monitoring RNA expression levels, but it can also be utilized to study mutations and polymorphisms at the DNA level. In this thesis, cDNA microarray technology was used to characterize the molecular mechanisms of importance for the prostate hyperplasia in PRL-transgenic mice compared to controls. However, we made use of this method in different ways in the two studies. In paper II, we applied the cDNA microarray technology to verify the cloned RDA products, isolated as differentially expressed between the Mt-PRL transgenic prostates and controls. In contrast, in paper IV, we used the cDNA microarray technology to screen for differentially expressed transcripts in the Pb-PRL transgenic model of prostate hyperplasia.

Basically, there are two main types of microarrays. The first type is composed of oligonucleotides that are synthesized in situ by photolithography [215]. These chips are available commercially and form the basis of the GeneChip™ technology sold by Affymetrix. The other type, the cDNA microarray, was originally developed by Brown and colleagues at Stanford University [216]. This form of microarray usually comprises PCR-amplified inserts from cDNA clones representing known genes and ESTs [217]. cDNA microarrays are generally used for comparative analysis where the two samples to be compared are hybridized onto a single chip. In contrast, when using Affymetrix arrays, each sample is hybridized on a separate array. Although this increases the number of chips used, it also provides the advantage that post hoc comparisons not planned in the original experiment can be made more easily. Another advantage of using short oligonucleotide probes on an array is the built-in ability to distinguish close members of a gene family. However, the current Affymetrix oligonucleotide expression platform is still significantly more expensive than cDNA arrays and lacks flexibility when it comes to producing custom-designed arrays. Newer platforms using 50-70-mer spotted oligonucleotides allow for rapid array design and implementation. Experience with this technology is still limited, but it may offer the best alternative.


The microarray technology is advancing at an impressive rate and this thesis will only describe the most important methodological characteristics. All steps mentioned need to be carefully optimized for the successful application of cDNA microarray analysis. Figure 2 shows a schematic description of the DNA microarray procedure.

Array design and Printing

Customized cDNA microarrays are fabricated by first selecting the genes to be printed (immobilized) onto the array from public databases and repositories or from institutional sources. Control clones can help to validate the microarray-derived data. Selected cDNA clones may be spotted twice at different locations on the chip to serve as “within slide” reproducibility controls. A set of negative controls including repetitive DNA, polyA sequences, genomic DNA and non-cross-reactive gene sequences from different organisms may be utilized to ensure specific hybridization. In addition, so-called spiking controls (positive controls) may be used by adding RNA that will hybridize specifically to spots included on the array. High-throughput DNA preparation is performed in either 96- or 384-well format by PCR amplification of the selected clones or gene sequences. Subsequently, the DNA is purified by ethanol precipitation and resuspended in an appropriate “spotting” solution. Moreover, the purity of each PCR product is checked on an agarose gel. Spotting is carried out by a robot, which deposits a nanoliter of PCR product onto an aminosilane-coated glass slide in serial order to produce circular spots of about 90-200 µm in diameter. Spotted DNA is cross-linked to the matrix by ultraviolet irradiation and denatured by exposure to heat.

Figure 2. Schematic of microarray experiments. From Duggan DJ et al. [218].

Target preparation

The total RNA or mRNA samples that are to be compared are extracted from two tissues or cell groups and labeled with different fluorescent dyes in a reverse transcription reaction generating fluorescent dye-incorporated cDNA. Most often, the cyanine dyes Cy3 (green) and Cy5 (red) are used, as they have well-separated emission spectra which enable efficient channel separation in the signal detection. The labeling cDNA synthesis reaction is rapid, but the bulky Cy-dye molecules may reduce the incorporation efficiency of labeled nucleotides. In order to eliminate dye-specific effects caused by a labeling bias, which results in uneven labeling by the two dyes for specific gene sequences, a dye-swap design is recommended. Each hybridization is then performed twice, with the colors switched during labeling. Finally, the samples are purified to remove unincorporated dye, often by spin column purification.
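The way a dye-swap pair cancels a gene-specific dye bias can be illustrated with a short Python sketch. It assumes that the bias acts as a constant multiplicative factor on one channel for a given gene, so that it becomes an additive term on the log scale; the function name and argument names are hypothetical.

```python
import math


def dye_swap_log_ratio(cy5_forward, cy3_forward, cy5_swapped, cy3_swapped):
    """Combine a dye-swap pair into one log2(A/B) estimate for a gene.

    Forward slide: sample A labeled with Cy5, sample B with Cy3.
    Swapped slide: sample A labeled with Cy3, sample B with Cy5.
    """
    # Forward slide measures log2(A/B) plus the dye bias.
    m_forward = math.log2(cy5_forward / cy3_forward)
    # Swapped slide measures log2(B/A) plus the same dye bias.
    m_swapped = math.log2(cy5_swapped / cy3_swapped)
    # Taking half the difference cancels the additive bias term.
    return (m_forward - m_swapped) / 2.0
```

For example, with a true four-fold difference and a two-fold Cy5 bias, the forward slide reads 800/100 and the swapped slide 200/400, and the combined estimate returns the unbiased log2 ratio of 2.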

The amount of total RNA required for one microarray experiment is currently approximately 15 µg per sample, which is considered one of the bottlenecks in microarray analysis. Although a number of amplification strategies have been developed to reduce the amount of starting material [219-222], all of these strategies are limited by their reproducibility and by the need for unbiased amplification, which is necessary to preserve the relative expression levels of the two starting RNA samples that are to be compared.

Hybridization

Hybridization of the labeled target is ideally linear (i.e. the signal is proportional to the amount of labeled target), sensitive so that low-abundance genes are detected, and specific so that probes hybridize only to the desired gene in the complex target mixture. The large size of the cDNA probes is helpful in enabling stringent hybridization conditions and lowering cross-hybridization of unrelated genes, although members of closely related gene families will still be able to anneal to some extent. Procedures to reduce background (a step commonly called pre-hybridization) include inactivation of free reactive groups on the glass slide surface before hybridization. This can be performed either by chemical inactivation [223] or by treatment with biomolecules such as bovine serum albumin (BSA) [224] to block the reactive groups. The hybridization temperature and buffer determine the stringency of the hybridization. Salmon sperm DNA, polyA, tRNA, sodium dodecyl sulfate (SDS), and Cot1 DNA are added to the hybridization to suppress nonspecific hybridization due to repetitive sequences [225]. After hybridization, the chip is washed in multiple steps to remove particles and loosely bound target DNA.

Image analysis and Normalization

The fluorescent signal of the hybridized targets is measured with a laser scanner capable of detecting emission from the Cy3 and Cy5 channels (showing green and red signals, respectively) to monitor the spots where target DNA has bound. Laser intensity and detector gain should be adjusted to yield images with non-saturated spots and approximately similar overall signal intensities for the red and green channels. An overlay of the red and green images will therefore allow a relative comparison, where the intensity of the signals from the two different samples is directly correlated with the original concentration of mRNA in the cell or tissue. Calculation of the expression ratio for each clone (red/green channel) enables the assignment of up-regulated, down-regulated, non-differential or absent expression.

The image processing and subsequent data analysis from the microarray experiments are crucial for extraction of useful information. Image analysis in papers II and IV was performed using GenePix Pro software. First, a grid describing the array design is aligned on the image to localize and link a clone identity to each spot. The software extracts intensity and background measurements for each probe. Automatic flagging identifies absent spots or very weak spots (≤1.4 (paper II) or ≤2 (paper IV) times above background), and manual flagging is used to eliminate artifacts. The value of the signal from each spot is calculated as the average intensity minus the background.
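The spot-level calculation described above can be sketched in Python. This is not the GenePix Pro implementation: the function names are hypothetical, and the rule that a spot is discarded when either channel is weak is an assumption, since the text does not state how the two channels are combined for flagging.

```python
def spot_signal(foreground, background, min_fold=1.4):
    """Background-subtracted signal, or None if the spot is flagged as weak.

    min_fold=1.4 corresponds to the paper II threshold; paper IV used 2.
    """
    if foreground <= min_fold * background:
        return None  # absent or very weak spot
    return foreground - background


def expression_ratio(red_fg, red_bg, green_fg, green_bg, min_fold=1.4):
    """Red/green expression ratio for one spot, or None if flagged."""
    red = spot_signal(red_fg, red_bg, min_fold)
    green = spot_signal(green_fg, green_bg, min_fold)
    # Assumption: a spot is dropped if either channel is too weak.
    if red is None or green is None:
        return None
    return red / green
```

A spot with red intensity 1100 and green intensity 600, both over a background of 100, gives a ratio of 1000/500 = 2.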

To allow for inter-array comparisons, each array needs to be normalized to remove systematic sources of variation. Normalization between the two fluorescent images was performed using the ‘LOWESS’ normalization method in the SMA (Statistics of Microarrays Analysis) package [226, 227]. SMA is an add-on library written in the public domain statistical language R [228] and can be used to analyze simple replicated experiments. The LOWESS (Locally Weighted Scatter Plot Smoother) algorithm performs a local fit to the data in an intensity-dependent manner. The intensity value for each spot is normalized based on the data distribution in the immediate neighborhood of the spot’s intensity. Bias in spatially defined subsets of the data can also be compensated for by normalization strategies (‘Pin-wise LOWESS’), e.g. when clear biases caused by pin-to-pin variations during array printing or uneven hybridizations are observed.
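The idea of intensity-dependent normalization can be sketched as follows. This is a simplified, pure-Python local linear fit with tricube weights and without the robustness iterations of a full LOWESS; it is not the SMA (R) implementation used in the papers, and the function names and the `frac` default are assumptions chosen for illustration. Each spot's log ratio M = log2(R/G) is corrected by the locally fitted trend of M against the mean log intensity A = 0.5 * log2(R * G).

```python
import math


def ma_values(red, green):
    """Log ratio M and mean log intensity A for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def local_fit(x, y, frac=0.5):
    """Tricube-weighted local linear fit of y on x, evaluated at each x."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbors in each window
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (self included).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def normalize_log_ratios(red, green, frac=0.5):
    """Subtract the intensity-dependent trend from each spot's log ratio."""
    m, a = ma_values(red, green)
    trend = local_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

With a uniform two-fold red bias across all intensities, the fitted trend equals 1 on the log2 scale everywhere and the normalized log ratios are all zero. A pin-wise variant would apply the same function separately to the spots printed by each pin.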

Data Analysis and Statistical Evaluation

cDNA microarrays are now used in a more or less standardized fashion, and it has become increasingly clear that simply generating the data is not enough; one must be able to extract meaningful information about the system being studied. Despite the combined efforts of biologists, computer scientists, statisticians and software engineers, there is no one-size-fits-all solution for the analysis and interpretation of genome-wide expression data. Numerous tools are now available for interpreting the data, and choosing among them is challenging.

The most basic question one can ask in a transcriptional profiling experiment is which genes’ expression levels changed significantly. Highly abundant genes with great differences in expression will normally not cause any problems as they will display expression ratios above experimental noise and measurement variations. However, for the detection of subtle expression differences and low abundance genes, a statistically justified experimental design and data evaluation is crucial.

The many sources of variation in a microarray experiment can be divided into three categories. The first is biological variation, which is intrinsic to all organisms; it may be influenced by genetic or environmental factors, as well as by whether the samples are pooled or individual. The second is technical variation, which may be introduced into the samples during the extraction, labeling or hybridization procedures. The third is measurement error, which is associated with reading the fluorescent signals and may be affected by factors such as dust on the array. Technical replicates generally involve a smaller degree of variation in measurements than biological replicates.

Replication is essential in experimental design because it allows different sources of variability to be accounted for. It is more difficult to say how many replicates should be done, although Lee et al. indicate that three replicates are sufficient to account for technical variability [229]. The ability to assess such variability allows identification of biologically reproducible changes in gene expression levels. Standard t-like tests assume that the data are sampled from normal populations with equal variances. Although log transformation of the expression ratios can improve normality and help equalize variances [230], ultimately the best estimates of the data’s distribution come from the data themselves. Permutation tests, generally carried out by repeatedly scrambling the samples’ class labels and computing t statistics for all genes in the scrambled data, best capture the unknown structure of the data [226, 231]. These types of tests do not assume a normal distribution of the data set. One advantage of permutation methods is that they allow more reliable correction for multiple testing. The issue of multiple testing is crucial, as microarrays typically monitor the expression levels of thousands of genes.
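The label-scrambling idea can be illustrated for a single gene with the Python sketch below. It is a minimal example, not the procedure used in the papers: it enumerates every possible reassignment of the class labels (feasible only for small sample sizes) rather than drawing random permutations, uses an unequal-variance t statistic, and assumes that no group has zero variance. The function names are hypothetical.

```python
import itertools
import math
import statistics


def t_statistic(group_a, group_b):
    """Two-sample t statistic with unequal variances."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


def permutation_p_value(group_a, group_b):
    """Two-sided p-value from all relabelings of the pooled samples."""
    pooled = list(group_a) + list(group_b)
    observed = abs(t_statistic(group_a, group_b))
    n = len(pooled)
    extreme = 0
    total = 0
    for indices in itertools.combinations(range(n), len(group_a)):
        chosen = set(indices)
        perm_a = [pooled[i] for i in indices]
        perm_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(t_statistic(perm_a, perm_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With three replicates per group there are only 20 possible relabelings, so the smallest attainable two-sided p-value is 2/20 = 0.1, which shows how the number of replicates bounds the resolution of a permutation test.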

In papers II and IV, we used the permutation-based statistical method Significance Analysis of Microarrays (SAM), which was developed specifically for microarrays. Today, SAM is a well-accepted statistical method for estimating the variability of repeated experiments [231]. Briefly, SAM assigns a score to each transcript on the basis of the change in gene expression relative to the standard deviation of multiple independent measurements. SAM thereby allows selection of differentially regulated genes based on an estimate of the percentage of genes identified as differentially regulated by chance, the so-called false discovery rate (FDR). Each gene on the array is assigned a q-value. This value is similar to the familiar p-value and measures the lowest FDR at which the gene is called significant.
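The score and the FDR estimate can be sketched in Python as follows. This is a simplified illustration and not the SAM software: the score follows the published form of the relative difference, the mean difference divided by the pooled standard deviation plus a small constant `s0`, but `s0` is simply passed in here, and a single symmetric cutoff on the absolute score replaces SAM's own thresholding procedure. All function names are hypothetical.

```python
import math
import statistics


def sam_score(group_a, group_b, s0=0.0):
    """Relative difference: mean change over (standard deviation + s0)."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    sum_sq = (sum((x - mean_a) ** 2 for x in group_a)
              + sum((x - mean_b) ** 2 for x in group_b))
    s = math.sqrt((1.0 / na + 1.0 / nb) * sum_sq / (na + nb - 2))
    return (mean_b - mean_a) / (s + s0)


def estimate_fdr(observed_scores, permuted_score_sets, cutoff):
    """Median number of calls in permuted data over calls in observed data."""
    called = sum(1 for d in observed_scores if abs(d) >= cutoff)
    if called == 0:
        return 0.0
    false_calls = [sum(1 for d in scores if abs(d) >= cutoff)
                   for scores in permuted_score_sets]
    return statistics.median(false_calls) / called
```

The constant `s0` keeps genes with very small standard deviations, typically those with low signal, from obtaining large scores by chance.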

Experimental design

The expression ratio obtained from a microarray experiment is relative, i.e. no absolute values of the number of mRNA molecules per cell can be obtained.

The key issue in designing a cDNA microarray experiment is to decide whether to use direct or indirect comparisons; that is, whether to make the comparison within or between slides [232]. Figure 3a shows a direct design where the comparison of two different samples is made within one slide using the same orientation of dye labeling. In this design, dye bias may affect the

