
Genotyping and Mutation Detection In Situ: Development and application of single-molecule techniques


Academic year: 2022


After it's finished, sometimes I can trace a path that goes back to the possible source of inspiration.

Tracy Chapman

With love to my family


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Howell W. M., Grundberg I., Faryna M., Landegren U., Nilsson M. (2010) Glycosylases and AP-cleaving enzymes as a general tool for probe-directed cleavage of ssDNA targets. Nucleic Acids Res, 38(7):e99.

II Larsson C.*, Grundberg I.*, Söderberg O., Nilsson M. (2010) In situ detection and genotyping of individual mRNA molecules. Nature Methods, 7(5):395-7. *equal contribution

III Grundberg I., Imgenberg-Kreuz J., Edlund K., Micke P., Sundström M., Kiflemariam S., Botling J.*, Nilsson M.* Diagnostic mutation testing in situ in routine FFPE tissue sections for treatment prediction in clinical oncology. Manuscript. *equal contribution

IV Weibrecht I.*, Grundberg I.*, Nilsson M., Söderberg O. Following the pathway – determination of cellular activity status in single cells. Submitted. *equal contribution

Reprints were made with permissions from the respective publishers.


Related work by the author

Book chapters

Nilsson M., Larsson C., Stenberg J., Göransson J., Grundberg I., Isaksson M., Conze T., Henriksson S. Application of Padlock and Selector Probes in Molecular Medicine. In Molecular Diagnostics (2nd Edition). Academic Press, Elsevier. Oxford, (2010).

Grundberg I.*, Weibrecht I.*, Landegren U. Amplified single-molecule detection. In Handbook of Nanophysics: Principles and Methods. CRC Press, Taylor and Francis Group. Florida. (2010). *equal contribution


Contents

Introduction
The central dogma of molecular biology
From DNA to RNA
The complexity of the human transcriptome
Genetic variations
Single nucleotide and structural variants
The cancer genome
Personalized medicine
Methods for analysis of gene expression and genetic variation
Method validation
Global studies of gene expression
Single-cell studies of gene expression
In situ detection of DNA and RNA
In situ hybridization
Modified nucleotides in hybridization probes
In situ polymerization
Signal amplification after in situ hybridization
Padlock probes
Rolling-circle amplification
In situ detection of nucleic acids with padlock probes and RCA
Proximity ligation assay
Present investigations
Paper I: Glycosylases and AP-cleaving enzymes as a general tool for probe directed cleavage of ssDNA targets
Aim of the study
Background
Results
Discussion
Paper II: In situ detection and genotyping of individual mRNA molecules
Aim of the study
Background
Results
Discussion
Paper III: Mutation detection in cancer tissue sections for in situ diagnostics in clinical oncology
Aim of the study
Background
Results
Discussion
Paper IV: Following the pathway - determination of cellular activity status in single cells
Aim of the study
Background
Results
Discussion
Summary and future perspectives
Acknowledgements
References


Abbreviations

A adenine
AP apurinic
bDNA branched DNA
BER base excision repair
C cytosine
cDNA complementary DNA
CML chronic myeloid leukemia
CNV copy number variation
COSMIC catalogue of somatic mutations in cancer
DNA deoxyribonucleic acid
DUSP6 dual specificity phosphatase 6
ENCODE the encyclopedia of DNA elements
FISH fluorescence in situ hybridization
Fpg formamidopyrimidine–DNA glycosylase
G guanine
GWAS genome-wide association studies
HCR hybridization chain reaction
HGP human genome project
HRP horseradish peroxidase
KRAS Kirsten rat sarcoma viral oncogene homolog
LCM laser capture microdissection
LNA locked nucleic acid
LOD limit of detection
LOH loss of heterozygosity
mRNA messenger RNA
MIP molecular inversion probe
miRNA microRNA
mtDNA mitochondrial DNA
OG oxidized guanine
OLA oligonucleotide ligation assay
PAGE polyacrylamide gel electrophoresis
PET positron emission tomography
PCR polymerase chain reaction
PDGFRβ platelet derived growth factor receptor beta
PLA proximity ligation assay
PNA peptide nucleic acid
PRINS primed in situ labeling
Q-FISH quantitative FISH
qPCR quantitative PCR
RCA rolling circle amplification
RCP rolling circle product
RCR rolling circle replication
RNA ribonucleic acid
RNAi RNA interference
rRNA ribosomal RNA
RT reverse transcription
RT-qPCR reverse transcription quantitative PCR
SAGE serial analysis of gene expression
siRNA small interfering RNA
snRNA small nuclear RNA
snoRNA small nucleolar RNA
SNP single nucleotide polymorphism
T thymine
tRNA transfer RNA
TIC touch imprint cytology
TMA tissue microarray
TSA tyramide signal amplification
U uracil
UDG uracil-DNA-glycosylase


Introduction

Cells are the basic units of all life, and together they can form complex structures such as human bodies. The extraordinary machinery that is always active inside every cell is much more impressive than any computer or vehicle invented in the 21st century. A cell is crowded with an immense number of biomolecules, and their teamwork must be exquisite to maintain a functional unit and fulfill its destiny. Sometimes, however, a cell acquires changes that may cause it to break the basic rules of behavior, namely to live, divide and die. This can lead to a non-functional state in which the cell divides uncontrollably, which over a lifetime might develop into cancer. The genetic differences that can cause this type of disturbance in a cell can be extremely small, which means that a cancerous cell can easily be misclassified as normal if it is not carefully analyzed with a suitable technique.

Throughout the years, methods have been developed for analysis of biomolecules and of the genetic changes that can cause a disorder. These methods have increased our knowledge about the complex processes that are constantly ongoing in cells. Depending on the application, the most suitable method is selected from the giant toolbox of techniques that exist. However, there are still many questions that cannot be entirely resolved with today’s well-established techniques, and for that reason new techniques are continuously developed. For a method to succeed within medical research, it should offer an obvious gain that might lead to discoveries that no other method can provide.

This thesis presents the development and application of a method for identification and visualization of single molecules in situ, i.e. directly in cells or tissues. With the use of specific molecular probes, called padlock probes, the method is highly suitable for sensitive analysis within biomedical research and diagnostics. The introduction will give a brief overview of some of the remarkable events in the life of a cell and the importance of studying these events in situ. I will describe the limitations of some of the conventional techniques and the reasons behind the need for new alternative methods. Furthermore, I will discuss the key properties and the main advantages of a new technique for single-molecule detection in individual cells. Finally, I will review the papers presented in this thesis, which span from the development to the application of a new method within basic medical research and cancer diagnostics.


The central dogma of molecular biology

After the discovery of the double helical structure of DNA in the early 1950s [1], it became clear that cells store the hereditary information needed for life in nucleotides. These nitrogenous base molecules represent the letters of the genomic alphabet, and “genetic words” arise when the letters are combined in different order. The words are in turn used to create whole sentences, genes, which represent units of our inheritance. To propagate the genetic information, DNA creates an exact replica of itself before each cell division for its progeny. To build the other constituents of a cell, the most crucial information is transferred to the closely related but shorter and single-stranded molecule ribonucleic acid, or RNA. Finally, RNA uses the information by serving as a template for synthesis of the major constituents of the cell, namely the proteins. This process was termed the central dogma of molecular biology (Figure 1) and claimed that genetic information could only be transferred linearly from DNA to protein. However, methodological advances in the past decades led to the discovery that the relationship between these three molecules is much more complex than once declared, and that the dogma accounts for only part of the genetic agenda. In the following sections, I will give a brief introduction to the extended view of this paradigm and the genetic variations that can take place in a cell.

Figure 1. The central dogma of molecular biology. Double-stranded DNA in the nucleus is transcribed to RNA and transported out to the cytoplasm of the eukaryotic cell. The ribosome will then use the processed mRNA molecule as a template for synthesis of proteins.


From DNA to RNA

In contrast to the English alphabet, DNA has only four letters: adenine (A), guanine (G), cytosine (C) and thymine (T). The double-stranded structure of DNA is created by the strict rule of base pairing: A binds to T, and G binds to C. Knowledge of the human code opened the way for further astonishing progress in life science, and almost exactly half a century after the revolutionary breakthrough by Watson and Crick, the Human Genome Project (HGP) released the complete human DNA sequence along with annotation of the encoded genes [2-3]. The human genome consists of three billion base pairs and contains about 20,000 protein-coding genes, residing in the 23 pairs of chromosomes within the nucleus of a cell. The genetic instructions of the genes are copied by transcription to RNA, which essentially has the same language as DNA except that thymine is replaced with the base uracil (U). The primary products of eukaryotic transcription of protein-coding genes are processed extensively to generate the mature single-stranded RNA molecule, messenger RNA (mRNA), which in turn directs synthesis of proteins. This is done through removal of internal sequences, introns, and splicing of the remaining and typically shorter coding sequences, exons. Only about 1% of the human genome harbors protein-coding information; the remaining vast majority consists of non-coding RNA genes (which will be further discussed in the next section), regulatory sequences, and non-coding DNA, once known as “junk”. The role of this part of the genome has more recently been thoroughly investigated by the ENCODE (The Encyclopedia of DNA Elements) project [4]. It found that the majority of the bases (93%) in the human genome are in fact associated with at least one primary transcript, including non-protein-coding elements, such as promoters and enhancers, and overlapping transcripts, which highlights their importance in e.g. transcriptional and translational regulation. The tremendous amount of information and data about our human code acquired from these projects (e.g. HGP and ENCODE) is publicly available and will contribute substantially to future success in biological and medical research.
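As an illustration of the base-pairing rule, transcription and splicing described above, the following is a minimal sketch in Python. The function names, the toy sequence and the half-open exon coordinates are hypothetical choices made for this example only; they are not taken from the thesis.

```python
# Watson-Crick pairing rule from the text: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, read in the 5' to 3' direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of the coding strand: same letters, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons and drop everything in between (the introns).

    Exons are given as (start, end) positions, 0-based and end-exclusive.
    This coordinate convention is an assumption of the sketch.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


if __name__ == "__main__":
    gene = "ATGGTAAGCTGA"           # toy sequence, not a real gene
    pre_mrna = transcribe(gene)     # "AUGGUAAGCUGA"
    print(reverse_complement(gene))
    print(splice(pre_mrna, [(0, 3), (9, 12)]))
```

Real transcripts are of course processed by the cellular machinery rather than by fixed coordinates; the sketch only mirrors the letter-level rules given in the text.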

The complexity of the human transcriptome

The transcriptome, i.e. the set of all RNA molecules in a cell, is far more complex than once expected, and the role of RNA molecules has expanded from being a template for protein synthesis to playing a central role in post-transcriptional control [5]. The most well-known transcription product is mRNA, which is exported from the nucleus to the cytoplasm, where it serves as a template for protein synthesis until it is degraded. The longevity of an mRNA is influenced by the cap structure added at the 5′ end of the transcript and by the length of the adenine-rich sequence at the 3′ end, called the poly-A tail. The balance between mRNA synthesis and degradation determines the level of a particular transcript in a cell.
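The balance between synthesis and degradation can be made concrete with a deliberately simple model. The sketch below assumes constant synthesis and first-order degradation, dM/dt = k_syn - k_deg * M, which gives a steady-state level of k_syn / k_deg. This model and the parameter names are assumptions introduced for illustration and are not stated in the thesis.

```python
import math


def steady_state_level(synthesis_rate: float, half_life: float) -> float:
    """Steady-state transcript copies per cell under the assumed model.

    synthesis_rate: transcripts made per unit time (k_syn)
    half_life: mRNA half-life in the same time unit
    """
    if synthesis_rate < 0 or half_life <= 0:
        raise ValueError("synthesis_rate must be >= 0 and half_life must be > 0")
    k_deg = math.log(2) / half_life   # first-order degradation constant
    return synthesis_rate / k_deg     # level at which synthesis equals degradation


if __name__ == "__main__":
    # Same synthesis rate, different stability: the longer-lived mRNA is more abundant.
    print(steady_state_level(synthesis_rate=2.0, half_life=1.0))
    print(steady_state_level(synthesis_rate=2.0, half_life=4.0))
```

Under this assumption, doubling the half-life doubles the steady-state level, which is one way to see why transcript stability matters as much as transcription rate.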

Other classes of RNA are also important in post-transcriptional regulation, such as the so-called non-coding RNAs. Non-coding RNAs include ribosomal RNA (rRNA), which as part of the ribosome reads the encoded mRNA sequence and assembles proteins from amino acids carried by transfer RNA (tRNA), as well as small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), microRNA (miRNA) and small interfering RNA (siRNA). snRNA directs splicing of pre-mRNA to mature mRNA, which allows a single gene to be processed into multiple mRNAs with distinct features [6]. Scientific findings made along the road to elucidating the mechanisms behind RNA splicing led to the understanding that RNA is in fact more complex than DNA and subject to many levels of regulation. Another post-transcriptional process is RNA editing, which modifies mRNA so that the nucleotide sequence is altered. This mechanism changes the encoded protein, i.e. the protein will be different from that predicted by the genomic DNA sequence. Alterations in RNA editing have been connected to tumor progression as well as many other human diseases [7]. Post-transcriptional events, such as alternative splicing and RNA editing, are important mechanisms for the generation of molecular diversity.

snoRNA guides specific chemical modifications of other RNAs, such as methylation, whereas miRNA regulates gene expression by inhibiting translation through specific base-pairing to target mRNAs [8] or by affecting the stability of the mRNA [9]. miRNAs participate in the regulation of almost every cellular process, and their dysregulation has been linked to cancer and other diseases [10-12]. Finally, siRNA is a class of double-stranded RNA molecules that plays an important role in gene silencing through RNA interference (RNAi) [13]. Synthetic target-specific RNAi can knock down genes with high selectivity and has great potential for use in targeted cancer therapy [14-15]. The greater complexity and regulation of the transcriptome reflects that variation in biology has RNA at its core.

Genetic variations

The insight gained into the human genetic landscape during the past years, together with advances in technology and bioinformatics, has enabled extensive mapping of genetic variations. Analysis of genetic variation, both among individuals within a population and between different populations, has been useful in the search for genetic contributions to disease. The production of a more or less complete human reference sequence evolved into the HapMap project [16], an initiative based on genotyping individuals and determining haplotypes, i.e. blocks of sequence variants that tend to be inherited together in a population. The project aimed to catalog genetic similarities and variations that affect health and disease. An ongoing extension of the HapMap project is the 1000 Genomes Project [17], which aims to construct a more complete catalogue of the genetic variations in the human genome using massively parallel sequencing. These projects will support medical research in understanding how genetic variation is related to common diseases. In the following sections, I will describe different types of genetic variations and their effect on human health and disease.

Single nucleotide and structural variants

The DNA sequences of any two people are 99.9% identical [18]; thus it is the small variations that can underpin certain diseases or phenotypes, i.e. the observable characters of an organism. Frequently occurring sequence variations, where the least common allele has a frequency of >1% in a population, are called polymorphisms and do not usually affect biological functions. Sequence alterations that, on the other hand, do affect biological functions are named mutations. Mutations can either be inherited from one or both parents or arise spontaneously, through mistakes in the DNA replication process or through exposure to environmental agents such as UV light, radiation or chemical mutagens; even water and oxygen can damage DNA, through hydrolysis and oxidation. Mutations are usually under negative evolutionary selection pressure and are thus generally rare. Variations in the human genome can in a broad sense be divided into two classes: single nucleotide variants and structural variants. A mutation that causes a replacement at a single nucleotide position is called a point mutation or single base substitution. Point mutations that lead to an amino acid substitution are referred to as nonsynonymous or missense mutations, in contrast to synonymous or silent mutations, in which the base substitution does not alter the amino acid.
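The distinction between synonymous and nonsynonymous substitutions can be sketched in a few lines of Python using the standard genetic code. The function name and the example codons are chosen for illustration (GGT to GAT corresponds to the well-known glycine-to-aspartate change at KRAS codon 12), and the "nonsense" and "stop-loss" labels go beyond the two categories named in the text.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution by its effect on the encoded amino acid."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons that differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # silent: the amino acid is unchanged
    if alt_aa == "*":
        return "nonsense"        # creates a premature stop codon
    if ref_aa == "*":
        return "stop-loss"       # removes a stop codon
    return "missense"            # nonsynonymous: the amino acid is replaced


if __name__ == "__main__":
    print(classify_substitution("GGT", "GAT"))  # Gly -> Asp
    print(classify_substitution("GGT", "GGC"))  # Gly -> Gly
```

The sketch works on DNA codons of the coding strand and ignores context such as splice sites, so it is a simplification rather than a full variant annotator.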

Single nucleotide polymorphisms (SNPs) are genomic variants in which either of two nucleotides can exist at a certain position in the genome, and they represent the most common type of genetic variation in the human genome [19]. The vast majority of SNPs are shared between populations and are not associated with disease, yet it is becoming more and more evident that SNPs can in fact have an impact on disease risk, drug response and other traits.
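The frequency criterion for a polymorphism (least common allele above 1% in a population) can be expressed as a short calculation from genotype counts. The following is a minimal sketch for a biallelic site in a diploid population; the function names and the example counts are hypothetical.

```python
def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Frequency of the least common allele at a biallelic site.

    Arguments are counts of individuals that are homozygous for the reference
    allele, heterozygous, and homozygous for the alternative allele.
    """
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)  # two alleles per individual
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    alt_frequency = (n_het + 2 * n_alt_hom) / total_alleles
    return min(alt_frequency, 1.0 - alt_frequency)


def is_polymorphism(maf: float, threshold: float = 0.01) -> bool:
    """Apply the >1% criterion from the text to a minor allele frequency."""
    return maf > threshold


if __name__ == "__main__":
    common = minor_allele_frequency(90, 9, 1)    # 11 of 200 alleles
    rare = minor_allele_frequency(998, 2, 0)     # 2 of 2000 alleles
    print(common, is_polymorphism(common))
    print(rare, is_polymorphism(rare))
```

In practice the estimate depends on which population is sampled and how many individuals are genotyped, so a variant can pass the threshold in one cohort and fail it in another.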

Thanks to genome-wide association studies (GWAS), which link whole-genome genotyping information with disease phenotypes in large-scale cohorts, hundreds of genetic variants are now being associated with complex human diseases and traits [20-21]. However, most of the identified common variants appear to confer modest risk and explain only a small portion of the total heritability.

Structural variations are changes that span in size from one-base-pair insertions and deletions (indels) to whole chromosomes, and their frequency in the population is inversely proportional to the size of the variation. Such variation also includes inversions, duplications and translocations of DNA sequences and encompasses copy-number variants (CNVs) [22].


Studies of genetic variations can be used for many applications, such as forensic investigations, routine clinical tests and population genetic studies, as well as for finding new genetic markers for monogenic and complex diseases [23].

The cancer genome

All cancers are initiated by changes in the DNA sequences of the genomes of cancer cells [24]. Early studies reported that cancer cells have aberrations in their DNA sequences and that normal cells became cancerous when DNA from human cancers was introduced into them [25]. Moreover, agents that are known to damage DNA and cause mutations also give rise to cancer [26]. The development of cancer is the result of a failed competition between the appearance of aberrations in the genomic sequence, caused by mutations and/or epigenetic changes (e.g. chromatin modifications and DNA methylation), and the naturally occurring repair of DNA in individual cells or elimination of cells with excessively damaged DNA. Even though all cancers arise as a result of DNA sequence alterations, not all abnormalities lead to the development of cancer. Mutations that confer a clonal growth advantage on the cancer cells and are positively selected are named driver mutations, in contrast to passenger mutations, which do not contribute to cancer development since they confer no growth advantage and are not selected. Most often, abnormal tumor cells are defeated or give rise to a benign growth, such as a skin mole. However, cancer cells sometimes overcome this selection, and if they are sufficiently strong they invade tissues and metastasize to new places in the body (Figure 2).

Figure 2. Accumulation of mutations over the lifetime of a cancer patient. The lineage of cell divisions from a fertilized egg to a cancer cell, showing the timing of somatic mutations (internal and external). Passenger mutations do not by themselves cause cancer, but driver mutations will cause a clonal expansion. The last step demonstrates possible relapse after chemotherapy, when the cell may acquire a chemotherapy-resistant driver mutation. (Modified from Stratton et al, 2009).


The hallmarks of cancer have been described as the biological capabilities acquired during the development of tumors [27]. They include excessive proliferation, circumventing growth suppression, resisting cell death, enabling immortality, increasing angiogenesis (i.e. formation of new blood vessels) and finally, inducing invasion and metastasis. In a recent review, two additional emerging hallmarks were proposed (metabolic reprogramming and avoiding immune destruction), along with cancer characteristics that enable these hallmarks (genome instability and mutation, and tumor-promoting inflammation) [28].

The genomes of cancer cells can acquire their altered DNA sequences from somatic mutations, i.e. alterations occurring in any cells other than germ cells, which encompass all the types of genetic variation described in the previous section. In addition, exogenous sources (e.g. viral infections) can introduce completely new DNA sequences into cells, leading to cancer [29]. Some cancers are associated with inherited mutations, such as the well-studied mutations in the BRCA1 and BRCA2 genes that can increase the risk of developing breast and ovarian cancer as well as other types of cancer [30]. The genes that are critical for cancer development can be divided into two discrete classes: proto-oncogenes [31] and tumor suppressor genes [32]. This classification is based on how the mutated gene product is advantageous for cancer initiation. Proto-oncogenes are converted into overactive and dominant oncogenes by gain-of-function mutations, which cause the cells to grow and divide uncontrollably. Well-studied examples of oncogenes are EGFR, RAS and BCR-ABL. Tumor suppressor genes, with p53 being the most well-studied example [33], are protective and therefore require removal or inactivation of both gene copies before an effect is seen on cell growth, proliferation and division. Such loss-of-function mutations are recessive and function through a two-hit model in which one of the alleles is mutated while the other is lost via one of several possible mechanisms, e.g. deletion. This phenomenon results in loss of heterozygosity (LOH) [32,34] and is one of the most commonly observed alterations in human cancer. In many human cancers mutations accumulate in the mitochondrial genome; however, their functional role in cancer initiation and progression is still unclear [35].

Currently, more than 1% of all human genes (>400) have been shown to bear mutations in cancer, and major efforts have been initiated to sequence cancer genomes with high-throughput mutation detection techniques, i.e. next-generation sequencing, in order to identify all somatically acquired mutations and cancer-critical genes. The COSMIC (Catalogue of Somatic Mutations in Cancer) database is a freely and publicly available database which stores and displays information about somatic mutations in cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/) [36]. Even though massive research efforts are ongoing within the oncology field to understand cancer processes and discover new therapies, cancer remains the leading killer in the developed world and is responsible for one in eight deaths [24].


Personalized medicine

Individualized medicine was introduced several decades ago for cancer therapy, but after the term ‘personalized medicine’ was coined in 1999 [37], the interest in targeted therapy has increased considerably. Nowadays it is frequently used along with the similar terms stratified medicine and companion diagnostics. The general definition of these terms is the optimization of medical healthcare such that disease therapy is tailored to the individual characteristics of each patient, i.e. “the right drug, for the right patient, in the right amount, at the right time”. The terms differ in that personalized medicine refers to a complete customization of treatment for each unique patient, whereas stratified medicine uses molecular testing to group patients with similar biological characteristics for suitable treatment.

Companion diagnostics, on the other hand, refers to molecular diagnostic tests that are used to correctly identify patients who will benefit from a certain drug. The goal of all these approaches is not the creation of new drugs that are unique to a patient, but rather to increase the ability to classify patients into groups that differ in susceptibility to a particular disease or in response to a therapy. The driving force behind this concept is the many discoveries made during the last decade about genetic variation, as well as the major technological advances within molecular profiling and sequencing techniques [38]. Furthermore, the assumption is that the current standard for diagnosis of human diseases is incomplete and that between 30% and 70% of patients will have an unsuccessful or adverse response to drug treatment [39]. Cancer patients have the lowest success rate for drug response, which has made oncology a major focus of personalized medicine. Targeted therapy has been used for a long time on cancer patients, with the highly successful examples of Herceptin in breast cancer and Gleevec in chronic myeloid leukemia (CML). A recent implementation of personalized medicine within oncology is that all patients with colorectal cancer are now screened for point mutations in the Kirsten rat sarcoma viral oncogene homolog (KRAS) gene before initiation of anti-EGFR therapy, since a positive mutation status indicates that the patient will not respond to this treatment [40]. The development of therapeutics is a complex and costly process with a low probability of approval. Efficient stratification during clinical trials may increase the likelihood that a drug will be approved, increasing the number of drugs entering the market.

The scientific interest in personalized medicine is continuously growing, and the hopes are high that it will bring a new diagnostic era in which patients will benefit from better clinical outcomes and decreased side effects, and in which less money will be spent on ineffective treatments [41].


Methods for analysis of gene expression and genetic variation

Gene expression is the sum of processes that give rise to a certain transcript and protein level in a cell. This process is finely regulated to ensure that the correct balance of transcripts and proteins is present at the right time in the right cell. To fully understand the biology behind this process, we need to study mRNA and protein levels independently to know how expression of specific genes is related to a particular phenotype. The protein level is the result of factors such as mRNA and protein half-life, and transcription and translation rate. Including information about mRNA distribution gives a more complete picture of gene expression than protein distribution alone [42-43]. Furthermore, mRNA studies can provide important insights about transcriptional variations, such as allelic imbalance, that cannot be examined with protein studies. In the coming sections, I will describe methods that are used for analyzing gene expression, and due to the nature of the presented papers I will focus on single-cell methods for in situ analyses.

Method validation

There is a wide range of methods for analyzing gene expression. Some global approaches allow for studies of thousands of mRNAs in one run, while others focus on single genes in single cells; each approach has its advantages, which will be discussed further in the following sections. Every assay has properties that make it suitable for a certain application. The choice of assay can depend on the target analyte, the concentration of the target, sample matrices, costs, equipment etc. Clearly, no strategy is applicable to every experimental situation, and validation of the method should be based on the intended use of the assay. To clarify the terminology used when describing and comparing methods, I will define some common terms that are used in this thesis. Sensitivity is the capability of a method to discriminate between small differences in concentration of the target molecule in a sample and should not be confused with the limit of detection (LOD), which instead describes the lowest detectable concentration of analyte in a sample. Other terms that are often used interchangeably are specificity and selectivity, yet they are different in meaning. Specificity describes how efficiently the assay targets the correct analytes in a sample, i.e. that it only detects the molecules of interest. Selectivity, on the other hand, defines the method’s ability to distinguish between closely related targets, e.g. in SNP genotyping or detection of point mutations. Detection efficiency denotes in general how well an assay performs, i.e. the proportion of detected targets relative to the total amount of targets in a sample. Other important properties of a reliable assay are robustness, which is the ability to remain unaffected by small variations in the procedure, and reproducibility, which is the capacity to obtain similar results between experiments and in different laboratories.
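Two of the terms defined above lend themselves to a small numerical sketch. Detection efficiency follows directly from the definition in the text. For the limit of detection, the sketch uses a widely used convention (mean of blank measurements plus three standard deviations) that is an assumption of this example and is not specified in the thesis; it yields a signal threshold that still has to be converted into a concentration with a calibration curve. All function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def detection_efficiency(detected_targets: int, total_targets: int) -> float:
    """Proportion of detected targets relative to all targets in the sample."""
    if total_targets <= 0 or not 0 <= detected_targets <= total_targets:
        raise ValueError("need 0 <= detected_targets <= total_targets and total_targets > 0")
    return detected_targets / total_targets


def detection_threshold(blank_signals: list[float], n_sd: float = 3.0) -> float:
    """Signal above which a sample is called detectable (assumed convention).

    Uses mean(blank) + n_sd * sample standard deviation of the blanks.
    """
    if len(blank_signals) < 2:
        raise ValueError("at least two blank measurements are required")
    return mean(blank_signals) + n_sd * stdev(blank_signals)


if __name__ == "__main__":
    print(detection_efficiency(30, 100))          # 30 of 100 molecules detected
    print(detection_threshold([1.0, 2.0, 3.0]))   # toy blank measurements
```

Other definitions of the limit of detection exist, so the threshold function should be read as one possible choice rather than as the procedure used in the papers.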

Assays that analyze biomolecules can be classified according to the level of sample complexity. In vitro assays analyze biomolecules in a cell-free extract, and the results represent the average value of the target molecule in the complete sample. In situ analysis offers localized detection of biomolecules within single cells and tissues that have been fixed for preservation of the biological material. These two concepts and their advantages will be further addressed in the next section. The most complex approach for analysis of biomolecules is in vivo assays, where intact living organisms are studied, such as in vivo imaging, e.g. PET (Positron Emission Tomography), which is frequently used in disease diagnosis. However, in vivo analyses are usually costly, technically challenging and sometimes subject to many ethical questions.
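The difference between an averaged in vitro readout and a per-cell in situ readout can be illustrated with invented numbers. In the sketch below, two hypothetical samples of ten cells give the same bulk average although their single-cell distributions differ completely; the counts and function names are made up for this example.

```python
from statistics import mean

# Transcript copies per cell in two hypothetical ten-cell samples.
uniform_sample = [10] * 10            # every cell expresses 10 copies
mixed_sample = [0] * 8 + [50] * 2     # two cells express 50 copies, the rest none


def bulk_average(counts_per_cell: list[int]) -> float:
    """What an assay on a cell extract reports: the mean over all cells."""
    return mean(counts_per_cell)


def fraction_expressing(counts_per_cell: list[int], min_copies: int = 1) -> float:
    """What a single-cell readout adds: the share of cells with the transcript."""
    expressing = sum(count >= min_copies for count in counts_per_cell)
    return expressing / len(counts_per_cell)


if __name__ == "__main__":
    for name, sample in [("uniform", uniform_sample), ("mixed", mixed_sample)]:
        print(name, bulk_average(sample), fraction_expressing(sample))
```

Both samples average ten copies per cell, yet only the per-cell view reveals that in the mixed sample the signal comes from a small subpopulation.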

Global studies of gene expression

Many of the decisions that cells take concerning survival, growth and differentiation are reflected in altered patterns of gene expression. mRNA expression can be studied globally with techniques such as reverse transcription quantitative polymerase chain reaction (RT-qPCR)44, microarrays or, more recently, sequencing technologies45-46. RT-qPCR is a variant of quantitative PCR (qPCR) in which the RNA is first reverse transcribed into its complementary DNA (cDNA) using a reverse transcriptase enzyme47. The RT-qPCR technology is easy to perform, capable of high-throughput analysis of up to hundreds of known transcripts at a time and can combine low LOD with reliable specificity48. These assays can also be used for quantification of mRNA targets. To achieve proper quantification of mRNA expression by RT-qPCR, the RNA should be of good quality, internal controls as well as standard curves should be included, and samples should be normalized against relatively constantly expressed genes, so-called housekeeping genes48. RT-qPCR experiments that rely on RNA extraction from sometimes complex tissue samples will give an average value from numerous variable subpopulations of cells of different lineages at diverse stages of differentiation. This average value can be misleading in attempts to compare mRNA expression levels between different individuals44. Moreover, comparison of the expression patterns of housekeeping genes in subpopulations of cells derived from the same individual revealed differences in mRNA levels. This provides evidence that cellular subpopulations of the same origin are highly heterogeneous49. This is a recurrent subject in this thesis and will be discussed further in the coming sections.
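The normalization against housekeeping genes mentioned above is often carried out with a comparative threshold-cycle calculation. The following is a minimal sketch of that idea, not the procedure used in the cited studies. It assumes a perfect doubling of product per cycle, and the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript in a sample relative to a control.

    Assumption (not from the thesis): amplification efficiency is 100%,
    so the amount of product doubles with every PCR cycle.
    """
    # Normalize the target Ct to the housekeeping (reference) gene.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized sample to the normalized control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target appears two cycles earlier in the sample
# than in the control, while the housekeeping gene is unchanged.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(f"fold change: {fold}")
```

The result is still a population average: it says nothing about how the transcripts are distributed among the cells that were extracted.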

The microarray technology allows simultaneous characterization of expression levels on a genome-wide scale and has been applied for detection not only of mature mRNA, but also of non-coding RNAs50. For gene expression profiling by microarray technology, transcripts are isolated, labeled and hybridized to thousands of probes that are attached to a solid surface. Unreacted targets (those not bound to the DNA probes) are washed away and the remaining signals are detected and measured. However, array-based assays suffer from certain limitations, such as unreliable detection of low-abundance transcripts and cross-hybridization, which gives rise to unspecific signals50-51. This type of hybridization-based approach relies on prior knowledge of the genome sequence, in contrast to sequence-based approaches, which can directly determine the expression of novel transcripts and thereby allow identification of previously uncharacterized genes. The limitations associated with traditional sequencing, i.e. Sanger sequencing, such as relatively low throughput and high costs, resulted in the development of tag-based methods, e.g. Serial Analysis of Gene Expression (SAGE)52, that offer higher throughput and precise digital gene expression levels. However, the use of this approach is limited by disadvantages such as a laborious technical procedure and difficulties in resolving similar transcripts52-53. The development of a panel of next-generation sequencing technologies (e.g. 454/Roche, Illumina, SOLiD and Helicos) enabled a new kind of transcriptome study, termed RNA-Seq (RNA sequencing), for gene expression profiling as well as for identification of genetic variants such as mutations, splice variants and fusion genes46,53-54. Although it is still at an early stage of use, RNA-Seq is believed to have many advantages over previously described methods, such as deep coverage and base-level resolution. However, the technique is associated with some limitations, such as non-uniformity of transcript coverage and transcript-length bias, which will be important to address if RNA-Seq is to become an invaluable tool for the characterization and quantification of the transcriptome46,55.

Single-cell studies of gene expression

High-throughput methods described in the previous section are preferable for some studies. However, expression analysis of single genes in single cells is also important. For example, during the last decade it has become evident that single-cell analyses are crucial for finding rare events in a sample56, e.g. cancer cells that are hidden in a group of normal cells. Moreover, studies reveal that gene expression can be highly divergent, even within a clonal population of cells57, and that genes are transcribed in bursts with long periods of expression inactivity58. In these cases, single-cell detection techniques that can identify cell-to-cell differences within a population become the method of choice. Furthermore, multiplex in situ analysis is required to appreciate the interplay between different cells in a heterogeneous tissue and their respective transcript expression profiles. A risk with bulk measurements is that differences at the inter- and intracellular level are missed, and that the average value for the sample instead gives false positives or negatives (Figure 3). Thus, the advantage of studying single molecules in individual cells is that it gives the correct frequency distribution of expressed molecules for single genes, yielding much more detailed information than can be gleaned from the mean value alone.

Figure 3. Single cell heterogeneity and correlated transcript levels. Single cell methods can distinguish (a) and (c) from (b) and (d), respectively, while average measurements cannot.
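The point made in Figure 3 can be illustrated with a few lines of arithmetic. The sketch below uses two invented populations of ten cells each. The counts are hypothetical and chosen only so that both populations have the same mean.

```python
from statistics import mean

# Hypothetical transcript counts per cell (not measured data).
uniform_population = [10] * 10        # every cell carries 10 copies
mixed_population = [0] * 9 + [100]    # one rare cell carries 100 copies


def fraction_expressing(counts, threshold=1):
    """Fraction of cells with at least `threshold` transcripts."""
    return sum(c >= threshold for c in counts) / len(counts)


for name, counts in [("uniform", uniform_population),
                     ("mixed", mixed_population)]:
    print(name, "mean:", mean(counts),
          "fraction expressing:", fraction_expressing(counts))
```

A bulk measurement would report the same average for both populations, whereas counting per cell separates them immediately.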


The in vitro techniques described in the previous section use mRNA isolated from cells or tissues to determine expression levels. However, precise dissection of tissues can be difficult to achieve without inadvertently including some irrelevant surrounding cells, which can lead to false results. Exact sampling is especially important for diagnostic analysis, where samples need to be as pure as possible, with no contamination from normal cells, to prevent uncertain or incorrect results. In a SNP array genotyping study, it was shown that when sample purity decreases to 80% or lower, problems with uncertain or incorrect calls arise59. This type of sample contamination could, for example, lead to problems in distinguishing between true LOH and other mechanisms of allelic imbalance34 in cancer.
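Why contaminating normal cells blur an allelic signal can be seen from a simple mixing calculation. This is a sketch under stated assumptions, not the model used in the cited study: every cell is assumed to contribute the same amount of DNA, normal cells are heterozygous (allele fraction 0.5), and tumor cells show complete LOH (allele fraction 1.0). The function name is hypothetical.

```python
def observed_allele_fraction(purity, tumor_fraction=1.0, normal_fraction=0.5):
    """Allele fraction measured in a mixture of tumor and normal cells.

    purity is the fraction of tumor cells in the sample (0 to 1).
    Assumes equal DNA content per cell (an illustrative simplification).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity * tumor_fraction + (1.0 - purity) * normal_fraction


for purity in (1.0, 0.8, 0.5):
    print(f"purity {purity:.0%}: observed allele fraction "
          f"{observed_allele_fraction(purity):.2f}")
```

Under these assumptions the observed value drifts from 1.0 towards 0.5 as purity falls, so complete LOH in an impure sample starts to look like a partial allelic imbalance.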

One method that offers mRNA as well as protein expression analysis is laser-capture microdissection (LCM) of single cells isolated from a certain location in a heterogeneous tissue60. The technique can precisely target and capture the cells of interest for a wide range of downstream analyses, such as RT-qPCR, microarrays and sequencing. However, LCM has a number of potential drawbacks. The procedure is expensive, time-consuming and limited to amplification-based techniques61. Another major limitation is the need to identify the cells of interest based on morphologic characteristics, which in turn requires a trained histologist or pathologist62.

In situ detection of DNA and RNA

In situ analyses can achieve precise spatial localization within morphologically preserved cells or tissues as they occur in their natural situation. Studying tissues can also give comprehensive information about the origins of the different cell types and identify regions containing cells with similar characteristics. Impressive amounts of tissue material are stored in biobanks around the world, so there is still a lot of new material to explore with methods suitable for these samples. In addition, studies of transcriptional variation and of cells with deviating gene expression make high-resolution in situ methods essential. Furthermore, to address limitations associated with in situ analysis, such as limited sample size and costly reagents, the tissue microarray (TMA) technology was developed, which facilitates parallel molecular analysis of thousands of tissue specimens. TMAs are produced by punching out samples from selected tissue regions and distributing them on a single slide63. The TMA technology has been widely applied within the field of cancer research for diagnostics and drug target discovery64-65.

Because of the nature of the papers described in this thesis, I will from here on focus on in situ analyses.


In situ hybridization

A technique for in situ hybridization was first described in 1969 for detection of ribosomal DNA66. At first radio-labeled probes were used, but many non-isotopic labeling variations have since been developed. Non-radioactive hybridization methods can be divided into two groups: direct and indirect. In the direct method, the detectable label is attached directly to the probe, so the resulting hybrid can be visualized in a microscope immediately after hybridization. This can be accomplished by introducing labeled nucleotides (e.g. carrying fluorophores) into the probes. With the indirect method, the labeled probe is not visualized directly. Instead, a reporter molecule is bound to the label after hybridization, which enables visualization of the target in a microscope. Commonly used labels for indirect approaches are biotin (detected with streptavidin) and digoxigenin. The direct method is simple and fast and best suited for detection of repetitive sequences and multicopy genes, whereas the indirect method is more labor intensive and instead suitable for low-copy target sequences67.

The first fluorescent in situ hybridization (FISH) application came in 198068, when fluorescently labeled RNA probes were used for detection of specific DNA sequences. Since then, numerous variants have been developed and applied for different purposes69. Fiber-FISH allows high-resolution analysis of stretched DNA, permitting mapping of genes and segments of DNA on fibers of chromatin70. Flow-FISH combines in situ hybridization with flow cytometry71, whereas Q-FISH (quantitative FISH) permits measurements of signal intensity and is commonly used for studies of telomere lengths72. This development made multiplex analysis of nucleic acids possible by using probes labeled with different fluorophores. Over the years, probe design has been optimized with respect to probe length and labeling. Long FISH probes cannot be used for detection of short DNA sequences and can give rise to high background, since their sequences may include repeated regions that bind unspecifically to incorrect targets. Shorter probes, on the other hand, can more easily enter cells and nuclei but are associated with a decrease in hybridization efficiency and selectivity. Because of this poor selectivity, FISH is not suitable for diagnostic detection of highly similar sequences or SNPs.

More recently, FISH has been used for detection and quantification of individual RNA molecules. In one study, oligonucleotides labeled with many fluorophores were hybridized to single β-actin mRNAs to generate high-intensity signals that could be quantified by comparing their intensities to background fluorescence73. Problems with synthesizing heavily labeled probes and differences in target intensities led another research group to develop a slightly modified approach for single mRNA detection in situ. By using large numbers of oligonucleotides, each labeled with a single fluorophore, they were able to detect individual mRNA molecules in fixed cells, which were identified as uniform signals by fluorescence microscopy74. Recently, a multiplexed FISH method based on orthogonal amplification with hybridization chain reactions (HCR)75 was described for simultaneous detection and imaging of five mRNAs in fixed whole-mount zebrafish embryos76.

Although these mRNA-detection approaches can offer sufficient detection efficiencies and a possibility to determine transcript copy numbers in single cells or in multiplex, the challenge of distinguishing between highly similar sequences remains. Efforts have been made to circumvent this limitation as well as general difficulties associated with in situ analysis, such as low target abundance and poor target accessibility. Modified nucleotides incorporated into hybridization probes can improve the binding affinity to a target. Furthermore, approaches for efficient in situ detection of nucleic acids can be categorized by strategy: target amplification, signal amplification and probe amplification77. These approaches will be discussed in the following sections.

Modified nucleotides in hybridization probes

Some strategies to improve the resolution of FISH involve the incorporation of modified nucleic acid analogues into the hybridization probes. Peptide nucleic acids (PNA) are synthetic DNA analogues with protein-like backbones that have unique properties, such as forming very stable duplexes with DNA and RNA78. Unlike natural nucleic acids, PNAs are uncharged compounds, which results in greater hybridization stability and a higher melting temperature (Tm) for DNA-PNA hybrids than for natural DNA duplexes. This high-affinity binding was later exploited in a FISH study in which PNA probes were used for discrimination of repeated sequences that differed by only a single base pair79. However, since PNAs are neutral molecules without any charge, they are poorly soluble in water compared to DNA and have a tendency to aggregate80. A peculiar property of this type of nucleic acid hybrid is that PNA strands can also form triple helices with DNA, giving structures with very high Tm81.

Locked nucleic acids (LNA) are another type of nucleic acid analogue with exceptional hybridization affinity towards complementary DNA and RNA molecules. The synthetic LNA molecule contains a methylene bridge on the ribose ring between the 2´-oxygen and the 4´-carbon, thereby locking the structure into a conformation with high binding affinity and reduced conformational flexibility. An LNA-DNA duplex provides a substantial increase in thermal stability, with the ability to increase the Tm of an oligonucleotide by +1 to +8 °C for DNA and +1 to +10 °C for RNA per LNA monomer introduced82. Furthermore, LNA probes have high discriminatory power between matched and mismatched sequences, which makes them well suited for sensitive nucleic acid detection. Unlike PNA probes, LNA-modified oligonucleotides are fully soluble in water, which simplifies experimental implementation83. LNA-modified probes have been used in various applications, including in vitro discrimination of single-base mismatches84, FISH studies for detection of repeated genomic sequences83,85 and miRNAs86, and whole-mount in situ hybridization detection of mRNAs87.
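The per-monomer figures quoted above can be turned into a rough range for a modified probe. The sketch below simply multiplies the quoted increments by the number of LNA monomers. Treating the effect as additive is an assumption made for illustration (the real effect depends on sequence and position), and the function name and the starting Tm are hypothetical.

```python
# Tm increase per LNA monomer in degrees C, as quoted in the text.
TM_GAIN_PER_MONOMER = {"DNA": (1.0, 8.0), "RNA": (1.0, 10.0)}


def lna_tm_range(unmodified_tm_c, n_lna, target="DNA"):
    """Rough lower and upper Tm estimate for a probe with n_lna LNA monomers.

    Assumes the per-monomer gains simply add up (illustrative only).
    """
    low, high = TM_GAIN_PER_MONOMER[target]
    return (unmodified_tm_c + n_lna * low, unmodified_tm_c + n_lna * high)


# Hypothetical probe with an unmodified Tm of 55 degrees C and three LNA monomers.
print("against DNA:", lna_tm_range(55.0, 3, "DNA"))
print("against RNA:", lna_tm_range(55.0, 3, "RNA"))
```

The width of the resulting interval is a reminder that the quoted increments give an order of magnitude, not a prediction for a particular probe.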

In situ polymerization

To increase the sensitivity of traditional in situ hybridization, target-amplification strategies involving in situ polymerization have been developed. Over two decades ago, in situ PCR was developed for detection of DNA molecules88. The method performs PCR directly on cells and tissues, with elongation of sequence-specific primers and amplification in a conventional thermal cycler. The amplified targets can be detected directly with labeled nucleotides or indirectly via in situ hybridization of labeled target-specific probes, which is more specific and therefore preferred. Further modifications of the technique led to in situ RT-PCR for detection of RNA89. In practice, however, the method is associated with many problems, such as low amplification efficiency, poor reproducibility, sensitivity and specificity, as well as high background, which limit its practical application77,90-91.

A similar technique, also based on in situ polymerization, is the primed in situ labeling (PRINS) procedure, in which an unlabeled, target-specific probe is hybridized and used as a primer for chain elongation in situ using Taq polymerase and labeled nucleotides92. The method has mainly been used for detection of chromosomes in metaphase spreads and interphase nuclei93, but has also been adapted for mRNA identification in cell lines94. Although PRINS has the specificity to discriminate between single nucleotide differences, it cannot detect low copy-number sequences because of its limited sensitivity77,90-91.

In situ PCR and PRINS are target amplification-based techniques that were developed to address the need to amplify and detect targets in situ, and they represent rapid and relatively inexpensive alternatives to some in situ hybridization applications. However, because of the limitations described above, neither method is suitable for detection and visualization of low-copy single nucleotide variants in situ.

Signal amplification after in situ hybridization

An alternative approach to improve detection of nucleic acids is to increase the signal rather than amplifying the target or the probe. The branched DNA (bDNA) signal amplification technique uses sets of labeled probes, hybridized sequentially to the target nucleic acid to create comb-like DNA structures, which generate chromogenic or fluorescent signals in an alkaline phosphatase-catalyzed reaction. The bDNA in situ hybridization method has been used for specific detection of DNA and RNA molecules in different cell types down to only a few DNA copies per cell95-96. Another strategy is the tyramide signal amplification (TSA) method, which is based on the ability of horseradish peroxidase (HRP) to convert fluorescent or hapten-labeled tyramine molecules into a highly reactive oxidized intermediate that can bind tyrosine at the site of HRP-probe binding. The detection signal is on the order of 500- to 1,000-fold stronger than with conventional procedures97, which makes the method applicable for sensitive in situ hybridization experiments on low-copy nucleic acids. However, because the amplification is so strong, TSA can result in an unfavorable signal-to-noise ratio arising from, e.g., endogenous peroxidases91.

Amplifying the signal instead of the target or the probe makes this strategy a quantitative tool, since the original amount of the target remains unmodified. Another advantage of signal amplification is that it is performed after in situ hybridization, so the specificity is not compromised. On the other hand, if specificity is poor, the unspecific signals will be amplified as well, which demands highly specific probe hybridization.

Padlock probes

The padlock probe concept was introduced two decades ago and is an extension of the oligonucleotide ligation assay (OLA)98. Padlock probes have many advantageous characteristics and offer highly selective detection of DNA and RNA in solution and in situ. First, padlock probes are linear oligonucleotides of approximately 70 to 100 nucleotides in length with target-complementary 5´ and 3´ ends, which provide dual target recognition because both probe arms must hybridize correctly to the target. This property allows for highly multiplexed assays with limited cross-reactivity between probes99-101. Secondly, when padlock probes hybridize to their correct target, the ends of the probe are brought together in a head-to-tail orientation, with only a nick in between. The nick can be sealed by a DNA ligase, creating circles that are locked onto the target strands like padlocks102, as illustrated in Figure 4. This nick ligation will only occur if there is a perfect match between probe and target at the ligation junction, leaving mismatched allelic probes linear and unamplified103. Thirdly, upon circularization, the probes can act as templates for amplification using PCR or rolling-circle amplification (RCA), which will be described further in the following sections. Moreover, the resulting probe/target duplex becomes topologically locked and will thereby withstand stringent washes, which reduces the amount of non-specific signal. These characteristics make padlock probes especially suitable for detection and genotyping of point mutations and SNPs. In those cases, the probes are designed so that they differ only at one nucleotide position at the 3´ end, since the ligase is particularly sensitive to mismatches at that site104.

In our lab, the two DNA ligases commonly used for ligation of padlock probes are T4 DNA ligase and the thermostable Ampligase. They differ in important properties. One is the source of energy: T4 DNA ligase uses ATP and is dependent on NaCl105, while Ampligase uses NAD. Another difference is that Ampligase is active and stable at higher temperatures than T4 DNA ligase, which enables more stringent reaction conditions. Ampligase is also associated with higher ligation specificity than T4 DNA ligase, which makes it more suitable for discrimination of similar target sequences, e.g. in genotyping103.

Figure 4. Padlock probe circularization upon binding to correct target. (a) Linear padlock probe with target-complementary 5´ and 3´ ends (green) and recognition site for detection (blue) is designed to hybridize to complementary target (red). (b) Upon perfect matched hybridization, the nick is sealed by a ligase creating a circular probe locked onto its target. (c) Upon circularization, the padlock probe can act as template for amplification (e.g. RCA) and finally the complementary products can be visualized by hybridization of a fluorophore (star) labeled oligonucleotide (yellow).
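The geometry of the probe arms described above can be made concrete with a short sketch. It derives the two arms of a padlock probe from a target sequence so that the 3´ terminal base of the probe pairs with the interrogated target base. This is an illustration of the principle only, not the design procedure used in the papers: the function names, the toy target, the arm length and the backbone sequence are all hypothetical, and real probes have longer arms and a backbone carrying a detection site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_padlock(target, junction, arm_len, backbone):
    """Return (five_arm, three_arm, probe), all written 5' to 3'.

    target   : target strand, 5' to 3'
    junction : index of the target base that pairs with the probe's 3' end
    The 5' arm pairs with the arm_len target bases before `junction`,
    the 3' arm with the arm_len bases starting at `junction`.
    """
    if junction - arm_len < 0 or junction + arm_len > len(target):
        raise ValueError("arms do not fit on the target")
    five_arm = revcomp(target[junction - arm_len:junction])
    three_arm = revcomp(target[junction:junction + arm_len])
    return five_arm, three_arm, five_arm + backbone + three_arm


# Toy example: two alleles differing at index 8 (A in wild type, G in mutant).
wild_type = "ACGTTGCAAGGCTTAC"
mutant = "ACGTTGCAGGGCTTAC"
for name, allele in [("wild type", wild_type), ("mutant", mutant)]:
    five_arm, three_arm, probe = design_padlock(allele, 8, 4, "TTTT")
    print(name, probe, "3' terminal base:", probe[-1])
```

In the circularized probe the 3´ arm is followed directly by the 5´ arm, so their concatenation equals the reverse complement of the hybridized target stretch, and the two allele-specific probes differ only in their 3´ terminal base.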


Gap-fill padlock probes have a slightly different design from the traditional padlock probe: the target-complementary ends are designed to hybridize a distance apart from each other. The gap formed between the hybridized arms of the probe is then filled by a polymerase prior to ligation106-107. One example of a gap-fill padlock probe is the molecular inversion probe (MIP), which leaves a single-nucleotide gap when hybridized to its target and is therefore highly suitable for genotyping assays100. Another probe that belongs to the family of padlock probes is the selector probe. A selector probe is designed to hybridize to both ends of a DNA restriction fragment target, creating a circle that is closed by ligation108. The selector probe technique has been applied for studies of genetic variation109, but is now mainly used for targeted resequencing applications110-111.

Padlock probes have been widely used over the years in various applications for sensitive detection of nucleic acids, e.g. in studies of genotyping100-101,104,112, gene copy number113-114, gene expression analysis112,115 and for detection of infectious pathogens116.

Rolling-circle amplification

Detection of padlock probes can be done in many ways, but from an in situ perspective, with microscopy as the readout format, the aim is to produce the strongest possible signal that can be distinguished from the background. RCA, also known as rolling circle replication (RCR), is an isothermal amplification technique for circular DNA molecules that creates long single-stranded DNA molecules consisting of tandem repeats of the complement of the original circle117. Thus, RCA generates copies at a rate that is linear over time, unlike PCR, which proceeds exponentially. Since the contiguous rolling circle products (RCPs) by nature collapse into micrometer-sized DNA bundles, RCA is highly suitable for localized detection. The RCPs become detectable in a fluorescence microscope through the local enrichment of short fluorescent probes that hybridize to the detection sites of the coiled RCPs118.
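The difference between linear and exponential amplification mentioned above can be shown with a small calculation. This is a sketch only: the RCA rate of ten copies per minute is a hypothetical placeholder, and the PCR model assumes an ideal doubling per cycle.

```python
def rca_copies(minutes, copies_per_minute):
    """Copies of the circle in one RCA product: grows linearly with time."""
    return copies_per_minute * minutes


def pcr_copies(cycles):
    """Copies per starting template in ideal PCR: doubles every cycle."""
    return 2 ** cycles


# Hypothetical RCA rate, chosen only for illustration.
for minutes in (10, 30, 60):
    print(f"RCA after {minutes} min: {rca_copies(minutes, 10)} copies")
for cycles in (10, 20, 30):
    print(f"PCR after {cycles} cycles: {pcr_copies(cycles)} copies")
```

The RCA copies stay joined in one concatemer at the site of the circle, which is why the comparatively modest linear yield is still sufficient for localized detection.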

The enzyme used in RCA, Φ29 DNA polymerase, possesses several important features that make it well suited for efficient amplification of circular DNA molecules. The polymerase carries a 3´ to 5´ exonuclease activity that enables proofreading of the newly synthesized DNA119-120. This activity is also useful for initiating polymerization when no complementary primer is added and the reaction is instead primed by the target. For RCA to begin, the target strand must have a free end close to where the padlock probe hybridizes, because the reaction is inhibited by the topological link formed between the padlock probe and its target sequence121. If a free 3´ end is introduced on the target DNA strand close to the padlock probe site, this inhibition is circumvented and the reaction can proceed efficiently. This variant of RCA has been named target-primed RCA and is applied in all papers in this thesis. Once the appropriate primer has been created and is in place, the Φ29 DNA polymerase switches its activity and starts incorporating nucleotides in the 5´ to 3´ direction. After replicating the entire circular padlock probe sequence, the polymerase reaches the priming site again. The polymerase then displaces the newly synthesized strand and continues for another round122. This strand-displacement activity keeps the amplification going, creating longer and longer RCPs, until the polymerase is inactivated or the temperature is changed. After approximately one hour of amplification, the resulting concatemeric product is roughly one µm in diameter and contains about 1,000 copies of the original padlock probe. This RCP can then be labeled by hybridization of complementary fluorescent detection oligonucleotides and is easily detected and visualized as a bright spot in a fluorescence microscope (Figure 5).
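The figures given above (about 1,000 copies after roughly one hour, and padlock probes of 70 to 100 nucleotides) imply an approximate product length and synthesis rate. The sketch below is back-of-the-envelope arithmetic on those numbers only. The function names are hypothetical and the output should be read as an order-of-magnitude estimate.

```python
def rcp_length_nt(copies, padlock_length_nt):
    """Length of the single-stranded concatemer in nucleotides."""
    return copies * padlock_length_nt


def implied_rate_nt_per_min(copies, padlock_length_nt, minutes):
    """Average synthesis rate implied by the product length and reaction time."""
    return rcp_length_nt(copies, padlock_length_nt) / minutes


# About 1,000 copies in about 60 minutes, for the shortest and longest
# padlock probes mentioned in the text.
for length in (70, 100):
    print(f"{length}-nt padlock: {rcp_length_nt(1000, length)} nt product, "
          f"about {implied_rate_nt_per_min(1000, length, 60):.0f} nt/min")
```

A single RCP therefore carries on the order of a thousand binding sites for the detection oligonucleotide, which explains why it appears as a bright, countable spot.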

Figure 5. A rolling circle product (RCP). (a) Padlock probe amplification with RCA creates a long linear molecule that spontaneously coils up into a DNA bundle. The RCP becomes detectable by hybridization of fluorescence-labeled probes (stars) and is visualized as a bright spot (square) in a cell. (b) Hundreds of RCPs representing detected mRNA targets in the cytoplasm of a human fibroblast. Scale bar, 20 μm.

A great advantage of single RCPs appearing as distinct bright signals, each representing an individual molecule, is that they permit exact quantification of detected targets in solution or in situ. In a solution-phase assay for detection of single DNA or protein molecules using padlock probes, RCPs have been enumerated using standard fluorescence microscopy by scanning samples that were pumped through a microchannel116. Highly precise quantification of detected targets has also been demonstrated in studies with RCPs immobilized on microscope slides113, as well as in several in situ analyses in which RCPs in single cells were quantified112,123-126. Furthermore, the number of counted molecules can be compared with the expected number of target molecules in the same sample, giving a value for the overall detection efficiency of the assay.
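The detection efficiency mentioned above is simply the ratio of counted signals to expected targets. A minimal sketch follows. The function name and the example counts are hypothetical and do not come from the papers.

```python
def detection_efficiency(counted_rcps, expected_targets):
    """Fraction of the expected target molecules that gave rise to an RCP."""
    if expected_targets <= 0:
        raise ValueError("expected_targets must be positive")
    return counted_rcps / expected_targets


# Hypothetical example: 300 RCPs counted where 1,000 target molecules
# were expected from an independent measurement.
efficiency = detection_efficiency(300, 1000)
print(f"detection efficiency: {efficiency:.0%}")
```

The expected number has to come from an independent method, so any uncertainty in that reference carries over directly into the estimated efficiency.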


In situ detection of nucleic acids with padlock probes and RCA

Padlock probes were recognized early on as useful for in situ analysis, and the first published application demonstrated genotyping of centromeric sequences127 using hapten or fluorescence labeled probes. With this approach, it was not possible to detect single molecules because of high background from unspecifically bound probes. Soon thereafter, the first combined in situ assay using padlock probes and RCA was presented on cytological preparations, with the potential of increasing the signal, and thereby specificity, and decreasing unspecific background117. However, the detection efficiency was reported to remain very poor, possibly due to the topological inhibition of RCA when the probe is locked to the target121. In response to this, an alternative approach was presented a few years later, in which the target strand was enzymatically prepared to create a nearby free end that could prime the RCA128. Later on, this target-preparation process was refined into a robust protocol for in situ detection and genotyping of mitochondrial DNA (mtDNA) and the MELAS mutation with padlock probes and RCA104. By combining restriction enzyme digestion with an exonuclease step for target preparation, the double-stranded DNA was made single-stranded to enable binding of padlock probes to their targets. By using two padlock probes, one for each genotype (designed with different tag sequences), the wild-type genotype was easily distinguished from the mutant in both homo- and heteroplasmic cell lines as well as in fresh-frozen tissue104. Other in situ applications of padlock probes and RCA have been demonstrated for the detection of repetitive genomic sequences129-130 as well as for the detection of infection by Anaplasma spp. in epithelial cells131. A limitation of an approach based on specific cleavage by restriction enzymes is that there may be no suitable restriction site in close proximity to the padlock probe site, potentially inhibiting padlock probe hybridization and/or initiation of RCA. This issue will be discussed further in the background of paper I.
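The two-probe genotyping readout described above, with one tag sequence per genotype, lends itself to a simple per-cell classification once the RCPs of each tag have been counted. The sketch below is an illustration rather than the analysis used in the cited work. The function name, the minimum-signal threshold and the example counts are hypothetical.

```python
def classify_cell(wild_type_rcps, mutant_rcps, min_signals=3):
    """Label a cell from the RCP counts of the two genotype-specific probes.

    min_signals is a hypothetical cut-off below which no call is made.
    """
    total = wild_type_rcps + mutant_rcps
    if total < min_signals:
        return "undetermined"
    if mutant_rcps == 0:
        return "wild-type"
    if wild_type_rcps == 0:
        return "mutant"
    return "heteroplasmic"


# Hypothetical (wild-type, mutant) RCP counts for four cells.
cells = [(12, 0), (0, 9), (7, 5), (1, 0)]
for wt, mut in cells:
    print((wt, mut), "->", classify_cell(wt, mut))
```

In practice a tolerance for occasional unspecific signals would be needed before calling a cell heteroplasmic; the strict zero-count rule here is kept only for brevity.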

There has always been great interest in the possibility of performing gene expression profiling and genotyping of transcripts in situ using padlock probes and RCA. Direct detection of RNA with padlock probes has had limitations, since it is well known that ligation on an RNA template is not as selective and efficient as on DNA132-133. Moreover, it was uncertain whether Φ29 DNA polymerase can use RNA as a primer for RCA. As a consequence, many strategies have been developed with the aim of finding a working method that allows detection and genotyping of transcripts in cells and tissues. A variant of the padlock probe was demonstrated that could avoid these problems to some extent. This probe, called the turtle probe, created an internal hairpin at the site where the ligation event occurred, which circumvented the limitations associated with RNA-templated ligation134. However, the method was not suitable for genotyping and was limited to detection of non-polyadenylated RNA and of target sequences at the very end of the RNA, which excludes the majority of mRNAs. In another study, in which padlock probes and target-primed RCA were applied for RNA detection in situ, it was demonstrated that Φ29 DNA polymerase can in fact digest the 3´ end of RNA. The same study showed detection of internal sequences, although the efficiency was estimated to be as low as 1%135.

The development of a method that enables mRNA detection and genotyping in situ was the aim of this thesis and will be described and discussed further in the papers presented herein.

Proximity ligation assay

Proximity ligation assay (PLA) is a specific method for detection of target protein molecules that can, like padlock probes, be adapted for in situ analysis in cells and tissue. The technique employs proximity probes, which are chimeric molecules consisting of an antibody for recognizing the protein of interest and a conjugated oligonucleotide for subsequent detection purposes. Upon binding of pairs of proximity probes to the same protein target, the oligonucleotides attached to the antibodies become colocalized and can be joined by ligation to serve as a template for amplification136. PLA can be varied in many ways and allows sensitive detection of a wide range of targets. For in situ PLA, the proximity probes direct the formation of a circular DNA molecule that serves as template for amplification via RCA, similar to the in situ padlock probe approach described in the previous section. In situ PLA permits detection of individual proteins, their modifications and protein-protein interactions in cells and tissue126,137.


Present investigations

The aim of the investigations presented in this thesis was to develop an in situ method for detection and genotyping of single transcripts in cells and tissues using padlock probes and target-primed RCA. This line of study resulted in the four papers presented below. The first paper presents an approach for targeted cleavage of ssDNA targets, e.g. extended 3´ ends that can inhibit RCA initiation. The cleavage method can be useful as a complement to restriction endonucleases. The second paper describes a method for detection and genotyping of single mRNA molecules in fixed cells and tissues, which was thereafter applied for detection of KRAS point mutations in complex tumor tissues, presented in paper III. Finally, the fourth paper describes an in situ method for simultaneous detection of individual endogenous protein modifications and mRNA molecules in single cells.

Paper I: Glycosylases and AP-cleaving enzymes as a general tool for probe-directed cleavage of ssDNA targets

Aim of the study

The aim of paper I was to develop a strategy for directed cleavage of ssDNA by using a set of repair enzymes and their cleaving activity towards certain mismatches. By having an A nucleotide at the target cleavage position mispaired with a G in the complementary oligonucleotide probe, the cleavage becomes site-specific and useful as a complement to other cleaving enzymes, e.g. restriction endonucleases. This could be applied for cleavage of extended 3´ ends that inhibit initiation of an RCA reaction.

Background

Only a small number of techniques exist today for precise cleavage of DNA. Restriction enzymes can be directed for specific cleavage if the duplex structure contains the correct recognition sequence of the enzyme. Even though hundreds of these enzymes are commercially available, this dependency on a multi-nucleotide site can severely restrict assay design and force cleavage at an unsatisfactory site. Moreover, in the case of multiple targets, the costs will increase substantially with the number of enzymes that are required. In this paper we developed a way to direct DNA cleavage to a certain target site without the need for any nucleotide modifications, restriction endonucleases or expensive equipment. By using a pair of DNA repair enzymes138 that recognize and cleave at mismatches, we can decide where the ssDNA will be cleaved. Briefly, the method is based on hybridization of an oligonucleotide to a target sequence. The DNA duplex formed between target and probe is designed such that a deliberate mismatch is created. The internal mismatch will lead to cleavage at a precise nucleotide site. The 5´ and 3´ ends generated by the cleavage reaction are then suitable for subsequent enzymatic reactions such as ligation or polymerization.

MutY glycosylase, which is an important member of the base excision repair (BER) pathway, recognizes G:A139, oxidized guanine (OG):A140 and, to a lesser extent, C:A141 mismatches in DNA. In the case of an internal G:A mismatch in a duplex, MutY glycosylase removes the undamaged A, leaving behind an apurinic (AP) site. This AP or abasic site can then be severed by an enzyme with AP-cleaving properties, such as Endonuclease IV142 or formamidopyrimidine–DNA glycosylase (Fpg)142, leaving the rest of the molecule intact. An oligonucleotide is designed such that, when it hybridizes to a complementary target, a deliberate G:A mismatch is created that will cause cleavage at the desired position in the sequence. The aim of the study was to examine whether this strategy could be used for in situ cleavage of nucleic acid targets that are hybridized to a padlock probe and suffer from inefficient RCA due to extended 3´ ends.
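The probe design principle described above can be illustrated with a minimal sketch. It assumes an idealized model in which the probe is the exact reverse complement of the target, except for a G placed opposite the A chosen as the cleavage position. The function names and the example sequence are hypothetical and are not taken from paper I. The chemistry of the ends produced by the different AP-cleaving enzymes is not modelled.

```python
# Illustrative sketch only: hypothetical helper names and a made-up sequence.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def design_mismatch_probe(target, cut_index):
    """Return a probe (written 5'->3') that is complementary to `target`
    except for a G placed opposite the A at `cut_index` (0-based)."""
    target = target.upper()
    if target[cut_index] != "A":
        raise ValueError("the target cleavage position must be an A")
    # Complement each base, still in the target's 5'->3' order.
    paired = [COMPLEMENT[base] for base in target]
    # Swap the T that would pair with the chosen A for a G (deliberate G:A mismatch).
    paired[cut_index] = "G"
    # Reverse so that the probe is written 5'->3'.
    return "".join(reversed(paired))


def cleavage_fragments(target, cut_index):
    """Idealized products after removal of the A and cleavage of the AP site:
    the target sequence upstream and downstream of the removed base."""
    target = target.upper()
    return target[:cut_index], target[cut_index + 1:]


if __name__ == "__main__":
    target = "GCTTGACGATCCGT"   # hypothetical ssDNA target, 5'->3'
    cut = 8                     # 0-based index of the A to be removed
    print(design_mismatch_probe(target, cut))
    print(cleavage_fragments(target, cut))
```

In this toy model a single base change in the probe is enough to move the cleavage position to any A in the target, which is the flexibility that a fixed restriction enzyme recognition sequence lacks.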

Results

We initiated the study by investigating the enzymatic activity of MutY glycosylase with the use of a pair of oligonucleotides that, when hybridized, created an internal G:A mismatch. MutY glycosylase was added for removal of the undamaged A of the mismatch, and the generated AP site was cleaved by a combination of heat and alkali treatment. The cleaved and non-cleaved DNA were thereafter separated and identified by polyacrylamide gel electrophoresis (PAGE), as the A-containing strand was 32P-labeled. By comparing the band intensities on the gel, we demonstrated that approximately 85% of the sites were cleaved, which agrees well with the literature143. We thereafter wanted to explore the cleavage efficiencies of the different AP-cleaving enzymes, since the traditional chemical strategy of AP cleavage using strong heat/base treatment is not an option for in situ analyses. For this, we used a model system with a hybridized duplex in which the labeled strand contained a U that is effectively removed by treatment with uracil-DNA-glycosylase
