On diverse biophysical aspects of genetics

from the action of regulators to the characterization of transcripts

Aymeric Fouquier d’Hérouël

Thesis which, with the permission of Kungliga Tekniska högskolan (KTH Royal Institute of Technology), is presented for public examination for the degree of Doctor of Technology on Wednesday, 6 April 2011 at 10:00 in hall FB53, AlbaNova, Roslagstullsbacken 21, Kungliga Tekniska högskolan, Stockholm.

ISBN 978-91-7415-911-0 TRITA-CSC-A 2011:04

ISSN-1653-5723 ISRN-KTH/CSC/A–11/04-SE

© Aymeric Fouquier d’Hérouël, March 2011

Doctoral Thesis in Biological Physics

Typeset in LaTeX

KTH – Royal Institute of Technology

School of Computer Science and Communication SE-100 44 Stockholm

Sweden


Abstract

Genetics is among the most rewarding fields of biology for the theoretically inclined, offering both room and need for modeling approaches in the light of an abundance of experimental data of different kinds. Many aspects of the field are today understood in terms of physical and chemical models, joined by information theoretical descriptions. This thesis discusses different mechanisms and phenomena related to genetics, employing tools from statistical physics along with experimental biomolecular methods. Five articles support this work.

Two articles deal with interactions between proteins and DNA. The first one reports on the properties of non-specific binding of transcription factor proteins in the yeast Saccharomyces cerevisiae, due to an effective background free energy which describes the affinity of a single protein for random locations on DNA. We argue that a background pool of non-specific binding sites is filled up before specific binding sites can be occupied with high probability, thus presenting a natural filter for genetic responses to spurious transcription factor production.

The second article describes an algorithm for the inference of transcription factor binding sites for proteins using a realistic physical model. The functionality of the method is verified on a set of known binding sequences for Escherichia coli transcription factors.

The third article describes a possible genetic feedback mechanism between human cells and the ubiquitous Epstein-Barr virus (EBV). Forty binding regions for the major EBV transcription factor EBNA1 are identified in human DNA. Several of these are located near genes of particular relevance in the context of EBV infection and the most interesting ones are discussed.

The fourth article describes results obtained from a positional autocorrelation analysis of the human genome, a simple technique to visualize and classify sequence repeats, which constitute large parts of eukaryotic genomes. Applying this analysis to genome sequences from which previously known repeats have been removed gives rise to signals corroborating the existence of yet unclassified repeats of surprisingly long periods.

The fifth article combines computational predictions with a novel molecular biological method based on the rapid amplification of cDNA ends (RACE), coined 5’tagRACE. The first search for non-coding RNAs encoded in the genome of the opportunistic bacterium Enterococcus faecalis is performed here. Applying 5’tagRACE allows us to discover and map 29 novel ncRNAs, 10 putative novel mRNAs and 16 antisense transcriptional organizations.

Further studies, which are not included as articles, on the monitoring of secondary structure formation of nucleic acids during thermal renaturation and the inference of genetic couplings of various kinds from massive gene expression data and computational predictions, are outlined in the central chapters.

Keywords: transcription regulation, regulatory motifs, binding affinity, genetic interactions, secondary structure, sequence repeats, transcript characterization


To life!


Acknowledgements

Professor Erik Aurell, my supervisor at KTH, truly deserves the first mention.

Thank you for your scientific and friendly support during many years of study, for keeping a constantly positive attitude and for reminding me that the upper side of the medal is brighter and more than once flipping doubts into incitement.

Thank you also for an immense amount of freedom in the choice of my projects, eventually allowing me to explore and dive into genetics and molecular biology while keeping a foot in physics. I also thank my co-supervisor professor Massimo Vergassola for giving me the opportunity to spend fruitful time at Institut Pasteur and for introducing me to doctor Francis Repoila at INRA. Francis, you taught me many subtleties of molecular biology, from the very basic wielding of pipettes, to the intricacies of extracting and handling nucleic acids. During the years of our collaboration, you have been a great colleague and a teacher to me.

Doctor Maria Werner, my colleague at KTH, has been a true friend through the years we’ve spent in the theoretical physics department and later at CB. You kept me going when work was frustrating and were always a role model of a true investigative and demanding scientist to me. You’re in the real world now and I hope we can find the time to meet every now and then, laughing about the past and comparing, you know, like nunchuku skills, bow hunting skills, computer hacking skills. . .

Professor Ingemar Ernberg at Karolinska Institutet provided me with scientific food and shelter, trusted me with a bench in a molecular biology lab and allowed me to work in the inspiring environment of his group at MTC. I am thankful to have met such an open-minded and generous person, who allowed me to conduct parts of my experimental work at his lab. I also want to thank members of his group, foremost Fu Chen in memoriam, who helped me enormously with everyday lab-bench problems, LiFu Hu, Anna Birgersdotter, LI-Sophie Rathje, JieZhi Zou, Ziming Du, Ning Wang, and especially Gösta Winberg and Liudmila Matskova for introducing me to the lab and helping me get started gathering equipment from the secret catacombs of MTC, thank you very much! Rozina Caridha, thank you for making the sun shine at MTC and for continuing to invite me to the famous MTC pubs (one day I will make it).

I had the chance to spend some time of my work at INRA in Jouy en Josas, where I would especially like to thank Alexandra Gruss for inspiring discussions and occasional riddles, as well as Pascale Serror, David Halpern and Françoise Wessner, Philippe Palcy for providing me access to computational resources at INRA and Patrick Régent for cutting out those steel sheets I used to construct a home (read lab) made semidry electrotransfer unit with a couple of wires, matches and shreds of polystyrene, probably defying all odds of high-voltage electric hazard.

Professor Jerker Wiedengren at KTH and Evangelos Sisamakis agreed to collaborate with me on the experimental verification of crazy ideas and I am grateful for your trust. Evangelos, you taught me a great deal on fluorescence spectroscopy and I hope that our collaboration will persist.

Harriett Johansson at CB deserves the warmest thanks for her immense support! You shielded me from many burdens of administration in the most remarkable way. Your help with the many administrative and non-administrative problems I brought to you was extraordinary.

Thanks also go to Supriya Krishnamurthy for reviewing a draft of this thesis and giving me last minute feedback. You have also been a wonderful teacher of non-equilibrium statistical mechanics and I think I owe you a properly made chocolate cake.

During the last couple of years, I got to know many people with whom I shared pleasant moments, discussing science or life in general, around a coffee or drinking a beer. In no obvious order I would like to thank especially Jakob Wohlert, Martin Lindén, Marios Nikolaou, Petter Holme, Yasser Roudi, Nicolas Innocenti, Samuel Bottani, Hamed Mahmoudi, René Pfitzner, Marc Bailly-Bechet, Ilqar Abdullayev, Charles Ollion, Ali Tofigh, Jean-Baptise Masson, Mikael Lindahl, Hossein Farahani, Guillaume Voisinne, Lars Arvestad, Gabriel Östlund, Kristoffer Forslund, Örjan Ekeberg, Johannes Hjorth, Erik Fransén, Pak-Lee Chau, Oliver Frings and all staff at the KTH department of Computational Biology and the Stockholm Bioinformatics Center at SU.

Special thanks go to Dave Messina for keeping my spirits up ever since I joined CB. You have been a good friend and it is such friendship which is important in life. May the last months of your doctoral studies be paved with success! Let’s keep in touch, dude, at least once a year spilling some carvosterone.

Thank you also to my friends Cho, Leena, Qing, Oscar, Théres, Sebastian, Carola, Olaf, Gunnel, Glenn, Pia, Jonas, Åsa, Tom, Christian and Sofia for support and good times. We did not find many occasions to spend together recently, and I hope we can make up for it. Nevertheless, you make life worth living every time we meet!

My mum in memoriam and dad deserve very special thanks for their constant support, without which I would not have been able to venture into a research career. Finally, thank you Anna for your unconditional support and making sure I did not starve during the last days of working on this thesis. You are my life!

Stockholm, March 2011


Synopsis

This thesis consists of five parts: a general introduction to the topic in a broader sense, followed by a part on the physics and information of nucleic acids with a slight focus on the description of functional sequence motifs. The third part outlines experimental methods relevant to the work, both from a molecular biological and from an experimental physical point of view. The fourth part details some theoretical and computational aspects, dwelling on methods for the inference and identification of functional motifs and outlining techniques borrowed from statistical physics for the deduction of direct genetic interactions. Finally, five articles featuring my work are briefly summarized. These papers are included in the appendix.

Included Papers

• Paper I: Transcription factor concentrations versus binding site affinities in the yeast S. cerevisiae by Erik Aurell (EA), AFdH, Claes Malmnäs (CM) and Massimo Vergassola (MV) has been published in Physical Biology 4(2): 134–143 in 2007. This collaborative project was suggested by MV, Institut Pasteur. CM and I performed all computational work and I analyzed the results. CM and I wrote an initial draft. EA and MV wrote the final paper and I contributed editing. This article is also included in my licentiate thesis (Fouquier d’Hérouël, 2008).

• Paper II: Quadratic programming sampler: a motif finder using biophysical modeling by AFdH is publicly available as e-print through the reference arXiv:0802.0258v1 [q-bio.QM] since 2008. I designed the project together with EA, performed the analytical and computational work myself and wrote the paper. This article is included in my licentiate thesis (Fouquier d’Hérouël, 2008) as well.

• Paper III: FR-like EBNA1 binding repeats in the human genome by AFdH, Anna Birgersdotter (AB) and Maria Werner (MW) has been published in Virology 405(2): 524 – 529 in 2010. This collaborative project was designed by me and MW. I performed all computations. I, AB and MW analyzed the data and wrote the paper together.

• Paper IV: Autocorrelation maps of masked sequences reveal novel repeats by AFdH is previously unpublished. I stumbled upon results motivating this project when analyzing the data for Paper III. I wrote the manuscript.

• Paper V: A simple and efficient method to search for selected primary transcripts: noncoding- and antisense RNAs in the human pathogen Enterococcus faecalis by AFdH, Françoise Wessner (FW), David Halpern (DH), Joseph Ly-Vu (JLV), Sean Kennedy (SK), Pascale Serror (PS), Erik Aurell (EA) and Francis Repoila (FR) has been accepted by Nucleic Acids Research and has been available online as an advance access paper since January 25, 2011, doi:10.1093/nar/gkr012. This collaborative project was conceived by FR. I performed computational predictions. I, FW, DH, JLV, SK, EA and FR contributed with experimental work. I, FW, SK, PS and FR analyzed the results. FR drafted the paper to which I, FW and EA contributed during the writing process.

Remark

Some of the projects which I am working on are still ongoing at the time of writing this thesis. They are, however, closely related to the theme of this work, which is why I chose to include and mention some preliminary and unpublished data. This is the case for (1) a study on monitoring the thermal renaturation of RNA using intercalating fluorescent dyes, (2) a microarray analysis of the transcriptome of human nasopharyngeal cells in presence of Epstein-Barr virus encoded non-coding RNAs, (3) a high-resolution transcriptome analysis of Enterococcus faecalis, (4) the validation of predicted RNA-RNA interaction partners in this bacterium and (5) the inference of genetic couplings from small-sample transcriptomic data for Listeria monocytogenes.

The choice of topics in the following parts is of course biased by what I deem necessary knowledge to understand the appended papers, above some conceptual grasp of basic thermodynamics and a little knowledge on the biochemistry of genetics, and what I deem useful to appreciate some of the ideas behind the ongoing and unpublished work.

Let me recommend to the interested reader, who is mainly curious about the published studies advertised in the abstract, to read the introduction part before skipping to the final chapter, where more thorough and, hopefully also for the non-expert reader, comprehensible summaries of the articles can be found.


Contents

I Introduction

1 Scientific context
1.1 Systems Biology
1.2 Physics in Biology
1.2.1 Historical modeling approaches
1.2.2 Experimental techniques

2 Genetics in a nutshell
2.1 Mendelian laws
2.2 Central dogma of Molecular Biology
2.2.1 Transcriptional gene regulation
2.2.2 Post-transcriptional control
2.3 Noisy gene expression

II Physics & Information of Nucleic acids

3 Physics of nucleic acids
3.1 Building blocks
3.1.1 Sugar & base
3.1.2 Complementarity
3.2 Structural properties
3.2.1 Helix architecture
3.2.2 DNA vs RNA
3.3 Dynamic responses
3.3.1 Electric fields
3.3.2 Solvent effects
3.3.3 Metal ions
3.3.4 Temperature
3.3.5 Structure prediction

4 Information content of nucleic acids
4.1 Central dogma redux
4.2 Sequence signatures
4.2.1 Coding sequences
4.2.2 Functional motifs
4.2.3 Repeated elements

III Experimental methods

5 Molecular methods
5.1 Northern blotting
5.2 Tagged RACE

6 Fluorescence methods
6.1 Spectroscopy
6.1.1 Quenching mechanisms
6.1.2 Covalently labeled nucleic acids
6.1.3 Intercalation in dynamic measurement
6.2 Thermal renaturation
6.3 Microarray techniques

7 Transcriptome sequencing
7.1 Dideoxy termination sequencing
7.2 Next-Generation methods
7.2.1 Sequencing by synthesis
7.2.2 Sequencing by ligation

IV Inference of regulation & interaction

8 Inference of regulatory motifs
8.1 Representation of functional sites
8.1.1 Lexical description
8.1.2 Biophysical modeling
8.2 Stochastic methods
8.2.1 Gibbs motif sampler
8.2.2 Quadratic programming sampler
8.3 Choice of thresholds

9 Regulatory interactions
9.1 Mode and prediction of direct interactions
9.2 Inference of genetic couplings

Bibliography

V Papers

10 Summary of results
10.1 Paper I – Numbers and Affinity
10.2 Paper II – Motif Sampling
10.3 Paper III – Virus-Host Interactions
10.4 Paper IV – Novel Repeats
10.5 Paper V – Transcript Characterization

List of Figures

2.1 Main flow of information according to the central dogma

2.2 Schematics of activation (A) and repression (R) of transcription by DNA binding proteins, transcription start sites are depicted by arrows and the unwinding of DNA by RNA polymerase is suggested

2.3 Representation of T. aquaticus (Taq) RNA polymerase (PDB ID 1L9U) holoenzyme with attached DNA binding σA initiation factor, protein subunits are displayed in different colors, image produced using the software by Tarini et al. (2006)

3.1 Chemical structures of pentoses, and the most common purines and pyrimidines (standard bases), as incorporated in RNA and DNA

3.2 Chemical structure of adenosine triphosphate

3.3 Backbone of RNA carrying an adenine base

3.4 Mg²⁺ ion chelated by two proximate phosphate groups from the backbone of an RNA molecule

3.5 Designed RNA sequence for fluorescence measurement of renaturation rates, complementary hybridization domains A and B are denoted

3.6 Evolution of base pairing probabilities in the domains of the RNA sequence from figure 3.5, left: pairing probabilities from structure prediction at temperatures between 0 °C and 100 °C suggesting two distinct melting transitions, right: complete heatmap of positional pairing probabilities (color coded) in the same temperature range

4.1 A slightly more realistic scheme of the Central Dogma of Life

4.2 From consensus sequence to weight matrix representation of a binding motif for the FruR TF protein of E. coli, upper sequence logo corresponding to occurrence counts and lower scaled by positional information score

5.1 Northern blots of (anti) SsrS non-coding RNAs of B. subtilis and E. faecalis grown in rich media. Lanes correspond to DNA size-marker (L), RNA extractions from exponential growth phase (E) and from stationary phase (S). Numbers to the left indicate the size of marker bands.

5.2 5’tagRACE procedure to create a 5’tag-cDNA library

5.3 Idealized 5’tagRACE electrophoresis result featuring five 5’ regions of a transcript with two distinct TSSs (only in TAP+ lane) and three processed forms (in both TAP+ and TAP− lanes)

6.1 Jablonski diagram of a typical fluorophore with short-lived singlet (S) and long-lived triplet (T) states. Excited vibrational states are shaded in gray and non-radiative relaxation is denoted by dashed lines. Solvent effects can lead to a relaxation of excited states.

6.2 Molecular beacon with a donor-acceptor resonant energy transfer pair, left: incident light (νb) excites the donor, quenched by a nearby acceptor which emits light (νr), right: when the acceptor is too far away, e.g. when the molecule is paired to a target, the donor is not quenched and can radiate (νg)

6.3 Ethidium group (orange carbon rings) intercalating cytosine and guanine bases of DNA (gray) resolved by X-ray diffraction at 1.32 Å resolution

6.4 Chemical structures of ethidium bromide and PicoGreen

6.5 Absorption and emission spectra of fluorescent dyes EB and PG when bound to double-stranded DNA according to manufacturer (Invitrogen™)

6.6 Renaturation curves of DNA (left) and RNA (right) sample with EB:PG dye-mix. Different curves correspond to different detection wavelengths.

6.7 Renaturation curves of DNA (left) and RNA (right) samples in presence of one of the fluorescent intercalators EB (upper) or PG (lower).

6.8 Expected hybridized stem-length vs temperature for RNA and mimic DNA in bulk measurements, averaging over configurations at fixed temperature

6.9 Probability distribution heatmaps for structural configurations at different temperatures, left: mimic DNA exhibits two main structural states with stems of 9 bp and 0 bp (unfolded), right: designed RNA exhibits three main structural states with stems of 19 bp, 9 bp and 0 bp (unfolded)

6.10 Principle of nucleic acid detection on DNA microarrays. Labeled sequences hybridize to biotinylated baits (gray circle) on a surface and can be detected upon excitation of their fluorophore (red circle).

6.11 Photograph of Affymetrix® GeneChip® Human Gene 1.0 ST array hybridized with total RNA extracted from immortalized nasopharyngeal cells (NP69). Clearly visible are calibration spots (structured fields) and positioning marks (4 × 4 checkerboards).

7.1 Principle of termination sequencing methods, left: DNA polymerase, primers (arrows), dNTPs and a low concentration of one of the four dideoxyribonucleotides are added to amplified DNA, right: random termination by ddNTPs can be resolved by gel electrophoresis of individual reactions (chain termination) or by detection of fluorescent labeled ddNTPs (dye termination) in capillary electrophoresis.

7.2 Emulsion PCR on primer-coated beads suspended in water droplets

8.1 Distribution of binding energies and sequence binding probability

8.2 Alignment boxes of a set of four sample sequences with a1 = 3, a2 = 17, a3 = 10 and a4 = 23. Different alignment widths w1 = w2 = 5 and w3 = w4 = 6 correspond to a gapped alignment, as shown to the right.

8.3 Evaluation of the model matrices c_νi and q_νi, and the background vector p_ν on the complete set of sequences from the previous example in figure 8.2.

9.1 Target prediction for the E. faecalis ncRNA refA; A: cumulative distribution function of predicted interaction strengths between the ncRNA and the bacterial transcriptome (•) and random sequences from a 5th order Markov chain (–), B & C: Northern blots of the ncRNA, indicating coarse expression levels in different growth conditions (L: size marker, E: exponential, S: stationary, H: static with hemin, O: oxidative stress, R: respiration)

9.2 Map of top 10% of inferred genetic couplings between significantly differentially expressed L. monocytogenes genes from microarray transcriptomic expression data in a mutant/wildtype comparison in triplicate experiment; interaction strength is proportional to line width, positive couplings are shown in black, negative ones in red

10.1 Comparison of the relation between the background energy F_b and the abundance for a set of S. cerevisiae transcription factors. Differences between consensus energy E* and background energy F_b are reported as squares. Their values shifted by the logarithm of the TF abundance (as measured experimentally) are reported as circles. Vertical dashed lines correspond to average values for the two sets of points. Points have a sizable scatter but circles are clearly centered around zero. (upper: results for log-odds ratio matrices; lower: results for energy matrices). Histograms give visual access to the distribution of points.

10.2 Logarithmic heat maps of the evolution of alignment position probability distributions on the promoter regions of different operons in E. coli; regions contain one known TF binding site for FruR each and were aligned with QPS; distributions converge after 14 iterations

10.3 Distribution and positioning of predicted binding sites for EBNA-1 on EBV; FR (red blobs) and DS (green blobs) indicate known binding sites

10.4 Autocorrelation function ρ(d) in 0.5 Mbp windows of human chr Y

10.5 Compiled autocorrelation heatmaps of human chromosome 16 from 0.5 Mbp windows, left: crude sequence, right: repeat-masked sequence

10.6 Map of primary transcripts characterized with 5’tagRACE in the chromosome of E. faecalis, pathogenicity island (PAI) denoted as gray box

List of Tables

3.1 IUPAC symbols for nucleotide bases

3.2 Geometric parameters of A-, B- and Z-type helices of double-stranded nucleic acids (Stryer, 1995)

4.1 Standard RNA codon translation table

4.2 Selected repeat classes in human DNA (Doggett, 2001)

10.1 List of repeats identified by single nucleotide autocorrelation analysis of masked sequences; window indexes from autocorrelation heatmaps, covering 500 kb and shifted by 250 kb in each step; approximate chromosomal positions according to the index; apparent period harmonics in brackets

List of Abbreviations

APS Adenosine Phosphosulfate

ATP Adenosine Triphosphate

B. subtilis Bacillus subtilis

bp Base Pair (length unit)

cDNA Complementary DNA

dATP Deoxyribo-ATP

DNA Deoxyribonucleic Acid

dNTP Deoxyribo-NTP (A,C,G or T)

dsDNA Double-stranded DNA

dsRNA Double-stranded RNA

E. coli Escherichia coli

E. faecalis Enterococcus faecalis

L. monocytogenes Listeria monocytogenes

mRNA Messenger RNA

ncRNA Non-coding RNA

nt Nucleotide (length unit)

NTP Nucleotide Triphosphate (A,C,G or U)

PCR Polymerase Chain Reaction

RACE Rapid Amplification of cDNA Ends

RLPCR RNA Ligase mediated PCR

RNA Ribonucleic Acid


RNAP RNA Polymerase

RNase Ribonuclease

RT Reverse Transcription

ssDNA Single-stranded DNA

ssRNA Single-stranded RNA

T. aquaticus Thermus aquaticus

TAP Tobacco Acid Pyrophosphatase (seldom Tobacco Alkaline Phosphatase)

TF Transcription Factor

TSS Transcription Start Site


Part I

Introduction


Chapter 1

Scientific context

Let me begin this thesis with a delineation of the field which is discussed here: genetics from a biophysical perspective. As such, the work presented here can be ascribed to the broad field of systems biology, which I shall briefly sketch in the following section, before giving a short historical account of selected contributions of physics and chemistry to biology.

1.1 Systems Biology

Biology has classically been a descriptive scientific discipline, ever since the term was coined in 1766 by German physician Michael Christoph Hanov, much in the sense of ontology: from the systematic classification of species, to the characterization of cellular components with the development of microscopes, and later to the cartography of enzymatic reactions once chemistry offered the necessary analytical tools. Physics, by contrast, is in many instances a predictive discipline.

The term descriptive is meant without any dismissive connotation, as the discipline has aggregated a colossal wealth of knowledge, ranging from charting and classifying a multitude of the organisms populating our planet down to explaining which chemical reactions drive sub-cellular processes and how. Moreover, one should not neglect that this information has been inferred both inductively and deductively from observations, and that modeling approaches have been used continuously, the most prominent being the model of evolution by natural selection, partly inferred from observation of finch species distributions in the Galapagos islands (Darwin, 1859), and the structure of DNA (Watson and Crick, 1953), which before its verification competed with different similar structural models.

When it comes to cellular and sub-cellular studies, the focus has for a long time been on well delimited phenomena and molecular mechanisms, a necessary approach due to the immense complexity of processes taking place within and between individual cells. With long-range interactions between spatially and temporally separated regions and reactions often being ignored, descriptions remained mostly isolated from each other. This restrained both the quantitative understanding of the response of the whole system to perturbations of specific parts, despite detailed knowledge of these individual parts, and the identification of more complex behavior emerging from the interplay between these parts. Recent technological developments, such as those briefly outlined below, have changed the situation. Ambitions are high, with modern techniques making it possible to record snapshots of complete cellular states measured in various ways, such as the complete set of produced gene transcripts, or to follow the distribution of molecular species in response to external stimuli, down to the motion of individual molecules, both in cell populations and in single-cell experiments. With the availability of such measurement data, a new interdisciplinary field has emerged with the new millennium, systems biology (Ehrenberg et al., 2003), bringing together various branches of mathematics, computer science, theoretical and experimental physics and different engineering disciplines, e.g. control theory and signal analysis, with fundamental fields of modern biology such as genetics and molecular biology. Thus, systems biology not only relies on the availability of modern data acquisition techniques, but also on a rich variety of tools developed within biochemistry and molecular biology to perform very specific manipulations and induce perturbations in the systems of interest, be it a single cell or a whole multicellular organism.

1.2 Physics in Biology

Niels Bohr saw no reason why biology should not undergo the same revolution as atomic physics (Bohr, 1933). He hoped for the discovery of fundamental constants (Bohr, 1963), much as had been the case in physics, and sought “complementarity of mechanistic and teleological descriptions” (McKaughan, 2005) in biology.

This was partially satisfied by Linus Pauling’s “research into the nature of the chemical bond and its application to the elucidation of the structure of complex substances”, awarded the 1954 Nobel Prize in Chemistry, which led to a more detailed understanding of molecular structures, in particular also of biological molecules. James Watson and Francis Crick eventually published their famous work on DNA in 1953 (Watson and Crick, 1953), for which they were awarded the 1962 Nobel Prize in Physiology or Medicine together with Maurice Wilkins “for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material”. Bohr’s strong influence and support eventually led Max Delbrück to switch from physics to genetics, where he made fundamental “discoveries concerning the replication mechanism and the genetic structure of viruses”, among other things definitively proving DNA to be the main carrier (Rassoulzadegan et al., 2006) of genetic information. Delbrück was later awarded the 1969 Nobel Prize in Physiology or Medicine for this work, together with Alfred Hershey and Salvador Luria. Erwin Schrödinger, for his part motivated by the developments in physics and admiring the work of Delbrück, endeavored to popularize the core questions of genetics during his years in Dublin. Central ideas on the activation of enzymes, which led Jacques Monod to his famed work on gene regulation, can be ascribed to Leo Szilard (Monod, 1972; Novick and Szilard, 1954). Monod, along with François Jacob and André Lwoff, was later awarded the 1965 Nobel Prize in Physiology or Medicine “for their discoveries concerning genetic control of enzyme and virus synthesis”.

Today, the properties of biological networks of arbitrary scale have emerged as significant research topics, affecting subjects from gene regulation (Alon, 2007; Maslov and Sneppen, 2002) to the understanding of neuronal circuits in the brain (Eguíluz et al., 2005).

1.2.1 Historical modeling approaches

Living cells are essentially complex thermodynamic machines, allowing a microscopic description in terms of statistical mechanics and thermodynamics. Stimulating ideas for such descriptions are collected in the popular little book What is life? by Schrödinger (1944). The highly influential works on the kinetic theory of Brownian motion from the early 20th century (Einstein, 1905; von Smoluchowski, 1906) remain pertinent, as the understanding of diffusive processes is of major importance in many aspects of biology, from the motion of small molecules within and across cells, through the behavior of polymer chains such as DNA, pili¹ or filaments², to the spread of cells in viscous media or the migration of organisms in their environment.

The behavior of polymers alone is a topic of great importance, not least for its role in relation to nucleic acid chains, and analytical work on this was pioneered by Flory (1953, 1968). Simulations, on the other hand, can provide insight into the mechanics of many-particle systems, and modern molecular dynamics simulations (Hess et al., 2008; Brooks et al., 2009, to name but two out of many) are often chosen to back up models of protein motility and interaction or reactions at the level of cell membranes and lipid structures (Karplus and McCammon, 2002). However, simulation outcomes strongly depend on the choice of particle interaction models (Tuckerman and Martyna, 1999), making it difficult to use such results in the absence of experimental evidence. Complete quantum mechanical descriptions based on density functional theory lead to more pertinent mechanistic models (Siegbahn, 2003, and references therein), yet these are restricted to small molecules of several tens of atoms, due to the computational complexity of the method.

1.2.2 Experimental techniques

Advances in the experimental field were made by Block (1990), who extensively pushed the application of optical tweezers in cell biology. These tools were developed by Ashkin (1970), inspired by cooling methods for single atoms by lasers, work for which Steven Chu, Claude Cohen-Tannoudji and William D. Phillips were awarded the 1997 Nobel Prize in Physics “for development of methods to cool and trap atoms with laser light”. Carlos Bustamante later refined the technique, opening the way for single-molecule manipulations (Wuite et al., 2000), which further made it possible to perform mechanistic and biochemical experiments at an incredible level of detail.

¹ Many bacteria exhibit hairlike polymers on their surface and such pili are commonly associated with cell motion.

² Different classes of polymer filaments are found in eukaryotic cells, involved in cell structure and plasticity, cell motility and intracellular transport.

At the same time, spectroscopic methods advanced, especially those employing fluorescent dyes, making it possible not only to visualize sub-cellular processes, but also to quantify reaction mechanics on a nanometer scale (Joo et al., 2008).


Chapter 2

Genetics in a nutshell

2.1 Mendelian laws

Genetics, one of the fundamental disciplines in biology, began with Gregor Mendel’s work on the heredity of distinct characters in peas (Mendel, 1866). Around the turn of the 20th century, more than three decades after Mendel’s experiments and sixteen years after his death in 1884, botanists Hugo de Vries and Carl Correns, as well as agronomist Erich Tschermak-Seysenegg, independently rediscovered the rules of heredity, referred to as the laws of Mendelian inheritance:

Uniformity

for a particular trait, offspring from one homozygous dominant and one homozygous recessive parent are uniformly heterozygous in that trait

Segregation

the fractions of heterozygous and homozygous offspring from parents that are both heterozygous in a particular gene are equal

Independent Assortment

offspring from parents being homozygous in two particular genetic traits inherit both traits independently of each other
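The first two laws can be checked by simply enumerating allele combinations. The following is a minimal sketch, not part of the original work: the function names and the single-letter allele notation (upper case for dominant, lower case for recessive) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Enumerate offspring genotypes for one gene (a Punnett square).

    Each parent is a two-letter genotype such as "Aa"; every combination
    of one allele from each parent is counted once (equally likely).
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the pair so that "aA" and "Aa" count as the same genotype
        offspring["".join(sorted((allele1, allele2)))] += 1
    return offspring


def is_heterozygous(genotype):
    """A genotype is heterozygous if its two alleles differ."""
    return genotype[0] != genotype[1]


# Uniformity: homozygous dominant x homozygous recessive
f1 = cross("AA", "aa")
# Segregation: heterozygous x heterozygous
f2 = cross("Aa", "Aa")

heterozygous = sum(n for g, n in f2.items() if is_heterozygous(g))
homozygous = sum(n for g, n in f2.items() if not is_heterozygous(g))
print(dict(f1), dict(f2), heterozygous, homozygous)
```

In this enumeration, all four combinations of the first cross are heterozygous, and the second cross splits evenly between heterozygous and homozygous offspring.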

The name genetics was coined in the early 20th century by Danish botanist and pharmacist Wilhelm Ludvig Johannsen and his contemporary, British biologist William Bateson, introducing genes as carriers of hereditary information, later defined as being coded by a specific region on a chromosome. Zygotism in Mendel’s laws refers to diploid organisms, which carry each chromosome in two copies, one from each parent; each trait can thus originate from different locations in the genetic material with slightly different content, termed alleles of this trait. A trait is called homozygous if the corresponding chromosomes carry the same gene sequence and heterozygous if the alleles are different. In the laws stated above, different alleles for the same trait can compete with each other, and hence either supersede a recessive variant or succumb to a dominant one.


Applications of the rediscovered principles were obvious, namely the manipulation of crops and breeds, and the possible lessons were even more tempting: to understand life.

With later breakthroughs on the structure and function of DNA, as well as on the surrounding biochemical reactions, underlying genetic mechanisms became more and more the focus of the quest for the understanding of life.

This thesis by no means dives into the whole breadth of genetics, which is the legacy of the above named pioneers, but touches only on a few selected and related subjects. The emphasis of this work remains on the elementary mechanisms which eventually lead to the expression of genes, as well as on the characterization of relevant molecules, on one hand, and more abstract bits of information, on the other.

2.2 Central dogma of Molecular Biology

Genes are defined by specific sequences of DNA¹, which are first transcribed to RNA and thereafter, in the case of coding genes, translated to proteins. This is often represented as the one-dimensional, linear flow shown in figure 2.1, which is attributed to Crick (1958). The central dogma eventually postulates that no information transfer occurs from the produced proteins back to RNA or DNA. Evidently, the complete picture is more complex, as clearly stated by Crick (1970). Not all genes are translated into proteins, though: so-called non-coding genes, e.g. the placental mammal RNA gene Xist, for X-inactive specific transcript (Brown et al., 1992), or the bacterial open promoter mimic 6S (Wassarman and Storz, 2000; Trotochaud and Wassarman, 2005), cf. Paper IV, act as properly structured RNA and may therefore require enzymatic processing to reach maturity. Gene expression can thus be controlled at two different stages: prior to transcription, by factors determining whether an RNA transcript is produced and to which extent, and post-transcriptionally, but prior to translation, by factors determining whether a given transcript will be recognized and processed by a ribosome, or affecting the maturation of a non-coding gene.

Figure 2.1: Main flow of information according to the central dogma

2.2.1 Transcriptional gene regulation

The most prominent regulation of RNA synthesis is performed by transcription factor (TF) proteins, which bind in the vicinity of the transcription start site, in the so-called promoter region. Other, more subtle regulation schemes are driven by the availability of nucleic acids. A simple model is illustrated in figure 2.2.

¹ This is of course neglecting the case of RNA viruses, e.g. those responsible for influenza or bluetongue disease.

The template strand of DNA (complementary to the usually annotated coding strand) is transcribed to mRNA by RNA polymerases (RNAP), which bind the promoter regions. Transcription thus means unwinding of DNA, separation of the strands and covalently binding free nucleotides to a polymer of RNA.

Figure 2.2: Schematics of activation (A) and repression (R) of transcription by DNA-binding proteins. Transcription start sites are depicted by arrows and the unwinding of DNA by RNA polymerase is suggested.
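The relation between coding strand, template strand and transcript can be made concrete with a few lines of code. This is a minimal sketch under stated assumptions: the function names and the example sequence are hypothetical, and promoter recognition and all regulation are ignored.

```python
# Watson-Crick partners within DNA, and RNA bases pairing with DNA template bases
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
RNA_PAIRING = {"A": "U", "C": "G", "G": "C", "T": "A"}


def template_strand(coding_5to3):
    """Template strand written 5'->3': reverse complement of the coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding_5to3))


def transcribe(template_5to3):
    """mRNA written 5'->3'.

    The polymerase reads the template 3'->5' (hence the reversal) and adds
    the RNA nucleotide complementary to each template base.
    """
    return "".join(RNA_PAIRING[base] for base in reversed(template_5to3))


coding = "ATGGCATTC"  # hypothetical coding-strand fragment
mrna = transcribe(template_strand(coding))
print(mrna)
```

The transcript reproduces the coding strand with uracil in place of thymine, which is why the coding strand is the one usually annotated.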

The recognition of the binding sites for the polymerase is usually under the control of a set of TFs, which may either enhance or reduce transcription by facilitating binding of RNAP enzymes to DNA or by blocking the binding site for RNAP.

Transcribed mRNAs may then be processed by ribosomes, which translate them into proteins, or interact with other molecules in the cell, e.g. metabolites leading to changes in the secondary structure of the mRNA and, most prominently, other so-called regulatory non-coding (nc)RNA (Prasanth and Spector, 2007; Eddy, 2001). The RNAP of the thermophilic bacterium T. aquaticus, bound to an essential transcription factor σA (Murakami et al., 2002), is pictured in figure 2.3.

TF proteins can be classified as basal and effecting factors. The former are usually associated with the RNAP enzyme, then referred to as holoenzyme, before it binds to the promoter and are thus relevant for the default behavior of the holoenzyme. The latter bind independently to the promoter region and either further improve the affinity of the holoenzyme for its binding site, or reduce it, usually by covering that site.

2.2.2 Post-transcriptional control

Until the early 1990s, gene regulation was thought to be mainly transcriptional. Transcribed functional mRNA was supposed to lead to the synthesis of a corresponding protein, and mRNA degradation by RNase proteins was rather seen as passive garbage collection. No major post-transcriptional mechanism had been observed in animals until small fragments of RNA, dubbed micro (mi)RNA, with surprising properties were identified in the roundworm Caenorhabditis elegans (Lee et al., 1993). These short fragments of ∼22 nt were processed from parts of the lin-4 mRNA as it was being prepared for translation and showed reverse complementarity in their genomic sequence to the mRNA of the lin-14 gene. Without going into the details and functions of the named genes, the lin-4 miRNA was shown by Wightman et al. (1993) to bind lin-14 mRNA and to reduce the translation of lin-14, possibly by acting as a steric obstacle during the processing of the lin-14 mRNA by ribosomes. Although more hints and clues suggesting another kind of regulatory apparatus were found along the way, it took almost a decade for miRNA research to reveal the capacities of those short nucleotide sequences. Among others, Lagos-Quintana et al. (2001) describe the beginning of a broader understanding of the regulatory machinery. Regulation by short RNA appears to be a fundamental mechanism in eukaryotes and several different types of such regulators have emerged. Eventually, the 2006 Nobel Prize in Physiology or Medicine was awarded to Andrew Fire and Craig Mello “for their discovery of RNA interference - gene silencing by double-stranded RNA” in the nematode Caenorhabditis elegans.

Figure 2.3: Representation of the T. aquaticus (Taq) RNA polymerase holoenzyme (PDB ID 1L9U) with attached DNA-binding σA initiation factor. Protein subunits are displayed in different colors. Image produced using the software by Tarini et al. (2006).

By now, a multitude of ncRNAs have emerged as important post-transcriptional regulators. In prokaryotes, large functional ncRNAs have been identified with the ability to modify mRNA secondary structures upon hybridization (see Storz and Haas (2007) for a review, and especially Toledo-Arana et al. (2007)), while the short miRNAs and their associated mechanisms have not been observed to date. Furthermore, specific regions of mRNAs have been identified as post-transcriptional regulators of the encoded gene. Riboswitches and thermosensors denote such regions, which can enable or disable their mRNA’s translation by modifying its affinity for the ribosomes in response to the binding of metabolite molecules or a change in environmental temperature, respectively. In the simplest case, this is done by changing the secondary structure such that the ribosome binding site is exposed or made unavailable.

General modes of post-transcriptional regulation can be classified as either:

Conditional translation

ncRNA, thermosensors and riboswitches modify the secondary structure of mRNA, regulating the access of ribosomes to their binding sites. While this is mostly observed in prokaryotes (Johansson et al., 2002; Kortmann et al., 2010), some hints point towards the existence of such mechanisms in higher organisms (Prasanth and Spector, 2007). In eukaryotes, predominantly in animal cells, miRNA-bound mRNA often remains intact, but the processing by ribosomes is hindered (Wightman et al., 1993).

Targeted degradation

miRNA interferes with mRNA, leading to a degradation of the latter. This ability of miRNA and small interfering (si)RNA is frequently observed in plants (Rhoades et al., 2002). Analogous effects are observed in prokaryotes (Prévost et al., 2011), which rely on the exposure of specific RNA hybrids recognized by the cell’s degradation machinery.

This short excursion into the world of post-transcriptional regulation is meant to dispel the impression that one is within grasping distance of understanding the whole genetic machinery. TF proteins still play a major role in this game, but as yet unknown participants may be disclosed at any time, enlarging the necessary set of rules while, on the other hand, enabling us to understand the implications of these very rules.


2.3 Noisy gene expression

Based on chemical reactions with typically small numbers of reactants, gene expression is an intrinsically noisy process (Spudich and Koshland, 1976). Intrinsically not only in the sense that a large network of biochemical reactions may tend to show chaotic behavior under certain circumstances (see e.g. Aldana and Cluzel (2003) or Pécou (2005)), but also in that many processes, such as the transcription and translation of genes, are counteracted by antagonists, leading to stochastic processes at the molecular level. Further, folding of proteins into their functional form is also commonly regarded as stochastic (Gō, 1983). Extrinsic noise may further play a role, stemming from environmental variations of temperature, pressure, radiation or distributions of cellular components, thus acting on the level of populations rather than on single cells (Elowitz et al., 2002).
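How small molecule numbers translate into fluctuations can be illustrated with a stochastic simulation of a single species that is produced at a constant rate and degraded by an antagonist. The following is a minimal sketch using a Gillespie-type algorithm, which is a swapped-in standard technique and not a model taken from this thesis; the rate names and values are hypothetical.

```python
import random


def gillespie_birth_death(k_syn, k_deg, t_end, seed=0):
    """Simulate synthesis (rate k_syn) and degradation (rate k_deg * n).

    Returns the time-averaged mean and variance of the copy number n.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    sum_n = 0.0
    sum_n2 = 0.0
    while True:
        a_syn = k_syn          # propensity of producing one molecule
        a_deg = k_deg * n      # propensity of degrading one molecule
        a_tot = a_syn + a_deg
        dt = rng.expovariate(a_tot)  # waiting time to the next reaction
        if t + dt >= t_end:
            # Account for the remaining time in the current state and stop
            sum_n += n * (t_end - t)
            sum_n2 += n * n * (t_end - t)
            break
        sum_n += n * dt
        sum_n2 += n * n * dt
        t += dt
        # Choose which reaction fires, proportionally to its propensity
        if rng.random() * a_tot < a_syn:
            n += 1
        else:
            n -= 1
    mean = sum_n / t_end
    variance = sum_n2 / t_end - mean ** 2
    return mean, variance


mean, variance = gillespie_birth_death(k_syn=20.0, k_deg=1.0, t_end=500.0)
print(mean, variance, variance / mean)
```

For this simple scheme the stationary copy number is Poisson distributed, so the variance is expected to be close to the mean k_syn / k_deg; relative fluctuations therefore grow as the mean copy number shrinks.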

Regulatory systems need a certain degree of robustness against intrinsic and, to some extent, extrinsic noise. An important aspect here is the affinity of regulatory proteins for non-specific binding sites on DNA. Such sites may play the role of a thresholding pool which has to be filled to a certain extent before functional sites can be bound efficiently. This issue is addressed in more detail in Paper I, where parts of the results show aspects of noise filtering features intrinsic to transcription regulation.


Part II

Physics & Information of Nucleic acids


Chapter 3

Physics of nucleic acids

This chapter introduces the basic components of nucleic acids before discussing structural properties of such polymers and their behavior under variation of external parameters such as solvent composition, electric fields and temperature.

3.1 Building blocks

3.1.1 Sugar & base

Naturally occurring nucleosides are molecules consisting of a pentose sugar and one of the purines adenine or guanine, or one of the pyrimidines cytosine, thymine or uracil, with structures shown in figure 3.1. Bound to a sugar, the bases form the nucleosides adenosine, guanosine, thymidine, cytidine and uridine. The use of different sugars in RNA (ribose) and DNA (deoxyribose) yields slightly different structural properties between the two kinds of nucleic acids. The standard bases adenine, guanine and cytosine normally occur in both RNA and DNA, whereas thymine is normally found in DNA and uracil exclusively in RNA, where it substitutes for thymine. For the sake of completeness, derived and modified bases such as inosine, lysidine, pseudouridine and dihydrouridine are commonly found in ribonucleic acids other than messengers, most prominently in transfer and ribosomal RNA (Czerwoniec et al., 2009). The active implication of nucleotide modification in gene regulation has only emerged recently (Wu et al., 2011). As in the following, nucleic acids are commonly abbreviated with single letters. Table 3.1 lists the abbreviations in use, according to the International Union of Pure and Applied Chemistry, including symbols used for grouping all possible ambiguities.

Figure 3.1: Chemical structures of pentoses, and the most common purines and pyrimidines (standard bases), as incorporated in RNA and DNA

Symbol   Base
A        adenine
C        cytosine
G        guanine
T        thymine
U        uracil
R        A or G
Y        C or T
S        G or C
W        A or T
K        G or T
M        A or C
B        C or G or T
D        A or G or T
H        A or C or T
V        A or C or G
N        any base
-        gap

Table 3.1: IUPAC symbols for nucleotide bases

Nucleotides, i.e. phosphorylated nucleosides, are important in different biochemical reactions. Nucleoside triphosphates (NTPs), e.g. ATP illustrated in figure 3.2, are used as a source of energy in catalytic reactions and are incorporated into nucleic acid chains, losing a diphosphate (pyrophosphate) group, while monophosphates are implicated as signaling molecules in communication pathways of cells (Stryer, 1995).

Figure 3.2: Chemical structure of adenosine triphosphate
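The ambiguity symbols of table 3.1 are convenient for describing sequence motifs. The following is a minimal sketch of how such a motif can be turned into a regular expression for searching DNA sequences; the function name, the example motif and the example sequence are assumptions made for illustration, and the gap symbol is deliberately left out.

```python
import re

# IUPAC nucleotide symbols and the bases they stand for (DNA alphabet)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a regular expression."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        # Unambiguous symbols stay literal, ambiguous ones become a character class
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


pattern = iupac_to_regex("TATAWR")  # hypothetical motif
hits = [m.start() for m in pattern.finditer("GGTATAAAGCTATATGCC")]
print(pattern.pattern, hits)
```

Note that `finditer` reports non-overlapping matches only; overlapping occurrences would require a lookahead-based pattern.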

3.1.2 Complementarity

The standard bases are shaped in a way that allows the formation of hydrogen bonds between nucleic acids of opposite orientation. Adenine and thymine can share two protons, hence forming two bonds, while guanine can form three bonds with cytosine or two bonds with uracil. The latter is promoted in dsRNA by the lack of the methyl (-CH3) group in uracil; in dsDNA, this group sterically prohibits bond formation between guanine and thymine. Formation of such hydrogen-bonded basepairs over longer stretches of complementary sequences eventually leads to helical structures.

The stabilizing effect of basepair formation between two strands can be described in terms of the change in Gibbs free energy of a helix upon addition, or stacking, of the corresponding pair. Averaging out nearest neighbor effects, stacking an AT basepair on a dsDNA molecule has been determined to contribute on average −1.13 ± 0.32 kcal mol⁻¹ to the binding free energy and −2.08 ± 0.21 kcal mol⁻¹ was found for GC pairs (Allawi and SantaLucia, 1997). Other combinations are repulsive and destabilize the structure. Corresponding values are similar for stacking in dsRNA, where one finds −0.86 ± 1.06 kcal mol⁻¹ for GU, −1.37 ± 0.71 kcal mol⁻¹ for AU, and −2.48 ± 0.72 kcal mol⁻¹ for GC (Mathews et al., 1999), again averaging over different nearest neighbors. Again, stacking of other base combinations destabilizes the structure. As a coarse rule of thumb, the stacking energies of GC:AU:GU pairs in RNA roughly relate to each other as 3:2:1. For DNA, A is thus complementary to T and G complementary to C, while for RNA both A and G are complementary to U.
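As a rough illustration of how these averaged contributions add up, the sketch below sums the per-pair values quoted above over a short duplex. It is a crude estimate under stated assumptions: nearest-neighbor dependence, initiation terms and mismatches are ignored, and the function name and example sequences are hypothetical.

```python
# Average free-energy contribution per stacked basepair in kcal/mol,
# as quoted in the text (nearest-neighbor effects averaged out)
DNA_PAIR_ENERGY = {frozenset("AT"): -1.13, frozenset("GC"): -2.08}
RNA_PAIR_ENERGY = {
    frozenset("GU"): -0.86,
    frozenset("AU"): -1.37,
    frozenset("GC"): -2.48,
}


def duplex_energy(strand, partner, table):
    """Sum the averaged pair contributions of an aligned duplex.

    `strand` is written 5'->3' and `partner` 3'->5', so that position i
    of one strand faces position i of the other.
    """
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    total = 0.0
    for base1, base2 in zip(strand, partner):
        pair = frozenset((base1, base2))
        if pair not in table:
            raise ValueError(base1 + base2 + " is not a stabilizing pair here")
        total += table[pair]
    return total


dna = duplex_energy("ATGC", "TACG", DNA_PAIR_ENERGY)
rna = duplex_energy("GGAU", "CUUA", RNA_PAIR_ENERGY)
print(round(dna, 2), round(rna, 2))
```

Quantitative predictions would instead use the full nearest-neighbor parameter sets from the cited references.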

3.2 Structural properties

3.2.1 Helix architecture

Nucleic acid chains are made up of alternating groups of phosphate and sugar, constituting the molecule’s backbone, and each sugar is bound to a base, as illustrated in figure 3.3 with adenine and a ribose backbone. The asymmetry of the sugar imposes a directionality on the backbone. Defining the carbon atom at which the base is attached as carbon #1, one phosphate is attached at carbon #3 (lower one in figure) and another at carbon #5 (upper one in figure), hence the terms 5’ and 3’, describing the orientation of a nucleic acid chain.

Figure 3.3: Backbone of RNA carrying an adenine base


In the absence of mechanical stress, stretches of complementary chains form helices with distinct geometrical properties: the compact ribbon-like A-form, the twisted double-helical B-form, and the more elongated, zigzagging Z-form. Table 3.2 summarizes the main characteristics of the three helix types. In a relaxed state, RNA is typically found in A-form, while DNA takes a B conformation. However, in the presence of metal ions, helical RNA is known to undergo a transition from the right-handed A-form to the left-handed Z-form (Hall et al., 1984).

helix type             A                 B                 Z
relative size          broadest          intermediate      narrowest
rise per bp            2.3 Å             3.4 Å             3.8 Å
helix diameter         25.5 Å            23.7 Å            18.4 Å
screw sense            right             right             left
bp per turn            11                10.4              12
pitch per turn         25.3 Å            35.4 Å            45.6 Å
tilt of bp from axis   19°               1°                9°
major groove           narrow & deep     wide & deep       flat
minor groove           broad & shallow   narrow & deep     narrow & deep

Table 3.2: Geometric parameters of A-, B- and Z-type helices of double-stranded nucleic acids (Stryer, 1995)
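The entries of table 3.2 are not independent: the pitch per turn is the product of the rise per basepair and the number of basepairs per turn. The following minimal sketch checks this relation and converts a number of basepairs into a helix length; the function names and the 1000 bp example are assumptions made for illustration.

```python
# Rise per basepair (in angstrom) and basepairs per turn, from table 3.2
HELIX = {
    "A": {"rise": 2.3, "bp_per_turn": 11.0},
    "B": {"rise": 3.4, "bp_per_turn": 10.4},
    "Z": {"rise": 3.8, "bp_per_turn": 12.0},
}


def pitch(form):
    """Axial length of one full helical turn, in angstrom."""
    helix = HELIX[form]
    return helix["rise"] * helix["bp_per_turn"]


def helix_length_nm(n_bp, form):
    """Axial length of a straight helix of n_bp basepairs, in nanometers."""
    return n_bp * HELIX[form]["rise"] / 10.0


for form in "ABZ":
    print(form, round(pitch(form), 1), round(helix_length_nm(1000, form), 1))
```

Within rounding, the computed pitches reproduce the tabulated values, and the same number of basepairs spans a markedly shorter distance in A-form than in B-form.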

Helix formation kinetics are commonly modeled as a second-order reaction between single-stranded (ss) nucleic acids (NA), forming double strands (ds) at rate $k_2$ and dissociating at rate $k_{-1}$:

\[ 2\,\mathrm{ssNA} \; \underset{k_{-1}}{\overset{k_2}{\rightleftharpoons}} \; \mathrm{dsNA} . \tag{3.1} \]

For long chains of complementary nucleic acids, after initial formation of a hybrid of a few base-pairs, this reaction is reported to take place as fast length-dependent nucleation, or zipping of two strands, at a rate scaling with the chain length $N$ as

\[ k_2 \sim N^{0.52} , \tag{3.2} \]

as originally described in Wetmur and Davidson (1968) and recently rediscussed in Sikorav et al. (2009). For single-stranded nucleic acids, typically RNA, the situation is slightly more complex due to partial intramolecular complementarity. Hybrid formation is reported to occur in two steps (Fürtig et al., 2007), forming secondary structure helices and hairpins at timescales $\tau_2$ between 10 and 100 µs, followed by a spatial reorganization and stabilization of the tertiary structure by interactions between hairpin loops and unpaired bulges at timescales $\tau_3$ between 10 and 100 ms.
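The second-order scheme of equation (3.1) can be explored numerically. The sketch below integrates one possible set of rate equations with a simple Euler scheme; the convention d[ds]/dt = k2 [ss]^2 - k-1 [ds] with two single strands consumed per double strand, the arbitrary units and all parameter values are assumptions made for illustration.

```python
def simulate_hybridization(ss0, k2, k_minus1, t_end, dt=1e-3):
    """Euler integration of 2 ssNA <-> dsNA.

    ss0 is the initial single-strand concentration; no double strands
    are present at the start. Returns the final (ss, ds) concentrations.
    """
    ss, ds = ss0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Net rate of double-strand formation
        rate = k2 * ss * ss - k_minus1 * ds
        ds += rate * dt
        ss -= 2.0 * rate * dt  # two single strands per double strand
    return ss, ds


ss, ds = simulate_hybridization(ss0=1.0, k2=1.0, k_minus1=0.1, t_end=50.0)
print(round(ss, 3), round(ds, 3))
```

With these hypothetical parameters, the system relaxes to the equilibrium defined by k2 [ss]^2 = k-1 [ds] under the constraint [ss] + 2 [ds] = ss0, here [ss] = 0.2 and [ds] = 0.4.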


3.2.2 DNA vs RNA

Fundamental to the differences in helical structure between RNA and DNA is a small chemical difference: deoxyribose is missing one hydroxyl (-OH) group, compared to ribose. This slight variation entails a different behavior of corresponding nucleic acids. An important mechanical property of polymers, the stiffness of the chain, is typically expressed in terms of the persistence length $l_p$, which can be defined from the angular correlation of a worm-like polymer chain (Flory, 1968)

\[ \langle \cos \varphi(l) \rangle = e^{-l/l_p} , \tag{3.3} \]

where $l$ is the contour length along the polymer chain and $\varphi(l)$ is the angle between an axial tangent to the polymer at length zero and a tangent at $l$, thus the relative turn of the chain at that position.

Relaxed dsRNA (predominantly in A-form) has a persistence length of 63 ± 2 nm and is somewhat stiffer than relaxed dsDNA (predominantly in B-form) with roughly 54 ± 2 nm in 10 mM Na⁺ buffer (Abels et al., 2005). Neglecting supercoiling effects (Liu and Wang, 1987; Marko and Siggia, 1995), this might be a signature for the evolutionary advantage of coding genetic information on DNA, which can be more efficiently compacted than RNA.
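The effect of the two persistence lengths can be made tangible by evaluating the angular correlation of equation (3.3). The sketch below also includes the mean square end-to-end distance of a worm-like chain, a standard textbook result that is not given in the text and is added here as an assumption for illustration; the function names and contour lengths are hypothetical.

```python
import math

# Persistence lengths in nm, as quoted in the text
PERSISTENCE_NM = {"dsRNA": 63.0, "dsDNA": 54.0}


def tangent_correlation(contour_nm, lp_nm):
    """Angular correlation <cos phi(l)> = exp(-l / lp) of a worm-like chain."""
    return math.exp(-contour_nm / lp_nm)


def mean_square_end_to_end(contour_nm, lp_nm):
    """Standard worm-like-chain result <R^2> = 2 lp L [1 - (lp/L)(1 - exp(-L/lp))]."""
    ratio = lp_nm / contour_nm
    return 2.0 * lp_nm * contour_nm * (1.0 - ratio * (1.0 - math.exp(-1.0 / ratio)))


for name, lp in PERSISTENCE_NM.items():
    corr = tangent_correlation(50.0, lp)
    rms = math.sqrt(mean_square_end_to_end(1000.0, lp))
    print(name, round(corr, 3), round(rms, 1))
```

At equal contour length, the chain with the shorter persistence length loses its orientation faster and occupies a smaller coil, consistent with the remark on compaction above.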

3.3 Dynamic responses

3.3.1 Electric fields

Nucleic acids carry a net negative charge, due to the phosphate groups of the backbone, and are thus susceptible to electric fields. In vacuum or homogeneous solvents, nucleic acids experience a force proportional to their charge. Since each phosphate group of the backbone carries one negative charge, and neglecting differences between bases, the charge of a nucleic acid is approximately proportional to its molecular mass. The velocity with which a nucleic acid moves in an electric field is thus essentially a function of the field alone, with negligible effect of the nucleotide composition.

Migration of nucleic acids in retarding viscous media, e.g. made up of entangled polymers, is described by the theory of reptation (de Gennes, 1971; Slater and Noolandi, 1985). Here, the behavior is very different and in the presence of electric fields, migration velocity becomes a function of the molecular length. This property is commonly used to separate nucleic acids of different sizes, migrating through a retarding medium, a gel.

An evenly charged polymer chain of $N$ monomers with total charge $Q$, e.g. DNA, in a viscous medium containing obstacles, e.g. agarose or polyacrylamide gel, subjected to a homogeneous electric field $E$ will experience an effective drift associated with the electrophoretic velocity of the molecule

\[ \langle v_m \rangle \sim \frac{QE}{N^2} \quad \text{(asymptotically)} \tag{3.4} \]

parallel to the electric field. Since the total charge $Q$ is proportional to $N$, two molecules of the same kind but of different numbers of monomers $N_1 < N_2$, subjected to the same electric field, attain migration velocities which differ by

\[ \Delta v_{12} = \langle v_1 \rangle - \langle v_2 \rangle \sim E \left( \frac{1}{N_1} - \frac{1}{N_2} \right) . \tag{3.5} \]

Different molecular lengths are thus readily separated, as longer nucleic acids migrate more slowly than shorter ones. Composition and size of the retarding gel define the differences in length which can be resolved, typically from single to several thousand nucleotides.
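The scaling relations (3.4) and (3.5) can be evaluated directly to see how the separation between neighboring lengths shrinks for longer molecules. This is a minimal sketch in arbitrary units; the proportionality constant is set to one and the function names are hypothetical.

```python
def drift_velocity(n_monomers, field, prefactor=1.0):
    """Asymptotic drift <v> ~ Q E / N^2 with Q proportional to N, i.e. <v> ~ E / N."""
    return prefactor * field / n_monomers


def velocity_difference(n1, n2, field, prefactor=1.0):
    """Difference in drift velocity between chains of n1 and n2 monomers."""
    return drift_velocity(n1, field, prefactor) - drift_velocity(n2, field, prefactor)


# Separation of chains differing by one monomer, for short and long molecules
short_pair = velocity_difference(100, 101, field=1.0)
long_pair = velocity_difference(1000, 1001, field=1.0)
print(short_pair, long_pair)
```

In this simple picture the velocity difference between chains of N and N + 1 monomers falls off roughly as 1/N^2, which is one way to see why single-nucleotide resolution is limited to short molecules.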

3.3.2 Solvent effects

Polar or charged solvent molecules interact with the negatively charged backbone of nucleic acids. Solvent polarity affects the stability of hydrogen bonds between complementary nucleic acid sequences. While water, for example, enables the formation of the bonds described above, less polar solvents such as isopropanol, ethanol or methanol prevent their formation: exposure of the backbone to the solvent is minimized and double strands are denatured (Herskovits et al., 1961; Baldini et al., 1985; Cui et al., 2007).

A different effect is due to intercalating molecules, which position themselves between stacked bases. Presence of such molecules in the solvent foremost impacts the stability of double strands and the stiffness of helical structures (Řeha et al., 2002; Berge et al., 2002).

3.3.3 Metal ions

“The formation or presence of bonds (or other attractive interactions) between two or more separate binding sites within the same ligand and a single central atom” (McNaught and Wilkinson, 1997), termed chelation, is important in the context of nucleic acid sequences in the presence of polydentate metal ions. Such ions, exhibiting multiple bonds, contribute to the structural configuration of nucleic acids by interacting with several oxygen atoms on the backbone. Figure 3.4 illustrates the interaction between a magnesium ion and two nearby phosphate groups. Depending on the electronegativity of the metal ion, its denticity and its concentration in the solvent, the effect on nucleic acids may be either stabilizing, e.g. by inducing sterically favorable bends in the molecule, or disrupting, e.g. by prohibiting the binding of complementary parts of high curvature. Physiologically relevant metal ions such as Na⁺, Mg²⁺, Ca²⁺ or Zn²⁺, to name but a few, thus have an important role in the formation of nucleic acid structures.

For an extensive review of the topic and a treatment of different modes of interaction between nucleic acids and metal ions, see Hud (2008).


Figure 3.4: Mg²⁺ ion chelated by two proximate phosphate groups from the backbone of an RNA molecule

3.3.4 Temperature

Variations in temperature can have a strong impact on the structure of nucleic acids. Double-stranded DNA dissociates into single-stranded molecules under physiological conditions at temperatures above 95 °C, unless it is further stabilized by supercoiling mechanisms or the presence of chelators, as in many hyperthermophilic archaea, which thrive at temperatures of up to 107 °C (Marguet and Forterre, 1994). Clearly, parts of living cells other than the genetic material are affected and denatured by excessively high temperatures as well. In physiologically reasonable ranges, however, temperature is readily sensed by specific ribonucleic acids (Repoila and Gottesman, 2001) and mRNA-intrinsic thermosensors (Johansson et al., 2002; Kortmann et al., 2010), triggering the translation of genes under their control, commonly by exposing ribosomal binding sites when undergoing conformational changes or melting.

As temperature influences the structure, its variation can be used as a tool to explore the configurations of nucleic acid sequences. This is classically done for DNA by denaturing a molecule of interest in bulk solution at a high temperature and following the hybridization kinetics of complementary parts by measuring the optical absorbance at 260 nm, as single-stranded DNA absorbs more ultraviolet light than double-stranded DNA (Britten and Kohne, 1968; Wetmur and Davidson, 1968). Alternative approaches make use of fluorescent reporter molecules which specifically bind double-stranded nucleic acids (Yguerabide and Ceballos, 1995; Sisamakis and Fouquier d’Hérouël, 2011), see below.

When monitoring reactions between nucleic acid strands, especially with reporter molecules, at varying temperatures, one has to keep in mind the shift of the reaction equilibrium. The change of the equilibrium constant $K$ with temperature $T$, written here in terms of the inverse temperature $\beta = (k_B T)^{-1}$, is described by the van ’t Hoff equation (Atkins and De Paula, 2006)

\[ \partial_\beta \ln K = -\Delta H \, N_A^{-1} , \tag{3.6} \]

with the Avogadro constant $N_A$ and the molar reaction enthalpy $\Delta H$, which relates to the Gibbs free energy $\Delta G$ and the change in the system’s entropy $\Delta S$ through

\[ \Delta G = \Delta H - T \Delta S . \tag{3.7} \]
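To give a feeling for the magnitude of this temperature dependence, the sketch below uses the van 't Hoff relation in its integrated form, assuming a temperature-independent reaction enthalpy. The enthalpy and entropy values are hypothetical and serve only as an illustration; the midpoint temperature is defined here by ΔG = 0, which applies to a unimolecular transition such as hairpin melting.

```python
import math

R = 1.987204e-3  # molar gas constant in kcal / (mol K)


def equilibrium_constant(k_ref, t_ref, t, delta_h):
    """Integrated van 't Hoff equation for constant molar enthalpy delta_h (kcal/mol).

    k_ref is the equilibrium constant at temperature t_ref (both temperatures in K).
    """
    return k_ref * math.exp(-delta_h / R * (1.0 / t - 1.0 / t_ref))


def gibbs_free_energy(delta_h, delta_s, t):
    """Delta G = Delta H - T Delta S, with delta_s in kcal / (mol K)."""
    return delta_h - t * delta_s


def midpoint_temperature(delta_h, delta_s):
    """Temperature at which Delta G vanishes (unimolecular transition)."""
    return delta_h / delta_s


# Hypothetical exothermic folding reaction
DELTA_H = -50.0  # kcal/mol
DELTA_S = -0.14  # kcal/(mol K)

k_310 = equilibrium_constant(1.0, 300.0, 310.0, DELTA_H)
t_mid = midpoint_temperature(DELTA_H, DELTA_S)
print(round(k_310, 4), round(t_mid, 1))
```

For an exothermic reaction such as helix formation, the equilibrium constant drops steeply with increasing temperature, which is what one has to keep in mind when interpreting reporter signals across a temperature scan.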


3.3.5 Structure prediction

Publicly available secondary structure prediction tools (Mathews et al., 2001; Hofacker, 2002; Markham and Zuker, 2008) readily implement thermodynamic models of RNA, allowing the prediction of structural configurations while taking into account parameters such as ionic strength and temperature.

For the RNA sequence shown in figure 3.5, equilibrium secondary structures were predicted using the RNAfold software from the Vienna RNA package by Hofacker (2002) at temperatures ranging from 0 °C to 100 °C, using energy parameters as described in Andronescu et al. (2007). Structure prediction yielded base-pairing probabilities for each pair of nucleotides. The analyzed RNA sequence has been designed to form a hairpin at moderate temperatures, made up of two domains (A and B) of different stability.

CUAGGUUUA-A-GACAGUUAGU-aaaaaa-GCUAAUUGUU-A-UAGAGUUAG
|:::::::::|-|::::::::::|---|::::::::::|-|:::::::::|
domain A    domain B    head  domain B   domain A

Figure 3.5: Designed RNA sequence for fluorescence measurement of renaturation rates. Complementary hybridization domains A and B are denoted.

Resolving independently, within both domains, the maximum pairing probability between any two nucleotides yields an estimate of the melting behavior of the molecule, showing a clear difference between the dissociation temperature of domains A and B, visualized in figure 3.6.
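The reduction from a full table of pairing probabilities to one number per domain can be sketched as follows. This is not the code used for the thesis: the function name is hypothetical, the 1-based domain positions are an assumption obtained by reading the dashes in figure 3.5 as mere separators, and the probabilities are toy values standing in for the output of a structure prediction tool at a single temperature.

```python
def max_within_domain_probability(bpp, five_prime_part, three_prime_part):
    """Largest pairing probability between the two halves of a domain.

    bpp maps 1-based position pairs (i, j) with i < j to pairing
    probabilities; pairs that are missing count as probability zero.
    """
    return max(
        bpp.get((i, j), 0.0)
        for i in five_prime_part
        for j in three_prime_part
    )


# Assumed positions of the two halves of each domain in the 46 nt sequence
DOMAIN_A = (range(1, 10), range(38, 47))
DOMAIN_B = (range(11, 21), range(27, 37))

# Toy probabilities at one temperature (hypothetical values)
toy_bpp = {(1, 46): 0.35, (5, 42): 0.10, (11, 36): 0.92, (15, 32): 0.97}

p_a = max_within_domain_probability(toy_bpp, *DOMAIN_A)
p_b = max_within_domain_probability(toy_bpp, *DOMAIN_B)
print(p_a, p_b, p_b - p_a)
```

Repeating this for the probabilities predicted at each temperature would give one curve per domain, from which the two melting transitions can be read off.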

Figure 3.6: Evolution of base pairing probabilities in the domains of the RNA sequence from figure 3.5. Left: maximum within-domain pairing probability (domain A, domain B and their difference) from structure prediction at temperatures between 0 °C and 100 °C, suggesting two distinct melting transitions. Right: complete heatmap of positional pairing probabilities (color coded) by nucleotide position from the extremity, in the same temperature range.


Chapter 4

Information content of nucleic acids

4.1 Central dogma redux

Authoritative statements or public decrees that are not to be disputed routinely require some degree of interpretation and are therefore naturally prone to misunderstandings. The central dogma of molecular biology (Crick, 1958) is no exception: it is often stated as postulating unidirectionality, as in figure 2.1 (Lodish et al., 2000; Sadava et al., 2009, among others). As discussed by Crick (1970), this dogma is merely a general framework, denoting DNA, RNA and proteins as the major players in molecular biology. Directionality is not imposed a priori; rather, a completely connected picture of possible information transfer, similar to figure 4.1, is presented. Indeed, transfer from proteins to proteins and

Figure 4.1: A slightly more realistic scheme of the Central Dogma of Life

to RNA is described as not yet observed at the time (Crick, 1970). However, if targeted degradation of RNA and the enzymatic activity of proteins that modulate the function of other proteins, or that affect the transcribability of DNA by silencing or de-silencing specific regions, are accepted as information transfer, even those links have to be included.
