Modeling of Conformational Transitions in Membrane Proteins with Coarse-Grained Methods

(1)

Theoretical Physics

Modeling of Conformational Transitions in

Membrane Proteins with Coarse-Grained Methods

Cathrine Bergh

cabergh@kth.se

SA104X Degree Project in Engineering Physics, First Level

Department of Theoretical Physics

Royal Institute of Technology

Supervisors: Laura Orellana, Erik Lindahl

May 22, 2015

(2)

Abstract

Understanding the dynamics of proteins is crucial in order to understand life on a molec-ular level. In this Bachelor’s thesis large scale motions of the membrane protein SERCA has been investigated with coarse-grained methods. The goal was to find certain places in the structure where mutations could alter and affect the functionality of the protein, possibly leading to disease. The protein was modeled with an elastic network model and simulated with Langevin dynamics. Data from the simulations was used to calculate all the places were formation of stabilizing salt bridges were possible and these were then compared with mutations found in real cancerous cells. Promising results were obtained but need to be investigated further with more sophisticated methods before any conclu-sions can be drawn regarding their significance for the protein functionality. Likewise, the use of elastic network models together with Langevin dynamics showed to be a good alternative to sample large conformational changes in proteins.

(3)

Sammanfattning

Kunskap om proteiners dynamik är av stor vikt för att kunna förstå mekanismerna bakom livet på en molekylär nivå. I denna kandidatexamensuppsats har storskaliga rörelser i membranproteinet SERCA undersökts med hjälp grovkorniga metoder. Målet var att hitta särskilda platser i strukturen där mutationer kunde införas och påverka proteinets funktionalitet, något som möjligen skulle kunna leda till sjukdom. Proteinet modeller-ades som ett elastiskt nätverk och simulermodeller-ades med Langevin-dynamik. Data från simu-leringarna användes sedan för att beräkna samtliga platser där bildandet av stabiliserande saltbryggor var möjlig, varpå dessa jämfördes med mutationer funna i riktiga cancerceller. Lovande resultat erhölls, men dessa måste dock undersökas vidare med mer sofistikerade metoder innan slutsatser kan dras gällande deras betydelse för proteinets funktionalitet. Därtill visade sig användandet av elastiska nätverksmodeller tillsammans med Langevin-dynamik vara ett bra alternativ för att fånga stora strukturella förändringar i proteiner.

(4)

Chapter 1 Introduction

1.1 Protein Dynamics and Function

Proteins are complex organic macromolecules that play extremely important roles in eucaryotic cells since they are responsible for a vast number of functions that are taking place inside the cells. The proteins can for example work as motors, cellular factories, as regulators and as messengers in signal transduction, that all are represented by highly dynamical structures. This implies that protein dynamics is of highest importance for a majority of biological functions. Understanding the relations between protein dynamics, structure and function is therefore crucial in order to understand the mechanisms of life on a molecular level.

Proteins are ever-changing molecules, a property which give them the possibility to per-form useful tasks in the cells instead of just being mechanical support by merely acting like building blocks. The fact that protein structures are highly dynamic structures was first understood in the early 50’s by the use of the hydrogen exchange approach, where information on the tertiary structure of a protein can be obtained by studying the chem-ical reaction where a hydrogen atom is replaced by a deuterium atom, and vice versa, in the backbone of the protein [1]. Since then, protein dynamics has been viewed as of immense importance for the proteins’ biological functions. All these movements that give rise to biological function have been determined from evolutionary selection and since these structural conformations are encoded within the amino acid sequence, evolution decides the sequence [2].

1.1.1 The Limitations of In Vitro Experiments

One of the most powerful methods to study protein structure of real proteins in vitro is to perform X-ray diffraction analysis on crystallized proteins. This method can provide high-resolution structural information about most biological macromolecules in terms of average positions of a conformational ensemble [3]. Even though some information about the flexibility and movement of regions of the protein can be obtained with X-ray diffraction, this method can only generate a frozen picture of a stable protein or, in best case, some snapshots of a few conformations that the protein samples. So far, there

(6)

is no experimental method that can provide a whole movie of a real protein in motion. Additionally, there are problems with the crystallization itself, especially regarding mem-brane proteins. When a protein is being crystallized it is taken out of its native aqueous environment and being packed into a crystal, something that might render a distorted picture of the protein structure. With this information in mind we will try to calculate the motions of a protein, starting from the information contained in crystal structures, using the laws of physics.

As a model system, the focus of this report will be on understanding the dynamics of one very important and well-studied membrane protein that is responsible for pumping calcium ions in the nervous system.

1.1.2 SERCA

Figure 1.1: The starting and ending structures of the SERCA pumping cycle, with the different domains marked. Image from [5]

In eucaryotic cells there is a large manifold of car-riers of signals that are necessary for the correct functioning of cell life. Among these, the most ver-satile is probably Ca2+_{, which regulates a long list}

of important functions. The Ca2+ _{signal is}

impor-tant in the cells from their creation to their death, influencing functions like fertilization, neurotrans-mission, gene transcription, generation of fuels in metabolic pathways, secretion/exocytosis, muscle contraction, memory and programmed cell death [4]. Because Ca2+ _{is involved in so many}

func-tions inside the cell it is beneficial for the cells to have access to an almost unlimited amount of Ca2+

from the extracellular environment. However, if the concentration of Ca2+ increases above the op-timal amount of 100-200 nM for a sustained time the cell will experience discomfort or even death [5]. In order to keep a high concentration of Ca2+ out-side the cell and a low concentration inout-side, there are protein pumps and channels to regulate cal-cium equilibrium. One paradigmatic example is the ATP driven Ca2+_{pump SERCA (sarco endoplasmic}

reticulum Ca2+_{-ATPase) that transports calcium}

ions through the membrane between the cytosol and the lumen of muscle cells, thus making the muscle relax after tension. A number of diseases have been related to mutations in the SERCA pump, among them Brody’s disease, Darier’s disease, heart fail-ure, some types of cancer and possibly type II dia-betes [5].

SERCA might be the best studied calcium transporter, with more than 56 solved crystal structures in nine different states almost covering the complete reaction cycle [6].

(7)

Figure 1.2: A simplified scheme of the catalytic cycle of SERCA. Image from [5]

SERCA is a highly dynamic protein that undergoes a large conformational change as it pumps Ca2+ through the cell mem-brane. The protein consists of four main domains: a transmembrane domain (M) that is built up from ten alpha helices, an actuator domain (A), a phosphoryla-tion domain (P) and a nucleotide-binding domain (N). The three domains A,P and N are flexible domains that move in the cytoplasm, while the M domain is placed through the membrane which is holding it in a more fixed position [5]. The struc-ture of SERCA can be viewed in Figure 1.1 and the pumping mechanism is briefly explained in Figure 1.2.

(8)

1.2 Atomistic Methods

1.2.1 Molecular Dynamics

Since the first simulation of a protein in 1977 [7], molecular dynamics has been one of the most proficient methods to model protein motion. Molecular dynamics (MD) specifies how positions and velocities of all particles in a system vary with time. The time dependent Schrödinger equation should in theory be able to describe all the properties of any molecule with an arbitrary precision, but computationally this is only possible for a system containing a few particles. For larger systems it is therefore necessary to introduce approximations, which are often made from empirical parameterizations of real systems, for example classical Coulomb interactions or Lennard-Jones interactions [8]. In MD Newton’s laws of motion are being integrated numerically, often by the use of finite difference methods [9] and the force field, V (r), containing all the physical detail of the system, is calculated. A sketch of the algorithm behind molecular dynamics can be viewed in Figure 1.3.

Start

Initial input data: force field, V (rt=0) coordinates, rt=0 choose short ∆t Compute potential V (ri), forces Fi = −∇V (ri) and accelerations ai = Fi/m

Move atoms ri+1 =

ri + vi∆t + 1₂ai∆t + ...

Move time forward ti+1 = ti + ∆t

More steps?

Done! no

yes

Figure 1.3: The MD algorithm

In the 70s the limitations for simulat-ing proteins with molecular dynamics were enormous since proteins contained too many atoms and needed to move for a too long time for the computers to han-dle the calculations. Today, advances in computational power and computational algorithms have made it possible to per-form MD simulations on larger systems, but many limitations still remains since the method cannot model conformational changes of large already-folded proteins at biologically relevant time scales.

Umbrella sampling

When two regions of a system are sepa-rated by an energy barrier in the config-uration space some areas might become poorly sampled since the probability of overcoming an energy barrier often will be low. Umbrella sampling can be used in MD simulations in order to enhance the sampling of the system by introducing a potential specifically chosen to cancel the impact of the present energy barrier [10].

(9)

1.3 Coarse-Grained Methods

1.3.1 Langevin Dynamics

Langevin dynamics is a simplified alternative to molecular dynamics that takes friction caused by the surrounding molecules (a fluid for example) and molecular-thermal agi-tation which leads to random collisions on the particle into account as terms added to the Newton equations of motion, and is therefore approximating the canonical ensemble. The analogy of Newton’s equations of motion for every particle in a system including these extra terms is called the Langevin equations [11, 12]:

mir¨i = Fi − γ ˙ri + ξi(t) (1.1)

Here the first term, Fi, is the force obtained from particle interactions, the second term

the friction the particle feels when moving through the surrounding medium with velocity ˙

ri and the damping constant γ of the medium, and the last term ξi(t) is a Gaussian noise

term caused by random collisions from the surrounding particles.

The relationship between the friction force and the random fluctuating force (the Gaus-sian noise term) can be described by the fluctuation-dissipation theorem [13]:

hξi(t)i = 0 (1.2)

Z

hξi(0)ξi(t)i = 6kBT γi (1.3)

where kB is the Boltzmann constant and T is the temperature.

For simulations, however, it is often a good approximation to assume that the random force is unrelated at different times, which makes equation (1.3) take the form

hξi(t)ξj(t0)i = 6kBT γiδijδ(t − t0) (1.4)

where δij is the Kronecker’s delta and δ(t − t0) is the Dirac delta.

In practice this relation means that the temperature of the system is maintained during the simulation.

1.3.2 Elastic Network Models

Instead of simplifying the physical properties acting around or on a simulated object it is possible to reduce complexity by simplifying the object itself. For proteins or other biomolecules it has shown to be powerful to use elastic network models (ENM) in order to study their dynamic properties. An elastic network model is a bead-and-springs rep-resentation, which means that a group of atoms are replaced with one particle that is then connected via uniform elastic springs to the neighboring particles.

(10)

The ENM potential is defined by a network of elastic springs that connects the particles. These particle contacts describe the topology, which is described with the Kirchhoff matrix, Γ, where the ijth element is equal to -1 if the nodes are within a cutoff distance rc and otherwise 0. The connectivity of the particles are described by the diagonal

elements:

Figure 1.4: A nearest neighbors elastic network model of SERCA with cutoff distance 16 Å

Γij = ( −1 dij ≤ rc 0 dij > rc (1.5) Γii= − N X k|k6=i Γik (1.6)

The topology of the network can also be described in terms of a stiffness matrix, where the elements are represented by the Hookean force constants, Kij, of the

springs connecting any pair of nodes i, j. Kij =

1

2ξΓij (1.7)

where ξ is a constant that can have the same value for all pairs or different values for some, depending on the model. The potential energy of the elastic network is according to Hooke’s law

U =X

i6=j

Kij(Rij − R0ij)

2 _(1.8)

where Rij and R0ij are the current and equilibrium distances between each pair of particles,

i, j, respectively.

Despite the fact that elastic network models are extreme simplifications of large macro-molecules, studies have shown that they work remarkably well to qualitatively predict functional properties [14]. The major reason to why these simplified models work is that the network nodes, the topology and shape of the protein play a dominant role when it comes to describing the motions at the low-frequency end of the mode spectrum, that consists of the most collective modes. Long range interactions like those between electrically charged residues, van der Waals interactions, hydrogen bonds etc. are not as important as their impact in the structure, which is thus represented in the network topology. Natural selection has made each biomolecule’s shape and intrinsic motions evolve to support its particular function, which implies that the structure is of highest importance for its functional motions[15].

(11)

Nearest neighbors elastic network model

There are many different types of elastic network models and most of them can only rely on Cartesian distance and thus not discriminate close chain neighbors and distant contacts. Therefore Orellana et al. have developed a more physical ENM at the α-carbon level accounting for this effect [16]. This model uses sequential weighted springs for the closest neighbors, while the long-range electrostatic contacts are described by an inverse function of the Cartesian distance.

The weights of the spring constants for the first sequential neighbors are assumed to follow the harmonic oscillator model giving the following apparent stiffness constants:

K_ijapp = kBT h[Rij − R0ij]2i

(1.9)

where kB is the Boltzmann constant, T the temperature and Rij − R0ij describes

oscilla-tions in distance from average equilibrium values. The constants, K_ijapp are fitted to an inverse exponential function which gives

K_ijapp(Sij) = Cseq Snseq ij and K_ijapp(dij) = Ccart dncart ij (1.10)

where Sij is the sequential distance and dij is the Cartesian distance between residues i, j.

Cseq _{and C}cart _{are constants designed to match real instead of apparent force constants,}

where a value of 60 kcal/(mol·Å2) for Cseq and 6 kcal/(mol·Å2) for Ccart turned out to be the optimal values according to Orellana et al. [16].

The network topology of the nearest neighbors ENM is represented by a fully connected matrix for the first M neighbors where a size-dependent cutoff is used in order to compen-sate for artificial distant interactions. The topology and force constants for the residue pairs i, j with sequential distance Sij > 0 and Cartesian distance dij is therefore

Γij      Sij ≤ M Γij = 1 Sij > M ( Γij = 1 dij ≤ rc Γij = 0 otherwise (1.11) Kij        Sij ≤ M Kij = C seq S_ijnseq Sij > M ( Kij = C cart dij ncart dij ≤ rc Kij = 0 otherwise (1.12)

(12)

1.4 Normal Mode Analysis

Since the 1980s normal mode analysis (NMA) has been used to investigate protein motion[17] but it is only in the last 15 years that the method have gained popularity in the field of protein dynamics. The main reason to this is that it has been discovered that global modes generated by NMA bear functional importance. As coarse-grained models, like elastic network models, were introduced the feature became even more evi-dent simultaneously as the need for computational power was largely reduced [18]. Considering one particle that is moving in one dimension "fixed" by one Hookean spring Newton’s equation reads

m∂

2_x(t)

∂t2 = −kx(t) (1.13)

where m is the mass of the particle, x is the displacement from the equilibrium spring length and k is the spring constant. The general solution for this equation can be written as

x(t) = Aeiωt (1.14)

where A is an arbitrary constant and ω the frequency. Insertion of the solution into the differential equation yields

ω2mA = Ak (1.15)

This equation can now easily be generalized for a system of N particles in Cartesian space connected by springs. The above equation now reads

m ¨R(t) + HR(t) = 0 , Hij =

∂2V ∂ri∂rj

0

(1.16) where R is the coordinates of the particle at the time t and H is the Hessian matrix that contains the second derivatives of the energy with respect to the coordinates. This equa-tion can then be solved by diagonalizaequa-tion of the mass-weighted Hessian matrix, which renders a set of eigenvalues and eigenvectors [12]. The eigenvalues defines the normal modes and the eigenvectors are linear combinations of the movements of the particles. The eigenvectors corresponding to the lower eigenvalues usually represent larger protein motions while higher ones represent small motions, vibrations or noise. In conclusion, NMA removes the need of integrating Newton’s equations by assuming harmonicity to the system and thus allows unique analytical solutions to the system (the normal modes).

(13)

1.5 Principal Component Analysis

Principal component analysis (PCA) is a statistical method that is being applied in many fields, where data from the Cartesian coordinate system is being transformed into a new system of collective coordinates. The idea is to reduce dimensionality of a multivariate data set, while preserving as much variation as possible from the original data set. The new variables, the principal components, are uncorrelated and ordered so that the first few variables describe most of the variation present in all of the original variables [19]. For biomolecular structures the major principal components of the motion will often be very similar to the first normal modes calculated from NMA since the largest motions are the most relevant for biological function.

Assume that there is an ensemble of M conformations (crystal structures or MD snap shots for example) of a protein consisting of N particles (atoms, residues or other coarse-grained representations). Every conformation, k, can be described by a 3N -dimensional vector containing position vectors R(k)_i = (x(k)_i , y(k)_i , x(k)_i )T_{, (1 ≤ i ≤ N ), of the N}

particles. Then we define a 3N -dimensional fluctuation vector for each conformation

∆q(k)= q(k)− q0 _(1.17)

that contains the position vectors organized as

q(k) =(R(k)₁ )T, ..., (R(k)_N )T

T

(1.18) The fluctuation vector is therefore describing the deviation of all the position vectors from their equilibrium positions R0_i = (x0

i, yi0, zi0)T.

(a) First principal component (b) Top view of the first principal

component (c)Second principal component

(14)

All the fluctuation vectors are then stored in the covariance matrix, which is also the inverse of the Hessian matrix described earlier

C = M−1X

k

∆q(k)∆q(k)T (1.19)

The covariance matrix can then be diagonalized

C = PSPT (1.20)

where P is the matrix containing normalized displacements along the principal axes, or principal modes of structural changes, and S is the diagonal matrix containing all the eigenvalues, which are directly proportional to the variance along the principal axes. The corresponding eigenvectors then represent the correlated most important motions of the molecule [20]. In this report PCA is going to be applied on the 56 solved crystal structures for SERCA in order to extract the main coordinates of motion for the protein, which will then be used as a reference for our simulations. The first and second principal components of SERCA can be viewed in Figure 1.5.

(15)

Chapter 2 Investigation

2.1 Problem

The aim of this report is to investigate large-scale motions of proteins and how these affect its biological functions. Which are the most important region responsible for the dynamics of a protein and how can mutations affecting a single amino acid affect the protein function? More specifically we are going to try to understand how the Ca2+

-pump SERCA works on a molecular level. So far, it is not possible to get a movie of a real moving protein, only snapshots of the crystallized protein in a few stable or metastable states. Although there are many solved crystal structures for SERCA, it is important to know the intermediate states connecting them, which seldom can be crystallized due to their instability. Therefore we are instead going to tend to the laws of physics and try to simulate the protein between a few experimentally determined states. However, full atomistic simulation methods like molecular dynamics require a lot of computational power and complicated biasing techniques, which makes it nearly impossible and very expensive to sample large conformational changes like the ones we are interested in. Coarse-grained methods is a fast low-resolution alternative, better suited for large conformational changes. Hopefully a few places in the structure that are important for the dynamics of SERCA will be found. The computed mutation candidates will then be compared with already reported in vivo mutations in the SERCA-pump from cancer cells. Ideally these results could lead to further understanding of the mechanisms, on a molecular level, behind diseases caused by malfunction in SERCA.

2.2 Methods

In this report several methods were used to study the conformational changes of SERCA between the open structure E1-2Ca2+ _{(PDB reference: 2C9M) and the closed structure}

E1-2Ca2+-ADP (PDB reference: 1T5S) as well as from E1-2Ca2+-ADP (PDB reference: 1T5T) to E2P (PDB reference: 3B9B). The protein was modeled with a nearest neighbors elastic network model, where the α-carbons in the protein was connected by Hookean springs. The elastic network beads-and-springs potential was ten used to drive a Langevin

(16)

dynamics simulation between the experimentally solved start and target structures, bi-ased to find the target structure by sampling functional modes obtained from PCA. All this was implemented with the Verlet algorithm.

Finding the Transition Pathway

The transition pathways were analyzed by projecting the trajectories received from the elastic network model-Langevin dynamics (ENM-LD) simulations upon the first and sec-ond principal components calculated from the 54 crystal structures available for SERCA in the Protein Data Bank. The transition from the crystal structures 1T5T to 3B9B was compared with an umbrella sampled molecular dynamics simulation (made by Olof Hedkvist) using the LD-computed structures as seeds in order to drive the simulation. This was used in order to get a more accurate picture of the minimum energy pathway between the studied conformations. The other transitions were not compared with any other references than the crystal structures. All ENM-LD simulations were carried out with a cutoff distance of the elastic network at 16 Å.

Interdomain Locking

First we tried to find certain dynamically important regions or "hotspots" that are im-portant for the transitions speed, and thus where point mutations could affect the func-tionality of the protein. This was done by locking two predefined domains together by an increase of the force constants of the springs connecting these two domains. If it is important for the function of the protein that these domains move relative to each other, the transition between the two states would go slower. The transition speed might also accelerate by pushing domains together that are naturally moving towards each other. The transition speed was evaluated by calculating the root mean square deviation to the target structure as a function of iteration time.

Finding Correlating Cancer-Related Mutations

Places in the structure where mutations could alter the function were found with an ex-clusion approach. First, all the interdomain contacts within a given cut-off distance were computed using the coordinates of all the α-carbons from the coarse-grained Langevin simulation. Only distances between carbons belonging to different domains were of in-terest, since the relative motions of the residues inside a domain are very small. Here the cut-off distance 10 Å was used in order to capture all the interesting bonds that could be formed during the transition. Since fixed bonds are not of very much interest when we deal with conformational transition, the contacts being formed or broken some-time during the transition were sorted out. The formation or breaking of salt bridges are of importance for the protein’s dynamic properties as they can act as stabilizers for the structure depending on their placement in the protein [21]. For that reason only electrically positively and negatively charged residues were selected as candidates. Next, the remaining contact pairs were compared with reported mutations found in diseased cells from the Catalogue of Somatic Mutations in Cancer (COSMIC) [22]. The matching

(17)

residues were visualized in the full-atom reconstructed Langevin simulation and charged side chains placed so that salt-bridge formation would be impossible were dismissed.

Deformational Space Overlap

In order to see if the elastic network model is a good approximation for the system the accumulated normalized dot product between the ten first eigenvector pairs from each subspace (obtained from PCA and NMA) were calculated. The following equation was used γXY = 1 m m X i=1 m X j=1 (ν_iP CA· νN M A j ) 2 _(2.1)

where i, j are the orders of the eigenvectors and m is the minimum number of eigen-vectors needed to describe a certain variance threshold. The eigeneigen-vectors derived from the principal component analysis originate from crystal structures obtained from x-ray diffraction, while the eigenvectors calculated with normal mode analysis originate from an elastic network representation. Therefore the eigenvectors from the PCA can be seen as an experimental reference for NMA evaluation.

2.3 Results and Discussion

2.3.1 Transition Pathways

The transition pathways between the structures 2C9M and 1T5S and 1T5T and 3B9B can be viewed in Figure 2.2 a) and in b) together with an umbrella sampled molecular dynamics simulation (the scattered points) from the state 1T5T to 3B9B. The black clustered points are all different experimentally solved crystal structures of SERCA ob-tained from the Protein Data Bank. In Figure 2.2 a) it is first of all clear that transition pathways do not necessarily follow the linear Cartesian interpolation, which would be represented as a straight line connecting the crystal structures through the shortest dis-tance. We can also see that the transition from the structure 2C9M to 1T5S is passing closely to a crystal structure, confirming that this is an intermediate state of the transi-tion. This is also a validation of the method since the simulation does not just take the fastest way to the target, but instead makes the effort to visit this experimentally solved intermediate structure without introducing any biasing towards it. It is also interesting to note what kinds of motions that drive the different transition. Even though both of the first and second principal components are important for the transition it can be seen that the transition between 2C9M and 1T5S is more dependent on the opening and closing of the A- and P-, N-domains, while in the transition between 1T5T and 3B9B the rotation of the A-domain and the tilting of the transmembrane are more dominant. This can clearly be seen in coarse-grained ENM-LD simulations of these complete transitions. In Figure 2.2 b) we can see the umbrella sampled MD simulation from 1T5T to 3B9B together with the transition pathways calculated with ENM-LD. It can be noticed that

(18)

(a) ENM-LD (b)umbrella sampling

Figure 2.1: Trajectories projected upon the principal components obtained from a) the ENM-LD simulation and b) an umbrella sampled MD simulation (the scattered points), plotted together with the trajectories in a)

the ENM-LD trajectory only follows the MD pathway in the end of the transition. The MD trajectory starts in the state 1T5T but then makes a jump in the second principal component before it goes rather straight towards the target conformation. The reason why the MD simulation makes this jump might be that it finds a more attractive energy minimum outside 1T5T conformation. However, this MD simulation does not contain all the details present in the real system, since the two Ca2+ _{and the ATP-molecules that}

should be present inside and attached to the protein at this stage in the pumping cycle have been left out. If these will be added, maybe the trajectory will look a bit different. A more radical theory is that the crystal structure itself is wrongly placed in the plot. Since the crystallization of a protein is troublesome the structure might not be entirely correct and thus not completely reconcilable with physics. One must also keep in mind that MD is not exact in any way, but relies on statistical mechanics. In order to get a reliable reference as much more detail as possible should first be added to the force field and then the simulations should be repeated in order to get a statistical average.

Regarding the close passing of the intermediate state in the 2C9M to 1T5S transition and the fact that the 1T5T to 3B9B follows the MD pathway reasonably well the ENM-LD simulations can be assumed to be reliable as an approximation. However, the parameters (cutoff distance for example) of the nearest neighbors elastic network model can most likely be much better fitted to SERCA which then might render more reliable trajectories. With some better fitting the 2C9M to 1T5S trajectory can hopefully pass the intermediate state even closer.

2.3.2 Interdomain Locking

One result of locking two domains together in the elastic network model-Langevin dy-namics (ENM-LD) simulation is shown in Figure 2.1. The plot shows the root mean square deviation (RMSD) to the target conformation with respect to iteration time of

(19)

Figure 2.2: Root mean square deviation versus iteration time when two domains are forced together. A simple ENM was used with cut-off distance 4 Å. To the left is the open structure (2C9M) of SERCA and on the right the closed structure (1T5S). The domains that have been forced together are domains 2 (green) and 6 (blue), 7 (pink) and 8 (yellow), 8 (yellow) and 9 (red).

the algorithm, which means that the slope of the curve can be interpreted as the speed of the conformational transition. For this method many combinations of cutoff distances in the ENMs as well as different types of ENMs were tested without giving satisfactory results. We expected to obtain curves that deviated much from each other as highly dynamic domains that were perturbed by being forced to stick with another domain, were plotted together with perturbed domains that did not have much relative motion. Figure 2.1 shows the most satisfactory result obtained, where the domains marked in the coarse-grained pictures of SERCA to the right and left of the plot were forced together. The simple ENM was used with a cutoff distance of 4 Å. It can be noticed that when the two closely positioned alpha helices in the trans membrane part are being forced together the transition goes a little bit slower than for the other two domain pairs marked in the figure. The effect was however expected to be much larger and due to the many simu-lations (48) performed with different ENMs, cutoff distances and force constants of the springs, all hardly showing any deviation from unperturbed simulations, the conclusion can be drawn that this method is not working for SERCA in its current form. This was first very surprising since the method successfully has been used on other proteins [23]. However, since SERCA has a very compact transmembrane part as well as two initially widely spaced and very dynamic domains (the N and A domain) it is difficult to find a suitable elastic network model that will allow the protein to respond to perturbations. Either the protein gets too rigid with too many springs connecting the particles if a high cutoff is used or too loose with a low cutoff. Therefore more fine tuning of the parame-ters is necessary before this method will show any results, but this will be a both time consuming and difficult task to carry out.

(20)

Figure 2.3: The displacement, in Å, of every residue for the four different transitions with respect to simulation frame.

Every vertical line shows how much the protein has moved since the last frame and the color displace the magnitude of the displacement in Å. This shows which parts of the protein that are moving much and at what time during the simulation. Notice the light horizontal lines that appears in all of the transitions. These lines correspond to residues that are moving a lot for a longer time during the transition, which indicates that these are dynamically important regions for the function of the protein. The N-domain is in our definition ranging from residues 359 to 602 and it can clearly be seen in all of the transitions a), b), c) and d) that residues from this domain are highly dynamic. Especially residues 570 to 590 and 500 to 520, are moving much during the whole transition, even though these lines are clearer in a) and b). These domains are situated on the top of the N-domain towards the inner side of the protein. In transition a) and b) we also have some motion in the bottom of the alpha helices in the transmembrane part (residues 860 to 900) and in the end of the transition motion in a short helix connecting the A-domain with the transmembrane part can be seen (residues 50 to 60), which fit with the model of piston-like motions related to calcium pumping. In transition c) and d) we can instead see a little bit more motion in the A-domain and the M1-M2 helices (residues 1 to 230). In figure 2.3 d) there is a clear thin line appearing in the middle of the transition. This line corresponds to a piece of bent coil sticking out of the A-domain first being very close to the P-domain. As the A-domain rotates this part is violently being repelled from the P-domain.

2.3.3 Correlation with Cancer-Related Mutations

Table 2.1-2.4 presents all the obtained interdomain contacts that has at least one in vivo mutation reported for at least one of the contact residues. One of the most interesting

(21)

findings is probably the contact that is being formed between the electrically charges residues 489 and 680 as the N-domain is bending forward towards the P-domain, in the 2C9M to 1T5S transition. It is likely that this residue pair will form a salt bridge as the get close enough to each other an therefore work as a lock helping the protein to stay in the closed conformation. The mutation reported for the negatively charged glutamic acid means that it will be changed into an uncharged valine, which then means that this salt bridge can not be formed. Depending on how much of a stabilizing effect this specific salt bridge has for the protein, the mentioned mutation might have a radical effect on the folding of the protein. There are also other interesting contacts that might have a big effect of the stabilization of the protein. These are situated in the transmembrane domains, often close to the P-domain. One particularly interesting residue contact is the one between residues 325 and 749, where there are several possible mutations reported for residue 749 and one for residue 325. One of these mutation, namely when the negatively charged glutamic acid in residue 749 is being changed into a positively charged lysine, which means that there are going to be two positively charged amino acids repelling each other instead of a salt bridge being formed. Radically, this might cause the protein to expand from the inside or at least reduce the overall stability in the transmembrane. We also have a three point contact where the residues 47, 113 and 243 form a triangle as they are situated on three different helices right below the A-domain, might be forming salt bridges. It is not clear from the simulation between which pairs a salt bridge might be formed, or if a salt bridge will first be formed between two of the residues, then be broken after which a salt bridge is formed between the other two residues, as the A-domain rotates. However, there is a mutation reported for residue 47 where the positively charged lysine is replaced with a negatively charged glutamic acid, which will lead to repulsion between these three residues. This might cause severe destabilization in the helices below the A-domain and possibly cause the A-domain to drift further from the middle of the protein, making the closing of the A- and N-domains harder.

In the 1T5T to 3B9B transition we first have four different contacts with possible mu-tations, probably corresponding to salt bridges being broken between the P-, N- and A-domains. Then we have one interesting contact that is being formed between the P-domain and the transmembrane that might be of importance for the positioning of the P-domain as it moves towards the center of the protein. For this contact there is a possible mutation where the positively charged arganine is changed into a methionine with a hydrophobic side chain, which makes it impossible for the residues to form a salt bridge. Again, we have the residues 47 and 113 involved in a three point contact, but this time the third residue is number 324. These residues are placed in a row on three different helices under the A-domain and in the transmembrane. There are several possi-ble mutation that can occur in these three residues, but the one that might have to most radical effect would probably be when the negatively charged glycine in residue 113 is changed into a positively charged lysine, meaning that this residue would repel both of the other residues in the trio, since this residue is placed in the middle of the row. There are also interesting results for the transitions 1T5T to 2C9M and 3B9B to 1T5T, but since they are not very likely to occur in the natural pumping cycle of SERCA, these results will not be discussed in this report.

However, these results need to be investigated further with more accurate methods like molecular dynamics. First, it is necessary to simulate the protein free from mutations

(22)

in order to see if salt bridges really are being formed at the interesting places. Then all these mutations have to be put into the protein separately and simulated again in order to see if and how these mutations affect the function of the protein.

Table 2.1: The interdomain contacts with corresponding possible mutations seen in diseased cells. ⊕ denotes positively and negatively electrically charged side chains respectively. Transition 2C9M to 1T5S.

Interdomain contact Minimum

Cα-distance Mutations found in cancer cells Mutation effect 557 ASP - 638 ARG 4.473 p.R638* p.R638Q ⊕ → stop codon ⊕ → polar side chain

557 ASP - 637 ARG 7.009 p.R637C ⊕ → uncharged

489 ARG - 680 GLU 7.404 p.E680V → hydrophobic side chain

47 LYS - 243 GLU 7.664 p.K47E ⊕ →

47 LYS - 113 GLU 7.710 p.K47E

p.E113K ⊕ → → ⊕ 325 ARG - 749 GLU 7.739 p.R325Q p.E749K p.E749* p.E749E

⊕ → polar side chain → ⊕

→ stop codon Silent

80 GLU - 290 ARG 7.760 p.R290S ⊕ → polar side chain

5 HIS - 196 ASP 7.827 p.H5H Silent

Table 2.2: The interdomain contacts with corresponding possible mutations seen in diseased cells. ⊕ denotes positively and negatively electrically charged side chains respectively. Transition 1T5T to 3B9B.

Cα-distance

Mutations found in cancer cells

Mutation effect

329 LYS - 737 ASP 6.995 p.D737H → ⊕

139 ARG - 426 ASP 7.083 p.R139M ⊕ → hydrophobic side chain

695 ASP - 825 LYS 7.201 p.R825Q ⊕ → polar side chain

47 LYS - 113 GLU 7.220 p.K47E

p.E113K

⊕ → → ⊕

218 LYS - 422 ASP 7.248 p.K218I ⊕ → hydrophobic side chain

557 ASP - 638 ARG 7.264 p.R638*

p.R638Q

⊕ → stop codon ⊕ → polar side chain

560 ARG - 627 ASP 7.362 p.R560H ⊕ → ⊕

(23)

Table 2.3: The interdomain contacts with corresponding possible mutations seen in diseased cells. ⊕ denotes positively and negatively electrically charged side chains respectively. Transition 1T5S to 2C9M.

Cα-distance Mutations found in cancer cells Mutation effect 557 ASP - 638 ARG 6.178 p.R638* p.R638Q ⊕ → stop codon ⊕ → polar side chain

47 LYS - 113 GLU 7.220 p.K47E

p.E113K ⊕ → → ⊕ 325 ARG - 749 GLU 6.974 p.R325Q p.E749K p.E749* p.E749E

560 ARG - 627 ASP 7.165 p.R560H ⊕ → ⊕

329 LYS - 737 ASP 7.453 p.D737H → ⊕

218 LYS - 422 ASP 7.248 p.K218I ⊕ → hydrophobic side chain

139 ARG - 426 ASP 7.972 p.R139M ⊕ → hydrophobic side chain

Table 2.4: The interdomain contacts with corresponding possible mutations seen in diseased cells. ⊕ denotes positively and negatively electrically charged side chains respectively. Transition 3B9B to 1T5T.

Cα-distance

Mutations found in cancer cells

Mutation effect

113 GLY - 334 ARG 6.442 p.E113K → ⊕

557 ASP - 638 ARG 6.948 p.R638*

p.R638Q

⊕ → stop codon ⊕ → polar side chain

47 LYS - 113 GLU 7.204 p.K47E

p.E113K

⊕ → → ⊕

5 HIS - 192 GLU 7.547 p.H5H Silent

325 ARG - 749 GLU 7.562 p.R325Q

p.E749K p.E749* p.E749E

695 ASP - 825 LYS 7.988 p.R825Q ⊕ → polar side chain

2.3.4 Deformational Space Overlap

The accumulated normalized dot product between the ten first eigenvectors obtained from both NMA and PCA, as described in the Methods section, yielded an overlap of 84.346 % when a cutoff of 16 Å was used in the ENM. However, there is no indication that this is the optimal cutoff distance. Therefore similar calculations should be performed for other cutoff distances in order to find the best overlap and thus the most optimal cutoff for the nearest neighbors elastic network model of SERCA.

(24)

Chapter 3 Summary and Conclusions

In this report large conformational changes of the membrane protein SERCA has been studied with coarse-grained methods. The protein was modeled with the elastic network model nearest neighbors, coarse-grained to the Cα-level, and simulated with Langevin dynamics. This report focuses on the two largest transitions, between the crystal struc-tures 2C9M and 1T5S and 1T5T and 3B9B. The goal was to find certain places in the structure where mutations could alter and affect the functioning of the protein.

First, the natural motion of SERCA was simulated for all of the transitions mentioned above. The transition pathways were then obtained by projecting the simulations onto the first and second principal components obtained from an ensemble of crystal struc-tures. The result showed that the transition going from 2C9M to 1T5S passes close to a crystal structure of an intermediate state, which enhances the reliability of the pathway. It is also evident that the Cartesian interpolation is not a physical way of describing the transition, since the shortest way does necessarily minimize the energy and is thus not always the optimal pathway for the protein. The transition pathway 1T5T to 3B9B was also compared with a transition pathway obtained from an umbrella sampled molecular dynamics simulation. Even though there are differences especially in the beginning and the end of the transitions, the Langevin dynamics simulation is following the MD simu-lation quite well. The MD simusimu-lation was however made with seeds from the Langevin simulation, otherwise it would not be able to find the path in a reasonable time. There-fore the conclusion can be drawn that elastic network models simulated with Langevin dynamics can sample large conformational transitions that molecular dynamics can not tackle.

The reasonably high overlap between the eigenspaces from principal component analysis and normal mode analysis shows that the elastic network model is a good approximation for atomistic representations of proteins. However, it is very likely that a more optimal cutoff distance can be used in the elastic network thus yielding a higher overlap.

The first approach to find the places in the structure where mutations could affect the functioning of the protein was to use the elastic network to lock two domains to each other by increasing the force constants of the springs connecting the domains. This would make the structural transition either slow down or accelerate, as the protein was not allowed to move in its natural way. However, hardly any changes could be observed in

(25)

the transition speed for SERCA, suggesting that its elastic network model representation does not allow the protein to respond to perturbations. In order to notice any differences in the transition speed the parameters of the elastic network model need to be fine-tuned to fit the structure of SERCA.

The second approach was to calculate all the possible places where salt bridges could be formed between two different domains. These places were then compared to mutations observed in real cancerous cells and several matches were found. Compared to the first approach this method did not allow for investigation of the importance of these salt bridges for the function of the protein. Many of the possible mutations found seem promising, but in order to be able to draw any conclusions these mutations need to be simulated with an atomistic method so that it can be seen how they affect the function of the protein. Optimally, these results may lead to further understanding of what is causing cancer diseases related to malfunction in SERCA on a molecular level.

(26)

Bibliography

[1] S.W. Englander, L.Mayne, Y. Bai, T.R. Sosnick, Hydrogen Exchange: the Modern Legacy of Linderstrøm-Lang, Protein Science 6, 1101-1109, 1997

[2] J. Velázquez-Muriel, M. Rueda, I. Cuesta, A. Pascual-Montano, M. Orozco, J.M. Carazo, Comparison of Molecular Dynamics and Superfamily Spaces of Protein Domain Deformation, BMC Structural Biology 9, 2009

[3] K. Acharya, M. Lloyd, The Advantages and Limitations of Protein Crystal Structures, TRENDS in Pharmacological Sciences 26, 10-14, 2005

[4] E. Carafoli, The Calcium-Signalling Saga: Tap Water and Protein Crystals, Nat. Rev. Mol. Cell. Biol. 4, 326-332, 2003.

[5] M. Brindi, E. Carafoli, Calcium Pumps in Health and Disease, Physiol. Rev. 89, 1341-1378, 2009.

[6] C. Toyoshima, Structural Aspects of Ion Pumping Ca2+-ATPase of Sarcoplasmic Reticulum, Archive of Biochemistry and Biophysics 476, 3-11, 2008.

[7] A. McCammon, B. Gelin, M. Karplus, Dynamics of Folded Proteins, Nature 267, 585-590, 1977.

[8] E. Lindahl, Molecular Dynamics Simulations, Molecular Modeling of Proteins 443, 3-23, 2008

[9] A. Leach, Molecular Modelling (2nd Edition), Pearson Education Limited, 353-360, Edinburgh, 2001.

[10] G. Torrie, J. Valleau, Npnphysical Sampling Distribution in Monte Carlo Free-Energy Estimation: Umbrella Sampling, Journal of Computational Physics 23, 187-199, 1977

[11] P. Langevin, On the Theory of Brownian Motion, Acad. Sci. 146, 530-533, 1908 [12] M. Orozco, L. Orellana, A. Hospital, A. Naganathan, A. Emperador,

O. Carrillo, J. L. Gelpí, Coarse-Graining Representation of Protein Flexibility. Foundations, Successes, and Shortcomings, Advances in Protein Chemistry and Structural Biology 85, 183-215, 2011.

[13] R. Kubo, The Fluctuation-Dissipation Theorem, Rep. Prog. Phys. 29, 255-283, 1966

(27)

[14] F. Tama, Y. Sanejouand, Conformational Change of Proteins Arising from Nor-mal Mode Calculations, PEDS 14, 1-6, 2000

[15] I. Bahar, T. Lezon, L. Yang, E. Eyal, Global Dynamics of Proteins: Bridging Between Structure and Function, Annu. Rev. Biophys. 39, 23-42, 2010

[16] L. Orellana, M. Rueda, C. Ferrer-Costa, J. Lopez-Blanco, P. Chacón, M. Orozco, Approaching Elastic Network Models to Molecular Dynamics Flexibility, J. Chem. Theory Comput. 6, 2910-2923, 2010

[17] B. Brooks, M. Karplus, Normal Modes for Specific Motions of Macromolecules: Application to the Hinge-Bending Mode of Lysozyme, Proc. Natl. Acad. Sci. U.S.A. 82, 4995-4999, 1985.

[18] I. Bahar, A. Rader, Coarse-Grained Normal Mode Analysis in Structural Biol-ogy, Curr. Opin. Struct. Biol. 15, 586-592, 2005.

[19] I. T. Jolliffe, Principal Component Analysis (2nd Edition), Springer-Verlag, 1-9, New York, 2002.

[20] I. Bahar, T. Lezon, A. Bakan, I. Shrivastava, Normal Mode Analysis of Biomolecular Structures: Functional Mechanisms of Membrane Proteins, Chem. Rev. 110, 1463-1497, 2010.

[21] S. Kumar, R. Nussinov, Salt Bridge Stability in Monomeric Proteins, J. Mol. Biol. 293, 1241-1255, 1999.

[22] S. Forbes, G. Bhamra, S. Bamford, E. Dawson, C. Kok, J. Clements, A. Menzies, J. Teague, P. Futreal, M. Stratton, The Catalogue of Somatic Mutations in Cancer (COSMIC), Curr. Protoc. Hum. Genet. 57, 1-10, 2008. [23] B. Fenwick, L. Orellana, S. Esteban-Martín, M. Orozco, X. Salvatella,

Correlated Motions are a Fundamental Property of β-Sheets, Nature Communica-tions 5, 1-9, 2014.

Modeling of Conformational Transitions in Membrane Proteins with Coarse-Grained Methods