Proton binding in proteins: pK

(1)

UPTEC X 04 025 ISSN 1401-2138 APR 2004

JENS CARLSSON

Proton binding in proteins: pK _a

calculations with explicit and implicit solvent

Master’s degree project

(2)

Molecular Biotechnology Programme

Uppsala University School of Engineering

UPTEC X 04 025 Date of issue 2004-04

Author

Jens Carlsson

Title (English)

Proton binding in proteins: pK_a calculations with explicit and implicit solvent

Title (Swedish)

Abstract

Accurate and reliable pK_a prediction is of significant interest, because it provides direct information of the protonation state of a protein and can be compared to experimental data. In this work pKa shifts of three aminoacids in proteins have been calculated using molecular dynamics free energy simulations with an explicit solvent and implicit Generalized Born solvent model. The direction of the shifts were correctly predicted in both the explicit and implicit solvent calculations, but the results were not in perfect agreement with experimental data.

Keywords

Proton binding, pKa, free energy calculations, molecular dynamics

Supervisors

David A. Case

Department of Molecular Biology, The Scripps Research Institute, San Diego, USA Scientific reviewer

Johan Åqvist

Department of Cell and Molecular biology, Uppsala University

Project name Sponsors

Language

English

Security

ISSN 1401-2138 Classification

Supplementary bibliographical information Pages

(3)

(4)

Proton binding in Proteins:

pK

_a

calculations with explicit and implicit solvent

Jens Carlsson

Sammanfattning

Att kunna modellera cellens molekyler med datorsimuleringar är av stort intresse inom biologin och kan ge djupare förståelse för relationen mellan proteiners stuktur och funktion. Proteiners aktivitet styrs bland annat av den omgivande syra-bas

koncentrationen i cellen och i detta projekt har förändringar hos proteiner vid olika pH- värden studerats. För att på ett korrekt sätt beskriva den miljö som omgiver proteinet måste naturligtvis vatten inkluderas i datorsimuleringarna och detta studerades med två olika modeller. Den ena modellen beskriver vattenmolekyler på ett detaljerat sätt, vilket leder till mycket kostsamma beräkningar, medan den andra modellen representerar vattnets effekt på ett mer förenklat sätt. Det visade sig att båda modeller kunde, på ett tillfredställande sätt, indikera proteinets egenskaper vid olika pH-värden.

Examensarbete 20 p i Molekylär bioteknikprogrammet Uppsala universitet April 2004

(5)

Contents

1 Introduction 2

2 Theory 4

2.1 Molecular dynamics . . . . 4

2.2 Molecular mechanics and force ﬁelds . . . . 5

2.3 Thermodynamic integration . . . . 6

2.4 The Generalized Born model . . . . 8

2.5 pK_a shifts from free energies . . . . 10

2.6 Linear Response Theory . . . . 11

3 Materials and Methods 13 3.1 Modelling proton binding with molecular mechanics . . . . 13

3.2 Choice of appropriate residues and reference molecule . . . . 14

3.3 Setup of explicit solvent calculations . . . . 15

3.4 Setup of implicit solvent calculations . . . . 16

4 Results and Discussion 17 4.1 Linear free energy derivative and Gaussian distribution of the perturbation energy . . . . 17

4.1.1 Explicit solvent calculations . . . . 17

4.1.2 Implicit solvent calculations . . . . 20

4.2 Structural analysis and comparison to experimental data . . . . . 22

4.2.1 Asp26: structural re-organization on long timescales . . . 22

4.2.2 Asp20 . . . . 28

4.2.3 Asp14 . . . . 28

5 Conclusions 30

6 Acknowledgements 31

(6)

1 Introduction

Understanding the pH dependent properties of proteins is of great importance, because ionizable residues in active sites play key functional roles in fundamental biochemical processes. A deeper insight into proton binding provides knowledge in many ﬁelds of biochemistry, such as protein stability, substrate binding, en- zyme mechanisms and molecular recognition. For these reasons, accurate and reliable pK_a prediction is of signiﬁcant interest, because it provides direct information of the protonation state of the protein and can easily be compared to experimental data.

Proton binding constants are sensitive to the environment created by the tertiary structure of the protein. For example, an aspartic acid in a nonpolar environment will have an enormous pK_a shift upwards, because it is highly unfavorable to create a charge in such a region. Conversely, a positively charged environment would in this case favor an unprotonated state and hence downshift the pK_a value (Figure 1).

Figure 1: The pK_a shift of ionizable residues depends on the local environment in a protein. Buried residues can have substantially shifted pK_a values compared to the amino acid in aqueous solution (The illustration was used with permission from Holger Gohlke [1]).

In the ionization of an aspartic acid in a protein a hydrogen ion is transferred into the solvent leaving a negatively charged carboxylate group. The protein itself undergoes dielectric relaxation by conformational changes and electronic polarization in response to the charge re-arrangement.

There are several macroscopic methods for calculating pK_avalues. One popular approach is based on solving the Poisson-Boltzmann equation(PB) using a continuum model for both the protein and the solvent [2, 3]. These calculations are often based only on one or a few experimental conformations of the protein, but unfortunately, slightly different conformers have been shown to pro- duce results, that differ by several pK_a units [4, 5, 6]. Attempts to incorporate conformational averaging in PB models by side chain flexibility [7, 8, 9], larger ensembles of structures [10], or an average structure [11] have been shown to improve results. Yet, these models still lack the ability to model the conformational changes that occur in response to protonation state changes. These effects are accounted for implicitly in PB models by using higher values of the dielectric constant, [12]. There is, however, no general consent about the meaning of an average in such heterogenous structures as proteins and how an appropriate value for a specific model is chosen [13, 14, 15].

(7)

A more fundamental approach is given by microscopic free energy calculations using Molecular Dynamics (MD) simulations. Unfortunately, calculations of free energies using an all atom representation of both solvent and protein have, for a long time, been impossible due to convergence problems, lack of computational power, and inaccurate treatment of long-range interactions. Mi- croscopic calculations of the free energy associated with deprotonation of ionizable residues have been carried out, but many of these studies have made further approximations, for example by using simpliﬁed solvent representations such as the Langevin Dipole model [16]. Another popular assumption is the linear response approximation [17], which assumes a linear response of the protein to the charge re-arrangement. Although linear response has been shown to be valid for small molecules, few computational studies with a fully microscopic treatment of the protein and solvent system have been performed in this area [18].

In recent years, there has been significant improvement in the accuracy of long-range electrostatics in MD simulations as well as an enormous increase of computer power, which provide the opportunity to reach converged results using a fully microscopic approach. Since pK_a essentially is the free energy of deprotonation, a fundamental way to estimate this value is by using Molecular Dynamics Free Energy (MDFE) simulations. This work presents the first MDFE calculatations of pK_a shifts with an all-atom representation of both the protein and the solvent as well as an accurate treatment of long-range electrostatics by Particle Mesh Ewald (PME) summation. The approach should therefore provide a more accurate description of both conformational flexibility and solvation effects compared to PB methods. Furthermore, it allows the protein to relax from the structure of the initial state in response to the charge re-arrangements.

The study focuses on three aspartic acids of which two are extremely per- turbed in pK_a compared to the free amino acid in solution (pK_a ∼ 4.0) [19]:

Asp26 (pK_a ∼ 7.5) [20] and Asp20 (pKa ∼ 4.0) [21] of Escherichia coli Thiore- doxin and Asp14 (pK_a ∼ 2.0) [22] of Ribonuclease A. For all cases, the direction of the pK_a shifts were correctly predicted and the results are in fair agreement with experimental data. The simulations reported also directly verify linear response to the deprotonation of a solvated protein.

A second set of MDFE pK_a calculations were carried out using a Gener- alized Born [23, 24, 25] implicit solvent model. Although this approach uses a continuum solvation model, it still explicitly takes into account conformational averaging and protein relaxation and, thus, does not, contrary to most PB methods, assume a higher value of the protein dielectric constant. The same three residues as in the explicit solvent calculations were considered. Again the direction of the shifts were correctly predicted and the results were closer to the experimental data than the corresponding explicit solvent calculations.

(8)

2 Theory

This section outlines the basic principles of molecular dynamics and free energy calculations. More detailed descriptions of these methods can be found in litterature [26, 27].

2.1 Molecular dynamics

In molecular dynamics the movement of a molecule is simulated as a function of time by solving Newtons equations of motion. Assume a system composed of N atoms, for example a protein in aqueous solution. By deﬁning a potential function, V (r^N), for the system, the force F_iacting on atom i, can be calculated:

F_i=−∂V (r^N)

i

(1) where r^N = r₁, ...,r_N are the cartesian coordinates of all N atoms. Since the masses m_iof the atoms are known, Newton’s second law connects the force with the acceleration a_iof the atoms:

a_i= F_i

m_i (2)

By integration of this expression, the velocities and positions of the atoms are obtained. Under the assumption that the force acting on each atom is constant for a given time step, , new positions and velocities can be generated by using ﬁnite diﬀerence methods. An example of such an algorithm is leap-frog [28], which is widely used in MD simulations:

r_i(t + δt) = r_i(t) + δt· vi(t) (3) v_i(t + δt) = v_i(t−1

2δt) + δt· ai(t) (4) where r_i(t) and v_i(t) are the position and the velocity for particle i at time t. The time step of a biomolecular simulation is typically chosen to be in the order of one femto-second. Using these equations thousands of iterations can be performed to investigate the movement of a molecule [26].

(9)

2.2 Molecular mechanics and force fields

In order to use MD simulations a potential function, V (r^N), must be defined and parametrized. There are several different types of potential functions (also called force fields) and this section describes the general form of those common in biomolecular simulations. These force fields are composed of two groups of terms: Intramolecular (Intra) terms, which are associated with covalently connected atoms and Intermolecular (Inter) terms, which represent the noncovalent or nonbonded interactions:

V (r^N) = V_intra(r^N) + V_inter(r^N) (5) The intramolecular potential function has the following form:

V_intra =

bonds

K_r(r−req)²+

angles

K_θ(θ−θeq)²+

dihedrals

V_n

2 [1+cos(nφ−γ)] (6) The ﬁrst two terms represent the potential energy of the bonds and angles.

For these contributions, the energy varies with the square of the displacement, which is an eﬀective way to keep the bond lengths r and angles θ close to their equilibrium values r_eq and θ_eq. K_rand K_θ are the force constants of the bonds and angles near their equilibrium value. The last term in the intramolecular term represents the potential energy associated with torsional movement. The dihedral angle is φ, V_n is the force constant, n is the periodicity, and γ is the phase of the dihedral.

The intermolecular potential function has the following form:

V_inter =

i<j

4_ij

σ_ij r_ij

12

−

σ_ij r_ij

6

+

i<j

q_iq_j

r_ij (7)

The ﬁrst term in the intermolecular contribution is the Lennard-Jones (LJ) 6- 12 term and represents repulsion and dispersion interactions. There are two parameters in this equation; the well depth, _ij, indicates the magnitude of favorable dipole induced interactions between atoms i and j, and σ_ij is the interatomic distance at which the interaction energy is zero. The last term in the potential function is the electrostatic term, which depends on the partial atomic charges, q_i and q_j, and the distance, r_ij , between atom i and j [26].

The parameters in the intra- and intermolecular terms are derived by using experimental data, combined with quantum chemical calculations, to obtain the correct behavior of diﬀerent types of molecules. AMBER, which is the force ﬁeld used in this work, is parameterized to be used in simulations of proteins and nucleic acids.

(10)

2.3 Thermodynamic integration

Free energy calculations are among the most powerful and interesting tools in computational chemistry, and the technique is frequently used in biomolecular calculations. This section describes a common free energy calculation method, thermodynamic integration (TI).

If the free energy diﬀerence, ∆F, between two states A (protonated aspartic acid) and B (unprotonated aspartic acid) is assumed to be a continuous function of a coupling parameter λ, it can be written as:

∆G(λ) =

₁

0

∂G(λ)

∂λ dλ =

₁

0

∂U (r^N, λ)

∂λ

λ

dλ (8)

The free energy F(λ) is deﬁned as:

F (λ) =−kBT ln(Z(λ)) =−kBT ln(Q(λ)) + c(p^N) (9) where Z(r^N, p^N, λ) is the classical partition function and Q(λ) is the conﬁg- urational integral with the velocity part integrated out. The conﬁgurational integral in the NVT ensemble is:

Q(λ) =

exp[−U(r^N, λ)/k_BT ]dr^N (10)

U(r^N, λ) is the potential energy function, T is the temperature and k_B is the Boltzmann constant.

Computer simulations allow manipulations of systems in a way that would not be possible in real experiments. Since the free energy is a state function, the transformation from state A to state B can be performed stepwise, where diﬀerent values of the coupling parameter λ correspond to mixtures of states A and B. A common hybrid function is a linear interpolation between the potential energy functions corresponding to each state, U_A and U_B:

U (r^N, λ) = (1− λ)UA(r^N) + λU_B(r^N) (11) The hybrid system is gradually changed from state A to state B by varying the coupling coordinate λ from 0 to 1.

By using the chain rule, the derivative of the free energy can be calculated:

∂F (λ)

∂λ =−kBT∂lnQ(λ)

∂λ =−kBT 1 Q(λ)

∂Q(λ)

∂λ (12)

The derivative of the classical partition function is:

∂Q(λ)

∂λ =− 1 k_BT

∂U (r^N, λ)

∂λ exp[−U(r^N, λ)/k_BT ]dr^N (13) Substituting this into equation (12) yields:

∂F (λ)

∂λ =

∂U (r^N, λ)

∂λ

exp[−U(r^N, λ)/k_BT ]

Q(λ) dr^N (14)

(11)

Since the Boltzmann probability function is ρ(λ) = exp[−U(r^N, λ)/k_BT ]

Q(λ) (15)

this yields

∂F (λ)

∂λ =

∂U (r^N, λ)

∂λ ρ(λ)dr^N =

∂U (r^N, λ)

∂λ

λ

(16)

The brackets indicate an average over the ensemble corresponding to the hybrid energy function, U(r^N, λ). Finally the free energy diﬀerence can be calculated from:

∆F (λ) =

1 0

∂U (r^N, λ)

∂λ

λ

dλ (17)

In practice, the integration can be achieved by making a series of simulations corresponding to a number of values of λ between 0 and 1. The total free energy diﬀerence can then be approximated by the area under the graph (Figure 2) [26].

0 0.2 0.4 0.6 0.8 1

<∂U/∂λ>λ

λ ∆G

Figure 2: The free energy can be calculated by making a series of calculations corresponding to diﬀerentλ values. The free energy change can be estimated by calculating the area under the graph.

The technique derived is not only valid for the NVT ensemble, but also for the other thermodynamical ensembles such as NPT and NVE. It should be noted that these ensembles essentially are artiﬁcial constructs and in the thermodynamic limit (large system size and long simulations) the calculated free energies become equivalent [29].

(12)

2.4 The Generalized Born model

Biomolecules interact in an aqueous phase and a correct description of this environment is thus important to obtain realistic simulations. Explicit solvation using an all-atom representation of water molecules is the most accurate way to incorporate these effects in MD, but it is also the computationally most expensive. A more approximate, but less expensive, alternative is implicit solvation, where the explicit water molecules are replaced with a continuum medium having the average properties of water. This procedure decreases computational time, because the number of atoms used is reduced. In addition, there are several other advantages. First, there is no need for re-organization of water molecules in response to conformational changes of the solute and, therefore, it is not necessary to perform long equilibrations in order for water to fully penetrate cavities in the protein. Also, since implicit representation of solvent corresponds to solvation in an infinite volume, difficulties with using periodic systems, such as replica interaction, are avoided [30].

Protein ε^w

Protein

H H O

H H O H H

O H H

O

H H O H H H H O

O

H H O H H O H H OH H O

H H O H H

O

H H O H H

O

H H O

H H H H O

O H H

O

H H O

H H H H O O

H H O

H H O H H

O H H

O

H H O

H H O H H

O

H H O H H O

Figure 3: By using an implicit solvent model the water molecules in a simulation do not have to be explicitly included. The water molecules are replaced with the average properties of water, characterized by its dielectric constant,w. This reduces computation time and accelerates conformational sampling.

In 1920 Born derived the electrostatic component of the free energy of solvation for displacing a charge within a spherical solvent cavity. For a simple ion with radius a and charge q the solvational free energy is given by the well-known Born formula [31]:

∆G_Born=−q² 2a

1− 1

_w

(18)

(13)

If a protein is modelled as a set of spheres with charges q₁,..., q_N and radii a₁,...,a_N the total electrostatic free energy is a sum of Born- and Coulomb terms:

G =

N i=1

N j=i+1

q_iq_j

_wr_ij −1 2

1− 1

_w

N

i=1

q²_i

a_i (19)

=

N i=1

N j=i+1

q_iq_j r_ij −

1− 1

_w

N

i=1

q_iq_j r_ij −1

2

1− 1

_w

N

i=1

q²_i

a_i (20) The last two terms represent the polarization free energy. This is the Generalized Born equation (GB) [23, 24]:

∆G^GB_pol =−

1− 1

_w

N

i=1

N j=i+1

q_iq_j r_ij −1

2

1− 1

_w

N

i=1

q_i²

a_i (21) As a ﬁnal approximation, the expression can be transformed into a single term:

∆G^GB_pol ≈ −1 2

1− 1

_w

N

i=1

N j=1

q_iq_j

f_GB(r_ij) (22) The function f_GB should be constructed so that when i=j (r_ij = 0), it acts as an ”effective Born radius”, f_GB = R_i. The effective Born radius, R_i, depends not only on a_i, but also accounts for the fact that atoms in proteins, unlike ions, are surrounded by neighbors that displace the solvent and, therefore, decrease the polarization energy (R_i ≥ ai). At longer distances between atom i and j f_GB acts as an effective interaction distance, f_GB ≈ rij. One popular form of f_GB is:

f_GB(r_ij) =

r²_ij+ R_iR_jexp[−rij/4R_iR_j]¹₂

(23) Many methods have been proposed for the estimation of R_i and numerous ex- amples can be found in literature [30]. These models have been shown to be a good approximation of solvation energies for small molecules, but for molecules having large interior regions the errors increase. This also affects pK_a calculations, and it seems as GB models might not provide the accuracy needed to predict pK_a shifts [32]. The largest errors in the calculation of effective Born radii were obtained for deeply buried atoms, where the model seems to under- estimate the pK_a shifts. In this work a recently developed Generalized Born model, GBÔBC [25], is used which corrects for the underestimation of effective Born radii for deeply buried atoms.

(14)

2.5 pK_a shifts from free energies

The pK_a value is connected to the free energy change of protonation. The equlibrium constant K_a is related to the pK_a value:

AH + H2O A⁻+ H3O⁺ (24)

pK_a=−log10(K_a) (25)

Since the equilibrium constant is connected to the free energy by:

K_a= exp

−∆Gdeprotonation

RT

(26) The pK_a can be calculated from equation (25) and (26), which yields:

pK_a= 1

2.303RT∆Gdeprotonation (27)

the pK_a shift of a residue relative to a reference compound with known pK_a value can then be calculated by performing two free energy calculations (Figure 4) [16]. One calculation where the free energy of deprotonation for the amino acid in the protein, ∆G_protein, is estimated and a second simulation where the same free energy for the amino acid in solution, ∆G_model, is computed.

Figure 4: By using two mutations, ∆pK_a with respect to a model compund can be calculated. Protein-ASPH and ASPH represent the protonated form of aspartic acid in the proteins and the model compound respectively. Protein-ASP⁻and ASP⁻ represent the unprotonated forms.

The relative free energies are given by:

∆∆Gdeprotonation= ∆G_protein− ∆Gmodel (28) The pK_avalue of the residue in the protein is then given by using equation (27):

pK_a,protein= pK_a,model+ 1

2.303RT∆∆Gdeprotonation (29)

∆pK_a,protein= pK_a,protein− pKa,model= 1

2.303RT∆∆Gdeprotonation (30)

(15)

2.6 Linear Response Theory

This section gives an introduction to linear response theory and provides a brief overview of concepts used in the analysis of the results of this work.

Assume that the potential function, U, for a system can be written as a sum of the initial state, U_A, and a perturbation, ∆U , where the variable λ ∈ [0, 1]

determines the magnitude of the perturbation.

U = U_A+ λ∆U (31)

In the case considered in this work the perturbation is the charge created on an amino acid by deprotonation and the λ value determines the mixed states of the protonated and deprotonated amino acid. For the linear interpolation (see Section 2.3), this means that ∆U = U_B−UA, i.e. equation 31 can be expressed:

U = λU_B+ (1− λ)UA= U_A+ λ(U_B− UA) (32) The free energy can be expressed through the perturbation formula derived by Zwanzig [33]:

∆F =−β⁻¹lnexp(−βλ∆U)_A (33)

Expansion of this expression in powers of βλ∆U yields [26]:

∆F = λ < ∆U >_A−λ²β 2

(∆U− < ∆U >A)²

A+ λ³^β₆²

(∆U− < ∆U >A)³

A+ ...

(34)

In early studies of electron transfer reactions, it was found that the free energy of charging is approximately quadratic as a function of charge. This implies that terms of order three and higher are negligible and this assumption leads to the linear response approximation (LR) [18].

∆F_LR= λ < ∆U >_A−λ²β 2

(∆U− < ∆U >A)²

A (35)

The free energy is hence a quadratic function of λ, in accordance with the results described above. From equation (35) it also follows that the derivative with respect to λ of ∆F_LR is a linear function of λ:

∂∆F_LR(λ)

∂λ =< ∆U >_A−λβ

(∆U− < ∆U >A)²

A (36)

It can also be shown that an equivalent condition is that the probability distribution of the energy gap is gaussian [34].

The free energy can be decomposed into two components [18]. The static component to the free energy, ∆F_static, corresponds to the introduction of the

(16)

The relaxation free energy can hence be obtained by calculating the second derivative with respect to λ:

∆F_relaxation=−β

2 < (∆U− < ∆U >A)²>_A=1 2

∂²F_LR(λ)

∂λ² (39)

(17)

3 Materials and Methods

3.1 Modelling proton binding with molecular mechanics

In this work, molecular mechanics is used to model the proton binding reaction.

The proton is treated as a classical mechanical particle, bearing a partial charge, and interacts with the atoms in the system via the force ﬁeld potential function.

The binding of the proton is introduced by creating a new particle, the hydrogen atom, in the simulation. By using thermodynamic integration, this can be modelled by gradually transforming the protonated form into the unprotonated of the amino acid.

Oδ1 Oδ2 -0.6376 (-0.8014)

Cγ 0.6462 (0.5307)

Cβ

Cα

Hδ2 0.4747 (0.0000)

-0.5554 (-0.8014)

Figure 5: Charges of the protonated form and the unprotonated form (in parenthesis) of aspartic acid.

The change of charge was localized to the C_γ, O_δ1, O_δ2and H_δ2atoms. The partial charges of aspartic acid were chosen so that the total change of charge is -1 (Table 1). In many other methods for pK_acalculations, the change is located only to the heavy atoms, but the approach taken here should provide a more accurate description of the reaction because the hydrogen atom is explicitly included.

Table 1: Partial charges and atom types of the protonated and unprotonated form of aspartic acid.

Protonated Unprotonated Atom Atom type Charge Atom type Charge

C_γ C 0.6462 C 0.5307

O_δ1 O -0.5554 O2 -0.8014

O_δ2 OH -0.6376 O2 -0.8014

H_δ2 H 0.4747 H 0.0000

(18)

3.2 Choice of appropriate residues and reference molecule

The choice of residues for which the pK_ashifts were calculated was based on two critera: First, many studies of pK_a values have focused on unperturbed surface groups, but it has been pointed out that in these cases reasonable agreement with experiment can be achieved even by fundamentally wrong models [14].

Thus, to be a proper test case of the applied methodology, the residues should have a large pK_a shift. Second, the residues were selected such that they are not inﬂuenced by the protonation state of other ionizable residues. That is, there should not be any residues nearby that have a similar pK_a value with respect to the residue of interest. Interacting titratable sites add complexity to the calculations, because all 2^N diﬀerent protonation states have to be taken into account in this case. The chosen residues are listed in Table 2.

Table 2: Experimental pK_a values of the three chosen residues and the model compound.

Residue pK_a

Thioredoxin - Asp26 7.5 Thioredoxin - Asp20 4.0 Ribonuclease A - Asp14 2.0

Model - Asp 4.0

Asp26 of the oxidized form of Escherichia coli Thioredoxin is a deeply buried residue in the interior of the protein and it has an experimentally determined pK_a of 7.5 [20]. It was assumed that it titrates independently of other ionizable groups, which seems reasonable since no nearby group has a similar pK_a value as Asp26. All the other ionizable groups in the protein have a ﬁxed protonation state, which is determined by the most likely protonation state at pH close to the experimental pK_a of the residue. In this case all other aspartic and glutamic acids were kept unprotonated, histidines were kept in a singly protonated state, and lysines and arginines were protonated. To conﬁrm that a surface exposed residue is expected to show unshifted behavior, Asp20 of Escherichia coli Thioredoxin was chosen. No experimental data was found for this residue, but it is conserved in human Thioredoxin where its pK_a is unperturbed (pK_a

≈ 3.6-4.0) [21]. All aspartic and glumatic acids were protonated in this case and histidines were kept in a doubly protonated state. Arginines and Lysines were also kept in their protonated state. Finally, to complement the abnormally high pK_a of Asp26, Asp14 of Ribonuclease A was chosen as the third case. Ac- cording to experiment it has an unusually low pK_a value equal to 2.0 [22]. For the same reasons as in the Asp26 of Thioredoxin, this residue was expected to titrate independently of other residues. In this case aspartic and glutamic acids were kept unprotonated although the predominant state at pH = 2.0 probably is the protonated form. Even though all of them are more than 10 ˚A apart from Asp14, this might affect the final results of the calculations. To study the influence of this, Generalized Born calculations were carried out both with the protonated and unprotonated aspartic and glumatic acids. Histidines were doubly protonated in all these calculations.

(19)

As model compound aspartic acid with blocking groups N-acetyl and N- methylamide (Figure 6) was chosen. The model pK_a value is, according to experiment, equal to 4.0 [19].

O H

N

H N CH₃

H O

O CH₃ O

Figure 6: Aspartic acid with protective groups was chosen as reference molecule for thepKacalculations.

3.3 Setup of explicit solvent calculations

All explicit solvent free energy calculations were carried out with AMBER 7.0 suite of programs [35] using the AMBER/parm94 force ﬁeld [36] and the TIP3P water model [37]. Periodic boundary conditions were applied using an octahedral box. The temperature was set to 300 K, the time step to 1 fs, and bonds to hydrogens were constrained with SHAKE [38]. Long-range electrostatic interactions were treated by PME [39] with an 8 ˚A cutoﬀ for the direct space sum and a continuum method to evaluate van der Waals interactions beyond the direct sum.

All calculations on Thioredoxin were performed starting from an X-ray structure with 1.68 ˚A resolution (PDB code 2TRX) determined at pH=3.8 [40]. The crystallographic structure was placed in an octahedral box with a volume of about 181000 ˚A³. In the Asp26 case the box contained 4757 water molecules and 4 Na⁺ ions and in the Asp20 case 4753 water molecules and 3 Na⁺ ions.

The structure was equilibrated by performing 50 ps of NVT simulation, in which the temperature was raised to 300 K. An additional NPT simulation was carried out for another 110-250 ps until a proper density of the system was reached.

All calculations on Ribonuclease A were performed starting from an X-ray structure with 1.05 ˚A resolution determined at pH 5.2 ( PDB code 1KF2 ) [41].

The crystallographic structure was placed in an octahedral box with volume 267923 ˚A³ containing 7324 watermolecules and 9 Cl⁻ ions. The system was equilibrated by performing 50 ps of NVT simulation, in which the temperature was raised to 300 K. Finally a NPT simulation was carried out for another 250

(20)

Successive simulations were started from the conﬁgurations reached after 200- 1000 ps of the previous simulation. For Asp26 and Asp14, which are buried atoms, convergence was slow and a 3 ns equilibration was performed at each λ. After that a 5 ns production for Asp14 and 6 ns production for Asp26 was carried out. Asp20, which is a surface residue, was equilibrated for 200 ps at each λ and after that a 5 ns production were carried out. At each endpoint a short, 1.5 ns, reverse calculation (λ=0.88729 to λ= 0.5) was carried out for all residues. This was performed to detect possible hysteresis eﬀects in the free energy derivative. The model compound was simulated for about 1.0 ns for each window.

3.4 Setup of implicit solvent calculations

All Generalized Born thermodynamic integration calculations were carried out with AMBER/GBÔBC(parameter set I) [25] with a modified form of the parm99 force field, parm99Mod2, [42, 43]. The temperature was set to 300 K, the time step to 2 fs and bonds involving hydrogens were constrained with SHAKE [38].

No cut-off for long-range interactions was used, GB/SA was used and concen- tration of monovalent ions was set to 0.2 M. The same crystal structures as in the explicit solvent run were used. They were prepared by a 20 ps equilibration in which the temperature was gradually raised to 300 K. The coordinates were held fixed with harmonic restrains and gradually released during a period of 30 ps. The TI calculations were performed with five different λ’s between 0 and 1 (0.0, 0.11270, 0.5000, 0.88729, 1.0). For most points a 10 ps equilibration was performed before starting the production runs in order for the protein to adjust to the perturbation. All successive simulations were started from the configu- ration reached after 200-500 ps of the previous simulation except λ = 0 and λ

= 0.11270, which were both started from the equilibrated X-ray structure

(21)

4 Results and Discussion

This section is divided into two parts. The ﬁrst part discusses the calculations and the agreement of the results with linear response theory. Section two ana- lyzes the structures, the calculated pK_a values and compares this work to earlier computational studies and experimental data.

4.1 Linear free energy derivative and Gaussian distribu- tion of the perturbation energy

4.1.1 Explicit solvent calculations

The total production time for the explicit solvent calculations was on average more than 5 ns for each λ value. The calculated free energies of deprotonation are shown in Table 3 together with the estimated errors.

Table 3: Explicit solvent MDFE results (kcal/mol)

Model Asp26 Asp20 Asp14

run length (ns) 3.0 18.0 15.0 15.0

∂G

∂λ(λ=0.11270) -1.3(0.8) -3.1(12.0) -1.0 (1.6) -1.3(1.0)

∂G

∂λ(λ=0.5) -75.3(0.2) -64.5(4.6) -73.9(0.4) -78.3(1.8)

∂G

∂λ(λ=0.88279) -148.6(2.0) -131.3(7.8) -145.1(1.8) -146.8(3.2)

∆G -75.1(0.6) -66.0(4.3) -73.4(0.7) -75.9(1.1)

∆∆G 0 9.1(4.3/2.5) 1.7(0.9) -0.8(1.4)

∆∆G (exper.) - +4.8 0 -2.7

Absolute and relative free energies (kcal/mol) for the three residues and the model compound. ∆G and ∆∆G are calculated as presented in the methods section. The error in the derivatives are estimated as twice the standard deviation of block averages, each trajectory being divided into four blocks.

As described in Section 2.6 a gaussian distribution of the energy gap ( ∆U

= U_protonated(λ) - Uunprotonated(λ) ) and a linear free energy derivative with respect to λ can be expected from linear response theory. Figure 7 shows the free energy derivative with respect to λ for each residue. Figure 8 shows the log probability distributions of the energy gap for each residue and λ value.

(22)

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Model

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Model

Fit

(a) Model.

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Asp26

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Asp26

Fit

(b) Asp26.

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Asp20

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Asp20

Fit

(c) Asp20.

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Asp14

-160 -140 -120 -100 -80 -60 -40 -20 0

0 0.2 0.4 0.6 0.8 1

∂G/∂λ

λ Asp14

Fit

(d) Asp14.

Figure 7: The free energy derivative (kcal/mol) as a function ofλ for the explicit solvent calculations: a) Model, B) Asp26, C) Asp20, D) Asp14.

The linearity in the free energy derivatives is close to perfect for all three residues, although it should be pointed out that only three data points were used in this study. Using more points would, of course, more clearly show if the response truly is linear. The ”backward” runs, carried out to detect possible hysteresis, overlapped well with the ”forward” runs for all residues (data not shown). The decrease in the free energy derivative describes the increasing polarization of the environment surrounding Asp. This indicates that the environment, as described in Section 2.6, responds linearly to the deprotonation of the residues.

Proton binding in proteins: pK

JENS CARLSSON

Proton binding in proteins: pK a

calculations with explicit and implicit solvent

Molecular Biotechnology Programme

Proton binding in Proteins:

pK

calculations with explicit and implicit solvent

Jens Carlsson

Proton binding in proteins: pK _a