• No results found

Novel Potential Cruzain Inhibitors Identified through Virtual Screening

N/A
N/A
Protected

Academic year: 2022

Share "Novel Potential Cruzain Inhibitors Identified through Virtual Screening"

Copied!
30
0
0

Loading.... (view fulltext now)

Full text

(1)

UPTEC X09 035

Examensarbete 30 hp Oktober 2009

Novel Potential Cruzain Inhibitors Identified through Virtual Screening

Henrik Keränen

(2)

Molecular Biotechnology Programme

Uppsala University School of Engineering

UPTEC X 09 035

Date of issue 2009-10

Author

Henrik Keränen

Title (English)

Novel potential cruzain inhibitors identified through virtual screening

Title (Swedish)

Abstract

The World Health Organisation estimates that 16-18 million people are infected by Trypanosoma cruzi, the causative parasite of Chagas’ disease. A serious disease against which there are few and no effective drugs for. This project has focused on finding inhibitors through virtual screening for the major cysteine protease, cruzain. Using the relaxed complex scheme a set of 46 putative inhibitors have been found, most of which would not have been found with conventional methods.

Keywords

Chagas’ disease, Cruzain, Inhibitors, Virtual Screening, Relaxed Complex Scheme

Supervisors

Andrew J. McCammon

Department of Chemistry and Biochemistry, University of San Diego, USA

Scientific reviewer

Johan Åqvist

Department of Cell and Molecular Biology, Uppsala University

Project name Sponsors

Language

English

Security

ISSN 1401-2138 Classification

No

Supplementary bibliographical information Pages

27

Biology Education Centre Biomedical Center Husargatan 3 Uppsala Box 592 S-75124 Uppsala Tel +46 (0)18 4710000 Fax +46 (0)18 555217

(3)

Novel potential cruzain inhibitors identified through virtual screening

Henrik Keränen

Populärvetenskaplig sammanfattning

Trypanosoma cruzi är en parasit som orsakar Chagas sjukdom hos människan, med symptom som hjärtrytmfel och förstorad grovtarm; som alla kan leda till död. Världshälsoorganisationen uppskattar att 16-18 miljoner människor är bärare av parasiten. Eftersom sjukdomen främst finns i fattiga delar av Sydamerika har läkemedelsbolagen ett vagt intresse av sjukdomen. Idag finns inte några effektiva läkemedel mot sjukdomen, så det är viktigt med mer forskning som leder fram till nya läkemedel inom en snar framtid.

För att kunna finna nya läkemedel så har detta arbete fokuserats på att göra datormodelleringar av ett av parasitens proteiner. Genom att simulera hur proteinet rör sig så kan man få en mer

verklighetstrogen bild av det, och därmed kunna designa inhibitorer som är anpassade för just detta protein. Här har det gjorts genom att ta ögonblicksbilder av simuleringen och försöka passa in

liganderna i alla dessa. Utifrån detta kan man då räkna ut vilka ligander som binder bäst till det rörliga proteinet, snarare än bara en fast struktur. Och i slutändan kan förhoppningsvis dessa resultat vara ett steg på vägen till ett nytt fungerande läkemedel.

Examensarbete 30 hp

Civilingenjörsprogrammet i molekylär bioteknik Uppsala universitet 2009

(4)

Contents

1. Introduction . . . . 1

1.1 Trypanosoma cruzi . . . 2

1.1.1 Life cycle . . . 2

1.2 Cruzain . . . 2

2. Theory . . . . 4

2.1 Molecular dynamics . . . 4

2.2 Force fields . . . 5

2.3 Free energy calculations . . . 7

2.4 Lamarckian genetic algorithm . . . . 9

3. Methods . . . . 10

3.1 System setup . . . 10

3.2 Ligand setup . . . . 10

3.3 Initial docking . . . 10

3.4 Scoring . . . 11

3.5 MD simulation . . . . 11

3.6 Clustering . . . 12

3.7 Relaxed complex scheme . . . . 12

3.7.1 Redocking . . . 12

3.7.2 Rescoring . . . 13

4. Results and Discussion . . . 14

4.1 Scoring functions . . . 14

4.2 Positive controls . . . 14

4.3 Docking . . . 15

4.4 Clustering . . . 15

4.5 Relaxed complex scheme . . . . 16

4.5.1 Positive controls . . . 17

4.5.2 Redocking . . . 18

5. Conclusions . . . 23

6. Future work . . . 23

7. Acknowledgments . . . 23

8. References . . . . 24

(5)

1

1 Introduction

The World Health Organization (WHO) estimates that 16-18 million people are infected by the protozoan parasite Trypanosoma cruzi, the causative agent of Chagas’ disease, in 21 endemic countries. The disease is part of WHO’s list of neglected diseases and typically transmitted through a vector, the triatomine insect, but transmission through blood transfusion is also common (Kirchhoff, 1993, Shulman et al., 1997). The disease has two stages: an acute and a chronic stage. The acute phase lasts from a few weeks to a few months, usually with only mild symptoms such as fever. An inflammatory lesion may develop at the site of entry, which spreads regionally. The parasite multiplies via asynchronous cycles, causing cell destruction and re-infection within the

reticuloendothelial system, part of the immune system. Approximately 30 % of infected persons progress to the chronic stage of Chagas’ disease, which develops over many years following the initial infection. In the chronic stage, various organs are infected, including the heart, with heart rhythm abnormalities often leading to sudden death; the digestive system, where dilation of the digestive tract causes megacolon leading to severe weight loss; and the nervous system, potentially leading to neurological disorders including dementia. (Tanowitz et al., 1992)

Most drugs, including those that are effective against close relatives of T. cruzi such as T.

brucei, are not effective against T. cruzi. Benznidazole and Nifurtimox are the only available

therapies, but, while they are effective in the acute phase of Chagas’ disease, they are ineffective in the chronic state. Both drugs must be taken for an extensive period of time and may cause severe side effects including nausea, vomiting, loss of weight, and anorexia (Guadalupe et al., 2006). No extensive studies of long-time use have been done in humans, but several reports of invasive lymphomas in rabbits given the drugs have been described (Teixeira et al., 1990, Teixeira et al., 1990). Many efforts have been made to develop vaccines against T. cruzi, but many difficulties have arisen, one being that there may be an autoimmune component involved in the disease pathology (Brener, 1986). Thus so far no success in vaccine development has been achieved.

The major cysteine proteinase cruzain has been proposed as a drug target. It can be detected in all stages of the parasite life-cycle, and, even though the exact function of the

proteinase is not clear, it is thought to have an important role in differentiation, cell invasion, and intracellular multiplication (Tomas et al., 1997). These are all key pathogenic steps leading to chronic Chagas’ disease. Furthermore studies have shown that cysteine proteinase inhibitors have trypanocidal activity, and negligible toxicity towards mammalian cells (Meirelles et al., 1992).

As Chagas’ disease is one of the neglected diseases and, the endemic area affected is in the poor and rural parts of Latin America, pharmaceutical companies have neglected vaccine and drug

(6)

2

development. Because of this neglect, the academia can have a major impact in this research area.

1.1 Trypanosoma cruzi

The disease was discovered by Carlos Chagas in 1909, and he is the only scientist so far that has described a new infectious disease, with its pathogen, vector, host, clinical symptoms, and epidemiology (Chagas, 1909, Chagas, 1911). Of roughly 20 known Trypanosoma species, only two cause disease in humans, T. brucei causes African trypanosomiasis, or African sleeping sickness, and T. cruzi causes American trypanosomiasis, or Chagas’ disease. T. cruzi is a protozoan parasite with only a nucleus, flagella and the associated organelle, the kinetoplast, and reproduces through binary fission.

1.1.1 Life cycle

T. cruzi has distinct life stages in both the digestive tract of triatomine insects and in the blood and muscle tissue of mammals. After the triatomine insect feeds on the blood of a vertebrate host, the insect defecates, allowing T. cruzi to escape the triatomine digestive tract. Scratching the bite site allows T. cruzi to enter the wound and infect the mammal. The main life stages of the parasite are trypomastigote, amastigote, and epimastigote. When leaving the insect and entering the mammal host, e.g. human, the parasite is in the trypomastigote form, which includes a flagellum for better motility. The parasite then travels through the bloodstream and colonizes a muscle cell.

Inside the cell, it develops into the next form, the amastigote. Amastigotes are oval in shape and lack the flagellum. They will form cysts inside the cells that are released into the blood stream when the muscle cell erupts. Upon cell eruption, the parasite is transformed into the trypomastigote form, and thereby colonizes new cells. When the triatomine insect feeds on an infected mammal, it ingests trypomastigotes, which subsequently travel to the insect digestive tract and transform into flagellated epimastigotes. These epimastigotes are adapted to survive in the intestinal tract of the insect. The triatomine can be found in rural areas in most parts of South and Central America. The insect vectors often live in mattresses and in the cracks of mud houses that are common in rural South America. And as the vector comes out at night to feed and reproduce, humans are often bitten during their sleep and often do not know that they have been infected.

1.2 Cruzain

Cruzain (fig. 1), a papain-like cysteine protease comprised of 215 amino acids, is active in the digestive vacuoles of the parasite. Although the exact function of cruzain is not known, the protein is thought to play an important role in differentiation, cell invasion, and intracellular multiplication (McKerrow et al., 1993). Cruzain is made up of two distinct domains separated by a cleft where the catalytic triad is situated. This triad is comprised of cysteine-25, histidine-159, and a stabilizing

(7)

3 glutamine-19.

Figure 1. Cruzain (PDB code: 1ME4) with docked positive control, 1ME4. The protein is made up of two domains separated by a cleft where the active site is located.

(8)

4

2 Theory

2.1 Molecular dynamics

There is a close relationship between the structure and movements of a protein and its function. To visualize and study these movements on a atomic level molecular dynamics is a powerful tool. As implied by the name, molecular dynamics considers the dynamics of molecular motions. A system of N atoms, e.g. a protein, is subject to a potential energy, V, which depends on all atomic positions, and so moves according to a force, Fi, which is calculated using Newton’s law of motion

 = −  

(1)

where rN= r1, r2,…, rN is the spatial positions of all N atoms. Newton’s second law can be used to calculate the acceleration, ai, of each atom using the known mass, mi.

=

(2)

With the assumption that the force, Fi, on a given atom over a defined time step is constant, new positions and velocities can be calculated by integrating equation (2). The potential energy of the system, represented by a force field in molecular dynamics, is the most crucial part of the

simulation. It must represent the interaction between atoms in a realistic way. Additionally, the force field equation must be a simple mathematical function that can be quickly calculated. The most commonly used algorithm in molecular dynamics is the velocity Verlet

 + =  + (3)

 +  =  +  +  (4)

where rit and  are the position and velocity vectors for the atom i at time t, and δt is the time step in the simulation, typically two femtoseconds. Equations (2) and (3) together with equation (4) yield the atomic positions of the next time step

 +  =  +  + !"! (5)

(9)

5

For equation (5) to generate the correct Boltzmann distribution for canonical (NVT) ensemble simulations, in which the temperature is controlled, the velocity is calculated by integrating the (stochastic) Langevin equation

# = − $− %&'( )* (6)

where $ is the friction coefficient, kB is the Boltzmann constant, T is the temperature, and R(t) is a Gaussian random process.

2.2 Force fields

The potential energy used to calculate atomic forces and positions can be described in various ways. In molecular dynamic simulations, the potential energy functions, or force fields, generally include terms representing covalently and non-covalently connected atoms. The equations describing forces between covalently bound atoms contain expressions that account for bond length, angles between atoms, dihedral angles, and improper dihedral angels. The equations describing forces between non-covalently bound atoms contain expressions that account for van der Waals and electrostatic interactions.

+, = +-./01, + +2/3451, + +0650 241, +

+7 .75 1, + +809, + +54, (7)

The first five terms in equation (7) represent the potential energy contribution from the covalently bound atoms. To represent the displacements from the ideal bond length, i.e. the bond length at equilibrium, the potential energy is calculated according to Hooke’s law.

+-./01 = :  − 5;  (8)

Kr is the force constant that determines the flexibility of the bond, r is the bond length, and req is the bond length at equilibrium. The bond angles can also be calculated with a simple harmonic function

+2/3451 = :<Θ − Θ5;  (9)

where the force constant KΘ determines the flexibility of the angle, Θ is the angle between the two bonds and Θeq is the angle at equilibrium. To calculate the potential function for rotation about a bond, a periodic cosine function is used

(10)

6

+0650 241 = :>1 + @ABCD −  (10)

where Kφ is the force constant for rotation around a dihedral angle φ, n is the periodicity, and δ is the phase. To ensure that atoms that should be in-plane are not bent out if it, e.g. carbons in a benzene ring, the improper potential function is used. This function can also be represented by a harmonic equation

+7 .75 1 = :GH − H5; (11)

where Kξ is the force constant for the improper angle ξ and ξeqis the improper angle at equilibrium.

To calculate the potential energy of the non-covalently bound atoms, the Lennard-Jones potential is used to approximate the van der Waals contribution

+809 = LMMN− OMMNP (12)

where Aij and Bij are Lennard-Jones parameters for interactions between atom i and j, and rij is the distance between atom i and j. The first term in the equation describes the repulsive potential energy preventing atoms from overlapping, and the second term describes the attractive potential due to London dispersion forces. To calculate the electrostatic interactions between atoms, Coulomb’s law is used

+54 =QRST; ; UU (13)

where q is the partial charge of atom i and j, respectively, rij is the distance between atom i and j and ε0 is the electric permittivity in vacuum. To calculate the total force field for all atoms equations (8) – (13) are condescended into equation (7)

+, = ∑-./01YZ − 5; +

2/3451Y[Θ − Θ5; +

0650 241Y\1 + @ABCD −  + 7 .75 1Y]H − H5; +

^LMMN− OMMNP+QRS

T

; ;U U_

/./N-./050

2!.72 1 ,M (14)

(11)

7 2.3 Free energy calculations

There are a number of different methods to calculate the standard and relative free energies of a reaction (Zwanzig, 1954, Åqvist et al., 1994). The free energy is important as a scoring function, as it can indicate which ligand fits the receptor the best. The free energy also serves as a reference that can be compared to experimentally obtained free energies. Biomolecular

simulations, and, in this project, the relaxed complex method, need reasonably precise but more importantly fast algorithms to estimate the free energy. The Molecular Mechanics/Poisson Boltzmann Surface Area (MM/PBSA) scheme (Kollman et al., 2000) evaluates the standard free energies of molecular complexes in a less computational expensive way. The method combines molecular mechanical energies, EMM, molecular solvation energies, GPBSA, and solute entropy, SMM, to calculate the average standard free energy, G, of the molecule

abc = adeec + abfghic − jakeec (15)

where the average molecular mechanical energy, aEMMc, is defined as

adeec = ad-./0c + ad2/345c + ad!. 1./c + ad809c + ad54c (16)

where Ebond, Eangle, Etorsion, EvdW and Eel are bond, angle, torsion, van der Waals and electrostatic terms of intramolecular energy, respectively. The average molecular solvation free energy, aGPBSAc, can be calculated by

abfghic = abfgc + $aLc (17)

where $ is the surface tension of water, aAc is the average solvent accessible surface area and abfgc is the average electrostatic contribution of molecular solvation energy that can be calculated by

bfg = ∑ s t2;− t3  (18)

where q is the partial charge of atom i, ϕiaq and ϕig are the electric potentials of atom i in aqueous and gas phase, respectively. They can be calculated by solving the Poisson-Boltzmann equation

−∇ ∙ z{∇ϕr| + } sinh 5'‚ 

()  = ƒ (19)

where εr is the position dependent dielectric constant, kB is the Boltzmann constant, T is the absolute temperature, ec is the electron charge, and κ is the inverse Debye-Hückel screening length given by

(12)

8 } ='…5†

()S  (20)

I is the ionic strength, and ρr from equation (19) is the charge distribution of the molecule solved by

ƒ = ∑ s  −  (21)

The average of the molecular mechanical entropy can be decomposed to

akeec = ak! 2/1c + ak .!c + ak8-c (22)

where Strans, Srot and Svib are the translational, rotational, and vibrational entropies, respectively. The translational entropy is calculated by

k! 2/1 = ˆgln Re'6†()Š‹   (23)

where M is the molecular mass, and h is the Planck constant. The rotational entropy can be calculated by

k .! = ˆgln R †5Ž † ‘R6†'†

(…

’“  (24)

where σ is the symmetry number, and Ii is the moment of inertia calculated by the eigenvalues of

• = –

∑ # + ˜ − ∑ # ™— − ∑ # ™˜

− ∑ # —™ ∑ # + ˜ − ∑ # —˜

− ∑ # ˜™ − ∑ # ˜— ∑ # + —

š (25)

x, y and z are the Cartesian coordinates, and m is the mass of each atom. The vibrational entropy is calculated by

k8- = ˆg ›<œž

Ÿ[œ   − ln 1 − e<œž

’,NPM (26)

where the normal mode vibrational temperatures, Θωj, are defined by Θ¤ ℏ§' U

( (27)

where ωj is the normal angular frequency, and ℏ is the Dirac constant. The binding free energy of a

(13)

9 ligand to a receptor can be calculated by

Δb©ª«¬ = Δb­®¯°±Ÿ²− Δb³Ÿ­Ÿ°´®³− Δb±ªµ¶«¬ (28)

where ΔGcomplex, ΔGreceptor and ΔGligand are the standard free energy of the receptor, ligand, and complex, respectively. The derived binding free energy can then be compared with an

experimentally obtained binding free energy, ΔGexp, calculated from the dissociation constant, KD, with

ΔbŸ²°= RT ln K¿ (29)

where R is the gas constant.

2.4 Lamarckian genetic algorithm

To explore a large conformational space during the docking procedure, a fast search

algorithm is needed. In genetic algorithms, the arrangement of the ligand and its proposed receptor can be defined by energy calculations. The state values of the ligand are its genotype and the atomic coordinates are the phenotype. New individuals are produced via “crossovers” between random pairs, giving the offspring genes from both parents. Additionally, some of the offspring also undergo random “mutations”. By introducing a local search method, the genetic algorithm can be enhanced by yielding a genotype from a phenotype and therefore obeying Lamarckian genetics instead of Mendelian genetics. The offspring will inherit any changes of the phenotype due to adaptations to the environment during the local search. The search algorithm will adjust the step size depending on if there are a number of repeated increases or decreases of energy. The Lamarckian genetic algorithm is a hybrid of the original genetic algorithm with the adaptive local search method.

(Morris et al., 1998)

(14)

10

3 Methods

3.1 System setup

The crystal structure of cruzain from T. cruzi was used in the simulations in this work (PDB code 1ME4; Huang et al., 2003), and the coordinates of the corresponding apo structure was obtained by removing the reversely bound inhibitor. The protonation states of the protein residues were determined at pH 5.5 by the National Biomedical Computation Resource PDB2PQR Web service (Dolinsky et al., 2004) and manually verified. The low pH was chosen to mimic the natural acidic environment cruzain is expressed in. The system was parameterized using the Generalized Amber force field (GAFF; Wang et al., 2004) and the FF99SB AMBER force field (Hornak et al., 2006).

All crystallographic water molecules were kept and the system was solvated in a TIP3P water box (Jorgensen et al., 1983), with 10 Å between the protein and the boundary of the box. The system was neutralized by adding 10 sodium ions, and additional ions was added to obtain a 20 mM NaCl concentration in the system, to mimic physiological conditions. The preparative procedures above were performed using the LEaP module of AMBER9 suite (Case et al., 2005). VMD (Humphrey et al., 1996) was used to measure the dimensions and center of the solvation box for MD simulation setup.

3.2 Ligand setup

The NCI diversity set II is a collection of 1364 widely diverse pharmacophores. The diversity set was downloaded and prepared for docking using LigPrep in the Schrödinger suite at pH 5.5 to get accurate 3D models of the proposed ligands.

3.3 Initial docking

The initial docking was done with CDOCKER within Accelrys Discovery Studio using the positive controls and the NCI diversity set to dock into the apo structure of the crystallographic protein. The docking was done within a predefined sphere with a diameter of 15 Å and the center of the sphere given by the crystallographic ligand coordinates. (fig. 2)

(15)

11 3.4 Scoring

The scoring functions LigScore1, LigScore2 (Krammer et al., 2005), Jain (Jain, 1996), PLP1, PLP2 (Gehlhaar et al., 1995), PMF (Muegge et al., 1999), PMF04 (Muegge, 2006), and the binding energy calculated by CDOCKER, in Score Ligand Poses in Accelrys Discovery Studio was used to score all the docked ligands including the positive control. Some of the ligands had several different versions of it since the ligands can take different forms, e.g. tautomers and different protonation states. The different states were handled as different ligands. The ligands were scored and ranked separately in all scoring functions and then the first 302 unique ligands from the entire “scoring table” were selected. This was done by taking the first ten ligands in the table and deleting all redundancies, and then doing the same with the next ten ligands, until the 302 unique ligands were found.

3.5 MD simulations

NAMD2.7b1 (Phillips et al., 2005) was used for all MD simulations in this project. Periodic boundary conditions were used on the system with Particle-Mesh Ewald Method to take care of electrostatic effects efficiently with a smoothing cut-off of 14 Å for long-range interactions. Langevin dynamics were applied to maintain the temperature, and a modified Langevin Piston Nosé-Hoover was used to keep the pressure at 1 atm in the system. Energy minimizations were gradually done, starting with relaxing hydrogen atoms for 5,000 steps, carrying on with relaxing hydrogen atoms, waters and ions for 5,000 steps, then relaxing hydrogen atoms, waters, ions and side chains for 10,000 steps, and finally relaxing all atoms for 25,000 steps. After the minimizations the system was equilibrated in the NPT-ensemble at 310 K using a stepwise harmonic constraint force constant of 4, 3, 2, and 1 kcal/mol/Å2 on the protein backbone for 250,000 steps, respectively. All minimizations and equilibrations were carried out with a 1 fs time step. Finally five 20 ns simulations were carried

Figure 2. Docked positive control, 1ME4, into the apo structure of the

crystallographic protein.

(16)

12

out, with 2 fs time steps, with different random seeds to sample many protein configurations.

3.6 Clustering

The gromos clustering method within GROMACS (g_cluster) was used to cluster the trajectories from the MD simulation (Lindahl et al., 2001). This was done to get a reduced set of representative structural ensembles for the simulation. Structures with an interval of 50 fs were extracted from each simulation, generating totally 4,000 trajectories that were superimposed using all Cα atoms in the protein to eliminate all rotations and translations. The clustering was done on 73 active site residues of the protein, which was defined as all residues within 10 Å of the crystallized ligand: 18-31, 50, 53-54, 57-72, 74, 91, 93-98, 115, 117, 120, 136-142, 144-145, 158-165, 181-184, 203-210. The rmsd of every structure was calculated and by using a cut-off, the structure with the highest number of neighbors was taken as the center of the cluster. The structures of that cluster were deleted from the pool of total structures, so that no overlapping clusters were produced. The procedure was then repeated until the pool of structures was empty. (Daura et al., 1999) The cut-off was obtained by meeting criteria that have been established in the lab, e.g. 90 % of all trajectories should be within the seven most populated clusters, and there should be only a few clusters with only one protein configuration. A central member was then selected from each cluster to represent that cluster in the following dockings.

3.7 Relaxed complex scheme

Many docking schemes include ligand flexibility, but most programs still consider the receptor to be rigid. The assumption that the receptor will act as a rigid body is not true, consequently that approach will have a hard time finding ligands that bind to receptor

configurations that only exists for short amount of time during the molecular dynamic simulations.

The McCammon lab introduced the relaxed-complex method, which account for receptor flexibility.

In the scheme, a long molecular dynamics simulation of the apo receptor is first performed to sample the multiple receptor conformations. Small libraries consisting of many candidate ligands are then docked into multiple conformations extracted from the simulation.

3.7.1 Redocking

Accelrys Discovey Studio´s CDOCKER was used to redock the positive controls and the 302 selected ligands from the first docking into the selected central member of each cluster. The same protocol as in the first docking was used; docking within a predefined sphere with a diameter of 15 Å and the center of the sphere given by the crystallographic ligand coordinates.

(17)

13 3.7.2 Rescoring

When rescoring the docked ligands the same protocol was used as in the first scoring;

Accelrys Discovery Studio´s Score Ligand Poses was used with LigScore1, LigScore2, PLP1, PLP2, PMF04 scoring functions and the binding energy calculated by CDOCKER.

(18)

14

4 Results and Discussion

4.1 Scoring functions

The different scoring functions used in this project were LigScore1, LigScore2, Jain, PLP1, PLP2, PMF, and PMF04. The scoring functions are usually divided into to three categories, which are force field, empirical, and knowledge based. LigScore1, LigScore2, Jain, PLP1, and PLP2 are empirical scoring functions, whereas PMF and PMF04 are knowledge based scoring functions. Knowledge based scoring functions uses statistical atom pair potentials derived from databases. Both PMF and PMF04 are based on protein-ligand complexes from the Protein Data Bank, where the former is based on 697 complexes and the latter 7152 complexes. Empirical scoring functions are based on a model that calculates the most important physical properties of protein-ligand interactions. The empirical different scoring functions take different physical properties into account for giving both scores and penalties. PLP1 and PLP2 are simple scoring functions that use an effective stochastic search procedure and a simple energy function based on steric interactions, hydrogen-bond interactions, and an elementary intramolecular term. Jain accounts for hydrophobic interactions, polar interactions, entropic effects, and solvation effects. Whereas LigScore1 and LigScore2 are built up by a function based on van der Waal interactions, polar interactions and gives desolvation penalties.

All of the scoring functions have successfully been used in other studies (Krammer et al., 2005, Jain, 1996, Gehlhaar et al., 1995, Muegge et al., 1999, Muegge, 2006). The fact that the scoring functions have been widely used with success, and that they are from different categories, but also based on different criteria within the categories were all reasons why they were used in this project.

4.2 Positive controls

To confirm that the docking conditions that were used were good, the positive controls were docked into the apo structure of the crystallized protein. When using Accelrys Discovey Studio´s CDOCKER the docked ligand pose could be match with the predicted pose from the holo structure. All-atoms root mean square deviation (RMSD) for the docked ligand compared to the crystal structure were calculated to 0.84 Å for 1ME4 (fig. 3). CDOCKER was used because it was able to recapture the crystallographic pose, which both AutoDock and AutoDock Vina failed to do. Since it was able to dock the positive controls in a highly successful way the program can be used with confidence and the positive control can be used as validations and references for the hits in the virtual screening. As additional validation another known inhibitor, PDB code 1ME3, (Huang et al.,

(19)

2003) was also used as positive control

well. The calculated all-atoms RMSD for 1ME3 for the docked ligand compared to the crystal structure was 1.46 Å. 1ME3 includes a pyridine ring that has a pKa of 5.21, and since

simulations will be done at a pH of 5.5 protonated (fig. 3). The binding free energies

were -61.4 kcal/mol for 1ME4, -60.427 kcal/mol for 1ME3 1ME3.

4.3 Docking

All 1364 ligands from the NCI diversity s structure. This was done to get a smaller set of ligands expensive relaxed complex method

with LigScore1, LigScore2, PLP1, PLP2, PMF, PMF04, and the binding energy, and thereafter ranked according to the protocol above.

the new set. All of the ligands in the new dataset docked close to surface making all of them putative

4.4 Clustering

After evaluating the different cut

clusters. The cut-off also met all the criteria that were

include 97.6 % of the total 4002 trajectories and only five clusters have just one protein conformation (table 1).

15

positive control, which was docked into the apo structure with success atoms RMSD for 1ME3 for the docked ligand compared to the crystal 1ME3 includes a pyridine ring that has a pKa of 5.21, and since

at a pH of 5.5, two states of the ligand were made; one neutral and one The binding free energies for the positive controls obtained from the dockings

60.427 kcal/mol for 1ME3, and -46.495 kcal/mol for

All 1364 ligands from the NCI diversity set II were docked into the edited apo crystal to get a smaller set of ligands relatively fast for the computationally more expensive relaxed complex method. The docked ligands, including the positive controls,

with LigScore1, LigScore2, PLP1, PLP2, PMF, PMF04, and the binding energy, and thereafter ranked according to the protocol above. The 302 highest ranked ligands from the dockings were chosen as All of the ligands in the new dataset docked close to the binding cleft on the protein ng all of them putative inhibitors.

After evaluating the different cut-offs, a cut-off of 0.095 nm were chosen which gave all the criteria that were defined beforehand; the seven first clusters include 97.6 % of the total 4002 trajectories and only five clusters have just one protein

Figure 3. The three positive controls used in the project. a) 1ME3, b) 1ME3+ and 1ME4. Since the simulations are carried out at a pH close to the pKa value of the pyrimidine ring, two states of 1ME3 included in the project.

docked into the apo structure with success as atoms RMSD for 1ME3 for the docked ligand compared to the crystal 1ME3 includes a pyridine ring that has a pKa of 5.21, and since all the

of the ligand were made; one neutral and one obtained from the dockings 46.495 kcal/mol for the charged

docked into the edited apo crystal for the computationally more , including the positive controls, were scored with LigScore1, LigScore2, PLP1, PLP2, PMF, PMF04, and the binding energy, and thereafter ranked

ligands from the dockings were chosen as binding cleft on the protein

0.095 nm were chosen which gave 24 beforehand; the seven first clusters include 97.6 % of the total 4002 trajectories and only five clusters have just one protein

The three positive controls used ) 1ME3+ and c) Since the simulations are carried out at a pH close to the pKa value of the pyrimidine ring, two states of 1ME3 were

(20)

16

Table 1. The clusters with the number of trajectories and the representative trajectory member of each cluster. The clustering was done with the gromos method in GROMACS with a cut-off of 0.095 nm.

Cluster

number of members

central member

1 3300 2517

2 277 461

3 144 2428

4 104 634

5 31 234

6 29 1319

7 21 684

8 15 729

9 15 2453

10 13 392

11 11 279

12 8 600

13 6 2643

14 6 283

15 5 796

16 4 691

17 3 644

18 3 233

19 2 709

20 1 822

21 1 711

22 1 119

23 1 352

24 1 738

4.5 Relaxed complex scheme

The relaxed complex scheme gives the opportunity to study a flexible and more realistic view of a protein rather than just a rigid snapshot of it. The method has been successfully used in other systems such as kinetoplastid RNA editing ligase 1, the avian influenza virus subtype H5N1 and the FDA improved HIV integrase drug, raltegravir (Cheng et al., 2008, Temesgen et al., 2008, Amaro et al., 2008). Although the flexibility of the ligand is well accounted for in computer-aided drug design, the dynamic nature of the protein is usually not, and in many cases the flexibility of the protein is crucial for a ligand to bind. So if it is not somehow simulated many potential inhibitors will not be found. When the relaxed complex scheme was used in this project 46 unique drug-like inhibitors were found, many of which would not have been found by only docking into the protein crystal structure alone.

(21)

17 4.5.1 Positive controls

The positive controls were also docked into the central representative members of the clusters in the relaxed complex scheme. They all scored well and were ranked among the top 5 to 15 in most scoring functions (table 2). Since they are known inhibitors to the protein and they ranked well, it can be assumed that the docking conditions are good and also that the ligands that score and rank well are putative drug-like inhibitors. The only scoring functions that had problems with scoring the positive controls were PMF, in particular, but also PMF04. None of the positive controls were among the first 50 ligands in PMF. Since the positive controls scored so badly it was decided that PMF would not be used in further work, meaning that it was not used when the final set of ligands were defined. PMF04 did somewhat better, the charged 1ME3 ranked 3rd, 1ME4 ranked 23rd, but 1ME3 only ranked 77th. Although it did not score the positive controls as good as the rest of the scoring functions, it can still be used with confidence since it did rank the positive controls relatively high but also since it has a lot of consensus with the other scoring functions in respect to the other ligands.

Table 2. Weighted ensemble average scores of the 181 redocked ligands, including positive controls. The weighted ensemble average scores of the 181 redocked ligands. Here, only the top 25 ligands are shown. The three positive controls (1ME4, 1ME3, 1ME3 +) are marked in blue. That the positive controls have ranked well in all scoring functions, except in PMF, means that the scoring functions can be used and trusted since the positive controls are known inhibitors. It also means that the ligands that do rank high are putative inhibitors for cruzain. Since PMF did not score any of the positive controls among the top 50 hits, it was decided to be thrown out when defining the final set of ligands. Note that the smaller the value, the better for PMF04.

CDOCKER Jain LigScore1 LigScore2 PMF PMF04 PLP1 PLP2

Name S. Name S. Name S. Name S. Name S. Name S. Name S. Name S.

1ME4 59.6 8463 6.7 27186 5.8 1ME4 6.5 8997 63.1 5670 -53.0 1ME4 117.4 1ME4 103.0 1ME3 59.2 1ME3+ 5.3 4384 5.7 4383 6.4 12819 60.7 27186 -52.2 1ME3+ 112.7 1ME3+ 102.4

5608 56.2 1ME4 4.5 1803 5.5 8950 6.4 8802 57.9 1ME3+ -51.8 27186 109.0 27186 98.3 2192 50.8 3525 4.5 1ME4 5.4 1ME3 6.2 8735 56.6 2112 -51.6 1ME3 107.9 1ME3 93.5 1ME3+ 47.1 3553 4.0 4383 5.4 1ME3+ 6.1 9021 54.1 2113 -49.0 22386 106.4 16771 92.4 12986 39.9 2858 4.0 1ME3+ 5.4 27186 6.1 27186 52.9 8997 -46.9 20594 104.0 22386 89.3 9529 39.1 9663 3.3 16771 5.3 12819 6.1 3930 51.4 2858 -44.3 16771 103.1 2858 87.6 429 38.0 556 3.1 22386 5.2 16771 6.0 724 50.2 556 -43.0 2858 100.8 1803 87.4 8750 35.7 9382 3.1 2858 5.2 6436 6.0 3553 45.4 8344 -42.3 12819 98.0 6610 85.4 6840 35.0 20063 3.0 8950 5.2 2858 5.9 20594 44.8 20058 -42.2 7676 96.7 16516 84.7 25233 34.5 6865 2.9 12819 5.2 4384 5.9 22386 44.8 12232 -41.4 3553 96.3 20594 84.3 4722 34.4 5654 2.9 9529 5.0 724 5.8 15798 44.6 9354 -41.4 6610 94.9 12819 83.5 16226 34.3 724 2.6 1ME3 4.9 22386 5.8 114 44.1 5052 -40.9 4384 94.8 4384 82.9 2994 34.2 9657 2.6 724 4.8 20594 5.6 16771 43.5 3219 -40.6 21490 90.1 724 82.4 2906 34.1 1ME3 2.6 29148 4.8 1803 5.5 20058 43.1 21490 -40.2 724 89.4 15618 81.5 14436 34.0 7881 2.5 6436 4.7 8463 5.5 9368 42.4 5620 -38.5 8950 89.4 3553 80.8 24665 34.0 20594 2.5 8463 4.4 26633 5.5 6436 41.6 8950 -38.0 1803 89.1 9395 79.8 1768 33.0 9021 2.5 12277 4.3 6610 5.5 29148 40.9 7676 -37.1 4383 88.9 8950 79.3 8769 33.0 9396 2.5 6037 4.2 6340 5.5 7881 40.4 17133 -36.8 15618 88.6 7676 78.9 10961 32.8 11210 2.4 5410 4.2 9396 5.4 2258 40.4 12819 -36.2 8735 88.2 8731 77.9 9033 32.8 14436 2.4 16226 4.1 6339 5.4 17722 40.4 16736 -35.9 16516 88.1 9354 77.4 14449 32.4 6610 2.4 6340 4.0 18413 5.4 3641 39.9 20063 -35.8 8731 88.0 556 76.6 4021 32.3 16771 2.4 114 4.0 557 5.3 9354 39.3 1ME4 -35.7 5052 85.0 9402 76.6 10873 32.2 24665 2.3 13115 4.0 4722 5.3 8463 38.2 24665 -35.5 556 84.7 4383 76.5 557 31.9 13071 2.3 14436 3.9 25798 5.3 5654 37.0 1339 -35.4 4749 84.6 21490 75.2

(22)

18 4.5.2 Redocking

The 302 ligands chosen from the first docking were docked into each central member representing the 24 different clusters. Most of the ligands were redocked with success; however there were a few ligands that did not redock into some of the central members. Ligand 724 had eight states that failed to dock into central member of cluster 3, eight failed to dock into 4, two failed to dock into 10, six failed to dock into 13, six failed to dock into 17, and five failed to dock into 23. Ligand 12819 had four states that failed to dock into the central member of cluster 4, and one state that failed to dock into 13. One state of ligand 2858, one state of 9021, and two states of ligand 3641 failed to dock into central member of cluster 4. As with the first docking all of the states were again treated as separate ligands, all of the successfully docked ligands were then scored separately and weighted ensemble averages were calculated for every ligand and scoring function (table 2). The weighted ensemble average was calculated by taking the sum of the score of each central member multiplied by the number of members in the cluster, divided by the total number of trajectories in the clusters (eq. 30). For the few cases of ligands that failed to dock, the scores were left out from the sum and the total number of trajectories was subtracted by the number of trajectories in that particular cluster.

B =∑ // 5

ÀÁÀ (30)

where s is the weighted ensemble average score, the sum goes through i clusters, ni is the number of members in cluster i, ei is the score of the central member representing cluster i, and ntot is the total number of trajectories in all the clusters.

After the weighted ensemble average scores were calculated the different states of the ligands were gone through for each scoring function separately and only the state with the best score were kept. This was done with the thought that the different states are in equilibrium in the solution and the ligand will undertake the state that docks, in this case scores, the best. This treatment resulted in a set of 181 ligands that was ranked. Since the different scoring functions cannot be compared straightforward two different approaches were done, and they were then merged to get a sense of resemblance in the two methods. The first approach was done by

calculating an average rank for the different scoring functions (table 3). This was done with the idea that if a ligand scores and ranks well on average in all scoring functions it will probably be a good candidate for further drug design. The second approach was done by taking the nine highest ranked ligands from every scoring function and merging them into one table (table 4). This approach was

(23)

19

based on the idea that if a ligand scores among the best in any scoring function it would be a good candidate for further drug design. The 32 best ranked ligands from the two different approaches were merged into table 5 to see how many of the ligands ranked well in both methods, and to get a final set of ligands that will be experimentally validated. That showed that as many as 18 of the ligands were among the best 32 in both methods.

To see how much the rankings of the ligands were rearranged from the first docking to the crystal structure and after the relaxed complex scheme, the rankings of the final set of 46 ligands in the first dockings were investigated (table 5). There are only nine ligands that have been ranked among the top 46 in four or more of the scoring functions in the first docking to the crystal

structure, these are 724, 2858, 9529, 12819, 16771, 22386, 27186, 20594, and 6436. Among these ligands 2858 and 27186 stand out from the others by ranking among the top 46 in 7 and 6,

respectively, of the 8 scoring functions. If looking at how many of the ligands had a ranking of 46 or less within a scoring function, four scoring functions seem to be slightly better than the others.

These are PLP1, LigScore1, LigScore2, and the binding energy calculated by CDOCKER, with 17, 15, 16, and 16 ligands ranked within the top 46, respectively. The worst scoring functions were PMF and Jain that only had 8 and 6 ligands among the top 46, respectively. Most of the top 46 ligands from the relaxed complex scheme were ranked really poorly in the docking to the crystal structure, which indicates that the method is a very powerful tool when doing computer-based drug design since it is able to find new compounds that would never be found otherwise.

References

Related documents

Indien, ett land med 1,2 miljarder invånare där 65 procent av befolkningen är under 30 år står inför stora utmaningar vad gäller kvaliteten på, och tillgången till,

Av 2012 års danska handlingsplan för Indien framgår att det finns en ambition att även ingå ett samförståndsavtal avseende högre utbildning vilket skulle främja utbildnings-,

It has been noted that for a mixed generation day care to function the elderly need to be treated differently than the children and allow the able to help out with the child

Wage pressure and ranking have similar effects in the model: both tend to raise the equilibrium rate of unemployment and make the effects of shocks to employment more persistent.

Due to the absence of structural information of target binding properties, potential ligand-target interactions of importance in ligand binding were examined by docking known binder

Second, genetic methods were used to mutate genes and attach a peptide causing the gene product (protein) to fluoresce so it could be seen in a microscope, to elucidate

Besides this we present critical reviews of doctoral works in the arts from the University College of Film, Radio, Television and Theatre (Dramatiska Institutet) in

How  the  revenues  are  connected  to  the  revenue  drivers  is  a  problem  which  will  be  enlightened.  It  is  problematic  since  the  customer  and