Statistical molecular design, QSAR modeling, and scaffold hopping – Development of type III secretion inhibitors in Gram negative bacteria

STATISTICAL MOLECULAR DESIGN, QSAR MODELING, AND SCAFFOLD HOPPING

DEVELOPMENT OF TYPE III SECRETION INHIBITORS IN GRAM NEGATIVE BACTERIA

MARKUS K. DAHLGREN

ACADEMIC DISSERTATION

UMEÅ UNIVERSITY

DEPARTMENT OF CHEMISTRY

2010

Copyright © 2010 Markus Dahlgren

ISBN: 978-91-7264-976-7

Printed in Sweden by VMC-KBC Umeå

Title

Statistical molecular design, QSAR modeling, and scaffold hopping - Development of type III secretion inhibitors in Gram negative bacteria

Author

Markus K. Dahlgren, Department of Chemistry Umeå University, SE-90187, Umeå, Sweden

Abstract

Type III secretion is a virulence system utilized by several clinically important Gram-negative pathogens. Computational methods have been used to develop two classes of type III secretion inhibitors, the salicylidene acylhydrazides and the acetylated salicylanilides. For these classes of compounds, quantitative structure-activity relationship models have been constructed with data from focused libraries obtained by statistical molecular design. The models have been validated and shown to provide useful predictions of untested compounds belonging to these classes.

Scaffold hopping of the salicylidene acylhydrazides has resulted in a number of synthetic targets that might mimic the scaffold of the compounds. The synthesis of two libraries of analogs to two of these scaffolds and their biological evaluation are presented.

Keywords

Statistical molecular design, QSAR, synthesis, type III secretion, virulence, scaffold hopping

Contents

1. LIST OF PAPERS
2. ABBREVIATIONS
3. INTRODUCTION
3.1. Computational drug design
3.1.1. Statistical molecular design
3.1.2. QSAR modeling
3.1.3. Scaffold hopping
3.2. Bacterial virulence
3.2.1. Type III secretion
3.2.2. Type III secretion inhibitors
4. SCOPE OF THIS THESIS
5. STATISTICAL MOLECULAR DESIGN, SYNTHESIS, AND BIOLOGICAL EVALUATION OF TYPE III SECRETION INHIBITORS
5.1. Salicylanilides (paper I)
5.2. Salicylidene acylhydrazides (paper II)
6. QSAR MODELING OF TYPE III SECRETION INHIBITORS
6.1. Salicylanilides (paper I)
6.2. Salicylidene acylhydrazides (paper II)
6.3. Evaluation of QSAR models using external test sets
6.3.1. External test set for the salicylanilides (paper I)
6.3.2. External test set for the salicylidene acylhydrazides (paper II)
6.4. QSAR models used to develop azide containing T3S inhibitors for target identification
6.4.1. Azide containing salicylanilides
6.4.2. Azide containing salicylidene acylhydrazides
6.5. Conclusions from QSAR modeling
7. SCAFFOLD HOPPING FROM A SALICYLIDENE ACYLHYDRAZIDE
7.1. 2-(2-Amino-pyrimidin-4-yl)-2,2-difluoro-1-(phenyl)-ethanols (paper III)
7.2. Thiazoles (paper IV)
7.3. Other scaffolds
8. CONCLUDING REMARKS
9. ACKNOWLEDGEMENTS
10. REFERENCES

1. List of papers

This thesis is based on the following papers, which will be referred to in the text by their Roman numerals I-IV.

I Design, Synthesis, and Multivariate Quantitative Structure-Activity Relationship of Salicylanilides-Potent Inhibitors of Type III Secretion in Yersinia
Dahlgren, M. K.; Kauppi, A. M.; Olsson, I.-M.; Linusson, A.; Elofsson, M.
J. Med. Chem.; 2007; 50(24); 6177-6188.
DOI: 10.1021/jm070741b

II Statistical molecular design of a focused salicylidene acylhydrazide library and multivariate QSAR of inhibition of type III secretion in the Gram-negative bacterium Yersinia
Dahlgren, M. K.; Zetterström, E. C.; Gylfe, Å.; Linusson, A.; Elofsson, M.
Bioorg. Med. Chem.; Article in Press
DOI: 10.1016/j.bmc.2010.02.022

III Synthesis of 2-(2-amino-pyrimidine)-2,2-difluoro-ethanols identified through scaffold hopping from a salicylidene acylhydrazide
Dahlgren, M. K.; Öberg, C.; Wallin, E.; Jansson, P.; Elofsson, M.
Manuscript

IV Synthesis of [4-(2-Hydroxy-phenyl)-thiazol-2-yl]-methanones, structural analogs of salicylidene acylhydrazides
Hillgren, M.; Dahlgren, M. K.; Tam, T. M.; Elofsson, M.
Manuscript in preparation

Papers I and II are reprinted with kind permission of the publishers.

2. Abbreviations

BB        building block
Equiv.    equivalents
FD        factorial design
PCA       principal component analysis
PLS       partial least-squares regression to latent structures
PLS-DA    PLS-discriminant analysis
MLR       multiple linear regression
QSAR      quantitative structure-activity relationship
QSPR      quantitative structure-property relationship
SAR       structure-activity relationship
SMD       statistical molecular design
Y.        Yersinia
TEA       triethylamine
T3S       type III secretion
MM        molecular mechanics
ADME      absorption, distribution, metabolism, excretion
det       determinant
DOOD      D-optimal onion design
HOMO      highest occupied molecular orbital
LUMO      lowest unoccupied molecular orbital

3. Introduction

Medicinal chemistry is a discipline involving the design, synthesis, and optimization of small organic molecules as part of the development of drugs and research tools. Drug discovery often starts with a validated hit compound that is identified through unbiased biological screening of compound libraries using robust assays that are representative of a specific disease. 1 Analogs to the hit compound are then synthesized in order to ensure that the structure of the hit compound can be varied without entirely losing biological activity.

The compound is then further optimized by medicinal chemists to increase potency, selectivity, solubility, etc. The drug development process is an arduous, costly, and time-consuming task. Proficient use of computational tools can reduce the time needed to deliver new drug candidates. 2 Biologically active compounds can also be used as research tools in a field known as chemical genetics or chemical biology. Such compounds can be utilized to, for instance, identify receptor targets for a class of compounds with an unknown mode of action or help in the elucidation of complex, not fully understood, biological systems. This thesis exemplifies computational techniques that are highly useful in facilitating compound optimization in the absence of a structural target, including design strategies, synthesis, biological evaluation, and QSAR modeling.

3.1. Computational drug design

Computational chemistry can facilitate drug development at all stages if applied proficiently. If the structure of the target is known, a wide range of computational techniques is available, including molecular docking 3, 4, pharmacophore mapping of the binding site, and structure-based design. If the hit compounds have been identified in, for instance, cell-based assays and no structural target is known, the computational techniques are limited to those applied to the ligands. Simple filters, like Lipinski's rules, 5 can easily be computed to filter out compounds that are unlikely to be developed into, for instance, orally administered drugs. Pharmacophore mapping of key features of the ligands can be used to construct a model for selecting promising candidates from virtual libraries. Scaffold hopping methods are usually similarity based, and the aim is to identify compounds that retain key interaction features, such as hydrogen bond donors and acceptors, but replace the scaffold in order to, for instance, increase potency and chemical stability. Scaffold hopping has become a popular tool in the pharmaceutical industry to develop so-called fast-follower drugs, where scaffold hopping is performed on compounds in clinical trials with the aim of identifying new compounds with good patentability that can quickly be developed into drugs. Quantitative structure-activity relationship (QSAR) 6, 7 models can be used to predict the biological activity of new compounds belonging to the same class. 8-11 The input to such models is numerical descriptions of chemical features and one or more responses (e.g. quantitative biological data from one or more in vitro assays). The responses can be any quantitative assay readout, meaning that such models can be used to optimize not only activity but also other properties, such as solubility, membrane permeability, and reduction of toxicity. When properties other than biological activity are used as responses, the models are called quantitative structure-property relationship (QSPR) models. Statistical molecular design (SMD) 12-14 can be used with the preset goal of achieving reliable QSAR models. It is a strategy for systematically varying the chemical features of the compounds believed to be important for the response, effectively reducing the number of compounds to be synthesized while retaining much of the information of the full set.
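The ligand-based filtering mentioned above can be illustrated with a minimal sketch of a Lipinski-type filter. It assumes the four properties have already been computed elsewhere; the `Compound` class, the compound names, and all property values are hypothetical, and tolerating one violation is a common convention rather than something stated in this thesis.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical container for precomputed properties of one compound."""
    name: str
    mol_weight: float        # molecular weight in g/mol
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(c: Compound, max_violations: int = 1) -> bool:
    """Assumed convention: at most one violation is tolerated."""
    return lipinski_violations(c) <= max_violations


# Made-up example library; the values are for illustration only.
library = [
    Compound("cpd_A", 320.4, 2.1, 2, 5),
    Compound("cpd_B", 712.9, 6.3, 6, 12),
]
kept = [c.name for c in library if passes_rule_of_five(c)]
```

Here `kept` contains only the first compound, since the second one violates all four criteria.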

3.1.1. Statistical molecular design

When a compound class is subjected to a medicinal chemistry project for optimization of one or more properties or responses, QSAR modeling can facilitate the process. QSAR models relate chemical features of compounds to responses through regression modeling according to equation 3.1.

$$y_i = \sum_{k=1}^{K} b_k x_{ik} + f_i$$

Equation 3.1. General form of a QSAR model equation. $y_i$ is the ith response, $x_{ik}$ is the value of predictor variable k for the ith compound (each compound is described by k = 1…K predictor variables), $b_k$ is the model coefficient for each variable k, and $f_i$ is the residual for the ith response.

The predictor variables in the equation are numerical descriptions of chemical features of the molecules, which can be expanded to include higher order terms, such as cross and square terms, to give interaction models. By use of SMD, a representative compound set is selected with the aim of obtaining a QSAR model with minimal error in the model's coefficients, $b_k$, estimated from the data, and of maximizing the likelihood of reliable predictions.

The most commonly used experimental designs in SMD are factorial designs (FDs) 15, 16 and D-optimal designs 17, 18. Both types of designs are used to systematically vary the chemical features of interest in order to obtain a representative subset of compounds. As a result of the systematic structural variation, this subset should contain compounds that give a balanced spread in the measured response. In addition, the designs select compounds that span the experimental domain (i.e. all possible synthetic targets) in order to minimize the error in the QSAR model coefficients (figure 3.1a). In contrast, a compound set with only small variation in chemical features will give large errors in the QSAR model coefficients (figure 3.1b). While FDs systematically vary chemical features at high and low levels, D-optimal designs select a set of compounds that spans as large a volume as possible.

In mathematical terms, a D-optimal design selects m compounds from a candidate matrix with K columns (the chemical features investigated) and n rows (the entire candidate set) in such a way that det(X'X) is maximized, where X is the m × K matrix of the selected compounds and det denotes the determinant.
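The determinant criterion can be illustrated with a small sketch. Dedicated design software typically uses exchange algorithms; the greedy forward selection below is a simpler stand-in chosen for brevity, not the procedure used in this thesis. The function name, the ridge term, and the grid of candidate points are all assumptions made for illustration.

```python
import itertools

import numpy as np


def greedy_d_optimal(X, m, ridge=1e-6):
    """Greedy forward selection of m rows of X that approximately
    maximizes det(Xs' Xs) for the selected submatrix Xs."""
    n, K = X.shape
    selected = []
    for _ in range(m):
        best_row, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            Xs = X[selected + [i]]
            # The small ridge keeps the information matrix invertible
            # while fewer than K rows have been selected.
            _, logdet = np.linalg.slogdet(Xs.T @ Xs + ridge * np.eye(K))
            if logdet > best_logdet:
                best_row, best_logdet = i, logdet
        selected.append(best_row)
    return selected


# Hypothetical candidate set: a 5 x 5 grid in two scaled chemical features.
levels = np.linspace(-1.0, 1.0, 5)
candidates = np.array(list(itertools.product(levels, repeat=2)))
chosen = greedy_d_optimal(candidates, m=4)
design = candidates[chosen]
```

In this toy example the selected points lie towards the edges of the grid, in line with the statement that D-optimal designs span as large a volume as possible. A greedy search is not guaranteed to find the global optimum.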

Figure 3.1. An assay that for each investigated compound gives a variation (error bars) in the measured response, Y, will result in errors in the QSAR model coefficient. The QSAR model coefficient is the slope of the line, established through regression. In each figure two lines are drawn that show the maximum variation of the QSAR model coefficient. a) A large change of a chemical feature $X_1$ gives a more reliable QSAR model coefficient. b) If $X_1$ is varied only slightly, the variation of the assay will give a large error of the QSAR model coefficient.
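The effect illustrated in figure 3.1 can be made explicit for the single-descriptor case. The expressions below assume ordinary least-squares regression and independent assay errors with a constant variance $\sigma^2$; these assumptions are added here for illustration and are not stated in the text.

```latex
\hat{b}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(y_i - \bar{y})}
                 {\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2},
\qquad
\operatorname{Var}(\hat{b}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}
```

Under these assumptions, doubling the spread of $X_1$ across the compound set quadruples the denominator and therefore halves the standard error of the slope, which corresponds to the difference between panels a) and b).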

There might be some unforeseeable problems with SMD using FDs or D-optimal designs. For instance, some of the chemical features investigated might not have any correlation with the response, and there might be other important chemical features, highly correlated with biological activity, that are not investigated by the selected compounds. To remedy the latter, it is usually a good idea to add extra compounds to the designed set. A center point, i.e. a compound that has average values for all properties investigated, should be added to monitor possible nonlinearities. Additional compounds can be added manually or by implementation of a multilayer design, such as D-optimal onion design (DOOD) 19. In a DOOD the chemical space is divided into onion layers and a D-optimal design is performed in each layer. A DOOD can be centered on any compound, for instance the compound with the highest biological activity.
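The center point and the layering idea can be sketched as follows. This is a simplified illustration that assumes layers of equal width in Euclidean distance from the chosen center; published DOOD implementations may define the layers differently, and the function names and data are hypothetical. A D-optimal selection would then be run separately within each layer.

```python
import numpy as np


def center_point(X):
    """Index of the candidate closest to the mean of all properties."""
    distances = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return int(np.argmin(distances))


def onion_layers(X, center_index, n_layers):
    """Assign every candidate to a layer (0 = innermost) according to its
    Euclidean distance from the chosen center compound."""
    distances = np.linalg.norm(X - X[center_index], axis=1)
    edges = np.linspace(0.0, distances.max(), n_layers + 1)
    # Only the interior edges are needed to obtain labels 0..n_layers-1.
    return np.digitize(distances, edges[1:-1], right=True)


# Hypothetical scaled property values for five candidate compounds.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [-4.0, 0.0]])
center = center_point(X)
layers = onion_layers(X, center, n_layers=2)
```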

SMD and QSAR rely on the computation of molecular descriptors, which essentially are numerical descriptions of chemical features. Descriptors are generally classified as 1D, 2D, or 3D descriptors. 1D descriptors, such as atom counts and molecular weight, can be calculated directly from the molecular formula and generally do not require a program. 2D descriptors give information about how the atoms are connected and can be properties calculated from connectivity tables, i.e. tables that explain how atoms are connected to each other in any given molecule. Examples of 2D descriptors are connectivity indices, volumes and areas that are calculated from connectivity tables, surface and volume approximations of certain chemical groups, density, and bond counts. 3D descriptors are conformation dependent and are computed in programs that require a 3D input of each structure. There are essentially four different levels of computation for descriptors, i.e. informatics, molecular modeling, semi-empirical, and quantum chemical descriptors. Informatics descriptors generally do not require programs for computations, and examples include many 1D and 2D descriptors. Molecular modeling descriptors are based on force-field mechanics and can be obtained through software such as MOE 20 and DRAGON 21. Semi-empirical descriptors are derived through regression based on parameters, often established through quantum chemical calculations, defined in the given program. Semi-empirical descriptors can be obtained through software such as MOE and Spartan, for which the semi-empirical calculations have been documented 22. Quantum chemical descriptors can be calculated using software such as Spartan, Jaguar 23, and Gaussian 24.
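As an illustration of how simple 1D descriptors are, the sketch below derives atom counts and molecular weight from a molecular formula. It assumes formulas without brackets or charges, and the small table of standard atomic masses covers only a few elements relevant to the compound classes discussed here. The helper names are hypothetical.

```python
import re

# Standard atomic masses in g/mol for a handful of elements (illustrative subset).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45, "I": 126.904,
}


def atom_counts(formula):
    """Count atoms per element in a simple formula such as 'C7H6O3'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic masses weighted by the atom counts."""
    return sum(ATOMIC_MASS[s] * n for s, n in atom_counts(formula).items())


# Salicylic acid, C7H6O3
counts = atom_counts("C7H6O3")
weight = molecular_weight("C7H6O3")
```

For salicylic acid this gives seven carbon, six hydrogen, and three oxygen atoms and a molecular weight of about 138.12 g/mol.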

The descriptors need to be relevant for the response for which the compounds are going to be optimized. It is advisable to select a wide range of descriptors in order to increase the chance of describing features important for the activity that can be used in QSAR modeling. When large sets of descriptors are chosen, it quickly becomes problematic to visualize the data set. Additionally, descriptors are often heavily correlated, which means that they essentially describe the same chemical feature (e.g. molecular weight, volume, and number of atoms all represent the chemical feature size). These problems can be addressed by variable selection or by the use of principal component analysis (PCA) 25, 26, which is used to compute orthogonal principal components. The principal components describe the main variation of the data and form a hyperplane onto which the entire data set can be projected. The principal components describe chemical features that are uncorrelated, also known as principal properties, which can be used directly as design variables in what is called a multivariate design. 27, 28 SMD can be performed at both the building block (BB) and product level. If BBs prove unreactive or otherwise problematic, and an SMD has been performed on the BB level, it is easy to manually exchange them with other BBs that are close in principal component space. QSAR modeling based on BBs selected through multivariate design will offer direct insight into how local structural alterations will affect the response. Additionally, it is computationally more efficient to perform SMD at the BB level than at the product level.
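A minimal sketch of how principal properties could be obtained from a descriptor table is given below. It assumes mean centering and unit-variance scaling followed by a singular value decomposition; this is a generic PCA, not the specific software or settings used in this thesis, and the function name is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """PCA of a descriptor matrix (rows = compounds, columns = descriptors)
    after mean centering and unit-variance scaling."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # principal properties
    loadings = Vt[:n_components].T                    # descriptor contributions
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained
```

With a table in which molecular weight, volume, and atom count are nearly collinear, most of the variance ends up in the first component, which then acts as a single "size" property, and the score vectors are mutually orthogonal.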

3.1.2. QSAR modeling

After synthesis and biological evaluation of a designed compound set have been completed, QSAR models can be computed, relating molecular descriptors or principal properties of a set of compounds (i.e. the training set) to one or more responses through regression. Commonly used regression techniques for QSAR modeling are partial least-squares regression to latent structures (PLS) 29, 30 and multiple linear regression (MLR) 31. Prior to regression modeling, data are sometimes filtered. Orthogonal signal correction (OSC) is a method that can be used to remove variation in the descriptors that is orthogonal, i.e. uncorrelated, to the response. 32 Orthogonal projections to latent structures (OPLS) is essentially a PLS method with an integrated OSC filter. 33 Support vector machines (SVM) are another type of regression technique that can be used to model nonlinear data. 34 In order to compute QSAR models, the compounds need to interact with the biological target through the same mechanism and possess an even spread in biological activity, obtained with robust and reproducible assays, and the numerical descriptions of the compounds need to be relevant for the responses. If SMD has been performed prior to QSAR modeling, the designed set should be of manageable size, allowing simultaneous evaluation of the entire compound set in replicates. The biological evaluation should be performed on several occasions to get reliable data. Some compounds from the designed set that are inactive can be included in the training set to get complementary information, but those compounds need to be inactive due to unfavorable interactions with the biological target and not through inability to, for instance, pass through cell membranes. It is often difficult to decide whether an inactive compound should be included in the QSAR modeling. One way to investigate whether an inactive compound is suitable for inclusion in the training set is to compute a QSAR model for the compounds that are active and use that model to predict all the inactive compounds. Those compounds predicted as inactive could then be added to the training set.
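To make the PLS step more concrete, the following is a minimal sketch of single-response PLS regression using the NIPALS algorithm. It is a textbook-style illustration under the assumption of one response and mean-centered data, not the software used in this thesis; the function name is hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.
    Returns regression coefficients and intercept on the original scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, c = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # score vector (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        q = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - q * t                # deflate y
        W.append(w)
        P.append(p)
        c.append(q)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    coef = W @ np.linalg.solve(P.T @ W, c)
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

A prediction for new compounds is then `X_new @ coef + intercept`. When as many components are extracted as there are linearly independent descriptors, the coefficients coincide with those of MLR; the benefit of PLS lies in using fewer components when descriptors are many and correlated.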

QSAR models are useful for gaining an understanding of which chemical features correlate with the responses, even with limited prior knowledge. That information can be extracted from a training set using, for instance, PLS regression, relating descriptors or principal properties of the training set to one or more responses, and subsequent variable selection. QSAR models are also highly useful for prediction of responses for compounds that have not been synthesized and biologically evaluated. The compounds used for predictions are called the test set. In order to get reliable predictions, the training set needs to cover the chemical features of the test set. It can therefore be of interest to do a second round of SMD and QSAR modeling to get new models that offer more accurate predictions, by using the former QSAR models' coefficients as design parameters. The test set should be selected, synthesized, and biologically evaluated after the QSAR models have been computed to ensure unbiased evaluation of the models. These types of test sets are usually called external test sets. The predictive power of any given model is usually not as good as indicated by the Q² value, 35 and therefore it is of utmost importance to critically test and evaluate the models with external test sets.
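The difference between fit and cross-validated prediction can be sketched as follows. The example assumes leave-one-out cross-validation and an MLR model for simplicity; the cross-validation scheme and regression method behind the Q² values reported in this thesis may differ, and all function names are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit with an intercept; returns [intercept, b_1, ..., b_K]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def r2_fit(X, y):
    """Explained variation of the training set (R2)."""
    residuals = y - predict_mlr(fit_mlr(X, y), X)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

For a least-squares model, Q² computed this way never exceeds R², and both are computed from training compounds only. An external test set, synthesized and evaluated after the model was built, gives an independent estimate of predictive power.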

3.1.3. Scaffold hopping

SMD is usually used to vary the substitution pattern on a given scaffold or the BBs used to synthesize a library with a common scaffold. Scaffold hopping 36, on the other hand, usually aims to keep key interaction points and favorable substitution patterns and instead change the scaffold of a compound class. The two techniques are therefore complementary and can be used in conjunction. 3D scaffold hopping methods, which consider flexibility, geometry, and pharmacophore-like molecular properties, have been published that outperform 2D methods. 37, 38 A more recently published method is SHOP 39, which considers geometrical features of the scaffold, shape, and alignment-independent GRID descriptors. 36 A receptor-based scaffold hopping method has recently been developed and incorporated in SHOP. 40 Scaffold hopping can be used in early drug discovery to identify additional lead compounds (backup leads) that should lower the risk of attrition during drug development due to factors such as undesirable absorption, distribution, metabolism, and excretion (ADME) properties. The creation of intellectual property is also facilitated. In addition, the method can be used for finding bioisosteres. In this thesis the program SHOP 39 has been used, and it is the only software discussed in detail.

3.2. Bacterial virulence

Infectious diseases, with a substantial contribution from pathogenic bacteria, are the leading cause of death worldwide. 41 Antibiotics, i.e. compounds that kill or inhibit growth of bacteria, have proven to be very effective against infectious diseases caused by pathogenic bacteria in those regions where they have been available. Antibiotics that target the bacterial cell wall (for example penicillin) or cell membrane, or that interfere with essential bacterial enzymes, are usually bactericidal (killing bacteria) in nature, while those that target protein synthesis are usually bacteriostatic (inhibitors of bacterial growth). 42 Even though antibiotics have been highly successful against infectious diseases, they are not without side effects. Since they target general features common to most bacteria, essential bacteria in the intestinal flora will also be affected by antibiotics, causing adverse side effects. Multidrug-resistant bacterial strains have surfaced that resist most treatments available on the market. Antibiotic resistance is a result of selection for organisms that have enhanced ability to survive doses of antibiotics that previously would have been lethal. Those bacteria which have developed resistance allowing them to withstand an antibiotic treatment will survive and live on to reproduce. 43 They will then pass on that trait, which will result in a fully resistant colony. Resistance can also be the result of horizontal gene transfer, in which a bacterium can incorporate genetic material from another bacterium without being its offspring. 44 Since antibiotics target non-pathogenic bacteria as well, resistance will be developed rapidly even outside of the host. This creates a demand for new antibiotics to combat resistant strains, but to avoid the rapid development of resistance, new strategies to combat bacterial infectious diseases are needed.

Targeting the functions through which bacteria are able to evade the immune response or establish disease might potentially halt or slow progress of bacterial disease and reduce risk of giving rise to resistant strains. 41 The methods through which different bacteria invade the host, evade the host immune response, proliferate within the host, and establish disease are broadly termed virulence mechanisms. Developing drugs targeting virulence mechanisms generally poses a bigger challenge than the development of traditional antibiotics since such systems usually require activation, either artificial or by placement of the bacteria in an environment where the specific virulence system is triggered. General examples of virulence mechanisms include, but are not limited to, bacterial adhesion to host cell surfaces, secretion and translocation of toxins and immune response inhibitors, quorum sensing, invasion of host tissue or host cells, and colonization of compartments within the host. 41 In addition to anti-virulence drugs, small organic molecules that have been identified as virulence inhibitors through biological screening can be used as research tools to elucidate complex, not well understood biological mechanisms involved in expression, regulation, and function of the virulence system.

3.2.1. Type III secretion

Type III secretion (T3S) is a virulence system found in several Gram-negative animal pathogens, such as Yersinia spp., Pseudomonas aeruginosa, Chlamydia spp., Salmonella spp., and Shigella flexneri. 45 The virulence system also exists in several Gram-negative plant pathogens. The function of T3S varies between different bacterial species. The molecular events during Yersinia infections have been extensively studied, 45-48 and Yersinia thus serves as an excellent model organism to study T3S and evaluate inhibitors. 48, 49 In this thesis Yersinia pseudotuberculosis (Y. pseudotuberculosis) has been used as a model organism for evaluation of T3S inhibitors. When Y. pseudotuberculosis senses contact with a eukaryotic cell, the bacterium will secrete and translocate effector proteins into the cytosol of the target cell. The effector proteins target specific functions of the target cell, such as phagocytosis and inflammatory responses, 46 allowing the bacteria to subvert it and proliferate (figure 3.2a). A T3S inhibitor would stop the secretion, thus preventing the bacteria from injecting the effector molecules into the cytosol of the target cell. The disarmed bacteria would be eliminated through phagocytosis and the infection would be cleared (figure 3.2b).

Figure 3.2. Schematic representation of a Yersinia infection; a) The bacterium will sense contact with the eukaryotic cell and adhere to it. The cytosol of the target cell will be injected with effector molecules that will turn off the immune response and subvert the target cell. In the absence of a functional immune defense, the bacteria will proliferate; b) The addition of a T3S inhibitor will prevent the injection of effector molecules, resulting in a functional immune defense that can clear the infection.

3.2.2. Type III secretion inhibitors

Prior to the work described in this thesis, a number of T3S inhibitors were identified and published within the research group. 50 Three classes of inhibitors were identified: an acetylated salicylanilide that was a singleton in the biological screening campaign (figure 3.3a), salicylidene acylhydrazides (figure 3.3b), and a 2-arylsulfonylamino-benzanilide (figure 3.3c) that also was a single hit within its class. All three were further investigated through synthesis of analogs and biological evaluation. The 2-arylsulfonylamino-benzanilides were subjected to SMD, synthesis, biological evaluation, and QSAR modeling. 51 For the acetylated salicylanilide, three analogs were synthesized and biologically evaluated, where the acetyl group was exchanged with a propanoyl or butanoyl group or the salicylanilide was left unacetylated. 52 A number of salicylidene acylhydrazides were synthesized and evaluated for their ability to inhibit T3S. 53 Since that study, the salicylidene acylhydrazides have been used extensively as research tools to study the function of T3S in a wide range of organisms where they are active. 54 In this thesis, the acetylated salicylanilides and the salicylidene acylhydrazides were subjected to SMD, synthesis, and QSAR modeling (figures 3.3b and 3.3d).

Figure 3.3. The structures of the compounds identified from biological screening and the general structures of the compounds studied in this thesis. a) the acetylated salicylanilide that was a singleton in the biological screening campaign; b) the general structure of the salicylidene acylhydrazides; c) the 2-arylsulfonylaminobenzanilide that was a singleton in the biological screening campaign; d) the general structure of the acetylated salicylanilides.

4. Scope of this thesis

This thesis describes the use of computational tools to optimize T3S inhibitors. The SAR of the acetylated salicylanilide had not been investigated beyond manipulation of the acetyl group, and the first goal was to investigate the SAR by variation of the substitution pattern on both the salicylic acid and aniline ring moieties. The information gained from the SAR would be used to guide an SMD of the compound class that hopefully would lead to the final goal, namely the establishment of a QSAR model. The salicylidene acylhydrazides were to be subjected to SMD directly, using information from the previously published compounds, with the aim of establishing a QSAR model for this compound class as well. In the latter part of the graduate studies, focus was shifted towards finding alternatives to the salicylidene acylhydrazides, since a number of challenges associated with the core of the compounds were identified. The salicylidene acylhydrazides are interesting from a biological and possibly a clinical perspective in that they inhibit T3S in several relevant Gram-negative organisms. Scaffold hopping of the central fragment with subsequent synthesis of a small number of resulting scaffolds was planned for the last part of this thesis.

5. Statistical molecular design, synthesis, and biological evaluation of type III secretion inhibitors

All synthesized compounds presented in this thesis were evaluated for their ability to inhibit T3S in a reporter-gene assay as previously described. 53 The biological read-out from this assay (% inhibition of luciferase light emission) is directly proportional to inhibition of the reporter gene. To verify that the inhibition was not a result of direct interference with luciferase or the light signal, an additional assay based on the secreted effector molecule YopH 51 was used for all the salicylidene acylhydrazides. Another method used to verify inhibition of secretion was Western blot, 50 which was used to evaluate some of the salicylanilides. All salicylidene acylhydrazides and some of the salicylanilides were also tested for bacterial growth inhibition, as previously described, 50 to ensure that the observed reporter-gene inhibition was not due to general toxicity.

The SARs of the acetylated salicylanilides and the salicylidene acylhydrazides had previously not been studied in detail. Without any structural information on the biological target, we decided to use SMD as a strategy to design focused compound libraries that hopefully could be used to establish QSAR models for both classes of compounds. This section describes the SMD strategies used.

5.1. Salicylanilides (paper I)

The acetylated salicylanilide (1a, table 5.1) was a single hit from the initial screening 50 and only structural variation around the acetyl group had previously been investigated. 52 No structural target was known for this compound class. Compound 1a was roughly three times more potent against T3S than the most potent of the previously published salicylidene acylhydrazides. 50, 53 Analogs to 1a could be synthesized via amide coupling of a salicylic or benzoic acid and an aniline. Subsequent acetylation of the salicylic hydroxyl group under acidic conditions gave the acetylated salicylanilide analogs (scheme 5.1). Interestingly, if the acetylation was performed with pyridine as catalyst, acetylation of both the hydroxyl and the amide groups was observed.

Scheme 5.1. Synthesis of analogs to the acetylated salicylanilide 1a.

An initial SAR study was performed, through the synthesis and biological evaluation of two previously synthesized (1a and 1b) 52 and five new analogs (table 5.1).

Reporter-gene signal inhibition at four compound concentrations

ID   R    100 μM    50 μM     20 μM     10 μM
1a   Ac   99 ± 0    99 ± 0    98 ± 0    76 ± 1
1b   H    99 ± 1    100 ± 0   100 ± 0   100 ± 0
2a   Ac   100 ± 0   100 ± 0   100 ± 0   99 ± 2
2b   H    78 ± 6    80 ± 3    85 ± 1    91 ± 1
3a   Ac   99 ± 1    92 ± 3    44 ± 11   23 ± 4
3b   H    100 ± 0   99 ± 0    64 ± 4    14 ± 1
4    -    -         -         -         -

Table 5.1. Six SAR compounds were synthesized to probe the biologically tolerated structural variation of the original hit 1a. The compounds were evaluated using the reporter-gene assay. Means and standard deviations were calculated from triplicates, and experiments were reproduced on at least two separate occasions.
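As a minimal sketch of how one table entry (mean ± standard deviation from a triplicate) could be computed: the percent-inhibition formula relative to an untreated control, the function names, and the raw readings below are illustrative assumptions and are not taken from the assay protocol. The sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def percent_inhibition(signal, control_signal):
    """Assumed definition: signal loss relative to an untreated control, in %."""
    return 100.0 * (1.0 - signal / control_signal)

def summarize_triplicate(signals, control_signal):
    """Return (mean, sample SD) of % inhibition for one compound concentration."""
    values = [percent_inhibition(s, control_signal) for s in signals]
    return mean(values), stdev(values)

# Hypothetical luminescence readings (arbitrary units), not thesis data.
m, sd = summarize_triplicate([2400.0, 2300.0, 2500.0], control_signal=10000.0)
print(f"{m:.0f} ± {sd:.0f}")
```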

In the SAR study only structural modification of the salicylic acid moiety was performed. In retrospect, the aniline moiety should have been manipulated as well to investigate whether it could be structurally altered without complete loss of T3S inhibition. The results indicated that the salicylic acid moiety allowed exchange of both iodines with hydrogen atoms without complete loss of biological activity. Interestingly, the compounds synthesized from 5-iodo-salicylic acid (2a and 2b) were roughly ten times more potent than the original hit. Exchange of the hydroxyl or O-acetyl groups with hydrogen, as in 4, resulted in complete loss of T3S inhibition at compound concentrations as high as 100 μM. Based on these results a second selection of compounds was planned, where different salicylic acids and anilines would be selected to form a virtual library from which a number of targets for synthesis would be chosen.

[Scheme 5.1, reaction conditions: amide coupling with PCl3 in toluene, MWI, 150 °C, 10 min; acetylation with Ac2O and phosphoric acid, 70 °C, 30 min.]

From commercial sources 25 anilines and 22 salicylic acids were chosen based on availability, price, substitution pattern, chemical compatibility, and size. All combinations of the BBs were enumerated, resulting in 550 virtual salicylanilides. A three-component PCA model (R² = 0.97, Q² = 0.96), describing size, hydrophobicity, density, and connectivity, was used for manual selection of 16 new unacetylated salicylanilides (figure 5.1).
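The enumeration step can be sketched as a full factorial combination of the two BB sets. The identifiers below are placeholders (the actual building blocks are not listed in this section); only the set sizes come from the text.

```python
from itertools import product

# Placeholder identifiers; only the counts (25 and 22) are taken from the text.
anilines = [f"aniline_{i:02d}" for i in range(1, 26)]
salicylic_acids = [f"salicylic_acid_{i:02d}" for i in range(1, 23)]

# Each virtual salicylanilide is one (salicylic acid, aniline) pair.
virtual_library = list(product(salicylic_acids, anilines))
print(len(virtual_library))  # 550
```

Descriptors for the PCA model would then be computed per product; that step depends on the descriptor software and is not sketched here.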

Figure 5.1. Manual selection of salicylanilides from a three component PCA model (first two components shown). The hit compound, 1a, is marked with an open ring. The selected compounds are marked with filled circles. The first PC corresponds to size and hydrophobicity. The second PC represents the density of the compounds (molecular weight divided by molecular volume). The third PC describes hydrophobicity and connectivity of the compounds.

The 16 acetylated and the corresponding unacetylated salicylanilides were synthesized and biologically evaluated. The yields, including the SAR compounds, ranged from 2-60% over two steps. The resulting data did not contain enough active compounds to compute a QSAR model: only five compounds displayed higher than 50% reporter-gene inhibition at 20 μM concentration. A complementary selection was therefore performed. For this selection DOOD was applied, which was especially attractive since it allowed the design to be performed around the most potent compound. A few additional BBs with differing size and shape were added to the design to complement the previously used BBs. The BBs were characterized with conformation-independent descriptors that mainly described electronic properties, hydrophobicity, size, and surfaces. Two PCA models were computed, one for each BB set. The first three score vectors for each BB set, in combination with the molecular descriptors SlogP (an atomic contribution model that calculates logP from the given structure) and total polar surface area for the products, were used as design variables.

The three compounds with the highest activity were set as vertices in an inner shell and the center of the vertices was set as center point for the DOOD. The entire candidate set was divided into layers in such a way that the thickness of the two outer layers was equal to 10% of the thickness of the inner layer. The previously synthesized compounds were set as inclusions in the DOOD, and five new compounds were selected. An additional compound that was geometrically closest to the theoretical center point was added. Both unacetylated and acetylated versions of the entire compound set were synthesized and biologically tested. The acetylated and unacetylated compounds were generally pair-wise active. Five of the six acetylated and all of the unacetylated compounds showed a dose-response pattern, highlighting the usefulness of DOOD for selecting compounds that are likely to be biologically active.
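The layering step of the onion design can be illustrated with a small sketch. It only assigns candidates to shells around a center point; the D-optimal selection within each shell is not shown. Euclidean distance in the design-variable space, the scaling of the shells to the most distant candidate, and reading the text as three layers in which each of the two outer layers is 10% as thick as the inner one are all assumptions made for illustration.

```python
from math import dist

def centroid(points):
    """Component-wise mean of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def assign_layers(candidates, center, relative_thickness=(1.0, 0.1, 0.1)):
    """Return one layer index per candidate (0 = innermost shell)."""
    distances = [dist(c, center) for c in candidates]
    r_max = max(distances)
    total = sum(relative_thickness)
    # Outer radius of each layer, scaled so the last radius equals r_max.
    bounds, acc = [], 0.0
    for thickness in relative_thickness:
        acc += thickness
        bounds.append(r_max * acc / total)
    last = len(bounds) - 1
    return [next((i for i, b in enumerate(bounds) if d <= b), last)
            for d in distances]

# Hypothetical 2D design coordinates, not thesis data.
center = centroid([(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)])  # three most active
layers = assign_layers([(1, 0), (5, 0), (10.5, 0), (12, 0)], (0, 0))
print(center, layers)
```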

In total 51 compounds were synthesized, with yields for the acetylated salicylanilides ranging from 2-60% over two steps, and biologically evaluated. Of the acetylated salicylanilides, 13 displayed higher than 40% reporter-gene inhibition at 50 μM compound concentration.

5.2. Salicylidene acylhydrazides (paper II)

A number of salicylidene acylhydrazides that displayed inhibition of T3S had previously been published by Nordfelth et al. 53 By close inspection of the structure of the compounds and their corresponding biological activity it was concluded that there was no clear SAR. The aromatic rings of both BB sets tolerated substitution with different functional groups or exchange to heteroaromatic systems or fused aromatics without loss of biological activity.

The salicylic aldehydes could also be exchanged for salicylic ethanones, albeit with a small decrease in reporter-gene signal inhibition. Substructure searches of the compound library from the original screening campaign identified a number of compounds in which the salicylic hydroxyl group was lacking or was replaced by an alkyloxy substituent. Those compounds completely lacked T3S inhibition, indicating that the hydroxyl group was of vital importance. We believed that the characterization of the compounds would be of great importance to be able to establish QSAR models. The salicylic aldehyde ring tolerated substitution with both polar and hydrophobic substituents. This led us to believe that the SAR was not dependent on the polarity of the salicylic aldehydes, but perhaps on the atomic partial charges of the salicylic aldehyde aromatic carbons. Additionally, electron donating and withdrawing substituents would directly affect the pKa of the salicylic phenol proton.

A substructure search for commercially available and chemically compatible hydrazides and salicylic aldehydes was performed. Through some of the major commercial sources (Aldrich, Acros, Alfa Aesar, Maybridge, and ABCR), 48 salicylic aldehydes and 92 hydrazides were readily available for ordering. Prior to the computation of molecular descriptors, conformational analysis was performed for each BB set. For the salicylic aldehydes a conformational search was performed using the software OMEGA 55 with default settings. For the hydrazides a stochastic conformational search was performed in MOE. 20 The lowest energy conformations of each BB set were geometry optimized using Hartree-Fock calculations. The pKa of the phenol proton and the atomic partial charges of the aromatic carbon atoms of all salicylic aldehydes were calculated. In addition, molecular descriptors describing shape, size, hydrophobicity, surface properties, electronic properties, and partial charges were computed for both BB sets.

Based on the previously published structures it appeared that the biological response was more sensitive to the substitution pattern of the salicylic aldehydes and tolerated a greater structural variety of hydrazides. To emphasize some of the ab initio calculated properties that were believed to be important for T3S inhibition, such as pKa, partial charges, and orbital energies, the MM descriptors describing hydrophobicity and charges were grouped in two separate groups and summarized using PCA. The score vectors from those two models and the other ungrouped variables comprised the descriptor set for the salicylic aldehyde BB set.

The design was performed on the BB level and, to reduce computational time, a selection of products was planned based on the selected BBs. For each BB set a two-layer DOOD was computed, resulting in 18 hydrazides and 17 salicylic aldehydes. 5-bromo-salicylic aldehyde, a BB that had been used to synthesize several active salicylidene acylhydrazides, 53 was added to make the two BB sets of equal size. Each BB was planned to be used three times in the final products, resulting in 54 salicylidene acylhydrazides. By using each selected BB three times the risk of erroneous conclusions about the BBs in subsequent SAR analysis would hopefully be minimized. The two BB sets were listed in random order in two separate columns. Combination of the two columns yielded the first 18 virtual products. The hydrazide column was then shifted one step downwards so that the eighteenth BB became the first and the two columns were combined anew to yield the second set of 18 virtual products. The procedure was repeated to yield the final 18 compounds and combination of the three sets gave the 54 targets for synthesis (figure 5.4).

Figure 5.4. Systematic combination of BBs to yield a set of 54 virtual products where each BB is represented three times.
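The column-shifting procedure can be sketched as follows. The BB labels are placeholders, and the initial random ordering of the two columns is left out so that the example is reproducible.

```python
from collections import Counter

def combine_with_shifts(aldehydes, hydrazides, n_rounds=3):
    """Pair two equally long BB lists n_rounds times, moving the last hydrazide
    to the top of its column (one step downwards) before each new round."""
    if len(aldehydes) != len(hydrazides):
        raise ValueError("both BB sets must have the same size")
    products = []
    column = list(hydrazides)
    for _ in range(n_rounds):
        products.extend(zip(aldehydes, column))
        column = [column[-1]] + column[:-1]
    return products

aldehydes = [f"SAL{i}" for i in range(1, 19)]    # placeholder labels
hydrazides = [f"HYD{i}" for i in range(1, 19)]   # placeholder labels
targets = combine_with_shifts(aldehydes, hydrazides)

# Every BB should occur exactly three times among 54 distinct products.
usage = Counter(bb for pair in targets for bb in pair)
print(len(targets), len(set(targets)), set(usage.values()))
```

Because the shift is cyclic and the number of rounds is smaller than the column length, no aldehyde meets the same hydrazide twice.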

Out of the 54 target compounds, 50 could be synthesized with a purity of at least 95% and generally more than 98%. One of the hydrazide BBs, butyric acid hydrazide, was unreactive under the reaction conditions used and thus three of the target compounds could not be synthesized. The final failed synthesis was due to problematic purification of the target compound. All compounds were biologically evaluated for reporter-gene inhibition, and phosphatase activity originating from secreted YopH was measured.

Before starting the QSAR modeling the compounds not specifically targeting T3S had to be removed. Phosphatase activity originating from secreted YopH had been measured and the compounds that inhibited the reporter-gene signal by at least 40% at 50 μM and reduced the YopH activity were classified as active. In addition the inhibition of the reporter-gene needed to be dose-dependent. Bacterial growth experiments were performed for all compounds to verify that the observed activity was not the result of toxicity. No or modest effect on growth was observed. Five compounds were removed from the modeling due to lack of reduction of YopH activity. 18 out of 50 salicylidene acylhydrazides were classified as active.
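The classification criteria can be written down as a short sketch. The record type, field names, and example values are hypothetical. Applying the same 40% threshold when deciding which compounds to exclude (reporter-gene inhibition without reduced YopH activity) is an assumption, since the text does not state a threshold for that case.

```python
from dataclasses import dataclass

@dataclass
class AssayResult:
    compound_id: str
    inhibition_50uM: float   # % reporter-gene signal inhibition at 50 uM
    dose_dependent: bool     # reporter-gene inhibition follows the dose
    reduces_yoph: bool       # secreted YopH phosphatase activity is reduced

def is_active(r: AssayResult, threshold: float = 40.0) -> bool:
    """Active: >= threshold % inhibition, dose-dependent, and YopH reduced."""
    return r.inhibition_50uM >= threshold and r.dose_dependent and r.reduces_yoph

def exclude_from_modeling(r: AssayResult, threshold: float = 40.0) -> bool:
    """Reporter-gene inhibition without a YopH effect is treated as unspecific."""
    return r.inhibition_50uM >= threshold and not r.reduces_yoph

# Hypothetical results, not thesis data.
results = [
    AssayResult("cpd_A", 85.0, True, True),
    AssayResult("cpd_B", 60.0, True, False),
    AssayResult("cpd_C", 15.0, False, False),
]
modeling_set = [r for r in results if not exclude_from_modeling(r)]
labels = {r.compound_id: is_active(r) for r in modeling_set}
print(labels)
```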


6. QSAR modeling of type III secretion inhibitors

Since there were no published SAR analyses for the acetylated salicylanilides and the salicylidene acylhydrazides, QSAR models were established by first expanding descriptors or PPs to include higher order terms. Subsequent variable selection identified terms that correlated with the investigated response. This section describes the QSAR modeling of the two compound classes.

6.1. Salicylanilides (paper I)

Before starting the QSAR modeling we had to decide whether to perform the modeling on the acetylated or the unacetylated compounds, or both. Since unacetylated salicylanilides had been reported as proton motive force uncouplers, 56 we decided to perform all modeling on the acetylated compounds. The acetylated salicylanilides showed an even spread in biological activity at 20 and 10 μM compound concentrations and those inhibitory values were used as response variables. In total 15 acetylated salicylanilides were used in the training set. A number of conventional methods were used in attempts to compute QSAR models. Expansion of the DOOD parameters for the BBs followed by PLS regression and variable selection did not lead to any significant models. The description of the molecules used for the SMDs was based only on 1D and 2D descriptors. A larger number of descriptors including 3D descriptors were computed in an attempt to address the problematic QSAR modeling. A rough conformational search was performed where each bond was rotated 60° and the lowest energy conformation of each BB was further energy minimized using the MMFF94 force field in MOE. 20 Additional 1D, 2D, and 3D MM and semi-empirical descriptors were computed. Local PCA models for the BBs were computed and the score vectors were extracted and expanded. PLS regression and variable selection did not yield any models with positive Q² values. Much of the variation found among the BBs apparently did not have any correlation with the response.
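To make the meaning of a non-positive Q² concrete, the sketch below uses the common definitions R² = 1 − SS_residual/SS_total and Q² = 1 − PRESS/SS_total, where PRESS is built from predictions made for compounds left out of the fit. Modeling software may differ in details such as the cross-validation scheme, and the numbers are made up.

```python
def r2(observed, fitted):
    """Fraction of the variation in y explained by the fitted values."""
    y_mean = sum(observed) / len(observed)
    ss_total = sum((y - y_mean) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_resid / ss_total

def q2(observed, cv_predicted):
    """Same formula, with cross-validated predictions (PRESS) as residuals."""
    return r2(observed, cv_predicted)

# Hypothetical % inhibition values and cross-validated predictions.
observed = [10.0, 30.0, 50.0, 70.0, 90.0]
poor_cv = [60.0, 20.0, 90.0, 10.0, 40.0]
mean_only = [50.0] * 5

print(q2(observed, poor_cv))    # negative: worse than predicting the mean
print(q2(observed, mean_only))  # 0.0: no better than predicting the mean
```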

We needed to identify the variation in the BBs that correlated with the response. To do this, PLS regression was performed on the BB level. PLS components were added until R²X reached 1.0 and the score vectors were extracted, combined, and expanded. PLS regression followed by variable selection gave a two-component model (Q² = 0.82). Figure 6.1 schematically illustrates the methodology used, figure 6.2 shows the observed versus calculated data, and figure 6.3 shows the model coefficients.

Figure 6.1. Schematic representation of the QSAR modeling of the acetylated salicylanilides. X1 and X2 are the matrices of BBs characterized with 1D, 2D, and 3D descriptors. X3 and X4 are the PLS score vectors derived from the BBs. X5 are the square terms of X3, and X6 the square terms of X4. X7 are the interaction terms between X3 and X4. Y is the reporter-gene signal inhibition at 10 and 20 μM compound concentration.

[Figure 6.1 flowchart: descriptors are calculated for the acetylated salicylic acids and the anilines (X1, X2); PLS against Y at the BB level gives 6 and 7 PLS score vectors (X3, X4); the linear terms are expanded (X5, X6, X7) and all terms enter a final PLS model against Y for the acetylated salicylanilides.]
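The expansion of the BB-level score vectors into square and interaction terms can be sketched as below. The term names follow the labels used in the coefficient plot (t1_Sal, t1_An, and so on); the score values are dummy numbers, and the assumption that the salicylic acid block has six and the aniline block seven score vectors is inferred from the figure. Only cross-block interactions are generated here, following the definition of X7 in the caption.

```python
from itertools import product

def expand_terms(sal_scores, an_scores):
    """Inputs map term name -> score values (one per compound).
    Returns linear terms, their squares, and all Sal x An interactions."""
    linear = {**sal_scores, **an_scores}
    terms = dict(linear)
    for name, values in linear.items():
        terms[f"{name} x {name}"] = [v * v for v in values]
    for (s_name, s), (a_name, a) in product(sal_scores.items(), an_scores.items()):
        terms[f"{s_name} x {a_name}"] = [u * v for u, v in zip(s, a)]
    return terms

n_compounds = 15  # size of the training set stated in the text
sal = {f"t{k}_Sal": [0.1 * (i + k) for i in range(n_compounds)] for k in range(1, 7)}
an = {f"t{k}_An": [0.2 * (i - k) for i in range(n_compounds)] for k in range(1, 8)}

terms = expand_terms(sal, an)
print(len(terms))  # 13 linear + 13 square + 42 interaction terms = 68
```

With 68 candidate terms and 15 compounds, the variable selection step mentioned in the text is what keeps the final model small.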

Figure 6.2. Calculated versus experimental data at a) 10 μM and b) 20 μM compound concentrations.

Figure 6.3. QSAR model coefficients for the responses at 10 μM and 20 μM compound concentrations. The models consisted of nine linear terms (grey) and five interaction terms (black).

The compounds displaying a high T3S inhibition generally had oval-shaped, large anilines with large hydrophobic and negatively charged surfaces. The HOMO and LUMO orbital energies were generally low. The salicylic rings of the potent T3S inhibitors had large dipole moments, high hardness, high density, large negative charges, large negatively charged surfaces, and high HOMO orbital energies.

[Figure 6.2 plots, panels a and b. Axes: experimental and calculated % reporter-gene signal inhibition, both 0 to 100. Compounds shown: 1a, 2a, 3a, 5a, 6a, 7a, 14a, 16a, 17a, 18a, 21a, 22a, 23a, 25a, and 26a.]

[Figure 6.3 plots: coefficients of % inhibition at 10 μM and 20 μM for the terms t1_Sal, t2_Sal, t3_Sal, t4_Sal, t1_An, t2_An, t5_An, t6_An, t7_An, t1_Sal x t6_An, t2_Sal x t1_An, t3_Sal x t7_An, t4_Sal x t2_An, and t1_An x t5_An.]

To get a complementary model that could discriminate between active and inactive acetylated salicylanilides, a PLS-discriminant analysis (PLS-DA) model was computed using the same strategy as outlined above. The luciferase light emission inhibition at 50 μM compound concentration was used as response and compounds were classified as active if displaying at least 40% inhibition. PLS regression on the BB level resulted in 11 score vectors for the anilines and salicylic acids respectively. Combination and expansion of the score vectors followed by PLS regression and variable selection resulted in a one-component PLS-DA model (R²Y = 0.75, Q² = 0.65). The model showed good separation of the two classes along the direction of the PLS component (figure 6.4). The model was more complex than the PLS QSAR model, consisting of 15 linear terms, 1 square term, and 9 interaction terms (figure 6.5).
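A one-component PLS-DA model can be illustrated with a compact sketch in which the class membership is coded as a dummy response (1 = active, 0 = inactive) and fitted with a single PLS component. This is a simplified illustration and not the software procedure used for the thesis models: variables are only centered (no unit-variance scaling), the 0.5 cutoff is an assumption, and the data are made up.

```python
from math import sqrt

def center_columns(X):
    """Subtract the column means; return the centered rows and the means."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X], means

def fit_one_component_pls(X, y):
    """Single PLS component for one response (PLS1) on column-centered X."""
    Xc, x_means = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X'y and normalized to unit length.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(Xc[0]))]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]          # scores
    c = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)   # y-loading
    return {"x_means": x_means, "y_mean": y_mean, "w": w, "c": c}

def predict_class(model, x, cutoff=0.5):
    """Return 1 (active) if the predicted dummy value exceeds the cutoff."""
    t = sum((v - m) * wj for v, m, wj in zip(x, model["x_means"], model["w"]))
    return 1 if model["y_mean"] + t * model["c"] > cutoff else 0

# Hypothetical expanded terms for four compounds and their class labels.
X = [[2.0, 1.0], [1.5, 0.5], [-1.0, -0.5], [-2.5, -1.0]]
y = [1, 1, 0, 0]
model = fit_one_component_pls(X, y)
print([predict_class(model, row) for row in X])
```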

Figure 6.4. The PLS-DA model shows separation between the active (boxes) and inactive compounds (open circles) along the PLS score vector.

[Figure 6.4 plot: PLS score t[1] by compound number, with 2 SD and 3 SD limits marked.]

Figure 6.5. Coefficients of the PLS-DA model. The model consists of 15 linear (grey), one square (white), and nine interaction terms (black).

6.2. Salicylidene acylhydrazides (paper II)

Of the compound concentrations investigated, 25 μM gave the best spread in inhibition of T3S and the inhibition of the luciferase light emission signal observed at that concentration was therefore selected as response for modeling. The compounds that dose-dependently inhibited the reporter-gene signal by at least 40% at 50 μM and reduced YopH activity were classified as active. According to these criteria, 18 compounds were classified as active.

The first attempt to establish a QSAR model was to use the SMD parameters in an effort to establish a linear model using PLS regression. No model could be computed and the data was therefore expanded with square, cubic, and interaction terms. PLS regression and variable selection did not result in any significant model. Much of the variation found in the descriptions of the compounds did not appear to have any correlation with the biological response. The same strategy as outlined for the salicylanilides was then applied. PLS regression at the BB level yielded 14 PLS score vectors for each BB set. The PLS score vectors were combined and expanded with square and interaction terms. PLS regression and variable selection gave a one-component model (Hi-PLS-1, R²Y = 0.67, Q² = 0.51) that showed an S-shaped correlation between experimental and calculated luciferase signal inhibition (figure 6.6).

[Figure 6.5 plot: coefficients for class separation of the salicylanilide PLS-DA model terms.]

Figure 6.6. Calculated versus experimental % reporter-gene signal inhibition at 50 μM compound concentration of Hi-PLS-1. The data shows curvature, indicating that additional non-linear variables might be needed to get a more linear relationship. The last two numbers from the compound IDs are shown in the plot.

The PLS model was constructed from compounds showing a dose-dependent response and some inactive ones that also were calculated to be inactive in the model. One compound (ME0157) was included in the initial model computation as an inactive, but no models could be computed when that salicylidene acylhydrazide was in the training set. That compound might have been inactive due to, for example, efflux or poor membrane permeability. This highlights the importance of removing compounds that do not share the same mechanism or are inactive for reasons other than, for instance, poor affinity to the receptor.

Just like the QSAR model of the salicylanilides, the model for the salicylidene acylhydrazides could not predict inactive compounds reliably. Inactive compounds are generally harder to predict correctly, partly since the inactivity can be due to several reasons not related to affinity. To classify inactive and active compounds, a PLS-DA model was computed using the same methodology as outlined in figure 6.1. The entire set of 50 salicylidene acylhydrazides, excluding ME0157 and five compounds that displayed inhibition of the luciferase light emission signal but lacked inhibition of YopH activity, was used as training set. Compounds were classified as active if they displayed a minimum of 40% inhibition of the luciferase light emission signal, otherwise inactive. PLS regression at the BB level yielded 16 PLS score vectors for each BB set. The score vectors were combined and expanded with square and interaction terms. PLS regression and variable selection gave a one-component PLS-DA model (Hi-PLS-DA-1, R²Y = 0.67, Q² = 0.55) that showed separation of active and inactive compounds along the direction of the PLS score vector (figure 6.7).

Figure 6.7. The PLS-DA-1 model shows good separation of active (boxes) and inactive (circles) compounds along the direction of the PLS score vector. The last two numbers from the compound IDs are shown in the plot.

Hi-PLS-1 and Hi-PLS-DA-1 both displayed reasonable statistics, but were difficult to interpret. Hi-PLS-1 consisted of 33 terms (figure 6.8) while Hi-PLS-DA-1 consisted of 42 terms (figure 6.9). The problematic interpretation stems from the fact that each PLS score vector from the same BB set contains all descriptors but with different weights applied to the individual descriptors.

[Figure 6.7 plot: PLS score t[1] by compound ID, with 2 SD and 3 SD limits marked.]

Figure 6.8. Coefficients for luciferase light emission signal inhibition of Hi-PLS-1. The model consists of 19 linear terms, one square term, and 13 interaction terms. Four out of the 19 linear terms (black) were used for interpretation.

Figure 6.9. Coefficients for separation of active and inactive salicylidene acylhydrazides of Hi-PLS-DA-1. The model consisted of 23 linear terms and 19 interaction terms. Of the 23 linear terms, three were used for interpretation (black).

To get interpretable models an additional strategy was employed. The descriptors calculated for the BB sets were grouped based on the chemical features they described, forming groups for such features as size and hydrophobicity. Those descriptors that did not fit into any group were kept separate. PCA models were computed for each group separately and the PCA score vectors combined with the ungrouped descriptors were used as variables in QSAR modeling. The process is summarized in figure 6.10.
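The grouping strategy can be sketched as follows: each descriptor group is summarized by a PCA score vector, and the scores are combined with the ungrouped descriptors into one X-block. This is a minimal illustration under several assumptions: descriptors are autoscaled, only the first score vector of each group is kept (the models described here also use later components), NIPALS is run for a fixed number of iterations, no descriptor column is constant, and the group names and values are made up.

```python
from math import sqrt

def autoscale(columns):
    """Center each descriptor column and scale it to unit sample variance."""
    scaled = []
    for col in columns:
        m = sum(col) / len(col)
        sd = sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
        scaled.append([(v - m) / sd for v in col])
    return scaled

def first_score_vector(columns, n_iter=200):
    """First PCA score vector of one descriptor group, computed with NIPALS."""
    cols = autoscale(columns)
    t = list(cols[0])  # starting guess: the first scaled column
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        p = [sum(a * b for a, b in zip(col, t)) / tt for col in cols]  # loadings
        norm = sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t = [sum(pj * col[i] for pj, col in zip(p, cols)) for i in range(len(t))]
    return t

def build_x_block(groups, ungrouped):
    """groups: group name -> list of descriptor columns;
    ungrouped: descriptor name -> column. Returns the model variables."""
    block = {f"{name}_t1": first_score_vector(cols) for name, cols in groups.items()}
    block.update(ungrouped)
    return block

# Hypothetical descriptors for four building blocks.
groups = {"size": [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]}
ungrouped = {"pKa": [8.1, 7.9, 8.4, 7.5]}
x_block = build_x_block(groups, ungrouped)
print(sorted(x_block))
```

Keeping interpretable descriptors such as pKa outside the groups is what lets individual model terms be read chemically, at the cost of a larger number of variables.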

[Figure 6.8 plot: w*c coefficients of Hi-PLS-1 for inhibition at 50 µM.]

[Figure 6.9 plot: w*c coefficients of Hi-PLS-DA-1 for class 1 (actives) and class 2 (inactives).]

Figure 6.10. Establishment of QSAR models based on grouped variables. The descriptors of each BB set were grouped in six groups for the salicylic aldehydes (X1_1-6) and five for the hydrazides (X2_1-5). The descriptors that did not fit into any group were kept as separate variables (X1_U and X2_U). A PCA model was computed for each group of variables (X1_1-6 and X2_1-5) and the PCA score vectors were extracted and combined with the ungrouped variables (X1_U and X2_U), forming the two X-blocks (X_SAL and X_HYD) used in PLS modeling. X_SAL and X_HYD were combined and expanded followed by PLS regression and variable selection. Y is the reporter-gene signal inhibition at 25 μM compound concentration.

Using the strategy illustrated in figure 6.10 and the same training set as for Hi-PLS-1, Hi-PLS-2 (figure 6.11, R²Y = 0.69, Q² = 0.53) was computed. Hi-PLS-2 gives a better correlation between the experimental and calculated inhibition of the inactive compounds than Hi-PLS-1 (figure 6.6). The middle-active compounds are not accurately calculated, especially in Hi-PLS-2. The compounds span 20% to 70% luciferase signal inhibition, while the calculated values in Hi-PLS-2 range from 30% to 55%. The terms constituting Hi-PLS-2 were readily interpretable, but the large number of terms made it impossible to directly translate the model terms into an optimal T3S inhibitor (figure 6.12).

[Figure 6.10 flowchart. Descriptor groups for the salicylic aldehydes: moment of inertia, atomic partial charges, size descriptors, surface descriptors, charge descriptors, and hydrophobicity descriptors. Descriptor groups for the hydrazides: moment of inertia, size descriptors, surface descriptors, charge descriptors, and hydrophobicity descriptors. Variables that did not fit into any group were kept separately for each BB set.]

Figure 6.11. Experimental versus calculated luciferase signal inhibition plot at 50 μM compound concentration of Hi-PLS-2. The last two numbers from the compound IDs are shown in the plot.

Figure 6.12. Coefficients for luciferase signal inhibition of Hi-PLS-2. The coefficients in black were used for interpretations. The model is highly complex to interpret with its four linear and 19 non-linear terms.

[Figure 6.11 plot (Hi-PLS-2). Axes: experimental and calculated % reporter-gene signal inhibition, both 0 to 80.]

[Figure 6.12 plot: w*c coefficients of Hi-PLS-2 for inhibition at 50 µM. Linear terms: SAL_dipole, SAL_polarizability, SAL_pKa, HYD_KierFlex, HYD_LUMO, HYD_Gap, HYD_polarizability, SAL_shape, SAL_ar_charges_t2, SAL_ar_charges_t3, HYD_surfaces_t2, HYD_hydrophobicity, HYD_charges_t2, HYD_size, HYD_shape, SAL_surfaces_t2, SAL_size. Interaction terms: SAL_polarizability x HYD_shape, SAL_pKa x HYD_KierFlex, SAL_pKa x HYD_LUMO, SAL_pKa x HYD_Gap, SAL_pKa x HYD_polarizability, SAL_pKa x HYD_surfaces_t2, SAL_pKa x HYD_charges_t2, SAL_pKa x HYD_size, HYD_KierFlex x SAL_shape, HYD_KierFlex x SAL_ar_charges_t2, HYD_Gap x SAL_ar_charges_t2, HYD_polarizability x SAL_shape, HYD_polarizability x SAL_ar_charges_t2, SAL_shape x HYD_hydrophobicity, SAL_shape x HYD_charges_t2, SAL_shape x HYD_size, SAL_ar_charges_t2 x HYD_charges_t2, SAL_ar_charges_t2 x HYD_size, SAL_ar_charges_t3 x HYD_hydrophobicity.]
