This dissertation has been submitted by a student. This is not an example of the work written by our professional dissertation writers.

Combinatorial library enumeration and lead hopping using Comparative Interaction Fingerprint Analysis (CoIFA) and classical 2D QSAR methods for seeking novel GABAA α3 modulators

Abstract

Selective modulators of GABAA α3 (Gamma Amino Butyric Acid α3) receptor are known to alleviate the side effects associated with non specific modulators. A follow up study was undertaken on a series of functionally selective phthalazines with an ideological credo of identifying more potent isofunctional chemotypes. A bioisosteric database enumerated using combichem approach endorsed mining in a lead like chemical space. Primary screening of the massive library was undertaken using “Miscreen” toolkit, which uses sophisticated bayesian statistics for calculating bioactivity score. The resulting subset thus obtained was mined using a novel proteo-chemometric method that integrates molecular docking and QSAR formalism termed CoIFA (Comparative Interaction Fingerprint Analysis). CoIFA encodes protein- ligand interaction terms as propensity values based on a statistical inference to construct categorical QSAR models that assist in decision making during virtual screening. In the absence of an experimentally resolved structure of GABAA α3 receptor, standard comparative modeling techniques were employed to construct a homology model of GABAA α3 receptor. A typical docking study was then carried out on the modeled structure and interaction fingerprints generated based on the docked binding mode, were used to derive propensity values for the interacting atom pairs that served as pseudo energy variables to generate CoIFA model. The classification accuracy of the CoIFA model was validated using different metrics derived from a confusion matrix. Further predictive lead mining was carried out using a consensus 2D QSAR approach, which offers a better predictive protocol compared to the arbitrary choice of a single QSAR model. The predictive ability of the generated model was validated using different statistical metrics and similarity based coverage estimation was carried out to define applicability boundaries. Few analogs designed using the concept of bioisosterism were found to be promising and could be considered for synthesis and subsequent screening.

Abbreviations

CoIFA – Comparative Interaction Fingerprint Analysis,

GABAA - Gamma Amino Butyric Acid type

A, HTS – High Throughput Screening,

KNN- K-Nearest Neighbor,

GOLD -Genetic optimization in ligand Docking,

RD-QSAR- Receptor Dependent QSAR.

Introduction

Gamma Amino Butyric Acid type A (GABAA) ionotropic receptor is the major inhibitory neuronal receptor of the mammalian brain conferring fast synaptic inhibition1.The physiological role exerted by GABAA in regulating brain excitability and its pharmacological significance as a valuable drug target for many neuronal disorders has surged an renewed interest for a cohesive understanding of its structure and function. The heterogeneous nature of GABAA receptor and its low abundance (pmol/mg of protein), coupled with the inherent difficulties associated in isolation and purification of integral membrane proteins have precluded structural investigations on GABAA receptor2,3. Purification, cloning and sequencing of the GABAA receptor and its composite subunits have allowed identifying 21 subunits arranged within 8 families (α1-6, β1-4, γ1-4, 1δ, 1ε, 1π, 1θ and ρ1-3)4. Given the heterogeneity of GABAA receptor, the pharmacological significance of identifying subtype selective modulators is increasingly being recognized5,6,7. Of the numerous structural classes of drugs that have been shown to bind to the BZ (benzodiazepine) site of GABAA receptor, relatively few display subtype selectivity and hence associated with side effects such as amnesia, tolerance, dependence and alcohol potentiation 5,6,7. Recent investigations have revealed that ligands displaying functional selectivity for GABAA α2 and α3 receptor subtypes act as non-sedating anxiolytics in animal models5,6,7. A drug discovery project conducted by Merck research laboratories, identified a series of Phthalazines that exhibit functional selectivity towards GABAA α38,9. Starting with the available information in hand, further exploration was undertaken in pursuit of identifying additional leads. A targeted library of privileged molecules was enumerated using combinatorial approach. Intelligent enumeration involves the rational choice of building blocks. Bioisosteric building blocks that are ostensibly known to yield molecular entities imparting similar biological properties were considered as ideal replacements for the marked Markush fragments. Such bioisosteric replacements have proven successful particularly for the anxiolytic drug segment, where the methylated carboxamide function of diazepam when replaced by its bioequivalent methylated triazole group lead to the identification of alprazolam10. In the present study four classes of bioisosters were identified and a virtual combinatorial library was enumerated by exploring all possible combinations on the mono-dentate attachment point marked on the scaffold. Thus, the principle of analog design, widely exploited in medicinal chemistry was used to create a targeted library.

High throughput virtual activity profiling was carried out using Molinspiration “Miscreen” toolkit11, which prioritizes molecules in the databases using a supervised Bayesian statistical method. Further lead mining of the combinatorial subset was carried out using a novel categorical receptor dependent 3D-QSAR strategy termed as CoIFA (Comparative Interaction Fingerprint Analysis), a variant of other ligand-receptor based QSAR methods like COMBINE12(Comparative Binding Energy), AFMoC13 (Adaptation of Fields for Molecular Comparison) and CoRIA14 (Comparative Residue Interaction Analysis).

Underlying the CoIFA methodology is the generation of an interaction fingerprinting scheme that encodes the non-bonded interactions between ligand and protein based on the putative 3D binding modes (docked solutions). Pair wise interaction fingerprints observed in the protein-ligand complex are assigned knowledge based probabilistic contact values which in turn serve as predictor (X) variables for QSAR analysis to yield categorical models that also provides mechanistic insights regarding binding modes.

In the absence of experimentally resolved structure of GABAA receptor, comparative modeling was employed to generate an atomistic model of GABAA α3 receptor. An open state conformation of GABAA α3 was modeled using the standard homology modeling techniques employed by Modeller15 based on the X-ray structure of the molluscan AchBP16, a homologue of GABA.

A typical docking study was carried out on the dataset using GOLD docking suite17. Based on the docked binding mode, probabilistic atom-pairs contact values were generated using the PLIF 18,19 module of MOE20, and a decision tree, was construed using a recursive partitioning algorithm to generate a robust categorical QSAR model21. The classification accuracy of the CoIFA model was validated using the standard parameters inferred from the confusion matrix. CoIFA based QSAR models are categorical, hence we supplemented classical QSAR based methods for predictive lead mining.

Classical type QSAR models were developed using the 2D descriptors available in MOE20 and fitting was carried out using two statistical methods (GFA22, G/PLS23). The developed models were subjected to rigorous validation, with emphasis on model stability and predictivity. Activity forecasting was carried out using a consensus QSAR model, anticipated to provide a better model compared to the arbitrary choice of a single QSAR model24,25. A key component that needs to be evaluated, particularly when extrapolating QSAR models for lead mining is to ensure that the predictions came from the domain upon which the model was calibrated. Hence descriptor based applicability domain estimation was carried out using an integrated approach that incorporates two pattern recognition algorithms, K-Means and KNN to define applicability boundaries based on Euclidean distance measures26.

QSAR paradigm has been of interest in the field of medicinal chemistry generally as a retrospective tool to rationalize the lead optimization process in drug design. As an adaptive response to the changing scenario the present study demonstrates how QSAR techniques can be used as a tool to screen millions or perhaps billions of molecules to identify potentially bioactive molecules.

MATERIALS AND METHODS

All computational studies outlined here were performed using the following software packages. Discovery studio 2.027, GOLD 3.228, BROOD29, ROCS29, EON29 MOE20, Miscreen11 and TSAR 3.030 running on a Pentium core2 Duo workstation using Windows XP operating system. Cerius2 4.1031 software, based studies were carried out on a SGI Fuel workstation running on IRIX 6.5 operating system

Selection and modeling of dataset.

A crucial prerequisite in carrying out a successful QSAR analysis is the selection of an internally consistent data set. Hence biological data acquired by the same group under the same assay method was pooled from literature sources to render them comparable. The skewness in the dataset was removed by converting the reported Ki (nM) values8,9 to PKi using a simple logramethic transformation log (1/Ki). Data set compounds were modeled using the 3D Sketcher module of Cerius2. Initial 3D structure optimization was carried out using CORINA32, further charge assignment and energy optimization was carried out using the COSMIC module of TSAR.

Bioisterism guided Virtual combinatorial library enumeration

Combinatorial chemistry, which orchestrates the creation of large compound libraries, was used to enumerate a targeted library33. Central to the design of a targeted library, is the identification of R groups that are key SAR determinants and the identification of high quality building blocks for intelligent enumeration34. The SAR report functionality of MOE was used to identify key R group motifs18,35 that were structurally transformed using the concept of bioisosterism36,37. Potential isosters for the marked Markush fragments were identified from a biosisosteric database containing fragments of synthetic tractability using Brood software29. Four classes of isosters that matches the query fragment in terms of shape and atom-type, electrostatics, structure and graph were identified. A total of 856 isosteric fragments were identified for R1 and 462 isosters for R2 after carrying out a redundancy check using an SVL script of MOE. QuaSAR-CombiGen module of MOE was then used to enumerate a virtual library of all possible products that could be combinatorially obtained from the set of fragments. The targeted library thus generated can be screened en masse, and yields better enrichment rates than a random library or diverse library38. Markush fragments considered for isosteric replacement are shown in 1 and the core scaffolds used for clipping the fragments are shown in 2.

Bayesian statistics based activity profiling

A targeted library consisting of 790944 molecules was screened using “Miscreen” virtual screening toolkit11, which uses sophisticated Bayesian statistics for screening large compound libraries based on fragment profile comparisons. A bioactivity model was developed by generalizing the occurrence of substructural fragments among a set of active and inactive phthalazines. Construction of the Bayesian model was based on the reported PKi values. Compounds with PKi value < 8 were assigned to be inactive and those with values > 8 were regarded as active compounds. Such a cut off value appears to be a reasonable starting point for hit-to-lead activity.The bioactivity model developed was validated for its enrichment ability and the validated model was subsequently used for virtual activity profiling. Miscreen provides an, “Bioactivity score” for each compound that ranges between -3 to 3 which is calculated as a sum of the activity contributions of the fragments present in the molecules. Screening using “Miscreen” toolkit is very fast (approximately 300'000 molecules in 30 mins on a Pentium core2 Processor)

Comparative Interaction Fingerprint Analysis (CoIFA) – A novel categorical 3D QSAR

With the ever increasing number of experimentally derived structures available across public domains like PDB39 and computational methods like Homology modeling gaining momentum, 3D QSAR methods seldom harness this valuable information. The amount of structural information incorporated in 3D QSAR methodologies has been limited to the alignment stage (Docking based alignment) and for visually interpreting the results of a QSAR model40. Approaches like CoMFA41 use pseudo receptor modelling techniques to build QSAR models and are purely ligand based in nature. Variant approaches like COMBINE12, CoRIA14 and AFMoC13 are termed as Receptor dependent (RD) QSAR approaches, as they explicitly use the structural information of both the ligand and receptor to correlate the observed activity with the selected variables. COMBINE12 and CoRIA14 use molecular mechanics and Poission Boltzman methods for calculating residue wise energy components, which are decomposed and translated in to a regression based QSAR model. AFMoC13 is a reverse approach of CoMFA41 that works by placing a grid into a binding site that maps the pair-potentials between the protein and ligand atoms resulting in “potential fields”. The pair potentials are calculated using the knowledge-based scoring function, DrugScore.

As a valuable extension to the classical 3D-QSAR paradigm, we report herein a novel proteo-chemometric based QSAR formalism termed CoIFA (Comparative Interaction Fingerprint Analysis). CoIFA method integrates molecular docking and QSAR formalism for deriving a categorical QSAR model. Derivation of a CoIFA model essentially involves four steps that share analogy with other 3D QSAR approaches.

(i) Receptor guided alignment based on docking.

(ii) Summarizing the interactions in the protein – ligand complex using knowledge based probabilistic contact values in lieu of probes that employ simple force field based approaches to calculate energy values on a Grid/Surface.

(iii) Application of classification algorithms for the development of a reliable decision rule for automated classification (active/inactive).

(iv) Validation of the QSAR model for its classification accuracy using different statistical metrics.

CoIFA QSAR modeling is a fundamental shift from other RD-QSAR methods. Being categorical in nature, it has the advantage of modeling datasets with biological endpoints (Ki, IC50, Kd) expressed as discrete and continuous data. The core strengths of CoIFA models could also be attributed to the nonlinear modeling technique (Recursive Partitioning Method) that relies on generalizations based on weighing a set of active and inactive molecules.

Being a categorical QSAR approach that relies on non linear modeling technique the error rate would be significantly less, since no attempt is made to minimize the squared error between the model and the observed data as in other regression based fitting methods. Moreover Decision tree methods (Recursive Partitioning Methods) are also known to be tolerant to noisy experimental data and the model building process is also very fast in contrast to neural networks methods that require a lengthy training phase. When one considers the merits, it is also prudent to mention the caveats of CoIFA based approach. Though CoIFA approach incorporates protein-ligand cross terms based on statistical potentials that are relatively faster, the solvent and entropy terms are overlooked in CoIFA approach. Another concern with the use of statistical potentials is that CoIFA approach would not model the interactions correctly when the atom pair interaction in question doesn't fall in to the class upon which it was trained42. The list of the primary atom types considered to derive the statistical potential is available via reference43. Further it should be noted that geometry based evaluation of non-bonded interactions are loose approximations because it is ambiguous to differentiate the interaction between two given hydrophobic groups in close proximity as attractive or repulsive without the incorporation of force field based energy terms44. Another obvious concern for all RD QSAR approaches is the unavailability of 3D structure of the receptor and also a sizeable amount of active and inactive compounds. Inactive compounds are generally not available through literature sources and in such circumstances presumed inactives could be collected from publicly available datasets like the DUD45 decoy set.

In the absence of the crystal structure of GABAA α3 receptor, homology modeling was carried out to model the pentameric structure of GABAA α3 receptor. Amino acid sequence of human GABAA alpha3, Beta2 and Gamma2 were retrieved from Swiss Prot46 (P34903, P47870 and P18507). The signaling peptide region at the N terminal was truncated and the mature protein sequence was used for modeling. All subsequent amino acid numbering is based on the mature sequence without the signal peptide. The crystal structure of the molluscan AchBP PDB id (1I9B)16 was identified as the modeling template based on a BLAST47 search carried out against the PDB (alpha3 30%, Beta2 and Gamma2 27% ). FUGUE48 based alignment that employs a sequence – structure based alignment method and particularly suited in case of low sequence identity was employed for alignment. Standard modeling techniques were carried out using the Modeller module of Discovery studio software suite to assign atomic coordinates to structurally conserved regions (SCR), building intervening loops, optimizing the rotamers of amino acid side chains, and to perform an initial energy optimization of the structure. Three dimensional structure of GABAAα3 was modeled in the presence of the crystal coordinates of HEPES (N-2-hydroxyethylpiperazine-N'-2-ethanesulphonic acid), a buffer molecule that occupies the Benzodiazepine site. This was made possible using the “copy ligand” option of Modeller. The best ranked model based on probability density function (PDF) violations was chosen and further evaluated using the routine protein structure validation tools.

To simulate the binding mode of the flexible ligands to the rigid homology modeled receptor, molecular docking was carried out using GOLD, docking program. GOLD uses an evolutionary Genetic Algorithm (GA)49 for generating ligand conformations and also imparts partial receptor flexibility in an implicit manner by allowing movement around the side chain dihedrals of protein OH and NH3+. The binding site was defined using the coordinates of HEPES (residues within 10Å from the ligand). The 'Standard default mode' which comprises of 100000 genetic operations on an initial population of 100 members divided into five subpopulations (Number of islands = 5) was used. The annealing parameters for the fitness function were set at, 4.0 for van der Waals and 2.5 for hydrogen bonding. A niche size of 2, and a selection pressure of 1.1 were used and the early termination option was turned off. The top ranking pose was retained based on Gold fitness function which served as our presumed bioactive conformation for alignment.

The applications of geometry-based methods in developing knowledge-based scoring function (PMF50,BLEEP51,DrugScore52) and the success of SIFt53 as a post docking tool prompted us to put in place a proteo-chemometric based QSAR methodology. The steric and electrostatic energy terms commonly calculated using a probe atom on a series of grid points surrounding the aligned molecules in 3D space, has been replaced with more realistic protein – ligand atom pair contact potentials. CoIFA descriptors depict the physical interactions within the protein-ligand system based on atom pair contact propensity values. These contact propensity values are knowledge based values estimated by statistical means from a collection of structures available in the Protein Data Bank, which serves as a knowledge base43. Descriptor values calculated using the PLIF module of MOE were subjected to a categorical QSAR study using the Decision tree method21 implemented in the QuaSAR-Classify module of MOE package. The tree classification method employs a recursive partitioning algorithm that consists of two parts: tree growing and tree pruning. The process starts with a training set consisting of pre-classified records. Tree growing begins with the root node, which is the entire learning data and that root node is set as the current node. The partitioning of the root node in to child nodes is carried out according to the rules of the form x < = c (if x is a continuous variable) or x = c (if it is a categorical data), where c is the best value for splitting the node. Splitting is based on the Gini index of diversity

--------------- (1)

where, are the fractions of compounds of each class I (I =1,...,K) in node t.

The goodness of a split is measured by a change in value of C termed as the change of impurity

---------- (2)

Where, PL and PR are the proportions of cases going to the left tL and right tR child nodes. Node splitting stops when the number of compounds in the child node is lower than a predefined threshold or if the distribution of the compounds in the child node becomes homogeneous. The misclassification rate in node t is calculated as, where is the number of compounds of class j, and is the total number of compounds in the node. The total misclassification rate is measured as:

------------- (3)

where, and are the total number of misclassified compounds and the total number of compounds in the data set, respectively.

The initial tree obtained from the training data is generally too big and prone to noise (equivalent to an over parameterized model). Hence high correct classification rates (CCR) are obtained for the training set and usually performs modest for the test set. Pruning solves this issue by removing smaller branches that fail to generalize and the sub tree subsequently identified is referred to as the best sub tree. In the present study cross validation was employed for tree pruning. A modified tree misclassification rate, where L (T) is the number of leaves in the tree, and a > 0 is a parameter that provides a solution for identifying nodes that needs to be pruned. According to this equation, the size of the tree and the misclassification rate are balanced. The classification accuracy of the model was validated using different metrics derived from the confusion matrix. It should be noted that CoIFA model like all other QSAR models is intrinsically biased toward the scaffolds used for training, hence extrapolations should be carried out only within the domain it was trained upon. When embarking lead hopping, it is always advisable to carry out a diversity analysis of the library using some similarity metrics to filter out potential outliers (structurally diversified compounds).

Predictive data mining using Consensus 2D QSAR

CoIFA models are well suited for HTS campaigns, to decide which compound should be passed to the next level of screening. CoIFA models offer a two state prediction level (Active /Inactive) and cannot be a substitute to classical QSAR modeling per se. To augment our virtual screening workflow, we carried out classical QSAR studies to transform the qualitative belief inferred form CoIFA models into a quantitative measure for forecasting activity. Predictive lead mining was carried out using 2D QSAR models of proven statistical quality with extrapolations restricted to the applicability domain. Development of a predictive 2D QSAR model relies on a multitude of factors and one such important prerequisite is the rational selection of training and a test set. A non-hierarchical method called K-means clustering was performed using SPSS54 for the rational selection of a training set and a test set55. Manual selection of compounds was done from each cluster so as to ensure a heterogeneous training set and a test set, homogenous with the training set in terms of activity range and “structural diversity”. K-Means clustering and QSAR model development were carried out using the 2D descriptors (physicochemical, structural and topological) available in MOE. Details of the clusters are provided in supplementary information (SI Table 1). The descriptors used in the study represents physical properties, subdivided surface areas, atom counts and bond counts, Kier and Hall connectivity and kappa shape indices, adjacency and distance matrix descriptors and partial charge descriptors. Two chemometric methods namely GA and G/PLS methods implemented in Cerius2 was employed for variable selection and fitting. Genetic algorithm based variable selection is a stochastic process that mimics biological evolution by undertaking a survival of the fittest approach and is considered superior to models developed using stepwise regression technique. Initial variable reduction was carried out by considering only 40% of those information rich descriptors with highest variance in order to prevent over exploitation of independent variables (X) in fitting the dependent variable (Y). GFA22, which uses a combination of Holland's genetic algorithm for variable selection and Friedman's multivariate adaptive regression splines (MARS) algorithm to build regression models, was employed for model building. Population of equation chosen at random was evaluated using a “fitness function” or Lack of Fit (LOF) that reflects the quality of the model. This cycle is then repeated for a specified number of iterations until a final population (of descriptors) that is better suited to model the endpoint is obtained. The error measurement term LOF is determined by the following equation:

------------- (4)

Where, LSE is the least-squares error, c the number of basis functions in the model, d the smoothing parameter, p the number of descriptors and m is the number of observations in the training set.

A variable usage graph displayed during each GFA run was used to monitor premature convergence of GA. Another statistical method termed G/PLS which combines the best features of both GFA and PLS was also employed. G/PLS has PLS applied to it instead of multiple linear regressions as employed by GFA, hence each model can have more terms in it without the danger of over-fitting, and is particularly valuable in cases when the number of descriptors is more than the number of samples. G/PLS23 retains the ease of interpretation by back transforming the PLS components to the original variables. The robustness of the developed 2D QSAR were assessed for four important qualities namely goodness of fit, model stability, predictive ability and domain applicability. Goodness of fit was adjudged using square correlation coefficient (R2) and explained variance (R2adj). Internal predictive ability of the generated model was evaluated using LOO based q2 and Bootstrap R2. The external predictive ability was validated using a reserved test set. Model stability and risk of chance correlation were evaluated using the Y- Randomization procedure.

Table 1: Structural features, with observed and predicted pKi values along with their substitutions (R1 and R2) .The core scaffolds A and B are shown in 2.

Name

Actual

Pki

(nM)

GPLS

GFA

C

QSAR

R1

R2

Core Scaffold

1

7.39

7.29

6.81

7.05

Bis-(2- methoxy-ethyl)-amine yl

4-Methoxy Phenyl

A

2

6.77

6.83

6.61

6.72

Phenylmethoxy

4-Methoxy Phenyl

A

3

6.92

7.42

7.57

7.49

Methoxy

Phenyl

A

4

7.39

7.09

6.97

7.03

Phenylmethoxy

Phenyl

A

5

6.14

6.69

6.48

6.58

Phenylethoxy

Phenyl

A

6

6.38

6.50

6.21

6.35

Phenylpropoxy

Phenyl

A

7

6.59

6.82

6.62

6.72

2-Chloro Phenylmethoxy

Phenyl

A

8

7.18

7.25

7.01

7.13

3 -Methoxy-Phenylmethoxy

Phenyl

A

9

6.77

6.61

7.08

6.84

3 -Cyano-Phenylmethoxy

Phenyl

A

10

6.52

6.79

6.73

6.76

Thiophene-2-yl-methoxy

Phenyl

A

11

6.62

6.90

6.84

6.87

Thiophene-3-yl-methoxy

Phenyl

A

12

6.74

7.10

6.85

6.98

Furan-2-yl-methoxy

Phenyl

A

13

7.68

7.60

7.20

7.40

Furan-3-yl-methoxy

Phenyl

A

14

8.11

7.91

8.20

8.05

Pyridine -4-yl-methoxy

Phenyl

A

15

9.15

8.58

8.64

8.61

Pyrazine -2-yl - methoxy

Phenyl

A

16

9.66

9.31

9.10

9.21

(1-Methyl-1H-imidazol-2-yl)-methoxy

Phenyl

A

17

7.43

7.23

7.63

7.43

Piperidine-4-yl-methoxy

Phenyl

A

18

7.77

7.75

7.94

7.85

Pyridine -2-yl-ethoxy

Phenyl

A

19

8.92

7.94

8.31

8.13

6-Methyl-Pyridine -2-yl-methoxy

Phenyl

A

20

8.37

8.37

8.57

8.47

5-Methyl-Pyridine -2-yl-methoxy

Phenyl

A

21

8.00

7.85

8.12

7.99

4-Methyl-Pyridine -2-yl-methoxy

Phenyl

A

22

9.00

8.36

8.57

8.46

3-Methyl-Pyridine -2-yl-methoxy

Phenyl

A

23

7.42

7.44

7.47

7.46

Pyridine -2-yl-methoxy

3-Pyridyl

B

24

6.82

7.00

7.15

7.08

Pyridine -2-yl-methoxy

4-pyridyl

B

25

7.42

7.26

7.40

7.33

Pyridine -2-yl-methoxy

3-thienyl

B

26

7.74

7.33

7.59

7.46

Pyridine -2-yl-methoxy

cyclopropyl

B

27

7.28

7.50

7.36

7.43

Pyridine -2-yl-methoxy

3-Methoxy phenyl

B

28

7.49

8.15

8.03

8.09

Pyridine -2-yl-methoxy

4-Methyl phenyl

B

29

8.21

7.70

7.68

7.69

Pyridine -2-yl-methoxy

2-FluroPhenyl

B

30

7.24

7.29

7.28

7.28

Pyridine -2-yl-methoxy

4-FluroPhenyl

B

31

7.60

8.07

8.03

8.05

Pyridine -3-yl-methoxy

Phenyl

B

32

7.89

7.86

7.83

7.85

Pyridine -4-yl-methoxy

Phenyl

B

33

7.74

8.11

8.00

8.05

4-methyl1-Pyridine -2yl methoxy

Phenyl

B

34

7.82

8.33

8.20

8.26

5-methyl1-Pyridine -2yl methoxy

Phenyl

B

35

7.70

8.04

8.03

8.04

6-methyl1-Pyridine -2yl methoxy

Phenyl

B

36

8.62

8.06

8.30

8.18

3-Propyl1-Pyridine -2yl methoxy

Phenyl

B

37

8.44

8.57

8.58

8.57

3,6-di methyl1-Pyridine -2yl methoxy

Phenyl

B

38

9.15

8.75

8.74

8.75

3,5-di methyl1-Pyridine -2yl methoxy

Phenyl

B

39

8.77

8.53

8.54

8.54

3,4-di methyl1-Pyridine -2yl methoxy

Phenyl

B

40

8.77

8.09

8.09

8.09

3,Cyano 1-Pyridine -2yl methoxy

Phenyl

B

41

8.15

7.56

7.46

7.51

3-MeO2C-2-pyridyl

Phenyl

B

42

6.74

7.45

7.21

7.33

3,Hydroxyl-1-Pyridine -2yl methoxy

Phenyl

B

43

8.96

8.68

8.54

8.61

3, ethoxy-1-Pyridine -2yl methoxy

Phenyl

B

44

8.41

9.04

8.81

8.93

3,Methoxyethoxy-2-pyridyl methoxy

Phenyl

B

45

8.64

8.32

8.25

8.28

3,Cyclopropyl-methoxy-2-pyridinyl methoxy

Phenyl

B

46

7.42

7.31

7.55

7.43

3, Cyclobutyloxy-2-pyridinyl methoxy

Phenyl

B

47

7.70

7.87

8.03

7.95

3, Benzyloxy-2-pyridinyl methoxy

Phenyl

B

48

8.07

7.75

7.87

7.81

pyridazin-3-yl methoxy

Phenyl

B

49

7.48

7.56

7.55

7.56

pyrimidin-2-yl methoxy

Phenyl

B

50

8.51

8.19

8.07

8.13

Pyrimidine 6 yl methoxy

Phenyl

B

51

8.52

7.82

7.70

7.76

Pyrazine 6 yl methoxy

Phenyl

B

52

8.00

8.00

8.03

8.02

6-Methyl-pyridazin-3-yl methoxy

Phenyl

B

53

7.11

6.82

7.12

6.97

(1,4 dihydro -quinoline-4-yl)-methoxy

Phenyl

B

54

7.59

8.22

8.24

8.23

(1,2dihydro -quinoxalin-2-yl)-methoxy

Phenyl

B

55

6.70

7.63

7.58

7.60

2-pyridylethoxy

Phenyl

B

56

7.28

7.23

7.50

7.37

2-Cyano-benzyloxy

Phenyl

B

57

7.18

7.03

7.30

7.17

3-Cyano-benzyloxy

Phenyl

B

58

7.37

7.83

7.78

7.81

3-Hydroxymethyl-benzyloxy

Phenyl

B

59

7.62

7.60

7.28

7.44

3 - N.N dimethyl-aminomethyl

benzyloxy

Phenyl

B

60

7.51

6.87

7.17

7.02

4-Cyano phenyl methoxy

Phenyl

B

61

6.34

6.78

6.32

6.55

1-Methyl-piperidin-3-ylmethoxy

Phenyl

B

62

6.59

6.61

6.65

6.63

5-oxo-2-pyrrolidinemethoxy

Phenyl

B

63

7.05

6.91

7.05

6.98

Me2NCOCH2O

Phenyl

B

64

7.30

6.79

6.76

6.77

1-pyrrolidine-COCH2O

Phenyl

B

65

7.39

7.38

7.88

7.63

BnMeNCOCH2O

Phenyl

B

66

6.77

6.98

7.09

7.04

Hydroxypropoxy

Phenyl

B

67

7.72

8.06

8.07

8.06

pyrazole-2-yl methoxy

Phenyl

B

68

8.64

8.55

8.39

8.47

3,5 dimethyl pyrazole 2-yl methoxy

Phenyl

B

69

8.17

7.91

7.87

7.89

Imidazole-2-yl methoxy

Phenyl

B

70

8.11

8.16

8.04

8.10

5-Methyl Imidazole-2-yl methoxy

Phenyl

B

71

9.00

8.74

8.70

8.72

3-Ethyl Imidazole-2-yl methoxy

Phenyl

B

72

8.70

8.94

9.29

9.11

3-Benzyl Imidazole-2-yl methoxy

Phenyl

B

73

7.74

8.47

8.44

8.45

5-methyl-isoxazole 3yl methoxy

Phenyl

B

74

7.72

7.70

7.80

7.75

thiazol 2yl methoxy

Phenyl

B

75

7.52

7.95

7.96

7.96

4-Methyl thiazol 2yl methoxy

Phenyl

B

76

7.62

8.02

7.96

7.99

5-Methyl thiazol 2yl methoxy

Phenyl

B

77

7.92

7.52

7.66

7.59

5-Methyl thiazol 4yl methoxy

Phenyl

B

78

6.60

6.63

6.78

6.71

Benzthiazol 2yl methoxy

Phenyl

B

79

7.00

7.37

7.00

7.18

3-Methyl Isothiazol 5yl methoxy

Phenyl

B

80

7.29

6.97

6.99

6.98

3-Methyl-1,2,4-oxadiazol-5-yl methoxy

Phenyl

B

81

7.18

7.50

7.30

7.40

5-Methyl-1,2,4-oxadiazol-3-yl methoxy

Phenyl

B

82

7.72

7.10

7.36

7.23

1 H - [1,2,4] - triazol-3-yl methoxy

Phenyl

B

83

8.10

8.46

8.22

8.34

1-Methyl-1,2,4-triazol-3-yl methoxy

Phenyl

B

84

7.40

7.04

7.40

7.22

1-Propyl-1,2,4-triazol-3-yl methoxy

Phenyl

B

85

8.85

8.80

8.56

8.68

2-Propyl-1,2,4-triazol-3-yl methoxy

Phenyl

B

86

7.68

7.73

7.79

7.76

1-Methyl-tetrazol-5-yl methoxy

Phenyl

B

87*

6.92

7.47

7.61

7.54

2-pyridyloxymethyl

Phenyl

B

88*

6.77

6.28

6.33

6.30

2-pyridyl-Amide

Phenyl

B

89*

7.15

7.04

7.30

7.17

4-Cyano-benzyloxy

Phenyl

B

90*

8.00

7.17

7.49

7.33

thiazol 4yl methoxy

Phenyl

B

91*

8.55

8.46

8.22

8.34

2-Methyl-1,2,4-triazol-3-yl methoxy

Phenyl

B

92*

8.37

8.70

8.74

8.72

3-Methyl Imidazole-2-yl methoxy

Phenyl

B

93*

7.36

7.00

7.01

7.01

4-Methoxy-benzyloxy

Phenyl

B

94*

7.64

7.39

7.66

7.53

3-Chloro Pyridazine 6 yl methoxy

Phenyl

B

95*

8.62

8.33

8.16

8.25

3,Methoxy-1-Pyridine -2yl methoxy

Phenyl

B

96*

6.60

6.62

6.42

6.52

4-Chloro Phenylmethoxy

Phenyl

A

97*

8.68

8.32

8.20

8.26

3-methyl1-Pyridine -2yl methoxy

Phenyl

B

98*

6.72

7.26

7.01

7.13

4 -Methoxy-Phenylmethoxy

Phenyl

A

99*

7.72

8.12

8.40

8.26

Pyridine -3-yl-methoxy

Phenyl

A

100*

8.89

8.23

8.44

8.34

Pyrimidine -4-yl-methoxy

Phenyl

A

101*

7.89

7.79

7.87

7.83

Pyridine -2-yl-methoxy

Phenyl

B

102*

7.28

6.96

6.79

6.88

Pyridine -2-yl-methoxy

2-furyl

B

103*

7.54

7.91

7.76

7.84

Pyridine -2-yl-methoxy

3-Methoxy phenyl

B

Compounds marked as * indicates test set

A serious concern about the use of QSAR models for activity forecasting is the reliability of the predictions. Hence predictions were carried out within the defined applicability domain, because interpolation is in general more reliable than extrapolation. Applicability Domain estimation was performed using our in house programme. Our algorithm employs a three-stage approach for defining AD. In the first stage K– Means algorithm is employed for clustering the training set compounds based on 2D descriptor space. The identified clusters represent different “structural classes”. In the second stage, assignment of structural classes for the test set compound and the designed isosteric analogues were performed using the K-NN (K-nearest neighbor approach). In the present study Euclidean distances were calculated for 3 of it's nearest neighbors to determine the degree of structural similarity between the molecules in question and the molecules associated within the structural classes. In the final stage an inclusion and exclusion criteria was defined for the compounds in question based on DT values calculated from a Euclidean distance based similarity matrix computed for each structural class. DT values were calculated using the formula shown in equation 5

DT = Y + σ Z ------------ (5)

Where, Y is the average Euclidean distances between the compounds in question and all members belonging to that cluster. σ is the standard deviation and Z , arbitrary parameter to control the significance level was kept a 0.5 which formally places the allowed distance at on half of the standard deviation26. DT calculations were performed using an in house program written using C programming language.

RESULTS AND DISCUSSION

Combinatorial library generation and virtual activity profiling using Miscreen

The Bayesian bioactivity model generated using Miscreen was validated beforehand for its enrichment ability using a Perl script before undertaking an en masse virtual screening of our targeted combinatorial library consisting of 790944 molecules. A 4 fold cross-validation process was performed by randomly dividing the original data set into N subsamples. Of the N subsamples, a subset was retained as a validation set for testing the predictive ability of the model and the remaining N – 1 sub-samples are used as training set. The cross-validation process was repeated 4 times, and a ROC (receiver operating characteristic) curve that reports the discriminatory ability of the model at various thresholds by plotting true positive rate (TPR, Y axis) versus false positive rate (FPR, X axis) was carried out. The overall performance of the model was judged based on the area under the curve (AUC) values. Average AUC value of 0.778 (of 4 cross validation runs) was obtained and the enrichment ability for the developed bioactivity model is shown in 3. Compounds with bioactivity score > 0.3 were retained for further analysis. Though the current Bayesian model was applied as virtual screening tools, it also identified the essential sub-structural features that discriminates active and inactive molecules. Some lists of sub-structural features responsible for activity are provided in supplementary information (SI 1), and it could serve as valuable information for chemists/computational chemists during a hit-to-lead or lead optimization endeavour.

Comparative Interaction Fingerprint Analysis (CoIFA) – A Receptor dependent 3D QSAR Approach

To carry out CoIFA, the prerequisite receptor structure was modeled using Modeller. Fugue alignment was carried out to align the template sequence with the target sequence. The obtained alignment is shown in 4 and the degree of conservity corresponds to the intensity of the colour coding.

The best model was determined based on the lowest value of the Modeller objective function. The modeled structure was evaluated for stereo chemical accuracy; fold reliability and packing quality using Procheck56 and whatif57. Profile- 3D58 overall self-compatibility scores for the models were much higher than the lowest possible scores conceivable for such a model. No major misfolded regions were present near the active site as evident for Profile- 3D (SI 2) and a residue-by-residue energy assessment carried out using DOPE59 (Discrete Optimized Protein Energy) score. Ramachandran plot60 (SI 3) of the Φ-Ψ angle distribution showed that only 1.3 % of the residues fall in the disallowed region. The backbone Cα trace of the modeled GABAA receptor model and its template AChBP overlapped well (RMSD 0.66 Å).

Table 2: Model validation parameters obtained for the Homology Model

Validation Parameter

Values

PDF Total energya

7493.97

Φ – ψ violationsb

1.3%

Profile-3Dc

357.49

DOPE Scored

-76717

aA pseudo energy function obtained from Modeller . b Percentage of residues lying in the disallowed region. cThe minimum and maximum possible scores for a model are 207.375 and 460.833, respectively. d A statistical potential function for predicting errors in modeled proteins

The modeled structure was consistent with the overall topology as evident in LGIC (Ligand Gated ion Channels). The modeled extra cellular ligand binding domain displays a β rich domain forming a sandwich like architecture. The luminal (inner) and abluminal (outer) β sheets are connected by the signature disulfide bridge. The helical pore-forming domain's architecture was also consistent with other experimentally resolved LGIC (Ligand Gated Ion Channels). Hydrogens were added to the modeled protein using the Protonate 3D module of MOE.

Ionization states for the titratable groups were assigned assuming a pH of 7.2. The added hydrogen's were minimized using CHARMM force field61 keeping the rest of the system static to relive steric clash and improve hydrogen bonding geometry. A total of 133 ligands were collected and docked to the benzodiazepine active site using GOLD. Compounds with reported pKi > 7 were considered as active (1) and those with pKi < 7 were considered as inactive (0). The interaction finger prints generated using the PLIF module of MOE is shown in 6.

All the interacting residues were treated equally (given equal weight age). Six types of interactions were defined namely side chain hydrogen bonds (donor or acceptor), backbone hydrogen bonds (donor or acceptor), ionic interactions, and surface interactions. The generated interaction fingerprints were assigned probabilistic contact values based on MOE methodology. Details of probabilistic contact value calculations are covered elsewhere and not elaborated for reasons of brevity43. These values obtained using MOE served as pseudo energy values to construct a categorical 3D QSAR model using a recursive partitioning algorithm (Decision tree). Irrespective of the nature of the modeling technique employed, validation and predictivity are mandatory for any QSAR model.

The performance of the classification method was evaluated based on the obtained confusion matrix (Table 3). Different metrics namely Classification Error (E), precision (P), recall (R) values enrichment factor (EF) and Matthews correlation coefficient (MCC) were used as a measure of its performance. The formulas used for calculating the respective metrics are stated below.

Table 3: Confusion matrix obtained for CoIFA QSAR model

Predicted

Observed

Active

Inactive

Total

Active(72Train18 Test)

TP

(61Train16 Test)

FP

(8 Train 2 Test )

TP+FP

(69Train 18Test )

Inactive

(36Train 7 Test)

FN

(11Train 2 Test )

TN

(28 Train 5 Test )

FN+TN

(39 Train 7 Test)

Total(108 Train25 Test )

Nact = TP+FN

(72Train18 Test )

N inact = FP+TN

(36Train 7 Test )

N= Nact+ NInact

( 108 Train 25 Test )

--------------(6)

-------------------------(7)

--------------------------(8)

-----(9)

-----(10)

Where, the classification error (E) provides an overall error estimate, whereas recall R measures the percentage of correct predictions, and precision (P) gives the percentage of observed positives that are correctly predicted. The enrichment factor (EF) describes to what degree true positives are over-represented in the selected data set compared to the whole set of compounds. The obtained CoIFA model had an overall error estimate of 17.59%, recall ability of 84.73%, precision value of 88.4% and an enrichement value of 131.94% for the training set. Matthews correlation coefficient (MCC), regarded as one of the best measures to quantify the correlation coefficient between the observed and predicted binary classification was used to evaluate the obtained model. MCC values of 0.61 for the training set and 0.60 for the test set signify the predictive ability of the classification approach.

A modified Correct Classification Rate (CCR) metric, as described by Weston et al62, was used to prevent artefacts that occur when the numbers of compounds belonging to different classes are significantly disproportionate. Accordingly the CCR formula suited for an unbalanced classification problem is

---------- (11)

In the above expression NAct and NInact signifies the total number of actives and inactives, whereas tp signifies the number of compounds predicted as active and tn the number of compounds predicted as inactive. The CoIFA model obtained had a correct classification rate of 0.812 for the training set. For rigorous validation, an external test set of 25 compounds (18 Actives and 7 Inactives) that was not used during the model development was considered for validation. Classification accuracy of 0.80 for the test set reveals that CoIFA model derived based on the interaction patterns, shows good discriminative ability in distinguishing binders and non binders. This clearly suggests the existence of an underlying classification pattern based on the interaction patterns. Such a type of study could also shed information about the hotspots in the protein active site. The encouraging results obtained from our study assures that even homology modeled structures derived for structures with reasonable homology also proves to be a valuable tool for mining chemical database on the basis of interaction patterns and would be a value added approach to post process docked outputs in light of the mediocrity of the current scoring functions. Interaction based filtering has proven to be a successful tool for post filtering docked outputs as evident from many studies53.

Classical 2D QSAR based predictive data mining

Predictive QSAR models developed using two statistical techniques (GFA and GPLS), were in turn used to build a consensus QSAR prediction strategy. The results obtained for the best GFA and GPLS models obtained are discussed below in detail. GFA model was obtained on a population size of 100 with 25,000 generations. The mutation probabilities were kept at system default. The G/PLS model derived was optimal with 4 components as determined by q2 (Cross- validated square correlation coefficient). The best model obtained for GFA an explained variance of 72.1% and a predicted variance of 67.1%.

The best QSAR equation obtained for GFA model is shown in equation 12

pKi = 8.14817 - 0.06079 * "PEOE_VSA_PPOS" + 0.016144 * "SlogP_VSA9" - 0.017353 * "PEOE_VSA-0" + 0.000235 * "weinerPath" + 0.010737 * "PEOE_VSA_POS" - 0.056294 * "PEOE_VSA+3" - 1.00127 * <-6.82572 - "logS"> - 0.016909 * <"PEOE_VSA+0" - 85.5211> ------------ (12)

N Training = 86, N Test = 17, Optimal Number of components (ONC) = 4, R2 =0.742,

Adjusted R2 (Radj2) = 0.721, Bs R2 =0.748, R2 cv (q2) =0.687, Randomized R2=0.383,

PRESS = 8.764, R2Pred = 0.671, LOF = 0.234, F test = 28.171, (R2 – R2O) / R2= 0, k=1.

Regression using spline terms allows the incorporation of non linear modeling. The appearance of log S in the equation as a spline term with a negative regression coefficient suggests that if the value of log s is more than -6.82572, then it shows a negative contribution toward the activity. The appearance of PEOE_VSA_PPOS in the equation with coefficients of opposite sign suggest that there may be a parabolic relationship between the activity and the parameter PEOE_VSA_PPOS. Though a bilinear modeling incorporating squared terms could have shed more light on the effect of this variable on activity, we confine to the present model as it exhibited good predicivity and stability required to address the goal of the current study. To vindicate this speculation, the G/PLS model as shown in equation 13 reveals the appearance of the parameter PEOE_VSA_PPOS as a spline term suggesting that a value less than 23.3534 then it contributes positively for activity. GFA model had 3 compounds (55,44,19) and GPLS model had four compounds (55,3,51,41) with residual values exceeding twice the standard deviation. These compounds were considered as outliers, of which 55 appeared as common outlier in both models. Removal of outliers from QSAR models remains a contentious issue as is often, termed as method of polishing R2 value, unless obvious evidence supports the uniqueness of the outlier compound63. Activity, forecasting was carried out using predictive QSAR models inclusive of outliers in light its good statistical quality index, and the relatively low number of outliers evident form the models. The best G/PLS model obtained in terms of statistical significance is expressed in equation 13. G/PLS model obtained had an explained variance of 76.1% and a predicted variance of 67.9%.The best model was obtained on a population size of 100 with 25,000 generations with optimal number of components 3. The mutation probabilities were kept at system default.

pKi = 8.50466 + 0.230131 * <”Kier1” – 20.5959> + 0.014112 * “SlogP_VSA9” - 0.964241 * <-6.87042 – "logS"> - 0.01636 * “PEOE_VSA-0” - 0.136879 * <33.9457 – “vsa_pol”> + 0.071594 * <23.3534 – “PEOE_VSA_PPOS”> - 0.046163 * “PEOE_VSA+3” --------- (13)

N Training = 86, N Test = 17, Optimal Number of components (ONC) = 3, R2 =0.770,

Adjusted R2 (Radj2) = 0.761, Bs R2 =0.754, R2 cv (q2) =0.692, Randomized R2=0.362,

, PRESS = 8.764, R2Pred = 0.679, (R2 – R2O) / R2= 0, k=1.

A short explanation of the descriptors that appeared in the final model is provided in supplementary information (SI table 2). One important consideration in QSAR studies is the “interpretability” of the model, to provide structural insights as to what modifications of the existing compounds might be promising. Since an earlier 3D QSAR study by us55 has dealt upon those aspects, and the present focus being on activity forecasting, emphasis was stressed on the predictive ability of the models. A regress diagnostic R2 carried out to measure the goodness of interpolation of GFA and G/PLS models were statistically high (0.742 and 0.77). Since individual QSAR model may overemphasize some descriptors and may overlook some important features completely, it seems reasonable to anticipate that a consensus QSAR model, derived from averages of predictions from individual models, may provide better statistical fit and predictive ability as compared to individual models. Hence a C-QSAR model was derived as a non weighted average prediction of GFA and G/PLS. Apparently the results obtained from our C-QSAR model was superior that the individual models with an R2 value of 0.77 and a predicted variance of 69.4%

Since the value of R2 tends to be inflated as the number of terms in the equation increases

R2 Adj was also calculated for expository reasons. The near value of R2 and R2 Adj for both the models ensures the absence of over fitting. Internal predictivity of the models were validated using two resampling techniques namely leave –one –out (LOO) Cross Validation and Bootstrapping. Obtained R2 cv (q2) values of 0.687 (GFA), 0.692 (G/PLS) 0.694 (C-QSAR) assure the predictive ability of the models. Model stability and chance correlation were evaluated by subjecting the developed models to a Y-Randomization procedure that scrambles the dependent variable set, and rebuilds a new QSAR model based on the permuted response. Randomized R2 values of 0.383 (GFA) and 0.362 (G/PLS) apparently signify the stability of the models. The above mention metrics are indicative of the interpolative ability of the models and do not reflects the extrapolative ability of the models. The predictive ability and extensibility of the models were established using an external test set consisting of 17 compounds that was not considered during the model generation process. The predictive power of the models were calculated using the formula64

R2 Pred = SD- PRESS ---------------- (14)

SD

Where SD is the sum of square deviations between the biological activities of each molecule in the test and the mean activity of the training set molecules and PRESS is the sum of square deviations between the predicted and the actual activities of molecules in the test set. More rigorous validation was carried out using other parameters proposed by Tropsha et al64.

They recommended the use of the following formula for illustrating the predictive ability

(R2 – R2O) / R2 < 0.1 or (R2 – R2′O) / R2 < 0.1 ------------- (15)

and

k or k′ close to 1 ------------ (16)

Where, R2 represents the square correlation coefficient between the predicted and observed activities. R2O and R2′O are quantities characterizing the square correlation between predicted Vs observed and observed Vs predicted activity respectively, with Y-intercept set to zero and k, k′ their corresponding slopes. When embarking on an extrapolative adventure using QSAR models, assessing the applicability domain with reference to the calibrated model is of immense importance to obtain predictions with confidence. A distance based similarity method used to define the applicability domain based on a distance threshold value (DT) was used to weed out compounds that fall outside the scope of the model domain in order to obtain reliable predictions. The overall statistical quality index as evident from the footnotes provided in equation 12 and 13 reveal that both GFA and G/PLS models satisfy the acceptability criteria for a valid QSAR model (R2 >0.6, q2 >0.5, R2 – q2 < 0.3and R2 Pred > 0.5)65.

Post analysis of leads

A robust Insilico screening workflow of our bioisosteric library consisting of 790944 compounds facilitated the identification of 123 consensus potential leads with predicted pKi values greater than the established compounds. The reported pKi values are based on consensuses QSAR predictions subjected to rigorous validation with extrapolation limited to the applicability domain. This ensures the highest possible accuracy in forecasting the biological activity of compounds outside to the training set.

AEDSA1 AEDSB1

Predicted PKi 9.795 Predicted pKi 10.183

More interestingly it was found that some isosteric analogues, which were predicted to show good activity as per C-QSAR model, were not predicted to be active analogs using our proteo-chemometric based CoIFA approach. A close scrutiny reveals that those compounds pose an interaction pattern quite different from the other compounds. This discrepancy in the selection of leads between classical QSAR methods and the proteo-chemometric based CoIFA method could be attributed to the feature selection issues in QSAR modeling. During the process of removing invariant variables, Genetic Algorithm based feature selection overlooks a small set of variant descriptors that are relatively constant to generate a statically valid model that faithfully reflects the SAR data. Hence mining databases using QSAR models which has few variables may not be prudent as QSAR models are mandated to faithfully correlate the variation in descriptor values with that of the target property upon which it was trained. Hence integrated virtual screening using Proteo-chemometric based QSAR method like CoIFA which incorporates protein – ligand interaction cross terms, could be a valued added tool since molecular recognition plays a crucial role in determining activity. As an additional measure of confidence we used similarity metrics to establish the degree of analogy between the designed bioisosteric analogues and its progenitor molecule with regard to shape, chemistry and electrostatics together termed as electroform. Electroform calculations were performed using ROCS and EON. An atom centered Gaussian shape based overlay process optimized with respect to chemical features as defined by Mills and Dean force field66 was employed to measure the shape and chemical similarity of the reference compound. The electrostatic potential maps of molecules were compared using EON, which employs partial charges calculated using MMFF94 force field67. The obtained leads have a shape tanimoto similarity (TShape) of 0.86 and an electrostatic similarity (TElectro) of 0.88 reinforcing the fact that they are true electroforms, which are more probable to be active.

Reference AEDSA1 Reference AEDSA1

( TShape = 0.86) (TElectro = 0.88)

A close scrutiny of the R1 fragment of the hit coded AEDSA1 belonging to scaffold (A) with its progenitor molecules reveal that phenyl substitutions with Fluro at position 2 and methyl at position 3 has been explored earlier and were also shown to posses remarkable selectivity profile, but concurrent substitution of the phenyl ring with Fluro and Methyl at positions 2 and 3 respectively has not been explored. It is postulated that such a substitution could afford better affinity. The R2 fragment shows the presence of an 1,2,4, triazol moiety, with an ethylene linker bridging the position 1 of the Triazole ring and position 6 of the 7,8,9,10-tetrahydro-7,10-ethano-1,2,4-triazolo[3,4-a]phthalazine core. Ethylene, an electroform to other ether linkers like methoxy, thio ether and amino ether, all reported earlier is worth exploring in light of its high predicted activity. Analysis of the potential lead coded AEDSB1 belonging to the core scaffold (B) reveals that a 2,4,6 Fluorine substituted phenyl moiety at R1 position is more probable to be active. Such a substitution is akin to the hydrophobic substitution (CH3 and F) explored at various positions around the phenyl ring. Similarly the occurrence of imidazole is not totally unexpected as some progenitor molecules has been synthesised with Methyl and Ethyl substitutions at various positions around the imidazole moiety, with the only variation being the attachment position of the ethylene linker. Overall, it is gratifying to see that the method employed here revealed some unexplored R group fragments with more binding affinity that are deemed to be dissimilar based on intuition yet analogs based on similarity values.

Conclusion

Combinatorial chemistry technique has been utilized to enumerate a targeted library of molecules by varying the Markush features using the concept of bioisosterism. High throughput virtual profiling was carried out using Molinspiration “Miscreen” that uses Bayesian statistics. Extending the QSAR paradigm for virtual screening is an attractive approach given its general notion as a retrospective analysis tool for ligand optimization approach. Conventional QSAR identifies relationships between some aspects of molecular structure and the target property and seldom exploits the structural information of the receptors deeming it to be termed a ligand based approach. To make the best of the information, in hand we have put in practice a novel QSAR approach termed CoIFA that extends the 3D QSAR paradigm to incorporate receptor-ligand interaction profiles to classify molecules as active or inactive based on a pre informed judgment obtained using well validated recursive partitioning method. Such a QSAR model when synchronized with classical QSAR models further hones the predictive power of QSAR models as a powerful virtual screening tool, as exemplified in the study. We believe that this extended view of QSAR modeling as a virtual screening tool, presented in the paper brings a practical solution to mine novel leads using QSAR approaches rather than to confine QSAR modeling to obtain appreciable statistical values that faithfully quantify the SAR data.

References

1. Kuffler, S. W.; Edwards, C. Mechanism of gamma aminobutyric acid (GABA) action and its relation to synaptic inhibition. J. Neurophysiol. 1958, 21 (6), 589-610.

2. Xue, H; Chu, R; Hang, J.; Lee, P.; Zheng, H. Fragment of GABA(A) receptor containing key ligand-binding residues overexpressed in Escherichia coli. Protein. Sci. 1998, 7(1),216-9.

3. Haifeng, Shi; Shui, Ying, Tsang; Hui, Zheng; James, N. Sturgis; Hong, Xue Two β-rich structural domains in GABAA receptor α1 subunit with different physical properties: Evidence for multidomain nature of the receptor. Protein. Sci. 2002, 11(8), 2052–2058.

4. Mehta, A. K.; Ticku, M. K. An update on GABAA receptors. Brain. Res. Brain. Res. Rev. 1999, 29 (2-3), 196-217.

5. Rudolph, U.; Crestani, F.; Benke, D.; Brunig, I.; Benson, J. A.; Fritschy, J. M.; Martin, J. R.; Bluethmann, H.; Mohler, H. Benzodiazepine actions mediated by specific gamma-aminobutyric acid(A) receptor subtypes. Nature 1999, 401 (6755), 796-800.

6. McKernan, R. M.;. Rosahl, T.W; Reynolds, D.S.; Sur, C.; Wafford, K.A.; Atack, J.R.; Farrar, S.; Myers, J.; Cook, G.; Ferris, P.; Garrett, L.; Bristow, L.; Marshall, G.; Macaulay, A.; Brown, N.; Howell, O.; Moore, K.W.; Carling, R.W.; Street, L.J.; Castro, J.L.; Ragan, C.I.; Dawson, G. R.; Whiting, P.J. Sedative but not anxiolytic properties of benzodiazepines are mediated by the GABA (A) receptor alpha1 subtype. Nat. Neurosci. 2000, 3, 587–592.

7. Low, K.; Crestani, F.; Keist, R.; Benke, D.; Brunig, I.; Benson, J. A.; Fritschy, J. M.; Rulicke, T.; Bluethmann, H.; Mohler, H.; Rudolph, U., Molecular and neuronal substrate for the selective attenuation of anxiety. Science 2000, 290 (5489), 131-4.

8. Russell, M. G.; Carling, R. W.; Atack, J. R.; Bromidge, F. A.; Cook, S. M.; Hunt, P.; Isted, C.; Lucas, M.; McKernan, R. M.; Mitchinson, A.; Moore, K. W.; Narquizian, R.; Macaulay, A. J.; Thomas, D.; Thompson, S. A.; Wafford, K. A.; Castro, J. L. Discovery of functionally selective 7,8,9,10-tetrahydro-7,10-ethano-1,2,4-triazolo[3,4-a]phthalazines as GABA A receptor agonists at the alpha3 subunit. J. Med. Chem. 2005, 48 (5), 1367-83.

9. Carling, R. W.; Moore, K. W.; Street, L. J.; Wild, D.; Isted, C.; Leeson, P. D.; Thomas, S.; O'Connor, D.; McKernan, R. M.; Quirk, K.; Cook, S. M.; Atack, J. R.; Wafford, K. A.; Thompson, S. A.; Dawson, G. R.; Ferris, P.; Castro, J. L. 3-phenyl-6-(2-pyridyl)methyloxy-1,2,4-triazolo[3,4-a]phthalazines and analogues: high-affinity gamma-aminobutyric acid-A benzodiazepine receptor ligands with alpha 2, alpha 3, and alpha 5-subtype binding selectivity over alpha 1. J. Med. Chem. 2004, 47 (7), 1807-22.

10. Wermuth, C. G. Similarity in drugs: reflections on analogue design. Drug. Discov. Today. 2006, 11 (7-8), 348-54.

11. http://www.molinspiration.com/(accessed Aug 7, 2009 ).

12. Ortiz, A. R.; Pisabarro, M. T.; Gago, F.; Wade, R. C. Prediction of drug binding affinities by comparative binding energy analysis. J. Med. Chem. 1995, 38 (14), 2681-91.

13. Gohlke, H.; Klebe, G. DrugScore meets CoMFA: adaptation of fields for molecular comparison (AFMoC) or how to tailor knowledge-based pair-potentials to a particular protein. J Med Chem 2002, 45 (19), 4153-70.

14. Datar, P. A.; Khedkar, S. A.; Malde, A. K.; Coutinho, E. C. Comparative residue interaction analysis (CoRIA): a 3D-QSAR approach to explore the binding contributions of active site residues with ligands. J. Comput. –Aided. Mol. Des. 2006, 20 (6), 343-60.

15. Sali, A.; Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993, 234 (3), 779-815.

16. Brejc, K.; van Dijk, W. J.; Klaassen, R. V.; Schuurmans, M.; van Der Oost, J.; Smit, A. B.; Sixma, T. K. Crystal structure of an ACh-binding protein reveals the ligand-binding domain of nicotinic receptors. Nature. 2001, 411 (6835), 269-76.

17. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267 (3), 727-48.

18. Clark, A. M.; Labute, P.; Santavy, M. 2D structure depiction. J. Chem. Inf. Model. 2006, 46 (3), 1107-23.

19. Clark, A. M.; Labute, P. 2D depiction of protein-ligand complexes. J. Chem. Inf. Model. 2007, 47 (5), 1933-44.

20. http://www.chemcomp.com/ (accessed Aug 7, 2009).

21. Breiman, L.; Friedman, J.; Olshen, R.A.; Stone, C.J.Classification and Regression Trees, 1984.

22. Hopfinger, A. J.; Rogers, D. Application of Genetic Function Approximation to Quantitative Structure-Activity Relationships and Quantitative Structure-Property Relationships. J. Chem. Inf. Comput. Sci. 1994, 34 (4), 854–866.

23. Dunn, W.J.; Rogers, D. Genetic partial least squares in QSAR, in Genetic Algorithms in Molecular Modeling, J. Devillers (Ed.), Academic Press, 1996.

24. Gramatica, P.; Giani, E.; Papa, E. Statistical external validation and consensus modeling: a QSPR case study for Koc prediction. J. Mol. Graph. Model. 2007, 25 (6), 755-66.

25. Ganguly, M.; Brown, N.; Schuffenhauer, A.; Ertl, P.; Gillet, V. J.; Greenidge, P. A. Introducing the consensus modeling concept in genetic algorithms: application to interpretable discriminant analysis. J. Chem. Inf. Model.2006, 46 (5), 2110-24.

26. Tropsha, A.; Gramatica, P.; Gombar, V. K. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb.Sci. 2003, 22, 69-77.

27. Accelrys. Discovery Studio, version 2.0; Accelrys Inc.: San Diego,CA, 2007.

28. The Cambridge Crystallographic Data Centre. GOLD, version 3.2; CCDC: Cambridge, U.K., 2006.

29. OpenEye Scientific Software, 9 Bisbee Court, Suite D, Santa Fe, NM 87508, 2006.

30. Accelrys. TSAR, version 3.0; Accelrys Inc.: San Diego, CA, 2007.

31. Accelrys. Cerius2, version 4.10; Accelrys Inc.: San Diego, CA, 2006.

32. Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic Generation of 3D Atomic

Coordinates for Organic Molecules. Tetrahedron Comp. Method. 1990, 3, 537-547.

33. Langer, T.; Wolber, G. Virtual combinatorial chemistry and in silico screening: Efficient tools for lead structure discovery? Pure. Appl. Chem. 2004, 76 (5), 991–996.

34. John,I.M; David,H, S. Designing Combinatorial Libraries for Efficient Screening. In Methods in Molecular Biology Combinatorial Library L.B.,English Ed.; Humana Press Inc.: Totowa, NJ,2002;pp 307-323

35.Schuffenhauer, A.; Ertl, P.; Roggo, S.; Wetzel, S.; Koch, M. A.; Waldmann, H. The scaffold tree--visualization of the scaffold universe by hierarchical scaffold classification. J. Chem. Inf. Model. 2007, 47 (1), 47-58.

36. Patani, G. A.; LaVoie, E. J. Bioisosterism: A Rational Approach in Drug Design. Chem. Rev. 1996, 96 (8), 3147-3176.

37. Akos, Pap; Tamas, Szommer‌; Laszlo, Barna‌; Gergely, Gyimesi‌; Peter, Ferdinandy‌; Cesare, Spadoni‌; Ferenc, Darvas‌; Toshio, Fujita‌; Laszlo, Urge;‌Gyorgy, Dorman Enhanced hit-to-lead process using bioanalogous lead evolution and chemogenomics: application in designing selective matrix metalloprotease inhibitors. Expert Opinion On Drug Discovery. 2007, 2(5), 707-723.

38. Chen, X.; Wang, W. The use of bioisosteric groups in lead optimization. Annual Reports in Medicinal Chemstry 2003, 38.

39. http://www.rcsb.org/pdb/home/home.do (accessed Aug 7, 2009)

40. Rebecca, C. W.; Stefan, H.; Ting, W. Using 3D protein structures to derive 3D-QSARs. Drug Discovery Today: Technologies 2004,1(3), 241-246.

41. Cramer, R.D., III.; Patterson, D.E.; Bunce. J.D. Comparative molecular field analysis ( CoMFA ). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc., 1988, 110(18),5959-67.

42. Gohlke, H.; Hendlich, M.; Klebe, G. Predicting binding modes, binding affinities and 'hot spots' for protein-ligand complexes using a knowledge-based scoring function. Perspect. Drug Discovery Des. 2000, 20, 115-144.

43. Labute, P. Probabilistic Receptor Potentials.Journal of the Chemical Computing Group2001.

44. Shadnia, H.; Wright, J. S.; Anderson, J. M. Interaction force diagrams: new insight into ligand-receptor binding. J Comput.- Aided. Mol. Des. 2009, 23 (3), 185-94.

45. Huang, N.; Shoichet, B. K.; Irwin, J. J. Benchmarking sets for molecular docking. J. Med. Chem. 2006, 49 (23), 6789-801.

46. http://www.expasy.ch/sprot/(accessed Aug 7, 2009)

47. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215 (3), 403-10.

48. Shi, J.; Blundell, T. L.; Mizuguchi, K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 2001, 310 (1), 243-57.

49. Verdonk, M. L.; Cole, J. C.; Hartshorn, M. J.; Murray, C. W.; Taylor, R. D. Improved protein-ligand docking using GOLD. Proteins 2003, 52 (4), 609-23.

50. Muegge, I.; Martin, Y. C. A general and fast scoring function for protein-ligand interactions: a simplified potential approach. J. Med. Chem. 1999, 42 (5), 791-804.

51. Mitchell, J. B. O.; Laskowski, R. A.; Alex, A.; Thornton, J. M. BLEEP - Potential of mean force describing protein-ligand interactions: I. Generating potential. J. Comput. Chem. 1999, 20 (11), 1165-1176.

52. Gohlke, H.; Hendlich, M.; Klebe, G. Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol. 2000, 295 (2), 337-56.

53. Deng, Z.; Chuaqui, C.; Singh, J. Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. J. Med. Chem. 2004, 47 (2), 337-44.

54. http://www.spss.co.in/ (accessed Aug 7, 2009 )

55. Vijayan, R. S.; Ghoshal, N. Structural basis for ligand recognition at the benzodiazepine binding site of GABAA alpha 3 receptor, and pharmacophore-based virtual screening approach. J. Mol. Graph. Model. 2008, 27 (3), 286-98.

56. Laskowski, R. A.; MacArthur, M. W.; Moss, D. S.; Thornton, J. M. PROCHECK - a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography. 1993, 26, 283-291.

57. Vriend, G. WHAT IF: a molecular modeling and drug design program. J. Mol.Graph .1990, 8 (1), 52-6, 29.

58. Bowie, J. U.; Luthy, R.; Eisenberg, D., A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253 (5016), 164-70.

59. Shen, M. Y.; Sali, A., Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006, 15 (11), 2507-24.

60. Ramachandran, G. N.; Ramakrishnan, C.; Sasisekharan, V. Stereochemistry of polypeptide chain configurations. J .Mol. Biol. 1963, 7, 95-9.

61. Brooks, B. R.; Brooks, C. L., 3rd; Mackerell, A. D., Jr.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: the biomolecular simulation program. J. Compu.t Chem. 2009, 30 (10), 1545-614. 62.Weston, J.; Perez-Cruz, F.; Bousquet, O.; Chapelle, O.; Elisseeff, A.; Scholkopf.

Feature Selection and Transduction for Prediction Of Molecular Bioactivity for Drug Design. Bioinformatics 2003, 19, 764–771.

63. Cronin, M.T.D.; Schultz, T.W. “Pitfalls in QSARs”. Journal of Molecular Structure - THEOCHEM 2003, (622), 39-52.

64. Golbraikh, A.; Tropsha, A. Beware of q2! J Mol Graph Model 2002, 20 (4), 269-76.

65. Eriksson, L.; Jaworska, J.; Worth, A.P.; Cronin, M.T.; McDowell, R.M.; Gramatica, P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification and regression based QSARs. Environ Health Perspect. 2003, 111(10), 1361-75.

66.Mills, J. E.; Dean, P. M.Three-dimensional hydrogen-bond geometry and probability information from a crystal survey J. Comput. -Aided Mol. Des. 1996, 607 – 622.

67. Halgren, T. A. Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization and Performance of MMFF94, J. Comp. Chem. 1996 17, 490-51.

Writing Services

Essay Writing
Service

Find out how the very best essay writing service can help you accomplish more and achieve higher marks today.

Assignment Writing Service

From complicated assignments to tricky tasks, our experts can tackle virtually any question thrown at them.

Dissertation Writing Service

A dissertation (also known as a thesis or research project) is probably the most important piece of work for any student! From full dissertations to individual chapters, we’re on hand to support you.

Coursework Writing Service

Our expert qualified writers can help you get your coursework right first time, every time.

Dissertation Proposal Service

The first step to completing a dissertation is to create a proposal that talks about what you wish to do. Our experts can design suitable methodologies - perfect to help you get started with a dissertation.

Report Writing
Service

Reports for any audience. Perfectly structured, professionally written, and tailored to suit your exact requirements.

Essay Skeleton Answer Service

If you’re just looking for some help to get started on an essay, our outline service provides you with a perfect essay plan.

Marking & Proofreading Service

Not sure if your work is hitting the mark? Struggling to get feedback from your lecturer? Our premium marking service was created just for you - get the feedback you deserve now.

Exam Revision
Service

Exams can be one of the most stressful experiences you’ll ever have! Revision is key, and we’re here to help. With custom created revision notes and exam answers, you’ll never feel underprepared again.