Grid Potential Analysis Molecular Docking Studies Biology Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

A grid potential analysis employing a novel approach of 3D-QSAR as AutoGPA module in MOE2009.10 was performed on a dataset of 42 compounds of N-arylsulfonyl-3-acetylindoles as anti-HIV agents. The molecular docking simulation was also performed to determine the most probable binding mode and reliable conformation. The uniqueness of AutoGPA module is that, it automatically builds the 3D-QSAR model on the pharmacophore based molecular alignment. The AutoGPA based 3D-QSAR model obtained in the present study gave the cross-validated Q2 value of 0.588, r2pred value of 0.701, r2m statistics of 0.732 and Fisher value of 94.264. Furthermore, the steric and electrostatic contour maps, along with the 3D structure of protein (binding residue of active site) inlaid were canvassed to understand the structural requirements and interaction between binding residue of protein and inhibitors of HIV-1. The results of 3D-QSAR analysis and molecular docking studies showed good correlation and were in agreement with each other. The study indicates that hydrophobic groups at R1 and R2 position and groups that forms hydrogen bond at R3 position are favourable for good activity. To find similar analogues, virtual screening on ZINC database was done using generated AutoGPA based 3D-QSAR model and showed good prediction. In addition to in-silico ADME profiling and toxicity risk assessment test was performed and results showed that majority of compounds from current dataset and newly virtual screened hits generated were within their standard limit.

Keywords: AutoGPA, Molecular Docking, 3D-QSAR, HIV-1, Grid potential analysis

1. Introduction

Acquired immunodeficiency syndrome (AIDS), despite the efforts of researchers worldwide to unreveal the mode of action of the virus and develop therapies is currently a very serious global health threat and the leading cause of death. HIV-1, the causative agent of AIDS, is a retrovirus from the LentiViridae family. Currently, the drugs for treatment of HIV-1 infected patients belong to nucleoside/nucleotide reverse transcriptase inhibitors (NRTIs), non nucleoside reverse transcriptase inhibitors (NNRTIs), protease inhibitors (PIs), integrase inhibitors (INIs), and fusion inhibitors (FIs). The development of combination antiretroviral therapy, has provided a useful means of suppressing the viremia in infected people, resulted in dramatic reductions in HIV-1 associated morbidity and mortality. However, the current therapies are very far from being curative, as the complexity and toxicity of dosing regimens has led to patient non-compliance. Moreover, the incomplete suppression of HIV-1 replication is responsible for the emergence of drug resistant HIV-1 strains. At present, 32 drugs have been approved by U.S. FDA approval, and while many new are in clinical pilot phase, but still the cure of AIDS is a major problem. Consequently, many efforts have been endowed, in recent years to design and develop novel, selective and safer drugs for treatment of HIV-1.

The main objective of computer aided drug design is predicting the activity of newer compounds. There are many important methods of drug design core technology, based on drug target interactions driven by molecular forces (Docking), quantitative structure-activity relationships (QSAR) using molecular descriptors/potential field and pharmacophore modelling, are being utilized and applied successfully to predict different classes of inhibitors.

Several computational studies (3D-QSAR and docking) have been successfully applied for modeling biological activities on series of anti-HIV agents. Reports on acylthiocarbamates, 1-(2-hydroxyethoxymethyl)-6-(phenylthio)thymine (HEPT), efavirenz derivatives, pyridinones, 1,3,4,5-tetrasubstituted-pyrazoles (TPs), 2-amino-6-arylsulfonylbenzonitriles, indolyl aryl sulfones, etc. are some recent examples.

3D-QSAR models derived from Grid potential analysis provide spatial distribution of important potential fields around active ligand. Since, the distribution usually vary widely depending on the superimposed conformations of the active ligands, it is not easy to find the most appropriate model that represents the actual binding site. To develop most reliable 3D-QSAR model, a novel approach of AutoGPA script in MOE 2009.10 (molecular operating environment) has been used. The AutoGPA script considers conformations of the active ligands aligned over pharmacophoric points. It uses PLS regression method to calculate grid potential around aligned molecules.

In the present study, we have applied a novel approach of AutoGPA, to develop a 3D-QSAR model on a series of N-arylsulfonyl-3-acetylindole derivatives as anti-HIV agents. In addition, the molecular docking simulation have been carried out to elucidate the binding modes in the binding pocket. The AutoGPA based 3D-QSAR model development, considers the steric (vdW), electrostatic fields along with pharmacophore and allowable space around the ligands. Based on the performance of the AutoGPA based 3D-QSAR model, the developed model not only helps in understanding the structure activity relationship of these compounds, but also serves as a useful guide for the design of new inhibitors with better activities.

The protocol followed for the development of AutoGPA model shown in Fig. 1 is as follows:

The pEC50 values and conformations of active ligands were stored in a database of the MOE.

Division of dataset into training and test set.

Pharmacophore elucidation based on conformations of training set.

Generation of 3D-QSAR model for all superimposed conformation by the AutoGPA module in MOE.

Selection of the best model based on q2 values.

Best model (Pharmacophore + 3D-QSAR)

AutoGPA Model: (Pharmacophore + 3D QSAR + Allowable space)

Finally, a virtual screening search was performed on ZINC database using AutoGPA based 3D-QSAR model as a hypothesis to find new hits. In-silico ADME profiling as preliminary drug-likeness and toxicity risk assessment like mutagenicity, tumorigenicity, irritant and reproductive effects of molecules of current dataset and new hits were also performed.

[Insert Fig. 1 here]

2. Materials and Methods

2.1. Data set

The dataset of N-arylsulfonyl-3-acetylindoles derivatives, which was selected from reported literature, consisted of forty two (42) compounds as shown in Table 1. The inhibitory activity values (EC50 in µg/ml) were converted into negative logarithm scale value in mole (pEC50) as dependent variable for grid potential analysis. The corresponding inhibitory activity data covered nearly three log units (pEC50= 3.688 to 6.886). The dataset was randomly partitioned into training and test set compounds by considering activity range, so that both training set and test set compounds, consisted of high, medium and low actives. The training and test set consist of 31 and 11 compounds, respectively. The 3D structure of these compounds was built using the builder tool of MOE2009.10 and energy was minimized using MMFF94x force field, with generalized born solvation model and associated partial charge with RMS gradient of 0.0001.

[Insert Table 1 here]

2.2. Conformation generation

Prior to 3D-QSAR model development, it is necessary to fully exploit the conformations of compounds to obtain the bioactive conformers. In the present study, conformational import was used to generate conformers for a large number of compounds, emphasizing speed by using pregenerated conformations for template fragments as shown in Fig. 2.

The structures of database was first "washed" to assign protonation states at pH7.4 and to remove counter-ions. Preliminary filters were applied to the washed structures based on physicochemical properties (e.g., MW and number of rotatable bonds) and predefined SMARTS (e.g., reactive groups). Then, each structure was divided into overlapping fragments. A pregenerated library of fragment conformations was searched for each fragment. If any fragment was not found, its conformations were generated on the fly by the stochastic search. Once the conformations for all the constitutive fragments were available, the complete conformers were assembled by systematic combinations of the 3D fragments. This allowed the chirality of stereocenters to vary. The conformations with poor van der Waals contacts or unfavorable geometries (e.g., cis amide) were rejected. The reported strain energy is the sum of the individual fragment contributions and is not equivalent to a typical force field energy, e.g. it does not include interfragment nonbonded energies. By default, no further refinement (energy minimization) was performed, and conformers with a strain energy >4.5 kcal/mol were discarded. A conformation is the output to the final database if i) its estimated strain energy is less than a user-defined threshold (4.5 kcal/mol) and ii) the conformation is different from those found previously (RMSD gradient 0.15 Å by default). The process continued until a specified maximum number of conformations per ligand (500) was reached or no more new conformations could be found. A total of 230 and 52 conformation for training and test compounds were generated respectively and stored in their respective databases.

[Insert Fig. 2 here]

2.3. Pharmacophore elucidation and alignment

The typical pharmacophoric features such as hydrogen bond acceptor, hydrogen bond donor, hydrophobic area, positively and negatively ionized areas, were assigned to each conformation. The pharmacophore elucidation search method, extract common pharmacophoric queries (3D arrangement of molecular features) including 2 point query, 3-point query, 4-point query and 5-point query from a collection of compounds, in which some or all the compounds were active against a particular biological target, such that all or most of active compounds satisfy the queries. In the absence of receptor information, it is anticipated that feature geometries, common to many of the actives, will contain information related to important interactions between the bound conformations of the actives and the receptor. In this context, it was assumed that, all of the active compounds bind to the receptor in the same mode (i.e. overlay well). In the present study, pharmacophoric queries for each point were elucidated for most stable energy conformer of each compound in training set. Most of the conformers of compounds satisfied the 4 and 5 point query and a total of 65 (sixty five) pharmacophore queries were generated on training set compounds (See the supplementary Table S1). Finally, all the compounds in the training set were aligned on generated pharmacophore queries for further AutoGPA 3D-QSAR model development.

2.4. AutoGPA based 3D QSAR model construction

The AutoGPA script was imported to MOE2009.10 to assess the 3D-QSAR functionality. The 3D-QSAR in AutoGPA script is based on theory of CoMFA analysis as reported by R.D. Cramer. Cramer suggested that suitable sampling of steric and electrostatic fields surrounding a set of aligned or superimposed ligands might provide all the information necessary for understanding their observed biological properties. AutoGPA script improves 3D-QSAR model development by considering the pharmacophore queries and allowable space around the aligned ligands. The generated conformers for each ligand were aligned over the pharmacophoric queries and whether or not to keep the particular conformers, is guided by the allowable space and finally prediction was done on the basis of 3D-QSAR model developed. The pharmacophoric queries and allowable space together helped in screening out the large compound library, to select and predict the compounds for respective targets.

In AutoGPA module, the 3D-QSAR model was developed on pharmacophore aligned ligands, using a regular three dimensional grid with a 2.0 Å separation surrounding all the molecules. A sp3 carbon atom with charge +1.0 was used as the probe atom to be placed at each intersection point. A cutoff distance of 4.5 Å was fixed so that atoms within cutoff distance from each grid point will be considered for potential calculation. Molecular fields around each molecules were evaluated by calculating the electrostatic (Columbic, with a 1/r dielectric) and steric (van der Waals 6-12) interaction energies between the molecule and probe atom placed at each grid point. The potential energy in kcal/mol was assigned to each grid point and a set of grids was designated as a grid potential model. During the calculation of grid potential for all the ligands, a limit of van der Waals potential (>30kcal/mol) and a threshold for maximum variance (0.15) was set. After that all the grid potential fields (steric and electrostatic) were calculated, partial least squares (PLS) regression analysis method was used, to study the relationship between grid potential fields around ligands and biological activity, to develop AutoGPA based 3D-QSAR model. The optimum number of components were set upto 10, which was identified by leave-one-out cross validation.

2.5. Validation of model

To check the robustness, accuracy, reliability and predictability of the developed model, an internal and external validation of the QSAR models was performed. For this purpose, the leave-one-out (loo) procedure (cross validated correlation coefficient) as internal validation method and prediction of test set as external validation method was used. In internal validation method, AutoGPA script calculated two kinds of cross validated correlation values, XR2 and Q2, both of them are similar, except their calculation method.

XR2 = MVAR/YVAR (eq.1)

and Q2 = 1-MSE/YVAR (eq.2)

where, MSE (Mean Square Error)

YVAR (variance of experimental data )

MVAR (variance of predicted value by model )

The value of XR2 is an approximation, whereas Q2 is accurate. Often a high Q2 value (Q2>0.5) confirms a reasonably robust model.

Tropsha emphasized that a high Q2 value (>0.5) is necessary but not the sufficient condition for a predictive QSAR model. An external validation using external dataset (test set) of compounds should be performed to assess the predictability of model. It is judged by predictive r2 (r2pred) as shown below:

r2pred = (SD-PRESS)/SD

where SD is the sum of squared deviations of biological activity from their mean and PRESS or predictive sum of squares, is the sum of the squared differences between the actual and predicted biological activity.

Furthermore Golbraikh recommended the criteria i) r2 > 0.6, ii) Q2 > 0.5, iii) or is close to r2, such that or <0.1 and 0.85 ≤ k ≤ 1.15 or 0.85 ≤ k' ≤ 1.15 to be fulfilled for a given QSAR model to be accepted.

Where r2 is the correlation coefficient between the predicted and observed activities of the test set. and are correlation coefficient for regression through the origin for predicted versus observed and observed versus predicted activities, respectively and k and k' are corresponding slopes of regression lines through the origin. The predictability of the model was further confirmed by additional validation parameter such as modified r2 term () as reported by Roy and calculated as follows;

A value of greater than 0.5 suggests that the model obtained from training set has good external predictability.

2.6. Docking Simulation

In order to investigate the binding orientation of inhibitors to their binding pocket and support the hypothesis developed through 3D-QSAR model, a docking simulation of N-arylsulfonyl-3-acetylindoles as inhibitors into the reverse transcriptase enzyme binding pocket was performed using Surflex-dock docking tool with Surflex-Dock GeomX mode available in SYBYL-X. Surflex-dock uses an empirical scoring function and a patented search engine to dock ligand into a protein's binding site. The Protein (PDB code: 1RTD) file was downloaded from the RCSB protein data bank. To prepare the protein file for docking simulation, polar hydrogens were added, and the protonation state was defined at pH7.4. The partial charges for the protein were assigned from AMBER7 FF99 force field. Finally, the position of added hydrogens were optimized by keeping all heavy atoms fixed. The prepared protein file was imported into SYBYL-X suite and binding site was defined using co-crystallize bound tenofovir (TTP) molecule. The docking process was guided by protomol, an idealized representation of ligand that makes every potential interaction with the binding site. All the parameters in Surflex-Dock GeomX mode were kept default during the docking simulation. Ligands were prepared by assigning the Gasteiger-Hückel charges and minimization was done using the Tripos force field with a RMS gradient of 0.05 and iteration of 100. The Surflex-dock scores were expressed in -Log(Kd) units to represent binding affinities .

2.7. Virtual Screening Protocol

The hypothesis in the form of AutoGPA 3D-QSAR model was selected for the virtual screening of the ZINC databases, to find the potential hits as anti-HIV agent. The protocol followed for virtual screening is summarized in Fig. 3. In brief, a total number of 17,08,96,974 molecular conformations from a total molecules (1,78,18,291) of the ZINC database were selected, using the pharmacophoric constraints automatically generated on potent molecules (compound no. 35) of the dataset as template through the web server ZINCPharmer. The pharmacophoric flexibility was provided by considering 3 Hydrophobic, 3 aromatic and 2 hydrogen bond acceptor to increase the number of screened molecules. During screening, preliminary filters such as molecular weight (>500), total number of rotatable bonds to 10 and number of conformations per molecule to 1 (Filter-I) were set. After this filter, a total number of 1683 molecules were retained. The selected molecules were imported in MOE2009.10 and minimized by setting the force field MMFF94x and associated partial charge with RMS gradient of 0.0001. Using conformational import functionality with the same settings as discussed in materials and method section, a total number of 91,490 low energy conformers were generated. The low energy conformers were subjected to next filter based on novel approach of AutoGPA module which combines pharmacophoric queries (four point) and allowable space (Filter-II) generated on the N-arylsulfonyl-3-acetylindole derivatives. A total number of 104 molecular conformers were retained which satisfied both the filter criterion mentioned above and rest of the molecular conformers were discarded. The filtered molecular conformers were predicted using validated AutoGPA based 3D-QSAR model developed. A total of 07 out of 104 molecular conformers showed high to medium predicted activity for HIV-1 target.

[Insert Fig. 3 here]

2.8. In-silico ADME profiling and Toxicity risk assessment test

It would be advantageous in the process of drug discovery, if the information about the ADME properties of molecules are predicted earlier. This information helps the chemist to ameliorate the pharmacokinetic profile of molecules. Drug-likeness or Druggability of molecules was assessed based on Lipinski rule of five as suggested by Christopher A Lipinski. This rule is based on observation that orally administered drug should have a molecular weight (MW) of ≤ 500 Da, a LogP ≤ 5, Hydrogen bond donor ≤ 5, and Hydrogen bond acceptor sites (N and O atoms) ≤ 10. For better drug-likeness properties, total polar surface area (TPSA) of a molecule should be ≤ 140 Å2 which correlates well with the passive molecular transport through membranes as suggested by Ghose et al. The in-silico prediction of ADME properties, of compounds under consideration as well as virtual hits was performed by QikProp version 3.3 of Schrödinger suite 2010 and calculation of total polar surface area (TPSA) of compounds was carried out by MOE2009.10. In order to find any toxicity risk associated with the current data set like mutagenicity, tumorigenicity, irritant and reproductive effects were calculated using OSIRIS property explorer. OSIRIS property explorer also calculates drug-likeness and drug score value for individual compound. The drug-likeness score value is based on summing up the score value of fragments or substructure present in the molecule. Drug score value is combination of drug-likeness, cLogP, logS, molecular weight and toxicity risk for each molecule, which is used to judge the molecules overall potential to qualify as a drug. It is calculated by multiplying contributions of the individual properties using following equation.

where, ds is the drug score, Si is the contributions from cLogP, logS, molecular weight and drug-likeness, ti is the contributions taken from the 4 toxicity risk types. The ti values are 1.0, 0.8 and 0.6 for no risk, medium risk and high risk, respectively.


3.1. Grid Potential Analysis

The AutoGPA script for 3D-QSAR model development generated fifty six (56) QSAR models based on pharmacophore queries. The best model was selected on the basis of statistical and validation parameters, which considered the steric (vdW) and electrostatic grid potential fields along with pharmacophore and allowable space around the ligands and related it with the HIV-1 inhibitory activity. The PLS statistics for the best grid potential analysis model is shown in Table 2. The information content of grid potential around the aligned ligands for best 3D-QSAR model is shown in Fig. 4. The steric (vdW) potential field around the ligand was generated to rationalize the regions in 3D space, where changes in the three dimensional volume increases or decreases the activity in compounds, presented by green and yellow contour maps respectively. The green contours indicate high steric tolerance while, yellow contours denote regions of unfavourable steric effects at particular region around the ligand is shown in Fig. 4 (a)). The green and yellow contour maps are surrounded by the phenyl group at N1 position. The sterically-favourable two green contours at para position of phenyl ring suggested that lengthening of the carbon chain will improve the activity. Indeed, some compounds in the dataset such as 13, 14, 21, 35 bearing ethyl group at para position showed good activity whereas compounds 02, 20, 30, and 38 bearing either methyl substituent or are without substitution at that position showed low activity. So substitution with a long chain increased the activity of the compound. The sterically-nonfavorable yellow contour is adjacent to green contour suggesting that bulkiness or branching in alkyl chain at that position should be considered carefully. Similarly, electronegative groups (like bromine and chlorine atoms) were also substituted at para position in compounds and oriented towards the yellow contour (compounds 04, 07, 11, 17, 25, 36, 38). Compounds with bromine atom at para-position were less active as compared to compounds having chlorine atom, as bromine atom has larger vdW radii than the chlorine atom as shown in Fig. 4(b). Another feature which makes compounds to be in the category of active compounds was nitro group. The steric size of nitro group is comparable with halogens and methyl group, so their presence in molecule makes the compounds to be active. Fig. 4(c) shows a small green contour at meta position of phenyl ring and compounds with nitro group at meta position (compounds 03, 04, 12 and 33) were active. Fig. 4(a) shows one green contour near 6th position of indole ring system, suggesting that the bulky groups might increase the activity. The highly active compounds 31, 33, 35 and low active compounds 26, 27, 38 justified the hypothesis. All the aligned training set compounds with steric grid potential fields are shown in Fig. 4(d).

[Insert Table 2 & Fig. 4 here]

The electrostatic grid potential fields as are indicated by the blue and red contours, exhibit the regions where electron donating and electron withdrawing groups would be favourable for activity. As shown in Fig. 5(a) two red contours are present at phenyl ring at N1 position suggesting that electron withdrawing groups will enhance the activity. In respect to this electron withdrawing nature of nitro group in compounds 04, 08, 25, 33 and nitrile group in compounds 18 and 21 made these to be potent compounds in the dataset. One small red contour near the carbonyl group of 3-acetyl group at C-3 position of indole nucleus suggested that the electron withdrawing groups on this position influence the potency. The compounds 22, 31, 33, and 35 bearing a carbonyl group are highly active. Fig. 5(b) shows the volume of red contour is small, suggesting that electron withdrawing groups are not necessary but assist in enhancing the activity, as observed for compounds 04, 10, 13 and 18, which are from high activity spectrum, as they do not have carbonyl group at that position.

Another electrostatic grid potential field, blue contour (positively favourable region) was surrounding the phenyl ring and indole ring favouring the electron withdrawing groups to be substituted, which made phenyl and indole ring more electropositive and activity is further increased. One blue contour nearby 3-postition of indole ring suggested that electron releasing groups would be favourable, as supported by compounds 22, 31, 33 and 35 bearing alkyl groups.

Fig. 5(c) indicates the best hypothesis with pharmacophoric features shared by all compounds in the dataset and it includes Aromatic (Aro|PIR) π-ring center coloured by orange sphere surrounded by the blue contours, one hydrophobic (Hyd) center positioned over phenyl ring system and coloured by green spheres and two oxygen atoms on sulphur atom oriented towards the projected acceptor annotation (Acc2), coloured by blue spheres. The common pharmacophoric features further helped in screening out the potential ligand for anti-HIV-1 activity. Fig. 5(d) shows an AutoGPA for the best 3D-QSAR model, all the training set compounds are aligned based on best pharmacophore model and allowable space around them.

[Insert Fig. 5 here]

3.2. Molecular Docking Analysis

Molecular docking studies offer precise information for studying the interaction between the ligand and protein residue and confirm the corroboration with 3D-QSAR model.

To validate the docking method used, the co-crystallized bound ligand was docked back into the binding pocket to ascertain the reproducibility of the docked pose as compared to the co-crystallized bound ligand. The result of redocking analysis as exemplified by the RMSD value of 0.5498 showed that Surflex-dock revealed good ability to reproduce the co-crystallized bound ligand conformation. Redocking pose is shown in Fig. 6. After validation, all the molecules were docked into the binding pocket. The Surflex-dock, successfully positioned all the molecules in the binding site as shown in Fig. 7. The protein-ligand binding mode and their interaction diagram are shown in Fig. 8 for potent compound 35 of the dataset. The indole nucleus was surrounded by the negatively charged amino acids (Asp110, Asp113, Asp185) and positively charged amino acids (Arg72, Lys65, Lys66), , which were well correlated with the developed AutoGPA 3D-QSAR model as shown in Fig. 5(a), in which red contour maps (negative favourable region) are located in positively charged area and blue contour maps (positive favourable region) are directed towards negatively charged area of binding pocket. The contour maps suggests that negatively and positively charged favourable groups are desirable in that area for better interaction and inhibition activity. A favourable arene-cation interaction was observed with indole nucleus and Arg72 residue. Most of the potent compounds of the dataset bear a carbonyl group at C-3 position and it is believed that they stabilizes the inhibitors to their binding site by making favourable interaction with Asp110, Asp185 through coordination complex with metal ion (Mg++), as depicted in Fig. 8. The substituents present at C-6 position are juxtaposed into hydrophobic amino acids (Phe116, Ala114), making hydrophobic interaction and are suggested to be important, as differences in potency could be seen in compounds of dataset bearing an alkyl group and compounds without alkyl group. Furthermore, support for these kinds of features, Fig. 4 shows 3D-QSAR steric contour maps, where green potential field at that position suggests favourable hydrophobic groups for better inhibition. Another important feature well correlated with 3D-QSAR contour maps is phenyl ring system at sulfonyl group, which is surrounded by the Gly152, Met184, Tyr115 and Leu74 belonging to a group of nonpolar/hydrophobic amino acids and forms a hydrophobic environment for substituents present at phenyl ring system. A favourable arene-arene interaction was observed between Tyr115 and phenyl ring system. The steric contours specially green, which favours bulky groups around the phenyl ring and compounds having such kind of features are potent whereas less potent compounds lack these features, suggesting that these groups are important for better inhibition and further validated the AutoGPA based 3D-QSAR model.

[Insert Fig. 6,7,8 here]

3.3. Virtual Screening Result Analysis

One efficient approach to drug discovery is the virtual screening of the molecular libraries. The hypothesis in the form of validated AutoGPA based 3D-QSAR model was used to retrieve some hits from ZINC database. As a result of screening with a combination of two filters, followed by prediction, seven molecules as hits were retrieved. All the hits retrieved had similar analogy with the dataset under investigation. The hits showed high to medium predicted activity as possible anti-HIV-1 agents. All hits showed similar alignment over the pharmacophoric features (four point), encompassing in allowable space with steric and electrostatic contour maps corroborating well with substituents present over the molecules as shown in Fig. 9. The molecular docking analysis was further carried out to strengthen the AutoGPA based 3D-QSAR analysis results. All the seven molecules were docked into the binding pocket, guided by the Surflex-dock GeomX mode. The docking score as Surflex-dock score helped to rank the molecules accordingly. Table 3 lists the virtually screened hits along with their predictivity based on AutoGPA based 3D-QSAR model and Surflex-Dock score values. The docking analysis of seven molecules showed that they shared the similar binding mode as the current dataset of the molecules. All of them showed similar kind of interaction with the key amino acid residue as exemplified by favourable arene-cation, arene-arene and hydrophobic interactions with the substituents present over the molecules. All the molecules have shown a favourable kind of interaction with Asp110, Asp185 through coordination with metal ion (Mg++) which stabilized the molecules in the binding pocket as shown in Fig. 10.

[Insert Fig. 9, 10 and Table 3 here]

3.4. In-silico ADME profiling and Toxicity risk assessment test analysis

The ADME properties of molecules in humans depend on several factors like their permeability through skin or gut (LogKp), ability to cross the blood-brain barrier (LogBB), serum protein binding (LogKhsa), Caco-2 permeability etc. Table S2 (See the supplementary file) lists ADME and TPSA parameters for compounds under consideration. The in-silico ADME prediction of dataset under consideration showed that most of the molecules were orally bioavailable, as they pass the Jorgenson's rule of three and could be a possible drug candidate by obeying Lipinski rule of five. LogP is the partition coefficient between octanol and water, which governs the molecule to be either hydrophilic or lipophilic, the predicted value for LogP should be below 5, and all the molecules of the data set were within limit. The LogP value for compounds 05, 06, 11, 15 were high as compared to rest of the dataset molecules. The LogBB, which governs the ability of a molecule to cross the blood-brain barrier (-3.0 (low ability) to 1.2 (high ability)) were predicted and most of the compounds showed their predicted values in negative, suggesting the low permeability, whereas compounds 05, 06, 11, and 15 were predicted for high permeability through the blood-brain barrier. The results of predicted CNS activity for all the compounds showed that, they are CNS inactive, except compounds 05, 06, 11, and 15. So the three properties (LogP, LogBB and CNS prediction) showed reciprocal relation, compounds 05, 06, 11, and 15 with higher LogP value, would be able to cross the blood:brain barrier and would be CNS active. Observing the structure and activity table, it is found that these compounds belong to low activity spectrum except compound no. 15 and it is thought that behaviour of such kind of ADME properties might be due to the presence of some electronegative groups in their parent structure, which enhances their lipophilicity. The hydrophilicity to the molecule is imparted by the presence of polar functional groups substituted at different position on parent nucleus, making these molecules to be having intermediate to low LogP value, lesser ability to cross the blood:brain barrier and reduced CNS activity. The oral bioavailability of all the molecules was assessed on TPSA, as it was apparent from studies that TPSA correlates inversely to lipid penetration ability, and compounds having TPSA>140Å were absorbed less than 10% in humans. The TPSA for all the molecules were within limit. Further to support this, predicted Caco-2 was also calculated, which is permeability through the intestinal epithelium, and all molecules showed good permeability except compound 41.

The virtual screening hits from ZINC database were also subjected to in-silico ADME prediction. As the results of LogP, LogBB and CNS prediction from dataset were interrelated, the same correlation was also observed for virtual hit molecules. Summarizing that, for all hits, LogP values were below 5, LogBB values between -3.0 to 1.2 and were CNS inactive. Caco-2 cell permeability for all the hits were predicted and were in acceptable range, while the predicted LogS for all the hits showed that these molecules possessed intermediate aqueous solubility. The oral bioavailability was also predicted (TPSA>140 Å) and was within limit, showing excellent oral absorption. Table 4 shows the in-silico ADME prediction results.

[Insert Table 4 here]

Toxicity risk assessment test on the current data set was performed to find any kind of toxicity risk like mutagenicity, tumorigenicity, irritant effects and reproductive effects (See the supplementary Table S3). The toxicity risk prediction program locates the fragments within molecule that indicate a potential toxicity risk. Toxicity risk assessment test showed most of the molecules were devoid of toxicity. Mutagenicity prediction with medium risk was shown by Compounds 03, 12, 20, 24, 33, 41 as these compounds were substituted by -NO2 (nitro groups) at 3-position, mutagenicity increased for compounds 4 and 25 with -Cl (chloro group) at 4-position and -NO2 (nitro group) at 3-position. It was also observed that presence of -COCH3 group at R3 position reduces the irritant effect (compound 25). The Toxicity risk assessment test was also performed on the virtual hits. The result indicated that all the molecules were devoid of mutagenicity, tumorigenicity, irritant effects and reproductive effects. Table 5 summarizes the toxicity risk assessment parameters calculated for virtually screened hits. Overall drug score was calculated, to judge the potential of a molecule to be a drug, which combines drug-likeness and toxicity risk parameters. Results of OSIRIS property explorer for overall drug score of dataset as well as virtually screened molecules are comparable with Zidovudine (AZT) Efavirenz (EFV), and Nevirapine (NVP) as standard drugs and can be used for further exploration, optimization and testing processes.

[Insert Table 5 here]


The spatial alignment of conformations of compounds is very important to derive the meaningful 3D-QSAR model. The AutoGPA based 3D-QSAR model development took into account pharmacophoric points searching in all active conformers of training set compounds followed by alignment, and finally prediction of test set compounds. The alignment in this way provides best superimposition for better QSAR model development. In selected AutoGPA based 3D-QSAR model, the steric (vdW) grid potential field contributes 20% and electrostatic grid potential field contributes 80% towards the HIV-1 inhibition activity. The contour maps were able to explain the influence of substituents on activity. The predictive ability of the model was confirmed by predicting the activity of 11 compounds used as external test set. In support to AutoGPA based 3D-QSAR model developed, a molecular docking approach on current dataset was also performed and all compounds were observed to bind in the same fashion and very well corroborated with the steric (vdW) and electrostatic grid potential field around ligand.

The results suggested that to design more potent HIV-1 inhibitors; structural requirement on this series needs to focused on the electronegative pocket (Asp110, Asp113, Asp185) and electropositive pocket (Arg72, Lys65, Lys66) offered by amino acids, hydrogen bond interaction with Asp110 and Asp185 through Mg++ and hydrophobic interactions with amino acids (Phe116, Ala114, Gly152, Met184, Tyr115). The hypothesis in the form of validated AutoGPA based 3D-QSAR model was selected for virtual screening of ZINC database. Seven (07) hits were retrieved through this process having high to medium predicted activity. All the seven molecules were docked into the binding pocket and Surflex-Dock score were determined. The results of docking study revealed the same binding orientation for virtually screened hits as well as current dataset. The in-silico ADME profiling and Toxicity risk assessment results for the dataset as well as virtually screened hits were found to be within standard limit. The present work revealed new hits as valuable leads for further biological assays. Thus the AutoGPA based 3D-QSAR analysis serves as a novel approach for screening of new lead molecules from large database within the covered space for HIV-1 target.


The author is grateful to Director, Shri G.S. Institute of Technology and Science, Indore for providing the facility to carry out the work. SK acknowledges Naoyuki Asakawa of Royaka Systems Inc. for providing the AutoGPA script and valuable suggestion about the program. Financial support (Senior Research Fellowship) provided by Council of Scientific and Industrial Research is thankfully acknowledged.