The Sequence Analysis And Physicochemical Characterization Biology Essay


The snake venom 5ʹ-nucleotidase (SV-5ʹ NUC) target sequence with accession number A6MFL8 was retrieved from the UniProt database, and its physicochemical characterization was computed using the ExPASy ProtParam program. A similarity search for SV-5ʹ NUC in the Protein Data Bank (PDB) was performed using the BLAST server. The crystal structure of human 5ʹ-nucleotidase (H-5ʹ NUC; PDB ID: 2J2C) was selected as the template for the target SV-5ʹ NUC based on its sequence and functional homology. Alignment between the target SV-5ʹ NUC sequence and the template H-5ʹ NUC sequence was performed and visualized using ESPript.

2.2. Homology modelling and validation

Homology modelling of the target protein was carried out with MODELLER 9v7, and multiple models were generated. The generated models were ranked by their Discrete Optimized Protein Energy (DOPE) and MOLPDF scores. The target model with the lowest DOPE and MOLPDF scores and acceptable Ramachandran plot statistics was selected for all further studies. Validation studies were then performed on the selected SV-5ʹ NUC target model using the NIH SAVES server.

2.3. Molecular dynamics simulation (MD) of chosen target model

MD simulation was carried out using the GROMOS96 43A1 force field as implemented in the GROMACS program. A cubic box with the SPC water model was built and subjected to a maximum of 1000 steps of energy minimization using the steepest descent algorithm. The leap-frog algorithm was used to integrate Newton's equations of motion in the MD simulation. The chosen target model was equilibrated for 1000 steps. An MD simulation of 500 ns at 300 K was then performed with an integration time step of 2 fs. Constraints were applied to all protein covalent bonds to maintain constant bond lengths. Berendsen temperature coupling and Parrinello-Rahman pressure coupling were used to suppress drift during equilibration and MD simulation. Coordinates and energy terms (potential energy for the whole system) were saved every 10000 steps in order to evaluate the stabilization of the protein system throughout the MD simulation.
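
As a rough check on the run parameters quoted above, the following Python sketch converts the stated run length, time step and save interval into a step count and a frame spacing. The variable names are illustrative and not taken from any GROMACS input file; only the three numeric values come from the text.

```python
# Values quoted in the protocol above; the names are illustrative only.
TIMESTEP_FS = 2                # integration time step, in femtoseconds
RUN_LENGTH_NS = 500            # production run length, in nanoseconds
SAVE_INTERVAL_STEPS = 10_000   # coordinates and energies saved every N steps

FS_PER_NS = 1_000_000
FS_PER_PS = 1_000

# Number of integration steps needed to cover the production run.
total_steps = RUN_LENGTH_NS * FS_PER_NS // TIMESTEP_FS

# Simulated time between two consecutive saved frames, in picoseconds.
save_interval_ps = SAVE_INTERVAL_STEPS * TIMESTEP_FS / FS_PER_PS

# Frames written during the run (not counting the starting frame).
saved_frames = total_steps // SAVE_INTERVAL_STEPS

print(total_steps, save_interval_ps, saved_frames)
```

Under these settings a frame is written every 20 ps of simulated time, which sets the time resolution of any RMSD or energy plot derived from the trajectory.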

2.4. Binding-site prediction

Binding-site identification plays a major role in structure-based drug design (SBDD). In our study, the binding-site region of the chosen predicted model was identified using the SiteMap program (v2.5), which identifies one or more regions suitable for ligand binding. Further, hydrophobic and hydrophilic maps (the latter comprising donor, acceptor and metal-binding regions) were produced as contour maps and scored. The scores were generated using the default parameters implemented in SiteMap (v2.5), and more than two sites were generated.

2.5. Ligand preparation

The chemical molecules vanillin (CID: 1183) and vanillic acid (CID: 8468) were retrieved from the PubChem database. The ligands were prepared for docking using the LigPrep program (v2.5). Tautomers of each ligand were generated and optimized. Partial atomic charges were computed using the OPLS_2005 force field, and the ligands were energy minimized.

2.6. Molecular docking analysis

The "Extra Precision" (XP) mode of Glide (v5.7) was used to perform all docking calculations with the OPLS-AA 2005 force field. A bounding box of 10 Å × 10 Å × 10 Å was defined and confined to the SiteMap-predicted active-site region of the SV-5ʹ NUC model for docking the ligands. A scale factor of 0.4 for van der Waals radii was applied to protein atoms with absolute partial charges less than or equal to 0.25. Five thousand poses per ligand were generated during the initial phase of the docking calculation, of which the best 1000 poses per ligand were chosen for energy minimization. A dielectric constant of 4.0 and 1000 steps of conjugate gradient minimization were used in the energy minimization protocol. Upon completion of each docking calculation, 10000 poses per ligand were generated and the best docked structure was chosen using the Glide Score function. The best pose is chosen using a model energy score that combines the energy grid score, the Glide score, and the internal strain of the ligand.
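
The van der Waals scaling rule stated above can be written as a one-line condition. The sketch below is only an illustration of that sentence, not Glide's internal implementation; the function name and the example radius are hypothetical, and only the scale factor (0.4) and charge cutoff (0.25) come from the text.

```python
def scaled_vdw_radius(radius, partial_charge, scale=0.4, charge_cutoff=0.25):
    """Return the van der Waals radius used for docking.

    Atoms whose absolute partial charge is at or below the cutoff are
    treated as nonpolar and have their radius scaled down; all other
    atoms keep their original radius.
    """
    if abs(partial_charge) <= charge_cutoff:
        return radius * scale
    return radius


# Hypothetical atoms: (radius in angstroms, partial charge).
nonpolar_atom = scaled_vdw_radius(2.0, 0.10)   # scaled
boundary_atom = scaled_vdw_radius(2.0, -0.25)  # scaled (cutoff is inclusive)
polar_atom = scaled_vdw_radius(2.0, 0.30)      # unchanged
```

Scaling down the radii of nonpolar atoms softens the receptor surface, which is a common way to allow for small induced-fit effects in rigid-receptor docking.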

2.7. ADME analysis

The QikProp program (v3.4) was used to obtain the ADME properties of both compounds. It predicts both physically significant descriptors and pharmaceutically relevant properties. The program was run in normal mode, and more than 44 descriptors were analyzed for each molecule. It also evaluates the drug-likeness of the compounds based on Lipinski's rule of five, which is essential for rational drug design.

2.8. Energy-optimized pharmacophore mapping

Energy-optimized pharmacophores (e-pharmacophores) are obtained by mapping the energetic terms from the Glide XP scoring function onto atom centers. The ligand is docked with Glide XP and the pose is refined. The Glide XP scoring terms are computed, and the energies are mapped onto atoms. The pharmacophore sites are generated, and the Glide XP energies from the atoms that comprise each pharmacophore site are summed. The sites are then ranked based on these energies, and the most favorable sites are selected.
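
The summing and ranking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Schrödinger implementation: the function name, atom labels and per-atom energies are hypothetical, and only the site labels (A2, D4, R6) are borrowed from the results reported later in this essay.

```python
from collections import defaultdict


def rank_pharmacophore_sites(atom_energies, atom_to_site, max_sites=None):
    """Sum per-atom energies for each site and rank sites by total energy.

    atom_energies: mapping atom label -> energy contribution (kcal/mol).
    atom_to_site:  mapping atom label -> pharmacophore site label.
    Atoms that belong to no site are ignored. More negative totals are
    treated as more favorable and are listed first.
    """
    site_energy = defaultdict(float)
    for atom, energy in atom_energies.items():
        site = atom_to_site.get(atom)
        if site is not None:
            site_energy[site] += energy
    ranked = sorted(site_energy.items(), key=lambda item: item[1])
    return ranked if max_sites is None else ranked[:max_sites]


# Hypothetical per-atom energies for a small ligand.
example_energies = {
    "O1": -1.5, "O2": -0.5, "H1": -0.75,
    "C1": -0.25, "C2": -0.25, "C3": -0.125,
}
# Hypothetical assignment of atoms to sites; O2 belongs to no site.
example_sites = {"O1": "A2", "H1": "D4", "C1": "R6", "C2": "R6", "C3": "R6"}

ranked_sites = rank_pharmacophore_sites(example_energies, example_sites)
```

Passing `max_sites` keeps only the most favorable sites, which mirrors the final selection step of the procedure.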

2.9. Molecular electrostatic potential analysis

The molecular electrostatic potential (MEP) analysis at the functional binding pocket of the modeled target protein was carried out using PyMOL (v1.3), based on the surface potential values. The Poisson-Boltzmann based molecular surface was generated and visualized using PyMOL (v1.3).

3. Results and discussion

3.1. Sequence analysis and physicochemical characterization

Demansia vestigiata SV-5ʹ NUC comprises 559 amino acids (UniProt ID: A6MFL8) with a molecular mass of 64,642 Da, and is annotated as having hydrolase activity. The sequence analysis revealed that SV-5ʹ NUC belongs to a superfamily of proteins (the 5_nucleotid superfamily; see Section 3.2). The physicochemical characterization of the protein revealed the following: theoretical pI, 5.61; total number of negatively charged residues (Asp + Glu), 80; total number of positively charged residues (Arg + Lys), 64; extinction coefficient, 72,700 M⁻¹ cm⁻¹; and an estimated half-life of 30 hours (mammalian reticulocytes, in vitro). The computed instability index of 40.77 classified SV-5ʹ NUC as unstable, since values above 40 indicate instability. The grand average of hydropathicity (GRAVY) and the aliphatic index of SV-5ʹ NUC were −0.400 and 77.92, respectively.
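
The reported ProtParam values can be read with a few simple rules of thumb, sketched below in Python. The numbers are those quoted above; the variable names are illustrative, and the net-charge estimate is a simplification that ignores histidine and the chain termini.

```python
# Values reported above for SV-5' NUC (UniProt A6MFL8).
NEGATIVE_RESIDUES = 80      # Asp + Glu
POSITIVE_RESIDUES = 64      # Arg + Lys
INSTABILITY_INDEX = 40.77
GRAVY = -0.400

# Crude net charge near neutral pH (assumption: His and termini ignored).
approx_net_charge = POSITIVE_RESIDUES - NEGATIVE_RESIDUES

# ProtParam convention: an instability index above 40 predicts instability.
predicted_unstable = INSTABILITY_INDEX > 40

# A negative GRAVY means the sequence is hydrophilic on average.
hydrophilic_on_average = GRAVY < 0

print(approx_net_charge, predicted_unstable, hydrophilic_on_average)
```

The excess of acidic residues is consistent with the theoretical pI of 5.61, which lies below neutral pH, and the instability index sits only marginally above the threshold.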

3.2. Homology modeling and validation

From sequence analysis and a BLAST search against the PDB database, the functional homolog of SV-5ʹ NUC in humans, H-5ʹ NUC (2J2C), with a crystal structure resolved at 2.2 Å, was identified as the template for homology modelling because of its lowest e-value of zero and high sequence coverage/identity of 95%. The last 77 residues were not modelled owing to the lack of structural information, and they are believed not to have any functional role in the enzymatic activity of SV-5ʹ NUC. Four models of the modeled region of SV-5ʹ NUC were generated, and the best among them was chosen as SV-5ʹ NUC1 based on its lowest molpdf and DOPE scores (Table S1). The modelled structure (SV-5ʹ NUC1) confirmed that it is a member of the 5_nucleotid superfamily of α/β hydrolases containing the HAD-IG-nucleotidase subfamily domain (residues 1-480). The modelled region of the SV-5ʹ NUC1 structure (95% identity and 97% similarity to 2J2C) is depicted in Fig. 1. SV-5ʹ NUC1 has a mixture of α/β folds with a single Rossmann-like domain containing haloacid dehalogenase (HAD)-like motifs, as shown in Fig. 2. HAD family members are recognised by the presence of three specific motifs in the sequence, {hhhhDxDx(T/V)}, {hhhh(T/S)}, and {hhhh(G/N)(D/E)x(3-4)(D/E)}, where "h" stands for a hydrophobic residue.
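
The three HAD motifs translate directly into regular expressions. The following Python sketch shows one way to scan a sequence for them; it is an illustration rather than the method used in this study. The set of residues counted as hydrophobic ("h") is an assumption, since the text does not define it, and the toy sequence is invented for the example and is not the SV-5ʹ NUC sequence.

```python
import re

# Assumption: residues treated as hydrophobic ("h"); adjust as needed.
HYDROPHOBIC = "AVILMFWYC"

# The three motifs quoted above, written as regular expressions.
HAD_MOTIFS = {
    "I": rf"[{HYDROPHOBIC}]{{4}}D.D.[TV]",
    "II": rf"[{HYDROPHOBIC}]{{4}}[TS]",
    "III": rf"[{HYDROPHOBIC}]{{4}}[GN][DE].{{3,4}}[DE]",
}


def find_had_motifs(sequence):
    """Return {motif name: [(1-based start, matched text), ...]}.

    Matches are non-overlapping, as produced by re.finditer.
    """
    sequence = sequence.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]
        for name, pattern in HAD_MOTIFS.items()
    }


# Invented toy sequence containing one copy of each motif.
toy_sequence = "MKVLIFDMDGTQQAVLLSPPIVLAGDKRSD"
toy_hits = find_had_motifs(toy_sequence)
```

Motif II is short and will match by chance in many proteins, so hits are only meaningful when all three motifs occur in the expected order and structural context.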

Validation of the SV-5ʹ NUC1 model with a Ramachandran plot revealed 85.1% of amino acid residues in the most favorable region, 14.4% in the additionally allowed region and 0.5% in the generously allowed region. Moreover, no residues were observed in the disallowed region (Fig. 2). Thus, the SV-5ʹ NUC1 model is stereochemically sound, with a reasonable distribution of backbone angles, which supports the acceptability of the built model. The G-factors, which indicate the quality of the dihedral angles, the covalent geometry and the structure overall, were −0.10 for dihedrals, 0.42 for covalent, and −0.11 overall. The overall main-chain and side-chain parameters, as evaluated by PROCHECK, are all very favorable. The ERRAT plot depicts the non-bonded interactions between different atom types and provides guidance for improving sterically hindered regions of the protein. The overall quality factor of the homology model in the ERRAT plot was 90.48%, with minor 'structure error' reflecting steric hindrance between a few amino acids (Fig. S1). As expected, Verify-3D also revealed that 91.67% of the amino acids in the current structure of SV-5ʹ NUC1 have a compatible 1D-3D score greater than 0.2. The SV-5ʹ NUC1 model has a Z-score of −0.82, within the range of native conformations of crystal structures, which further increased confidence in accepting the model. The crystal structure of H-5ʹ NUC and the SV-5ʹ NUC1 model were superimposed to confirm the striking conformational similarity between them. An RMSD of 0.5 Å was observed for the superimposed structures (Fig. 3). This minimal deviation in both backbone and side chains further supports the quality of the built model.

3.3. MD simulation

To check the stability of SV-5ʹ NUC1, the RMSD of the backbone atoms from the MD production run was plotted as a function of time in Fig. 3. The graph clearly indicates a significant change in RMSD during the initial 200 ps, after which the system stabilized with fluctuations of less than 0.3 Å. The RMSD between the energy-minimized model of SV-5ʹ NUC1 and the final structure from the MD simulation was low (0.512 Å). Furthermore, structural comparison of the energy-minimized structure with the structures generated throughout the MD production run indicates that the energy-minimized SV-5ʹ NUC1 model represents a stable conformation. The structure validation results suggest that the energy-minimized SV-5ʹ NUC1 model is suitable for molecular docking.
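
For reference, the RMSD between two conformations is the square root of the mean squared distance between matched atoms. The Python sketch below computes it for two coordinate sets that are assumed to be already superimposed; it does not perform the least-squares fitting that an MD analysis tool applies first. The function name and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in the same atom
    order. Assumption: the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical backbone atoms: every atom displaced by (3, 4, 0).
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
displaced = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
example_rmsd = rmsd(reference, displaced)
```

Applying the function to each saved frame against the energy-minimized starting structure would give an RMSD-versus-time curve of the kind discussed above.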

3.4. Binding-pocket prediction and docking analysis

To investigate the interaction between SV-5ʹ NUC1 and the PubChem ligands, the binding site was defined from the predictions of the SiteMap module in Schrödinger as well as from information available in the literature. The residues of the best site, SiteMap1 (Table S3), coincided with the information from the literature. The predicted site comprises the amino acid residues Asp52, Asp54, Tyr65, Thr72, Phe155 and Asp346, which are believed to be important for the ligand-protein interaction. Based on the literature and our SiteMap1 results, this site was chosen as the most favorable binding site for docking the ligands (vanillin and vanillic acid). Glide XP docking was then performed for both energy-minimized ligands in the validated binding pocket of the SV-5ʹ NUC1 protein. Vanillin forms a hydrogen bond with the side-chain OH of Tyr65. Vanillic acid forms hydrogen bonds with the side-chain OH groups of Tyr65 and Thr72. The observed binding poses and interaction maps are shown in Fig. 4 and Fig. S4. The binding pattern of vanillic acid leads us to speculate that its -COOH group could provide better interaction for inhibition than the -CHO group of vanillin. From the molecular docking results, vanillic acid was found to be the better inhibitor, with a lower Glide XP score and Glide energy score than vanillin (Table 2). Moreover, vanillic acid formed two weak hydrogen bonds (with Tyr65 and Thr72), compared with only one hydrogen bond for vanillin (with Tyr65). These computational results agree with our earlier experimental observation that vanillic acid is a better inhibitor than vanillin, based on the SV-5ʹ NUC IC50 values already reported for three other snakes: Naja naja, Daboia russellii and Trimeresurus malabaricus.

3.5. ADME analysis

We analyzed 44 physically significant descriptors and pharmaceutically relevant properties of the two lead compounds, including molecular weight, H-bond donors, H-bond acceptors, log P (octanol/water), log P MDCK, log Kp (skin permeability), human oral absorption and their status with respect to Lipinski's rule of five (Table 3 and Table 4). Lipinski's rule of five is a rule of thumb for evaluating drug-likeness; in other words, for determining whether a chemical compound with a certain pharmacological or biological activity has properties that would likely make it an orally active drug in humans. The rule describes molecular properties that are important for a drug's pharmacokinetics in the human body, including its ADME. However, the rule does not predict whether a compound is pharmacologically active. The two selected compounds were within the acceptable range of Lipinski's rule of five. For the two lead compounds, the partition coefficient (QP log P (o/w)) and the water solubility (QP log S), which are crucial when estimating the absorption and distribution of drugs within the body, ranged from −1.690 to 1.727 and from −1.582 to −4.691, respectively, while the cell permeability (QP PCaco), a key factor governing drug metabolism and access to biological membranes, ranged from 0.345 to 95. Overall, the percentage human oral absorption of the compounds ranged from 25% to 100%. All of these pharmacokinetic parameters are within the acceptable range defined for human use, indicating the potential of the compounds as drug-like molecules.
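
Lipinski's rule of five is simple enough to check by hand. The Python sketch below counts rule violations for the two ligands, using the common convention that at most one violation is tolerated. The class and function names are hypothetical, and the property values are approximate, structure-derived or literature-style figures supplied for illustration; they are not the QikProp output reported in Tables 3 and 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float   # g/mol
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    log_p: float        # octanol/water partition coefficient


def lipinski_violations(compound):
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        compound.mol_weight > 500,
        compound.h_donors > 5,
        compound.h_acceptors > 10,
        compound.log_p > 5,
    ])


def passes_rule_of_five(compound, max_violations=1):
    """Common convention: at most one violation is tolerated."""
    return lipinski_violations(compound) <= max_violations


# Approximate illustrative values (assumption: not taken from this study).
vanillin = Compound("vanillin", 152.15, 1, 3, 1.2)
vanillic_acid = Compound("vanillic acid", 168.15, 2, 4, 1.4)

for ligand in (vanillin, vanillic_acid):
    print(ligand.name, lipinski_violations(ligand), passes_rule_of_five(ligand))
```

Both ligands are small, weakly lipophilic phenolics, so they fall well inside all four limits.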

3.6. E-Pharmacophore mapping

The e-pharmacophore combines aspects of structure-based and ligand-based techniques. Incorporating protein-ligand contacts into ligand-based pharmacophore approaches has been shown to produce better enrichment than using ligand information alone. The method described here attempts to go a step beyond simple contact scoring by incorporating structural and energetic information from the Glide XP scoring function. On this basis, three common pharmacophore sites were observed in vanillin and vanillic acid. The energetically favorable sites common to both ligands consist of an acceptor group (A2), an aromatic ring (R6), and one H-bond donor (D4), as shown in Fig. 4. The distances and angles between them were calculated and are also shown in Fig. 4. These energetically favorable sites encompass the specific interactions between the ligands and the SV-5ʹ NUC1 protein. This information should prove helpful in the development of new SV-5ʹ NUC inhibitors.

3.7. Molecular electrostatic potential (MEP) analysis

Electrostatic interaction is a crucial part of the non-covalent interaction energy between molecules. The MEP on a molecular surface can be used to compare two molecules visually, to guide docking studies, and to identify the sites on a protein that interact with its ligands. Numerous studies have employed the MEP technique to relate the biological potency of different ligands to their potential values. The color scale for the MEP ranges from deep blue, representing the most negative potential, to deep red, representing the most positive potential. This analysis can also reveal the 3D spatial features of the binding cavity involved in protein-ligand interactions.


The main objective of this work was to identify the residues involved in the cleavage mechanism through theoretical calculations. The identification of inhibitors for Chikungunya virus has been hampered by a lack of structural insight into any of its proteins. Therefore, we chose to model the nsP2 protein, which plays a vital role in activating the nonstructural protein complex by cleaving it into the subunits nsP1, nsP2, nsP3 and nsP4. The model was validated by molecular dynamics simulation and various validation tools. The model was then subjected to flexible peptide docking, followed by e-pharmacophore mapping. Ligands with a fitness score of more than 1.0 were subjected to a rigid docking study. According to our docking analysis, the residues Gln1039, Lys1045, Glu1157, Gly1176, His1222, Lys1239, Ser1293, Glu1296 and Met1297 show crucial interactions with the nonstructural protein complex to be cleaved, and were considered an individual functional unit. Chikungunya virus replication and propagation depend on the nsP2 protein, so a chemical compound that inhibits this protein by targeting the key residues specified above may be therapeutically applicable. Based on the docking results, we can report four chemical compounds that may be potential inhibitors of the nsP2 protease. Furthermore, the backbone structural scaffolds of these four lead compounds could serve as building blocks in the design of drug-like molecules for the treatment of Chikungunya viral fever. Besides targeting the Chikungunya virus, the inhibitors may act against other members of the Alphavirus genus owing to the high sequence similarity among alphavirus proteins, which provides a clear potential path towards the identification of broad-spectrum drugs.