Research criteria and databases






Databases were used to search for and obtain data related to the research, such as the target protein, the ligand and the drug. These data come from different databases according to their type and structure. Because these databases are well managed and regularly updated, the structures obtained from them were treated as reliable.

RESEARCH COLLABORATORY FOR STRUCTURAL BIOINFORMATICS (RCSB) PROTEIN DATA BANK (PDB)

Figure 3.2.1 Homepage of RCSB Protein Data Bank

The RCSB Protein Data Bank is a publicly available resource centre for biological macromolecules. Only protein molecules with confirmed structures are accepted into this database, which makes it a reliable source of protein information. The molecules in the PDB are presented as 3D structures, which makes them convenient for molecular docking. The RCSB Protein Data Bank also allows the data to be downloaded in PDB format, a widely used format for 3D protein structures, and the sequence to be viewed in FASTA format. FASTA format is convenient for sequence analysis because the sequence can be copied and pasted directly into BLAST or phylogenetic analysis tools.

ZINC DATABASE

Figure 3.2.2 Homepage of ZINC Database

ZINC Database is a small-molecule database, and it is where the 3D structure of the ligand was obtained. ZINC Database currently provides more than 35 million compounds. The database is free for public use, although some of its contents are purchase-only. It allows files to be downloaded in several formats, such as MOL, SMILES and PDB.


Figure 3.2.3 Homepage of DrugBank

The DrugBank database is a unique, publicly available bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. At the time of writing, DrugBank contained 7759 drug entries, including more than 1600 FDA-approved drugs. The database allows users to download drug data in various formats, including MOL, SMILES and PDB.


Online tools were used to test the relevant data before they underwent the docking simulation. The tests included the Lipinski’s Rule of Five test, active-site prediction and sequence alignment. They were carried out on both the target protein and the ligands in order to check the validity of the structures. In addition, one online tool was used to carry out the docking simulation itself, alongside the standalone software.

Figure 3.3.1 Homepage of Molinspiration

Molinspiration is an online tool that offers free services for calculating important molecular properties such as logP, polar surface area and the numbers of hydrogen bond donors and acceptors. In other words, Molinspiration was used to carry out the Lipinski’s Rule of Five test and to calculate bioactivity scores for the ligand as well as the drug. This test indicates whether the ligand or the drug is likely to be suitable for oral administration.

SWISSDOCK

Figure Homepage of SwissDock

SwissDock is an online tool that offers a docking service to predict the molecular interactions that may occur between a target protein and a small molecule (ligand). SwissDock is based on the docking software EADock DSS, whose algorithm generates many binding modes either in a box (local docking) or in the vicinity of all target cavities (blind docking).

DoGSiteScorer


Figure Homepage of DoGSiteScorer

DoGSiteScorer is an automated pocket detection and analysis tool that can be used for protein druggability assessment. Its predictions are based on the calculated size, shape and chemical features of automatically predicted pockets, which are fed into a support vector machine for druggability estimation.


Standalone software was used to carry out the docking simulation between the target protein and the ligand or drug. Some standalone software was also used to view the 3D structures of the target protein, the ligand, the drug and the product of docking, which is the protein-ligand complex.

AUTODOCK TOOLS

Figure Graphical User-Interface of AutoDock Tools

AutoDock Tools is software, free for educational use, that can set up a docking simulation and predict how ligands interact with macromolecules. It has a built-in AutoDock Vina plug-in, which was used to perform the docking simulation. AutoDock Vina performs flexible-ligand docking, so the optimal geometry of the ligand is determined during docking. The results of the docking process were analysed in terms of RMSD and binding affinity.

PyMOL MOLECULAR VIEWER


Figure Graphical User-Interface of PyMOL

PyMOL is a standalone, downloadable molecular viewer. In this project it was the most important tool for a crucial part of the research, the structural alignment step. For some research, however, limitations in certain procedures require researchers to develop a new tool to be used as a PyMOL plugin. When the software is opened, it presents a viewer window with the Molecular Graphics System window above it, where all the settings can be configured. In the viewer window, the action buttons are listed at the bottom right as letter codes, such as the letter S, which shows the sequence of amino acid residues alongside the structure, together with the controls for moving the protein structure.

UCSF CHIMERA


Figure Graphical User-Interface of UCSF Chimera

UCSF Chimera is a program for the visualization and analysis of compound structures and their associated data. Results can be produced as images or animations, and the software is freely available for users to download. In this study it was used to convert compound data from MOL2 format to .pdb format. Among the tools available in UCSF Chimera are ViewDock, which is used to screen the orientations of docked ligands, and the Movie tool, which replays the trajectories of a compound.




During the data mining phase, all the data relevant to the research were collected. First, the 3D structure of the Adenomatous polyposis coli (APC) protein was searched for and found in the RCSB Protein Data Bank. The structure, with the PDB ID 3NMW, was then downloaded in .pdb format. Next, we searched the small-molecule ZINC Database for the ligand structure, which in our study was Lupeol. Several Lupeol structures were available, and we chose the one usually found in Arabidopsis thaliana. The file was downloaded in .mol2 format. Lastly, we searched DrugBank for the current drug, Celecoxib, and downloaded its file in .mol format.
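In this study the structure was downloaded through the RCSB website. For readers who prefer to script this step, the following is a minimal sketch that builds the public RCSB file-download URL for a PDB ID and, optionally, fetches the file. The function names are our own and are not part of any RCSB library, and the download helper needs network access.

```python
from urllib.request import urlretrieve

# Base address of the RCSB file download service.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Return the download URL of a PDB entry in .pdb format."""
    pdb_id = pdb_id.strip().upper()
    # Classic PDB IDs are four alphanumeric characters, e.g. 3NMW.
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id}.pdb"


def download_pdb(pdb_id: str, dest: str) -> str:
    """Download the entry to the path `dest` (requires network access)."""
    urlretrieve(rcsb_pdb_url(pdb_id), dest)
    return dest


if __name__ == "__main__":
    # Only prints the URL; call download_pdb("3NMW", "3NMW.pdb") to fetch it.
    print(rcsb_pdb_url("3nmw"))
```

Calling `download_pdb("3NMW", "3NMW.pdb")` would save the APC structure used here under the hypothetical file name `3NMW.pdb`.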


Before the docking simulation, the APC protein underwent a few tests and was prepared so that docking would run smoothly and accurately. First, after the protein structure had been downloaded from the RCSB Protein Data Bank in .pdb format, it was submitted to DoGSiteScorer, which calculates potential active sites on the protein surface from the 3D coordinates of the protein. The results were generated within five to ten minutes, depending on the server’s load, and were also sent to the user’s email.

We then opened AutoDock Tools and loaded the APC protein into it. Using the features available in AutoDock Tools, the water molecules were removed from the structure. We then added hydrogens to the macromolecule and merged all the non-polar hydrogens. After that, we added Kollman charges to the molecule; the total Kollman charge added was displayed at the same time. Afterwards, we detected the root of the ligand in the structure and changed non-rotatable torsion bonds to rotatable. We also chose the torsions for the molecule. Lastly, we saved the structure file in .pdbqt format, which is compatible with AutoDock Vina.
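The water-removal step was done through the AutoDock Tools interface. To illustrate what that step amounts to at the file level, the sketch below drops water records from PDB-format text. It assumes the standard fixed-column PDB layout, in which the residue name occupies columns 18 to 20, and it treats residues named HOH or WAT as water. It is not the routine AutoDock Tools uses internally, and the function name is our own.

```python
# Residue names treated as water (an assumption; HOH is the usual PDB name).
WATER_RESIDUES = {"HOH", "WAT"}


def strip_waters(pdb_text: str) -> str:
    """Return PDB text without ATOM/HETATM records that belong to water."""
    kept = []
    for line in pdb_text.splitlines():
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Columns 18-20 (1-based) hold the residue name in the PDB format.
        residue_name = line[17:20].strip()
        if is_atom_record and residue_name in WATER_RESIDUES:
            continue  # skip water atoms
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("3NMW.pdb") as src:
        cleaned = strip_waters(src.read())
    with open("3NMW_nowater.pdb", "w") as dst:
        dst.write(cleaned)
```

Hydrogen addition, charge assignment and conversion to .pdbqt are not covered by this sketch and were carried out in AutoDock Tools as described above.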


The downloaded Lupeol structure was submitted to the online tool Molinspiration to evaluate Lipinski’s Rule of Five, which involves calculating logP, the number of hydrogen bond donors, the number of hydrogen bond acceptors, the molecular weight and the number of rotatable bonds. The structure was then loaded into UCSF Chimera to be converted into .pdbqt format for compatibility with AutoDock Vina.
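Once the properties have been calculated, the Rule of Five check itself is simple arithmetic. The sketch below applies the four classical criteria (molecular weight of at most 500, logP of at most 5, at most 5 hydrogen bond donors and at most 10 hydrogen bond acceptors) to values supplied by the user. Rotatable bonds are reported by Molinspiration but are not one of the four criteria, so they are left out. The class and function names are our own, the example values are placeholders rather than results from this study, and allowing one violation is a common convention that we assume here.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int      # number of OH and NH groups
    h_bond_acceptors: int   # number of N and O atoms


def lipinski_violations(p: MolecularProperties) -> list:
    """Return a list describing each Rule of Five criterion that is violated."""
    violations = []
    if p.mol_weight > 500:
        violations.append("molecular weight > 500")
    if p.logp > 5:
        violations.append("logP > 5")
    if p.h_bond_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if p.h_bond_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    return violations


def passes_rule_of_five(p: MolecularProperties, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return len(lipinski_violations(p)) <= max_violations


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = MolecularProperties(mol_weight=350.0, logp=6.2,
                                  h_bond_donors=1, h_bond_acceptors=3)
    print(lipinski_violations(example), passes_rule_of_five(example))
```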


As with the Lupeol structure, the downloaded Celecoxib structure was submitted to Molinspiration to evaluate Lipinski’s Rule of Five, which involves calculating logP, the number of hydrogen bond donors, the number of hydrogen bond acceptors, the molecular weight and the number of rotatable bonds. The structure was then loaded into AutoDock Tools to be converted into .pdbqt format for compatibility with AutoDock Vina and into .mol2 format for compatibility with SwissDock.


The last and most important part of the method is the docking simulation. For this study we used two different docking programs with the same flexible docking mechanism: AutoDock Vina and SwissDock.

AUTODOCK VINA

First, the .pdbqt files of the APC protein structure and Lupeol were placed in a folder. The two main executable files of AutoDock Vina, “vina.exe” and “vina_split.exe”, were then placed in the same folder along with a configuration file named “conf.txt”. The configuration file contains all the settings needed to run the docking simulation, including the grid size, receptor, ligand and output file. The configuration was as follows:


Figure 3.3.6 Configuration file
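As an illustration of what such a configuration file can look like, the sketch below writes a conf.txt using AutoDock Vina's standard option names (receptor, ligand, out, center_x/y/z, size_x/y/z, exhaustiveness) and builds the matching command line. The file names, grid centre and grid size are hypothetical placeholders, not the settings used in this study, and the helper functions are our own.

```python
def build_vina_config(receptor, ligand, out, center, size, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center   # grid box centre, in angstroms
    sx, sy, sz = size     # grid box edge lengths, in angstroms
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


def vina_command(config_path="conf.txt", executable="vina.exe"):
    """Return the command that runs Vina with the given configuration file."""
    return [executable, "--config", config_path]


if __name__ == "__main__":
    # Hypothetical file names and grid values, for illustration only.
    config_text = build_vina_config(
        receptor="apc.pdbqt", ligand="lupeol.pdbqt", out="apc_lupeol_out.pdbqt",
        center=(0.0, 0.0, 0.0), size=(20, 20, 20),
    )
    with open("conf.txt", "w") as handle:
        handle.write(config_text)
    print(" ".join(vina_command()))
```

Typing the printed command at the command prompt, in the folder that holds the executables and the .pdbqt files, starts the docking run. Keeping the grid settings in one function makes it easier to reuse identical settings for the Celecoxib run.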

The docking runs were started manually from the command prompt of the computer's operating system. The same steps were repeated with the APC protein structure and Celecoxib, using the same settings, to obtain accurate and fair results.

SWISSDOCK

In SwissDock, the APC protein structure and Lupeol were first uploaded to the SwissDock server. The APC protein structure was in .pdb format, while Lupeol was in .mol2 format. After the relevant information in the submission form had been filled in, the job was submitted; all the parameters and settings were configured automatically by SwissDock. The docking process took a few hours, depending on the server’s load and job queue. Lastly, the results were sent to the user’s email.