Distributed Automated Docking Of Flexible Ligands To Protein Biology Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The author also defines autodock as an unbiased technique which includes an SA (Simulated Annealing) search with a rapid grid based energy evaluation method. Autodock predicts the most probable positions of binding of a ligand to a rigid macromolecular/protein target. This method is used to predict the binding of substrate and enzymes. This version (2.4) of auto dock has enhanced features of techniques of coordinate manipulation and energy evaluation. It has also been developed on UNIX platform to distribute multiple docking jobs on a cluster of a variety of workstations and then analyze the result.

The materials and methods part the paper describes the SA technique. SA allows a wide variety of degrees of freedom to be searched during the experiment but it does not guarantee that it will find a global minimum conformation. The grid technique gives a detailed energetic model, but can be used only in case of rigid targets.

A docking job is one process started by one command in UNIX prompt and is controlled by one docking parameter file and each parameter file may have separate dockings. Each docking process is started with a ligand in one particular position and untill finally it arrives at a docked conformation. After all the dockings are completed a clustered analysis is carried out on the docked conformations. In each process of SA, a particular number of cycles are conducted at different annealing temperature which is progressively reduced during the progress. In each cycle the conformation and position of the ligand are changed to obtain new states, thus the ligand is said to have made a move or step. This step may be accepted or rejected depending on the annealing temperature, the energy of the new state and the energy of the previous state.

The ligand molecule is allowed to move in the six spatial degrees of freedom for orientation and torsional degrees of freedom specific to the ligand. The user may choose to start the process in a particular conformation and location or may have a new docking run at randomly chosen states , but in a randomly chosen state during the SA run may cause the clash of some or all the atoms of the ligand with the receptor and may trap it in the small interstices of the protein ,this will result in a series of unproductive rejections and a final energy of 10^5 K cal per mol or even more. This is prevented by a parameter called 'eOmax'. That is, if a state having energy higher than that of the criteria is chosen, then a new random state is chosen until the criteria is satisfied. The user may limit the number of times this process is to be carried out, if still a suitable energy state cannot be found then the state with minimum energy during the iterative stages is taken as the starting point for docking

Each new state is generated by adding or subtracting a random fraction of a maximum step size for each of the variables and then converted into Cartesian coordinates. The user may also specify the starting and ending values and ask the program to correct or calculate the correct factor based on the SA cycles.

A problem occurs when the ligand encounters face of grid map, A two dimensional movement occurs which is similar to that of the proposed molecule encountering a protein surface, here the ligand spends unproductive time in exploring artificial surfaces; this problem was rectified by introducing a reflective barrier. So, if the ligand is to intersect an energetic wall ,two times the translation is applied. This causes the ligand to bounce away from faces back into the relevant portions in the space.

The energy evaluation is carried out on a grid based method and is used to precalculate grid maps. The protein is place in a 3 dimensional grid and a probe atom visits each grid point. For each grid point the pair wise interaction energy of the probe is summed all over the protein atoms within a non bonded cutoff radius of 8 Angstrom. A table is created from which interaction energies can be calculated. Different grids are calculated for different atoms in the ligands.


These are of two types

Soft Constraints are specified by preferred torsion angle and the constraints half width used to define inverted Gaussian energy profile. Hard Constraints keep the torsion constraints exactly as the user had mentioned and does not allow the torsion angle to deviate from these bounds. The analysis of docking results is accomplished by 'Structure binning'. The coordinates of the lowest energy conformation are compared by rmsd ( Root Mean Square deviation ) with those of all other conformations .A list of similarly docked conformations from lowest to highest energy order .If a reference structure is known such as the X ray crystallographic structure then the rmsd of each of the conformation can be compared with it order

Distributed docking

The docking was carried by a cluster of Hewlet Packard precision Architecture 9000/735 stations. The grouping scripts combines all the available docked conformation that have been the output of autodock jobs. Autodock can be run with a clustering parameter file, which outputs the confirmations into clusters ranked by energy, the output can be visualized by software such as AVS (advanced visual Systems).

The paper also speaks of test systems and protocols used. Two sets of experiments were carried out

1). to test the ability of autodock ability to predict binding models and

2). Effect of changing different SA parameters.

Dimensionality of 6-17 degrees of freedom was considered, all crystallographic structures were taken from Brookhaven Protein Data Bank. For every test case, grid spacing of 0.375 angstroms was used with 61 points in each Cartesian direction making a total of 226981 points per grid. Therefore the grid dimensions were 22.875 *22.875*22.875 Angstrom. The aim was to determine whether SA could produce experimentally observed X Ray crystallographic similar structure. A 100 docking runs were performed in each case distributed evenly among 10 HP Precision Architecture 9000/735 stations. Each run consisted of 50 annealing cycles. A cycle terminated if the ligand made 25000 accepted or 25000 rejected moves whichever came first, the annealing temperature RT was 616 cal per mol for the first cycle ad reduced linearly at the end of cycle .

The second experiments was to determine the choice and effect of SA parameters , these tests were carried out using benzamide docking to beta trypsin (3ptb), camphor to cytochrome P 450 (2cpp), Phosphocholine to Fab McPC 603 (2mpc) , biotin with streptavidin (1stp), XK-263 with HIV-! Protease (1hvr), sailic acid with hemagglutinin (4hmg) .

It was concluded that SA is an powerful technique to solve variety of NP complete problems. It was observed that the method of starting the current cycle with minimum energy state from previous cycle is superior to to use the last state. A standard schedule of 100 runs, 50 cycles, and 25000 accepts and rejects was considered. For larger ligands a divide and conquer strategy was successful in applying SA. In this approach we dock a fragment of the molecule first, then freeze or constrain the formation and adding another flexible fragment incrementally building up to a final docked conformation

The techniques for docking can be categorized into following two types

1). Matching Methods: An active site model is created which is then tried to dock a given rigid ligand by finding the correct match of the geometry and

2). Docking simulation: In which the ligand starts from outside the molecule and explores each and every place on the macromolecule until a perfect site is found.

Autodock is an example of flexible docking technique and utilizes a grid for calculating protein ligand interaction. The early versions of autodock used Monte Carlo Simulated anneling search but this search method had a few drawbacks like only ligand with 8 rotatable bonds or less could be used and did not support more degrees of freedom, hence the two major changes in the new version of autodock were made, which are

1). Search method used is Genetic Algorithm GA, a local search and a global search based on Lamarckian genetic Algorithm LGA.

2).And empirical binding free energy force field which predict binding free energy.

Genetic Algorithm

In genetic algorithm each of the ligand state variables like orientation, translation and conformation of ligand correspond to a gene the atomic coordinates correspond to a phenotype. GA considers genome as a bit string of fixed length and uses binary cross over and binary mutations to generate new individuals and GA also allows autodock to deal with more degrees of freedom.

Working of Genetic Algorithm:

The GA initially starts with a random population, say about 50-300, each gene corresponds to a ligand state variable and then genetic operations like crossover and mutation are carried out. A generation counts of the following stages

1). Mapping : It converts each individual genotype to corresponding phenotype by which the individual fitness can be determined.

2). Selection: In this, individuals that are more fit proportionally have more offspring's and hence are selected.

3). Crossover and Mutation are based on user determined rates like one point crossover, two point crossover, uniform crossover and arithmetic crossover are performed these processes are similar to those observed in nature . This is followed by mutation which is done by addition of a random real number which has Cauchy distribution..

4). Another parameter called elitism finds how many of the top individuals will make it to the next generation. The GA is carried out again and again until any one of the termination criteria are met. At the end autodock displays the fitness, the state variables and the estimated free energy of binding.

Lamarckian Genetic Algorithm (LGA)

In LGA the phenotypic adaptations of an individual to its environment can be mapped to its genotype and then inherited by its offspring's. The GA applys the principles of Mendelian genetics, that there is one way transfer of information from genotypes to phenotypes but in case where inverse mapping exists the phenotypes give rise to genotype. Genotype corresponds to state variables and phenotype corresponds to atomic coordinates.

The genotypic space is defined by genetic factors like crossover and mutation by which parents give rise to children and phenotypic space is defined by the genetic function. In LGA the local search is used to determine fitness of an individual in the GA. Only if the developmental mapping function is invertible then only converting the phenotype to genotype is possible. This developmental mapping converts genotypic state variables of molecule into corresponding atomic coordinates. The hybrid global -local search is carried out in genotypic space rather than phenotypic space, Solis and Wets operators are used to carry out this process. Solis and Wets search has an advantage that it does not require gradient information in order to proceed.

The figure above shows genotypic and phenotypic search, the lower horizontal line is the genotypic space and the line above is the phenotypic space. Phenotypes are got from genotypes. The fitness function is given by f(x).The genotypic mutation takes place as shown on the right side of the figure and the local search is done as shown on the left side of the figure and is carried out in phenotypic space.

Kitchen et. Al

In process of docking the complexity increases with advancement in each step which is based on the algorithm and scoring function used to determine biological activity.


Grids were designed to determine energy calculation and play a vital role in determining the time and money involved in the process of docking. If the ligand had to visit each and every point and crevices on the macromolecule it would take a long time to cover the entire surface of the molecule and would not be of much worth. Hence the grid function divides the molecule in to small grids or squares and calculates the average energy of the small box. If the average energy of the ligand interaction is observed to be near to that of the average grid energy then that energy and conformation are selected, this reduces time and cost involved in visiting each place on the molecule.

The algorithms based on flexibility of ligand are divided into three categories

1). Systematic methods: A typical example of an algorithm which divides the ligand into fragments and then again combines the fragments to get the ligand with fragments having acceptable score.

2). Random Search: This method makes random changes in the ligand or entire group and the recently obtained ligand is compared or evaluated to the earlier ligand.

3). Simulation Methods: Molecular dynamics is the current best field in simulation search methods, but as simulations are not able to overcome the energy barriers different parts of the protein ligand complex are tried to simulate at different temperatures.

The is a lot of advancement concerning ligand flexibility but very less in case of protein flexibility so we have a few approaches which consider a part of the protein to be flexible. Some of the models include Monte Carlo calculations, protein assembly grids, molecular dynamics and rotamer libraries base on algorithm called dead end elimination.

Scoring of the predicted function between ligand and protein plays an important role in determining the correct pose. The scoring function has been categorized as

1). Force field based: It considers entire energy to a system as sum of various molecular dynamics and only one protein conformation. The autodock is based on AMBER force field. Protein ligand interactions are mainly described by Vander Waals forces which are based on Lenard Jones Potential and also by electrostatic energy.

2). Empirical Scoring function: These scoring function are used to obtain the same results as that of the experimental data the basis of empirical scoring function is that sum of un correlated terms of individuals can be used to approximate binding energies. The coefficients of various terms are obtained by regression analysis.

3). Knowledge based scoring function: These reproduce experimentally observed structures rather than binding energies.

4) Consensus scoring: As the name says, this function take s the consensus (information) of various scores to aptly balance the error caused from a single score and enhances the probability to find an actual ligand.

The rigid molecules are predicted at a higher percentage correctly compared to flexible molecules having rotatable bonds. The scoring functionality can be improved by including salvation and rotational entropy parameters but including these contributions is computationally expensive.

Structure based virtual screening: Assessment of large number of structures which have the most probability to bind to a target. This process involves docking and scoring functions which estimate the likelihood of ligand binding to a protein. It is seen that large molecules have large number of hypothetical interactions and hence have a better score as compared to small molecules.

Docking and simulations are vastly used to analyze features of active sites which include hydrophobic and hydrophilic interactions. Though huge advancements made in the field of structure based drug designing and lead optimization, scoring function and ranking are the areas which still need to be improved upon. The relationship between docking and scoring is a complex one, and it is easier to produce reliable models of bound ligand than to find the ligand. Pharmacophore models are combined with docking to simplify the number of compounds in case of intricate scoring calculations.

Grid Parameter File (GPF)

The .gpf file tells autodock which types of maps are to be computed and also the location of these maps. It also specifies the pair wise potential energy parameters. For each ligand one map calculated for each of the element and another electrostatic map. Based on the atoms in the macromolecule the 12-6 Lenard Jones energy parameters, Rij- equilibrium internuclear separation and epsij- energy well depth are specified for each map. If hydrogen bonding is being considered then 12-10 will be specified instead of 12-6 parameters.

Docking Parameter File (DPF)

The docking parameter file has the extension .dpf and directs autodock which maps to be used, where to start with the ligand, what is the center and the number of torsions and which algorithm to be used out of the four namely SA (Simulated Annealing); GA (Genetic Algorithm); LS (Local Search) and LGA (Lamarckian Genetic algorithm). Each of these algorithm have their own parameters which need to be set before running the docking process, these include ; the kind of random number generator to use, the step size and also how long will be the run.

The parameters of dpf are given below,

1). 'seed'-It tells which random number generator is to be used.

2). 'types'- It tells names of all the atoms in the ligand.

3). 'fld'-grid data file created by autodock.

4). 'map'-file name of the first autogrid affinity grid map

5). 'move'- filename of the ligand to be docked.

6). 'about'- coordinate reference frame in autodock.

7). 'tran0'-each of the new run starts ligand from this point.

8). 'quat0'- initial quaternion Qx, Qy , Qz, Qw.

9). 'ndihe'-number of rotatable bonds in ligand.

10). 'didhe0'-user set initial relative dihedral angles.

11). 'tstep'-the maximum translational jump allowed per step.

12). 'qstep'- maximum orientation of angular component w of quaternion, default is 50 degrees.

13). 'dstep'- maximum dihedral step size

14). 'torsdof' torsional degrees of freedom

15).'intelec'-calculate internal ligand electrostatistic energies.

16). 'outlev'-diagostic output level. Use 1 for SA and 0 for GA and LGA.

17). 'rmstol'root mean square deviation tolerance for cluster analysis.

18). 'rmsref'-the rmsd for docked conformations calculated according to coordinates in the pdbq file .

19). 'extrng'-external grid energy, assigned to atom outside the grid. Default 1000 kcal/mole.

20). 'analysis'- performs a clustered analysis on the docking and output results to log file.