This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Proteins are fantastically diverse biopolymers formed by sequential arrangement of 20 possible amino acids. These exceptional macromolecules are found in all living organisms and perform all types of biological function from purely structural to enzymatic. Theoretically proteins can be uniformly denatured into a flexible and distorted state, or they can be folded into units of secondary structure. The work of Pauling and Corey in 1951 revealed several units of secondary structure and among the most common are the alpha-helix and the beta-sheet (Pauling and Corey, 1951). The tertiary structure of proteins shows the organization of units of secondary structure into three-dimensional motifs or domains, which often give evolutionary information or clues about its function (Lupas et al., 2001). These domains have often specific tasks such as transporting biomolecules, performing catalytic steps or simply binding or recognizing other entities. The quaternary structure relies on the formation of multi-proteins complexes, such as the tetrameric haemoglobinor more complex transmembrane channels and such organization is often essential for activity.
Proteins are amenable to chemical structure examination by visualising frozen snapshots of the protein using X-ray crystallography or by building the protein model in solution from NMR signals of isotopically labelled N, C, and H atoms (Doerr, 2006). The three-dimensional models built using different methods do not give identical structures. Molecular models are abstract representations of the real objects and therefore care should be exercised when extracting information from them. Three-dimensional models of proteins that do not have any ligand bound are called apo forms. Proteins often crystallize with one or more ligands bound to it and this give extra information, as the residues important for binding can be determined. When studying a specific protein it is advisable to observe as many different structures as possible, as protein conformation vary when different ligands are bound and this gives a picture of the dynamics of the protein. Actives sites of proteins have regions of high flexibility as this allows the exchange of ligands and the proper recognition of substrates (Carlson, 2002). Historically the lock-and-key theory proposed by Emil Fisher in 1894 (Fischer, 1894) was the first principle to be adopted for explaining the specificity of protein-ligand interactions. The rigidity of the lock-and-key model did not account for all observable interactions and in 1958 the induced fit theory was proposed (Koshland, 1958), stating that upon substrate binding the enzyme changed its 3D structure in order to ensure a geometric fit between the ligand and the protein. More recently several research groups have postulated the pre-existence of an ensemble of conformational states that coexists in equilibrium (Ma et al., 1999, Carlson, 2002, Tobi and Bahar, 2005) each one of them with a different energy level. When a binding molecule interacts with a specific conformation decreasing the free energy, it stabilizes the complex and creates a shift in the population towards that particular conformation. Ultimately, if there is enough ligand, all the protein molecules would bind to the ligand, and in the case of inhibitors, all protein molecules will be captured in an energy minimum, a desired situation in target disruption.
Members of Mycobacterium tuberculosis complex (Actinobacteria) are acid-fast, slow-growing bacteria, causing tuberculosis (TB) in humans (Cole, 2002). The bacilli with a dimension of 0.2-0.5 µm infect principally the lungs where nodules or "tubercules" appears, but it can also colonize the lymph, liver, spleen, brain, skin and almost every organ. Because of its antiquity and severity it has been a well-known disease to humans. During the 19th and 20th centuries, before the emergence of the first antibiotics, it reached epidemic proportions and was almost always fatal (Daniel, 2006). The World Health Organization (WHO) declared TB to be a global health emergency with the increasing number of cases during the 1990-2000 decade. This phenomenon was attributed principally to the emergence of multiple drug-resistant strains (MDR and XDR) and the association of human immunodeficiency virus (HIV) and TB (Brewer and Heymann, 2004). Novel drugs are clinically required for treating resistant strains of M. tuberculosis. They must be safer than second line drugs, and more potent to prevent rapid emergence of drug-resistance and for shortening the length of treatment (Ginsberg and Spigelman, 2007). Also they must not interfere with anti-retroviral therapy. Although there are increasing efforts for developing novel anti-TB drugs, particularly supported by the TB Alliance, it is not clear whether there are enough leads entering at the beginning of the pipeline to assure the appearance of a new full regimen in the next decade (Casenghi et al., 2007).
Peptidoglycan is an essential polymer of bacterial cell wall that gives shape to the cell and serves as containment for internal cytoplasmic pressure (Heijenoort, 2001). In Mycobacterium species, the peptidoglycan is the basal macromolecule that supports the essential mycolyl-arabinogalactan complex responsible for the acid-fast properties of this bacterium (Crick et al., 2001, Hett and Rubin, 2008). The biosynthesis of peptidoglycan occurs in several steps both in the cytoplasm and in the periplasmic space. Î²-lactams, which include several useful antibiotics such as the penicillins, the carbapenems and the cephalosporins, target the final cross-linking periplasmic steps of peptidoglycan. Others antibiotics such as bacitracin, the glycopeptides (vancomycin) and cycloserine also target the peptidoglycan biosynthesis pathway (Chopra et al., 2002). Because it is a well-validated essential pathway for bacteria, the development of selective inhibitors of other essential enzymes of the pathway may provide novel useful antibacterials. ATP-dependent Mur ligases are central to the peptidoglycan biosynthetic pathway as they sequentially add amino acids to the peptide chain of the peptidoglycan monomer uridine-diphosphate N-acetylmuramic acid (UDP-MurNAc) (Zoeiby et al., 2003). Recently the MurE ligase of Mycobacterium tuberculosis was cloned, over-expressed, purified and three-dimensionally characterized by X-ray crystallography in our research group (Basavannacharya et al., 2010a, Basavannacharya et al., 2010b). This enzyme uses ATP to add meso-diaminopimelic acid to a uridine-diphosphate-N-acetyl-muramic acid-L-ala-D-glu (UDP-MurNAc-dipeptide) forming a tripeptide. In this work we take a step further in the study of this enzyme by using computational tools for modelling potential ligands which may bind and potentially inhibit the activity of this enzyme.
2. Basic knowledge of the MurE ligase from Mycobacterium tuberculosis
2.1. Homologues, paralogues and orthologues
In the course of biologic evolution, gene duplication is the most important mechanism for the generation of the astonishing diversity of proteins found today in living organisms. The term homologues is used to designate structures or sequences that evolved from a single ancestor structure or sequence (Gogarten and Olendzenski, 1999). In 1970, Fitch proposed two more terms to designate different subclasses of homologous proteins: paralogues and orthologues (Fitch, 1970). Paralogues structures or sequences are homologues that appeared as a result of gene duplication in the history of an organism, for example human Î±- and Î²-hemoglobin are paralogues as they evolved from the same ancestor protein by a gene duplication event. Orthologues structures or sequences are homologues that resulted only because of a speciation event, human and mouse Î²-hemoglobins are orthologues proteins.
In the genome of M. tuberculosis the ATP-dependent Mur ligases C, D, E and F which catalyse the sequential peptide bond formation of UDP-MurNAc with L-ala, D-glu, m-DAP and D-ala-D-ala respectively coupled to the hydrolysis of ATP, are considered as paralogues proteins among them as they are the result of gene duplication events. Most probably the gene duplication event occurred before speciation, because we can find all ATP-dependent Mur ligases C, D, E and F in the dcw operon of several prokaryotes such as M. leprae, Escherichia coli, Bacilus subtilis, Haemophilus influenzae, Thermatoga maritima, Streptococcus pneumoniae, Staphylococcus aureus, Chlamydia sp, Pseudomonas aeruginosa and many others (Azzolina et al., 2001, Vicente et al., 1998, McCoy and Maurelli, 2006, Smith, 2006). In summary the MurE protein of M. tuberculosis has three paralogues in the same species (MurC, MurD and MurF) and several thousand MurE orthologues in different species of prokaryotes. Although a paper from 1992 suggest the possibility of endogenous synthesis of peptidoglycan in eukaryotic cells (Roten and Karamata, 1992) there is no evidence of the existence of Mur ligases enzymes in the human genome (Anishetty et al., 2005, Shanmugan and Natarajan, 2010). Interestingly the biologically widespread folylpolyglutamate synthases (FPGS) which catalyzes the addition of glutamates to folates (Shane, 1989) act by the similar mechanism as ATP-dependent Mur ligases. These FPGS occurs in mitochondria and in the cytoplasm of human cells (McGuire et al., 2000) and they constitute antitargets for the development of bacterial Mur ligases inhibitors. In other terms, a broad spectrum inhibitor of bacterial Mur ligases can also inhibit eukaryotic FPGS which can led to cytotoxic effects on the host. The sequence alignment of MurE from M. tuberculosis with the other ATP-dependent Mur ligases from M. tuberculosis and with MurE from Escherichia coli and human FPGS are shown in Figure 1. This alignment shows that the Mur ligases have a much higher similarity between them, compared to human FPGS, suggesting that it is possible to develop selective inhibitors against Mur ligases without having a disturbing effect on human FPGS.
Figure 1. Sequence alignment of MurC, MurD, MurE and MurF from M. tuberculosis, MurE from Eschericha coli and human FPGS using MAFFT and SeaView.
2.2. X-ray structure analysis
To date, two different MurE structures of Mycobacterium tuberculosis have been solved by X-ray crystallography. They have been deposited in the Protein Data Bank with accession codes 2wzt and 2xja. The first deposited structure 2wtz was solved at 3.0 Å resolution with an R-work value of 0.195 and an R-free value of 0.251. The deposited structure 2xja was solved at 3.0 Å resolution with an R-work value of 0.193 and an R-free value of 0.258. The R-work and R-free values are statisticals used to validate the model of the crystal structure (Kleywegt and Brünger, 1996). The R-work value measure the discrepancy between the amplitudes of the observed (experimental) and calculated (theoretical from the obtained model) reflections. This statistical is closely related to the crystallographic residual which is minimized in the refinement process. Therefore the R-work value does not necessarily tell how good a model is, and there are cases when incorrect models have low R-work values. The cross-validation method was developed by Brünger in 1992 when the R-free was introduced (Brunger, 1992). Basically the R-free is the same as the R-work only that it is calculated from a 5-10 % subset of reflections of the data that are not taken initially for the refinement. Both 2wtz and 2xja structures were solved to a moderate resolution of 3.0 Å and therefore accurate interpretation of atomic positions is not possible. Researchers in the field have proposed that good docking results should be obtained for targets prepared from crystallographic models at atomic resolution (Kontoyianni et al., 2003, Krovat et al., 2005), meaning that useful models should have a resolution below 2 Å. Although the MurE models of M. tuberculosis do not have a resolution below 2 Å, these are the only 3D structures available for this particular enzyme of the pathogenic organism and therefore we will base our modelling on the available models of this protein.
The MurE protein from M. tuberculosis crystallizes in triclinic system, namely a P1 symmetry group (space group containing only translations as elements of symmetry) and contains four different chains per asymmetric unit. This enzyme is named as UDP-N-acetylmuramoyl-L-alanyl-D-glutamate-2,6-diaminopimelate ligase (E.C. 188.8.131.52) and its sequence consists of 535 amino acids. The enzyme has a molecular weight of 55.4553 kDa and has a modified residue N6-carboxylysine in position 262. The MurE enzyme is a three-domain protein (Figure 2). The first domain (1-139) is consistent with a Rossmann fold (Schulz, 1992, Bertrand et al., 1997) typical of nucleotide binding proteins, and is responsible for binding to UDP (Basavannacharya et al., 2010b). The second domain (140-378) is the central one and is responsible for binding ATP and activates the UDP-MurNAc-L-ala-D-glu substrate. Finally the third domain (379-535) is the C-terminal domain involved in recognition and binding of the meso-diaminopimelic acid (m-DAP) substrate (Basavannacharya et al., 2010b).
Figure 2. Ribbon structure of M. tuberculosis MurE protein (PDB:2wtz) showing the three different domains: domain 1 in marine blue, domain 2 in cyan and domain 3 in air blue.
2.3. Enzymatic mechanism of MurE
Since all ATP-dependent Mur ligases C, D, E and F are topologically similar to each other, sharing homologous regions (Bertrand et al., 1997, Smith, 2006), and they all catalyse the formation of an amide peptide bond driven by the hydrolysis of ATP (Zoeiby et al., 2003), it is well accepted that they share the same catalytic mechanism (Figure 3), proposed in the first instance by Bertrand and collaborators (Bertrand et al., 1999). According to the reported mechanism adapted for the MurE of M. tuberculosis, the first step is the activation of the UDP-MurNAc-L-ala-D-glu by phosphorylation of the Î³-carboxylic group of D-glu to form an acyl phosphate assisted by a magnesium cation, lysine-247 and histidine-248 (Wallace et al., 1995, Basavannacharya et al., 2010a) releasing ADP. Then, one of the deprotonated amino groups of m-DAP attack nucleophilically the carbonyl group of the acyl phosphate of the glutamic acid of UDP-MurNAc-L-ala-D-glu with the assistance of the glutamic acid-220 and lysine-347 residues. Thereafter the charged oxygen of the acyl group forms again the carbonyl group and the phosphate leaving group is finally liberated, forming the final product of the reaction UDP-MurNAc-L-ala-D-glu-m-DAP.
Figure 3. Proposed enzymatic mechanism of the ligase reaction of MurE from M. tuberculosis producing UDP-MurNAc-L-ala-D-glu-m-DAP tripeptide product.
2.4. Essential residues for MurE activity
A recent work described the generation of mutants of M. tuberculosis MurE on essential amino acids necessary for ligase activity (Basavannacharya et al., 2010a). The amino acids lysine-157, glutamic acid-220, aspartic acid-392 and arginine-451 were replaced by alanine, and asparagine-449 was replaced by aspartic acid. A truncated form of the MurE protein was also generated which lacked the first 24 amino acids from the N-terminal. The results of enzymatic activity showed that all the mutated amino acids were important for catalytic activity of the enzyme. In contrast the truncated form of MurE had the same activity as the wild-type MurE, indicating that the first 24 residues are not essential for the activity. Lysine-157 is part of the ATP binding consensus sequence GxxGKT (Walker et al., 1982, van der Wolk et al., 1995) and its replacement for alanine considerably decreased MurE activity. Glutamic acid-220 is part in a second recognition motif for ATP (Walker et al., 1982) which helps in the chelation of Mg2+ and the proper approximation of ATP and UDP-MurNAc-L-ala-D-glu (Basavannacharya et al., 2010a). Aspartic acid-392 is also involved in binding to ATP as its mutation for alanine results in a decrease in the affinity for ATP which translates in a decreased activity. Arginine-451 and asparagine-449 belongs to a conserved DNPR motif involved in the specific recognition of m-DAP (Gordon et al., 2001) and their mutations also decreased MurE activity considerably (Basavannacharya et al., 2010a).
According to the sequence alignments, we can recognize three different types of regions, those which are shared by the ATP-dependent peptide forming enzymes, those regions which are shared between the Mur ligases among them, and those regions which are shared between the MurE ligases which recognize m-DAP as the incoming amino acid. There is a clear hot spot in the ATP-recognition region centred on essential lysine-157 with a sequence shared among ATP-dependent peptide ligases TGTxGKxxT. A PGDxxx sequence found in the third domain from proline-494 to leucine-499 is common to all ATP-dependent Mur ligases from M. tuberculosis. Moreover the sequence GDRDP from glycine-422 to proline-426 is common to MurE from M. tuberculosis and Human FPGS. Finally the essential motif DNPR from aspartic acid-448 to arginine-451 is involved in the m-DAP substrate recognition and is therefore common to all m-DAP adding MurE ligases (Basavannacharya et al., 2010a).
3. Docking activities
3.1. Computational docking and softwares
Over the past few decades, computational methods have been used widespread for the identification of appropriate lead molecules in drug development projects in pharmaceutical industry. The term virtual screening (VS) emerged in the 1990 decade for describing the process of using computational tools to identify a reduced set of molecules with increased potential for biological activity (Alvarez, 2004, Klebe, 2006). Basically there are two types of approaches in VS, those techniques based on the ligand, aiming to identify molecules behaving as endogenous ligands, and those techniques based on the target or receptor, aiming to identify molecules having an interaction with the macromolecule (Lyne, 2002, Alvarez, 2004). The first approach is called ligand-based virtual screening as opposed to the structure-based virtual screening approach, also known as molecular docking, which is based on the availability of three-dimensional models of the target.
In this work we will focus our attention on the docking method which predicts the binding orientations or "poses" of the ligands chosen to be fitted into the active site of the protein target. Docking is generally viewed as a multi-step process, beginning with the application of algorithms to pose the ligands in the active site of the target, then after sampling all possible conformations that best matches the receptor, the programs calculate scoring functions designed to estimate the energy of the interactions between the compounds and the receptor (Kitchen et al., 2004). These calculations take into account steric, electrostatic, hydrogen bonding, van der Waals and other types of interactions which can contribute or counteract the stabilization of a ligand in the active site. According to Halperin and collaborators, there are three main key steps in the docking process: the representation of the system, the search for conformational space and the ranking of potential solutions (Halperin et al., 2002).
3.1.2. Target and ligand models
The mathematical representation of the target can be achieved using three different models: atomic, surface and grid (Kitchen et al., 2004). The atomic model is normally used with real potential energy functions, and although it can be very accurate, the mathematical complexity of the representation makes it slow and therefore not very useful when docking a huge library of small-molecule ligands. It can be used during the final ranking of selected solutions when addition accuracy is needed. The surface mathematical representation is mostly used for protein-protein interactions as it consists of a geometric surface where shape complementarity is easily computed (Smith and Sternberg, 2002). The Connolly surface is a very popular model for protein surfaces (Connolly, 1983) based on the representation of the surface of the macromolecule accessible to the solvent molecules. Finally, Goodford was the first to use potential energy grids (Goodford, 1985) for studying protein-ligand interactions and this idea has been thoroughly exploited by docking programs such as Glide and Autodock (Tantoso et al., 2004, Friesner et al., 2004). Basically the grid representation is based on the assignment of energetic contributions such as electrostatic and van der Waals, to a three-dimensional grid of the receptor so that it only needs to be read during ligand sampling and scoring, saving time and computational processing.
Ligand representation is also a very important aspect of the docking as it is necessary to ensure that the modelled ligands are in full agreement with the submitted chemical structures. The ligands are basically represented as property-based atomic descriptors, such as volume, charge, lipophilicity and hydrogen bond donor or acceptor properties (Schafferhans and Klebe, 2001). In order to limit computational complexity, atomic descriptors can be reduced to functional group descriptors making it easier for further calculations, for example the volume of a methyl group can be reduced from a four-term descriptor (each one for each atom) into a single volume descriptor.
The simplest case in when evaluating the flexibility of a docking application is to treat the receptor as a rigid body. Typically the rigid body approximation is considered justified when the bound and unbound X-ray structures of a target do not differ considerably (Halperin et al., 2002). However when there are large rearrangements of the protein surface when ligands bind to it, the rigid body approximation is not valid, and flexible approaches should be implemented. A solution to this problem, is to use ensembles of protein conformations or multiple protein structures (MPS) rather than a single one, to generate various receptors potentials and to dock the ligands against each structure (Carlson and McCammon, 2000, Carlson, 2002). Another possibility is to select the preferred side-chain conformations using a side chain rotamer library, and then reduce its number by removing those that do not contribute to energy minimization (Leach, 1994). It should be noted that the rigid body-approximation is consistent with the lock-and-key model, whereas the utilisation of an ensemble of distinct energetically favoured conformations is more in agreement with current views of protein as energetic landscape of peaks of high-energy conformations and populated valleys of stable conformations. The use of soft docking, which employs a "soft" scoring function, allows some overlap between the ligand and the target to occur, and therefore enables some plasticity to develop over the rigid position of the protein.
In docking applications, ligand flexibility should be completely sampled, as the modelling process has to determine the most stable arrangement of the small molecule within the active site of the receptor. But, in doing so, the complexity of the computational calculations can increase exponentially as the number of possible ligands conformations rise in proportion to the power of rotatable bonds (Lorber and Shoichet, 1998). For example, for an organic molecule with ten rotatable bonds, the number of possible conformations is 59,049 if only three favourable angles are considered per rotatable bond. Thus ligand flexibility has to be approximated by using one of three different methods: systematic search, stochastic methods and simulation methods (Kitchen et al., 2004). In the systematic search, the algorithms try to explore all the degrees of freedom of the small-molecule. Ligand flexibility can be explored in a stepwise incremental fashion by docking small molecule fragments and then try to link them together (Rarey et al., 1996) or by dividing the ligands into rigid and flexible parts and docking the rigid parts first and then adding the flexible fragments. In the stochastic methods, such as the Monte-Carlo algorithm, an initial configuration is set and then a random variation of translations, rotations and conformations are varied inside the active site. Then a scoring function is applied and the score compared with the initial configuration. If the score has been improved the new configuration is retained. The process is repeated until a desired number of configurations have been sampled (Kitchen et al., 2004). Another popular stochastic method is the genetic algorithm which is based on the mathematics developed for solving competition and population dynamics problems. Finally the simulation method usually employs molecular dynamics based on Newton´s equation of motion for an atomic system, giving the opportunity to calculate the trajectory of change of the atomic positions over time.
3.1.4. Scoring functions
Once a pose or arrangement of the ligand into the active site of the receptor has been generated it needs to be scored in order to rank the quality of the pose with respect to other poses of the same ligand and with respect to other ligands (Lyne, 2002). Although it is desirable to have accurate and quantitative scoring function which correlates closely with binding affinity, such scoring function are computationally expensive and impractical for virtual screening of huge libraries (Kitchen et al., 2004). Therefore scoring functions often perform simplifications and do not rigorously treat the physical phenomena underlying molecular binding. There are three types of scoring functions: the force-field, the empirical and the knowledge-based (Lyne, 2002, Kitchen et al., 2004). In the force-field scoring functions, the score is based solely on energetic terms such enthalpic and entropic contributions (Hecht and Fogel, 2009). The enthalpic term is known as the force-field energy and takes into account electrostatic, van der Waals and hydrogen bonding interactions. The entropic contributions such as solvation effects and conformational mobility result in computationally complex calculations which are often neglected at expenses of the enthalpic terms which dominate the force-field arena. Examples of force-field scoring functions are Autodock, DockScore, D-score, G-score, GOLDScore, ICM, QXP, RankC, SIE and VALIDATE (Hecht and Fogel, 2009). Another type of score is the empirical scoring functions which are based on experimental data such as binding energies and conformations (Kitchen et al., 2004). The empirical score is based on contributions such as coulombic, van der Waals, hydrogen bonding, hydrophobicity and entropic factors, and each one of these factors is scaled by constants calculated by linear regression of experimentally determined binding affinities (Eldridge et al., 1997, Hecht and Fogel, 2009). Examples of empirical functions are Autodock, LUDI, ChemScore, eHITS, FlexX, F-Score, GlideScore, Hammerhead, HINT, LigScore, PLP, SCORE, ScreenScore, SIE and X-Score (Hecht and Fogel, 2009). The knowledge scoring function are statistical functions generated on basis of the frequency of ligand atom - protein atom interaction pair, using Boltzmann distribution (Hecht and Fogel, 2009). The main advantage is that they are computationally efficient and simple, allowing a large ligand screening library to be docked quickly (Kitchen et al., 2004). The most used approach is the potential of mean force (PMF) which converts structural information into free energies and therefore balances implicitly opposing interactions such as interaction enthalpy and solvation effects or conformational entropy (Xue et al., 2010). Examples of knowledge-based scoring functions are BLEEP, DrugScore, M-Score, PLP, PMF and SMoG (Hecht and Fogel, 2009).
According to comparison studies of docking programs and scoring functions, suggest that there does not seem to be a general docking program or scoring function that work well for all cases (Graves et al., 2005, Grosdidier et al., 2007). It has been also noted that in practice, scoring functions works well depending on the principal type of intermolecular interaction present between the target and the ligand (Hecht and Fogel, 2009). For example when hydrogen bonding plays an important role, scores such as GOLDscore, X-Score and PLP perform well, however when hydrophobic interactions and Van der Waals forces are dominant, ChemScore works best (Hecht and Fogel, 2009). Therefore the consensus scoring was introduced to prevent the use of weak scoring functions (Charifson et al., 1999). One common problem of the scoring functions is that they tend to give better scores for large-molecular weight compounds (Alvarez, 2004) and therefore it has been proposed a score normalization step according to the number of heavy atoms of the ligand (Pan et al., 2002).
3.1.5. Docking programs
The first docking program DOCK was originally described in 1982 by Kuntz and collaborators (Kuntz et al., 1982). Several other docking softwares have been developed and to date the most popular are GOLD (Jones et al., 1997), Autodock (Morris et al., 1998), FlexX (Rarey et al., 1996) and Glide (Friesner et al., 2004) to name just a few. All of these programs are still far from perfect (Klebe, 2006) and therefore their binding predictions should be verified by alternative experimental techniques. Many of the most popular programs perform reasonably well when trying to obtain the correct poses published in experimental crystallographic databases (Hecht and Fogel, 2009, Warren et al., 2005). However the scoring functions normally have low correlation with experimental data (Pearson correlation coefficient below 0.5) (Hecht and Fogel, 2009), meaning that the score rarely is in agreement with binding affinities.
Basically the selection of the software relies on the type of scoring function to be used, the availability of the docking software and the choice of the user. For example, most users performing docking on a specific program will continue to use it, as long as the results obtained are fairly well and the software companies update the system on a regular basis. The studies on docking software comparison are difficult to perform because it is easy to get biased towards a program that performs relatively well on a set of pre-established conditions whereas other programs may need more tweaking to obtain accurate results (Cummings et al., 2005, Liebeschuetz, 2008). In a recent survey it was found that ICM, Glide and Surflex generated poses similar to the published X-ray structures of 68 complexes, being Glide program the best tool for virtual screening and cognate ligand docking (Cross et al., 2009).
3.2. Selection and preparation of the targets
Proteins are flexible macromolecules that can exist in an ensemble of conformational states (Carlson and McCammon, 2000, Ma et al., 2002, Carlson, 2002), so when the proteins bind to different ligands they change their conformation to adopt a more complementary arrangement. There are currently two different published X-ray structures of MurE from M. tuberculosis in the Protein Data Bank database, the 2wtz, having the UDP-MurNAc-L-ala-D-glu substrate, and the 2xja, having both UDP-MurNAc-L-ala-D-glu and ADP substrates (Figure 4). The two structures are very close to each other, having a backbone root mean square distance (RMSD) of only 0.678 Å. As the two 3D MurE structures are very similar but have a slight conformational movement due to the binding of ATP, it can be interesting to dock the potential ligands against both structures simultaneously. This process can be used to validate the modeling method as the results that will tend to be reproducible in both models, will deserve higher confidence. Also the selected set of small molecules considered to have some affinity for the enzyme will be considered binders of two similar but nevertheless different conformations of the protein.
Figure 4. Comparison of the two published MurE structures from M. tuberculosis. The 2wtz structure was drawn in yellow and 2xja in cyan.
A protein which is also considered for computational preparation is the human folyl-polyglutamate synthase (FPGS). As this type of protein constitutes the human most similar macromolecule (at least in sequence) to the bacterial Mur ligases, selective inhibitors of the ligases will have to be non-effective inhibitors of human FPGS. Otherwise the small-molecules could show cytotoxicity on the host because of the interruption of the folate metabolism (Bailey and Gregory, 1999, McGuire et al., 2009). The receptors or proteins that may have undesired effects if inhibited are called antitargets (Recanatini et al., 2004) and therefore the human FPGS can be considered as antitargets when developing Mur ligases inhibitors. There is no published three-dimensional structure of any human FPGS in the PDB database. The mitochondrial human FPGS (accession code Uniprot Q05932) was modeled using the CPHmodels-3.0 structure prediction web server (Nielsen et al., 2010) based on folyl-polyglutamate synthase of Thermatoga maritima (PDB:1o5z) as template. The obtained model had a z-score of 33.8 giving confidence on the model. It was compared to the MurE structure from M. tuberculosis (PDB:2wtz) showing that there are a few structural similarities particularly in the central domain close to the ATP-binding site (Figure 5).
Figure 5. Comparison of the modeled human folyl-polyglutamate synthase and the MurE structure from M. tuberculosis. The modeled human FPGS was drawn in green and the 2wtz structure in yellow.
All proteins were prepared using the protein preparation wizard of the Maestro software (Schrödinger Software Suite 2009). The docking grid was constructed as a centroid of 30 Å choosing the residues fundamental for substrate binding according to the PDBsum Ligplot diagram (Wallace et al., 1995). When no information about substrate binding is available, as in the case of FPGS, the grid was selected based on structural comparison of the chosen grid in the MurE structure and analogous residues present in FPGS.
3.3. Selection and preparation of ligands
Three approaches are being considered for the selection of ligands to be docked into the M. tuberculosis MurE structures available. The first approach is based on our in-house screening of potential inhibitors from compounds coming from natural sources, semi-synthetic or synthetic project collaborators (Guzman et al., 2010, Guzman et al., submitted). Basically the compounds that show experimental inhibition on the activity of the MurE ligase of M. tuberculosis, are docked into the MurE model to predict their probable binding modes.
The second approach of ligand selection is based on the published papers that report the chemical structure of Mur ligases inhibitors. Several ATP-dependent Mur ligase inhibitors have been reported such as UDP-muramic substrates analogous (Tanner et al., 1996, Gegnas et al., 1998, Baum et al., 2006, Marmor et al., 2001), macrocycles (Horton et al., 2003), pulvinones (Antane et al., 2006), thiazolylaminopyrimidines (Baum et al., 2006), glutamic acid derivatives (Kotnik et al., 2007, Kristan et al., 2009, SimcÌŒicÌŒ et al., 2009, Tomasic et al., 2009), phosphinates (Å trancar et al., 2007), substituted 8-hydroquinolones (Baum et al., 2007), diarylquinolines (Baum et al., 2009), phenylsulfonyl carbamates (Frlan et al., 2009), benzoic acid derivative (Kristan et al., 2009), 1,3-dicarboxylic acid derivatives (Perdih et al., 2009), hydroxylated xanthenes (Turk et al., 2009), phosphorylated hydroxyethylamines (Sova et al., 2009) and 5-benzylidenthiazolidin-4-ones (TomaÅ¡iÄ‡ et al., 2010) among others. These inhibitors will be prepared and docked into the MurE model to check whether they have some predicted affinity for this particular Mur ligase.
Finally the third approach of ligand selection is based on the use of virtual screening libraries such as ZINC (Irwin and Shoichet, 2004), ChemBank (Seiler et al., 2008), DrugBank (Wishart et al.), PubChem, ChemDB, and MMsINC (Masciocchi et al., 2009). Based on the queries established by the researcher, most of these libraries can filter the database to obtain a set of molecular entries in agreement with the desired chemical space to be explored such as fragment-like, lead-like, drug-like, natural products, commercially-available molecules, and so on. For MurE it has been envisaged to dock a library of fragments, lead-like and natural products.
The ligands to be docked into the MurE models are drawn using chemical software such as Chemdraw and then converted into LigPrep (Schrödinger Software Suite 2009) acceptable formats (sdf, mol, pdb, smiles, mol2) using appropriate converters such as BabelGUI or Corina. The ligands coming from virtual screening libraries are prepared directly by LigPrep without further modification. LigPrep processes the structures at the MurE working pH 8.5 in order to generate the tautomers and ionic states of the ligands.
Glide (Schrödinger Software Suite 2009) is the docking application used for the ligand binding prediction and scoring process. Glide (Grid LIgand Docking with Energetics) was designed to perform an exhaustive search on the position, orientation and conformation of the ligand on a defined site of the target (the grid) with high computational speed (Friesner et al., 2004). Basically the receptor is defined by different sets of fields on a grid and the ligands are defined by a set of ligands conformations selected by energy minimization of the ligand torsion-angle space. The docking process as developed by Glide is organized in hierarchical filters. The initial screening tries to locate the most promising locations of the ligand in the whole receptor grid. The energy of these initial poses is then minimized using a molecular mechanics energy function (the OPLS-AA force field) in conjunction with a distance dependent dielectric model, allowing the ligand to locate in a position that matches mass and coulombic interactions with the protein. Finally a Monte Carlo stochastic process allows some degree of ligand flexibility to occur, altering torsional angles in order that the peripheral groups gain a proper orientation. For selecting the correctly docked pose, Glide calculates a composite scoring function called Emodel which contains both the molecular mechanics energy and the empirical scoring function GlideScore.
The scoring function used by Glide, known as GlideScore, is really an extended version of the empirical function Chemscore developed by Eldridge and collaborators (Böhm, 1994, Friesner et al., 2004). GlideScore involves the summation of different types of interactions such as: lipophilic, hydrogen bonds, metallic, polar, coulombic and solvent-mediated. Glide also allows for making more space for the hydrophobic parts of the ligands, as the radii of the atoms is scaled by 0.8 in comparison with protein radii which is scaled by 1.0, and this effects modulate the "breathing" of the protein which allows the accommodation of slightly larger ligands than the native co-crystallised ligands (Halgren et al., 2004).
Another feature of Glide, is that it offers wide variability of speed vs. accuracy options. In the high-throughput virtual screening (HTVS), the docking is performed very quickly, allowing huge libraries (millions of compounds) to be docked effectively. The standard precision (SP) mode is designed for docking large libraries (thousands of compounds) with high accuracy. The extra-precision (XP) is used when advanced scoring functions and extensive sampling is needed for eliminating false positives and obtaining poses with high accuracy.
For docking the different ligands into the MurE models, Glide was used initially in SP mode to get the possible binding modes of the ligands. Selected ligands with the lower GlideScore values were run in XP mode in order to get a more accurate prediction of the binding and scoring function. The docking was run varying amide bond conformations.
3.5. Scoring and validation
The scoring function Glidescore varies from system to system, and therefore there is no cut-off value for selecting good ligand valid to all receptors. According to the Schrödinger answers to frequently asked questions, it is considered that for HTVS or SP docking, a GlideScore of -10 or below is considered as good binding, however for some targets where hydrophobic interactions are predominant, a GlideScore between -8 and -9 is considered very good. For XP the GlideScore of promising ligands tend to be at -12 or below. In a recent paper, the virtual screening of 1596 potential inhibitors was carried out in an homology modeled structure of MurD ligase from Leptospira interrogans using Glide. The reported GlideScore was in the range -8.5 to -10.3 kcal/mol for the most potent inhibitors, but was only -6.97 kcal/mol for the UDP-MurNAc-L-ala natural substrate and -5.32 kcal/mol for the D-glutamic acid (Umamaheswari et al., 2010). These results suggest that GlideScore is grid-dependent, and therefore it should be used to compare among different ligands for the same grid. Also natural ligands which are supposed to be good binders of the target can have GlideScores as low as -5 and -7 kcal/mol.
The validation of the docking process (preparation of the target, ligands, grid construction) can be performed when there is a published X-ray structure with a bound ligand. The strategy called "self-docking" basically involves docking the same ligand in the apo receptor structure and to verify that the ligand is placed in the same position and orientation as in the X-ray structure. The RMSD between all the heavy atoms of the docked and crystallized ligand should be below 2Å for considering adequate the docking process (SaÌndor et al., 2010). For larger ligands (usually more than 30 heavy atoms) an RMSD below 3 Å can be considered acceptable. Also a GlideScore for a natural ligand can be calculated which gives an idea of the magnitude of the scoring function that potential inhibitors should have for that particular target. For example, the RMSD of the best scoring pose for different ligands of thymidine kinase is well below 2 Å for the majority of the ligands (Friesner et al., 2004), indicating that a proper setting of the docking process. In the same work, 282 published PDB structures were self-docked, showing that 71% of the cases have correct binding modes (RMSD < 2 Å) (Friesner et al., 2004, SaÌndor et al., 2010). In a fragment based virtual screening project, small fragments were self-docked (RMSD < 2 Å) into their X-ray positions in 82% of the cases for 190 complexes (SaÌndor et al., 2010).
The MurE substrate UDP-MurNAc-L-ala-D-glu was effectively self-docked in the same orientation as the published MurE-substrate crystal structure (PDB:2wtz), with a GlideScore of -9.92 kcal/mol. The RMSD of the heavy atoms of the UDP-MurNAc-L-ala-D-glu substrate in the docked and the crystal structures was 3.09 Å, suggesting a good preparation of the protein and adequate docking parameters for such a large ligand.
The use of docking algorithms for drug discovery application was a revolution in the 1980 decade, when it was thought that the introduction of computing science into pharmaceutical science would allow en explosion of drug regristations. However the panorama quickly changed as it was realized that the docking algorithms were not powerful enough to clearly change the picture of drug discovery and development. However the increase in the computing power of curent processors and the use of more accurate algorithms and better scoring functions have modified the general view, and today, virtual screening is widely used in drug discovery projects. Today
(limitations (necessity of a 3d-structure...) & comparison with crystallography or other biophysical methods ITC, WaterLogsy...)