Predicting the three-dimensional structure of a protein from its linear sequence is a great challenge for today's biologists. Proteins are amino acid chains built from 20 different amino acids, and each chain folds into a unique three-dimensional structure.
At present, there are two main experimental methods for determining the three-dimensional structure of a protein: X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Unfortunately, these methods are not efficient enough, because they are expensive and time-consuming. As a result, there is a pressing need for fast and reliable computational methods that predict structures from protein sequences, especially because the number of completely sequenced genomes is growing very fast. There are two different approaches to protein structure prediction: comparative modeling and ab initio prediction. In comparative modeling, the prediction relies on the structures of existing known proteins: the sequence of the unknown protein is aligned to that of a protein of known structure, and if their similarity exceeds 35%, the two three-dimensional structures are assumed to be similar.1 Comparative methods have proved efficient and widely applicable, and considerable progress has been made with them.2 Ab initio methods, by contrast, predict protein tertiary structure from the sequence alone, without knowledge of similar folds.3 Ab initio folding is nevertheless of interest for three reasons. First, a huge number of proteins show no homology to proteins of known structure. Second, some proteins that are highly homologous to other proteins nevertheless have different structures. Third, comparative modeling offers no insight into why a protein adopts a specific structure.4
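Sequence identity is one common way to quantify the similarity mentioned above. The short Python sketch below computes percent identity for a pair of already-aligned sequences and applies the 35% rule of thumb from the text. Counting only gap-free columns is our assumption (other conventions divide by the length of the shorter sequence), and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs)


def template_usable(aligned_target, aligned_template, threshold=35.0):
    """Rule of thumb from the text: assume a similar fold above ~35% identity."""
    return percent_identity(aligned_target, aligned_template) > threshold


print(percent_identity("ACDE-G", "ACFEHG"))  # 4 of 5 gap-free columns match: 80.0
```

In real comparative modeling the alignment itself comes from a sequence alignment tool; this sketch only scores an alignment that is already given.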
Many ab initio algorithms have been proposed in the last few years, and they tend to trade off two goals, speed and quality: some models are fast but not accurate enough, while others achieve good quality but have an unacceptable running time.
The rest of the paper is organized as follows. The second section discusses protein structure prediction: it defines the ab initio protein structure prediction problem and gives a general idea of some of the methods that have been used to solve it. The third section presents different conformational search methods. The fourth section presents articles that discuss the energy functions used to solve this problem. The fifth section discusses various model selection methods. The final section provides brief concluding remarks and a discussion.
Protein Structure Prediction
Predicting the three-dimensional structure of a protein from its linear sequence is a central challenge in current computational biology. Proteins are chains composed of twenty different amino acids, which fold into distinctive three-dimensional structures. The twenty amino acids can be divided into classes according to their size and other physical and chemical properties; the main division is into hydrophobic and hydrophilic residues.
Each protein has a distinctive amino acid sequence and folds into a unique, stable three-dimensional structure in its native state.
Any ab initio method has two core components: a search method to explore the conformational space, and an energy function that can distinguish correct structures from incorrect ones. A successful ab initio method therefore needs an efficient search method and an accurate energy function, together with a model selection procedure for choosing the final prediction from a pool of decoy structures.
Conformational Search Methods
A successful ab initio method for protein structure prediction depends on a powerful conformational search method that finds the minimum-energy conformation for a given energy function. Molecular Dynamics (MD) and Monte Carlo (MC) are two common methods for exploring the protein conformational space. For protein structure prediction, both methods require an enormous amount of computational resources. A main technical difficulty of Monte Carlo simulations is that the energy landscape of the protein conformational space is rugged, with many energy barriers that can trap the simulation.
Different conformational search methods have been developed to overcome these problems, and this section illustrates the key ideas of the search methods used in various ab initio protein structure prediction methods. So far, no single search method outperforms all the others in every case, although some outperform others in particular cases.
Molecular Dynamics (MD)
MD simulation solves Newton's equations of motion for every atom at each step of atomic movement. This method is most often used to study protein folding pathways.5 One of its major issues is the long simulation time: the integration time step is usually on the order of femtoseconds, while the fastest folding time of a small protein in nature is in the millisecond range. When a low-resolution model is available, MD simulations are often used for structure refinement, since the conformational changes involved are assumed to be small. One remarkable approach is the recent work of Liwo and colleagues, who implemented an MD simulation with the coarse-grained UNRES energy function.6
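To make the time-scale argument concrete, the minimal Python sketch below integrates Newton's equations with the velocity Verlet scheme, one common MD integrator (chosen here as an assumption; the text does not say which integrator the cited works used), on a toy one-dimensional harmonic "bond" in arbitrary units. It then counts how many 1 fs steps are needed to cover 1 ms of simulated time.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with the averaged acceleration
        a = a_new
    return x, v


# Toy harmonic "bond" with spring constant k (arbitrary units, hypothetical)
k, m = 1.0, 1.0
x, v = velocity_verlet(1.0, 0.0, lambda q: -k * q, m, dt=0.01, n_steps=1000)
energy = 0.5 * m * v ** 2 + 0.5 * k * x ** 2   # stays close to the initial value 0.5

# Number of 1 fs (1e-15 s) steps needed to simulate 1 ms (1e-3 s)
steps_per_ms = round(1e-3 / 1e-15)
print(f"{steps_per_ms:.0e} integration steps")  # prints "1e+12 integration steps"
```

A real protein simulation evaluates a force field over tens of thousands of atoms at each of those steps, which is why the millisecond folding regime is so costly to reach.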
Monte Carlo Simulations
Simulated Annealing (SA) is a stochastic optimization procedure that is widely applicable and has been found effective on several problems arising in computer-aided circuit design.7 SA is general and can be applied to any optimization problem. It uses the Metropolis algorithm to generate a series of conformational states that follow the canonical Boltzmann energy distribution at a given temperature, starting with a high-temperature MC simulation and slowly decreasing the temperature. Although SA is simple, its conformational search efficiency is striking in comparison with the more sophisticated methods discussed below.
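The annealing loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the rugged one-dimensional function `rugged` is a hypothetical stand-in for a protein energy function, the geometric cooling schedule and all parameter values are arbitrary choices, and only one Metropolis move is made per temperature, whereas practical SA usually performs many moves at each temperature.

```python
import math
import random


def rugged(x):
    """Toy 1D landscape with many local minima (hypothetical stand-in for an energy function)."""
    return 0.1 * x * x + math.sin(3 * x)


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: accept downhill moves, accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(energy, x0, t_start=10.0, t_end=0.01, cooling=0.995, step=0.5, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        trial = x + rng.uniform(-step, step)      # random local move
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, t, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                              # slow geometric cooling
    return best_x, best_e


best_x, best_e = simulated_annealing(rugged, x0=8.0)
```

At high temperature almost every move is accepted, so the walker crosses barriers freely; as the temperature drops, uphill moves become rare and the search settles into a low-energy basin.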
Monte Carlo with minimization (MCM)8 was successfully applied to the conformational search of ROSETTA's high-resolution energy function to overcome the multiple-minima problem. In MCM, MC moves are made between local energy minima: each perturbed protein structure is first minimized, and the resulting local minimum is compared with the previously accepted one to decide whether to update the current conformation. Starting from a given local-minimum structure, a trial structure is generated randomly and subjected to local energy minimization. The trial structure is then accepted or rejected by the usual Metropolis criterion, based on the energy difference between the two minimized structures.
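A minimal Python sketch of MCM on a toy one-dimensional landscape is shown below. The perturb, minimize, then accept-or-reject cycle follows the description above; the crude gradient-descent minimizer, the fixed temperature, the step sizes and the `rugged` test function are illustrative assumptions, not the settings used in ROSETTA.

```python
import math
import random


def rugged(x):
    """Toy 1D landscape with many local minima (hypothetical stand-in for an energy function)."""
    return 0.1 * x * x + math.sin(3 * x)


def local_minimize(f, x, lr=0.01, n_iter=500, h=1e-5):
    """Plain gradient descent using a central-difference derivative."""
    for _ in range(n_iter):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= lr * grad
    return x


def monte_carlo_minimization(f, x0, temperature=0.5, n_steps=200, step=1.0, seed=0):
    rng = random.Random(seed)
    x = local_minimize(f, x0)                  # start from a local minimum
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # perturb the current minimum, then relax the trial structure
        trial = local_minimize(f, x + rng.uniform(-step, step))
        e_trial = f(trial)
        de = e_trial - e
        # Metropolis test on the energy difference between the two minima
        if de <= 0 or rng.random() < math.exp(-de / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo_minimization(rugged, x0=8.0)
```

Because every accepted state is a local minimum, the walk effectively moves on a "staircase" version of the landscape, which removes the small barriers inside each basin.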
MC simulations can get stuck in metastable states, which distorts the distribution of sampled states; this happens when the energy landscape of the system is rough. Many simulation techniques have been developed to avoid this problem. One of the most successful is based on the generalized-ensemble approach, in contrast to the usual canonical ensemble; such techniques go by different names, such as the multicanonical ensemble9 and entropic sampling.10 Their basic idea is to accelerate transitions between states separated by energy barriers by modifying the transition probabilities so that the final energy distribution of the sampling becomes roughly flat rather than bell-shaped. A well-known related method is replica exchange Monte Carlo (REM),11 in which many Monte Carlo simulations are run at different temperatures covering the entire folding transition region. To overcome energy barriers, the conformations of simulations at neighboring temperatures are exchanged from time to time. Parallel hyperbolic sampling (PHS)12 further extends REM by dynamically deforming the energy with an inverse hyperbolic sine function, which flattens local energy barriers so that low-energy regions are explored more quickly.
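A common choice for the REM exchange step is to accept a swap between replicas i and j with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), where beta = 1/T with the Boltzmann constant absorbed into T. The Python sketch below runs a few Metropolis replicas on a toy one-dimensional landscape and attempts swaps between neighboring temperatures; the temperature ladder, the alternating even/odd swap scheme and the toy energy `rugged` are our assumptions.

```python
import math
import random


def rugged(x):
    """Toy 1D landscape (hypothetical stand-in for a protein energy function)."""
    return 0.1 * x * x + math.sin(3 * x)


def swap_probability(beta_i, beta_j, e_i, e_j):
    """Acceptance probability for exchanging conformations between replicas i and j."""
    arg = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if arg >= 0 else math.exp(arg)   # avoids overflow for large positive arguments


def replica_exchange(energy, temperatures, n_sweeps=2000, step=0.5, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in temperatures]   # one conformation per temperature
    best_e = min(energy(x) for x in xs)
    for sweep in range(n_sweeps):
        # 1) an ordinary Metropolis move in every replica at its own temperature
        for k, t in enumerate(temperatures):
            trial = xs[k] + rng.uniform(-step, step)
            de = energy(trial) - energy(xs[k])
            if de <= 0 or rng.random() < math.exp(-de / t):
                xs[k] = trial
            best_e = min(best_e, energy(xs[k]))
        # 2) try to swap neighbouring temperatures (even pairs, then odd pairs)
        for k in range(sweep % 2, len(temperatures) - 1, 2):
            p = swap_probability(1.0 / temperatures[k], 1.0 / temperatures[k + 1],
                                 energy(xs[k]), energy(xs[k + 1]))
            if rng.random() < p:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
    return xs, best_e


temps = [0.1, 0.3, 0.9, 2.7]   # geometric temperature ladder (an assumption)
final_xs, best_e = replica_exchange(rugged, temps)
```

A swap is always accepted when it moves the lower-energy conformation to the colder replica, so good structures found at high temperature migrate down the ladder.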
Unger and Moult13 proposed using genetic algorithms (GAs) for protein folding simulations. Applying the schema theorem in the context of protein structure, they observed that a GA concentrates on favorable local structures, while unfavorable local structures are rapidly abandoned. Konig and Dandekar14 improved on this method with a new search strategy combined with a simple GA on a two-dimensional lattice model. Their strategy, called systematic crossover, prevents the population from becoming too homogeneous. Comparing their method with Unger and Moult's, they showed that the new search strategy combined with the simple GA significantly increased search effectiveness.
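The Python sketch below illustrates a simple GA on the two-dimensional hydrophobic-polar (HP) lattice model, in which each H-H contact between residues that are not chain neighbours contributes -1 to the energy. It assumes an absolute-direction move encoding, truncation selection, one-point crossover and point mutation, and it simply discards self-intersecting children. These operators are illustrative choices: this is neither Unger and Moult's exact scheme nor Konig and Dandekar's systematic crossover, and all names are hypothetical.

```python
import random

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Place residues on a 2D square lattice; return coordinates, or None if the chain collides."""
    coords, occupied = [(0, 0)], {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (coords[-1][0] + dx, coords[-1][1] + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(seq, coords):
    """HP model: -1 for each pair of H residues adjacent on the lattice but not along the chain."""
    index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1
    return e


def random_valid(n, rng):
    """Grow a self-avoiding chain of n residues, restarting on dead ends."""
    while True:
        moves, pos, occupied = [], (0, 0), {(0, 0)}
        for _ in range(n - 1):
            free = [m for m, (dx, dy) in MOVES.items()
                    if (pos[0] + dx, pos[1] + dy) not in occupied]
            if not free:
                break
            m = rng.choice(free)
            pos = (pos[0] + MOVES[m][0], pos[1] + MOVES[m][1])
            occupied.add(pos)
            moves.append(m)
        else:
            return moves


def genetic_fold(seq, pop_size=30, generations=100, mutation_rate=0.1, seed=1):
    rng = random.Random(seed)

    def score(moves):
        return hp_energy(seq, fold(moves))

    population = [random_valid(len(seq), rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)                     # lower energy = fitter
        parents = population[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, len(a))             # one-point crossover
            child = [rng.choice("UDLR") if rng.random() < mutation_rate else m
                     for m in a[:cut] + b[cut:]]       # point mutation
            if fold(child) is not None:                # reject self-intersecting chains
                children.append(child)
        population = parents + children
    best = min(population, key=score)
    return best, score(best)


best_moves, best_energy = genetic_fold("HPHPPHHPHPPHPHHPPHPH")
```

Crossover exchanges whole stretches of moves between parents, which is how favorable local substructures (the "schemata") can be recombined.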
One successful genetic algorithm was proposed by Torres et al.,15 with useful features such as using heuristic secondary structure information to initialize the GA and an enhanced 3D spatial representation. They used hash tables, which make search and other operations more efficient. Overall, their model is a good predictor compared with the results of CASP7, but further work is needed to improve the quality of the energy function and the spatial representation. It is worth highlighting that their use of hash tables is an excellent computational technique for modelling amino acid spatial occupancy: it reduced the number of collisions to zero and made insertion, deletion and search very efficient. More recently, Hoque et al.16 treated ab initio protein structure prediction as a conformational search problem in a low-resolution model and addressed it with a GA. They showed that nondeterministic approaches such as the GA are relatively promising for conformational search. However, GAs often fail to produce reasonable results, especially for longer sequences, owing to the complexity of the protein structure prediction problem.
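The details of Torres et al.'s hash-table scheme are not given here, so the Python sketch below shows a generic spatial-hashing structure for amino acid occupancy rather than their implementation. It assumes that residue positions are 3D points bucketed into cubic cells keyed by integer coordinates, and that the query radius does not exceed the cell size, so a neighbour search only needs to inspect the 27 surrounding cells. The class and method names are hypothetical.

```python
import math
from collections import defaultdict


class SpatialHash:
    """Bucket 3D residue positions into cubic cells for fast occupancy and clash checks."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (i, j, k) -> list of (residue_id, point)

    def _key(self, p):
        return tuple(math.floor(c / self.cell_size) for c in p)

    def insert(self, residue_id, p):
        self.cells[self._key(p)].append((residue_id, p))

    def remove(self, residue_id, p):
        bucket = self.cells[self._key(p)]
        bucket[:] = [(r, q) for r, q in bucket if r != residue_id]

    def neighbours(self, p, radius):
        """Residues within `radius` of p; assumes radius <= cell_size."""
        ci, cj, ck = self._key(p)
        found = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    for r, q in self.cells.get((ci + di, cj + dj, ck + dk), []):
                        if math.dist(p, q) <= radius:
                            found.append(r)
        return found


grid = SpatialHash(cell_size=4.0)
grid.insert(0, (0.0, 0.0, 0.0))
grid.insert(1, (3.8, 0.0, 0.0))
grid.insert(2, (20.0, 0.0, 0.0))
print(sorted(grid.neighbours((1.0, 0.0, 0.0), 4.0)))  # [0, 1]
```

Insertion, removal and lookup touch only a constant number of cells, so checking a new residue position for clashes does not require scanning the whole chain.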
The αBB branch-and-bound approach17,18 is mathematically rigorous, whereas the other methods discussed here are stochastic or heuristic. In this approach, the search space is repeatedly divided into two halves (branching), and lower and upper bounds (LB and UB) on the global minimum are estimated for each resulting region. The UB is taken as the best local minimum energy found so far, while the LB is obtained by minimizing a modified energy function: the original energy plus a quadratic term in the dissected variables, weighted by a coefficient α. For a sufficiently large α, the modified function is convex and has a single energy minimum, which gives the LB. By repeating this analysis with updated lower and upper bounds, regions whose LB is higher than the global UB can be eliminated.
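As a sketch of the general αBB construction (not ASTRO-FOLD's exact formulation), write E for the energy and x_i for the dissected variables, bounded by x_i^L and x_i^U in the current region; these symbols are our notation. The lower-bounding function can then be written as:

```latex
\mathcal{L}(\mathbf{x}) = E(\mathbf{x}) + \alpha \sum_{i} \left(x_i^{L} - x_i\right)\left(x_i^{U} - x_i\right),
\qquad x_i^{L} \le x_i \le x_i^{U}
```

Inside the region each product is non-positive, so L(x) is never above E(x) and the minimum of L is a valid LB. The added term contributes 2α to every diagonal entry of the Hessian, so L is convex once α is at least max(0, -λ_min/2), where λ_min is the smallest eigenvalue of the Hessian of E over the region. The largest possible gap between E and L is α Σ_i (x_i^U - x_i^L)^2 / 4, which shrinks as regions are split, so the bounds tighten as branching proceeds.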
Energy Functions
Depending on whether they use statistics from existing protein 3D structures, energy functions can be classified into two groups: physics-based energy functions and knowledge-based energy functions.
Physics-Based Energy Functions
In a physics-based ab initio method, interactions between atoms are derived from quantum mechanical theory with only a few fundamental parameters, such as the electron charge and the Planck constant; all atoms are described by their atom types, for which only the number of electrons is relevant.19 However, the computational resources required to predict protein structure from quantum mechanics are still far beyond what is currently available. Consequently, there have been no serious attempts to predict protein structures from quantum mechanics. Widely used all-atom physics-based force fields include AMBER,19,20,21 CHARMM,22,23,24 OPLS,25,26 and GROMOS96.27 These potentials contain terms for bond lengths, bond angles, torsion angles, van der Waals interactions and electrostatic interactions. The major differences between them lie in the choice of atom types and interaction parameters.
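As an illustration of these terms, the AMBER functional form (as written by Cornell et al.20) is shown below; other force fields add or modify terms and use their own parameters, so this is one representative example rather than a common formula for all of them. Here b, θ and φ are bond lengths, bond angles and torsion angles with reference values b_0 and θ_0, r_ij is the distance between non-bonded atoms i and j, q_i are partial charges, and K, V_n, γ, A_ij, B_ij and ε are fitted parameters.

```latex
E = \sum_{\text{bonds}} K_b \left(b - b_0\right)^2
  + \sum_{\text{angles}} K_\theta \left(\theta - \theta_0\right)^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\left[1 + \cos\left(n\phi - \gamma\right)\right]
  + \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right]
```

The last sum runs over non-bonded atom pairs and combines the Lennard-Jones van der Waals term with Coulomb electrostatics; it dominates the computational cost because the number of pairs grows quadratically with system size.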
For protein folding, these classical force fields were often combined with molecular dynamics simulations. From the standpoint of protein structure prediction, however, the results were not very successful. The first landmark in MD-based ab initio protein folding was the work of Duan and Kollman,21 who simulated the villin headpiece in explicit solvent for four months on parallel supercomputers, starting from a fully unfolded extended state. Although the folding resolution was not high, their best final model was within 4.5 Å of the native state. Recently, using Folding@Home, a worldwide distributed computing system, this small protein was folded to 1.7 Å with a total simulation time of 300 μs.28 However, all-atom physics-based MD simulation is still far from being usable for structure prediction of longer proteins (~100-300 residues).
While all-atom physics-based MD simulations were not particularly successful in structure prediction, faster search methods (such as Monte Carlo simulations and genetic algorithms) have shown promise. One example is the work of Liwo and colleagues,29,30,31 who developed a physics-based protein structure prediction method that combines the coarse-grained UNRES potential with the conformational space annealing method of global optimization. In UNRES, each residue is described by two interacting off-lattice united atoms, Cα and the side-chain centre. This greatly reduces the number of atoms, making it possible to handle large polypeptide chains (> 100 residues), and reduces the prediction time for small proteins to 2-10 hours. The UNRES energy function6 is probably the most accurate physics-based ab initio method available, and it has been systematically applied to many CASP targets since 1998.
ASTRO-FOLD,17,18 a multistage hierarchical algorithm, is another example of a physics-based modelling approach. In its first stage, helical segments are predicted by partitioning the target sequence into oligopeptides and calculating, for each oligopeptide, a free energy function that includes entropic, cavity formation, polarization and ionization contributions. In the second stage, β-strands, β-sheets and disulfide bridges are identified through a novel superstructure-based mathematical framework. In a double-blind test on a de novo designed protein,18 the RMSD of the predicted model was 4.94 Å over all 102 residues. The performance of this method across a larger number of proteins remains to be seen.
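The RMSD values quoted in this section measure how far a model deviates from the native structure after optimal superposition. The NumPy sketch below computes RMSD with the Kabsch algorithm; it assumes the two coordinate sets are already matched atom by atom (for example, one Cα per residue), since the text does not state which atoms each cited study used.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two N x 3 coordinate sets after optimal superposition (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)                    # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                               # 3 x 3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # rotation that maps p onto q
    diff = (r @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


rng = np.random.default_rng(0)
model = rng.normal(size=(10, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
native = model @ rot_z.T + np.array([1.0, -2.0, 3.0])   # same shape, rotated and shifted
print(kabsch_rmsd(model, native))  # essentially zero
```

Superposing first matters: without it, a perfectly folded model placed in a different orientation would report a large, meaningless RMSD.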
Recently, a novel approach was proposed32 that generates many thousands of models from an idealized representation of structure, given the secondary structure assignments and the physical connection constraints between secondary structure elements. The top-scoring conformations are then selected for further refinement.33 The authors successfully folded a set of five small α/β proteins of 100-150 residues, with the first model within 4-6 Å RMSD of the native structure. Recent developments of ROSETTA34,35 add a Monte Carlo structure refinement with a physics-based atomic potential after the first-stage low-resolution fragment assembly.36
Knowledge-Based Energy Function
Knowledge-based energy functions use statistics from solved structures in the PDB. They can be divided into two types.37 The first consists of sequence-independent terms that describe a generic protein, such as hydrogen bonding and the local backbone stiffness of a polypeptide chain.38 The second consists of sequence-specific terms, including pairwise residue contact potentials,39 distance-dependent atomic contact potentials,40,41,42,43 and secondary structure propensities that reflect local structural preferences.38,44
However, local protein structures are difficult to reproduce in reduced models, even though knowledge-based energy functions include secondary structure propensities.
Two prediction methods that use knowledge-based energy functions have shown remarkable success in ab initio protein structure prediction.2,36
The first45 produced protein models by assembling small fragments taken from PDB structures. This idea led to the successful ROSETTA algorithm,36 which performed well on the free-modelling targets in CASP experiments and made the fragment assembly approach popular in the field. More recently, ROSETTA was further improved:34,35 in a first round, models are generated in a reduced form, with conformations specified by the heavy backbone and Cβ atoms; in a second round, these low-resolution models are refined by an all-atom procedure using an all-atom physics-based energy function that includes van der Waals interactions and an orientation-dependent hydrogen-bonding potential.
Following the success of ROSETTA, many researchers developed their own energy functions based on its ideas. For example, the energy terms of Simfold46 and Profesy47 include van der Waals interactions, hydrophobic interactions, backbone dihedral angle potentials, a backbone hydrogen-bonding potential, pairwise contact energies and beta-strand pairing.
TASSER2 is another successful free-modelling approach, which uses a knowledge-based energy function to construct 3D models. Its energy terms include information about predicted secondary structure, backbone hydrogen bonds, consensus predicted side-chain contacts, short-range correlations and hydrophobic interactions. The authors used threading first, to search for possible folds, and then ab initio modelling, to reassemble full-length models and build the unaligned regions.38
Chunk-TASSER48 is a newer development of TASSER that first divides the target sequence into chunks, each containing three consecutive secondary structure elements (helices and strands).
I-TASSER49 is another development of TASSER, which uses iterative Monte Carlo simulations to refine TASSER cluster centroids. I-TASSER built models with the correct topology (~3-5 Å) for seven cases with sequences up to 155 residues long. Recently, a comparative study of 18 ab initio prediction algorithms found I-TASSER to be the best method in terms of both modelling accuracy and CPU cost per target.4
Model Selection
Another open problem in protein structure prediction is selecting the most appropriate models, namely those that are closer to the native structure than the templates used to build them. Model Quality Assessment Programs (MQAPs) were developed for this task.50 In general, model selection approaches can be classified into two types: energy-based and free-energy-based. This section focuses on energy-based model selection methods and discusses three of them: (1) physics-based energy functions; (2) knowledge-based energy functions; (3) scoring functions describing the compatibility between the target sequence and model structures. Another popular MQAP approach is the consensus-based method, which scores each model by its similarity to models predicted by different algorithms.51 This is also called the meta-predictor approach.49 In essence it resembles the clustering approach, since both assume that the most frequently occurring states are the near-native ones. This approach has mainly been used to select models generated by threading servers, and it has so far been the most successful MQAP.51,49
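The consensus idea can be sketched in Python as follows: each model is scored by its mean structural similarity to all other models, and the model with the highest score is selected. The similarity matrix is assumed to be precomputed with some score in [0, 1] (for example TM-score or GDT-TS); the values below are made up for illustration, and actual consensus MQAPs such as Pcons use their own, more elaborate scoring.

```python
def consensus_scores(similarity):
    """Mean similarity of each model to every other model (higher = stronger consensus)."""
    n = len(similarity)
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def select_model(similarity):
    """Index of the model most similar, on average, to the rest of the pool."""
    scores = consensus_scores(similarity)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical pairwise similarities for four models (symmetric, 1.0 on the diagonal)
sim = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.75, 0.3],
    [0.7, 0.75, 1.0, 0.25],
    [0.2, 0.3, 0.25, 1.0],
]
print(select_model(sim))  # 1
```

Model 3 disagrees with the rest of the pool and receives the lowest score, which reflects the assumption that the most frequently predicted fold is the near-native one.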
Physics-Based Energy Function
To develop an all-atom physics-based energy function for model selection, some researchers tested existing force fields combined with solvation potentials on their ability to discriminate the native structure from decoys generated by threading on other protein structures. For example, CHARMM23 combined with the EEF1 solvation model53 was used, and the energy of the native state was found to be lower than those of the decoys in most cases.52 Later, Petrey and Honig54 used CHARMM with a continuum treatment of the solvent; Dominy and Brooks55 and Feig and Brooks56 used CHARMM plus generalized Born (GB) solvation; Felts and colleagues57 used OPLS plus GB; Lee and Duan58 used AMBER plus GB; and Hsieh and Luo59 used AMBER plus a Poisson-Boltzmann solvation potential, on a number of structure decoy sets (including the Skolnick decoy set11,60 and CASP decoy sets61). All of these authors obtained similar results: the native structures had lower energy than the decoys under their potentials. This claimed success in model discrimination seems at odds with the less successful results of physics-based structure prediction. Recently, Wroblewska and Skolnick showed that the AMBER plus GB potential can discriminate the native structure only from roughly minimized TASSER decoys.62 Their result partially explains the inconsistency between the widely reported decoy discrimination ability of physics-based potentials and their less successful folding results.
Knowledge-Based Energy Function
Following the development of a pairwise residue-distance potential based on statistics from known PDB structures, a variety of knowledge-based potentials have been proposed, including atomic interaction, solvation, hydrogen-bond and torsion-angle potentials.63 In coarse-grained potentials, each residue is represented by a single atom or a few atoms; examples include Cα-based potentials,64 Cβ-based potentials,65 side-chain-centre-based potentials,66,67,68,39,69,70 and combined side-chain and Cα-based potentials.71 One of the most widely used knowledge-based potentials is a residue-specific, all-atom, distance-dependent potential first formulated by Samudrala and Moult;40 it counts the distances between 167 pseudo-atoms. Following this, several atomic potentials with various reference states have been proposed.41,42,43,72,73 All of them reported that native structures can be distinguished from decoy structures in their tests. However, selecting near-native models out of many decoys remains a challenge for these potentials;37 this task matters more than native structure recognition because, in real predictions, the native structure is not among the models produced by computer simulations. In the CAFASP4-MQAP experiment in 2004,50 the best-performing energy functions were Victor/FRST,73 which combines an all-atom pairwise interaction potential, a solvation potential and a hydrogen-bond potential, and MODCHECK,74 which includes a Cβ-atom interaction potential and a solvation potential. In CASP7-MQAP in 2006, Pcons, developed by the Elofsson group and based on structural consensus, performed best.51
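Many of these statistical potentials rest on the same inverse-Boltzmann idea: a distance bin that is observed more often in native structures than expected under a reference state gets a favorable (negative) energy, E(r) = -kT ln(P_obs(r) / P_ref(r)). The Python sketch below turns two distance histograms into such a potential. The binning, the pseudocount, the value of kT and the choice of reference histogram are all assumptions here; the reference state is precisely what distinguishes the various atomic potentials cited above.

```python
import math

KT_KCAL = 0.593  # kT in kcal/mol near room temperature (approximate, assumed units)


def distance_potential(observed_counts, reference_counts, pseudocount=1.0):
    """Inverse-Boltzmann potential per distance bin: E(r) = -kT * ln(P_obs(r) / P_ref(r))."""
    n_obs = sum(observed_counts) + pseudocount * len(observed_counts)
    n_ref = sum(reference_counts) + pseudocount * len(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / n_obs      # pseudocounts avoid log(0) for empty bins
        p_ref = (ref + pseudocount) / n_ref
        energies.append(-KT_KCAL * math.log(p_obs / p_ref))
    return energies


# Hypothetical counts for one atom-type pair in three distance bins
observed = [10, 30, 10]    # contacts seen in native structures
reference = [20, 20, 20]   # counts expected under the chosen reference state
print(distance_potential(observed, reference))  # middle bin favorable, outer bins unfavorable
```

Scoring a model then amounts to summing the tabulated energy over all relevant atom or residue pairs at their observed distances.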
Sequence-Structure Compatibility Function
In the third type of MQAP, the best models are selected based on the compatibility of the target sequence with the model structures, rather than purely on energy functions. The earliest example used threading scores to evaluate structures.75 This was later improved by using a quadratic error function to describe non-covalent interactions, under which near-native structures have fewer errors than other decoys.76 Verify3D77 improved on the earlier method75 by using local threading scores averaged over a 21-residue window. GenThreader78 used neural networks to classify native and non-native structures; its inputs include pairwise contact energy, solvation energy, alignment score, alignment length, and sequence and structure lengths. ProQ79 likewise uses neural networks to predict the quality of decoy structures; its inputs include contacts, solvent-accessible area, protein shape, secondary structure, the structural alignment score between decoys and templates, and the fraction of the protein modeled from templates. Recently, a consensus MQAP80 called ModFold was developed, which combines scores from ProQ,79 MODCHECK74 and ModSSEA.81 The author showed that ModFold outperformed each of the individual MQAPs tested.
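The windowed scoring used by Verify3D can be sketched as a centred sliding average over per-residue compatibility scores. In the Python sketch below, the per-residue scores are assumed to be given, and truncating the window at the chain ends is our assumption; Verify3D's own treatment of the termini may differ.

```python
def window_average(scores, window=21):
    """Average per-residue scores over a centred sliding window, truncated at the chain ends."""
    half = window // 2
    n = len(scores)
    averaged = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        averaged.append(sum(scores[lo:hi]) / (hi - lo))
    return averaged


print(window_average([0, 0, 3, 0, 0], window=3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Averaging smooths out single-residue noise, so a stretch of the model scores poorly only if many neighbouring residues are incompatible with their structural environment.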
Successful ab initio modelling of protein tertiary structure from the amino acid sequence alone is considered the "Holy Grail" of protein structure prediction,82 because its success would mark the eventual solution to the problem. Beyond generating 3D structures, ab initio modelling can also help us understand the underlying principles of how proteins fold in nature; template-based modelling approaches cannot do this, because they build 3D models by copying the framework of other solved structures.
Current state-of-the-art ab initio protein structure prediction methods make use of knowledge-based information from known structures, which brings many benefits. Over the last decade, the accuracy of ab initio modelling for proteins of 100-120 residues has improved significantly, thanks to well-parameterized knowledge-based potential terms assisted by various advances in conformational search methods.
Further improvement may be obtained by parallelizing an accurate potential energy function and using an efficient optimization method. At the same time, systematic benchmarking of conformational search methods is needed, so that the advantages and limitations of the available search methods can be established. It is important to highlight that ab initio prediction methods based only on physicochemical principles of interaction still lag far behind, in both accuracy and speed, methods that use bioinformatics and knowledge-based information. However, physics-based potentials have proven useful for refining side-chain atoms and peptide backbones. Therefore, a composite method combining knowledge-based and physics-based energy terms may be a promising approach to ab initio modelling.
Conclusion and proposed framework
The review above shows that a powerful search algorithm is a key component of ab initio modelling. Harmony Search,83 a recently developed heuristic optimization algorithm, has been reported to be a powerful search tool that can outperform other methods, such as GAs and simulated annealing, in both solution quality and computational time.84
The Harmony Search (HS) algorithm has been successfully applied to combinatorial structural optimization with discrete decision variables. The results showed that it is potentially a powerful search and optimization technique, attaining better solutions and faster convergence. Applying a search tool as powerful as HS to ab initio structure prediction therefore looks promising.
Harmony Search (HS) has the following steps:83
- Step 1. Initialize a Harmony Memory (HM).
- Step 2. Improvise a new harmony from HM.
- Step 3. If the new harmony is better than the worst harmony in HM, include the new harmony in HM and exclude the worst harmony from HM.
- Step 4. If the stopping criterion is not satisfied, go to Step 2.
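Steps 1 to 4 can be sketched in Python as below, for minimizing a generic objective over box bounds. The parameter names hmcr (harmony memory considering rate), par (pitch adjusting rate) and bandwidth follow common HS terminology, but their values, the clamping to bounds, the fixed iteration count used as the stopping criterion, and the `sphere` test function are our assumptions rather than settings from the cited work.83

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bandwidth=0.05,
                   n_iter=5000, seed=0):
    """Minimize `objective` over a box with basic Harmony Search."""
    rng = random.Random(seed)
    # Step 1: initialize the harmony memory with random solutions
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(n_iter):  # Step 4: stop after a fixed number of improvisations
        # Step 2: improvise a new harmony one variable at a time
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:                    # memory consideration
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:                 # pitch adjustment
                    value += rng.uniform(-1, 1) * bandwidth * (hi - lo)
            else:                                      # random selection
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))        # keep the value inside its bounds
        # Step 3: replace the worst harmony if the new one is better
        worst = max(range(hms), key=scores.__getitem__)
        new_score = objective(new)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_score = harmony_search(sphere, bounds=[(-5.0, 5.0)] * 3)
```

Memory consideration reuses good values already found, pitch adjustment explores close to them, and random selection keeps some global exploration; the balance between the three is set by hmcr and par.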
We introduce here an adapted harmony search algorithm for ab initio protein tertiary structure prediction, which includes the following steps:
- Pick a sequence from the sequence database.
- Get the torsion angles and generate a conformation.
- Apply the harmony search algorithm to this conformation until the conformation with the lowest energy is found, as shown in Figure 1.
In our adapted harmony search, a harmony is the set of torsion angles that defines a conformation of the protein sequence whose tertiary structure we want to predict. The result of the harmony search algorithm is a protein structure, which can be stored in our structure database.
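Because a harmony here is a set of torsion angles, the generic clamping to bounds used in basic HS is not appropriate: angles are periodic, so 179° plus 5° should become -176°, not 180°. The Python sketch below shows one way to handle this, assuming a harmony is a list of (phi, psi) pairs in degrees, one per residue, with other torsions held fixed. The function names, the bandwidth value and the (phi, psi)-only representation are hypothetical choices, not part of the framework described above.

```python
import random


def wrap_angle(deg):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def pitch_adjust_torsions(harmony, rng, par=0.3, bandwidth=10.0):
    """Perturb each phi and psi with probability `par`, respecting angular periodicity."""
    adjusted = []
    for phi, psi in harmony:
        if rng.random() < par:
            phi = wrap_angle(phi + rng.uniform(-bandwidth, bandwidth))
        if rng.random() < par:
            psi = wrap_angle(psi + rng.uniform(-bandwidth, bandwidth))
        adjusted.append((phi, psi))
    return adjusted


def angular_difference(a, b):
    """Smallest signed difference a - b between two angles, in degrees."""
    return wrap_angle(a - b)


helix_like = [(-60.0, -45.0)] * 5   # hypothetical starting harmony for a 5-residue segment
print(pitch_adjust_torsions(helix_like, random.Random(0)))
```

The same wrap-around distance can also be used to compare harmonies, for example when deciding whether two conformations in the harmony memory are effectively duplicates.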
This research work was supported by APEX University Initiative Grant and USM Fellowship.
References
- Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function. Mol Biotechnol. 2003; 23:139-166.
- Zhang Y, Skolnick J. Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci USA. 2004; 101:7594-7599.
- Hardin C, Pogorelov TV, Luthey-Schulten Z. Ab initio protein structure prediction. Curr Opin Struct Biol. 2002; 12:176-181.
- Helles G. A comparative study of the reported performance of ab initio protein structure prediction algorithms. J R Soc Interface. 2008; 5(21):387-396.
- Duan Y, Kollman PA. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 1998; 282(5389):740-744.
- Liwo A, Pincus MR, Wawak RJ, et al. Calculation of protein backbone geometry from alpha-carbon coordinates based on peptide-group dipole alignment. Protein Sci. 1993; 2(10):1697-1714.
- Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science. 1983; 220(4598):671-680.
- Li Z, Scheraga HA. Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proc Natl Acad Sci USA. 1987; 84(19):6611-6615.
- Berg BA, Neuhaus T. Multicanonical ensemble: a new approach to simulate first-order phase transitions. Phys Rev Lett. 1992; 68(1):9-12.
- Lee J. New Monte Carlo algorithm: entropic sampling. Phys Rev Lett. 1993; 71(2):211-214.
- Kihara D, Lu H, Kolinski A, et al. TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. Proc Natl Acad Sci USA. 2001; 98(18):10125-10130.
- Zhang Y, Kihara D, Skolnick J. Local energy landscape flattening: parallel hyperbolic Monte Carlo sampling of protein folding. Proteins. 2002; 48(2):192-201.
- Unger R, Moult J. Genetic algorithms for protein folding simulations. J Mol Biol. 1993; 231:75-81.
- Konig R, Dandekar T. Improving genetic algorithms for protein folding simulations by systematic crossover. Biosystems. 1999; 50:17-25.
- Torres SR, Romero DC, Vasquez LF, Ardila YJ. A novel ab-initio genetic-based approach for protein folding prediction. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO '07). New York, NY: ACM; 2007: 393-400.
- Hoque MT, Chetty M, Sattar A. Genetic algorithm in ab initio protein structure prediction using low resolution model: a review. Springer. 2009; 224:317-342.
- Klepeis JL, Floudas CA. ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. Biophys J. 2003; 85(4):2119-2146.
- Klepeis JL, Wei Y, Hecht MH, et al. Ab initio prediction of the three-dimensional structure of a de novo designed protein: a double-blind case study. Proteins. 2005; 58(3):560-570.
- Weiner SJ, Kollman PA, Case DA, et al. A new force field for molecular mechanical simulation of nucleic acids and proteins. J Am Chem Soc. 1984; 106:765-784.
- Cornell WD, Cieplak P, Bayly CI, et al. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc. 1995; 117:5179-5197.
- Duan Y, Kollman PA. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 1998; 282(5389):740-744.
- Brooks BR, Bruccoleri RE, Olafson BD, et al. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem. 1983; 4(2):187-217.
- Neria E, Fischer S, Karplus M. Simulation of activation free energies in molecular systems. J Chem Phys. 1996; 105(5):1902-1921.
- MacKerell AD Jr, Bashford D, Bellott M, et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B. 1998; 102(18):3586-3616.
- Jorgensen WL, Tirado-Rives J. The OPLS potential functions for proteins. Energy minimizations for crystals of cyclic peptides and crambin. J Am Chem Soc. 1988; 110:1657-1666.
- Jorgensen WL, Maxwell DS, Tirado-Rives J. Development and testing of the OPLS All-Atom Force Field on conformational energetics and properties of organic liquids. J Am Chem Soc. 1996; 118:11225-11236.
- van Gunsteren WF, Billeter SR, Eising AA, et al. Biomolecular simulation: the GROMOS96 manual and user guide. Zurich: VDF Hochschulverlag AG an der ETH; 1996.
- Zagrovic B, Snow CD, Shirts MR, et al. Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing. J Mol Biol. 2002; 323(5):927-937.
- Liwo A, Lee J, Ripoll DR, et al. Protein structure prediction by global optimization of a potential energy function. Proc Natl Acad Sci USA. 1999; 96(10):5482-5485.
- Liwo A, Khalili M, Scheraga HA. Ab initio simulations of protein-folding pathways by molecular dynamics with the united-residue model of polypeptide chains. Proc Natl Acad Sci USA. 2005; 102(7):2362-2367.
- Oldziej S, Czaplewski C, Liwo A, et al. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: assessment in two blind tests. Proc Natl Acad Sci USA. 2005; 102(21):7547-7552.
- Taylor WR, Bartlett GJ, Chelliah V, et al. Prediction of protein structure from ideal forms. Proteins. 2008; 70(4):1610-1619.
- Jonassen I, Klose D, Taylor WR. Protein model refinement using structural fragment tessellation. Comput Biol Chem. 2006; 30(5):360-366.
- Bradley P, Misura KM, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005; 309(5742):1868-1871.
- Das R, Qian B, Raman S, et al. Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins. 2007; 69(S8):118-128.
- Simons KT, Kooperberg C, Huang E, et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 1997; 268(1):209-225.
- Skolnick J. In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol. 2006; 16(2):166-171.
- Zhang Y, Kolinski A, Skolnick J. TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J. 2003; 85(2):1145-1164.
- Skolnick J, Jaroszewski L, Kolinski A, et al. Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci. 1997; 6:676-688.
- Samudrala R, Moult J. An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol. 1998; 275(5):895-916.
- Lu H, Skolnick J. A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins. 2001; 44(3):223-232.
- Zhou H, Zhou Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 2002; 11(11):2714-2726.
- Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006; 15(11):2507-2524.
- Zhang Y, Skolnick J. The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci USA. 2005; 102:1029-1034.
- Bowie JU, Eisenberg D. An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. Proc Natl Acad Sci USA. 1994; 91(10):4436-4440.
- Fujitsuka Y, Chikenji G, Takada S. SimFold energy function for de novo protein structure prediction: consensus with Rosetta. Proteins. 2006; 62(2):381-398.
- Lee J, Kim SY, Joo K, et al. Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing. Proteins. 2004; 56(4):704-714.
- Zhou H, Skolnick J. Ab initio protein structure prediction using chunk-TASSER. Biophys J. 2007; 93(5):1510-1518.
- Wu S, Skolnick J, Zhang Y. Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol. 2007; 5:17.
- Fischer D. Servers for protein structure prediction. Curr Opin Struct Biol. 2006; 16(2):178-182.
- Wallner B, Elofsson A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins. 2007; 69(S8):184-193.
- Lazaridis T, Karplus M. Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J Mol Biol. 1999a; 288(3):477-487.
- Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins. 1999b; 35(2):133-152.
- Petrey D, Honig B. Free energy determinants of tertiary structure and the evaluation of protein models. Protein Sci. 2000; 9(11):2181-2191.
- Dominy BN, Brooks CL. Identifying native-like protein structures using physics-based potentials. J Comput Chem. 2002; 23(1):147-160.
- Feig M, Brooks CL. Evaluating CASP4 predictions with physical energy functions. Proteins. 2002; 49(2):232-245.
- Felts AK, Gallicchio E, Wallqvist A, et al. Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the Surface Generalized Born solvent model. Proteins. 2002; 48(2):404-422.
- Lee MC, Duan Y. Distinguish protein decoys by using a scoring function based on a new AMBER force field, short molecular dynamics simulations, and the generalized born solvent model. Proteins. 2004; 55(3):620-634.
- Hsieh MJ, Luo R. Physical scoring function based on AMBER force field and Poisson-Boltzmann implicit solvent for protein structure prediction. Proteins. 2004; 56(3):475-486.
- Skolnick J, Zhang Y, Arakaki AK, et al. TOUCHSTONE: a unified approach to protein structure prediction. Proteins. 2003; 53(Suppl 6):469-479.
- Moult J, Fidelis K, Zemla A, et al. Critical assessment of methods of protein structure prediction (CASP): round IV. Proteins. 2001; Suppl 5:2-7.
- Wroblewska L, Skolnick J. Can a physics-based, all-atom potential find a protein's native structure among misfolded structures? I. Large scale AMBER benchmarking. J Comput Chem. 2007; 28(12):2059-2066.
- Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990; 213(4):859-883.
- Melo F, Sanchez R, Sali A. Statistical potentials for fold assessment. Protein Sci. 2002; 11(2):430-448.
- Hendlich M, Lackner P, Weitckus S, et al. Identification of native protein folds amongst a large number of incorrect models. The calculation of low energy conformations from potentials of mean force. J Mol Biol. 1990; 216(1):167-180.
- Bryant SH, Lawrence CE. An empirical energy function for threading protein sequence through the folding motif. Proteins. 1993; 16(1):92-112.
- Kocher JP, Rooman MJ, Wodak SJ. Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. J Mol Biol. 1994; 235(5):1598-1613.
- Thomas PD, Dill KA. Statistical potentials extracted from protein structures: how accurate are they? J Mol Biol. 1996; 257(2):457-469.
- Zhang C, Kim SH. Environment-dependent residue contact energies for proteins. Proc Natl Acad Sci USA. 2000; 97(6):2550-2555.
- Zhang C, Liu S, Zhou H, et al. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci. 2004; 13(2):400-411.
- Berrera M, Molinari H, Fogolari F. Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics. 2003; 4:8.
- Wang K, Fain B, Levitt M, et al. Improved protein structure selection using decoy-dependent discriminatory functions. BMC Struct Biol. 2004; 4:8.
- Tosatto SC. The victor/FRST function for model quality estimation. J Comput Biol. 2005; 12(10):1316-1327.
- Pettitt CS, McGuffin LJ, Jones DT. Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics. 2005; 21(17):3509-3515.
- Luthy R, Bowie JU, Eisenberg D. Assessment of protein models with three-dimensional profiles. Nature. 1992; 356(6364):83-85.
- Colovos C, Yeates TO. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci. 1993; 2(9):1511-1519.
- Eisenberg D, Luthy R, Bowie JU. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 1997; 277:396-404.
- Jones DT. GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol. 1999; 287(4):797-815.
- Wallner B, Elofsson A. Can correct protein models be identified? Protein Sci. 2003; 12(5):1073-1086.
- McGuffin LJ. The ModFOLD server for the quality assessment of protein structural models. Bioinformatics. 2008; 24(4):586-587.
- McGuffin LJ. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics. 2007; 8:345.
- Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol. 2008; 18(3):342-348.
- Geem ZW, Kim JH, Loganathan GV. A new heuristic optimization algorithm: harmony search. Simulation. 2001; 76:60-68.
- Geem ZW. Optimal cost design of water distribution networks using harmony search. Eng Optim. 2006; 38:259-280.