Protein structure prediction is one of the most difficult problems facing researchers today, and no complete solution has yet been found. The difficulty has two main sources: finding a correct way to calculate the energy of a protein, and searching the conformational space for the conformation with the lowest energy.
Solving the protein structure prediction problem is one of the most challenging tasks in bioinformatics (Hu et al., 2008). It remains one of the fundamental unsolved problems in bioinformatics, computational structural biology and many other research areas. This chapter focuses on computational protein structure prediction methods. A large number of computational methods have been proposed to predict protein structure. These methods fall into three classes based on sequence similarity to the target sequence and on the use of protein information available in structure databases (Bonneau and Baker, 2001, Zhang, 2002b, Yi-Yuan et al., 2005). Homology Modelling and Fold Recognition use sequence similarity in the prediction, whereas ab initio methods do not. The same division is also described as non-optimisation or knowledge-based methods (Homology Modelling and Fold Recognition) versus optimisation methods (ab initio). Ab initio methods are further divided into classical ab initio methods and knowledge-based ab initio methods. Since our focus is on ab initio prediction, and in particular on protein conformation search methods, this chapter gives an overview of Homology Modelling and Fold Recognition and concentrates on ab initio prediction and its components.
2.2 Homology Modelling
Homology Modelling, also known as Comparative Modelling, is the easiest, most reliable and most successful computational method for predicting protein tertiary structure (Pedersen, 1999, Zhang, 2002a, Jones, 2004), and it is the method of choice for this task (Augen, 2004). It is based on observations from experimental data indicating that the protein sequence determines the protein structure and that similar sequences imply similar structures (Zhang, 2002b). This similarity can be interpreted in evolutionary terms: new proteins evolve gradually through the addition, deletion or relocation of amino acids, while the structure and function of the protein are retained (Zhang, 2002a).
Homology Modelling methods do not need to model the folding mechanics of a protein. They build a model of tertiary structure based on an identifiable sequence relationship between the new protein and one or more proteins of known structure. The prediction proceeds in four steps (Fig. 2.1). First, suitable structural templates are found by comparing the sequence of the target protein with the sequences of proteins of known structure in the structure database. Second, the target sequence is aligned to the structural templates. Third, the backbone is built from the alignment, loops are added and side chains are placed. Finally, the model is refined.
Figure 2.1: Homology Modelling
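The template search step described above can be illustrated with a minimal sketch. It assumes that the target and template sequences have already been aligned (equal length, with '-' marking gaps) and uses percent identity as the similarity measure. The function names, the toy sequences and the 30% cut-off are illustrative assumptions, not taken from any particular modelling package.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the non-gap columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_templates(target: str, templates: dict, min_identity: float = 30.0) -> list:
    """Return (name, identity) pairs at or above the cut-off, most similar first."""
    scored = [(name, percent_identity(target, seq)) for name, seq in templates.items()]
    kept = [(name, score) for name, score in scored if score >= min_identity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical pre-aligned toy sequences (not real proteins).
target = "MKT-AYIAKQR"
templates = {
    "tmplA": "MKTQAYIAKQR",
    "tmplB": "MRT-AYLSKQR",
    "tmplC": "GGGGGGGGGGG",
}
ranked = rank_templates(target, templates)
```

In this toy case `ranked` keeps `tmplA` and `tmplB` and discards `tmplC`, which shares no residues with the target. A real pipeline would obtain the alignments from a sequence search tool rather than assume them.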
For Homology Modelling to produce a successful and accurate prediction, the target sequence must have a clear evolutionary relationship to another protein whose structure has already been solved and stored in the structure database (Skolnick and Kolinski, 2001, Zhang, 2008, Skolnick et al., 2006, Bergeron, 2002). Homology Modelling is therefore limited to predicting the structure of protein families with at least one known structure. Moreover, it does not give an understanding of the different forces that play important roles in the formation of secondary and tertiary structure (Volker et al., 1999, Pillardy et al., 2001). In other words, Homology Modelling “does not help to answer the question why a protein adopts its specific structure” (Lee et al., 2009).
The quality of a Homology Modelling prediction depends on the degree of similarity between the target sequence and the proteins in the structure database (Floudas, 2007, Pillardy et al., 2001). As the similarity decreases, so does the quality of the prediction (Shortle, 1999). Sequence alignment is the bottleneck of Homology Modelling (Schonbrun et al., 2002). Good quality alignments play an important role in the success of Homology Modelling (Schonbrun et al., 2002) and in the accuracy of the predicted structure (Shortle, 1999, Zhang, 2002b).
2.3 Fold Recognition or Threading
When Homology Modelling methods fail to find protein sequences similar to the target sequence, Fold Recognition methods can be applied to predict the protein structure based on the similarity between the sequence of the target protein and the structures of known protein folds.
Fold Recognition methods are based on the observation that the number of protein folds in nature is limited, and that the structure of a new protein should be similar to one or more of these folds (Lotan, 2004). When the target protein is structurally similar to some known protein folds, these proteins are said to be remote homologues. Fold Recognition tries to identify the remote homologues among the known protein folds. It selects, from a set of alternatives, the fold that best fits the target sequence by aligning the sequence with the known protein structure folds (sequence-structure alignment) and ranking the alignments according to an energy function (Pedersen, 1999).
Figure 2.2: Fold Recognition (figure label: known protein folds)
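The idea of choosing the best-fitting fold according to an energy function can be sketched as follows. This is a deliberately simplified toy and not a published threading potential: each fold is reduced to a hypothetical string of burial states ('B' for buried, 'E' for exposed), the alignment is ungapped, and the energy rewards hydrophobic residues in buried positions and polar residues in exposed ones. The residue grouping, fold names and scoring values are all assumptions made for illustration.

```python
# Assumed grouping of hydrophobic residues (one-letter codes), for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def threading_energy(sequence: str, environments: str) -> int:
    """Toy ungapped sequence-structure score: -1 per compatible position, +1 otherwise."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold description must have equal length")
    energy = 0
    for residue, env in zip(sequence, environments):
        buried = env == "B"
        hydrophobic = residue in HYDROPHOBIC
        # Compatible: hydrophobic and buried, or polar and exposed.
        energy += -1 if buried == hydrophobic else 1
    return energy


def best_fold(sequence: str, fold_library: dict) -> str:
    """Return the name of the fold with the lowest toy energy for the sequence."""
    return min(fold_library, key=lambda name: threading_energy(sequence, fold_library[name]))


# Hypothetical fold library and target sequence.
fold_library = {
    "foldX": "BBEEBBEE",
    "foldY": "EEEEBBBB",
    "foldZ": "BEBEBEBE",
}
sequence = "LVKDIAES"
chosen = best_fold(sequence, fold_library)
```

Here `chosen` is `foldX`, because its burial pattern matches the hydrophobic pattern of the sequence at every position. Real fold recognition methods use gapped alignments and far richer energy functions, which is where the computational cost arises.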
As in Homology Modelling, sequence similarity plays an important role in the quality of Fold Recognition predictions. Fold Recognition methods fail to predict the precise fold when sequence similarity is low, and they cannot predict new folds because the prediction is based on folds that are already known (Ginalski et al., 2005). Fold Recognition is also limited by the high computational cost of the energy functions used to determine the correct fold (Zhang, 2002b). It does not provide a general understanding of the role of particular interactions in the formation of protein structure or of the mechanisms of protein folding (Pillardy et al., 2001). Moreover, according to Zhang (2008), progress in the development of Fold Recognition methods has reached a steady state.
2.4 Ab Initio
Homology Modelling and Fold Recognition methods fail to predict the protein structure when no protein sequences similar to the target sequence are found in the structure databases. In this case, ab initio prediction is a valuable complement to these methods because it can be applied more generally to predict the structure of any protein sequence.
Ab initio, or de novo, means from first principles or from the beginning. Ab initio protein structure prediction methods try to predict the protein tertiary structure from the amino acid sequence, using physical principles to fold the protein from a random conformation (Skolnick and Kolinski, 2001). They are based on Anfinsen's thermodynamic hypothesis (Anfinsen, 1973), which is the most widely accepted hypothesis and underlies most research in protein structure prediction (Ngan et al., 2008). The hypothesis explains the process of protein folding and was formulated from a Nobel Prize winning experiment. The experiment revealed that the amino acid sequence of a protein contains all the information about the forces (Chan and Dill, 1993) needed to fold the protein into its native conformation, i.e. the tertiary structure, which is the conformation with the lowest free energy. The natural conformation of a protein therefore corresponds to the conformation of minimum free energy.
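The hypothesis can be stated compactly. The notation is an assumption introduced here for illustration: $\Omega$ denotes the set of all possible conformations of the protein, $E(x)$ the free energy of a conformation $x$, and $x^{*}$ the native conformation.

```latex
x^{*} = \arg\min_{x \in \Omega} E(x),
\qquad
E(x^{*}) \le E(x) \quad \text{for all } x \in \Omega
```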
Based on Anfinsen's thermodynamic hypothesis, the protein structure prediction problem is formulated as a combinatorial minimisation problem (Ogura et al., 2003, Crivelli and Head-Gordon, 2004, Garduno-Juarez et al., 2003, Yun-Ling and Lan, 2006, Morales et al., 2000, Bortolussi et al., 2005, Vengadesan and Gautham, 2006). Classically, ab initio protein tertiary structure prediction carries out a conformational search guided by an energy function (Lee et al., 2009). The goal is to search the protein conformational space, which consists of all possible conformations of the protein, for the conformation with the lowest free energy. To achieve this, three main components of the ab initio method must be addressed (Pedersen, 1999, Bonneau and Baker, 2001, Zhang, 2002b, Hardin et al., 2002, Osguthorpe, 2000, Jones, 2000, Lee et al., 2009, Huang et al., 2000). First, the protein conformation must be represented appropriately; depending on the degrees of freedom treated, the representation ranges from an all-atom representation to a simplified or reduced one. Second, an energy function compatible with the chosen representation is used to calculate the energy of a conformation. Third, a conformational search algorithm is used to search the conformational space for the lowest free energy conformation.
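The first two components, a conformation representation and a compatible energy function, can be illustrated with a small sketch. It uses the two-dimensional HP lattice model, a well-known reduced representation chosen here purely as an example; the text above does not prescribe it. Residues are classed as hydrophobic (H) or polar (P), a conformation is a string of lattice moves, and the energy is -1 for every pair of H residues that are lattice neighbours but not adjacent in the chain. All function and variable names are illustrative.

```python
# Lattice moves for the reduced representation: Up, Down, Left, Right.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def to_coordinates(moves: str):
    """Decode a move string into lattice coordinates; None if the chain overlaps itself."""
    x, y = 0, 0
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        if (x, y) in coords:
            return None  # not a valid (self-avoiding) conformation
        coords.append((x, y))
    return coords


def hp_energy(sequence: str, moves: str):
    """HP-model energy: -1 per non-consecutive H-H lattice contact; None if invalid."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("a chain of n residues needs n - 1 moves")
    coords = to_coordinates(moves)
    if coords is None:
        return None
    index_at = {coord: i for i, coord in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighbouring pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


folded = hp_energy("HPPH", "RUL")    # chain bent into a square
extended = hp_energy("HPPH", "RRR")  # fully extended chain
```

For the toy sequence `HPPH`, the square conformation brings the two H residues into contact and has energy -1, whereas the extended chain has energy 0. An all-atom representation would replace the move string with atomic coordinates or torsion angles and the contact count with a physical force field.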
Since the problem is formulated as an optimisation problem, optimisation methods are a promising approach to solving it. They represent the conformation of a protein as a set of parameters that depends on the type of representation, and these parameters define the protein conformational search space. The prediction of protein tertiary structure using ab initio methods is performed by searching this space for the global minimum conformation. Many conformations are generated by changing the parameters, and each generated conformation is evaluated with the energy function. The search is performed iteratively, and the conformation corresponding to the global minimum is then chosen as the structure of the protein (Jones, 2000, Pillardy et al., 2001).
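The iterative generate-and-evaluate loop can be sketched with a Metropolis Monte Carlo search, which is one possible search strategy among many and is not singled out by the sources cited above. The conformation is assumed to be a list of torsion-like angles in radians, and `toy_energy` is a made-up function with several local minima per angle; it stands in for a real energy function and has no physical meaning. The parameter values are arbitrary.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical torsion-like energy with several local minima per angle."""
    return sum(
        1.0 + math.cos(3.0 * a) + 0.5 * (1.0 - math.cos(a - 1.0))
        for a in angles
    )


def metropolis_search(energy, start, steps=5000, temperature=0.5, step_size=0.5, seed=0):
    """Perturb one parameter at a time and accept moves by the Metropolis rule."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    current = list(start)
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        # Generate a new conformation by changing one parameter.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


start = [0.0] * 5
best_angles, best_energy = metropolis_search(toy_energy, start)
```

Accepting some uphill moves lets the search escape local minima, but nothing guarantees that the lowest energy found is the global minimum. This is the practical face of the multiple minima problem.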
Protein tertiary structure prediction using ab initio methods is the “holy grail” of the protein structure prediction field (Jones, 2000, Helles, 2008). Ab initio structure prediction remains a difficult challenge today (Ngan et al., 2008): developing an accurate ab initio protein structure prediction method is one of the top ten challenges in bioinformatics (Meidanis, 2003) and a major goal of theoretical molecular biology (Friesner and Gunn, 1996). Predicting the protein tertiary structure using only the protein sequence information is a true computational challenge, and this approach is the most complicated of the protein structure prediction approaches (Feldman, 2003). According to Yang (2008), predicting the structure of proteins larger than 150 amino acids using ab initio methods is a difficult task, owing to the limited accuracy of energy functions, the vast conformational space to be searched (Chivian et al., 2003) and the multiple minima problem. Because of these difficulties, it is widely believed that predicting tertiary structure from first principles is impossible. In contrast, other researchers, for example Pillardy et al. (2001), believe that the problem can be solved.
Ab initio methods are not limited to predicting the structures of proteins that belong to families with known structures. They can predict protein structure in the absence of homologous proteins. In principle, they are the only methods that can be applied to predict the structure of any protein sequence (Ye, 2007). They can predict protein structures under different environmental conditions and provide insight into the mechanism, thermodynamics and kinetics of protein folding (An improved hybrid global optimization method for protein tertiary structure prediction, 2009). However, ab initio methods are computationally intensive and provide low to moderate accuracy. Despite this low accuracy, the methods are useful because even a predicted structure that contains errors can be used to predict some aspects of protein function (Sanchez et al., 2000).
Ab initio methods can be divided into knowledge-based ab initio methods and classical ab initio (Forman, 2001) or simulation methods (Zhang, 2002a). Knowledge-based methods use constraints and rules inferred from the data of known structures. Simulation methods do not use databases and predict the structure from physical principles. At present, the accuracy of ab initio methods is low and success is limited to small proteins (<100 amino acids) (Lee et al., 2009). The following subsections describe the three components of ab initio methods.