Overview Of Disulfide Bond Biology Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Disulfide bond formation is a post-translational modification that is common to many proteins and plays important roles in protein structure and function. In this study, we developed a new method for predicting both inter- and intra-protein disulfide bonds, based on the nearest neighbor algorithm together with the Maximum Relevance Minimum Redundancy (mRMR) method followed by Incremental Feature Selection (IFS). We incorporated features of sequence conservation, residue disorder and amino acid factors for both inter- and intra-protein disulfide bond prediction, and additionally incorporated a distance feature specifically for intra-protein disulfide bond prediction. Our approach achieved a prediction accuracy of 0.8702 for inter-protein disulfide bonds using 128 features and 0.9219 for intra-protein disulfide bonds using 261 features. We then analyzed and compared the optimal feature sets for intra- and inter-protein disulfide bonds. The results indicate that conservation, amino acid factor and disorder features can all affect disulfide bond formation, and that some features are specific to either intra- or inter-protein disulfide bonds. The selected optimal feature sets, especially the top-ranked features, may provide important clues for understanding the mechanisms that determine disulfide bond formation and for guiding further experimental validation in this research area.

KEYWORDS

Disulfide bond, inter-protein, intra-protein, Maximum Relevance Minimum Redundancy, Incremental Feature Selection, nearest neighbor algorithm

1 Introduction

Disulfide bond formation is a post-translational modification. A disulfide bond is formed by the oxidation of the thiol (-SH) groups of two cysteine residues, either within a single protein (intra-protein) or between proteins (inter-protein). Disulfide bonds are common to many proteins and are closely related to protein structure, since they impose geometrical constraints on the protein backbone [1-2]. Correct localization of disulfide bonds can greatly reduce the search space of possible conformations [3-4] and thereby facilitate the prediction of protein 3D structure. Disulfide bonds have been shown to play roles in various physiological functions, such as hemostasis [5], cell death [6], G-protein-coupled receptors [7] and growth factors [8]. They have also been implicated in various pathological processes, such as tumor immunity [9] and neurodegenerative diseases [6].

In addition, determining cysteine disulfide bonding patterns by conventional experimental approaches, such as mass spectrometry [10-11], NMR [12] and radiation experiments [13], can be time-consuming and labor-intensive, especially for large-scale data. It is therefore much more convenient and faster to predict cysteine disulfide bonding patterns with in silico algorithms, especially at the proteome level.

Several computational methods have been developed for disulfide bond prediction. For instance, Lin Zhu et al. used both global and local features to develop a disulfide bond prediction method; using a support vector regression model and three newly developed feature selection methods, their method achieved a prediction accuracy of 80.3% [14]. However, their method can only predict intra-protein disulfide bonds. Rotem Rubinstein et al. analyzed correlated mutation patterns in multiple sequence alignments to predict disulfide bonds; their prediction accuracies for proteins with two, three and four disulfide bonds were 73%, 69% and 61%, respectively [15]. A limitation of this method is that it cannot unambiguously predict all disulfide bonds of a protein if more than one fully conserved disulfide bond exists. Hsuan-Hung Lin et al. developed a web server for disulfide bond prediction that uses the coordinates of the C atom of each amino acid in the protein as features. Their method performed better than earlier methods but has limitations; for example, it is not suitable for protein sequences containing cysteines located in metal-binding sites. As described above, most proposed methods have limitations, so developing a more general computational method for both inter- and intra-protein disulfide bond prediction is important.

In this work, we considered both inter- and intra-protein disulfide bonds and developed two computational methods, one for each type, based on a machine learning approach (the Nearest Neighbor Algorithm, NNA) combined with feature selection (IFS based on mRMR). We used three kinds of features: Position-Specific Scoring Matrix (PSSM) conservation scores, amino acid factors and disorder scores, plus a distance feature for intra-protein disulfide bonds. Our approach achieved a prediction accuracy of 0.8702 for inter-protein disulfide bonds using 128 features and 0.9219 for intra-protein disulfide bonds using 261 features. We then analyzed and compared the optimal feature sets for intra- and inter-protein disulfide bonds. The results indicate that conservation, amino acid factor and disorder features can all affect disulfide bond formation, and that some features are specific to either intra- or inter-protein disulfide bonds. The selected optimal feature sets, especially the top-ranked features, may provide important clues for understanding the mechanisms that determine disulfide bond formation and for guiding further experimental validation in this research area.

2 Materials and Methods

2.1 Dataset

We downloaded 2930 protein sequences containing disulfide bonds from SysPTM [16] and analyzed two disulfide bonding types: inter- and intra-protein. For inter-protein disulfide bonds, we extracted every cysteine site as a 9-residue peptide (the cysteine itself plus 4 residues upstream and 4 residues downstream). We took all known disulfide-bonded sites as positive samples and selected five times as many negative samples as positive samples. After excluding samples with missing (NA) feature values, we obtained 2227 samples in total, comprising 370 positive samples and 1857 negative samples. The inter-protein disulfide bond dataset is given in DataSet S1.
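As an illustration of the window extraction described above, the following Python sketch collects 9-residue peptides centered on each cysteine and skips cysteines that lack 4 residues on either side. The function name `cysteine_windows` and the toy sequence are hypothetical; this is a minimal sketch under those assumptions, not the code used in this study.

```python
from typing import List, Tuple


def cysteine_windows(seq: str, flank: int = 4) -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every cysteine that has `flank`
    residues on both sides; each window has length 2 * flank + 1."""
    windows = []
    for i, aa in enumerate(seq):
        if aa != "C":
            continue
        start, end = i - flank, i + flank + 1
        if start < 0 or end > len(seq):
            continue  # not enough flanking residues: skip this site
        windows.append((i, seq[start:end]))
    return windows


# Hypothetical toy sequence; the cysteine at position 17 is too close to the end.
print(cysteine_windows("MKTACLLGAVCWPQRSTCA"))
```

Each returned window has the cysteine at its center (index 4), which is the site numbered 5 in the site-specific analyses.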

For intra-protein disulfide bonds, we enumerated all cysteine pairs within each protein sequence and took known disulfide-bonded cysteine pairs as positive samples. Because 94.05% of the positive pairs are separated by fewer than 100 residues, we selected negative samples, five times as many as the positives, from the remaining cysteine pairs separated by fewer than 100 residues. After excluding pairs in which either cysteine had fewer than 8 surrounding residues, we obtained 46988 samples in total, comprising 7089 positive samples and 39899 negative samples. The intra-protein disulfide bond dataset is given in DataSet S2.
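The pair-sampling procedure above can be sketched in Python as follows. The 100-residue cutoff and the 5:1 negative-to-positive ratio follow the text; the helper names, the per-sequence enumeration, the fixed random seed and the toy sequence are assumptions for illustration. In a full pipeline, pairs from all proteins would be pooled (for example, tagged with a protein ID) before negatives are drawn.

```python
import random
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def enumerate_pairs(seq: str, bonds: Set[Pair], max_sep: int = 100):
    """Split all cysteine pairs of one sequence into known bonds (positives)
    and a pool of non-bonded pairs separated by fewer than `max_sep` residues."""
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    bonded = {tuple(sorted(b)) for b in bonds}
    positives, pool = [], []
    for i, j in combinations(cys, 2):  # cys is sorted, so i < j
        if (i, j) in bonded:
            positives.append((i, j))
        elif j - i < max_sep:
            pool.append((i, j))
    return positives, pool


def sample_negatives(pool: List[Pair], n_pos: int, ratio: int = 5,
                     seed: int = 0) -> List[Pair]:
    """Randomly draw up to ratio * n_pos negative pairs from the pool."""
    k = min(len(pool), ratio * n_pos)
    return random.Random(seed).sample(pool, k)


pos, pool = enumerate_pairs("ACAAACAAAACA", {(10, 1)})
neg = sample_negatives(pool, len(pos))
print(pos, sorted(neg))
```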

For independent testing, we downloaded 3217 protein sequences containing experimentally validated disulfide bonds from UniProt [17-18]. We removed the 2898 sequences already used in our training dataset, as well as sequences shorter than 50 residues, leaving 260 protein sequences. For inter-protein disulfide bond prediction, we then extracted all 9-residue peptides centered on a cysteine (with 4 residues on each side), yielding 2750 sample peptides: 54 positive and 2696 negative. The independent test dataset for inter-protein disulfide bond prediction is given in DataSet S3.

For intra-protein disulfide bonds, there are 37948 possible cysteine pairs in total, including 747 disulfide-bonded pairs and 37201 non-bonded pairs; 10911 of the non-bonded pairs are separated by fewer than 100 residues. From the 747 bonded pairs and these 10911 non-bonded pairs, we excluded cysteine pairs with fewer than 9 residues on either the C- or N-terminal side, resulting in 11213 sample cysteine pairs (695 positive and 10518 negative), each with 9 residues on both sides. The independent test dataset for intra-protein disulfide bond prediction is given in DataSet S4.

2.2 Feature Construction

2.2.1 PSSM conservation score features

Evolutionary conservation is an important aspect of biology and plays an important role in determining post-translational modifications, such as tyrosine sulfation [19] and disulfide bond formation [14]. In our study, we used Position-Specific Iterated BLAST (PSI-BLAST) [20] to quantify the conservation of each residue against the 20 amino acids as a 20-dimensional vector. The 20-dimensional vectors of all residues in a protein sequence form its Position-Specific Scoring Matrix (PSSM). Residues that are more important for biological function tend to be more conserved, and this conservation is reflected in the PSSM that PSI-BLAST builds over its iterations. The PSSM conservation scores were used as the conservation features of each residue in a protein sequence.

2.2.2 The features of amino acid factors

The diversity and specificity of protein structures and functions are largely attributed to the different compositions of different amino acids, which have different physicochemical properties. The effect of amino acid properties on post-translational modification determination has been demonstrated by previous studies [14, 19].

AAIndex [21] is a database of amino acid physicochemical and biochemical properties. Atchley et al. [22] performed multivariate statistical analyses on AAIndex and summarized it into five highly compact numeric patterns reflecting polarity, secondary structure, molecular volume, codon diversity and electrostatic charge. We used these five numeric pattern scores (referred to as amino acid factors) to represent the properties of each amino acid in our research.

2.2.3 The features of disorder score

A protein disordered region is a segment that lacks a stable 3-D structure under physiological conditions. Previous studies have shown that such regions often contain PTM sites and sorting signals, and play important roles in protein structure and function [23-25].

In our study, VSL2 [26], which can accurately predict both long and short disordered regions in proteins, was used to calculate a disorder score for each residue in the protein sequence. The disorder scores of the cysteine site and the 4 flanking sites on each side (C-terminal and N-terminal) formed the disorder score features in our study.

2.2.4 The feature space

2.2.4.1. for inter-protein disulfide bond

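For the central cysteine (C) site, 20 PSSM conservation scores and 1 disorder score, 21 features in total, were used; amino acid factors carry no information at this site because the central residue is always cysteine, so its factors are constant. For each of the 8 surrounding amino acids, 20 PSSM conservation scores, 5 amino acid factors and 1 disorder score, 26 features in total, were used. Overall, each sample peptide was encoded by 21 + 8 × 26 = 229 features.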
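The 229-dimensional encoding above can be sketched with NumPy. The per-site ordering of features (PSSM, then factors, then disorder) and the function name `encode_inter` are assumptions for illustration; the study does not specify the concatenation order, and the random inputs stand in for real PSSM, factor and VSL2 values.

```python
import numpy as np


def encode_inter(pssm_win, factors_win, disorder_win, center=4):
    """Concatenate per-site features of a 9-residue cysteine window.

    pssm_win:     (9, 20) PSSM conservation scores
    factors_win:  (9, 5)  amino acid factors (the center row is ignored)
    disorder_win: (9,)    disorder scores
    """
    pssm_win = np.asarray(pssm_win, dtype=float)
    factors_win = np.asarray(factors_win, dtype=float)
    disorder_win = np.asarray(disorder_win, dtype=float)
    parts = []
    for k in range(pssm_win.shape[0]):
        parts.append(pssm_win[k])            # 20 conservation scores
        if k != center:                      # center is always C: factors constant
            parts.append(factors_win[k])     # 5 amino acid factors
        parts.append(disorder_win[k:k + 1])  # 1 disorder score
    return np.concatenate(parts)


rng = np.random.default_rng(0)
vec = encode_inter(rng.normal(size=(9, 20)), rng.normal(size=(9, 5)), rng.random(9))
print(vec.shape)  # (229,)
```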

2.2.4.2. for intra-protein disulfide bond

For each candidate pair, we computed the sum and the absolute difference of the 229 site features (Section 2.2.4.1) of the two paired cysteines, giving 2 × 229 = 458 features. The distance between the two paired cysteines was also included as a feature, so the overall feature space contains 458 + 1 = 459 features.
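A minimal sketch of the pair encoding, assuming each cysteine has already been encoded as a 229-dimensional vector and that the distance feature is the sequence separation in residues (an assumption consistent with the 100-residue cutoff in Section 2.1). The function name `encode_intra` is hypothetical.

```python
import numpy as np


def encode_intra(v1, v2, pos1, pos2):
    """Pair encoding: elementwise sum, elementwise |difference| and separation."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return np.concatenate([v1 + v2, np.abs(v1 - v2), [abs(pos2 - pos1)]])


rng = np.random.default_rng(1)
a, b = rng.normal(size=229), rng.normal(size=229)
pair = encode_intra(a, b, 10, 42)
print(pair.shape)  # (459,)
```

Using the sum and the absolute difference makes the encoding independent of which cysteine is listed first, so each unordered pair has a single feature vector.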

2.3 mRMR method

To rank the features by importance, we used the Maximum Relevance, Minimum Redundancy (mRMR) method, which ranks features based on a trade-off between maximum relevance to the target and minimum redundancy with the already-selected features. A feature with a smaller index is more important.

We used mutual information (MI) to quantify the relation between two vectors, defined as follows:

$$I(x, y) = \iint p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy \qquad (1)$$

where x and y are vectors, p(x, y) is their joint probability density, and p(x) and p(y) are the marginal probability densities.

To quantify both relevance and redundancy, we define Ω as the whole feature set, Ω_s as the already-selected feature set containing m features, and Ω_t as the to-be-selected feature set containing n features. The relevance D between a feature f in Ω_t and the target c can be calculated by:

$$D = I(f, c) \qquad (2)$$

The redundancy R between the feature f in Ω_t and all the features in Ω_s can be calculated by:

$$R = \frac{1}{m}\sum_{f_i \in \Omega_s} I(f, f_i) \qquad (3)$$

To obtain the feature f_j in Ω_t with maximum relevance and minimum redundancy, Eq. (2) and Eq. (3) are combined into the mRMR function:

$$\max_{f_j \in \Omega_t}\left[ I(f_j, c) - \frac{1}{m}\sum_{f_i \in \Omega_s} I(f_j, f_i) \right], \quad j = 1, 2, \ldots, n \qquad (4)$$

For a feature set with N features (N = m + n), the feature evaluation continues for N rounds. After these evaluations, the mRMR method yields the ordered feature set:

$$S = \{f'_1, f'_2, \ldots, f'_h, \ldots, f'_N\} \qquad (5)$$

In this feature set S, each feature has an index h indicating the round in which it was selected. A better feature is selected earlier and has a smaller index h.
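The greedy ranking of Eqs. (2) to (5) can be sketched in Python. This sketch assumes the features have already been discretized (MI is estimated by simple counting, in nats), uses the difference form of Eq. (4), and falls back to relevance alone in the first round when Ω_s is empty. The helper names are hypothetical and this is not the mRMR program used in the study.

```python
import math
from collections import Counter

import numpy as np


def mutual_info(x, y):
    """Plug-in estimate of I(x, y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mrmr_rank(X, target):
    """Return column indices of X in mRMR order (index h = position + 1)."""
    X = np.asarray(X)
    remaining = list(range(X.shape[1]))
    relevance = {j: mutual_info(X[:, j], target) for j in remaining}  # Eq. (2)
    selected = []
    while remaining:
        def score(j):
            if not selected:  # first round: relevance only
                return relevance[j]
            redundancy = sum(mutual_info(X[:, j], X[:, i])
                             for i in selected) / len(selected)  # Eq. (3)
            return relevance[j] - redundancy  # Eq. (4)
        best = max(remaining, key=score)  # ties: earliest column wins
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy check, a column that duplicates the top-ranked feature is pushed to the end of the ranking, even behind an uninformative constant column, because its redundancy with the already-selected copy outweighs its relevance.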

2.4 Nearest Neighbor Algorithm

In our study, the Nearest Neighbor Algorithm (NNA) is used as the prediction model. NNA calculates the similarity between a test sample and every training sample, then assigns the test sample to the class of the most similar training sample. In our study, the distance between vectors x and x_i is defined as follows [27-28]:

$$D(x, x_i) = 1 - \frac{x \cdot x_i}{\lVert x \rVert\,\lVert x_i \rVert} \qquad (6)$$

where x · x_i is the inner product of x and x_i, and ‖x‖ denotes the module (norm) of vector x. The smaller D(x, x_i) is, the more similar x is to x_i.

In NNA, given a vector x and a training set {x_1, x_2, ..., x_N}, x is assigned to the same class as its nearest neighbor x_k, i.e. the training vector with the smallest D(x, x_k):

$$D(x, x_k) = \min\{D(x, x_1), D(x, x_2), \ldots, D(x, x_N)\} \qquad (7)$$
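Eqs. (6) and (7) can be sketched directly in Python with NumPy. The function names and the toy training set are hypothetical, and the sketch assumes no vector is all zeros (the cosine distance is undefined for a zero vector).

```python
import numpy as np


def cosine_distance(x, xi):
    """Eq. (6): D(x, x_i) = 1 - (x . x_i) / (|x| |x_i|)."""
    x, xi = np.asarray(x, dtype=float), np.asarray(xi, dtype=float)
    return 1.0 - float(np.dot(x, xi)) / (np.linalg.norm(x) * np.linalg.norm(xi))


def nna_predict(x, X_train, y_train):
    """Eq. (7): return the label of the training vector nearest to x."""
    distances = [cosine_distance(x, xi) for xi in X_train]
    return y_train[int(np.argmin(distances))]


X_train = [[1.0, 0.0], [0.0, 1.0]]
y_train = ["bond", "no bond"]
print(nna_predict([0.9, 0.1], X_train, y_train))
```

Because this distance depends only on the angle between vectors, rescaling a sample (for example, [10.0, 1.0] instead of [1.0, 0.1]) does not change its nearest neighbor.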

2.5 Jackknife Cross-Validation Method

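We used the jackknife cross-validation method [29-31], an objective and effective way to evaluate the performance of our classifier. In jackknife cross-validation, every sample is tested by a predictor trained with all the other samples. The prediction accuracies for the positive samples (Sn), the negative samples (Sp) and all samples (ACC) were defined as follows:

$$Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (8)$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.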
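A jackknife evaluation of the nearest neighbor predictor can be sketched compactly: for 1-NN, leaving one sample out amounts to forbidding each sample from being its own neighbor. The function name `jackknife_nna`, the label convention (1 = bonded, 0 = not bonded) and the toy data are assumptions, and the sketch assumes no all-zero feature vectors.

```python
import numpy as np


def jackknife_nna(X, y):
    """Leave-one-out evaluation of a 1-nearest-neighbour classifier using
    D(x, x_i) = 1 - cos(x, x_i). Labels: 1 = positive, 0 = negative.
    Returns (Sn, Sp, ACC) as in Eq. (8)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no zero rows
    sim = unit @ unit.T                # cosine similarity = 1 - D
    np.fill_diagonal(sim, -np.inf)     # a sample cannot be its own neighbour
    pred = y[np.argmax(sim, axis=1)]   # max similarity = min distance
    tp = int(np.sum((pred == 1) & (y == 1)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sn, sp, acc


print(jackknife_nna([[1, 0], [0.9, 0.1], [0, 1], [0.5, 0.5]], [1, 1, 0, 0]))
```

Precomputing the full similarity matrix is convenient for small toy data; for tens of thousands of samples, as in the intra-protein dataset, computing it in row blocks would keep memory use manageable.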

2.6 Incremental Feature Selection (IFS)

After ranking of features by mRMR method based on their importance, we used Incremental Feature Selection (IFS) to determine the optimal number of features.

Incremental feature selection is conducted on the mRMR-ranked features. Features are added one by one from higher to lower rank, and each addition yields a new feature set, so N feature sets are obtained, where N is the total number of features. The i-th feature set is:

$$S_i = \{f'_1, f'_2, \ldots, f'_i\} \quad (1 \le i \le N)$$

Based on each of the N feature sets, an NNA predictor was constructed and tested with the jackknife cross-validation test. From the N overall, positive and negative accuracy rates, we obtained an IFS table with one column for the index i and three columns for the overall, positive and negative accuracy rates, respectively. The feature set S_opt that achieves the highest overall accuracy is taken as the optimal feature set.
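The IFS loop can be sketched as follows, reusing a leave-one-out 1-NN accuracy with the cosine distance of Eq. (6). For brevity, only the overall accuracy column of the IFS table is computed; the function names and toy data are hypothetical, and the sketch assumes no sample is all zeros on any prefix of the ranked features.

```python
import numpy as np


def loo_nna_accuracy(X, y):
    """Jackknife (leave-one-out) overall accuracy of 1-NN with cosine distance."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no zero rows
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)
    return float(np.mean(y[np.argmax(sim, axis=1)] == y))


def incremental_feature_selection(X, y, ranked):
    """Evaluate S_i = first i ranked features for i = 1..N.
    Returns the IFS table [(i, accuracy), ...] and the best i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    table = []
    for i in range(1, len(ranked) + 1):
        table.append((i, loo_nna_accuracy(X[:, ranked[:i]], y)))
    best_i = max(table, key=lambda row: row[1])[0]  # first maximum on ties
    return table, best_i


X = [[1, 0.1], [1, 0.2], [0.1, 1], [0.2, 1]]
table, best = incremental_feature_selection(X, [1, 1, 0, 0], ranked=[1, 0])
print(table, best)
```

With a single feature, the cosine distance can only distinguish signs, which is why the one-feature subset performs poorly in this toy example.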

3 Results and Discussion

3.1 mRMR result

Using the mRMR program, we obtained ranked mRMR lists of 229 and 459 features for inter- and intra-protein disulfide bonds, respectively. Within each list, a smaller index indicates a more important role in discriminating positive samples from negative ones. The mRMR lists were used in the IFS procedure for feature selection and analysis.

3.2 IFS result

3.2.1. for inter-protein disulfide bonds

Based on the mRMR output, we built 229 individual predictors for the 229 feature subsets to predict disulfide bond sites. We tested each of the 229 predictors and obtained the IFS results given in Table S1. Figure 1A shows the IFS curve plotted from Table S1. The maximum accuracy, 0.8752, was achieved with 207 features. To focus our analysis on a relatively small set of features, we instead selected the smallest feature set whose accuracy exceeded 0.87, which achieved 0.8702 with 128 features. These 128 features were considered the optimal feature set of our classifier and are listed in Table S2.
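The two selection rules used in this section (the maximum accuracy, or the smallest feature set whose accuracy exceeds a threshold) can be sketched as a small helper. The function name and the toy IFS table are hypothetical; the table values are illustrative and are not taken from Table S1.

```python
def pick_feature_count(table, threshold=None):
    """table: list of (n_features, accuracy) rows in increasing n_features.
    With threshold=None, return the n with maximum accuracy (first on ties);
    otherwise return the smallest n whose accuracy exceeds the threshold."""
    if threshold is None:
        return max(table, key=lambda row: row[1])[0]
    for n, acc in table:
        if acc > threshold:
            return n
    raise ValueError("no feature set exceeds the threshold")


toy_table = [(1, 0.80), (2, 0.871), (3, 0.875), (4, 0.86)]
print(pick_feature_count(toy_table), pick_feature_count(toy_table, threshold=0.87))
```

The threshold rule trades a small loss in accuracy for a much smaller feature set, which is the trade-off made for inter-protein bonds (0.8702 with 128 features versus 0.8752 with 207).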

3.2.2. for intra-protein disulfide bonds

Based on the mRMR output, we built 459 individual predictors for the 459 feature subsets to predict disulfide bond sites. We tested each of the 459 predictors and obtained the IFS results given in Table S3. Figure 1B shows the IFS curve plotted from Table S3. The maximum accuracy, 0.9219, was achieved with 261 features. These 261 features were considered the optimal feature set of our classifier and are listed in Table S4.

3.3 Optimal feature set analysis

We investigated the optimal feature sets selected for inter- and intra-protein disulfide bond determination, counting the number of features of each of the three types (PSSM conservation scores, amino acid factors and disorder scores) in each optimal feature set.
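Counts like those in Figure 2 can be produced by tallying the optimal features by type and by site. The naming scheme below (`PSSM_<aa>_<site>`, `AAF_<factor>_<site>`, `DIS_<site>`, `DIST`) is a hypothetical convention assumed for illustration; the actual feature labels in Tables S2 and S4 may differ.

```python
from collections import Counter


def tally(features):
    """Count features by type and by window site (1-9).

    Assumed naming: PSSM_<aa>_<site>, AAF_<factor>_<site>, DIS_<site>, DIST."""
    by_type, by_site = Counter(), Counter()
    for name in features:
        kind, *rest = name.split("_")
        by_type[kind] += 1
        if kind != "DIST":  # the distance feature has no site
            by_site[int(rest[-1])] += 1
    return by_type, by_site


types, sites = tally(["PSSM_C_5", "PSSM_A_1", "AAF_polarity_4", "DIS_6", "DIST"])
print(dict(types), dict(sites))
```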

3.3.1. for inter-protein disulfide bonds

As shown in Figure 2A, the optimal 128 features comprised 26 amino acid factor features, 3 disorder score features and 99 PSSM conservation score features. This suggests that all three kinds of features contribute to the prediction of cysteine disulfide bonding patterns, and that conservation may play a major role in disulfide bond prediction.

Figure 2B shows that the center site (site 5) and the relatively distal sites (sites 1, 2 and 9) have the most influence on inter-protein disulfide bond prediction. Sites 3, 4, 6 and 8 have a relatively small effect, and site 7 has the smallest effect on inter-protein disulfide bond prediction. This site-specific distribution of the optimal feature set is interesting: it suggests that the center and the residues at the two ends of the window are more important for inter-protein disulfide bond prediction than the residues directly adjacent to the cysteine.

3.3.2. for intra-protein disulfide bonds

As shown in Figure 2C, the optimal 261 features comprised 47 amino acid factor features, 3 disorder score features, 210 PSSM conservation score features and 1 distance feature. This suggests that all three kinds of features, as well as the distance feature, contribute to the prediction of cysteine disulfide bond sites, and that conservation may play a major role in disulfide bond prediction.

The site-specific distribution of the optimal feature set in Figure 2D shows that the center (sites 4, 5 and 6) and the relatively distal sites (sites 1, 2, 8 and 9) have the most influence on cysteine disulfide bond determination, whereas the remaining two sites (sites 3 and 7) have a relatively small effect. Thus, for intra-protein disulfide bonds, the residues at the center and at the two ends of the window are more important for disulfide bond determination than the residues at sites 3 and 7.

3.3.3 Optimal feature set comparison between inter- and intra-protein disulfide bonds

Figures 2A and 2C show that PSSM conservation, amino acid factor and disorder features all contribute to both inter- and intra-protein disulfide bond prediction. The site-specific distributions of the optimal feature sets show that sites at the center (sites 4, 5 and 6) and at the two ends (sites 1, 2, 8 and 9) contribute to both inter- and intra-protein disulfide bond prediction. Site 7 contributes relatively little to both. However, features at site 3 contribute more to inter-protein than to intra-protein disulfide bond prediction.

3.4. PSSM conservation feature analysis

3.4.1. for inter-protein disulfide bonds

We investigated the feature-specific and site-specific distribution of these 99 PSSM conservation features in the optimal feature set.

As shown in Figure 3B, the conservation status of the cysteine itself (site 5) is the most important. Sites at both ends (sites 1, 2, 3, 8 and 9) play relatively important roles in disulfide bond determination, whereas the sites adjacent to the cysteine play relatively less important roles.

As shown in Figure 3A, the conservation status against cysteine (C) plays the most important role in disulfide bond determination. In addition, the conservation status against A, M, W, H, P, Y and V plays a relatively important role.

3.4.2. for intra-protein disulfide bonds

As shown in Figure 3C, the conservation status against C, S and A has a greater effect on disulfide bond determination than that against other residues, followed by the conservation status against R, H, K and Y.

As shown in Figure 3D, the conservation status at the center (site 4, 5 and 6) and relatively distal sites (site 1, 2 and site 8, 9) has the most important effects on disulfide bond determination.

3.4.3. PSSM conservation feature comparison between inter- and intra-protein disulfide bonds

Figures 3A and 3C show that the conservation status against C is the most important PSSM conservation feature for both inter- and intra-protein disulfide bond prediction. The conservation status against A, H and Y also plays important roles in both. However, the two disulfide bond types differ: for inter-protein disulfide bond prediction, the conservation status against M, W, P and V plays important roles, whereas for intra-protein disulfide bond prediction, the conservation status against S, R and K does.

Figures 3B and 3D show that the conservation status at site 5 plays the most important role in both inter- and intra-protein disulfide bond prediction, and that the conservation status of sites at the two ends (sites 1, 2, 8 and 9) is important for both. However, the conservation status of sites 4 and 6 is more important for intra-protein than for inter-protein disulfide bond prediction, while that of site 3 is relatively more important for inter-protein disulfide bond prediction.

3.5. Amino acid factor analysis

3.5.1. for inter-protein disulfide bonds

We investigated the feature- and site-specific distributions of the 26 amino acid factor features in the optimal feature set.

As shown in Figure 4A, secondary structure and molecular volume play the most important roles in disulfide bond determination, followed by codon diversity and polarity. Electrostatic charge plays a relatively minor role.

As shown in Figure 4B, amino acid factors at sites 4 and 6 play the most important role in disulfide bond determination. The remaining 6 sites play relatively equal and small roles.

3.5.2. for intra-protein disulfide bonds

As shown in Figure 4C, secondary structure has the most important effect on disulfide bond determination. Electrostatic charge and codon diversity have the second most important effect on disulfide bond determination. Polarity and molecular volume have relatively small effect on disulfide bond determination.

As shown in Figure 4D, amino acid factor features at site 4 have the most important effect on disulfide bond determination. The amino acid factor features at the remaining sites have important and almost equal effects.

3.5.3. Amino acid factor comparison between inter- and intra-protein disulfide bond predictions

Figures 4A and 4C show that the secondary structure factor plays the most important role in both inter- and intra-protein disulfide bond prediction. The molecular volume factor is more important for inter-protein disulfide bond prediction, whereas the electrostatic charge factor is more important for intra-protein disulfide bond prediction.

3.6. Disorder score analysis

3.6.1. for inter-protein disulfide bonds

There are 3 disorder features in the optimal feature set, located at sites 1, 6 and 9, indicating that the disorder status at a directly adjacent site (site 6) and at relatively distal sites (sites 1 and 9) is important for disulfide bond determination.

3.6.2. for intra-protein disulfide bonds

There were 3 disorder features in the optimal feature set: 2 at site 9 and 1 at site 1. This indicates that the disorder status at sites 9 and 1 is important for disulfide bond determination.

3.6.3. Disorder score comparison

For inter-protein disulfide bond prediction, disorder features at sites 1, 6 and 9 were selected; for intra-protein disulfide bond prediction, 1 disorder feature at site 1 and 2 disorder features at site 9 were selected. These results indicate that the disorder status at site 1 is important for both inter- and intra-protein disulfide bond determination. The disorder status at site 6 was selected only for inter-protein disulfide bond prediction, whereas the disorder status at site 9 is more important than that at other sites for intra-protein disulfide bond prediction.

3.7. Distance features

The optimal feature set included the distance feature, ranked second (index 2), indicating that the distance between the two cysteines plays an important role in disulfide bond determination.

3.8. Directions for experimental validation

We investigated the top 10 features in the inter-protein optimal feature set and the top 20 features in the intra-protein optimal feature set. A detailed analysis of these top features may provide useful information for understanding the disulfide bond formation mechanism and for further experimental studies.

3.8.1 for inter-protein disulfide bonds

Among the top 10 features for inter-protein disulfide bond prediction, there are 5 amino acid factor features, 4 PSSM conservation features and 1 disorder feature. This indicates that amino acid factor features play the most important role in inter-protein disulfide bond prediction, unlike in intra-protein disulfide bond prediction. The disorder feature at site 6, which was not selected in the intra-protein optimal feature set, ranks fourth (index 4) in the inter-protein optimal feature set, suggesting that it plays different roles in inter- and intra-protein disulfide bond determination.

3.8.2 for intra-protein disulfide bonds

For intra-protein disulfide bond prediction, the top 20 features comprise 19 PSSM conservation features and 1 distance feature. This indicates that the conservation status around the cysteine sites is the most important factor for intra-protein disulfide bond prediction. A previous study showed that bonded cysteines are significantly more conserved than unbonded ones [32], and correlated mutation patterns of disulfide-bonded cysteine pairs have also been demonstrated [15]. The distance between the two disulfide-bonded cysteines ranks second (index 2), indicating that it is an important feature for intra-protein disulfide bond prediction, which is consistent with a previous study [14].

3.8.3 Comparison of the top features for inter- and intra-protein disulfide bonds

Comparing the top features for inter- and intra-protein disulfide bond prediction suggests that different mechanisms may determine the two types of disulfide bonds. For intra-protein disulfide bonds, conservation and distance play an almost exclusive role, whereas for inter-protein disulfide bonds, amino acid factor and disorder features play larger roles. This may provide clues for further studies in this research area.

3.9 Comparisons with existing methods

We applied our method to the independent test datasets. For inter-protein disulfide bonds, the prediction accuracies for positive, negative and all samples were 0.5556, 0.9065 and 0.8996, respectively. For intra-protein disulfide bonds, they were 0.7151, 0.9028 and 0.8912, respectively.
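Because ACC in Eq. (8) is fixed by Sn, Sp and the numbers of positive and negative samples, the reported rates can be cross-checked against the test-set sizes from Section 2.1. The sketch below backs out the true-positive and true-negative counts implied by the reported rates, assuming the rates were rounded to four decimal places; these counts are derived by arithmetic and are not reported in the original text. The helper name `implied_counts` is hypothetical.

```python
def implied_counts(n_pos, n_neg, sn, sp):
    """Back out TP and TN from rounded Sn and Sp, then recompute ACC."""
    tp = round(sn * n_pos)
    tn = round(sp * n_neg)
    acc = (tp + tn) / (n_pos + n_neg)
    return tp, tn, acc


inter = implied_counts(54, 2696, 0.5556, 0.9065)
intra = implied_counts(695, 10518, 0.7151, 0.9028)
print(inter[:2], round(inter[2], 4))  # (30, 2444) 0.8996
print(intra[:2], round(intra[2], 4))  # (497, 9496) 0.8912
```

In both cases the recomputed overall accuracy matches the reported value, so the three reported rates are mutually consistent.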

We also applied other methods to our independent test datasets. The prediction accuracy for ** is **.

4 Conclusions

In this study, we developed a computational method for both inter- and intra-protein disulfide bond prediction. Based on the mRMR and IFS methods, our approach achieved a prediction accuracy of 0.8702 for inter-protein disulfide bonds using 128 features and 0.9219 for intra-protein disulfide bonds using 261 features. We then analyzed and compared the optimal feature sets for intra- and inter-protein disulfide bonds. The results indicate that conservation, amino acid factor and disorder features can all affect disulfide bond formation, and that some features are specific to either intra- or inter-protein disulfide bonds. The selected optimal feature sets, especially the top-ranked features, may provide important clues for understanding the mechanisms that determine disulfide bond formation and for guiding further experimental validation in this research area.

Supporting information

DataSet S1. Training data set for inter-protein disulfide bond prediction used in this study.

DataSet S2. Training data set for intra-protein disulfide bond prediction used in this study.

DataSet S3. Independent test data set for inter-protein disulfide bond prediction used in this study.

DataSet S4. Independent test data set for intra-protein disulfide bond prediction used in this study.

Table S1. Inter-protein IFS scores.

Table S2. Inter-protein optimal feature set.

Table S3. Intra-protein IFS scores.

Table S4. Intra-protein optimal feature set.

Figure captions

Figure 1. Distribution of prediction accuracy against feature numbers for both inter- and intra-protein disulfide bond prediction

(A) Distribution of prediction accuracy against feature numbers for inter-protein disulfide bond prediction. IFS prediction accuracy was plotted against the number of features based on Table S1. The maximum accuracy, 0.8752, was achieved with 207 features. To focus our analysis on a relatively small set of features, we selected the smallest feature set whose accuracy exceeded 0.87, which achieved 0.8702 with 128 features. These 128 features were considered the optimal feature set of our classifier. (B) Distribution of prediction accuracy against feature numbers for intra-protein disulfide bond prediction, based on Table S3. The maximum accuracy, 0.9219, was achieved with 261 features, which were considered the optimal feature set of our classifier.

Figure 2. Feature and site specific distribution of the optimal feature set for both inter- and intra-protein disulfide bond prediction

(A) Feature distribution of the optimal feature set for inter-protein disulfide bond prediction. The optimal 128 features comprise 26 amino acid factor features, 3 disorder score features and 99 PSSM conservation score features, suggesting that all three kinds of features contribute to the prediction of cysteine disulfide bonding patterns and that conservation may play a major role. (B) Site-specific distribution of the optimal feature set for inter-protein disulfide bond prediction. The center site (site 5) and the relatively distal sites (sites 1, 2 and 9) have the most influence on inter-protein disulfide bond prediction; sites 3, 4, 6 and 8 have a relatively small effect, and site 7 has the smallest effect. (C) Feature distribution of the optimal feature set for intra-protein disulfide bond prediction. The optimal 261 features comprise 47 amino acid factor features, 3 disorder score features, 210 PSSM conservation score features and 1 distance feature. (D) Site-specific distribution of the optimal feature set for intra-protein disulfide bond prediction. The center (sites 4, 5 and 6) and the relatively distal sites (sites 1, 2, 8 and 9) have the most influence on cysteine disulfide bond determination, whereas the remaining two sites (sites 3 and 7) have a relatively small effect.

Figure 3. Feature- and site-specific distribution of the PSSM features in the optimal feature set for both inter- and intra-protein disulfide bond predictions

(A) Feature distribution of the PSSM features in the optimal feature set for inter-protein disulfide bond prediction. The conservation status against cysteine (C) plays the most important role in disulfide bond determination, followed by the conservation status against A, M, W, H, P, Y and V. (B) Site-specific distribution of the PSSM features in the optimal feature set for inter-protein disulfide bond prediction. The conservation status at the cysteine site (site 5) is the most important. Sites on both sides (sites 1, 2 and 3, and sites 8 and 9) play relatively important roles, whereas the sites adjacent to the cysteine site play less important roles in disulfide bond determination. (C) Feature distribution of the PSSM features in the optimal feature set for intra-protein disulfide bond prediction. The conservation status against C, S and A has a greater effect on disulfide bond determination than that against other residues, followed by the conservation status against R, H, K and Y. (D) Site-specific distribution of the PSSM features in the optimal feature set for intra-protein disulfide bond prediction. The conservation status at the center (sites 4, 5 and 6) and at relatively distal sites (sites 1, 2, 8 and 9) has the most important effect on disulfide bond determination.
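The site numbering used in these figures (sites 1 to 9, with the cysteine at site 5) implies a window of four residues on each side of the cysteine. A minimal sketch of how per-site, per-residue PSSM conservation features could be extracted from such a window follows. The zero padding at sequence ends, the PSI-BLAST column order `ARNDCQEGHILKMFPSTWYV`, and the names `pssm_window_features` and `pssm_feature_names` are assumptions for illustration, not details reported by the study.

```python
from typing import List, Sequence

# Assumed column order of the PSSM (the order PSI-BLAST uses in its ASCII PSSM output).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_window_features(
    pssm: Sequence[Sequence[float]], center: int, half_width: int = 4
) -> List[float]:
    """Flatten the PSSM rows of a (2*half_width + 1)-residue window centred on the
    0-based residue index `center`. Window sites outside the sequence are
    zero-padded (an assumption of this sketch)."""
    if len(pssm) == 0 or not 0 <= center < len(pssm):
        raise ValueError("center must index a residue of the sequence")
    n_cols = len(pssm[0])
    feats: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            feats.extend(pssm[pos])
        else:
            feats.extend([0.0] * n_cols)
    return feats


def pssm_feature_names(half_width: int = 4) -> List[str]:
    """Hypothetical names matching the flattening order above: sites are numbered
    1 .. 2*half_width + 1, so the cysteine itself is site half_width + 1."""
    return [
        f"PSSM_s{site}_{aa}"
        for site in range(1, 2 * half_width + 2)
        for aa in AMINO_ACIDS
    ]
```

With the default window, each cysteine yields 9 x 20 = 180 PSSM features, and the score against cysteine at the cysteine site itself is named `PSSM_s5_C`.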

Figure 4. Feature- and site-specific distribution of the amino acid factor features in the optimal feature set for both inter- and intra-protein disulfide bond predictions

(A) Feature-specific distribution of the amino acid factor features in the optimal feature set for inter-protein disulfide bond prediction. Secondary structure and molecular volume play the most important roles in disulfide bond determination, followed by codon diversity and polarity; electrostatic charge plays a relatively minor role. (B) Site-specific distribution of the amino acid factor features in the optimal feature set for inter-protein disulfide bond prediction. Amino acid factor features at sites 4 and 6 play the most important role in disulfide bond determination, while the remaining 6 sites play small and roughly equal roles. (C) Feature-specific distribution of the amino acid factor features in the optimal feature set for intra-protein disulfide bond prediction. Secondary structure has the most important effect on disulfide bond determination, followed by electrostatic charge and codon diversity; polarity and molecular volume have relatively small effects. (D) Site-specific distribution of the amino acid factor features in the optimal feature set for intra-protein disulfide bond prediction. Amino acid factor features at site 4 have the most important effect on disulfide bond determination, and those at the remaining sites have substantial and roughly equal effects.
