Predicting the Structure of Anopheles Gambiae Cytochrome P450 Protein in-silico

Abstract: The CYP12F4 protein is a member of the cytochrome P450 superfamily of monooxygenases, a large and diverse group of enzymes that catalyze the oxidation of organic substances and other metabolic reactions. It is found in the female African mosquito Anopheles gambiae (A. gambiae), which carries and transmits the deadliest malaria parasite, Plasmodium falciparum (Pf). No experimental structure is currently available for this protein, so it has remained uncharacterized, with unknown function.

This work uses in-silico methods to predict the structure of this metabolic catalyst and to deduce a specific function for it. Using 6 template proteins, 29 residues were modeled by homology. Several web servers were used to build a computational model of CYP12F4, and the GOPET web tool was then used to deduce the function of the protein. The folds were identified and analyzed, and the protein was predicted to be active in the binding of molecules (86% confidence), with various catalytic activities. In total, 20 helices, 16 strands, 48 turns and 348 hydrogen bonds were identified and analyzed in the structure.

Several published studies support these findings. CYP12F4 is a heme- and iron-ion-binding protein that catalyzes the incorporation of one atom of molecular oxygen into a compound and the reduction of the other oxygen atom to water. A deep understanding of the properties of heme and of its binding to the CYP12F4 protein is vital to malaria control.

I. Introduction

Several characteristics make Anopheles gambiae the primary vector of malaria: it rapidly colonizes small pools of rain water, it harbours the parasite Plasmodium falciparum under a wide range of environmental conditions, it is strongly anthropophilic, and it feeds and rests indoors (about 95% of the resting catch in Kenya is indoors) (Jannat, 2006).

Protein structure prediction deals with inferring the structure of a target protein. Protein structures are determined either experimentally or computationally. Experimental methods include X-ray crystallography, cryo-EM and NMR spectroscopy. Computational methods, which include ab initio modeling, fold recognition and homology modeling, are becoming popular because they are faster than experimental methods. Protein structure is described at four levels: primary, secondary, tertiary and quaternary. The tertiary structure is the three-dimensional fold of a single polypeptide chain, including its secondary structure elements and domains, while the quaternary structure describes how multiple polypeptide chains are arranged in a complex. Primary-level structure prediction involves detecting remotely related sequences and recognizing amino acid patterns in order to predict post-translational modifications and binding sites. Secondary structure prediction covers secondary structure elements (alpha helices and beta sheets), membrane-spanning regions and secondary structural class. Tertiary structure prediction involves threading a sequence onto protein families with similar folds. As of 1995, threading techniques had not shown superiority over sequence pattern recognition methods (Eisenhaber et al., 1995).

In the 1960s, secondary structure prediction focused on identifying likely alpha helices and was based on helix-coil transition models (Froimowitz and Fasman, 1974). More accurate predictions emerged in the 1970s; these included beta sheets and relied on statistical assessment using probability parameters derived from known solved structures. Such methods often achieve an accuracy of 60-65%, with beta sheets under-predicted (Mount, 2004). With large databases of known proteins and modern machine learning methods (e.g. neural networks and support vector machines), secondary structure prediction can reach up to 80% accuracy for globular proteins. The theoretical upper limit of accuracy is around 90%, partly because of inconsistencies in DSSP assignment near the ends of secondary structures, where local conformations vary under native conditions but may be forced into a single conformation in crystals by packing constraints (Zhou, 2006).
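Accuracy figures of this kind are typically reported as a three-state, per-residue score (helix, strand, coil), often called Q3. The following is a minimal Python sketch of that measure, assuming the predicted and observed structures are given as equal-length strings over the letters H, E and C. The function names and the toy strings are hypothetical and are not taken from any of the cited tools.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def per_state_recall(predicted, observed, state):
    """Fraction of residues observed in `state` that were also predicted as `state`."""
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None  # the state never occurs in the observed structure
    return sum(predicted[i] == state for i in positions) / len(positions)


# Hypothetical toy example (not real CYP12F4 data).
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCCEEECCC"

overall = q3_accuracy(predicted, observed)
strand_recall = per_state_recall(predicted, observed, "E")
helix_recall = per_state_recall(predicted, observed, "H")
```

In this toy case the strand recall is lower than the helix recall, which is the pattern described in the text as under-prediction of beta sheets.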


Protein structure and function prediction is a crucial area of study in cell biology, proteomics and drug therapeutics, and research in this field has produced good results. Zhu et al. (2013) used phylogeny (evolutionary relationships) to compare all CYPs in Tribolium castaneum with genes in other insect species, in order to deduce the evolution and function of the T. castaneum CYP gene family. The integrated use of annotation, molecular modeling, docking, phylogenetic analysis and gene expression revealed 143 CYPs in T. castaneum that may contribute to insecticide resistance in the beetle. Their work also provided insights into the evolution of the T. castaneum CYP gene superfamily and developed a valuable resource for the functional genomics research needed to understand the strategies insects use to cope with their environment and to identify potential insecticide targets for pest control. Lertkiatmongkol et al. (2011) also used homology modeling, in their case to model three CYPs (CYP6AA3, CYP6P7 and CYP6P8) implicated in insecticide metabolism. The models gave a better understanding of the different substrate preferences among the enzymes: variations in the predicted substrate channels and in the geometry of the active site accounted for their differences in pyrethroid binding (i.e. differences in active-site structure may affect substrate binding and selectivity). Their results showed that differences in metabolic activity among insect P450s can be attributed to structural differences that determine selectivity towards insecticides. They concluded that the predicted models may be used to explore P450 inhibitors and to analyze the binding and metabolism of insecticide compounds with potential for use in the control of A. gambiae.

A similar study by Sarapusit et al. (2013) used homology modeling (comparative modeling) to infer the structure of a cytochrome P450 reductase (AnCYPOR) with rat CYPOR as the template. Detailed analysis revealed major differences in the FMN- and FAD/NAD(P)H-binding domains that might lead to differences in the enzymatic properties and catalysis of mosquito CYPOR compared with mammalian (rat) CYPOR. A mutagenesis study also showed that C427 supports FAD binding in AnCYPOR and that NAD(P)H binding and catalysis differ between mosquito and rat. These related works show that computational approaches are faster and more cost-effective than experimental methods.

Approaches other than protein structure prediction have also been used for in-depth analysis of cytochrome P450 proteins at the metabolic and catalytic level. Lewis and Ito (2009) combined molecular modeling with a quantitative structure-activity relationship (QSAR) study to understand the factors that determine substrate selectivity and binding in the human drug-metabolizing P450s. A detailed review by Felix and Silveira (2012) described the general structure of P450 cytochromes and their role in A. gambiae as detoxifying enzymes, highlighting the link between A. gambiae P450 cytochromes and insecticide resistance, and their response to malaria infection with P. berghei and to P. falciparum invasion (Felix et al., 2010). It also reported the effect of chloroquine on the abundance of transcripts encoding proteins involved in a variety of processes, including P450 cytochromes (CYP9L1, CYP304B1 and CYP305A1) that were expressed differently after a blood meal containing P. berghei. With regard to insecticide resistance, pyrethroid resistance has been detected in A. gambiae (Nikou et al., 2003) and is due to a combination of target-site insensitivity and increased oxidative metabolism catalyzed by P450 cytochromes.


The tools used to predict the structure of the target protein were chosen, for their high performance, from the top five web servers ranked in the Critical Assessment of Techniques for Protein Structure Prediction (CASP10); these include Phyre2, I-TASSER and Swiss-Model. 3DLigandSite was used to model the active sites, while PROCHECK was used to evaluate the structure based on a Ramachandran plot. Jmol and the Swiss-Model viewer were used to visualize the structures. GOPET was used to predict the function of the protein. Fig. 1 shows the workflow for predicting the structure of the target protein.

Fig. 1. Workflow
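A Ramachandran plot, such as the one used in the PROCHECK evaluation step, places each residue according to its backbone dihedral angles phi and psi. As an illustration only (this is not PROCHECK's own code), the sketch below computes a signed dihedral angle from four atomic positions. For phi the four atoms are C(i-1), N(i), CA(i) and C(i); for psi they are N(i), CA(i), C(i) and N(i+1). The function name and the toy coordinates are assumptions made for this example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four 3D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical toy geometry: the central bond lies along the z axis.
cis_angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans_angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
right_angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

Applying this function to every residue of a model gives the (phi, psi) pairs that a Ramachandran plot displays; classifying those pairs into favoured and disallowed regions is a separate step that is not attempted here.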

IV. Results

The final model was built with the Swiss-Model web server. The binding-site residues predicted for this model by 3DLigandSite are at positions 118, 139, 142, 150, 154, 161, 211, 319, 320, 324, 327, 328, 392, 394, 417, 460, 461, 462, 465, 466, 467, 468, 469, 470, 471, 473, 474 and 477. Below is the comparative analysis of the results from the structure predictions.

Fig. 2. (a) Final model; (b) ligand binding site; (c) Ramachandran plot for final model; (d) heme binding site

GOPET was used to predict the molecular function (heme binding at 86% and catalytic activity at 83% confidence), and QuickGO was used for the biological process (oxidation-reduction process) and molecular function annotations (monooxygenase activity; iron ion binding; oxidoreductase activity; oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen; electron carrier activity). The structural alignment from DaliLite between the target protein CYP12F4 and the rat template is shown below. It gives a Z-score of 43.3 with 387 equivalent residues between mol1A and mol2A, and a Z-score of 41.9 with 387 equivalent residues between mol1A and mol2B. The regions coloured red are the superimposed regions of the alignment.

Fig. 3. Structural alignment
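Alongside a Z-score, a structural alignment is commonly summarized by the root-mean-square deviation (RMSD) of the matched C-alpha atoms. The sketch below is a minimal illustration of that quantity. It assumes the two coordinate lists are already superimposed and paired residue by residue; it does not perform the superposition itself and does not reproduce DaliLite's Z-score. The function name and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (same units as the input, normally angstroms) between paired C-alpha positions."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contribute the same number of atoms")
    if not coords_a:
        raise ValueError("no atoms supplied")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy coordinates for two already-superimposed fragments.
target_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 2.0)]

toy_rmsd = ca_rmsd(target_ca, template_ca)
identical_rmsd = ca_rmsd(target_ca, target_ca)
```

A lower RMSD over a larger number of equivalent residues indicates a closer structural match between target and template.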


From the comparative analysis of the results, we observed characteristic features of the CYP12F4 protein: its substrate binding sites (at the F, R, V, I, 400 positions), a cytochrome reductase/cytochrome b5 binding site, a heme binding site (ligand name: HEME B, protoporphyrin IX containing Fe; chemical formula C34H32FeN4O4) and an NADPH-binding site, which is the source of electrons for transfer. These features are characteristic of the cytochrome P450 family. CYPs have been linked to resistance to organic compounds such as insecticides, and sub-family members of CYP12F4 have been implicated in this. The most highly conserved regions among P450s lie between the I and L helices and are involved in heme binding. The heme of P450s is covalently bound to an invariant cysteine, which is enveloped by a β-bulge region called the Cys-pocket. Besides the cysteine, three residues are very strictly conserved: two glycines (positioned to allow the formation of the β-hairpin turn, a sharp turn from the Cys-pocket into the L helix, and proximity to the heme) and one phenylalanine (close to the sulfur-iron bond). This is inferred from human CYP11A1 (cholesterol side-chain cleavage enzyme, mitochondrial). Multiple sequence alignment between CYP12F4 and sub-family members revealed the following conserved positions: Pro466 (87%), Pro455 (62%), Arg437 (87%), Glu436 (62%), Gly470 (87%), Thr327 (75%), Trp438 (62%), Pro435 (75%), Phe433 (84%), Pro430 (72%) and Phe427 (62%). The final model is structurally similar to the crystal structure of rat mitochondrial P450 24A1 S57D in complex with CHAPS (chains A and B), at 58.7% sequence identity. The structural alignment from DaliLite reported a Z-score of 43.3 over 387 equivalent residues, together with the RMS deviation of the C-alpha atoms (in angstroms), between the target protein and the template structures.
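Per-position conservation percentages and pairwise sequence identity of the kind quoted above can both be derived from aligned sequences. The sketch below shows one simple way to do so, under stated assumptions: the alignment is a list of equal-length strings with "-" for gaps, conservation is the share of all sequences carrying the most common residue in a column, and identity is computed over positions where neither sequence has a gap. Other definitions exist, and the text does not say which one was used. The function names and the toy alignment are hypothetical and are not the CYP12F4 data.

```python
from collections import Counter


def column_conservation(alignment, column):
    """Most common residue in a column and the percentage of sequences that carry it."""
    residues = [seq[column] for seq in alignment]
    counts = Counter(r for r in residues if r != "-")  # gaps are not residues
    if not counts:
        return None  # the column contains only gaps
    residue, count = counts.most_common(1)[0]
    return residue, 100.0 * count / len(alignment)


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical toy alignment of four short sequences.
toy_alignment = ["PFGA", "PFGA", "PLG-", "PFAA"]

first_column = column_conservation(toy_alignment, 0)
second_column = column_conservation(toy_alignment, 1)
last_column = column_conservation(toy_alignment, 3)
toy_identity = percent_identity("PFG-A", "PLGKA")
```

Counting gapped sequences in the denominator of the conservation score is a design choice: it penalizes columns that are missing in some sub-family members rather than treating them as fully conserved.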


According to the literature, sub-family members of CYP12F4 are highly expressed in A. gambiae in connection with insecticide resistance, where they detoxify organic compounds. They are also up-regulated in the fat body one day after a blood meal containing P. berghei, and in the A. gambiae midgut in response to malaria infection with P. berghei and P. falciparum. Members of the CYP12 sub-family of cytochrome P450s have also shown a response to malaria infection in hemocytes. Future work will seek experimental confirmation of the structure of the target protein and its function.


  1. Arnold, K., Bordoli, L., Kopp, J. and Schwede, T. (2006). The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22(2), 195-201.
  2. Bonneau, R. and Baker, D. (2001). Ab initio protein structure prediction: progress and prospects. Annual Review of Biophysics and Biomolecular Structure 30, 173-89.
  3. Eisenhaber, F., Persson, B. and Argos, P. (1995). Protein structure prediction: recognition of primary, secondary and tertiary structural features from amino acid sequence. Critical Reviews in Biochemistry and Molecular Biology 30(1), 1. PMID 7587278.
  4. Félix, R.C. and Silveira, H. (2012). The Role of Anopheles gambiae P450 Cytochrome in Insecticide Resistance and Infection. In: Insecticides - Pest Engineering, Dr. Farzana Perveen (Ed.), ISBN: 978-953-307-895-3, InTech. Available from:
  5. Froimowitz, M. and Fasman, G.D. (1974). Prediction of the secondary structure of proteins using the helix-coil transition theory. Macromolecules 7(5), 539-9. PMID 4371089.
  6. Garnier, J., Osguthorpe, D.J. and Robson, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120(1), 97-120.
  7. Jannat, N.K. (2010). Effect of larval environment on some life history parameters in Anopheles gambiae s.s. (Diptera: Culicidae). Simon Fraser University Library, Canada, p. 3.
  8. Kelley, L.A. and Sternberg, M.J.E. (2009). Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 4, 363-371.
  9. Kihara, D., Zhang, Y., Lu, H., Kolinski, A. and Skolnick, J. (2002). Ab initio protein structure prediction on a genomic scale: application to the Mycoplasma genitalium genome. PNAS 99(9), 5993-5998.
  10. Lakizadeh, A. and Marashi, S.A. (2009). Addition of contact number information can improve protein secondary structure prediction by neural networks. EXCLI J. 8, 66-73.
  11. Mount, D.W. (2004). Bioinformatics: Sequence and Genome Analysis (2nd ed.). Cold Spring Harbor Laboratory Press. ISBN 0-87969-712.
  12. Nikou, D., Ranson, H. and Hemingway, J. (2003). An adult-specific CYP6 P450 gene is overexpressed in a pyrethroid resistant strain of the malaria vector, Anopheles gambiae. Gene 318, 91-102.
  13. Roy, A., Kucukural, A. and Zhang, Y. (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols 5(4), 725-738.
  14. Schwede, T., Kopp, J., Guex, N. and Peitsch, M.C. (2003). SWISS-MODEL: an automated protein homology-modelling server. Nucleic Acids Research 31(13), 3381-3385.
  15. Wass, M.N., Kelley, L.A. and Sternberg, M.J.E. (2010). 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Research 38(suppl. 2), W469-W473.
  16. Zhang, Y. (2008). I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9, 40.
  17. Zhang, Y. (2008). Progress and challenges in protein structure prediction. Curr. Opin. Struct. Biol. 18(3), 342-8. PMC 2680823. PMID 18436442.
  18. Zhou, D.O. (2006). Achieving 80% tenfold cross-validated accuracy for secondary structure prediction by large-scale training. Proteins 66(4), 838-45. PMID 17177203.