Proteases Are Essential Enzymes For Survival


Proteases are essential enzymes for the survival of all organisms, and they are encoded by about 2% of the genes in all kinds of organisms (Rawlings and Barrett, 1998). Proteases of microbial origin possess considerable industrial potential owing to their biochemical diversity and their wide applications in the detergent, food, pharmaceutical, leather (tannery) and silk industries, in medicinal formulations, and in processes such as waste treatment, resolution of amino acid mixtures and recovery of silver from used X-ray films (Morya and Yadav, 2010).

The industrial demand for proteolytic enzymes with appropriate specificity and stability to pH, temperature and surfactants continues to stimulate the search for new enzyme sources. The catalytic type of a peptidase relates to the chemical groups responsible for its catalysis of peptide bond hydrolysis. The six recognized catalytic types are the serine, threonine, cysteine, aspartic, glutamic and metallo-peptidases (Rawlings et al., 2007). Crystallographic structures of proteases have revealed that the active site is commonly located in a groove on the surface of the molecule between adjacent structural domains, and that substrate specificity is dictated by the properties of binding sites arranged along the groove on one or both sides of the catalytic site that is responsible for hydrolysis of the cleaved bond (the scissile bond). Besides the nucleophile, several other residues are also important for catalysis and for maintaining the structure of the active site (Rawlings et al., 2007). The active site residues are often well conserved among different proteases within a family. In serine and cysteine proteases the general base is usually a histidine and sometimes a lysine residue, and a third residue often completes a catalytic triad. In many serine peptidases this third member of the catalytic triad is an aspartate, for example in chymotrypsin (S01.001), subtilisin (S08.001) and carboxypeptidase Y (S10.001). In assemblin (S21.001) the third residue is a second histidine, and in D-Ala-D-Ala carboxypeptidase A (S11.001) it is a second serine. Exceptionally, the serine peptidases omptin (S18.001) and eukaryote signal peptidase (S26.010) have only a Ser/His catalytic dyad. In cysteine peptidases the third member of the triad may be asparagine (e.g. papain, C01.001), aspartate (e.g. deubiquitinating peptidase Yuh1, C12.001) or glutamate (e.g. adenovirus endopeptidase, C05.001) (Rawlings et al., 2007). There are, however, many cysteine peptidases that have only a Cys/His dyad.
In serine and cysteine peptidases, a fourth residue is often important because it helps stabilize the transitional acyl-intermediate that forms between the peptidase and the substrate as the first stage of catalysis. This residue forms a hydrogen bond with the negatively charged oxygen atom, and this catalytic subsite is known as the oxyanion hole. In chymotrypsin this fourth important residue is glycine, in subtilisin it is asparagine and in papain it is glutamine. Some peptidases appear to have only one catalytic residue, which is the N-terminal residue. These are known as N-terminal nucleophile (Ntn) hydrolases. All known threonine peptidases are Ntn-hydrolases, but there are also some serine peptidases (e.g. penicillin G acylase precursor, S45.001) and cysteine peptidases (e.g. penicillin V acylase precursor, C59.001) that are autolytic peptidases. In Ntn-hydrolases, the N-terminal amino group is thought to function as the general base. Full descriptions of the catalytic mechanisms of serine, cysteine and threonine peptidases have been provided by Polgar (Polgar, 2004a; Polgar, 2004b). No residues other than the aspartates are known to be involved in catalysis by the aspartic peptidases (James, 2004). In metallopeptidases other residues have been shown by mutation studies to be essential, but exactly what their roles may be is controversial (Auld, 2004). A glutamate is important for activity in all the metallopeptidases that carry the HEXXH zinc-binding motif (e.g. thermolysin, M04.001), as well as in carboxypeptidase A (M14.001). In metallopeptidases that have two catalytic metal ions, two residues are essential, often a glutamate and an aspartate (e.g. glutamate carboxypeptidase, M20.001).

The tools of bioinformatics are being used extensively to characterize different proteins on the basis of sequence and structural data for functional annotation (Shapiro and Harris, 2000; Gutteridge et al., 2003; Ofran et al., 2005). Knowledge of the catalytic residues can greatly help in performing enzyme-targeted drug design, understanding the catalytic mechanism of an enzyme reaction and constructing metabolic pathways (Bartlett et al., 2002; Chou and Cai, 2004; Porter et al., 2004). The putative catalytic residues of a query enzyme can be identified by either sequence similarity based or structural similarity based methods. For sequence similarity based methods, the prerequisite is the identification of a homologous enzyme sequence with known catalytic residues, followed by transfer of the catalytic residues from the identified homolog to the query sequence. This method can be misleading because enzyme functions are less conserved than sequences (Todd et al., 2001; Rost, 2002; Tian and Skolnick, 2003). The structural similarity based method can identify the catalytic residues even in the absence of sequence similarity, provided that the 3D structure of the query enzyme is available (Orengo et al., 1999). Furthermore, proteins without detectable sequence or structural similarity may have a similar configuration of active sites for performing a similar reaction (Torrance et al., 2005; Zhang and Grigorov, 2006; Zhang and Tang, 2007).
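The sequence similarity based transfer described above can be illustrated with a minimal Python sketch. It assumes that a pairwise alignment between a homolog with known catalytic residues (the template) and the query has already been produced by some alignment tool; the function name `transfer_residues` and the toy sequences are hypothetical and are not taken from the study.

```python
def transfer_residues(aligned_template, aligned_query, template_positions):
    """Map 1-based catalytic positions of the template onto the query.

    Both arguments are rows of the same pairwise alignment, with '-' as gap.
    Returns {template_pos: (query_pos, query_residue)}; query_pos is None
    when the template residue is aligned to a gap in the query.
    """
    if len(aligned_template) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(template_positions)
    mapping = {}
    t_pos = q_pos = 0  # ungapped positions reached so far in each sequence
    for t_char, q_char in zip(aligned_template, aligned_query):
        if t_char != "-":
            t_pos += 1
        if q_char != "-":
            q_pos += 1
        if t_char != "-" and t_pos in wanted:
            mapping[t_pos] = (q_pos, q_char) if q_char != "-" else (None, "-")
    return mapping


# Toy alignment, made up for illustration (not a real protease pair).
template = "MKDH-GTSA"
query = "M-DHAGT-A"
result = transfer_residues(template, query, [3, 4, 7])
for t_pos, (q_pos, residue) in sorted(result.items()):
    print(t_pos, q_pos, residue)
```

A transferred position is only a candidate: as noted above, a residue type that differs between template and query, or a gap at the aligned column, is a signal that the annotation may not carry over.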

This paper reports the in silico characterization of the active site residues of proteases from different species of Aspergilli, using the MEROPS database (Rawlings et al., 2007, 2010) and different bioinformatics tools. A phylogenetic analysis of these 130 protease sequences, representing different clans based on protease type, has also been attempted.

Materials and methods

Sequence retrieval

The protease sequences of Aspergillus were downloaded from the NCBI and Swiss-Prot servers and arranged in FASTA format. Putative, uncharacterized and fragmented sequences were not considered for this study. In total, 130 sequences reported from different Aspergillus species were retrieved, including 36 acidic, 24 cysteine, 21 metallo, 44 serine and 5 neutral proteases (Morya et al., 2012).
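The exclusion step above can be sketched in Python as a filter on FASTA headers. This is only an illustration under stated assumptions: the keyword list, the function names and the example records are hypothetical, and "partial" is added here as an assumed synonym for fragmented entries.

```python
# Header keywords treated as grounds for exclusion (assumed list;
# "partial" is an assumption added for fragmented entries).
EXCLUDE_TERMS = ("putative", "uncharacterized", "fragment", "partial")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_record(header):
    """True when the header contains none of the exclusion keywords."""
    lowered = header.lower()
    return not any(term in lowered for term in EXCLUDE_TERMS)


# Made-up records for illustration only.
example = """>seq1 aspartic protease [Aspergillus sp.]
MKTAY
LLGA
>seq2 putative serine protease [Aspergillus sp.]
MSSQ
>seq3 metalloprotease, partial [Aspergillus sp.]
MHEL
"""
kept = [(h, s) for h, s in parse_fasta(example) if keep_record(h)]
print(kept)
```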

Prediction of active site:

The active sites of the proteases were predicted manually as well as by using the MEROPS database server. The MEROPS database provides a catalogue and structure-based classification of proteases (i.e. all proteolytic enzymes). The proteases are classified into families based on statistically significant similarities between the protein sequences in the part termed the 'peptidase unit', which is most directly responsible for activity (Rawlings and Barrett, 1998; Rawlings et al., 2007).

Sequence analysis and phylogenetic study:

Clustal X2 and SeaView were used for multiple sequence alignment (Rao et al., 1998) of the protease sequences prior to phylogenetic analysis. MEGA 4.1 (Kumar et al., 2004; Tamura et al., 2007; Rawlings and Barrett, 1993) was used to construct rooted and unrooted dendrograms. The dendrograms were constructed by the Neighbour-Joining (NJ) method (Kato et al., 1987; Saitou and Nei, 1987).
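To make the Neighbour-Joining step more concrete, the following Python sketch computes a simple p-distance matrix from aligned sequences and applies the NJ selection criterion, Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), to pick the first pair of sequences to join. It is not the MEGA implementation; the p-distance measure, the function names and the toy alignment are assumptions made for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Build a symmetric nested-dict distance matrix from {name: aligned_row}."""
    names = list(aligned)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = p_distance(aligned[a], aligned[b])
    return dist


def first_nj_pair(dist):
    """Return the pair minimising Q(i,j) = (n - 2) * d(i,j) - r(i) - r(j)."""
    names = list(dist)
    n = len(names)
    totals = {i: sum(dist[i].values()) for i in names}  # r(i): row sums

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    return min(combinations(names, 2), key=q)


# Toy alignment, made up for illustration only.
aligned = {
    "A": "ACDEFGHIKL",
    "B": "ACDEFGHIKV",
    "C": "ACNEWGHLKL",
    "D": "ACNEWGHLRL",
}
dist = distance_matrix(aligned)
pair = first_nj_pair(dist)
# With four taxa the two sister pairs always share the same Q value,
# so either (A, B) or (C, D) may be returned here.
print(pair)
```

A full NJ run would replace the joined pair by a new node, recompute the distances and repeat until the tree is resolved; only the first selection step is shown.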

Pattern analysis:

The Pfam server was used to search for domains within the protease sequences. Motif analysis was done using MEME. The conserved protein motifs deduced by MEME were subjected to biological functional analysis using protein BLAST, and domains were studied with InterProScan, which provided the best possible match based on the highest similarity score (Morya et al., 2012; Yadav et al., 2009).

Results and discussion:

A total of 130 full-length sequences were downloaded and analyzed with the MEROPS identifier. The proteases were classified as aspartic, cysteine, metallo, neutral and serine proteases on the basis of molecular and physiological behavior, whereas the MEROPS classification system groups proteases into clans that typically have the same linear order of catalytic triad residues and/or structural homology (Rawlings et al., 2010; Laskar et al., 2012). In this study an attempt has been made to understand the catalytic sites of, and the phylogenetic relationships among, the proteases of Aspergillus. A total of 130 protease sequences, comprising 44 serine proteases, 36 aspartic proteases, 24 cysteine proteases, 21 metalloproteases and 5 neutral proteases representing different species of Aspergillus, were analyzed here.

Out of the 130 sequences, 36 were identified as aspartate proteases belonging to the A1 and S53 families. Of these 36 sequences, eight were predicted manually, i.e. using homology search and secondary structure prediction, and were found to be members of the S53 and A1 families: XP_001401093.1, 1918207A, XP_753324.1, EDP52067.1, BAC00848.1, XP_001276381.1, XP_001399855.1 and XP_754479.1. The A1 family shares an 'Asp-Tyr-Asp' triad, while the S53 family has a 'Glu-Asp-Asp-Ser' tetrad as active site residues (Coates et al., 2008; Robbins et al., 2009). The positions of the Asp, Tyr and Asp residues in the 32 A1 family proteases were found to be variable, i.e. 100-142-281, 101-143-283, 102-144-284, 108-155-305, 109-152-294, 115-223-326 and 112-167-307 (Table 1). Four sequences were identified as members of the S53 family sharing a Glu-Asp-Asp-Ser tetrad at 320-324-410-518, except that sequence CAE51075.1 has the tetrad at positions 301-305-426-569 (Table 1). Two members of the A1 family, EAW14955.1 and XP_001276381.1, showed a His84-Asp278 dyad instead of the catalytic triad (Table 1).
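A reported set of active site positions can be checked against a sequence with a few lines of Python. The sketch below looks up the residue at each 1-based position and compares it with the expected residue of the triad or tetrad; the function name, the lookup table and the toy sequence are hypothetical and do not correspond to any sequence in Table 1.

```python
# Three-letter to one-letter codes for the residues named in this section.
THREE_TO_ONE = {
    "Asp": "D", "Tyr": "Y", "Glu": "E", "Ser": "S",
    "His": "H", "Cys": "C", "Asn": "N", "Gln": "Q",
}


def check_active_site(sequence, residues, positions):
    """Return (name, position, observed, matches) for each expected residue.

    Positions are 1-based, as in the tables; 'observed' is None when the
    position lies outside the sequence.
    """
    report = []
    for name, pos in zip(residues, positions):
        observed = sequence[pos - 1] if 1 <= pos <= len(sequence) else None
        report.append((name, pos, observed, observed == THREE_TO_ONE[name]))
    return report


# Made-up 12-residue string used only to exercise the function.
toy = "GADTGSYWVDTG"
report = check_active_site(toy, ["Asp", "Tyr", "Asp"], [3, 7, 10])
for row in report:
    print(row)
```

The same call with a tetrad, for example `["Glu", "Asp", "Asp", "Ser"]` and four positions, would cover the S53 case.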

Out of the 130 protease sequences, 24 were identified as cysteine proteases belonging to the C2, C14 and C54 families. The active sites of 17 sequences were retrieved from the MEROPS database, and those of seven sequences, namely XP_750095.1, XP_002372925.1, EED57313.1, XP_002381967.1, XP_750419.2, XP_002375933.1 and XP_754211.1, were determined manually, i.e. by homology search and secondary structure prediction; these were found to be members of C1 and C14. The C2 family shares a 'Gln-Cys-His-Asn' tetrad, the C14 family has a 'His-Cys' dyad and the C54 family has a 'Tyr-Cys-Asp-His' tetrad as the active site residues. The positions of the Gln, Cys, His and Asn residues in all 10 C2 family proteases were found to be the same (Table 2). One sequence was identified as a member of the C54 family, sharing a 'Tyr-Cys-Asp-His' tetrad at positions 98-147-322-324 (Table 2), and the 13 members of family C14 showed a His241-Cys297 dyad instead of a catalytic triad (Table 2).

Twenty-one sequences were identified as metalloproteases belonging to families M12, M48, M22, M41, M28 and M1. The active sites of 19 sequences were deduced from the MEROPS database, and those of two sequences, EAL85589.1 and EAW11933.1, were determined manually, i.e. by homology search and secondary structure prediction; both were found to be members of the M22 family. Family M12 carries the motif His-Glu-Xaa-Xaa-His-Xaa-Xaa-Gly-Xaa-Xaa-His, in which the three His residues are ligands of a zinc atom and the Glu has a catalytic role. Family M48 has His-His-Glu as ligands of a zinc atom, and a Glu has a catalytic role. Family M22 contains His-His ligands of a zinc atom, whereas family M41 has His-His-Asp ligands of a zinc atom and a Glu with a catalytic role (Auld, 2004). Family M28 contains His-Asp-Glu-Asp-His ligands of zinc, and an 'Asp-Glu' dyad has the catalytic role. Family M1 has His-His-Glu ligands of a zinc atom, and a 'Glu-Tyr' dyad has the catalytic role (Auld, 2004; Rawlings et al., 2007). The positions of the metal ligands in families M12, M48, M22, M41, M28 and M1 were 423-427-433, 297-301-390, 157-161, 540-544-618, 314-326-359-387-472 and 387-391-410, respectively. The catalytic residues of families M12, M48, M41, M28 and M1 were at positions 424, 298, 541, 316-358 and 388-476, respectively (Table 3).
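The M12 motif His-Glu-Xaa-Xaa-His-Xaa-Xaa-Gly-Xaa-Xaa-His lends itself to a simple pattern search. In this motif the Glu lies one residue after the first His and the other two His residues lie at +4 and +10, which is the same spacing as the reported ligand positions 423-427-433 with the catalytic Glu at 424. The Python sketch below is an illustration only; the function name and the toy sequence are hypothetical.

```python
import re

# HEXXHXXGXXH, where '.' stands for Xaa (any residue). The lookahead lets
# overlapping occurrences be reported as well.
M12_MOTIF = re.compile(r"(?=(HE..H..G..H))")


def find_m12_sites(sequence):
    """Return 1-based zinc ligand and catalytic Glu positions for each motif hit."""
    sites = []
    for match in M12_MOTIF.finditer(sequence):
        start = match.start(1) + 1  # 1-based position of the first His
        sites.append({
            "ligands": (start, start + 4, start + 10),
            "catalytic_glu": start + 1,
        })
    return sites


# Made-up sequence containing one motif occurrence, for illustration only.
toy = "MSTA" + "HEIGHALGFYH" + "KLV"
sites = find_m12_sites(toy)
print(sites)
```

A motif hit is only a candidate site; the families other than M12 described above have ligands that are far apart in sequence and cannot be found with a short pattern like this one.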

Only five sequences were identified as neutral proteases, belonging to the M36 and M35 families. The active sites of four sequences were predicted with the MEROPS database. The active site of 1718308A was determined manually, i.e. by homology search and secondary structure prediction, and the sequence was found to be a member of M35. The M36 family contains the HEXXH motif that occurs in metallopeptidases of clan MA, in which two His ligands are associated with a zinc atom and one Glu has a catalytic role. The M35 family shares the two His and one Glu residues with the M36 family, but has an additional Asp residue that also acts as a ligand of zinc. The metal ligands were deduced to be at positions 429-433 in family M36 and 321-325-336 in family M35. The catalytic residues in families M36 and M35 were found at positions 430 and 322, respectively (Table 4).

A total of 41 serine proteases were identified and distributed among the S8, S1, S10 and S53 families. The active sites of these proteases were predicted from the available database, and those of three sequences, namely EED55509.1, XP_753718.1 and XP_001274135.1, were determined by the traditional method; these were found to be members of the S8 family. The S8 and S53 families have the tetrads Asp-His-Asn-Ser and Glu-Asp-Asp-Ser, respectively, while the S1 and S10 families contain the triads His-Asp-Ser and Ser-Asp-His, respectively, as the active sites. The positions of the Asp-His-Asn-Ser tetrad in the 26 S8 family proteases were found to be variable, i.e. 162-193-284-349, 325-357-454-519 and 175-213-314-385 (Table 5). Four sequences were identified as members of the S1 family, sharing a His-Asp-Ser triad at positions 121-152-236 (Table 5). One sequence of family S10 showed the 'Ser-Asp-His' triad at positions 198-405-470 (Table 5), while family S53 showed the Glu-Asp-Asp-Ser tetrad at positions 320-324-410-518.

Of all serine proteases, the PA clan of endopeptidases is the most abundant and has been studied in the greatest depth. Although most members of this clan utilize a nucleophilic Ser residue (S sub-clan), there are several viral PA proteases that instead use a nucleophilic cysteine (Cys) residue (C sub-clan) (Laskar et al., 2012; Polgar, 2004). However, this study focuses solely on the PA clan serine proteases, and more specifically on members of the S1 family that bear the archetypal trypsin fold. Most clan PA proteases have trypsin-like substrate specificity, cleaving the polypeptide substrate on the carboxyl side of an arginine (Arg) or lysine (Lys) residue (Rawlings et al., 2007, 2010). Nucleophilic attack by the Ser195 (standard chymotrypsin numbering) hydroxyl group on the carbonyl of the peptide substrate initiates the proteolytic mechanism. This reaction is catalyzed by His57 acting as a general base, which is itself supported by a hydrogen bond to Asp102. The resulting tetrahedral intermediate is stabilized by Gly193 and Ser195, which contribute to a positively charged pocket known as the oxyanion hole. This tetrahedral intermediate breaks down to an acyl-enzyme intermediate, followed by the formation of a second tetrahedral intermediate. With the protonation of Ser195 by His57, the second tetrahedral intermediate breaks down and the carboxyl terminus of the substrate is released (Chou and Cai, 2004; Rawlings et al., 2007).

The S1 proteases comprise two β-barrels that align asymmetrically in a classical Greek key formation, bringing the catalytic residues together at their interface. His57 and Asp102 reside in the N-terminal β-barrel, while the nucleophilic Ser195 and the oxyanion hole are generated by the C-terminal β-barrel (Rawlings et al., 2007). High specificity of their catalytic domains, interactions among the regulatory regions, and efficient removal of active serine proteases by irreversible protease inhibitors ensure local, transient reactions to physiological or pathological cues (Rawlings et al., 2007; Morya et al., 2012). The S1 proteases have numerous functions, including intestinal digestion (e.g. trypsins, chymotrypsins, elastases), blood coagulation (e.g. thrombin, coagulation factors), immunity (e.g. complement factors, tryptases in secretory granules of mast cells, granzymes of cytotoxic cells) and homeostatic regulation (e.g. kallikreins) (Tian and Skolnick, 2003; Rawlings et al., 2007; Morya et al., 2012).

This study investigates the structural properties of different S1 family serine proteases from a diverse range of taxa using molecular modeling techniques. Although the catalytic core geometry shows evolutionary divergence between taxa, the relative positions of the catalytic triad residues were conserved, as were other highly conserved residues that possibly provide stabilization. There was also large variation in secondary structure features outside the core, the overall amino acid distribution, and surface electrostatic potential patterns between species.

Sequence alignment analysis showed that maximum conservation was found in the stretches 351 to 537 and 586 to 730. The classification of proteases by comparison of amino acid sequences is attractive in principle, but it is greatly complicated in practice by the chimeric nature of many proteins, which have separate domains that show quite different relationships. A total of 14 clusters were obtained, belonging to various types of peptidases. All aspartate proteases were clustered in clan-1. Similarly, clusters were found for tripeptidyl aminopeptidase (clan-2), metacaspase Cas (clan-3), neutral protease (clan-4), O-sialoglycoprotein endopeptidase (clan-5), subtilisin-like serine protease (clan-6), alkaline protease (clan-7), autophagic serine protease (clan-8), kexin-like processing protease (clan-9), cysteine protease PalB (clan-10), CaaX prenyl protease (clan-11), leukotriene A-4 hydrolase (clan-12), intermembrane space AAA protease IAP-1 (clan-13) and pro-apoptotic serine protease nma111 (clan-14). It has already been reported that the active site residues and their positions are important tools for classifying proteases into different peptidase families (Rawlings et al., 2007). The phylogenetic clustering and dendrogram analysis also support this work and agree with earlier reports (Rawlings and Barrett, 1998; Todd et al., 2001; Rost, 2002; Tian and Skolnick, 2003; Rawlings et al., 2007; Tamura et al., 2007). We have found that focusing on a particular functional group of proteins, and classifying these strictly with reference to the part of each protein that is primarily responsible for the function, has allowed a coherent system of classification to be developed for the peptidases. We suggest that this method could be applied to other functional groups of proteins. MEROPS is undoubtedly one of the best databases for proteases, providing an organizational framework to which it is relatively easy to add further information on peptidases.
However, some of the protease sequences from Aspergilli were not included in MEROPS. For such sequences we attempted the conventional method of active site prediction. This will provide researchers with information for enzyme engineering aimed at designing robust enzymes for various industrial uses. There is also scope for adding data on the specificity, biochemical characteristics and inhibitor profile, together with a concise bibliography, for each protease.
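The conserved stretches reported in the alignment analysis above can be located programmatically once a multiple sequence alignment is available. The following Python sketch scores each alignment column by the frequency of its most common residue and reports runs of consecutive well-conserved columns. The scoring scheme, the threshold values, the function names and the toy alignment are all assumptions made for illustration; they are not the procedure used in the study.

```python
from collections import Counter


def column_conservation(aligned_rows):
    """Fraction of rows carrying the most common residue in each column.

    Gaps ('-') never count as the most common residue, but they remain in
    the denominator, so gappy columns score lower.
    """
    length = len(aligned_rows[0])
    if any(len(row) != length for row in aligned_rows):
        raise ValueError("all aligned rows must have equal length")
    scores = []
    for column in zip(*aligned_rows):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_stretches(scores, threshold=0.8, min_length=3):
    """Return 1-based (start, end) runs of columns scoring at least threshold."""
    stretches, start = [], None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                stretches.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start + 1 >= min_length:
        stretches.append((start, len(scores)))
    return stretches


# Toy alignment, made up for illustration only.
rows = ["ACDEFGHIK", "ACDQFGHLK", "ACDEWGH-K", "TCDEFGHIR"]
scores = column_conservation(rows)
stretches = conserved_stretches(scores, threshold=0.75, min_length=3)
print(stretches)
```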