Proteolytic activity is important for normal functioning of an organism and must be rigorously controlled to avoid potentially dangerous excess protein degradation. Failure in biological control mechanisms of proteolytic activities causes a wide range of diseases, among them cancer, rheumatoid arthritis and osteoarthritis, Alzheimer's disease, multiple sclerosis and muscular dystrophy (reviewed in Turk et al., 2000). For many diseases resulting from excess proteolysis, no inhibitors have yet been identified with the necessary profile for therapeutic use. Thus, research into the physiological roles of proteases and into the discovery of substances to modulate them will remain a priority of both science and the pharmaceutical industry for the foreseeable future. For now, drug design proceeds hand in hand with the discovery of the biological roles of enzymes; when a specific role has been identified by an inhibitor, this compound is already a drug candidate.
About 500-600 proteases have been identified in the human genome (Lopez-Otin & Overall, 2002). Of these, about 60 are lysosomal proteases (Mason, 1995), which include a group of about a dozen papain-like lysosomal cysteine proteases. For historical reasons, intracellular proteases were named cathepsins (the root of the word originates from the Greek language, where it means to digest); however, there is no strict rule that links the reactive mechanism and localization of cathepsins with their name. All known lysosomal cysteine proteases are cathepsins, but not all cathepsins are lysosomal or cysteine proteases. Cathepsins D and E are aspartic proteases, whereas cathepsins A and G are serine proteases; cathepsins E and G are not lysosomal proteases. Discovery of legumain (Chen et al., 1997, 1998), also a lysosomal cysteine protease belonging to clan CD, added to the confusion in nomenclature. This review focuses on the group of papain-like cysteine proteases, which are ubiquitous among living organisms (including bacteria, viruses and plants, and lower and higher animals, including parasites). Particular attention is paid to human lysosomal enzymes (cathepsins) and their mammalian homologues. The relatively small size of the group, the uniquely reactive cysteine sulfhydryl group (pKa in the range 2.5-3.5; Pinitglang et al., 1997) and their unique reactive mechanism make these enzymes attractive targets for drug design. There are 11 human enzymes currently known (cathepsins B, C, F, H, L, K, O, S, V, X and W; Turk et al., 2000; Turk, Turk et al., 2001) and it is quite likely that the list has already been completed. Human gene data bank searches have not indicated any new members of the family (Sali, personal communication).
The classical cathepsins (B, C, H, L and S) were discovered by biochemical techniques, beginning with cathepsin C (Gutman & Fruton, 1948). Other cathepsins (F, K, O, V, X and W) were found in the 1990s by means of DNA-manipulation techniques. The scientific community still awaits reports on the biochemical characterization of cathepsins O and W. The papain-like fold was revealed in the early days of crystallography (Drenth et al., 1968); however, structural characterization of cathepsins began in earnest in the early 1990s with the cathepsin B structure (Musil et al., 1991). Crystal structures of all human representatives or their mammalian analogues except cathepsins O and W have now been determined and are available from the PDB (Table 1). The relevance of cathepsins as potential drug targets is best indicated by the fact that four (K, S, V and recently F) of the nine structures of cathepsins were published by industrial research groups. Structures of cathepsins L (Fujishima et al., 1997) and S (McGrath et al., 1998) have also been reported by industrial groups but are not yet publicly available. An additional publication describing the complexes of cathepsin S with synthetic inhibitors is in preparation (Cygler & Rath, personal communication). So far, structures of four zymogens are known (Table 2). Their structure determination generally followed the structures of the active enzymes. An exception was the procathepsin L structure (Coulombe et al., 1996), which preceded the mature enzyme structure by 3 years.
Table 1. Primary citations and the PDB codes of cathepsin structures
The parallel entries indicate structures that were determined simultaneously by several groups.
Cathepsin   PDB code   Citation                      Species
B           1huc       Musil et al. (1991)           Human
C           1jqp       Olsen et al. (2001)           Rat
C           1k3b       Turk, Janjic et al. (2001)    Human
F           1m6d       Somoza et al. (2002)          Human
H           8pch       Guncar et al. (1998)          Porcine
L           1icf       Guncar et al. (1999)          Human
K           1mem       McGrath et al. (1997)         Human
K           1atk       Zhao et al. (1997)            Human
S           1glo       Turkenburg et al. (2002)      Human
V           1fh0       Somoza et al. (2000)          Human
X           1ef7       Guncar et al. (2000)          Human
Table 2. Primary citations and the PDB codes of proenzyme cathepsin structures
The parallel entries indicate structures that were determined simultaneously by several groups.
Zymogen   PDB code   Citation                    Species
B         1mir       Cygler et al. (1996)        Rat
B         1pbh       Turk et al. (1996)          Human
B         3pbh       Podobnik et al. (1997)      Human
L         1cjl       Coulombe et al. (1996)      Human
K         7pck       Sivaraman et al. (1999)     Human
K         1by8       LaLonde et al. (1999)       Porcine
X         1deu       Sivaraman et al. (2000)     Human
2. Physiological roles and localization
The papain-like lysosomal cysteine proteases have long been believed to be responsible for protein degradation in lysosomes (Kirschke et al., 1995). Analyses of gene knockouts suggested that this function is not exclusively dependent on any single cathepsin (Saftig et al., 1998; Shi et al., 1999; Pham & Ley, 1999; Deussing et al., 1998; Nakagawa et al., 1998, 1999; Roth et al., 2000). However, analyses of gene knockouts and the locations of mutations on genes of lysosomal cysteine proteases responsible for some hereditary diseases revealed several specific biological functions. These functions are a consequence of limited proteolysis of their target substrates and additionally rely on co-localization and timing.
(i) Cathepsin K was found to be crucial in bone remodelling (Chapman et al., 1997; Saftig et al., 1998).
(ii) Cathepsin S is the major processing enzyme of the MHC class II associated invariant chain and is thus essential for the normal functioning of the MHC class II associated antigen processing and presentation (Nakagawa et al., 1998, 1999; Shi et al., 1999). Cathepsins L and F were shown to participate in the same process, primarily in tissues or cells not expressing cathepsin S (Nakagawa et al., 1998; Shi et al., 2000), although the role of the former has probably been taken by cathepsin V in humans (Brömme et al., 1999).
(iii) Cathepsin L-deficient mice developed periodic hair loss and epidermal hyperplasia, indicating that cathepsin L is involved in epidermal homeostasis and regular hair-follicle morphogenesis and cycling (Roth et al., 2000). One-year-old cathepsin L-deficient mice (Stypmann et al., 2002) exhibited histomorphological and functional alterations of the heart, resulting in dilated cardiomyopathy, which is a frequent cause of heart failure.
(iv) Cells derived from cathepsin C-deficient mice fail to activate groups of serine proteases from granules of immune (cytotoxic T lymphocytes, natural killer cells) and inflammatory (neutrophils, mast cells) cells primarily involved in the defence of the organism, demonstrating that cathepsin C is involved in their activation (Pham & Ley, 1999; Wolters et al., 2001). The current list of unprocessed zymogens of proteases in cathepsin C knockout mice contains granzymes A, B and C, cathepsin G, neutrophil elastase and a chymase.
More data about the processes in which the lysosomal papain-like cysteine proteases participate can be found elsewhere (Kirschke et al., 1995; Chapman et al., 1997; Barrett et al., 1998; McGrath, 1999; Turk et al., 2000; Turk, Turk et al., 2001; Brömme & Kaleta, 2002).
Lysosomal cysteine proteases have been found to be associated with a number of pathologies, including cancer, inflammation, rheumatoid arthritis and osteoarthritis, Alzheimer's disease, multiple sclerosis, muscular dystrophy, pancreatitis, liver disorders, lung disorders, lysosomal disorders, Batten's disease, diabetes and myocardial disorders. In many of these diseases, the lysosomal enzymes have been found in the extracellular and extralysosomal environment in their (zymogenic) `pro' forms, which are substantially more stable than the mature enzymes (reviewed in Kirschke et al., 1995; Chapman et al., 1997; Barrett et al., 1998; Kos & Lah, 1998; Turk, Turk et al., 2001). Cathepsins also participate in apoptosis, although the exact mechanism is not yet clear (Stoka et al., 2001; Salvesen, 2001; Leist & Jäättelä, 2001; Turk et al., 2002).
Several genetic disorders have been traced to genes of lysosomal cysteine proteases. Pycnodysostosis, an autosomal recessive osteochondrodysplasia characterized in humans by severe bone abnormalities, was found to be associated with the loss-of-function mutation of cathepsin K (Gelb et al., 1996), while loss-of-function mutation in the cathepsin C gene leads to Papillon-Lefevre syndrome, an autosomal recessive disorder characterized in patients by palmoplantar keratosis and severe early-onset periodontitis (Toomes et al., 1999; Hart et al., 1999; Hart, Hart, Michalec, Zhang, Firatli et al., 2000; Hart, Hart, Michalec, Zhang, Marazita et al., 2000; Allende et al., 2001). These effects are quite likely to be a result of incomplete processing of some as yet unidentified proteases presumably involved in establishing or maintaining the structural organization of the epidermis of the extremities and the integrity of the tissues surrounding the teeth and in the processing of proteins such as keratins (Nuckolls & Slavkin, 1999). In addition, cathepsin C may be involved in chronic airway diseases such as asthma (Wolters et al., 2000).
Similarly, down-regulation of natural inhibitors, as demonstrated by a mutation in the gene for stefin B, predisposes affected individuals to a hereditary form of myoclonal epilepsy (Pennacchio et al., 1996; Lalioti et al., 1997).
4. Fold and specificity
The papain-like lysosomal cysteine proteases are monomeric proteins with MW between 22 and 28 kDa. The only exception is cathepsin C, which is a tetrameric molecule with an MW of 200 kDa (Dolenc et al., 1995). They all share the common fold of a papain-like structure. Cathepsin L, as a typical endopeptidase, has been chosen as a representative of the family (Fig. 1). A papain-like fold consists of two domains, reminiscent of a closed book with the spine at the front. The domains separate at the top in a V-shaped active-site cleft, in the middle of which the residues Cys25 and His159, one from each domain, form the catalytic site of the enzyme. The most prominent feature of the left (L) domain is a central α-helix of about 30 residues in length; the right (R) domain forms a kind of β-barrel, which includes a shorter α-helical motif. (The terms left and right domain refer to the standard view shown in Fig. 1.)
Figure 1. Fold of cathepsin L. Cathepsin L (1icf) is shown as a ribbon in its standard orientation, viewed along the two-domain interface with the central α-helix in a vertical orientation and the active site at the top. The side chains of the catalytic residues Cys25 and His159 are shown as yellow and green atom balls, respectively. This figure and Fig. 2 were prepared using the program RIBBONS (Carson, 1997).
Lysosomal cathepsins are encoded as `pre-proenzymes'. Following cotranslational cleavage of an amino-terminal signal peptide that mediates transport across the endoplasmic reticulum membrane (Erickson, 1989), procathepsins undergo proteolytic processing to the active mature enzyme form in the acidic environment of late endosomes or lysosomes (Nishimura et al., 1988; Kominami et al., 1988). The crystal structures of proenzymes (Table 2) showed that the structure of the mature enzyme is already formed in the zymogen form. The propeptide chain builds an α-helical domain, which continues along the active-site cleft towards the N-terminus of the mature enzyme in a predominantly extended conformation in the direction opposite to substrate binding, blocking access to the active site. The procathepsin L structure (Coulombe et al., 1996) has been chosen as a representative of the family (Fig. 2). Propeptides are in fact inhibitors of their cognate enzymes, as demonstrated by kinetic data (Guay et al., 2000). Among them, cathepsin K (Sivaraman et al., 1999; LaLonde et al., 1999) has the longest N-terminal peptide and cathepsin X (Sivaraman et al., 2000) the shortest. The propeptide of cathepsin X is also the only one covalently attached to the reactive-site cysteine via a disulfide bond.
Figure 2. Fold of procathepsin L (1cjl). The mature enzyme part of cathepsin L is shown in blue and the propeptide is shown in red.
4.1. Substrate-binding sites
When in 1967 Schechter and Berger reported their fundamental work on the substrate-binding sites of papain they had to rely solely on kinetic data (Schechter & Berger, 1967). They studied the dependence of substrate kinetics on the length of a polyalanine chain and discovered the kinetics to be influenced by the polypeptide-chain length up to a length of seven amino acids and concluded that there are seven substrate-binding sites on the papain molecule. Their definition of substrate-enzyme interactions and their nomenclature became the standards (Fig. 3) for the assignment of interaction sites of a polypeptide substrate and a proteolytic enzyme.
Figure 3. Schechter and Berger's definition of substrate-binding sites (Schechter & Berger, 1967).
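The nomenclature can be illustrated with a short sketch. It labels each residue of a peptide relative to the scissile bond: residues on the N-terminal side are numbered P1, P2, ... moving away from the bond, residues on the C-terminal side P1', P2', ..., and each Pn residue is taken to bind the Sn site of the same index. The function name and the choice of cleaving a heptaalanine after its fourth residue are assumptions made only for this illustration.

```python
def schechter_berger_labels(peptide, scissile_index):
    """Label residues relative to the scissile bond.

    `scissile_index` is the number of residues on the N-terminal
    (non-primed) side of the cleaved bond. Returns a list of
    (residue, substrate position, enzyme site) tuples.
    """
    if not 0 < scissile_index < len(peptide):
        raise ValueError("the scissile bond must lie inside the peptide")
    labels = []
    for i, residue in enumerate(peptide):
        if i < scissile_index:
            # Non-primed side: numbering increases towards the N-terminus.
            position = f"P{scissile_index - i}"
        else:
            # Primed side: numbering increases towards the C-terminus.
            position = f"P{i - scissile_index + 1}'"
        labels.append((residue, position, "S" + position[1:]))
    return labels


# Hypothetical example: heptaalanine cleaved between residues 4 and 5.
for residue, position, site in schechter_berger_labels("AAAAAAA", 4):
    print(f"{residue}  {position:<4} binds {site}")
```

For the heptaalanine example the positions run from P4 at the N-terminus to P3' at the C-terminus, giving seven labelled positions in total.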
Three decades later, when a sufficient number of protease-inhibitor structures became available, the definition of Schechter and Berger substrate-binding sites on the papain-like enzymes was revisited and redefined (Turk et al., 1998). The base and walls of the substrate-binding sites are formed by four chain segments comprising two shorter loops in the L-domain (19-25, 61-69) and two longer loops in the R-domain (136-162, 182-213; Fig. 4). A third loop from the L-domain might also be named if the disulfide (Cys22-Cys65) which connects the two L-domain loops at the top is considered to be an additional loop closure.
Figure 4. Substrate-binding sites. (a) A view from the top: a polyalanine substrate model bound in the active-site cleft of cathepsin L. (Modelling of the binding geometry of a substrate is based on information gained from the crystal structures of substrate-analogue inhibitors and their interactions with a papain-like protease active site; see Figs. 5a, 5b and 5c). Substrate residues are shown as green sticks and are denoted using the Schechter and Berger nomenclature. Cathepsin L is shown with a grey surface representation. The surface of the catalytic cysteine side chain is yellow. (b) The same as Fig. 4(a), except that cathepsin L is shown as a chain trace. Most of the chain is grey, whereas the loops building the substrate-binding sites are colour-coded: the L-domain loops (19-25 and 61-69) are purple and yellow and the R-domain loops (136-162 and 182-213) are blue and red. The loops building the substrate-binding sites contain not only the residues directly contributing to the surface, but also those that provide the foundation for it. (c) Structure-based amino-acid alignment of sequences of papain-like domains of all known human cathepsins. Structural alignment was made using the program Modeller (Sali and Blundell, 1993), then the sequences of cathepsins F, O and W were aligned to the template with the ClustalW program (Higgins et al., 1996). The sequences were taken from SWISS-PROT or GENBANK databases and the structures from the PDB. The loops building the substrate-binding sites are marked at the top with stripes, using the same colour code as in Fig. 4(b). Figs. 4(a), 4(b), 5, 8 and 9 were prepared with MAIN (Turk, 1992) and rendered with Raster3D (Merritt & Bacon, 1997). The cathepsin L surface was generated with GRASP (Nicholls et al., 1991).
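The four chain segments quoted above lend themselves to a simple lookup. The following sketch, which assumes that the residue ranges are given in the same numbering as the catalytic residues Cys25 and His159, reports which substrate-binding loop, if any, a residue belongs to. The dictionary keys and the function name are invented for this illustration.

```python
# Residue ranges of the loops that build the substrate-binding sites,
# as quoted in the text (inclusive; numbering assumed to match Cys25/His159).
BINDING_LOOPS = {
    "L-domain loop 1": (19, 25),
    "L-domain loop 2": (61, 69),
    "R-domain loop 1": (136, 162),
    "R-domain loop 2": (182, 213),
}


def binding_loop_of(residue_number):
    """Return the name of the loop containing the residue, or None."""
    for name, (first, last) in BINDING_LOOPS.items():
        if first <= residue_number <= last:
            return name
    return None


# The catalytic residues fall into one loop from each domain, and the
# Cys22-Cys65 disulfide bridges the two L-domain loops.
for label, number in [("Cys25", 25), ("His159", 159), ("Cys22", 22), ("Cys65", 65)]:
    print(label, "->", binding_loop_of(number))
```

The lookup reproduces two statements from the text: Cys25 and His159 come from different domains, and the Cys22-Cys65 disulfide connects the two L-domain loops.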
The superimposed structures of complexes of substrate-analogue inhibitors and cathepsins (Figs. 5a and 5c) have revealed that substrate residues bind along the active-site cleft in an extended conformation, with the side chains alternately oriented toward the L- and R-domains. Residues P2, P1 and P1' bind into well defined binding sites. Positioning of these residues is governed by interactions which involve both main-chain and side-chain atoms. The S2 binding site is a deep pocket, whereas the S1 and S1' sites provide a binding surface. The positioning of the P3 residue is mediated only by side-chain interactions. For this reason, the binding geometries of the latter are scattered over a broad area and are unique for each substrate. On the prime side of the binding cleft, the P2' residue-binding site appears to be quite well defined. However, current knowledge is based on specific interactions between CA030 and the parts of cathepsin B structure responsible for its carboxydipeptidase activity (Fig. 5c) (Turk et al., 1995). It thus remains possible that interactions within the S2' site of an endopeptidase would be different.
Figure 5. Low-molecular-weight inhibitor-binding geometry. The inhibitors (shown as sticks) from structures of complexes with papain-like cysteine proteases are superimposed on top of the cathepsin L surface. The catalytic site Cys25 surface is coloured yellow. PDB codes are given in parentheses. Complexes with plant enzymes are also included. (a) Substrate-analogue inhibitors: fluoro- and chloromethylketone-based inhibitors and leupeptin are shown in light blue. Inhibitors are taken from structures of complexes with the following enzymes: cruzipain (1aim, 2aim), papain (1pad, 1pop, 5pad, 6pad), glycyl endopeptidase (1gec) and cathepsin B (1the, 1cte). (b) E-64 and derivatives are shown in magenta. Inhibitors are taken from structures of complexes with the following enzymes: actinidin (1aec), caricain (1meg), cathepsin K (1atk) and papain (1pe6, 1ppp). (c) CA030 inhibitor from the complex with cathepsin B (1csb) is shown in blue. (d) Vinylsulfone-based inhibitors taken from structures of complexes with cathepsin K (1mem) and cathepsin V (1fh0) are shown in green. (e) A group of non-covalent cathepsin K inhibitors are shown in red (1ayu, 1ayv, 1ayw, 1au0, 1bgo, 1au2, 1au3, 1au4).
The location of the substrate-binding sites beyond S3 and S2' is not constrained by main-chain interactions. Each substrate residue docks on the surface of an enzyme in its own way (Fig. 5a). In particular, for the non-primed binding sites, there is evidence that a common S4 binding site and also an S3' site do not exist. Therefore, it was suggested that the substrate residue-binding regions beyond S2 and S2' should not be called sites but areas (Turk et al., 1998). The papain-like proteases thus represent a special class of proteolytic enzymes with the smallest number of substrate-binding sites, as opposed to chymotrypsin-like serine proteases which have six (Bode & Huber, 1992) and aspartic proteases which have eight binding sites (Wlodawer & Gustchina, 2000).
4.2. Binding of low-molecular-weight inhibitors
The rather short binding area seems to facilitate covalent interactions with low-molecular-weight inhibitors. Covalent interactions, however, impose hard constraints on the binding geometry. It thus took some time to design inhibitors that bind into the primed as well as non-primed side of the active-site cleft.
The structures of the first complexes of substrate-analogue inhibitors, based on the chloromethyl reactive group, with papain clarified the substrate binding in the non-primed binding sites in the 1970s (Drenth et al., 1976). Other structures followed later (Fig. 5a). At about the same time, a natural cysteine protease inhibitor named E-64 was discovered (Aoyagi & Umezawa, 1975; Hanada et al., 1978). E-64 utilizes an epoxysuccinyl group to covalently interact with the reactive-site cysteine (Fig. 6). Structures of E-64 (Varughese et al., 1992) and its analogues (Yamamoto et al., 1991) revealed that they bind into the non-primed region of the active site, but in the direction of propeptide binding and opposite to substrate binding (Fig. 5b).
Figure 6. Schemes of the three most frequent reactive groups before and after binding to the reactive-site cysteine. (a) Chloromethylketone, (b) epoxysuccinyl, (c) vinylsulfone.
The crystal structure of CA030 in complex with human cathepsin B (Turk et al., 1995) showed that E-64 derivatives can also bind into the primed binding side in the direction of a substrate binding. Switching of the binding side was made possible by the specific interactions. The carboxylic group of the C-terminal residue of CA030 mimics the C-terminus of a substrate and docks against the occluding loop residues His110 and His111 (Fig. 5c). Alignment of the E-64 and CA030 binding geometries showed (Fig. 7; Turk et al., 1995) that the epoxysuccinyl group possesses internal symmetry with two carboxylic heads, mimicking a polypeptide C-terminus to which amino-acid residues can be attached. The synthesis of double-head inhibitors followed (Schaschke et al., 1997, 2000; Katunuma et al., 1999). The binding geometry of the double-head inhibitor design has recently been confirmed by the crystal structures of cathepsin L and cathepsin B-inhibitor complexes (Tsuge et al., 1999; Stern et al., unpublished results).
Figure 7. Alignment of epoxysuccinyl derivatives.
The S1' binding site can also be reached with inhibitor constructs using the vinylsulfone reactive group (Fig. 6c; McGrath et al., 1997; Somoza et al., 2000, 2002) and exceptionally even by a long side chain of a P1-mimicking residue of a chloromethyl inhibitor (Figs. 5a and 5d; Jia et al., 1995).
The covalent interaction with the reactive-site cysteine is not mandatory as shown by a series of `Smith-Kline' compounds (Fig. 5e), which utilize various constructs to tightly block the reactive site, but are not engaged in covalent interactions (Thompson et al., 1997).
Additional information regarding inhibitors and their chemistry can be found elsewhere (Shaw, 1990; Otto & Schirmeister, 1997; Brömme & Kaleta, 2002).
Whereas in endopeptidases (cathepsins F, L, K, O, S and V) the active-site cleft extends along the whole length of the two-domain interface, the exopeptidases (cathepsins B, C, H and X) possess additional features that reduce the number of substrate-binding sites (Fig. 8). The role of these features is dual: they prevent the binding of longer peptidyl substrates and they dock with charged N or C chain termini of substrates by utilizing selective electrostatic interactions.
Figure 8. Features of exopeptidases. Chain traces of cathepsins H (8pch), C (1k3b), B (1huc) and X (1ef7), coloured orange, red, dark blue and light blue, respectively, are shown superimposed on the cathepsin L structure viewed from the top. The surface of cathepsin L is shown in grey; a yellow colour denotes the surface of the catalytic residue Cys25. Structural elements facilitating the exopeptidase activity are labelled. Residues that play a crucial role in exopeptidase specificity are shown in stick representation.
Carboxydipeptidase cathepsin B (Musil et al., 1991) has an insertion of about 20 residues, termed the occluding loop, which blocks the active-site cleft on the primed binding side beyond S2' and provides two histidine residues, His110 and His111, that bind to the charged main-chain carboxylic group of the C-terminal residue of a substrate. Cathepsin B also exhibits an endopeptidase activity that is made possible by the flexible occluding loop, which can be displaced from the active-site cleft (Illy et al., 1997; Podobnik et al., 1997; Nagler et al., 1997).
Cathepsin X is primarily a carboxymonopeptidase (Nagler et al., 1999), which can also act as a carboxydipeptidase (Klemencic et al., 2000). The crystal structure showed that a histidine residue, His23, positioned within a short loop termed a mini-loop (Nagler et al., 1999), is the anchor for the carboxylic group of the C-terminal substrate residue (Guncar et al., 2000). In the free-enzyme structure, the histidine ring occupies the position which is the S2' substrate-binding site in related cathepsins. This structure thus corresponds to the carboxymonopeptidase mode of cathepsin X. A simple modelling study (manual rotation about the side-chain bonds) suggested that the histidine ring can adopt a position equivalent to His110 of cathepsin B. This cathepsin B-like position would thus correspond to the carboxydipeptidase mode.
Cathepsin H is an aminomonopeptidase. The crystal structure of the porcine enzyme (Guncar et al., 1998) revealed that an eight-residue segment of the propeptide, called the mini-chain, binds in the active-site cleft of the enzyme in the direction of a bound substrate. The negatively charged carboxylic group of its C-terminal residue, Thr83P, attracts the positively charged N-terminus of a substrate and thereby facilitates the aminopeptidase activity of cathepsin H. Thr83P mimics a substrate P2 residue by occupying the position that is the S2 binding site in related enzymes. The mini-chain is additionally fastened to the enzyme surface by a four-residue insertion (Lys155A-Asp155D) and a carbohydrate chain attached to Asn112. The positioning of the cathepsin H mini-chain closely resembles the positioning of the C-terminus of a distant homologue, bleomycin hydrolase (Joshua-Tor et al., 1995).
Cathepsin C (also termed dipeptidyl peptidase I or DPPI) is an aminodipeptidase. The four independent active sites of cathepsin C are located on the external surface of the tetrahedral molecule. In contrast, oligomeric proteolytic machineries such as the 20S proteasome (Lowe et al., 1995; Groll et al., 1997), bleomycin hydrolase (Joshua-Tor et al., 1995), tryptase (Pereira et al., 1998) and tricorn protease (Brandstetter et al., 2001) have their active sites on the inside surface. Proteasomes are barrel-like structures composed of four rings of α- and β-subunits, which cleave unfolded proteins captured in the central cavity into short peptides. Tryptases are flat tetramers with a central pore in which the active sites reside. The pore restricts the size of accessible substrates and inhibitors. Similarly, the active sites of bleomycin hydrolase and tricorn protease are also located within the hexameric structure. The exposed active sites make cathepsin C a unique oligomeric protease capable of the hydrolysis of protein substrates in their native state regardless of their size. Its design, supported by the oligomeric structure, confines the activity of the enzyme to an aminodipeptidase and thereby makes it suitable for use in many different environments, where cathepsin C can selectively activate a group of chymotrypsin-like proteases and presumably also other proteins.
The active site of cathepsin C is blocked beyond the S2 binding site by the massive body of the exclusion domain (Turk, Janjic et al., 2001; Olsen et al., 2001). An exposed β-hairpin, the first N-terminal residues of the exclusion domain and the carbohydrate ring attached to Asn5 block undesired access, while Asp1 with its carboxylic group side chain controls entry into the S2 binding pocket by fixing the N-terminal amino group of the substrate. Asp1 simultaneously prevents the positively charged side chains of arginine and lysine residues from binding in the S2 binding pocket. An additional special feature of cathepsin C is the dependence of its activity on chloride ions. One was located at the bottom of the very long S2 binding pocket.
Interestingly, structural comparison and similar interactions within the active-site cleft (Turk, Janjic et al., 2001) suggested that the exclusion domain of cathepsin C was adapted from a metalloprotease inhibitor (Baumann et al., 1995). The N-terminus of the exclusion domain only blocks access to a portion of the active-site cleft, whereas the N-terminus of the metalloprotease inhibitor binds along the primed binding sites and interacts with the reactive-site zinc ion.
Papain-like cathepsins are rather non-specific enzymes with no clear substrate-recognition site. This does not imply that specific inhibitors cannot be designed. It only suggests that specificity is not an issue involving a single binding site, but is rather a cumulative contribution of all interactions. Inhibitor constructs interacting with regions on both sides of the reactive site can therefore be advantageous compared with those which bind to only one side. The specificity of exopeptidases is, however, more a matter of exclusive interactions of free chain termini than side-chain recognition. The design of exopeptidase inhibitors therefore seems easier, as such inhibitors can rely on covalent interactions with the reactive site and electrostatic interactions with negatively charged carboxylic groups or positively charged histidines in the cases of aminopeptidases and carboxypeptidases, respectively.
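The exopeptidase features described above can be summarized in a small data structure. The sketch below records, for each of the four exopeptidases, its primary activity, the structural element that restricts the active-site cleft and the residue(s) that anchor the substrate terminus, and then uses this to show which fragment a single cleavage would release from a peptide. It is a deliberately simplified model: it ignores side-chain preferences (such as the exclusion of arginine and lysine from the S2 pocket of cathepsin C) and the secondary modes of action (the endopeptidase activity of cathepsin B and the carboxydipeptidase mode of cathepsin X). All class, field and function names, as well as the example peptide, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exopeptidase:
    activity: str           # primary mode of action, as named in the text
    blocking_feature: str   # element that restricts the active-site cleft
    anchor: str             # residue(s) docking the charged chain terminus
    terminus: str           # "N" or "C": substrate terminus that is recognized
    released_residues: int  # residues removed from that terminus per cleavage


EXOPEPTIDASES = {
    "B": Exopeptidase("carboxydipeptidase", "occluding loop", "His110 and His111", "C", 2),
    "X": Exopeptidase("carboxymonopeptidase", "mini-loop", "His23", "C", 1),
    "H": Exopeptidase("aminomonopeptidase", "mini-chain", "carboxylic group of Thr83P", "N", 1),
    "C": Exopeptidase("aminodipeptidase", "exclusion domain", "Asp1", "N", 2),
}


def cleave_once(peptide, cathepsin):
    """Return (released fragment, remaining peptide) for one cleavage."""
    enzyme = EXOPEPTIDASES[cathepsin]
    n = enzyme.released_residues
    if len(peptide) <= n:
        raise ValueError("peptide too short to be cleaved in this model")
    if enzyme.terminus == "N":
        return peptide[:n], peptide[n:]
    return peptide[-n:], peptide[:-n]


# Hypothetical pentapeptide, written N- to C-terminus in one-letter code.
for name in "BXHC":
    released, remaining = cleave_once("GFKLA", name)
    print(f"cathepsin {name} ({EXOPEPTIDASES[name].activity}): {released} + {remaining}")
```

Keeping the anchoring residue next to the recognized terminus makes the design argument of the preceding paragraph explicit: an exopeptidase inhibitor can pair a reactive group with a charged group placed to meet that anchor.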
5. Hints from interactions with protein inhibitors
Stefins and cystatins are rather non-specific endogenous inhibitors of cysteine proteases. They are only able to discriminate between endo- and exopeptidases. Whereas the inhibition of endopeptidases is rapid and tight, almost being pseudo-irreversible, with Ki values in the picomolar range, the inhibition of exopeptidases is much weaker with Ki values in the millimolar to nanomolar range (reviewed in Turk & Bode, 1991; Turk et al., 2000). Similar to stefins, an inhibitory fragment of the p41 form of MHC class II associated invariant chain (termed the p41 fragment) inhibits endopeptidase cathepsin L (Ki = 1.7 pM) and exopeptidase cathepsin H (Ki = 5.3 nM); however, it does not inhibit endopeptidase cathepsin S and exopeptidase cathepsin B (Bevec et al., 1996). How can this be explained on a structural basis?
Two crystal structures provided insight into the interactions between a papain-like cysteine protease and its protein inhibitor: those of the complexes of papain-stefin B (Stubbs et al., 1990) and cathepsin L-p41 fragment (Fig. 9; Guncar et al., 1999).
Figure 9. Binding of protein inhibitors. Stefin B superimposed on cathepsin L-p41 complex in views (a) across and (b) along the active-site cleft of cathepsin L. (a) is shown in approximately the standard view (Fig. 1), whereas (b) is generated by a 90° rotation about the vertical axis. Superposition of the p41 fragment and stefin B is based on the three-dimensional alignment of papain and cathepsin L structures in the papain-stefin B and cathepsin L-p41 fragment complexes. Chain traces of the p41 fragment, stefin B and cathepsin L are shown in orange, red and blue, respectively.
The wedge shape and the three-loop arrangement of the p41 fragment bound to the active-site cleft of cathepsin L are reminiscent of the inhibitory edge of cystatins and thus provide the first observed example of convergent evolution in the cysteine protease inhibitors. The interactions within the active-site cleft are non-specific. They are either hydrophobic, mediated via solvent molecules, or involve hydrogen-bond interactions with groups which are conserved throughout the family of papain-like enzymes. This suggests that the p41 fragment, like the stefins, displaces the characteristic structural elements of exopeptidases from the active site, as revealed by the stefin A-cathepsin H complex (Jenko, unpublished results). However, the different fold of the p41 fragment results in additional contacts with the highly variable regions of the loops at the top of the R-domain, which build the surface of the S2 and S1' substrate-binding sites. This enables the p41 fragment to form specific interactions with its target enzymes and simultaneously prevents the approach of cathepsins S and B (Guncar et al., 1999).
Currently, no drug targeted towards a lysosomal cysteine protease is in use; however, many are in development. The knowledge and experience gathered in the field suggest that there are enough leads to drive the research. The protein-inhibitor complexes, however, suggest to designers of low-molecular-weight inhibitors that there are still unexplored areas on the surface of the enzymes. In particular, it may be worthwhile to explore them in the case of endopeptidases, which have fewer constraints within the active-site cleft than the exopeptidases.
The drug-design process is also challenged from another point of view. Are the mature human enzymes really the most appropriate targets for potential drugs? Labelled inhibitors targeting a papain-like cysteine protease from Trypanosoma cruzi indicate that such inhibitors may already interact with their proenzyme form within the Golgi complex (Engel et al., 1998).
The Slovenian Ministry of Education, Science and Sport and the ICGEB are gratefully acknowledged for their financial support.
Dusan Turk is a research associate at Jozef Stefan Institute, Ljubljana, Slovenia, heading the Structural Biology group in the Department of Biochemistry and Molecular Biology. He received his PhD degree at the Technical University, Munich in 1992 in the laboratory of Professor Robert Huber at the Max-Planck Institute of Biochemistry, Martinsried, Germany, after completing his BSc in chemistry and masters degree in the area of computational chemistry at Ljubljana University (Chemical Institute). His postdoctoral experience was divided between the laboratories of Professors Robert Huber and Vito Turk at the Jozef Stefan Institute and he started a macromolecular crystallography laboratory in the latter. His principal interests are in the structural biology of proteases, predominantly cysteine proteases, and their control mechanisms, and in the development of computational methods for macromolecular crystal structure determination assembled in the computer program MAIN.
Gregor Guncar obtained a degree in chemistry in 1995 and his doctorate in 2000, both from the University of Ljubljana, Slovenia. He joined the Structural Biology group at Jozef Stefan Institute after his BSc and worked in the field of cysteine proteases and their inhibitors, of which he determined several structures. He is now continuing his work in the field and is looking forward to finding a nice postdoctoral position in the near future.