Pairs of domains between two or more classification



By comparing the fold definitions of pairs of domains, the level of agreement between two or more classification systems can be quantified. A fold pair consists of two domains that have the same fold identifier in a given fold classification. Pairwise comparison of the fold definitions in the three classification systems gave 12,174,184 pairs in CATH, 7,325,286 in DALI and 5,328,268 in SCOP. With its 12,174,184 pairs, CATH contains 1453 fold types, while DALI and SCOP have 1088 and 783 folds respectively. If the distribution of domains into folds were similar in the three classification systems, CATH, which has the most fold types, would be expected to have the fewest pairs. However, CATH populates its most common folds more heavily than DALI and SCOP do (ref: Ryan Day, Consensus).

The number of pairs increases as the square of the fold population, so heavily populated folds lead to a much larger number of pairs (Fig. 1) (ref: Ryan Day, Consensus).
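The quadratic growth can be made concrete with a minimal sketch. It assumes that a fold containing n domains contributes n(n-1)/2 unordered pairs (the source does not say whether pairs are ordered), and the fold populations are made up purely for illustration; `fold_pairs` is a hypothetical helper name.

```python
from math import comb

def fold_pairs(populations):
    """Total number of unordered same-fold domain pairs.

    `populations` is a list of fold sizes (domains per fold).
    Assumption: a fold of size n contributes n*(n-1)/2 pairs.
    """
    return sum(comb(n, 2) for n in populations)

# Hypothetical data: the same 1000 domains distributed in two ways.
even_split = [100] * 10           # ten equally populated folds
skewed_split = [550] + [50] * 9   # one heavily populated fold

print(fold_pairs(even_split))     # 49500
print(fold_pairs(skewed_split))   # 162000
```

With the same number of domains and folds, concentrating domains in one fold more than triples the pair count, which is the kind of effect described for the most common CATH folds.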

If both domains of a fold pair are also defined in another fold classification system, we have domain agreement; if both are additionally defined as having the same fold in that system, we have fold agreement. When the analysis was carried out on the subset of structures in earlier SCOP releases, fold agreement appeared to decline slightly faster than domain agreement (Fig. 3) (ref: Ryan Day, Consensus).
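The two definitions can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited study: each classification is represented as a mapping from domain identifier to fold identifier, both agreement values are reported as fractions of all same-fold pairs in the reference classification, and every identifier is a placeholder.

```python
from itertools import combinations

def agreement(reference, other):
    """Return (domain_agreement, fold_agreement) as fractions.

    `reference` and `other` map a domain id to a fold id.
    For every same-fold pair in `reference`:
      - domain agreement: both domains are also defined in `other`;
      - fold agreement: both are defined in `other` and share a fold there.
    Assumption: both fractions use all reference pairs as denominator.
    """
    by_fold = {}
    for domain, fold in reference.items():
        by_fold.setdefault(fold, []).append(domain)

    total = domain_ok = fold_ok = 0
    for members in by_fold.values():
        for a, b in combinations(members, 2):
            total += 1
            if a in other and b in other:
                domain_ok += 1
                if other[a] == other[b]:
                    fold_ok += 1
    if total == 0:
        return 0.0, 0.0
    return domain_ok / total, fold_ok / total

# Placeholder data: five domains, hypothetical fold labels.
system_1 = {"d1": "A", "d2": "A", "d3": "A", "d4": "B", "d5": "B"}
system_2 = {"d1": "X", "d2": "X", "d3": "Y", "d4": "Z"}

print(agreement(system_1, system_2))  # (0.75, 0.25)
```

Here three of the four reference pairs have both domains defined in the second system, but only one pair shares a fold there, so fold agreement is necessarily no higher than domain agreement.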

When the families of CATH are mapped onto SCOP and those of SCOP onto CATH, the two hierarchies map quite well onto each other at the lower levels, where CATH S-level families are clearly sub-families of SCOP families. At higher levels this interleaving pattern fades: 45% of SCOP superfamilies map to CATH homologous superfamilies, 24% do not map to CATH at all, and only 18% map above the CATH homologous superfamily level.

Similarly, DALI families generally match the low levels of the SCOP hierarchy. Moving to higher levels, the DALI hierarchy does correlate with the SCOP hierarchy, but only slowly, and the fraction of DALI families that can be successfully mapped onto the SCOP hierarchy decays rapidly.

Mapping DALI families onto CATH yields a different picture. Across all levels of the DALI hierarchy, a large fraction of DALI families (more than 30%) do not agree with CATH. Since both CATH S-level families and DALI level 6 families map below the SCOP families, we can conclude that both CATH and DALI provide refinements of the SCOP families, but that the DALI refinements do not match those of CATH. This is reasonable, because all DALI classifications, including those at the finest level 6, are based on structure comparison, while the CATH S-level is a sequence-based refinement of the CATH homologous superfamily level (Fig. 5) (ref: Elon Portugaly, Assessment).

The CATH pairwise matches that are missed by SCOP occur most commonly in the three-layer sandwich Rossmann fold and the immunoglobulin fold. Most of the mismatches are due to the 'fold overlap' problem, where one fold in CATH encompasses more than one fold in SCOP, or vice versa. When a domain is classified in CATH as a three-layer sandwich Rossmann fold, there are several SCOP folds to which it could conceivably belong. The same pattern occurs with the CATH immunoglobulin fold: these domains may be placed in several SCOP folds, such as the prealbumin-like fold, the cupredoxin fold and the immunoglobulin-like β-sandwich. A domain classified in one SCOP fold will therefore not be paired with a domain in another SCOP fold. Although CATH deems these structures geometrically similar, SCOP separates them to reflect an evolutionary or topological distinction (ref: Hadley, A systematic comparison).

The analysis of domain-domain interactions clearly depends on the nature of the domain classification system. Several differences can be identified between the CATH and SCOP domain assignments in spite of their high degree of correspondence (ref: Hadley). In some instances a domain found in one classification has no equivalent in the other. From these observations it can be hypothesised that the set of domain-domain interactions derived using SCOP will contain interactions not found in the set derived using CATH, and vice versa. Several properties of CATH and SCOP domain-domain interactions were investigated, including the sizes of the domains and interaction interfaces, the classification into pairwise superfamilies and families, and the promiscuity of domains. CATH interfaces (mean 39.8 residues) were found to be generally smaller than SCOP interfaces (mean 45.7 residues). This can be explained by the difference in single-domain sizes: CATH domains tend to be smaller than SCOP domains and vary less in size. Because CATH uses a more stringent sequence similarity threshold than SCOP to define families, there are more CATH pairwise families than SCOP pairwise families (Table I).

[...] considered to have an equivalent interface. So, for a reference interface to be considered to have an equivalent at ≥75% overlap, at least one of the interactions belonging to the same pairwise superfamily classification needed to have an equivalent interface that covered at least 75% of the interface area. Table II shows that at 75% overlap 77.8% of the SCOP pairwise superfamily interfaces have an equivalent CATH interface and 84.7% of the CATH pairwise superfamily interfaces have an equivalent SCOP interface. The percentage of nonredundant interfaces with equivalent interfaces decreases steadily as the degree of overlap at the interface increases. At 25% overlap, 82.5% of SCOP interfaces and 87.8% of CATH interfaces had equivalents, decreasing to 67.3% and 79.4%, respectively, at 100% overlap.
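The overlap criterion can be sketched roughly as follows. This is only an illustration: the source measures coverage of interface area, whereas the sketch approximates an interface by a set of residue numbers, and the function names, residue numbers and default threshold are all hypothetical.

```python
def coverage(reference, candidate):
    """Fraction of the reference interface residues also in the candidate.

    Assumption: residue sets stand in for interface area.
    """
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)

def has_equivalent(reference, candidates, threshold=0.75):
    """True if any candidate covers at least `threshold` of the reference."""
    return any(coverage(reference, c) >= threshold for c in candidates)

# Placeholder residue numbers for one reference interface and two
# candidate interfaces from the same pairwise superfamily.
reference_interface = {10, 11, 12, 13, 14, 15, 16, 17}
candidate_interfaces = [
    {10, 11, 12},                    # covers 3 of 8 residues
    {11, 12, 13, 14, 15, 16, 40},    # covers 6 of 8 residues
]

print(has_equivalent(reference_interface, candidate_interfaces, 0.25))  # True
print(has_equivalent(reference_interface, candidate_interfaces, 0.75))  # True
print(has_equivalent(reference_interface, candidate_interfaces, 1.0))   # False
```

Raising the threshold can only turn a match into a non-match, which is consistent with the steady decrease in equivalent interfaces reported between 25% and 100% overlap.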

The difference in overlap again shows that the interfaces derived using the SCOP classification were less well covered by CATH equivalents than CATH interfaces were by SCOP equivalents. This can partly be explained by the larger size of SCOP domains and SCOP interfaces, which provide greater coverage of the equivalent CATH interfaces than CATH interfaces provide of the equivalent SCOP interfaces. These results suggest that, because the coverage of CATH interfaces by SCOP was greater than the coverage of SCOP interfaces by CATH, SCOP should be the better choice of domain definition for analyzing protein-protein interactions if interface coverage is the principal criterion. However, using SCOP and CATH in conjunction leads to a significant increase in the number of interfaces observed. Because the shortage of domain-domain interfaces is a limiting factor in the use of structural data for investigating and predicting protein-protein interactions, and because most work in this area uses only one domain classification system, this result is a significant finding.

To illustrate this point, if CATH is used as the domain classification system for predicting protein-protein interactions through homology matching of sequences to domains observed interacting in structural data, there would be 1961 pairwise superfamilies with which to match. If the domain-domain interactions observed in SCOP that had no CATH equivalent were added, there would be 2606 pairwise superfamilies with which to match (1961 + 645). Aloy et al. predicted that the number of types of protein-protein interaction is 10,000 [7]. If a pairwise superfamily is considered to be a type of protein-protein interaction, then using CATH alone would mean that 19.6% (1961/10,000) of protein-protein interaction types are observed in structural data. Including the SCOP pairwise superfamily interactions for which there is no equivalent CATH interaction would increase this value to 26.1% (2606/10,000). This could reduce the false negative rate of predictive methods that employ homology matching to structural data (such as [7, 8, 9]) by