Fault Diagnosis In Object Oriented Software Computer Science Essay


To meet the demands of today's continually evolving software development, organizations must optimize the use of their limited resources and deliver quality products on time and within budget. This requires preventing the introduction of faults and quickly discovering and repairing residual faults. In this paper, a new model for predicting and identifying faults in object-oriented software systems is introduced. In particular, faults due to the use of inheritance and polymorphism are considered, as they account for a significant portion of the faults in object-oriented systems. A fault predictor is subsequently established to identify the fault type of each class classified as faulty. It is concluded that the proposed model yields high discrimination accuracy between faulty and fault-free classes.

Data mining and information retrieval techniques have the potential to handle voluminous data and assist in fault management. Computers continue to improve in speed and functionality, but they are also becoming more difficult to use. Their occasional unexpected behavior frustrates users, and identifying and fixing the cause of such behavior is becoming a daunting task. Users typically identify certain symptoms and rely on help centers, vendor-supported help databases, search engines, and web forums for a solution. Often the user ends up giving or receiving incomplete or insufficient information, and the whole process is repeated for every user who faces the problem. Automating fault diagnosis and designing robust fault identification and management systems is therefore the need of the hour. Fault detection is generally handled by analyzing certain symptoms.

Software reliability can be defined as the probability of failure-free operation of a computer program executing in a specified environment for a specified time. It is often considered a software quality factor that can aid in predicting the overall quality of a software system using standard predictive models. Predictive models of software faults use historical and current development data to make predictions about the faultiness of software subsystems or modules. Although software faults have been widely studied in both procedural and object-oriented programs, many aspects of faults remain unclear. This is especially true for object-oriented software systems, in which inheritance and polymorphism can cause a number of anomalies and fault types. Unfortunately, existing techniques used to predict faults in procedural software are not generally applicable to object-oriented systems.

Some recent studies [3], [4] report the use of object-oriented metrics to predict fault-proneness and the number of faults by applying various statistical methods and neural network techniques. However, they generally stop at fault prediction, without attempting to further characterize the faults likely to be present in the system. In this paper, a new method of fault prediction is introduced along with a method for classifying the fault type. For the reasons mentioned earlier, faults due to inheritance and polymorphism are of special interest in this work.

The problem of predicting whether a software class is faulty is viewed as a binary classification problem in which the class is represented as a data point with coordinates described by object-oriented metrics and other parameters. The prediction of the fault type in a faulty software class is then considered as a clustering problem in which each fault type is represented by a cluster prototype. To solve the two problems, the use of neural network techniques is proposed. In particular, the classification problem is addressed using a Multilayer Perceptron (MLP), while the solution to the clustering problem is based on a Radial-Basis Function Network (RBFN). A body of source code is examined to analyze the faults that exist in software systems. Fault analysis consists of two parts: faultiness prediction and fault type identification.

Table 1: Faults and anomalies due to inheritance and polymorphism

State Definition Anomaly (SDA)

State Definition Inconsistency due to state variable Hiding (SDIH)

State Defined Incorrectly (SDI)

Indirect Inconsistent State Definition (IISD)

State Visibility Anomaly (SVA)

Table 2: Syntactic Inheritance Patterns

Syntactic Pattern

Extension method calls another extension method

Extension method calls inherited method

Extension method calls refining method

Extension method defines inherited state variable

Refining method calls extension method

Refining method calls other inherited method

Refining method calls another refining method

Refining method calls overridden method

Refining method defines inherited state variable

Refining method uses inherited state variable

2. Background

2.1. Neural Networks

This study employs two neural network techniques as the underlying mechanisms for fault prediction, namely, the Multilayer Perceptron (MLP) and the Radial-Basis Function Network (RBFN). The former classifies software classes as faulty or fault-free, while the latter helps cluster input data into appropriate fault categories.

2.2. Fault Categories and Software Metrics

Inheritance and polymorphism provide many benefits for creativity, efficiency, and reuse in object-oriented software development. However, they can cause a number of anomalies and faults [20]. This study focuses on the five fault types, shown in Table 1, that are incurred by the use of inheritance and polymorphism.

A number of parametric measurements are introduced to characterize fault causes, i.e., the number of appearances of each syntactic fault pattern [3], together with syntactic and structural measures. The metrics are summarized in Tables 1 and 2. The parametric measurements are categorized according to the fault types described below.

2.2.1. Inconsistent Type Use (ITU)

For this fault type, a descendant class does not override any inherited method. Thus, there can be no polymorphic behavior. Every instance of a descendant class C that is used where an instance of its ancestor T is expected can only behave exactly like an instance of T. That is, only methods of T can be used. Any additional methods specified in C are hidden, since the instance of C is being used as if it were an instance of T. However, anomalous behavior is still a possibility. If an instance of C is used in multiple contexts, anomalous behavior can occur if C has extension methods. In this case, one or more of the extension methods can call a method of T or directly define a state variable inherited from T. Anomalous behavior will occur if either of these actions results in an inconsistent inherited state.

2.2.2. State Definition Anomaly (SDA)

In general, for a descendant class to be behaviorally compatible with its ancestor, the state interactions of the descendant must be consistent with those of its ancestor. That is, the refining methods implemented in the descendant must leave the ancestor in a state that is equivalent to the state in which the ancestor's overridden methods would have left it. For this to be true, the refining methods provided by the descendant must yield the same net state interactions as each public method that is overridden. From a data flow perspective, this means that the refining methods must provide definitions for the inherited state variables that are consistent with the definitions in the overridden method. If not, then a potential data flow anomaly exists. Whether or not an anomaly actually occurs depends upon the sequences of methods that are valid with respect to the ancestor. Any extension method that is called by a refining method must also interact with the inherited variables of the ancestor in a manner that is consistent with the ancestor's current state. Since the extension method provides a portion of the refining method's net effects, to avoid a data flow anomaly the extension method must not define inherited state variables in a way that would be inconsistent with the method being refined. Thus, the net effect of the extension method cannot be to leave the ancestor in a state that is logically different from the state in which it was invoked. For example, if the logical state of an instance of a stack is currently not-empty/not-full, then execution of an extension method cannot result in the logical state spontaneously being changed to either empty or full. Doing so would preclude the execution of pop or push as the next method in the sequence.

2.2.3. State Definition Inconsistency due to state variable Hiding (SDIH)

The introduction of an indiscriminately named local state variable can easily result in a data flow anomaly where none would otherwise exist. If a local variable is introduced into a class definition with the same name as an inherited variable v, the effect is that the inherited variable is hidden from the scope of the descendant (unless explicitly qualified, as in super.v). A reference to v by an extension or overriding method will refer to the descendant's v. This is not a problem if all inherited methods are overridden, since no other method would be able to implicitly reference the inherited v. However, this pattern of inheritance is the exception rather than the rule. There will typically be one or more inherited methods that are not overridden. A data flow anomaly can therefore exist if a method that normally defines the inherited v is overridden in a descendant in which the inherited state variable is hidden by a local definition.

2.2.4. State Defined Incorrectly (SDI)

Suppose an overriding method defines the same state variable v that the overridden method defines. If the computation performed by the overriding method is not semantically equivalent to the computation of the overridden method with respect to v, then subsequent state dependent behavior in the ancestor will likely be affected, and the externally observed behavior of the descendant will be different.

2.2.5. Indirect Inconsistent State Definition (IISD)

An inconsistent state definition can occur when a descendant adds an extension method that defines an inherited state variable. The method is an extension method, not a refining method. For example, consider the class hierarchy shown in Figure 1A, where T specifies a state variable x and a method m(), and the descendant D specifies a method e(). Since e() is an extension method, it cannot be directly called from an inherited method, in this case T::m(), because e() is not visible to the inherited method. However, if an inherited method is overridden, the overriding method (such as D::m(), as depicted in Figure 1B) can call e() and introduce a data flow anomaly by having an effect on the state of the ancestor that is not semantically equivalent to that of the overridden method (e.g., with respect to the variable T::x in the example). Whether an error occurs depends on which state variable is defined by e(), where e() executes in the sequence of calls made by a client, and what state-dependent behavior the ancestor has on the variable defined by e().

Figure 1. IISD: Example of indirect inconsistent state definition
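The hierarchy described for Figure 1 can be sketched in code. The following is a minimal Python illustration and not a reproduction of the figure: the names T, D, x, m() and e() come from the text, while the method bodies and the state-dependent method ready() are hypothetical and chosen only to make the anomaly observable.

```python
class T:
    """Ancestor with state variable x and method m()."""

    def __init__(self):
        self.x = 0

    def m(self):
        # Defines x; the ancestor expects x > 0 after m() has run.
        self.x += 1

    def ready(self):
        # Hypothetical state-dependent behaviour of the ancestor.
        return self.x > 0


class D(T):
    """Descendant with extension method e() and refining method m()."""

    def e(self):
        # Extension method that defines the inherited state variable x.
        self.x = -1

    def m(self):
        # Refining method: it calls e(), so its net effect on x is not
        # equivalent to that of the overridden T.m().
        self.e()


def use_as_t(obj):
    """Client code written against T only."""
    obj.m()
    return obj.ready()
```

Here use_as_t(T()) leaves the ancestor in the state it expects, whereas use_as_t(D()) does not, because D.m() reaches the inherited variable x indirectly through e().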

3. Fault Analysis

In this study, a body of source code is examined to analyze the faults that exist in software systems. Fault analysis consists of two parts: faultiness prediction and fault type identification. A faultiness predictive model has been constructed based on software characteristics to predict whether the software under consideration is faulty or fault-free. A set of predetermined software metrics is used as the principal characterization attributes of the software, while neural network techniques are applied to build the predictive model. In previous work [16], two faultiness predictive models were built from software metrics, the first using a multilayer perceptron (MLP) and the second a radial-basis function network (RBFN). They yielded prediction accuracies of 60% and 83%, respectively. Since some software metrics used in the prior work are suitable only for structured software, additional object-oriented software metrics have been employed here. A fault identification model named MASP is introduced. The MASP model consists of two stages, namely, the faultiness prediction (or coarse-grained) stage and the fault type identification (or fine-grained) stage, as depicted in Figure 2.

In the faultiness prediction stage, a coarse-grained metric selection algorithm is proposed to extract the vital metrics that affect fault proneness. A faultiness predictive model, built using a multilayer perceptron with the back-propagation learning algorithm, is then applied to extract the faulty classes. Since the metrics selected by the coarse-grained method do not provide adequate traces for identifying the fault type of the faulty classes so obtained, a fine-grained metric selection algorithm is presented to enhance trace identification capability with the help of other relevant metrics. A fault type identification model is then constructed using a radial-basis function network (RBFN).

Figure 2. Fault identification model construction

4. Data Preprocessing

All 60 software metrics and fault parameters were applied to the experimental data. However, not all of them contributed to the faultiness of software classes, so it was necessary to select only the relevant ones. In the selection algorithm below, software metrics and fault parameters are simply referred to as metrics.

1) Set the initial weight of each metric, which is used to accentuate its importance:

$$W_i(t) = 0 \quad \text{for } t = 0$$

where $W_i(t)$ is the weight value of metric $i$, $i \in \{1, 2, \ldots, m\}$, $m$ is the number of metrics, and $t$ is the iteration number.

2) Form a pair consisting of one faulty class and one fault-free class from the training set:

$$X = \{x_1, x_2, \ldots, x_m\}, \quad Y = \{y_1, y_2, \ldots, y_m\}$$

where $X$ is a faulty class described by $m$ metrics and $Y$ is a fault-free class described by $m$ metrics.

3) Calculate the relative difference of the values of each metric pair from step 2:

$$D_i = \frac{|x_i - y_i|}{x_i + y_i} \times 100$$

where $D_i$ is the relative difference of the values of metric $i$ in the two classes, $x_i$ is the value of metric $i$ of the faulty class, and $y_i$ is the value of metric $i$ of the fault-free class. Using a relative difference prevents metrics from being intermixed across their corresponding applicable domains.

4) Adjust the weight value of each metric:

$$W_i(t) = \begin{cases} W_i(t-1) + 1, & \text{if } D_i \geq \beta \\ W_i(t-1) - 1, & \text{otherwise} \end{cases}$$

where $\beta = 50$ is a threshold value.

5) Repeat steps 2 through 4 until every fault-free class has been paired with every faulty class of the training set.

6) Replace negative weight values with zero and normalize all weight values.

7) Select the metrics with weight values above the selected threshold.
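The seven steps above can be sketched as a short Python function. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the text does not say how the weights are normalized or what selection threshold is used, so division by the largest weight and the `cutoff` parameter are assumptions, as is treating the relative difference as zero when both metric values are zero.

```python
from itertools import product


def select_metrics(faulty, fault_free, beta=50.0, cutoff=0.5):
    """Return (indices of selected metrics, normalized weights).

    faulty and fault_free are lists of classes; each class is a list of
    m non-negative metric values.
    """
    m = len(faulty[0])
    weights = [0] * m                              # step 1: W_i = 0
    for x, y in product(faulty, fault_free):       # steps 2 and 5: all pairs
        for i in range(m):
            denom = x[i] + y[i]
            # Step 3: relative difference in percent. Taken as 0 when both
            # values are 0 (assumption; the source does not cover this case).
            d = abs(x[i] - y[i]) / denom * 100 if denom else 0.0
            weights[i] += 1 if d >= beta else -1   # step 4
    clipped = [max(w, 0) for w in weights]         # step 6: negatives -> 0
    top = max(clipped)
    # Step 6 (assumed form): normalize by the largest weight.
    normalized = [w / top if top else 0.0 for w in clipped]
    # Step 7: keep the metrics whose weight exceeds the (hypothetical) cutoff.
    selected = [i for i, w in enumerate(normalized) if w > cutoff]
    return selected, normalized


# Toy data: only metric 0 differs strongly between the two groups.
toy_faulty = [[10, 5, 3], [12, 6, 2]]
toy_fault_free = [[2, 5, 3], [3, 4, 2]]
toy_selected, toy_weights = select_metrics(toy_faulty, toy_fault_free)
```

In the toy data every faulty/fault-free pair differs by more than 50% on metric 0 and by less than 50% on the other two, so only metric 0 keeps a positive weight and is selected.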

5. Fault Prediction Models

A. Faultiness Prediction

To predict the faulty class, a predictive model has been constructed using an MLP with the back-propagation learning algorithm. The objective of the model is to correctly classify the data points into fault-free and faulty groups, as shown in Figure 3.

Figure 3. Faultiness classification (fault-free class versus faulty class)

The output value expected from the output node of the model is zero for a fault-free class and one for a faulty class. A learning rate of 0.35 is used, together with the sigmoid function, in adjusting the weights to yield the correct output value. After the training process is completed, the model is applied to classify the test data. The output values so obtained range between 0 and 1 and are therefore not by themselves decisive for classification. With the acceptance ratio set at 0.55, a data point is classified as a faulty class if the output of the MLP is greater than this value; otherwise, it is classified as a fault-free class.
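The decision rule and the role of the learning rate can be illustrated with a minimal Python sketch. It is deliberately reduced to a single sigmoid output node with one delta-rule update, so it is not the multilayer network used in the study; the values 0.35 and 0.55 come from the text, while the function names and the toy input are hypothetical.

```python
import math

LEARNING_RATE = 0.35     # learning rate reported in the text
ACCEPTANCE_RATIO = 0.55  # acceptance ratio reported in the text


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, x):
    """Output of a single sigmoid node for the input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def update(weights, bias, x, target):
    """One delta-rule step (back-propagation for an output node)."""
    out = forward(weights, bias, x)
    delta = (target - out) * out * (1.0 - out)  # error times sigmoid slope
    new_weights = [w + LEARNING_RATE * delta * xi for w, xi in zip(weights, x)]
    return new_weights, bias + LEARNING_RATE * delta


def classify(output):
    """Faulty only if the output is strictly greater than the ratio."""
    return "faulty" if output > ACCEPTANCE_RATIO else "fault-free"


# Toy usage: one update towards target 1 (faulty) raises the output.
w0, b0 = [0.0], 0.0
before = forward(w0, b0, [1.0])
w1, b1 = update(w0, b0, [1.0], 1.0)
after = forward(w1, b1, [1.0])
```

Because the comparison is strict, an output of exactly 0.55 is still treated as fault-free, which follows the wording "greater than this value" in the text.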

B. Fault Type Identification

The fault type identification model is based on the RBFN technique. The objective of the model is to cluster faulty classes into groups based on fault type, as shown in the example of three fault types in Figure 5. The model consists of 35 input nodes in the input layer, a number of hidden nodes in the hidden layer (determined during the training process), and 5 output nodes in the output layer that form an output vector. The output vector denotes the type of fault in binary format as '10000', '01000', '00100', '00010', and '00001', representing SDIH, IISD, SVA, SDA, and SDI faults, respectively.
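The output encoding can be written down directly. In the short Python sketch below, the order of the fault types is taken from the text; decoding a real-valued output vector by picking its largest component is an assumption, since the text does not state how network outputs are mapped back to the binary codes.

```python
FAULT_TYPES = ["SDIH", "IISD", "SVA", "SDA", "SDI"]  # order given in the text


def encode(fault_type):
    """Return the 5-bit code of a fault type, e.g. 'SDIH' -> '10000'."""
    if fault_type not in FAULT_TYPES:
        raise ValueError("unknown fault type: " + fault_type)
    return "".join("1" if name == fault_type else "0" for name in FAULT_TYPES)


def decode(outputs):
    """Map five output-node values to a fault type (assumed: largest wins)."""
    if len(outputs) != len(FAULT_TYPES):
        raise ValueError("expected one value per output node")
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return FAULT_TYPES[winner]
```

For instance, encode("SVA") gives '00100', and an output vector whose second component is the largest decodes to IISD.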

During the experiment, training data were used to generate the weights between the hidden layer and the output layer. If the network yields low accuracy, the number of hidden nodes is incremented by one. This restructuring by plus-one progression continues until the desired accuracy is reached or the number of hidden nodes reaches the number of training data points. At that point, reorganization must be done by repeating the attribute selection algorithm and proceeding along the same steps described above. The implication of this reorganization is that some, or all, of the selected metrics do not contribute to the faulty behavior of software components, so that prediction accuracy falls short of the acceptable range.
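The plus-one growth of the hidden layer can be sketched as follows. This is a toy Python illustration under several assumptions that the text does not confirm: hidden nodes are Gaussian units centred on the first training points, the hidden-to-output weights are fitted with a simple least-mean-squares rule, and the width `sigma`, the learning rate, the number of epochs and the target accuracy are hypothetical values.

```python
import math


def rbf(x, centre, sigma):
    """Gaussian activation of one hidden node."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def train_output_weights(X, T, centres, sigma, lr=0.5, epochs=50):
    """Fit hidden-to-output weights with the LMS (delta) rule."""
    n_out = len(T[0])
    W = [[0.0] * n_out for _ in centres]
    for _ in range(epochs):
        for x, t in zip(X, T):
            phi = [rbf(x, c, sigma) for c in centres]
            for k in range(n_out):
                out = sum(W[j][k] * phi[j] for j in range(len(centres)))
                for j in range(len(centres)):
                    W[j][k] += lr * (t[k] - out) * phi[j]
    return W


def accuracy(X, T, centres, sigma, W):
    """Fraction of points whose largest output matches the target code."""
    hits = 0
    for x, t in zip(X, T):
        phi = [rbf(x, c, sigma) for c in centres]
        outs = [sum(W[j][k] * phi[j] for j in range(len(centres)))
                for k in range(len(t))]
        hits += outs.index(max(outs)) == t.index(max(t))
    return hits / len(X)


def grow_rbfn(X, T, sigma=1.0, target=0.9):
    """Add one hidden node at a time until the target accuracy is reached.

    Returns (hidden nodes, accuracy, reselect). reselect is True when the
    hidden layer has grown to the number of training points without
    reaching the target, i.e. metric selection would have to be repeated.
    """
    acc = 0.0
    for k in range(1, len(X) + 1):
        centres = X[:k]  # assumption: centres are training points
        W = train_output_weights(X, T, centres, sigma)
        acc = accuracy(X, T, centres, sigma, W)
        if acc >= target:
            return k, acc, False
    return len(X), acc, True


# Toy usage: three well-separated points, one per fault type.
toy_X = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
toy_T = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
toy_nodes, toy_acc, toy_reselect = grow_rbfn(toy_X, toy_T)
```

The third return value mirrors the stopping condition in the text: when the hidden layer has grown to the size of the training set without reaching the desired accuracy, the selected metrics are presumed inadequate and selection is repeated.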

To explore which metrics dominate the fault type of a given hidden node that represents all 35 metrics, an algorithm is proposed as follows:

1) Choose a fault type for which to find a set of representative metrics, for example, the SDIH fault.

2) Find the hidden nodes that affect the fault type from the results. There are 2 hidden nodes in this case.

3) Identify the set of classes in the training data in which the selected fault originates. There are 80 classes in the training data that contain the SDIH fault.

4) Calculate the difference between each metric of a training class and the same metric of a hidden node:

$$V_i = |c_i - h_i|$$

where $V_i$ is the difference between the values of metric $i$ of the class and of the hidden node, $c_i$ is the value of metric $i$ of the class, and $h_i$ is the value of metric $i$ of the hidden node.

5) For each fault type, calculate the total difference of each metric value.

$$\mathrm{TotV}_i = \sum_{j=1}^{m} \sum_{k=1}^{n} V_i(j,k)$$

where $\mathrm{TotV}_i$ is the total difference of the values of metric $i$ over all classes and hidden nodes, $V_i(j,k)$ is the difference of the values of metric $i$ between training class $k$ and hidden node $j$, $m$ is the number of hidden nodes for the selected fault type, and $n$ is the number of training classes for the selected fault type.

6) Normalize all total difference values

$$\mathrm{TotV}_i \leftarrow \frac{\mathrm{TotV}_i - \min}{\max - \min}$$

where max and min are the maximum and minimum total difference values.

7) Repeat the steps above until all fault types have been considered.
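For a single fault type, steps 4 through 6 amount to a double sum followed by min-max scaling, as the following Python sketch shows. The function name and the toy data are hypothetical, and returning zeros when all totals are equal is an assumption, since the text does not cover that case.

```python
def normalized_total_differences(classes, hidden_nodes):
    """Return the min-max normalized total difference of every metric.

    classes: metric vectors of the training classes having the fault type.
    hidden_nodes: metric vectors of the hidden nodes for that fault type.
    """
    n_metrics = len(classes[0])
    totals = [0.0] * n_metrics
    for h in hidden_nodes:                     # j = 1..m
        for c in classes:                      # k = 1..n
            for i in range(n_metrics):
                totals[i] += abs(c[i] - h[i])  # V_i(j, k) = |c_i - h_i|
    lo, hi = min(totals), max(totals)
    if hi == lo:
        # Assumption: with identical totals there is nothing to rank.
        return [0.0] * n_metrics
    return [(t - lo) / (hi - lo) for t in totals]


# Toy usage: two training classes and one hidden node, three metrics.
toy_classes = [[1.0, 4.0, 2.0], [3.0, 4.0, 6.0]]
toy_hidden = [[2.0, 4.0, 0.0]]
toy_scores = normalized_total_differences(toy_classes, toy_hidden)
```

In the toy data the raw totals are 2, 0 and 8, so the second metric, whose value coincides with the hidden node in every class, is scaled to zero and the third metric to one.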

Figure 4. The total difference of each metric having IISD fault

Figure 5. The total difference of each metric having SDA fault

Figure 4 and Figure 5 show the effects that the metrics have on the IISD and SDA fault types, respectively. A total difference value of zero means that the values of the corresponding metric in the training classes and the hidden nodes are the same, and thus the metric has no effect on the fault type. On the other hand, if the total difference of a metric between the training classes and the hidden nodes is high, the metric is likely to contribute to the fault prediction of the software.


The proposed software metric attribute selection algorithm proved to be effective in determining the significance of each metric and in characterizing software faultiness. Based on the two predictive models, the proposed approach is able to predict the faultiness of a class with more than 90% accuracy. Accurate predictions obtained from such a reliability model can eventually lead to higher efficiency of the software process and higher quality of the resulting software products.