Reusability In Software Effort Estimation Model Computer Science Essay


Most recent research initiatives have focused on the development of formal estimation models to improve estimation accuracy. Formal estimation models have been developed that measure lines of code or the size of software projects, but most of them have failed to improve estimation accuracy. This paper focuses on reusability in software development effort estimation based on COCOMO II and Artificial Neural Networks (ANN). Software reuse saves effort and improves productivity, so incorporating reusability metrics in COCOMO II may yield better results; in COCOMO II it is also very difficult to determine the values of the size parameters. This paper proposes a new model called COREANN (COCOMO Reusability in Effort estimation based on ANN model) for better effort estimation accuracy and reliability. The proposed model modifies two components of COCOMO II. First, instead of the RUSE cost driver, three new reuse cost drivers are introduced. Second, in order to reduce the project cost, three cost drivers, PEXE, AEXE and LTEX, are combined into a single cost driver, Personnel Experience (PLEX). Finally, the accuracy of the proposed model is further improved with the help of the ANN Enhanced RPROP algorithm and a simulated annealing optimization technique. To evaluate the performance of the proposed model, a set of projects is compared with the existing COCOMO II model using MRE, MMRE and PRED as evaluation criteria for software cost estimation. The final results show that effort estimation with the proposed COREANN model is reliable, accurate and predictable.

Key words:

Effort Estimation, Software Reuse, COCOMO II, Artificial Neural Network, Simulated Annealing

1. Introduction

A survey on software effort estimation revealed that most projects failed due to effort overruns exceeding their original estimates; it also stated that 60-80 percent of software projects encounter effort overruns [1]. Effort overruns usually lead to cost overruns and missed project deadlines, which cause loss of productivity or loss of business. Software effort estimation is one of the most critical and complex tasks in software engineering, but it is an inevitable activity in the software development process. Over the last three decades, a growing trend has been observed in using a variety of software effort estimation models in diversified software development processes. Along with this tremendous growth, the essentiality of all these models in estimating software development costs and preparing schedules more quickly and easily in the anticipated environments has also been realized.

A great amount of research time and money has been invested to improve the accuracy of the various estimation models. However, due to the inherent uncertainty in software development projects, such as complex and dynamic interaction factors, changing requirements, intrinsic software complexity, pressure toward standardization and lack of software data, it is unrealistic to expect very accurate effort estimates for software development processes [2].

Software reuse has been given great importance in software development for decades. Software reuse has benefits such as reduced effort, improved productivity, decreased time-to-market and decreased cost. This research work addresses the significance of reusability in effort estimation and formulates new metrics for reusability to produce reliable and accurate effort estimates.

Predictable and reliable effort estimation is a challenging task in software project management. There have been numerous attempts to develop accurate cost estimation models based on various techniques; evaluating the accuracy and reliability of a model reveals its advantages and weaknesses. Selecting an appropriate model for a specific project is an issue in project management: the model that yields the minimum relative error should be considered the best fit for effort estimation. In recent decades, various methods for cost and effort estimation have been proposed in the following three categories:

Expert Judgment (EJ)

Algorithmic Models (AM)

Machine Learning (ML)

1.1 Expert Judgment (EJ)

Expert judgment is used as an estimation method [3] to evaluate software effort estimates. Inaccurate results produced by algorithmic and non-algorithmic cost models have led to extensive use of the EJ method to produce better estimates [4]. EJ estimates are based purely on the opinions of expert persons in the organization, and it is the most frequently applied effort estimation technique [5]. The accuracy of an estimate depends on the experience and expertise of the expert. Expert judgment based effort estimation is well suited to unstable, changing environments that contain contextual information. Despite its usefulness, expert judgment has some drawbacks, so combining model-based estimation with expert judgment may produce better accuracy [2].

1.2 Algorithmic Models (AM)

Algorithmic models are popular for software effort estimation. When using algorithmic effort estimation models, cost drivers are used to adjust the effort estimates. Such a model maintains a relationship between effort and one or more project characteristics, and it needs to be calibrated or adjusted to local circumstances. Examples of algorithmic models are the Constructive Cost Model (COCOMO) and the Software Lifecycle Management (SLIM) model.

1.2.1 SLIM Model:

SLIM [6], an algorithmic estimating method, was developed by Larry Putnam of Quantitative Software Management in the 1970s. SLIM is used in practice on large software projects as a tool for cost estimation and manpower scheduling. It is derived from the Norden-Rayleigh curve, which models manpower as a function of time. SLIM depends on a Source Lines of Code (SLOC) estimate for the project's overall size, then applies the Rayleigh curve model to produce its effort estimates.

The SLIM software equation is determined as follows:

S = E * (EFFORT)^(1/3) * td^(4/3)

Where td is the software delivery time,

E is the environment factor that reflects the development capability,

S is represented in LOC and

EFFORT is represented in person-years.
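As a quick illustrative sketch (the input values below are hypothetical, not from the paper), the software equation can be rearranged to solve for effort once the size, the environment factor and the delivery time are fixed:

```python
def slim_effort(size_loc, env_factor, delivery_time_years):
    """Solve the software equation S = E * EFFORT^(1/3) * td^(4/3)
    for EFFORT (in person-years), given size S in LOC, environment
    factor E and delivery time td in years:
    EFFORT = (S / (E * td^(4/3)))^3."""
    return (size_loc / (env_factor * delivery_time_years ** (4.0 / 3.0))) ** 3
```

Note the strong non-linearity: shortening the delivery time sharply increases the effort required for the same size.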

1.2.2 COCOMO II Model:

COCOMO II is an enhanced and updated version of COCOMO, developed to meet the needs of next generation software engineering practices [7][8]. COCOMO II was published initially in the Annals of Software Engineering in 1995 with three submodels: an application-composition model, an early design model and a post-architecture model. COCOMO II takes as input a set of seventeen Effort Multipliers (EM), or cost drivers, which are used to adjust the nominal effort (PM) to reflect the software product being developed.

The Application Composition Model

The Application Composition model is proposed to estimate effort and schedule on projects that use the latest software development tools supporting rapid application development. It uses object points for sizing rather than lines of code. The initial size estimate is obtained by counting the number of reports, screens and third-generation components that will be used in the application.

Effort (PM) = NOP/PROD

Where NOP = (object points)*(100-%reuse)/100, PROD = NOP/PersonMonths.
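A minimal sketch of this calculation (with hypothetical inputs) is:

```python
def app_composition_effort(object_points, percent_reuse, prod):
    """Application Composition estimate: Effort (PM) = NOP / PROD,
    where NOP = (object points) * (100 - %reuse) / 100 and PROD is
    the productivity rate in NOP per person-month."""
    nop = object_points * (100 - percent_reuse) / 100.0
    return nop / prod
```

For example, 100 object points with 20% reuse and a productivity of 10 NOP per person-month gives 80 NOP and 8 person-months of effort.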

The Early Design Model

This model is used to get rough estimates based on preliminary investigation and incomplete project analysis. It is used to evaluate alternative software system architectures, with Unadjusted Function Points (UFP) used for sizing.

Effort = a*KLOC*EAF

Where a = 2.45 and the Effort Adjustment Factor (EAF) is computed, as in the original COCOMO model, from 7 cost drivers: RCPX, RUSE, PDIF, PERS, PREX, FCIL and SCED.

The Post Architecture Model

The Post-Architecture model is used when high level design is completed and complete information about the project is known. This model is used during the actual development of the project or product. This model consists of a set of 17 cost drivers and 5 scale factors.

Effort (PM) = a * (SIZE)^b * EAF

Where a and b are software coefficients; a is set to 2.55 and b is computed as b = 1.01 + 0.01 * Σ(wi), where the wi are the weighted scale factors.
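A minimal sketch of the Post-Architecture calculation, using the coefficient a = 2.55 quoted above, is:

```python
def post_architecture_effort(size_ksloc, scale_factors, effort_multipliers, a=2.55):
    """Post-Architecture estimate: Effort (PM) = a * SIZE^b * EAF, with
    b = 1.01 + 0.01 * sum(wi) over the five weighted scale factors and
    EAF the product of the seventeen effort multipliers."""
    b = 1.01 + 0.01 * sum(scale_factors)
    eaf = 1.0
    for em in effort_multipliers:
        eaf *= em
    return a * size_ksloc ** b * eaf
```

For an all-nominal project (every effort multiplier 1.0, scale factor sum 0) of 1 KSLOC, the estimate is simply a = 2.55 person-months; larger sizes are penalized super-linearly through the exponent b.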

1.3 Machine Learning (ML):

Machine learning techniques have been used as an alternative to EJ and AM. Examples include fuzzy logic models, regression trees, Artificial Neural Networks (ANN) and case based reasoning [2].

Early research found that using more than one technique can reduce the risk of unreliability and improve accuracy and predictions. Further, using more than one method may avoid the loss of useful information that other methods can provide [9]. Thus, using a combination of methods appears to be a solution for producing more trustworthy decisions in software effort estimation. According to some surveys, combinations of individual methods have rarely been used to estimate software effort, although they have been applied successfully in other scientific fields [9].

1.3.1 Artificial Neural Network (ANN) Model:

Interest in the application of ANN has grown in recent years. ANNs have been applied to various problem domains such as engineering, medicine and physics. An ANN can be used as a predictive model because it is capable of modeling complex functions. An ANN is a massively parallel computational model that simulates the properties of biological interconnected neurons. The neuron is the basic unit of an ANN model and is described by a state, synapses, a combination function and a transfer function. A neuron computes a weighted sum of its inputs and generates an output if the sum exceeds a certain threshold. This output then becomes an input to other neurons in the network, and the process continues until one or more final outputs are generated. Most software effort estimation techniques have utilized the back propagation learning algorithm [10][11]. The ANN is initialized with random weights and gradually learns the relationships implicit in a training data set by adjusting its weights when presented with the data.
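As an illustrative sketch, this weighted-sum-and-threshold behaviour of a single neuron can be written as:

```python
def neuron_output(inputs, weights, threshold):
    """A single neuron: compute the weighted sum of the inputs and
    fire (output 1) only when the sum exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0
```

In a trained network a smooth transfer function (e.g. a sigmoid) usually replaces this hard threshold so that back propagation can compute gradients, but the step form above matches the description in the text.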

This paper proposes a new model called COREANN, an enhanced reusability model of the COCOMO II post-architecture model. The accuracy of the proposed model has been improved with the help of the ANN Enhanced Resilient Backpropagation (ERPROP) algorithm and the Simulated Annealing (SA) optimization technique.

The rest of the paper is organized as follows: Section 2 presents a review of the work done in applying neural network techniques to software effort estimation, including a brief description of back propagation and RPROP. Section 3 describes the architecture of the proposed neural network model and discusses the learning algorithm used in training the network. Section 4 describes the experimental setup, i.e., dataset preparation and implementation details. Section 5 presents the experimental results obtained by applying two different algorithms on the proposed architecture. Finally, Section 6 summarizes our work and gives conclusions and future research directions.

2. Related Works

2.1 Extensions of COCOMO II

The COCOMO II [7][8] project was started to meet the future requirements of the next generation of software development processes. The new COCOMO II model incorporated the features of the COCOMO 81 and Ada COCOMO models that were realistic and accurate. COCOMO II proposed three submodels based on the development stage of the project. The Application Composition model is the first submodel, used to estimate effort and schedule on projects that use rapid application development tools. The Early Design model is used to get approximate estimates in the preliminary stages of a project. The Post-Architecture model is mainly used to estimate effort once the high level design is completed. COCOMO II defined a reuse model which accounts for code reuse by adjusting the size of the module or project: a size estimate equivalent to the number of lines of new source code is computed, and the size estimate for new code is adjusted accordingly. This model treats reuse with function points and with source lines of code in the same way in both the early design model and the post-architecture model. However, the model does not clearly specify a complete procedure to evaluate the "actual" equivalent SLOC; it is difficult to calibrate, and it is difficult to determine the parameters Design Modified (DM), Code Modified (CM), Integration Required (IM) and Adapted SLOC.

A model for estimating development effort using reuse was proposed by Balda and Gustafson [12]. This model adapted the simple COCOMO model by distinguishing newly developed code that is specific to the project, newly developed code that is made for reuse, and code that is modified for reuse. The model uses four variables to represent these types of code.

The COCOMO II Constructive Staged Schedule & Effort Model (COSSEMO) [13] specifies the percentages of effort and schedule to be applied to the different stages of a project: Inception, Elaboration and Construction. The predicted effort and schedule from COCOMO II correspond to the sum of the effort and schedule of the Inception, Elaboration and Construction stages. The sum of the effort or schedule for the three stages can actually total more than 100% of the COCOMO II effort and schedule.

The Constructive RAD Schedule Estimation Model (CORADMO) [14] has five drivers. Each driver has rating levels, which are selected by a user based on the characteristics of the software project, its development organization and its milieu; there are numeric schedule and effort multiplier values per stage for each rating level. The impact of reuse of 3GL production code is handled directly in the COCOMO II model via the reuse submodel and its effect on size. The corresponding CORADMO driver reflects the impact of reuse of code and/or the use of very high level languages, especially during the Inception and Elaboration stages. Higher rating levels reflect the potential schedule compression in the Inception and Elaboration stages due to faster prototyping and option exploration; clearly, this impact depends on the level of capability and experience in doing this, such as rapid prototyping experience. The values of the multipliers corresponding to the rating levels are the same for both effort and schedule, which implies that the staff level is held constant.

The Constructive Quality Model (COQUALMO) [15] is an extension of the existing COCOMO II model that addresses quality. It is based on the software defect introduction and removal model described by Barry Boehm, in which defects conceptually flow into a holding tank through various defect source pipes; these defect source pipes are modeled in COQUALMO as the "Software Defect Introduction Model". The Defect Introduction and Defect Removal submodels can be integrated with the existing COCOMO II cost, effort and schedule estimation model.

The COnstructive COTS integration cost model (COCOTS) [16] addresses commercial-off-the-shelf (COTS) software: pre-built, commercially available software components that are becoming ever more important in the creation of new software systems. This model was developed as an extension of the COCOMO II cost model for estimating the effort of component-based software development. COCOTS attempts to predict the lifecycle costs of using COTS components by capturing the more significant COTS risks in its modeling parameters.

The primary approach modeled by COCOMO is the use of system components that are developed from scratch as new code, but COCOMO II also allows reuse to be modeled, in which system components are built out of pre-existing source code. Most projects do not build reusable components from scratch; instead, a pre-existing component's source code is modified to suit the project's needs. COCOMO II currently does not model the case in which a project has no access to a pre-existing component's source code.

2.2 ANN based Effort Estimation

The literature reveals that many software engineering researchers have proposed ANN based approaches to estimate software development effort [10, 11, 17, 18, 19]. Back propagation trained multilayered feed forward networks are generally used in most of this research to predict software effort. The use of an ANN with a back propagation learning algorithm for effort estimation has been explored in [10, 20, 21], which found the neural network technique effective for effort estimation. Preliminary investigations of neural networks for estimating software cost produced very accurate results [19], but the major setback in that work was the limited availability of datasets, since the accuracy of the result depends on the size of the training set.

3. Problem Statement

The main and most important task in the software development process is to forecast an accurate and reliable effort estimate; this estimate should be realistic and trusted. Inaccurate estimates lead to major problems in quality, schedule and cost. The potential weaknesses in current estimation models motivate the need for a more accurate and realistic model for successful project execution. Today's software effort estimation models remain inefficient, and the main reason most of them fail is their inability to account for reusability, even though software reuse has benefits such as reduced effort, improved productivity, decreased time-to-market and decreased cost. This paper therefore focuses mainly on reusability in software effort estimation. This work identifies some of the drawbacks of the COCOMO II effort estimation model in terms of reusability; to overcome these drawbacks and facilitate better estimates, some of the cost drivers have been modified to support reusability. The proposed model improves accuracy and reduces errors in effort estimation with the help of an ANN technique, and the solution is optimized using the Simulated Annealing algorithm, making the effort estimation of the proposed model more accurate and reliable.

4. Proposed Model - COREANN

The major goal of the proposed model is to estimate software effort more accurately and reliably with the help of the software reusability concept. Compared with COREANN, the handling of software reusability in COCOMO II does not provide accurate results. Instead of the RUSE cost driver, three new reuse cost drivers are introduced, Reuse Very High Level Language (RVLL), Required Integrator for Product Reuse (RIPR) and Reuse Application Generator (RAPG), which yield better results for reusability in software effort estimation. The effort estimation formula of COREANN is:



Effort (PM) = A * (SIZE)^E * EAF ------------------ 2

The COREANN model scale factors are the same as the COCOMO II [7][8] model scale factors: PREC, FLEX, RESL, TEAM, PMAT.

E = 0.91 + 0.01 * (SF1 + SF2 + SF3 + SF4 + SF5) ------------------ 3

EAF = EM1 * EM2 * … * EM17 ------------------ 4


COREANN Cost Drivers:

Product reliability and complexity - RELY, DATA, CPLX, DOCU

Required reuse - RVLL, RIPR, RAPG

Platform difficulty - TIME, STOR, PVOL

Personnel capability - ACAP, PCAP, PCON

Personnel experience - PLEX

Facilities - TOOL, SITE

Required Development Schedule - SCED

4.1 New Metrics Introduction

Three cost drivers, PEXE, AEXE and LTEX, are combined into a single cost driver, Personnel Experience (PLEX), to reduce the software project cost.

Instead of the RUSE metric in COCOMO II, three new reuse metrics are introduced:

RVLL (Reuse Very High Level Language)

RIPR (Required Integrator for Product Reuse)

RAPG (Reuse Application Generator)

4.2 New Metrics Definition and Validation Methodologies

The Goal/Question/Metric (GQM) paradigm provides a template and guidelines to define metric goals and refine them into concrete and realistic questions, which subsequently lead to the definition of measures. The software engineering process requires feedback and evaluation mechanisms to define and validate metrics. GQM serves as a practical guideline for designing and reusing technically sound and useful measures: it provides templates for defining goals and generating questions to define new metrics in the software engineering process [22][23]. The main focus here is to construct cost drivers for predictive models that establish reliable effort estimation. Goals are defined in an operational way by refining them into a set of quantifiable questions that are used to extract the appropriate information. The new cost drivers are defined following the GQM methodology.

These new cost drivers are validated using both theoretical (internal) validation and empirical (external) validation [24][25]. The aim of theoretical validation is to measure and assess the metric intentions using the DISTANCE framework [26]; empirical validation gathers information about the metrics using a survey method. To validate the EAF of the proposed model, a company dataset containing 20 projects has been used. By adjusting the values of the cost drivers, better results are obtained than for past projects.

4.3 COREANN with ANN Model Implementation:

To implement the ANN model, the COREANN effort estimation equation (Equation 1) is transformed from a non-linear model into a linear model by applying the natural logarithm to both sides. The ANN is implemented with the Enhanced RPROP algorithm.

ln(PM) = ln(A) + 0.91 * ln(SIZE) + SF1 * 0.01 * ln(SIZE) + ………. + SF5 * 0.01 * ln(SIZE) + ln(EM1) + ln(EM2) + ……… + ln(EM17) --------------- 6

[ Linear Equation ]

OPest = WT0 + WT1 * IP1 + WT2 * IP2 + … + WT6 * IP6 + WT7 * IP7 + … + WT23 * IP23 ----------------------- 7

[ ANN Based Model For Effort Estimation]


OPest = ln(PM)

IP1 = 0.91 * ln(SIZE)

IP2 = 0.01 * SF1 * ln(SIZE), ………, IP6 = 0.01 * SF5 * ln(SIZE)

IP7 = ln(EM1), ………, IP23 = ln(EM17)

WT0 = ln(A)

WT1 = 1, …………, WT23 = 1

IP1 to IP23 => Inputs

OPest => Output

WT0 => Bias

WT1 …… WT23 => Weights (initial value is 1)

The actual observed effort is compared with this estimated effort; the difference between these values is the error in the effort, which should be minimized.
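The linearized model of Equations 6 and 7 can be sketched as follows (a minimal illustration; the 0.01 scale-factor weighting is folded into the inputs so that all initial weights can be 1, and the choice of the constant A is left to calibration):

```python
import math

def coreann_inputs(size, scale_factors, effort_multipliers):
    """Build the 23 network inputs of Equation 7:
    IP1 = 0.91*ln(SIZE), IP2..IP6 = 0.01*SFj*ln(SIZE), IP7..IP23 = ln(EMi)."""
    ln_size = math.log(size)
    ips = [0.91 * ln_size]
    ips += [0.01 * sf * ln_size for sf in scale_factors]   # 5 scale factors
    ips += [math.log(em) for em in effort_multipliers]     # 17 effort multipliers
    return ips

def estimate_effort(ips, weights, bias):
    """OPest = WT0 + sum(WTk * IPk); the effort in person-months is exp(OPest)."""
    op_est = bias + sum(w * ip for w, ip in zip(weights, ips))
    return math.exp(op_est)
```

Training then amounts to adjusting the 23 weights and the bias (initially all 1 and ln(A)) so that exp(OPest) matches the observed efforts.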

4.4 Enhanced ERPROP Algorithm

The basic principle of ERPROP is to eliminate the harmful influence of the size of the partial derivative on the weight step. Initially, the Enhanced RPROP algorithm declares the following parameters:

The increase factor value is η+ = 1.2

The decrease factor value is η- = 0.5

The initial update-value is ∆0 = 0.1 (∆ij = ∆0)

The update-values are kept below ∆max = 50 and above ∆min = 1e-6
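A sketch of the sign-based RPROP weight update with these parameters (this is the common RPROP formulation; the paper's specific enhancement is not detailed here and is therefore not reproduced):

```python
import numpy as np

def rprop_update(grad, prev_grad, delta, weights,
                 eta_plus=1.2, eta_minus=0.5, delta_max=50.0, delta_min=1e-6):
    """One RPROP weight update: only the sign of the gradient is used,
    never its magnitude. Step sizes grow by eta_plus while the gradient
    keeps its sign and shrink by eta_minus (with the update skipped)
    when the sign flips."""
    sign_change = grad * prev_grad
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    step = np.where(sign_change < 0, 0.0, -np.sign(grad) * delta)
    grad = np.where(sign_change < 0, 0.0, grad)  # forget the flipped gradient
    return weights + step, delta, grad
```

Because only the gradient sign matters, a tiny or huge partial derivative produces the same bounded step, which is exactly the "harmful influence" the text refers to.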

4.5 SA Optimization

In the proposed COREANN model, the Simulated Annealing algorithm [27] is used to find the optimum solution for the software project effort. The objective minimized is the estimation error

Error = EffortM - EffortC ------------------ 8

where EffortM is the measured value of effort and EffortC is the computed value of effort according to the model used.

Simulated Annealing Algorithm Procedure:

1. Initialization: set the parameters of the annealing schedule.

2. Select an iteration mechanism: a simple prescription to generate a transition from the current state to another state by a small perturbation.

3. Evaluate the new state: compute ΔE = (value of current state - value of new state).

4. If the new state is better, make it the current state; otherwise probabilistically accept or reject it with a determined probability function.

5. If the stopping condition is not met, continue from Step 2; otherwise terminate.
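The five steps above can be sketched as a generic minimization routine (the exponential cooling schedule and its parameters below are illustrative assumptions; the paper does not specify its annealing schedule):

```python
import math
import random

def simulated_annealing(cost, initial, neighbor,
                        t_start=1.0, t_end=1e-3, alpha=0.95, steps_per_temp=100):
    """Minimize `cost` following Steps 1-5 of the procedure above."""
    current = initial
    current_cost = cost(current)
    best, best_cost = current, current_cost
    t = t_start                                    # Step 1: annealing schedule
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current)          # Step 2: small perturbation
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost  # Step 3: evaluate new state
            # Step 4: accept better states; accept worse ones with prob exp(-delta/t)
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha                                 # Step 5: cool and repeat
    return best, best_cost
```

Accepting occasional worse states at high temperature lets the search escape local minima; as the temperature falls, the search settles into the best region found.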

5. Performance Measures

A company database containing 20 projects is used to test the proposed COREANN model. The following evaluation criteria are used to assess and compare the performance of the proposed model with the existing COCOMO II model.

Common criteria for the evaluation of cost estimation models are the Magnitude of Relative Error (MRE) and the Mean Magnitude of Relative Error (MMRE). MRE is defined as

MRE = |Actual Effort - Estimated Effort| / Actual Effort ------------------ 9

and the Mean Magnitude of Relative Error (MMRE) for N projects is defined as [11]

MMRE = (1/N) * Σ MREi, i = 1 … N ------------------ 10

Next, the PRED(p) value is calculated. With lower MRE and MMRE and a higher PRED(25), a software effort estimation model is more accurate and predictable than other models.

PRED(p) = (K/N) * 100 ------------------ 11

where K is the number of projects for which MRE is less than or equal to p (normally p is 25%).
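These three measures can be computed as in this short sketch (with hypothetical effort values):

```python
def evaluation_metrics(actual, estimated, p=25):
    """Compute per-project MRE, MMRE and PRED(p) for paired lists of
    actual and estimated efforts."""
    mre = [abs(a - e) / a for a, e in zip(actual, estimated)]
    mmre = sum(mre) / len(mre)
    k = sum(1 for m in mre if m * 100 <= p)  # projects within p percent
    pred = 100.0 * k / len(mre)
    return mre, mmre, pred
```

For instance, actual efforts [100, 200] with estimates [110, 260] give MREs of 0.10 and 0.30, an MMRE of 0.20, and PRED(25) = 50, since only the first project falls within 25%.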

5.1 Results

The 20-project dataset is used to forecast effort with the proposed model. The estimated effort is compared with the existing COCOMO II estimate and the actual effort of each project. These results are shown in Table 1, and a comparison graph is also provided below:

Table - 1 : Comparison of Effort Estimation With SA Optimization

Table 2 shows the MRE comparison results of the proposed model with the existing COCOMO II.

Table 2 - Comparison of Effort Estimation Results In MRE

In Table 2, the MMRE value of COREANN is 17.019 and that of COCOMO II is 30.592; the PRED(25) value of COREANN is 80.00 and that of COCOMO II is 35.00. These results show that the MMRE of COREANN is less than the MMRE of COCOMO II, and the PRED(25) of COREANN is greater than the PRED(25) of COCOMO II.

6. Conclusion and Future work

In software engineering, it is extremely difficult to select an appropriate model for effort estimation because of the number of models available. Software reuse has become a major factor in development; hence, effort estimation for reuse must be accurate for successful project execution. This paper concentrated primarily on the computation of accurate effort with software reusability as the main focus. A comparison of the performance results of COREANN and COCOMO II clearly shows that the proposed COREANN works better than COCOMO II: the COREANN model yields lower MRE and MMRE and higher PRED(25) than the COCOMO II model, so the prediction accuracy of COREANN is high based on the performance evaluation. In future work, the effort estimated by the expert judgment method will be considered to optimize the final effort estimation, with the expert judgment estimate used as the initial value for the optimization.