# Discussion On Accuracy And Validation Of Software Development Prediction Accounting Essay


The accuracy of software development cost and schedule estimation techniques has a great impact on the purposes for which they are currently used, such as budgeting, tradeoff and risk analysis, project planning and control, and software process improvement investment analysis and benefit measurement. For this reason a new software development prediction model was developed, which is later to be used for building a process improvement benefit measurement framework. The model was to be constructed with a statistically safe method, stepwise linear regression, that could reduce the effects of collinearity and should yield better prediction power than the existing Constructive Cost Model (COCOMO).

This research examines the claimed accuracy that should be achieved when reusing the publicly available Constructive Cost Model (COCOMO) 1997 dataset after upgrading it to the COCOMO II post-architecture form with the "Rosetta Stone" tool. It was found that the upgrade turned the reused dataset into an unbalanced dataset [1]: independent variables were added, or replaced by new ones, that do not cover all possible scale levels (very low, low, nominal, high, very high). It was also found that the new model is statistically significant but fails to produce the expected accuracy.

The conclusion is that, to construct a good regression model, researchers should focus not only on using statistically safe methods but also on using a balanced dataset that covers all possible scale levels of every candidate independent variable involved in the construction of the model.

## Categories and Subject Descriptors

D. Software, D.2. Software Engineering, D.2.9 Management, Cost Estimation, Productivity and Software Process Model CMMI

## General Terms

Measurement, Performance, Economics and Verification.

## Keywords

COCOMO, Software Engineering Economics, Data set reusability, Cost Estimation Model Accuracy, Statistical Technique for Model Building and Process Improvement Measurement.

## 1. INTRODUCTION

The accuracy of software development cost and schedule estimation techniques has a great impact on the purposes for which they are currently used, such as budgeting, tradeoff and risk analysis, project planning and control, and software process improvement investment analysis and benefit measurement. Keeping in mind the importance of estimation models shown in studies [1], [2], [3], [4], [5], this research examines the claimed accuracy that should be achieved when reusing the publicly available Constructive Cost Model (COCOMO) 1997 dataset after upgrading it to a COCOMO II post-architecture dataset with the "Rosetta Stone" tool. This is done by developing a software development prediction model that should give better prediction power than the very popular estimation model COCOMO. It was found that the new model is statistically significant but fails to produce the expected accuracy.

The new model measures what proportion of the variation in the dependent variable is explained by a unit change in its independent variables [7], [14], [17]. Its prediction accuracy is checked by calculating its mean variance, which should be better than the mean variance obtained from the COCOMO model. The model is built with a structured approach, described in the methodology section: descriptive statistics are examined using frequency graphs, and a stepwise regression approach is used to develop the model. This paper explores the problems associated with using an unbalanced dataset, i.e. a dataset that does not cover all possible levels of the candidate independent variables [1]. Experience shows that the minimum number of observations, n, required for a regression with k independent variables has to be at least k+1 [24]. The COCOMO 1997 dataset is upgraded with the Rosetta Stone method [8] to generate a COCOMO II-like model. The upgraded dataset is neither perfectly balanced nor large enough to produce a model with a higher number of independent variables, because some combinations of factor levels are missing and some variables take the same value throughout, as in the case of PCON (personnel continuity) and SITE (multisite development).

## 2. BACKGROUND

## 2.1. COCOMO Model

COCOMO II consists of three submodels, each offering increased fidelity. Listed in order of increasing fidelity, these submodels are the Applications Composition, Early Design, and Post-Architecture (PAM) models. Our earlier research found that the COCOMO II post-architecture model is more accurate than the other two models: its estimates fall within the 25% range more than 75% of the time for middle-scale MIS-based applications [7]. The COCOMO II model estimates the effort to develop a software system based on its projected size S, effort multipliers EMi and exponent scale factors Wi.

$$PM = A \cdot S^{\,B + 0.01 \sum_{i=1}^{5} W_i} \cdot \prod_{i=1}^{17} EM_i \tag{1}$$

where PM is the effort expressed in person-months, A and B are calibration constants, S is the projected size of the software project (expressed in thousands of lines of code, KLOC), EMi (i = 1, 2, …, 17) are effort multipliers, and Wi (i = 1, 2, …, 5) are exponent scale factors. There are 17 effort multipliers capturing the characteristics of the software development that affect the effort to complete the project. These weighted multipliers are grouped into four categories (product, platform, personnel and project) and their product is used to adjust the effort [13].
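The effort equation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual COCOMO II post-architecture form PM = A · S^(B + 0.01 · ΣWi) · ΠEMi; the function name, the example project, and the constants A = 2.94 and B = 0.91 (taken here as illustrative calibration values) are assumptions, not values from this study.

```python
from math import prod


def cocomo2_effort(size_ksloc, scale_factors, effort_multipliers, a, b):
    """Effort in person-months: PM = A * S^(B + 0.01 * sum(W_i)) * prod(EM_i)."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * size_ksloc ** exponent * prod(effort_multipliers)


# Hypothetical 50 KSLOC project with two scale-factor ratings and
# all 17 effort multipliers at their nominal value of 1.0.
pm = cocomo2_effort(50.0, [3.29, 4.68], [1.0] * 17, a=2.94, b=0.91)
print(round(pm, 1))
```

Because the scale factors sit in the exponent, a change in their sum matters more for large projects than for small ones, whereas an effort multiplier scales the estimate by the same ratio at any size.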

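Table 1. Effort multipliers

| Category | S. No | Cost Driver | Description |
|---|---|---|---|
| Product Attributes | 1 | RELY | Required Software Reliability |
| Product Attributes | 2 | DATA | Database Size |
| Product Attributes | 3 | CPLX | Product Complexity |
| Computer Attributes | 1 | TIME | Execution Time Constraint |
| Computer Attributes | 2 | STOR | Main Storage Constraint |
| Computer Attributes | 3 | VIRT | Virtual Machine Volatility |
| Computer Attributes | 4 | TURN | Computer Turnaround Time |
| Personnel Attributes | 1 | ACAP | Analyst Capability |
| Personnel Attributes | 2 | AEXP | Application Experience |
| Personnel Attributes | 3 | PCAP | Programmer Capability |
| Personnel Attributes | 4 | VEXP | Virtual Machine Experience |
| Personnel Attributes | 5 | LEXP | Programming Language Experience |
| Project Attributes | 1 | MODP | Use of Modern Programming Practices |
| Project Attributes | 2 | TOOL | Use of Software Tools |
| Project Attributes | 3 | SCED | Required Development Schedule |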

Table 2. Scaling factors

| Scale Factor (SFj) | Description |
|---|---|
| PREC | Precedentedness, familiarity with application |
| FLEX | Flexibility for development |
| RESL | Architecture/Risk Resolution |
| TEAM | Turbulence in team |
| PMAT | Process Maturity, CMMI level |

The nominal weight assigned to each multiplier is 1.0. If a rating level has a detrimental effect on effort, its corresponding multiplier is greater than 1.0. Conversely, if the rating level reduces the effort then the corresponding multiplier is less than 1.0. There are five exponent scale factors, which account for the relative economies or diseconomies of scale encountered as a software project increases its size based on different nominal values and rating schemes.

In COCOMO II the major new modeling capabilities are a tailorable family of software size models, involving object points, function points and source lines of code; nonlinear models for software reuse and reengineering; an exponent-driver approach for modeling relative software diseconomies of scale; and several additions, deletions, and updates to the previous COCOMO effort-multiplier cost drivers [13].

## 2.2. Rosetta Stone Method for Data Upgrade

Reference [8] describes the Rosetta Stone, a method that is claimed to be very important in helping COCOMO users convert COCOMO 81 files so that they run with the new COCOMO II software.

Table 3. Rosetta Stone conversion factors

| COCOMO 81 Driver | COCOMO II Driver | Conversion Factor |
|---|---|---|
| RELY | RELY | None, rate the same or the actual |
| DATA | DATA | None, rate the same or the actual |
| CPLX | CPLX | None, rate the same or the actual |
| TIME | TIME | None, rate the same or the actual |
| STOR | STOR | None, rate the same or the actual |
| VIRT | PVOL | |
| TURN | | Use values in Table 4 |
| ACAP | ACAP | None, rate the same or the actual |
| PCAP | PCAP | None, rate the same or the actual |
| VEXP | PEXP | |
| AEXP | AEXP | None, rate the same or the actual |
| LEXP | LEXP | None, rate the same or the actual |
| TOOL | TOOL | Use values in Table 5 |
| MODP | Adjust PMAT settings | If MODP is rated VL or L, set PMAT to VL; N, set PMAT to L; H or VH, set PMAT to N |
| SCED | SCED | None, rate the same or the actual |
| | REUSE | |
| | DOCU | If Mode = Organic, set to L; = Semi-detached, set to N; = Embedded, set to H |
| | PCON (Personnel continuity) | Set to N, or actual if available |
| | SITE (Multisite development) | Set to H, or actual if available |

Table 4. Rosetta Stone conversion factors

| COCOMO II Multiplier / COCOMO 81 Rating | VL | L | N | H | VH |
|---|---|---|---|---|---|
| TURN | | 1.00 | 1.15 | 1.23 | 1.32 |
| TOOL | 1.24 | 1.10 | 1.00 | | |

Table 5. Rosetta Stone mode scale factor conversion rating

| Mode / Scale Factor | Organic | Semi-detached | Embedded |
|---|---|---|---|
| Precedentedness (PREC) | XH | H | L |
| Development Flexibility (FLEX) | XH | H | L |
| Architecture/Risk Resolution (RESL) | XH | H | L |
| Team Cohesion (TEAM) | XH | VH | N |
| Process Maturity (PMAT) | MODP | MODP | MODP |

Table 6. Prediction accuracy of COCOMO II model

| Case | PRED | Accuracy (Relative Errors) |
|---|---|---|
| Before stratification by organization | | |
| Using 1997 dataset of 83 projects | PRED (.25) | 49% |
| Using Rosetta Stone with no adjustment | PRED (.25) | 60% |

The COCOMO II cost estimation model was developed by the COCOMO research team at the Center for Software Engineering at the University of Southern California (USC). The Rosetta Stone allows users to update estimates made with the earlier version of the model so that they can take full advantage of the many new features incorporated into the COCOMO II package. Following the stated instructions, the COCOMO 81 dataset, which comprises 83 projects, was converted into a form from which a new and more accurate COCOMO II-type model could be generated.
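The rating-level rules of Table 3 can be sketched as simple lookups. This is an illustrative sketch only: the function and dictionary names are hypothetical, carrying the VIRT and VEXP ratings over unchanged to PVOL and PEXP is an assumption (the table gives no conversion factor for them), and TURN and TOOL are left out because they need the numeric values of Table 4.

```python
# Hypothetical lookup tables transcribed from the Table 3 rules.
MODP_TO_PMAT = {"VL": "VL", "L": "VL", "N": "L", "H": "N", "VH": "N"}
MODE_TO_DOCU = {"organic": "L", "semidetached": "N", "embedded": "H"}
RENAMED = {"VIRT": "PVOL", "VEXP": "PEXP"}  # assumption: rating carried over
UNCHANGED = {"RELY", "DATA", "CPLX", "TIME", "STOR",
             "ACAP", "PCAP", "AEXP", "LEXP", "SCED"}


def convert_project(ratings81, mode):
    """Map COCOMO 81 rating levels to COCOMO II rating levels."""
    converted = {}
    for driver, rating in ratings81.items():
        if driver in UNCHANGED:
            converted[driver] = rating
        elif driver in RENAMED:
            converted[RENAMED[driver]] = rating
    if "MODP" in ratings81:
        converted["PMAT"] = MODP_TO_PMAT[ratings81["MODP"]]
    converted["DOCU"] = MODE_TO_DOCU[mode]
    converted["PCON"] = "N"  # same rating for every project
    converted["SITE"] = "H"  # same rating for every project
    return converted


print(convert_project({"RELY": "H", "VIRT": "L", "MODP": "N"}, "embedded"))
```

The sketch makes one property of the conversion visible: unless actual values are available, PCON and SITE receive the same rating for every project, so they carry no variation that a regression could use.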

## 3. METHODOLOGY FOLLOWED

The research methodology comprises the following steps:

1. Literature review.
2. Email-based interviews with practitioners and experts, conducted in order to:
   - confirm the usefulness of the selected topic;
   - confirm the factors identified from the literature.
3. Selection of a questionnaire for conducting the survey.
4. Identification of the population.

The research concentrates on the candidate prediction model and its prediction power, which could in future be used to predict the impact of software process improvement on productivity parameters. The special case in focus here is the reusability of the available COCOMO dataset and its validity.

It then focuses on building the statistical model:

1. Model construction: construction of the mathematical model with the significant factors (95% confidence level).
2. Hypothesis testing.
3. Model evaluation.

In future, a mechanism will be devised to help generate data on the impact of process improvement on performance.

## 3.1. Hypothesis Statement

The hypothesis of this research is as follows.

The null hypothesis is that the mean accuracy claimed for the Rosetta Stone also holds for the newly generated model. The data may provide evidence that supports rejecting the null hypothesis. The test is performed with the significance level set to 0.05, which means there is only a 5% chance of incorrectly rejecting the null hypothesis [24].

Population 1: Estimated effort from the new model built on the dataset reused via the Rosetta Stone.

Population 2: Estimated effort from the COCOMO II model, using the reused dataset after applying the Rosetta Stone method.

H0: The means of both populations are the same and equal to the accuracy claimed for the Rosetta Stone (see Table 6).

H1: The location of population 1 (the new model's mean) is to the left of the location of population 2 (the claimed mean).

## 3.2. Statistical model building process

The following model-building steps were followed [14], [15], [16], [17], [18], [19], [20]:

1. Specification of the model's functional form, including identification of the predictor variables.
2. Exploratory Data Analysis (EDA) for data auditing, performed on the observational data in order to:
   - reveal possible errors in the data, e.g. outliers;
   - reveal features of the dataset, e.g. symmetry, skew, scatter;
   - test for a normal distribution.
3. The following checks, performed to remove collinearity problems:
   - removal of outliers;
   - the analysis should not depend heavily on a few cases; correction in sign.

## Figure 1. Frequency Graph of Non Normal Actual Effort Data

Figure 2. Frequency graph of converted lognormal form of actual effort data

The model used here is a non-linear multiple regression model. Checking the normality of the dependent and independent variables through frequency graphs showed that the data are non-normal. Logarithms were therefore taken to convert the non-normal data into normal data, which is a basic requirement for constructing a model based on multiple regression. A non-linear multiplicative model is well suited to depicting the relationship between a single dependent variable and multiple independent variables, and it is capable of depicting economies and diseconomies of scale.

$$\hat{y} = B_0 \cdot X_1^{B_1} \cdot X_2^{B_2} \cdots X_k^{B_k} \tag{2}$$

The model can be transformed into a linear model by taking the logarithm of both sides, as is done in the field of econometrics [21], [7].

$$\ln \hat{y} = \ln B_0 + B_1 \ln X_1 + B_2 \ln X_2 + \cdots + B_k \ln X_k \tag{3}$$

Here $\hat{y}$ is the estimated response variable, the Xi are the predictor variables, the Bi are coefficients, and k is the number of predictor variables, as used in the COCOMO II model. Applying the logarithm to COCOMO II gives the linearized equation:

$$\ln(PM) = \ln(\text{effort}) = b_0 + b_1 \ln(\text{size}) + b_2\, SF_1 \ln(\text{size}) + \cdots + b_6\, SF_5 \ln(\text{size}) + b_7 \ln(EM_1) + \cdots + b_{23} \ln(EM_{17}) + e \tag{4}$$
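To illustrate how the logarithmic transformation turns a multiplicative effort model into an ordinary linear regression, the sketch below fits ln(effort) = b0 + b1 · ln(size) by least squares. It is a simplified single-predictor illustration, not the stepwise procedure used in this study; the function name and the synthetic data (generated from effort = 3 · size^1.1) are assumptions made for the example.

```python
import math


def fit_log_log(sizes, efforts):
    """Ordinary least squares on ln(effort) = b0 + b1 * ln(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Synthetic data, not taken from the COCOMO dataset.
sizes = [5.0, 10.0, 20.0, 40.0, 80.0]
efforts = [3.0 * s ** 1.1 for s in sizes]
b0, b1 = fit_log_log(sizes, efforts)

# exp(b0) recovers the multiplicative constant, b1 the size exponent.
print(math.exp(b0), b1)
```

An exponent b1 above 1 corresponds to a diseconomy of scale, and the intercept has to be exponentiated to obtain the constant of the multiplicative model.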

References [21], [22], [23], [24], [25] and [26] were used for deriving the research model. The regression model is checked against the following five assumptions to confirm the validity of its results:

1. The independent variables and the dependent variable have a linear relationship after the log-log transformation expressed by Equation (4).
2. The dependent variable is a continuous random variable, and the independent variables are set at various values and are not random.
3. Homogeneity of variance (no data heterogeneity, i.e. no non-constant variance or subgroup behavior): the variance of the dependent variable is the same for all combinations of the independent variables. The MRE values of the new model fluctuate, which is why it is unable to give estimates within the 25% accuracy range.
4. Successive observed values of the dependent variable are uncorrelated. The VIF values show that successive observed and predicted values are not correlated.
5. The distribution of the sampling error, ei, in the regression model is normal (see Figure 3).

Interdependence of the independent variables is called multicollinearity [7]. If the independent variables are not linearly independent of each other, determining the contribution of each one is difficult because their effects are mixed, and the regression coefficients may be incorrectly estimated.

Stepwise linear regression checks this property automatically and removes any independent variable that causes multicollinearity [24], [25].

The stepwise criteria used for entering or removing independent variables are: probability of F to enter <= 0.05 and probability of F to remove >= 0.100.

t and F tests are performed to determine whether a particular slope (the coefficient of an independent variable) makes a significant contribution. The F value of the model is high, about 92 (Table 9).

The p value of each variable is tracked to check its relationship with the dependent variable. A variable is retained in the model if its p value is below 0.05, i.e. at the 95% confidence level.

The Variance Inflation Factor (VIF) shows whether coefficients are poorly estimated because of collinearity. It is given by VIFj = 1/(1 - Rj²), where Rj is the multiple correlation coefficient of the jth predictor with the other predictors. The variance of each regression coefficient is directly proportional to its variance inflation factor, i.e. inversely proportional to 1 - Rj². If the predictor variables are not correlated with each other, the VIFs of the coefficients are 1. For Rj = 0.95, the variance of the jth coefficient is inflated by a factor of about ten. Experience indicates that a VIF larger than ten points to coefficients that are poorly estimated because of collinearity [10]. Table 11 shows VIF values below 10, which means that the predictors are not strongly collinear.

The models are evaluated using PRED(25) on MRE [12].

Both the test set = train set approach and cross-validation are used to test the model's forecast accuracy. The COCOMO 81 1997 dataset is used as the training (calibration) set, whereas the NASA 93 dataset is used as the test (validation) set.
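The variance inflation factor described above is easy to check numerically. The sketch below assumes the usual definition VIF = 1/(1 - R²), with R the multiple correlation of one predictor with the others; the function names are illustrative only. It reproduces the "factor of about ten" for R = 0.95 and shows that the reciprocal of a VIF of 1.714 (the lnKLOC entry of Table 11) is the .583 listed next to it.

```python
def vif(r_j):
    """Variance inflation factor from the multiple correlation R_j."""
    return 1.0 / (1.0 - r_j ** 2)


def tolerance(vif_value):
    """Tolerance is the reciprocal of the VIF, i.e. 1 - R_j squared."""
    return 1.0 / vif_value


print(round(vif(0.95), 2))         # 10.26
print(round(tolerance(1.714), 3))  # 0.583
```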

## 4. MODEL RESULT

This model is the result of applying stepwise multiple regression analysis. The scope of parameter coverage over the four effort-influencing areas (product, process, team, and environment) is not broad enough to claim that this model accounts for all factors that influence effort. Tables 9-11 show that the new model is significant, but PRED(25) shows that the model's accuracy is damaged by the facts discussed in sections IV and V. Using the coefficients from Table 10, equations (5) and (6) are constructed to show the resulting estimation:

$$PM = \text{Effort} = 2.13 \cdot KLOC^{1.09} \cdot KLOC^{0.09\,TEAM} \cdot KLOC^{0.067\,PMAT} \cdot PCAP^{1.66} \cdot CPLX^{1.326} \cdot AEXP^{1.09} \cdot RELY^{2.09} \cdot PEXP^{4.49} \cdot ACAP^{1.78} \tag{5}$$

$$PM = \text{Effort} = 2.13 \cdot KSLOC^{\,1.09 + 0.01 \sum_{i=1}^{2} SF_i} \cdot \prod_{j=1}^{6} EM_j \tag{6}$$
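Equation (5) can be evaluated directly. The sketch below is one reading of that equation, in which the TEAM and PMAT terms act on the exponent of KLOC and the six effort multipliers are raised to their fitted coefficients. The function name and argument order are hypothetical, and the inputs are assumed to be the numeric rating values used in the regression.

```python
def research_model_effort(kloc, team, pmat, pcap, cplx, aexp, rely, pexp, acap):
    """Effort in person-months according to equation (5)."""
    # Scale factors TEAM and PMAT adjust the exponent of size.
    exponent = 1.09 + 0.09 * team + 0.067 * pmat
    # Each effort multiplier is raised to its regression coefficient.
    multipliers = (pcap ** 1.66 * cplx ** 1.326 * aexp ** 1.09
                   * rely ** 2.09 * pexp ** 4.49 * acap ** 1.78)
    return 2.13 * kloc ** exponent * multipliers


# All multipliers nominal (1.0) and both scale factors at 0.0,
# so the estimate reduces to 2.13 * KLOC^1.09.
print(research_model_effort(10.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

With every multiplier at its nominal value the estimate depends on size alone, which makes it easy to see how strongly a single non-nominal rating such as PEXP, with its exponent of 4.49, moves the result.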

Table 7. Model Initiating and New Coefficients Values

| Driver | VL | L | N | H | VH | XH |
|---|---|---|---|---|---|---|
| SF1: TEAM | 5.48 | 4.38 | 3.29 | 2.19 | 1.10 | 0.00 |
| B: 0.09 | 0.49 | 0.39 | 0.29 | 0.19 | 0.10 | |
| SF2: PMAT | 7.80 | 6.24 | 4.68 | 3.12 | 1.56 | 0.00 |
| B: 0.067 | .523 | .418 | .3135 | .209 | .104 | |
| EM1: PCAP | 1.42 | 1.17 | 1.00 | 0.86 | 0.70 | |
| B: 1.66 | 2.36 | 1.94 | 1.66 | 1.43 | 1.16 | |
| EM2: CPLX | 0.70 | .85 | 1.00 | 1.15 | 1.30 | 1.65 |
| B: 1.326 | 0.93 | 0.79 | 1.33 | 0.91 | 1.18 | 1.95 |
| EM3: AEXP | 1.29 | 1.13 | 1.00 | 0.91 | 0.82 | |
| B: 1.09 | 1.41 | 1.23 | 1.09 | 0.99 | 0.89 | |
| EM4: RELY | 0.75 | .88 | 1.00 | 1.15 | 1.40 | |
| B: 2.09 | 1.57 | 1.84 | 2.09 | 2.40 | 2.93 | |
| EM5: PEXP | 1.19 | 1.09 | 1 | 0.91 | 0.85 | |
| B: 4.5 | 5.35 | 4.90 | 4.5 | 4.09 | 3.82 | |
| EM6: ACAP | 1.46 | 1.19 | 1.00 | .86 | .71 | |
| B: 1.78 | 2.6 | 2.12 | 1.78 | 1.53 | 1.26 | |

Table 9. ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 34.258 | 9 | 3.806 | 92.93 | 0.000(k) |
| Residual | 2.089 | 51 | .041 | | |
| Total | 36.347 | 60 | | | |
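The entries of Table 9 are linked by simple arithmetic, which the following sketch reproduces from the sums of squares and degrees of freedom. The variable names are illustrative, and the coefficient of determination computed at the end is a derived quantity that is not reported in the table.

```python
# Sums of squares and degrees of freedom as listed in Table 9.
ss_regression, df_regression = 34.258, 9
ss_residual, df_residual = 2.089, 51

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# Share of the total variation explained by the regression (derived here).
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(ms_regression, 3), round(ms_residual, 3))
print(round(f_statistic, 2), round(r_squared, 3))
```

The high F value reflects how well the model fits the calibration data; it says nothing by itself about how accurate the estimates are for individual projects.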

Table 10. Coefficients

| Variable | B (unstandardized) | Std. Error | t | Sig. |
|---|---|---|---|---|
| Constant | .758 | .154 | 4.910 | .000 |
| lnKLOC | 1.099 | .055 | 20.084 | .000 |
| TEAM | .094 | .025 | 3.721 | .000 |
| PMAT | .067 | .022 | 2.988 | .004 |
| lnpcap | 1.660 | .529 | 3.138 | .003 |
| lncplx | 1.326 | .426 | 3.111 | .003 |
| lnaexp | 1.090 | .603 | 1.808 | .076 |
| lnrely | 2.093 | .497 | 4.209 | .000 |
| lnpexp | 4.490 | .879 | 5.105 | .000 |
| lnacap | 1.778 | .627 | 2.835 | .007 |

Table 11. Coefficients (continued)

| Variable | Correlations | Collinearity Statistics (Tolerance) | Collinearity Statistics (VIF) |
|---|---|---|---|
| (Constant) | | | |
| lnKLOC | .674 | .583 | 1.714 |
| TEAM | .125 | .476 | 2.101 |
| PMAT | .100 | .720 | 1.389 |
| lnpcap | .105 | .456 | 2.192 |
| lncplx | .104 | .526 | 1.902 |
| lnaexp | .061 | .687 | 1.456 |
| lnrely | .141 | .439 | 2.279 |
| lnpexp | .171 | .672 | 1.489 |
| lnacap | .095 | .371 | 2.694 |

Table 12. Model Prediction Accuracy

| Project i | Actual mma_i | Research Model Est. mme_i | Research Model MER_i | COCOMO II Est. mme_i | COCOMO II MER_i |
|---|---|---|---|---|---|
| P4 | 6400 | 3090 | 0.517 | 212.7 | 0.966 |
| P1 | 2040 | 1556 | 0.237 | 94.71 | 0.95 |
| P2 | 1600 | 1654.6 | 0.034 | 21.64 | 0.986 |
| P3 | 243 | 607 | 1.497 | .123 | 0.999 |
| P5 | 12 | 68.3 | 4.69 | 1.420 | 0.888 |
| P6 | 8 | 45 | 4.625 | 0.553 | 0.93125 |

Here $MER_i = \lvert RE_i \rvert = \lvert (mma_i - mme_i) / mma_i \rvert$, which is the form that reproduces the tabulated values.

| Measure | Research Model | COCOMO II |
|---|---|---|
| MMER (mean absolute MER) | 1.933 | 0.952 |
| MMER% | 193% > 25% | 95.2% > 25% |
| MdMRE% | 76.55% > 25% | 99.25% > 25% |
| PRED(25%) | (2/6) × 100 = 33.3% | 0% |

The measures are defined as follows:

- $MMER = \left[\sum_i \lvert MER_i \rvert\right] / n$, and MMER% = MMER × 100.
- For odd N, $MdMRE\% = MER_{(N+1)/2} \times 100$. For even N, $MdMRE\% = \left[(MER_{N/2} + MER_{N/2+1}) / 2\right] \times 100$. With N = 6, the 3rd and 4th readings of the research model give ((0.034 + 1.497) / 2) × 100 = 76.55%.
- PRED(25%) = (K/N) × 100, where K is the number of observations whose MER is less than or equal to the limit and N is the number of observations.
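The mean error and PRED(25%) figures for the research model can be recomputed from the actual and estimated efforts listed in Table 12. The sketch below is illustrative, with hypothetical function names; it divides the absolute error by the actual effort, because that is the form that reproduces the tabulated error values.

```python
def relative_errors(actuals, estimates):
    """Absolute error of each estimate relative to the actual effort."""
    return [abs(a - e) / a for a, e in zip(actuals, estimates)]


def pred(errors, limit=0.25):
    """Percentage of observations whose error is within the limit."""
    within = sum(1 for err in errors if err <= limit)
    return 100.0 * within / len(errors)


# Actual and research-model efforts for P4, P1, P2, P3, P5, P6 (Table 12).
actual = [6400, 2040, 1600, 243, 12, 8]
research_est = [3090, 1556, 1654.6, 607, 68.3, 45]

errors = relative_errors(actual, research_est)
mean_error = sum(errors) / len(errors)
print(round(mean_error, 2), round(pred(errors), 1))
```

Only P1 and P2 fall within the 25% limit, and the two smallest projects, P5 and P6, dominate the mean error.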

Table 13. Proximity similarity matrix, correlations of observed dependent variable (correlation between vectors of values)

| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1.000 | .000 | .000 | .000 | .000 | .000 |
| 2 | .000 | 1.000 | .000 | .000 | .000 | .000 |
| 3 | .000 | .000 | 1.000 | .000 | .000 | .000 |
| 4 | .000 | .000 | .000 | 1.000 | .000 | .000 |
| 5 | .000 | .000 | .000 | .000 | 1.000 | .000 |
| 6 | .000 | .000 | .000 | .000 | .000 | 1.000 |

## Figure 3. Normal graph for sample error

## 4. MODEL FORECAST ACCURACY

Table 12 shows the prediction accuracy of the research model, PRED(25%) = 33%, and of the COCOMO II model, PRED(25%) = 0%. The Rosetta Stone was developed to provide its users with both a process and a tool for converting their original COCOMO 81 files so that they can be used with the new COCOMO II estimating model.

H0 is rejected: the means of the two populations are not the same and do not equal the accuracy claimed for the Rosetta Stone in Table 6. According to Table 6, the claimed accuracy of using the old dataset after applying the Rosetta Stone to generate a COCOMO II model is 60%, whereas the accuracy actually obtained is about half of that, 33.3%. Likewise, when the old dataset is converted with the Rosetta Stone and used to validate the COCOMO II model, the accuracy of the COCOMO II model is about 0%, whereas its claimed accuracy is 75% of the time.

H1 is not supported: the location of population 1 (the new model's mean) is not to the left of the location of population 2 (the claimed mean). The new model's MMER% is 193% and the COCOMO II model's MMER% is 95.2%; therefore the new model shows worse results.

Cross-validation might bring better prediction accuracy, but even when the training set is used as the test set the model shows unreliable accuracy.

## 5. SUMMARY AND CONCLUSION

This study leads to the following observations:

1. A good regression model is not always an accurate model, specifically in the case of dataset reuse.
2. The Rosetta Stone, which was developed to provide its users with a tool for converting their original COCOMO 81 files so that they can be used with the new COCOMO II estimating model, is not a reliable method for making old datasets useful for industrial or research purposes, for the following reasons:
   - It assigns a single rating to some newly added independent variables, which causes those variables to be removed from the regression model. Figure 6 shows a sample graph for the lnPCON variable.
   - The collected dataset must cover all possible levels of the candidate independent variables.

## Figure 6. Partial Regression Plot between lneffort and PMAT variable

REUSE, PCON and SITE show the same reading for every project in the dataset, which caused these independent variables to be removed from the model. The model's prediction accuracy could be improved if the dataset became publicly available for further research. Alternative datasets could substantially change the estimates of the coefficients and consequently make them insignificant.

## 6. RESEARCH CONTRIBUTION

This research resulted in the following contributions to knowledge:

1. It discusses empirically based model-building steps that satisfy regression model assumptions such as normality of data, heterogeneity of residuals and interdependency between estimates.
2. It discusses in detail the critical issues that arise when the COCOMO 81 dataset is transformed into a COCOMO II dataset by applying the Rosetta Stone steps.
3. It discusses how an unbalanced dataset affects the construction of a regression model.

The research explores and validates the claimed accuracy that should be achieved when reusing the COCOMO 81 dataset with the Rosetta Stone tool in order to generate a new COCOMO II (post-architecture) like research model using stepwise regression. Its contribution is the detailed exploration of, and recommendations related to, the unbalanced data generated by applying the Rosetta Stone method to upgrade publicly available data. It thus concludes that a good regression model is not always an accurate model when datasets are reused.

## 7. ACKNOWLEDGEMENT

I would like to thank Mr. B. K. Clark for sharing his research work, and Dr. Mahmood Niazi, faculty member and part of the empirical research group at Keele University, UK, whose input helped give the right direction to this research.