Adaptive Neuro-Fuzzy Inference System For Software Development Effort Estimation

Abstract - Software estimation, such as cost estimation, effort estimation, quality estimation and risk analysis, is a major challenge for software projects. The literature presents several algorithmic cost estimation models such as Boehm's COCOMO, Albrecht's Function Point Analysis, Putnam's SLIM, ESTIMACS, etc., but each model has its own pros and cons, and there is still a need for a model that gives accurate estimates. In this paper, soft computing models using the Adaptive Neuro-Fuzzy Inference System (ANFIS) are used for software development effort prediction. The ANFIS models are designed to improve the performance of the network so that it suits the COCOMO model. ANFIS models are created using Triangular, GBell, Trapezoidal and Gaussian membership functions. A case study based on 93 NASA projects compares the proposed models with the Intermediate COCOMO. The results were analyzed using five different criteria: MMRE, MARE, VARE, Mean BRE and Prediction. It is observed that the proposed ANFIS models, combining the adaptive capabilities of neural networks with a fuzzy inference system, indicate a high level of efficiency with an accuracy level of more than 97%.

Index Terms- Cognitive Simulation, Cost Estimation, Knowledge Acquisition, Neural Nets


I Introduction

In algorithmic cost estimation [1], costs and efforts are predicted using mathematical formulae derived from historical data [2]. The best-known algorithmic cost model, COCOMO (COnstructive COst MOdel), was proposed by Barry Boehm in 1981 [1]. It was developed from the analysis of 63 software projects. Boehm proposed three levels of the model: Basic COCOMO, Intermediate COCOMO and Detailed COCOMO [1, 5]. In the present paper we focus mainly on the Intermediate COCOMO.

A. Intermediate COCOMO

The Basic COCOMO model [1,5] is based on the relationship: Development Effort, DE = a * (SIZE)^b, where SIZE is measured in KLOC (thousands of lines of code). The constants a and b depend on the 'mode' of development of the project. DE is measured in man-months. Boehm proposed three modes of projects [1,5]:

1. Organic mode - simple projects that engage small teams working in known and stable environments.

2. Semi-detached mode - projects that engage teams with a mixture of experience. It is in between organic and embedded modes.

3. Embedded mode - complex projects that are developed under tight constraints with changing requirements.

The accuracy of Basic COCOMO is limited because it does not consider factors such as hardware constraints, personnel quality and experience, use of modern tools, and other attributes that affect project cost. Boehm therefore proposed the Intermediate COCOMO [1,4], which adds accuracy to the Basic COCOMO by introducing 'Cost Drivers' into the equation through a new multiplicative term, the EAF (Effort Adjustment Factor), as shown in Table 1.

TABLE I
DE FOR THE INTERMEDIATE COCOMO

Development Mode | Intermediate Effort Equation
Organic          | DE = EAF * 3.2 * (SIZE)^1.05
Semi-detached    | DE = EAF * 3.0 * (SIZE)^1.12
Embedded         | DE = EAF * 2.8 * (SIZE)^1.2

The EAF term is the product of the 15 cost drivers [3,5] listed in Table 2. The rating levels of the cost drivers are Very Low, Low, Nominal, High, Very High and Extra High. For example, for a project where RELY is Very Low, DATA is High, CPLX is Extra High, TIME is Very High, STOR is High and the remaining drivers are Nominal, EAF = 0.75 * 1.08 * 1.65 * 1.30 * 1.06 * 1.00 ≈ 1.84. If the rating of all 15 cost drivers is Nominal, then EAF is equal to 1.
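As an illustration of this calculation, the following short Python sketch computes the Intermediate COCOMO effort using the mode coefficients of Table 1 and cost-driver multipliers taken from Table 2. The function name, dictionary layout and the 32 KLOC project size are assumptions made only for this example.

    # Minimal sketch of the Intermediate COCOMO calculation:
    # DE = EAF * a * SIZE^b, with EAF the product of the cost-driver multipliers.
    MODE_COEFFS = {              # (a, b) per development mode, from Table 1
        "organic":       (3.2, 1.05),
        "semi-detached": (3.0, 1.12),
        "embedded":      (2.8, 1.20),
    }

    def intermediate_cocomo(size_kloc, mode, multipliers):
        """Development effort in man-months; `multipliers` maps cost driver -> value."""
        a, b = MODE_COEFFS[mode]
        eaf = 1.0
        for value in multipliers.values():   # EAF is the product of the 15 multipliers
            eaf *= value                      # drivers left out are Nominal (1.00)
        return eaf * a * size_kloc ** b

    # The worked example above: RELY very low, DATA high, CPLX extra high,
    # TIME very high, STOR high, all other drivers nominal.
    eaf_drivers = {"RELY": 0.75, "DATA": 1.08, "CPLX": 1.65, "TIME": 1.30, "STOR": 1.06}
    print(intermediate_cocomo(32, "embedded", eaf_drivers))   # assumed 32 KLOC project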

TABLE II
INTERMEDIATE COCOMO COST DRIVERS WITH MULTIPLIERS

S.No | Cost Driver | Very Low | Low  | Nominal | High | Very High | Extra High
  1  | RELY        | 0.75     | 0.88 | 1.00    | 1.15 | 1.40      | -
  2  | DATA        | -        | 0.94 | 1.00    | 1.08 | 1.16      | -
  3  | CPLX        | 0.70     | 0.85 | 1.00    | 1.15 | 1.30      | 1.65
  4  | TIME        | -        | -    | 1.00    | 1.11 | 1.30      | 1.66
  5  | STOR        | -        | -    | 1.00    | 1.06 | 1.21      | 1.56
  6  | VIRT        | -        | 0.87 | 1.00    | 1.15 | 1.30      | -
  7  | TURN        | -        | 0.87 | 1.00    | 1.07 | 1.15      | -
  8  | ACAP        | -        | 0.87 | 1.00    | 1.07 | 1.15      | -
  9  | AEXP        | 1.29     | 1.13 | 1.00    | 0.91 | 0.82      | -
 10  | PCAP        | 1.42     | 1.17 | 1.00    | 0.86 | 0.70      | -
 11  | VEXP        | 1.21     | 1.10 | 1.00    | 0.90 | -         | -
 12  | LEXP        | 1.14     | 1.07 | 1.00    | 0.95 | -         | -
 13  | MODP        | 1.24     | 1.10 | 1.00    | 0.91 | 0.82      | -
 14  | TOOL        | 1.24     | 1.10 | 1.00    | 0.91 | 0.83      | -
 15  | SCED        | 1.23     | 1.08 | 1.00    | 1.04 | 1.10      | -

The 15 cost drivers are broadly classified into four categories [1,5]:

1. Product:   RELY - required software reliability
              DATA - database size
              CPLX - product complexity

2. Platform:  TIME - execution time constraint
              STOR - main storage constraint
              VIRT - virtual machine volatility
              TURN - computer turnaround time

3. Personnel: ACAP - analyst capability
              AEXP - applications experience
              PCAP - programmer capability
              VEXP - virtual machine experience
              LEXP - language experience

4. Project:   MODP - modern programming practices
              TOOL - use of software tools
              SCED - required development schedule

Depending on the project, the multipliers of the cost drivers vary, and thereby the EAF may be greater than or less than 1, thus affecting the effort [5].

The effort multipliers are grouped as follows:

Increase these to decrease effort:
    ACAP - analyst capability
    PCAP - programmer capability
    AEXP - applications experience
    MODP - modern programming practices
    TOOL - use of software tools
    VEXP - virtual machine experience
    LEXP - language experience

SCED - schedule constraint

Decrease these to decrease effort:
    STOR - main memory constraint
    DATA - database size
    TIME - time constraint for CPU
    TURN - turnaround time
    VIRT - machine volatility
    CPLX - process complexity
    RELY - required software reliability

II MACHINE LEARNING TECHNIQUES

In this section we present the machine learning techniques used, namely Fuzzy Logic and Neural Networks [8,9], followed by the proposed ANFIS models.

A. Fuzzy Logic

A fuzzy model is used when the system is not suitable for analysis by conventional approaches or when the available data is uncertain, inaccurate or vague [7]. Fuzzy logic maps an input space to an output space using a list of if-then statements called rules. All rules are evaluated in parallel, and the order of the rules is unimportant. Before writing the rules, the inputs and outputs of the system have to be identified. The Intermediate COCOMO model data is used for developing the Fuzzy Inference System (FIS) [10,14]. The inputs to this system are MODE and SIZE; the output is the fuzzy development effort.

Advantages:

1) Using fuzzy ranges makes it possible to predict the effort for projects that do not fall under a precise mode, i.e. that lie between two modes. This situation cannot be handled using COCOMO.

Disadvantages:

1) It is hard to maintain a degree of meaningfulness, and the whole model has to be redefined for a new dataset.

B. Neural Networks

A neural network [12,14] is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects [4, 7, 11]:

1) Knowledge is acquired by the network from its environment through a learning process [15].

2) Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

Advantages:

Artificial neural networks can model complex non-linear relationships and approximate any measurable function, so they are very useful in problems where there is a complex relationship between inputs and outputs.

Many different training algorithms are available to choose from.

Disadvantages:

There is no clear guidance on how to design a network, for example how many hidden layers should be present.

Accuracy depends on a large training dataset, which is not always available.

They are effectively black boxes: once the inputs are given, the generated outputs have to be accepted.

C. Neuro-Fuzzy Model

ANFIS stands for Adaptive Neuro-Fuzzy Inference System. Using a given input/output data set, the toolbox function anfis constructs a fuzzy inference system (FIS) whose membership function parameters are tuned (adjusted) using either a backpropagation algorithm alone or in combination with a least-squares type of method. This adjustment allows the fuzzy system to learn from the data it is modeling. The input/output map can be interpreted as a network-type structure similar to that of a neural network, which maps inputs through input membership functions and associated parameters, and then through output membership functions and associated parameters, to outputs.


The parameters associated with the membership functions change through the learning process. The computation (or adjustment) of these parameters is facilitated by a gradient vector, which provides a measure of how well the fuzzy inference system is modeling the input/output data for a given set of parameters. Once the gradient vector is obtained, any of several optimization routines can be applied to adjust the parameters so as to reduce some error measure. This error measure is usually defined as the sum of the squared differences between the actual and desired outputs.

The hybridization of neural networks and fuzzy logic is the basic idea behind the neuro-fuzzy system. Neuro-fuzzy hybridization is done in two ways: fuzzy neural networks (FNN) and neuro-fuzzy systems (NFS). An FNN is a neural network equipped with the capability of handling fuzzy information. An NFS is a fuzzy system augmented by neural networks to enhance characteristics such as flexibility and adaptability. This paper is based on the second approach.

The Takagi-Sugeno neuro-fuzzy system was used, which combines backpropagation to learn the membership functions with least-mean-square estimation to determine the coefficients of the linear combinations in the rules' conclusions. The Takagi-Sugeno neuro-fuzzy system schema is depicted in Fig. 1:

Figure 1. Takagi-Sugeno Neuro Fuzzy system

ANFIS is perhaps the first integrated hybrid neuro-fuzzy model, and owing to its implementation of Takagi-Sugeno rules it has the lowest Root Mean Square Error (RMSE) among the neuro-fuzzy models, so ANFIS was used here to implement the neuro-fuzzy model. In ANFIS, the adaptation (learning) process is concerned only with parameter-level adaptation within a fixed structure. The objective of the parameter-learning phase is to adjust the parameters of the fuzzy inference system (FIS) so that the error over the training dataset reaches a minimum or falls below a given threshold.
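To make the hybrid learning idea above concrete, the following is a minimal Python/numpy sketch of an ANFIS with a single input (e.g. SIZE), Gaussian membership functions and first-order Takagi-Sugeno rules: the rule consequents are obtained by least squares and the premise (membership-function) parameters are updated by a gradient step. The function names, the number of rules, the toy training data and the numerical-gradient update are illustrative assumptions; this is not the MATLAB anfis implementation used in the study.

    import numpy as np

    def firing_strengths(x, centers, sigmas):
        # Gaussian membership degree of each input for each rule (premise layer),
        # normalized so that the rule strengths sum to one.
        w = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * sigmas[None, :] ** 2))
        return w / w.sum(axis=1, keepdims=True)

    def predict(x, centers, sigmas, coeffs):
        # Weighted sum of the linear rule consequents f_r = p_r * x + q_r.
        wbar = firing_strengths(x, centers, sigmas)
        f = coeffs[:, 0][None, :] * x[:, None] + coeffs[:, 1][None, :]
        return (wbar * f).sum(axis=1)

    def fit_anfis(x, y, n_rules=3, epochs=200, lr=0.001):
        # Premise parameters: spread the Gaussian centres over the input range.
        # (In practice the inputs/outputs would be normalized first.)
        centers = np.linspace(x.min(), x.max(), n_rules)
        sigmas = np.full(n_rules, (x.max() - x.min()) / n_rules + 1e-6)
        coeffs = np.zeros((n_rules, 2))        # consequent parameters [p_r, q_r]
        for _ in range(epochs):
            # Forward / least-squares step: premises fixed, solve for consequents.
            wbar = firing_strengths(x, centers, sigmas)
            A = np.hstack([wbar * x[:, None], wbar])
            sol, *_ = np.linalg.lstsq(A, y, rcond=None)
            coeffs = np.column_stack([sol[:n_rules], sol[n_rules:]])
            # Backward step: numerical-gradient update of the premise parameters.
            def sse(c, s):
                return ((predict(x, c, s, coeffs) - y) ** 2).sum()
            eps = 1e-4
            for arr in (centers, sigmas):
                for i in range(n_rules):
                    arr[i] += eps
                    up = sse(centers, sigmas)
                    arr[i] -= 2 * eps
                    down = sse(centers, sigmas)
                    arr[i] += eps
                    arr[i] -= lr * (up - down) / (2 * eps)
            np.clip(sigmas, 1e-3, None, out=sigmas)
        return centers, sigmas, coeffs

    # Toy usage: fit effort against size for a handful of assumed training points.
    size = np.array([8.0, 16.0, 32.0, 64.0, 128.0])
    effort = np.array([20.0, 45.0, 110.0, 260.0, 620.0])
    centers, sigmas, coeffs = fit_anfis(size, effort)
    print(predict(np.array([50.0]), centers, sigmas, coeffs))

In the actual models, the inputs are MODE and SIZE and the membership-function parameters for each of the four MF shapes are tuned in this hybrid fashion over the NASA 93 training data.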

D. Membership Functions

A membership function (MF) [9] is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input space is also called the universe of discourse. For our problem, we have used four types of membership functions (a minimal sketch of these functions is given after the list):

Triangular membership function.

GBell membership function.

Trapezoidal membership function.

Gaussian membership function.
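For illustration, the four membership-function shapes can be written as small Python/numpy functions. The parameter conventions below follow the usual fuzzy-logic definitions, and the example values are assumptions rather than the tuned parameters produced by ANFIS.

    import numpy as np

    def trimf(x, a, b, c):
        """Triangular MF with feet at a and c and peak at b (a < b < c)."""
        return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

    def trapmf(x, a, b, c, d):
        """Trapezoidal MF with feet at a, d and shoulders at b, c (a < b <= c < d)."""
        return np.maximum(np.minimum(np.minimum((x - a) / (b - a), 1.0),
                                     (d - x) / (d - c)), 0.0)

    def gbellmf(x, a, b, c):
        """Generalized bell MF with width a, slope b and centre c."""
        return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

    def gaussmf(x, sigma, c):
        """Gaussian MF with spread sigma and centre c."""
        return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

    # Example: degree to which an (assumed) 40 KLOC project is "medium-sized".
    print(trimf(np.array([40.0]), 10, 50, 100))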

III VARIOUS CRITERIA FOR ASSESSMENT OF ESTIMATION MODELS

1. Variance Accounted For (VAF)

VAF (%) = [1 - var(Ê - E) / var(Ê)] * 100                  (1)

2. Mean Absolute Relative Error (MARE)

MARE (%) = (1/N) * Σ (|Ê - E| / Ê) * 100                   (2)

3. Variance Absolute Relative Error (VARE)

VARE (%) = var(|Ê - E| / Ê) * 100                          (3)

4. Prediction (n)

Prediction at level n, Pred(n), is defined as the percentage of projects that have an absolute relative error less than n.

5. Balance Relative Error (BRE)

BRE = |Ê - E| / min(E, Ê)                                  (4)

where N = number of projects, E = estimated effort and Ê = actual effort.

Absolute Relative Error (RE) = |Ê - E| / Ê                 (5)

A model that gives a lower MARE (2), VARE (3), Mean BRE (4) or MMRE [6] is better than one that gives a higher value, and a model that gives a higher Pred(n) is better than one that gives a lower Pred(n) [6,11]. A small computational sketch of these criteria is given below.
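The criteria can be computed for a set of projects with a short Python/numpy helper. The formulations follow the definitions given above, and the example arrays are simply the first three sample projects of Table 3 (TriMF column).

    import numpy as np

    def assessment_criteria(actual, estimated, n=30):
        re = np.abs(actual - estimated) / actual             # absolute relative error (5)
        bre = np.abs(actual - estimated) / np.minimum(actual, estimated)
        return {
            "VAF":      (1 - np.var(actual - estimated) / np.var(actual)) * 100,  # (1)
            "MARE":     re.mean() * 100,                     # (2)
            "VARE":     re.var() * 100,                      # (3)
            "MMRE":     re.mean(),                           # mean magnitude of relative error
            "Mean BRE": bre.mean(),                          # (4)
            f"Pred({n})%": (re < n / 100).mean() * 100,      # % of projects with RE < n%
        }

    # Example with three sample projects of Table 3 (actual vs. TriMF estimates).
    actual    = np.array([117.6, 36.0, 8.4])
    estimated = np.array([120.7, 39.9, 9.0])
    print(assessment_criteria(actual, estimated))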

IV Experimental Study

The NASA 93 database, which consists of data from 93 projects, was chosen for this study. Out of the 93 projects, 83 randomly selected projects are used as training data, and the model is tested using the entire dataset. The efforts estimated using the Intermediate COCOMO and the ANFIS models with Triangular, GBell, Trapezoidal and Gaussian MFs are shown for some sample projects in Table 3. The effort is given in man-months. Table 4 and Figs. 2 through 5 show the comparison of the various models [10] based on the different criteria.

TABLE III
ESTIMATED EFFORT IN MAN-MONTHS OF VARIOUS MODELS

PID | Actual | ------------ Estimated Effort using ------------
    | Effort | COCOMO | TriMF  | GBellMF | TrapMF | GaussMF
  2 |  117.6 |   95.2 |  120.7 |   114.9 |  127.4 |   111.7
  4 |   36.0 |   27.8 |   39.9 |    41.7 |   40.8 |    42.2
  6 |    8.4 |    6.4 |    9.0 |     9.7 |   10.4 |     9.1
  8 |  352.8 |  290.5 |  352.8 |   352.7 |  352.8 |   352.6
 11 |   24.0 |   10.5 |   27.0 |    27.7 |   28.9 |    27.8
 19 |   48.0 |   29.4 |   78.3 |    75.2 |   73.5 |    75.8
 24 |   90.0 |   62.2 |   78.3 |    75.2 |   73.5 |    75.8
 26 |   48.0 |   31.0 |   51.9 |    49.9 |   49.7 |    50.3
 33 |   18.0 |   17.8 |   24.3 |    24.2 |   26.3 |    24.4
 39 |   42.0 |   35.4 |   38.7 |    40.6 |   39.7 |    41.0
 40 |  114.0 |   85.5 |   87.7 |    84.3 |   83.8 |    85.6
 59 | 4560.0 |  24726 |   4559 |    4560 |   4559 |    4560
 72 |  300.0 |  464.0 |  298.6 |   298.2 |  201.5 |   296.3
 77 |   1200 |   2727 |   1200 |    1200 |   1200 |    1200
 85 |   4178 |   3555 |   4178 |    4178 |   4178 |    4178
 93 |   38.0 |   37.1 |   37.9 |    38.0 |   38.0 |    38.0

TABLE IV
COMPARISON OF VARIOUS MODELS

Model         | VAF   | MARE  | VARE   | Mean BRE | Pred(30)%
COCOMO        | 33.64 | 47.22 |  46.89 | 0.77     | 53
Fuzzy-TriMF   | 99.14 | 15.06 |   5.91 | 0.20     | 81
Fuzzy-GBellMF | 99.09 | 15.78 |   6.56 | 0.20     | 77
Fuzzy-TrapMF  | 98.70 | 29.86 | 171.49 | 0.35     | 73
Fuzzy-GaussMF | 99.13 | 15.14 |   5.99 | 0.20     | 78

Figure 2. Comparison of VAF & MARE against various models

Figure 3. Comparison of VARE against various models

Figure 4. Comparison of Mean BRE against various models

Figure 5. Comparison of Pred(30)% against various models

V CONCLUSION

Referring to Table IV, we see that the Neuro-Fuzzy model [13] using the Triangular MF yields better results for most of the criteria when compared with the other models. Thus, based on MARE, VARE, Mean BRE, MMRE [14] and Pred(30), we conclude that the Neuro-Fuzzy model using the Triangular MF is the most suitable. It is therefore better to create an ANFIS model from some training data and use it for effort estimation of the remaining projects.