Abstract - Software estimation, which includes cost estimation, effort estimation, quality estimation and risk analysis, is a major challenge for software projects. The literature describes several algorithmic cost estimation models, such as Boehm's COCOMO, Albrecht's Function Point Analysis, Putnam's SLIM and ESTIMACS, but each model has its own strengths and weaknesses, and a model that gives accurate estimates is still needed. In this paper, soft computing models based on the Adaptive Neuro-Fuzzy Inference System (ANFIS) are used for software development effort prediction. The ANFIS models are designed to improve the performance of a network suited to the COCOMO model. They are created using Triangular, GBell, Trapezoidal and Gauss membership functions. A case study based on the NASA 93 projects compares the proposed models with the Intermediate COCOMO. The results are analyzed using five criteria: MMRE, MARE, VARE, Mean BRE and Prediction. The proposed ANFIS models, which combine the adaptive capabilities of neural networks with a fuzzy inference system, show a high level of efficiency, with an accuracy of more than 97%.
Index Terms- Cognitive Simulation, Cost Estimation, Knowledge Acquisition, Neural Nets
In algorithmic cost estimation, costs and efforts are predicted using mathematical formulae derived from historical data. The best-known algorithmic cost model, COCOMO (COnstructive COst MOdel), was proposed in 1981 by Barry Boehm. It was developed from the analysis of 63 software projects. Boehm proposed three levels of the model: Basic COCOMO, Intermediate COCOMO and Detailed COCOMO [1, 5]. The present paper focuses mainly on the Intermediate COCOMO.
A. Intermediate COCOMO
The Basic COCOMO model [1, 5] is based on the relationship: Development Effort, DE = a * (SIZE)^b, where SIZE is measured in KLOC (thousands of lines of code). The constants a and b depend on the 'mode' of development of the project. DE is measured in man-months (also called person-months). Boehm proposed 3 modes of projects [1, 5]:
1. Organic mode - simple projects that engage small teams working in known and stable environments.
2. Semi-detached mode - projects that engage teams with a mixture of experience. It is in between organic and embedded modes.
3. Embedded mode - complex projects that are developed under tight constraints with changing requirements.
The accuracy of the Basic COCOMO is limited because it does not consider factors such as hardware, personnel, use of modern tools and other attributes that affect the project cost. To address this, Boehm proposed the Intermediate COCOMO [1, 4], which improves on the accuracy of the Basic COCOMO by multiplying the equation by a new variable built from 'Cost Drivers': the EAF (Effort Adjustment Factor). The resulting equations are shown in Table 1.
TABLE 1. DE FOR THE INTERMEDIATE COCOMO
Mode | Intermediate Effort Equation
Organic | DE = EAF * 3.2 * (SIZE)^1.05
Semi-detached | DE = EAF * 3.0 * (SIZE)^1.12
Embedded | DE = EAF * 2.8 * (SIZE)^1.2
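The three effort equations can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name, the dictionary and the mode keys are hypothetical, and the coefficients are assigned to the organic, semi-detached and embedded modes in the order in which the equations are listed.

```python
# Intermediate COCOMO coefficients (a, b) per development mode.
# Assumption: the three equations map to the modes in the order
# organic, semi-detached, embedded.
COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def intermediate_cocomo_effort(size_kloc, mode, eaf=1.0):
    """Return development effort in man-months: DE = EAF * a * SIZE**b."""
    if size_kloc <= 0:
        raise ValueError("size_kloc must be positive")
    if mode not in COEFFICIENTS:
        raise ValueError("unknown mode: " + repr(mode))
    a, b = COEFFICIENTS[mode]
    return eaf * a * size_kloc ** b


# Effort for a hypothetical 10 KLOC project with all drivers Nominal.
for mode_name in COEFFICIENTS:
    print(mode_name, round(intermediate_cocomo_effort(10.0, mode_name), 1))
```

For the same size and EAF, the embedded equation gives the largest effort once SIZE is large enough for the exponent to dominate the smaller leading constant.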
The EAF term is the product of 15 Cost Drivers [3, 5], which are listed in Table 2. Each cost driver is rated Very Low, Low, Nominal, High, Very High or Extra High, and each rating has an associated multiplier. For example, if for a project RELY is Very Low, DATA is High, CPLX is Extra High, TIME is Very High, STOR is High and the remaining drivers are Nominal, then EAF = 0.75 * 1.08 * 1.65 * 1.30 * 1.06 * 1.0 ≈ 1.84. If all 15 cost drivers are rated "Nominal", then EAF is equal to 1.
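The EAF arithmetic in this example can be checked with a short script. This is an illustrative sketch: the function and variable names are hypothetical, and only the multiplier values quoted in the example are used, with every driver not listed treated as Nominal (1.0).

```python
from math import prod

# Symbols of the 15 Intermediate COCOMO cost drivers.
ALL_DRIVERS = [
    "RELY", "DATA", "CPLX",                  # product
    "TIME", "STOR", "VIRT", "TURN",          # platform
    "ACAP", "AEXP", "PCAP", "VEXP", "LEXP",  # personnel
    "MODP", "TOOL", "SCED",                  # project
]


def effort_adjustment_factor(multipliers):
    """Multiply the given multipliers; unlisted drivers count as Nominal (1.0)."""
    unknown = set(multipliers) - set(ALL_DRIVERS)
    if unknown:
        raise ValueError("unknown cost drivers: " + ", ".join(sorted(unknown)))
    return prod(multipliers.get(driver, 1.0) for driver in ALL_DRIVERS)


# Multiplier values quoted in the worked example in the text.
example = {"RELY": 0.75, "DATA": 1.08, "CPLX": 1.65, "TIME": 1.30, "STOR": 1.06}
print(round(effort_adjustment_factor(example), 4))
```

Passing an empty dictionary corresponds to the all-Nominal case and returns 1.0.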
TABLE 2. INTERMEDIATE COCOMO COST DRIVERS WITH MULTIPLIERS (columns: Cost Driver, Symbol)
The 15 cost drivers are broadly classified into 4 categories [1, 5]:
1. Product:
   RELY - required software reliability
   DATA - database size
   CPLX - product complexity
2. Platform:
   TIME - execution time constraint
   STOR - main storage constraint
   VIRT - virtual machine volatility
   TURN - computer turnaround time
3. Personnel:
   ACAP - analyst capability
   AEXP - applications experience
   PCAP - programmer capability
   VEXP - virtual machine experience
   LEXP - language experience
4. Project:
   MODP - modern programming practices
   TOOL - use of software tools
   SCED - required development schedule
Depending on the project, the multipliers of the cost drivers vary, so the EAF may be greater than or less than 1, which in turn affects the effort.
The effort multipliers are as follows:
Increase these to decrease effort:
acap | analyst capability
pcap | programmer capability
aexp | applications experience
modp | modern programming practices
tool | use of software tools
vexp | virtual machine experience
lexp | language experience
sced | schedule constraint
Decrease these to decrease effort:
stor | main memory constraint
data | database size
time | time constraint for CPU
turn | turnaround time
virt | machine volatility
cplx | product complexity
rely | required software reliability
II MACHINE LEARNING TECHNIQUES
In this section we present two categories of machine learning, Fuzzy Logic and Neural Networks [8, 9], and then the proposed ANFIS models.
A. Fuzzy Logic
A fuzzy model is used when a system is not suitable for analysis by a conventional approach or when the available data is uncertain, inaccurate or vague. The point of fuzzy logic is to map an input space to an output space using a list of if-then statements called rules. All rules are evaluated in parallel, and the order of the rules is unimportant. Before the rules can be written, the inputs and outputs of the system must be identified. The Intermediate COCOMO model data is used for developing the Fuzzy Inference System (FIS) [10, 14]. The inputs to this system are MODE and SIZE. The output is the Fuzzy Development Effort.
Advantage: using fuzzy ranges, we are able to predict the effort for projects that do not fall under a precise mode, i.e. that lie between two modes. This situation cannot be handled using the COCOMO.
Disadvantage: it is hard to maintain a degree of meaningfulness, and the whole work has to be redone for a new dataset.
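The idea of a project lying between two modes can be illustrated with a small fuzzy blend of the three COCOMO effort equations. This is only a sketch under stated assumptions, not the FIS built in this paper: MODE is encoded on a hypothetical numeric axis (1 = organic, 2 = semi-detached, 3 = embedded), each mode is a triangular fuzzy set on that axis, the rule outputs are combined by a firing-strength-weighted average, and the EAF is taken as 1.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# (peak on the assumed 1..3 mode axis, coefficient a, exponent b)
MODE_RULES = [
    (1.0, 3.2, 1.05),  # organic
    (2.0, 3.0, 1.12),  # semi-detached
    (3.0, 2.8, 1.20),  # embedded
]


def fuzzy_effort(mode_value, size_kloc):
    """Blend the effort equations according to the membership in each mode."""
    if not 1.0 <= mode_value <= 3.0:
        raise ValueError("mode_value must lie in [1, 3]")
    weighted_sum = 0.0
    total_weight = 0.0
    for peak, a, b in MODE_RULES:
        weight = triangular(mode_value, peak - 1.0, peak, peak + 1.0)
        weighted_sum += weight * a * size_kloc ** b
        total_weight += weight
    return weighted_sum / total_weight


# A hypothetical project halfway between organic and semi-detached.
print(round(fuzzy_effort(1.5, 10.0), 1))
```

At the crisp values 1, 2 and 3 the blend reduces to the corresponding single COCOMO equation; in between, the estimate moves smoothly from one mode to the next.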
B. Neural Networks
A neural network [12, 14] is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects [4, 7, 11]:
1)Knowledge is acquired by the network from its environment through a learning process
2)Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
Artificial neural networks can model complex non-linear relationships and approximate any measurable function, so they are very useful in problems where there is a complex relationship between inputs and outputs.
Many different algorithms are available to choose from.
There is no clear guidance on how to design neural nets, for example on how many hidden layers should be present.
Accuracy depends on a large training dataset, which is not always available.
They are effectively black boxes: once the inputs are given, the generated outputs have to be accepted.
C. Neuro - Fuzzy Model
The acronym for Adaptive Neuro-Fuzzy Inference System is ANFIS. Using a given input/output data set, the toolbox function anfis constructs a fuzzy inference system (FIS) whose membership function parameters are tuned (adjusted) using either a back propagation algorithm alone or in combination with a least squares type of method. This adjustment allows the fuzzy systems to learn from the data they are modeling. A network-type structure similar to that of a neural network, which maps inputs through input membership functions and associated parameters, and then through output membership functions and associated parameters to outputs, can be used to interpret the input/output map.
The parameters associated with the membership functions change through the learning process. The computation of these parameters (or their adjustment) is facilitated by a gradient vector. This gradient vector provides a measure of how well the fuzzy inference system is modeling the input/output data for a given set of parameters. When the gradient vector is obtained, any of several optimization routines can be applied in order to adjust the parameters to reduce some error measure. This error measure is usually defined as the sum of the squared differences between actual and desired outputs.
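The role of the gradient vector can be made concrete with a toy example. The sketch below is not the toolbox's training routine: it tunes the two parameters of a single Gaussian membership function (centre c and width sigma) by plain gradient descent on the sum of squared errors, with a step-halving safeguard added here so that every accepted step reduces the error. All names, the synthetic data and the learning rate are hypothetical.

```python
import math


def gaussian(x, c, sigma):
    """Gaussian membership value of x for centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def sse(xs, targets, c, sigma):
    """Sum of squared differences between actual and desired outputs."""
    return sum((gaussian(x, c, sigma) - t) ** 2 for x, t in zip(xs, targets))


def sse_gradient(xs, targets, c, sigma):
    """Analytic gradient of the SSE with respect to (c, sigma)."""
    grad_c = 0.0
    grad_sigma = 0.0
    for x, t in zip(xs, targets):
        mu = gaussian(x, c, sigma)
        err = mu - t
        grad_c += 2.0 * err * mu * (x - c) / sigma ** 2
        grad_sigma += 2.0 * err * mu * (x - c) ** 2 / sigma ** 3
    return grad_c, grad_sigma


def tune(xs, targets, c, sigma, rate=0.05, epochs=200):
    """Gradient descent; a step is halved until it lowers the SSE."""
    error = sse(xs, targets, c, sigma)
    for _ in range(epochs):
        grad_c, grad_sigma = sse_gradient(xs, targets, c, sigma)
        step = rate
        improved = False
        while step > 1e-12:
            new_c = c - step * grad_c
            new_sigma = sigma - step * grad_sigma
            if new_sigma > 0.0:
                new_error = sse(xs, targets, new_c, new_sigma)
                if new_error < error:
                    c, sigma, error = new_c, new_sigma, new_error
                    improved = True
                    break
            step /= 2.0
        if not improved:
            break
    return c, sigma, error


# Synthetic targets generated from a "true" function with c = 5, sigma = 2.
xs = [float(i) for i in range(11)]
targets = [gaussian(x, 5.0, 2.0) for x in xs]
initial_error = sse(xs, targets, 3.0, 1.0)
tuned_c, tuned_sigma, final_error = tune(xs, targets, 3.0, 1.0)
print(initial_error, final_error, tuned_c, tuned_sigma)
```

In a full ANFIS the same principle applies to every membership function parameter at once, with the error measured at the output of the whole inference system rather than at a single membership function.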
The hybridization of neural networks and fuzzy logic is the basic idea behind the neuro-fuzzy system. Neuro-fuzzy hybridization is done in two ways: fuzzy neural networks (FNN) and neuro-fuzzy systems (NFS). An FNN is a neural network equipped with the capability of handling fuzzy information. An NFS is a fuzzy system augmented by neural networks to enhance characteristics such as flexibility and adaptability. This paper is based on the second approach.
The Takagi-Sugeno neuro-fuzzy system was used. It combines back propagation, to learn the membership functions, with least mean square estimation, to determine the coefficients of the linear combination in each rule's conclusion. The Takagi-Sugeno neuro-fuzzy system schema is depicted in Fig. 1:
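How a Takagi-Sugeno rule base turns an input into an output can be sketched as a forward pass. The example below is illustrative only: it uses a single input, two rules with Gaussian membership functions, and linear conclusions f = p * x + r; the rule parameters are hypothetical values, not parameters learned in this study.

```python
import math


def gaussian(x, c, sigma):
    """Gaussian membership value of x for centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


# Each hypothetical rule: (MF centre, MF sigma, slope p, intercept r),
# read as "IF x is near centre THEN f = p * x + r".
RULES = [
    (2.0, 2.0, 3.0, 1.0),
    (8.0, 2.0, 5.0, -4.0),
]


def sugeno_output(x, rules=RULES):
    """First-order Takagi-Sugeno inference for a single input x."""
    # Firing strength of each rule (membership of x in the rule's premise).
    strengths = [gaussian(x, c, sigma) for c, sigma, _, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Normalised strengths weight the linear conclusions of the rules.
    conclusions = [p * x + r for _, _, p, r in rules]
    return sum(w / total * f for w, f in zip(strengths, conclusions))


print(sugeno_output(5.0))
```

During training, back propagation would adjust the centres and widths, while the slopes and intercepts, which enter the output linearly once the firing strengths are fixed, are the coefficients estimated by least squares.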
Figure 1. Takagi-Sugeno Neuro Fuzzy system
ANFIS is perhaps the first integrated hybrid neuro-fuzzy model, and because it implements Takagi-Sugeno rules, it has the lowest Root Mean Square Error (RMSE) among the neuro-fuzzy models. ANFIS was therefore used here to implement the neuro-fuzzy model. In ANFIS, the adaptation (learning) process is only concerned with parameter-level adaptation within fixed structures. The objective of the parameter-learning phase is to adjust the parameters of the fuzzy inference system (FIS) so that the error function over the training dataset reaches a minimum or falls below a given threshold.
D. Membership Functions
A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input space is also called the universe of discourse. For our problem, we have used four types of membership functions:
Triangular membership function.
GBell membership function.
Trapezoidal membership function.
Gauss membership function.
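The four membership function shapes can be written down directly. The sketch below uses the commonly cited textbook forms of these curves; the function names and parameter orderings are chosen for this illustration and are not taken from the paper.

```python
import math


def triangular_mf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (requires a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoidal_mf(x, a, b, c, d):
    """Trapezoid with feet a, d and shoulders b, c (requires a < b <= c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


def gbell_mf(x, a, b, c):
    """Generalised bell: 1 / (1 + |(x - c) / a| ** (2 * b)), with a != 0."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def gauss_mf(x, sigma, c):
    """Gaussian curve centred at c with width sigma (sigma > 0)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


# Membership of a few points of a 0..10 universe of discourse.
for point in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(
        point,
        round(triangular_mf(point, 0.0, 5.0, 10.0), 3),
        round(trapezoidal_mf(point, 0.0, 3.0, 7.0, 10.0), 3),
        round(gbell_mf(point, 2.0, 4.0, 5.0), 3),
        round(gauss_mf(point, 2.0, 5.0), 3),
    )
```

All four functions return values in the interval [0, 1]; they differ in how sharply membership falls off away from the centre, which is what the ANFIS training adjusts.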
III VARIOUS CRITERIA FOR ASSESSMENT
1. Variance Accounted For (VAF)
2. Mean Absolute Relative Error (MARE)
3. Variance Absolute Relative Error (VARE)
4. Prediction (n): Prediction at level n is defined as the percentage of projects that have an absolute relative error less than n.
5. Balance Relative Error (BRE)
where N = number of projects, E = estimated effort and Ê = actual effort.
Absolute Relative Error (RE) = |Ê - E| / Ê (5)
A model which gives a lower MARE (2) is better than one which gives a higher MARE. A model which gives a lower VARE is better than one which gives a higher VARE [6, 11]. A model which gives a lower BRE (4) is better than one which gives a higher BRE. A model which gives a higher Pred(n) is better than one which gives a lower Pred(n). A model which gives a lower MMRE is better than one which gives a higher MMRE.
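The assessment criteria can be computed from paired lists of actual and estimated efforts. The sketch below assumes the conventional definitions of these measures, which may differ in detail from the formulas used in this paper: the relative error is taken with respect to the actual effort, variances are population variances, BRE divides by the smaller of the two efforts, and Pred(n) counts projects whose relative error is below n percent. All function names and the sample data are hypothetical.

```python
from statistics import mean, pvariance


def relative_errors(actual, estimated):
    """Absolute relative error |actual - estimated| / actual per project."""
    return [abs(a - e) / a for a, e in zip(actual, estimated)]


def mare(actual, estimated):
    """Mean Absolute Relative Error, in percent."""
    return mean(relative_errors(actual, estimated)) * 100.0


def vare(actual, estimated):
    """Variance of the Absolute Relative Error, in percent."""
    return pvariance(relative_errors(actual, estimated)) * 100.0


def pred(actual, estimated, n):
    """Percentage of projects with relative error below n percent."""
    errors = relative_errors(actual, estimated)
    return 100.0 * sum(1 for re in errors if re * 100.0 < n) / len(errors)


def mean_bre(actual, estimated):
    """Mean Balance Relative Error: |actual - estimated| / min of the two."""
    return mean(abs(a - e) / min(a, e) for a, e in zip(actual, estimated))


def vaf(actual, estimated):
    """Variance Accounted For, in percent."""
    residuals = [a - e for a, e in zip(actual, estimated)]
    return (1.0 - pvariance(residuals) / pvariance(actual)) * 100.0


# Hypothetical efforts in man-months for three projects.
actual_effort = [100.0, 200.0, 400.0]
estimated_effort = [110.0, 180.0, 400.0]
print(mare(actual_effort, estimated_effort), pred(actual_effort, estimated_effort, 30))
```

Lower MARE, VARE and BRE and higher Pred(n) and VAF indicate a better model, in line with the ranking rules stated in the text.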
IV EXPERIMENTAL STUDY
We have chosen the NASA 93 dataset, which consists of data from 93 projects. Out of the 93 projects, 83 randomly selected projects are used as training data. The model is tested using the entire dataset. The efforts estimated using the Intermediate COCOMO and the ANFIS models with Triangular MF, GBell MF and Trapezoidal MF are shown for some sample projects in Table 3. The effort is calculated in man-months. Table 4 and Figs. 2 to 5 show the comparison of the various models based on the different criteria.
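The data split described here can be sketched as follows. The snippet is illustrative: it works on project indices only, the function name and the random seed are hypothetical, and the paper does not state how its random selection was generated.

```python
import random


def split_projects(n_projects=93, n_train=83, seed=0):
    """Pick n_train random projects for training; test on all projects."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    all_indices = list(range(n_projects))
    train_indices = sorted(rng.sample(all_indices, n_train))
    test_indices = all_indices  # the model is tested on the entire dataset
    return train_indices, test_indices


train_indices, test_indices = split_projects()
print(len(train_indices), len(test_indices))
```

Because the test set contains the training projects, the reported errors mix seen and unseen projects; the 10 projects outside the training set are the only ones the model has not been fitted to.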
TABLE 3. ESTIMATED EFFORT IN MAN-MONTHS OF VARIOUS MODELS
TABLE 4. COMPARISON OF VARIOUS MODELS
Figure 2. Comparison of VAF & MARE against various models
Figure 3. Comparison of VARE against various models
Figure 4. Comparison of Mean BRE against various models
Figure 5. Comparison of Pred(30)% against various models
Referring to Table 4, we see that the Neuro-Fuzzy model using TriMF yields better results on most criteria when compared with the other models. Thus, based on MARE, VARE, Mean BRE, MMRE and Pred(30), we conclude that the Neuro-Fuzzy model using TriMF is the most suitable. The results therefore indicate that it is better to create an ANFIS model from some training data and use it for effort estimation on all the other projects.