# Ant Colony Optimization For Diagnosis Of Diabetes Computer Science Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Ant colony optimization has been used successfully in data mining field to extract rule based classification systems. The Objective of this paper is to utilize ACO to extract a set of rules for diagnosis of diabetes disease. Since the new presented algorithm uses ACO to extract fuzzy If-Then rules for diagnosis of diabetes disease, we call it FADD. We have evaluated our new classification system via Pima Indian Diabetes data set. Results show FADD can detect the diabetes disease with an acceptable accuracy and competitive or even better than the results achieved by previous works. In addition, the discovered rules have good comprehensibility.

Diabetes is one of the most dangerous diseases, named Silent killer. This disease is a major health problem in both industrial and developing countries, and its incidence is rising. It is a disease in which either the body does not produce enough insulin or the cells ignore the insulin. Insulin is necessary for the body to be able to use glucose for energy [11]. Diabetes increases the risk of blindness, blood pressure, heart disease, kidney disease and nerve damage. This disease has two main types [4]: type1 and type 2. The most usual form of diabetes is diabetes type 2 or Diabetes mellitus type 2. Millions of people have been diagnosed with diabetes type 2, and unfortunately many more are unaware that they are at high risk [11]. In diabetes type 2, the body is resistant to the effects of insulin (a hormone that regulates the movement of sugar into cells) or the body doesn't produce enough insulin to maintain a normal glucose level[3].The Pima Indians of Arizona have the highest prevalence and incidence of diabetes Type 2 of any population in the world[4]. Although with new medical progresses, early diagnosis of disease has improved but about half of the patients diabetes Type 2 are unaware from their disease and may take more than ten years as the delay from disease onset to diagnosis [11]. While early diagnosis of disease and treatment of hyperglycemia and related metabolic abnormalities are of vital importance. The diagnosis of diabetes is not easy because there are many factors that the physician must consider. The most important two stages in diagnosis of diabetes disease are evaluating data taken from patient and referring to the previous decisions that the expert made on other patients with the same conditions [4]. The former method depends on expert's knowledge while the latter strongly depends on expert's experience with his earlier patients. This job is not easy to consider the number of factors that the expert has to evaluate. To reduce the possible errors and help the expert, the classification system can be used. The use of classifier systems in medical diagnosis is increasing gradually [4]. Expert systems and different artificial intelligence techniques for classification also help experts in a great deal and For this reason, many algorithms are proposed to classification diabetes patients [3, 4, 13, 14].

Ant colony optimization (ACO) has been successfully used for the classification task. Parpinelli et al [2] for the first time employed the ACO for data mining and named it AntMiner1. They showed that ACO is a successful method for data mining. Then, Liu et al [15] improved AntMiner1, and called it AntMiner2. They believed AntMiner2 doesn't give the chance of searching to ants and they introduced the new version of it and called it AntMiner3 [12]. Finally, Martens et al [1] proposed a new method which has all the advantages of previous versions of AntMiner and named it AntMiner+. Saniee et al [5,16] combined the ACO and Fuzzy Logic for Network intrusion detection, and they obtained significant results. To our best knowledge, ACO is never used for diagnosis of diabetes. In this paper we have use ACO and Fuzzy Logic for diagnosis of diabetes disease. We also have proposed a new framework for fuzzy rule learning. In the new presented framework the learning process for each class done independently. To evaluate the final rule-base classifier, two evaluation criteria are considered which are classification Rate and comprehensibility. The former denotes the capability of the classifier for detecting diabetes pattern in the input samples, while the latter refers to the interpretability grade of the classification system which is dependent on the classifier number of rules and the mean of rules length.

The proposed method has been tested using the public Pima Indian Diabetes data set available at the University of California, Irvine web site [17].The results show that this algorithm can classify the Pima Indian diabetes data set with acceptable accuracy and competitive or even better than the results achieved by earlier works. Also this algorithm has good comprehensibility, because it produces a few numbers of rules with short length.

The rest of the paper is as follow: Ant Colony Optimization is presented in section 2. The Fuzzy Classification is discussed in Section 3 and section 4 is devoted to proposed method. Experimental results are reported in Section 5, and Section 6 is conclusions.

Ant Colony Optimization

Ant algorithms are based on the cooperative behavior of real ant colonies, which are able to find the shortest path from a food source to their nest. While walking, real ants deposit a chemical substance called pheromone on the ground. Ants can smell pheromone and when choosing their way, they tend to choose, in a probabilistic way, paths marked by strong pheromone concentrations. In the absence of pheromone, ants choose paths randomly. Pheromone is evaporated over time, therefore, in shorter paths pheromone evaporation is less in comparison with longer paths and causes to the more pheromone is accumulated in the shorter routes. This positive feedback effect means that because of more pheromone all the ants will eventually use the shortest path. Although a single ant is capable of building a solution (i.e., a path), the optimal solution comes about solely as a result of the cooperative behavior of the ant colony (which is based on a simple form of indirect communication through the pheromone, called stigmergy). Although the first ACO algorithm, called Ant System, was applied to solve the TSP problem [7], a large number of applications to other problems were proposed after the introduction of ant system. Recently, the ACO metaheuristic was proposed as a common framework for existing applications [6,9].

Each ant builds a possible solution to the problem by moving through a finite sequence of neighbor states (nodes). Moves are selected by applying a stochastic local search directed by the ant internal state, problem-specific local information and the shared information about the pheromone.

Fuzzy Classification

Let us assume that our pattern classification problem is a -class problem in the -dimensional pattern space with continuous attributes. We also assume that real vectors are given as training patterns from the classes ().

Because the pattern space is , attribute values of each pattern are for and . In computer simulations of this paper, we normalize all attribute values of each data set into the unit interval.

In the presented fuzzy classifier system, we use fuzzy if-then rules of the following form.

: If is and … and is, then Class with .

Where is the label of thefuzzy if-then rule, are antecedent fuzzy sets on the unit interval(each triple <attribute, operator, value> called a term), is the consequent class (i.e., one of the given classes), and is the grade of certainty of the fuzzy if-then rule. In computer simulations, we use a typical set of linguistic values in Fig. 1 as antecedent fuzzy sets. The membership function of each linguistic value in Fig. 1 is specified by homogeneously partitioning the domain of each attribute into symmetric triangular fuzzy sets. We use such a simple specification in computer simulations to show the high performance of our fuzzy classifier system, even if the membership function of each antecedent fuzzy set is not tailored. However, we can use any tailored membership functions in our fuzzy classifier system for a particular pattern classification problem.

When ACO produced a set of rules, the following steps are applied to calculate the certainty grade of each fuzzy if-then rule: [5]

Step 1: Calculate the compatibility of each training pattern with the fuzzy if-then rule by the following product operation:

(1)

Figure 1. The used antecedent fuzzy sets in this paper. a)1: Small, 2: medium small, 3: medium, 4: medium large, 5: large. b) 0: don't care.

Where is the membership function of attribute of pattern and denotes the total number of patterns.

Step 2: For each class, calculate the relative sum of the compatibility grades of the training patterns with the fuzzy if-then rule:

(2)

Where is the sum of the compatibility grades of the training patterns in Classwith the fuzzy if-then rule and is the number of training patterns which their corresponding class is .

Step 3: The grade of certainty is determined as follows:

(3)

Where

(4)

Now, we can specify the certainty grade for any combination of antecedent fuzzy sets. Such a combination is generated by the proposed hybrid system will be explained in the next sections.

The task of our fuzzy classifier system is to generate combinations of antecedent fuzzy sets for generating a rule set with high classification ability. When a rule setis given, an input pattern is classified by a single winner rule in, which is determined as follows:

(5)

That is, the winner rule has the maximum product of the compatibility and the certainty grade.

Each fuzzy if-then rule is coded as a string. The following symbols are used for denoting the five linguistic values: (Fig. 1)

0: don't care (DC), 1: small (S), 2: medium small (MS), 3: medium (M), 4: medium large (ML), 5: large (L).

The Proposed Method

As it was mentioned earlier, ACO algorithm has recently been used in various kinds of data mining problems such as clustering, and classification [1,8].In this section, we discuss the detail of our proposed algorithm for the discovery of classification rules(We call it FADD). This section is divided into six subsections namely, a general description of proposed algorithm, Pheromone Initialization, Rule Construction, Quality Computation Function, Pheromone Update Rule, and Stopping Conditions.

A general description

The FADD utilizes of the artificial ants in order to explore the training search space and gradually make candidate rules. The major difference of this algorithm with the previous algorithms is that this algorithm learns rules for each class separately. In other words, for each class such as k the main function calls a function FADD, which this function learns the rules related to class k. When FADD learns the rules, returns them to the main function and main function remove the covered samples. The learning process is done for each class separately. This process will be done iteratively and finally a set of rules would be discovered and could be used as our detection model to apply in diabetes disease diagnosis. When user-defined number (No_Of_Ants) of ants modified the constructed rule, if constructed rule is proper (improve the classification Rate more than a threshold) then augmented to the DiscoveredRules otherwise the constructed rule is ignored. The steps of proposed algorithm as follow:

Step1: Set the DiscoveredRules as empty and TrainingSet

as all of training samples.

Step2: for each class

Step2-1: Call FADD(fig.2.) for learning the rules of each class.

Step2-2: Add the rules that recently learned (by

step 2-1) to DiscoveredRules.

Step2-3: Remove the covered samples of TrainingSet.

Step 3: Compute the grade of certainty CF for each rule

of the DiscoveredRules.

Step4: For each input pattern Xp=(x1, x2, x3, ..., xn), the single rule Rj can classify Xp which Rj has maximum product of the compatibility and the certainty grade CF among all of rules.

Pheromone Initialization

Whenever function FADD called for learning the rules of each class, all of cells in the pheromone table are initialized equally to the following value:

(6)

Where:

a: is the total number of attributes;

bi :is the number of values in the domain of attribute i.

Rule Construction

Each time function FADD is called at first iteration (T=0), a rule is created which all terms of this rule have DC value. In the next iterations (T≥1) an ant can only modify the terms of the rule that in previous iterations has been constructed. The maximum terms that each ant can modify in each iteration (T≥1) determined with a parameter named Max_Change.fig.2: A high description of FADD

The largest value of Max_Change is number of feature (In our experiments, Max_Change=2). The number of ants that modify the rule in inner loop of FADD is determined by user (No_Of_Ants). The probability that each ant chooses termi,j to modify is

Where

ηi,j : Is a problem-dependent heuristic value for termij. In this algorithm we use 0.5 for DC and 0.1 for other values.

τi,j : The amount of pheromone currently available

(at time t) on the path between attribute i and value j.

a: The total number of attributes

bi: The total number of values in the domain of attribute i

I: Is the set of attributes that are not yet used by the ant

Quality Computation Function

Whenever a rule modified by an ant, the quality function calculates the quality of modified rule. The quality of a rule such as Rj is computed according to equation (8).

(8)

Where

TP: true positives, the number of cases in our training set covered by the rule that have the class predicted by the rule.

FP: false positives, the number of cases covered by the rule that have a class different from the class predicted by the rule

FN: false negatives, the number of cases that are not covered by the rule but that have the class predicted by the rule.

TN: true negatives, the number of cases that are not covered by the rule and that do not have the class predicted by the rule.

Figure 2. A high description of FADD

Pheromone Update Rule

After each ant modifies the terms of a rule according to Max_Change parameter, pheromone updating is carried out. We have defined a new function to update pheromone, in such a way that whenever each ant has modified the terms of rule Rj, quality of rule Rj is calculated, if the quality of rule Rj is increased then pheromone of this rule is increase according to value of quality that improved. We believe (by our experiments) that with this new update Strategy, in each iteration the pheromone helps improve the quality of rule. Pheromone updating is carried out according to equation (10).

Where

âˆ†Q: show difference the quality of the rule after

and before modification.

c: is a parameter to regulate influence of quality.

Algorithm I:

j=1, LearnedRules=[];

While (Not satisfy stopping conditions)

T=0;

2.2 Pheromone initialization; /* all of cells in the pheromone table are initialized equally to equation(5).*/

2.3 Create rule Rj; /* all of terms in this rule have DC value*/

Repeat

2.4.1 T=T+1;

2.4.2 Modify Rj according to Max_change; /*Each ant can modify

the terms of rule Rj according to max_change parameter */

2.4.3 Compute the quality of Rj ;

/*according to equation(3).*/

2.4.4 Update pheromone; /* according to

equation(5) */

Until (T > No_Of_Ants)

If isProper(Rj) add Rj to LearnedRules;

/* Rj must Improve the classification rate */

j=j+1;

End While;

Return LearnedRule;

End Function FADD;

It is necessary to decrease the pheromone of terms that have not participated in the construction of rules. For this purpose, pheromone evaporation is simulated. To simulate the phenomenon evaporation in real ant colony, the amount of pheromone associated with each termij that does not occur in the constructed rule must be decreased. The pheromone of unused terms is decreased by dividing the amount of the value of each τij by the summation of all τij.

Stopping Conditions

Stopping condition in outer loop of FADD function refers to any condition that user has defined to terminate the loop. For example user can use the fix number of iterations or using the minimum uncovered instances to terminate the FADD function. In our experiments, we have used the combination of these two conditions to terminate FADD function.

Experimental Results

Our experiments used data sets from the UCI data set repository [17]: the Pima Indian Diabetes, which contains 768 instances, 8 integer-valued attributes and 2 classes. We normalized the data sets, where each numerical value in the data set is normalized between 0.0 and 1.0. For this purpose, the below function is applied to normalize the data set.

(11)

We evaluate comparative performance of FADD using ten-fold cross-validation. Data set is divided into ten partitions, and FADD is run ten times, using a different partition as test set each time, with the other nine as training set. The classification rate being calculated according to equation (12) (where the meanings of TP, TN, FN, FP are as in equation (8)).

(12)

Table I shows classification rate for the rule sets produced by different algorithms. And table II shows the results of FADD. It can be seen that proposed algorithm discovers less rules, but also it has the good classification rate, in comparison with other methods. Also because of the number of rules that FADD algorithm has produced and mean length of rules is low, FADD has good comprehensibility.

TABLE I: CLASSIFICATION RATE OBTAINED WITH DIFFERENT CLASSIFIER.

Method

Classification Rate

Decision Table*

71.224

RBF*

75.8

NNGE*

73.5677

C4.5 Dta*

73.0

Bayesa*

72.2

Regression Coefficients*

72.3958

Naive Bayes*

76.3021

CART*

72.8

C4.5 rules*

67.0

Deng et al[13]

78.4

Kayaeret al[14]

77.08

Polat et al[3]

78.21

Temurtaset al[4]

79.16

* The methods that is marked with asterisk, have been tested by software Weka [10].

TABLE II. RESULT OF FADD

Number of Rules

Mean Classification Rate

Mean length of rules

8

79.48±1.1

2.571

VI.

Conclusion

This paper presents a mixture of Ant Colony Optimization and Fuzzy Logic for mining among Pima Indian diabetes data set. Already, Ant Colony Optimization used for data mining to classification [1,2,11,15]. The main new features of the presented algorithm are as follows:

1. Introducing a new framework for learning the rules in such a way that the rules are learned for each class independently.

2. A different strategy for controlling the influence of pheromone values was studied. We proposed the new update pheromone rule that improves the quality of each rule. Because for each rule, the value of pheromone that increased in each iteration depend on the quality of modifications that ants were did. With this new update pheromone function ants in order to improve the quality of rule, make better decisions in next iterations.

3. There are two important concepts in ACO that are: Competition and Cooperation. The previous versions of AntMiner paid more attention to Competition and this caused some of the rules was very strong while the other rules was nearly weak. In this paper we have paid attention to cooperation in order to produce a set of nearly strong rules. For this propose, we have encouraged the ants to have more cooperation in the body FADD function.