# Enhanced Artificial Bee Colony Algorithms Computer Science Essay

Published:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Abstract- Clustering aims at the unsupervised learning of objects in different groups. The algorithms, such as K-means and Fuzzy C- Means (FCM) are traditionally used for clustering purpose. Recently, most of the researches and study are concentrated on optimization of clustering process using different optimization methods. The commonly used optimizing algorithms such as Particle swarm optimization, Ant Colony Algorithm and Genetic Algorithms have given some significant contributions for optimizing the clustering results. In this paper, we have proposed two new approaches by enhancing the traditional Artificial Bee Colony (ABC) algorithm, the first approach uses ABC algorithm with K means operator and second approach uses ABC algorithm with FCM operator for optimizing the clustering process. The comparative study of the proposed approaches with existing algorithms in the literature using the datasets from UCI Machine learning repository is satisfactory.

Keywords- Clustering, Optimization, Artificial Bee Colony Algorithm, K-operator, FCM Operator

Introduction

Clustering groups objects into different clusters, such that the objects in each cluster will be similar to each other, but dissimilar to objects in other cluster [22]. The main purpose of clustering is to uncover the natural grouping and also to find regular and valid-organization of the data. The basic principle of clustering [22] depends on the concept of distance metric or similarity metric. The two main methods [17] of clustering are partitioning clustering and hierarchical clustering. The partitioning clustering method partition the dataset into predefined number of clusters whereas the hierarchical clustering method iteratively split the dataset into smaller subsets, until the termination condition is satisfied. The most usually used clustering method is partitioning clustering method due to use of less memory and time of execution.

978-1-4673-2907-1/13/$31.00 Â©2013 IEEEThe mainly used partitional clustering algorithm is K-means algorithm [10], which is simple, fast and center based algorithm. But, K-means algorithm highly depends on the initial cluster center and always converges to the nearest local optimum from the starting position of the search. In order to overcome local optima problem, researchers from diverse fields are applying hierarchical clustering, density-based clustering, and artificial intelligence based clustering methods, such as: statistics, graph theory, artificial neural networks, evolutionary algorithms and swarm intelligence algorithms [6]. A genetic algorithm-based approach [14], to clustering problem was experimented on synthetic and real-life datasets to evaluate the performance.

Among the clustering methods most commonly used techniques are K-means algorithm and Fuzzy c-Means algorithm. The main drawback faced by the k-means algorithm depends on initial seed points. There are no optimization criteria in the K-means based algorithms [8-9], which serve a major drawback in K- means algorithms. Fuzzy C-Means (FCM) [21] algorithm is the effective and most popular fuzzy clustering algorithm. Fuzzy c-means clustering [18] allows one piece of data to belong to two or more clusters. The drawbacks of FCM algorithm is that it converges to a local minimum or a saddle point.

In recent days to enhance the quality of results, the swarm intelligence based algorithms [15] become an alternative to the conventional clustering methods. Recently introduced optimization algorithm namely Artificial Bee Colony (ABC) algorithm, which is described by Karaboga [1] in 2005, is introduced to the clustering process. The original ABC [4] algorithm was based on the foraging behavior of the honey bees. Their foraging behavior, knowledge, remembering and information sharing features have recently been one of the most interesting study areas in swarm intelligence.

Changsheng Zhang et al. [2] have proposed a novel approach using ABC algorithm for clustering and used Deb's rule to direct the search direction of each candidate. Dervis Karaboga et al. [19] have applied the ABC Algorithm on optimization of fuzzy clustering to classify different data sets. Krishna and Murty [7] have proposed an approach called genetic K-means algorithm for clustering analysis which express a basic mutation operator controlled clustering, named as distance-based mutation. Even though the ABC algorithm [5] performs well among other optimization methods, we enhanced the ABC algorithm by incorporating K-means operator and FCM operator added in the scout bee phase.

In the normal ABC algorithm [1], the optimization consists of three different phases. They are employed bee phase, onlooker bee phase and the scout bee phase. The employed bee and onlooker bee are processed according to the fitness of the solutions. Scout bee is one of the attracting elements in the ABC algorithm, which is introduced to the bee colony when an abandon solution is developed. In such a case, a random bee is introduced into the bee colony. The conventional approach is slightly modified in the proposed approaches using K-means operator [7] and FCM operator [11] separately.

The rest of this paper is organized as follows; in the section 2 the basic concepts of the original ABC algorithm is given. The section 3 discusses about the modified ABC algorithm using K- Means operator and the section 4 describes the modified ABC algorithm using FCM operator. The experimental results for the proposed approach are compared and discussed in section 5, and the section 6 gives the conclusion for the research work.

Basic Concepts

## Principle of ABC algorithm

ABC is an optimization algorithm initiated by the intelligent behavior of honey bees, described by Dervis Karaboga in 2005. The intelligence behavior [2] of the bee depends on the recruitment to a nectar source and abandonment of a source. The honey bee swarms [16] consists of three important components namely food sources, employed bee and unemployed bees. The assessment of a food source depends on various factors such as its proximity to the nest, richness of energy and ease of extracting this energy.

The ABC algorithm [1] consisting of three groups of bees namely the employed bees, the onlooker bees and the scout bees. The employed bees will be associated with the specific food source, which they are currently employed or exploiting. The information about these food sources is taken to the hive by the employed bee and shares this information along with the onlooker bees. The onlooker bees selects one of the food source after making the comparison among the food sources around the new location based on the nectar amount. If the bee starts searching for find out new sources without any information, it will be a scout bee. The location of a food source represents a possible solution to the optimization problem and the nectar amount of a food source corresponds to the quality (fitness) of the associated solution [1], calculated by

(1)

An employed bee will be associated with only one food source, so the number of employed bees will be equal to number of the food source. If the food source becomes abandoned by its employed bee then the employed bee become scout bee. The scout bee randomly starts searching for new food source around the bee hive.

## Steps in ABC Algorithm

The steps involved in ABC algorithm [1] are

Step 1: Initial Phase: In the colony first half is made the employed bees and the second half are onlookers. The current scout number is s=0.

Step 2: Introduce the new food source: Make the bees with worst solution as scout and other bees as employed bees.

Step 3:Employed bees phase : Produce new solution for employed bees from the current position and evaluate it. The best solution is memorized. If the limit for abandonment is reached then the employed bee forgets its memory and becomes the scout. The scout number is updated. Calculate the probability values for each employed bees.

Step 4: Scout bees phase : The scout bees search for new food source randomly. When a new food source is obtained and greedy selection process is used to evaluate it.

Step 5:Onlooker bees phase : Based on the probability value calculated, onlooker bees produce their new food source and and greedy selection process is used to evaluate it.

Step 6: Memorize best position : If the memorized position is better than the previous position , then the best position is replaced by it.

## K-means Algorithm

K-means [10] is an algorithm which is used to divide or to group the objects based on attributes (features) into k number of groups. Here, k is the number which is positive integer. The fundamental procedure of k-means clustering is uncomplicated. At the first stage, we require establishing number of cluster k and we assume the centroid for these clusters. We can take any object as initial centroid unsystematically. The k-means algorithm will do the following process until it convergences. The Steps involved in K-means algorithm is as follows:

Establish the centroid coordinate randomly.

Discover the distance matrix of every objects with relevant to the centroids

Group the object based on short distance

Find the new centroid coordinate

## Fuzzy C-Means algorithm (FCM)

Fuzzy C-Means algorithm [11-13] is the effective and most popular fuzzy clustering algorithm. FCM is a method of clustering which allows one piece of data to belong to two or more clusters [19-21]. This iteration will stop whenÂ it reaches a particular condition and converges to a local minimum or a saddle point. By repeated updation of the cluster centers and the membership function for each data point, FCM iteratively moves the cluster centers to the "right" location within a data set. The performance of the FCM algorithm depends on initial centroids. The Steps involved in Fuzzy C-means algorithm [11] is as follows:

Initialize U=[uij] matrix, U(0)

At k-step: calculate the center vectors C(k)=[cj] with U(k)

Update U(k) , U(k+1)

If ||U(k+1)-U(k)||< then STOP; otherwise return to step 2.

In the above algorithm, uij represents the degree of membership of ith object in jth cluster, cj represents the d dimension center of the cluster and xi represents the ith d dimensional measured data. This algorithm is based on the minimization of the following objective function which is given by

(2)

Data in this algorithm is bound to each other by means of a membership function[19] that represents the fuzzy behavior of the algorithm. Â First we have to build an appropriate matrix named U whose factors are numbers between 0 and 1 and it represent the degree of membership between data and centers of clusters.

Modified ABC Algorithm using K-mean operator (ABK)

The ABK algorithm stands for Artificial Bee Colony with K-means operator. The proposed approach presents an enhanced algorithm that incorporates the K-means algorithm into the ABC algorithm. The proposed algorithm is well designed to get all the features of the above mentioned algorithms. Among the three phases, employed bee phase and onlooker phase are inevitable phases where as the scout bee phase is a random phase. So, we incorporate the K-means algorithm in scout bee phase. The selection of scout bee will come only when there is an abandoned solution is generated. At that time, according to the ABC algorithm, a random solution is generated for the scout bees. This random solution may or may not be the best solution, but the chances are very less.

The proposed approach comes up with the plan that, in each cycle we will introduce solutions for the scout bees by the K-means algorithm. The new solution from the K-means is generated according to the solutions of the employed bee and the onlooker bee phases. This process may increase the chances of giving more suitable solutions for the optimization problem. The addition of new solution from K-means after every cycle may enhance the reach of ABC algorithm to a different level. Now, the proposed approach finds thevalues from the distances according to the following formula.

(3)

The fitness function is the calculated as the sum of all the values.

(4)

(5)

Here the fitness is calculated for the data, which is given as the input. The fitness is calculated for finding the most relevant data for the next population. The solution with better fitness among the population is survived and others are rejected. Among the three phases of the ABC algorithm, the employed bee phase and the onlooker bee phase are same as defined by the ABC algorithm and the changes we have made in the scout bee phase. As per the ABC algorithm, the third phase is random phase and that will occur only if there is an abandoned solution. But in our proposed approach, we have replaced the third phase with k-means operator.

## Modified Scout bee phase

The scout bee phase is the final stage in our proposed approach. This phase is powered with the K- means operator, in order to get the highest fitness value from the solution. The scout bee starts by selecting the solution from onlooker bee phase, which possesses the lowest fitness value. The onlooker bee phase generates different solution according to different values. So, we will get a number of fitness values also. Now, select the solution with least fitness value. The scout bee phase can be explained with the following pseudo code.

## Steps in Modified ABC Algorithm using K- Means operator

/* Input: Dataset */

/* Output: Clusters */

Step 1: Read dataset

Step 2: Initialize dataset.

/* Cycle starts */

Step 3: Apply random selection for employed

bees.

Step4: Calculate Distance matrix

Step 5: Find the fitness,

Step 6: Start Onlooker bee phase.

Step 7: Find new solution for onlookers,

Step 8: Find fitness of onlooker bee

Step 9: Select the bee with least fit value.

Step 10: Apply K-means operator in the selected

bee to find Scout bee.

Step 11: Add Scout bee into Employed bee.

/* Cycle ends */

Step 12: Repeat Step 3- Step 11.

Step 13: End

Modified ABC Algorithm using FCM operator (AB-FF)

In this approach, a new algorithm is developed by incorporating the FCM operator into the ABC algorithm. The proposed algorithm is well designed to obtain all the features of the ABC and FCM algorithms [21]. As the scout bee phase is a random phase, modification is incorporated in the scout bee phase of ABC algorithm with the help of FCM operator. The proposed approach comes with an alternative, in each cycle, a solutions for the scout bees is introduced by the FCM operator. The new solution from the FCM operator is generated based on the solutions of the employed bee and the onlooker bee phases for the better optimization results. The fitness function used in the proposed approach is given below

(6)

The dataset can be considered as the following set with 'n' elements:

(7)

i.e., In general:

(8)

The fitness function is calculated as the sum of all the fi values:

(9)

The fitness value is calculated for finding the most relevant data for the next population. The solution with better fitness value among the population survives and the solutions with worst fitness value are rejected. In the three phases of the ABC algorithm, the employed bee and the onlooker phase are same as that described in the basic ABC algorithm and the employed bee is considered as the initial population. The major modification is made in the scout bee phase.

## Modified Scout Bee Phase

In the proposed approach, a new method is used to introduce the scout bee, in the case of an abandon solution. Instead of adding a random position to the bee colony, the proposed approach uses the FCM function [21] to introduce the scout bee. The scout bee is produced from the onlookers with the highest fitness, though they have not possessed best fitness than the previous one. The onlookers with high fitness are sorted in their ascending order of their fitness values:

Here, Oi is the set of onlooker bee positions, which has the highest fitness. A scout bee is generated from the processing of the above set with the help of the FCM function. Thus, the new scout bee can be generated from the following function:

(10)

(11)

Where,

ïƒ The fuzzy membership value,

ïƒ The centroid calculated for the set

The FCM function generates a new position for the scout bee from the membership values [20] and the centroid values. As mentioned above, only one centroid is defined since a single scout bee has to be introduced to the context. Unlike the FCM algorithm, in the proposed approach a single iteration is conducted to obtain the new solution. The main advantage in the proposed approach is that the scout bee is added to the solution by processing from the best fitness values obtained from the last solution in the cycle instead of randomly assigning a scout bee. Moreover the scout bee is introduced in every cycle, which improves the optimal solution and the overall execution time.

## Steps in Modified ABC Algorithm using FCM operator

AB-FF Algorithm

/* Input: Dataset */

/* Output: Clusters */

Step 1: Read dataset

Step 2: Initialize the colony.

/* Cycle starts */

Step 3: Apply employed bees.

Step 4: Calculate Distance matrix

Step 5: Find the fitness,

Step 6: Start Onlooker bee phase.

Step 7: Find new solution for onlookers,

Step 8: Find fitness of onlooker bee

Step 9: Select the bee with least fitness value.

Step 10: Apply FCM operator in the selected bee

With least fitness value

Step 11: Add Scout bee into Employed bee.

/* Cycle ends */

Step 12: Repeat Step 3- Step 11.

Step 13: End

EXPERIMENTAL RESULTS

The proposed approaches deals with the clustering of data based on the ABC algorithm. The method we have proposed incorporates the K-means and FCM operator with the ABC algorithm for obtaining better efficiency. The algorithm is implemented in the JAVA language and executed on a core i5 processor, 2.1MHZ, 4 GB RAM computer.

## Dataset Description

The proposed hybrid clustering algorithm is tested on three different datasets and compared with other optimization algorithms in the literature. The three datasets are namely Iris, Thyroid and Wine datasets taken from UCI Machine Learning Laboratory [23].

The Iris dataset: This data set contains 3 categories of 50 objects, in which each class refers to a type of iris plant. There are 150 instances with four numeric features and no missing attribute value.

The Thyroid dataset: This data set contains three types of 215 patients suffering from human thyroid diseases. Thyroid diseases are tested based on 5 different tests and no missing attribute value.

The Wine dataset: This Data Set is the results of a chemical analysis of 178 wines grown in the same region in Italy, but derived from three different cultivars. Wine type is based on 13 attributes derived from chemical analysis of the wine.

## Comparative Analysis and discussion

Here, we describe the comparative study of the proposed approaches with ABC algorithm. The comparative evaluation is done based on the intra cluster distances of the proposed approaches and that of the traditional ABC algorithm. We have selected three different datasets for comparison, which includes the Iris, the Thyroid and the Wine dataset. The study of the comparison from three datasets has stated that in the best cases, our proposed approach has given a minimum intra cluster distance when compared to the traditional ABC algorithm. The analyses have shown the significance improvement in the intra cluster distance of our approaches in optimization of the clusters.

The detailed comparison is plotted below in Table.1, which shows the response of ABK and AB-FF Algorithms.

Table 1 Comparative analysis based on intra cluster distance

Dataset

Criteria

ABC

ABK

AB-FF

Iris

Average

78.94

70.6

74.4

Worst

78.94

88.2

74.6

Best

78.94

68.4

74.1

Thyroid

Average

10104.03

9952.2

9987.0

Worst

10108.24

11401.2

9542.0

Best

10100.31

9682.4

8866.0

Wine

Average

16260.52

14768.2

8101.0

Worst

16279.46

15039.8

15521.0

Best

16257.28

12234.4

7289.0

Conclusion

We have proposed two new approaches to optimize the clustering process with enhanced ABC algorithm. The ABC algorithm is one of the optimization algorithm recently introduced into the data mining domain.. We have incorporated the k-means operator and FCM operator into the scout bee phase of the ABC algorithm. The experimentation is done with three different datasets namely the Iris dataset, the Thyroid dataset and the Wine dataset. The intra-cluster distance of the proposed approaches are analyzed and compared with traditional ABC algorithm. The comparative analysis has shown that, the proposed approaches have given increased accuracy over the traditional ABC algorithm.