This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
The selection of features has a considerable impact on the success or failure of classification process. Feature selection refers to the procedure of selecting a subset of informative attributes to build models describing data. The main purpose of feature selection is to reduce the number of features used in classification while maintaining high classification accuracy. A large number of algorithms have been proposed for feature subset selection. Here we compare classical sequential methods with the genetic approach in terms of the number of features, classification accuracy and reduction rate. Genetic Algorithm (GA) achieves an acceptable classification accuracy with only five of the available 44 features. The optimal feature such as mean of contrast, mean of homogeneity, mean of sum average, mean of sum variance and range of autocorrelation provide best classification performance. Similar classification performance is obtained with SFFS and SFBS but with larger feature set.
Texture is a fundamental property in many natural images. It plays an important role in pattern recognition and computer vision. Also it helps our visual perception system to understand a scene better and to extract more information from it. In the last few decades [1, 2, 3] texture analysis is investigated by research. Texture analysis proves to be important in many fields such as industrial impaction, document segmentation, remote sensing of each resources and medical imaging.
Texture classification and texture segmentation arise as two major texture analysis problems. If a textured image is segmented it doesn't achieve satisfactory results. The failure to get the desired segmentation results for the textured images is explained by the fact that the distance from which we observe and the visual attention that we give to a texture gives us different interpretations of the texture. Many researchers have tried to solve the problem of texture segmentation by using different features selection techniques such as directional grey-level energy and different segmentation methods like region growing.
In texture classification, image textures are categorized into different classes and observed image will be determined to belong to one of the given set of texture classes.
Researchers employed various ways to solve the problem of texture classification such as co-occurrence matrix , an analysis of the image spectrum using Wigner distribution and Gabor filter, etc. , fractals , rotational invariant operators , the extraction of the structural and stochastic components of an image , model-based method and so on.
Due to the large numbers of features, feature selection procedure seems to be a key to classification problem. This procedure reduces the number of features that need to be collected and provides a better classification accuracy. Feature selection refers to algorithms that output a subset of the input feature set since it is unfeasible to measure all possible features and it is unknown in advance which features will provide the best discrimination between classes. More discriminatory features are kept, retaining sufficient information and maintaining an acceptable classification accuracy. By reducing the number of features, feature selection improves classification speed and efficiency. Genetic algorithm appears as a seductive approach to find optimal solution to optimization problem and to offer an alternative to heuristic tree search methods.
Feature selection methods use various criteria such as T-statistics, F-statistics as statistics criteria, and information-theoretic criteria such as mutual information and entropy-based measure. The diversity of criteria induces substantially different outcomes when applied to same data set.
The goal of this paper is to highlight the value of feature selection in having a considerable effect on the effectiveness of the resulting classification algorithm. Several algorithms are implemented. A taxonomy of feature selection algorithms is presented to show rank of genetic algorithm.
The outline of this paper is organized as follows. In section 2 we will introduce our method based on feature extraction, feature selection and classification. Different feature selection algorithms are presented in the following section. Section 4 we focus our interest on genetic algorithm. Section 5 compares our method using genetic algorithm with other feature selection algorithms with experimental results. Conclusions are presented in the final section.
2. Our Approach
We have developed an automated brain tumor classification method that is made up of three phases: features extraction, dimensionality reduction and finally SVM training and testing. The whole process is shown in the diagram of Figure 1. Our whole approach aims at classifying MR brain images into normal, benign tumor and malignant tumor tissue. To achieve this goal, texture features need to be extracted by using Spatial Gray Level Dependence Method (SGLDM) approach . Having as input extracted features the dimensionality reduction is applied. This step is ensured by GA . The out put of this phase is a selected set of features used as entries to the classification step which will be done via training and testing techniques. These tools enable us to distinguish between the brain tissue, benign tumor and malignant tumor tissue. Among all classification methods we adopt the Support Vector Machines (SVM) as a classifier. Our implementation of SVM is explained by its robustness, its capacity to handle very large feature spaces and to have good generalization properties compared to conventional classifiers.
Figure.1. Overview of our approach
3. Feature Selection Algorithms
Feature selection refers to algorithms that output a subset of the input feature set. Y represents the original set of features and X represents the selected subset that is XY. Feature selection criterion are of crucial importance. They divide feature selection methods into two categories: the filter method and the wrapper one . Whereas the wrapper method uses classification accuracy as feature selection criteria; the filter method employs various measurements as shown in figures xa, xb.
figure.x feature selection methods (a) Filter approach for feature selection (b) Wrapper approach for feature selection
Despite the rapidity of the filter approach, it does not improve the performance of the classification stage. In our paper we use the wrapper method and especially the probability function as a criterion. One possible criterion function is (1-pe), where pe denotes the probability of error. The use of probability of error as a criterion function makes feature selection dependent on the specific used classifier and the size of the training and test data sets.
3.2 Overview of Feature Selection Algorithms
Feature selection algorithms are divided into two main categories: artificial neural networks (ANN), and statistical pattern recognition (SPR) techniques giving the optimal solution or suboptimal feature set. In the suboptimal methods, one can maintain a population of subsets or store a single "current" feature subset and make modifications to it. Also algorithms may be deterministic, producing the same subset on a given problem, or stochastic having a random element to produce different subsets on every run. The figure 2 shows the tree of some representative feature selection algorithms.
Figure 2.Categories of feature selection algorithms
3.2.1 Suboptimal Methods
These methods are not guaranteed to produce the optimal result as they don't examine all possible subsets. They include deterministic, Single-Solution Methods and deterministic, stochastic Multiple-Solution Methods.
220.127.116.11 Deterministic, Single-Solution Methods
They are the most commonly used methods for performing selection. Being referred to as sequential method, deterministic single solution method start with a single solution and iteratively add or remove features until some termination criterion is met. They are split into those that start with the full set and delete features. Kittler  compares these algorithms with the optimal branch-and-bound algorithm by applying a synthetic two-class Gaussian data set. Pudil et al  modify Kittler's comparative study by introducing sequential floating forward selection (SFFS) and sequential floating backward selection (SFBS).
18.104.22.168 Deterministic, Multiple-Solution Methods
They are referred to as "feature selection lattice" since they treat the space of subsets as a graph. Siedlecki and Sklansky  have discussed the performance of "beam search" and a best-first search in the space of feature subsets and induced that both methods maintain a queue of possible solutions.
22.214.171.124 Stochastic, Multiple-Solution Methods
Among Stochastic, Multiple-Solution Methods, we can use the genetic algorithm. GA is an evolutionary method inspired by the natural process of evolutional. It allows a randomised search guided by a certain fitness measure. A feature subset is identified by a particular binary string of length n, with a zero or one in position i denoting the absence or presence of feature i in the set. In each iteration of the algorithm (generation), a fixed number (population) of possible solutions (chromosomes) is generated by means of applying certain "genetic" operators in a stochastic process guided by a fitness measure. Each chromosome is evaluated to determine its "fitness". New chromosomes are created from old chromosomes by the processes of recombination, crossover and mutation which represent the most important genetic operators. Siedlecki and Sklansky  introduced the use of genetic algorithms (GA) for feature selection.
3.2.2 Optimal Methods
Among optimal method brand-and-bound (BB) feature selection algorithm. Narendra and Fukunaga  introduced this algorithm to find the optimal subset of features much more quickly than exhaustive search. Yu and Yuan  modified Narendra and Fukunaga's branch and bound algorithm and introduced BAB+. They showed that BAB+ outperforms the original algorithm both analytically and experimentally. Their modification essentially recognizes all "string-structure subtrees".
4. Genetic Algorithm
Genetic algorithms are stochastic global adaptive search techniques based on the mechanisms of natural selection. GAs comprise a subset of Darwinian evolution-based optimisation techniques. They are introduced by Siedlecki and Sklansky  and later expanded to allow linear feature extraction by Punch et al.  and independently by Kelly and Davis .
Let d be a set of dimensional input patterns, m a dimensional space and (m < d). GA seeks to find a transformed set of patterns in m that maximizes a set of optimization criteria.
A population of competing feature transformation matrices is maintained by GA. The input patterns are multiplied by the matrix in order to evaluate each matrix in this population. The evaluation of matrices produces a set of transformed patterns which are then sent to a classifier. The available samples are divided into a training set, used to train the classifier, and a testing set, used to evaluate classification accuracy. Then GA, being a measure of the quality of the transformation matrix, measures the accuracy obtained. The dimension of the transformed patterns is then reduced increasing the classification accuracy. GA is identified by a particular method of coding the features into string of D bits, where =1 if the feature i is in the subset and included in the classifier; or=0 otherwise. Each resulting subset of features is evaluated according to its classification accuracy on a set of test data . Fitness value is allocated based on a performance measure in classification, e.g. the classification accuracy. The fitness function is represented in the equation (2).
Where is the weight of accuracy and is the weight of N feature participated in classification where N.
Feature reduction aims at improving classification by searching the best features subset. A reduction uses the fixed set of the original features according to a given processing goal and a feature evaluation criterion (i.e classification accuracy). For particular algorithm, accuracy classification is more importance than its execution time.
The effectiveness of the GA in producing desired feature sets was evaluated and compared with several feature selection methods for each dataset. The performance of GA, SFS and SFFS is compared by Ferri et al. . They indicated the superiority of GA performance in very high dimensional problems. Also they showed that the performance of GA decreases as the dimensionality grows.
The performance of SFFS, SFBS, branch and bound algorithm and GA is compared by Siedlecki and Sklansky . They showed the superiority of GA interms of classification performance and computational effort on a synthetic 24-dimensional data set. Different settings of the feasibility threshold and the tolerance margin are tested by Jain et al . The best recognition rate of 78.9% is obtained by using threshold t=0.1 and margin m = 0.05.
5. Experimental results
In order to test the efficiency of our classification method, we employ the real data base relative to 83 images of seven patients (four men and three women) collected from the Harvard Medical School website . The images consist of 29 images normal, 22 malignant tumors suffering from a low grade Glioma, Meningioma and 32 benign tumors suffering from a Bronchogenic Carcinoma, Glioblastoma multiform, Sarcoma and Grade IV tumors. These normal and pathological benchmark images used for classification, are axial, three weighted images (enhanced T1, proton density (PD) and T2) of 256256 sizes and acquired at several positions of the transaxial planes. Patients' ages are ranged from 22 to 81 years.
The data base is divided in two sets. The training set is composed by 12 normal brain images and 20 abnormal images and the remaining slices are used for testing.
To distinguish between benign and malign tumors, 9 benign and 16 malign tumor images are applied in the training phase and the lasting images are employed for testing. The SVM algorithm is implemented for classification in the test phase. Each feature vector, corresponding to a test image, is individually input to the SVM classifier, which produces the corresponding class.
In our approach we have applied Daubechies-2 level 1 wavelet co-occurrence in feature extraction phase  which enables us to extract a total of 44 features including the mean and the range. Then we employ SVM classifier especially RBF kernel . Cross-validation method, as a grid search with 5-folders, is used to fix its optimal parameters and to search the best parameters among an interval of values which achieve a high accuracy during training and testing phases. Hence, the values of =8 and =2, as the best parameters to apply in our implementation.
The efficiency of the classier is tested for 44 extracted features and is found to be 100% for normal and pathological brain . The classification of benign and malignant brain tumors yields an accuracy rate of 98.14%.
Table 1. Results of feature selection by sequential search algorithms (forward and backward) and genetic algorithm
Number of features
Classifier accuracy by normal and pathological brain
Classifier accuracy by benign and malignant brain tumors
Table 1 shows the performance of various feature selection algorithms. In GA-based feature selection method, a population of 30 chromosomes is randomly generated. Each chromosome contains 44 genes (One gene for each feature). The genetic operators, one point crossover and mutation are used. The crossover rate is 90% and mutation rate is 10%. Tournoi selection method is used to select the mating pool. In the case of classification of normal and pathological brain, the sequential floating forward selection (SFFS) method achieves good classification results of 100% with 7 of the available 44 features. This accuracy is similar to that obtained by the sequential floating backward selection (SFBS) method but by using 11 features. The optimal feature set found using GA requires only 5 features to obtain a classification accuracy of 100%. The feature size is reduced by 88.63%. The classifier accuracy for different feature set sizes for the feature selection methods is shown in table 1. For the identification of benign and malignant brain tumors the three selection methods achieve the same classification accuracy 98.14% but they differ in the size of features used. Table 1 displays GA is the best selection method as it uses the least number of features, such as mean of contrast, mean of homogeneity, mean of sum average, mean of sum variance and range of autocorrelation . GA reduces the feature size by 88.63% without compromising the accuracy of the classifier.
In this study we have tried to classify normal, benign and malignant brain tumors using texture analysis for classification. In order to validate our work we test various feature selection algorithms: SFFS, SFBS and GA. Although SFFS and SFBS achieve high classification accuracy, they don't actually reduce the number of features that must be measured for classification.
The experimental results indicate that SFFS is the best among the sequential floating algorithms. The optimal subset of features found using GA reveal that the classification accuracy can be obtained with only five wavelet co-occurrence features. The feature size is reduced by 88.63% using GA. The optimal features such as mean of contrast, mean of homogeneity, mean of sum average, mean of sum variance and range of autocorrelation are selected by GA and they are found to be highly discriminant features. The experiments show that GA is the best feature selection method followed by SFFS and SFBS for the classification of brain tumors in MR images.