Automated Bin Using GLCM GLAM GLCM Approach Computer Science Essay


In this section, we consider the problem of bin level detection from RGB images. The first step in bin level detection is usually to build a robust database. GLCM texture features are then extracted, and the GLCM output is used to train and test an artificial neural network, the multilayer perceptron (MLP), and the k-nearest neighbor algorithm (KNN), which classify new images. This paper focuses on improving the performance of the applied method by optimizing the GLCM.

4.1 Image Database

This work starts with the development of a database from images that represent the bin at different levels. Images of the bin were taken at different levels (low, medium, full, flow and overflow) to make the training and testing of the database more robust, and the results were tested with two classifiers, MLP and KNN. A low-cost camera, as explained in the previous section, was used to obtain the bin images at the various levels. The camera resolution is 800 × 600 pixels with 24-bit color. The database is updated from time to time as new images are added. The images are received via a GPRS connection, as explained in the methodology, and stored in JPG format. Figure 2 shows some sample bin images from the database that represent the bin at different levels.

Figure (2): samples of Bin database at different levels (empty, low, medium, full, flow and overflow)

GLCM and Haralick's texture features

The gray level co-occurrence matrix (GLCM) provides a second-order method for generating texture features (Haralick et al., 1973). The GLCM describes the conditional joint probabilities of all pairwise combinations of grey levels in the image, given two parameters: displacement (d) and orientation (θ) (Gonzalez and Woods, 1993). The GLCM can be calculated as a symmetric or non-symmetric matrix. It is often defined to be symmetric, that is, a pair of grey levels (i, j) oriented at θ = 0° is also considered as being oriented at θ = 180°, so that entries are made at both (i, j) and (j, i). Each GLCM is dimensioned to the number of quantized grey levels (G) (Clausi, D.A., Jernigan, M.E 1998). Applying statistics to a GLCM generates different texture features. Before texture features can be calculated, the measures require that each GLCM entry contain not a count but a probability. The GLCM is therefore normalized by dividing the number of times each outcome occurs by the total number of possible outcomes. The probability measure can be defined as:

$$\Pr(x) = \{ C_{ij} \mid (d, \theta) \}$$

where Cij, the co-occurrence probability between grey levels i and j, is defined as:

$$C_{ij} = \frac{P_{ij}}{\sum_{i,j=1}^{G} P_{ij}}$$

where Pij represents the number of occurrences of grey levels i and j within the given d and θ, and G is the quantized number of grey levels.
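The counting and normalization described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study: the function name `glcm`, the mapping from θ to pixel steps, and the small test image are assumptions made for the example, and the image is assumed to be already quantized to G grey levels.

```python
def glcm(image, d=1, theta_deg=0, levels=8, symmetric=True):
    """Return a normalized GLCM (list of lists) for one displacement and orientation.

    image: 2-D list of integers already quantized to range(levels).
    """
    # Assumed mapping from orientation to a (row, column) step; rows grow downwards.
    steps = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    dr, dc = steps[theta_deg]
    dr, dc = dr * d, dc * d

    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1        # P_ij: occurrence count
                if symmetric:
                    counts[j][i] += 1    # same pair seen at theta + 180 degrees

    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this displacement")
    # C_ij = P_ij / sum of all P_ij
    return [[p / total for p in row] for row in counts]


# Small 4-level example image (hypothetical data, not from the bin database).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
C = glcm(image, d=1, theta_deg=0, levels=4)
```

With `symmetric=True` every pair is counted in both directions, so the resulting matrix equals its transpose and its entries sum to 1.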

Haralick [1] proposed 14 statistical features extracted from gray level co-occurrence matrices to estimate the similarity between them. To reduce the computational complexity, only some of these features were selected. [N. Otsu,] stated that only six of the textural features are considered the most relevant: Energy, Entropy, Contrast, Variance, Correlation and Inverse Difference Moment.

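We used ten textural features in our study. The following equations define these features. Let Cij be the (i, j)th entry in a normalized GLCM. The means and standard deviations for the rows and columns of the matrix are

$$\mu_x = \sum_{i} \sum_{j} i \, C_{ij}, \qquad \mu_y = \sum_{i} \sum_{j} j \, C_{ij}$$

$$\sigma_x = \sqrt{\sum_{i} \sum_{j} (i - \mu_x)^2 \, C_{ij}}, \qquad \sigma_y = \sqrt{\sum_{i} \sum_{j} (j - \mu_y)^2 \, C_{ij}}$$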

The features are as follows.







Cluster Shade:

Cluster Prominence

Maximum Probability
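As an illustration of how the ten features can be computed from a normalized GLCM, the sketch below uses the definitions commonly given in the GLCM literature. These definitions are assumptions, not a reproduction of the equations used in this study: energy is taken as the sum of squared entries, homogeneity as the sum of Cij / (1 + (i − j)²), and indices start at 0, which changes the value of autocorrelation compared with 1-based indexing. The function name `texture_features` is hypothetical.

```python
import math

def texture_features(C):
    """Compute ten texture features from a normalized GLCM C (list of lists)."""
    n = len(C)
    cells = [(i, j, C[i][j]) for i in range(n) for j in range(n)]

    # Row and column means and standard deviations.
    mu_x = sum(i * p for i, j, p in cells)
    mu_y = sum(j * p for i, j, p in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * p for i, j, p in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * p for i, j, p in cells))
    covariance = sum((i - mu_x) * (j - mu_y) * p for i, j, p in cells)

    return {
        "energy": sum(p * p for _, _, p in cells),
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        "entropy": -sum(p * math.log(p) for _, _, p in cells if p > 0),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "autocorrelation": sum(i * j * p for i, j, p in cells),
        # Correlation is undefined for a constant image; 0.0 is returned in that case.
        "correlation": covariance / (sd_x * sd_y) if sd_x * sd_y > 0 else 0.0,
        "cluster_shade": sum((i + j - mu_x - mu_y) ** 3 * p for i, j, p in cells),
        "cluster_prominence": sum((i + j - mu_x - mu_y) ** 4 * p for i, j, p in cells),
        "maximum_probability": max(p for _, _, p in cells),
    }


# Two-level GLCM with all mass on the diagonal (hypothetical example).
features = texture_features([[0.5, 0.0], [0.0, 0.5]])
```

For this diagonal matrix, contrast and dissimilarity are zero and homogeneity is one, because every pixel pair has identical grey levels.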

There are many important factors to consider when designing a GLCM. These factors are the quantization level (G), the displacement value (d) and the orientation value (θ). Many studies have investigated different aspects of co-occurrence texture features related to these factors and their usability in particular applications. These investigations produce recommendations on how to set the necessary parameters, including G, d, θ and the statistics used. In this paper, the roles of G and d are tested, while θ is not discussed, since it is accepted by many researchers that (0°, 45°, 90°, 135°) provide more accurate classifications [Q. A. Holmes, D. R. Nuesch 1984 and M. M. Trivedi, C. A. Harlow 1984].

GLCM Design

Quantization (G)

The number of gray levels is a crucial factor in the computation of the GLCM. The decision to be made is which value of G represents the textures successfully. When the value of G is smaller, the computation is faster and the noise is reduced, but less information is obtained (D. Haverkamp, L.-K. Soh, 1995). The gain in information in the presence of noise does not compensate for the loss of information as a result of quantization (M. E. Shokr 1991). D. Haverkamp (1995) investigated feature values produced by individual statistics as a function of quantization level. (Clausi, D.A., Zhao, 2002) investigated how the quantization level affects the output of the GLCM. Choosing the right value of G is expected to increase classification accuracy and texture feature separability. In this paper, different values of G are investigated to obtain the best classification result.
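To make the role of G concrete, the sketch below reduces grey values to G levels with equally sized bins. The uniform binning, the 8-bit input range and the function name `quantize` are assumptions for illustration; the text does not state which quantization scheme was used.

```python
def quantize(image, levels, max_value=255):
    """Map integer grey values in 0..max_value onto 0..levels-1 using uniform bins."""
    return [
        [min(pixel * levels // (max_value + 1), levels - 1) for pixel in row]
        for row in image
    ]


# With G = 8, each bin covers 32 consecutive 8-bit grey values.
quantized = quantize([[0, 31, 32, 255]], levels=8)
```

A smaller G gives a smaller GLCM (G × G entries), which is where the saving in computation comes from.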

Displacement (d)

Displacement (d) is the second crucial parameter in the computation of the GLCM. Applying a large value of d to the image produces GLCM features that do not capture all the information in the image. (D. W. Chen, S. K 1989) investigated values of d from 1 to 64 and found that the classification accuracies with d = 1, 2, 4 and 8 were basically the same when classifying cloud images. For large d values, however, the authors found that the classification accuracy dropped. They also found that the classification accuracy was best when using d values of 1 and 2. Another study showed that d=1 produces a better classification result than d=5 and d=9 (D. G. Barber and E. F. LeDrew 1991). Another experiment, conducted by (Shokr 1991), investigated d = 1, 2 and 3 and found the best classification accuracy at d=2. In this research, d values from 1 to 4 are investigated to measure the classification accuracy in identifying the bin level.

Orientation (θ)

The orientation is the third parameter in GLCM computation; however, it is considered less important than the other GLCM parameters (D. M. Smith, E. C. Barett 1995). Several studies have investigated the effect of θ on texture feature classification. R. W. Conners, M. M. Trivedi (1989) conducted an experiment using θ values of 0°, 75°, 90°, 109° and 165°. (R. M. Haralick 1973 and R. P. Kruger, W. B. Thompson 1974) used θ = 0°, 45°, 90° and 135°. However, Barber and LeDrew (1991) determined that the θ values 0°, 45° and 90° produce results with greater statistical significance and give high classification accuracy. In this study, we set θ to (0°, 45°, 90°, 135°) in all cases.


Multilayer Perceptron (MLP)

The artificial neural network (ANN) is widely recognized as a useful classification technique for pattern recognition (Huang 2009). With its capabilities of fault tolerance and learning, it can be used to detect the bin level. Many kinds of ANN models have been developed. Among the models studied is the back-propagation (BP) neural network. In general, a BP network is multilayer, fully connected and feed-forward. The first and last layers are called the input and output layers. The layers between the input and output layers are called the hidden layers (Lin 2008). Input vectors and the corresponding target vectors are used to train a BP network to develop internal relationships between nodes so as to organize the training data into classes of patterns. This same internal representation can be applied to inputs that are not used during training (Li W 2000). The trained BP network tends to give reasonable answers when presented with inputs that the network has never seen. This generalization property makes it possible to train a network on a representative set of input/target pairs and get good results without training the network on all possible input/output pairs (Sedki 2009).

Extracting co-occurrence features is a key step for the classification. In this experiment, a three-layer feed-forward network trained with back-propagation (BP) is used. In total, 250 bin images covering the individual classes of low, medium, full, flow and overflow were classified. All images were processed with the GLCM, and the selected features were saved to a file before neural network training.

KNN classifier

The k-nearest neighbor algorithm (KNN) is a supervised learning method that has been used in various applications such as data mining, statistical pattern recognition and image processing.

KNN is a method for classifying objects based on the closest training examples in the training dataset. An object is classified by a majority vote of its neighbors. The neighbors are taken from a set of objects for which the correct classification is known. Many distance measures can be used with KNN, such as Euclidean distance and Manhattan distance.

In this research, Euclidean distance is used as the distance measure. Euclidean distance is the most popular distance measure. It is the square root of the sum of the squared differences between the coordinates of a pair of objects, which follows from the Pythagorean theorem.

The algorithm of computing KNN is as follows:

1. Determine the value of K.

2. Calculate the distance between the query image and all the training images.

3. Sort the distances for all the training samples and determine the nearest neighbors based on the K-th minimum distance.

4. Since this is supervised learning, collect the categories of the training samples that fall within the K nearest.

5. Use the majority category of the nearest neighbors as the prediction value.
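The steps above can be sketched as follows. This is a minimal illustration using Euclidean distance on plain feature vectors; the function name `knn_predict` and the two-dimensional toy data are assumptions, whereas the study uses GLCM feature vectors. When two categories receive the same number of votes, this sketch returns the one whose neighbor is closest, which is a design choice the text does not specify.

```python
import math
from collections import Counter

def knn_predict(query, train_features, train_labels, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every training sample, paired with its label.
    distances = [
        (math.dist(query, sample), label)
        for sample, label in zip(train_features, train_labels)
    ]
    # Sort by distance and keep the k nearest.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote; ties resolve to the label encountered first (the closest).
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training set with two well-separated classes (hypothetical values).
train_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
train_labels = ["low", "low", "low", "full", "full", "full"]

prediction = knn_predict([0.05, 0.05], train_features, train_labels, k=3)
```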

5.1 Training Setup

The MLP used in this study has a [10 5 2] structure for the input, hidden and output layers. The network is formulated as a three-layer network with the hyperbolic tangent sigmoid transfer function, since its output range suits learning the bipolar output values, i.e. -1, 0 and 1. The input layer has 10 nodes, corresponding to the ten texture features. The output layer has 2 nodes, so that the output values [1 0], [1 1], [1 -1], [-1 0] and [-1 1] correspond to the five sets of bin level images. The first number, 1 or -1, represents the class of the bin (waste inside or outside the bin). The second number represents the grade of the waste: inside the bin it takes the values 0, 1 and -1, which vary from low to full, and outside the bin it takes the values 1 and 0, which mean flow and overflow. The hidden layer has 5 nodes, a number confirmed by testing. The training function of the BP neural network is gradient descent with momentum and an adaptive learning rate. The learning algorithm for the connection weights and the threshold values is a momentum-learning algorithm based on gradient descent.

For KNN, the values of K used in this study are 3, 5 and 7. The three K values are investigated using Euclidean distance to measure the distance between the query image and the training dataset. The database is divided into 5 classes of bin level images; class 1, class 2, class 3, class 4 and class 5 represent the five levels of the bin: low, medium, full, flow and overflow. Twenty images from each class were selected to form the training database. The training images were processed with the GLCM, and the selected features were saved to a file. When a query image needs to be classified, it is processed with the GLCM and its features are compared with the features in the training dataset.
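The [10 5 2] layout and the two-value output code can be illustrated with a forward pass. This is a sketch under stated assumptions, not the trained network from the study: the weights are random, tanh is assumed on both the hidden and output layers, training (gradient descent with momentum) is omitted, and all names (`TARGETS`, `init_layer`, `layer_forward`, `mlp_forward`) are hypothetical.

```python
import math
import random

# Target codes (class, grade) for the five bin levels, as listed in the text.
TARGETS = {
    "low": (1, 0),
    "medium": (1, 1),
    "full": (1, -1),
    "flow": (-1, 1),
    "overflow": (-1, 0),
}

def init_layer(n_in, n_out, rng):
    """Random weights in [-0.5, 0.5]; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_forward(weights, inputs):
    """Weighted sum plus bias for each node, passed through tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
        for row in weights
    ]

def mlp_forward(features, hidden, output):
    """Propagate 10 input features through 5 hidden nodes to 2 output nodes."""
    return layer_forward(output, layer_forward(hidden, features))


rng = random.Random(0)           # fixed seed so the sketch is repeatable
hidden = init_layer(10, 5, rng)  # 10 inputs -> 5 hidden nodes
output = init_layer(5, 2, rng)   # 5 hidden nodes -> 2 outputs (class, grade)
class_value, grade_value = mlp_forward([0.1] * 10, hidden, output)
```

Because tanh is bounded, both outputs always lie between -1 and 1, which is what allows the target codes above to be learned directly.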

5.2 MLP and KNN Evaluation using Receiver operating characteristic (ROC)

ROC analysis is applied to find out the real performance of the network. ROC analysis is related in a direct and natural way to the cost/benefit analysis of decision making. It originated in signal detection theory. There are four possible outcomes from a binary classifier. If the outcome of a prediction is p and the actual value is also p, it is called a true positive (TP); however, if the actual value is n, it is called a false positive (FP). Conversely, a true negative (TN) occurs when both the prediction outcome and the actual value are n, and a false negative (FN) occurs when the prediction outcome is n while the actual value is p. The limitations of accuracy as a measure of decision performance require the introduction of concepts such as the sensitivity (true positive rate, TPR) and the specificity (true negative rate, equal to one minus the false positive rate, FPR) of a test. These measures can be expressed by (8) and (9):

$$\text{Sensitivity} = \text{TPR} = \frac{TP}{TP + FN} \qquad (8)$$

$$\text{Specificity} = \frac{TN}{TN + FP} = 1 - \text{FPR} \qquad (9)$$
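As a small worked example of these measures, the sketch below counts the four outcomes for one class treated as positive and derives sensitivity and specificity from them. The function name `sensitivity_specificity` and the label values are hypothetical; returning 0.0 when a denominator is zero is a choice made for the example.

```python
def sensitivity_specificity(actual, predicted, positive):
    """Return (sensitivity, specificity) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted p, actually p
        elif p == positive:
            fp += 1          # predicted p, actually n
        elif a == positive:
            fn += 1          # predicted n, actually p
        else:
            tn += 1          # predicted n, actually n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical inside/outside predictions for five images.
actual = ["inside", "inside", "outside", "outside", "inside"]
predicted = ["inside", "outside", "outside", "inside", "inside"]
sens, spec = sensitivity_specificity(actual, predicted, positive="inside")
```

Here two of the three inside images are detected (sensitivity 2/3) and one of the two outside images is correctly rejected (specificity 1/2).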

In this experiment, we created and compared three different GLCM databases. The first co-occurrence feature database (DB1) consists of 10 features (energy, contrast, entropy, autocorrelation, homogeneity, correlation, dissimilarity, cluster shade, cluster prominence and maximum probability) derived from co-occurrence matrices calculated from the original bin images. As one of our objectives is to reduce the computational complexity, the number of features has to be reduced while the efficiency is maintained. We reduced the number of features from 10 to 7: the second database (DB2) contains contrast, entropy, homogeneity, correlation, cluster shade, cluster prominence and maximum probability. To find the smallest feature combination with similar classification accuracy, the third database (DB3) was created; it contains contrast, entropy, homogeneity and correlation. All the features were chosen on the basis of their behavior. Furthermore, these features have been investigated by many researchers for different applications, and we were able to utilize them in bin level detection. In order to reduce the computational complexity, as mentioned in Ref [7--9], we select the five following representative features.

Decision algorithm

The bin level detection system provides a framework to classify the various bin levels and make a fast decision. The output values are combined in IF/THEN rules that make the classification decision and determine the class to which the bin belongs.

The bin level detection classifies the level of the waste as inside or outside the bin. Inside the bin covers low, medium and full; outside covers flow and overflow. The proposed decision algorithm is based on the level of the bin. Once the bin image is collected and sent to the control station, the image is tested by GLCM.

The system detects the level of waste in the bin as inside or outside. The output of the system for the inside level is the neural network matrix values (1 0), (1 1) and (1 -1); the first value, 1, represents the class (inside the bin) and the second value, 0, 1 or -1, represents the grade (low, medium and full).

If the neural network matrix result is (1 0), the program classifies the bin as inside bin level low.

If the neural network matrix result is (1 1), the program classifies the bin as inside bin level medium.

If the neural network matrix result is (1 -1), the program classifies the bin as inside bin level full.

The output of the system for the outside level is the neural network matrix values (-1 1) and (-1 0); the first value, -1, represents the class (outside) and the second value, 1 or 0, represents the grade (flow and overflow).

If the neural network matrix result is (-1 1), the program classifies the bin as outside bin level flow.

If the neural network matrix result is (-1 0), the program classifies the bin as outside bin level overflow.

The following rules are used in the decision algorithm:

If Class > 0.8 and -0.6 < Grade < 0.6, the program classifies the bin as Inside Bin Level Low.

If Class > 0.8 and Grade > 0.8, the program classifies the bin as Inside Bin Level Medium.

If Class > 0.8 and Grade < -0.6, the program classifies the bin as Inside Bin Level Full.

If Class < -0.8 and Grade > 0.8, the program classifies the bin as Outside Bin Level Flow.

If Class < -0.8 and -0.6 < Grade < 0.6, the program classifies the bin as Outside Bin Level Overflow.

If none of the above applies, the bin is set as nothing.
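The rules can be written as a small function. This sketch reads the two range conditions as -0.6 < Grade < 0.6 and treats the last rule as the outside (overflow) case, consistent with the output codes (1 0) and (-1 0) listed earlier; that reading, the function name `classify_bin` and the returned strings are assumptions for illustration.

```python
def classify_bin(class_value, grade_value):
    """Map the two network outputs to a bin level using the threshold rules."""
    if class_value > 0.8:                      # waste inside the bin
        if -0.6 < grade_value < 0.6:
            return "Inside bin, level low"
        if grade_value > 0.8:
            return "Inside bin, level medium"
        if grade_value < -0.6:
            return "Inside bin, level full"
    elif class_value < -0.8:                   # waste outside the bin
        if grade_value > 0.8:
            return "Outside bin, level flow"
        if -0.6 < grade_value < 0.6:
            return "Outside bin, level overflow"
    return "nothing"                           # no rule matched


# Outputs close to the target code (1, -1) are classified as full.
level = classify_bin(0.93, -0.88)
```

Outputs that fall in the gaps between thresholds, for example a grade between 0.6 and 0.8, match no rule and are reported as "nothing".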


Based on the literature review that has been discussed in the paper, the following research questions are raised.

1. Choosing suitable features that can represent the texture.

2. Selecting suitable parameters d and G that give the best classification result.

3. Measuring the classification accuracy of the chosen features with different classifiers.

By answering the above questions, we can nominate a preferred set of statistics for consistent automated bin collection and improve the classification accuracy by choosing the best values of G and d and a strong classifier.

Experimental Results and Discussion

Features analysis

The first step in this research is to investigate the values of the statistical features. We tested d values from 1 to 40 with G = 8, 16, 32, 128 and 256 to compare texture values across all the values of d and G, with θ fixed to 0°, 45°, 90° and 135°. Each texture feature f is normalized by its maximum value:

$$f_{\text{norm}} = \frac{f}{\max(f)}$$

Figs. 11-18 show the graphs of texture values versus quantization schemes.

We investigated the behavior of the features with respect to the d, G and θ values, to get a hint of which features might be chosen to form robust feature databases. The purpose of the investigation is to choose fewer features that still give a high classification result in bin level detection.

It is noticeable from figure (contrast) and figure (Dissimilarity) that, for all the values of d and G, the displacement curves for contrast are almost identical, and Dissimilarity follows the same behavior. This indicates that quantization does not affect these two textural features. The equations for Contrast and Dissimilarity differ only in their weighting, (i − j)² for Contrast and |i − j| for Dissimilarity, so the two features have different ranges of values but essentially contain the same information. For that reason, Dissimilarity has been eliminated as one of the factors in the bin classification. Figure (correlation) shows that the displacement curves for correlation are almost identical too. The three statistical features show no change as G changes, which means that the three features (contrast, Dissimilarity and correlation) are not affected by the value of G.

Figure (3): Contrast for all d values across all G values

Figure (4): Dissimilarity for all d values across all G values

Figure (5): Correlation for d values across all G values

As can be seen from the figures ( ) that illustrate the displacement curves for energy, correlation, homogeneity, entropy and autocorrelation, the behaviors of these features look similar in terms of the locations of slopes along the curves. The slopes differ in degree, which is probably caused by the reduction of resolution due to the value of G. This shows that, although the reduction of information is observable, the behavior of the natural structure among pixels for these textural features can still be captured and represented with the GLCM.

Figure (6): Autocorrelation for all d values across all G values

Figure (7): Entropy for all d values across all G values

Figure (8): Energy for all d values across all G values

Figure (9): Homogeneity for d values across all G values

Figure (10): Maximum probability for all d values across all G values.

Figure (11): Inverse difference normalized for all d values across all G values.

Figure (12): Inverse difference moment normalized for all d values across all G values

Figure (13): Cluster prominence normalized for all d values across all G values

Figure (15): Cluster shade normalized for all d values across all G values

Clausi (2002) worked on classification using texture of SAR sea ice imagery. He analyzed correlation among textures to determine the best subset of texture measures. He found that Contrast, Correlation and Entropy used together outperformed any one of them alone, and also outperformed using these and a number of others all together. If only one can be used, he recommends choosing among Contrast, Dissimilarity, Inverse Difference Moment Normalized or Inverse Difference normalized.

As explained in the previous section, because of the similarity between the contrast and dissimilarity information, we eliminated Dissimilarity from the features representing the bin images in the 7-feature and 4-feature selections; likewise, Inverse difference moment normalized behaves like the autocorrelation curve. Contrast, entropy, homogeneity and correlation were chosen to form the 4-feature set because they behave differently and are independent of each other.

At all the G values tested, the shape of the output graph is well preserved. It can be concluded that G can be set to any value as long as the computation time is acceptable.

Parameter Analysis

In order to distinguish the bin level, suitable GLCM parameters d and G have to be analyzed and chosen. The value of G does not affect the feature output of the images, so G is studied with different d values (1, 2, 3, 4) to investigate the feature output and its effect on the classification accuracy. Due to the computational time cost, we exclude G = 128 and 256. Instead of choosing multiple values of G, we studied each value of G with each value of d to generate the co-occurrence matrices. Each co-occurrence matrix must be analyzed to select the appropriate values of d and G for classifying and detecting the bin level.

Five sample sets, one from each of the five levels, were selected for the experiment. Four d values from 1 to 4 and four G values (8, 16, 32, 64) were used to measure the effect of these parameters on the suitability of the various features and the separability of the five sets. Ten features (energy, contrast, entropy, autocorrelation, homogeneity, correlation, dissimilarity, cluster shade, cluster prominence and maximum probability) were derived from co-occurrence matrices calculated from the original bin images. The feature vectors were clearly affected by the d value when tested with different G values. (Figures 4, 5, 6 and 7) show the feature output for the five sets as d increases from 1 to 4 with G=8. To reduce the computational load, G should be made as small as possible, as long as adequate distinguishable information can be provided for a high-level classifier (Wong 2009). By analyzing the curves, d=1 was determined to be the most suitable for calculating the co-occurrence matrices. It is clear from the figures that the 5 sets are well separated at d=1. The G value affects the amount of computation needed to create the co-occurrence matrices as well as the classification accuracy. Generally, a larger G picks up more detail, but the noise increases. For real-time applications, the G value must be minimized.

[Figure panels: 10 features at d=1, G=8; d=2, G=8; d=3, G=8; d=4, G=8]

The mission of this paper is to obtain better features for the bin level and its surrounding area and to achieve a better classification result. The desirable d value is 1, and G = 8, 16, 32 and 64 are the most suitable values to apply with the GLCM. At all values of G with d=1, the feature output is affected as G increases. Figures (8, 9) show the GLCM output for 10 features with d=1 and G values of 8 and 64; as can be seen from the figures, all five sets are well separated in both cases. Another test is needed to choose the best GLCM model.

[Figure panel: 10 features at d=1, G=64]


In this section, we present the experimental results. As discussed earlier, we conducted experiments using two learning methods, MLP and KNN, with 3 databases. The performance of each classifier was measured by repeatedly dividing the data into training and testing sets and averaging the classification performance on each test set.

To evaluate the performance of the texture features in the classification task, parameter selection analysis was employed to select the parameters that best separated the bin levels. To reduce the number of combinations analyzed at once, each G value was initially analyzed separately. That is, for each G value in the 10-feature database, with d = 1, 2, 3, 4 and G = 8, 16, 32, 64, each d value was tested using the MLP classifier to confirm our parameter analysis, which chose d=1 with these G values as the best parameter values. The value of θ was fixed to 0°, 45°, 90° and 135° in all the databases. We were able to select the best feature combination and an effective way of utilizing the d and G values.

Table (1) Classification accuracy (%) of MLP based on 10 features selected with various values of d and G

The classification results in Table (1) show that the best performance occurs when d=1 for all G values. The correct classification rate is more than 95% for class classification and more than 80% for grade classification. The classification accuracy decreases as the displacement value increases, which verifies our analysis in choosing d=1 with all G values. Following this analysis, all of the features listed in Table (1) were then considered to determine which set of features yielded the best performance across all combinations.

The performance of the selected features DB1, DB2 and DB3 at d=1 and G=8, 16, 32, 64 was evaluated by training and testing the MLP and KNN classifiers. Once a classifier was trained, it was tested in two ways. The first was to test on the same cases that were used for training, to check the classifier performance; after that, the entire set of cases was tested using MLP. This is repeated until all cases have been left out. The performance of the classifiers was evaluated by comparing the test results to each other (detecting the bin level in the form of a classification matrix). From the classification matrix, the percentage of bins correctly classified was calculated, as well as the sensitivity and specificity. Performance was evaluated based on the percentage of correct classifications as well as an ROC analysis in which the area under the ROC curve was obtained.
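The "repeated until all cases have been left out" procedure reads like leave-one-out testing, which can be sketched as below. This interpretation is an assumption, as are the function names and the toy data; a simple nearest-neighbor rule stands in for the trained classifier so that the example is self-contained.

```python
import math

def nearest_neighbor(query, train_x, train_y):
    """Stand-in classifier: return the label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    return train_y[best]

def leave_one_out_accuracy(features, labels, predict):
    """Hold out each case once, classify it from the rest, and average the hits."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]   # all cases except case i
        train_y = labels[:i] + labels[i + 1:]
        if predict(features[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(features)


# One-dimensional toy features for two classes (hypothetical values).
features = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = ["inside", "inside", "inside", "outside", "outside", "outside"]
accuracy = leave_one_out_accuracy(features, labels, nearest_neighbor)
```

Any classifier with the same `predict(query, train_x, train_y)` signature can be passed in place of `nearest_neighbor`.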

Tables (2) to (7) show the classification accuracy in relation to the use of 10, 7 and 4 features. Each table gives the result for a different combination of classifier and number of selected features. The choice of selected features and G values significantly affects the classification performance. The number K of nearest neighbors used by the KNN classifier and the value of G are also provided in the first column of these tables.

Table (2) and Table (3) demonstrate that the KNN classifiers with DB1 outperform the MLP with the same database. For instance, with G=32 and G=64 the KNN has a classification accuracy of 94% and 92.80%, both beating the MLP performance (95.88% and 81.93% at G=32, and 94.50% and 82.10% at G=64, for class and grade respectively). For MLP, DB1 produces better results than KNN in class classification only; in the overall classification performance in identifying the bin level, KNN performs better than MLP.

Table (2) Classification accuracy (%) of MLP based on DB1 (10 features selected) with various values of d and G

10 features: d=1 G=8, d=1 G=16, d=1 G=32, d=1 G=64

80.80%

Table (3) Classification accuracy (%) of KNN based on DB1 (10 features selected) with various values of d and G

Using DB2 (7 features selected), Table (4) and Table (5) show the classification rates produced by the MLP and KNN classifiers. The classification results demonstrate that the MLP classifier performs better with DB2 than the KNN with the same database. For instance, the classification results obtained at G=16 were 93.43% and 80.63% for the MLP class and grade, and 90.40% for the KNN.

Table (4) Classification accuracy (%) of MLP based on DB2 (7 features selected) with various values of d and G

7 features: d=1 G=8, d=1 G=16, d=1 G=32, d=1 G=64

Table (5) Classification accuracy (%) of KNN based on DB2 (7 features selected) with various values of d and G

Table (6) and Table (7) show the classification performance for DB3 (4 features selected) with both the MLP and KNN classifiers. The results demonstrate that DB3 with the KNN classifier performs better than the MLP with the same database. For instance, with G=16 and G=32 the KNN has a classification accuracy of 94.40% and 95.20%, both exceeding the MLP performance (90.48% and 77.30% at G=16, and 93.38% and 87.03% at G=32, for class and grade respectively). For MLP, the classification result at G=32 was acceptable and can be used in a real application.

Table (6) Classification accuracy (%) of MLP based on DB3 (4 features selected) with various values of d and G

4 features: d=1 G=8, d=1 G=16, d=1 G=32, d=1 G=64

79.20%

Table (7) Classification accuracy (%) of KNN based on DB3 (4 features selected) with various values of d and G

4. Discussion and conclusions

The objective of this research is to classify the level of the bin images with effective and efficient GLCM parameters. We studied the effects of GLCM parameters on the performance of co-occurrence matrix texture measures in detecting and classifying the bin level (low, medium, full, flow and overflow). We have shown that these parameters can significantly change the performance of MLP and KNN classifiers using co-occurrence matrices. Using different d and G values with the MLP classifier, we obtained performances that ranged from 97.80% to 85.94% for class classification. The performance for grade classification ranged from 87.15% to 70.63%.

We were concerned that d=1 might not be the best selection; training and testing each d value with each G value confirmed our analysis and our selection of this parameter. All the results obtained from the MLP classifier confirmed that d=1 is the best displacement value. The best classification results in all the databases (DB1, DB2 and DB3) were obtained when d=1. The other displacement values were eliminated due to the poor separability between the 5 sets and the poor classification. The quantization value G=8 was eliminated for the real application because of the reduction in the information obtained from the bin image, and because it gave the worst classification results in all the databases.

When the features from all the databases at the different G values were analyzed, the features that gave the best results (contrast, entropy, homogeneity and correlation) were selected instead of the 10 and 7 features from DB1 and DB2. The best classification result from all the databases was obtained using MLP at d=1 and G=32, where the classification accuracy for class and grade was 93.38% and 87.03% respectively. Some class results are better than 93.38%, but class represents only inside or outside the bin; the classification ultimately relies on the grade classification.

Another task of this research is to show that reducing the features to 4 does not significantly affect the classification accuracy compared to the use of the 10 features. The above experimental evaluation provides a solid experimental grounding for the design of effective and efficient MLP and KNN classifiers. In particular, the classification results using DB2 (7 features selected) show that the MLP classifiers perform better than the KNN classifiers (97.80%, 84.65% and 78.80% respectively).

Based on this observation, the classification results show that the classifiers using DB3 (4 selected features) outperform those using the 7 and 10 features for MLP and KNN at d=1 and G=32. The KNN with K=3 and G=32 has a classification accuracy of 95.20%, beating the KNN that uses K=5 and K=7 with all G values in DB1, DB2 and DB3.


This paper has presented a study on bin level image classification. Effective parameters and a feature selection approach are used in combination with MLP and KNN classifiers to perform classification. This is supported by comparative investigations involving the use of 10, 7 and 4 features with various G values. Both types of classifier using DB3 (4 selected features) perform well at d=1 and G=32. These results show the value of feature selection in removing similar feature measures (as fewer features may even lead to higher classification accuracy). Together with the observations on MLP and KNN classification, the reduction of the features has improved the classification accuracy.

Results of recent research suggest that carefully designed multilayer neural networks with local "receptive fields" and shared weights may be unique in providing low error rates on handwritten digit recognition tasks. This study, however, demonstrates that these networks, radial basis function (RBF) networks, and k nearest-neighbor (kNN) classifiers, all provide similar low error rates on a large handwritten digit database. The backpropagation network is overall superior in memory usage and classification time but can provide "false positive" classifications when the input is not a digit. The backpropagation network also has the longest training time. The RBF classifier requires more memory and more classification time, but less training time. When high accuracy is warranted, the RBF classifier can generate a more effective confidence judgment for rejecting ambiguous inputs. The simple kNN classifier can also perform handwritten digit recognition, but requires a prohibitively large amount of memory and is much slower at classification. Nevertheless, the simplicity of the algorithm and fast training characteristics make the kNN classifier an attractive candidate in hardware-assisted classification tasks. These results on a large, high input dimensional problem demonstrate that practical constraints including training time, memory usage, and classification time often constrain classifier selection more strongly than small differences in overall error rate.