This chapter presents the experimental results of the system developed in Chapter 5. Section 6.1 presents and discusses the experimental results of the proposed framework: the SVM training results relating to the memorization and learning of the binary SVM classifier, followed by the SVM testing and validation results for unseen samples, where different datasets are evaluated in order to generalize the performance of the developed system. Section 6.2 then presents a comparative study, evaluating the developed system using machine learning algorithms other than the SVM. Experimental results of the compared machine learning models are discussed in the last part of the chapter.
6.1 Experimental Results of Proposed Framework
6.1.1 Image Segmentation Performance Indices
The accuracy of the mammogram segmentation algorithm (Stage 1 in Figure 5.2) in this research is evaluated by comparing each segmented mammogram "mask" with its corresponding "gold standard" and deriving quantitative measures from the comparison. The gold standard is obtained by manually segmenting the breast region from the background for each mammogram image. The boundary of the breast is then manually traced to extract the real breast region, which results in a Ground Truth (GT) image as shown in Figure 6.1.
Quantitative measures based on Receiver Operating Characteristics (ROC) (Section 3.3.6) are used to describe the accuracy of the mammogram segmentation process. The region extracted by the segmentation algorithm (the mask) that matches the GT image is denoted as the True Positive (TP) pixels, emphasizing that the algorithm has indeed found a portion of the breast. Pixels present in the GT image but absent from the mask are defined as False Negative (FN) pixels, which are considered the missing pixels in the breast region. Conversely, pixels present in the mask but not in the GT image are False Positive (FP) pixels.
Figure 6.1: Image segmentation performance indices: TP, FP and FN
Using the mammogram segmentation performance indices (TP, FP and FN) in Figure 6.1, two metrics relating to segmentation performance are derived, namely Completeness (CM) and Correctness (CR). In mammogram segmentation, CM is the percentage of the GT region that is explained by the segmented region, using the following expression:

CM = TP / (TP + FN) (6.1)
CM ranges from 0 to 1, with 0 indicating that none of the regions are properly partitioned and 1 indicating that all the regions were segmented. For example, a value of CM = 0.92 indicates a 92 percent overlap with the GT image. Similarly, CR represents the percentage of correctly extracted breast region, using the following expression:

CR = TP / (TP + FP) (6.2)
Similar to CM, the optimum value for CR is 1 and the minimum value is 0. Lower values of CM indicate over-segmentation, whereby a region in the GT is represented by two or more regions in the examined segmented image. Similarly, under-segmentation is defined for CR, where two or more regions in the GT are represented by a single region in the segmented image. In mammogram segmentation, the segmentation algorithm is considered accurate if both CM and CR are greater than 95 percent. A more general measure of segmentation performance is achieved by combining CM and CR into a single measure known as Quality (Q), using the following expression:

Q = TP / (TP + FP + FN) (6.3)
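The three measures above can be computed directly from a segmented mask and its GT image. The following is an illustrative sketch in Python (not the MATLAB environment used in this research), with both images represented as boolean arrays:

```python
import numpy as np

def segmentation_quality(mask, gt):
    """Completeness (CM), Correctness (CR) and Quality (Q) of a
    segmented mask against its ground-truth (GT) image, both boolean."""
    tp = np.logical_and(mask, gt).sum()    # breast pixels correctly found
    fn = np.logical_and(~mask, gt).sum()   # breast pixels missed
    fp = np.logical_and(mask, ~gt).sum()   # background labelled as breast
    cm = tp / (tp + fn)                    # equation (6.1)
    cr = tp / (tp + fp)                    # equation (6.2)
    q = tp / (tp + fp + fn)                # equation (6.3)
    return cm, cr, q
```

The function treats each pixel independently, so it applies unchanged to masks of any shape.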
Similarly, the optimum value for Q is 1 and the minimum value is 0. Results obtained from mammogram preprocessing (Figure 5.1 and Figure 5.2) indicate some influence on the effectiveness of the segmentation algorithm; however, since the noise removal and background/artifact suppression algorithms are not image enhancement algorithms, no GT images exist for them. Thus, it is considered non-trivial to quantitatively measure the effectiveness of the mammogram preprocessing.
6.1.1.1 Image Segmentation Results
The mammogram segmentation algorithm (Figure 5.1 and Figure 5.2) is evaluated on all 582 mammogram samples as shown in Table 5.1. To demonstrate the robustness of the segmentation algorithm, it has been tested on mammograms with differing breast densities such as fatty, fatty fibroglandular and dense fibroglandular tissues.
Segmenting all 582 mammogram images (Table 5.1), the average CM and CR obtained are 0.996 and 0.981 respectively, signifying that the mammogram segmentation algorithm is robust with respect to different tissue densities. This implies that the average proportion of the segmented breast region detected by the algorithm is 99.6 percent, while 1.9 percent of the background is mislabeled as the breast region.
With few exceptions, the mammogram segmentation algorithm performed well and yielded a skin-air interface sufficiently reliable to extract the nipple in the breast profile. After segmentation, the average breast region is computed to contain approximately 208,400 pixels. Thus, on average, each segmented mask misses 366 pixels from the breast region and mistakes 1742 pixels from the background as breast pixels, which gives a quality of Q = 0.98. The adaptability of the segmentation algorithm with respect to tissue density is illustrated in the three following experiments.
Experiment 1 – Fatty Tissue
The first experiment deals with mammograms which consist predominantly of fatty tissue. The segmented images closely approximate the breast region as represented by the GT images. The quantitative measures indicate that the extracted breast regions are marginally under-segmented (2 percent), but do contain the breast region in its entirety. The mean CM and CR values for all fatty tissues are computed to be 0.99 and 0.96 respectively.
Experiment 2 – Fatty-Fibroglandular Tissue
The second experiment deals with mammograms comprising fatty fibroglandular tissue. The mean CM and CR values for all fatty fibroglandular tissues are computed to be 1.00 and 0.99 respectively. Since these CM and CR values are the closest to the optimum values that can be obtained, the segmentation error is very small, i.e., less than 1 percent.
Experiment 3 – Dense-Fibroglandular Tissue
The final experiment deals with mammograms comprising dense fibroglandular tissue. The mean CM and CR values for all dense fibroglandular tissues are computed to be 1.00 and 0.98 respectively. For dense fibroglandular tissues, the nipple in the breast profile has been preserved in all mammograms, and the extracted breast contour compares well with the GT images.
From the three experiments above, it can be concluded that the mammogram segmentation algorithm is invariant to changes in tissue density. However, no segmentation algorithm can be considered 100 percent robust, especially given the heterogeneous nature of mammograms. Problems with image acquisition such as scanner-induced artifacts, excessive background noise, scratches and dust artifacts could influence the reliability of the segmentation algorithm. The mammogram segmentation results indicate that, of all 582 mammograms evaluated, only 9 mammograms (1.5 percent) fell marginally short of the 95 percent accuracy indicator and 3 mammograms (0.5 percent) were over-segmented, which is attributed primarily to indistinct boundaries.
The reason for the 2 percent segmentation inaccuracy is that these 12 mammograms are special cases, with a highly non-uniform background and very little contrast in the area above the core breast tissue region. The segmentation therefore yields a roughly extracted breast contour which approximates the primary breast tissue region. The mean CM and CR measures of the 9 (1.5 percent) over-segmented mammogram images are computed to be 0.87 and 0.99 respectively, indicating that the segmented region is entirely contained within the GT image.
6.1.2 Feature Selection Results
Texture features are computed using the GLCMs of all ROI samples (malignant and benign) for the purpose of binary classification using the SVM. The feature selection algorithm evaluated in this research is the "F-score + RF + SVM" technique (Chen & Lin, 2006), which is discussed in detail in Section 4.3, with the corresponding experiments reported in Section 5.4.3 of this thesis.
Initially, 24 GLCM texture descriptors are used for the purpose of feature selection, as indicated in Table 5.6. After feature selection (Chen & Lin, 2006), the optimum subset of texture features is computed to be 1056, which corresponds to 22 GLCM texture descriptors as indicated in Table 5.7. This indicates that the Recursive Feature Elimination (RFE) technique eliminates 2 GLCM texture descriptors corresponding to 96 texture feature values. The optimum subset of 1056 texture features obtains the highest 10-fold CV accuracy of 82.30 percent. The following section discusses the feature selection results obtained using the proposed technique.
6.1.2.1 Discussion of F-score Results
The F-score feature ranking algorithm (Section 4.3.1) uses a RFE technique, namely the SVM-RFE discussed in Section 4.3.2. In order to compute the F-scores for the GLCM texture features using the SVM-RFE based approach, the binary confusion matrix performance indices (TP, FP and FN) need to be computed first, as indicated in Table 4.3. Prior to the computation of F-scores, the precision and recall for each texture feature are computed using the following expressions:

precision = TP / (TP + FP) (6.4)

recall = TP / (TP + FN) (6.5)
Using the precision and recall values obtained from equations (6.4) and (6.5), the F-score for each texture feature is calculated using the following expression:

F-score = 2 × (precision × recall) / (precision + recall) (6.6)
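As a minimal sketch (illustrative Python, mirroring equations (6.4) to (6.6)), the F-score follows directly from the confusion-matrix counts:

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall, per equations (6.4)-(6.6)."""
    precision = tp / (tp + fp)   # equation (6.4)
    recall = tp / (tp + fn)      # equation (6.5)
    return 2 * precision * recall / (precision + recall)   # equation (6.6)
```

Note that when precision and recall are equal, the F-score equals that common value, since the harmonic mean of two equal numbers is the number itself.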
The F-scores computed for the GLCM texture features are shown in Figures 5.28 and 5.29, while the optimum subset of features selected using the proposed technique is shown in Figure 5.32. Since the proposed feature selection algorithm obtains a 10-fold CV accuracy of 82.30 percent using the "F-score + RF + SVM" technique, the optimum subset of selected features has negligible correlation between features. This is because, during SVM-RFE, texture features with higher correlation and lower F-scores are eliminated.
6.1.3 SVM Training Validation
After obtaining the optimal pair of SVM hyperparameters (see Chapter 5), the SVM is trained for binary classification using the 162 training samples indicated in Table 5.8.
Using the approach discussed in Chapter 5, the highest 10-fold CV accuracy obtained by the SVM classification engine is 87.83 percent, as indicated in Figures 5.40 and 5.44. The training accuracy of the classification engine is calculated using equation (5.26), where a training accuracy of 97.60 percent is obtained. The training accuracy indicates that the developed classification engine has good learning and memorization capability. The separating boundary (soft margin) between the two classes of the training data, positive (malignant) and negative (benign), is illustrated in Figure 5.47.
Prior to SVM training, optimum SVM hyperplane parameters need to be determined. As mentioned throughout this thesis, for the purpose of SVM hyperparameter optimization, 10-fold CV is extensively used. During CV all SVM training samples (Table 5.8) are trained and validated in order to generalize the memorization accuracy of the SVM classification engine. The main reason for conducting 10-fold CV is to ensure that the SVM classification engine does not overfit the training data.
For the purpose of applying 10-fold CV, the 162 training samples (Table 5.8) are split into CV training and CV testing sets such that, 70 percent of the total samples (113 samples) from each class are used for CV training and the remaining 30 percent samples (49 samples) from each class are used for CV testing. This iterative procedure is repeated for 100 trials for 10-fold CV, where on each trial the CV training and CV testing data samples are selected randomly.
The Grid Search method proposed by Hsu et al. (2003) (Section 4.2.4) is used for SVM hyperparameter tuning in this research. In the Grid Search method, exponentially growing sequences of parameters are used to identify the SVM hyperparameters obtaining the best 10-fold CV accuracy of 87.83 percent. After Grid Search is complete, the optimum SVM hyperplane parameters are found to be C = 64 and γ = 0.001953125, as shown in Figure 5.38. Thus, using the optimum set of SVM hyperparameters obtained from the Grid Search method, an average SVM training accuracy of 97.60 percent is obtained using the 49 CV testing samples.
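The Grid Search procedure can be sketched with scikit-learn, as an illustrative stand-in for the MATLAB/LIBSVM implementation used in this research; the data below is synthetic and only the shape of the search (exponentially growing C and γ grids with 10-fold CV) reflects the text:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Exponentially growing sequences, as recommended by Hsu et al. (2003).
param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],
}

rng = np.random.default_rng(0)
X = rng.normal(size=(162, 20))     # synthetic stand-in for the texture features
y = rng.integers(0, 2, size=162)   # synthetic malignant/benign labels

# Exhaustive search over the grid, scoring each pair by 10-fold CV accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)
best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]
```

On the real data, the selected pair would correspond to the tuned (C, γ) values reported in Table 6.1.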
6.1.3.1 Discussion of SVM Training Results
In the C-SVM classification model (Hsu et al., 2003) applied in this research, the parameter C is an SVM hyperparameter that defines the trade-off between the training error and the complexity of the model (classification engine). In the dual Lagrangian formulation, the parameter C (in equation (4.42)) defines the upper bound of the Lagrange multipliers α_i; hence, it defines the maximal influence a sample can exert on the solution.
For the trained model developed in Figure 5.46, the SVM hyperparameter C affects the training and memorization accuracy of the SVM classification engine. The reason for this is that there are 10 bounded SVs (BSVs) in the trained model, as indicated in Figure 5.45, for which α_i = C (in equation (4.48)). Due to this, the Grid Search technique selects the value C = 64 that defines the optimum trade-off between the training error and the complexity of the model, signifying that the training data has significant noise. Thus, by using a smaller value of the parameter C in the developed model, the results of the SVM classification mapping are smoother with a lower noise consideration. The RBF kernel parameter γ in the SVM classification engine controls the width of the RBF (Gaussian) kernel. The parameter γ is related to σ, as defined by the following expression:

γ = 1 / (2σ²) (6.7)
where σ² is the variance of the resulting Gaussian hypersphere. The optimum value of the SVM hyperparameter γ in equation (6.7) found using Grid Search is γ = 0.001953125. So, using equation (6.7), the value of σ can be calculated using the following expression:

σ = √(1 / (2γ)) (6.8)
where σ is computed to be 16 using γ = 0.001953125. The value of σ for the trained classifier is acceptable, since any value of σ below 0.01 is considered small and any value of σ above 100 is considered large. The reason is that, as σ acts as an important hyperparameter during SVM training, small values of σ lead the model close to overfitting the training data, while large values of σ tend to over-smooth the training data. From the statistical learning theory point of view, small σ values lead to a higher VC-dimension, meaning that too many features are used for machine learning, which leads to overfitting, while large σ values lead to a lower VC-dimension, signifying that too few features are used to model the classification engine. Thus, the value σ = 16 is acceptable to model the SVM classification engine using the RBF kernel.
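The γ-to-σ conversion of equations (6.7) and (6.8) can be verified numerically. The sketch below (illustrative Python) assumes the tuned RBF parameter γ = 0.001953125 (i.e., 1/512) that appears in Table 6.1:

```python
import math

def sigma_from_gamma(gamma):
    """Invert gamma = 1 / (2 * sigma**2), per equations (6.7) and (6.8)."""
    return math.sqrt(1.0 / (2.0 * gamma))

print(sigma_from_gamma(0.001953125))   # gamma = 1/512 gives sigma = 16.0
```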
6.1.4 SVM Testing and Validation
The accuracy of SVM testing and validation is a gauge of the capability of the developed framework, namely the capability to classify between malignant and benign samples. In this research, SVM testing and validation is performed by integrating the LIBSVM v3.0 library (Chang & Lin, 2010) into MATLAB, as described in Chapter 5.
The trained model in Figure 5.46 is validated with the 70 testing samples (Table 5.8) in order to classify previously unseen (untrained) samples. As observed from Figures 5.51 and 5.54, the SVM testing accuracy obtained for an average of 100 trials using 70 testing samples (selected randomly on each trial) is found to be 97.14 percent. In addition, the SVM probability estimates of the tested samples (see Chapter 5) are obtained along with the SVM classification results (class labels). The probability estimates (or scores) can be taken as a measure of confidence during classification of the testing samples, as indicated in Figure 5.52 and Figure 5.55. The experiments performed in this research are presented and discussed in the following sections.
6.1.4.1 SVM Classification Results
The framework developed in this research for the classification of malignant and benign abnormalities (Figure 5.2) is tested using a Dell XPS 430 Workstation with a 3.00 GHz Intel Core2 Quad Processor and 8.00 GB of RAM. Testing one sample takes approximately 4 seconds, which varies based on the configuration of the computer used and the number of samples tested. The following sections present the experiments performed in order to meet the objectives and contributions of this research outlined in Section 1.2 and Section 1.3 respectively.
6.1.4.1.1 Optimum ROI Size Selection
In general it is difficult to determine the size of the neighbourhood or the Region of Interest (ROI) that should be used to extract the relevant GLCM texture features from the abnormal regions (mass lesions and MCCs). If the size of the ROI is too large, small lesions may be missed; while if the ROI size is too small, parts of large lesions may be missed.
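Extracting a ROI of a chosen size can be sketched as follows (hypothetical Python helper; the thesis performs ROI extraction in MATLAB, and this cropping convention is an assumption for illustration):

```python
import numpy as np

def extract_roi(image, center, size):
    """Crop a size x size square ROI centred on a suspected abnormality,
    zero-padding near the image border so the ROI shape is always fixed."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = center[0] + half, center[1] + half
    return padded[r - half:r + half, c - half:c + half]
```

With a fixed crop like this, the choice of `size` directly controls how much of a small or large lesion falls inside the ROI, which is the trade-off discussed above.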
Table 6.1: SVM classification accuracy comparison using different ROI sizes

ROI Size (pixels)   Tuned SVM Hyperparameters        Average SVM Accuracy for 100 trials
48 × 48             C = 64, γ = 0.0078125            —
64 × 64             C = 1024, γ = 0.001953125        —
96 × 96             C = 256, γ = 0.001953125         —
110 × 110           C = 32, γ = 0.00390625           —
128 × 128           C = 64, γ = 0.001953125          96.60%
136 × 136           C = 512, γ = 0.0009765625        —
148 × 148           C = 256, γ = 0.00390625          —
The primary contribution of this research, as indicated in Section 1.3, is to determine the most suitable ROI (neighbourhood) size in order to perform optimum texture feature extraction. This specifically addresses the problem of predetermining the ROI size for feature extraction. Thus, in this research, seven common ROI sizes have been evaluated as discussed in Section 5.5.5, namely: 48 × 48 pixels, 64 × 64 pixels, 96 × 96 pixels, 110 × 110 pixels, 128 × 128 pixels, 136 × 136 pixels and 148 × 148 pixels.
Testing the significance of the ROI sizes is performed using 70 testing samples (30 percent of the total ROI samples) with the developed SVM classification engine. The experimental results obtained using different ROI sizes and their tuned SVM hyperparameters are shown in Table 6.1. As indicated in Table 6.1, the ROI size of 128 × 128 pixels obtains the highest performance of 96.60% in terms of classification between malignant and benign ROIs. Further testing for significance shows that a ROI size of 128 × 128 pixels results in the lowest number of FPs and FNs (see Table XXXX) compared to the other six ROI sizes.
In addition, analysis of the GT data shows that the minimum and maximum diameters in pixels of a circle enclosing all malignant and benign abnormalities are 48 and 130 pixels respectively. Given the above reasons, it is confirmed that a 128 × 128 pixel square ROI (or a circle of 128-pixel diameter) is near-optimal for extracting all abnormal (malignant and benign) regions. All experiments performed from here onwards use a ROI size of 128 × 128 pixels for texture feature extraction.
6.1.4.1.2 SVM Testing
In this research, the SVM classification engine is developed using 162 training samples (70 percent of the total ROI samples) for a binary classification problem (Section 5.5.4), where malignant samples are taken as the positive class and benign samples are taken as the negative class. Thus, representing the ROI samples as positive and negative instances of a binary classification problem, a confusion matrix can be derived, as indicated in Figure 6.2.
Figure 6.2: Binary classification confusion matrix
Testing the 70 ROI samples indicated in Table 5.8 with the SVM testing and validation engine proposed in Chapter 5, the resulting confusion matrix with performance indices TP, FP, FN and TN is shown in Figure 6.3.
Figure 6.3: Confusion matrix after SVM testing
(TPs = 35, FPs = 1, FNs = 1, TNs = 33; malignant is the positive class and benign is the negative class)
The SVM testing results in the binary confusion matrix in Figure 6.3 show that 35 of the 36 malignant samples are classified correctly by the SVM, and 33 of the 34 benign samples are correctly classified. This indicates that only one sample in each class is misclassified. Thus, in total, 68 out of the 70 tested samples (Table 5.8) are classified correctly by the SVM, which gives a binary classification accuracy of 97.14 percent, as indicated in Figure 5.54. Using the confusion matrix results in Figure 6.3, the binary classification performance metrics (equations (4.67) to (4.71)) are computed, as indicated in Table 6.2.
The sensitivity and specificity metrics (equations (4.67) and (4.68)) are computed from the confusion matrix to be 0.9710 and 0.9706 respectively, where the minimum and optimum values of both are 0 and 1. The classification accuracy, computed using equation (4.69), is found to be 97.14 percent, with 68 of the 70 tested samples classified correctly by the SVM. Since the sensitivity, specificity and accuracy values are all greater than 0.95 (95 percent), the performance of the developed framework is acceptable.
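The three metrics can be reproduced from the confusion-matrix counts of Figure 6.3, as in the following illustrative Python sketch mirroring equations (4.67) to (4.69):

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # equation (4.67)
    specificity = tn / (tn + fp)                # equation (4.68)
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # equation (4.69)
    return sensitivity, specificity, accuracy
```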
Table 6.2: Binary classification performance metrics for tested samples

Binary Classification Performance Metrics
True Positives (TPs)             35
False Positives (FPs)            1
False Negatives (FNs)            1
True Negatives (TNs)             33
True Positive Fraction (TPF)     0.9710
False Positive Fraction (FPF)    0.0290
The True Positive Fraction (TPF) (also known as the sensitivity) and False Positive Fraction (FPF) metrics are calculated using equations (4.70) and (4.71), which are found to be 0.9710 and 0.0290 respectively as shown in Table 6.2. The TPF determines the performance of the SVM classification engine on identifying positive (malignant) samples correctly from all positive samples tested. In contrast, the FPF determines how many incorrect positive results occur among all negative (benign) samples tested.
To visualize binary classification results of the developed framework in Figure 5.1, an ROC curve is plotted using the 70 testing samples (Table 5.8), as shown in Figure 6.4. Each instance (testing sample) in the binary confusion matrix in Figure 6.3 is represented as one point in the ROC space in Figure 6.4.
Figure 6.4: ROC curve of binary SVM classifier testing 70 samples
(malignant is the positive class and benign is the negative class)
The Area Under the Curve (AUC) for the ROC curve in Figure 6.4 is close to the optimum value of 1; ROC curves with AUC values approaching 1 are rated as optimum classification results. As observed from the plot in Figure 6.4, the ROC curve follows close to the left-hand border and then the top border of the ROC space, which indicates that the developed framework produces optimum results for classification between malignant and benign samples.
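An ROC curve and its AUC can be generated from per-sample scores, as sketched below with scikit-learn; the labels and probability estimates here are synthetic stand-ins, not the 70 tested samples of Figure 6.4:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=70)              # synthetic class labels
scores = (np.where(y_true == 1, 0.8, 0.2)
          + rng.normal(0.0, 0.15, size=70))       # synthetic probability estimates

fpr, tpr, _ = roc_curve(y_true, scores)           # points of the ROC curve
auc = roc_auc_score(y_true, scores)               # area under the curve
```

Plotting `tpr` against `fpr` reproduces the curve shape described in the text: a well-separated classifier hugs the left and top borders of the ROC space.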
6.1.4.1.3 False Positive (FP) Reduction Results
The decision-logic system presented in Chapter 5 reduces the number of FPs in the confusion matrix in Figure 6.3. Each FP instance satisfying the condition of equation (5.24) is reclassified as a TN instead of an FP; the result is shown in the confusion matrix in Figure 6.5.
Figure 6.5: Confusion matrix after implementation of the decision-logic system
(TPs = 35, FPs = 0, FNs = 2, TNs = 34; malignant is the positive class and benign is the negative class)
Using the confusion matrix in Figure 6.5, the FPF in equation (4.71) is calculated to be 0, which is the ideal value for the FPF. Using the proposed decision-logic system with a small number of testing samples (70 samples in this case), a FPF of 0 is achievable. However, using this decision-logic system, an ideal FPF of 0 cannot be guaranteed unless a larger number of samples is tested.
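The decision-logic idea can be illustrated as follows (hypothetical Python; the actual condition is equation (5.24) of Chapter 5, and the 0.6 threshold below is an assumed value, not the one used in this research):

```python
def apply_decision_logic(label, prob_malignant, threshold=0.6):
    """Demote a low-confidence positive (malignant) SVM decision to benign,
    reducing false positives at the risk of extra false negatives."""
    if label == 1 and prob_malignant < threshold:
        return 0   # hypothetical threshold; the thesis uses equation (5.24)
    return label
```

The trade-off is visible in Figure 6.5: a borderline FP becomes a TN, but a borderline TP can likewise be demoted, which is consistent with the FN count rising after the decision logic is applied.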
One of the limitations in this research concerns the number of mammogram samples acquired for the development of the computerized breast cancer detection system. The total number of mammography images obtained from University Malaya Medical Centre (UMMC) is limited, as UMMC only implemented digital mammography in 2008. Thus, over the course of nearly two years, only a limited number of malignant and benign cases (Table 5.1) were available from the UMMC in digital format.
In order to demonstrate the performance of the developed system using a larger number of testing samples, the SVM training and testing engines as shown in Figures 5.38 and 5.47 respectively are evaluated using other datasets discussed in the following section.
6.1.4.2 Discussion of SVM Classification Results
This section summarizes the SVM classification results obtained from the experimental testing above, where four major experiments are performed. All results presented in this section are evaluated on the UMMC and MIAS ROI samples in Table 5.8.
The first experiment evaluates different ROI sizes in order to determine an optimum ROI size; the experimental results are presented in Table 6.1. The optimum ROI size is found to be 128 × 128 pixels, with a classification accuracy of 97.60 percent between malignant and benign ROI samples. Since the classification accuracy is greater than 95 percent, the performance of the proposed model is acceptable.
The second experiment computes the four binary classification performance indices (TP, TN, FP and FN) using the SVM testing results from the first experiment. The performance indices are used to plot an ROC curve obtained by testing the 70 samples (Table 5.8), as shown in Figure 6.4, which yields an AUC close to 1. Based on a collective comparison of the results obtained from the first and second experiments, the following observations are made:
All binary performance metrics in Table 6.2 are greater than 95 percent.
The FPF in equation (4.71) is less than 5 percent.
The ROC curve follows close to the left-hand border and the top border of the ROC space.
The AUC of the ROC curve is close to the optimum value of 1.
These observations indicate that the developed system can classify between malignant and benign ROIs with an average classification accuracy of 97 percent. Since the classification accuracy of the developed system is greater than the baseline of 95 percent, it is confirmed that the developed framework shown in Figure 5.1 produces promising classification results.
The third experiment focuses on reducing the number of FPs obtained in the SVM classification results (Table 6.2). The number of FPs affects the FPF (equation (4.71)), which can be reduced by applying a decision-logic system using the probability estimates of the tested samples from the SVM classification results. Applying the decision-logic system confirms that the number of FPs and the FPF can be minimized at a low cost. However, since the number of samples in the MIAS and UMMC datasets is limited, the accuracy of the FPF reduction algorithm cannot be tested in depth.
6.2 Comparison of Proposed Framework with Other Techniques
In order to estimate the performance of the SVM based model, machine learning algorithms other than the SVM are evaluated. Since ANNs have a structure similar to that of SVMs, they are used in this research for comparison with the proposed SVM framework. A traditional and a modern ANN based machine learning algorithm, namely the Back-Propagation Neural Network (BPNN) and the Online-Sequential Extreme Learning Machine (OS-ELM), presented in Section 5.XXX and Section 5.XXX respectively, are used in the framework in Figure 5.2.
6.2.1 Experimental Results of Compared Techniques
Comparing the developed framework (using SVM) with a traditional and a modern ANN based approach, namely the BPNN (see Section 5.XXX) and the OS-ELM (see Section 5.XXX), provides a better estimate of the memorization and generalization capability of different learning machines.
During training, the BPNN uses a different approach to the calculation of the training error, as it minimizes the empirical error, whereas the SVM minimizes the structural risk. Similar to the BPNN, the OS-ELM is a Single Layer Feed-forward Neural Network (SLFN). Conventional ANN learning algorithms for SLFNs require tuning of network parameters. The OS-ELM, however, randomly generates the input weights and the hidden neuron biases of the SLFN and uses them to calculate the output weights without requiring further learning. The OS-ELM implemented in this research is an online variant of the ELM algorithm, applicable for batch learning (Liang et al., 2006).
Figure 6.6: Log-sigmoid transfer function
The network architecture of the BPNN implemented in this research consists of 1056 input neurons in the input layer, corresponding to the optimum subset of 1056 texture features (Section 5.4.3). The output layer of the BPNN consists of a single neuron, where an output of 0 indicates a benign sample and an output of 1 indicates a malignant sample. In the BPNN, the output of the neurons in the hidden layers is calculated using the log-sigmoid activation function shown in Figure 6.6 and defined in equation (6.10):

f(n) = 1 / (1 + e^(−n)) (6.10)
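The log-sigmoid activation of equation (6.10) can be written as a one-line function (illustrative Python):

```python
import math

def logsig(n):
    """Log-sigmoid transfer function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))
```

Its (0, 1) output range is what allows the single output neuron to be read as a benign(0)/malignant(1) decision.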
The number of samples selected for BPNN training and testing is indicated in Table 5.8. Three parameters need to be determined for the BPNN prior to obtaining a trained model (classifier), which are as follows:
Number of hidden layers
Number of hidden layer neurons
Number of training iterations
In order to statistically determine the optimum parameter values for the BPNN, different combinations of the three parameter values are iterated over suitable ranges. In order to perform 10-fold CV, the 162 training samples (Table 5.8) are split into CV training and CV testing sets, such that 70 percent of the total samples from each class are used for CV training (113 samples) and the remaining 30 percent from each class are used for CV testing (49 samples). This procedure is repeated for 100 trials using 10-fold CV, where on each trial the CV training and CV testing samples are selected randomly. The final architecture of the BPNN after parameter optimization results in a training accuracy (equation (5.26)) of 93.58 percent, where the optimum BPNN parameters determined and used for training are shown in Table 6.3. The BPNN classification results obtained after testing the 70 samples (Table 5.8) are indicated in Table 6.4. The ROC curve obtained from the BPNN classification results for testing with 70 samples is shown in Figure 6.7.
The OS-ELM is implemented in this research using the RBF activation function. With RBF nodes, the centers and widths of the nodes are generated randomly and fixed; based on this, the output weights are analytically determined by the network. The number of samples selected for OS-ELM training and testing is listed in Table 5.8.
Table 6.3: Optimum modeling parameters for the BPNN

Number of hidden layers: —
Number of hidden layer neurons: — (specified as a matrix giving the number of hidden neurons in each hidden layer of the BPNN)
Number of training iterations: —
The network architecture of the OS-ELM implemented in this research consists of 1056 input neurons in the input layer, corresponding to the optimum subset of 1056 texture features (Section 5.4.3). The output layer of the OS-ELM consists of a single neuron, where an output of 0 indicates a benign sample and an output of 1 indicates a malignant sample. Since the OS-ELM is a SLFN, only one parameter needs to be determined, namely the number of hidden layer neurons. The method to search for the optimal number of hidden layer neurons in the OS-ELM is suggested by Huang et al. (2004), which indicates that the number of hidden neurons varies in the range from 20 to 200, as presented in Section 4.3.2.
The optimal value of is determined based on the classification performance of the OS-ELM, namely the training accuracy (equation (5.26)). Since the number of neurons in the input layer of the OS-ELM is large (1056), for modeling purposes the lower bound of the range of is selected as 1, and the size of is incremented by a value of 10 on each iteration.
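The search over the hidden-neuron count can be sketched as a simple grid search. Here `train_accuracy` is a hypothetical stand-in for a routine that trains the OS-ELM with the given number of hidden neurons and returns the training accuracy of equation (5.26).

```python
def select_hidden_neurons(train_accuracy, lo=20, hi=200, step=10):
    """Grid search over the hidden-neuron count, keeping the value
    that yields the highest training accuracy."""
    best_n, best_acc = lo, -1.0
    for n in range(lo, hi + 1, step):
        acc = train_accuracy(n)
        if acc > best_acc:
            best_n, best_acc = n, acc
    return best_n, best_acc

# Toy stand-in accuracy curve peaking at 120 hidden neurons (illustrative)
toy = lambda n: 1.0 - abs(n - 120) / 200
best = select_hidden_neurons(toy)  # -> (120, 1.0)
```

In practice `train_accuracy` would itself average several random-initialization trials, since the OS-ELM's random hidden parameters make a single run noisy.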
Table 6.4: Comparison of the developed system using different machine learning models: binary classification performance metrics
The final architecture of the OS-ELM after parameter optimization results in a training accuracy (equation (5.26)) of 96.28 percent, where the optimal number of hidden layer neurons is computed to be . The OS-ELM classification results obtained after testing the 70 samples (Table 5.8) are shown in Table 6.9. The ROC curve obtained from the OS-ELM classification results for testing with 70 samples is shown in Figure 6.10 with an AUC of .
Parameter optimization for the BPNN and the OS-ELM in this research is performed using 10-fold CV, similar to the SVM (Section 6.1). The reason for using CV is that, since the training samples can be divided further into subsets, CV ensures that the trained model (classification engine) does not overfit the training data. The BPNN and the OS-ELM are further evaluated using the other two external datasets indicated in Section 6.2. The classification results obtained for testing with the external datasets are summarized in Table 6.10. As observed from Table 6.10, the SVM outperforms the BPNN and the OS-ELM techniques in terms of classification accuracy.
Figure 6.7: ROC curve of binary BPNN classifier testing 70 samples
(malignant is the +ve class and benign is the -ve class)
6.2.3 Discussion of Compared Models
Both SVMs and ANNs are considered black-box modeling techniques. Although both algorithms share a similar structure, their learning methods are completely different: ANNs try to minimize the training error, whereas SVMs reduce model capacity using the structural risk minimization (SRM) principle.
Figure 6.8: ROC curve of binary OS-ELM classifier testing 70 samples
(malignant is the +ve class and benign is the -ve class)
Comparison results for the BPNN and the OS-ELM against the SVM based model, obtained by testing the 70 samples from the local dataset (UMMC and MIAS), are tabulated in Table 6.9. The experimental results in Table 6.9 show that the SVM based approach outperforms the BPNN and the OS-ELM with respect to the overall classification accuracy. This is because the SVM based model obtains the optimum results for binary classification, with sensitivity, specificity, TPF, FPF and all in their optimum ranges.
To further investigate the accuracy of the compared machine learning models, the ROC curves of all three models are computed using the statistics from Table 6.9, as shown in Figure 6.11. As observed from Figure 6.11, the SVM has the highest AUC , followed by the OS-ELM and the BPNN . The curve for the SVM follows closest to the left-hand border and then the top border of the ROC space, which indicates that the SVM has better classification results than the other techniques. To confirm that the SVM based approach also outperforms the other machine learning techniques on the external datasets (Section 6.2), the experimental results from this testing are tabulated in Table 6.10.
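The TPF/FPF statistics and the trapezoidal AUC of a single-point ROC curve of the kind compared here can be computed as follows. The confusion-matrix counts are illustrative only, not the values of Table 6.9.

```python
def roc_point(tp, fn, fp, tn):
    """TPF (sensitivity) and FPF (1 - specificity) from confusion counts."""
    tpf = tp / (tp + fn)
    fpf = fp / (fp + tn)
    return tpf, fpf

def single_point_auc(tpf, fpf):
    """Trapezoidal area under the ROC polyline (0,0) -> (fpf,tpf) -> (1,1)."""
    return 0.5 * fpf * tpf + 0.5 * (1 - fpf) * (1 + tpf)

# Illustrative counts only (malignant = positive class)
tpf, fpf = roc_point(tp=33, fn=2, fp=3, tn=32)
auc = single_point_auc(tpf, fpf)
```

A curve hugging the left and top borders corresponds to TPF near 1 at FPF near 0, which drives this area toward 1.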
Figure 6.9: ROC curves for different machine learning techniques
(70 testing samples as indicated in Table 5.8)
As observed from Table 6.9 and Table 6.10, the BPNN has the lowest classification accuracy of all compared models. The BPNN used in this research has a training accuracy of 93.58 percent, which is higher than its generalization (testing) accuracy of 84.29 percent; this indicates that the BPNN generalizes less well than the OS-ELM and the SVM. The main reason for the low generalization of the BPNN is excessive training, i.e., overfitting.
During BPNN training, the goal is to obtain a globally optimum solution of the error function. However, because the BP algorithm is based on gradient descent, the network descends the error surface slowly, correcting itself along locally improving directions. When a flat section (plateau) of the error surface persists for a long time, training effectively stalls and terminates at that point, so the network often ends up with only a locally optimal solution; prolonged training under these conditions also encourages overfitting.
Another reason for the low generalization of the BPNN is noise in the digital mammography data (features). The low generalization of the BPNN does not mean that it is not a good tool for pattern classification; however, given the reasons above, it is not considered a suitable tool for the mammography datasets (Table 5.1) acquired in this research.
In terms of classification performance, it is observed from Table 6.10 that the OS-ELM ranks second after the SVM, with the BPNN last. The reason for the better classification accuracy of the OS-ELM compared to the BPNN is that the OS-ELM sequentially updates its output weights over finite chunks of the training data, which yields higher generalization. With the RBF transfer function used here, the OS-ELM randomly initializes the hidden neuron parameters (input weight vectors and neuron biases for additive hidden neurons, or centers and impact factors for RBF hidden neurons) and then computes the output weight vectors analytically.
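A minimal batch-ELM sketch of this scheme, with random, fixed RBF centers and widths and analytically solved output weights, is shown below. The toy data, dimensions and width range are all illustrative, and the OS-ELM additionally processes the data sequentially in chunks rather than in one batch.

```python
import numpy as np

def elm_rbf_train(X, y, n_hidden=20, seed=0):
    """ELM sketch: random fixed RBF centers/widths; output weights
    solved analytically by least squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_hidden, replace=False)]
    widths = rng.uniform(0.5, 1.5, n_hidden)       # random impact factors
    H = np.exp(-((X[:, None, :] - centers[None]) ** 2).sum(-1)
               / widths ** 2)                       # hidden-layer outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # analytic output weights
    return centers, widths, beta

def elm_rbf_predict(X, centers, widths, beta):
    H = np.exp(-((X[:, None, :] - centers[None]) ** 2).sum(-1) / widths ** 2)
    return H @ beta

# Toy two-class data (hypothetical, not the mammography features)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
y = np.array([0.0] * 30 + [1.0] * 30)
model = elm_rbf_train(X, y)
acc = ((elm_rbf_predict(X, *model) > 0.5) == (y > 0.5)).mean()
```

Because the hidden parameters are never learned, only the single least-squares solve is needed, which is what makes (OS-)ELM training fast.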
During experimental testing of the OS-ELM it is observed that, if the order of the training samples is switched, the training accuracy of the OS-ELM changes significantly. To obtain an average estimate of the memorization performance of the OS-ELM, its training accuracy is therefore computed as an average over 100 trials, where on each trial the training samples are selected randomly. It is also observed that, as the number of hidden layer neurons increases, the OS-ELM achieves better performance, while remaining stable over a wide range of hidden neuron counts.
There are a few reasons which contribute to the lower performance of the OS-ELM compared to the SVM. The first is that the assignment of the initial weights in the OS-ELM is arbitrary, which affects the generalization performance of the network. Since the proper selection of input weights and hidden bias values contributes to the generalization capability of the trained model (classification engine), initializing them arbitrarily decreases the generalization performance of the OS-ELM.
The second reason is that the value of the parameter in the RBF activation function of the OS-ELM is set to a constant value of 1, as discussed by Huang et al. (2006b) and Liang et al. (2003). As this parameter controls the width of the RBF function in the OS-ELM, it is suggested to be selected within the range of 0 to 1. If its value is increased to , the generalization performance for unseen data will decrease.
More importantly, the value of cannot be fixed to a constant, since the width of the Gaussian function depends on the data samples to be classified and on the amount of noise present in the data. Since there is no evidence in the ELM literature on how to tune this parameter for the RBF activation function, using the default value suggested by Huang et al. (2006b) and Liang et al. (2003) causes the OS-ELM to produce lower generalization performance than the SVM. The OS-ELM also suffers from a few drawbacks, which are as follows:
To achieve good generalization results with the OS-ELM, the number of hidden layer neurons must be chosen larger than in standard ANN algorithms (such as the BPNN), because the neuron weights and biases are not learned from the training data.
Multi-layer ANNs (such as the BPNN used in this research), if trained properly, can achieve similar or even better results than the OS-ELM, which is an SLFN.
The solution provided by the ELM and the OS-ELM is not always smooth and often exhibits some ripple.
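The sensitivity to the RBF width discussed above can be illustrated with a one-dimensional Gaussian node; the sigma values below are illustrative.

```python
import math

def rbf(x, center=0.0, sigma=1.0):
    """Gaussian RBF node; sigma controls the width of the bell."""
    return math.exp(-((x - center) ** 2) / sigma ** 2)

# A wider sigma keeps activations high far from the center, so distant
# samples remain strongly activated and become harder to tell apart.
far = 3.0
narrow = rbf(far, sigma=1.0)  # far sample barely activates the node
wide = rbf(far, sigma=5.0)    # far sample still activates it strongly
```

This is why a width fixed at a constant value, independent of the spread and noise of the data, can hurt discrimination.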
The only notable advantage of the OS-ELM over the SVM is its faster training process, which improves further as the chunk (data) size increases. It is known that, using the RBF (Gaussian) activation function, SVMs suffer from tedious parameter tuning. However, although the OS-ELM has only a single parameter to tune, its arbitrary assignment of initial random weights requires searching for the optimal number of hidden layer neurons and executing the OS-ELM many times to obtain an average estimate, which costs it this edge over the SVM.
The experimental results presented in Section 6.2.2 indicate that using the SVM for classification of malignant and benign abnormalities from digital mammography data is very promising. In this research, SVMs have a few notable advantages as compared to ANNs, which are as follows:
SVMs have non-linear dividing hypersurfaces that give them high discrimination.
They provide good generalization ability for unseen data classification.
They determine the optimal network structure (such as the hidden layers and hidden layer neurons) themselves, without requiring any external parameters to be fine-tuned.
In contrast to these advantages of SVMs over ANNs, SVMs also have some drawbacks. These drawbacks are mainly practical, concerning memory limitations and the real-time training of SVMs, and are as follows:
The quadratic programming (QP) optimization problem arising in SVMs is not easy to solve. Since the number of Lagrange multipliers equals the number of training samples, the training process is relatively slow. Even with the use of Sequential Minimal Optimization (SMO), real-time training is not possible for large datasets.
The second drawback of SVMs is the storage capacity required for the trained model (classification engine). The support vectors (SVs) in the trained model represent the important features distinguishing the training samples of the two classes (malignant and benign). When the optimization problem has low separability in the space used, the number of SVs increases. The SVs have to be stored in a model file, which puts limitations on the implementation of SVMs on devices with limited storage capacity.
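The storage requirement can be estimated with a back-of-envelope calculation. The support-vector count of 90 is hypothetical; 1056 is the feature-vector length used in this research (Section 5.4.3).

```python
def svm_model_bytes(n_sv, n_features, bytes_per_value=8):
    """Rough model-file size: each support vector stores n_features
    coefficients plus one Lagrange multiplier (alpha), as 64-bit floats."""
    return n_sv * (n_features + 1) * bytes_per_value

# Hypothetical count of 90 SVs over 1056-dimensional feature vectors
size = svm_model_bytes(n_sv=90, n_features=1056)
print(f"{size / 1024:.1f} KiB")  # prints "743.2 KiB"
```

As separability drops and the SV count grows, this figure scales linearly, which is what constrains storage-limited devices.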
Given all these aspects, the experimental results presented in Table 6.9, Table 6.10 and Figure 6.11 show that SVMs have better classification capability than traditional and modern ANN based approaches. Thus, SVMs are considered a superior machine learning technique when the requirement is to solve classification problems with noisy data.
This chapter presented the experimental results of the system developed in Chapter 5. Section 6.1 presented and discussed the SVM training results relative to the memorization and learning of the binary SVM classifier. SVM testing and validation results for unseen samples were presented in Section 6.2, where different datasets were evaluated in order to generalize the performance of the developed system. Lastly, to perform a comparative study, Section 6.3 presented experimental results obtained by evaluating the developed system using machine learning algorithms other than the SVM. The experimental results of the compared machine learning models were discussed in the last part of Chapter 6.