# Cascaded SVM Classifier for Mammographic Mass Detection


In this paper we introduce a cascade of linear SVM classifiers with Haar-like wavelet features for the automatic detection of masses in mammogram images. The novelty of this approach is the use of NSGA-II multi-objective optimization to construct each linear SVM node in the cascaded framework, where the objectives are to maximize the classifier's accuracy and to minimize its complexity. A brute-force search strategy is used to detect masses of all possible sizes and positions inside the mammogram image. The proposed scheme was evaluated on 119 mammogram images from the Mini-MIAS database. Results show that the proposed system achieved a sensitivity of 86% at 1.1 false positives per image.

Breast cancer is one of the leading causes of mortality among women, and more than 200 thousand new cases among women were expected to occur in the US during 2011 [1]. The earliest sign of breast cancer is often an abnormality detected on a screening mammogram, and early detection increases the possibility of recovery from the disease. Mammography screening therefore remains the most effective tool for the detection of breast abnormalities, and computer-aided detection (CAD) systems serve as a second opinion for the radiologist.

Many approaches have been developed for mass detection [6-11]. Generally, these approaches are built on a similar philosophy: candidate mass segmentation, feature extraction, and classification of each region as suspicious or normal. The first problem with this pipeline is that the segmentation stage becomes the bottleneck of the detection system, and its efficiency affects the whole system. The second problem is that it is difficult to characterize a breast mass by quantitative features, owing to the considerable variation in density, shape, size, position, and border characteristics.

An alternative, featureless detection scheme was presented in [2], where a brute-force scanning window is used to examine all possible locations in the image at multiple scales. A pool of overcomplete Haar wavelet coefficients is generated from each window and provided to the classifier. Thus, without any prior knowledge supplied by the trainer, the classifier learns automatically to detect masses from examples. In this way, the only inputs the system needs are a set of positive examples (masses) and a set of negative examples (non-masses).

Unfortunately, several problems arise with this scheme. The first is data asymmetry: a small sample of the positive class (mass) faces millions of negative-class (non-mass) windows, because brute-force scanning makes the occurrence of the positive class a rare event. The second is cost asymmetry: the cost of misclassifying the positive class is higher than that of the negative class. Finally, there is the "curse of dimensionality": the number of input features is huge, which increases the complexity of the classifier.

Viola and Jones presented a cascaded framework for face detection [3], in which the AdaBoost algorithm is used as the node trainer. To train the cascade, a small set of N positive samples (faces) and a bootstrap database of negative samples (non-faces) are available. The first node of the cascade is trained with the N positive examples and N negative examples selected at random from the bootstrap database. After the first node is constructed, the bootstrap database is filtered by it: all successfully classified negative samples are discarded, and this process continues iteratively until the bootstrap database is exhausted. The data-asymmetry problem is thereby solved, since each classifier in the cascade is trained on balanced data from both classes. For the curse-of-dimensionality problem, Viola and Jones achieved feature selection by constraining each weak classifier in AdaBoost to depend on only a single feature; each boosting stage, which selects a new weak classifier, can therefore be viewed as a feature-selection step. An asymmetric AdaBoost algorithm was presented in [4], where an asymmetric loss function makes AdaBoost cost-sensitive so that a false negative costs k times a false positive.

In this paper we adopt the Viola and Jones cascaded structure [3] together with the multiobjective nondominated sorting genetic algorithm II (NSGA-II) [5] and linear SVMs for mammographic mass detection. We formulate the multiobjective optimization problem with two objectives: to maximize the LSVM classifier accuracy and to minimize its complexity. The Pareto-optimal solutions of these two conflicting objectives are generated by NSGA-II. The remainder of this paper is organized as follows. Section 2 gives an overview of related work. Section 3 provides background on the LSVM, NSGA-II, and Haar-like features. Section 4 describes the framework and the construction methodology using the multiobjective algorithm. Section 5 reports the experimental results, and the paper is concluded in Section 6.

## RELATED WORK

Many approaches have been published in the literature for mammographic mass detection [6-12]. The authors of [6] investigated a multiresolution scheme for the detection of spiculated masses: a multiresolution representation of the image is obtained with the wavelet transform, and a set of features is extracted for each pixel at every resolution level. They selected 19 mammogram images containing spiculated masses from the MIAS database and achieved 84.2% sensitivity at less than 1 false positive per image, and 100% sensitivity at 2.2 false positives per image. Disadvantages of this scheme are that it was designed to detect only spiculated masses and that the study used a small dataset of only 19 spiculated and 19 normal images. Eltonsy et al. [7] proposed a concentric morphological model for the detection of malignant masses: a region with a high concentration of concentric layers of lower average intensity is counted as suspicious. A total of 270 mammographic cases were used in the evaluation, and they report 92.1% sensitivity for malignant mass detection at 5.4 FPs per image; the major limitation was the empirical selection and tuning of the key parameters. Another approach, presented in [8], consists of three stages (preprocessing, segmentation, and classification) with an SVM classifier and contourlet features. That study included 90 mammogram images (60 normal and 30 abnormal cases) taken from the MIAS dataset; the type of the abnormal cases was not identified, and the reported accuracies were 96.6%, 91.5%, and 82.1%. Kom et al. [9] proposed a mass detection scheme using a local adaptive thresholding technique. The algorithm consists of a preprocessing and enhancement stage followed by a thresholding stage with an empirically chosen threshold. The approach was tested on 61 mammogram images and reported a sensitivity of 95.91%; the area under the ROC curve was 0.946 without image enhancement and 0.938 with enhancement of the original image. Varela et al. [10] presented a neural network classifier with features such as iris filter output, gray level, and texture. The system was evaluated on 66 malignant and 49 normal cases and achieved 94% sensitivity at 1.02 FPs per image. More recently, Gao et al. [11] presented an automated detection approach based on Morphological Component Analysis (MCA). The scheme was evaluated on 200 mammogram images from the DDSM database and achieved 95.3% sensitivity at 2.83 false positives per image. Closest to our approach, a two-stage SVM-based featureless mass detector was proposed in [13]: the first SVM was trained on a pool of overcomplete Haar features to identify candidate masses, and the second SVM was trained on the samples that passed the first classifier, with the aim of reducing the false positive rate. A total of 512 mammogram images (200 normal + 312 abnormal) selected from the DDSM database were used in that experiment, and the authors report 80% sensitivity at 1.1 false positives per image. Our approach differs in that we adopt the cascaded framework presented in [3], and we introduce NSGA-II multiobjective optimization so that each stage classifier is both efficient and accurate.

## BACKGROUND REVIEW

In this section we provide a background overview of the linear SVM, the NSGA-II optimization algorithm, and Haar-like wavelet features.

## Linear SVM classifier (LSVM)

The theory behind the SVM comes from statistical learning theory, where the objective is to minimize the structural risk [14]. The SVM therefore seeks the optimal separating hyperplane, which gives a low generalization error. LSVM classifiers divide into two types: the hard-margin LSVM, for linearly separable training data, and the soft-margin LSVM, for linearly non-separable data, as shown in Figs. 1 and 2. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \dots, l$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$, the optimal separating hyperplane can be obtained by minimizing the following:

$$\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \qquad (1)$$

$$\text{subject to} \quad y_i\left(w^{T}x_i + b\right) \ge 1, \quad i = 1, \dots, l \qquad (2)$$

For the solution of the above problem, positive Lagrange multipliers $\alpha_i \ge 0$, $i = 1, \dots, l$, are introduced, which gives the Lagrangian function:

$$L_P = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i\left(w^{T}x_i + b\right) - 1 \right] \qquad (3)$$

This problem can now be solved by standard quadratic programming techniques and programs. The training points with nonzero $\alpha_i$ are exactly the support vectors $S$, which lie on the margin as shown in Fig. 1. We therefore obtain the optimal decision function $f$ as follows:

$$f(x) = \operatorname{sign}\left( \sum_{i \in S} \alpha_i y_i K(x_i, x) + b \right) \qquad (4)$$

where $K$ is the linear kernel function, $K(x_i, x) = x_i^{T}x$, the dot product between the input sample $x$ to be tested and the support vector $x_i$.

Fig 1. Hard Margin LSVM for separable case. Support vectors are circled

On the other hand, for the linearly non-separable case, the soft-margin SVM allows the decision boundary to misclassify some examples, with a penalty cost for the violated constraints.

Fig 2. Soft Margin LSVM for inseparable case.

So, we consider minimization of

$$\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \qquad (5)$$

$$\text{subject to} \quad y_i\left(w^{T}x_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, l \qquad (6)$$

where $\xi_i$ is the slack variable, which measures the degree of misclassification, and $C$ is the margin parameter that determines the trade-off between maximization of the margin and minimization of the classification error.
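As an illustration of the soft-margin case, the following sketch fits a linear SVM on overlapping 2-D data and inspects its support vectors; scikit-learn is an assumed implementation here, as the paper does not name its SVM library:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two overlapping Gaussian clouds (the inseparable case).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.8, (40, 2)), rng.normal(1.0, 0.8, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

# Soft-margin linear SVM: C trades margin width against slack penalties.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors (points on or inside the margin) carry
# nonzero Lagrange multipliers and enter the decision function (4).
print(clf.support_vectors_.shape[0], "support vectors")
```

Raising C penalizes slack more heavily, shrinking the margin and typically the support-vector count; lowering it widens the margin at the cost of more margin violations.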

## NSGA-II Algorithm

The fast and elitist multiobjective genetic algorithm NSGA-II [5] has been successfully used in many problems, such as feature selection for facial expression recognition [15], generation expansion planning [16], obtaining optimal decision thresholds in wireless sensor networks [17], and finding the optimal trade-off between image quality and compression in the JPEG algorithm [18].

In NSGA-II the Pareto front (trade-off curve) is found directly, and there is no need to minimize each objective function separately. NSGA-II is an elitist algorithm: good solutions are preserved in the population, and each solution is assigned a fitness (nondomination rank) and a crowding-distance value. Solutions with the same rank are then sorted by their crowding distance, a measure of how close each solution is to its neighbors, and the solution with the larger crowding distance is preferred. More details about NSGA-II can be found in [5].
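The crowding-distance step can be sketched as follows; this is a minimal two-objective version, and NSGA-II itself additionally requires nondominated sorting and the usual genetic operators:

```python
def crowding_distance(front):
    # Crowding distance for a front of (f1, f2) points: boundary
    # solutions get infinity; interior ones accumulate the normalized
    # side lengths of the cuboid formed by their neighbours.
    n = len(front)
    dist = [0.0] * n
    for m in range(2):  # two objectives
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = front[order[-1]][m] - front[order[0]][m] or 1.0
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][m] - front[order[k - 1]][m]) / span
    return dist
```

Solutions in sparse regions of the front receive larger distances and are favoured during selection, preserving diversity along the Pareto front.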

## Haar-Like Features

Haar-like features are an overcomplete set of 2D Haar functions that can be used to encode the local appearance of objects, such as edges, lines, and center-surround patterns [19]. The feature value f of a Haar-like feature with k rectangles is obtained by subtracting the sum of the pixels that lie within the white rectangles from the sum of the pixels in the black rectangles, as shown in Fig. 3.

Fig 3. Six types of Haar-like features
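A minimal sketch of evaluating a two-rectangle Haar-like feature with an integral image, the standard constant-time evaluation trick; the exact rectangle layouts of Fig. 3 are not reproduced here:

```python
import numpy as np

def integral_image(img):
    # Cumulative sum over rows and columns; the sum of any axis-aligned
    # rectangle can then be read off with at most four lookups.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] using the integral image ii.
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect_horizontal(img, r, c, h, w):
    # Two-rectangle feature at (r, c) of size h x w: white left half,
    # black right half; value = black sum - white sum, as in the text.
    ii = integral_image(img.astype(np.int64))
    white = rect_sum(ii, r, c, r + h, c + w // 2)
    black = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return black - white
```

With the integral image precomputed once per window, every Haar-like feature costs only a handful of array lookups regardless of rectangle size.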

## METHODOLOGY

## Cascaded SVM Construction using NSGA-II

The cascaded SVM framework, shown in Fig. 4, is similar to the Viola and Jones cascaded structure [3], with each node an LSVM classifier. To train a node classifier, a goal of minimum detection rate and maximum allowed false positive rate is set, and samples of positive and negative examples are provided. The construction procedure is given in Algorithm 1.

Algorithm 1. Cascaded Framework Training

Inputs:
- Positive examples P
- Bootstrap database of negative examples D
- Initial set of negative examples N, the same size as P
- Learning goal G (true positive rate, false positive rate)

Output: cascade of classifiers LSVM = (LSVM1, LSVM2, LSVM3, ..., LSVMn)

i = 0; LSVM = empty cascade

repeat

1. i = i + 1

2. Node construction: learn LSVMi using P and N; add LSVMi to LSVM

3. Remove the correctly classified negative examples from N

4. Run the current cascade LSVM on D, adding any false detections to N until N reaches the same size as the initial set

until the learning goal G is satisfied
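The loop above can be sketched in Python as follows; scikit-learn's `LinearSVC` stands in for the NSGA-II-optimized node described later, and the function names, stopping test, and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def cascade_accepts(cascade, x):
    # A sample survives only if every stage classifies it as positive.
    return all(clf.predict([x])[0] == 1 for clf in cascade)

def train_cascade(P, D, target_fpr=0.02, max_stages=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(P)
    # Initial balanced negative set N, the same size as P.
    N = [D[i] for i in rng.choice(len(D), size=n, replace=False)]
    cascade = []
    for _ in range(max_stages):
        X = np.vstack([P, N])
        y = np.array([1] * len(P) + [0] * len(N))
        clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
        cascade.append(clf)
        # Step 3: keep only negatives this stage still misclassifies.
        N = [x for x in N if clf.predict([x])[0] == 1]
        # Step 4: refill N with false detections of the whole cascade.
        for x in D:
            if len(N) >= n:
                break
            if cascade_accepts(cascade, x):
                N.append(x)
        fpr = np.mean([cascade_accepts(cascade, x) for x in D])
        if fpr <= target_fpr or not N:
            break
    return cascade
```

Each stage is trained on balanced data (the data-asymmetry fix), while the cascade as a whole drives the false positive rate down stage by stage.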


Fig 4. Framework of the object detection system

## Node construction using NSGA-II

Each LSVM classifier in the cascaded framework is constructed to be both efficient and accurate. The goals of NSGA-II are therefore to maximize the LSVM classifier's accuracy and to minimize its complexity.

## LSVM Accuracy

The first objective function measures the accuracy of the classifier. Since in Algorithm 1 there is a specific goal that must be satisfied by each node in the cascade, we introduce a penalty term that is applied if the node classifier does not satisfy the learning goal; otherwise, the penalty value is set to zero, as follows:

$$f_1 = \text{Accuracy} - \text{Penalty} \qquad (7)$$

$$\text{Accuracy} = \frac{TP + TN}{\text{Total}} \qquad (8)$$

$$\text{Penalty} = \begin{cases} p, & TPR < TPR_{goal} \ \text{or} \ FPR > FPR_{goal} \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

$$TPR = \frac{TP}{TP + FN} \qquad (10)$$

$$FPR = \frac{FP}{FP + TN} \qquad (11)$$

where TP and TN are the numbers of true positives and true negatives, Total is the total size of the training set (both negative and positive classes), and $TPR_{goal}$ and $FPR_{goal}$ are the true positive and false positive goals, respectively.
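Under these definitions, the first objective can be sketched as below; the penalty magnitude is an assumption, since the text only states that a penalty is applied when the learning goal is missed:

```python
def accuracy_objective(tp, tn, total, tp_rate, fp_rate,
                       tp_goal, fp_goal, penalty=1.0):
    # Objective 1: raw accuracy, minus a penalty whenever the node
    # misses its per-stage learning goal from Algorithm 1.
    accuracy = (tp + tn) / total
    if tp_rate < tp_goal or fp_rate > fp_goal:
        return accuracy - penalty
    return accuracy
```

NSGA-II maximizes this value (equivalently, minimizes its negation), so chromosomes whose nodes violate the goal are pushed toward the bottom of the Pareto ranking.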

## LSVM Complexity

Fig. 5 shows the architecture of the decision function (4). It is clear that the complexity of the classifier depends on the number of support vectors (SV) that participate in the decision and on the dimensionality of the input pattern X.


Fig 5. LSVM Architecture

Therefore, the second objective function reduces the complexity of the classifier, which corresponds to reducing the number of support vectors and the number of features of the input pattern, as follows:

$$f_2 = N_{SV} + N_{features} \qquad (12)$$
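One way to realize this objective is the additive form below; the exact combination is an assumption, as the text only states that both quantities should shrink:

```python
def complexity_objective(n_support_vectors, n_selected_features):
    # Objective 2: smaller is better; counts the two terms that
    # determine the cost of evaluating the decision function (4).
    return n_support_vectors + n_selected_features
```

Since evaluating (4) costs one dot product per support vector, and each dot product costs one multiply-add per selected feature, both counts directly drive the per-window classification time.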

## Chromosome representation

To implement our approach, the input chromosome comprises three parts, as shown in Table 1: the LSVM penalty term C (a continuous value); the selected-features mask, which takes the value "1" when the corresponding feature is selected and "0" when it is not; and the candidate-support-vector mask, which marks points to be eliminated from the negative-class training sample, taking the Boolean value "1" when the corresponding candidate support vector is to be eliminated from the training set and "0" otherwise.

Table 1: Chromosome structure

| LSVM margin parameter (continuous) | Input features mask (binary) | Candidate support vectors mask (binary) |
|---|---|---|
| C | ... | ... |
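A decoding sketch for the chromosome of Table 1, using the sizes reported later in the training phase (126 features, 10 candidate support vectors, C in [10, 10000]); the linear mapping of the continuous gene to C is an assumption:

```python
import numpy as np

def decode_chromosome(chrom, n_features=126, n_candidate_svs=10,
                      c_min=10.0, c_max=10000.0):
    # Layout assumed from Table 1: [C | feature mask | candidate-SV mask].
    c_gene = float(chrom[0])                       # continuous gene in [0, 1]
    C = c_min + c_gene * (c_max - c_min)
    feature_mask = np.asarray(chrom[1:1 + n_features], dtype=bool)
    sv_mask = np.asarray(
        chrom[1 + n_features:1 + n_features + n_candidate_svs], dtype=bool)
    return C, feature_mask, sv_mask
```

With these sizes the chromosome length is 1 + 126 + 10 = 137, matching the figure given in the training-phase description.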

## Candidate support vectors identification

We adopt the technique from [20] to identify boundary points of the negative-class training sample. The advantage of eliminating border points from the negative-class sample is twofold: first, the optimal decision boundary is pushed toward the negative samples, yielding a cost-sensitive classifier in which the chance of missing a costly positive example is minimized; second, it increases the possibility of making the training data linearly separable. The main idea of candidate support vector identification is to center a sphere at each training example from the negative class, making the sphere as large as possible without covering an instance of the other (positive) class, and then to count the number of points n that fall inside the sphere. A larger value of n indicates that the corresponding example is less likely to be close to the decision boundary and thus less likely to be a support vector. Consequently, each training example from the negative class is associated with a statistical confidence value n; the negative-class training data are then sorted by confidence value, and the 30% with the smallest confidence values are selected as candidate support vectors.
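The sphere-based confidence count described above can be sketched as follows; this uses brute-force pairwise distances, and the exact procedure in [20] may differ:

```python
import numpy as np

def candidate_support_vectors(neg, pos, fraction=0.3):
    # For each negative example, grow a sphere up to (but not covering)
    # the nearest positive, then count the negatives inside it.  A small
    # count n means the point sits near the class boundary.
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    # Radius of each sphere: distance to the nearest positive example.
    radii = np.min(np.linalg.norm(neg[:, None] - pos[None, :], axis=2), axis=1)
    # Pairwise distances among negatives; count those inside each sphere.
    d_nn = np.linalg.norm(neg[:, None] - neg[None, :], axis=2)
    counts = (d_nn < radii[:, None]).sum(axis=1) - 1  # exclude the point itself
    # Return the indices of the least-confident 30% (candidate SVs).
    k = max(1, int(fraction * len(neg)))
    return np.argsort(counts)[:k]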


Fig 6. Candidate support vectors identification and elimination

## EXPERIMENTAL RESULTS

## Mammogram Dataset

The dataset used in this experiment was selected from the Mini-MIAS Mammographic Database [21]. The Mini-MIAS database includes 322 mammogram images of size 1024 x 1024 pixels with a resolution of 200 microns/pixel. The database is divided into three classes: 207 normal cases, 69 cancer cases, and 74 benign cases. Moreover, the cancer and benign cases are divided into different classes of abnormality, as shown in Table 2. In this study we are interested in three classes of abnormality: circumscribed masses, ill-defined masses, and spiculated masses. Therefore, a total of 59 images, comprising 25 circumscribed masses, 19 spiculated masses, and 15 ill-defined masses, were used, together with 60 randomly selected normal images.

Table 2: Mass classes of abnormality

| Mass abnormality | Malignant | Benign | Total |
|---|---|---|---|
| Circumscribed mass | 4 | 21 | 25 |
| Spiculated mass | 8 | 11 | 19 |
| Ill-defined mass | 8 | 7 | 15 |
| Architectural distortion mass | 10 | 9 | 19 |
| Asymmetry mass | 9 | 6 | 15 |

## Implementation

## Preprocessing phase

The mammogram images were enhanced with Contrast Limited Adaptive Histogram Equalization (CLAHE) [22] and smoothed with a Gaussian filter. Finally, each image was normalized by subtracting its mean intensity value from every pixel and dividing by its variance.
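The preprocessing chain can be sketched with scikit-image; this is an assumed implementation, the parameter values are illustrative, and the final step divides by the standard deviation, which is the usual choice for this kind of per-image normalization:

```python
import numpy as np
from skimage import exposure, filters

def preprocess(mammogram):
    # CLAHE enhancement, Gaussian smoothing, then per-image zero-mean,
    # unit-variance normalization as described above.
    img = exposure.equalize_adapthist(mammogram, clip_limit=0.02)
    img = filters.gaussian(img, sigma=1.0)
    return (img - img.mean()) / (img.std() + 1e-8)
```

The normalization makes the Haar-like feature values comparable across images with different overall brightness and contrast.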

## Training phase

A total of 60 mammogram images (30 normal + 30 containing masses) were used in the training phase. A sliding window of size 100 x 100 was used to scan the mammogram image with a one-pixel shifting step. The image was down-sampled iteratively five times by a factor of 0.8, from the original resolution of 1024 x 1024 down to 336 x 336. This yields a total of 2,020,033 windows per image, of which only 30 training windows are positive (contain a mass). Fig. 7 shows some examples of positive and negative windows. Each stage classifier was therefore constructed using the 30 positive sample windows and 30 negative samples randomly selected from the training database. The feature pool contained a total of 126 Haar-like features, the number of candidate support-vector negative samples was 30% of the negative class, and C ∈ [10, 10000]. The length of the chromosome was therefore 137 (1 for C + 126 features + 10 candidate SVs). The total number of cascaded LSVM classifiers was 31 stages.
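The multi-scale brute-force scan can be sketched as a generator; the per-scale bookkeeping here is an assumption, and window contents would be cropped from the image at each yielded position:

```python
import numpy as np
from skimage.transform import rescale

def sliding_windows(image, win=100, step=1, scales=5, factor=0.8):
    # Yield (scale_index, row, col) for every window position at the
    # original resolution and at each of the down-sampled levels.
    img = image
    for s in range(scales + 1):
        h, w = img.shape
        for r in range(0, h - win + 1, step):
            for c in range(0, w - win + 1, step):
                yield s, r, c
        img = rescale(img, factor, anti_aliasing=True)
```

At 1024 x 1024 with a one-pixel step and five 0.8x down-samplings, this enumeration produces roughly two million windows per image, which is what makes the cheap early cascade stages essential.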

Fig 7. Examples of training windows: "masses" (top row) and "non-masses" (bottom row)

## Testing phase

A total of 59 mammogram images (30 normal + 29 containing masses) were used in the testing phase. The same sliding window as in the training phase was used for detection. To reduce the scanning time and the false positive rate, we first generate candidate regions of the mammogram that are most likely to contain a mass by applying the dual morphological top-hat operation, which has been used successfully in [23][24]. The advantages of the top-hat operation are its simplicity and its efficiency in isolating candidate masses; its disadvantage is the loss of mass border detail caused by the erosion embedded in the operation. However, since we use a brute-force sliding window, we only need the location of the mass to be included in the scanning process; the Haar-like features are extracted from the whole window, which includes the full mass detail, as shown in Fig. 8. As mentioned in [23], the size of the structuring element depends on the image resolution; following [24], it was chosen as 75 pixels for the larger structuring element and 1 pixel for the smaller one. As shown in Fig. 8, only foreground objects remain after applying the top-hat operation.

Fig 8. (a) mammogram marked with a box around the mass,(b) after applying tophat operation

## False positive reduction

Two soft heuristic rules are used for false positive reduction. The first rule is that all overlapping detected locations whose minimum distance is closer than 100 pixels are consolidated together; this rule eliminates the redundancy of multiple windows. The second rule is that a detected region is considered suspect only if it was detected by a minimum of 3 windows, as shown in Fig. 9.
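The two rules can be sketched as a greedy single-link grouping; the grouping order and the use of the group centroid as the consolidated location are assumptions:

```python
import numpy as np

def reduce_false_positives(centers, merge_dist=100.0, min_hits=3):
    # Rule 1: detections closer than merge_dist to any member of a group
    # are consolidated into that group (greedy single-link grouping).
    # Rule 2: a group is kept only if at least min_hits windows agree.
    centers = np.asarray(centers, float)
    groups = []
    for c in centers:
        for g in groups:
            if any(np.linalg.norm(c - m) < merge_dist for m in g):
                g.append(c)
                break
        else:
            groups.append([c])
    return [np.mean(g, axis=0) for g in groups if len(g) >= min_hits]
```

Isolated single-window detections, which are usually spurious, are discarded, while clusters of agreeing windows are reported once at their centroid.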

Fig 9. (a) detected masses,(b) after applying the heuristic rules.

## Mass Detection Results

In this section we report the sensitivity and false positive rate of the proposed approach, broken down by image class and by mass abnormality, as follows:

## Testing on image class

A total of 59 mammogram images were used to evaluate the proposed scheme. As shown in Table 3, the approach achieved an overall sensitivity of 86% at 1.1 false positives per image. The proposed approach successfully detected all cancer cases; however, 4 benign cases were missed, for reasons explained in the section on failure cases.

Table 3: Testing results by image class

| Image class | Malignant images | Benign images | Normal images | All images |
|---|---|---|---|---|
| Sensitivity | 100% (10/10) | 79% (15/19) | - | 86% (25/29) |
| False positives per image | 1.8 (18 windows / 10 images) | 1.84 (35 windows / 19 images) | 0.4 (11 windows / 30 images) | 1.08 (64 windows / 59 images) |

## Testing on mass abnormality

As shown in Table 4, all ill-defined masses were successfully detected. However, the approach missed 1 spiculated and 3 circumscribed cases, for the reasons explained in the section on failure cases. The maximum false positive rate was found for the circumscribed cases.

Table 4: Testing results by mass abnormality

| Mass abnormality | Circumscribed masses (benign + malignant) | Spiculated masses (benign + malignant) | Ill-defined masses (benign + malignant) |
|---|---|---|---|
| Sensitivity | 63% (5/8) | 88% (7/8) | 100% (13/13) |
| False positives per image | 3.375 (27 windows / 8 images) | 1.75 (14 windows / 8 images) | 0.9 (12 windows / 13 images) |

## Failure cases

Fig. 10 shows the missed mammogram images with boxes around the true masses identified by the radiologist. It is clear that in Fig. 10(a) and (b) the masses are located in very dense areas and their borders are not clearly defined. In Fig. 10(c) the mass is fragmented and difficult to identify visually. The last mass was successfully identified by the system but rejected after applying the second soft heuristic rule of false positive reduction.

Fig 10. Missed cases

## Comparison with other approaches

The results demonstrate the usefulness of our approach for mammographic mass detection compared with the most relevant related studies that used the same dataset. Mudigonda et al. [25] proposed a mass detection scheme whose study included only circumscribed and spiculated masses from the Mini-MIAS database; the overall detection sensitivity of their method was 60% (26 of 43 masses detected) at 2.2 FPs per image. A very recent study by Dominguez et al. [26] proposed an automatic mass detection scheme, tested on circumscribed, ill-defined, and spiculated masses from Mini-MIAS, which achieved a sensitivity of 80% at 2.3 false positives per image.

## CONCLUSION

In this paper, we introduced a cascaded SVM classifier for mammographic mass detection. The nodes of the cascaded system are constructed by the NSGA-II multiobjective optimization algorithm, formulated with the objectives of maximizing the SVM classifier's accuracy and minimizing its complexity. The scheme is efficient in the detection of masses and achieved 86% sensitivity at 1.1 false positives per image.