Detection Of Clustered Microcalcifications Using Bootstrap Pixcals Algorithm Biology Essay


Detection of microcalcifications in mammograms has received much attention from researchers and public health practitioners in recent years. Microcalcifications appear in a mammogram as fine, granular clusters, which are tedious to identify in a raw mammogram. A variety of techniques have been proposed in the literature to enhance and automatically detect microcalcifications, but none of these methods gives complete detection and clinically acceptable results. Many mammograms do not follow any of the standard statistical distributions. Hence, in this paper, we propose the Bootstrap PIXCALS algorithm, in which a distribution-free, non-parametric bootstrap technique is embedded to detect microcalcifications. The proposed system classifies an image as normal or abnormal and, for an abnormal image, indicates the suspected area that contains microcalcifications. Different kinds of images have been tested using the proposed algorithm. The efficiency of the algorithm is measured using ROC analysis, and the results are compared with those of existing methods.

Keywords: Breast Cancer; Mammogram; Microcalcifications; Region of Interest; Bootstrap; clusters; box plot; K-means; Mammography.


Breast cancer is one of the leading causes of death among women. The leading Cancer Institute estimates that one out of eight women will develop breast cancer at some stage during her lifetime. Primary prevention seems impossible because the causes of this deadly disease remain unknown. Early detection is the key to improving breast cancer prognosis and treatment.

X-ray mammography is the most common technique used by radiologists in the screening and diagnosis of breast cancer. It is the most reliable method for early detection of breast carcinomas, reducing mortality rates by 25%. However, its interpretation is very difficult, and 10% to 30% of breast microcalcifications are missed during routine screening. To increase the diagnostic performance of radiologists, computer-aided diagnosis schemes have been developed to improve the detection of primary signatures such as masses and microcalcifications.

Masses are defined as space-occupying lesions that are described by their shapes and margin properties. A benign neoplasm is smoothly marginated, whereas a malignancy is characterized by an indistinct border that becomes more spiculated with time. Because of the slight differences in X-ray attenuation between masses and benign glandular tissue, masses appear with low contrast and are often blurred. Microcalcifications are tiny deposits of calcium that appear as small bright spots in the mammogram. They have higher inherent attenuation properties, but they cannot be distinguished from high-frequency noise because of their small size. The average size of a microcalcification is about 0.3 mm in diameter. Thus, the relevant features involved are variability, occurrence at different scales and orientations, and characterization by discontinuous changes in intensity, as well as more subtle global variations in texture. Dense tissue may easily be misinterpreted as calcification, yielding a high false-positive (FP) rate, which is a major problem in most of the algorithms.

To deal with these problems, many methods of automated digital mammography processing are available. Arianna Meneattini et al. [1] have discussed dyadic wavelet processing for the enhancement of mammographic images and the detection of breast cancer. Liyang Wei et al. [7] have discussed a Relevance Vector Machine, based on Bayesian estimation theory, for the detection of clustered microcalcifications. Kristin J. McLoughlin and J. Philips Bones [6] have developed a model using noise equalization, in which the estimation of the noise as a function of the gray level is improved by calculating the noise statistics with a truncated distribution method. Reyer Zwiggelaar et al. [12] used a method based on linear structures in mammographic images to detect and classify microcalcifications. Lemanur G. K. et al. [8] used new wavelets with a high Sobolev regularity index for detecting microcalcifications. Paul Sajda, Clay Spence and John Pearson [10] have described a pattern recognition architecture, the hierarchical pyramid/neural network, which learns to exploit image structure at multiple resolutions for detecting clinically significant features in digitized mammograms. Issam El-Naqa et al. [5] have designed a Support Vector Machine approach for the detection of microcalcifications. Heng-Da Cheng et al. [4] have given a novel approach to microcalcification detection using fuzzy logic. Songyang and Ling Guan [13] have developed a CAD system to detect clustered microcalcifications in digitized mammogram films. Ted C. Wang et al. [14] have given a wavelet model to find microcalcifications in digital mammograms.

Santra, Jai Singh & Deva Arul [21] have developed a new algorithm, PIXCALS, to identify and detect microcalcifications. Santra, Jai Singh & Deva Arul [22] have also developed the Pixcals Refined Bandwidth Algorithm, which identifies and detects microcalcifications by assuming a Gaussian distribution. Chang Wen Chen et al. [19] have developed a robust algorithm for the segmentation of three-dimensional image data based on a novel combination of adaptive K-means clustering and knowledge-based morphological operations. Ng et al. [18] have developed a model for medical image segmentation using K-means clustering and an improved watershed algorithm. However, the literature is scarce on segmentation methods that do not assume any underlying distribution. Therefore, in this paper, a distribution-free technique called Bootstrap PIXCALS is embedded in an algorithm. The acronym PIXCALS was chosen by the authors to indicate the movement of the algorithm over the pixels of the entire image to detect calcifications.


At present, CAD schemes are frequently evaluated with a database generated by the investigator(s), which may contain different proportions of subtle and obvious cases. As a consequence, it is not possible to perform meaningful comparisons of different schemes. A common database is an important step toward achieving consistency in performance comparison and the objective testing of algorithms. The Mammographic Image Analysis Society (MIAS), an organization of United Kingdom research groups interested in the understanding of mammograms, has produced a digital mammography database, which we have chosen to use in our research. An important characteristic of the MIAS database is that each abnormal image comes with a consultant radiologist's truth information, i.e., the locality of the abnormality is given as the coordinates of its center and the approximate radius (in pixels) of a circle enclosing the abnormality. An original MIAS mammogram (mdb218) with clusters of microcalcifications is shown in Fig. 1(a).


Histogram Thresholding procedure:

In a mammogram, the breast is bright in the middle of the tissue and gradually becomes darker towards the skin-air interface. So, if we create a binary image by choosing a proper initial threshold level, we can segment a large area of the breast region. As the threshold level is decreased step by step, the segmented region becomes larger and larger and gives a better approximation of the breast region. If this process continues, the growing region first fits and then exceeds the breast border. Thus, a criterion is needed to estimate the best threshold level for segmenting the breast region. For this purpose, the compactness of the growing region can be used. The algorithm calculates the compactness of the segmented region after each thresholding and then decreases the threshold level for the next iteration. This procedure is performed between two predefined threshold levels, the highest (initial) level and the lowest one. In each iteration, the algorithm selects the largest extracted region (the breast region) and removes the others. At the end, the algorithm chooses the threshold level with the lowest compactness and uses it to obtain the final binary image. Fig. 1 shows the result of histogram thresholding for two sample mammograms.
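The threshold search described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the paper does not define compactness, so the measure used here (perimeter squared divided by 4π times the area), the 4-connectivity, and the function names `largest_region`, `compactness` and `best_threshold` are all assumptions.

```python
import math
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def largest_region(mask):
    """Return the pixels (row, col) of the largest 4-connected region of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                region, queue = set(), deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    region.add((y, x))
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(region) > len(best):
                    best = region
    return best

def compactness(region):
    """ASSUMED measure: perimeter^2 / (4*pi*area), with perimeter = number of exposed pixel edges."""
    perimeter = sum((y + dy, x + dx) not in region
                    for (y, x) in region for dy, dx in NEIGHBOURS)
    return perimeter ** 2 / (4 * math.pi * len(region))

def best_threshold(image, high, low, step=1):
    """Lower the threshold from high to low and keep the level with the lowest compactness."""
    best = None
    for t in range(high, low - 1, -step):
        region = largest_region([[v >= t for v in row] for row in image])
        if not region:
            continue
        score = compactness(region)
        if best is None or score < best[0]:
            best = (score, t, region)
    if best is None:
        return None, set()
    return best[1], best[2]
```

With this definition a compact blob scores lower than the same blob with a thin protrusion, which is the behaviour the stopping criterion relies on.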

Fig. 1. Regions created by histogram thresholding. Fig. 1(a) and (b) - Original mammograms obtained from the MIAS database (mdb218, mdb248). Fig. 1(c) and (d) - Resulting images.


Median filtering

Median filtering has been found to be very powerful in removing noise from two-dimensional signals without blurring edges [11]. This makes it particularly suitable for enhancing mammogram images [15]. To apply median filtering to a mammogram, each pixel value is replaced with the median pixel value computed over a square area of 5 x 5 pixels centered at the pixel location. Fig. 2(b) shows the image produced when a median filter with a support region of size 5 x 5 is applied to the original MIAS image.
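A minimal sketch of the 5 x 5 median filter is given below. The paper does not say how image borders are handled, so clipping the window at the border is an assumption, and the function name `median_filter` is hypothetical.

```python
from statistics import median

def median_filter(image, size=5):
    """Replace each pixel with the median of the size x size window centered on it.

    ASSUMPTION: near the border the window is clipped to the image.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[y][x]
                      for y in range(max(0, r - half), min(rows, r + half + 1))
                      for x in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = median(window)
    return out
```

An isolated noisy pixel in a flat region is removed completely, because it never reaches the middle of the sorted window.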

Image Enhancement

Enhancement aims to improve the quality of a given image [11]. It can be accomplished by enhancing contrast and enhancing edges. Contrast enhancement with unsharp masking filters improves the readability of areas with subtle changes in contrast. Many image enhancement techniques are based on spatial operations performed on local neighborhoods of input pixels [16, 17]. Often, the image is convolved with a finite impulse response filter called a spatial mask. The following subsections discuss the methods used for image enhancement.

Spatial Low-Pass Filtering

Here each pixel is replaced by a weighted average of its neighborhood pixels, that is

$$ v(x, y) = \sum_{(k, l) \in w} a(k, l) \, f(x - k, y - l) $$

where f(x, y) and v(x, y) are the input and output images, respectively, w is a suitably chosen window, and a(k, l) are the filter weights. A common class of spatial low-pass filters has all equal weights, giving

$$ v(x, y) = \frac{1}{N_w} \sum_{(k, l) \in w} f(x - k, y - l) $$

where a(k, l) = 1/N_w and N_w is the number of pixels in the window w. Here we convolve the resulting image with a 3 x 3 window.

Unsharp Masking Filter

The unsharp masking filter [16, 17] is a simple sharpening operator which derives its name from the fact that it enhances edges (and other high frequency components in an image) via a procedure which subtracts an unsharp, or smoothed, version of an image from the original image.

Unsharp masking produces an edge image g(x, y) from an input image f(x, y) via

$$ g(x, y) = f(x, y) - f_{smooth}(x, y) $$

where f_smooth(x, y) is a smoothed version of f(x, y).

This edge image can be used for sharpening if we add it back into the original signal. The enhanced image is obtained from the input image f(x, y) as

$$ f_{sharp}(x, y) = f(x, y) + \lambda \, g(x, y) $$

where λ, which controls the shape of the Laplacian, must be in the range 0.0 to 1.0, and g(x, y) is a suitably defined gradient at (x, y). A commonly used gradient function is the discrete Laplacian.
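The sharpening step can be illustrated with the short sketch below. It assumes that the smoothed image is produced by an equal-weight 3 x 3 low-pass filter and that the edge image is the difference between the input and its smoothed version; the clipped-window border handling and the names `box_smooth` and `unsharp_mask` are assumptions made for illustration, not the authors' Matlab code.

```python
def box_smooth(image, size=3):
    """Equal-weight low-pass filter: each pixel becomes the mean of its window.

    ASSUMPTION: the window is clipped at the image border.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[y][x]
                      for y in range(max(0, r - half), min(rows, r + half + 1))
                      for x in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = sum(window) / len(window)
    return out

def unsharp_mask(image, lam=0.5, size=3):
    """Add the edge image g = f - f_smooth back to f, scaled by lam (0.0 to 1.0)."""
    smooth = box_smooth(image, size)
    return [[f + lam * (f - s) for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(image, smooth)]
```

Flat regions are left unchanged, while pixels on either side of an intensity step are pushed further apart, which is what makes edges look sharper. In practice the result would usually be clipped back to the valid gray-level range.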



Using the radiologist's truth information supplied with each abnormal MIAS image (the coordinates of the center of the abnormality and the approximate radius, in pixels, of a circle enclosing it), it is possible to extract subimages. The subimages contain all biopsy-truthed Regions of Interest and the physician-annotated abnormalities identified in the corresponding pathology reports. These subimages considerably reduce algorithm development time and computer memory requirements. Fig. 2(d) shows subimages of the mammogram.

Fig. 2(a) - Original mammogram obtained from the MIAS database (mdb218).

Fig. 2(b) - Resulting image after the 5 x 5 median filter.

Fig. 2(c) - Enhanced image after applying the unsharp masking filter.

Fig. 2(d) - Subimages extracted from Fig. 2(c).


An outlier is an observation that lies an abnormal distance from other values. Here, all the pixel values are considered the population and the subimages are treated as samples. Box plots are an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating changes in location and variation between different groups of data. A point beyond an inner fence on either side is considered a mild outlier. A point beyond an outer fence is considered an extreme outlier. In this study, both mild and extreme outliers are detected and eliminated during the process.
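The fence-based classification can be sketched as follows. The paper does not state the fence positions, so the conventional box-plot choice (inner fences at 1.5 times the interquartile range beyond the quartiles, outer fences at 3 times) is assumed here, as is the quartile method used by Python's `statistics.quantiles`; the function name `classify_outliers` is hypothetical.

```python
from statistics import quantiles

def classify_outliers(values):
    """Split values into (kept, mild outliers, extreme outliers) using box-plot fences.

    ASSUMPTION: inner fences at 1.5 * IQR and outer fences at 3 * IQR from the quartiles.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    kept, mild, extreme = [], [], []
    for v in values:
        if v < outer_low or v > outer_high:
            extreme.append(v)       # beyond an outer fence
        elif v < inner_low or v > inner_high:
            mild.append(v)          # between an inner and an outer fence
        else:
            kept.append(v)
    return kept, mild, extreme
```

Only the `kept` values would be passed on to the clustering stage.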


Image segmentation remains one of the major challenges in image analysis. We make use of the K-means clustering algorithm [18], which is an unsupervised method, to provide a primary segmentation of the image. K-means clustering is often suitable for biomedical image segmentation since the number of clusters (K) is usually known for images of particular regions of human anatomy [19].

Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k < n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS):

$$ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 $$

where μi is the mean of the points in Si.

Given an initial set of k means m1(1), …, mk(1), which may be specified randomly or by some heuristic, the algorithm proceeds by alternating between two steps:

Assignment step: Assign each observation to the cluster with the closest mean.

$$ S_i^{(t)} = \{ x_j : \lVert x_j - m_i^{(t)} \rVert \le \lVert x_j - m_{i^*}^{(t)} \rVert \text{ for all } i^* = 1, \ldots, k \} $$

Update step: Calculate the new means to be the centroids of the observations in each cluster.

$$ m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j $$

The algorithm is deemed to have converged when the assignments no longer change. This algorithm is augmented with the Bootstrap PIXCALS algorithm.
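The two alternating steps can be illustrated on scalar gray levels with the sketch below. It is a plain one-dimensional k-means, not the authors' code; the evenly spaced initial means and the function name `kmeans_1d` are assumptions chosen so that the example is deterministic.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar gray levels into k groups; returns (means, assignment)."""
    lo, hi = min(values), max(values)
    # ASSUMPTION: initial means are spread evenly over the value range.
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each value goes to the cluster with the closest mean.
        new_assignment = [min(range(k), key=lambda i: (v - means[i]) ** 2)
                          for v in values]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Update step: each mean becomes the centroid of its members.
        for i in range(k):
            members = [v for v, a in zip(values, assignment) if a == i]
            if members:
                means[i] = sum(members) / len(members)
    return means, assignment
```

For example, `kmeans_1d([10, 12, 11, 200, 205, 198], k=2)` separates the dark background values from the bright ones.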


Once an image has been preprocessed, the microcalcifications in the enhanced subimages (Fig. 3(a)) tend to be among the brightest objects, but they may lie within regions of high average gray level and thus prove difficult to segment reliably. Therefore, each pixel is taken into account to detect microcalcifications. Many authors have assumed a particular distribution for the pixel values. However, such an assumption is not suitable in general. Hence, a distribution-free technique, the Bootstrap PIXCALS algorithm, has been developed to detect microcalcifications.

The bootstrap, originally proposed and named by Efron (1979), is a computational technique that can be used to effectively estimate the sampling distribution of a statistic [2]. In particular, one can use the nonparametric bootstrap to estimate the sampling distribution of a statistic while assuming only that the sample is representative of the population from which it is drawn and that the observations are independent and identically distributed. In its simplest form, the nonparametric bootstrap does not rely on any distributional assumptions about the underlying population.

To see how the nonparametric bootstrap works, suppose we use a random variable, X, to evaluate the performance of a process. Although we do not have any information regarding the distribution of X, we wish to estimate some parameter, $\theta$, that characterizes the performance of the process. For example, $\theta$ may be the mean, median, or standard deviation of the population. A sample of n observations is drawn from the population and denoted by x1, x2, …, xn. An estimate of the parameter of interest can be computed from this sample and is referred to as $\hat{\theta}$.

According to the nonparametric bootstrap, the empirical distribution function (EDF) can be used to estimate the underlying population cumulative distribution function. The EDF simply assigns a probability of 1/n to each value observed in the sample and is written

$$ \hat{F}_n(x) = \frac{1}{n} \, (\text{number of } X_i \le x) \qquad (9) $$

A simple random sample of size n can be drawn from the EDF and denoted by $x_1^*, x_2^*, \ldots, x_n^*$. This sample is called the bootstrap sample. An estimate of $\theta$ can be computed from the bootstrap sample. This estimate is denoted by $\hat{\theta}^*$ and is called the bootstrap estimate. This resampling procedure can be repeated multiple times, for example, B times, and the B bootstrap estimates $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$ can be computed from the resamples. The Bootstrap PIXCALS algorithm is given below.
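The resampling idea can be illustrated with a few lines of Python. This is a generic nonparametric bootstrap, shown only as a sketch; the function names `edf` and `bootstrap_estimates`, the fixed seed, and the default of 1000 resamples are illustrative assumptions.

```python
import random
from statistics import mean

def edf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

def bootstrap_estimates(sample, statistic, B=1000, seed=0):
    """Draw B bootstrap samples (size n, with replacement) and apply the statistic to each."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]
```

Calling `bootstrap_estimates(sample, mean)` gives B bootstrap estimates of the mean; passing `statistics.median` or `statistics.stdev` instead would bootstrap those statistics in the same way, without any assumption about the population distribution.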

Bootstrap PIXCALS Algorithm to detect Microcalcifications

Step 1: Read the mammogram image and store it in a two dimensional matrix.

Step 2: Segment the Breast Region by using histogram threshold method.

Step 3: Apply the preprocessing (Median Filter) and enhancement technique (Unsharp masking) to remove noise, to enhance contrast and to enhance edges.

Step 4: Segment the subimage from the enhanced image.

Step 5: Eliminate mild and extreme outliers from the subimage.

Step 6: Apply the K-means algorithm to segment the mammogram into k subgroups.

Step 7: Observe k subgroups of size n, for a total of nk observations.

Step 8: Draw a random sample of size n, with replacement, from the pooled sample of nk observations. This sample, $x_1^*, x_2^*, \ldots, x_n^*$, is a bootstrap sample.

Step 9: Compute the sample mean from the bootstrap sample drawn in Step 8:

$$ \bar{x}^* = \frac{1}{n} \sum_{i=1}^{n} x_i^* $$

Step 10: Repeat Steps 8-9 a large number of times, say B times.

Step 11: Sort the B bootstrap estimates, $\bar{x}^*_1, \bar{x}^*_2, \ldots, \bar{x}^*_B$.

Step 12: Find the smallest ordered $\bar{x}^*$ such that $(\alpha/2)B$ values are below it. This is the Lower Bandwidth Limit, LBL.

Step 13: Find the smallest ordered $\bar{x}^*$ such that $(1 - \alpha/2)B$ values are below it. This is the Upper Bandwidth Limit, UBL.

Here, α is the desired false alarm rate. It must be in the range 0 < α < 1.

Step 14: Segment the Region of Interest (ROI) based on the threshold value UBL. The threshold image R(x, y) is defined as

$$ R(x, y) = \begin{cases} 1, & f(x, y) > \mathrm{UBL} \\ 0, & \text{otherwise} \end{cases} $$

where f(x, y) is the enhanced subimage and a value of 1 denotes a white pixel.

Step 15: The white pixels in the resulting image R(x, y) are taken to be microcalcifications.
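Steps 8 to 14 can be sketched as below. This is a hedged illustration rather than the authors' Matlab program: it assumes the pooled sample is simply the list of subimage gray levels left after outlier removal, that the limits are read from the sorted bootstrap means at positions (α/2)B and (1 - α/2)B, and that pixels strictly above UBL are marked white. The names `bootstrap_limits` and `threshold_image` are hypothetical.

```python
import random

def bootstrap_limits(pooled, n, B=1000, alpha=0.1, seed=0):
    """Return (LBL, UBL) from B bootstrap means of samples of size n."""
    rng = random.Random(seed)  # seeded only for reproducibility of the sketch
    means = sorted(sum(rng.choices(pooled, k=n)) / n for _ in range(B))
    # In a sorted list, the value at 0-based index m has exactly m values below it.
    lower_idx = min(B - 1, round(alpha / 2 * B))
    upper_idx = min(B - 1, round((1 - alpha / 2) * B))
    return means[lower_idx], means[upper_idx]

def threshold_image(image, ubl):
    """R(x, y) = 1 (white) where the gray level exceeds UBL, else 0."""
    return [[1 if v > ubl else 0 for v in row] for row in image]
```

A usage example on a small synthetic subimage (the gray levels are made up for illustration):

```python
subimage = [[100 + (r * 6 + c) % 5 for c in range(6)] for r in range(6)]
subimage[2][3] = 250                      # one bright, calcification-like pixel
pooled = [v for row in subimage for v in row]
lbl, ubl = bootstrap_limits(pooled, n=6, B=1000, alpha=0.1)
mask = threshold_image(subimage, ubl)     # only the bright pixel is white
```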


The experiments were conducted on digitized mammograms with a spatial resolution of 200 µm from the Mini-MIAS database, which were clipped or padded so that every image is of size 1024 × 1024 pixels. The test set comprises 25 tumor cases and 25 randomly selected non-tumor cases. All of them had been read by a radiologist with mammographic expertise and proven by biopsy. In the experiment, α is taken as 0.1 and 0.2. The number of bootstrap resamples is 1000.

The images in Fig. 3 are used to demonstrate the robustness of the proposed method. The enhanced MIAS subimages (mdb209, mdb214, mdb218, mdb227) are given in Fig. 3(a). The segmentation of microcalcifications is indicated by white pixels and is shown in Fig. 3(b). The results show that the microcalcifications are detected and that their main features are well marked.

Fig. 3(a) - Enhanced MIAS subimages (mdb209, mdb214, mdb218, mdb227).

Fig. 3(b) - Resulting images: segmentation of microcalcifications indicated by white pixels.

10. Receiver Operating Characteristic (ROC) Analysis and Discussion

An ROC curve is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) and is used to determine the performance of a Computer Aided Diagnosis (CAD) system. Diagnostic tests have particular importance in medicine, where early and accurate diagnosis can decrease the morbidity and mortality due to disease. For many years, diagnostic performance was judged by the accuracy of the test.

According to Seong Ho Park et al. [20], the area under the ROC curve is an important criterion for evaluating diagnostic performance. It is usually referred to as the Az index. The value of Az is 1.0 when the diagnostic detection has perfect performance, which means that the True Positive (TP) rate is 100% and the False Positive (FP) rate is 0%. The Az value is estimated by applying the trapezoidal rule to the ROC curve.
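The trapezoidal estimate of Az can be written in a few lines. The sketch below assumes the operating points are given as (FPR, TPR) pairs and that the trivial end points (0, 0) and (1, 1) are added before integrating; the function name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(points):
    """Area under an ROC curve given operating points as (FPR, TPR) pairs.

    ASSUMPTION: the curve is closed with the end points (0, 0) and (1, 1).
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    # Sum the areas of the trapezoids between consecutive points.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

A perfect detector, with the single operating point (0, 1), gives an area of 1.0, while a point on the chance diagonal gives 0.5.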

Based on the radiologist's truth information supplied with each abnormal MIAS image (the coordinates of the center of the abnormality and the approximate radius, in pixels, of a circle enclosing it), the test results of the 50 images are manually classified as definitely normal, probably normal, probably abnormal and definitely abnormal. Table I shows these classes with the corresponding frequencies for the proposed method and the existing methods.

Table I - Classification of resulting images with number of frequencies for proposed and existing methods

Method Name | True Disease Status | Definitely Normal | Probably Normal | Probably Abnormal | Definitely Abnormal
Pixcals Refined Bandwidth Algorithm [22] | | | | |
Proposed method | | | | |

The TPR and FPR are calculated by using the following formulae,

TPR = TP / (TP + FN) (12)

FPR = 1 - (TN / (FP+TN)) (13)

By using the data from Table I, the true positive rate (TPR) and false positive rate (FPR) are calculated at four different operating points (or classes) and are given in Table II for the proposed and existing methods. The ROC curve is generated from Table II.
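As an illustration of how equations (12) and (13) are applied to rating data, the sketch below computes one (FPR, TPR) pair per cut point between adjacent rating categories. The counts in the usage example are purely hypothetical and are not the values of Table I; the function name `roc_points` is also an assumption.

```python
def roc_points(normal_counts, abnormal_counts):
    """Return one (FPR, TPR) pair per cut point between adjacent rating categories.

    Both lists hold image counts ordered from 'definitely normal' to
    'definitely abnormal'; ratings at or above the cut are called positive.
    """
    points = []
    for cut in range(1, len(normal_counts)):
        tp = sum(abnormal_counts[cut:])   # abnormal images called positive
        fn = sum(abnormal_counts[:cut])   # abnormal images called negative
        fp = sum(normal_counts[cut:])     # normal images called positive
        tn = sum(normal_counts[:cut])     # normal images called negative
        tpr = tp / (tp + fn)              # equation (12)
        fpr = 1 - tn / (fp + tn)          # equation (13)
        points.append((fpr, tpr))
    return points

# HYPOTHETICAL counts for 25 normal and 25 abnormal images (not the paper's data).
example = roc_points([20, 3, 1, 1], [1, 1, 3, 20])
```

Four rating categories give three cut points, which matches the three rows of cut points listed in Table II.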

Table II: Measures of TPR and FPR for proposed and existing methods

Cut Points | Pixcals Refined Bandwidth Algorithm [22] | Proposed method
Definitely Normal and Probably Normal | |
Probably Normal and Probably Abnormal | |
Probably Abnormal and Definitely Abnormal | |

To deal with the multiple pairs of sensitivity and specificity values, one can draw a graph using the TPR as the y coordinate and the FPR as the x coordinate. Each discrete point on the graph, called an operating point, is generated by using a different cutoff level for a positive test result. An ROC curve connects all the points obtained at all the possible cutoff levels. Fig. 4 shows the ROC curves for the proposed, Pixcals Refined Bandwidth and Pixcals algorithms. The area under the ROC curve (AUC) is a combined measure of sensitivity and specificity. The experimental results show a TP rate of 95%, with a corresponding area under the ROC curve (Az) of 0.95, for the proposed method; a TP rate of 94%, with an Az value of 0.94, for the Pixcals Refined Bandwidth algorithm [22]; and a TP rate of 89%, with an Az value of 0.89, for Pixcals [21]. The proposed method gives better results than the previous methods [21][22]. The ROC curves in Fig. 4 are constructed from Table II.

Fig. 4. ROC curves for the proposed and existing methods


In this paper, different kinds of mammograms from clinical applications are considered for the detection of microcalcifications. The Bootstrap PIXCALS algorithm is developed and used to detect microcalcifications in digitized mammograms. The new system consists of three major steps: image preprocessing, segmentation of the subimage, and detection of microcalcifications in the subimages. Because the method is distribution free, it can locate microcalcifications in different kinds of mammograms without assuming any distribution for the pixel values. The sizes, shapes and intensities of the microcalcification clusters are well marked. The proposed method is implemented in Matlab. The system is capable of detecting microcalcifications, which can be visualized in the output images, and hence it can be used for the early detection of breast cancer. With the Bootstrap PIXCALS algorithm, a true positive rate of 95% is achieved.