Breast cancer has long been a threat to women. It has become one of the most common diseases among women in many developed countries, and it is on the rise in developing countries. The limitations of mammography as a screening and diagnostic modality, especially in young women with dense breasts, necessitated the development of novel and more effective modalities with better sensitivity and specificity. Currently, there are several non-invasive techniques to detect breast cancer. These techniques are based on ionizing radiation (mammography, computed tomography), nuclear imaging (scintimammography, positron emission tomography), light properties (optical imaging), thermal properties (thermography), electrical properties (electrical impedance tomography, electrical impedance scanning), magnetic properties (magnetic resonance imaging, magnetic resonance spectroscopy), and acoustic properties (ultrasound, elastography). Each of these modalities has advantages and limitations in terms of effectiveness across breast densities, cost, radiation exposure risk, speed, and patient comfort.
In this work, we investigated the effectiveness of infrared thermography in detecting breast cancer. Thermal imaging (thermography) is a noninvasive imaging procedure used to record thermal patterns (called thermograms) using an infrared (IR) camera. Changes have been detected and measured in the skin temperature of clinically healthy and cancerous breasts [2, 3]. The metabolic activity and vascular circulation in pre-cancerous tissue and its surrounding area are often higher than in normal breast tissue. Cancerous tumors increase circulation to their cells in order to supply nutrients by opening existing blood vessels, dormant (inactive) vessels, and new ones. This increased circulation results in an increase in the regional surface temperature of the breast that can be detected by infrared imaging. The procedure involves the use of medical infrared cameras and computers to detect and produce high quality images of temperature variations. Most breast cancer detection modalities are used to locate the tumor. Infrared imaging, on the other hand, focuses on finding thermal signs that suggest the presence of an early stage tumor which cannot be detected physically, or that suggest a pre-cancerous stage based on small variations in normal blood vessel activity. Thus, the earliest signs of breast cancer and the pre-cancerous state of the breast can be observed in the temperature spectrum.
In a study by Gautherie et al, 1527 patients with initially healthy breasts and abnormal thermograms were followed for a period of 12 years, and 44% of the patients developed cancer within the first five years. The group concluded that "an abnormal thermogram is the single most important marker of high-risk category for the future development of breast cancer". Similar conclusions were reached in many other studies [6, 7]. On the downside, the accuracy of thermography depends on many factors such as the symmetry of the temperature distribution in the breasts, temperature stability, physiological state, and menstruation. Determining the breast surface isotherm pattern and the normal range of cyclic variations of the temperature distribution can help in identifying the abnormal infrared images of diseased breasts. Therefore, Ng et al investigated the cyclic variation of temperature and vascularization in normal breast thermograms under a controlled environment. The authors presented a method to segment the thermograms and to choose an ideal time for thermal examination. IR imaging is becoming an increasingly popular diagnostic tool for detecting various diseases. It has been widely used to detect malignant tumors in the breast by thermovision techniques [8-11].
Manual inspection of the thermograms to detect the presence of cancer is tedious, time-consuming, and prone to inter-observer variability. To make the interpretation more objective, Computer Aided Diagnostic (CAD) techniques are being used. These techniques are based on a data mining framework. In this framework, the acquired data (images, in this case) are first pre-processed. Subsequently, features are extracted and selected from the pre-processed images and used in classifiers that classify an image as cancerous or non-cancerous. In this work, we evaluated the utility of texture based features from the thermograms in detecting cancer.
The texture of an image can be defined as a function of the spatial variation in pixel intensities. In the medical field, analysis of texture plays an important role in a number of applications. Recently, Tan et al used texture features to study the ocular thermograms of young and elderly subjects and found a significant difference in their respective texture parameters.
The block diagram depicting the data mining framework used in this work is shown in Figure 1. The acquired thermograms are cropped and converted to gray-scale as part of the pre-processing step. Subsequently, texture features based on the co-occurrence matrix and the run length matrix are extracted, and the significant features among them are selected using the t-test. The selected features are fed to several classifiers to classify the input thermogram as normal or cancerous.
Figure 1 Block diagram of the data mining framework employed in this work
This chapter is organized as follows. In Section 2, we provide details about the data acquisition process and the pre-processing steps. Section 3 presents the theoretical description of the texture features. Classifiers are described in Section 4. The features and the classification results are presented in Section 5. We discuss the results obtained in Section 6, and conclude the chapter in Section 7.
2 Data Acquisition and Pre-processing
The thermograms used in this work were collected using non-contact thermography from patients in the Department of Diagnostic Radiology, Singapore General Hospital, Singapore [14, 15]. Infrared thermograms were acquired using the NEC-Avio Thermo TVS2000 MkIIST System (3.0-5.4 µm short wavelength; 30 frames/sec; Stirling cooler; InSb detector with 256 x 200 elements; from Japan) (Website: www.nec-avio.co.jp/en/contact/index.html). This system has a measurement accuracy of ±0.4% (full scale) and a temperature resolution of 0.1°C at 30°C black body, with the instrument placed one meter away from the chest with the lens (FOV 15° x 10°, IFOV 2.2 mrad) attached. 90 patients were chosen at random to undergo the thermography examination. The examination was done in a temperature-controlled room with a temperature range of 20°C to 22°C (within ±0.1°C). The humidity of the examination room was maintained at 60% ± 5% [16-20]. The patients were required to rest for at least 15 minutes to stabilize and reduce the basal metabolic rate in order to ensure minimal surface temperature changes, and therefore, to obtain satisfactory thermograms [21, 22]. Moreover, the patients were asked to wear a loose gown that does not restrict airflow. Furthermore, it was ensured that the patients were within the recommended period of the 5th to 12th and 21st day after the onset of the menstrual cycle, since during these periods the vascularization is at basal level with the least engorgement of blood vessels [8, 23].
In this work, we used a total of 50 thermograms, of which 25 were from cancer patients (age: 51 ± 8 years) and 25 were from normal subjects (age: 46 ± 10 years). In the cancerous class, 15 patients had stage III cancer and the rest had stage II cancer. 50% of the lumps were found in the upper-outer quadrant, 35% in the area behind the nipple, and 15% in the upper-inner quadrant. We analyzed the cancerous breast in each of the 25 cancerous cases and one normal breast in each of the 25 normal cases. Figure 2(a) shows the thermogram image of a cancerous breast, Figure 2(b) presents the corresponding gray-scale image, and the 50 x 120 cropped images of the left and right breasts are shown in Figure 2(c).
Figure 2 Thermogram images of a cancerous case: (a) Original thermogram (b) Gray-scale version (c) Cropped right and left breasts
3 Texture Features
Image texture provides measures of properties such as smoothness, coarseness, and regularity of pixels in an image. These features describe the mutual relationship among intensity values of neighboring pixels repeated over an area larger than the size of the relationship. The three main approaches used for texture description are statistical, structural, and spectral. Structural techniques analyze the arrangement of image primitives, such as the regularity of parallel lines. Spectral techniques use the Fourier spectrum of the image to analyze the high energy areas and narrow peaks in the spectrum. In this work, we used the statistical approach, which generally characterizes textures as smooth, coarse, grainy, and so on. Measures include entropy, contrast, and correlation based on the gray level co-occurrence matrix. The statistical features extracted from the thermograms are described in this section.
3.1 Gray-Level Co-occurrence Matrix (GLCM)
The gray-level co-occurrence matrix (GLCM) stores the spatial relationship between pixels. The GLCM computation scans through the image pixels and characterizes the texture by calculating how often pairs of pixels with specific gray-level intensities and in a specified spatial relationship occur in the image. That is, each element (i, j) in the resultant GLCM indicates the total number of times a pixel with the gray-level intensity value i occurs in a specific spatial relationship to a pixel with the gray-level intensity value j. The size of the GLCM depends on the number of gray levels. Once the GLCM is formed, features are extracted from it for further processing.
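The scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name glcm, the parameters dx, dy and levels, and the convention that pairs reaching outside the image are skipped are all choices made here for illustration.

```python
def glcm(image, dx, dy, levels):
    """Count co-occurring gray-level pairs for the displacement (dx, dy).

    image  : list of rows, each a list of integer gray levels in [0, levels)
    dx, dy : column and row offsets of the second pixel (hypothetical names)
    levels : number of gray levels; the result is a levels x levels matrix
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for p in range(rows):
        for q in range(cols):
            p2, q2 = p + dy, q + dx
            # Assumption: pairs whose second pixel is outside the image are skipped.
            if 0 <= p2 < rows and 0 <= q2 < cols:
                counts[image[p][q]][image[p2][q2]] += 1
    return counts


if __name__ == "__main__":
    toy = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    # Horizontal neighbour one pixel to the right.
    for row in glcm(toy, dx=1, dy=0, levels=4):
        print(row)
```

The matrix grows with the square of the number of gray levels, which is why images are often quantized to fewer levels before this step; whether that was done here is not stated in the text.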
Mathematically, given an image I of size M x N and a displacement d = (Δx, Δy), the GLCM is defined by
$$C_d(i, j) = \left| \left\{ \big( (p, q), (p + \Delta x, q + \Delta y) \big) : I(p, q) = i, \; I(p + \Delta x, q + \Delta y) = j \right\} \right| \tag{1}$$
where (p, q) and (p + Δx, q + Δy) are pixel positions within the M x N image, and |·| denotes the cardinality of a set. Using this matrix, four features are calculated: (1) Homogeneity measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal. (2) Energy indicates the denseness of an image. (3) Contrast depicts the local variations or differences in the image. (4) Entropy measures the randomness or the degree of disorder in an image. It has its maximum value when all elements of the co-occurrence matrix are the same.
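Given the gray-level intensity i in an image, the probability that a pixel at a (Δx, Δy) distance away has a gray-level intensity j can be expressed as
$$P_d(i, j) = \frac{C_d(i, j)}{\sum_{u} \sum_{v} C_d(u, v)} \tag{2}$$
where C_d(i, j) is the GLCM entry for the displacement d = (Δx, Δy), and the sums in the denominator run over all gray levels.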
The next four features that were extracted were the following moments: m1, m2, m3, and m4, which were obtained using equation (3).
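As a rough illustration of how the four GLCM features and the four moments could be computed from the co-occurrence probabilities, consider the sketch below. The formulas are common textbook forms and are assumptions here; in particular, the homogeneity weight 1 / (1 + (i - j)^2) and the moment definition m_g = sum of (i - j)^g P(i, j) are not confirmed by the surviving text. The function names are hypothetical.

```python
import math


def normalize(counts):
    """Turn raw co-occurrence counts into probabilities that sum to 1."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Homogeneity, energy, contrast and entropy of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        # Assumed form: weight falls with squared distance from the diagonal.
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Empty cells are skipped, taking 0 * log(0) as 0.
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


def glcm_moment(p, g):
    """Assumed moment definition: sum over (i, j) of (i - j)**g * p[i][j]."""
    n = len(p)
    return sum((i - j) ** g * p[i][j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    p = normalize([[1, 1], [1, 1]])
    print(glcm_features(p))
    print([glcm_moment(p, g) for g in (1, 2, 3, 4)])
```

With a uniform matrix the entropy equals the logarithm of the number of cells, which is the maximum mentioned above for the case where all elements are the same.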
The GLCM is also used to calculate the difference statistics. Difference statistics indicate the distribution of the probability that the gray-level difference is k between the points separated by δ in an image. They are a subset of the co-occurrence matrix, and are obtained from the GLCM using the equation
$$P_\delta(k) = \sum_{i} \sum_{j} P_\delta(i, j) \tag{4}$$
where the sum is taken over the pairs with |i - j| = k, P_δ(i, j) is the co-occurrence probability of gray levels i and j for the displacement δ, k = 0, 1, ..., n - 1, and n is the number of gray-scale levels. Weszka et al [26, 27] computed the following features from P_δ(k), and we extracted the same in this work.
Angular Second Moment: $\sum_{k=0}^{n-1} P_\delta(k)^2$ (5)
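The collapse of the co-occurrence probabilities onto the gray-level difference k can be sketched as follows. Only the angular second moment is named in the text above; contrast, entropy and mean are the other measures commonly attributed to Weszka et al, and including them here is an assumption. Function names are hypothetical.

```python
import math


def difference_statistics(p):
    """Collapse a normalized GLCM p into P(k), the probability that |i - j| == k."""
    n = len(p)
    diff = [0.0] * n
    for i in range(n):
        for j in range(n):
            diff[abs(i - j)] += p[i][j]
    return diff


def difference_features(diff):
    """Measures computed on P(k); all but the first are assumed, see the lead-in."""
    return {
        "angular_second_moment": sum(v * v for v in diff),
        "contrast": sum(k * k * v for k, v in enumerate(diff)),
        "entropy": -sum(v * math.log(v) for v in diff if v > 0),
        "mean": sum(k * v for k, v in enumerate(diff)),
    }


if __name__ == "__main__":
    p = [[0.25, 0.25], [0.25, 0.25]]
    print(difference_features(difference_statistics(p)))
```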
3.2 Run Length Matrix
A gray level run is a set of consecutive pixels having the same gray-level intensity value. The run length indicates the number of pixels in the run. In the run length matrix, the gray-level runs are characterized by the gray tone, length, and the direction of the run. Each entry in the matrix indicates the number of elements where the gray-level intensity i has the run length j continuous in the direction θ. Various textural features may be calculated from the run length matrices of θ = 0°, 45°, 90°, and 135°. The features extracted from the run length matrix are given below.
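Short Run Emphasis: $\frac{\sum_i \sum_j P_\theta(i, j) / j^2}{\sum_i \sum_j P_\theta(i, j)}$ (9)
Long Run Emphasis: $\frac{\sum_i \sum_j j^2 P_\theta(i, j)}{\sum_i \sum_j P_\theta(i, j)}$ (10)
Gray Level Non-Uniformity: $\frac{\sum_i \left( \sum_j P_\theta(i, j) \right)^2}{\sum_i \sum_j P_\theta(i, j)}$ (11)
Run Length Non-Uniformity: $\frac{\sum_j \left( \sum_i P_\theta(i, j) \right)^2}{\sum_i \sum_j P_\theta(i, j)}$ (12)
Run Percentage: $\frac{\sum_i \sum_j P_\theta(i, j)}{A}$ (13)
where P_θ(i, j) is the run length matrix entry for the gray-level intensity i and the run length j in the direction θ, and A denotes the area of the image.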
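A minimal sketch of the run length matrix for the 0° direction and the five features listed above is given below. It assumes the standard Galloway definitions and integer gray levels; the function names and the choice to index run length j at column j - 1 are illustrative, not taken from this work.

```python
def run_length_matrix(image, levels):
    """Horizontal (0 degree) run length matrix.

    Entry [i][j - 1] counts the runs of gray level i that are exactly j pixels long.
    """
    max_run = len(image[0])
    rlm = [[0] * max_run for _ in range(levels)]
    for row in image:
        start = 0
        for q in range(1, len(row) + 1):
            # A run ends at the row boundary or when the gray level changes.
            if q == len(row) or row[q] != row[start]:
                rlm[row[start]][q - start - 1] += 1
                start = q
    return rlm


def run_length_features(rlm, area):
    """Run length features; area is the number of pixels in the image."""
    n_runs = sum(sum(row) for row in rlm)
    cells = [(j + 1, v) for row in rlm for j, v in enumerate(row)]
    per_gray = [sum(row) for row in rlm]
    per_length = [sum(col) for col in zip(*rlm)]
    return {
        "short_run_emphasis": sum(v / (j * j) for j, v in cells) / n_runs,
        "long_run_emphasis": sum(v * j * j for j, v in cells) / n_runs,
        "gray_level_non_uniformity": sum(s * s for s in per_gray) / n_runs,
        "run_length_non_uniformity": sum(s * s for s in per_length) / n_runs,
        "run_percentage": n_runs / area,
    }


if __name__ == "__main__":
    toy = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    print(run_length_features(run_length_matrix(toy, levels=4), area=16))
```

The other three directions follow the same pattern, with the scan running along columns or diagonals instead of rows.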
Normalization is performed to scale down the values of the computed features.
4 Classifiers
Classifiers use features of the input data to learn to assign the data to a particular class. For many years, classifiers have been studied and used in the medical field to improve prediction accuracy. A review of some of these studies can be found in Paliwal et al. Classifier performance depends greatly on the characteristics of the features, and no single classifier works best for all datasets. Hence, in this study, several classifiers were evaluated to determine the one that gives the highest accuracy. The following classifiers were used in this work: Back Propagation Neural Network (BPNN), Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Fuzzy classifier, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Probabilistic Neural Network (PNN). These classifiers are briefly described in this section.
Back Propagation Neural Network (BPNN): The back propagation algorithm was created by generalizing the Widrow-Hoff learning rule to multi-layer networks and nonlinear differentiable transfer functions. Input vectors and corresponding target vectors are used to train a network until it can associate input vectors with specific output target vectors. The back propagation algorithm consists of two paths: the forward path and the backward path. The forward path includes creating a feed-forward network, initializing the weights, simulating, and training the network. The network weights and biases are updated in the backward path. Feed-forward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between input and output vectors. The linear output layer allows the network to produce values outside the range -1 to +1. Before training a feed-forward network, the weights and biases must be initialized. We used random numbers around zero to initialize the weights and biases in the network. The training process requires a set of inputs and corresponding class labels as target outputs. During training, the weights and biases of the network are iteratively adjusted to minimize the network performance function. The most commonly used performance function for feed-forward networks is the mean square error between the network outputs and the target outputs. The weight update aims at maximizing the rate of error reduction, and hence, it is termed a 'gradient descent' algorithm. The weight increment is done in small steps; the step size is chosen heuristically, as there is no definite rule for its selection. In the present case, a learning constant of 0.9 (which controls the step size) was chosen empirically.
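The performance function and the gradient descent step described above can be written compactly as follows. This is the generic textbook form, given here only as a sketch; the symbols N (number of training samples), t_n (target vector), y_n (network output), w (any weight or bias) and η (learning constant) are notation introduced for this illustration, with η = 0.9 taken from the text.

```latex
E = \frac{1}{N} \sum_{n=1}^{N} \lVert \mathbf{t}_n - \mathbf{y}_n \rVert^2,
\qquad
w \leftarrow w - \eta \, \frac{\partial E}{\partial w},
\qquad
\eta = 0.9
```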
Gaussian Mixture Model (GMM): A Gaussian Mixture Model (GMM) is a parametric model used to estimate a continuous probability density function from a set of multi-dimensional feature observations. It is widely used in data mining, pattern recognition, machine learning and statistical analysis. This Gaussian mixture distribution can be described as a linear superposition of K multidimensional Gaussian components. In the case of classification, the training dataset is used to determine the GMM parameters for each class. During testing, the predicted class of a test sample is the class which has the maximum probability.
Support Vector Machine (SVM): The SVM classifier has shown excellent performance in many pattern recognition problems. The SVM is a supervised learning method which aims at determining a separating hyperplane that maximizes the margin between the input data classes, which are viewed in an n-dimensional space (n stands for the number of features used as inputs). To calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane. These two hyperplanes are computed directly from the training set, and are then used during the testing phase. Input data are often transformed to a high-dimensional feature space with the use of nonlinear kernel functions, so that the transformed data become more separable than the original input data.
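For reference, the margin maximization described above is usually stated as the optimization problem below. It is the standard hard-margin form for linearly separable data, shown as a sketch; the kernel and any soft-margin parameters used in this work are not specified in the text. The symbols w, b, x_i, y_i (class label, +1 or -1) and m (number of training samples) are notation introduced here.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2} \lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) \ge 1, \quad i = 1, \dots, m
```

In this form the two parallel hyperplanes are w · x + b = +1 and w · x + b = -1, and the margin between them is 2 / ||w||, so minimizing ||w|| maximizes the margin.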
Fuzzy classifier: In a fuzzy classification system, the pattern space is divided into multiple subspaces. In each subspace, the relationships between the input patterns and their classes are described by if-then type fuzzy rules. The advantage of this system is that a non-linear classification boundary can be easily implemented. In this work, a Fuzzy Inference System (FIS) was generated with the help of subtractive clustering technique. This clustering technique is used to estimate the number of clusters and the cluster centers in the examined dataset. A FIS is composed of inputs, outputs, and a set of rules that dictate the behavior of the fuzzy system. Each input and output has as many membership functions as the number of clusters that was chosen by the clustering technique. A radius parameter is used to specify a cluster center's range of influence in each of the data dimensions. Once the training is over, a FIS structure that contains a set of fuzzy rules to cover the feature space is generated. This is utilized to perform fuzzy inference calculations of the test data.
Linear Discriminant Analysis (LDA): LDA is used to find the linear combination of features which best separates the two classes.
Quadratic Discriminant Analysis (QDA): A QDA classifier separates the classes using a quadratic surface.
Probabilistic Neural Network (PNN): The basic idea of a PNN is that the predicted class of a sample is likely to be the same as that of other samples that have close values of the predictor features. The PNN generally has four layers. The features are fed to the input layer, which consists of one neuron for each feature. In the next hidden layer, there is one neuron for each training sample, and this neuron stores the features and the corresponding target value for that sample. When a test sample arrives, this neuron computes the Euclidean distance of the test sample from the neuron's center point and then applies a radial basis kernel function using a particular sigma value. The resulting value is passed to the neurons in the pattern layer. The pattern layer sums these values for each class of inputs to produce a vector of probabilities as its net output. Then a 'compete' transfer function in the output decision layer picks the maximum of these probabilities, and assigns a class label 1 for that class and a 0 for the other classes.
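The four layers described above map onto a short function. The sketch below is an illustration only: the Gaussian form of the radial basis kernel, the function name pnn_predict and the sigma value used in the example are assumptions, and the class sums are divided by their total so that they can be read as probabilities.

```python
import math


def pnn_predict(train_x, train_y, sample, sigma):
    """Classify one sample with a simple probabilistic neural network.

    train_x : list of feature vectors, one per training sample
    train_y : list of class labels aligned with train_x
    sigma   : spread of the radial basis kernel (assumed, tuned per dataset)
    """
    sums = {}
    for x, label in zip(train_x, train_y):
        # One hidden neuron per training sample: distance, then Gaussian kernel.
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, sample))
        activation = math.exp(-dist_sq / (2 * sigma ** 2))
        # Sum the kernel outputs separately for each class.
        sums[label] = sums.get(label, 0.0) + activation
    total = sum(sums.values()) or 1.0  # guard against all-zero activations
    probabilities = {label: s / total for label, s in sums.items()}
    # 'Compete' step: the class with the largest probability wins.
    return max(probabilities, key=probabilities.get), probabilities


if __name__ == "__main__":
    train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    train_y = ["normal", "normal", "cancerous", "cancerous"]
    print(pnn_predict(train_x, train_y, [0.05, 0.0], sigma=0.3))
```

A small sigma makes the decision depend on the nearest training samples only, while a large sigma smooths the class boundaries.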
5 Results
5.1 Features
As indicated previously, we extracted 17 texture features from the thermograms. These features were subjected to a t-test. A two-sample t-test is a parametric test used to estimate whether the mean value of a normally distributed outcome variable is significantly different between two groups of participants, i.e., two classes. A two-tailed t-test uses a null hypothesis that states that there is no difference between the two means. Once a t-value is determined, a p-value is found using a table of values from the t-distribution. The p-value is then compared with a level of significance (α-level). Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1% (0.001). If the p-value is lower than the α-level, the null hypothesis is rejected, which means that the means are not equal (in the case of a two-tailed test). Table 1 lists the Mean ± Standard Deviation (SD) values of all the features in both classes. The p-value is also listed in the table.
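The t-value computation can be sketched as below. The sketch assumes the pooled-variance (Student) form of the two-sample test, which may differ from the variant actually used, and it stops at the t-value and degrees of freedom because the p-value lookup needs a t-distribution table or a statistics library. The example call uses the first pair of mean ± SD values listed in Table 2 with 25 samples per class; treating those SDs as sample standard deviations is an assumption.

```python
import math
from statistics import mean, stdev


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance two-sample t statistic from summary values.

    Returns (t, degrees_of_freedom).
    """
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, dof


def two_sample_t(a, b):
    """Same statistic computed from two lists of raw feature values."""
    return t_from_summary(mean(a), stdev(a), len(a), mean(b), stdev(b), len(b))


if __name__ == "__main__":
    # Normal: 41.0 +/- 16.9, cancerous: 68.3 +/- 26.1, 25 thermograms each.
    t, dof = t_from_summary(41.0, 16.9, 25, 68.3, 26.1, 25)
    print(round(t, 2), dof)
```

The resulting t-value is then compared against the tabulated critical value for the chosen α-level at the returned degrees of freedom.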
It is evident from Table 1 that among all these features, only a few had significantly low p-values (<0.001). Such significance indicates that these features might have more discriminatory power over the other features in classifying cancerous and normal classes. Therefore, we selected these features (listed separately in Table 2) for use in classifiers.
Table 1 Extracted features and their range (mean ± SD) for normal and cancerous cases (p-value indicated)
Features from the Gray-Level Co-occurrence Matrix
Features from the Run Length Matrix (0°)
Short Run Emphasis
Long Run Emphasis
Gray Level Non-Uniformity
Run Length Non-Uniformity
Table 2 Selected features and their range (mean ± SD) for normal and cancerous cases (p-value indicated)
41.0 ± 16.9
68.3 ± 26.1
less than 0.0001
5.625E+04 ± 8.240E+04
2.515E+05 ± 2.531E+05
0.404 ± 2.119E-02
0.420 ± 1.958E-02
Gray Level Non-Uniformity
8.506E+03 ± 668
9.114E+03 ± 868
5.2 Classification Results
When the total number of samples is relatively small, resampling techniques such as the hold-out technique, the resubstitution method, the leave-one-out method, k-fold cross validation, and bootstrap techniques are needed to estimate classifier performance. A widely used and well-tested technique for small sample sizes is k-fold cross validation. This method divides the available samples into k approximately equally sized disjoint subsets. (k-1) subsets are used for training, and the remaining subset is used for testing to get the performance measures. The process is repeated k times and the final performance measure is the average of all the k measures. In classification problems, stratified k-fold cross validation is used. This technique is similar to cross validation, with one important addition: the subsets are built so as to preserve the original class distribution in all subsets, i.e., in this work, the proportion of non-cancerous to cancerous samples is the same in all subsets. This guarantees that both the training and test sets have approximately equal class distribution. A three-fold stratified cross validation method was used in this work.
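One simple way to build stratified folds is to shuffle the indices of each class separately and deal them out in turn. The sketch below illustrates that idea; the function name, the seed parameter and the round-robin dealing are choices made for this example and are not taken from this work.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that keep the class proportions.

    Returns a list of k index lists; fold f serves as the test set of round f,
    and the remaining folds together form the training set.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


if __name__ == "__main__":
    labels = ["normal"] * 25 + ["cancerous"] * 25
    for fold in stratified_folds(labels, k=3):
        print(len(fold), sum(1 for i in fold if labels[i] == "cancerous"))
```

With 25 samples per class and k = 3, this dealing gives test folds of 9, 8 and 8 samples per class. The text reports seven test samples per class in each fold, so the exact split used there evidently differs from this sketch.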
The performance measures used were accuracy, sensitivity, specificity, and positive predictive accuracy. Sensitivity is the probability that a test will produce a positive result when used on a diseased population. Specificity is the probability that a test will produce a negative result when used on a disease-free population. Accuracy is the ratio of the number of correctly classified samples to the total number of samples. The positive predictive accuracy (also called the positive predictive value) is the proportion of patients with positive test results who are correctly diagnosed.
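These four measures follow directly from the counts of true positives (tp), false negatives (fn), true negatives (tn) and false positives (fp). The sketch below is illustrative; the counts in the example are not given in the source and were chosen only because they are consistent with the SVM percentages reported in this section, assuming 21 test samples per class over the three folds.

```python
def performance(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and positive predictive value in percent."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "positive_predictive_value": 100 * tp / (tp + fp),
    }


if __name__ == "__main__":
    # Inferred, hypothetical counts: 18 of 21 cancerous and 19 of 21 normal correct.
    for name, value in performance(tp=18, fn=3, tn=19, fp=2).items():
        print(f"{name}: {value:.2f}%")
```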
Moreover, a Receiver Operating Characteristic (ROC) curve is obtained by calculating the sensitivity and specificity of a diagnostic test at different threshold values and plotting sensitivity vs. (1 - specificity). A test that perfectly discriminates between the two groups (normal and abnormal) would yield a curve that coincides with the left and top sides of the plot. Generally, the goodness of a diagnostic test is assessed by determining the Area under the ROC curve (AUC), which for a useful test lies between 0.5 (no better than chance) and 1 (perfect discrimination). In practice, the closer the area is to 1.0, the better the test is, and the closer the area is to 0.5, the worse the test is.
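The construction of the curve and its area can be sketched as follows. This is a minimal illustration that assumes the classifier produces a continuous score per sample, with higher scores meaning more likely cancerous; the function names and the use of the trapezoidal rule are choices made here.

```python
def roc_points(scores, labels):
    """ROC points (1 - specificity, sensitivity), one per score threshold.

    scores : classifier outputs, higher means more likely positive
    labels : 1 for the positive (cancerous) class, 0 for the negative class
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(roc_points(scores, labels)))
```

For a classifier that emits only hard class labels, the curve has a single interior point and the area reduces to (sensitivity + specificity) / 2.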
Table 3 shows the results of the classification. In each fold, 36 images (18 normal and 18 cancerous) were used for training, and 14 thermograms (seven in each class) were used for testing. It is evident from the table that the SVM classifier resulted in the highest accuracy of 88.10%. Table 4 shows the values of sensitivity, specificity, positive predictive accuracy, and the AUC. It can be seen from Table 4 that the SVM classifier combines good sensitivity (85.71%) with good specificity (90.48%), and also has good positive predictive accuracy and a high AUC of 0.8810. The ROC curves of the classifiers are presented in Figure 3.
Table 3 Percentage accuracy of the different classifiers
No. of training data
No. of testing data
Percentage of correct classification (%)
Table 4 Performance measures of the various classifiers
Positive Predictive Accuracy
Area under the curve
Figure 3 ROC curves of the classifiers
5.3 Graphical User Interface (GUI)
Figure 4 shows a snapshot of the graphical user interface developed for our proposed technique. The user can click on the Load Image button to load the thermogram image that is to be classified. The loaded image and the gray-scale versions of the cropped left and right sides are displayed. Moreover, patient information such as the name, age, and gender is displayed on the top left hand side. To obtain a diagnosis, the user can then click on the Support Vector Machine (SVM) push button, following which the features are automatically extracted from the image and fed into the SVM classifier, and the predicted class of the thermogram is displayed.
Figure 4 Graphical user interface of the proposed technique
6 Discussion
In this section, we present the results of a few related studies and discuss the results obtained in this paper. In a recent study by Schaefer et al, the features derived from the cross co-occurrence matrix were used in a fuzzy classifier to detect breast cancer in thermograms. Their proposed algorithm was able to identify the malignancies with an accuracy of 80%. Tan et al proposed a Complementary Learning Fuzzy Neural Network (CLFNN) as a Computer-Assisted Intervention (CAI) tool for breast thermogram analysis. Their experimental results show that the combination of breast thermography and CLFNN provides a low cost alternative and also aids the physician in breast cancer detection and thermogram analysis with relatively superior accuracy. The application of k-means and fuzzy c-means to color segmentation of thermal infrared breast images was studied by EtehadTavakol et al. They suggested that fuzzy c-means is preferred because the fuzzy nature of IR breast images helps it to provide more accurate results with no empty cluster. Recently, Wiecek et al used the Discrete Wavelet Transform with biorthogonal and Haar mother wavelets to extract features from thermograms, and used the features in neural networks to detect cancer. They reported an accuracy of 86.6%.
In another study, discrete temperature readings were recorded by placing 16 temperature sensors on the surface of the breast to detect normal, benign, cancer, and suspected cancer stages. The authors used five classifiers, namely BPNN, PNN, Fuzzy, GMM, and SVM, for classification. They were able to achieve more than 80% accuracy in classifying the four different classes.
Similar to the results of the above mentioned studies, we have demonstrated the utility of breast surface temperature as an indicator of malignancy. This method is suitable for young women, for whom mammography has proved to be less effective. A thermogram presents a visual representation of 'hot spots' of the breast, and hence, its interpretation may be subjective. Therefore, we extracted texture features from the 50 thermograms and used them in seven classifiers for automatic classification. Such a technique makes the interpretation more objective and automatic, and therefore, the inter-observer variability of the diagnostic prediction is greatly reduced. By using the SVM classifier and the texture features, we have demonstrated that our proposed technique has a high classification accuracy of 88.10% in differentiating normal and malignant breasts. The sensitivity and specificity were also high (85.71% and 90.48%, respectively).
In the proposed procedure, the computational requirements to pre-process the images and extract the features are modest. The classifiers are general purpose classifiers. The code and the GUI, which were written in Matlab (Website: http://www.mathworks.com/products/matlab/), can be easily converted to executable files and deployed in any doctor's office. Thus, as long as the clinic or hospital has a thermogram acquisition set-up, the cost of implementing this CAD tool is minimal. The patient comfort level is high and the technique has a high throughput. Even though the accuracy is higher than that reported in other studies, before the technique can be used in a hospital setting it is necessary to test it with a larger sample size and also to try to improve the accuracy further. To improve the accuracy, in future we intend to evaluate only segmented portions of suspicious areas on the thermograms rather than the whole cropped image. Moreover, we randomly chose seven classifiers for evaluation and comparison in this work. More classifiers have to be tested with an increased sample size in order to improve the accuracy. Out of the 25 cancerous cases studied here, 10 patients had stage II cancer and 15 had stage III cancer. To study the effectiveness of the proposed approach for early detection, we intend to use thermograms obtained from women with small, early stage I malignancies as part of our future work.
7 Conclusion
Breast cancer, being one of the most common diseases among women, warrants more attention in terms of developing modalities to detect it effectively, particularly at the early stages. In this work, we studied the infrared thermography technique, which is currently an adjunct modality. Since the obtained thermograms are images, their interpretation is subjective. Therefore, we have presented an automatic computer aided diagnosis technique for the assessment of breast cancer using thermograms. We extracted several texture features from normal and cancerous thermograms, selected the significant features among them using the t-test, and fed these selected features into seven classifiers in order to find the classifier that gives the highest accuracy. We found that the SVM classifier was the best, with an accuracy of 88.10% and a sensitivity and specificity of 85.71% and 90.48%, respectively. Such good performance measures indicate that infrared thermography, together with such a CAD technique, can be a valuable and reliable adjunct modality. The GUI that we developed for this technique can be easily installed on any computer at the doctor's office.