Computer Aided Diagnosis System For Lung Cancer Biology Essay


Abstract - Computed Tomography (CT) is considered the most sensitive imaging technique for early detection of lung cancer. However, automated or semi-automated methods are needed to make use of the large amount of data obtained from CT images and to interpret individual images more accurately. Computer Aided Diagnosis (CAD) can be used efficiently for early detection of cancer in many areas of the body, including the lungs, and it has been a significant area of research over the past two decades. Considerable progress has been made in lung cancer detection and in CAD systems because of their accuracy. Nevertheless, existing CAD systems for early detection of lung cancer from CT images have been unsatisfactory because of their low sensitivity and high false positive rates (FPR). This paper presents a CAD system that automatically detects lung cancer nodules with a reduced false positive rate. First, several image processing techniques are applied to extract the lung region from chest CT images. Segmentation is then carried out with the Fuzzy Possibilistic C Mean (FPCM) clustering algorithm. After segmentation, several diagnosis rules are applied to discard false positive candidates. Finally, a Support Vector Machine (SVM) is used for automatic detection of cancer nodules because of its simplicity and accuracy; the SVM helps to classify the cancer nodules more reliably. The proposed technique is evaluated on 1000 CT images collected from a reputed hospital.

Keywords - Computer Aided Diagnosis (CAD), Support Vector Machine (SVM), Nodule, Segmentation, False Positive Rates (FPR).


Lung cancer is among the most notable cancers because it claims more than a million lives every year. This creates a need to detect lung nodules in chest Computed Tomography (CT) images [9] at an early stage, so a Computer Aided Diagnosis (CAD) [7, 13] system is essential for early detection of lung cancer. Early detection of the disease is critical, yet only 20% of cases are detected in the first stage. Radiologists can miss up to 30% of lung nodules (which may develop into cancer) in chest radiographs because the background anatomy of the lungs can hide the nodules. CAD helps radiologists by preprocessing the images and suggesting the most likely locations of nodules. Detection of lung nodules proceeds through techniques that suppress the background structures in the lungs, including the blood vessels, ribs and bronchi. The resulting images show the chest structure more clearly, which makes it easier to identify candidate nodule regions; these can then be classified by characteristics such as size, contrast and shape. Simple rule-based classification on such features tends to produce many false positives.

The difficulties faced by a CAD system for lung nodule detection are:

Variation in nodule size

Density variation of nodules

Detection of nodule in the lung field

To overcome these problems, the author proposes a Computer Aided Diagnosis (CAD) [10] system for detection of lung nodules [8]. The lung cancer detection system is shown in figure 1. The paper first applies image processing techniques, namely Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms, to extract the lung region. The Fuzzy Possibilistic C Mean (FPCM) algorithm is then used for segmentation, and a Support Vector Machine (SVM) is used for learning and classification.

Figure 1. The Lung Cancer Detection System


Yamamoto et al., [1, 17] proposed an image processing method for computer-aided diagnosis of lung cancer by CT (LSCT). LSCT is a recently developed mobile CT scanner for mass screening of lung cancer. An important difficulty in the LSCT system is that the image information increases from 1 X-ray film to about 30 slices per person. To overcome this difficulty, the authors used image processing algorithms to significantly reduce the image information that has to be displayed to the doctor.

Yeny Yim et al., [2] described hybrid lung segmentation in chest CT images [11] for computer-aided diagnosis. The authors propose an automatic segmentation technique for accurately identifying lung surfaces in chest CT images. The technique consists of three steps. First, the lungs and airways are extracted by inverse seeded region growing and connected component labeling. Next, the trachea and large airways are delineated from the lungs by three-dimensional region growing. Finally, accurate lung region borders are obtained by subtracting the result of the second step from that of the first. The technique was applied to 10 patient datasets with lung cancer or pulmonary embolism. The experimental results indicate that the segmentation method extracts lung surfaces automatically and accurately.

Penedo et al., [3] put forth a neural-network-based approach to lung nodule detection for computer-aided diagnosis. The authors present a computer-aided diagnosis system based on a two-level artificial neural network (ANN) architecture. The system was trained, tested, and evaluated specifically on the problem of detecting lung cancer nodules in digitized chest radiographs. The first ANN detects suspicious regions in a low-resolution image. The input to the second ANN is the set of curvature peaks computed for all pixels in each suspicious region. This choice rests on the fact that small tumors possess an identifiable signature in curvature-peak feature space, where curvature is the local curvature of the image data when viewed as a relief map. The output of this network is thresholded at a selected level of significance to give a positive detection. Tests were carried out on 60 radiographs taken from a routine clinic, with 90 real nodules and 288 simulated nodules. The free-response receiver operating characteristic method, with the mean number of false positives (FPs) and the sensitivity as performance indexes, was used to evaluate all the simulation results. The combination of the two networks provides 89%-96% sensitivity and 5-7 FPs/image, depending on the size of the nodules [12].

Kanazawa et al., [4] described a computer-aided diagnosis system for lung cancer based on helical CT images. The authors describe a computer-assisted automatic diagnosis system [16] for lung cancer that detects tumor candidates at an early stage from helical computed tomography (CT) images. Automating the process reduces the time required and increases diagnostic confidence. The proposed algorithm consists of an analysis part and a diagnosis part. The analysis part extracts the lung and pulmonary blood vessel regions and analyzes the features of these regions using image processing techniques. The diagnosis part defines diagnosis rules based on these features and detects tumor candidates using these rules. The authors applied the algorithm to data from 450 patients for mass screening. The experimental results indicate that the algorithm detected lung cancer candidates successfully.

Yamamoto et al., [5] explained a computer-aided diagnosis system with functions to assist comparative reading for lung cancer based on helical CT images. The authors report that a prototype computer-aided diagnosis (CAD) system [14] for automatically detecting suspicious regions in chest CT images had previously been presented, and that the CT screening system used was a TCT-900 super helix of the Toshiba Corporation. In this paper, the authors propose a new automatic technique for early diagnosis of lung cancer based on a CAD system in which all the CT images are read. The CAD system is equipped with functions to automatically detect suspicious regions in chest CT images and to assist comparative reading in retrospect. Its main features are a slice matching algorithm for comparing each slice image of the present and past CT scans, and an interface that displays features of the suspicious regions. The experimental results show that this CAD system works effectively.

Cheran et al., [6] presented computer-aided diagnosis for lung CT using artificial life models. The paper introduces a novel computer-assisted detection method for lung cancer in CT images. The technique draws on several algorithms, including 3D region growing, active contour and shape models, and centre of maximal balls, but at its core are biological models of ants, also known as artificial life models. In the first step, the images undergo 3D region growing to identify the ribcage. Once the ribcage is recognized, an active contour is used to build a confined area for the incoming ants, which are deployed to produce a clean and accurate reconstruction of the bronchial and vascular trees. The branches of the reconstructed trees are then checked for nodules using active shape models, and the pleura of the lungs is checked for attached nodules (centre of maximal balls). The next step is to remove the trees in order to provide a cleaner basis for localizing the nodules, which is done by applying snakes and dot enhancement algorithms.

A new CAD system for early diagnosis of detected lung nodules is proposed by El-Baz et al., [18]. The growth rate is estimated by measuring the volumetric variation of the detected lung nodules over time, so it is important to measure the volume of the nodules accurately in order to quantify their growth rate. In this paper, the authors introduce a novel Computer Assisted Diagnosis (CAD) system for early diagnosis of lung cancer. The proposed CAD system involves five main steps:

Segmentation of lung tissues from computed tomography (CT) images,

Identification of lung nodules from segmented lung tissues,

A non-rigid registration technique to align two successive LDCT scans and to correct the motion artifacts caused by breathing and patient motion,

Segmentation of the detected lung nodules, and

Quantification of the volumetric changes.

The preliminary classification results, based on analysis of the growth rate of both benign and malignant nodules for 10 patients (6 diagnosed as malignant and 4 as benign), were 100% for a 95% confidence interval. The experimental results of the proposed image analysis are promising and would supplement current technologies for diagnosing lung cancer.


The first stage of the proposed technique is lung region extraction using several image processing techniques. The second stage is segmentation [15] of the extracted lung region using the Fuzzy Possibilistic C Mean (FPCM) algorithm. The diagnosis rules for rejecting false positive regions are then described. Finally, the Support Vector Machine (SVM) technique is applied to classify the cancer nodules.

Lung Region Extraction

The first stage of the proposed Computer Aided Diagnosis (CAD) [7, 13] technique is the extraction of the lung region from the CT scan image. Basic image processing techniques are used for this purpose. The methods and steps involved in extracting the lung region from a CT image are shown in figure 2. The image processing techniques applied are Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms.

A chest CT image usually contains not only the lung region but also the background, heart, liver and other organ areas. The aim of the lung region extraction process is to detect the lung region and the regions of interest (ROIs) in the CT scan image.

The first step in lung region extraction is to apply the bit-plane slicing algorithm to the CT scan image. This algorithm produces a set of binary slices. The slice with the best accuracy and sharpness is chosen for further enhancement of the lung region.
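Bit-plane slicing can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a list of rows of integers; the function names and the tiny example image are hypothetical and are not taken from the paper.

```python
def bit_plane(image, k):
    """Return the k-th bit plane (0 = least significant) of a grayscale image.

    `image` is a list of rows of integer intensities (assumed 8-bit, 0..255).
    The result is a binary image of the same shape holding 0 or 1.
    """
    return [[(pixel >> k) & 1 for pixel in row] for row in image]


def all_bit_planes(image, bits=8):
    """Return every bit plane, index 0 being the least significant."""
    return [bit_plane(image, k) for k in range(bits)]


# Tiny 2x3 example image with made-up intensities.
image = [[0, 4, 255],
         [5, 130, 3]]
planes = all_bit_planes(image)
```

Each plane is a binary image, so the later morphological steps can operate on it directly. Which plane is "best" is a judgment the paper leaves to visual quality; figure 3 uses bit-plane 2.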

Next, the erosion algorithm is applied, which enhances the sliced image by reducing noise. Dilation and median filters are then applied to the enhanced image to remove remaining distortion. The outlining algorithm is applied to the noise-reduced image to determine the outlines of the regions. The lung region border is then obtained by applying the lung border extraction technique. Finally, the flood-fill algorithm is applied to fill the obtained lung border with the lung region. After these steps, the lung region has been extracted from the CT scan image. This lung region is then used for segmentation in order to detect cancer nodules.
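The morphological and filling steps described above can be sketched as follows. This is a minimal illustration on binary images, assuming a 3x3 structuring element, zero padding at the image edge and 4-connected flood fill; none of these choices is specified in the paper, and the helper names are hypothetical.

```python
from collections import deque


def _neighbourhood(img, r, c):
    """Values in the 3x3 window around (r, c); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]


def erode(img):
    """Binary erosion: a pixel stays 1 only if its whole 3x3 window is 1."""
    return [[min(_neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    """Binary dilation: a pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[max(_neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def median_filter(img):
    """3x3 median filter; on a binary image this acts as a majority vote."""
    return [[sorted(_neighbourhood(img, r, c))[4] for c in range(len(img[0]))]
            for r in range(len(img))]


def flood_fill(img, seed, new_value):
    """4-connected flood fill starting at `seed`; returns a filled copy."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    r0, c0 = seed
    old = out[r0][c0]
    if old == new_value:
        return out
    out[r0][c0] = new_value
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < h and 0 <= cc < w and out[rr][cc] == old:
                out[rr][cc] = new_value
                queue.append((rr, cc))
    return out


# A 3x3 block of ones inside a 5x5 image, a single noise pixel, and a closed border.
block = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
noise = [[1 if (r, c) == (2, 2) else 0 for c in range(5)] for r in range(5)]
border = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 0, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
filled = flood_fill(border, (2, 2), 1)
```

In this toy example erosion shrinks the block to its centre pixel, dilation restores it, the median filter removes the isolated noise pixel, and the flood fill turns the closed border into a solid region.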

Figure 2. The proposed lung regions extraction method.









Figure 3. Lung regions extraction algorithm: a. original CT image, b. bit-plane-2, c. erosion, d. median filter, e. dilation, f. outlining, g. lung region borders, and h. extracted lung.

Figure 3 shows the application of different image processing techniques for the extraction of lung region from the CT scan image. The lung region obtained finally is shown in figure 3 (h).

Lung Regions Segmentation

After the lung region is detected, the next step is to segment the lung region in order to find the cancer nodules. This step identifies the regions of interest (ROIs), which helps in determining the cancer region. In this paper, Fuzzy Possibilistic C Mean (FPCM) is used for segmentation.

Fuzzy Possibilistic C Mean (FPCM)

FPCM is a clustering algorithm that combines the characteristics of both fuzzy and possibilistic c-means. Both memberships and typicalities are important for correctly characterizing the data substructure in a clustering problem. Thus, an objective function in FPCM that depends on both memberships and typicalities can be shown as:

With the following constraints:

A solution of the objective function can be obtained via an iterative process where the degrees of membership, typicality and the cluster centers are updated via:

FPCM produces memberships and possibilities simultaneously, along with the usual point prototypes or cluster centers for each cluster. FPCM is a hybridization of possibilistic c-means (PCM) and fuzzy c-means (FCM) that often avoids problems affecting each method alone, such as the noise sensitivity of FCM and the coincident clusters that PCM can produce.
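For reference, the standard FPCM formulation of Pal, Pal and Bezdek, which is assumed to be the one meant here, can be written as follows. The notation is an assumption: $x_k$ are the $n$ data points, $v_i$ the $c$ cluster centers, $u_{ik}$ the memberships, $t_{ik}$ the typicalities, $d_{ik} = \lVert x_k - v_i \rVert$, and $m > 1$, $\eta > 1$ the weighting exponents.

```latex
% Objective function
J_{m,\eta}(U, T, V; X) = \sum_{i=1}^{c} \sum_{k=1}^{n}
    \left( u_{ik}^{m} + t_{ik}^{\eta} \right) \lVert x_k - v_i \rVert^{2}

% Constraints
\sum_{i=1}^{c} u_{ik} = 1 \quad \forall k, \qquad
\sum_{k=1}^{n} t_{ik} = 1 \quad \forall i, \qquad
0 \le u_{ik},\, t_{ik} \le 1

% Iterative updates
u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1},
\qquad
t_{ik} = \left[ \sum_{j=1}^{n} \left( \frac{d_{ik}}{d_{ij}} \right)^{2/(\eta-1)} \right]^{-1},
\qquad
v_i = \frac{\sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) x_k}
           {\sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right)}
```

Memberships are normalised across clusters for each point, whereas typicalities are normalised across points for each cluster.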

After the lung region has been segmented, feature extraction and cancer diagnosis can be performed on the segmented image.
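A minimal sketch of FPCM clustering on one-dimensional pixel intensities is given below. It assumes the standard alternating updates of memberships, typicalities and centers with exponents m = eta = 2, a fixed number of iterations and a small floor on squared distances; the function name, parameters and sample intensities are hypothetical and do not come from the paper.

```python
def fpcm(points, centers, m=2.0, eta=2.0, iterations=50, eps=1e-9):
    """Cluster 1-D `points` with FPCM, starting from initial `centers`.

    Returns (centers, U, T), where U[i][k] is the membership and T[i][k] the
    typicality of point k in cluster i, both taken from the last iteration.
    """
    c, n = len(centers), len(points)
    centers = list(centers)
    for _ in range(iterations):
        # Squared distances, floored at eps to avoid division by zero.
        d2 = [[max((x - v) ** 2, eps) for x in points] for v in centers]
        # Memberships: normalised over clusters for each point.
        U = [[1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1)) for j in range(c))
              for k in range(n)] for i in range(c)]
        # Typicalities: normalised over points for each cluster.
        T = [[1.0 / sum((d2[i][k] / d2[i][j]) ** (1.0 / (eta - 1)) for j in range(n))
              for k in range(n)] for i in range(c)]
        # Centers: weighted mean with weights u^m + t^eta.
        weights = [[U[i][k] ** m + T[i][k] ** eta for k in range(n)] for i in range(c)]
        centers = [sum(w * x for w, x in zip(weights[i], points)) / sum(weights[i])
                   for i in range(c)]
    return centers, U, T


# Made-up intensities: three dark pixels and three bright pixels.
points = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
centers, U, T = fpcm(points, [0.0, 255.0])
# Assign each pixel to the cluster with the highest membership.
labels = [max(range(len(centers)), key=lambda i: U[i][k]) for k in range(len(points))]
```

For image segmentation the same routine would be run on the intensities of all pixels in the extracted lung region, with each pixel labelled by its highest membership.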

Features Extraction and Formulation of Diagnostic Rules

After the lung region has been segmented, features can be extracted from it and diagnosis rules can be designed to detect the cancer nodules in the lungs accurately. These diagnosis rules eliminate false detections of cancer nodules produced by segmentation and provide a better diagnosis.

Feature Extraction

The features that are used in this paper in order to generate diagnosis rules are:

Area of the candidate region

The maximum drawable circle (MDC) inside the candidate region

Mean intensity value of the candidate region

Area of the candidate region

This feature is used in order to:

Eliminate isolated pixels.

Eliminate very small candidate object.

With this feature, detected regions that have no chance of forming a cancer nodule are identified and eliminated. This reduces both the amount of processing and the time taken by the subsequent steps.
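The area feature can be computed by labelling connected regions in the segmented binary mask and counting their pixels. The sketch below assumes 8-connectivity and a hypothetical minimum area; the paper does not state either choice.

```python
from collections import deque


def label_regions(mask):
    """Return the connected regions of a binary mask (8-connectivity).

    Each region is a list of (row, col) pixels; its area is the list length.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for yy in range(y - 1, y + 2):
                        for xx in range(x - 1, x + 2):
                            if (0 <= yy < h and 0 <= xx < w
                                    and mask[yy][xx] and not seen[yy][xx]):
                                seen[yy][xx] = True
                                queue.append((yy, xx))
                regions.append(pixels)
    return regions


def drop_small_regions(regions, min_area):
    """Keep only the regions whose area is at least `min_area` pixels."""
    return [pixels for pixels in regions if len(pixels) >= min_area]


# One isolated pixel and one 2x2 candidate region.
mask = [[1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0]]
regions = label_regions(mask)
kept = drop_small_regions(regions, min_area=2)
```

Here the isolated pixel is discarded and only the 2x2 region survives as a candidate.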

The maximum drawable circle (MDC)

This feature characterizes each candidate region by its maximum drawable circle (MDC). Every pixel inside the candidate region is considered in turn as the center point of a circle, and only circles that lie entirely within the region are taken into account. The radius of the circle starts at one pixel and is incremented by one pixel at a time until no circle of that radius can be drawn. The maximum drawable circle helps the diagnostic procedure to remove more false positive cancerous candidates.
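The MDC search can be sketched as below, assuming a region is given as a set of (row, col) pixels and a "circle" of radius r is the discrete disc of pixels within Euclidean distance r of the center. Starting each center at the best radius found so far plus one is a shortcut that gives the same result as restarting from one pixel, because a disc that fits always contains the smaller discs. The function name is hypothetical.

```python
def max_drawable_circle(pixels):
    """Radius in pixels of the largest disc that fits inside the region.

    `pixels` is an iterable of (row, col) positions. Returns 0 when not even
    a radius-1 disc fits anywhere in the region.
    """
    region = set(pixels)

    def disc_fits(cy, cx, radius):
        # Every pixel within `radius` of the center must belong to the region.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dy * dy + dx * dx <= radius * radius and (cy + dy, cx + dx) not in region:
                    return False
        return True

    best = 0
    for cy, cx in region:
        radius = best + 1  # only radii that would improve on the current best
        while disc_fits(cy, cx, radius):
            best = radius
            radius += 1
    return best


# A compact 5x5 blob and a thin, vessel-like 1x7 line.
square = [(r, c) for r in range(5) for c in range(5)]
line = [(0, c) for c in range(7)]
```

The compact blob admits a disc of radius 2, while the thin line admits none, which is why a threshold on this feature tends to reject vessel-like shapes.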

Mean intensity value of the candidate region

For this feature, the mean intensity value of each candidate region is calculated, which helps in rejecting further regions that do not indicate a cancer nodule. The mean intensity value is the average intensity of all the pixels that belong to the same region and is calculated using the formula:

$$\text{MeanIntensity}(j) = \frac{1}{n} \sum_{i=1}^{n} \text{Intensity}(i)$$

where j is the region index and ranges from 1 to the total number of candidate regions in the whole image. Intensity(i) is the CT intensity value of pixel i, and i ranges from 1 to n, where n is the total number of pixels belonging to region j.
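As a small worked example, the mean intensity of a region is the sum of its pixel intensities divided by the number of pixels. The sketch assumes the image is a list of rows and the region a list of (row, col) positions; the values are made up.

```python
def mean_intensity(image, pixels):
    """Average CT intensity over the pixels that belong to one region."""
    values = [image[r][c] for r, c in pixels]
    return sum(values) / len(values)


image = [[10, 20],
         [30, 40]]
region = [(0, 0), (1, 1)]              # pixels with intensities 10 and 40
region_mean = mean_intensity(image, region)
```

For this region the result is (10 + 40) / 2 = 25.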

Formulation of Diagnostic Rules

After the necessary features are extracted, the following three diagnosis rules are applied to detect the occurrence of cancer nodules:

Rule 1: A threshold value T1 is set for the area of a region. If the area of a candidate region is below the threshold, the region is eliminated from further consideration, which removes isolated pixels and very small objects. This rule reduces the work and time required by the subsequent steps.

Rule 2: This rule considers the maximum drawable circle (MDC). A threshold T2 is defined for the radius of the maximum drawable circle. If the radius of the drawable circle for a candidate region is less than the threshold T2, the region is considered a non-cancerous nodule and is eliminated from further consideration. Applying this rule rejects a large number of vessels, which in general have a thin oblong or line shape.

Rule 3: Two values, T3 and T4, are set as the minimum and maximum thresholds for the mean intensity value of a candidate region. The mean intensity values of the candidate regions are then calculated. If the mean intensity value of a candidate region falls below the minimum threshold or exceeds the maximum threshold, the region is assumed to be non-cancerous.

By applying all of the above rules, most of the regions that are not cancerous nodules are eliminated. The remaining candidate regions are considered cancerous regions. In this way the CAD system discards false positive regions and detects cancer regions more accurately. These rules are then passed to the Support Vector Machine (SVM) in order to detect cancer nodules in the supplied lung image.
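The three rules can be combined into a single filter, sketched below. The threshold values and candidate data are hypothetical, and the sketch assumes that Rule 1 rejects regions whose area falls below T1, which matches the stated purpose of the area feature (removing isolated pixels and very small objects).

```python
def passes_diagnostic_rules(area, mdc_radius, mean_intensity, t1, t2, t3, t4):
    """Return True if a candidate region survives all three diagnosis rules."""
    if area < t1:                                   # Rule 1: region too small (assumed direction)
        return False
    if mdc_radius < t2:                             # Rule 2: thin, vessel-like shape
        return False
    if mean_intensity < t3 or mean_intensity > t4:  # Rule 3: intensity outside [T3, T4]
        return False
    return True


# Hypothetical thresholds and candidate regions, for illustration only.
T1, T2, T3, T4 = 10, 3, 100.0, 200.0
candidates = [
    {"name": "isolated pixels", "area": 2, "mdc": 0, "mean": 120.0},
    {"name": "vessel-like", "area": 40, "mdc": 1, "mean": 130.0},
    {"name": "too bright", "area": 60, "mdc": 4, "mean": 240.0},
    {"name": "nodule-like", "area": 55, "mdc": 4, "mean": 140.0},
]
kept = [c["name"] for c in candidates
        if passes_diagnostic_rules(c["area"], c["mdc"], c["mean"], T1, T2, T3, T4)]
```

Only the last candidate passes all three rules; each of the others is rejected by exactly one rule.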

Support Vector Machine (SVM)

SVM, introduced by Cortes and Vapnik, is widely used for classification tasks. For binary classification, SVM finds an optimal separating hyperplane (OSH) that yields the maximum margin between two categories of data. To construct an OSH, SVM maps the data into a higher-dimensional feature space, performing this nonlinear mapping by means of a kernel function. SVM then constructs a linear OSH between the two categories of data in the higher-dimensional feature space. The data vectors nearest to the OSH in that space are called support vectors (SVs) and contain all the information required for classification. In brief, the theory of SVM is as follows.

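Consider a training set $\{(x_i, y_i)\}_{i=1}^{L}$ with each input $x_i \in \mathbb{R}^n$ and an associated output $y_i \in \{-1, +1\}$. Each input $x$ is first mapped into a higher-dimensional feature space $F$ by $z = \phi(x)$ via a nonlinear mapping $\phi: \mathbb{R}^n \to F$. When the data are not linearly separable in $F$, there exist a vector $w \in F$ and a scalar $b$ that define the separating hyperplane as:

$$y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, L$$

where $\xi_i\ (\ge 0)$ are called slack variables. The hyperplane that optimally separates the data in $F$ is the one that

$$\text{minimizes} \quad \frac{1}{2}\, w \cdot w + C \sum_{i=1}^{L} \xi_i$$

where $C$ is called the regularization parameter and determines the tradeoff between maximum margin and minimum classification error. By constructing a Lagrangian, the optimal hyperplane according to the previous equation may be shown to be the solution of

$$\text{maximize} \quad W(\alpha) = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to} \quad \sum_{i=1}^{L} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, L$$

where $\alpha_1, \ldots, \alpha_L$ are the nonnegative Lagrangian multipliers and $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is the kernel function. The data points $x_i$ that correspond to $\alpha_i > 0$ are SVs. The weight vector $w$ is then given by

$$w = \sum_{i=1}^{L} \alpha_i y_i z_i$$

For any test vector $x \in \mathbb{R}^n$, the classification output is then given by

$$y = \operatorname{sign}\left( \sum_{i=1}^{L} \alpha_i y_i K(x_i, x) + b \right)$$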

To build an SVM classifier, a kernel function and its parameters need to be chosen. So far, no analytical or empirical studies have established the superiority of one kernel over another conclusively. In this study, the following three kernel functions have been applied to build SVM classifiers:

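1) Linear kernel function: $K(x, z) = \langle x, z \rangle$;

2) Polynomial kernel function: $K(x, z) = (\langle x, z \rangle + 1)^d$, where $d$ is the degree of the polynomial;

3) Radial basis function: $K(x, z) = \exp\left(-\lVert x - z \rVert^2 / (2\sigma^2)\right)$, where $\sigma$ is the width of the function.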
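The three kernels, together with the usual SVM decision rule (the sign of the kernel-weighted sum over the support vectors plus the bias), can be sketched as follows. The Gaussian form of the radial basis function and all numeric values are assumptions made for illustration; the support vectors and multipliers would normally come from training.

```python
import math


def linear_kernel(x, z):
    """Inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, d=2):
    """(<x, z> + 1)^d, where d is the polynomial degree."""
    return (linear_kernel(x, z) + 1) ** d


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian radial basis function of width sigma (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """Classify x as +1 or -1 from the support vectors, multipliers and bias."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Hypothetical trained model with two support vectors.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
bias = 0.0
prediction = svm_decision([2.0, 2.0], support_vectors, labels, alphas, bias, linear_kernel)
```

Swapping `linear_kernel` for `polynomial_kernel` or `rbf_kernel` changes the shape of the decision boundary without changing the decision rule itself.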

SVM kernel functions

As a first step, the classification ability of the feature combinations is assessed with each SVM kernel function. Three main kernel functions are used in this study. A local kernel function mainly influences data near the test points. The kernel functions are briefly explained in this section. The most widely used kernel function for SVM is the Radial Basis Function (RBF).

B-Spline (Radial Basis Function) Kernel: The B-Spline kernel is defined on the interval [−1, 1]. It is given by the recursive formula:

In the work by Bart Hamers it is given by:

 Alternatively, Bn can be computed using the explicit expression:

Where x+ is defined as the truncated power function:

Linear Kernel: The Linear kernel is the simplest kernel function. It is given by the inner product <x,y> plus an optional constant c, that is, $k(x, y) = x^T y + c$. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts.

Polynomial Kernel: The Polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited to problems where all the training data is normalized. The kernel is given by $k(x, y) = (\alpha\, x^T y + c)^d$.

The adjustable parameters are the slope alpha, the constant term c and the polynomial degree d.

Once the learning process has been completed with the supplied conditions, the proposed technique is able to detect the occurrence of cancer in the lung region automatically.


The experiments are conducted on the proposed computer-aided diagnosis system using lung images obtained from a reputed hospital. The experimental data consist of 1000 lung images, which are passed to the proposed CAD system. The diagnosis rules are generated from these images and passed to the Support Vector Machine (SVM) for the learning process. After learning, a new lung image is passed to the CAD system, which runs it through the processing steps and finally reports whether or not the supplied lung image shows cancer.
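The learning step can be illustrated with a small stand-in. The paper does not say which SVM solver was used, so the sketch below swaps in a linear soft-margin SVM trained by plain batch subgradient descent on the regularised hinge loss. The feature vectors (normalised area, MDC radius and mean intensity) and all hyperparameters are hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Train a linear SVM by batch subgradient descent on the hinge loss.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).
    Returns the weight vector w and the bias b.
    """
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                # sample violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (nodule) or -1 (non-nodule) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical normalised features: [area, MDC radius, mean intensity].
samples = [[1.0, 1.2, 0.8], [1.5, 0.9, 1.1], [0.8, 1.1, 1.0],
           [-1.0, -1.2, -0.8], [-1.5, -0.9, -1.1], [-0.8, -1.1, -1.0]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

A production system would more likely rely on an established SVM library with kernel support; the point here is only to show how labelled feature vectors turn into a decision function.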

Table I. Results of applying the proposed CAD system to the dataset.

Columns: Lung Image; No. of Slices; No. of Cancerous Nodules; True Positive

Entries: 3 (<2 mm size); 4 (<2 mm size)

Table I shows the results obtained by applying the proposed CAD system to the images obtained from the hospital. The dataset yields 2441 slices, from which the best-suited slice is chosen for further processing. The dataset contains 15 cancerous nodules, 7 of which are smaller than 2 mm. The proposed technique detects 9 cancer nodules correctly and reports 117 false positive regions. This is better detection than that of the conventional CAD system.
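The reported counts can be turned into standard detection metrics with a short calculation. Sensitivity follows directly from the figures above; normalising the false positives by the number of slices is an assumption, since the text does not say whether they should be counted per slice or per image.

```python
def detection_metrics(true_positives, total_nodules, false_positives, slices):
    """Return (sensitivity, false positives per slice) from raw counts."""
    sensitivity = true_positives / total_nodules
    fp_per_slice = false_positives / slices
    return sensitivity, fp_per_slice


# Counts reported in the text: 9 of 15 nodules detected, 117 false positives, 2441 slices.
sensitivity, fp_per_slice = detection_metrics(9, 15, 117, 2441)
```

With these counts the sensitivity is 9/15 = 60%, and the false positive rate is roughly 0.048 per slice under the per-slice assumption.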


This paper presents an improved Computer Aided Diagnosis (CAD) system for automatic detection of lung cancer. The first step is lung region detection, performed by applying basic image processing techniques such as Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms to the CT scan images. After the lung region is detected, segmentation is carried out with the Fuzzy Possibilistic C Mean (FPCM) clustering algorithm. Features are then extracted and the diagnosis rules are generated. These rules are used for learning with a Support Vector Machine (SVM). The experiments are performed with 1000 images obtained from a reputed hospital. The experimental results show that the proposed CAD system is able to identify false positive nodules correctly. In addition, the use of the Support Vector Machine increases the accuracy of classifying the cancer nodules.