Lung Cancer Detection Using Extreme Learning Machines


The best way to reduce death rates due to lung cancer is to treat the disease at an early stage. Early diagnosis of lung cancer requires an effective procedure that allows physicians to differentiate benign tumors from malignant ones. Computer-Aided Diagnosis (CAD) systems can be used to assist with this task. CAD is a non-trivial problem, and present approaches struggle to increase both sensitivity to tumoral growths and specificity in identifying their nature.

CAD is an approach designed to reduce observational oversights and the false negative rates of physicians interpreting medical images. Clinical studies indicate that cancer detection with CAD assistance will see increasing use. Computer programs that support radiologists in detecting possible abnormalities on diagnostic radiology exams are already widely used in clinical practice. The most common application is computer aided (or assisted) detection, commonly referred to as CAD. CAD is a pattern recognition technique that recognizes malignant features in an image and reports them to the radiologist in order to minimize false negative readings. CAD is presently FDA and CE approved for use with both film and digital mammography, for both screening and diagnostic exams; for chest CT; and for chest radiographs. The main aim of CAD is to enhance the detection of disease by minimizing the false negative rate due to observational oversights. CAD places no additional demands on the radiologist; the central idea is to improve the quality of disease detection. CAD approaches are developed to search for the same features that a radiologist looks for during case review. Thus, for breast cancer, CAD algorithms look for microcalcifications and masses on mammograms; on chest radiographs and CT scans, present CAD approaches look for pulmonary densities with particular physical features.

CAD systems are developed mainly to support the radiologist, not to replace the radiologist. For instance, a CAD system could scan a mammogram and draw red circles around suspicious areas. A radiologist can then examine these areas and determine their true nature.

A number of CAD schemes have been investigated in the literature. These include:

Subtraction approaches, which detect abnormalities by comparison with normal tissue

Topographic approaches, which perform feature extraction and analysis to detect abnormalities

Filtering approaches, which use digital signal processing filters to accentuate abnormalities for easier detection

Staged expert systems, which perform rule-based analysis of image data in an attempt to provide a correct diagnosis

Most CAD approaches follow subtraction techniques [157], in which abnormalities are detected by searching for image differences relative to known normal tissue. In topographic techniques, anomalies are detected through the identification and extraction of image features that correlate with pathological anomalies, as in texture analysis [158]. Most approaches involve the following stages:

examining the image data

extracting pre-determined features

localizing regions of interest (ROIs) that can be examined for potential abnormalities

Several of these approaches achieve high sensitivity, but many suffer from high false-positive rates and hence low specificity. The problem of false positives is aggravated by the fact that false positive rates are reported per image, not per case. As many radiological examinations include more than one image, the actual number of false positives may be a multiple of those reported.

A number of different approaches have been employed to reduce false positive rates, many of them based on Artificial Neural Networks (ANNs) and other machine learning approaches. The Receiver Operating Characteristic (ROC) curve is a general metric for evaluating the performance of CAD systems and is commonly used to evaluate a CAD approach's trade-off between sensitivity and specificity.

CAD depends fundamentally on highly complex pattern recognition. X-ray images are scanned for suspicious structures, and generally a few thousand images are needed to optimize the algorithm. Digital image data are copied to a CAD server in DICOM format and are prepared and analyzed in several steps.

Pattern recognition is generally defined as taking in raw data and performing an action based on the category of the pattern. Most research in pattern recognition concerns methods for supervised and unsupervised learning. The main purpose of pattern recognition is to categorize data (patterns) based either on a priori knowledge or on statistical information obtained from the patterns. The patterns to be categorized are normally groups of measurements or observations, defining points in a suitable multidimensional space. A complete pattern recognition system consists of a sensor that gathers the observations to be classified or described, a feature extraction mechanism that computes numeric or symbolic information from the observations, and a classification or description scheme that does the actual job of classifying or describing observations based on the extracted features. The classification scheme is generally based on a set of patterns that have already been classified.

In order to overcome these problems, this chapter introduces a proposed Computer Aided Diagnosis (CAD) system for the detection of lung nodules using the Extreme Learning Machine. The lung cancer detection system is shown in figure 6.1. The proposed approach first applies several image processing techniques, namely Bit-Plane Slicing, Erosion, Median Filtering, Dilation, Outlining, Lung Border Extraction, and Flood-Fill algorithms, to extract the lung region. The Modified Fuzzy Possibilistic C-Means algorithm [155] is then used for segmentation, and the Extreme Learning Machine is used for learning and classification.

Lung Regions Extraction

Segmentation of lung region using MFPCM

Analysis of segmented lung region

Formation of diagnosis rules

Classification of occurrence and non occurrence of cancer in the lung using ELM

Figure 6.1: The Lung Cancer Detection System


Machine learning is a scientific discipline concerned with the design and development of techniques that allow computers to learn from empirical data. Previous experience helps the learner capture the unknown underlying probability distribution of the data. Data are seen as examples that illustrate relations between observed variables. A main aim of machine learning research is to enable computers to automatically recognize complex patterns and make intelligent decisions based on data. This is difficult because the set of all possible inputs is too large to be covered by the training data.

The main aim of a learner is to generalize from its past experience. The training data come from an unknown probability distribution, and the learner has to extract something more general about that distribution, something that yields useful responses for future cases. Figure 6.2 illustrates the general machine learning approach.


Figure 6.2: Machine Learning Approach

Importance of Machine Learning

There are several reasons why machine learning remains an important technique. The important engineering reasons are:

Certain tasks can be defined only by example: we can specify input/output pairs, but not a concise relationship between inputs and desired outputs. Machines are expected to adjust their internal structure to produce correct outputs for a large number of sample inputs, thereby suitably constraining their input/output function to approximate the relationship implicit in the examples.

Machine learning techniques are often used in the extraction of the relationships and correlations among data (data mining).

Machines designed by humans often do not work as well as desired in the environments in which they are used. Moreover, certain characteristics of the working environment are not entirely known at design time. Machine learning approaches can be used for on-the-job improvement of existing machine designs.

Some tasks involve amounts of knowledge that are too large for humans to encode explicitly. Machines, however, can learn this knowledge and perform better.

Environments change over time. Machines that can adapt to a changing environment are valuable because adaptation reduces the need for constant redesign.

Humans constantly discover new knowledge about tasks, and vocabularies change continuously. Redesigning Artificial Intelligence systems by hand to reflect each piece of new knowledge is impractical, but machine learning approaches can track such changes and incorporate them easily.

Types of machine learning algorithms

Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm.

Supervised learning: It generates a function that maps inputs to desired outputs.

Unsupervised learning: It models a set of inputs, like clustering.

Semi-supervised learning: This type combines both labeled and unlabeled samples to produce a suitable classifier.

Reinforcement learning: It learns how to act given an observation of the world.

Transduction: It predicts novel outputs depending on training inputs, training outputs, and test inputs.

Learning to learn: This approach learns its own inductive bias depending on previous experience.

Extreme Learning Machines

Extreme Learning Machines are highly capable of solving data regression and classification problems. ELM can overcome certain challenging constraints on the use of feed-forward neural networks and other computational intelligence approaches. As ELM techniques have grown and improved, they have come to integrate the advantages of both neural networks and support vector machines: faster learning speed, less human intervention, and robustness. An example of ELM is depicted in Figure 6.3.


Figure 6.3: An Example of ELM

ELM's parameters can be analytically determined rather than being tuned. The algorithm provides good generalization performance at very fast learning speed. From a function approximation point of view, ELM is very different from traditional methods: it shows that the hidden node parameters can be chosen completely independently of the training data.

In conventional learning theory, the hidden node parameters cannot be created without seeing the training data.

In ELM, the hidden node parameters can be generated before seeing the training data.

Salient features of ELM

Compared to popular Back propagation (BP) Algorithm and Support Vector Machine (SVM), ELM has several salient features:

Ease of use: Apart from the predefined network architecture, no parameters need to be manually tuned, so users need not spend much time tuning and training learning machines.

Faster learning speed: Most training completes within milliseconds to minutes. Conventional methods cannot provide such fast learning speed.

Higher generalization performance: The generalization performance of ELM is better than SVM and back propagation in most cases.

Applicable to all nonlinear activation functions: Discontinuous, differentiable, and non-differentiable functions can all be used as activation functions in ELM.

Applicable for fully complex activation functions: Complex functions can also be used as activation functions in ELM.

In conventional feed-forward networks, all parameters must be tuned, which creates dependencies between the parameters of different layers. Gradient descent-based methods have been used in various learning algorithms for feed-forward neural networks, but these approaches are usually very slow due to improper learning steps, or may easily converge to local minima. Many iterative learning steps are required by such algorithms to achieve significant learning performance.


The initial stage of the proposed technique is lung region extraction using several image processing techniques. The second stage is segmentation [156] of the extracted lung region using the Modified Fuzzy Possibilistic C-Means (MFPCM) algorithm. The diagnosis rules for rejecting false positive regions are then elaborated. Finally, the Extreme Learning Machine (ELM) technique is applied to classify the cancer nodules.

Five phases included in the proposed computer aided diagnosis system for lung cancer detection are as follows:

• Extraction of lung region from chest computer tomography images

• Segmentation of lung region using Modified Fuzzy Possibilistic C-Mean

• Feature extraction from the segmented region

• Formation of diagnosis rules from the extracted features

• Classification of occurrence and non occurrence of cancer in the lung

Phase 1: Extraction of Lung Region from Chest Computer Tomography Images

The first phase of the proposed Computer Aided Diagnosing system is the extraction of lung region from the chest computer tomography scan image. This phase uses the basic image processing methods. The procedure for performing this phase using the image processing methods is provided in figure 6.4. The image processing methods used for this phase are Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms.

Usually, a chest CT image contains not only the lung region but also background, heart, liver, and other organ regions. The main aim of the lung region extraction process is to detect the lung region and regions of interest (ROIs) in the CT scan image.

The first step in lung region extraction is the application of a bit-plane slicing algorithm to the CT scan image. This algorithm produces several binary slices; the slice with the best accuracy and sharpness is chosen for further enhancement of the lung region.

Next, an erosion algorithm is applied to enhance the sliced image by reducing noise. Dilation and median filters are then applied to the enhanced image to remove further distortion. An outlining algorithm is then applied to determine the outlines of the regions in the noise-reduced image. The lung region border is then obtained by applying the lung border extraction technique. Finally, a flood-fill algorithm is applied to fill the obtained lung border, yielding the lung region. After applying these algorithms, the lung region is extracted from the CT scan image and used for segmentation in order to detect cancer nodules.
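A hedged sketch of this extraction pipeline using NumPy and SciPy follows. The bit plane index, the 3×3 structuring elements, and the toy image are illustrative assumptions, not the thesis implementation:

```python
import numpy as np
from scipy import ndimage

def extract_lung_region(ct_slice, bit_plane=4):
    """Return (lung_mask, border) extracted from an 8-bit CT slice."""
    # 1. Bit-plane slicing: keep a single binary plane of the 8-bit image.
    plane = (ct_slice >> bit_plane) & 1
    # 2. Erosion removes small noisy specks from the chosen plane.
    eroded = ndimage.binary_erosion(plane, structure=np.ones((3, 3)))
    # 3. Median filtering and dilation smooth the mask and restore shape.
    smoothed = ndimage.median_filter(eroded.astype(np.uint8), size=3)
    dilated = ndimage.binary_dilation(smoothed, structure=np.ones((3, 3)))
    # 4. Outlining: the border is the mask minus its own erosion.
    border = dilated & ~ndimage.binary_erosion(dilated)
    # 5. Flood fill: fill inside the closed border to get the solid region.
    filled = ndimage.binary_fill_holes(dilated)
    return filled, border

# Toy 8-bit "CT slice": bright background with a darker lung-like region.
img = np.full((32, 32), 200, dtype=np.uint8)
img[8:24, 8:24] = 30
mask, border = extract_lung_region(img)
```

Each step maps directly onto one stage of figure 6.4; on a real CT slice the bit plane and structuring element sizes would need tuning.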

[Flow diagram: Original Image → Bit-Plane Slicing → Median Filter → Lung Border Extraction → Flood Fill Algorithm → Extracted Lung]
Figure 6.4: The proposed lung regions extraction method.

Figure 6.5 shows the application of the different image processing techniques for extracting the lung region from the CT scan image. The final extracted lung region is shown in figure 6.5 (h).

Figure 6.5: Lung Regions Extraction Algorithm: a. Original CT Image, b. Bit-Plane Slicing, c. Erosion, d. Median Filter, e. Dilation, f. Outlining, g. Lung Region Borders, and h. Extracted Lung.

Phase 2: Segmentation of Lung Region Using Modified Fuzzy Possibilistic C-Mean

The second phase of the proposed CAD system is the segmentation of the lung region. Segmentation is performed to locate cancer nodules in the lung; this phase identifies the Regions of Interest (ROIs) that help determine the cancer region. Modified Fuzzy Possibilistic C-Means (MFPCM) is used in the proposed technique for segmentation rather than Fuzzy Possibilistic C-Means because of its better accuracy.

The FPCM algorithm merges the advantages of both the fuzzy and possibilistic c-means techniques. Memberships and typicalities are both essential for an accurate characterization of the data substructure in clustering. Thus, the FPCM objective function, based on memberships and typicalities, can be given as:

J(U, T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( \mu_{ij}^{m} + t_{ij}^{\eta} \right) d^{2}(x_{j}, v_{i})

with the following constraints:

\sum_{i=1}^{c} \mu_{ij} = 1 \;\; \forall j, \qquad \sum_{j=1}^{n} t_{ij} = 1 \;\; \forall i

The objective function is minimized through an iterative method in which the degrees of membership, the typicalities, and the cluster centers are given by:

\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{d(x_{j}, v_{i})}{d(x_{j}, v_{k})} \right)^{2/(m-1)} \right]^{-1}

t_{ij} = \left[ \sum_{k=1}^{n} \left( \frac{d(x_{j}, v_{i})}{d(x_{k}, v_{i})} \right)^{2/(\eta-1)} \right]^{-1}

v_{i} = \frac{\sum_{j=1}^{n} \left( \mu_{ij}^{m} + t_{ij}^{\eta} \right) x_{j}}{\sum_{j=1}^{n} \left( \mu_{ij}^{m} + t_{ij}^{\eta} \right)}
FPCM constructs memberships and possibilities concurrently, together with the normal point prototypes or cluster centers for every cluster.
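The FPCM iteration can be sketched compactly in NumPy. The fuzzifier m, the typicality exponent η, the iteration count, and the random center initialization below are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def fpcm(X, c, m=2.0, eta=2.0, n_iter=50, seed=0):
    """One NumPy realization of the FPCM iteration:
    memberships U, typicalities T, cluster centers V."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=c, replace=False)]        # initial centers
    for _ in range(n_iter):
        # Squared distances d2[i, j] between center i and point j.
        d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(-1) + 1e-12
        a = d2 ** (1.0 / (m - 1.0))
        # Membership: each point's column over the c clusters sums to 1.
        U = 1.0 / (a * (1.0 / a).sum(axis=0))
        b = d2 ** (1.0 / (eta - 1.0))
        # Typicality: each cluster's row over the n points sums to 1.
        T = 1.0 / (b * (1.0 / b).sum(axis=1, keepdims=True))
        # Centers are weighted by memberships and typicalities together.
        W = U ** m + T ** eta
        V = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, T, V

# Two small, well-separated point clouds.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
U, T, V = fpcm(X, c=2)
```

The two normalizations encode the FPCM constraints directly: membership columns sum to 1 over the clusters, typicality rows sum to 1 over the points.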

The choice of objective function is central to the performance of the clustering method and to achieving better clustering, so clustering performance depends on the objective function used. To produce a suitable objective function, the following requirements are considered:

The distance between clusters and the data points allocated to them must be minimized

The distance between clusters must be maximized

The objective function models the affinity between data and clusters. Wen-Liang Hung proposed a Modified Suppressed Fuzzy C-Means technique that considerably improves on FPCM through prototype-driven learning of a parameter α. The learning of α is based on an exponential separation strength between clusters and is updated at every iteration:

\alpha = \exp\left( -\min_{i \neq k} \frac{\| v_{i} - v_{k} \|^{2}}{\beta} \right)

where β is a normalizing term, taken as the sample variance:

\beta = \frac{\sum_{j=1}^{n} \| x_{j} - \bar{x} \|^{2}}{n}, \qquad \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_{j}
However, a single common value of this parameter is used for every data point at each iteration, which can introduce error. A weight parameter is therefore introduced to replace this common value of α: each point of the data set carries a weight in association with each cluster. The use of weights gives good classification, particularly in the case of noisy data. The weight is defined as follows:


where w_ji indicates the weight of point j with respect to class i. This weight is used to modify the fuzzy and typicality distances. The FPCM technique is iterative in nature, since the objective function cannot be minimized directly. To categorize a data point, the cluster centroid has to be closest to the data point, which is captured by the membership; and in determining the centroids, the typicality is used to reduce the undesirable effect of outliers. The objective function contains two expressions:

the fuzzy function, with the fuzziness weighting exponent

the possibilistic function, with the typicality weighting exponent

In the standard objective function, the two coefficients are used only as exponents of the membership and the typicality. A new, slightly unusual relation offers a very fast decrease of the objective function: it strengthens the membership and typicality values when they tend toward 1 and reduces them when they tend toward 0. This relation uses the weighting exponents also as exponents of the distance in the two terms of the objective function. The objective function of MFPCM can be described as:

J = \sum_{i=1}^{c} \sum_{j=1}^{n} \left[ \mu_{ij}^{m} w_{ji}^{m} \, d^{2m}(x_{j}, v_{i}) + t_{ij}^{\eta} w_{ji}^{\eta} \, d^{2\eta}(x_{j}, v_{i}) \right]
U = \{\mu_{ij}\} indicates the fuzzy partition matrix, with \mu_{ij} \in [0, 1] and \sum_{i=1}^{c} \mu_{ij} = 1 for every j.

T = \{t_{ij}\} indicates the typical (typicality) partition matrix, with t_{ij} \in [0, 1].

V = \{v_{i}\} indicates the c cluster centers, with v_{i} \in \mathbb{R}^{d}.
As MFPCM adapts its membership function according to the weight, segmentation can be performed with better accuracy. Around the boundary regions of cancer nodules, FPCM can misjudge the edge of a nodule because of its fixed membership function; MFPCM overcomes this problem. Because the membership function varies according to the weight of a particular region, misclassification of the borders of cancer nodules is reduced. Sometimes cancer nodules appear at almost the same intensity as the surrounding lung region; in this case FPCM fails to detect the nodule and misclassifies it as lung region. With MFPCM, such nodules can be identified exactly because of its ability to adapt the membership function.

After the segmentation is performed in the lung region, the feature extraction and cancer diagnosis can be performed with the segmented image.

Phase 3: Feature Extraction from the Segmented Region

After segmentation is performed on the lung region, features can be extracted from it to determine the diagnosis rules for accurately detecting cancer nodules in the lung region.

The features that are used in this approach in order to generate diagnosis rules are:

Area of the candidate region

The maximum drawable circle (MDC) inside the candidate region

Mean intensity value of the candidate region

Area of the candidate region

This feature can be used here in order to:

Eliminate isolated pixels.

Eliminate very small candidate objects.

With the help of this feature, regions that have no chance of forming a cancer nodule are detected and eliminated. This reduces the processing needed in later steps and the time they take.

The maximum drawable circle (MDC)

This feature characterizes each candidate region by its maximum drawable circle (MDC). Every pixel within the candidate region is taken as a candidate center point for drawing the circle, and the largest circle that fits within the region is retained. Initially, the radius is set to one pixel and is then incremented by one pixel at a time until no circle of that radius can be drawn inside the region. The MDC helps the diagnostic procedure remove more false positive cancer candidates.
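As a sketch, the MDC can be computed equivalently with a Euclidean distance transform instead of incremental circle growing: the largest inscribed circle is centered at the region pixel farthest from the background, and its radius is that distance. This is an illustrative alternative, not the thesis code:

```python
import numpy as np
from scipy import ndimage

def max_drawable_circle(region_mask):
    """Return (radius, center) of the largest circle inside a binary region."""
    # Distance from every region pixel to the nearest background pixel.
    dist = ndimage.distance_transform_edt(region_mask)
    # The MDC is centered where that distance is largest.
    center = np.unravel_index(np.argmax(dist), dist.shape)
    return dist[center], center

# A filled disc of radius 6 around (10, 10) in a 21x21 mask.
yy, xx = np.mgrid[0:21, 0:21]
disc = (yy - 10) ** 2 + (xx - 10) ** 2 <= 36
radius, center = max_drawable_circle(disc)
```

The distance-transform formulation gives the same radius as pixel-by-pixel growing but in a single pass over the image.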

Mean intensity value of the candidate region

In this feature, the mean intensity value of the candidate region is calculated, which helps reject regions that do not indicate a cancer nodule. The mean intensity value is the average intensity of all pixels belonging to the region and is calculated using the formula:

\text{mean}_{j} = \frac{1}{n} \sum_{i=1}^{n} \text{intensity}(i)

where j is the region index, ranging from 1 to the total number of candidate regions in the whole image; intensity(i) is the CT intensity value of pixel i; and i ranges from 1 to n, where n is the total number of pixels belonging to region j.

Phase 4: Formation of Diagnosis Rules from the Extracted Features

After the necessary features are extracted, the following diagnosis rules can be applied to detect the occurrence of cancer nodules. The three diagnosis rules are as follows:

Rule 1: First, a threshold value T1 is set for the area of a region. If the area of a candidate region exceeds this threshold, the region is eliminated from further consideration. This rule helps reduce the steps and time required in the subsequent phases.

Rule 2: This rule considers the maximum drawable circle (MDC). A threshold T2 is defined for the MDC radius. If the radius of the drawable circle of a candidate region is less than T2, the region is considered a non-cancerous nodule and eliminated from further consideration. Applying this rule rejects a large number of vessels, which generally have a thin, oblong, or line-like shape.

Rule 3: Here, the range bounded by T3 and T4 is set as the threshold for the mean intensity value of a candidate region. The mean intensity values of the candidate regions are then calculated. If the mean intensity value of a candidate region falls below the minimum threshold or exceeds the maximum threshold, the region is considered non-cancerous.

By applying all the above rules, most regions that are not cancerous nodules are eliminated, and the remaining candidate regions are considered cancerous. This helps the CAD system reject false positive cancer regions and detect cancer regions more accurately. These rules are passed to the classifier in order to detect the cancer nodules in the supplied lung image.
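The three rules can be sketched as a single filter over candidate regions. The numeric thresholds below are placeholders (the chapter does not fix values for T1 through T4), so they are illustrative assumptions only:

```python
import numpy as np
from scipy import ndimage

T1 = 500            # Rule 1: maximum area (pixels) -- placeholder value
T2 = 2.0            # Rule 2: minimum MDC radius -- placeholder value
T3, T4 = 40, 200    # Rule 3: allowed mean-intensity range -- placeholder values

def candidate_is_nodule(region_mask, ct_image):
    """Apply Rules 1-3 to one candidate region; True = kept as cancerous."""
    area = int(region_mask.sum())
    if area > T1:                       # Rule 1: overly large regions rejected
        return False
    mdc_radius = ndimage.distance_transform_edt(region_mask).max()
    if mdc_radius < T2:                 # Rule 2: thin/oblong regions rejected
        return False
    mean_intensity = ct_image[region_mask].mean()
    return T3 <= mean_intensity <= T4   # Rule 3: intensity must lie in range

# Toy examples: a compact bright blob (kept) and a thin line (rejected).
img = np.zeros((32, 32))
img[10:17, 10:17] = 120
blob = img > 0
vessel = np.zeros((32, 32), dtype=bool)
vessel[5, 2:30] = True
```

The vessel-like line passes Rule 1 on area but fails Rule 2, since a one-pixel-thick shape admits only a tiny inscribed circle.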

Phase 5: Classification of Occurrence and Non Occurrence of Cancer in the Lung

The final phase in the proposed CAD system is the classification of occurrence and non occurrence of cancer nodule for the supplied lung image. The classifier used in this proposed approach is the Extreme Learning Machine (ELM).

Extreme Learning Machine

Extreme Learning Machine (ELM), designed for Single Hidden Layer Feed-Forward Neural Networks (SLFNs), randomly selects the input weights and analytically determines the output weights of the SLFN. The algorithm tends to provide good generalization performance at extremely fast learning speed.

The structure of the ELM network is shown in figure 6.6. ELM consists of an input layer, a hidden layer, and an output layer.

The ELM has several interesting and significant features different from traditional popular learning algorithms for feed forward neural networks. These include the following:

[Diagram: Input Layer → Hidden Layer → Output Layer]
Figure 6.6: Structure of ELM network

The learning speed of ELM is very fast compared to other classifiers. For many applications, ELM training completes in seconds or less. With previously existing learning techniques, training a feed-forward network takes a long time even for simple applications.

ELM achieves better generalization than gradient-based learning in many cases. Existing gradient-based learning techniques may encounter problems such as local minima, improper learning rates, and overfitting; to overcome these problems, techniques such as weight decay and early stopping must be employed.

ELM attains its solution directly, without such difficulties. The ELM learning algorithm is much simpler than other learning techniques for feed-forward neural networks. Existing learning techniques can be applied only to differentiable activation functions, whereas ELM can also be used to train SLFNs with many non-differentiable activation functions.

Extreme Learning Machine Training Algorithm

If there are N samples (x_i, t_i), where x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n and t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m, then a standard SLFN with \tilde{N} hidden neurons and activation function g(x) is defined as:

\sum_{i=1}^{\tilde{N}} \beta_{i} \, g(w_{i} \cdot x_{j} + b_{i}) = o_{j}, \qquad j = 1, \ldots, N
where w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T is the weight vector linking the ith hidden neuron to the input neurons, \beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T is the weight vector linking the ith hidden neuron to the output neurons, and b_i is the threshold of the ith hidden neuron. The "\cdot" in w_i \cdot x_j denotes the inner product of w_i and x_j. The SLFN aims to reduce the difference between o_j and t_j to zero, i.e. there exist \beta_i, w_i, and b_i such that:

\sum_{i=1}^{\tilde{N}} \beta_{i} \, g(w_{i} \cdot x_{j} + b_{i}) = t_{j}, \qquad j = 1, \ldots, N
or, more compactly in matrix form, H\beta = T, where:

H = \begin{bmatrix} g(w_{1} \cdot x_{1} + b_{1}) & \cdots & g(w_{\tilde{N}} \cdot x_{1} + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_{1} \cdot x_{N} + b_{1}) & \cdots & g(w_{\tilde{N}} \cdot x_{N} + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}

\beta = \begin{bmatrix} \beta_{1}^{T} \\ \vdots \\ \beta_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m}, \qquad T = \begin{bmatrix} t_{1}^{T} \\ \vdots \\ t_{N}^{T} \end{bmatrix}_{N \times m}
The matrix H is the hidden layer output matrix of the neural network. If the number of neurons in the hidden layer equals the number of samples, H is square and invertible. Otherwise, the system of equations must be solved by numerical methods, concretely by solving:

\hat{\beta} = \arg\min_{\beta} \| H\beta - T \|
The solution that minimizes the norm of this least squares problem is:

\hat{\beta} = H^{\dagger} T
where H^{\dagger} is the Moore-Penrose generalized inverse of H. The most significant properties of this solution are:

Minimum training error.

Smallest norm of weights and best generalization performance.

The minimum norm least-squares solution of H\beta = T is unique and is given by:

\hat{\beta} = H^{\dagger} T
The ELM algorithm works as follows. Given a training set, an activation function g(x), and a number of hidden neurons \tilde{N}:

Assign random values to the input weights w_i and the biases b_i.

Compute the hidden layer output matrix H.

Compute the output weights using \hat{\beta} = H^{\dagger} T, where H and T are defined as in the SLFN specification above.
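These three steps can be sketched in a few lines of NumPy. The sigmoid activation, the hidden-layer size, and the toy two-class data are illustrative assumptions, not fixed by the text:

```python
import numpy as np

def elm_train(X, T, n_hidden=20, seed=0):
    """Train an ELM: random hidden parameters, pseudoinverse output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights w_i
    b = rng.normal(size=n_hidden)                 # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                  # beta = H† T (Moore-Penrose)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy two-class problem: points near (0, 0) labelled 0, near (3, 3) labelled 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
t = np.concatenate([np.zeros(30), np.ones(30)])
W, b, beta = elm_train(X, t[:, None])
pred = (elm_predict(X, W, b, beta)[:, 0] > 0.5).astype(int)
```

Because only beta is fitted, by a single least-squares solve, while W and b stay random, training reduces to one pseudoinverse computation; this is the source of ELM's speed.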


This chapter presented a computer aided diagnosis system for early detection of lung cancer, operating on chest computer tomography images. It also discussed machine learning approaches and their importance. In the first phase of the proposed technique, the lung region is extracted from the chest tomography image using several basic image processing techniques. In the second phase, the extracted lung is segmented with the modified fuzzy possibilistic c-means algorithm. The next phase extracts diagnostic features from the segmented image, and the diagnosis rules are then generated from the extracted features. Finally, with the obtained diagnosis rules, classification is performed using ELM to detect the occurrence of cancer nodules. ELM has several salient features and characteristics that make it well suited for classification in the proposed approach.

The next chapter deals with the experimental observations for the proposed approach.