Ultrasound Thyroid Image Characterization Biology Essay


With the right equipment and well-trained personnel, ultrasound of the neck can detect a large number of non-palpable thyroid nodules. However, this technique often suffers from subjective interpretation and poor accuracy in the differential diagnosis of malignant and benign thyroid lesions. Therefore, we developed an automated identification system based on knowledge representation techniques for characterizing the intra-nodular vascularization of thyroid lesions. Twenty nodules (ten benign and ten malignant), taken from 3-D high resolution ultrasound (HRUS) images, were used for this work. Malignancy was confirmed using fine needle aspiration biopsy and subsequent histological studies. A combination of discrete wavelet transformation (DWT) and texture algorithms was used to extract relevant features from the thyroid images. These features were fed to different configurations of the AdaBoost classifier. The performance of these configurations was compared using receiver operating characteristic (ROC) curves. Our results show that the combination of texture features and DWT features yielded an accuracy higher than that reported in the literature. Among the different classifier setups, the perceptron based AdaBoost performed best: the area under the ROC curve was 1, and the classification accuracy, sensitivity and specificity were 100%. Finally, we have composed an integrated index called the thyroid malignancy index (TMI), made up of these DWT and texture features, to facilitate distinguishing benign from malignant nodules using just one number. This index would help clinicians in a more quantitative assessment of thyroid nodules.

Keywords: Thyroid, computer aided diagnosis, high resolution ultrasound, texture, discrete wavelet transform.


1. Introduction

In 2009, the National Cancer Institute in the US estimated the number of new thyroid cancer cases at 37,200 and the expected number of deaths from this type of cancer at 1,630 [1]. The Surveillance, Epidemiology, and End Results database has estimated an increase of 3% per year in the incidence of thyroid cancer [2, 3]. Although the incidence of thyroid cancer appears to be increasing, the number of patients evaluated with a thyroid nodule to exclude carcinoma remains far greater. Thyroid nodules are common findings in clinical practice and occur in more than 50% of the adult population; fortunately, only 7% of thyroid nodules are malignant [4].

Early detection of malignant nodules is of paramount importance for successful treatment. Due to the large number of subjects with thyroid nodules (50% of the adult population), the diagnostic technique must be cost-effective. Moreover, studies indicate that diagnostic tests for thyroid nodules are not sensitive or specific enough given the large number of benign lesions found today [5, 6]. Automated diagnosis support systems are one of the ways to achieve cost efficiency, because such systems shift work from humans to machines, and work performed by machines is more cost-efficient and consistent when compared to human labor. One promising way to build thyroid diagnosis support systems is to use the medical images for analysis. Medical image processing is effective because the characteristic textural features which represent major histopathologic components of the thyroid nodules can be extracted from the images. Ultrasound imaging is more cost-effective when compared to other medical imaging methods that are used for thyroid diagnosis. Chen et al. used ultrasound images to study common texture analysis methods to characterize thyroid nodules [7]. Image features were classified according to the corresponding pathologic findings. This study provided good justification for our decision to base our study on 3D ultrasound images.

Doppler ultrasound imaging has been widely used to differentiate malignant from benign thyroid nodules. Results show that benign nodules tend to have no or minimal internal flow, with or without a peripheral ring, whereas malignant nodules tend to have a peripheral ring with extensive internal flow [8]. Doppler studies highlighted the need for a quantitative evaluation of the internal nodule flow, to avoid the subjective interpretations and partial views caused by the two-dimensional nature of traditional high-resolution ultrasonography (HRUS). Maizlin et al. studied the sonographic features of Hürthle cell neoplasms (HCNs) of the thyroid [9]. Hürthle cell neoplasms showed a spectrum of sonographic appearances, from predominantly hypoechoic to hyperechoic lesions and from peripheral blood flow with no internal flow to vascularized lesions. The pathologic criteria differentiating benign and malignant HCNs were beyond the resolution of sonography and fine-needle aspiration biopsy and therefore required removal of the entire lesion. This precludes diagnosis and characterization of HCNs by sonography.

Contrast-enhanced ultrasound imaging (CEUS) has been introduced to improve the differential diagnosis of solitary thyroid nodules. CEUS studies of the thyroid nodules, conducted by analyzing the increase in nodule echogenicity due to contrast palpitation, did not prove to perform better than traditional HRUS [10]. On the contrary, CEUS perfusion analysis of neoplastic nodules proved effective in differentiating benign from malignant lesions, even though the performance was poor when compared to fine needle aspiration examinations [11].

HRUS is the most widely used tool for initial detection, analysis, and follow-up of thyroid lesions. It accurately reveals formations as small as 1 mm, besides being non-invasive and low cost [12]. Even though it has been demonstrated that malignancy is related to common features in HRUS B-Mode images (i.e., microcalcifications, marked hypoechogenicity, irregular margins, and the absence of a hypoechoic halo around the nodule [13]), interpretative pitfalls remain. As a result, these challenges decrease the sensitivity and the specificity of the HRUS technique.

In this paper, we show that a perfect classification of HRUS images into benign or malignant thyroid images is possible. We decided to use HRUS instead of CEUS because, in the case of CEUS, overlapping findings limit its potential in the differential diagnosis of malignant and benign thyroid lesions [10]. The main technical contribution of this study is that the proposed CAD system combines DWT and texture feature extraction methods. The extracted features were fed into various configurations of the AdaBoost classifier for classification. Furthermore, these features were used as the basis for the so-called thyroid malignancy index (TMI). This TMI might prove useful in clinical practice for the differential diagnosis of thyroid nodules.

The layout of the paper is as follows: The data acquisition method used for this work is explained in Section 2. Section 2 also presents the feature extraction method and the different configurations of the AdaBoost classifiers used. Section 3 reports the ranges of the selected features, classification results, and the values of the TMI. Discussion is given in Section 4 and the paper concludes in Section 5.


2. Materials and Methods

Computer aided diagnosis systems are constructed by combining a feature extraction subsystem and a classification subsystem. In this work, the feature extraction subsystem produces a feature vector, the elements of which come from DWT and texture analysis techniques. The feature vectors are fed to one of the four different AdaBoost configurations. These configurations differ from one another only by the weak learner algorithm which is used by the main AdaBoost classification algorithm. Figure 1 shows the block diagram of the proposed CAD system. Each sub-block in this diagram is described in the following sections.

Figure 1 Block diagram of the proposed CAD system.

2.1. HRUS Data Collection

Twenty patients with a previously confirmed diagnosis of solitary thyroid nodule were enrolled in this study. Ten subjects were male (age: 53.5 ± 13.3 years; range: 22-71 years) and ten were female (age: 50.1 ± 10.8 years; range: 25-68 years). All patients signed an informed consent form prior to participating in the experiment. The experimental protocol was approved by the ethical committee of the Endocrinology section of the "Umberto I" Hospital of Torino (Italy).

All the subjects underwent a clinical examination, hormonal profile, and ultrasound (B-Mode and Color Doppler) examination of the lesion. A trained operator with more than 30 years of experience in neck ultrasonography (R.G.) performed freehand scanning of all the patients. We chose freehand scanning because of the bulkiness and weight of external mechanical scanning systems and the variability in the dimensions and position of the nodules. The high frame rate of the device compared to the slow movement of the probe ensured that there was no gap between adjacent frames. The average frame rate of the device during acquisitions was 16 Hz. Images were acquired with a MyLab70 ultrasound scanner (Biosound-Esaote, Genova, Italy) equipped with an LA-522 linear probe working in the range 4-10 MHz. All images were acquired at 10 MHz. The volumes were transferred in DICOM format to an external workstation (Apple PowerPC, dual 2.5 GHz, 8 GB RAM) equipped with processing and reconstruction software.

All the subjects underwent ultrasound-guided fine needle aspiration biopsy (FNAB) of the thyroid lesion. Ten nodules were malignant (six papillary, one follicular and one Hürthle cell carcinoma), and ten were benign (struma nodules). We acquired 40 data sets from each of the ten patients diagnosed with malignant nodules and 40 data sets from each patient with benign thyroid nodules. In total, there were 400 benign and 400 malignant data sets for further analysis. The histopathological analysis confirmed the diagnosis of malignant carcinoma for all ten patients. The results of the FNAB were used as the reference for the benign nodules, which were all reported to be struma nodules. Figures 2(a) and 2(b) show typical benign and malignant thyroid images obtained using HRUS.


Figure 2: Typical thyroid HRUS images: (a) Benign (b) Malignant.

2.2. Feature extraction

Feature extraction is one of the most important steps in CAD systems. The challenge is to extract relevant and representative features from input data. In this work, DWT and texture features were extracted from preprocessed HRUS images.

DWT feature extraction

The Discrete Wavelet Transform, a commonly used technique in image processing, uses filter banks composed of finite impulse response filters to decompose signals into low and high pass components [14]. The low pass component contains information about slowly varying signal characteristics, and the high pass component contains information about sudden changes in the signal.

When low pass filtering is applied to both the rows and columns of the image, the LL coefficients are obtained. These coefficients are representative of the total energy in the images. When low pass filtering is applied to the rows, and high pass filtering to the column values, the resultant HL coefficients contain the vertical details of the image. Row-wise high pass filtering and column-wise low pass filtering result in the LH coefficients, which contain the horizontal details of the image. High pass filtering of both row and column values results in the finest-scale HH coefficients, which contain the diagonal details of the image. Decomposition is further performed on sub-band LL to attain the next coarser scale of wavelet coefficients.
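The sub-band naming in the paragraph above can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses the two-tap Haar filter pair rather than the Daubechies 8 wavelet used in this work, it performs a single decomposition level, and the function names (`haar_pairs`, `filter_rows`, `filter_columns`, `haar_dwt2`) are hypothetical. "Filtering the rows" is read here as filtering along each row.

```python
import math


def haar_pairs(seq):
    """One Haar analysis step: scaled pairwise sums (low pass) and differences (high pass)."""
    s = math.sqrt(2.0)
    low = [(seq[i] + seq[i + 1]) / s for i in range(0, len(seq) - 1, 2)]
    high = [(seq[i] - seq[i + 1]) / s for i in range(0, len(seq) - 1, 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_rows(image):
    """Apply the low/high pass pair along every row of the image."""
    lows, highs = [], []
    for row in image:
        low, high = haar_pairs(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def filter_columns(image):
    """Apply the low/high pass pair along every column of the image."""
    lows_t, highs_t = filter_rows(transpose(image))
    return transpose(lows_t), transpose(highs_t)


def haar_dwt2(image):
    """Single-level 2D decomposition, named as in the text:
    LL = low rows, low columns;  HL = low rows, high columns;
    LH = high rows, low columns; HH = high rows, high columns."""
    row_low, row_high = filter_rows(image)
    ll, hl = filter_columns(row_low)
    lh, hh = filter_columns(row_high)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}
```

Applying `haar_dwt2` again to the returned `"LL"` band would give the next coarser scale, which is the recursion the paragraph describes.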

Figure 3 Passband structure for a 2D sub-band transform with 3 levels.

In our work, we first converted the image to grayscale and then applied DWT using Daubechies (db) 8 as the mother wavelet [15]. Figure 3 shows the complete passband structure for a 2D sub-band transform with three levels. The elements of each individual sub-band, represented as a matrix, are combined to form one feature, in a manner similar to summing the matrix elements. First, all the elements within the individual rows of the matrix are added. The elements of the resulting vector are squared and then summed to form a scalar. Then the scalar is normalized by dividing it by the number of rows and columns of the original matrix. These features are denoted A2, H1, H2, V1, V2, D1, D2 in both Figure 3 and Figure 1.
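The three steps that reduce a sub-band to one scalar can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `subband_feature` is hypothetical, and "dividing it by the number of rows and columns" is assumed here to mean dividing by the product rows × columns.

```python
def subband_feature(matrix):
    """Reduce one sub-band (a list of equal-length rows) to a scalar feature."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Step 1: add all elements within each row.
    row_sums = [sum(row) for row in matrix]
    # Step 2: square the elements of the resulting vector and sum them.
    total = sum(value * value for value in row_sums)
    # Step 3: normalize (assumption: by rows * columns).
    return total / (n_rows * n_cols)
```

For the 2 × 2 matrix [[1, 2], [3, 4]] the row sums are 3 and 7, the sum of squares is 58, and the feature is 58 / 4 = 14.5.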

Texture feature extraction

Texture features measure the smoothness, coarseness, and regularity of pixels in an image. These features describe a mutual relationship among intensity values of neighboring pixels which is repeated over an area larger than the size of the relationship [16]. In the statistical texture analysis approach, scalar measurements of the texture such as entropy, contrast, and correlation are determined from the gray level co-occurrence matrix. This approach characterizes textures as smooth, coarse, grainy, etc. Another technique, called the structural texture analysis method, is more complex than the statistical approach [17]. This technique presents detailed symbolic descriptions of the image. Parameters extracted using the statistical approach are more suitable for image analysis than those obtained using the structural method [18]. This section describes the statistical parameters that were extracted from the HRUS images. The Gray Level Co-occurrence Matrix (GLCM) of an m × n image I is defined by

$$C_d(i, j) = \left| \left\{ \big( (p, q), (p + \Delta x, q + \Delta y) \big) : I(p, q) = i,\ I(p + \Delta x, q + \Delta y) = j \right\} \right|$$

where $(p, q), (p + \Delta x, q + \Delta y) \in m \times n$, $d = (\Delta x, \Delta y)$, and $|\cdot|$ denotes the cardinality of a set [14]. The probability of a pixel with a gray level value i having a pixel with a gray level value j at a distance $(\Delta x, \Delta y)$ away in an image is

$$P_d(i, j) = \frac{C_d(i, j)}{\sum_{i} \sum_{j} C_d(i, j)}$$

Based on the GLCM, we obtain the following features:





The local variation between two pixels is captured by the contrast feature, while the similarity between pixels is quantified by the homogeneity measure. Furthermore, the denseness and degree of disorder in an image are measured by entropy features. The entropy feature will have a maximum value when all elements of the co-occurrence matrix are the same.
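The co-occurrence counting and the three measures named above can be sketched in a few lines. The formulas used for contrast, homogeneity and entropy are the common textbook definitions and are an assumption here; the exact expressions used in this work may differ. The function names and the offset parameters `d_row` and `d_col` are hypothetical.

```python
import math


def glcm_probabilities(image, d_row, d_col, levels):
    """Count gray level pairs (i, j) separated by the offset (d_row, d_col)
    and normalize the counts to probabilities."""
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for p in range(n_rows):
        for q in range(n_cols):
            p2, q2 = p + d_row, q + d_col
            if 0 <= p2 < n_rows and 0 <= q2 < n_cols:
                counts[image[p][q]][image[p2][q2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def texture_features(prob):
    """Textbook contrast, homogeneity and entropy of a normalized GLCM."""
    contrast = homogeneity = entropy = 0.0
    for i, row in enumerate(prob):
        for j, p in enumerate(row):
            contrast += (i - j) ** 2 * p
            homogeneity += p / (1 + (i - j) ** 2)
            if p > 0:  # 0 * log(0) is taken as 0
                entropy -= p * math.log(p)
    return {"contrast": contrast, "homogeneity": homogeneity, "entropy": entropy}
```

With these definitions, a GLCM whose entries are all equal gives the largest possible entropy, the logarithm of the number of entries, which matches the statement about the entropy feature above.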

2.3. Classifiers Used

In this work, we have used four different configurations of the AdaBoost classifiers for classification. The main AdaBoost algorithm and its various configurations are briefly explained below.

AdaBoost classifier

The AdaBoost algorithm was introduced in 1995 by Freund and Schapire [19]. It is a meta-classifier which is used to improve the performance of weak classifiers. We approach the discussion in a standard way by defining a training set of m training examples $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in Y = \{-1, +1\}$.

The first step in the AdaBoost algorithm initializes the distribution D which is used to guide the weak learning algorithm:

$$D_1(i) = \frac{1}{m}, \quad i = 1, \ldots, m.$$ (6)

The next step is the learning phase where the weak learners are gradually improved.

For $t = 1, \ldots, T$:

Find the classifier $h_t$, where $h_t : X \to \{-1, +1\}$, that minimizes the error with respect to the distribution $D_t$:

$$h_t = \arg\min_{h_j \in H} \varepsilon_j, \quad \text{where} \quad \varepsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$$

If $\varepsilon_t > 0.5$, then stop.

Choose $\alpha_t \in \mathbb{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$, where $\varepsilon_t$ is the weighted error rate of classifier $h_t$.

Update the distribution:

$$D_{t+1}(i) = \frac{D_t(i) \exp\left(-\alpha_t y_i h_t(x_i)\right)}{Z_t}$$

where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ will be a probability distribution, i.e., it sums to one over all examples).

Output of the final classifier:

$$H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$$

Thus, after selecting an optimal classifier $h_t$ for the distribution $D_t$, the examples that the classifier identified correctly are weighted less and those that it identified incorrectly are weighted more. Therefore, when the algorithm is testing the classifiers on the distribution $D_{t+1}$, it will select a classifier that better identifies those examples that the previous classifier missed. Next, we describe the weak learner algorithms, also called the base classifiers, which were used in this study.
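The boosting loop described above can be sketched with decision stumps as the weak learner. This is a minimal illustration, not the implementation used in this work: all function names are hypothetical, labels are assumed to be -1 or +1, the stump search is exhaustive over observed feature values, and the weighted error is clamped slightly away from 0 and 1 (an added safeguard) so that the logarithm stays finite.

```python
import math


def train_stump(xs, ys, weights):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for k in range(len(xs[0])):
        for theta in sorted({x[k] for x in xs}):
            for polarity in (1, -1):
                preds = [polarity if x[k] >= theta else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, k, theta, polarity)
    return best


def stump_predict(stump, x):
    _, k, theta, polarity = stump
    return polarity if x[k] >= theta else -polarity


def adaboost(xs, ys, rounds):
    m = len(xs)
    dist = [1.0 / m] * m                      # uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(xs, ys, dist)
        eps = stump[0]
        if eps > 0.5:                         # weak learner no better than chance
            break
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # safeguard, not part of the text
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight, correct ones lose weight.
        dist = [w * math.exp(-alpha * y * stump_predict(stump, x))
                for w, x, y in zip(dist, xs, ys)]
        z = sum(dist)                         # normalization factor
        dist = [w / z for w in dist]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

As a usage example, the one-dimensional labels +, +, -, -, +, + at x = 1..6 cannot be separated by any single stump, but three boosting rounds are enough for the weighted vote to classify all six training points correctly.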

C4.5 Base Classifier

C4.5, a statistical classifier, builds decision trees from a set of training data using the concept of information entropy. The algorithm used to generate the decision tree was developed by Quinlan [20]. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The attribute with the highest normalized information gain (difference in entropy) is chosen to make the decision. The C4.5 algorithm is then applied recursively to the smaller sublists.
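The split criterion can be made concrete with a short sketch. It assumes a single numeric attribute split at a threshold and takes "normalized information gain" to mean the gain ratio (information gain divided by the entropy of the split itself); the function names are hypothetical and this is not a full C4.5 implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Normalized information gain of splitting at values <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info
```

A tree builder would evaluate `gain_ratio` for every candidate attribute and threshold at a node and keep the split with the highest value.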

Perceptron Base Classifier

The perceptron, a type of feed-forward neural network, is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value) using the following relationship [21]:

$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

where w is a vector of real-valued weights, $w \cdot x$ is the dot product (which computes a weighted sum), and b is the 'bias', which is a constant term that does not depend on any input value and that is used to alter the position of the decision boundary. The value of f(x) (0 or 1) is used to classify x as either a positive or a negative instance. The perceptron learning algorithm does not terminate if the training data are not linearly separable.
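A minimal sketch of the perceptron decision rule and the classic error-driven learning rule follows. The function names, the learning rate `lr` and the epoch cap are illustrative assumptions; the cap is needed precisely because training would not otherwise stop on data that are not linearly separable.

```python
def predict(w, b, x):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(xs, ys, epochs=100, lr=1.0):
    """Perceptron learning rule; labels ys are 0 or 1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(xs, ys):
            delta = y - predict(w, b, x)  # +1, 0 or -1
            if delta != 0:
                # Move the decision boundary toward the misclassified example.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b
```

On a linearly separable problem such as the logical AND of two inputs, the loop stops once an epoch produces no errors.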

Pocket Base Classifier

The pocket algorithm with ratchet [22] solves the stability problem of perceptron learning by keeping the best solution seen so far "in its pocket". The pocket algorithm then returns the solution in the pocket, rather than the last solution. The weights are actually modified only if a better weight vector is found. Therefore, this approach returns the "best" linear solution to a separation even when the training set is not linearly separable.
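The idea of keeping the best weights "in the pocket" can be sketched as below. This is a simplified variant and an assumption on our part: it re-scores the whole training set after every weight update and replaces the pocketed weights only on a strict improvement, whereas the original formulation in [22] also tracks runs of consecutive correct classifications. All names are hypothetical.

```python
import random


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def count_correct(w, b, xs, ys):
    return sum(1 for x, y in zip(xs, ys) if predict(w, b, x) == y)


def pocket_train(xs, ys, iterations=1000, seed=0):
    """Perceptron updates on randomly drawn examples; labels ys are 0 or 1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), 0.0
    best_w, best_b = list(w), b
    best_correct = count_correct(w, b, xs, ys)
    for _ in range(iterations):
        i = rng.randrange(len(xs))
        delta = ys[i] - predict(w, b, xs[i])
        if delta == 0:
            continue
        w = [wi + delta * xi for wi, xi in zip(w, xs[i])]
        b += delta
        correct = count_correct(w, b, xs, ys)
        if correct > best_correct:  # ratchet: the pocket only ever improves
            best_w, best_b, best_correct = list(w), b, correct
    return best_w, best_b, best_correct
```

On the XOR problem, which no linear classifier can solve, the working weights keep changing indefinitely, but the returned pocket weights are never worse than any earlier pocketed solution.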

Stump Base Classifier

A decision stump is a machine learning model which consists of a one-level decision tree [23]. A stump has only one internal node (the root) which is immediately connected to the terminal nodes. A decision stump makes a prediction based on the value of just a single input feature. A stump classifier st is defined by


The stump ignores all entries of x except the single feature it tests; it is therefore equivalent to a linear classifier defined by an affine hyperplane.

2.4. Performance Evaluation

The t-test assesses whether the means of two groups are statistically different from each other. The test statistic is the ratio of the difference between the group means to the variability of the groups. Lower p-values indicate that the difference between the two groups is statistically significant.
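The ratio described above can be computed directly from group means and standard deviations. The sketch below uses Welch's form of the statistic, which is an assumption because the text does not state which variant of the t-test was applied; the function name is hypothetical. The example numbers are the two TMI summaries quoted in Table 3, with 400 data sets per class as in Section 2.1.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of the group means divided by its standard error."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# TMI summaries from Table 3; group size of 400 per class is an assumption.
t_statistic = welch_t(11.6, 1.31, 400, 7.17, 0.812, 400)
```

Under these assumptions the statistic is about 57, far into the tail of the t distribution, which is consistent with the very low p-values reported.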

The ROC curve is a 2D plot with '1 - specificity' on the x-axis and 'sensitivity' on the y-axis. The area under the ROC curve (AUC) summarizes the classifier performance across the entire range of cut-off points; for a classifier that performs at least as well as chance, the AUC falls in the range between 0.5 and 1 [24]. A good classifier will have an AUC close to unity, and hence the ROC curve can be used to test the performance of the classifier [25]. In this work, we used ROC curves to test the performance of the classifiers that classify two distinct classes, benign and malignant thyroid nodules. Specificity is the proportion of benign subjects correctly identified as benign, and sensitivity is the proportion of malignant subjects correctly identified as malignant.
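These measures follow directly from the confusion counts, and the AUC can be computed as the fraction of (malignant, benign) score pairs that are ranked correctly. The sketch below is illustrative; the function names are hypothetical. The worked counts are the ultrasound elastography figures quoted from [28] in the discussion (43 of 49 malignant and 86 of 96 benign nodules classified correctly).

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard rates from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auc(positive_scores, negative_scores):
    """Probability that a positive case scores higher than a negative one
    (ties count as one half); equal to the area under the ROC curve."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positive_scores for n in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))


# Elastography counts from [28]: 43 TP, 6 FN, 86 TN, 10 FP.
elastography = confusion_metrics(tp=43, fn=6, tn=86, fp=10)
```

Rounded to whole percentages, these counts reproduce the 88% sensitivity, 90% specificity, 81% positive predictive value and 93% negative predictive value quoted for that study.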

Thyroid malignancy index (TMI)

In this work, we have extracted and used two DWT based features and three texture based features. However, it is very difficult to assess and keep track of the individual variation of these features in a patient to make an adequate diagnosis. Therefore, we have adopted a novel method of formulating an integrated index by combining the five features in such a way that the index value is distinctly different for benign and malignant subjects. Such an integrated index would be beneficial for diagnosis and for more objective interpretation. The proposed Thyroid Malignancy Index (TMI) is defined as follows.



3. Results

We used 560 images for training and 240 images for testing. A three-fold stratified cross validation method was used to test the classifiers. The whole dataset was split into three roughly equal parts. Two parts of the data (training set) were used for classifier development and the built classifier was evaluated using the remaining part (test set) (i.e. 560 images were used for training and 240 images for testing each time). This procedure was repeated three times using a different part as the test set in each case. Finally, the averages of the accuracy, sensitivity, specificity, positive predictive accuracy, and AUC obtained over all three evaluations were taken as the overall performance measures.
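The stratified splitting procedure can be sketched as follows. This is a generic illustration with hypothetical names, not the code used in this work: indices are shuffled within each class and dealt round-robin into the folds, so every fold keeps approximately the class proportions of the whole dataset.

```python
import random


def stratified_folds(labels, k=3, seed=0):
    """Split example indices into k folds with balanced class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_rounds(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

The performance measures would then be computed on the test fold of each round and averaged over the rounds.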

3.1. Classification results

Table 1 presents the values of the DWT and texture based features extracted from the benign and malignant HRUS images. The table lists the features used for classification, their corresponding mean ± standard deviation values for benign and malignant cases, and the p-values. All five features are statistically significant, as they show low p-values (<0.0001).






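Feature      Benign (mean ± SD)         Malignant (mean ± SD)      p-value
Feature 1    0.524 ± 1.456E-02          0.609 ± 1.901E-02          <0.0001
Feature 2    3.08 ± 7.197E-02           2.95 ± 8.333E-02           <0.0001
Feature 3    1.92 ± 0.138               1.44 ± 0.125               <0.0001
Feature 4    0.124 ± 4.993E-02          9.912E-02 ± 9.588E-02      <0.0001
Feature 5    1.481E-02 ± 1.961E-03      1.248E-02 ± 1.731E-03      <0.0001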


Table 1 Ranges and significance of the DWT and texture features.

Table 2 shows the number of true negatives (TN), false negatives (FN), true positives (TP), and false positives (FP), and accuracy, sensitivity, specificity, and AUC values for different AdaBoost configurations using the extracted features as input. The classification accuracy for all AdaBoost configurations is above 99%. This indicates that the combination of DWT and texture methods yields highly discriminative features. However, the AdaBoost with perceptron configuration performed the best with classification accuracy, sensitivity, specificity of 100% and unity AUC.

ADABOOST (1000 Iterations)

Table 2 TN, FN, TP, FP, sensitivity, specificity, AUC for the classifiers.

The ROC curves for the different classifiers used are shown in Figure 4. It can be seen from the figure that all the classifiers using different AdaBoost configurations performed relatively well.

Figure 4 ROC curves for the DWT based classification results.

3.2. Thyroid malignancy index results

The range of the TMI for the benign and malignant classes is shown in Table 3. The difference in TMI between the classes is statistically significant, as the p-value is very low (<0.0001). Figure 5 shows the spread of the TMI for the benign and malignant classes. It can be seen from the plot that the integrated index (TMI) separates benign and malignant nodules distinctly.




Feature    Benign          Malignant        P value
TMI        11.6 ± 1.31     7.17 ± 0.812     <0.0001


Table 3 Range of TMI for benign and malignant cases.

Figure 5: Box plot of the TMI.


4. Discussion

Table 4 shows a comparison of different CAD systems used for the diagnosis of thyroid malignancy. For the last 30 years, thyroid nodules have been evaluated by fine-needle aspiration (FNA), despite significant shortcomings in sensitivity and specificity. Recently, molecular profiling was used to discriminate between benign and malignant thyroid nodules during screening [26]. The results showed that molecular profiling readily distinguishes between benign and malignant thyroid tumors with excellent sensitivity and specificity. The genes identified may provide insight into the molecular pathogenesis of thyroid cancer.

The main problem in thyroid imaging is distinguishing between follicular thyroid carcinoma (FTC) and benign follicular thyroid adenoma (FTA), where the histology of FNA is not conclusive [27]. In order to improve the diagnosis, gene transcript expression profiles of FTC, FTA, and normal thyroid were analyzed. The authors showed that a combination of these markers was able to improve the pre-operative diagnosis of thyroid nodules, allowing better treatment decisions and reducing long-term health costs. Immunohistochemistry gave a correct classification in 90.6% of fine-needle aspirations and 85.2% of follicular thyroid adenomas.

Real-time ultrasound elastography was used to differentiate benign from malignant thyroid nodules [28]. Ninety patients with thyroid nodules referred for surgical treatment were examined in that study. One hundred and forty-five nodules in these patients were examined by B-mode ultrasound, color Doppler ultrasound, and ultrasound elastography. The final diagnosis was obtained from histologic findings. In real-time ultrasound elastography, 86 of 96 benign nodules (90%) had a score of 1 to 3, whereas 43 of 49 malignant nodules (88%) had a score of 4 to 6 (P < .001), giving a sensitivity of 88%, a specificity of 90%, a positive predictive value of 81%, and a negative predictive value of 93%. High sensitivity (88%) and specificity (93%) were also observed in 68 nodules whose greatest diameter was 1 cm or less.

A quantitative fluorescent technique was developed for making in vivo determinations of the iodine content of the total thyroid gland or of selected parts [29]. In solitary thyroid nodules that are "cold" in radionuclide studies, the ratio of the iodine content in the nodule to that in a corresponding area of the contralateral lobe proved to be a good indicator of malignancy. In a study of 42 surgical patients, an iodine content ratio below 0.60 was an excellent indicator of malignancy, with a sensitivity of 100%, a specificity of 79%, and an overall accuracy of 90%.

In our study, the classification accuracy is considerably higher than the reported results. Hence, our proposed CAD system can be used as a tool for clinical tests. In addition to better performance, we have also proposed an integrated index (TMI) using DWT and texture features to distinguish the classes easily and accurately. This single value is able to indicate the benign and malignant classes accurately (their ranges do not overlap).





Molecular profiling [26]

Not reported



Genetic markers [27]

90.6% & 85.2%

Not reported

Not reported

Real time ultrasound [28]

Not reported



Fluorescent scanning [29]


Not reported

Not reported

Proposed HRUS based CAD method (perceptron as weak classifier)




Table 4 Comparison of published thyroid cancer classification results.


5. Conclusion

There is a need for cost-efficient biomedical diagnostic support systems for the detection of thyroid cancer. For this reason, we developed a thyroid cancer CAD system using HRUS images. A combination of DWT and texture features was fed into four different configurations of the AdaBoost classifier for classification. Our results show that the AdaBoost configuration with the perceptron as weak learner outperformed all other tested configurations. This combination achieved 100% classification accuracy, sensitivity and specificity. The features are so discriminative that a distinct classification step is not needed. Therefore, we proposed a single thyroid malignancy index which can be used in clinical practice to support the diagnosis. The realization of such systems is cost-effective, and the TMI can be used for a more objective, faster and simpler detection of malignant and benign cases. This index can also be used to test the efficacy of a drug or treatment used against thyroid cancer.