Visualization Of Uncertainty Using Entropy Computer Science Essay


Noise clustering (NC) is a robust clustering method that partitions data sets while reducing errors caused by outliers. It uses pure spectral information in image classification; however, high variation in the spatial distribution of the same class often produces 'noisy' classification results. The method provides a degree of similarity of each pixel to every class. In this work the performance of Noise Clustering with Entropy (NCWE) is evaluated in supervised mode, and the accuracy assessment has been carried out using entropy. The basic objective of this research is to optimize the resolution parameter δ for the Noise Clustering (NC) algorithm and the regularizing parameter ν for the Noise Clustering with Entropy (NCWE) classifier, and to analyse the classified fraction images. Experiments with a simulated training dataset show that the optimized values for the NCWE classifier, at which the minimum level of uncertainty exists, are δ = 10⁶ for the resolution parameter and ν = 0.08 for the regularizing parameter. Entropy and membership verification are taken as indirect measures of the accuracy of the classified image.

Keywords: Entropy, Noise Clustering (NC), Noise Clustering with Entropy (NCWE)

Introduction

The determination of the optimum number of parameters and their values for each hybrid classifier is critical and has to be investigated (Aziz, 2004). Fuzzy classification is a soft classification technique (Binaghi and Rampini, 1993) that deals with vagueness in class definition (Foody et al., 1996); it can therefore model the gradual spatial transition between land cover classes. Fuzzy c-Means (FCM) (Bezdek et al., 1980; Ehrlich et al., 1984; Bezdek et al., 1987) is an unsupervised clustering algorithm that has been widely used to find fuzzy membership grades between 0 and 1. For numerous applications of information discovery in databases, finding outliers (rare events) is important: outlier objects often contain information about a peculiar behaviour of the system. Noise clustering (NC) is a method that can be adapted to any prototype-based clustering algorithm, such as k-means and fuzzy c-means (FCM) (Rehm et al., 2007).

The concept of a noise cluster is introduced with the hope that all the noisy points can be dumped into it. The fuzzy version of the K-means algorithm is an attractive candidate for the approach considered here, since with it one can also obtain a relative degree of belonging of a point to the noise cluster. The approach is based on first defining a noise cluster and then defining a similarity (or dissimilarity) measure for it. Thus, if one is looking for a certain number of good clusters, the formulation of the problem requires defining an additional cluster that collects the noisy data points.

The main concept of the NC algorithm is the introduction of a single noise cluster that will hopefully contain all noise data points (Cimino et al., 2005). Data points whose distances to all clusters exceed a certain threshold, called the noise distance, are considered outliers. The presence of the noise cluster allows outliers to have arbitrarily small memberships in good clusters (Dave et al., 1997). One popular method to find clusters in numerical data minimizes the objective function of the c-means model, defined as the sum of the squared distances between the data points and the centers of the clusters they belong to.
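The c-means objective referred to above, and its noise-clustering extension, take the following standard forms in the robust-clustering literature (a sketch for orientation; the paper's own equation numbering is not reproduced here):

```latex
% c-means model: membership-weighted sum of squared point-to-center distances
J_{cm}(U,V) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\, d^{2}(x_k, v_i)

% NC model: an extra noise cluster at constant distance \delta absorbs outliers
J_{NC}(U,V) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\, d^{2}(x_k, v_i)
            + \sum_{k=1}^{n} \delta^{2}\Bigl(1 - \sum_{i=1}^{c} u_{ik}\Bigr)^{m}
```

A point far from all c prototypes can thus keep its good-cluster memberships small, with the residual membership 1 − Σᵢ u_ik assigned to the noise cluster.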

In recent decades, a number of robust fuzzy clustering algorithms have been proposed to partition data sets affected by noise and outliers. In robust NC, noise is modeled as a separate cluster characterized by a prototype that has a constant distance δ from all data points. The distance δ, also known as the resolution parameter, determines the boundary of the noise cluster and is therefore a critical parameter of the algorithm (Cimino et al., 2005). The idea of noise clustering is based on the introduction of an additional cluster that is supposed to contain all outliers (Dave and Keller, 1997).

Cluster analysis is an important tool in many technical disciplines, and many clustering methods are available (Everitt, 1974; Jain and Dubes, 1988). Most clustering methods are plagued by the problem of noisy data, i.e., the characterization of good clusters amongst noisy data. In some cases, even a few noisy points or outliers affect the outcome of the method by severely biasing the algorithm. Noise that is merely due to the limited precision of the measuring instrument is usually of no concern. On the other hand, completely arbitrary noise points that simply do not belong to the pattern or class being searched for are of real concern. A good example is in image processing, where one searches for certain shapes, for instance, amongst all the edge elements detected. A frequently recommended approach (Jain and Dubes, 1988) is to identify such data and remove it before applying the clustering algorithms.

Entropy can be used as an absolute indicator to measure uncertainty in classified images (Dehghan, 2006). As commercially available image-processing software does not include the soft classification algorithms used in this work, the in-house developed SMIC (Sub-pixel Multi-Spectral Image Classifier) system (Kumar et al., 2005), which provides fuzzy and entropy-based fuzzy classifiers together with an accuracy assessment module for fraction images, was used in this research.

Classifiers And Accuracy Assessment Approaches

Noise Clustering Algorithm

In recent decades, a number of robust fuzzy clustering algorithms have been proposed to partition data sets affected by noise and outliers. In robust fuzzy c-means, noise is modeled as a separate cluster characterized by a prototype that has a constant distance δ, also known as the resolution parameter, from all data points. The distance δ determines the boundary of the noise cluster and is therefore a critical parameter of the algorithm to optimize (Cimino et al., 2005).

The idea of noise clustering is based on the introduction of an additional cluster that is supposed to contain all outliers (Dave and Keller, 1997). Feature vectors that are about the noise distance δ or further away from every other prototype vector get high membership degrees to this noise cluster. The noise prototype v_c is such that the distance d_cj of feature vector x_j from v_c is the fixed constant value δ.

The specification of the noise distance depends on several factors, i.e., the maximum percentage of the data set to be classified as noise, the distance measure, the number of assumed clusters, and the expansion of the feature space (Klawonn, 2004).

The noise distance proposed in (Krishna, 1998) is a simplified statistical average over the non-weighted distances of all feature vectors to all prototype vectors,

where λ is the value of the multiplier used to obtain δ from this average of distances. The memberships of the vectors in the data set to the noise cluster are defined as
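The formulas referred to in the two sentences above take the following standard form in the noise-clustering literature (a sketch; λ and δ as defined in the text):

```latex
% noise distance: scaled average of all point-to-prototype squared distances
\delta^{2} = \lambda \left[ \frac{1}{n\,c} \sum_{i=1}^{c}\sum_{k=1}^{n} d^{2}(x_k, v_i) \right]

% membership of x_k in the noise cluster: the residual membership
u_{(c+1)k} = 1 - \sum_{i=1}^{c} u_{ik}
```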

The objective function is given by (Krishna, 1998)

and the memberships may be computed as

where i = 1, …, c; k = 1, …, n; the resolution parameter δ > 0; and the weighting exponent m > 1.

n = rows × columns (the image size). The distances are defined as

for all k and i = 1 to (c − 1). Here v_i denotes the mean vector of each class and can be defined as
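The membership and mean-vector (cluster-center) updates referred to here are, in the standard NC formulation (a sketch; the paper's own equation numbering is not reproduced):

```latex
% membership of x_k in good cluster i; the \delta term is the noise cluster's share
u_{ik} = \left[ \sum_{j=1}^{c}\left(\frac{d_{ik}^{2}}{d_{jk}^{2}}\right)^{\frac{1}{m-1}}
        + \left(\frac{d_{ik}^{2}}{\delta^{2}}\right)^{\frac{1}{m-1}} \right]^{-1}

% mean vector (center) of class i: membership-weighted mean of all pixels
v_{i} = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_{k}}{\sum_{k=1}^{n} u_{ik}^{m}}
```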

Execution Steps of the NC Algorithm

This execution model is based upon a standard K-means-type algorithm.

Step 1. Fix the number of clusters c and the weighting exponent m > 1 (m = 2.9 was used here). Select the initial locations of the cluster centers and specify the noise cluster distance δ.

Step 2. Generate a (new) partition using equation (5).

Step 3. Calculate new cluster centers using equation (7).

Step 4. If the cluster partition is stable, stop; else go to Step 2.
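The execution steps above can be sketched in Python (a minimal illustration, not the SMIC implementation; the function name, the synthetic data, and the parameter values are assumptions for this sketch):

```python
import numpy as np

def noise_clustering(X, c, m=2.0, delta=1.0, max_iter=100, tol=1e-5, init_idx=None):
    """Minimal sketch of the NC execution steps: c good clusters plus one
    noise cluster at a constant distance delta from every point."""
    n = X.shape[0]
    idx = list(range(c)) if init_idx is None else init_idx
    V = X[idx].astype(float).copy()                        # Step 1: initial cluster centers
    expo = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # squared Euclidean distances of every point to every center
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        inv = (1.0 / d2) ** expo
        denom = inv.sum(axis=1) + (1.0 / delta ** 2) ** expo   # noise-cluster term
        U = inv / denom[:, None]                           # Step 2: generate new partition
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]       # Step 3: new cluster centers
        if np.abs(V_new - V).max() < tol:                  # Step 4: stop when stable
            V = V_new
            break
        V = V_new
    u_noise = 1.0 - U.sum(axis=1)                          # residual membership = noise cluster
    return U, u_noise, V

# two compact clusters plus one gross outlier (synthetic data for illustration)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2)),
               [[50.0, 50.0]]])
U, u_noise, V = noise_clustering(X, c=2, delta=2.0, init_idx=[0, 20])
print(u_noise[-1] > 0.9)   # prints True: the outlier lands almost entirely in the noise cluster
```

Because the outlier's squared distance to both good prototypes far exceeds δ², its good-cluster memberships stay near zero, exactly the behaviour the text describes.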

Noise Clustering with Entropy Algorithm

Recently many researchers have been working on cluster analysis as a main tool to solve problems related to satellite image classification, data analysis, and data mining. An old and still most popular method is K-means, which uses K cluster centers; a group of data gathered around a cluster center forms a cluster, which in turn provides the base for the noise clustering classifier. What we emphasize in this research is a family of algorithms using entropy, or entropy-regularized methods, which are less well known; we consider the entropy-based method to be another useful hybrid method of image classification (Binaghi et al., 1999).

The term entropy was first used by Rudolf Clausius to state the second law of thermodynamics. Though entropy is a simple term, many people find it difficult to understand its exact meaning. There are three important E's in the study of thermodynamics: energy, equilibrium, and entropy. Entropy was taken from the Greek word 'tropee', which means transformation. Fuzzy or soft classification outputs of images are obtained either in the form of class memberships or in the form of probabilities (Dunn, 1973; Bezdek, 1981). Such an idea of regularization has frequently been found in the formulation of ill-posed problems; a typical regularization is done by adding a regularizing function to the objective. The objective function of the Noise Clustering with Entropy (NCWE) classifier is

where the regularizing parameter ν > 0.
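A plausible form of the NCWE objective, following entropy-regularized fuzzy clustering (a sketch under that assumption, not the paper's exact equation):

```latex
J_{NCWE}(U,V) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}\, d_{ik}^{2}
              + \sum_{k=1}^{n} u_{(c+1)k}\,\delta^{2}
              + \nu \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ik}\log u_{ik}

% minimization under \sum_i u_{ik} = 1 yields Gibbs-type memberships
u_{ik} = \frac{e^{-d_{ik}^{2}/\nu}}{\sum_{j=1}^{c} e^{-d_{jk}^{2}/\nu} + e^{-\delta^{2}/\nu}}
```

In this form a larger ν spreads the memberships more evenly across classes, and hence raises the entropy, which is consistent with the trend reported in Tables 1 to 3.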

Soft accuracy assessment methods

For uncertainty visualization and evaluation of the classification results, the entropy criterion is proposed. This measure is expressed by the following Eq. (13).

The calculated entropy (Eq. (13)) is high when uncertainty is high, and low when it is low (Dehghan, 2006). This criterion can therefore visualize the pure uncertainty of the classification results.
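The per-pixel entropy criterion can be sketched as follows (a minimal illustration assuming base-2 logarithms; the function name and example vectors are assumptions):

```python
import math

def pixel_entropy(memberships):
    """Shannon entropy (in bits) of one pixel's class-membership vector:
    0 for a crisp pixel, log2(c) for a maximally uncertain one."""
    return sum(-u * math.log2(u) for u in memberships if u > 0)

crisp = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # pixel fully assigned to one class
mixed = [1.0 / 6] * 6                      # equal membership in all six classes
print(pixel_entropy(crisp))                # prints 0.0
print(round(pixel_entropy(mixed), 3))      # prints 2.585, i.e. log2(6)
```

For six classes the maximum value is log2(6) ≈ 2.585, which matches the upper bound used in the results section.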

Study area and Data used

The study area for the present research work belongs to Sitarganj Tehsil, Udham Singh Nagar District, Uttarakhand, India, located in the southern part of the state. In terms of geographic latitude/longitude, the area extends from 28°52'29"N to 28°54'20"N and 79°34'25"E to 79°36'34"E. The area consists of agricultural farms, with sugarcane and paddy as the major crops, and two reservoirs, namely the Dhora and Bhagul reservoirs. The images for this research work have been taken from three different sensors, namely AWiFS, LISS-III, and LISS-IV, on board the IRS-P6 satellite.

Methodology

All three datasets (AWiFS, LISS-III, and LISS-IV) were geometrically corrected with an RMSE of less than 1/3 of a pixel and resampled using the nearest-neighbour method at 60 m, 20 m, and 5 m spatial resolution respectively, to maintain the correspondence of a LISS-III pixel with a specific number of LISS-IV pixels (here 4 LISS-IV pixels correspond to 1 LISS-III pixel) with respect to sampling during accuracy assessment. The flow chart of the methodology adopted is shown in Fig. 1.

Figure 1. Methodology adopted. Images from the AWiFS, LISS-III, and LISS-IV sensors are pre-processed (image-to-image registration and atmospheric corrections); the resolution parameter δ and regularizing parameter ν are optimized; training data are deployed on the AWiFS and LISS-III images, which are classified using NC with and without entropy; fraction images of all six classes and one noise class are generated using LISS-IV as a reference; and accuracy is assessed using entropy and image-to-image comparison with the FERM, SCM, MIN, LEAST, and PROD operators.

The coding of the NC and NCWE classifiers is based on the mathematical expressions in Eqs. (4) and (8) respectively. Both algorithms have been implemented in supervised soft classification mode. Training data can be incorporated as pure pixels or mixed pixels using different weighted norms. The accuracy of the soft classifier is checked using entropy.

The six classes of interest, namely agriculture land with crop, Sal forest, Eucalyptus plantation, agriculture dry land without crop, agriculture moist land without crop, and water body, have been used for this study. Training data were collected with the help of field data, and testing was conducted by taking 100 samples per class, with a total of 600 samples selected randomly. The satellite image of the study area is shown in Fig. 2.

Figure 2. Location of study area (AWiFS, LISS-III, and LISS-IV images)

In the first part of this research work an attempt was made to find the optimum values of the resolution parameter δ and the regularizing parameter ν for the NCWE classifier. To investigate the effect of uncertain pixels in this classifier, the fixed optimized resolution parameter δ = 10⁶ is used and the regularizing parameter ν is varied from 0.01 to 10⁹ for all six land cover classes mentioned above. Ambiguity and uncertainty are among the major issues in the classification of remote sensing data; the estimation of uncertainty in the classification results is important and necessary to evaluate the performance of any classifier. This study addresses the evaluation of entropy, based on the NCWE classifier, which estimates the uncertainty in classification results using Eq. (13). Across varying spatial resolutions of the classification and reference soft outputs, entropy gives a true reflection of the uncertainty ratio among the various classes. The uncertainty criteria have been estimated from the entropy computed from the actual output of the classifier (Verhoeye and Robert, 2000).

Results and discussions

According to Kumar et al. (2006), to evaluate the accuracy of a fuzzy classified map it is necessary to use soft reference data, for the following reasons:

• Generating a soft reference dataset from ground observation is time-consuming and costly.

• It is difficult to locate the area of a 60 × 60 m pixel (of AWiFS) on the ground exactly.

• Due to the remoteness of the area, a field visit is not always possible and realistic.

• Reference data generated from a field survey may also contain some errors (Foody, 2002).

It is therefore necessary to adopt an effective way of generating soft reference data from an available fine-resolution dataset (Kumar and Ghosh, 2007). In this study LISS-IV data were used to generate the soft reference data.

Accuracy assessment via Entropy of NCWE Classifier

Vagueness is a major concern in the classification of remote sensing data. The uncertainty estimation of the classification results is important and necessary to evaluate the classifier performance. This study addresses the evaluation of entropy, based on the NCWE classifier, which estimates uncertainty in classification results. Across varying spatial resolutions of the classification and reference sub-pixel outputs, entropy gives a true reflection of the uncertainty ratio among the various classes. The uncertainty criteria have been estimated from the entropy computed from the actual output of the classifier. For setting the optimized value of δ, a number of experiments were conducted individually for this classifier by varying δ from 1.0 to 10⁹. It has been observed from Tables 1, 2, and 3 that for homogeneous classes such as agriculture land with crop, agriculture dry land without crop, agriculture moist land without crop, and water body the optimized value of δ for the NCWE classifier is 10⁶. Similarly, for heterogeneous classes such as Sal forest and Eucalyptus plantation, the optimized value of δ for the NCWE classifier is also 10⁶. These findings suggest that, using these optimized values of the resolution parameter δ for the NCWE classifier on homogeneous and heterogeneous land cover classes, the computed entropy varies within the range [0, 3], as shown in Tables 1 to 3; that is, the information uncertainty does not exceed a value of 3. In this research entropy has been used to measure accuracy in terms of uncertainty without using any kind of ground reference data; the classification accuracy is measured directly by entropy. The maximum entropy of a satellite image with six land cover classes, computed using Eq. (13), is 6 × (−1/6 × log₂(1/6)) = 2.585 (Stein and Gorte, 2002). If the computed entropy values of the classified images lie within this range, then indirectly this reflects better classification results. Tables 1, 2, and 3 show the AWiFS, LISS-III, and LISS-IV entropies of the NCWE classifier for the six land cover classes; the entropy values lie approximately within the specified range as the resolution parameter varies from 1.0 to 10⁹.

For setting the optimized value of ν, a number of experiments were conducted individually for this classifier by varying ν from 0.01 to 10⁹. It has been observed from Tables 1, 2, and 3 that for homogeneous classes such as agriculture land with crop, agriculture dry land without crop, agriculture moist land without crop, and water body the optimized value of ν for the NCWE classifier is 0.08. Similarly, for heterogeneous classes such as Sal forest and Eucalyptus plantation, the optimized value of ν is also 0.08. These findings suggest that, using this optimized value of the regularizing parameter ν for the NCWE classifier on homogeneous and heterogeneous land cover classes, the computed entropy varies within the range [0, 3], as shown in Tables 1 to 3; that is, the information uncertainty does not exceed a value of 3. In this research entropy has been used to measure accuracy in terms of uncertainty without using any kind of ground reference data; the classification accuracy is measured directly by entropy. If the computed entropy values of the classified images lie within this range, then indirectly this reflects better classification results. Tables 1, 2, and 3 show the AWiFS, LISS-III, and LISS-IV entropies of the NCWE classifier for the six land cover classes; the entropy values lie approximately within the specified range when the resolution parameter δ is fixed and the regularizing parameter ν varies from 0.01 to 10⁹.

Table 1: AWiFS entropy of various land cover classes from the NCWE classification output (δ = 10⁶, varying regularizing parameter ν)

| ν    | Agriculture land with crop | Sal forest | Eucalyptus plantation | Agriculture dry land without crop | Agriculture moist land without crop | Water body |
|------|------|------|------|------|------|------|
| 0.01 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.02 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.03 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.04 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.08 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.09 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.1  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.2  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.3  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.5  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.6  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.7  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.8  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.9  | 0    | 0    | 0    | 0    | 0    | 0    |
| 1    | 0    | 0    | 0    | 0    | 0    | 0    |
| 10   | 0    | 0    | 0    | 0    | 0    | 0    |
| 10²  | 0.06 | 0.06 | 0    | 0    | 0    | 0    |
| 10³  | 0.54 | 0.57 | 0.06 | 0    | 0    | 0    |
| 10⁴  | 1.19 | 1.26 | 0.79 | 0.06 | 0.06 | 0.06 |
| 10⁵  | 1.92 | 1.87 | 1.82 | 1.11 | 1.92 | 1.23 |
| 10⁶  | 2.55 | 2.55 | 2.51 | 2.57 | 2.61 | 2.72 |
| 10⁷  | 2.80 | 2.80 | 2.80 | 2.57 | 2.57 | 2.96 |
| 10⁸  | 2.96 | 2.93 | 2.92 | 2.92 | 2.93 | 2.95 |
| 10⁹  | 2.94 | 2.94 | 2.94 | 2.94 | 2.94 | 2.94 |

Table 2: LISS-III entropy of various land cover classes from the NCWE classification output (δ = 10⁶, varying regularizing parameter ν)

| ν    | Agriculture land with crop | Sal forest | Eucalyptus plantation | Agriculture dry land without crop | Agriculture moist land without crop | Water body |
|------|------|------|------|------|------|------|
| 0.01 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.02 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.03 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.04 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.08 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.09 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.1  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.2  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.3  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.5  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.6  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.7  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.8  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.9  | 0    | 0    | 0    | 0    | 0    | 0    |
| 1    | 0    | 0    | 0    | 0    | 0    | 0    |
| 10   | 0    | 0    | 0    | 0    | 0    | 0    |
| 10²  | 0.06 | 0.41 | 0.06 | 0    | 0    | 0    |
| 10³  | 0.97 | 1.09 | 0.54 | 0    | 0    | 0.06 |
| 10⁴  | 1.66 | 1.57 | 1.53 | 0.06 | 0.85 | 0.02 |
| 10⁵  | 2.60 | 2.44 | 1.97 | 2.37 | 2.58 | 2.36 |
| 10⁶  | 2.58 | 2.53 | 2.51 | 2.54 | 2.59 | 2.60 |
| 10⁷  | 2.79 | 2.81 | 2.79 | 2.56 | 2.55 | 2.95 |
| 10⁸  | 2.95 | 2.95 | 2.92 | 2.92 | 2.92 | 2.94 |
| 10⁹  | 2.94 | 2.97 | 2.94 | 2.94 | 2.94 | 2.94 |

Table 3: LISS-IV entropy of various land cover classes from the NCWE classification output (δ = 10⁶, varying regularizing parameter ν)

| ν    | Agriculture land with crop | Sal forest | Eucalyptus plantation | Agriculture dry land without crop | Agriculture moist land without crop | Water body |
|------|------|------|------|------|------|------|
| 0.01 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.02 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.03 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.04 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.08 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.09 | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.1  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.2  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.3  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.5  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.6  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.7  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.8  | 0    | 0    | 0    | 0    | 0    | 0    |
| 0.9  | 0    | 0    | 0    | 0    | 0    | 0    |
| 1    | 0    | 0    | 0    | 0    | 0    | 0    |
| 10   | 0    | 0    | 0    | 0    | 0    | 0    |
| 10²  | 0.06 | 0.06 | 0.06 | 0    | 0    | 0    |
| 10³  | 0.93 | 0.95 | 0.25 | 0.06 | 0.06 | 0.06 |
| 10⁴  | 1.65 | 1.62 | 1.53 | 0.07 | 1.07 | 0.21 |
| 10⁵  | 2.57 | 2.48 | 1.95 | 2.35 | 2.55 | 2.34 |
| 10⁶  | 2.57 | 2.52 | 2.50 | 2.53 | 2.58 | 2.58 |
| 10⁷  | 2.79 | 2.79 | 2.79 | 2.56 | 2.55 | 2.95 |
| 10⁸  | 2.95 | 2.92 | 2.92 | 2.92 | 2.92 | 2.94 |
| 10⁹  | 2.94 | 2.94 | 2.94 | 2.94 | 2.94 | 2.94 |

Conclusion

In this research work an attempt has been made to generate fraction outputs from the entropy-based NC classifier. These outputs have been generated from the AWiFS, LISS-III, and LISS-IV images of IRS-P6 data. The assessment of entropy and the identification of membership values from the fraction images are shown in Tables 1-3. Entropy is an absolute uncertainty indicator, whereas the identification-of-membership approach is manual. Uncertainty is intrinsic to spatial data and generally refers to error, inexactness, fuzziness, and ambiguity; the objective of this research on spatial data is to investigate how uncertainties arise and propagate in spatial data. In the area of remote sensing, decision making is generally not deterministic, due to the fuzziness involved in the classification of remotely sensed imagery, and fuzzy-based classifiers are becoming increasingly popular in remote sensing classification. Owing to the wide acceptance of the hybrid approach, the NCWE classifier is used in this research to evaluate the performance of the classified images with optimized values of the resolution parameter δ and regularizing parameter ν. The results show that the entropy value with minimum uncertainty occurs at the optimized values δ = 10⁶ and ν = 0.08, as shown in Tables 1-3. Entropy and membership verification are taken as indirect measures of the accuracy of the classified image. From this work it can be concluded that a fuzzy-based hybrid approach using the entropy-based NC classifier (NCWE), with optimum values of the resolution and regularizing parameters, generates classified output with minimum uncertainty.
