Reviewing The Problems Of Classic Segmentation Cultural Studies Essay




Image segmentation is a classic inverse problem that consists of achieving a compact, region-based description of an image scene by decomposing it into meaningful or spatially coherent regions sharing similar attributes. This low-level vision task is often the preliminary (and crucial) step in many video and computer vision applications, such as object localization or recognition, data compression, tracking, image retrieval, or scene understanding.

Often, an object that cannot be extracted from gray levels can be extracted using color information. Monochromatic segmentation techniques are generally extended to color images, but all of these techniques have both advantages and drawbacks. Most segmentation methods combine classic techniques with fuzzy logic, neural networks, genetic algorithms, or similar approaches.

Clustering is the search for distinct groups in the feature space. These groups are expected to have different structures and to be clearly distinguishable from one another. The clustering task separates the data into a number of partitions, which are volumes in the n-dimensional feature space. These partitions define a hard boundary between the different groups and depend on the functions used to model the data distribution.

Because of their simplicity and efficiency, clustering approaches were among the first techniques used for the segmentation of (textured) natural images [1]. After the selection and extraction of the image features [usually based on color and/or texture and computed on (possibly) overlapping small windows centered around the pixel to be classified], the feature samples, handled as vectors, are grouped into compact but well-separated clusters corresponding to each class of the image. The set of connected pixels belonging to each estimated class then defines the different regions of the scene. A popular heuristic for K-means clustering is Lloyd's algorithm [2].


Years of research in segmentation have demonstrated that significant improvements in the final segmentation results may be achieved by using notably more sophisticated feature selection procedures; more elaborate clustering techniques (sometimes involving a mixture of different or non-Gaussian distributions for the multidimensional texture features [3], [4]); prior distributions on the labels, region processes, or the number of classes [5], [6], [8]; and finally (in the case of energy-based segmentation models) more costly optimization techniques.

Color image quantization [7] is the process of reducing the number of colors in a digital color image, and may be used to compress the image information. The choice of color space matters here: in an ill-suited space, splitting each cluster orthogonally to its major axis leads to poorer results, and the fixed-axis heuristic, which splits clusters orthogonally to their coordinate axis of greatest variance, gives worse results still. The difficulty thus lies in determining a good color space, compounded by the fact that several different color spaces may yield equally good or better results.

The choice of a color model [10] is of great importance for many computer vision algorithms (e.g., feature detection, object recognition, and tracking), as the chosen color model induces the equivalence classes available to the algorithm. Moreover, no color space can be considered universal, since color can be interpreted and modeled in different ways. The problem of selecting the color model that produces the best result for a particular computer vision task is complicated by the facts that several color spaces may be equally good candidates and that different color channels may have similar properties. A proper weighting scheme is therefore required to combine color spaces or color channels. A fusion algorithm allows features coming from very different domains to be combined, making it possible to obtain an optimal balance between repeatability and distinctiveness.


The proposed segmentation approach is conceptually different and explores a new strategy. Instead of designing a more elaborate segmentation model of textured natural images, this technique explores the alternative of blending (i.e., efficiently combining) several segmentation maps associated with simpler segmentation models in order to obtain a final reliable and accurate segmentation result. More precisely, this work proposes a fusion framework that fuses several K-means clustering results (using as simple cues the values of the requantized color histogram estimated around the pixel to be classified) applied to an input image expressed in different color spaces. These different label fields are then fused together by a simple K-means clustering technique using as input features the local histograms of the class labels previously estimated and associated with each initial clustering result. It is demonstrated that the proposed blended method, while simple and fast, performs competitively and often better (in terms of visual evaluation and quantitative performance measures) than the best existing state-of-the-art segmentation methods. The general block diagram of the segmentation is given below in Figure 1.

Fig. 1 Block diagram: segmentation in different color spaces


K-means (MacQueen, 1967) is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.


The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different initial locations lead to different results; the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to the data set and associate it with the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point, k new centroids are recalculated as the barycenters of the clusters resulting from the previous step. After these k new centroids are obtained, a new binding is done between the same data set points and the nearest new centroid, generating a loop. As a result of this loop, the k centroids change their location step by step until no more changes occur; in other words, the centroids no longer move.

Finally, this algorithm aims at minimizing an objective function, in this case a squared-error function. The objective function is

\[ J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \]

where \( \| x_i^{(j)} - c_j \|^2 \) is a chosen distance measure between a data point \( x_i^{(j)} \) and the cluster centre \( c_j \), and \( J \) is an indicator of the distance of the n data points from their respective cluster centres.


The algorithm is composed of the following steps:

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
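The steps above can be sketched in plain Python as follows. This is a minimal illustrative version of Lloyd's algorithm for 2-D points (the function name and the random initialization are illustrative choices, not taken from the paper):

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm for K-means.

    points: list of equal-length tuples (feature vectors); k: number of clusters.
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Step 1: place K initial centroids among the data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda j: sum((p - c) ** 2
                                        for p, c in zip(pt, centroids[j])))
                  for pt in points]
        # Step 3: recompute each centroid as the barycenter of its cluster.
        new_centroids = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members)
                                           for c in zip(*members)))
            else:  # keep the old centroid if a cluster emptied
                new_centroids.append(centroids[j])
        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated 2-D blobs should end up in different clusters.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 5.0), (5.1, 4.9)]
cents, labs = kmeans(data, k=2)
```

In practice the segmentation described here would run this on the 125-dimensional histogram descriptors rather than 2-D points; the algorithm is unchanged.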


The initial segmentation maps, which will then be blended together by this fusion framework, are simply given by a K-means [2] clustering technique applied to an input image expressed in different color spaces, using as simple cues (i.e., as input multidimensional feature descriptor) the set of values of the requantized color histogram (with equidistant binning) estimated around the pixel to be classified. This local histogram is equally requantized (for each of the three color channels) into an Nb-bin descriptor, computed on an overlapping squared fixed-size neighborhood centered around the pixel to be classified. This estimation can be quickly computed by using a more coarsely requantized color space and then computing the bin index that represents each requantized color (Fig. 2).

Fig. 2 Estimation, for each pixel x, of the Nb = q³ bin descriptor (q = 5) in the RGB color space.

The RGB color cube is first divided into Nb = q³ equal-sized smaller boxes (or bins). Each (Rx, Gx, Bx) color value associated with each pixel contained in a (squared) neighborhood region (of size Nw x Nw) centered at x increments (+1) a particular bin. The set of bin values represents the (non-normalized) bin descriptor. All values of this Nb-bin descriptor are then divided by Nw x Nw in order to ensure that they sum to one.
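This descriptor computation can be sketched in a few lines of Python. The function below is an illustrative reconstruction under the stated assumptions (8-bit channels in 0..255, q coarse levels per channel, Nb = q³ bins), not the authors' code:

```python
def local_color_histogram(pixels, q=5):
    """Requantized color histogram descriptor for one pixel neighborhood.

    pixels: list of (R, G, B) tuples (0..255) from the Nw x Nw window
    centered at the pixel to be classified.
    Returns a normalized Nb = q**3 bin descriptor summing to one.
    """
    bins = [0.0] * (q ** 3)
    for r, g, b in pixels:
        # Map each 8-bit channel value to one of q coarse levels (0..q-1).
        ri = min(r * q // 256, q - 1)
        gi = min(g * q // 256, q - 1)
        bi = min(b * q // 256, q - 1)
        # Increment the bin indexed by the requantized color.
        bins[ri * q * q + gi * q + bi] += 1.0
    n = len(pixels)  # Nw * Nw
    return [v / n for v in bins]

# A uniformly red 3x3 window: all mass falls in a single bin.
window = [(255, 0, 0)] * 9
desc = local_color_histogram(window, q=5)
```

With q = 5 this yields the 125-bin descriptor used later in the text.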

Here a texton, a repetitive character or element of a textured image (also called a texture primitive), is characterized by a mixture of colors or, more precisely, by the values of the requantized local color histogram. This model, while being robust to noise and local image transformations, is also simple to compute, allows significant data reduction, and has already demonstrated its efficiency in tracking applications [9].

Figure 3 Primary Segmentation (PS) Phase

Finally, these (125-bin) descriptors are grouped into different clusters (corresponding to each class of the image) by the classical K-means algorithm with the classical Euclidean distance. This simple segmentation strategy of the input image into classes is repeated for different color spaces, which can be viewed as different image channels provided by various sensors or captors.


Segmentations provided by Ns = 6 color spaces, C = {RGB, HSV, YIQ, XYZ, LAB, LUV} [1], [14], are used. The initial segmentations to be fused could equally result from the same simple model applied to an input image filtered by a filter bank (e.g., a bank of Gabor filters [13] or any other 2-D decomposition of the frequential space), or be provided by different segmentation models, or by different segmentation results produced by different seeds of the same stochastic segmentation model.

The final fusion procedure is more reliable because the interesting properties of each color space have been taken into account. For example, RGB is the optimal one for tracking applications [12], as it is an additive color system based on trichromatic theory, although it is nonlinear with respect to visual perception. HSV is more apt to decouple chromatic information from shading effects [4].

The YIQ color channels have the property of coding the luminance and chrominance information, which is useful in compression applications (both digital and analogue); this system is also intended to take advantage of human color characteristics. XYZ, although nonlinear in terms of linear-component color mixing, has the advantage of being more psychovisually linear.

The LAB color system approximates human vision, and its L component closely matches human perception of lightness [1].

The LUV components provide a Euclidean color space yielding a perceptually uniform spacing of color, approximating a Riemannian space [13]. The proposed blended technique tries to efficiently combine these properties.
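To illustrate how a single pixel can be re-expressed in several of these color spaces, Python's standard colorsys module provides the RGB-to-HSV and RGB-to-YIQ conversions (XYZ, LAB, and LUV require an external library such as scikit-image, not shown here). This is just a sketch of the per-pixel re-expression step, not the authors' pipeline:

```python
import colorsys

# One pure-red pixel, with channels normalized to [0, 1] as colorsys expects.
r, g, b = 1.0, 0.0, 0.0

h, s, v = colorsys.rgb_to_hsv(r, g, b)   # hue / saturation / value
y, i, q = colorsys.rgb_to_yiq(r, g, b)   # luminance / chrominance

# The same pixel thus yields one feature representation per color space:
features = {"RGB": (r, g, b), "HSV": (h, s, v), "YIQ": (y, i, q)}
```

Running the K-means segmentation of the previous section once per such representation produces the Ns label fields to be fused.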


The key idea of the proposed blending procedure simply consists of considering, for each site (or pixel to be classified), the local histogram of the class (or texton) labels of each segmentation to be fused, computed on a squared fixed-size Nw neighborhood centered around the pixel, as the input feature vector of a final clustering procedure. For a fusion of Ns segmentations with K1 classes into a segmentation with K2 classes, the preliminary feature-extraction step of this fusion procedure thus yields Ns (K1-bin) histograms, which are then gathered together to form a K1 x Ns dimensional feature vector (a final bin histogram), which is normalized to sum to one so that it is also a probability distribution function.
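This feature-extraction step can be sketched as follows (pure Python; the toy label windows and function names are illustrative, with K1 = 3 classes and Ns = 2 segmentations):

```python
def label_histogram(labels, k1):
    """Normalized K1-bin histogram of class labels in one local window."""
    h = [0.0] * k1
    for lab in labels:
        h[lab] += 1.0
    return [v / len(labels) for v in h]

def fusion_feature(windows, k1):
    """Concatenate the Ns local label histograms into one
    (K1 x Ns)-dimensional vector, then renormalize it to sum to one."""
    concat = []
    for w in windows:
        concat.extend(label_histogram(w, k1))
    s = sum(concat)
    return [v / s for v in concat]

# Two toy 2x2 windows of class labels (one per initial segmentation).
seg1_window = [0, 0, 1, 2]
seg2_window = [1, 1, 1, 0]
vec = fusion_feature([seg1_window, seg2_window], k1=3)
```

Each pixel therefore contributes one such normalized vector, and it is these vectors that the final clustering operates on.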

The proposed blending procedure is then simply considered as a problem of clustering local histograms of class labels computed around, and associated with, each site. To this end, we use, once again, a K-means clustering procedure exploiting, for this fusion step, a histogram-based similarity measure derived from the Bhattacharyya similarity coefficient (Fig. 4). The Bhattacharyya distance between a normalized histogram h and a reference histogram h* is given by

\[ D_B(h, h^{*}) = \left( 1 - \sum_{i=1}^{N_b} \sqrt{h_i \, h^{*}_i} \right)^{1/2} \]
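A minimal pure-Python sketch of this measure, the Bhattacharyya coefficient and the distance derived from it, applied to normalized histograms:

```python
import math

def bhattacharyya_coefficient(h, href):
    """Bhattacharyya similarity coefficient between two normalized histograms:
    the sum over bins of the square root of the product of bin values."""
    return sum(math.sqrt(a * b) for a, b in zip(h, href))

def bhattacharyya_distance(h, href):
    """Distance derived from the coefficient: 0 for identical histograms,
    1 for histograms with disjoint support."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(h, href)))

h1 = [0.5, 0.5, 0.0]
h2 = [0.5, 0.5, 0.0]
h3 = [0.0, 0.0, 1.0]
```

In the fusion step this distance would replace the Euclidean distance inside the K-means assignment, which is a natural fit since the feature vectors are probability distributions.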

The pre-estimated label fields to be fused, together with the fusion procedure, can be viewed as a two-step hierarchical segmentation procedure in which, first, a texton map is estimated and, second, a final clustering taking into account this mixture of textons is performed.

Fig. 4 shows an example of the clustering segmentation model presented in Section IV applied to an input image expressed in the RGB, HSV, YIQ, XYZ, LAB, and LUV color spaces, together with the final segmentation map resulting from the fusion of these clusterings. None of the individual segmentations can be considered reliable, whereas the final fused result visually identifies the different objects of the scene quite faithfully.