ROI Extraction from Geographical Map Images

Images are one of the most important media for conveying information; a single image can make a far stronger impression than hundreds of lines of text. Understanding images and extracting information from them so that it can be used for other tasks is an important aspect of machine learning. For example, a system that separates the road network from a map image can help a vehicle choose the optimum path from source to destination. Color image segmentation [1], whose purpose is to decompose an image into meaningful partitions, is widely applied in multimedia analysis.

A map is a visual representation of an area: a symbolic depiction highlighting relationships between elements of that space, such as objects, regions, and themes. A city map includes not only the city's transport network but also other important information, such as city sights or public institutions. Extracting a Region of Interest (ROI), such as the transportation network, from the overlapping objects of a city map is a challenging problem in color image analysis. City map analysis is especially complex because there are many types of data, various types of lines, and possible curvature and even branching of the graphics.

Clustering is a feature-space method of color image segmentation. It classifies pixels into different groups in a predefined color space. K-Means clustering is popular for its simplicity of implementation and is commonly applied for grouping pixels in images. The choice of color space can have a significant influence on the result of image segmentation. There are many color spaces, including RGB, YCbCr, YUV, HSV, CIE L*a*b*, and CIE L*u*v*. Although RGB, YCbCr, and YUV are commonly used in raw data and coding standards, they are not close to human perception. Unlike the RGB and CMYK color models, L*a*b* color is designed to approximate human vision. It aspires to perceptual uniformity, and its L* component closely matches the human perception of lightness. It can thus be used to make accurate color balance corrections by modifying output curves in the a* and b* components, or to adjust the lightness contrast using the L* component.

Previous Work

Researchers have previously addressed the separation of text from maps. Fletcher and Kasturi [3] developed an algorithm for text string separation from mixed text/graphics images. Taking account of curved road-name labels in the map, Tan and Ng [4] developed a system using a pyramid to extract text strings. Both methods, however, assume that the text does not touch or overlap the graphics. Handling lines that touch text is important for engineering drawings [9-11], and especially for maps [12]. This problem is much more complex since there are more types of data, various types of lines, and possible curvature and even branching of the graphics. The approach in [5] introduced an algorithm for extracting text from binary images. Regions that may contain characters are selected using maximum connected components, and the characters are merged into words using the Hough transform. However, it does not support overlapping text and graphics elements, which is a common problem in topographic maps. In [11], the authors developed a model to extract black characters in city maps. They use street-line data to recover the characters composing street names. Because street names may overlap the street lines describing the street, they developed a specific OCR algorithm to produce an efficient text file of street names.

More recently, a text/graphics separation method applied to color thematic maps was described in [10, 9, 15]. The authors describe a segmentation technique based on 24 images obtained by combinations of color components (R, G, B, (R+B)/2, etc.). The binary image constructed from the mix of these 24 images is then used in an OCR-based recognition system with neural networks.

The ROI Extraction Process from the Map

A color map contains several overlapping objects with different colors, shapes, and sizes. The ROI here is the road network and the waterway. The extraction process identifies the maximum connected road network and extracts the waterway. This separation process is depicted in figure 1 and explained in the following sections.

Data Resources and Software used

For this research we used the K-Means clustering technique for color-based image segmentation. The images are of parts of Dhaka city, the capital of Bangladesh. The entire work was carried out using MatLab R2010a, Adobe Photoshop 7.0, and MS-Office.

Image Acquisition

The first stage of any vision system is image acquisition. Color images of scenes and objects can be captured on photographic film by conventional cameras, on video tape by video cameras, and on magnetic disk or solid-state memory card by digital cameras; digital color images can also be digitized from film or paper by scanners. After an image has been obtained, various processing methods can be applied to it to perform the many different vision tasks required today. The images used in this paper were acquired from Google Maps in RGB format.

Figure: Flowchart of ROI extraction from the map image

RGB to L*a*b* Conversion

As the next step in the processing, the measured RGB values in the map image are converted to CIE L*a*b*. Here L* describes lightness; its value ranges from 0 (black) to 100 (white). The other two variables describe the actual color, with a* representing green (negative values) or red (positive values) and b* representing blue (negative values) or yellow (positive values). The conversion from RGB to the L*a*b* system is done via an intermediate step, by translating RGB values into CIE XYZ values. The standard conversion from RGB to XYZ uses the following equation [Rogers, 1985]:

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
X_R C_R & X_G C_G & X_B C_B \\
Y_R C_R & Y_G C_G & Y_B C_B \\
Z_R C_R & Z_G C_G & Z_B C_B
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{1}
\]
Here X, Y, Z are the CIE tristimulus values of a color; R, G, B are the red, green, and blue channels of the color as measured in the image; \(X_R\), \(Y_R\), \(Z_R\), and so on are the chromaticities of the camera's RGB primaries; and \(C_R\), \(C_G\), \(C_B\) are scaling constants for R, G, and B.

However, for computational purposes, the terms (XR · CR), (YR · CR), (ZR · CR), and so on are taken together as a single unknown constant each, giving

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
c_{11} & c_{12} & c_{13} \\
c_{21} & c_{22} & c_{23} \\
c_{31} & c_{32} & c_{33}
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{2}
\]
where \(c_{11} = X_R C_R\), \(c_{12} = X_G C_G\), and so on.
This standard conversion implies that the lines calculated for X, Y, and Z intersect the origin of the axes, i.e., that pure black has values equal to, or very close to, (0, 0, 0) in both the XYZ and RGB vector spaces.

In practice, however, pure black need not map exactly to the origin, so an offset constant is added for each axis:
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
c_{11} & c_{12} & c_{13} \\
c_{21} & c_{22} & c_{23} \\
c_{31} & c_{32} & c_{33}
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \tag{3}
\]
In this case, information about four colors is needed to solve for the constants (i.e. the red, green, blue, and gray chips). Writing out the matrix multiplication and rearranging the sets of linear equations yields the following for the X-values of the four color chips used:

\[
\begin{aligned}
X_r &= c_{11} R_r + c_{12} G_r + c_{13} B_r + a_1 \\
X_g &= c_{11} R_g + c_{12} G_g + c_{13} B_g + a_1 \\
X_b &= c_{11} R_b + c_{12} G_b + c_{13} B_b + a_1 \\
X_{gr} &= c_{11} R_{gr} + c_{12} G_{gr} + c_{13} B_{gr} + a_1
\end{aligned} \tag{4}
\]

Here, \(R_r\), \(G_r\), and \(B_r\) are the measured channels, and \(X_r\), \(Y_r\), and \(Z_r\) the known XYZ values, for the red chip; subscripts g (green), b (blue), and gr (gray) apply to the other three color chips used. Typically, values for the constants \(a_1\), \(a_2\), and \(a_3\) are found to range between 1.5 and 4.5, demonstrating that an offset from the origin of the X, Y, and Z axes is indeed present.
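As an illustration, the four-chip system in (4) can be solved numerically for the X-axis constants. The chip measurements and reference X values below are made-up placeholders, not real calibration data:

```python
import numpy as np

# Measured R, G, B channels of the four calibration chips
# (made-up placeholder values, not real measurements).
rgb_chips = np.array([
    [200.0,  40.0,  30.0],   # red chip
    [ 50.0, 180.0,  60.0],   # green chip
    [ 40.0,  50.0, 190.0],   # blue chip
    [120.0, 120.0, 120.0],   # gray chip
])

# Known X tristimulus values of the same chips (also placeholders).
x_known = np.array([41.2, 35.8, 18.0, 48.2])

# Each row is [R, G, B, 1]; the trailing 1 carries the offset a1.
A = np.hstack([rgb_chips, np.ones((4, 1))])

# Four equations, four unknowns (c11, c12, c13, a1) -> direct solve.
c11, c12, c13, a1 = np.linalg.solve(A, x_known)

def estimate_x(r, g, b):
    # Equation (4) applied to an arbitrary pixel.
    return c11 * r + c12 * g + c13 * b + a1
```

The Y- and Z-axis constants are found the same way from the chips' known Y and Z values.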

Figure: RGB map image

Figure: L*a*b* of the map

The estimated XYZ values are then further converted to CIE L*a*b* values using the following equations (from Billmeyer and Saltzman [1981]):

\[ L^{*} = 116\, f(Y/Y_n) - 16 \tag{5} \]

\[ a^{*} = 500\, \left[ f(X/X_n) - f(Y/Y_n) \right] \tag{6} \]

\[ b^{*} = 200\, \left[ f(Y/Y_n) - f(Z/Z_n) \right] \tag{7} \]

with \(f(Y/Y_n) = (Y/Y_n)^{1/3}\) for \(Y/Y_n > 0.008856\) and \(f(Y/Y_n) = 7.787\,(Y/Y_n) + 16/116\) for \(Y/Y_n \le 0.008856\); \(f(X/X_n)\) and \(f(Z/Z_n)\) are defined similarly. \(X_n\), \(Y_n\), and \(Z_n\) are the tristimulus values of a reference white. For this study, the known XYZ values of the white color chip are used as reference values.
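Equations (5)-(7), including the piecewise definition of f, can be written directly in code. A minimal NumPy sketch (the white point in the comment below is the standard D65 white, used here only for illustration; in the study the white chip's XYZ values would be passed instead):

```python
import numpy as np

def f(t):
    # Cube root above the 0.008856 threshold, linear below it,
    # as in the piecewise definition accompanying (5)-(7).
    t = np.asarray(t, dtype=float)
    return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)

def xyz_to_lab(x, y, z, xn, yn, zn):
    """Equations (5)-(7): XYZ -> CIE L*a*b*, given a reference
    white (xn, yn, zn), e.g. the white chip's XYZ values."""
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b

# Sanity check: the reference white itself maps to L* = 100, a* = 0, b* = 0.
```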

K-Means Clustering in L*a*b* Color Space

Clustering is a way to separate groups of objects. K-means clustering treats each object as having a location in space. It finds partitions such that objects within each cluster are as close to each other as possible, and as far from objects in other clusters as possible.

The K-means algorithm is an iterative technique that is used to partition an image into K clusters. The basic algorithm is:

Pick K cluster centers, either randomly or based on some heuristic

Assign each pixel in the image to the cluster that minimizes the distance between the pixel and the cluster center.

Re-compute the cluster centers by averaging all of the pixels in the cluster

Repeat steps 2 and 3 until convergence is attained (e.g. no pixels change clusters)
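The four steps above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the MatLab code used in the study:

```python
import numpy as np

def kmeans(pixels, k, iters=100, seed=0):
    """Basic K-means over an (n, d) array of feature vectors,
    e.g. the (a*, b*) values of each pixel."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centers at random from the data.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    labels = np.full(len(pixels), -1)
    for _ in range(iters):
        # Step 2: assign each pixel to its nearest center (Euclidean).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # Step 4: no pixel changed cluster -> converged.
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers
```

For an image, `pixels` would be the flattened a*b* channels, reshaped back to the image dimensions after clustering.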

In this case, the distance is the squared or absolute difference between a pixel and a cluster center. The difference is typically based on pixel color, intensity, texture, location, or a weighted combination of these factors. Here the color information lies in the a*b* plane, so each ROI pixel is represented by its a* and b* values. K-means was used to cluster the pixels into three clusters, with distances measured by the Euclidean metric.

Figure: K-Means output

Connected component labeling of Image

Connected component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed [1, 2]. When integrated into an image recognition system or a human-computer interaction interface, connected component labeling can operate on a variety of information [3, 4]. Blob extraction is generally performed on the binary image resulting from a thresholding step; blobs may be counted, filtered, and tracked. We use this technique, together with pixel-value thresholding, to separate connected components in the K-Means clustering output. An 8-connected component algorithm is applied to label connected neighbors.
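An 8-connected labeling pass can be sketched as a breadth-first search over foreground pixels; this is an illustrative Python version, not the MatLab routine used in the study:

```python
from collections import deque

def label_8connected(binary):
    """Label 8-connected foreground regions in a 2D binary grid
    (list of lists of 0/1). Returns (labels, count); 0 = background."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not labels[r][c]:
                count += 1                      # found a new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    # All 8 neighbors, diagonals included.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count
```

Under 8-connectivity a purely diagonal chain of pixels forms a single component, whereas 4-connectivity would split it into separate pieces.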

Process of Road Network

After obtaining the segmented image containing the road network, several de-noising and reconstruction techniques are applied. After processing, the maximum connected road network is labeled.


To remove noise from the road-network image, we apply color-based thresholding: all roads are yellowish in color, and we remove the light-yellow colored regions from the image. The image is then converted into a binary image, which is fed to the reconstruction step to recover the missing road portions.
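The color threshold can be sketched as a simple per-channel test; the numeric bounds below are illustrative guesses, not the actual thresholds used for the Dhaka map images:

```python
import numpy as np

def road_mask(rgb):
    """Binarize an H x W x 3 RGB image, keeping yellowish road pixels.
    The bounds are illustrative, not the thresholds used in the study."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Yellow: strong red and green channels, comparatively weak blue.
    return (r > 180) & (g > 160) & (b < 120)
```

The resulting boolean array is the binary image passed on to the reconstruction step.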

Reconstruction of Road:

Because overlapping objects, text, and line segments were removed, some portions of the road network disappeared. We applied morphological operations with structuring elements to reconstruct the missing portions of the road.
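One common way to bridge small breaks with a structuring element is a morphological closing (dilation followed by erosion). The sketch below uses a square element and `np.roll`, which wraps at the image borders, so it is only suitable for gaps away from the edge; the element size is an assumed parameter:

```python
import numpy as np

def dilate(mask, size=3):
    """Binary dilation with a size x size square structuring element,
    implemented by OR-ing shifted copies of the mask."""
    out = np.zeros_like(mask)
    k = size // 2
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

def close_gaps(mask, size=3):
    # Closing = dilation then erosion; erosion of a mask is the
    # complement of the dilation of its complement.
    dilated = dilate(mask, size)
    return ~dilate(~dilated, size)
```

A one-pixel break in a road segment is filled by a 3 x 3 closing, while large background areas stay untouched.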

Maximum connected Road Network Identification:

After noise removal and reconstruction, we applied bounded-area labeling and identified the largest connected road area in the image.
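Selecting the maximum connected region from a labeled image amounts to keeping the most frequent nonzero label. A small NumPy sketch with a hypothetical label image:

```python
import numpy as np

def largest_component(labels):
    """Given an integer label image (0 = background), keep only the
    largest labeled region - here, the maximum connected road network."""
    values, counts = np.unique(labels[labels > 0], return_counts=True)
    return labels == values[counts.argmax()]

# Hypothetical label image: region 1 has 5 pixels, region 2 has 2.
labels = np.array([
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [1, 1, 0, 0],
])
road = largest_component(labels)
```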

Figure: Output of road network processing

Process of Waterway

The same process used for the road network is applied to identify the waterway in the map. In the map, water portions are colored blue. Smaller water-colored regions, such as ponds, small lakes, and ditches, are removed. Labeling is then applied to detect the boundary of the waterway.

Figure: Output of waterway processing


This paper makes two main contributions: an efficient ROI extraction method for the road network and one for the waterway. The image segmentation method includes color-based quantization in L*a*b* color space using K-Means clustering. The segmentation method efficiently extracts regions of different colors in images, and the results are close to human perception. In future developments, this output could be converted into graphs that an embedded system could process to drive a vehicle or vessel automatically from source to destination along an optimum path.