Study On Information Retrieval And Image Models Computer Science Essay



Nowadays, "image" is a familiar word, but a great many things fall under the heading of the image family. According to Narendra Ahuja and B. J. Schachter (December 1981), the image family includes maps, diagrams, pictures, statues, projections, poems, patterns, optical illusions, dreams, hallucinations, spectacles, memories, and even ideas. Calling all of these things images does not necessarily mean that they have something in common; a digital image, in any case, is an array of pixels [7]. Images are an important source of information in the World Wide Web era. Storage and communication technologies have advanced enormously in recent years, and stored information is no longer limited to text: images and videos also play a major role and account for a large share of it. [1]

This stored information is of little use unless it is handled effectively for searching and indexing. Large databases contain both textual and image information. Extracting and searching information from these databases in minimum time and with maximum recall is one of the major issues today. [1]

2.1.1 Types of Image

According to Gonzalez, Rafael C. and Woods, Richard E. (2008), images can be classified by how each pixel is represented. A binary image is the simplest type: it contains only black and white, and each pixel is represented by a single bit. A grayscale image contains a range of shades, or intensities, varying from black at the weakest intensity (total absence) to white at the strongest (total presence). In a grayscale image each pixel's value is a single sample, so the image has many shades of gray between black and white but no chromatic variation [3].
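The difference between the two image types can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as nested lists and a hypothetical threshold of 128; neither detail comes from the cited source.

```python
def to_binary(gray, threshold=128):
    """Map each 8-bit grayscale pixel to a single bit (0 = black, 1 = white)."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]

# A hypothetical 2x3 grayscale image: 0 is black, 255 is white.
gray_image = [
    [0, 100, 200],
    [255, 127, 128],
]
binary_image = to_binary(gray_image)
print(binary_image)
```

Each grayscale pixel carries one intensity sample, while each binary pixel needs only one bit.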

2.1.2 Image models

Narendra Ahuja and B. J. Schachter (December 1981) state that image models can be categorized into pixel-based and region-based models. Pixel-based models are further divided into three types: syntactic models, one-dimensional time series models and random field models. Random field models further incorporate eight global and local properties of an image. [40]

2.2 What is Information Retrieval? How is it measured?

Information retrieval (IR) is the science of searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web. More formally, the field of Information Retrieval (IR) is concerned with the retrieval of information content that is relevant to a user's information needs (Frakes 1992). Information Retrieval is often regarded as synonymous with document retrieval and text retrieval, though many IR systems also retrieve pictures, audio or other types of non-textual information. The word document is used here to include not just text documents, but any clump of information [41].

2.2.1 Searching vs. Indexing

Document retrieval subsumes two related activities: indexing and searching (Sparck Jones 1997). Indexing refers to the way documents, i.e. information to be retrieved, and queries, i.e. statements of a user's information needs, are represented for retrieval purposes. Searching refers to the process whereby queries are used to produce a set of documents that are relevant to the query. Relevance here means simply that the documents are about the same topic as the query, as would be determined by a human judge. Relevance is an inherently fuzzy concept, and documents can be more or less relevant to a given query. This fuzziness puts IR in opposition to Data Retrieval, which uses deductive and Boolean logic to find documents that completely match a query [42].

2.2.2 Text based Retrieval System

R. Papka et al. (1996) state that in a text-based information retrieval system, when the user enters a query q against a collection of documents c, each document d is examined and given a weight according to how well it satisfies the semantics of the query q, in other words, how well it meets the user's need. For every instance of the triple <q; d; c>, the weight attributed to a document d is computed by a function eval<q; d; c>. Generally speaking, an IR system performs various retrieval tasks such as document indexing, ranking and classification [32]. Document ranking is achieved by sorting all the documents in the collection on the basis of their assigned weights [33] [34] [35]. The core of an IR system is the parsing process. The tokens produced by parsing documents and queries are called terms. A query entered into the IR system in natural language is parsed into a set of terms. The terms derived from parsing a document are used to build an inverted list structure, which serves as an index to the document collection. Normally, IR systems assume that if a term occurs in both the query and a document, then the document is relevant to the query. The co-occurrence of multiple query terms in a document further contributes to the degree of relevance of that document [38].

2.2.3 Text Base Retrieval Process


Figure 2: The Process of Retrieving Information in Text Based Retrieval System [36]

According to the architecture shown in Figure 2, the preliminary step in information retrieval is to define the text database. It is the job of the database manager to specify the documents to be used, the text operations, and the text model, including the structure and elements of the text to be retrieved. Applying the text operations transforms the original documents into their logical view. Once the logical view is set up, the database manager creates an index of the text. According to Baeza-Yates (1999), an index is a critical data structure because it makes fast searching over huge volumes of data possible. Different structures are used as indexes, but the most widely used structure is the inverted file [37].

According to Justin Zobel et al. (1999), an inverted file consists of two parts: a) the vocabulary, which stores all the distinct values to be indexed, and b) the inverted lists, which store the identifiers of the records that contain each distinct value. Queries are evaluated by fetching the inverted lists for the query terms. Once the inverted lists are fetched, the record identifiers are mapped to physical addresses using an address table, which is usually stored in memory or on disk [36].
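The two parts of an inverted file, and the way a query is evaluated against them, can be sketched as follows. This is a minimal illustration and not the structure used in the cited work: the function names, the whitespace-based parsing and the scoring rule (count of distinct query terms matched) are all assumptions, and the address table is left out.

```python
from collections import defaultdict

def build_inverted_file(records):
    """Return (vocabulary, inverted_lists) for a dict of {record_id: text}."""
    inverted_lists = defaultdict(list)
    for record_id in sorted(records):
        # Naive parsing (assumption): lowercase and split on whitespace.
        for term in sorted(set(records[record_id].lower().split())):
            inverted_lists[term].append(record_id)
    vocabulary = sorted(inverted_lists)
    return vocabulary, dict(inverted_lists)

def evaluate(query, inverted_lists):
    """Score each record by how many distinct query terms it contains."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for record_id in inverted_lists.get(term, []):
            scores[record_id] += 1
    # Highest score first; ties broken by record identifier.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical records used only for illustration.
records = {
    1: "image retrieval with color histograms",
    2: "text retrieval with inverted files",
    3: "color spaces for image display",
}
vocabulary, inverted_lists = build_inverted_file(records)
ranking = evaluate("color image retrieval", inverted_lists)
print(inverted_lists["retrieval"], ranking)
```

Only the inverted lists of the query terms are read, which is why the index keeps searching fast even when the collection is large.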

After the documents are indexed, the retrieval process can be initiated. The user specifies his need in the form of a natural language query. This query is parsed and transformed with the same text operations that were applied to the text itself. Query operations then transform the user query into the system's representation of the user need. This form of the query is processed to obtain the retrieved documents; the indexes built previously make fast searching possible. [36]

Before the retrieved documents are presented to the user, they are ranked according to their degree of relevance. While examining the set of ranked documents, the user selects some of the documents that might be of interest, which initiates a user feedback cycle. In this cycle the system reformulates the query according to the documents selected by the user so that it better represents the user's real need. [36] It is worth mentioning that most users have no knowledge of text and query operations, which is why the queries they pose to retrieval systems are frequently inadequate. This poor formulation of the query often leads to poor retrieval, as can commonly be observed with web search engines. [36]

2.3 Image Base Retrieval System

Gonzalez, Rafael C. and Woods, Richard E. (2008) state that an image retrieval system is a computer system for browsing, searching and retrieving images from a large database of digital images [3]. Zoran Pečenović, Minh Do and Serge Ayer describe the image retrieval process. According to them, the need for a desired image or piece of information is captured in the form of a query and presented to the retrieval system to obtain the desired result. In multimedia technology the field of image retrieval is of prime importance and is growing rapidly. A typical image retrieval process involves extracting and processing meaningful information about images from a distributed image collection in order to characterize the images in the collection. These features are then manipulated quickly and intelligently and matched against the processed query features to determine similarity. The results are refined by the user through feedback and specification [5]. J. P. Eakins (1996) describes current IR systems as mostly using low-level features, which can easily be extracted automatically by machine. Semantic-level retrieval is more desirable to the user, but it cannot be achieved easily because image collections are heterogeneous and fast growing [4].

2.4 Image Retrieval Process:

Dr. Fuhui Long, Dr. Hongjiang Zhang and Prof. David Dagan Feng describe the process of typical content-based image retrieval. Multidimensional feature vectors are formed by extracting the visual contents of the images in the database. The feature vectors of the images form the feature database. To retrieve an image, users provide the retrieval system with example images or sketched figures.

These examples are converted into an internal representation of feature vectors. The similarities or distances between the feature vectors of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. The indexing scheme provides an efficient way to search the image database. Recent retrieval systems have incorporated users' relevance feedback to modify the retrieval process in order to generate perceptually and semantically more meaningful retrieval results. [18]
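The matching step can be sketched as a nearest-neighbor search over feature vectors. The sketch below is only an illustration under stated assumptions: the three-element feature vectors and file names are hypothetical, Euclidean distance is one possible distance measure, and a linear scan stands in for the indexing scheme that a real system would use.

```python
import math

def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vector, feature_db, top_k=2):
    """Return the names of the top_k images closest to the query vector."""
    ranked = sorted(
        feature_db,
        key=lambda name: euclidean_distance(query_vector, feature_db[name]),
    )
    return ranked[:top_k]

# Hypothetical feature database: image name -> feature vector.
feature_db = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg": [0.2, 0.2, 0.6],
}
query_vector = [0.7, 0.2, 0.1]  # features extracted from the example image
results = retrieve(query_vector, feature_db)
print(results)
```

Relevance feedback would then adjust the query vector, or the distance measure, between successive calls to `retrieve`.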

Figure : Image Retrieval Process [18]

2.5 Architecture of Image Retrieval System

Yong Rui, Thomas S. Huang and Shih-Fu Chang emphasize that interdisciplinary knowledge is essential for an efficient image retrieval system. They proposed a possible system architecture. According to them, the architecture has three databases and a query module.

Image collection database: it contains the raw images for display purposes.

Visual feature database: it stores the different visual features extracted from the images using different techniques.

Text annotation database: it holds the free-text and keyword-based representation of each image.

Query interface sub-module and query preprocessor module: it is graphics-based. It collects information from the user and displays the output in the form of retrieved results, so it provides a user-friendly way to communicate with the user.

It is therefore a multidisciplinary, user-interactive architecture for an image retrieval system [19].

Figure : Architecture of Image Retrieval System [19]

2.6 Information Retrieval Systems & Image Retrieval Systems.

Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang (April 2005) state that easy web hosting and low-cost storage have stimulated the transformation of the common man from a passive consumer of photography in the past into an active producer today. Image data is no longer centered in one location; it is growing rapidly, with extremely varied visual and semantic content. All these factors have created numerous possibilities and considerations for designers of real-world image search systems. [8]

2.7 Methods for Image Retrieval

Dr. Fuhui Long, Dr. Hongjiang Zhang and Prof. David Dagan Feng state that image retrieval can basically be classified into two forms: annotation-based image retrieval and content-based image retrieval (CBIR).

Annotation is further divided into manual annotation and auto-annotation of images.

CBIR is the second method of image retrieval. This methodology avoids the use of textual descriptions; instead, image retrieval is based on visual similarity to a user-supplied query image or user-specified image features. [11]

2-Content based image retrieval systems

Content-based image retrieval uses visual contents of an image such as color, shape, texture and spatial layout to represent and index the image. In a typical content-based image retrieval system the retrieval process runs as follows. First, the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. Users provide the retrieval system with example images or sketched figures. Because queries can take different forms, an internal representation of the example or query image is obtained and stored as a feature vector. The similarities and differences between the query feature vector and the database feature vectors are then computed, and retrieval is performed using a suitable indexing technique. An indexing technique provides an efficient way to search the image database. Users' relevance feedback is another element incorporated in image retrieval systems to modify the retrieval process in order to produce more meaningful retrieval results. If the image retrieval process is broken down into phases, feature extraction from the image is the first step in most IR systems. The features can be local features such as shape and texture, or global features such as color and the color histogram. [11]

2.8 Color Base Feature Extraction

2.8.1 Color Space

Each pixel of an image can be represented as a point in a 3D color space. Several color spaces are commonly used for image retrieval: RGB, CIE L*a*b*, CIE L*u*v*, HSV (or HSL, HSB), and the opponent color space. It is difficult to decide which color space is best, but a desirable property of a color space is uniformity. Uniformity means that two color pairs separated by the same distance in the color space are perceived by viewers as equally different.

RGB space is composed of three basic color components and is a widely used color space for image display. These components are called "additive primaries" since a color in RGB space is produced by adding them together. In contrast, CMY space is a color space primarily used for printing. The three color components are cyan, magenta, and yellow. These three components are called "subtractive primaries" since a color in CMY space is produced through light absorption. Both RGB and CMY space are device-dependent and perceptually non-uniform. The CIE L*a*b* and CIE L*u*v* spaces are device independent and considered to be perceptually uniform. They consist of a luminance or lightness component (L) and two chromatic components a and b or u and v. CIE L*a*b* is designed to deal with subtractive colorant mixtures, while CIE L*u*v* is designed to deal with additive colorant mixtures [11].

The HSV (or HSL, or HSB) space is widely used in computer graphics and is a more intuitive way of describing color. The three color components are hue, saturation (or lightness) and value (brightness). The hue is invariant to changes in illumination and camera direction and is hence more suited to object retrieval. RGB coordinates can be easily translated to HSV (or HSL, or HSB) coordinates. The opponent color space uses the opponent color axes (R-G, 2B-R-G, R+G+B). This representation has the advantage of isolating the brightness information on the third axis. With this solution, the first two chromaticity axes, which are invariant to changes in illumination intensity and shadows, can be down-sampled, since humans are more sensitive to brightness than they are to chromatic information. [11]
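A short sketch can make the two conversions concrete. It uses the `rgb_to_hsv` function from Python's standard `colorsys` module, which expects channels scaled to [0, 1]; the helper `rgb_to_opponent` is a hypothetical name that simply applies the three axes listed above.

```python
import colorsys

def rgb_to_opponent(r, g, b):
    """Opponent color axes (R-G, 2B-R-G, R+G+B) as listed in the text."""
    return (r - g, 2 * b - r - g, r + g + b)

# Pure red at full and at half intensity, with channels scaled to [0, 1].
bright_red = (1.0, 0.0, 0.0)
dark_red = (0.5, 0.0, 0.0)

hue_bright, sat_bright, val_bright = colorsys.rgb_to_hsv(*bright_red)
hue_dark, sat_dark, val_dark = colorsys.rgb_to_hsv(*dark_red)

print(hue_bright, hue_dark)   # same hue for both reds
print(val_bright, val_dark)   # only the value (brightness) differs
print(rgb_to_opponent(255, 0, 0))
```

Dimming the red changes only the value component, which illustrates why hue is attractive as an illumination-tolerant feature.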

Ritendra Datta, Jia Li, and James Z. Wang (2005) state that color is possibly the most distinctive and prevalent visual feature in content-based image retrieval. In CBIR the color histogram is the most commonly used descriptor. A color histogram captures the distribution of colors in an image. Its advantage is that it is easy to compute. Its drawback is that the resulting large feature vectors are hard to index and involve high search and retrieval costs. In addition, it lacks spatial information: for example, the histogram of an image with a red blob on a green background is the same as the histogram of an image that contains the same numbers of red and green pixels distributed in random order. Several recently proposed color descriptors try to integrate spatial information with the color histogram. [10]
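The loss of spatial information can be demonstrated with a small sketch. The two 4x4 images below are hypothetical and use the labels 'R' and 'G' in place of quantized color values.

```python
from collections import Counter

def color_histogram(image):
    """Count how many pixels of each color label the image contains."""
    return Counter(pixel for row in image for pixel in row)

# Two hypothetical 4x4 images with color labels 'R' (red) and 'G' (green).
blob = [
    "GGGG",
    "GRRG",
    "GRRG",
    "GGGG",
]
scattered = [
    "RGGG",
    "GGRG",
    "GRGG",
    "GGGR",
]

print(color_histogram(blob))
print(color_histogram(blob) == color_histogram(scattered))
```

Both images yield 4 red and 12 green pixels, so a histogram-only comparison cannot tell the blob from the scattered pattern.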

2.8.2 Color Moments

According to Y. Alp Aslandogan and Clement (1999), color moments are particularly useful when an image contains objects. They have been used successfully in many systems, such as QBIC. There are three moments: the first order (mean), the second order (variance) and the third order (skewness). Color moments have been shown to be efficient and effective in representing the color distributions of images. Usually color moments perform better if they are defined in both the L*u*v* and L*a*b* color spaces. Color moments give a compact representation of an image at the cost of discriminative power. The advantage is that only 9 numbers (three moments for each of the three color components) are used to represent the color content of an image [12].
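The nine-number representation can be sketched as follows. This is an illustration under assumptions: the pixel values are hypothetical, and the third moment is reported as the signed cube root of the third central moment, which is one common convention and is not taken from the cited source.

```python
def channel_moments(values):
    """Return (mean, variance, skewness term) for one color channel."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # Assumption: signed cube root of the third central moment.
    skewness = abs(third) ** (1 / 3) * (1 if third >= 0 else -1)
    return mean, variance, skewness

def color_moments(pixels):
    """Return a 9-element feature vector for a list of 3-channel pixels."""
    features = []
    for channel in zip(*pixels):
        features.extend(channel_moments(list(channel)))
    return features

# Hypothetical pixels, each with three color components.
pixels = [(10, 200, 30), (20, 200, 60), (30, 200, 90), (40, 200, 120)]
features = color_moments(pixels)
print(len(features), features)
```

However many pixels the image has, the feature vector always has exactly nine entries.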

2.8.3 Color Histogram

The color histogram serves as an effective representation of the color content of an image if the color pattern is unique compared with the rest of the data set. The color histogram is easy to compute and effective in characterizing both the global and local distribution of colors in an image. In addition, it is robust to translation and rotation about the view axis and changes only slowly with the scale, occlusion and viewing angle. Since any pixel in the image can be described by three components in a certain color space (for instance, red, green, and blue components in RGB space, or hue, saturation, and value in HSV space), a histogram, i.e., the distribution of the number of pixels for each quantized bin, can be defined for each component [12]. Clearly, the more bins a color histogram contains, the more discrimination power it has. However, a histogram with a large number of bins will not only increase the computational cost, but will also be inappropriate for building efficient indexes for image databases.

Furthermore, a very fine bin quantization does not necessarily improve the retrieval performance in many applications. One way to reduce the number of bins is to use the opponent color space, which enables the brightness of the histogram to be down-sampled. Another way is to use clustering methods to determine the K best colors in a given space for a given set of images. Each of these best colors will be taken as a histogram bin. Since that clustering process takes the color distribution of images over the entire database into consideration, the likelihood of histogram bins in which no or very few pixels fall will be minimized. Another option is to use the bins that have the largest pixel numbers, since a small number of histogram bins capture the majority of pixels of an image. Such a reduction does not degrade the performance of histogram matching, but may even enhance it, since small histogram bins are likely to be noisy [12].

When an image database contains a large number of images, histogram comparison will saturate the discrimination. To solve this problem, the joint histogram technique is introduced. In addition, the color histogram does not take the spatial information of pixels into consideration, thus very different images can have similar color distributions. This problem becomes especially acute for large scale databases. To increase discrimination power, several improvements have been proposed to incorporate spatial information. A simple approach is to divide an image into sub-areas and calculate a histogram for each of those sub-areas. As introduced above, the division can be as simple as a rectangular partition, or as complex as a region or even object segmentation. Increasing the number of sub-areas increases the information about location, but also increases the memory and computational time. [12]
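The sub-area approach can be sketched with a simple rectangular partition. The function name `block_histograms` and the two 4x4 images are hypothetical, and the sketch assumes that the image dimensions divide evenly into the requested grid.

```python
from collections import Counter

def block_histograms(image, rows=2, cols=2):
    """Return one color histogram per block of a rows x cols partition."""
    height, width = len(image), len(image[0])
    block_h, block_w = height // rows, width // cols
    histograms = []
    for br in range(rows):
        for bc in range(cols):
            block = [
                image[y][x]
                for y in range(br * block_h, (br + 1) * block_h)
                for x in range(bc * block_w, (bc + 1) * block_w)
            ]
            histograms.append(Counter(block))
    return histograms

# Two hypothetical images: red on the left half vs. red on the top half.
left_red = ["RRGG", "RRGG", "RRGG", "RRGG"]
top_red = ["RRRR", "RRRR", "GGGG", "GGGG"]

# A 1x1 partition is the ordinary global histogram.
print(block_histograms(left_red, 1, 1) == block_histograms(top_red, 1, 1))
# A 2x2 partition separates the two layouts.
print(block_histograms(left_red) == block_histograms(top_red))
```

The global histograms match, while the four per-block histograms differ, at the cost of storing four histograms per image instead of one.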

2.8.4 Color Coherence Vector

A different way of incorporating spatial information into the color histogram is the color coherence vector (CCV). The pixels in each histogram bin are partitioned into two types: coherent, if the pixel belongs to a large uniformly-colored region, or incoherent, if it does not. Due to its additional spatial information, it has been shown that CCV provides better retrieval results than the color histogram, especially for those images which have either mostly uniform color or mostly texture regions. In addition, for both the color histogram and the color coherence vector representation, the HSV color space provides better results than the CIE L*u*v* and CIE L*a*b* spaces. [12]
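A color coherence vector can be sketched with a flood fill that measures connected regions. The sketch makes several assumptions that are not from the cited source: colors are already quantized into labels, regions are formed with 4-connectivity, and the size threshold `tau` that separates coherent from incoherent regions is chosen by hand.

```python
from collections import defaultdict

def color_coherence_vector(image, tau):
    """Return {color: (coherent_count, incoherent_count)} for a grid of labels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    ccv = defaultdict(lambda: [0, 0])
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            color = image[y][x]
            # Flood fill to measure the connected region containing (y, x).
            stack, size = [(y, x)], 0
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and image[ny][nx] == color:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Pixels in regions of at least tau pixels count as coherent.
            ccv[color][0 if size >= tau else 1] += size
    return {color: tuple(counts) for color, counts in ccv.items()}

# Hypothetical images: a red blob vs. scattered red pixels on green.
blob = ["GGGG", "GRRG", "GRRG", "GGGG"]
scattered = ["RGGG", "GGRG", "GRGG", "GGGR"]

print(color_coherence_vector(blob, tau=4))
print(color_coherence_vector(scattered, tau=4))
```

The two images have identical color histograms, yet the red pixels are all coherent in the blob and all incoherent in the scattered image.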

2.8.5 Color Correlogram

The color correlogram was proposed to characterize not only the color distributions of pixels, but also the spatial correlation of pairs of colors. The first and the second dimension of the three-dimensional histogram are the colors of any pixel pair and the third dimension is their spatial distance. A color correlogram is a table indexed by color pairs, where the k-th entry for (i, j) specifies the probability of finding a pixel of color j at a distance k from a pixel of color i in the image.

Color plays a significant role in image retrieval. Different color representation schemes include red-green-blue (RGB), the chromaticity and luminance system of the CIE (International Commission on Illumination), hue-saturation-intensity (HSI), and others. The RGB scheme is most commonly used in display devices; hence digital images typically employ this format. The HSI scheme more accurately reflects the human perception of color. All perceivable colors can be reproduced by a proper combination of red, green and blue components. A 24-bit per pixel RGB color image may have 2^24, or approximately 16.7 million, distinct colors. In order to reduce the number of colors for efficiency in image processing, colors are quantized with a suitable algorithm [12].

2.9 Texture Base Feature Extraction

Y. Alp Aslandogan and Clement (1999) state that texture is a visual pattern that has properties of homogeneity, where this homogeneity does not result from a single intensity or color. Texture is a visual pattern in which a large number of visible elements are densely and evenly arranged. A texture element is a uniform intensity region of simple shape which is repeated. Texture can be analyzed at the pixel window level or at the texture element level. The former approach is called statistical analysis and the latter structural analysis. Generally, structural analysis is used when texture elements can be clearly identified, whereas statistical analysis is used for fine (micro) textures. Statistical measures characterize the variation of intensity in a texture window. Example measures include contrast (high contrast zebra skin versus low contrast elephant skin), coarseness (dense pebbles versus coarse stones), and directionality (directed fabric versus undirected lawn). Fourier spectra are also used to characterize textures. By obtaining the Fourier transform of a texture window, a signature is generated. Windows with the same or similar signatures can be combined to form texture regions. Structural texture analysis extracts texture elements in the image, determines their shapes and estimates their placement rules. Placement rules describe how the texture elements are placed relative to each other on the image and include measures such as the number of immediate neighbors (connectivity), the number of elements in unit space (density), and whether they are laid out homogeneously (regularity). By analyzing the deformations in the shapes of the texture elements and their placement rules, more information can be obtained about the scene or the objects in it. For instance, a density increase and size decrease along a certain direction might indicate an increase in distance in that direction. [12]

2.10 Working of Large Scale Image Search Engine?

According to Y. Alp Aslandogan and Clement (1999), some retrieval systems combine keyword-based retrieval with perception-based retrieval and work in iterations. The proposed perception-based retrieval system gives better results by retrieving more relevant images. It uses relevance feedback and sampling to refine the results over successive iterations, and thus uses active learning to capture subjective and complex query concepts. The two main components of the proposed image retrieval system are a multi-resolution image-feature extractor and a high-dimensional indexer. Both components help make query-concept learning and image retrieval efficient. [12]

2.11 Present Image Retrieval system Process.

H. B. Kekre and Sudeep D. Thepade (2009) discuss block truncation coding (BTC). It was first developed for grayscale image coding and was later extended to color images. This method segments the color image into three components, R, G and B. A threshold is set by taking the mean of the interband average image (IBAI), which is the average of all three components. Each pixel is compared with the threshold to create a bitmap. If a pixel in the interband average image is greater than or equal to the threshold, the corresponding pixel position of the bitmap has a value of 1; otherwise it has a value of 0. Two mean colors are also calculated, one for the pixels greater than or equal to the threshold and the other for the pixels smaller than the threshold. Various equations are given in [13] to show how these means are computed. The two means, Upper Mean and Lower Mean, together form the signature, or feature vector, of an image. These features are stored in a feature vector table, and the process is repeated for every image in the database. When a query image is given to the CBIR system, its feature vector is computed. A matching process then compares this feature vector with the table entries to find the best match. Image retrieval applications based on block truncation coding use the direct Euclidean distance as the similarity measure. The disadvantage of this technique is that prominent components dominate the threshold value and minimize the effect of the other components. A more generic approach is to use the three independent R, G and B components of the color image: an individual threshold is created for each component, and BTC is then applied to each of the R, G and B planes separately. The thresholds for the R, G and B planes, TR, TG and TB, are computed from the equations given in [14].

Here three binary bitmaps are computed: BMr, BMg and BMb. If a pixel in a component (R, G or B) is greater than or equal to the respective threshold, the corresponding pixel position of the bitmap has a value of 1; otherwise it has a value of 0. For the red plane, with R(i, j) denoting the red value of the pixel at position (i, j):

BMr(i, j) = 1 if R(i, j) >= TR, and 0 if R(i, j) < TR

The bitmaps for blue and green are computed in the same way. Two mean colors are also calculated, one for the pixels greater than or equal to the threshold and the other for the pixels smaller than the threshold. The upper mean color UM(Rm1, Gm1, Bm1) and the lower mean color are computed for the three colors; the equations for red are given in [14].

The Upper Mean and Lower Mean (Rm1 and Rm2 for red) together form the feature vector, or signature, of the image. These feature vectors are computed and stored in the feature vector table for every image in the database. BTC-RGB gives the best performance when both precision and recall are considered. [14]
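The per-plane procedure described in this section can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that each threshold is the mean of its own color plane, the function names are hypothetical, and the fallback for a plane with no pixel below the threshold is a choice made only to keep the sketch runnable.

```python
def btc_rgb_signature(pixels):
    """Return (upper_mean, lower_mean) colors for a list of (R, G, B) pixels."""
    n = len(pixels)
    upper, lower = [], []
    for channel in zip(*pixels):
        threshold = sum(channel) / n  # assumed: per-plane mean, e.g. TR for red
        bitmap = [1 if value >= threshold else 0 for value in channel]  # e.g. BMr
        above = [v for v, bit in zip(channel, bitmap) if bit == 1]
        below = [v for v, bit in zip(channel, bitmap) if bit == 0]
        upper.append(sum(above) / len(above))
        # A constant plane has no pixel below its threshold; fall back to it.
        lower.append(sum(below) / len(below) if below else threshold)
    return tuple(upper), tuple(lower)

def signature_distance(sig_a, sig_b):
    """Euclidean distance between two (upper_mean, lower_mean) signatures."""
    flat_a = sig_a[0] + sig_a[1]
    flat_b = sig_b[0] + sig_b[1]
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) ** 0.5

# Hypothetical four-pixel image.
pixels = [(10, 0, 100), (30, 0, 100), (50, 0, 200), (70, 0, 200)]
upper_mean, lower_mean = btc_rgb_signature(pixels)
print(upper_mean, lower_mean)
```

At query time, the signature of the query image would be compared with every stored signature using `signature_distance`, and the images with the smallest distances returned.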

2.12 RST?

According to Bill Mann, RST is a linguistic theory that explains the relationships between parts of a text. The theory was originally developed as part of studies of computer-based text generation. RST offers an explanation of the coherence of text. It describes the structure of a text in terms of various sorts of building blocks. The blocks are on two levels: coherence relations and schemas. [15]

Rhetorical Structure Theory was proposed by Mann and Thompson in the 1980s as a result of extensive text analysis [20] [21] [22] [24]. Initially it was considered a descriptive linguistic approach for identifying textual relations with the intention of generating text [25]. The success of RST can be seen from the wide application of this theory in various domains ranging from discourse analysis to psycholinguistics, theoretical linguistics and computational linguistics [25] [26] [27].

This theory is widely used in various applications of computational linguistics, for example text parsing and generation, machine translation, essay scoring and, most importantly, natural language processing [25] [29] [28]. The theory organizes a text by means of the relations that exist between its sections and establishes a coherence in which every section of the text plays a role or function with respect to the other sections. The resulting relations are termed coherence relations, discourse relations or conjunction relations by Mann (2006) and rhetorical relations by Asher and Lascarides (2003). Mann identified 30 different relations between sections of text [25]. RST identifies two different types of text units: the nucleus and the satellites [31]. Nuclei are of prime importance, whereas the satellites contribute to the nucleus and are considered secondary. Mann (2006) illustrates this with the Elaboration relation, in which the nucleus is the part of the text containing the basic information and the satellites are the parts containing additional or supporting information about the nucleus. According to Mann (2006), RST relations are applied to the text recursively until all text units are associated with one of the RST relations. The effect of any specific text can be summed up in one top-level relation in the hierarchy, which is further decomposed into the particular relations that contribute to that effect. The text analysis, subject to constraints such as completeness, connectedness, uniqueness and adjacency, then yields RST structure trees with one top-level relation over a number of low-level relations, where the main relation is one of the preparations. [25]

According to Shoaib (2006), current information retrieval techniques are able to retrieve only 30% of the relevant information. Information retrieval systems are currently based on keyword-based indexing techniques in which keywords are assigned static weights using a retrieval model such as extended Boolean, vector-based or probabilistic. Words that carry different meanings in different contexts can result in the retrieval of irrelevant information [30]. Shoaib (2006) proposed an indexing technique with dynamic weight assignment based on Rhetorical Structure Theory. They used punctuation and cue phrases (words that connect two or more text spans) to define the rhetorical relations. They constructed an RST tree whose leaves represent the text spans and whose internal nodes represent the rhetorical relations. Keywords extracted from the text spans are assigned dynamic weights such that the keywords in text spans closer to the nucleus are given priority. With this dynamic weight assignment technique the precision rate has clearly improved. The system developed on the basis of this approach is able to capture the semantics of a document efficiently [30]. The architecture of their proposed model consists of four modules: Segmentor, Rhetorical Relation Finder, Rhetorical Tree Parser and RST-based Indexer. The figure below illustrates the components of the model proposed by Shoaib (2006).


Figure : RST Based Text Retrieval System based on dynamic Weights. [30]

The purpose and functionality of each component is defined as followed: [30]

1. SEGMENTOR: Text is usually organized into words, sentences and paragraphs. The correlation between these text spans needs to be identified, and to handle the structure of the text its boundaries must first be found. This is the primary function of text segmentation. The text segmentor provides the structural information that is used for text analysis.

2. RHETORICAL RELATION FINDER: In the next step the indexer automatically finds the rhetorical relations. For this purpose Shoaib and Abad (2006) used cue phrases and punctuation to obtain the relationship between every pair of text spans.

3. RST TREE: From the list of text spans and the list of relations generated in the previous stage, a rhetorical tree is generated whose internal nodes represent the relations and whose leaves represent the text spans. Based on the height of the tree, initial weights are assigned to the text spans. The indexer uses the keywords and their corresponding initial and dynamic weights to populate its knowledge base.

4. INDEXER: The indexer uses the concept of strong and weak nodes to assess the initial weights. This initial weight assessment is then used for dynamic weight assignment. The initial weight lies between 0 and 1. The root node is assigned 1, whereas the nucleus and the satellite of a parent node are assigned 0.9 and 0.5 respectively. These weights are variable. The weight assigned to a child node is generalized by Shoaib and Abad (2006) into the following formula:

Initial weight of Child node = Weight of child node * weight of parent node

After the initial weights of all child nodes have been calculated, these weights are associated with the index terms that occur in the text spans. On the basis of the initial weight assessment and the term frequency, the actual weights of the index terms are calculated by Shoaib and Abad (2006) as follows:

Actual Weight of the index term = initial weight assessment * term frequency

The indexer maintains the knowledge base by saving in it the document ID, the vocabulary ID and the actual weights. The document ID keeps track of which word belongs to which document. The vocabulary ID ensures that no word is stored redundantly in the knowledge base, which saves space. The dynamic weight indicates the semantics-based occurrence of important index terms [30].
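The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [30]: the `Node` class, the function names and the example tree are hypothetical, the "weight of child node" on the right-hand side of the first formula is read as the role weight (0.9 for a nucleus, 0.5 for a satellite), and term frequency is counted per text span and summed over spans.

```python
from collections import Counter
from dataclasses import dataclass, field

# Role weights quoted in the text; the text notes that they are variable.
ROLE_WEIGHT = {"root": 1.0, "nucleus": 0.9, "satellite": 0.5}


@dataclass
class Node:
    """One node of a rhetorical tree (hypothetical structure for illustration)."""
    role: str                          # "root", "nucleus" or "satellite"
    text: str = ""                     # text span; only leaves carry text
    children: list = field(default_factory=list)


def initial_weights(node, parent_weight=1.0, out=None):
    """Collect (leaf text, initial weight) pairs, walking the tree top-down."""
    if out is None:
        out = []
    # Initial weight of child = role weight of child * weight of parent.
    weight = ROLE_WEIGHT[node.role] * parent_weight
    if node.children:
        for child in node.children:
            initial_weights(child, weight, out)
    else:
        out.append((node.text, weight))
    return out


def actual_weights(tree):
    """Actual weight = initial weight * term frequency, summed over spans (assumption)."""
    weights = Counter()
    for text, initial in initial_weights(tree):
        for term, tf in Counter(text.lower().split()).items():
            weights[term] += initial * tf
    return dict(weights)


# Made-up example tree: a nucleus span plus a satellite that is itself a relation.
tree = Node("root", children=[
    Node("nucleus", "image retrieval uses indexing"),
    Node("satellite", children=[
        Node("nucleus", "indexing assigns weights"),
        Node("satellite", "weights vary"),
    ]),
])

span_weights = initial_weights(tree)
term_weights = actual_weights(tree)
```

In this sketch a term such as "indexing", which occurs in the nucleus span and in a nested span, ends up with a higher weight than a term that occurs only in a deeply nested satellite, which is the behaviour the dynamic weighting is meant to produce.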

2.12.1 Rhetorical Structure Theory and Multimedia Objects

Dr. Muhammad Shoaib and S. Khaldoon established a relationship between RST and multimedia objects. RST is not limited to text structure; its scope extends to multimedia objects as well, including audio, video, text, graphics and images. The proposed model for multimedia objects with respect to RST is (N, S(t, n), MFi), where N is the nucleus and S(t, n) is the satellite or set of satellites, which provides a solution for large objects. The satellite contains the text that supports the nucleus: t is the text part, that is, any word or sentence related to the multimedia object, and n is the number of satellites affiliated with the nucleus. MFi is the multimedia framework. It provides two basic features, Mi = (D, F): the first is the raw data of the object and the second is the features associated with the multimedia object [16].
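As a rough illustration, the tuple (N, S(t, n), MFi) could be held in a small data structure such as the one below. All class and field names are hypothetical and are not taken from [16]; in particular, representing the nucleus by an identifier string and the features by a dictionary are assumptions made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MultimediaFramework:
    """Mi = (D, F): raw data of the object plus its associated features."""
    data: bytes        # D: raw data of the multimedia object
    features: dict     # F: features associated with the object (assumed key/value form)


@dataclass
class RSTMultimediaObject:
    """(N, S(t, n), MFi) as described in the text."""
    nucleus: str                                      # N: identifier of the nucleus (assumption)
    satellites: list = field(default_factory=list)    # S(t, n): supporting text parts t
    framework: Optional[MultimediaFramework] = None   # MFi

    @property
    def n(self):
        """Number of satellites affiliated with the nucleus."""
        return len(self.satellites)


# Made-up example: an image object with two supporting text satellites.
photo = RSTMultimediaObject(
    nucleus="image-001",
    satellites=["sunset over a lake", "taken in summer"],
    framework=MultimediaFramework(data=b"\x00\x01", features={"dominant_color": "orange"}),
)
```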

2.13 Indexing techniques for Images:

Latent semantic indexing (LSI) is based on matrix computation over images. The majority of images are stored as raster images, so under LSI an image can be viewed as a vector of pixels, and each vector represents some keyword in the image. A person looking at an image does not think about pixels; he or she automatically extracts the features, such as the objects in the image, that define what the image means.

LSI focuses on searching the semantics of documents. The search uses matrix computation, in particular singular value decomposition (SVD), so that two semantically similar documents are located close together in a multidimensional space [6].

2.13.1 Latent Semantic Indexing (LSI) for image retrieval systems by Pavel Praks, Jiri Dvorsky and Vaclav Snasel

LSI aims to search in a way that represents the semantics of documents better. The search process uses matrix computation, specifically the method of singular value decomposition (SVD) [39].

Every pixel in an image represents some color, so any color image can be viewed as a vector of pixels representing the image's semantics or the keywords inside that image.

A set of n documents containing m dictionary terms is represented by an m × n matrix A.

A is therefore the term-document matrix. LSI performs the singular value decomposition (SVD) of A:

$$A = U S V^{T}$$

where $S \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative elements called singular values, and $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices.

The columns of V and U are called the right and left singular vectors respectively [39]. The decomposition is computed so that the singular values are sorted in decreasing order. A full SVD is a time- and memory-consuming operation. The computed singular values and vectors are kept in memory and can be selected later when deciding the LSI speed/precision ratio [39].
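The decomposition and the later choice of how many singular values to keep can be illustrated with NumPy. This is a minimal sketch on made-up data, not the procedure of [39]: the three tiny "images", the value of k and the `cosine` helper are assumptions for illustration. Each column of A is one image flattened into a vector of pixels.

```python
import numpy as np

# Three made-up 2x2 grayscale images, flattened to pixel vectors (m = 4 pixels).
img0 = [0.9, 0.8, 0.1, 0.0]
img1 = [1.0, 0.7, 0.0, 0.1]   # visually close to img0
img2 = [0.0, 0.1, 0.9, 1.0]   # visually different

# Columns are images (documents), rows are pixels (terms): A is m x n = 4 x 3.
A = np.array([img0, img1, img2]).T

# Full SVD: U is m x m, Vt is V transposed (n x n), and s holds the singular
# values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Keep only the k largest singular values; a smaller k trades precision for speed.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Coordinates of each image in the k-dimensional latent space (one row per image).
doc_coords = (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


sim_close = cosine(doc_coords[0], doc_coords[1])   # img0 vs img1
sim_far = cosine(doc_coords[0], doc_coords[2])     # img0 vs img2
```

Comparing images through `doc_coords` rather than through raw pixel vectors is what places similar images close together in the reduced space.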

2.13.2 Vector Base Indexing Technique:

Mohamad Obeid, Bruno Jedynak and Mohamed Daoudi (2001) describe the process of image indexing. According to them, the image database is first categorized semantically into broad categories, called textures, and related images are placed in each category to serve as an intermediate feature. For each texture a 3D color histogram made of 32 bins is built. Pi(r, g, b) represents the proportion of pixels of texture i (1 <= i <= 6) whose color values fall in bin (r, g, b) [17].

An image is indexed by a vector whose values represent the estimated proportion of each texture in that image. For every pixel n, f(n) is computed.

The natural texture t of a pixel is determined by f(n): the texture for which P(r, g, b) is maximal is taken as the estimated natural texture of that pixel. Pixels with low values of P(r, g, b) are not labeled. The drawback of this approach is that annotated images are needed, and semantic knowledge must be applied beforehand to categorize images into textures. Its advantage is that results are obtained with high precision and recall [17].
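The labeling and indexing steps described above can be sketched as follows. This is a simplified illustration rather than the method of [17]: the bin layout (`BINS_PER_CHANNEL`), the threshold for leaving a pixel unlabeled (`MIN_PROPORTION`), the function names and the sample pixels are all assumptions, and only two textures are used instead of six.

```python
from collections import Counter

BINS_PER_CHANNEL = 2    # hypothetical quantization, not the bin layout of [17]
MIN_PROPORTION = 0.05   # hypothetical threshold below which a pixel stays unlabeled


def color_bin(pixel):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse histogram bin."""
    step = 256 // BINS_PER_CHANNEL
    return tuple(channel // step for channel in pixel)


def build_histogram(pixels):
    """Proportion of a texture's annotated pixels that fall in each color bin."""
    counts = Counter(color_bin(p) for p in pixels)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def label_pixel(pixel, histograms):
    """Texture whose histogram gives the pixel's bin the highest proportion, or None."""
    b = color_bin(pixel)
    texture, p = max(((t, h.get(b, 0.0)) for t, h in histograms.items()),
                     key=lambda item: item[1])
    return texture if p >= MIN_PROPORTION else None


def index_image(pixels, histograms):
    """Index vector: estimated proportion of each texture among the image's pixels."""
    labels = Counter(label_pixel(p, histograms) for p in pixels)
    return {t: labels.get(t, 0) / len(pixels) for t in histograms}


# Made-up annotated pixels for two textures.
histograms = {
    "sky": build_histogram([(10, 20, 250), (30, 40, 240)]),
    "grass": build_histogram([(20, 200, 30), (40, 220, 10)]),
}

# Made-up image: two bluish pixels, one greenish pixel, one white pixel.
image = [(0, 0, 255), (5, 10, 200), (10, 250, 10), (255, 255, 255)]
index_vector = index_image(image, histograms)
```

The white pixel falls in a bin that neither histogram covers, so it remains unlabeled and does not contribute to either texture's proportion.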

3. Problem statement

The web is growing tremendously, and with its ever-changing contents the retrieval of relevant information has become an imperative issue. Contents are not limited to text but also include other multimedia objects such as images and movies.

Rhetorical Structure Theory has introduced semantics-based retrieval, which reduces irrelevant search results to a minimum. M. Shoaib and Abad Ali Shah (2006) gave an indexing technique based on Rhetorical Structure Theory. It works through text segmentation, rhetorical relation finding, rhetorical parse tree construction, weight assignment and storage of all the relevant information in a relational database.

Rhetorical Structure Theory has also been shown, through a model, to apply to multimedia objects. The proposed model for multimedia objects with respect to RST is (N, S(t, n), MFi), where N is the nucleus and S(t, n) is the satellite or set of satellites, which provides a solution for large objects [16]. Currently, many indexing techniques for image retrieval exist, such as Latent Semantic Indexing (LSI) and the Vector Base Indexing Technique, but none of them gives exact information and none is based on RST. RST-based indexing techniques exist for text [43], which gives rise to the need for a model for image indexing based on RST.

The indexing technique plays a vital role in retrieving any object or document. The underlying object is retrieved through its indexing mechanism, so retrieval results depend on the indexing structure of the underlying object or text.