Development Of Efficient Indexing And Searching Technique English Language Essay




Interest in image retrieval has increased in large part due to the rapid growth of the World Wide Web. According to a recent study (Lawrence & Giles, 1999), there are 180 million images on the publicly indexable Web, a total amount of image data of about 3 Tb [terabytes], and an astounding one million or more digital images are being produced every day (Jain, 93). The need to find a desired image from a collection is shared by many groups, including journalists, engineers, historians, designers, teachers, artists, and advertising agencies. Image needs and uses vary considerably across the users in these groups. Some users require access to images based on primitive features such as color, texture, or shape, while others require access based on abstract concepts and symbolic imagery. The technology to access these images has also accelerated phenomenally and at present surpasses our understanding of how users interact with visual information.

One approach is content-based image retrieval (CBIR), also known as query by image content (QBIC). Content-based means that the search analyzes the actual contents of the image. The term 'content' in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. Without the ability to examine image content, searches must rely on metadata such as captions or keywords, which may be laborious or expensive to produce.
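To make the idea of retrieval by a primitive feature concrete, the following is a minimal sketch of color-based matching. It is an illustration only, not the method used in this project: the function names, the choice of four bins per color channel, and the histogram-intersection similarity are all assumptions made for the example, and the "images" are toy lists of (r, g, b) pixels.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (values 0-255) into a normalized histogram."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty image has no color distribution
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))

def rank_by_color(query_pixels, collection):
    """Return image names ordered from most to least similar to the query."""
    query_hist = color_histogram(query_pixels)
    scores = {
        name: histogram_intersection(query_hist, color_histogram(pixels))
        for name, pixels in collection.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): a mostly orange image and a mostly green image.
sunset = [(250, 120, 30)] * 6 + [(200, 60, 20)] * 4
forest = [(20, 140, 40)] * 8 + [(60, 90, 30)] * 2
query = [(240, 110, 25)] * 10

ranking = rank_by_color(query, {"sunset": sunset, "forest": forest})
print(ranking)
```

A search of this kind needs no captions or keywords, but it also knows nothing about what the image depicts, which is the gap the semantic techniques discussed in this essay aim to close.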

Rhetorical Structure Theory (RST) is a long-established linguistic theory for describing the relations between parts of a text, and it may also be usable for finding relations between images. According to this theory, analysis is based on the assumption that some text units are more central (salient) to the text than others, and that the other units are given to support the reader's belief in them. The central units are named nuclei, and the supporting units are named satellites. Rhetorical relations are described in terms of schemas, i.e. the way in which one or more satellites (or nuclei) are related to the current nucleus. It is also assumed that a relation that holds between two text spans also holds between the nuclei of those text spans. The application of a particular schema to a pair of text units is restricted by a number of constraints. For this purpose, an analyst (a person or reader) takes a coherent text, parses it, and produces a parse tree that reflects the semantics of the text. RST has proved useful for textual information retrieval. I will research how RST can be used to find the semantics of images and to relate them to text.
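The nucleus and satellite structure described above can be pictured as a small tree. The sketch below is a hypothetical representation, not a standard RST toolkit: the class name `RSTNode`, its fields, and the helper `central_text` are assumptions introduced for illustration, and the two sentences are invented toy text. It shows how following only the nuclei from the root yields the most central spans.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RSTNode:
    role: str                       # "nucleus" or "satellite"
    relation: Optional[str] = None  # relation linking a satellite to its nucleus
    text: Optional[str] = None      # set on leaf spans only
    children: List["RSTNode"] = field(default_factory=list)

def central_text(node):
    """Follow nuclei downward and collect the most salient leaf spans."""
    if not node.children:
        return [node.text]
    spans = []
    for child in node.children:
        if child.role == "nucleus":
            spans.extend(central_text(child))
    return spans

# Toy parse tree: one nucleus supported by one "elaboration" satellite.
tree = RSTNode(
    role="nucleus",
    children=[
        RSTNode(role="nucleus", text="The image shows a castle."),
        RSTNode(role="satellite", relation="elaboration",
                text="It was photographed in winter."),
    ],
)

salient = central_text(tree)
print(salient)
```

In a real analysis the tree would come from an analyst or a parser, and the constraints on each schema would decide which relation applies; here the tree is simply written by hand.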

For textual indexing, I will use a dynamic weight assignment methodology. This technique may utilize the parse tree of coherent text developed using RST. It indexes all the terms (keywords), assigns them weights dynamically, and generates the Knowledge Base of Indexed Terms, which the search engine will later use. The user may query the desktop-based search engine, which will analyze the query according to its semantics and fetch the related information from the knowledge base.
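The weighting scheme itself is not specified here, so the following is only a rough sketch of how such an index might look under a stated assumption: terms that occur in nucleus spans receive a higher weight than terms in satellite spans. The weight values, the stopword list, and the names `build_knowledge_base` and `search` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Assumed weights: terms in central spans count more than terms in support spans.
ROLE_WEIGHT = {"nucleus": 1.0, "satellite": 0.5}
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "it", "was", "and"}

def tokenize(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def build_knowledge_base(documents):
    """documents: {doc_id: [(role, span_text), ...]} -> {term: {doc_id: weight}}"""
    kb = defaultdict(lambda: defaultdict(float))
    for doc_id, spans in documents.items():
        for role, text in spans:
            for term in tokenize(text):
                kb[term][doc_id] += ROLE_WEIGHT[role]
    return kb

def search(kb, query):
    """Sum the stored weights of the query terms and rank documents by score."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in kb.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy descriptions of two images, already split into (role, span) pairs.
documents = {
    "img1": [("nucleus", "The image shows a castle."),
             ("satellite", "A dog sits in the courtyard.")],
    "img2": [("nucleus", "A dog runs on the beach."),
             ("satellite", "A castle is visible in the distance.")],
}

kb = build_knowledge_base(documents)
castle_hits = search(kb, "castle")
dog_hits = search(kb, "dog")
print(castle_hits, dog_hits)
```

Both images mention a castle and a dog, so a plain term-frequency comparison would score them equally; weighting by rhetorical role ranks each image first for the term that is central to its description.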

I am interested in using and enhancing an indexing technique for information retrieval systems that is based on Rhetorical Structure Theory (RST) and uses NLP techniques.

The basic theme behind this project is to index the coherent text associated with a particular image on the basis of its semantics, using RST. In this context, I am going to use coherent English text as the source text and will do all the research work within this context.


Nowadays, image retrieval is a combination of content-based and keyword-based retrieval. For inferring the semantics of text, Rhetorical Structure Theory has become one of the most popular and widely applied discourse theories of the last decade. It is a descriptive theory of a major aspect of the organization of natural text that shows how texts can be structured in terms of relations that hold between parts of the texts. Image information retrieval can be achieved on the basis of text semantics. Ushenko's Field Theory of Meaning is also used to derive relationships between text and images. On the other hand, dynamic weight assignment is a new technique that has been proposed and published but not yet implemented at a commercial level. This technique focuses on understanding the semantics of the text rather than just comparing term frequencies. It emphasizes how to weight keywords dynamically to support information retrieval. To accomplish the task, some new relationships are expected to be developed, which may result in amendments to the RST relation collection.

Two important parts of the project are semantic retrieval and semantic annotation.

Semantic retrieval: The user makes a request like "find pictures of dogs", which is answered by making use of lower-level features like texture, color, and shape.

Semantic annotation: The problem of creating metadata for images has been of vital importance to art and historical museums when cataloging collection items and storing them in digital form. The following approaches are commonly used in annotating images:

Keywords: Controlled vocabularies are used to describe the images in order to ease retrieval. In Finland, for example, the Finnish web thesaurus YSA is used for the task, augmented with museum- and domain-specific keyword lists.

Classifications: There are large classification systems, such as ICONCLASS and the Art and Architecture Thesaurus, that classify different aspects of life into hierarchical categories. An image is annotated by a set of categories that describe it. For example, an image of a seal depicting a castle could be related to the classes "seals" and "castles". The classes form a hierarchy and are associated with corresponding keywords. The hierarchy enriches the annotations. For example, since castles are a subclass of "buildings", the keyword "building" is relevant when searching for images with a castle.

Free text descriptions: Free text descriptions of the objects in the images are used. The information retrieval system indexes the text for keyword-based search.

Semantic web ontology techniques and metadata languages contribute to this tradition by providing means for defining class terminologies with well-defined semantics and a flexible data model for representing metadata descriptions. One possible step to take is to use RDF Schema for defining hierarchical ontology classes and an RDF ontology. The ontology together with the image metadata forms an RDF graph, a knowledge base, which can facilitate new semantic information retrieval services. In our case application, we used this approach.
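The way a class hierarchy enriches annotations can be sketched in a few lines. This example uses a plain Python dictionary rather than an RDF library, so it only mirrors the idea behind the `rdfs:subClassOf` relation; the toy hierarchy, the image names, and the functions `expand_annotations` and `find_images` are assumptions made for illustration, and the hierarchy is assumed to contain no cycles.

```python
# Toy hierarchy (hypothetical): each class maps to its direct superclass.
SUBCLASS_OF = {
    "castles": "buildings",
    "churches": "buildings",
    "seals": "artifacts",
}

def ancestors(cls):
    """Return every superclass of cls, nearest first (assumes no cycles)."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def expand_annotations(classes):
    """Add all inherited superclasses to a set of annotated classes."""
    expanded = set(classes)
    for cls in classes:
        expanded.update(ancestors(cls))
    return expanded

def find_images(annotations, wanted_class):
    """Return the images whose expanded annotations include wanted_class."""
    return sorted(
        image for image, classes in annotations.items()
        if wanted_class in expand_annotations(classes)
    )

# Toy annotations: the seal image is tagged with "seals" and "castles" only.
annotations = {
    "seal_photo": {"seals", "castles"},
    "church_photo": {"churches"},
    "dog_photo": {"dogs"},
}

building_images = find_images(annotations, "buildings")
castle_images = find_images(annotations, "castles")
print(building_images, castle_images)
```

No image is annotated with "buildings" directly, yet a search for that class still finds the seal and church images because the hierarchy supplies the missing link.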

Along with RST and dynamic weight assignment techniques, this project draws on the state of the art in Artificial Intelligence, specifically Natural Language Processing (NLP), for semantic analysis of the text. This analysis is required at several stages: development of relationships in the RSTT development phase, selection of keywords from the RSTT, dynamic weight assessment, and user query processing.


I am interested in providing a methodology for image retrieval using information retrieval technologies, one of which is RST, which will be used to enhance the relevancy of information about images. This thesis depends particularly on RST, which emphasizes understanding the semantics of texts and images. Until now, human experts have done the semantic analysis of text, but my thesis will provide a methodology to analyze and weight the terms dynamically for the development of the Knowledge Base. This knowledge base will then be used by the search engine to facilitate the retrieval of the required terms. As this thesis uses the context of coherent English text, it needs some special considerations, such as new relationships, new cue phrases, and so on.


The following activity plan is laid out to accomplish the goal:

Study of currently existing Search Engine Technologies, including famous search engines like Google, Yahoo and MSN.

Study of the semantic and sensory gaps in the relation between images and text.

Study of theories for annotating images and extracting their semantics, and a comparative view of RST and related theories.

Study of RST, its limitations and constraints.

Acquisition of a methodology to get text spans and keywords from the RSTT.

Formalization and Development of methodology for Indexing with Dynamic Weight Assessment Capabilities (IDWAC).

Comparison between existing search engine technology and the proposed solution.

Conclusions drawn from the comparison results.