In an age of information overload, text documents, digital files, images, audio and many other kinds of information flood our daily lives. However, when people search for information, the textual retrieval system is still the mainstream tool. Why is the textual retrieval system still important? Because most of the growing volume of electronic information is derived from documents or newspapers. This paper focuses on the working principles and characteristics of textual retrieval systems, users' needs and the current problems. The characteristics are identified from descriptions of various IR models such as the Boolean model, the Vector model, the Probabilistic model and the Connectionist model. Rather than trying to find out whether the data can be modeled more precisely, the aim is to find a similarly simple model that improves the results without reducing retrieval efficiency. By analyzing the principles and models of textual retrieval systems, we attempt to find a better information retrieval strategy as well as a better way of improving the textual retrieval system.
An information retrieval system is a system for storing, retrieving and maintaining information. Such a system is designed to serve a specific function and is made up of a set of interacting components that are integrated to achieve a goal. The objective of an information retrieval system is to minimize the overhead of a user locating needed information. The area of retrieval systems spans a variety of sub-fields, including information retrieval, text categorization, information filtering and question answering. Information retrieval is performed by users of internet search engines or digital libraries. Text categorization labels text documents with one or more predefined categories, possibly organized in a hierarchy. Information filtering, or routing, matches incoming documents with users' interest profiles. Question answering aims to extract specific and preferably short answers rather than provide the full documents containing them.
Information retrieval is often treated as synonymous with document or text retrieval. Text retrieval implies that the task of an information retrieval system is to retrieve documents or texts with information content. The goal of text retrieval is to retrieve documents or texts that are closely relevant to a user's information need. In general, the central problem of text retrieval is translating the user's needs into queries for the text retrieval system. Improving text retrieval systems improves people's access to the information available to them. In this paper, we present the characteristics of textual retrieval systems and describe various models.
2. Textual Retrieval System
Text retrieval is the problem of searching for a specific piece of information on a specific topic in large document repositories. In practice, this methodology should let the user retrieve the relevant information given a certain natural language query. Standard text retrieval builds indexes of documents and gives the user the possibility to perform searches by formulating queries against this index. Queries are usually formulated in natural language and express the concept the user wishes to retrieve. The system should then be able to compare the concept expressed in the query with all the documents in the collection. Finally, the system ranks the documents in order of relevance and returns the most relevant ones to the user. A difficulty is that the same content can be described in many different ways using different words.
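The index, query and rank steps described above can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the whitespace tokenizer, the term-count score and the names build_index and search are hypothetical choices made for this example and are not taken from any particular retrieval system.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a string and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an index mapping each term to a Counter of {doc_id: term count}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query, top_k=3):
    """Rank documents by the number of query-term occurrences they contain."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken alphabetically by document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Toy collection, invented for the example.
docs = {
    "d1": "text retrieval ranks documents",
    "d2": "image retrieval uses pixels",
    "d3": "text documents and text queries",
}
index = build_index(docs)
results = search(index, "text documents")
print(results)
```

A real system would replace the raw term count with a more careful relevance score, but the three stages stay the same.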
3. Information Retrieval Models
Various models for retrieving documents from huge collections will now be presented: the Boolean model, the Vector model, the Probabilistic model, the Connectionist model and Latent Semantic Indexing. The most common information retrieval models are the Boolean, vector space and probabilistic models. A Boolean model treats retrieval as evaluating a Boolean statement built with operators such as AND, OR and NOT, so resolving a query becomes the problem of finding which documents make the query true. A vector space model views queries and documents as vectors and compares them in terms of Euclidean distance, which reduces searching to finding the documents closest to a query. A probabilistic model views each document as having been created by sampling from a probability distribution. The probability of certain features occurring in a document can be described, and documents can be retrieved by looking at their probability of relevance.
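As a rough illustration of the vector space view, the following Python sketch represents each text as a vector of raw term counts and returns the document with the smallest Euclidean distance to the query. The raw-count weighting and the names to_vector and closest_document are assumptions made for this example. Practical systems usually apply term weighting and length normalization, which are omitted here.

```python
import math


def to_vector(text, vocabulary):
    """Represent a text as a list of raw term counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def closest_document(docs, query):
    """Return the id of the document nearest to the query, plus all distances."""
    # The vocabulary is every distinct word that appears in the collection.
    vocabulary = sorted({w for text in docs.values() for w in text.lower().split()})
    query_vec = to_vector(query, vocabulary)
    distances = {
        doc_id: math.dist(to_vector(text, vocabulary), query_vec)
        for doc_id, text in docs.items()
    }
    return min(distances, key=distances.get), distances


# Toy collection, invented for the example.
docs = {
    "d1": "boolean query operators",
    "d2": "probability of relevance",
    "d3": "vector space query",
}
best, distances = closest_document(docs, "vector query")
print(best, distances)
```

Note that math.dist requires Python 3.8 or later.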
3.1 Boolean model
The Boolean model is one of the simplest methods of information retrieval and uses Boolean operators to combine search terms. Queries are Boolean expressions of keywords connected by AND, OR and NOT. Nested parentheses are used for complex queries, and the model is widely used in commercial systems. The method is applied to textual repositories that have already been manually indexed with keywords. Moreover, the Boolean model can be extended to include ranking, and it can be implemented reasonably efficiently for normal queries. Retrieval systems typically use an inverted file organization or text signatures, and proximity operators can be supported. Boolean retrieval is very powerful for users who know how to specify exactly what they want. Users can express structural and conceptual constraints to describe important linguistic features. Boolean retrieval is very effective when a query requires a comprehensive and unambiguous selection. It also offers a large number of techniques to narrow or broaden a query.
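A minimal Python sketch of Boolean retrieval over an inverted file may make this concrete. It assumes that queries are already parsed into nested tuples, which stand in for nested parentheses, and that keywords are single lowercase words. The function names build_inverted_index and evaluate are hypothetical, and proximity operators and ranking are left out.

```python
def build_inverted_index(docs):
    """Map each keyword to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def evaluate(query, index, all_docs):
    """Evaluate a nested query: a keyword string or a tuple (operator, operand, ...)."""
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    operator, *operands = query
    results = [evaluate(operand, index, all_docs) for operand in operands]
    if operator == "AND":
        return set.intersection(*results)
    if operator == "OR":
        return set.union(*results)
    if operator == "NOT":
        # NOT takes one operand and is resolved against the whole collection.
        return all_docs - results[0]
    raise ValueError(f"unknown operator: {operator}")


# Toy collection, invented for the example.
docs = {
    "d1": "boolean retrieval model",
    "d2": "vector retrieval model",
    "d3": "boolean algebra",
}
index = build_inverted_index(docs)
all_docs = set(docs)

# boolean AND (retrieval OR algebra)
first = evaluate(("AND", "boolean", ("OR", "retrieval", "algebra")), index, all_docs)
# retrieval AND (NOT boolean)
second = evaluate(("AND", "retrieval", ("NOT", "boolean")), index, all_docs)
print(sorted(first), sorted(second))
```

The result of each query is an unordered set: a document either satisfies the expression or it does not.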
The Boolean model has some drawbacks, chiefly that natural language is far more complex than Boolean expressions. To search efficiently, the user needs some knowledge of the topic being searched. It is difficult to control the number of documents retrieved and to rank the output. It is also difficult to perform relevance feedback, in which the user identifies documents as relevant or irrelevant. Boolean retrieval returns a set of documents without any ranking. If the result set is very large, presenting the relevant documents to the user can be a problem. The multiple ways in which a query can be modified or structured can overwhelm users. When the Boolean operator AND combines more than three or four criteria, no or very few documents are retrieved. Users therefore often face either the null-output problem or the information overload problem. Finally, the degree of uncertainty or error due to the vocabulary problem is not represented.
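The way AND queries drift toward null output can be seen in a small Python sketch. The four-document collection and the helper match_all are invented purely for illustration. The exact numbers depend entirely on this toy data and say nothing about real collections.

```python
# Each document is represented by its set of index keywords (toy data).
docs = {
    "d1": {"text", "retrieval", "boolean", "model"},
    "d2": {"text", "retrieval", "vector"},
    "d3": {"text", "retrieval", "boolean", "ranking"},
    "d4": {"text", "image"},
}


def match_all(docs, criteria):
    """Return ids of documents containing every criterion (a pure AND query)."""
    return sorted(
        doc_id for doc_id, terms in docs.items() if set(criteria) <= terms
    )


criteria = ["text", "retrieval", "boolean", "ranking", "vector"]

# Size of the result set as one more AND criterion is added each time.
sizes = [len(match_all(docs, criteria[:n])) for n in range(1, len(criteria) + 1)]
print(sizes)
```

With a single broad criterion every document matches, which is the information overload case. With all five criteria nothing matches, which is the null-output case.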