A Natural Language Processing Interface Agent Computer Science Essay

With the amount of information on the Internet growing at a rapid rate, users are finding it increasingly difficult to locate relevant data in a timely and effortless fashion. All too often, search engines return large amounts of information that does not meet the user's needs. As a result, important data can be missed as the user tries to narrow the search and, in doing so, reduces the number of successful hits. The aim of this research is to create a natural language interface to enhance the search capabilities of search engines.

1. Aim of the research

The value to our society of being able to communicate with computers in everyday "natural" language cannot be overstated. Imagine asking your computer "Does this candidate have a good record on the environment?" or "When is the next televised National League baseball game?" Or being able to tell your PC "Please format my homework the way my English professor likes it." Commercial products can already do some of these things, and AI scientists expect many more in the next decade. One goal of AI work in natural language is to enable communication between people and computers without resorting to memorization of complex commands and procedures. Automatic translation, which would enable scientists, business people and just plain folks to interact easily with people around the world, is another goal. Both are just part of the broad field of AI and natural language, along with the cognitive science aspect of using computers to study how humans understand language. For the aforementioned reasons, the primary objective of this research is the creation of a natural language interface agent to enhance PISA.

The Personal Internet Search Assistant (PISA) system simplifies the search process by using a preemptive approach to searching. The system consists of a set of cooperative agents that monitor the user's Internet browsing behavior, which will enable it to seek out relevant information on its own. It also has the capability of functioning as a conventional search engine. The PISA system is made up of five software agents. Each agent has been designed to fulfill a specific purpose in an independent manner. This independence has been achieved by allowing each agent to act as a server. However, when combined they provide the user with a more satisfying searching experience. Even though the agents are independent, communication links have been created that enable the sharing of data to take place. Consequently, these agents can be on a single computer or distributed among more than one computer in different locations. (Byer 2008)

This agent will allow users to use complete sentences and questions when searching for information, instead of keywords and short phrases. The interface will use the principles of natural language processing to decode the statement or question and find the best match among the data held in the search engine's database.

2. Background

2.1 Advantages of Natural Language Interfaces

Natural language is only one medium for human-machine interaction, but has several obvious and desirable properties:

It provides an immediate vocabulary for talking about the contents of the computer.

It provides a means of accessing information in the computer independently of its structure and encodings.

It shields the user from the formal access language of the underlying system.

It is available with a minimum of training.

2.2 The Hardness of Natural Language

There are several major reasons why natural language understanding is a difficult problem. They include:

The complexity of the target representation into which the matching is being done. Extracting meaningful information often requires the use of additional knowledge.

The type of mapping: one-to-one, many-to-one, one-to-many, or many-to-many. One-to-many mappings require a great deal of domain knowledge beyond the input to make the correct choice among target representations. For example, the word "tall" in the phrase "a tall giraffe" has a different meaning than in "a tall poodle." English requires many-to-many mappings.

The level of interaction of the components of the source representation. In many natural language sentences, changing a single word can alter the interpretation of the entire structure. As the number of interactions increases, so does the complexity of the mapping.

The presence of noise in the input to the understander. We rarely listen to one another against a silent background. Thus speech recognition is a necessary precursor to speech understanding.

The modifier attachment problem. (This arises because sentences aren't inherently hierarchical, I'd say -- POD.) The sentence "Give me all the employees in a division making more than $50,000" doesn't make it clear whether the speaker wants all employees making more than $50,000, or only those in divisions making more than $50,000.

The quantifier scoping problem. Words such as "the," "each," or "what" can have several readings.

Elliptical utterances. The interpretation of a query may depend on previous queries and their interpretations. For example, a user may ask "Who is the manager of the automobile division?" and then follow up with only "Of aircraft?"
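The elliptical-utterance case can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the two regular expressions, the `interpret` function, and the dictionary representation of a query are hypothetical and are not part of PISA or of any system discussed in this essay. The idea shown is that a fragment is interpreted by copying the previous interpretation and overwriting only the part the fragment supplies.

```python
import re

# Hypothetical patterns for one full question form and one elliptical fragment.
QUERY_PATTERN = re.compile(
    r"who is the (?P<role>\w+) of the (?P<unit>\w+) division\??$",
    re.IGNORECASE,
)
FRAGMENT_PATTERN = re.compile(r"of (?P<unit>\w+)\??$", re.IGNORECASE)


def interpret(utterance, previous=None):
    """Return a {'role', 'unit'} interpretation, or None if not understood.

    `previous` is the interpretation of the preceding query; a fragment
    cannot be interpreted without it.
    """
    text = utterance.strip()
    full = QUERY_PATTERN.match(text)
    if full:
        return {
            "role": full.group("role").lower(),
            "unit": full.group("unit").lower(),
        }
    fragment = FRAGMENT_PATTERN.match(text)
    if fragment and previous is not None:
        # Reuse everything from the previous query except the replaced unit.
        return {**previous, "unit": fragment.group("unit").lower()}
    return None


first = interpret("Who is the manager of the automobile division?")
second = interpret("Of aircraft?", previous=first)
print(first)
print(second)
```

Without the `previous` argument the fragment "Of aircraft?" returns `None`, which mirrors the point made above: the fragment has no interpretation on its own.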

3. What is Natural Language Processing

3.1 Natural Language Processing

Language processing can be divided into two tasks:

Processing written text, using lexical, syntactic, and semantic knowledge of the language as well as any required real world information.

Processing spoken language, using all the information needed above, plus additional knowledge about phonology as well as enough additional information to handle the further ambiguities that arise in speech.

The steps in the process of natural language understanding are:

3.2 Morphological analysis

Individual words are analyzed into their components, and non-word tokens (such as punctuation) are separated from the words. For example, in the phrase "Bill's house" the proper noun "Bill" is separated from the possessive suffix "'s." (Ritchey 2006)
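A minimal Python sketch of this step is given here for illustration. The token pattern and the `analyze` function are assumptions made for the example, not a description of any particular system; real morphological analyzers also handle prefixes, inflections, and irregular forms.

```python
import re

# Illustrative token pattern (an assumption, not taken from the essay):
# the possessive suffix, a run of word characters, or one punctuation mark.
TOKEN_PATTERN = re.compile(r"'s\b|\w+|[^\w\s]")


def analyze(text):
    """Split text into (token, kind) pairs."""
    result = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        if token == "'s":
            # Limitation: a contraction such as "it's" is labeled the same way.
            kind = "possessive-suffix"
        elif token[0].isalnum() or token[0] == "_":
            kind = "word"
        else:
            kind = "punctuation"
        result.append((token, kind))
    return result


for token, kind in analyze("Bill's house."):
    print(token, kind)
```

For the phrase "Bill's house." the sketch yields four tokens: the word "Bill", the possessive suffix "'s", the word "house", and the punctuation mark ".".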

3.3 Syntactic analysis

Linear sequences of words are transformed into structures that show how the words relate to one another. This parsing step converts the flat list of words of the sentence into a structure that defines the units represented by that list. Constraints imposed include word order ("manager the key" is an illegal constituent in the sentence "I gave the manager the key"), number agreement, and case agreement. (Green and Morgan 2001)
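The word-order constraint can be sketched with a very small recursive-descent parser in Python. The grammar (S -> NP VP, VP -> V NP NP | V NP, NP -> Pron | Det N), the five-word lexicon, and all function names are assumptions chosen to cover only the example sentence; number and case agreement are not checked.

```python
# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {
    "i": "Pron", "gave": "V", "the": "Det",
    "manager": "N", "key": "N",
}


def parse_np(words, i):
    """NP -> Pron | Det N. Return (tree, next_index) or None."""
    if i < len(words) and LEXICON.get(words[i]) == "Pron":
        return ("NP", words[i]), i + 1
    if (i + 1 < len(words)
            and LEXICON.get(words[i]) == "Det"
            and LEXICON.get(words[i + 1]) == "N"):
        return ("NP", words[i], words[i + 1]), i + 2
    return None


def parse_vp(words, i):
    """VP -> V NP NP | V NP. Return (tree, next_index) or None."""
    if i >= len(words) or LEXICON.get(words[i]) != "V":
        return None
    first = parse_np(words, i + 1)
    if first is None:
        return None
    np1, j = first
    second = parse_np(words, j)
    if second is not None:
        np2, k = second
        return ("VP", words[i], np1, np2), k
    return ("VP", words[i], np1), j


def parse_sentence(text):
    """S -> NP VP. Return a tree, or None if no parse covers every word."""
    words = text.lower().split()
    subject = parse_np(words, 0)
    if subject is None:
        return None
    np, i = subject
    predicate = parse_vp(words, i)
    if predicate is None:
        return None
    vp, j = predicate
    if j != len(words):
        return None
    return ("S", np, vp)


print(parse_sentence("I gave the manager the key"))
print(parse_np(["manager", "the", "key"], 0))
```

The full sentence receives a tree with two noun phrases inside the verb phrase, while "manager the key" is rejected as a noun phrase because it does not start with a pronoun or a determiner.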

3.4 Semantic analysis

The structures created by the syntactic analyzer are assigned meanings. In most universes, the sentence "Colorless green ideas sleep furiously" (Chomsky, 1957) would be rejected as semantically anomalous. This step must map individual words into appropriate objects in the knowledge base, and must create the correct structures to correspond to the way the meanings of the individual words combine with each other. (Landauer, Foltz & Laham 1998)
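One common way to detect this kind of anomaly is to check selectional restrictions: each modifier or verb states a semantic feature that its noun must have. The Python sketch below assumes a tiny hand-written knowledge base; the feature names, the three dictionaries, and the `semantic_violations` function are hypothetical and are not taken from the cited work.

```python
# Hypothetical knowledge base: noun -> set of semantic features.
NOUN_FEATURES = {
    "ideas": {"abstract"},
    "dogs": {"physical", "animate"},
    "grass": {"physical"},
}

# Hypothetical selectional restrictions: word -> feature its noun must have.
ADJECTIVE_REQUIRES = {"green": "physical", "colorless": "physical"}
VERB_REQUIRES = {"sleep": "animate"}


def semantic_violations(adjectives, noun, verb):
    """Return (word, required_feature) pairs that the noun fails to satisfy."""
    features = NOUN_FEATURES[noun]
    violations = []
    for adjective in adjectives:
        needed = ADJECTIVE_REQUIRES.get(adjective)
        if needed is not None and needed not in features:
            violations.append((adjective, needed))
    needed = VERB_REQUIRES.get(verb)
    if needed is not None and needed not in features:
        violations.append((verb, needed))
    return violations


print(semantic_violations(["colorless", "green"], "ideas", "sleep"))
print(semantic_violations([], "dogs", "sleep"))
```

Under these assumed features, every modifier and the verb in "Colorless green ideas sleep" fails its restriction, whereas "dogs sleep" produces no violations. The sketch does not model the further contradiction between "colorless" and "green" themselves.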

3.5 Discourse integration

The meaning of an individual sentence may depend on the sentences that precede it and may influence the sentences yet to come. The entities involved in the sentence must either have been introduced explicitly or they must be related to entities that were. The overall discourse must be coherent. (Yang et al. 2001)

3.6 Pragmatic analysis

The structure representing what was said is reinterpreted to determine what was actually meant. (Austin 1962)

4. Existing Natural Language Systems


4.1 SAD-SAM

SAD-SAM (Syntactic Appraiser and Diagrammer, Semantic Analyzing Machine) was programmed by Robert Lindsay in 1963 at CMU. It used a basic English vocabulary (1,700 words) and followed a context-free grammar. The SAD component parsed input from left to right, built derivation trees, and passed them to SAM, which extracted the semantically relevant information to build family trees and find answers to questions. (Lindsay 1963)


4.2 BASEBALL

BASEBALL was an information retrieval program with a large database of facts about all American League games over a given year. It accepted input questions from the user, limited to one clause with no logical connectives. (Green 1963)

4.3 SIR

The Semantic Information Retrieval (SIR) system was a prototype "understanding" machine: it could accumulate facts and then make deductions about them in order to answer questions. (Raphael 1968)


4.4 ELIZA

The most famous pattern-matching natural language program, ELIZA, was built at MIT in 1966. The program assumes the role of a Rogerian, or "nondirective," therapist in its dialog with the user. (Weizenbaum 1966)

It operated by matching the left sides of its rules against the user's last sentence, and using the appropriate right side to generate a response. Rules were indexed by keywords, so only a few had to be matched against a particular sentence. Some rules had no left side, so they could apply anywhere, with replies like "Tell me more about that." Note that these rules are "approximate" matchers. This accounts for ELIZA's major strength, its ability to say something reasonable most of the time, as well as its major weaknesses: the superficiality of its understanding and the ease with which it can be led completely astray.
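The rule mechanism described above can be sketched in a few lines of Python. This is an illustration in the spirit of ELIZA, not Weizenbaum's original script: the two rules, the `RULES` table, and the `respond` function are invented for the example, and the pronoun swapping that ELIZA performed (for instance "me" to "you") is omitted.

```python
import re

# Hypothetical rules, indexed by keyword. Each rule pairs a left side
# (a pattern) with a right side (a response template).
RULES = {
    "mother": [
        (re.compile(r".*\bmy mother (.*)", re.IGNORECASE),
         "Why do you say your mother {0}?"),
    ],
    "feel": [
        (re.compile(r".*\bi feel (.*)", re.IGNORECASE),
         "Why do you feel {0}?"),
    ],
}

# A rule with no left side: it applies to any sentence.
FALLBACK = "Tell me more about that."


def respond(sentence):
    """Generate a reply from the first keyword rule whose left side matches."""
    cleaned = sentence.strip().rstrip(".!?")
    words = set(re.findall(r"[a-z']+", cleaned.lower()))
    for keyword, rules in RULES.items():
        if keyword not in words:
            continue  # keyword index: skip rules that cannot apply
        for pattern, template in rules:
            match = pattern.match(cleaned)
            if match:
                return template.format(*match.groups())
    return FALLBACK


print(respond("I feel tired today."))
print(respond("The weather is nice."))
```

The second call shows the weakness noted above: with no keyword present, the program falls back to a generic reply that involves no understanding of the input at all.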


4.5 LUNAR

LUNAR answered questions about the rock samples brought back from the moon using two databases: the chemical analyses and the literature references. Specifically, it helped geologists access, compare, and evaluate chemical analysis data on moon rocks and soil composition obtained from the Apollo 11 mission. It operated by translating a question entered in English into an expression in a formal query language. The translation was done with an augmented transition network (ATN) parser coupled with a rule-driven semantic interpretation procedure. (Woods 1973)