Natural language processing (NLP) is a subfield of artificial intelligence, related to computer science, that focuses on developing efficient algorithms for processing text so that the resulting information is accessible to computer applications.
Natural language processing studies both the automated understanding and the automated generation of human language. Natural language generation systems convert information from computer databases into normal-sounding human language, while natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.
Natural language processing is the artificial intelligence discipline concerned with enabling machines to understand natural languages such as English, Korean, French, Telugu, and Hindi.
Understanding spoken human language requires background, or common-sense, knowledge of the everyday world. Natural language refers to the languages commonly spoken by people, such as English, German, and French. Human consciousness is coupled with language and with models of the outer world, a world that is static and unaffected by the consciousness of the entities living in it. Children often make up new words while playing, words that carry real meaning in the context of their lives.
A computer cannot directly understand the languages spoken by humans; the characters given as input must be converted into machine language so that the computer can process them. There are two basic approaches, depending on whether we want to write an effective "natural language front end" to a software system, or whether we are motivated to do fundamental research on minds and consciousness by building a system that acquires structure and intelligence through interaction with its environment.
Natural language processing is an area of artificial intelligence that bridges the gap between human language and machine language. To do this, finite state machines are used that can recognize word sequences as syntactically valid sentences. The system described here uses an ATN-based parser together with the WordNet lexicon.
Technology Involved in the Background
The technology used in the background to represent the grammar rules is the ATN parser, short for Augmented Transition Network, first described by William Woods in 1970. ATN parsers are finite state machines that recognize word sequences as specific parts of a sentence: words, noun phrases, verb phrases, and so on.
ATN parsers prove useful when the following restrictions are observed:
Use them in a semantically limited domain, restricting the parser to a particular subject area.
Protect them from very difficult input; they cannot understand ungrammatical sentences the way people can.
Use them only for English or other languages in which word order determines grammatical structure.
Accept that they are not guaranteed to work correctly all the time.
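As a rough illustration of the state-transition idea underlying such parsers, the sketch below is a plain finite-state recognizer only, without the registers and recursive sub-networks that a full ATN adds. It accepts only the toy pattern determiner-noun-verb over a small hand-picked vocabulary, and is written against a modern JDK for brevity:

```java
import java.util.Set;

// Toy finite-state sentence recognizer: accepts DET NOUN VERB, e.g. "the dog barks".
// A real ATN adds recursive sub-networks and registers; this shows only the
// underlying state-transition idea, with an invented three-state grammar.
public class ToyRecognizer {
    private static final Set<String> DET = Set.of("the", "a");
    private static final Set<String> NOUN = Set.of("dog", "cat", "item");
    private static final Set<String> VERB = Set.of("barks", "sleeps", "runs");

    public static boolean accepts(String sentence) {
        String[] words = sentence.toLowerCase().split("\\s+");
        int state = 0; // 0: expect DET, 1: expect NOUN, 2: expect VERB, 3: accept
        for (String w : words) {
            switch (state) {
                case 0: if (DET.contains(w)) state = 1; else return false; break;
                case 1: if (NOUN.contains(w)) state = 2; else return false; break;
                case 2: if (VERB.contains(w)) state = 3; else return false; break;
                default: return false; // extra words after the accept state
            }
        }
        return state == 3;
    }
}
```

Word order matters to this recognizer, which is exactly why the approach works only for languages where word order determines grammatical structure.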
Context-free approaches to NLP face the following difficulties:
Dealing with different sentence structures that have the same meaning.
Handling number agreement between subjects and verbs.
Determining the deep structure of input texts.
In general, accurately assigning correct morphological tags to input text is a difficult problem; hidden Markov models and Bayesian techniques are used for assigning word types, and English grammar is complex. The important steps in building NLP technology into your own programs are:
Reduce the domain of discourse to a minimum.
Create a set of "use cases" to focus your effort in designing and writing ATNs, and to use for testing your NLP system during development.
When possible, capture text input from real users of your system, and incrementally build up a set of use cases that your system can handle correctly.
Map identified words and parts of speech to actions that the system should perform.
Lexicon data indicates many of the word types. We will use the WordNet lexical database to build a lexicon.
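As a minimal sketch of what such a lexicon provides, the class below hard-codes a few hypothetical entries; a real system would load them from the WordNet database files rather than hard-coding:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory lexicon mapping words to word types. The entries here are
// invented examples; a production system would populate the map from the
// WordNet database files.
public class TinyLexicon {
    private static final Map<String, String> ENTRIES = new HashMap<>();
    static {
        ENTRIES.put("item", "NOUN");
        ENTRIES.put("date", "NOUN");
        ENTRIES.put("show", "VERB");
        ENTRIES.put("purchase", "VERB");
    }

    // Returns the word type, or "UNKNOWN" for words not in the lexicon.
    public static String lookup(String word) {
        return ENTRIES.getOrDefault(word.toLowerCase(), "UNKNOWN");
    }
}
```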
The main aim of this document is to research and validate the use of natural language processing for database querying by proposing a sophisticated ATN parser and related technology.
The main objective is to design and develop a system that can understand natural languages such as English and convert them into database queries.
The queries are executed in the DBMS and the response is returned in natural language.
This includes developing a natural language processor using artificial intelligence concepts.
A further aim is to design and implement an ATN parser in a suitable programming language and to create a database.
Finally, a help file is created so that information can be retrieved from the database with only basic computer skills and no knowledge of any programming language.
Natural language processing (NLP) is not a new term or technology introduced in recent years; it is, in fact, quite old. In the early stages, computers were primarily number processors, and the resources available on first-generation programmable calculators were not dissimilar to those available on early computers. Between the late 1950s and early 1960s, researchers tried to get these cheap programmable machines to do machine translation from Russian to English; imagining that task gives a sense of the magnitude of the problems that confronted the pioneers of NLP.
Even today, computers represent linguistic objects in non-linguistic ways. A computer represents a character as bytes, that is, as the 1s and 0s of its ASCII code.
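This byte-level representation can be demonstrated directly; the snippet below is a simple illustration, not part of the system itself:

```java
import java.nio.charset.StandardCharsets;

// For ASCII text, each character occupies one byte whose numeric value is its
// ASCII code point; this helper exposes those values as plain integers.
public class AsciiDemo {
    public static int[] codes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        int[] codes = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) codes[i] = bytes[i] & 0xFF;
        return codes;
    }
}
```

For example, the letter 'A' is stored as the single byte 65.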
Machine translation (MT) is technology used to translate text or speech from one natural language into another. At its simplest, it performs a word-for-word substitution from one natural language into another. The military and intelligence communities in the US and abroad, in particular, had great hopes for MT and invested accordingly. But, despite the level of funding, the first generation of work in MT was very disappointing.
The lack of sophisticated theories, combined with declining support and funding, led to the downfall of first-generation MT by the mid-1960s.
Over the following two decades, many developments in NLP arose from a changing view of the nature of computers, driven by the rise of new programming languages and algorithms.
Winograd's SHRDLU program, completed in 1971, was a crucial landmark in the development of NLP. Winograd's program was written in Lisp, the language of choice for most artificial intelligence (AI) researchers during the 1970s.
"One of Winograd's major contributions was to provide an 'existence proof' - to show that natural language understanding, albeit in restricted domains, was indeed possible for the computer. SHRDLU demonstrated in a primitive way a number of abilities - like being able to interpret questions, statements and commands, being able to draw inferences, explain its actions and learn new words - which had not been seen together before in a computer program. SHRDLU was a considerable achievement for one person and one that would have been impossible without the availability of high-level programming languages".
The idea of having programs work with explicitly represented, inspectable rules has been very successful in applications of AI, where such rule-based systems have been developed for tasks like medical diagnosis and the interpretation of geological measurements. Programming languages have emerged which allow the programmer to simply specify the rules and to leave many of the actual processing decisions up to the machine.
One especially exciting development is the rise of "logic programming" languages. Prolog, due to Alain Colmerauer - himself a computational linguist, is the most well-known of these languages and the one that we use in this book. The idea of these languages (still to be completely realized) is for programmers to simply describe their problems in logic, expressing what is to be done rather than how.
In NLP, an example of this might be a programmer specifying a grammar in much the same way as a descriptive linguist. With this representation, the computer would then be able both to generate example sentences allowed by the grammar and to determine whether given sentences were indeed grammatical.
Areas of usage
Natural language processing, as one of the best methods of human-computer interaction, is used in fields such as:
Speech segmentation: used in the field of security, where recorded human voices are compared with those stored in a database.
Speech recognition: speech-to-text conversion, in which large amounts of data or information are digitised simply through voice.
Text segmentation: languages such as Sanskrit, Chinese, Japanese, and Thai are written without single-word boundaries, so any significant text parsing usually requires identifying word boundaries first, which is often a non-trivial task.
In the existing system, queries must be written in a high-level language such as SQL.
A user of this system must learn SQL and write queries in that high-level language.
The user has to be a good programmer.
The user needs a good knowledge of computers.
The user must be specially trained.
The cost of this training is high.
Where there are multi-department databases, finding suitable employees is difficult.
The situation is worse in a non-IT business with multiple divisions and branches.
In addition, people in various areas who are computer illiterate are unable to communicate with computers in their regional language.
Although NLP is being implemented in many fields, it has so far done little to reach the needs of the common man, since it requires implementation effort, funding, and awareness.
In the present system, a series of common words is pre-written, and information can be retrieved only if the query uses these words.
The proposed system is designed to understand natural language and convert it into an SQL query. The system makes use of the English parts of speech, dividing the input and identifying the nouns, verbs, and conjunctions separately. The SQL query is executed against the Oracle database, and the results are presented back in natural language.
The words stored are the most frequent words used in relation to the organization's database and the field in which the system is to be deployed.
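A rough sketch of this splitting step is shown below; the word lists are hypothetical stand-ins for the frequent words described above, and a real implementation would consult WordNet and a proper tagger rather than fixed sets:

```java
import java.util.*;

// Rough sketch of the splitting step: tokenize the input query and bucket each
// word as noun, verb, or conjunction using small hand-coded word lists. The
// vocabulary here is invented for illustration.
public class WordSplitter {
    private static final Set<String> NOUNS = Set.of("items", "details", "date", "person");
    private static final Set<String> VERBS = Set.of("show", "list", "purchased");
    private static final Set<String> CONJUNCTIONS = Set.of("and", "or");

    public static Map<String, List<String>> split(String query) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        buckets.put("nouns", new ArrayList<>());
        buckets.put("verbs", new ArrayList<>());
        buckets.put("conjunctions", new ArrayList<>());
        for (String w : query.toLowerCase().split("\\s+")) {
            if (NOUNS.contains(w)) buckets.get("nouns").add(w);
            else if (VERBS.contains(w)) buckets.get("verbs").add(w);
            else if (CONJUNCTIONS.contains(w)) buckets.get("conjunctions").add(w);
        }
        return buckets;
    }
}
```

For a query such as "show items and details", the verbs bucket would hold "show", the nouns bucket "items" and "details", and the conjunctions bucket "and".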
The major functional requirements of the system are as follows:
To create a natural language processor.
To create DB Interface to connect the database.
To implement a natural language engine that consists of search techniques for the words.
Prior information regarding the field in which it is to be implemented.
Non-Functional Requirements:
The major non-functional requirements of the system are as follows:
The queries from the client.
The data in the database.
To develop and test the research hypothesis, a corporate company's database and website were taken as an example.
The website proved useful for farmers who do not know standard English. Farmers provided with a valid user ID and password can log on to the website and access the database. The website is accessible in three languages: English, Hindi, and Telugu.
However, the data in the database, which is stored on distant servers, is in standard English.
Here, using the programming language, a user-friendly interface to the database is created in which the user can simply type a query in his or her own language. For example:
A user wants the details of an item to date: "show item till date" or "item details".
A user wants the items purchased by a person: "Mr. X items" or "items by X".
The user can enter the query in any language; it need not be English.
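As a sketch of how the two example phrasings above might be mapped to SQL, the following uses hypothetical table and column names (ITEMS, BUYER, PURCHASE_DATE) that a deployed system would instead read from the target schema:

```java
// Sketch of the translation step for the two example phrasings. The schema
// names (ITEMS, BUYER, PURCHASE_DATE) are invented for illustration; a real
// system would discover them from the database it is connected to.
public class QueryTranslator {
    public static String toSql(String query) {
        String q = query.toLowerCase().trim();
        if (q.contains("items by ")) {
            String buyer = q.substring(q.indexOf("items by ") + "items by ".length());
            return "SELECT * FROM ITEMS WHERE BUYER = '" + buyer + "'";
        }
        if (q.contains("item") && q.contains("till date")) {
            return "SELECT * FROM ITEMS WHERE PURCHASE_DATE <= SYSDATE";
        }
        return null; // pattern not recognized
    }
}
```

The string concatenation here is for illustration only; a real system should use bind variables (java.sql.PreparedStatement) to avoid SQL injection.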
Steps Involved in working
The user types the query in his or her own natural language in the text box provided.
The user selects the database connection from the list box provided.
The user can click Help to read information about how to use the system.
The user clicks the "Do Query" button to execute the query.
Funding and budget
The funding required for such a project would be very small. The use of computer labs and facilities, along with the appropriate software, would be the major expense.
CPU : Intel Pentium 4 Processor
RAM : 512 MB
HDD : 80 GB
Network : NIC Card Required
Programming Language : Java (Version JDK 1.5)
Database Backend : Oracle 10g Release 2
Technologies : Servlets, JSP
Scripting Language : Java Script
Operating System : Windows XP Professional with Service pack 2
Thus, the entire budget of the project would not be large; it would not exceed GBP 100. The main factors that would consume time and effort are the development of the program modules and the testing of the system. Tests would have to be conducted to eliminate as many flaws as possible and make the system completely automated.
In addition, the cost invested in the equipment and technology is a one-time investment, so if the same NLP research is to be carried out later, there is no need to purchase new equipment.
The system is designed as a completely automated process, so there is little or no user intervention.
The system is reliable because of qualities inherited from the chosen platform, Java; code built with Java tends to be reliable.
The system exhibits high performance because it is well optimized and uses Java's automatic garbage collection.
The system is designed to be cross-platform: it is supported on a wide range of hardware and on any software platform that has a JVM.
The system is implemented using the platform-independent, lightweight Java Foundation Classes known as Java Swing, with core Java classes used to implement the AI concepts.
The user interface is based entirely on Swing components.
The application is packaged into a single package named nlp.
The code in this project is released under the GPL (General Public License).
The work plan consisted of the following tasks:
Task 1: Abstract, introduction and methods (7 days)
Task 2: Literature review (15 days)
Task 3: Approach and methods (about 15 days)
Task 4: Planning the work
Task 5: Concluding the project and documentation
Task 3 took nearly 15 days because the person contacted at the company responded only after getting permission from his authorities.
The proposed system strives to meet the needs of everyone from the common man in rural areas to the businessman in the sophisticated world. The most common words in the regional language of the place where the database is held are stored in the artificial logic; these can be updated by the admin user whenever needed and are also updated regularly by the developers.