NLP plays an important role in day-to-day life because of its wide range of natural language applications. These applications let users interact with computers in their own mother tongue by means of a keyboard and a screen.
Automatic summarization - Automatic summarization is the creation of a shortened version of a text by a computer program.
Foreign language reading aid - A foreign language reading aid is a computer program that helps a non-native language user read properly in their target language. Reading properly means pronouncing each word correctly and placing stress on the right parts of the word.
Foreign language writing aid - A foreign language writing aid is a computer program that helps a non-native language user write well in their target language. Assistive operations fall into two categories: on-the-fly prompts and post-writing checks. Assisted aspects of writing include lexical, syntactic, and lexical-semantic choices, idiomatic expression transfer, etc. Online dictionaries can also be considered a type of foreign language writing aid.
Information extraction (IE) - IE is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents.
Information retrieval (IR) - IR is concerned with storing, searching and retrieving information. It is a separate field within computer science, but IR relies on some NLP methods. Some current research and applications seek to bridge the gap between IR and NLP.
Machine translation - Automatically translating from one human language to another. Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.
Named entity recognition (NER) - Given a stream of text, determining which items in the text map to proper names, such as people or places. Although named entities in English are usually capitalized, many other languages do not use capitalization to distinguish them.
Natural language generation (NLG) - NLG is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.
Natural language understanding (NLU) - NLU is an advanced subtopic of natural language processing that deals with machine reading comprehension.
Optical character recognition - Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision.
Anaphora resolution - In linguistics, anaphora is an instance of an expression referring to another. In general, an anaphoric expression is represented by a pro-form or some kind of deictic. Anaphora resolution means finding what the anaphor is referring to.
Question answering - Given a human-language question, the task of producing a human-language answer. The question may be closed-ended or open-ended. Open domain question answering (QA) systems aim to support a user who wishes to ask a specific question in natural language and receive a specific answer to that question, where the answer is to be sought in a (potentially huge) collection of natural language texts. QA has become an important application area of natural language processing technologies in the past few years.
Speech recognition - Given a sound clip of a person or people speaking, the task of producing a text transcription of what the speaker(s) said. Speech recognition, also known as automatic speech recognition or computer speech recognition, converts spoken words to machine-readable input. The term "voice recognition" is sometimes used incorrectly to refer to speech recognition; strictly, it refers to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said. Speech recognition applications include voice dialing, call routing, domestic appliance control, content-based spoken audio search, simple data entry, preparation of structured documents, speech-to-text processing, and use in aircraft cockpits.
Spoken dialogue system - A spoken dialogue system is a dialogue system delivered through voice. It has two essential components that do not exist in a text dialogue system: a speech recognizer and a text-to-speech module.
Text simplification - Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the grammar and structure of the prose are greatly simplified, while the underlying meaning and information remain the same.
Speech synthesis - Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
Word sense disambiguation - Many words have more than one meaning; we have to select the meaning which makes the most sense in context. This is an NLP task with a long history but one which has come to prominence in recent years as a new, and very high level, application of empirical and machine learning methods in NLP.
Summarization - Summarization systems aim to take one or more documents and produce from them a reduced document which contains essential information from the source documents.
Difficulty in Achieving NLP
Resolving ambiguity is one of the most difficult tasks in natural language processing because computers lack "common sense".
Natural language processing requires analyzing underlying linguistic structures and relationships, grammatical rules, explicit concepts, implicit meanings, logic, discourse context, and more.
Because individual words and sentences often have multiple meanings, and a single concept can be expressed in many different forms, a significant challenge in natural language processing is how to handle the ambiguity that arises when interpreting a single sentence.
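To make the ambiguity problem concrete, the following is a minimal sketch of a simplified Lesk-style disambiguator: it picks the sense whose dictionary gloss shares the most words with the surrounding sentence. The sense inventory `SENSES`, the stopword list, and the function names are hypothetical and written only for this illustration; a real system would draw glosses from a resource such as WordNet.

```python
import re

# Hypothetical two-sense inventory for "bank"; the glosses are illustrative only.
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits of money and makes loans",
        "river": "the sloping land beside a body of water such as a river",
    }
}

# Small, hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as",
             "to", "in", "on", "at", "by", "he", "she", "i"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with `sentence`.

    Ties are resolved in favour of the sense listed first in SENSES.
    """
    context = tokens(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate("bank", "She sat on the bank of the river and watched the water"))
    print(disambiguate("bank", "He went to the bank to deposit money and ask about loans"))
```

Word overlap is a crude stand-in for the "common sense" a human reader applies: the sketch fails whenever the context shares no vocabulary with the correct gloss, which is one reason ambiguity remains hard.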
NLP Tools and Techniques
A number of researchers have attempted to develop improved technology for the various activities that form important parts of NLP work. This work may be categorized as follows:
• Lexical and morphological analysis, noun phrase generation, word segmentation, etc. (Bangalore & Joshi, 1999; Barker & Cornacchia, 2000; Chen & Chang, 1998; Dogru & Slagle, 1999; Kam-Fai et al., 1998; Kazakov et al., 1999; Lovis et al., 1998; Tolle & Chen, 2000; Zweigenbaum & Grabar, 1999)
• Semantic and discourse analysis, word meaning and knowledge representation (Kehler, 1997; Mihalcea & Moldovan, 1999; Meyer & Dale, 1999; Pedersen & Bruce, 1998)
• Knowledge-based approaches and tools for NLP (Argamon et al., 1998; Fernandez & Garcia-Serrano, 2000).
Dogru & Slagle (1999) propose a lexicon model that involves automatic acquisition of words as well as representation of the semantic content of individual lexical entries. Kazakov et al. (1999) report research on word segmentation based on an automatically generated annotated lexicon of word-tag pairs. Kam-Fai et al. (1998) report the features of an NLP tool called Chicon used for word segmentation in Chinese text. Zweigenbaum & Grabar (1999) propose a method for acquiring morphological knowledge about words in medical literature. It takes advantage of commonly available lists of synonym terms to bootstrap the acquisition process. Although the authors experimented with the method on the French version of the SNOMED International Microglossary for pathology, they claim that since the method does not rely on a priori linguistic knowledge, it is applicable to other languages such as English. Lovis et al. (1998) propose the design of a lexicon for use in the NLP of medical texts.
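As a rough illustration of what lexicon-based word segmentation involves, the sketch below uses greedy forward maximum matching, a common textbook baseline. It is not the method of Chicon or of any of the papers cited above; the function name `max_match` and the toy lexicons are assumptions made for this example.

```python
def max_match(text, lexicon, max_len=None):
    """Segment `text` by greedy forward maximum matching against `lexicon`.

    At each position the longest lexicon entry that matches is taken;
    if nothing matches, a single character is emitted as its own token.
    """
    if max_len is None:
        # Longest entry in the lexicon bounds the search window.
        max_len = max(map(len, lexicon), default=1)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments


if __name__ == "__main__":
    # Toy Chinese lexicon (hypothetical): "Beijing", "university",
    # "Peking University", "student".
    zh_lexicon = {"北京", "大学", "北京大学", "学生"}
    print(max_match("北京大学学生", zh_lexicon))

    # The same greedy strategy on unspaced English shows how it can go wrong.
    en_lexicon = {"the", "cat", "cats", "at", "sat"}
    print(max_match("thecatsat", en_lexicon))
```

The English example segments "thecatsat" as "the / cats / at" rather than "the / cat / sat", because the greedy choice of the longest match cannot be undone. Errors of this kind are one motivation for the richer, automatically acquired lexicons described above.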
Mihalcea & Moldovan (1999) argue that the limited applicability of statistical methods in word sense disambiguation is due mainly to the lack of widely available semantically tagged corpora. They report research that enables the automatic acquisition of sense-tagged corpora, based on (1) the information provided in WordNet, and (2) information gathered from the Internet using existing search engines.