The Intentional Exchange Of Information Communication English Language Essay



Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs. Most animals use signs to represent important messages: food here, predator nearby, approach, withdraw, let's mate. In a partially observable world, communication can help agents be successful because they can learn information that is observed or inferred by others.

What sets humans apart from other animals is the complex system of structured messages known as language that enables us to communicate most of what we know about the world. Although chimpanzees, dolphins and other mammals have shown vocabularies of hundreds of signs and some aptitude for stringing them together, only humans can reliably communicate an unbounded number of qualitatively different messages.

Communication as action

One of the actions available to an agent is to produce language. This is called a speech act. "Speech" is used in the same sense as in "free speech", not "talking", so emailing, skywriting and using sign language all count as speech acts. English has no neutral word for an agent that produces language by any means, so we will use speaker, hearer, and utterance as generic terms referring to any mode of communication.

Fundamentals of language

A formal language is defined as a (possibly infinite) set of strings. Each string is a concatenation of terminal symbols, sometimes called words.

A grammar is a finite set of rules that specifies a language. Formal languages always have an official grammar, specified in manuals or books. Natural languages have no official grammar, but linguists strive to discover properties of the language by a process of scientific inquiry and then to codify their discoveries in a grammar. To date, no linguist has succeeded completely. Note that linguists are scientists, attempting to describe a language as it is. There are also prescriptive grammarians who try to dictate how a language should be. They create rules such as "don't split infinitives", which are sometimes printed in style guides but have little relevance to actual language usage.

Both formal and natural languages associate a meaning, or semantics, with each valid string. For example, in the language of arithmetic, we would have a rule saying that if "X" and "Y" are expressions, then "X+Y" is also an expression, and its semantics is the sum of X and Y. In natural languages, it is also important to understand the pragmatics of a string: the actual meaning of the string as it is spoken in a given situation. The meaning is not just in the words themselves, but in the interpretation of the words in situ.

Most grammar rule formalisms are based on the idea of phrase structure: strings are composed of substrings called phrases, which come in different categories. For example, the phrases "the wumpus", "the king", and "the agent in the corner" are all examples of the category noun phrase, or NP. A key reason for identifying phrases in this way is that phrases usually correspond to natural semantic elements from which the meaning of an utterance can be constructed: for example, noun phrases can combine with a verb phrase (or VP) such as "is dead" to form a phrase of category sentence (or S). Without the intermediate notions of noun phrase and verb phrase, it would be difficult to explain why "the wumpus is dead" is a sentence whereas "wumpus the dead is" is not.

Category names such as NP, VP and S are called nonterminal symbols. Grammars define nonterminals using rewrite rules.

Knowledge in Language Processing

By language processing, we have in mind those computational techniques that process written human language. This is an inclusive definition that encompasses everything from mundane applications such as word counting and automatic hyphenation to cutting-edge applications such as automated question answering on the Web and real-time spoken language translation.

What distinguishes these language processing applications from other data processing systems is their use of knowledge of language. Consider the Unix wc program, which is used to count the total number of bytes, words, and lines in a text file. When it is used to count the bytes and lines, wc is an ordinary data processing application. However, when it is used to count the words in a file, it requires knowledge about what it means to be a word, and thus becomes a language processing system.
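The distinction can be illustrated with a minimal Python sketch of a wc-like counter. It is not the actual wc implementation; the function name `wc` and the working definition of a word (a maximal run of non-whitespace bytes) are assumptions made for illustration.

```python
def wc(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, bytes) for the given file contents."""
    # Bytes and lines need no linguistic knowledge at all.
    n_bytes = len(data)
    n_lines = data.count(b"\n")
    # Counting words needs a definition of "word". Assumption: a word is
    # any maximal run of non-whitespace bytes.
    n_words = len(data.split())
    return n_lines, n_words, n_bytes


sample = b"Book that flight.\nDoes that flight serve dinner?\n"
print(wc(sample))
```

Changing the definition of a word (for example, treating "flight." and "flight" as the same word) changes only the word count, which is exactly the part that depends on knowledge of language.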

Of course, wc is an extremely simple system with an extremely limited and impoverished knowledge of language. More sophisticated language agents such as HAL require much broader and deeper knowledge of language. To get a feeling for the scope and kind of knowledge required in more sophisticated applications, consider some of what HAL would need to know to engage in a dialogue.

The knowledge of language needed to engage in complex language behavior can be separated into distinct categories.

Phonetics and Phonology - The study of linguistic sounds.

Morphology - The study of the meaningful components of words.

Syntax - The study of the structural relationships between words.

Semantics - The study of meaning.

Pragmatics - The study of how language is used to accomplish goals.

Discourse - The study of linguistic units larger than a single utterance.


A perhaps surprising fact about these six categories of linguistic knowledge is that most or all tasks in language processing can be viewed as resolving ambiguity at one of these levels. We say some input is ambiguous if there are multiple alternative linguistic structures that can be built for it.

Consider the spoken sentence "I made her duck". Here are five different meanings this sentence could have (there are more), each of which exemplifies an ambiguity at some level.

I cooked waterfowl for her.

I cooked waterfowl belonging to her.

I created the (plaster?) duck she owns.

I caused her to quickly lower her head or body.

I waved my magic wand and turned her into undifferentiated waterfowl.

Literature Review

Syntactic analysis

If words are the foundation of language processing, syntax is the skeleton. Syntax is used as a formalism for defining the sentences of a language that are grammatical (they adhere to the arrangement constraints of the language: ordering, composition, agreement, etc.). Syntactic analysis deals with how words are clustered into classes called parts of speech, how they group with their neighbors into phrases, and the way words depend on other words in a sentence. Syntax refers to the arrangement of natural language elements in order to form valid sentences of a language. The elements of natural language syntax include the words as well as groupings of words into coherent units which can be arranged with respect to each other. These coherent units are often called phrases; they represent a logical segmentation of the text.


Words are traditionally grouped into equivalence classes called parts of speech (POS), word classes, morphological classes, or lexical tags. In traditional grammars there were generally only a few parts of speech (noun, verb, adjective, preposition, adverb, conjunction, etc.). More recent models have much larger numbers of word classes (45 for the Penn Treebank, 87 for the Brown corpus and 146 for the C7 tagset).

There are three phases in syntactic analysis: tokenization, parts of speech tagging, and parsing.

Tokenization

Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parts of speech tagging and parsing. The tokens may be words, numbers, or punctuation marks. Tokenization does this task by locating word boundaries: the ending point of one word and the beginning of the next. Tokenization is also known as word segmentation.

Typically, tokenization occurs at the word level. However, it is sometimes difficult to define what is meant by a "word". Often a tokenizer relies on simple heuristics, for example:

All contiguous strings of alphabetic characters are part of one token; likewise with numbers.

Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.

Punctuation and whitespace may or may not be included in the resulting list of tokens.

In languages such as English (and most programming languages) where words are delimited by whitespace, this approach is straightforward. However, tokenization is more difficult for languages such as Chinese which have no word boundaries.
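The heuristics listed above can be sketched with a regular expression. This is only a minimal illustration for space-delimited text; the names `TOKEN_PATTERN` and `tokenize` and the `keep_punctuation` flag are hypothetical, and the pattern is restricted to ASCII letters and digits as a simplifying assumption.

```python
import re

# One token is a run of letters, a run of digits, or a single
# punctuation character. Whitespace only separates tokens.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")


def tokenize(text: str, keep_punctuation: bool = True) -> list[str]:
    """Split text into tokens using the simple heuristics above."""
    tokens = TOKEN_PATTERN.findall(text)
    if keep_punctuation:
        return tokens
    # Drop everything that is not a purely alphanumeric ASCII token.
    return [t for t in tokens if t.isascii() and t.isalnum()]


print(tokenize("Book that flight, flight 42!"))
print(tokenize("Book that flight, flight 42!", keep_punctuation=False))
```

Even this small sketch shows why "word" is hard to define: a contraction such as "don't" is split into three tokens, which may or may not be what a later processing stage expects.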

Challenges in Tokenization

Challenges in tokenization depend on the type of language. Languages such as English and French are referred to as space-delimited, as most of the words are separated from each other by white space. Languages such as Chinese and Thai are referred to as unsegmented, as words do not have clear boundaries. Tokenizing sentences in unsegmented languages requires additional lexical and morphological information. Tokenization is also affected by the writing system and the typographical structure of the words. Structures of languages can be grouped into three categories:

Isolating: Words do not divide into smaller units. Example: Mandarin Chinese

Agglutinative: Words divide into smaller units. Example: Japanese, Tamil

Inflectional: Boundaries between morphemes are unclear, and morphemes are ambiguous in terms of grammatical meaning. Example: Latin.

Parts of speech

The part of speech of a word gives a significant amount of information about the word and its neighbors. This is clearly true for major categories (verb versus noun), but it is also true for the many finer distinctions. For example, tagsets distinguish between possessive pronouns (my, your, his, her, its) and personal pronouns (I, you, he, me). Knowing whether a word is a possessive pronoun or a personal pronoun can tell us what words are likely to occur in its vicinity (possessive pronouns are likely to be followed by a noun, personal pronouns by a verb).

Parts of speech can also be used in stemming for information retrieval (IR), since knowing a word's part of speech can help tell us which morphological affixes it can take. They can also help an IR application by helping select out nouns or other important words from a document. Automatic part of speech taggers can help in building automatic word sense disambiguation algorithms, and POS taggers are also used in advanced ASR language models such as class-based N-grams.

Parts of speech can be divided into two broad subcategories: closed class types and open class types. Closed classes are those that have relatively fixed membership. For example, prepositions are a closed class because there is a fixed set of them in English; new prepositions are rarely coined. By contrast nouns and verbs are open classes because new nouns and verbs are continually coined or borrowed from other languages. It is likely that any given speaker or corpus will have different open class words, but all speakers of a language, and corpora that are large enough, will likely share the set of closed class words. Closed class words are generally also function words: function words are grammatical words like of, it, and, or you, which tend to be very short, occur frequently, and play an important role in grammar.

There are four major open classes that occur in the languages of the world: nouns, verbs, adjectives, and adverbs. Nouns are traditionally grouped into proper nouns and common nouns. Proper nouns are names of specific persons or entities. In English they generally aren't preceded by articles, and in written English they are usually capitalized. Common nouns are divided into count nouns and mass nouns. Count nouns are those that allow grammatical enumeration: that is, they can occur in both the singular and plural and they can be counted. Mass nouns are used when something is conceptualized as a homogeneous group.

The verb class includes most of the words referring to actions and processes, including main verbs like draw, provide, and go.

The third open class English form is adjectives: semantically this class includes many terms that describe properties or qualities. Most languages have adjectives for the concepts of color (white, black), age (old, young), and value (good, bad), but there are languages without adjectives.

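The final open class form, adverbs, is rather a hodge-podge, both semantically and formally. For example, in the sentence "Unfortunately, John walked home extremely slowly yesterday", the words unfortunately, home, extremely, slowly, and yesterday are all adverbs.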

The closed classes differ more from language to language than do the open classes. Some of the more important closed classes in English are:

Prepositions: on, under, over, near, by, at, from, to, with

Determiners: a, an, the

Pronouns: she, who, I, others

Conjunctions: and, but, or, as, if, when

Auxiliary verbs: can, may, should, are

Particles: up, down, on, off, in, out, at, by

Numerals: one, two, three, first, second, third

Prepositions occur before noun phrases; semantically they are relational, often indicating spatial or temporal relations, whether literal (on it, before then, by the house) or metaphorical (on time, with gusto, beside herself). But they often indicate other relations as well.

A particle is a word that resembles a preposition or an adverb, and that often combines with a verb to form a larger unit called a phrasal verb.

A particularly small closed class is the articles. English has three: a, an, and the. Articles often begin a noun phrase. 'A' and 'an' mark a noun phrase as indefinite, while 'the' can mark it as definite. Articles are quite frequent in English; indeed 'the' is the most frequent word in most English corpora.

Conjunctions are used to join two phrases, clauses, or sentences. Coordinating conjunctions like and, or, and but join two elements of equal status. Subordinating conjunctions are used when one of the elements is of some sort of embedded status. For example, 'that' in 'I thought that you might like some milk' is a subordinating conjunction that links the main clause 'I thought' with the subordinate clause 'you might like some milk'. This clause is called subordinate because the entire clause is the 'content' of the main verb 'thought'. Subordinating conjunctions like 'that', which link a verb to its argument in this way, are also called complementizers.

Pronouns are forms that often act as a kind of shorthand for referring to some noun phrase, entity, or event. Personal pronouns refer to persons or entities (you, she, I, it, me, etc.). Possessive pronouns are forms of personal pronouns that indicate either actual possession or, more often, just an abstract relation between the person and some object (my, your, his, her, its, one's, our, their). Wh-pronouns (what, who, whom, whoever) are used in certain question forms, or may also act as complementizers (Frieda, who I met five years ago...).

A closed class subtype of English verbs is the auxiliary verbs. Cross linguistically, auxiliaries are words (usually verbs) that mark certain semantic features of a main verb, including whether an action takes place in the present, past or future (tense), whether it is completed (aspect), whether it is negated (polarity), and whether an action is necessary, possible, suggested, desired, etc. (mood).

English auxiliaries include the copula verb be, the two verbs do and have, along with their inflected forms, as well as a class of modal verbs. Be is called a copula because it connects subjects with certain kinds of predicate nominals and adjectives (He is a duck). The verb have is used, for example, to mark the perfect tenses (I have gone, I had gone), while be is used as part of the passive (We were robbed) or progressive (We are leaving) constructions. The modals are used to mark the mood associated with the event or action depicted by the main verb. So can indicates ability or possibility, may indicates permission or possibility, must indicates necessity, etc.

English also has many words of more or less unique function, including interjections (oh, ah, hey, man, alas), negatives (no, not), politeness markers (please, thank you), greetings (hello, goodbye), and the existential there (there are two on the table) among others. Whether these classes are assigned particular names or lumped together (as interjections or even adverbs) depends on the purpose of the labeling.

Tag sets for English

There are a small number of popular tagsets for English, many of which evolved from the 87-tag tagset used for the Brown corpus. Three of the most commonly used are the small 45-tag Penn Treebank tagset, the medium-sized 61-tag C5 tagset, and the larger 146-tag C7 tagset.

Tag    Description                              Example
CC     Coordinating conjunction                 and, but, or
CD     Cardinal number                          one, two, three
DT     Determiner                               a, the
EX     Existential 'there'                      there
FW     Foreign word                             mea culpa
IN     Preposition/subordinating conjunction    of, in, by
JJ     Adjective                                yellow
JJR    Adjective, comparative                   bigger
JJS    Adjective, superlative                   wildest
LS     List item marker                         1, 2, One
MD     Modal                                    can, should
NN     Noun, singular or mass                   llama
NNS    Noun, plural                             llamas
NNP    Proper noun, singular                    IBM
NNPS   Proper noun, plural                      Carolinas
PDT    Predeterminer                            all, both
POS    Possessive ending                        's
PRP    Personal pronoun                         I, you, he
PRP$   Possessive pronoun                       your, one's
RB     Adverb                                   quickly, never
RBR    Adverb, comparative                      faster
RBS    Adverb, superlative                      fastest
RP     Particle                                 up, off
SYM    Symbol                                   +, %, &
TO     "to"                                     to
UH     Interjection                             ah, oops
VB     Verb, base form                          eat
VBD    Verb, past tense                         ate
VBG    Verb, gerund                             eating
VBN    Verb, past participle                    eaten
VBP    Verb, non-3sg present                    eat
VBZ    Verb, 3sg present                        eats
WDT    Wh-determiner                            which, that
WP     Wh-pronoun                               what, who
WP$    Possessive wh-                           whose
WRB    Wh-adverb                                how, where
$      Dollar sign                              $
#      Pound sign                               #
``     Left quote                               ' or "
''     Right quote                              ' or "
(      Left parenthesis                         [, (, {, <
)      Right parenthesis                        ], ), }, >
,      Comma                                    ,
.      Sentence-final punctuation               . ! ?
:      Mid-sentence punctuation                 : ; ... -- -

Table: Penn Treebank Part-of-Speech Tags (Including Punctuation)

Part of speech tagging

Part of speech tagging is the process of assigning a part of speech or other lexical class marker to each word in a corpus. Tags are also usually applied to punctuation marks; thus tagging for natural language is the same process as tokenization for computer languages, although tags for natural languages are much more ambiguous.

The input to a tagging algorithm is a string of words and a specified tagset of the kind described in the previous section. The output is a single best tag for each word.

For example:

Book that flight.

Book/VB that/DT flight/NN ./.

Does that flight serve dinner?

Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.

Even in these simple examples, automatically assigning a tag to each word is not trivial. For example, book is ambiguous: it has more than one possible usage and part of speech. It can be a verb or a noun. Similarly, that can be a determiner or a complementizer.

The problem of POS tagging is to resolve these ambiguities, choosing the proper tag for the context. Most tagging algorithms fall into one of two classes: rule-based taggers and stochastic taggers. Rule-based taggers generally involve a large database of hand-written disambiguation rules which specify, for example, that an ambiguous word is a noun rather than a verb if it follows a determiner. Stochastic taggers generally resolve tagging ambiguities by using a training corpus to compute the probability of a given word having a given tag in a given context.
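The rule-based idea can be sketched in a few lines of Python. This is a toy illustration, not a real tagger: the lexicon `LEXICON`, its default-tag ordering, and the function `tag` are hypothetical, and only the single determiner rule mentioned above is implemented.

```python
# Hypothetical toy lexicon: each word maps to its candidate tags,
# with the default reading listed first.
LEXICON = {
    "book": ["VB", "NN"],
    "that": ["DT", "IN"],
    "flight": ["NN"],
    "the": ["DT"],
    "a": ["DT"],
}


def tag(words: list[str]) -> list[tuple[str, str]]:
    """Assign one tag per word, applying one disambiguation rule."""
    tagged = []
    previous_tag = None
    for word in words:
        # Assumption: unknown words are treated as nouns.
        candidates = LEXICON.get(word.lower(), ["NN"])
        chosen = candidates[0]
        # Hand-written rule: an ambiguous word is a noun rather than
        # a verb if it follows a determiner.
        if len(candidates) > 1 and previous_tag == "DT" and "NN" in candidates:
            chosen = "NN"
        tagged.append((word, chosen))
        previous_tag = chosen
    return tagged


print(tag(["Book", "that", "flight"]))
print(tag(["that", "book"]))
```

A stochastic tagger would replace the fixed default ordering and the hand-written rule with probabilities estimated from a tagged training corpus.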


Parsing

Parsing is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. It is the task of recognizing an input string and assigning some structure to it. Parsing produces a parse tree for a sentence. Parse trees are directly useful in applications such as grammar checking in word processing systems. Parsing identifies whether a given sentence is grammatically correct or not: a sentence which cannot be parsed may have grammatical errors. It is also important for semantic analysis and plays a vital role in applications like machine translation, question answering, and information extraction.

For example, in order to answer the query

Can I have a flight to Pokhara on Sunday?

the system needs to know that the request concerns a flight and that "to Pokhara" and "on Sunday" both modify it, so that it can figure out that the user wants a flight on Sunday (not just any flight to Pokhara).

Syntactic parsers can also be used in applications like online versions of dictionaries.

There are different types of algorithms for parsing. The main parsing algorithms are as follows:

Earley algorithm

Cocke-Younger-Kasami (CYK) algorithm

Graham-Harrison-Ruzzo (GHR) algorithm

Earley algorithm

The Earley algorithm is a context-free parsing algorithm based on dynamic programming. Other dynamic programming algorithms include minimum edit distance, Viterbi, and forward.

S → NP VP
S → Aux NP VP
S → VP
NP → Det Nominal
Nominal → Noun
Nominal → Noun Nominal
NP → Proper-Noun
VP → Verb
VP → Verb NP
Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Aux → does
Prep → from | to | on
Proper-Noun → Pokhara | TIA
Nominal → Nominal PP

A miniature English grammar and lexicon
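To make the grammar concrete, here is a minimal Python sketch of a recognizer for it. It is a simple memoized top-down recognizer over word spans, not the Earley algorithm itself; the names `GRAMMAR`, `LEXICON`, and `recognize` are hypothetical. The grammar as given contains no rule for PP, so in this sketch the rule Nominal → Nominal PP can never apply.

```python
from functools import lru_cache

# Phrase-structure rules of the miniature grammar.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("Proper-Noun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal"), ("Nominal", "PP")],
    "VP": [("Verb",), ("Verb", "NP")],
}
# Lexicon, lower-cased so that matching is case-insensitive.
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "Proper-Noun": {"pokhara", "tia"},
}


def recognize(sentence: str) -> bool:
    """Return True if the grammar derives the whitespace-split sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derives(symbol: str, start: int, end: int) -> bool:
        # True if `symbol` can derive exactly words[start:end].
        if symbol in LEXICON:
            return end - start == 1 and words[start] in LEXICON[symbol]
        return any(matches(rhs, start, end) for rhs in GRAMMAR.get(symbol, ()))

    @lru_cache(maxsize=None)
    def matches(rhs: tuple, start: int, end: int) -> bool:
        # True if the symbol sequence `rhs` derives exactly words[start:end].
        if len(rhs) == 1:
            return derives(rhs[0], start, end)
        # Each symbol covers at least one word, so leave room for the rest.
        for split in range(start + 1, end - len(rhs) + 2):
            if derives(rhs[0], start, split) and matches(rhs[1:], split, end):
                return True
        return False

    return derives("S", 0, len(words))


print(recognize("Book that flight"))
print(recognize("Does that flight include a meal"))
print(recognize("Flight that book"))
```

Because every right-hand-side symbol must cover at least one word, the left-recursive Nominal rule always recurses on a strictly shorter span, so the recognizer terminates. A chart parser such as Earley achieves the same effect without trying every split point.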



Tokenization

Given a character sequence and a defined document unit, tokenization is the task of chopping the sequence up into pieces called tokens. It throws away certain characters that are not used in language processing, such as punctuation.

Tokenization is done by separating each word. First, the special characters that may occur in a sentence but are not used in language processing are recognized. The special characters we store as delimiters are: whitespace ' ', full stop '.', question mark '?', comma ',', exclamation mark '!', semicolon ';', new line '\n', and tab '\t'. Whitespace is the most frequent of these in a sentence and is used to separate the words. These special characters are then eliminated from the sentence and each word is stored for further processing. The words produced by the tokenization process are called tokens, and these tokens are the input to the next phase of syntactic analysis, parts of speech tagging.

Tokenizing the sentence which consists of special characters:

INPUT: "Can I have a flight from Kathmandu to Pokhara on Sunday? Book that flight ,Wow! So beautiful lake".


OUTPUT:

0: Can

1: I

2: have

3: a

4: flight

5: from

6: Kathmandu

7: to

8: Pokhara

9: on

10: Sunday

11: Book

12: that

13: flight

14: Wow

15: So

16: beautiful

17: lake

The input sentence contains special characters such as whitespace, commas, and full stops, which are eliminated; the output consists only of words, which are stored in an array.
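The delimiter-elimination step described above can be sketched as follows. This is an illustrative Python version, not the code of the system described in this essay; the names `DELIMITERS` and `tokenize` are hypothetical.

```python
# The special characters listed above: space, full stop, question mark,
# comma, exclamation mark, semicolon, newline, and tab.
DELIMITERS = " .?,!;\n\t"

# Translation table that maps every delimiter to a plain space.
_TO_SPACE = str.maketrans({ch: " " for ch in DELIMITERS})


def tokenize(sentence: str) -> list[str]:
    """Eliminate the delimiters and return the remaining words as a list."""
    return sentence.translate(_TO_SPACE).split()


text = ("Can I have a flight from Kathmandu to Pokhara on Sunday? "
        "Book that flight ,Wow! So beautiful lake.")
for index, token in enumerate(tokenize(text)):
    print(f"{index}: {token}")
```

Replacing every delimiter with a space before splitting means that runs of mixed delimiters (such as " ," in the example input) never produce empty tokens.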

Parts of Speech Tagging

Artificial Neural Network (ANN)

A neuron is a cell in the brain whose principal function is the collection, processing and dissemination of electrical signals. The brain's information processing capacity is thought to emerge primarily from networks of such neurons.


The neural network has two layers: one input layer and one output layer. With no hidden layer, this network is essentially a linear separator.

Figure: Structure of two layer neural network

Input Layer:

The input layer consists of 192 nodes (four groups of 48, one node per part-of-speech tag in each group). Each input node is fed data collected from the corpus.

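So, to tag $\mathrm{word}_0$ in a sequence of words ($\mathrm{word}_{-1}$, $\mathrm{word}_0$, $\mathrm{word}_1$, $\mathrm{word}_2$), where the part of speech of $\mathrm{word}_{-1}$ is already known to be $\mathrm{POS}_i$ with $i$ between 1 and 48, we have the inputs:

$\mathrm{In}_{-1,j}$ = probability of occurrence of $\mathrm{POS}_j$ after $\mathrm{POS}_i$, calculated from the corpus, where $j = 1, \dots, 48$

$\mathrm{In}_{k,j}$ = probability of $\mathrm{word}_k$ having $\mathrm{POS}_j$, where $k = 0, 1, 2$ and $j = 1, \dots, 48$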
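As a sketch of how the 192 inputs could be assembled, the following Python function concatenates one row of 48 tag-transition probabilities with three rows of 48 per-word tag probabilities. The function name `build_input` and the argument layout are assumptions for illustration; the probability tables themselves would have to be estimated from the corpus, and the uniform values used here are placeholders only.

```python
NUM_TAGS = 48
CONTEXT_WORDS = 3  # word_0, word_1, word_2


def build_input(transition_row, lexical_rows):
    """Build the 192-element input vector for tagging word_0.

    transition_row: 48 values, probability of POS_j following the known
        tag POS_i of word_-1.
    lexical_rows: three lists of 48 values, probability of word_k having
        POS_j for k = 0, 1, 2.
    """
    if len(transition_row) != NUM_TAGS:
        raise ValueError("transition_row must have 48 entries")
    if len(lexical_rows) != CONTEXT_WORDS:
        raise ValueError("lexical_rows must contain 3 rows")
    vector = list(transition_row)
    for row in lexical_rows:
        if len(row) != NUM_TAGS:
            raise ValueError("each lexical row must have 48 entries")
        vector.extend(row)
    return vector


# Placeholder data only: uniform probabilities instead of corpus estimates.
uniform = [1.0 / NUM_TAGS] * NUM_TAGS
inputs = build_input(uniform, [uniform, uniform, uniform])
print(len(inputs))
```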

Output Layer:

The output layer consists of 48 nodes. Each output node $\mathrm{Out}_i$ gives the probability that $\mathrm{word}_0$ has $\mathrm{POS}_i$. During tagging, the part of speech whose probability is greater than the threshold (0.7) is assigned to the word being tagged.

Figure: Characteristic S-shape of the sigmoid function

The network uses a sigmoid output function. The characteristic S-shape of the sigmoid function provides near-binary outputs at the output nodes.
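A minimal Python sketch of the forward pass and the threshold decision described above follows. The helper names `sigmoid`, `forward`, and `pick_tag` are hypothetical, the sketch assumes weights without bias terms, and returning `None` when no output exceeds the threshold is an assumption, since the text does not say what happens in that case.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(inputs, weights):
    """One output per weight row: sigmoid of the weighted input sum."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def pick_tag(outputs, threshold=0.7):
    """Index of the strongest output if it exceeds the threshold, else None."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return best if outputs[best] > threshold else None


outputs = forward([1.0, 0.0], [[0.0, 0.0], [10.0, 0.0]])
print(outputs, pick_tag(outputs))
```

Because the sigmoid saturates quickly, weighted sums of only a few units in either direction already push an output close to 0 or 1, which is what makes a fixed threshold such as 0.7 workable.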


Training of the neural network is conducted on the annotated Wall Street Journal corpus, using the back-propagation algorithm with a learning rate of 0.05 and a momentum of 0.025.

During training, weights are initialized to random values between -0.5 and 0.5. The network is trained on each training set until the accumulated absolute error over all output nodes converges to 0.001.
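For a network with no hidden layer, back-propagation reduces to a single-layer gradient update. The following Python sketch shows one such update with the learning rate and momentum quoted above. It is an illustration under assumptions, not the training code of the system: the function names are hypothetical, a squared-error loss is assumed, and the tiny two-output example stands in for the real 192-input, 48-output network.

```python
import math
import random

LEARNING_RATE = 0.05
MOMENTUM = 0.025


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_weights(n_out: int, n_in: int, rng: random.Random):
    """Random initial weights between -0.5 and 0.5."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def train_step(weights, velocity, inputs, targets) -> float:
    """Apply one update in place; return the accumulated absolute error
    over all output nodes, measured before the update."""
    total_error = 0.0
    for i, row in enumerate(weights):
        out = sigmoid(sum(w * x for w, x in zip(row, inputs)))
        error = targets[i] - out
        total_error += abs(error)
        # Gradient of the squared error through the sigmoid.
        delta = error * out * (1.0 - out)
        for j, x in enumerate(inputs):
            # Momentum reuses a fraction of the previous weight change.
            velocity[i][j] = LEARNING_RATE * delta * x + MOMENTUM * velocity[i][j]
            row[j] += velocity[i][j]
    return total_error


rng = random.Random(0)
weights = init_weights(2, 3, rng)
velocity = [[0.0] * 3 for _ in range(2)]
first_error = train_step(weights, velocity, [1.0, 0.5, 0.0], [1.0, 0.0])
for _ in range(500):
    last_error = train_step(weights, velocity, [1.0, 0.5, 0.0], [1.0, 0.0])
print(first_error, last_error)
```

In a full training loop, this step would be repeated over the training set until the accumulated error reaches the stopping value of 0.001, ideally with a cap on the number of iterations, because a saturating sigmoid makes the final approach to such a small error slow.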


Input: Do you have any flights to Pokhara on Sunday?

Output: Do/VBP you/PRP have/VB any/DT flights/NNS to/TO Pokhara/NNP on/IN Sunday/NNP ?/.


Parsing

Parsing is the process of analyzing a given sentence to determine whether it has grammatical errors. For parsing we used the parser of the OpenNLP API. OpenNLP offers two different parser implementations: the chunking parser and the tree insert parser. The tree insert parser is still at the research stage and is not used in production, so we used the OpenNLP chunking parser.

INPUT: Book/VB that/Det flight/NN

OUTPUT: (S (VP (Verb Book) (NP (Det that) (Nominal (Noun flight)))))


Figure: Parse tree


The text-based Air Traffic Information System is an application that lets a user query the flights available on a certain date. The user can simply query the system in natural language and get the result in natural language. The user can not only query flights but also book a flight on the desired date.