Review On Syntactic Analysis Tokenization English Language Essay


Put simply, linguistic semantics is the study of meaning. It is concerned with the sense of a sentence, so a semantically correct sentence is one that makes proper sense. The terms 'meaning' and 'sense' are themselves vague, however. The purpose of language is to convey meaning [1], and language can be defined as a sequence of sounds or letters that have meaning. An infant's babbling, for example, is sound but not language. The sentences we use to convey information have meaning, and so do the words within them, including words that carry no information on their own. Words such as 'the', 'not' and 'of' all contribute something to the meaning of a sentence and can therefore be considered to have meanings themselves. To mean something involves a relationship involving at least one of three different things: the language (symbol), the world (referent) and the intention (thought). Ogden and Richards represented the phenomenon of meaning as a semiotic triangle.

[Figure: semiotic triangle. Edge labels: relation of truth/falsity; causal relation; causal relation.]

Figure 2.1.a) Semiotic triangle showing three aspects of meaning

Since it is human beings who use language, language must originate in the mind or brain. Language is therefore the result of 'thought'. The 'symbol' is whatever perceptible token is used to express the speaker's intended meaning [1]; it may be a sound, a text or a gesture. In general, we may say that the symbol is the language. The 'referent' is the real-world object, event or situation that the language refers to. A referent may be context dependent or context independent: some referents are identifiable only in a given context, while others are identifiable without one. The fact remains, however, that we only know the referents for which we have a mental representation [1].

For example, when a person says:

2.2.i) I want to leave Kathmandu for Pokhara on Sunday.

Sentence 2.2.i) shows a relation between all three aspects. The speaker intends to express a wish to travel from Kathmandu to Pokhara and uses language to express that intention. The words 'Kathmandu' and 'Pokhara' are the symbols used to represent the two places in the real world.

The word semantics comes from the ancient Greek word semantikos, meaning "related to sign". Linguistic semantics is thus the study of linguistic signs. Semantics can also be defined as the study of formal representations capable of capturing the meaning of linguistic utterances, and of the algorithms that map utterances to such representations [2].

Semantic interpretation can be defined as the process of mapping a syntactically analyzed natural language text into a representation of its meaning [3]. We may distinguish three levels of semantic interpretation [4]:

Literal meaning: The interpretation based on knowledge of the language alone. It is available to the hearer without knowing the thought or intent involved in producing the utterance.

Explicature: The meaning we obtain when we bring in contextual information and world knowledge to identify the referent and work out what the symbol represents.

Implicature: The interpretation made by taking into account the particular context in which the utterance was made, so as to better understand the thought behind it. It involves identifying the intention that the speaker wants to express.

When carrying out semantic interpretation, we must also keep in mind three notions concerning the sense and meaning of an utterance [4]:

Propositions: A sentence may be expressed in many different ways while carrying a single meaning and sense. For example, a person who wants to know the departure time of a flight may express it as:

2.2.ii) I want to know about the departure time of the flight.

2.2.iii) Tell me the departure time of the flight.

Both sentences 2.2.ii) and 2.2.iii) express the same abstract idea, which remains constant across such variations. This core sentence meaning is called the 'proposition' of the sentence.

Compositionality: The meaning of a sentence can generally be explained from the meaning of its parts and the manner in which they are put together. The meaning of a sentence is thus built upon the meanings of its parts, which may be words, phrases or clauses.

There are, however, exceptions to compositionality. Such exceptions are called 'idioms': the meaning of an idiom cannot be worked out from the meaning of its parts.

Entailment: As in formal logic, entailment denotes the facts we can infer from an utterance. It is related to the proposition of the sentence.

2.2. iv) I want to leave in the morning.

From sentence 2.2.iv) we can infer that the speaker wants flights that depart within the time interval that represents the morning, i.e. between 0000 hrs and 0800 hrs. At the same time, the speaker excludes from consideration any flight that does not leave in the morning.
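The entailment drawn from sentence 2.2.iv) can be sketched as a simple filter over candidate flights. This is only an illustration: the flight identifiers and departure times are made up, and the bounds for "morning" are taken directly from the interval quoted above.

```python
from datetime import time

# Bounds for "morning", taken from the interval quoted in the text (assumption:
# both end points are included).
MORNING_START = time(0, 0)
MORNING_END = time(8, 0)

# Hypothetical flight records: (flight id, departure time).
FLIGHTS = [
    ("F101", time(6, 30)),
    ("F202", time(7, 45)),
    ("F303", time(13, 15)),
    ("F404", time(8, 0)),
]


def departs_in_morning(departure):
    """True when the departure lies in the closed interval [start, end]."""
    return MORNING_START <= departure <= MORNING_END


def entailed_candidates(flights):
    """Keep only flights consistent with 'I want to leave in the morning'."""
    return [flight_id for flight_id, dep in flights if departs_in_morning(dep)]


print(entailed_candidates(FLIGHTS))
```

Every flight outside the interval is dropped, which mirrors the second half of the inference: flights not leaving in the morning are excluded from consideration.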

Based on these and other characteristics of semantic interpretation, different computational methods for actually carrying it out have been developed. These methods have their individual pros and cons and are discussed below.

Syntax based analysis

Syntax based semantic analysis rests on the principle of compositionality, which dictates that the meaning of an utterance is built upon the meaning of its parts and the way they are arranged. As the name suggests, the semantics of the utterance is derived from the syntactic arrangement of the words and the relations among them, as provided by the syntactic analysis of the sentence.

Parse tree -> Semantic analyzer -> Semantic representation

Figure 2.2.b) A simple flowchart of syntax based semantic analysis

The input to the semantic analyzer is the parse tree, which fully describes the syntactic structure of the utterance and the relations among its words.

Syntax based analysis relies on a logical representation of the world using predicate logic or various forms of intensional logic [1]. Let us consider the sentence

2.2.v) I need to go from Kathmandu to Pokhara.

Then an FOPC (First Order Predicate Calculus) representation of the sentence for the event of 'going' looks like:

Semantic analysis now involves attaching sentence 2.2.v) to this representation. We replace x with the term Kathmandu and y with the term Pokhara, yielding:
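One way to write the two formulas is in the event-based style of FOPC. The predicate names Going, Goer, Source and Destination and the constant Speaker are illustrative assumptions, not notation taken from the source; only the variables x and y and the terms Kathmandu and Pokhara come from the text.

```latex
% Representation of 'going' with source x and destination y still open
\[
\exists e\, \exists x\, \exists y\;
  Going(e) \land Goer(e, Speaker) \land Source(e, x) \land Destination(e, y)
\]

% After replacing x with Kathmandu and y with Pokhara
\[
\exists e\;
  Going(e) \land Goer(e, Speaker) \land Source(e, Kathmandu) \land Destination(e, Pokhara)
\]
```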

We therefore need a procedure that replaces the quantified variables with material from the parse tree of the sentence. It must determine exactly which variable in the semantic attachment corresponds to which argument of the verb 'go' in the sentence, and then perform the replacement.

This raises a new problem: nothing in the representation tells us when and how each of its quantified variables should be replaced.

As a solution to this problem we use the lambda calculus. The lambda notation extends First Order Predicate Calculus with expressions of the following form:

A lambda expression can be applied to terms to generate a new FOPC expression: the formal parameter variables are bound to the specified terms. This process is called lambda reduction. It is a simple replacement of the lambda variables with the specified FOPC terms, followed by removal of the λ. Applying the lambda expression to the constant A and then performing lambda reduction results in an FOPC expression as follows:
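In the usual notation, the general form and its reduction for the constant A can be written as below. P stands for an arbitrary predicate and is a placeholder chosen here for illustration.

```latex
% General form: a formal parameter x followed by an FOPC expression containing x
\[
\lambda x.\, P(x)
\]

% Application to the constant A, followed by lambda reduction
\[
\lambda x.\, P(x)\,(A) \;\Rightarrow\; P(A)
\]
```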

Let us consider the verb 'go' again, this time in terms of the lambda calculus. The lambda expression for the event of 'going' would look like:

This is a nested lambda expression. The outer expression provides the variable replaced in the first reduction, and the inner expression provides the variable replaced in the second. The reduction sequence for sentence 2.2.v) is as follows:
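A sketch of the nested expression and its two reductions follows. As before, the predicate names Going, Goer, Source and Destination and the constant Speaker are assumptions made for illustration.

```latex
% Nested lambda expression for 'go': x is the source, y the destination
\[
\lambda x.\, \lambda y.\, \exists e\;
  Going(e) \land Goer(e, Speaker) \land Source(e, x) \land Destination(e, y)
\]

% First reduction: apply to Kathmandu (binds x)
\[
\lambda y.\, \exists e\;
  Going(e) \land Goer(e, Speaker) \land Source(e, Kathmandu) \land Destination(e, y)
\]

% Second reduction: apply to Pokhara (binds y)
\[
\exists e\;
  Going(e) \land Goer(e, Speaker) \land Source(e, Kathmandu) \land Destination(e, Pokhara)
\]
```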

We thus obtain a formal FOPC representation of the natural language sentence.
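The two-step reduction can also be mimicked with curried functions, where each function call plays the role of one lambda reduction. This is a minimal sketch, not a full lambda calculus implementation: the formula is built as a plain string, and the predicate names in it are illustrative assumptions.

```python
# Nested lambda for 'go': the outer lambda binds x (source),
# the inner lambda binds y (destination).
go = lambda x: lambda y: (
    "exists e. Going(e) & Goer(e, Speaker) "
    f"& Source(e, {x}) & Destination(e, {y})"
)

# First reduction: x := Kathmandu. The result is still a function of y.
after_first = go("Kathmandu")

# Second reduction: y := Pokhara. The result is a closed formula.
formula = after_first("Pokhara")

print(formula)
```

The intermediate value `after_first` corresponds to the expression that still carries one λ. Only after the second application does a formula with no unbound parameter remain.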

Semantic Grammar

Syntax based analysis makes use of a syntactic grammar. The structures that syntactic grammars provide are not well suited to the task of compositional semantic analysis [2]. This is because grammar design places more emphasis on generality and on avoiding overgeneration than on semantic sensibility. The reasons why traditional grammars cannot provide the structure that compositional semantic analysis needs have been given as [2]:

Key semantic elements are often widely distributed across parse trees, so the composition required to build the meaning representation is highly complicated.

Many syntactically significant components of a parse tree have no role in semantic processing.

The general nature of many syntactic constituents results in semantic attachments that create nearly vacuous meaning representations.

Let us consider an example parse tree of the sentence: I want to go from Kathmandu to Pokhara.

Figure 2.2.c) a syntactic parse tree of the sentence

If we look at the structure of this tree, the components that make up the meaning of the sentence are dispersed throughout it. At the same time, very few nodes contribute significantly to the meaning of the sentence.

To deal with these problems we use semantic grammars, which address the needs of compositional analysis more effectively. The goal of such a grammar is to allow the key semantic components to occur together within a single rule. In addition, the rules are made no more general than the analysis requires [2].

We can then write a simple semantic grammar rule for the sentence as:

Here USER, SOURCE and DESTINATION are semantically motivated non-terminals. This single rule gives us all the information needed to compose the meaning representation of the utterance.
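A rule of this kind might take the form REQUEST -> USER want to go from SOURCE to DESTINATION. The sketch below encodes such a rule as a regular expression with one named group per non-terminal. The REQUEST label, the exact right-hand side and the regex encoding are all assumptions made for illustration; only the three non-terminal names come from the text.

```python
import re

# Hypothetical rule: REQUEST -> USER want to go from SOURCE to DESTINATION
# Each non-terminal is modelled as a named group; place names are assumed
# to be single capitalized words.
RULE = re.compile(
    r"^(?P<USER>I|We)\s+(?:want|need)\s+to\s+go\s+"
    r"from\s+(?P<SOURCE>[A-Z][a-z]+)\s+"
    r"to\s+(?P<DESTINATION>[A-Z][a-z]+)\.?$"
)


def analyse(sentence):
    """Return the slot fillers if the sentence matches the rule, else None."""
    match = RULE.match(sentence.strip())
    return match.groupdict() if match else None


print(analyse("I want to go from Kathmandu to Pokhara."))
print(analyse("The flight leaves at noon."))
```

All three semantic components are captured by one rule, with no need to collect them from scattered nodes of a syntactic parse tree. The price is that the rule covers only this one sentence pattern.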

One disadvantage of a semantic grammar is that it offers almost no reuse, since its rules are no more general than a single application needs. Another is that it is susceptible to overgeneration [2].

Semantic role labeling

One method of semantic analysis that is very well suited to the task of question answering (Q&A) is semantic role labeling.

Gruber (1965) and Fillmore (1968) first proposed the concept of thematic roles. It involves a set of categories that provide a shallow semantic language for characterizing certain arguments of verbs. The approach has two parts [1]:

The lexical entries of verbs are assumed to include a specification of the types of arguments associated with them.

The possible arguments of all verbs can be classified into a small number of classes called thematic roles, participant roles or theta roles.

The following list, based on Carnie (2007), shows a commonly assumed set of roles:

Thematic Role    Description
Agent            The initiator of the action
Experiencer      The entity that feels or perceives something
Theme            The entity that undergoes an action, or is experienced or perceived
Goal             The entity towards which motion takes place
Recipient        A subclass of goal, for verbs involving a change of possession
Source           The entity from which motion takes place
Instrument       The object with which an action is performed
Beneficiary      The one for whose benefit an event took place

Table 2.2.a) Table of most common thematic roles

Fillmore goes on to state that these roles 'comprise a set of universal, presumably innate, concepts which identify certain types of judgments human beings are capable of making about the events that are going on around them, judgments about such matters as who did it, who it happened to, and what got changed'.

The idea was to classify the entire verbal lexicon using a finite set of participant roles [1]. The list of subcategorized arguments of a verb is called its theta-grid. Some examples are:

2.2.vii) break <agent, theme>

donate <agent, theme, recipient>

send <agent, theme, goal>

get <recipient, theme>

Identifying the theta-grids of verbs constitutes the first part of the linking problem. The second part is to explain how the labeled participant roles are linked, or mapped, onto morpho-syntactic positions such as subject and object.
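The two parts of the linking problem can be sketched as a lookup table plus a mapping step. The theta-grids below are the ones listed in 2.2.vii). The linking rule, which simply pairs roles with subject, object and oblique in grid order, is a deliberately naive assumption, and the example arguments are invented.

```python
# Part 1: theta-grids, copied from examples 2.2.vii).
THETA_GRIDS = {
    "break": ("agent", "theme"),
    "donate": ("agent", "theme", "recipient"),
    "send": ("agent", "theme", "goal"),
    "get": ("recipient", "theme"),
}

# Part 2: assumed syntactic positions, linked to roles in grid order.
POSITIONS = ("subject", "object", "oblique")


def link(verb, arguments):
    """Pair each role in the verb's theta-grid with a position and a filler."""
    grid = THETA_GRIDS[verb]
    if len(arguments) != len(grid):
        raise ValueError(
            f"{verb!r} expects {len(grid)} arguments, got {len(arguments)}"
        )
    return [
        {"role": role, "position": position, "filler": filler}
        for role, position, filler in zip(grid, POSITIONS, arguments)
    ]


# Hypothetical sentence: "Ram sent a parcel to Pokhara."
for entry in link("send", ["Ram", "a parcel", "Pokhara"]):
    print(entry)
```

A purely positional mapping like this handles only the simplest cases.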

However there are many problems with thematic roles [1]:

It is very difficult to assign the arguments of many verbs to the conventional thematic roles.

Sometimes a single argument can satisfy many different thematic roles.

There is no universal thematic hierarchy.

Jackendoff proposed a theory of semantic representation that dispenses with theta roles and derives argument structure directly from the semantics of verbs [1]. In this theory of conceptual structure, selectional restrictions are also specified directly by the conceptual structure.

Many verbs also show different argument structures, known as alternations [1]. Levin and Rappaport Hovav proposed that the alternations a verb participates in are explained by its underlying semantic structure. Verbs can thus be grouped into semantically defined classes whose members show similar syntactic behavior with respect to their alternations.
