The Relationship Between Natural Languages And Grammars



This essay introduces the meanings and the formal definitions of natural language and of three types of grammar (regular, context-sensitive and context-free) as characterized in the Chomsky hierarchy according to their generative power and their ability to define a language. It will also explain the relationship between these grammars and natural language.

The definition of natural language varies between scholars, but it is mainly seen as a language used by humans to communicate and to express knowledge and emotions.

Here I will cite Noam Chomsky's definition of a language and natural languages. He says:

"A language is a set of (finite or infinite) sentences, each finite in length and constructed out of a finite set of elements. All natural languages in their spoken or written form are languages in this sense, since each natural language has a finite number of phonemes (or letters in its alphabet) and each sentence is representable as a finite sequence of these phonemes (or letters), though there are infinitely many sentences" [1] 

This essay also investigates the concept of centre-embedding and its importance in deciding which grammar type the syntax of natural language belongs to.

What methods are used to demonstrate the relationships between the various structures within sentences? The answer to this question is grammar. This piece of work will demonstrate how sentences can be represented using a parse tree, by parsing a simple sentence admitted by a grammar.

The Chomsky Hierarchy

The Chomsky hierarchy is a classification of formal languages in terms of their generative power. It is very commonly used in computational linguistics for constructing hierarchies of grammars. It includes four types of grammars, plus a fifth useful class called "Mildly Context-Sensitive Languages". These grammars are:

Recursively Enumerable Languages

Context-Sensitive Languages

Mildly Context-Sensitive Languages

Context-Free Languages

Regular Languages

This hierarchy compares the generative power of grammars in terms of their ability to define a language: a grammar has greater generative power than another if it defines a language that the other grammar cannot define. The hierarchy is constructed by subsuming the set of languages describable by less powerful grammars within the sets describable by more powerful ones. The following figure illustrates the Chomsky hierarchy.

Figure 1 A Venn diagram of the four languages on the Chomsky Hierarchy. [2] 

Constraints can be placed on the way that grammar rules are written in order to decrease the generative power of the resulting languages, from the most powerful class to the weakest. The following table (Table 1) shows the extended Chomsky hierarchy, which refers to grammars as "types" and demonstrates the constraints on the form that rules must take.

The Chomsky hierarchy uses terminal and non-terminal symbols to define these constraints. In the rule skeletons, A is a single non-terminal; α and β are arbitrary strings of terminals and non-terminals; and γ is a non-empty string of terminals and non-terminals.


Common Name                 Rule Skeleton                  Linguistic Example

Turing Equivalent           α → β, s.t. α ≠ ε              HPSG, LFG, Minimalism

Context Sensitive           αAβ → αγβ, s.t. γ ≠ ε

Mildly Context Sensitive

Context Free                A → γ                          Phrase-Structure Grammars

Regular                     A → xB or A → x                Finite-State Automata

Table 1 The Chomsky Hierarchy, augmented by the mildly context-sensitive grammars. [3]

Regular Grammar (RG)

Jurafsky and Martin in Speech and Language Processing see regular grammars as equivalent to regular expressions, the standard notation for characterizing text sequences. This grammar is type 3 in the Chomsky hierarchy and can be used to characterize any given regular language. Regular grammars can be right-linear or left-linear: a right-linear rule has a single non-terminal on the left-hand side and at most one non-terminal on the right-hand side, which must be the last symbol in the string; in a left-linear rule, the right-hand side must instead begin with (at most) a single non-terminal.

The following is an example of the right-linear grammar.

S → aA

S → bB

A → aS

B → bbS
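As written, these rules never terminate a derivation, since no right-hand side consists only of terminals; assuming an added terminating rule S → ε, the grammar generates exactly the strings (aa | bbb)*. A minimal sketch (the terminating rule is an assumption, not part of the example above) checks membership with the equivalent regular expression:

```python
import re

# Right-linear rules S -> aA, A -> aS contribute "aa"; S -> bB, B -> bbS
# contribute "bbb".  With an assumed terminating rule S -> ε, the language
# is (aa|bbb)*, which a regular expression can express directly.
PATTERN = re.compile(r"(?:aa|bbb)*\Z")

def in_language(s: str) -> bool:
    """Membership test for the grammar (with S -> ε assumed)."""
    return PATTERN.match(s) is not None

print(in_language("aabbb"))  # True:  S => aA => aaS => aabB => aabbbS => aabbb
print(in_language("ab"))     # False: no derivation yields "ab"
```

This illustrates the equivalence of regular grammars and regular expressions noted above: each right-linear rule corresponds to one alternative of the pattern.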


Chomsky and Miller (1958) proved that a language is regular if and only if it is generated by a regular grammar; the proof can be found in textbooks such as Hopcroft and Ullman (1979) and Lewis and Papadimitriou (1988).

Regular grammars are considered by many linguists to be inadequate for representing natural languages. This is supported by the argument that recursion is a feature of the grammar of natural languages, together with the fact that regular grammars are unable to generate centre-embedded recursive structures (where the non-terminal on the left-hand side of a rule reappears in the middle of the right-hand side, with material on both sides of it).


The concept of embedding is explained by Fred Karlsson (2007) in his article "Constraints on multiple centre-embedding of clauses" as the process of embedding a phrase in the middle of another phrase of the same type. Karlsson provided empirical evidence that the maximal degree of multiple centre-embedding of clauses in written language is exactly 3. He also points out the importance of multiple centre-embedding as a major syntactic factor in deciding whether natural language syntax is finite-state or context-free.

Centre-embedding is a problem for regular grammars: sentences with more than two or three embeddings are hard to process because regular grammars have no way of matching the numbers of embedded nouns and verbs, which makes regular-grammar descriptions of natural language syntax repetitious.
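The matching problem can be made concrete with the formal language aⁿbⁿ (think of n nouns that must each be matched by one of n verbs). The following sketch, not taken from the essay, decides this language with a counter, which is exactly the kind of unbounded memory a finite-state device lacks:

```python
def matches_anbn(s: str) -> bool:
    """Decide the context-free language a^n b^n (n >= 0) with a counter.

    A regular grammar cannot express this matching: recognizing it needs
    unbounded memory to remember how many 'a's were seen, the same
    bookkeeping centre-embedded clauses demand of nouns and verbs.
    """
    depth = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:            # an 'a' after a 'b' breaks the nesting
                return False
            depth += 1
        elif ch == "b":
            seen_b = True
            depth -= 1
            if depth < 0:         # more 'b's than 'a's so far
                return False
        else:
            return False
    return depth == 0

print(matches_anbn("aaabbb"))  # True
print(matches_anbn("aabbb"))   # False
```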

Context-Sensitive Grammar (CSG):

Noam Chomsky introduced the concept of context-sensitive grammar in the 1950s in order to explain the syntax of natural language, where a word may or may not be appropriate in a given context within a sentence. A formal language is called a context-sensitive language if it can be described by a context-sensitive grammar.

In a context-sensitive grammar, the non-terminal being rewritten in a production rule may be surrounded by a context of terminal and non-terminal symbols, which is preserved on the right-hand side. This type of grammar is known to be impractical for parsing computer languages, since its rules may contain several symbols on the left-hand side.

Formal definition of Context-Sensitive Grammar

The non-terminal symbol A in the context αAβ is rewritten as a non-empty string of symbols γ. Such rules can be written in two forms: αAβ → αγβ, or equivalently A → γ / α _ β (read: A rewrites to γ in the context α _ β).
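As an illustration (an assumption of this sketch, not an example from the essay), the language aⁿbⁿcⁿ is the textbook case of a language that is context-sensitive but not context-free. Its membership test is simple to state even though no context-free grammar generates it:

```python
def matches_anbncn(s: str) -> bool:
    """Decide a^n b^n c^n (n >= 1), the standard example of a language
    that is context-sensitive but not context-free: the three counts
    must agree, which a single stack cannot enforce."""
    n = len(s) // 3
    return n >= 1 and len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(matches_anbncn("aabbcc"))  # True
print(matches_anbncn("aabbc"))   # False
```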

Context-Free Grammar (CFG)

Jurafsky and Martin in Speech and Language Processing consider context-free grammar a mathematical system used for modelling constituent structure in English and other natural languages. According to them, this type of grammar models constituency: the idea that groups of words may behave as a phrase or a single unit, such as a noun phrase (NP), a verb phrase (VP) or a prepositional phrase (PP).

Many resources refer to this type of grammar as Phrase-Structure Grammar; its formalism is equivalent to Backus-Naur Form (BNF), a well-known notation technique for expressing context-free grammars.

Context-free grammar generates formal languages in which groups of words or clauses may be deeply nested within other clauses, while overlapping of grammatical structures is not allowed. Unlike a regular grammar, a context-free grammar cannot in general be formulated as a regular expression. This grammar also offers a simple and strictly stated way of illustrating how phrases in natural languages are constructed from smaller blocks.

A CFG is composed of a set of rules that state how the symbols of a language may be grouped and ordered. It describes the syntax of most programming languages and allows the construction of parsing algorithms efficient enough to determine whether, and how, a specific string can be generated from the grammar.

Formal definition of Context-Free Grammar

Context-free grammar is defined by an ordered list of four elements, also called a "4-tuple". These elements are listed within parentheses and separated by commas. Thus, the formal definition of a context-free grammar G is:

G = (N, ∑, R, S)

These symbols are divided into two classes: terminal symbols, which correspond to the words of the language (∑), and non-terminal symbols (N).

N: Set of non-terminal symbols which are also known as syntactic categories.

∑: Set of terminal symbols (disjoint from N); they are the alphabets of the language defined by the grammar.

R: Set of rules or productions of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (∑ ∪ N)*.

S: Designated start symbol, this symbol forms the root label of all well-formed trees as shown below.

The following is a simple context-free grammar example; it illustrates how to define a grammar and shows how to parse a simple sentence admitted by the grammar. Parsing in context-free grammar determines whether a given string belongs to the language that is defined by this grammar. To do this, a parser maps a string of words to its parse tree.

Example 1: "Omar saw Susan".

import nltk

my_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "danced" | "talked"
NP -> "Omar" | "Susan" | "ALEX" | Det N | Det N PP
Det -> "a" | "the" | "an" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")

sent = "Omar saw Susan".split()
rd_parser = nltk.RecursiveDescentParser(my_grammar)
for tree in rd_parser.parse(sent):
    print(tree)

(S (NP Omar) (VP (V saw) (NP Susan)))

To better understand the example above, here is a table of the syntactic categories involved:

Syntactic Category          Example

Sentence (S)                Omar saw Susan

Noun phrase (NP)            the car

Verb phrase (VP)            saw a park

Prepositional phrase (PP)   in the car

Determiner (Det)            the, a

Preposition (P)             by, in, on

Table 2: Syntactic categories used in Example 1

As shown in the example above, the production rule VP → V NP | V NP PP has a disjunction on the right-hand side, indicated by |; this is a short form for the two possible productions VP → V NP and VP → V NP PP.

Figure 2 Parsing tree for Example 1

Accordingly, the symbol on the left-hand side of the arrow is a single non-terminal that represents some cluster, while the symbols on the right of the arrow form an ordered list of terminals and non-terminals.

Jurafsky and Martin in Speech and Language Processing show that a context-free grammar can be viewed both as a device for generating sentences and as a device that assigns a structure to a given sentence. Seen as a device for generating sentences, the → reads: rewrite the symbol on the left of the arrow as the string of symbols on the right.


As shown in Example 1, a noun phrase (NP) can be composed of either a proper noun ("Omar" | "Susan" | "ALEX") or a determiner (Det) followed by a Nominal, where a Nominal can contain one or more Nouns (N). Thus, a string like "a man" can be derived from the non-terminal NP, and a CFG can in this way be used to generate a set of strings.

NP → Det Nominal

NP → ProperNoun ("Omar")

Nominal → Noun ("book")

Det → a

It is possible to combine the above rules with each other since context-free rules are hierarchically embedded.
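The combination can be sketched as a leftmost derivation. The rule table below mirrors the fragment above, with a hypothetical rule Noun → "man" added (an assumption of this sketch) so that the derivation bottoms out in words:

```python
# Rule set mirroring the fragment above; "Noun -> man" is added so the
# derivation can terminate in a string of words.
NP_RULES = {
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "Det": [["a"]],
    "Noun": [["man"]],
}

def leftmost_derivation(symbols):
    """Expand the leftmost non-terminal at each step, collecting the
    successive sentential forms -- a derivation in the formal sense."""
    steps = [list(symbols)]
    while True:
        current = steps[-1]
        for i, sym in enumerate(current):
            if sym in NP_RULES:
                expansion = NP_RULES[sym][0]   # first (only) production
                steps.append(current[:i] + expansion + current[i + 1:])
                break
        else:                                  # no non-terminals remain
            return steps

for form in leftmost_derivation(["NP"]):
    print(" ".join(form))
# NP
# Det Nominal
# a Nominal
# a Noun
# a man
```

Each printed line is one sentential form, so the output is exactly the derivation NP ⇒ Det Nominal ⇒ a Nominal ⇒ a Noun ⇒ a man.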

Figure 3 Parse tree for "a man"

The sequence of such rule expansions is called a derivation of the string of words. The figure above represents a parse tree of this derivation: the node NP dominates all other nodes in the tree, and it immediately dominates the nodes Det and Nom.

"A language is defined through the concept of derivation. One string derives another one if it can be rewritten as the second one by some series of rule applications. More formally, following Hopcroft and Ullman (1979),

If A → β is a production of P and α and γ are any strings in the set (∑ ∪ N)*, then we say that αAγ directly derives αβγ, or αAγ ⇒ αβγ." [4]

"Derivation is then a generalization of direct derivation:

Let α₁, α₂, …, αₘ be strings in (∑ ∪ N)*, m ≥ 1, such that

α₁ ⇒ α₂, α₂ ⇒ α₃, …, αₘ₋₁ ⇒ αₘ

we say that α₁ derives αₘ, or α₁ ⇒* αₘ.

Thus, the formal definition of the language L generated by the grammar G will be the set of strings composed of terminal symbols that can be derived from the designated start symbol S." [5]

L = {w | w is in ∑* and S ⇒* w}
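This definition can be made concrete with a small search over derivations. The grammar below is an invented toy fragment (not the essay's full Example 1 grammar); a breadth-first search over sentential forms collects every all-terminal string derivable from S, which is exactly L(G) when the grammar is finite and non-recursive:

```python
from collections import deque

# A toy grammar: dict keys are non-terminals, everything else is terminal.
TOY_GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Omar"], ["Susan"]],
    "VP": [["V", "NP"]],
    "V": [["saw"]],
}

def language(start="S", limit=100):
    """Breadth-first search over derivations: every sentential form with
    no non-terminals left is a sentence of L(G)."""
    found, queue = set(), deque([(start,)])
    while queue and len(found) < limit:
        form = queue.popleft()
        for i, sym in enumerate(form):
            if sym in TOY_GRAMMAR:
                for rhs in TOY_GRAMMAR[sym]:
                    queue.append(form[:i] + tuple(rhs) + form[i + 1:])
                break
        else:                      # all terminals: a string of L(G)
            found.add(" ".join(form))
    return found

print(sorted(language()))
# ['Omar saw Omar', 'Omar saw Susan', 'Susan saw Omar', 'Susan saw Susan']
```

For a recursive grammar the language is infinite, which is why the search carries a `limit`; the set notation above is the declarative counterpart of this procedure.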

Natural Languages and Grammar

We have seen that generative grammar can be considered a code that translates between orders of words and combinations of thoughts. Steven Pinker in his book The Language Instinct describes a grammar as an example of a "discrete combinational system."

"A finite number of discrete elements (in this case, words) are sampled, combined, and permuted to create larger structures (in this case, sentences) with properties that are quite distinct from those of their elements." [6] 

For example, the sentence "book reads student" has a different meaning from the meaning of any of the words inside it, and also different from the meaning of the same words combined in the reverse order.

One consequence of grammar being a discrete combinational system, according to Pinker, is that it is independent of cognition. Grammar states how a string of words may form a sentence under a specific grammatical code, but this need not yield the meaning we expect from that sentence. Here are two strings that we can interpret but that are ungrammatical:

"Is walking."

"This sentence no verb."

This demonstrates that strings we can readily interpret, like those quoted above, may nonetheless be ungrammatical under a fixed code. The opposite is also possible: sentences can be grammatically correct but make no sense. Chomsky's classic example is the following sentence:

"Colourless green ideas sleep furiously."

Pinker also points to the work of Michael Frayn, whose novel The Tin Men features a protagonist engineer called Goldwasser who devised a computer system to generate the standard kinds of stories found in the daily papers. Pinker calls this system a "word-chain device": a processor builds a sentence by choosing words from three columns or lists, moving from one column to the next. Such a system can produce an infinite number of distinct combinations from a finite set of elements, depending on the transition probabilities of words following one another.
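A word-chain device of this kind can be sketched as a table of permitted successors. The table below is invented for illustration (it is not Frayn's or Pinker's actual example), and uses uniform choices rather than learned probabilities:

```python
import random

# A toy word-chain ("bigram") device: from each word, the next word is
# chosen among its allowed successors.  The transition table is invented.
CHAINS = {
    "START": ["the", "a"],
    "the": ["dog", "student"],
    "a": ["dog", "student"],
    "dog": ["saw", "chased"],
    "student": ["saw", "chased"],
    "saw": ["the", "a"],
    "chased": ["the", "a"],
}

def word_chain_sentence(length=5, seed=None):
    """Generate a word string by following the chain, one word at a time."""
    rng = random.Random(seed)
    word, out = "START", []
    for _ in range(length):
        word = rng.choice(CHAINS[word])
        out.append(word)
    return " ".join(out)

print(word_chain_sentence(5, seed=0))
```

Note what the device cannot do: each choice depends only on the previous word, so no rule in the table can force an early word (say, a singular subject) to agree with a much later one, which is precisely the long-distance-dependency failure discussed next.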

However, Chomsky refuted the word-chain device, seeing it as a suspect system that is irrelevant to how human language works. He proved that word-chain devices cannot produce certain sets of English sentences, because they cannot handle long-distance dependencies between an early word and a later one.

Pinker states that a sentence is a tree and not a chain; that is where the difference between the word-chain system and the natural system of the human brain lies, since human grammar treats phrases as groups of words.

He uses the illustration of a tree structure holding words in place, a powerful invention that eliminates the problems of the word-chain device. He shows that an English sentence (S) consists of a noun phrase (NP) and a verb phrase (VP), while a noun phrase consists of a determiner (Det) and a noun (N), and the verb phrase consists of a verb and a noun phrase.


NP → Det N


Firstly, using the phrase-structure tree helps us avoid coupling phrases wrongly, which would lead to grammatical nonsense; by following this structure we can carefully create grammatical, meaningful sentences.

Secondly, the labelled branches of the tree serve as a comprehensive plan for the entire sentence, which allows nested long-distance dependencies to be handled easily.

Thirdly, such trees allow recursion, which makes it possible to create an infinite number of structures.

Karlsson, in his article "Constraints on multiple centre-embedding of clauses" (2007), states that the syntax of natural language is finite-state (type 3 in the Chomsky hierarchy) if and only if there is a limit on multiple centre-embedding.

Also, according to Chomsky (1957), unlimited centre-embedding cannot be generated by finite-state grammars, and this forms the basis of the proof that natural languages cannot be regular languages.


From this paper we conclude that regular grammars are inadequate for representing natural languages, since they are unable to generate centre-embedded recursive structures and cannot model multiple or unlimited centre-embedding. Context-free grammar, however, appears adequate for modelling constituent structure in English and other natural languages, since it is more powerful than regular grammar and can express recursive structure.

On the other hand, context-sensitive grammar is "too strong", as it can describe languages that are not possible human languages.

Since natural human languages rarely allow overlapping constructions, context-free grammar is especially suitable, because its grammatical structures are not permitted to overlap. This grammar offers a simple way of illustrating how phrases in natural languages are constructed, and it describes the basic recursive structure of sentences.

The tree structure is a powerful invention that eliminates the problems of the word-chain device; this is supported by Chomsky's refutation of that type of device, which cannot produce certain sets of English sentences because it cannot handle long-distance dependencies within sentences.

I believe natural languages require at least context-free grammars.