Simulation Of Human Mind Computer Science Essay

This paper deals with storing information in the form of graphs and retrieving it in response to user queries. The user feeds information into the system, which first segregates it into constituent pieces using textual entailment, stemming algorithms and natural language processing. The information is stored in a recursively chained graph, in which one node may link to another and so on, and all nodes are ultimately connected to a single central source of information, the 'brain' of the system. Retrieval of information mirrors storage; the search step is implemented with breadth-first search (BFS) to give an efficient result.

Introduction

The model can be seen as a simulation of the human mind. Just as networks of neurons store and retrieve information in the human brain (Fig. 3), our model stores and retrieves information using weighted graphs: nodes hold pieces of information, and the edges between them carry the relation by which they are connected. The model applies natural language processing (NLP) algorithms to analyse the given data and then, using a parse tree, stores the useful information in an organized manner. Stemming algorithms are used to eliminate redundant information nodes that would otherwise accumulate as the system expands and its knowledge base grows; only the most probable root words are kept as nodes, so the graph does not become overcrowded with near-identical words, which would add overhead when searching the graph to answer user queries. The next concern is textual entailment: a user can express the same information in many forms whose syntax differs but whose logical meaning is the same. To let the system recognize this, sentences with the same meaning are marked as members of the same class, which is consulted later during the search process. Together these components provide the basic requirements for implementing our system for storing and retrieving information in the form of graphs.

Related Work


Text can be represented as a graph in various ways, for instance as graphs whose vertices denote syllables (Soares et al., 2005) and whose edges denote some meaningful relation between the vertices. There exist numerous variants of such text graphs. Widdows and Dorow (2002) (Fig. 1) describe a semantic class induction algorithm that builds a graph of semantically related entities by performing web searches for the words semantically associated with all terms that match a given pattern; the semantic classes are then computed by partitioning this graph [1].

Figure 1

Mihalcea et al. (2004) (Fig. 2) describe an algorithm that determines the most central topics of a document. The approach converts the document into a graph of its words and their connections, and then returns the highest-scoring nodes as the topics [2].

Figure 2

Rule Based Information Retrieval by Computer (RUBRIC) (Tong et al., 1987) uses production rules to capture user query concepts. A query is specified hierarchically and can be viewed as a set of production rules, each with an associated weight assigned by the user. RUBRIC determines the documents required for retrieval, with the keywords to be matched against document contents placed at the leaves of the rule tree; the score of each node is calculated by AND or OR operations over its children [3].

Ogilvie and Callan (Ogilvie & Callan, 1997) introduce a document retrieval system using a tree-based generative language model in which the nodes of the tree correspond to document components such as titles, sections, and paragraphs. Each node has a language model estimated from the text it covers, and an internal node's model is a linear interpolation of its children's models. This approach allows structured queries and the ranking of document components. One benefit of the model is the guidance language modeling provides on how to estimate the probabilities used in ranking [3].

Proposed Approach


Initially, the model is just a system with pre-defined training data and methods for its functioning, with no user-defined information. It uses the 'Mind Map Algorithm' proposed in this paper, which rests on the following points:

The information retrieval system used in the algorithm is based on a graph structure: a collection of trees interconnected with each other. Each tree stores information about one subject (the model treats any outside-world entity, whether a person, place or thing, as a subject). The root node of each tree is connected to a main node, which acts as the root of all the trees' root nodes.

Each node of the graph is assigned a unique address, generated by taking the address of its root node and appending the index of the current node to it.
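
As an illustration, a minimal sketch of this addressing scheme follows; the function name and the choice of '.' as separator are our assumptions, not part of the model's specification:

    # Minimal sketch of the hierarchical addressing scheme (assumed separator: '.').
    def make_address(parent_address, child_index):
        """Append the child's index to its parent's address."""
        return "%s.%d" % (parent_address, child_index)

    main_root = "0"                             # address of the main node
    subject_root = make_address(main_root, 2)   # third sub-tree root: '0.2'
    child = make_address(subject_root, 0)       # its first child: '0.2.0'
    print(subject_root, child)                  # 0.2 0.2.0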

Data is stored in the nodes of the graph, and these values are physically stored in a database in which the node address is the primary key.
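
A hedged sketch of this physical storage, using SQLite only as a stand-in for whatever database the model actually uses; the table layout is our assumption:

    # Sketch: node values stored in a database keyed by node address.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (address TEXT PRIMARY KEY, value TEXT, relation TEXT)")
    db.execute("INSERT INTO nodes VALUES (?, ?, ?)", ("0.2.0", "red", "color"))
    print(db.execute("SELECT value FROM nodes WHERE address = ?", ("0.2.0",)).fetchone())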

For nouns denoting a person, the model automatically creates four nodes: 'personal', 'profession', 'relations' and 'misc'. For any other subject (topic), it automatically creates a new node and asks the user for more information about that subject.

For any subject (topic), the model takes information from the user as a static string, converts that string into a data tree by parsing it, and appends the tree to the related node.

The parser will be used to extract the following tuple from the given data: 1) subject, 2) value, 3) the relation between the subject and the value. E.g.:

The Red Fort is red in color.

(subject: 'Red Fort'; value: 'red'; relation: 'color')

For parsing, tagged training data containing daily conversation dialogues, simple and compound statements, and interrogative sentences will be provided. Whenever a new conversation starts, the model tags the given string and stores the information in a structured format.

Each node consists of three fields: 1) the value of the node, 2) the address of the node, and 3) the relation by which it is connected to its parent node.

If '*' is the value of a node, it is a nested node, and each of its fields stores something unique about its root node.

If the node value is '#' followed by a number, the actual value is the value of the node whose address is given by that number.

If the edge weights are given as a numbering, the root node is multivalued.

Each value can itself be a root node, starting a new tree that holds information about that value (which becomes the subject of the new tree).

The model's efficiency will increase with experience, which it gains as it gathers more and more information.
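
To make the node conventions above concrete, here is a minimal sketch; the class name, field names and the resolver are illustrative assumptions, not the model's prescribed API:

    # Sketch of a graph node: value, unique address, and relation to its parent.
    class Node:
        def __init__(self, value, address, relation):
            self.value = value        # plain value, '*' (nested node) or '#<address>'
            self.address = address    # unique hierarchical address (primary key)
            self.relation = relation  # weight of the edge to the parent node
            self.children = []

    def resolve_value(node, store):
        """Follow a '#' reference to the node whose address follows the '#'."""
        if node.value.startswith("#"):
            return store[node.value[1:]].value
        return node.value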

Figure 3

Implementation

Theoretical Aspect

Before inserting information into the nodes of a graph to form our semantic network, a small amount of preprocessing is needed; the network is then used to retrieve the information once it has been stored.

We will be using our own tagger, the KSR Tagger, which tags sentences so that information can be stored in the connected graphical network and then efficiently extracted from it. It tags a sentence with K ('key'), S ('subject') and R ('relation'). E.g.:

Who is the father of Mr.X ?

(key: 'Who'; relation: 'father'; subject: 'Mr.X')

(The key here essentially stands for 'name'. The tagger therefore creates a node whose value is the name of the father, connected to the parent node by an edge whose weight is the relation.)

In particular, if the user asks this type of question for information retrieval, the sentence is tagged so that 'Mr.X' is the main subject under consideration and 'father' is the relation of 'Mr.X' to a particular person. 'Who' refers to the 'name' of that person, so 'name' is the key here, and our answer is the value of this key. 'Mr.X' is thus connected to the answer node by an edge whose weight is 'father'.
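
The KSR Tagger itself is not specified further in this paper; the following is a hedged sketch of the idea for one question shape, with the regular-expression pattern and the key-substitution table being our own illustrative assumptions:

    # Illustrative KSR tagging of questions like 'Who is the father of Mr.X ?'.
    import re

    KEY_WORDS = {"who": "name", "where": "place", "when": "time"}

    def ksr_tag(question):
        """Return a (key, relation, subject) triple, or None if the shape is unknown."""
        m = re.match(r"(?i)(who|where|when) is the (\w+) of (.+?)\s*\?", question)
        if m is None:
            return None
        key_word, relation, subject = m.groups()
        return KEY_WORDS[key_word.lower()], relation, subject

    print(ksr_tag("Who is the father of Mr.X ?"))  # ('name', 'father', 'Mr.X')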


The next major problem to solve is natural language ambiguity: a person can ask the same thing in many ways and forms, and the system must understand every such query and produce the correct output.

The solution for this is textual entailment. The entailment relation between two texts, T (the text) and H (the hypothesis), represents a fundamental phenomenon of natural language: it is denoted T → H and means that the meaning of H can be inferred from the meaning of T [4].
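
As a sketch of how entailment classes could be used to group query variants, the check below is a deliberately crude stand-in (bag-of-words containment); a real system would use a trained entailment model:

    # Crude stand-in for a textual entailment check T -> H: H is taken to be
    # entailed if all of its content words occur in T. Illustration only.
    STOP = {"the", "a", "an", "is", "are", "of", "by", "what", "which", "does"}

    def entails(text, hypothesis):
        t = {w.strip("?.,!").lower() for w in text.split()} - STOP
        h = {w.strip("?.,!").lower() for w in hypothesis.split()} - STOP
        return h <= t

    def same_class(s1, s2):
        """Two sentences fall in one class when each entails the other."""
        return entails(s1, s2) and entails(s2, s1)

    print(same_class("The Red Fort is red in color.",
                     "In color the Red Fort is red."))  # True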

The information stored in the network must be further refined so that the network remains free of ambiguity. For example, words like 'fisher', 'fish', 'fishing' and 'fished' could each create an individual node in the graph even though they are morphological variants of one word. If each variant created a new node, the graph would suffer from redundancy and efficient traversal would become difficult, so we use stemming algorithms [5].

Example:

What are the likings of John?

What all things are liked by John?

Which things does John like?

All of the above sentences have the same meaning, i.e. "What are the things which John likes?", but they have many surface forms, each using a different form of the word 'like'. So after extracting the information, we process it, in this case with a stemming algorithm, which reduces every form of a word to its base word; the base word is then used to store the information in the graph without ambiguity.

The stemming algorithm we will use for our system is an affix-reduction stemming algorithm [5], in which probability is used to identify the root form of the word under consideration.
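
The affix-reduction stemmer of [5] is not reproduced here; as a simple stand-in for illustration, NLTK's Porter stemmer shows how morphological variants collapse toward a common base form (this assumes the nltk package is installed):

    # Reducing morphological variants to a shared stem (Porter stemmer as stand-in).
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["fish", "fishing", "fished", "likes", "liking", "liked"]:
        print(word, "->", stemmer.stem(word))
    # 'fishing' and 'fished' reduce to 'fish', and the 'like' variants to 'like',
    # so one node keyed on the stem replaces several redundant nodes.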

With all the required processing done, we move to the real job: storing the information in the network, following the procedure described in the Proposed Approach section above.

For traversal of the graph, we use breadth-first search (BFS) to search the nodes. Since the structure is a connected collection of trees with n nodes, and therefore n − 1 edges, the complexity remains O(n).

The algorithm for traversing the graph for information retrieval is as follows (a brief sketch in code follows the steps):

It always starts from the main root.

Traverse into the sub-tree whose root node's value is the subject the model is dealing with.

At each level, traverse all the child nodes first. If the condition is satisfied, stop; else go deeper into the tree until the required node/edge is found.

If the required edge/node is not found, the model infers that no such information is stored with it.
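
A minimal sketch of these four steps, assuming the Node structure from the storage sketch above:

    # Minimal BFS sketch of the retrieval steps; Node is assumed to carry
    # 'value', 'relation' and 'children' fields as in the storage sketch.
    from collections import deque

    def retrieve(main_root, subject, wanted_relation):
        # Steps 1-2: start at the main root and pick the subject's sub-tree.
        subtree = next((c for c in main_root.children if c.value == subject), None)
        if subtree is None:
            return None
        # Step 3: level-order traversal until the required edge is found.
        queue = deque([subtree])
        while queue:
            node = queue.popleft()
            for child in node.children:
                if child.relation == wanted_relation:
                    return child.value
                queue.append(child)
        return None  # Step 4: no such information is stored with the model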

Practical Aspect

The user is expected to ask queries that can be categorized as below, and the model responds according to the case that the given query satisfies. In each case, the goal of information retrieval differs:

Case I: Value is asked

In this case we build a set of keys and substitute them for the key given in the question. E.g., 'who' generally refers to the 'name' of a person, and 'where' refers to a 'place'.

Case II: Relation is asked

In this case, we select the 'subject' and the 'value' under consideration and examine the edge connecting the two nodes; its weight gives the relation between them.

E.g.: How is Mr.X related to Mr.Y?

Here, Mr.X is the subject under consideration and Mr.Y is the value. For our model to answer the question, the two nodes must be connected by an edge. So we search all edges incident on the node valued Mr.X, and if any of them leads to the node valued Mr.Y, the weight of that edge is the answer to the question asked.
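
A hedged sketch of this edge scan, under the same Node assumption as before:

    # Case II sketch: the weight of the edge joining the two nodes is the relation.
    def find_relation(subject_node, target_value):
        for child in subject_node.children:
            if child.value == target_value:
                return child.relation  # e.g. 'father' for 'How is Mr.X related to Mr.Y?'
        return None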

Case III: Subject is asked

Here, the model first extracts all the characteristics/properties of the subject using the parse tree. It then scans the graph to find the sub-tree that satisfies the given characteristics; the root node of this sub-tree is the output.

E.g.: Which shop near VIT specializes in pizza?

In this case the subject, which would form a node of the graph, is missing. The characteristics of the subject are 'near VIT' and 'specializes in pizza'; the model searches for the sub-tree satisfying these conditions and returns its root as the output.
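
A sketch of this characteristic matching; representing characteristics as (relation, value) pairs is our assumption:

    # Case III sketch: find the sub-tree whose children cover all characteristics.
    def find_subject(main_root, characteristics):
        for subtree in main_root.children:
            facts = {(c.relation, c.value) for c in subtree.children}
            if characteristics <= facts:
                return subtree.value  # the missing subject, e.g. the shop's name
        return None

    # e.g. find_subject(root, {("near", "VIT"), ("specializes in", "pizza")})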

While storing the given information, the model first tags the given sentence. The Stanford parser [6] is used for tagging, and key-value pairs are generated from the resulting parse tree. For example, VP-NP, PP-NP, VP-NN and VP-VP NP form key-value pairs, whereas pobj, nsubj and nsubjpass (attributes of the Stanford typed dependencies) are used to extract the subject of the query.
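
The examples below were produced with the Stanford parser. One hedged way to obtain such tags and typed dependencies programmatically is through NLTK's CoreNLP client, sketched here under the assumption that a Stanford CoreNLP server is already running locally:

    # Querying a running Stanford CoreNLP server (assumed at localhost:9000).
    from nltk.parse.corenlp import CoreNLPDependencyParser

    parser = CoreNLPDependencyParser(url="http://localhost:9000")
    parse, = parser.raw_parse("Robert is a teacher by profession.")
    for governor, relation, dependent in parse.triples():
        print(relation, governor, dependent)
    # e.g. nsubj ('teacher', 'NN') ('Robert', 'NNP')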

Example: 1

John is doing his B.tech from Caltech in computer science.

(subject: 'John'; key1: 'doing', value1: 'B.tech'; key2: 'from', value2: 'Caltech'; key3: 'in', value3: 'computer science')

(the model extracts multiple pieces of information from a single source, i.e. from one single query)

Tagging

John/NNP is/VBZ doing/VBG his/PRP$ B./NNP tech/NN from/IN Caltech/NNP in/IN computer/NN science/NN

Parse

(ROOT
  (S
    (NP (NNP John))
    (VP (VBZ is)
      (VP (VBG doing)
        (NP (PRP$ his) (NNP B.) (NN tech))
        (PP (IN from)
          (NP
            (NP (NNP Caltech))
            (PP (IN in)
              (NP (NN computer) (NN science)))))))))

Typed dependencies, collapsed

nsubj(doing-3, John-1)
aux(doing-3, is-2)
root(ROOT-0, doing-3)
poss(tech-6, his-4)
nn(tech-6, B.-5)
dobj(doing-3, tech-6)
prep_from(doing-3, Caltech-8)
nn(science-11, computer-10)
prep_in(Caltech-8, science-11)

Example: 2

Robert is a teacher by profession.

(subject: 'Robert'; value: 'teacher'; key: 'profession')

Tagging

Robert/NNP is/VBZ a/DT teacher/NN by/IN profession/NN ./.

Parse

(ROOT
  (S
    (NP (NNP Robert))
    (VP (VBZ is)
      (NP
        (NP (DT a) (NN teacher))
        (PP (IN by)
          (NP (NN profession)))))
    (. .)))

Typed dependencies, collapsed

nsubj(teacher-4, Robert-1)
cop(teacher-4, is-2)
det(teacher-4, a-3)
root(ROOT-0, teacher-4)
prep_by(teacher-4, profession-6)

Example: 3

Alexar was founded by Trix in 1987.

(subject: 'Alexar'; key1: 'founded', value1: 'Trix'; key2: 'in', value2: '1987')

Tagging

Alexar/NNP was/VBD founded/VBN by/IN Trix/NNP in/IN 1987/CD ./.

Parse

(ROOT
  (S
    (NP (NNP Alexar))
    (VP (VBD was)
      (VP (VBN founded)
        (PP (IN by)
          (NP (NNP Trix)))
        (PP (IN in)
          (NP (CD 1987)))))
    (. .)))

Typed dependencies, collapsed

nsubjpass(founded-3, Alexar-1)
auxpass(founded-3, was-2)
root(ROOT-0, founded-3)
agent(founded-3, Trix-5)
prep_in(founded-3, 1987-7)

Example: 4

Michael likes playing football, swimming and listening music.

(subject: 'Michael'; key: 'likes'; value1: 'football', value2: 'swimming', value3: 'listening music')

(here all values point to one key)

Tagging

Michael/NNP likes/VBZ playing/VBG football/NN ,/, swimming/NN and/CC listening/NN music/NN ./.

Parse

(ROOT
  (S
    (NP (NNP Michael))
    (VP (VBZ likes)
      (S
        (VP (VBG playing)
          (NP (NN football) (, ,) (NN swimming)
            (CC and)
            (NN listening) (NN music)))))
    (. .)))

Typed dependencies, collapsed

nsubj(likes-2, Michael-1)
root(ROOT-0, likes-2)
xcomp(likes-2, playing-3)
nn(music-9, football-4)
conj_and(football-4, swimming-6)
nn(music-9, swimming-6)
conj_and(football-4, listening-8)
nn(music-9, listening-8)
dobj(playing-3, music-9)

Example: 5

India got independence in 1947. It is located in south Asia.

(subject: 'India'; key1: 'independence', value1: '1947'; key2: 'located', value2: 'south Asia')

(extracting information from a compound statement)

Tagging

India/NNP got/VBD independence/NN in/IN 1947/CD ./. It/PRP is/VBZ located/VBN in/IN south/JJ Asia/NNP ./.

Parse

(ROOT
  (S
    (NP (NNP India))
    (VP (VBD got)
      (NP (NN independence))
      (PP (IN in)
        (NP (CD 1947))))
    (. .)))

(ROOT
  (S
    (NP (PRP It))
    (VP (VBZ is)
      (VP (VBN located)
        (PP (IN in)
          (NP (JJ south) (NNP Asia)))))
    (. .)))

Typed dependencies, collapsed

nsubj(got-2, India-1)
root(ROOT-0, got-2)
dobj(got-2, independence-3)
prep_in(got-2, 1947-5)
nsubjpass(located-3, It-1)
auxpass(located-3, is-2)
root(ROOT-0, located-3)
amod(Asia-6, south-5)
prep_in(located-3, Asia-6)

Conclusion

The process of storing and extracting information is an essential step in building any human-computer interaction model. The proposed algorithm is easy to implement, and its applications range from personal conversation to inference and analysis of statistical data, provided suitable algorithms are fed to it. The use of a graph not only improves efficiency in terms of time and space but also provides a suitable way to retrieve information and thus generate output for the user's justified queries.

In the future, further functionality can be added to the model, including the execution of routine tasks, priority-based operation according to the user's choices, and self-learning from external sources such as the internet to gain more experience in less time.