Advantages That A Graphical Database Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

This paper deals with the advantages that a graphical database can provide over conventional database in terms of real world applications because of a unique property exhibited by the graphical databases which is, that the edges inside the graph that is the relationship between the nodes can be dynamically changed in real time without any computational overrides of dissolving the dependencies thereby providing a more optimized as well as structure to compute and manipulate, which is given as a hypothesis. It also throws light on how the data mining techniques can be implemented in the social networking making it more intelligent and responsive system which also propose a methodology to generate a social graph of user's action and predict future social activities using graph mining. Through this model, we believe that it becomes clearer that data from different contexts can be related such that new solutions can be explored; thus, it may provide illumination for the aforementioned problems and stimulate new research.

Social networking has become an important area of study because of the sprawling and rapidly evolvement over the last decade with the advance of Internet and Web Technologies. Computing is shifting to the edges of network, and individual users are empowered with technology to engage in Social interaction, sharing content and distributing information. As a result the data is increasing day by day at a rate which is difficult to control and process. Hence a model is needed which can handle the discrepancies in the system connecting the people of similar interest which consists of following some pattern which could be extremely useful to obtain information that could be of interest to different people of different areas. Hence we would be using a graphical model to represent social network data, describing and analyzing patterns of social relations. A novel approach for this would be visualizing datasets into dynamic, directed, and weighted graphs. The datasets will be obtained from the online social networking site and the graph would be constructed using the same dataset. The graph depicts the user's status in real time scenario using links and the weights on the links. The association mining rules would be generated based on the graph and these rules would be used for doing various predictions.

One such example of Graph Database System is the Facebook new Architecture which is also known as the Graph Search. Facebook Graph Search is based on Unicorn which is a graph traversing tool. Using traditional information-retrieval systems to mix keyword and structured queries is fairly well understood. But Facebook needed the system also to find answers more than a single connection away, such as "restaurants liked by my friends from India." Unicorn allows you to write a simple s-expression query to get a list of all Facebook Ids which are your Friends of friends. Using s-expression you can traverse the graph in multiple depths in just one go. Facebook stores two basic types of data: objects and the relationships between those objects. And with Open Graph, the model has been extended to third-party applications and websites.

II. Visualization

In general, the way graph databases work is that you use some index to find a starting point, called a node. Once you have a starting point, you traverse the graph node by node by traveling across relationships. Designing for a graph database is all about turning your data into nodes and relationships.

For Facebook Graph that starting point is often your account, every item in the Graph is a node, and your actions (like, share, etc.) are relationships between you and those nodes. 

Recalling that a graph is a collection of vertices (or nodes) and edges between them. The vertices are abstract nodes and edges represent some sort of relationship between them. In the case of a social network the vertices are people and the edges represent a kind of social or person-to-person relationship. "Friends of," "married to," "slept with," and "went to school with" are all example of possible relationships that could determine edges in a social graph. A social structure is represented normally as a graph with individuals as nodes and relationships as edges and a lot could be learnt about the people studying their social network.

If an edge exists {a, b} then we can say that nodes a and b are related to each other. The edges themselves can be unordered pairs of nodes or in a directed graph (digraph), ordered pairs of nodes where each edge has a direction, sometimes called an arc. Graphs are (generally) non-reflexive; nodes are not related to themselves. Order is # of nodes, size is # of edges.

Most of the social networks are not static and hence it is useful to think about how a network evolves over time i.e. how nodes arrive and depart and how the edges form and vanish. If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends too. This could be understood by the Triadic closure-a principle Triadic closure-a principle. The formation of the edge between Band C illustrates triadic closure. B and C have an incentive to become friends (form a BC edge).

Hence we are proposing data mining technique which could be used to form a social graph in the real scenario of dynamic nature. The data obtained, contains information of diverse people from different areas of world and these data are stored in a database. This database can be mined using mining techniques for social marketing professionals, advertisement, analysts and also as find risky group of people for security purpose. Now these predictions can only be done using proper association rules techniques or algorithms that scans multiple databases for association rules. But techniques like apriori , fptree algorithms produces more complexity in terms of time and space as databases are traversed multiple times thus reducing efficiency.

Thus we approach the problem with social graph technique where the database is broken down and visualized as a directed, weighted graph that dynamically increases its size. First the static data is taken from user during registration of the user and a graph is formed which dynamically increases based on usage or behavior of the user within the social networking site. The links within the graph represents association or relationship and the weights represent the strength of the relationship within the links. And so we can use association techniques and suggest friends to the user based on his/hers mutual friends, or become a part of an organization by understanding the likes and dislikes of the user. And this is performed where the database of the user acts as training datasets for social graph technique.

Visualizing the proposed concept:

Why we need the proposed technique - To increase user friendliness of the social networking site and make it more intelligent towards its users, and also to increase social awareness and improvements

How it is done ?- The dataset containing user information is converted to a directed, weighted social graph that dynamically increases its size based on user interaction and behavior and trains itself to intelligently form social relationships between other users, thus creating a social web of information

Rules in the social graph formation:

Each edge represents the relationship between the nodes, which is the activeness of the user and it dynamically increases on association techniques

Each edge consists of weights that represent how strong the relationship is.

Each edges or link are when linked between two users and unidirectional if link is between community and user


Two users are represented using nodes:

Description: C:\Users\Arka\Downloads\graph.jpg

If two they are friends then they are represented with nodes connected by bidirectional edge.Description: C:\Users\Arka\Downloads\graph (1).jpg

And depending on the closeness of friendship between them, weights are assigned like: just friends, best friends, medium friends, etc.

Description: C:\Users\Arka\Downloads\graph (2).jpg

Thus a social graph is formed related to all the relationship between users and also between users and community and they are related using edges with weights.

Description: C:\Users\Arka\Downloads\graph (3).jpg

Now take this concept to calculate the influence factor among the nodes in our graph. To get the picture, consider the following scenario. Facebook, thanks to the newsfeed, is essentially a word-of-mouth engine. Everything we do, from installing applications to commenting on photos, is broadcast to all of our friends via the newsfeed. Intuitively, however, we know that some people are just more influential than others.

In a social network, people who are more influenced are considered to me more important to us. Even Facebook uses the same technique to classify the important news feeds from the other by the method of influence function which is determined by the technique below.

We can measure the influence between users by means of connectivity and strength of relationship between them. We know that the degree of a node is the number of nodes connected to it. And from this connectedness we can find the influence of each node over the other by the weights between the connectivity i.e. the strength of the relationship. Let's represent influence by the function I where,

I = degree (p) = number of friends of the person p

The main advantage of this is to find the influence. The graph is formed using the adjacency matrix calculated, then the influence of a node is the sum of the row in the corresponding row and the operation is less complex in term of performance, but considered to be the naïve approach. Let us consider this graph:

Single person with high Degree


Single person low degree but high connectivityC:\Users\DTek\Music\graph\low-degree.png

Using Id as a measure of influence the first person, p1, has a higher measure of influence because they are directly connected to eight people. The second person, p2, however, has the potential to influence up to 9 people. This same happens in the real world, too. One way to capture this sort of indirect influence is to use a measurement called eigenvalue (or eigenvector) centrality. The idea is this: a person's influence is proportional to total influence of the people to whom he is connected. We'll call this influence measure Ie.

Let's consider I'm the CEO at A Corp. There are four VPs each of whom has an influence of 5 and these are the people to whom I am connected directly. Then this measure says that there is some number λ such that

Ie (CEO) = 1/ λ [ Ie (VP1) ] + [ Ie (VP2) ] + [ Ie (VP3) ] + [ Ie (VP4) ]

λ determines how much influence people share with each other through their connections. If λ is small then the CEO has a lot of influence, if it is large then he has little. How do we calculate λ?

Let G be a social network or social graph, where vertices are people and edges are some sort of social relationship, and let A be the adjacency matrix of G. If there are N people in the social network labeled p1, p2,... then we can generalize the above and say that


Remember that Ai,j is 1 if pi and pj are joined by an edge and 0 otherwise. It's also important to notice that λ is a function of the graph not of any individual node.

If we call xi = Ie(pi) then we can form a vector x whose ith coordinate is the influence of the ith person. We can rewrite the above equation using vectors and matrices, then, as follows:

X = 1/ λ (A/X)


Given a digraph D(V,E), the in-neighborhood of a node v, denoted Ni(v) is the set of vertices that send arcs to v. That is, Ni(v) = {u: (u,v)  E}. The out-neighborhood of a node v, denoted No(v) is the set of vertices that receive arcs from v. That is, No(v) = {u: (v,u)  E}.

A coloration C is an assignments of colors to the vertices V of a digraph. The color of a vertex v is denoted C(v) and the set of distinct colors assigned to nodes in a set S is denoted C(S) and termed the spectrum of S. In Figure 1, a coloration of nodes is depicted by labeling the nodes with letters such as 'r' for red, and 'y' for yellow. Nodes colored the same are said to be equivalent.

Figure 1.

A coloration is a strong structural coloration if nodes are assigned the same color if and only if they have identical in and out neighborhoods. That is, for all u,v V, C(u) = C(v) if and only if Ni(u) = Ni(v) and No(u) = No(v). The coloration in Figure 1 is a strong structural coloration. We can check this by taking pairs of nodes and verifying that if they are colored the same (i.e., are strongly structurally equivalent) they have identical neighborhoods, and if they are not colored the same, they have different neighborhoods. For example, b and d are colored the same, and both of their neighborhoods consist of {a,c,e}.

Note that in strong structural colorations, any two nodes that are colored the same are structurally identical: if we remove the identifying labels from the identically colored nodes, then spin the graph around in space before placing it back down on the page, we would not be able to figure out which of the same-colored nodes was which. Consequently, any property of the nodes that stems from their structural position (such as expected time until arrival of something flowing through the network) should be the same for nodes that are equivalent.

A coloration C is regular if C(u) = C(v) implies that C(Ni(u)) = C(Ni(v)) and C(No(u)) = C(No(v)) for all u, v  V. In other words, in regular colorations, every pair of nodes that has the same color must receive arcs from nodes comprising the same set of colors and must send arcs to nodes comprising the same set of colors. Every structural coloration is a regular coloration.

Figure 10. Regular coloration.

The coloration in Figure 2 (which depicts the same digraph as in Figure 1) is regular, but not strongly structural. To see this, consider that every red node has an out-neighborhood containing only yellow nodes (e.g., C(No(a)={y}) , and an in-neighborhood containing only yellow nodes, while every yellow node has an out-neighborhood containing only red nodes and an in-neighborhood containing only red nodes. Figure 3 depicts another regular coloration. Note that node g could not be colored the same as f or i, because it has an out neighborhood consisting of a white node, while f and i have no out neighborhood at all. Consequently node p could not be colored the same as c and e, since p's out-neighborhood contains a node of a different color than the c and e. This also implies that g cannot be colored the same as f and i because it received a tie from a node of a different color.

Figure 3. Regular coloration

If a graph represents a social network, we can think of the colors as defining emergent classes or types of people such that if one member of a certain class (blue) has outgoing ties to members of exactly two other classes (yellow and green), then all other members of that (blue) class have outgoing ties to members of those same two classes (yellow and green). Thus, regular colorations classify members of a social network according to their pattern of relations of others, and two people are placed in the same class if they interact in the same ways with the same kinds of others (but not necessarily with same individuals).

Just as the various generalizations of cliques are attempts to capture mathematically the notion of a social group, regular colorations are an attempt to capture the notion of a social role system, in which people playing a certain role have characteristic relations with others who are playing complementary roles (such as doctors, nurses and patients).


Thus we have proposed a mixed approach towards the formation of a social graph. This mixed approach is a combination of mining technique, graph coloration technique and influence factor to form a more stable, dynamic structure of the social web. This system also predicts the activities that could be done by the user. Based on this prediction technique, we could also predicts the activity, membership of the user in different community or between two communities. We have also visualized, how these techniques are easier and has less time complexity and better performance than other association rules like apriori algorithm or fptree formation. Thus we conclude the paper by proposing a better approach or improvement in social networking area