# The Study Of Graphs And Algorithms Computer Science Essay

Published:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The study of graphs and algorithms is very crucial in computer science. In the last few decades the use of algorithms has solved many difficult problems in a very easy manner. Therefore in many universities and colleges a lot of emphasis is made on the study of graphs and algorithms. A lot of research work is being done on this phase of computer science field.

Analysing an algorithm means the time taken by the algorithm to execute i.e. an approximation of the number of operations that an algorithm performs. The number of operations depends on the execution time; therefore time is used to describe an algorithm's computational complexity. The actual number of seconds that an algorithm takes to execute on a computer is not useful in analysis because the relative efficiency of algorithms that solve a particular problem is our main concern.

Graphs are a main part in composite systems and the use of graph algorithms to excerpt useful information for a problem is the main method of analysis for complex systems. A graph representation is used to provide a particular viewpoint on a problem and a graph hypothetical demonstration is useful for a particular problem depends on whether the interesting features of the problem are likely to be emphasized by the kind of idea that a graph-based representation may provide. The most common types of problem in which operational features play a vital role are problems involving multiple interrelating representatives in complex systems disciplines [3].

So the project is design and implementation of breadth first search on implicit graph for web mining application.

The web application will be the website of college BITS-Pilani, Dubai having 3 areas of interest:

Special Interest Group (SIG).

Higher Education (HE).

## Objectives

The present work is intended to meet the following objectives

To design BFS implicit graph and algorithm for web mining application.

To identify the Academic Search related functions where Web Structure Mining and Breadth First Search can be applied effectively.

## Motivation

The net structure of the World Wide Web (www) is constantly changing due to the addition and removal of web pages (i.e. nodes) or the increase or decrease in the no. of incoming and outgoing links (i.e. edges) to or from a page. It is very essential to maintain the web structure in an organized way and implementing using BFS under Graph Structure Analysis.

## BFS performed on the graph

Fig 1.4 Problem Description

## Methodology

Phases 1: Literature Survey (referring to journals, catalogues, websites etc.).

Phases 2: Data Collection, Learning of Necessary Software or Hardware Tools and Installation and Configuration Steps, Problem Analysis.

Phase 3: Problem Analysis cont. and Design and Generation of Real Data, Coding.

Phase 4: Coding Contd. and Individual Modules Testing and Debugging.

Phase 5: Integration Testing and Debugging, Implementation.

Phase 6 and 7: Final Touches, Summary of Findings, Recommendations, Conclusion and Report writing.

Table 1.5 Phases of the project

## Phase 1

Sept6th-Sept15th

In-depth Research on Graph Structure Mining and breadth first search.

## Phase 2

Sept16th - Oct1st

Acquisition and installation of the intended platform.

## Phase 3

Oct 2nd- Oct15th

Learning the coding language of platform.

## Phase 4

Oct 16th - Nov1st

Submission of the mid-semester report, Design and generation of real data, coding.

Nov2nd-Nov15th

Coding.

## Phase 6

Nov 16th- Dec 1st

Implementation and Testing.

## Phase 7

Dec1st- Dec 15th

Fine tuning of the report, conclusion and submission of the final report.

## 1.5.1 Plan of Work

Fig 1.5.1 Plan of work

## Summary

In this chapter it's discussed about graphs and its use in web mining applications. The project deals with design and implementation of breadth first search on implicit graph and java coding for web mining application i.e. an educational website. It also discusses about the problem description, methodology and plan of work.

## 2.1 Graphs, BFS and Web Mining

For computing running time of an algorithm for a given graph G=(v,e) the size of the input in terms of the no. of edges and vertices is measured. The asymptotic notation i.e. the O notation or Î˜ notation denotes the running time of an algorithm.e.g. the algorithm runs in time O (V E) means that the algorithm runs in time O (|V||E|) [3].

A graph is a way of demonstrating networks between points. Mathematically a graph is a collection of nodes and edges. Nodes are the points that are linked together by the edges of the graph. In the undirected graph the edge is a two way connection and in the directed graph the edges connect only one way [5].

1

4

3

2

6

5

Fig 2.1 A sample graph with nodes 1, 2, 3, 4, 5, 6

In Fig 2.1 1, 2, 3, 4, 5, 6 are the nodes and the path between them are edges. The edges are directed edges (if shown by arrows) or can be weighted edges (denoted by some numbers assigned to them). Therefore a graph can be a directed or undirected and weighted or un-weighted graph.

There two ways to represent a graph are:

For both directed and undirected graphs the amount of memory required by adjacency list is O(V + E).Adjacency lists are used to represent "weighted graphs" by a weight function w: E â†’ R.

The problem with the adjacency list depiction is that there is no quicker way to resolve if a given edge is present in the graph or to search it in the adjacency list Adj[]. This disadvantage is taken over by the adjacency matrix representation of the graph but adjacency matrix uses more memory [5].

The adjacency matrix representation of the graph G consists of a |V|x|V| matrix A = (aij) such that

aij= 1 if (i, j) E

aij= 0 otherwise [5]

The adjacency matrix of a graph requires O(V2) memory (or the running time) autonomous of the edges contained in the graph [4].

The adjacency matrix representation can also be used for weighted graphs as the adjacency lists.

In a breadth first traversal first visit the starting node and then on the first pass visit all of the nodes directly connected to it. In the second pass visit nodes that are two edges away from the starting node. With each new pass visit nodes that are one more edge away. As there can be cycles in the graph it is possible for a node to be on two paths of different lengths from the starting node. When visiting that node for the first time along the shortest path from the starting node that is not considered again. Therefore, either keep a list of the nodes visited or use a variable in the node to mark it as visited node to prevent multiple visits [6].

Using the adjacency matrix representation and assuming that the number of vertices is N, breadth first search can be written like this

## BFS Algorithm

Void BFS (int startingVetex) {

IntQueue queue; // Starting with an empty queue of integers.

for (int i = 0; i < n; i++)

## {

visited[i] = false; // Mark all vertices unvisited.

## }

visited [starting Vertex ] = true;

queue.enqueue (starting Vertex);

while (! queue.isEmpty ()) {

int v = queue.dequeue (); // Get next vertex from queue.

process (v); // Process the vertex.

for (int u = 0; u < n; u++) {

if (edge Exists[v][u] && visited[u] == false ) {

// we have an edge from v to an unvisited vertex;

// mark it as visited and add it to the queue.

visited[u] = true;

queue.enqueue (u);

## }

In this algorithm each vertex that is encountered is marked as visited and placed on the queue. It is processed when it comes off the queue. If it is encountered a second time (by an alternate path from the starting vertex), it has already been marked as visited and so is not placed on the queue again. Since no vertex is placed on the queue more than once the while loop is executed max N times. The run time for this algorithm is O(N2) where N is the number of vertices. For edge list representation the run time would be O (M) where M denotes the no. of edges [6].

For this algorithm the first vertex on the queue is the starting vertex. When the starting vertex is removed from the queue all vertices that are connected to the starting vertex by a single edge are located and are placed on the queue i.e. the vertices at distance one from the starting vertex. As these vertices are removed from the queue, the vertices to which they are directly connected are added to the queue. These are the vertices at distance two from the starting vertex and they follow all the distance one vertices on the queue. As the distance two vertices are processed the distance three vertices are encountered and are added to queue following all the distance two vertices and so on. Vertices are added to the queue in order of their distance from the starting vertex i.e. they are added in breadth first order [6].

## 2.1.2 Analysis of BFS

Before deliberating the various properties of breadth first search, scrutiny of the running time on an input graph G is done. The operations of enqueue and dequeue takes O (1) time. Therefore the total time taken by queue operations is O (V). The total summation of the lengths of all the adjacency lists is O (E).Therefore the total time consumed in scanning adjacency lists is O(E). The running time for initialization is O (V). Therefore the total running time of BFS is O(V+E). Thereby BFS runs in time depending upon the size of the adjacency list of graph G [3].

## 2.1.3 Web Mining

Web mining is the combination of information collected by data mining methods with information grouped over the World Wide Web. Web mining is used to understand customer behaviour, estimate the efficiency of a particular web site and help compute the success of a marketing operation [7].

Web mining allows us in outlining the data through content mining, structure mining and usage mining. Content mining is used to observe data that is collected by search engines. Structure mining is used to scrutinise data related to the structure of a particular web site. Usage mining is used to survey data related to a user's browser as well as data gathered by forms succumbed by the user during web dealings [7].

The information collected through Web mining is estimated (using software graphing applications) by using data mining parameters such as clustering and classification, association and examination of sequential patterns.

## 3.1 Introduction

The project deals with the design and implementation of BFS on BITS-Pilani, Dubai website using JAVA coding. The graph generation is also part of it.

## 3.2 Hardware Software Requirements

The project work is done using a system with the following configuration

## Hardware

Processor- Intel Core 2 Duo T5450 @ 1.67 GHz.

Motherboard - Gigabyte P43-ES3G - Intel P43 Chipset +ICH10.

RAM - Lenovo 2.00 GB RAM (DDR2 1066 MHz.

Hard Disk - Seagate 160 GB SATA.

Optical Drive - Lenovo SATA 32x DVD-writer.

Network card - Onboard.

Sound Card - Onboard high-definition Audio.

Graphics card - NVIDIA GeForce 8400 GT-256 MB.

## Software

Operating systems - Windows Vista SP 2.

Office suite - Microsoft Office 2010.

JCreator 4.50 Pro.

JDK.

Telnet.

Graph Traversal Applet.

## 4.1.1 JCreator

JCreator is a Java IDE (Integrated Development Environment) created by Xinox Software. Its interface is close to that of Microsoft's Visual Studio.

Features of JCreator are:

Convention colour schemes.

Enclosing the prevailing projects.

Various JDK profiles can be used.

Rapid code inscription via project prototypes.

Easy project inspecting through class browser.

Debugging using an easy interface. No command line prompts necessary.

ProgrammedÂ Class pathÂ formation.

The run time environment runs the application in a J Unit background or in a command line window as an applet.

JCreator IDE doesn't require a Java Runtime Environment (JRE) to execute that makes it faster than Java-based IDE's [9].

## Starting a new program

JCreator creates the directory for the user.

SelectÂ Project->New Project. In the dialog name is given to the project. For example, if the homework files are contained inside a directoryÂ hw1, call the projectÂ hw1 [9].

EditÂ LocationÂ field (contains the full index path of program files), such asÂ c:\homework\hw1.

Click on theÂ OkÂ button. The project is opened and the software (JCreator) augments a Java file with the same name as the project such asÂ Hw1.java. SelectÂ Edit->DeleteÂ from the menu to delete the file. A new class is added to the project by choosingÂ Project->New classÂ from the menu [9].

Fill in the name of the class (such asÂ Hello). A new file (such asÂ Hello.java) is added to the project. Type the program into the source window. Note the code accomplishment feature that suggests how to complete partially typed lexes.

## Compiling a program

SelectÂ Project->Active Project and then select the current project from the submenu.

To compile the program, selectÂ Build->Compile ProjectÂ or hit theÂ F7Â key.Â  Compiling errors are demonstrated in a separate window.Â

## Running a program

After the program compiles successfully run it. SelectÂ Build->Execute ProjectÂ or hit theÂ F5Â key. The program executes in a separate console window [9].

## 4.1.2 The Graph Traversal Applet

This software animates Breadth First Search or Depth First Search algorithms and shows how graphs are traversed in each algorithm.

The applet provides a simple user interface that helps the users to control all the tasks required to run the animation. For accessing the user interface, precedence is given to allow users to control the animation instead of the applet monitoring the user actions and the rate of animatronics [5].

## 4.2.1 Java File Handling

The Java's I/O system is stored java.io. Put java.io when needed to read or write files. The other package including file handling classes is java.util.zip. The classes in java.util.zip can produce compressed or decompressed files. This class is based on the functions provided the I/O classes defined in java.io. Therefore it is unified in to Java's overall I/O strategy.

## Sample

public void writeFile() {

String filename = "c:\\test.txt";

try {

BufferedWriter writer = new BufferedWriter(new FileWriter(filename,true));

writer.write("test the input.");

writer.newLine();

writer.write("line 4");

writer.close();

} catch (IOException g) {

g.printStackTrace();

## }

Frontier search is a memory efficient approach to graph search that only stores the Open list and saves memory by not storing the 'closed list'. It uses techniques to prevent regeneration of previously closed nodes that are no longer in memory as the traditional trace back method of solution recovery requires all closed nodes to be in memory. It also uses a divide and conquer method of solution recovery in which each search node keeps track of an intermediate node along the best path that leads to it. When the goal node is selected for expanding the intermediate node associated with it must be on an optimal solution path. This intermediate node is used to divide the original search problem into two sub problems - Finding an optimal path from the start node to the intermediate node and to find an optimal path from the intermediate node to the goal node. Then these sub problems are solved recursively by the same search algorithm until all the nodes along an optimal solution path for the original problem are identified [1].

## 5.1 Introduction

In this chapter the discussion is about the algorithm regarding the generation of the graph by inputting the structure of the website (say www.bitsdubai.com) and then implementing breadth first search for this graph generated. The coding of the program is done with java and the platform is JCreator.

The program is an open program i.e. it can be implemented for any web application and by any user.

## Input

The inputs for the program are the links associated with site www.bitsdubai.com these links act like nodes and edges for the graph.

For the class ProjGraph :

Parsing a text file holding the data of the graph.

To override the toString() method to check the proper construction of the graph.

For the class BFS :

Tracking the edges of a particular vertex.

Checking if some vertex is connected to some other vertex or not.

## Generating Graph

Creation of the adjacency list and parsing the data file.

As the graph is a undirected graph, therefore connect vertex X to vertex Y and vice versa by using a private-helper method.

If vertex X already exists in the adjacency list, get its edge list and add vertex Y to it.

If it doesn't exist create a new Array List and add vertex Y to it and put it all in the adjacency List.

The function "private void connect()" returns true iff vertex X points to vertex Y.

The function "return edges.contain()" returns all the edges of a certain vertex or throws an exception if the vertex does not exist in this graph ("throw new RuntimeException()").

The function "private void parseDataFile(String filename)" reads a text file with the graph-data.

Then the string representation of the graph is created.

The main function of the program takes the input from the data file using file handling and generates the graph.

## Performing BFS

A stack is created to store all the vertices of our path.

First the 'end' vertex is pushed on the stack.

Then a loop is put from the highest level back to the first level and a loop through each level.

Each vertex in that level is checked whether it is connected to the vertex on the top of the stack. If a match is found that match is pushed on the stack and broken out of the loop.

Now the stack is reversed before returning the path in the correct order (from start to finish).

During the BFS traversal a queue 'cont' is created consisting of newly discovered vertices.

Get the level from the queue which was added last and then a new level is created to possible unexplored vertices in.

Loop is done through each of the vertices from the last added level and connects the neighbouring vertices to each vertex.

Then looping through each connecting vertex and if the connecting vertex has not yet been visited, only then add it to the newly created level-list and mark it as visited in our set.

There is no need to search any further if user stumble upon the 'end' vertex .In that case just 'return' from this method.

Only make the recursive call if vertices which have not been explored yet are found.

AnÂ emptyÂ queue (cont)Â is created toÂ storeÂ allÂ theÂ levelsÂ fromÂ theÂ 'start' vertex in. A level is also a list with one or more vertices.

TheÂ firstÂ levelÂ onlyÂ holdsÂ theÂ 'start'Â vertex,Â whichÂ isÂ addedÂ first, thisÂ isÂ theÂ 'initial'Â list.

The 'start' vertex is also stored in asset which keeps track of all the vertices we have encountered so that we do not traverse vertices twice (or more).

Once the above 3 steps are initialized, call the actual BFS-Algorithm.

OnceÂ theÂ BFS-AlgorithmÂ isÂ done,Â callÂ theÂ backTrack()Â method toÂ findÂ theÂ shortestÂ pathÂ betweenÂ 'end'Â andÂ 'start'Â betweenÂ theÂ exploredÂ levelsÂ ofÂ the graph.

## Output

The output of the program is the graph generated for the structure of the website as input and for this graph BFS is performed.

## 5.3 Summary

Therefore the main method in the program involves getting the structure of the website for which the BFS has to be performed and then parsing the text file (java file handling) that holds the data of our graph.

Therefore the main method involves:

Creation of the graph.

File handling.

Getting the shortest path between 2 vertices (BFS) and print the shortest path.

## 6.1 Performance Analysis of BFS

Table 6.1 Requirements for BFS referring time and memory

## 6.2 Applications of BFS

Finding all nodes within one connected module.

Finding the shortest path between two nodes.

Analysing a graph for bipartiteness.

Computing the maximum flow in a flow network.

BFS can be implemented to any search problem. BFS never get trapped exploring the useless path. For more than one solution, BFS can find the minimal one that requires less number of steps [1].

BFS can be used to experiment bipartiteness by beginning the search function at any vertex of the graph and giving sporadic labels to the vertices which have been visited.

The disadvantage of BFS is being a "blind search" i.e. when the search space is large the search recital will be poor associated to other exploratory searches .BFS performs well for the small search space. It performs best when the goal stated is in upper left hand side of the tree. It performs judiciously poorly relative to other search algorithms if the goal state lies in the bottom of the tree. The main drawback of BFS is its memory obligation. If any solution is far away from the root, BFS will consume lot of time [1].

## FUTURE EXTENSIONS

This project can be implemented in future for Image Processing Problems.

The basis of the project i.e. breadth first search can be used for the dispensation of digital images by using competent algorithms for corroding, distending, skeletonizing, and distance transforming regions [2]. These algorithms traverse in breadth first way, using a queue for storage of unrefined pixels and manage memory efficiently. As soon as the processing has been completed the pixels are removed from the queue. The image is still epitomized as a pixel milieu in memory but the graph is just a appropriate structure for thinking about the algorithms. All the algorithms will be based on viewing a raster image not as a pixel matrix but as a graph whose vertices represent pixels and whose edges represent neighbourhood between pixels. This suggests that pixels in the image can be traversed in a breadth-first manner, rather than with the traditional raster scan. Because pixels to be processed are stored in a queue, breadth-first traversal yields propagation patterns that grow or shrink uniformly across their boundary like wave fronts [2].

The algorithm of image processing problem will be using following:

Flood Filling (or Object labelling).

Storing Boundary Pixels.

Repeated Erosions.

Distance Transformation.

Skeletonization.

This project can also be implemented in future for creating an educational website where the user can select keywords that are present in world-wide web. Then using breadth first search to extract the nodes and their values in a web page and map them to a proper format so that it is easy for the users to search for a particular data whenever they want.

## LESSONS LEARNT

In the project regarding BFS it has been seen that BFS is a comprehensive search algorithm. It is easy in implementation and can be used for any search problem.

BFS has no probable infinite loop problem which crashes the computer. Breadth first search never gets entombed while probing the useless path. If a solution exists, BFS traces it out.

Moreover a lot of information about web mining, graphs, java file handling, parsing etc. is also discussed and their use in computer science. This project provides a basis for designing academic search web techniques and it also explains how the links in a website are accessed and explored.

## CONCLUSION

The report concludes the four months of the project work on Design and Implementation of BFS on Implicit Graph for Web Mining Application. The work mainly revolved around studying and analyzing as much literature as possible on topics like running time of algorithms, graphs, breadth first search, graph traversal applet, web mining, JCreator etc. In this report the discussion was about the literature, algorithm, design (data, program and interface) java coding and implementation of the algorithm of BFS, execution time of the algorithm, creation and traversal of the graph for BFS.

Therefore this project helped to learn, understand and comprehend the various aspects of computer science topics practically. The last four months' worth of knowledge and data helped in analysing the project. Therefore thorough knowledge about the aforementioned topics will provide for a concrete foundation for the aspirations of managing engineering in the near future.

## ACRONYMS

JRE-Java Runtime Environment.

IDE-Integrated Development Environment.

www-World Wide Web