# The Importance Of Graphs And Algorithms Computer Science Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The study of graphs and algorithms is very crucial in computer science. In the last few decades the use of algorithms has solved many difficult problems in a very easy manner. Therefore in many universities and colleges a lot of emphasis is made on the study of graphs and algorithms. A lot of research work is being done on this phase of computer science field.

In the last four decades graph and geometric problems have been studied by computer science researchers using the framework of analysis of algorithms. Graph theory is the study of the properties of graphs. Graph algorithms are one of the oldest classes of algorithms and they have been studied for almost 300 years. Graphs provide essential models for many applications areas of computer science and mathematics. There have been a number of exciting recent developments in graph theory that are important for designers of algorithms to know about. Correspondingly the algorithmic viewpoint of computer science has stimulated much research in graph theory. Graph theory and graph algorithms are inseparably intertwined subjects. On the other hand the main impetus for the development of geometric algorithms came from the progress in computer graphics, computer aided design and manufacturing. In many application domains- computer graphics, geographic information systems, robotics and others, geometric algorithms play a crucial role [10].

Graphs are a fundamental part in research of complex systems and the use of graph theoretic algorithms to extract useful information from a problem is a primary method of analysis for complex systems. A graph representation is used to provide a particular perspective on a problem and a graph theoretic representation is useful for a particular problem depends on whether the interesting features of the problem are likely to be highlighted by the kind of idea that a graph-based representation may provide. Generally graph the representations of a problem revolve around highlighting aspects of a problem that are either structural (like the connectivity of components or processes in the problem) or that depend on structural properties (like structure dependant dynamics). The most common types of problem in which structural aspects play a vital role are problems involving multiple interacting agents. In complex systems disciplines[2].

So my project is designing and implementing breadth first search on implicit graph for web mining application.

My web application will be educational application for BITS-Pilani, Dubai having 3 areas of interest:

Special Interest Group (SIG)

Conference Alerts (CA)

Higher Education (HE)

## Objectives

The present work is intended to meet the following objectives

To survey the functions of Web Structure Mining Algorithms

To design BFS implicit graph and algorithm for web mining application

To identify the Academic Search related functions where Web Structure Mining and Breadth First Search can be applied effectively

## Motivation

The net structure of the World Wide Web (www) is constantly changing due to the addition and removal of web pages (i.e. nodes) or the increase or decrease in the no. of incoming and outgoing links (i.e. edges) to or from a page. It is very essential to maintain the web structure in an organized way and implementing using BFS under Graph Structure Analysis.

## Problem Description

## BFS

## BFS

## BFS

## Methodology

Phases 1: Literature Survey. (referring to journals, catalogues, websites etc)

Phases 2: Data Collection, Learning of Necessary Software/Hardware Tools and Installation & Configuration Steps, Problem Analysis.

Phase 3: Problem Analysis cont. & Design and Generation of Real Data, Coding.

Phase 4: Coding Contd. & Individual Modules Testing & Debugging

Phase 5: Integration Testing & Debugging, Implementation.

Phase 6&7: Final Touches / Fine Tuning, Summary of Findings, Recommendations, Conclusion and Report writing.

## Phase 1

Sept6th-Sept15th

In-depth Research on Graph Structure Mining and breadth first search

## Phase 2

Sept16th - Oct1st

Acquisition and installation of the intended platform.

## Phase 3

Oct 2nd- Oct15th

Learning the coding language of platform

## Phase 4

Oct 16th - Nov1st

Submission of the mid-semester report, Design and generation of real data, coding

## Phase 5

Nov2nd-Nov15th

Coding.

## Phase 6

Nov 16th- Dec 1st

Implementation and Testing.

## Phase 7

Dec1st- Dec 15th

Fine tuning of the report, conclusion and submission of the final report.

## 1.5.1 Plan of Work

Fig 1.5.1 Plan of Work

## Summary

In this chapter I discussed about graphs and its use in web mining applications. My project deals with design and implementation of breadth first search on implicit graph and java coding for web mining application i.e. an educational website. I also discussed about the problem description, methodology and plan of work.

## CHAPTER 2

## LITERATURE SURVEY

## 2.1 Graphs, BFS and Web Mining

In describing the running time of a graph algorithm on a given graph G = (V, E) we usually measure the size of the input in terms of the number of vertices |V| and the number of edges |E| of the graph. We adopt a common notational convention for these parameters. Inside asymptotic notation (such as O notation or Î˜ notation) and the symbol V denotes |V| and the symbol E denotes |E|.e.g. "the algorithm runs in time O (V E)," means that the algorithm runs in time O (|V||E|) [3].

A graph is a way of representing connections between points or places. Mathematically a graph is a collection of nodes and edges. Nodes are locations that are connected together by the edges of the graph. For example, if we had two small towns connected by a two-way road, we could represent this as a graph with two nodes, each node representing a town, and one edge, the road, connecting the two towns together. In addition to the undirected graph in which the edge is a two-way connection there are directed graph in which edges connect only one way.

Fig 2.1.1 a sample graph with nodes A, B, C, D, E, F

In Fig 2.1.1 A, B, C, D, E, F are called nodes and the connecting lines between these nodes are called edges. The edges can be directed edges shown by arrows they can also be weighted edges in which some numbers are assigned to them. Therefore a graph can be a directed or undirected and weighted/un-weighted graph. So we will discuss about undirected and un-weighted graphs.

There are two standard ways to represent a graph G = (V, E) as a collection of adjacency lists or as an adjacency matrix [4]. Either way is applicable to both directed and undirected graphs. The adjacency-list representation is usually preferred because it provides a compact way to represent sparse graphs-those for which |E| is much less than |V|2. Most of the graph algorithms presented in this book assumes that an input graph is represented in adjacency list form. An adjacency matrix representation may be preferred when the graph is dense |E| is close to |V|2 or when we need to be able to tell quickly if there is an edge connecting two given vertices [4].

The adjacency list representation of a graph G = (V, E) consists of an array Adj of |V| lists, one for each vertex in V. For each u Â the adjacency list Adj[u] contains all the vertices v such that there is an edge (u, v) Â E. That is Adj[u] consists of all the vertices adjacent to u in G. (Alternatively it may contain pointers to these vertices). The vertices in each adjacency list are typically stored in an arbitrary order [3]. Figure 2.1.2(b) is an adjacency list representation of the undirected graph in Figure 2.1.2(a). Similarly Figure 2.1.3(b) is an adjacency-list representation of the directed graph in Figure 2.1.3(a).

Fig 2.1.2 (a) Fig 2.1.2 (b) Fig 2.1.2 (c)

Figure 2.1.2: Two representations of an undirected graph. (a) An undirected graph G having five vertices and seven edges. (b) An adjacency-list representation of G. (c) The adjacency matrix representation of G.

Fig 2.1.3 (a) Fig 2.1.3 (b) Fig 2.1.3 (c)

Figure 2.1.3: Two representations of a directed graph. (a) A directed graph G having six vertices and eight edges. (b) An adjacency list representation of G. (c) The adjacency matrix representation of G.

If G is a directed graph the sum of the lengths of all the adjacency lists is |E| since an edge of the form (u, v) is represented by having v appear in Adj [u]. If G is an undirected graph the sum of the lengths of all the adjacency lists is 2 |E|. If (u, v) is an undirected edge, then u appears in v's adjacency list and vice versa. For both directed and undirected graphs the adjacency list representation has the desirable property that the amount of memory it requires is Î˜ (V + E) [5].

Adjacency lists can readily be adapted to represent weighted graphs i.e. graphs for which each edge has an associated weight represented by a weight function w: E â†’ R. For example, let G = (V, E) be a weighted graph with weight function w. The weight w (u, v) of the edge (u, v) Â E is simply stored with vertex v in u's adjacency list. The adjacency list representation is quite robust in that it can be modified to support many other graphs. A disadvantage of the adjacency list representation is that there is no quicker way to determine if a given edge (u, v) is present in the graph than to search for v in the adjacency list Adj[u]. This disadvantage can be overcome by an adjacency-matrix representation of the graph at the cost of using more memory [3].

For the adjacency matrix representation of a graph G = (V, E), we assume that the vertices are numbered 1, 2.. |V|. Then the adjacency matrix representation of a graph G consists of a |V| Ã- |V| matrix A = (aij) such that

aij= 1 if (i, j) E

aij= 0 otherwise

Figures 2.1.2(c) and 2.1.3(c) are the adjacency matrices of the undirected and directed graphs in Figures 2.1.2(a) and 2.1.3(a) respectively. The adjacency matrix of a graph requires Î˜ (V2) memory (running time), independent of the number of edges in the graph.

Like the adjacency list representation of a graph the adjacency matrix representation can be used for weighted graphs. For example if G = (V, E) is a weighted graph with edge weight function w, the weight w (u, v) of the edge E (u,v) is simply stored as the entry in row u and column v of the adjacency matrix. If an edge does not exist a NIL value can be stored as its corresponding matrix entry. Though for many problems it is convenient to use a value such as 0 or âˆž [6].

## 2.1.1 Breadth First Search

In a breadth first traversal we visit the starting node and then on the first pass visit all of the nodes directly connected to it. In the second pass we visit nodes that are two edges "away" from the starting node. With each new pass we visit nodes that are one more edge away. Because there can be cycles in the graph it is possible for a node to be on two paths of different lengths from the starting node. When we visit that node for the first time along the shortest path from the starting node that is not considered again. Therefore, either we need to keep a list of the nodes we have visited or we will need to use a variable in the node to mark it as visited node to prevent multiple visits [6].

Breadth first search is one of the simplest algorithms for searching a graph and the architecture for many important graph algorithms. Prim's minimum spanning tree algorithm and Dijkstra's single source shortest paths algorithm uses the concept of breadth first search. Given a graph G = (V, E) and a distinguished source vertex "s" breadth first search systematically explores the edges of G to 'discover' every vertex that is reachable from s. It computes the distance (smallest number of edges) from s to each reachable vertex. It also produces a "breadth first tree" with root's' that contains all reachable vertices [5]. For any vertex v reachable from s the path in the breadth first tree from s to v corresponds to the "shortest path" from s to v in G i.e. a path containing the smallest number of edges. The algorithm works on both directed and undirected graphs.

Breadth first search is so named because it expands the frontier between discovered and undiscovered vertices uniformly across the breadth of the frontier of an algorithm. The algorithm discovers all vertices at distance "k" from "s" before discovering any vertices at distance k + 1. To keep track of progress breadth first search colors each vertex white, gray, or black. All vertices start out white and later may become gray and then black. A vertex is discovered the first time it is encountered during the search, at which time it becomes non-white. Therefore Gray and black vertices have been discovered, but breadth first search distinguishes between them to ensure that the search proceeds in a breadth first manner. If (u, v) Â E and vertex u is black, then vertex v is either gray or black i.e. all vertices adjacent to black vertices have been discovered. Gray vertices may have some adjacent white vertices which represent the frontier between discovered and undiscovered vertices [7].

Breadth first search constructs a breadth first tree initially containing only the root, which is the source vertex "s". Whenever a white vertex "v" is discovered during scanning the adjacency list of the discovered vertex "u", the vertex v and the edge (u, v) are added to the tree. The vertex u is the predecessor or parent of v in the breadth-first tree. Since a vertex is discovered at most once, it has at most one parent. Ancestor and descendant relationships in the breadth-first tree are defined relative to the root s i.e. if u is on a path in the tree from the root s to vertex v then u is an ancestor of v and v is a descendant of u [3].

Using the adjacency matrix representation and assuming that the number of vertices is N, breadth first search can be written like this

## Algorithm

Void BFS (int startingVetex) {

IntQueue queue; // Start with an empty queue of integers.

for (int i = 0; i < N; i++) {

visited[i] = false; // Mark all vertices unvisited.

## }

visited [starting Vertex ] = true;

queue.enqueue (starting Vertex);

while (! queue.isEmpty ()) {

int v = queue.dequeue (); // Get next vertex from queue.

process (v); // Process the vertex.

for (int u = 0; u < N; u++) {

if (edge Exists[v][u] && visited[u] == false ) {

// we have an edge from v to an unvisited vertex;

// mark it as visited and add it to the queue.

visited[u] = true;

queue.enqueue (u);

## }

## }

## }

## }

In this algorithm each vertex that is encountered is marked as visited and placed on the queue. It is processed when it comes off the queue. If it is encountered a second time (by an alternate path from the starting vertex), it has already been marked as visited and so is not placed on the queue again. Since no vertex is placed on the queue more than once the while loop is executed max N times. So it is easy to see that the run time for this algorithm is O (N2), where N is the number of vertices. In the edge list representation the run time would be O (M) where M is the number of edges in the graph.

In this algorithm the first vertex on the queue is the starting vertex. When the starting vertex is removed from the queue all vertices that are connected to the starting vertex by a single edge are located and are placed on the queue i.e. the vertices at distance one from the starting vertex. As these vertices are removed from the queue, the vertices to which they are directly connected are added to the queue. These are the vertices at distance two from the starting vertex and they follow all the distance one vertices on the queue. As the distance two vertices are processed the distance three vertices are encountered and are added to queue following all the distance two vertices and so on. Vertices are added to the queue in order of their distance from the starting vertex i.e. they are added in breadth first order [5].

## 2.1.2 Analysis of BFS

Before discussing the various properties of breadth first search, analysis of its running time on an input graph G = (V, E) is done. The operations of enqueuing and dequeuing take O (1) time, so the total time taken by queue operations is O (V). Because the adjacency list of each vertex is scanned only when the vertex is dequeued, each adjacency list is scanned max one time. Since the sum of the lengths of all the adjacency lists is Î˜ (E), the total time spent in scanning adjacency lists is O (E). The running time for initialization is O (V) and thus the total running time of BFS is O (V + E). Thus breadth first search runs in time linear in the size of the adjacency list representation of graph G [7].

## 2.1.3 Web Mining

Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web (www). Web mining is used to understand customer behaviour, evaluate the effectiveness of a particular Web site and help quantify the success of a marketing campaign [8].

Web mining allows us to look for patterns in data through content mining, structure mining and usage mining. Content mining is used to examine data collected by search engines and Web spiders. Structure mining is used to examine data related to the structure of a particular Web site. Usage mining is used to examine data related to a particular user's browser as well as data gathered by forms the user may have submitted during Web transactions [7].

The information gathered through Web mining is evaluated (sometimes using software graphing applications) by using traditional data mining parameters such as clustering and classification, association and examination of sequential patterns [8].

## CHAPTER 3

## SYSTEM CONFIGURATION

## 3.1 Introduction

Throughout the project I will be designing and implementing the educational application i.e. BITS-Pilani, Dubai website through PHP software and will be performing Breadth First Search for this site using JAVA coding. I will also be using graphs for BFS.

## 3.2 Hardware Software Requirements

The project work is planned to be carried out using a system with the following configuration

## Hardware

Processor- Intel Core 2 Duo T5450 @ 1.67 GHz

Motherboard - Gigabyte P43-ES3G - Intel P43 Chipset +ICH10

RAM - Lenovo 2.00 GB RAM (DDR2 1066 MHz

Hard Disk - Seagate 160 GB SATA

Optical Drive - Lenovo SATA 32x DVD-writer

Network card - Onboard

Sound Card - Onboard high-definition Audio

Graphics card - NVIDIA GeForce 8400 GT-256 MB

## Software

Operating systems - Windows Vista SP 2

Office suite - Microsoft Office 2010

JCreator 4.50 Pro

JDK

Telnet

Graph Traversal Applet

## CHAPTER 4

## DESIGN

## 4.1 User Interface

## 4.1.1 JCreator

JCreator is a Java IDE created by Xinox Software. Its interface is similar to that of Microsoft's Visual Studio.

Features of JCreator are:

Custom colour schemes

Wrapping around of our existing projects

Different JDK profiles can be used

Quick code writing via project templates

Easy project viewing with the class browser

Debugging with an easy interface. No command line prompts necessary

AutomaticÂ Class pathÂ configuration

UI customization (similar to Microsoft Visual Studio)

The run-time environment can run our application as anÂ applet in a J-Unit environment or in a command-line window

JCreator's IDE does not require a Java Runtime Environment to execute which may make it faster than Java-based IDE's.

## Starting a new program

If we write a program from scratch then we can start our work in JCreator. It is always best to place each of our programs into a separate directory.Â JCreator will create the directory for us.

SelectÂ Project->New Project (fig 4.1.1 a)Â .In the dialog give a name to the project. The best choice for the name is the name of the directory that contains the files. For example, if our homework files are contained inside a directoryÂ hw1, call the projectÂ hw1.

Edit theÂ LocationÂ field so that it contains the full directory path of your program files, such asÂ c:\homework\hw1.

Fig 4.1.1 a Creating a new project

Click on theÂ OkÂ button. Our project is opened. JCreator adds a Java file with the same name as the project such asÂ Hw1.java. If we do not need that file, click on it in the project window and selectÂ Edit->DeleteÂ from the menu. To add a new class to the project, chooseÂ Project->New classÂ from the menu.

Fill in the name of the class (such asÂ Hello). A new file (such asÂ Hello.java) is added to the project. Now we can type our program into the source window. Note the code completion feature that suggests how to complete partially typed expressions.

## Compiling a program

We will usually have multiple projects open in JCreator. When we are ready to compile and run a program, make sure that the project on which we are working is the active project. SelectÂ Project->Active Project and then select the current project from the submenu.

To compile a program, selectÂ Build->Compile ProjectÂ or hit theÂ F7Â key.Â Compilation errors are displayed in a separate window (fig 4.1.1 b).Â

Fig 4.1.1 b Compliling the the project

## Running a program

When the program compiles successfully, you can run it. SelectÂ Build->Execute ProjectÂ or hit theÂ F5Â key. The program executes in a separate console window (Fig 4.1.1 c).

Fig 4.1.1 c Running the project

## 4.1.2 The Graph Traversal Applet

The Graph Traversal Applet animates Breadth First Search (BFS) and Depth First Search (DFS) algorithms and shows how graphs are traversed in each algorithm.

The applet provides a simple user interface helping users to control all the tasks needed to run the animation. For designing the user interface priority is given to allow users to control the animation and the display at all times instead of the applet controlling the user actions and the rate of animation [5].

The user interface is shown below:

Figure 4.1.2: the user interface of the graph traversal applet

## 4.2 Architecture and Data

## 4.2.1 Java File Handling

File handling is an integral part for almost all programming projects. Files provide the means by which a program stores data, accesses stored data or shares data. As a result there are very few applications that do not interact with a file in one form or another. Although no aspect of file handling is particularly difficult many classes, interfaces, and methods are involved. It is important to understand that file I/O is a subset of Java's overall I/O system. Furthermore Java's I/O system is quite large. It supports two distinct I/O class hierarchies: one for bytes and one for characters. It contains classes that enable a byte array, character array or a string to be used as source or target of I/O operations. It also provides the ability to set or obtain various attributes associated with a file itself such as its read/write status, whether the file is a directory or if it is hidden. We even obtain a list of files within a directory.

Despite is size; Java's I/O system is surprisingly easy to use. One reason for this is its well thought-out design. By structuring the I/O system around a carefully crafted set of classes this very large API is made manageable. The I/O system's consistency makes it easy to maintain or adapt code and its rich functionality provides solutions to most file handling tasks.

The core of Java's I/O system is packaged in java.io. It has been included with Java since version 1.0 and it contains the classes and interfaces that we will most often use when performing I/O operations including those that operate on files. Simply put java.io when we need to read or write files java.io is the package that we will normally turn to

Another package that includes file handling classes is java.util.zip. The classes in java.util.zip can create a compressed file or decompress a file. These classes build on the functionality provided by the I/O classes defined in java.io. Thus, they are integrated in to Java's overall I/O strategy.

## Filename handling

To write anything to a file first of all we need a file name we want to use. The file name is a simple string like this:

## String fileName = "test.txt";

If we want to write in a file which is located elsewhere we need to define the complete file name and path in your fileName variable:

## String fileName = "c:\\filedemo\\test.txt";

However if we define a path in our file name then we have to take care the path separator. On windows system the '\' is used and we need to backslash it so we need to write '\\', in Unix, Linux systems the separator is a simple slash '/'.

To make our code OS independent we can get the separator character as follows:

## String separator = File.separator;

## Open a file

To open a file for writing use the FileWriter class and create an instance from it. The file name is passed in the constructor like this:

## FileWriter writer = new FileWriter(fileName);

This code opens the file in overwrite mode. If we want to append to the file then we need to use another constructor like this:

## FileWriter writer = new FileWriter(fileName, true);

Besides this the constructor can throw an IOException so we put all of the code inside a try-catch block.

## Write to file

At this point we have a writer object and we can send real content to the file. We can do this using the write() method, which has more variant but the most commonly used requires a string as input parameter. Calling the write() method doesn't mean that it immediately writes the data into the file. The output is maybe cached so if we want to send our data immediately to the file we need to call the flush() method.

As last step we should close the file with the close() method and we are done.

## Sample

public void writeFile() {

String fileName = "c:\\test.txt";

try {

BufferedWriter writer = new BufferedWriter(new FileWriter(fileName,true));

writer.write("Test text.");

writer.newLine();

writer.write("Line2");

writer.close();

} catch (IOException e) {

e.printStackTrace();

## }

## }

## 4.2.2 Breadth-first heuristic search

Frontier search is a memory efficient approach to graph search that only stores the Open list and saves memory by not storing the Closed list. It uses techniques to prevent regeneration of previously closed nodes that are no longer in memory as the traditional trace back method of solution recovery requires all closed nodes to be in memory. It also uses a divide and conquer method of solution recovery in which each search node keeps track of an intermediate node along the best path that leads to it. When the goal node is selected for expanding the intermediate node associated with it must be on an optimal solution path. This intermediate node is used to divide the original search problem into two sub problems - 1. The problem of finding an optimal path from the start node to the intermediate node. 2. The problem of finding an optimal path from the intermediate node to the goal node. Then these sub problems are solved recursively by the same search algorithm until all the nodes along an optimal solution path for the original problem are identified [1].

Java code shows the implementation of the BFS algorithm:

public void bfs ()

## {

// BFS uses Queue data structure

Queue q=new LinkedList ();

q.add (this.rootNode);

printNode (this.rootNode);

rootNode.visited=true;

while (! q.isEmpty ())

## {

Node n= (Node) q.remove ();

Node child=null;

while ((child=getUnvisitedChildNode (n))! =null)

## {

child.visited=true;

printNode(child);

q.add (child);

## }

## }

// Clear visited property of nodes

clearNodes ();

## }

My project data will be the web application i.e. educational application for BITS-Pilani, Dubai having 3 areas of interest:

Special Interest Group (SIG)

Conference Alerts

Higher Education

With the users:

Students

Library staff

Faculty

Therefore the root of the search tree will be BITS-Pilani, Dubai, parent nodes-SIG, CA, HE and children- students, library staff and faculty. For this tree BFS will be done.

## CHAPTER 5

## IMPLEMENTATION

## 5.1 Introduction

In this chapter we will discuss about the algorithm regarding the generation of the graph by inputting the structure of the website (say www.bitsdubai.com) and then implementing breadth first search for this graph generated. The coding of the program is done with java and the platform is JC Creator.

The program is an open program i.e. it can be implemented for any web application and by any user.

## 5.2 Algorithm

## Input

The inputs for the program are the links associated with site www.bitsdubai.com these links act like nodes and edges for the graph.

For the class ProjGraph :

We need to be able to add and connect vertices

A way to parse a text file that holds the data of our graph

Perhaps to override the toString() method to check if the graph is properly built

For the class BFS :

Get the edges of a specific vertex

To check if a certain vertex is connected to some other vertex or not

## Generating Graph

Creation of the adjacency list and parsing the data file

As the graph is a undirected graph, so we can connect vertex X to vertex Y and vice versa by using a private-helper method.

If vertex X already exists in the adjacency list, get its edge list and add vertex Y to it.

If it doesn't exist create a new Array List and add vertex Y to it and put it all in the adjacency List.

The function "private void connect()" returns true iff vertex X points to vertex Y.

The function "return edges.contain()" returns all the edges of a certain vertex or throws an exception if the vertex does not exist in this graph ("throw new RuntimeException()").

The function "private void parseDataFile(String filename)" reads a text file with the graph-data.

Then the string representation of the graph is created.

The main function of the program takes the input from the data file using file handling and generates the graph.

## Performing BFS

A stack is created to store all the vertices of our path.

First the 'end' vertex is pushed on the stack.

Then a loop is put from the highest level back to the first level and a loop through each level.

Each vertex in that level is checked whether it is connected to the vertex on the top of the stack. If a match is found that match is pushed on the stack and broken out of the loop.

Now the stack is reversed before returning the path in the correct order (from start to finish).

Here is an example ASCII drawing of backtracking from the end vertex "n" to the starting vertex "g". The arrows <- denote the path that was found.

level: 0 1 2 3 4 5 6 7 8

## ---------------------------------------------------------

g <-+ IV e I a +- III <- o <- VI <- n

+- V <-+ f +- II <-+ b | p

+- c <-+ | d |

j +- h <-+

q

r

Firstly during the BFS traversal a 'container' is being created consisting of newly discovered vertices where the first vertex (the starting point) is on index 0 i.e. (level 0), it's edges at index 1 (level 1) and so on.

Get the level from the container which was added last and then a new level is created to possible unexplored vertices in.

Loop is done through each of the vertices from the last added level and connects the neighbouring vertices to each vertex.

Then looping through each connecting vertex and if the connecting vertex has not yet been visited, only then add it to the newly created level-list and mark it as visited in our set.

We do not need to search any further if we stumble upon the 'end' vertex .In that case just 'return' from this method.

Only make the recursive call if we have found vertices which have not been explored yet.

AnÂ emptyÂ 'container'Â is created toÂ storeÂ allÂ theÂ levelsÂ fromÂ theÂ 'start' vertex in. A level is also a list with one or more vertices.

TheÂ firstÂ levelÂ onlyÂ holdsÂ theÂ 'start'Â vertex,Â whichÂ isÂ addedÂ first, thisÂ isÂ theÂ 'initial'Â list.

The 'start' vertex is also stored in asset which keeps track of all the vertices we have encountered so that we do not traverse vertices twice (or more).

Once the above 3 steps are initialized, we can call the actual BFS-Algorithm

OnceÂ theÂ BFS-AlgorithmÂ isÂ done,Â weÂ callÂ theÂ backTrack()Â method toÂ findÂ theÂ shortestÂ pathÂ betweenÂ 'end'Â andÂ 'start'Â betweenÂ theÂ exploredÂ levelsÂ ofÂ the graph.

## 5.3 Summary

Therefore the main method in the program involves getting the structure of the website for which the BFS has to be performed and then parsing the text file (java file handling) that holds the data of our graph.

Therefore the main method involves:

Creation of the graph

File handling

Getting the shortest path between 2 vertices (BFS) and print the shortest path

## CHAPTER 6

## DISCUSSION

## 6.1 Performance Analysis of BFS

## Depth

## Nodes

## Time

## Memory

## 0

## 2

## 4

## 6

## 8

## 10

## 12

## 14

## 1

## 111

## 11,11

## 106

## 108

## 1010

## 1012

## 1014

## 1 millisecond

## seconds

## 11 seconds

## 18 minutes

## 31 hours

## 128 days

## 35 years

## 3500 years

## 100 B

## 11 B

## 1 MB

## 111 MB

## 11 GB

## 1 TB

## 111 TB

## 11,111 TB

BFS depth requirements on time and memory

## 6.2 Applications of BFS

Breadth first search can be used to solve many problems in graph theory, for example:

Finding all nodes within one connected component

Copying Collection 'Cheney's algorithm'

Finding the shortest path between two nodes u and v

Testing a graph for bipartiteness

(Reverse) Cuthill-McKee mesh numbering

Ford-Fulkerson method for computing the maximum flow in a flow network

Serialization or Deserialization of a binary tree vs. serialization in sorted order allows the tree to be re-constructed in an efficient manner.

## 6.3 Advantages of BFS

BFS is an exhaustive search algorithm. It is simple to implement and it can be applied to any search problem. Comparing BFS to depth first search algorithm BFS does not suffer from any potential infinite loop problem, which may cause the computer to crash. Breadth first search will never get trapped exploring the useless path forever. If there is a solution, BFS will definitely find it out. If there is more than one solution then BFS can find the minimal one that requires less number of steps.

BFS can be used to test bipartiteness by starting the search at any vertex and giving alternating labels to the vertices visited during the search i.e. give label 0 to the starting vertex, 1 to all its neighbours, 0 to those neighbours' neighbours and so on. If at any step a vertex has (visited) neighbours with the same label as itself, then the graph is not bipartite. If the search ends without such a situation occurring, then the graph is bipartite.

## 6.4 Disadvantages of BFS

One disadvantage of BFS is that it is a 'blind' search i.e. when the search space is large the search performance will be poor compared to other heuristic searches. BFS will perform well if the search space is small. It performs best if the goal state lies in upper left hand side of the tree. But it will perform relatively poorly relative to other search algorithms if the goal state lies in the bottom of the tree. BFS will always find the shortest path if the weight on the links are uniform. The main drawback of Breadth first search is its memory requirement. Since each level of the tree must be saved in order to generate the next level, and the amount of memory is proportional to the number of nodes stored, the space complexity of BFS is O(bd). As a result, BFS is severely space-bound in practice so will exhaust the memory available on typical computers in a matter of minutes. If the solution is farther away from the root, breath first search will consume lot of time.

## CHAPTER 7

## FUTURE EXTENSIONS

My Project can be implemented in future for Image Processing Problems.

The basis of my project i.e. breadth first search can be used as the breadth-first graph traversal for the processing of digital images by using efficient algorithms for eroding, dilating, skeletonizing, and distance-transforming regions. These algorithms work by traversing regions in a breadth-first manner, using a queue for storage of unprocessed pixels. They use memory efficiently. The pixels are removed from the queue as soon as their processing has been completed and they process only pixels in the region (and their neighbours), rather than requiring a complete scan of the image. The image is still represented as a pixel matrix in memory; the graph is just a convenient framework for thinking about the algorithms. All the algorithms will be based on viewing a raster image not as a pixel matrix but as a graph whose vertices represent pixels and whose edges represent neighbourhood between pixels. This suggests that pixels in the image can be traversed in a breadth-first manner, rather than with the traditional raster scan. Because pixels to be processed are stored in a queue, breadth-first traversal yields propagation patterns that grow or shrink uniformly across their boundary like wave fronts.

The algorithm will be using following:

Flood Filling (or Object labelling)

Storing Boundary Pixels

Repeated Erosions

Distance Transformation

Skeletonization

My project can also be implemented in future for creating an educational website where the user can select keywords that are present in world-wide web. Then using breadth first search to extract the nodes and their values in a web page and map them to a proper format so that it is easy for the users to search for a particular data whenever they want.

## CHAPTER 8

## LESSONS LEARNT

In my project regarding BFS I have learnt that BFS is an exhaustive search algorithm. It is simple to implement and it can be applied to any search problem.

BFS does not suffer from any potential infinite loop problem, which may cause the computer to crash. Breadth first search will never get trapped exploring the useless path forever. If there is a solution, BFS will definitely find it out.

Moreover I have learnt lot about web mining, graphs, java file handling, parsing etc. and their use in computer science.

## CHAPTER 9

## CONCLUSION

The above report concludes the 4 months of my project work on Design and Implementation of BFS on Implicit Graph for Web Mining Application. My work mainly revolved around studying and analyzing as much literature as possible on topics like running time of algorithms, graphs, breadth first search, graph traversal applet, web mining, JCreator, PHP etc. In this report I discussed about the literature, algorithm, design (data, program and interface) java coding, implementation of the algorithm of BFS, execution time of the algorithm, creation and traversal of the graph for BFS.

Therefore in this report where I got to learn, understand and comprehend the various aspects of my project topic practically. The last 4 months' worth of knowledge and data helped me in analyzing my project. I would conclude by saying that thorough knowledge about the aforementioned topics will provide for a concrete foundation for my aspirations of managing engineering in the near future.

## REFERENCES

Barnat, J.; Brim, L.; and Chaloupka, J. 2003. Parallel breadth-first search LTL model checking. In Proceedings of the 18th IEEE International Conference on Automated Software Engineering (ASE'03), 106-115

Bang-Jensen, J. & Gutin, G.: Digraphs: Theory, Algorithms and Applications. Springer-Verlag (2002)

Mehlhorn, K.: Graph Algorithms and NP-Completeness. Springer-Verlag (1984)

Reinhard Diestel Graph Theory Electronic Edition 2000 Springer-Verlag New York 1997, 2000

Introduction to Algorithms, Second Edition Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, The MIT Press Cambridge , Massachusetts London, England, McGraw-Hill Book Company Boston Burr Ridge , IL Dubuque , IA Madison , WI New York San Francisco St. Louis Montreal Toronto

Analysis of Algorithms: An Active Learning Approach, Jeffrey J. McConnell, Jones and Bartlett Publishers

Introduction to The Design and Analysis of Algorithms, second edition, Pearson International Edition, Anany Levitin

Jeyalatha Sivaramkrishnan, Vijayakumar Balakrishnan, "Web Mining Functions in an Academic Search Applications", Informatica EconomicÄƒ vol. 13, no. 3/2009.

Java Breadth First Search problem, bytes.com

http://bytes.com/topic/java/answers/626759-java-breadth-first-search-problem

## "Introduction to Graph with Breadth First Search (BFS) and Depth First Search (DFS) Traversal Implemented in JAVA"

http://www.codeproject.com/KB/java/BFSDFS.aspx