Data Structures And Algorithms An Overview Computer Science Essay


Algorithms consist of a set of rules to carry out a calculation either by hand or by machine. An algorithm can also be defined as an abstraction consisting of a program executed on a machine (Drozdek 2004). The program follows a sequence of operations carried out on data organized in data structures. Data structures are broadly categorized into:

Linear data structures, examples of which include arrays, matrices, hashed array trees and linked lists, among others; and non-linear data structures such as trees and graphs, which are covered below.

Graphs

This is an abstract data structure which implements the graph concept from mathematics. A graph consists of arcs or edges (x, y) connecting nodes or vertices. The vertices may or may not be part of the graph structure. The edges may carry a value or numeric attribute such as cost, length or capacity. Some of the operations on a graph structure G include:

adjacent(G, x, y): an operation testing whether an edge exists between x and y.

set_node_value(G, x, a): an operation setting the value associated with node x to a.

add(G, x, y): an operation that adds an arc from x to y to the graph if it is not already present.
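As a concrete illustration, the operations above might be sketched in Python over an adjacency-set representation; the class and method names simply mirror the operations listed and are illustrative, not a standard API:

```python
class Graph:
    def __init__(self):
        self.adj = {}     # node -> set of neighbouring nodes
        self.value = {}   # node -> value associated with the node

    def adjacent(self, x, y):
        """Test whether an edge exists between x and y."""
        return y in self.adj.get(x, set())

    def set_node_value(self, x, a):
        """Set the value associated with node x to a."""
        self.value[x] = a

    def add(self, x, y):
        """Add an arc from x to y if it does not already exist."""
        self.adj.setdefault(x, set()).add(y)
        self.adj.setdefault(y, set())  # ensure y is known to the graph
```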

Graph algorithms are implemented within computer science to traverse a graph (depth first or breadth first search) or to find the shortest path between two nodes (Sedgewick 2001). Dijkstra's algorithm computes shortest paths from a single source, while the Floyd-Warshall algorithm finds the shortest paths between all pairs of nodes.

Linked lists

These are linear data structures consisting of a sequence of records, each linked to the next by a reference stored within the record. Linked lists provide implementations for stacks, queues, skip lists and hash tables. Linked lists are preferred over arrays when the logical order of the list may differ from the order in which it is stored in memory, since they allow the removal or insertion of nodes at any point. Each record of a linked list is a node which contains the address of the next node, called the pointer or next link. The remaining fields are known as the payload, cargo, data or information. The first node of the list is the head and the last node is the tail. A linked list may be circularly linked, where the last node references the first node of the same list, or linear, where the link field of the tail is open.
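A minimal Python sketch of the structure just described, with the payload and next-link fields named as in the text (the names are illustrative):

```python
class Node:
    def __init__(self, payload, next=None):
        self.payload = payload  # the data/cargo field
        self.next = next        # pointer (next link) to the next node

def insert_after(node, payload):
    """Insert a new node after the given node, in O(1) time."""
    node.next = Node(payload, node.next)

head = Node(1, Node(2, Node(3)))   # head -> 1 -> 2 -> 3
insert_after(head, 9)              # list is now 1 -> 9 -> 2 -> 3
```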

B-Tree

This is a tree data structure that keeps data sorted, allowing searches, deletions, insertions and sequential access. Unlike the self-balancing binary search tree, operations on the B-Tree are optimized for systems dealing with large blocks of data. The B-Tree has several design variants. In general, the B-Tree stores keys in its internal nodes and does not normally keep records at the leaves. The common variations are the B+-Tree and the B*-Tree (Comer 129).

The searching process in the B-Tree is similar to that of the binary search tree. It commences at the root and a traversal is executed from top to bottom. The search follows the child pointer whose range contains the search value. Insertion starts at a leaf node: if the node contains fewer than the legally acceptable maximum of elements, the new element is simply added; otherwise the node is evenly split into two nodes. A median is chosen to determine left or right placement, with values greater than the median going to the right node; the median acts as the separation value. The deletion process follows one of two popular strategies: either the element is located and deleted and the tree is then restructured, or a scan is performed and the tree is restructured after the candidate node to be deleted has been identified.
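The top-down search just described can be sketched in Python. Here a node is assumed to be a pair (keys, children) with the keys sorted and children[i] covering the key range between keys[i-1] and keys[i]; this layout is an illustrative assumption, not a full B-Tree implementation:

```python
import bisect

def btree_search(node, key):
    """Top-down search: check this node's sorted keys, then descend."""
    keys, children = node
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return True                        # key found in this node
    if not children:
        return False                       # reached a leaf: not present
    return btree_search(children[i], key)  # follow the matching child pointer
```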


Hash tables

This is a data structure employing a hash function to map keys to values. The function transforms a key into an index of an array; ideally it maps every possible key to its own unique slot index. With a well dimensioned hash table, every lookup is independent of the number of elements stored in the array. Due to their efficiency, hash tables are widely used for database indexing, set and cache implementation, and associative arrays. A simple array sits at the heart of the hash table algorithm: the algorithm derives an index from the element's key and uses that index to store the element in the array. The hash function f is the implementation of this calculation. Hash tables are used to implement various types of in-memory tables, while keys are used for disk-based database indices and persistent data structures.
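A toy separate-chaining hash table in Python may make the idea concrete; the class name and fixed sizing are illustrative assumptions, and collisions are handled by chaining rather than by a perfect one-key-per-slot mapping:

```python
class HashTable:
    def __init__(self, size=8):
        # the simple array at the heart of the structure; each slot
        # holds a small bucket of (key, value) pairs
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)  # the hash function f

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)    # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```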

Greedy Algorithms.

These algorithms work by making the most promising decision at each step, without taking into consideration what the outcome of that choice will be. They are considered straightforward, simple and short-sighted (Chartrand 1984). The advantage of greedy algorithms is that they are easy to invent and implement and, when they work, prove efficient.

Their disadvantage is that they are not able to solve all problems optimally because of their greedy approach. Greedy algorithms are applied in trying to solve optimization problems.

A typical implementation of these algorithms is the 'making change' problem, whereby we are required to give change using the minimum number of notes or coins. We commence by giving out the largest denomination first.

Informally the greedy algorithm for this problem would follow the steps below:

Begin with nothing.

At every stage, without exceeding the amount owed,

add the largest available coin to the set.

A formal algorithm for the making change problem can be written as follows:


C ← {100, 25, 10, 5, 1}    // C is a constant set of coin denominations
Sol ← { }                  // the solution set
Sum ← 0                    // the sum of the items in Sol
WHILE Sum ≠ n DO
    L ← the largest item of C such that Sum + L <= n
    IF there is no such item THEN
        RETURN "no solution found"
    Sol ← Sol ∪ {a coin of value L}
    Sum ← Sum + L
RETURN Sol
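The pseudocode above can be turned into a short runnable Python sketch; the default denominations follow the constant set C used above:

```python
def make_change(n, denominations=(100, 25, 10, 5, 1)):
    """Greedy change making: repeatedly take the largest coin that fits."""
    sol, total = [], 0
    while total != n:
        # the largest coin L such that total + L <= n
        candidates = [c for c in denominations if total + c <= n]
        if not candidates:
            return None       # no such item: the greedy approach fails here
        coin = max(candidates)
        sol.append(coin)
        total += coin
    return sol
```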


One greedy approach to optimization problems maintains two sets, one of chosen items and one of rejected items, and carries out four functions on them. The first function checks whether the chosen set of items provides a solution; the second checks the feasibility of the set; the selection function identifies the most promising candidate; and the objective function gives the value of a solution.

The greedy approach also applies to the shortest path problem. Dijkstra's algorithm aims at determining the length of the shortest path from the source vertex S to each of the other nodes.

Typically Dijkstra's algorithm maintains two sets of nodes, S and C. S consists of the already selected nodes, whereas C consists of the remaining nodes of the graph (Papadimitriou & Steiglitz 1998). At initialization S contains only the source; after execution S contains all the nodes of the graph. At each step the node in C that is closest to the source is chosen and moved into S. If at some step no remaining node is reachable from S, the graph is disconnected and those nodes have no shortest path from the source.

The diagrams below illustrate the operation of Dijkstra's algorithm.

Consider the graph G = (V, E). Every node of the graph starts with an infinite cost except the source node, whose cost is 0 (Design and Analysis of Computer Algorithms 2010).

Source: Design and Analysis of Computer Algorithms 2010

Initialize d[S] to zero and choose the node closest to S; add it to S and relax all nodes adjacent to S.

Update all the nodes. The diagram below illustrates this process:


Choose the closest node x and relax its adjacent nodes, updating u, v and y as indicated in the diagram below.


Next we consider y as the closest node, add it to S and relax v, as indicated in the diagram below.


Consider u and adjust its neighbour v. The diagram below illustrates the continued adjustment.


Finally add v; the predecessor list now defines the shortest path from the source node S. The diagram below illustrates the resulting shortest path.

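The walkthrough above can be condensed into a short Python sketch using a priority queue; the graph representation (a dict of adjacency lists) is an assumption for illustration:

```python
import heapq

def dijkstra(graph, source):
    """graph: node -> list of (neighbour, edge weight) pairs.
    Returns the shortest-path length from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                       # stale queue entry, skip it
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd       # relax the edge
                heapq.heappush(heap, (nd, neighbour))
    return dist
```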

Spanning trees

Typically graphs have multiple paths between nodes. A spanning tree of a graph contains all the nodes with exactly one path between any two of them. A graph may have many different spanning trees, and the spanning trees of a disconnected graph form a spanning forest. A breadth first spanning tree results when a breadth first search is carried out on the graph, and a depth first spanning tree results from a depth first search. Applications of spanning trees include the travelling salesman problem, illustrated below:

Problem: Consider a complete undirected graph G = (V, E). A non-negative integer cost, representing a distance, is associated with each edge. We wish to derive a tour of the graph G with minimum cost. The salesman starts from city 1, visits the six cities (1 - 6) and returns to city 1.

The first approach runs from city 1 to 4 to 2 to 5 to 6 to 3 and back to 1, resulting in a total of 62 kilometres. The diagram below illustrates this approach.

Adding the edge weights we have 15+10+8+15+9+5 = 62


The alternative and optimal approach runs from city 1 to 2 to 5 to 4 to 6 to 3 and back to 1, resulting in a total of 48 kilometres. The diagram below illustrates this approach.

Adding the edge weights we have 10+8+8+8+9+5= 48 Kilometers


Other applications using the spanning tree approach include airline route determination, the design of computer networks, the laying of oil pipelines to connect refineries, and the construction of road links between cities (Goodrich & Tamassia 2010; Sedgewick 2002).

In a typical minimum spanning tree (MST) application, the MST cost can be used to determine the points of connection of a cable, for example fibre optic cable being laid along a certain path. Edges with larger weights correspond to higher cost, requiring more attention and resources to lay the cable. The most appropriate result is the tree of minimum total cost derived from the graph.

Prim's Algorithm.

This algorithm proceeds from an arbitrary root node, adding a new edge to the tree at each stage. The addition process terminates after all the nodes in the graph have been reached. The algorithm concentrates on finding the shortest edge at each step, so its running time depends on how that edge is searched for. The straightforward search method identifies the smallest edge by scanning the adjacency lists of all nodes in the graph. Each such search iteration costs O(m) time, so the total cost of running the complete algorithm is O(mn).

Prim's algorithm, in its basic form, takes the following steps:

Initialize the tree to consist of a start node
WHILE not all nodes are in the tree DO
    examine all edges with exactly one end point in the tree
    find the shortest such edge and add it to the tree


After every iteration the partial spanning tree A has grown by the shortest edge found so far, while B consists of the remaining nodes. The loop searches for the shortest edge between A and B.
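A possible Python sketch of Prim's algorithm follows. It uses a priority queue of candidate edges rather than the straightforward O(mn) scan described above, which is one common refinement; the graph representation is an assumption for illustration:

```python
import heapq

def prim(graph, start):
    """graph: node -> list of (neighbour, weight) pairs, undirected.
    Returns the edges of a minimum spanning tree as (u, v, w) triples."""
    in_tree = {start}                     # the set A of nodes already in the tree
    edges = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(edges)
    tree = []
    while edges and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(edges)    # shortest edge leaving the tree
        if v in in_tree:
            continue                      # both end points already in the tree
        in_tree.add(v)
        tree.append((u, v, w))
        for nbr, nw in graph[v]:
            if nbr not in in_tree:
                heapq.heappush(edges, (nw, v, nbr))
    return tree
```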

Kruskal's Algorithm.

This is an algorithm for computing the minimum spanning tree (MST) by growing a forest of trees that eventually merge into a single minimum spanning tree. Kruskal's algorithm considers each edge in order of increasing weight. When an edge (u, v) connects two different trees, (u, v) is added to the solution's set of edges, and the two trees are joined by (u, v) into a single tree.

The algorithm can be outlined as follows:

Commence with an empty set A, selecting at each stage the shortest edge not yet chosen or discarded, regardless of its location in the graph:


A ← { }                      // the set containing the edges of the MST
for every node n in V[G]
    do MAKE_SET(n)
sort the edges of E by increasing weight w
for each edge (u, n) in E
    do if FIND_SET(u) ≠ FIND_SET(n)
        then A ← A ∪ {(u, n)}
             UNION(u, n)
return A

The algorithm above uses disjoint-set data structures. Kruskal's algorithm can also be implemented with a priority queue data structure, resulting in the algorithm shown below:


A ← { }                      // this set will contain the edges of the MST
for each node n in V[G]
    do define S(n) ← {n}
initialize the priority queue Q to contain all the edges of G, using the weights as keys
WHILE A has fewer than v - 1 edges DO
    extract from Q the cheapest edge (u, n)
    if S(u) ≠ S(n)
        then add edge (u, n) to A
             merge S(u) and S(n) into S(u) ∪ S(n)
return A
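The first, disjoint-set outline can be sketched in runnable Python as follows; the union-find helpers are simplified (path compression but no union by rank), which is an implementation shortcut:

```python
def kruskal(nodes, edges):
    """edges: list of (weight, u, v) triples.
    Returns the MST as a list of (u, v, w) triples."""
    parent = {n: n for n in nodes}        # MAKE_SET for every node

    def find(n):                          # FIND_SET with path compression
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(edges):         # edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # (u, v) connects two different trees
            parent[ru] = rv               # UNION
            tree.append((u, v, w))
    return tree
```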

The Binary Search Tree.

A binary search tree is one where each internal node X stores an element; the elements in the left subtree of X are less than or equal to X, while those in the right subtree are equal to or greater than X. This is the binary search tree property. The height of a binary search tree equals the number of links between the root node and the deepest node (Skiena 2008). The binary search tree is implemented as a linked data structure where each node is an object with three pointer fields, left, right and parent, pointing to the left child, the right child and the parent respectively. A NIL in any of these fields indicates that the corresponding parent or child does not exist; the root node is the only one with NIL in the parent field.
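A minimal Python sketch of this linked representation, with the three pointer fields named as in the text and None standing in for NIL:

```python
class BSTNode:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = self.right = None   # None plays the role of NIL
        self.parent = parent

def insert(root, key):
    """Place key in the subtree rooted at root, keeping the BST property."""
    if key <= root.key:
        if root.left:
            insert(root.left, key)
        else:
            root.left = BSTNode(key, parent=root)
    else:
        if root.right:
            insert(root.right, key)
        else:
            root.right = BSTNode(key, parent=root)

def search(node, key):
    """Walk down from node; return the matching node or None."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node
```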

Dynamic programming algorithms

These algorithms typically explore ways of determining optimal solutions based on a sequence of decisions. They avoid fully enumerating partial decision sequences that make a sub-optimal contribution to the final solution, concentrating instead on optimal contributors (Aho & Hopcroft 1983). The optimal solution is arrived at in a polynomial number of decision steps. At times it is necessary for the algorithm to consider all possibilities, although in many cases only the optimal candidates are examined.

Dynamic programming algorithms exploit the duplication among sub problems: every sub problem solution is stored, typically in a table, for later reference. The full problem is then worked out using the bottom-up technique, whose steps include the following:

Begin by solving the smallest sub problems.

Combine their solutions, increasing the scope and size of the sub problems solved,

UNTIL arriving at the solution of the original problem
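The bottom-up steps above can be sketched on the making change problem from the greedy section; unlike the greedy version, this tabulated approach is optimal for any denomination set (the function name is illustrative):

```python
def min_coins(n, denominations):
    """Bottom-up table: best[a] holds the fewest coins summing to a."""
    INF = float("inf")
    best = [0] + [INF] * n
    for amount in range(1, n + 1):        # smallest sub problems first
        for c in denominations:
            # combine: one coin c plus the stored answer for amount - c
            if c <= amount and best[amount - c] + 1 < best[amount]:
                best[amount] = best[amount - c] + 1
    return best[n]
```

With denominations (1, 3, 4) and n = 6, the greedy choice 4 + 1 + 1 uses three coins, while the table finds the optimal 3 + 3.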

Dynamic programming relies on the principle of optimality. This principle states that within an optimal sequence of decisions or choices, each subsequence must also be optimal. Exploiting this optimality in a program can nevertheless be an elusive undertaking because of the difficulty of identifying the subsequences of interest.

Floyd-Warshall Algorithm.

Also known as the WFI algorithm, this is a graph analysis algorithm used to determine the shortest paths in a weighted graph (Chartrand 1984). A comparison is carried out over all possible paths between nodes of the graph. Consider a graph G with nodes V numbered 1 to N. Let sPath(i, j, k) be the function that returns the shortest path between i and j using only the nodes 1 to k as intermediate points. We then have the recursive formula shown below:

sPath(i, j, k) = min{ sPath(i, j, k-1), sPath(i, k, k-1) + sPath(k, j, k-1) }

sPath(i, j, 0) = edgeCost(i, j)

This recurrence forms the heart of the WFI algorithm: the shortest paths sPath(i, j, k) are computed for all (i, j) pairs first for k = 1, then for k = 2, and so on up to k = n.

The Floyd-Warshall algorithm thus iteratively determines the path lengths between all node pairs (i, j) with i ≠ j.

The initial path lengths are taken from the edge costs, and the algorithm then refines the path lengths between the nodes. In order to reconstruct the path between two end-point nodes, information about which node acts as a transit point to a given node is stored in a matrix for path reconstruction purposes. The Floyd-Warshall algorithm is used to solve, among others:

Paths for Maximum bandwidth in flow networks

Shortest paths for directed graphs

Optimal routing
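A compact Python sketch of the recurrence above on an n x n cost matrix, where a missing edge is marked with infinity (the matrix representation is an assumption for illustration):

```python
def floyd_warshall(cost):
    """cost: n x n matrix of edge costs, float('inf') where no edge exists.
    Returns the matrix of shortest path lengths between all node pairs."""
    n = len(cost)
    dist = [row[:] for row in cost]       # k = 0 case: the edge costs
    for k in range(n):                    # allow node k as a transit point
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```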


Data structures and their associated algorithms remain fundamental today in providing the means for data storage and manipulation (Sage 2006). Core and complex computer processing, including operating system memory management and the cache implementations of database management systems, relies heavily on data structures and their associated algorithms to execute efficiently and effectively. It is therefore necessary that system programmers carefully study and understand these data structures and their algorithms to ensure the design of efficient and effective software.