Prim's Algorithm


Prim's Algorithm is used to construct a minimum spanning tree. The aim is to traverse a weighted graph, selecting edges so that every node is connected at the least total cost (on an unconnected graph the same process yields a minimum spanning forest, one tree per component). When every node has been included, we have constructed a minimum spanning tree. An example of this could be BT laying phone lines between a set of houses. If each node represents a house, then BT will likely want to lay the cable in a fashion that makes economic sense; that is, using the least amount of cable whilst servicing all of the properties.

  • Start from any node
  • Take the cheapest edge that connects the tree to a node not yet included
  • If there are 2 edges with the same least weight, choose between them arbitrarily but choose consistently
  • Continue round the graph until all nodes have been included.
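The steps above could be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the adjacency-list shape (each node mapping to a list of (weight, neighbour) pairs) and the function name are assumptions made for illustration. A heap is used so that the cheapest edge leaving the growing tree is always taken first.

```python
import heapq

def prim_mst(graph, start):
    """Grow a minimum spanning tree outward from `start`.

    `graph` maps each node to a list of (weight, neighbour) pairs.
    Returns the chosen edges and their total weight.
    """
    visited = {start}
    # Heap of candidate edges that leave the growing tree.
    edges = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(edges)
    mst, total = [], 0
    while edges and len(visited) < len(graph):
        w, u, v = heapq.heappop(edges)  # cheapest edge out of the tree
        if v in visited:
            continue                    # would revisit a used node
        visited.add(v)
        mst.append((u, v, w))
        total += w
        for w2, nxt in graph[v]:        # new candidate edges from v
            if nxt not in visited:
                heapq.heappush(edges, (w2, v, nxt))
    return mst, total
```

In the BT example, each key would be a house and each weight a cable length between two houses.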

Kruskal's algorithm works on the same data as Prim's; however, there is a slight variation in how the problem is tackled. Instead of growing a single tree from node to node, Kruskal's algorithm does not need to extend from an edge already in use. It finds the minimum spanning tree simply by repeatedly taking the shortest remaining edge, even if that edge joins two nodes that are not yet connected to anything.
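That idea could be sketched like so. Again this is an illustrative sketch, with assumed names and an assumed edge-list input; a union-find structure tracks which sub-trees are already joined so that cycles are rejected.

```python
def kruskal_mst(nodes, edges):
    """Pick the globally cheapest edges that create no cycle.

    `edges` is a list of (weight, u, v) tuples. Returns the chosen
    edges and their total weight.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root of x's sub-tree,
        # compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):       # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                    # different sub-trees: no cycle
            parent[ru] = rv             # merge the two sub-trees
            mst.append((u, v, w))
            total += w
    return mst, total
```

Note that, unlike the Prim sketch, an edge may be accepted between two nodes neither of which touches any previously chosen edge.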

Differences between Kruskal and Prim

For smallish data sizes the algorithms perform very similarly. The difference only really becomes noticeable as the graph grows. As graph sizes increase, the physical properties of the graph start to matter: a graph with many edges relative to its number of nodes is said to be dense, while one with comparatively few edges is said to be sparse. This is where the algorithms' performance diverges.

Kruskal's and Prim's algorithms are said to be greedy; essentially, they take no long-term view of what is going on, but sequentially find the next cheapest edge and use it. The slight variation lies in how this stage is implemented, as briefly described above: Prim's only adds the next least expensive edge that can connect to the existing, growing sub-tree, whereas Kruskal's picks the next least-cost edge overall, as long as it creates no cycle, so that eventually all sub-trees connect.

Kruskal's algorithm will work much more efficiently on sparse graphs, where it runs in O(N log N) time. A graph with many more edges than vertices (a dense graph) will be computed more efficiently by Prim's algorithm, which will outperform Kruskal's in that case.

  • dense graph: Prim = O(N^2), Kruskal = O(E log E) = O(c·N^2 log(c·N^2)) = O(N^2 log N)
  • sparse graph: Prim = O(N^2), Kruskal = O(E log E) = O(c·N log(c·N)) = O(N log N)

When looking deeper into the time complexities of each algorithm, it is apparent that Prim's algorithm (in its simple array-based form) works at O(N^2) for both sparse and dense graphs. The change comes from Kruskal's: Kruskal's algorithm is affected by the number of edges, E, whereas Prim's is affected by the number of nodes, N, which is how the O(N^2) is derived. As the graph size increases, so does the computational time; for example, if the value of N doubles, the O(N^2) work will take roughly 4 times longer.

For dense graphs with Kruskal, E is proportional to N^2, and after simplifying we are left with a time complexity of O(N^2 log N), c being a constant factor much less than N that can be, and is, ignored. Ultimately this is a less efficient computation than that of the Prim algorithm. To be exact, the edge count behind the N^2 would be N(N-1)/2.
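The N(N-1)/2 figure can be checked directly: a complete graph has one edge per unordered pair of nodes, which is a small sketch in Python (the variable names are illustrative only).

```python
from itertools import combinations

# A complete (fully dense) graph on N nodes has one edge per
# unordered pair of nodes, i.e. N*(N-1)/2 edges. The factor of
# 1/2 is exactly the kind of constant Big-O discards.
N = 6
edge_count = len(list(combinations(range(N), 2)))
assert edge_count == N * (N - 1) // 2  # 15 edges for 6 nodes
```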

The same logic applies to sparse graphs; however, E is no longer proportional to N^2 but more closely to N itself, because as N increases the constant factor becomes insignificant beside it. After dropping the constant factor we are left with a time complexity of O(N log N), which is a better performance than the O(N^2) of the Prim algorithm.

Binary chop search and Linear Search

Binary and linear algorithms are used for searching data sets. Both are efficient ways of searching, but there are particular circumstances in which one is better than the other.

Binary chop search is dependent on one key factor: the data set must be sorted. It will then run in O(log n) time. Conversely, the most efficient sorting algorithms usually run in around O(n log n) time. This presents a pitfall in the sense that, for a sufficiently large data set, if we are only going to perform one search, then sorting first and then searching with the binary method may in fact be slower than a single linear search (it is important to remember that, for large enough n, the time spent sorting alone will exceed the time of a linear search). It is therefore best to use binary search when we will search the same data on more than one occasion.

Linear search has a time complexity of O(n). It has no need for the data to be sorted and will traverse through the data until the element is found. If we are doing a one-off search on a large unsorted data set, then the linear search is more efficient. For my example I shall be using binary chop search on pre-sorted data to illustrate the point.

Consider a collection of the numbers 1 to 10,000 in ascending order. If these numbers were in a book and we wanted to find where the number 9,999 was, we would essentially open the book somewhere around the middle and work onward, eliminating everything that came before, as we would know that 9,999 was towards the end of the book. This is, in basic terms, how a binary chop search works. By taking a pivot (a number to start the search from, usually the middle element) we can determine whether it is greater or less than the target number we wish to find. In my example of 10,000 numbers, the pivot, if set to the middle of the collection, would be element 5,000. We would then determine that 9,999 was in fact greater than element 5,000, thus eliminating all values from 1 to 5,000 from the search.

This now leaves us with half the amount of data again, and by consistently halving the search space in this way the process continues until we hit the point at which we have found the element we were looking for.
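The halving process described above could be sketched as follows (an illustrative sketch; the function name and the convention of returning -1 for a missing value are assumptions):

```python
def binary_chop(sorted_values, target):
    """Repeatedly halve the search space of a sorted list.

    Returns the index of `target`, or -1 if it is absent.
    """
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2           # the pivot: the middle element
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            lo = mid + 1               # target is higher: drop the lower half
        else:
            hi = mid - 1               # target is lower: drop the upper half
    return -1
```

On the 10,000-number example, finding 9,999 takes at most about 14 halvings (2^14 > 10,000) rather than thousands of individual checks.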

A linear search starts at the first element in the list and then traverses every element in turn; if the number we were searching for was close to the last element in the list, the search would be at its least efficient, having to check every element until it reached its destination.
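For comparison, the linear search is the simpler of the two sketches (again the name and the -1 convention are illustrative assumptions); note that it works on unsorted data, which binary chop does not.

```python
def linear_search(values, target):
    """Check each element in turn; no sorting required."""
    for i, v in enumerate(values):
        if v == target:
            return i        # found: report the position
    return -1               # exhausted the list without a match
```

Searching for 9,999 in the 1-to-10,000 collection this way performs 9,999 comparisons, against roughly 14 for the binary chop.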

Issues such as picking a poor pivot can lead to less efficient individual binary searches; however, it should still be faster than searching linearly.



It is easy to lose sight of the fact that there is more to consider about an algorithm than how fast it runs. The Big-O can also be used to describe other behaviour, such as memory consumption. We often optimise by trading memory for time, and you may need to choose a slower algorithm because it also consumes less of a resource that you need to be frugal with.

What The Big-O Is Not

Constants: The Big-O is not concerned with factors that do not change as the input increases. Let me give an example that may be surprising. Let's say we have an algorithm that needs to compare every element in an array to every other element in the array. A naive version compares every ordered pair, roughly N^2 comparisons; by comparing each pair only once (starting the inner loop one past the outer index) we perform about N^2 / 2 comparisons.

We have just cut our run time in half - YAY! Guess what: the Big-O has stayed the same, O(N^2). This is because N^2 / 2 has only one variable part; the division by 2 remains the same (constant) regardless of the input size. There are valid mathematical reasons for doing this, but it can be frustrating to see two algorithms with the exact same Big-O where one runs twice as fast as the other.
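The halved pairwise comparison can be made concrete with a small sketch (the counting function is purely illustrative):

```python
def count_pairwise_comparisons(items):
    """Compare every element to every other element exactly once.

    Starting the inner loop at i + 1 visits each unordered pair a
    single time, so roughly N^2 / 2 comparisons are made instead of
    N^2 -- yet the Big-O of the loop is still O(N^2).
    """
    comparisons = 0
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1    # items[i] is compared with items[j]
    return comparisons
```

For 10 items this performs 45 comparisons (10·9/2), not 100; doubling the input to 20 items still roughly quadruples the count, to 190, which is exactly the O(N^2) growth the constant factor cannot change.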

Implementation Details: The Big-O is an uncaring, cold-hearted jerk. It does not care if you can't afford to buy the extra RAM needed for your problem and have to resort to paging your hash table out to disk. You are on your own. It also doesn't care that the data structure you would need to implement to achieve O(log log N) is so complex you will never be able to maintain it. In a nutshell, the Big-O lives in the land of theory and doesn't care very much about the real world.