Box plots were developed in the 1970s to display summary statistics of a data set. They are designed to show, on a single graph, the minimum and maximum values, the median, and the upper and lower quartiles. A box plot consists of a box, which surrounds the middle half of the data, with a line inside it marking the median value. In addition, a line stretches from each end of the box. These lines extend to the minimum and maximum values of the data set.
The top edge, or hinge, of the box represents the upper quartile, with 75% of the data below it, and the bottom edge represents the lower quartile, with 25% of the data below it. The distance between the two edges is the inter-quartile range. The lines which extend from the box, known as whiskers, end at the minimum and maximum values of the data. However, if there are significant outlying points, which lie some way from the quartiles, then each whisker will reach no further than 1.5 times the inter-quartile range beyond its edge of the box, and any points beyond that are plotted individually as outliers.
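The quantities described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the function name `box_plot_summary`, the sample data and the choice of the "inclusive" quartile method are all assumptions, and different statistics packages calculate quartiles in slightly different ways.

```python
import statistics


def box_plot_summary(data, k=1.5):
    """Return the values a box plot draws, assuming the 1.5 x IQR whisker rule."""
    values = sorted(data)
    # Quartile definitions vary between tools; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Points outside these limits are treated as outliers.
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Each whisker ends at the most extreme point that is not an outlier.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical sample with one value far above the rest.
summary = box_plot_summary([2, 4, 5, 7, 8, 9, 11, 12, 30])
print(summary)
```

For this sample the quartiles are 5 and 11, so the inter-quartile range is 6 and the upper limit is 11 + 1.5 x 6 = 20. The upper whisker therefore stops at 12, and the value 30 is reported as an outlier.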
It is therefore possible to judge whether the data is skewed from where the median sits within the box relative to the two quartiles. In addition, a diamond shape can be used to show the mean and the confidence interval for the mean, demonstrating how the mean and median compare. Finally, the width of the box can be made proportional to the logarithm of the size of the data set, allowing samples of different sizes to be compared.
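As a rough sketch of how these readings could be made numerically, the example below works out where the median sits within the box, the gap between the mean and the median, and a box width based on the logarithm of the sample size. The function name `describe_box_shape`, the sample data, the "inclusive" quartile method and the use of the natural logarithm are assumptions made for illustration only.

```python
import math
import statistics


def describe_box_shape(data):
    """Summarise skew and relative box width for one data set (illustrative only)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # 0.5 means the median sits in the middle of the box.
    position = (median - q1) / iqr if iqr else None
    if position is None or position == 0.5:
        skew = "none apparent"
    elif position < 0.5:
        # Median close to the lower quartile: the longer part of the box lies above it.
        skew = "right"
    else:
        skew = "left"
    return {
        "median_position": position,
        "skew": skew,
        # A positive value means the mean lies above the median.
        "mean_minus_median": statistics.mean(data) - median,
        # Assumed scaling: width proportional to the natural log of the sample size.
        "relative_width": math.log(len(data)),
    }


# Hypothetical sample with a long upper tail.
shape = describe_box_shape([1, 2, 2, 3, 4, 6, 9, 13, 20])
print(shape)
```

In this sample the median of 4 lies nearer the lower quartile of 2 than the upper quartile of 9, and the mean is above the median, both of which suggest a skew to the right.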
Advantages and limitations of box plots
Box plots make it easier to display the key statistical variables of a data set, and the relationship between them. They can also help demonstrate any symmetry and skew which may exist in the data, as well as the nature and significance of any outliers. Finally, placing multiple box plots on the same set of axes allows a quick and easy comparison of different data sets.
However, the fact that box plots explicitly focus on the outliers in a distribution can cause problems in interpretation, as generally the outliers are the points with the most uncertainty. In addition, the use of box plots can hide much of the finer details of a probability distribution, and can lead users to ignore some significant points.