# Statistics

## Scatter Plot

A scatter plot shows the relationship between two variables in a data set. Each observation is plotted as a point on two axes: the variable hypothesised to be independent goes on the x axis, and the variable believed to depend on it goes on the y axis. For example, to show that a person’s university exam results had an impact on their future salary, one would plot the exam results on the x axis and future salary on the y axis. This method has been used to argue that obtaining a 2:1 degree rather than a 2:2 degree leads a person to earn an additional £400,000 during their working life: around £10,000 extra each year.
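As a rough illustration of the axis convention described above, the following sketch draws a crude text-based scatter plot using only the Python standard library. The function name `ascii_scatter`, the grid size and the data are all assumptions made up for this example; the figures are not taken from any study.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Render (x, y) pairs as a text grid: x runs left to right, y bottom to top."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when every value on an axis is identical,
    # which avoids dividing by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top line, so flip the y direction.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical data, invented for illustration only:
# independent variable (x axis): exam mark in percent
# dependent variable (y axis): later salary in thousands of pounds
exam_marks = [52, 55, 58, 61, 63, 66, 68, 71, 74, 78]
salaries = [24, 26, 25, 29, 31, 30, 34, 36, 35, 40]

plot = ascii_scatter(exam_marks, salaries)
print(plot)
```

In practice a plotting library would normally be used instead; the point here is only that each observation becomes one point, positioned by its x value and its y value.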

Scatter plots are useful for examining relationships in data sets with a large number of data points. They can be used to estimate visually, or calculate statistically, whether a relationship exists between the two variables, how strong it is, whether it is positive or negative, and whether there are any outlying data points. If there is a relationship between the two variables, it appears on a scatter plot as the data points clustering along a line, which could be straight, curved or disjointed. Indeed, one of the main uses of a scatter plot is determining what type of relationship seems to exist. This relationship can then be tested by statistical means.
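One common way to put a number on the strength and direction of a straight-line relationship is Pearson's correlation coefficient, together with a least-squares line of best fit. The sketch below is a minimal standard-library version; the function names and the sample data are assumptions chosen for illustration, and the approach only captures linear relationships, not curved or disjointed ones.

```python
import math


def _sums_of_squares(xs, ys):
    """Return the means of x and y plus the sums Sxx, Syy and Sxy."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Correlation in [-1, 1]: the sign gives direction, the magnitude gives strength."""
    _, _, sxx, syy, sxy = _sums_of_squares(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimising squared vertical distances."""
    mean_x, mean_y, sxx, _, sxy = _sums_of_squares(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data, invented for illustration only.
exam_marks = [52, 55, 58, 61, 63, 66, 68, 71, 74, 78]
salaries = [24, 26, 25, 29, 31, 30, 34, 36, 35, 40]

r = pearson_r(exam_marks, salaries)
slope, intercept = least_squares_line(exam_marks, salaries)
print(f"r = {r:.2f}, fitted line: y = {slope:.2f}x + {intercept:.2f}")
```

A value of `r` near +1 or -1 suggests points lying close to a rising or falling straight line, while a value near 0 suggests no linear relationship. Points that sit far from the fitted line are candidates for outliers.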

It is important to note that, even if a scatter plot shows a strong correlation between two variables, this does not necessarily mean that one has caused the other. For example, in the study above, both future salary and final degree class could be driven by the amount of money a student and their family spend on private tuition and educational support services. In addition, particularly for smaller data sets, what appears to be a correlation could simply be the result of chance.
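Both cautions can be illustrated with a small simulation. The sketch below uses made-up random data, not real observations, and every name and parameter in it is an assumption chosen for the example. The first part creates two variables that are each driven by a hidden third factor and never influence each other. The second part counts how often two completely unrelated variables show a seemingly strong correlation, for a small and a larger sample size.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# 1. Confounding: a hidden factor drives both x and y.
#    x is never used to compute y, yet the two end up correlated.
hidden = [rng.gauss(0, 1) for _ in range(500)]
x = [z + rng.gauss(0, 0.5) for z in hidden]
y = [z + rng.gauss(0, 0.5) for z in hidden]
r_confounded = pearson_r(x, y)


# 2. Chance: draw two independent series many times and count how often
#    the correlation looks strong anyway.
def share_of_strong_correlations(sample_size, trials=2000, threshold=0.8):
    strong = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(sample_size)]
        b = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson_r(a, b)) > threshold:
            strong += 1
    return strong / trials


share_small = share_of_strong_correlations(sample_size=5)
share_large = share_of_strong_correlations(sample_size=100)

print(f"correlation caused only by a hidden factor: r = {r_confounded:.2f}")
print(f"share of |r| > 0.8 with 5 points:   {share_small:.3f}")
print(f"share of |r| > 0.8 with 100 points: {share_large:.3f}")
```

With only five points per sample, a non-trivial share of unrelated pairs should clear the threshold purely by chance, whereas with 100 points this should be extremely rare. The threshold of 0.8 and the sample sizes are arbitrary choices for the demonstration.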