UK Essays - The UK's leading provider of custom essays, free essays, dissertations and coursework...


A comprehensive statistical analysis of the relationship between two variables: investment fund performance and investment fund management fee.

The project aims to draw statistical conclusions about the relation (if any) between the performance of investment funds (in terms of price) and the money spent on managing the respective funds, to assess the significance of that relation, and thereby to provide some support for decision making by higher-level management. Please note that the project only attempts to find whether a change in one variable is accompanied by a change in the other; it in no way establishes a cause and effect relationship, i.e. we cannot conclude, at least from this exercise, whether an increase in spending on management fees causes an increase in fund performance. (A cause and effect relationship cannot be found using statistical techniques alone; it also requires an exhaustive study of the nature of the funds and the management involved.) If a correlation exists between the two factors, it could mean that one of them influences the other, that both influence each other (consistently or intermittently), that both are influenced by some third factor, or that the correlation arose by pure chance (e.g. because of the choice of an unrepresentative sample).

We come across similar situations in practice, where we need to find whether two or more entities are related to one another, and the nature and degree of their relationship. Examples include the ages of husband and wife, the price of a commodity and the amount demanded, and rainfall (up to a point) and the production of rice. (Only examples involving two entities are considered here because they are easier to discuss and the project itself involves only two entities; the same ideas extend to cases involving more than two.) Since the entities assume different values depending upon the time, place or person, they are referred to as variables in statistics. If a measurable relationship exists between two variables, the variables are said to be correlated. The measure of correlation, called the correlation coefficient or correlation index, expresses the degree and the nature of the correlation (i.e. whether an increase in one variable corresponds to an increase or a decrease in the other) as a single number. Correlation analysis deals with the techniques used in measuring the closeness of the relationship between the variables.

We give some important definitions in this regard:

  • Correlation analysis deals with the association between two or more variables. (Simpson and Kafka)
  • If two or more quantities vary in sympathy so that movement in one tends to be accompanied by a corresponding movement in the other(s) then they are said to be correlated. (L. R. Connor)
  • When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. (Croxton and Cowden)

There are several ways of classifying correlation. Three of the most important are:

  • Positive (direct) or negative (inverse) correlation: if, on average, the two variables move in the same direction (when one increases the other increases, and when one decreases the other decreases), they are said to be positively correlated. If, on average, they move in opposite directions (when one increases the other decreases, and vice versa), they are said to be negatively or inversely correlated.
  • Simple, partial and multiple correlation: when only two variables are studied it is a problem of simple correlation. When three or more variables are studied it is a problem of either partial or multiple correlation. In multiple correlation three or more variables are studied simultaneously. For example, when we study the relationship between the fee paid to a plastic surgeon, the complexity of the operation and the quality of the surgeon's work (in terms of results etc.), it is a problem of multiple correlation. If we consider only two of the variables, say the quality of work and the fee paid, to be influencing each other while the effect of the remaining variable is held constant, it is a problem of partial correlation.
  • Linear and non-linear (curvilinear) correlation: if the amount of change in one variable tends to bear a constant ratio to the amount of change in the other, the correlation is said to be linear. If we draw a graph with one variable on the X-axis and the other on the Y-axis, almost all the points will fall approximately on a straight line. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, the correlation is said to be non-linear. In most practical situations we find a non-linear relationship between the variables, but the techniques for measuring non-linear correlation are far more complicated than those for linear correlation. Therefore, we generally assume that the relation between the variables is linear.

The various methods of ascertaining whether two variables are correlated are:

  • Scatter diagram method: This is the simplest method. We take one variable on the X-axis and the other on the Y-axis and plot the points. The greater the scatter of the plotted points, the weaker the relationship between the two variables. The more closely the points cluster around a straight line, the higher the degree of linear relationship. The correlation is positive or negative according to the sign of the slope of this line. The merits of the method are that it is simple and easy to understand, and that a rough idea of whether or not the variables are related can be formed quickly. It is not influenced by the size of extreme items, whereas most mathematical methods of finding correlation are. When investigating correlation we usually draw the scatter diagram first. Its drawback is that the exact degree of correlation cannot be established, as it can with mathematical methods.
  • Graphic method: in this method we plot two curves, one for the X variable and one for the Y variable. By examining the direction and closeness of the two curves we can infer whether or not the two variables are related. The merits and demerits are the same as those of the scatter diagram.
  • Karl Pearson's correlation coefficient: Also called the Pearsonian correlation coefficient and universally denoted by r, it is given by $r = \frac{\sum xy}{N \sigma_x \sigma_y}$, where x and y are the deviations of the variable values from their respective means, N is the number of pairs of observations and $\sigma_x$, $\sigma_y$ are the standard deviations of the variables X and Y respectively. The formula can also be written in the simplified form $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$. Note that this form applies only when the deviations x and y are taken from the actual means and not from assumed means. The correlation coefficient can also be obtained directly, without taking deviations from either actual or assumed means, by the formula $r = \frac{N \sum xy - \sum x \sum y}{\sqrt{[N \sum x^2 - (\sum x)^2][N \sum y^2 - (\sum y)^2]}}$, where x and y are now the values of the variables X and Y themselves and not deviations from the means as in the earlier formulas. When the deviations are taken from assumed means (for example, if the values of X and Y are integral but the means involve fractions, then to simplify the calculations we take deviations from integers near the actual means, which are called assumed means), the formula is identical to the one given immediately before, except that the actual values of x and y are replaced by the deviations from the assumed means. The Pearsonian correlation coefficient is based on the assumptions that i) there is a linear relationship between the variables, ii) the two variables are approximately normally distributed and iii) there is a cause and effect relationship between the forces affecting the two variables (this is assumed when the coefficient is interpreted; the coefficient itself cannot establish it). The chief limitations of this method are that i) a linear relationship between the variables is assumed, ii) the coefficient is prone to misinterpretation, iii) the coefficient is unduly affected by extreme items and iv) it is comparatively time consuming to compute.
  • Rank correlation coefficient: This was developed by the British psychologist Charles Edward Spearman and is named after him. The method does not assume anything about the parameters of the population or the shape of the distribution. It is especially useful when quantitative measures of certain factors cannot be fixed but the members of the group can be ranked. Spearman's rank correlation coefficient is defined as $r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$, where D denotes the difference of ranks between paired items and N is the number of pairs. The advantages of this method are that i) it is simpler to understand and easier to apply, and when there are no tied ranks it gives the same value as the Pearsonian coefficient computed on the ranks; ii) it suits data of a qualitative nature (for example, surgeons in two countries can be ranked in order of professionalism and the degree of correlation established by this method); iii) it is the only method that can be used when only the ranks, and not the actual data, are given; and iv) it can still be applied when the actual data are given. Its limitations are that it cannot be used for finding correlation in a grouped frequency distribution and that it becomes tedious to compute by hand when the number of items exceeds about 30.

  • Concurrent deviation method: This is the simplest of the numerical methods. The formula is $r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$, where C is the number of concurrent deviations (pairs in which both variables change in the same direction) and N is the number of pairs of observations less 1. The sign inside the root is chosen to make the quantity non-negative, and the same sign is then given to the result. The method may be used to form a quick idea of the degree of relationship before more complicated methods are applied. Its limitation is that it does not differentiate between small and big changes. For example, if X changes from 100 to 101 the sign will be plus, and if Y changes from 100 to 160 the sign will also be plus. The results obtained from this method are only a rough indicator of the presence or absence of correlation.
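
The three numerical methods above can be sketched in a few lines of Python. This is only an illustrative sketch: the fee and return figures are hypothetical and are not taken from the project's data, the function names are made up for this example, and the ranking helper assumes there are no tied values.

```python
import math


def pearson_from_deviations(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def pearson_direct(xs, ys):
    """r computed from the raw values, without taking deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def ranks(values):
    """Rank 1 goes to the largest value. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D being the rank differences."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def concurrent_deviation(xs, ys):
    """r_c = +/- sqrt(+/- (2C - N) / N), N being the number of pairs less one."""
    def directions(values):
        # +1 for a rise, -1 for a fall, 0 for no change from the previous item
        return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

    dir_x, dir_y = directions(xs), directions(ys)
    n = len(dir_x)
    c = sum(1 for a, b in zip(dir_x, dir_y) if a == b)  # concurrent deviations
    inner = (2 * c - n) / n
    # Take the root of the absolute value and give the result the sign of `inner`.
    return math.copysign(math.sqrt(abs(inner)), inner)


# Hypothetical data, for illustration only: management fee (%) and fund return (%).
fees = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
returns = [3.1, 2.9, 4.0, 4.4, 4.2, 5.5]

r_dev = pearson_from_deviations(fees, returns)
r_raw = pearson_direct(fees, returns)
r_s = spearman(fees, returns)
r_c = concurrent_deviation(fees, returns)
print(round(r_dev, 4), round(r_raw, 4), round(r_s, 4), round(r_c, 4))
```

For these made-up figures the two Pearson formulas agree, as they should. The rank differences give a sum of squares of 4, so the rank coefficient is 1 - 24/210, roughly 0.886. Three of the five successive changes are concurrent, so the concurrent deviation coefficient is the square root of 0.2, roughly 0.447, which shows how rough that last method is.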

Interpretation of the correlation coefficient: the correlation coefficient is easily misinterpreted, and considerable experience is required to interpret it properly. The general rules of interpretation are:

  • r = +1 means there is a perfect positive linear relationship between the two variables.
  • r = -1 means there is a perfect negative linear relationship between the two variables.
  • r = 0 means there is no linear relationship between the variables (a non-linear relationship may still exist).
  • The closer the value of r is to +1 or -1, the closer the relationship between the variables; the closer r is to 0, the weaker the relationship. When estimating the value of one variable from that of the other, the higher the absolute value of r, the better the estimate.
  • The closeness of the relationship is not proportional to r. If the value of r is 0.8, it does not indicate a relationship twice as close as one of 0.4. It is in fact very much closer.
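
Two of these rules are easy to check numerically. The sketch below uses made-up data: a perfect parabola gives r = 0 even though Y is completely determined by X, and squaring r (the coefficient of determination discussed later in this essay) is one common way to see why r = 0.8 is much more than twice as strong as r = 0.4. The helper name `pearson` is an assumption for this example.

```python
import math


def pearson(xs, ys):
    """Pearson's r from raw values (direct formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Illustrative data: Y is an exact (non-linear) function of X.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_parabola = pearson(xs, ys)
print(r_parabola)  # 0.0: no linear relationship, yet Y depends entirely on X

# Share of variance accounted for at r = 0.8 and at r = 0.4.
share_high = 0.8 ** 2
share_low = 0.4 ** 2
print(round(share_high, 2), round(share_low, 2), round(share_high / share_low, 2))
```

On this reading, r = 0.8 accounts for about 64% of the variance and r = 0.4 for about 16%, a ratio of four rather than two.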

The probable error of the correlation coefficient is defined as $P.E._r = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$, where r is the correlation coefficient and N is the number of pairs of observations. If the value of r is less than the probable error, there is no evidence of correlation, i.e. the value of r is not at all significant. If r is more than six times the probable error, the value of r is significant. If $\rho$ is the correlation coefficient of the population, then $\rho$ is expected to lie between $r - P.E._r$ and $r + P.E._r$.

Note: the standard error of r is defined as $S.E._r = \frac{1 - r^2}{\sqrt{N}}$.

The probable error can be used only when the data approximately follow a normal distribution and the sample is unbiased.
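
A minimal sketch of the probable error test follows. It uses the standard form of the formulas with the square root of N in the denominator. The function names, the use of the absolute value of r (so that negative coefficients are handled) and the "no firm conclusion" label for the band between the two thresholds are assumptions made for this example. The values of r and N are illustrative, not results from the project.

```python
import math


def standard_error(r, n):
    """S.E. of r = (1 - r^2) / sqrt(N)."""
    return (1 - r * r) / math.sqrt(n)


def probable_error(r, n):
    """P.E. of r = 0.6745 * S.E. of r."""
    return 0.6745 * standard_error(r, n)


def assess(r, n):
    """Apply the two rules of thumb: below P.E. and above six times P.E."""
    pe = probable_error(r, n)
    if abs(r) < pe:          # abs() is an assumption so that negative r is handled
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "no firm conclusion"  # the rules say nothing about this band


def population_limits(r, n):
    """Limits r - P.E. and r + P.E. for the population coefficient."""
    pe = probable_error(r, n)
    return r - pe, r + pe


# Illustrative values only.
for r, n in [(0.8, 25), (0.5, 16), (0.1, 16)]:
    low, high = population_limits(r, n)
    print(r, n, round(probable_error(r, n), 4), assess(r, n), round(low, 4), round(high, 4))
```

With r = 0.8 and N = 25, for instance, the standard error is 0.36 / 5 = 0.072 and the probable error is about 0.0486, so r is far above six times the probable error.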

The coefficient of determination, denoted by $r^2$, is equal to the square of the correlation coefficient and is defined as the ratio of the explained variance to the total variance. It should not be taken to mean that the variable X stands in a determining or causal relationship with Y, as statistical evidence never establishes this kind of causality. Statistical evidence only establishes covariation.
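
The "explained variance over total variance" reading can be checked on a small example. The sketch below fits the least-squares line of Y on X to made-up data and compares the ratio of explained to total sum of squares with the square of r. The data and variable names are assumptions for illustration only.

```python
import math

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sum_x2 = sum((x - mean_x) ** 2 for x in xs)
sum_y2 = sum((y - mean_y) ** 2 for y in ys)

# Correlation coefficient from deviations about the actual means.
r = sum_xy / math.sqrt(sum_x2 * sum_y2)

# Least-squares line of Y on X: y_hat = intercept + slope * x.
slope = sum_xy / sum_x2
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in xs]

# Variation of the fitted values about the mean (explained) versus
# variation of the observed values about the mean (total).
explained = sum((f - mean_y) ** 2 for f in fitted)
total = sum_y2
ratio = explained / total

print(round(r * r, 4), round(ratio, 4))
```

Here the explained sum of squares is 3.6 out of a total of 6, so the ratio is 0.6, which matches the square of r (36/60). The figure describes how much of the variation in Y moves with X; it says nothing about which variable, if either, drives the other.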

Some of the properties of the correlation coefficient:

The correlation coefficient r lies between -1 and +1. It is independent of the scale and origin of the variables X and Y. It is equal to the geometric mean of the two regression coefficients, taken with their common sign.
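
These properties can be verified on a small made-up data set. The sketch below shifts and rescales both variables (with positive scale factors) and recomputes r, then compares r with the geometric mean of the two regression coefficients. The names `b_yx` and `b_xy` for the regression coefficients of Y on X and of X on Y are labels chosen for this example.

```python
import math


def deviation_sums(xs, ys):
    """Return sum(xy), sum(x^2), sum(y^2) for deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy, sum_x2, sum_y2


def pearson(xs, ys):
    sum_xy, sum_x2, sum_y2 = deviation_sums(xs, ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson(xs, ys)

# Change of origin and scale (positive scale factors): r is unchanged.
# A negative scale factor on one variable would flip the sign of r.
us = [10 * x + 100 for x in xs]
vs = [0.5 * y - 3 for y in ys]
r_transformed = pearson(us, vs)

# Regression coefficients of Y on X and of X on Y.
sum_xy, sum_x2, sum_y2 = deviation_sums(xs, ys)
b_yx = sum_xy / sum_x2
b_xy = sum_xy / sum_y2

# Geometric mean of the two coefficients, carrying their common sign.
r_from_regression = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(r, 4), round(r_transformed, 4), round(r_from_regression, 4))
```

For this data the regression coefficients are 0.6 and 1.0, whose geometric mean is the square root of 0.6, the same value as r.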

The correlation analysis is included in the adjoining Excel sheet. The idea of probable error is used to test the significance of the correlation coefficient.

