A comprehensive statistical analysis of the
relationship between two variables: investment fund performance and
investment fund management fee.
The project aims to draw statistical conclusions about the relationship, if any, between the performance of investment funds (in terms of price) and the money spent on managing those funds, to assess the significance of that relationship, and thereby to support higher-level management in its decision making. Please note that the project only attempts to find out whether a change in one variable is accompanied by a change in the other. It in no way establishes a cause and effect relationship: we cannot conclude, at least from this exercise, that an increase in spending on management fees causes an increase in fund performance. Cause and effect cannot be established by statistical techniques alone; it also requires an exhaustive study of the nature of the funds and the management involved. If a correlation exists between the two factors, it could mean that one of them influences the other, that each influences the other (consistently or intermittently), that both are influenced by some third factor, or that the correlation arose purely by chance (for example, because of an unrepresentative sample).

We come across similar situations in reality, where we need to find out whether two or more entities are related to one another and to determine the nature and degree of their relationship. Examples include the ages of husband and wife, the price of a commodity and the quantity demanded, and rainfall (up to a point) and the production of rice. Only examples involving two entities are considered here, because they are easier to discuss and because this project also involves only two entities; the same ideas extend to cases involving more than two. Since these entities assume different values depending on time, place or person, they are referred to as variables in statistics. If a measurable relationship exists between two variables, the variables are said to be correlated. The measure of correlation, called the correlation coefficient or correlation index, expresses the degree and the nature of the correlation (i.e. whether an increase in one variable corresponds to an increase or a decrease in the other) as a single number. Correlation analysis deals with the techniques used to measure the closeness of the relationship between the variables.
We give some important definitions in this regard:
- Correlation analysis deals with the association between two or more variables. (Simpson and Kafka)
- If two or more quantities vary in sympathy, so that movement in one tends to be accompanied by a corresponding movement in the other(s), then they are said to be correlated. (L. R. Connor)
- When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. (Croxton and Cowden)
There are several ways of classifying correlation. Three of the most important are:
- Positive (direct) or negative (inverse) correlation: if, on average, an increase in one variable is accompanied by an increase in the other, and a decrease by a decrease, the two are said to be positively correlated. If, on average, an increase in one variable is accompanied by a decrease in the other, and a decrease by an increase, the two are said to be negatively or inversely correlated.
- Simple, partial and multiple correlation: when only two variables are studied it is a problem of simple correlation. When three or more variables are studied it is a problem of either partial or multiple correlation. In multiple correlation three or more variables are studied simultaneously. For example, when we study the relationship between the fee paid to a plastic surgeon, the complexity of the operation and the quality of the surgeon's work (in terms of results etc.), it is a problem of multiple correlation. If we consider only two of the variables, say the quality of work and the fee paid, as influencing each other while the effect of the remaining variable is held constant, it is a problem of partial correlation.
- Linear and non-linear (curvilinear) correlation: if the amount of change in one variable tends to bear a constant ratio to the amount of change in the other, the correlation is said to be linear. If we draw a graph with one variable on the X-axis and the other on the Y-axis, almost all the points will fall approximately on a straight line. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, the correlation is said to be non-linear. In most practical situations we find a non-linear relationship between the variables, but the techniques for measuring non-linear correlation are far more complicated than those for linear correlation. Therefore, we generally assume that the relationship between the variables is linear.
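As a rough illustration of the difference between simple and partial correlation, the sketch below computes the three simple coefficients for three series and then the first-order partial coefficient, using the standard formula r12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)). The function names and the fee, quality and complexity figures are hypothetical and chosen only for illustration; they are not data from this project.

```python
import math

def simple_r(xs, ys):
    """Simple (two-variable) correlation from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2, holding 3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical figures: fee paid, quality score, complexity score
fee = [10, 12, 15, 18, 20, 25]
quality = [5, 6, 6, 8, 7, 9]
complexity = [1, 2, 2, 3, 4, 5]

r_fq = simple_r(fee, quality)        # simple correlation: fee vs quality
r_fc = simple_r(fee, complexity)
r_qc = simple_r(quality, complexity)

# Fee vs quality with the effect of complexity held constant
r_fq_given_c = partial_r(r_fq, r_fc, r_qc)
print(r_fq, r_fq_given_c)
```

Comparing the two printed values shows how much of the apparent fee and quality relationship remains once complexity is held constant.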
The various methods of ascertaining whether two variables are correlated are:
- Scatter diagram method: This is the simplest method. We take one variable on the X-axis and the other on the Y-axis and plot the points. The greater the scatter of the plotted points, the weaker the relationship between the two variables. The more closely the points come to a straight line, the higher the degree of linear relationship. The correlation is positive or negative depending on the sign of the slope of this line. The merits of the method are that it is simple and easy to understand, and that a rough idea can easily be formed as to whether or not the variables are related. It is not influenced by the size of extreme items, whereas most mathematical methods of finding correlation are. When investigating correlation we therefore usually begin by drawing the scatter diagram. Its drawback is that the exact degree of correlation cannot be established, as it can with mathematical methods.
- Graphic method: In this method we plot two curves, one for the X variable and one for the Y variable. By examining the direction and closeness of the two curves we can infer whether or not the two variables are related. The merits and demerits are the same as those of the scatter diagram.
- Karl Pearson's correlation coefficient: Also called the Pearsonian correlation coefficient and universally denoted by r, it is given by $r = \frac{\sum xy}{N \sigma_x \sigma_y}$, where x and y are the deviations of the variable values from their respective means, N is the number of pairs of observations, and $\sigma_x$, $\sigma_y$ are the standard deviations of the variables X and Y respectively. The formula can also be written in the simplified form $r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$. Note that these formulas apply only where the deviations x and y are taken from the actual means and not from assumed means. The correlation coefficient can also be obtained directly, without taking deviations from either actual or assumed means, by the formula $r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$, where X and Y are the values of the variables themselves and not the deviations from the means as in the earlier formulas. Deviations are sometimes taken from assumed means instead: for example, if the values of X and Y are integers but the means involve fractions, then to simplify the calculations we take deviations from integers close to the actual means, which are called assumed means. In that case the formula is identical to the one given immediately above, except that the actual values X and Y are replaced by the deviations from the assumed means. The Pearsonian correlation coefficient is based on the assumptions that i) there is a linear relationship between the variables, ii) the two variables are approximately normally distributed and iii) there is a cause and effect relationship between the forces affecting the two variables. The chief limitations of this method are that i) a linear relationship between the variables is assumed, ii) the coefficient is prone to misinterpretation, iii) the coefficient is unduly affected by extreme items and iv) it is comparatively time consuming to compute.
- Rank correlation coefficient: This was developed by the British psychologist Charles Edward Spearman and is named after him. The method assumes nothing about the parameters of the population or the shape of the distribution. It is especially useful when certain factors cannot be measured quantitatively but the members of the group can be ranked. Spearman's rank correlation coefficient is defined as $r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$, where D denotes the difference between the ranks of paired items and N is the number of pairs. The advantages of this method are that i) it is simpler to understand and easier to apply than Pearson's method, and when no two items share a rank it gives the same value as Pearson's coefficient calculated on the ranks; ii) it suits data of a qualitative nature (for example, surgeons in two countries can be ranked in order of professionalism and the degree of correlation established by this method); iii) it is the only method that can be used when only the ranks, and not the actual data, are given; and iv) it can still be applied where the actual data are given. Its limitations are that it cannot be used for finding correlation in a grouped frequency distribution and that it becomes impractical for hand calculation when the number of items exceeds about 30.
- Concurrent deviation method: This is the simplest of all the mathematical methods. The formula is $r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$, where C stands for the number of concurrent deviations (pairs in which both variables move in the same direction) and N is the number of pairs of deviations, i.e. the number of pairs of observations less one. The sign of (2C - N)/N is used both inside and outside the square root, so that the root is taken of a non-negative quantity and $r_c$ carries the sign of 2C - N. The method may be used to form a quick idea of the degree of relationship before making use of more complicated methods. Its limitation is that it does not differentiate between small and big changes. For example, if X changes from 100 to 101 the sign is plus, and if Y changes from 100 to 160 the sign is also plus, so the two changes are treated alike. The results obtained from this method are only a rough indicator of the presence or absence of correlation.
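The three numerical coefficients described in this list can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names and the sample fee and performance figures are hypothetical, the ranking helper assumes there are no tied values, and pairs in which either variable is unchanged are not counted as concurrent (textbook conventions differ on that point). The direct formula uses the squared sums (sum of X) squared and (sum of Y) squared in the denominator.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_r_direct(xs, ys):
    """Pearson's r from the raw values, without taking deviations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's coefficient 1 - 6*sum(D^2) / (N(N^2 - 1)), no ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def concurrent_rc(xs, ys):
    """Coefficient of concurrent deviations."""
    def signs(v):
        # +1 for a rise, -1 for a fall, 0 for no change from the previous item
        return [(b > a) - (b < a) for a, b in zip(v, v[1:])]
    sx, sy = signs(xs), signs(ys)
    n = len(sx)                                      # pairs of observations less one
    c = sum(1 for a, b in zip(sx, sy) if a * b > 0)  # both moved the same way
    q = (2 * c - n) / n
    # The sign of q is applied inside and outside the square root
    return math.copysign(math.sqrt(abs(q)), q)

# Hypothetical management fees (%) and fund performance (%)
fee = [1.0, 1.2, 0.8, 1.5, 1.1, 0.9]
perf = [5.0, 6.1, 4.2, 6.8, 4.9, 5.2]

print(pearson_r(fee, perf), pearson_r_direct(fee, perf))
print(spearman_rs(fee, perf))
print(concurrent_rc(fee, perf))
```

The two Pearson functions should agree up to rounding, which is a convenient check on hand calculations. Calling `concurrent_rc([100, 101], [100, 160])` returns 1.0, which illustrates the limitation noted above: a change of 1 and a change of 60 are treated alike.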
Interpretation of the correlation coefficient: the correlation coefficient is easily misinterpreted, and a good deal of experience is required to interpret it properly. The general rules of interpretation are:
- r = +1 means there is a perfect positive relationship between the two variables.
- r = -1 means there is a perfect negative relationship between the two variables.
- r = 0 means there is no linear relationship between the variables (a non-linear relationship may still exist).
- The closer the value of r is to +1 or -1, the closer the relationship between the variables; the closer r is to 0, the weaker the relationship. When estimating the value of one variable from that of the other, the higher the absolute value of r, the better the estimate.
- The closeness of the relationship is not proportional to r. A value of r = 0.8 does not indicate a relationship twice as close as one of r = 0.4; it is in fact very much closer. In terms of the proportion of variance explained (the square of r, discussed below as the coefficient of determination), the figures are 0.64 against 0.16.
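Two of these rules can be checked numerically. The sketch below uses hypothetical data in which Y is completely determined by X through a curve, yet r comes out as zero, so r = 0 rules out only a linear relationship. It then compares r = 0.8 with r = 0.4 through their squares. The function name and the data are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear data: Y is exactly X squared
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_curve = pearson_r(xs, ys)   # no linear association despite a perfect curve

# Proportion of variance explained at r = 0.8 and at r = 0.4
share_08 = 0.8 ** 2
share_04 = 0.4 ** 2
ratio = share_08 / share_04   # how many times closer, measured by r squared

print(r_curve, share_08, share_04, ratio)
```

Measured by the square of r, the relationship at r = 0.8 is about four times as close as at r = 0.4, not twice as close.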
The probable error of the correlation coefficient is defined as $P.E._r = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$, where r is the correlation coefficient and N is the number of pairs of observations. If the value of r is less than the probable error, there is no evidence of correlation, i.e. the value of r is not at all significant. If $r > 6\,P.E._r$, the value of r is significant. If $\rho$ is the correlation coefficient of the population, then $\rho$ can be expected to lie within the limits $r - P.E._r < \rho < r + P.E._r$.
Note: the standard error of r is defined as $S.E._r = \frac{1 - r^2}{\sqrt{N}}$, so that $P.E._r = 0.6745\,S.E._r$.
The probable error can be used only when the data are approximately normally distributed and the sample is unbiased.
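The probable error rule can be sketched as a small helper. The function names, the label "inconclusive" for values between one and six probable errors, and the use of the absolute value of r (so that negative coefficients are handled) are assumptions made for this illustration; the text above states the rule only for r itself.

```python
import math

def standard_error(r, n):
    """S.E. of r: (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """P.E. of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)

def assess(r, n):
    """Apply the probable error rule to r based on n pairs of observations."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"   # assumed label: between 1 and 6 probable errors

def population_limits(r, n):
    """Limits r - P.E. and r + P.E. for the population coefficient."""
    pe = probable_error(r, n)
    return r - pe, r + pe

# Hypothetical coefficients and sample sizes
for r, n in [(0.8, 25), (0.5, 9), (0.1, 16)]:
    print(r, n, round(probable_error(r, n), 4), assess(r, n))
```

For r = 0.8 with N = 25 the probable error is about 0.0486, so r is far more than six times the probable error and is treated as significant under this rule.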
The coefficient of determination is equal to the square of the correlation coefficient, is denoted by $r^2$, and is defined as the ratio of the explained variance to the total variance. It should not be misinterpreted as meaning that the variable X stands in a determining or causal relationship with Y, as statistical evidence never establishes this kind of causality. Statistical evidence only establishes covariation.
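The definition can be verified numerically for a straight-line fit. The sketch below fits the least-squares line of Y on X, forms the ratio of explained to total variation, and compares it with the square of r. The function name and the fee and return figures are hypothetical, and the least-squares line is an assumption about how the explained variance is obtained.

```python
import math

def determination(xs, ys):
    """Return (explained / total variation, r squared) for Y regressed on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)        # total variation in Y
    slope = sxy / sxx                            # least-squares line of Y on X
    intercept = my - slope * mx
    fitted = [intercept + slope * x for x in xs]
    explained = sum((f - my) ** 2 for f in fitted)
    r = sxy / math.sqrt(sxx * syy)
    return explained / syy, r * r

# Hypothetical management fees (%) and fund returns (%)
fees = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
returns = [4.1, 5.0, 4.8, 6.2, 6.0, 7.5]

ratio, r_squared = determination(fees, returns)
print(round(ratio, 6), round(r_squared, 6))
```

The two printed numbers agree up to rounding, which is what the definition claims; neither says anything about which variable, if either, is the cause.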
Some of the properties of the correlation coefficient:
The correlation coefficient r lies between -1 and +1. It is independent of the scale and origin of the variables X and Y. It is equal to the geometric mean of the two regression coefficients, taking the sign that the two coefficients share.
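These properties can be checked on a small example. The sketch below computes r, recomputes it after a change of origin and scale (with positive scale factors), and compares it with the geometric mean of the two regression coefficients. The function names, the data and the particular coding constants (150, 10, 3.0, 0.5) are hypothetical.

```python
import math

def sums(xs, ys):
    """Sums of products and squares of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    sxy, sxx, syy = sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
x = [120, 135, 150, 160, 180]
y = [2.5, 3.0, 2.8, 3.6, 4.0]
r = pearson_r(x, y)

# Change of origin and scale (positive scale factors)
u = [(v - 150) / 10 for v in x]
w = [(v - 3.0) / 0.5 for v in y]
r_coded = pearson_r(u, w)

# Regression coefficients of Y on X and of X on Y
sxy, sxx, syy = sums(x, y)
b_yx = sxy / sxx
b_xy = sxy / syy
# Geometric mean, carrying the common sign of the two coefficients
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), sxy)

print(r, r_coded, r_from_b)
```

All three printed values agree up to rounding. With a negative scale factor on one variable the magnitude of r would be unchanged but its sign would flip.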
The correlation
analysis is included in the adjoining Excel sheet. The idea of probable error is
used to test the significance of the correlation coefficient.