A comprehensive statistical analysis of the relationship between two variables: investment fund performance and investment fund management fee.
The project aims to draw statistical conclusions about the relationship, if any, between the performance of the investment funds (in terms of price) and the money spent on managing the respective funds, to assess the significance of that relationship, and thereby to provide some support for decision making by higher-level management. Please note that the project only attempts to find out whether a change in one variable is accompanied by a change in the other. This in no way establishes a cause and effect relationship: we cannot conclude, at least from this exercise, that an increase in spending on management fees causes an increase in fund performance. (A cause and effect relationship cannot be found by statistical techniques alone; it also requires an exhaustive study of the nature of the funds and the management involved.) If a correlation exists between the two factors, it could mean that one of them influences the other, that both influence each other (consistently or intermittently), that both are influenced by some third factor, or that the correlation arises by pure chance (for example, because of the choice of a wrong sample).
We come across similar situations in practice, where we need to find out whether two or more entities are related to one another and to determine the nature and degree of their relationship. Examples include the ages of husband and wife, the price of a commodity and the amount demanded, and rainfall (up to a point) and the production of rice. (Only examples involving two entities are considered here because they are easier to discuss and because the project itself involves only two entities; the same ideas can be extended to cases involving more than two.) Since the entities assume different values depending on time, place or person, they are referred to as variables in statistics. If a measurable relationship exists between two variables, the variables are said to be correlated. The measure of the correlation, called the correlation coefficient or the correlation index, expresses the degree and the nature of the correlation (i.e. whether an increase in one variable corresponds to an increase or a decrease in the other) as a single number. Correlation analysis deals with the techniques used to measure the closeness of the relationship between the variables.
We give some important definitions in this regard:
- Correlation analysis deals with the association between two or more variables. (Simpson and Kafka)
- If two or more quantities vary in sympathy so that movement in one tends to be accompanied by a corresponding movement in the other(s) then they are said to be correlated. (L. R. Connor)
- When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. (Croxton and Cowden)
There are several ways of classifying correlation. Three of the most important are:
- Positive (direct) or negative (inverse) correlation:- If, on average, one variable increases when the other increases and decreases when the other decreases, the two are said to be positively correlated. If, on average, one variable increases when the other decreases and vice versa, the two are said to be negatively or inversely correlated.
- Simple, partial and multiple correlation:- When only two variables are studied, it is a problem of simple correlation. When three or more variables are studied, it is a problem of either partial or multiple correlation. In multiple correlation, three or more variables are studied simultaneously. For example, studying the relationship between the fee paid to a plastic surgeon, the complexity of the operation and the quality of the surgeon's work (in terms of results etc.) is a problem of multiple correlation. If we study only two of these variables, say the quality of work and the fee paid, while the effect of the other influencing variable is kept constant, it is a problem of partial correlation.
- Linear and non-linear (curvilinear) correlation:- If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other, the correlation is said to be linear. If we draw a graph with one variable on the X-axis and the other on the Y-axis, almost all the points will fall approximately on a straight line. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, the correlation is said to be non-linear. In most practical situations we find a non-linear relationship between the variables, but the techniques for measuring non-linear correlation are far more complicated than those for linear correlation. Therefore, we generally assume that the relationship between the variables is linear.
The various methods of ascertaining whether two variables are correlated or not are:-
- Scatter diagram method:- This is the simplest method. We take one variable on the X-axis and the other on the Y-axis and plot the points. The greater the scatter of the plotted points, the weaker the relationship between the two variables. The more closely the points approach a straight line, the higher the degree of linear relationship. The correlation is positive or negative according to the sign of the slope of this line. The merits of the method are that it is simple and easy to understand, and that a rough idea of whether or not the variables are related can be formed easily. It is not influenced by the size of extreme items, whereas most of the mathematical methods of finding correlation are. When investigating correlation we therefore usually draw the scatter diagram first. Its drawback is that the exact degree of correlation cannot be established, as it can with the mathematical methods.
- Graphic method:- In this method we plot two curves, one for the X variable and one for the Y variable. By examining the direction and closeness of the two curves we can infer whether or not the two variables are related. The merits and demerits are the same as those of the scatter diagram.
- Karl Pearson's correlation coefficient:- Also called the Pearsonian correlation coefficient and universally denoted by r, it is given by $r = \frac{\sum xy}{N \sigma_x \sigma_y}$, where x and y are the deviations of the variable values from their respective means, N is the number of pairs of observations, and $\sigma_x$, $\sigma_y$ are the standard deviations of the variables X and Y respectively. The formula can also be written in the simplified form $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$. Note that this form applies only where the deviations x and y are taken from the actual means and not from assumed means. The correlation coefficient can also be obtained directly, without taking deviations from either actual or assumed means, by the formula $r = \frac{N \sum xy - \sum x \sum y}{\sqrt{[N \sum x^2 - (\sum x)^2][N \sum y^2 - (\sum y)^2]}}$, where x and y are now the values of the variables X and Y themselves and not the deviations from the means as in the earlier formulas. When the deviations are taken from assumed means (for example, if the values of X and Y are integers but the means involve fractions, then to simplify the calculations we take deviations from integers near the actual means, which are called assumed means), the formula is identical to the one given immediately above, the only difference being that the values x and y are replaced by the deviations from the assumed means. The Pearsonian correlation coefficient is based on the assumptions that i) there is a linear relationship between the variables, ii) the two variables are approximately normally distributed and iii) there is a cause and effect relationship between the forces affecting the two variables. The chief limitations of this method are that i) a linear relationship between the variables is assumed, ii) the coefficient is prone to misinterpretation, iii) the coefficient is unduly affected by extreme items and iv) it is comparatively time consuming to compute.
- Rank correlation coefficient:- This was developed by the British psychologist Charles Edward Spearman and is named after him. The method assumes nothing about the parameters of the population or the shape of the distribution. It is especially useful when certain factors cannot be measured quantitatively but the members of the group can be ranked. Spearman's rank correlation coefficient is defined as $r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$, where D denotes the difference between the ranks of paired items and N is the number of pairs. The advantages of this method are: i) it is simpler to understand and easier to apply, and if no ranks are tied the coefficient is the same as the Pearsonian coefficient computed on the ranks; ii) it is well suited to data of a qualitative nature (for example, surgeons in two countries can be ranked in order of professionalism and the degree of correlation established by this method); iii) it is the only method that can be used when only the ranks, and not the actual data, are given; iv) it can still be applied where the actual data are given. Its limitations are that it cannot be used for finding correlation in a grouped frequency distribution and that the ranking becomes too tedious to be practical when the number of items exceeds 30.
- Concurrent deviation method:- This is the simplest of the mathematical methods. The formula is $r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$, where C stands for the number of concurrent deviations and N is the number of pairs of observations less 1. A deviation is concurrent when both variables change in the same direction from one observation to the next, and the sign of (2C - N) is used both inside and outside the square root. The method may be used to form a quick idea of the degree of relationship before more complicated methods are applied. Its limitation is that it does not differentiate between small and big changes. For example, if X changes from 100 to 101 the sign will be plus, and if Y changes from 100 to 160 the sign will also be plus. The results obtained by this method are therefore only a rough indicator of the presence or absence of correlation.
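To make the three mathematical methods above concrete, the following is a minimal Python sketch of the Pearsonian, rank and concurrent deviation coefficients. It is an illustration only and not the calculation from the project's Excel sheet: the function names and the five pairs of values are made up for the example, the ranking helper assumes there are no tied values, and counting a deviation as concurrent only when both variables actually move in the same direction (so that "no change" never counts) is an assumption.

```python
import math

def pearson_deviation(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), x and y being deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [a - mean_x for a in xs]
    dy = [b - mean_y for b in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_raw(xs, ys):
    """Direct formula on the raw values, without taking deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

def ranks(values):
    """Rank 1 goes to the largest value. Assumption: no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D being the rank differences."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def concurrent_deviation(xs, ys):
    """r_c = +/- sqrt(+/- (2C - N) / N), with N = number of pairs less 1."""
    def sign(d):
        return (d > 0) - (d < 0)
    # Direction of change between consecutive observations of each series.
    sx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    sy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(sx)
    c = sum(1 for a, b in zip(sx, sy) if a * b > 0)  # same direction only
    ratio = (2 * c - n) / n
    # The sign of (2C - N) is applied inside and outside the square root.
    return math.copysign(math.sqrt(abs(ratio)), ratio)

# Made-up illustrative data (not the fund data analysed in the project).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(round(pearson_deviation(x, y), 4))  # 0.8
print(round(pearson_raw(x, y), 4))        # 0.8
print(round(spearman(x, y), 4))           # 0.8
print(concurrent_deviation(x, y))         # 0.0
```

The two Pearsonian formulas agree, as they should. The rank coefficient equals them here only because the made-up values are themselves untied ranks. The concurrent deviation coefficient comes out at zero for the same data, which shows how rough an indicator it is.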
Interpretation of the correlation coefficient:- The correlation coefficient is often misinterpreted, and considerable experience is required to interpret it properly. The general rules of interpretation are:
- r = +1 means there is a perfect positive relationship between the two variables.
- r = -1 means there is a perfect negative relationship between the two variables.
- r = 0 means there is no linear relationship between the variables.
- The closer the value of r is to +1 or -1, the closer the relationship between the variables; the closer r is to 0, the less close the relationship. When estimating the value of one variable from that of the other, the higher the magnitude of r, the better the estimate.
- The closeness of the relationship is not proportional to r. If the value of r is 0.8, it does not indicate a relationship twice as close as one of 0.4; it is in fact very much closer.
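The rules for r = +1, r = -1 and r = 0 can be checked on small made-up series. The sketch below (the function name and the data are illustrative assumptions) also shows why r = 0 is best read as the absence of a linear relationship: the third series depends exactly on x, yet its coefficient is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearsonian r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [a - mean_x for a in xs]
    dy = [b - mean_y for b in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

x = [-2, -1, 0, 1, 2]
y_direct = [2 * v + 1 for v in x]   # exact straight line, rising
y_inverse = [-3 * v for v in x]     # exact straight line, falling
y_curve = [v * v for v in x]        # exact relationship, but not linear

print(pearson_r(x, y_direct))   # 1.0
print(pearson_r(x, y_inverse))  # -1.0
print(pearson_r(x, y_curve))    # 0.0
```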
The probable error of the correlation coefficient is defined as $\mathrm{P.E.}_r = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$, where r is the correlation coefficient and N is the number of pairs of observations. If the value of r is less than the probable error, there is no evidence of correlation, i.e. the value of r is not at all significant. If $r > 6 \, \mathrm{P.E.}_r$, the value of r is significant. If $\rho$ is the correlation coefficient of the population, then $r - \mathrm{P.E.}_r < \rho < r + \mathrm{P.E.}_r$.
Note: the standard error of r is defined as $\mathrm{S.E.}_r = \frac{1 - r^2}{\sqrt{N}}$.
The probable error can be used only when the data are approximately normally distributed and the sample is unbiased.
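The probable error test can be sketched in a few lines of Python. This is a hedged illustration, not the project's own calculation: the function names and the sample values of r and N are hypothetical, the comparison uses the magnitude of r so that negative coefficients are handled too (an assumption), and the label "inconclusive" for values between one and six probable errors is ours, since no rule is stated for that range.

```python
import math

def standard_error(r, n):
    """S.E. of r = (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """P.E. of r = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * standard_error(r, n)

def assess(r, n):
    """Return the probable error, a verdict, and the limits r - P.E., r + P.E."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "no evidence of correlation"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        verdict = "inconclusive"  # assumption: no rule is given for this range
    return pe, verdict, (r - pe, r + pe)

# Hypothetical values of r and N, chosen only to exercise both rules.
for r, n in [(0.8, 25), (0.1, 16)]:
    pe, verdict, (low, high) = assess(r, n)
    print(f"r={r}, N={n}: P.E.={pe:.4f}, {verdict}, limits {low:.4f} to {high:.4f}")
```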
The coefficient of determination, denoted by $r^2$, is the square of the correlation coefficient and is defined as the ratio of the explained variance to the total variance. It should not be misinterpreted to mean that the variable X stands in a determining or causal relationship to Y, as statistical evidence never establishes this kind of causality. Statistical evidence only establishes covariation.
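The ratio of explained to total variance can be verified numerically. The sketch below assumes that "explained variance" means the variation of the values fitted by the least-squares line of Y on X, which is not spelled out in the text; the function names and data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearsonian r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [a - mean_x for a in xs]
    dy = [b - mean_y for b in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def explained_over_total(xs, ys):
    """Explained variation / total variation for the least-squares line of Y on X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_x2 = sum((a - mean_x) ** 2 for a in xs)
    slope = sum_xy / sum_x2
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in xs]
    explained = sum((f - mean_y) ** 2 for f in fitted)
    total = sum((b - mean_y) ** 2 for b in ys)
    return explained / total

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
print(round(r ** 2, 4))                     # 0.64
print(round(explained_over_total(x, y), 4)) # 0.64
```

On this reading r = 0.8 accounts for 64% of the variance and r = 0.4 for only 16%, which is one way of seeing why the closeness of the relationship is not proportional to r.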
Some of the properties of the correlation coefficient:-
The correlation coefficient r lies between -1 and +1. It is independent of the scale and origin of the variables X and Y. It is equal to the geometric mean of the two regression coefficients.
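These properties can be checked on a small example. The following sketch uses made-up data and hypothetical helper names; the change of origin and scale uses positive scale factors only (a negative factor would reverse the sign of r), and the geometric mean is given the common sign of the two regression coefficients.

```python
import math

def deviation_sums(xs, ys):
    """Return sum(xy), sum(x^2), sum(y^2) for deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_x2 = sum((a - mean_x) ** 2 for a in xs)
    sum_y2 = sum((b - mean_y) ** 2 for b in ys)
    return sum_xy, sum_x2, sum_y2

def pearson_r(xs, ys):
    sum_xy, sum_x2, sum_y2 = deviation_sums(xs, ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Change of origin and scale: u = 10x + 7, v = (y - 3) / 4.
u = [10 * a + 7 for a in x]
v = [(b - 3) / 4 for b in y]
r_transformed = pearson_r(u, v)

# Regression coefficients of Y on X and of X on Y.
sum_xy, sum_x2, sum_y2 = deviation_sums(x, y)
b_yx = sum_xy / sum_x2
b_xy = sum_xy / sum_y2
r_from_regression = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(r, 4), round(r_transformed, 4), round(r_from_regression, 4))
```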
The correlation analysis is included in the adjoining Excel sheet. The idea of probable error is used to test the significance of the correlation coefficient.
