
Consider and discuss the required approach to full analysis of the data set provided.

As part of this, explore also how you would test the hypotheses below and explain the reasons for your decisions. Hypothesis 1: Male children are taller than female children. Null hypothesis: There is no difference in height between male children and female children. Hypothesis 2: Taller children are heavier. Null hypothesis: There is no relationship between how tall children are and how much they weigh.

Analysis of data set

The data set is a list of 30 children's gender, age, height, weight, upper and lower limb lengths, eye colour, liking of chocolate or not, and IQ.

There are two main things to consider before analysing the data. These are the types of data and the quality of the data as a sample.

Types of data could be nominal, ordinal, interval or ratio. Nominal is also known as categorical. Coolican (1990) gives more details of all of these and his definitions have been used to decide the types of data in the data set.

It is also helpful to distinguish between continuous numbers, which could be measured to any number of decimal places, and discrete numbers such as integers, which move in finite jumps like 1, 2 etc.

Gender

This variable can only distinguish between male and female. There is no order to this and so the data is nominal.

Age

This variable can take integer values. It could be measured to decimal places, but is generally only recorded as an integer. It is ratio data because, for example, it would be meaningful to say that a 20 year old person is twice as old as a 10 year old.

In this data set, the ages range from 120 months to 156 months. This needs to be consistent with the population being tested.

Height

This variable can take values to decimal places if necessary. Again it is ratio data because, for example, it would be meaningful to say that a person who is 180 cm tall is 1.5 times as tall as someone 120 cm tall. In this sample it is measured to the nearest cm.

Weight

Like height, this variable could be measured to decimal places and is ratio data. In this sample it is measured to the nearest kg.

Upper and lower limb lengths

Again, these variables are like height and weight and are ratio data.

Eye colour

This variable can take a limited number of values, which are eye colours. The order is not meaningful. This data is therefore nominal (categorical).

Liking of chocolate or not

As with eye colour, this variable can take a limited number of values, which are the sample members' preferences. In distinguishing merely between liking and disliking, the order is not meaningful. This data is therefore nominal (categorical).

IQ

IQ is a scale measurement found by testing each sample member. As such it is not a ratio scale because it would not be meaningful to say, for example, that someone with a score of 125 is 25% more intelligent than someone with a score of 100. It is usually treated as interval data.

There is another level of data mentioned by Coolican into which none of the data set variables fit. That is ordinal data. This means that the data have an order or rank which makes sense. An example would be if 10 students tried a test and you recorded who finished quickest, 2nd quickest etc, but not the actual time.

The data is intended to be a sample from a population about which we can make inferences. For example, in the hypothesis tests we want to know whether the sample results are indicative of population differences. The results can only be inferred for the population from which the sample is drawn; they would not be valid for any other population.

Details of sampling methods were found in Bland (2000). To accomplish the required objectives, the sample has to be representative of the defined population. It would also be more accurate if the sample is stratified by known factors like gender and age. This means that, for example, the proportion of males in the sample is the same as the proportion in the population.

Sample size is another consideration. In this case it is 30. Whether this is adequate for the hypotheses being tested is examined below.
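Proportional stratification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1000 children (520 male, 480 female), the record layout and the function name stratified_sample are all hypothetical and are not taken from the data set or from Bland (2000).

```python
import random
from collections import defaultdict

def stratified_sample(population, key, sample_size, seed=0):
    """Draw a proportionally stratified random sample.

    population: list of records; key: function returning a record's stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in population:
        strata[key(record)].append(record)

    sample = []
    for members in strata.values():
        # Each stratum gets places in proportion to its share of the population.
        # Rounding means the total can differ slightly from sample_size in general.
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 520 males and 480 females.
population = [{"id": i, "gender": "M" if i < 520 else "F"} for i in range(1000)]
sample = stratified_sample(population, key=lambda r: r["gender"], sample_size=50)
males = sum(1 for r in sample if r["gender"] == "M")
print(len(sample), males)
```

With these made-up figures the sample of 50 contains males and females in the same 52:48 proportion as the population.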

Hypothesis 1: Male children are taller than female children.

Swift (2001) gives a very readable account of the hypothesis testing process and the structure of the test.

The first step is to set up the hypotheses:

The null hypothesis is that there is no difference in height between male children and female children.

If the alternative were, as Coolican describes it, that "we do not predict in which direction the results will go", then it would have been a two-tailed test. In this case the alternative is that males are taller. It therefore has a specific direction and so a one-tailed test is required.

To test the hypothesis we need to set up a test statistic and then either match it against a pre-determined critical value or calculate the probability of achieving the sample value based on the assumption that the null hypothesis is true.

The most commonly used significance level is 0.05. According to Swift (2001) the significance level must be decided before the data is known. This is to stop researchers adjusting the significance level to get the result that they want rather than accepting or rejecting objectively.

If the probability of the test statistic is less than 0.05 we would reject the null hypothesis that there is no difference between males and females in favour of males being taller, on the one-sided basis.

However, it is possible for the test statistic to be in the rejection zone when in fact the null hypothesis is true. This is called a Type I error.

It is also possible for the test statistic to be in the acceptance zone when the alternative hypothesis is true (in other words the null hypothesis is false). This is called a Type II error. Power is 1 minus the probability of a Type II error and is therefore the probability of correctly rejecting a false null hypothesis. Whereas the probability of a Type I error is set at the desired level, the probability of a Type II error depends on the actual value under the alternative hypothesis.

Coolican (1990) sets out the possible outcomes in the following table:

                         In acceptance zone    In rejection zone
NULL Hypothesis TRUE     OK                    Type I error
NULL Hypothesis FALSE    Type II error         OK
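The link between the significance level, Type I errors and power can be illustrated by simulation. The following is a rough sketch rather than part of the original analysis. It assumes 15 children per group, a mean height of 150 cm, a standard deviation of 7 cm and a true difference of 5 cm, all of which are hypothetical figures. The value 1.701 is the tabulated one-tailed 5% critical value of t for 28 degrees of freedom.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Independent samples t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

def rejection_rate(true_diff, n=15, sd=7.0, t_crit=1.701, trials=2000, seed=1):
    """Proportion of simulated samples whose t statistic falls in the rejection zone."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        males = [rng.gauss(150.0 + true_diff, sd) for _ in range(n)]
        females = [rng.gauss(150.0, sd) for _ in range(n)]
        if pooled_t(males, females) > t_crit:
            rejections += 1
    return rejections / trials

# Null hypothesis true: every rejection is a Type I error.
type_i_rate = rejection_rate(true_diff=0.0)
# Null hypothesis false (hypothetical 5 cm difference): rejections are correct.
power = rejection_rate(true_diff=5.0)
print(type_i_rate, power)
```

The first figure should come out close to the chosen 0.05, while the second estimates the power for this particular assumed difference. A smaller true difference or a smaller sample would give lower power.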

Test method

The data for gender is categorical and for height the data is ratio. The sample is effectively split into 2 sub-sets for male and female.


Most books give the independent samples t-test as the main method for testing this hypothesis, e.g. Curwin and Slater (2001), Swift (2001).

Bland (2000) states that in order to use this test the samples must both be from normal populations and additionally the distributions must have the same variance. Bland also suggests modifications to the test when the variances cannot be assumed to be the same. Programs like SPSS will calculate the test for both equal and non-equal variances. SPSS also gives a test for equality of variances.
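The text does not say which modification is meant. One widely used option when variances are unequal is Welch's t-test, and the sketch below assumes that choice. The heights are made-up values in cm and the function name welch_t is hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean a
    vb = statistics.variance(b) / len(b)  # squared standard error of mean b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

males = [150.0, 152.0, 154.0]    # made-up heights in cm
females = [149.0, 150.0, 151.0]  # made-up heights in cm
t, df = welch_t(males, females)
print(round(t, 3), round(df, 3))
```

Unlike the pooled test, the variances are not combined, and the degrees of freedom shrink as the two sample variances become more unequal.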

When the assumptions of normality and independence are met then the t-test is the best test according to Bland because it has higher power than the equivalent non-parametric test, which is the Mann-Whitney U-test. However, the Mann-Whitney test is more robust in that it does not assume that the data is normally distributed.
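To show what the Mann-Whitney test works with, the U statistic can be computed by direct counting. This is a minimal sketch on made-up heights. It only produces U, and the significance would still have to be read from tables of U or from a normal approximation, neither of which is shown.

```python
def mann_whitney_u(a, b):
    """U for sample a: number of (a, b) pairs with a > b, ties counting one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

males = [150, 152, 154, 149]  # made-up heights in cm
females = [149, 150, 151]     # made-up heights in cm
u_males = mann_whitney_u(males, females)
u_females = mann_whitney_u(females, males)
print(u_males, u_females)
```

The two U values always add up to the number of pairs (here 4 x 3 = 12). Only the ordering of the observations matters, which is why the test does not rely on normality.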

It is a matter of weighing up the pros and cons. If normality can be assumed then the independent samples t-test is best. If not, then the U-test should be used. Graphical checks such as a histogram or Q-Q (quantile-quantile) plot can be used to check normality to help the decision (Bland, 2000).
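The points for a normal Q-Q plot can be produced without any plotting library. The sketch below assumes the plotting position (i + 0.5)/n, which is one common convention among several, and uses made-up heights. If the data are roughly normal the two columns should increase together along a straight line.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pairs of (theoretical normal quantile, standardised observed value)."""
    n = len(sample)
    standard_normal = NormalDist()
    m, s = mean(sample), stdev(sample)
    ordered = sorted(sample)
    return [(standard_normal.inv_cdf((i + 0.5) / n), (x - m) / s)
            for i, x in enumerate(ordered)]

heights = [142, 145, 147, 150, 151, 153, 158]  # made-up heights in cm
points = qq_points(heights)
for theoretical, observed in points:
    print(f"{theoretical:6.2f} {observed:6.2f}")
```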

Because the test is one-sided we would be looking for the male mean to be higher and the critical value to come from 0.05 in the one tail of the distribution. For the t-test this would be looked up with n1 + n2 - 2 degrees of freedom, where n1 and n2 are the numbers of males and females respectively.
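The one-sided decision rule can be written out as follows. This is a sketch on made-up heights with only three children per group, so the degrees of freedom are 4 and the tabulated one-tailed 5% critical value is 2.132. For the 30 children in the data set the degrees of freedom would be 28 and the corresponding value 1.701. The function name is hypothetical.

```python
import math
import statistics

def one_sided_pooled_t_test(males, females, t_crit):
    """Return (t, degrees of freedom, reject null?) for H1: male mean is higher."""
    n1, n2 = len(males), len(females)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(males)
                  + (n2 - 1) * statistics.variance(females)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(males) - statistics.mean(females)) / se
    # One tail only: reject when t is large and positive.
    return t, df, t > t_crit

males = [150.0, 152.0, 154.0]    # made-up heights in cm
females = [149.0, 150.0, 151.0]  # made-up heights in cm
t, df, reject = one_sided_pooled_t_test(males, females, t_crit=2.132)
print(round(t, 3), df, reject)
```

Here t is about 1.549, which is below 2.132, so the null hypothesis would not be rejected for these invented figures.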

It is also useful to work out a 95% confidence interval for the population mean. This gives an idea of the precision of the estimate. Larger sample sizes will reduce the width of the confidence interval.

It was mentioned above that the inferences made are only valid for the population being sampled, and only so if the sample is representative, which means selecting the sample from the whole population such that each member has equal probability of selection.

Coolican (1990) says that a research finding is reliable if it can be repeated. So, if the sampling is repeated, obtaining the same result would indicate reliability.
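A confidence interval of this kind can be sketched for a single group mean. The heights are made up, and 2.776 is the tabulated two-sided 5% critical value of t for 4 degrees of freedom (n - 1 with five observations). A real analysis would look up the value matching its own sample size.

```python
import math
import statistics

def mean_confidence_interval(sample, t_crit):
    """Confidence interval for the mean: mean +/- t_crit * standard error."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - t_crit * se, m + t_crit * se

heights = [148.0, 149.0, 150.0, 151.0, 152.0]  # made-up heights in cm
low, high = mean_confidence_interval(heights, t_crit=2.776)
print(round(low, 2), round(high, 2))
```

Because the standard error divides by the square root of the sample size, quadrupling the sample roughly halves the width of the interval.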

Hypothesis 2: Taller children are heavier.

The null hypothesis is that there is no relationship between how tall children are and how much they weigh. The alternative hypothesis is that taller children are heavier, which calls for a one-sided test. That is, the alternative is not simply that there is a relationship, which would be two-sided.

Both heights and weights are ratio data. This enables the data to be examined by tests where normality is an underlying assumption.

In order to check the relationship visually, a scatter graph is essential. This would give an idea of the strength and nature of the relationship. The relationship may not be linear, as is often assumed. If it is not, the scatter should show some indication of a curve.

The strength of the relationship can be measured using the Pearson correlation coefficient (r). This is closely related to a regression analysis, which would fit a straight line equation to the data with height being the independent (x) variable and weight being the dependent (y) variable.

The correlation coefficient can be tested using a one-sided t-test. This has n - 2 degrees of freedom, 28 in this case. The value of r would need to be positive to indicate that taller children are heavier.

Analysis of the regression residuals can give us a lot more information than simply carrying out a correlation calculation (see Bland, 2000). They can be plotted to see whether they are normally distributed using a histogram or Q-Q plot. Non-linearity, if present, should also be apparent.

If the data shows a non-linear relationship then it would be necessary to transform the data using logs or other mathematical functions. The transformed variables would then need to be analysed for normality and linearity.
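The calculation can be sketched as below, using the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2). The five height and weight pairs are made up, so the degrees of freedom here are 3 rather than 28, and the tabulated one-tailed 5% critical value for 3 degrees of freedom is 2.353.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

heights = [140, 145, 150, 155, 160]  # made-up heights in cm
weights = [32, 34, 35, 34, 35]       # made-up weights in kg
r = pearson_r(heights, weights)
t = r_to_t(r, len(heights))
significant = r > 0 and t > 2.353    # one-sided: r must also be positive
print(round(r, 3), round(t, 3), significant)
```

With these invented figures r is about 0.775 and t about 2.121, which falls short of 2.353, so the correlation would not be significant at the 5% level with so few observations.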
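A least squares line and its residuals can be obtained as in the sketch below, which regresses weight on height for five made-up pairs of values. The residuals are what would then be examined with a histogram or Q-Q plot.

```python
def fit_line(x, y):
    """Least squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

heights = [140, 145, 150, 155, 160]  # made-up heights in cm (x)
weights = [32, 34, 35, 34, 35]       # made-up weights in kg (y)
slope, intercept = fit_line(heights, weights)

# Residual = observed weight minus the weight predicted by the line.
residuals = [w - (intercept + slope * h) for h, w in zip(heights, weights)]
print(round(slope, 2), round(intercept, 2))
print([round(e, 2) for e in residuals])
```

Residuals from a least squares line always sum to zero, so it is their pattern that matters. A run of negative, then positive, then negative residuals along the height axis would suggest a curve rather than a straight line.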

According to Bland there is an alternative to the Pearson correlation coefficient which does not assume that the data is normally distributed. This is the Spearman Rank Correlation Coefficient. This is based on the distribution of the ranks of the data and not the data itself. This makes it more robust in terms of departures from assumptions; however, it is less powerful. In other words there is more chance of making a Type II error.

Again, if the sampling is repeated, obtaining the same result would indicate reliability.
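One way to compute the Spearman coefficient is to rank each variable, giving tied values the average of their ranks, and then apply the Pearson formula to the ranks. The sketch below follows that approach on made-up data, and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

heights = [140, 145, 150, 155, 160]  # made-up heights in cm
weights = [32, 34, 35, 34, 35]       # made-up weights in kg
rho = spearman_rho(heights, weights)
curved = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])  # increasing but not linear
print(round(rho, 3), round(curved, 3))
```

The second call shows the robustness described above: a relationship that always increases but is strongly curved still has a rank correlation of 1.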

Summary

The stages that need to be gone through in order to test hypotheses such as those above are as follows.

References

Bland, J. Martin (2000) An Introduction to Medical Statistics, 3rd Edition, Oxford, Oxford Medical Publications

Curwin, Jon and Slater, Roger (2001) Quantitative Methods for Business Decisions, London, Thomson Learning

Coolican, Hugh (1990) Research Methods and Statistics in Psychology, London, Hodder and Stoughton

Swift, Louise (2001) Quantitative Methods for Business, Management and Finance, Basingstoke, Palgrave
