# Statistics Essays | Analysis of Data

1049 words (4 pages) Essay

24th Apr 2017, Statistics

**Disclaimer:** This work has been submitted by a university student.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UKEssays.com.

## Consider and discuss the required approach to analysis of the data set provided.

As part of this, also explore how you would test the hypotheses below and explain the reasons for your decisions. Hypothesis 1: Male children are taller than female children. Null hypothesis: There is no difference in height between male children and female children. Hypothesis 2: Taller children are heavier. Null hypothesis: There is no relationship between how tall children are and how much they weigh.

Analysis of data set

The data set records, for each of 30 children, their gender, age, height, weight, upper and lower limb lengths, eye colour, whether or not they like chocolate, and IQ.

There are two main things to consider before analysing the data: the types of data and the quality of the data as a sample.

Types of data can be nominal, ordinal, interval or ratio. Nominal data is also known as categorical data. Coolican (1990) gives more detail on each of these, and his definitions have been used to classify the variables in the data set.

It is also helpful to distinguish between continuous numbers, which can be measured to any number of decimal places, and discrete numbers, such as integers, which move in finite jumps (1, 2 and so on).

Gender

This variable can only distinguish between male and female. There is no order to these categories, so the data is nominal.

Age

This variable takes integer values. It could be measured to decimal places, but is generally recorded only as an integer. It is ratio data because, for example, it is meaningful to say that a 20-year-old is twice as old as a 10-year-old.

In this data set, ages range from 120 months to 156 months (10 to 13 years). This range needs to be consistent with the population about which inferences are being made.

Height

This variable can take values to decimal places if necessary. Again it is ratio data because, for example, it is meaningful to say that a person who is 180 cm tall is 1.5 times as tall as someone who is 120 cm tall. In this sample it is measured to the nearest cm.

Weight

Like height, this variable could be measured to decimal places and is ratio data. In this sample it is measured to the nearest kg.

Upper and lower limb lengths

Like height and weight, these variables are ratio data.

Eye colour

This variable can take a limited number of values, namely the eye colours. Their order is not meaningful, so the data is nominal (categorical).

Like of chocolate or not

As with eye colour, this variable can take a limited number of values, which are the sample members' preferences. Because it merely distinguishes between liking and disliking, there is no meaningful order, so the data is nominal (categorical).

IQ

IQ is a scale score obtained by testing each sample member. It is not a ratio scale, because it would not be meaningful to say, for example, that someone with a score of 125 is 25% more intelligent than someone with a score of 100: the scale has no true zero. Differences between scores can be compared, so IQ is treated as interval data.
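
The interval/ratio distinction can be checked numerically. The sketch below assumes the conventional IQ scaling (mean 100, SD 15) and re-expresses scores as T-scores (mean 50, SD 10); the function names are illustrative only. A ratio claim such as "25% more" changes under this equally valid relabelling, whereas a comparison of differences does not.

```python
import math

def iq_to_t_score(iq: float) -> float:
    """Re-express an IQ score as a T-score (mean 50, SD 10).

    Assumes IQ is scaled to mean 100 and SD 15. The map is linear,
    a * x + b, so it changes both the unit and the zero point.
    """
    return 50 + 10 * (iq - 100) / 15

def ratio(x: float, y: float) -> float:
    return x / y

def ratio_of_differences(x1: float, x2: float, y1: float, y2: float) -> float:
    return (x1 - x2) / (y1 - y2)

# "125 is 25% more than 100" holds on the IQ scale...
iq_ratio = ratio(125, 100)
# ...but not after an equally valid relabelling, so the claim is not meaningful.
t_ratio = ratio(iq_to_t_score(125), iq_to_t_score(100))

# A comparison of differences survives the relabelling.
iq_diff = ratio_of_differences(125, 100, 115, 100)
t_diff = ratio_of_differences(iq_to_t_score(125), iq_to_t_score(100),
                              iq_to_t_score(115), iq_to_t_score(100))

print(f"ratio 125:100      IQ scale {iq_ratio:.3f}, T scale {t_ratio:.3f}")
print(f"difference ratio   IQ scale {iq_diff:.3f}, T scale {t_diff:.3f}")
print("differences preserved:", math.isclose(iq_diff, t_diff))
```

Running it shows the 125:100 ratio moving from 1.250 to 1.333, while the ratio of differences stays at 1.667 on both scales, which is why only differences between IQ scores are treated as meaningful.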


There is another level of data mentioned by Coolican into which none of the variables in the data set fit: ordinal data. Ordinal data have an order or rank that makes sense, but the gaps between ranks are not necessarily equal. An example would be if 10 students took a test and you recorded who finished quickest, 2nd quickest and so on, but not their actual times.
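
The classification of the variables can be recorded as a small lookup table, which is a convenient way to check which summaries are appropriate before analysis. This is a sketch: the variable names are hypothetical labels for the data set's columns, IQ is entered as interval data, and the permitted summaries follow the usual cumulative convention (each level allows everything the level below it does).

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered so that higher levels allow more."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Hypothetical column names for the 30-child data set.
VARIABLES = {
    "gender": Level.NOMINAL,
    "age_months": Level.RATIO,
    "height_cm": Level.RATIO,
    "weight_kg": Level.RATIO,
    "upper_limb_length": Level.RATIO,
    "lower_limb_length": Level.RATIO,
    "eye_colour": Level.NOMINAL,
    "likes_chocolate": Level.NOMINAL,
    "iq": Level.INTERVAL,
}

def permitted_summaries(level):
    """Return the summaries that are meaningful at a given level."""
    summaries = ["frequency counts", "mode"]
    if level >= Level.ORDINAL:
        summaries += ["ranks", "median"]
    if level >= Level.INTERVAL:
        summaries += ["differences", "mean"]
    if level >= Level.RATIO:
        summaries.append("ratios (e.g. 'twice as tall')")
    return summaries

for name, level in VARIABLES.items():
    print(f"{name:18} {level.name:9} {', '.join(permitted_summaries(level))}")
```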

The data is intended to be a sample from a population about which we can make inferences. For example, in the hypothesis tests we want to know whether differences observed in the sample indicate differences in the population. Results can only be generalised to the population from which the sample was drawn; generalising them to any other population would not be valid.

Details of sampling methods were found in Bland (2000). To meet the required objectives, the sample has to be representative of the defined population. It would also be more representative if it were stratified by known factors such as gender and age. This means that, for example, the proportion of males in the sample is the same as the proportion in the population.
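
Proportional stratification can be sketched as follows. Each stratum (here, gender) receives a share of the sample equal to its share of the population, and any places lost to rounding down go to the strata with the largest remainders so that the total is exactly n. `stratified_sample` is a hypothetical helper, not a library function, and the population in the usage lines (60 boys and 40 girls) is invented purely for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Draw a proportional stratified random sample of exactly n records.

    population: sequence of records; stratum_of: function mapping a record
    to its stratum label (e.g. gender). Requires n <= len(population).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in population:
        strata[stratum_of(record)].append(record)

    total = len(population)
    # Integer arithmetic: floor of each stratum's quota, plus its remainder.
    alloc = {s: n * len(m) // total for s, m in strata.items()}
    remainder = {s: n * len(m) % total for s, m in strata.items()}
    leftover = n - sum(alloc.values())
    # Largest-remainder rule hands out the places lost to rounding down.
    for s in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[s] += 1

    sample = []
    for s, members in strata.items():
        sample.extend(rng.sample(members, alloc[s]))  # simple random draw within stratum
    return sample


# Usage with an invented population of 60 boys and 40 girls.
population = [("M", i) for i in range(60)] + [("F", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda r: r[0], n=30, seed=42)
print(sum(1 for r in sample if r[0] == "M"), "boys,",
      sum(1 for r in sample if r[0] == "F"), "girls")
```

With this population the sample keeps the 60:40 split, giving 18 boys and 12 girls out of 30.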

Sample size is another consideration. In this case it is 30. Whether this is adequate for the hypotheses being tested is examined below.

Hypothesis 1: Male children are taller than female children.

Swift (2001) gives a very readable account of the hypothesis testing process and the structure of the test.

The first step is to set up the hypotheses:

The null hypothesis is that there is no difference in height between male children and female children.

If the alternative hypothesis were non-directional, which Coolican describes as one where “we do not predict in which direction the results will go”, a two-tailed test would be used. Here the alternative is that males are taller, which specifies a direction, so a one-tailed test is required.
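
The practical effect of choosing one or two tails shows up in the critical values. This sketch assumes the test statistic is approximately standard normal (a large-sample simplification; with small groups a t distribution gives somewhat larger critical values) and uses Python's `statistics.NormalDist`. The statistic of 1.8 in the usage lines is a made-up value, not a result from the data set.

```python
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist()  # standard normal distribution: mean 0, SD 1

# One-tailed: all of alpha sits in the upper tail (H1: males are taller).
one_tailed_critical = Z.inv_cdf(1 - ALPHA)
# Two-tailed: alpha is split equally between the two tails.
two_tailed_critical = Z.inv_cdf(1 - ALPHA / 2)

def p_value(stat, tails):
    """p-value for a standard-normal statistic; upper tail only if tails == 1."""
    if tails == 1:
        return 1 - Z.cdf(stat)
    return 2 * (1 - Z.cdf(abs(stat)))

print(f"critical values: one-tailed {one_tailed_critical:.3f}, "
      f"two-tailed {two_tailed_critical:.3f}")
stat = 1.8  # hypothetical test statistic
print(f"p-values for z = {stat}: one-tailed {p_value(stat, 1):.3f}, "
      f"two-tailed {p_value(stat, 2):.3f}")
```

The critical values come out as 1.645 and 1.960. A statistic of 1.8 would be significant at the 0.05 level one-tailed (p ≈ 0.036) but not two-tailed (p ≈ 0.072), so the direction of the alternative has to be fixed before the data are seen.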

To test the hypothesis we need to set up a test statistic and then either compare it with a pre-determined critical value or calculate the probability of obtaining a value at least as extreme as the sample value, assuming that the null hypothesis is true.
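
The essay does not name the test statistic to be used; an independent-samples t-test is a common choice for comparing two group means. As a stand-in that needs only the standard library and makes no normality assumption, the sketch below uses a one-tailed permutation test, a different technique from the t-test. It follows the probability route directly: it estimates how often a difference at least as large as the observed one would arise if gender labels were irrelevant, as the null hypothesis states. The function name and the heights in the usage lines are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test_greater(group_a, group_b, n_resamples=10_000, seed=0):
    """One-tailed permutation test of H1: mean(group_a) > mean(group_b).

    Returns (observed difference, estimated p-value). The p-value is the
    proportion of random relabellings of the pooled data whose difference in
    means is at least the observed one; the +1 terms count the observed
    labelling itself, so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # under H0 every relabelling is equally likely
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_resamples + 1)


# Usage with hypothetical heights in cm (not the essay's data).
boys = [152, 148, 155, 150, 149]
girls = [147, 150, 146, 151, 145]
diff, p = permutation_test_greater(boys, girls)
print(f"observed difference {diff:.1f} cm, one-tailed p = {p:.3f}")
```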

The most commonly used significance level is 0.05. According to Swift (2001), the significance level must be decided before the data are known. This stops researchers adjusting the significance level to get the result they want, rather than accepting or rejecting the null hypothesis objectively.

If the p-value (the probability of a result at least this extreme when the null hypothesis is true) is less than 0.05, we would reject the null hypothesis of no difference between males and females in favour of the one-sided alternative that males are taller.

However, it is possible for the test statistic to fall in the rejection region when the null hypothesis is in fact true. This is called a Type I error, and its probability is the chosen significance level.
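
The link between the significance level and the Type I error rate can be checked by simulation. In this sketch both groups are drawn from the same normal distribution, so the null hypothesis is true by construction, and a one-tailed z-test with a known standard deviation (a simplifying assumption) is applied repeatedly. The mean of 150 cm and SD of 7 cm are arbitrary illustrative values; when the null is true the rejection rate does not depend on them.

```python
import random
from statistics import NormalDist

def simulate_type_i_rate(n_per_group=15, mu=150.0, sd=7.0,
                         alpha=0.05, trials=20_000, seed=1):
    """Estimate how often a one-tailed z-test rejects a true H0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha)
    se = sd * (2 / n_per_group) ** 0.5  # SE of a difference between two means
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(mu, sd) for _ in range(n_per_group)]
        b = [rng.gauss(mu, sd) for _ in range(n_per_group)]  # same population
        z = (sum(a) / n_per_group - sum(b) / n_per_group) / se
        if z > critical:
            rejections += 1  # a Type I error: H0 is true but rejected
    return rejections / trials

print(f"estimated Type I error rate: {simulate_type_i_rate():.3f}")
```

The printed rate should be close to 0.05; changing `alpha` moves it accordingly, which is the sense in which the researcher sets the Type I error rate.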

It is also possible for the test statistic to fall in the acceptance region when the alternative hypothesis is true (in other words, when the null hypothesis is false). This is called a Type II error. Power is 1 minus the probability of a Type II error, and is therefore the probability of correctly rejecting a false null hypothesis. Whereas the probability of a Type I error is fixed in advance by the chosen significance level, the probability of a Type II error depends on the true size of the difference, as well as on the sample size and significance level.
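
Because the Type II error depends on the true difference, power is usually computed for an assumed effect size. The sketch below uses a normal approximation for a one-tailed two-sample test with equal group sizes and a known common SD, giving power = 1 − Φ(z₁₋α − d√(n/2)), where d is the difference in means divided by the SD and n is the size of each group. The effect size d = 0.5 and the even 15/15 split of the 30 children are illustrative assumptions, not figures from the data set; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_one_tailed(d, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed two-sample z-test.

    Under H1 the test statistic is centred on d * sqrt(n/2), so
    power = 1 - Phi(z_crit - d * sqrt(n/2)).
    """
    critical = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(critical - d * math.sqrt(n_per_group / 2))

def n_per_group_for_power(d, target_power=0.8, alpha=0.05):
    """Smallest equal group size reaching target_power under the same model."""
    n = 2 * ((Z.inv_cdf(1 - alpha) + Z.inv_cdf(target_power)) / d) ** 2
    return math.ceil(n)

# 30 children split evenly (assumed) gives 15 per group.
print(f"power with 15 per group, d = 0.5: {power_one_tailed(0.5, 15):.2f}")
print(f"per-group size for 80% power: {n_per_group_for_power(0.5)}")
```

Under these assumptions, 15 per group gives power of only about 0.39 for d = 0.5, and about 50 per group would be needed for 80% power, so a sample of 30 could easily miss a real difference of that size.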

Coolican (1990) sets out the possible outcomes in the following table:
