# Classifying Data And Defining Variables Biology Essay

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Saving the file is straightforward: I pressed Ctrl+S, which opened the dialog shown in figure 3, named the file kkkkkkk.sav, and clicked Save to finish.

Figure 3

b. In the Variable View window define all your variables: explain how you have assigned each variable a name; show the variables which you have assigned values to and state which measurement scale you have chosen for each of the variables and why.

Naming the variables is straightforward: I double-clicked the cells in the Name and Label columns and typed in the names and labels I wanted; the result is shown in figure 4.

Figure 4

In this part, I needed to recode the Gender column according to the questionnaire given in the assessment: every "M" becomes "1" and every "F" becomes "2". In the "Transform" menu, I chose "Recode into Same Variables", which opened the dialog in figure 5, and then moved Gender into the String Variables box.

Figure 5

After clicking "Old and New Values" (see figure 6), I entered M in the Old Value box and 1 in the New Value box and clicked Add, so that every "M" in the Gender column is changed to "1"; I then did the same for "F" and "2". All the values in the Gender column are now in the proper form (see figure 9).

Figure 6
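The same recode can be expressed outside SPSS. The following is a minimal Python sketch, not the SPSS procedure itself; the sample list `gender_raw` and the function name `recode_gender` are hypothetical and only illustrate the M to 1, F to 2 mapping described above.

```python
# Hypothetical sample of the Gender column as typed in from the questionnaire.
gender_raw = ["M", "F", "M", "M", "F"]

# Old value -> new value, mirroring the rules added in "Old and New Values".
RECODE_RULES = {"M": 1, "F": 2}


def recode_gender(values):
    """Return the numeric codes; a KeyError signals a value with no rule."""
    return [RECODE_RULES[value] for value in values]


gender = recode_gender(gender_raw)
print(gender)
```

Raising an error on an unmapped value is a deliberate choice in this sketch, so that a mistyped entry is noticed rather than silently kept.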

Next I defined the variables: I switched to the Variable View window and set the Measure of each variable to the appropriate level (see figure 7).

A variable can be treated as nominal when its values represent categories with no intrinsic ranking, so variables such as Gender, Major, Undergraduate Specialization and Employment Status can be regarded as "Nominal".

A variable can be treated as ordinal when its values represent categories with some intrinsic ranking, so the Satisfaction Advisement variable can be treated as "Ordinal".

A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. As a result, variables such as age, height, Graduate GPA, Undergraduate GPA, number of jobs and expected/anticipated salary can all be set to "Scale".

Figure 7

The last step is to define the value labels. Taking respondent's gender as an example, I clicked the Values cell, which opened the Value Labels dialog (figure 8), added value = 1 with label = Male and value = 2 with label = Female, and clicked OK. I defined the other value labels in the same way; figure 9 shows the result once everything was done.

Figure 8

Figure 9

c. Briefly describe the variables in the data by classifying them according to the type of data: quantitative/numerical versus qualitative/non-numerical; categorical versus continuous and according to the levels (or scales) of measurement they represent: nominal, ordinal or interval. Give at least two examples of the types of analysis that are appropriate for each classified group of variables.

Quantitative/numerical data deal with numbers: they are identified or measured on a numerical scale. Statistical methods can be used to analyse numerical data, and tables, charts, histograms and graphs are used to present the results. Examples of quantitative/numerical data in this study are respondent's age and height and their expected/anticipated salary.

Qualitative/non-numerical data describe items in terms of "some quality or categorization that may be 'informal' or may use relatively ill-defined characteristics" (Dodge, 2003). This is almost the converse of quantitative data. However, if data on individual items that were originally obtained as qualitative information are summarised by means of counts, they may give rise to quantitative data. Examples of such data are respondent's current major area of study and their undergrad specialization.

Categorical variables represent data that can be divided into groups. Examples of this kind of variable are respondent's gender, current major area of study and undergrad specialization.

Continuous data are not discrete; they can take any numerical value within a certain range. Examples of such data are respondent's age and height.

A variable can be treated as nominal when its values represent categories with no intrinsic ranking, so variables such as Gender, Major, Undergraduate Specialization and Employment Status can be regarded as "Nominal".

A variable can be treated as ordinal when its values represent categories with some intrinsic ranking, so the Satisfaction Advisement variable can be treated as "Ordinal".

A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. As a result, variables such as age, height, Graduate GPA, Undergraduate GPA, number of jobs and expected/anticipated salary can all be set to "Scale".

[2] Descriptives

a. Run and present relevant descriptive statistics (including graphs) on all the categorical variables.

I first analysed the descriptive statistics of "respondent's gender" and "employment status" using "Frequencies" in the Analyze menu. Figure 10 shows that the data are complete, with no missing values, and reports the mean, median, mode, standard deviation and variance.

The frequency tables in figure 11 and figure 12 give the frequency and percentage of each category of "gender" and "employment status", and the pie chart (figure 13) shows the result more directly. Males and females account for 60% and 40% of the respondents, respectively, and "full-time", "part-time" and "unemployed" account for 82.5%, 12.5% and 5%, respectively.

Figure 10

Statistics

| | Respondent's gender | Employment Status |
|---|---|---|
| N Valid | 40 | 40 |
| N Missing | 0 | 0 |
| Mean | 1.40 | 1.23 |
| Median | 1.00 | 1.00 |
| Mode | 1 | 1 |
| Std. Deviation | .496 | .530 |
| Variance | .246 | .281 |
| Range | 1 | 2 |

Figure 11

Respondent's gender

| | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Male | 24 | 60.0 | 60.0 | 60.0 |
| Female | 16 | 40.0 | 40.0 | 100.0 |
| Total | 40 | 100.0 | 100.0 | |

Figure 12

Employment Status

| | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Full-time | 33 | 82.5 | 82.5 | 82.5 |
| Part-time | 5 | 12.5 | 12.5 | 95.0 |
| Unemployed | 2 | 5.0 | 5.0 | 100.0 |
| Total | 40 | 100.0 | 100.0 | |
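The percentages in these frequency tables follow directly from the counts. As a rough illustration, assuming only the counts reported in figures 11 and 12 (the function name `frequency_table` is hypothetical, not an SPSS call), they can be recomputed in Python:

```python
def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, frequency in counts.items():
        running += frequency
        rows.append((
            category,
            frequency,
            round(100 * frequency / total, 1),
            round(100 * running / total, 1),
        ))
    return rows


# Counts taken from the frequency tables in figures 11 and 12.
gender_counts = {"Male": 24, "Female": 16}
employment_counts = {"Full-time": 33, "Part-time": 5, "Unemployed": 2}

for row in frequency_table(gender_counts) + frequency_table(employment_counts):
    print(row)
```

The cumulative percentage is computed from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.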

Figure 13

I did almost the same for "current major area of study" and "undergrad specialization" (see figures 14, 15 and 16), except that I chose a bar chart to present the result (figure 17), because a pie chart becomes hard to read when it is cut into too many slices.

Figure 14

Statistics

| | Current major area of study | Undergrad Specialization |
|---|---|---|
| N Valid | 40 | 40 |
| N Missing | 0 | 0 |
| Mean | 3.30 | 3.85 |
| Median | 3.00 | 3.00 |
| Mode | 1 | 2 |
| Std. Deviation | 1.990 | 2.751 |
| Variance | 3.959 | 7.567 |
| Range | 6 | 9 |

Figure 15

Current major area of study

| | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Accounting | 10 | 25.0 | 25.0 | 25.0 |
| Economics/ Finance | 9 | 22.5 | 22.5 | 47.5 |
| Information Systems | 4 | 10.0 | 10.0 | 57.5 |
| International Business | 2 | 5.0 | 5.0 | 62.5 |
| Management | 7 | 17.5 | 17.5 | 80.0 |
| Marketing/ Retailing | 7 | 17.5 | 17.5 | 97.5 |
| Undecided | 1 | 2.5 | 2.5 | 100.0 |
| Total | 40 | 100.0 | 100.0 | |

Figure 16

Undergrad Specialization

| | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Other | 3 | 7.5 | 7.5 | 7.5 |
| Biological Sciences | 2 | 5.0 | 5.0 | 12.5 |
| Business Administration | 11 | 27.5 | 27.5 | 40.0 |
| Computer or Maths | 9 | 22.5 | 22.5 | 62.5 |
| Education | 2 | 5.0 | 5.0 | 67.5 |
| Engineering | 3 | 7.5 | 7.5 | 75.0 |
| Humanities | 2 | 5.0 | 5.0 | 80.0 |
| Performing Arts | 1 | 2.5 | 2.5 | 82.5 |
| Physical Sciences | 2 | 5.0 | 5.0 | 87.5 |
| Social Sciences | 5 | 12.5 | 12.5 | 100.0 |
| Total | 40 | 100.0 | 100.0 | |

Figure 17

b. Run and present relevant descriptive statistics (including graphs) on all the interval/ratio variables.

Again, I used the Frequencies function to prepare the descriptive statistics for the interval/ratio variables. Figure 18 shows that no data are missing and reports the mean, median, mode, standard deviation and variance.

Figure 18

Statistics

| | Respondent's age | Respondent's height | Number of Jobs | Expected Salary | Anticipated Salary in 5 Years | GMAT score | UNdergrad GPA | Graduate GPA |
|---|---|---|---|---|---|---|---|---|
| N Valid | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| N Missing | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mean | 29.73 | 67.28 | 2.13 | 70.00 | 108.13 | 554.75 | 3.293 | 3.515 |
| Median | 29.50 | 67.00 | 2.00 | 65.00 | 100.00 | 560.00 | 3.265 | 3.510 |
| Mode | 25 | 63 | 1ᵃ | 60 | 100 | 580 | 3.0ᵃ | 4.0 |
| Std. Deviation | 4.657 | 4.132 | 1.181 | 19.315 | 35.494 | 44.202 | .3051 | .2898 |
| Variance | 21.692 | 17.076 | 1.394 | 373.077 | 1259.856 | 1953.782 | .093 | .084 |
| Range | 19 | 14 | 5 | 80 | 190 | 190 | 1.3 | 1.0 |

a. Multiple modes exist. The smallest value is shown.

The frequency tables of the 8 variables are shown below:

Figure 19

Respondent's age

Frequency Percent Valid Percent Cumulative Percent

Valid 22 1 2.5 2.5 2.5

23 1 2.5 2.5 5.0

24 2 5.0 5.0 10.0

25 5 12.5 12.5 22.5

26 3 7.5 7.5 30.0

27 3 7.5 7.5 37.5

28 3 7.5 7.5 45.0

29 2 5.0 5.0 50.0

30 4 10.0 10.0 60.0

31 3 7.5 7.5 67.5

32 3 7.5 7.5 75.0

33 2 5.0 5.0 80.0

34 1 2.5 2.5 82.5

35 2 5.0 5.0 87.5

36 1 2.5 2.5 90.0

37 1 2.5 2.5 92.5

38 1 2.5 2.5 95.0

39 1 2.5 2.5 97.5

41 1 2.5 2.5 100.0

Total 40 100.0 100.0

Figure 20

Respondent's height

Frequency Percent Valid Percent Cumulative Percent

Valid 60 1 2.5 2.5 2.5

61 2 5.0 5.0 7.5

62 1 2.5 2.5 10.0

63 6 15.0 15.0 25.0

64 2 5.0 5.0 30.0

65 4 10.0 10.0 40.0

66 3 7.5 7.5 47.5

67 2 5.0 5.0 52.5

68 3 7.5 7.5 60.0

69 3 7.5 7.5 67.5

70 3 7.5 7.5 75.0

71 2 5.0 5.0 80.0

72 1 2.5 2.5 82.5

73 4 10.0 10.0 92.5

74 3 7.5 7.5 100.0

Total 40 100.0 100.0

Figure 21

Expected Salary

Frequency Percent Valid Percent Cumulative Percent

Valid 40 1 2.5 2.5 2.5

45 3 7.5 7.5 10.0

50 4 10.0 10.0 20.0

55 2 5.0 5.0 25.0

60 8 20.0 20.0 45.0

65 4 10.0 10.0 55.0

70 2 5.0 5.0 60.0

75 3 7.5 7.5 67.5

80 4 10.0 10.0 77.5

85 1 2.5 2.5 80.0

90 2 5.0 5.0 85.0

100 4 10.0 10.0 95.0

105 1 2.5 2.5 97.5

120 1 2.5 2.5 100.0

Total 40 100.0 100.0

Figure 22

Number of Jobs

Frequency Percent Valid Percent Cumulative Percent

Valid 0 1 2.5 2.5 2.5

1 13 32.5 32.5 35.0

2 13 32.5 32.5 67.5

3 8 20.0 20.0 87.5

4 3 7.5 7.5 95.0

5 2 5.0 5.0 100.0

Total 40 100.0 100.0

Figure 23

Anticipated Salary in 5 Years

Frequency Percent Valid Percent Cumulative Percent

Valid 60 1 2.5 2.5 2.5

65 2 5.0 5.0 7.5

75 2 5.0 5.0 12.5

80 3 7.5 7.5 20.0

85 4 10.0 10.0 30.0

90 4 10.0 10.0 40.0

95 1 2.5 2.5 42.5

100 8 20.0 20.0 62.5

110 2 5.0 5.0 67.5

120 2 5.0 5.0 72.5

125 1 2.5 2.5 75.0

130 1 2.5 2.5 77.5

135 1 2.5 2.5 80.0

140 1 2.5 2.5 82.5

150 5 12.5 12.5 95.0

160 1 2.5 2.5 97.5

250 1 2.5 2.5 100.0

Total 40 100.0 100.0

Figure 24

GMAT score

Frequency Percent Valid Percent Cumulative Percent

Valid 460 1 2.5 2.5 2.5

480 3 7.5 7.5 10.0

490 1 2.5 2.5 12.5

500 2 5.0 5.0 17.5

510 1 2.5 2.5 20.0

520 1 2.5 2.5 22.5

530 3 7.5 7.5 30.0

540 4 10.0 10.0 40.0

550 3 7.5 7.5 47.5

560 2 5.0 5.0 52.5

570 4 10.0 10.0 62.5

580 5 12.5 12.5 75.0

590 2 5.0 5.0 80.0

600 4 10.0 10.0 90.0

610 2 5.0 5.0 95.0

620 1 2.5 2.5 97.5

650 1 2.5 2.5 100.0

Total 40 100.0 100.0

Figure 25

UNdergrad GPA

Frequency Percent Valid Percent Cumulative Percent

Valid 2.6 1 2.5 2.5 2.5

2.8 1 2.5 2.5 5.0

2.8 1 2.5 2.5 7.5

2.9 1 2.5 2.5 10.0

3.0 1 2.5 2.5 12.5

3.0 1 2.5 2.5 15.0

3.0 1 2.5 2.5 17.5

3.0 1 2.5 2.5 20.0

3.0 2 5.0 5.0 25.0

3.1 1 2.5 2.5 27.5

3.1 1 2.5 2.5 30.0

3.1 1 2.5 2.5 32.5

3.2 1 2.5 2.5 35.0

3.2 1 2.5 2.5 37.5

3.2 2 5.0 5.0 42.5

3.2 1 2.5 2.5 45.0

3.2 1 2.5 2.5 47.5

3.3 1 2.5 2.5 50.0

3.3 1 2.5 2.5 52.5

3.3 1 2.5 2.5 55.0

3.3 1 2.5 2.5 57.5

3.3 1 2.5 2.5 60.0

3.3 1 2.5 2.5 62.5

3.4 1 2.5 2.5 65.0

3.4 2 5.0 5.0 70.0

3.5 1 2.5 2.5 72.5

3.5 1 2.5 2.5 75.0

3.6 1 2.5 2.5 77.5

3.6 1 2.5 2.5 80.0

3.6 1 2.5 2.5 82.5

3.7 1 2.5 2.5 85.0

3.7 1 2.5 2.5 87.5

3.7 1 2.5 2.5 90.0

3.8 1 2.5 2.5 92.5

3.8 1 2.5 2.5 95.0

3.8 1 2.5 2.5 97.5

3.9 1 2.5 2.5 100.0

Total 40 100.0 100.0

Graduate GPA

Frequency Percent Valid Percent Cumulative Percent

Valid 3.0 2 5.0 5.0 5.0

3.1 1 2.5 2.5 7.5

3.2 1 2.5 2.5 10.0

3.2 1 2.5 2.5 12.5

3.2 1 2.5 2.5 15.0

3.2 2 5.0 5.0 20.0

3.2 1 2.5 2.5 22.5

3.3 1 2.5 2.5 25.0

3.3 1 2.5 2.5 27.5

3.3 1 2.5 2.5 30.0

3.4 1 2.5 2.5 32.5

3.4 1 2.5 2.5 35.0

3.4 1 2.5 2.5 37.5

3.4 1 2.5 2.5 40.0

3.4 2 5.0 5.0 45.0

3.5 1 2.5 2.5 47.5

3.5 2 5.0 5.0 52.5

3.5 1 2.5 2.5 55.0

3.5 1 2.5 2.5 57.5

3.6 1 2.5 2.5 60.0

3.6 1 2.5 2.5 62.5

3.6 1 2.5 2.5 65.0

3.7 2 5.0 5.0 70.0

3.7 1 2.5 2.5 72.5

3.7 1 2.5 2.5 75.0

3.8 1 2.5 2.5 77.5

3.8 1 2.5 2.5 80.0

3.8 1 2.5 2.5 82.5

3.9 2 5.0 5.0 87.5

3.9 1 2.5 2.5 90.0

4.0 4 10.0 10.0 100.0

Total 40 100.0 100.0

I used bar charts to show the distributions of "Respondent's age", "Respondent's height", "Number of Jobs", "Expected Salary", "Anticipated Salary in 5 Years" and "GMAT score" (figures 26-28), and histograms (figure 29) to show the distributions of "Undergraduate GPA" and "Graduate GPA".

Figure 26

Figure 27

Figure 28

Figure 29

c. Perform a tabular and graphical analysis to explore the relationship between gender and employment status. Include a relevant test of association and interpret your results.

I used the "explore" function to analyse the relationship between gender and employment status. Gender is treated as the independent variable and employment status as the dependent variable. The results are as follows:

Figure 30

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Employment Status * Respondent's gender | 40 | 100.0% | 0 | .0% | 40 | 100.0% |

Figure 31

Employment Status * Respondent's gender Crosstabulation

| Employment Status | | Male | Female | Total |
|---|---|---|---|---|
| Full-time | Count | 20 | 13 | 33 |
| | % within Employment Status | 60.6% | 39.4% | 100.0% |
| Part-time | Count | 4 | 1 | 5 |
| | % within Employment Status | 80.0% | 20.0% | 100.0% |
| Unemployed | Count | 0 | 2 | 2 |
| | % within Employment Status | .0% | 100.0% | 100.0% |
| Total | Count | 24 | 16 | 40 |
| | % within Employment Status | 60.0% | 40.0% | 100.0% |

Figure 32

As can be seen in the bar chart (figure 32), the employment rate of males is higher than that of females.

Moreover, figure 33 shows the results of the chi-square tests. There are 2 degrees of freedom and the asymptotic significance of the test statistic is 0.147, which is higher than 0.05, so we cannot reject the hypothesis that the rows and columns of the contingency table are independent.

However, there are some problems with this. First, too many cells have an expected count of less than 5, so the test is not very accurate here. Another problem is that the test is usually used to examine differences between choices that people face at the same time, which may not fit this case since gender is relatively fixed.

Figure 33

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 3.838ᵃ | 2 | .147 |
| Likelihood Ratio | 4.585 | 2 | .101 |
| N of Valid Cases | 40 | | |

a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is .80.
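The Pearson chi-square value and the warning about small expected counts can both be recomputed from the crosstabulation in figure 31. The following Python sketch is only an illustration, not SPSS's own routine; the function name is hypothetical, and the closed-form p-value used here holds only because the table has 2 degrees of freedom.

```python
import math

# Observed counts from figure 31
# (rows: Full-time, Part-time, Unemployed; columns: Male, Female).
observed = [[20, 13], [4, 1], [0, 2]]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


chi2, df, expected = pearson_chi_square(observed)

# For df == 2 the upper-tail chi-square probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Cells whose expected count is below 5, the usual rule-of-thumb threshold.
small_cells = sum(e < 5 for row in expected for e in row)

print(round(chi2, 3), df, round(p_value, 3), small_cells)
```

Four of the six expected counts fall below 5, which is the same caveat that the footnote to figure 33 reports.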

[3] Distributions

a. Carry out appropriate tests of normality on the following variables

i. Age

ii. Height

iii. Graduate cumulative grade point average

iv. Undergraduate cumulative grade point average

v. GMAT score

I used the "explore" function to test the normality of these variables.

The Descriptives table (figure 34) shows the 5% Trimmed Mean and the Skewness of these variables; I will use these two figures to interpret the result:

Figure 34

Descriptives

Statistic Std. Error

Respondent's age Mean 29.73 .736

95% Confidence Interval for Mean Lower Bound 28.24

Upper Bound 31.21

5% Trimmed Mean 29.56

Median 29.50

Variance 21.692

Std. Deviation 4.657

Minimum 22

Maximum 41

Range 19

Interquartile Range 7

Skewness .538 .374

Kurtosis -.333 .733

Respondent's height Mean 67.28 .653

95% Confidence Interval for Mean Lower Bound 65.95

Upper Bound 68.60

5% Trimmed Mean 67.28

Median 67.00

Variance 17.076

Std. Deviation 4.132

Minimum 60

Maximum 74

Range 14

Interquartile Range 8

Skewness .127 .374

Kurtosis -1.139 .733

Graduate GPA Mean 3.515 .0458

95% Confidence Interval for Mean Lower Bound 3.422

Upper Bound 3.608

5% Trimmed Mean 3.517

Median 3.510

Variance .084

Std. Deviation .2898

Minimum 3.0

Maximum 4.0

Range 1.0

Interquartile Range .5

Skewness .136 .374

Kurtosis -.914 .733

UNdergrad GPA Mean 3.293 .0482

95% Confidence Interval for Mean Lower Bound 3.195

Upper Bound 3.391

5% Trimmed Mean 3.297

Median 3.265

Variance .093

Std. Deviation .3051

Minimum 2.6

Maximum 3.9

Range 1.3

Interquartile Range .5

Skewness -.034 .374

Kurtosis -.322 .733

GMAT score Mean 554.75 6.989

95% Confidence Interval for Mean Lower Bound 540.61

Upper Bound 568.89

5% Trimmed Mean 555.00

Median 560.00

Variance 1953.782

Std. Deviation 44.202

Minimum 460

Maximum 650

Range 190

Interquartile Range 58

Skewness -.269 .374

Kurtosis -.442 .733

Method 1: 5% Trimmed Mean - This is the mean that would be obtained if the lowest 5% and the highest 5% of the values of the variable were deleted. All five 5% trimmed means are very close to the corresponding means, so if there are outliers, their effect is not significant.

Method 2: Skewness - Skewness measures the degree and direction of asymmetry. None of the skewness values is far from zero, especially that of Undergraduate GPA, which is -0.034. The skewness of Respondent's age is furthest from zero at 0.538, but it is still less than twice its standard error (2 x 0.374 = 0.748), so all these variables can be regarded as approximately symmetric. To be specific, the skewness of Respondent's age, height and Graduate GPA is positive, and their means are above their medians. The other two variables have negative skewness, although only GMAT score has a mean (554.75) below its median (560.00); the mean of Undergraduate GPA (3.293) is slightly above its median (3.265).
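The standard error of skewness depends only on the sample size, so the "twice the standard error" check can be reproduced by hand. The sketch below assumes the usual formula for the standard error of sample skewness, which reproduces the .374 reported in figure 34 for n = 40; the dictionary and function names are hypothetical.

```python
import math


def skewness_standard_error(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Skewness statistics copied from the Descriptives table (figure 34).
skewness = {
    "Respondent's age": 0.538,
    "Respondent's height": 0.127,
    "Graduate GPA": 0.136,
    "Undergraduate GPA": -0.034,
    "GMAT score": -0.269,
}

se = skewness_standard_error(40)

# A variable passes the rule of thumb when |skewness| < 2 * standard error.
passes = {name: abs(value) < 2 * se for name, value in skewness.items()}

print(round(se, 3))
print(passes)
```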

Method 3: The Tests of Normality table (figure 35) shows the Kolmogorov-Smirnov and Shapiro-Wilk tests for these variables. The Sig. values (p) are all greater than 0.05, so none of the distributions deviates significantly from a comparable normal distribution.

Figure 35

Tests of Normality

| | Kolmogorov-Smirnovᵃ Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Respondent's age | .096 | 40 | .200\* | .964 | 40 | .229 |
| Respondent's height | .109 | 40 | .200\* | .948 | 40 | .066 |
| Graduate GPA | .083 | 40 | .200\* | .962 | 40 | .193 |
| UNdergrad GPA | .072 | 40 | .200\* | .985 | 40 | .851 |
| GMAT score | .110 | 40 | .200\* | .975 | 40 | .513 |

a. Lilliefors Significance Correction

\*. This is a lower bound of the true significance.

Method 4: Stem-and-leaf plot (Appendix 1)

Frequency is the frequency of the leaves.

Stem is the leading digit (or digits) of the value. Take Respondent's height as an example (see figure 36, left): in the last line the stem is 7 and the leaf is 4, so the value of the variable is 74. The 7 is in the tens place, so it is the stem.

Leaf is the digit that follows the stem. The number of leaves tells you how many values share that stem. Take Undergraduate GPA as an example (see figure 36, right): on the third line there are two 8s and one 9 (hence the frequency is 3), which means the variable contains two values of 2.8 and one value of 2.9.

Method 5: Boxplot:

1. Age:

Figure 37

a = 41: This is the upper whisker, the largest value that lies no more than 1.5 times the interquartile range (the difference between the first and the third quartile) above the third quartile; here it is the maximum.

b = 32.75: This is the third quartile, also known as the 75th percentile.

c = 29.5: This is the median, also known as the 50th percentile.

d = 26: This is the first quartile, also known as the 25th percentile.

e = 22: This is the lower whisker, the smallest value that lies no more than 1.5 times the interquartile range below the first quartile; here it is the minimum.

2. Height:

Figure 38

a = 74: This is the upper whisker, the largest value that lies no more than 1.5 times the interquartile range (the difference between the first and the third quartile) above the third quartile; here it is the maximum.

b = 70.75: This is the third quartile, also known as the 75th percentile.

c = 67: This is the median, also known as the 50th percentile.

d = 63.25: This is the first quartile, also known as the 25th percentile.

e = 60: This is the lower whisker, the smallest value that lies no more than 1.5 times the interquartile range below the first quartile; here it is the minimum.

3. Graduate GPA:

Figure 39

a = 4.0: This is the upper whisker, the largest value that lies no more than 1.5 times the interquartile range (the difference between the first and the third quartile) above the third quartile; here it is the maximum.

b = 3.738: This is the third quartile, also known as the 75th percentile.

c = 3.510: This is the median, also known as the 50th percentile.

d = 3.258: This is the first quartile, also known as the 25th percentile.

e = 3.0: This is the lower whisker, the smallest value that lies no more than 1.5 times the interquartile range below the first quartile; here it is the minimum.

4. Undergraduate GPA:

Figure 40

a = 3.9: This is the upper whisker, the largest value that lies no more than 1.5 times the interquartile range (the difference between the first and the third quartile) above the third quartile; here it is the maximum.

b = 3.537: This is the third quartile, also known as the 75th percentile.

c = 3.265: This is the median, also known as the 50th percentile.

d = 3.060: This is the first quartile, also known as the 25th percentile.

e = 2.6: This is the lower whisker, the smallest value that lies no more than 1.5 times the interquartile range below the first quartile; here it is the minimum.

5. GMAT score:

Figure 41

a = 650: This is the upper whisker, the largest value that lies no more than 1.5 times the interquartile range (the difference between the first and the third quartile) above the third quartile; here it is the maximum.

b = 587.5: This is the third quartile, also known as the 75th percentile.

c = 560: This is the median, also known as the 50th percentile.

d = 530: This is the first quartile, also known as the 25th percentile.

e = 460: This is the lower whisker, the smallest value that lies no more than 1.5 times the interquartile range below the first quartile; here it is the minimum.
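Whether a boxplot shows outliers depends on the 1.5 x IQR fences. As a hedged illustration, the Python sketch below computes those fences from the quartiles quoted for the five boxplots and compares them with the minimum and maximum in figure 34; the names `boxes` and `tukey_fences` are hypothetical.

```python
def tukey_fences(q1, q3):
    """Return (lower fence, upper fence) at 1.5 times the interquartile range."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# (first quartile, third quartile, minimum, maximum) as quoted in the text.
boxes = {
    "Age": (26, 32.75, 22, 41),
    "Height": (63.25, 70.75, 60, 74),
    "Graduate GPA": (3.258, 3.738, 3.0, 4.0),
    "Undergraduate GPA": (3.060, 3.537, 2.6, 3.9),
    "GMAT score": (530, 587.5, 460, 650),
}

# A variable has no outliers when both extremes lie inside the fences.
no_outliers = {}
for name, (q1, q3, minimum, maximum) in boxes.items():
    lower, upper = tukey_fences(q1, q3)
    no_outliers[name] = lower <= minimum and maximum <= upper

print(no_outliers)
```

Under these figures every minimum and maximum lies inside its fences, so the whiskers simply end at the smallest and largest observations.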

Method 6: Normal Q-Q Plot (figure 42):

The straight line represents what our data would look like if it were perfectly normally distributed. Our actual data is represented by the squares plotted along this line. The closer the squares are to the line, the more normally distributed our data looks. Here, most of our points fall almost perfectly along the line. This is a good indicator that our data is normally distributed.

[4] Hypothesis test

a. Suggest a null hypothesis and an alternative hypothesis for testing the mean age for male and female students.

Test the claim that true mean (average age for male students) is equal to 30.88

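H0: μ ≥ 30.88; H1: μ < 30.88 (this is a one-tailed test)

α = 0.05 and n = 24

p = 12/24 = 0.5

Assume π = 0.23

So the test statistic is:

$$Z_{STAT} = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} = \frac{0.5 - 0.23}{\sqrt{0.23 \times 0.77 / 24}} \approx 3.14$$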

Since ZSTAT = 3.14 does not fall below the lower-tail critical value of -1.645 for a one-tailed test at the 0.05 level, we do not reject the null hypothesis: there is no evidence that the mean age of male students is less than 30.88.
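The equation for the test statistic did not survive in the text. One reading that reproduces the reported ZSTAT of 3.14 is the one-sample z statistic for a proportion, with 0.23 as the hypothesised proportion; the Python sketch below is written under that assumption, and the function name is hypothetical.

```python
import math


def proportion_z(sample_proportion, null_proportion, n):
    """One-sample z statistic for a proportion."""
    standard_error = math.sqrt(null_proportion * (1 - null_proportion) / n)
    return (sample_proportion - null_proportion) / standard_error


# Values quoted for the male students: p = 12/24, assumed proportion 0.23, n = 24.
z_male = proportion_z(12 / 24, 0.23, 24)
print(round(z_male, 2))
```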

Test the claim that true mean (average age for female students) is equal to 28

H0: μ ≥ 28; H1: μ < 28 (this is a one-tailed test)

α = 0.05 and n = 16

p = 6/16 = 0.375

Assume π = 0.3

So the test statistic is:

Since ZSTAT = -0.8731 is between -1.96 and +1.96, we do not reject the null hypothesis: there is insufficient evidence that the mean age of female students is less than 28.

b.Carry out an appropriate test to compare the mean age for the two sexes, and interpret your results.

I used the "compare mean" function to analyze this relationship.

The output of ANOVA summary table (figure 43) is divided into between group effects (effects due to the experiment) and within group effects (this is the unsystematic variation in the data). The between-group effect is the overall experimental effect. In this row we are told the sums of squares for the model (79.350). The sum of squares for the model represents the total experimental effect whereas the mean squares for the model represent the average experimental effect. The row labelled within group gives details of the unsystematic variation within data. The table tells us how much unsystematic variation exists. It then gives the average amount of unsystematic variation, the residual mean squares.

The ANOVA summary table reveals that the between-group mean square (the variation explained by the model) is 79.350 (79.350 / 1), and the within-group mean square (the variation unexplained) is 20.174 (766.625 / 38). The F-ratio is 3.933 (79.350 / 20.174), and since the significance value (0.055) is more than 0.05, we conclude that mean age does not differ significantly between the two genders. However, at this stage we still do not know the exact effect of the variables.

Figure 43

ANOVA Tableᵃ

| Respondent's age * Respondent's gender | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups (Combined) | 79.350 | 1 | 79.350 | 3.933 | .055 |
| Within Groups | 766.625 | 38 | 20.174 | | |
| Total | 845.975 | 39 | | | |

a. With fewer than three groups, linearity measures for Respondent's age * Respondent's gender cannot be computed.
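The mean squares and the F-ratio in the ANOVA table are simple ratios of the sums of squares and degrees of freedom. A minimal Python sketch, using only the numbers in figure 43 (the function name `one_way_anova` is hypothetical):

```python
def one_way_anova(ss_between, df_between, ss_within, df_within):
    """Return (between-group mean square, within-group mean square, F-ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and degrees of freedom from figure 43.
ms_between, ms_within, f_ratio = one_way_anova(79.350, 1, 766.625, 38)

print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 3))
```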

[5] Data manipulation and analysis

a.Create a new variable out of the satisfaction variable by collapsing the existing variable into three categories - 'Unsatisfied', 'Neutral' and 'Satisfied'. You may name the new variable satiscat; and then give it an appropriate label.

I used "Recode into Different Variables" to create the new variable (figure 44). After naming the new variable and describing it in the Label box, I clicked "Old and New Values", which opened the dialog in figure 45.

Figure 44

Figure 45

After the new variable had been created, I defined its value labels as shown in figure 46. Figure 47 shows the Data View in SPSS after all the steps were completed.

Figure 46

Figure 47

b. Analyse the collapsed satisfaction categories so as to compare the levels of satisfaction for the two sexes; remember to apply an appropriate test of association

I used the "explore" function to test the normality of these variables.

The Tests of Normality table (figure 48) shows the Kolmogorov-Smirnov and Shapiro-Wilk tests for these variables. The Sig. values (p) are all 0.000, which is below 0.05, so the distributions deviate significantly from the normal distribution.

Figure 48

Tests of Normality

| | Respondent's gender | Kolmogorov-Smirnovᵃ Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|---|
| collapsing the satisfaction | Male | .331 | 24 | .000 | .770 | 24 | .000 |
| | Female | .438 | 16 | .000 | .511 | 16 | .000 |

a. Lilliefors Significance Correction

Moreover, looking at the histograms (figures 49 and 50), we find that most respondents of both genders feel neutral, but in the satisfied and unsatisfied categories there are more men than women.

Figure 49

Figure 50

The boxplot (figure 51) indicates this as well:

Figure 51

[6] Correlation and regression

a. Perform a correlation on:

i. Age and height

ii. Graduate cumulative grade point average and Undergraduate cumulative grade point average

iii. Age and GMAT

iv. Expected salary and anticipated salary

The correlations are as follows:

Figure 52 shows that respondent's age has no significant relationship with respondent's height, as Sig. = 0.119 > 0.05.

Figure 52

Correlations

| | | Respondent's age | Respondent's height |
|---|---|---|---|
| Respondent's age | Pearson Correlation | 1 | .251 |
| | Sig. (2-tailed) | | .119 |
| | N | 40 | 40 |
| Respondent's height | Pearson Correlation | .251 | 1 |
| | Sig. (2-tailed) | .119 | |
| | N | 40 | 40 |

Figure 53 shows that Graduate GPA and Undergraduate GPA have a significant correlation at the 0.05 level, since Sig. = 0.047 < 0.05.

Figure 53

Correlations

| | | Graduate GPA | UNdergrad GPA |
|---|---|---|---|
| Graduate GPA | Pearson Correlation | 1 | .316\* |
| | Sig. (2-tailed) | | .047 |
| | N | 40 | 40 |
| UNdergrad GPA | Pearson Correlation | .316\* | 1 |
| | Sig. (2-tailed) | .047 | |
| | N | 40 | 40 |

\*. Correlation is significant at the 0.05 level (2-tailed).

Figure 54 shows that respondent's age has no significant relationship with GMAT score, as Sig. = 0.279 > 0.05.

Figure 54

Correlations

| | | Respondent's age | GMAT score |
|---|---|---|---|
| Respondent's age | Pearson Correlation | 1 | -.175 |
| | Sig. (2-tailed) | | .279 |
| | N | 40 | 40 |
| GMAT score | Pearson Correlation | -.175 | 1 |
| | Sig. (2-tailed) | .279 | |
| | N | 40 | 40 |

Figure 55 shows that Expected Salary and Anticipated Salary in 5 Years have a significant correlation at the 0.01 level, since Sig. = 0.000 < 0.01.

Figure 55

Correlations

| | | Expected Salary | Anticipated Salary in 5 Years |
|---|---|---|---|
| Expected Salary | Pearson Correlation | 1 | .798\*\* |
| | Sig. (2-tailed) | | .000 |
| | N | 40 | 40 |
| Anticipated Salary in 5 Years | Pearson Correlation | .798\*\* | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 40 | 40 |

\*\*. Correlation is significant at the 0.01 level (2-tailed).

b. Do a scatterplot on each set of variables above. In interpreting your results how much additional information do you derive and how do these compare to results in 6a?

A scatterplot is used to identify potential associations between two variables, where one may be considered an explanatory variable and the other a response variable.

The first scatterplot (figure 56) displays the association between the age and the height of respondents.

Figure 56

The next scatterplot (figure 57) shows the association between undergraduate GPA and postgraduate GPA.

Figure 57

The third scatterplot (figure 58) displays the association between respondent's age and respondent's GMAT score. R² is a measure of how much of the variability in the outcome is accounted for by the predictor. Since R² is well below 0.5 in these three cases, the regression line fits the scatterplot poorly for age and height, undergraduate GPA and postgraduate GPA, and age and GMAT.

Figure 58

The last scatterplot (figure 59) displays the association between anticipated salary in 5 years and expected salary. Since R² > 0.5, the scatterplot clearly indicates a strong positive association between the two.

Figure 59
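With a single predictor, R² is simply the square of the Pearson correlation, so the values behind the four scatterplots can be recovered from figures 52 to 55. The Python sketch below illustrates this under that assumption; the dictionary keys are informal labels, and because the correlations are rounded to three decimals the squares may differ slightly from the R Square that SPSS prints.

```python
# Pearson correlations copied from figures 52 to 55.
pearson_r = {
    "age vs height": 0.251,
    "graduate GPA vs undergraduate GPA": 0.316,
    "age vs GMAT": -0.175,
    "expected vs anticipated salary": 0.798,
}

# Squaring removes the sign, so R squared always lies between 0 and 1.
r_squared = {pair: r ** 2 for pair, r in pearson_r.items()}

for pair, value in r_squared.items():
    print(pair, round(value, 3))
```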

c. Perform a simple linear regression on

i. Graduate cumulative grade point average with Undergraduate cumulative grade point average

Figure 60

Variables Entered/Removedᵇ

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Graduate GPAᵃ | . | Enter |

a. All requested variables entered.

b. Dependent Variable: UNdergrad GPA

Figure 60 shows that the dependent variable is undergraduate GPA and that all requested variables were entered.

In figure 61, R = 0.316 is the correlation coefficient, which tells us how strongly the independent variable is related to the dependent variable. R Square = 0.100 is the coefficient of determination: about 10% of the variability in the dependent variable is accounted for by the predictor.

Figure 61

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .316ᵃ | .100 | .076 | .2933 |

a. Predictors: (Constant), Graduate GPA

The ANOVA part (figure 62) tells us whether the regression equation explains a statistically significant portion of the variability in the dependent variable. Here Sig. = 0.047 < 0.05, so the regression is significant.

Figure 62

ANOVAᵇ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | .362 | 1 | .362 | 4.204 | .047ᵃ |
| | Residual | 3.268 | 38 | .086 | | |
| | Total | 3.630 | 39 | | | |

a. Predictors: (Constant), Graduate GPA

b. Dependent Variable: UNdergrad GPA

The Coefficients part (figure 63) of the output gives us the values that we need in order to write the regression equation. The regression equation will take the form:

Predicted variable (dependent variable) = slope * independent variable + intercept

As a result of that, we can conclude from the table that:

Undergraduate GPA=0.332*Graduate GPA+2.125

Figure 63

Coefficientsᵃ

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 2.125 | .572 | | 3.718 | .001 |
| | Graduate GPA | .332 | .162 | .316 | 2.050 | .047 |

a. Dependent Variable: UNdergrad GPA
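The fitted equation can be used to predict Undergraduate GPA, and its coefficients can be cross-checked against the descriptive statistics. The Python sketch below assumes the standard relations for simple linear regression (slope = r x SD of y / SD of x, intercept = mean of y minus slope x mean of x); the function and variable names are hypothetical, and small differences arise because the published figures are rounded.

```python
def predict_undergrad_gpa(graduate_gpa, slope=0.332, intercept=2.125):
    """Predicted Undergraduate GPA from the fitted line in figure 63."""
    return slope * graduate_gpa + intercept


# Summary statistics from figure 18 and the correlation from figure 53.
r = 0.316
mean_graduate, sd_graduate = 3.515, 0.2898
mean_undergrad, sd_undergrad = 3.293, 0.3051

# Cross-check: slope and intercept implied by r, the SDs and the means.
slope_from_r = r * sd_undergrad / sd_graduate
intercept_from_means = mean_undergrad - 0.332 * mean_graduate

print(round(slope_from_r, 3), round(intercept_from_means, 3))
print(round(predict_undergrad_gpa(mean_graduate), 3))
```

The fitted line passes (up to rounding) through the point formed by the two means, which is a quick sanity check on any simple regression output.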

ii. Expected salary with anticipated salary

Figure 64 shows that the dependent variable is Expected Salary and that all requested variables were entered.

Figure 64

Variables Entered/Removedᵇ

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Anticipated Salary in 5 Yearsᵃ | . | Enter |

a. All requested variables entered.

b. Dependent Variable: Expected Salary

In figure 65, R = 0.798 is the correlation coefficient, which tells us how strongly the independent variable is related to the dependent variable. R Square = 0.638 is the coefficient of determination: about 64% of the variability in the dependent variable is accounted for by the predictor.

Figure 65

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .798a | .638 | .628 | 11.780 |

a. Predictors: (Constant), Anticipated Salary in 5 Years

The ANOVA table (figure 66) tells us whether the regression equation explains a statistically significant portion of the variability in the dependent variable. Here Sig. = 0.000 < 0.01, so the regression is significant at the 1% level.

Figure 66

ANOVAb

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 9277.059 | 1 | 9277.059 | 66.856 | .000a |
| | Residual | 5272.941 | 38 | 138.762 | | |
| | Total | 14550.000 | 39 | | | |

a. Predictors: (Constant), Anticipated Salary in 5 Years

b. Dependent Variable: Expected Salary

The Coefficients part (figure 67) of the output gives us the values that we need in order to write the regression equation. The regression equation will take the form:

Predicted variable (dependent variable) = slope * independent variable + intercept

Reading the slope and the intercept from the B column of the table gives:

Expected salary = 0.435 * Anticipated salary in 5 years + 23.017

Figure 67

Coefficientsa

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | 23.017 | 6.040 | | 3.811 | .000 |
| | Anticipated Salary in 5 Years | .435 | .053 | .798 | 8.177 | .000 |

a. Dependent Variable: Expected Salary
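With a single predictor, the slope and intercept can also be recovered from the correlation and the descriptive statistics, which gives a useful cross-check on the Coefficients table. The sketch below assumes the standard identities slope = r * (SD of outcome / SD of predictor) and intercept = mean of outcome - slope * mean of predictor, and takes the means and standard deviations from the descriptive statistics in figure 68.

```python
# Pearson correlation between the two salary variables (figure 65).
r = 0.798

# Means and standard deviations from figure 68.
mean_x, sd_x = 108.13, 35.494   # Anticipated Salary in 5 Years (predictor)
mean_y, sd_y = 70.00, 19.315    # Expected Salary (outcome)

# Simple-regression identities (one predictor only).
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

print(f"slope     = {slope:.3f}")
print(f"intercept = {intercept:.2f}")
```

The results land within rounding error of the 0.435 and 23.017 reported by SPSS, because the inputs here are already rounded to three decimals.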

d. Perform a multiple regression treating anticipated salary as the outcome variable and expected salary, graduate cumulative grade point average and GMAT as predictor variables.

I ran the multiple regression and obtained the following results.

Figure 68 shows basic statistics of the input variables:

Figure 68

Descriptive Statistics

| | Mean | Std. Deviation | N |
| --- | --- | --- | --- |
| Anticipated Salary in 5 Years | 108.13 | 35.494 | 40 |
| Expected Salary | 70.00 | 19.315 | 40 |
| Graduate GPA | 3.515 | .2898 | 40 |
| GMAT score | 554.75 | 44.202 | 40 |

Figure 69 shows that, of the three predictors, expected salary has the strongest and most significant correlation with anticipated salary in 5 years (r = 0.798, Sig. = 0.000).

Figure 69

Correlations

| | | Anticipated Salary in 5 Years | Expected Salary | Graduate GPA | GMAT score |
| --- | --- | --- | --- | --- | --- |
| Pearson Correlation | Anticipated Salary in 5 Years | 1.000 | .798 | .260 | -.122 |
| | Expected Salary | .798 | 1.000 | .233 | -.107 |
| | Graduate GPA | .260 | .233 | 1.000 | .364 |
| | GMAT score | -.122 | -.107 | .364 | 1.000 |
| Sig. (1-tailed) | Anticipated Salary in 5 Years | . | .000 | .053 | .226 |
| | Expected Salary | .000 | . | .074 | .256 |
| | Graduate GPA | .053 | .074 | . | .010 |
| | GMAT score | .226 | .256 | .010 | . |
| N | Anticipated Salary in 5 Years | 40 | 40 | 40 | 40 |
| | Expected Salary | 40 | 40 | 40 | 40 |
| | Graduate GPA | 40 | 40 | 40 | 40 |
| | GMAT score | 40 | 40 | 40 | 40 |
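The Sig. values in the correlation table come from a t test on each correlation. As a rough illustration, the sketch below computes that t statistic with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r²) for the three correlations with anticipated salary. It does not compute the p values themselves, which would need a t distribution table or a statistics library.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Correlations with Anticipated Salary in 5 Years, from figure 69 (N = 40).
correlations = {
    "Expected Salary": 0.798,
    "Graduate GPA": 0.260,
    "GMAT score": -0.122,
}

t_values = {name: correlation_t(r, 40) for name, r in correlations.items()}

for name, t in t_values.items():
    print(f"{name:16s} t = {t:6.2f}")
```

With the same degrees of freedom, a larger absolute t corresponds to a smaller Sig. value, so the ordering of these statistics matches the ordering of the Sig. (1-tailed) row for anticipated salary.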

Figure 70

Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
| --- | --- | --- | --- |
| 1 | GMAT score, Expected Salary, Graduate GPAa | . | Enter |

a. All requested variables entered.

Figure 70 shows that all the requested variables were entered.

In figure 71, the standard error of the estimate is 21.895, so the regression's fit is not perfect.

Figure 71

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .805a | .649 | .619 | 21.895 |

a. Predictors: (Constant), GMAT score, Expected Salary, Graduate GPA

Figure 72

ANOVAb

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 31876.353 | 3 | 10625.451 | 22.165 | .000a |
| | Residual | 17258.022 | 36 | 479.390 | | |
| | Total | 49134.375 | 39 | | | |

a. Predictors: (Constant), GMAT score, Expected Salary, Graduate GPA

b. Dependent Variable: Anticipated Salary in 5 Years

Figure 73

Coefficientsa

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | -1.627 | 53.377 | | -.030 | .976 |
| | Expected Salary | 1.404 | .191 | .764 | 7.350 | .000 |
| | Graduate GPA | 13.587 | 13.592 | .111 | 1.000 | .324 |
| | GMAT score | -.065 | .087 | -.081 | -.750 | .458 |

a. Dependent Variable: Anticipated Salary in 5 Years

Again, from figure 73, we can write the regression equation:

Anticipated salary = -1.627 + 1.404 * (expected salary) + 13.587 * (graduate GPA) - 0.065 * (GMAT score)
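The equation can be applied directly as a small function. As a sanity check, the sketch below evaluates it at the sample means from figure 68, on the assumption that a least-squares fit should return roughly the mean of the outcome there (108.13); the function and variable names are illustrative only.

```python
# Unstandardized coefficients (B column) from figure 73.
INTERCEPT = -1.627
B_EXPECTED_SALARY = 1.404
B_GRADUATE_GPA = 13.587
B_GMAT_SCORE = -0.065


def predict_anticipated_salary(expected_salary, graduate_gpa, gmat_score):
    """Apply the fitted multiple regression equation to one student."""
    return (INTERCEPT
            + B_EXPECTED_SALARY * expected_salary
            + B_GRADUATE_GPA * graduate_gpa
            + B_GMAT_SCORE * gmat_score)


# Evaluate at the sample means of the three predictors (figure 68).
at_means = predict_anticipated_salary(70.00, 3.515, 554.75)
print(f"Prediction at the predictor means: {at_means:.2f}")
```

The small difference from 108.13 comes from the coefficients being rounded to three decimals in the output.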

ii. Do the three variables sufficiently predict students' anticipated salary? Why?

The three variables are not sufficient to predict students' anticipated salary. In figure 71, the standard error of the estimate is 21.895, so the regression's fit is not perfect.

R² is a measure of how much of the variability in the outcome is accounted for by the predictors. In the figures below, only the R² between expected salary and anticipated salary (figure 74) is above 0.5 (about 0.6); the other two predictors each have an R² below 0.5. The Sig. values in figure 73 agree with this: only expected salary is a significant predictor (Sig. = 0.000), while graduate GPA (Sig. = 0.324) and GMAT score (Sig. = 0.458) are not, which supports the conclusion that the three-variable model fits poorly.
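One way to put the standard error of the estimate in context is to compare it with the standard deviation of the outcome itself. The sketch below is an illustration under the usual definition of adjusted R², for which the ratio of the two equals sqrt(1 - adjusted R²); the inputs are the rounded values from figures 68 and 71.

```python
import math

sd_outcome = 35.494          # Std. Deviation of anticipated salary (figure 68)
std_error_estimate = 21.895  # Std. Error of the Estimate (figure 71)
adjusted_r_square = 0.619    # Adjusted R Square (figure 71)

# Share of the outcome's spread that is left over after fitting the model.
ratio = std_error_estimate / sd_outcome

# The same quantity implied by adjusted R^2.
implied_ratio = math.sqrt(1 - adjusted_r_square)

print(f"Std. error / Std. deviation = {ratio:.3f}")
print(f"sqrt(1 - adjusted R square) = {implied_ratio:.3f}")
```

Read this way, the typical prediction error is still a bit over 60% of the spread that would remain if the mean alone were used as the prediction.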

Figure 74

Figure 75

Figure 76

The ANOVA (figure 72) tests whether the model is significantly better at predicting the outcome than using the mean as a 'best guess'. Since Sig. = 0.000 < 0.001, we can interpret this result as meaning that the final model significantly improves our ability to predict the outcome variable.

The general purpose of multiple regression is to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. Multiple regression can be an effective tool for creating prediction equations, provided that measurement is adequate, the sample is large enough, and the assumptions of multiple regression are met. When we use multiple regression to predict students' anticipated salary, we use the sample to create a regression equation that would optimally predict a particular phenomenon within a particular population. Here the goal is to use the equation to predict outcomes for individuals not in the sample used in the analysis.

Appendix 1 Stem and Leaf