Example Statistics Essay
Using the Crime Survey for England and Wales, examine how experience of crime affects citizens' opinions of the criminal justice system. What demographic factors influence the relationship between experience of crime and rating of the criminal justice system?
Introduction:
To answer the question posed, the following analysis is split into three sections. Section 1 presents an initial inspection of the variables in the dataset. Section 2 then proposes a statistical modelling procedure to determine which variables affect citizens’ opinions of the criminal justice system. Appropriate conclusions are drawn in Section 3.
Section 1: Description of the data
1.1: Variables in the dataset
The Crime Survey for England and Wales provided data for 35371 individuals. There is a clear problem with missing data in the dataset, which is examined below and discussed in detail in Section 2. For this analysis, the variables in the dataset can be grouped into three types:
(1) Demographic factors:
Sex: categorical variable
Age: continuous variable
Marital status: categorical variable
Respondent Social Class: categorical variable
Type of area: categorical variable
(2) Variables relating to citizens’ opinions of the criminal justice system, such as:
How confident are you that the Criminal Justice System as a whole is effective?
(4-level ordinal categorical variable ranging from “very confident” to “not at all confident”)
(3) Variable relating to citizens’ experience of crime:
Experience of any crime in the previous 12 months: binary categorical variable
1.2: Inspection of the data
As a starting point, each variable was first inspected individually. Of the 35371 individuals, there were 16176 males and 19195 females, as shown in Table 1. Since these two counts sum to 35371, there were no missing values for the sex variable.
Table 1: Gender frequencies in the Crime Survey for England and Wales
Sex | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Male | 16176 | 45.7 | 45.7 | 45.7
Valid: Female | 19195 | 54.3 | 54.3 | 100.0
Total | 35371 | 100.0 | 100.0 |
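The Percent, Valid Percent and Cumulative Percent columns in Tables 1, 2, 3 and 5 follow simple arithmetic: Percent divides by all cases, whereas Valid Percent and Cumulative Percent divide by the non-missing cases only. A minimal Python sketch of that calculation (the function name `frequency_table` is hypothetical, not an SPSS feature), applied to the counts in Table 1:

```python
def frequency_table(valid_counts, missing_counts=None):
    """Return {category: (frequency, percent, valid %, cumulative %)}.

    valid_counts: ordered dict of category -> count for non-missing answers.
    missing_counts: optional dict of missing-value code -> count.
    Percent uses all cases as the denominator; Valid Percent and
    Cumulative Percent use the valid cases only (rounded to 1 d.p.).
    """
    missing_counts = missing_counts or {}
    n_valid = sum(valid_counts.values())
    n_total = n_valid + sum(missing_counts.values())
    rows = {}
    cumulative = 0
    for category, freq in valid_counts.items():
        cumulative += freq
        rows[category] = (
            freq,
            round(100 * freq / n_total, 1),
            round(100 * freq / n_valid, 1),
            round(100 * cumulative / n_valid, 1),
        )
    return rows

table1 = frequency_table({"Male": 16176, "Female": 19195})
for sex, row in table1.items():
    print(sex, *row)
```

Passing the "Refusal", "Don't know" and "System" counts as `missing_counts` reproduces the two denominators used in Table 5 (35371 for Percent, 17452 for Valid Percent).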
For the continuous age variable, ages ranged from 16 to 99. A small percentage of individuals (0.33%) did not give their age, so these responses are missing. See Section 2 for more details on missing values.
Only 61 of the 35371 individuals (0.17%) did not provide their marital status. As with age, this percentage is too small to be of concern. Table 2 shows the marital status frequencies in each of the 8 categories. The frequencies in the “same-sex civil partnership and living with partner” category and in the bottom two categories of Table 2 are small in comparison to the others.
For the modelling procedure in Section 2, it is beneficial to have a smaller number of categories, each with a sufficiently large count. Consequently, the categories were combined in a meaningful way. The “same-sex civil partnership and living with partner” category was combined with the married category. Similarly, the “SPONTANEOUS ONLY separated but legally in same-sex civil partnership” category was combined with the separated category, and the “SPONTANEOUS ONLY surviving civil partner” category was combined with the widowed category. In other words, because of their small counts, the categories relating to civil partnerships were merged with the corresponding marriage-based categories. Table 3 gives the frequencies for the new marital status variable; individuals who are either single or married account for nearly 75% of the dataset. This new variable is used in the modelling procedure of Section 2 and is referred to as “MaritalStatusNew” from now on.
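The recoding described above amounts to a lookup table from the original categories to the new ones. A minimal sketch, assuming the category labels as printed in Table 2 (the exact SPSS value labels, for example with the SPSS "SPONTANEOUS ONLY" prefix, may differ); summing the Table 2 counts through the mapping reproduces the counts in Table 3:

```python
# Counts of the eight original marital-status categories (Table 2).
TABLE2_COUNTS = {
    "Single": 10513,
    "Married and living with husband/wife": 15657,
    "In a same-sex civil partnership and living with partner": 90,
    "Separated": 1273,
    "Divorced": 3936,
    "Widowed": 3818,
    "Separated but legally in same-sex civil partnership": 18,
    "Surviving civil partner": 5,
}

# Each civil-partnership category is merged with its marriage-based counterpart.
RECODE = {
    "Single": "Single",
    "Married and living with husband/wife": "Married",
    "In a same-sex civil partnership and living with partner": "Married",
    "Separated": "Separated",
    "Divorced": "Divorced",
    "Widowed": "Widowed",
    "Separated but legally in same-sex civil partnership": "Separated",
    "Surviving civil partner": "Widowed",
}

def collapse(counts, mapping):
    """Add up the counts of the original categories within each new category."""
    new_counts = {}
    for category, n in counts.items():
        new = mapping[category]
        new_counts[new] = new_counts.get(new, 0) + n
    return new_counts

marital_status_new = collapse(TABLE2_COUNTS, RECODE)
print(marital_status_new)
```

For individual records, the same `RECODE` dictionary would simply be applied to each respondent's original category.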
The respondent social class variable had a very large number of categories, which makes interpretation difficult; there are clearly too many for it to be included as a categorical variable in the models of Section 2. The categories could be grouped into a much smaller number, but this was not deemed sensible, because the results in Section 2 could differ drastically depending on the groupings chosen. In addition, 1765 individuals (4.99%) did not state their social class. For these reasons, this variable was not considered further in Section 2.
The type of area variable had no missing values: 27585 individuals (77.99%) stated that they live in an urban area and 7786 (22.01%) that they live in a rural area.
Table 2: Marital status frequencies in the Crime Survey for England and Wales
Marital status | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Single | 10513 | 29.7 | 29.8 | 29.8
Valid: Married and living with husband/wife | 15657 | 44.3 | 44.3 | 74.1
Valid: In a same-sex civil partnership and living with partner | 90 | .3 | .3 | 74.4
Valid: Separated | 1273 | 3.6 | 3.6 | 78.0
Valid: Divorced | 3936 | 11.1 | 11.1 | 89.1
Valid: Widowed | 3818 | 10.8 | 10.8 | 99.9
Valid: Separated but legally in same-sex civil partnership | 18 | .1 | .1 | 100.0
Valid: Surviving civil partner | 5 | .0 | .0 | 100.0
Valid: Total | 35310 | 99.8 | 100.0 |
Missing: System | 61 | .2 | |
Total | 35371 | 100.0 | |
Table 3: Marital status frequencies (with combined categories) in the Crime Survey for England and Wales
MaritalStatusNew | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Single | 10513 | 29.7 | 29.8 | 29.8
Valid: Married | 15747 | 44.5 | 44.6 | 74.4
Valid: Separated | 1291 | 3.6 | 3.7 | 78.0
Valid: Divorced | 3936 | 11.1 | 11.1 | 89.2
Valid: Widowed | 3823 | 10.8 | 10.8 | 100.0
Valid: Total | 35310 | 99.8 | 100.0 |
Missing: System | 61 | .2 | |
Total | 35371 | 100.0 | |
The five variables relating to citizens’ opinions of the criminal justice system (type 2 in Section 1.1) have large proportions of missing values, as shown in Table 4.
Table 4: Valid and missing frequencies for the variables on citizens’ opinions of the criminal justice system
Question | Valid | Missing
How confident are you that the police are effective at catching criminals? | 17727 | 17644
How confident are you that the Crown Prosecution Service is effective at prosecuting people accused of committing a crime? | 16892 | 18479
How confident are you that prisons are effective at rehabilitating offenders who have been convicted of a crime? | 16145 | 19226
How confident are you that the probation service is effective at preventing criminals from reoffending? | 15193 | 20178
How confident are you that the Criminal Justice System as a whole is effective? | 17452 | 17919
Given the nature of the question, attention is focused on the “How confident are you that the Criminal Justice System as a whole is effective?” variable, which will be referred to as “CJSopinion” from now on. This is because interest lies in which variables affect citizens’ opinions of the criminal justice system in general rather than any specific aspect of it; a more detailed analysis could also examine the other four variables. CJSopinion is therefore the dependent variable in Section 2. The consequences of its 17919 missing values (50.66%) are discussed in detail in Section 2.
Table 5: How confident are you that the Criminal Justice System as a whole is effective (CJSopinion)?
Response | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Very confident | 573 | 1.6 | 3.3 | 3.3
Valid: Fairly confident | 7556 | 21.4 | 43.3 | 46.6
Valid: Not very confident | 7164 | 20.3 | 41.0 | 87.6
Valid: Not at all confident | 2159 | 6.1 | 12.4 | 100.0
Valid: Total | 17452 | 49.3 | 100.0 |
Missing: Refusal | 2 | .0 | |
Missing: Don't know | 779 | 2.2 | |
Missing: System | 17138 | 48.5 | |
Missing: Total | 17919 | 50.7 | |
Total | 35371 | 100.0 | |
The final variable to consider relates to individuals’ experience of crime. This variable had no missing values, with 29819 individuals (84.30%) stating that they had not been a victim of crime in the last 12 months. It will be referred to as “ExperienceOfCrime” from now on.
Section 2: Modelling the data
2.1: Potential approaches
Based on Section 1.2, CJSopinion is chosen as the dependent variable, with sex, age, MaritalStatusNew, type of area and experience of crime as the independent variables. The continuous variable age is mean-centred to aid interpretability. A number of modelling-based methods could be used to determine which of the independent variables significantly affect citizens’ opinions of the criminal justice system. For example:
(1) Linear regression with CJSopinion as the dependent variable and sex, age, MaritalStatusNew, type of area and experience of crime as the independent variables.
(2) Multinomial logistic regression with CJSopinion as the dependent variable and the same independent variables as (1).
(3) Ordinal logistic regression with CJSopinion as the dependent variable and the same independent variables as (1).
Approach (1) assumes that the dependent variable is continuous and that the intervals between consecutive values are equal, both of which are questionable here. Approach (2) is acceptable, but it ignores the fact that the dependent variable is ordinal. Approach (3) is therefore preferred, since it exploits the ordering of the response categories. In contrast to standard (binary or multinomial) logistic regression, the ordinal approach in SPSS models the logit of the cumulative probabilities, using the proportional odds form of this model. The reader is referred to Agresti (2013, Section 8.2) for more details.
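For reference, the cumulative logit (proportional odds) model fitted by the SPSS ordinal regression procedure can be written as follows, with Y the CJSopinion category (1 = very confident, ..., 4 = not at all confident), x the vector of independent variables and θ1 < θ2 < θ3 the thresholds. Note the sign convention: SPSS subtracts the linear predictor, whereas Agresti (2013) writes α_j + β'x, so slopes from the two presentations differ in sign. The last line gives the cumulative odds ratio for a binary covariate x_k, which is the same for every cut point j.

```latex
\[
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \log \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \theta_j - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad j = 1, 2, 3,
\]
\[
P(Y = j \mid \mathbf{x})
  = F(\theta_j - \boldsymbol{\beta}^{\top}\mathbf{x})
  - F(\theta_{j-1} - \boldsymbol{\beta}^{\top}\mathbf{x}),
  \qquad F(z) = \frac{1}{1 + e^{-z}},\quad \theta_0 = -\infty,\ \theta_4 = +\infty,
\]
\[
\frac{\operatorname{odds}(Y \le j \mid x_k = 1)}{\operatorname{odds}(Y \le j \mid x_k = 0)}
  = e^{-\beta_k}
  \quad \text{for every } j \text{ (other variables held fixed).}
\]
```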
2.2 Missing data and checking the adequacy of the model
As detailed in Section 1.2, the dataset contains missing values on the dependent variable and on two of the independent variables (age and marital status). The problem is of most concern for the dependent variable, since 50.66% of its values are missing. The ordinal regression procedure in SPSS only allows the listwise deletion method for dealing with missing data, which removes every individual who has a missing value on any of the dependent or independent variables. Even so, listwise deletion still leaves 17427 individuals with no missing values on any of these variables. However, listwise deletion relies on the strong assumption that values are missing completely at random, so that the complete cases are representative of the whole sample. This assumption was deemed acceptable for this dataset, as assessed in Section 2.4. More details on missing data are given in Agresti (2013, p. 471) and Little and Rubin (2002).
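A minimal sketch of listwise deletion followed by mean-centring of age, using three made-up records (not survey data) in which `None` marks a missing value. The field names (`area`, `age` and so on) are assumptions rather than the actual CSEW variable names, and the sketch centres age on the complete cases, which is one reasonable reading of "mean centered" in Section 2.1 rather than a documented detail of this analysis:

```python
from statistics import mean

# Hypothetical field names for the dependent and independent variables.
VARIABLES = ("CJSopinion", "sex", "age", "MaritalStatusNew",
             "area", "ExperienceOfCrime")

def listwise_delete(records, variables=VARIABLES):
    """Keep only records with no missing value on any analysis variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

def add_centred_age(records):
    """Add AgeMeanCentered = age minus the mean age of the given records."""
    age_mean = mean(r["age"] for r in records)
    return [{**r, "AgeMeanCentered": r["age"] - age_mean} for r in records]

records = [
    {"CJSopinion": 2, "sex": "Male", "age": 30, "MaritalStatusNew": "Single",
     "area": "urban", "ExperienceOfCrime": 0},
    {"CJSopinion": None, "sex": "Female", "age": 45, "MaritalStatusNew": "Married",
     "area": "rural", "ExperienceOfCrime": 1},   # dropped: response missing
    {"CJSopinion": 3, "sex": "Female", "age": 50, "MaritalStatusNew": "Divorced",
     "area": "urban", "ExperienceOfCrime": 1},
]

complete = add_centred_age(listwise_delete(records))
print(len(complete), [r["AgeMeanCentered"] for r in complete])
```

Only the first and third records survive the deletion, and their ages are centred on their own mean of 40.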
Prior to running the ordinal regression model in SPSS, it is important to check that there are no low cell counts for the combinations of the dependent variable with each categorical independent variable (among the individuals with no missing values). Cross-tabulations of the dependent variable against each of sex, MaritalStatusNew, type of area and experience of crime were therefore inspected. All the counts were sufficiently large, so the model was deemed suitable to run in SPSS.
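The cell-count check amounts to cross-tabulating the response against each categorical predictor and flagging small or empty cells. A minimal sketch on made-up records; the cut-off of 5 is an assumed rule of thumb, since the essay does not state the threshold it used:

```python
from collections import Counter

def crosstab(records, row_var, col_var):
    """Count how many records fall in each (row value, column value) cell."""
    return Counter((r[row_var], r[col_var]) for r in records)

def small_cells(records, row_var, col_var, min_count=5):
    """List cells, including empty ones, whose count is below min_count."""
    counts = crosstab(records, row_var, col_var)
    row_levels = sorted({r[row_var] for r in records}, key=str)
    col_levels = sorted({r[col_var] for r in records}, key=str)
    return [(a, b, counts[(a, b)])
            for a in row_levels for b in col_levels
            if counts[(a, b)] < min_count]

# Made-up complete cases: (CJSopinion, sex, number of respondents).
toy = [(1, "Male", 2), (1, "Female", 6), (2, "Male", 10), (2, "Female", 12)]
records = [{"CJSopinion": y, "sex": s} for y, s, n in toy for _ in range(n)]
print(small_cells(records, "CJSopinion", "sex"))
```

Because `Counter` returns 0 for unseen keys, combinations that never occur are reported as well, which is usually the most serious kind of sparse cell.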
The proportional odds model also relies on the proportional odds assumption: the effect of each independent variable is assumed to be the same for every cumulative logit (Agresti, 2013). This assumption can be tested in SPSS, as detailed in Section 2.3.
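The test of parallel lines used in Section 2.3 is a likelihood-ratio comparison between the proportional odds model and a general model in which each cumulative logit has its own slopes. Writing θ_j for the thresholds, x for the independent variables, J for the number of response categories (here 4) and p for the number of slope parameters per logit, the two models and the test statistic are:

```latex
\[
H_0:\ \operatorname{logit} P(Y \le j \mid \mathbf{x}) = \theta_j - \boldsymbol{\beta}^{\top}\mathbf{x}
\qquad \text{vs.} \qquad
H_1:\ \operatorname{logit} P(Y \le j \mid \mathbf{x}) = \theta_j - \boldsymbol{\beta}_j^{\top}\mathbf{x},
\qquad j = 1, \dots, J-1,
\]
\[
X^2 = \bigl(-2\log L_{H_0}\bigr) - \bigl(-2\log L_{H_1}\bigr)
\ \overset{\text{approx.}}{\sim}\ \chi^2_{(J-2)p}.
\]
```

With J = 4 and a single binary predictor (p = 1), this gives the df = 2 reported in Table 7.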
2.3 Ordinal logistic regression in SPSS
To address the first part of the research question, “Examine how experience of crime affects citizens' opinions of the criminal justice system”, an ordinal logistic regression was run in SPSS with CJSopinion as the dependent variable and experience of crime as the independent variable. For the dependent variable, “not at all confident” is treated as the baseline, and for experience of crime, “not a victim of crime” is treated as the baseline. Ordinal logistic regression is available in SPSS via Analyze > Regression > Ordinal.
CJSopinion is then entered as the dependent variable and experience of crime as the single independent variable in the Factor(s) box. To test the proportional odds assumption, the Test of parallel lines box should be ticked in the Output dialog.
The key part of the SPSS output is the parameter estimates, shown in Table 6.
Table 6: Parameter estimates for the proportional odds model (no demographic variables)
Part | Parameter | Estimate | Std. Error | Wald | df | Sig.
Threshold | [CJSopinion = very confident=1] | -3.341 | .043 | 6092.466 | 1 | .000
Threshold | [CJSopinion = fairly confident=2] | -.090 | .016 | 30.468 | 1 | .000
Threshold | [CJSopinion = not very confident=3] | 2.010 | .024 | 6975.842 | 1 | .000
Location | [ExperienceOfCrime=victim of crime=1] | .301 | .039 | 59.541 | 1 | .000
Location | [ExperienceOfCrime = not a victim of crime=0] | 0 (reference) | . | . | 0 | .
The model has three intercept parameters (one for each cumulative logit) and these are labelled thresholds in the parameter estimates. These parameters are not usually of interest unless interest lies in calculating response probabilities (Agresti, 2013). Attention is therefore focused on the location part of the parameter estimates.
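The experience of crime variable is statistically significant, since the p-value in the Sig. column is below 0.001. To interpret the coefficient, note that SPSS parameterises the cumulative logit as θj - βx (Agresti writes α_j + βx), so a positive coefficient shifts responses towards the higher-numbered categories, which here are the less confident ones. Hence the odds of giving a less confident response (above any given category rather than at or below it) are exp(0.301)=1.35 times greater for those who have been a victim of crime than for those who have not. Equivalently, the odds of being at or below a given category, that is, at least that confident, are exp(-0.301)=0.74 times those of non-victims. For example, the odds of being very confident in the criminal justice system (rather than any lower level of confidence) are 0.74 times as large for victims of crime as for non-victims.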
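The thresholds matter once response probabilities are wanted. This Python sketch turns the Table 6 estimates into predicted category probabilities for non-victims and victims, and checks that the ratio of cumulative odds is the same, exp(-0.301), at every cut point. It assumes SPSS's logit parameterisation P(Y <= j) = F(θj - βx), and it assumes the first two thresholds are -3.341 and -0.090, i.e. that their minus signs were lost from the printed output (the thresholds must increase, so the first one at least cannot be positive). The names `THETAS` and `BETA_VICTIM` are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed thresholds (theta_1 < theta_2 < theta_3) and slope from Table 6.
THETAS = (-3.341, -0.090, 2.010)
BETA_VICTIM = 0.301
LABELS = ("Very confident", "Fairly confident",
          "Not very confident", "Not at all confident")

def cumulative_probs(eta, thetas=THETAS):
    """P(Y <= j) = logistic(theta_j - eta) for j = 1, 2, 3 (SPSS form)."""
    return [logistic(t - eta) for t in thetas]

def category_probs(eta, thetas=THETAS):
    """P(Y = j) as differences of successive cumulative probabilities."""
    cum = cumulative_probs(eta, thetas) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def odds(p):
    return p / (1.0 - p)

non_victim = category_probs(0.0)         # reference category, x = 0
victim = category_probs(BETA_VICTIM)     # linear predictor = beta * 1
for label, p0, p1 in zip(LABELS, non_victim, victim):
    print(f"{label:22s} non-victim {p0:.3f}  victim {p1:.3f}")

# Odds of Y <= j for victims relative to non-victims, at each cut point j.
ratios = [odds(a) / odds(b) for a, b in
          zip(cumulative_probs(BETA_VICTIM), cumulative_probs(0.0))]
print([round(r, 3) for r in ratios], round(math.exp(-BETA_VICTIM), 3))
```

Victims get a lower probability of "very confident" and a higher probability of "not at all confident", and the three cumulative odds ratios coincide, which is exactly what the proportional odds assumption imposes.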
The proportional odds assumption was not rejected, since the p-value in the Sig. column (.957) is well above 0.05 (the 5% significance level). The SPSS output is given in Table 7.
Table 7: Test of Parallel Lines
Model | -2 Log Likelihood | Chi-Square | df | Sig.
Null Hypothesis | 49.120 | | |
General | 49.032 | .087 | 2 | .957
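The chi-square in Table 7 is the drop in -2 log likelihood from the proportional odds (null) model to the general model. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so the reported p-value can be checked with the standard library alone (for other degrees of freedom a routine such as `scipy.stats.chi2.sf` would be needed). Because the displayed -2 log likelihoods are rounded, the difference comes out as 0.088 rather than the .087 printed by SPSS:

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df: exp(-x/2)."""
    return math.exp(-x / 2.0)

neg2ll_null = 49.120     # -2 log likelihood, proportional odds model
neg2ll_general = 49.032  # -2 log likelihood, general (separate slopes) model

chi_square = neg2ll_null - neg2ll_general
p_value = chi2_sf_df2(chi_square)
print(f"Chi-square = {chi_square:.3f}, df = 2, p = {p_value:.3f}")
```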
The next stage is to add the demographic variables into the model. Table 8 shows the parameter estimates for this case.
Table 8: Parameter estimates for the proportional odds model (with demographic variables)
Part | Parameter | Estimate | Std. Error | Wald | df | Sig.
Threshold | [cjsovb1 = 1] | -3.323 | .060 | 3099.016 | 1 | .000
Threshold | [cjsovb1 = 2] | .063 | .045 | 1.967 | 1 | .161
Threshold | [cjsovb1 = 3] | 2.051 | .048 | 1808.076 | 1 | .000
Location | AgeMeanCentered | .008 | .001 | 66.790 | 1 | .000
Location | [sex=Male] | .009 | .029 | .102 | 1 | .749
Location | [sex=Female] | 0 (reference) | . | . | 0 | .
Location | [Type of area=urban] | .028 | .035 | .667 | 1 | .414
Location | [Type of area=rural] | 0 (reference) | . | . | 0 | .
Location | [MaritalStatusNew=Widowed] | .296 | .065 | 20.602 | 1 | .000
Location | [MaritalStatusNew=Divorced] | .230 | .054 | 17.951 | 1 | .000
Location | [MaritalStatusNew=Separated] | .080 | .081 | .975 | 1 | .323
Location | [MaritalStatusNew=Married] | .110 | .038 | 8.198 | 1 | .004
Location | [MaritalStatusNew=Single] | 0 (reference) | . | . | 0 | .
Location | [ExperienceOfCrime=victim of crime=1] | .357 | .040 | 81.390 | 1 | .000
Location | [ExperienceOfCrime = not a victim of crime=0] | 0 (reference) | . | . | 0 | .
The experience of crime variable remains significant after the inclusion of the demographic variables, with a slightly larger coefficient (0.357 compared with 0.301) and the same interpretation as before. The continuous variable age is also significant: for each one-year increase in age, the odds of giving a less confident response are exp(0.008)=1.008 times greater, or equivalently the odds of being very confident are exp(-0.008)=0.992 times as large (holding other variables constant).
For marital status, single is the reference category. The odds of giving a less confident response are exp(0.296)=1.34 times greater for those who are widowed than for those who are single (holding other variables constant). Likewise, the corresponding odds ratios are exp(0.230)=1.26 for divorced and exp(0.110)=1.12 for married respondents, each compared with single respondents (holding other variables constant). The separated category did not differ significantly from single (p = .323). Type of area and sex were non-significant.
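The odds ratios quoted above can be accompanied by 95% Wald confidence intervals, exp(estimate ± 1.96 × SE). A minimal sketch using the location estimates and standard errors exactly as printed in Table 8; since those are rounded to three decimals, the intervals are approximate. Under SPSS's sign convention an odds ratio above 1 means higher odds of a less confident response. The names `TABLE8` and `odds_ratio_ci` are hypothetical.

```python
import math

# (estimate, standard error) of the location parameters, as printed in Table 8.
TABLE8 = {
    "Age (per year)": (0.008, 0.001),
    "Male vs female": (0.009, 0.029),
    "Urban vs rural": (0.028, 0.035),
    "Widowed vs single": (0.296, 0.065),
    "Divorced vs single": (0.230, 0.054),
    "Separated vs single": (0.080, 0.081),
    "Married vs single": (0.110, 0.038),
    "Victim vs non-victim": (0.357, 0.040),
}
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def odds_ratio_ci(estimate, se, units=1.0, z=Z_975):
    """Odds ratio exp(units*estimate) with its 95% Wald confidence interval.

    `units` rescales a continuous effect, e.g. units=10 for a 10-year age gap.
    """
    return tuple(math.exp(units * v)
                 for v in (estimate, estimate - z * se, estimate + z * se))

for name, (est, se) in TABLE8.items():
    or_, lo, hi = odds_ratio_ci(est, se)
    print(f"{name:22s} OR = {or_:.3f}  95% CI ({lo:.3f}, {hi:.3f})")

# Age effect for a 10-year difference rather than a single year.
print("Age (per 10 years)", [round(v, 3) for v in odds_ratio_ci(0.008, 0.001, units=10)])
```

The intervals for sex, type of area and the separated category all contain 1, matching their non-significant Wald tests, while the interval for experience of crime lies entirely above 1.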
The proportional odds assumption was rejected for this model, since the p-value of the test of parallel lines was below 0.05. Rather than rejecting the model outright, Agresti (2013, page 307) recommends fitting separate binary logistic regressions (by collapsing the ordinal response at each cut point) and comparing the resulting parameter estimates with those from the proportional odds model. For this model the estimates did not differ drastically, so the proportional odds model was deemed a reasonable approximation.
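The collapsing step behind this check can be sketched as follows: for each cut point j, the 4-category response becomes a binary indicator, and each indicator would then be regressed on the same predictors with ordinary binary logistic regression (in SPSS, Analyze > Regression > Binary Logistic). The sketch codes the indicator as "Y > j" (less confident than category j); under SPSS's θj - βx convention this gives binary-logit slopes that are directly comparable with the proportional odds slopes. Only the valid response counts from Table 5 are used here; the real comparison would use the 17427 complete cases with their covariates. The helper name `binary_splits` is hypothetical.

```python
CATEGORIES = (1, 2, 3, 4)  # 1 = very confident, ..., 4 = not at all confident

def binary_splits(y_values, categories=CATEGORIES):
    """Return {j: [1 if y > j else 0 for each response]} for j = 1, ..., J-1."""
    return {j: [1 if y > j else 0 for y in y_values] for j in categories[:-1]}

# Valid CJSopinion counts from Table 5, expanded to one value per respondent.
counts = {1: 573, 2: 7556, 3: 7164, 4: 2159}
y = [category for category, n in counts.items() for _ in range(n)]

splits = binary_splits(y)
for j, indicator in splits.items():
    n_above = sum(indicator)
    print(f"cut {j}: {n_above} with Y > {j}, {len(indicator) - n_above} with Y <= {j}")
```

If the proportional odds assumption held exactly, the slope for each predictor would be the same in all three binary fits; Agresti's check amounts to judging whether the differences are large in practical terms.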
2.4 Assessing the listwise deletion of missing values
SPSS only allows listwise deletion of missing values when conducting ordinal regression, whereas its linear regression procedures offer alternative methods for dealing with missing values. Using linear regression for an ordinal response is admittedly more questionable and open to debate. Nevertheless, treating the response as continuous rather than ordinal leads to the same conclusions with regard to which variables are significant (Table 9).
Table 9: Parameter estimates for the linear regression model (with demographic variables)
Parameter | B | Std. Error | t | Sig.
Intercept | 2.595 | .017 | 149.591 | .000
[sex=Male] | .003 | .011 | .258 | .796
[sex=Female] | 0 (reference) | . | . | .
[Type of area=urban] | .013 | .014 | .933 | .351
[Type of area=rural] | 0 (reference) | . | . | .
[MaritalStatusNew=Widowed] | .114 | .025 | 4.499 | .000
[MaritalStatusNew=Divorced] | .092 | .021 | 4.304 | .000
[MaritalStatusNew=Separated] | .029 | .032 | .920 | .357
[MaritalStatusNew=Married] | .044 | .015 | 2.958 | .003
[MaritalStatusNew=Single] | 0 (reference) | . | . | .
[ExperienceOfCrime=victim of crime=1] | .139 | .015 | 8.982 | .000
AgeMeanCentered | .003 | .000 | 8.191 | .000
The same conclusions were also obtained when alternative methods were chosen for dealing with the missing values (pairwise deletion and mean substitution) thus giving more confidence in the results obtained for the original proportional odds regression model in Table 8.
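The two alternative treatments of missing values can be illustrated on a pair of made-up variables (hypothetical values, not survey data, with `None` marking a missing answer). Mean substitution replaces each missing value with the mean of the observed values of that variable. Pairwise deletion keeps, for each pair of variables, every case observed on both, so different statistics may be based on different subsets of cases:

```python
from statistics import mean

def mean_substitute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def pairwise_complete(xs, ys):
    """Keep only the (x, y) pairs in which both values are observed."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]

cjs = [2, None, 3, 4, None, 1]      # hypothetical CJSopinion codes
age = [30, 45, None, 60, 25, 40]    # hypothetical ages

print(mean_substitute(cjs))
print(pairwise_complete(cjs, age))
```

Here the two missing CJSopinion codes are replaced by 2.5, while the pairwise version keeps the three respondents who answered both questions.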
Section 3: Conclusions
To conclude, the results of Section 2.3 indicate that experience of crime significantly affects citizens’ overall opinion of the criminal justice system. This was true both before and after accounting for the relevant demographic variables. After accounting for these variables, victims of crime were less confident: the odds of giving a less confident response were exp(0.357)=1.43 times greater for those who had been a victim of crime than for those who had not. Age and marital status were also found to significantly affect citizens’ overall opinions of the criminal justice system. Sex and type of area (urban or rural) were not found to affect citizens’ overall opinion of the criminal justice system.
References:
Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ: Wiley.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Hoboken, NJ: Wiley.