This chapter describes the research design, conceptual framework, sampling procedure, instrumentation, research procedure and assessment of the papers. Inter-rater and instrument reliability are also discussed. Internal and external validity and threats that can jeopardize experimental validity are reviewed at the end of the chapter.
3.1 Research Design
This is an experimental study with a 2 x 2 quasi-experimental design. A study is referred to as a "quasi-experiment" when selection and assignment are non-random: unlike in a "true experiment", subjects are not randomly assigned to treatment and control groups. A quasi-experimental design is occasionally preferable to a true experimental design, in which the sample is divided into treatment and control groups at random.
Because the sample in a quasi-experimental design is selected non-randomly and may differ before the study begins, such a design is subject to the selection threat, which may affect internal validity. The ways in which such threats were reduced in this study are discussed throughout this chapter and briefly reviewed at the end of it.
As part of this quasi-experimental design, a pretest-posttest comparison group design was used to examine the effect of goal instructions for revision (independent variable) on students' writing performance (dependent variable). Both groups were exposed to all conditions of the experiment and differed only in the treatment they received, which controls some of the threats to the internal validity of the study.
The research aims to uncover the relationship between the treatments and learners' scores by giving treatments based on content and audience awareness goals for revision and by administering a pre-test and a post-test under each goal condition, as shown in Figure 3.1.
Figure 3.1. Conceptual Framework (independent variable: goal condition, delivered through treatment sessions of writing and revising based on either a general goal or content and audience goals; dependent variable: writing performance in general and in each aspect of essay writing, measured at pre-test and post-test)
A combination of stratified and convenience sampling was used to select 26 English learners from among the EFL students studying in the Department of English Language at Unity College International. On the basis of their pre-test performance, they were assigned to two conditions using systematic random sampling. The 26 students were divided into two groups of 13, General Goal (GG) and Content and Audience Goal (C*AG), so that the group means were nearly equal and the groups were homogeneous in writing performance (see Table 3.1).
Table 3.1. Student demographics
In a quasi-experiment of this kind there is no control group, and the groups are referred to as the 'treatment' and 'comparison' groups (Krathwohl, 1993; Campbell and Stanley, 1963). This study did not include a control group for the following reasons:
The study used a between-groups design in which comparison groups were used to investigate the research questions. The comparison was made between groups receiving different treatments (the independent variable). This type of design, referred to as a "comparison group design", is described in Mackey and Gass (2005, p. 146).
It was not feasible to include a control group, because the treatment groups received different revision instructions for the tests according to their treatment (goal setting). It would therefore not have been possible to devise tests and revision instructions for a control group that had received no treatment.
Most importantly, a control group would have had to take the post-test, which involves writing and revising with instructions that make students aware of the goals (GG and C*AG), that is, the very treatments of this study. Taking the post-test would thus expose the control group to the treatment itself, so there would be no difference between the control and experimental groups.
The research was conducted at the Centre of English Language, Unity College International (UCI) (see Appendix 8). UCI is a private college in Malaysia that offers various market-driven undergraduate and postgraduate academic courses and also runs Human Resource Training and Development courses. Students who intend to enroll in any field at the college must sit an English placement test conducted by the Centre of English Language, which determines whether they need to attend Intensive English Language classes.
The Intensive English Program is UCI's core program for entry into tertiary study; it helps prepare students to communicate with ease in a global environment. It is designed to help students who are weak in English gain the fluency they need to prepare assignments, understand textbooks, and present reports. The program has three levels, from elementary to advanced, and includes speaking practice and conversation, reading, writing, and multimedia lab work.
This site was chosen for two reasons. First, the study set out to investigate the effect of revision goals on the writing performance of EFL learners, and many EFL students attended English classes at the site. Second, students attended classes five days a week, five hours per day, which gave the researcher enough time to deliver an appropriate treatment.
A total of 26 EFL students from the Centre of English Language were chosen for this study. Because the characteristics of the entire population were the main concern, a combination of stratified and convenience sampling was used to select these students from among all EFL students at different proficiency levels in the Centre, based on class level and the students' willingness to participate.
These subjects were chosen because they were learners of English as a Foreign Language (EFL), from countries such as China, Iran, Vietnam, and Saudi Arabia. Furthermore, the courses they needed to pass were relevant to the current study, which focuses on writing. Their English classes covered the four skills of speaking, listening, writing, and reading, so the students were familiar with writing essays on a given prompt within a limited time. However, according to their teachers, they had some difficulty writing five-paragraph essays because they did not know how to organize paragraphs or express their ideas smoothly and logically.
A small number of students was chosen because the instructor (the researcher herself) needed to spend enough time with each individual to teach the revision instructions. Moreover, close control over individuals was needed during the treatment to control threats to the experiment such as diffusion, which would not have been possible with a large number of participants.
To control the mortality threat, the researcher collected adequate information about UCI, the students, and their visa status for the following few months. Selecting students who were willing to participate in the experiment further reduced the threat of mortality.
A writing test was given to all 26 students as a pre-test to measure their overall writing performance, and on the basis of this test the students were divided into two conditions, GG and C*AG.
The other instruments were three writing tests, each followed by the GG or C*AG instructions for revision (see Appendices 1 and 2). These instruments were intended to familiarize the students with developing a five-paragraph argumentative essay and with revising it using the goal instructions provided during the treatment. In addition to the writing tests used in the treatment sessions, another writing test was used as the post-test.
Argumentative prompts from the Educational Testing Service (ETS), covering a variety of subjects, were chosen for the students to write about. Argumentative prompts were chosen because the role of the audience is more tangible and understandable to students in this type of writing than in other types. Moreover, students at UCI were most familiar with argumentative writing and knew little about other types. The prompts given to the students in both conditions were as follows:
Do you agree or disagree with the following statement? Parents are the best teachers. Use specific reasons and examples to support your answer.
If you could change one important thing about your hometown, what would you change? Use reasons and specific examples to support your answer.
Many students have to live with roommates while going to school or university. What are some of the important qualities of a good roommate? Use specific reasons and examples to explain why these qualities are important.
In some countries, teenagers have jobs while they are still students. Do you think this is a good idea? Support your opinion by using specific reasons and details.
People learn in different ways. Some people learn by doing things; other people learn by reading about things; others learn by listening to people talk about things. Which of these methods of learning is best for you? Use specific examples to support your choice.
The researcher tried to choose interesting prompts, because students who are not attracted to a subject, or cannot relate to it, are unlikely to do their best; as Worthy, Broaddus, and Ivey (2001) note, "strong writing begins in an interest to the topic" (p. 146). The prompts were also chosen to be appropriate to the students' age and unbiased with respect to gender, religion, and background.
In addition, the holistic 0-9 scale used in IELTS was used to mark the essays throughout the study, in order to measure students' writing performance on the four aspects of essay writing assessed in IELTS: Task Response (TR), Cohesion and Coherence (CC), Lexical Resource (LR), and Grammar Range and Accuracy (GRA). This scale was chosen, first, because it divides essay writing into four aspects, which are the main concern of this study. Second, the teachers (the raters in this study) were familiar with it, since a similar scale was used in the Centre of English Language to score students' writing.
To control the instrumentation threat, the measurement scale was not changed between the pre-test and post-test, and the raters used the same instrument to mark the papers throughout the study. Using the same scale throughout also helps ensure consistency in the raters' scoring.
This study was designed to test the effect of revision goals focused on content and audience on students' writing performance and on aspects of essay writing. The first aspect of essay writing on the scale was task response, which refers to the extent to which student writers address all parts of the task. It also reflects how well developed the position they present in answer to the question is, and how relevant, extended, and supported their ideas are. Students who fully address all parts of the task and present a fully developed position with relevant, fully extended, and well-supported ideas receive a 9 for TR; those who do not receive lower scores.
Another aspect considered in the current study was cohesion and coherence (CC), which refers to the arrangement, sequence, and organization of the information and ideas in the essay, and to how skillfully the writer manages paragraphing. Those who use cohesion in such a way that it attracts no attention and who manage paragraphing skillfully and appropriately receive a 9, while those who have very little control of organizational features and fail to communicate any message receive the lowest score.
Lexical resource (LR) was the third aspect of the scale. To receive a 9 for LR, student writers need to use a wide range of vocabulary with very natural and sophisticated control of lexical features, with rare minor errors occurring only as 'slips'.
The last aspect was grammar range and accuracy (GRA): students who use a wide range of structures with full flexibility and accuracy, making only rare minor errors, can receive a 9 for this aspect; otherwise, they receive a lower score.
3.4.1 Reliability of the Scale
Another instrument in this study was the public version of the holistic 0-9 scale used to score essays by the University of Cambridge ESOL Examinations, IDP: IELTS Australia, and the British Council. Although it is a validated, internationally known instrument, its reliability was also checked using scale reliability analysis in SPSS. The scale was found to be highly reliable, with α = .96, .95, .96, and .96 for TR, CC, LR, and GRA, respectively.
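As a hedged illustration of the reliability statistic reported above, the sketch below computes Cronbach's alpha, assuming that the α reported from SPSS is Cronbach's alpha (the default model in SPSS's Reliability Analysis). The chapter does not say which scores were treated as the "items" (for example, the two raters' scores on one aspect), so the function name `cronbach_alpha` and the sample numbers are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items scored on the same n essays.

    items[i][j] is the score that item i (hypothetically, one rater)
    gave to essay j. Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must score the same essays")
    # Total score per essay, summed across all items
    totals = [sum(item[j] for item in items) for j in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores from two raters on five essays (not study data)
rater_1 = [4.0, 5.5, 6.0, 3.5, 7.0]
rater_2 = [4.5, 5.0, 6.5, 3.5, 6.5]
print(round(cronbach_alpha([rater_1, rater_2]), 3))
```

Alpha rises toward 1 as the items' variances become small relative to the variance of the essay totals, that is, as the items rank essays consistently.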
3.5 Research Procedure
A total of 26 EFL students were chosen from among all English learners in the Centre of English Language, Unity College International (UCI), to participate in this study. They were chosen on the basis of their teachers' suggestions and their willingness to participate, in order to reduce the mortality threat.
The students, who came from different countries and had different English language backgrounds, had been assigned to three levels (low, medium, and high) based on their performance in the placement test given by the Centre of English Language. However, these levels were not taken into account in the current study, because promotion to a higher-level class depended only on the passage of time and class attendance. In other words, some students in the medium level were actually better than some in the high level. Therefore, a pre-test was conducted to divide the students into two homogeneous and comparable groups.
To divide them into two homogeneous goal conditions, all of the students were given a writing test and asked to spend about 40 minutes (the standard time in IELTS writing for completing a five-paragraph essay on a given prompt) writing an essay on the chosen prompt. The pre-test prompt was "Do you agree or disagree with the following statement? Parents are the best teachers." The students were told to use specific reasons and examples to support their answer (see Appendices 4 and 5).
After the test, the papers were photocopied and made anonymous by the researcher to avoid biased scoring. Two independent raters scored the essays on four aspects of essay writing (TR, CC, LR, and GRA) using the holistic 0-9 scale used in IELTS. Each aspect received a score from 0 to 9, and the total score for an essay was the sum of the four aspect scores divided by four.
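The total-score rule above (the four aspect scores summed and divided by four) can be written as a short function. This is a sketch: the name `essay_total` and the range check are illustrative assumptions, and the chapter does not say whether totals were rounded to IELTS half-bands, so no rounding is applied.

```python
def essay_total(tr: float, cc: float, lr: float, gra: float) -> float:
    """Total essay score: (TR + CC + LR + GRA) / 4, each aspect on the 0-9 scale."""
    aspects = (tr, cc, lr, gra)
    for score in aspects:
        if not 0 <= score <= 9:
            raise ValueError(f"aspect score {score} is outside the 0-9 scale")
    return sum(aspects) / 4


# Hypothetical aspect scores for one essay (not study data)
print(essay_total(tr=6, cc=5, lr=6, gra=5))  # 5.5
```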
After the papers were scored, the 26 students were divided into two groups of 13, the General Goal condition (GG) and the Content plus Audience Goal condition (C*AG), on the basis of their pre-test performance using systematic random sampling. Selection bias was controlled by using the pre-test and systematic random sampling to form homogeneous experimental groups.
3.5.3 Justification of the Method
In quasi-experimental research, it is essential for the treatment groups to be homogeneous. To achieve this, the scores of all 26 students were first arranged in descending order. The students with the highest and the lowest scores (7 and 1.5) were assigned to the GG condition, and the students with the second highest and second lowest scores (6.5 and 1.5) were assigned to the C*AG condition (see Table 4.1). The remaining students were assigned to the goal conditions in the same way. As a result, the pre-test means of GG (Mean = 4.077, SD = 1.681) and C*AG (Mean = 4.038, SD = 1.478) were almost equal, meeting the assumption of homogeneity.
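The assignment procedure above can be sketched as follows: scores are sorted in descending order, the highest and lowest go to GG, the next highest and next lowest go to C*AG, and so on, alternating. With 26 students this produces 13 pairs, so strict alternation would give one condition an extra pair; the source does not say how that was resolved, and the sketch assumes the innermost pair is split, one student per condition. The function name and the example scores are hypothetical.

```python
from statistics import mean


def assign_by_extremes(scores: list[float]) -> tuple[list[float], list[float]]:
    """Split pre-test scores into GG and C*AG by pairing extremes.

    Pairs are (highest, lowest), (2nd highest, 2nd lowest), ... and go
    alternately to GG and C*AG. If the number of pairs is odd, the innermost
    pair is split between the conditions (an assumption, not from the source).
    """
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    pairs = [(ordered[i], ordered[n - 1 - i]) for i in range(n // 2)]
    gg: list[float] = []
    cag: list[float] = []
    split_last = len(pairs) % 2 == 1
    full_pairs = pairs[:-1] if split_last else pairs
    for k, pair in enumerate(full_pairs):
        (gg if k % 2 == 0 else cag).extend(pair)
    if split_last:
        higher, lower = pairs[-1]
        cag.append(higher)  # GG already received the top score
        gg.append(lower)
    if n % 2 == 1:
        # A leftover middle score goes to the smaller condition
        (gg if len(gg) <= len(cag) else cag).append(ordered[n // 2])
    return gg, cag


# Hypothetical pre-test scores (not study data)
gg, cag = assign_by_extremes([7, 6.5, 6, 5, 4, 3, 2, 1.5])
print(gg, cag)  # [7, 1.5, 6, 3] [6.5, 2, 5, 4]
print(mean(gg), mean(cag))
```

Pairing extremes keeps the group means close; it is a matched, deterministic assignment rather than a random draw.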
Using comparable groups helped to control some threats to internal validity, such as interaction factors, which occur when subjects with different maturation rates are selected into the experimental groups. Using the pre-test and systematic random sampling to form homogeneous experimental groups helped to control selection bias, which arises from non-random selection of the sample. The threat of statistical regression was controlled by ensuring that each condition contained participants with comparable average grades, because extreme scores concentrated in one condition would affect the outcome.
3.5.4 Goal Conditions Treatment
Aware of the factors that might threaten the results of the experiment, the researcher used the treatment sessions mainly to let students practice writing and revising before the final test. Students' unfamiliarity with the revision process and with the goal instructions they were to use to revise their essays could have affected their performance in the writing tests and, consequently, the results of the study. Therefore, they needed to become familiar with the nature of the writing tests they would sit after the treatment and, most importantly, with the goal instructions they were to use, by practicing the same task in advance.
Since the aim of this study was to investigate the effects of different revision goals, the students in the two conditions had to be physically separated during the treatment sessions to prevent the effects of one treatment from spreading to the other, known as the diffusion threat.
3.5.4.1 General Goal Condition Treatment
The experiment consisted of three separate writing tests, each followed by a revision session. Each writing test was carried out within one week. During the experiment, students in the GG condition were taught how to write a five-paragraph essay and were asked to write about the given prompts in 40 minutes; they were told that they would have extra time to revise their first drafts in the following session. In each session, the prompt was read aloud and explained to the students, who were also given a written copy of it and a sheet on which to write their first draft.
Students in the GG condition were informed about the importance of revision in the writing process. The instructor explained that expert writers use revision to improve the quality of their writing, since they believe that after completing a draft a writer needs to take a further step: reviewing and evaluating the draft. The students were then given a written copy of their assigned instructions and were trained to use the general goal instructions (adopted from Midgette et al., 2008, p. 138) to reread their essays and make improvements. The general goal instruction was broad, asking students to improve their essays in general, and contained no explicit cues or guidelines focused on content or audience awareness.
Following the general goal instruction (see Appendix 1), students in the GG condition were required to read and reread their first draft carefully, evaluate what they had written, look for areas that could benefit from revision, and make corrections while reading. They were also asked to write the final draft, with all changes and revisions, on new sheets provided in advance, and then hand it in to the instructor.
3.5.4.2 Content plus Audience Goal Condition Treatment
The C*AG condition followed the same schedule: three separate writing tests, each followed by a revision session, with each writing test carried out within one week. As in the GG condition, students in the C*AG condition were taught how to write a five-paragraph essay and were asked to write about the given prompts, which were similar to those used in the GG condition, in 40 minutes. They were also told that they would have extra time in the following session to revise their first drafts. In each session, the prompt was read aloud and explained to the students, who were also given a written copy of it and a sheet on which to write their first drafts.
After writing their first drafts, students in the C*AG condition were taught about the importance of revision and how expert writers benefit from it to improve the quality of their writing. They then received a copy of their assigned goal instruction, which differed from that of the GG condition, and were trained to use it. The instruction assigned to the C*AG condition (see Appendix 2) focused on content and audience awareness and required more explanation to make the revision goals clear to the students.
For the content-based instructions (adopted from Midgette et al., 2008, p. 138), the instructor made the students aware of the steps a writer takes in the writing process, namely writing a draft and reading it several times to make improvements. The students were told to make sure that they had taken a clear position on the subject, supported their opinion with at least three reasons, elaborated each reason with relevant examples and evidence, and ended with a conclusion.
Next, the instructor used the audience-based instructions (adapted from Reid, 1988, p. 80) to teach the students how to revise for an audience, telling them that it is also important to think about the audience and their "interests, experiences, education, prejudices" (Reid, 1988, p. 80). They were told to think about the kind of relationship they have with their audience and how successfully their essays communicate with it. The instructor also asked them to check whether their introduction attracts the audience's attention and gives the audience the writer's clear opinion on the topic. They were told to consider the audience's needs and expectations while revising the body paragraphs and to make sure that enough information was presented about the topic. They were also reminded to give the necessary details in the body paragraphs to support their ideas.
Finally, the instructor explained the need for a conclusion that helps the reader understand the importance of the essay and the writer's main idea. After explaining all the content plus audience goal instructions for revision, the instructor asked the students to start reading and rereading their first drafts, focusing on "specific reconsiderations" during each rereading (Reid, 1988, p. 80).
In the current study, common classroom teaching techniques and strategies were applied. To reduce the artificiality threat, the instructor made every effort to act as a regular teacher, building rapport with the students and helping them learn and improve. Therefore, the subjects assigned to the experimental groups did not experience an artificial atmosphere. Moreover, the instructor tried to minimize the threat of contamination by remaining unbiased and teaching students in both the GG and C*AG conditions equally.
After the treatment sessions, students were told that they would take another writing test as their final test in the following session. They were told that writing and revising had to be done in a single session, with no interval between writing the first draft, revising it, and writing the final draft.
In the post-test session, students in the GG and C*AG conditions were separated into two classes and observed by a teacher. Unlike in the treatment sessions, students were not allowed to ask questions or discuss ideas with either the teacher or the researcher, since they had already been taught how to use the instructions to revise their essays.
The prompt for both goal conditions was "People learn in different ways. Some people learn by doing things; other people learn by reading about things; others learn by listening to people talk about things. Which of these methods of learning is best for you?" (see Appendices 6 and 7). Students were given the written prompt and sheets on which to write their first draft. They were then given their assigned goal instructions to revise their drafts, and they wrote the final draft, with all their corrections, on new sheets.
The whole experiment was completed within eight weeks. The pre-test, treatment sessions, and post-test were conducted at specified, short intervals to control the history threat, that is, events occurring between sessions and test administrations that could affect the outcome of the study.
3.6 Assessment of the Papers
At the end of each test, the researcher collected the papers from the students in each condition, GG and C*AG. To avoid rater bias, the essays were photocopied, anonymized, and coded before being passed to the raters, so that the raters graded the papers without knowing the goal conditions or the students.
All 26 papers from the GG and C*AG conditions were scored independently by two raters, both English teachers at the same college. The papers were rated on the holistic 0-9 scale used to assess essays in IELTS (see Appendix 3), adapted from the version published online by the University of Cambridge ESOL Examinations.
The scoring guide directed raters to judge the overall writing performance of a paper based on four aspects of essay writing, Task Response (TR), Cohesion and Coherence (CC), Lexical Resource (LR), and Grammar Range and Accuracy (GRA).
In judging task response, the raters were directed to consider the degree to which a student's argument stated a clear opinion about the topic, provided supporting reasons for that opinion, elaborated the reasons with examples and explanations, and addressed alternative opinions. The scale asked raters to consider the presence, clarity, relevance, and significance of the content. Essays were also rated for cohesion and coherence, lexical resource, and grammar range and accuracy. Cohesion refers to a close relationship, based on grammar or meaning, between parts of a sentence or of a larger piece of writing, and coherence refers to the organization of, and smooth connections among, the parts of the essay.
Lexical resource refers to the use of appropriate vocabulary adapted to the audience, the range of vocabulary used in the essay, and natural and sophisticated control of lexical features. The last aspect of essay writing on the scale, grammar range and accuracy, refers to the range of structures in the essay, flexibility and accuracy in their use, and sophisticated control over them.
The two independent raters worked as English teachers at the same college (UCI). Both were experienced teachers and raters, with more than four years of experience teaching English skills to non-native English speakers. Moreover, they were used to marking essays on aspects of essay writing close to those of the 0-9 scale used in this study.
Before the experiment, the instructor trained the raters in a formal session on how to use the scale to score the papers. All aspects (TR, CC, LR, and GRA) were explained in detail so that the raters knew what to look for in the essays.
3.6.2 Inter-rater Reliability
Inter-rater reliability was computed as the Pearson correlation between the two raters' scores (see Table 3.2). In addition, a t-test was computed to determine whether the mean scores given by the two raters differed (see Table 3.3). A high correlation between the raters and no significant difference between their means indicate that the scores were reliable.
Table 3.2. Inter-rater reliability (Pearson Correlation)
Nature of the Test    Pearson r
Pre-test              .884*
Post-test             .854*
Note: * Correlation is significant at the .01 level (2-tailed).
According to Table 3.2, the correlation between the two raters' scores was .884 in the pre-test and .854 in the post-test, indicating a high, significant positive correlation.
Table 3.3. Inter-rater reliability (t-test)
Nature of the Test    t        p
Pre-test              1.873    .07
Post-test             .738     .46
Note: * Significant at the .05 level.
According to Table 3.3, no significant difference was found between the raters' means (pre-test: t = 1.873, p = .07 > .05; post-test: t = .738, p = .46 > .05). Therefore, the scores were considered reliable.
3.7 Threats to Experimental Validity
The validity of experimental research depends on many factors that can threaten it and affect the outcomes (Shadish, Cook, and Campbell, 2002). For a study to be valid, the factors that can jeopardize its internal and external validity must be controlled; in other words, the fewer the threats, the more valid an experimental design will be. Campbell and Stanley (1963) define two types of experimental validity: internal and external.
3.7.1 Factors That Jeopardize Internal Validity
An experiment has internal validity only if the independent variables have a genuine effect on the dependent variables. This depends mostly on there being adequate evidence that the treatment, rather than some other factor, produced the observed effect (Campbell and Stanley, 1963).
As discussed earlier, there are factors that can threaten internal validity:
Selection bias, which may occur because of non-random sampling. This threat was controlled by using the pre-test and systematic random sampling to form homogeneous experimental groups.
Mortality, which occurs when participants are lost. This threat was reduced by choosing students according to their willingness to participate in the study and by obtaining adequate information about UCI, the students, and their visa status.
Diffusion, which occurs when the comparison group imitates the treatment group after learning about it. Separating the students physically into two classes held at different times helped to control this threat.
Interaction factors, which occur when subjects with different maturation rates are selected into the experimental groups. These were controlled by using comparable groups.
Statistical regression, which refers to the tendency of subjects who score extremely high or low on a pre-test to score closer to the mean on a post-test. The way the students were divided into two homogeneous groups, with comparable average grades in each condition, helped to control this threat.
History, which refers to events that happen outside the experiment and can affect its outcome. A long interval between the pre-test and post-test would increase this threat, so it was controlled by conducting the tests at specified, short intervals.
3.7.2 Factors That Jeopardize External Validity
External validity refers to the 'generalizability' of the treatment outcomes (Shadish et al., 2002; Campbell and Stanley, 1963). Although it cannot be measured mathematically, it can be defined as the degree to which the outcomes of an experiment can be generalized to the population from which the sample was selected. The method of subject selection, control of procedures, and choice of an appropriate design can all help increase the external validity of an experiment. One recommended way to increase the generalizability of the outcome is to replicate the experiment. Threats to external validity include the following:
3.7.2.1 Interaction Effect of Testing
Pre-testing is common in experimental research. However, a pre-test may interact with the treatment, so the results may not generalize to a population that has not been pre-tested. In this study, a writing test was given to the students as a pre-test. This test is unlikely to have had any special effect on the treatment or the results because, first, students in the Centre of English Language were used to writing essays and, second, they received no feedback after their papers were scored.
3.7.2.2 Interaction Effects of Selection Biases and the Experimental Treatment
This threat arises when characteristics of the selected experimental group, which would not be present in randomly formed groups, interact with the treatment. In this study, systematic random sampling was used to form the experimental groups, which reduced this threat.
3.7.2.3 Artificiality of the Experimental Setting
Controlling extraneous variables may create an artificial atmosphere, and artificiality of the environment is one of the main obstacles in experimental research: the setting is sometimes too different from the authentic situation for the results to be generalized (Brewer, 2000; Levine and Parkinson, 1994). In the current study, however, common classroom teaching techniques and strategies were applied, and the instructor made every effort to act as a regular teacher, building rapport with the students and helping them learn and improve. Therefore, the subjects assigned to the experimental groups did not experience an artificial atmosphere.
3.8 Data Analysis
3.8.1 Data Screening
After the data were collected, the researcher entered the scores and organized them for analysis. The data were also screened to make sure there were no outliers, as a safeguard against statistical regression. SPSS version 17 was used to analyze the data. All scores from the students' writing were entered and analyzed, and no data were removed. In this way, the researcher controlled the regression threat.
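The chapter does not say how outliers were identified. As one common option, the hypothetical helper `tukey_outliers` below flags scores outside Tukey's fences (more than 1.5 × IQR below the first quartile or above the third). It uses `statistics.quantiles` with its default "exclusive" method, whose quartiles may differ slightly from those of other software such as SPSS. The scores are invented for illustration.

```python
from statistics import quantiles


def tukey_outliers(scores: list[float], k: float = 1.5) -> list[float]:
    """Return scores lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(scores, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if s < low or s > high]


# Hypothetical total scores; 9.0 is far above the rest
print(tukey_outliers([4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 9.0]))  # [9.0]
```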
3.8.2 Descriptive Statistics
Measures of central tendency and dispersion were used to describe the data: the mean and standard deviation were used to describe the dependent variables, namely the essay writing scores.
3.8.3 Inferential Statistics
An independent t-test was used to compare pre-test and post-test scores between the GG and C*AG conditions. For the pre-test, this test helped to confirm that the groups were homogeneous, with no significant difference between their means. Independent t-tests were also used to compare the groups' scores on each dependent variable (aspect of essay writing) in both the pre-test and post-test.
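As a hedged illustration of the between-group comparison, the sketch below computes Student's independent-samples t (pooled variance, equal variances assumed) from group summaries. The chapter does not report which variant of the test was used, so equal variances are an assumption here. The final lines plug in the pre-test means and SDs reported in Section 3.5.3 with n = 13 per group; the function name `pooled_t` is hypothetical, and the p-value, which needs the t distribution, is not computed.

```python
import math


def pooled_t(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> tuple[float, int]:
    """Student's independent-samples t from means, SDs, and group sizes."""
    df = n1 + n2 - 2
    # Pooled variance, weighting each group's variance by its degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Pre-test summaries reported in Section 3.5.3 (GG vs C*AG, n = 13 each)
t, df = pooled_t(4.077, 1.681, 13, 4.038, 1.478, 13)
print(f"t({df}) = {t:.3f}")
```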
A dependent (paired) t-test was used to compare pre-test and post-test scores within each group, both for overall writing and for each aspect of essay writing.
Pearson product-moment correlation was used to determine the inter-rater reliability of the two raters' scores. Data analysis is discussed further in the next chapter.