The reliability, validity and utility of self assessment
Paper Type: Free Essay | Subject: Education
Wordcount: 5430 words | Published: 27th Apr 2017
This paper is a critical review of Ross's paper, in which he reviews research on the reliability, validity and utility of self-assessment as a technique for improving learning. In his findings Ross (2006) reported that self-assessment produced consistent positive results in terms of raising student achievement and improving behaviour for learning. According to the research, the strength of self-assessment lay in training students in the technique of assessing their own work. The stated purpose of Ross's paper was to discuss four important questions posed by teachers on the subject of self-assessment. The same set of questions, stated below, forms the core of the discussion in this paper:
Is self assessment a reliable assessment technique?
Does self-assessment provide valid evidence about student performance?
Does self assessment improve student performance?
Is self assessment a useful student assessment technique?
In this paper I argue that, whereas these may be fundamental questions to teachers (Ross, 2006), it is important to take a critical look at why they were found to be relevant to the subject of assessment, why they were raised and who would benefit from the result of their investigation. I also analyse the evidence-based assertions made by Ross regarding self-assessment and the literature he used to establish his findings. I will start by discussing assessment in general, what it is and what its purposes are, before I embark on the critical analysis of Ross's work.
The Purpose of Assessment
School and schooling are about assessment as much as they are about teaching and learning. Black and Wiliam (1998a) define assessment in education as "all the activities that teachers and students alike undertake to get information that can be used diagnostically to discover strengths and weaknesses in the process of teaching and learning" (Black and Wiliam, 1998a:12). Assessment can therefore be a means of performance motivation for all stakeholders in the education system, from the student right up to the policy maker. Whereas assessment in schools may serve many purposes, Black (1998b) sums them up into three main ones, namely support for learning, certification and accountability. The discussion throughout this paper, however, is confined to self-assessment as a formative form of assessment with the main objective of supporting students' learning. Whilst Freeman and Lewis (1998) agree that assessment in general can have a big influence on pupils' learning, they accept that, paradoxically, it can work against learning if teaching is done to the test (i.e. with a focus on passing tests) while ignoring the significance and understanding of the concepts. This, they reiterate, "tends to encourage a passive reproductive form of learning" (Freeman and Lewis, 1998:7), which defeats the purpose of assessment. Ross (2006) points out at the outset of his paper that assessment can be more of a stimulant to learning, prompting and motivating students by giving them regular practice so that they can see how well they are doing against the learning outcomes. Similarly, he asserts that giving prompt feedback on any tests done provides information that will help learners diagnose their strengths and weaknesses and so improve their learning and understanding of concepts.
Accordingly, as Freeman and Lewis (1998) suggest, involving the learners themselves through the use of the self-assessment technique tends to help them understand their weaknesses better and aids them in planning what to do next, so that they take responsibility for their own learning. This is the central theme of Ross's paper and forms the basis of the step-by-step analytical review in this paper.
Is Self-Assessment a reliable assessment technique?
Before addressing the questions on self-assessment in this paper, I will focus briefly on some of the literature by proponents of the self-assessment technique, such as Boud (2004) and Orsmond (2004), among others. In general terms, self-assessment is what happens every time we do something and look back, questioning or judging ourselves and making decisions about what we have just accomplished and what the next step should be (Boud, 2004). Self-assessment means more than students grading their own work. It means involving the students in the processes of determining what is good for their learning and how they can achieve it. It requires them to consider the characteristics of a good piece of work and how they can apply these to their own work (Boud, 2004; Orsmond, 2004). Because the identification of standards and criteria used in self-assessment involves many activities, an effective self-assessment process requires a great deal of preparation if it is to serve its intended purpose. This paper will address the issues raised in Ross's paper (Ross, 2006) regarding the various aspects of self-assessment and its benefits to students, teachers and parents.
Self-assessment is any activity which entails the learner rather than the teacher taking the lead (Brooks, 2002:8). As asserted by Boud (1986) in Orsmond (2004), it is the 'involvement of learners in identifying standards and/or criteria to apply to their work and making judgements about the extent to which they have met these criteria and standards' (p.8). Regardless of the circumstances, the most important feature of self-assessment is 'who assumes the lead and who benefits in the process' (Brooks, 2002:68). Whereas Ross (2006) affirms that effective self-assessment helps pupils to become better learners, heightening self-awareness and deepening their insight into the assessment process, this paper takes on the task of identifying the features that make the process effective. One of these is reliability, which is discussed in this section. Reliability as used by Ross refers to 'the consistency of the result produced by a measurement tool under different circumstances' (p. 2). Walkin (1991) likewise describes reliability as "the extent to which an assessment is consistently dependable and reliable when carried out by different assessors or by a single assessor with different candidates, or at different times of day and in different places" (p.10). In this section an attempt will be made to relate these definitions and descriptions of reliability to the evidence provided in Ross's paper regarding the reliability of self-assessment. The subsequent paragraphs analyse the extent to which Ross used various scholars to establish how self-assessment can be a reliable assessment technique.
Ross (2006) introduced his paper by observing that the majority of teachers researched were found to be using self-assessment widely, although they still had doubts about the reliability of the technique. According to Ross (2006), these doubts centred on the possibility of two extremes occurring among the students. On the one hand, it was found that students who were poorly motivated and confused tended to over-estimate and inflate their achievement out of self-interest, whilst on the other hand those who were regarded as 'good kids' underestimated their achievement. Ross (2006) observed that this discrepancy could result in what he called a 'construct-irrelevant variance' (ibid.), which would most likely threaten the reliability of grading. One would still question, however, the authenticity of the circumstances under which learners are observed to be either 'good kids' or 'ill-motivated and confused'. The question of who sets the criteria and who determines the good and bad learners is contentious, since a lack of consistency in the learning environment and students' failure to cope under different circumstances may make one student's good day a bad one for another. Likewise, students who fall into the weak category may retreat into disillusionment to the detriment of their performance, which in turn could affect the reliability of self-assessment. This paper will explore further the concept of good students and low achievers and the effect it has on their performance in self-assessment. Drawing on Klenowski's definition of self-assessment (Klenowski, 1995 in Ross, 2006), Ross describes the process as bearing a formative element which aims to improve student learning.
Regarding the reliability of self-assessment, Ross found what he called a 'high level of internal consistency', which typically refers to the 'ability of the technique to yield consistent results under different circumstances' (Walkin, 1991:10). Ross (2006) used examples of results from his own research coupled with that of other scholars such as Rolheiser and Hogaboam-Gray (2002-b), who reported high internal consistency in Mathematics and English. Further evidence he cited concerned consistency across tasks, quoting Fitzgerald, Gruppen and White (2000), who examined the self-assessment of medical students and found a high level of consistency in the students' results across a range of tasks, in particular performance in the 'examination of standardized patients' and 'interpretation of test results' (Ross, 2006: 4).
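To make the idea of internal consistency concrete, the sketch below computes Cronbach's alpha, a statistic commonly used to quantify it. The essay does not say which statistic the reviewed studies used, so the choice of alpha, the function name `cronbach_alpha` and the sample ratings are all assumptions made purely for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of self-assessment ratings.

    scores: one row per student, each row holding that student's rating
    on every rubric item (hypothetical data layout, not taken from Ross).
    """
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least two students and two items")

    # Variance of each item (column) across students.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Invented ratings: four students, three rubric items, scale 1 to 5.
ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1]]
print(round(cronbach_alpha(ratings), 3))
```

Values close to 1 indicate that the items rise and fall together across students, which is what a 'high level of internal consistency' would look like in numerical terms.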
The frequency of assessment is another factor Ross identified as having a bearing on the reliability of self-assessment. Ross (2006) cites scholars like Blatchford (1997), whose research findings indicated that there was less consistency in the results of tasks which were less frequently assessed, therefore indicating less reliability. Likewise, findings from a study by Sung, Chiou and Hou (2005) revealed greater reliability (high consistency) when the time periods between assessments were shorter. The age of the participating students was another factor found to have a bearing on the reliability of self-assessment. The reviewed research showed that the younger the students, the less reliable the results were; older students tended to be more realistic in their approach to self-assessment of their performance, reflecting a higher level of reliability (ibid, in Ross 2006:3).
In answer to the question of whether self-assessment is a reliable assessment technique, Ross (2006) used a considerable amount of literature and backed his findings with evidence-based scholarly citations, ranging from more than a decade old to the most recent on the subject of assessment. He summarised his findings on this question by observing that there was enough evidence to support self-assessment as a reliable technique. Nevertheless, Ross (2006) emphasised that the level of reliability tends to be higher when the students are properly trained to evaluate their work and when assessment is done more frequently over short periods of time. Likewise, it is less consistent when assessment is done over longer periods, and especially so among younger children. In his reflections on reliability Ross makes no mention of inconsistency as a result of good or bad students but points to age as a factor, where young children can have a less realistic approach to self-assessment. This paper will discuss further evidence on the relevance of Ross's work to the subject of assessment and in whose interest it was published. In the following section I present an analysis of Ross's attempt to address the question of the validity of the self-assessment technique.
Does Self-Assessment provide valid evidence about student performance?
Black (1998) suggests that a test is considered to be valid if it measures that which those who prepared it intended to measure. In his paper, Ross (2006) defines validity in self-assessment as "agreement with teacher judgement" or "peer rankings" (p.3). In other words, the validity of self-assessment becomes more evident the more closely the outcomes of the triangulation process agree. Whereas Ross's analysis of the research results from 48 studies of university students (Boud and Falchikov, 1989) revealed positive results regarding validity, there was concern regarding the quality of the studies. For instance, it was found that there were unexplained variations in what constituted agreement between the self-assessed and the teacher-assessed result, that the criteria used by teachers and students were undefined, and that there was a "lack of replications involving comparable group of student" (ibid, p. 3).
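As a rough illustration of what "agreement with teacher judgement" can mean in practice, the sketch below compares invented self-assessed marks with teacher marks in two ways: a Pearson correlation (do the two sets of marks rank students similarly?) and a mean difference (do students over- or underestimate on average?). The essay does not specify the statistics used in the reviewed studies, so both measures, the function names and the data are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of marks."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("marks do not vary; correlation is undefined")
    return cov / (spread_x * spread_y)


def mean_difference(self_marks, teacher_marks):
    """Average of (self mark - teacher mark); positive means overestimation."""
    if len(self_marks) != len(teacher_marks) or not self_marks:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [s - t for s, t in zip(self_marks, teacher_marks)]
    return sum(diffs) / len(diffs)


# Invented marks out of 100 for five students.
self_marks = [72, 65, 88, 54, 91]
teacher_marks = [68, 66, 85, 45, 92]
print(round(pearson_r(self_marks, teacher_marks), 3))
print(mean_difference(self_marks, teacher_marks))
```

The two measures answer different questions: students can rank themselves in the same order as the teacher does (high correlation) while still inflating every mark (positive mean difference), which is one reason unexplained variation in "what constituted agreement" matters.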
Given the likely discrepancies, Ross gives several reasons why self-assessments can at times be higher than teacher ratings. First, citing scholars such as Aitchison (1995), he notes that overestimates are likely if the self-assessment contributes to the final grade of a course (Boud and Falchikov, 1989, in Ross, 2006:3). Secondly, the age of the participating students was again found to be a factor with a bearing on validity, just as it was for the reliability of self-assessment discussed in the preceding section. It was found that the younger the children, the more likely they were to overestimate their performance. This phenomenon was attributed to a possible lack of cognitive skills as well as over-ambition about their achievements. Ross (2006) established this by making reference to Butler (1990), who found that the correlation between self-assessment and teacher assessment increased with age. However, Ross (2006) further attributed a high rate of student-teacher agreement to training the students in how to assess their work properly (Ross et al, 1999; Sung et al, 2005; in Ross, 2006). In this respect Ross (2006) established that aspects such as "knowledge of the content of the domain in which the task is embedded" (ibid, p. 3); a knowledge that self-assessment is going to be compared with teacher or peer ratings (Fox & Dinur, 1998); and the application of criteria involving low-level inferences (Pakaslahti & Keltikangas-Järvinen, 2000) were key elements in student-teacher agreement, and hence in the validity of the self-assessment technique. However, Ross (2006) clarifies that agreement between self-assessment and peer-assessment is likely to be higher than self-teacher agreement, on the basis that students will normally interpret assessment criteria differently from teachers, possibly by "focussing on superficial features of the performance" (p. 3).
The higher rate of self-peer agreement compared with self-teacher agreement could also be attributed to sympathetic tendencies between peers, who may overestimate each other's performance and so override the genuine purpose of assessment, which would adversely affect the validity of the technique (Walkin, 1991).
To address the evidence of validity further, Ross (2006) makes reference to proponents of assessment reform such as Wiggins et al (1993), who recommended that along with every major piece of work students should submit a self-assessment focussing on their perception of their performance (Ross, 2006:2). This was possibly intended to determine the validity of self-assessment in relation to what Ross calls "agreement with an objective criterion" (ibid, p. 4). Ross (2006) cites the work of Cassady (2001) and Talento-Miller & Peyton (2006), who established that university students were likely to be more realistic in their self-assessment when applying to graduate school under conditions where the self-reports would be checked against official documents. Ross (2006) nevertheless points to the result of the study, which showed that even under such strict conditions high achievers gave accurate reports whilst low achievers reported their assessment less accurately and overestimated. Ross (2006) attributed this to "likely social desirability or self-enhancement factors" (Ross, 2006:4). In his findings, Ross also revealed that there was still some variation within self-teacher agreement that could not be explained fully, citing possible causes such as students' inability to apply assessment criteria in spite of training, students' personal interest, bias, and the possible unreliability of teacher assessments in relation to student self-assessment.
Ross concludes his discussion on the question of validity by acknowledging that there are discrepancies revealed in the research he was reviewing. Nevertheless he submitted that such discrepancies should be the stimulus for further study and review of the evidence embedded in students' performance that might reveal the strengths and weaknesses in their learning process so that they can be well addressed through improved teaching. This leads us to the idea of consequential validity, the issue to be addressed in the next question.
Does self assessment improve student performance?
It was mentioned earlier in this paper that self-assessment is a form of formative assessment, which means that it is a technique which aims at improving learning. The previous section addressed the question of validity, and it was stated that a valid assessment is one that contributes to a student's learning by measuring those skills and/or knowledge it is designed to measure. In other words, if the assessment tool does not focus on testing learning then it fails in its purpose to aid learning and cannot be regarded as valid, according to the definitions by Black (1998) and Ross (2006). This is what Ross calls 'consequential validity' (Ross 2006:4), in which he argues that the worth of a test is determined by its consequences for the learners, asserting that inclusion of consequences as a dimension of test validity was found to be a key element of self-assessment reform (ibid).
In addition to improving learning, Ross (2006) also pointed out another aspect of self-assessment that was directly concerned with the students' self-efficacy and a stronger desire to achieve, as shown in the work of Hughes, Sullivan, & Mosley (1985) in Ross (2006). Other scholars like Fontana and Fernandez (1994) provided evidence that students could perform better in subjects like Mathematics when self-assessment was used as one of the strategies to increase students' learning. To enhance performance and increase the consequential validity of the technique, Ross (2006) identified strategies for teaching students the self-assessment technique. These included the direct involvement of students in defining the assessment criteria, for example by having students participate in constructing a rubric that expresses performance expectations (ibid, p. 5). Ross further reviewed some of the strategies used to teach students to apply the self-assessment criteria, noting that giving prompt feedback on self-assessment and engaging the students in evidence-based discussions of the variations between their self-assessed performance and the assessments of their peers and teachers, also referred to as triangulation (Black, 1998), contributed greatly to improved learning. It was also observed that students would benefit from teachers' help in using the assessment data to develop realistic short-term and long-term action plans to overcome their weaknesses (ibid, p.5). Ross et al (1999) reported that when a sample group of students who had been trained in these strategies used the self-assessment technique to test their learning, they outperformed peers who had not received similar training, in subjects like mathematics and geography (Ross 2006). Positive results of self-assessment were also reported in non-academic activities.
A review of students' self-assessment in areas of behaviour inside and outside the classroom showed that behaviour had improved as a result of students being given a self-assessment tool with which to monitor their behaviour and match it against their own action plans. It was reported that consistency in the use of the tool contributed to the students' high self-direction and increased positive interactions, and there was evidence of a decline in disruptive and off-task behaviour (ibid).
Nevertheless, Ross identified a few negative outcomes that were associated with self-assessment. An analysis of interview data collected by Ross et al (2002) in a grade 11 mathematics classroom revealed that self-assessment contributed to an increased loss of confidence among the lower achievers: some gave up trying, while others resolved to get out of the difficult lessons altogether. Ross (2006) attributed this to what he calls "ego-protecting effort reduction" (p. 5). Ross backs up his report on the effects of self-assessment by engaging Bandura (1997) on what he calls the "social cognition theory," which basically explains the conditional relationships between self-efficacy beliefs and outcome expectations. Bandura (1997) elaborates that in given domains of functioning, self-efficacy beliefs vary in level, strength, and generality (ibid). However, he emphasises that "the outcomes of a process such as self assessment can take the form of positive or negative physical, social, and self-evaluation effects" (Bandura, 1997:22). Ross (2006) emphasised that self-assessment contributed to 'self-efficacy beliefs' or 'the student's perception of their ability to perform the actions required of them in similar future endeavours' (p. 6), applying the concept that if the present tasks are performed to their full satisfaction, students would be more likely to succeed in future tasks (Bandura, 1997).
Ross explores further the question of whether or not self-assessment improves students' performance by reviewing more scholarly evidence in his paper. Based on Bandura's understanding of self-assessment and increased self-efficacy beliefs, Ross (2006) established that students with greater confidence in their ability to accomplish the target task are more likely to visualise success than failure, because they set higher standards of performance and set out to achieve them (p.6). This introduces the ipsative element in the technique of self-assessment, where a student is competing against himself or herself. As asserted by Bandura (1997), students will display considerable self-direction in the face of competition, but in the case of self-assessment students compare against their own performances, thereby setting their own realistic goals for their future accomplishments. Persistence and confidence improve the effort displayed in the performance, which in turn influences positive outcomes. Failure in one task becomes a stimulus for the good students, prompting them to further action. Pertinent to improved performance through the process of self-assessment, therefore, are the elements of self-efficacy (Bandura, 1997), self-confidence and effort (Ross, 2006).
Nevertheless, the question of why some students are identified as having a high level of these qualities while another group has less or none at all needs to be addressed. Could it be related to the socio-economic home environment, or could it be attributed to a pattern of weakness developed during the school routine? Whichever it is, it is worth investigating at the right point in time rather than settling for the knowledge that there are weak and strong kids in school. When all the steps suggested by Ross (2006) have been taken and students are trained to give them a sense of purpose in the use of the self-assessment technique, why would one fail to achieve more positive results? In a normal population, however, variations in perception are likely to affect performance, and outcomes therefore tend to vary as a result. So there will be students with low self-efficacy who will perceive failure as debilitating evidence that they are incapable of completing their own set tasks and who therefore give up. Ross (2006) reinforced this point with his finding that repeated negative self-assessment may lead to students setting unrealistic goals and adopting ineffective learning techniques, which in turn affect the effort they put into their work. Eventually they start making excuses for their underperformance, and this sometimes leads to withdrawal. On the whole, however, Ross (2006) found enough scholarly evidence to show that self-assessment will foster an upward cycle of learning, as demonstrated by the studies that found positive outcomes of self-assessment. I will conclude this section with an outline of three ways in which Ross found self-assessment to contribute to learning. Ross's analysis of the scholarly work of Schunk (1996) found these to be among the processes that self-regulating students use to observe and interpret their own behaviour. They were:
Self-observation - self-regulating students will deliberately focus on specific aspects of their performance related to their own set standards of success, with a view to improve learning (Ross, 2006: 6).
Self-judgements - students make self judgements in which they determine how well they think they have met their general and specific goals also by comparing present with past outcomes in relationship to the specific set goals (ibid.).
Self-reaction - how the students interpret and react to the degree of achievement of their set goals, an indication of how satisfied they may be with the result of their actions. Self-reaction plays a major part in setting realistic goals for future learning as well as a major determinant of the students' progression cycle (ibid.).
According to Ross's findings, these elements can only be achieved and fully exploited through rigorous training which focuses on particular aspects of the students' performance (Ross, 2006:6). Ross justified his assertions by citing examples of the aspects of the students' performance he referred to, such as the dimensions of a co-constructed rubric and redefining the criteria students use to determine whether they were successful or not, and by evaluating teacher feedback to reinforce interpretation of their performance (ibid.). Ross's assertion on these influences of self assessment training was that they would "increase the likelihood that students will interpret their performance as a mastery experience, the most powerful source of self efficacy" according to Bandura, (1997) as cited in Ross (2006:6).
Is self assessment a useful student assessment technique?
The question of whether self-assessment is a useful student assessment technique can best be addressed by first focussing on what it does for the student. Black (1998) gives both practical and fundamental reasons for involving students in the assessment of their own work. Among these he mentions the practical element of getting them to do some of the work themselves, which, according to Black (1998), allows teachers the opportunity to carry through the programme of formative assessment. Secondly, Black points to a more fundamental reason, that self-assessment enhances the student-teacher relationship, as it makes the learner take "responsibility of their own learning" (p. 127).
As stated at the beginning of Ross's paper (Ross, 2006), research has shown that the notion of involving students in assessing their own work has in the past been met with mixed feelings. Some of the reasons mentioned by scholars such as Brooks (2002) and Boud (2004) relate to a lack of understanding of the nature of the technique and a failure to implement it correctly in schools. According to Brooks (2002), students who carried a negative attitude to self-assessment did so because they found it difficult to grasp the idea that they could carry out assessment of their work to support their own learning. The study by Broadfoot et al (1998), as cited in Brooks (2002), reported that because students had little or no insight into the assessment criteria, or how teachers reached assessment judgements, they simply guessed what they thought teachers would think of their work, which defeated the whole purpose of self-assessment. According to the evidence provided in this paper, and having reviewed Ross's work, there is no doubt that self-assessment is a useful technique for students when they are properly trained in its significance and its implementation. As Boud (2004) would put it, the defining characteristic of self-assessment is
"…the involvement of students in identifying standards and/or criteria to apply to their work and making judgements about the extent to which they have met these criteria and standards." (Boud 1991, p.5 as cited in Boud, 2004)
In the paragraphs that follow, I will present a critical analysis of Ross's findings about the strengths and weaknesses of self-assessment technique which he reviewed in an attempt to justify why the technique is useful for students.
Strengths of self-assessment
Ross (2006) addresses the question of the usefulness of the technique at greater length, drawing on scholars such as Hughes et al (1985), Schunk (1996) and Sparks (1991) to show that self-assessment is a useful technique. These provide enough evidence to show that self-assessment contributes to student achievement, particularly if teachers provide direct instruction in how to self-assess. Evidence is also provided in an analysis of his own work (Ross et al, 1999; 2002-a; 2005), as cited in Ross (2006), that self-assessment contributes to improved student behaviour (p.7). Further analysis of Ross's evidence-based findings revealed more benefits of self-assessment and why it was found to be useful:
Firstly it was reported that students found the technique useful because they gained a better understanding of what they were supposed to do as they were involved in setting the criteria for the assessment.
Secondly, because the technique enabled them to include important performance dimensions such as effort that would not normally be considered in assessment.
The third benefit according to Ross's findings was that self-assessment allowed the students to communicate information such as goals and objectives regarding their performance.
The fourth and last of the benefits in Ross's analysis was that students found self-assessment to be useful because it gave them information they could use to improve their work (Ross et al., 1998 as cited in Ross, 2006).
The fourth benefit is possible because of the ipsative nature of self-assessment mentioned earlier in this paper, which gives students the opportunity to focus on their own attainment rather than on normative comparison with others, helping them concentrate on how to improve their own work.
Besides the students, Ross (2006) also reviewed research findings about how teachers benefited from the effective use of the self-assessment technique. The research established that making the assessment criteria explicit to the students helped teachers to distinguish essential from less important features of student performance, and as a result they would be more focussed in their teaching. It was also established that teacher-student conferences held as a follow-up to self-assessment would help to resolve any discrepancies between self-assessment and teacher assessment, and that this might give teachers further insight into the students' thinking and in particular help to iron out the misconceptions that prevent their further learning. Ross (2006) acknowledges that there was little information regarding parents' reaction to self-assessment; nevertheless, he asserts that if the rubrics are constructed well and in clear language, they will meet the goals of the curriculum, which will serve the interests of the parents.
Some noted weaknesses of self-assessment.
Ross, (2006) pointed out the number one concern for teachers regarding self assessment was the fear that sharing control of assessment with students would lower standards and reward students who inflate their assessment (p.7). However if teachers implemented the recommendations as provid