The Effects of Formative Assessment


While the research presented above seems to indicate that the use of assessment for learning may have a significant impact on learning, it is important to note that different reviews find very different effect sizes for the benefits of this type of assessment. Kluger and DeNisi (1996) found an average effect size of 0.41 for feedback interventions, while Black and Wiliam (1998a, 1998b) estimated that the effects of formative assessment were around 0.4-0.7 standard deviations. Shute (2008) suggested a similar range (0.4-0.8), but Hattie and Timperley (2007) proposed an average effect size of 0.96 standard deviations for the effect of formative assessment. On the other hand, in a year-long study in a classroom setting with ordinary teachers, Wiliam, Lee, Harrison, and Black (2004) found that a range of assessment for learning strategies introduced by teachers had an effect size of 0.32 standard deviations. This is indeed a significant effect, which the authors equate to an increase of 70% in the rate of student learning, but it still represents only one third of the size of the effect found by Hattie and Timperley.

This variability is no doubt partly due to differences in how the results of formative assessment were measured in the various studies and reviews, and also to external influences, such as variations in the populations that were researched. Many studies included in reviews of research are conducted on sub-populations that are not representative of the whole population. For example, if an effect size is calculated in a study of different interventions for students with special educational needs, then that effect size could not be generalised to the whole population, where attainment is more variable. This could skew the results encountered by the researchers either positively or negatively.
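To make the point about sub-populations concrete, the sketch below assumes that an effect size is calculated as a raw gain in scores divided by the standard deviation of scores in the group studied. The function name and all of the numbers are hypothetical, chosen only for illustration, and are not taken from any of the studies cited. It shows one direction of the possible skew: the same raw gain produces a larger effect size in a less variable sub-population than in the more variable whole population.

```python
def standardised_effect(raw_gain: float, sd: float) -> float:
    """Return a standardised effect size: raw gain divided by the score spread."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return raw_gain / sd


# Hypothetical values for illustration only.
RAW_GAIN = 5.0       # gain in test marks attributed to the intervention
SUBPOP_SD = 10.0     # spread of marks within a narrow sub-population
WHOLE_POP_SD = 15.0  # spread of marks across the whole population

d_subpop = standardised_effect(RAW_GAIN, SUBPOP_SD)
d_whole = standardised_effect(RAW_GAIN, WHOLE_POP_SD)

print(f"Effect size in sub-population:   {d_subpop:.2f}")
print(f"Effect size in whole population: {d_whole:.2f}")
```

Under these assumed figures, an identical gain of five marks appears as an effect size of 0.50 in the sub-population but only about 0.33 in the whole population, which is why effect sizes from restricted samples cannot simply be carried over.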

However, it is entirely possible, and indeed likely, that a substantial part of the variation in findings and effect sizes is caused by differences in how the ideas of formative assessment or assessment for learning were operationalised. As Bennett (2009) points out in an important critical review of the field, one cannot be sure about the effects of such changes in practice without an adequate definition of what the terms formative assessment and assessment for learning actually mean, and a close reading of the definitions that are provided suggests that there is no clear consensus about their meaning.

7. Discussion

All of the studies reviewed showed that innovations that include strengthening the practice of formative assessment produce significant, and often substantial, learning gains. The studies ranged over ages, schools, subjects and countries. The mean effect sizes for most of these studies were between 0.4 and 0.7. This shows that truly significant improvements could be made in educational attainment if formative assessment strategies were implemented effectively and consistently in schools. The following examples illustrate some practical consequences of such large gains:

• An effect size of 0.4 would mean that the average (i.e. at the 50th percentile) pupil involved in an innovation would move up to the same achievement as a pupil in the top 35% (i.e. almost in the top third, at about the 65th percentile) of those not involved.

• A gain of effect size 0.5 would improve students' performance at GCSE by at least one grade.

• A gain of effect size 0.7, if realized in international comparative studies in mathematics (TIMSS; Beaton et al. 1996), would raise England from the middle of the forty-one countries involved into the top five.

Black & Wiliam (1998a)
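The first of these examples can be checked with a short calculation. The sketch below assumes that attainment is normally distributed with the same spread in both groups, an assumption that is not stated in the studies themselves; the function name is illustrative only. Under that assumption, the percentile reached by the average pupil after a gain of a given effect size is the standard normal cumulative distribution evaluated at that effect size.

```python
from statistics import NormalDist


def percentile_after_gain(effect_size: float) -> float:
    """Percentile (among pupils not involved) reached by the average pupil
    after a gain of `effect_size` standard deviations, assuming normally
    distributed attainment with equal spread in both groups."""
    return 100.0 * NormalDist().cdf(effect_size)


# Effect sizes mentioned in the text above.
for d in (0.32, 0.4, 0.7):
    print(f"Effect size {d:.2f}: about the {percentile_after_gain(d):.0f}th percentile")
```

On this reading, an effect size of 0.4 places the average pupil at roughly the 66th percentile of those not involved, that is, within the top 35%.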

Many of the studies reviewed showed that improvement in the use of assessment for learning strategies helped the lower attainers more than those of higher abilities, and so reduced the gap between higher and lower abilities, while raising attainment for all learners.

Any real innovation in formative assessment, therefore, cannot be treated as merely a marginal change in classroom practice. All assessment for learning strategies involve some level of feedback and dialogue between teachers and pupils, and it is the quality of these interactions which must be at the heart of pedagogical improvement. The depth and quality of interactions between students and teachers, and of learners working collaboratively with one another, are the vital determinants of the outcomes of any significant changes and improvements. This creates a difficulty, as it is hard to obtain data about the quality of formative interactions from many of the published reports. The studies do draw attention to some of the variety of ways in which interactive assessment for learning strategies can be embedded into future teaching practice. Another potential problem of evaluation that arises in the analysis of the studies is that almost all of the research was clearly pursuing ends as well as means, so that none of the studies can be seen to provide fully unambiguous comparisons between the effectiveness of the innovations studied and that of alternative approaches. It is clear that in each of the studies there are 'underlying…assumptions about the psychology of learning [which] can be explicit and fundamental' (Wiliam, 2011: 8), as evidenced in the constructivist basis of Fernandes and Fontana (1996), or in the diagnostic approach of Bergan et al. (1991).

For assessment to be formative, the feedback information has to be referred to and used to inform classroom practice. This approach is most effective when teachers follow a set of determined rules about how the feedback should inform next steps (Fuchs & Fuchs, 1986), which means that a significant aspect of any approach will be the development of systems for responding to the feedback received when formative strategies are used. Different practitioners will likely bring different assumptions about the nature and structure of the learning tasks that will provide the best results in challenging learners and improving academic attainment (Black & Wiliam, 2009). The differences in priorities across these personal assumptions create the possibility for a range of future research studies investigating formative assessment (Black et al., 2002).

Despite these potential difficulties, it seems clear from the data collected that substantial improvement in attainment may be achievable through the effective implementation of assessment for learning strategies in the classroom. The fact that such gains have been achieved through various formative assessment strategies seems to indicate that it is this feature which accounts, at least in part, for the improvements recorded in the studies. It could also be seen to show that the positive outcomes experienced may not depend on the particular details of any single strategy, but rather on a change in the mindset that regulates the nature of student-teacher interactions, from a summative, results-driven focus to one in which data is constantly tracked, reviewed and shared with pupils in order to inform and influence future teaching on an individual level.

8. Recommendations and Conclusions

The above discussion raises some questions that could be considered in future research investigations. There is clearly a range of factors that determine the effects of any classroom regime. In light of these questions, it is clear that many of the studies reviewed to this point have not attended to some of the important aspects of the situations being researched. A list of some important and relevant aspects could include the following:

• assumptions based on theories of psychology and learning underlying the curriculum and pedagogy;

• the interpretative framework and rules used by both teachers and learners to respond to this evidence;

• the learning work used in acting on the interpretations so derived;

• the divisions of responsibility between learners and teachers in these processes;

• the self-perceptions held by the learners about themselves as learners, about their own learning work, and about the aims and methods for their studies;

• the perceptions and beliefs of teachers about learning, about the abilities and prospects of their students, and about their roles as assessors;

• issues relating to race, class and gender, which appear to have received little attention in research studies of formative assessment;

• the extent to which the context of any study is artificial and the possible effects of this feature on the ability to generalise the results.

(Black & Wiliam, 1998a; Bennett, 2009; Wiliam, 2011).

It would be extremely difficult, though not impossible, to design studies to address some of these unresolved issues. For example, students' self-perceptions, motivations or beliefs about themselves as learners are difficult to capture as robust quantitative data. There is clearly a need for a combination of further quantitative studies with richer qualitative studies of perceptions, processes and interactions within the classroom (Assessment Reform Group, 2002).

There are two specific problems and assumptions that need to be addressed. The first is the evidence seen in some of the studies that formative assessment is of particular benefit to traditionally lower attaining learners, a finding which is not evident in the results of other studies. These contradictions may have arisen because of techniques or features of the studies which have not yet been properly recorded and interpreted. If assessment for learning strategies can provide the impetus to narrow the gap in academic attainment for less able learners, then there are very strong social and educational reasons for prioritising research and development on this sensitive issue.

The second potential problem arises from the tension, perceived by both students and educators, between the formative and summative purposes by which their work may be judged. Due to the focus of government and business on summative exam results, formative work will always be insecure because of the threat of renewed dominance by the summative (Black & Wiliam, 1998a).

It seems clear that building a single unifying theory and approach to the implementation of formative assessment strategies, which could be used as a guide by all teaching practitioners, would be a formidable task. There are, however, examples of this approach that can be learned from to help improve classroom practice immediately. The range of 'conditions and contexts under which studies have shown that gains can be achieved must indicate that the principles that underlie achievement of substantial improvements in learning are robust' (Wiliam, 2011: 9). It would appear that significant gains could be achieved by using many different strategies, so such approaches would be unlikely to fail through the misapplication of subtle or delicate features.

This final consideration is a very important one, because no single optimum model of assessment for learning pedagogy emerges from the current literature. What arises from the studies is rather a set of underlying guiding principles, which each teacher must incorporate into his or her practice in his or her own way, provided that they are implemented as a main focus of classroom practice rather than as a marginal influence on it (Broadfoot et al., 1996). That is to say, educational reform of this nature will likely take a long time, and will need support from policy makers, researchers and teaching practitioners.

