This paper presents a review of the literature on classroom formative assessment, or assessment for learning. Several studies show firm evidence that innovations designed to strengthen the frequent implementation of formative assessment strategies yield substantial learning gains. The perceptions of students are considered alongside analysis of the strategies used by teachers and the formative strategies incorporated in systemic approaches to teaching. There follows a more detailed and theoretical analysis of the nature of formative assessment, which provides a basis for a discussion of the development of theoretical models for formative assessment and of the prospects for the improvement of practice.

2. Introduction

Assessment for learning is often referred to as formative assessment, and can be defined in various ways. To aid clarification, the definition of formative assessment used in this paper is meant to include:

'all those activities undertaken by teachers - and by their students in assessing themselves - that provide information to be used as feedback to modify teaching and learning activities. Such assessment becomes formative assessment when the evidence is actually used to adapt the teaching to meet student needs' (Black & Wiliam, 1998b: 140)

From this definition formative assessment can be conceptualized as consisting of five key strategies:

1. Clarifying and sharing learning intentions and criteria for success;

2. Engineering effective classroom discussions and other learning tasks that elicit evidence of student understanding;

3. Providing feedback that moves learners forward;

4. Activating students as instructional resources for one another;

5. Activating students as the owners of their own learning.

(Black & Wiliam, 2009)

The research into assessment for learning has led to the development of a theory of formative assessment which attempts to define all formative interactions as those 'in which an interactive situation influences cognition' (Ibid: 11).

The starting point of the work on formative assessment that is described in this paper was the review by Black and Wiliam (1998a). This review covered a very wide range of published research and provided evidence that formative assessment raises standards and that the assessment practices of the period were weak. However, there seemed to be very few resources to help teachers put the research findings into practice. Partly in response to this perceived lack of help, Black and Wiliam published the booklet Inside the Black Box (1998b), which served four main aims:

• To give a brief review of the research evidence.

• To make a case for more attention to be paid to helping practice inside the classroom.

• To draw out implications for practical action.

• To discuss policy and practice (Wiliam, 2011).

The review by Black and Wiliam (1998a) involved studying reviews of research published up to 1988 and then checking through the issues of over 160 research journals and books for the years 1988 to 1997. In total, the review drew on material from 250 sources. One of the priorities in evaluating the research reports was to identify and summarise studies that produced quantitative evidence that innovations in formative assessment can lead to improvement in the learning of students.

Since the publication of Black and Wiliam's review there has been a greater focus on assessment for learning and its potential benefits to teachers and students in raising classroom attainment. In 2008 the DCSF published The Assessment for Learning Strategy, which presented the features and potential benefits of formative assessment as shown in the image below (DCSF, 2008: 5).

It seems that there is now a consensus in many educational circles that assessment for learning is one of the most significant ways of raising attainment within schools.

The aim of this paper is to review and critically analyse some of the most significant evidence that has been gathered regarding formative assessment, and to consider whether that evidence warrants the focus that is now being placed upon its use by teachers and students in our classrooms today.

3. Ethics

The purpose of this literature review is to analyse and assess the efficacy of assessment for learning strategies in improving pupil attainment. It is therefore designed to have a positive impact on teaching and learning practice, ensuring that teaching and assessment time is used as effectively as possible, and there are unlikely to be any negative or harmful consequences as a result of this paper. In its Ethical Guidelines for Educational Research, BERA states that educational research aims to 'extend knowledge and understanding in all areas of educational activity and from all perspectives' (2011: 4), and this paper will attempt to meet these high aims.

In accordance with the BERA guidelines, care will be taken when reviewing studies to ensure that results are used only in the ways that were intended by the researchers and made explicit to participants, so as not to impinge upon the terms of voluntary informed consent, the right to withdraw and the privacy afforded to participants in the original studies.

The paper will consider the context and methodology of each research study, and will only include those which are deemed to meet the high ethical standards laid out by BERA (2011) in their Ethical Guidelines for Educational Research.

4. Methodology

Mainly quantitative research was considered and collated, drawn from a variety of educational settings and regions of the world. To aid selection and interpretation, the research has been analysed according to the following criteria:

Focus - What was the intended focus of the research?

Context and coverage - Where was the study undertaken? At what level of education? How big was the sample size? When was the research completed?

Perspective - Is there neutral representation of the data or is there any bias toward a specific outcome?

Methodology - How was the research conducted?

Audience - What was the intended audience of the research?

Findings - Are the findings significant and can they robustly support the conclusions drawn?

Impact - What is the impact of the study and is it relevant to the review?

Limitations - What limitations or deficiencies exist in the research?

Areas for future development - Does the research lead to further areas that can or need to be researched in future?

Owing to the sheer number of studies into the effects of assessment for learning, the difficulty in performing this review lay in selecting the most appropriate works and research studies conducted to date, and in collating their findings appropriately. Student progression and attainment can also be measured in various ways, but an attempt at synthesis has been made in order to provide the reader with useful and robust data to support the conclusions of the paper.

The following section reviews the literature that was selected using the above methodology. The studies were all chosen because they were based on quantitative comparisons of learning gains and were rigorous in using pre- and post-tests and in comparing experimental with control groups. It is not implied, however, that useful information and insights about the topic cannot be obtained by work in other paradigms.

5. Literature Review

This section presents summarised accounts of research that was selected and reviewed according to the criteria outlined in Sections 3 and 4. These studies illustrate some of the main areas and issues involved in research that aims to secure evidence about the effects of formative assessment.

The first is a project in which 25 teachers of mathematics in Portugal were trained in self-assessment methods on a 20-week part-time course, which they then put into practice as the course progressed with 354 students aged between 8 and 14 (Fontana & Fernandes, 1994). The students of a further 20 teachers, who were taking a different course in education, acted as the control group. Both groups were given pre- and post-tests of mathematics achievement, and both spent the same amount of time in class on the study of mathematics. Both groups showed significant gains over the period, but the experimental group's mean gain was about twice that of the control group. The focus of the assessment work was on regular self-assessment by the pupils, which involved teaching them to understand both the learning objectives and the assessment criteria, giving them the opportunity to choose learning tasks and using tasks which gave them scope to assess their own learning outcomes.
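The comparison of mean gains described above can be illustrated with a minimal sketch. The scores below are hypothetical and were invented purely for illustration; they are not data from Fontana and Fernandes (1994), and the function name `mean_gain` is an assumption of this sketch rather than anything used in that study.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Return the mean per-student gain (post-test minus pre-test)."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each student needs one pre-test and one post-test score")
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(gains)


# Hypothetical scores, invented for illustration only (not study data).
experimental_pre = [40, 45, 50, 55]
experimental_post = [52, 57, 62, 67]
control_pre = [41, 44, 51, 54]
control_post = [47, 50, 57, 60]

experimental_gain = mean_gain(experimental_pre, experimental_post)
control_gain = mean_gain(control_pre, control_post)

# Ratio of the two mean gains; a value near 2 would correspond to
# "about twice" the control group's gain.
gain_ratio = experimental_gain / control_gain

print(experimental_gain, control_gain, gain_ratio)
```

A ratio of mean gains is only a descriptive summary. It says nothing about statistical significance, which the original study assessed separately.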

This research gives robust evidence of attainment gains when using formative assessment strategies. The authors of the study reflect that additional work is required to look for long-term outcomes and to explore the relative effectiveness of the various techniques when employed in concert and in isolation. In this study the two outstanding elements were the focus on self-assessment and the way in which this assessment was implemented. It could not be concluded whether one or other of these features, or the combination of the two, was responsible for the gains.

The second example had its origin in the idea of mastery learning, but departed from the orthodox ideology in that the authors started from the belief that frequent testing was the main reason for the learning gains reported for this approach. The project was an experiment in mathematics teaching (Martinez & Martinez, 1992), in which 120 American college students in an introductory algebra course were placed in one of four groups: two experimental and two control. The experimental groups were tested three times as often as the control groups throughout the course, and the results of a post-test showed a significant performance advantage for those tested more frequently.

This study uses similar statistical measures and analyses to the first example, but the nature of the two studies is quite different. It could be questioned whether frequent testing really constitutes formative assessment. The answer would depend on the quality of the teacher-student interaction and on whether the test results actually led to intervening action being taken to close any gaps in performance (Ramaprasad, 1983).

The third study reviewed here was undertaken with kindergarten children aged 5 (Bergan et al., 1991). The motivation for the study was a belief that close attention to the early acquisition of basic skills is essential. It involved 838 children drawn from mostly disadvantaged home backgrounds in the USA. The teachers of the experimental group implemented a measurement and planning system which required an initial assessment input to inform teaching at the individual level, consultation on progress after two weeks, and new assessments to give a further diagnostic review and new decisions about students' needs after four weeks, with the whole course lasting eight weeks. The teachers used observations of skills to assess progress. Outcome tests were then compared with initial tests of the same skills. Analysis of the data showed that the experimental group achieved significantly higher scores in tests in reading, mathematics and science than the control group. The tests used, which were multiple-choice, were not adapted to match the open child-centered style of the experimental group's work. It is important to note, however, that on average 1 child in 5 in the control group was referred as having particular learning needs, whereas the corresponding figure for the experimental group was 1 in 17. This may indicate a weakness in the balance between the control and experimental groups within this study.
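To make the scale of the difference in referral rates easier to see, the two figures reported above (1 in 5 and 1 in 17) can be converted to percentages and compared directly. The following is a minimal sketch of that arithmetic only; the variable names and the helper `as_percent` are assumptions introduced for illustration.

```python
from fractions import Fraction

# Referral rates as reported in the account of the study above.
control_rate = Fraction(1, 5)
experimental_rate = Fraction(1, 17)


def as_percent(rate):
    """Convert a proportion to a percentage rounded to one decimal place."""
    return round(float(rate) * 100, 1)


# How many times higher the control group's referral rate was.
rate_ratio = control_rate / experimental_rate

print(as_percent(control_rate), as_percent(experimental_rate), float(rate_ratio))
```

On these figures the control group's referral rate is 20.0%, against roughly 5.9% for the experimental group, a ratio of 3.4 to 1.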