Does informal formative assessment affect pupil performance?



Results of nationally applied external tests have dominated the accountability landscape since the inception of the National Curriculum in the 1980s (Black, 1997). However, in the autumn of 2008, the externally applied tests for pupils aged 14 were abandoned due to the marking fiasco. This has led to new requirements for regular reporting by secondary schools, based on summative teacher judgements (Black et al, 2010). The move was supported by the growing body of evidence on the detrimental effects of regular summative assessment (James, 2006; Mansell, 2007; ARG, 2006), and was designed to give teachers a more central and professional role in the assignment of grades to pupils (ARG, 2006).

This has had major effects on the delivery of the curriculum in science, with teachers reassessing their teaching and learning practice in accordance with the changes made and undertaking more assessment for learning (AfL) and formative assessment in their lessons, so as to gauge pupil performance.

Within the classroom context, teachers make judgements on pupil ability based on many factors, classroom dialog being one. This is termed informal formative assessment (sometimes called informal assessment; the terms are used interchangeably in the literature). But embedded in such practices are elements of bias, which affect the reliability and validity of the final outcome, as they do in any system based on human judgement. This has been highlighted in lessons I have taught, and one example is discussed below.

At the beginning of the academic year, I inherited a Year 8 class. The class consisted of 30 pupils, who were grouped according to ability. This was a class I had neither taught before nor seen being taught, and I therefore had no particular biases towards it. After teaching the class for two weeks, I started to pick up on differences between pupils, with three or four pupils appearing more engaged than the others and always entering into classroom dialog with me. I set the first level assessed task with high expectations of those pupils, but was surprised to find that they were not the highest achieving; they had in fact performed only averagely, with the quieter pupils achieving the higher levels.

This highlighted a difference between where teachers perceive a pupil's ability to lie and where that ability actually lies. Much teacher judgement is based on informal formative assessment. As Dr Sue Horner writes, "To get used to how they work, you could have one of the guideline sheets covering two levels with you as you are teaching and keep an eye open in class for evidence of those levels in what pupils say" (SecEd, 2008). This raises the question of how such differences affect teachers' overall summative judgements, especially with regard to Assessing Pupil Progress (APP).

Models of Assessment and APP?

Initially I started to question APP as a national assessment system. What other alternatives are there to national testing and what are the benefits of APP?

After the demise of SATs at KS3, alternative systems were suggested as feasible methods of national assessment. In an article entitled "Considering alternatives to national assessment arrangements in England: Possibilities and Opportunities", Green and Oates (2009) discuss possible alternative arrangements and offer suggestions for several national systems. The authors argue that the principal aims of assessment systems should be to (1) deliver information to pupils and parents, leading to the enhancement of learning, (2) operate systems of accountability for schools, and (3) deliver robust information on the performance of the system for policy purposes. Based on these principles, the authors suggest several models for delivering these objectives (Green & Oates, 2009).

Model 1 involves collecting data through national monitoring surveys and using this information to monitor school standards over a period of time, allowing teacher assessments to be moderated. This method measures the change in marks (making no judgement about year-on-year test difficulty). The model is entitled 'Validity in monitoring plus accountability to school level' (Green & Oates, 2009).

Model 2 is entitled "Validity in monitoring plus a switch to 'School improvement inspection'". This model relies on national school inspection to provide accountability, with teacher assessment providing relevant information for parents and pupils. National examinations would offer information for progression into higher education and work, whilst a monitoring survey would provide robust information on national standards.

Model 3 relies on the development of a national ICT infrastructure delivering online, on-demand tests. This would provide information back to teachers, pupils and parents. The data would be built up in schools until a threshold is reached at which it offers a robust reflection of the performance of the school across the whole curriculum. The data feeds would be live, contributing to an ever-growing body of national data on the underlying standards within the educational system.

Since the late 1990s, however, there has been a concerted push for greater teacher assessment in national assessment practices. Reports by Daugherty in Wales (2004) and Tomlinson in England (2004) asserted the need for an increased role for teacher assessment in national systems of assessment. But systems in which teachers assess their students against the levels of the national curriculum are open to problems. One of the biggest is reliability, which is critical in any system based on human judgement (Harlen, 2004; Klenowski, 2006). Even with the large push towards, and the adoption of, teacher-based judgements in national assessment, there is no real evidence to support the idea that such a system can deliver a stable assessment outcome. In fact, there is evidence that it is associated with grade inflation (Wikström, 2005). Nevertheless, a leap of faith has been taken and, since 2009, Assessing Pupil Progress (APP) has become the main vehicle used by schools to assess pupils' progress through KS3 (Green & Oates, 2009). So what is Assessing Pupil Progress? What does it involve and how does it work?

Assessing Pupil Progress (APP) is a structured approach to periodic assessment in a subject (National Strategies). It draws together all the day-to-day assessments to gain a broader view of pupil progress. The APP process gives detailed information about how pupils are achieving, providing success criteria for assessment at each level in the form of Assessment Focuses, which assess pupil progress across five strands in a subject (Appendix 1). Most of these strands are assessed through the use of level assessed tasks. The process itself does not require teachers to change aspects of their teaching, or to gather vast amounts of evidence. APP is designed to enable teachers to:

use diagnostic information about pupils' strengths and weaknesses to inform teaching and learning;

track pupils' progress over a key stage;

make more consistent judgements relative to the national curriculum.

But, as with any system which includes formal and informal assessment, how reliable are the judgements being made? And if AfL and formative assessment affect pupils' learning, what effect does this have on pupils when they undertake level assessed tasks?

What are the problems of teachers' summative judgements?

Although Assessing Pupil Progress offers a better way of assessing pupils, teachers' summative judgements bring problems of their own.

There are many different purposes for which pupil work is assessed. The concern is to ensure that the way assessment is conducted provides information that is fit for purpose. Teacher assessment should therefore have the following qualities: validity, reliability, impact and practicability (ARG, 2006).

Although all are important, the validity and reliability of pupil assessment have the largest implications for educational equity (Gipps & Murphy, 1994). The justification is that errors in assessment may have implications for both pupils and teachers, who are evaluated and judged through these assessments. There is much evidence to suggest bias in teacher judgement, most of it related to student characteristics including behaviour, gender and special educational needs (Harlen, 2006). Many studies have examined teacher assessment and validity, and some of them are briefly reviewed here.

Wynne Harlen (2006) summarises the findings of a systematic in-depth review of 30 research studies (reduced from an initial 431 potentially relevant studies) on the reliability and validity of teacher assessment used for summative purposes. The researcher argues that reliability and validity cannot be treated independently of each other. The relationship is usually expressed in a way that makes reliability the prior requirement. For example, if the range of assessed outcomes is narrow, reliability will be high whilst validity will be low; conversely, extending the range of assessed outcomes will lower reliability but increase validity. This recognition of the interaction between validity and reliability means that, although it is useful to consider each separately, what matters in practice is the way in which they are combined. This has led to the concept of dependability (Wiliam, 1993; James, 1998).

It is argued that any issues of reliability in teacher summative assessment are in fact due to a lack of preparation (Harlen, 2006), with many teachers holding a narrow view of assessment and being unsure how to use evidence from students' actions. Following criteria or guidelines is simply not enough; what is required is a structured programme of professional development (Harlen, 2006).

This is supported by a study carried out by Black et al (2009) entitled 'Validity in teachers' summative assessments', which explored ways of developing teachers' understanding and practices in their summative assessments. The research looked at how teachers understand validity, and how they formulate their classroom assessment practices in light of that understanding. It found that the regime of external national examinations had undermined validity. The authors argue that teachers could redress these issues by reflecting on their values and by engaging in moderation, both within the department and externally, liaising with other schools in the moderation of work (Black et al, 2009).

As noted above, bias in teacher judgement is often related to student characteristics such as behaviour, gender and special educational needs (Harlen, 2006).

In an article entitled 'Assessment and age 16+ education participation', using data on 1.4 million children aged 14 in 2002-2005, Gibbons and Chevalier (2008) looked at measures of bias in national curriculum assessment at KS3 and how these are linked to pupil characteristics, considering them in the wider context of subsequent educational outcomes. The researchers found evidence of divergence between teacher assessment and test scores, which varied with pupil background, particularly gender, although there was no significant effect on educational outcomes post-16.

Informal formative assessment is a major part of teachers' summative assessment in APP, increasing its validity. However, informal teacher assessment also contributes to the bias associated with teacher summative assessments and with the level outcomes of APP. Several studies have explored informal assessment in the classroom setting. There is evidence to suggest that teachers distinguish between children in different ways, with a complex relationship existing between a teacher's conceptual distinctions and the level of interaction with the pupil (Savage & Desforges, 1995). Not only are there gaps in the way teachers distinguish between pupils in the classroom setting; there are also subject-specific gaps in the way different classroom practitioners assess and describe their students informally (Watson, 2006). Most teachers base their views on classroom dialog. This raises the question of how informal formative assessment can be improved in order to raise achievement, and whether a framework can be developed to measure informal formative assessment in the classroom. If so, what effect will increasing informal formative assessment have on APP?

Informal and Formal formative assessment?

One first has to distinguish between the different types of assessment. Formative assessment is defined as assessment for learning (AfL), as opposed to assessment of learning. There is a large body of evidence suggesting that the implementation of AfL improves understanding and raises pupil achievement (Black et al, 2003). Formative assessment involves gathering, interpreting and acting on information (Black, 1993). That is, information gathered through formative assessment should be used to modify teaching and learning activities, reducing the gap between desired and observed student performance.

Bell and Cowie (2001) expanded on this, defining two types of formative assessment: planned formative assessment and interactive formative assessment.

Planned formative assessment is characterized by teachers eliciting, interpreting and acting on assessment information (appendix 2). The assessment is usually planned: the teacher has chosen to undertake a specific activity in order to obtain information on which some action will be taken. The authors argue that the teacher can then act in several different ways: student-referenced, science-referenced and care-referenced (Bell & Cowie, 2001).

Interactive formative assessment takes place during student-teacher interaction. The specific assessment activity is not planned; it usually arises from a learning activity. The process involves the teacher noticing, recognizing and responding to student thinking during interactions (appendix 3). Interactive formative assessment is used to mediate the learning of individual students.

The two types of assessment are linked, being different ways of carrying out formative assessment. Most teachers change from planned to interactive formative assessment during a lesson. This change is usually a response to a shift in focus from the class to the individual, and it can occur more than once during a lesson (Bell & Cowie, 2001). The differences between the two types of assessment are summarized in a table in appendix 4.

The model espoused by Bell and Cowie (2001) has some key features. Formative assessment is a purposeful and intentional activity. Purpose influences each of the aspects of planned formative assessment (eliciting, interpreting and acting) and of interactive formative assessment (noticing, recognizing and responding). The action taken depends on the information or data collected, reflecting the fact that formative assessment is an integral part of teaching and learning and is responsive and adaptive to students' needs. Formative assessment is also a situated and contextualized activity, shaped by the setting or context in which it is carried out. Teachers and students enter a partnership with shared and constructed meanings. Language has a central role in formative assessment: teachers and students use language in both planned and interactive formative assessment to communicate their constructed meanings.

Presenting the theoretical framework - What is the link between Informal formative assessment and the ZPD (700 words)

Initially I started by investigating the link between classroom dialog and informal formative assessment.

Classroom dialog has been established as a legitimate object of study (Edwards & Westgate, 1994). More recent studies have placed teacher-student talk in context by examining patterns in classroom discourse (Cazden, 2001). One pattern of student-teacher interaction has recently become the subject of extensive discussion; it is described as the IRE or IRF sequence (Initiation, Response, and Evaluation/Feedback).

In this sequence, the teacher initiates a query, a student responds, and the teacher provides a form of evaluation or feedback to the student (Cazden, 2001). The IRE and IRF sequences are characterized by the teacher's response after asking what Nystrand & Gamoran (1991) call "inauthentic questions", in which the answer is already known.

Ongoing formative assessment occurs in the classroom learning environment, for instance within daily classroom talk. This type of classroom talk has been termed an assessment conversation (Duschl, 2003). Assessment conversations permit teachers to recognize students' conceptions, mental models, strategies, language use and communication skills, and allow them to use this information to guide instruction (Ruiz-Primo & Furtak, 2006).

Assessment conversations have the three characteristics of informal assessment: eliciting, recognizing and using information. Using information from assessment conversations involves taking action on the basis of student responses to help students move towards learning goals; this resonates with Vygotsky's zone of proximal development. The range of student conceptions at different points during a unit of study should determine the nature of the conversation; therefore more than one iteration of the cycle of eliciting, recognizing and using may be needed to reach a consensus reflecting the most complete and appropriate understanding. An assessment conversation can thus be described, in the context of classroom talk, as an ESRU cycle: the teacher Elicits a response with a question, the Student responds, the teacher Recognizes the student's response and then Uses the information collected to support student learning (Ruiz-Primo & Furtak, 2006).

Constructivism is a contemporary view of knowledge, distinct from the traditions of realism and rationalism. Its origins lie in the work of Kelly, who in the 1950s presented his theory of 'personal constructs'. The view revolves around the idea that knowledge cannot be transmitted but must be constructed by the mental activity of learners (Driver et al, 1994). In 1964, Bruner showed the importance of language in cognitive development. This sits alongside the work of Vygotsky, who suggested a model of learning involving two layers (or profiles, as they are sometimes called). An inner layer which consists of the learning an individual can achieve independently

Constructivism is a contemporary view of knowing, distinct from the traditions of realism and rationalism. Despite the variety of reactions to it as a learning theory, it has established itself in science education. The view that knowledge cannot be transmitted but must be 'constructed by the mental activity of learners underpins contemporary perspectives on science education' (Driver et al, 1994).

Knowledge is constructed by individuals through interpersonal processes because, as the biologist Humberto Maturana (1991:30) says, "Science is a human activity". Bruner (1964) also showed the importance of language in cognitive development, at around the time that the work of L.S. Vygotsky began to appear on the scene.

Vygotsky described the relationship between the two profiles using the term "zone of proximal development" (1978:86), suggesting that each child has a personal ZPD. This represents the distance between the actual developmental level, as determined by independent problem solving, and the level of potential development, as determined through problem solving under adult guidance or in collaboration with more capable peers. The size of the ZPD depends on:

The child's current level of development

The child's current mental construction

The discourse between child and his/her environment

The adult's action in guiding the child's learning has been likened by Bruner (1985:25) to the construction of scaffolding.

ESRU Cycles





By combining Vygotsky's ZPD with ESRU cycles, we develop the idea that, with an increased number of ESRU cycles in a lesson, the zone of proximal development should decrease in size, thereby increasing pupil understanding. This forms the basic premise of the study. By using the model espoused above, the relationship between ESRU cycles and the ZPD should become clearer.

How would the data be collected? (500 words)

The research revolves around the question 'What effect does informal formative assessment have on pupil performance on level assessed tasks?' In other words, is there a link between the number of ESRU cycles and student performance on level assessed tasks?
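One way the link asked about above could be examined is with a correlation coefficient. The sketch below is illustrative only: it assumes that each videotaped lesson yields a count of complete ESRU cycles and a mean level for the class, and it uses the Pearson coefficient, which the proposal itself does not specify. The function name and all figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustration only (not data from the study): complete ESRU
# cycles counted in each lesson, and the mean level achieved on that task.
esru_cycles = [2, 5, 7, 9, 12]
mean_levels = [4.1, 4.6, 4.8, 5.3, 5.5]
r = pearson_r(esru_cycles, mean_levels)
print(f"r = {r:.2f}")
```

Because National Curriculum levels are ordinal, a rank-based coefficient might be a more defensible choice; the same structure would apply after converting both lists to ranks.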

The study will be carried out with the Science department at Mill Hill County High School. The school is an over-subscribed mixed comprehensive. All pupils within the school are placed in ability groupings from Year 7, with the groupings changing every year based on teacher assessments and end-of-year tests administered by each department respectively. The science department uses a bought-in scheme of learning entitled 'WIKID', produced by UPD8, to deliver the curriculum from Year 7 to Year 9. The scheme contains its own level assessed tasks for each of the assessment focuses (Appendix). The research study will take place over the course of two terms, with the aim of collecting as much

Four classes will be selected, two from either side of the year group. The classes will be mixed ability sets (02 and 03 and, from the opposite side of the year group, 08 and 09). All the classes will be selected from Year 8. The pupils will ideally be studying the same topic from the WIKID scheme of learning (although any effect of the topic should be negligible, given the nature of the investigation). In sets 02 and 03, the classroom teacher will be instructed to offer support to pupils as they carry out the tasks, whilst in sets 08 and 09 the classroom teacher will be instructed to offer no support to the class.

The classroom teachers participating in the study will be asked to videotape their lessons every time they carry out a level assessed task with the class (this will apply only to the classroom teachers of sets 02 and 03). The participating teachers will be supplied with a video camera, microphone and videotapes. They will be asked to submit the videotape each time they have carried out an in-class level assessed task, along with the levels achieved by the class on the task.

The videotapes will be transcribed and analysed. The transcripts will identify whether statements were made by students or teachers. Each of the speaking turns made by the teacher will be numbered and divided into verbal units (VUs), which will be identified and determined by the content of the teacher's statements. I will then identify the assessment conversations that took place within each transcript (Duschl & Gitomer, 1997). The assessment conversations with the class and with the focus pupils will be selected. The assessment conversations identified during each instructional episode will undergo further coding based upon the ESRU model.

This should help to identify the exact strategy being used by the teacher in a particular speaking turn and/or verbal unit. Each strategy that is coded will be mapped onto the ESRU model. This should help to identify instances in which the teacher-student conversation followed the model, making it possible to distinguish complete from incomplete cycles and to link them to one of the domains, i.e. conceptual, epistemic etc.
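The step of separating complete from incomplete cycles can be sketched in code. This is a minimal illustration under stated assumptions, not the coding scheme of Ruiz-Primo and Furtak: it assumes each coded turn has already been reduced to a single letter (E, S, R or U), treats a cycle as complete only when the four codes occur in that order, and ignores codes that appear before any E. The function name is hypothetical.

```python
VALID_CODES = set("ESRU")
ORDER = "ESRU"


def find_esru_cycles(codes):
    """Split a sequence of turn codes into cycles, each starting at an 'E'.

    Returns a list of (cycle, is_complete) tuples. A cycle is complete only
    if it is exactly E, S, R, U in that order; any cycle cut short by an
    out-of-order code (or by the end of the transcript) is incomplete.
    """
    cycles = []
    current = ""
    for code in codes:
        if code not in VALID_CODES:
            raise ValueError(f"unknown code: {code!r}")
        if code == ORDER[len(current)]:
            # The code is the next expected step of the cycle in progress.
            current += code
            if current == ORDER:
                cycles.append((current, True))
                current = ""
        else:
            # The sequence is broken: close any cycle in progress as incomplete.
            if current:
                cycles.append((current, False))
            # A new 'E' starts a fresh cycle; other stray codes are ignored.
            current = "E" if code == "E" else ""
    if current:
        cycles.append((current, False))
    return cycles


# Hypothetical coded transcript, for illustration only.
coded_turns = list("ESRUESRESE")
for cycle, complete in find_esru_cycles(coded_turns):
    print(cycle, "complete" if complete else "incomplete")
```

Counting the tuples marked complete would give the number of complete ESRU cycles per lesson, the quantity the research question relates to pupil performance.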

The identification of the ESRU cycles will be done independently by two coders; to determine the consistency of the coders in identifying ESRU cycles, intercoder reliability will be calculated. In order to check the consistency with which levels are awarded by the classroom teachers, samples of pupil work will be collected and moderated.
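The proposal does not say which intercoder reliability statistic will be used. As one possibility, the sketch below computes Cohen's kappa, a common choice for two coders assigning categorical codes, which corrects raw agreement for the agreement expected by chance. It assumes both coders have coded the same verbal units in the same order; the function name and example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same ordered items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one identical code")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six verbal units, for illustration only.
coder_1 = ["E", "S", "R", "U", "E", "S"]
coder_2 = ["E", "S", "R", "U", "E", "R"]
print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the threshold treated as acceptable would need to be justified in the write-up.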

How would the data be presented? (500)

What would the data tell me and further work? (500)