"Teaching and learning are reciprocal processes that depend on and affect one another. Thus, the assessment component deals with how well the students are learning and how well the teacher is teaching" Kellough and Kellough (1999).
A teacher's main role is to promote quality learning among students. This is possible only when teachers act as guides and students actively participate in the process of learning. During, and even before, the teaching-learning process, teachers should locate and identify the areas where learners make mistakes. This is the crucial stage of the teaching-learning process, where students are diagnosed and instructional material for remedial teaching is prepared to ensure the desired quality of learning. Hence diagnostic testing and remedial teaching are essential for ensuring effective learning and improving the quality of education at all levels.
Teachers assess students so that they can identify areas of weakness either with individuals or small and whole groups. The results of these assessments are what drive a good teacher's instruction. In essence, assessment has little to do with the student performing or not performing and everything to do with what a teacher is going to do with the information she obtains from a given assessment.
This review of literature is organized as follows:
Definition of assessment
Purposes of assessment
Types of assessment
Remediation in the math classroom
Test development process
Definition of assessment
Black and Wiliam (1998) define assessment broadly to encompass all activities that teachers and students undertake to get information that can be used to alter teaching and learning. Assessment thus includes teacher observation, classroom discussion, and analysis of student work, including homework and tests. Nitko (2004) states that assessment involves gathering and using information to optimize teaching and learning. Assessment becomes formative when the information is used to adapt teaching and learning to meet student needs.
According to Gronlund (2003), there are two major questions that teachers need to answer before proceeding with instruction:
To what extent do the students possess the skills and abilities that are needed to begin instruction?
To what extent have the students already achieved the intended learning outcomes of the planned instruction?
Answers to these questions can be obtained from diagnostic assessments in the form of readiness pretests and placement tests.
Purposes of Assessment.
Assessment serves different purposes. For administrators, it serves to hold schools and principals accountable; for schools, it is a method of distinguishing among students in a cohort; for universities, it aids recruitment and selection; and for teachers, it provides assurance that the learning outcomes of a programme have been met and, most importantly, helps to improve student learning. Assessment plays a crucial role in the education process. It is more than simply giving marks or grades. It is an evaluation or appraisal of students' work. Experts in the field of assessment (Kellough and Kellough, 1999; McMillan, 2000; Black and Wiliam, 1998; Gronlund, 2003) have delineated several purposes of assessment, including:
To assist student learning.
To identify students' strengths and weaknesses.
To assess the effectiveness of a particular instructional strategy.
To assess and improve the effectiveness of curriculum programs.
To assess and improve teaching effectiveness.
To provide data that assist in decision making.
To communicate with and involve parents.
Assessment is thus formative, summative, or diagnostic, depending on the use made of the results obtained.
"When the cook tastes the soup, that's formative assessment; when the customer tastes the soup, that's summative assessment" (Black, 1998).
Black (1998) used the above analogy to differentiate between the two major types of assessment (formative and summative) used in education. Summative assessments are cumulative evaluations used to measure students' growth after instruction and are generally given at the end of a course in order to determine whether long-term learning goals have been met (Garrison & Ehringhaus, 2007). Summative assessment is a formal testing of what has been learned in order to produce marks or grades, which may be used for reporting to parents and other stakeholders. It concentrates on learner outcomes rather than on student improvement and is characterized as assessment of learning. Very often, summative assessments are used to grade or promote students (Nitko, 2004; McMillan, 2004).
Because they occur after instruction, summative assessments are used to help evaluate the effectiveness of programs, school improvement goals, or curriculum alignment (Garrison & Ehringhaus, 2007). Because summative assessments happen near the end of learning, they do not provide information at the classroom level for making instructional adjustments and interventions during the learning process.
Formative assessment forms part of the instructional process. When integrated into classroom sessions, it provides the information needed to adjust teaching and learning while they are happening. Hence, formative assessment informs both teachers and students about what students know and can do at a point when adjustments can be made to the teaching-learning process. Formative assessment is therefore usually referred to as assessment for learning.
Black et al. (2004) assert that assessment for learning is any assessment for which the first priority in its design and practice is to serve the purpose of promoting pupils' learning. It thus differs from assessment designed primarily to serve the purposes of accountability, ranking, or certifying competence. An assessment activity can help learning if it provides information to be used as feedback, by teachers, and by their pupils, in assessing themselves and each other, to modify the teaching and learning activities in which they are engaged.
Ireland's National Council for Curriculum and Assessment (2004) holds that assessment contributes significantly to teaching and learning, a view endorsed in recent policy documents, including the Primary School Curriculum. The Council believes that formative assessment has a central role to play in the teaching and learning process.
In a review of the English-language literature on formative assessment, Black and Wiliam (1998) concluded that:
"...formative assessment does improve learning. The gains in achievement appear to be quite considerable, and as noted earlier, among the largest ever reported for educational interventions. As an illustration of just how big these gains are, an effect size of 0.7, if it could be achieved on a nationwide scale, would be equivalent to raising the mathematics attainment score of an 'average' country like England, New Zealand or the United States into the 'top five' after the Pacific Rim countries of Singapore, Korea, Japan and Hong Kong." (Black and Wiliam, 1998)
This conclusion was drawn from a review of more than 250 articles on formative assessment.
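To give a sense of scale for the effect size of 0.7 quoted above, the short sketch below converts a standardized effect size into a percentile shift. It assumes, purely for illustration, that scores are normally distributed and that the effect size is a standardized mean difference; the function name percentile_after_gain is hypothetical and does not come from Black and Wiliam.

```python
from statistics import NormalDist


def percentile_after_gain(effect_size: float) -> float:
    """Percentile rank, within the original score distribution, reached by a
    previously average student whose score rises by `effect_size` standard
    deviations.

    Assumes normally distributed scores (an illustrative assumption).
    """
    return 100 * NormalDist().cdf(effect_size)


if __name__ == "__main__":
    # Percentile reached by an average student after a gain of 0.7 SD.
    print(f"{percentile_after_gain(0.7):.1f}")
```

Under these assumptions, a gain of 0.7 standard deviations would move a student from the 50th percentile to roughly the 76th percentile of the original distribution.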
Wiliam et al. (2004) found that over the course of a year, the rate of learning in classrooms where teachers were using short- and medium-cycle formative assessment was approximately double that found in other classrooms. Furthermore, teachers reported greater engagement by students in learning and increased professional satisfaction.
Many educators use the terms formative assessment and diagnostic assessment interchangeably. It is important, however, to differentiate between the two. Unlike formative assessment, which is an ongoing process, diagnostic assessment refers to the use made of the information gained from the administration of a test. This test, administered prior to instruction, seeks to ascertain each student's strengths, weaknesses, knowledge, and skills. It is used to detect students who may need special remedial help or special or alternative instruction (Nitko, 2004). It is expected that the results obtained from these tests will assist both in identifying the topics which are not known and in providing information on potential sources of the student's difficulty. Establishing these permits the instructor to remediate students and adjust instruction to meet each pupil's unique needs. Results of diagnostic assessments are not used to grade students.
Instead, teachers use diagnostic information to adjust instruction by identifying which areas students have and have not mastered (McIntire and Miller, 2006; Ketterlin-Geller & Yovanoff, 2009). This results in varied instructional plans that are responsive to students' needs. Diagnosis of students' difficulties is a necessary step in the remediation process. In order to determine an appropriate remediation strategy, teachers must be able to assess misunderstood concepts.
Diagnostic assessments have been seen to be effective in raising overall levels of student achievement. For example, diagnostic tests at Helsinki Polytechnic are used to assess the level of basic mathematical skills of new students, for the benefit of both the students themselves and their instructors, and also to place students in appropriate study groups. The results show that the diagnostic test correlates well with achievement in mathematics class in the first study period (Lehtonen, 2007).
Placement testing, such as COMPASS, CPT, ESOL, ACCUPLACER, is probably one of the most widespread uses of tests. Universities, the world over, use this type of test to determine the level of English and Math courses, among others, that students are prepared to enter. A placement test is a test given to students entering a school, college, or university to find the most appropriate courses or programs for them (Encarta Encyclopedia, 2009). Placement tests are often referred to as readiness tests.
Placement decisions are used when schools stream students into groups receiving different levels of instruction (Nitko, 2004). When using tests for making placement decisions, it is important to note that persons should be provided with the same general type of instruction geared to their level. Students obtaining lower scores should be placed into appropriate levels and helped until their skills improve.
Schools usually use placement tests to form instructional groups or to stream students. Placement tests are meant to help students succeed in a given subject area by measuring the skill levels students currently possess and, on that basis, determining which level of course students should be enrolled in to bring them to a desired level.
They provide direct measurements of students' current skills rather than their potential. Placement tests are used to determine how much students know and how well they know it. They are not meant to pass, fail or reject students but rather to place them in one 'stream' or another (Nitko, 2004).
Remediation in the (Math) Classroom.
Once diagnosis has been completed and struggling students identified, the Math teacher needs to incorporate remediation to address any deficiencies in student learning. This will prevent students from falling further behind. Since math concepts build upon each other (the concept of addition is a prerequisite for multiplication, for example), remediation holds the key to any successful math program. The remediation plan should detail the steps the student will need to complete in order to master the identified deficiency before moving on.
According to Long and Boatman (2010), students placed in lower-level remedial courses experienced more positive effects than those placed in more advanced developmental courses. For example, students in the lowest levels of remedial writing persisted through college and attained a degree at higher rates than their peers in the next highest level course. Students who took remedial writing courses also received higher grades in their first college-level writing course, indicating that some remedial courses are indeed helpful in preparing students for college-level work.
Long and Boatman claim that while developmental courses for students at the margin of needing any remediation have mostly negative effects, the impact of such courses on students with lower levels of preparation can be positive or much smaller. In essence, remedial and developmental courses help or hinder students differently depending on their levels of academic preparedness. Therefore, states and schools need not treat remediation as a singular policy but should instead consider it an intervention whose impact might vary according to student needs. Hence appropriate placement into remediation programmes is critical in order to cater effectively for students' needs.
In their study on the impact of remediation, Bettinger and Long (2009) found that remedial students at Ohio colleges were more likely to persist in college and to complete a bachelor's degree than students with similar test scores and backgrounds who were not required to take the courses, as long as the courses related to their area of interest. Moreover, Bettinger and Long (2005) found that community college students placed in math remediation were 15 percent more likely to transfer to a four-year college, and took ten more credit hours, than students with similar test scores and high school preparation. Overall, the results suggest that remedial courses have beneficial effects for students in Ohio.
Calcagno and Long (2009) found that students on the margin of requiring math remediation were slightly more likely to persist to their second year. They assert that remediation might promote early persistence in college, but it does not necessarily help students who are on the margin of passing the cutoff make progress toward a degree.
Martorell and McFarlin (2008, as cited in Calcagno and Long, 2009), however, found no significant effects in a similar study conducted with Texas students. This suggests that students neither are harmed by nor benefit greatly from any remediation programme.
Test development process
The development and piloting of an assessment instrument to accurately measure achievement or place students into appropriate programmes is both a time-consuming and expensive task requiring many people with varied expertise. To be useful, a test must provide some inference about the people who take the test. A properly constructed test should provide valuable information when used in an appropriate setting. To be useful, a placement test should effectively differentiate between high and low achievers (Adkin et al., 1947). Millman and Green (1989), Miller and Greene (1993), Schmeiser and Welch (2006) and Downing and Haladyna (2006) aver that the main features of the test development process include:
Defining the test purpose
Developing the test specifications
Developing the test items
Evaluating the items
Assembling the test
Reviewing the test and
Evaluating the test.
This section will briefly review the literature on these processes.
Defining the test purpose
Assessment in education serves many different purposes in different settings and as such, the first step in the test development process should be to define the purpose for which the test is to be used. The test developer must thus specify the intended use of the test and the decisions to be made from the scores.
According to Schmeiser and Welch (2006), when developing placement tests, test developers need to state clearly the audience for whom the test is developed and the level of knowledge and skills students need to enter a specific program; otherwise, students could easily be placed into an incorrect program, leading to further student failure. Mehrens and Lehmann (1991) (as cited in Schmeiser and Welch, 2006) and Bloom, Hastings and Madaus (1971) outlined several purposes of assessment, including making placement decisions, improving learning, and auditing learning.
Whatever decision is to be made from the test, it is imperative that its purpose be clearly articulated from the outset, before any further work is carried out. The purpose determines the different levels of questioning and the duration and length of the test. This step provides the foundation for all other activities.
With the purpose of the test and the audience established, the next step in the development process is to define the test specifications. This includes specifying the test characteristics, content domain, format, length and delivery platform (Schmeiser and Welch, 2006; Downing and Haladyna, 2006).
A test specification, or blueprint, is a two-way grid which lists the content areas to be included on the test, along with the cognitive levels that test items are intended to target. It dictates how the test will be constructed and describes the testing format (objective or constructed response), the number of items to be included, the cognitive level of each item, the scoring system and, most importantly, the test content (Downing and Haladyna, 2006). The specifications, which should be derived from the national or school curriculum, help ensure that particular content topics are included in the test and so improve its overall content validity. The development of the test specification should involve all major stakeholders in the process.
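As an illustration of the two-way grid described above, the following minimal sketch represents a blueprint as content areas crossed with cognitive levels and computes the row and column totals. The topics, cognitive levels, item counts and function names are all hypothetical and are not taken from Downing and Haladyna (2006).

```python
# Hypothetical two-way test blueprint: content areas (rows) by cognitive
# levels (columns). Topics, levels and item counts are invented for
# illustration only.
BLUEPRINT = {
    "Number operations": {"Knowledge": 4, "Comprehension": 3, "Application": 3},
    "Fractions": {"Knowledge": 3, "Comprehension": 4, "Application": 3},
    "Measurement": {"Knowledge": 2, "Comprehension": 3, "Application": 5},
}


def items_per_content_area(blueprint):
    """Row totals: number of items planned for each content area."""
    return {area: sum(levels.values()) for area, levels in blueprint.items()}


def items_per_cognitive_level(blueprint):
    """Column totals: number of items planned at each cognitive level."""
    totals = {}
    for levels in blueprint.values():
        for level, count in levels.items():
            totals[level] = totals.get(level, 0) + count
    return totals


def total_items(blueprint):
    """Overall test length implied by the blueprint."""
    return sum(items_per_content_area(blueprint).values())
```

Checking the totals in this way lets the development team confirm, before any items are written, that the planned test length and the balance across topics and cognitive levels match what the stakeholders agreed.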
Having defined and developed the test specifications and the testing domain, the next step is to begin writing the items as delineated in the specifications. The creation of effective items is probably the greatest challenge for test developers. Haladyna, Downing and Rodriguez (2002) posit that creating effective test items is more of an art than a science.
The process of item development needs to take into account the background and experience of the population being tested and the purpose of the testing activity. That process begins with the selection of a competent and knowledgeable team of writers. The writing team should consist of experts who can produce material as outlined in the specifications and should be representative of the population to be tested. Writers should undergo item-writing training sessions which focus on the creation of technically sound items (Schmeiser and Welch, 2006). This training is important as it helps with validity issues (Downing and Haladyna, 2006). The test specifications should form the basis for these training sessions.
The type of item to be developed is guided by the table of specifications. While the multiple choice item is usually the format for most large-scale testing programs, developing sound objective items is far more difficult and time-consuming than preparing sound performance items (Downing and Haladyna, 2006; Haladyna, 1999, in Schmeiser and Welch, 2006). The creation of effective test items is challenging but is nevertheless a critical step in the test development process.
Developed items should now undergo a process of review. This step is both necessary and important in ensuring accuracy in terms of curriculum coverage and grammar and, in the case of multiple choice items, in ensuring that they are constructed properly and have only one key. Like item writers, reviewers should also be experts in the area for which items are being reviewed (Schmeiser and Welch, 2006). Teams of reviewers should include curriculum specialists, master teachers, principals and assessment experts. Having team members from different sectors of the population will increase fairness and reduce bias. Items that are deemed acceptable and have passed the review stage should be prepared for field testing.
Field Testing of items
After items have been reviewed for content, accuracy and fairness, they should be refined where necessary, taking into consideration the recommendations of the review team. When a new test is developed, it cannot be assumed that it will perform as expected. As such, developers should conduct studies to determine how well items on a test will perform. One such study is the field testing of the items.
Acceptable items should be proofread and compiled into small booklets for field testing (Schmeiser and Welch, 2006; Florida Department of Education, 2005; McIntire and Miller, 2007). Ideally, items should be field tested with students outside the general testing population. For example, a test meant for a class of 2012 should be piloted with a class of 2011. The field testing process involves administering the test to a small sample of the target audience and analyzing the data obtained from the test.
Upon scoring the field test items, a statistical review should be conducted. According to Schmeiser and Welch (2006) and McIntire and Miller (2007), this analysis should include:
the facility index,
the proportion of students selecting each option,
the discrimination index, including the biserial and/or point-biserial correlation.
Flawed items should be modified or discarded.
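The statistics listed above can be sketched as follows for a single multiple-choice item. This is a minimal illustration rather than the procedure prescribed by the cited authors: the function names are hypothetical, and the upper-lower group method with a 27% group size is a common convention adopted here as an assumption.

```python
from collections import Counter


def facility_index(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def option_proportions(responses):
    """Proportion of examinees selecting each option of a multiple-choice item."""
    counts = Counter(responses)
    return {option: n / len(responses) for option, n in sorted(counts.items())}


def upper_lower_discrimination(item_scores, total_scores, fraction=0.27):
    """Facility in the top-scoring group minus facility in the bottom-scoring group.

    The 27% group size is a common convention, used here as an assumption.
    """
    n_group = max(1, round(len(total_scores) * fraction))
    # Indices of examinees ordered from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:n_group]]
    upper = [item_scores[i] for i in order[-n_group:]]
    return facility_index(upper) - facility_index(lower)
```

An item with a very high or very low facility index, a distractor that almost nobody selects, or a discrimination value near zero or below would be a candidate for the modification or removal mentioned above.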
Schmeiser and Welch (2006) define test assembly as a process whereby accepted and psychometrically sound items which will make up the test are selected and organized into the final version. This is a crucial process, since the validity of the interpretations to be made from the results of a test rests on a competent and accurate test assembly process (Downing and Haladyna, 2006). The final appearance of a test can affect the validity of the results. Typographical errors, ambiguous directions and disorganized arrangements of items could contribute to measurement errors and should be avoided as much as possible.
The method used to assemble the final test form depends on the mode of delivery to be used. For a paper and pencil mode, the test is usually assembled manually whereas for computer delivery mode, specialized software packages are required.
Careful consideration must be given to the following criteria during the test development process (Downing and Haladyna, 2006):
Curriculum coverage
Item difficulty and discrimination
Visual balance and layout
Option balance - each answer position should serve as the key approximately equally often
Although item development should already provide adequate curriculum coverage, assembly of the final test should also ensure adequate content coverage. Items should be grouped according to format and arranged in increasing order of difficulty, since this will help reduce students' anxiety and boost their confidence. Furthermore, items testing the same topics should be placed together (Downing and Haladyna, 2006; Schmeiser and Welch, 2006; Oermann and Gaberson, 2009).
In terms of visual balance and layout, test items should not be crowded on the page and should allow students to read efficiently. There should be sufficient white space within and between items. The layout of text and graphics should not distract or put any of the test takers at a disadvantage (Schmeiser and Welch, 2006; Oermann and Gaberson, 2009).
For a multiple choice test, the location of the correct answer should be randomly assigned, and each location should occur about the same number of times. That is, in a four-option multiple choice test, there should be approximately equal numbers of A's, B's, C's and D's appearing as the key.
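One way to satisfy both requirements, random placement and approximately equal use of each position, is sketched below. The function name and the seed parameter are hypothetical, and this is only one possible approach to balancing keys.

```python
import random


def balanced_key_positions(n_items, options="ABCD", seed=None):
    """Return a randomly ordered list of key positions, one per item, in which
    each option letter is used as the key about equally often (the counts
    differ by at most one)."""
    rng = random.Random(seed)
    letters = list(options)
    # Decide at random which letters receive the extra uses when n_items is
    # not a multiple of the number of options.
    rng.shuffle(letters)
    repeats = -(-n_items // len(letters))  # ceiling division
    keys = (letters * repeats)[:n_items]
    # Randomize the order in which the key positions occur across the test.
    rng.shuffle(keys)
    return keys
```

For a 40-item, four-option test this yields exactly ten keys at each position, arranged in random order; the test assembler would then order the options of each item so that the correct answer falls at the assigned position.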
Schmeiser and Welch (2006) posit that test format will vary from test to test depending on examinee characteristics, delivery platform, visual appeal and client preferences.
Reviewing and evaluating the Test
Another important stage in the test development process is the review of the final test form. While the individual items have already undergone a thorough review process, it does not necessarily follow that they will perform as expected in a final test form. Furthermore, there may be additional concerns which can only be addressed when the whole test is reviewed.
A thorough review of the test will serve to detect content-related issues and to check the independence of each item (Downing and Haladyna, 2006). Instructions should be reviewed for clarity and for any ambiguity which may exist. Schmeiser and Welch (2006) suggest seven major reviews that all tests should undergo. These include:
Initial review for technical merit and to ensure that tests adhere to the test specifications.
Editorial review for grammatical errors and typos. This review should also check that each item contains only one key.
Measurement specialist review
Alignment, content and fairness review to ensure that the test conforms to the test specification and that all items are accurate and sound.
User review to ensure balanced curriculum coverage
Once the test has been administered, the performance of the items should be evaluated for several reasons. Schmeiser and Welch (2006) state that evaluating item performance serves as a quality assurance step to confirm that items are performing as expected. It is possible for items to produce significantly different results when administered in a whole test than when they were field tested.
An evaluation of the test results can also shed some light on the performance of different components of the test. Finally, the results can and should be used to improve the overall test.
In analyzing the test, developers should examine the measures of central tendency and spread to determine whether the test was too easy or too difficult for the intended purpose and whether it measured what it was intended to measure. The reliability of the test should also be evaluated. Reliability estimates should be evaluated for raw scores and, where necessary, scaled scores. In the case of a single administration, a KR-20 reliability estimate is used to determine the internal consistency of the test (Schmeiser and Welch, 2006).
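The KR-20 estimate mentioned above can be computed from a matrix of dichotomous item scores, as in the minimal sketch below. The function name is hypothetical, and the use of the population variance of total scores is an assumption; some texts use the sample variance, which changes the value slightly.

```python
from statistics import pvariance


def kr20(score_matrix):
    """KR-20 internal-consistency estimate for dichotomously scored items.

    `score_matrix` holds one row per examinee and one 0/1 entry per item.
    The population variance of total scores is used here (an assumption).
    """
    n_examinees = len(score_matrix)
    k = len(score_matrix[0])
    totals = [sum(row) for row in score_matrix]
    total_variance = pvariance(totals)
    if k < 2 or total_variance == 0:
        raise ValueError("KR-20 needs at least two items and some spread in total scores")
    # Sum over items of p * q, where p is the proportion correct and q = 1 - p.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n_examinees
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)
```

Values closer to 1 indicate that the items rank examinees more consistently with one another; what counts as an acceptable value depends on the stakes attached to the decisions being made from the scores.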
In terms of item performance, the test should be evaluated using classical test theory (CTT) or the more complex item response theory. Classical test theory is used more often since it is a relatively simple theory. In CTT, the facility index and discrimination index are analyzed. The facility index, as indicated previously, is the proportion of examinees answering an item correctly, while the discrimination index tells how well an item differentiates between high and low scorers. Measures of discrimination include the biserial and point-biserial indices. The point-biserial is used for dichotomously scored items.
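For a dichotomously scored item, the point-biserial index can be sketched as below. The function name is hypothetical, and the total score used here includes the item itself (the uncorrected form), which is an assumption made for simplicity.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item score and the total score.

    Uses the uncorrected total (the item is included in it), an assumption
    made for simplicity.
    """
    correct = [t for x, t in zip(item_scores, total_scores) if x == 1]
    incorrect = [t for x, t in zip(item_scores, total_scores) if x == 0]
    if not correct or not incorrect:
        raise ValueError("the item needs both correct and incorrect responses")
    sd = pstdev(total_scores)
    if sd == 0:
        raise ValueError("total scores must show some spread")
    p = len(correct) / len(item_scores)  # facility index of the item
    # Difference between group means in SD units, weighted by sqrt(p * q).
    return (mean(correct) - mean(incorrect)) / sd * sqrt(p * (1 - p))
```

A positive value indicates that examinees who answered the item correctly tended to have higher total scores, which is the behaviour expected of an item that discriminates in the intended direction.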
Determining Cut-off scores
Cut-off scores are set as part of identifying the best qualified candidates for a position. Since situations vary from one process to another, judgement is required in setting cut-off scores. A cut-off score represents a standard of performance that is set in a selection process with the objective of identifying the best qualified candidate(s). In setting a cut-off score, you are deciding on the level of performance that a candidate must display to be considered further. Often the objective of identifying the best qualified candidate(s) will be achieved most efficiently by setting a standard of performance above just a minimally acceptable level.
Higher scores on selection instruments are usually associated with higher levels of job performance. The expression "more is better" captures this notion. The manager may want to consider only candidates showing higher levels of performance. Whatever the initial preference of the manager, he/she will want to consider several factors before making a cut-off score decision.
Factors to Consider in Setting Cut-Off Scores
In setting a cut-off score, it is crucial to consider the level of competence required to perform the job. Regardless of other factors, no cut-off score
Who Should Set Cut-off Scores?
Cut-off scores should be set by people who have a good understanding of the position and the required level of job performance. Awareness of labour market conditions and of similar competitions in the past is a definite asset. Normally, the manager of the position to be staffed is the most appropriate person to set cut-off scores. Nonetheless, the opinion of others knowledgeable in the area is often useful in making the final decision.
Types of Cut-off Scores
Methods for setting cut-off scores may be divided into two major types: performance-related and group-related. These two types of methods and their combination are described below. Additional methods for setting cut-off scores can be found in "Guidelines for Establishing Pass Marks" published by the Public Service Commission.
Performance-related cut-off scores
Performance-related cut-off scores are set by making a judgement about the test score or the level of the qualification that corresponds to the desired level of job performance. The following are examples of this type of cut-off score:
On a test of typing speed and accuracy: 40 gross words per minute with no more than a 5% error rate.
On a paper-and-pencil instrument measuring knowledge: 80 correct answers out of 100 questions.
On a test of lifting strength: lifting a weight of 20 kg.
On a qualitative rating scale for motivation: a rating of "adequate" or better.
On a 5-point rating scale for initiative: a rating of 4 or better.
Group-related cut-off scores
Group-related cut-off scores are set relative to the performance of the candidates in a reference group. This reference group may be the present group of candidates, last year's group of applicants, or some other appropriate reference group.
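The difference between the two types of cut-off score can be sketched as follows. The function names are hypothetical, and selecting a top fraction of the reference group is only one assumed way of setting a group-related standard; the text above does not prescribe a particular rule.

```python
def performance_related_pass(score, cutoff):
    """Fixed standard: a candidate passes by reaching a pre-set score,
    regardless of how other candidates perform."""
    return score >= cutoff


def group_related_cutoff(reference_scores, top_fraction):
    """Lowest score that still falls within the top `top_fraction` of a
    reference group.

    Selecting a top fraction is a hypothetical rule used for illustration.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be greater than 0 and at most 1")
    ranked = sorted(reference_scores, reverse=True)
    n_selected = max(1, round(len(ranked) * top_fraction))
    return ranked[n_selected - 1]
```

With a performance-related standard such as 80 correct answers out of 100, the cut-off stays the same whoever applies, whereas a group-related cut-off moves up or down with the performance of the reference group. Where several candidates tie at the cut-off, more than the intended fraction may qualify.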