Although we have students from many different backgrounds and walks of life, a common bond for many of them seems to be the learning of English and the struggles and difficulties associated with it. When these students come to us in the classroom they may be at varying levels of English proficiency: some know no English at all, some have a working knowledge or mastery of basic interpersonal communication skills (BICS), and some have mastered cognitive academic language proficiency (CALP) and are barely recognizable as ELL's. No matter where they come from, however, or how proficient they are in their English development, all ELL's deserve our best teaching and highest attention to detail.
I had an experience in the classroom this past school year that I did not think much about until I took this course and began to realize how important the issue was. My students were taking the criterion-referenced test (CRT), and I was responsible for providing accommodations for certain students. I was reading the mathematics portion of the test to the students and I noticed that it was heavily laden with vocabulary that I knew was unfamiliar to them. There were what seemed to be whole passages of narrative for a single problem, with just a few numbers for the students to compute. As I was reading the problems aloud, I realized that many of my students were lost and had no idea what I was reading to them. I thought to myself, "No wonder so many students fail the tests! There is so much linguistic skill needed just to decode the problems, let alone the mathematical skills needed to compute the numbers and solve the problems."
I really had thought nothing more about this concept of validity until I was watching the video for the course "Assessment for English Language Learners-Roles, Purposes, and Types of Assessment." When the presenter, Jiménez (2010), mentioned the concepts of reliability and validity, it triggered my memory about the research methods graduate course I had taken. Part of the course addressed the topic of reliability and validity, and I would like to focus the remarks of this research paper on that topic.
In order to better understand reliability and validity, and how they apply to ELL's in the classroom, we need to understand what the terms mean. Jiménez (2010) suggests that reliability: is the consistency of a measurement, or the degree to which an instrument measures the same way each time it is used under the same condition with the same subjects; is the repeatability of the measurement; and is not measured, but estimated. She further suggests that a measure is considered reliable if a person's score on the same test given twice is similar. What does that mean? Simply put in terms that relate to what a teacher would do in the classroom, is an assessment consistent in its results? If I create or administer a test for my students I should be able to get approximately the same results if I administer the same test at a later time. Such a test might include something like an IQ test, or the test administered to GATE (gifted and talented education) students in the Clark County School District (CCSD). The idea is that the student will perform approximately the same each time.
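To make the idea of estimating reliability concrete, here is a minimal sketch of one common approach, test-retest reliability, in which the scores from two sittings of the same test are correlated. This is not drawn from Jiménez's lecture; the Pearson correlation is simply one conventional estimate, and the function name `pearson_r` and the five students' scores are hypothetical values invented for illustration. A result close to 1 would suggest that students performed about the same both times.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of products of deviations from each sitting's mean
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for five students on the same test given twice.
first_sitting = [72, 85, 90, 64, 78]
second_sitting = [70, 88, 91, 60, 80]

# Test-retest reliability estimate: closer to 1 means more consistent results.
reliability_estimate = pearson_r(first_sitting, second_sitting)
print(round(reliability_estimate, 2))
```

Note that this only estimates consistency. A test can produce nearly identical scores on both sittings and still fail to measure what it is supposed to measure.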
Cook and Campbell (1979) define validity as the "best available approximation to the truth or falsity of a given inference, proposition, or conclusion." So validity is the strength of conclusions, inferences, or propositions, which for an assessment basically means that it measures what it is intended to measure. For instance, if a third-grade math test asks students to factor polynomials and solve quadratic equations, the assessment is obviously invalid because those concepts are clearly not taught in the third grade.
Going back to my original query at the beginning of this paper, is a test that is designed to assess mathematical skill valid if it requires the use of reading and other skills? Jiménez (2010) poses the same question in her video lecture, and says that is "a question for the courts to decide and for greater minds, I suppose, to think about."
That is where I disagree with her. I think the courts have dictated for too long what happens in education. The time is now for educators to start answering the hard questions that affect their students firsthand. Educators in the trenches, not the courts, need to be making decisions for their own students. Politicians and bureaucrats have too much influence on the classroom, and that needs to change. Teachers need to take a hard look at language issues and decide what is best for their students.
As for the question of whether or not a test is valid for ELL's if it is in English, I have not yet arrived at a conclusion to that one myself. I'm not sure that there is a right or wrong answer to the question, but as with any issue related to education, this one is a hot button issue that tends to get both schools of opinion very heated. "Since the main purpose of most content tests (especially the CRT's) is to assess a student's subject matter knowledge, the test questions should not require a level of English proficiency that is so high as to pose difficulty in understanding the task presented in the question. This is a concern for all students, including proficient native speakers, but it is especially a concern for students who may not be fully proficient in English, such as is the case with ELL's" (Young, 2008).
There is a view that teachers often fall into a rut and use one method of teaching exclusively, despite its lack of efficacy. Some schools and districts employ the same strategy when it comes to testing. They select those things that suit their own views, and reject those things that may contradict them. Dewey (1964), speaking of the "sects" within the "schools of opinion," said that, "Each selects that set of conditions that appeals to it; and then erects them into a complete and independent truth, instead of treating them as a factor in a problem, needing adjustment." Put colloquially, Dewey is suggesting that we often fail to see the proverbial forest for the trees. We let our own opinions and ideas cloud our judgment as to what is best for our students.
Another thing to think about with regard to the criterion-referenced testing of students mandated each year by the No Child Left Behind Act: the tests are supposed to measure students against a specific set of standards, hence the word criterion in criterion-referenced testing. Schools' passing test scores for each year, however, are based not on a specific, fixed benchmark, but rather on the percentage of students from the previous year who passed the test. Doesn't this sound a bit like norm-referenced testing, where scores are judged against each other and not necessarily against a set standard? Is it a reliable conclusion, then, to base a school's adequate yearly progress on norm-referenced scoring and not criterion-referenced scoring?
In the video lecture for the course, Jiménez (2010) offered some examples of test bias and showed how they were either not reliable or not valid. One example was on the vocabulary section of an English proficiency test, which showed a picture of dolphins leaping in the ocean. The test company was located in Monterey, California, where they field tested this item. English learners in the field test all identified these as dolphins, so it was included as valid. Jiménez suggested that the question was invalid because students who grew up in areas away from the ocean, who had never seen a dolphin, might not be able to answer that question correctly.
I know that validity is extremely important, but where do we draw the line? I know that there are many things we can do to reduce test bias, but at some point (using Jiménez' example) a student has to know what a dolphin is, whether he speaks English or some other language as his first language, and whether he is from a coastal state or not. Think about units taught in elementary schools on seasons and months of the year. What are the symbols one would expect to see on the decorations for the month of December? Now think about where you live. If you have not already guessed, I am demonstrating a point, which is this: snowflakes and snowmen are used to depict winter months, despite the fact that most students in the southern United States (from Florida to California), Hawaii, and most Spanish-speaking countries (Hispanics being the largest group within the student population in the CCSD) have seen snow only in pictures or in very small amounts for very short periods of time. So if so many students have never seen snow, or are so unfamiliar with it, why do we continue to associate the symbols of snow with winter? It is because such a vast majority of students have come to associate snow and ice with winter, and understand those things, so it has become something of a generally accepted concept. The same holds true for Jiménez' dolphin example. Some level of background knowledge is required for students to complete some tasks, even if the student may not have an extensive knowledge of that matter.
Another example of a biased test question Jiménez offered came from a Spanish proficiency test for English learners. The test was in Spanish, for students recently arrived in the United States. The question showed a picture of three children standing at a doorstep ringing a doorbell. One was dressed as a ghost, another as a pirate, and the third as a skeleton. The test asks, "¿Qué hacen? [What are they doing?]." The obvious answer for any American student is that the children are trick-or-treating. Is that such an obvious answer for non-American students, or students who have not grown up with American customs? In research terms, the influence of this kind of prerequisite knowledge on an assessment is called construct-irrelevant variance. "These are factors that influence students' test scores but are not directly related to the construct (the knowledge, skills, or proficiency an assessment targets). This is undesirable because it means that test performance may be partly due to factors that differ from the construct the test is designed to measure" (Young, 2008).
I experienced two situations during my first year of teaching which illustrate this matter. In the first, I had a student in my class on the first day of school who spoke not one word of English. School had started on August 25th and he had moved to the United States from Mexico on August 24th. This was not only a new school, new class, and new curriculum, but a new country, new language, and new culture. After very little time in my class, it became apparent that this boy was extremely well educated in all content areas, reading and writing included. He could read Spanish books to me very fluently and was able to answer questions about them. Most Hispanic ELL's I have had over the years could not read or write in Spanish. Most could barely speak more than conversational Spanish, and some less than that. This student's mathematics skills were far and away the best in the class. When CRT's were taken in March, however, this boy failed miserably. I was not surprised, but devastated nonetheless. Here was one of the brightest students in my class and he failed the most important test of the year. Was his failure due to construct-irrelevant variance or lack of knowledge? In my opinion, it was definitely due to the former, not the latter.
The second example was a student who could not read or write well at all. His reading scores put him at the level of a kindergartener at the end of the year. His math skills were about average for a third grader, which is the grade he was in, and he could perform most of the computational work on his own. When we took tests from the book, which were mandated by administration, he always failed them. The tests were multiple choice, and most contained a lot of writing and language that would require a student to read well, at least at a third-grade level. I decided to read the tests to him and let him do the rest of the work on his own. As soon as I did this, his scores increased dramatically. I decided one day to administer a test to him and let him work it out by himself. As I predicted, he failed. The next day, I administered the same test to him, but this time I read it out loud to him. This time, as I predicted, he scored quite well, and that proved to me that the math problems were not too challenging, but that the language of the tests was a source of construct-irrelevant variance.
In my discussion post for the Assessment for ELL's course I wrote, "I think part of learning a new language is not simply being able to speak the language, but knowing the culture as well, like the holidays and customs, regardless of whether or not you grew up with them. If I lived in Puerto Rico, I would surely learn very quickly about their Christmastime celebrations, which are quite different from ours here in the United States. And if I took a test in Puerto Rico and was asked a question where knowledge of these customs was a prerequisite, I would not only think it was a valid question, but I would expect it." Having done a bit more research and more in-depth thinking, I would have to revise that statement.
With regard to the question about the children trick-or-treating: when assessing students for language proficiency, whether in English or any other language, when would it be necessary to have prerequisite knowledge of cultural customs and norms? I think it would make more sense to ask questions about language mechanics than questions about cultural factors, which are not a reliable or valid indication of actual student knowledge.
For ELL's it is critical that they receive instruction and assessment that is reliable and valid. There have been countless times when society has come to a misinformed conclusion on a particular issue based on unreliable or invalid research. One of the biggest examples of this is research done in the 1960's by Roger Sperry, an American psychobiologist who discovered that the human brain has two very different ways of thinking. Without getting into an inordinate amount of detail, Sperry's conclusions have led many to the false notion that people are predominantly right- or left-brained. Sperry's big problem was that his test group was not representative of the entire population, a problem of reliability: his tests would have had different results with different subjects. "Researchers have come to see the distinction between the two hemispheres as a subtle one of processing style, with every mental faculty shared across the brain, and each side contributing in a complementary, not exclusive, fashion" (McCrone, 2000).
So how does this concept apply to education, and specifically ELL's? How do we prevent data from being misinterpreted? If educators are taking evidence and data from ELL's and using that as a benchmark for future ELL's, then are the data reliable and valid? Only if those data can be shown to be free of construct-irrelevant variance, which in many instances they cannot. It is pretty clear that ELL's are disadvantaged by having to take state and federal standardized tests in an unfamiliar language. Would students perform better on these tests if they were translated into their native language? I'm sure they would, but without any real data to support that claim, and as federal law prohibits the translating of any testing materials, one can only surmise that scores would be higher. The other obstacles to translating tests for ELL's, apart from federal law, would be accuracy and practicality. Students whose primary language is Spanish currently comprise nearly 80% of the ELL population in the United States (Kindler, 2002). It would make sense to translate tests into Spanish only, but how would that be done? The process would have to be carefully controlled and done directly at the source. Leaving the task to individual states or districts would be a recipe for disaster. That would create an avalanche of invalid test scores, since all the tests would likely be different. The other factor to consider is that ELL's speak over 450 languages (Kindler, 2002). It really is not practical to translate tests into over 400 languages. It simply would not work.
So what is the ultimate answer for the ELL conundrum when it comes to testing? Are test scores valid and reliable or not? I think the closest anyone can come to answering that question is that results cannot be reliably interpreted without more data at the national level. As I mentioned before, federal law does not permit states to translate CRT materials for ELL's. Until federal law does make that allowance, perhaps we will never have reliable test scores for the ELL subgroup.