Defining Class Size and Student Achievement in Education

Today, the construct of class size encompasses a wide variety of instructional settings, ranging from one-on-one tutoring to online classes serving several hundred students at a time. Likewise, the concept of small and smaller class size evolved greatly over the course of the 20th century. While class size denotes the average number of students entrusted to the care of one teacher over the course of one year, pupil-to-teacher ratio refers to the number of students within a local educational authority divided by the number of certificated personnel employed by the organization to serve that student population (Achilles, n.d.). Teacher-student ratio denotes the same construct. Differences between pupil-teacher ratio and class size were found to be as large as 10 students: given a student-teacher ratio of 17 students to one teacher in a given building, the actual classroom load may be as large as 27 students for one teacher (Achilles, Finn, & Pate-Bain, 2002). Yet, in spite of these differences, the literature on instructional settings has erroneously used the two concepts interchangeably. While actual class size may vary during the year or even during the same day, pupil-teacher ratios are usually smaller, since they may include certificated personnel not assigned to one classroom or assigned to smaller classes such as those typically required to serve special needs students. In other words, although the two constructs are highly correlated, student-teacher ratios are likely to be considerably lower than actual class sizes. In fact, it is only at the classroom level that both metrics may be identical (Achilles, n.d.), assuming that students are not pulled out during the day.
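The gap between the two measures can be illustrated with a short sketch. The staffing numbers below are hypothetical and were chosen only so that the two calculations reproduce the 17-versus-27 contrast cited above; the `School` class and both function names are assumptions made for illustration, not part of any cited study.

```python
from dataclasses import dataclass


@dataclass
class School:
    """Hypothetical building-level data (illustrative numbers only)."""
    students: int
    classroom_teachers: int        # teachers assigned to a regular classroom
    other_certificated_staff: int  # e.g. specialists, special-education staff


def pupil_teacher_ratio(school: School) -> float:
    """Students divided by ALL certificated personnel in the building."""
    staff = school.classroom_teachers + school.other_certificated_staff
    return school.students / staff


def average_class_size(school: School) -> float:
    """Students divided by classroom teachers only."""
    return school.students / school.classroom_teachers


# Assumed figures: 459 students, 17 classroom teachers, 10 other certificated staff.
school = School(students=459, classroom_teachers=17, other_certificated_staff=10)
print(pupil_teacher_ratio(school))  # 17.0
print(average_class_size(school))   # 27.0
```

In this sketch, the same building reports a ratio of 17:1 while each classroom teacher faces 27 students, simply because ten certificated staff members who do not run a regular classroom are counted in the denominator.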

That said, student-to-staff ratios in public schools steadily decreased from 35:1 in 1890 to 28:1 in 1940 and 20:1 in 1970 (Hanushek & Rivkin, 1997). Hanushek remarks that over the period 1950-94 the pupil-teacher ratio dropped by 35%. Yet achievement in mathematics, science, and reading as measured by the National Assessment of Educational Progress (NAEP) remained consistently flat over the last three decades of the 20th century (Hanushek, 1998; Johnson, 2002). Although these figures suggest that lowering student-teacher ratios does not translate into gains in academic achievement, the proponents of smaller class sizes point to the changing nature of education. Indeed, the growth of specialized areas of instruction such as special education lowers the pupil-teacher ratio and thereby gives the illusion that class size has been reduced (Achilles, et al., 2002), while class size itself remained constant or even increased over the same period. Other researchers (Biddle & Berliner, 2002; Greenwald, Hedges, & Laine, 1996) further contend that Hanushek's conclusions lack external validity, since the sample groups used in his studies were small and not representative of the whole U.S. population. Moreover, the use of student-teacher ratios, uncontrolled for other characteristics, to describe class size purportedly hides confounding variables (Biddle & Berliner, 2002, 2003).

Likewise, research on class size and academic achievement has focused on increasingly smaller sizes, typically comparing classes of between 15 and 35 students. For instance, while Rice (1902) compared the effectiveness of classes of under 40 students, 40 to 49 students, and 50 students and over, later studies carried out in the 1980s focused on much smaller class sizes, typically 15 to 22 students versus 23 to 35 students (Molnar, et al., 1999; Nye, Hedges, & Konstantopoulos, 2000; Shapson, Wright, Eason, & Fitzgerald, 1980). In some studies, such as the first meta-analysis on class size conducted by Glass and Smith (1979) and Glass, et al. (1982), the research also included comparisons of classes of 25 students or more with one-on-one tutoring (class size of one). Researchers such as Slavin (1986) pointed out that such wide variations between class sizes severely undermined the external validity of these studies. Since most educational policies involved class size reductions to a maximum of 15 students, and given that most of the studies carried out since the late 1970s included comparisons of such classes, this review of literature will not report studies comparing the effectiveness of one-on-one tutoring to whole-class instruction.

The difficulty of defining the concept of small class size is further compounded by the multiple methods of calculating student-teacher ratios and by the complexity of school master course schedules. Although researchers agree that class size is a ratio involving students and instructors, studies have been inconsistent or even silent as to how such ratios are obtained. In the large-scale Coleman Report (1966), class size was obtained by dividing the student population within a building by the number of faculty, including non-instructional staff such as librarians and clerks who do not instruct classes. Since the primary purpose of the Coleman Report was to observe the impact of racial segregation on achievement in American schools, class size was, ipso facto, aggregated with other measures of school facilities/resources, and the report did not account satisfactorily for the impact of class size on achievement within the larger context of public education. Relying on the available data, drawn from large samples of convenience and questionnaires, the study was unable to isolate the impact of class size on achievement.

Furthermore, other factors such as non-assigned teaching staff, pullout of students for differentiated instruction, or even small-group workshops taking place at various times of the day also complicate the calculation of student-teacher ratios. Class size in itself includes considerable variations (such as allotted time, student characteristics, instructional methods, grade levels, and subject areas), which, if left undefined, may lead to an underestimation of the true relationship with student achievement (Ehrenberg, Brewer, Gamoran, & Willms, 2001a). Clearly, class size and student-teacher ratios are not equivalent: the latter does not account for the actual schooling context in which students are learning, and there is no agreement among researchers on a standardized method of calculating such ratios.

In the final analysis, the researcher must be explicit when defining the constructs used. Adcock suggests a working definition of class size as the total number of students enrolled on the last school day of the year divided by the derived school number of core teachers employed on the last of the school year of [a given] school (Adcock & Winkler, 1999, April, p. 9). This constructed statistic of class size considers only those teachers assigned to academic subjects: English/language arts, social science/history, mathematics, and science.

The concept of academic achievement or academic performance in the present study refers to the individual norm- or criterion-referenced standardized measures administered mostly at the state level (e.g., Iowa Test of Basic Skills [ITBS], California Standards Test [CST], National Assessment of Educational Progress [NAEP], or Stanford Achievement Test [SAT], to name a few standardized tests commonly used in K-12 education). Academic achievement differs from academic attainment in that data measuring academic performance are collected at regular intervals for the purpose of measuring progress. Academic attainment, on the other hand, denotes reaching educational goals or milestones that enhance one's societal status, such as graduation from an educational institution, or even moving up the socio-economic ladder. Although most research reports separate aggregated academic achievement results in one or more of the four core subjects (mathematics, language arts, social studies, and science) for the various groups of students being observed, some studies, particularly meta-analyses such as Glass & Smith (1979), combined achievement results for lack of more specific data. Although one could conceive of other methods of measuring schooling outcomes, such as authentic assessment, standardized testing is more readily available as a measurement. By and large, such quantifiable measurements are commonly reported and will be used extensively in the present study.

Historical Context of Class Size Research

As early as the turn of the 20th century, class size and its effects on academic achievement elicited the interest of educational researchers. At that time, the focus was on elementary education, and more sparingly on the secondary level (Glass, et al., 1982). From the 1900s to the 1920s, studies followed in Rice's (1902) footsteps; however, these were shown to contain minimal experimental control (Glass, et al., 1982). By the early 1930s, most research efforts related to class size had gone dormant, until interest resurfaced in the 1960s when student achievement was correlated with school resources (Glass, et al., 1982). Experimental and quasi-experimental research on the topic greatly expanded in the late 1970s and early 1980s, with the growing unease across the nation that public education was failing children. Two public reports sparked a renewed interest in school reforms and class size research: A Nation at Risk (Gardner, Larsen, Baker, & Campbell, 1983) and the Coleman Report (Coleman, et al., 1966).

In the wake of the successful launch of Sputnik by the Soviet Union in 1957, the supremacy of the United States was no longer taken for granted at home; this crisis of confidence culminated some twenty-five years later with the publication of A Nation at Risk (Gardner, et al., 1983), which pointed to the decline of SAT scores from the 1960s to the 1980s and to the resulting lack of international competitiveness of the American educational system. At the state level, boards of education closely monitored large programs of class size reduction launched statewide in Tennessee and Wisconsin; similar actions controlling class size were seen as an easy mandate for public education entities to implement (Addonizio & Phelps, 2000).

Moreover, opinions in the 1960s were divided as one wondered whether the expected increase in academic achievement realized through the implementation of smaller class size would justify the additional spending of public monies. The large-scale state of education research published by Coleman (1966) attributed differences in achievement among students to family environment, defined as the number of books available in the home or the socio-economic status of the unit, and downplayed the role of schooling context, including class size, in student achievement.

In a commissioned paper designed to inform public policy in education, the Coleman Report (1966), using standardized test scores and questionnaires from teachers and principals, measured the academic achievement of more than 150,000 students in grades 1 to 12 and found class size to be a negligible factor in student achievement on standardized norm-referenced tests of verbal ability and mathematics: "Some facilities measures, such as the pupil/teacher ratio in instruction, are not included [in the report] because they showed a consistent lack of relation to achievement among all groups under all conditions" (Coleman, et al., 1966, p. 312). Disregarding the possible impact of class size on student achievement, Coleman concluded that the socio-economic background of the student, the social composition of the student body, and the characteristics of the surrounding community are the key factors explaining differences in academic achievement among students.

However, in the Coleman Report, class size was not distinctly analyzed as a potential contributing factor; instead class size was combined with other factors such as textbook and library availability under the overall umbrella factor school facilities/resources. Again, it must be emphasized that, in the Coleman Report, class size was defined by dividing the student enrolment by the number of school employees within a building, a potential source of error causing a poor estimate of the true relationship between the class size and academic achievement. Much like in other econometric studies carried out since (Hanushek, 1998; Rivkin, Hanushek, & Kain, 2005; Wossmann & West, 2006), teacher salaries and other input variables used as a substitute for actual class size may mask confounding variables.

Rather than focusing on absolute achievement in a static fashion, it would be of greater interest to determine: (1) the marginal gains obtained in small classes over time through time series analysis; and (2) whether students with different characteristics respond to treatment in the same fashion (Ehrenberg, Brewer, Gamoran, & Willms, 2001b). Perhaps the most compelling objection to the conclusions of the Coleman Report stems from its analysis of education at a single point in time. Nevertheless, the same report brought to light other possible confounding factors in the relationship between class size and student achievement, such as the value of the resources allotted to the schools, the characteristics of instruction (including teacher and class size), the characteristics of the school (such as culture), and the characteristics of the community.

This debate over the effectiveness of smaller classes illustrates the divergent and sometimes contradictory interests of government officials and students' families when attempting to answer the question of the economic value of education and the cost-benefit of smaller class sizes (Mitchell & Mitchell, 2003).

Research Syntheses

In an effort to develop a first comprehensive meta-analysis on the relationship between class size and student achievement, Glass and Smith (1979) retrieved empirical class size studies and dissertations published since the turn of the 20th century, finding over 300 experimental and quasi-experimental studies containing usable quantitative data. Focusing on 77 experimental studies describing 725 paired comparisons of class sizes broadly categorized in four types (less than 16 students, 17 to 23 students, 24 to 34 students, and over 35 students), Glass and Smith looked at the achievement test results of nearly 900,000 students over a 70-year span in a dozen countries.

Glass and Smith (1978, 1979) first approximated the relationship between class size and achievement by using the measure $\Delta_{S-L}$, based on standardized achievement mean differences between pairs of smaller (S) and larger (L) classes divided by the within-group standard deviation. Next, rather than creating a matrix with rows and columns representing the class sizes and the intersecting cells holding the values of $\Delta_{S-L}$, Glass and Smith used the regression model $\Delta_{S-L} = \beta_0 + \beta_1 S + \beta_2 S^2 + \beta_3 (L - S) + \varepsilon$ to aggregate the findings. Since interpreting the model in terms of class size and achievement involves three or more dimensions, Glass and Smith imposed a consistency condition on all $\Delta_{S-L}$ values to derive a single curve from the complex regression surface. Arbitrarily assigning a mean achievement z-score of 0 to the class size of 30, the final interpretation of the model was represented by a single regression curve of achievement on class size.

When compared to larger classes of 40 students, smaller classes of 30, 20, 10, and 1 student showed standardized differential achievement effects of -.05, .05, .26, and .57, respectively. Likewise, when compared to larger classes of 25 students, smaller classes of 20, 15, 10, 5, and 1 student showed standardized differential achievement effects of .04, .13, .26, .41, and .55, respectively. These results included achievement in mathematics, language arts, and science. Half of these regression analyses involved quasi-experimental or convenience assignment of students to either large or small groups. Translating these z-scores into percentile ranks, the gains in the 25 versus 20, 15, 10, 5, and 1 comparisons are 4, 5, 10, 16, and 21 percentile ranks, respectively.
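The translation from a standardized effect to a percentile-rank gain can be sketched as follows. This is a minimal illustration that assumes normally distributed achievement scores and a comparison student at the 50th percentile; the function name `percentile_gain` is hypothetical and the source does not state how its percentile figures were derived. Under these assumptions the larger effects land close to the quoted gains, while an effect of .04 maps to roughly 2 percentile ranks, so the first quoted figure may rest on a different rounding or method.

```python
from statistics import NormalDist


def percentile_gain(effect_size: float) -> float:
    """Percentile-rank gain over the 50th percentile for a given z-score effect.

    Assumes achievement is normally distributed (an assumption of this sketch).
    """
    return (NormalDist().cdf(effect_size) - 0.5) * 100


# Effects quoted in the text for classes of 20, 15, 10, 5 and 1 versus 25.
quoted_effects = {20: 0.04, 15: 0.13, 10: 0.26, 5: 0.41, 1: 0.55}

for class_size, effect in quoted_effects.items():
    gain = percentile_gain(effect)
    print(f"class of {class_size:>2}: effect {effect:.2f} -> about {gain:.1f} percentile ranks")
```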

Of the initial 725 paired comparisons of student achievement in smaller and larger groups, 435 (60%) favored smaller class configurations by showing an increase in academic achievement. Yet this increase was not quantified. Achievement was defined as combined standardized student results in one or more subjects. When focusing on 160 pairs of classes of approximately 18 and 28 students, the meta-analysis suggested even more distinct differences in achievement: in 111 instances (69%), smaller classes demonstrated a higher level of academic achievement than the larger classes. Again, this result was not quantified. Regression analyses based on logarithmic models favored smaller classes by nearly one tenth of a standard deviation for the complete set of comparisons.

It is important to note that only 109 of the 725 initial comparisons involved random experimental designs, in a total of 14 studies, 81% of which found that smaller class sizes led to increased academic achievement as measured by standardized tests or other measures, such as the number of promotions to the next grade level. Other types of class assignment reported in the 725 comparisons included: (1) matched: 236 comparisons; (2) repeated measures: 18 comparisons; and (3) uncontrolled: 362 comparisons. The last type of methodology involved quasi-experiments, which ultimately weaken any conclusive discussion of the relationship between class size and academic achievement.

Possibly for this reason, Glass (1982) further analyzed the results of the 14 random experimental studies. Distinguishing achievement gains for fewer and more than 100 hours of instruction time, the analysis indicated that an average student taught in a class of 20 students would reach a level of achievement higher than that of 60% of students taught in a class of 40 students. At the extreme point of comparison, a student instructed in a class of five students would outperform a student in a class of 40 students by 30 percentile ranks. This study suggested that students in smaller classes achieve at a higher level. Yet, even in the case of experimental comparisons, effect sizes are limited unless the size of the small class drops below 20 students. Glass and Smith argue in favor of smaller class size.

Two important issues seem to weaken the argument that smaller classes are more effective than larger ones. Firstly, the 109 comparisons were actually aggregated by the authors into about 30 comparisons. In many instances, the same larger and smaller groups and their performances had been evaluated on the basis of different conditions, such as amount of instruction or subject areas. In other cases, the subject areas measured were combined. Secondly, the results reported reflect the performance of disparate sizes, such as a class of 1 student vs. a class of 30 students, or a class of 5 students vs. a class of 30 students. The Educational Research Service (1980) claims that the Glass and Smith meta-analysis overemphasizes the performance of extremely small instructional settings of one to five students. Hedges and Stock (1983) reanalyzed the Glass meta-analysis and validated the finding that class sizes below 20 students are more conducive to academic achievement. Subsequently, the initial analysis by Glass (1979) was further expanded (Glass, et al., 1982) to include the implications for educational policy decisions.

Although the literature tends to describe class sizes below nine students as tutoring settings, a context beyond the scope of the present study, it is worth mentioning the meta-analysis carried out on class sizes of nine students or fewer (Cohen, Kulik, & Kulik, 1982). At the heart of the controversy lie the very concept of practical significance and the pragmatic implications of systemic changes towards lowering class sizes. Smaller class sizes seem to be effective; however, larger effects are observed in classes of fewer than 20 students. In their meta-analysis of tutoring classes of nine students or fewer, Cohen, et al. (1982) measured effect sizes based on 65 studies. Their findings confirmed Glass's finding of greater effect sizes (differences between the means of the experimental and control groups divided by the standard deviation of the control group) in favor of smaller class sizes. Interestingly, groups tutored by peers achieved a greater gain than those taught by regular teachers. This again hints at the need to further identify context variables. Clearly, class size alone does not cause greater academic achievement.

Both Glass studies confirmed the opinion, widespread in educational circles, that small class sizes were more conducive to student learning. The contribution of this meta-analysis to the research area is three-fold: it established the benefit of class sizes below 20 students; it gave the impetus for statewide experimental class-size reduction; and, finally, it emphasized the role of teaching processes, such as time on task, as underlying reasons for the positive impact of smaller class size on academic achievement.

However, the limited number of experimental analyses retained by Glass, et al. (1982) raised validity concerns: Slavin (1989) contended that, by limiting the meta-analysis to only 14 experimental studies, the conclusions of Glass, et al. lost in external validity and generalizability what they gained in internal validity. Based on the examination of Glass, et al. (1982), it seems that the only sizeable effect was found when comparing 10-student classes with 30-student classes, and the greatest effect of class size on student achievement is, without a doubt, that of one-on-one tutoring. However, the most common application of the concept of smaller class size would compare differences in achievement between groups of 14-20 students and groups of 30 or more students in one class.

Slavin (1989) introduced a best evidence synthesis, combining the elements found in meta-analysis with narrative review. He selected eight random class assignment studies comparing the results of standardized reading and mathematics tests in smaller and larger classes at the elementary level. Studies had to compare larger classes to classes at least 30% smaller with a student/teacher ratio not exceeding 20:1. The selected studies analyzed smaller class size programs of at least one year in duration, with either random assignment to alternative class sizes, or matching preconditions. Effect sizes were based on the difference between the small class achievement mean (experimental group) and the larger class achievement mean (control group) divided by post-test standard deviation of the control group. This is the same definition of effect size introduced by Glass and Smith. On average, these studies compared groups of 27 students to groups of 15 students. Even though these eight studies were well-controlled and documented studies, the median effect size observed was only +.13 (Slavin, 1989, p. 251).
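The effect size defined above can be sketched in a few lines. The test scores below are made up purely for illustration, the function name `effect_size` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption of this sketch, since the source does not specify which estimator the reviewed studies used.

```python
from statistics import mean, stdev


def effect_size(small_class_scores, large_class_scores):
    """Mean difference (small minus large) divided by the control-group SD.

    The larger class is treated as the control group, as in the definition
    given in the text. stdev() is the sample standard deviation (assumption).
    """
    difference = mean(small_class_scores) - mean(large_class_scores)
    return difference / stdev(large_class_scores)


# Hypothetical post-test scores, not taken from any study.
small_class = [52, 55, 49, 58, 51]
large_class = [50, 53, 47, 56, 49]

print(round(effect_size(small_class, large_class), 2))
```

Because the standardizer is the control group's spread alone, the same raw difference yields a smaller effect size when scores in the larger classes are more dispersed.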

Discussions of such small effects, as measured by standardized tests in both mathematics and language arts, seem to point to teachers' instructional delivery remaining the same regardless of class size. The type of interactions between students and teachers, such as explicit direct instruction, had already been identified as an influential factor in the Coleman Report (1966). This observation was echoed by Glass, et al. (1982), who note that class size is only one of the variables affecting effective instruction.

In the wake of a controversy on appropriate use of funding for underachieving schools, the Educational Research Service (ERS) published a report (Porwoll, 1978) on the state of the research on class size citing over 100 studies which suggested small effect sizes, most of which were correlational with some or little control of other variables such as teacher-, student-, and school-related contexts. Although this particular research was inconclusive, a subsequent ERS study carried out one decade later corroborated the findings of Glass and Smith (Robinson & Wittebols, 1986) and also added an important element to their discussions. Although smaller class sizes seem positively associated with an increase in academic achievement, smaller class sizes alone do not result in increased student performance.

Building on Glass's meta-analysis and Slavin's best evidence synthesis, Robinson used the related cluster approach to review K-12 research studies conducted between 1950 and 1985 and involving class sizes greater than five students. Studies were aggregated within clusters representing important factors influencing class size decisions: subject matter, grade levels, student profiles, instructional practices, and student behaviors. "The impact of class size on student achievement varies by grade level, pupil characteristics, subject areas, teaching methods, and other learning interventions" (Robinson, 1990, p. 90). Robinson and Wittebols's meta-analysis unfortunately does not provide any effect sizes but merely classifies the studies as showing significant differences favoring small class sizes, favoring larger class sizes, or bearing no effect on academic achievement. Robinson concludes that the positive effects of smaller class size are consistent in grades K-3, slight in grades 4-8, and imperceptible in grades 9-12. Furthermore, lower-SES students are found to benefit most from smaller class sizes. Again, these conclusions do not include effect sizes. Nevertheless, Robinson's study makes clear that the search for a single optimal class size is a nonsensical question: smaller class sizes benefit students differently, according to their social contexts, personal background, grade level, and academic subject.

The observation that smaller class size alone does not translate into academic achievement ties in with the observations of Coleman (1966) and a later version of Glass's meta-analyses (Glass, et al., 1982), which acknowledges that class size alone does not have a causal effect on student achievement. Given this context, the focus must shift from a direct relationship between class size and academic achievement to the actual mechanisms that link smaller class size to higher academic achievement.

This interpretation of prior research by Robinson announced a new direction that recognized the complexity of the relationship between academic achievement and class size. The need to control potentially confounding variables such as students' past academic performance, already emphasized by Glass, et al. (1982), became central in most post-1980s class size studies, as researchers recognized that studies of academic achievement and class size suffered from poor sampling, methodological flaws, or inadequate design of quasi-experiments (Finn, 2002; Slavin, 1989). Research was called upon to become more sophisticated and to account for several effects on different groups of students (e.g., achievement, ethnicity, English mastery) within different contexts (e.g., school setting, class size, instructional methods). Meanwhile, it is worth pointing out that research on class sizes at the secondary or post-secondary levels has been severely limited to this day.

Critics of the Glass and Smith analysis (1979), such as Slavin (1989), pointed to several shortcomings: some studies selected for the meta-analysis were of short duration (as little as 100 hours of differentiated instruction), compared disproportionate sizes (one-on-one tutoring vs. a 25-student class), or even evaluated subjects of a non-academic nature (such as tennis). Nevertheless, most of its conclusions were later sustained by subsequent research on large-scale class size reduction projects carried out in the same decade (Finn, 1998).

In spite of methodological differences, the research syntheses carried out by Glass (Glass, et al., 1982; Glass & Smith, 1978, 1979), Slavin (1984, 1986, 1989), and Robinson and Wittebols (1986) all conclude that students enrolled in classes of fewer than twenty students perform better. Furthermore, smaller class sizes are associated with a significant increase in academic performance, especially in the primary grades (K-3). Robinson and Wittebols, as well as Glass, et al. (1982), announced a new direction in the research, indicating clearly that reducing class size alone would not cause a direct increase in student achievement unless teachers adopt different classroom procedures and instructional methods. Robinson also pointed to economically disadvantaged students as those most likely to benefit from smaller classes.

The understanding of moderating factors such as teacher qualifications and student background in the relationship between class size and student achievement was further enhanced by a national study conducted by the Policy Information Center (Wenglinsky, 1997). The study originated from a school finance approach, attempting to link the spending of public funds to the overt goal of schooling: academic achievement. It is therefore only incidentally that Wenglinsky came upon the connection between class size and academic achievement. The scale of When Money Matters, not unlike that of the Coleman Report thirty years earlier, covered the nation, but with dramatically different conclusions. Using district-level data from three different databases maintained by the National Center for Education Statistics, Wenglinsky grouped 10,000 fourth-graders in 203 districts and 10,000 eighth-graders in 182 districts according to socio-economic status.

Figure 1. Wenglinsky's Hypothesized Paths to Achievement

The linking of these different databases allowed differentiation between types of spending in a way that would have been impossible at the time the Coleman Report was produced. Indeed, aggregated per-pupil expenditure cannot account for the types of expenditures incurred, some of which are positively linked to academic achievement while others are not. Furthermore, the Coleman Report was unable to consider variation in the cost of education across states. The National Assessment of Educational Progress database (from which the teacher-student ratio was drawn) provided not only academic achievement information for a nationwide student sample, but also valuable information about the characteristics of school climate. The Common Core of Data database gathered financial information at the district level; finally, the Teacher's Cost Index database, also maintained by the U.S. Department of Education, accounted for teacher cost differentials among states. Through a series of multivariate regressions, Wenglinsky concluded that increasing school district administration and instructional expenditures raises teacher-student ratios, which in turn raises fourth-grade academic achievement in mathematics. Expenditures likewise affect the performance of eighth-grade students. There, however, the increased teacher-student ratio is believed to decrease behavioral problems among students and to set a positive tone for the school environment; these two variables are positively linked to an increase in academic achievement at that grade level. Interestingly, spending on facilities, school-level administration, and expenditures to recruit highly educated teachers were not found to be directly associated with academic achievement. Wenglinsky concludes: "Because the [previous] studies did not specify measures of school environment, the effect of school spending on achievement as mediated by environment remains unstudied" (Wenglinsky, 1997, p. 21).

In the middle/junior high grades, academic achievement seems to be mediated by an increase in the social cohesion created by smaller classes. Again, this conclusion points to mediation between class size and academic achievement. Building a 2 by 2 factorial matrix combining districts with above- and below-average socio-economic status (SES) and districts with above- and below-average teacher cost, Wenglinsky concludes that the largest gains in mathematics achievement were obtained in districts with below-average student SES and above-average teacher cost. Study results indicate that higher teacher-student ratios in fourth grade are positively associated with higher achievement in mathematics. In eighth grade, teacher-student ratios are linked to a positive school environment (low teacher and student absenteeism, respect for property, low class-cutting rate, low tardiness rate, teacher control over instruction and course content). A positive school environment, in turn, was positively associated with higher achievement in mathematics.

Large-Scale State Experiments

Project Prime Time

Piloted first in 1981-82 in a limited-size experiment of class size reduction in primary grades K-2 with student-teacher ratios of 14:1, the five-year project initiated by Indiana Governor Lamar Alexander (future Secretary of Education during the George H. W. Bush presidency) started in earnest in 1984-85 with class size reduction to 18:1 in grades K-3. By 2008-09, Project Prime Time was in its 25th year of implementation (Indiana Department of Education, 2010).

An early implementation study (McGiverin, Gilman, & Tillitski, 1989) investigated the performance of second-grade students at the end of two years of reduced class size instruction (19.1:1) and found that they demonstrated greater academic achievement in reading and math, as measured by standardized tests, than their counterparts in large classes averaging 26.4 students. Six randomly selected schools and school corporations (districts) with students who had received treatment were compared to three schools whose students were included in control groups. The scores of 1,940 Prime Time students on standardized tests (Cognitive Ability Test, CAT; Iowa Test of Basic Skills, ITBS) in mathematics and reading in ten studies were compared to the related performance of 2,027 students from larger classes. The Fisher inverse chi-square computation for schools with smaller class sizes with a ratio of 19:1 was significant (χ² = 190.45, df = 40, p < .001), and the studies' mean differences between groups, divided by the two groups' pooled standard deviation, were averaged within a meta-analysis to yield an effect size of .34 SD for all subtests (p. 51). This analysis suggests that Prime Time students enrolled in smaller classes perform better academically. Yet, interestingly, the Indiana Department of Education states on its Prime Time web page (Indiana Department of Education, 2010) that "Lowering class size, alone, will not bring about better teaching and learning." Although the actual principle of class size reduction is not disputed here, quality instruction and student engagement seem to be emphasized.
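The two computations named in this paragraph, the Fisher inverse chi-square method and the effect size based on a pooled standard deviation, can be illustrated with a minimal sketch. The function names and all input values below are hypothetical and are not taken from McGiverin et al. (1989); the sketch only shows the general form of each calculation. Under the textbook version of Fisher's method, the degrees of freedom equal twice the number of combined p-values, so df = 40 would correspond to 20 combined tests, although the study's actual inputs are not reproduced here.

```python
import math


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's inverse chi-square method.

    Returns (statistic, degrees_of_freedom, combined_p_value).
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * k
    # For an even number of degrees of freedom (df = 2k), the upper tail of
    # the chi-square distribution has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, df, math.exp(-half) * total


def standardized_mean_difference(mean_small, mean_large,
                                 sd_small, sd_large,
                                 n_small, n_large):
    """Mean difference between groups divided by the pooled standard deviation."""
    pooled_variance = (
        (n_small - 1) * sd_small ** 2 + (n_large - 1) * sd_large ** 2
    ) / (n_small + n_large - 2)
    return (mean_small - mean_large) / math.sqrt(pooled_variance)


if __name__ == "__main__":
    # Hypothetical p-values from four independent comparisons.
    stat, df, p = fisher_combined([0.04, 0.20, 0.01, 0.08])
    print(f"chi-square = {stat:.2f}, df = {df}, combined p = {p:.4f}")

    # Hypothetical group summaries for one subtest.
    d = standardized_mean_difference(52.0, 50.0, 10.0, 10.0, 100, 100)
    print(f"effect size = {d:.2f} SD")
```

In a meta-analysis of the kind described, one such effect size would be computed per study or subtest and the results averaged; the averaging step is omitted here for brevity.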

Project STAR

From 1985 to 1989, the Student Teacher Achievement Ratio project (STAR), carried out in Tennessee, was the first statewide randomized class size reduction experiment of its kind, involving 76 schools, 1,200 teachers, and 12,000 K-3 students over four years. Students were randomly assigned to a small class (typically 13 to 17 students), a regular class (22 to 26 students), or a regular class with a full-time instructional aide. Class sizes were reduced by one third (seven students) on average (Wossmann & West, 2006). Teacher assignments were also randomized. Data were collected from various sources, including teacher interviews, student performance data, classroom observations, and teacher questionnaires. Students were kept in this configuration from kindergarten for a total of four years, until completion of grade 3. The following year, all students returned to full-size classes. In grades K through 3, the students enrolled in small classes consistently performed better than their regular-class counterparts on standardized tests (Stanford Achievement Test).

Effect sizes were calculated as the mean score for the small class (S) configuration minus the average of the mean scores for the regular class (R) and teacher-aide class (A) configurations, [S - (R + A)/2], expressed in standard deviation units after four years. All students benefited from the smaller classes. Data collected in grades K-3 indicate higher academic achievement in small class configurations, with attainment measures ranging from +0.15 to +0.25 standard deviation as compared to larger class configuration performance. However, effect sizes of academic achievement were typically two to three times larger for minority students than for White students (Finn, 1998; Finn & Achilles, 1999). Follow-up data were collected in subsequent years, from grades 4 to 8, suggesting that achievement gains were maintained after treatment (Finn, Pannozzo, & Achilles, 2003). The design of the study was strengthened by the within-school implementation of the three configurations (S, R, and A), which allowed for better control of potentially confounding variables such as school setting (urban, suburban, rural), the socio-economic status of the students, per-pupil expenditures, and gender of the students. All differences favored the small class configuration, which outperformed the other two. Gender and school setting were not found to interact significantly with class size in their effect on academic achievement.
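The contrast [S - (R + A)/2] can be written out as a short sketch. The function name, the example scores, and the choice of standard deviation used as the divisor are assumptions made for illustration only; the text above states merely that the contrast is expressed in standard deviation units, and none of the numbers below are STAR data.

```python
def star_effect_size(mean_small, mean_regular, mean_aide, sd):
    """Contrast [S - (R + A)/2] expressed in standard deviation units.

    `sd` is whichever standard deviation the analyst standardizes by
    (an assumption here, since the source does not specify it).
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    comparison_mean = (mean_regular + mean_aide) / 2.0
    return (mean_small - comparison_mean) / sd


if __name__ == "__main__":
    # Hypothetical scale scores for the three class configurations.
    effect = star_effect_size(mean_small=520.0, mean_regular=510.0,
                              mean_aide=512.0, sd=45.0)
    print(f"effect size = {effect:+.2f} SD")
```

Averaging R and A treats the two larger-class configurations as a single comparison condition, which is what the bracketed formula implies.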

In contrast, non-experimental researchers using education production models (also known as econometric studies) noted that student attrition, cross-contamination of control and experimental groups (occurring when parents press the school administration for their child to be moved from larger to smaller class configurations), non-random assignment of teachers (administrator selection), and a possible Hawthorne effect potentially undermined the experimental sturdiness of STAR (Hanushek, 1999; Krueger, 1999). Isolating cohorts of students who remained in the program for four years (48% of the kindergartners initially enrolled), Hanushek calculated the performance gap between control and experimental groups to be much smaller. For instance, while third-grade students in small classes in the yearly samples performed 0.22 z-score units above the control group, the gap between experimental and control cohorts after four years was only 0.14. Similarly, in mathematics, the gap for the same grade decreased from 0.18 SD in the yearly samples to 0.10 SD in the four-year cohort. The treatment effect was mitigated by student mobility and possibly by student SES, since students with lower SES demonstrated higher mobility. Does this mean that class size should not be considered? Probably not: the evidence indicates that class size reduction affects students differently (Finn & Achilles, 1999). Replying to Hanushek's claims of added value and limited persisting effects, researchers (Finn & Achilles, 1999; Nye, Hedges, & Konstantopoulos, 2004) pointed out that public policies should target urban schools with larger populations of students in poverty. In conclusion, most of the evidence in favor of class size reduction lies in the fact that smaller classes benefit students differently according to their circumstances.

Based on this evidence, and despite the fact that education is not within its competence, the federal government (United States. Congress. Senate. Committee on Health Education Labor and Pensions, 1999) actively promoted class size reduction, citing STAR as a prima facie case in favor of expanding the small class size concept across the nation.

Until the end of the millennium, the class size debate sharply divided proponents and opponents of smaller class sizes as local governments were considering additional expenditures with the aim of reducing the inequalities that Coleman first reported as strongly associated with socio-economic status and race (1966). The interest in class size reduction as a tool to improve academic achievement culminated in 1998, when the U.S. Department of Education and the Office of Educational Research and Improvement commissioned a study published by Finn (1998). This report purported to be an overview of the previous two decades (late 1970s to late 1990s) of research on class size reduction, with the goal of providing evidence to guide and prioritize national educational policies and to clarify questions related to academic effects, cost-benefit analysis of small class sizes, and implications for practice and student behavior. Finn based his argument only on robust large-scale experimental designs, such as STAR.

Project SAGE

At about the same time, Wisconsin's Student Achievement Guarantee (SAGE) was launched as a five-year intervention targeting low-SES students in primary grades K-3. Initiated in the 1996-97 school year, the program design included four components: (1) class size reduction to meet a teacher-student ratio of 1 to 15 (including arrangements such as two teachers for 30 students); (2) an extended school day; (3) implementation of rigorous curricula; and (4) staff development combined with a system of professional accountability. Thirty schools from 21 school districts meeting the criterion of 50 percent low-SES students (based on free school lunch participation) began the program. Kindergarten and first grade were targeted the first year, and grades two and three were added in subsequent years. Fourteen schools with normal class sizes (typically 22 to 24 students) in 7 districts participating in SAGE were deemed comparable based on family income, achievement in reading, racial makeup, and K-3 enrollment. These provided control data in this quasi-experiment. The intent of the researchers was to keep classroom cohorts intact across the five years of the program. This setup would have made it possible to confirm the finding that students of lower socioeconomic status benefit most from reduced class sizes as compared to other students. However, after the first year of implementation, pressure from parents led to contamination of the results within the experimental subgroup, which showed no greater gains for students with lower SES (Mosteller, 1995). Anecdotal records by experimental group teachers suggest that students demonstrated fewer instances of disruptive behavior, an increased desire to participate, and a more appreciative attitude towards others (Mosteller, 1995). Teachers further indicated that potential discipline problems could be handled in a timely manner, and that academic learning time, including reteaching and instructional differentiation, could be blended into their lesson delivery.

California Class Size Reduction (CSR)

In 1996, following the successes of Project STAR and SAGE, the California legislature provided schools with over one billion dollars to reduce class size. Unlike these programs, CSR in California was not experimental and affected a staggering 1.6 million students at a projected cost of $1.5 billion per year (Bohrnstedt & Stecher, 1999), effectively reducing average student-to-teacher ratios in K-3 classrooms from 28.6 students to no more than 20 students per teacher. By the 1998-99 school year, 98.5 percent of all eligible Local Education Authorities (LEAs) had embraced this voluntary program, servicing 92 percent of K-3 students enrolled in California schools (Bohrnstedt, Stecher, & CSR Research Consortium, 1999). Some districts, such as Modesto Elementary (18,000 ADA) and other small LEAs, chose not to participate as their class sizes were already hovering around 25 students (Illig, 1997).

At the end of its first year of implementation, some 18,400 additional teachers had been hired, a figure that would increase a year later to 23,500 (Bohrnstedt & Stecher, 1999). The following year, school year 1997-98, the Governor's Budget suggested expanding CSR to fourth grade. The State Legislative Analyst's Office (Schwartz & Warren, 1997) recommended against the initiative, citing two obstacles impeding current and even future efforts of school reform through CSR in California: a shortage of qualified teachers and a lack of suitable facilities.

The rapid implementation across three levels, from kindergarten to third grade, departed from the models followed in Tennessee (STAR) and Wisconsin (SAGE) in that California CSR was introduced in three grade levels in the very first year of class size reduction, a move that is widely regarded as counterproductive (Achilles et al., 2002). Although the initial per-pupil funding of $600 was later raised to about $800, the CSR program was severely underfunded from the start as compared to the $2,000 per pupil in additional funding of Project SAGE (Biddle & Berliner, 2002). California CSR also presented considerable challenges as compared to STAR. First, whereas Tennessee's classes had been reduced from 22-26 students down to 13-17, California's overcrowded classrooms in the same primary grades averaged 33 students prior to CSR. Those students were also much more diverse than their Tennessee counterparts. Furthermore, unlike California, Tennessee had space to accommodate class downsizing (Bohrnstedt et al., 1999).

For these reasons, CSR in California had unintended effects upon the poor and non-English speakers, the very students it had set out to help. Overcrowded urban schools catering to lower-SES students experienced the greatest difficulty in attracting qualified teachers and providing adequate facilities (Stecher, Bohrnstedt, Kirst, McRobbie, & Williams, 2001). Case in point: the California Legislative Analyst's Office reported in the first year of CSR implementation that over 90 percent of teachers in more affluent districts were credential holders, versus about 75 percent in urban, low-SES districts (Schwartz & Warren, 1997). As a result, schools servicing students with minority and low-SES profiles were perhaps the last ones to benefit from full implementation.

Sources of Bias in Empirical, Experimental, and Quasi-Experimental Studies

The number one issue is that the characteristics of students across experimental and control groups vary non-randomly. Namely, assignment to smaller classes may be linked to a specific assignment level. Second, pre- and post-treatment covariates are a necessity.

Contextual Factors Impacting Student Achievement