Author note: The study reflected in the attached manuscript was part of a larger research project funded by an American Psychological Association Early Career Grant awarded to the first author. A separate published manuscript presents the analysis of other teacher behaviors as well as other domains not included in the present manuscript. As such, the procedures and methodology (as well as the sample) are the same; however, the substantive interpretations reflected in the attached manuscript deviate markedly from the other manuscript.
This study investigated the degree to which the quality of teachers' language modeling contributed to reading achievement for 995 students, both English Language Learners and native English speakers, across developmental bilingual, dual language, and monolingual English classrooms. Covariates included prior reading achievement, gender, eligibility for free lunch, and ethnicity. A 2-level hierarchical linear modeling analysis revealed that (a) prior achievement, Latino ethnicity, and eligibility for free lunch contributed significantly to the model, but gender did not, (b) reading achievement for ELLs was not significantly different from that of native English speaking students, (c) students gained 3 points for each unit increase in the quality of language modeling across classrooms, and (d) cross-level interactions revealed that the slope relating the quality of language modeling to reading achievement was stronger for students in monolingual English classrooms and developmental bilingual classrooms than for students in dual language classrooms. The authors discuss classroom implications of bilingualism and language modeling in improving reading outcomes.
Language Modeling and Reading Achievement: Variations across Different Types of Language Instruction Program Settings
Students who are English Language Learners (ELLs) comprise a substantial portion of the school-age population. One in five students speaks a language other than English at home (U.S. Census Bureau, 2010), with the primary concentrations of these students in the elementary grades (Kindler, 2002). In spite of over three decades of guidelines mandating that schools meet the particular needs of ELLs (Bilingual Education Act, 1974), ELLs are systematically underserved in schools (Gandara & Rumberger, 2009).
ELLs remain marginalized across several dimensions. For instance, they tend to be disproportionately clustered in high-poverty schools (U.S. Department of Education, 2010). In high-poverty schools, one in four students at the elementary level is identified as having limited proficiency in English. By contrast, the corresponding figures in low-poverty schools are one in twenty-five students at the elementary level. Moreover, schools with high concentrations of ELLs tend to have more teachers with provisional, emergency, or temporary certification, and these teachers receive less in-service training and support than their counterparts in schools with lower concentrations of ELLs (Cosentino de Cohen, Deterding, & Chu Clewell, 2005). On most learning outcome measures, ELLs lag behind their monolingual English-speaking peers (National Center for Education Statistics, 2010). For instance, on the National Assessment of Educational Progress (NAEP) in 2009, only 6% of fourth grade ELLs scored at proficient or advanced in reading (NCES, 2010). Considering that ELLs also have the lowest graduation rates across student subgroups (NCELA, 2008), addressing ways of improving their achievement in earlier grades is particularly salient.
Effectively educating students who are ELLs is a pressing concern. Much of the research examining achievement disparities among ELLs focuses on the relative effects of various language instruction programs on student achievement. As such, we know a great deal about the outcomes of ELLs in various language acquisition models (e.g., Salazar, 1998). We also know a great deal about how the quality and quantity of language exposure affect children's language development (Hart & Risley, 1995; Hoff, 2003), but know much less about the degree to which reading outcomes for students across different language instruction programs are related to teachers' language modeling. The research presented here addresses this gap in the literature by considering the degree to which the quality of teachers' language modeling influences reading achievement for ELLs across dual language and developmental bilingual education (DBE) classrooms, as well as native English speaking students in monolingual English classrooms.
Language Instruction Programs
A central educational challenge facing schools in their efforts to educate ELLs involves the dual responsibilities of teaching content knowledge while simultaneously increasing English proficiency. Schools approach this task by employing different language instruction programs with fundamentally disparate theoretical frameworks and goals (Ovando, 2003). Scholars have described the different kinds of language instruction programs at length, most often differentiating among those typically used in schools such as English immersion, English as a second language (ESL), and transitional bilingual programs (e.g., Ramirez, Yuen, Ramey, & Pasta, 1991). As such, we do not describe the various kinds of programs but focus on the two reflected in the present study: developmental bilingual education (DBE) and dual language.
Both DBE and dual language programs are based on extensive research indicating that strong oral and literacy skills in the first language provide a solid basis for language and literacy development in the second language (Montague, 1997; Mora, Wink, & Wink, 2001; Thomas & Collier, 1997). These programs are also based on evidence that native language instruction can promote higher academic achievement when compared to instructing children in the second language only (Montague, 1997; Mora, Wink, & Wink, 2001; Thomas & Collier, 1997).
In both DBE and dual language classrooms, instruction follows one of two models. In the first, referred to as 90:10, 90% of instruction is in the non-English language during the early elementary grades and English is incrementally introduced until a balance in the two languages is reached by the middle elementary grades. In the second, referred to as a 50:50 model, instruction is delivered in the two languages, equally. Unlike other bilingual programs that transition ELLs into all-English classrooms (see Ramirez, Yuen, Ramey, & Pasta, 1991), DBE and dual language emphasize the maintenance of bilingualism.
There is much variation across DBE and dual language programs. In those using a 90:10 approach, there is usually one classroom teacher who is fluent in both languages. Although this might also be the case in 50:50 models, sometimes there are two teachers involved: one who delivers instruction in English and another who delivers instruction in the other language. In some cases, students receive instruction in the non-English language for one week and receive instruction in English the following week. In other variations of the programs, students may alternate between English and the non-English language on a daily basis (e.g., Spanish on Monday and English on Tuesday). In yet other models, students alternate between two languages on the same day. Whatever the pattern, it is aligned with the curriculum and maintained across the school year. Teachers do not repeat concepts each week (or day) in a different language, but continue from the point in which instruction left off.
Although DBE and dual language programs share many similarities, there is a key difference between the two programs. Whereas DBE, also referred to as one-way bilingual immersion, typically consists of ELLs who share a non-English primary language, dual language, also referred to as two-way bilingual immersion, ideally comprises equal numbers of students who speak English as their native language and ELLs who share another language. Some assert that because the classroom comprises ELLs and non-ELLs, dual language programs have two marked advantages. The first is that these classrooms provide a context wherein learning occurs through social interaction, promoting authentic, meaningful interactions in both social and academic contexts across native English and native Spanish speakers (Wong-Fillmore, 1991; Genesee, Lindholm-Leary, Saunders, & Christian, 2006). The second is that dual language classrooms have the potential to promote cross-cultural understanding among students (Calderon & Minaya-Rowe, 2003; Howard, Sugarman, & Christian, 2003; Lindholm-Leary, 2001). Some scholars, however, have cautioned that although these settings can certainly be beneficial, they can also have deleterious effects on ELLs when equitable treatment of the two languages or cross-cultural understanding is limited (see Amrein & Peña, 2000; Valdés, 1997).
Academic Outcomes in Different Language Instruction Programs
Over several decades, researchers have debated whether programs that use ELLs' native language in instruction are effective in improving achievement outcomes in English (e.g., Baker & de Kanter, 1981; Brisk, 2006; Greene, 1996; Legarreta, 1979; Rolstad, Mahoney, & Glass, 2005; Rossell & Baker, 1996; Sirin, 2005; Slavin & Cheung, 2003; Willig, 1985). Those who espouse English only have argued that exposure to non-English during instruction encroaches on English exposure, and as a result, affects the degree to which ELLs' English proficiency can improve (e.g., Baker & de Kanter, 1981). Despite these assertions, research syntheses have consistently demonstrated that language programs that use ELLs' native language not only do not hinder English acquisition, but they also seem to promote stronger academic outcomes (e.g., Salazar, 1998; Slavin & Cheung, 2005).
There is evidence that in effective DBE programs, children can outperform ELLs enrolled in other types of bilingual programs on standardized tests, both in their native language and in English (Tong, Irby, Lara-Alecio, & Mathes, 2008). Notably, researchers have found that ELLs in effective DBE programs outperform peers who receive only English instruction, based on scores from standardized tests in English (Thomas & Collier, 1997). Dual language programs have shown benefits for students as well, outweighing the effects of DBE in some cases (Thomas & Collier, 1997). In a re-analysis of Thomas and Collier's (1997) longitudinal study, Salazar (1998) computed the effect sizes (ES) between the various language instruction programs that use ELLs' native language and English only across grades 1 through 11. Compared to students in English only programs, students in DBE programs had Normal Curve Equivalent (NCE) scores that were slightly higher across grades (average ES = .38). Students in dual language programs, however, had NCE scores that were ½ SD higher across grades (average ES = .53). Salazar (1998) also found that although the differences between students in the English only group and the various programs that used ELLs' native language were negligible across the early elementary school years (ES = -.14 to .29), the differences became more pronounced by sixth grade, with DBE (ES = .28 to .85) and dual language (ES = .47 to 1.28) programs reflecting the highest gains.
Other Benefits of Bilingual Approaches
The evidence that bilingual approaches promote English proficiency among ELLs to a greater extent than English-only instruction does suggests that there are cognitive benefits to bilingualism. Indeed, researchers have long recognized that there are cognitive benefits associated with bilingualism (e.g., Peal & Lambert, 1962). In a recent meta-analysis of studies examining the relationship of various cognitive skills and bilingualism, Adesope, Lavin, Thompson, and Ungerleider (2010) examined 63 studies with an overall sample size of over 6000 participants. They found that bilingualism was associated with slightly higher (g = .26 to .32) problem solving skills, metacognitive awareness, and metalinguistic awareness. Bilingualism was also associated with moderately higher (g = .48 to .57) abstract and symbolic representation and working memory, and substantially higher (g = .97) attentional control. Across grades, the researchers found that bilingualism was related to moderate gains in preschool (g = .64), elementary grades (g = .09 to .26), and secondary grades (g = .63).
Another important strand of literature that tends to support bilingual approaches is the examination of sociocultural dimensions to learning. A sociocultural perspective on language and learning emphasizes that learning takes place in the context of social interactions, not as an isolated, individual event (Hawkins, 2004; Moll, 1990). This perspective provides an important frame of reference for comparing the language instruction programs examined in this study. Research shows sociocultural factors can positively influence ELLs' learning by bridging home-school communications and engaging parents to support literacy development (Goldenberg, Rueda, & August, 2008; Gonzalez, Moll, & Amanti, 2005). In addition, the degree to which languages are valued within classrooms and in schools (as well as beyond the schoolhouse doors) influences student attitudes (Goldenberg, Rueda, & August, 2008; Gonzalez, Moll, & Amanti, 2005).
Researchers have also found differences in monolingual and bilingual children's cognitive development. Mechelli and colleagues (2004) have shown that early exposure to a second language substantially alters the region of the brain associated with language development. They examined the brains of 88 native English speakers to examine the differences in the region of the brain associated with language processing between bilingual and monolingual speakers. Those with little or no exposure to other languages were considered monolingual (n = 25). Early-bilinguals (n = 25) had acquired a second language before age 5 and had practiced it regularly, and late-bilinguals (n = 33) had learned a second language between the ages of 10 and 15, and had also practiced it regularly. Mechelli and colleagues (2004) found that the density of the grey matter in the area of the brain associated with language in bilinguals was considerably higher (z = 7.1) than in monolinguals. This effect was also substantially higher (z = 3.5) among early-bilinguals when compared to late-bilinguals. The researchers also found a strong, negative association (r = -.86) between overall language proficiency, which was measured by standardized assessments of reading, writing, speech comprehension, and production, and the age of acquisition of the second language.
Research points convincingly toward the benefits of bilingual approaches, academic and otherwise. A central feature, however, is that this research tends to focus attention solely toward programmatic descriptors that detail the degree to which English and students' native language is used in instruction. To date, research has largely ignored the degree to which teaching practices may moderate the relationship between language models and student achievement (for a noteworthy exception, see Cirino, Pollard-Durodola, Foorman, Carlson, & Francis, 2005).
Although the quality and quantity of language exposure are strong predictors of children's language development (Hart & Risley, 1995; Hoff, 2003), teachers can have significant effects on academic outcomes in the absence of rich language and literacy stimulation at home. Namely, researchers have shown that high quality language instruction and modeling effectively promote children's language development (Aldridge, 2005). Indeed, during preschool, the most significant predictor of children's language and cognitive development is the quality of teacher-child interaction (Early Child Care Research Network, 2000, 2002). Included among the numerous language facilitation strategies associated with enhanced language development are the modeling of varied vocabulary and advanced linguistic structures, strategic repetition, effective use of open-ended questions, expansions and recasts of children's utterances, extended talk on a specific topic, and cognitively rich topics of conversation (Baker & Nelson, 1984; Dickinson, Darrow, & Tinubu, 2008; Girolametto & Weitzman, 2002; Senechal, 1997; Vasilyeva, Huttenlocher, & Waterfall, 2006; Wasik & Bond, 2001; Wasik, Bond, & Hindman, 2006).
Researchers have also found that dialogic reading, an interactive book reading approach wherein teachers model rich language and promote active child participation, enhances the vocabulary skills, frequency and complexity of responses, and other literacy skills of children from low-income backgrounds (Lonigan & Whitehurst, 1998; Morrow, 1988; Wasik & Bond, 2001; Whitehurst et al., 1994). Valdez-Menchaca and Whitehurst (1992), for example, examined the effect of dialogic reading strategies on the language development of Spanish-speaking toddlers attending day care in Mexico. The investigators randomly assigned the children either to a treatment group, in which they interacted with a graduate student during short individual dialogic reading sessions each day for one month, or to a control group, in which they participated in regular conversations with the graduate student. At post-intervention, the children in the treatment group displayed significantly higher vocabulary skills and more complex syntax than their peers in the control group. Jiménez, Filippini, and Gerber (2006) followed by examining parent and caregiver use of dialogic reading with older children, aged seven and eight, the majority of whom spoke Spanish. The parents and caregivers received three training sessions in dialogic reading strategies and shared twelve books using these techniques with their children over approximately eight weeks. Although there was no control group, results indicated that children produced significantly more words and word types, and took more and longer turns at post-intervention than at pre-intervention. Overall, these studies support the positive effects of dialogic reading techniques on Spanish-speaking children's language productivity and complexity.
According to Tharp and Gallimore (1988, 1989, 1991), Instructional Conversation (IC) is a pedagogy that refers to dialogue in which teachers use conversation to support students' learning within their zone of proximal development (ZPD; Vygotsky, 1978), that is, just beyond their current level. As opposed to recitation and question-and-answer techniques, teachers who use IC stimulate higher-order thinking by promoting extended and goal-oriented conversation that incorporates questions and sharing of knowledge and ideas. Teachers listen to students, tailor the dialogue to meet their comprehension level, and facilitate their active co-participation. Tharp and Gallimore also proposed that this methodology, especially when used in smaller groups and student-directed activities, facilitates the teacher's awareness of students' grasp of the material. This in turn gives the teacher the information necessary to adjust the kind of support provided during the learning moment.
IC incorporates students' individual, community, and family knowledge. As such, it is engaging and culturally relevant to students and is a highly recommended methodology for culturally and linguistically diverse learners (Au, 1979; Dalton & Sison, 1995; Echevarria & McDonough, 1995; Goldenberg, 1991; Tharp, 1997; Tharp & Gallimore, 1998). Dalton and Sison (1995), for example, examined the use of IC to enhance the language skills related to geometry in middle-school Latino students who were reportedly resistant and excluded. Results from their study indicated that with merely four sessions, teachers' use of IC resulted in significantly more frequent, accurate, and confident student responses. Despite the numerous reports supporting IC, however, U.S. schools appear to considerably underutilize this methodology (Tharp, 1997).
Despite the numerous studies that have contributed to our understanding about the role of teachers' language modeling on student outcomes, the majority of research has been conducted with preschool-aged children. What remains largely unexamined is the degree to which language modeling promotes literacy-related achievement outcomes among students in upper elementary grades. Moreover, considering the evidence that different bilingual settings have varying effects on students' achievement, it is important to examine the degree to which language modeling plays a role.
In this study, we move beyond achievement scores alone by examining reading achievement scores in the context of the quality of teachers' language modeling, while controlling for prior achievement and demographic variables. Specifically, our research examines the following questions: (1) To what extent is the quality of language modeling associated with reading achievement among students in different language instruction programs when compared to their non-ELL peers? (2) Does the language instruction program moderate the association between language modeling and reading achievement? By examining how language instruction programs moderate the effects of teachers' language modeling, findings from the present study contribute to our understanding of the ways language instruction programs play a role in improving academic outcomes for ELLs.
Participants included 995 students and their teachers (N = 46) in grades 3, 4, and 5 across 13 schools in an urban school district in the Midwest. The district includes over 100 elementary schools with a majority of the student population (67%) considered "low income." Almost 60% of the students are African American, 20% are Latino, 13% are White, 5% are Asian, and the remaining students are "other." The district has an overrepresentation of students in special education (16%), and has a relatively low graduation rate (64%). Although only 5% of the students in the district are ELLs, schools with relatively high numbers of ELLs were identified for recruitment given the focus on ELLs in the study. Of the total participating sample, 86% qualified for free lunch and 62% of the students were ELLs. We present disaggregated descriptive statistics of the participating students in each of the different language instruction programs in Table 1.
The participating school district uses a Home Language Survey that is required upon enrollment to identify students who may be ELLs. School personnel assess students that report a home language other than English on the survey to determine their English language proficiency in reading, writing, speaking, and listening. School personnel assess students identified as ELLs annually to monitor English acquisition and satisfy legal requirements for assessment and accountability. To address the needs of linguistically diverse students, the district offers two types of bilingual language instruction programs already discussed: DBE and dual language programs.
The DBE program in the participating schools follows the 90:10 model wherein teachers use Spanish instruction 90% of the time and English 10% of the time in kindergarten. Teachers continue to provide most instruction in Spanish, but introduce English incrementally each year until fourth grade when both languages are balanced in instruction. Teachers subsequently maintain the balance of English and Spanish through grade 12. In these classrooms, students are sequential bilinguals. In the present study, there are 18 DBE classrooms that comprise ELLs whose native language is Spanish.
The dual language program in the participating schools follows the 50:50 model. Beginning in kindergarten and continuing over subsequent grades, students in dual language classrooms receive instruction in Spanish for one week, and receive instruction in English the following week. This pattern continues across the school year, and is maintained through grade 12. In these classrooms, students are simultaneous bilinguals. Dual language programs are represented by 14 classrooms and comprise both language minority and language majority students (50% Spanish and English, respectively), although the majority of students are Latino (see Table 1).
The remaining 14 classrooms in the present study provided instruction in English only, and comprised mostly non-ELLs. Of the n = 18 ELLs in monolingual English classrooms, 56% are Latino, 11% are African American, 22% are Asian, and White students and students classified as "other" each comprise 6%.
Level 1 Measures
Achievement measures: Discovery reading assessments. The participating school district uses standardized formative reading assessments in the language of instruction aligned with the state's academic standards. Students take the assessment four times throughout the school year between September and May. Rasch-based vertical scaling (using common items across administrations within a grade level), horizontal scaling (using common items across grades that are centered on grade 6, with a spacing factor of one logit for each grade level), and test equating procedures were applied to the assessments. These procedures are based on Thurstone assumptions (Kolen, 2006), which allow interpretations of linearity across grade levels. For all grades, each reading assessment includes a total of 32 questions, with 8 items for each of the following skill objectives: (1) determine the meaning of words and phrases in context; (2) understand text; (3) analyze text; and (4) evaluate and extend text. For the purposes of the present study, reading assessments administered in the beginning of the school year were used as a covariate to control for achievement at the beginning of the year, and reading assessments administered in late spring were used as outcome measures. By controlling for achievement at the beginning of the year, results more accurately reflect the influence of classroom dynamics on achievement.
An important consideration in the present study was ensuring sufficient power to detect interactions between levels (Hox, 2002; Hox & Maas, 2001), which limited our ability to examine grade-level effects. To examine achievement across upper elementary grades, students' individual scores were converted to a Normal Curve Equivalent (NCE) with a mean of 50 and standard deviation of 21.06 based on the national norming sample distribution of the assessment. That is, students' scores were converted to NCEs based on the distribution of scores relative to each grade's national norming population (as opposed to the local grade level norms). By converting all scores to NCEs using the grade appropriate norming sample, baseline reading achievement and outcome scores could be aggregated across grade levels without violating the necessary assumptions in interpreting student scores relative to the respective norming samples. To illustrate, an NCE of 50 for a student in 3rd grade would reflect performance at the 50th percentile relative to the national sample of 3rd grade students who took the exam, whereas an NCE of 50 for a student in 4th grade would reflect performance at the 50th percentile relative to the national sample of 4th grade students who took the exam. In the aggregated data set, any student's score would still reflect performance according to the appropriate norming sample while borrowing power from adjacent distributions (i.e., increased sample size). We present disaggregated descriptive statistics for the achievement measure in Table 2.
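The NCE conversion described above can be illustrated with a minimal sketch. It assumes that a student's percentile rank within the grade-appropriate national norming sample is already available from the publisher's norm tables; the function name and constants are hypothetical labels introduced for illustration and do not reproduce the assessment vendor's scoring procedure. The sketch uses the standard definition of the NCE as a normalized score with a mean of 50 and a standard deviation of 21.06.

```python
from statistics import NormalDist

# Standard NCE scale parameters (mean 50, SD 21.06), as stated in the text.
NCE_MEAN = 50.0
NCE_SD = 21.06


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE.

    The percentile rank is assumed to come from the norming sample for the
    student's own grade, so scores from different grades share one scale.
    """
    if not 0.0 < percentile_rank < 100.0:
        raise ValueError("percentile_rank must lie strictly between 0 and 100")
    # z is the standard normal deviate with this cumulative proportion.
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return NCE_MEAN + NCE_SD * z
```

Under this definition, a 3rd grade student and a 4th grade student who each score at the 50th percentile of their own grade's norming sample receive the same NCE, which is what permits aggregation across grades.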
Language instruction programs. In the analysis, language acquisition programs were treated as student-level variables because although all students within a particular classroom were participating in the same language instruction program, violating the assumption of independence, they had not received the same instruction (i.e., same teacher) in preceding years. Moreover, no controls for baseline achievement prior to enrollment in language programs were available. Class-level effects were controlled by reading proficiency at the beginning of the school year. Although no information on mobility across programs for the participating students was provided by the district, the district-wide attrition and matriculation across bilingual and dual language programs are very low (< 1%). Moreover, students who are native Spanish speakers can enroll at any time in the DBE program offered by the district. In these rare cases, the district requires that students complete a language assessment to verify that the student has the literacy and language skills in Spanish to be able to continue their academic development. Students can only enroll in dual language if they have some background in the target language (e.g., for native Spanish speakers, students must have some proficiency in English). Thus, language instruction programs were assumed to vary by student given their differential exposure to the methodology specific to the language instruction program in which they were enrolled.
Level 1 controls. Individual-level predictors used as controls included students' eligibility for free lunch as a proxy for socioeconomic status (SES) (0 = received free lunch, 1 = did not receive free lunch), ethnicity/race (dummy coded across Latino, African American, Asian, Other; White was the excluded category for comparison), and gender (0 = male, 1 = female).
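The coding scheme for the Level 1 controls can be sketched as follows. This is an illustrative reading of the description above, not the authors' data-preparation code; the function name and the output variable names (e.g., `no_free_lunch`, `eth_latino`) are hypothetical.

```python
# Ethnicity/race categories that receive a dummy variable.
# White is the excluded reference category, so it has no indicator.
ETHNICITY_DUMMIES = ("Latino", "African American", "Asian", "Other")


def encode_level1_controls(received_free_lunch: bool, ethnicity: str, gender: str) -> dict:
    """Return the 0/1 indicators for one student, following the stated coding."""
    if ethnicity != "White" and ethnicity not in ETHNICITY_DUMMIES:
        raise ValueError(f"unrecognized ethnicity category: {ethnicity!r}")
    if gender not in ("male", "female"):
        raise ValueError(f"unrecognized gender category: {gender!r}")
    row = {
        # 0 = received free lunch, 1 = did not receive free lunch
        "no_free_lunch": 0 if received_free_lunch else 1,
        # 0 = male, 1 = female
        "female": 1 if gender == "female" else 0,
    }
    # One indicator per non-reference category; a White student is all zeros.
    for category in ETHNICITY_DUMMIES:
        key = "eth_" + category.lower().replace(" ", "_")
        row[key] = 1 if ethnicity == category else 0
    return row
```

With White as the excluded category, each ethnicity coefficient in the model is interpreted as the difference from White students, holding the other predictors constant.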
Level 2 Measures
Classroom Assessment Scoring System (CLASS; Pianta, La Paro, & Hamre, 2008). The CLASS is a classroom observation instrument developed and validated to assess classroom quality across prekindergarten through 5th grade and across content areas, with a focus on the domains of emotional support, classroom organization, and instructional support (NICHD ECCRN, 2002a, 2002b, 2004, 2005; Pianta et al., 2002; Pianta et al., 2007; Pianta & Hamre, 2009, p. 110). The CLASS is grounded in developmental theories and is centered on the belief that "students and adults are the primary mechanism of student development and learning" (Pianta, La Paro, & Hamre, 2008, p. 1). For the analysis presented here, only the language modeling dimension that falls within the instructional support domain was included because it is the only dimension focused on IC. Whereas the other two dimensions that make up the instructional support domain, concept development and quality feedback, encompass teachers' scaffolding of higher level thinking skills, the language modeling dimension is designed to "[capture] the quality and amount of the teachers' use of language-stimulation and language-facilitation techniques" (Pianta, La Paro, & Hamre, 2008, p. 79). Observers rate the degree to which: (1) teachers and students engage in frequent conversations with each other, peers engage in instructional conversation, and teachers respond to students; (2) teachers ask open-ended questions; (3) teachers repeat and extend and/or elaborate student responses; (4) teachers map actions (both their own and students') with language; and (5) teachers use a variety of language, including advanced vocabulary that is connected to familiar words (Pianta, La Paro, & Hamre, 2008).
To illustrate, a classroom that would receive a low score (1 or 2) might include the following during a reading activity:
Teacher: How many apples did the giant eat?
[Students respond in unison, "8."]
Teacher: Nicely done. How many apples did the boy give him?
[Students respond in unison, "10."]
In contrast, a classroom that would be scored on the higher end of language modeling (6 or 7) might include the following scenario, based on the same reading activity:
Teacher: Discuss with your group for a minute the situation in which our protagonist found himself. (Students engage in conversations)
Teacher: Now, let's hear what each group decided about our hero's circumstances. Group 1?
Student: Well, the boy was really scared at first. But he saw that the giant looked sad. He then felt sorry for the giant, and shared his apples with him. He thought that maybe he was hungry.
Teacher: I see! So, your group discussed how Daniel, one of our protagonists, examined the giant's body language and demeanor to make an assumption about whether or not the giant was a threat or a menace to him. And Daniel decides the giant doesn't look intimidating, but maybe pitiful. And the giant gained Daniel's compassion.
Observation instructions for the CLASS specify that the use of two languages in classrooms does not automatically result in a higher score; behaviors that result in a high score for language modeling are the same across language instruction programs. The language modeling dimension was highly correlated across all observations with coefficients ranging from .37 to .68.
Quality of language modeling is rated on a 7-point scale (1 - 7) across 4 cycles of 30-minute observations. In the present study, all of the grade levels reflected consistency in both format and subject of observation with 76% of the observations occurring during whole group instruction and the remaining 24% occurring during small group instruction. In terms of subject, 41% of the observations took place during math instruction, 30% during language arts, 19% during social studies, and 9% during science. In the DBE and dual language classrooms, approximately 50% of instruction was observed in Spanish. We present disaggregated descriptive statistics for language modeling scores in Tables 2 and 3.
After university and school district Institutional Review Boards granted permission to conduct the study, principals across 33 public and charter schools were informed of the study during a district meeting and subsequently contacted via telephone and email to request a meeting. Although all principals granted permission to recruit teachers (N ≈ 330) and students (N ≈ 6,600), only 14% of the teachers (N = 46) across 13 schools gave voluntary, informed consent. The reading achievement measures and demographic data (e.g., ethnicity and free lunch status) for student-level analyses were provided for all the students in participating teachers' classrooms by the district, de-identified to protect student identity.
Trained classroom observers included two graduate students in a post-baccalaureate teaching certification program and one faculty member. The observers were trained by Teachstone Inc. to conduct classroom observations using the Classroom Assessment Scoring System (CLASS; Pianta, La Paro, & Hamre, 2008). Training involves the use of a standardized manual that provides comprehensive descriptions of the various scoring criteria for each of the domains assessed (see Pianta, LaParo, & Hamre, 2008) and videotaped classroom interactions, which provide trainees with the opportunity to practice coding numerous classroom interactions using the criteria outlined in the manual. During training, potential observers are provided with iterative opportunities to compare their scores with those of 3 master coders (i.e., experts whose codes are used by Teachstone Inc. as the standard on which to compare scores for reliability purposes) and discuss scoring discrepancies. After 2 days of training, observers are provided with online access to 20-minute videotaped classroom interactions. Potential observers are instructed to view and code each scene (N = 6) of classroom interactions without the opportunity to rewind or stop the video so as to provide a scenario as close to actual classroom observations as possible. To be deemed reliable, potential observers' scores had to reflect 80% agreement within 1 point of master coders' scores across the ten CLASS dimensions. For trainees who do not meet reliability criteria the first time, up to two additional opportunities are provided where they view and code an additional 3 classroom interaction videos. Trainees who do not meet reliability criteria after the third attempt are required to attend another two-day training and repeat the reliability test(s). The percent agreement between observers and master coders is not released by Teachstone, Inc.; however, the trained observers in the present study met the minimum criteria for reliability.
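To make the reliability criterion concrete, the following minimal Python sketch computes the proportion of a trainee's ratings that fall within 1 point of a master coder's ratings and checks it against the 80% threshold. The function name and all scores are hypothetical illustrations; they are not Teachstone materials or data from the present study.

```python
from typing import Sequence


def agreement_within_tolerance(
    trainee: Sequence[int], master: Sequence[int], tolerance: int = 1
) -> float:
    """Return the proportion of paired ratings that differ by at most `tolerance`."""
    if len(trainee) != len(master) or len(trainee) == 0:
        raise ValueError("Both raters must supply the same, non-zero number of ratings.")
    hits = sum(abs(t - m) <= tolerance for t, m in zip(trainee, master))
    return hits / len(trainee)


# Hypothetical 7-point ratings on ten dimensions for a single video segment.
trainee_scores = [5, 4, 6, 3, 5, 2, 4, 5, 6, 3]
master_scores = [5, 5, 6, 5, 4, 2, 4, 7, 6, 3]

rate = agreement_within_tolerance(trainee_scores, master_scores)
meets_criterion = rate >= 0.80  # threshold described in the text above
print(f"Agreement within 1 point: {rate:.0%}; meets criterion: {meets_criterion}")
```

In practice the same calculation would be pooled over all coded segments rather than applied to a single video.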
Once observers met the minimum coder reliability criteria, they contacted teachers to schedule classroom observations and student recruitment across the spring semester of the school year. Observations were conducted in each classroom using 30-minute iterative cycles (i.e., classroom observation for 20 minutes, and recording for 10 minutes) until the end of the two-hour observation period; observations in dual language and DBE classrooms were conducted by observers fluent in the language of instruction. All activities during the observation cycle were coded with the exception of recess or free time. If a classroom activity was interrupted in the middle of an observation cycle for student recess or free time, ratings were still assigned for the observed activity; the observation period continued for the duration of the remaining time once students returned from recess or free time. During the 10 minutes of recording, observers recorded the ratings of the dimensions using observation notes.
Interrater reliability. To determine interrater reliability of the observations in the present study, 25% of the classroom observations were coded simultaneously by two trained observers. Interrater reliability across 11 classrooms resulted in 96% agreement within 1 point across the ten CLASS dimensions. Given that interrater agreement computation does not take into account chance agreement (Kolbe & Burnett, 1991), the interrater reliability criteria outlined by Shrout and Fleiss (1979) were also used. For the present study, the most appropriate intraclass correlation model is a two way mixed model ANOVA given that each observation was rated by two observers. Strong intraclass correlations (ICC) of the two independent scores of observations resulted for language modeling, with ICC (2, 1) of .93 (p < .01).
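As an illustration of how single-rater intraclass correlations are obtained from a two-way ANOVA decomposition, the sketch below computes both ICC(2,1) and ICC(3,1) using the mean-square formulas given by Shrout and Fleiss (1979). In their notation, ICC(2,1) treats raters as a random effect and ICC(3,1) treats them as fixed. The function name and the paired scores are hypothetical; the study's ratings are not reproduced here.

```python
from typing import List, Tuple


def icc_two_way(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for an n-targets by k-raters table of ratings."""
    n = len(ratings)
    if n < 2:
        raise ValueError("At least two rated targets are required.")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("Every target needs ratings from the same two or more raters.")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-targets mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square

    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


# Hypothetical language modeling scores from two observers in six classrooms.
paired_scores = [[4.0, 4.5], [3.0, 3.0], [5.5, 5.0], [6.0, 6.0], [3.5, 4.0], [5.0, 5.0]]
icc21, icc31 = icc_two_way(paired_scores)
print(f"ICC(2,1) = {icc21:.2f}, ICC(3,1) = {icc31:.2f}")
```

The two coefficients diverge only when one rater scores systematically higher than the other, because ICC(2,1) counts that rater difference as disagreement and ICC(3,1) does not.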
To calculate a priori sample size that would provide sufficient power for a two-level HLM for the present study, we conducted a power versus number of sites analysis using Optimal Design (OD; Spybrook et al., 2009). We entered the following coefficients: α = 0.05, n = 20 for the estimated number of students per classroom and J = 6 for the number of classrooms per school, and δ = .35 for the effect size. We chose a conservative effect size despite prior studies suggesting a larger effect (e.g., Hamre & Pianta, 2005) given the absence of prior studies examining effects for Latino populations and upper elementary classrooms. We added site level covariates (i.e., prior achievement) that have been shown to reduce level 2 variance by approximately 80% in prior studies of classroom observations (e.g., Wiley, McCaslin, Good, & Sabers, 2009). Under these conditions, we determined that there would be sufficient power to run a two-level analysis with at least 20 classrooms per language model.
When student data are collected across classrooms, they often have a nested structure wherein students within a classroom tend to have similarities that are not measured, violating the assumption that measures are independent of one another (Raudenbush & Bryk, 2001). Hierarchical linear models (HLM) address the issue of nested data by allowing for an examination of the data at different levels. In the present study, data consisted of measures at two levels of analysis: the student and his or her classroom (i.e., teacher). The following equation summarizes the full mixed model of reading achievement:
where Yij represents individual student reading achievement, based on the late spring administration of the reading assessment, with error terms u0j showing error associated at the classroom-level in estimating the effect of classroom level variables and rij showing error associated with individual student i in classroom j. For the present study, model fit was determined using a proportional reduction of error approach (Hox, 2002). In this approach, multiple models are examined to determine the degree to which within- and between- classroom variance is explained. The first model includes only the intercept; variables are added to each subsequent model. SPSS version 18.0 was used to manage and clean the data and HLM 6.08 to estimate the two-level models.
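For orientation, a two-level random-intercept model of the kind described above can be sketched as follows. Only $Y_{ij}$, $u_{0j}$, and $r_{ij}$ are taken from the description; the predictor labels, the coefficient indexing, and the placement of the program indicators at level 1 are assumptions made for illustration and may differ from the specification actually estimated.

```latex
\begin{align*}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \sum_{k=1}^{K} \beta_{kj} X_{kij}
    + \beta_{Dj}\,\mathrm{DBE}_{ij} + \beta_{Lj}\,\mathrm{DL}_{ij} + r_{ij} \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{LM}_{j} + u_{0j} \\
\beta_{kj} &= \gamma_{k0}, \qquad k = 1, \dots, K \\
\beta_{Dj} &= \gamma_{D0} + \gamma_{D1}\,\mathrm{LM}_{j} \\
\beta_{Lj} &= \gamma_{L0} + \gamma_{L1}\,\mathrm{LM}_{j} \\
\text{Mixed:}\quad Y_{ij} &= \gamma_{00} + \gamma_{01}\,\mathrm{LM}_{j}
    + \sum_{k=1}^{K} \gamma_{k0} X_{kij}
    + \gamma_{D0}\,\mathrm{DBE}_{ij} + \gamma_{L0}\,\mathrm{DL}_{ij} \\
  &\quad + \gamma_{D1}\,\mathrm{LM}_{j}\,\mathrm{DBE}_{ij}
    + \gamma_{L1}\,\mathrm{LM}_{j}\,\mathrm{DL}_{ij} + u_{0j} + r_{ij}
\end{align*}
```

In this sketch, $X_{kij}$ stands for the student-level covariates (prior achievement, gender, free lunch eligibility, and ethnicity), $\mathrm{LM}_{j}$ is the language modeling score for classroom $j$, $\mathrm{DBE}_{ij}$ and $\mathrm{DL}_{ij}$ are program indicators with monolingual English as the reference category, and $\gamma_{D1}$ and $\gamma_{L1}$ are the cross-level interaction terms.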
The variance component in the null model (p < .01) indicates that a significant amount of variance in students' reading outcomes remained to be explained; the intra-class correlation (ICC), or the proportion of the variance in reading that was explained by the grouping structure (which in this case was classrooms), was .08, meeting the criteria for the appropriateness of HLM to address the research questions (Hox, 2002).
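The intra-class correlation in a null model is the between-classroom variance divided by the total variance. The short Python sketch below shows that calculation, together with the standard design effect, 1 + (n - 1) x ICC, which indicates how much clustering inflates sampling variance. The variance components are hypothetical values chosen only so that the ICC equals the reported .08, and the cluster size of 20 is borrowed from the power analysis as an assumption; the design effect itself is an added illustration, not a quantity reported in this study.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Proportion of total outcome variance that lies between classrooms."""
    if between_var < 0 or within_var < 0 or between_var + within_var == 0:
        raise ValueError("Variance components must be non-negative and not both zero.")
    return between_var / (between_var + within_var)


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (n - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical null-model variance components (not the study's estimates).
tau_00 = 8.0     # between-classroom variance of the intercept
sigma_sq = 92.0  # within-classroom (student-level) variance

icc = intraclass_correlation(tau_00, sigma_sq)
deff = design_effect(cluster_size=20, icc=icc)
print(f"ICC = {icc:.2f}, design effect = {deff:.2f}")
```

Even a modest ICC can therefore matter: with about 20 students per classroom, treating the observations as independent would understate sampling variance substantially.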
Once the null model was established, level 1 variables were added in model 2 (see Table 4) and compared to the null model. Prior achievement, Latino ethnicity, and SES contribute significantly to the model, but gender does not (see model 2 in Table 4). Including prior achievement is a critical control when attempting to examine the contribution language modeling provides in explaining achievement. By holding constant early fall reading achievement, effects of teachers' language modeling can be more accurately interpreted. Students who receive free lunch are on average close to 7 points lower (.33 SD) than students who do not receive free lunch. Notably, there is no difference in achievement between the English monolingual students and ELLs in either DBE or dual language programs, but Latino students scored on average 3 points lower (.14 SD); other ethnicity categories were not significant. In terms of language modeling, students gain almost 2 points on average for each unit increase in the language modeling measure across classrooms. That is, holding constant prior achievement, SES, and gender, for each unit increase in the rating received by a teacher on the language modeling variable, students across all language programs gained .10 SD on the reading outcome measure. For teachers with the highest ratings, this translates to a 12 point increase (.50 SD) - a moderate effect that once considered in the context of one school year, can translate into marked achievement gains.
In the final model, we estimated cross-level interaction effects to determine if the slope for language modeling and reading achievement differs across language programs. Cross level interactions between language modeling and the type of language instruction program reveal that there is no difference in slopes between students in monolingual English classrooms and DBE classrooms. In other words, the association between language modeling and achievement is the same in these two settings. For dual language classrooms, however, the slope for language modeling and achievement is weaker than it is for both monolingual English (see Figure 1) and DBE settings. This indicates that there is a stronger association between language modeling and reading achievement in monolingual English and DBE classrooms compared to dual language classrooms. No other cross-level effects were statistically significant.
This study explored the degree to which teachers' language modeling is related to reading outcomes for students in different language instruction programs by considering (a) the extent to which the quality of language modeling is associated with reading achievement among students and (b) whether language instruction programs moderate the relationship between the quality of language modeling and reading achievement.
Consistent with prior studies (Battle & Pastrana, 2007; Morales & Saenz, 2007), the findings in the present study explained a substantial gap in achievement scores. Namely, eligibility for free lunch was associated with a reduction in reading achievement by ⅓ of a standard deviation. This held across settings (monolingual English, DBE, and dual language) and the addition of variables in each subsequent model did not reduce the effects of SES. Notably, however, teachers can ameliorate the effects of low SES quite markedly. In classrooms with modest levels of language modeling (a score of 3 or 4), gains of approximately .25 to .33 SD can bring students from low SES backgrounds on par with peers who do not have poverty as an obstacle to achievement. In classrooms with the highest levels of language modeling, students can gain approximately .60 SD. The findings support prior research that has asserted that engaging in IC can ameliorate achievement disparities rooted in the effects of SES. As such, this finding has critical implications for the ways in which preservice teaching programs prepare teachers to engage with students, as well as the kind of professional development opportunities provided to teachers. Clearly, preparing teachers to use IC effectively can provide them with a useful tool to ameliorate the disparities of reading achievement for students in poverty.
The findings not only support the importance of the quality of language modeling for ameliorating the effects of poverty, but also for improving reading achievement among traditionally marginalized students regardless of classroom setting. In the present study, both African American students and Latino students perform lower than their peers by .20 SD and .14 SD on average, respectively; however, similar to the effects of language modeling on SES, even low levels of language modeling can bring these students on par with their peers. These findings also support the importance of ensuring traditionally marginalized students are provided with high quality teachers who are prepared to engage in IC.
In consideration of prior research, the apparent lack of differences across language instruction programs (i.e., no significant coefficients resulted for the language instruction programs in models 2 and 3) after controlling for SES and ethnicity may seem counterintuitive; however, the fact that ELLs in either language acquisition program did not perform lower than non-ELLs is noteworthy, especially considering that the reading assessments were in English. Moreover, even though Latino students had scores that were on average 3 points lower (.14 SD) than non-Latino students after controlling for prior achievement and SES, reading achievement for Latino ELLs in DBE and dual language classrooms was commensurate with their White native English speaking peers (see Table 4).
Although there are clear benefits to bilingual settings for Latinos, the interaction between dual language and language modeling suggests that there is something setting dual language classrooms apart from DBE, a finding that is consistent with the extant research on the advantages of dual language (e.g., Thomas & Collier, 1997). In the present study, the cross-level interaction suggests that in DBE and monolingual English classrooms, improving students' reading achievement is highly reliant on the degree to which teachers model IC, a dependency that is not as strong for dual language. To understand why this might be the case, the similarities between DBE and monolingual English classrooms that stand in stark contrast to dual language classrooms merit consideration.
Although there are marked differences between monolingual and DBE classrooms, both deliver instruction in students' native language in the early elementary school years. Namely, students in the participating DBE classrooms are in 90:10 settings, gradually adding a second language and becoming sequential bilinguals. In contrast, students in dual language classrooms are simultaneous bilinguals in 50:50 settings, exposed to both English and Spanish instruction equally throughout elementary school years. In consideration of the numerous advantages documented for bilingualism, particularly when exposure to a second language occurs earlier and with more frequency (e.g., Mechelli et al., 2004), it may be that simultaneous bilinguals are gaining cognitive benefits that outweigh the effects of language modeling that is required in other settings.
The present research complements the work of Mechelli and colleagues (2004) in an intriguing way by suggesting that the cognitive benefits gained by students who have had early exposure to second language acquisition may alter the relationship between teacher language modeling and children's literacy achievement. This is not meant to suggest that language modeling is not important for students in dual language classrooms, but that having learned two languages since kindergarten may enhance the cognitive processing skills of dual language students, such as mental control and attention (Bialystok, 1999) that may then support language and literacy development. There is also evidence that bilingual exposure helps children develop a more analytic orientation to language and its sound components than monolinguals, and that such enhanced metalinguistic awareness and metaphonological skills may contribute to successful literacy acquisition (Bialystok, Majumder, & Martin, 2003; Bialystok, 2004).
In addition to cognitive differences between monolinguals and bilinguals, some evidence also suggests that students in dual-language environments develop unique facilitative reading comprehension strategies. Jiménez, García, and Pearson (1996) examined the reading comprehension strategies of bilingual Latino sixth and seventh graders and found that the most successful students used numerous strategies, including searching for cognates, translating, and transferring knowledge from one language to the other, to facilitate their reading comprehension in English. While bilingual students in both DBE and dual-language classrooms might be expected to have similar opportunity to develop such unique comprehension strategies, perhaps students exposed to both languages earlier in the dual-language classrooms are more likely to develop such facilitative strategies or to develop them earlier than their peers in DBE classrooms.
The heterogeneity in dual language classrooms (although somewhat limited in terms of ethnicity in the present study given that the majority of students in these classrooms were Latinos) may also be beneficial to students, potentially validating their sociocultural background and improving their motivation (Gonzalez, Moll, & Amanti, 2005) in a way that reduces the dependency on language modeling. Moreover, dual language settings may not be as reliant on language modeling given that one premise for dual language instruction is that children learn from each other through authentic, meaningful social interactions (Wong-Fillmore, 1991; Genesee, 1999). Indeed, Pianta, La Paro, and Hamre (2008), who designed the CLASS assessment, assert that both students and teachers contribute to their students' learning, and numerous studies have shown the positive effect of peer modeling on children's language development, particularly during the early academic years (e.g., Henry & Rickman, 2007; Mashburn, Justice, Downer, & Pianta, 2009; Schechter & Bye, 2007). As such, it is plausible that overall peer language modeling in both English and Spanish (in dual language settings) exerts a greater influence on student learning than teachers' modeling in monolingual English and DBE classrooms.
Despite the reduced effect of language modeling on reading achievement in dual language classrooms, it is important to remember that students in DBE classrooms performed at a level commensurate with their native English-speaking peers. As such, 90:10 bilingual settings are indeed beneficial to ELLs. Still unclear, however, is the degree to which the benefits of dual language can be attributed to sociocultural dynamics or simultaneous bilingualism. Indeed, there is extant research supporting the notion that both dual language and DBE settings contribute to reading achievement for ELLs, but without examining 50:50 DBE and 90:10 dual language settings, it is not possible to identify the degree to which sociocultural dynamics and simultaneous or sequential bilingualism play a role in ELLs' achievement. As such, both merit further examination to contribute to our understanding about the ways language instruction programs ameliorate achievement disparities for ELLs.
Although the present study adds to our understanding of both the ways IC can improve academic outcomes for students and the ways IC may be less salient for simultaneous bilingual students, there are limitations to the present study. First, there may be a selection threat in terms of students participating in DBE and dual language programs that contributed to the results. In other words, there might be reasons that contribute to enrollment in one of the two bilingual settings that are unmeasured, but may contribute to the findings. Unfortunately, although randomization is necessary to control for selection threats, it remains elusive in classroom settings given that parents choose the language instruction program placement for their child. Despite this limitation, we were able to control for achievement at the beginning of the year to, at the very least, reduce the impact of selection threat that may exist. Another limitation of the present study is that we were unable to examine effects by grade given the sample size. In prior research (e.g., Reese, Goldenberg, & Saunders, 2006), however, students in dual language programs across grades K-2 have performed at commensurate levels in Spanish reading compared to their peers in DBE programs, and only slightly below their English monolingual peers in English reading assessments. Students in dual language programs have, however, made the highest gains in both English and Spanish when examined longitudinally across grades K-6 (e.g., Salazar, 1998). Thus, although the findings from the present study indicate that ELLs in dual language programs outperform their ELL peers in DBE programs as well as non-ELLs in English reading performance, we are unable to determine the degree to which their proficiency gains occur across their trajectory in the program.
To that end, there is a need for future research to focus on the degree to which teachers' language modeling, in both English and students' native language, influences students' reading achievement across their academic trajectory. Another potential selection threat to consider in the present study is that because teachers volunteered to be part of the study (i.e., agreed to have someone sit in their classroom writing detailed notes about their instruction), it is likely that only teachers who felt confident in their abilities agreed to participate. Examining the range of observed language modeling scores (see Table 3), it becomes evident that the lowest score was in the middle range with no one scoring under 2.75 regardless of the language of instruction. The district requires bilingual certification for all teachers in DBE classrooms, and it is likely that teachers may have felt competent in their skills with ELLs in these settings. Unfortunately, no information on teachers' background (e.g., years teaching, certification) was provided, limiting a deeper understanding about the participating classrooms. As such, the findings may generalize to other high-functioning classrooms, but less so to lower-functioning classrooms.
The finding that dual language programs (or at least that 50:50 settings) may be less dependent on the quality of language modeling than DBE (or 90:10 settings) has important policy implications given the broader differences between these two language instruction programs. While sharing the goal of cultivating bilingualism, the social context of the two approaches is markedly different. DBE classrooms comprise ELLs who are isolated from their native English-speaking peers, while dual language classrooms strive for linguistically balanced populations. A sociocultural perspective emphasizes the importance of these different contexts on learning. As Hawkins (2004) points out, this perspective "sees meanings and understandings constructed not in individual heads, but as between humans engaged in specific situated social interactions" (p. 15). One can infer that the social interactions around language and literacy differ between DBE and dual language classrooms. The research presented here suggests that these differences may affect language development.
The pressing concern of effectively educating ELLs has been addressed by examining the effectiveness of improving English proficiency through language instruction programs (e.g., Slavin & Cheung, 2003), language-focused interventions (e.g., Tong, Irby, Lara-Alecio, Yoon, & Mathes, 2010), and teaching practices (Cirino, Pollard-Durodola, Foorman, Carlson, & Francis, 2005). Our study adds to the extant research by examining teachers' language modeling across classrooms with different language instruction programs. Consistent with prior studies, our results suggest that dual language models may offer ELLs an advantage that promotes not only English proficiency, but also bilingualism and the positive benefits associated with it. Considering that Latino students perform significantly worse on average (see Table 4), but that ELLs, most of whom are Latino, perform at the same level as their non-ELL peers, a dual language model may show promise not only for ELLs, but also non-ELLs who may be at-risk for school failure.
In conclusion, the present study is among the first to consider together the extant research on teaching behaviors (see Pianta & Hamre, 2009), language modeling (e.g., Aldridge, 2005), and language instruction programs (e.g., Salazar, 1998) when examining reading achievement for both ELLs and non-ELLs. In consideration of our findings, more research is necessary to determine the degree to which exposure to two languages may contribute to improving academic outcomes for not only Latino ELLs, but also other students at risk for school failure.