Computers have contributed immensely to language learning in recent years. Corpus linguistics, an approach to linguistic research, depends entirely on computer technology for its analyses of language. A corpus is "a collection of texts - written, transcribed speech, or both - that is stored in electronic form and analysed with the help of computer software programs" (Conrad, 2005, p. 393). These substantial collections of language texts have been available to researchers for almost forty years, and they offer a view of language structure that was not available before (Sinclair, 2004).
The focus of corpora is on "naturally occurring" texts which users of language create for a "communicative purpose". Corpora of spoken language represent a variety of genres, including daily conversations, lectures, seminars, theatre, radio and television scripts, and even classroom discourse. Written corpora can be classified into academic, journalistic, or literary prose (Conrad, 2005; Meyer, 2002; Biber & Conrad, 2001).
Since the early 1990s there has been an upsurge of interest in applying the findings of corpus-based research to language pedagogy. The use of corpora has revolutionised reference publishing, notably dictionaries and reference grammars (McEnery & Xiao, 2011). Such publications have drawn extensively on corpus data in one way or another, so that 'even people who have never heard of a corpus are using the product of corpus-based investigation' (Hunston 2002: 96). The Collins COBUILD English Language Dictionary (Sinclair, 1987) was published as the first 'fully corpus-based' dictionary. The Longman Grammar of Spoken and Written English (Biber et al., 1999) can be considered a breakthrough in reference publishing after Quirk et al.'s (1985) Comprehensive Grammar of the English Language. Based on the Longman Spoken and Written English Corpus of as many as 40 million words, it gives 'a thorough description of English grammar, which is illustrated throughout with real corpus examples, and which gives equal attention to the ways speakers and writers actually use these linguistic resources' (Biber et al., 1999, p. 45).
A number of scholars have gone further and critically analysed existing teaching syllabuses and materials against corpus data. Mindt (1996, cited in McEnery & Xiao, 2011), for example, reports that the grammatical structures introduced in textbooks for teaching English differ considerably from the use of these structures in L1 English, so that the textbooks teach 'a kind of school English which does not seem to exist outside the foreign language classroom' (Mindt 1996: 232).
The present paper explores the impact of language corpora on the teaching of English, illustrating how teachers and learners interact with corpora and what roles they play in a corpus-based classroom.
The application of corpora in language teaching
In the direct application of corpora (Leech, 1997), language learners and teachers get their hands on corpora and concordances (computer-generated alphabetical indexes of every word in a text) themselves and find out about language patterning and the behaviour of words and phrases in an "autonomous" way (cf. Bernardini 2002, 165). Johns proposed to "confront the learner as directly as possible with the data, and to make the learner a linguistic researcher" (Johns 2002, 108). Johns (1997, 101) also referred to the learner as a "language detective" and coined the slogan "Every student a Sherlock Holmes!" to underscore the learner's active role, and he described the computer and the concordancer as "a research tool for both learner and teacher" (1986: 151). This method, which involves interaction either between the learner and the corpus or, in a more controlled way, between the teacher and the corpus, was labelled "data-driven learning" (DDL) by Johns (1991). He describes the DDL approach in this way:
What distinguishes the DDL approach is the attempt to cut out the middleman as much as possible and give direct access to the data so that the learner can take part in building his or her own profiles of meanings and uses. The assumption that underlies this approach is that effective language learning is itself a form of linguistic research, and that the concordance printout offers a unique resource for the stimulation of inductive learning strategies - in particular, the strategies of perceiving similarities and differences and of hypothesis formation and testing. (Johns, 1991, p. 30)
Johns began his work with the concordancing software MicroConcord as a tool for learners to use, although he also recognised its usefulness for the teacher and the linguistic researcher (1986: 158). Since the 1990s more sophisticated concordancers, such as WordSmith Tools (Scott 2004) and MonoConc (Barlow 2000), and more recently freely available concordancers, such as AntConc, have become available. In addition, the Cobuild Concordance and Collocations Sampler is an invaluable source of corpus data: the user types a search word and has immediate access to forty examples of its use. This gives teachers examples of actual language use to support their teaching (Chambers, ).
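What a concordancer does can be illustrated with a short sketch. The Python function below is a hypothetical, minimal keyword-in-context (KWIC) routine and not the method used by MicroConcord, WordSmith Tools or any other tool named above; the function name kwic, its parameters and the sample text are assumptions made purely for illustration. The default limit of forty lines simply echoes the number of examples mentioned above.

```python
import re


def kwic(text, keyword, width=30, limit=40):
    """Return up to `limit` keyword-in-context lines for `keyword`.

    Illustrative sketch only: matching is case-insensitive and on whole
    words, and the context is measured in characters, not words.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take `width` characters of context on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
        if len(lines) == limit:
            break
    return lines


# Hypothetical sample text standing in for a corpus.
sample = (
    "Results depend on the sample size. Prices depend on demand. "
    "Learners are dependent on feedback, and they depend on their teachers."
)

for line in kwic(sample, "depend"):
    print(line)
```

Because the search word is matched as a whole word, "dependent" is not returned for the query "depend"; a real concordancer would normally let the user choose between exact and wildcard searches.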
Activities that ask learners to analyze corpus data are in line with a variety of current principles in language learning theory, as a number of corpus linguists have pointed out (see, e.g., Aston, 1995; Bernardini, 2001; Gavioli, 2001; Gavioli & Aston, 2001; Johns, 1994; Leech, 1997; Willis, 1998, cited in Conrad, 2003). Johns (1991) identifies three stages of inductive reasoning with corpora in the DDL approach: observation (of concordanced evidence), classification (of salient features) and generalization (of rules). First, learner autonomy is enhanced as students learn how to observe language and make generalizations instead of depending on a teacher. These observations and generalizations in turn support hypothesis formation and testing, which promotes interlanguage development. Furthermore, corpus analysis activities are easily designed to promote noticing and grammatical consciousness-raising (Rutherford, 1987; Schmidt, 1990; Williams, 2005, cited in Conrad, 2005).
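The three stages of observation, classification and generalization can be made concrete with a small sketch. The code below is an assumption-laden illustration rather than a procedure taken from Johns (1991): the function name right_collocates, the simple tokenisation and the sample sentences are all hypothetical. It collects the word that follows each occurrence of a search word (observation), counts these words (classification) and leaves the learner to state the rule that the counts suggest (generalization).

```python
import re
from collections import Counter


def right_collocates(text, keyword):
    """Count the word that immediately follows each occurrence of `keyword`.

    Illustrative sketch only: tokens are lower-cased runs of letters and
    apostrophes, and sentence boundaries are ignored.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        tokens[i + 1]
        for i, token in enumerate(tokens[:-1])
        if token == keyword
    )


# Hypothetical sentences standing in for concordance evidence.
evidence = (
    "She is interested in music. They were interested in the results. "
    "He seemed interested to hear more."
)

counts = right_collocates(evidence, "interested")
for word, frequency in counts.most_common():
    print(word, frequency)
```

With this toy evidence the learner would see that "interested" is followed mostly by "in" and occasionally by "to", and could then test that hypothesis against further concordance lines.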
Gilquin and Granger (2010) explain the pedagogical functions of DDL, which can be summarized as follows:
Exposing learners to authentic language, which results in vocabulary expansion or heightened awareness of language patterns.
Serving a corrective function: learners improve their writing by comparing it with data produced by (native) expert writers or by consulting a learner corpus.
Involving learners in discovery: the learner as traveller (Bernardini 2001: 22), researcher, or detective (Johns 1997: 101).
'Empowering' learners (Mair 2002), which boosts their confidence and self-esteem.
Helping learners acquire the skills essential for exploring language, such as predicting, observing, noticing, thinking, reasoning, analysing, interpreting, reflecting, exploring, focusing, making inferences (inductively or deductively), guessing, comparing, differentiating, theorising, hypothesising, and verifying (O'Sullivan, 2007, p. 277).
It is worth noting, however, that the key to successful data-driven learning, even when it is student-centred, is an appropriate level of teacher guidance or 'pedagogical mediation of corpora' (Widdowson, 2003), adjusted to the learners' age, experience, and proficiency level, because 'a corpus is not a simple object, and it is just as easy to derive nonsensical conclusions from the evidence as insightful ones' (Sinclair 2004: 2). For this reason, language teachers should receive adequate training in corpus analysis.
Data-driven learning material
The literature on DDL introduces quite a variety of corpora: written, spoken or multimodal, monolingual or bilingual, general or specialised, native or non-native, tagged or untagged, etc. (Gilquin and Granger, 2010). As can be expected, however, each corpus is best suited to particular purposes. An important issue in the choice of a corpus is its 'authenticity'. In fact, any corpus is authentic in that it holds naturally occurring language data. However, Widdowson (2000) distinguishes text production from text reception and argues that corpora may lack authenticity at the receptive end, even though they were initially authentic.
Although there is little empirical evidence of the effectiveness of corpus-based techniques for language learning, there are a variety of theoretical reasons for using them and many reports by teachers of student interest and improvement (Conrad, 2005).
Research reports reviewed in Conrad include the following. In ESP applications, Weber (2001) argues that undergraduate writing of formal legal essays improved when the course included the use of concordancing with a corpus of professional essays. Students determined correlations between the generic structure of the essays and the use of certain lexical and grammatical structures. This form-focused activity succeeded in making the essay-writing process "more manageable for the student," Weber reports (p. 15). Similarly, Collins (2000) and Foucou and Kubler (2000) find corpus-based activities useful for business and computer science students, respectively. Donley and Reppen (2001) discuss the use of concordancing with EAP students to teach general academic vocabulary that is used in many disciplines.