The advent of technologies in the field of language pedagogy has created a vast opportunity for researchers, language teachers and learners to integrate technology into the language learning curriculum. As a result, all disciplines of the English Language such as phonetics, lexicology, graphology, grammar, and discourse analysis have been greatly influenced by technological developments during the past decades. Crystal (1995) stresses the research possibilities achieved by improvements in processing and analyzing huge language texts through computer systems. Undoubtedly, the great contribution of the computer sciences to language pedagogy has been observed within applied linguistics in constructing, processing, and analyzing language corpora (Johns, 1990). The corpus is "a collection of texts- written, transcribed speech, or both- that is stored in electronic form and analyzed with the help of computer software programs" (Conrad, 2005, p. 393). These substantial collections of language texts have been available to researchers for almost forty years, and they sketch a view of language structure that has not been available before (Sinclair, 2004).
The focus of corpora is on 'naturally occurring' texts which users of language create for a 'communicative purpose'. Corpora of spoken language represent a variety of genre types, including daily conversations, lectures, seminars, theater, radio and television scripts, and even classroom discourse. Written corpora can be classified into academic, journalistic, or literary prose (Conrad, 2005; Meyer, 2002; Biber & Conrad, 2001).
2. The concordancing software
The contemporary version of concordancing relies on a software program that can retrieve all instances of a linguistic form or structure in a corpus, together with the context in which each instance is used. On examining a word, for example, the program scans the texts in its storage, identifies all the occurrences of the word under examination, and lists these occurrences on the screen within their immediate context, a format known as key word in context (KWIC) (Barlow, 1996). These compiled lists are called concordance lists. They enable teachers and learners to observe words in their natural contexts (Biber, Conrad, & Reppen, 1998; Sinclair, 1991; Tribble & Jones, 1990), so that they can see how the words collocate with other words, which patterns they follow, which prepositions they go with, and so on (Willis, 1990).
Figure 2.1 KWIC results for the word 'reply', adopted from Collins Wordbank Corpus
lung cancer will be the swift reply. It often comes as a surprise
have to send you a disappointing reply, but the criteria for listing
thing from the life, and Paul's swift reply that he did not do much
[p] They gave him a blunt reply. This industry had been
CASH BONUS for your prompt reply! [p] And don't forget -- you
please forgive this very tardy reply. Profound apologies. But allow
I'm often asked. The honest reply, which is that I smoke, sleep,
London), working honest, genuine reply only. [p] 8389-Kind hearted,
sure is the somewhat nervous reply. [p] 10.30 am [p] So here I am
Prime Minister gave a confident reply to that prediction: Just let
[p] Yes," came the cautious reply. [p] My name's Kolchinsky. I
then gave an astonishing reply: `The only thing I can say is
out Despite his light-hearted reply, George's compassion and
came my daughter's sleepy reply.  There was no time to be
Andrews," came the muffled reply. `I spoke to you earlier on
I know it," came the indignant reply. `I'll be in touch.' Thanks,
He's dead," came the exasperated reply. `My source was a senior
No, I don't," was his blunt reply. So you think he's already in
English," came the indifferent reply as if it was of little
mccrain," came the gruff reply as the man smoothed down his
doctors," was the disconcerting reply. Drugs are now so widespread
about Mr Hume in an impassioned reply, talking about `a tragedy of
Villa" receives the startling reply: `Suicide is a perfectly
And the Yorkshiremen's instant reply - inevitably coming from Tony
Gorbachev is expecting a speedy reply from Iraq to his plan for
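The scanning and listing procedure described in this section can be illustrated with a minimal sketch. It is not the software used by any commercial corpus; the function name `kwic`, the `width` parameter, and the two sample sentences (taken from Figure 2.1) are assumptions made for illustration only.

```python
import re

def kwic(texts, node, width=30):
    """Return key-word-in-context lines for every occurrence of `node`.

    `texts` is a list of strings standing in for the corpus; `width` is the
    number of characters of context kept on each side (hypothetical default).
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the node word forms a column.
            lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines

# Two sentences borrowed from Figure 2.1 stand in for a real corpus.
sample_texts = [
    "They gave him a blunt reply.",
    "Please forgive this very tardy reply. Profound apologies.",
]

for line in kwic(sample_texts, "reply"):
    print(line)
```

Because the left context is padded to a fixed width, the query word lines up vertically, which is what lets a learner scan the words immediately before and after it.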
3. Data-driven learning and language teaching
The use of concordance software makes the teaching context more learner-centered; learners can be inspired to discover new meanings and to identify lexical and grammatical collocations (Johns, 1990). Johns refers to the learners studying concordance lists as "language detectives whose task is to discover the rules of the language they are focusing on by finding, identifying and inferring these linguistic implications from context" (1997, p. 101).
Johns coined the term data-driven learning (DDL) to refer to all corpus-based and concordance-based activities (Johns, 1990; 1991). Hunston (2002, p. 170) maintains that "DDL involves setting up situations in which students can answer questions about language by studying corpus data in the form of concordance lines or sentences". The proponents of DDL contend that concordances are superior to conventional grammar books, dictionaries and course books, because "they provide easy access to huge amounts of 'authentic' language in use, foster the learners' analytical capacities, promote their explicit knowledge of the L2, facilitate critical language awareness, and support the development of learner autonomy" (Gabel, 2001, p. 123). Johns (1991) argues that using traditional dictionaries for vocabulary learning is a tiresome and unproductive task, and that the limited, artificial examples given for each definition do not allow learners to fully grasp the various meanings of a single word. By contrast, when learners search for a word in context with a concordancing tool, they are engaged in a more efficient language learning experience. Chen (2004) adds that the integration of corpora into vocabulary learning fosters learners' motivation for learning.
Collocations are defined as "the occurrence of two or more words within a short span of each other" (Benson, 1990, p. 12). These groups of words "have been variously called prefabricated units, prefabs, phraseological units, lexical chunks, multiword units, or formulaic sequences" (Nesselhauf, 2005, p. 1). In ESL/EFL learning, knowledge of collocations is a requisite for achieving native-like competence. According to Lewis (1997), a focus on collocations is more important than grammar in language teaching, since grammar examines only the most general rules of the language. McCarthy (1990) emphasizes the importance of collocations and asserts that "collocation is a marriage contract between words and it forms an important organizing principle in the vocabulary of any language" (p. 124). Kjellmer (as cited in Shei & Pain, 2000) states that the 'automation of collocations' enables native speakers to utter sentences more fluently. Language learners, who lack this automation, are thus less fluent when using the language. Similarly, Aston (1995) notes that "the use of a large amount of prefabricated items speeds language processing in comprehension and production alike, and thus creates native-like fluency" (p. 64).
Hausmann (as cited in Nesselhauf, 2005, p.22) broadly divides collocations into six types:
adjective + noun (heavy smoker)
(subject-)noun + verb (storm-rage)
noun + noun (piece of advice)
adverb + adjective (deeply disappointed)
adverb + verb (severely criticize)
verb + (object-)noun (stand a chance)
Benson et al. (Ibid.) add prepositions to the above classification, yielding combinations such as:
noun + preposition (interest in)
preposition + noun (by accident)
adjective + preposition (angry at)
Collocations, with such varied classifications, are one of the most problematic issues in language learning (Dudley-Evans, 1994; McAlpine & Myles, 2003). Gui and Yang (2002) found that collocational errors were the most frequent errors made by Chinese EFL students. Similar findings were reported by Zarei and Koosha (2002) for Iranian students. Altenberg and Granger (2001) and Nesselhauf (2003) show that even advanced learners of English usually face problems in using collocations.
The DDL approach attempts to provide language teachers and learners with a new perspective to enhance lexical competence. In this way, corpus consultation tasks through concordance tools can be effectively utilized to determine the collocational relationships among words. Sinclair (1997) and Granger (1998) argue that DDL should best be exploited to teach collocations which are sometimes problematic for native speakers as well.
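One way a concordance tool can surface collocational relationships is by counting the words that appear next to the query word. The following is a minimal sketch under stated assumptions: the function name `left_collocates`, the simple tokenizer, and the three sample sentences (drawn from Figure 2.1) are illustrative and do not describe the procedure of any particular concordancer.

```python
import re
from collections import Counter

def left_collocates(texts, node):
    """Count the word immediately preceding each occurrence of `node`.

    Tokenization is deliberately crude (lowercased letter sequences with
    optional apostrophes and hyphens) and ignores sentence boundaries.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z][a-z'-]*", text.lower())
        for i, token in enumerate(tokens):
            if token == node and i > 0:
                counts[tokens[i - 1]] += 1
    return counts

# Sample sentences borrowed from Figure 2.1; a real task would use a full corpus.
sample_texts = [
    "They gave him a blunt reply.",
    "No, I don't, was his blunt reply.",
    "CASH BONUS for your prompt reply!",
]

for word, frequency in left_collocates(sample_texts, "reply").most_common():
    print(word, frequency)
```

A frequency list of this kind shows learners which adjectives typically precede the noun, which is the adjective + noun pattern in the classification above.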
5. Review of related research
Despite the scarcity of empirical research, the available literature indicates a relatively positive impact of corpora on learning vocabulary and collocations. Generally, research findings emphasize the benefits of corpus consultation for language learners (Bernardini, 2002; Kennedy & Miceli, 2001). Chan and Liou (2005) investigated how five web-based practice units on English verb-noun collocations, designed around a web-based Chinese-English bilingual concordancer (a keyword retrieval program), affected collocation learning. The findings indicated that learners' use of collocations improved significantly immediately after the online practice but regressed after some time. Even so, the final performance was still better than the students' entry level. In another study, Gordani (2012) examined the effect of integrating corpora into general English courses on the vocabulary development of two groups of learners. The results revealed that the experimental group outperformed the control group on the posttest, suggesting that the main effect of corpus integration was significant.
Çelik (2011) conducted a study to examine the effects of DDL, compared with dictionary use, on Turkish EFL learners' achievement and retention of collocational competence in two experimental groups in an online setting. The results revealed that the pretests and posttests did not show a significant difference. However, a later 'retention' test showed that the corpus-based learning group had a higher level of retention.
In the Iranian context, Koosha and Jaefarpoor (2006) investigated the effectiveness of concordancing materials, presented through the DDL approach, on the teaching and learning of the collocation of prepositions in two groups of university students. The findings of the study confirmed the positive effects of the DDL approach on learning and teaching the collocation of prepositions. They also found evidence of the influence of L1 collocational patterns on L2 production.
Due to the need for more research evidence on the effect of corpora and the DDL approach on learners' collocational competence in various settings, this study is conducted to investigate the effects of web-based concordancing instruction on Iranian EFL students' learning of collocations.
6. The Study
This experimental study will report on the effects of concordancing activities, performed through data-driven learning, on Iranian EFL learners' acquisition of collocations.
6.2. Research questions
Is the DDL approach an effective way to enhance Iranian EFL learners' knowledge of collocations, compared to the traditional approach?
Is there any significant difference between the control and experimental groups in terms of their acquisition of collocational vocabulary?
The participants in this study will be 100 Iranian university students, both male and female, majoring in English language. They are all sophomores studying at two universities in Tehran. They will be randomly assigned to control and experimental groups.
6.4. Materials and instruments
First, the Michigan Test of English Language Proficiency will be administered to determine the homogeneity of the groups in terms of their English proficiency.
A collocational knowledge test will be prepared by the researcher in order to determine the collocational knowledge of the students. Cronbach's alpha will be used to estimate the reliability coefficient of the test. The test will be given to all the participants as a pretest at the beginning of the semester and as a posttest at the end of the semester to determine the impact of the specific instruction the participants are going to receive.
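For reference, Cronbach's alpha for a test of k items is commonly computed as k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The sketch below applies that standard formula; the function name `cronbach_alpha` and the score matrix are hypothetical and are not data from this study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Estimate Cronbach's alpha.

    `scores` is a list of rows, one per test taker; each row holds that
    person's score on every item. Sample variances (n - 1) are used throughout.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: five test takers, four dichotomously scored items.
example_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print(round(cronbach_alpha(example_scores), 3))
```

Values closer to 1 indicate higher internal consistency; the threshold treated as acceptable is a matter of convention and is not specified in this proposal.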
The concordancing software which will be used in this study is The Corpus, with 65 million words covering different genres, domains, registers and modes.
Teaching materials on collocations for the control group will be selected from several reading textbooks which are usually taught at this level. The textbooks are Concepts and Comments by P. Ackert & L. Lee (2005); Active Skills for Reading 3, by Neil J. Anderson (2002); and Select Readings (Intermediate), by L. Lee and E. Gundersen (2002).
The materials for practicing collocations with the experimental group will be taken from The Collins Wordbank Corpus.
As mentioned above, the participants will first be randomly divided into control and experimental groups. Then, to determine the homogeneity of the groups with regard to their English proficiency, the Michigan English Language Assessment Battery (MELAB) will be administered to both groups.
In the next step, the participants will be pretested with the test of collocations. Then, the overall internal consistency of the collocational test will be calculated using Cronbach's alpha. At the third stage, the participants will attend the English vocabulary course, a weekly two-hour session, during a fifteen-week semester.
In each session, the control group will receive traditional vocabulary instruction, in which collocational patterns are taught through dictionary consultation and explicit explanation by the instructor in English or Farsi (the participants' L1). The collocational materials used for this group will be selected from their reading textbooks, including Concepts and Comments by P. Ackert & L. Lee (2005); Active Skills for Reading 3, by N. J. Anderson (2002); and Select Readings (Intermediate), by L. Lee and E. Gundersen (2002). The experimental group, on the other hand, will undergo direct data-driven instruction in which collocations are introduced through concordance lines presented to the participants in printouts. Subsequent data-driven tasks, based on the words collocating with the query word, will be presented by the instructor to the participants (see Appendices).
At the final stage, after the fifteen-session treatment, the posttest will be given in order to compare the mean scores of the groups.
6.6 Data analysis procedure
To find out the effectiveness of the DDL approach on the students' collocational competence in comparison with the traditional approach, the results of the pretest and posttest will be compared for the experimental and control groups. Thus, the collected data will be subjected to descriptive statistics, namely the minimum, maximum, mean, and standard deviation. To determine the homogeneity of the group variances, the same procedure will be followed for the Michigan test.
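The descriptive statistics named here can be computed with a few lines of code. This is a minimal sketch; the function name `describe` and the score list are hypothetical and do not represent results of the study.

```python
from statistics import mean, stdev

def describe(scores):
    """Return the descriptive statistics for one group's test scores."""
    return {
        "n": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
        "sd": stdev(scores),  # sample standard deviation (n - 1)
    }

# Hypothetical posttest scores for one group.
print(describe([10, 12, 14]))
```

Running the same function on the pretest and posttest scores of each group gives the four summaries that the comparison rests on.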
For the second research question, an independent-samples t-test will be conducted to determine the significance of the difference between the means of the two groups.
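The test statistic for an independent-samples comparison can be sketched as follows. The sketch assumes equal group variances (the pooled-variance form of Student's t-test), and the function name `independent_t` and the scores are hypothetical. It returns only the t value and the degrees of freedom; the corresponding p-value would be read from a t table or obtained from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical posttest scores for the experimental and control groups.
t_value, df = independent_t([2, 4, 6], [1, 3, 5])
print(round(t_value, 3), df)
```

If the variances of the two groups turn out to be unequal, a variant that does not pool the variances (Welch's t-test) would be the more cautious choice.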
7. Implications of the study
Knowledge of collocations plays a significant role in EFL/ESL learners' acquisition of native-like performance. The findings of this study will therefore shed light on the effectiveness of DDL through concordancing for learning collocations and raise practitioners' awareness of the potential of computer technology. In addition, corpus consultation enables language learners to notice, discover, and apply new technological methods of enhancing their proficiency and autonomy.
Since this study is conducted with lower-level learners than those in other studies, such as Koosha and Jaefarpoor (2006), the results will reveal the effectiveness of DDL for low-intermediate learners. On the other side of the coin, the limitations and shortfalls of DDL will also come to light through these experimental investigations. Thus, there will be room to examine concordancing activities with more precision in different educational settings.