Formulaic Language Sequence In Different Languages



Wray (2002) defines a formulaic sequence as 'a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar'. In linguistic traditions which place generative, syntactic rules at the centre of their theories and inquiries, formulaic and irregular language becomes a marginal or at best separate phenomenon, while rule-governed, created language is accorded central and, often by implication, high status, and is considered indicative of the sophistication reached by human language (e.g. Pinker & Prince, 1991; Pinker, 1994). Most recent studies converge on the label formulaic as an umbrella term and refer to specific manifestations of the phenomenon with additional labels. These manifestations include (oral) narratives, prayers, proverbs, social routines, non-compositional idioms, (more or less) transparent idioms, collocations, lexical bundles, sentence stems, complex word forms, frequently used sequences of words and clauses, fixed sequences, sequences with open slots which can be filled subject to varying levels of constraint, community-wide sequences and idiosyncratic sequences. Accounts of formulaic language do not only treat the phenomenon as a linguistic one; they also assume or suggest that formulaic sequences are processed and produced as wholes, that is, as single units, rather than being analyzed or generated. This is said to apply not only to non-compositional language such as opaque idioms, but potentially also to sequences which can in principle be analyzed.

Since the revival of interest in formulaic language in the mid-nineties, studies have examined its role in non-native language as a communicative strategy, a production and processing strategy, and a learning strategy, though not to equal degrees (Weinert, 2010). Most prominent has been the issue of native-like idiomaticity in the broad sense, that is, including non-compositional idioms, collocations and other conventional form-meaning pairings, their development, and how learners may be helped towards their use. Recent research shows a vigorous engagement with the challenges inherent in studying formulaicity and in verifying usage-based accounts of language. In fact, the point that formulaic language is an aspect of usage-based language alerts us to the possibility, even the likelihood, that we may never find the perfect methodology for demonstrating the difference between formulaic and non-formulaic sequences per se.


Here I present a brief overview of five articles on formulaic language (sequences) and draw attention to the importance and difficulty of learning and teaching such sequences.

1. Conklin and Schmitt (2008) investigated the processing advantage for formulaic sequences by comparing reading times for formulaic sequences and matched non-formulaic phrases in native and non-native speakers. Nineteen native English speakers, mostly undergraduates, formed the native group. The non-native group consisted of twenty L2 English speakers, mostly postgraduates on the MA-ELT program, from different L1 backgrounds. For the purposes of the study, the researchers chose mostly idioms as formulaic sequences, because idioms are clearly formulaic in nature: they carry idiosyncratic meanings which cannot generally be derived from the sum of the individual words in the string. They took some formulaic sequences from the materials used in Underwood et al. (2004) and added more from the Oxford Learner's Dictionary of English Idioms (Warren 1994). The candidate formulaic sequences were subjected to a frequency analysis based on the British National Corpus (BNC), and candidates with relatively low frequencies were deleted from the list. The remaining formulaic sequences were expected to be well known to the native participants. Twenty control phrases were then created to correspond to the formulaic sequences by reordering the words in each sequence, taking particular care that the controls could not be interpreted with either the idiomatic or the literal meaning of the formulaic sequence, for example hit the nail on the head → hit his head on the nail. Only function words were substituted; the main content words always remained the same. The target phrases (i.e. formulaic sequences or control phrases) were embedded in twenty passages. The context of each passage was written to force either an idiomatic or a literal interpretation of the formulaic sequence (e.g. take the bull by the horns = 'attack a problem' vs. 'wrestle an animal').
In total, there were 60 target phrases: 20 formulaic sequences presented in contexts supporting their idiomatic interpretation, the same 20 sequences presented in contexts supporting their literal interpretation, and the 20 control phrases. Thus each formulaic sequence had three conditions (idiomatic, literal, control). Each of the three conditions appeared once for each formulaic sequence, although not in the same passage. Each passage contained at least two and no more than four target phrases. Content words in the target phrases were not used earlier in the passage, in order to avoid priming effects. Finally, a multiple-choice comprehension question was devised for each passage to ensure that participants read the contexts conscientiously.

Passages were presented on a computer using a participant-paced, line-by-line reading procedure. Before beginning the experiment, participants read instructions that described the task. Each trial began by asking the participant to press the 'R' button to indicate that they were ready to begin reading a passage. Once 'R' was pressed, the first line of the passage appeared. Participants pressed the spacebar when they finished reading a line, causing the line they were reading to disappear and the subsequent line to be displayed. Participants were asked to read each passage for comprehension as quickly as possible. Reading times were collected for each line. When participants finished reading a passage, they were asked a comprehension question about it.

The mean reading times for the native and non-native speakers were submitted to separate one-way analyses of variance (ANOVA) with either participants or items as the random variable. The analysis indicated that formulaic sequences in contexts supporting their idiomatic interpretation were read more quickly than control phrases by both natives and non-natives. Formulaic sequences in contexts supporting their literal interpretation were also read more quickly than control phrases by both groups. In addition, there was no reading-time difference between formulaic sequences in contexts supporting their idiomatic interpretation and those in contexts supporting their literal interpretation. This pattern of results shows that a formulaic sequence has a processing advantage even when it is used literally. Furthermore, the pattern held for both groups, indicating that even L2 speakers have a processing advantage for formulaic sequences, whether they are used idiomatically or literally.

2. Wood (undated) conducted a case study to investigate the effects of focused instruction in formulaic sequences and fluency on the performance of a female Japanese L2 learner of English in two monologic speaking situations separated by a six-week interval. The participant was enrolled in intermediate-level classes in the program, had already been enrolled for a previous 12-week period, and was living in a homestay situation with native or native-like speakers of English. She was in her early twenties at the time of the data collection and was studying English in Canada for a year, having completed her undergraduate university degree in Japan.

During the interval, the learner participated in a series of weekly fluency workshops which focused on the key facilitating role of formulaic sequences. Her monologic speech was analyzed with respect to the length of runs between pauses and the speech rate, as well as the use of formulaic sequences. She was instructed to produce narratives spontaneously in the university language laboratory on topics of personal relevance, with no preparation time and no notes. The first speech sample was produced on the first day of the series of fluency workshops, before the start of the activities, and the second was produced a week after the end of the workshops. The workshops consisted of sessions of 90 minutes per week, for a total of nine hours over six weeks, and followed a sequence of input, automatization, practice and production, and free talk. The activities and the sequence were grounded in the existing literature on noticing, automatization and memorization, as well as the use of native speaker models and students as ethnographers.

The recordings were transcribed and hesitations marked in the transcripts. Speech rate (SR) was calculated as the number of syllables uttered per minute, and mean length of runs (MLR) was calculated as the total number of syllables uttered divided by the number of runs in a sample. Analysis of the data revealed some trends in the development of SR and MLR, the nature of learner use of formulaic sequences and the efficacy of focused instruction in formulaic sequences. In her initial narrative production, before the fluency workshop, the participant produced 18 formulaic sequences, 2 of which were present in the native speaker models of the workshop. In her second narrative, six weeks later, she produced 52 formulaic sequences, 18 of which were present in the native speaker models. That is, 11.8% of the formulaic sequences in the initial speech sample were from the workshop, while 36% of those in the second speech sample were from the workshop activities. The first sample consisted of 530 syllables overall, 60 (11.3%) of which were from formulaic sequences. The second sample consisted of 760 syllables overall, 95 (12.5%) of which were from formulaic sequences. In any event, it appeared that she was able to borrow formulaic sequences from the workshop models, work them into her own repertoire and fit them into her own narrative quite effectively for the most part. While she did not borrow large chunks of narrative wholesale from the workshop models, she did appear to have worked some useful sequences from the workshops into her own narrative. The result was increased fluency, particularly as measured by mean length of runs between pauses.
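The two fluency measures defined above, together with the share of syllables that fall inside formulaic sequences, can be sketched in a few lines of Python. This is only an illustration of the definitions given in the text, not the procedure used in the study: the function names are hypothetical, and the sample durations and the pause criterion that separates one run from the next are not reported here, so they would have to be supplied by the analyst.

```python
from typing import Sequence


def speech_rate(total_syllables: int, duration_seconds: float) -> float:
    """Speech rate (SR): syllables uttered per minute of speaking time."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_syllables / (duration_seconds / 60.0)


def mean_length_of_runs(run_syllable_counts: Sequence[int]) -> float:
    """Mean length of runs (MLR): total syllables divided by number of runs.

    Each entry is the syllable count of one run, i.e. one stretch of speech
    between pauses. How a pause is defined is an assumption left to the caller.
    """
    if not run_syllable_counts:
        raise ValueError("at least one run is required")
    return sum(run_syllable_counts) / len(run_syllable_counts)


def formulaic_share(formulaic_syllables: int, total_syllables: int) -> float:
    """Percentage of all syllables that belong to formulaic sequences."""
    if total_syllables <= 0:
        raise ValueError("total_syllables must be positive")
    return 100.0 * formulaic_syllables / total_syllables


if __name__ == "__main__":
    # Syllable counts reported for the two speech samples in the text.
    print(round(formulaic_share(60, 530), 1))
    print(round(formulaic_share(95, 760), 1))
```

Applied to the syllable counts reported for the two samples (60 of 530 and 95 of 760), `formulaic_share` reproduces the 11.3% and 12.5% figures quoted above.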

3. Bishop (2004) studied the problem second language learners have with noticing formulaic sequences. It was hypothesized that formulaic sequences are not noticed and, as a result, are not learned. To test this hypothesis, forty-four linguistically heterogeneous upper-intermediate ESL students of both genders were pre-tested for knowledge of target items on a computerized version of the Vocabulary Knowledge Scale (VKS). Target items consisted of 20 low-frequency words and 20 formulaic sequences synonymous with those words, presented in random order for each subject. Subjects' reading levels were also assessed with a TOEFL reading subtest. No difference in reading ability between groups was found. A week later, to allow for forgetting of the pre-test, subjects were randomly assigned to a control condition or a treatment condition. All subjects read identical texts, which differed only in that formulaic sequences were made typographically salient (color and underlining) in the treatment condition. After reading, subjects were asked to answer 20 true-false questions focused on sentences containing the target words. Subjects accessed an online glossary by single-clicking with the mouse on target words and double-clicking on target formulaic sequences. Whenever a gloss was requested for an unknown target item, the experimenter inferred that the lexical item had been noticed. All mouse clicks were tracked and stored.

Unknown words were clicked on significantly more often than unknown formulaic sequences. However, making formulaic sequences typographically salient significantly increased the number of times formulaic sequences (FSs) were noticed, i.e., clicked on. This also appeared to increase global comprehension of the text. Since the data could not safely be assumed to be normally distributed, the non-parametric equivalents of the t-test (Mann-Whitney) and the paired t-test (Wilcoxon signed-rank) were used. The number of clicks on unknown words did not vary significantly between the treatment and control conditions (Mann-Whitney p = 0.73), which was to be expected because the words were not altered typographically in either condition and so were presented identically. The major problem for the learner seemed to be the need to learn large numbers of holistically stored formulaic sequences which must coexist with a potentially vast number of similar-looking, but grammatically generated, sequences.

4. Schmitt (2005) recognized formulaic language as an important component of language usage and focused on two of its characteristics: fixedness and variability. The researcher examined variability in three kinds of formulaic language: idioms, variable expressions, and lexical bundles. In all of the corpus analyses in the paper, he referred to the Longman Corpus, a 100-million-word corpus based primarily on the British National Corpus. He started with grammatical and lexical variation, using the idiom stand shoulder to shoulder, as in:

- […] where the grizzled heroes finally stand shoulder to shoulder.

Anytime a phrase has a verb in it, that verb is likely to change its inflection according to the tense. Therefore, we find cases of simple present, past, and continuous forms:

- While France stands shoulder to shoulder with Germany […]

- Now the trees were fenced with armed men standing shoulder to shoulder.

In addition to this kind of grammatical variation, there are also cases of different word choices which do not change the meaning to any great degree:

- He and I fought shoulder to shoulder against appeasement.

- […] as they worked shoulder to shoulder in a school bus-size laboratory.

To explore how idioms can vary and to what degree, he first chose the idiom scrape the bottom of the barrel. This phrase has three content words: scrape, bottom, and barrel.

- The company evidently had to scrape the bottom of the barrel for material. (canonical form)

- I began to scrape the theoretical barrel-bottom.

- […] the poor buggers scrape the barrel; the whole of their midfield couldn't […]

- Even to produce that list, he'd had to scrape the barrel a bit.

- Being a grunt, you were like the bottom of the barrel.

Therefore, for scrape the bottom of the barrel, it seems that any two of the three content words scrape, bottom, and barrel are sufficient, and that they can occur in any order. It seems that the English speech community uses many variants of these idioms, with little in either one being frozen, in the sense of being absolutely required or of occupying a fixed position in the idiom.
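The "any two of three content words, in any order" observation can be turned into a very rough recognizer, sketched below in Python. This is an illustration of the idea only, not a method taken from the paper: the function name and the threshold parameter are hypothetical, the matcher ignores inflected forms such as scraped or scraping, and it cannot tell idiomatic from literal uses.

```python
import re
from typing import FrozenSet

# Content words of the canonical idiom "scrape the bottom of the barrel".
CONTENT_WORDS: FrozenSet[str] = frozenset({"scrape", "bottom", "barrel"})


def matches_idiom(
    sentence: str,
    content_words: FrozenSet[str] = CONTENT_WORDS,
    minimum: int = 2,
) -> bool:
    """Return True if at least `minimum` of the content words occur.

    Word order is ignored, and hyphenated forms such as "barrel-bottom"
    are split into their parts before matching.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(tokens & content_words) >= minimum


if __name__ == "__main__":
    examples = [
        "I began to scrape the theoretical barrel-bottom.",
        "Even to produce that list, he'd had to scrape the barrel a bit.",
        "Being a grunt, you were like the bottom of the barrel.",
    ]
    for example in examples:
        print(matches_idiom(example), example)
```

A matcher this crude would over-generate on real corpus data, which is one reason the variants discussed here still need to be checked by hand.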

Another kind of formulaic language, variable expressions, was also investigated. One common example, which occurred 125 times in the corpus, was:

─ think nothing of ─

It was found that in 125 out of the 170 cases of this expression (74%), the preposition of was included as part of the string, showing that it should indeed be considered as part of the canonical form, although the last two examples below show other variants.

- She thinks nothing of going out at ten o'clock at night.

- He thought nothing of playing in 10 or 11 consecutive events.

- […] adolescents capable of subduing the earth around them and thinking nothing of it […]

- Your average person in the States thinks nothing about going to Bali.

- Alan Beith thinks nothing to striding round five villages.

As these examples show, other than the grammatical inflection of think and alternative prepositions in 25% of the instances, the 'fixed' element of this expression is actually quite stable. This analysis as well as others suggested that the fixed elements of variable expressions are actually more fixed than the fixed elements of idioms.

Lexical bundles are extended collocations: bundles of words with a tendency to occur together. They are identified by using a concordancer to isolate the multi-word sequences which occur at least a minimum number of times.
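The identification step can be approximated without a dedicated concordancer by counting every n-word sequence in a text collection and keeping those that reach a frequency threshold. The Python sketch below assumes a simple lowercase tokenization and uses a hypothetical function name; the bundle length and the cut-off are parameters the analyst would choose, and the two sample sentences are for illustration only.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def lexical_bundles(texts: Iterable[str], n: int = 4, min_count: int = 2) -> Dict[str, int]:
    """Count n-word sequences and keep those occurring at least min_count times.

    Sequences are counted within each text, so they never span two texts.
    """
    counts: Counter = Counter()
    for text in texts:
        # Lowercase and keep letters and apostrophes, so "Let's" stays one token.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {bundle: c for bundle, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    sample = [
        "Let's have a look at your discovery.",
        "Let's have a look at what happens when we try.",
    ]
    for bundle, count in lexical_bundles(sample).items():
        print(count, bundle)
```

On a real corpus the threshold is usually expressed relative to corpus size rather than as a raw count; the raw count is used here only to keep the sketch short.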

The first lexical bundle examined was have a look at, which occurred 756 times in the corpus.

- Let's have a look at your discovery.

- Let's have a look at what happens when we […]

The researcher searched the corpus for this string, but used a wildcard in place of the content word look. He found 1297 cases of have a X at, but several 100-line samples produced no substitute word meaning 'look'. It seemed that this bundle had no common variant which substituted for the content word look. A search for X a look at produced 510 cases of take a look at. This could be interpreted either as a very common variant of have a look at, or as a separate bundle in its own right. Either way, there were very few variations besides these two main forms, indicating that this lexical bundle was relatively fixed.
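A wildcard search of this kind can be imitated with a regular expression whose capture group stands in for the open slot. The Python sketch below is an assumption about how such a query might be written, not a description of the concordancer used in the study; the function name is hypothetical and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable


def slot_fillers(lines: Iterable[str], pattern: str) -> Counter:
    """Count the words that fill the single capture group in `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    fillers: Counter = Counter()
    for line in lines:
        for match in regex.finditer(line):
            fillers[match.group(1).lower()] += 1
    return fillers


# "have a X at": wildcard in place of the content word.
HAVE_A_X_AT = r"\bhave a (\w+) at\b"
# "X a look at": wildcard in place of the verb.
X_A_LOOK_AT = r"\b(\w+) a look at\b"

if __name__ == "__main__":
    # Invented sample lines; a real query would run over corpus text.
    sample = [
        "Let's have a look at your discovery.",
        "They have a house at the coast.",
        "Take a look at the second chart.",
    ]
    print(slot_fillers(sample, HAVE_A_X_AT).most_common())
    print(slot_fillers(sample, X_A_LOOK_AT).most_common())
```

As in the corpus search described above, most fillers returned by the first pattern will be unrelated to the bundle (house in the invented sample), so the output still has to be inspected manually for words that could actually substitute for look.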

The analysis of this lexical bundle and of other examples showed that lexical bundles were not always fixed, in the sense of being the only form which could impart a certain meaning. Even here there was variation, although this probably depended a great deal on the individual bundle. If a bundle had a modal verb, it was likely to allow other modals; also, some bundles seemed to allow variation in content words while others did not. For example, in the 3-word lexical bundle I want to, want could easily be replaced by wish or like, but in the bundle the number of, it was difficult to think of any content word that could replace number and mean the same thing (amount and degree would change the meaning somewhat). The bottom line, however, is that lexical bundles did contain variation.

How are formulaic sequences with variation stored and processed by the mind? The 'holistic storage' theory, in which formulaic sequences are stored as individual memorized chunks, is the commonly espoused view. This approach seems to make good sense as long as the formulaic sequences are intact, unchangeable wholes, but poses problems once variation is introduced. For example, how are novel variations of the canonical form recognized? We know that proficient users are creative with formulaic sequences, and once a formulaic sequence is established in a speech community, it is often truncated or the order of its main components switched around. Consider these two examples:

- I hated taking the subways in Japan. The sardine-like train cars always made me sick. (Related to packed like sardines)

- Wasting time in meetings drives me crazy. The worst are the bush-beaters. It would be better if they were expelled immediately. (Related to beat around the bush)

If you were able to make these connections and catch the meaning, then you were able to interpret these completely novel formulaic sequences, even though you had never seen these forms before, and even though the forms themselves are quite dissimilar to the underlying canonical form. The problem this presents for holistic storage is obvious: these forms were not previously stored in your mind, yet you were in all likelihood able to interpret them. It follows that some other process is necessary to help the mind make this type of connection. It seems that the difference in variation implies that there are differences in storage and processing as well.

First, let us discuss idioms. The complete canonical sequence is stored and is used as a kind of template or exemplar. In this template, the key components would be the content words, particularly lower-frequency ones which are less likely to combine widely. The template can be accessed via one or more of the content words, but it is not usually necessary for all of them to be recognized for access to occur. In contrast to content words, function words are less useful in accessing the template. The sense is that most templates have a 'core collocation' (usually made up of content words) which reliably leads to access of the template.

Variable expressions have different characteristics. If we assume that formulaic sequences with some fixed elements and some open elements can be stored as an individual chunk, then the limited number of variants may well be stored as separate forms. In other words, if we assume that the following expressions:

a minute ago, a day ago, a week ago, a month ago, a year ago

are all actually stored as the single variable expression a (time period) ago, then there is only one form to store, as the fixed elements a and ago cannot be changed. Even in variable expressions which do contain some variation, it is usually not great, and the few additional forms would not pose an onerous burden to memory. Variation which does exist is often in the tense/modal constituents (stand shoulder to shoulder / stood shoulder to shoulder).

Lexical bundles are similar to variable expressions in that they are relatively fixed, and so by analogy may also be stored individually. However, there is at least one key difference. Variable expressions have a close connection with meaning and functional language use. Lexical bundles, on the other hand, have been identified by corpus statistics and often have less obvious relationships with any particular meaning or language function. For example, you know what and the fact that occur frequently as part of language, but do not seem to realize any unique meaning or function in their own right. Rather, they are building blocks which come to gain meaning once combined with other words or lexical bundles. Given this lack of dedicated meaning, it is questionable whether they are actually stored in a formulaic manner at all. In fact, there is some preliminary evidence that at least some of them are not stored holistically.

5. Girard and Sionis's (2004) study dealt with the use of Formulaic Speech (FS) in the context of the partial L2 immersion class. The study was based on the observation and recording of oral communication in a class of fifteen children aged five to seven, learning English in a partial immersion program (some subjects of the curriculum were taught in English every afternoon) in a small primary school. In the partial immersion program observed in the study, the second language tended to be the main object of the teaching, except for one session during which the children used English to make a cake. During that session, language production in English was far less important than in other sessions, because the children were not forced to produce through practice drills. The other sessions followed a role-play pattern, related to FS and gestalt learning. All the participants were French, with French-speaking parents, except for one Rumanian boy, whose mother tongue was Rumanian, who had already started learning English in an international school and was currently learning French. The recordings covered a period of two months, from the end of November 2000 to the end of January 2001. English was introduced to the children mainly as a subject to be learned rather than as a medium through which other subjects are learned; nevertheless, the observation did provide a few insights into the various functions of FS in second language production.

As far as data collection itself was concerned, the researchers transcribed only sequences in which English was used and numbered them from (1) to (396), according to boundaries corresponding to a minimal exchange of information. A 'minimal exchange' here means an interactive situation in which a piece of information provided by one or several speakers is complete.

Observations of FS in the partial immersion class indicated that it played a part in the adaptation of the learner's speech to the context, and therefore helped the learner to reach efficient communication with few linguistic tools. FS subsequently had an important role in the development of communicative competence, which goes along with social integration in the native speaker community.

FS might be integrated into the general speech structure either by juxtaposition, embedding or gradual evolution. In all cases, FS in the framework of linguistics is seen as a temporary stage of acquisition. If this is so, then FS is likely to be discarded once the next developmental stage is reached. But that does not seem to happen, and one might wonder if FS should be considered as a stage of acquisition or as a learning strategy.

It is clear that FS is not only an interlanguage short-term production strategy used as a response to a lack of structural knowledge (since it is also found in the speech of native speakers who presumably have full structural knowledge), but also a learning strategy which corresponds to choices in processing made by the learner.

The psycholinguistic function of FS, to spare effort in interlanguage use, has two advantages for the learner: it leads to greater fluency, and it gives time for pre-planning activities or other types of processing, thereby providing a prop for acquisition processes.

FS is strongly linked to the communicative needs of the classroom situation, and it is more likely to appear in very socially and culturally defined situations. Therefore, learners have to be able to comprehend the communicative situation to adapt to the native speaker community. Formulaic sequences are efficient tools that allow learners to make themselves understood quickly and easily by native hearers.