Although a number of studies on segmentals have been reported over the past fifteen years (for an overview, see Ekman, 2003; Strange, 1995), there are only a small number of studies focusing on L2 stress in an EFL context.

On the other hand, suprasegmental properties, including stress, play an important role in second language acquisition. They are shown to be closely related to foreign accent perceived in L2 production and to difficulties in L2 perception. Researchers have attributed the problems with stress to the influence of the L1 prosodic system. However, these studies are inadequate, as their focus on stress acquisition mainly relies on the comparison of the phonological systems of L1 and L2. As Flege (1987) pointed out in research on L2 speech development at the segmental level, it is important to take phonetic details into account in order to gain a better understanding of the possible transfer of L1. The same is true for studies of prosody. It is possible that the influence of L1 lies in the difference between L1 and L2 in the employment of relevant phonetic correlates.

2.2. The history of pronunciation teaching within a theoretical framework

Popular opinion regarding the place of pronunciation training in the ESL or EFL curriculum has ebbed and flowed along with the historical framework of language learning theories and methodologies. Prior to the popularity of the direct method in the late nineteenth century, pronunciation received little overt focus within the language classroom.

Advocates of the direct method claim that an initial emphasis on listening without pressure to speak allows learners to acquire grammar inductively and to internalize the target sound system before speaking, much the way children acquire their first language (Celce-Murcia, Brinton, & Goodwin, 1996, as cited in Aufderhaar, 2004).

Although popular in elite private European schools, the direct method was rejected by the public schools and by most language schools in the United States as impractical due to the classroom time, effort and background required of both the teacher and students for the success of this approach. Criticism centered on the time-consuming nature of this instruction at a time when most students studied a foreign language for only two years, along with a lack of qualified teachers who had a comfortable, native-like fluency at their command. As a result, this essentially intuitive-imitative approach gave way to the return of the grammar translation approach of the reading era, with very little attention to pronunciation (Celce-Murcia et al., 1996, as cited in Aufderhaar, 2004).

According to Aufderhaar (2004), both the direct and grammar translation methods were more emphasized when there was a sudden and urgent need for qualified interpreters and intelligence to learn English with the advent of World War II. Rooted in Skinner’s (1957) theory of behaviorism which treated the acquisition of verbal skills as environmentally-determined stimulus-response behavior, the audiolingual method required intensive oral drilling for entire working days, six days a week (E. R. Brown, 1997). In contrast to the grammar-translation method, pronunciation was now considered to be of the highest priority, with phonetic transcription and articulation explicitly taught through charts and demonstrations, along with imitation (Celce-Murcia et al., 1996, as cited in Aufderhaar, 2004).

While generally proving successful within the military environment of small classes of highly motivated instructors and students whose well-being depended in part on their command of the target language, the theoretical foundation of audiolingualism was shaken by the reality of the post-World War II language classroom, which was not conducive to this military regimen. Its strongest critic was Chomsky (1957), whose introduction of the generative-transformational theory viewed the underlying meaning of the whole as being more important than any one part. His focus on the creative, rule-governed nature of competence and performance led many educators to the conclusion that pronunciation should remain inductively within the context of morphology and syntax (Kreidler, 1989). At the heart of this hypothesis was the suggestion that all language skills, including listening comprehension, verbal production and pronunciation, are so integrated that there is no need to address them as separate and distinct features (Brown, 1997).

The influence of Chomsky’s generative-transformational theory, along with the cognitive-code theory of the 1960s, which focused on listening at the discourse level and discarded skill ordering, paved the way for the trend to avoid or ignore direct pronunciation teaching altogether. The advent of the communicative approach in the late 1970s and early 1980s likewise deemed the teaching of pronunciation ineffective and hopeless; instead, it emphasized language functions over forms, with the goal being overall communicative competence and listening comprehension for general meaning. MacCarthy (1976) stated that “at present any teaching of pronunciation is so ineffective as to be largely a waste of time” (p. 212). At that time, many instructors of the communicative approach assumed that pronunciation skills would be acquired naturally within the context of second language input and communicative practice.

However, pronunciation was not entirely ignored in the time period of the 1960s through the mid 1980s. Remnants of the audiolingual approach lingered within structural linguistics, which viewed language learning as a process of mastering hierarchies of structurally related items for encoding meaning (Morley, 1991). When pronunciation was addressed, instruction was generally oriented toward the drilling of individual sounds via articulatory descriptions and minimal pair contrasts (Chun, 2002).

It is the reliance on this traditional phonemic-based approach which Leather (1987) mentions as one of the reasons for the demise of pronunciation teaching during this era: “The process, viewed as meaningless non-communicative drill-and-exercise gambits, lost its appeal; likewise, the product, that is, the success ratio for the time and energy expended, was found wanting” (Morley, 1991, p. 486). Attitudes ranged from serious questioning as to whether pronunciation could be overtly taught and learned at all (Chun, 2002), to unwavering claims that adults were simply unable to acquire second language pronunciation (Scovel, 1988).

According to Madsen and Bowen (1978), the lack of attention to pronunciation that was prevalent in the communicative approach of the late 1970s and early 1980s, together with the direct assertion by many that pronunciation could not be taught, resulted in a great number of international students who failed to communicate effectively or even intelligibly despite a long period of instruction. This situation sparked research in second language acquisition that suggested a departure from the traditional, bottom-up phonemic-based approach to pronunciation teaching toward a top-down orientation focusing on suprasegmental or prosodic aspects such as rhythm, intonation, and duration.

Defined by Wennerstrom (2001, as cited in Aufderhaar, 2004) as “a general term encompassing intonation, rhythm, tempo, loudness, and pauses, as these interact with syntax, lexical meaning, and segmental phonology in spoken texts” (p. 4), prosody has historically been ignored or relegated to the fringes of research and pedagogy, due in large part, according to Chun (2002), to its inherent complexity and the difficulty of mastering it. Bolinger (1972) labeled intonation, the most controversial aspect of prosody and one considered notoriously difficult to acquire and define, the “greasy part of language.”

Despite its historical back-seat status, an undercurrent of research regarding prosody has spanned several disciplines. The first documented study of speech melody has been traced back to Steele (Couper-Kuhlen, 1993, as cited in Aufderhaar, 2004), who, in 1775, used musical notation to identify pitch variations that occur in regular forms upon syllables. Unfortunately, his materials, based on five features he identified as accent, quantity, pause, emphasis and force, were dependent upon fixed and absolute musical pitches rather than flexible and relative tones, and were apparently lacking in practical applicability (Pike, 1945).

2.3. Pronunciation research in applied linguistics

Although attaining native-like pronunciation that facilitates mutual intelligibility is considered important for many language learners and teachers alike, there have been few empirical studies of pronunciation in applied linguistics (Derwing & Munro, 2005; Levis, 2005). For example, Derwing and Munro (2005, p. 386) state that “it is widely accepted that suprasegmentals are very important to intelligibility, but as yet few studies support this belief.” This claim is supported by other researchers such as Hahn (1994) and Levis (2005), who states that over the past 25 years there has been encouragement to teach suprasegmentals, though very little pedagogy has been based on empirical research.

The usefulness of empirical research for developing more effective pronunciation teaching is obvious. As Levis (2005) states, “instruction should focus on those features that are most helpful for understanding and should deemphasize those that are relatively unhelpful” (pp. 370-371). Munro (2008) echoes this point when stating that “it is important to establish a set of priorities for teaching. If one aspect of pronunciation instruction is more likely to promote intelligibility than some other aspect, it deserves more immediate attention” (p. 197). Of course, we must first know what the most important elements are to ensure optimal instruction and learning outcomes. As Munro (2008) argues, “because prosody encompasses a wide range of speech phenomena, further research is needed to pinpoint those aspects of prosody that are most critical” (p. 210).

Hahn (2004, p. 201) agrees that there is little empirical support for claims that teaching suprasegmentals is helpful and that “knowing how the various prosodic features actually affect the way native speakers…process nonnative speech would substantially strengthen the rationale for current pronunciation pedagogy.” For that reason, Hahn (2004) reiterates that it is important to identify the phonological features that are most salient for native listeners. Due to the complex relationship between suprasegmentals and intelligibility, Hahn (2004) argues that “it is helpful to isolate particular suprasegmental features for analyses” (p. 201). Hahn’s argument supports the importance of the research in this dissertation in which the acoustic correlates of English lexical stress are isolated and manipulated individually to identify which are the most pertinent to the perception of speech intelligibility and nativeness.

Levis (2005) states that pronunciation teaching has been a study in extremes in that it was once considered the most important aspect of language learning (when audiolingual methods were favored) and then became very much marginalized in communicative language teaching. Of the research that has been carried out, such as that on intonation patterns, little of it finds its place in pronunciation textbooks (Derwing, 2008; Derwing & Munro, 2005; Levis, 2005; Tarone, 2005). Therefore, there is a need to first fill a gap in empirical research treating aspects of second language pronunciation and then to ensure that these findings are relayed to professionals in the fields of education and applied linguistics so that L2 students can benefit from these findings.

Once a general framework for the delivery of instruction is chosen, the next step in designing a course of any type is to consider the needs and desires of the students and create course objectives and learning outcomes. As stated earlier, ESL students are typically concerned with issues such as intelligibility, accent and nativeness. Students often voice their goals regarding attaining proficiency in these areas and teachers should consider which goals are realistic (Avery & Ehrlich, 1992). To do so, the students’ current abilities must be assessed in order to target strategies that will help achieve these goals.

Assessing students’ abilities is crucial in planning pronunciation teaching. Derwing (2003; 2008) stresses that each student should be assessed individually to identify the student’s strengths and weaknesses and determine individual needs in pronunciation. These assessments can be done in a formal or informal way by the teacher and can include self-reports or self-assessments by the students. Self-assessments by students can provide insight into the students’ perceived needs, although these needs may be biased by the students’ previous experience with pronunciation instruction. Derwing (2003) found that “of the pronunciation problems identified [by the students], roughly 79% were segmental [in nature], while only 11% were related to prosody” (p. 554). In other words, students are simply more aware of segmental elements than they are of prosodic ones due to more previous instruction on segmental elements.

Once evaluations have been completed, the question becomes how to address the language learners’ pronunciation issues. A complication arises at this point because students in ESL classes typically come from very mixed language backgrounds. Even the varying needs of students in EFL classrooms, where all learners are from the same native language background, can be challenging as individual students have individual needs.

Therefore, integrating pronunciation lessons into class activities can be challenging in ESL classrooms as a particular speaker (or group of speakers) may have little difficulty with a particular element of pronunciation while others have great difficulty. A well-known example is Japanese speakers’ difficulty acquiring /r/ and /l/ (Bradlow, 2008) which does not cause any trouble for Spanish speakers. As Derwing (2003) advises, focusing heavily on segmental instruction in mixed classrooms is inappropriate due to the variety of language backgrounds and, therefore, prosody should be emphasized as it can have greater importance for a larger diversity of students. Derwing (2008) also argues that instruction in prosody transfers better to spontaneous speech than instruction on segmentals.

Many instructors are reluctant to teach pronunciation and often unsure how to go about doing it (Derwing & Munro, 2005; Hewings, 2006) as they feel underprepared or have little support in terms of course materials. Derwing (2003) estimates that only about 30% of pronunciation teachers have formal linguistic training in pronunciation pedagogy. To address this issue, it is important that empirical research on pronunciation be conveyed in a clear manner to language teachers so that they can pass this information along to students.

To be certain, pronunciation should be considered an important element of ESL classroom instruction. It has been noted above that pronunciation is implicated in critical elements of communication such as speech intelligibility, and can also affect perception of nativeness. In addition, accurate pronunciation is critical for students needing to pass standardized English tests such as the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) for entrance into colleges and universities in English-speaking countries, or when interviewed by entities such as the Foreign Service Institute which assesses not only a person’s grammar and vocabulary but also comprehension, fluency and accent in oral interviews (Varonis & Gass, 1982).

Pronunciation is also a key element in programs that prepare international teaching assistants to become teachers in American classrooms (Hahn, 2004; Wennerstrom, 1998).

2.4. The reasons for teaching pronunciation

One of the most urgent reasons for effective pronunciation instruction centers on the large number of non-native English speakers attending American colleges and universities. According to the Institute of International Education, these students numbered 547,867 in the 2000/2001 school year, with a substantial number serving as graduate teaching assistants. The increase in the hours of classroom instruction given by non-native speakers has led to a corresponding decrease in student satisfaction with the quality of instruction, due mainly to the reported difficulty of following non-native classroom presentation (Ostrom, 1997, as cited in Aufderhaar, 2004).

A survey by Shaw (1985, as cited in Aufderhaar, 2004) revealed that having an instructor with foreign-accented speech is the highest of six areas of potential frustration for college students. Accordingly, previous studies conducted by Hinofotis and Bailey (1980) on non-native university teaching assistants revealed a threshold level of understandable pronunciation in English, below which the non-native speaker will not be able to communicate orally regardless of his or her level of control of English grammar and vocabulary. While some instructors and administrators within the field have historically dismissed these problems simply as a matter of not having enough exposure to the spoken target language (Moy, 1986), other well-meaning instructors attempting to deal with this need have often relied on minimal pair drills, repetition and articulatory instruction with poor results (MacDonald, Yule, & Powers, 1994).

According to Aufderhaar (2004), research in second language acquisition that suggested a departure from the traditional, bottom-up phonemic-based approach to teaching toward a top-down orientation emphasizing suprasegmental or prosodic aspects such as rhythm, intonation, and duration revealed a need to increase adult learners’ awareness of suprasegmental patterns of the target language at the discourse level.

Chun (2002) advocates five principles for teaching intonation (sensitization, explanation, imitation, practice activities, and communicative activities) and stresses the need for focused listening practice requiring the identification of suprasegmental features within a context of various authentic speech samples representing different speaker roles and relationships.

2.5. The sound system of English

According to the Contrastive Analysis Hypothesis (CAH), the unequal features between languages are the main source of errors. Lado (1957, as cited in Gass & Selinker, 2008, p. 96) claims that “those structures that are different will be difficult because when transferred they will not function satisfactorily in the foreign language and will therefore have to be changed”. In order to understand the role of the first language in the phonological acquisition of the second language, emphasis has been given to studies that have focused on the differences between the English and Persian phonological systems. As Celce-Murcia, Brinton, and Goodwin (1996) state:

“all languages are unique in terms of their consonant and vowel systems. In linguistics, these distinctive characteristics have been divided into segmental and suprasegmental features. The segmental features of a language relate to consonants and vowels, whereas suprasegmental aspects of a language are involved with word stress, intonation, and rhythm” (p. 35).

2.5.1 English Consonants and Vowels

Standard American English includes 24 consonants and 22 vowels and diphthongs; however, a study performed on American English asserted that “there are similarities among consonants that permits us to classify them into groups; the classification can be done according to various criteria” (Olive, Greenwood, & Coleman, 1993, p. 22). They suggested that consonants could be classified based on voice, place, and manner of articulation; therefore, according to their common characteristics, which include their location inside the mouth, they can be grouped together (Olive, Greenwood, & Coleman, ibid, p. 22). Table 2.1 presents the English consonants.

Table 2.1. English Consonants

The most common vowels in English have been classified in accordance with how the tongue shapes them, and “while the consonant sounds are mostly articulated via closure or obstruction in the vocal tract, vowel sounds are produced with a relatively free flow of air” (Yule, 2006, p. 38). Therefore, vowels can be classified based on the movement of the tongue, lips, and jaw. The vowels of English have been characterized as low, mid, or high, which describe the height of the tongue, whereas features such as front, central, or back refer to the position of the tongue inside the mouth (Barry, 2008, p. 21).

Table 2.2. English Simple Vowels

2.6. The Pronunciation Errors of Persian Speakers and the Negative Transfer of Learned L1 Habits into English

Major (2001) addressed the issues in L2 phonology and how L1 phonological features can be transferred to the L2 when the sound pattern and word stress of the L2 differ from those of the L1. A foreign or nonnative accent can be detected more easily in a formal and longer conversation because in short conversation the speaker can produce words or sounds that are similar to the L2 in terms of segmental and suprasegmental features of language. The overall impression that native speakers form of whether, and to what degree, a person sounds native or nonnative is called global foreign accent (Major, 2000, p. 19). The measurement of global foreign accent is essential as it indicates at what stage of language development pronunciation is acquired.

Moreover, Nation and Newton (2009) stated that the goal of pronunciation instruction is to increase the intelligibility of second language speakers, although factors such as the age, L1, perspectives, and attitudes of the learner can affect the learning of the second language phonological system. “There is clear evidence that there is a relationship between the age at which a language is learned and the degree of foreign accent” (Patkowski, 1990, as cited in Nation & Newton, 2009, p. 78). However, pronunciation has been identified as one of the important aspects of second language acquisition as it plays a crucial role in spoken conversational interactions and intelligibility.

Although some studies indicated that it is impossible for adult learners to acquire native-like pronunciation, Boudaoud and Cardoso’s (2009) study suggested that learners’ proficiency level in English could affect their pronunciation. They compared the phonological features of Persian with four languages: Spanish, Japanese, Portuguese, and Arabic and asserted that these languages prevented their speakers from producing the /s/ consonant when learning English. The study focused on four research questions related to the production of /s/ consonant by Persian speakers and the factors that affect the acquisition of English as a second language. The findings indicated that /st/ and /sn/ were more difficult to produce than /sl/ and suggested that error production decreased as the proficiency level increased.

Furthermore, Paribakht (2005) “examined the relationship between first language (L1; Persian) lexicalization of the concepts represented by the second language (L2; English) target words and learners’ inferencing behavior while reading English texts” (p. 701). This study emphasized the pronunciation errors that English majors produce in Iran when they read English texts. The study asserted that students’ errors in reading stemmed from their lack of knowledge in English vocabulary rather than the inability to produce the English sound system. The research questions examined whether lexicalization helped students identify the meaning of unfamiliar words. The findings also showed that students relied on their L1 when they were not provided with the synonym of an unfamiliar word.

Sadeghi (2009) focused on “collocational differences between the L1 and L2 and [suggested] implications for EFL learners and teachers” (p. 100). This study addressed the errors that Iranian EFL students make when they learn English, and it stated that these errors stemmed from the differences between Persian and English. The study compared Persian and English collocations and focused on the transfer of L1 habits into L2. The aim of the study was to find out whether students made the same errors regardless of their proficiency level in the English language. Lower level students tend to transfer L1 habits into L2 more frequently as a result of their lack of knowledge in the target language. However, the transfer of Persian vowels and diphthongs into English pronunciation can also be observed among advanced learners of English.

Research related to the differences between the phonological systems of English and Persian provides a general overview of the difficulties ESL students may encounter when teachers focus on pronunciation, intonation, and word stress.

2.6.1. Common consonant errors of Iranian EFL learners

Persian speakers tend to place a vowel after each consonant; therefore, the following errors can be predicted when Persians pronounce English words: bread, script, and scramble are pronounced as [bɛɹɛd], [ɛskiɹipt], and [ɛskɛɹæmbɛl]. Furthermore, according to the contrastive analysis of English and Persian conducted by Yarmohammadi (1969, 1996) and Wilson and Wilson (2001), the following negative transfer of learned L1 habits into English can be expected from Persian speakers of English.

1. Stop consonants such as /p/, /b/, /t/, /d/, /k/, /g/ are articulated with a stronger puff of air. /k/, /p/, /g/ and /t/ become aspirated when they are placed in the post coda position. Words such as bank, tap, king, and rest are pronounced as [bænkʰ], [tæpʰ], [kɪngʰ], and [ȷɛstʰ].

2. Fricatives such as /v/, /θ/, /ð/, and /s/ are substituted and articulated for other consonants such as /w/, /t/ and /s/, /z/ and /d/, /ɛs/ (no initial consonant cluster). West, three, father, and school are pronounced as [vɛstʰ], [sɛȷi] or [tɛȷi], [fαdɛȷ], and [ɛskul].

3. Nasal consonant /ŋ/ is articulated as /n/ and /g/. Therefore, sing is pronounced as [sɪngʰ]. /m/ and /n/ are also articulated with a stronger puff of air and they may sound like /ɛm/ and /ɛn/.

4. Lateral liquid consonant /l/ can be pronounced with a stronger puff of air /ɛl/ when it is placed at the end of a word such as tell.

5. The retroflex liquid /ȷ/ is trilled and it is produced with the vibration of the tongue.

6. The glide consonant /w/ is replaced by /v/ since /w/ does not exist in Persian consonants. Therefore, flower is articulated as [fɛlavɛɹ].

2.6.2. Common vowel errors of Iranian EFL learners

According to the contrastive analysis of English and Persian conducted by Yarmohammadi (1969, 1996) and Wilson and Wilson (2001), the following negative transfer of learned L1 habits into English can be expected from Persian speakers of English:

1. /ɛ/ and /æ/ can substitute for one another; therefore, [bæt] is articulated as [bɛt].

2. /α/ replaces /ʌ/. [lʌk] is articulated as [lαk].

3. /ʊ/ replaces /u/. [ful] is pronounced as [fʊl].

4. /ɪ/ replaces /i/. [bit] is articulated as [bɪt].

5. /j/ replaces /i/ if placed in initial position. [twin] is articulated as [tujin].
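
The substitution patterns in items 1 to 4 above can be read as a simple lookup from a target vowel to the vowel a learner is predicted to produce. The following Python sketch is only an illustration of that reading and is not taken from the cited contrastive analyses: the names `VOWEL_SUBSTITUTIONS` and `predict_vowel_errors` are hypothetical, each pair follows the direction shown in the bracketed example of the corresponding item, item 1 is modeled in one direction only (although the text describes it as mutual), and the position-dependent item 5 is left out.

```python
# Hypothetical illustration; the pairs follow the bracketed examples in items 1-4.
VOWEL_SUBSTITUTIONS = {
    "æ": "ɛ",  # item 1: [bæt] -> [bɛt] (one direction only, by assumption)
    "ʌ": "α",  # item 2: [lʌk] -> [lαk]
    "u": "ʊ",  # item 3: [ful] -> [fʊl]
    "i": "ɪ",  # item 4: [bit] -> [bɪt]
}


def predict_vowel_errors(transcription: str) -> str:
    """Replace each listed target vowel with its predicted substitute.

    Each symbol is looked up once, so a substituted vowel is never
    substituted again within the same pass.
    """
    return "".join(VOWEL_SUBSTITUTIONS.get(symbol, symbol) for symbol in transcription)


# Apply the mapping to the target forms used as examples in the list.
for target in ["bæt", "lʌk", "ful", "bit"]:
    print(target, "->", predict_vowel_errors(target))
```

A lookup of this kind only restates the listed pairs; it makes no claim about how often, or in which phonetic contexts, a given learner would actually produce each substitution.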

2.7. The Importance of suprasegmentals and stress in L2 acquisition

2.7.1 The importance of suprasegmentals

Pronunciation is always a difficult step in learning a second or foreign language, especially for adults. Learners may have acquired perfect reading and writing skills while still being unable to communicate functionally in L2.

Problems in pronunciation can be traced to segmental as well as suprasegmental difficulties. Although most previous research has been conducted on the segmental level, recent studies show that suprasegmentals may play a more important role than segmentals in the acquisition of a second language phonological system (Anderson, Johnson, & Koehler, 1992; Derwing, Munro, & Wiebe, 1998). Anderson et al. (1992) investigated nonnative pronunciation deviance at three different levels: syllable structure, segmental structure and prosody. The correlation between the actual deviance at the three levels and nonnative speakers’ performance on the Speaking Proficiency English Assessment Kit (SPEAK) Test was calculated. It was shown that while all three areas had a significant influence on pronunciation ratings, the effects of the prosodic variable were the strongest.

In Derwing, Munro, and Wiebe’s (1998) study, native speakers were invited to evaluate the final results of three types of instruction, i.e., segmental accuracy; general speaking habits and prosodic factors; and no specific pronunciation instruction, after a 12-week pronunciation course. Three groups of ESL learners, each treated in one of these three ways, were recorded reading sentences and narratives at the beginning and end of the course. Both the first and second groups, who received pronunciation instruction, showed significant improvement in sentence reading. However, only the second group, where prosodic factors were included in the instruction, showed improvement in accentedness and fluency in the narratives.

In Johansson’s (1978, as cited in Wang, 2009) study of Swedish-accented English speech, segmental and non-segmental errors were compared in terms of accentedness scores. Native English judges were presented with two kinds of production, those with native English intonation but segmental errors on the one hand, and those with nonnative intonation (Swedish-accented) but no segmental errors on the other. Higher scores were assigned to productions with native-like suprasegmental characteristics but poor segmentals.

In a more recent study, Munro (1995, as cited in Wang, 2009) used low-pass filtered English speech produced by Mandarin speakers for accent judgment. Untrained native English listeners were invited to rate the speech samples. It was found that non-segmental factors such as speaking rate, pitch patterns and reduction contribute to the detected foreign accent in Mandarin speakers’ production and that their foreign accent can be detected based solely on suprasegmental information.

In addition, some recent studies have focused on stress production with English nonce words. For example, Pater (1997, as cited in Altmann, 2006) investigated the stress placement patterns for English nonce words by both English native speakers and French learners of English. While this study varied syllable weight within words, it used a rather small set of items. The native English speakers exhibited a stress placement pattern that was basically identical to the Latin stress rule (i.e., stre


