Teaching Pronunciation: The Key Features of Pronunciation

According to Kelly, phonemes are the different sounds within a language. Although there are slight differences in how individuals articulate various sounds, we can still describe with reasonable accuracy how each sound is produced. When considering meaning, we see how the use of one sound instead of another can change the whole meaning of a word. This is the principle that defines the total number of phonemes contained in a particular language.

Sounds may be defined as voiced or unvoiced (sometimes referred to as 'voiceless'). Voiced sounds occur when the vocal cords in the larynx vibrate. It is easy to tell whether a sound is voiced or not by placing one or two fingers on your Adam's apple. If you are producing a voiced sound, you will feel vibration; if you are producing an unvoiced sound, you will not.

The set of phonemes can be divided into two categories: vowel sounds and consonant sounds. However, these do not necessarily correspond to the vowels and consonants we are familiar with in the alphabet. Vowel sounds are all voiced, and may be single (like /e/, as in let) or combinations involving a movement from one vowel sound to another (like /ei/, as in late); such combinations are known as diphthongs. An additional term is triphthong, which describes the combination of three vowel sounds (like /aʊə/ in our or power). Single vowel sounds may be short (like /ɪ/, as in bit) or long (like /i:/, as in beat). The colon in /i:/ marks the vowel as long.

Consonant sounds may be either voiced or unvoiced. It is possible to identify many pairs of consonants which are essentially the same except for the element of voicing (for example /f/, as in fan, and /v/, as in van).

Suprasegmental features: phonemes, as we have seen, are units of sound which we can analyse. They are also commonly known as segments. Suprasegmental features, as the name implies, are features of speech which generally apply to groups of segments, or phonemes. The features which are important in English are stress, intonation, and how sounds change in connected speech.

With regard to individual words, we can identify and teach word stress. Usually one syllable in a word will sound more prominent than the rest, as in paper or bottle, where the first syllable is stressed. The stressed syllables of words are usually indicated in dictionaries.

With regard to utterances, we can analyse and teach intonation as well as stress, although as features they can at times be quite hard to consciously recognize and to describe. Stress gives rhythm to speech. One or more words within each utterance are selected by the speaker as worthy of stressing, and thus made prominent to the listener. Intonation, on the other hand, is the way in which the pitch of the voice goes up and down in the course of an utterance. Utterance stress and intonation patterns are often linked to the communication of meaning.

The history and scope of pronunciation teaching

The field of modern language teaching has developed two general approaches to the teaching of pronunciation: (1) an intuitive-imitative approach and (2) an analytic-linguistic approach. Before the late nineteenth century only the first approach was used, occasionally supplemented by the teacher's or textbook writer's impressionistic (and often phonetically inaccurate) observations about sounds based on orthography (Kelly 1969).

An intuitive-imitative approach depends on the learner's ability to listen to and imitate the rhythms and sounds of the target language without the intervention of any explicit information; it also presupposes the availability of good models to listen to, a possibility that has been enhanced by the availability first of phonograph records, then of tape recorders and language labs in the mid-twentieth century (Celce-Murcia, Brinton, Goodwin, 1996).

An analytic-linguistic approach, on the other hand, utilizes information and tools such as a phonetic alphabet, articulatory descriptions, charts of the vocal apparatus, contrastive information, and other aids to supplement listening, imitation, and production. It explicitly informs the learner and focuses attention on the sounds and rhythms of the target language. This approach was developed to complement rather than to replace the intuitive-imitative approach, which was typically retained and used in tandem with the phonetic information (Celce-Murcia, Brinton, Goodwin, 1996).

The Direct Method, which first gained popularity in the late 1800s and early 1900s, is another way of teaching foreign languages. In it, pronunciation is taught through intuition and imitation; students imitate a model - the teacher or a recording - and do their best to approximate the model through imitation and repetition. Successors to this approach are the many so-called naturalistic methods, including comprehension methods that devote a period of learning solely to listening before any speaking is allowed. Examples include Asher's (1977) Total Physical Response and Krashen and Terrell's (1983) Natural Approach.

The first linguistic or analytic contribution to the teaching of pronunciation emerged in the 1890s as part of the Reform Movement in language teaching. This movement was influenced greatly by phoneticians such as Henry Sweet, Wilhelm Viëtor, and Paul Passy, who formed the International Phonetic Association in 1886 and developed the International Phonetic Alphabet (IPA).

Many historians of language teaching (e.g., Howatt 1984) believe that the Reform Movement played a role in the development of Audiolingualism in the United States and of the Oral Approach in Britain during the 1940s and 1950s. In both the Audiolingual and Oral Approach classrooms, pronunciation is very important and is taught explicitly from the start. Furthermore, the teacher often uses a technique derived from the notion of contrast in structural linguistics: the minimal pair drill - drills that use words which differ by a single sound in the same position and are based on the concept of the phoneme as a minimally distinctive sound (Bloomfield 1933).
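The idea behind a minimal pair, two words that differ by a single sound in the same position, can be illustrated with a short sketch. The mini-lexicon below is hypothetical: it reuses example words from earlier in this essay and spells each one out as a sequence of phoneme symbols chosen for illustration. It is not taken from any published drill material.

```python
from itertools import combinations

# Hypothetical mini-lexicon: each word is written as a tuple of phoneme symbols.
LEXICON = {
    "fan": ("f", "æ", "n"),
    "van": ("v", "æ", "n"),
    "bit": ("b", "ɪ", "t"),
    "beat": ("b", "i:", "t"),
    "let": ("l", "e", "t"),
    "late": ("l", "ei", "t"),
}


def differing_positions(a, b):
    """Return the indexes at which two phoneme sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_minimal_pair(a, b):
    """True when the sequences have equal length and differ in exactly one position."""
    return len(a) == len(b) and len(differing_positions(a, b)) == 1


def minimal_pairs(lexicon):
    """List every pair of words in the lexicon that forms a minimal pair."""
    return [
        (w1, w2)
        for w1, w2 in combinations(sorted(lexicon), 2)
        if is_minimal_pair(lexicon[w1], lexicon[w2])
    ]


if __name__ == "__main__":
    for pair in minimal_pairs(LEXICON):
        print(pair)
```

A teacher could extend the lexicon with words relevant to a particular group of learners; the comparison works on sounds, not spelling, so the transcriptions have to be supplied by hand.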

In the 1960s the Cognitive Approach, influenced by the concepts of transformational-generative grammar (Chomsky 1959, 1965) and cognitive psychology (Neisser 1967), viewed language as rule-governed behavior and deemphasized pronunciation in favor of grammar and vocabulary because:

Native pronunciation was an unrealistic objective

Time would be better spent on teaching more learnable items

The Silent Way and Community Language Learning emerged in the 1970s. In the Silent Way, learners focused on the sound system without a phonetic alphabet, and teachers spoke as little as possible, indicating what students should do through an elaborate system of gestures: they (1) tapped out rhythmic patterns with a pointer, (2) held up their fingers to indicate the number of syllables, and (3) indicated stressed elements or modelled the proper positioning of the articulators by pointing to their lips, teeth or jaw. In addition, teachers used several indispensable tools of the trade such as the sound-color chart, the Fidel charts, word charts and colored rods. The Community Language Learning (CLL) technique was largely rooted in humanistic, client-centered approaches to learning. In this method, the teacher used the audio recorder and the "human computer" technique as the major tools for teaching.

The Communicative Approach to language teaching began to take over in the mid to late 1970s (Brumfit and Johnson 1979; Widdowson 1978), holding that the primary purpose of language is communication. This approach brought renewed urgency to the teaching of pronunciation, because there is a threshold level of pronunciation that the non-native speaker must reach in order to communicate. This has resulted in the emergence of two principles: (1) the goal of teaching pronunciation is not to make learners sound like native speakers; (2) the goal is to enable learners to surpass the threshold level.

Today's pronunciation curriculum seeks to identify the most important aspects of both the segmentals and suprasegmentals, and to integrate them appropriately in courses that meet the needs of any given group of learners. In addition to segmental and suprasegmental features of English, there is also the issue of voice quality setting.

An overview of the sound system of English

The consonant system and articulation

Stop consonants (plosives)

A plosive is a consonant articulation with the following characteristics:

(1) The closing stage, during which the articulating organs move together in order to form the obstruction. In this stage, there is often a glide or transition audible in a preceding sound segment and visible in acoustic analysis as a curve characteristic of the preceding sound.

(2) The hold or compression stage, during which lung action compresses the air behind the closure; this stage may or may not be accompanied by voice, i.e. vibration of the vocal cords.

(3) The release or explosion stage, during which the organs forming the obstruction part rapidly, allowing the compressed air to escape abruptly; if stage (2) is voiced, the vocal cord vibration may continue in stage (3); if stage (2) is voiceless, stage (3) may also be voiceless (aspiration) before silence or before the onset of voice.

English has six plosive consonants: /p/, /t/, /k/, /b/, /d/, and /g/. These plosives have different places of articulation.

Bilabial Plosives: /p, b/

The soft palate being raised and the nasal resonator shut off, the primary obstacle to the air-stream is provided by the closure of the lips. Lung air is compressed behind this closure, during which stage the vocal cords are held wide apart for /p/, but may vibrate for all or part of the compression stage for /b/ according to its situation in the utterance. Then the closure is released suddenly for the air to escape with a kind of explosion.

Alveolar Plosives: /t, d/

The soft palate being raised and the nasal resonator shut off, the primary obstacle to the air-stream is formed by a closure made between the tip and rims of the tongue and the upper alveolar ridge and side teeth. Lung air is compressed behind this closure, during which stage the vocal cords are wide apart for /t/, but may vibrate for all or part of the compression stage for /d/ according to its situation in the utterance. The air escapes with noise upon the sudden separation of the alveolar closure.

Velar Plosives: /k, g/

The soft palate being raised and the nasal resonator shut off, the primary obstacle to the air-stream is formed by a closure made between the back of the tongue and the soft palate. Lung air is compressed behind this closure, during which stage the vocal cords are wide apart for /k/, but may vibrate for all or part of the compression stage for /g/ according to its situation in the utterance. The air escapes with noise upon the sudden separation of the velar closure.

All six plosives can occur at the beginning of a word (initial position), between other sounds (medial position) and at the end of a word (final position).
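The classification of the six plosives described above, by place of articulation and by voicing, can be summarized as a small lookup table. The following is only an illustrative sketch; the names PLOSIVES, voicing_partner and plosives_at are invented for this example.

```python
# Place of articulation and voicing for the six English plosives.
# The boolean marks the member of each pair that is classed as voiced; as the
# text notes, actual vocal cord vibration for /b, d, g/ depends on the context.
PLOSIVES = {
    "p": ("bilabial", False),
    "b": ("bilabial", True),
    "t": ("alveolar", False),
    "d": ("alveolar", True),
    "k": ("velar", False),
    "g": ("velar", True),
}


def voicing_partner(symbol):
    """Return the plosive with the same place of articulation and opposite voicing."""
    place, voiced = PLOSIVES[symbol]
    for other, (other_place, other_voiced) in PLOSIVES.items():
        if other_place == place and other_voiced != voiced:
            return other
    raise KeyError(symbol)


def plosives_at(place):
    """Return the plosives articulated at the given place, voiceless first."""
    return [s for s, (p, _) in PLOSIVES.items() if p == place]


if __name__ == "__main__":
    print(voicing_partner("t"))
    print(plosives_at("velar"))
```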


Fricatives

Fricatives are consonants with the characteristic that when they are produced, air escapes through a small passage and makes a hissing sound sometimes called 'friction'. Fricatives are continuant consonants, as you can continue making them without interruption as long as you have enough air in your lungs.

Labio-dental Fricatives: /f, v/

The soft palate being raised and the nasal resonator shut off, the inner surface of the lower lip makes a light contact with the edge of the upper teeth, so that the escaping air produces friction. For /f/, the friction is voiceless, whereas there may be some vocal cord vibration accompanying /v/, according to its situation.

Dental Fricatives: /ð, θ/

(Example words: thumb, thus, either, father, breath, breathe)

The soft palate being raised and the nasal resonator shut off, the tip and rims of the tongue make a light contact with the edge and inner surface of the upper incisors and a firmer contact with the upper side teeth, so that the air escaping between the forward surface of the tongue and the incisors causes friction. For /θ/ the friction is voiceless, whereas for /ð/ there may be some vocal cord vibration.

Alveolar Fricatives: /s, z/

(Example words: sip, zip, facing, rise, rice)

The soft palate being raised and the nasal resonator shut off, the tip and blade of the tongue make a light contact with the upper alveolar ridge, and the side rims of the tongue a close contact with the upper side teeth. The air-stream escapes through the narrow groove in the centre of the tongue and causes friction between the tongue and the alveolar ridge. In other words, in the articulation of these sounds the air escapes through a narrow passage along the centre of the tongue, and the sound produced is comparatively intense.

Palato-alveolar Fricatives: /ʃ, ʒ/

(Example words: ship, Russia, measure, Irish, garage)

These fricatives are called palato-alveolar, which can be taken to mean that their place of articulation is partly palatal, partly alveolar. The tongue is in contact with an area slightly further back than that for /s/, /z/. If you make /s/ then /ʃ/, you should be able to feel your tongue move backwards. The air escapes through a passage along the centre of the tongue, as in /s/ and /z/, but the passage is a little wider. Most speakers of RP have rounded lips for /ʃ/ and /ʒ/, and this is an important difference between these consonants and /s/ and /z/. In addition, the escape of air is diffuse (compared with that of /s, z/), the friction occurring between a more extensive area of the tongue and the roof of the mouth. In the case of /ʃ/, the friction is voiceless, whereas for /ʒ/ there may be some vocal cord vibration according to its situation.

All the fricatives described so far can be found in initial, medial and final positions. In the case of /ʒ/, however, the distribution is much more limited. Very few English words begin with /ʒ/ (most of them have come into the language comparatively recently from French) and not many end with this consonant. Only medially, in words such as 'measure' and 'usually', is it found at all commonly.

Glottal Fricative: /h/

The place of articulation of this consonant is glottal. This means that the narrowing that produces the friction noise is between the vocal folds. When we produce /h/ in speaking English, many different things happen in different contexts. In the word 'hat', the /h/ is followed by an /æ/ vowel. The tongue, jaw and lip positions for the vowel are all produced simultaneously with the /h/ consonant, so that the glottal fricative has an /æ/ quality. The same is found for all vowels following /h/.


Affricates

Affricates are rather complex consonants. They begin as plosives and end as fricatives.

Affricates: /ʧ, ʤ/

(Palato-alveolar affricates)

The term 'affricate' denotes a concept which is primarily of phonetic importance. Any plosive whose release stage is performed in such a way that considerable friction occurs approximately at the point where the plosive stop is made may be called an 'affricate'. The friction present in an affricate is of shorter duration than that which characterizes the fricatives proper. In the articulation of /ʧ, ʤ/, the soft palate being raised and the nasal resonator shut off, the obstacle to the air-stream is formed by a closure made between the tip, blade, and rims of the tongue and the upper alveolar ridge and side teeth. At the same time, the front of the tongue is raised towards the hard palate in readiness for the fricative release. The closure is released slowly, the air escaping in a diffuse manner over the whole of the central surface of the tongue with friction occurring between the blade/front region of the tongue and the alveolar/front palatal section of the roof of the mouth. During both the stop and fricative stages, the vocal cords are wide apart for /ʧ/, but may be vibrating for all or part of /ʤ/ according to the situation in the utterance.


Nasals

Bilabial Nasal: /m/

The lips form a closure as for /p, b/; the soft palate is lowered, adding the resonance of the nasal cavity to those of the pharynx and the mouth chamber closed by the lips; the tongue will generally anticipate or retain the position of the adjacent vowel.

Alveolar Nasal: /n/

The tongue forms a closure with the teeth ridge and upper side teeth as for /t, d/; the soft palate is lowered, adding the resonance of the nasal cavity to those of the pharynx and of that part of the mouth chamber behind the alveolar closure; the lip position will depend upon that of adjacent vowels.

Velar Nasal: /ŋ/

A closure is formed in the mouth between the back of the tongue and the velum as for /k, g/ (the point of closure will depend on the type of vowel preceding); the soft palate is lowered, adding the resonance of the nasal cavity to that of the pharynx and that small part of the mouth chamber behind the velar closure.


Lateral

Only one alveolar lateral phoneme occurs in English, there being no opposition between fortis and lenis, voiced or voiceless, or fricative and non-fricative. Within the /l/ phoneme three main allophones occur:

Clear [l], with a relatively front vowel resonance, before vowels and /j/.

Voiceless [l̥], following aspirated /p, k/.

Dark [ɫ], with a relatively back vowel resonance.

For clear [l], the front of the tongue is raised in the direction of the hard palate at the same time as the tip contact is made. For dark [ɫ], the tip contact is again made on the teeth ridge, the front of the tongue being somewhat depressed and the back raised in the direction of the soft palate, giving a back vowel resonance.

Both [l] and [ɫ] are voiced, though partial devoicing may take place when a preceding consonant is fortis. The actual point of contact of the tongue for [ɫ] is conditioned by the place of articulation of the following consonant; thus, in health, will they, the [ɫ] has a dental contact, but in already, ultra, all dry, the contact for [ɫ] is likely to be post-alveolar.

Variations of the plosives

Alveolar Approximant: /r/

The most common allophone of RP /r/ is a voiced post-alveolar frictionless approximant. The soft palate being raised and the nasal resonator shut off, the tip of the tongue is held in a position near to, but not touching, the rear part of the upper teeth ridge; the central part of the tongue is lowered with a general contraction of the tongue. The air stream is thus allowed to escape freely, without friction, over the central part of the tongue.

Palatal Approximant: /j/

The vocalic allophones of RP /j/ are articulated by the tongue assuming the position for a front half-close to close vowel and moving away immediately to the position of the following sound; the lips are generally neutral or spread. When /j/ follows a fortis consonant such as /p/, /k/, devoicing takes place.

Labio-velar Approximant: /w/

The vocalic allophones of RP /w/ are articulated by the tongue assuming the position for a back half-close to close vowel and moving away immediately to the position of the following sound; the lips are rounded. The soft palate is raised and the vocal cords vibrate; but when /w/ follows a fortis consonant, some devoicing takes place.

The vowel system and articulation

Vowels differ from consonants in two very important ways. First, they are articulated without any kind of obstruction in the oral cavity, i.e., the articulators do not form a complete or partial closure or a narrowed passage in the way of the exhaled air. Second, vowels differ from consonants in their behavior: while consonants typically occur in syllable-marginal positions (they appear at the peripheries of the syllable), vowels form the very core of the syllable and occur in syllable-central position.

Vowels are produced when the airstream is voiced through the vibration of the vocal cords in the larynx, and then shaped using the tongue and the lips to modify the overall shape of the mouth. The position of the tongue is a useful reference point for describing the differences between vowel sounds, and these are summarized in the following diagram.

The diagram is a representation of the 'vowel space' in the centre of the mouth where vowel sounds are articulated.

'Close', 'Mid' and 'Open' refer to the distance between the tongue and the roof of the mouth

'Front', 'Center' and 'Back' and their corresponding 'vertical' lines refer to the part of the tongue.

The position of each phoneme represents the height of the tongue, and also the part of the tongue which is (however relatively) raised.

Putting these together:

/i:/ bead (a close front vowel) is produced when the front of the tongue is the highest part, and is near the roof of the mouth.

/æ/ hat (an open front vowel) is produced when the front of the tongue is the highest part, but the tongue itself is low in the mouth.

/ɒ/ dog (an open back vowel) is produced when the back of the tongue is the highest part, but the tongue itself is low in the mouth.

/u:/ food (a close back vowel) is produced when the back of the tongue is the highest part, and is near the roof of the mouth.
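The two dimensions used above, tongue height (close or open) and the part of the tongue that is raised (front or back), can be captured in a small table. The sketch below covers only the four vowels just described; the names VOWELS, describe_vowel and vowels_with are invented for illustration.

```python
# Tongue height and raised part of the tongue for the four vowels described above.
VOWELS = {
    "i:": ("close", "front"),  # bead
    "æ": ("open", "front"),    # hat
    "ɒ": ("open", "back"),     # dog
    "u:": ("close", "back"),   # food
}


def describe_vowel(symbol):
    """Return a short label such as '/æ/: open front vowel'."""
    height, part = VOWELS[symbol]
    return f"/{symbol}/: {height} {part} vowel"


def vowels_with(height=None, part=None):
    """Return the vowels matching the given height and/or tongue part."""
    return [
        symbol
        for symbol, (h, p) in VOWELS.items()
        if (height is None or h == height) and (part is None or p == part)
    ]


if __name__ == "__main__":
    for symbol in VOWELS:
        print(describe_vowel(symbol))
    print(vowels_with(part="back"))
```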

New directions in the teaching of pronunciation

The growing interest in the study of suprasegmentals generated by the recognition of the role of prosody in first and second language speech communication is causing a shift in emphasis in foreign language pronunciation teaching. The new approach to pronunciation teaching is more balanced in focus, and more emphasis is placed on pitch, stress, rhythm, co-articulation and intonation, and how they are used to communicate meaning, the general goal being to achieve comprehensible speech for better overall speech performance (Lambacher, 1996a).

In the past ten years or so, a new impulse to the teaching of second language (L2) prosody has come from technology, and particularly from speech technology. At the present stage, the use of technology for pronunciation teaching is still largely experimental in nature, but there are indications that new methods and frameworks may be developing that will be beneficial to the study and acquisition of L2 suprasegmentals.

As suggested by a few researchers (e.g., Spaai and Hermes, 1993; Lambacher, 1996b; Stibbard, 1996; Chun, 1998; Eskenazi, 1999; Wennerstrom, 2000), a combination of audio and visual feedback may have a major impact on learners and enhance their ability to learn both segmental and suprasegmental aspects of pronunciation. On these grounds, speech analysis software has been introduced experimentally in L2 pronunciation classes as a source of feedback on students' speech. The use of speech analysis software allows learners to record and visualize their speech output on their computer monitors to obtain real-time information about the acoustic properties of this output. These visualizations can be used by both learners and teachers to compare learners' productions with those of native speakers and evaluate them. Through these visualizations, learners have an objective measure of the distance or closeness of their pronunciation with respect to the target pronunciation. This method is considered to be highly effective by the researchers who have used it. Visualization of intonation curves would appear to be particularly effective. For example, Eskenazi (1999) maintains that the visual display of L2 prosodic patterns may be crucial for correcting students' inaccurate prosody, because it allows them to visualize where exactly their prosodic patterns differ from native speakers'.

Similarly, Wennerstrom (2000) argues that the visualization of pitch ranges in speech makes it easier for the learner to increase pitch to signal topic shift, and this has a bearing on learners' overall intelligibility in L2. Reports of successful teaching experiences using systems developed for phonetic and speech research and on the effectiveness of visual displays for teaching prosody and intonation are also found in De Bot (1983), Spaai and Hermes (1993), Lambacher (1996a, 1996b), Stibbard (1996), and Chun (1998).
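The studies cited above do not specify how such software quantifies the distance between a learner's intonation and a model's. As a purely illustrative sketch, and not the method of any of those researchers, one simple measure can be computed under two assumptions: that both pitch contours have already been extracted as lists of frequencies in Hz, and that they have been aligned to the same number of samples. Each contour is converted to semitones relative to its own mean pitch, so that a generally higher or lower voice is not counted as an error, and the root-mean-square difference is then taken.

```python
import math


def to_semitones(contour_hz):
    """Convert a pitch contour in Hz to semitones relative to its own mean pitch."""
    mean_hz = sum(contour_hz) / len(contour_hz)
    return [12 * math.log2(f / mean_hz) for f in contour_hz]


def contour_distance(learner_hz, model_hz):
    """Root-mean-square difference, in semitones, between two aligned contours."""
    if len(learner_hz) != len(model_hz):
        raise ValueError("contours must have the same number of samples")
    learner = to_semitones(learner_hz)
    model = to_semitones(model_hz)
    squared = [(x - y) ** 2 for x, y in zip(learner, model)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical values: a flat learner contour against a rising model contour.
    print(contour_distance([100.0, 100.0, 100.0], [100.0, 150.0, 200.0]))
```

A contour with the same shape as the model but produced an octave higher yields a distance close to zero, which reflects the design choice of comparing pitch movement and not absolute pitch.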

The common English pronunciation problems of Vietnamese learners

The main differences of English and Vietnamese pronunciation

According to Doan (1977), there are basically 23 initial consonant phonemes that are represented in the chart based on the place and manner of articulation. A comparison between the two basic phonetic systems of American English (AE) and southern Vietnamese (SV) (see table below) shows that many consonant sounds are not present in the Vietnamese phonetic system. More specifically, SV does not have the interdental fricatives /θ/ - /ð/, the alveolar fricative /z/, the post-alveolar fricatives /ʃ/, /ʒ/, and the post-alveolar affricates /ʧ/ - /ʤ/.


In addition to consonants, English and SV vowels are compared. While the monophthongs are basically similar, the vowel systems of English and SV are clearly different (see the tables below). It is clear that the tense distinction does not exist in SV. For instance, the vowels /i/ and /u/ can occur in final position as tense vowels in both English and Vietnamese, but the tense distinction in medial position does not exist in SV. Therefore, it is impossible to find tense contrasts like beat-bit or pool-pull in Vietnamese. In that sense, it is hypothesized that the tense distinction is a skill that Vietnamese speakers have to acquire.

(Table 1)

(Table 2)

The effect of L1 transfer was limited to the interdental fricatives /θ/ and /ð/ and the tense-lax vowel pairs /i/ - /ɪ/ and /u/ - /ʊ/. The rationale for selecting these sounds derived from the fact that these sounds and contrasts are present in the English language, but absent in the Vietnamese language. Following the contrastive analysis hypothesis, it was predicted that these sounds would present articulation problems to Vietnamese learners of English that would result in mispronunciations triggered by analogous Vietnamese articulation positions.

The common pronunciation errors of Vietnamese learners

Vietnamese is supposed to be easy to acquire phonetically when speakers have an efficient input, especially of the tones. That is one of the reasons why native speakers of Vietnamese face many obstacles in learning foreign languages that are neither close to nor as easy as their own language, for instance Russian, French, English or Spanish.

There have been quite a number of studies about the difficulties Vietnamese learners face in pronouncing English consonants and clusters. These have led to important findings, which have become a valuable basis for further studies and, most visibly, for this paper. Ha (2005: 35-46), after analysing her data, came to these conclusions (first table), shown in comparison with the table compiled by the Center for Applied Linguistics (Neumann, 2007).

Although Ha's (2005) data came from various sources, her findings are consistent in general. However, a point to note is that her studies focused on learners from the North of Vietnam. Northern Vietnamese speakers most obviously confuse /s/ and /ʃ/, /t/ and /ʧ/, and /ʒ/ and /z/. The findings in the table by the Center for Applied Linguistics (Neumann, 2007), however, are applicable to every Vietnamese learner, as the table contains all the general errors that Vietnamese speakers from any region of the country can make. They are also persuasive to linguists who have some knowledge of Vietnamese dialects. Taking final consonants into consideration, for example, word-final /t/ and /d/ are commonly confused with each other everywhere, whereas the /p/ sound in /pɔp/ pop is often mispronounced as /bɔp/ Bob by Southern people and the /ʃ/ sound in /pʊʃ/ push becomes /pus/ puss for northerners.

Tang (2007: 7) offers another comparison table as shown below:

Studying this table specifically for word-final consonants, there are some points that conflict with the Vietnamese and English consonants described previously. For example, the /ŋ/ sound in sông is bilabialized and different from the /ŋ/ in song, so sang (come over) and sang (past tense of sing) seem to be a better sample pair, since they are produced with exactly the same articulation.

Furthermore, Tang (2007) did not list the four-membered consonant clusters that are absolutely foreign to the Vietnamese language. Nevertheless, this material is quite detailed and should be appreciated. It is easily seen from this data that the English language has a number of consonants, especially final consonants and clusters that do not exist in Vietnamese rather than vice versa. As a result, pronouncing English final consonants and consonant clusters properly is one of the most difficult things that learners have to face from the very beginning.

Osburne (1996, 164-181) analyzed a case study of her subject, a Vietnamese native speaker who came to the United States in 1972, and drew the conclusion that: "In addition to cluster reduction, optional deletion of single syllable-final consonants, especially fricatives, which is attested for Vietnamese L1 speakers […] was found", and "Consonants omitted, however, were always final consonants not permitted by Vietnamese (for example /l/ in [kən'trəʊ] control, /z/ in [bi'kɔz] because)" (Osburne, 169). She also stated that the Vietnamese language is non-rhotic, so there is no /r/ sound at the end of English syllables spoken by Vietnamese speakers.

It can therefore be concluded that Vietnamese learners have a tendency to (1) move unfamiliar English final sounds towards similar sounds which exist in their mother tongue, (2) omit the sounds that are too difficult for them, and (3) reduce final clusters. This may make their English sound very "Vietnamese", which causes some problems in communication with native speakers and other proficient speakers of English.
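The three tendencies can be made concrete with a toy model of what happens to the consonants at the end of an English word. This is a deliberately simplified sketch and not a description of any study's procedure. It rests on stated assumptions: the set of finals treated as permitted in Vietnamese is supplied by hand, the only substitution included is the push to puss example mentioned earlier, and a cluster is assumed to be reduced to its first consonant.

```python
# Assumption: final consonants treated as permitted in Vietnamese for this sketch.
PERMITTED_FINALS = {"p", "t", "k", "m", "n", "ŋ"}

# Assumption: one illustrative substitution, taken from the push -> puss example.
SUBSTITUTIONS = {"ʃ": "s"}


def adapt_coda(coda):
    """Apply the three tendencies to a list of word-final consonants.

    Returns the consonants a learner following these tendencies might produce.
    """
    if not coda:
        return []
    # Tendency 3: a final cluster is reduced (assumed here: to its first consonant).
    first = coda[0]
    # Tendency 1: an unfamiliar sound is moved towards a similar, familiar one.
    if first in SUBSTITUTIONS:
        return [SUBSTITUTIONS[first]]
    # Tendency 2: a sound that is not permitted and has no substitute is omitted.
    if first not in PERMITTED_FINALS:
        return []
    return [first]


if __name__ == "__main__":
    print(adapt_coda(["ʃ"]))       # final consonant of push
    print(adapt_coda(["l"]))       # final consonant of control
    print(adapt_coda(["n", "d"]))  # a final cluster such as the one in hand
```

Real learner speech is far more variable than this; the sketch only shows how the three tendencies interact when applied in a fixed order.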