Production Of Speech Sounds English Language Essay

The production of speech sounds involves two essential components:

– Initiating a flow of air in and through the vocal tract – initiation;

– Some method of shaping or articulating the air-stream so as to generate a specific type of sound – articulation;

– A third component, present in most, but not all sounds – phonation.


Initiators: the lungs are the only initiator in English (other languages also use the closed glottis, or the tongue combined with a velar closure).

Lungs – “sponges” that can fill with air, contained within the rib cage. The expiratory air-stream passes into the bronchi, then into the trachea/windpipe and then into the larynx [1]. The space between the vocal cords/vocal folds is the glottis.


Organs participating in articulation (those above the larynx) are called articulators:

– Pharynx [2] – a tube beginning just above the larynx; its top end is divided into two parts:

– One part being the back of the mouth;

– The other part being the beginning of the way through the nasal cavity. In the production of English sounds the pharynx serves mainly as a container for a volume of air that is set into vibration in accordance with the vibration of the vocal folds [3].

– Oral cavity – plays the most important role. It is within the oral cavity that the greatest variety of articulatory motions occurs.

The articulatory organs in the mouth:

i/ “passive” – the maxilla, the teeth, the alveolar ridge, the hard palate.

ii/ “active” – the jaw, the lower lip, the soft palate. The velum/soft palate can be raised (the airflow cannot escape through the nose) or lowered (the air passes through the nose). The most active organ in the mouth is the tongue, which is divided into the tip (apex), the blade (lamina), the back (dorsum) and the root (radix).

The mouth is bounded at its outer end by the upper and lower lips.

The vibration of the vocal folds inside the larynx/voice box produces the sound of voice and this process is called phonation. The larynx is situated in the neck. It has several parts; its main structure is made of cartilage (material similar to bone but less hard). The larynx consists of four cartilages:

1 cartilago thyreoidea (thyroid cartilage)

2 cartilago cricoidea (cricoid cartilage)

3 cartilagines arytenoideae (arytenoid cartilages)

4 epiglottis – covering the entrance into the larynx

Inside the larynx there are vocal folds (two thick flaps of muscle rather like a pair of lips). At the front the vocal folds are joined together and fixed to the inside of the thyroid cartilage. At the back they are attached to a pair of small cartilages called arytenoid cartilages, so that if the arytenoid cartilages move, the vocal folds will move too. The arytenoid cartilages are attached to the top of the cricoid cartilage but they can move so as to move the vocal folds apart or together. The term glottis is used to refer to the opening between the vocal folds.

States of the glottis:

1. Not vibrating:

1) If the vocal folds are held close together and part after the final phase of articulation – Voiceless Plosives are produced.

2) If the vocal folds are loosely open, they do not vibrate and no voice is produced – Voiceless Fricatives and Affricates are produced.

3) If the vocal folds are held close together only in their front part, while the cartilaginous part is set apart – the Glottal Fricative (the sound /h/) is produced.

2. Vibrating:

Further narrowing of the glottis brings it into position for the production of voice.

The vocal folds can be held close together and vibrate; this produces tones, i.e. vowels.

The vocal folds can be held loosely together so that their vibration is weak; this produces Voiced Plosives, Affricates and Fricatives.

The number of cycles of opening and closing of the glottis per second is referred to as the fundamental frequency of the voice, measured in hertz (Hz). A single cycle lasts in the region of 1/100th of a second; accordingly, the cycle repeats at rates of approximately 80 to 200 cycles per second. This rate is far too rapid for the human ear to distinguish each individual opening and closing of the folds. However, the human ear is able to perceive variations in the overall rate of vibration as changes in the pitch of the voice [4]. The vibration averages roughly between 200 and 300 times per second in a woman’s voice and about half that rate in adult men.
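The reciprocal relation between the fundamental frequency and the length of one glottal cycle (f0 = 1/T) can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and are not taken from any phonetics library.

```python
def period_ms(f0_hz: float) -> float:
    """Length of one opening-closing cycle of the glottis, in milliseconds."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return 1000.0 / f0_hz


def f0_from_period(period_s: float) -> float:
    """Fundamental frequency in cycles per second (Hz) for a cycle length in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


# A cycle lasting about 1/100th of a second corresponds to about 100 Hz.
print(round(f0_from_period(1 / 100)))  # 100

# The approximate rates mentioned above, expressed as cycle lengths.
for f0 in (80, 200, 300):
    print(f"{f0} Hz -> {period_ms(f0):.2f} ms per cycle")
```

The faster the vibration, the shorter each cycle: 80 Hz corresponds to 12.5 ms per cycle, 200 Hz to 5 ms.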

Acoustic aspect

Sound is produced by the vibration of air molecules and is transmitted in sound waves in all directions. Voice comes into existence through the vibration of the vocal folds. Periodic vibration gives rise to tone, whereas aperiodic vibration produces noise.

The tone is characterized by three basic qualities:

The pitch – determined by the frequency of the vibrations in cps (cycles per second). The pitch is directly proportional to the number of cps and inversely proportional to the period of one cycle.

The intensity – the amount of energy transmitted through the air. It is related to the amplitude of vibration: the intensity is proportional to the square of the amplitude.

The timbre – determined by the composition of the tone. Simple tones are non-existent, because any object vibrates not only as a whole but also in its individual parts. The vocal folds vibrate in such a manner that, in addition to the fundamental frequency (a basic vibration over their whole length), they produce a number of overtones or harmonics, which are whole-number multiples of the fundamental, or first harmonic. The combination of these components makes up the acoustic spectrum. A visual record of speech, a spectrogram, is produced by computer analysis and shows the following dimensions:

Time / duration on the horizontal axis, given in ms

Frequency on the vertical axis, given in cycles per second

Intensity indicated by relative blackness of the markings.
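The relationships described for intensity and timbre can be tried out numerically. The Python sketch below is illustrative only: it assumes a sampling rate of 8,000 samples per second, a 100 Hz fundamental and made-up harmonic amplitudes, and it uses the decibel scale (not mentioned in these notes) only to restate the amplitude-squared rule. All helper names are hypothetical. `amplitude_at` evaluates a single bin of a discrete Fourier transform, the basic operation behind computer spectral analysis; a spectrogram repeats such analysis over short, successive time windows.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def relative_intensity(amplitude_ratio: float) -> float:
    """Intensity ratio implied by an amplitude ratio (intensity ~ amplitude squared)."""
    return amplitude_ratio ** 2


def level_difference_db(amplitude_ratio: float) -> float:
    """The same ratio in decibels: 10*log10(I1/I2), i.e. 20*log10(A1/A2)."""
    return 10 * math.log10(relative_intensity(amplitude_ratio))


def harmonics(f0_hz: float, count: int) -> list:
    """The first `count` harmonics; the first harmonic is the fundamental itself."""
    return [n * f0_hz for n in range(1, count + 1)]


def complex_tone(f0_hz: float, amplitudes: list, n_samples: int = 800) -> list:
    """Sum of sine waves at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    freqs = harmonics(f0_hz, len(amplitudes))
    return [
        sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for a, f in zip(amplitudes, freqs))
        for i in range(n_samples)
    ]


def amplitude_at(signal: list, freq_hz: float) -> float:
    """Amplitude of the component at freq_hz, estimated from one DFT bin."""
    n = len(signal)
    k = freq_hz * n / SAMPLE_RATE  # bin index; a whole number for the frequencies used below
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2 * abs(total) / n


print(relative_intensity(2))             # 4: twice the amplitude, four times the intensity
print(round(level_difference_db(2), 1))  # 6.0

tone = complex_tone(100, [1.0, 0.5, 0.25])  # 100 Hz fundamental with weaker overtones
for f in (100, 150, 200, 300):
    print(f, "Hz:", round(amplitude_at(tone, f), 3))
```

Only the harmonics (100, 200 and 300 Hz) show non-zero amplitude; the 150 Hz component, which is not a whole-number multiple of the fundamental, comes out as (numerically) zero.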

The dark bands of energy visible on a spectrogram are called formants and are numbered from the bottom upwards. It is the first two formants (F1 and F2) that contribute most to the distinctive character of the vowels. From the articulatory point of view, F1 is correlated (inversely) with tongue height (the pharyngeal formant) and F2 with front-to-back tongue placing (the oral formant).
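The formant correlations can be expressed as two simple comparisons: a lower F1 indicates a higher (closer) tongue position, and a higher F2 a more front one. In the sketch below, the `VowelToken` class and the formant values are hypothetical illustrations, not measurements of any particular vowel or speaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VowelToken:
    label: str
    f1_hz: float  # first formant: lower values go with a higher tongue position
    f2_hz: float  # second formant: higher values go with a more front tongue position


def closer(a: VowelToken, b: VowelToken) -> Optional[str]:
    """Label of the token with the higher tongue position (lower F1); None if equal."""
    if a.f1_hz == b.f1_hz:
        return None
    return a.label if a.f1_hz < b.f1_hz else b.label


def fronter(a: VowelToken, b: VowelToken) -> Optional[str]:
    """Label of the token with the more front tongue position (higher F2); None if equal."""
    if a.f2_hz == b.f2_hz:
        return None
    return a.label if a.f2_hz > b.f2_hz else b.label


# Illustrative values only, not measurements.
x = VowelToken("token A", f1_hz=300, f2_hz=2200)
y = VowelToken("token B", f1_hz=700, f2_hz=1100)
print(closer(x, y), "|", fronter(x, y))  # token A | token A
```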

Auditory aspect

The perception of a sound is mediated by the brain rather than by the ear itself. The ear has three major functions:

To collect stimuli

To transmit them

To analyze them.

The upper limit of frequency that can be perceived is about 20,000 cps.

The ear is subdivided into the outer ear, the middle ear and the inner ear. The outer ear leads to the eardrum. The middle ear is a small air-filled cavity containing a chain of three tiny bones, connected to the eardrum at one end and to the inner ear at the other. The main part of the inner ear is the cochlea, which is shaped like a snail’s shell and whose function is to convert sound vibrations into nerve impulses. Speech sounds are perceived in terms of four categories (pitch, loudness, quality and length). These categories are subjective and must not be equated exactly with the related physiological and physical categories.

Consonants: Plosives, Fortis & Lenis

The differences between vowels and consonants lie in the way they are produced (vowels – voice, consonants – noise); vowels fulfill the role of the peak of the syllable, whereas consonants fulfill the marginal function.

Classification of consonants – see Chart of Consonants (P. Roach, p.62). It is customary to divide consonants into groups according to several criteria, the most important of which are:

– According to their place of articulation

– According to their manner of articulation

A/ Classification according to the manner of articulation:

– The articulators involved form a closure: /p, t, k/; /b, d, g /. The consonants produced in this way are called Stops/Plosives.

– The articulators involved form a narrowing (as for /f, v, θ, ð, s, z, ʃ, ʒ, h/). The consonants produced in this way are called Fricatives.

– The articulators form a closure combined with a narrowing (as for /tʃ, dʒ/). The consonants produced in this way are called Affricates.
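The classification by voicing, place and manner can be stored as a lookup table. This sketch covers only the plosives, fricatives and affricates discussed in these notes, uses "alveolar" for the apico-alveolar plosives, and labels the fortis members of each pair "voiceless"; the names `CONSONANTS`, `describe` and `by_manner` are illustrative.

```python
# symbol -> (voicing, place of articulation, manner of articulation)
CONSONANTS = {
    "p": ("voiceless", "bilabial", "plosive"),
    "b": ("voiced", "bilabial", "plosive"),
    "t": ("voiceless", "alveolar", "plosive"),
    "d": ("voiced", "alveolar", "plosive"),
    "k": ("voiceless", "velar", "plosive"),
    "g": ("voiced", "velar", "plosive"),
    "f": ("voiceless", "labiodental", "fricative"),
    "v": ("voiced", "labiodental", "fricative"),
    "θ": ("voiceless", "dental", "fricative"),
    "ð": ("voiced", "dental", "fricative"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "ʃ": ("voiceless", "palato-alveolar", "fricative"),
    "ʒ": ("voiced", "palato-alveolar", "fricative"),
    "h": ("voiceless", "glottal", "fricative"),
    "tʃ": ("voiceless", "palato-alveolar", "affricate"),
    "dʒ": ("voiced", "palato-alveolar", "affricate"),
}


def describe(symbol: str) -> str:
    """Voicing-place-manner label, e.g. 'voiced velar plosive'."""
    voicing, place, manner = CONSONANTS[symbol]
    return f"{voicing} {place} {manner}"


def by_manner(manner: str) -> list:
    """All symbols in the table with the given manner of articulation, sorted."""
    return sorted(sym for sym, (_, _, m) in CONSONANTS.items() if m == manner)


print(describe("ð"))           # voiced dental fricative
print(by_manner("affricate"))  # ['dʒ', 'tʃ']
```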

Plosives –

Place of articulation: bilabial (/p, b/), apico-alveolar (/t, d/), velar (/k, g/). Plosives have four phases: approach, hold, release and post-release.

All three pairs of plosives occur in all positions: initial, medial and final.

Initial position: CV

In /p, t, k/, during the transition to the following voiced sound the wide-open glottis takes some time to close sufficiently for the vocal folds to start vibrating; consequently there is a period of voicelessness, called aspiration (a “puff of air”).

Final position: VC

Syllables closed by voiceless consonants are considerably shorter than those that are open or closed by voiced consonants.


Fricatives –

They include /f, θ, s, ʃ, v, ð, z, ʒ, h/.

Manner of articulation:

– Two organs are brought and held sufficiently close together for the escaping air-stream to produce strong friction. This friction may or may not be accompanied by voice.

A/ Place of articulation:

/f, v/ – labiodental

/θ, ð/ – dental

/s, z/ – alveolar

/ʃ, ʒ/ – palato-alveolar

/h/ – glottal

Length of the preceding sound:

The fortis/lenis distinction between final /f, θ, s/ and /v, ð, z/ is signalled largely by the length of the syllable which they close.

Dental fricatives: /θ, ð/


/θ/ – spelling: always th

distribution: word initial, word medial, word final; word initial clusters, word final clusters

/ð/ – spelling: always th

distribution: word initial, word medial, word final; word final clusters (/ð/ does not occur in word initial clusters)

Manner & place of articulation:

The tip of the tongue makes a light contact with the edge and inner surface of the upper front teeth. With some speakers the tongue-tip may protrude between the teeth.


Affricates are complex consonants, beginning as plosives and ending as fricatives (Roach).

Palato-alveolar Affricates /tʃ, dʒ/

/tʃ/ – when final in a syllable, it reduces the length of the preceding sounds.


Bilabial nasal /m/

Alveolar nasal: /n/

Velar nasal: /ŋ/ (spellings: ‘ng’, or ‘n’ followed by a letter representing a velar consonant, e.g. tongue, anxious)

distribution: word medial: singer, hanger, anxiety; word medial + g: finger, angle, angry, hunger; word medial + k: anchor, monkey, donkey; word final: sing, wrong, tongue; word final + k: sink, rank; word final syllabic: bacon, taken, organ

Roach: rules for the pronunciation of the ‘nk’ and ‘ng’ digraphs:

– in ‘nk’ the /k/ is always pronounced

– in ‘ng’ the following /g/ is pronounced in mono-morphemic words (finger, anger, linger) and in comparatives & superlatives of adjectives (younger, the longest); otherwise the /g/ following the /ŋ/ is never pronounced!!!
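The ‘nk’/‘ng’ rules can be written as a small decision function. It is a sketch under two assumptions: the morphological facts (whether the word is a single morpheme, whether it is a comparative or superlative) are supplied by hand, and the word-final case (sing, wrong, tongue, with no /g/) is taken from the distribution list above. The function name `nasal_cluster` is hypothetical.

```python
def nasal_cluster(digraph: str, word_final: bool, monomorphemic: bool,
                  comparative_or_superlative: bool = False) -> str:
    """Pronunciation of spelled 'nk' / 'ng' following Roach's rules as summarized above.

    word_final: the digraph ends the word (sing, wrong).
    monomorphemic: the word is a single morpheme (finger, anger).
    comparative_or_superlative: an adjective form such as younger, longest.
    """
    if digraph == "nk":
        return "ŋk"  # the /k/ is always pronounced
    if digraph != "ng":
        raise ValueError("expected 'nk' or 'ng'")
    if word_final:
        return "ŋ"   # sing, wrong: no /g/
    if monomorphemic or comparative_or_superlative:
        return "ŋg"  # finger, anger; younger, longest
    return "ŋ"       # morpheme boundary after 'ng': singer, hanger


for word, args in [
    ("sink", ("nk", True, True)),
    ("finger", ("ng", False, True)),
    ("singer", ("ng", False, False)),
    ("younger", ("ng", False, False, True)),
    ("sing", ("ng", True, True)),
]:
    print(word, "/" + nasal_cluster(*args) + "/")
```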


Articulatory features: articulated by means of a partial closure, on one or both sides of which the air-stream is able to escape through the mouth. Only one, alveolar, lateral consonant occurs in E. Within the /l/ phoneme 3 main variants occur:

a/ clear /l/, with a relatively front vowel resonance, before vowels and /j/ – Roach – /i/ resonance

b/ voiceless /l/ – following accented (aspirated) /p, k/ (less marked devoicing after /f, s, θ/, or after weakly accented /p, t, k/)

c/ dark /l/, with a relatively back vowel resonance, finally after a vowel, before a consonant, and as a syllabic sound following a consonant – Roach – /u/ resonance

Clear /l/ – the front of the tongue is raised in the direction of the hard palate at the same time as the tip contact is made. Dark /l/ – the front of the tongue somewhat depressed and the back raised in the direction of the soft palate.
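The choice among the three main variants of /l/ can be sketched as a function of the neighboring segments. Assumptions: segments are RP phoneme symbols with each long vowel or diphthong written as one item, the caller states whether a preceding /p/ or /k/ is accented, and the weaker devoicing after /f, s, θ/ or weakly accented plosives is ignored. `l_variant` is a hypothetical name.

```python
from typing import Optional

# RP vowel and diphthong symbols, each treated as one segment.
VOWELS = {"iː", "ɪ", "e", "æ", "ʌ", "ɑː", "ɒ", "ɔː", "ʊ", "uː", "ɜː", "ə",
          "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


def l_variant(prev: Optional[str], nxt: Optional[str], prev_accented: bool = False) -> str:
    """Choose among the clear, voiceless and dark variants of /l/.

    prev, nxt: neighboring segments (None at a word boundary).
    prev_accented: whether a preceding /p/ or /k/ begins an accented syllable.
    """
    if prev in {"p", "k"} and prev_accented:
        return "voiceless"  # after accented (aspirated) /p, k/: play, clean
    if nxt is not None and (nxt in VOWELS or nxt == "j"):
        return "clear"      # before a vowel or /j/: leaf
    return "dark"           # finally, before a consonant, or syllabic: feel, milk, bottle


print(l_variant(None, "iː"))                     # leaf  -> clear
print(l_variant("p", "eɪ", prev_accented=True))  # play  -> voiceless
print(l_variant("iː", None))                     # feel  -> dark
print(l_variant("ɪ", "k"))                       # milk  -> dark
```

The voiceless check comes first because in words like play the /l/ is both after an accented /p/ and before a vowel, and the devoicing is what the notes single out.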



Distribution: word initial (red, raw), word medial, intervocalic (mirror, very), word final as linking /r/ (far away, poor old man); in consonant clusters (price, crow)

Manner & place of articulation: the tip of the tongue held in a position near to, but not touching, the rear part of the alveolar ridge. Lip position – according to the following vowel.

BBC /r/ distribution: only before a vowel.
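The restriction of BBC /r/ to prevocalic position, including linking /r/ across a word boundary, can be sketched as a filter over phoneme lists. The sketch assumes each word is supplied with an underlying /r/ wherever one could be pronounced (e.g. far as /f ɑː r/); it only removes /r/ and never inserts one. The function name `realise_r` is hypothetical.

```python
# RP vowel and diphthong symbols, each treated as one segment.
VOWELS = {"iː", "ɪ", "e", "æ", "ʌ", "ɑː", "ɒ", "ɔː", "ʊ", "uː", "ɜː", "ə",
          "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


def realise_r(words: list) -> list:
    """Keep an underlying /r/ only if the next segment, even across a word boundary, is a vowel."""
    segments = [(w, seg) for w, word in enumerate(words) for seg in word]
    output = [[] for _ in words]
    for i, (w, seg) in enumerate(segments):
        nxt = segments[i + 1][1] if i + 1 < len(segments) else None
        if seg == "r" and nxt not in VOWELS:
            continue  # not before a vowel: /r/ is not pronounced in BBC pronunciation
        output[w].append(seg)
    return output


far = ["f", "ɑː", "r"]
away = ["ə", "w", "eɪ"]
print(realise_r([far]))        # [['f', 'ɑː']]
print(realise_r([far, away]))  # [['f', 'ɑː', 'r'], ['ə', 'w', 'eɪ']]
```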

/j/ – palatal

/w/ – labio-velar

English vowels: short vowels & long vowels

Vowels are specified in terms of 3 parameters:

– vertical tongue position (high – low; close – half-close – half-open – open)

– horizontal tongue position (front – back)

– lip-position (unrounded – rounded)

In accented syllables the so-called long vowels are fully long when they are final or in a syllable closed by a voiced consonant, but they are considerably shortened when they occur in a syllable closed by a voiceless consonant. The same considerable shortening before fortis consonants applies also to the diphthongs.
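The shortening rule depends only on what closes the syllable. In this sketch the nucleus and the closing consonants are supplied as phoneme symbols, and "shortened" and "fully long" are relative labels rather than measured durations; `nucleus_length` and the symbol sets are illustrative.

```python
# Fortis (voiceless) consonants that can close a syllable.
FORTIS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}

# RP long vowels and diphthongs.
LONG_AND_DIPHTHONGS = {"iː", "ɑː", "ɔː", "uː", "ɜː",
                       "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


def nucleus_length(nucleus: str, coda: list) -> str:
    """Relative length of a long vowel or diphthong in an accented syllable."""
    if nucleus not in LONG_AND_DIPHTHONGS:
        raise ValueError("the rule is stated for long vowels and diphthongs only")
    if coda and coda[0] in FORTIS:
        return "shortened"   # closed by a voiceless (fortis) consonant: beat, rice
    return "fully long"      # final, or closed by a voiced consonant: bee, bead, rise


for word, nucleus, coda in [("bee", "iː", []), ("bead", "iː", ["d"]), ("beat", "iː", ["t"]),
                            ("rise", "aɪ", ["z"]), ("rice", "aɪ", ["s"])]:
    print(word, nucleus_length(nucleus, coda))
```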


The sequences of vocalic elements included under the term ‘diphthong’ are those which form a glide within one syllable. They have a 1st element (the starting point) and a 2nd element (the point in the direction of which the glide is made).

BBC diphthongs:

1st element is in the general region of /ɪ, e, a, ʊ, ɔ, ə/

2nd element is in the general region of /ɪ, ʊ, ə/; hence the division into closing diphthongs (gliding towards /ɪ, ʊ/) and centering diphthongs (gliding towards /ə/).

Generalizations referring to all RP diphthongs:

1/ Most of the length and stress associated with the glide is concentrated on the 1st element, while the 2nd is only lightly sounded; in Slovak the ratio between the lengths of the two elements is 1:1, in English it is approximately 2:1;

2/ They are equivalent in length to long vowels and are subject to the same variations in length; in the reduced forms there is a considerable shortening of the 1st element;

3/ No diphthong occurs before /ŋ/.
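The 2:1 (English) versus 1:1 (Slovak) proportion in the first generalization is simple arithmetic on the total duration of a diphthong. The 300 ms total below is an arbitrary example value, not a measured duration, and `element_durations` is an illustrative name.

```python
def element_durations(total_ms: float, ratio: tuple = (2, 1)) -> tuple:
    """Split a diphthong's duration between its 1st and 2nd elements in the given ratio."""
    first, second = ratio
    whole = first + second
    return total_ms * first / whole, total_ms * second / whole


# 300 ms is an arbitrary example duration, not a measured value.
print(element_durations(300))          # (200.0, 100.0)  English, approximately 2:1
print(element_durations(300, (1, 1)))  # (150.0, 150.0)  Slovak, 1:1
```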

Phonetics & Phonology

Phonetics & phonology are the two linguistic sciences investigating the phonic aspect of language communication and its generalization in the minds of language users. Phonetics investigates the phonic material of speech (the sounds). Speech sounds are analyzed from two aspects:

Aspect of the speaker/producer;

Aspect of the listener/perceiver.

The former aspect covers the activity of the articulatory organs; the latter covers the transmission of acoustic entities perceived by the listener and the process of decoding. According to the subject of investigation, phonetics is further subdivided into:

i/ genetic/articulatory – production of speech sounds

ii/ acoustic – transmission of sounds, acoustic characteristics of speech sounds

iii/ auditory – perception of speech sounds

Phonology – studies speech sounds from the aspect of the function they fulfill within a linguistic system: how they are organized into systems, how they are utilized in languages and what the relationships among them are.

We can divide speech up into segments, and we find great variety in the way these segments are made (their pronunciation differs from speaker to speaker, and even the same speaker never pronounces the same segment in exactly the same way). But there is an abstract set of units underlying our speech; otherwise we would not be able to understand other speakers of the same language, and communication among people would be impossible. These units are called phonemes, and the complete set of these units is called the phonemic system of the language. The phonemes themselves are abstract (the sound patterns stored in our mental grammar): we do not produce phonemes, we produce sounds, or phones. “Phonemes are the minimal sequential contrastive units of the phonology of languages” (Catford).

– contrastive: phonemes are contrastive in the sense that they are the bits of sound that distinguish one word from another: bit and pit are distinguished solely by the contrast between their initial consonants /b/ and /p/. The bits of sound manifesting these contrasts are phonemes.

– minimal: phonemes are minimal units, because if you take a stretch of speech and chop it up into a sequence of phonological units, the shortest stretch of speech sounds that functions as a contrastive unit in the buildup of the phonological forms of words is the phoneme.

The phonological structure of English, like that of other languages, can be described as a hierarchy of units. The largest, or most inclusive, unit in English is the intonation contour, or tone-group: Jane was here yesterday. We can chop up each tone-group into smaller units, namely into successive rhythmic units, or feet (that feet are contrastive, i.e. meaning-differentiating, units is demonstrated by the fact that we could divide the utterance into feet differently, and this would convey a slightly different meaning). Next, we can divide each foot into still smaller chunks, namely into a sequence of syllables. Finally, we can divide up each syllable into a sequence of still smaller units – and here it is necessary for us to go into phonetic transcription. At this point we can do no further chopping: we have reached the lowest rank in the phonological hierarchy, the smallest sequential, or linear, units – phonemes. There are no smaller meaning-differentiating units.

– sequential: following in sequence. A phoneme is an abstract unit operating on the level of language as a system.
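The contrastive and minimal properties can be made operational as a minimal-pair test: two transcriptions that differ in exactly one phoneme in the same position. This sketch assumes words are given as lists of phoneme symbols (so /tʃ/ or /iː/ count as one unit) and considers only substitutions, not pairs that differ by an added or missing phoneme; `is_minimal_pair` is an illustrative name.

```python
def is_minimal_pair(a: list, b: list) -> bool:
    """True if two transcriptions differ in exactly one phoneme at the same position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


bit = ["b", "ɪ", "t"]
pit = ["p", "ɪ", "t"]
print(is_minimal_pair(bit, pit))  # True: /b/ and /p/ contrast
print(is_minimal_pair(bit, bit))  # False: no contrast at all
```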

Symbols & Transcription

Types of Transcription


Phonological transcription, phonemic transcription:

The choice of symbols is limited to one symbol per phoneme.


Phonetic transcription:

Very detailed; each individual realization of a sound is recorded.

The Syllable

Human beings cannot produce a sound smaller than a syllable. The syllable seems to be the essential unit of speech segmentation and speech recognition.

J. Laver’s definition of the phonological syllable is as follows: “The syllable is a complex unit, made up of nuclear and marginal elements. Nuclear elements are vowels, and marginal elements are consonants.”

A/ Languages differ in syllable types:

The minimum syllable: V (e.g. I, oh);

CV (consonantal beginning – an onset): (e.g. me) – open syllable;

VC (consonantal end – a coda): (e.g. am) – closed syllable;

Some syllables have both onset and coda: (e.g. him).

The most common type of syllable among the languages of the world is CV; CVC is also common. English has a wide variety of syllable types, both open and closed.

B/ Languages also differ in the constraints on the segments which can occur at the beginning or end of a syllable. No syllable in E can begin with /ŋ/; /ʒ/ and /ʊ/ are rare at the beginning of a syllable. Almost any consonant can occur in syllable-final position, except for /h/, /j/, /w/ and /r/ (final /r/ occurs only in rhotic accents).

Syllable types in E:

– Beginning: a vowel (see the constraints above), one, two or three consonants.

– Ending: a vowel, one, two, three or four consonants.

Syllable structures in English:

Beginning: a vowel – zero onset (/ʊ/ – rare); a consonant – except for /ŋ/, /ʒ/ – having an onset; two or more consonants – a consonant cluster.

C/ Initial two-consonant clusters:

i/ pre-initial /s/ is followed by one of about 10 initial consonants (p, t, k; f; m, n; l; w, j, r); with /l, r, w, j/ a two-way analysis is possible (e.g. slow, sky, swim);

ii/ initial (p, t, k; b, d, g; f, θ, s, h, v; m, n; l) followed by a post-initial /l, r, w, j/ (e.g. proud, queen, friend).

D/ Initial three-consonant clusters:

There is a clear relationship between the two groups:

/s/ is the pre-initial, /p, t, k/ are the initial and /l, r, w, j/ are the post-initial consonants (e.g. split, square, strike).
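The cluster templates in C/ and D/ can be checked mechanically. The sketch below returns every analysis the templates allow, so /sl/ or /sw/ receive both of the two possible analyses mentioned in C/ i/. Note that the templates over-generate: /stl/, for example, fits the three-consonant template but does not begin any English word. The set names and `onset_analyses` are illustrative.

```python
S_FOLLOWERS = {"p", "t", "k", "f", "m", "n", "l", "w", "j", "r"}  # after pre-initial /s/
INITIALS = {"p", "t", "k", "b", "d", "g", "f", "θ", "s", "h", "v", "m", "n", "l"}
POST_INITIALS = {"l", "r", "w", "j"}


def onset_analyses(cluster: list) -> list:
    """All analyses of a two- or three-consonant onset allowed by the templates in C/ and D/."""
    found = []
    if len(cluster) == 2:
        first, second = cluster
        if first == "s" and second in S_FOLLOWERS:
            found.append("pre-initial + initial")
        if first in INITIALS and second in POST_INITIALS:
            found.append("initial + post-initial")
    elif len(cluster) == 3:
        first, second, third = cluster
        if first == "s" and second in {"p", "t", "k"} and third in POST_INITIALS:
            found.append("pre-initial + initial + post-initial")
    return found


print(onset_analyses(["s", "k"]))       # ['pre-initial + initial']
print(onset_analyses(["s", "l"]))       # both analyses (the two-way case)
print(onset_analyses(["s", "t", "r"]))  # ['pre-initial + initial + post-initial']
```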

E/ Final consonant clusters:

No final consonant means that there is no coda, i.e. it is an open syllable. One consonant means that the syllable is closed. Any consonant except for /h/, /w/, /j/, /r/ can occur in syllable-final position.

F/ Two-consonant clusters:

i/ pre-final (m, n, ŋ, l, s) is followed by a final (e.g. bend, bench, ask);

ii/ a final consonant is followed by a post-final /s, z, t, d, θ/ (e.g. fifth, bets, robbed). The post-final consonant often corresponds to a separate morpheme. Pronunciation: the first plosive of a plosive + plosive cluster is usually released without audible plosion and is therefore practically inaudible.

G/ Final three-consonant clusters:

i/ pre-final + final + post-final: (e.g. helped, twelfth);

ii/ final + post-final 1 + post-final 2 (e.g. fifths, next);

H/ Final four-consonant clusters:

i/ pre-final + final + post-final 1 + post-final 2 (e.g. twelfths);

ii/ final + post-final 1 + post-final 2 + post-final 3 (e.g. sixths).

The syllable: onset + rhyme; the rhyme: peak + coda.
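The onset/rhyme/peak/coda structure, the CV patterns from A/ and the limits from "Syllable types in E" can be combined in a short sketch. It assumes a single syllable supplied as a list of RP phoneme symbols, with each long vowel or diphthong as one item and exactly one vowel acting as the peak; syllabic consonants (as in bottle) are not handled. All names are illustrative.

```python
# RP vowel and diphthong symbols, each treated as one segment.
VOWELS = {"iː", "ɪ", "e", "æ", "ʌ", "ɑː", "ɒ", "ɔː", "ʊ", "uː", "ɜː", "ə",
          "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


def split_syllable(segments: list) -> tuple:
    """Return (onset, peak, coda); the rhyme is peak + coda."""
    peaks = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if len(peaks) != 1:
        raise ValueError("expected exactly one vowel to act as the peak")
    i = peaks[0]
    return segments[:i], segments[i], segments[i + 1:]


def cv_pattern(segments: list) -> str:
    """C for each consonant, V for the vowel, e.g. 'CVC'."""
    return "".join("V" if seg in VOWELS else "C" for seg in segments)


def analyse(segments: list) -> dict:
    onset, _peak, coda = split_syllable(segments)
    return {
        "pattern": cv_pattern(segments),
        "open": not coda,  # no coda means an open syllable
        "within_english_limits": len(onset) <= 3 and len(coda) <= 4,
    }


print(analyse(["m", "iː"]))                     # me: CV, open
print(analyse(["æ", "m"]))                      # am: VC, closed
print(analyse(["s", "ɪ", "k", "s", "θ", "s"]))  # sixths: CVCCCC, closed, four-consonant coda
```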

Difficulties encountered by foreign learners:

Unknown consonant clusters: usually two strategies are applied:

i/ vowels between the consonants are inserted;

ii/ one of the consonants is deleted.

This is a simplification of the syllable structure of the E word, making it conform to the patterns of the learners’ native languages. Deletion also exists in E, but such deletions do not occur randomly.

Stress in English

The syllable or syllables which stand out from the other syllable or syllables of a word are said to be stressed, or to receive the stress.

Gimson: a stressed syllable – the one upon which there is relatively great breath effort and muscular energy.

As for perception, stressed syllables are perceived as more prominent because they are louder and longer, are pronounced on a pitch different from that of the other syllables, and contain a vowel sound differing in quality from the neighboring vowels. The most powerful effect is produced by pitch; length comes second, while loudness and vowel quality are less important.

Types of stress: (Kenworthy):

Three levels of stress: primary, secondary, tertiary are heard in long E words:

i/ when said in isolation;

ii/ when the word is in a position in a sentence where it is very strongly stressed;

iii/ when full vowels are used.

Placement of stress:

English stress is:

A/ Variable, i.e. the main stress is not tied to any particular syllable (in Slovak stress has a delimitative function, i.e. it marks word boundaries; in E it has a distinctive function, i.e. it differentiates the meanings of words);

B/ Fixed, i.e. the main stress always falls on a particular syllable in any given word;

C/ Mobile, i.e. the stress may fall on different syllables in different forms of the same word. Having become familiar with one form of a word, learners tend to assume that the stress stays on the same syllable in other forms of the word (or that prefixes and suffixes make no difference to the placement of the stress), but this is not the case in English (e.g. photography – photographer – photograph; advertising – advertiser – advertisement; librarianship – librarian – library).

Word Stress Rules

When considering the stress placement several factors should be taken into account:

the structure of the word (whether the word in question is simple or complex);

the grammatical category of the word (noun, adjective or verb)

the number of the syllables in the word;

the phonological structure of the syllables.

1. The structure of the word:

Simple word – not consisting of more than one grammatical unit – morpheme (although this is sometimes difficult to decide);

Complex words – two major types:

i/ words made from a basic stem word with the addition of an affix (derived words). Affixes are of two sorts, prefixes and suffixes, and they have three possible effects on word stress: the affix itself receives the primary stress (e.g. -ee, -ese); the affix does not influence the placement of stress (e.g. -ing: the word is stressed just as if the affix were not there); the stress remains on the stem, not on the affix, but is shifted to a different syllable (e.g. ‘magnet – mag’netic).

ii/ compound words – made of two (or occasionally more) independent E words. There is no clear dividing line between two-word compounds and pairs of words that simply happen to occur together quite frequently.

Spelling is inconsistent: solid (one word, e.g. sunflower); words separated by a hyphen (e.g. fruit-cake, whistle-blower, cabinet-maker); two words separated by a space (e.g. coffee table, tax inspector, weather forecast).

Word stress rules

2. The number of the syllables & syllable structure


Two-syllable words:

Verbs: Oo – 60%. If the second syllable contains a long vowel/diphthong, or if it ends with more than one consonant, the second syllable is stressed (e.g. apply, attract, achieve). If the final syllable contains a short vowel and one or no final consonant, or the diphthong /əʊ/, the first syllable is stressed (e.g. enter, open, follow).

Adjectives: follow the verbs (e.g. lovely, even, hollow, divine, correct).

Nouns: Oo – 90%. If the second syllable contains a short vowel, the stress will usually come on the first syllable (e.g. table, sofa, picture); otherwise it will be on the second syllable (e.g. estate, balloon).

Adverbs, Prepositions: behave like verbs and adjectives (e.g. evenly, correctly).
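The two-syllable rules can be sketched as a function that inspects only the second syllable. Assumptions: the caller supplies each syllable's vowel (as an RP symbol) and the number of consonants closing it; "strong" here simply means a long vowel/diphthong or more than one closing consonant; the syllable divisions in the examples are illustrative. The names `Syllable`, `is_strong` and `two_syllable_stress` are hypothetical.

```python
from dataclasses import dataclass

# RP long vowels and diphthongs; every other vowel symbol counts as short.
LONG_AND_DIPHTHONGS = {"iː", "ɑː", "ɔː", "uː", "ɜː",
                       "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


@dataclass
class Syllable:
    vowel: str          # the peak, e.g. "ə", "iː", "aɪ"
    coda_size: int = 0  # number of consonants closing the syllable


def is_strong(syl: Syllable) -> bool:
    """Long vowel/diphthong, or more than one closing consonant."""
    return syl.vowel in LONG_AND_DIPHTHONGS or syl.coda_size > 1


def two_syllable_stress(word_class: str, syllables: list) -> int:
    """0-based index of the stressed syllable in a two-syllable simple word."""
    if len(syllables) != 2:
        raise ValueError("expected exactly two syllables")
    second = syllables[1]
    if word_class in ("verb", "adjective", "adverb", "preposition"):
        if second.vowel == "əʊ":
            return 0  # follow, hollow
        return 1 if is_strong(second) else 0
    if word_class == "noun":
        return 0 if second.vowel not in LONG_AND_DIPHTHONGS else 1
    raise ValueError(f"no rule given for {word_class!r}")


examples = {
    "apply (v)": ("verb", [Syllable("ə"), Syllable("aɪ")]),
    "attract (v)": ("verb", [Syllable("ə"), Syllable("æ", 2)]),
    "enter (v)": ("verb", [Syllable("e", 1), Syllable("ə")]),
    "follow (v)": ("verb", [Syllable("ɒ"), Syllable("əʊ")]),
    "table (n)": ("noun", [Syllable("eɪ"), Syllable("ə", 1)]),
    "balloon (n)": ("noun", [Syllable("ə"), Syllable("uː", 1)]),
}
for word, (cls, syls) in examples.items():
    print(word, "-> stress on syllable", two_syllable_stress(cls, syls) + 1)
```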


Three-syllable words:

Verbs: oOo, ooO. If the last syllable contains a short vowel and ends with not more than one consonant, that syllable will be unstressed and the stress will be placed on the preceding (penultimate) syllable (e.g. encounter, determine). If the last syllable contains a long vowel/diphthong, or ends with more than one consonant, that final syllable will be stressed (e.g. entertain, resurrect).

Nouns: Ooo, oOo. If the final syllable contains a short vowel and the middle syllable contains a short vowel and ends with no more than one consonant, both the final and the middle syllables will be unstressed and the first syllable will be stressed (e.g. library). If the final syllable contains a short vowel or /əʊ/, it is unstressed; if, in addition, the middle syllable contains a long vowel/diphthong or ends with more than one consonant, the middle syllable will be stressed (e.g. potato, mimosa, disaster).

Adjectives: follow the nouns (e.g. derelict, insolent).
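The three-syllable rules can be sketched the same way. As before, this is an illustration under stated assumptions: syllables are supplied by hand as a vowel symbol plus the number of closing consonants, "strong" means a long vowel/diphthong or more than one closing consonant, and the function returns None for the noun/adjective case the rules above do not cover (a final long vowel or diphthong other than /əʊ/). The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# RP long vowels and diphthongs; every other vowel symbol counts as short.
LONG_AND_DIPHTHONGS = {"iː", "ɑː", "ɔː", "uː", "ɜː",
                       "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


@dataclass
class Syllable:
    vowel: str          # the peak, e.g. "ə", "iː", "aɪ"
    coda_size: int = 0  # number of consonants closing the syllable


def is_strong(syl: Syllable) -> bool:
    """Long vowel/diphthong, or more than one closing consonant."""
    return syl.vowel in LONG_AND_DIPHTHONGS or syl.coda_size > 1


def three_syllable_stress(word_class: str, syllables: list) -> Optional[int]:
    """0-based index of the stressed syllable, or None if the rules above do not decide."""
    if len(syllables) != 3:
        raise ValueError("expected exactly three syllables")
    _, middle, last = syllables
    if word_class == "verb":
        return 2 if is_strong(last) else 1
    if word_class in ("noun", "adjective"):
        # a final short vowel or /əʊ/ leaves the last syllable unstressed
        if last.vowel not in LONG_AND_DIPHTHONGS or last.vowel == "əʊ":
            return 1 if is_strong(middle) else 0
        return None
    raise ValueError(f"no rule given for {word_class!r}")


examples = {
    "encounter (v)": ("verb", [Syllable("ɪ"), Syllable("aʊ", 1), Syllable("ə")]),
    "entertain (v)": ("verb", [Syllable("e"), Syllable("ə"), Syllable("eɪ", 1)]),
    "library (n)": ("noun", [Syllable("aɪ"), Syllable("ə"), Syllable("i")]),
    "potato (n)": ("noun", [Syllable("ə"), Syllable("eɪ"), Syllable("əʊ")]),
    "insolent (adj)": ("adjective", [Syllable("ɪ", 1), Syllable("ə"), Syllable("ə", 2)]),
}
for word, (cls, syls) in examples.items():
    print(word, "-> stress on syllable", three_syllable_stress(cls, syls) + 1)
```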


Prefixes – stress is governed by the same rules as in words without prefixes. Suffixes may be:

carrying stress themselves (e.g. -ee, -ese, -eer, -ette);

not affecting the stress placement (e.g. -able, -al, -ful, -less);

influencing the stress in the stem (e.g. -ive, -ic, -ion, -ious); with some endings the stress falls on the antepenultimate syllable, i.e. the third syllable from the end: -phy (e.g. photography), -cy (e.g. democracy), -ty (e.g. reliability), -gy (e.g. prodigy), -al (e.g. critical).
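The suffix groups above can be sketched as a lookup that returns the position of the stressed syllable. It is a simplification under stated assumptions: the caller gives the total number of syllables, the stress-carrying suffixes and -ic/-ion are treated as single syllables, and -al, -ive and -ious are left out to keep the sketch short. The set names and `suffix_stress` are illustrative.

```python
from typing import Optional

STRESS_CARRYING = {"ee", "ese", "eer", "ette"}  # the suffix itself takes the stress
NEUTRAL = {"able", "ful", "less", "ing"}        # stress stays where it is in the stem
BEFORE_SUFFIX = {"ic", "ion"}                   # stress on the syllable just before the suffix
ANTEPENULTIMATE = {"phy", "cy", "ty", "gy"}     # stress on the third syllable from the end


def suffix_stress(n_syllables: int, suffix: str) -> Optional[int]:
    """0-based index of the stressed syllable in a suffixed word.

    Returns None for neutral suffixes, where the stem's own stress pattern applies.
    """
    if suffix in NEUTRAL:
        return None
    if suffix in STRESS_CARRYING:
        index = n_syllables - 1
    elif suffix in BEFORE_SUFFIX:
        index = n_syllables - 2
    elif suffix in ANTEPENULTIMATE:
        index = n_syllables - 3
    else:
        raise KeyError(f"suffix {suffix!r} is not covered by this sketch")
    if index < 0:
        raise ValueError("word too short for this suffix rule")
    return index


for word, n, suffix in [("photography", 4, "phy"), ("democracy", 4, "cy"),
                        ("reliability", 6, "ty"), ("magnetic", 3, "ic"),
                        ("employee", 3, "ee"), ("careless", 2, "less")]:
    print(word, suffix_stress(n, suffix))
```

For photography the result 1 points to the second syllable (pho-TOG-ra-phy), and for magnetic it reproduces the shift ‘magnet – mag’netic mentioned earlier.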



Two noun elements: the first element – stressed (e.g. typewriter, suitcase, sunflower).


Adjective + -ed: the second element is stressed (e.g. bad-tempered); first element – a number: the second element is stressed (e.g. first-class, five-finger, three-wheeler)

Adverbs, Verbs: usually stressed on the final element (e.g. ill-treat, down-stream, North-East, half-timbered).

Note: What a beautiful black bird! (black ‘bird) Look at that big blackbird! (‘blackbird)

Stress tends to go on syllables containing a long vowel/diphthong and/or ending with more than one consonant.

3. The grammatical category of the word – nouns, adjectives, verbs. Compound nouns are usually stressed on the first element (e.g. summertime, grandfather, silverware, schoolteacher, bathtub). When the second element is a polysyllabic word it retains its stress pattern, but in more rapid speech the (secondary) stress of the second element may be lost (e.g. trade exhibition).

Exceptions – some compound nouns take late stress:

The first element denotes the material or ingredient the second element is made of (e.g. plastic ‘cup, turkey ‘sandwich, cherry ‘pie). Compounds containing juice or cake, however, take early stress!!! (e.g. ‘fruit juice, ‘fruit cake, ‘lemon juice).

Names of squares and roads – thoroughfares (roads for public traffic, e.g. Walnut ‘Avenue, Cambridge ‘Crescent, Belgrade ‘Square, Oxford ‘Drive). Those containing street, however, have an early stress (e.g. ‘Baker Street).

The first element identifies a place or a time (e.g. town ‘hall, kitchen ‘window, summer ‘holiday, London ‘transport, April ‘showers).
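The late-stress exceptions can be sketched as a function that places the stress mark ‘ before the stressed element. The semantic judgments (whether the first element is a material, a place or time, or whether the compound names a thoroughfare) are assumed to be supplied by hand as flags; the function name and the flags are hypothetical.

```python
# Second elements that keep early stress even within the late-stress groups above.
EARLY_HEADS = {"juice", "cake", "street"}


def compound_stress(first: str, second: str, *, material: bool = False,
                    place_or_time: bool = False, thoroughfare: bool = False) -> str:
    """Return the compound with ‘ before the stressed element (default: early stress)."""
    late = (material or place_or_time or thoroughfare) and second.lower() not in EARLY_HEADS
    return f"{first} ‘{second}" if late else f"‘{first} {second}"


print(compound_stress("plastic", "cup", material=True))          # plastic ‘cup
print(compound_stress("fruit", "juice", material=True))          # ‘fruit juice
print(compound_stress("Walnut", "Avenue", thoroughfare=True))    # Walnut ‘Avenue
print(compound_stress("Baker", "Street", thoroughfare=True))     # ‘Baker Street
print(compound_stress("town", "hall", place_or_time=True))       # town ‘hall
print(compound_stress("coffee", "table"))                        # ‘coffee table
```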

Sentence Stress

In sequences (sentence, clause, discourse) not all the words are equally important, which in E is shown by means of sentence stress and sentence focus. Why is it important?

As listeners, we must be able to spot points of importance in the stream of speech;

As speakers, we must highlight points in our messages, or E listeners will have difficulty in interpreting what they hear, in deciding how it relates to what has just been said and in predicting what the speaker is leading up to. Thus sentence stress and sentence focus are vital for intelligibility. The placement of sentence stress is closely related to the function a word fulfills within a sentence. According to their function, the words in E are divided into:

Content words/lexical words;

Grammar words/function words/structure words.

The former bear lexical meaning, while the latter are structural markers denoting grammatical categories and syntactic relations. The classes appear to have physiological and neurological validity: some brain-damaged persons have greater difficulty in using, understanding or reading structure words than content words (e.g. in – inn; which – witch). Content words normally carry most of the information. In connected speech they generally keep the qualitative (vowel-quality) pattern of their isolated form and therefore retain some measure of qualitative prominence even when no pitch prominence is associated with them and they are relatively unstressed.
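The content/structure distinction gives a first approximation of where sentence stress can fall. The sketch below marks content words in upper case; its function-word list is a small, illustrative assumption rather than a complete inventory, and it ignores cases where a structure word is stressed for contrast or emphasis. `mark_content_words` is a hypothetical name.

```python
import re

# A small, illustrative (not exhaustive) set of English structure words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "in", "on", "at", "to", "for",
    "from", "with", "by", "is", "are", "was", "were", "be", "been", "am",
    "have", "has", "had", "do", "does", "did", "can", "could", "will", "would",
    "shall", "should", "must", "may", "might", "i", "you", "he", "she", "it",
    "we", "they", "me", "him", "her", "us", "them", "my", "your", "his", "its",
    "our", "their", "this", "that", "which", "who", "there",
}


def mark_content_words(sentence: str) -> list:
    """Upper-case the content words, the default candidates for sentence stress."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [w.upper() if w.lower() not in FUNCTION_WORDS else w for w in words]


print(mark_content_words("Jane was here yesterday"))
# ['JANE', 'was', 'HERE', 'YESTERDAY']
```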

Structure words do not carry so much information. They do not have a dictionary meaning in the way we normally expect nouns, verbs, adjectives and adverbs to have one. All structure words


