Objective Acoustic Phonetic Speech Analysis Biology Essay


Speech impairment often occurs in patients after treatment for head and neck cancer. New treatment modalities such as surgical reconstruction or (chemo)radiation techniques aim at sparing anatomical structures that are involved in speech and swallowing. In randomized trials investigating the efficacy of various treatment modalities or speech rehabilitation, objective speech analysis techniques may help improve speech outcome assessment. The goal of the present study is to investigate the role of objective acoustic-phonetic analyses in a multidimensional speech assessment protocol. Speech recordings of 51 patients (6 months after reconstructive surgery and postoperative radiotherapy for oral or oropharyngeal cancer) and of 18 control speakers were subjectively evaluated regarding intelligibility, nasal resonance and articulation. Patient-reported speech outcome was assessed with the speech subscale of the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Head and Neck 35 module. Acoustic-phonetic analyses were performed to calculate formant values of the vowels /a, i, u/, vowel space, air pressure release of /k/ and spectral slope of /x/. Intelligibility, articulation, and nasal resonance were best predicted by vowel space and /k/. Within patients, /k/ and /x/ differentiated tumour site and stage. Various objective speech parameters were related to speech problems as reported by patients. Objective acoustic-phonetic analysis of the speech of patients is feasible and contributes to further development of a speech assessment protocol.

Introduction

Tumours in the oral cavity and oropharynx may damage various anatomical structures, both through tumour extension and through treatment. Patients often report reduced use of the perioral muscles and of speech organs such as the lips, tongue and velum, which frequently causes speech difficulty and related problems, for example in social activities. These problems may ultimately have a negative impact on health-related quality of life.1 Health-related quality of life deteriorates significantly during the first 6 months after completion of treatment and may improve by 12 months after treatment. Functionality of the head and neck area frequently remains below pretreatment level.2

Speech quality after treatment appears to be highly dependent on tumour size and site.3-9 Patients treated for larger tumours experience more difficulty with speech than those with smaller tumours. Treatment for an oral tumour often results in articulation difficulties due to tissue loss and structural alteration of various speech organs, while speech problems in patients treated for oropharyngeal cancer often include nasal resonance problems due to velopharyngeal inadequacy.

In the past decades, the surgical possibilities for replacing damaged tissue in the oral cavity and oropharynx with different flaps have increased, with the aim of preventing speech and swallowing impairment. The preferred method for reconstructing larger defects in the oral cavity or oropharynx is the free flap. Free fasciocutaneous flaps are thin and pliable and are suitable for reconstruction of dynamic structures such as the tongue and pharynx.3-6 More recently, organ preservation protocols such as chemoradiation have been introduced, also aiming to prevent functional impairment. However, a recent literature review reveals that both treatment modalities, reconstructive surgery and organ preservation, still often result in speech and swallowing impairment.10 New radiation delivery techniques that spare anatomical structures involved in speech and swallowing may help prevent long-term radiation-induced functional impairment. New speech rehabilitation approaches, such as logopedic exercises at an early stage before or during radiotherapy, may also improve functional outcome.
However, prospective randomized trials are needed to provide evidence of the effectiveness of these approaches. Objective speech analysis techniques may help improve speech evaluation protocols and enable adequate speech outcome assessment in clinical trials.

Speech quality is most often assessed via subjective evaluation by listeners. Results obtained from subjective assessments reveal relations between tumour stage, intelligibility and articulation: patients with a smaller tumour (T2) have better intelligibility and articulation than patients with larger tumours (T3-T4). Nasal resonance and articulation of patients are significantly worse than in healthy individuals.9 Nasal resonance in patients treated for tumours in the oropharynx appears to be worse than in patients with oral tumours. This difference arises because the oropharyngeal area is involved in the partition between the oral and nasal cavities. When velar closure fails, air escapes through the nose, which results in hypernasal speech.11

Objective measurements of speech quality are performed less often. Acoustic-phonetic analysis of the speech signal has been shown to differentiate between healthy speakers and glossectomy patients.12 Acoustic-phonetic analyses also revealed that patients who underwent partial resection of the tongue have deviant formant values for vowels, especially for /i/.12, 13 A study using a nasometer revealed that speech of patients after reconstruction with large flaps had worse nasal resonance scores.5 The same study reported that patients with resections of more than half of the soft palate had more nasal resonance than patients with smaller resections of the soft palate.

The aim of this study is to obtain more insight into the acoustic-phonetic speech characteristics of patients after microvascular reconstructive surgery for oral or oropharyngeal cancer, regarding formant values of the vowels /a, i, u/ and the velar consonants /k/ and /x/. The second aim is to investigate the validity of objective acoustic-phonetic speech parameters. The results contribute to further development of a multidimensional speech assessment protocol that can be used in future prospective trials on the efficacy of various treatment modalities and rehabilitation for head and neck cancer.

Patients and Methods

Patients

Patients underwent treatment for advanced oral or oropharyngeal squamous cell carcinoma with microvascular soft tissue transfer for the reconstruction of surgical defects. Surgery consisted of composite resections including excision of the primary tumour with en bloc ipsilateral or bilateral neck dissection. For oropharyngeal carcinomas a paramedian mandibular swing approach was used. Defects were reconstructed with a microvascular fasciocutaneous flap; no flap failures were observed. Patients received postoperative radiotherapy in case of advanced (T3-T4) tumours, positive or close surgical margins, multiple lymph node metastases and extranodal spread. The primary site received a total dose of 56-66 Gy (2 Gy per fraction, 5 times per week), depending on surgical margins. The nodal areas received a total of 46-66 Gy (2 Gy per fraction, 5 times per week). Exclusion criteria were inability to participate in functional tests, difficulty communicating in Dutch and age above 75 years. Fifty-one patients between 23 and 73 years of age (mean: 53.8 years, SD: 8.7 years) were included in the study after giving written informed consent, as well as 18 gender- and age-matched controls (table 1).

Speech Assessment

Patients (6 months after treatment) and controls read aloud a text with an approximate duration of 60 s. The distance between lips and microphone (Sennheiser MKE 212 to 213) was 30 cm. Speech recordings were made in a soundproof cabin. For each speaker the recording level was adjusted to optimize the signal-to-noise ratio. The recorded speech was digitized with Cool Edit PRO 1.2 (Adobe Systems Incorporated, San Jose, Calif., USA), with a 22-kHz sample frequency and 16-bit resolution.

Subjective Speech Evaluation

Perceptual evaluation of speech quality comprised ratings on intelligibility, articulation and nasal resonance by two speech pathologists. To enable subjective speech evaluation, a computer program was developed to perform blinded randomized listening experiments and to automatically score intelligibility, nasality, and articulation. Intelligibility was scored using a 10-point scale, where 1 represents the worst score and 10 represents the best score and 6 is just sufficient. Articulation and nasal resonance were judged using a 4-point scale, ranging from normal to increasingly deviant speech quality. Interrater agreement for subjective assessment of intelligibility ranged from 40 to 90%. Intrarater agreement for repeated speech fragments of articulation and nasal resonance was high, with 100% equal scores between the ratings. Patient-reported speech outcome was assessed by the speech subscale (including 3 items) of the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Head and Neck 35 module. The scores were linearly transformed to a scale of 0-100, with a higher score indicating a higher level of speech problems.14
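The linear transformation of the questionnaire scores to a 0-100 scale can be illustrated with a minimal sketch. It assumes that each of the 3 speech items is answered on a 1-4 scale and that the scale score is based on the mean of the item responses; the function name and the example responses are hypothetical and are not taken from the study.

```python
def symptom_scale_score(item_responses, min_item=1, max_item=4):
    """Linearly transform the mean of raw item responses to a 0-100 scale.

    Assumption: items are answered on a min_item..max_item scale (1-4 here),
    and a higher score indicates a higher level of speech problems.
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    raw_score = sum(item_responses) / len(item_responses)
    item_range = max_item - min_item
    return (raw_score - min_item) / item_range * 100.0


# Hypothetical responses of one patient to the 3 speech items.
print(symptom_scale_score([2, 3, 1]))
```

Under these assumptions, a patient who answers every item with the lowest category scores 0 and a patient who answers every item with the highest category scores 100.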

Acoustic-Phonetic Analyses

In the present study, the vowels /a, i, u/ (the cardinal vowels in Dutch) and velar consonants were used as study material. Compared to consonants, vowels are relatively easy to identify in the speech signal and easier to analyze acoustically.

Table 1. Overview of gender, tumour site and stage of the 51 patients included in the study

                      n (%)
Gender
  Male                28 (55)
  Female              23 (45)
Tumour site
  Oral cavity         21 (41)
  Oropharynx          30 (59)
T-classification
  2                   26 (51)
  3-4                 25 (49)

In earlier studies, vowel formant analyses proved to be valid measures of speech quality in patients with deviant speech caused by oral cancer or other conditions.15, 16 Vowel identity (or its spectral colour) is characterized by acoustic correlates and is primarily determined by its formants. Broadly speaking, the first formant frequency (F1) is associated with 'height', that is, the degree of opening of the vocal tract, whereas the second formant frequency (F2) is associated with the anterior-posterior tongue position.17 Plotting the vowels /a, i, u/ in a graphical F1-F2 representation shows the vowel space (more specifically, the vowel triangle). The vertices of the vowel triangle represent the most extreme positions. The area of the vowel triangle is a measure of the amount of reduction in the vowel system and can (formally) be expressed in Hz² (see also figure 1).18

Figure 1. Vowel space of male (blue) and female (pink) patients (thick lines) and of controls (thin lines).
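The area of the vowel triangle follows from the three (F1, F2) points with the standard formula for the area of a triangle given its vertices. The sketch below is illustrative only: the function name is hypothetical and the formant values are made-up examples, not data from this study.

```python
def vowel_triangle_area(vowel_a, vowel_i, vowel_u):
    """Area of the triangle spanned by three (F1, F2) points, in Hz^2.

    Each argument is a tuple (F1, F2) in Hz. Uses the shoelace formula.
    """
    (x1, y1), (x2, y2), (x3, y3) = vowel_a, vowel_i, vowel_u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


# Hypothetical formant values (F1, F2) in Hz for one speaker.
area = vowel_triangle_area((750.0, 1300.0), (300.0, 2300.0), (320.0, 800.0))
print(area)
```

A smaller area corresponds to vowels that lie closer together in the F1-F2 plane, that is, to a more reduced vowel system.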

In addition to vowels, the velar consonants /k/ and /x/ were analyzed acoustic-phonetically, because earlier research revealed that patients with an oral or oropharyngeal tumour often have difficulty producing velar speech sounds. Speech raters often mistook /k/ for /x/.9, 11 For /k/, the duration of the air pressure release (the so-called plosive) as a percentage of the total duration (short silent period of pressure building + the pressure release) was measured and used as outcome measure. For /x/, the spectral slope was used as outcome measure. For each selected speech sound (/a, i, u, k, x/), two acoustic realizations were segmented from running speech and analyzed acoustic-phonetically using the speech processing software Praat (version 4.0.28).19 Since the acoustic realization of a speech sound may depend on its context, different phonological contexts around the target speech sounds were taken into account in order to improve generalization. A spectrogram served as a visual representation of the speech signal, which facilitated recognition of phonemes and their precise extraction from running speech. Spectral and acoustic speech analyses were performed automatically using scripts.19
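The outcome measure for /k/ can be written out as a small calculation. This is a minimal sketch under the definition given above (release duration divided by the sum of the silent pressure-building period and the release); the function name and the segment durations are hypothetical, and the actual scripts used in the study are not reproduced here.

```python
def release_percentage(closure_duration_s, release_duration_s):
    """Duration of the pressure release as a percentage of the total /k/ duration.

    closure_duration_s: silent period of pressure building, in seconds.
    release_duration_s: pressure release, in seconds.
    """
    if closure_duration_s < 0 or release_duration_s < 0:
        raise ValueError("durations must be non-negative")
    total = closure_duration_s + release_duration_s
    if total == 0:
        raise ValueError("total duration must be positive")
    return release_duration_s / total * 100.0


# Hypothetical segment durations, e.g. read from a labelled spectrogram.
print(release_percentage(0.060, 0.040))
```

Expressing the release as a percentage rather than as an absolute duration makes the measure less dependent on speaking rate.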

Statistical Analysis

Validity of objective speech analyses was tested by means of univariate Pearson correlation coefficients between the subjective speech evaluations of intelligibility, articulation and nasal resonance and objective parameters (formants of the vowels /a, i, u/, size of the vowel space, spectral slope of /x/ and duration of pressure release of /k/). To obtain insight into the role of objective parameters in predicting subjective speech evaluation, multivariate regression analyses were performed. For intelligibility and self-assessments by patients, a linear regression was used, while for articulation and nasal resonance, logistic regression was performed on a binary scale [normal (score 0) vs. deviant (scores 1-3)]. Mann-Whitney tests were performed instead of t tests due to skewed data and were used to determine the validity of the objective speech parameters regarding known group differences: patients versus controls, smaller (T2) versus larger (T3-T4) tumours, and tumour location (oral vs. oropharyngeal).
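Two of the steps described above, the Pearson correlation and the recoding of the 4-point ratings to a binary scale, can be sketched as follows. The study does not state which statistical software was used, so this is an illustration in plain Python with hypothetical function names and made-up example values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def binarize_rating(score):
    """Recode a 4-point rating (0-3) as normal (0) versus deviant (1)."""
    if score not in (0, 1, 2, 3):
        raise ValueError("rating must be 0, 1, 2 or 3")
    return 0 if score == 0 else 1


# Hypothetical example: release percentage of /k/ versus intelligibility scores.
release = [20.0, 35.0, 45.0, 50.0]
intelligibility = [5, 7, 8, 9]
print(pearson_r(release, intelligibility))
print([binarize_rating(s) for s in [0, 2, 1, 0]])
```

The binary outcome produced by the recoding is what makes logistic rather than linear regression the appropriate model for articulation and nasal resonance.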

Results

The formant values of the two realizations of each vowel were averaged, because inspection revealed no significant differences between the two realizations of the same vowel. For the velar speech sounds /k/ and /x/, however, larger differences were found, which made using the average inappropriate. Therefore, the two realizations of /k/ (/k/1 and /k/2) and of /x/ (/x/1 and /x/2) were analyzed separately and are reported separately in the results.

Objective versus Subjective Speech Assessment

Univariate correlations between subjective (self-)evaluations and objective parameters reveal that ratings of intelligibility and articulation are significantly related to objective analyses of /k/, the second formant of /i/, and the size of the vowel space (table 2).

Table 2. Pearson correlations (r) between objective speech parameters and the subjective parameters intelligibility, articulation and nasal resonance, and the EORTC H&N35 speech scale (* p<.05).

                 Intelligibility   Articulation   Nasal resonance   EORTC H&N35 speech scale
/x/2                  .12              .13             .33*                 -.02
/k/1                  .50*             .40*            .25*                 -.27
/k/2                  .36*             .25*            .39*                 -.13
/i/F1                -.23             -.19            -.42*                  .02
/i/F2                 .35*             .36*            .13                  -.24
/u/F1                -.11             -.11            -.37*                 -.12
size Δ (Hz²)          .39*             .42*            .15                  -.20

To obtain insight into which objective parameters predict subjective (self-)assessments, multiple regression analyses were performed (tables 3-6). The results reveal that /k/, F1 of /i/, and the size of the vowel triangle best predicted subjective (self-)evaluations. These results indicate adequate validity of objective speech analyses. In particular /k/, /i/, /x/ and the size of the vowel triangle contribute to the prediction of subjective evaluation by objective speech parameters.

Table 3. Prediction of intelligibility by acoustic-phonetic parameters (* p<.05). R² = 45%.

Intelligibility        b         t
/k/1                  .038      3.42*
size Δ (Hz²)         7.07       4.11*
/i/F1                -.13      -2.60*

Table 4. Prediction of articulation by acoustic-phonetic parameters (* p<.05). R² = 74%.

Articulation           b         Wald
/k/1                  .11       8.53*
/a/ F1               -.20       6.52*
/a/ F2                .01       4.59*
/i/F1                -.06       8.31*
size Δ (Hz²)        42.87       9.23*

Table 5. Prediction of nasal resonance by acoustic-phonetic parameters (* p<.05). R² = 52%.

Nasal resonance        b         Wald
/x/2                  .20       9.36*
/k/2                  .05       7.32*
/i/F1                -.03       7.67*

Table 6. Subjective and objective parameters of speech quality that are related to speech problems in daily life as reported by patients (* p<.05). R² = 45.4%.

EORTC H&N35 speech scale     b         t
/x/1                        1.11      2.29*
/i/F2                       -.04     -2.12*

Known Group Differences

To obtain insight into the predictive validity of objective speech analyses, Mann-Whitney tests were performed regarding known group differences: patients versus controls, and within the group of patients regarding tumour classification and tumour site (table 7). Significant differences between patients and controls in acoustic-phonetic parameters revealed that patients have a shorter pressure release for /k/ than controls. Patients have a higher F1 of /i/, but a lower F2 of /i/ than controls. The size of the vowel triangle is significantly smaller for patients than for controls. Acoustic-phonetic analysis also differentiated between tumour stages: patients with smaller tumours had a longer pressure release than patients with a larger tumour. Regarding tumour site, /x/ distinguished between tumour locations: patients with an oropharyngeal tumour had a steeper spectral slope than patients with an oral tumour.

Table 7. Significant differences in objective acoustic-phonetic variables between pathological and control speakers, and regarding tumour site and tumour classification, as obtained with a Mann-Whitney test. Vowels: formant values in Hz, size of the vowel triangle in Hz². Consonants: duration of the air pressure release (the so-called plosive) of /k/ as a percentage of the total duration (short silent period of pressure building + the pressure release); spectral slope for /x/.

Pathological vs. control speakers

             Z         p        Patients (sd)       Controls (sd)
/k/1        -2.77     .006      28.5% (16)          43.5% (18)
/k/2        -4.15    <.001      26.4% (19)          49.8% (17)
F1 /i/      -2.36     .018      334 Hz (54)         296 Hz (49)
F2 /i/      -2.42     .016      2105 Hz (363)       2325 Hz (248)
size Δ      -2.42     .015      .143 Hz² (.12)      .213 Hz² (.11)

Oral tumour vs. oropharyngeal tumour

             Z         p        Oral tumour (sd)    Oropharyngeal tumour (sd)
/x/         -2.24     .025      -13 (6)             -17 (6)

T2 tumour vs. T3-4 tumour

             Z         p        T2 (sd)             T3-4 (sd)
/k/         -2.09     .037      33% (17)            23% (14)
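The Z values in table 7 come from Mann-Whitney tests. As an illustration of how such a statistic is obtained, the sketch below computes the U statistic from midranks and its normal approximation. It is a simplified version without tie or continuity correction, so it will not necessarily reproduce the values above; the function name and the example data are hypothetical.

```python
import math


def mann_whitney_u_z(group_a, group_b):
    """Return (U for group_a, Z) using midranks and the normal approximation.

    Simplification: no tie correction and no continuity correction.
    """
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        # Extend the run over all observations tied with pooled[start].
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        midrank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[k] = midrank
        start = end + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_a, (u_a - mean_u) / sd_u


# Hypothetical release percentages of /k/ for a few patients and controls.
patients = [18.0, 25.0, 30.0, 22.0]
controls = [40.0, 48.0, 35.0, 52.0]
print(mann_whitney_u_z(patients, controls))
```

A negative Z for the first group indicates that its values tend to rank lower than those of the second group, which matches the direction of the patient versus control differences reported for /k/.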

Discussion

This study presents an inventory of speech performance 6 months after treatment in a well-defined head and neck cancer patient group after reconstructive surgery and radiotherapy for advanced oral or oropharyngeal cancer. Speech quality was determined with objective acoustic-phonetic analyses and commonly used subjective (self-)evaluations.

The first aim of the present study was to investigate which objective parameters contribute to the prediction of subjective (self-)evaluations of speech. In particular, acoustic-phonetic parameters of /k/, /x/, /i/, and the size of the vowel triangle best predicted subjective assessment of overall intelligibility, articulation, nasal resonance and self-evaluation of speech. The result regarding /k/ has also been reported previously,9 where listeners often judged /k/ as /x/. Production of velar consonants such as /k/ and /x/ requires a posterior movement of the tongue towards the oropharyngeal region and adequate motility of the velum. Larger tongue motility corresponds with better intelligibility of consonants, including /k/.20 No previous studies report on the speech sound /x/, which may be due to the absence of /x/ in modern Western languages other than Dutch and a few dialects such as Scottish.

The size of the vowel triangle was also found to be a predictor of subjective speech evaluations. The smaller size of the vowel triangle in patients was caused by the higher F1 and lower F2 of the vowel /i/. These results are in agreement with earlier research, which showed that a smaller vowel triangle (also caused by deviant values of F1 and F2 of /i/) was related to worse intelligibility in glossectomy patients.12 In the present study, the vowel /i/ itself also proved to predict subjective evaluations: patients had a higher F1 and a lower F2. These results are in agreement with earlier research on pathological speech in maxillectomy patients,13, 18 but not with research on partial glossectomy, in which only gender and complications after surgery were found to influence altered F1 values.21

The second aim of this study was to investigate differences in acoustic-phonetic speech characteristics between patients and controls, and within the group of patients regarding tumour site and tumour classification. Between patients and controls, pressure release of /k/, F1 and F2 of /i/, and the size of the vowel triangle differentiated best. Difficulty producing /k/ originates from impaired velar function. The decreased size of the vowel triangle in patients was mainly caused by deviant formant values of /i/ and is in accordance with earlier studies.12, 13, 18 Inadequate movement of the tongue regarding height (F1) and anterior-posterior position (F2) may result in distorted vowels. Acoustic-phonetic analysis also revealed differences between patients regarding tumour stage (/k/) and tumour site (/x/): patients with smaller tumours had a longer pressure release of /k/ than patients with a larger tumour. Regarding tumour site, patients with an oropharyngeal tumour had a steeper spectral slope in /x/ than patients with an oral tumour. Due to tumour growth and treatment in the oropharyngeal area, patients with oropharyngeal cancer are likely to experience more difficulty with the production of velar speech sounds. Like /k/, /x/ is a velar consonant, which appears to be problematic for this patient population. These results are in agreement with earlier research.9, 20, 22 The results concerning differentiation between groups can be explained by structural alterations of the vocal tract after tumour involvement and treatment. Patients have more difficulty with proper velar closure, resulting in distorted velar speech sounds. Difficulty with the production of vowels is also attributable to alterations caused by tumour growth and treatment. In particular, patients who underwent treatment involving the tongue may experience more difficulty with the production of vowels. In previous studies, vowels of patients treated for head and neck cancer were found to deviate from vowels produced by healthy individuals: F2 of all vowels was lowered compared to controls, and F1 of /i/ was elevated compared to controls.12, 18

In the present study, the velar consonants /k, x/ and the vowels /a, i, u/ were selected from words that were obtained from running speech. The phonological context of the selected speech sounds may influence how they are perceived and could also have influenced the results obtained in the present study. Further research into the speech quality of patients with head and neck cancer could address different speech sounds in order to detect more characteristics of speech quality and more details of specific speech sounds. A different approach to measuring speech quality objectively could be an analysis of speech features such as nasality or voicing. Such a complex task of calculating speech features could be performed via automatic speech recognition using a neural network trained to identify speech features.23-25 This approach might give additional insight into the speech of patients treated for head and neck cancer. In the present study, the results are based on postoperative data only and no attempts were made to compare these data with preoperative speech. Future research may focus on post- versus preoperative speech quality in order to obtain more insight into the sensorimotor adaptation capabilities of patients to compensate for alterations in the vocal tract after treatment.26, 27

Conclusion

Speech quality of patients after treatment of an oral or oropharyngeal tumour was investigated. Acoustic-phonetic analyses proved to be valid and are suitable for measuring speech quality of patients. The presented results contribute to further development of a speech analysis protocol to be used in clinical practice and in clinical trials aiming at improving speech outcome in patients with head and neck cancer.

Acknowledgements

The authors wish to thank Li Ying Chao, Pepijn Borggreven and Milou Heiligers for their contributions regarding speech recordings and/or analyses.