Assessment Of Dysarthric Speech Through Rhythm Metrics


Preliminary evidence from perceptual experiments and earlier studies shows that dysarthric speakers have a lower speaking rate and longer vowels and consonants than healthy speakers. This paper reports the results of an acoustic investigation, based on rhythm metrics derived from duration measurements, carried out to distinguish dysarthric speech from healthy speech. The experiment uses rhythm metrics based on the durational characteristics of consonantal and vocalic intervals, together with the Pairwise Variability Index (PVI), applied to the Nemours database of American dysarthric speakers. Results show that, when compared with healthy control speech, the acoustic measures computed on disordered speech can be used to characterize dysarthric speakers and even the severity of dysarthria.

Index terms: dysarthria, rhythm, Pairwise Variability Index, acoustical analysis, Nemours database


Dysarthria covers a range of speech disorders resulting from neurological damage, and it probably accounts for a significant proportion of all acquired neurological communication disorders. These disorders are linked to disturbed brain and nerve control of the muscles involved in speech production. They reflect disturbances in the strength, speed, range, tone, steadiness, timing, or accuracy of the movements necessary for prosodically normal, efficient and intelligible speech. As a result, they induce variable speech amplitude and poor articulation. All types of dysarthria affect the articulation of consonants, leading to slurred speech. Vowels may also be distorted in very severe dysarthria. Rhythm disturbance may be a characteristic common to the various types of dysarthria. Many studies report that most dysarthric patients speak slowly, with longer vowel and consonant segments than control speakers [1, 2, 3, 4, 5, 6, 7].

The present paper focuses on the assessment of rhythmic disturbance in dysarthria caused by cerebral palsy and head trauma. Cerebral palsy refers to a variety of developmental neuromuscular pathologies occurring in three main forms (spastic, athetoid, and ataxic) and associated with bilateral lesions of the upper motor neuron pathways that innervate the relevant cranial and spinal nerves. The patients studied in this paper have spastic or athetoid cerebral palsy, or head trauma, and all present dysarthric speech.

Dysarthria severity can be indexed in several ways, but quantitative measures usually focus on prosodic features, mainly intelligibility and speaking rate. Disturbance of rhythm in the flow of speech is one of the main factors in prosodic abnormality. Although rhythm is identified as a main feature characterizing dysarthria, assessment methods still rely mostly on perceptual evaluation. Despite their advantages, which include ease of use, low cost and clinicians' familiarity with the procedures, perceptual methods have several shortcomings that affect their reliability. They also lack evaluation protocols that would help standardize judgments between clinicians and/or evaluation tools. The aim of this work is therefore to quantify rhythm abnormalities in dysarthric speech through the rhythm metrics (the Ramus and Grabe parameters) developed recently, mainly for language identification [8, 9]. Grabe et al. do not relate speech rhythm to phonological units such as interstress intervals or syllable durations. Instead, they calculate durational variability in successive acoustic-phonetic intervals using Pairwise Variability Indices (PVI) [10]. The raw Pairwise Variability Index (rPVI) is given in equation (1):


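$$\mathrm{rPVI} = \frac{1}{N-1} \sum_{k=1}^{N-1} \left| d_k - d_{k+1} \right| \qquad (1)$$

where $d_k$ is the length of the $k$th vocalic or intervocalic segment and $N$ is the number of segments. A normalized version of the PVI (noted nPVI) is defined by:

$$\mathrm{nPVI} = \frac{100}{N-1} \sum_{k=1}^{N-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right| \qquad (2)$$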

Ramus and colleagues argued that a viable account of speech rhythm should not rely on complex, language-dependent phonological concepts but on purely phonetic characteristics of the speech signal [8]. They measured vowel durations and the durations of the intervals between vowels, and computed three acoustic correlates of rhythm from these measurements: (a) %V, the proportion of the sentence duration taken up by vocalic intervals; (b) ΔV, the standard deviation of the vocalic interval durations; and (c) ΔC, the standard deviation of the intervocalic (consonantal) interval durations. They argued that the combination of %V and ΔC provides the best acoustic correlate of rhythm classes [9]. Our goal is to use these metrics to distinguish between healthy and dysarthric speakers and to characterize intelligibility, since alterations of rhythm may also affect speech intelligibility [10, 11, 12].
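The interval-based metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the PVI formulas follow the usual definitions (mean absolute difference between successive durations, normalized by the pair mean and scaled by 100 for nPVI), and the use of the population standard deviation for ΔV and ΔC is an assumption, since the text does not say which estimator was used.

```python
from statistics import pstdev


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the mean of its pair, then scaled by 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def ramus_metrics(vocalic, consonantal):
    """%V, delta-V and delta-C from lists of interval durations (same time unit)."""
    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_v": 100 * sum(vocalic) / total,
        # Assumption: population standard deviation; the source does not specify.
        "delta_v": pstdev(vocalic),
        "delta_c": pstdev(consonantal),
    }


# Made-up durations in seconds, for illustration only.
vocalic = [0.10, 0.20, 0.10]
consonantal = [0.10, 0.10]
print(rpvi(vocalic), npvi(vocalic), ramus_metrics(vocalic, consonantal))
```

Both PVI functions need at least two intervals. Because nPVI divides each difference by the local pair mean, it is less sensitive to overall speaking rate than rPVI, which matters here because dysarthric speakers tend to speak slowly.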

This paper is organized as follows. Section 1 introduces the work. Section 2 presents the speech material, subjects and procedures used throughout the experiment and explains the indices and measures proposed to characterize dysarthria. Section 3 gives a quantitative acoustic analysis of dysarthric speech. Section 4 describes the probabilistic classification used to determine which sets of rhythmic predictor variables best discriminate between dysarthric and control speakers. Section 5 concludes the work and outlines some perspectives.


2.1. Speech Material

Nemours is one of the few databases of recorded dysarthric speech. It contains recordings of American patients suffering from different types of dysarthria. The evaluation methodology followed in Nemours is inspired by the work of Kent [13, 14]. Kent [1] presents a method that starts by identifying the reasons for the lack of intelligibility and then adapts the rehabilitation strategies accordingly. His test consists of a list of words from which four words are selected. The patient is supposed to listen to these words and repeat them aloud. The evaluation takes into account the phonetic contrasts that can be disrupted.

The full set of stimuli consists of 74 monosyllabic nouns and 37 bisyllabic verbs embedded in short nonsense sentences (two nouns and a verb per sentence). Each speaker recorded 74 sentences. The first 37 were randomly generated from the stimulus word list, so each speaker's set differs from those of the other speakers, and the other 37 were obtained by swapping the first and second nouns of the first 37 sentences. Sentences have the following form:

THE noun 1 IS verb-ING THE noun 2.
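The construction of a speaker's 74-sentence list can be illustrated with a short sketch. It assumes that each noun and each verb is used exactly once in the first 37 sentences (which is consistent with 74 nouns and 37 verbs, but is not stated explicitly), and it uses placeholder words rather than the actual Nemours word list. The function name and the `seed` parameter are hypothetical.

```python
import random


def make_sentences(nouns, ing_verbs, seed=0):
    """Return 2 * len(ing_verbs) sentences: a random half, then the same half with nouns swapped."""
    rng = random.Random(seed)
    noun_order = nouns[:]
    verb_order = ing_verbs[:]
    rng.shuffle(noun_order)
    rng.shuffle(verb_order)

    # Assumption: every noun appears once in the first half (two nouns per sentence).
    first_half = [
        (noun_order[2 * i], verb, noun_order[2 * i + 1])
        for i, verb in enumerate(verb_order)
    ]
    # Second half: same verb, first and second nouns exchanged.
    second_half = [(n2, verb, n1) for n1, verb, n2 in first_half]

    return [f"THE {n1} IS {verb} THE {n2}." for n1, verb, n2 in first_half + second_half]


# Placeholder vocabulary, not the real stimulus list.
nouns = [f"NOUN{i}" for i in range(74)]
ing_verbs = [f"VERB{i}ING" for i in range(37)]
sentences = make_sentences(nouns, ing_verbs)
print(len(sentences), sentences[0], sentences[37])
```

Using a different seed per speaker gives each speaker a different random first half, as described above.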

Each recording session was conducted by a speech pathologist in a small sound-dampened room, with the talker seated (typically in his wheelchair) next to the speech pathologist or experimenter and in front of a table-mounted microphone (Electro-Voice RE55) connected to a digital audio tape recorder (Sony PCM-2500). The nonsense sentences were written in large print on a sheet placed in front of the talker, and each sentence was read first by the experimenter and then repeated by the subject. This helped all talkers with the pronunciation of the words and was essential for some subjects with limited eyesight or literacy. The speech material was sampled at 16 kHz with 16-bit sample resolution after low-pass filtering at a nominal 7500 Hz cutoff frequency with a 90 dB/octave filter. This achieved approximately 12 dB of attenuation at the Nyquist frequency [13, 14]. The acoustic information and derived parameters were extracted using the Snack package from KTH [15].
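As a rough plausibility check on the attenuation figure quoted above, the following back-of-the-envelope calculation assumes an ideal asymptotic roll-off of 90 dB/octave above the cutoff and a cutoff defined at the -3 dB point. Neither assumption is stated in the source, so this is only one possible reading of the numbers.

```latex
f_N = \frac{f_s}{2} = \frac{16000}{2} = 8000~\text{Hz}
\qquad
\log_2\!\left(\frac{8000}{7500}\right) \approx 0.093~\text{octave}

A_{\text{roll-off}} \approx 0.093 \times 90 \approx 8.4~\text{dB}
\qquad
A_{\text{total}} \approx 8.4 + 3 \approx 11.4~\text{dB}
```

Under these assumptions the result is in line with the "approximately 12 dB" reported for the Nyquist frequency.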

For each dysarthric and healthy speaker, only eight of the 74 sentences were manually segmented and labeled into phonemic intervals. The beginning and end of each phoneme within each sentence were marked by listening to the signal and using the intensity envelope as a guide.

2.2. Subjects

The speakers are eleven young adult males with dysarthria due to cerebral palsy (CP) or head trauma (HT), and one non-dysarthric adult male (the experimenter). Seven speakers have cerebral palsy: three have CP with spastic quadriplegia, two have athetoid CP, and two have a mixture of spastic and athetoid CP with quadriplegia. The four remaining subjects are victims of head trauma (one quadriplegic and one with spastic quadriparesis), with cognitive function ranging from Level VI to Level VII on the Rancho scale. The speech of one talker (KS, head trauma, quadriplegic) was extremely unintelligible. A two-letter code was assigned to each patient: BB, BK, BV, FB, JF, KS, LL, MH, RK, RL and SC.

The Frenchay dysarthria assessment scores (see Table 1) of motor function for each speaker revealed that the patients were highly heterogeneous and fell into three subgroups. The first is mild and includes subjects FB, BB, MH and LL. The second, intermediate between the first and the third, includes subjects RK, RL and JF. The third is severe and includes subjects KS, SC, BV and BK. The perceptual data and the speech assessment did not take into consideration the most severe case (patient KS) and the mildest case (patient FB) [13, 14].

2.3. The Rhythm Metrics

For each sentence of each speaker, we measured the durations of the vocalic, consonantal, voiced and unvoiced segments. This allows us to compute the following parameters:

%V: Percentage of sentence duration made up of vocalic intervals

ΔC: Standard deviation of consonantal intervals

ΔV: Standard deviation of vocalic intervals

%VS: Percentage of sentence duration made up of voiced intervals

Δ(VS): Standard deviation of voiced intervals

Δ(VNS): Standard deviation of unvoiced intervals

Vocalic-rPVI: Raw pairwise variability index for vocalic intervals

Vocalic-nPVI: Normalized pairwise variability index for vocalic intervals

Intervocalic-rPVI: Raw pairwise variability index for intervocalic intervals

Intervocalic-nPVI: Normalized pairwise variability index for intervocalic intervals

Table 1. The rhythm metrics parameters
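Before any of the parameters listed above can be computed, the phoneme-level labels have to be merged into vocalic and consonantal intervals. A minimal sketch of that step is given below. It assumes ARPAbet-style phoneme labels and a list of `(label, start, end)` tuples in seconds; both the label set and the data layout are assumptions for illustration and not the labeling scheme documented in the source. Pauses are not handled.

```python
from itertools import groupby

# Assumption: ARPAbet vowel symbols; the actual label inventory may differ.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def interval_durations(segments, vowels=ARPABET_VOWELS):
    """Merge consecutive phonemes of the same class.

    segments: list of (label, start, end) tuples in temporal order.
    Returns (vocalic_durations, consonantal_durations).
    """
    vocalic, consonantal = [], []
    for is_vowel, run in groupby(segments, key=lambda seg: seg[0] in vowels):
        # Sum the member durations so that a consonant cluster counts as one interval.
        duration = sum(end - start for _, start, end in run)
        (vocalic if is_vowel else consonantal).append(duration)
    return vocalic, consonantal


# Made-up labels and timings for "the stop", for illustration only.
segments = [
    ("DH", 0.00, 0.05), ("AH", 0.05, 0.12),
    ("S", 0.12, 0.22), ("T", 0.22, 0.28),
    ("AA", 0.28, 0.43), ("P", 0.43, 0.50),
]
print(interval_durations(segments))
```

The same grouping works for the voiced and unvoiced measures if the predicate tests voicing instead of vowel membership.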













[Table residue: only the column header "Percentage of severity" survives; the values are not recoverable.]

The mean and standard deviation of the durations of vocalic and consonantal intervals are given in Table 2. The results clearly confirm that both kinds of interval are longer for the dysarthric patients (DP) than for the healthy control (HC).

Table 2. Mean and standard deviation of consonantal and vocalic interval durations.

Figure 1 shows the distribution of DP and HC along the %V (X axis) and ΔC (Y axis) dimensions. Vocalic intervals make up less than 30% of the total sentence duration for the severe cases (KS, BK, SC, RL). This is to be expected, because a vowel that is spoken less clearly tends to be reduced. The less severe cases are much closer to the control subjects. The (%V, ΔV) plane shows almost the same distribution of the patients: KS, BK, SC and RL are relatively isolated from the rest of the speakers. Overall, we observed high ΔC and ΔV and low %V for the DP, particularly for the severe cases (KS, BK, SC, RL).

[Table 2 residue: column headers "Duration (msec)", "Vocalic intervals" and "Consonantal intervals"; the values are not recoverable.]

Figure 1: Distribution of DP (dysarthric patients) and HC (healthy control) along the %V (X axis) and ΔV (Y axis) dimensions.

A one-way analysis of variance (ANOVA) was conducted to determine whether the metrics showed significant group differences. For ΔC and %V, the main effect of group was not statistically significant (F(1,20) = 3.04, p = 0.09, and F(1,20) = 0.91, p = 0.35, respectively), but for the standard deviation of vocalic intervals (ΔV) it was significant (F(1,20) = 5.35, p = 0.03).
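For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed directly from the group values, as in the sketch below. The function name is hypothetical and the numbers in the example are made up; they are not the measurements from this study. The p-value, which requires the F distribution, is deliberately left out.

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values, for illustration only.
print(one_way_anova([1, 2, 3], [2, 3, 4]))  # (1.5, 1, 4)
```

With two groups totalling 22 observations, the degrees of freedom are (1, 20), which matches the F(1,20) values reported above.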

Figures 2 and 3 show that almost all DP have a Vocalic-nPVI lower than that of the HC but a higher Vocalic-rPVI. For the Intervocalic-rPVI (and even the Intervocalic-nPVI), however, the DP and HC were similar, except for the most severe DP: KS, RK and BK.

The main effect of group for Vocalic-rPVI and Vocalic-nPVI was statistically significant (F(1,20) = 5.93, p = 0.02, and F(1,20) = 10.6, p = 0.004, respectively). The main effect of group for the Intervocalic-rPVI and Intervocalic-nPVI was not statistically significant (F(1,20) = 3.58, p = 0.07, and F(1,20) = 0.058, p = 0.81, respectively).

Note that BV, who is considered a severe case, is always close to the HC and the mild DP, whereas FB, the mildest case, is relatively far from the HC.

Figure 2: Dysarthric and healthy subjects (DP and HC respectively) represented in the (nPVI, inter-rPVI) space for the Vocalic and Intervocalic intervals

Figure 3: rPVI index representation of healthy and dysarthric subjects

A total of 74 sentences per speaker (814 utterances by dysarthric speakers and the same number by the healthy control) were automatically segmented into voiced and voiceless intervals. The results in Table 3 reveal that the severe dysarthric patients tend to produce lengthened voiceless segments with higher standard deviations, voiced intervals that are longer than the voiceless ones, and a greater number of both kinds of segment. The sentences repeated by patient KS (the most severe DP) were far longer than those of the other DP, which were in turn longer than those of the healthy control. KS had the greatest number of voiced and voiceless segments, and the longest ones.
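The interval statistics discussed above can be obtained from a frame-level voicing decision by run-length grouping. The sketch below assumes that a voiced/voiceless flag is available for each analysis frame (for example from a pitch tracker) and that frames are 10 ms apart; both are assumptions, since the source does not describe how the automatic segmentation was carried out. Function names are hypothetical.

```python
from itertools import groupby
from statistics import mean, pstdev


def voicing_intervals(flags, frame_step=0.01):
    """Group consecutive frames with the same voicing flag into interval durations (seconds)."""
    voiced, voiceless = [], []
    for is_voiced, run in groupby(flags):
        duration = sum(1 for _ in run) * frame_step
        (voiced if is_voiced else voiceless).append(duration)
    return voiced, voiceless


def summarize(durations):
    """Count, mean and (population) standard deviation of a list of durations."""
    return {"count": len(durations), "mean": mean(durations), "std": pstdev(durations)}


# Made-up voicing track: 3 voiced frames, 2 voiceless, 5 voiced.
flags = [True] * 3 + [False] * 2 + [True] * 5
voiced, voiceless = voicing_intervals(flags)
print(summarize(voiced), summarize(voiceless))
```

The count returned by `summarize` corresponds to the number of segments, which the text reports as being higher for the severe patients.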

Table 3. Mean and standard deviation of voiced and voiceless interval durations.


[Table 3 residue: column headers "Voiced intervals" and "Voiceless intervals", row label "Mean (sec)"; the values are not recoverable.]

Figure 4 shows that the DP are widely scattered, whereas the HC are tightly clustered. The most severe cases (KS, RK, RL and SC) are the DP farthest from the HC and have the lowest values of %DVs, while the mildest cases (FB and BV) have the highest values of %DVs. The rest of the DP are close to the HC. We note that BV, whose Frenchay and intelligibility scores are 57.5 and 3 respectively, is nevertheless very close to the HC on all the parameters studied. Indeed, on examining the speech of BV and FB, we noted that BV's speaking rate was quite normal and his speech almost intelligible, although nasal (BK is more severe even though his Frenchay and intelligibility scores are 58.2 and 3 respectively). FB is the mildest case, but he is not the DP closest to the HC: his speech is very intelligible, but his speech rate is very slow.

Figure 4: Dysarthric and healthy subjects represented in the (%DVs, Δ(NVs)) space.


We use linear discriminant analysis, a Gaussian classification technique based on a covariance matrix shared across the classes, in the plane of two predictor parameters. We consider the following pairs: (%V, ΔC), (%V, ΔV), (Vocalic-nPVI, Intervocalic-nPVI) and (Vocalic-rPVI, Intervocalic-rPVI). To get an idea of the separating power of the parameters studied above, we adopted a closed test, that is, training and testing on the same data. Table 4 shows that the (%V, ΔV) plane gives the best separation of the two classes, DP and HC (95.45%). The rPVI pair also correctly separates 86.36% of the dysarthric patients from the healthy control, which is a very encouraging score, although it was obtained in a closed test.
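The classification procedure described above can be illustrated with a small two-dimensional linear discriminant analysis written from scratch. This is a sketch under stated assumptions, not the authors' code: class priors are estimated from class counts, the pooled covariance uses the unbiased estimate, and the data points are invented (%V, ΔV)-like pairs that do not come from the study. All function names are hypothetical.

```python
import math


def fit_lda(points, labels):
    """Fit LDA on 2-D points with one covariance matrix pooled over all classes."""
    classes = sorted(set(labels))
    n = len(points)
    means, priors = {}, {}
    sxx = sxy = syy = 0.0
    for c in classes:
        pts = [p for p, lab in zip(points, labels) if lab == c]
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        means[c], priors[c] = (mx, my), len(pts) / n
        sxx += sum((x - mx) ** 2 for x, _ in pts)
        syy += sum((y - my) ** 2 for _, y in pts)
        sxy += sum((x - mx) * (y - my) for x, y in pts)
    dof = n - len(classes)
    sxx, syy, sxy = sxx / dof, syy / dof, sxy / dof
    det = sxx * syy - sxy * sxy
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return {"means": means, "priors": priors, "inverse": inverse}


def predict(model, point):
    """Pick the class with the largest linear discriminant score."""
    x, y = point
    inv = model["inverse"]

    def score(c):
        mx, my = model["means"][c]
        wx = inv[0][0] * mx + inv[0][1] * my  # first component of inv(Sigma) @ mu
        wy = inv[1][0] * mx + inv[1][1] * my  # second component
        return x * wx + y * wy - 0.5 * (mx * wx + my * wy) + math.log(model["priors"][c])

    return max(model["means"], key=score)


def closed_test_accuracy(points, labels):
    """Train and test on the same data, as in the closed test described above."""
    model = fit_lda(points, labels)
    hits = sum(predict(model, p) == lab for p, lab in zip(points, labels))
    return hits / len(points)


# Invented, well-separated toy data; not the study's measurements.
points = [(40, 30), (42, 32), (41, 29), (43, 31), (30, 50), (28, 55), (33, 48), (29, 60)]
labels = ["HC"] * 4 + ["DP"] * 4
print(closed_test_accuracy(points, labels))
```

A closed test gives an optimistic estimate because the model has already seen every sample it is scored on. If the test set contains 22 samples, as the F(1,20) statistics suggest, the reported scores of 95.45% and 86.36% would correspond to 21 and 19 correct classifications respectively.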

Table 4. Confusion matrices for all parameter pairs.



[Table 4 residue: column headers (%V, ΔV) and (%V, ΔC); the values are not recoverable.]

In this paper, we have presented acoustic evidence for rhythm-based assessment of dysarthric speech. The rhythm metrics are based on the durational characteristics of vocalic and intervocalic intervals and on their PVI, using both raw and normalized measures. These measures are then used to characterize the type and severity of dysarthria from various duration measurements. The timing of voiced and voiceless segments clearly shows differences between healthy and dysarthric subjects. We computed acoustic variability indices and found that they effectively express the severity of the dysarthric disturbance. These features might therefore be very useful in software tools that help with the diagnosis and training of dysarthric subjects. Our approach focused on the rhythm of dysarthric speech by considering the proportion of vocalic, consonantal, voiced and voiceless intervals in speech utterances (sentences), but rhythm can also be considered as a hierarchical organization of temporally coordinated prosodic units. In future work, this will lead us to examine the impact of prosodic features and to continue assessing vowel and consonant segments in the context of disordered speech. For the classification, we will validate our results using an open test with a large data set.