Listeners are sensitive to both language rhythm and speech rhythm. Phoneticians have also used rhythm to characterize languages, referring to the perceived alternation of stressed and unstressed syllables. In addition, preliminary evidence shows that speakers of the same geographical, social and stylistic variety of a given language do not all speak with the same rhythmicity. The global objective of this proposal is to carry out a new and exhaustive study of the rhythm properties of Acadian French in order to better understand the process that relates timing control in speech production to the role of timing in perception. Our aim is to give the temporal organization of utterances and rhythm-based features a new role in the front-end of speech processing systems. Computational rhythm-based features adapted to Acadian French will be proposed. These features will be assessed using an Acadian French corpus, RACAD (RACAD, 2008), which we collected across eleven French-speaking regions of New Brunswick thanks to a previous research grant (reference). To reach the global objective, we define the following specific objectives:
We will develop computational models of rhythm metrics adapted to Acadian French speech. Traditionally, metric analysis is based on the premise that the temporal continuum of speech can be analyzed by quantification, while rhythmic analysis considers the temporal structure of speech through the mechanisms of perception. In this project, rhythm metrics based on segmental duration patterns, developed recently and especially in the language identification domain, will be computed and quantified in order to capture the temporal organization of Acadian French speech. The advantage of the proposed approach is that it models the temporal structure of speech utterances by analyzing the acoustic signal alone, without modeling the perception mechanisms.
We will use the durational metrics to characterize Standard French and Acadian French. The rhythm of Acadian French will be characterized by durational metrics based on consonantal and vocalic intervals. We will compare our results with the tendency observed in traditional dialectology, which assesses whether dialects exhibit less or more stress-timing than Standard French.
We will determine whether the speech rhythm metrics are affected by personal and/or demographic factors related to the speakers. Here we aim to establish to what extent personal factors (such as age, gender and geographical region) are correlated with the durational differences that we observed among speakers.
We will assess the relevance of rhythm-based features by evaluating the performance of automatic Acadian French speech recognition systems that include these parameters. We will also investigate the robustness of automatic speech recognition by modeling separate regional accents within the French-speaking regions of New Brunswick, Canada.
Despite its importance, rhythm has been one of the least studied aspects of speech and language processing. The traditional phonetic classification of language rhythm as stress-timed or syllable-timed is attributed to Pike (1945) and Abercrombie (1967). According to this classification, languages such as English or Swedish are considered "stress-timed", which means that their fundamental unit for equal-timed intervals (isochronous unit) is the stress foot. Syllable-timed languages such as French or Italian have the syllable as the fundamental isochronous unit. Based on this classification, Beckman (1992) extensively investigated the basis of the rhythm class hypothesis, but this approach, like previous ones, lacks experimental support for isochrony in speech. Laver (1994) noted that even though the rhythm class hypothesis is popular among linguists, researchers have not provided support from duration measurements for isochronous timing on any absolute basis. On the one hand, this lack of an objective rhythm measure has led many researchers to conclude that rhythm is primarily a perceptual phenomenon (Couper-Kuhlen, 1993). On the other hand, the weak empirical evidence for isochrony led Ramus et al. (1999) and Grabe & Low (2002) to propose approaches for describing the rhythmic structure of languages from acoustic-phonetic measurements. Ramus et al. suggested measures based on the percentage of vocalic intervals (%V) and the standard deviation of consonantal intervals. Grabe & Low proposed the raw and normalized Pairwise Variability Index (rPVI, nPVI), calculated from the differences in vocalic and consonantal durations between successive intervals. Recently, numerous approaches based on speech rhythm measures have been proposed to capture rhythm-related typologies of speech and language, including acoustic correlates of rhythm class in the speech signal. Among others, Wiget et al. (2010) present an overview of widely used rhythm metrics and make recommendations about their effectiveness and reliability. They point out, for example, that metrics that deal with vocalic duration are more effective at discriminating between language varieties than those that measure consonantal duration. Wiget et al. (2010) also observe that variation between speakers is a significant source of variability in the rhythm scores. In fact, the literature shows that rhythm scores are sensitive to dialect differences in a number of languages including American English (Thomas and Coggshall, 2006-2007), Welsh English and standard Southern British English (Ferragne & Pellegrino, 2004), and Italian (Russo & William, 2004).
Regional linguistic variation in New Brunswick Acadian French has been the focus of only a very small number of studies. These studies are based on partial sociolinguistic and dialectological surveys, and they identify several regional differences. In addition, phonetic features can vary considerably within localities. For instance, in the town of Tracadie-Sheila located in the Acadian Peninsula in the Northeast, phonetic variation has been shown to correlate with demographic factors such as speaker age and gender (Flikeid 1984). Acadian French has a number of distinctive phonetic features (see the overview in Lucci 1973). Preliminary evidence from our speech recognition experiments (Cichocki et al., 2009) shows that there are noticeable perceptual differences between Acadian French speakers in terms of speaking rate and length of vowels and consonants. In this project we examine the effectiveness of various rhythm metrics in analyzing variable duration patterns observed in the pronunciation of Acadian French spoken across the province of New Brunswick in Canada. We will also assess the performance of Acadian French speech-enabled systems that include quantified rhythm knowledge.
Rhythm is defined as "the regular perception of the prominent units in speech". From the acoustical point of view, it can also be defined as alternating sequences of voiced and voiceless sounds (Crystal, 1985). Cummins and Port (1998) define rhythm in speech as the hierarchical organization of temporally coordinated prosodic units. In this proposal, we carry out acoustic-phonetic investigations based on measurements of duration variability to characterize the rhythm of Acadian French speech and of its variant forms.
Strategies to reach the specific objectives
Strategy to reach specific objective #1: Developing computational models of rhythm metrics adapted to Acadian French speech.
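In order to capture the temporal organization of Acadian French speech and its variants (spoken in the regions of New Brunswick), the durations of both consonantal and vocalic intervals will be studied. For each utterance produced by each speaker, the following rhythmic parameters will be calculated:
• DeltaV: the standard deviation of the vocalic interval durations;
• DeltaC: the standard deviation of the consonantal interval durations;
• %V: the percentage of vocalic duration over the whole utterance;
• VarcoV: the ratio of DeltaV to the mean vocalic duration, expressed as a percentage (x 100);
• VarcoC: the ratio of DeltaC to the mean consonantal duration, expressed as a percentage (x 100).
To measure the variability of the duration of successive vowel and consonant intervals, the PVI is used. The raw PVI (noted rPVI) is defined by:
$$\mathrm{rPVI} = \frac{1}{m-1} \sum_{k=1}^{m-1} \left| d_k - d_{k+1} \right|$$
where $d_k$ is the duration of the $k$-th interval and $m$ is the number of intervals in the utterance. The normalized PVI (noted nPVI) divides each difference by the mean duration of the pair and is expressed as a percentage:
$$\mathrm{nPVI} = \frac{100}{m-1} \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right|$$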
We also expect to calculate the PVI for the voiced and voiceless segments.
Explicitly, rPVI-V, nPVI-V, rPVI-C, nPVI-C will be calculated for all utterances and speakers.
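The parameters listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual scripts: the function names are hypothetical, durations are assumed to be in milliseconds, and the population standard deviation is assumed because the text does not say whether the sample or population form is intended.

```python
from statistics import mean, pstdev


def delta(durations):
    """DeltaV or DeltaC: standard deviation of the interval durations.

    Assumption: population standard deviation (pstdev).
    """
    return pstdev(durations)


def varco(durations):
    """VarcoV or VarcoC: standard deviation as a percentage of the mean."""
    return 100.0 * pstdev(durations) / mean(durations)


def percent_v(vocalic, consonantal):
    """%V: share of the total utterance duration that is vocalic."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the pair mean, x 100."""
    pairs = list(zip(durations, durations[1:]))
    total = sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs)
    return 100.0 * total / len(pairs)


# Made-up interval durations (ms) for one utterance, for illustration only.
vocalic = [80, 120, 100]
consonantal = [60, 90, 60, 90]

print(rpvi(vocalic))                     # rPVI-V -> 30.0
print(percent_v(vocalic, consonantal))   # %V -> 50.0
print(npvi(consonantal), varco(vocalic), delta(consonantal))
```

Applying `rpvi` and `npvi` to the vocalic and consonantal lists in turn gives rPVI-V, nPVI-V, rPVI-C and nPVI-C for the utterance.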
The variability in contrastive rhythm scores according to speaker and sentence materials will be reported. The intra-class correlation coefficients (ICCs) based on the rhythmic parameters will be investigated. In order to provide some sense of whether differences between speakers and utterances are specific to Acadian French, we will compare the size of those differences with the differences reported between languages, used as a benchmark. Thanks to this rhythm-based model, we will therefore be able to position Acadian French among other languages according to the method proposed by White and Mattys (2007).
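As an illustration of the intra-class correlation mentioned above, the sketch below computes a one-way random-effects ICC, often written ICC(1,1), for one rhythm metric. The choice of this ICC form is an assumption, since the text does not specify which variant will be used; the function name and the balanced design (the same number of utterances per speaker) are also assumptions.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores: one list per speaker, each holding the metric value for
    each of that speaker's utterances (same length k for every speaker).
    """
    n = len(scores)
    k = len(scores[0])
    if any(len(row) != k for row in scores):
        raise ValueError("each speaker needs the same number of utterances")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-speaker and within-speaker mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Made-up nPVI-V scores: three speakers, two utterances each.
print(icc_oneway([[48.0, 50.0], [55.0, 57.0], [61.0, 60.0]]))
```

A value close to 1 would indicate that most of the variance in the metric lies between speakers rather than between utterances of the same speaker.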
Strategy to reach specific objective #2: Comparison between Standard French and Acadian French using rhythm metrics.
Durational metrics based on consonantal and vocalic intervals will be used to characterize the Acadian French spoken in eleven regions of New Brunswick, Canada. The RACAD corpus will be used for this purpose. The experiments will allow us to determine whether Acadian French exhibits less or more stress-timing than Standard French. All eleven French-speaking regions will be positioned in different two-dimensional rhythm metric spaces. The same approach will be followed to characterize Standard French using the Combescure corpus. We will report the variability in rhythm scores according to speaker and sentence materials. From our exhaustive analyses, we will determine the rhythm metrics that are the most discriminative within the Acadian French variety (regional accents of Acadian French). A statistical framework based on the analysis of variance will be applied.
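To make the analysis of variance concrete, the following sketch computes the one-way ANOVA F statistic for one rhythm metric grouped by region. It is an illustration under stated assumptions: the function name and the sample values are hypothetical, and the p-value is left out because it requires the F distribution, which would normally come from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score groups.

    groups: one list of metric values per region (or other factor level).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up VarcoV scores for three regions.
regions = [[41.0, 43.0, 42.0], [47.0, 45.0, 46.0], [52.0, 50.0, 51.0]]
f_stat, df_b, df_w = one_way_anova_f(regions)
print(f_stat, df_b, df_w)
```

A large F relative to the critical value for (df_between, df_within) suggests that the metric discriminates between the regions.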
Strategy to reach specific objective #3: Determining evidence whether the speech rhythm metrics are affected by personal/demographic factors.
We will investigate the correlation between rhythm-based features and personal/demographic factors of speakers such as age and gender. Finding evidence of the impact of these social factors on the rhythm parameters will help to adapt verbal interactive systems to social groups according to their phonological characteristics. The age and gender of speakers will be treated as variables in order to assess the relevance of the rhythm metrics precisely, and post-hoc comparisons will be performed to identify significant contrasts. To examine the effects of other social factors, a series of two-way or three-way ANOVAs will be carried out on all the metrics. These experiments are relevant because the rhythm metrics will allow us to observe significant differences that exist among groups of speakers. The methodological point here is to verify whether these speaker groups account for some of the variation in the rhythm scores. The results will be useful for studies that examine social variation. In the case of Acadian French, a number of recommendations will be provided according to our findings.
Strategy to reach specific objective #4: Improving robustness of automatic Acadian French speech recognition systems.
We propose to include the rhythm parameters to adapt phoneme models of speech recognizers. Our goal is to improve the automatic recognition of speech uttered by Acadian French speakers. Another practical targeted problem is the modeling of separate regional accents. This modeling remains difficult and inaccurate due to the large number of regional accents and to the insufficiency of dialectal speech data available for training. We believe that it is necessary to unify acoustic processing and to adapt the architecture of recognition systems to cover the broadest range of languages and situations. Modern configurations of recognition systems are mostly software architectures that generate a sequence of word hypotheses validated by a language model. The most popular and effective algorithm implemented in these architectures is based on Hidden Markov Models (HMMs), which belong to the class of statistical methods [reference]. We will analyze the effects of including rhythm-based features on the overall speech recognition system accuracy and the individual phoneme accuracies. We will also study the differences related to the phonetic confusion in order to determine the phonemes that play a significant role in the recognition performance between French speaking regions of New Brunswick. The effect of the language model will also be investigated.
Speech material: RACAD Corpus
The RACAD speech corpus contains high quality audio recordings of the regional varieties of French spoken in the province of New Brunswick, Canada. Its design is informed by linguistic analyses of Acadian French. The corpus contains sentences read by 140 speakers who were selected according to age, gender and geographical region. In sum, an important consideration in designing the RACAD corpus was to elicit features of pronunciation that are related to regional and social variation.
Phases and related tasks
The project will be structured into four distinct major phases:
Phase 1: Preparing the data and scripts for metric measures and implementing the rhythm-based computational model
The first phase of the project involves preparing the label data extracted from the RACAD corpus so as to produce a clear representation of the linguistic material to be analyzed. In this phase, we will also develop a complete script framework for the calculation of rhythmic measures, together with a statistical framework. The tasks to be performed in this first phase are as follows.
• Task 1.1. Label conversion/creation from RACAD corpus
• Task 1.2. Tuning the labeling
• Task 1.3. Calculation of rhythm measures
• Task 1.4. Statistical framework set up
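The label preparation in Tasks 1.1 to 1.3 can be sketched as follows: phone-level labels are merged into the consonantal and vocalic intervals on which the rhythm measures are computed. This is only an illustrative sketch. The label format (label, start, end), the millisecond time unit, the function name and the toy vowel set are all assumptions, since the RACAD label format is not described here, and pauses are assumed to have been removed beforehand.

```python
# Hypothetical vowel label set; a real script would use the corpus phone set.
VOWELS = {"a", "e", "i", "o", "u", "y"}


def to_cv_intervals(segments, vowels=VOWELS):
    """Merge consecutive phone segments of the same class.

    segments: list of (label, start, end) tuples in time order.
    Returns a list of (cls, duration) with cls equal to "V" or "C".
    """
    intervals = []
    for label, start, end in segments:
        cls = "V" if label in vowels else "C"
        duration = end - start
        if intervals and intervals[-1][0] == cls:
            # Same class as the previous phone: extend the current interval.
            intervals[-1] = (cls, intervals[-1][1] + duration)
        else:
            intervals.append((cls, duration))
    return intervals


# Toy utterance: /p l a t/ with times in milliseconds.
phones = [("p", 0, 50), ("l", 50, 100), ("a", 100, 200), ("t", 200, 260)]
print(to_cv_intervals(phones))
# -> [('C', 100), ('V', 100), ('C', 60)]
```

The "V" and "C" durations can then be separated into two lists and passed to the rhythm metric calculations.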
Phase 2: Implementing the validation tools and performing the assessment experiments
The second phase aims at building a set of outline scenarios by using the tools developed in Phase 1. The scenarios are constructed from the key driving factors that shape the current phonological variation of Acadian French. The first scenario consists of finding evidence that the durational metrics can discriminate the regional accents of Acadian French spoken in the eleven French-speaking regions of New Brunswick. The second scenario will deal with Standard French and will investigate its characteristics with respect to durational metrics. We will compare our results with the tendency observed in traditional dialectology, which assesses whether dialects exhibit less or more stress-timing than the standard. The scenarios will be developed in more depth by performing the following tasks.
• Task 2.1. We will establish the references of durational measures using Standard French utterances extracted from the Combescure corpus (baseline).
• Task 2.2. We will use the RACAD corpus to perform the duration modeling of Acadian French spoken in the eleven regions of New Brunswick.
• Task 2.3. We will use the Combescure corpus to establish patterns based on rhythm metrics for Standard French.
• Task 2.4. We will compare the findings of previous tasks according to the traditional classification of languages with respect to rhythmic groups (positioning the Acadian French language and its variants among the other languages: English, Spanish, Italian, German...).
Phase 3: Investigating correlates of speech rhythm metrics among Acadian French speakers with respect to demographic factors
In the third phase, an approach will be proposed in order to gain a fuller account of potential differences between speakers by including demographic and personal factors in our analysis - age, gender, and the region - that might be related to between-speaker variability in rhythm scores. Experiments will be carried out in order to test whether the rhythm measures could be socio-linguistically conditioned. For this purpose multiple analysis of variance (ANOVA) tests and non-parametric analyses of variance (Kruskal-Wallis) will be performed. The tasks in this phase are as follows.
• Task 3.1. We will identify the factors that could play a significant role in the phonetic/rhythmic structure of Acadian French.
• Task 3.2. We will code the identified personal factors in such a way that they can be computationally analyzed. The most straightforward method is to perform a categorization.
• Task 3.3. We will carry out a series of one-way and two-way ANOVAs on all metrics with selected personal factors (gender and age are the obvious ones) as independent variables.
• Task 3.4. We will establish a complete profile of Acadian French speakers regarding the studied variables (durational measures and personal factors).
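For the non-parametric counterpart of these ANOVAs, the Kruskal-Wallis H statistic can be sketched as below, with speakers grouped by a categorized personal factor such as age group. The function names and data are hypothetical, and this simplified version omits the tie correction, which a statistics package would normally apply.

```python
def average_ranks(values):
    """Rank values starting from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Made-up %V scores for three age groups.
age_groups = [[44.1, 45.0, 43.2], [46.3, 47.5, 46.9], [49.0, 48.2, 50.4]]
print(kruskal_wallis_h(age_groups))
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom to judge whether the groups differ.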
Phase 4: Incorporating rhythm-based features in the front-end processing of Acadian French speech recognition systems
We will focus on improving the robustness of Acadian French speech recognition systems by exploiting the outcomes of previous phases. Adapted models are created for speech recognition using the relevant rhythm measures. The performance of speech recognition systems is evaluated in various contexts; therefore, the following tasks will be carried out.
• Task 4.1. We will create inter-regional models of Acadian French and compare their performance in speech recognition against global models. Rhythm-based features are included in the front-end processing.
• Task 4.2. We will adapt the Acadian-accented models by incorporating the rhythm knowledge to improve the robustness of speech recognition dedicated to standard French.
• Task 4.3. We will analyze the cross-New Brunswick regional model results, unifying the framework of Acadian French speech recognition to include the variants of the language and the speakers' variability.