Formant Spread For Speaker Recognition Purpose English Language Essay



The research objective is to study the variation in vocal tract shapes and formant spread for male and female speakers, and its reliability among educated speakers of Andhra Pradesh, and to establish the use of this variation for speaker recognition.

The vocal tract is modeled as a co-axial concatenation of lossless acoustic tubes of different lengths and diameters. The cross-sectional area of any tube can be varied independently to simulate the changing shape of the vocal tract. The first tube starts at the glottis, and the last tube ends at the lips. The number of tubes, together with the diameter and length of each tube, acoustically determines the resonances of the vocal tract and the place of articulation.

There is evidence that phonetic distinctiveness and speaker individuality are deeply ingrained in the vocal tract shape, estimated from the vowels using formant frequencies. This is demonstrated by an acoustic articulatory model of vowels and a speaker-based area function approximation of the vocal tract.

Here, a new technique is proposed to approximate the area function of a person's vocal tract at different times and in different contexts. The variability of the resulting shapes is measured on an intra- and inter-speaker basis. Vocal tract shapes are derived for each subject for a pre-defined set of phonemes, namely /a/, /e/, /i/, /o/ and /u/. The vocal tract shape correlation of a vowel superimposed on itself is calculated and compared with the discrimination it provides against the other vowels pronounced by male speakers. The time averages of the worst and the best patterns of the ensemble are plotted. The results are shown in figure 4.65. Repeating the tests for the other vowels gives figures 4.69, 4.73, 4.77 and 4.81 respectively.

Error minimization is carried out using an all-pole LPC filter. Each vowel is analysed in this way to obtain the vocal tract shapes for male and female speakers, taking 30 samples from each of 30 subjects in different contexts. The vocal tract shapes are derived for each subject from 30 sets of data recorded at different times, for the predefined set of phonemes /a/, /e/, /i/, /o/ and /u/.

Using LPC and correlation analysis, the vocal tract shape variability of each individual subject is found. The variability of these vocal tract shapes across the 30 speakers is examined to identify intra-speaker variability. The identified variability can be used as a cue for personal identification in speaker-specific recognition.
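
The correlation analysis described above can be illustrated with a minimal sketch. It assumes that the "percentage matching" between two vocal tract shapes is the Pearson correlation of their area functions scaled to a percentage; the text does not state the exact measure, so the function names `shape_match_percent` and `match_table` and this choice of measure are hypothetical.

```python
import numpy as np

def shape_match_percent(shape_a, shape_b):
    """Pearson correlation of two area functions, as a percentage.

    Assumption: the thesis does not specify its matching measure;
    Pearson correlation is used here purely for illustration.
    """
    a = np.asarray(shape_a, dtype=float)
    b = np.asarray(shape_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("area functions must have the same number of sections")
    return 100.0 * np.corrcoef(a, b)[0, 1]

def match_table(shapes):
    """Match every vowel shape against every other one.

    shapes: dict mapping a vowel label to its area function.
    Returns a dict keyed by (reference vowel, test vowel).
    """
    names = list(shapes)
    return {(p, q): shape_match_percent(shapes[p], shapes[q])
            for p in names for q in names}
```

A table built this way has the self-match of each vowel on its diagonal, and the off-diagonal entries indicate how well that vowel is discriminated from the others.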

These two techniques are used to analyse the phonetic distinctiveness and speaker individuality ingrained in the vocal tract shapes, which are estimated from the vowels using formants and area function approximation.

The proposed new technique approximates the area function of the vocal tract. The area function approximation of a person is taken at different times and in different contexts. The resulting variability of the vocal tract shapes is then measured on an intra- and inter-speaker basis, using the minimal and maximal vocal tract shapes for vowels of male and female speakers.
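
As a rough illustration of the minimal and maximal shapes mentioned above, the sketch below takes repeated area-function estimates of one vowel from one speaker and returns the section-wise lower bound, mean and upper bound. The function names, the array layout and the use of the mean envelope width as a single variability figure are assumptions made for this example, not the thesis algorithm.

```python
import numpy as np

def shape_bounds(shapes):
    """Section-wise minimal, average and maximal vocal tract shape.

    shapes: array of shape (n_repetitions, n_sections), one area
    function per repetition of the same vowel by the same speaker.
    """
    s = np.asarray(shapes, dtype=float)
    if s.ndim != 2:
        raise ValueError("expected a 2-D array (repetitions x sections)")
    return s.min(axis=0), s.mean(axis=0), s.max(axis=0)

def envelope_width(shapes):
    """Mean gap between the maximal and minimal shapes.

    Hypothetical summary figure: a narrow envelope means the
    repetitions agree closely (low intra-speaker variability).
    """
    lo, _, hi = shape_bounds(shapes)
    return float(np.mean(hi - lo))
```

Applying `shape_bounds` to the pooled repetitions of several speakers instead of one gives the corresponding inter-speaker envelope.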

Phonetic distinctiveness and speaker individuality are ingrained in the vocal tract shapes estimated from the vowels using formant frequencies. This is demonstrated by the acoustic model of vowels, the speaker-based area function approximation and the formant spread, i.e., the formant spread is larger in F1 and F2 than in F3 and F4.

6.2 SUMMARY

The auto-regressive method of speech analysis, based on linear prediction, has been used. In this model each output sample is predicted from earlier output samples only. The simplest model of the vocal tract consists of many coaxially linked cylindrical tubes, which produces an all-pole transfer function. The vocal tract shape is estimated from the reflection co-efficients obtained by LPC analysis of the speech signal, using Wakita's speech analysis model. Vocal tract area values are obtained for the natural vowels of male and female speakers. The speech samples are acquired at a sampling frequency of 22,100 Hz in 30 ms blocks with an overlap of 10 ms. An LPC filter order of 25 is used.
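
The analysis chain in this paragraph (framing, then linear prediction yielding reflection co-efficients) can be sketched as follows. This is a minimal illustration using the autocorrelation method with the Levinson-Durbin recursion; the Hamming window, the sign convention A(z) = 1 + a1 z^-1 + ... and all function names are assumptions, since the text does not give these details.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30.0, overlap_ms=10.0):
    """Cut x into frames of frame_ms that share overlap_ms with the next frame."""
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = frame_len - int(round(fs * overlap_ms / 1000.0))
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def autocorr(frame, max_lag):
    """Autocorrelation of a Hamming-windowed frame for lags 0..max_lag."""
    w = frame * np.hamming(len(frame))
    full = np.correlate(w, w, mode="full")
    mid = len(w) - 1
    return full[mid: mid + max_lag + 1]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, k, err): predictor polynomial a with a[0] = 1,
    reflection co-efficients k[0..order-1], and the residual energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        k[i - 1] = ki
        err *= 1.0 - ki * ki
    return a, k, err

def lpc_frame(frame, order=25):
    """LPC polynomial and reflection co-efficients of one frame."""
    a, k, _ = levinson_durbin(autocorr(frame, order), order)
    return a, k
```

With the values quoted above, a 30 ms block at 22,100 Hz is 663 samples long and successive blocks start 442 samples apart. Note that an order-p recursion returns p reflection co-efficients, so the order passed to `lpc_frame` has to be chosen to match the number of tube junctions being modeled.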

The dynamic vocal tract models are obtained using 25 cylindrical lossless tubes with 24 reflection co-efficients. From the calculated reflection co-efficients, the denominator co-efficients of the transfer function V(z) = VL(z)/VG(z) are computed. The reflection co-efficient at the lips and the area values vary across the vowels /a/, /e/, /i/, /o/ and /u/. The largest reflection co-efficients occur where the relative changes in area are greatest.
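
The two conversions mentioned here, from reflection co-efficients to tube areas and from reflection co-efficients to the denominator polynomial of V(z), can be sketched as below. The sign convention r = (A_next - A_prev) / (A_next + A_prev), the unit reference area for the first tube and the function names are assumptions for illustration; published formulations differ in sign and in the direction (glottis to lips or the reverse) in which the tubes are indexed.

```python
import numpy as np

def areas_from_reflection(k, first_area=1.0):
    """Tube areas from junction reflection co-efficients.

    Assumed convention: k[i] = (A[i+1] - A[i]) / (A[i+1] + A[i]),
    so A[i+1] = A[i] * (1 + k[i]) / (1 - k[i]).
    N co-efficients give N + 1 tubes, e.g. 24 junctions for 25 tubes.
    """
    k = np.asarray(k, dtype=float)
    if np.any(np.abs(k) >= 1.0):
        raise ValueError("reflection co-efficients must satisfy |k| < 1")
    areas = np.empty(len(k) + 1)
    areas[0] = first_area
    for i, ki in enumerate(k):
        areas[i + 1] = areas[i] * (1.0 + ki) / (1.0 - ki)
    return areas

def step_up(k):
    """Denominator polynomial [1, a1, ..., aN] from reflection co-efficients."""
    a = np.array([1.0])
    for ki in k:
        # a_new[j] = a[j] + ki * a[i - j], with a_new[i] = ki
        a = np.concatenate([a, [0.0]]) + ki * np.concatenate([[0.0], a[::-1]])
    return a
```

Because only area ratios are determined by the reflection co-efficients, the resulting area function is relative to `first_area` rather than an absolute cross-section.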

Phonetic distinctiveness and speaker individuality are ingrained in the vocal tract shapes estimated from the vowels. This is demonstrated by the acoustic models of vowels and the speaker-based area function approximation. A new technique has been proposed to approximate the area function of a person, taken at different times and in different contexts. The resulting variability of the vocal tract shapes is then measured on an intra- and inter-speaker basis using the minimal and maximal vocal tract shapes for vowels of male and female speakers. These are shown in figures 4.53 to 4.62. The bounds of the vocal tract shapes, in terms of maximal average and minimal average, have been obtained for the test samples of vowels collected from male and female speakers. The vocal tract shape correlation of a vowel superimposed on itself, versus the discrimination provided against the vowels pronounced by other male and female speakers, has been computed and is shown in tables 4.11 and 4.12.

Phonetic distinctiveness and speaker individuality are ingrained in the vocal tract shapes estimated from the vowels using formant frequencies. This is demonstrated by the acoustic model of vowels, the speaker-based area function approximation and the formant spread.

The area function approximation approach is discussed and implemented in the 4th chapter. The 5th chapter covers the development and implementation of a new technique for intra- and inter-speaker variation in the form of the formant spreads F12, F23 and F34.

Phonetic distinctiveness and speaker individuality are measured in the form of the formant spreads of F1, F2 and of F3, F4. Phonetic distinctiveness among the vowels, both within and between speakers, is well understood and is mostly ingrained in the lowest formants F1 and F2, i.e., the formant spread is larger for F1 and F2. It is probable that the higher formants carry more speaker-specific information, i.e., their formant spread is smaller than that of F1 and F2. This is demonstrated in figures 5.38 to 5.40 for vowels.

To measure the formant spreads of F1, F2, F3 and F4 for vowels, an algorithm for intra- and inter-speaker variation is developed, as shown in sections 5.5 and 5.6.
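
The algorithm of sections 5.5 and 5.6 is not reproduced in this chapter, so the following is only a sketch of one plausible reading. It assumes that the spread of a formant is its standard deviation, taken within each speaker for the intra-speaker figure and across speaker averages for the inter-speaker figure. The function name `formant_spread` and the array layout are hypothetical.

```python
import numpy as np

def formant_spread(formants):
    """Intra- and inter-speaker spread of each formant, in Hz.

    formants: array of shape (n_speakers, n_utterances, n_formants)
    holding F1..Fn of one vowel for every utterance of every speaker.

    Assumption: "spread" is taken to be the sample standard deviation;
    the thesis may define it differently.
    """
    f = np.asarray(formants, dtype=float)
    if f.ndim != 3:
        raise ValueError("expected speakers x utterances x formants")
    # Within-speaker deviation, averaged over speakers.
    intra = f.std(axis=1, ddof=1).mean(axis=0)
    # Deviation of the per-speaker mean formants.
    inter = f.mean(axis=1).std(axis=0, ddof=1)
    return intra, inter
```

Under this reading, a formant whose inter-speaker spread is large relative to its intra-speaker spread separates speakers well, which is the kind of comparison the chapter draws between the lower and higher formants.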

The formant estimates of vowels for male and female speakers are listed in tables 5.8 and 5.14.

In the previous section, the experimental results for formants F1 and F2 obtained from AR modeling in MATLAB are compared with those obtained from the Praat software. The experimental results for formants F1 and F2 are shown in tables 5.19 and 5.20.
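
For readers unfamiliar with formant estimation from an AR model, a common approach is to take the angles of the complex roots of the LPC polynomial. The sketch below is written in Python rather than MATLAB and is not the code used for the reported results; the function name and the 90 Hz lower cut-off are assumptions, and no bandwidth check is applied.

```python
import numpy as np

def formants_from_lpc(a, fs, n_formants=4, min_hz=90.0):
    """Estimate formant frequencies (Hz) from an LPC polynomial.

    a: co-efficients [1, a1, ..., ap] of A(z), highest power first
       when read as a polynomial in z.
    Each complex-conjugate root pair contributes one candidate
    frequency; only the root with positive imaginary part is kept.
    """
    roots = np.roots(np.asarray(a, dtype=float))
    roots = roots[np.imag(roots) > 0.0]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    freqs = np.sort(freqs[freqs > min_hz])
    return freqs[:n_formants]
```

A practical estimator would usually also discard roots with very wide bandwidths (roots far inside the unit circle), which is one reason results from different tools can differ slightly.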

The first formant F1 of male speakers is compared with the first formant F1 of female speakers, and the second formant F2 of male speakers with the second formant F2 of female speakers. The results are plotted in figures 5.26 and 5.27.

In section 5.11.2, the experimental results obtained by the Euclidean distance method and by MLR using the Standard Deviation method for the vowels of male and female speakers are plotted in figures 5.36 and 5.37 respectively.
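
A Euclidean distance classifier of the kind named above can be sketched as a nearest-centroid rule on formant vectors. The feature choice, the function names and the use of class means as templates are assumptions for illustration; the MLR method is not sketched here because its details are not given in this chapter.

```python
import numpy as np

def fit_centroids(features, labels):
    """Mean feature vector (e.g. [F1, F2]) for every vowel label."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    classes = sorted(set(y.tolist()))
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(sample, classes, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    d = np.linalg.norm(centroids - np.asarray(sample, dtype=float), axis=1)
    return classes[int(np.argmin(d))]

def recognition_rate(features, labels, classes, centroids):
    """Percentage of samples assigned to their true label."""
    hits = sum(classify(f, classes, centroids) == l
               for f, l in zip(features, labels))
    return 100.0 * hits / len(labels)
```

Because formants of different order have different ranges in Hz, a plain Euclidean distance is dominated by the higher formants unless the features are scaled first; whether the thesis applies such scaling is not stated.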

6.3 CONCLUSION

The investigation of intra- and inter-speaker vocal tract shape estimation is based on Wakita's speech analysis model, with reflection co-efficients obtained from LPC filtering of the speech signal. From this initial investigation on vowels, the dynamic vocal tract models are derived by taking 25 cylindrical lossless tubes with 24 reflection co-efficients, a sampling frequency of 22,100 Hz and an LPC order of 25. It is observed that the reflection co-efficient at the lips and the area values vary across the vowels /a/, /e/, /i/, /o/ and /u/.

For vowel /a/, the resulting set of 24 reflection co-efficients corresponds to an area A = 0.9632, with a reflection co-efficient at the lips of rk = 0.9032. The same procedure is repeated for vowels /e/, /i/, /o/ and /u/, giving rk = 0.8800 for an area of 0.8982 for vowel /e/, rk = 0.9642 for an area of 0.8780 for vowel /i/, rk = 0.9334 for an area of 0.9803 for vowel /o/, and rk = 0.8835 for an area of 0.7168 for vowel /u/, as shown in figures 4.22, 4.29, 4.36, 4.43 and 4.50 respectively.

It is found that the higher magnitudes of the reflection co-efficients occur where the relative changes in area are larger.
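
This observation follows directly from the lossless-tube junction relation. As a brief worked illustration, assume the common convention in which the reflection co-efficient at the junction between tube k and tube k+1 depends only on the ratio of their areas (the sign convention and the symbol \rho_k are assumptions, not taken from the thesis):

```latex
r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k}
    = \frac{\rho_k - 1}{\rho_k + 1},
\qquad \rho_k = \frac{A_{k+1}}{A_k}.
% Small relative change:  \rho_k = 1.1  gives  r_k = 0.1 / 2.1 \approx 0.048
% Large relative change:  \rho_k = 3    gives  r_k = 2 / 4 = 0.5
% Equal areas:            \rho_k = 1    gives  r_k = 0
```

The magnitude of r_k is zero when adjacent areas are equal and approaches 1 as the area ratio becomes very large or very small, so the largest co-efficients mark the sharpest relative changes in cross-section.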

Intra- and inter-speaker vocal tract variability for vowels is investigated using LPC coding for male and female speakers. Phonetic distinctiveness and speaker individuality are deeply ingrained in the vocal tract shapes estimated for vowels. This is demonstrated by the acoustic model of vowels and the speaker-based area function approximation of the vocal tract shapes. A new technique (algorithm) has been proposed for finding vocal tract shape variability. It is demonstrated on the acoustic model of vowels by taking the time-variant minimal and maximal vocal tract shape variability for different male and female speakers for the vowels /a/, /e/, /i/, /o/ and /u/. These are plotted in figures 4.53 to 4.62. From the results, it is observed that for vowel /a/ the vocal tract opens at the front, the tongue is raised at the back and there is a low degree of constriction by the tongue against the palate. As shown in figure 4.53, three peaks in the back reflection co-efficients can also be observed between the glottis and the lips.

The first peak shows the first back reflection, accounting for approximately 30% back propagation; the second peak corresponds to resonance in the mid cavity, and the third to resonance or reflection at the teeth and lips. It is observed that the vocal tract is open at the front and the tongue is raised at the back. The tongue is positioned as far as possible from the roof of the mouth.

For the front vowels /e/ and /i/, it is observed that the tongue is positioned forward in the mouth during articulation, as shown in figures 4.54 and 4.55.

For the vowel /i/, the vocal tract is open at the back and the tongue is raised at the front. In addition, there is a high degree of constriction of the tongue against the palate. It is also observed from the figure that the first back reflection is due to the constriction of the tongue cavity, the second is due to the folding of the tongue and the cheeks, and the third is due to the teeth and the back position of the lips.

For vowel /o/, as shown in figure 4.56, it is observed that the lips are generally "pursed" outward, giving exolabial rounding, and the inner sides of the lips are visible. In contrast, in the rounding of mid to high front vowels, the lips are "compressed", with the lip margins pulled in and drawn towards each other. It is also observed that the back reflection co-efficient at the first peak is weak; this is due to the mild, smoother division of the cavity. The second back reflection co-efficient is due to the teeth.

For vowel /u/, as shown in figure 4.57, it is observed that it is a back vowel, so named for the position of the tongue relative to the back of the mouth during articulation. It is also observed that the back reflection co-efficient at the first peak is weak. This is due to the mild, smoother division of the vocal cavity. The second back reflection co-efficient is due to back reflection at the teeth and lips.

Based on the analysis, the bounds of the vocal tract shapes, in terms of maximal average and minimal average, are obtained for the test samples of vowels. The vocal tract shape correlation of vowel /a/ superimposed on itself is compared with the discrimination provided against the other phonemes. When the vowel /a/ is superimposed on itself and on the other vowels, the correlations, or percentage matching, are as follows:

/a/ in /a/ = 87%

/a/ in /e/ = 74%

/a/ in /i/ = 70%

/a/ in /o/ = 54%

/a/ in /u/ = 41%

The correlation results for the remaining vowels and their discriminations are shown in table 4.11. Table 4.11 also shows the probability of fault recognition of vowel /a/ as vowel /e/, followed by vowels /i/, /o/ and /u/ respectively. Fault recognition arises when there is least variability in the correlation graph.

Our study has produced some acoustic articulatory evidence in support of the long-standing claim that the upper formant region of steady-state vowels contains relatively more speaker-specific information than the lower F1, F2 region. F1 and F2 are known to carry phonetic cues. Our results also indicate that speaker individuality can be expected to vary with the place of articulation, with a strong F2 and F4 dependency for front vowels.

The experimental results for vowels of male and female speakers are obtained using the Euclidean distance method and MLR using the Standard Deviation method. It is observed that vowel /a/ achieved better classification, at 96%, with the Euclidean distance method than with the MLR method, whose recognition rate was 92%.

It can be concluded that the Euclidean distance method gives a better overall recognition rate and better vowel classification results than the MLR method. It is also observed that the vowel recognition rate is better for female speakers than for male speakers.

6.4 FUTURE SCOPE

From the investigations reported in this thesis, the vocal tract shape correlation of vowel /a/ superimposed on itself has been compared with the discrimination provided against the other phonemes. When the vowel /a/ is superimposed on itself and on the other vowels, the correlation, or percentage matching, gives a vowel recognition score that is better than that of the other methods. This technique also explains the possibility of fault recognition when there is least variability in the correlation graph.

It can also be concluded from this thesis that phonetic distinctiveness can be measured for front vowels in the form of the formant spread of F1 and F2. The formant spread is larger for F1 and F2 than for F3 and F4.

The dynamic vocal tract shape estimates are found to be consistent when the LPC order is fixed at 25. The order can, however, be varied depending on the application. It is therefore suggested that future investigations of LPC-based shape estimation should model VCV and CVC words for word recognition and speaker identification. It is further suggested that LPC-based shape estimation of vocal tract signatures for glides and nasal phonemes may further reduce uncertainty, and thus improve the accuracy of identification and recognition.

In this thesis, LPC analysis has been applied to vocal tract shape estimation based on reflection co-efficients. Vocal tract shape estimation may also be investigated with other techniques, such as excited LPC and MFCC, for modeling vowels. Further work would be to record a large number of speakers of different age groups and different formal backgrounds, and to investigate vocal tract shape estimation using other analysis techniques such as formant tracking and articulatory analysis by synthesis. These techniques can be applied to regional languages, since the methodology can support word recognition and the detection of vowels within words, such as /a/ in father.
