Signal And Image Processing English Language Essay



Human speech varies widely in its acoustic and linguistic characteristics; in particular, children's speech differs markedly in its acoustics from adult speech (Theodore L. Perry, Ralph N. Ohde, and Daniel H. Ashmead 2001, S. Lee, A. Potamianos, and S. Narayanan 1999, S. Nittrouer and D.H. Whalen 1998). Children's speech generally has a higher pitch and higher formant frequencies than adults' speech. It is also known that the characteristics of children's speech change rapidly with age. This is due to the anatomical and physiological changes that occur during growth, and to children becoming more skilled speakers as they get older. To compare the spectra of male and female speech, the fast Fourier transform is used.

The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform, from which the spectrum of male and female speech sounds can be determined. With this algorithm, differences in the spectrum across gender, vowel and age group can be observed effectively. Much research has been carried out on children's speech development, as summarized in two speech-development research surveys (Ray D. Kent 1976, Houri K. Vorperian 2007). The acoustic development of vowel production plays a vital role in the acoustic features of children's vowels. These acoustic features strongly reflect anatomical and physiological development during childhood.

Past acoustic studies of vowel production in children's and adults' speech (Gordon E. Peterson and Harold L. Barney 1952) have shown that (i) children exhibit higher formant frequencies and pitch than adults; (ii) as children grow older, their vowel formant frequencies, variability and vowel-space area gradually decrease; (iii) differences between the formant frequency patterns of males and females appear between 3 and 5 years of age; and (iv) vowel formant frequencies decrease at a faster rate, and reach lower values, in male children than in female children.

It should be noted that almost all previous studies of acoustic development were carried out on data from English-speaking children, and very little research has been conducted on Malay-speaking children. This paper uses vowels recorded from six Malay children and recognizes them using a neural network technique. The vowels were obtained in a speaker-independent manner. The system applied a Multi-layer Perceptron with one hidden layer to recognize these vowels, and the network was trained and tested with speech samples from Malay children. The children, both male and female, were aged 7-12 years. The signals of the vowels /a/, /e/, /i/, /o/, /u/, and /ae/ obtained from the Malay children were processed using MATLAB software. The results are presented as graphs of signal power versus frequency.


Once the vowel waveforms of the female and male speakers in the 7-12 years group are acquired and digitized, they can be fast-Fourier-transformed to the frequency domain. Signal modeling is the process of converting sequences of speech samples into observation vectors representing events in a probability space (Joseph Picone). MATLAB provides the fft function, which computes the Discrete Fourier Transform (DFT) efficiently using a Fast Fourier Transform (FFT) algorithm (Bren Ninness).

Two main classes of spectral measurement used in speech recognition systems are power, which measures the gross spectral or temporal power of the signal, and spectral amplitude, which measures the power over particular frequency intervals in the spectrum.

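The functions Y = fft(x) and y = ifft(X) implement the transform and inverse transform pair given for vectors of length N by:

$$X(k) = \sum_{j=1}^{N} x(j)\,\omega_N^{(j-1)(k-1)}$$

$$x(j) = \frac{1}{N}\sum_{k=1}^{N} X(k)\,\omega_N^{-(j-1)(k-1)}$$

where

$$\omega_N = e^{-2\pi i/N}$$

is an Nth root of unity.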
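
As a minimal sketch of this transform pair, the code below evaluates the standard DFT definition directly, X(k) = sum over j of x(j) w^((j-1)(k-1)) with w = exp(-2*pi*i/N), together with its inverse. It is written in Python for illustration rather than in MATLAB. The names dft and idft and the four-sample test vector are assumptions made for this example, and the direct O(N^2) evaluation shown here is not how fft itself is implemented.

```python
import cmath


def dft(x):
    """Directly evaluate X(k) = sum_j x(j) * w**((j-1)(k-1)), w = exp(-2*pi*i/N)."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)  # the Nth root of unity
    # Python indexes from 0, so (j-1) and (k-1) in the 1-based formula become j and k.
    return [sum(x[j] * w ** (j * k) for j in range(n)) for k in range(n)]


def idft(X):
    """Inverse transform: x(j) = (1/N) * sum_k X(k) * w**(-(j-1)(k-1))."""
    n = len(X)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(X[k] * w ** (-j * k) for k in range(n)) / n for j in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]   # illustrative samples, not data from the experiment
spectrum = dft(signal)          # approximately [10, -2+2j, -2, -2-2j]
recovered = idft(spectrum)      # approximately the original samples
```

Applying idft to the output of dft returns the original samples up to rounding error, which is what makes the two functions a transform pair.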

In MATLAB®, the fft functions are based on a library called FFTW. When N is a prime number, the FFTW library first decomposes an N-point problem into three (N - 1)-point problems using Rader's algorithm. It then uses the Cooley-Tukey decomposition to compute the (N - 1)-point DFTs.

For most N, real-input DFTs require roughly half the computation time of complex-input DFTs. However, when N has large prime factors, there is little or no speed difference. The execution time for fft depends on the length of the transform. It is fastest for powers of two. fft supports inputs of data types double and single. If one calls fft with the syntax y = fft(X,…), the output y has the same data type as the input X.
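
One way to see why power-of-two lengths are convenient is a recursive radix-2 Cooley-Tukey decomposition, sketched below in Python. This is a textbook illustration under the assumption that the length is a power of two; it is not the FFTW implementation, and the names fft_radix2 and dft and the sample values are chosen only for this example.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x):
    """Direct O(N^2) DFT, used here only to check the fast version."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


samples = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, -0.25]  # illustrative values
fast = fft_radix2(samples)
slow = dft(samples)
max_error = max(abs(a - b) for a, b in zip(fast, slow))  # rounding-level difference
```

Each level of recursion halves the problem, so a length of 2^m needs only m levels. Lengths with large prime factors cannot be split this way, which is why they need other strategies.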

Y = fft(X,n) returns the n-point DFT. fft(X) is equivalent to fft(X,n) where n is the size of X in the first nonsingleton dimension. If the length of X is less than n, X is padded with trailing zeros to length n. If the length of X is greater than n, the sequence X is truncated. When X is a matrix, the lengths of the columns are adjusted in the same manner.
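
The padding and truncation rule can be sketched for the one-dimensional case as follows. This Python example mirrors the behaviour described above for vectors only. The helper name n_point_dft is hypothetical, and the direct dft routine stands in for a real FFT.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT standing in for an FFT routine."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def n_point_dft(x, n=None):
    """Return the n-point DFT of x, zero-padding or truncating x to length n."""
    if n is None:
        n = len(x)  # default: transform length equals the signal length
    # Keep at most n samples, then append trailing zeros if x was shorter than n.
    adjusted = list(x[:n]) + [0.0] * max(0, n - len(x))
    return dft(adjusted)


padded = n_point_dft([1.0, 2.0, 3.0], 8)               # 3 samples padded to 8
truncated = n_point_dft([1.0, 2.0, 3.0, 4.0, 5.0], 4)  # 5 samples cut to 4
```

Zero-padding does not add information to the signal; it only evaluates the spectrum on a finer frequency grid.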

Since the FFT generates the frequency spectrum of a time-domain waveform, harmonic analysis, distortion analysis, and modulation measurements can be performed on the transformed waveform. Elements of a speech signal include spectral resonances, periodic excitation corresponding to pitch, voicing and pitch contour, noise excitation, transients, amplitude modulation and timing. The signals are measured as time signals and then converted into spectra; in this assignment, they are converted into power spectra. This spectral analysis shows the amplitude of the various frequencies contained in the signal. It is usual for resonances to appear in the spectrum; a resonance shows up as a comparatively large amplitude at a specific frequency.

The absolute value of a spectral component gives the amount of signal content at that frequency, and the square of the absolute value is taken as the power of the signal at that frequency. The MATLAB code used to determine and analyze the speech sounds of males and females, and the flow chart depicting the sequence of steps in the code used to obtain the power spectrum of each individual signal, are shown in Figure 1 and Figure 2 respectively.
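
A minimal sketch of this power computation is given below, in Python rather than MATLAB. It squares the magnitude of each DFT bin and maps bin k to the frequency k*fs/N. The function name power_spectrum, the sampling rate fs = 800 Hz and the synthetic 100 Hz tone are all assumptions made for illustration; they are not the recordings or settings used in the experiment.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT standing in for an FFT routine."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def power_spectrum(x, fs):
    """Return (frequencies in Hz, power) for the non-negative-frequency bins."""
    n = len(x)
    X = dft(x)
    half = n // 2 + 1  # a real signal's spectrum is mirrored above fs/2
    freqs = [k * fs / n for k in range(half)]
    power = [abs(X[k]) ** 2 for k in range(half)]  # square of the absolute value
    return freqs, power


fs = 800.0  # assumed sampling rate in Hz
n = 64      # number of samples, giving 12.5 Hz between bins
tone = [math.sin(2 * math.pi * 100.0 * t / fs) for t in range(n)]  # synthetic 100 Hz tone
freqs, power = power_spectrum(tone, fs)
peak_freq = freqs[power.index(max(power))]  # frequency of the strongest bin: 100.0
```

Plotting power against freqs gives a signal power versus frequency graph of the kind described in this essay.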


This experiment was conducted to determine the spectra of male and female speech sounds and to observe differences in the spectrum across gender, vowel and age group. It involved male and female children aged 7-12 years. The speech signals for the vowels /a/, /e/, /i/, /o/, /u/, and /ae/ were collected and processed using MATLAB software. The results are in the form of signal power versus frequency.

From the results, it is observed that each vowel has a unique pattern that does not change with the pitch of the speech (female speakers are known to have a higher pitch than male speakers). Each vowel is recognized from its formant frequencies, which are the peaks in the spectrum of a sound. Usually, the first two formants (f1 and f2) are used to distinguish vowels. According to (Ladefoged, 2001), the first formant, f1, usually has a higher frequency for an open vowel like /a/ and a lower frequency for a close vowel like /i/ or /u/, while the second formant, f2, has a higher frequency for a front vowel such as /i/ and a lower frequency for a back vowel like /u/. Formant frequency analysis is important because it provides details of the articulatory behaviors (typical or atypical) involved in speech. Figure 1 shows the position of the formant frequencies in a spectrum.
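
Reading formants as spectral peaks can be illustrated with a naive peak picker, sketched below in Python. It takes the two strongest local maxima of a power spectrum as rough candidates for f1 and f2. This is only a sketch: practical formant estimation usually works on a smoothed spectral envelope rather than the raw spectrum. The function names and the synthetic two-tone signal (300 Hz and 2300 Hz, sampled at 8000 Hz) are assumptions for this example and are not measurements from the experiment.

```python
import cmath
import math


def power_spectrum(x, fs):
    """Return (frequencies in Hz, power) for the non-negative-frequency bins."""
    n = len(x)
    half = n // 2 + 1
    freqs = [k * fs / n for k in range(half)]
    power = [abs(sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                     for j in range(n))) ** 2
             for k in range(half)]
    return freqs, power


def strongest_peaks(freqs, power, count=2):
    """Return the frequencies of the `count` strongest local maxima, low to high."""
    peaks = [k for k in range(1, len(power) - 1)
             if power[k] > power[k - 1] and power[k] > power[k + 1]]
    top = sorted(peaks, key=lambda k: power[k], reverse=True)[:count]
    return sorted(freqs[k] for k in top)


fs = 8000.0  # assumed sampling rate in Hz
n = 80       # 100 Hz between bins
# Synthetic stand-in for a vowel: a strong low component and a weaker high one.
signal = [math.sin(2 * math.pi * 300.0 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 2300.0 * t / fs)
          for t in range(n)]
freqs, power = power_spectrum(signal, fs)
candidates = strongest_peaks(freqs, power)  # [300.0, 2300.0]
```

A low first peak with a high second peak is the pattern described above for a close front vowel such as /i/.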

From the experiment, it is observed that each vowel has a main formant region in which f1 and f2 lie. For the vowel /a/, the first formant region lies between 200 and 400 Hz. Referring to Figure 2, the main formant region for adults speaking Bahasa Malaysia for the vowel /a/ is between 500 and 2000 Hz, with the first formant region, f1, lying between 500 and 1000 Hz. This range is far higher than the range found in the results, even though formant frequencies are known to decrease with age. The main formant region for these children should therefore be higher than that of adults uttering the same vowel. (Note that the results from the experiment are in the form of signal power, whereas the results from (Shahrul Azmi et al, 2010) are in dB. However, they are comparable, since the peaks of the power spectrum from the experiment remain at the same frequencies when it is converted to dB.) The same problem also occurs for other vowels (/e/, /o/ and /ee/), whose main formant regions are lower than those of adults uttering the same vowels.
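
The comparability of power and dB spectra follows from the conversion 10*log10(P), which is monotonic and therefore leaves peak positions unchanged. The Python sketch below illustrates this. The function name power_to_db, the reference level of 1.0, the floor value and the example power values are assumptions made for this illustration; they are not taken from the experiment or from (Shahrul Azmi et al, 2010).

```python
import math


def power_to_db(power, reference=1.0, floor=1e-12):
    """Convert power values to decibels relative to `reference`.

    Values below `floor` are clamped so that log10 never receives zero.
    """
    return [10.0 * math.log10(max(p, floor) / reference) for p in power]


power = [0.5, 4.0, 100.0, 9.0, 25.0, 2.0]  # illustrative power spectrum values
db = power_to_db(power)

# The largest value sits at the same index before and after conversion.
same_peak = power.index(max(power)) == db.index(max(db))

# The ordering of all bins is preserved as well.
rank_power = sorted(range(len(power)), key=lambda k: power[k])
rank_db = sorted(range(len(db)), key=lambda k: db[k])
```

The dB scale compresses the height differences between peaks, so weaker peaks are easier to see, but the frequencies at which the peaks occur are the same.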

This disagreement with the fact that formant frequencies decrease with age may lie in the execution of the experiment. Errors may have been made during data collection, since the results depend on the equipment used (the type of speaker used) and on whether the children were properly trained to pronounce the vowels correctly prior to the experiment, among other factors. These factors may have affected the experiment, producing results that are inconsistent with expectations. Surprisingly, however, unlike the other vowels, the main formant regions of the children's vowels /i/ and /u/ agree with the results of (Shahrul Azmi et al, 2010), which calls this explanation into question.

Figure 4. Spectrum envelope of vowels for different speakers match up (Shahrul Azmi et al, 2010)

According to (Theodore, 2001), vowel formant frequencies differentiate gender in children as young as 4 years old. Males are known to have a lower pitch than females, and they also tend to have lower formant frequencies. (Bennet, 1981) suggested that the difference arises mainly because boys usually use a smaller jaw opening, more lip rounding and a lower larynx position than girls. The experiment supports this claim in general, but some of the results show boys with the same or even higher formant frequencies than girls. This is probably because those boys had not yet reached their "mature age", unlike girls of the same age. Girls who have reached "mature age" experience a change in voice in which their formant frequencies decrease, so boys who have not yet reached "mature age" can have higher formant frequencies than their female counterparts.

Vowel formant frequencies decrease with increasing age. From childhood to adulthood, vowel formant frequencies decrease at a faster rate, and reach lower values, for males than for females. According to (Theodore, 2001), vowel formant frequencies differentiate gender in children as young as four years of age, while both formant frequencies and the fundamental frequency, f0, differentiate gender after 12 years of age. However, the results show very little decrease in formant frequency with age. In my opinion, the difference would be much more significant if the age gap were larger. The pattern would be much clearer if, for instance, children aged 7-12 were compared with adults aged 30-40. The difference cannot be clearly established here because the data were collected from primary school children with a minimum age difference of 1 year and a maximum age difference of 5 years. There is no clear relationship between signal power and age, gender or vowel.

More studies should be conducted on Bahasa Malaysia vowel recognition. Understanding the developmental changes in children's speech can help devise strategies to deal with acoustic mismatch between different age groups and gender. This analysis is important in applications such as Automatic Speech Recognition (ASR) and in early literacy as well as reading assessment.