The project aims to transform the speech of a source speaker into distinct speech that sounds as if it were spoken by a target speaker. This process changes the identity of the speaker without changing the content of the speech. The project is implemented using the Linear Predictive Coding (LPC) algorithm. This document describes the methodology and the features of the system, together with their advantages and drawbacks. The LP coefficients are generated using the Levinson-Durbin algorithm, and the LPC technique is used for speech analysis and synthesis. Graphs are produced for the Line Spectral Frequencies (LSF), also called line spectral pairs (LSP). The entire project is developed in MATLAB version 7.5.
The study and analysis of speech requires the ability to model the acoustic speech waveform mathematically as a linear, time-invariant system. This proved difficult because speech has a complex structure, is a continuous process over time, and depends on anatomical and physical changes of the vocal tract that vary with time. However, studies on speech production and analysis have shown that voiced speech over short time intervals (on the order of tens of milliseconds) behaves in a periodic fashion, with each period being almost identical to the previous one. A time-domain representation of a speech signal shows this periodicity over short durations. Hence, effective analysis and modelling of speech can be done on signals taken over short durations. Frequency-domain analysis of voiced sounds shows cyclic behaviour at a rate of 1/P, where P is the pitch period. In the case of unvoiced signals, however, no periodic behaviour is observed. Speech analysis forms the foundation for recognition, synthesis and coding.
Speech synthesis is the artificial production of human speech. The modern electronic approach to speech synthesis began in the 1930s at Bell Laboratories. Since that time the quality and cost-effectiveness of synthesized speech have improved dramatically. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely synthetic voice output.
Figure 0-1 Speech Synthesis Based on LPC Model
Speech synthesis based on the LPC model, which is widely used to produce synthetic speech, is illustrated in the figure above. The coefficients of the synthesis filter are calculated by interpolating the LPC parameters. For speech modification, a mapping is performed between the original and modified pitch marks, so that the LPC parameters are obtained from the suitable pitch period in the original speech. Each speech segment is then synthesized using the LPC method.
1.3.Speech Analysis and Synthesis techniques
In the past, the Linear Predictive Coding technique was used mainly in the communications field. Recently it has found applications in speech analysis and synthesis.
Linear predictive coding: Analysis and synthesis of speech using Linear Predictive Coding requires only a low bit rate. The synthesis portion of LPC is used in various applications, such as telecommunications. Linear Predictive Coding provides accurate estimates of speech parameters, is efficient to compute, and is an effective method for voice encoding. Furthermore, many aspects of linear prediction modelling are incorporated in more recent, higher-quality speech coders.
Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters, and is relatively efficient for computation [HOLMO1].
1.5.Main Goal and Objectives of Project
The main objective of the project is speech alteration based on three parameters of LPC, producing an altered voice signal with distinct voice characteristics without changing the content of the speech. The project is developed using the MATLAB simulation tool, with a human voice as input and LPC as the implementation algorithm.
1. The first chapter is the introduction, which describes the aims and objectives of the project.
2. The second chapter describes the state of the art and background knowledge of linear predictive coding and line spectral pairs.
3. The third chapter describes the system analysis and implementation.
4. The fourth chapter describes the performance analysis and the output graphs.
2. STATE OF THE ART AND BACKGROUND KNOWLEDGE:
2.1.Review of Literature
This chapter describes the properties of speech on which LPC is based. Following that, the fundamentals of Linear Predictive Coding are presented, and some conventional methods for obtaining the LP coefficients are summarized. Furthermore, the spectral representation used in the implementation of this speech analysis and synthesis system, Line Spectral Frequencies (LSF) or line spectral pairs (LSP), is explained.
Finally, the pitch period estimation method used for re-synthesizing the speech signal is introduced.
2.2.Classification of Speech
The classification of the speech signal into voiced, unvoiced, and silence provides a preliminary acoustic segmentation of speech, which is important for speech analysis. The purpose of the classification is to determine whether a speech signal is present and, if so, whether the production of speech involves the vibration of the vocal folds. The vibration of the vocal folds produces periodic or quasi-periodic excitations to the vocal tract for voiced speech, whereas pure transient and turbulent noises are aperiodic excitations to the vocal tract for unvoiced speech. When both quasi-periodic and noisy excitations are present simultaneously, the speech is classified here as voiced because the vibration of the vocal folds is part of the speech act. The mixed excitation, however, could also be treated as an independent category [HOLMO].
Figure 0-2 Classification block
2.1.1. Properties of speech: A speech signal can roughly be divided into two classes, voiced and unvoiced. Voiced sounds are produced when the flow of air coming out of the lungs is interrupted by the periodic opening and closing of the vocal cords. The sound pressure wave after the vocal cords is referred to as the glottal excitation. For voiced speech, the glottal excitation is quasi-periodic, and each period is called a pitch pulse. For unvoiced speech, the vocal cords do not vibrate and the glottal signal is noise-like. Unvoiced sounds are produced when a turbulent flow of air passes through a constriction somewhere along the vocal tract. The vocal tract starts above the larynx and ends at the lips, and includes the oral and nasal cavities. The action of the vocal tract is to introduce resonances in the speech spectrum. The shape of the vocal tract changes relatively slowly, leading to a slow rate of change in the speech envelope spectrum. Fig 2-2 shows a segment of speech recorded with a microphone. It exhibits both voiced and unvoiced regions.
Figure 0-3 A speech segment with voiced and unvoiced regions
Due to the inherent limitations of the human vocal tract, speech signals are highly redundant. These redundancies allow speech coding algorithms to compress the signal by removing the irrelevant information contained in the waveform. Knowledge of the vocal system and the properties of the resulting speech waveform are essential in designing efficient coders. The properties of the human auditory system, although not as important, can also be exploited to improve the perceptual quality of the coded speech.
2.3.Literature on Linear Predictive Coding
In the past, Linear Predictive Coding was used mainly in the communications field. Recently it has found applications in speech analysis and synthesis, for example in speaker identification and word recognition. LPC estimates the predictor coefficients from the time-domain speech waveform rather than from a short-term frequency-domain representation, which makes it a relatively efficient method for encoding speech compared with frequency-domain techniques.
Speech analysis and synthesis with Linear Predictive Coding exploit the predictable nature of speech signals: each sample can be estimated from several previous samples. The main concept behind linear predictive coding is that a speech sample can be reasonably predicted by a weighted sum of previous speech samples. The process therefore involves solving a set of linear equations to determine the predictor coefficients, which are obtained by minimising the squared difference between the actual speech samples and the linearly predicted ones. The basic problem of the LPC system is to determine the formants from the speech signal. The solution is a difference equation that expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, which is why the method is referred to as Linear Predictive Coding. A further reason for using the LPC technique in this project is that it provides highly accurate estimates of speech parameters and is comparatively efficient to compute.
2.3.1.Linear Predictive Coding
This section describes the properties of speech analysis and synthesis based on LPC. The fundamentals of Linear Predictive Coding are presented, and some conventional methods for obtaining the LP coefficients are summarized. LPC starts with the assumption that the speech signal is produced by a buzzer at the end of a tube. The space between the vocal cords produces the buzz, which is characterized by its intensity and frequency. The vocal tract forms the tube, which is characterized by its resonances, which are called formants.
LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal is called the residue.
The numbers which describe the formants and the residue can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the residue to create a source signal, use the formants to create a filter and run the source through the filter, resulting in speech.
Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames. Usually 30 to 50 frames per second give intelligible speech with good compression.
Furthermore, one spectral representation used in the implementation of this voice alteration system, Line Spectral Frequencies (LSF), is also explained.
Finally, the pitch period estimation method used for re-synthesizing the speech signal is introduced [MARK76].
Purpose of LPC: Linear prediction is a good tool for the analysis of speech signals. Linear prediction models the human vocal tract as an infinite impulse response (IIR) system that produces the speech signal. For vowel sounds and other voiced regions of speech, which have a resonant structure and a high degree of similarity over time shifts that are multiples of their pitch period, this modelling produces an efficient representation of the sound.
Working principle: LPC exploits the redundancies of a speech signal by modelling the speech signal as a linear filter, excited by a signal called the excitation signal. The excitation signal is also called the residual signal. Speech coders process a particular group of samples, called a frame or a segment. The speech encoder finds the filter coefficients and the excitation signal for each frame. The filter coefficients are derived in such a way that the energy at the output of the filter for that frame is minimized. This filter is called an LP analysis filter. The speech signal is first filtered through the LP analysis filter, and the resulting signal is the residual signal for that particular frame. In the decoder, the inverse of the LP analysis filter acts as the LP synthesis filter, while the residual signal acts as the excitation signal for the LP synthesis filter [MARK76]. The whole process is shown in Figure 2-3.
Figure 2-3. LPC working principle diagram
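The working principle can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's MATLAB code: the function names `lp_residual` and `lp_synthesize` are hypothetical, and the predictor coefficients are assumed to follow the convention in which each sample is predicted as a weighted sum of the previous samples.

```python
def lp_residual(signal, coeffs):
    """LP analysis filter: residual e[n] = s[n] - sum_k a_k * s[n-k].

    coeffs holds a_1..a_P (assumed convention); samples before n = 0 are zero.
    """
    residual = []
    for n, sample in enumerate(signal):
        prediction = sum(a * signal[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual


def lp_synthesize(residual, coeffs):
    """LP synthesis filter (inverse of the analysis filter):
    s[n] = e[n] + sum_k a_k * s[n-k], driven by the residual."""
    output = []
    for n, e in enumerate(residual):
        prediction = sum(a * output[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        output.append(e + prediction)
    return output


# Hypothetical frame and coefficients, chosen only for illustration.
frame = [1.0, 0.5, -0.25, 0.75, 0.1]
coeffs = [0.9, -0.2]
rebuilt = lp_synthesize(lp_residual(frame, coeffs), coeffs)
```

When the residual is passed unchanged to the synthesis filter, the original frame is recovered up to rounding error. A coder saves bits by replacing the residual with a compact description such as a pitch value and a gain.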
Applications for the Blind.
Applications for the Deafened and Vocally Handicapped.
Applications for Telecommunications and Multimedia.
Limitations of LPC: The oversimplified model that LPC relies on has relatively low computational cost and makes the low bit rate speech coder a practical reality. But this model is also inaccurate in many circumstances, creating annoying artifacts in the synthetic speech.
Linear predictive coding also has some disadvantages in speech processing systems. LPC is an analysis/synthesis technique for lossy speech compression that attempts to model the human production of sound instead of transmitting an estimate of the sound wave. In secure telephone systems, Linear Predictive Coding achieves a bit rate of 2400 bits/sec, which is well suited to telephony, where the content and meaning of the speech matter more than its quality. Because of this low bit rate, LPC is unable to maintain high speech quality. LPC also does not model zeros in the system, and for speech with a high pitch frequency the predictor coefficients are influenced by the pitch period.
2.3.2.Linear Predictive Speech Coding Standards
There has recently been enormous activity in establishing speech coding standards both nationally and internationally, and these standards have a substantial impact on speech coding techniques. For this project, it is not necessary to address speech coding standards in full. However, the standards are helpful for the implementation of the system. For instance, they help in setting some of the system parameters and in measuring system stability.
LPC prediction order: 10 for voiced and 4 for unvoiced speech
Pitch estimation: AMDF method, range 51.3-400 Hz, coding: semi-log, 60 values
Summary: Linear Predictive Coding is a powerful speech analysis technique for representing speech for low bit rate transmission or storage. The technique is mostly used in processing speech and audio signals. Its low bit rate output can be used in most applications, such as telephone systems. Although the quality of the output signal is not high, it is acceptable for human listening.
The Linear Predictive Coding analyser breaks a sound signal into segments and sends information on each segment to the synthesizer. The information on whether the segment is voiced or unvoiced is used to create an excitation signal in the synthesizer. The analyser also sends information about the vocal tract, which is used to build a filter on the synthesizer side to reproduce the original signal.
The Levinson-Durbin algorithm uses the autocorrelation method to estimate the linear prediction parameters for a segment of a random signal. It is an important method for finding an all-pole IIR filter with a prescribed deterministic autocorrelation sequence, and an error term is evaluated at each iteration of the procedure. If the error term falls below an acceptable threshold, the iteration is truncated and the succeeding reflection coefficients are set to zero. The Levinson algorithm has applications in filter design, coding and spectral estimation, and the filter it produces is minimum phase. In MATLAB, a = levinson(r, p) finds the coefficients of a pth-order autoregressive linear process which has r as its autocorrelation sequence. Here r is a real or complex deterministic autocorrelation sequence and p is the order of the denominator polynomial A(z), that is, a = [1, a(2), ..., a(p+1)]. The filter coefficients are ordered in descending powers of z.
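As a rough illustration of the recursion, the following Python sketch computes the coefficients for real-valued autocorrelation input. It is not MATLAB's implementation: the name `levinson_durbin` is hypothetical, and the output layout [1, a_1, ..., a_p] for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is assumed here to mirror the layout described above. With this layout, the weights that predict a sample from its predecessors are the negatives of a_1..a_p.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations recursively.

    r     : autocorrelation values r[0..order] (real numbers)
    return: (a, reflections, error) where a = [1, a_1, ..., a_order]
            describes A(z) = 1 + sum_k a_k z^-k (assumed layout).
    """
    a = [1.0] + [0.0] * order
    error = r[0]                      # prediction error energy at order 0
    reflections = []
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error              # reflection coefficient for this order
        reflections.append(k)
        prev = a[:]
        for j in range(1, i):         # update the lower-order coefficients
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k          # error energy never increases
    return a, reflections, error


# Autocorrelation of a first-order process with r[l] = 0.5 ** l.
a, reflections, error = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the second reflection coefficient is zero, which reflects the fact that a first-order model already explains the sequence. Each reflection coefficient has magnitude below one whenever the autocorrelation sequence is valid.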
2.5.Line Spectral Pairs
The line spectrum frequencies were first introduced by Itakura as an alternative parametric representation of linear prediction coefficients. Due to many desirable properties, the LSF has received widespread acceptance in speech analysis applications.
Figure 2-4 Line spectral pairs in speech analysis and synthesis
The Line Spectral Frequency representation has a number of useful properties. LP parameters have a large dynamic range of values, which makes them poorly suited to quantization, whereas LSFs have a bounded range. The LSF representation also has a sequential ordering of the parameters and offers a simple check for filter stability. This makes it desirable for the quantization of LPC parameters. Moreover, LSF is a frequency-domain representation, so it can be used to exploit certain properties of the human perception system. Here the origin of the LSF is explained and the properties of this alternative representation are described. LPC analysis results in an all-pole filter with p poles, whose transfer function is denoted by
$$H(z) = \frac{1}{A(z)}$$
in which the inverse filter is given by
$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$$
Here p is the order of the LP analysis. Two polynomials are formed from A(z). The symmetric or even polynomial is called the sum filter polynomial
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1})$$
and the antisymmetric or odd polynomial is called the difference filter polynomial
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1})$$
The angles of the roots of the polynomials P(z) and Q(z) are called the LSFs. Each of the polynomials P(z) and Q(z) has p+1 roots. Of their total of 2p+2 roots, two are located at z = 1 and z = -1. These are known as the extraneous roots. The roots of P(z) and Q(z) have the following properties:
1. They lie on the unit circle, so each root is determined by its angle. This property holds when A(z) is minimum phase, which guarantees the existence of the LSFs.
2. The angles occur in pairs: because the polynomial coefficients are real, the roots occur in complex conjugate pairs.
3. The roots of P(z) and Q(z) are interlaced on the unit circle, so their angles are ordered as
$$0 < \omega_1 < \omega_2 < \cdots < \omega_p < \pi$$
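The construction of the sum and difference polynomials can be sketched in Python as follows. This is an illustrative sketch only: the helper names `lsp_polynomials` and `evaluate` are hypothetical, and the list `a` is assumed to hold the coefficients of A(z) in ascending powers of z^-1, starting with 1. Root finding, which yields the LSF angles themselves, is left out.

```python
def lsp_polynomials(a):
    """Build P(z) and Q(z) from A(z).

    a = [1, c_1, ..., c_p] holds A(z) = 1 + c_1 z^-1 + ... + c_p z^-p.
    Returns coefficient lists of length p + 2 in ascending powers of z^-1.
    """
    extended = list(a) + [0.0]        # A(z) padded to degree p + 1
    mirrored = extended[::-1]         # coefficients of z^-(p+1) * A(1/z)
    p_poly = [x + y for x, y in zip(extended, mirrored)]
    q_poly = [x - y for x, y in zip(extended, mirrored)]
    return p_poly, q_poly


def evaluate(poly, z):
    """Evaluate a polynomial in z^-1 at the point z."""
    return sum(c * z ** (-k) for k, c in enumerate(poly))


# Hypothetical minimum-phase example of order p = 2 (roots at 0.5 and 0.4).
a = [1.0, -0.9, 0.2]
p_poly, q_poly = lsp_polynomials(a)
```

In this example P(z) has symmetric coefficients and Q(z) has antisymmetric coefficients, Q(z) vanishes at z = 1 and P(z) vanishes at z = -1, and averaging the two polynomials returns A(z). For odd p both extraneous roots belong to Q(z) instead.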
2.6.Previous Works done on Speech Analysis and Synthesis
We began with the basic implementation of LP analysis. We then discussed the effect on the performance of LP analysis of changing different parameters (LP order, window length, frame length, pitch). The performance was measured in terms of the quality of the LPC output speech.
The MATLAB programming language has become almost a mandatory medium for many signal processing tasks. Its popularity is due to several factors, chiefly its fairly complete set of facilities for dealing with a wide variety of applications. Lower-level programming languages such as C++ and Java can be used to demonstrate applications of DSP, but this requires a high proficiency in programming. There are, however, several high-level software packages that can be used to teach signal processing, such as MATLAB, Mathematica or LabVIEW. MATLAB provides a flexible integrated programming environment which is easy to use and understand. It has superior graphics handling and visualization capabilities, and programming in it is easy and forgiving. Applications written in MATLAB are open, and any user can look at the code. For these reasons, MATLAB is the best choice for developing applications that satisfy the above-mentioned criteria.
3.SYSTEM ANALYSIS AND IMPLEMENTATION
This chapter documents the implementation of the voice alteration system. Firstly, different methods for source speech input are implemented. Then the parameters used for LPC analysis are discussed. Furthermore, the conversion of the LP coefficients to LSFs is presented. Finally, the speech feature parameters are modified to alter the voice characteristics.
Can this system be created with the resources (and budget) we have available?
Will this system significantly improve the organization?
Does the old system even need to be replaced?
The package selected to develop the project is the MATLAB 7.0 tool, which offers more advanced features. As the system is to be developed in the signal processing domain, MATLAB was preferred because it supports all the required class libraries.
Windows XP with all features is selected as the Development (Operating System) area to install and develop the system in MATLAB.
Windows XP Professional offers a number of features unavailable in the Home Edition, including:
The ability to become part of a Windows Server domain, a group of computers that are remotely managed by one or more central servers.
A sophisticated access control scheme that allows specific permissions on files to be granted to specific users under normal circumstances. However, users can use tools other than Windows Explorer (like cacls or File Manager), or restart to Safe Mode to modify access control lists.
Remote Desktop server, which allows a PC to be operated by another Windows XP user over a local area network or the Internet.
Offline Files and Folders, which allow the PC to automatically store a copy of files from another networked computer and work with them while disconnected from the network.
Encrypting File System, which encrypts files stored on the computer's hard drive so they cannot be read by another user, even with physical access to the storage medium.
Centralized administration features, including Group Policies, Automatic Software Installation and Maintenance, Roaming User Profiles, and Remote Installation Service (RIS).
Internet Information Services (IIS), Microsoft's HTTP and FTP Server.
Support for two physical central processing units (CPU). (Because the number of CPU cores and Hyper-threading capabilities on modern CPUs are considered to be part of a single physical processor, multi-core CPUs are supported by XP Home Edition.)
Windows Management Instrumentation Console (WMIC): WMIC is a command-line tool designed to ease WMI information retrieval about a system by using simple keywords (aliases).
The next step in the analysis is the feasibility study. By performing a feasibility study, the scope of the system is defined completely.
Most computer systems are developed to satisfy a known user requirement. This means that the first event in the life cycle of a system is usually the task of studying whether or not it is feasible to computerize the system under consideration. Once the decision is made, a report is produced, known as the Feasibility Report.
The feasibility is studied under three contexts: technical, economic and operational.
What resources are available for the proposed system? Is the problem worth solving? In the proposed system, technical feasibility centres on the existing computer system (hardware, software etc.) and the extent to which it can support the proposed system. No additional hardware cost is involved, because all the required hardware is present in the existing system and the specified software also exists. Therefore, only the software needs to be installed on the existing system for the project. The operation of this system requires knowledge of Windows XP or Windows Professional, and this assistance is easily available.
Even though these technical requirements are needed for implementing the system, once the code is generated and compiled, the executable code of the project is sufficient to run the application. Hence the proposed system is technically feasible.
Economic feasibility is used for evaluating the effectiveness of a candidate system. The procedure is to determine the benefits and savings that are expected from a candidate system and compare them with the costs. If the cost is low and the benefit is high, the decision is made to design and implement the system. The required facilities, hardware and software, may be costly initially, but when put to use they prove to be much more economical than the existing system. Regarding maintenance, since the source code will be with the company, any small and necessary changes can be made with minimal maintenance cost. The system that is developed and installed must be a good investment for the organization. The organization has to spend on technology, as it is not yet computerized. The performance of the present system is high when compared with the previous system. So the cost factor is acceptable for the organization, and the system is economically feasible.
If installed, the system will certainly be beneficial, since it reduces manual work and increases the speed of work, thereby increasing the profit of the company and saving time. The proposed system is cost-effective compared with the existing system. Hence the system is economically feasible.
The Speech Synthesis and Analysis system is mainly concerned with modulating the voice. This is done by using signal processing class libraries. The system should include features such as:
Generate Pitch value
Generate LPC Coefficients
LPC Analysis & Synthesis
The main problem in developing a new system is gaining acceptance and co-operation from the users, because many users are reluctant to operate a new system. The software being developed is highly interactive and responds instantaneously; moreover, even a new user can operate and execute the system easily. So it is operationally feasible.
3.6.Data Flow Diagram
A data flow diagram is a graphical representation that depicts information flow and the transforms that are applied as data move from input to output.
A data flow diagram may be used to represent a system or software at any level of abstraction. DFD's can be partitioned into levels that represent increasing information flow and functional details.
A level 0 DFD, also called a fundamental system model or a context model, represents the entire software element as a single bubble with input and output data indicated by incoming and outgoing arrows, respectively. Each of the processes represented at level 1 is a subfunction of the overall system depicted in the context model.
Data Flow Diagram Symbols:
- Source or Destination of data
- Data Flow
Figure 0-4 Block diagram of Speech Analysis and Synthesis using LPC method
The block diagram describes the process flow from voice input to synthesized output. First, the frame size and the overlapping frame size are computed, and a Hamming window is applied to each frame. Next, the LPC filter coefficients, the filter gain and the error signal are generated. These three signals are then transmitted to the receiver. An inverse filter is constructed from the LPC coefficients and the filter gain. Finally, the error signal is applied to the inverse filter to synthesize the speech.
The LPC synthesis filter is excited by either an impulsive source or a noise source, depending on whether the analyzed speech is estimated to be voiced or not. The excitation is specified by a gain factor, a voiced/unvoiced bit and, if voiced, a fundamental frequency value. Spectral and excitation parameters are fetched from the stored speech units, typically periodically. Often the parameters from successive frames are linearly interpolated during a frame, to allow more frequent updates to the synthesizer.
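The choice between the two excitation sources can be sketched as follows. This Python fragment is only an assumption-laden illustration: the function name `make_excitation`, its parameters, the default pitch period of 80 samples and the use of Gaussian noise are all hypothetical choices, not details taken from the project.

```python
import random


def make_excitation(length, voiced, pitch_period=80, gain=1.0, seed=0):
    """Return an excitation sequence for one frame.

    voiced   : True selects an impulse train, False selects noise
    pitch_period : spacing of the impulses in samples (assumed parameter)
    gain     : scale factor applied to the excitation
    seed     : fixes the noise so that the sketch is repeatable
    """
    if voiced:
        # One impulse every pitch_period samples, zeros in between.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    # Zero-mean Gaussian noise stands in for the turbulent source.
    return [gain * rng.gauss(0.0, 1.0) for _ in range(length)]


voiced_frame = make_excitation(10, True, pitch_period=4, gain=2.0)
unvoiced_frame = make_excitation(10, False)
```

The voiced frame here contains impulses at samples 0, 4 and 8, so changing `pitch_period` changes the perceived pitch of the synthesized speech, while the unvoiced frame has no periodic structure.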
The simplest existing system is a hardware DAC bundled with speech synthesis software, originally marketed as part of a speech synthesis system. In the previous system, an ADC (analog-to-digital converter) combined with a tiny microphone preamplifier and a DAC (digital-to-analog converter) with amplification were used for speech synthesis. Sound quality was also superior due to the sound filtering schemes used. In its simplest form, the device received an 8-bit mono signal through the parallel port and produced analog output that could be amplified and played back on loudspeakers.
Drawbacks in Existing System:
Require more Bandwidth
Less Bit Rate
Accuracy is less
A proposed technique used for speech analysis and synthesis is linear predictive coding (LPC). In this technique, the previous n samples of a speech signal are used to predict the next sample of the signal. The prediction error, which is the error between such a reconstructed sample and the actual sample, is minimized. It is possible to synthesize a signal with a reduced bit rate and reasonably good quality. Such signals can be extremely useful for low bandwidth and therefore, low bit rate applications.
Advantages of Proposed System:
Synthesize speech of acceptable quality, with a sizeable reduction in bit rate.
A simple speech analysis and synthesis system:
Linear predictive coding (LPC) methods require efficient representation of both the LPC filter and its input excitation to synthesize high quality speech at low bit rates. Although important progress has been made in encoding the LPC filter parameters, it is still not possible to encode the excitation at low bit rates and maintain high voice quality in the synthetic speech signal. By using LPC speech synthesis, it is possible to create an excitation that will produce an exact duplicate of the speech signal at the output of the LPC filter. However, the digital representation of the prediction residual can use up almost as many bits as the speech signal itself, and therefore it does not offer a useful solution for synthesizing speech at low bit rates. Voice-excited and residual-excited vocoders sought to reduce the bit rate by encoding only the baseband and by regenerating the higher frequencies at the receiver using non-linear techniques.
Fig 3-2.LPC filter system
LPC Analysis and Synthesis:
Linear predictive coding (LPC) is a popular technique for speech compression and speech synthesis. The theoretical foundations of both are described below.
Correlation, a measure of similarity between two signals, is frequently used in the analysis of speech and other signals. The cross-correlation between two discrete-time signals x[n] and y[n] is defined as
$$R_{xy}[l] = \sum_{n=-\infty}^{\infty} x[n] \, y[n-l]$$
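For finite recordings the sum runs only over the samples that exist. A minimal Python sketch of this definition is given below, with the hypothetical function name `cross_correlation` and the assumption that samples outside the lists are zero. Passing the same sequence twice gives the autocorrelation.

```python
def cross_correlation(x, y, lag):
    """R_xy[lag] = sum_n x[n] * y[n - lag] for finite sequences.

    Samples outside the stored range are treated as zero (assumption).
    """
    total = 0.0
    for n in range(len(x)):
        m = n - lag
        if 0 <= m < len(y):
            total += x[n] * y[m]
    return total


# Autocorrelation of a short test sequence at a few lags.
s = [1.0, 2.0, 3.0]
r0 = cross_correlation(s, s, 0)   # energy of the sequence
r1 = cross_correlation(s, s, 1)
r2 = cross_correlation(s, s, 2)
```

The autocorrelation is largest at lag zero, where it equals the signal energy, and is symmetric in the lag.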
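The predicted sample $\hat{s}[n]$ is formed as a weighted sum of past samples,
$$\hat{s}[n] = \sum_{k=1}^{P} a_k \, s[n-k] \qquad (4)$$
where P is the number of past samples of s[n] which we wish to examine.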
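Next we derive the frequency response of the system in terms of the prediction coefficients $a_k$. In Equation 4, when the predicted sample equals the actual signal (i.e., $\hat{s}[n] = s[n]$), we have
$$s[n] = \sum_{k=1}^{P} a_k \, s[n-k]$$
$$S(z) = \sum_{k=1}^{P} a_k \, S(z) \, z^{-k}$$
$$S(z) = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}} \qquad (5)$$
The optimal solution to this problem is (Rabiner and Juang)
$$a = (a_1 \; a_2 \; \cdots \; a_P)^T$$
$$r = (r_{ss}[1] \; r_{ss}[2] \; \cdots \; r_{ss}[P])^T$$
$$R = \begin{pmatrix} r_{ss}[0] & r_{ss}[1] & \cdots & r_{ss}[P-1] \\ r_{ss}[1] & r_{ss}[0] & \cdots & r_{ss}[P-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{ss}[P-1] & r_{ss}[P-2] & \cdots & r_{ss}[0] \end{pmatrix}$$
$$a = R^{-1} r$$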
Due to the Toeplitz property of the R matrix (it is symmetric with equal diagonal elements), an efficient algorithm is available for computing a without the computational expense of finding $R^{-1}$. The Levinson-Durbin algorithm is an iterative method of computing the predictor coefficients a (Rabiner and Juang).
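Initial Step: $E_0 = r_{ss}[0]$, $i = 1$
for i = 1 to P:
$$k_i = \frac{1}{E_{i-1}} \left( r_{ss}[i] - \sum_{j=1}^{i-1} \alpha_{j,i-1} \, r_{ss}[i-j] \right)$$
$$\alpha_{i,i} = k_i$$
$$\alpha_{j,i} = \alpha_{j,i-1} - k_i \, \alpha_{i-j,i-1}, \quad j = 1, \ldots, i-1$$
$$E_i = (1 - k_i^2) \, E_{i-1}$$
After the final iteration, the predictor coefficients are $a_j = \alpha_{j,P}$.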
In this application, lattice filters are used rather than direct-form filters since the lattice filter coefficients have magnitude less than one and, conveniently, are available directly as a result of the Levinson-Durbin algorithm. If a direct-form implementation is desired instead, the α coefficients must be factored into second-order stages with very small gains to yield a more stable implementation.
Figure 3-3: IIR lattice filter implementation
When each segment of speech is synthesized in this manner, two problems occur. First, the synthesized speech is monotonous, containing no changes in pitch, because the δ[n]'s, which represent pulses of air from the vocal cords, occur with fixed periodicity equal to the analysis segment length; in normal speech, we vary the frequency of air pulses from our vocal cords to change pitch. Second, the states of the lattice filter (i.e., past samples stored in the delay boxes) are cleared at the beginning of each segment, causing discontinuity in the output.
To estimate the pitch, we look at the autocorrelation coefficients of each segment. A large peak in the autocorrelation coefficient at lag l ≠ 0 implies the speech segment is periodic (or, more often, approximately periodic) with period l. In synthesizing these segments, we recreate the periodicity by using an impulse train as input and varying the delay between impulses according to the pitch period. If the speech segment does not have a large peak in the autocorrelation coefficients, then the segment is an unvoiced signal which has no periodicity. Unvoiced segments such as consonants are best reconstructed by using noise instead of an impulse train as input.
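The voiced/unvoiced decision described above can be sketched as follows. The threshold of 0.3 on the normalized autocorrelation peak and the lag search range are assumptions made for this example, not values taken from the project; all names are illustrative.

```python
import math


def autocorr(s, lag):
    """Short-term autocorrelation of one frame at a non-negative lag."""
    return sum(s[n] * s[n - lag] for n in range(lag, len(s)))


def estimate_pitch(frame, min_lag, max_lag, threshold=0.3):
    """Return (voiced, pitch_lag) for one frame.

    The frame is called voiced when the largest autocorrelation value in
    [min_lag, max_lag], normalized by the lag-0 value, exceeds `threshold`
    (an assumed value). For unvoiced frames the lag is reported as 0.
    """
    r0 = autocorr(frame, 0)
    if r0 == 0.0:
        return False, 0
    best_lag = max(range(min_lag, max_lag + 1), key=lambda l: autocorr(frame, l))
    peak = autocorr(frame, best_lag) / r0
    if peak > threshold:
        return True, best_lag
    return False, 0


if __name__ == "__main__":
    # A pure tone with a period of 40 samples stands in for a voiced frame.
    frame = [math.sin(2.0 * math.pi * n / 40.0) for n in range(240)]
    print(estimate_pitch(frame, 20, 60))
```

Restricting the search to a plausible lag range avoids picking the trivial peak at lag 0 and reduces the chance of locking onto a multiple of the true period.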
To reduce the discontinuity between segments, do not clear the states of the IIR model from one segment to the next. Instead, load the new set of reflection coefficients, k_i, and continue with the lattice filter computation.
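The idea of keeping the filter states across segments can be sketched with a small all-pole lattice filter. This is an illustrative implementation that assumes the sign convention in which the order-i predictor coefficient α_i equals k_i; the class and method names are hypothetical.

```python
class LatticeSynthesizer:
    """All-pole (IIR) lattice filter that keeps its state between segments."""

    def __init__(self, order):
        # state[i] holds the delayed backward error b_i[n-1], i = 0..order-1.
        self.state = [0.0] * order

    def process(self, excitation, k):
        """Filter one segment with reflection coefficients k = [k_1..k_P].

        The state is NOT cleared, so consecutive calls join smoothly.
        """
        order = len(self.state)
        out = []
        for x in excitation:
            f = x  # forward error at the top stage, f_P[n]
            new_b = [0.0] * (order + 1)
            for i in range(order, 0, -1):
                # f_{i-1}[n] = f_i[n] + k_i * b_{i-1}[n-1]
                f = f + k[i - 1] * self.state[i - 1]
                # b_i[n] = b_{i-1}[n-1] - k_i * f_{i-1}[n]
                new_b[i] = self.state[i - 1] - k[i - 1] * f
            new_b[0] = f  # b_0[n] equals the output sample
            self.state = new_b[:order]
            out.append(f)
        return out


if __name__ == "__main__":
    k = [0.5, 1.0 / 3.0]
    synth = LatticeSynthesizer(2)
    first = synth.process([1.0, 0.0], k)   # first segment
    second = synth.process([0.0, 0.0], k)  # continues from the stored state
    print(first + second)
```

Calling `process` once per segment with that segment's reflection coefficients gives the behaviour described above: new coefficients are loaded, while the delay elements carry over.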
Step 1: Load the speech signal into MATLAB.
Step 2: Compute the number of samples.
Step 3: Compute the frame size and the overlapping frame size.
Step 4: Apply a Hamming window to each frame.
Step 5: Generate LPC filter coefficients, filter gain and error signal.
Step 6: Transmit these three signals to the receiver.
Step 7: Construct an inverse filter with the LPC coefficients and the filter gain.
Step 8: Apply the error signal to the inverse filter and then synthesize the speech back.
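Steps 2 to 4 above can be sketched as follows. The frame size and hop (frame size minus overlap) used in the example are assumptions for illustration, and the function names are not from the project code.

```python
import math


def hamming(n_points):
    """Symmetric Hamming window of length n_points (n_points >= 2)."""
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def split_into_frames(signal, frame_size, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


def windowed_frames(signal, frame_size, hop):
    """Apply a Hamming window to every frame of the signal."""
    window = hamming(frame_size)
    return [
        [x * w for x, w in zip(frame, window)]
        for frame in split_into_frames(signal, frame_size, hop)
    ]


if __name__ == "__main__":
    signal = [1.0] * 10
    frames = windowed_frames(signal, frame_size=4, hop=2)
    print(len(frames))  # frames start at samples 0, 2, 4 and 6
```

The window tapers each frame towards its edges, which reduces the spectral leakage caused by cutting the signal abruptly.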
Algorithm for finding the coefficients of the LPC filter:
Assume an order for the filter.
Based on that order, generate the system of linear equations.
Solve the equations using the Levinson-Durbin algorithm to find the filter coefficients.
Calculating filter pitch:
Specify the minimum and maximum lag for the correlation.
Find the short-term correlation of the error signal for lags up to the maximum lag.
Find the maximum autocorrelation.
Check whether this coefficient is greater than the specified threshold; if it is, the frame is treated as voiced data.
This pitch value is used to calculate the filter gain.
Gain: the amplitude response of the LP filter at any single frequency.
Calculating filter gain:
If the frame is treated as a voiced frame, the gain is calculated as the square root of the product of the pitch and the prediction error energy.
If the frame is unvoiced, the gain is simply the square root of the prediction error energy.
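The gain rule above can be sketched as follows. This example assumes that "energy" means the mean squared prediction error per sample, and that for voiced frames the average is taken over a whole number of pitch periods; both are assumptions of this illustration, not statements about the project code.

```python
import math


def frame_gain(error, pitch_period=0):
    """Gain of one frame from its prediction error signal.

    pitch_period > 0 : voiced frame, gain = sqrt(pitch_period * power)
    pitch_period == 0: unvoiced frame, gain = sqrt(power)
    where power is the mean squared prediction error (assumed definition).
    """
    if not error:
        return 0.0
    if pitch_period > 0:
        # Use a whole number of pitch periods; fall back to the full frame
        # when the frame is shorter than one period.
        usable = (len(error) // pitch_period) * pitch_period or len(error)
        power = sum(e * e for e in error[:usable]) / usable
        return math.sqrt(pitch_period * power)
    power = sum(e * e for e in error) / len(error)
    return math.sqrt(power)


if __name__ == "__main__":
    err = [1.0, -1.0, 1.0, -1.0]
    print(frame_gain(err))                  # unvoiced: sqrt(1) = 1.0
    print(frame_gain(err, pitch_period=2))  # voiced: sqrt(2 * 1)
```

The extra factor for voiced frames compensates for the excitation: an impulse train with one unit impulse every T samples has power 1/T, so it must be scaled by the square root of T times the error power to match the energy of the prediction error.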
Synthesizing the speech:
Reconstruct the filter from the gain and the filter coefficients.
Apply the error signal to the filter.
Play the synthesized speech.
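As a sketch of the synthesis steps above, the example below computes the error signal with the analysis filter and then passes it through the all-pole filter to recover the input. It uses a direct-form recursion for brevity and assumes the predictor convention ŝ[n] = Σ a_k s[n−k]; the coefficients in the example are made up for illustration.

```python
def prediction_error(s, a):
    """Analysis: e[n] = s[n] - sum_{k=1..P} a_k * s[n-k], zero initial state."""
    order = len(a)
    return [
        s[n] - sum(a[k - 1] * s[n - k] for k in range(1, order + 1) if n - k >= 0)
        for n in range(len(s))
    ]


def synthesize(e, a):
    """Synthesis: s[n] = e[n] + sum_{k=1..P} a_k * s[n-k], zero initial state."""
    order = len(a)
    out = []
    for n in range(len(e)):
        pred = sum(a[k - 1] * out[n - k] for k in range(1, order + 1) if n - k >= 0)
        out.append(e[n] + pred)
    return out


if __name__ == "__main__":
    s = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
    a = [0.9, -0.2]  # hypothetical predictor coefficients
    e = prediction_error(s, a)
    rebuilt = synthesize(e, a)
    print(max(abs(x - y) for x, y in zip(s, rebuilt)))  # close to zero
```

When the exact error signal is applied, the synthesis filter undoes the analysis filter. In a coder that replaces the error with an impulse train or noise, the output is only an approximation of the original speech.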
4. PERFORMANCE ANALYSIS
The coding software for this project was developed in MATLAB. As discussed in Chapter 2.8, MATLAB is a powerful tool for numerical computation and graphics. In addition, MATLAB has a built-in function for linear predictive coding. The purpose of the lpc function in MATLAB is to compute linear prediction filter coefficients. It determines the coefficients of a forward linear predictor by minimizing the prediction error in the least-squares sense, and it has applications in filter design and speech coding.
4.2. Instructions for using the software
To start the software, type coding in the MATLAB command window; the Coding Function window will appear, as shown in figure 4-1.
Figure 0-5 User interface for speech analysis and synthesis
Click the Load File button and the Open File window will appear, as shown in figure 4-1. Choose one of the *.wav files from the example folder (Audio.wav has been chosen in this example). The figure shows the user interface (menu) for speech analysis and synthesis.
Figure 0-6 Generate LPC
After selecting the original signal, click the Analysis button (Generate LPC) and the Analysis window will appear, as shown in figure 4-2. Based on the selected signal, the system of linear equations is generated and then solved with the Levinson-Durbin algorithm to find the filter coefficients.
Generate pitch: The pitch is one of the most important parameters in speech analysis, synthesis, and coding applications. It is the fundamental frequency of voiced speech. Pitch frequency is directly related to the speaker and sets a unique characteristic of a person. For men, the possible pitch frequency range is 50 to 250 Hz, while for women the range is 120 to 500 Hz. In terms of period, the range is 4 to 20 ms for male speakers and 2 to 8 ms for female speakers. A pitch period has to be estimated for each frame. To do that, we compare the frame with past samples; this makes it possible to identify the period with which the signal repeats itself, resulting in an estimate of the actual pitch period. The estimation procedure makes sense only for voiced frames. Designing a pitch period estimation algorithm is a complex undertaking because of the lack of perfect periodicity, interference from the formants of the vocal tract, and uncertainty. Many techniques have been used for pitch period estimation; one of them is maximization of the autocorrelation.
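The pitch frequency ranges quoted above translate into a range of lags, in samples, over which the autocorrelation is searched. The sketch below performs this conversion; the sampling rate of 8000 Hz is an assumption made for the example, since the sampling rate is not stated here, and the function name is illustrative.

```python
import math


def lag_range(sample_rate, f_min, f_max):
    """Convert a pitch frequency range in Hz to a lag range in samples.

    Every lag in the returned range corresponds to a pitch frequency
    inside [f_min, f_max].
    """
    min_lag = math.ceil(sample_rate / f_max)   # shortest period
    max_lag = math.floor(sample_rate / f_min)  # longest period
    return min_lag, max_lag


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    print(lag_range(fs, 50, 250))   # (32, 160): periods of 4 ms to 20 ms
    print(lag_range(fs, 120, 500))  # (16, 66): periods of 2 ms to about 8 ms
```

A narrower lag range makes the search faster and lowers the risk of selecting a multiple of the true pitch period.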
Figure 0-7 Generate pitch
Generate gain: The gain is the amplitude response of the LP filter at any single frequency.
Figure 0-8 Generate gain
Plot LSP: If the frame is treated as a voiced frame, the gain is calculated as the square root of the product of the pitch and the prediction error energy.
If the frame is unvoiced, the gain is simply the square root of the prediction error energy.
Figure 4-5. Line spectral pairs
5. SUMMARY AND FUTURE WORK
This chapter gave an overview of the fundamentals of LPC and presented methods for obtaining speech characteristics by LPC analysis. In this project, the LPC analysis is done primarily with the Levinson-Durbin algorithm. The LSF representation of the LPC analysis will serve as a measure of voice characteristic alteration. In the next chapter, the main focus will be on the system implementation by these methods and some
Future work planned for this project is to implement an effective filtering technique to remove noise from the synthesized output. Noise filtering is expected to reduce the harmonics present in the analysed and synthesized output.
In this project, a novel solution for speech analysis and synthesis has been proposed. The solution is obtained by segmenting the speech into frames and generating the pitch and LPC coefficients; it is based on an alternating update scheme, introduced to provide an approximate solution of the synthesis model.
[1] Ping-Fai Yang and Yannis Stylianou, "Real Time Voice Alteration Based on Linear Prediction", AT&T Laboratories-Research.
[2] Jerry D. Gibson, Toby Berger, Tom Lookabaugh, Dave Lindbergh and Richard L. Baker, 1998, Digital Compression for Multimedia: Principles and Standards, Morgan Kaufmann Publishers, pp. 142, 193.
[3] John G. Proakis and Dimitris G. Manolakis, 1996, Digital Signal Processing: Principles, Algorithms, and Applications, Third Edition, Prentice Hall, p. 891.
[4] Tamanna Islam, Interpolation of Linear Prediction Coefficients for Speech Coding, McGill University, p. 22.
[5] Wesley Pereira, 2001, Modifying LPC Parameter Dynamics to Improve Speech Coder Efficiency, McGill University, p. 9.
[6] Wai C. Chu, 2003, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, John Wiley & Sons, Inc., Chapter 1.
[ARIY05] Dr. Aladdin Ariyaeeinia (2005), "Speech Digitization and Coding", Lecture Notes, University of Hertfordshire.