Implementation of the Hilbert-Huang Transform for Speech Processing


The Hilbert-Huang transform (HHT) is a relatively new method for analysing nonlinear and non-stationary data. The key part of the method is the 'empirical mode decomposition' (EMD), which decomposes any complicated data set into a finite, and often small, number of 'intrinsic mode functions' (IMFs) that admit well-behaved Hilbert transforms. This decomposition method is adaptive and hence highly efficient. Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and non-stationary processes. With the Hilbert transform, the IMFs yield instantaneous frequencies as functions of time that give sharp identifications of embedded structures; as such, the HHT can be used to enhance speech signals by removing unwanted noise. Herein, a scheme for implementing the HHT is presented. As a first step, the complete HHT filter has been simulated in MATLAB. The results obtained for a noisy audio signal are discussed, and the superiority of the HHT audio filter over the wavelet filter is illustrated. Hardware for the Hilbert transform has also been designed in VHDL and synthesized on a Xilinx platform.

Keywords: Hilbert-Huang transform; speech enhancement; empirical mode decomposition; intrinsic mode function; spectral analysis


Traditional data-analysis methods are all based on linear and stationary assumptions. In most real systems, whether natural or man-made, the data are most likely to be both nonlinear and non-stationary, and analysing data from such a system is a daunting task. Only in recent years have new methods been introduced to analyse non-stationary and nonlinear data. A necessary condition for representing nonlinear and non-stationary data is an adaptive basis: a basis function defined a priori cannot be relied on, no matter how sophisticated it might be. Being adaptive means that the definition of the basis has to be data-dependent, that is, defined a posteriori. For non-stationary and nonlinear data, where adaptation is absolutely necessary, established methods have been lacking. A recently developed method, the Hilbert-Huang transform (HHT) by Norden Huang [1], seems able to address these issues.

Practical applications of the HHT are now spread broadly across numerous scientific disciplines. For example, the HHT is used in tsunami research to detect earthquake-generated water waves in data series recorded by bottom pressure transducers in the Northern Pacific [2]. Some work on the HHT has also been performed in the medical sciences, such as reducing artifacts in electrograms, where severe contamination effects take place, and decomposing multisite neuronal data [3]. The EMD is also used in automatic human gait analysis, which is becoming increasingly important in human gesture recognition as an individual biometric characteristic [4].

An overview of some recent applications of the HHT has been given above. Next, in Section 2, the numerical procedure of the HHT is introduced, and a speech enhancement method based on the HHT is proposed and applied to filter unwanted sound. Section 3 provides the results obtained to date. Finally, Section 4 discusses the performance evaluation of this technique.

II. Filtering Speech Signal

Hilbert-Huang Transformation

A time-frequency distribution may be developed using the Hilbert transform (HT). Unfortunately, the application of the HT is strictly limited by the properties of x(t): the signal should be narrow-banded around time t. This condition is usually not satisfied by time series collected from practical applications. For example, for the signal

x(t) = cos(ω1t) + cos(ω2t),

the Hilbert transform will generate an average instantaneous frequency instead of ω1 and ω2 separately. To overcome this problem, Huang et al. [1] proposed an empirical decomposition method to extract intrinsic mode functions from a time series such that each intrinsic mode function contains only one simple oscillatory mode (a narrow band at a given time).
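The averaging effect can be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available; the helper names `analytic_signal` and `instantaneous_frequency`, the sampling rate and the two tone frequencies (50 Hz and 70 Hz) are illustrative choices, not values from the experiments reported here. The typical instantaneous frequency of the two-tone signal should come out near the mean of the two tones rather than at either tone.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal x + j*H{x}: keep DC, double positive bins, zero negative bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[1:n // 2] = 2.0
        gain[n // 2] = 1.0          # Nyquist bin is kept once
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from sample-to-sample phase increments."""
    z = analytic_signal(x)
    phase_step = np.angle(z[1:] * np.conj(z[:-1]))
    return phase_step * fs / (2.0 * np.pi)

fs = 1000.0                          # assumed sampling rate in Hz
t = np.arange(1000) / fs
f1, f2 = 50.0, 70.0                  # assumed tone frequencies in Hz
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# The median ignores the isolated phase jumps where the beat envelope crosses zero.
typical_if = float(np.median(instantaneous_frequency(x, fs)))
print(typical_if)
```

The median is used instead of the mean because the phase flips by π wherever the beat envelope changes sign, which produces a few outlying samples.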

An empirical mode decomposition (EMD) algorithm, called the sifting process, was proposed to generate intrinsic modes in an elegant and simple way. Three assumptions are made for the EMD of a time series: first, the signal must have at least two extrema, one minimum and one maximum; second, the time interval between the extrema defines the characteristic time scale of the time series; third, if the data are totally devoid of extrema but contain inflection points, the data can be differentiated to reveal the extrema.

Once the extrema are identified, the maxima are connected using a cubic spline to form the upper envelope. The minima are interpolated in the same way to form the lower envelope. The upper and lower envelopes should cover all the data points in the time series. The mean of the upper and lower envelopes, m1(t), is subtracted from the original signal to get the first component h1(t) of this sifting process:

$$h_1(t) = x(t) - m_1(t)$$
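A single sifting pass can be sketched as follows. This is an illustrative Python sketch assuming NumPy, not the MATLAB implementation used for the results in this paper. To stay dependency-free it builds the envelopes with linear interpolation (`np.interp`) as a stand-in for the cubic spline described above, and it does nothing special at the signal ends; `local_extrema` and `sift_once` are hypothetical helper names.

```python
import numpy as np

def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """Return (h, m): the signal minus the envelope mean, and the envelope mean itself."""
    n = np.arange(len(x))
    maxima, minima = local_extrema(x)
    # Assumption: linear interpolation replaces the cubic spline of the text.
    upper = np.interp(n, maxima, x[maxima])
    lower = np.interp(n, minima, x[minima])
    m = 0.5 * (upper + lower)
    return x - m, m

# Test signal (assumed): a 20 Hz oscillation riding on a linear trend.
t = np.arange(1000) / 1000.0
x = np.cos(2 * np.pi * 20 * t) + 2.0 * t
h, m = sift_once(x)
```

Between the first and last detected extrema, the envelope mean `m` follows the trend `2 t` and `h` is left with the oscillation. Outside that range `np.interp` simply holds the end values, which is one reason practical implementations extend the signal before sifting.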


If h1(t) is an intrinsic mode function (IMF), the sifting process stops. Two conditions are used to check whether h1(t) is an IMF: 1) the number of zero crossings should be equal to the number of extrema or differ from it by at most 1, in other words, h1(t) should be free of riding waves; 2) the upper and lower envelopes of h1(t) should be symmetric with respect to zero.
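The first IMF condition lends itself to a direct check. The sketch below, in Python with NumPy assumed, counts sign changes of the signal and of its first difference; the function names and the test signals are illustrative. The envelope-symmetry condition is not covered here because it requires the envelopes themselves.

```python
import numpy as np

def count_zero_crossings(h):
    """Number of sign changes between consecutive non-zero samples."""
    signs = np.sign(h)
    signs = signs[signs != 0]
    return int(np.sum(signs[:-1] * signs[1:] < 0))

def count_extrema(h):
    """Number of interior local maxima and minima (slope sign changes)."""
    d = np.diff(h)
    return int(np.sum(d[:-1] * d[1:] < 0))

def satisfies_count_condition(h):
    """IMF condition 1: zero crossings and extrema differ by at most one."""
    return abs(count_zero_crossings(h) - count_extrema(h)) <= 1

t = np.arange(200) / 200.0
oscillation = np.sin(2 * np.pi * 5 * t + 0.3)   # oscillates about zero
shifted = oscillation + 2.0                      # same extrema, never crosses zero

print(satisfies_count_condition(oscillation), satisfies_count_condition(shifted))
```

The shifted signal keeps all of its extrema but loses every zero crossing, so it fails the test and would need further sifting.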

Otherwise, the sifting process is repeated to purify h1(t) into an IMF. In the second pass, h1(t) is treated as the data and sifted to obtain h11(t):

$$h_{11}(t) = h_1(t) - m_{11}(t)$$

where m11(t) is the mean of the upper and lower envelopes of h1(t). The process continues until h1k(t) is an IMF, which is then designated as the first component, c1(t) = h1k(t). To stop the sifting process, a criterion is defined using a standard deviation computed from two consecutive sifting results:

$$SD = \sum_{t=0}^{T} \frac{\left| h_{1(k-1)}(t) - h_{1k}(t) \right|^2}{h_{1(k-1)}^2(t)}$$

The threshold value is usually set between 0.2 and 0.3 [1]. A revised criterion is proposed to accelerate the sifting process.
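The standard-deviation stopping test can be sketched in a few lines of Python (NumPy assumed). The sketch uses the form of Huang et al. [1], the sum over time of |h_prev − h_curr|² / h_prev²; the small `eps` guard against division by zero and the default threshold of 0.25 (inside the quoted 0.2 to 0.3 range) are assumptions of this sketch, and the revised criterion mentioned above is not reproduced.

```python
import numpy as np

def sifting_sd(h_prev, h_curr, eps=1e-12):
    """Sum over time of |h_prev - h_curr|^2 / h_prev^2 (eps avoids division by zero)."""
    h_prev = np.asarray(h_prev, dtype=float)
    h_curr = np.asarray(h_curr, dtype=float)
    return float(np.sum((h_prev - h_curr) ** 2 / (h_prev ** 2 + eps)))

def should_stop(h_prev, h_curr, threshold=0.25):
    """True when two consecutive sifting results are close enough to stop."""
    return sifting_sd(h_prev, h_curr) < threshold

h_prev = [1.0, -2.0, 2.0, -1.0]
small_change = [0.9, -1.9, 1.9, -0.9]    # each sample moved by 0.1
large_change = [0.5, -1.5, 1.5, -0.5]    # each sample moved by 0.5

print(sifting_sd(h_prev, small_change), should_stop(h_prev, small_change))
print(sifting_sd(h_prev, large_change), should_stop(h_prev, large_change))
```

Because each term is divided by h_prev² at the same instant, samples near a zero crossing can dominate the sum, which is one motivation for revised criteria.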


The stopping criterion is designed to keep the resulting IMFs physically meaningful. The first component c1(t) contains the finest scale of the signal, that is, the highest-frequency information at each time point. The residual after the first sifting process is

$$r_1(t) = x(t) - c_1(t)$$


Then r1(t) replaces the raw signal x(t), and the sifting process continues to generate further IMFs. The point at which sifting stops should follow the requirements of the physical process. There are, however, some general criteria: for example, the sum-squared value of the residual falls below a predefined threshold, or the residual becomes a monotonic function. The original series can then be represented as a sum of the IMF components and a mean value or trend:

$$x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t)$$


The resulting IMFs from sifting processes are then ready to be transformed using the Hilbert transform.
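The complete decomposition loop described above can be sketched as follows, in Python with NumPy assumed. Several simplifications are assumptions of this sketch rather than part of the method as published: envelopes use linear interpolation instead of cubic splines, each IMF is produced by a fixed number of sifting passes (`sifts_per_imf`) instead of the standard-deviation test, and decomposition stops when the residual has fewer than three extrema as a proxy for monotonicity. All function and parameter names are hypothetical.

```python
import numpy as np

def find_extrema(x):
    """Indices of strict interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def envelope_mean(x):
    """Mean of upper and lower envelopes (linear interpolation stand-in)."""
    maxima, minima = find_extrema(x)
    if len(maxima) == 0 or len(minima) == 0:
        return np.zeros_like(x)
    n = np.arange(len(x))
    upper = np.interp(n, maxima, x[maxima])
    lower = np.interp(n, minima, x[minima])
    return 0.5 * (upper + lower)

def emd(x, max_imfs=6, sifts_per_imf=10):
    """Return (imfs, residual) such that sum(imfs) + residual equals x."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = find_extrema(residual)
        if len(maxima) + len(minima) < 3:       # residual is (nearly) monotonic
            break
        h = residual.copy()
        for _ in range(sifts_per_imf):          # fixed sift count (assumption)
            h = h - envelope_mean(h)
        imfs.append(h)                          # c_j(t)
        residual = residual - h                 # r_j(t) = r_{j-1}(t) - c_j(t)
    return imfs, residual

fs = 500.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 40 * t) + 0.8 * np.sin(2 * np.pi * 3 * t) + 0.5 * t
imfs, residual = emd(x)
reconstruction = np.sum(imfs, axis=0) + residual
```

Whatever the quality of the individual IMFs, the reconstruction is exact by construction, since each residual is defined by subtracting the extracted component.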

The resulting empirical components are free from siding waves (frequencies on either side), so each component occupies a locally narrow frequency band. The HHT is adaptive because the sifting process, aided by cubic interpolation, follows the data; it is therefore a nonlinear transform technique with great potential for the analysis of complicated non-stationary and nonlinear data.

Hilbert Transform:

The main purpose of the EMD is to enable the Hilbert transform (HT) and thereby obtain the Hilbert spectrum, which is similar to a wavelet spectrum. Applying the HT to every IMF component cj(t) gives a new data series yj(t) in the transform domain:

$$y_j(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{c_j(\tau)}{t - \tau} \, d\tau$$

where P denotes the Cauchy principal value.
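For a single narrow-band component, the transform above yields a well-defined amplitude and frequency at every instant. The following Python sketch (NumPy assumed) forms the analytic signal c(t) + j y(t) with an FFT-based discrete Hilbert transform and reads off both quantities; the amplitude-modulated 50 Hz test component stands in for an IMF and is an assumption of this sketch, as are all names.

```python
import numpy as np

def analytic_signal(c):
    """FFT-based analytic signal c + j*y, where y is the discrete Hilbert transform of c."""
    n = len(c)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[1:n // 2] = 2.0
        gain[n // 2] = 1.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(c) * gain)

fs = 1000.0
t = np.arange(1000) / fs
amplitude = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * t)   # slow amplitude modulation
c = amplitude * np.cos(2 * np.pi * 50 * t)          # stand-in for one IMF

z = analytic_signal(c)
y = z.imag                                          # Hilbert transform of c
inst_amplitude = np.abs(z)
inst_freq = np.angle(z[1:] * np.conj(z[:-1])) * fs / (2.0 * np.pi)
```

Here the instantaneous amplitude recovers the slow modulation and the instantaneous frequency stays at the carrier, which is the behaviour the decomposition into IMFs is meant to guarantee.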

Speech Enhancement

A speech enhancement method that can remove unwanted sound from the speech signal was conceived [5]. In practice, the IMFs that precede a sudden increase in amplitude across the sequence of EMD components can be regarded as noise and removed, because the scale of the noise is generally small in comparison with that of the real signal. The noise removal procedure is as follows:

Decompose the extended original signal to IMFs by the EMD method;

Remove the IMFs whose content belongs to the noise using the criterion of sudden increase in amplitude;

Reconstruct the signal from the remaining IMFs;

Perform the Hilbert transform on the reconstructed signal.
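The IMF selection and reconstruction steps can be sketched as follows in Python (NumPy assumed). The text does not give a numerical rule for the "sudden increase in amplitude", so this sketch assumes one: the first IMF whose RMS amplitude exceeds `jump_ratio` times that of the previous IMF marks the start of the signal content. The rule, the value 3.0 and the function names are all hypothetical, and the IMFs below are synthetic placeholders rather than an actual decomposition.

```python
import numpy as np

def first_signal_imf(imfs, jump_ratio=3.0):
    """Index of the first IMF whose RMS jumps by jump_ratio over the previous IMF."""
    rms = np.sqrt(np.mean(np.asarray(imfs, dtype=float) ** 2, axis=1))
    for k in range(1, len(rms)):
        if rms[k] > jump_ratio * rms[k - 1]:
            return k
    return 0    # no jump found: keep every IMF

def remove_noise_imfs(imfs, jump_ratio=3.0):
    """Drop the IMFs before the amplitude jump and sum the rest."""
    imfs = np.asarray(imfs, dtype=float)
    cut = first_signal_imf(imfs, jump_ratio)
    return imfs[cut:].sum(axis=0), cut

# Synthetic IMFs ordered from fine to coarse scale (illustrative only).
t = np.arange(400) / 400.0
imfs = np.array([
    0.05 * np.sin(2 * np.pi * 80 * t),   # small-scale, low amplitude
    0.06 * np.sin(2 * np.pi * 40 * t),   # small-scale, low amplitude
    1.00 * np.sin(2 * np.pi * 5 * t),    # amplitude jumps here
    0.40 * np.sin(2 * np.pi * 2 * t),
])
cleaned, cut = remove_noise_imfs(imfs)
print(cut)
```

The final step of the procedure, the Hilbert transform of the reconstructed signal, would then be applied to `cleaned`.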

HHT Implementation And Verification

The algorithm consists of two modules: generating the IMF components of the speech signal, and obtaining the Hilbert transform of the signal reconstructed from the IMFs. The noise removal procedure described above is then implemented. The results obtained on completion of the first module are as follows:

Fig 1: The original speech signal

Fig 2: The noisy speech signal

Fig 3: The IMF components of the speech signal on sifting

Fig 4: The Signal on reconstructing

Fig 5: Hilbert Transform output

Fig 6: Wavelet Output

Fig 1 shows a female vocal music signal with a fundamental frequency range of 1 kHz - 4 kHz, sampled over 3 seconds to give 136,842 data points. Fig 2 plots the composite input obtained when this signal is mixed with constant-frequency noise at 200 Hz having 0.2 times the amplitude of the music sample. Fig 4 shows the corresponding output after the EMD sifting process. Fig 5 shows the output after applying the Hilbert transform (HHT filtering).

Listening to the audio, the noise in the Hilbert output is reduced relative to both the noisy input and the EMD output, and the Hilbert transform output remains analogous to the input signal. Because the Hilbert transform yields a phase-shifted signal, the output waveform appears shifted in the plot and cannot be compared directly with the input. On comparison, the Hilbert transform output is clearly superior to the wavelet transform output for this sample: the noise present in the signal is reduced to a greater extent, which can be verified by playing the signals back in MATLAB.

Hilbert Transform As Analytic Filter

The Hilbert transform can be defined as a filter in the discrete frequency domain as

$$H_H(e^{j\Omega}) = \begin{cases} -j, & 0 < \Omega < \pi \\ j, & -\pi < \Omega < 0 \\ 0, & \Omega = 0 \end{cases}$$

A so-called analytic signal can be generated from a real-valued signal by extending the signal with its own Hilbert transform as the imaginary part:

$$x_a(n) = x(n) + jH\{x(n)\}$$

This signal has the property that no spectral components exist in the lower half of the z-plane, −π < Ω < 0.

This operation can be formulated as a filter operation with

$$H_A(e^{j\Omega}) = 1 + jH_H(e^{j\Omega})$$
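As a short check of the one-sided spectrum property, the analytic filter response can be evaluated piecewise by substituting the three values of the Hilbert filter response (−j for positive frequencies, j for negative frequencies, 0 at Ω = 0) into the expression above. This is a worked substitution under the ideal filter definition, not a description of the approximated hardware filter:

$$H_A(e^{j\Omega}) = 1 + jH_H(e^{j\Omega}) = \begin{cases} 1 + j(-j) = 2, & 0 < \Omega < \pi \\ 1 + j(j) = 0, & -\pi < \Omega < 0 \\ 1 + j \cdot 0 = 1, & \Omega = 0 \end{cases}$$

The ideal analytic filter therefore doubles the positive-frequency components, removes the negative-frequency components and passes the DC component unchanged.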

The realizations are filters that generate an analytic signal from a real-valued signal; these filters are therefore called analytic filters (HAX). The implemented filter approximates the analytic filter, and its block diagram is shown below:

Fig 6: Block diagram of HH4 (Z) Filter

The Hilbert transform analytic filter was implemented in VHDL and simulated in ModelSim. For a sine-wave input, the output obtained is a negative cosine wave, as shown below:

Fig 7: ModelSim output for Hilbert transform filter
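The sine-to-negative-cosine behaviour can be reproduced in software with a generic FIR Hilbert transformer. The Python sketch below (NumPy assumed) uses the textbook windowed design with taps 2/(πk) for odd k and zero for even k; it is not the filter structure implemented in VHDL here, and the filter half-length of 31, the Hamming window and the test frequency Ω = π/2 are assumptions chosen for illustration.

```python
import numpy as np

def fir_hilbert(half_length=31):
    """Windowed ideal Hilbert transformer: taps 2/(pi*k) for odd k, 0 for even k."""
    k = np.arange(-half_length, half_length + 1)
    h = np.zeros(len(k))
    odd = (k % 2) != 0
    h[odd] = 2.0 / (np.pi * k[odd])
    return h * np.hamming(len(k))

half_length = 31
omega = np.pi / 2                       # mid-band test frequency (rad/sample)
n = np.arange(400)
x = np.sin(omega * n)

# mode="same" centres the filter, which compensates its group delay.
y = np.convolve(x, fir_hilbert(half_length), mode="same")

# Compare with -cos away from the edges, where the filter sees a full window.
interior = slice(half_length, len(n) - half_length)
max_error = float(np.max(np.abs(y[interior] + np.cos(omega * n[interior]))))
print(max_error)
```

In a hardware realization the group delay is not removed but appears as a fixed latency, so the reference cosine would have to be delayed by the same number of samples before comparison.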

Synthesis Of Filter On Xilinx Platform

The Hilbert transform analytic filter was synthesized (for a sine-wave input) using Xilinx ISE 9.1 with a Spartan-3 device selected as the target, giving the following results:


Logic Utilization

Number of Slices

Number of Slice Flip Flops

Number of 4 input LUTs

Number of bonded IOBs

Number of GCLKs

On implementing the entire algorithm in MATLAB, we inferred that the noise was reduced to a great extent. As future work, we wish to determine the SNR levels of the filtered speech signal and compare them with those of an existing wavelet-based denoising technique. After completing the MATLAB simulation, we also wish to implement the design in ModelSim for speech signals and compare it with the MATLAB results.
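The planned SNR comparison could be computed along the following lines. This is a minimal Python sketch assuming NumPy and assuming that a clean reference signal, time-aligned with the processed signal, is available; the function name `snr_db` and the synthetic tones are illustrative. The example mirrors the mixing ratio used in the experiment (200 Hz noise at 0.2 times the signal amplitude) with a single 1 kHz tone in place of the music sample.

```python
import numpy as np

def snr_db(clean, processed):
    """SNR in dB, treating (processed - clean) as the noise; assumes aligned signals."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(processed, dtype=float) - clean
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

fs = 8000.0
t = np.arange(8000) / fs
clean = np.sin(2 * np.pi * 1000 * t)                 # stand-in for the speech signal
noisy = clean + 0.2 * np.sin(2 * np.pi * 200 * t)    # 200 Hz noise at 0.2x amplitude

input_snr = snr_db(clean, noisy)
print(round(input_snr, 2))
```

An amplitude ratio of 0.2 corresponds to a power ratio of 25, that is, an input SNR of about 14 dB. Since the Hilbert transform output is phase shifted, it would need to be aligned with the reference before such a sample-by-sample measure is meaningful.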

VII. References

Zhuo-Fu Liu, Zhen-Peng Liao, and En-Fang Sang, "Speech Enhancement Based on Hilbert Huang Transform," Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2006.

Martin Kumm and Mohammad Shahab Sanjari, "Digital Hilbert Transformers for FPGA-based Phase-Locked Loops," International Conference on Field Programmable Logic and Applications, 2008.

Wu Wang, Xueyao Li, and Rubo Zhang, "Speech detection based on Hilbert Huang transform," First International Conference On Computer And Computational Sciences (IMSCCS'06).