Speech Processing To Reduce Sensorineural Hearing Impairment Biology Essay


Sensorineural hearing impairment causes widening of the auditory filters, leading to increased spectral masking and degraded speech perception. Earlier studies have shown that binaural dichotic presentation, using critical-bandwidth-based spectral splitting with perceptually balanced comb filters, helps reduce the effect of spectral masking for persons with moderate bilateral sensorineural hearing impairment. The objective of the present study is to evaluate a scheme that splits the speech signal into two bands by filtering and downsampling at each decomposition level, using the discrete wavelet transform (DWT) with different wavelet functions, in order to reduce the effect of spectral masking in the cochlea.

Keywords: Sensorineural hearing impairment, spectral masking, spectral splitting, cochlea, discrete wavelet transform.


The cochlea, or inner ear, is a fluid-filled, snail-shaped spiral cavity [1]. Figure 1 shows the structure of the auditory system. The start of the cochlea, where the oval window is located, is known as the basal end, and the other end is known as the apical end. The cochlea is divided along its length into three chambers: the scala vestibuli, the scala media, and the scala tympani. The scala vestibuli and scala media are separated by a thin membrane called Reissner's membrane, and the scala media is separated from the scala tympani by the basilar membrane [1]. An incoming sound produces pressure differences at the tympanic membrane and sets the oval window in motion; the resulting movement of the cochlear fluid drives the basilar membrane up and down. The pattern of vibration depends on the frequency of the input signal and on the mechanical properties of the basilar membrane, which vary considerably from base to apex. The basilar membrane is comparatively stiff and narrow at the basal end, and less stiff and wider at the apex. High-frequency signals produce maximum deflection at the base, while low-frequency signals produce maximum deflection at the apex [2]. Figure 2 shows the vibration of the basilar membrane for high, medium, and low frequencies.

The organ of Corti, which contains the sensory hair cells, lies between the basilar membrane and the tectorial membrane. There are about 3,000 to 3,500 inner hair cells, each with about 40 hairs, and 20,000 to 25,000 outer hair cells, each with about 140 hairs. The tectorial membrane lies above the hairs.

Information about sounds is conveyed via the inner hair cells, which stimulate the afferent neurons. The outer hair cells enhance the response of the basilar membrane, producing high sensitivity and sharp tuning (frequency selectivity) [2].

Figure 1. Structure of the auditory system. Adapted from Moore (1997), Fig. 1-7

Figure 2. Amplitude patterns of vibration of the basilar membrane for different frequencies. Adapted from Moore (1997), Fig. 1-9

Impairment due to a defect in the cochlea is known as cochlear (sensory) impairment. Sensorineural hearing impairment can be caused by exposure to intense sound, by congenital defects leading to loss of cochlear hair cells, or by damage to the auditory neurons. Audiograms of sensorineural impairment show typical shapes depending on the pathology, such as high-frequency hearing impairment or elevated thresholds at low frequencies [1], [2]. The impairment may take different forms: moderate to severe bilateral sensorineural hearing impairment, bilateral profound deafness, and unilateral very severe or profound deafness. Apart from elevated thresholds, sensorineural hearing impairment is characterized by loudness recruitment, reduced frequency selectivity and temporal resolution, and increased spectral and temporal masking [6], [8].

Spectral masking, or simultaneous masking, is the frequency-domain counterpart of temporal masking. It is masking between two concurrent sounds and tends to occur between sounds with similar frequencies: a strong component at 1 kHz will tend to mask a lower-level tone at 1.1 kHz. It is sometimes called frequency masking, since it is often observed when the sounds share a frequency band. For example, two sine tones at 440 Hz and 450 Hz can each be perceived clearly when presented separately, but not when presented simultaneously. In masking, a sound is made inaudible by a "masker", a noise or unwanted sound of the same duration as the original sound [7].

The DWT offers good resolution in both the time and frequency domains, making it one of the most appropriate tools for analyzing non-stationary signals such as speech [9]. Also, the suitability of the wavelet transform (WT) for speech processing seems to be intrinsically related to the fact that the cochlea itself behaves as a parallel bank of WT-like filters [10], [11]. When the discrete wavelet transform is used, wavelet analysis is equivalent to a bank of band-pass filters that divides the frequency axis into logarithmically spaced bands. The wavelet level, which refers to the number of decimations performed in the analysis, may be thought of as an index of the filter bank and is associated with a particular frequency interval.
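The link between wavelet level and frequency interval can be made concrete with a short calculation. The sketch below lists the nominal band covered by each detail level and by the final approximation, assuming ideal half-band filters (real wavelet filters overlap at the band edges). The sampling rate of 10 kHz and the function name dwt_band_edges are illustrative assumptions, not values taken from this study.

```python
def dwt_band_edges(fs, levels):
    """Nominal (low, high) band edges in Hz for each DWT output.

    Assumes ideal half-band filters: each level halves the bandwidth
    of the previous approximation.
    """
    bands = {}
    for j in range(1, levels + 1):
        # Detail at level j covers the upper half of the band left at level j-1
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    # The final approximation keeps everything below the last detail band
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


fs = 10000  # hypothetical sampling rate in Hz (not stated in the text)
for name, (low, high) in dwt_band_edges(fs, 5).items():
    print(f"{name}: {low:8.2f} to {high:8.2f} Hz")
```

Each band is half as wide as the one before it, which is the logarithmic division of the frequency axis described above.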

The evaluation of this scheme involves two-band splitting of the input signal by filtering and downsampling at each decomposition level. The input signals, vowel-consonant-vowel (VCV) syllables for fifteen English consonants, are decomposed using Daubechies, Symlet, and biorthogonal wavelets of different orders, as a possible solution to the problem of spectral masking.


The speech material

Earlier studies have used CV, VC, CVC, and VCV syllables. It has been reported [3] that greater masking takes place in intervocalic consonants because of the presence of vowels on both sides [4], [5]. Since our primary objective is to study the improvement in consonant identification due to reduced masking, VCV syllables were considered the most appropriate test material.

For the evaluation of the speech processing strategies, a set of fifteen nonsense syllables in VCV context was used, with the consonants /p, b, t, d, k, g, m, n, s, z, f, v, r, l, y/ and the vowel /a/ as in "farmer". The features selected for study were voicing (voiced: /b d g m n z v r l y/; unvoiced: /p t k s f/), manner (oral fricative: /s z f v r/), nasality (oral: /p b t d k g s z f v r l y/; nasal: /m n/), frication (stop: /p b t d k g m n l y/), and duration (short: /p b t d k g m n f v l/; long: /s z r y/).
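As an illustration of how such feature groupings might be used, the sketch below stores the three features for which the text lists both classes (voicing, nasality, duration) and computes the fraction of stimulus-response pairs that agree on a feature. The scoring function and the response pairs are hypothetical and are not taken from this study.

```python
# Feature classes as listed in the text (only features with both classes given)
FEATURES = {
    "voicing": {"voiced": "b d g m n z v r l y", "unvoiced": "p t k s f"},
    "nasality": {"oral": "p b t d k g s z f v r l y", "nasal": "m n"},
    "duration": {"short": "p b t d k g m n f v l", "long": "s z r y"},
}


def feature_class(consonant, feature):
    """Return the class label of a consonant for the given feature."""
    for label, members in FEATURES[feature].items():
        if consonant in members.split():
            return label
    raise KeyError(consonant)


def feature_score(pairs, feature):
    """Fraction of (presented, response) pairs that share the feature class."""
    hits = sum(feature_class(p, feature) == feature_class(r, feature)
               for p, r in pairs)
    return hits / len(pairs)


# Made-up (presented, response) pairs, for illustration only
trials = [("p", "b"), ("m", "n"), ("s", "s"), ("d", "t")]
print(feature_score(trials, "voicing"))   # 0.5
print(feature_score(trials, "nasality"))  # 1.0
```

In this made-up example the voicing errors (/p/ heard as /b/, /d/ heard as /t/) lower the voicing score, while nasality is preserved in every response.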

The speech processing strategies

For many signals, the low-frequency content is the most important part. It is what gives the signal its identity. The high-frequency content, on the other hand, imparts flavor or nuance. Consider the human voice. If you remove the high-frequency components, the voice sounds different, but you can still tell what's being said. However, if you remove enough of the low-frequency components, you hear gibberish.

The filtering process, at its most basic level, is shown in Figure 3.

Figure 3. Basic level of the filtering process

The original signal, S, passes through two complementary filters and emerges as two signals: A, the low-frequency approximation, and D, the high-frequency detail. Unfortunately, if we actually perform this operation on a real digital signal, we wind up with twice as much data as we started with. Suppose, for instance, that the original signal S consists of 1000 samples. Then the resulting signals A and D will each have 1000 samples, for a total of 2000 values instead of the original 1000.

To solve this problem, we perform the decomposition using the discrete wavelet transform with different wavelets. Keeping only one point out of two in each of the two 1000-sample filtered signals is enough to retain the complete information. This downsampling produces two sequences, cA and cD, which are the DWT coefficients [9].

To gain a better appreciation of this process, let's perform a one-stage discrete wavelet transform of a signal. Figure 4 shows the schematic with real signals inserted into it.
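The filter-then-downsample step can be sketched in a few lines. The example below is a minimal illustration, not the implementation used in this study: it uses the standard 'db2' filter coefficients, treats the signal as periodic at its boundaries (an assumption made to keep the sample counts exact), and runs on a synthetic two-tone signal rather than speech. The function names are illustrative.

```python
import math

SQ3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# 'db2' scaling (low-pass) filter coefficients
LO = [(1 + SQ3) / NORM, (3 + SQ3) / NORM, (3 - SQ3) / NORM, (1 - SQ3) / NORM]
# Matching wavelet (high-pass) filter obtained by the alternating-flip rule
HI = [(-1) ** k * LO[len(LO) - 1 - k] for k in range(len(LO))]


def circular_filter(x, taps):
    """Filter x with the given taps, wrapping around at the ends."""
    n = len(x)
    return [sum(t * x[(i + k) % n] for k, t in enumerate(taps))
            for i in range(n)]


def dwt_one_stage(x):
    """Return (cA, cD): filter, then keep one sample out of two."""
    a = circular_filter(x, LO)  # A: same length as x
    d = circular_filter(x, HI)  # D: same length as x
    return a[::2], d[::2]


def energy(v):
    return sum(u * u for u in v)


# Synthetic 1000-sample signal: a slow tone plus a weaker fast tone
s = [math.sin(2 * math.pi * 5 * i / 1000)
     + 0.3 * math.sin(2 * math.pi * 200 * i / 1000) for i in range(1000)]
cA, cD = dwt_one_stage(s)
print(len(s), len(cA), len(cD))  # 1000 500 500
print(abs(energy(s) - energy(cA) - energy(cD)) < 1e-9)  # True
```

The 500 + 500 coefficients carry the same energy as the 1000 input samples, which is consistent with no information being lost by the downsampling. Library routines that pad the signal at its edges instead of wrapping it usually return a few extra coefficients.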

Multiple-Level Decomposition

The decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken down into many lower-resolution components. This is called the wavelet decomposition tree. Figure 5 shows the decomposition tree up to level 3.

In this study we decomposed fifteen nonsense syllables in VCV context, with the consonants /p, b, t, d, k, g, m, n, s, z, f, v, r, l, y/ and the vowel /a/ as in "farmer", up to five decomposition levels, and obtained the approximation coefficients, the detail coefficients, and the corresponding approximations and details.

Figure 4. One-stage discrete wavelet transform of a signal
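The iteration over successive approximations can be sketched as follows. To keep the example short it uses the Haar wavelet rather than the wavelets evaluated in this study, and a synthetic 1024-sample signal rather than a VCV syllable; the function names are illustrative.

```python
import math


def haar_step(x):
    """One Haar analysis stage: pairwise sums and differences, scaled."""
    r = math.sqrt(2.0)
    cA = [(x[i] + x[i + 1]) / r for i in range(0, len(x) - 1, 2)]
    cD = [(x[i] - x[i + 1]) / r for i in range(0, len(x) - 1, 2)]
    return cA, cD


def wavelet_tree(x, levels):
    """Decompose only the approximation at each level.

    Returns (cA_levels, [cD1, cD2, ..., cD_levels]).
    """
    approx, details = list(x), []
    for _ in range(levels):
        approx, cD = haar_step(approx)
        details.append(cD)
    return approx, details


signal = [math.sin(0.05 * n) for n in range(1024)]  # synthetic test signal
cA5, details = wavelet_tree(signal, 5)
print([len(d) for d in details])  # [512, 256, 128, 64, 32]
print(len(cA5))                   # 32
```

The detail coefficients cD1 to cD5 and the final approximation cA5 together contain 1024 values, the same number as the input, so the tree redistributes the signal across levels without adding data.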

Experimental Results

In this analysis we decomposed the VCV syllables using the discrete wavelet transform with different wavelets. Figure 6 shows the decomposition of the syllable /apa/ up to level 5, with approximation coefficients and detail coefficients, while the approximations and details obtained using the 'db2' wavelet are shown in Figure 7.

Figure 5. Multilevel decomposition tree up to level 3

Figure 6. Decomposition of the syllable /apa/ up to level 5, with approximation coefficients and detail coefficients, using 'db2'

Figure 7. Approximations and details of the syllable /apa/ up to level 5, using 'db2'


Sensorineural hearing impairment is associated with increased masking. Increased spectral masking, associated with broad auditory filters, results in smearing of spectral peaks and valleys and leads to difficulty in the perception of consonantal features. Increased forward and backward temporal masking of weak acoustic segments by strong ones reduces the discrimination of voice-onset time, formant transitions, and burst duration, which are required for consonant identification. Thus the overall effect of the two types of masking is difficulty in discriminating consonants, resulting in relatively degraded speech perception by persons with sensorineural impairment.

Masking takes place primarily in the peripheral auditory system. In speech perception, the information received from both ears is integrated. Hence, splitting the speech signal into two complementary signals, such that signal components likely to mask each other are presented to different ears, can be used to reduce the effect of increased masking. This technique can be used to improve speech reception by persons with moderate bilateral sensorineural impairment, i.e. persons with residual hearing in both ears.

This study shows that hearing-impaired subjects are able to perceptually integrate the dichotically presented speech signal. Processing schemes that split the speech signal in a complementary fashion, to reduce the effects of increased masking at the peripheral level, improved speech reception. Dichotic presentation also decreases the load on the perception process. For hearing-impaired subjects, the improvement in consonantal reception and the reduction in response time do not follow the same trend.

For reducing the effect of increased spectral masking, speech processing schemes based on spectral splitting of the speech signal, using comb filters with complementary passbands for binaural dichotic presentation, have been reported earlier. Splitting the speech signal and compressing the frequency bands is a possible way to reduce the effect of increased masking. In this study, a processing scheme using the discrete wavelet transform with wavelets of different types and orders has been investigated for decomposing the speech signal into two signals for binaural dichotic presentation to the two ears, which may help improve the perception of various consonantal features. The speech signals were decomposed with the discrete wavelet transform at various levels to obtain low-frequency and high-frequency signal components. Each decomposition level has a different time-frequency resolution. Once the decomposition tree has been selected, the next step is to select an appropriate wavelet type, such as Daubechies, Symlet, or biorthogonal wavelet functions, depending on properties such as orthogonality and symmetry. The scheme helps in reducing temporal masking and spectral masking simultaneously, thereby improving speech perception.
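The idea of two complementary signals can be illustrated with a one-level split. In the sketch below, the low band is reconstructed from the approximation coefficients alone and the high band from the detail coefficients alone. The Haar wavelet, the 10 kHz sampling rate, the synthetic two-tone input, and the assignment of the low band to the left ear and the high band to the right ear are all assumptions made for illustration, not details reported by this study.

```python
import math


def haar_split(x):
    """One-level Haar DWT, then reconstruct each band separately.

    Returns (low, high), each the same length as x (x must have even length).
    """
    r = math.sqrt(2.0)
    low, high = [], []
    for i in range(0, len(x) - 1, 2):
        cA = (x[i] + x[i + 1]) / r  # approximation coefficient
        cD = (x[i] - x[i + 1]) / r  # detail coefficient
        low += [cA / r, cA / r]     # inverse DWT with cD set to zero
        high += [cD / r, -cD / r]   # inverse DWT with cA set to zero
    return low, high


fs = 10000  # hypothetical sampling rate in Hz
s = [math.sin(2 * math.pi * 300 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 4000 * n / fs) for n in range(1000)]
left, right = haar_split(s)  # hypothetical ear assignment

# The two bands are complementary: their sum restores the input
err = max(abs(a + b - c) for a, b, c in zip(left, right, s))
print(err < 1e-12)  # True
```

Because the inverse transform is linear, the two ear signals add up to the original signal, so no part of the speech is discarded by the split; it is only divided between the ears.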