Abstract- Speech/music discrimination refers to the problem of segmenting an audio stream and labelling each segment as either speech or music. Since the first attempts in the mid-1990s, a number of speech/music discrimination systems have been implemented in various fields of application. A real-time technique for speech/music discrimination was proposed that focuses on the automatic monitoring of radio stations, with audio features used to train a classifier. Many different systems for segmentation have been introduced and many different features have been proposed.
A variety of systems for audio classification have been proposed and implemented in the past for various applications. The problem with those techniques is that they use a large number of features to discriminate between speech and music. Feature extraction is of primary importance to all methods. A study of feature extraction schemes is provided first, and existing methods are broadly classified into categories. This paper tries to limit the number of features, and an HMM classifier is used to discriminate between speech and music.
Keywords- Audio classification, Mel Frequency Cepstral Coefficients, Speech-Music discrimination, Hidden Markov Model
Audio signals are an important source of information for understanding the content of multimedia. Automatically detecting the music parts of audio signals in radio broadcasts is becoming a basic and important task. For this purpose, many speech/music discrimination methods have been proposed, differing in their feature extraction and discrimination algorithms. They mainly discriminate pure speech from pure music.
Several approaches for speech/music discrimination have been proposed in the past. One work proposed a method for real-time automatic monitoring of radio channels; its system was based on the zero-crossing rate feature. Another used 13 features to characterize the distinct properties of speech and music signals. A third compared the performance of a newly proposed feature with that of previously proposed features in classification. That novel audio feature is extracted according to the multiplication of the MFCCV estimate and an exponential component that depends on the MFCC estimates.
AUDIO FEATURE EXTRACTION
In this section we describe the set of most commonly used features for speech/music discrimination. Some of the features are defined in the time domain and some in the frequency domain. We implemented them together with the proposed feature. The following five features are described:
Zero-Crossing Rate (ZCR): The zero-crossing rate is a feature that represents the number of zero crossings of a signal.
The Percentage of Low Energy Frames (PLEF): This feature is defined as the proportion of frames with an RMS power value below 50% of the average RMS power.
Spectral Roll-off (SR): Spectral roll-off is a measure of the "skewness" of the signal's frequency spectrum. The SR feature is defined as the frequency below which 95% of the frame's power lies.
Spectral Centroid (SC): Like SR, it is a measure of spectral shape.
Spectral Flux (SF): SF is a feature that characterizes the change in the shape of the signal's spectrum.
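The five features listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, roll-off and centroid are returned as bin indices rather than frequencies in Hz, the centroid is taken as the magnitude-weighted mean bin, and the flux is taken as the squared difference between consecutive magnitude spectra, which is one common definition among several.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root-mean-square value of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def low_energy_fraction(frames):
    """Proportion of frames whose RMS is below 50% of the average RMS."""
    values = [rms(f) for f in frames]
    threshold = 0.5 * sum(values) / len(values)
    return sum(1 for v in values if v < threshold) / len(values)

def spectral_rolloff(power, fraction=0.95):
    """Smallest bin index at which the cumulative power reaches `fraction` of the total."""
    target = fraction * sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= target:
            return k
    return len(power) - 1

def spectral_centroid(magnitude):
    """Magnitude-weighted mean bin index (assumed definition)."""
    total = sum(magnitude)
    return sum(k * m for k, m in enumerate(magnitude)) / total if total else 0.0

def spectral_flux(previous, current):
    """Squared difference between two consecutive magnitude spectra (assumed definition)."""
    return sum((c - p) ** 2 for p, c in zip(previous, current))
```

A bin index k converts to a frequency as k times the sampling rate divided by the transform length.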
DESCRIPTION OF SPEECH/MUSIC DISCRIMINATION SYSTEM
The basic structure of the proposed classification system is shown in Fig. 1. The input audio signal is 16-bit, sampled at 16 kHz. Windowing is carried out with a rectangular window of 600 samples, and a short-time Fourier transform (STFT) of order 600 is applied to each window. After the STFT calculation step, the MFCCV feature is calculated. An HMM classifier is used for feature classification.
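The windowing and transform step can be sketched as follows. This is only an illustration under stated assumptions: the text gives the window length (600 samples), the transform order (600) and the sampling rate (16 kHz), but not the hop size, so consecutive non-overlapping windows are assumed here. The function names are hypothetical, and the DFT is evaluated directly for clarity rather than speed.

```python
import cmath
import math

FRAME_LEN = 600      # rectangular window length and transform order, from the text
SAMPLE_RATE = 16000  # Hz, from the text

def frame_signal(signal, frame_len=FRAME_LEN):
    """Split a signal into consecutive rectangular windows (no overlap: an assumption)."""
    count = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(count)]

def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2, computed directly in O(N^2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum

# Usage: an 800 Hz test tone should peak in bin 800 / (16000 / 600) = 30.
tone = [math.sin(2 * math.pi * 800 * t / SAMPLE_RATE) for t in range(2 * FRAME_LEN)]
frames = frame_signal(tone)
spectra = [magnitude_spectrum(f) for f in frames]
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(peak_bin, peak_bin * SAMPLE_RATE / FRAME_LEN)
```

With a 600-point transform at 16 kHz, each bin spans about 26.7 Hz, which sets the frequency resolution available to the later filtering step.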
A. Mel-Frequency Cepstral Coefficients Variance
In this section we present a novel feature group for speech/music discrimination. The calculation procedure of the MFCCV features starts with windowing of the input audio signal. The windowing step is carried out with a rectangular window of 600 samples. The input signal is sampled at 16 kHz with 16-bit resolution. After windowing, an STFT of order 600 is applied over each window. The magnitude spectrum is then filtered with a series of overlapping triangular filters, which are evenly distributed on the mel frequency axis.
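The linear frequency axis is warped to the mel frequency axis using the standard relation:

$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f_{lin}}{700}\right)$$

where $f_{mel}$ is the mel-scale frequency and $f_{lin}$ is the linear-scale frequency. For every triangular filter, the centre frequency is calculated using the inverse of this relation:

$$f_{c}^{(j)} = 700\left(10^{f_{c,mel}^{(j)}/2595} - 1\right)$$

where $f_{c}^{(j)}$ is the centre frequency of the $j$-th triangular filter on the linear scale and $f_{c,mel}^{(j)}$ is its centre frequency on the mel scale. The filter centre frequencies are evenly spaced on the mel scale and are calculated according to the equation:

$$f_{c,mel}^{(j)} = Mel(f_{start}) + j\,\frac{Mel(f_{smp}/2) - Mel(f_{start})}{N + 1}, \quad j = 1, \dots, N$$

where $Mel(\cdot)$ denotes the warping given above, $f_{start}$ is the start frequency, i.e. the lowest frequency of the first filter bank channel, $f_{smp}$ (smp-fre) is the sampling frequency, and $N$ (filter-num) is the number of filter bank channels.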
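The mel warping and the placement of the filter centre frequencies can be sketched as below. This is an illustration under assumptions: it uses the widely used 2595·log10(1 + f/700) form of the mel scale, places the centres at equal mel steps between the start frequency and half the sampling frequency, and uses a start frequency of 64 Hz and 23 channels purely as example values that are not taken from the text. The function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Warp a linear-scale frequency (Hz) to the mel scale (common 2595/700 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse warping: mel-scale frequency back to Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def centre_frequencies(start_hz, sample_rate_hz, num_filters):
    """Centre frequencies in Hz, evenly spaced on the mel axis up to the Nyquist frequency."""
    mel_lo = hz_to_mel(start_hz)
    mel_hi = hz_to_mel(sample_rate_hz / 2.0)
    step = (mel_hi - mel_lo) / (num_filters + 1)
    return [mel_to_hz(mel_lo + j * step) for j in range(1, num_filters + 1)]

# Example values only: 64 Hz start frequency and 23 channels are assumptions.
centres = centre_frequencies(start_hz=64.0, sample_rate_hz=16000.0, num_filters=23)
print([round(c) for c in centres[:5]])
```

Because the spacing is uniform on the mel axis, the centres are dense at low frequencies and increasingly sparse towards the Nyquist frequency.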
The filtering of the magnitude spectrum and the energy calculation of the mel filter bank are performed according to the expression:

$$E_j = \sum_{k=0}^{K-1} |X(k)|\, H_j(k)$$

where $k$ is the frequency bin index, $j$ is the filter bank channel index, $K$ is the number of frequency bins in the magnitude spectrum vector $|X(k)|$, and $H_j(k)$ is the mel filter bank channel function. $E_j$ is the energy of the $j$-th filter channel, which is equal to the mel-frequency coefficient.
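The filtering step amounts to a weighted sum of the magnitude spectrum for each channel. The sketch below assumes triangular weights that rise linearly from the lower edge to the centre and fall linearly to the upper edge. The bin positions in the usage example are made up for illustration; in the described system they would follow from the mel-spaced centre frequencies. The function names are hypothetical.

```python
def triangular_filter(left, centre, right, num_bins):
    """Triangular weights over bin indices: 0 at `left`, 1 at `centre`, 0 at `right`."""
    weights = [0.0] * num_bins
    for k in range(num_bins):
        if left < k <= centre:
            weights[k] = (k - left) / (centre - left)
        elif centre < k < right:
            weights[k] = (right - k) / (right - centre)
    return weights

def filterbank_energies(magnitude, filters):
    """One energy per channel: the sum over bins of magnitude times filter weight."""
    return [sum(m * h for m, h in zip(magnitude, weights)) for weights in filters]

# Usage with illustrative bin edges: each filter overlaps its neighbours by half.
edges = [0, 2, 4, 6, 8]
filters = [triangular_filter(edges[j], edges[j + 1], edges[j + 2], 9) for j in range(3)]
energies = filterbank_energies([1.0] * 9, filters)
print(energies)
```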
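Once the mel filter bank coefficients are calculated, the discrete cosine transform is applied over their logarithms, and the cepstral coefficients used for the MFCCV feature are derived using the standard MFCC equation:

$$c_i = \sum_{j=1}^{N} \log(E_j)\, \cos\left(\frac{\pi i}{N}\,(j - 0.5)\right)$$

where $N$ is the number of filter bank channels, $j$ is the filter bank channel index, $i$ is the cepstral index, $E_j$ is the energy of the $j$-th filter channel, and $c_i$ are the cepstral coefficients.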
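The transform from filter bank energies to cepstral coefficients can be sketched as follows. This assumes the usual MFCC convention of taking the logarithm of each channel energy before an unnormalised DCT; the text does not state the scaling or whether the zeroth coefficient is kept, so both are choices made here for illustration. The function name is hypothetical.

```python
import math

def cepstral_coefficients(energies, num_ceps):
    """Unnormalised DCT of the log filter bank energies (energies must be positive)."""
    n = len(energies)
    log_e = [math.log(e) for e in energies]
    return [
        # With a zero-based channel index j, (j + 0.5) matches (j - 0.5) for one-based j.
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
        for i in range(num_ceps)
    ]

# Usage: a flat set of energies puts everything into the zeroth coefficient.
coeffs = cepstral_coefficients([math.e] * 4, num_ceps=3)
print([round(c, 6) for c in coeffs])
```

A flat spectrum carries no shape information, so every coefficient above the zeroth comes out at zero up to rounding.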
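The variance of the cepstral coefficients is calculated over a window. The variance of an individual cepstral coefficient is calculated as:

$$\sigma_i^2 = \frac{1}{W}\sum_{n=1}^{W}\left(c_i(n) - \bar{c}_i\right)^2, \qquad \bar{c}_i = \frac{1}{W}\sum_{n=1}^{W} c_i(n)$$

where $\sigma_i^2$ is the variance of the $i$-th cepstral coefficient in a window, $i$ is the cepstral index, $n$ is the frame index within the window, $W$ is the number of frames in the window, and $c(n)$ is the cepstral coefficient vector of frame $n$, with $c_i(n)$ its $i$-th element.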
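The per-coefficient variance over a window of frames can be sketched as below. The sketch divides by the number of frames W (the population variance); whether the original work divides by W or W - 1 is not stated, so this is an assumption. The function name is hypothetical.

```python
def cepstral_variance(window):
    """Variance of each cepstral coefficient across the frames of one window.

    `window` is a list of W frames; each frame is a list of cepstral coefficients.
    """
    w = len(window)
    num_ceps = len(window[0])
    variances = []
    for i in range(num_ceps):
        values = [frame[i] for frame in window]
        mean = sum(values) / w
        variances.append(sum((v - mean) ** 2 for v in values) / w)
    return variances

# Usage: the first coefficient varies across the two frames, the second does not.
mfccv = cepstral_variance([[1.0, 5.0], [3.0, 5.0]])
print(mfccv)
```

The result is one value per cepstral index, so a window of W frames is summarised by a single MFCCV vector.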
B. Hidden Markov Model Classifier
A classifier is an algorithm that takes features as input and concludes what they mean based on information encoded in the classifier algorithm and its parameters. The output is usually a label, but it can also include confidence values. Among the various types of classifiers, the proposed approach uses a Hidden Markov Model (HMM) classifier. An HMM classifier is used for feature classification: each feature sequence is assigned to whichever class model best fits it. The hybrid model's structure is shown in Fig. 2, where N denotes the number of basic HMM states. It has three layers, with one hidden layer between the input and output layers. The basic HMM assumes that each state's output is a Gaussian mixture probability density function, and the maximum likelihood criterion is used to train the HMM parameters. Maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model: applied to a data set under a given statistical model, it provides estimates of the model's parameters.
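The decision rule described above, assigning a feature sequence to the model that explains it best, can be sketched with the forward algorithm. This is a simplified illustration rather than the system's classifier: each state emits a single scalar Gaussian instead of a Gaussian mixture, the observations are scalars standing in for MFCCV values, and the model parameters are hand-picked for the example rather than trained by maximum likelihood. All names are hypothetical.

```python
import math

def log_gauss(x, mean, std):
    """Log density of a scalar Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_likelihood(obs, model):
    """Forward algorithm in the log domain: log P(obs | model)."""
    n = len(model["start"])
    alpha = [math.log(model["start"][s]) + log_gauss(obs[0], model["means"][s], model["stds"][s])
             for s in range(n)]
    for x in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + math.log(model["trans"][r][s]) for r in range(n)])
            + log_gauss(x, model["means"][s], model["stds"][s])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def classify(obs, models):
    """Return the label of the model with the highest likelihood for `obs`."""
    return max(models, key=lambda label: log_likelihood(obs, models[label]))

# Hand-picked, untrained parameters used only to exercise the decision rule.
models = {
    "speech": {"start": [0.5, 0.5], "trans": [[0.7, 0.3], [0.3, 0.7]],
               "means": [4.0, 8.0], "stds": [1.5, 1.5]},
    "music": {"start": [0.5, 0.5], "trans": [[0.9, 0.1], [0.1, 0.9]],
              "means": [0.5, 1.5], "stds": [0.5, 0.5]},
}
print(classify([5.1, 7.8, 4.2, 8.3], models))
print(classify([0.6, 0.4, 1.2, 1.4], models))
```

In a trained system the start, transition and emission parameters of each model would be estimated from labelled speech and music data, typically with the Baum-Welch algorithm, before this comparison is made.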
Fig. 1 Block scheme of the proposed speech/music classification
Fig. 2 Hybrid model's structure
Simulation results are shown for an input audio signal to which a window function over a number of samples is applied. The performance is evaluated in terms of frequency vs. amplitude as well as mel scale vs. MFCCV. The first graph shows the audio signal: the time-domain signal is converted into a frequency-domain signal by applying the short-time Fourier transform with a window function. The input music/speech signal is multiplied by the window function, and the result is shown in Fig. 3. We use an audio signal as input, and the features and classifier are then applied for speech/music discrimination. The second graph shows the effect of the MFCCV feature.
Fig. 3 Graph showing the effect of spectrum with window function
It is a novel feature group for speech/music discrimination. The feature group is based on the Mel-Frequency Cepstral Coefficient Variance. For each input, a different mel scale vs. MFCCV output is obtained; the values depend on the number of windows, i.e. the duration of the audio signal. The input is a music signal with 16-bit resolution sampled at 16 kHz. The graph for this approach is shown below.
Fig. 4 Graph showing the effect of MFCCV feature
This paper presents speech/music discrimination for FM radio broadcasts that limits the number of features used. Its advantage is the use of an HMM classifier, one of the best classifiers for discriminating between speech and music signals. We proposed a new audio feature for speech and music discrimination, and an HMM classifier using this feature was evaluated in a number of experiments. The HMM-based classification verified that the proposed feature gave the best speech/music classification quality. We have thus proposed a fast and effective algorithm for classifying audio as speech or music. MFCCV features are also a very convenient speech/music discriminator for automatic speech recognition (ASR): a speech/music discriminator can be used as a pre-processor to ASR to exclude music portions from the multimedia data before transcription. In low-bit-rate coding, the output of a discrimination system can be used as the basis for alternating between different codecs for speech and music, instead of using a universal codec for both signal types. An HMM-based classifier is used here for its capability in handling sequences that may vary in time or speed. In general, music-only and speech-only signals are stored with their individual features computed, and test data comes to the classifier after its features are available. We observe that the proposed feature achieves the best discrimination quality in all tests.