# Spectrogram Based Musical Instrument Identification



Abstract- A system is proposed to identify musical instruments with a computer and recognize which instrument is playing. Instrument features are calculated from the spectrogram, a newer approach than the usual time-frequency analysis. A spectrogram is generated for every sound and used to calculate spectral, temporal, and modulation features. The instrument existence probability is calculated for every possible fundamental frequency F0, and the instrument is identified using a hidden Markov model. The time complexity of the spectrogram computation is also studied in this work.

Keywords- Spectrogram, F0 estimation, HMM, Musical Instrument Identification.

Introduction

The key idea of the technique used in this paper is to visualize the probability that the sound of each target instrument exists at each time and each frequency. Using the spectrogram, the technique calculates the spectral, temporal, and harmonic variations of an instrument for every possible F0. This approach makes it possible to avoid errors caused by conventional methods based on estimating pitch, duration, and timbre.

In addition, by using a Markov chain whose states correspond to the target instruments for every possible F0, musical instruments can be identified even in polyphonic music [1].

The specific instrument existence probability is calculated using a hidden Markov model (HMM), since the temporal characteristics of an instrument must be considered when recognizing it.

At each frame, the observed spectrum of the input signal containing multiple instrument sounds is modeled as a weighted mixture of harmonic structures with every possible F0. The weight, i.e. the amplitude of each harmonic structure, represents how relatively predominant it is.

For each possible frequency f, the temporal trajectory H(t, f) of the harmonic structure with F0 equal to f can be considered to be generated from a Markov chain of m models of the possible instruments ω1, ..., ωm. Each model is an HMM consisting of multiple states. Then P(ωi|exist; t, f) can be calculated from the likelihoods of paths in the chain.

Music Identification Method

Each spectrogram image is a time-frequency plane. The color intensity at each point (t, f) in the image represents the probability P(ωi; t, f) that a sound of the target instrument ωi exists at time t and frequency f. The instrument existence probability is given by:

P(ωi; t, f) = P(ωi|exist; t, f) (1)

P(ωi|exist; t, f), called the instrument existence probability, is the conditional probability that, if a sound of some instrument exists at time t and frequency f, that instrument is ωi.

To estimate the relative dominance of every possible F0, the method treats the input mixture as if it contained all possible harmonic structures with different weights (amplitudes). It regards the probability density function (PDF) of the input frequency components as a weighted mixture of harmonic-structure tone models (each represented by a PDF) over all possible F0s and estimates the weights, which correspond to the relative dominance of every possible harmonic structure. The maximum-weight model is then taken as the most predominant harmonic structure, and its F0 is obtained.
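As a rough illustration of this idea (a much-simplified sketch, not the paper's actual weighted-mixture PDF estimation; the function name and candidate grid are ours), each candidate F0 can be scored by the summed amplitude of its harmonics and the maximum-weight candidate taken as predominant:

```python
import numpy as np

def predominant_f0(spectrum, freqs, candidates, n_harmonics=9):
    """Score each candidate F0 by the summed spectral amplitude at its
    harmonics; return the most predominant candidate and the normalized
    weights. A simplified stand-in for the weighted-mixture estimation."""
    weights = []
    for f0 in candidates:
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        harmonics = harmonics[harmonics <= freqs[-1]]  # stay inside the band
        bins = np.searchsorted(freqs, harmonics)
        weights.append(spectrum[bins].sum())
    weights = np.asarray(weights)
    return candidates[int(np.argmax(weights))], weights / weights.sum()
```

A spectrum with peaks at 220, 440, and 660 Hz, for example, yields 220 Hz as the predominant F0 among candidates that do not share its harmonics.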

Short-time Fourier Transform

The spectrogram of the input audio signal is calculated with the short-time Fourier transform (STFT) at a sampling frequency of 44.1 kHz using a 64-point Hamming window. The window is slid along the signal with 50% overlap.
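The analysis above can be sketched directly in NumPy (a minimal illustration with the stated settings; the function name and return layout are ours):

```python
import numpy as np

def spectrogram(signal, n_fft=64, fs=44100):
    """Magnitude spectrogram via STFT: 64-point Hamming window, 50% overlap."""
    hop = n_fft // 2                       # 50% overlap
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_fft//2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # bin center frequencies
    times = (np.arange(n_frames) * hop + n_fft / 2) / fs
    return spec.T, freqs, times                  # (freq, time) layout
```

Note that a 64-point window at 44.1 kHz gives a coarse frequency resolution of about 689 Hz per bin; a pure tone at a bin center shows up as a single dominant row.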

Harmonic structure extraction

For each possible frequency f, the temporal trajectory H(t, f) of the harmonic structure whose F0 is f is extracted.

Feature extraction

Musical instrument sounds have complicated temporal variations (e.g., amplitude and frequency modulations). For every time t, a T-length segment (a few seconds) of the harmonic-structure trajectory, Ht(τ, f) (t ≤ τ < t + T), is first truncated from the whole trajectory H(t, f), and a feature vector x(t, f) consisting of 15 features is then extracted from Ht(τ, f).

Overview of the 15 features:

a. Spectral features

1. Spectral centroid
2. Amplitude of the fundamental frequency
3-10. Amplitudes of the harmonic components (i = 2, 3, ..., 9)

b. Temporal features

11. Roll-off rate
12. Attack time
13. Decay time

c. Modulation features

14. Amplitude modulation
15. Frequency modulation
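A few of these features can be sketched as follows (illustrative definitions only; the exact formulas and the attack-time threshold are our assumptions, not taken from the paper):

```python
import numpy as np

def spectral_centroid(mags, freqs):
    """Feature 1: amplitude-weighted mean frequency of the spectrum."""
    return float(np.sum(freqs * mags) / np.sum(mags))

def harmonic_amplitudes(mags, freqs, f0, n=9):
    """Features 2-10: amplitudes at F0 and its overtones, read from
    the nearest spectrogram bin."""
    bins = [int(np.argmin(np.abs(freqs - k * f0))) for k in range(1, n + 1)]
    return mags[bins]

def attack_time(envelope, frame_rate, frac=0.9):
    """Feature 12 (illustrative): time for the amplitude envelope to
    first reach `frac` of its maximum."""
    idx = int(np.argmax(envelope >= frac * envelope.max()))
    return idx / frame_rate
```

The remaining features (roll-off, decay time, modulations) follow the same pattern of summarizing the trajectory Ht(τ, f) with a scalar.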

Instrument Existence Probability using HMM

A Markov model is a statistical model for prediction. For a sequence {q1, q2, ..., qn}, the first-order Markov assumption is:

P(qn|qn−1, qn−2, ..., q1) = P(qn|qn−1) (2)

i.e., the probability depends only on the observation qn−1 at time n − 1.

Under a second-order Markov assumption, the probability depends on qn−1 and qn−2. An output sequence {qi} of such a system is a Markov chain.

For hidden states, Bayes' rule gives the conditional probability:

P(qi|xi) = P(xi|qi) P(qi) / P(xi) (3)

Note that P(xi) can be ignored when comparing sequences, since it is independent of the sequence qi.

The transition probabilities give the probability of moving from state i to state j:

ai,j = P(qn+1 = sj | qn = si) (4)

An HMM that allows transitions from any emitting state to any other emitting state is called an ergodic HMM. In the other type, transitions go only from a state to itself or to a unique follower; this is called a left-right HMM.
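The left-right topology can be expressed as a transition matrix (a small sketch; the self-loop probability `stay` is an illustrative choice, not a value from the paper):

```python
import numpy as np

def left_right_transitions(n_states, stay=0.7):
    """Left-right HMM transition matrix: each state either loops to
    itself or moves to its unique follower; the last state only loops."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = stay, 1.0 - stay
    A[-1, -1] = 1.0
    return A
```

An ergodic HMM would instead have nonzero entries everywhere; here all entries below the diagonal and beyond the first superdiagonal stay zero.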

Elements of a hidden Markov model:

Clock: t = {1, 2, 3, ..., T} (5)

N states: Q = {1, 2, 3, ..., N} (6)

Every state has its own discrete probability distribution.

M events: E = {e1, e2, e3, ..., eM} (7)

Initial probabilities: πj = P[q1 = j], 1 ≤ j ≤ N (8)

Transition probabilities: aij = P[qt = j | qt−1 = i], 1 ≤ i, j ≤ N (9)

Observation probabilities: bj(k) = P[ot = ek | qt = j], 1 ≤ k ≤ M (10)

bj(ot) = P[ot = ek | qt = j], 1 ≤ k ≤ M (11)

A is the matrix of aij values, B the set of observation probabilities, and π the vector of πj values.

This model is called a hidden Markov model (HMM) because the state sequence that produces the observable data is not available (hidden).

The entire model is given by: λ = (A, B, π) (12)

The emission probability distribution is continuous in each state and can be represented by a Gaussian mixture model:

ej(o) = f(o; μj, σj), j = 1, ..., N (13)

For a continuous-observation HMM, the probability of O and Q occurring jointly under the model λ is:

P(O, Q|λ) = πq1 eq1(o1) · ∏(i=2 to L) a(qi−1, qi) eqi(oi) (14)
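The single-Gaussian emission density of Eq. (13) (one component of the mixture) can be written out directly; this is a generic Gaussian PDF, not code from the paper:

```python
import math

def gaussian_emission(o, mu, sigma):
    """Continuous emission density e_j(o) = f(o; mu_j, sigma_j):
    a single Gaussian component evaluated at observation o."""
    return math.exp(-0.5 * ((o - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```

A full GMM emission would be a weighted sum of such components, one set of (μ, σ) per mixture component.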

Posterior decoding

A precise posterior decoding of the HMM states can be obtained with the forward-backward algorithm, which is used in many speech recognition systems. It chooses the states that are individually most likely at the time each symbol is emitted.

Let λk(i) be the probability of the model being in the i-th state when emitting the k-th symbol, given the observation sequence.

Initialization:

λk(i) = P(q(k) = qi | O) (15)

Recursion:

λk(i) = αk(i) βk(i) / P(O) = αk(i) βk(i) / ∑(i=1 to N) αk(i) βk(i) (16)

for i = 1, ..., N and k = 1, ..., L.

Termination:

q(k) = arg maxi {λk(i)} (17)
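The steps above can be sketched for a discrete-emission HMM (a minimal illustration; the function name and the toy matrices in the usage are ours, and `B[j, o]` plays the role of the emission probability):

```python
import numpy as np

def posterior_decode(pi, A, B, obs):
    """Forward-backward posterior decoding: for each frame, pick the
    state with the largest posterior P(q_k = i | O), as in Eqs. (15)-(17)."""
    N, L = len(pi), len(obs)
    alpha = np.zeros((L, N))
    beta = np.zeros((L, N))
    alpha[0] = pi * B[:, obs[0]]                      # forward init
    for k in range(1, L):
        alpha[k] = (alpha[k - 1] @ A) * B[:, obs[k]]  # forward recursion
    beta[-1] = 1.0                                    # backward init
    for k in range(L - 2, -1, -1):
        beta[k] = A @ (B[:, obs[k + 1]] * beta[k + 1])
    gamma = alpha * beta                              # unnormalized Eq. (16)
    gamma /= gamma.sum(axis=1, keepdims=True)         # divide by P(O)
    return gamma.argmax(axis=1), gamma                # Eq. (17)
```

Each row of `gamma` sums to one, so the per-frame argmax directly yields the individually most likely state sequence.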

Viterbi algorithm

The Viterbi algorithm chooses the single best state sequence, maximizing the likelihood of the state sequence for the given observation sequence.

It keeps track of the arguments that maximize δk(i) for each k and i, storing them in an N × L matrix. This matrix is used to retrieve the optimal state sequence in the backtracking step.

Initialization:

δ1(i) = πi bi(o(1)) (18)

ψ1(i) = 0, i = 1, ..., N (19)

Recursion:

δt(j) = maxi [δt−1(i) aij] bj(o(t)) (20)

ψt(j) = arg maxi [δt−1(i) aij] (21)

Termination:

p* = maxi [δT(i)] (22)

q*T = arg maxi [δT(i)] (23)

Path (state sequence) backtracking:

q*t = ψt+1(q*t+1), t = T − 1, T − 2, ..., 1 (24)
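Equations (18)-(24) translate into a short discrete-emission implementation (a sketch; the function name and toy matrices in the usage are ours):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi decoding per Eqs. (18)-(24): delta holds the best path
    score, psi the backpointers used in the backtracking step."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                      # Eq. (18)
    for t in range(1, T):
        scores = delta[t - 1, :, None] * A            # score[i, j] = delta*a_ij
        psi[t] = scores.argmax(axis=0)                # Eq. (21)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # Eq. (20)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                     # Eq. (23)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]             # Eq. (24) backtracking
    return path, delta[-1].max()                      # best path and p* (22)
```

Unlike posterior decoding, this returns a single globally optimal path rather than per-frame maxima, which is why the two can disagree on individual frames.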

Database

The samples used are pre-recorded audio signals with sampling frequency fs = 44.1 kHz. The musical notes were recorded on a Yamaha PSR-I425 electronic keyboard (Yamaha, Japan).

Flow of Instrument Identification Method

Audio signal → Spectrogram generation → Feature extraction → Specific instrument existence probability calculation using HMM → Instrument identification

Figure 1: Flow of musical instrument identification.

Results

The keyboard used for recording offers a touch-sensitive 61-key keyboard with 32-note polyphony, natural and realistic sounds for 514 instruments, real-time pitch control, support for MIDI formats, and a built-in sound recording function.

Each note is tested at middle, low, and high pitch. The instrument identification experiment is performed on samples of string guitar, flute, and piano; experiments are also carried out on duo and trio combinations of these instruments.

Experimental Results

Both monophonic and polyphonic sound samples are used in the experiments.

The spectrogram and spectral features for a sample of instrumental music are shown in Figure 2(b)-(g).

For instrument identification, the Viterbi algorithm is used in this example; it calculates the most likely path through the hidden Markov model specified by the transition probability matrix and the emission probability matrix.

Transition(I, J) is the probability of a transition from state I to state J (i.e., from one instrument to another). A left-right HMM is used in this example. Emission(K, L) is the probability that symbol (here, feature) L is emitted from state K.

Transition matrix =

[1 0 0; 0 1 0; 0 0 1]

Emission matrix of the first frame; identified instrument → Piano (rows separated by semicolons):

[5.7140e-2 1.4938e-2 6.2070e-2 2.4192e+0 3.8165e-1 6.5778e-3 1.0988e-3 6.2947e-3 6.7947e-2; 5.9806e-2 1.1625e-1 1.1186e-2 1.8907e+0 1.1500e-1 4.3256e-2 9.7878e-4 3.2057e-3 1.0248e-1; 6.0706e-2 4.1587e-2 4.6349e-1 2.2002e+0 7.2443e-1 8.9798e-3 4.8186e-3 9.1239e-3 9.8315e-2]

Emission matrix of the second frame; identified instrument → Guitar:

[3.2617e-2 3.7300e-2 1.5889e-2 9.6048e-1 4.0193e-1 5.7875e-3 1.7276e-3 8.9285e-3 1.3700e-1; 2.8087e-1 2.0518e-2 2.3490e-2 5.0418e-1 1.1678e-1 4.2416e-1 1.4484e-3 3.7725e-3 4.2731e-1; 3.0189e-1 6.2158e-2 2.2386e-2 5.2383e-1 8.0118e-1 7.5688e-3 8.0830e-3 1.5939e-2 3.6317e-1]

Emission matrix of the third frame; identified instrument → Flute:

[2.4409e-1 8.8960e-2 1.6461e-1 6.9549e-1 3.4623e+0 5.6420e-3 5.2346e-3 4.8182e-3 9.2664e-2; 2.6099e-2 1.1522e-2 8.9619e-3 4.2015e-1 1.7282e-1 4.7695e-1 3.3045e-3 1.8359e-2 1.7145e-1; 2.6269e-2 1.8471e-2 4.1082e-2 4.3371e-1 6.5418e-1 7.3219e-3 1.9549e-3 3.8940e-3 1.6011e-1]

The trellis (state) diagram showing the possible instrument (state) for the features extracted in each frame is shown in Figure 2(h)-(j). If the extracted features vary, there is a transition from one state to another.

The joint likelihood of the observation sequence over all possible states is calculated to identify the specific musical instrument playing.

Recognition rate = (number of correctly recognized instruments) / (total number of instruments)

Table 1: Experimental results

| Instruments | Posterior Decoding recognition rate | Viterbi Algorithm recognition rate |
|---|---|---|
| String Guitar | 82% | 91% |
| Flute | 77% | 95% |
| Piano | 60% | 83% |
| Flute + Guitar | 81% | 90% |
| Flute + Piano | 71% | 84% |
| Piano + Guitar | 40% | 60% |
| Flute + Piano + Guitar | 70% | 82% |

Conclusion

This paper proposed an implementation of spectrograms together with an improvement in the accuracy of calculating musical instrument existence probabilities.

Signal processing algorithms were designed to measure the features of the sound signal and to calculate the instrument existence probability based on an HMM. During the experiments it was found that the HMM results obtained with the Viterbi algorithm are more accurate than those obtained with posterior decoding.

Future development will concentrate on integrating the recognizer into a system that can process more complex sound mixtures.