# EEG Signals Classification Using Committee Neural Network


Index Terms: Autoregressive Coefficients, Artificial Neural Network, Discrete Wavelet Transform.

## Introduction

EEG is the recording of the spontaneous electrical activity of the brain over a short period of time. EEG signals are believed to reflect not only brain activity but also the status of the whole body, which is a prime motivation for applying advanced digital signal processing techniques to them. EEG has several advantages: it has high temporal resolution, it measures electrical activity directly, and it is non-invasive. EEG signals are used in applications such as the diagnosis of neurological diseases, the characterization of seizures for treatment, and the monitoring of the depth of anesthesia. These diagnostic applications have drawn the attention of many researchers, and this paper is also based on them. We present a classification method based on a committee neural network that gives better classification accuracy than its individual member networks.

Fig. 1 [1] shows the basic block diagram of EEG signal processing. For classification we have used the raw EEG signals available at [2].

From the five data sets available, we selected three (set A, set D and set E). In set A, EEG signals were recorded from healthy volunteers. In set D, recordings were taken from within the epileptogenic zone during seizure-free intervals, while set E contains only seizure activity []. For each of these sets we extracted features using the discrete wavelet transform and autoregressive (AR) coefficients. After feature extraction, the data sets were classified using a committee neural network trained with the back-propagation algorithm.

To reduce the dimensionality of the features we have used a Fisher's ratio (F-ratio) based optimization technique.

Section 2 discusses the feature extraction methods. Section 3 presents our proposed classification technique and the F-ratio based technique for reducing computational complexity. Section 4 shows the experimental results, and the last section concludes our work.

Fig. 1. Block diagram for EEG signal processing

## Theoretical Background

2.1 Feature Extraction using Wavelet Transform

The wavelet transform gives a multi-resolution description of a signal. It addresses the problems posed by non-stationary signals and is therefore particularly suited to feature extraction from EEG signals. It provides good time resolution at high frequencies and good frequency resolution at low frequencies. This is because the transform is computed using a mother wavelet and basis functions generated from it through scaling and translation operations. The analysis window therefore varies in size: it is broad at low frequencies and narrow at high frequencies, which gives good resolution across the whole frequency range.

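The continuous wavelet transform (CWT) is defined as

$$\mathrm{CWT}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt$$

where x(t) is the signal to be analyzed, ψ(t) is the mother wavelet (the basis function), τ is the translation parameter and s is the scale parameter.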

Computing the CWT consumes a lot of time and resources and produces a large amount of data. The discrete wavelet transform (DWT), which is based on sub-band coding, is used instead because it gives a fast computation of the wavelet transform. In the DWT, the time-scale representation of the signal is obtained using digital filtering techniques. The DWT of a signal x[n] is computed by successive low-pass and high-pass filtering. Each stage consists of two digital filters and two downsamplers by 2. The high-pass filter g[.] is the discrete mother wavelet and the low-pass filter h[.] is its mirror version. At each level, the downsampled output of the high-pass filter gives the detail coefficients and that of the low-pass filter gives the approximation coefficients. The approximation coefficients are then decomposed further, and the procedure is repeated at each level.

The spectral analysis of EEG signals using the WT can compress a large number of data points into a few features that characterize the signal's behavior. This is crucial for recognition and diagnostic purposes.
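The filter-bank decomposition described above can be sketched in a few lines of plain Python. This is only an illustration, not the toolbox routine used for the experiments: it assumes the Daubechies order-2 (db2) filters and the 256-sample segments used in the experimental section, and it assumes zero padding at the signal boundaries, so coefficients near the edges may differ from those of a toolbox that uses another extension mode. The function names and the synthetic test signal are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# db2 low-pass (scaling) filter h and the high-pass filter g derived from it.
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
G = [H[3], -H[2], H[1], -H[0]]

def filter_and_downsample(x, taps):
    """Full convolution of x with taps (zero padding), keeping every second sample."""
    full_len = len(x) + len(taps) - 1
    out = []
    for k in range(1, full_len, 2):              # odd output indices only
        acc = 0.0
        for j, t in enumerate(taps):
            if 0 <= k - j < len(x):
                acc += t * x[k - j]
        out.append(acc)
    return out

def dwt_decompose(x, levels):
    """Return (approximation, [D1, D2, ...]) after `levels` filter-bank stages."""
    approx = list(x)
    details = []
    for _ in range(levels):
        details.append(filter_and_downsample(approx, G))   # detail coefficients
        approx = filter_and_downsample(approx, H)          # passed on to the next stage
    return approx, details

# Synthetic stand-in for one 256-sample EEG segment (an assumption, not real data).
segment = [math.sin(0.2 * n) for n in range(256)]
a4, details = dwt_decompose(segment, levels=4)
print([len(d) for d in details], len(a4))   # [129, 66, 34, 18] 18
```

With a 4-tap filter, each stage maps n input samples to floor((n + 3) / 2) coefficients, which is where the sub-band sizes 129, 66, 34 and 18 come from.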

2.2 Feature Extraction using AR Coefficients

We computed the autoregressive (AR) power spectral density (PSD) estimates of the EEG signals of set A, set D and set E. The PSD is the distribution of signal power over frequency. The PSD of a stationary random signal can be expressed by polynomials A(z) and B(z) whose roots fall inside the unit circle in the z-plane [ub]. AR coefficients are useful features because they compactly represent the PSD of the signal. They can be obtained by solving a system of linear equations [df]. The AR model is a model of a stationary stochastic process. The AR model of a signal {x(n)} can be written as

$$x(n) = -\sum_{i=1}^{p} a_i\, x(n-i) + \varepsilon(n)$$

where ε(n) is a stationary white-noise process with zero mean and variance σ². The coefficients a_i are the autoregressive parameters of the model, and p is the model order. Since the method characterizes the input data using an all-pole model, the correct choice of the model order p is important: an order that is too large or too small gives a poor PSD estimate. Many stationary stochastic processes can be modeled with an AR model. The spectrum of the process is given by

$$P(f) = \frac{\sigma^2}{\left|1 + \sum_{i=1}^{p} a_i\, e^{-j 2\pi f i}\right|^2}$$

where f is the frequency normalized by the sampling frequency.

Parametric models for spectral estimation include the moving average (MA) model, the AR model and the autoregressive moving average (ARMA) model, and the AR parameters can be estimated with methods such as Burg's algorithm [7]. ARMA models are normally used when good accuracy is required. Burg's algorithm estimates the reflection coefficients, from which the AR parameters a_i follow. The Burg method fits a pth-order AR model to the input signal x by minimizing (in the least-squares sense) the forward and backward prediction errors while constraining the AR parameters to satisfy the Levinson-Durbin recursion [8].

The Burg method is recursive: the model order is increased one step at a time.

In this paper we used Burg's method to find the AR coefficients, with the model order set to 10.
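As a rough illustration of the Burg recursion, the sketch below estimates AR coefficients in plain Python. It is not the MATLAB routine used in this work. It assumes the sign convention x(n) = -Σ a_i x(n-i) + ε(n), under which an order-10 model is returned as the 11 values [1, a_1, ..., a_10]. The function names and the synthetic AR(2) test signal are hypothetical.

```python
import cmath
import math
import random

def burg(x, order):
    """Return ([1, a_1, ..., a_p], noise_variance) for the model
    x(n) = -sum_i a_i x(n-i) + e(n), estimated with Burg's recursion."""
    n = len(x)
    f = list(x)                       # forward prediction errors
    b = list(x)                       # backward prediction errors
    a = [1.0]
    variance = sum(v * v for v in x) / n
    for m in range(order):
        num = sum(f[i] * b[i - 1] for i in range(m + 1, n))
        den = sum(f[i] ** 2 + b[i - 1] ** 2 for i in range(m + 1, n))
        k = -2.0 * num / den          # reflection coefficient for this order
        padded = a + [0.0]
        # Levinson-Durbin update of the AR coefficients.
        a = [padded[i] + k * padded[m + 1 - i] for i in range(m + 2)]
        # Update both error sequences in place, last sample first.
        for i in range(n - 1, m, -1):
            f[i], b[i] = f[i] + k * b[i - 1], b[i - 1] + k * f[i]
        variance *= 1.0 - k * k
    return a, variance

def ar_psd(a, variance, freq):
    """AR spectrum at a normalized frequency (cycles per sample)."""
    denom = sum(c * cmath.exp(-2j * math.pi * freq * i) for i, c in enumerate(a))
    return variance / abs(denom) ** 2

# Synthetic AR(2) signal standing in for an EEG segment (an assumption).
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(2000):
    signal.append(0.75 * signal[-1] - 0.5 * signal[-2] + rng.gauss(0.0, 1.0))

coeffs, noise_var = burg(signal, order=10)
print(len(coeffs))   # 11
```

Evaluating `ar_psd(coeffs, noise_var, f)` over a grid of normalized frequencies between 0 and 0.5 gives the AR spectrum estimate for the fitted model.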

2.3 Neural Network

Artificial neural networks (ANNs) have been successfully used for pattern recognition in various disciplines. A network first undergoes a training session in which training patterns, represented as feature vectors, are repeatedly fed into it along with the class to which each pattern belongs. The network learns from the training vectors and generalizes by classifying patterns not encountered during training. The multilayer perceptron has been the most popular neural structure for classification because it can separate classes that are not linearly separable. It is trained in a supervised manner with the widely used error back-propagation algorithm, which is based on the error-correction rule. Its basic structure, shown in the accompanying figure, consists of (1) an input layer of source nodes that receive the activation patterns, (2) one or more hidden layers of hidden neurons that extract higher-order statistics, and (3) an output layer that provides the overall response of the network to the input vectors. The outputs of each layer are fed to the next layer.

The back-propagation algorithm involves two passes. In the forward pass, the input pattern is applied to the network, the signal is propagated through the layers, and the output of each layer is computed. The synaptic weights remain unaltered during this pass. The output of the output layer is compared with the target value, which gives an error signal at each output unit:

$$e_j(n) = d_j(n) - y_j(n)$$

where, at the jth output node, d_j(n) is the desired output and y_j(n) is the output of the network. The cost function is given as

$$E(n) = \frac{1}{2}\sum_{j} e_j^2(n)$$

The objective is to adjust the free parameters of the network to minimize E(n). In the backward pass, the error signal is passed backward layer by layer, and the local gradient is computed at each layer. This recursive process lets the synaptic weights of the network change in accordance with the delta rule. Both passes are repeated iteratively until the performance of the network reaches the required goal.
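The two passes can be illustrated with a very small network. The sketch below is a toy example under stated assumptions, not the configuration used in the experiments: it assumes one hidden layer, sigmoid activations, a squared-error cost, a learning rate of 0.1, and a single made-up training pattern with three inputs and three output classes.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w_hidden, w_out):
    """Forward pass. The last weight in every row multiplies a constant bias input of 1."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in w_hidden]
    output = [sigmoid(sum(w * hi for w, hi in zip(row, hidden + [1.0]))) for row in w_out]
    return hidden, output

def cost(target, output):
    """Squared-error cost: half the sum of squared output errors."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(target, output))

def backprop_step(x, target, w_hidden, w_out, lr=0.1):
    """One forward and one backward pass; returns the cost measured before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    # Local gradients at the output layer: error times sigmoid derivative.
    delta_out = [(d - y) * y * (1 - y) for d, y in zip(target, output)]
    # Local gradients at the hidden layer, propagated back through the output weights.
    delta_hid = [h * (1 - h) * sum(delta_out[j] * w_out[j][i] for j in range(len(w_out)))
                 for i, h in enumerate(hidden)]
    # Delta rule: each weight changes by lr * local gradient * its input.
    for j, row in enumerate(w_out):
        for i, hi in enumerate(hidden + [1.0]):
            row[i] += lr * delta_out[j] * hi
    for i, row in enumerate(w_hidden):
        for k, xk in enumerate(x + [1.0]):
            row[k] += lr * delta_hid[i] * xk
    return cost(target, output)

rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]  # 3 inputs + bias -> 4 hidden
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]     # 4 hidden + bias -> 3 outputs
x, target = [0.2, -0.4, 0.7], [1.0, 0.0, 0.0]                              # made-up pattern
losses = [backprop_step(x, target, w_hidden, w_out) for _ in range(50)]
print(losses[0], losses[-1])
```

Repeating the step on the same pattern lowers the cost, which is the behavior the iterative forward and backward passes rely on. A real training run would cycle through all training vectors.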

## Proposed Technique

3.1 Committee Neural Network

A committee neural network is an approach that reaps the benefits of its individual members. It has a parallel structure that produces a final output [] by combining the results of its member neural networks. The proposed technique consists of three steps: (1) selection of appropriate inputs for the individual members of the committee, (2) training of each member, and (3) decision making based on majority opinion.

The available data were divided into training and testing data. Features were extracted from the training data using the wavelet transform and AR coefficients. The input feature set was divided equally among the neural networks for training. The networks differ in their number of neurons and initial weights. All networks were trained using the gradient-descent back-propagation algorithm in the MATLAB software package, and after training they were tested with the testing data. Of the networks employed in the initial training stage, the three best-performing ones were selected to form the committee. For classification, the majority decision of the committee forms the final output. The block diagram of the committee neural network is shown in the accompanying figure.
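The decision-making step can be sketched as a simple vote over the labels predicted by the members. The class labels "A", "D" and "E" follow the data sets used here. The tie-break rule, which falls back to the first member when all three members disagree, is an assumption, since the text does not say how such a case is resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most committee members.

    If several labels share the highest count (for example three members
    giving three different labels), the earliest such label in member
    order wins. That tie-break is an assumption made for this sketch.
    """
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)

print(majority_vote(["A", "D", "D"]))   # D
```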

3.2 F-Ratio based optimization technique

The F-ratio is a statistical measure used to compare statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled [wiki]. Fig. 2 shows an example of multi-cluster data. The F-ratio can be formulated as

F-ratio = Variance of the means between the clusters / Average variance within the clusters

Suppose there are k clusters, each having n data points. If x_ij is the ith element of the jth class, then the mean of the jth class, µ_j, can be expressed as

$$\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$

The mean of all µ_j is called the global mean of the data, µ_0, and can be expressed as

$$\mu_0 = \frac{1}{k}\sum_{j=1}^{k} \mu_j$$

The F-ratio can then be expressed as

$$F = \frac{\dfrac{1}{k}\sum_{j=1}^{k}\left(\mu_j - \mu_0\right)^2}{\dfrac{1}{k}\sum_{j=1}^{k}\dfrac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \mu_j\right)^2}$$

A larger F-ratio means that the clusters lie further apart from each other or that the individual clusters are more compact. We can apply this F-ratio based optimization technique to EEG signals to reduce the dimensionality of the feature vector.
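The F-ratio of a single feature can be computed directly from its definition as the variance of the cluster means divided by the average within-cluster variance. The sketch below assumes population (divide-by-count) variances for both terms, and the numbers are made up for illustration.

```python
def f_ratio(clusters):
    """F-ratio of one feature, given its values grouped by class.

    clusters: list of k lists, one per class, each holding that feature's values.
    """
    k = len(clusters)
    means = [sum(c) / len(c) for c in clusters]             # class means
    global_mean = sum(means) / k                             # mean of the class means
    between = sum((m - global_mean) ** 2 for m in means) / k
    within = sum(sum((x - m) ** 2 for x in c) / len(c)
                 for c, m in zip(clusters, means)) / k
    return between / within

near = f_ratio([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
far = f_ratio([[1, 2, 3], [11, 12, 13], [21, 22, 23]])
print(near < far)   # True
```

Moving the clusters apart while keeping their spread fixed raises the ratio, so a feature with a high F-ratio separates the classes better than one with a low F-ratio.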

Fig. 2. Diagram for multi-cluster data

## Experimental Result

4.1 Feature Extraction

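Table 2. F-ratio of each feature coefficient, ranked in decreasing order

| Serial No. | Coefficient No. | F-ratio | Serial No. | Coefficient No. | F-ratio |
|---|---|---|---|---|---|
| 1 | 21 | 1.082 | 17 | 1 | 0.226 |
| 2 | 22 | 0.9316 | 18 | 13 | 0.1972 |
| 3 | 12 | 0.5243 | 19 | 14 | 0.1958 |
| 4 | 8 | 0.4748 | 20 | 31 | 0.1634 |
| 5 | 24 | 0.4714 | 21 | 28 | 0.1555 |
| 6 | 9 | 0.4464 | 22 | 27 | 0.0459 |
| 7 | 6 | 0.4073 | 23 | 20 | 0.0444 |
| 8 | 10 | 0.401 | 24 | 26 | 0.0407 |
| 9 | 23 | 0.3981 | 25 | 17 | 0.0036 |
| 10 | 4 | 0.3915 | 26 | 18 | 0.0023 |
| 11 | 25 | 0.3708 | 27 | 19 | 0.0011 |
| 12 | 5 | 0.3699 | 28 | 11 | 0.0004 |
| 13 | 30 | 0.365 | 29 | 15 | 0.0003 |
| 14 | 16 | 0.301 | 30 | 3 | 0.0002 |
| 15 | 29 | 0.2875 | 31 | 7 | 0.0001 |
| 16 | 2 | 0.2766 | | | |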

From the data available [ ], a rectangular window of 256 samples was selected to form a single EEG segment. For the analysis of signals using the WT, the choice of wavelet and of the number of decomposition levels is very important. The wavelet coefficients were computed using the Daubechies wavelet of order 2 because its smoothing features make it suitable for detecting changes in the EEG signal. In the present study, the EEG signals were decomposed into the details D1-D4 and one approximation A4. The four detail sub-bands give 247 coefficients (129+66+34+18) and the approximation gives 18, so a total of 265 coefficients was obtained for each segment.

To reduce the number of features, the following statistics were used:

1. Maximum of the wavelet coefficients in each sub-band.

2. Minimum of the wavelet coefficients in each sub-band.

3. Mean of the wavelet coefficients in each sub-band.

4. Standard deviation of the wavelet coefficients in each sub-band.

With four statistics for each of the five sub-bands (D1-D4 and A4), the dimension of the DWT feature set is 20. The AR coefficients were obtained using a MATLAB toolbox. Since the model order is 10, there are 11 AR coefficients, which are appended to the DWT features. The total feature dimension is therefore 31 (20+11).
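The assembly of the 31-dimensional feature vector can be sketched as follows. The sub-band values and AR coefficients below are placeholders rather than real EEG features. The order of the statistics inside the vector and the use of the sample (n-1) standard deviation are assumptions, since the text does not specify them.

```python
import math
import statistics

def subband_features(subbands):
    """Max, min, mean and standard deviation of every sub-band, concatenated."""
    features = []
    for band in subbands:
        features += [max(band), min(band), statistics.mean(band), statistics.stdev(band)]
    return features

# Placeholder sub-bands with the sizes of D1, D2, D3, D4 and A4.
subbands = [[math.sin(0.3 * i) for i in range(n)] for n in (129, 66, 34, 18, 18)]
# Placeholder for the 11 coefficients of an order-10 AR model.
ar_coeffs = [1.0] + [0.0] * 10

feature_vector = subband_features(subbands) + ar_coeffs
print(len(feature_vector))   # 31
```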

4.2 Application of committee neural network to EEG signals

The committee neural network was formed from three independent members, each trained with a different feature set. Before recruitment into the committee, many networks with different numbers of hidden neurons and different initial weights were trained, and the three best-performing ones were selected. Decision fusion was done by majority voting; an odd number of networks is required to reach a firm majority decision. The resulting accuracies of the individual networks and of the committee are shown in Table 1.

Table 1. Accuracy using committee neural network

| ANN | Accuracy (%) |
|---|---|
| NN1 | 93.28 |
| NN2 | 94.52 |
| NN3 | 92.76 |
| CNN | 95 |
| CNN (after score addition) | 95.36 |
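The last row of Table 1 refers to score addition, which the text does not define. One plausible reading, taken here purely as an assumption, is that the output scores of the three members are summed class by class and the class with the largest total is chosen. The scores below are made up to show that this rule can differ from a plain vote.

```python
def score_addition(member_scores, classes=("A", "D", "E")):
    """Sum the members' output scores per class and pick the largest total."""
    totals = [sum(column) for column in zip(*member_scores)]
    return classes[totals.index(max(totals))], totals

# Made-up output scores of three members for the classes (A, D, E).
scores = [
    [0.90, 0.10, 0.00],   # member 1 favors A strongly
    [0.40, 0.50, 0.10],   # member 2 favors D narrowly
    [0.45, 0.50, 0.05],   # member 3 favors D narrowly
]
label, totals = score_addition(scores)
print(label)   # A
```

In this example two of the three members individually prefer D, so a majority vote would return D, while the summed scores favor A because the first member is much more confident.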

4.3 Reduction in Feature dimensions by F-Ratio

The F-ratio corresponding to each feature is listed in Table 2.

## Table 2 F-Ratio

To reduce the dimension of the feature vector, we deleted the features with the lowest F-ratio. Table 3 shows the accuracy obtained after each deletion.

Table 3. Deleted features

| Serial No. | No. of coefficients taken | Network structure | Accuracy (%) |
|---|---|---|---|
| 1 | 31 | 31-93-3 | 95.31 |
| 2 | 30 | 30-90-3 | 94.91 |
| 3 | 29 | 29-87-3 | 95.00 |
| 4 | 28 | 28-84-3 | 94.71 |
| 5 | 27 | 27-81-3 | 94.79 |
| 6 | 26 | 26-78-3 | 95.03 |
| 7 | 25 | 25-75-3 | 95.32 |
| 8 | 24 | 24-72-3 | 95.02 |
| 9 | 23 | 23-69-3 | 95.12 |
| 10 | 22 | 22-66-3 | 95.37 |
| 11 | 21 | 21-63-3 | 94.95 |
| 12 | 20 | 20-60-3 | 95.16 |
| 13 | 19 | 19-57-3 | 95.45 |
| 14 | 18 | 18-54-3 | 95.29 |
| 15 | 17 | 17-51-3 | 95.70 |
| 16 | 16 | 16-48-3 | 95.83 |
| 17 | 15 | 15-45-3 | 95.15 |
| 18 | 14 | 14-42-3 | 95.04 |
| 19 | 13 | 13-39-3 | 94.35 |
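The selection step behind these results can be sketched as a ranking of the coefficients by their F-ratio values, keeping only the highest-ranked ones. The values below are the F-ratios reported for the 31 coefficients. The helper names are hypothetical, and the rule of using three hidden neurons per input is only read off the network structures listed in Table 3.

```python
# F-ratio of each coefficient, keyed by coefficient number.
F_RATIO = {
    21: 1.082, 22: 0.9316, 12: 0.5243, 8: 0.4748, 24: 0.4714, 9: 0.4464,
    6: 0.4073, 10: 0.401, 23: 0.3981, 4: 0.3915, 25: 0.3708, 5: 0.3699,
    30: 0.365, 16: 0.301, 29: 0.2875, 2: 0.2766, 1: 0.226, 13: 0.1972,
    14: 0.1958, 31: 0.1634, 28: 0.1555, 27: 0.0459, 20: 0.0444, 26: 0.0407,
    17: 0.0036, 18: 0.0023, 19: 0.0011, 11: 0.0004, 15: 0.0003, 3: 0.0002,
    7: 0.0001,
}

def keep_top(f_ratio, count):
    """Coefficient numbers of the `count` features with the highest F-ratio."""
    ranked = sorted(f_ratio, key=f_ratio.get, reverse=True)
    return ranked[:count]

def network_structure(n_inputs, n_classes=3):
    """Input-hidden-output sizes, with three hidden neurons per input."""
    return f"{n_inputs}-{3 * n_inputs}-{n_classes}"

selected = keep_top(F_RATIO, 16)
print(selected)
print(network_structure(len(selected)))   # 16-48-3
```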

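Hence, on the basis of the F-ratio based optimization technique, we deleted up to 18 of the 31 features. Over this range the accuracy stayed between 94.35% and 95.83%, compared with 95.31% for the full feature set, and the best value was obtained with 16 features. The computational complexity is therefore reduced without a significant loss of accuracy.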