# Classification Using Hybridized Kalman Particle Swarm Optimization Technique


Abstract—Data classification is an important area of data mining. Several well-known techniques, such as decision trees and neural networks, are available for this task. In this paper we propose a Kalman Particle Swarm Optimized (KPSO) polynomial equation for the classification of several well-known data sets. Our proposed method builds on findings from our earlier work on data classification using the Polynomial Neural Network, which supplied valuable information such as the number of terms, the number and combination of features in each term, and the degree of the polynomial equation. KPSO optimizes these polynomial equations with a faster convergence speed than basic PSO. The polynomial equation that gives the best performance is taken as the classification model. Our simulation results show that the proposed approach gives classification accuracy competitive with PNN on many data sets.

Keywords—Polynomial Neural Network, Group Method of Data Handling, Particle Swarm Optimization, Kalman Filter

## I. INTRODUCTION

Different parametric and nonparametric approaches such as neural networks (NN), decision trees, and support vector machines (SVM) are used extensively for pattern recognition [1-9]. Quite a few approaches produce mathematical models for pattern recognition/data classification tasks. The Group Method of Data Handling (GMDH) based Polynomial Neural Network (PNN) is a popular approach for evolving a short-term polynomial equation for data classification. The short-term polynomial equation is developed taking the features of the data set as inputs to the PNN. The degree of the polynomials, the number of terms in the polynomial equations, and the number and type of features are determined by the algorithm used for the PNN. We have investigated [10] the PNN approach on different real-world data sets. Although this approach is suitable, it involves a great deal of computational complexity, in both time and memory, to evolve polynomial equations that achieve the desired classification performance. In this paper we suggest an approach to developing mathematical models in the form of polynomial equations using the Kalman Particle Swarm Optimized (KPSO) technique, which is comparatively less complex than PNN while providing competitive performance. KPSO has a faster convergence time than basic PSO and is based on the principle of Extended Kalman Filtering (EKF) [11-13]. The degree of the polynomials, the number of terms in the equation, and the variables (i.e., features) in the equation are chosen randomly from suitable ranges while developing the model. The ranges are determined from our experience of developing models using the PNN approach [10]. The polynomial equations derived using KPSO are found to be computationally less expensive, and their performance is competitive with the PNN approach. We have taken a few data sets, such as Iris and Diabetes, to justify the above.

Section II describes the PNN approach and the motivation for our proposed model. The basic PSO and KPSO are discussed in Section III. Sections IV and V describe our model and the simulation results, respectively. Finally, the conclusion and further enhancements are given in Section VI.

## II. GMDH-TYPE POLYNOMIAL NEURAL NETWORK MODEL

The GMDH belongs to the category of inductive, self-organizing, data-driven approaches. It requires small data samples and is able to optimize the model's structure objectively. The relationship between input and output variables can be approximated by the Volterra functional series, the discrete form of which is the Kolmogorov-Gabor polynomial [14]:

$$y = a_0 + \sum_{i=1}^{m} a_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk} x_i x_j x_k + \cdots \qquad (1)$$

where the $a$ terms denote the coefficients (weights) of the Kolmogorov-Gabor polynomial and the vector $x = (x_1, \ldots, x_m)$ holds the input variables. This polynomial can approximate any stationary random sequence of observations and can be solved either by adaptive methods or by Gaussian normal equations. However, it is not computationally suitable when the number of input variables grows large or when observations are missing from the input data set; solving all the necessary normal equations then takes considerable computation time.

A new algorithm called GMDH, developed by Ivakhnenko [15-16], uses a reduced form of the Kolmogorov-Gabor polynomial. He proved that a second-order polynomial, i.e.,

$$z = c_0 + c_1 x_p + c_2 x_q + c_3 x_p^2 + c_4 x_q^2 + c_5 x_p x_q \qquad (2)$$

which takes only two input variables at a time, can reconstruct the complete Kolmogorov-Gabor polynomial through an iterative procedure. The GMDH algorithm is able to trace the input-output relationships of an entire system that is too complex to model directly. GMDH-type Polynomial Neural Networks are multilayered models consisting of neurons/active units/Partial Descriptions (PDs) whose transfer function is the short-term polynomial described in equation (2). At the first layer (L = 1), the algorithm generates the first population of PDs from all possible pairs of the m input variables, so the total number of PDs in the first layer is n = m(m-1)/2. The output of each PD in layer L = 1 is computed by applying equation (2). Let the outputs of the first layer be denoted $z_1, z_2, \ldots, z_n$. The coefficient vector of each PD is determined by the least-squares estimation approach.

The details of the PNN model and the least-squares estimation technique are explained below.

Let the input and output training data be represented as pairs of feature vectors and targets,

$$(x_{i1}, x_{i2}, \ldots, x_{im};\ y_i), \qquad i = 1, 2, \ldots, n,$$

where n is the number of training records. The input-output relationship learned by the PNN algorithm can then be described as

$$y = f(x_1, x_2, \ldots, x_m),$$

where m is the number of features in the data set.

The architecture of a PNN with four input features is shown in "Fig. 1".

Figure 1: Basic PNN model (inputs x1-x4 feed a layer of PDs, whose outputs feed the PDs of the next layer).

The number of PDs, K, in each layer depends on the number of input features M as K = M(M-1)/2.

The input feature indices (p, q) for each PD may be generated using the following algorithm:

1. Let the layer be l.

2. Let k = 1.

3. for i = 1 to m-1

4. for j = i+1 to m

5. then the k-th PD will receive input from features i and j:

6. p = i; q = j;

7. k = k+1;

8. end for

9. end for
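The nested loops above can be sketched directly in Python. The helper below is a hypothetical illustration (not code from the paper), using 1-based feature indices as in the algorithm:

```python
def pd_input_pairs(m):
    """Enumerate the feature-index pairs (p, q) feeding each PD of a layer.

    For m inputs this yields k = m*(m-1)/2 pairs, matching the PD count
    stated in the text.
    """
    pairs = []
    for i in range(1, m):              # i = 1 .. m-1
        for j in range(i + 1, m + 1):  # j = i+1 .. m
            pairs.append((i, j))
    return pairs

# Four input features -> 6 PDs, as in Fig. 1
print(pd_input_pairs(4))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```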

Let us consider the equations for the first PD of layer 1, which receives input from features 1 and 2. For the n training records we have

$$y_i = c_0 + c_1 x_{i1} + c_2 x_{i2} + c_3 x_{i1}^2 + c_4 x_{i2}^2 + c_5 x_{i1} x_{i2} + d_i, \qquad i = 1, 2, \ldots, n,$$

where the vector d holds the errors between the target outputs and the obtained outputs. In general, this may be written in matrix form for PD j as

$$Y = X c_j + d, \qquad j = 1, 2, \ldots, k, \qquad k = m(m-1)/2,$$

where X is the n-by-6 matrix of polynomial terms and $c_j$ is the coefficient vector of the j-th PD. The least-squares criterion is

$$E = \sum_{i=1}^{n} d_i^2 = (Y - X c_j)^T (Y - X c_j).$$

To minimize the error, we set the first derivatives of E with respect to all the unknown variables (i.e., the coefficients) to zero. Expanding these equations yields the normal equations

$$X^T X\, c_j = X^T Y, \qquad c_j = (X^T X)^{-1} X^T Y.$$

After obtaining the values of the coefficients, we estimate the targets on the testing data set. If the error level is not down to our desired value, we construct the next layer of the PNN by taking the outputs of the previous layer, $z_1, z_2, \ldots, z_k$, as the inputs to the next layer, and the process is repeated until the error stops decreasing.
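The least-squares fit of one PD can be sketched as follows. This is an illustrative example only (the function name `fit_pd` and the toy data are our own, not from the paper); it solves the normal equations via `numpy.linalg.lstsq`:

```python
import numpy as np

def fit_pd(x1, x2, y):
    """Fit the six coefficients of one PD,
    z = c0 + c1*x1 + c2*x2 + c3*x1^2 + c4*x2^2 + c5*x1*x2,
    by ordinary least squares."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    c, *_ = np.linalg.lstsq(X, y, rcond=None)
    return c, X @ c  # coefficients and fitted PD outputs

# Toy data whose target depends quadratically on two features
rng = np.random.default_rng(0)
x1, x2 = rng.random(50), rng.random(50)
y = 1.0 + 2.0 * x1 - 3.0 * x2 ** 2
c, z = fit_pd(x1, x2, y)
print(np.round(c, 3))
```

Because the toy target lies exactly in the span of the six PD terms, the fit recovers the generating coefficients.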

We applied the PNN approach in our earlier work [10]. We found that as the PNN layers grow, the number of PDs in each layer also grows, which requires pruning of PDs to limit the computational complexity. For example, for a data set with 10 features, about 20,000 PDs need to be generated at the 6th layer even after pruning. Even then, a large amount of memory is required to implement the program, and the computational time needed to evolve the mathematical model in the form of polynomial equations is large.

PNN layers are grown based on the error at each level, and the classification accuracy at each level is computed. We have seen that the error level decreases gradually even up to the 10th layer; however, the classification accuracy does not improve much beyond the 3rd or 4th layer. This is due to overtraining on the chosen training data size. Moreover, growing the layers beyond the 3rd or 4th involves large memory requirements and long computation times, and a suitable trade-off is hard to determine because the method is not iterative. These findings motivated us to develop an alternative approach to data classification, and our proposed model is based on our experience of working with the PNN approach.

## III. PSO BASICS AND KPSO

An optimized solution to obtain an optimal classification model with higher accuracy can be possible using particle swarm optimization technique.

## (a) Particle Swarm Optimization (PSO)

The PSO technique [17-18] is a stochastic optimization technique that has its origin in the motion of a flock of birds searching for food. It uses a number of particles, known collectively as a swarm. Each particle moves through the search space looking for the global minimum or maximum. During its flight, each particle adjusts its trajectory by dynamically altering its velocity according to its own flying experience and the flying experience of the other particles in the search space. The PSO approach has become very popular due to its simplicity of implementation and its ability to quickly converge to a reasonably good solution.

For a particle moving in a multidimensional search space, let $x_{i,j}(t)$ and $v_{i,j}(t)$ denote the position and velocity of the i-th particle in the j-th dimension at time t. The personal best position of particle i is updated as

$$pbest_{i,j}(t+1) = \begin{cases} pbest_{i,j}(t) & \text{if } f(x_i(t+1)) \ge f(pbest_i(t)) \\ x_{i,j}(t+1) & \text{otherwise} \end{cases} \qquad (3)$$

and the global best position is obtained as

$$gbest_j(t) = \arg\min_{pbest_i}\, f(pbest_i(t)). \qquad (4)$$

The modified velocity and position of each particle at time (t+1) can be calculated as

$$v_{i,j}(t+1) = K\left[w\, v_{i,j}(t) + c_1\, \mathrm{rand}()\,(pbest_{i,j} - x_{i,j}(t)) + c_2\, \mathrm{rand}()\,(gbest_j - x_{i,j}(t))\right] \qquad (5)$$

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) \qquad (6)$$

where $v_{i,j}(t+1)$ is the velocity of the i-th particle at time t+1, $x_{i,j}(t)$ is the current position, w is the inertia weight factor, $c_1$ and $c_2$ are acceleration constants, rand() is a uniform random value in the range [0,1], and K is the constriction factor, a function of the acceleration constants given by

$$K = \frac{2}{\left|2 - \varphi - \sqrt{\varphi^2 - 4\varphi}\right|}, \qquad \varphi = c_1 + c_2, \quad \varphi > 4. \qquad (7)$$

A suitable selection of the inertia weight w and the acceleration coefficients $c_1$ and $c_2$ is crucial in providing a balance between the global and local search in the flying space.

A suitable choice of w is

$$w = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \times iter \qquad (8)$$

where $iter_{max}$ is the maximum number of generations and $iter$ is the current generation number.

The particle velocity at any instant is limited to a chosen $V_{max}$; if this limit is too high, particles may fly past good solutions, while if it is too small, particles end up in local solutions only. The acceleration factors $c_1$ and $c_2$ used in equation (5) are varied according to

$$c_1 = 2.5 - 2\,\frac{iter}{iter_{max}}, \qquad c_2 = 0.5 + 2\,\frac{iter}{iter_{max}} \qquad (9)$$

so that $c_1$ and $c_2$ are varied between 2.5 and 0.5, respectively, to find the best ranges for the optimum solution.

The objective function chosen for fitness evaluation of a population of particles is given by

(10)
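As an illustration only (not the authors' implementation), one constriction-factor update step in the style of equations (5)-(7) can be sketched in Python; the function name `pso_step` and the sphere-function demo are our own assumptions:

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.05, c2=2.05, rng=None):
    """One velocity/position update in the style of eqs. (5)-(7).

    phi = c1 + c2 must exceed 4 for Clerc's constriction factor K."""
    rng = rng if rng is not None else np.random.default_rng()
    phi = c1 + c2
    K = 2.0 / abs(2.0 - phi - np.sqrt(phi**2 - 4.0 * phi))
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = K * (w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
    return x + v_new, v_new

# Minimize f(x) = sum(x^2) with a tiny swarm
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=(10, 2))   # 10 particles, 2 dimensions
v = np.zeros_like(x)
pbest = x.copy()
gbest = x[np.argmin((x**2).sum(axis=1))]
f0 = (gbest**2).sum()                  # initial best fitness
for _ in range(100):
    x, v = pso_step(x, v, pbest, gbest, rng=rng)
    better = (x**2).sum(axis=1) < (pbest**2).sum(axis=1)
    pbest[better] = x[better]
    gbest = pbest[np.argmin((pbest**2).sum(axis=1))]
```

After the loop, `gbest` holds the best position found; its fitness can only improve on the initial value `f0`.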

## (b) Kalman PSO

The Kalman PSO (KPSO) is a new approach to particle motion in PSO that reduces the number of iterations required to reach an optimum solution. It uses the Kalman filter algorithm to calculate the next position of each particle. The best location in the neighbourhood is used as an observation at each time step, producing a new location through prediction. The basic equations of KPSO are summarised as follows:

$$\hat{x}_{k+1|k} = F\,\hat{x}_{k|k}, \qquad P_{k+1|k} = F\,P_{k|k}\,F^T + Q \qquad (11)$$

$$z_{k+1} = H\,x_{k+1} + v_{k+1} \qquad (12)$$

$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_g\left(z_{k+1} - H\,\hat{x}_{k+1|k}\right), \qquad K_g = P_{k+1|k} H^T \left(H P_{k+1|k} H^T + R\right)^{-1} \qquad (13)$$

where F and H are the transition and sensor characteristic matrices, Q and R are the respective noise covariance matrices, z is obtained from the observations (the best location in the neighbourhood), and k is the time index. The transition and observation matrices are

$$F = \begin{bmatrix} I & I \\ 0 & I \end{bmatrix}, \qquad H = \begin{bmatrix} I & 0 \end{bmatrix},$$

and the estimated state vector holds the particle position and velocity,

$$\hat{x} = \begin{bmatrix} x \\ v \end{bmatrix}, \qquad \text{with initial values } \hat{x}_0 = \begin{bmatrix} x(0) \\ v(0) \end{bmatrix}, \quad P_0 = I,$$

where I is a unit matrix of appropriate dimension.

The covariances are defined as

$$Q = w_1 I, \qquad R = w_2 I \qquad (14)$$

where $w_2$ is chosen as 0.001 and $w_1$ depends on the limits of the particle velocity.

The filtered, or true, state of the particle is then drawn from the normal distribution defined by the Kalman estimate,

$$x_t \sim N\left(\hat{x}_{t|t},\, P_{t|t}\right). \qquad (15)$$

The position information in the KPSO algorithm is used to set the new position of the particle; the velocity information is unused except in the next iteration of the Kalman update.
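A generic Kalman predict/update cycle of this kind can be sketched as follows. This is a standard textbook filter step, shown for illustration (the helper name `kalman_step` and the 1-D demo are our own, not from the paper):

```python
import numpy as np

def kalman_step(m, P, z, F, H, Q, R):
    """One Kalman filter predict/update cycle.

    m, P : prior state mean and covariance;  z : observation
    F, H : transition and observation matrices;  Q, R : noise covariances."""
    # Predict
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    m_new = m_pred + K @ (z - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new

# 1-D particle: state = [position, velocity], observing position only
F = np.array([[1.0, 1.0], [0.0, 1.0]])        # position += velocity
H = np.array([[1.0, 0.0]])
Q, R = 0.001 * np.eye(2), np.array([[0.1]])
m, P = np.zeros(2), np.eye(2)
m, P = kalman_step(m, P, np.array([2.0]), F, H, Q, R)
```

The posterior position moves toward the observation (the neighbourhood best, in KPSO terms) without jumping all the way to it.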

As with GA, PSO can suffer from premature convergence, which results in an inaccurate solution. To circumvent this problem, an alternative approach dynamically varies the inertia weight randomly based on the variance of the population fitness. This results in better local and global searching ability of the particles, which improves the convergence of the velocity and of the optimal values of the coefficients of the polynomial equations.

The inertia weight w is updated by finding the variance of the population fitness as

$$\sigma^2 = \sum_{i=1}^{M} \left(\frac{f_i - f_{avg}}{f}\right)^2 \qquad (16)$$

where

$f_{avg}$ = average fitness of the population of particles in a given generation,

$f_i$ = fitness of the i-th particle in the population,

M = total number of particles, i = 1, 2, ..., M,

and f is a normalizing factor used to limit $\sigma^2$. If $\sigma^2$ is large, the population is in a random searching mode, while for small $\sigma^2$ the solution tends towards premature convergence and yields only the local best position of the particles. To circumvent this phenomenon and to obtain the gbest solution, the inertia weight factor is updated at each generation based on $\sigma^2$.

The forgetting factor is chosen as 0.9 for faster convergence.
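The fitness-variance measure of equation (16) can be sketched as follows. The choice of normalizing factor here is an assumption for illustration (the text only states that f limits the variance):

```python
import numpy as np

def fitness_variance(f_values, norm=None):
    """Variance of population fitness in the style of eq. (16):
    sigma^2 = sum_i ((f_i - f_avg) / f)^2, with f a normalizing factor.

    By assumption, f defaults to max(1, largest absolute deviation)."""
    f_values = np.asarray(f_values, dtype=float)
    favg = f_values.mean()
    if norm is None:
        norm = max(1.0, np.abs(f_values - favg).max())
    return float(np.sum(((f_values - favg) / norm) ** 2))

# Identical fitness values signal premature convergence (sigma^2 = 0)
print(fitness_variance([1.0, 1.0, 1.0]))  # 0.0
```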

An alternative is to choose the inertia weight as a function of rand(), where rand() is a random number between (0,1). Here the influence of the past velocity of a particle on its current velocity is chosen to be random, and the inertia weight is adapted randomly depending on the variance of the fitness values of the population. This results in an optimal coordination of the local and global searching abilities of the particles.

## IV. KALMAN PARTICLE SWARM OPTIMIZED POLYNOMIAL: OUR PROPOSED APPROACH

In our proposed approach we apply the KPSO technique to evolve a few polynomial equations to classify the data set. Finally, we choose as our model the polynomial equation that gives the best classification accuracy for the data set under investigation.

The polynomial equation considered in our approach can be expressed in the form

$$y = \sum_{i=1}^{n} c_i \prod_{j=1}^{p_i} x_{r_{ij}}^{q_{ij}} \qquad (17)$$

where

n is the number of polynomial terms, chosen randomly from a suitable range;

p is the number of features in each term, chosen randomly from the given set of features for the data set under consideration;

r is the index of a feature, a random integer value between 1 and the number of features in the respective data set;

q is the degree of a feature, a random integer value chosen from a suitable range;

and the coefficients $c_i$ are the quantities optimized by KPSO.
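A polynomial of this form can be evaluated with a short routine. This is an illustrative sketch (the representation of terms as index/degree pairs is our own encoding of the structure just described):

```python
import numpy as np

def eval_polynomial(coeffs, terms, x):
    """Evaluate y = sum_i c_i * prod_j x[r_ij]**q_ij.

    terms  : list of (feature_indices, degrees) pairs, one per term
    coeffs : one coefficient per term (what KPSO would optimize)
    x      : feature vector of one record (0-based indices here)."""
    y = 0.0
    for c, (idx, deg) in zip(coeffs, terms):
        y += c * np.prod([x[r] ** q for r, q in zip(idx, deg)])
    return y

# y = 3*x0^2 + 2*x0*x1 : two terms
terms = [([0], [2]), ([0, 1], [1, 1])]
print(eval_polynomial([3.0, 2.0], terms, np.array([2.0, 5.0])))  # 32.0
```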

Our proposed model mimics the PNN model, so to obtain suitable ranges for n, p, and q we have extensively analyzed the PNN models developed in [10].

A. Basis of determining the range of highest degree of Polynomial equation

While developing the models for different data sets using the Polynomial Neural Network (PNN) [10], we observed that many of the models reach satisfactory classification accuracy at layer 3 or layer 4. Each PD in the model develops an output equation of the form of equation (2), where the indices i and j take values between 1 and the number of features in the data set under study, with i not equal to j, and x is either a feature value or the output of a PD in the previous layer. We have observed that in many cases competitive classification accuracy is obtained at the 4th layer. The biquadratic polynomial equation used in our PNN approach [10] has a highest degree of 2; in each subsequent layer this doubles, so at the 4th layer the maximum degree of the polynomial is 16.

"Fig. 2" describes the highest possible degree of the polynomials generated at different layers. Considering the performance of the PNN model [10], we have chosen the degree of the polynomials in our KPSO model to be in the range 0 to 16.

Figure 2: Highest possible degree of any feature in the PDs of different layers (layer 1: x^2, layer 2: x^4, layer 3: x^8, layer 4: x^16).

B. Basis of determining the number of terms in the polynomial equation

For constructing the PDs of the first layer in the PNN model [10], the biquadratic polynomial equation we used consists of 6 terms. The second layer takes two such inputs from the first layer, so the maximum number of terms possible in layer 2 is 6*6, i.e., 36. In general, the maximum number of terms that can be generated in layer l is $6^{2^{l-1}}$, since the term count squares at each layer. "Fig. 3" shows the maximum possible number of terms generated at different layers.
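The growth of the maximum term count (6 terms at layer 1, squaring at each subsequent layer) can be checked with a short script; the closed form 6^(2^(l-1)) is inferred here from the 6 and 36 quoted above:

```python
def max_terms(layer):
    """Maximum possible number of polynomial terms at a given PNN layer.

    Layer 1 PDs have 6 terms; each later layer multiplies two such
    polynomials, so the count squares at every layer: 6**(2**(l-1))."""
    return 6 ** (2 ** (layer - 1))

print([max_terms(l) for l in (1, 2, 3, 4)])  # [6, 36, 1296, 1679616]
```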

Figure 3: Highest possible number of terms in the polynomial equation at different layers.

However, the maximum number of terms is generated only if all the features involved are unique. For example, if a, b, c, and d are four unique features, then the multiplication of two two-term polynomials generates four terms, i.e., (a + b)(c + d) = ac + ad + bc + bd. But if we consider only a and b, it generates three terms, i.e., (a + b)(a + b) = a^2 + 2ab + b^2.

In our PNN models [10], each PD from layer 2 onwards receives inputs that are combinations of the outputs of layer-1 PDs and the original inputs given to layer 1. For example, in layer 2, one input may be the output of a PD produced from two features while the other input is a single feature; expanding the resulting polynomial equation (ignoring the coefficients), the maximum number of terms expected is 51 but the number of terms actually generated is only 15. Further, since we always feed different combinations of the same group of features available in the data set, the number of polynomial terms is much less than the expected maximum. Experimentation has revealed that, depending on the number of input features, a range of 10 to 50 polynomial terms is enough to approximate the nonlinear data set.

## C. Basis of considering the maximum number of unique features in each term of the polynomial equation

The polynomial equation we considered for the PNN models can have at most two unique features at layer 1. If layer 2 gets input from two PDs of layer 1 without any common features, then at layer 2 any polynomial term can have a maximum of 4 unique features. "Fig. 4" shows the maximum number of unique features in each term up to layer 3. So, considering our best results [10] within layer 4, there may be up to 16 unique features in each polynomial term. From simulations of different data sets using PNN [10], we have seen that a maximum of 4 to 8 unique features (subject to the availability of features in the data set) is enough to approximate the nonlinearity of the data sets under investigation.

Figure 4: Maximum number of unique features in each term of the polynomial equation (layer 1: 2, layer 2: 4, layer 3: 8).

After determining suitable ranges for n, p, and q, we used the following algorithm to obtain the polynomial equation as our mathematical model for classification of datasets using KPSO techniques.

```
for sm = 1 to maxSimulation
    dim = random integer between 10 and n
    initialize swarmPosition, velocity, and pbest values
    for dm = 1 to dim-1
        poly(dm,1) = random integer between 1 and p
        // poly is a two-dimensional matrix; each row represents one term
        // of the polynomial. The 1st position of each row stores the
        // number of features in that term, the next ft positions store
        // the indices of the features, and the following ft positions
        // store the degrees of the respective features, where ft is the
        // number of features in the dataset.
        z = 2
        for ind = 1 to poly(dm,1)
            poly(dm,z) = random integer between 1 and featureSize
            z = z + 1
        endfor
        z = 1 + featureSize + 1
        for deg = 1 to poly(dm,1)
            poly(dm,z) = random integer between 1 and q
            z = z + 1
        endfor
        for k = 1 to NoOfRecordsInDataset
            z1 = 2; z2 = 1 + featureSize + 1
            for j = 1 to poly(dm,1)
                d(k,dm) = d(k,dm) * (data(k, poly(dm,z1)) ^ poly(dm,z2))
                z1 = z1 + 1; z2 = z2 + 1
            endfor
        endfor
    endfor // end of dim loop
    using the polynomial-term values from matrix d and the respective
    coefficients from swarmPosition, evaluate the polynomial equations
    and initialize the gbest value
    for it = 1 to maxIteration
        update the velocity and swarmPositions using the KPSO equations;
        evaluate the polynomial terms as before and update pbest and gbest
    endfor // end of maxIteration loop
    if the current result is better than the previous best simulation,
        store the result
endfor // end of maxSimulation loop
```
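The structure-generation and term-evaluation steps of the listing above can be sketched in Python. This is an illustrative sketch under our own encoding (0-based feature indices, terms stored as index/degree pairs rather than the flat `poly` matrix):

```python
import numpy as np

def random_structure(n_max, p_max, q_max, n_features, rng):
    """Draw a random polynomial structure as in the algorithm:
    10..n_max terms, each with random feature indices and degrees."""
    n_terms = rng.integers(10, n_max + 1)
    terms = []
    for _ in range(n_terms):
        p = rng.integers(1, p_max + 1)              # features in this term
        idx = rng.integers(0, n_features, size=p)   # feature indices
        deg = rng.integers(0, q_max + 1, size=p)    # degrees (range 0..16)
        terms.append((idx, deg))
    return terms

def term_matrix(terms, data):
    """Matrix d of the listing: the value of every term for every record."""
    d = np.ones((data.shape[0], len(terms)))
    for j, (idx, deg) in enumerate(terms):
        for r, q in zip(idx, deg):
            d[:, j] *= data[:, r] ** q
    return d

rng = np.random.default_rng(1)
data = rng.random((5, 4))                    # 5 records, 4 features
terms = random_structure(50, 4, 16, 4, rng)
print(term_matrix(terms, data).shape)        # one column per term
```

KPSO would then optimize one coefficient per column of this matrix, scoring each candidate equation on the training records.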

## V. SIMULATION AND RESULTS

We have experimented with different benchmark data sets to develop mathematical models using the KPSO approach. Here we present five data sets collected from the UCI Machine Learning Repository (see www.ics.uci.edu). A brief description of the properties of these data sets is presented in Table I.

Table I: DATA SETS

| # | Dataset | Cases | Classes | Attributes |
|---|---------|-------|---------|------------|
| 1 | Iris | 150 | 3 | 4 |
| 2 | Diabetes | 768 | 2 | 8 |
| 3 | Pima | 699 | 2 | 8 |
| 4 | Iono | 351 | 2 | 34 |
| 5 | Balance Scale | 625 | 3 | 4 |

The data set is divided into two parts: one part is used for building the model and the other for testing it. Using the training data set we develop the model and verify its performance on the testing data set. Different models are generated in each simulation, and KPSO optimizes the coefficients of the polynomial equations. If a model performs better than the previous best model, it is preserved for future reference. We obtained competitive classification accuracy on our data sets with 1000 simulations using the proposed KPSO approach.

The percentage of correct classification for each data set using the PNN model [10] and our proposed KPSO model is presented in Table II.

Table II: CLASSIFICATION ACCURACY

| Dataset | PNN (%) | KPSO model (%) |
|---------|---------|----------------|
| Iris | 98.69 | 99.00 |
| Diabetes | 77.34 | 78.73 |
| Pima | 41.79 | 76.89 |
| Iono | 35.89 | 87.55 |
| Balance Scale | 58.24 | 68.01 |

The mathematical models for two data sets are illustrated below as samples of the PNN [10] and KPSO models.

## Mathematical Model for Iris data set:

## 1. Developed by PNN

## 2. Developed by KPSO_Polynomial

## VI. CONCLUSION AND FUTURE ENHANCEMENT

In this paper we have proposed a novel method of pattern recognition/data classification using the Kalman particle swarm optimization technique, which is faster in convergence and better in accuracy. Our work is inspired by the difficulties we faced while performing data classification using the Polynomial Neural Network. The mathematical models produced by the KPSO approach are computationally less expensive than the models derived using PNN. Our KPSO approach also requires only a few of all the available features of the investigated data set for classification. We have compared our results with the PNN model on five real-world data sets. In all cases our approach presents competitive classification accuracy compared to the PNN approach. The mathematical model derived using our approach indicates which features of the data set are useful for classification.

As further research, it remains to be seen how this approach behaves for the data sets having missing values and categorical data.