Analysis of Undergraduate Student Performance using Feature Selection Techniques on Classification Algorithms

By Matt Swarbrick

✅ Paper Type: Free Essay	✅ Subject: Computer Science
✅ Wordcount: 3742 words	✅ Published: 18 May 2020


Abstract— Educational data mining is employed in various fields and draws on attributes of student records such as name, attendance, class test, lab test, spot test, assignment, and result. This research focuses on predicting the performance of undergraduate students in computer science and engineering with a predictive data mining model that combines feature selection methods with classification algorithms. Feature selection is applied during data preprocessing to identify the most important attributes, so that student performance can be analyzed and evaluated more effectively. We collected the records of 800 final-year undergraduate students from North Western University. We used four feature selection methods: genetic algorithms, gain ratio, Relief, and information gain. We also used five classification algorithms: K-Nearest Neighbor, Naïve Bayes, Bagging, Random Forest, and J48 Decision Tree. The experimental results show that the Gain Ratio feature selection method with 10 selected features gives the best result, 87.375% accuracy, with the k-NN classifier. The different feature selection techniques are compared on student performance prediction based on students' academic performance.

Keywords— data mining; feature selection; genetic algorithm; gain ratio; Relief; information gain; classification; student performance.

I. Introduction

Educational data mining (EDM) is a research field concerned with the application of data mining to educational data. Classification is a very useful data exploration technique for predicting students' academic performance. In data mining, knowledge discovery refers to the broad process of finding knowledge in data. Knowledge discovery involves steps such as data selection, data cleaning, data transformation, data integration, pattern discovery, and pattern evaluation. Data mining is widely applied in education systems, and educational data mining is an emerging field that can be applied effectively there. It uses several ideas and concepts, such as association rule mining, classification, and clustering [7]. The knowledge that emerges can be used to better understand students' promotion rate, retention rate, transition rate, and success [1].

Data mining systems play a pivotal role in measuring improvement in students' performance.

One of the most advanced algorithms for feature selection is the genetic algorithm, a stochastic method for function optimization based on the mechanics of natural genetics and biological evolution [2]. The first step is to create and initialize the individuals in the population; because the genetic algorithm is a stochastic optimization method, the genes of the individuals are usually initialized at random.

In this paper, we use genetic algorithms as a feature selection method to optimize the performance of a predictive model by selecting the most relevant features.

A classifier built on an association rules algorithm can be used to help estimate students' performance. Several data preprocessing techniques can further improve the accuracy of the results. Such techniques find information in the form of patterns and features, known as knowledge.

In this research, we focus mainly on attribute selection. We used four feature selection methods: Genetic Algorithm (GA), Gain Ratio (GR), Relief, and Information Gain (IG). We then compared student performance prediction using five classification algorithms, K-Nearest Neighbor (KNN), Naïve Bayes (NB), Bagging, Random Forest (RF), and J48 Decision Tree, for each feature selection technique.

This paper is structured as follows. Section II discusses previous research on student performance prediction in education and its influencing factors, as well as work on feature selection. Section III describes the steps for implementing the research: data preparation, preprocessing, oversampling, and selecting the best features from the dataset (based on attribute, dimension, and subset). Section IV presents the experimental results and their analysis. Finally, Section V offers conclusions and discusses future work.

II. Related work

Nowadays educational data mining has emerged as a very active research area because many questions in this field remain unexplored. Work on student performance, student behavior analysis, faculty performance, and the impact of these factors on students' final performance needs much more attention.

J. K. Jothi and K. Venkatalakshmi conducted a performance analysis on graduate students' data collected from the Villupuram College of Engineering and Technology. The data covered a five-year period, and clustering methods were applied to address the problem of low scores among graduate students and to raise students' academic performance [3].

Feature selection is a fundamental stage related to classification accuracy. As the dimensionality of the study domain expands, the number of features grows [2], [4].

In a comparison between GA and full model selection (support vector machines and particle swarm model selection) on classification problems, GA gave better performance on problems with high dimensionality and large training sets [5].

Mythili M. S. and Shanavas A. R. applied classification algorithms to analyze and evaluate school students' performance using WEKA. They compared several classification algorithms, namely J48, Random Forest, Multilayer Perceptron, IB1, and Decision Table, on data collected from the student management system [6].

Noah, Barida and Egerton conducted a study to evaluate students’ performance by grouping the grading into various classes using CGPA. They used different methods like Neural network, Regression and K-means to identify the weak performers for the purpose of performance improvement [8].

Baradwaj and Pal described data mining techniques that help in the early identification of student dropouts and students who need special attention. They used a decision tree built from information such as attendance, class test, semester, and assignment marks [9].

Ramesh, Parkavi, and Yasodha conducted a study on placement chance prediction, comparing the accuracy of techniques such as Naive Bayes Simple, Multilayer Perceptron, SMO, J48, and REPTree. From the results they concluded that the Multilayer Perceptron technique is more suitable than the other algorithms [10].

III. Proposed Method

The main aims of the proposed approach are to increase classification accuracy and to identify the essential features. The features are first evaluated to discover the best set of attributes. This task is carried out using state-of-the-art feature selection algorithms, namely Genetic Algorithms (GA), Gain Ratio (GR), Relief, and Information Gain attribute evaluation (IG).

Fig 1: Proposed method (flowchart: data collection; data preprocessing with data cleaning and transformation; feature selection methods, including RELIEF; classifiers KNN, NB, Bagging, RF, and J48; evaluation; final model)

Finally, a subset of attributes is selected for the classification stage. Attribute removal plays an important role in many classification methods. Five classification methods, which are considered very strong at solving non-linear problems, are then chosen to estimate the class probability: K-Nearest Neighbor (KNN), Naïve Bayes (NB), Bagging, Random Forest (RF), and J48 Decision Tree.

A. Data Selection

The data used is a Students' Academic Performance dataset consisting of 800 student records with 15 features. The data were collected from the Department of Computer Science and Engineering, North Western University, Khulna, Bangladesh. The dataset attributes are shown in Table I.

This research used 15 attributes, including student id, attendance, assignment, class test, lab test, spot test, skill, central viva, extra-curricular activities, quiz test, project/presentation, and backlog. Grades are assigned to all students using the following mapping: A (91% – 100%), B (71% – 90%), C (61% – 70%), D (41% – 60%), F (0% – 40%).

The final semester result and final CGPA are assigned to all students using the following mapping: A (75% – 100%), B (70% – 74%), C (65% – 69%), D (60% – 64%), F (0% – 60%).
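As an illustration of the two grade mappings above, the following Python sketch converts a percentage mark to a letter grade. The names `COURSEWORK_BANDS`, `FINAL_RESULT_BANDS`, and `to_grade` are hypothetical, and the treatment of marks that fall on or between the stated band edges (for example 60% under the second mapping) is an assumption, since the text does not specify it.

```python
# Each band is (lower bound in percent, letter grade), listed from highest to lowest.
# Assumption: a mark belongs to the highest band whose lower bound it reaches,
# so 60% under the second mapping is treated as D rather than F.
COURSEWORK_BANDS = [(91, "A"), (71, "B"), (61, "C"), (41, "D"), (0, "F")]
FINAL_RESULT_BANDS = [(75, "A"), (70, "B"), (65, "C"), (60, "D"), (0, "F")]


def to_grade(percent, bands):
    """Return the letter grade for a percentage mark under the given bands."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for lower, letter in bands:
        if percent >= lower:
            return letter
    raise AssertionError("unreachable: the last band starts at 0")


print(to_grade(85, COURSEWORK_BANDS))    # B
print(to_grade(85, FINAL_RESULT_BANDS))  # A
```

The same mark can therefore receive different letters depending on which mapping applies to the attribute.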

TABLE I. LIST OF ATTRIBUTE DATASET

Attributes No.	Attributes Name	Possible Values
1	Student Id	Id of the student
2	Attendance	A,B,C,D,F
3	Assignment	A,B,C,D,F
4	Class test	A,B,C,D,F
5	Lab test	A,B,C,D,F
6	Spot test	A,B,C,D,F
7	Skill	A,B,C,D,F
8	Central viva	A,B,C,D,F
9	Extra curriculum activities(ECA)	YES/NO
10	Quiz test	A,B,C,D,F
11	Project/Presentation	YES/NO
12	Backlog	YES/NO
13	Final semester result	A,B,C,D,F
14	Final CGPA	A,B,C,D,F
15	Class	Excellent ,Very Good, Good, Average ,Poor

Extra-curricular activities are divided into two classes: yes (1) and no (0). Project/Presentation is divided into two classes: yes (1) and no (0). Backlog is also divided into two classes: yes (0) and no (1).

B. Data Preprocessing

1) Data cleaning and transformation

Data cleaning is the process of identifying and removing corrupt records from a record set. In our student data, noisy records were removed from the dataset.

Data transformation prepares the selected data in a format that is ready to process. In our experiment, we transformed the student grade data by discretizing it into categorical classes. The class attribute is divided into five classes: Excellent, Very Good, Good, Average, and Poor. The final class is based on the index ranges shown in Table II.

TABLE II. CATEGORICAL CLASSES

Class	Range
Excellent	(11.8 – 13)
Very Good	(9.2 – 11.7)
Good	(7.9 – 9.1)
Average	(5.3 – 7.8)
Poor	(0 – 5.2)

2) Feature Selection

Feature selection is also called attribute selection. We used four feature selection methods: genetic algorithms, gain ratio, Relief, and information gain. These methods are used to find the optimal features for more accurate classification.

Genetic Algorithms (GA). Genetic algorithms are search heuristics that simulate the processes of evolution and natural selection. Each attribute in the data set is treated as a gene, and genes are arranged in linear sequences called chromosomes [11].

A GA begins by initializing a population of candidate solutions. Three operators, selection, crossover, and mutation, are then applied to the population. A fitness function evaluates the candidates, and the process repeats until an optimal solution is reached.
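The GA loop described above can be sketched in Python as follows. This is a minimal illustration rather than the configuration used in the experiments: the function name, the tournament selection scheme, the parameter values, and the toy fitness function (with its hypothetical set of "relevant" attribute indices) are all assumptions. In practice the fitness of a feature mask would typically be the accuracy of a classifier trained on the selected attributes.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    # Initialization: every gene (one per attribute) is set at random.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Selection: the fitter of two randomly drawn individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Crossover: cut both parents at one point and swap the tails.
                cut = rng.randint(1, n_features - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # Mutation: flip each gene with a small probability.
                offspring.append([1 - g if rng.random() < mutation_rate else g
                                  for g in child])
        population = offspring[:pop_size]
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best


# Toy fitness (assumption): reward hypothetical relevant attributes, penalize extras.
RELEVANT = {0, 2, 3, 6}


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits - 0.5 * (sum(mask) - hits)


best_mask = genetic_feature_selection(n_features=13, fitness=toy_fitness)
print([i for i, bit in enumerate(best_mask) if bit])  # indices of selected attributes
```

Fixing the seed makes a run repeatable, which helps when comparing the selected subsets across classifiers.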

Gain Ratio (GR). The GR is computed as the information gain divided by the entropy of the attribute's values:

GainRatio(Class, Attribute) = InfoGain(Class, Attribute) / H(Attribute) [12]

where H(Attribute) is the entropy of the attribute. GR measures the relative worth of an attribute with respect to the class.

Relief. In the Relief algorithm, a good discriminating attribute is one that has similar values within the same class and dissimilar values across different classes. Relief uses a nearest-neighbor method to calculate a relevance score for each attribute. It evaluates the worth of an attribute by repeatedly sampling an instance and comparing the value of the attribute with that of the nearest instance of the same class and of a different class.
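A minimal Python sketch of this scoring idea for categorical attributes is given below. It is a simplified form of basic Relief, not the implementation used in the experiments: the function name `relief_weights`, the use of Hamming distance, the choice of a single nearest miss from any other class, and the toy records are assumptions made for illustration.

```python
import random


def relief_weights(rows, labels, n_samples=50, seed=0):
    """Estimate one relevance weight per categorical attribute (basic Relief)."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    weights = [0.0] * n_attrs

    def distance(a, b):
        # Hamming distance: number of attributes on which two instances differ.
        return sum(x != y for x, y in zip(a, b))

    for _ in range(n_samples):
        i = rng.randrange(len(rows))
        hits = [j for j in range(len(rows)) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(len(rows)) if labels[j] != labels[i]]
        if not hits or not misses:
            continue  # scoring needs both a same-class and an other-class neighbor
        near_hit = min(hits, key=lambda j: distance(rows[i], rows[j]))
        near_miss = min(misses, key=lambda j: distance(rows[i], rows[j]))
        for a in range(n_attrs):
            # Penalize attributes that differ from the nearest same-class instance,
            # reward attributes that differ from the nearest other-class instance.
            weights[a] -= (rows[i][a] != rows[near_hit][a]) / n_samples
            weights[a] += (rows[i][a] != rows[near_miss][a]) / n_samples
    return weights


# Toy records (assumption): attribute 0 separates the classes, attribute 1 is noise.
rows = [("A", "YES"), ("A", "NO"), ("F", "YES"), ("F", "NO")]
labels = ["Good", "Good", "Poor", "Poor"]
weights = relief_weights(rows, labels)
print(weights)  # attribute 0 scores higher than attribute 1
```

Attributes are then ranked by weight, and the top-ranked ones are kept.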

Information Gain (IG). Information gain is a measure based on entropy. The formula for IG is:

InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute) [13]

where H(Class) is the total entropy of the class, and H(Class | Attribute) is the conditional entropy of the class given the attribute.
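The information gain and gain ratio formulas in this section can be checked on a small example with the Python sketch below. The function names and the four toy records are assumptions for illustration, and entropy is measured in bits (base-2 logarithm), which the text does not state explicitly.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def info_gain(attribute, labels):
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)."""
    total = len(labels)
    conditional = 0.0
    for value in set(attribute):
        # Class labels of the instances that take this attribute value.
        subset = [c for a, c in zip(attribute, labels) if a == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(attribute, labels):
    """GainRatio(Class, Attribute) = InfoGain(Class, Attribute) / H(Attribute)."""
    attribute_entropy = entropy(attribute)
    if attribute_entropy == 0:
        return 0.0  # a constant attribute cannot separate the classes
    return info_gain(attribute, labels) / attribute_entropy


# Toy records (assumption): attendance determines the class, backlog does not.
labels = ["Good", "Good", "Poor", "Poor"]
attendance = ["A", "A", "F", "F"]
backlog = ["YES", "NO", "YES", "NO"]

print(info_gain(attendance, labels), gain_ratio(attendance, labels))
print(info_gain(backlog, labels), gain_ratio(backlog, labels))
```

Dividing by H(Attribute) reduces the bias of information gain toward attributes with many distinct values, such as an identifier column.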

C. Classification

We used five classification algorithms: K-Nearest Neighbor, Naïve Bayes, Bagging, Random Forest, and J48 Decision Tree. These algorithms mine the data produced by the feature selection step.

K-Nearest Neighbor (KNN). The K-Nearest Neighbor algorithm, called KNN, is a widely used classification algorithm.

KNN works on the assumption that instances lying close together in the feature space belong to the same class. Hence, the distance from the unknown sample to every data point is computed. Euclidean distance or Hamming distance is used, depending on the data type of the attributes. A single value of K is given, which sets the number of nearest neighbors that determine the class label of the unknown sample. If K = 1, the method is called nearest neighbor classification [14].
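For categorical attributes such as letter grades, the procedure can be sketched in Python with Hamming distance and a majority vote. The function names and the six toy records are assumptions for illustration only.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority class among the k training instances nearest to query."""
    # Rank training instances by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_rows)),
                     key=lambda i: hamming(train_rows[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy records (assumption): (attendance, class test, extra-curricular activities).
train_rows = [("A", "A", "YES"), ("A", "B", "YES"), ("B", "A", "NO"),
              ("D", "F", "NO"), ("F", "D", "NO"), ("F", "F", "YES")]
train_labels = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]

print(knn_predict(train_rows, train_labels, ("A", "A", "NO"), k=3))  # Good
print(knn_predict(train_rows, train_labels, ("F", "F", "NO"), k=3))  # Poor
```

With `k=1` the same function performs nearest neighbor classification.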

Naïve Bayes (NB). Naive Bayes is a classification algorithm for binary (two class) and multi-class classification problems.

Bayes' theorem provides an equation for calculating the posterior probability P(c | x) from P(c), P(x) and P(x | c):

P(c | x) = P(x | c) * P(c) / P(x) [15]

• P(c | x): the posterior probability of class (c, target) given predictor (x, attributes).

• P(c): the prior probability of class.

• P(x | c): the likelihood, which is the probability of predictor given class.

• P(x): the prior probability of predictor.
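The quantities above can be combined as in the following Python sketch. The probabilities are made-up numbers for illustration, the function name is hypothetical, and the likelihood of several attributes is taken as the product of per-attribute probabilities, which is the "naive" independence assumption of the classifier.

```python
from math import prod


def posterior(priors, likelihoods):
    """Apply Bayes' theorem: P(c | x) = P(x | c) * P(c) / P(x)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(x), by the law of total probability
    return {c: joint[c] / evidence for c in joint}


# Made-up numbers (assumption): P(c) and per-attribute P(x_i | c) for one student.
priors = {"Good": 0.6, "Poor": 0.4}
per_attribute = {"Good": [0.5, 0.8], "Poor": [0.1, 0.3]}

# Naive independence assumption: P(x | c) is the product of the P(x_i | c).
likelihoods = {c: prod(p) for c, p in per_attribute.items()}

result = posterior(priors, likelihoods)
print({c: round(p, 3) for c, p in result.items()})  # {'Good': 0.952, 'Poor': 0.048}
```

The predicted class is the one with the largest posterior, here "Good".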

Bagging. Bagging is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method [16].
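The idea can be sketched in Python as bootstrap resampling followed by a majority vote. This is an illustration under stated assumptions rather than the WEKA Bagging configuration used in the experiments: the base learner here is a 1-nearest-neighbor rule chosen for brevity, and the function names, number of models, and toy records are hypothetical.

```python
import random
from collections import Counter


def fit_one_nn(rows, labels):
    """Base learner (assumption): 1-nearest-neighbor under Hamming distance."""
    def predict(query):
        nearest = min(range(len(rows)),
                      key=lambda i: sum(x != y for x, y in zip(rows[i], query)))
        return labels[nearest]
    return predict


def bagging_fit(rows, labels, fit=fit_one_nn, n_models=11, seed=0):
    """Train n_models base learners on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        models.append(fit([rows[i] for i in idx], [labels[i] for i in idx]))

    def predict(query):
        votes = Counter(model(query) for model in models)
        return votes.most_common(1)[0][0]
    return predict


# Toy records (assumption): (attendance, class test, extra-curricular activities).
rows = [("A", "A", "YES"), ("A", "B", "YES"), ("B", "A", "NO"),
        ("D", "F", "NO"), ("F", "D", "NO"), ("F", "F", "NO")]
labels = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]

predict = bagging_fit(rows, labels)
print(predict(("A", "A", "YES")))
```

Because each model sees a different resample, individual errors tend to be averaged out by the vote, which is the source of the variance reduction mentioned above.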

Random Forest (RF). Random forests are an ensemble learning method for classification. They correct for decision trees' habit of overfitting to their training set.

A random forest is a combination of different decision trees, used to classify data samples into classes. It is a commonly used statistical technique for classification. The worth of each individual tree is not essential; the purpose of the random forest is to reduce the error rate of the whole forest. The error rate depends on two factors: the correlation between any two trees and the strength of each tree [17].

J48 Decision Tree. J48 is an algorithm for generating a decision tree. It is the WEKA implementation of C4.5, the algorithm developed by Ross Quinlan. The algorithm uses a greedy technique to induce decision trees for classification and supports reduced-error pruning. J48 can handle both discrete and continuous attributes, attributes with differing costs, and training data with missing attribute values [18].

D. Evaluate the results

This experiment compares the accuracy obtained when the attributes selected by each feature selection technique are used with each classification technique. Accuracy is calculated using ten-fold cross-validation, a validation technique in which the data is split into ten parts and each part is used once for testing while the remaining nine are used for training.
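A minimal Python sketch of ten-fold cross-validation is shown below. The function names and the majority-class base learner are assumptions for illustration, and the folds here are formed by a plain random shuffle, whereas a tool such as WEKA may stratify the folds by class.

```python
import random
from collections import Counter


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_val_accuracy(rows, labels, fit, k=10, seed=0):
    """Percentage of instances classified correctly when each is tested once."""
    correct = 0
    for train, test in k_fold_indices(len(rows), k, seed):
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(rows[i]) == labels[i] for i in test)
    return 100.0 * correct / len(rows)


def fit_majority(rows, labels):
    """Placeholder learner (assumption): always predict the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority


# Dummy data (assumption): 800 records, 600 labelled "Good" and 200 "Poor".
rows = [("x",)] * 800
labels = ["Good"] * 600 + ["Poor"] * 200
print(cross_val_accuracy(rows, labels, fit_majority))  # 75.0
```

Every record is used for testing exactly once, so the reported accuracy is computed over the whole dataset.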

WEKA (Waikato Environment for Knowledge Analysis) is used as the data mining tool. It is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License.

MATLAB and WEKA were used for feature selection, preprocessing, and classification.

IV. Experiment and Results

The suggested approach for predicting student performance is carried out in two major phases. In the first phase, the feature space is searched to reduce the number of features and prepare the conditions for the next step. This task is carried out using four dimension reduction techniques, namely the GA, GR, Relief, and IG algorithms. At the end of this step, a subset of features is chosen for the next round.

The optimal features found by these techniques are summarized in Table III. Afterwards, the selected features are used as the inputs to the classifiers. As mentioned previously, five classifiers are used to estimate the probability of success: KNN, NB, Bagging, Random Forest, and J48 decision tree.

Fig 2: Student Data Set

Fig 3: Visualize Class Attributes

In our experiment, we applied the full dataset to each feature selection method; the resulting sets of selected features are listed in Table III.

To evaluate the quality of each feature selection method, we then ran a further experiment that classified the data using the features selected in the prior stage. The selected features were used with five classifiers (K-Nearest Neighbor, Naïve Bayes, Bagging, Random Forest, and J48 Decision Tree), and the resulting performance measures are reported in Table IV.

TABLE III. LIST OF SELECTED FEATURE

FS Method	No. of Selected feature	Selected feature
GA	10	1,2,3,4,7,9,10,11,12,13
GR	10	1,3,4,6,7,8,10,11,12,13
RELIEF	9	1,2,3,4,8,10,11,12,13
IG	10	1,2,3,4,5,6,7,8,12,13

Table IV: Performance measures of selected features

Classifier	Performance index	GA	GR	RELIEF	IG
KNN	Accuracy	84.875	87.375	85.875	77.125
	Precision	0.785	0.874	0.859	0.772
	Recall	0.785	0.874	0.859	0.771
	F-Measure	0.784	0.874	0.858	0.771
	ROC Area	0.882	0.936	0.931	0.875
Naïve bayes	Accuracy	75.5	80.375	80.125	76
	Precision	0.759	0.812	0.807	0.768
	Recall	0.755	0.804	0.801	0.760
	F-Measure	0.756	0.805	0.803	0.760
	ROC Area	0.929	0.948	0.948	0.932
Bagging	Accuracy	76.375	81.5	80	75.25
	Precision	0.769	0.816	0.801	0.752
	Recall	0.764	0.815	0.800	0.753
	F-Measure	0.764	0.815	0.800	0.752
	ROC Area	0.930	0.954	0.954	0.931
RF	Accuracy	81.625	86.75	86.625	81.125
	Precision	0.816	0.808	0.866	0.811
	Recall	0.816	0.809	0.866	0.811
	F-Measure	0.816	0.808	0.866	0.811
	ROC Area	0.951	0.878	0.973	0.957
J48	Accuracy	77.75	79.25	81.875	77
	Precision	0.777	0.792	0.819	0.771
	Recall	0.778	0.793	0.819	0.770
	F-Measure	0.777	0.792	0.819	0.770
	ROC Area	0.903	0.902	0.922	0.879

Table IV shows the accuracy results of the students' performance analysis on the student dataset. Random Forest performs consistently well, giving the highest accuracy with the Relief and IG feature sets, while KNN gives the highest accuracy with the GA and GR feature sets.

We computed the accuracy of the selected features for all four methods using the five classifiers (KNN, Naïve Bayes, Bagging, RF, and J48 Decision Tree). The results are shown in Table V and Figure 4 below:

Table V: Comparison of FS method

FS Method	KNN	NB	Bagging	RF	J48
GA	84.875	75.5	76.375	81.625	77.75
GR	87.375	80.375	81.5	87.375	79.25
RELIEF	85.875	80.125	80	86.625	81.875
IG	77.125	76	75.25	81.125	77
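As a quick check on the comparison, the best classifier for each feature selection method can be read off programmatically. The Python sketch below copies the accuracy values from Table V as printed; the dictionary layout and the function name are assumptions for illustration.

```python
# Accuracy (%) per feature selection method and classifier, copied from Table V.
TABLE_V = {
    "GA": {"KNN": 84.875, "NB": 75.5, "Bagging": 76.375, "RF": 81.625, "J48": 77.75},
    "GR": {"KNN": 87.375, "NB": 80.375, "Bagging": 81.5, "RF": 87.375, "J48": 79.25},
    "RELIEF": {"KNN": 85.875, "NB": 80.125, "Bagging": 80, "RF": 86.625, "J48": 81.875},
    "IG": {"KNN": 77.125, "NB": 76, "Bagging": 75.25, "RF": 81.125, "J48": 77},
}


def best_classifiers(table):
    """Map each method to (top accuracy, classifiers that reach it)."""
    result = {}
    for method, scores in table.items():
        top = max(scores.values())
        result[method] = (top, [clf for clf, acc in scores.items() if acc == top])
    return result


for method, (top, winners) in best_classifiers(TABLE_V).items():
    print(method, top, winners)
```

On these values KNN is best for GA, Random Forest is best for Relief and IG, and GR reaches 87.375% with both KNN and RF.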

Fig. 4: Comparison of accuracy of feature selection

We then identified the highest accuracy achieved by each feature selection method and the classifier that produced it. The results are shown in Table VII and Figure 5 below:

The attributes selected by each of the four feature selection methods were tested with the classification algorithms, and the accuracy results using all selected features were compared. As Table VII shows, the Genetic Algorithm (GA) feature selection method reaches 84.875% accuracy with K-Nearest Neighbor (KNN); Gain Ratio (GR) reaches 87.375% with KNN; Relief reaches 86.625% with Random Forest (RF); and Information Gain (IG) reaches 81.125% with RF, the lowest of the four best results on this dataset.

Gain Ratio (GR) feature selection with K-Nearest Neighbor (KNN) gives the best accuracy, 87.375%.

Table VII: Comparison of FS With Classifiers

Method	Accuracy(%)
GA + KNN	84.875
GR + KNN	87.375
Relief + RF	86.625
IG + RF	81.125

Fig 5: Comparison of accuracy of feature selection

V. Conclusion

This research aims to develop a model to classify student performance. In our experiment, the K-Nearest Neighbor, Naïve Bayes, Bagging, Random Forest, and J48 Decision Tree classification algorithms were applied to the features selected by the genetic algorithm, gain ratio, Relief, and information gain feature selection methods. The experimental results show that the performance of the student prediction model depends greatly on selecting the most relevant attributes from the list of attributes in the student dataset. The Gain Ratio method with the K-Nearest Neighbor classifier showed the best accuracy among the methods tested.

In future work, we will apply more feature selection algorithms and also work on optimization algorithms with larger datasets.

VI. Acknowledgment

We thank our supervisor, who inspired us to research this interesting area, for all his helpful tips. The authors also thank North Western University, Khulna, for providing the student data.

VII. References

[1] A. G. Sagardeep Roy, “Analyzing Performance of Students by Using Data Mining Techniques,” in 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON) , Mathura, 2017.

[3] J.K.Jothi and K.Venkatalakshmi, “Intellectual performance analysis of students by using data mining techniques”, International Journal of Innovative Research in Science, Engineering and Technology, vol 3, Special iss 3, March 2014.

[4] T. B. A. N. A. S. I. H. Kartika Maharani, “Comparison Analysis of Data Mining Methodology and Student Performance Improvement Influence Factors in Small Data Set,” in International Conference on Science in Information Technology (ICSITech), 2015.

[5] J. M. Valencia-Ramirez, J. Raya, J. R. Cedeno, R. R. Suarez, H. J. Escalante, and M. Graff, “Comparison between Genetic Programming and full model selection on classification problems”, Power, Electronics and Computing(ROPEC), IEEE International Autumn Meeting, pp.1-6, 2014.

[6] M.S. Mythili and A.R. Mohamed Shanavas, “An analysis of students’ performance using classification algorithms”, IOSR-JCE, Volume 16, iss 1, Jan. 2014.

[8] OTOBO Firstman Noah, BAAH Barida and Taylor Onate Egerton, “Evaluation of student performance using data mining over a given data space”, International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-2, iss 4, September 2013.

[9] Brijesh Kumar Baradwaj and Saurabh Pal, “Mining educational data to analyze students’ performance”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.

[10] V. Ramesh, P. Parkavi and P. Yasodha, “Performance analysis of data mining techniques for placement chance prediction”, International Journal of Scientific and Engineering Research, Vol. 2, iss 8, August 2011.
