Education Sector Using Data Mining Techniques Education Essay



This paper reviews a study that predicts the academic performance and retention of students using data mining methods, which analyse existing data in order to make predictions about the future. The reviewed paper investigated whether data mining tools are appropriate for predicting academic performance and improving retention rates, and presented its results using a case study of an Indian business school.

Keywords: data mining, neural network, discriminant analysis, logistic regression, regression analysis

1. Introduction

Predicting student academic performance accurately is valuable to any university that can achieve it, because it helps with planning and improvement in different contexts within the school environment. If a university can distinguish outstanding students from those who are likely to fail, admission becomes much easier and study resources can be allocated according to individual needs. This also helps to maximise the student retention rate.

The importance of student retention and academic success cannot be overemphasised; they are issues of constant concern to every institution. Many students leave school without finishing their intended programme of study. By predicting retention and performance, a school can give these students the support they need to achieve their full potential and put measures in place that prevent them from dropping out.

1.1 The Business Problem

The general business problem addressed in the reviewed paper is this: because business schools have recently been dealing with a rise in the number of applications they receive, it has become necessary to measure the potential of these students with regard to academic performance, since this influences the selection process and the admission decision. This helps in admitting the students who are likely to perform better academically and to be retained throughout their studies. The specific business problems addressed in the paper are:

To reduce the number of students admitted while ensuring that only the best are selected

To improve the educational attainment of students by encouraging them to learn and remain in the school until graduation

To differentiate the level of support needed by each student

To reduce the number of school dropouts

To avoid admitting a student who is less likely to perform well over a student who has more potential

1.2 Data Mining Problem

The business problem was transformed into a classification data mining problem using Neural Networks and traditional statistical techniques.

1.3 Data Mining Methodology

The first data mining method applied was a regression model. Academic performance was set as the dependent variable, and a regression model expresses the statistical relation in which the dependent variable Y changes with the predictor variable X. Neural networks, discriminant analysis and logistic regression were also used, and the performance of all these techniques was compared.

To inform the admission decision, the syllabuses of different programmes were considered, and factor analysis was used to categorise the make-up of the business school syllabus. With this information, a level of importance was established for the various components of the admission process. The paper also contained a full description of the various statistical techniques used. SAS Enterprise Miner 9.1 was used for the analysis.
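As a rough illustration of how a regression model relates Y to a single predictor X, the sketch below fits a least-squares line in plain Python. The test scores and performance values are made-up numbers, not data from the reviewed paper, and the function name `fit_simple_regression` is hypothetical; the paper itself used SAS Enterprise Miner and several predictors.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical admission test scores (X) and performance index values (Y).
test_scores = [50.0, 60.0, 70.0, 80.0, 90.0]
performance = [2.0, 2.4, 2.8, 3.2, 3.6]

b0, b1 = fit_simple_regression(test_scores, performance)

# Predicted performance for a hypothetical applicant with a test score of 75.
predicted = b0 + b1 * 75.0
```

Here the slope `b1` is the expected change in Y for a one-unit change in X, which is the relation the regression model is meant to capture.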


1.4 Data Pre-Processing

Different diagnostics were used to pre-process the data. The h-statistic and Cook's statistics were calculated to check for outliers and influential observations, but none were found. The data was also divided into training and validation sets: it was split into five subsets, four of which were used for modelling and one for validation.
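The five-subset split can be sketched as follows. This is a minimal illustration using only the Python standard library; the random shuffling, the seed, the function name `split_into_folds` and the choice of which subset to hold out are assumptions, since the reviewed paper's exact procedure is not described here.

```python
import random

def split_into_folds(n_records, n_folds=5, seed=0):
    """Shuffle record indices and deal them into n_folds near-equal subsets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps fold sizes within one record of each other.
    return [indices[i::n_folds] for i in range(n_folds)]

folds = split_into_folds(244)  # 244 student records, as in the reviewed paper

# Hold out one subset for validation and model on the remaining four.
validation = folds[0]
training = [i for fold in folds[1:] for i in fold]
fold_sizes = [len(fold) for fold in folds]
```

With 244 records the subsets cannot be exactly equal, so four of them hold 49 records and one holds 48.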

Figure 1. Knowledge Discovery Process

2. Summary of Results in Paper Reviewed

The business school data consisted of records for five different groups of management students who graduated from the business school in different years, a total of 244 student records.

There were also six attributes: test score, cumulative performance index, grades (core subjects taken), academic results, group discussion and interview score, and work experience.

The data was analysed and predictions were made using neural networks and regression, and these two methods were compared on both a continuous and a categorical scale.

Students' academic performance was classified into two categories, successful and marginal. In the study, 94 students fell under "Successful" and 150 under "Marginal". The target is binary, with two values, "0" and "1".
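The class counts give a simple reference point for judging any classifier. The short calculation below is an illustrative addition, not a result from the reviewed paper: it computes the share of each class and the accuracy of a naive rule that always predicts the larger class.

```python
# Class counts reported in the reviewed paper.
successful = 94
marginal = 150
total = successful + marginal  # 244 student records

share_successful = successful / total
share_marginal = marginal / total

# Accuracy of always predicting the majority class ("Marginal").
majority_baseline = max(successful, marginal) / total
```

A useful classification model should beat this baseline of roughly 61% accuracy.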

The regression analysis results showed that the model was significant. In the forecasting process, the variables graduate academic results, test score and group discussion/interview were statistically significant in predicting student academic performance, but work experience was not. The model was also cross-validated using the training and testing data to obtain the average mean square error.

The logistic regression model was estimated by the maximum likelihood method and used to classify students' performance as successful or marginal.
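To make the idea of maximum likelihood estimation concrete, the sketch below fits a one-predictor logistic regression by gradient ascent on the log-likelihood. It is only an illustration under stated assumptions: the scores and labels are invented, the learning rate and iteration count are arbitrary, the function names are hypothetical, and the reviewed paper's software would have used its own optimiser and more predictors.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, n_iter=5000):
    """Estimate (b0, b1) by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # Residuals: observed label minus predicted probability.
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        # The residual sums are the gradient of the log-likelihood.
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

# Hypothetical standardised test scores; 1 = successful, 0 = marginal.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(scores, labels)

# Classify as successful when the predicted probability exceeds 0.5
# (the 0.5 cut-off is an assumption, not taken from the paper).
prob_high = sigmoid(b0 + b1 * 2.0)
prob_low = sigmoid(b0 + b1 * -2.0)
```

The invented labels overlap on purpose: if the two classes were perfectly separated by the score, the maximum likelihood estimates would not be finite.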

Using the continuous scale for analysing academic performance, the neural network model performed better than regression analysis. However, no significant differences were found when logistic regression and discriminant analysis were compared with the neural network.

The results for logistic regression and discriminant analysis were compared with those of neural networks, and no statistically significant differences were found. A low value of R² (= 0.24) was obtained, which means the predictors did not capture much useful information. The variables academic results, test score and group discussion/interview were significant predictors of academic performance.

When only logistic regression and discriminant analysis were compared, only academic results and test scores were important in distinguishing between the successful and marginal students. This implies that these two variables are the most essential in measuring academic performance at the business school.

The management subjects were grouped into two factors, quantitative and qualitative. The analysis showed that academic results and test score were important predictors of the quantitative factor, while academic results, group discussion and interview, and work experience were predictors of the qualitative factor. Work experience, however, was found not to be relevant to the cumulative performance index.

When the models were compared, the neural network model performed better than regression analysis for prediction, but for classification the neural network model was comparable to discriminant analysis and logistic regression.

Figure 2. Neural Network Modelling Process with special reference to multilayered feed forward neural network

2.1 Factor Model

In the reviewed paper, the data was also analysed using factor analysis to identify the essential constructs of the school curriculum, partly because business school students take a variety of core subjects and the admission decision process is designed with this in mind.

The results show that undergraduate academic results and test score were the more important variables in predicting the quantitative factor, whereas for the qualitative factor only work experience, group discussion and interview, and undergraduate academic results were significant.

2.2 Critical Comments and Evaluation of Paper

The reviewed paper shows that only a limited amount of data pre-processing was carried out on the initial data. The data set also seems too small to support a firm prediction. Although the study concerns one particular business school, a wider range of data from the same school, or a study that includes other business schools, is needed to verify the findings, even though much other research in this area has produced comparable results.

3. Conclusion

Data mining methods have been used to solve a wide variety of practical problems in the education sector, and predictive data mining has become an important tool for researchers in the sector. Understanding the main issues underlying these predictive methods is necessary for applying them and interpreting their results.

In the reviewed paper, the academic performance of business school graduates was analysed by applying neural networks and regression to real-life data from the school. The results were compared to find out which model gave the best output for the classification problem and for the prediction problem.

The use of data mining in the education sector makes it possible to use the knowledge derived from the analysis to promote student retention and academic success, by helping to identify the issues behind academic failure or success, such as individual factors, self-esteem, family issues and community issues. Once these factors are known, the likelihood of using them to encourage academic success and student retention can be analysed.

Such information also benefits the school. It supports the assessment of the school's tutorial strategies and the students' current learning conditions, and shows how these can be made more student-centred, for example through special training that empowers school staff to make progressive decisions and to encourage such students to take part in the support programmes available in the school.