Accuracy Achieved Using Decision Tree Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

A Data Mining is nothing but extracting meaningful data from a huge data set. A class of database applications that look for hidden patterns in a group of data that can be used to predict future. Data warehousing is a process that must occur before any data mining can take place. In other words, data warehousing is the process of compiling and organizing data into one common database. Classification trees are used for the kind of Data Mining problem which are concerned with prediction. For e.g., whether the weather on a particular day will be "sunny", "rainy" or "cloudy". Popular classification techniques include decision trees, neural networks, nearest neighbour classifier, case-based reasoning, Genetic algorithm, Rough set approach, and Fuzzy set approaches. This survey paper describes various classification method in data mining for the user to detect fraud credit card, to predict dynamic stock exchange, using Bayes, KNN, and SVM. Multiple different classification algorithms are trained on the result set to build final classifier model based on tree. The best accuracy was obtained from decision tree compared to other classifiers.

Keywords -Decision Tree, Bayes Classifier, Support Vector Machine, KNN


In this paper, we present a data-based framework for user modelling that uses both unsupervised and supervised classification to discover and capture effective or ineffective student behaviours while interacting with exploratory learning environments. A classification in data mining generally defines to predict categorical class labels (discrete or nominal) also classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data.

A Machine learning is a self-configuring data structures that allow a computer to do things that would be called "intelligent" if a human did it. The process of machine learning is similar to that of data mining. Both systems search through data to look for patterns. However, instead of extracting data for human comprehension -- as is the case in data mining applications -- machine learning uses that data to improve the program's own understanding.

Machine learning programs detect patterns in data and adjust program actions accordingly.  For example, Facebook's News Feed changes according to the user's personal interactions with other users. If a user frequently tags a friend in photos, writes on his wall or "likes" his links, the News Feed will show more of that friend's activity in the user's News Feed due to presumed closeness.

Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects, and desired outputs. The output of the function can be a continuous value (called regression), or can predict a class label of the input object (called classification). The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this, the learner has to generalize from the presented data to unseen situations in a "reasonable" way.

Unsupervised learning is a method of machine learning where a model is fit to observations. It is distinguished from supervised learning by the fact that there is no a priori output. In unsupervised learning, a data set of input objects is gathered. Unsupervised learning then typically treats input objects as a set of random variables. A joint density model is then built for the data set.

Existing Systems

Naïve Bayes

The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data. The naïve Bayes classifier is used to exploit the unlabelled data by using with EM algorithm. The Expectation Maximization (EM) algorithm is used to consistently reduce the average classification error rates when the amount of labelled data is small.

The EM is a general framework for estimating the parameters of a probability model when the data has missing values. This algorithm can be applied to minimally supervised learning, in which the missing values correspond to missing labels of the examples. The Naive Bayes algorithm affords fast, highly scalable model building and scoring. It scales linearly with the number of predictors and rows. The build process for Naive Bayes is parallelized. Table 1 is used to construct the below summary for naïve Bayes,

Correctly Classified Instances 201 77.0115 %

Incorrectly Classified Instances 60 22.9885 %

Kappa statistic 0.4631

Mean absolute error 0.266

Root mean squared error 0.3822

Relative absolute error 58.9747 %

Root relative squared error 81.6432 %

Coverage of cases (0.95 level) 99.2337 %

Mean rel. region size (0.95 level) 86.9732 %

Total Number of Instances 261

Support Vector Machine s

The support vector machine (SVM) is a training algorithm for learning classification and regression rules from data, for example the SVM can be used to learn polynomial, radial basis function (RBF) and multi-layer perceptron (MLP) classifiers. It is so flexible to choose a similarity function. When dealing with large amount of data sparseness of solution are arrived. To separate hyper plane only support vector are used to specify them. They are capable of handling large features spaces where complexity and dimensionality does not depend on them. Over fitting can be controlled by soft margin approach. A property named nice math property is a simple convex optimization problem which is guaranteed to converge to a single global solution. Procedures to be followed,

Transform data to the format of an SVM package

Conduct simple scaling on the data

Consider the RBF kernel

Use cross-validation to find the best parameter C and γ

Use the best parameter C and γ to train the whole training set [1] 


In SVM, the Process Initialization Parameter used are Sample Selection , Process Selection, Hierarchical Clustering, Classification Input, Use SVM Classification Editor, Use Classification File , Kernel Matrix Construction, Diagonal Factor, Threshold, Constraints, Positive Constraint, Negative Constraint.

SVMs cannot represent the score of all companies as a simple parametric function of the financial ratios, since its dimension may be very high. If the number of features is much greater than the number of samples, the method is likely to give poor performances. SVMs do not directly provide probability estimates, these are calculated using five-fold cross-validation, and thus performance can suffer.

K-Nearest Neighbour(K-NN):

K-nearest-neighbour classification was developed from the need to perform discriminant analysis when reliable parametric estimates of probability densities are unknown or difficult to determine. The general idea of this method is to divide the data sample into a number of v folds (randomly drawn, disjointed sub-samples or segments).

For a fixed value of k, we apply the KNN model to make predictions on the vth segment (i.e., use the v-1 segments as the examples) and evaluate the error. The kNN algorithm, like other instance-based algorithms, is unusual from a classification perspective in its lack of explicit model training.

While a training dataset is required, it is used solely to populate a sample of the search space with instances whose class is known. No actual model or learning is performed during this phase; for this reason, these algorithms are also known as lazy learning algorithms. Approximating real valued or discrete-valued target functions.

KNN consists of training data. When a new query instance is encountered, a set of similar related instances is retrieved from memory and used to classify the new query instance. Construct only local approximation to the target function that applies in the neighbourhood of the new query instance.

They never construct an approximation designed to perform well over the entire instance space. Instance-based methods can use vector or symbolic representation to construct the set in KNN. Only approximate definition are determined for neighbouring instances. The main demerit of instance-based methods is that the costs of classifying new instances can be high nearly all computation takes place at classification time rather than learning time. Data are represented in a vector space.

In KNN a method LVQ (Learning Vector Quantization) is used to find the smallest distance of the unknown vector from a set of reference vectors is sought. Kohonen Self Organizing Maps Perform a topologically ordered mapping from high dimensional space onto two-dimensional space. The centroids (units) are arranged in a layer (two dimensional space), units physically near each other in a two-dimensional space respond to similar input. Thus vector quantization and Maps are organized to find the distance between two vector space.


Bagging" stands for "bootstrap aggregating. It was proposed by Breiman (1996). It is a method of combining multiple predictors. Consider a training set D ={(yi ,xi ):i =1,2,..,n},which we need to make prediction for an observation with x. Sample B data sets, each consisting of n observations randomly selected from D with replacement. Then {D1, D2... DB} form B quasi replicated training sets. Train a machine or model on each Db. for b=1... B and obtain a sequence of B predictions. The final aggregate classifier can be obtained by averaging (regression) or majority voting (classification).

Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a "base" learning algorithm. Using various experiments it is analyzed that only with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.


In order to improve the learning accuracy and also to reduce the computational cost select a particular case samples from a database which contain a large amount of data. This paper aim is to select a sample from the original database as a subset and treated as training set. A new mechanism accuracy learning when compared with the existing random selection, this classifies the maximum unclear data from the original database.

The main advantage of this mechanism is adjustment is made with a minimized work when adding the sample set to the training set. This advantage is confirmed via the theoretical analysis of the leaf-nodes' frequency in the decision trees. A tree is constructed which helps to label the data & filter them from the original database. Furthermore, theoretical and experimental analysis is shown such that our decision tree performs more when compared with various methods.

Decision Trees

A new sample selection mechanism has been initially proposed for the fuzzy decision tree induction. This paper makes an attempt to develop a maximum uncertainty-based sample selection mechanism and then to apply it to the fuzzy decision tree learning. This mechanism selects the samples based on the principle of maximal classification ambiguity.

The decision tree generated from the selected samples usually has a better performance than that from the original database. Crisp decision tree, when an unseen new instance is matching to the decision tree, the matching output of the decision tree is an exact class because only one rule matches the instance. The major advantage of this mechanism is that the adjustment of the fuzzy decision tree is minimized when adding the selected.


Step 1: Data partitions. Each data set is divided into three parts: training set, instance pool (short for pool), and testing set. Training set is used to build a classifier/learner which is used to select next instance; instance pool is a set of Unlabelled instances which provide candidate instances for the learner to select; testing set are used to test the performance of the current classifier. In our experiments, we choose one from the fivefold as testing set, one fold of the remaining as the training set, and the others as the instance pool.

Step 2: Training a fuzzy decision tree by Min-A. In our experiments, each attribute is discredited into two values by Kohonet's feature maps algorithm, and then each attribute is fuzzified into two linguistic terms by Triangular fuzzification method. The final data set is the 0.45-strong set of the fuzzified data sets, which means that the cut level is 0.45. In the growing of the decision tree, the truth level threshold is set as 0.85, which means that a leaf node is produced when the classification accuracy of a node is bigger than or equal to 0.85.

Table 1. Diabetes Diagnosis Database


PG Concentration

Diastolic BP

Tri Fold Thick

Serum Ins


DP Function




















































































Step 3: Estimating the memberships of each instance in the pool to each class by using the newly built fuzzy decision tree and getting its classification ambiguity.

Step 4: Selecting the instance with the maximum classification ambiguity to label. Then, moving it to the training set from the pool.

Step 5: If the selected samples are less than the predefined size, then select next instance following the steps Step 2-4; otherwise, train a decision tree using the labelled samples and test the tree using testing set. Using an open source popular data mining tool we have taken various attributes from diabetes diagnosis database and we have compared with the

existing system and proved such that we have given more accurate results in decision tree. We have chosen the best attribute through the entropy, information gain formulae and constructed a graph accordingly, here we have calculated the accuracy details for decision tree which gives more accurate than existing system.

Correctly Classified Instances 199 76.2452 %

Incorrectly Classified Instances 62 23.7548 %

Kappa statistic 0.4342

Mean absolute error 0.3125

Root mean squared error 0.4059

Relative absolute error 69.2946 %

Root relative squared error 86.7189 %

Coverage of cases (0.95 level) 98.8506 %

Mean rel. region size (0.95 level) 89.8467 %

Total Number of Instances 261

=== Confusion Matrix ===

a b <-- classified as

47 36 | a = Sick 26 152 |

b = Healthy


This paper proposes a sample selection method based on the maximal classification ambiguity in fuzzy decision tree induction and gives an analysis on the significance of the sample selection methodology. It selects the instance with maximal evaluation ambiguity when the instance is matching to the fuzzy decision tree. The selected instance can minimize the adjustment of the generated fuzzy decision tree and finally build a fuzzy decision tree with high performance gradually.

We have analysed with various existing algorithm such as naïve Bayes, bagging, SVM, etc. such that decision tree produce more appropriate result which is proved through the given summary. The graph, accuracy details are produced with the help of an open source data mining tool. . Furthermore, theoretical and experimental analysis is shown

such that our decision tree performs more when compared with various methods.