Looking At Weka Tool Computer Science Essay


Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand [13]. The Weka suite contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality, as shown in Fig 3.1.


Fig 3.1 Weka GUI Tool

The original non-Java version of Weka was a Tcl/Tk front-end used to model algorithms implemented in other programming languages, together with data preprocessing utilities in C and a Makefile-based system for running machine learning experiments.

This Java-based version (Weka 3) is used in many different application areas, in particular for educational purposes and research. There are various advantages of Weka:

It is freely available under the GNU General Public License

It is portable, since it is fully implemented in the Java programming language and thus runs on almost any architecture

It provides a comprehensive collection of data preprocessing and modeling techniques

It is easy to use due to its graphical user interface

Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. All techniques of Weka's software are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but other attribute types are also supported).


There are 7 datasets taken from the UCI repository [12], and they are shown in Table 3.1.

Table 3.1 Details of all datasets

[table not recovered; its columns include "No. of Classes"]
A detailed description of each dataset is given in Appendix A.


Data mining consists of various methods and techniques. Different methods serve different purposes, and each method offers its own advantages and disadvantages. Most of the data mining methods used in this thesis belong to the classification category. Classification is a data mining technique used to predict group membership for data instances. There are a number of techniques used for classification; some of the major ones are:

Decision Tree Induction

Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node [2].
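To illustrate how a decision tree algorithm chooses the attribute to test at a node, the following plain-Java sketch computes the entropy of a class distribution and the information gain of a candidate split. This is a simplified illustration of the idea behind C4.5's attribute selection (C4.5 itself refines this into the gain ratio); the counts in main are invented for the example.

```java
public class InfoGain {
    // Entropy H(D) = -sum p_i * log2(p_i) over the class proportions p_i
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Gain(A) = H(D) - sum_j (|D_j|/|D|) * H(D_j), where the D_j are the
    // partitions of D induced by the values of attribute A
    static double informationGain(int[] parentCounts, int[][] childCounts) {
        int total = 0;
        for (int c : parentCounts) total += c;
        double remainder = 0.0;
        for (int[] child : childCounts) {
            int childTotal = 0;
            for (int c : child) childTotal += c;
            remainder += ((double) childTotal / total) * entropy(child);
        }
        return entropy(parentCounts) - remainder;
    }

    public static void main(String[] args) {
        // Made-up example: 9 positive / 5 negative tuples, split by a binary attribute
        int[] parent = {9, 5};
        int[][] children = {{6, 2}, {3, 3}};
        System.out.println("Gain = " + informationGain(parent, children));
    }
}
```

The attribute with the highest gain (or gain ratio, in C4.5) becomes the test at the current node, and the procedure recurses on each partition.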

Bayesian Classification

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class [2].
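The essence of a (naive) Bayesian classifier can be sketched as scoring each class by its prior probability multiplied by the conditional probabilities of the observed attribute values, then predicting the class with the highest score. The priors and likelihoods below are invented for illustration; in practice they would be estimated from training-tuple counts, typically with smoothing.

```java
public class NaiveBayesSketch {
    // Posterior score P(C) * prod_i P(x_i | C); the classifier predicts
    // the class whose score is highest (the evidence P(X) cancels out)
    static double score(double prior, double[] condProbs) {
        double s = prior;
        for (double p : condProbs) s *= p;
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical two-class example: a prior and two attribute
        // likelihoods per class, all made up for the sketch
        double yes = score(0.6, new double[]{0.4, 0.7});
        double no  = score(0.4, new double[]{0.5, 0.2});
        System.out.println(yes > no ? "yes" : "no");
    }
}
```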

Rule Based Classification

In rule-based classification the learned model is represented as a set of IF-THEN rules. We first examine how such rules are used for classification. We then study ways in which they can be generated, either from a decision tree or directly from the training data using a sequential covering algorithm [2].
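The IF-THEN representation can be made concrete with a toy rule matcher. The attribute names and rules below are invented for illustration; a real rule learner, such as the sequential covering algorithm, would induce the rule list from training data.

```java
import java.util.List;
import java.util.Map;

public class RuleClassifier {
    // One IF-THEN rule: the antecedent is a conjunction of
    // attribute = value tests, the consequent is a class label
    static class Rule {
        final Map<String, String> antecedent;
        final String classLabel;
        Rule(Map<String, String> antecedent, String classLabel) {
            this.antecedent = antecedent;
            this.classLabel = classLabel;
        }
        boolean covers(Map<String, String> tuple) {
            for (Map.Entry<String, String> test : antecedent.entrySet())
                if (!test.getValue().equals(tuple.get(test.getKey()))) return false;
            return true;
        }
    }

    // Decision-list strategy: fire the first rule that covers the tuple,
    // fall back to a default class when no rule applies
    static String classify(List<Rule> rules, Map<String, String> tuple, String defaultClass) {
        for (Rule r : rules)
            if (r.covers(tuple)) return r.classLabel;
        return defaultClass;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Map.of("outlook", "sunny", "humidity", "high"), "no"),
                new Rule(Map.of("outlook", "overcast"), "yes"));
        System.out.println(classify(rules,
                Map.of("outlook", "sunny", "humidity", "high"), "yes"));
    }
}
```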

Classification by Backpropagation

Backpropagation is a neural network learning algorithm. A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples [2].

Support Vector Machines

It is an algorithm that uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The support vector machine finds this hyperplane using support vectors and margins [2].
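Finding the optimal hyperplane is the hard part and is beyond a short sketch, but once a separating hyperplane w·x + b = 0 is available, classification reduces to checking the sign of the decision function, and the geometric margin of a point is |f(x)| / ||w||. The weights below are illustrative, not learned.

```java
public class LinearDecision {
    // Decision function of a separating hyperplane: f(x) = w . x + b;
    // the predicted class is the sign of f(x)
    static double decision(double[] w, double b, double[] x) {
        double s = b;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    // Geometric (Euclidean) distance of a point from the hyperplane
    static double geometricMargin(double[] w, double b, double[] x) {
        double norm = 0;
        for (double wi : w) norm += wi * wi;
        return Math.abs(decision(w, b, x)) / Math.sqrt(norm);
    }

    public static void main(String[] args) {
        double[] w = {1.0, 1.0};
        double b = -1.0;            // illustrative hyperplane x1 + x2 = 1
        double[] x = {2.0, 2.0};
        System.out.println(decision(w, b, x) > 0 ? "+1" : "-1");
        System.out.println("margin = " + geometricMargin(w, b, x));
    }
}
```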

Association Rule Analysis

Association rules show strong associations between attribute-value pairs (or items) that occur frequently in a given data set. They are commonly used to analyze the purchasing patterns of customers in a store. Such analysis is useful in many decision-making processes, such as product placement, catalog design, and cross-marketing. The discovery of association rules is based on frequent itemset mining [2].
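Association rules are judged by their support and confidence, both computable directly from transaction counts: support(A => B) is the fraction of all transactions containing A ∪ B, and confidence(A => B) is the fraction of transactions containing A that also contain B. A small plain-Java sketch with an invented market-basket data set:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AssocRuleSketch {
    // Number of transactions containing every item in the given itemset
    static long countContaining(List<Set<String>> transactions, Set<String> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    // support(A => B) = |transactions containing A union B| / |transactions|
    static double support(List<Set<String>> tx, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return (double) countContaining(tx, both) / tx.size();
    }

    // confidence(A => B) = |transactions containing A union B| / |transactions containing A|
    static double confidence(List<Set<String>> tx, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return (double) countContaining(tx, both) / countContaining(tx, a);
    }

    public static void main(String[] args) {
        // Invented market-basket transactions
        List<Set<String>> tx = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"));
        System.out.printf("support=%.2f confidence=%.2f%n",
                support(tx, Set.of("bread"), Set.of("milk")),
                confidence(tx, Set.of("bread"), Set.of("milk")));
    }
}
```

Frequent itemset miners such as Apriori avoid scanning all itemsets by generating candidates only from itemsets already known to be frequent.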

k-Nearest Neighbor

It is based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it. The training tuples are described by n attributes. Each tuple represents a point in an n-dimensional space. All of the training tuples are stored in an n-dimensional pattern space. When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k nearest neighbors of the unknown tuple [2].
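The search just described can be sketched in plain Java as follows, using Euclidean distance and a majority vote among the k closest training points. The points and labels are invented for illustration; a practical implementation would use an index structure rather than sorting all training tuples per query.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class KnnSketch {
    // Euclidean distance between two points in n-dimensional space
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Classify a query point by majority vote among its k nearest training points
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[points.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort training-point indices by distance to the query
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(points[i], query)));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++)
            votes.merge(labels[idx[i]], 1, Integer::sum);
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
    }

    public static void main(String[] args) {
        // Invented 2-D training points with two classes
        double[][] pts = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
        String[] lab = {"A", "A", "B", "B"};
        System.out.println(classify(pts, lab, new double[]{0.5, 0.5}, 3));
    }
}
```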

For our thesis we have chosen decision tree induction, implemented as the J48 algorithm in the Weka tool, and classification by backpropagation, implemented as the Multilayer Perceptron algorithm in the Weka tool.

These two classification techniques were chosen for complementary reasons. J48 produces a tree-structured model that makes the decision process easy for humans to understand and interpret, while the Multilayer Perceptron is a neural-network-based algorithm that learns in a human-like way, and its learning is the critical factor determining its predictive accuracy.

3.3.1 J48 Algorithm

The J48 algorithm of the Weka software is a popular machine learning algorithm based upon J.R. Quinlan's C4.5 algorithm. All data to be examined will be of the categorical type and therefore continuous data will not be examined at this stage. The algorithm will however leave room for adaptation to include this capability. The algorithm will be tested against C4.5 for verification purposes [5].

In Weka, the implementation of a particular learning algorithm is encapsulated in a class, and it may depend on other classes for some of its functionality. The J48 class builds a C4.5 decision tree. Each time the Java virtual machine executes J48, it creates an instance of this class by allocating memory for building and storing a decision tree classifier. The algorithm, the classifier it builds, and a procedure for outputting the classifier are all part of that instantiation of the J48 class.

Larger programs are usually split into more than one class. The J48 class does not actually contain any code for building a decision tree; it includes references to instances of other classes that do most of the work. When there are many classes, as in the Weka software, they can become difficult to comprehend and navigate [14].

3.3.2 MLP Algorithm

Multilayer Perceptron classifier is based upon backpropagation algorithm to classify instances. The network is created by an MLP algorithm. The network can also be monitored and modified during training time. The nodes in this network are all sigmoid (except for when the class is numeric in which case the output nodes become unthresholded linear units).

The backpropagation neural network is essentially a network of simple processing elements working together to produce a complex output. The backpropagation algorithm performs learning on a multilayer feed-forward neural network. It iteratively learns a set of weights for prediction of the class label of tuples. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer. An example of a multilayer feed-forward network is shown in Fig 3.2 [2].

Fig 3.2 A multilayer feed-forward neural network

Each layer is made up of units. The inputs to the network correspond to the attributes measured for each training tuple. The inputs are fed simultaneously into the units making up the input layer. These inputs pass through the input layer and are then weighted and fed simultaneously to a second layer of "neuronlike" units, known as a hidden layer. The outputs of the hidden layer units can be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice, usually only one is used [2]. At the core, backpropagation is simply an efficient and exact method for calculating all the derivatives of a single target quantity (such as pattern classification error) with respect to a large set of input quantities (such as the parameters or weights in a classification rule) [15]. To improve the classification accuracy we should reduce the training time of neural network and reduce the number of input units of the network [16].
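As a concrete illustration of the weight adjustment described above, the following plain-Java sketch performs one gradient-descent update for a single sigmoid output unit; the error term (target − out) · out · (1 − out) is the derivative of the squared error passed back through the sigmoid. The weights, inputs, and learning rate are made up for the example, and this is not Weka's MultilayerPerceptron implementation; a full multilayer network would also propagate these error terms back through the hidden layers.

```java
public class BackpropStep {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One weight update for a single sigmoid output unit:
    // delta = (target - out) * out * (1 - out)
    // w_i <- w_i + learningRate * delta * x_i
    // (the bias is treated as an ordinary weight whose input is fixed at 1)
    static double[] updateWeights(double[] w, double[] x, double target, double lr) {
        double net = 0;
        for (int i = 0; i < w.length; i++) net += w[i] * x[i];
        double out = sigmoid(net);
        double delta = (target - out) * out * (1 - out);
        double[] updated = w.clone();
        for (int i = 0; i < w.length; i++) updated[i] += lr * delta * x[i];
        return updated;
    }

    public static void main(String[] args) {
        double[] w = {0.1, -0.2, 0.05};   // last entry is the bias weight
        double[] x = {1.0, 0.5, 1.0};     // last input fixed at 1 for the bias
        // Repeated updates drive the unit's output toward the target label 1.0
        for (int epoch = 0; epoch < 100; epoch++)
            w = updateWeights(w, x, 1.0, 0.5);
        System.out.println("output after training: "
                + sigmoid(w[0] * x[0] + w[1] * x[1] + w[2] * x[2]));
    }
}
```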


When assessing accuracy, we evaluate the classifier on a test set of class-labeled tuples rather than on the training set.

Methods for estimating a classifier's accuracy:

Holdout method, random subsampling, and cross-validation


Classifier Evaluation Metrics:

Accuracy, Error Rate, Sensitivity and Specificity

Classifier Accuracy, or recognition rate:

Percentage of test set tuples that are correctly classified

Accuracy = (TP + TN) / (P + N)

Error rate, misclassification rate:

Error rate: 1 - accuracy, or

Error rate = (FP + FN) / (P + N)

Sensitivity, true positive rate, recall:

completeness - what % of positive tuples did the classifier label as positive?

Sensitivity, Recall = TP / P

Specificity, true negative rate:

Specificity = TN / N

Precision and F-measures


exactness - what % of tuples that the classifier labeled as positive are actually positive

Precision = TP / (TP + FP)

Perfect score is 1.0

Inverse relationship between precision & recall

F measure (F1 or F-score): harmonic mean of precision and recall

F = 2 × Precision × Recall / (Precision + Recall)

Fβ: weighted measure of precision and recall

assigns β² times as much weight to recall as to precision

Fβ = (1 + β²) × Precision × Recall / (β² × Precision + Recall)

Various terminologies used in evaluation measures are:

True positives (TP): These refer to the positive tuples that were correctly labeled by the classifier. Let TP be the number of true positives.

True negatives (TN): These are the negative tuples that were correctly labeled by the classifier. Let TN be the number of true negatives.

False positives (FP): These are the negative tuples that were incorrectly labeled by the classifier. Let FP be the number of false positives.

False negatives (FN): These are the positive tuples that were mislabeled as negative. Let FN be the number of false negatives.

These terms are summarized in confusion matrix of Table 3.2.

Table 3.2 Confusion matrix for positive and negative tuples

                            Predicted class
                            Positive   Negative   Total
Actual class   Positive        TP         FN        P
               Negative        FP         TN        N
               Total           P'         N'      P + N

The confusion matrix is a useful tool for analyzing how well a classifier can recognize tuples of different classes. TP and TN tell us when the classifier is getting things right, while FP and FN tell us when the classifier is getting things wrong.
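All of the measures defined above follow directly from the four confusion-matrix counts. A small plain-Java sketch, with hypothetical counts, to make the arithmetic concrete:

```java
public class Metrics {
    // P = TP + FN positive tuples, N = TN + FP negative tuples
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }
    static double sensitivity(int tp, int fn) { return (double) tp / (tp + fn); }  // TP / P
    static double specificity(int tn, int fp) { return (double) tn / (tn + fp); }  // TN / N
    static double precision(int tp, int fp)   { return (double) tp / (tp + fp); }
    // Harmonic mean of precision and recall
    static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp), r = sensitivity(tp, fn);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical counts: TP = 90, FN = 10, TN = 80, FP = 20
        System.out.printf("acc=%.2f prec=%.2f rec=%.2f spec=%.2f f1=%.2f%n",
                accuracy(90, 80, 20, 10), precision(90, 20),
                sensitivity(90, 10), specificity(80, 20), f1(90, 20, 10));
    }
}
```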

Evaluating Classifier Accuracy: Holdout & Cross-Validation Methods

Holdout method

Given data is randomly partitioned into two independent sets

Training set (e.g., 2/3) for model construction

Test set (e.g., 1/3) for accuracy estimation

Random sampling: a variation of holdout

Repeat holdout k times, accuracy = avg. of the accuracies obtained

Cross-validation (k-fold, where k = 10 is most popular)

Randomly partition the data into k mutually exclusive subsets D1, D2, …, Dk, each of approximately equal size

At the i-th iteration, use Di as the test set and the remaining subsets together as the training set

Leave-one-out: k folds where k = # of tuples, for small sized data

Stratified cross-validation: folds are stratified so that class dist. in each fold is approx. the same as that in the initial data
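The k-fold partitioning step above can be sketched in plain Java as follows. The shuffle-then-deal-round-robin scheme shown here is one simple way to obtain mutually exclusive folds of approximately equal size; it does not implement stratification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KFoldSketch {
    // Randomly partition tuple indices into k mutually exclusive folds of
    // approximately equal size; fold i serves as the test set in iteration i
    static List<List<Integer>> folds(int numTuples, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numTuples; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        // Deal the shuffled indices round-robin into the k folds
        for (int i = 0; i < numTuples; i++)
            folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        for (List<Integer> fold : folds(10, 3, 42L))
            System.out.println(fold);
    }
}
```

In each of the k iterations, one fold is held out for testing and the other k − 1 folds form the training set, so every tuple is tested exactly once.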

Issues affecting model selection

Accuracy: classifier accuracy: predicting class label

Speed: time to construct the model (training time); time to use the model (classification/prediction time)

Robustness: handling noise and missing values

Scalability: efficiency in disk-resident databases

Interpretability: understanding and insight provided by the model

Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules.