Data mining is based on algorithms and computational paradigms that allow computers to discover structure in databases, perform prediction and forecasting, or improve their performance through interaction with data. Machine learning is now important for research and development in many fields. It is the study of algorithms that automatically improve their performance with experience, and it enables programs to analyze the content of data automatically. In the early years, the learning algorithms developed by researchers were available in different languages, ran on different platforms and operated on many data formats, so collecting different learning schemes together for comparative study on a common collection of datasets, in order to find the most efficient of them, was a difficult task. It was envisioned that WEKA would not only provide a toolbox of learning algorithms, but also a framework inside which researchers could implement new algorithms without having to be concerned with supporting infrastructure for data manipulation and scheme evaluation (Frank, Holmes, Reutemann, Witten & Hall, 2010).
The main purpose of machine learning is to find the most relevant information in data and use this crystallized information to make predictions and intelligent decisions faster and more accurately; Weka is a machine learning toolkit built for this purpose. Weka was developed at the University of Waikato in New Zealand, and the name "Weka" stands for Waikato Environment for Knowledge Analysis. The Weka system is written in Java and distributed under the terms of the GNU General Public License; it runs on, and has been tested on, almost any platform, including Linux, Windows and Macintosh operating systems, and even personal digital assistants. Weka is widely used for educational purposes as well as for research by practitioners. The first version of Weka was developed for public use by a team from the University of Waikato 13 years ago, and the Weka team has put a tremendous amount of effort into continually developing, updating and maintaining it since 1994. The development of Weka was funded by a grant from the New Zealand Government's Foundation for Research, Science and Technology. The Weka team received the 2005 ACM SIGKDD Service Award for the development of the Weka system, including the accompanying book Data Mining: Practical Machine Learning Tools and Techniques, written by Professor Ian H. Witten and Associate Professor Eibe Frank. As Gregory Piatetsky-Shapiro wrote in the news item about this event (KDnuggets news, June 28, 2005), "Weka is a landmark system in the history of the data mining and machine learning research communities, because it is the only toolkit that has gained such widespread adoption and survived for an extended period of time" (Markov & Russell, 2006). Because Weka is recognized as a landmark system in data mining and machine learning and is widely accepted in academia and business, researchers and practitioners widely use it as a tool for data mining research.
The accompanying book on Weka and data mining, Data Mining: Practical Machine Learning Tools and Techniques, is a very popular textbook and is frequently cited in machine learning publications.
Fig 2.1 Choosing Weka interface main screen
The main philosophy behind Weka is to move away from supporting computer science or machine learning researchers towards supporting the end users of machine learning. Weka is a collection of machine learning algorithms and data preprocessing tools for data mining tasks, aimed at both researchers and practitioners; it is extensible and has become a collection of machine learning algorithms for solving real-world data mining problems. Weka contains tools for data preprocessing, classification, regression, clustering, association rules and visualization. It is designed to help users quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the results of learning. Weka is also easy to adapt: we can apply its algorithms directly to a dataset or call them from our own Java code.
The Weka machine learning workbench provides methods for all the standard data mining problems and creates a general-purpose environment for automatic regression, classification, clustering, association rule mining and attribute selection. It contains an extensive collection of machine learning algorithms and data preprocessing methods, complemented by graphical user interfaces for data exploration and for the experimental comparison of different machine learning techniques on the same problem. Weka processes data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it (Frank, Hall, Trigg, Holmes & Witten, 2004). Because getting to know the data is an integral part of the work, several data visualization facilities and data preprocessing tools are provided. All algorithms take their input in the form of a single relational table in ARFF (Attribute-Relation File Format), which can be read from a file or generated by a database query.
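To make the ARFF layout concrete, the following Java sketch writes a small single-table dataset as ARFF text: a `@relation` line, one `@attribute` declaration per column (numeric, or nominal with its values listed in braces) and a `@data` section with one comma-separated line per instance. The class and method names (`ArffWriterSketch`, `toArff`) are hypothetical, and the sketch assumes only numeric and nominal attributes; in practice Weka's own loaders and savers would handle ARFF files.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Weka's own saver): serialises one relational table as ARFF text.
 * Assumption: attributes are either numeric or nominal (no string, date or sparse data).
 */
final class ArffWriterSketch {

    /** One column declaration; nominalValues == null means the attribute is numeric. */
    static final class Attribute {
        final String name;
        final List<String> nominalValues;

        Attribute(String name, List<String> nominalValues) {
            this.name = name;
            this.nominalValues = nominalValues;
        }

        String declaration() {
            String type = (nominalValues == null)
                    ? "numeric"
                    : "{" + String.join(",", nominalValues) + "}";
            return "@attribute " + name + " " + type;
        }
    }

    /** Builds the header (@relation and @attribute lines) followed by the @data section. */
    static String toArff(String relation, List<Attribute> attributes, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (Attribute a : attributes) {
            sb.append(a.declaration()).append('\n');
        }
        sb.append("\n@data\n");
        for (List<String> row : rows) {
            if (row.size() != attributes.size()) {
                throw new IllegalArgumentException("row width does not match attribute count");
            }
            // Each instance is one comma-separated line, in attribute order.
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Attribute> attributes = List.of(
                new Attribute("outlook", List.of("sunny", "overcast", "rainy")),
                new Attribute("temperature", null),
                new Attribute("play", List.of("yes", "no")));
        List<List<String>> rows = List.of(
                List.of("sunny", "85", "no"),
                List.of("overcast", "83", "yes"));
        System.out.print(toArff("weather", attributes, rows));
    }
}
```

The header declares the table's schema once, so every data line only has to list values in attribute order; this is what lets all Weka algorithms share one input format.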
There are several ways to use Weka. The first is to apply a learning method to a dataset and analyze its output to learn more about the data. The second is to use the learned model to generate predictions on new instances. The third is to apply several different learners and compare their performance in order to choose the best one for prediction. In Weka, a learning method is called a "classifier", and common evaluation modules are used to measure the performance of all classifiers. Many classifiers have multiple parameters, which are accessible through a property sheet or object editor. Weka also includes implementations of algorithms for other purposes, such as learning from data that has no class value (for example, clustering) and selecting the relevant attributes in the data.
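The second and third ways of using a learner can be sketched in plain Java. The `Learner` interface, the two example learners and the `accuracy` helper below are hypothetical and deliberately simplified; they are not Weka's `Classifier` or `Evaluation` API. A majority-class baseline (the idea behind Weka's ZeroR) and a 1-nearest-neighbour learner are trained on the same data, compared with one common evaluation function (accuracy on a held-out test set), and the trained model is then used to predict a new instance.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: two toy learners compared with one shared evaluation (not Weka's API). */
final class CompareLearnersSketch {

    /** A labelled instance: numeric attribute values plus a class label. */
    static final class Example {
        final double[] x;
        final String label;
        Example(double[] x, String label) { this.x = x; this.label = label; }
    }

    /** Hypothetical, simplified learner interface. */
    interface Learner {
        void train(List<Example> data);
        String predict(double[] x);
    }

    /** Predicts the most frequent training label, like a ZeroR-style baseline. */
    static final class MajorityLearner implements Learner {
        private String majority;

        public void train(List<Example> data) {
            Map<String, Integer> counts = new HashMap<>();
            for (Example e : data) {
                counts.merge(e.label, 1, Integer::sum);
            }
            majority = counts.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElseThrow();
        }

        public String predict(double[] x) { return majority; }
    }

    /** Predicts the label of the closest training instance (squared Euclidean distance). */
    static final class NearestNeighbourLearner implements Learner {
        private List<Example> memory = List.of();

        public void train(List<Example> data) { memory = List.copyOf(data); }

        public String predict(double[] x) {
            String best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (Example e : memory) {
                double d = 0;
                for (int i = 0; i < x.length; i++) {
                    double diff = x[i] - e.x[i];
                    d += diff * diff;
                }
                if (d < bestDistance) {
                    bestDistance = d;
                    best = e.label;
                }
            }
            return best;
        }
    }

    /** Common evaluation applied to every learner: accuracy on a held-out test set. */
    static double accuracy(Learner learner, List<Example> train, List<Example> test) {
        learner.train(train);
        int correct = 0;
        for (Example e : test) {
            if (e.label.equals(learner.predict(e.x))) {
                correct++;
            }
        }
        return (double) correct / test.size();
    }

    public static void main(String[] args) {
        List<Example> train = List.of(
                new Example(new double[] {0, 0}, "a"),
                new Example(new double[] {0, 1}, "a"),
                new Example(new double[] {5, 5}, "b"));
        List<Example> test = List.of(
                new Example(new double[] {0.2, 0.1}, "a"),
                new Example(new double[] {5, 4}, "b"));
        // Third way: compare several learners with the same evaluation.
        for (Learner learner : List.<Learner>of(new MajorityLearner(), new NearestNeighbourLearner())) {
            System.out.println(learner.getClass().getSimpleName() + " accuracy = "
                    + accuracy(learner, train, test));
        }
        // Second way: use a trained model to predict a new, unlabelled instance.
        Learner model = new NearestNeighbourLearner();
        model.train(train);
        System.out.println("prediction for (4.5, 5) = " + model.predict(new double[] {4.5, 5}));
    }
}
```

Keeping the evaluation in one function that accepts any `Learner` is what makes the comparison fair: every scheme is trained and scored on exactly the same split.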
Main advantages and disadvantages of Weka
The main advantages and disadvantages of Weka are discussed below.
The first advantage of Weka is that it is open source: it can be obtained free of charge, and it can be maintained without depending on the commitment of any particular institution or company. Second, it provides a wide range of state-of-the-art machine learning algorithms that can be deployed on any given problem. Third, it is fully implemented in Java, so it can be modified and runs on almost any platform. The main disadvantage is that most of the functionality is only applicable if all data is held in main memory. A few algorithms are included that are able to process data incrementally or in batches (Frank et al., 2002). However, for most of the methods the amount of available memory imposes a limit on the data size, which restricts application to small or medium-sized datasets. If larger datasets are to be processed, some form of subsampling is generally required. A second disadvantage is the flip side of portability: a Java implementation may be somewhat slower than an equivalent in C/C++ (Frank, Hall, Holmes, Kirkby, Pfahringer & Witten, 2005).
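One common form of subsampling for data that does not fit in memory is reservoir sampling (Algorithm R), which draws a uniform random sample of `k` instances in a single pass while holding only `k` instances in memory. The Java sketch below illustrates that general technique under the assumption that instances arrive through an `Iterator`; `ReservoirSampleSketch` is a hypothetical name, and the sketch is not a description of how Weka itself subsamples data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical sketch of reservoir sampling; not part of Weka. */
final class ReservoirSampleSketch {

    /**
     * Algorithm R: returns a uniform random sample of up to k items from a stream of
     * unknown length, keeping at most k items in memory.
     */
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill the reservoir with the first k items
            } else {
                // Keep the new item with probability k / seen, replacing a random slot.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Stand-in for a dataset too large to load: one million row ids streamed lazily.
        Iterator<Integer> rows = IntStream.rangeClosed(1, 1_000_000).iterator();
        System.out.println(sample(rows, 5, new Random(42)));
    }
}
```

The sample could then be written out (for example as ARFF) and loaded into memory as a small or medium-sized dataset.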
Basic Functionality of Weka
WEKA's functionality can be accessed through several interfaces: the Explorer, the Experimenter, the Knowledge Flow interface and the Simple Command Line Interface. Weka provides an excellent graphical user interface.
The easiest way to use Weka is through the graphical user interface called the Explorer, which is Weka's main graphical user interface. The Explorer guides the user by presenting choices as menus, by graying out options until they are applicable so that work is done in an appropriate order, and by presenting options as forms to be filled out. It gives access to all of Weka's facilities through menus corresponding to the various data mining tasks supported. From the Explorer, a user can quickly read a dataset from an ARFF file or spreadsheet and build a decision tree from it.
The Explorer is a panel-based interface with six panels, each corresponding to a different data mining task. The first panel, "Preprocess", loads data and transforms it with Weka's data preprocessing tools, called "filters". The second panel, "Classify", gives access to Weka's classification and regression algorithms. The third panel, "Cluster", enables the user to apply clustering algorithms to the dataset and provides simple statistics for evaluating clustering performance. The fourth panel, "Associate", provides access to algorithms for learning association rules. The fifth panel, "Select attributes", performs one of the most important tasks in practical data mining: it identifies which attributes in the data are the most predictive and provides access to various methods for measuring the utility of attributes. The sixth and final panel, "Visualize", provides a color-coded scatter plot matrix, from which the user can select and enlarge individual plots.
Fig 2.2 The Explorer Interface
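Attribute selection ranks attributes by how predictive they are of the class. One widely used measure is information gain, IG(C, A) = H(C) minus the sum over values v of P(A = v) times H(C | A = v), where H is entropy in bits. The Java sketch below computes this for nominal attributes; it illustrates the idea behind an information-gain evaluator such as Weka's InfoGainAttributeEval, but `InfoGainSketch` and its methods are hypothetical and are not Weka's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of information-gain attribute scoring (nominal attributes only). */
final class InfoGainSketch {

    /** Entropy in bits of a class distribution given as counts summing to total. */
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0;
        for (int c : counts.values()) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Information gain of one attribute: H(C) - sum over values v of P(v) * H(C | A = v). */
    static double infoGain(List<String> attributeValues, List<String> classLabels) {
        if (attributeValues.size() != classLabels.size() || classLabels.isEmpty()) {
            throw new IllegalArgumentException("need equally long, non-empty columns");
        }
        int n = classLabels.size();
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> countsByValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String label = classLabels.get(i);
            classCounts.merge(label, 1, Integer::sum);
            countsByValue.computeIfAbsent(attributeValues.get(i), v -> new HashMap<>())
                    .merge(label, 1, Integer::sum);
        }
        // Weighted entropy of the class inside each attribute value.
        double conditional = 0;
        for (Map<String, Integer> sub : countsByValue.values()) {
            int size = sub.values().stream().mapToInt(Integer::intValue).sum();
            conditional += (double) size / n * entropy(sub, size);
        }
        return entropy(classCounts, n) - conditional;
    }

    public static void main(String[] args) {
        List<String> play = List.of("no", "no", "yes", "yes", "yes", "no");
        List<String> windy = List.of("true", "true", "false", "false", "false", "true");
        List<String> outlook = List.of("sunny", "rainy", "sunny", "rainy", "sunny", "rainy");
        // A higher score means the attribute tells us more about the class.
        System.out.printf("windy: %.3f bits%n", infoGain(windy, play));
        System.out.printf("outlook: %.3f bits%n", infoGain(outlook, play));
    }
}
```

Ranking attributes by this score and keeping the top ones is one simple selection strategy; numeric attributes would first need to be discretized.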
Knowledge Flow is another Weka interface: the user selects components from a toolbar and connects them into a directed graph that processes and analyzes data. The Knowledge Flow provides an alternative to the Explorer. It is a JavaBeans-based application that allows the same kinds of data exploration, processing and visualization as the Explorer, and it also allows you to design configurations for streamed data processing. A fundamental disadvantage of the Explorer is that it holds everything in main memory: when you open a dataset, it immediately loads it all in. This means that it can only be applied to small to medium-sized problems. However, Weka contains some incremental algorithms that can be used to process very large datasets (Witten and Frank, 2005).
Fig 2.3 Knowledge Flow Interface
The Knowledge Flow lets users drag boxes representing learning algorithms and data sources around the screen and join them together into the configuration they need. It enables users to specify data sources, preprocessing tools, learning algorithms, evaluation methods and visualization modules; when the chosen filters and learning algorithms are capable of incremental learning, the data is loaded and processed incrementally.
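An incremental configuration of the kind described above (data source, then filter, then learner) can be sketched as a loop that reads one instance at a time, filters it and updates the model before the next instance is read, so memory use does not grow with the number of rows. Everything here is a hypothetical illustration rather than Knowledge Flow code: the input is assumed to be comma-separated numeric attributes with the class label last, the filter drops instances containing a missing value `?`, and the learner is a simple incremental nearest-centroid classifier that keeps only a running mean per class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of a streamed source -> filter -> learner flow (not Knowledge Flow code). */
final class StreamingFlowSketch {

    /** Incremental nearest-centroid learner: stores only a count and running mean per class. */
    static final class IncrementalCentroid {
        private final Map<String, double[]> means = new HashMap<>();
        private final Map<String, Long> counts = new HashMap<>();

        void update(double[] x, String label) {
            long n = counts.merge(label, 1L, Long::sum);
            double[] mean = means.computeIfAbsent(label, k -> new double[x.length]);
            for (int i = 0; i < x.length; i++) {
                mean[i] += (x[i] - mean[i]) / n; // running-mean update, no instance is stored
            }
        }

        String predict(double[] x) {
            String best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, double[]> e : means.entrySet()) {
                double d = 0;
                for (int i = 0; i < x.length; i++) {
                    double diff = x[i] - e.getValue()[i];
                    d += diff * diff;
                }
                if (d < bestDistance) {
                    bestDistance = d;
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    /** Each line is read, passed through the filter and learned from before the next is read. */
    static IncrementalCentroid run(BufferedReader source, Predicate<String> filter) {
        IncrementalCentroid learner = new IncrementalCentroid();
        try {
            String line;
            while ((line = source.readLine()) != null) {
                if (line.isBlank() || !filter.test(line)) {
                    continue; // filtered out: never reaches the learner
                }
                String[] parts = line.split(",");
                double[] x = new double[parts.length - 1];
                for (int i = 0; i < x.length; i++) {
                    x[i] = Double.parseDouble(parts[i].trim());
                }
                learner.update(x, parts[parts.length - 1].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return learner;
    }

    public static void main(String[] args) {
        String data = "1,1,a\n1.2,0.8,a\n?,3,b\n5,5,b\n5.2,4.8,b\n";
        // Filter: drop instances with a missing value, written "?" as in ARFF.
        IncrementalCentroid model = run(new BufferedReader(new StringReader(data)),
                line -> !line.contains("?"));
        System.out.println(model.predict(new double[] {4.9, 5.1}));
    }
}
```

Because the reader could just as well wrap a large file, the same loop would process a dataset that never fits in memory at once; only the per-class means are retained.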
Simple Command-line Interface
The Simple CLI provides a command-line interface that allows direct execution of Weka commands for operating systems that do not provide their own command-line interface. One of Weka's features is its ability to be executed via the operating system's command-line interface, for example for initial experiments. The graphical user interfaces are quite sufficient for most purposes, but for more in-depth usage the command-line interface helps users because it offers some functionality that is not available in the graphical user interfaces. From the Simple CLI a user can access all of Weka's functionality, including some more specialized functions, which means that Weka can be used without a windowing system.
Fig 2.4 Simple Command Line Interface
All the learning techniques can be accessed from the command line as part of shell scripts, or from other Java programs using the Weka API, because the command-line interface gives access to all features of the system. Some incremental algorithms are also implemented that are able to process very large datasets through the command-line interface.
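On the command line, a Weka scheme is configured through flag and value pairs. For example, assuming `weka.jar` is on the classpath, `java weka.classifiers.trees.J48 -t weather.arff` trains and evaluates the J48 decision tree on the training file named by `-t`, and `-x` sets the number of cross-validation folds. The Java sketch below is a hypothetical, simplified stand-in for how such an option array might be consumed; `OptionParsingSketch`, `getOption` and `checkAllConsumed` are illustrative names, and this is not Weka's own option-handling code.

```java
/** Hypothetical sketch of consuming Weka-style "-flag value" options; not Weka's own code. */
final class OptionParsingSketch {

    /**
     * Returns the value that follows "-flag" in the options array and blanks out both
     * entries, so that unused options can be detected afterwards. Returns "" if absent.
     */
    static String getOption(String flag, String[] options) {
        String wanted = "-" + flag;
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(wanted)) {
                String value = options[i + 1];
                options[i] = "";
                options[i + 1] = "";
                return value;
            }
        }
        return "";
    }

    /** Rejects anything that was not consumed by an earlier getOption call. */
    static void checkAllConsumed(String[] options) {
        for (String option : options) {
            if (!option.isEmpty()) {
                throw new IllegalArgumentException("Unrecognised option: " + option);
            }
        }
    }

    public static void main(String[] args) {
        // Defaults mirror a command line such as: -t weather.arff -x 5
        String[] options = args.length > 0 ? args.clone() : new String[] {"-t", "weather.arff", "-x", "5"};
        String trainingFile = getOption("t", options);
        String foldsText = getOption("x", options);
        int folds = foldsText.isEmpty() ? 10 : Integer.parseInt(foldsText); // assumed default of 10
        checkAllConsumed(options);
        System.out.println("training file = " + trainingFile + ", folds = " + folds);
    }
}
```

Blanking consumed entries makes it easy to report a mistyped flag instead of silently ignoring it, which matters when commands run unattended in shell scripts.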
The Weka Explorer and Knowledge Flow environments help to determine how well machine learning techniques perform on given datasets. In-depth investigations, however, require experiments that typically run several schemes and techniques on different datasets with various parameter settings, and the Explorer and Knowledge Flow interfaces are not really suitable for this. The Weka Experimenter is designed to answer practical questions about when and where particular techniques are suitable: it helps to determine which methods and parameter values work best for a given problem. The workbench was developed to provide an environment that enables Weka users to compare a variety of learning techniques. This can be done interactively using the Explorer. However, the Experimenter allows users to automate the process by making it easy to run classifiers and filters with different parameter settings on a corpus of datasets, collect performance statistics and perform significance tests. The Experimenter enables users to set up large-scale experiments, start them running, leave them until they have finished and then analyze the performance statistics; the results are collected and stored in ARFF format, so they can be subjected to further data mining.
Fig 2.5 The Experimenter Interface
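To illustrate the significance-testing step, the sketch below computes a paired t-statistic over the per-run scores of two schemes evaluated on the same k train/test runs. It assumes the corrected resampled form of Nadeau and Bengio, which multiplies the sample variance of the score differences by (1/k + n_test/n_train) instead of 1/k to account for overlapping training sets; for 10-fold cross-validation n_test/n_train = 1/9. Weka's Experimenter offers a corrected paired t-test of this kind, but `PairedTTestSketch` and `correctedT` are hypothetical names, not Weka's API. The resulting statistic would be compared against Student's t distribution with k - 1 degrees of freedom.

```java
/** Hypothetical sketch of a corrected resampled paired t-test; not Weka's implementation. */
final class PairedTTestSketch {

    /**
     * t = mean(d) / sqrt((1/k + testTrainRatio) * var(d)), with d_j = a_j - b_j over k runs.
     * testTrainRatio is n_test / n_train; passing 0 gives the ordinary paired t-statistic.
     */
    static double correctedT(double[] scoresA, double[] scoresB, double testTrainRatio) {
        int k = scoresA.length;
        if (k != scoresB.length || k < 2) {
            throw new IllegalArgumentException("need at least two paired runs");
        }
        double[] d = new double[k];
        double mean = 0;
        for (int j = 0; j < k; j++) {
            d[j] = scoresA[j] - scoresB[j]; // per-run difference between the schemes
            mean += d[j];
        }
        mean /= k;
        double variance = 0;
        for (double dj : d) {
            variance += (dj - mean) * (dj - mean);
        }
        variance /= (k - 1); // sample variance of the differences
        // If every difference is identical the variance is 0 and the result is infinite or NaN.
        return mean / Math.sqrt((1.0 / k + testTrainRatio) * variance);
    }

    public static void main(String[] args) {
        double[] schemeA = {0.82, 0.84, 0.83, 0.81, 0.85};
        double[] schemeB = {0.80, 0.80, 0.80, 0.80, 0.80};
        System.out.printf("uncorrected t = %.3f%n", correctedT(schemeA, schemeB, 0.0));
        System.out.printf("corrected t   = %.3f%n", correctedT(schemeA, schemeB, 0.5));
    }
}
```

With these illustrative scores the uncorrected statistic is about 4.24 and the corrected one about 2.27: the correction makes the test more conservative, so fewer spurious "significant" differences are reported.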
We are designing a parallel distributed system for the Weka Experimenter with the specific purpose of drastically decreasing the time that experiments take. The Knowledge Flow transcends limitations of space by allowing machine learning runs that do not load in the whole dataset at once; the Experimenter transcends limitations of time (Witten and Frank, 2005). Advanced users can use the Experimenter to distribute the computing load across multiple machines using Java Remote Method Invocation (RMI). In this way users can set up big experiments and just leave them to run.
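Distributing an experiment amounts to running independent (scheme, dataset) evaluations concurrently and collecting their results. The Java sketch below shows that idea on a single machine, substituting a local thread pool for the remote hosts that the Experimenter reaches through RMI; `ParallelExperimentSketch`, `runAll` and the caller-supplied evaluator are hypothetical, and no RMI is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

/** Hypothetical sketch: local threads stand in for remote Experimenter hosts (no RMI). */
final class ParallelExperimentSketch {

    /**
     * Runs evaluator(scheme, dataset) for every pair on a pool of worker threads and
     * returns the results in grid order (all datasets for the first scheme, then the next).
     */
    static List<Double> runAll(List<String> schemes, List<String> datasets, int workers,
                               BiFunction<String, String, Double> evaluator) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (String scheme : schemes) {
                for (String dataset : datasets) {
                    // Each (scheme, dataset) run is independent, so it can execute on any worker.
                    pending.add(pool.submit(() -> evaluator.apply(scheme, dataset)));
                }
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : pending) {
                results.add(f.get()); // waits for that run to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for runs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a run failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder evaluator: a real one would train and test the scheme on the dataset.
        BiFunction<String, String, Double> placeholderScore =
                (scheme, dataset) -> (double) (scheme.length() * 10 + dataset.length());
        List<Double> results = runAll(List.of("J48", "NaiveBayes"), List.of("iris", "weather"), 2,
                placeholderScore);
        System.out.println(results);
    }
}
```

Collecting futures in submission order keeps the result grid deterministic even though runs finish in any order; a distributed version would replace the thread pool with remote workers but keep the same structure.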