Data mining and its significance

1.1 Data Mining

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information, which can be used to increase revenue and cut costs. Data mining software is one of a number of analytical tools for analyzing data (Ian H. Witten and Eibe Frank, 2005). Database technology has been characterized by the widespread adoption of relational technology and by research and development activities on new and more powerful database systems. Data mining allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among many fields in large relational databases. Data mining software allows users to analyze large databases in order to solve business problems (Jiawei Han and Micheline Kamber, 2006). Data mining is a technology, however, not a business solution in itself. It can be performed on data represented in quantitative, textual, or multimedia forms, and data mining applications can use a variety of parameters to examine the data, including association, sequence or path analysis, classification, clustering, and forecasting. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by tools typical of decision support systems (Hand D.J., Heikki Mannila and Padhraic Smyth, 2001). Data mining tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Most companies already collect and refine massive quantities of data.
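To make the idea of "finding correlations among many fields" concrete, the following Python sketch computes the Pearson correlation coefficient for every pair of numeric fields in a small table. The table, its field names, and its values are invented for illustration, and Pearson correlation is only one of many measures a data mining tool might use; a real tool would read from the relational database rather than an in-memory list.

```python
from math import sqrt
from typing import Dict, List

# Hypothetical customer table: each row is a record with three numeric fields.
ROWS: List[Dict[str, float]] = [
    {"visits": 2, "basket_size": 3, "spend": 30.0},
    {"visits": 4, "basket_size": 5, "spend": 52.0},
    {"visits": 6, "basket_size": 6, "spend": 71.0},
    {"visits": 8, "basket_size": 9, "spend": 95.0},
]

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def field_correlations(rows):
    """Correlation for every unordered pair of fields in the table."""
    fields = list(rows[0])
    result = {}
    for i, a in enumerate(fields):
        for b in fields[i + 1:]:
            result[(a, b)] = pearson([r[a] for r in rows], [r[b] for r in rows])
    return result

for pair, r in field_correlations(ROWS).items():
    print(pair, round(r, 3))
```

Pairs whose coefficient is close to +1 or -1 move together strongly, which is the kind of relationship a tool would surface for further analysis.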
Data mining techniques can be implemented quickly on existing software and hardware platforms to enhance the value of existing information resources, and they can be integrated with new products and systems as they are brought online (Sushmita Mitra and Tinku, 2003). Various other terms have been used to refer to data mining, such as data extraction. Some data mining techniques are based on rough sets and logic programming, among others. The data mining process is not trivial: it involves many steps, such as business problem definition, data collection, and data preprocessing, and various types of techniques may be applied at each step. Because of the complexity of both the process and the tools, ordinary business users cannot easily use data mining tools to solve their business problems. Data mining practice in industry therefore depends mostly on experienced data mining professionals to provide solutions, which has made it costly and time-consuming (Graham J. Williams and Simeon J. Simoff, 2006). Data mining is integrated with many technologies, such as visualization and parallel computing, and is being carried out in various fields. Database management researchers are building on work in deductive and intelligent query processing and are interested in extending query processing techniques to facilitate data mining. Data warehousing is another key data management technology: it integrates various data sources and organizes the data so that it can be analyzed effectively. Researchers in statistical analysis are integrating their techniques with machine learning techniques to develop new techniques for data mining. Various analysis packages are now marketed as data mining tools. Data mining has attracted a great deal of attention in the information industry in recent years.
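The data preprocessing step mentioned above often includes handling missing values and putting fields on a common scale. The sketch below assumes a single numeric column in which `None` marks a missing entry, and applies two common choices, mean imputation followed by min-max normalization; these are illustrative picks, not steps prescribed by the cited sources.

```python
from typing import List, Optional

def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Fill missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column: List[float]) -> List[float]:
    """Rescale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def preprocess(column: List[Optional[float]]) -> List[float]:
    """Run the two cleaning steps in order: impute first, then scale."""
    return min_max_scale(impute_mean(column))

ages = [25, None, 40, 55]
print(preprocess(ages))  # [0.0, 0.5, 0.5, 1.0]
```

Imputing before scaling matters here: scaling first would leave the gap unfilled and the minimum and maximum would have to skip missing entries.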
The information and knowledge gained can be used for applications ranging from market analysis to production control and scientific exploration. Data mining can be viewed as a result of the natural evolution of information technology. The database system industry has followed an evolutionary path in the development of its functionality, and with numerous database systems offering query and transaction processing as common practice, advanced data analysis has naturally become the next target. Large-scale information technology has evolved separate transaction and analytical systems; data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available, including statistical, machine learning, and neural network software. The relationships sought include the following. In classes, stored data is used to locate data in predetermined groups (Jeffrey W. Seifert, 2004). In clusters, data items are grouped according to logical relationships or consumer preferences. In associations, data can be mined to identify associations. In sequential patterns, data is mined to anticipate behavior patterns and trends.
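Grouping data items into clusters can be sketched with k-means, one widely used clustering algorithm; the text does not name a specific algorithm, so this is an illustrative choice. The customer records and their two features below are hypothetical. The function alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members until the centroids stop changing.

```python
import random
from math import dist

# Hypothetical customer records: (visits per month, average spend in tens of dollars).
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments will no longer change
            break
        centroids = updated
    return centroids, clusters

centroids, clusters = kmeans(customers, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

Seeding the random generator makes the run reproducible; in practice k-means is usually restarted from several initializations because it can settle in a poor local optimum.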

1.2 Significance of Data mining

Data mining helps to extract previously unknown information from very large databases, and it is an advanced tool for managing massive amounts of data. It typically performs secondary analysis, that is, analysis of previously collected data. In intrusion detection, misuse detection searches for known attack patterns; the present generation of commercial intrusion detection systems implements this strategy. In data mining, implicit, previously unknown, and potentially useful information is extracted from databases, and misuse detection systems can incorporate data mining into their strategy. For example, JAM (Java Agents for Meta-learning) applies data mining techniques to discover intrusion patterns: a meta-learning classifier is applied to analyze the signatures of attacks, and features extracted by the corresponding algorithms are used to compute models of intrusion behavior (Daniel Barbara and Sushil Jajodia, 2002). Data mining in JAM thus builds a misuse detection model. Data warehousing has become affordable as data mining techniques have reduced the costs involved in data processing. When data mining tools are implemented on high-performance parallel processing systems, they can analyze massive databases in minutes. This speed lets users experiment with various models to understand complex data and makes analysis more practical. Data mining algorithms have existed for more than a decade, but only recently have they been implemented as mature, reliable, understandable tools designed to outperform older methods. Enhanced analytical models and techniques, such as data visualization and exploration, provide greater analytical depth.
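Misuse detection, as described above, matches activity against patterns of known attacks. The sketch below illustrates only that matching step, using hand-written signatures; it does not reproduce JAM's meta-learning, in which classifiers learned from audit data take the place of such signatures. The event names and the signature database are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical signature database: each known attack is written as an ordered
# run of event types that must appear back-to-back within one session.
SIGNATURES: Dict[str, Tuple[str, ...]] = {
    "password_guessing": ("login_fail", "login_fail", "login_fail"),
    "port_scan": ("connect_21", "connect_22", "connect_23"),
}

def contains_run(events: Sequence[str], pattern: Tuple[str, ...]) -> bool:
    """True if `pattern` occurs as a contiguous run inside `events`."""
    n = len(pattern)
    return any(tuple(events[i:i + n]) == pattern for i in range(len(events) - n + 1))

def detect_misuse(events: Sequence[str], signatures=SIGNATURES) -> List[str]:
    """Return the names of all known attack signatures found in a session."""
    return [name for name, pattern in signatures.items() if contains_run(events, pattern)]

session = ["login_fail", "login_fail", "login_fail", "login_ok", "read_file"]
print(detect_misuse(session))  # ['password_guessing']
```

Because only known patterns are checked, a novel attack that matches no signature goes unreported, which is the usual limitation of misuse detection.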

By implementing data mining carefully, businesses can analyze customers' purchasing patterns and behavior and gain a better understanding of their customers, which helps to minimize fraud, improve resource forecasting, increase customer acquisition, and curb customer attrition. In manufacturing, it helps improve product quality and reduce production losses. A well-executed data mining strategy helps identify hidden patterns (George Fernandez, 2003). Data mining produces various kinds of output, the most prevalent being conditional rules and association rules. Conditional rules are derived from induced decision trees, whereas association rules are learned from tabular data; of the two, association rules are the more common. Data mining offers many algorithms for different problems, but learning from data streams poses new challenges. In these situations, training examples are generated at random, and a natural approach to such incremental tasks is to use adaptive learning algorithms. Learning systems that can exploit continuous, high-volume, open-ended data streams have the following properties: they require a small constant time per example, they use a fixed amount of main memory regardless of the amount of data seen, and they can deal with changes in the target concept. Achieving these properties requires sampling and randomization techniques. Some data stream models also allow delete and update operations (Pavel Brazdil, Christophe Giraud-Carrier, Carlos Soares and Ricardo Vilalta, 2008). Mining data streams aims to extract knowledge structures represented as models and patterns. A crucial issue in data stream mining is finding frequent patterns, which are required by business applications such as e-commerce, recommender systems, supply chain management, and group decision support systems.
Many algorithms have been proposed to make this process fast and accurate (Reda Alhajj, 2007).
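The stream-learning properties listed above (small time per item, a fixed amount of memory regardless of stream length) can be illustrated with the Misra-Gries frequent-items algorithm. The cited sources do not prescribe it; it is swapped in here as one well-known example of counting frequent items in a single pass. The click stream is hypothetical.

```python
from typing import Dict, Hashable, Iterable

def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass over the stream with at most k - 1 counters.

    Every item occurring more than n / k times (n = stream length) is guaranteed
    to survive; each surviving count underestimates the true count by at most n / k.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter (the new item is
            # implicitly counted and cancelled) and drop counters that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

clicks = "a b a c a b a d a".split()
print(misra_gries(clicks, k=3))  # {'a': 3}
```

The decrement step touches every counter, but because total decrements can never exceed total increments, the amortized cost per item stays constant, and memory is bounded by k - 1 counters however long the stream runs. The true count of `a` here is 5; the reported 3 is within the n / k = 3 error bound.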

The significance of peer-to-peer (P2P) downloading has been examined, and data mining has been applied to detecting P2P downloads. A model to detect P2P downloading is built, and the Apriori algorithm is improved using the given domain knowledge; experiments confirm the detection performance of the improved algorithm. Finally, the rules mined by the improved algorithm are passed to the firewall, which improves the utilization ratio of the campus network. Data mining is a process of finding useful knowledge in data and discovering useful patterns among data sets, which helps in intelligent decision making. Data mining in agriculture is a relatively novel research field: efficient techniques can be developed and tailored for solving complex agricultural problems using data mining (Ian H. Witten and Eibe Frank, 2000), and recommendations for future research directions in agriculture-related fields can be provided. Due to the rapid growth of electronic data with graph structures, such as HTML and XML texts and chemical compounds, many researchers have become interested in data mining and machine learning techniques for finding useful patterns in graph-structured data (Sankar K. Pal and Pabitra Mitra, 2004). Because graph data contain a huge number of substructures, and deciding whether such data have given structural features tends to be computationally expensive, graph mining problems face computational difficulties. Data mining can often provide more help than web search services alone. For instance, analysis of authoritative web pages based on the links between pages can help classify web pages by importance, influence, and topic, and web community analysis helps identify hidden web social networks and communities. Web mining is the development of scalable and efficient web data analysis and mining methods.
Such methods help reveal how information is distributed on the web, uncover web dynamics, and identify associations and other relationships among web pages.
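The P2P detection study summarized above builds on the Apriori algorithm. The source does not describe its domain-specific improvements, so the sketch below shows only the textbook level-wise Apriori search (support counting, candidate joining, and pruning by the Apriori property), plus the confidence of a rule derived from the frequent itemsets. The transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise search for all itemsets whose support is at least min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    frequent: Dict[FrozenSet[str], float] = {}
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        # Count support for this level and keep only the frequent itemsets.
        scored = {c: support(c) for c in candidates}
        level = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def confidence(frequent, antecedent, consequent) -> float:
    """Confidence of antecedent -> consequent: support(A | C) / support(A).
    Raises KeyError if either itemset was not frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

sample = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(sample, min_support=0.5)
print(confidence(freq, {"butter"}, {"bread"}))  # 1.0
```

With a minimum support of 0.5, the candidate {bread, milk, butter} is pruned without being counted, because its subset {milk, butter} is infrequent; avoiding such counts is what makes Apriori practical on large data.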

1.3 Data Mining and Knowledge Discovery in Databases

Data mining and knowledge discovery in databases (KDD) have recently emerged as important directions for extracting practical information from immense repositories of various types of data. Together they frame the different data mining tasks. In healthcare, they offer great potential for improving the quality of patient care without escalating costs, for example by reducing hospitalization charges (Joseph Tan and Joseph K. H. Tan, 2005). In e-health, the KDD process deals with identifying the potential uses and features of a patient monitoring system. This is essential for discovering relevant, actionable patterns that contribute to patient modeling (Yeal Song, Johann Eder and Tho Manh Nguyen, 2007). The figure below gives a complete overview of the KDD process for e-health. The process begins with the selection of relevant data, followed by feature extraction and construction with the help of visual data. Pattern discovery then yields descriptive and predictive patterns for patient modeling, together with rules for adapting the construction steps.
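The KDD process above is described only at a high level. The sketch below maps its three stages (data selection, feature extraction and construction, pattern discovery) onto plain Python functions. The patient readings, the outcome labels, the plausibility filter, and the one-feature threshold rule (a decision stump) are all hypothetical stand-ins, not the method of the cited work.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical raw monitoring data: patient id -> heart-rate readings in bpm.
# None marks a dropped reading; 0 marks a sensor fault.
RAW: Dict[str, List[Optional[float]]] = {
    "p1": [72, 75, None, 70],
    "p2": [110, 118, 121, None],
    "p3": [68, 0, 71, 69],
    "p4": [105, 99, 112, 108],
}
# Hypothetical outcome labels (e.g. "needed intervention") for the same patients.
LABELS: Dict[str, bool] = {"p1": False, "p2": True, "p3": False, "p4": True}

def select(raw):
    """Selection: keep only plausible readings (drop missing and non-positive values)."""
    return {pid: [r for r in rs if r is not None and r > 0] for pid, rs in raw.items()}

def extract_features(selected):
    """Feature extraction and construction: summarise each patient's readings."""
    return {pid: {"mean_hr": mean(rs), "max_hr": max(rs)} for pid, rs in selected.items()}

def learn_stump(features, labels, feature) -> Tuple[float, float]:
    """Pattern discovery: pick the threshold t for the rule 'feature > t -> positive'
    that classifies the labelled patients most accurately. Returns (t, accuracy)."""
    values = sorted(f[feature] for f in features.values())
    best_t, best_acc = values[0] - 1, -1.0
    for t in [values[0] - 1] + values:
        correct = sum((features[p][feature] > t) == labels[p] for p in features)
        acc = correct / len(features)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

features = extract_features(select(RAW))
print(features)                                  # descriptive summary per patient
print(learn_stump(features, LABELS, "mean_hr"))  # predictive rule and its accuracy
```

The per-patient summaries play the role of descriptive patterns and the learned threshold rule plays the role of a predictive pattern; a real system would validate such a rule on patients it was not learned from.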