Mining Model Algorithms and Data Mining Algorithms Used in SQL Server

Mining Model Algorithm

Data mining is a set of sophisticated tools and algorithms that allow analysts and end users to solve problems that would otherwise take huge amounts of manual effort or simply remain unsolved. Data mining algorithms are the foundation for creating mining models. Algorithms are mathematical functions that perform specific types of analysis on the associated data sets. SQL Server 2005 has seven world-class data mining algorithms: Microsoft Naïve Bayes, Microsoft Decision Trees, Microsoft Time Series, Microsoft Clustering, Microsoft Association Rules, Microsoft Neural Network and Text Mining (Zhaohui Tang and Jamie Maclennan, 2005). Some of these algorithms are supervised and some are unsupervised. The supervised algorithms are Microsoft Association Rules, Microsoft Naïve Bayes, Microsoft Decision Trees and Microsoft Neural Network. The unsupervised algorithms are Microsoft Time Series, Microsoft Clustering and Text Mining (Lynn Langit, 2007). Hence, data mining is a set of sophisticated tools, and the algorithms in SQL Server 2005 are the basis on which its mining models are built.

Microsoft Clustering

The Microsoft Clustering algorithm finds natural groupings inside the data when these groupings are not apparent. It finds hidden variables that accurately classify the data and hidden dimensions that are unique to it, and it provides information in a way that is impossible to achieve with predefined organizational methods. The algorithm uses iterative techniques to group records from the dataset into clusters that contain similar characteristics (Lynn Langit, 2007). Algorithms of this type are often used as a starting point to help end users better understand the relationships between attributes in a large volume of data. The clusters can be used to explore the data and to learn more about relationships that exist but may not be easy to derive logically through casual observation (Ray Rankins, Paul Jensen and Paul Bertucci, 2002). Hence, the Microsoft Clustering algorithm finds the hidden variables that accurately classify the data.
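
The iterative grouping described above can be illustrated with a minimal k-means sketch in Python. This is not Microsoft's implementation; the function name, the two-attribute customer data and the starting centroids are all assumptions made for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points; `centers` are the starting centroids."""
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the centroid of an empty cluster
        if new_centers == centers:
            break  # nothing moved, so the clusters are stable
        centers = new_centers
    return assignment, centers


# Hypothetical customers: (age, purchases per year)
customers = [(22, 3), (25, 4), (24, 2), (58, 20), (61, 22), (60, 19)]
labels, centroids = kmeans(customers, centers=[(22, 3), (58, 20)])
print(labels)     # cluster index assigned to each customer
print(centroids)  # final cluster centres
```

Each pass reassigns records and then recomputes the cluster centres, and the loop stops as soon as a pass changes nothing.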

Microsoft Association

The Microsoft Association algorithm belongs to the Apriori association family, a very efficient and popular family of algorithms for finding frequent itemsets in a dataset. The algorithm involves two steps: the first is a calculation-intensive phase that finds the frequent itemsets, and the second creates association rules based on those itemsets (Zhaohui Tang and Jamie Maclennan, 2005). The algorithm considers each attribute value as an item. It was developed mainly for market basket analysis. It produces rules that explain which items tend to appear together in a transaction (Mike Gunderloy and Joseph L. Jorden, 2006). It can find groups of items, called itemsets, within a single transaction. The algorithm searches the complete data set to discover itemsets that tend to appear in many transactions. The algorithm is controlled by parameters. The support parameter (MINIMUM_SUPPORT in SQL Server) defines how many transactions an itemset must appear in before it is considered significant (Lynn Langit, 2007). Hence, this algorithm belongs to the Apriori family and works in two steps: a calculation-intensive phase that finds frequent itemsets, followed by the creation of association rules based on them.
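
The two steps can be sketched in Python as follows. This is a simplified Apriori-style illustration, not the SQL Server implementation; the function names, the `min_support` and `min_confidence` arguments and the basket data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Larger candidates are built only from itemsets that survived this level.
        size += 1
        survivors = sorted({i for c in level for i in c})
        candidates = [
            frozenset(combo) for combo in combinations(survivors, size)
            if all(frozenset(sub) in level for sub in combinations(combo, size - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive (antecedent, consequent, confidence) rules from the frequent itemsets."""
    found = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((set(antecedent), set(itemset - antecedent), confidence))
    return found


# Hypothetical shopping baskets
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
itemsets = frequent_itemsets(baskets, min_support=2)
found_rules = association_rules(itemsets, min_confidence=0.9)
print(found_rules)
```

With these baskets, butter and milk appear together only once, so that pair is dropped in step 1 and no three-item candidate is ever generated.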

Microsoft Naive Bayes

The Microsoft Naive Bayes algorithm enables the user to quickly create models with predictive abilities and also provides a new method of exploring and understanding user data. It builds mining models that are used for classification and prediction. The algorithm calculates the probability of each possible state of every input attribute, given each state of the predictable attribute; these probabilities can later be used to predict an outcome of the predictable attribute based on the known input attributes (Jamie Maclennan, Zhaohui Tang and Bogdan Crivat, 2009). The algorithm supports only discrete or discretized attributes. All the input attributes are considered independent. It is called naïve because no single attribute is given higher significance than the others. It is considered a starting point for the data mining process: because most of the calculations used to create the model are generated during cube processing, results are returned quickly (Lynn Langit, 2007). Hence, Microsoft Naïve Bayes helps the user to quickly create models that have predictive abilities.
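
The probability calculation can be sketched in Python as below. It is a minimal illustration for discrete attributes, assuming add-one smoothing (which the text does not mention); the attribute names and rows are hypothetical.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count how often each input state occurs with each state of the predictable attribute."""
    class_counts = Counter(row[target] for row in rows)
    state_counts = defaultdict(Counter)  # (attribute, class) -> Counter of states
    attr_states = defaultdict(set)       # attribute -> all states seen in the data
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                state_counts[(attr, row[target])][value] += 1
                attr_states[attr].add(value)
    return class_counts, state_counts, attr_states


def predict(model, inputs):
    """Score each class as P(class) times the product of P(input state | class)."""
    class_counts, state_counts, attr_states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total
        for attr, value in inputs.items():
            # Add-one smoothing (an assumption) avoids multiplying by zero.
            seen = state_counts[(attr, cls)][value]
            score *= (seen + 1) / (n + len(attr_states[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical training cases with discrete attributes
rows = [
    {"commute": "short", "owns_car": "no", "buys_bike": "yes"},
    {"commute": "short", "owns_car": "yes", "buys_bike": "yes"},
    {"commute": "long", "owns_car": "yes", "buys_bike": "no"},
    {"commute": "long", "owns_car": "no", "buys_bike": "no"},
    {"commute": "short", "owns_car": "yes", "buys_bike": "no"},
]
model = train(rows, target="buys_bike")
result = predict(model, {"commute": "short", "owns_car": "no"})
print(result)
```

Training is a single counting pass over the data, which is why models of this kind are quick to build.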

Microsoft Time Series

The Microsoft Time Series algorithm is used to analyze and predict time-dependent data. It is a combination of two algorithms: the industry-standard ARIMA algorithm, which was introduced by Box and Jenkins, and the ARTxp algorithm developed by Microsoft (Brian Larson, 2008). A time series consists of data gathered over successive periods of time or other time indicators. The main aim of the algorithm is to estimate future points in the series so that valuable decisions can be taken based on past historical data. The algorithm can produce good results with a minimum of information (Jamie Maclennan, Zhaohui Tang and Bogdan Crivat, 2009). A notable feature is that it can automatically detect seasonality with the help of the fast Fourier transform, which is an efficient method for analyzing frequencies. One or more variables can be selected for prediction, and the algorithm can use cross-variable correlations in its predictions (Zhaohui Tang and Jamie Maclennan, 2005). Hence, the Microsoft Time Series algorithm creates models that can be used to predict continuous variables over time from both OLAP and relational data sources.
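
The idea of detecting seasonality from frequencies can be sketched in Python. The example below uses a direct discrete Fourier transform rather than a fast Fourier transform (it examines the same frequencies, only more slowly) and is not the SQL Server implementation; the function name and the sales series are assumptions.

```python
import cmath
import math


def dominant_period(series):
    """Return the period, in samples, of the strongest non-constant frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant component
    best_k, best_power = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Fourier coefficient for a cycle that repeats k times over the series.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return n / best_k


# Hypothetical monthly sales over four years: a slight trend plus a 12-month cycle.
sales = [100 + 0.1 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period = dominant_period(sales)
print(period)
```

A strong trend would dominate the low frequencies, so a real series would normally be detrended before a check like this.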

Microsoft Sequence Clustering

The Microsoft Sequence Clustering algorithm is mainly used to analyze sequence data, but it also has many other uses. Segmentation and sequence analysis are its fundamental features, and it can also be used for classification and regression (Lynn Langit, 2007). The algorithm is a hybrid of sequence and clustering algorithms. It groups multiple cases with sequence attributes into segments based on the similarities of these sequences (Otey, 2005). For example, it can group the customers of a Web site into more-or-less homogeneous groups based on their navigation patterns. These groups can then be visualized, providing a detailed understanding of how customers are using the site (Florent Masseglia, Pascal Poncelet and Maguelonne Teisseire, 2007). Hence, from the above discussion it can be understood that the algorithm analyzes sequence-oriented information that includes discrete-valued series. Usually the sequence attribute in the series holds a set of events in a specific order. By analyzing the transitions between states of the sequence, the algorithm can predict future states in related sequences.
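
The transition analysis can be sketched in Python with a first-order transition table. This covers only the sequence part, not the clustering part, and is not the SQL Server implementation; the function names and the click paths are assumptions.

```python
from collections import Counter, defaultdict


def transition_counts(sequences):
    """Count how often each state is followed directly by each other state."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def most_likely_next(counts, state):
    """Return (next_state, probability) for the most common follower, or None."""
    followers = counts.get(state)
    if not followers:
        return None
    nxt, n = followers.most_common(1)[0]
    return nxt, n / sum(followers.values())


# Hypothetical navigation paths of Web site visitors
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "reviews"],
    ["home", "search", "products", "cart"],
]
counts = transition_counts(sessions)
prediction = most_likely_next(counts, "products")
print(prediction)
```

A state that never has a follower in the observed sequences, such as "checkout" here, yields no prediction.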

Microsoft Neural Network

The Microsoft Neural Network algorithm creates classification and regression mining models by constructing a multilayer perceptron network of neurons. Neural network technology is being applied to more and more commercial applications. Each neuron uses a weighted-sum approach: the inputs are combined, and the output of the combination is then passed through an activation function. The Microsoft Neural Network works by creating and training artificial neural paths that are used as patterns for further prediction (Jamie Maclennan, Zhaohui Tang and Bogdan Crivat, 2009). Microsoft Neural Network models can be explored with a discrimination viewer similar to the viewers provided for the other algorithms. The algorithm processes the entire set of cases, iteratively comparing the predicted classification of the cases with their known actual classification. Neural networks are more complicated than Naïve Bayes and decision trees (Zhaohui Tang and Jamie Maclennan, 2005). Thus, when clients need an algorithm that can be applied in more than one kind of application, this algorithm is a good choice.
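
The weighted-sum approach can be sketched in Python as a forward pass through a small perceptron network. The weights, the layer sizes and the choice of a sigmoid activation are assumptions made for illustration; training is not shown.

```python
import math


def sigmoid(x):
    """A common activation function that squashes any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Combine the inputs as a weighted sum, then apply the activation function."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


def forward(inputs, hidden_layer, output_layer):
    """Pass the inputs through one hidden layer and then the output layer."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    return [neuron(hidden, w, b) for w, b in output_layer]


# Hypothetical weights: two inputs, two hidden neurons, one output neuron.
hidden_layer = [([0.5, -0.25], 0.0), ([0.1, 0.3], -0.2)]
output_layer = [([1.0, -1.0], 0.1)]
outputs = forward([1.0, 2.0], hidden_layer, output_layer)
print(outputs)
```

Training would adjust the weights repeatedly so that outputs like these move closer to the known actual values of the cases.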

Microsoft Logistic Regression

The Microsoft Logistic Regression algorithm is a variation of the Microsoft Neural Network algorithm. Logistic regression is a well-known statistical method for determining the contribution of multiple factors to a pair of outcomes (Msdn, 2009). If a problem has one of two possible outcomes, this algorithm is very useful for modeling the data. It can be used in many fields because of its flexibility (Brian Larson, 2008). It has mostly been used by statisticians to model and predict statistical and probability information based on input values. The algorithm supports the prediction of both continuous and discrete attributes (Jamie Maclennan, Zhaohui Tang and Bogdan Crivat, 2009). Hence, from the above discussion it can be understood that the logistic regression algorithm is simple and highly flexible, takes any kind of input, and supports numerous analytical tasks, such as exploring and weighting the factors that contribute to a result and classifying e-mail, documents, or other objects that have many attributes.
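
A two-outcome model of this kind can be sketched in Python with plain gradient descent. This is a generic logistic regression illustration, not the SQL Server implementation; the function names, the learning rate, the number of epochs and the study-hours data are assumptions.

```python
import math


def probability(model, x):
    """Probability of the positive outcome for one input row."""
    weights, bias = model
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit the weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = probability((weights, bias), x) - y
            # Each weight moves against its share of the prediction error.
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical data: hours studied, and whether the exam was passed (1) or not (0).
hours = [[1], [2], [3], [4], [5], [6]]
passed = [0, 0, 0, 1, 1, 1]
model = train_logistic(hours, passed)
print(probability(model, [1]), probability(model, [6]))
```

The size and sign of each fitted weight indicate how strongly, and in which direction, a factor contributes to the outcome.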

Effectiveness of Data Mining

Data mining is an effective modeling technique used in business to support decisions in organizations. Data mining techniques gather information from different areas and also use historical information. Before data mining tools can be used efficiently, large amounts of information must be available in storage. Data mining is a modeling process that transforms the information enfolded in a dataset into a form amenable to human cognition. Currently available data mining tools support only automatic modeling. Because of their features, data mining tools are used effectively in many different areas. They can be used to understand the business better and to improve future performance through predictive analytics.

Data mining is very useful for marketers because it provides detailed information about trends and customers' purchasing behavior. It may also help marketers predict which products their customers may be interested in buying. Through this prediction, marketers can surprise their customers and make the shopping experience a pleasant one. Retail stores can benefit from data mining in similar ways.

Data mining can help financial institutions in areas such as loan information and credit reporting. For example, by examining previous customers with similar attributes, a bank can estimate the level of risk associated with a given loan. Data mining can also assist credit card issuers in detecting potentially fraudulent credit card transactions.

Data mining can aid law enforcers in identifying and apprehending criminal suspects by examining trends in crime type, habit, location and other patterns of behavior. It can also assist researchers by speeding up their data analysis, allowing them more time to work on other projects.

Hence, from the above discussion it can be stated that data mining can be implemented in different areas such as banking, crime prevention and financial organizations. It is a powerful technique that is very useful for taking decisions.