# Positive And Negative Association Rules English Language Essay



Abstract- The Self-Organizing Map (SOM) is a type of neural network based on competitive, or unsupervised, learning. In unsupervised learning there is no information about a desired output, as there is in supervised learning. Instead, the approach forms abstractions by a topology-preserving mapping of high-dimensional input patterns onto a lower-dimensional set of output clusters. These clusters correspond to frequently occurring patterns of features in the input data. Owing to its simple structure and learning mechanism, SOM has been applied successfully in many domains and has proven to be an effective clustering technique. These useful properties motivate an investigation of whether the method can be applied to the problem of frequent pattern discovery in the association rule framework. More specifically, it must be determined whether the task of finding frequent patterns can be substituted with the task of finding clusters of frequently occurring patterns in the data. There is a relationship between separating frequent patterns from infrequent patterns and finding clusters in data, since a cluster could represent a particular frequently occurring pattern. A cluster would in this case correspond to a pattern, and hence it is interesting to see whether a cluster set obtained from a clustering technique can correspond to a set of frequent patterns obtained through the Apriori approach. The clustering technique can then be used to detect frequent patterns in a top-down manner, as opposed to the traditional approach that employs a bottom-up lattice search. Furthermore, we would like to find the restrictions or limitations that must be placed on the clustering technique so that the extracted patterns satisfy the specified support constraints.

Index Terms-Association Rule Mining, Data Mining, Frequent Patterns, Self-Organizing Map.

1 INTRODUCTION

Data mining, or knowledge discovery in databases, aims to find new knowledge in data. However, the dimensionality, complexity, or sheer volume of modern data is often prohibitively large for manual analysis. With huge amounts of data stored in databases, it is increasingly important to develop powerful tools for mining interesting knowledge from them. In recent years, the computational efficiency of modern hardware has made such mining fast and precise, and one of the most interesting developments in this area is the application of neural computation.

Self-Organizing Map (SOM) (Kohonen, 1990) is a type of neural network that uses the principles of competitive, or unsupervised, learning. In unsupervised learning there is no information about a desired output, as there is in supervised learning. This unsupervised learning approach forms abstractions by a topology-preserving mapping of high-dimensional input patterns onto a lower-dimensional set of output clusters (Sestito & Dillon, 1994). These clusters correspond to frequently occurring patterns of features among the input data.

The useful properties of SOM mentioned above have motivated me to investigate whether the method can be applied to the problem of frequent pattern discovery in the association rule framework. More specifically, it needs to be determined whether the notion of finding frequent patterns can be substituted with the notion of finding clusters of frequently occurring patterns in the data. There is a relationship between the task of separating frequent patterns from infrequent ones and the task of finding clusters in data, since a cluster could represent a particular frequently occurring pattern. A cluster would in this case correspond to a pattern, and hence it would be interesting to see whether a cluster set obtained from a clustering technique can correspond to a set of frequent patterns obtained through the Apriori approach. I want to investigate whether the clustering approach can automatically filter out the infrequent patterns and detect only the frequent ones; in other words, whether the clustering technique can be used to detect frequent patterns in a top-down manner, as opposed to the traditional approach that employs a bottom-up lattice search. Furthermore, I would like to find the restrictions or limitations that must be placed on the clustering technique so that the extracted patterns satisfy the specified support constraints.

The self-organizing map (SOM) [5] is an unsupervised neural network algorithm. It has been widely applied to solve problems such as pattern recognition, financial data analysis, image analysis, process monitoring, and fault diagnosis [3, 6]. Some major components of data mining are discussed and the applications of the SOM in association rules mining are proposed.

A. Data Mining

Data mining is the process of extracting patterns from data, and it is becoming an increasingly important tool for transforming raw data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery. Data mining can uncover patterns in data but is often carried out only on samples of it; the mining process will be ineffective if the samples are not a good representation of the larger body of data. An important part of the process is therefore the verification and validation of discovered patterns on other samples of data.

B. Stages of KDD

The Knowledge Discovery in Databases (KDD) process consists of five stages, presented in Figure 1:

1. Selection: creating a target data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

2. Pre-processing: cleaning and pre-processing the target data in order to obtain consistent data.

3. Transformation: transforming the data using dimensionality reduction or other transformation methods.

4. Data Mining: searching for patterns of interest in a particular representational form, depending on the objective (usually prediction).

5. Interpretation/Evaluation: interpreting and evaluating the mined patterns.

The KDD process is preceded by the development of an understanding of the application domain, the relevant prior knowledge and the goals of the end-user. It must be continued by the knowledge consolidation, incorporating this knowledge into the system (Fayyad et al, 1996).

## Figure 1: An overview of the Steps that Compose the KDD Process.

2 ASSOCIATION ANALYSES

Association rule mining is one of the most important and well-researched techniques of data mining. It aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in areas such as telecommunication networks, market and risk management, and inventory control. Various association mining techniques and algorithms are briefly introduced and compared later.

Association rule mining finds the association rules that satisfy a predefined minimum support and confidence in a given database. The problem is usually decomposed into two sub-problems. The first is to find those itemsets whose occurrences exceed a predefined threshold in the database; these itemsets are called frequent or large itemsets. The second is to generate association rules from those large itemsets under the constraint of minimal confidence. Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking its confidence, this rule can be determined to be interesting or not. Further rules are then generated by deleting the last item of the antecedent and inserting it into the consequent, and the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second sub-problem is quite straightforward, most research focuses on the first, which can itself be divided into two parts: the candidate large itemset generation process and the frequent itemset generation process. Itemsets whose support exceeds the support threshold are called large or frequent itemsets, while itemsets that are expected, or hoped, to be large or frequent are called candidate itemsets.
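The antecedent-shrinking procedure just described can be sketched in Python. The itemset, the support counts, and the `generate_rules` helper below are all invented for illustration; this sketch stops once a single item remains in the antecedent, since a rule needs a non-empty antecedent to have a well-defined confidence.

```python
def generate_rules(itemset, support, min_conf):
    """Generate rules from one large itemset by repeatedly moving the
    last antecedent item into the consequent, keeping rules whose
    confidence meets the threshold."""
    rules = []
    antecedent = list(itemset)
    consequent = []
    while len(antecedent) > 1:
        consequent.insert(0, antecedent.pop())      # move last item over
        # confidence = support(full itemset) / support(antecedent)
        conf = support[frozenset(itemset)] / support[frozenset(antecedent)]
        if conf >= min_conf:
            rules.append((tuple(antecedent), tuple(consequent), conf))
    return rules

# Made-up support counts for a toy 3-itemset:
support = {frozenset(['I1', 'I2', 'I3']): 2,
           frozenset(['I1', 'I2']): 4,
           frozenset(['I1']): 6}
print(generate_rules(['I1', 'I2', 'I3'], support, min_conf=0.5))
# → [(('I1', 'I2'), ('I3',), 0.5)]
```

The rule {I1, I2} => {I3} survives because its confidence 2/4 meets the threshold, while {I1} => {I2, I3} is discarded at confidence 2/6.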

A. Association Rule Mining Steps

Let I = {I1, I2, ..., Im} be a set of items. Let D, the task-relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with an identifier, called a TID. Let A be a set of items. A transaction T is said to contain A if and only if A ⊆ T. An association rule is an implication of the form A => B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. The rule A => B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B, or, say, both A and B). This is taken to be the probability P(A ∪ B). The rule A => B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B. This is taken to be the conditional probability P(B|A). That is,

Support (A => B) = P(A ∪ B) ............ (2.2)

Confidence (A => B) = P(B|A) ............ (2.3)

Rules that satisfy both a minimum support threshold (min sup) and a minimum confidence threshold (min conf) are called strong. By convention, we write support and confidence values as percentages between 0% and 100%, rather than as fractions between 0 and 1.0.

A set of items is referred to as an itemset. An itemset that contains k items is a k-itemset. The set {computer, antivirus software} is a 2-itemset. The occurrence frequency of an itemset is the number of transactions that contain the itemset. This is also known simply as the frequency, support count, or count of the itemset. Note that the itemset support defined in Equation (2.2) is sometimes referred to as relative support, whereas the occurrence frequency is called the absolute support. If the relative support of an itemset I satisfies a pre-specified minimum support threshold (i.e., the absolute support of I satisfies the corresponding minimum support count threshold), then I is a frequent itemset. The set of frequent k-itemsets is commonly denoted by Lk.

From Equation (2.3), we have

Confidence (A => B) = P(B|A) = support_count(A ∪ B) / support_count(A) ............ (2.4)

Equation (2.4) shows that the confidence of rule A => B can easily be derived from the support counts of A and A ∪ B. That is, once the support counts of A, B, and A ∪ B are found, it is straightforward to derive the corresponding association rules A => B and B => A and check whether they are strong. Thus the problem of mining association rules can be reduced to that of mining frequent itemsets. In general, association rule mining can be viewed as a two-step process:

1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as frequently as a predetermined minimum support count, min support.

2. Generate strong association rules from the frequent itemsets: By definition, these rules must satisfy minimum support and minimum confidence.
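The support and confidence definitions in Equations (2.2) to (2.4) can be checked on a toy transaction set; the items and transactions below are invented for illustration.

```python
# Four made-up market-basket transactions, each a set of items.
transactions = [
    {'bread', 'milk'},
    {'bread', 'milk', 'butter'},
    {'milk', 'butter'},
    {'bread'},
]

def support(itemset):
    """Relative support: the fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b):
    """Confidence of rule A => B, i.e. P(B|A) = support(A u B) / support(A)."""
    return support(a | b) / support(a)

print(support({'bread', 'milk'}))          # 2 of 4 transactions → 0.5
print(confidence({'bread'}, {'milk'}))     # 0.5 / 0.75
```

With a minimum support of 50% and a minimum confidence of 60%, the rule bread => milk (confidence 2/3) would count as strong on this toy data.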

3 SELF ORGANIZING MAP

A. Introduction

SOM (Kohonen, 1990) is an unsupervised neural network that effectively creates spatially organized "internal representations" of the features and abstractions detected in the input space. It consists of an input layer and an output layer in the form of a map (see figure 1). SOM is based on competition among the cells in the map for the best match with a presented input pattern. Each node in the map has an associated weight vector, consisting of the weights on the links from the input layer to that particular node. When an input pattern is presented to the network, one node among all the output nodes is selected as having the best response according to some criterion.

## Figure 1: SOM consisting of two input nodes and 3 * 3 map

This output node is declared the "winner" and is usually the cell having the smallest Euclidean distance between its weight vector and the presented input vector. The winner and its neighboring cells are then updated to match the presented input pattern more closely. The neighborhood size and the magnitude of the update shrink as training proceeds. After the learning phase, cells that respond in a similar manner to the presented input patterns are located close to each other, and so clusters can be formed in the map. Existing similarities in the input space are revealed through the ordered, topology-preserving mapping of high-dimensional input patterns onto a lower-dimensional set of output clusters.

When used for classification purposes, SOM is commonly integrated with a type of supervised learning in order to assign appropriate class labels to the clusters. After the learning phase has come to completion, the weights on the links can be analyzed in order to represent the learned knowledge in a symbolic form. One such method is used in "Unsupervised BRAINNE" (Sestito & Dillon, 1994), which extracts a set of symbolic knowledge structures in the form of concepts and concept hierarchies from a trained neural network. A similar method is used in the current work. After the supervised learning is complete, each cluster will have a rule or pattern associated with it, which determines which data objects are covered by that cluster.

It was mentioned that SOM is good at mapping high-dimensional input patterns onto a lower-dimensional set of output clusters. As we map a higher-dimensional space into a lower-dimensional space, there are only two possibilities: either the data in the lower-dimensional space represents the data from the higher-dimensional space in compressed form, or the data in the lower-dimensional space is corrupted because there is not enough resolution (map size) to store it. Putting frequent pattern mining into this context, we will investigate a relationship between isolating frequent patterns from infrequent patterns and mapping a higher-dimensional space into a lower-dimensional space. The relationship becomes important when there is a correspondence between the two frameworks. The key to realizing such a correspondence lies in understanding the way SOM learns without supervision.

The notion of competitive learning implies that the knowledge learned by SOM is acquired competitively, based on certain criteria. Therefore, with a smaller output space, the level of competition increases. Patterns exposed more frequently to the SOM in the training phase are more likely to be learned, while those exposed less frequently are more likely to be disregarded. The number of nodes in the output space, width × height, is normally set less than or equal to 2n, where n denotes the number of features in the input space, i.e. the dimension of the input space.

The frequently occurring patterns (or rules) will influence the organization of the map to the highest degree, and hence after training the resulting clusters correspond to the most frequently occurring patterns. The patterns that are not covered by any clusters have not occurred often enough to have an impact on the self-organization of the map and are often masked out by frequent patterns.

Input Transformation: A transactional record, R, can be represented as a tuple of items from 1 to n, {I1, I2, ..., In}. A transactional database, T, of size s, consists of many records {R1, ..., Rs}. For a record Ri, if an item Ij is part of the record then the corresponding value is 1, and 0 otherwise. Representing transactional records in this way is suitable for the SOM's input layer.
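This binary transformation can be sketched directly; the item universe and the transactions below are invented for illustration.

```python
# Ordered item universe: the position of each item fixes its vector component.
items = ['I1', 'I2', 'I3', 'I4']
transactions = [{'I1', 'I3'}, {'I2'}, {'I1', 'I2', 'I4'}]

# Each transaction becomes a binary vector: 1 where the item is present.
vectors = [[1 if item in t else 0 for item in items] for t in transactions]
print(vectors)   # [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
```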

Training the SOM: The SOM needs to be trained on a large set of training data until it reaches a terminating condition. The terminating condition is reached when either the number of training iterations (epochs) has reached its maximum, or the mean square error (mse) has reached its minimum or a pre-specified limit. The mse limit should be chosen small enough that the necessary correlations can be learned; on the other hand, it should be large enough to avoid over-fitting, which decreases the generalization capability (Sestito & Dillon, 1994).
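The terminating condition can be sketched as a small helper; the per-epoch error values, the function name, and the thresholds are all invented for illustration.

```python
def stopping_epoch(errors_per_epoch, max_epochs, mse_limit):
    """Return the epoch (and mse) at which training would stop:
    either the mse falls to the pre-specified limit, or the maximum
    number of epochs is reached."""
    for epoch, mse in enumerate(errors_per_epoch, start=1):
        if mse <= mse_limit or epoch >= max_epochs:
            return epoch, mse
    return len(errors_per_epoch), errors_per_epoch[-1]

# A made-up, decreasing error curve: training stops at epoch 4 (mse 0.05).
print(stopping_epoch([0.9, 0.5, 0.2, 0.05, 0.01], max_epochs=100, mse_limit=0.1))
# → (4, 0.05)
```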

B. Self-Organizing Map in Association Rules Mining

In association rules mining [12], a set of items is referred to as an itemset, and an itemset that contains k items is a k-itemset. Suppose I = {i1, i2, ..., im} is a set of items and D is a set of database transactions, where each transaction T is a set of items such that T ⊆ I. Let A be a set of items. A transaction T is said to contain A if and only if A ⊆ T. An association rule is an implication of the form A => B, where A ⊂ I, B ⊂ I and A ∩ B = ∅. It has the following two significant properties:

## Support: (A => B) = P(A ∪ B)

The fraction of transactions in D that contain both A and B.

## Confidence: (A => B) = P(B|A)

The fraction of transactions in D containing A that also contain B.

Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong. Association rules mining generates the rules whose support and confidence are greater than or equal to the minimum support and confidence. An itemset that satisfies minimum support is called a frequent itemset, and the set of frequent k-itemsets is commonly denoted by Lk.

C. SOM Clustering

Transactions in a database, as in Table 1, can be modeled as data vectors: each transaction is converted to an input vector whose ith component is 1 if item i appears in the transaction, and 0 otherwise. This data modeling can be done when the transactions are extracted from the database. To train on the input vectors, a SOM or GHSOM [3] neural network is initialized to generate map units. Unlike the networks used for other pattern tasks, in this particular training the network has only one vector in each row, and each neuron has neighbors only in different rows. From the map units, one can easily find the relationships between different items.

## Table 1. Transaction data for Allen's Electronics Branch

Transactions in Table 1 are converted to SOM input samples as follows:

The SOM algorithm has two phases [5]. The first is the search phase, during which each node i computes the Euclidean distance between its weight vector wi(t) and the input vector T(t):

di(t) = || T(t) - wi(t) ||

It then chooses the closest neuron i0 such that

|| T(t) - wi0(t) || = min over all i of || T(t) - wi(t) ||

where neuron i0 is called the winner or winning cell. The second phase is the update phase, during which a small number of neurons within a neighborhood around the winning cell, including the winning cell itself, are updated by

wi(t+1) = wi(t) + α(t) [ T(t) - wi(t) ]

where α(t) is the learning constant.
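The two phases can be sketched as follows. This is a minimal sketch under simplified assumptions, not the exact configuration used here: a 1-D chain of neurons stands in for the 2-D map, the learning rate and neighborhood radius are fixed, and `train_step` is an invented helper name.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_step(weights, x, alpha, radius):
    """One SOM iteration on a 1-D chain of neurons; returns the winner index."""
    # Search phase: the winner is the neuron closest to the input vector.
    winner = min(range(len(weights)), key=lambda i: euclidean(weights[i], x))
    # Update phase: the winner and neighbours within `radius` move toward x.
    for i in range(len(weights)):
        if abs(i - winner) <= radius:
            weights[i] = [w + alpha * (xj - w) for w, xj in zip(weights[i], x)]
    return winner

random.seed(0)
weights = [[random.random() for _ in range(3)] for _ in range(5)]
winner = train_step(weights, [1.0, 0.0, 1.0], alpha=0.5, radius=1)
print(winner)
```

Repeating `train_step` over many input vectors, while shrinking `alpha` and `radius`, is what produces the ordered map described above.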

D. Association Rules Mining on SOM Clusters

The purpose of using SOM to train the data set is to obtain a visualization of the structure of the transactions and to reduce the association rules mining time. After iterations of SOM training, if two or more neurons are in the same cluster, these neurons have one or more similar or identical components in their weight vectors; therefore, transactions which share some, or even exactly the same, items will be in the same cluster. The more transactions in one cluster, the higher the support of the itemsets in that cluster. Using the SOM-trained outputs, we have the following two approaches to generate association rules for the database:

1. Obtain the association rules by observing the SOM-trained map units. Fig. 2 shows the map units after 5000 iterations of SOM training on the transactions of Allen's Electronics Branch [1]. From the three classes in this figure, we can draw immediate conclusions about the association rules for this mining task. For example, the itemsets {I5, I6}, {I3, I5, I6}, {I2, I5, I6}, {I5, I6, I10}, and {I1, I4, I9} and their subsets have larger support counts than any other itemsets, so these items are likely to be associated by rules.

2. Generate association rules for a given minimum support count. For a k-itemset, in Fig. 2, the support count of a 1-itemset is the total count of 1s in its column. If the support count of a 1-itemset is less than the minimum support count, then the support count of any k-itemset containing this 1-itemset will also be less than the minimum support count. To save time, we remove those columns whose total counts in the map units are less than the minimum support count, then use the Apriori algorithm to generate strong association rules for the remaining items. This takes less scanning time but generates the same association rules for the database. In Fig. 2, if the minimum support count is 4, then column 7 and column 13 will be removed from the map units because their support counts are 3 and 2 respectively. Table 2 shows the remaining items and their 1-itemset support counts; the association rules are generated based on these items.
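The column-pruning step in point 2 above can be sketched as follows. The binary rows stand in for trained map units, and both the rows and the minimum support count are invented for illustration.

```python
# Made-up map units: each row is a binary vector, each column a 1-itemset.
rows = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
min_support_count = 2

# The support count of each 1-itemset is the total of 1s in its column.
column_counts = [sum(col) for col in zip(*rows)]

# Keep only columns meeting the minimum support count; by the Apriori
# property, no larger itemset containing a dropped item can be frequent.
kept_columns = [j for j, c in enumerate(column_counts) if c >= min_support_count]
pruned = [[row[j] for j in kept_columns] for row in rows]

print(column_counts)   # → [3, 3, 1, 2]
print(kept_columns)    # → [0, 1, 3]  (column 2 is removed)
```

The Apriori algorithm would then run only on the `pruned` columns, which is where the scanning-time saving comes from.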

## Fig. 2. Outputs of 5000 iterations of SOM training on the transactions

The benefit of this algorithm is that a large database may contain thousands of transactions for analysis, but we are interested only in the data that appear repeatedly. Items that appear in only a few transactions may not be important for the mining.

Table 2. Items used to generate association rules for the database

In Apriori association rules mining, the computation of the frequent k-itemsets Lk (k = 1, 2, ...) and their candidates Ck can be very tedious. We use SOM clustering to classify the transaction data; it organizes data into neighborhoods when they share similar properties (the same items appearing in many different transactions). The structure of the data is therefore visualized, and from the structured map we can remove some columns and choose data for association rules mining.

4 CONCLUSIONS AND FUTURE WORK

Many algorithms have been proposed to use SOM for data mining. The Apriori algorithm is a very efficient utility when the database is relatively small; however, it takes too much time to repeatedly scan a large-scale database. The combination of SOM training and the Apriori algorithm takes advantage of both. In this algorithm, each input vector represents a transaction in a database, and a Kohonen self-organizing map is used to train the input vectors. By the principles of the SOM, a neighborhood-similarity neural network, we obtain a visualization of the relationships between items in the database. The simulation experiment shows that the feature map can be formed in a very short time. By reviewing the SOM outputs, one can easily determine the positive and negative association rules in the database. The proposed algorithm clearly makes data mining significantly more efficient.