# Interestingness Measures For Multi Level Association Rules Education Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Lenca et al. state that interestingness measures are necessary to rank association rule patterns. Each interestingness measure produces different results, and opinions differ on what constitutes a good rule. The interestingness of discovered association rules is an important and active area within data mining research [12], and the primary problem is the selection of interestingness measures for a given application domain. Association rule algorithms produce thousands of rules, many of which are redundant [38, 41]. To filter the rules, the user generally supplies minimum thresholds for support and confidence. Support and confidence [2, 9, 36] are the basic measures of association rule interestingness. All of these measures were proposed for association rules derived from single-level, or flat, datasets, most commonly transactional datasets.

Today, multi-level datasets are common in many domains. With this increase in usage, there is a strong demand for techniques that discover multi-level and cross-level association rules, and for techniques that measure the interestingness of rules derived from multi-level datasets. J. Han et al. [22, 23, 78] proposed approaches for multi-level and cross-level frequent itemset discovery. However, multi-level datasets often yield so many rules that it becomes much more difficult to determine which ones are interesting [2, 9]. Moreover, the existing interestingness measures for single-level association rules cannot accurately measure the interestingness of multi-level rules, since they do not take into account the hierarchical structure that exists in multi-level datasets.

This chapter proposes measures specifically for assessing the interestingness of multi-level association rules by examining the diversity and peculiarity among rules. These measures can be computed in the post-processing phase of rule discovery to help users identify the interesting rules. Diversity can also be used to compare two data sets: the one with more diverse rules is considered more interesting, so diversity indicates which data set contains the more interesting rules.

The remainder of this chapter discusses related work, then presents the theory, background and assumptions behind the proposed interestingness measures, and finally presents experiments and results.

## 2. Related Work

For as long as association rule mining has existed, there has been a need to determine which rules are interesting. This started with the concepts of support and confidence [2]; since then, many more measures have been proposed [9, 12, 36]. The support-confidence approach is appealing because of the anti-monotonicity property of support. However, the support component ignores itemsets with low support even though these itemsets may generate rules with high confidence [36]. Also, the support-confidence approach does not ensure that the rules are truly interesting, especially when the confidence of a rule equals the marginal frequency of its consequent [36]. For these reasons, other measures for determining the interestingness of a rule are needed.
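The last point can be made concrete with a minimal Python sketch. The baskets and item names below are hypothetical toy data invented for illustration, not drawn from the cited works: the rule bread → milk has a confidence of 0.8, which looks strong, yet milk occurs in 80% of all baskets anyway, so the rule's confidence equals the marginal frequency of its consequent and knowing that a basket contains bread says nothing new about milk.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: Itemset, consequent: Itemset,
               transactions: List[Itemset]) -> float:
    """Confidence of antecedent -> consequent: support(X | Y) / support(X)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Hypothetical toy baskets: "milk" occurs in 8 of 10 baskets whether or not
# "bread" is present, so bread carries no information about milk.
transactions = [frozenset(basket) for basket in [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk"},
    {"bread"},
    {"milk"}, {"milk"}, {"milk"}, {"milk"},
    {"eggs"},
]]

bread, milk = frozenset({"bread"}), frozenset({"milk"})
rule_confidence = confidence(bread, milk, transactions)
milk_marginal = support(milk, transactions)
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")  # 0.80
print(f"support(milk)             = {milk_marginal:.2f}")    # 0.80
```

Support and confidence alone would keep this rule, even though bread and milk are independent in these toy data.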

All of these existing measures fall into three categories: objective measures (based on the raw data), subjective measures (based on the raw data and the user) and semantic measures (based on the semantics and explanations of the patterns) [12]. The survey by Geng et al. [12] lists nine criteria that can be used to determine whether a pattern or rule is interesting: conciseness, coverage, reliability, peculiarity, diversity, novelty, surprisingness, utility and actionability (or applicability). The first five criteria are considered objective, the next two (novelty and surprisingness) subjective, and the final two semantic.

Despite all the different measures, studies and work undertaken, there is no widely agreed formal definition of interestingness in the context of patterns and association rules [12]. More recently, several surveys of interestingness measures have been presented [12, 36, 37, 43]. The survey by McGray et al. (2008) [43] evaluated the strengths and weaknesses of various measures from the point of view of the level or extent of user interaction. The survey by P. Lenca et al. (2007) [37] classified various interestingness measures into five formal and five experimental classes, along with eight evaluation properties. However, these surveys reach different conclusions about how useful or suitable a given interestingness measure is. Therefore, the usefulness of a measure can be considered subjective.

All of the measures mentioned above are for rules derived from single-level datasets. They operate on items at a single level and cannot compare items at different levels, or rules that contain items from multiple levels.

Hilderman et al. [29] have established five primary principles that a good interestingness measure should satisfy:

1. The minimum value principle: a uniform distribution is the most uninteresting.
2. The maximum value principle: the most uneven distribution is the most interesting.
3. The skewness principle: the interestingness of the most uneven distribution decreases as the number of classes of tuples increases.
4. The permutation invariance principle: interestingness for diversity is unrelated to the order of the classes and is determined only by the distribution of counts.
5. The transfer principle: interestingness increases when a positive transfer is made from the count of one tuple to another tuple whose count is greater.
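The principles can be checked numerically. The Python sketch below uses a simple variance-of-proportions score as a stand-in diversity measure; it is chosen only for illustration, is not one of the measures proposed in this chapter, and the class counts are hypothetical. Each print corresponds to one principle.

```python
from itertools import permutations
from typing import Sequence


def variance_score(counts: Sequence[int]) -> float:
    """Illustrative stand-in diversity score (not from this chapter): variance
    of the class proportions around the uniform proportion 1/m."""
    m = len(counts)
    total = sum(counts)
    uniform = 1.0 / m
    return sum((c / total - uniform) ** 2 for c in counts) / (m - 1)


# Minimum value: a uniform distribution scores 0.
print(variance_score([5, 5, 5, 5]))  # 0.0
# Maximum value: the most uneven split of 20 tuples over 4 classes scores highest.
print(variance_score([17, 1, 1, 1]), variance_score([8, 4, 4, 4]))
# Skewness: the most uneven distribution scores lower when there are more classes.
print(variance_score([19, 1]), variance_score([17, 1, 1, 1]))
# Permutation invariance: reordering the classes leaves the score unchanged.
print({round(variance_score(p), 12) for p in permutations([7, 2, 1])})
# Transfer: moving one count from the smaller class to the larger raises the score.
print(variance_score([6, 4]), variance_score([7, 3]))
```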

In this work, we propose measuring the interestingness of multi-level rules in terms of diversity and peculiarity (also known as distance). These measures were chosen because they are considered objective, that is, they rely only on the data.

## Diversity-Based Interestingness Measures

According to Geng et al., a "pattern is diverse if its elements differ significantly from each other, while a set of patterns is diverse if the patterns in the set differ significantly from each other" [12]. Summaries can be measured using diversity-based interestingness measures. This chapter proposes two diversity measures for association rules extracted from a multi-level dataset that take the structural information of the nodes (items) into consideration. Each measure quantifies the difference between the items (topics) of a rule based on their positions in the multi-level dataset's hierarchy. A diverse rule may be interesting because its items differ significantly from each other, so the rule departs from the uniform distribution, which is the least interesting.

In the proposed work, the diversity of an association rule can be measured using two different approaches: overall diversity and antecedent-consequent diversity. Both approaches are based on two distances between items, the Hierarchical Relationship Distance (HRD) and the Concept Level Distance (LD), which are combined to determine the diversity of a rule.

## Proposed Hierarchical Relationship Distance Measure

In the proposed work, the Hierarchical Relationship Distance (HRD) measures the strength of the relationship between two items ni and nj based on their ancestry. It is mainly based on the horizontal distance between the two nodes, that is, the distance between them via their common ancestor in the multi-level dataset's hierarchy. The greater the horizontal distance between two items, the weaker their relationship. For example, sibling nodes have little diversity because they share the same parent and their horizontal distance is minimal, so their similarity is strong. As the distance from two nodes to their common ancestor increases, the horizontal distance between them also increases; their degree of similarity therefore decreases and their diversity increases.

Maximum HRD is achieved when two items do not share a common ancestor (other than the root) and are both located at the lowest concept level in the hierarchy. Placing both items at the lowest concept level ensures that they have the minimal degree of similarity and thus maximum diversity.

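Thus, the Hierarchical Relationship Distance (HRD) of two items ni, nj ∈ C, where C is the set of all items or concepts in the multi-level dataset, is defined as

$$\mathrm{HRD}(n_i, n_j) = \frac{\mathrm{NLD}(n_i, \mathrm{ca}(n_i, n_j)) + \mathrm{NLD}(n_j, \mathrm{ca}(n_i, n_j))}{2 \times \mathrm{TreeHeight}} \quad (6.1)$$

where: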

ca(ni, nj) is the closest common ancestor of ni and nj in the hierarchy, where ni, nj ∈ C and neither ni nor nj is the root of the hierarchy.

Either ni or nj can itself be the common ancestor: ni is the common ancestor if ni is an ancestor of nj, and similarly nj is the common ancestor if nj is an ancestor of ni.

TreeHeight is the maximum number of concepts or items on a path in the multi-level dataset from the root to a concept located at the lowest concept level in the dataset.

Hierarchy level of an item is the depth of the item in the hierarchical tree, i.e., the hierarchy level of the root is 0 and the hierarchy level of an item is larger than the level of its parent by 1.

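NLD(ni, nj) denotes the Number of Levels Difference between two concepts ni, nj, defined as the number of hierarchy concept levels between ni and nj:

$$\mathrm{NLD}(n_i, n_j) = \lvert \mathrm{level}(n_i) - \mathrm{level}(n_j) \rvert \quad (6.2)$$

where level(·) is the hierarchy level defined above.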

The Hierarchical Relationship Distance between two items is thus the ratio of the average number of levels between each item and their common ancestor to the height of the tree. If two items share a direct parent, each is one level below their common ancestor, so their HRD is 2/(2 × TreeHeight) = 1/TreeHeight, the lowest value for two items neither of which is an ancestor of the other. If the two items have no common ancestor other than the root, their HRD can be high; the maximum HRD value of 1 is achieved when, in addition, both items are located at the lowest concept level. If the two items are the same, then HRD is 1/TreeHeight.
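The definition can be checked with a small Python sketch. Figure 6.1 is not reproduced in this text, so the tree below is hypothetical: we chose its shape only so that it reproduces the distances reported in this section (n3 and n7 at level 1, n5 at level 2, n6 at level 3 under n1's grandparent, and n1, n2 and n4 at level 4), and the internal nodes x2, y3, z2, w3, a1, a2 and a3 are invented. The helper names are ours.

```python
from typing import Dict, List

ROOT = "root"

# Hypothetical hierarchy (child -> parent). Figure 6.1 is not reproduced here;
# this shape was chosen only so that the distances reported in the text are
# reproduced. x2, y3, z2, w3, a1, a2 and a3 are invented internal nodes.
PARENT: Dict[str, str] = {
    "n3": ROOT, "n7": ROOT, "a1": ROOT,              # level 1
    "x2": "n3", "n5": "n3", "z2": "n3", "a2": "a1",  # level 2
    "y3": "x2", "n6": "x2", "w3": "z2", "a3": "a2",  # level 3
    "n1": "y3", "n4": "w3", "n2": "a3",              # level 4
}


def path_to_root(node: str) -> List[str]:
    """The node itself followed by its ancestors, ending with the root."""
    path = [node]
    while node != ROOT:
        node = PARENT[node]
        path.append(node)
    return path


def level(node: str) -> int:
    """Hierarchy level: 0 for the root, parent's level + 1 otherwise."""
    return len(path_to_root(node)) - 1


TREE_HEIGHT = max(level(n) for n in PARENT)  # 4 for this tree


def common_ancestor(ni: str, nj: str) -> str:
    """Closest common ancestor: ni (or nj) itself when one is an ancestor of
    the other, and the root when the items are otherwise unrelated."""
    ancestors_of_nj = set(path_to_root(nj))
    return next(node for node in path_to_root(ni) if node in ancestors_of_nj)


def nld(ni: str, nj: str) -> int:
    """Number of levels difference (Equation 6.2)."""
    return abs(level(ni) - level(nj))


def hrd(ni: str, nj: str) -> float:
    """(NLD(ni, ca) + NLD(nj, ca)) / (2 * TreeHeight). Identical items give 0
    here; the text states 1/TreeHeight for that case, which this sketch does
    not special-case."""
    ca = common_ancestor(ni, nj)
    return (nld(ni, ca) + nld(nj, ca)) / (2 * TREE_HEIGHT)


for other in ["n2", "n4", "n3", "n5"]:
    print(f"HRD(n1, {other}) = {hrd('n1', other)}")
# HRD(n1, n2) = 1.0
# HRD(n1, n4) = 0.75
# HRD(n1, n3) = 0.375
# HRD(n1, n5) = 0.5
```

Walking each node's path to the root is enough for small taxonomies; a large hierarchy would cache levels and ancestor sets.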

Figure 6.1. Amazon taxonomy snippet showing highlighted topics for HRD and LD examples.

For the seven concepts n1, n2, n3, n4, n5, n6 and n7 highlighted in Figure 6.1, consider the hierarchical relationship distances between n1 and n2, n1 and n4, n1 and n3, and n1 and n5. For n1 and n2, there is no common ancestor other than the root, so ca(n1, n2) = root, NLD(n1, root) = |4 − 0| = 4 and, similarly, NLD(n2, root) = |4 − 0| = 4. Thus HRD(n1, n2) = (4 + 4)/(2 × 4) = 1. Similarly, with n3 (at level 1) as the common ancestor, HRD(n1, n4) = (3 + 3)/8 = 0.75, HRD(n1, n3) = (3 + 0)/8 = 0.375 and HRD(n1, n5) = (3 + 1)/8 = 0.5.

From the example above, n1 and n2 achieve the maximum HRD of 1. These two nodes have no common ancestor other than the root and are both located at the lowest concept level; they are therefore highly diverse in this dataset.

Compare n1 and n2 with n1 and n4, where HRD(n1, n2) = 1.00 and HRD(n1, n4) = 0.75. Both n1 and n4 are at the lowest concept level, but they share a common ancestor, n3. This means that n1 and n4 are more similar than n1 and n2.

HRD(n1, n3) = 0.375 for two reasons:

1. Only one of the nodes (n1) is at the lowest concept level, while the other is at a higher level, increasing the chance of similarity.
2. n3 is an ancestor of n1, so the common ancestor of n1 and n3 is n3 itself.

As a result, only one NLD component in the HRD equation contributes to the diversity: the distance from n3 to itself is NLD(n3, n3) = 0, so that component adds nothing.

Finally, HRD(n1, n5) = 0.5, which is greater than HRD(n1, n3) because n5 is not an ancestor of n1; instead, n1 and n5 share the common ancestor n3. However, n1 and n5 do not score as high as n1 and n4, because n5 is closer to the common ancestor n3 than n4 is. The distance between n1 and n5 is therefore less than the distance between n1 and n4, and the lower score reflects their stronger relationship.

## Proposed Concept Level Distance Measure

In the proposed work, the second aspect to be considered is the Concept Level Distance (LD) of two items, which is based on the hierarchy levels of the two items. Two items on the same hierarchy level are not considered diverse, whereas two items on different levels are more diverse because they have different degrees of abstractness. LD is similar to measuring the generational difference between two members of a family. This measure gives no consideration to whether the two nodes are related by a common ancestor, whether one is an ancestor of the other, or whether they share no common ancestor at all. LD differs from HRD in that HRD measures how far two nodes are from a common ancestor item (or the root) to gauge the strength of their relationship, whereas LD measures the difference between the two items' specificity. In other words, LD measures the distance between two items in terms of their height (vertical) difference only, whereas HRD considers the width (horizontal) distance when determining diversity.

Thus, the proposed work measures the Level Distance of two concepts ni, nj ∈ C as the ratio between their level difference (NLD) and the maximum possible level difference in the tree, which is TreeHeight − 1 because concepts occupy levels 1 to TreeHeight:

$$\mathrm{LD}(n_i, n_j) = \frac{\mathrm{NLD}(n_i, n_j)}{\mathrm{TreeHeight} - 1} \quad (6.3)$$

This means that two items on the same concept level have an LD of 0, while an item at the highest concept level and another at the lowest concept level have an LD of 1, as they are as far apart as possible in the given hierarchy.
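A minimal Python sketch of the Level Distance, assuming the denominator TreeHeight − 1 (the largest level gap between two non-root concepts), which is the form that reproduces the values in Example 6.1 below. The levels of n1, n2, n3 and n4 follow the text; those of n5, n6 and n7 are our inference from the LD values reported in Examples 6.1, 6.2 and 6.5.

```python
from typing import Dict

TREE_HEIGHT = 4

# Hierarchy levels of the example concepts. n1, n2, n4 (level 4) and n3
# (level 1) follow the text; n5, n6 and n7 are inferred from reported LD values.
LEVEL: Dict[str, int] = {"n1": 4, "n2": 4, "n3": 1, "n4": 4,
                         "n5": 2, "n6": 3, "n7": 1}


def nld(ni: str, nj: str) -> int:
    """Number of levels difference between two concepts."""
    return abs(LEVEL[ni] - LEVEL[nj])


def ld(ni: str, nj: str) -> float:
    """Concept Level Distance. Assumption: the denominator is TreeHeight - 1,
    the largest level gap between two non-root concepts (levels 1..TreeHeight)."""
    return nld(ni, nj) / (TREE_HEIGHT - 1)


for other in ["n2", "n4", "n3", "n5"]:
    print(f"LD(n1, {other}) = {ld('n1', other):.3f}")
# LD(n1, n2) = 0.000
# LD(n1, n4) = 0.000
# LD(n1, n3) = 1.000
# LD(n1, n5) = 0.667
```

The exact value for n1 and n5 is 2/3, which the text truncates to 0.666. Because LD looks only at levels, it needs no ancestry information at all, unlike HRD.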

## Example 6.1.

For the seven concepts n1, n2, n3, n4, n5, n6 and n7 highlighted in red in Figure 6.1, Equation 6.3 gives the following concept level distances between n1 and n2, n1 and n4, n1 and n3, and n1 and n5: LD(n1, n2) = 0, LD(n1, n4) = 0, LD(n1, n3) = 1 and LD(n1, n5) = 0.666.

These examples show that both n1 & n2 and n1 & n4 have an LD of 0, because n1, n2 and n4 are all at the same concept level and therefore have the same level of abstractness. In terms of Hierarchical Relationship Distance, however, n1 and n2 have an HRD of 1, while n1 and n4 have an HRD of 0.75. From the point of view of LD, these pairs are not diverse because they all come from the same concept level, but from the point of view of HRD, n1 and n2 are diverse because they are unrelated. This shows that HRD and LD measure two different aspects of the nodes in a hierarchy.

The pair n1 and n3 achieves the maximum LD of 1, because one node is at the lowest concept level and the other is at the highest concept level, which maximizes the number of concept levels between them. However, the HRD of this pair is only 0.375, because n3 is an ancestor of n1 and the two therefore have a strong relationship. LD does not take this into consideration.

For n1 and n5, the LD is 0.666 while the HRD is 0.5; the HRD value reflects the relationship between n1 and n5 through their common ancestor n3. The LD value of 0.666 arises because the two nodes are on different concept levels, but, unlike n1 and n3, which are separated by as many concept levels as possible, n1 and n5 are separated by fewer levels. Thus n1 and n5 are less diverse than n1 and n3 in terms of their levels of abstraction (specificity).

The following two sections propose diversity measures based on the HRD and LD distances.

## Proposed Antecedent-Consequent Diversity Measure

The first proposed approach to measuring the diversity is known as the Antecedent-Consequent Diversity measure. This approach measures the diversity between the set of items in the antecedent and the set of items in the consequent of an association rule. Those rules which have a high difference between their antecedent and consequent will have a high antecedent-consequent diversity.

Let R be a rule R: a1, a2, a3, …, an → c1, c2, c3, …, cm, with n items in the antecedent and m items in the consequent, where a1, …, an, c1, …, cm ∈ C, and let DAC(R) denote the antecedent-consequent diversity of R. The diversity of R can be determined as follows:

$$D_{AC}(R) = \alpha_2 \, \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{HRD}(a_i, c_j)}{n \times m} + \beta_2 \, \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{LD}(a_i, c_j)}{n \times m} \quad (6.4)$$

where α2 and β2 are weighting factors such that α2 + β2 = 1.

The first component in Equation 6.4 is the average diversity between the items in the antecedent and the items in the consequent in terms of the HRD distance, and the second component is the corresponding average in terms of the LD distance. The measure therefore captures the average diversity between the antecedent and the consequent within a rule, giving an overall internal measure of the differences between the rule's antecedent and its consequent.
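Equation 6.4 can be sketched in Python as follows. To stay independent of the hierarchy, the sketch takes the pairwise HRD and LD values as lookup tables copied from Examples 6.2 and 6.3 (with 0.666 written as 2/3); the function and table names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Sequence

Distance = Callable[[str, str], float]

# Pairwise distances quoted in Examples 6.2 and 6.3; the text's 0.666 is
# written here as 2/3. Pairs are unordered, hence the frozenset keys.
HRD_TABLE: Dict[FrozenSet[str], float] = {
    frozenset({"n1", "n7"}): 0.625, frozenset({"n6", "n7"}): 0.5,
    frozenset({"n1", "n4"}): 0.75, frozenset({"n7", "n4"}): 0.625,
}
LD_TABLE: Dict[FrozenSet[str], float] = {
    frozenset({"n1", "n7"}): 1.0, frozenset({"n6", "n7"}): 2 / 3,
    frozenset({"n1", "n4"}): 0.0, frozenset({"n7", "n4"}): 1.0,
}


def hrd(a: str, b: str) -> float:
    return HRD_TABLE[frozenset({a, b})]


def ld(a: str, b: str) -> float:
    return LD_TABLE[frozenset({a, b})]


def antecedent_consequent_diversity(antecedent: Sequence[str],
                                    consequent: Sequence[str],
                                    hrd_fn: Distance, ld_fn: Distance,
                                    alpha: float = 0.5,
                                    beta: float = 0.5) -> float:
    """Equation 6.4: alpha * mean HRD + beta * mean LD over all n * m
    (antecedent item, consequent item) pairs."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    pairs = [(a, c) for a in antecedent for c in consequent]
    mean_hrd = sum(hrd_fn(a, c) for a, c in pairs) / len(pairs)
    mean_ld = sum(ld_fn(a, c) for a, c in pairs) / len(pairs)
    return alpha * mean_hrd + beta * mean_ld


d1 = antecedent_consequent_diversity(["n1", "n6"], ["n7"], hrd, ld)
d2 = antecedent_consequent_diversity(["n1", "n7"], ["n4"], hrd, ld)
print(d1, d2)
```

With α2 = β2 = 0.5, `d1` is about 0.698 and `d2` is exactly 0.59375, matching Examples 6.2 and 6.3 up to the rounding used in the text. Passing the distance functions in as parameters keeps the measure independent of how the hierarchy is stored.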

Example 6.2. Determine the antecedent-consequent diversity of rule n1, n6 → n7. The HRD and LD distances between pairs of items in the antecedent and consequent of this rule are HRD(n1, n7) = 0.625, LD(n1, n7) = 1, HRD(n6, n7) = 0.5 and LD(n6, n7) = 0.666.

For this rule, the number of items in the antecedent is 2 (i.e., n = 2) and the number of items in the consequent is 1 (i.e., m = 1). With α2 and β2 both set to 0.5, the antecedent-consequent diversity is:

$$D_{AC} = 0.5 \times \frac{0.625 + 0.5}{2 \times 1} + 0.5 \times \frac{1 + 0.666}{2 \times 1} = 0.28125 + 0.4165 \approx 0.6977$$

Example 6.3. Determine the antecedent-consequent diversity of rule n1, n7 → n4.

The HRD and LD distances between pairs of items in the antecedent and consequent of this rule are HRD(n1, n4) = 0.75, LD(n1, n4) = 0, HRD(n7, n4) = 0.625 and LD(n7, n4) = 1.

For this rule, the number of items in the antecedent is 2 and the number of items in the consequent is 1. With α2 and β2 both set to 0.5, the antecedent-consequent diversity is:

$$D_{AC} = 0.5 \times \frac{0.75 + 0.625}{2 \times 1} + 0.5 \times \frac{0 + 1}{2 \times 1} = 0.34375 + 0.25 = 0.59375$$

## Example 6.4. Comparison of the antecedent-consequent diversity of rules n1, n6 → n7 and n1, n7 → n4.

As Examples 6.2 and 6.3 show, the antecedent-consequent diversity of rule n1, n6 → n7 is 0.6977, while that of rule n1, n7 → n4 is 0.5937. The first rule therefore has more diversity between its antecedent and consequent sets than the second rule. For the first rule, antecedent-consequent diversity considers the node pairs n1 & n7 and n6 & n7, while for the second rule it considers the node pairs n1 & n4 and n7 & n4. Figure 6.1 shows that in each of the pairs n1 & n7, n7 & n4 and n6 & n7, the two nodes are in separate branches, which increases their diversity. In the second rule, however, the two nodes of the pair n1 & n4 come from the same branch, as both have n3 as a common ancestor, so this pair is less diverse than the other pairs. This is evident from its LD of 0, since both nodes are on the same concept level; and although both are at the lowest concept level in the taxonomy, their HRD is not 1 because of their common ancestor. This lowers the antecedent-consequent diversity of the second rule, making it less diverse, as would be expected. Thus, when measured using the proposed antecedent-consequent diversity, the rule n1, n6 → n7 is more diverse than the rule n1, n7 → n4, because the latter contains the related, and therefore less diverse, pair n1 & n4.

## Proposed Overall Diversity Measure

The second proposed approach is the Overall Diversity measure. It measures the overall diversity of the items in an association rule by combining the antecedent and consequent into a single set of items. If the items in this set differ greatly from one another, the association rule has a high overall diversity, regardless of which part of the rule the diverse items come from.

The overall diversity differs from the previous antecedent-consequent diversity in that it considers all pairs of items in an association rule, including those pairs within the antecedent and/or consequent. This allows rules which have diverse items within the antecedent and/or consequent to be discovered. The antecedent-consequent diversity only discovers those rules whose consequent differs greatly from the antecedent. Overall diversity on the other hand will allow rules with diverse antecedents and/or consequents to be found. This allows those rules which have vastly different initial conditions, as well as those with vastly differing conclusions to be discovered.

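Let R be a rule with k items n1, n2, n3, …, nk and let DO(R) denote the overall diversity of R. The diversity of R can be determined as follows:

$$D_{O}(R) = \alpha_1 \, \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{HRD}(n_i, n_j)}{k(k-1)/2} + \beta_1 \, \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{LD}(n_i, n_j)}{k(k-1)/2} \quad (6.5)$$

where α1 and β1 are weighting factors such that α1 + β1 = 1.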

The overall diversity considers the HRD and LD scores for every pair of topics in the rule. This measure allows us to determine the average diversity of the topics within the rule and thus we get an overall internal measure of these differences.
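A corresponding Python sketch of the overall diversity averages over every unordered pair of items in the rule, using `itertools.combinations`. The pairwise values are copied from Examples 6.5 and 6.6 (with 0.333 and 0.666 written as 1/3 and 2/3), the names are hypothetical, and returning 0 for a rule with fewer than two items is our own assumption, since the text does not cover that case.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

Distance = Callable[[str, str], float]

# Pairwise distances quoted in Examples 6.5 and 6.6 (0.333 and 0.666 as 1/3, 2/3).
HRD_TABLE: Dict[FrozenSet[str], float] = {
    frozenset({"n1", "n6"}): 0.375, frozenset({"n1", "n7"}): 0.625,
    frozenset({"n6", "n7"}): 0.5, frozenset({"n1", "n4"}): 0.75,
    frozenset({"n7", "n4"}): 0.625,
}
LD_TABLE: Dict[FrozenSet[str], float] = {
    frozenset({"n1", "n6"}): 1 / 3, frozenset({"n1", "n7"}): 1.0,
    frozenset({"n6", "n7"}): 2 / 3, frozenset({"n1", "n4"}): 0.0,
    frozenset({"n7", "n4"}): 1.0,
}


def hrd(a: str, b: str) -> float:
    return HRD_TABLE[frozenset({a, b})]


def ld(a: str, b: str) -> float:
    return LD_TABLE[frozenset({a, b})]


def overall_diversity(items: Sequence[str], hrd_fn: Distance, ld_fn: Distance,
                      alpha: float = 0.5, beta: float = 0.5) -> float:
    """alpha * mean HRD + beta * mean LD over all k(k-1)/2 unordered item pairs."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # assumption: a rule with fewer than two items has no diversity
    mean_hrd = sum(hrd_fn(a, b) for a, b in pairs) / len(pairs)
    mean_ld = sum(ld_fn(a, b) for a, b in pairs) / len(pairs)
    return alpha * mean_hrd + beta * mean_ld


d1 = overall_diversity(["n1", "n6", "n7"], hrd, ld)  # rule n1, n6 -> n7
d2 = overall_diversity(["n1", "n7", "n4"], hrd, ld)  # rule n1, n7 -> n4
print(round(d1, 3), round(d2, 3))
```

This gives about 0.583 for n1, n6 → n7 and about 0.667 for n1, n7 → n4, in line with Examples 6.5 and 6.6 (the text truncates 2/3 to 0.666).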

## Example 6.5. The overall diversity of rule n1, n6 → n7.

The HRD and LD distances between pairs of items in this rule are HRD(n1, n6) = 0.375, LD(n1, n6) = 0.333, HRD(n1, n7) = 0.625, LD(n1, n7) = 1, HRD(n6, n7) = 0.5 and LD(n6, n7) = 0.666. With α1 and β1 both set to 0.5, the overall diversity is:

$$D_{O} = 0.5 \times \frac{0.375 + 0.625 + 0.5}{3} + 0.5 \times \frac{0.333 + 1 + 0.666}{3} = 0.25 + 0.333 \approx 0.583$$

Thus, for the association rule n1, n6 → n7, the overall diversity is 0.583, which is lower than the antecedent-consequent diversity of 0.697 obtained for this rule in Example 6.2. This is because the antecedent-consequent diversity looks only at the difference between the antecedent and the consequent of the rule, whereas the overall diversity also considers the item pair n1 and n6, which lies entirely within the antecedent. Figure 6.1 shows that this item pair is not very diverse, as both items are on the same branch of the hierarchy. The overall diversity includes this pair when calculating the final diversity of the rule, while the antecedent-consequent diversity ignores it, so the pair does not bring that score down. This is the key difference between the overall diversity and the antecedent-consequent diversity.

## Example 6.6. The overall diversity of rule n1, n7 → n4.

The HRD and LD distances between pairs of items in this rule are HRD(n1, n7) = 0.625, LD(n1, n7) = 1, HRD(n1, n4) = 0.75, LD(n1, n4) = 0, HRD(n7, n4) = 0.625 and LD(n7, n4) = 1. With α1 and β1 both set to 0.5, the overall diversity is:

$$D_{O} = 0.5 \times \frac{0.625 + 0.75 + 0.625}{3} + 0.5 \times \frac{1 + 0 + 1}{3} = 0.333 + 0.333 \approx 0.666$$

Thus, for the association rule n1, n7 → n4, the overall diversity is 0.666, which differs from the antecedent-consequent diversity of 0.5937 obtained for this rule in Example 6.3. For this rule, the overall diversity is higher than the antecedent-consequent diversity, because the antecedent-consequent diversity looks only at the difference between the antecedent and the consequent, whereas the overall diversity also considers the item pair n1 and n7, which lies entirely within the antecedent. This pair scores high for LD and reasonably well for HRD. Figure 6.1 shows that these two items are on separate branches of the hierarchy and are therefore not closely related. Because this pair is reasonably diverse, it raises the overall diversity; the antecedent-consequent diversity measure does not consider it because it lies within the antecedent. Again, this is the key difference between the overall diversity and the antecedent-consequent diversity.

## Example 6.7. Comparison of the overall diversity of rules n1, n6 → n7 and n1, n7 → n4.

As Examples 6.5 and 6.6 show, the overall diversity of rule n1, n6 → n7 is 0.583, while that of rule n1, n7 → n4 is 0.666. The second rule therefore has more diversity among its items than the first rule. In Figure 6.1, the nodes n1 and n6 have very little diversity, as they are located very close together in the taxonomy: they are not only in the same branch, but also separated by only a single concept level. In the second rule, by contrast, the nodes n1 and n7 come from separate branches and have the maximum number of concept levels between them (and thus the maximum LD). When the overall diversity is used to measure these rules, the low-diversity pair n1 & n6 is included in the calculation for the first rule and lowers its overall diversity. Thus the second rule, n1, n7 → n4, is more diverse than n1, n6 → n7 when measured using the overall diversity.

## Proposed Peculiarity Based Interestingness Measures

Peculiarity is an objective measure that determines how far one association rule is from the others: the further away the rule is, the more peculiar it is. This is usually done using a distance measure to determine how far apart rules are from each other. Peculiar rules are usually few in number (often generated from outlying data) and significantly different from the rest of the rule set. These peculiar rules may be interesting because they may be previously unknown. Dong et al. proposed a peculiarity measure, a neighborhood-based unexpectedness measure for single-level rules, arguing that a rule's interestingness is influenced by the rules that surround it in its neighborhood. In this work, that measure is extended to multi-level rules.

The measure is based on determining the symmetric difference between two rules, which forms the basis of the distance between them. From this, Dong et al. proposed that unexpected confidence (where the confidence of a rule R is far from the average confidence of the rules in R's neighborhood) and scarcity (where the number of mined rules in a neighborhood is far less than the number of all potential rules for that neighborhood) could be determined, measured and used as interestingness measures [9, 12].

The symmetric-difference measure of Dong et al. was developed for single-level datasets in which every item is weighted equally, so the measure is simply a count of the items that the two rules do not have in common. In a multi-level dataset, items cannot be regarded as equal because of the hierarchy, so the measure of Dong et al. needs to be enhanced to be useful with these datasets. We present such an enhancement as part of our proposed work [9, 12].

The proposed work adapts this distance measure to multi-level datasets. For two rules R1: X1 → Y1 and R2: X2 → Y2, the original measure is a syntax-based distance metric of the following form:

$$\mathrm{PM}(R_1, R_2) = \delta_1 \, \lvert (X_1 \cup Y_1) \,\Delta\, (X_2 \cup Y_2) \rvert + \delta_2 \, \lvert X_1 \,\Delta\, X_2 \rvert + \delta_3 \, \lvert Y_1 \,\Delta\, Y_2 \rvert \quad (6.6)$$

The Δ operator denotes the symmetric difference between two itemsets, so X Δ Y is equivalent to (X − Y) ∪ (Y − X), and δ1, δ2 and δ3 are weighting factors applied to the different parts of the rules. Equation 6.6 measures the peculiarity of two rules as a weighted sum of the cardinalities of the symmetric differences between the rules themselves, their antecedents and their consequents.
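A minimal Python sketch of Equation 6.6 follows; Python's `^` operator on sets computes the symmetric difference, and the function name is ours. Applied with all weights set to 1 to the coded rules R1, R2 and R3 introduced below, it returns the same distance for the pairs (R1, R3) and (R2, R3), because it only counts the differing items and ignores how closely 1-1-1-* and 1-1-*-* are related to 1-1-1-1.

```python
from typing import FrozenSet, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent X, consequent Y)


def dong_distance(r1: Rule, r2: Rule,
                  d1: float = 1.0, d2: float = 1.0, d3: float = 1.0) -> float:
    """Equation 6.6: weighted sum of symmetric-difference cardinalities.
    On Python sets, `|` is union and `^` is symmetric difference."""
    (x1, y1), (x2, y2) = r1, r2
    return (d1 * len((x1 | y1) ^ (x2 | y2))
            + d2 * len(x1 ^ x2)
            + d3 * len(y1 ^ y2))


# The coded rules R1, R2 and R3 discussed below; all weights are 1.
R1: Rule = (frozenset({"1-1-1-*"}), frozenset({"1-*-*-*"}))
R2: Rule = (frozenset({"1-1-*-*"}), frozenset({"1-*-*-*"}))
R3: Rule = (frozenset({"1-1-1-1"}), frozenset({"1-*-*-*"}))

print(dong_distance(R1, R3), dong_distance(R2, R3))  # 4.0 4.0
```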

The proposed work enhances this measure so that it can handle a hierarchy. Under the existing measure, every item is unique, so no two items share any kind of 'syntax' similarity. However, we argue that the items 1-*-*-*, 1-1-*-*, 1-1-1-* and 1-1-1-1 (based on Figure 1) are all related to each other. They are therefore not completely different and should have a 'syntax' similarity due to their relationship through the dataset's hierarchy.

The greater the value of PM(R1, R2), the lower the similarity and thus the greater the distance between the two rules. Therefore, the more distant the relationship between two items, the greater their difference and distance. Consider, for example, the following rules:

R1: 1-1-1-* → 1-*-*-*

R2: 1-1-*-* → 1-*-*-*

R3: 1-1-1-1 → 1-*-*-*

We believe that PM(R1, R3) < PM(R2, R3) should hold, since 1-1-*-* and 1-1-1-1 are further removed from each other than 1-1-1-* and 1-1-1-1 are. In addition, the difference between any two hierarchically related items (nodes) must be less than 1. Thus, for the above rules, 1 > PM(R2, R3) > PM(R1, R3) > 0. To achieve this, we modify Equation 6.6 to use the diversity of the symmetric difference between two rules instead of its cardinality. The cardinality of the symmetric difference measures the difference between two rules in terms of the number of items that differ, whereas the diversity of the symmetric difference also takes into account the hierarchical difference between the items in the symmetric difference. We restate the overall diversity measure in terms of a set of items below, where S = {s1, …, sn} is a set containing n items:

$$D_{O}(S) = \alpha_1 \, \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{HRD}(s_i, s_j)}{n(n-1)/2} + \beta_1 \, \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{LD}(s_i, s_j)}{n(n-1)/2} \quad (6.7)$$

where HRD is the Hierarchical Relationship Distance between two items, defined as the ratio between the average number of levels between the two items and their common ancestor and the height of the tree. HRD is a width, or horizontal, distance and is defined in Equation 6.1.

LD is the Level Distance, which measures the distance between two items in terms of their height. It is also called the height, or vertical, distance and is defined in Equation 6.3.

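Thus the neighborhood-based distance measure between two rules shown in Equation 6.6 becomes:

$$\mathrm{PM}(R_1, R_2) = \delta_1 \, D_{O}\big((X_1 \cup Y_1) \,\Delta\, (X_2 \cup Y_2)\big) + \delta_2 \, D_{O}(X_1 \,\Delta\, X_2) + \delta_3 \, D_{O}(Y_1 \,\Delta\, Y_2) \quad (6.10)$$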
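The following Python sketch applies Equation 6.10 to the coded rules R1, R2 and R3 above. It rests on our own assumptions about the coding: a code such as 1-1-*-* is read as a prefix path, so an item's level is the number of non-* positions and the closest common ancestor of two items is their longest shared prefix; TreeHeight is taken as 4, LD divides by TreeHeight − 1, α1 = β1 = 0.5 and all δ weights are 1. Unlike the cardinality-based Equation 6.6, which gives both pairs the same distance, this yields 0 < PM(R1, R3) < PM(R2, R3) < 1, the ordering argued for above.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent X, consequent Y)
TREE_HEIGHT = 4  # assumption: codes such as "1-1-1-1" have four positions


def parse(code: str) -> Tuple[str, ...]:
    """Assumed prefix coding: '1-1-*-*' -> ('1', '1'); length = level."""
    prefix = []
    for part in code.split("-"):
        if part == "*":  # '*' is assumed to appear only in trailing positions
            break
        prefix.append(part)
    return tuple(prefix)


def hrd(a: str, b: str) -> float:
    """HRD via the longest shared prefix, i.e. the closest common ancestor."""
    pa, pb = parse(a), parse(b)
    shared = 0
    while shared < min(len(pa), len(pb)) and pa[shared] == pb[shared]:
        shared += 1
    return ((len(pa) - shared) + (len(pb) - shared)) / (2 * TREE_HEIGHT)


def ld(a: str, b: str) -> float:
    """LD: level difference over the largest possible gap, TreeHeight - 1."""
    return abs(len(parse(a)) - len(parse(b))) / (TREE_HEIGHT - 1)


def overall_diversity(items: Iterable[str], alpha=0.5, beta=0.5) -> float:
    pairs = list(combinations(sorted(items), 2))
    if not pairs:
        return 0.0  # assumption: sets with fewer than two items add no diversity
    return (alpha * sum(hrd(a, b) for a, b in pairs) / len(pairs)
            + beta * sum(ld(a, b) for a, b in pairs) / len(pairs))


def peculiarity_distance(r1: Rule, r2: Rule, d1=1.0, d2=1.0, d3=1.0) -> float:
    """Equation 6.10: diversity of each symmetric difference, not its size."""
    (x1, y1), (x2, y2) = r1, r2
    return (d1 * overall_diversity((x1 | y1) ^ (x2 | y2))
            + d2 * overall_diversity(x1 ^ x2)
            + d3 * overall_diversity(y1 ^ y2))


R1: Rule = (frozenset({"1-1-1-*"}), frozenset({"1-*-*-*"}))
R2: Rule = (frozenset({"1-1-*-*"}), frozenset({"1-*-*-*"}))
R3: Rule = (frozenset({"1-1-1-1"}), frozenset({"1-*-*-*"}))

p13, p23 = peculiarity_distance(R1, R3), peculiarity_distance(R2, R3)
print(f"PM(R1, R3) = {p13:.3f}, PM(R2, R3) = {p23:.3f}")
# PM(R1, R3) = 0.458, PM(R2, R3) = 0.917
```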

## Example 6.8. The peculiarity distance between rule R1: n1, n5 → n6 and rule R2: n6 → n7.

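First, we determine the sets generated by the Δ operator:

$$(X_1 \cup Y_1) \,\Delta\, (X_2 \cup Y_2) = (\{n_1, n_5\} \cup \{n_6\}) \,\Delta\, (\{n_6\} \cup \{n_7\}) = \{n_1, n_5, n_6\} \,\Delta\, \{n_6, n_7\} = \{n_1, n_5, n_7\}$$

$$X_1 \,\Delta\, X_2 = \{n_1, n_5\} \,\Delta\, \{n_6\} = \{n_1, n_5, n_6\}$$

$$Y_1 \,\Delta\, Y_2 = \{n_6\} \,\Delta\, \{n_7\} = \{n_6, n_7\}$$

Thus, at this stage, the peculiarity distance measure is

$$\mathrm{PM}(R_1, R_2) = \delta_1 \, D_{O}(\{n_1, n_5, n_7\}) + \delta_2 \, D_{O}(\{n_1, n_5, n_6\}) + \delta_3 \, D_{O}(\{n_6, n_7\})$$

where DO is the overall diversity, which gives DO({n1, n5, n7}) = 0.583, DO({n1, n5, n6}) = 0.43 and DO({n6, n7}) = 0.583.

Now the peculiarity distance between R1 and R2 is

PM(R1, R2) = δ1 × 0.583 + δ2 × 0.43 + δ3 × 0.583.

The proposed work sets δ1, δ2 and δ3 to 1.00, so

PM(R1, R2) = 0.583 + 0.43 + 0.583 = 1.596.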

This is a large value, indicating that rules R1 and R2 are not similar.
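Example 6.8 can be reproduced with the Python sketch below. It uses a hypothetical fragment of the Figure 6.1 hierarchy, inferred from the distances reported in this section (x2 and y3 are invented internal nodes), together with α1 = β1 = 0.5, δ1 = δ2 = δ3 = 1, the LD denominator TreeHeight − 1 and the convention that a set with fewer than two items has zero diversity; all names are ours.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

ROOT = "root"
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent X, consequent Y)

# Hypothetical fragment of the Figure 6.1 hierarchy (child -> parent), inferred
# from the distances reported in this section; x2 and y3 are invented nodes.
PARENT: Dict[str, str] = {
    "n3": ROOT, "n7": ROOT,
    "x2": "n3", "n5": "n3",
    "y3": "x2", "n6": "x2",
    "n1": "y3",
}


def path_to_root(node: str) -> List[str]:
    """The node itself followed by its ancestors, ending with the root."""
    path = [node]
    while node != ROOT:
        node = PARENT[node]
        path.append(node)
    return path


def level(node: str) -> int:
    return len(path_to_root(node)) - 1


TREE_HEIGHT = max(level(n) for n in PARENT)  # 4


def hrd(a: str, b: str) -> float:
    ancestors_of_b = set(path_to_root(b))
    ca = next(n for n in path_to_root(a) if n in ancestors_of_b)
    return ((level(a) - level(ca)) + (level(b) - level(ca))) / (2 * TREE_HEIGHT)


def ld(a: str, b: str) -> float:
    return abs(level(a) - level(b)) / (TREE_HEIGHT - 1)  # assumed denominator


def overall_diversity(items: Iterable[str], alpha=0.5, beta=0.5) -> float:
    pairs = list(combinations(sorted(items), 2))
    if not pairs:
        return 0.0  # assumption for sets with fewer than two items
    return (alpha * sum(hrd(a, b) for a, b in pairs) / len(pairs)
            + beta * sum(ld(a, b) for a, b in pairs) / len(pairs))


def peculiarity_distance(r1: Rule, r2: Rule, d1=1.0, d2=1.0, d3=1.0) -> float:
    """Equation 6.10 with the overall diversity of each symmetric difference."""
    (x1, y1), (x2, y2) = r1, r2
    return (d1 * overall_diversity((x1 | y1) ^ (x2 | y2))
            + d2 * overall_diversity(x1 ^ x2)
            + d3 * overall_diversity(y1 ^ y2))


R1: Rule = (frozenset({"n1", "n5"}), frozenset({"n6"}))
R2: Rule = (frozenset({"n6"}), frozenset({"n7"}))

for s in ({"n1", "n5", "n7"}, {"n1", "n5", "n6"}, {"n6", "n7"}):
    print(sorted(s), round(overall_diversity(s), 3))
print(round(peculiarity_distance(R1, R2), 3))
```

The sketch gives about 0.583, 0.431 and 0.583 for the three overall diversities and about 1.597 for PM(R1, R2); the small differences from 0.43 and 1.596 come from the rounded intermediate values used in the text.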

## 4. Experiments

The dataset used for our experiments is a real-world dataset, the BookCrossing dataset (obtained from http://www.informatik.uni-freiburg.de/cziegler/BX/). From this dataset, we built a multi-level transactional dataset that contains 91,550 user records and 970 leaf items across 3 concept (hierarchy) levels.

Fig. 7. Distribution curves for support, confidence, overall diversity, antecedent-consequent diversity and peculiarity distance for the Book-Crossing dataset using MinMax.

The first thing to notice in Figures 4, 5, 6 and 7 is that the distribution curves for the analyzed measures are similar. Thus the removal of redundant association rules does not appear to affect the overall distribution of interesting association rules.

For Figures 4, 5, 6 and 7, the confidence curve shows that the rules are spread out from 0.5 (which is the minimum confidence threshold) up to close to 1. The distribution of rules in this area is fairly consistent and even, ranging from as low as 2,181 rules for 0.95 to 1, to as high as 4,430 rules for 0.85 to 0.9. Using confidence to determine the interesting rules is more practical than support, but still leaves over 2,000 rules in the top bin.

For Figures 4, 5, 6 and 7, the overall diversity curve shows that the majority of rules (13,665) have an overall diversity value between 0.3 and 0.4. However, the curve also shows that some rules have an overall diversity value below the majority, in the range 0.15 to 0.25, and some above it, in the range 0.45 to 0.7. The rules located above the majority differ from the rules that make up the majority and could be of interest because they have a high overall diversity.

In Figures 4, 5, 6 and 7, the antecedent-consequent diversity curve is similar to the overall diversity curve. It has a similar spread of rules, but it peaks earlier, at 0.3 to 0.35 with 12,408 rules (whereas the overall diversity curve peaks at 0.35 to 0.4). The curve then drops to a low number of rules at 0.45 to 0.5 before peaking again at 0.5 to 0.55, with 2,564 rules. The similarity between this curve and the overall diversity curve suggests that the two diversity measures are related. Antecedent-consequent diversity makes it possible to discover rules whose antecedents and consequents differ, which support and confidence alone would not identify.
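The two peaks described above can be located mechanically as bins whose count exceeds both neighbours. This is a minimal sketch; `local_peaks` is a hypothetical helper, the input is assumed to be an ordered, contiguous list of bins (empty bins included), and only the counts 12,408 and 2,564 come from the text. The other counts are placeholders.

```python
def local_peaks(bins):
    """Return the bins whose rule count is strictly higher than both neighbours.

    `bins` is an ordered, contiguous list of (lower, upper, count) tuples;
    the first and last bins are compared with their single neighbour.
    """
    peaks = []
    for i, (lower, upper, count) in enumerate(bins):
        left = bins[i - 1][2] if i > 0 else -1
        right = bins[i + 1][2] if i + 1 < len(bins) else -1
        if count > left and count > right:
            peaks.append((lower, upper, count))
    return peaks


# Only 12,408 and 2,564 come from the text; the other counts are placeholders.
curve = [(0.25, 0.3, 5000), (0.3, 0.35, 12408), (0.35, 0.4, 9000),
         (0.4, 0.45, 3000), (0.45, 0.5, 900), (0.5, 0.55, 2564),
         (0.55, 0.6, 1200)]
print(local_peaks(curve))  # [(0.3, 0.35, 12408), (0.5, 0.55, 2564)]
```

Because the comparison is strict, a flat top spanning two equal bins is not reported; a real analysis might smooth the curve first.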

## Examples of Association Rules from a Multi-Level Dataset with High Antecedent-Consequent Diversity

This section shows some examples of association rules with high antecedent-consequent diversity, which were derived from the Book-Crossing dataset in the previous experiment. Rules with a high antecedent-consequent diversity may be of interest, rather than those with a high confidence. Antecedent-consequent diversity allows a user to discover rules whose consequent differs greatly from the antecedent.

In the examples, items within the antecedent or consequent are separated by ',', while within an item '-' separates the description at each level. For example, the item 'a-b' is a second-level item, with 'a' describing the item at the first taxonomy level and 'b' describing it at the second taxonomy level.
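The notation just described can be parsed directly. The sketch below assumes that ',' never occurs inside an item, that '-' never occurs inside a single level description, and that '→' separates antecedent from consequent; these hold for the examples shown here. The helpers `parse_item`, `parse_rule` and `ancestors` are hypothetical.

```python
def parse_item(item):
    """'a-b' -> ('a', 'b'): one description per taxonomy level, top level first."""
    return tuple(part.strip() for part in item.split("-"))


def parse_rule(rule):
    """Split 'X, Y → Z' into antecedent and consequent lists of parsed items."""
    antecedent, consequent = rule.split("→")
    return ([parse_item(i) for i in antecedent.split(",")],
            [parse_item(i) for i in consequent.split(",")])


def ancestors(item):
    """The item's path prefixes, from its first-level ancestor down to itself."""
    levels = parse_item(item)
    return [levels[:k] for k in range(1, len(levels) + 1)]


ante, cons = parse_rule(
    "Subjects - Literature & Fiction - General, "
    "Subjects - Mystery & Thrillers - Mystery → Book Clubs")
print(ante[1])          # ('Subjects', 'Mystery & Thrillers', 'Mystery')
print(len(ante[1]))     # 3, i.e. a third-level item
print(cons)             # [('Book Clubs',)]
print(ancestors("a-b"))  # [('a',), ('a', 'b')]
```

The length of a parsed item gives its level, and the prefixes returned by `ancestors` are the nodes a taxonomy-based measure would walk when comparing two items.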

Example 6.9. The following association rule:

Subjects - Biographies and Memoirs - General, Subjects - Literature & Fiction - Authors (A..Z) → Book Clubs - Literature & Fiction

With an antecedent-consequent diversity value of 0.67 and a confidence of 0.61, this rule is deemed interesting in terms of its antecedent-consequent diversity but not so interesting in terms of confidence: the items in the antecedent are very weakly (if at all) related to the items in the consequent. This type of rule may be of interest to a user.

Example 6.10. The following association rule:

Subjects - Mystery & Thrillers, Subjects - Literature & Fiction - Authors (A..Z) → Book Clubs

With an antecedent-consequent diversity value of 0.67 and a confidence of 0.51, this rule is deemed interesting in terms of its antecedent-consequent diversity but not so interesting in terms of confidence: the items in the antecedent are very weakly (if at all) related to the items in the consequent. This type of rule may be of interest to a user.

Example 6.11. The following association rule:

Subjects - Literature & Fiction - General, Subjects - Mystery & Thrillers - Mystery → Book Clubs

With an antecedent-consequent diversity value of 0.83 and a confidence of 0.53, this rule is deemed interesting in terms of its antecedent-consequent diversity.
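To see how ranking by antecedent-consequent diversity differs from ranking by confidence, the sketch below re-ranks the diversity and confidence values of Examples 6.9 to 6.11 together with one made-up rule that has high confidence but low diversity. The `top` helper and the made-up rule are illustrative assumptions, not part of the chapter's experiment.

```python
# (label, antecedent-consequent diversity, confidence) for Examples 6.9 to 6.11.
rules = [
    ("Example 6.9", 0.67, 0.61),
    ("Example 6.10", 0.67, 0.51),
    ("Example 6.11", 0.83, 0.53),
    ("hypothetical rule", 0.20, 0.95),  # made up: confident but not diverse
]


def top(rules, key_index, k=2):
    """Labels of the k highest-scoring rules for one measure (ties keep input order)."""
    ranked = sorted(rules, key=lambda r: r[key_index], reverse=True)
    return [label for label, *_ in ranked[:k]]


print(top(rules, 2))  # by confidence: ['hypothetical rule', 'Example 6.9']
print(top(rules, 1))  # by diversity:  ['Example 6.11', 'Example 6.9']
```

Ranking by confidence pushes the made-up rule to the top, while ranking by diversity surfaces Example 6.11, which a confidence threshold alone would bury among thousands of rules.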

## Examples of Association Rules from a Multi-Level Dataset with High Overall Diversity

This section shows some examples of association rules with high overall diversity, which were derived from the Book-Crossing dataset in the previous experiment. Rules with a high overall diversity may be of interest, rather than those with a high confidence. As previously mentioned, overall diversity allows a user to discover rules whose antecedent and consequent items cover vastly different topics.

Example 6.12. The following association rule:

Book Clubs, Subjects - Literature & Fiction - Authors (A..Z) → Subjects - Mystery & Thrillers - General

With an overall diversity value of 0.67 and a confidence of 0.52, this rule is deemed interesting in terms of its overall diversity.

Example 6.13. The following association rule:

Book Clubs, Subjects - Literature & Fiction - General → Subjects - Literature & Fiction - Genre Fiction

With an overall diversity value of 0.61 and a confidence of 0.72, this rule is deemed interesting in terms of both its overall diversity and its confidence.

Example 6.14. The following association rule:

Book Clubs, Subjects - Literature & Fiction - Genre Fiction → Subjects - Biographies & Memoirs - General

With an overall diversity value of 0.67 and a confidence of 0.52, this rule is deemed interesting in terms of its overall diversity.

## Examples of Association Rules from a Multi-Level Dataset with High Peculiarity Distance

This section shows some examples of association rules with high peculiarity distance, which were derived from the Book-Crossing dataset in the previous experiment. Rules with a high peculiarity distance may be of interest, rather than those with a high confidence. Peculiarity distance allows a user to discover rules that differ from the rest of the rule set and are potential outliers. Rules with a high peculiarity distance have a high overall diversity of the symmetric differences between them and the remaining rules.

Example 6.15. The following association rule:

Book Clubs, Subjects - Literature & Fiction - World Literature → Subjects - Literature & Fiction - Genre Fiction, Subjects - Mystery & Thrillers

With a peculiarity distance value of 0.822 and a confidence of 0.58, this rule is deemed interesting in terms of its peculiarity distance.

Example 6.16. The following association rule:

Book Clubs, Subjects - Literature & Fiction - General → Subjects - Literature & Fiction - World Literature, Subjects - Mystery & Thrillers

With a peculiarity distance value of 0.824 and a confidence of 0.54, this rule is deemed interesting in terms of its peculiarity distance.
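The symmetric differences that peculiarity distance is built on can be sketched with the two rules of Examples 6.15 and 6.16: items shared by both rules drop out and only the differing items remain. The chapter's diversity measure, which is applied to these differences, is defined earlier and is not reproduced here; `mean_distance` takes it as a pluggable `distance` argument, and the `item_count` distance below is a toy stand-in, not the chapter's measure. All helper names are hypothetical.

```python
def symmetric_differences(rule_a, rule_b):
    """Items found in only one of the two antecedents / two consequents."""
    (ant_a, con_a), (ant_b, con_b) = rule_a, rule_b
    return ant_a ^ ant_b, con_a ^ con_b


def mean_distance(rule, rules, distance):
    """Average distance from `rule` to every other rule in `rules`.

    `distance` is a placeholder for the chapter's peculiarity distance;
    a high average marks the rule as a potential outlier.
    """
    others = [r for r in rules if r != rule]
    if not others:
        raise ValueError("need at least one other rule")
    return sum(distance(rule, r) for r in others) / len(others)


LIT = "Subjects - Literature & Fiction - "
ex_6_15 = (frozenset({"Book Clubs", LIT + "World Literature"}),
           frozenset({LIT + "Genre Fiction", "Subjects - Mystery & Thrillers"}))
ex_6_16 = (frozenset({"Book Clubs", LIT + "General"}),
           frozenset({LIT + "World Literature", "Subjects - Mystery & Thrillers"}))

ant_diff, con_diff = symmetric_differences(ex_6_15, ex_6_16)
print(sorted(ant_diff))  # the General and World Literature items; Book Clubs drops out


def item_count(a, b):
    """Toy stand-in distance: number of items in the symmetric differences."""
    return sum(len(s) for s in symmetric_differences(a, b))


print(mean_distance(ex_6_15, [ex_6_15, ex_6_16], item_count))  # 4.0
```

Swapping `item_count` for a taxonomy-aware diversity function would turn this into the outlier ranking described above without changing the surrounding code.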

## 5. Summary

In this chapter we proposed two interestingness measures for multi-level association rules: diversity and peculiarity. Diversity compares the items within a rule, while peculiarity compares the items in two rules to see how different they are. Our experiments have shown how diversity and peculiarity distance can be used to identify potentially interesting rules that cannot be identified using the basic measures of support and confidence.

## Note: A research paper related to this chapter has been published in

R. Vijaya Prakash, Dr. A Govardhan, Prof. SSVN. Sarma, "Interestingness Measures for Multi level Association Rules", International Journal of Information and Knowledge Management Volume 2, No 6, ISSN 2224-5758 (Paper) ISSN 2224-896X (Online). Impact Factor 5.42.