Incremental Frequent Pattern Mining

In real-world applications of association rule mining, the target data changes over time, and the association rules mined from it change accordingly, so incremental mining algorithms are needed. Several important applications have called for incremental mining, owing to the increasing use of record-based databases to which data is continuously added. Examples of such applications include Web log records, stock market data, sales data, transactions in electronic commerce, and daily weather/traffic records, to name a few. Incremental mining must not only include new data (e.g., the data of the newest month) but also remove obsolete data (e.g., the data of the oldest month) from the mining process [13], [14].

Discovering frequent itemsets becomes more time-consuming when the dataset is incremental in nature. In an incremental dataset, new records are added from time to time. Generally the size of an increment, i.e., the number of records added to the dataset, is very small in comparison to the whole dataset. But the addition of these new records can change the rules that hold in the updated dataset: some new itemsets may become frequent, while some previously derived frequent itemsets may become infrequent. For example, suppose X and Y are two distinct itemsets in a dataset of 500 records and the minimum support is 30% (i.e., 150 records); if X has support count 200 while Y's support count is below 150, then X is a frequent itemset but Y is not.
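To make the threshold arithmetic concrete, here is a minimal Python sketch of the support test; the count of 100 used for Y below is purely illustrative, since the text does not state Y's actual count.

def is_frequent(support_count, total_records, min_support=0.30):
    # An itemset is frequent when its relative support meets the threshold.
    return support_count / total_records >= min_support

# 500 records at 30% minimum support -> a threshold of 150 records.
print(is_frequent(200, 500))  # X, count 200 -> True (frequent)
print(is_frequent(100, 500))  # hypothetical Y, count 100 -> False (infrequent)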

An example of a simple database D with nine transactions and six items is given in Table 4.1 [14], [15], [16].

Table 4.1 Transactions Database (D)

TID    Items
T1     A B C
T2     A F
T3     A B C E
T4     A B D F
T5     C F
T6     A B C
T7     A B C E
T8     C D E
T9     A B D E

The candidate set C1 for single itemsets is generated as shown in Table 4.2.

Table 4.2 Candidate one-itemsets C1

Item    Support Count
A       7/9
B       6/9
C       6/9
D       3/9
E       4/9
F       3/9

The frequent single itemsets are shown in Table 4.3.

Table 4.3 Frequent one-itemsets L1

Item    Support Count
A       7/9
B       6/9
C       6/9
E       4/9

Table 4.4 first generates the candidate 2-itemsets C2 and then computes their support counts; the candidates whose counts meet the minimum support form the frequent 2-itemsets L2.

Table 4.4 Generating Frequent Two-Itemsets

Candidate 2-itemsets C2 with their support counts:

Item set    Support Count
AB          6/9
AC          4/9
AE          3/9
BC          4/9
BE          3/9
CE          3/9

Frequent 2-itemsets L2 (meeting the minimum support):

Item set    Support Count
AB          6/9
AC          4/9
BC          4/9

These are the frequent two-itemset patterns of Table 4.4.

After generating the frequent 2-itemsets, the candidate 3-itemsets are generated, their support counts are computed, and the frequent 3-itemsets are obtained, as shown in Table 4.5.

Table 4.5 Generating Frequent Three-Itemsets

Candidate 3-itemset C3: ABC

Item set    Support Count
ABC         4/9

The method above yields the frequent patterns: the frequent 1-itemsets, 2-itemsets, and 3-itemsets. To generate association rules, the confidence between the frequent items must be determined. The frequent itemsets are shown in Table 4.6.

Table 4.6 Frequent Item Sets

Frequent 1-itemsets    A, B, C, E
Frequent 2-itemsets    {A, B}, {A, C}, {B, C}
Frequent 3-itemsets    {A, B, C}
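The level-wise procedure behind Tables 4.2 to 4.6 can be sketched in a few lines of Python. This is a simplified Apriori-style sketch assuming a minimum support of 40% (the Smin used in the FUP example later); it is not the exact implementation underlying the text.

transactions = [set("ABC"), set("AF"), set("ABCE"), set("ABDF"), set("CF"),
                set("ABC"), set("ABCE"), set("CDE"), set("ABDE")]
n = len(transactions)
min_sup = 0.40

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(1 for t in transactions if itemset <= t) / n

# L1: frequent 1-itemsets (Table 4.3).
items = sorted({i for t in transactions for i in t})
L = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}]

# Level-wise step: join L(k-1) with itself, keep candidates meeting min_sup.
while L[-1]:
    candidates = {a | b for a in L[-1] for b in L[-1] if len(a | b) == len(a) + 1}
    L.append({c for c in candidates if support(c) >= min_sup})

for k, level in enumerate(L[:-1], start=1):
    print(k, sorted("".join(sorted(s)) for s in level))
# 1 ['A', 'B', 'C', 'E']
# 2 ['AB', 'AC', 'BC']
# 3 ['ABC']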

There are four cases to consider when mining frequent patterns from an incremental database [16], [17], [18]; the sketch after this list illustrates them.

1. The pattern is frequent in the old database but not in the increment.

2. The pattern is frequent in both the old database and the increment.

3. The pattern is frequent in neither the old database nor the increment.

4. The pattern is frequent in the increment but not in the old database.
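A small sketch of how an itemset falls into one of these four cases, assuming its supports in the old database D and in the increment Δ+ are compared against the same 40% threshold; the classify helper is an illustrative assumption, not part of the cited algorithms.

def classify(count_old, size_old, count_inc, size_inc, min_sup=0.40):
    # Compare the itemset's relative support in D and in the increment.
    freq_old = count_old / size_old >= min_sup
    freq_inc = count_inc / size_inc >= min_sup
    if freq_old and not freq_inc:
        return "case 1: frequent only in the old database"
    if freq_old and freq_inc:
        return "case 2: frequent in both"
    if not freq_old and not freq_inc:
        return "case 3: frequent in neither"
    return "case 4: frequent only in the increment"

# Using the counts of Table 4.7: {E} occurs in 4 of 9 old and 0 of 3 new
# transactions; {D} occurs in 3 of 9 old and 3 of 3 new transactions.
print(classify(4, 9, 0, 3))  # case 1
print(classify(3, 9, 3, 3))  # case 4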

4.2 Incremental Frequent Pattern Mining Process

The working mechanism of the incremental frequent pattern mining process is described below.

[Flow diagram: Transaction Database → increment/decrement in the database → Transaction Formatting → Candidate Generation → Frequent Pattern Database → Frequent Pattern Generation]

Figure 4.1 Frequent Pattern Mining Process

Working of each module:

1. Transaction Database: the database containing the day-to-day transactions.

2. Increment Extraction: since the transaction database is incremental in nature, additions, deletions, or modifications occur on it; this module extracts those changes.

3. Transaction Formatting: the items in each transaction are arranged in ascending order for processing.

4. Candidate Generation: candidate itemsets of different sizes are generated.

5. Frequent Pattern Database: contains the occurrence frequency of each itemset.

6. Frequent Pattern Generation: frequent patterns are generated from the candidates together with their count values in the given database [18], [19]. A compact sketch of this cycle follows.
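Below is a compact, runnable sketch of this cycle, assuming the frequent pattern database is simply a mapping from itemsets to occurrence counts; all names are illustrative placeholders rather than an API defined in the text.

from itertools import combinations

def format_transactions(delta):
    # Module 3: arrange the items of each new transaction in ascending order.
    return [tuple(sorted(t)) for t in delta]

def update_pattern_db(pattern_db, delta, max_size=3):
    # Modules 4-5: enumerate candidate itemsets of each size from the
    # increment and accumulate their occurrence frequencies.
    for t in format_transactions(delta):
        for k in range(1, max_size + 1):
            for c in combinations(t, k):
                pattern_db[c] = pattern_db.get(c, 0) + 1
    return pattern_db

def frequent_patterns(pattern_db, total, min_sup):
    # Module 6: emit the patterns whose accumulated count meets min_sup.
    return {p: c for p, c in pattern_db.items() if c / total >= min_sup}

db = {}
update_pattern_db(db, [{"A", "B", "C"}, {"A", "F"}])
print(frequent_patterns(db, total=2, min_sup=1.0))  # {('A',): 2}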

4.3 Incremental Frequent Pattern Mining Methods

Generally speaking, the goal is to solve the efficient-update problem for association rules after a non-trivial number of new records have been added to or removed from a database. Assuming that the two thresholds, minimum support Smin and minimum confidence Cmin, do not change, the update problem has several important characteristics.

The update problem can be reduced to finding the new set of frequent itemsets. After that, the new association rules can be computed from the new frequent itemsets.

An old frequent itemset has the potential to become infrequent in the updated database.

Similarly, an old infrequent itemset could become frequent in the new database.

In order to find the new frequent itemsets "exactly", all the records in the updated database, including those from the original database, have to be checked against every candidate set.

Several algorithms have been developed for mining frequent patterns from an incremental database; some of them are explained below [17], [18].

4.3.1 FUP

Algorithm FUP (Fast UPdate) was the first algorithm proposed to solve the problem of incremental mining of association rules. It handles databases with transaction insertion only and is not able to deal with transaction deletion. Specifically, given the original database D and its corresponding frequent itemsets L = {L1, ..., Lk}, the goal is to reuse this information to efficiently obtain the new frequent itemsets L' = {L1', ..., Lk'} on the new database D' = D ∪ Δ+. Using the definition of support and the minimum support constraint Smin, the following lemmas are generally used in algorithm FUP.

1. An original frequent itemset X, i.e., X ∈ L, becomes infrequent in D' if and only if X.support(D') < Smin.

2. An original infrequent itemset X, i.e., X ∉ L, may become frequent in D' only if X.support(Δ+) ≥ Smin.

3. If a (k−1)-subset of a k-itemset X becomes infrequent, i.e., the subset is in Lk−1 but not in L'k−1, then X must be infrequent in D'.

Basically, similar to Apriori, the framework of FUP, which can update the association rules in a database when new transactions are added, consists of a number of iterations. The candidate sets at each iteration are generated based on the frequent itemsets found in the previous iteration. At the k-th iteration of FUP, Δ+ is scanned exactly once. The original frequent itemsets, i.e., {X | X ∈ Lk}, only have to be checked against the small increment Δ+. To discover the new frequent itemsets, the set of candidate itemsets Ck is first extracted from Δ+ and then pruned according to the support count of each candidate itemset in Δ+. Moreover, the pool of candidate itemsets can be further reduced by discarding itemsets whose (k−1)-subsets have become infrequent. The flow of FUP is best understood through the following example. The dataset, with the increment portion labeled Δ+, is shown in Table 4.7; its first nine transactions are identical to those shown in the earlier examples. The frequent itemsets of the unchanged portion D are shown in Table 4.8, their generation having been described earlier [16], [18], [19], [20].

The database on which algorithm FUP operates, together with the original frequent itemsets generated by the association rule mining algorithm, is shown in Tables 4.7 and 4.8.

Table 4.7 Incremental Transaction Database (D' = D ∪ Δ+)

TID    Items
D:
T1     A B C
T2     A F
T3     A B C E
T4     A B D F
T5     C F
T6     A B C
T7     A B C E
T8     C D E
T9     A B D E
Δ+:
T10    A B D
T11    D F
T12    A B C D

The original frequent itemsets of database D:

Table 4.8 Frequent Item Sets of D

Item set    Support Count
{A}         6/9
{B}         6/9
{C}         6/9
{E}         4/9
{A,B}       5/9
{A,C}       4/9
{B,C}       4/9
{A,B,C}     4/9

The first iteration of algorithm FUP when performing incremental mining is represented in Figure 4.2. The original frequent 1-itemsets are first verified on the increment portion Δ+, and only the itemsets whose new supports are no less than Smin (= 40%) are retained as part of the new frequent 1-itemsets L'1, i.e., {A}, {B} and {C}. Then the supports of the other possible items are also checked in Δ+, leading to the construction of C1 = {{D}}, which is further verified against the unchanged portion D. Finally, the new L'1 is generated by integrating the results from both sources.

New supports of the original L1 checked on D' = D ∪ Δ+:

Item set    Support
{A}         7/12
{B}         8/12
{C}         7/12
{E}         4/12

Supports of the other items checked in Δ+:

Item set    Support
{D}         3/3
{F}         1/3

Support of the surviving candidate C1 = {{D}} checked on the whole of D':

Item set    Support
{D}         6/12

Figure 4.2 Incremental Mining (first iteration of FUP)

The successive iterations work roughly the same as the first one. However, since the shorter frequent itemsets have already been discovered, this information can be utilized to further reduce the pool of candidate itemsets. Specifically, as shown in Figure 4.3, since the 1-itemset {E} becomes infrequent once Δ+ is considered, i.e., {E} ∈ (L1 − L'1), all itemsets in L2 containing {E} as a subset must be infrequent and are discarded outright. The other itemsets in L2 are then verified against Δ+ to see whether they are still frequent. The set of candidate 2-itemsets C2 is constructed as (L'1 * L'1) − L2, since the itemsets in L2 have already been checked. In this example, the 2-itemsets {A D}, {B D} and {C D} are first checked against Δ+. Afterward, only {B D} is checked against the unchanged portion D, since it is the only one frequent in Δ+. Finally, the new L'2 is generated by integrating the results from both sources. The generation of L'k proceeds analogously to that of L'2, iterating until no more candidate itemsets can be formed. In this example, algorithm FUP stops once the only frequent 3-itemset {A B C} is discovered.

The second iteration of the FUP algorithm proceeds as follows.

New supports of L2 checked on D':

Item set    Support
{A,B}       6/12
{A,C}       5/12
{B,C}       5/12

Supports of the other 2-itemsets checked in Δ+:

Item set    Support
{A,D}       1/3
{B,D}       2/3
{C,D}       1/3

Support of the surviving candidate C2 = {{B,D}} checked on D:

Item set    Support
{B,D}       4/12

Resulting L'2:

Item set    Support
{A,B}       6/12
{A,C}       5/12
{B,C}       5/12

Figure 4.3 Generating Frequent Itemsets for Incremental Mining

Specifically, the key steps of FUP are listed below [18], [19]; a sketch of the first iteration follows them.

At each iteration, the supports of the size-k frequent itemsets in L are updated against Δ+ to filter out those that are no longer frequent in D'.

While scanning Δ+, a set of candidate itemsets Ck is extracted, together with their supports counted in Δ+. The supports of the sets in Ck are then updated against the original database D to find the "new" frequent itemsets.

Many itemsets in Ck can be pruned away by checking their supports in Δ+ before the update against the original database starts.

The size of Ck is further reduced at each iteration by pruning away original frequent itemsets that turn out to be infrequent once Δ+ is considered.
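The following sketch reproduces FUP's first iteration on the running example with Smin = 40%. It recounts supports in D from the raw transactions for self-containment; in the real algorithm those counts are carried over from the previous mining run, so the fractions printed here may differ slightly from the figure's values.

D = [set("ABC"), set("AF"), set("ABCE"), set("ABDF"), set("CF"),
     set("ABC"), set("ABCE"), set("CDE"), set("ABDE")]
delta = [set("ABD"), set("DF"), set("ABCD")]
Smin = 0.40
old_L1 = {"A", "B", "C", "E"}  # frequent 1-itemsets of D (Table 4.8)

def count(item, db):
    return sum(1 for t in db if item in t)

n_new = len(D) + len(delta)

# Step 1: update the old frequent items against the increment only.
L1_new = {x for x in old_L1
          if (count(x, D) + count(x, delta)) / n_new >= Smin}

# Steps 2-3: other items become candidates only if frequent within delta...
C1 = {x for t in delta for x in t
      if x not in old_L1 and count(x, delta) / len(delta) >= Smin}
# ...and the survivors are then verified on the unchanged portion D.
L1_new |= {x for x in C1
           if (count(x, D) + count(x, delta)) / n_new >= Smin}

print(sorted(L1_new))  # ['A', 'B', 'C', 'D'] -- {E} drops out, {D} enters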

4.3.2 FUP2 Algorithm

Algorithm FUP updates the association rules in a database when new transactions are added to it. An extension of FUP, algorithm FUP2, was proposed for updating the existing association rules when transactions are added to and deleted from the database. In essence, FUP2 is equivalent to FUP for the case of insertion, and is a complementary algorithm to FUP for the case of deletion [19].

Consider the following example.

Table 4.9 Transactional Database for FUP2

TID    Items
D:
T1     A B C
T2     A F
T3     A B C E
T4     A B D F
T5     C F
T6     A B C
T7     A B C E
T8     C D E
T9     A B D E

This method works in two phases.

In the first phase, the new candidate set C1 is obtained, just as the previous C1 was for the old database.

In the second phase, C1 is broken into two parts:

(a) the items common to both C1 and L1 (P1 = C1 ∩ L1);

(b) the items of C1 that are not in L1 (Q1 = C1 − (C1 ∩ L1)).

The support of each itemset in P1 is then computed to obtain all the large itemsets in P1, i.e., those that remain large in the new database. To capture the remaining candidates, the support count of each item in Q1 is computed in the new database. This procedure continues for C1, C2, ..., Ck until no further frequent itemsets are found. A sketch of the partition step is given below.
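A minimal sketch of this partition step for 1-itemsets, using the candidate and large itemsets exactly as they appear in the example that follows:

C1 = {"A", "B", "C", "D", "E", "F"}  # candidates for the new database
L1_old = {"A", "B", "C", "D", "E"}   # large 1-itemsets of the old database,
                                     # as used in the text's example

P1 = C1 & L1_old  # common to C1 and L1:  P1 = C1 intersection L1
Q1 = C1 - L1_old  # the remainder:        Q1 = C1 - (C1 intersection L1)

print(sorted(P1))  # ['A', 'B', 'C', 'D', 'E'] -> supports only adjusted
print(sorted(Q1))  # ['F'] -> must be counted in the new database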

After adding some new transactions and deleting some existing transactions, the updated database is as follows.

Table 4.10 New Transactional Database for FUP2 (D' denotes the updated database)

TID    Items
D−:
T1     A B C
T2     A F
T3     A B C E
T4     A B D F
T5     C F
T6     A B C
T7     A B C E
T8     C D E
T9     A B D E
Δ+:
T10    A B D
T11    D F
T12    A B C D

For the new database, C1 = {A, B, C, D, E, F}.

C1 is now divided into two parts.

Items common to C1 and L1 (of the old database):

P1 = C1 ∩ L1 = {A, B, C, D, E}

Items of C1 not in L1:

Q1 = C1 − P1 = {F}

Table 4.11 FUP2 for One-Itemsets

P1    {A, B, C, D, E}
Q1    {F}

Thus the large 1-itemsets for the updated database are L1' = {A, B, C, D, E}.

Checking the support count of F in the new database shows that F is still not a large itemset.

C2 is generated from the new L1':

C2 = {AB, AC, AD, AE, BC, BD, BE, CD, CE, DE}

C2 is then broken into P2 (common to C2 and L2) and Q2 (not common to C2 and L2):

P2 = {AB, AC, AD, AE, BC, BD, CD}

Q2 = {BE, CE, DE}

Table 4.12 FUP2 for Two-Itemsets

P2    {AB, AC, AD, AE, BC, BD, CD}
Q2    {BE, CE, DE}

Finally, L2' is computed from P2 and Q2:

L2' = {AB, AC, AD, BC, BD, BE, CD}

The other large itemsets are found similarly, without scanning the whole old database again; in this way the itemsets that remain frequent in the new database are reused.

Table 4.13 FUP2 for Three-Itemsets

P3    {ABC, ACD, BCD}
Q3    {ABD}

All the large itemsets are therefore:

{A, B, C, D, E, AB, AC, AD, BC, BD, BE, CD, ABC, ACD, BCD}.

4.3.3 Algorithm MAAP (Maintaining Association Rules with Apriori Property)

Agrawal [1] proposed the Apriori property, which states that "all non-empty subsets of a large itemset must be large". For example, if a large 3-itemset is L3 = {123}, it can immediately be inferred that the following itemsets are large as well: {12}, {13}, {23}, {1}, {2}, {3}. Based on this principle, when association rules are to be maintained in the updated database, the large itemsets can be computed starting from the highest-level large itemsets, that is, from Lk. If any itemset in Lk is still large in the new database, its lower-level subset itemsets are included in the appropriate large itemsets Lk−1, Lk−2, ..., L1. For example, once L3 = {123} is confirmed to be still large in the new database, MAAP includes {12}, {13}, {23} in L2 and {1}, {2}, {3} in L1, saving some computation. Utilizing the large itemsets computed for the old database, the MAAP algorithm proceeds by checking whether each itemset in L3 = {ABC, ACD, BCD} is still large in the new database. Since ABC is large in the new database, AB, AC, BC, A, B, C are also large itemsets. Thus ABC is added to L'3 (L'i denotes the large i-itemsets in the new database), AB, AC, BC are added to L'2, and A, B, C are added to L'1. The algorithm continues with the next itemset in L3, which is ACD; since ACD is large, AC, AD, CD, A, C, D are also large itemsets, and MAAP adds ACD to L'3 and its subsets to the lower levels, if not already members. After testing ACD, L'3 has elements ABC, ACD; L'2 has AB, AC, BC, AD, CD; and L'1 has A, B, C, D. Next, the last itemset in L3 is tested, which is BCD; BC, CD, BD, B, C, D are large itemsets since BCD is large, and these are included in L'3, L'2 and L'1 respectively. The temporary results at this stage are: L'3 = {ABC, ACD, BCD}, L'2 = {AB, AC, BC, AD, CD, BD}, L'1 = {A, B, C, D}. These are not yet the final updated large itemsets of the new database; they are only the large itemsets of the old database that remain large in the new one [19], [20], [21].

The next step checks the itemsets in each of the old L1 to Lk−1 that are not yet in the corresponding new L'1 to L'k−1, to see whether they are still large in the new database. Continuing the running example, L1 − L'1 = {E} represents the large 1-itemsets of the old database not yet in the computed L'1 = {A, B, C, D}. The new database must be scanned to obtain the support of item E, which is included in the new large itemsets if this support is greater than or equal to the minimum support. Since E is large, the algorithm adds E to L'1. In the same way, L2 − L'2 = {AE} is used to compute additional elements of L'2, which here contributes nothing since AE is not large. To complete the computation of large itemsets, the algorithm finally checks all previously small itemsets of the old database to see whether they are now large in the new database. For 1-itemsets, the old small itemsets S1 are obtained by subtracting the large 1-itemsets from the candidate set C1; thus S1 = C1 − L1 = {F} [19], [21].

The algorithm scans the new database for the support of F; since this support does not meet the minimum support requirement, F is still not a large itemset. For 2-itemsets, S2 = C2 − L2 = {BE, CE, DE}. After scanning the new database, only BE is a large itemset, and it is included in L'2 to make L'2 = {AB, AC, AD, BC, BD, BE, CD}. Since these itemsets were small in the old database, they are not part of the old candidate itemsets, so the new candidate itemsets one level higher than the computed large itemsets must be modified. Here the additional candidates C'3, obtained by joining the newly large {BE} with L'2, are {BDE}; these new candidate itemsets are added to S3, and C'3 = C'3 ∪ additional C'3 = {ABC, ABD, ACD, BCD, BDE}. For each itemset in the additional C'3, MAAP checks the new database for support, and if the itemset is large it is included in the new large 3-itemsets L'3. Since none of the elements of the additional C'3 is large, the final L'3 = {ABC, ACD, BCD}. The final large itemsets for all levels are L' = {A, B, C, D, E, AB, AC, AD, BC, BD, BE, CD, ABC, ACD, BCD}.
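The subset-propagation step that gives MAAP its savings can be sketched as follows: each top-level itemset confirmed to be still large certifies all of its non-empty subsets as large without any database scan. The propagate helper is illustrative, not the published pseudocode.

from itertools import combinations

def propagate(large_top):
    # Apriori property: every non-empty subset of a large itemset is large,
    # so each confirmed top-level itemset populates the lower levels directly.
    levels = {}
    for itemset in large_top:
        for k in range(1, len(itemset) + 1):
            for sub in combinations(sorted(itemset), k):
                levels.setdefault(k, set()).add(sub)
    return levels

# L3 itemsets confirmed still large in the new database (the text's example):
new_L = propagate([{"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"}])
for k in sorted(new_L):
    print(k, sorted("".join(s) for s in new_L[k]))
# 1 ['A', 'B', 'C', 'D']
# 2 ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
# 3 ['ABC', 'ACD', 'BCD']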

4.3.4 Borders [Feldman]

This algorithm finds the frequent itemsets of the dynamic dataset using the frequent itemsets already discovered from the old dataset. Here an infrequent itemset is termed a border set if all of its non-empty proper subsets are frequent. Due to the insertion of new records into the dataset, some of the border sets may become frequent; such a set is termed a promoted border set. For this reason the border sets of the old dataset have to be maintained along with the derived frequent sets. Based on the promoted border sets, new candidate itemsets are generated and checked for frequency. For the example above, B1 = {F}, B2 = {BE, CE, DE} and B3 = {ABD, ABE, ACE, ADE} are the border sets. On the updated dataset, only BE becomes a promoted border set. The candidates due to the promoted border set are then generated, the dataset is scanned to obtain the supports of these itemsets, and the frequent itemsets are found. Candidates are generated only if there is at least one promoted border set. This algorithm may require more than one pass over the old dataset, depending on the frequent sets discovered in the incremented part. In this way the Borders algorithm handles an incremental dataset [19], [21], [22].
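The border-set test itself is simple to sketch; the is_border helper below is illustrative, with the frequent sets taken from Table 4.6.

from itertools import combinations

def is_border(itemset, frequent):
    # A border set is infrequent while all of its non-empty proper
    # subsets are frequent (for 1-itemsets this holds vacuously).
    if itemset in frequent:
        return False
    subsets = [frozenset(c) for k in range(1, len(itemset))
               for c in combinations(itemset, k)]
    return all(s in frequent for s in subsets)

frequent = {frozenset(s) for s in
            ["A", "B", "C", "E", "AB", "AC", "BC", "ABC"]}
print(is_border(frozenset("F"), frequent))   # True: {F} is a border set
print(is_border(frozenset("BE"), frequent))  # True: {B}, {E} frequent, {BE} not
print(is_border(frozenset("AB"), frequent))  # False: {AB} itself is frequent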

4.3.5 Modified Borders [Das and Bhattacharyya]

This algorithm is a modified version of the Borders algorithm that minimizes unnecessary candidate generation. However, it uses an additional user parameter, apart from the support-count parameter, to which it is sensitive; only with proper tuning of these parameters can better performance be achieved. When this additional parameter's value is close to the support count, the algorithm converges to the Borders algorithm. Depending on this parameter, the border sets are divided into four different sets B', B'', B''' and B''''; the probability of becoming a promoted border set is highest for the elements of B' and lowest for B'''' [5].
