Predicting User Behavior Through Sequential Computer Science Essay


Web usage mining is a technique for discovering usage patterns in Web logs in order to understand and serve the needs of users navigating the Web. Sequential pattern mining is a significant application of web usage mining, used to extract useful and relevant information from large volumes of data. Web usage mining has become one of the most frequently used knowledge domains. It mines the data in the web server log by applying an algorithm in the pattern discovery phase, which yields the sequences of web pages that users access frequently over a period of time. One of the major issues in web usage mining is frequent pattern discovery. This paper focuses on predicting the navigation behavior of a user from the browsing patterns gathered in the previous stage: the current user can be recommended links to pages similar to the ones presently being viewed, which also supports caching and pre-fetching.

Keywords: Web Usage Mining, Sequential Pattern Mining, Apriori, Eclat, Recommendation System.

1. Introduction

The World Wide Web serves as a vast, widely distributed, global information service center for advertising, consumer information, e-commerce, education, financial management, government, news and many other information services. With the explosive growth of information available on the internet, it has become much more difficult to find relevant information on the web. Further research on existing web services is therefore needed, because the services currently offered are not adequate to satisfy the needs of different web users. As millions of web pages are accessed by users for business and personal transactions, web servers hold an enormous amount of page access data. This paper focuses on adopting an intelligent technique that can provide a personalized web service for accessing related web pages more efficiently and effectively, so that the pages a user is most likely to access in the future can be determined [1].

1.2 Sequential Pattern Mining

Sequential pattern mining is the process of applying data mining techniques to a sequential database in order to discover the correlation relationships that exist among an ordered list of events [2]. It is a vital problem with wide applications, including the analysis of customer purchase behavior, web access patterns, scientific experiments, disease treatments, natural disasters and protein formations. A sequential pattern mining algorithm scans the sequence database for repeating patterns (known as frequent sequences), which end users can later use to find associations between the different items or events in their data for purposes such as business enterprises, marketing campaigns, planning and prediction [3].

Web log mining is a special case of sequential pattern mining. It finds user navigational patterns by extracting knowledge from web logs, where the ordered sequences of events in the sequence database are composed of single items [14], on the assumption that a web user can access only one web page at any given point in time [4]. At present, most web usage mining solutions assume that a user accesses one web page at a time, which gives rise to a special sequence database with only one item in each event of a sequence's ordered event list. Consider, for example, a set of events E = {a, b, c, d, e, f}, which may represent product web pages accessed by the clients of an e-commerce application.
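To make the single-item sequence database concrete, the following is a minimal sketch of how the support of an ordered page pattern could be counted. The sessions are hypothetical examples over E = {a, ..., f}, and the function names `contains_in_order` and `sequence_support` are illustrative assumptions, not part of the paper.

```python
from typing import Iterable, Sequence


def contains_in_order(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `session` as a subsequence (gaps allowed)."""
    it = iter(session)
    # `page in it` advances the iterator, so the order of pages is enforced.
    return all(page in it for page in pattern)


def sequence_support(sessions: Iterable[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sessions that contain `pattern` in the given order."""
    sessions = list(sessions)
    hits = sum(contains_in_order(s, pattern) for s in sessions)
    return hits / len(sessions)


# Hypothetical sessions: each event is exactly one page from E = {a, ..., f}.
sessions = [
    ["a", "b", "c", "d"],
    ["a", "c", "e"],
    ["b", "a", "c"],
    ["a", "f", "d"],
]

print(sequence_support(sessions, ["a", "c"]))  # a followed later by c
print(sequence_support(sessions, ["c", "a"]))  # reversed order
```

Because order matters, the pattern (a, c) and the pattern (c, a) receive different supports even though they contain the same pages.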

Algorithm for Sequential Pattern Mining [3]

INPUT:

T = (A1, A2, ..., An) // Database from the filtered data, where A1, A2, ... are the item sets

minSup // Support = 20%

OUTPUT: Frequent Sequential Pattern

Algorithm for Sequential Pattern Mining:

1. T = T sorted on Transaction ID; find the candidate sequence table

2. L = Apriori(T, minSup, L)

3. Find the sequential data of L

Table 1. Sequential Pattern Mining Using Apriori

Sequential Pattern Algorithm using ECLAT

INPUT:

D = (T1, T2, ..., Tn) // Database from the filtered data, where T1, T2, ... are the Transaction IDs

minSup // Support = 20%

OUTPUT: Frequent Sequential Pattern

Algorithm for Sequential Pattern Mining:

1. D = D sorted on item sets

2. L = ECLAT(D, minSup, L)

3. Find the sequential data of L

Table 2. Sequential Pattern Mining Using Eclat

1.2.1. Apriori Algorithm

Apriori is an algorithm proposed by R. Agrawal and R. Srikant in 1994 [5] for mining frequent item sets for Boolean association rules. The algorithm uses prior knowledge of frequent item set properties. Apriori takes an iterative approach known as level-wise search, in which frequent k-item sets are used to explore (k+1)-item sets. Each iteration consists of two steps:

i. Generate a set of candidate item sets.

ii. Count the occurrences of each candidate set in the database and prune the candidates that do not qualify.

Pruning techniques used by Apriori: Apriori uses two pruning techniques.

i. The first is based on the support count, which must meet the user-specified support threshold.

ii. For an item set to be frequent, all of its subsets must appear among the frequent item sets of the previous level.

Candidate generation starts with item sets of size 2, and the size is incremented after each iteration. According to the algorithm, if a set of items is frequent, then all of its proper subsets are also frequent.
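The level-wise search and the two pruning techniques can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiment: the function name `apriori`, the page names and the threshold `min_count=2` are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(items): support count} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Generate candidates: join frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Subset pruning: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Support pruning: count each candidate in one pass over the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical page visits, one set of pages per user session.
transactions = [
    {"home", "courses", "login"},
    {"home", "courses"},
    {"home", "login"},
    {"courses", "login", "home"},
    {"news"},
]
frequent_sets = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Note that the database is scanned once per level, which is the cost the vertical layout of Eclat is meant to avoid.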

1.2.2. ECLAT

This algorithm is based on the concept of depth-first search, in contrast to the breadth-first search of Apriori. Like Apriori, it prunes itemsets whose support is lower than the threshold. It calculates the support of itemsets by maintaining a transaction list for each item [6]. In this manner the transaction database needs to be read only once for counting support; the support of subsequent, larger itemsets is obtained by intersecting these transaction lists.

1.2.3. Recommendation System

The navigation behavior of a client can be predicted from the browsing patterns gathered in the previous stage, so the current user can be recommended links to pages similar to the one presently being viewed [7]. The recently browsed pattern of the user is matched against the analyzed data of previous users, and on the basis of this comparison the current user is offered similar web pages of interest that have not yet been visited.

The pages requested by the client can be discovered by two methods:

i. Highest Confidence

ii. Last Sequence

Highest Confidence: The rule with the highest confidence is chosen to predict the next page. The confidence is derived from the predecessor pages, since it is based on a pattern matching rule. Last Sequence: This method is applied when different rules have equal highest confidence. It selects the rules whose consequent is nearest to the requested pages, where the distance between the pages requested by a user and the consequent of a rule is the number of clicks from one page to the other.

The user navigation prediction model is built from the patterns extracted from the user's own sessions. The hypothesis of the paper is that recently accessed pages have a great influence on the pages that will be accessed in the near future. The prediction of a user's navigation behavior is based on the discovered rules that match the current user session.
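A minimal sketch of the highest-confidence selection is given below. The `Rule` structure, the page names and the confidence values are hypothetical. Matching a rule against the most recent pages of the session follows the hypothesis stated above, while breaking ties in favor of the longer matched antecedent is an assumption of this sketch and not the click-distance measure of the Last Sequence method.

```python
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    antecedent: tuple   # pages visited, in order
    consequent: str     # page predicted next
    confidence: float   # support(antecedent + consequent) / support(antecedent)


def recommend(session: Sequence[str], rules: Sequence[Rule]) -> Optional[str]:
    """Return the page predicted by the best rule matching the current session."""
    visited = set(session)
    matching = [
        r for r in rules
        if r.antecedent
        and len(r.antecedent) <= len(session)
        # The rule must match the most recently accessed pages.
        and tuple(session[-len(r.antecedent):]) == r.antecedent
        # Only suggest pages the user has not visited yet.
        and r.consequent not in visited
    ]
    if not matching:
        return None
    # Highest confidence first; ties go to the longer antecedent (assumption).
    best = max(matching, key=lambda r: (r.confidence, len(r.antecedent)))
    return best.consequent


# Hypothetical rules mined in the previous stage.
rules = [
    Rule(("home",), "courses", 0.60),
    Rule(("home", "courses"), "assignments", 0.80),
    Rule(("courses",), "grades", 0.80),
    Rule(("courses",), "home", 0.90),
]

print(recommend(["home", "courses"], rules))
print(recommend(["news"], rules))
```

In the first call the rule with confidence 0.90 is skipped because its consequent has already been visited, and the remaining tie at 0.80 is resolved by the two-page antecedent.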

1.3 Web Usage Mining

Web usage mining is an application of data mining in which meaningful information is extracted from the web server log for purposes such as business strategy and financial activities. It is the web mining activity that involves the automatic detection of user access patterns from one or more web servers, and its goal is to analyze user access behavior. Web mining can be practiced in three different domains: content mining, hyperlink (web structure) mining and web usage mining [8]. These approaches attempt to extract valuable information from the web, which is then applied to real-world problems. Fig. 1 shows the applications of web mining.

[Figure: Web Mining branches into Web Content, Web Structure and Web Usage Mining; the applications listed are Business Intelligence, Characterization, Website Modification, System Improvement and Personalization.]

Figure 1 Applications of Web Mining

Web usage mining is the application of web mining that concentrates on mining the access patterns of users. It tries to discover user behavior through analysis of the web server log.

Web usage mining is important for cross-marketing strategies, web advertisements and promotion campaigns. It applies data mining techniques to extract usage patterns from the click-stream [15]. Valuable information about users' accesses is obtained by analyzing navigation behavior in the web server logs, where all accesses to web pages are recorded. The access information includes the IP address (where the request originated), the page requested (URL), the time and date of the request, and so on.
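As an illustration of these access fields, the sketch below extracts the IP address, requested URL and timestamp from one log line. It assumes the server writes the Common Log Format; the paper does not state the actual log format, and the sample line, the pattern `LOG_PATTERN` and the function `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: ip ident user [time] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return the access fields of one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "url": m.group("url"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "status": int(m.group("status")),
    }


# Hypothetical log entry.
line = '192.0.2.10 - - [10/Oct/2012:13:55:36 +0530] "GET /lms/course/view.php HTTP/1.1" 200 2326'
record = parse_line(line)
print(record["ip"], record["url"], record["time"].isoformat())
```

Lines that do not match the assumed layout are returned as None so that they can be dropped during data cleaning.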

The output of pattern analysis consists of sequences of accesses with their corresponding probabilities. The algorithms used to mine usage data are association rule mining and sequence analysis. Association rule mining [9] discovers relationships between different web pages within a web site, whereas sequence mining applies data mining techniques to a sequential database to discover the correlation relationships that exist among an ordered list of events. Sequential patterns are used for pre-fetching instead of simple association rules because they reveal the order in which pages are visited; this reduces bandwidth usage and storage needs and thereby improves the efficiency and effectiveness of the system.

This paper focuses on analyzing the navigation behavior of web users from the filtered data so that they can be served better in future. The paper is organized into the following sections. Section II presents the Material & Methodologies. Section III consists of the Result and Discussion. Finally, Section IV concludes the paper.

2. Material & Methodologies

Web usage mining is an application of data mining in which meaningful information is extracted from the web server log for purposes such as business strategy and financial activities [10]. It is the web mining activity that involves the automatic detection of user access patterns from one or more web servers. The goal of web usage mining is to analyze user access behavior.

In this paper the material is taken from the Learning Management System of Graphic Era University. An experiment was conducted on filtered data from the web log file, and the concepts of data mining are applied to this filtered data. The working of the proposed architecture is shown in Figure 2. Data mining is a technique used to extract meaningful knowledge from large databases; it is also known as knowledge discovery. The methodologies applied in this paper are sequential mining techniques using the Apriori and Eclat algorithms.

[Figure: components shown are Client Request; Web Server; Server Logs; Preprocessing (Data Cleaning, User Identification, Session Identification, Path Completion); Storage in MS-Access/Excel/SQL; Database; Filtered Data; Mining Techniques; Discover Sequential Pattern (Apriori & Eclat); Analyzed Pattern; Access Sequence (Current); Recommendation Rules; Recommended System; Recommended Pages links with the Requested Page.]

Figure 2. Working of Proposed Architecture
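The preprocessing stage of the architecture can be sketched as follows. This is only an illustration under stated assumptions: users are identified by IP address alone, a new session starts after 30 minutes of inactivity (a common heuristic, not a value given in the paper), the list of static file suffixes is hypothetical, and path completion is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta

STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")  # assumed list
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not from the paper


def build_sessions(records):
    """records: iterable of (ip, timestamp, url). Returns {ip: [[url, ...], ...]}."""
    # Data cleaning: drop requests for embedded static resources.
    cleaned = [r for r in records if not r[2].lower().endswith(STATIC_SUFFIXES)]

    # User identification: group the requests by IP address, in time order.
    by_user = defaultdict(list)
    for ip, ts, url in sorted(cleaned, key=lambda r: (r[0], r[1])):
        by_user[ip].append((ts, url))

    # Session identification: start a new session after a long gap.
    sessions = defaultdict(list)
    for ip, hits in by_user.items():
        last_ts = None
        for ts, url in hits:
            if last_ts is None or ts - last_ts > SESSION_TIMEOUT:
                sessions[ip].append([])
            sessions[ip][-1].append(url)
            last_ts = ts
    return dict(sessions)


# Hypothetical log records.
t0 = datetime(2012, 10, 10, 9, 0)
records = [
    ("192.0.2.10", t0, "/index.php"),
    ("192.0.2.10", t0 + timedelta(seconds=5), "/theme/style.css"),
    ("192.0.2.10", t0 + timedelta(minutes=2), "/course/view.php"),
    ("192.0.2.10", t0 + timedelta(hours=2), "/index.php"),
    ("192.0.2.20", t0 + timedelta(minutes=1), "/login.php"),
]
sessions = build_sessions(records)
print(sessions)
```

Each resulting list of URLs is one access sequence, which is the form of input the sequential mining step expects.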

2.1 Proposed Work for Apriori and Eclat

2.1.1 Apriori Algorithm

Apriori is a breadth-first search algorithm [11] that uses a two-pass strategy at each level to find frequent item sets. In the first pass, the list of candidate item sets for the level is generated, and item sets that are supersets of infrequent item sets are pruned. In the second pass, the support values of the remaining item sets are calculated, and item sets whose support is less than the user-defined threshold are pruned. The support of an item set can be calculated by counting the number of transactions in the database that contain it [12].

2.1.2. Pseudo code for Apriori

Input: Preprocessed data D, minimum support (minSup).

Output: Frequent item sets generated from the preprocessed data.

F1 = frequent 1-item sets; /* items whose count >= minSup */
k = 2;
while (F(k-1) is not empty and k <= MAXLENGTH) /* MAXLENGTH = maximum number of elements n */
{
    Ck = candidate k-item sets joined from F(k-1); /* drop candidates that have an infrequent (k-1)-subset */
    for each transaction T in D
    {
        for each candidate c in Ck
        {
            if (c is a subset of T)
            {
                c.increment++;
            }
        }
    }
    for each candidate c in Ck
    {
        if (c.increment >= minSup)
        {
            add c to Fk; /* rule c generated: store the item set */
        }
    }
    k++;
}
return F1, F2, ..., F(k-1);

2.1.3 Eclat

The Eclat algorithm utilizes the aggregate memory of the system by partitioning the candidates into disjoint sets using the concept of equivalence class partitioning [13]. It was designed to overcome the shortcomings of the Count Distribution and Candidate Distribution algorithms. Eclat uses a vertical database layout, which keeps all relevant information in an itemset's tid-list. In Eclat the local database partition is scanned only once, whereas Candidate Distribution must scan it once in each iteration. Eclat does not search complex data structures and does not generate all the subsets of each transaction, so it avoids that extra computational overhead.

Begin Pseudo code

// Initialization Step
Mine the frequent 1-item sets in the database
Record the IDs of the transactions containing each item (its TID list)
Support = 20%

// Transformation Step
Mine frequent 2-item sets in vertical data format
If (number of transaction IDs < Support) then
    discard those item sets

// Asynchronous Step
For k = 3 to n
    Mine frequent k-item sets in vertical data format
    If (number of transaction IDs < Support) then
        discard those item sets

// Final Reduction
Collect the total results and output them

End
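The vertical mining steps above can be sketched on a single machine as follows; the partitioning of equivalence classes across processors is left out. The function name `eclat`, the page names and the threshold `min_count=2` are assumptions made for this example.

```python
from collections import defaultdict


def eclat(transactions, min_count):
    """transactions: {tid: set of items}. Returns {frozenset(items): set of tids}."""
    # Vertical layout: a single pass builds the tid-list of every item.
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        # Depth-first search: grow the prefix by one item at a time.
        for i, (item, tids) in enumerate(candidates):
            # Support comes from intersecting tid-lists, not from rescanning.
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_count:
                itemset = prefix | {item}
                frequent[frozenset(itemset)] = new_tids
                extend(itemset, new_tids, candidates[i + 1:])

    items = sorted(tidlists.items(), key=lambda kv: kv[0])
    extend(set(), set(), items)
    return frequent


# Hypothetical page visits keyed by transaction ID.
transactions = {
    "T1": {"home", "courses", "login"},
    "T2": {"home", "courses"},
    "T3": {"home", "login"},
    "T4": {"courses", "login", "home"},
    "T5": {"news"},
}
frequent_sets = eclat(transactions, min_count=2)
for itemset, tids in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sorted(tids))
```

The length of each stored tid set is the support count of the item set, so no separate counting pass is required.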

Transaction ID    Item sets
T100              {A1 A2} {A1}
T200              {A1 A3 A4} {A1 A2}

4. Results & Discussion

4.1 Working of Apriori

User Name    Category
T1           {Web, IT, SE} {IT} {IT, Web, Edu}
T2           {Down, Govt, IT} {IT, Govt} {Web, IT, SE}
T3           {Edu, Down} {Edu, IT, Web} {Web, IT, SE, Down}
T4           {IT, Web, Edu} {IT, Govt, Down}
T5           {Web, IT, SE} {IT, Govt} {Edu, IT, Web} {IT}
T6           {Down, Govt, IT} {IT, Web, Edu}
T7           {Web, IT, Down, SE}
T8           {Edu, Down} {IT, Govt} {IT}

Category               Items    Support count
{Web, IT, SE}          A1       3
{IT, Web, Edu}         A2       5
{IT}                   A3       3
{Down, Govt, IT}       A4       3
{IT, Govt}             A5       3
{Edu, Down}            A6       2
{Web, IT, SE, Down}    A7       2

Minimum Support = 20% (a count of at least 2 of the 8 users)

Category    Support
A1 A2       2
A1 A3       2
A1 A5       2
A2 A3       2
A3 A5       2

Frequent Data Sets

Category    Support
A1 A2 A3    2

A1 = {Web, IT, SE}
A2 = {IT, Web, Edu}
A3 = {IT}

4.2 Working of ECLAT

Category    User Name
A1          T1, T2, T5
A2          T1, T3, T4, T5, T6
A3          T1, T5, T8
A4          T2, T4, T6
A5          T2, T5, T8
A6          T3, T8
A7          T3, T7

Category    TID
A1 A2       T1, T5
A1 A3       T1, T5
A1 A5       T2, T5
A2 A3       T1, T5
A3 A5       T5, T8
A1 A2 A3    T1, T5

Frequent Item Sets

A1 = {Web, IT, SE}
A2 = {IT, Web, Edu}
A3 = {IT}
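The conversion from the user/category table of Section 4.1 to the vertical layout used by Eclat can be reproduced with a short sketch. The data is copied from the tables above; the names `LABELS`, `SESSIONS` and `to_vertical` are illustrative assumptions, and reading the 20% threshold as a rounded-up count over the 8 users is an interpretation of the stated minimum support.

```python
import math
from collections import defaultdict

# Item labels from Section 4.1; each label stands for one set of categories.
LABELS = {
    frozenset({"Web", "IT", "SE"}): "A1",
    frozenset({"IT", "Web", "Edu"}): "A2",
    frozenset({"IT"}): "A3",
    frozenset({"Down", "Govt", "IT"}): "A4",
    frozenset({"IT", "Govt"}): "A5",
    frozenset({"Edu", "Down"}): "A6",
    frozenset({"Web", "IT", "SE", "Down"}): "A7",
}

# User sessions from the table in Section 4.1.
SESSIONS = {
    "T1": [{"Web", "IT", "SE"}, {"IT"}, {"IT", "Web", "Edu"}],
    "T2": [{"Down", "Govt", "IT"}, {"IT", "Govt"}, {"Web", "IT", "SE"}],
    "T3": [{"Edu", "Down"}, {"Edu", "IT", "Web"}, {"Web", "IT", "SE", "Down"}],
    "T4": [{"IT", "Web", "Edu"}, {"IT", "Govt", "Down"}],
    "T5": [{"Web", "IT", "SE"}, {"IT", "Govt"}, {"Edu", "IT", "Web"}, {"IT"}],
    "T6": [{"Down", "Govt", "IT"}, {"IT", "Web", "Edu"}],
    "T7": [{"Web", "IT", "Down", "SE"}],
    "T8": [{"Edu", "Down"}, {"IT", "Govt"}, {"IT"}],
}


def to_vertical(sessions):
    """Map each item label to the list of users whose session contains it."""
    tidlists = defaultdict(list)
    for tid, events in sessions.items():
        for event in events:
            label = LABELS[frozenset(event)]
            if tid not in tidlists[label]:
                tidlists[label].append(tid)
    return dict(tidlists)


vertical = to_vertical(SESSIONS)
min_count = math.ceil(0.20 * len(SESSIONS))  # 20% of 8 users, rounded up

# Support of a pair is the size of the intersection of the two TID lists.
pair = sorted(set(vertical["A1"]) & set(vertical["A2"]))
print(min_count, vertical["A3"], pair)
```

The intersection of the lists for A1 and A2 yields the users T1 and T5, which agrees with the A1 A2 row of the TID table.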

In this paper we have applied two sequential data mining techniques, using Apriori and Eclat. The working of both algorithms is shown above. From the working of Apriori we can conclude that it generates a larger number of candidate sequence sets and then prunes the infrequent patterns from the data. Eclat, on the other hand, generates fewer sequence tables and therefore takes less time than Apriori to produce the frequently accessed patterns. With Apriori, if the data is massive, generating the frequently accessed patterns takes a great deal of time.

Conclusion

Web usage mining is important for cross-marketing strategies, web advertisements and promotion campaigns. It applies data mining techniques to extract usage patterns from the click-stream. Valuable information about users' accesses is obtained by analyzing navigation behavior in the web server logs, where all accesses to web pages are recorded. This paper proposed sequential pattern mining techniques for large databases using the Apriori and Eclat algorithms. These algorithms help to discover the navigation behavior of a user from previous visits, which can later be used to adapt the web site to the user's interests. Web usage mining includes three processes, namely preprocessing, pattern discovery and pattern analysis. In the present paper an experiment has been conducted on the college data.
