The previous chapter discussed the fundamentals of web mining and the Markov process. This chapter presents the approaches taken by several researchers to link prediction using web mining and Markov models.
Caching popular objects close to users helps combat access latency by allowing users to fetch data from a nearby cache rather than from a distant server. Web caching is recognized as an effective scheme for alleviating the service bottleneck and reducing network traffic, thereby minimizing user access latency. Its drawback is that it stores pages without any prior knowledge of which pages will be requested. Predictive caching is therefore an attractive solution: the pages likely to be requested next are predicted from user access log information and prefetched while the user is still browsing the currently displayed page.
As web page prediction has gained importance, this thesis proposes an approach for increasing web server performance by analysing user behavior. Prediction and prefetching are performed by preprocessing the user access logs and integrating three techniques, namely clustering, Markov models and association rules, to achieve better web page access prediction accuracy.
Three fundamental questions that users may ask while navigating a Web site are as follows:
Where am I now?
Where have I been?
Where can I go next?
Current browsers give a good answer to the first two questions but not to the third. To find out where he/she currently is, for example, the user can check the address bar of the browser. Users can therefore answer the first two questions very easily, but they cannot get the answer to the third question directly from the Web browser.
Figure: Navigational Problem for the Web User
The figure above depicts one of the navigational problems for the web user: from the current page, where can he/she go next? One answer is that the link with the highest probability is selected. For example, a user who enters the IT department section of an educational site tends to go on to the IT department profile, the IT faculty profile and so on, and is much less likely to go to the admin department.
Link prediction can help users answer the third question. The navigation process of a user on a web site can be modelled as a first-order Markov chain, i.e., the next page to be visited depends only on the current page. First, a link structure, also called a link graph, is constructed from past users' visit behavior recorded in the web log file. It consists of nodes representing web pages, links representing hyperlinks, and weights representing users' traversals of the hyperlinks of the web site. The link graph is then used to build a Markov chain of the web site, and this model is used for link prediction.
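The construction described above can be illustrated with a minimal sketch. It assumes that the web log has already been reduced to per-user sessions (ordered lists of page names); the page names, the sample sessions and the function names are hypothetical and are not taken from any of the surveyed works.

```python
from collections import defaultdict

def build_link_graph(sessions):
    """Count how often each link (current page -> next page) was traversed."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            graph[src][dst] += 1
    return graph

def to_markov_chain(graph):
    """Normalize the outgoing weights of each node into transition probabilities."""
    chain = {}
    for src, targets in graph.items():
        total = sum(targets.values())
        chain[src] = {dst: count / total for dst, count in targets.items()}
    return chain

def predict_next(chain, current_page):
    """Return the most probable next page, or None if no outgoing link was observed."""
    targets = chain.get(current_page)
    if not targets:
        return None
    return max(targets, key=targets.get)

# Hypothetical sessions for an educational site (illustrative data only).
sessions = [
    ["home", "it_dept", "it_faculty"],
    ["home", "it_dept", "it_profile"],
    ["home", "it_dept", "it_faculty"],
    ["home", "admin_dept"],
]

graph = build_link_graph(sessions)
chain = to_markov_chain(graph)
print(chain["home"])                   # transition probabilities out of "home"
print(predict_next(chain, "it_dept"))  # most likely link after "it_dept"
```

In this toy data, three of the four visits to the home page continue to the IT department, so that link receives probability 0.75, and the predicted link after the IT department page is the faculty profile.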
Predicting the next page to be accessed by Web users has attracted a large amount of research lately because of the positive impact of such prediction on different areas of Web-based applications. The major technique applied for this purpose is the Markov model, a framework for predicting the next page to be accessed by the Web user. It is the most commonly used prediction model because of its high accuracy.
Markov models have been used for studying and understanding stochastic processes, and have been shown to be well suited for modelling and predicting a user's browsing behavior on a web site. They are now very commonly used to identify the next page to be accessed by the Web site user based on the sequence of previously accessed pages. A Markov model is represented by three parameters <A, S, T>, where A is the set of all possible actions that can be performed by the user; S is the set of all possible states for which the Markov model is built; and T is a |S| × |A| Transition Probability Matrix (TPM), where each entry t_ij corresponds to the probability of performing action j when the process is in state i.
The navigation probability provides the means to predict the next link choice of unseen navigation sessions and can thus be used for prefetching links in adaptive web applications.
The above process can be summarized as follows.
Data cleaning or preprocessing (Filtering log files)
Construction of link graph
Construction of transition probability matrix (Markov model)
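The first of these steps, log filtering, can be sketched as follows. The sketch is only an illustration under several assumptions that the text does not fix: the server writes its log in Common Log Format, only successful GET requests for pages are kept, embedded resources are recognised by a hypothetical list of file suffixes, and requests from one host are split into sessions by a 30-minute inactivity timeout. The resulting sessions are the input from which a link graph and a transition probability matrix would then be built.

```python
import re
from datetime import datetime, timedelta

# Assumed input format: Common Log Format, e.g.
# host - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Hypothetical list of suffixes treated as embedded resources rather than pages.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

def clean_log(lines):
    """Keep successful GET requests for pages; drop errors and embedded resources."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        if match["method"] != "GET" or match["status"] != "200":
            continue
        url = match["url"].split("?")[0]
        if url.lower().endswith(STATIC_SUFFIXES):
            continue
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        records.append((match["host"], when, url))
    return records

def sessionize(records, timeout=timedelta(minutes=30)):
    """Group each host's requests into sessions split by an inactivity timeout."""
    sessions = []
    open_session = {}  # host -> index of that host's current session
    last_seen = {}     # host -> time of that host's previous request
    for host, when, url in sorted(records, key=lambda record: record[1]):
        if host not in last_seen or when - last_seen[host] > timeout:
            open_session[host] = len(sessions)
            sessions.append([])
        sessions[open_session[host]].append(url)
        last_seen[host] = when
    return sessions

# Illustrative log lines (made-up data).
log_lines = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:56:10 +0000] "GET /it/faculty.html HTTP/1.1" 200 1840',
    '10.0.0.1 - - [10/Oct/2023:13:57:02 +0000] "GET /missing.html HTTP/1.1" 404 209',
    '10.0.0.1 - - [10/Oct/2023:15:10:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
]
sessions = sessionize(clean_log(log_lines))
print(sessions)
```

Identifying users by host address alone is a simplification; real logs often require additional heuristics (for example, user agents or cookies) to separate users behind a shared address.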
Limitations of traditional Markov models
Traditional Markov models predict the next Web page a user will most likely access by matching the user's current access sequence with the user's historical Web access sequences. The 0-order Markov model is the unconditional base-rate probability p(x_n) = Pr(X_n = x_n), which is the page visit probability. The first-order Markov model looks at page-to-page transition probabilities: p(x_2 | x_1) = Pr(X_2 = x_2 | X_1 = x_1). The kth-order Markov model considers the conditional probability that a user transitions to the nth page given his or her previous k page visits (with k at most n - 1): p(x_n | x_{n-1}, ..., x_{n-k}) = Pr(X_n = x_n | X_{n-1} = x_{n-1}, ..., X_{n-k} = x_{n-k}).
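The probabilities above can be estimated from access sequences as relative frequencies. The following minimal sketch assumes this simple counting estimate without any smoothing; the sessions, page labels and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_kth_order(sessions, k):
    """Count next-page occurrences for every context of k consecutive pages.

    For k = 0 the context is the empty tuple, which yields the base-rate
    page visit frequencies.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(k, len(session)):
            context = tuple(session[i - k:i])
            counts[context][session[i]] += 1
    return counts

def probability(counts, context, page):
    """Estimate p(page | context) as a relative frequency; 0.0 for unseen contexts."""
    seen = counts.get(tuple(context))
    if not seen:
        return 0.0
    return seen[page] / sum(seen.values())

# Illustrative sessions (made-up data).
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "D"],
    ["A", "B", "C"],
]

order0 = train_kth_order(sessions, 0)
order1 = train_kth_order(sessions, 1)
order2 = train_kth_order(sessions, 2)

print(probability(order0, [], "B"))          # p(B)
print(probability(order1, ["B"], "C"))       # p(C | B)
print(probability(order2, ["A", "B"], "C"))  # p(C | A, B)
print(probability(order2, ["E", "B"], "D"))  # p(D | E, B)
```

In this toy data the first-order model cannot separate the pages C and D after B (both have probability 0.5), whereas the second-order model distinguishes users who arrived from A from those who arrived from E.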
Lower-order Markov models cannot successfully predict future Web page access because they do not look far enough into the past to correctly discriminate users' behavioural modes. Thus, good predictions require higher-order models. Unfortunately, higher-order models result in high state-space complexity and low coverage.
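The trade-off between state-space complexity and coverage can be made concrete with a small sketch. It assumes that a state of a kth-order model is a run of k consecutive pages, that the number of possible states is bounded by the number of pages raised to the power k, and that coverage is measured as the fraction of prediction points in held-out sessions whose context was observed in training. The data and names are hypothetical.

```python
def contexts(sessions, k):
    """Yield every run of k consecutive pages that is followed by another page."""
    for session in sessions:
        for i in range(k, len(session)):
            yield tuple(session[i - k:i])

def max_states(sessions, k):
    """Upper bound on the number of kth-order states: (number of pages) ** k."""
    pages = {page for session in sessions for page in session}
    return len(pages) ** k

def coverage(train_sessions, test_sessions, k):
    """Fraction of test contexts that were also observed in the training sessions."""
    known = set(contexts(train_sessions, k))
    test_contexts = list(contexts(test_sessions, k))
    if not test_contexts:
        return 0.0
    return sum(context in known for context in test_contexts) / len(test_contexts)

# Illustrative training and held-out sessions (made-up data).
train = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
held_out = [
    ["A", "B", "C", "D"],
    ["C", "A", "B", "D"],
]

for k in (1, 2, 3):
    print(k, max_states(train, k), coverage(train, held_out, k))
```

On this toy data the bound on the number of states grows from 4 to 16 to 64 as the order increases from 1 to 3, while coverage falls from 1.0 to 0.75 to 0.5.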