The number of people who use the internet and websites for various purposes is increasing at an astonishing rate. More and more people rely on online sites for renting movies and for buying songs, apparel, books and other items. Competition among the many sites has forced website owners to provide personalized services to their customers, which led to the emergence of recommender systems. Recommender systems are active information filtering systems that attempt to present to the user the information items in which the user is interested. Websites implement recommender systems using collaborative filtering, content-based or hybrid approaches. Recommender systems also suffer from issues such as cold start, sparsity and overspecialization. The cold start problem is that a recommender cannot draw inferences for users or items about which it does not have sufficient information. This paper proposes a solution to the cold start problem that combines association rules with a clustering technique.
Keywords- cold start, association rule, clustering, taxonomy, user profile
Recommender systems are now an integral part of online sites. They are very useful for recommending items or products to users according to their interests. The origin of recommenders can be traced back to fields such as cognitive science, approximation theory, information retrieval and management science. The benefits of having a recommender system are cross-selling, personalization, keeping customers informed and customer retention. Websites that use recommenders include Amazon, MovieLens, eBay, CDNow and MovieFinder. In the collaborative filtering approach, the system recommends new items to the user by analyzing the items purchased by similar users (Amazon.com). In the content-based approach, the system recommends items whose content is similar to the items preferred by the target user (PandoraRadio). In hybrid approaches, both the content-based and the collaborative approaches are used to provide recommendations (Netflix). These approaches provide customers with a number of recommendations.
The cold start problem (new user, new item) is one of the major issues in recommenders. A new user has given very few ratings, so the user profile (which consists of the ratings given to items) is very short. The new user receives non-personalized recommendations until an adequate number of ratings has been collected. A new item added to the system initially has no ratings, so the chance that it will be recommended to a user is minimal. These problems should be addressed because the initial recommendations given to a new user play an important role in user satisfaction and retention. Users will come back to the site only if they receive good recommendations.
Various methods exist for addressing the cold start problem. Some of them are based on association rules, clustering or classification, and many hybrid recommenders also address this issue. In this paper we use association rules and a clustering technique to solve cold start. Association rules are used to create and expand the user profile so that it contains more ratings (this addresses the new user problem). The clustering technique is used to group items and, based on similarity measures, to make predictions for an item (this addresses the new item problem).
The paper is organized as follows. Section II discusses related work. Our proposed approach for solving the cold start problem is presented in Section III. Section IV outlines the proposed methodology. Finally, Section V concludes the paper.
In this section we review some of the works related to our proposed approach.
Much work has been done in the area of recommender systems. Qing Li and Byeong Man Kim explain how clustering techniques can be applied to the item-based collaborative filtering framework to solve the cold start problem. The work of Gavin Shaw, Yue Xu and Shlomo Geva explains how to expand a user profile from a dataset with the help of association rules. The collaborative filtering, content-based and hybrid approaches, together with the issues in recommender systems, are clearly explained in the survey by Adomavicius and Tuzhilin. Schein, Popescul et al. proposed a hybrid recommender system, the aspect model, to recommend items that have not yet been recommended. The relationship between ontologies and recommender systems, and how to exploit this synergy to solve the cold start problem, is described by Middleton, Alani et al. Ziegler, Lausen and Schmidt-Thieme propose a method in which taxonomic background knowledge is used to compute personalized recommendations in a particular domain. Leung, Chung et al. implement a hybrid recommendation algorithm that uses Cross-Level Association Rules (CLARE) to integrate content information about domain items into collaborative filters. Pasquier, Taouil et al. explain how to remove redundant association rules without reducing the information they carry. In several papers, Shaw, Xu and Geva implemented a method that removes hierarchically redundant approximate basis rules from multi-level datasets by using the dataset's hierarchy or taxonomy. New algorithmic elements that increase the accuracy of collaborative filtering are discussed by Herlocker, Konstan et al. Q. Li and B. Kim described a filtering approach that combines the content-based filter and the collaborative filter to achieve good performance. The basics of recommenders and the e-commerce sites that use them are described by Schafer, Konstan and Riedl.
Al Mamunur Rashid et al. proposed an online simulation framework to address the cold start problem. Sunita B. Aher and Lobo proposed various combinations of algorithms for recommending courses to students in e-learning. A combination of clustering and association rules was used to improve recommendations for a digital library in the study by Hui Li and Xinyue Liu. Herlocker, Konstan et al. explain the key decisions in evaluating collaborative filtering recommender systems. Amit Singhal gives a brief overview of the key advances in the field of information retrieval. The basic concepts of collaborative filtering, its limitations, and a clustering-based algorithm for large datasets are explained by Badrul M. Sarwar et al.
In this section we draw an outline of our proposed approach for solving the cold-start problem in recommender systems.
Our approach is to combine two existing approaches in a sequential manner. First we apply association rule technique to expand the user profile as suggested by Gavin Shaw, Yue Xu and Shlomo Geva. With the help of this expanded user profile, we apply clustering techniques for recommendation focusing on new item recommendation as explained by Qing Li and Byeong Man Kim.
The architecture of the proposed work is outlined in Fig. 1.
Fig. 1: Architecture of recommender system based on association rule and clustering technique (stages shown: apply association rule technique, then apply clustering technique)
Now let us go through the mentioned techniques in some detail.
Association rule technique
A recommender system performs best when the user profiles are extensive and the dataset has a high information density. Expanding a user profile means adding more ratings to it. From the existing taxonomy-driven user profiles (P), construct a transactional dataset in which each transaction contains the topics that a user is interested in. Using this transactional dataset, mine the frequent patterns. Association rules between the topics that interest users can then be derived from these patterns; the rules reveal topics that frequently appear together. Now consider this rule set together with the user profiles (P). For each user profile p(ux), extract all the topics (t) it contains and generate the list of all possible combinations of those topics. Each combination represents a possible antecedent of an association rule. For each combination, search the set of association rules for rules with a matching antecedent. If a matching rule exists, take the topics in its consequent and add them to the profile. Each new topic is assigned a weight based on the weights of the topics in the rule's antecedent.
where |A| represents the number of topics in the antecedent of the rule R: A → C. Finally, the topics in the expanded profile are normalized. The topic scores of profile p are normalized through the following formula.
where Limit is the value to which the normalized topic values of a profile must sum. This approach thus helps to resolve the new user problem to an extent.
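The expansion and normalization steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `expand_profile` and the rule representation (antecedent set, consequent set, confidence) are hypothetical, and the weight of a new topic is assumed to be the mean weight of the antecedent topics (their sum divided by |A|) scaled by the rule's confidence. Normalization is assumed to rescale every topic score so that the profile sums to Limit.

```python
from itertools import combinations


def expand_profile(profile, rules, limit=1.0):
    """Expand a topic profile with association rules, then normalize it.

    profile: dict mapping topic -> weight.
    rules:   iterable of (antecedent, consequent, confidence) with set-like
             antecedent and consequent (hypothetical representation).
    """
    # Index the rules by antecedent so each combination is a direct lookup.
    by_antecedent = {}
    for antecedent, consequent, confidence in rules:
        by_antecedent.setdefault(frozenset(antecedent), []).append(
            (consequent, confidence)
        )

    expanded = dict(profile)
    topics = sorted(profile)
    # Every non-empty combination of profile topics is a candidate antecedent.
    for size in range(1, len(topics) + 1):
        for combo in combinations(topics, size):
            for consequent, confidence in by_antecedent.get(frozenset(combo), []):
                # ASSUMPTION: weight = mean antecedent weight * confidence.
                weight = confidence * sum(profile[t] for t in combo) / len(combo)
                for topic in consequent:
                    if topic not in profile:
                        # ASSUMPTION: keep the largest weight if several rules fire.
                        expanded[topic] = max(expanded.get(topic, 0.0), weight)

    # Rescale so that the topic scores sum to `limit`.
    total = sum(expanded.values())
    if total == 0:
        return expanded
    return {topic: limit * w / total for topic, w in expanded.items()}
```

Enumerating all combinations grows exponentially with profile size, which is tolerable here only because cold start profiles are short by definition.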
Clustering is an important technique with a wide variety of applications. In this approach we apply a clustering algorithm to group items, as explained by Li and Kim. The result, represented as a fuzzy set, is used to create the group-rating matrix. The group ratings are used to group the items and to provide content-based information for the collaborative similarity calculation. Here the k-means algorithm is used, and the affiliation between an object and a cluster is represented using fuzzy set theory. The items are thus grouped, and the possibility of an item belonging to a cluster is given as
where Pro (j, k) represents the possibility of object j belonging to the cluster k; CS (j, k) represents the counter-similarity between object j and cluster k; MaxCS (i, k) represents the maximum counter-similarity between an object and cluster k. Fuzzy k-means algorithm is also applied to group the items.
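As a rough illustration of how the group-rating matrix could be built from these definitions, the sketch below assumes that the possibility takes the form Pro(j, k) = 1 - CS(j, k) / MaxCS(i, k), and that the counter-similarity is the Euclidean distance between an item's feature vector and a cluster centroid. Both are assumptions of this sketch, as is the function name `membership_matrix`; the centroids are taken as given from a prior k-means run.

```python
import math


def membership_matrix(items, centroids):
    """Return Pro[j][k], the possibility that item j belongs to cluster k.

    items and centroids are non-empty sequences of equal-length numeric tuples.
    """
    # Counter-similarity CS(j, k): Euclidean distance to the centroid (assumed).
    cs = [[math.dist(x, c) for c in centroids] for x in items]
    # MaxCS(i, k): the largest counter-similarity of any item to cluster k.
    max_cs = [max(row[k] for row in cs) for k in range(len(centroids))]
    return [
        [
            1.0 - row[k] / max_cs[k] if max_cs[k] > 0 else 1.0
            for k in range(len(centroids))
        ]
        for row in cs
    ]
```

Under this form, the item farthest from a cluster gets possibility 0 for it and an item lying exactly on the centroid gets possibility 1.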
Compute the sub-similarity from the group-rating matrix and the sub-similarity from the item-rating matrix, then combine them to obtain the total similarity. In this approach, Pearson correlation-based similarity and adjusted cosine similarity are used, and the results are combined linearly. Pearson correlation measures the degree to which a linear relationship exists between two variables. Here it is used to calculate similarity from the item-rating matrix:
$$\mathrm{sim}(k,l) = \frac{\sum_{u=1}^{m} (R_{u,k} - R'_k)(R_{u,l} - R'_l)}{\sqrt{\sum_{u=1}^{m} (R_{u,k} - R'_k)^2}\,\sqrt{\sum_{u=1}^{m} (R_{u,l} - R'_l)^2}} \qquad (4)$$
where sim(k, l) is the similarity between items k and l; m is the number of users who rated both items k and l; $R'_k$ and $R'_l$ are the average ratings of items k and l; and $R_{u,k}$ and $R_{u,l}$ are the ratings of user u on items k and l.
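Equation (4) can be sketched in Python as follows. The nested-dictionary layout (user, then item, then rating) and the function name are assumptions made for illustration. The item averages are taken over all ratings of each item, while the sums run only over users who rated both items; other conventions exist.

```python
import math


def pearson_item_similarity(ratings, k, l):
    """Pearson correlation between items k and l (equation 4).

    ratings: dict mapping user -> dict mapping item -> rating.
    """
    def item_mean(item):
        values = [r[item] for r in ratings.values() if item in r]
        return sum(values) / len(values)

    # Only users who rated both items contribute to the sums.
    common = [u for u, r in ratings.items() if k in r and l in r]
    if not common:
        return 0.0

    mean_k, mean_l = item_mean(k), item_mean(l)
    num = sum((ratings[u][k] - mean_k) * (ratings[u][l] - mean_l) for u in common)
    den = math.sqrt(sum((ratings[u][k] - mean_k) ** 2 for u in common)) * math.sqrt(
        sum((ratings[u][l] - mean_l) ** 2 for u in common)
    )
    # A zero denominator means at least one item has no rating variance.
    return num / den if den else 0.0
```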
Similarity from the group-rating matrix is found using adjusted cosine similarity:
$$\mathrm{sim}(k,l) = \frac{\sum_{u=1}^{m} (R_{u,k} - R'_u)(R_{u,l} - R'_u)}{\sqrt{\sum_{u=1}^{m} (R_{u,k} - R'_u)^2}\,\sqrt{\sum_{u=1}^{m} (R_{u,l} - R'_u)^2}} \qquad (5)$$
where sim(k, l) is the similarity between items k and l; m is the number of users who rated both items k and l; $R'_u$ is the average rating given by user u; and $R_{u,k}$ and $R_{u,l}$ are the ratings of user u on items k and l.
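A corresponding sketch of equation (5) is given below. It differs from the Pearson version only in subtracting each row's own average rather than the item averages. The data layout and function name are again assumptions; when the input is the group-rating matrix, the rows would be whatever plays the role of u there.

```python
import math


def adjusted_cosine_similarity(ratings, k, l):
    """Adjusted cosine similarity between items k and l (equation 5).

    ratings: dict mapping user -> dict mapping item -> rating.
    """
    num = 0.0
    sq_k = 0.0
    sq_l = 0.0
    for row in ratings.values():
        if k not in row or l not in row:
            continue  # only rows containing both items contribute
        # Average of this row's ratings, subtracted from both entries.
        row_mean = sum(row.values()) / len(row)
        dev_k = row[k] - row_mean
        dev_l = row[l] - row_mean
        num += dev_k * dev_l
        sq_k += dev_k ** 2
        sq_l += dev_l ** 2
    den = math.sqrt(sq_k) * math.sqrt(sq_l)
    return num / den if den else 0.0
```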
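Now take the linear combination of the two to get the total item similarity:
$$\mathrm{sim}(k,l) = \mathrm{sim}(k,l)_{item} \times (1 - c) + \mathrm{sim}(k,l)_{group} \times c \qquad (6)$$
where c is the combination coefficient that weights the group-rating similarity against the item-rating similarity.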
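The linear combination in equation (6) is small enough to show directly. In this sketch the default value of c is a placeholder chosen for illustration, not a value taken from the paper, and the restriction of c to the interval [0, 1] is an assumption.

```python
def total_similarity(sim_item, sim_group, c=0.4):
    """Combine item-rating and group-rating similarities (equation 6)."""
    # ASSUMPTION: c is a weight between 0 and 1; 0.4 is only a placeholder.
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return sim_item * (1.0 - c) + sim_group * c
```

For an item with no ratings the item-rating similarity carries no information, so the group-rating term is what lets a new item acquire neighbors at all.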
Then make a prediction for the item by taking a weighted average of deviations from the neighbors' means. The prediction for user u on item k is
$$P_{u,k} = R'_k + \frac{\sum_{i=1}^{n} (R_{u,i} - R'_i)\,\mathrm{sim}(k,i)}{\sum_{i=1}^{n} |\mathrm{sim}(k,i)|} \qquad (7)$$
where $P_{u,k}$ is the prediction for user u on item k and n is the number of neighbors of item k. In equation (7), $R'_k$ is the average of all ratings on item k. For a new item, $R'_k$ is zero because the item has no ratings yet, so equation (7) cannot be applied directly. For a new item we use two methods. First, we replace $R'_k$ with $R'_{neighbors}$, the average of all ratings on the new item's nearest neighbors. Second, we use a weighted sum method:
$$P_{u,k} = \frac{\sum_{i=1}^{n} R_{u,i}\,\mathrm{sim}(k,i)}{\sum_{i=1}^{n} |\mathrm{sim}(k,i)|} \qquad (8)$$
where $P_{u,k}$ is the prediction for user u on item k and n is the number of neighbors of item k. Thus we attempt to solve the new item problem.
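The two prediction formulas can be sketched as follows. The function names and the dictionary inputs are assumptions made for illustration. The baseline is passed in by the caller, so the same function covers both equation (7), with the item's own average, and the first new-item method, with the neighbors' average.

```python
def predict_with_deviation(user_ratings, item_means, sims, baseline):
    """Equation (7): baseline plus weighted average deviation.

    user_ratings: item -> rating given by the target user.
    item_means:   item -> average rating of that item.
    sims:         neighbor item -> similarity to the target item.
    baseline:     average rating of the target item, or of its nearest
                  neighbors when the target item is new.
    """
    neighbors = [i for i in sims if i in user_ratings]
    denom = sum(abs(sims[i]) for i in neighbors)
    if denom == 0:
        return baseline  # no usable neighbors: fall back to the baseline
    deviation = sum((user_ratings[i] - item_means[i]) * sims[i] for i in neighbors)
    return baseline + deviation / denom


def predict_weighted_sum(user_ratings, sims):
    """Equation (8): similarity-weighted sum of the user's own ratings."""
    neighbors = [i for i in sims if i in user_ratings]
    denom = sum(abs(sims[i]) for i in neighbors)
    if denom == 0:
        return None  # nothing to base a prediction on
    return sum(user_ratings[i] * sims[i] for i in neighbors) / denom
```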
Combination of association rule and clustering technique
The above techniques are combined sequentially in our proposed work. First, we expand the user profiles using the association rule technique. The result is an enriched user profile that contains more ratings. The resulting item-rating matrix is used in the next phase, clustering. The clustering technique is applied to the new items so that the user is finally provided with the top N recommendations.
For our experiment we use the MovieLens dataset from GroupLens. The dataset contains missing values and therefore needs preprocessing. We preprocess the dataset using the Weka software and store the preprocessed data in an MS Access database.
Next, user profiles are created from the information obtained from the users. The taxonomy tree for movies is created manually, and the taxonomy-based user profiles are built from it. Using these taxonomy-driven user profiles (P), construct a transactional dataset and mine the frequent patterns. The association rules between topics that interest users can be derived from the patterns, and these rules reveal the topics that frequently appear together. For each user profile, the next step is to extract all topics and generate the list of all possible combinations of those topics. Each combination represents a possible antecedent of an association rule. For each combination, search the set of association rules for rules with a matching antecedent. If a matching rule is found, take the topics in its consequent and add them to the profile. Each new topic is assigned a weight based on the weights of the topics in the rule's antecedent, as given in Equation (1). The topics in the expanded profile are normalized using Equation (2). The expanded profile contains more user ratings. These user profiles, which contain the users' ratings for movies, can now be used to solve the new item problem by applying the clustering technique as explained above. Finally we obtain a set of recommendations for each user, from which we take the top N and present them to the user.
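The mining step can be illustrated with a deliberately simplified sketch. It counts only single topics and topic pairs, so every rule it produces has one topic in the antecedent and one in the consequent; a full frequent-pattern miner would handle larger itemsets. The function name and the threshold values are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Derive one-to-one association rules from topic transactions.

    transactions: list of topic collections, one per user.
    Returns a list of (antecedent, consequent, confidence) tuples.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        topics = sorted(set(transaction))
        for topic in topics:
            counts[(topic,)] += 1
        for pair in combinations(topics, 2):
            counts[pair] += 1

    rules = []
    for itemset, count in counts.items():
        # Keep only frequent pairs: support = share of transactions with both.
        if len(itemset) != 2 or count / n < min_support:
            continue
        a, b = itemset
        for antecedent, consequent in ((a, b), (b, a)):
            # Confidence = support(pair) / support(antecedent).
            confidence = count / counts[(antecedent,)]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules
```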
The quality of the recommendations will be checked using evaluation metrics such as precision, recall and the F1 measure.
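For reference, these three metrics can be computed for a single user's Top-N list as in the sketch below. The function name is hypothetical, and treating "relevant" as the set of items the user actually liked in a held-out test set is an assumption about the evaluation protocol.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one Top-N recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    # Precision: share of recommended items that are relevant.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of relevant items that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```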
Our work is based on the suggestion that a taxonomy-driven profile will improve the recommendations, since it covers more of the topics in which the user is interested. The enriched profile thus generated should help in clustering new items and in producing quality recommendations for users. We therefore expect our approach to perform better than applying the association rule and clustering techniques separately.
Our proposed methodology for the implementation of this proposal is shown in Table 1.
Table 1: Phases in proposed methodology
1. Data collection and data preprocessing
2. Creation of taxonomy based user profiles
3. Application of association rule technique to enrich the user profiles
4. Application of clustering technique to improve item recommendation
5. Project the Top-N recommendations to the user
6. Evaluating the recommendation quality using metrics
In this paper we proposed a method to solve the cold start problem in recommender systems. The proposal is in the implementation stage and has been completed up to dataset preprocessing. Using the results, we also plan a comparative study of recommendation quality when applying the association rule technique only, the clustering technique only, and the combination of the two.