The rapid development and widespread popularity of the Internet have led to explosive growth in online information. To serve users efficiently, a search system must find accurate information in a timely manner. However, the information on the Web is not used effectively. Users searching for documents or Web pages rarely have the time or patience to modify queries, wait for results, and review them. Worse, some users habitually use vague keywords. For this reason, explicit and implicit feedback techniques, which train models of users' information needs from their actions and events, are becoming increasingly important. Combined, these techniques can offer users more accurate information than search engines that do not use feedback. This paper introduces several important implicit and explicit feedback search techniques and approaches and discusses their relative strengths and weaknesses. Finally, the author argues a position on how these techniques should be applied.
Despite the rapid development of search technology and the improving effectiveness of search engines, the persistently increasing amount of Web data has become a critical problem for Web search. To deal with this problem - information overload on the Web - many search techniques have been developed.
Currently, the search engine is the most widely used means of accessing information on the Web. However, it is often difficult for users, especially novice users, to fully express their needs and retrieve the information they want with only a few keywords. Some services, such as Google's directory, summarize and classify websites manually for navigation, which is convenient for users but costly to maintain. The descriptions of websites in such directories are also very sketchy, and users usually cannot query the important information inside the websites.
As a result, there is an urgent need for simpler, more personalized, and more capable network information retrieval tools that help users reach the information they need faster and more accurately. This need has motivated personalized search techniques such as implicit and explicit feedback search. These techniques use feedback to organize results according to each user's information needs. In contrast, traditional search systems return the same result list for the same query, regardless of who submitted it or what that user needs. It is therefore necessary to study the reactions and behavior of users while they search for information.
The explicit and implicit feedback techniques and typical instances are discussed in the following sections. The paper is organized as follows. Section 2 gives a brief overview of implicit and explicit feedback techniques. Section 3.1 covers explicit feedback search techniques. Section 3.2 covers implicit feedback techniques based on the context of the current activity or on search histories. Section 4 compares explicit and implicit feedback techniques and presents the author's argument. Finally, Section 5 presents conclusions.
2. An overview of implicit and explicit feedback search techniques
The previous section introduced the motivation for implicit and explicit feedback search techniques. This section gives an overview of the techniques and tools proposed to achieve this goal.
2.1 Definition of implicit and explicit feedback
Explicit feedback is obtained from users' own judgments of the relevance of a document retrieved for a query. The precondition of explicit feedback is that users know the feedback they provide is integrated into the retrieval process to improve search performance.
Implicit feedback is obtained from user data and usage data, such as the user's interests, which documents the user selects for viewing, and the time spent viewing a Web page.
The major differences between these two kinds of relevance feedback include:
For implicit relevance feedback, users are just trying to satisfy their own needs and are not intentionally doing anything for the benefit of the IR system.
For implicit relevance feedback, users are usually not informed that their user data and usage data will be integrated into relevance feedback.
2.2 Sources of implicit and explicit feedback
To provide effective personalized assistance, one of the most significant problems is acquiring users' preferences. Some approaches use machine learning to analyze user data, which describes users' personal characteristics, while others apply data mining techniques to Web surfing histories or search system logs. User data and usage data differ: user data is information such as name, age, address, and phone number, whereas usage data relates to users' behavior while interacting with the system.
For explicit feedback, a binary or graded relevance system is usually used to obtain users' explicit indication of a result's relevance. With binary relevance feedback, users indicate whether a Web link or document is relevant or not for a given query. With graded relevance feedback, users rate the relevance of a Web link or document on a scale of numbers, letters, or descriptions, and the results are presented in descending order of relevance.
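As a minimal sketch of the two judgment schemes described above, the following Python snippet filters results by binary judgments and orders results by graded judgments. The function names, the 0 to 3 grade scale, and the sample document identifiers are illustrative assumptions and do not come from any system discussed in this paper.

```python
from typing import Dict, List


def binary_feedback(judgments: Dict[str, bool]) -> List[str]:
    """Return only the documents the user marked as relevant."""
    return [doc for doc, relevant in judgments.items() if relevant]


def graded_feedback(judgments: Dict[str, int]) -> List[str]:
    """Return documents in descending order of the user's grade.

    Documents with equal grades keep their original order because
    sorted() is stable.
    """
    return sorted(judgments, key=lambda doc: judgments[doc], reverse=True)


# Hypothetical judgments for the query "java"
binary = {"java-tutorial": True, "java-island": False}
graded = {"java-tutorial": 3, "java-island": 0, "jvm-internals": 2}

print(binary_feedback(binary))
print(graded_feedback(graded))
```

A graded scale carries more information than a binary one, but it also asks more of the user, which is the trade-off discussed in the following paragraphs.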
These techniques improve retrieval performance, but experience shows that explicit relevance feedback cannot significantly enhance the user model, especially when the interface does not clearly present the information obtained. Moreover, users are usually unwilling to spend extra time and effort explicitly describing their needs or refining them through feedback, which they often find confusing and a waste of time and effort.
Additionally, statistics indicate that users usually start a retrieval session on the Internet with simple, easily composed keywords rather than "wasting" time modifying their query keywords. Explicit feedback techniques therefore seem not very effective: they require extra time and effort during searching, which places more load on users, and an improvement in the search results is not guaranteed.
Furthermore, users often cannot provide the best query keywords, since they usually do not know how the back-end search system works. In addition, some users do not like to spend time manually updating their feedback. Instead of relying on users' explicitly specified queries, an alternative way to improve search results is to develop algorithms that infer users' needs implicitly.
For implicit feedback, on the other hand, the feedback algorithm enhances retrieval performance by expanding the user's query with words that do not match the original keywords but are related to them. The words used to expand the original query are taken from documents that the user has read previously.
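The query expansion step can be sketched as follows. This is a simplified illustration, not the algorithm of any particular system: it assumes that term frequency in previously read documents is an acceptable proxy for relatedness, and the stopword list, the function name `expand_query`, and the sample documents are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def expand_query(query: str, read_documents: Iterable[str], extra_terms: int = 2) -> List[str]:
    """Append the most frequent terms from previously read documents.

    Stopwords and terms already in the query are skipped, so every
    added term is new to the query.
    """
    query_terms = query.lower().split()
    counts = Counter(
        word
        for doc in read_documents
        for word in doc.lower().split()
        if word not in STOPWORDS and word not in query_terms
    )
    expansion = [term for term, _ in counts.most_common(extra_terms)]
    return query_terms + expansion


# Hypothetical reading history of a user who then searches for "java"
history = [
    "java compiler and virtual machine",
    "compiler for virtual threads",
    "compiler design",
]
print(expand_query("java", history))
```

With this history the query "java" is expanded toward programming topics without the user typing anything extra, which is the point of the implicit approach.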
To spare users from making explicit judgments, the implicit feedback system monitors users' actions and events. There are two main ways to collect a user's profile or relevant information in an implicit feedback search system: obtain it on the server side, for example from browsing histories and Web access logs, or record it on the client side, for example from browser cookies, mouse clicks, and keystrokes. Table 2.2.1 compares the server-side and client-side options. Bharat et al. propose an approach in the Web-based newspaper domain that tracks users' actions and updates their profiles on the back end to provide implicit feedback. In this approach, actions such as scrolling or selecting a specific document indicate that the user is interested in the current hit, and the system keeps adding to that document's score. Once the score exceeds a maximum set by the system, the user's profile is updated according to the information in the document. The weights of the keywords in the profile change according to the documents the user has clicked or used. When a new document containing some of the keywords recorded in the profile appears, it receives a score and is included in the personalized information.
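The score-and-threshold mechanism described in this paragraph can be sketched as below. This is only an illustration of the idea under stated assumptions: the per-action scores, the threshold value, the unit weight increment, and the class name `ImplicitProfile` are hypothetical and are not taken from Bharat et al.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Assumed values, chosen for illustration only.
ACTION_SCORES = {"select": 2.0, "scroll": 1.0}
MAX_SCORE = 3.0


class ImplicitProfile:
    """Keyword-weight profile updated from monitored user actions."""

    def __init__(self) -> None:
        self.doc_scores: Dict[str, float] = defaultdict(float)
        self.keyword_weights: Dict[str, float] = defaultdict(float)
        self.absorbed: Set[str] = set()

    def observe(self, doc_id: str, action: str, doc_keywords: List[str]) -> None:
        """Add the action's score; update the profile once the score exceeds MAX_SCORE."""
        self.doc_scores[doc_id] += ACTION_SCORES.get(action, 0.0)
        if self.doc_scores[doc_id] > MAX_SCORE and doc_id not in self.absorbed:
            for keyword in doc_keywords:
                self.keyword_weights[keyword] += 1.0
            # Each document contributes to the profile only once.
            self.absorbed.add(doc_id)

    def score_new_document(self, doc_keywords: List[str]) -> float:
        """Score an unseen document by the profile weights of its keywords."""
        return sum(self.keyword_weights.get(k, 0.0) for k in doc_keywords)


profile = ImplicitProfile()
keywords = ["election", "senate"]
profile.observe("doc1", "select", keywords)  # score 2.0, below threshold
profile.observe("doc1", "scroll", keywords)  # score 3.0, not yet above threshold
profile.observe("doc1", "scroll", keywords)  # score 4.0, profile is updated
print(profile.score_new_document(["senate", "budget"]))
```

A new article that shares keywords with the profile receives a positive score and can be included in the personalized selection, while an article with no shared keywords scores zero.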
Table 2.2.1. Comparison between profiles stored on the server side and on the client side. Information is exchanged between the server side and the client side in both directions.
Access to rich Web/group information
Personal data stored by someone else
Need to approximate Web statistics
Implicit feedback techniques seem to do users a big favor. However, they can cause a problem: sometimes users want only general, non-specific information, which calls for a low level of personalization, but a personalization system usually filters out information that it considers unrelated to the topics recognized in what the user has read before. The details of the problem are discussed in Section 4.2.
Basically, the feedback system uses the keywords users enter, together with their actions and events, to improve the ranking of the retrieval results. Meanwhile, to make the query clearer and more precise, the explicit and implicit feedback system builds and updates the users' profiles, as shown in Fig. 2.2. The details of these approaches are described later in this paper.
Fig. 2.2. The users' profiles are built and updated by the explicit and implicit feedback system.
2.3 Differences between implicit and explicit feedback
With a variety of approaches available, implicit and explicit feedback search on the Web has recently been receiving more and more attention. It is difficult to find a precise criterion that distinguishes the underlying principles and techniques. One proposed organization is shown in Table 2.3.1, where the implicit and explicit feedback approaches are sorted by the type of feedback, the technique they are based on, and the typical input data used during profiling.
Table 2.3.1. The users' profiles are built and updated by two main kinds of technique, explicit feedback and implicit feedback. The automatically expanded query keywords are extracted from resources selected by the users or from documents the users have browsed so far.
Technique based on
Typical Input Data
Adaptive Result Clustering
selected clusters in taxonomies
docs, emails, Web pages
past queries, selected pages
2.3.1 Advantages and disadvantages of explicit feedback search technologies
With explicit search technologies, the user is in the driver's seat. Within the limits of the available tools, users can customize and filter resources on a site in the manner that best suits their needs. The first approach, discussed in Section 3.1.1, is based on explicit feedback techniques. Normally, explicit feedback employs a voting system to classify the retrieved results. This technique is very useful when the user cannot formulate a query correctly: users do not have to find an accurate keyword at first, but can use a general keyword to obtain general results, then analyze the sorted results and use the hits most relevant to what they are looking for.
However, a single query usually generates millions of hits, and it is overwhelming for users to go through all of the Web links retrieved by the search engine to find the information they are seeking, although they may sometimes find the desired documents or Web pages at the top of the result list. To solve this problem, clustering is used to group the query results: results on the same topic are placed in the same cluster, so users get better organized retrieval results and find interesting information more easily. Most of the time, the clusters are further partitioned or organized into a hierarchical tree structure. During clustering, the score of each document is calculated for the given query, and the hits with the highest scores are returned at the top of the list. Such a system is considered an explicit feedback search system because the users' explicit actions, driven by their own needs, such as going through the returned clusters and clicking on interesting documents, are involved in the information retrieval process. This kind of Adaptive Result Clustering is investigated in Section 3.1.1.
2.3.2 Advantages and disadvantages of implicit feedback search technologies
With implicit search technologies, it is the site itself that adapts independently to the needs of the user. It can automatically adapt its resources based on the user's profile and prior behavior. The approaches based on implicit feedback are discussed in Section 3.2.1 and Section 3.2.2, where users do not need to state their preferences or needs explicitly. A client-side application captures user actions and events, such as browsed websites, edited files, or e-mails saved on local drives. It would be nearly impossible for the search system itself to find and analyze all of this information, which is very useful for understanding the user's current working context and can be used to modify query keywords or infer the user's interests. Implicit feedback technologies based on the user's profile and current files or Web pages exploit this information to infer what the user currently wants, and then search for documents related to the user's actions and events. Search systems whose feedback is limited to the Web search history form a separate class. Search engines can access this information for each user without requiring any client-side application. Query histories, the information returned in query results, documents chosen by users, anchor text, topics relevant to a given Web page, and data such as click-through rate, browsing style, and time of page visits are all easily recorded on the server side. In addition, the implicit feedback search process can then be executed in the traditional information retrieval style, which means a faster response.
However, an implicit feedback system does not always have access to users' data or usage data. Even when it does, the raw data may need to be filtered or organized. In such cases, it would be better to use explicit feedback techniques than implicit ones.
3. Analysis of explicit and implicit feedback search techniques
3.1 Explicit feedback techniques
3.1.1 Adaptive Result Clustering
Adaptive clustering uses external feedback to improve cluster quality, and past experience serves to speed up execution. It supports the reuse of clusterings by memorizing what worked well in the past, and it can explore multiple paths in parallel when searching for good clusters. With this technique, the user's effort is explicitly involved, for example in scoring or marking an interesting result cluster.
Traditionally, search engines take the query provided by the user and return thousands of results in a seemingly endless list ordered by the similarity score between the keywords and each Web document. To find the relevant information, users have to review the list item by item, going over the titles and the textual abstracts snipped from the Web pages. This process usually takes a long time, even when the user can submit a very clear query to the search engine.
To solve this problem of traditional search techniques, extra actions by the user are involved. Adaptive result clustering clusters the search results and improves cluster quality by asking for the user's external feedback. To keep execution time short, it records the query history and reuses the information that worked well in the past. Additionally, adaptive result clustering supports parallel searching.
Adaptive result clustering is adopted by two Web search engines, CLUSTY and KARTOO. Both group query results together if they share a topic. CLUSTY, which is based on the VIVISIMO clustering engine, separates the query results into "folders and subfolders". KARTOO, on the other hand, organizes the query results in an interactive graphical map. When the mouse pointer moves over a result, a brief description appears, and the more relevant the result, the larger its icon. KARTOO closed down in January 2010, removing all content from its websites and leaving only a short message in French thanking its users for their support.
Since the query results are usually clustered after the retrieval process, it is vital that clustering be very fast. To achieve this, document snippets are extracted and clustered instead of whole documents from the Web. It is not necessary to predefine the categories. Finally, concise and accurate cluster descriptions are required so that users can pick out the best matching information.
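The three requirements in the preceding paragraph (cluster snippets rather than full documents, no predefined categories, short cluster descriptions) can be illustrated with a deliberately naive sketch. It uses greedy single-pass grouping with Jaccard similarity on snippet terms, which is a stand-in chosen for brevity and is not the algorithm used by VIVISIMO, CLUSTY, or KARTOO. The threshold, the stopword list, and all function names are assumptions.

```python
from collections import Counter
from typing import List, Set

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def tokens(snippet: str) -> Set[str]:
    """Lowercased, whitespace-split terms of a snippet without stopwords."""
    return {w for w in snippet.lower().split() if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_snippets(snippets: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Greedy single-pass clustering of result snippets.

    Each snippet joins the first cluster whose seed (first) snippet is
    similar enough; otherwise it starts a new cluster. No categories
    are defined in advance.
    """
    clusters: List[List[str]] = []
    for snippet in snippets:
        for cluster in clusters:
            if jaccard(tokens(snippet), tokens(cluster[0])) >= threshold:
                cluster.append(snippet)
                break
        else:
            clusters.append([snippet])
    return clusters


def describe(cluster: List[str], size: int = 2) -> List[str]:
    """Label a cluster with its most frequent non-stopword terms."""
    counts = Counter(w for s in cluster for w in tokens(s))
    return [term for term, _ in counts.most_common(size)]


results = [
    "java programming language tutorial",
    "java island travel guide",
    "learn java programming online",
    "travel tips for java island",
]
for group in cluster_snippets(results):
    print(describe(group), group)
```

Because only short snippets are compared, each similarity check is cheap, which reflects the speed requirement stated above. A production system would need far better term handling and labels than this sketch provides.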
3.2 Implicit feedback techniques
3.2.1 Implicit Feedback Based on the Current Context
Just-in-Time IR (JITIR), an implicit feedback method proposed by Rhodes, uses the context of the documents or websites the user is currently working with as feedback for search. To spare users extra effort, the system monitors their interaction with applications, such as a word processor or an Internet browser. The users' needs can then be identified automatically and useful documents retrieved. Many data sources, such as pre-indexed databases of files or e-mails, are used during retrieval. To improve search performance, the JITIR approach combines the alerting style of Google Alert with the activities inside the user's local working context. With alerting alone, information related to predefined sets of topics is pushed to users no matter what they are currently doing. With JITIR, the user's profile is dynamic and is kept up to date as the local working context changes. In this way, JITIR offers the current user the most up-to-date information.
One reason the JITIR method is so powerful is its server-side system, Savant, which is shared by the agents on the user side. Resources can therefore be reused by multiple users, which makes Savant very efficient at retrieving resources. As users interact with the search system, Savant provides the search service for all of the user-side agents. Public information, such as healthcare information or programming language tutorials, and personal information, such as e-mail and notes, is pre-indexed in the database, which helps later query processing. A relevance score is generated by comparing the information extracted from the currently open page with each pre-indexed item in turn.
Watson [6, 7] is a typical instance of the JITIR approach. To predict the information users want and provide it accurately, Watson monitors users' events and the documents they are reading or editing. It monitors different applications in different windows in parallel, such as Internet Explorer, Notepad, and the Microsoft Word processor. As the user works, relevant information is identified and a separate context is tracked for each open window. Based on what the system has learned, special queries are generated and a variety of resources are used to obtain more accurate search results.
Fig. 3.2.1 Users' activities are monitored and special queries are sent to special information sources according to the monitored information.
Watson draws on a number of sources, such as the search engines YAHOO! and DOGPILE, news sources like Reuters and the New York Times, blog sites, e-commerce sites, and so on. Application adapters extract specific information from the application context, as shown in Fig. 3.2.1. For instance, if the user edits content in a Notepad window, keyboard events trigger an updated representation of this action and its context. The information adapter then turns this representation into a query and submits it to a suitable information source. The current actions and events play a very important role in choosing which information source to query. For instance, if the user is browsing a healthcare website, the Watson agent chooses a search engine specialized in healthcare.
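The source selection step can be sketched as a simple lookup from the detected context to an information source. This is a hypothetical illustration of the idea only: the topic keywords, the source labels, and the function `choose_source` are invented for the example and do not describe Watson's actual adapters.

```python
from typing import Dict, Set

# Hypothetical topic vocabularies and source labels, for illustration only.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "healthcare": {"symptom", "treatment", "clinic"},
    "news": {"election", "headline"},
}
TOPIC_SOURCES: Dict[str, str] = {
    "healthcare": "healthcare search engine",
    "news": "news source",
}
DEFAULT_SOURCE = "general web search engine"


def choose_source(context_text: str) -> str:
    """Pick an information source from the text of the active window.

    The first topic whose vocabulary overlaps the context wins;
    otherwise the general-purpose source is used.
    """
    words = set(context_text.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return TOPIC_SOURCES[topic]
    return DEFAULT_SOURCE


print(choose_source("flu symptom and treatment options"))
print(choose_source("holiday photos from last summer"))
```

The design point is that the user never names the source: it follows from what the user is currently reading or editing.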
3.2.2 Implicit Feedback Based on Search Histories
Personal search history is an important type of personal information from which a system can learn users' interests and information needs and thus improve the search service. The query history is without doubt a very important source for gathering users' information needs. A search system can efficiently obtain and process the search history of its users without requiring additional applications on either the server side or the client side. For this reason, a search system can modify the search results according to the users' previous queries. Traditional identification based on IP address or last access time is not considered very accurate for customizing the search process, but with log-in forms and Web cookies the user and the related information can be identified. A good case in point: if the user submits a vague query such as "Java", it is not clear whether the user is looking for the programming language or the island of Java. The Web browser history can help identify which one is the target. If the user has recently searched for Python, a "Java" query is more likely to refer to the programming language.
Search approaches based on query history fall into two groups. Offline approaches preprocess the action history, for example by examining the documents the user has visited before. Online approaches capture data in real time and treat the user's most recent interactions as very important factors for customizing the query. Each approach has its own merits. The former can integrate more sophisticated algorithms because its time constraints are less urgent, while the latter can offer the most up-to-date suggestions.
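The "Java" example can be made concrete with a small sketch of history-based disambiguation. The sense vocabularies below are hand-written assumptions for illustration; a real system would have to learn such associations from data, and the function name `disambiguate` is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical sense vocabularies; a real system would learn these.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "java": {
        "programming language": {"python", "compiler", "jdk", "code"},
        "island": {"bali", "indonesia", "flights", "hotel"},
    }
}


def disambiguate(query: str, recent_queries: List[str]) -> str:
    """Pick the sense whose vocabulary overlaps most with recent queries.

    Returns "unknown" if the query has no listed senses or if no sense
    overlaps with the history at all.
    """
    senses = SENSES.get(query.lower())
    if not senses:
        return "unknown"
    history_terms = {w for q in recent_queries for w in q.lower().split()}
    best_sense, best_overlap = "unknown", 0
    for sense, vocabulary in senses.items():
        overlap = len(vocabulary & history_terms)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("Java", ["python list comprehension"]))
print(disambiguate("Java", ["cheap flights to bali"]))
```

Returning "unknown" when the history gives no evidence matters: in that case the system has no basis for personalization and should fall back to the unpersonalized ranking.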
Fig. 3.2.2. The search history and actions of a user are recorded by Google Personalized Search. This record is used as an implicit feedback resource.
Online implicit feedback approaches are covered in this paper, and several typical ones have been proposed. The Google implicit feedback system records all of a user's queries and the Web pages the user has clicked in the results, as shown in Fig. 3.2.2, and builds a model representing the user's needs. When a search begins, the results are ordered by feedback score: the more closely a result relates to information the user has queried before, the higher its score. The more data the retrieval record contains, the more relevant the results that can be generated. However, the details of the algorithm are not public, and only very limited information about it is available.
Misearch is another system using implicit feedback, put forward by Speretta and Gauch. To improve search performance, it creates a profile for each user by organizing the user's search history and clicked query results. These profiles are used to re-order the results retrieved by an external search system, giving more emphasis to documents relevant to the topics in the user's profile. In this approach, user profiles are recorded as weighted concept hierarchies. When a user clicks on search results, the query keywords and the titles and abstracts of the visited links are recorded. A classifier then identifies the most relevant of the collected data and marks it with higher scores. When the user enters a query and clicks search, a function calculates how similar each result abstract j is to the user profile i:
\[ sim(user_i, doc_j) = \sum_{k=1}^{N} wp_{i,k} \cdot wd_{j,k} \]
where $wp_{i,k}$ is the weight of the concept k in the user profile i, $wd_{j,k}$ is the weight of the concept k in the document j, and N is the number of concepts. The final weight of the document used for reordering - so that the results that best match the user's interests are ranked higher in the list - is calculated by combining this degree of similarity with GOOGLE's original rank, using the following weighting scheme:
\[ FinalRank = \alpha \cdot ConceptualRank + (1 - \alpha) \cdot KeywordRank \]
where ConceptualRank is the rank derived from the conceptual match, KeywordRank is the original rank assigned by GOOGLE, and $\alpha$ takes values between 0 and 1. When $\alpha$ is 0, the conceptual rank is not given any weight, and the result is equivalent to the original rank assigned by GOOGLE. If $\alpha$ has a value of 1, the search engine ranking is ignored and only the conceptual match is considered. The conceptual and search engine-based rankings can be blended in different proportions by varying the value of $\alpha$.
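The similarity measure and the blending with the original rank described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Misearch's implementation: it assumes the similarity is a sum of products of concept weights, and it assumes both ranks are 1-based list positions, so a smaller blended value means a better result. The names `similarity` and `rerank` and the sample weights are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]


def similarity(profile: Weights, document: Weights) -> float:
    """Sum over concepts of profile weight times document weight."""
    return sum(w * document.get(concept, 0.0) for concept, w in profile.items())


def rerank(profile: Weights, results: List[Tuple[str, Weights]], alpha: float) -> List[str]:
    """Blend the conceptual rank with the engine's original rank.

    `results` must be in the engine's original order. Ranks are 1-based
    positions (an assumption), so results are sorted by ascending
    blended value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    original_rank = {doc: i + 1 for i, (doc, _) in enumerate(results)}
    by_similarity = sorted(results, key=lambda r: similarity(profile, r[1]), reverse=True)
    conceptual_rank = {doc: i + 1 for i, (doc, _) in enumerate(by_similarity)}
    final = {
        doc: alpha * conceptual_rank[doc] + (1 - alpha) * original_rank[doc]
        for doc, _ in results
    }
    return sorted(final, key=lambda doc: final[doc])


# Hypothetical profile of a user mostly interested in programming
profile = {"programming": 0.8, "travel": 0.1}
results = [
    ("java-island", {"travel": 0.9}),
    ("java-docs", {"programming": 0.7}),
    ("java-coffee", {}),
]
print(rerank(profile, results, alpha=0.0))   # original order
print(rerank(profile, results, alpha=1.0))   # pure conceptual order
print(rerank(profile, results, alpha=0.75))  # blended
```

Setting alpha to 0 reproduces the engine's order and setting it to 1 ignores it, matching the two boundary cases described in the text.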
To verify the performance of user profiles built from queries and snippet abstracts, an investigation was carried out in which GOOGLE's original rank was compared with the conceptual rank based on the profile. Thirty queries and six participants were involved, and the rank of the search results improved by more than thirty percent. This suggests that a search engine can retrieve accurate results even from vague query words.
Misearch has two main advantages. First, no additional software or proxy servers are required. Second, highly accurate search results can be obtained from vague keywords. Moreover, the approach uses both the Google rank and the user's context to order the results, which contributes greatly to the accuracy of the retrieved results.
4. Comparison of the explicit and implicit feedback search techniques
4.1 Comparison between the explicit and implicit approaches
(1) Explicit feedback techniques
User shares more about query intent
User shares more about interests
Hard to express interests explicitly
(2) Implicit feedback techniques
Query context inferred
Users' profile inferred
Less accurate, needs lots of data
Both explicit and implicit feedback techniques have their own merits. The advantage of explicit feedback is that the information is straightforward: after the user selects a related query or chooses a category among the presented results, the search becomes much more accurate. The major advantage of implicit feedback, on the other hand, is that it requires no extra effort from the user. If implemented appropriately, it can be very effective and efficient in providing accurate results.
However, neither explicit nor implicit feedback techniques are perfect. The major drawback of explicit feedback is that it requires additional effort from users. There is too much information on the Web, and users often do not want to make an extra effort even if it improves search performance. Another shortcoming is that users might not be able to fully express what they are looking for. A user of a search engine with categorization might end up in the wrong category before knowing enough to select the correct one. The major disadvantage of implicit feedback, on the other hand, is that it can make incorrect inferences, especially when the data is insufficient or when the user's situation changes. Users' short-term information needs may well differ from their long-term interests. For example, a software developer who searches for "Java" is, in the long term, interested in the programming language. During the holidays, however, the same user may want to take a vacation and use the search engine to find travel information about the island of Java.
4.2 Current explicit and implicit feedback techniques are not optimal
The author thinks that the major issue with current explicit and implicit feedback techniques is that most approaches are applied uniformly to all users and queries. However, queries issued by different users at different times should not be handled in the same manner, for the following three reasons:
(1) Low effectiveness on specific queries. Not every query needs a feedback technique to improve performance. For instance, considering the user's profile can be very helpful if the user is querying "Java" and there is plenty of information about both the programming language and the island. In this situation, feedback techniques bring substantial benefit to the user. In contrast, feedback techniques are not very helpful if the user is querying "Microsoft". In this case, most users, perhaps all of them, will select the hit for the Microsoft homepage, so feedback techniques add little.
(2) Misleading results from feedback techniques based on the user profile. The user profile may steer the query results in the wrong direction, which harms the user experience. Sometimes users are looking for documents to satisfy short-term information needs that are inconsistent with their general interests. In such cases, long-term user profiles may be useless or even harmful to the query results, and the short-term query context may be more useful. For example, a software developer who submits the query "Java" may be seeking information not about the programming language but about the island of Java. If interest-based feedback techniques are applied in this case, the user may be confused by the many irrelevant results moved to the front.
(3) Different effectiveness for different users. It is hard to infer the interests of users who have done few searches, but much easier to provide precise results for a user who has done many searches.
5. Conclusions
The Web provides an extremely large and dynamic source of information, and the continuous creation and updating of Web pages magnifies information overload. Timely and accurate information plays a vital role for everyone pursuing personal and professional goals. Since implicit and explicit feedback search techniques are possible solutions to this problem, much emphasis has been put on this field recently. This paper has presented a comparison of implicit and explicit feedback search techniques on the Web, concentrating on several of the most interesting and promising techniques and approaches.
Both implicit and explicit feedback search techniques have great potential for improving the search experience, and even more so if used together. For the majority of queries, however, feedback in any form is unnecessary: the search engine has essentially replaced URLs and bookmarks for navigating to websites, and feedback techniques do nothing for these navigational queries. If the user does not end the search session with the first click, real-time implicit feedback should begin working with the user to help find what is needed. At the same time, explicit feedback options should be made available to help the search further. Different searches will require different techniques, and the demand for access to all of these techniques will increase as the volume of Web content grows.
Another point worth mentioning is that implicit and explicit feedback search techniques should give users the right to choose one type or the other according to the usage context. In sum, there should be more flexibility for users, whether they are using explicit or implicit feedback. Users should be put in the driver's seat and given control of the system's behavior by managing which implicit and explicit feedback techniques are applied to their results, for example through a control bar that reduces the effect of these techniques.