Today the Web is a vital collection of information sources that influences our daily lives and business transactions. With so much information available, how do we find what we are looking for? The main tools for searching and retrieving information on the Web are search engines, and knowing which search engine to choose plays an important role in getting useful information. This paper discusses the key features of three kinds of search engines. Web Search Engines search for information on the Web and on FTP servers and present the results as lists of items, often called hits. Selection-based search engines let the user raise a search query using only the mouse, so that the user can search the Internet for more information about any keyword or phrase contained in a document or Web page. Meta-search engines send user requests to several other search engines and/or databases and either aggregate the results into a single list or display them according to their source. We also discuss the strengths and weaknesses of these search engines.
Web Search Engine, Meta Search Engine, Selection-based Search Engine, Search Engine, Metacrawler, Crawler, Metatags
In the information era, the Internet is the most significant source of information. If we want to find or learn something on the Internet, we first go to pages or links that offer search engine features, which means that we do not have to know the exact address of the information source we are looking for. Nowadays we can choose among various kinds of search engines according to our information needs and convenience. In this paper, we discuss the Web Search Engine, the Meta Search Engine and the Selection-based Search Engine.
Web Search Engine
A Web Search Engine is a search engine for finding information and resources on the Internet. When we submit a query, the search engine goes through its own database of indexed information and returns the relevant content. The content can take various forms, such as articles, blogs, images and videos, and the results, called hits, are presented in a ranked order computed by computer programs.
Most search engines maintain huge indexes that can retrieve a large amount of information from different portions of the Internet. Well-designed search engine software helps us get more precise results for what we are looking for.
Many good Web pages and search engines appear every day. However, search results still include lists of irrelevant Web pages, which can mislead users and cause misunderstanding and misuse.
Background of Web Search Engine
The first successful "full text" Web Search Engine was WebCrawler, which was launched in 1994. Lycos, Infoseek and OpenText appeared as its competitors within a year. Alta Vista and Excite appeared in late 1995. HotBot and Northern Light came along in 1996 and 1997, Google followed in 1998, and many other search engines have appeared since. Nowadays there are many kinds of search engines to choose from, but we cannot depend on a Web Search Engine for the heavy-duty searching technology that is found in online services such as DIALOG and LEXIS-NEXIS.
Structure of Web Search Engine
Nowadays, most search engines provide portal features. The idea behind portal features is that a particular Web Search Engine has a home page, the so-called Gateway, that provides the most frequently needed information, tools and links. This saves users the time and effort of looking in many different sources and places.
A search engine can be divided into five functional parts:
Crawler: It is also known as a spider. Web crawlers traverse the Internet, produce a copy of every visited page for later search queries, and create an index to speed up search responses. Crawlers send the content information of the pages to the search engine database. Most search engines crawl more deeply and more frequently the sites that are more popular or more often accessed by users.
Database: The database is a collection of information about the sites or pages that have been identified by crawlers. The information comes not only from crawlers but also from Web page publishers. Most search engines offer a feature for adding our page contents or URL to their database. For example, we can add our home page URL to the Google search engine's database at "http://www.google.com/addurl/?continue=/addurl". The Google crawler is able to find the directories of pages, so we only need to add our home page or top-level page, and the remaining lower-level pages will be added automatically by the Google crawler.
Indexing program: It is an important function that can determine the performance of a search engine. The indexing program creates indexes of the information in Web pages using an appropriate structure and method, and these indexes are stored in a database. When you enter a query, the indexing program goes through its database to find the appropriate index records.
Metatags are useful information for the indexer. Metatags can be words, phrases or sentences that are placed behind the Web page, in its source. We cannot see Metatags when we view a page, but we can see them by choosing the "view source" menu in the browser. The indexing programs of most search engines try to index all of the keywords on every page.
A simply designed indexing program helps to get a fast response when retrieving data. Indexing also helps to reduce the overall workload of the search engine, as the index gives a quick link to the data.
Indexes take up space in the database and may affect database speed, but this can be mitigated by applying some rules and limitations.
Figure1: Metatags from NTU home page
Retrieval engine: It is a program that mediates between user queries and the database indexes to deliver results to the user. When the retrieval engine receives a query, it identifies the matching records using its matching algorithm. The retrieved information is then sorted according to term weights before the result page is displayed to the user.
Graphical interface: It acts as an interactive layer between the user and the search engine. The interface receives a query from the user, passes it to the search engine for retrieval, then gathers the results from the search engine and presents them to the user. A well-designed interface is important because it is the only part of the search engine that interacts with the user. Even if the database is very powerful, it is difficult to attract users when the interface is not well designed. In practice, the interface provides various portal features, such as media, advertisements and entertainment, to attract users.
Figure2: Web Search Engine and its portal features (Yahoo)
1. Search Option
2. Query Box
3. Member Sign In
5. Entertainment and Portal Features
6. Video News
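To make the roles of the indexing program and the retrieval engine more concrete, the following is a minimal sketch in Python. It assumes that the crawler has already delivered the page text, and it ranks pages by the summed frequency of the query terms, which is only one simple weighting chosen for illustration. The page URLs, the function names and the scoring rule are our assumptions and do not describe any particular search engine.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexing program: map each term to {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index


def search(index, query):
    """Retrieval engine: rank URLs by the summed frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; the URL breaks ties so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked]


# Hypothetical pages standing in for what a crawler would store in the database.
pages = {
    "a.example/home": "search engines index web pages",
    "b.example/news": "web news and web video",
    "c.example/faq": "crawler visits pages",
}
index = build_index(pages)
results = search(index, "web pages")
```

In this sketch the index is built once and every query only touches the entries of its own terms, which is the reason an index can reduce the workload of the search engine.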
Meta Search Engine
A Meta Search Engine is a kind of collaboration tool that collects user queries and transmits them to several search engines. It merges the results collected from those search engines into a single ranked list and displays them to the user in a single unified interface. A Meta Search Engine does not have its own database; it relies on various component search engines, so sending each user query to every component takes time. To save time and to get the most relevant results, a Meta Search Engine provides database selection. For database selection, it stores the most useful summary information about each search engine. When a user query is received, the database selector matches the query against this previously stored information and then decides which search engines are most useful for conducting the query.
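Database selection as described above can be sketched as follows. This is a minimal illustration under our own assumptions: each search engine is summarized by a small set of keywords, and the selector simply counts how many query terms appear in each summary. The engine names, the keyword profiles and the `limit` parameter are hypothetical.

```python
def select_engines(query, engine_profiles, limit=2):
    """Pick the engines whose stored keyword profile overlaps the query most."""
    query_terms = set(query.lower().split())
    scored = []
    for name, keywords in engine_profiles.items():
        overlap = len(query_terms & keywords)
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; the engine name breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical summary information stored for each component search engine.
engine_profiles = {
    "news_engine": {"news", "politics", "sports"},
    "image_engine": {"image", "photo", "picture"},
    "academic_engine": {"paper", "retrieval", "information", "research"},
}
chosen = select_engines("information retrieval paper", engine_profiles)
```

Only the selected engines then receive the query, so the time spent waiting for unsuitable engines is avoided.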
Structure of Meta Search Engine
A Meta Search Engine contains five components: User Interface, Database Selector, Document Selector, Query Dispatcher and Result Merger.
User Interface: The user types the query in the user interface. It can also support interactive query modification and manages the display of query results based on the structure of the Meta Search Engine.
Database Selector: It tries to choose the search engines that are most relevant to the user query, so that precise information can be returned.
Document Selector: Based on the database selector's outcome, it selects the appropriate documents.
Query Dispatcher: The query dispatcher connects to the search engines and transfers the query using HTTP (Hypertext Transfer Protocol).
Result Merger: The result merger combines the results from the different search engines and shows them in the user interface as a single ranked list.
Figure3: Architecture of Meta Search Engine
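The interaction between the Query Dispatcher and the Result Merger can be sketched as follows. In this minimal sketch the component search engines are plain Python functions instead of HTTP services, and the merger uses reciprocal-rank scoring, which we chose only for illustration and which is not claimed to be the method of any real Meta Search Engine. All engine names, URLs and the time-out value are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def dispatch(query, engines, timeout):
    """Send the query to all engines in parallel and drop engines that are too slow."""
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on engines that missed the time-out
    return {futures[f]: f.result() for f in done if f.exception() is None}


def merge(result_lists):
    """Fuse the ranked lists into one ranked list, listing each URL only once."""
    scores = {}
    for ranked in result_lists.values():
        for position, url in enumerate(ranked, start=1):
            # A URL gains 1/position from every engine that returned it.
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical component engines; a real dispatcher would issue HTTP requests.
def engine_a(query):
    return ["x.example", "y.example", "z.example"]


def engine_b(query):
    return ["y.example", "w.example"]


def engine_slow(query):
    time.sleep(1.5)  # simulates an engine that does not answer in time
    return ["late.example"]


engines = {"engine_a": engine_a, "engine_b": engine_b, "engine_slow": engine_slow}
responses = dispatch("information retrieval", engines, timeout=0.5)
merged = merge(responses)
```

Because the scores of a URL are summed over all engines, a document returned by several engines moves up the merged list and appears only once.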
Features of Meta Search Engines
According to Randolph Hock, Meta Search Engines differ in their features. However, most Meta Search Engines have the following common features:
Automatic or User-controlled
Field (Date, Title, URL, Domain, etc.)
Search Form (Query box, Date range box, Language window)
There are many Meta Search Engines on the Web. In this paper, we focus on Metacrawler to describe the features of Meta Search Engines.
MetaCrawler was launched in 1994. It was developed by graduate student Erik Selberg and Associate Professor Oren Etzioni at the University of Washington.
The search categories included in Metacrawler are:
Web: uses search providers such as Google, Yahoo! Search, MSN Search, Ask Jeeves, About, MIVA, LookSmart, etc.
Image: image results are collected from Yahoo! Search and Ditto.
Videos: related video files are provided by Yahoo! Search and Singingfish.
News: the ABC News, Yahoo! News, FoxNews and Topix providers are used to find news.
Yellow Pages: the Yellow book provider is used to find business listings.
White Pages: the Acxiom provider is used to find information about people.
The search results returned by Metacrawler combine the top commercial (sponsored advertising) and non-commercial (algorithmic) results. These results come from the most useful search engines on the World Wide Web. Metacrawler's technique combines the top documents from different search engines depending on the user query, so the combined results for a search query are based on the user's requirements. The results are generally arranged in order of relevance, sponsored results are displayed with a Sponsored Link label, and the source of each result is shown. For example, when we search for "information retrieval system" in Metacrawler, the results cover different areas; among the non-commercial Web pages, the results are weighted towards a more academic view.
Advantages of Meta Search Engine
Unlike a single search engine, a Meta Search Engine gives high performance because it sends the user query to different search engines simultaneously. In addition, a Meta Search Engine saves time because it contains a time-out mechanism that checks the response time of each search engine against a fixed period and ignores the search engines that do not respond in time. Because a huge number of documents is available on the Web, a single search engine cannot find all the relevant results; a Meta Search Engine can widen the coverage. Moreover, a Meta Search Engine does not show duplicated results. According to Javed A. Aslam and Mark Montague, it also has the following advantages:
Improved Recall: Recall is the ratio of retrieved relevant documents to total relevant documents.
Improved Precision: Precision is the ratio of retrieved relevant documents to retrieved documents.
Consistency: Current Web Search Engines often respond to the same query very differently over time. However, the fusing (averaging) procedure in a Meta Search Engine can solve this problem.
Modular Architecture: The meta search technique allows a user query to be decomposed into smaller parts (modules) that are then executed in parallel and in collaboration.
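The definitions of recall and precision given above can be checked with a small worked example. The document identifiers and the two result lists below are invented for illustration only; they are not measurements of any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result list and one set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved relevant documents
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four relevant documents exist for the query.
relevant = {"d1", "d2", "d3", "d4"}
single_engine = ["d1", "d5"]             # results of one search engine
merged_list = ["d1", "d2", "d3", "d5"]   # fused results of several engines

single_scores = precision_recall(single_engine, relevant)
merged_scores = precision_recall(merged_list, relevant)
```

In this invented case the single engine reaches a precision of 0.5 and a recall of 0.25, while the merged list reaches 0.75 for both, because the other engines contribute relevant documents that the single engine missed.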
Disadvantages of Meta Search Engine
A Meta Search Engine does not have its own database and sends the user query to the databases of other search engines. It can therefore sometimes choose the wrong databases for the query. Another problem is associated with document selection: because the search engines hold many documents from which the suitable results must be determined, the computation and communication costs of a Meta Search Engine can be high. Some Meta Search Engines cannot support the larger search engines. Most Meta Search Engines retrieve only 10 to 20 records per search engine, so they can miss the most relevant documents if these fall outside that range. Some Meta Search Engines allow searching by Title or URL, but others do not support these methods. Moreover, sometimes no results are found even when the user uses simple syntax. In some cases, a Meta Search Engine shows paid documents first.
Selection-based Search Engine
A Selection-based Search Engine is a tool that allows the user to search for information easily across various sources and Web services. The user raises a search query using only the mouse, and the engine helps the user find more information on the Internet about text within a document or Web page, instead of opening a Web browser, navigating to a search engine, keying in keywords or phrases and clicking hyperlinks. In Selection-based search, the user just selects text in the context of the current application, such as Internet Explorer, and can then view information in pop-up boxes without moving to any other location or link.
The search engine does not compile databases and Web links on the user's computer; it passes the request to online cloud services, that is, Internet-based computing services for sharing information and resources. The results are compiled and presented in a uniform way using a specific algorithm. They can be presented in various ways, such as new windows, tabs or floating boxes. This feature saves the user the time and effort of going through many Web pages and applications to find information.
The main difficulty for Selection-based search is that the many categories of online services become unmanageable for the user. Selection-based search systems therefore need to categorize the text selected by the user and find the online service that is most suitable for it. For instance, when the user selects the title of a book, the system identifies an online book store as the most appropriate service for the selection.
"Internet Explorer 8 Accelerators", "GoFor-It" and "Kallout" are the most significant tools for Selection-based search. "GoFor-It" provides more features than "Internet Explorer 8 Accelerators" and is easier to use. However, both tools are available only in Internet Explorer. "Kallout" is more convenient for Selection-based search because it supports Internet Explorer, Firefox, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Microsoft Outlook and Adobe PDF.
Figure4: Internet Explorer 8 Accelerators (Selection-based search)
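The categorization step described in this section, in which the system maps the selected text to a suitable online service, can be sketched as follows. This is only a rough illustration under our own assumptions: the categories, the pattern rules and the service names are hypothetical, and the sketch recognizes an ISBN instead of a book title because a title cannot be detected by a simple pattern. It does not describe how Internet Explorer 8 Accelerators, GoFor-It or Kallout actually work.

```python
import re

# Hypothetical mapping from a category of selected text to an online service.
SERVICES = {
    "url": "page_preview",
    "isbn": "book_store",
    "email": "mail_client",
    "text": "web_search",
}


def categorize(selection):
    """Guess the category of the selected text with simple pattern rules."""
    text = selection.strip()
    if re.fullmatch(r"https?://\S+", text):
        return "url"
    # Remove hyphens and spaces, then look for a 10- or 13-character ISBN.
    compact = re.sub(r"[-\s]", "", text)
    if re.fullmatch(r"\d{9}[\dXx]|\d{13}", compact):
        return "isbn"
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text):
        return "email"
    return "text"  # fall back to an ordinary Web search


def route(selection):
    """Return the service that should handle the selected text."""
    return SERVICES[categorize(selection)]


book_service = route("978-3-16-148410-0")
default_service = route("meta search engine")
```

The fallback category matters in practice: any selection that matches no rule is still sent to an ordinary Web search, so the user always receives some result.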
To conclude, based on the results of our evaluation, we have recognized the usefulness of search engines and have gained knowledge of their structures and features. Although search engines let us find whatever information we need on the Web, we should have a general knowledge of how they work. We believe that after reading this paper, the reader will know the main features of the search engines we evaluated and will better understand how their searching strategies work.
List of References:
José A. Olivas. (2008). Fuzzy Sets and Their Extensions: Representation, Aggregation and Models. Studies in Fuzziness and Soft Computing, 220, 537-552. doi : 10.1007/978-3540-73723-0_27
Randolph Hock. (2001). The Extreme Searcher's Guide to Web Search Engines (2nd ed.). United States of America, US: Thomas H. Hogan, Sr.
Randolph Hock. (2010). The Extreme Searcher's Guide to Web Search Engines (2nd ed.). United States of America, US: Thomas H. Hogan, Sr.
List of Web References:
Chris Smith. (2006) Introduction to Database Indexes - Web Design Articles and Tutorials
Retrieved 10/October/2010 from:
Web Search Engine - Wikipedia, the free encyclopedia
Retrieved 15/October/2010 from:
Search Categories - MetaCrawler
Retrieved 20/October/2010 from: http://www.metacrawler.com/metacrawler/ws/categories/_iceUrlFlag=11?_IceUrl=true
Nanyang Technological University, Singapore - Global University of Excellence (Source Page)
Retrieved 28/October/2010 from:
About NTU: President's Message
Retrieved 28/October/2010 from:
GoFor-It Download: http://gofor-it.com/downloads.html
Kallout Download: http://www.kallout.com/index.html