The efficiency of some search engines

Abstract

This paper evaluates the efficiency of several search engines in handling Arabic keywords. Twenty Arabic keywords, each together with its root, were entered simultaneously into four search engines: Google, Yahoo, Al-hoodhood and Ayna. The engines were compared on the number of retrieved pages, retrieval time, and stability (in both the number of retrieved pages and the order of each retrieved page). According to the results obtained over 10 weeks of experiments, Google was the best of the four search engines.

Keywords: Search engines, World Wide Web, Internet, Information retrieval

1. Introduction

Arabic is increasingly used by Internet users despite many significant problems. First-time users face many difficulties when trying to read Arabic web sites. In large part, these difficulties arise from the representation of Arabic in multiple character sets and from the characteristics of the Arabic script itself [1].

With the extremely rapid growth of the Internet and the spread of textual information on the web in a whole host of languages other than English, retrieval of documents in these languages is becoming increasingly problematic. Rules, theories, algorithms, and retrieval methods designed and developed for English and other morphologically similar languages may not necessarily apply in different linguistic environments. For languages that differ fundamentally from English in morphology and word-formation rules, the problem tends to be even more pressing. As the essence of written and spoken information queries, words are by far the most important elements of expression and are the building blocks of meaningful information exchange [2].

Initially, most of the available electronic databases were in English. Search and retrieval software, indexing methods, and user interfaces were designed specifically for this language. Since English is no longer the sole language used on the Internet, Information Retrieval (IR) systems have been developed for languages other than English. In tandem, search engines have been progressively modified to handle these languages [3][4][5]. Traditional IR environments and popular search engines face real challenges when Arabic is the language used, owing to the radical differences in morphology and word-formation rules between Arabic and English. Arabic word formation is based on a root-and-pattern system that has long been thought to be a major factor hindering IR operations. Finding all possible words that share a common Arabic root might not necessarily lead to better IR performance. Despite the use of advanced word stemming and root extraction algorithms in the Arabic IR field, many questions remain unanswered [6].

This paper investigates the handling of Arabic words in English and Arabic search engines. The retrieval environment is represented by Google, Yahoo, Al-hoodhood, and Ayna. The paper also presents specific approaches to assessing stemming and root-based retrieval methods in order to accommodate the peculiarities of Arabic word-formation rules within the framework of this environment. Section 2 briefly presents information retrieval. Search engines are described in section 3, and the implementation is dealt with in section 4. The experimental design and results are discussed in section 5, while section 6 gives the conclusions of this work. Finally, some suggestions for future work are highlighted in section 7.

2. Information Retrieval (IR)

Up until the 1990s, the efforts of specialists in Arabic computing concentrated on presenting the language in a computer environment and finding solutions for display and coding problems. In the early 1990s, interest in Arabic IR became visible, and research was conducted on the automation of Arabic online library catalogs and on IR issues [6]. IR involves many strategies, each of which comes with its own features that can be used to retrieve information efficiently. Boolean Search, Serial Search, and Cluster-Based Retrieval are among these strategies [7].

Compared to English, redundancy in Arabic was assumed to be higher, because Arabic words are derived from roots according to certain patterns and fixed rules, with the addition of suffixes, prefixes and infixes [3]. Comparison with results from research on English likewise found that Arabic has greater redundancy and that the average word length is greater in Arabic than in English, making Arabic potentially more compressible than English [6].

Root indexing was used to index Arabic documents because it increases recall and circumvents the complex problems created by Arabic morphology. A root index term retrieves all variations of that root and removes the need for complex queries while searching [8].
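The idea of a root index can be illustrated with a minimal sketch. It assumes a small hand-written word-to-root table (`WORD_TO_ROOT`, hypothetical) in place of a real root extraction algorithm, and the documents are invented for illustration; it is not the indexing method of any of the search engines studied here.

```python
from collections import defaultdict

# Hypothetical hand-written table; a real system would derive roots
# with a root extraction algorithm rather than a lookup.
WORD_TO_ROOT = {
    "كتاب": "كتب",   # book
    "كاتب": "كتب",   # writer
    "مكتبة": "كتب",  # library
    "درس": "درس",    # lesson
    "مدرسة": "درس",  # school
}


def build_root_index(documents):
    """Map each root to the ids of documents containing a word derived from it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            root = WORD_TO_ROOT.get(word)
            if root is not None:
                index[root].add(doc_id)
    return index


def search_by_root(index, root):
    """Return the sorted ids of all documents indexed under the given root."""
    return sorted(index.get(root, set()))


# Invented sample documents (assumption, for illustration only).
documents = {
    1: "كتاب جديد",
    2: "مكتبة المدينة",
    3: "مدرسة كبيرة",
}
root_index = build_root_index(documents)
```

Searching for the single root كتب returns documents 1 and 2 in this sketch, although they contain different surface words, which is the recall gain described above.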

3. Search Engines

Search engine technology has to advance considerably in order to keep pace with the growth of the web. This growth includes the increasing number of web pages, documents and web queries posted on the Internet [9] [10].

It is not easy to evaluate an information retrieval system in the World Wide Web (WWW) environment. The difficulty originates from the lack of standard test data and from the highly subjective nature of the relevance of retrieved WWW pages to the user's information needs [11].

Precision is always reported in formal information retrieval experiments. However, there are variations in the way it is calculated, depending on how relevance judgments are made [12]. Search engine stability problems were investigated in several studies performed by Bar-Ilan, and several measures for evaluating search engine functionality over time were outlined in these studies [12]. Bar-Ilan's measures are based on the concept of technical relevance, under which a document is defined to be technically relevant if it fulfils all the conditions posed by the query [13]. To keep search engines up to date, a tool generally called a spider is used. Spiders crawl hundreds of thousands of pages a day. To find information independently, many spiders also follow the links on a page; hence a spider can index a web site even if that web site was never submitted to the search engine [14].

A search engine such as Google is designed to avoid disk seeks whenever possible, and this has a substantial effect on the design of its data structures [15] [16]. In Google, several distributed crawlers do the web crawling (the downloading of web pages). Web crawling is the backbone of the search engine. A URL (Uniform Resource Locator) server sends lists of URLs to the crawlers to be fetched. Fetched web pages are then sent to the store server, which compresses the web pages and stores them in a repository [17].

Some search engines may use only a small database from which to create a set of results for the user (Yahoo, for example, indexes only a very small proportion compared to the billion pages indexed by Google), or they may not be updated particularly quickly (All the Web is updated every fortnight or so, while Google is updated monthly). Their spider programs may not be very fast, which means that their results might not truly reflect the current state of the Internet [8] [18].
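As a rough illustration of the technical relevance criterion mentioned above, the following sketch treats a query as a list of terms that must all appear in a document, and computes precision as the share of retrieved documents that pass this check. The function names, the whitespace tokenisation and the sample data are assumptions made for the example; they are not Bar-Ilan's actual measures.

```python
def is_technically_relevant(document_text, query_terms):
    """A document is technically relevant if it contains every query term."""
    words = set(document_text.lower().split())
    return all(term.lower() in words for term in query_terms)


def precision(retrieved_docs, query_terms):
    """Fraction of retrieved documents that are technically relevant."""
    if not retrieved_docs:
        return 0.0
    relevant = sum(
        is_technically_relevant(doc, query_terms) for doc in retrieved_docs
    )
    return relevant / len(retrieved_docs)


# Invented sample result list (assumption, for illustration only).
retrieved = [
    "arabic search engine",
    "search tips",
    "arabic engine search results",
    "arabic poetry",
]
score = precision(retrieved, ["arabic", "search"])
```

With a different relevance judgment (for example, a human assessor instead of a term check), the same result list could yield a different precision, which is the variation noted above.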
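The flow from URL server to crawlers to store server can be sketched as follows. This is a toy, single-process illustration under stated assumptions: the web is simulated by an in-memory dictionary (`FAKE_WEB`), the names `url_server`, `crawler` and `StoreServer` are hypothetical, and `zlib` is used only as a convenient stand-in for whatever compression the real system applies.

```python
import zlib

# Simulated web (assumption): no network access is performed.
FAKE_WEB = {
    "http://example.com/a": "<html>page A</html>",
    "http://example.com/b": "<html>page B</html>",
    "http://example.com/c": "<html>page C</html>",
}


def url_server(urls, batch_size):
    """Hand out the URL list to crawlers in batches."""
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


def crawler(url_batch):
    """Fetch each URL in the batch; unknown URLs are skipped."""
    return [(url, FAKE_WEB[url]) for url in url_batch if url in FAKE_WEB]


class StoreServer:
    """Compress fetched pages and keep them in a repository."""

    def __init__(self):
        self.repository = {}

    def store(self, url, html):
        self.repository[url] = zlib.compress(html.encode("utf-8"))

    def load(self, url):
        return zlib.decompress(self.repository[url]).decode("utf-8")


store_server = StoreServer()
for batch in url_server(sorted(FAKE_WEB), batch_size=2):
    for url, html in crawler(batch):
        store_server.store(url, html)
```

In a real deployment the crawlers would run in parallel on separate machines; the sequential loop here only shows the order in which data moves between the three components.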

4. Implementation

In this study, four search engines were selected to allow a sound comparison for Arabic keywords. Of the search engines selected, two are general search engines (Google [19][20][21][22] and Yahoo [23]), while the other two are Arabic-language search engines (Al-hoodhood [24] and Ayna [25]) that employ stemming and root indexing. These search engines were chosen because they are widely used. The test consisted of using these search engines to search for a specified word, searching for the same word by its root, and then evaluating the stability of each search engine in terms of the number of retrieved pages and the order of each one. The search was designed to compare the performance of Google with that of Yahoo, Al-hoodhood and Ayna, and to evaluate stemming as an alternative to root retrieval. Experiments were conducted using a computer with a 1.7 GHz processor, 256 MB of RAM, and the Windows XP operating system.

5. Results and Discussion

This study was conducted in two phases. The first phase was intended to determine the speed of loading results. In this phase, twenty different Arabic words were selected (each with its root), and each word was entered as input into the four selected search engines simultaneously. A record was kept of the total number of pages returned and the retrieval time. Table (1) shows the 20 selected words, the number of results from each search engine, and the time each engine spent searching for and retrieving the results.

This process was repeated for the roots of the selected words (as shown in table 2). The purpose of this phase was to compare the selected search engines in terms of the number of retrieved pages and retrieval time. To achieve this, the total number of retrieved pages (Total-Pages) was calculated by summing the number of retrieved pages over all the entered search keywords. Similarly, the total retrieval time (Total-Time) was calculated by summing the time required to retrieve the results for each keyword. Total-Pages was then divided by Total-Time, and the resulting values were ranked for the four search engines to determine which of them retrieves fastest (the first is faster than the second, and so forth).

In the second phase, five of the 20 selected words were taken together with their roots. Each of the five words and its root were entered simultaneously into the four search engines, and the process was repeated weekly for ten weeks. The first twenty retrieved pages were recorded for every week of the ten-week period. Retrieval time was omitted in this part of the study, as the aim of this phase was to compare the search engines in terms of the stability of their results. Table (3) illustrates the stability of each search engine in terms of the number of retrieved web pages for each of the five selected words, whereas tables (4 and 5) and figures (1 and 2) show the stability of each search engine in terms of the order of the retrieved web pages for each of these words. The figures in tables (4 and 5) were calculated by taking the first twenty pages retrieved in the first week as a baseline for assessing how stable the search engine was in retrieving the same web pages in the following weeks. For example, as table 4 shows, in the second week Google retrieved eleven of the twenty pages that were retrieved in the first week, while Yahoo retrieved only four; Al-hoodhood retrieved 20 and Ayna retrieved 13 for the same week. These results underline two points: first, Al-hoodhood and Ayna are more stable than Google and Yahoo; second, Google and Yahoo are more flexible in updating their databases (by adding new pages on the same subject).
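The Total-Pages / Total-Time calculation described above can be sketched as follows. The engine names and figures are placeholders invented for the example, not the measurements from tables 1 and 2, and the sketch assumes that a higher pages-per-second value ranks first.

```python
def rank_by_throughput(measurements):
    """Rank engines by Total-Pages / Total-Time, highest ratio first.

    `measurements` maps an engine name to a list of
    (retrieved_pages, retrieval_time_seconds) pairs, one per keyword.
    """
    scores = {}
    for engine, rows in measurements.items():
        total_pages = sum(pages for pages, _ in rows)
        total_time = sum(seconds for _, seconds in rows)
        scores[engine] = total_pages / total_time
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder data (assumption): two hypothetical engines, two keywords each.
sample = {
    "engine_a": [(1000, 0.5), (3000, 0.5)],
    "engine_b": [(200, 1.0), (300, 1.0)],
}
ranking = rank_by_throughput(sample)
```

Summing before dividing weights every keyword by its own page count and time, which differs from averaging the per-keyword ratios; the sketch follows the summing procedure stated in the text.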
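The week-over-week comparison against the first week's results can be expressed as a small set-overlap computation. The sketch below is one possible reading of the procedure, with invented page identifiers and a reduced list length so that the example stays short; the study itself used the first twenty results.

```python
def weekly_overlap(weekly_results, top_n=20):
    """Count, for each later week, how many of the first week's top-N pages
    reappear in that week's top-N results."""
    baseline = set(weekly_results[0][:top_n])
    return [len(baseline & set(week[:top_n])) for week in weekly_results[1:]]


# Invented result lists (assumption): three weeks, top 3 pages per week.
weeks = [
    ["a", "b", "c"],  # week 1, used as the baseline
    ["a", "c", "d"],  # week 2 keeps two of the baseline pages
    ["x", "y", "z"],  # week 3 keeps none
]
overlap = weekly_overlap(weeks, top_n=3)
```

A count equal to `top_n` in every week indicates a result list that never changed, while low counts indicate frequent replacement of pages. This measure ignores the position of each page within the list, so it captures membership stability only.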

6. Conclusion

Tables 1 and 2 were analyzed by summing the results of each search engine and dividing the sum by the total retrieval time. It is concluded that Google is the best search engine in dealing with Arabic keywords. Yahoo is second, Ayna third, and Al-hoodhood last. The results show that Google is faster and can retrieve a larger number of results than the others; they also show that although there are search engines specialized in Arabic keywords, these still have limited abilities in comparison with the general-purpose ones (Google and Yahoo).

Analysis of table 3 revealed that Google is the best search engine when it comes to combining dynamic updating of web pages with stability in dealing with Arabic keywords. Yahoo comes second behind Google, followed by Ayna in third place, while Al-hoodhood is last (no update occurred in Al-hoodhood during the search period). These results show that Google is capable of updating its database dynamically in a short time compared with the other search engines. Similarly, Al-hoodhood is the slowest to update.

The conclusions drawn from analyzing tables 4 and 5 are consistent with the above and indicate that Google is the best search engine at retrieving the same results from week to week while dynamically updating web pages in dealing with Arabic keywords. Again, Yahoo follows Google in second place, Ayna comes third, and Al-hoodhood is fourth (no update occurred in Al-hoodhood during the search period).

7. Future Work

Research ideas are plentiful in the rich environment of web search engines. Many issues need to be considered when attempting to define new methods of searching the web in a more meaningful way. Recommendations for addressing present and future issues in developing web search are:

  1. Design a smart algorithm to decide which old web pages should be re-crawled and which new ones should be crawled.
  2. Develop a metasearch engine that improves the efficiency of web searches by downloading and analyzing each document and then displaying results that show the query terms in context. This helps users decide more easily on the relevance of a document without having to download each page.
  3. To address Arabic-language problems, ensure that Unicode, which is just one of several possible encodings, can be handled.
  4. Consider how the system handles simultaneous searching and database updating/indexing in real time. Most current web search systems use some very limited "parallel processing" techniques and replication technology to handle performance and scalability issues.
  5. Support query refinement.
  6. Add more search engines and use additional samples in the experiments.

References

  1. Sanan, M., Rammal, M., Zreik, K., Internet Arabic Search Engines Studies, 3rd International Conference on Information and Communication Technologies: from Theory to Applications. ICTTA, Damascus, 2008, pp. 1-8
  2. BCS British Computer Society, The BCS Glossary of ICT and Computing Terms, Pearson Education, UK, 2005.
  3. Moukdad, H., Stemming and root-based approaches to the retrieval of Arabic documents on the Web, Webology, 2006, Article 22.
  4. Douglas Comer, Computer Networks and Internets with Internet Applications, Prentice Hall International, Inc., USA, 2008.
  5. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, UK, 2008.
  6. Saba Abdul Khaliq Al-Khadady, Internet and Arabic Search Engines, M.Sc. Thesis, Iraq, 2002.
  7. Van Rijsbergen, Information Retrieval, Butterworth, London, 1979.
  8. Khalid Shaker Jassim, Comparison of Efficiency of Some Search Engines on the Internet, M.Sc. Thesis, Iraq, 2005.
  9. Peter R. Monge and Noshir S. Contractor, Theories of Communication Networks, Oxford University Press, Inc., UK, 2003.
  10. Stott D. and Moran D., Information and Communication, Springer, London, 2000.
  11. Massimo Marchiori, The Quest for Correct Information on the Web: Hyper Search Engines, Department of Pure and Applied Mathematics, University of Padova, Italy, 2000.
  12. Bar-Ilan J., Evaluating the stability of the search tools Hotbot and Snap: a case study, Online Information Review, Emerald, Bradford, UK, 2000, pp. 439-450.
  13. Mike Thelwall, The Responsiveness of Search Engine Indexes, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, 2001.
  14. Mark Levene, An Introduction to Search Engines and Web Navigation, Pearson Education, UK, 2006.
  15. Danny Sullivan, Search Engine Features for Webmasters [online] available from [5 Dec 2002].
  16. Danny Sullivan, How Search Engines Work [online] available from [14 Mar 2007].
  17. Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Science Department, Stanford University, 1998.
  18. Multi-search Engines - a comparison [online] available from [2003].
  19. Google [online] available from .
  20. All About Google [online] available from http://www.google.com/about.html.
  21. Google Help Central [online] available from http://www.google.com.au/help.
  22. Danny Sullivan, Major Search Engines and Directories [online] available from [28 Mar 2007].
  23. Linda Barlow, A Helpful Guide to Search Engines [online] available from [5 Nov 2004].
  24. http://www.alhoodhood.com/about.html.
  25. http://www.aynacorp.com/About/6.html.