Analysis Of Search Engine Algorithms
Abstract: A program in a computer can be viewed as an elaborate algorithm; the term algorithm usually means a small procedure that solves a recurrent problem. A search engine has many important and indispensable parts, but the search algorithm is the key that allows those parts to operate normally, and searches are built on top of them. The way a search engine finds data for its users is based on its search algorithms. In response to a query, a search engine returns a ranked list of documents. If the query is broad, i.e., it matches many documents, the returned list is usually too long to view fully. For broad queries, an algorithm is presented that places the most authoritative pages on the query topic at the top of the ranking. The algorithm operates on a special index of expert documents. These are a subset of the pages on the WWW identified as directories of links to non-affiliated sources on specific topics. Results are ranked based on the match between the query and the relevant descriptive text of hyperlinks on expert pages that point to a given result page. We describe a search engine that implements this ranking scheme and discuss its performance. With a relatively small expert index, i.e., 2.5 million pages, our algorithm was able to perform comparably on broad queries with the best of the mainstream search engines. This term paper elaborates the concept of search engine algorithms by describing their types, their working, and their comparisons.
I.INTRODUCTION
A.Algorithm: In mathematics, computer science, and related subjects, an algorithm is an effective method for solving a problem, expressed as a finite sequence of steps. Algorithms are used for calculation, data processing, and many other fields. Each algorithm is a list of well-defined instructions for completing a task. Starting from an initial state, the instructions describe a computation that proceeds through a well-defined series of successive states, eventually terminating in a final ending state. An example of an algorithm is Euclid’s algorithm for determining the greatest common divisor of two integers.
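Euclid's algorithm shows these properties concretely: each step replaces the pair (a, b) by (b, a mod b), every state follows from the previous one in a well-defined way, and the computation terminates when the remainder reaches 0. A minimal sketch in Python, assuming non-negative integer inputs:

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    while b != 0:
        # Replace (a, b) with (b, a mod b); the gcd is unchanged at every step.
        a, b = b, a % b
    return a

print(gcd(1071, 462))  # 21
```

The loop always terminates because the second number strictly decreases and can never go below 0.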
1) Need for algorithms
1.1) Human beings cannot write fast enough, long enough, or small enough to list all members of an infinite set by writing out their names, one after another, in some notation. But humans can do something equally useful in the case of certain infinite sets: they can give explicit instructions for determining the nth member of the set, for arbitrary finite n. Such instructions must be given quite explicitly, in a form that could be followed by a computing machine, or by a human who is capable of carrying out only very elementary operations on symbols.
1.2) An algorithm is a set of instructions, in a language the computer understands, for a fast, efficient, good process that specifies the moves of the computing machine (or human). The machine is equipped with the internally contained information and capabilities needed to find, decode, and then process arbitrary input integers or symbols m and n, together with the symbols + and =, and to produce reliably, correctly, and effectively, in a reasonable time, the output integer y at a specified place and in a specified format.
1.3) The concept of algorithm is also used to define the notion of decidability. That notion is central for explaining how formal systems come into being starting from a small set of axioms and rules. In logic, the time that an algorithm requires to complete cannot be measured, as it is not apparently related to the customary physical dimension. Such uncertainties, which characterize ongoing work, explain why there is no definition of algorithm that suits both the concrete and the abstract usage of the term.
B) SEARCH ENGINE
1)Introduction: A search engine is a software program that searches for sites based on the words that one designates as search terms. Search engines look through their own databases of information in order to find what the user is looking for. A search engine is a program that searches documents for specified keywords and returns a list of the documents where the keywords were found. Although the term really refers to a general class of programs, it is often used specifically to describe systems like Google, AltaVista and Excite that enable users to search for documents on the World Wide Web and in USENET newsgroups.
2)History: The first tool for searching the Internet, created in 1990, was called "Archie". It downloaded directory listings of all files located on public anonymous FTP servers, creating a searchable database of filenames. One year later "Gopher" was created. It indexed plain text documents. "Veronica" and "Jughead" came along to search Gopher's index systems. The first actual Web search engine was developed by Matthew Gray in 1993 and was called "Wandex".
When people use the term search engine in relation to the Web, they are usually referring to the actual search through databases of HTML documents, initially gathered by a robot.
3)Types: When one searches for anything using a favourite search engine, the search engine sorts through the millions of pages it has in its database and presents the user with the ones that match the search term. The matches are ranked so that the most relevant appear first. Sometimes, depending on the search engine's algorithm, non-relevant pages may make it into these results. This is one reason search engines are constantly updating their algorithms. There are basically three types of search engines.
3.1)Human-powered search engines: These are powered by human submissions. People submit the information, and the submitted information is put into the index.
3.2)Robot-powered search engines: These are powered by robots. When a user queries a search engine to locate information, the user is actually searching through the index that the search engine has created; the user is not actually searching the Web.
3.3)Hybrid of human and robot search engines: These combine both approaches. Their indices are giant databases of information that is collected, stored, and subsequently searched. The returned results are based on the index, so if the index has not been updated since a Web page became invalid, the search engine still treats the page as an active link even though it no longer is. It will remain that way until the index is updated.
The same search on different search engines produces different results because not all indices are exactly the same. It depends on what the spiders find or what humans have submitted. Moreover, not every search engine uses the same algorithm to search through its index. The algorithm is what the search engine uses to determine the relevance of the information in the index to what the user is searching for.
II.Web Directories And Search Engines
1)Search engines and Web directories are not the same thing, although the terms are often used interchangeably. Search engines automatically create web site listings by using spiders that crawl web pages, index their information, and optionally follow that site's links to other pages. Spiders return to already-crawled sites on a regular basis to check for updates or changes, and everything these spiders find goes into the search engine database. Web directories, on the other hand, are databases of human-compiled results; they are also known as human-powered search engines.
2)An alternative to using a search engine is to explore a structured directory of topics. Yahoo, which also lets you use its search engine, is the most widely used directory on the Web. A number of Web portal sites offer both the search engine and directory approaches to finding information.
III.How Search Engines Work
1)Search engines are the key to finding specific information on the vast expanse of the World Wide Web. Without search engines, it would be impossible to locate anything on the Web without knowing a specific URL.
2) A search engine works by sending out a spider to fetch as many documents as possible. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document. Each search engine uses an algorithm to create its indices so that only meaningful results are returned for each query. Search engines are not simple: they include incredibly detailed processes and methodologies, and are updated all the time. The following is a look at how search engines work to retrieve search results. All search engines follow this basic process when conducting searches, but because search engines differ, different engines are bound to return different results.
2.1)The searcher types a query into a search engine.
2.2)Search engine software quickly sorts through literally millions of pages in its database to find matches to this query. Search engines use automated software agents called crawlers that visit a Web site, read the information on the actual site, read the site's meta tags, and also follow the links that the site connects to, indexing all linked Web sites as well. The crawler returns all that information to a central repository, where the data is indexed. The crawler periodically returns to the sites to check for any information that has changed. The frequency with which this happens is determined by the administrators of the search engine.
2.3)The search engine's results are ranked in order of relevancy.
2.4)On the Internet, a search engine is a coordinated set of programs that includes:
2.4.1)A spider that goes to every page, or to representative pages, on every Web site that wants to be searchable and reads it, using the hypertext links on each page to discover and read the site's other pages.
2.4.2)A program that creates a huge index, called a catalog, from the pages that have been read.
2.4.3)A program that receives the user’s search request, compares it to the entries in the index, and returns results to the user.
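The three programs listed above (spider, catalog-building indexer, and query handler) can be sketched together on a tiny in-memory "web". This is a minimal sketch under stated assumptions: the hypothetical `WEB` dict stands in for pages that a real spider would fetch over HTTP, the spider crawls breadth-first by following links, the catalog is a simple inverted index from words to URLs, and a request returns the pages containing every word. Real engines add politeness rules, scheduled revisits, and ranking, none of which is shown.

```python
import re
from collections import defaultdict, deque

# Hypothetical toy Web: URL -> (page text, outgoing hyperlinks).
WEB = {
    "/": ("Welcome to our search engine guide", ["/spiders", "/index"]),
    "/spiders": ("A spider follows links to read pages", ["/", "/index"]),
    "/index": ("The indexer builds an index of words in pages", ["/query"]),
    "/query": ("The query program compares a search request to the index", []),
}

def tokenize(text: str) -> list[str]:
    # Lower-case and keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def spider(seed: str) -> list[str]:
    """Breadth-first crawl from seed, following hyperlinks; returns URLs in visit order."""
    seen, order, queue = {seed}, [], deque([seed])
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB.get(url, ("", []))[1]:
            if link not in seen:            # visit each page only once
                seen.add(link)
                queue.append(link)
    return order

def build_catalog(urls: list[str]) -> dict[str, set[str]]:
    """Inverted index: each word maps to the set of URLs whose text contains it."""
    catalog: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        for word in tokenize(WEB[url][0]):
            catalog[word].add(url)
    return catalog

def search(catalog: dict[str, set[str]], request: str) -> list[str]:
    """Return the URLs that contain every word of the request, sorted for display."""
    words = tokenize(request)
    if not words:
        return []
    hits = set.intersection(*(catalog.get(w, set()) for w in words))
    return sorted(hits)

crawled = spider("/")
catalog = build_catalog(crawled)
print(crawled)                          # ['/', '/spiders', '/index', '/query']
print(search(catalog, "index pages"))   # ['/index']
```

Keeping the catalog as a word-to-pages map means a request only touches the entries for its own words instead of re-reading every page.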
IV.GOOGLE Search Engine
A)Introduction: Today Google is the fastest-growing search engine and one of the largest public databases of information. Approximately 80% of all Internet searches are done using Google, either through Google.com or through the network of sites that license Google search results, such as AOL, Netscape, iWon, CompuServe, Alexa, and many others.
1)Google is an amazing search engine. Users can add a URL for free. Google does not care what kind of files one has on a site; it will index almost anything. Google ranks a site according to fairly standard algorithms, except for one notable factor: the site is ranked partly on the number and quality of sites that link back to it. A critical element of a link to the site is the phrase in the link (its anchor text). If the link has the words "really amazing website", the site will get a slightly higher search rank for the phrase "really amazing website".
2)Google also does not use an editorial "quality rating" system, such as the one adopted by AltaVista, Yahoo, and LookSmart-affiliated search engines, which gives higher rankings to sites based on the subjective evaluations of editors. Google frequently spiders the Open Directory for new sites and gives extra popularity credit to sites that are listed in the Open Directory.
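The link-based factor described in point 1), where a site earns credit from the sites linking to it and extra credit when the link text matches the query, can be illustrated with a toy scoring function. This is a hypothetical sketch, not Google's actual formula: it assumes each inbound link is a (source, anchor text) pair, that a hypothetical per-source `quality` weight is available, and that an anchor containing the query phrase doubles that link's contribution (an arbitrary choice).

```python
def link_score(inbound: list[tuple[str, str]], quality: dict[str, float],
               phrase: str, anchor_bonus: float = 1.0) -> float:
    """Toy link score: sum of linking-site quality, plus a bonus per matching anchor."""
    phrase = phrase.lower()
    score = 0.0
    for source, anchor in inbound:
        weight = quality.get(source, 0.1)   # unknown sites get a small default weight
        score += weight                     # every inbound link counts a little
        if phrase in anchor.lower():
            score += anchor_bonus * weight  # anchor text matching the query counts extra
    return score

inbound = [("siteA", "really amazing website"), ("siteB", "click here")]
quality = {"siteA": 0.8, "siteB": 0.5}
print(link_score(inbound, quality, "really amazing website"))  # 2.1
```

For any other query phrase the same page scores only 1.3, which mirrors the text's point that the words in the link matter for the phrase they contain.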
V. Search Engines Comparison
A)Yahoo Search Engine
It has been in the search game for many years.
It is better than MSN but nowhere near as good as Google at determining if a link is a natural citation or not.
It has a large amount of internal content and a paid inclusion program, both of which give it an incentive to bias search results toward commercial results.
B) MSN Search Engine
It is new to the search game.
It is poor at determining whether a link is natural or artificial; because its link analysis is weak, it places too much weight on page content.
Its poor relevancy algorithms cause a heavy bias toward commercial results. It also favours bursts of recent links, so new sites that are generally untrusted in other systems can rank quickly in MSN Search.
Things like low-quality, off-topic reciprocal links still work well in MSN Search.
C)Google Search Engine
It has been in the search game a long time, and saw the web graph when it was much cleaner than the current web graph.
It is much better than the other engines at determining if a link is a true editorial citation or an artificial link.
It looks for natural link growth over time.
It heavily biases search results toward informational resources.
A page on a site or subdomain of a site with significant age or link related trust can rank much better than it should, even with no external citations.
They have aggressive duplicate content filters that filter out many pages with similar content.
If a page is obviously focused on a term, they may filter the document out for that term. On-page variation and link anchor text variation are important: a page with a single reference or a few references to a modifier will frequently outrank pages that are heavily focused on a search phrase containing that modifier.
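How any engine's duplicate content filter works is not described here. One common textbook technique for spotting near-duplicate pages is to compare their sets of word "shingles" (overlapping word n-grams) using Jaccard similarity. The sketch below assumes 3-word shingles and a hypothetical threshold of 0.5; it illustrates the general idea only, not any particular engine's filter.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_near_duplicate(page1: str, page2: str, threshold: float = 0.5) -> bool:
    return jaccard(shingles(page1), shingles(page2)) >= threshold

p1 = "cheap web design services for small business owners"
p2 = "cheap web design services for small business sites"
p3 = "how search engines rank pages using links"
print(is_near_duplicate(p1, p2), is_near_duplicate(p1, p3))  # True False
```

Here p1 and p2 share 5 of their 7 distinct shingles (similarity about 0.71), so one of them would be filtered, while p3 shares none.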
VI.Search Engine Approaches
1) Major search engines such as Google, Yahoo, AltaVista, and Lycos index the content of a large portion of the Web and provide results that can run for pages and consequently overwhelm the user.
2) Some specialized content search engines are selective about what part of the Web is crawled and indexed. Ask Jeeves (http://www.ask.com) provides a general search of the Web but allows the user to enter a search request in natural language.
3) Major Web sites such as Yahoo, and some special tools, let the user use a number of search engines at the same time and compile the results in a single list.
4) Individual Web sites, especially larger corporate sites, may use a search engine to index and retrieve the content of just their own site. Some of the major search engine companies license or sell their search engines for use on individual sites.
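Item 3 describes tools that query several engines at once and compile the results into a single list. How such tools merge rankings is not specified here; one simple, well-known option is a Borda count, sketched below under the assumption that each engine returns an ordered list of URLs (the engine names and URLs are hypothetical).

```python
from collections import defaultdict

def borda_merge(result_lists: dict[str, list[str]]) -> list[str]:
    """Merge ranked lists: an item at position r in a list of length n earns n - r points."""
    scores: dict[str, int] = defaultdict(int)
    for ranking in result_lists.values():
        n = len(ranking)
        for r, url in enumerate(ranking):
            scores[url] += n - r
    # Highest total first; ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))

results = {
    "engine1": ["a.com", "b.com", "c.com"],
    "engine2": ["b.com", "a.com", "d.com"],
    "engine3": ["b.com", "c.com", "a.com"],
}
print(borda_merge(results))  # ['b.com', 'a.com', 'c.com', 'd.com']
```

Here b.com earns 2 + 3 + 3 = 8 points and a.com 3 + 2 + 1 = 6, so a page that two engines rank first beats one that only a single engine ranks first.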
VII.Search Engine Algorithm
A)Introduction: An algorithm is nothing more than a set of rules used by a search engine to determine the order in which search results are listed. For a search phrase such as "Web Designer", there are over 5 million pages that contain that phrase. Listing them alphabetically, which would technically be considered the simplest form of an algorithm, would not make much sense. Considering how much information there is on the Internet on virtually any topic, the best outcome for everyone involved is for search engines to return the most relevant sites at the top and the least relevant sites at the bottom. This is done by algorithms.
B)Working: A search engine algorithm takes the phrase entered by the user and tests all of the pages in its index against a very long series of rules that rank them by relevancy. For the search phrase "Web Designer", the page that appears in the number one position is supposed to be the most relevant, and the one that appears in the 5 millionth position is supposed to be the least relevant.
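Real rule sets are secret and far longer, but the basic step of scoring every indexed page against the query and sorting by score can be sketched with TF-IDF, a standard textbook relevance measure. This is an assumption for illustration only: no search engine's actual ranking rules are implied, and the three-page index below is hypothetical.

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(pages: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Score each page by the sum of TF-IDF weights of the query terms, best first."""
    tokenized = {url: tokenize(text) for url, text in pages.items()}
    n = len(pages)
    scores = {}
    for url, words in tokenized.items():
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for w in tokenized.values() if term in w)  # pages containing term
            if df == 0 or not words:
                continue
            tf = words.count(term) / len(words)   # share of this page's words that match
            idf = math.log(n / df) + 1.0          # rarer terms weigh more (+1 keeps common terms > 0)
            score += tf * idf
        scores[url] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

pages = {
    "p1": "web designer portfolio and web design services",
    "p2": "hire a web designer today",
    "p3": "gardening tips for spring",
}
for url, score in rank(pages, "web designer"):
    print(url, round(score, 3))
```

Every page in the index is tested, as the text describes, and the unrelated gardening page ends up last with a score of 0.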
1) There are many immeasurable factors involved, and it is physically impossible for any computer, however powerful, to know them all. Nonetheless, those who have been writing search engine algorithms over the past several years have learned thousands of little tricks that help search engines make educated guesses about which pages might be the most useful. The algorithms are constantly being updated so that the results become increasingly accurate.
2)As search engines learn new tricks, those who want to beat the system learn them as well. Some might remember the days when typing a phrase such as "Web Designer" returned a completely unrelated page trying to sell an entirely different service. Today, pages that are caught using underhanded tricks to rank highly are heavily penalized, or even banned from the index.
3)Search-Engine-Site.com attempts to explain the larger patterns that keep a site ranking high in the long run, not just in the immediate future. Most of this has nothing to do with secrets; it comes from the hard work of creating a site that is truly relevant and has therefore gained a strong reputation among a large network of informative sites. Rather than revealing tricks, it attempts to explain the genuinely good-intentioned project that underlies search engine algorithms: giving helpful answers to the millions of questions asked by people all over the world, every single day!
C)Types Of Search Algorithms:
The commonly used search algorithms are as follows:
1)List Search Algorithm: A list search algorithm searches the data for a particular keyword in a completely linear, list-based way. A list search usually returns only one element, which means that using this method to search billions of websites would be very time-consuming, although it can produce fewer search results.
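A list search is essentially a linear search: it checks the elements one after another until it finds a match, so its cost grows with the length of the list. A minimal sketch, assuming the "pages" are just a Python list of hypothetical titles:

```python
def linear_search(items: list[str], keyword: str) -> int:
    """Return the index of the first item containing keyword, or -1 if none does."""
    keyword = keyword.lower()
    for i, item in enumerate(items):   # examine each element in turn: O(n) time
        if keyword in item.lower():
            return i
    return -1

titles = ["Gardening tips", "Web designer portfolio", "Search engine basics"]
print(linear_search(titles, "search"))   # 2
print(linear_search(titles, "cooking"))  # -1
```

Returning only the first match mirrors the text's point that a list search usually yields a single element.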
2)Tree Search Algorithm: First imagine a tree. A tree search inspects the tree starting either from its root or from its leaves. The algorithm can start from the broadest part of the data (the leaves) and search toward the narrowest part (the root), or it can start from the narrowest part, the root, and search out toward the broadest leaves. In a data set shaped like a tree, each piece of data is connected to other data through branches, much as Web pages are organized. Tree search is not the only approach that can be used successfully in Web search algorithms, but it does apply well to Web search.
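Tree search can be illustrated by starting at the root of a small site hierarchy and inspecting nodes level by level (breadth-first) until one matches. This is a sketch assuming the tree is a hypothetical dict of page names mapping to their children; because a tree has no cycles, no "visited" set is needed, and a depth-first version would simply use a stack instead of a queue.

```python
from collections import deque
from typing import Optional

# Hypothetical site hierarchy: each node maps to its child nodes.
tree = {
    "home": ["products", "blog"],
    "products": ["laptops", "phones"],
    "blog": ["search-engines"],
    "laptops": [], "phones": [], "search-engines": [],
}

def tree_search(tree: dict[str, list[str]], root: str, target: str) -> Optional[list[str]]:
    """Breadth-first search from the root; return the path to target, or None."""
    queue = deque([[root]])          # each queue entry is the path from the root
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for child in tree.get(node, []):
            queue.append(path + [child])
    return None

print(tree_search(tree, "home", "search-engines"))  # ['home', 'blog', 'search-engines']
```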
3)SQL Search Algorithm: An inherent flaw of tree search is that it can only proceed layer by layer; that is, it can only search the data in order, moving from one piece of data to the next. SQL search has no such limitation: it allows a non-hierarchical search, which means the search can start from any subset of the data.
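The point that an SQL search can start from any subset of the data, rather than walking a hierarchy from the root, can be shown with Python's built-in `sqlite3` module and an in-memory table. The table layout and rows are hypothetical, and a production engine would not answer Web queries with `LIKE` scans; the sketch only shows filtering on arbitrary columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, section TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("/blog/ranking", "blog", "How search engines rank pages"),
        ("/products/seo", "products", "SEO tools for search ranking"),
        ("/about", "about", "About our company"),
    ],
)

# Filter on any columns directly; there is no need to descend from a root node first.
rows = conn.execute(
    "SELECT url FROM pages WHERE body LIKE ? AND section != ? ORDER BY url",
    ("%search%", "about"),
).fetchall()
print([url for (url,) in rows])  # ['/blog/ranking', '/products/seo']
conn.close()
```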
4)Heuristic (informed) Search Algorithm: A heuristic search algorithm looks for the answer in a tree-like structure of a given data set, using knowledge about the problem (a heuristic) to decide which branch to explore first. Because of the inherent characteristics of Web search, heuristic search is not the best choice for general Web search. However, it works well when applied to a specific data set to perform a specific query.
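An informed search uses a heuristic estimate of the remaining distance to decide which node to expand next. A* search is a standard example; the sketch below runs it on a small grid with Manhattan distance as the heuristic. The grid is a hypothetical stand-in for "a specific data set", chosen only to keep the example self-contained.

```python
import heapq

def astar(grid: list[str], start: tuple[int, int], goal: tuple[int, int]) -> int:
    """Length of the shortest 4-directional path avoiding '#' cells, or -1 if none."""
    def h(cell: tuple[int, int]) -> int:
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best = {start: 0}                          # cheapest known cost to each cell
    frontier = [(h(start), 0, start)]          # (estimated total, cost so far, cell)
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best.get(cell, float("inf")):
            continue                           # stale entry: a cheaper path was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return -1

grid = [
    "....",
    ".##.",
    "....",
]
print(astar(grid, (0, 0), (2, 3)))  # 5
```

The heuristic makes the search expand cells that look closer to the goal first, which is what distinguishes it from an uninformed breadth-first search.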
5)Hostile (adversarial) Search Algorithm: An adversarial search algorithm attempts an exhaustive search over all possible answers, much as a game-playing program tries to examine every possible move. This is difficult to apply to Web search because, whether the query is a word or a phrase, the Web offers an almost infinite number of possible results.
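Adversarial search is usually illustrated with minimax, which exhaustively explores every move of a two-player game and assumes the opponent always replies with its best move. A minimal sketch, assuming the game tree is given as hypothetical nested lists whose leaves are scores for the maximizing player:

```python
from typing import Union

Tree = Union[int, list]  # a leaf score, or a list of child subtrees

def minimax(node: Tree, maximizing: bool = True) -> int:
    """Exhaustively evaluate the game tree; players alternate between max and min."""
    if isinstance(node, int):          # leaf: final score of the game
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two moves for the maximizer, each answered by the minimizer's replies.
game = [[3, 12, 8], [2, 4]]
print(minimax(game))  # 3
```

Every leaf is visited, which is why the text notes that this exhaustive style does not scale to the almost unlimited result space of Web search.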
6) Constraint Satisfaction Search Algorithm: When searching the Web for a word or phrase, a constraint satisfaction search algorithm is the most likely to return results that meet your needs. It finds answers that satisfy a number of constraints, and it can search the data set in a variety of ways without being limited to a linear search. Constraint satisfaction search is ideal for Web search.
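Constraint satisfaction search looks for an assignment of values to variables that satisfies every constraint, backtracking whenever a partial assignment breaks one. The classic textbook illustration is map colouring, sketched below; the regions, adjacency, and colours are hypothetical stand-ins for whatever constraints a query imposes.

```python
from typing import Optional

def solve(variables: list[str], domains: dict[str, list[str]],
          neighbours: dict[str, list[str]],
          assignment: Optional[dict[str, str]] = None) -> Optional[dict[str, str]]:
    """Backtracking search: neighbouring variables must take different values."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return assignment                       # every variable satisfies its constraints
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint check: no already-assigned neighbour may share this value.
        if all(assignment.get(n) != value for n in neighbours[var]):
            assignment[var] = value
            result = solve(variables, domains, neighbours, assignment)
            if result is not None:
                return result
            del assignment[var]                 # undo and try the next value
    return None

regions = ["A", "B", "C", "D"]
adjacent = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
colours = {r: ["red", "green", "blue"] for r in regions}
print(solve(regions, colours, adjacent))
# {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'red'}
```

Unlike a linear scan, the search abandons a branch as soon as a constraint fails, so it never enumerates assignments that are already known to be invalid.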
D)Search Engine Algorithms are Kept Secret
1)Many users try to optimize a page based on the exact algorithms of the search engines. To protect themselves, search engines have been actively using off-site criteria to rank web pages. Here are a few search engine algorithm facts:
2)Anyone who knew the exact search engine algorithm would not be selling the information cheaply over the Web. To fight spam, search engines change their algorithms many times each month. If users knew the exact algorithm, they could manipulate rankings as they pleased until the search results became so irrelevant that the search engine became junk.
3)Because there are many millions of websites and pages available on the Internet, search engines follow a set of rules, known as an algorithm, in order to find the most relevant ones and rank them accordingly. Exactly how a particular search engine's algorithm works is not made public, so it is up to SEO (search engine optimization) agencies to use their own methods and techniques to rank a website.
1)Search Engine Facts:
1.1)Search engines are the No.1 way for consumers to find information.
1.2)Search engines are the No.1 way to generate traffic to websites.
1.3)80% of internet users use search engines to find the sites they want.
1.4)Search engine positioning was the top method cited by web site marketers to drive traffic to their sites.
2)With moderate search engine optimization knowledge, some common sense, and a resourceful and imaginative mind, one can keep a website in good standing with search engines even through the most significant algorithm changes.
3)Many people believe that search engines have a hidden agenda or promote certain things that stop their sites from being listed. Such bias would make a search engine unpopular, since its search results would be biased and likely highly inaccurate. For this reason, each search engine tries to keep its search results competitive and of high quality.
The algorithms above are only a few of the many search algorithms available for building a search engine. Search engines often use a variety of search algorithms at the same time, and in most cases they create proprietary search algorithms of their own.
PageRank Algorithm
Suppose a small universe of four web pages: A, B, C and D. If pages B, C and D all link to A, then the PR (PageRank) of page A would be the sum of the PR of pages B, C and D.
PR(A) = PR(B) + PR(C) + PR(D)
But then suppose page B also has a link to page C, and page D has links to all three other pages. A page cannot vote twice, and for that reason page B is considered to have given half a vote to each page it links to. By the same logic, only one third of D's vote is counted for A's PageRank.
In other words, divide the PR by the total number of links that come from the page.
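Applying this rule to the example, where B links to A and C, C links only to A, and D links to A, B and C, the contributions to A become:

$$PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}$$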
Finally, all of this is reduced by a certain percentage by multiplying it by a factor (1 - q). For reasons explained below, no page can have a PageRank of 0, so every page is also given a minimum base value of q (or q/N per page when the values are normalized so that they sum to 1). With q = 0.15, every page's score is reduced by 15% and 0.15 is given back.
So one page's PageRank is calculated from the PageRanks of other pages. Google is always recalculating the PageRanks. If you give all pages a PageRank of any number (except 0) and constantly recalculate everything, all PageRanks will change and tend to stabilize at some point. It is at this point that the PageRank is used by the search engine.
The formula uses a model of a random surfer who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the frequency of hits on that page by the random surfer. It can be understood as a Markov process in which the states are pages and the transitions are the links between pages, all equally probable. If a page has no links to other pages, it becomes a sink, which makes the whole process unusable, because sink pages would trap the random surfer forever. However, the solution is quite simple: if the random surfer arrives at a sink page, it picks another URL at random and continues surfing.
To be fair to pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability of usually q = 0.15, estimated from the frequency with which an average surfer uses his or her browser's bookmark feature.
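So, the equation is as follows:
$$PR(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
where p1, p2, ..., pN are the pages under consideration, M(pi) is the set of pages that link to pi, L(pj) is the number of outbound links on page pj, and N is the total number of pages.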
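The repeated recalculation described in this section can be sketched as a power iteration. Every page starts with the same non-zero value, and each round gives every page a base share of q/N plus (1 - q) times the shares passed along its inbound links, where each linking page splits its value equally over its outbound links. A minimal sketch, assuming q = 0.15, the four-page example from the start of this section, and that a sink page spreads its value over all pages (the random-jump fix described above). The function name, tolerance, and iteration cap are illustrative choices, not Google's implementation.

```python
def pagerank(links: dict[str, list[str]], q: float = 0.15,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Iterate PR(p) = q/N + (1 - q) * sum(PR(j) / L(j)) until the values stabilize."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}                  # any non-zero start works
    for _ in range(max_iter):
        new = {p: q / n for p in pages}               # base share from random jumps
        for page, outs in links.items():
            if outs:
                share = pr[page] / len(outs)          # split the vote over outbound links
                for target in outs:
                    new[target] += (1 - q) * share
            else:
                for target in pages:                  # sink: jump to a random page
                    new[target] += (1 - q) * pr[page] / n
        if sum(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr

# The four-page example: B, C and D link to A; B also links to C; D links to A, B and C.
# A has no outbound links, so it is a sink.
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 4))
```

The values sum to 1 and settle in the order A, C, B, D. A receives every other page's vote, while D, which nobody links to, keeps only the base share plus its part of A's redistributed sink value.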
The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix. This makes PageRank a