The Web evolved beyond FTP archives not just by becoming a graphically rich multi-media world, but by evolving tools which made it possible to find and access this richness. Oldsters like this author remember that before browsers there was WAIS (released 1991), and the XWAIS version provided a user-friendly GUI way to find information. However, this system required servers to organize information according to a specific format. GOPHER, another information serving system with some user-friendliness, was released the same year. One of the earliest search engines like those today, Lycos, began in the spring of 1994 when John Leavitt's spider (see below) was linked to an indexing program by Michael Mauldin. Yahoo!, a catalog, became available the same year. Compare this to the appearance of NCSA Mosaic in 1993 and Netscape in 1994.
Today there are a score or more of "Web location services." A search engine proper is a database and the tools to generate that database and search it; a catalog is an organizational method and related database plus the tools for generating it. There are sites out there, however, that try to be a complete front end for the Internet. They provide news, libraries, dictionaries, and other resources that are not just a search engine or a catalog, and some of these can be really useful. Yahoo!, for example, emphasizes cataloging, while others such as Alta Vista or Excite emphasize providing the largest search database. Some Web location services do not own any of their search engine technology; other services are their main thrust, and companies such as Inktomi (after a Native American word for spider) provide the search technology. These Web location services have put amazing power into every user's hands, making life much better for all of us. . . . and it's all free, right?
. . . Maybe not. It is rumored that these information companies might increase their revenues by selling information - information about you. After you use a search engine and find a page with mutual fund quotes, you might find yourself suddenly receiving e-mail advertising investments. Think this is a coincidence? Think again. The investment company could have paid a search engine for your e-mail address. The sale of such information is not advertised at this time; however, there is an existing protocol for servers to ask a user's browser for such information, which is routinely entered during set-up. Get scared about your privacy by checking out the anonymizer snoop page. For best results, search for the anonymizer snoop page, "I can see you", then go to it from your search engine (you'll see what I mean). For now, let's stick to the practical aspects of search engines, catalogs, and Web location services.
II. How Software Agents and Search Engines Work
There are at least three elements to search engines that I think are important: information discovery & the database, the user search, and the presentation and ranking of results.
Discovery and Database
A search engine finds information for its database by accepting listings sent in by authors wanting exposure, or by getting the information from its "Web crawlers," "spiders," or "robots," programs that roam the Internet storing links to and information about each page they visit. Web crawler programs are a subset of "software agents," programs with an unusual degree of autonomy which perform tasks for the user. How do these really work? Do they go across the net by IP number one by one? Do they store all or most of everything on the Web?
According to The WWW Robot Page, these agents normally start with a historical list of links, such as server lists, and lists of the most popular or best sites, and follow the links on these pages to find more links to add to the database. This makes most engines, without a doubt, biased toward more popular sites. A Web crawler could send back just the title and URL of each page it visits, or just parse some HTML tags, or it could send back the entire text of each page. Alta Vista is clearly hell-bent on indexing anything and everything, with over 30 million pages indexed (7/96). Excite actually claims more pages. OpenText, on the other hand, indexes the full text of fewer than a million pages (5/96), but stores many more URLs. Inktomi has implemented HotBot as a distributed computing solution, which they claim can grow with the Web and index it in its entirety no matter how many users or how many pages are on the Web. By the way, in case you are worrying about software agents taking over the world, or your Web site, look over the Robot Attack Page. Normally, "good" robots can be excluded by a bit of Exclusion Standard code on your site.
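The discovery process described above (start from a seed list, follow links, record something about each page) can be sketched as a breadth-first traversal. This is only an illustration, not how any particular engine works: the in-memory WEB dictionary, the example.* addresses, and the crawl function are all hypothetical stand-ins for real network fetches.

```python
from collections import deque

# Hypothetical stand-in for the Web: each URL maps to (title, outgoing links).
WEB = {
    "http://a.example/": ("Server list", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("Popular site", ["http://c.example/", "http://d.example/"]),
    "http://c.example/": ("Lizard care", []),
    "http://d.example/": ("Apple farming", ["http://a.example/"]),
    "http://e.example/": ("Unlinked page", []),
}

def crawl(seeds, max_pages=100):
    """Breadth-first crawl from a seed list; returns {url: title}."""
    index = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(index) < max_pages:
        url = queue.popleft()
        page = WEB.get(url)
        if page is None:          # dead link: nothing to index
            continue
        title, links = page
        index[url] = title        # store only title and URL, the cheapest option
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return index

index = crawl(["http://a.example/"])
```

In this toy graph the page that nobody links to never enters the index, which is the popularity bias in miniature.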
It seems unfair, but developers aren't rewarded much by location services for sending in the URLs of their pages for indexing. The typical time from sending your URL in to getting it into the database seems to be 6-8 weeks. Not only that, but a submission for one of my sites expired very rapidly, no longer appearing in searches after a month or two, apparently because I didn't update it often enough. Most search engines check their databases to see if URLs still exist and to see if they are recently updated.
The User Search
What can the user do besides typing a few relevant words into the search form? Can they specify that words must be in the title of a page? What about specifying that words must be in a URL, or perhaps in a special HTML tag? Can they use all logical operators between words like AND, OR, and NOT?
Query Syntax Checklist
How does your engine handle:
Truncation, Pluralization & Capitalization:
Macintosh, Mac, Macintoshes, Macs, macintosh, macintoshes, mac, macs, could all yield different results. Most engines treat a lower-case search word as matching either case and an upper-case word as matching only upper case, but there are exceptions. There is no standard at all for truncation, and worse yet, it probably differs between the general and advanced search modes of every engine.
Multiple Words:
Does the engine logically AND them or OR them?
Phrases:
Typically one puts quotes around a phrase so that each word in the phrase is not searched for separately.
. . . Check with your engine's help file before starting a search. Most engines allow you to type in a few words, and then search for occurrences of these words in their database. Each one has its own way of deciding what to do about approximate spellings, plural variations, and truncation. If you just type words into the "basic search" interface you get from the search engine's main page, you also can get different logical expressions binding the different words together. Excite! actually uses a kind of "fuzzy" logic, searching for the AND of multiple words as well as the OR of the words. Most engines have separate advanced search forms where you can be more specific, and form complex Boolean searches (every one mentioned in this article except Hotbot). Some search tools parse HTML tags, allowing you to look for things specifically as links, or as a title or URL without consideration of the text on the page.
By searching only in titles, one can eliminate pages with only brief mentions of a concept, and only retrieve pages that really focus on your concept.
By searching links, one can determine how many and which pages point at your site. Understanding what each engine does with the non-standard pluralization, truncation, etc. can be quite important in how successful your searches will be. For example, if you search for "bikes" you won't get "bicycle," "bicycles," or "bike." In this case, I would use a search engine that allowed "truncation," that is, one that allowed the search word "bike" to match "bikes" as well, and I would search for "bicycle OR bike OR cycle" ("bicycle* OR bike* OR cycle*" in Alta Vista).
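To make the truncation idea concrete, here is a minimal sketch of how a trailing * might be matched. It assumes case-insensitive, whole-word matching, which is an assumption of this example and not documented behavior of any engine; the function names and sample page titles are hypothetical.

```python
import re

def term_to_regex(term):
    """Turn a search term into a whole-word regex; a trailing * allows any suffix."""
    if term.endswith("*"):
        pattern = re.escape(term[:-1]) + r"\w*"
    else:
        pattern = re.escape(term)
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

def matches_any(text, terms):
    """OR the terms together: True if at least one term matches the text."""
    return any(term_to_regex(t).search(text) for t in terms)

# Hypothetical page titles to search.
pages = [
    "Mountain bikes for sale",
    "Bicycle repair tips",
    "Cycle paths in town",
    "Hiking boots",
]

# Exact word only: misses "Bicycle" and "Cycle".
exact = [p for p in pages if matches_any(p, ["bikes"])]

# Truncated terms ORed together, in the spirit of "bicycle* OR bike* OR cycle*".
truncated = [p for p in pages if matches_any(p, ["bicycle*", "bike*", "cycle*"])]
```

The exact search returns a single page, while the truncated OR search picks up all three cycling pages and still skips the unrelated one.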
Presentation & Ranking
With databases that can keep the entire Web at the fingertips of the search engines, there will always be relevant pages, but how do you get rid of the less relevant and emphasize the more relevant?
Most engines find more sites from a typical search query than you could ever wade through. Search engines give each document they find some measure of the quality of the match to your search query, a relevance score. Relevance scores reflect the number of times a search term appears, if it appears in the title, if it appears at the beginning of the document, and if all the search terms are near each other; some details are given in engine help pages. Some engines allow the user to control the relevance score by giving different weights to each search word. One thing that all engines do, however, is to use alphabetical order at some point in their display algorithm. If relevance scores are not very different for various matches, then you end up with this sorry default. Zeb's [Whatever] page will never fare very well in this case, regardless of the quality of its content. For most uses, a good summary is more useful than a ranking. The summary is usually composed of the title of a document and some text from the beginning of the document, but can include an author-specified summary given in a meta-tag. Scanning summaries really saves you time if your search returns more than a few items.
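The ranking factors listed above can be combined in a toy scoring function like the one below. The weights, the "first ten words" and "within five words" thresholds, and the sample documents are arbitrary assumptions chosen for illustration; real engines did not document their formulas.

```python
def relevance(doc, terms):
    """Toy score built from the factors listed above; all weights are assumptions."""
    title_words = doc["title"].lower().split()
    body_words = doc["text"].lower().split()
    score = 0.0
    first_positions = []
    for term in terms:
        term = term.lower()
        hits = [i for i, w in enumerate(body_words) if w == term]
        score += len(hits)                 # number of occurrences in the text
        if term in title_words:
            score += 5                     # term appears in the title
        if hits and hits[0] < 10:
            score += 2                     # term appears near the beginning
        if hits:
            first_positions.append(hits[0])
    all_found = len(first_positions) == len(terms)
    if all_found and len(terms) > 1:
        if max(first_positions) - min(first_positions) <= 5:
            score += 3                     # all terms appear close together
    return score

def rank(docs, terms):
    """Sort by descending score, breaking ties alphabetically by title."""
    return sorted(docs, key=lambda d: (-relevance(d, terms), d["title"].lower()))

# Hypothetical documents.
docs = [
    {"title": "Zeb's Tegu Page", "text": "tegu care guide for new owners"},
    {"title": "Monitor and Tegu Food", "text": "order food online today"},
    {"title": "Aardvark Tegu Notes", "text": "tegu care basics and feeding tips"},
]

ranked_titles = [d["title"] for d in rank(docs, ["tegu", "care"])]
```

The two care pages earn identical scores here, so the alphabetical tie-break alone pushes Zeb's page below the Aardvark page.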
Get More Hits By Understanding Search Engines
Knowing just the little bit above can give you ideas of how to give your page more exposure.
Hustle for Links
Most software agents find your site by links from other pages. Even if you have sent in your URL, your site can be indexed longer and ranked higher in search results if many links lead to your site. One of my sites that couldn't show up in the most casual search got most of its hits from links on other sites. Links can be crucial in achieving good exposure.
Use Titles Early In the Alphabet
All engines that I used displayed results with equal scores in alphabetical order.
Submit Your URL to Multi-Database Pages
It is best to use a multiple-database submission service such as SubmitIt! to save you the time of contacting each search service separately. Remember, it takes 6-8 weeks to become indexed.
Control Your Page's Summary
You can use the meta tag name="description" to stand out in search results. Appear in search summaries as "Experienced Web service, competitive prices" not "Hello and welcome. This page is about."
Search Reverse Engineering
Simulate your audience's search for your page (have all your friends list all the searches they might try), then see what you need to do to come up first on their search engine's results list.
Use the meta-tag name="keywords" to put an invisible keyword list at the beginning of your document that would match keywords your audience would use. Most search engines rate your page higher if keywords appear near the beginning.
How many times do the keywords appear in the text? It usually demonstrates good writing if you don't repeat the same words over and over. However, search engines penalize you for this, usually rating your page higher for repetitions of keywords, inane or not. Some authors combat this by putting yet more keywords at the bottom of their pages in invisible text. Look at the source code for this article, and you'll see what I mean; the words are just in the same color as the background.
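As a rough illustration of how an indexing program might read the description and keywords meta tags, the sketch below uses Python's standard html.parser module. The sample page, its title, and the MetaExtractor class are hypothetical; actual engines may treat these tags differently.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collects <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            content = attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

# Hypothetical page using both meta tags discussed above.
PAGE = """<html><head>
<title>Acme Web Design</title>
<meta name="description" content="Experienced Web service, competitive prices">
<meta name="keywords" content="web design, HTML, hosting">
</head><body>Hello and welcome. This page is about.</body></html>"""

parser = MetaExtractor()
parser.feed(PAGE)

# An engine that honors the tags would show this instead of the body's opening text.
summary = parser.meta.get("description", "")
keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",")]
```

Here the summary comes from the description tag, not from the "Hello and welcome" text at the top of the body.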
"Spamming" is net-lingo for spreading a lot of junk everywhere; keyword spamming is putting hidden keywords a huge number of times in your document just so yours will be rated higher by search engines.
Search engines typically limit you to 25 keywords or fewer, and one I know of truncates your list when it sees an unreasonable number of repetitions.
Invisible text at the end of your pages puts blank space there, which looks bad and slows loading. Services which rate pages will enjoy marking you down for this.
Responsible Keyword Use: If an important keyword doesn't appear at least four times in your document, I hereby give you the right to add invisible text until it appears a maximum of five times.
III. Getting the Most Out of Your Search Engine
Search Engine Features
Web location services typically specialize in one of the following: their search tools (how you specify a search and how the results are presented), the size of their database, or their catalog service. Most engines deliver too many matches in a casual search, so the overriding factor in their usefulness is the quality of their search tools. Every search engine I used had a nice graphical interface that allowed one to type words into their form, such as "(burger not cheeseburger) or (pizza AND pepperoni)." They also allowed one to form Boolean searches (except Hotbot as of 7/1/96, which promises to install this feature later), i.e. they allowed the user to specify combinations of words. In Alta Vista and Lycos, one does this by adding a "+" or a "-" sign before each word, or in Alta Vista you can choose to use the very strict syntax Boolean "advanced search." This advanced search was by far the hardest to use, but also the one most completely in the user's control (except for OpenText). In most other engines, you just use the words AND, NOT, and OR to get Boolean logic.
By far the best service for carefully specifying a search was Open Text. This form has great menus, making a complex Boolean search fast and easy. Best of all, this service permits you to specify that you want to search only titles or URLs. But then there's Alta Vista's little known "keyword" search syntax, now as powerful as OpenText, but not as easy to use. You can constrain a search to phrases in anchors, pages from a specific host, image titles, links, text, document titles, or URLs using this feature with the syntax keyword:search-word. There is an additional set of keywords just for searching Usenet. (To my knowledge, Alta Vista's keywords were undocumented before 7/19/96, so tell your friends you heard it here first!)
Which Search Page Should I Use When, and How?
| Use . . . | If You . . . | Using the Feature . . . |
| | have no good ideas for specific search strategies | best test results for broad search terms |
| | want to find someone's e-mail | |
| | have more than one broad search word, or can't pick a site from Lycos' summaries | best available results summaries |
| | want interactive news / want details on today's headlines | news with links to related sites |
| | want to search only document title or perform complex searches | title search specification, best advanced search interface |
| | are hunting for an image | |
| | want to find all the links to your page | +link:your_site -url:your_site syntax |
| | want the best national and international news | Reuters world headlines |
| | want a dictionary or other reference source | Dictionaries or Reference Libraries |
What could really make engines with large data bases shine, however, would be an improvement in the way they rank and present results. All engines I tested had ranking schemes that were not well documented, based on how many times your search words were mentioned, whether or not they appeared early in the document, whether or not they appeared close together, and how many search terms were matched. I did not find the ranking schemes very useful, as relevant and irrelevant pages frequently had the same scores.
Useful Non-Search Goodies
E-mail address books:
Most engines allow you to search for someone's name if you quote it "John Q. Webhead", but you have to be careful about exact spelling, use of initials, etc.
Yahoo! has the best news, in my humble opinion, as they have Reuters international news headlines. Most other news consists of ultra-brief summaries which read like "MacPaper."
Catalogs
I have only been disappointed by catalog services. In practice, they seem to aim for the lowest common denominator, and reflect very little thought to how and when they might be useful instead of search engines. All the ones I tested were directed toward novices and favored popular commercial sites. I would have thought they would be very good for finding software at least, but this was not the case. See the example below trying to find Web server related software.
Advanced or Boolean Queries
Making queries very carefully in Boolean terms to narrow a search rarely produces useful results for me (but see below). In practice, other ways of specifying a search besides detailed logic are much more useful. Specification of exact vs. approximate spelling, specification that search terms must appear as section headings or URLs, using more keywords, and just specifying the language of the document would have been more valuable in all of my search examples.
Example: Eliminating Unwanted Matches
The exception to this is the AND NOT operator - it is essential to exclude unwanted but close matches when they outnumber the desired matches. An example of when to use this operator is given by the problem of finding information on growing apples, because you will be deluged by information on Apple computers. With enough work, you can start to see apples with stems, not cords, but it isn't easy. Using Alta Vista, "+apple -mac* -comp* -soft* -hard* -vendor" got me information on the Payson-Santaquin apple farming region and a federal apple agriculture database on the first page of results.
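The effect of the "+" and "-" prefixes in the query above can be sketched as a simple filter. This is an illustration under stated assumptions, not Alta Vista's actual algorithm: the matches function, the case-insensitive word splitting, and the sample page texts are all hypothetical.

```python
import re

def word_matches(term, word):
    """Compare one query term to one word; a trailing * matches any suffix."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def matches(query, text):
    """Evaluate a +required / -excluded query against a page's text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    for token in query.lower().split():
        if token[0] not in "+-":
            continue                      # unsigned terms are optional in this sketch
        sign, term = token[0], token[1:]
        found = any(word_matches(term, w) for w in words)
        if sign == "+" and not found:
            return False                  # a required term is missing
        if sign == "-" and found:
            return False                  # an excluded term is present
    return True

query = "+apple -mac* -comp* -soft* -hard* -vendor"

# Hypothetical page texts.
pages = [
    "Apple Macintosh software and hardware vendors",
    "Growing apple trees in the Payson-Santaquin region",
    "Compare apple computers",
    "Pear orchards",
]

kept = [p for p in pages if matches(query, p)]
```

Only the orchard page survives. Broad exclusions cut both ways, though: in this sketch "-hard*" would also throw out a page about hardy apple varieties.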
Useful Search Features
• Find Images to Steal (Alta Vista)
I bet you will all use this at one time or another, so I insist you credit this article and webreference.com for this goodie: With Alta Vista, you can limit your search to image titles by using the format:
This was the only way I could find a useful picture of a nose for a physician's page - I had searched through jillions of clip art pages, and even contacted graphic artists, and they couldn't come up with anything as good as I found for free! USE THIS.
• Search for Strings in Titles (Alta Vista, OpenText) for faster results.
If applicable, this kind of search eliminates chaff by sticking to the pages that center on your subject, not ones that just mention a lexically related word. Use the syntax:
in Alta Vista, or just use the simple pull-down menus in OpenText's "advanced search mode."
• Find the Links to Your Own Site (Alta Vista)
Alta Vista claims that you can get all the links to your own site by searching with the keyword construction: +link:http://mysite.com/ -host:mysite in the Simple query
...I found that the most important link to one of my sites was missing from this search, so I was not impressed; however, my editor swears by this.
• Find the Number of Links to Your Own Site (Alta Vista)
For a more accurate estimate of the actual number of links to your site (or backlinks), use Alta Vista's advanced search, and display the results as a "count only." The above method will give you links, but only approximates their number; this method more accurately estimates the number of backlinks. (ABK-12-29-96)
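What a "+link: -host:" query and a "count only" display compute can be sketched over a small link table. The LINKS data, the example.* hosts, and the backlinks function are hypothetical; the sketch only mirrors the idea of keeping pages that link to your site while excluding pages on your own host.

```python
from urllib.parse import urlparse

# Hypothetical crawl output: page URL -> list of URLs it links to.
LINKS = {
    "http://mysite.com/index.html": ["http://mysite.com/about.html"],
    "http://mysite.com/about.html": ["http://mysite.com/"],
    "http://fans.example/links.html": ["http://mysite.com/", "http://other.example/"],
    "http://news.example/today.html": ["http://mysite.com/about.html"],
    "http://other.example/": ["http://fans.example/links.html"],
}

def backlinks(link_prefix, own_host):
    """Pages that link to link_prefix but are not hosted on own_host."""
    result = []
    for page, targets in LINKS.items():
        if urlparse(page).hostname == own_host:
            continue                      # the -host: part of the query
        if any(t.startswith(link_prefix) for t in targets):
            result.append(page)           # the +link: part of the query
    return sorted(result)

found = backlinks("http://mysite.com/", "mysite.com")
count = len(found)                        # the "count only" display
```

The count can only be as complete as the crawl behind it, which is consistent with the missing link reported above.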
Which is the Best Search Engine?
(It's not just how big your data base is, it's how you use it.)
To decide which search engine I would choose as the best, I decided that nothing but useful results would count. Previous articles have emphasized quantified measures for speed and database sizes, but I found these had little relevance for the best performance in actual searches. By now, all engines have great hardware and fast net links, and none show any significant delay time to work on your search or return the results. Instead, I just came up with a few topics that represented, I felt, tough but typical problems encountered by people who work on the net: First, I tried a search with "background noise", a topic where a lot of closely related but unwanted information exists. Next, I tried a search for something very obscure. Finally, I tried a search for keywords which overlapped with a very, very popular search keyword. I defined a search as successful only if the desired or relevant sites were returned on the first page of results.
Example - Search Terms Which Yield Too Many Matches
For the first type of search, I wanted to find a copy of Wusage to download, free software that lets you keep track of how often your server or a specific page is accessed, a common tool for HTML developers. This site is hard to find because the program produces output files on every machine running it, and those files have the string "wusage" in their title and text. When I simply typed "wusage" into search page forms, Infoseek and Lycos were the only engines to find the free version of the software I wanted. (Note I gave no credit for finding the version for sale. A careful search of the sale version's page did not produce any links to the free version's download site.) Infoseek's summaries were very poor, however, and all matches had to be checked.
Always Search As Specifically As Possible
Most engines failed to find their quarry because the search was too broad. After all, how is the engine supposed to know I want the free version? After spending a long time to find out the exact name of what I wanted, "wusage 3.2", Infoseek, Excite, Magellan, and Lycos all found the site I was interested in. Alta Vista, Hotbot, and OpenText yielded nothing of interest on their first page. Magellan came out the clear winner on this search, as the site summary was by far the best. (Asking Alta Vista to display a detailed version of the results didn't change things at all!) Infoseek and Excite performed well, but Lycos listed a much older version of wusage (2.4) first.
Think About Search Terms
It eventually occurred to me to search for "wusage AND free" to find the free copy of wusage. In some sense, Lycos was the winner this time because the free version was the first match listed; however, its summary was not very useful. While it did a better job than Infoseek, it didn't tell me whether each site was relevant or not. Magellan's response was very good, as it included a link leading to the software on the first page of matches, again with an excellent summary. Yahoo and Alta Vista also found it, but all these engines rated the fee version higher than the free version. OpenText did very well here, but only in advanced search mode where it was possible to specify that wusage must be in the title, and "free" could be anywhere in the text. Wusage3.2 was listed as the second of only two entries - no digging here! Excite failed to find the site at all, and HotBot found only 10 matches for statistics of a server in Omaha.
Curiously, a search for "download wusage" did not improve the results over the single-word searches for any of the search engines! (It may be time for rudimentary standardized categories to be used on the Web: e.g. this is a download archive, this is an information only site, this is an authoritative site, etc.) The lesson here may just be "if at first you don't succeed..."
Catalogs were not helpful. Yahoo!, under computers/software had nothing whatever to try for wusage: no http, no HTML, no wusage, not even servers. In Excite!, under computing/www/web ware, three more clicks got me to wusage, but -surprise!- I could not get to the free version. See why you don't want anyone else filtering your information?
The lessons from this search, which I have found repeated in other searches, are given in the "Examples: Summary . . ." box below.
Examples Summary: How To Improve Your Searches
The most valuable search tool is specific information
on a search. (In the search for wusage, I had no problems when I knew that version 3.2 was what I needed.)
Think about your search terms - the next most important search tool
Obviously, since I wanted the free version of wusage in the example, I should have searched for "free AND wusage"; I got nothing with just "wusage" with most engines.
Good site summaries save you time by saving you surfing
Use Magellan or OpenText if possible. To research the example above, I had to pore through dozens of pages. Only Magellan's summaries really gave me any confidence that I did not have to check every site.
Specify a "title only" search if applicable
Title only searches are available only with OpenText and Alta Vista. In the examples, a title search yields more practical results than coming up with lots of search words (as help pages suggest) or than forming logically complex search queries (as one might think). Adding more search words made the results above worse, not better. A Boolean search also did no better, e.g., "wusage AND (free or download)" yielded nothing from Alta Vista.
Searches Can Yield New Information, but they are never complete
None of my searches ever found the good page on tegu care that I know exists.
Example - Finding The Really Obscure
For this example, let's try to find out how to care for a "tegu", a South American lizard that is only moderately popular even among lizard enthusiasts. (If that's not an adequate example of obscure information, I don't know what is.) I know that a page exists called "TEGU INTRO" at http://www.concentric.net/~tegu/tegu.html, but we will simulate a blind search here. This search was full of surprises.
First I began by just searching for the string "tegu." Infoseek's first match was a tegu page I did NOT know about! Still, the one I wanted was not listed on the first page. Excite yielded nothing about tegus, only information on a vaguely related reptile, the "dwarf tegu." A search on the string "tegu care" yielded nothing relevant. (A search on their handy Usenet database did find the old tegu article I was looking for, three weeks old, which was no longer on my local news server. Other engines found this as well.) Lycos came up with the URL Infoseek found, plus two more; however, the additional listings were only pictures, not information. Searching for the string "tegu care" got nothing. Alta Vista found nothing useful either way, just ads for lizard food. OpenText found nothing, even when I searched for "tegu lizard." Hotbot found a picture of a tegu with "tegu care," but it did not return any relevant information with any search.
None of the searches I tried came up with the URL I knew about. The lesson here is that you can really find new things on the Web with search engines, but if you need to find a specific page, it will always be a crap shoot. Advanced searches yielded nothing more with any engine ("tegu in title AND (care or lizard)", etc.) Some way to require that the searches were only among English language documents would have been much more helpful. Some northern-European sounding language apparently has the word tegu in it, not referring to a lizard, and many foreign language pages fouled my results on some engines. Another feature that would really have made a difference would be a filter for sales pages -- most of the mentions of tegu on the net are ads for "Monitor and Tegu Food", containing no care information. As expected, Yahoo! and Excite! Catalogs were useless here as well.
Example - Selectivity: Apple Trees NOT Apple Computers
There are gobs of stuff on the net about Apple Computers, but what about growing apple trees? Surprisingly, this search was very easy! apple* alone always yielded lots of stuff about the computers, and one often had to add as many as five excluded terms (apple* -vendor* -hard* -soft* -comp* -mac*) before receiving any matches for apples you can eat. Surprisingly, however, just apple* tree* usually yielded detailed information on growing apple trees on the first page of results. The poorer results required one to increase the search command to apple* tree* grow*.
And The Winner Is. . .
I don't really want to pick a winner. . . All right, if you insist: The "Search Test Results . . ." table, below, lists the engines in order of their ranking. Lycos is therefore the official heavyweight search engine champion of the universe, based on the tests above. However, I think this is missing the point. As shown in the table, "Which Search Page . . . ?", above, you should choose different engines for different tasks. None of the engines tested were able to limit their searches to images except for Alta Vista. This engine must therefore surely be the best one for graphics designers if they are allowed to use only one, but for most other purposes, the user will have to wade through the mountains of chaff and drek to find what they want. It is more beneficial to use different engines for different tasks; at most only a few are required.
Search Engine Test Results
"One Item Among Many Related Pages" Test
"Obscure Item" Test
"Selectivity: Apple Trees Not Computers" Test
Found item with broad search word and exact name.
Found item first on results list with two search terms.
Found unknown item, but not known item.
Just apple$ tree$ yielded good results.
Returned the most relevant matches in the tests, but requires more time to check bad matches than Magellan.
Found item with broad search word and exact name.
Found item with two search terms.
Found unknown item, but not known item.
Just apple$ tree$ yielded good results.
Found wusage in title search
Good results with 2 or 3 terms, most useful with 3 terms due to superior summaries.
Ability to specify title searches very useful and user-friendly. Summaries very good.
Failed with approximate and exact words.
Found item low on first page with two search terms.
Good results with apple* tree* grow*.
Keyword searches for images, titles, etc. are very useful in other searches.
Found with exact name.
Found item low on first page with two search terms.
Required three search terms: apple* tree* grow*
Superior summaries always save you surf time.
Found with exact name, failed with two word search.
Required third search term: apple* tree* grow*, even then irrelevant results were first.
. . .
Failed all searches
Failed all searches
Found only images, and did worse when grow* was added!!!
Poorest Performer (excluding catalogs).
Excite! Catalog (not engine)
Failed all searches
Failed all searches
Failed all searches
Catalogs not at all useful.
Yahoo! Catalog (not engine)
Failed all searches
Failed all searches
Failed all searches
Catalogs not at all useful.
Different engines have different strong points; use the engine and feature that best fits the job you need to do. One thing is obvious; the engine with the most pages in the database IS NOT the best. Not surprisingly, you can get the most out of your engine by using your head to select search words, knowing your search engine to avoid mistakes with spelling and truncation, and using the special tools available such as specifiers for titles, images, links, etc. The hardware power for rapid searches and databases covering a large fraction of the net is yesterday's accomplishment. We, as users, are living in a special time when search engines are undergoing a more profound evolution, the refinement of their special tools. I believe that very soon the Web will evolve standards, such as standard categories, ways of automatically classifying information into these categories, and the search tools to take advantage of them, that will really improve searching. I think it's exciting to be on the Web in this era, to be able to watch all the changes, and to evolve along with the Web as we use it.
V. References and Recommended Reading
A fairly extensive list of search engines and related services appears on Netscape's Net Search Page but you should also look at Web Crawler and the many others that exist. Remember, a new, better engine could come on-line at any moment, and the underdogs need your support.
For an overly-techy article on search engines, try the IW labs review of engines, Internet World May 1996.
Keep software agents off your site by reading and usingÂ A Standard for Robot Exclusion.
The author gratefully acknowledges technical assistance from the very expert Opus One, the most knowledgeable and enjoyable people you will ever meet in this or any other business. This outfit is an excellent reference for anything having to do with computers or the Internet.
* * * *
Dr. Bruce Grossan, when he is not out climbing, hunts supernovae and gamma-ray burst counterparts at the University of California at Berkeley's Space Sciences Laboratory and Lawrence Berkeley National Laboratory. Lately he has also been exploring consulting on educational and business web projects, and writing The Great American Novel.
Welcome to Internet Detective - a free online tutorial that will help you develop Internet research skills for your university and college work. The tutorial looks at the critical thinking required when using the Internet for research and offers practical advice on evaluating the quality of web sites.
Who is the tutorial for?
It's designed to help students in higher and further education who want to use the Internet to help with research for coursework and assignments.
What does the tutorial cover?
The tutorial is divided into the following sections:
What's the Story? - understand the advanced Internet skills required for university and college work.
The Good, The Bad and The Ugly - see why information quality is an issue on the web, especially for academic research. Learn how to avoid time wasting on Internet searching, scams and hoaxes.
Detective Work - get hints and tips that help to critically evaluate the information you find on the Internet.
Get On the Case - try out your Internet Detective skills with these practical exercises.
Keep the Right Side of the Law - be warned about plagiarism, copyright and citation.
What does the tutorial involve?
You can work through the whole tutorial by selecting the next button at the bottom of each screen, or use the table of contents in the left margin to skip to a section.
The tutorial will take around an hour to complete, but you can do it in more than one sitting.
If you get stuck, use "HELP" at the top of the page.
OK, let's get on the case!
What's the Story?
University and college work requires some advanced Internet skills
Use this section of the tutorial to learn:
Why students fail if they use the Internet badly
About the potential pitfalls of using the Internet indiscriminately for research
Why you need to step up your Internet skills at university and college
Picture the scene
You've just spent a week working hard on a piece of coursework. You spent ages doing the research and found loads of information on the Internet.
You have high hopes for a good grade. Then all of a sudden, BANG, you get a fail!
What went wrong?
You scan your feedback comments ...
It seems your lecturer is not happy with the references you used.
Apparently you missed out all the key sources of information that you should have used. They ask why you didn't refer to your reading list or any resources from the library.
Some of the sources you quote are inappropriate - they were looking for academic sources such as journal articles, rather than random web sites.
They are also unhappy with the content of some of the sites you quote - there was a lot of bias and you don't give both sides of the argument. Much of the information you cited was out of date and downright inaccurate!
They also warn you to watch out where the information you use is coming from - all the sources you used were from the USA and you missed out all the European research in this area.
But perhaps most embarrassing - apparently you're not allowed to "cut and paste" text from web sites into your assignments - it's plagiarism - unless you use proper citation methods - so you get an outright fail!
Your lecturer suggests you brush up on your Internet research skills.
What does this mean?
University and college students sometimes fail assignments or get poor marks in their coursework because they have used the Internet in ways that are inappropriate for work at this level.
You may have used the Internet to help with school work or personal research but you can't necessarily rely on the same web sites and skills to get you through higher or further education.
Repeating information from a single source (eg. a text book, encyclopaedia or Web site) is not likely to get you very far.
Common mistakes made by students:
They rely on Internet searches for their research and ignore other key sources
They don't critically evaluate the quality of the information they find
They copy information from the Internet and don't acknowledge their sources
At university or college you will need to take your Internet research skills to the next level
At this level of your education you will be expected to:
Be able to do your own independent research
Locate and use a wide range of information sources
Critically evaluate the information you find
Synthesize information to form your own original piece of work
Present a balanced and well-informed argument leading to your own conclusions.
You should take full advantage of your reading list, course materials and library resources. You might also be tempted to turn to the wider web, in which case you need to tread very carefully.
You will need to develop some advanced Internet research skills.
This tutorial can help!
In this section we have looked at how developing your Internet research skills can help you succeed in your university and college work.
OK, so let's look at some specific Internet Detective skills ...
The Good, The Bad and The Ugly
The quality of information on the Internet is extremely variable.
At best the Internet is a great research tool; at worst it can seriously degrade your work by feeding you misinformation.
Use this section of the tutorial to learn about:
The good: academic publishing on the Internet
The bad: time wasting on Internet searches
The ugly: Internet hoaxes, scams and legends
The good news is that many sources of authoritative research information now publish on the Internet.
In the academic world it is considered very important that new research builds upon past research and that the quality of information is assured. There are formal processes to facilitate this, and it's essential you understand these if you are to succeed at university.
Let's look at some of the information sources that are traditionally used to support academic research and at how these are increasingly available online ...
The Academic publishing process
Academics usually publish their research in formal publications such as journal papers and articles or reports. These follow formal procedures designed to quality-assure the work.
Peer review / refereeing
Peer review is what characterises academic research. If a publication is peer reviewed it means it has been read, checked and authenticated (reviewed) by independent, third party academics (peers). Peer review has been the quality-control system of academic publishing for hundreds of years.
Peer reviewed articles are often collated into scholarly journals, which are usually published by academic publishing houses, professional societies or university presses. Journals will be a key source of information for you at university - you will be expected to reference articles from them in your work.
A university library may have shelves full of journals, but nowadays many are also available in electronic form over the Internet. Ask your lecturers or librarians how to find and use the key journals for your subject - the sooner you do this the quicker you will succeed in your research.
Library eJournal services
Access to eJournals is not usually free - a subscription has to be paid. However, a university library will have paid some subscriptions for its users - who can then get free access to these journals via their library web services, using a special password (check with your library for details).
If you can't get access to eJournals from your library you may be able to via the publisher's web services. Some offer "pay-per-view" which means you pay a small fee for each article you view.
Increasingly academics are offering free access to their refereed journal articles (and sometimes other material) by means of databases accessible via the Web called Institutional Repositories (IRs).
Most academics rely on specialist databases to access details of past research. The databases draw together details of scholarly publications from a wide range of sources including academic publishers, journals, archives and sometimes books, and so enable you to search a large body of the scholarly literature in one go.
Academic web directories
Of course a lot of information on the web can be useful for research even if it hasn't come from the traditional sources. Academic web directories, such as Intute, guide you to the best online resources for research - and each resource has been selected and reviewed by a subject specialist.
Library web sites
The library web site for your university or college will be an important source of information for you, as it will quickly guide you to the key electronic journals, bibliographic databases and archives that you should be using for your research.
Ask your lecturers and librarians for advice on which sources you should be using.
The bad news is that the Internet also leads to a lot of information that is completely inappropriate for your research, and it takes time and skill to weed this out.
The quality of information on the Internet
As things stand the Internet has no standard system of quality control so it's important to be careful about which information you use and not to trust everything you read.
Think about it - the Internet links millions of computers:
Anyone can put something on the Internet - an amateur or an expert
From anywhere in the World - be it the United Kingdom or Uruguay
They can say anything they like - be it true or false
And leave it there as long as they like - even if it goes out of date
Or change it without warning - perhaps even remove it completely
There is a danger that the information you find on the Internet will:
Be from a source that is unreliable, lacking in authority or credibility
Have content that is invalid, inaccurate, out-of-date
Not be what it seems!
Weeding out poor quality information takes time
Most people use very simple search techniques when they want to find information on the Internet using a search engine such as Google.
These can produce thousands if not millions of web sites to explore: some information will be useful, some will be useless - it's up to you to discern which is which!
It can take considerable time and skill to sift through search engine results and evaluate which are the best sources.
Although it may seem a quick and easy option to turn to a search engine for your research, it might be more effective to turn to web services designed specifically for university and college research such as your library web site.
It's easy to miss key information
If you want to find something on the Internet, you go to a search engine, as they contain everything that is available online, right? Wrong!
Search engines only cover a proportion of what is available online; a lot of information is hidden or invisible to them. For example, some of the databases of research literature that we discussed earlier will not appear in search engine results, especially if they require a subscription or password to get access.
It's also worth remembering that search engines only search information that is online, and of course a huge body of research literature is still only available in print form in books and journals.
If you try doing the same search in different search engines you will get a different set of results on each search engine - which reveals that none of them index the whole Internet.
Try this to compare search engines
It's a common misconception that search engines (such as Google) search everything - they don't - so if you rely on them alone you may miss some of the key sources for your research - consider using other sources too, such as your library catalogue, other databases and academic web search tools.
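To see how different two engines' results really are, you can measure the overlap between them. The sketch below is one simple way to do that in Python; the function names and the example.org addresses are invented for illustration, and you would paste in the result URLs you actually get from each engine.

```python
def overlap(results_a, results_b):
    """Share of distinct URLs that both engines returned (Jaccard overlap)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def only_in(results_a, results_b):
    """URLs the first engine returned that the second did not, in original order."""
    seen = set(results_b)
    return [url for url in results_a if url not in seen]

if __name__ == "__main__":
    # Made-up result lists for the same query on two engines
    engine_one = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
    engine_two = ["http://example.org/b", "http://example.org/c", "http://example.org/d"]
    print(overlap(engine_one, engine_two))
    print(only_in(engine_one, engine_two))
```

A low overlap is a reminder that each engine sees only part of the web, so a source missing from one set of results may still exist.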
At worst the Internet can lead you to misinformation that could land you in real trouble.
Unfortunately there are a lot of sharks on the Internet - people who want to trick you, misinform you, deceive you and defraud you. Some web sites and emails can be real crime scenes.
Be sceptical, not paranoid!
This page will highlight some classic cases of misinformation on the Internet: Internet hoaxes, urban legends, scams and hate sites.
You need to develop some healthy scepticism when using the Internet for research but there's no need to get paranoid - we've already seen that there's plenty of good stuff out there too. OK, let's get ugly ...
Some web sites are fakes designed to be spoofs, parodies or jokes. This is fine as long as you realise it's a fake and don't take it at face value!
Hoaxes are often about famous people, politics, products or organisations. Their content is humorous and the fact that they are not 'real' sites can be easy to spot. Some sites even include a disclaimer, just in case you don't get the joke, freely admitting that the web site is a hoax.
See an example spoof
This mirroring of the web design is a clever trick to deceive you into thinking you have accessed the real site. In some cases the design is so like the original that you have to look very carefully to determine whether it is real or fake.
Sometimes fake web sites are designed to make a more serious point, be it political or educational.
See an example parody site
Urban legends can be harmless but only if you realise they are not actually true!
What are urban legends? They are stories or rumours that have been circulated from person to person. In the past they were spread by word of mouth but now are often spread via email or web sites. Some may originally have contained elements of truth, but have become distorted by mistakes being made in the retelling. Others have been complete fabrications from the start.
Warning: if an email contains a phrase like: "Please, send this message to as many people possible!!!!" it should alert you to the idea that you may be looking at an urban legend and so the last thing you should do is forward the email to anyone.
The Internet is awash with false information, which people endlessly forward on to others believing it to be true. It becomes spam that clogs up the networks and people's email, misinforms them and wastes their valuable time.
See some examples of urban legends
Scams and frauds
Scams and frauds are more serious as they involve criminals trying to steal your identity or con you out of your cash.
The Office of Fair Trading describes SCAMS as:
Their advice is that "If it looks too good to be true it probably is!"
See some examples of scams
Sadly, the Internet can reflect the worst side of human nature and is sometimes used for defamation or to advocate hate, violence and hostility.
Some web sites with malicious intent have become known as Hate Sites because they disseminate such information. This could be about a person, an organisation, a religion, a political viewpoint - the list is endless.
See an example of a hate site
How do you spot the fakes?
A number of web sites exist to expose fake sites and frauds.
If you are unsure if a site is genuine then check these sites to see if it is listed there as a fake. A quick search here could save you a lot of embarrassment!
Snopes [http://www.snopes.com/] is a really great site for checking out anything you think might be an urban legend, hoax or scam. It keeps a huge archive of examples of urban legends, myths and hoaxes - so if you do have suspicions about an email check this site to see if it is a hoax.
The Office of Fair Trading: Advice on Scams [http://www.oft.gov.uk/oft_at_work/consumer_initiatives/scams/] gives the official line on what to do if you become a victim of Internet fraud and has good advice on how to spot scams and frauds.
Scambusters [http://www.scambusters.com/] gives information about how to avoid becoming a victim of identity theft, or of frauds such as pyramid selling, or money laundering scams.
Remember, it's up to you to make sure you don't degrade your work by quoting misinformation from the Internet. If in doubt, leave it out!
Q1. What is the traditional quality control system for work published by academics?
Peer review
Proof reading
Publishing research
Q2. You've just been set an assignment. Where should you start looking for sources?
A search engine
The library web site
Q3. What should you do if you are unsure whether a web site you are thinking of using as a source for your work is genuine?
Use it in your work anyway - your tutor probably won't notice
Look to see whether it is listed on a web site where hoaxes are posted
Leave it out of your work
In this section we have looked at the good, the bad and the ugly for Internet research:
The good: academic publishing on the Internet
The bad: wasting time on Internet searching
The ugly: Internet scams and frauds, urban legends and myths
It's often up to you to discern which is which!
The next section of the tutorial will help you do just this ...
In this section we will look at some practical steps you can take to critically evaluate information you find on the Internet.
It can pay to think like a detective:
Take a case-by-case approach
Ask questions (who, what, where) and look for clues
Weigh up the evidence to make a judgement
Case by Case
"Quality is in the eye of the beholder" - you need to take a case-by-case approach to evaluating information.
The value of information is subjective as different information will be appropriate in different circumstances - it all depends on what you need it for.
For example, if you are doing formal scientific research you will probably want to rely on peer-reviewed articles that have been validated and checked by qualified scientists.
If you are writing an essay on something like popular culture or political bias it might be appropriate to reference informal or primary sources that represent different points of view and to discuss the strengths and weaknesses of these.
The key is to be clear about your purpose: decide what types of sources would be acceptable to use in light of this, and then weigh up any information you find accordingly.
What information do you need?
What are the best sources of this information?
What type of Internet resources (if any) would be worth looking for?
If you don't know what you are looking for on the Internet you are likely to spend a lot of time drifting aimlessly through cyberspace - so save time by deciding exactly what you're trying to find before you start searching!
Check which sources your lecturers are happy for you to use - do they want you to stick to your reading list or library resources or are they happy for you to search the wider web?
Once you know what you're looking for you can get on the case.
The phrase "don't judge a book by its cover" also applies to web sites.
You need to question the quality of information you find on the Internet before you use it in your research.
A novice searcher will make judgements based purely on the look and feel of the site.
An expert researcher will make judgements based on the content of the site, and the credibility of the source of the information.
There is a simple line of questioning that can help:
On the WWW ask WWW: Who? What? Where?
Who? - question the source of information
What? - question the content of information
Where? - question the location of the information
Can you trust your sources? You will need to establish their credibility, reliability and authority.
Authors, publishers, sponsors and developers will all impact on the reliability and credibility of the content of the information.
It's important to identify who is providing the information and to consider whether they can be relied on to provide the information you need.
Remember, your search results might list:
Scholarly journals next to tabloid news.
Peer-reviewed articles next to vanity publishing.
The site of a Nobel prize winning scientist next to that of an Internet quack.
Detective work on sources
You need to identify and verify your sources.
Who is the author?
Who is the publisher?
Who sponsored or funded the site?
Do you recognise them as an authoritative source?
What are their credentials, qualifications, background and experience?
Has the information been edited or peer reviewed?
Are the sources trustworthy?
What are their motives for publishing the information?
What standpoint do they take: impartial? Biased?
Do other Internet sources that you trust link to this site?
Look for clues
To gather evidence look for:
Author details - is there a biographical statement that lists their job title, contact details, qualifications and publications? Is this on the Web site of their employer or is it their own personal web site?
Details about the publisher, sponsor or developer of the site.
The About Us section, Mission Statement or Help - these might help establish their history, affiliations and standpoint.
The Contact Details - is there a physical address which verifies claims of authorship?
Photographs of the author or offices of the organisation.
A Copyright Statement to help establish the owner.
Consider how you came by the site - was it a link from a trusted source?
The URL (more on this later in this section).
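The URL clue in particular can be read mechanically. As a minimal sketch, Python's standard urllib.parse module can pull out the host name and its final label, which hint at who is serving a page and from where. The function name url_clues and the address in the example are hypothetical, and a host name alone never proves that a source is trustworthy.

```python
from urllib.parse import urlparse

def url_clues(url):
    """Extract a few basic clues about where a page is served from."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []
    return {
        "scheme": parts.scheme,                        # e.g. 'http'
        "host": host,                                  # the computer serving the page
        "top_level": labels[-1] if labels else "",     # e.g. 'uk', 'com', 'edu'
    }

if __name__ == "__main__":
    # Hypothetical address used only for illustration
    print(url_clues("http://www.example.ac.uk/staff/profile.html"))
```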
Tips on checking your sources
On the Internet the source of the information may not always be made explicit but in academic work you must be able to cite your sources. Always look for statements of authorship. Is there any information about their qualifications, their position or who they work for?
If you've never heard of the sources try doing a quick Internet search on their name. Does Google tell you more about their credentials?
You can check to see if the author has published anything else by conducting a search on a relevant bibliographic database.
If you are quoting information taken from the web site of an organisation, always check that it is a reputable body. Look to see if it is listed in any of the directories of associations or organisations that you will find in your local library. Check if it quotes support or sponsorship from any other established bodies.
Be wary of contact details that give you a PO box number as an address or which offer a premium rate phone number - these are common tactics used by Internet fraudsters.
If the sources are not disclosed - consider rejecting the information.
Can you trust the content of what you see?
You will need to establish its coverage, validity, accuracy and currency.
If the material presented is inaccurate, untrue, illogical, or out-of-date then it's unlikely to be a lot of use for serious research.
It's important for you to evaluate the content of the information you find and think critically about the arguments, assertions, facts and data that are presented - are they of sufficient quality for your needs?
Remember, your search results might list:
Scientific facts next to unfounded opinions.
Professional advice next to idle gossip.
The latest research next to last year's news.
Detecting the value of information content
Are the arguments and conclusions valid, ie. well founded in logic or truth?
Does the author back up any claims with reliable third-party support (eg. citations, references, research data and source material)?
Is there a balanced argument or is it one-sided?
Do you agree with the conclusions it draws?
Is the information accurate, or can you spot errors (eg. typographical errors or broken links)?
Is the information current - or might it be out of date or superseded by more recent publications? Is there a "last-updated" date?
Is the coverage sufficient? Does it include all the aspects of the subject that you need in enough breadth or depth?
Is the level of the site appropriate? Does it treat the subject at the level you require or is it an introductory guide that is too basic?
Is it complete - is it available in full or has it been abridged?
Is it a commentary or an original text? A primary or secondary source?
Is it fact or opinion?
Are there adverts everywhere, that might make you question the motives of the online publication?
Look for clues
Take time to gather evidence about the content. Look for:
Bias and controversial statements that are unsubstantiated - use your own knowledge to question content and if it goes against what you know then look for evidence to back it up.
Research evidence - to back up the arguments and assertions presented (eg. look for good quality research methods, research data and reviews of past literature in the field).
Proper references - especially in academic works - these should follow conventional citation practices and come from authoritative sources.
Mistakes and inaccuracies - if you spot any of these it should be a cause for concern: an editor or reviewer should have picked these up, so maybe the work hasn't been properly checked and cannot be relied upon in other ways.
Dates - for when it was written, published and last updated - how useful is it for your purposes?
Tips on checking the content
Site maps, Content pages and About Us statements - they often tell you the scope and coverage of the work.
You will need to cite the title of the work and the date it was published in your references so make sure you can find these.
If you are looking for current news headlines or the most recent version of an article it is important that you are seeing the most up-to-date information.
If the site offers something for nothing, asks you to send money to claim a free gift or prize, or asks for your bank account details it's probably a scam!
It has been said that the Internet can be used to find evidence to support any argument, but it's up to you to make sure that the evidence will stand up in court (well, stand up to the critical eye of your lecturers as they mark your coursework!)
If in doubt, leave it out!
Do you know where your information is coming from?
You will need to establish the location and origin of the information.
Which part of the World is your information coming from and on whose computer is it located?
Remember, information on the Internet might be based on a computer anywhere in the wo