The Web Evolved Beyond FTP English Language Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The Web evolved beyond FTP archives not just by becoming a graphically rich multimedia world, but by evolving tools that made it possible to find and access this richness. Oldsters like this author remember that before browsers there was WAIS (released 1991), whose XWAIS version provided a user-friendly graphical interface for finding information. However, this system required servers to organize information according to a specific format. Gopher, another information-serving system with some user-friendliness, was released the same year. Lycos, one of the earliest search engines of the kind we use today, began in the spring of 1994 when John Leavitt's spider (see below) was linked to an indexing program by Michael Mauldin. Yahoo!, a catalog, became available the same year. Compare this to the appearance of NCSA Mosaic in 1993 and Netscape in 1994.

Today there are a score or more of "Web location services." A search engine proper is a database plus the tools to generate that database and search it; a catalog is an organizational method and related database plus the tools for generating it. Some sites, however, try to be a complete front end for the Internet: they provide news, libraries, dictionaries, and other resources beyond a search engine or a catalog, and some of these can be really useful. Yahoo!, for example, emphasizes cataloging, while others such as Alta Vista or Excite emphasize providing the largest search database. Some Web location services do not own any of their search engine technology; other services are their main thrust, and companies such as Inktomi (named after a Native American word for spider) provide the search technology. These Web location services have put amazing power into every user's hands, making life much better for all of us . . . and it's all free, right?

. . . Maybe not. It is rumored that these information companies might increase their revenues by selling information: information about you. After you use a search engine and find a page with mutual fund quotes, you might find yourself suddenly receiving e-mail advertising investments. Think this is a coincidence? Think again. The investment company could have paid a search engine for your e-mail address. The sale of such information is not advertised at this time; however, there is an existing protocol for servers to ask a user's browser for such information, which is routinely entered during setup. Get scared about your privacy by checking out the Anonymizer snoop page. For best results, search for the Anonymizer snoop page, "I can see you," then go to it from your search engine (you'll see what I mean). For now, let's stick to the practical aspects of search engines, catalogs, and Web location services.

II. How Software Agents and Search Engines Work

There are at least three elements of search engines that I think are important: information discovery and the database, the user search, and the presentation and ranking of results.

Discovery and Database

A search engine finds information for its database by accepting listings sent in by authors wanting exposure, or by getting the information from its "Web crawlers," "spiders," or "robots": programs that roam the Internet storing links to and information about each page they visit. Web crawlers are a subset of "software agents," programs with an unusual degree of autonomy that perform tasks for the user. How do these really work? Do they step through the net one IP address at a time? Do they store all, or most, of what is on the Web?

According to The WWW Robot Page, these agents normally start with a historical list of links, such as server lists and lists of the most popular or best sites, and follow the links on these pages to find more links to add to the database. This makes most engines, without a doubt, biased toward more popular sites. A Web crawler could send back just the title and URL of each page it visits, or just parse some HTML tags, or it could send back the entire text of each page. Alta Vista is clearly hell-bent on indexing anything and everything, with over 30 million pages indexed (7/96). Excite actually claims more pages. OpenText, on the other hand, indexes the full text of fewer than a million pages (5/96), but stores many more URLs. Inktomi has implemented HotBot as a distributed computing solution, which they claim can grow with the Web and index it in its entirety no matter how many users or pages there are. By the way, in case you are worrying about software agents taking over the world, or your Web site, look over the Robot Attack Page. Normally, "good" robots can be excluded by a small robots.txt file on your site that follows the Robot Exclusion Standard.
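The crawl described above (start from a seed list, follow links breadth-first, honor the exclusion standard) can be sketched in Python. This is a minimal illustration under stated assumptions, not how any engine named here worked: pages come from a hypothetical in-memory `fetch` function rather than the network, only each page's title and URL are stored (the lightest option mentioned above), and robots.txt rules for a single site are checked with the standard library's `urllib.robotparser`. The names `crawl`, `PageParser`, and `ExampleBot` are made up for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class PageParser(HTMLParser):
    """Collects a page's <title> text and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seeds, fetch, robots_lines, agent="ExampleBot", limit=100):
    """Breadth-first crawl from a seed list; returns {url: title}.

    fetch(url) -> HTML string or None is a hypothetical stand-in for HTTP,
    and robots_lines holds one site's robots.txt, to keep the sketch short.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    index, queue, seen = {}, deque(seeds), set(seeds)
    while queue and len(index) < limit:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # a "good" robot honors the exclusion file
        html = fetch(url)
        if html is None:
            continue
        page = PageParser()
        page.feed(html)
        page.close()
        index[url] = page.title.strip()  # store only title and URL
        for href in page.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical three-page site; /private/ is excluded by robots.txt.
WEB = {
    "http://example.com/": "<title>Home</title>"
    "<a href='/a.html'>A</a> <a href='/private/b.html'>B</a>",
    "http://example.com/a.html": "<title>Page A</title><a href='/'>home</a>",
    "http://example.com/private/b.html": "<title>Secret</title>",
}
ROBOTS = ["User-agent: *", "Disallow: /private/"]
print(crawl(["http://example.com/"], WEB.get, ROBOTS))
# {'http://example.com/': 'Home', 'http://example.com/a.html': 'Page A'}
```

Because the queue is first-in, first-out, pages linked from the seed list are visited before pages found deeper in the site, which is one way the popularity bias described above arises.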

It seems unfair, but developers aren't rewarded much by location services for sending in the URLs of their pages for indexing. The typical time from sending your URL in to getting it into the database seems to be 6-8 weeks. Not only that, but a submission for one of my sites expired very rapidly, no longer appearing in searches after a month or two, apparently because I didn't update it often enough. Most search engines check their databases to see if URLs still exist and to see if they are recently updated.

User Search

What can the user do besides typing a few relevant words into the search form? Can they specify that words must be in the title of a page? What about specifying that words must be in a URL, or perhaps in a special HTML tag? Can they use logical operators like AND, OR, and NOT between words?

Query Syntax Checklist

How does your engine handle:

Truncation, Pluralization & Capitalization:

Macintosh, Mac, Macintoshes, Macs, macintosh, macintoshes, mac, and macs could all yield different results. Most engines treat lower case as unspecified (matching any capitalization), while upper case matches only upper case, but there are exceptions. There is no standard at all for truncation, and worse yet, it probably differs between basic and advanced search modes on every engine.

Multiple Words

Does the engine logically AND them or OR them?

Phrases

Typically one puts quotes around a phrase so that each word in the phrase is not searched for separately.
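The checklist's questions about multiple words and phrases come down to different matching rules. The toy Python sketch below, over plain strings, shows AND, OR, and quoted-phrase matching, plus one hypothetical way of combining AND and OR results; the function names are illustrative and no particular engine's behavior is implied.

```python
import re


def words(text):
    """Lower-cased words of a page or query; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_and(query, text):
    """AND: every query word must appear somewhere in the text."""
    have = set(words(text))
    return all(w in have for w in words(query))


def match_or(query, text):
    """OR: at least one query word must appear."""
    have = set(words(text))
    return any(w in have for w in words(query))


def match_phrase(phrase, text):
    """Quoted phrase: the words must appear consecutively and in order."""
    target, body = words(phrase), words(text)
    n = len(target)
    return any(body[i:i + n] == target for i in range(len(body) - n + 1))


def and_then_or(query, docs):
    """Hypothetical blend: pages with every word first, then pages with only
    some of them. Python's sort is stable, so ties keep their input order."""
    hits = [d for d in docs if match_or(query, d)]
    return sorted(hits, key=lambda d: not match_and(query, d))


docs = ["apple tree care", "care of a tree", "apple computers"]
print([d for d in docs if match_and("apple tree", d)])  # ['apple tree care']
print(match_phrase("tree care", "Apple tree care"))     # True
print(match_phrase("care tree", "Apple tree care"))     # False
print(and_then_or("apple tree", docs))
# ['apple tree care', 'care of a tree', 'apple computers']
```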

. . . Check your engine's help file before starting a search. Most engines let you type in a few words and then search for occurrences of these words in their database. Each has its own way of deciding what to do about approximate spellings, plural variations, and truncation. If you just type words into the "basic search" interface on the search engine's main page, different engines can also bind the words together with different logical expressions. Excite actually uses a kind of "fuzzy" logic, searching for the AND of multiple words as well as the OR of the words. Most engines have separate advanced search forms where you can be more specific and build complex Boolean searches (every engine mentioned in this article except HotBot). Some search tools parse HTML tags, allowing you to look for words specifically in links, or in a title or URL, without considering the text on the page.

By searching only in titles, one can eliminate pages with only brief mentions of a concept, and only retrieve pages that really focus on your concept.

By searching links, one can determine how many and which pages point at your site. Understanding what each engine does with its non-standard pluralization, truncation, etc. can be quite important to how successful your searches will be. For example, if you search for "bikes" you won't get "bicycle," "bicycles," or "bike." In this case, I would use a search engine that allowed "truncation," that is, one that allowed the search word "bike" to match "bikes" as well, and I would search for "bicycle OR bike OR cycle" ("bicycle* OR bike* OR cycle*" in Alta Vista).
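Truncation and capitalization rules like these can be written down as a small matching function. The rules in this Python sketch (a trailing * for truncation, lower case matching any case, capitals matching exactly) follow the common behavior described in the checklist above, but they are assumptions; as noted, engines differed.

```python
import re


def term_matches(term, word):
    """Does one query term match one word of a page?

    Assumed rules, since engines differed: a trailing "*" means truncation
    (prefix match); an all-lower-case term matches any capitalization, while
    a term containing capitals must match the word's case exactly.
    """
    truncated = term.endswith("*")
    stem = term[:-1] if truncated else term
    if stem == stem.lower():
        word = word.lower()  # lower case is treated as "unspecified"
    return word.startswith(stem) if truncated else word == stem


def or_query_matches(query, text):
    """True if any term of a query like "bicycle* OR bike*" matches the text."""
    terms = [t for t in query.split() if t.upper() != "OR"]
    page_words = re.findall(r"[A-Za-z0-9]+", text)
    return any(term_matches(t, w) for t in terms for w in page_words)


print(term_matches("bikes", "bike"))   # False: no truncation, no stemming
print(term_matches("bike*", "Bikes"))  # True
print(term_matches("Mac", "mac"))      # False: capitals must match exactly
print(term_matches("mac", "Mac"))      # True
print(or_query_matches("bicycle* OR bike* OR cycle*",
                       "Cycle touring: choosing a bicycle"))  # True
```

Truncation cuts both ways: under these rules "mac*" also matches "machine" and "macaroni", so a truncated term can pull in unrelated words as easily as plurals.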

Presentation & Ranking

With databases that can keep the entire Web at the fingertips of the search engines, there will always be relevant pages, but how do you get rid of the less relevant and emphasize the more relevant?

Most engines find more sites from a typical search query than you could ever wade through. Search engines give each document they find a relevance score, some measure of the quality of its match to your search query. Relevance scores reflect the number of times a search term appears, whether it appears in the title, whether it appears at the beginning of the document, and whether all the search terms are near each other; some details are given in the engines' help pages. Some engines allow the user to control the relevance score by giving different weights to each search word. One thing that all engines do, however, is use alphabetical order at some point in their display algorithm. If relevance scores are not very different for various matches, then you end up with this sorry default, and Zeb's [Whatever] page will never fare very well, regardless of the quality of its content. For most uses, a good summary is more useful than a ranking. The summary is usually composed of the title of a document and some text from the beginning of the document, but can include an author-specified summary given in a meta tag. Scanning summaries really saves you time if your search returns more than a few items.
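A toy version of such a score, with an alphabetical tie-break, shows how Zeb's page loses out. The factors follow the list above; the weights, the 20-word "beginning," and the 10-word "nearness" window are arbitrary assumptions, not any engine's published formula.

```python
def relevance(doc, terms):
    """Toy relevance score built from the factors listed above.
    Terms are expected in lower case; all weights are arbitrary assumptions."""
    body = doc["text"].lower().split()
    title = doc["title"].lower().split()
    score = 0.0
    for term in terms:
        score += body.count(term)  # how often the term appears
        if term in title:
            score += 5             # the term appears in the title
        if term in body[:20]:
            score += 2             # the term appears near the beginning
    positions = [body.index(t) for t in terms if t in body]
    if positions and len(positions) == len(terms) and max(positions) - min(positions) < 10:
        score += 3                 # all terms appear close together
    return score


def rank(docs, terms):
    """Highest score first; equal scores fall back to alphabetical title order."""
    return sorted(docs, key=lambda d: (-relevance(d, terms), d["title"]))


docs = [
    {"title": "Zeb's Widgets", "text": "widgets for sale"},
    {"title": "Acme Widgets", "text": "widgets for sale"},
]
print([d["title"] for d in rank(docs, ["widgets"])])
# ['Acme Widgets', "Zeb's Widgets"]
```

Both pages score the same, so the title decides the order; that is the "sorry default" in action.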

Get More Hits By Understanding Search Engines

Knowing just the little bit above can give you ideas of how to give your page more exposure.

Hustle for Links

Most software agents find your site by links from other pages. Even if you have sent in your URL, your site can be indexed longer and ranked higher in search results if many links lead to your site. One of my sites that couldn't show up in the most casual search got most of its hits from links on other sites. Links can be crucial in achieving good exposure.

Use Titles Early In the Alphabet

All engines that I used displayed results with equal scores in alphabetical order.

Submit Your URL to Multi-Database Pages

It is best to use a multiple-database submission service such as SubmitIt! to save you the time of contacting each search service separately. Remember, it takes 6-8 weeks to become indexed.

Control Your Page's Summary

You can use the meta tag name="description" to stand out in search results. Appear in search summaries as "Experienced Web service, competitive prices" not "Hello and welcome. This page is about."
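To see why the description tag matters, here is a rough Python sketch of how a summary might be assembled: the title plus the meta description if one exists, otherwise the opening words of the page. That fallback rule follows the description of summaries earlier in this article; the class and function names and the 60-character cutoff are hypothetical.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Gathers the title, the meta description, and the visible body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.text = []
        self._where = None  # inside <title>, <script>, or <style>?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag in ("title", "script", "style"):
            self._where = tag

    def handle_endtag(self, tag):
        if tag == self._where:
            self._where = None

    def handle_data(self, data):
        if self._where == "title":
            self.title += data
        elif self._where is None:
            self.text.append(data)


def summary(html, length=60):
    """Title plus meta description if present, else the page's opening text."""
    p = SummaryParser()
    p.feed(html)
    p.close()
    blurb = p.description or " ".join(" ".join(p.text).split())[:length]
    return f"{p.title.strip()}: {blurb}"


page = """<html><head><title>Zeb's Web Design</title>
<meta name="description" content="Experienced Web service, competitive prices">
</head><body>Hello and welcome. This page is about...</body></html>"""
print(summary(page))
# Zeb's Web Design: Experienced Web service, competitive prices
print(summary(page.replace('name="description"', 'name="author"')))
# Zeb's Web Design: Hello and welcome. This page is about...
```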

Search Reverse Engineering

Simulate your audience's search for your page (have all your friends list all the searches they might try), then see what you need to do to come up first on their search engine's results list.

Use the meta-tag name="keywords" to put an invisible keyword list at the beginning of your document that would match keywords your audience would use. Most search engines rate your page higher if keywords appear near the beginning.

How many times do the keywords appear in the text? Not repeating the same words over and over is usually a sign of good writing. However, search engines penalize you for this restraint, usually rating your page higher for repetitions of keywords, inane or not. Some authors combat this by putting yet more keywords at the bottom of their pages in invisible text. Look at the source code for this article, and you'll see what I mean; the words are just in the same color as the background.
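You can check both factors mentioned here, how often each keyword appears and whether it appears near the top, with a few lines of Python before resorting to invisible text. This is a hypothetical helper for auditing your own page, not a model of any engine's scoring; the 50-word cutoff for "near the beginning" is an arbitrary assumption, and only single-word keywords are handled.

```python
import re
from collections import Counter


def keyword_report(text, keywords, early=50):
    """For each single-word keyword: (occurrences, appears in first `early` words).
    The 50-word default is an arbitrary stand-in for "near the beginning"."""
    page_words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(page_words)
    opening = set(page_words[:early])
    return {k: (counts[k], k in opening) for k in keywords}


text = "Wusage is free software. Wusage counts accesses to your server."
print(keyword_report(text, ["wusage", "server", "statistics"]))
# {'wusage': (2, True), 'server': (1, True), 'statistics': (0, False)}
```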

SPAMMERS BEWARE

"Spamming" is net-lingo for spreading a lot of junk everywhere; keyword spamming is putting hidden keywords a huge number of times in your document just so yours will be rated higher by search engines.

Search engines typically limit you to 25 keywords or fewer, and one I know of truncates your list when it sees an unreasonable number of repetitions.
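The two limits just described can be mimicked in a short Python function. The 25-keyword cap comes from the text; the repetition threshold is invented, since the text only says "an unreasonable number," and real engines may have handled the list quite differently.

```python
def clean_keywords(meta_content, cap=25, max_repeats=3):
    """Mimic the limits described above on a comma-separated keywords list.

    cap=25 comes from the text; max_repeats is a made-up threshold standing
    in for "an unreasonable number of repetitions".
    """
    kept, seen = [], {}
    for raw in meta_content.split(","):
        word = raw.strip().lower()
        if not word:
            continue
        seen[word] = seen.get(word, 0) + 1
        if seen[word] > max_repeats:
            break  # too many repeats: ignore the rest of the list
        kept.append(word)
        if len(kept) == cap:
            break  # keyword cap reached
    return kept


print(clean_keywords("wusage, log, wusage, wusage, wusage, stats"))
# ['wusage', 'log', 'wusage', 'wusage']
print(len(clean_keywords(", ".join(f"kw{i}" for i in range(40)))))  # 25
```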

Invisible text at the end of your pages puts blank space there, which looks bad and slows loading. Services which rate pages will enjoy marking you down for this.

Responsible Keyword Use: If an important keyword doesn't appear at least four times in your document, I hereby give you the right to add invisible text until it appears a maximum of five times.

III. Getting the Most Out of Your Search Engine

Search Engine Features

Web location services typically specialize in one of the following: their search tools (how you specify a search and how the results are presented), the size of their database, or their catalog service. Most engines deliver too many matches in a casual search, so the overriding factor in their usefulness is the quality of their search tools. Every search engine I used had a nice graphical interface that allowed one to type words into its form, such as "(burger not cheeseburger) or (pizza AND pepperoni)." They also allowed one to form Boolean searches, i.e., to specify combinations of words (except HotBot as of 7/1/96, which promises to add this feature later). In Alta Vista and Lycos, one does this by adding a "+" or a "-" sign before each word, or in Alta Vista you can choose to use the very strict Boolean syntax of the "advanced search." This advanced search was by far the hardest to use, but also the one most completely in the user's control (except for OpenText). In most other engines, you just use the words AND, NOT, and OR to get Boolean logic.
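The example query in this paragraph can be evaluated with a small recursive-descent parser. This Python sketch assumes that AND and NOT bind more tightly than OR and reads "a NOT b" as "a AND NOT b"; the engines of the day did not necessarily agree on either point, so check your engine's help page. The function name `evaluate` is hypothetical.

```python
import re


def evaluate(query, page_words):
    """Evaluate a Boolean query against a set of lower-case words.

    Assumptions: operators are case-insensitive, "a NOT b" means
    "a AND NOT b", and AND/NOT bind more tightly than OR.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():  # a single word or a parenthesized sub-query
        if peek() == "(":
            take()
            value = expr()
            if peek() != ")":
                raise ValueError("missing )")
            take()
            return value
        if peek() is None:
            raise ValueError("query ends too early")
        return take().lower() in page_words

    def term():  # words joined by AND / NOT
        value = factor()
        while peek() in ("AND", "NOT"):
            op = take().upper()
            right = factor()
            value = (value and right) if op == "AND" else (value and not right)
        return value

    def expr():  # terms joined by OR
        value = term()
        while peek() == "OR":
            take()
            right = term()
            value = value or right
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


q = "(burger not cheeseburger) or (pizza AND pepperoni)"
print(evaluate(q, {"burger", "fries"}))         # True
print(evaluate(q, {"burger", "cheeseburger"}))  # False
print(evaluate(q, {"pizza", "pepperoni"}))      # True
print(evaluate(q, {"pizza"}))                   # False
```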

By far the best service for carefully specifying a search was OpenText. Its form has great menus, making a complex Boolean search fast and easy. Best of all, this service permits you to specify that you want to search only titles or URLs. But then there's Alta Vista's little-known "keyword" search syntax, now as powerful as OpenText but not as easy to use. Using the syntax keyword:search-word, you can constrain a search to phrases in anchors, pages from a specific host, image titles, links, text, document titles, or URLs. There is an additional set of keywords just for searching Usenet. (To my knowledge, Alta Vista's keywords were undocumented before 7/19/96, so tell your friends you heard it here first!)
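A field-restricted search of this kind can be pictured as filtering structured page records. In the Python sketch below, each indexed page is a hypothetical dictionary with url, title, text, and image fields; the field:value syntax mirrors the keyword:search-word form above, but the substring matching and the field set are simplifying assumptions, not Alta Vista's actual rules.

```python
from urllib.parse import urlparse


def field_search(query, docs):
    """Return URLs of docs matching every term of a query like
    "title:wusage image:logo". A bare word searches the page text.

    The field names echo the keyword syntax described above; the matching
    rule (case-insensitive substring) is an assumption made for brevity.
    """
    results = []
    for doc in docs:
        for term in query.split():
            field, sep, value = term.partition(":")
            if not sep:
                field, value = "text", term
            if field == "host":
                haystack = urlparse(doc["url"]).hostname or ""
            else:
                haystack = doc.get(field, "")
            if isinstance(haystack, list):  # e.g. several image titles
                haystack = " ".join(haystack)
            if value.lower() not in haystack.lower():
                break  # one failed term rules the page out
        else:
            results.append(doc["url"])
    return results


docs = [  # hypothetical index records
    {"url": "http://a.example/wusage.html", "title": "Wusage 3.2 download",
     "text": "free statistics software", "image": ["wusage logo"]},
    {"url": "http://b.example/stats.html", "title": "Server statistics",
     "text": "report generated by wusage", "image": []},
]
print(field_search("title:wusage", docs))  # ['http://a.example/wusage.html']
print(field_search("wusage", docs))        # ['http://b.example/stats.html']
print(field_search("image:logo", docs))    # ['http://a.example/wusage.html']
```

The first two calls show why title searches cut the chaff: the bare word "wusage" matches a statistics page that merely mentions the program, while title:wusage finds the page about it.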

Which Search Page Should I Use When, and How?

Use | If you . . . | Using the feature . . .
Lycos | have no good ideas for specific search strategies | best test results for broad search terms
Lycos | want to find someone's e-mail | People Finder
Magellan | have more than one broad search word, or can't pick a site from Lycos' summaries | best available results summaries
Magellan | want interactive news, or details on today's headlines | news with links to related sites
OpenText | want to search only document titles or perform complex searches | title search specification, best advanced search interface
Alta Vista | are hunting for an image | image:search_word syntax
Alta Vista | want to find all the links to your page | +link:your_site -url:your_site syntax
Yahoo! | want the best national and international news | Reuters world headlines
Yahoo! | want a dictionary or other reference source | Dictionaries or Reference Libraries

What could really make engines with large databases shine, however, would be an improvement in the way they rank and present results. All engines I tested had poorly documented ranking schemes based on how many times your search words were mentioned, whether they appeared early in the document, whether they appeared close together, and how many search terms were matched. I did not find the ranking schemes very useful, as relevant and irrelevant pages frequently had the same scores.

Useful Non-Search Goodies

E-mail address books:

Most engines allow you to search for someone's name if you quote it, as in "John Q. Webhead", but you have to be careful about exact spelling, use of initials, etc.

News Services:

Yahoo! has the best news, in my humble opinion, as they have Reuters international news headlines. Most other news services offer ultra-brief summaries that read like "MacPaper."

Catalogs

I have only been disappointed by catalog services. In practice, they seem to aim for the lowest common denominator and reflect very little thought about how and when they might be useful instead of search engines. All the ones I tested were directed toward novices and favored popular commercial sites. I would have thought they would be very good for finding software at least, but this was not the case. See the example below of trying to find Web server-related software.

Advanced or Boolean Queries

Making queries very carefully in Boolean terms to narrow a search rarely produces useful results for me (but see below). In practice, other ways of specifying a search besides detailed logic are much more useful. Specification of exact vs. approximate spelling, specification that search terms must appear as section headings or URLs, using more keywords, and just specifying the language of the document would have been more valuable in all of my search examples.

Example: Eliminating Unwanted Matches

The exception to this is the AND NOT operator: it is essential for excluding unwanted but close matches when they outnumber the desired matches. For example, if you look for information on growing apples, you will be deluged by information on Apple computers. With enough work, you can start to see apples with stems, not cords, but it isn't easy. Using Alta Vista, "+apple -mac* -comp* -soft* -hard* -vendor" got me information on the Payson-Santaquin apple farming region and a federal apple agriculture database on the first page of results.
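The +/- syntax in this example combines required terms, excluded terms, and truncation. Here is a minimal Python sketch run on made-up page texts; it assumes unsigned words are simply ignored for filtering, which is a simplification of ours rather than documented engine behavior.

```python
import re


def plus_minus_match(query, text):
    """Alta Vista-style filter: "+word" is required, "-word" is excluded,
    and a trailing "*" truncates. Unsigned words are ignored here for
    filtering; that is a simplification made for this sketch."""
    page_words = re.findall(r"[a-z0-9]+", text.lower())

    def present(term):
        term = term.lower()
        if term.endswith("*"):
            return any(w.startswith(term[:-1]) for w in page_words)
        return term in page_words

    for token in query.split():
        if token.startswith("+") and not present(token[1:]):
            return False
        if token.startswith("-") and present(token[1:]):
            return False
    return True


query = "+apple -mac* -comp* -soft* -hard* -vendor"
pages = [  # hypothetical page texts
    "Apple growers in the Payson-Santaquin region",
    "Apple Macintosh computers and software",
    "Apple vendor list",
    "Apple trees and compost",
]
print([p for p in pages if plus_minus_match(query, p)])
# ['Apple growers in the Payson-Santaquin region']
```

Note the last page: "-comp*" meant to exclude computers also throws away a page about compost, which is the price of aggressive truncated exclusions.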

Useful Search Features

Find Images to Steal (Alta Vista)

I bet you will all use this at one time or another, so I insist you credit this article and webreference.com for this goodie: With Alta Vista, you can limit your search to image titles by using the format:

image:title_string

This was the only way I could find a useful picture of a nose for a physician's page - I had searched through jillions of clip art pages, and even contacted graphic artists, and they couldn't come up with anything as good as I found for free! USE THIS.


Search for Strings in Titles (Alta Vista, OpenText) for faster results.

If applicable, this kind of search eliminates chaff by sticking to the pages that center on your subject, not ones that just mention a lexically related word. Use the syntax:

title:search_string

in Alta Vista, or just use the simple pull-down menus in OpenText's "advanced search mode."

Find the Links to Your Own Site (Alta Vista)

Alta Vista claims that you can get all the links to your own site by entering the keyword construction +link:http://mysite.com/ -host:mysite in the simple query form.

...I found that the most important link to one of my sites was missing from this search, so I was not impressed; however, my editor swears by this.

Find the Number of Links to Your Own Site (Alta Vista)

For a more accurate estimate of the actual number of links to your site (or backlinks), use Alta Vista's advanced search, entering your selection criteria in Advanced Syntax (AND, OR, NOT, NEAR), and display the results as a "count only." The method above will list your links but only approximates their number; this method estimates the number of backlinks more accurately. (ABK-12-29-96)
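Both link searches above amount to the same question put to an engine's link index: which pages, on hosts other than yours, point somewhere on your site? A Python sketch over a hypothetical in-memory link graph follows; with a complete graph, the count is simply the length of the list. A real engine can only report links on pages it has actually crawled, which may explain missing links like the one described above.

```python
from urllib.parse import urlparse


def backlinks(link_graph, my_site):
    """Pages on other hosts that link into my_site: the intent of
    "+link:http://mysite.com/ -host:mysite".

    link_graph maps each indexed page URL to the URLs it links to; it is a
    hypothetical stand-in for a search engine's link index.
    """
    my_host = urlparse(my_site).hostname
    found = []
    for page, targets in link_graph.items():
        if urlparse(page).hostname == my_host:
            continue  # -host: ignore links from my own pages
        if any(t.startswith(my_site) for t in targets):
            found.append(page)  # +link: this page points into my site
    return found


graph = {
    "http://mysite.com/index.html": ["http://mysite.com/about.html"],
    "http://fan.example/links.html": ["http://mysite.com/", "http://other.example/"],
    "http://other.example/": ["http://fan.example/links.html"],
}
hits = backlinks(graph, "http://mysite.com/")
print(hits)       # ['http://fan.example/links.html']
print(len(hits))  # 1
```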

Which is the Best Search Engine?

(It's not just how big your data base is, it's how you use it.)

To decide which search engine was best, I decided that nothing but useful results would count. Previous articles have emphasized quantified measures of speed and database size, but I found these had little relevance to performance in actual searches. By now, all engines have great hardware and fast net links, and none shows any significant delay in working on your search or returning the results. Instead, I came up with a few topics that represented, I felt, tough but typical problems encountered by people who work on the net. First, I tried a search with "background noise," a topic where a lot of closely related but unwanted information exists. Next, I tried a search for something very obscure. Finally, I tried a search for keywords that overlapped with a very, very popular search keyword. I counted a search as successful only if the desired or relevant sites were returned on the first page of results.

Example - Search Terms Which Yield Too Many Matches

For the first type of search, I wanted to find a copy of Wusage to download: free software, and a common tool for HTML developers, that lets you keep track of how often your server or a specific page is accessed. Its site is hard to find because every machine running the program produces output files with the string "wusage" in their title and text. When I simply typed "wusage" into search page forms, Infoseek and Lycos were the only engines to find the free version of the software I wanted. (Note that I gave no credit for finding the version for sale; a careful search of the sale version's page did not produce any links to the free version's download site.) Infoseek's summaries were very poor, however, and all matches had to be checked.

Always Search As Specifically As Possible

Most engines failed to find their quarry because the search was too broad. After all, how is the engine supposed to know I want the free version? After spending a long time finding out the exact name of what I wanted, "wusage 3.2," I searched again, and Infoseek, Excite, Magellan, and Lycos all found the site I was interested in. Alta Vista, HotBot, and OpenText yielded nothing of interest on their first page. Magellan came out the clear winner on this search, as its site summary was by far the best. (Asking Alta Vista to display a detailed version of the results didn't change things at all!) Infoseek and Excite performed well, but Lycos listed a much older version of wusage (2.4) first.

Think About Search Terms

It eventually occurred to me to search for "wusage AND free" to find the free copy of wusage. In some sense, Lycos was the winner this time because the free version was the first match listed; however, its summary was not very useful. While Lycos did a better job than Infoseek, it didn't tell me whether each site was relevant or not. Magellan's response was very good, as it included a link leading to the software on the first page of matches, again with an excellent summary. Yahoo! and Alta Vista also found it, but all these engines rated the fee version higher than the free version. OpenText did very well here, but only in advanced search mode, where it was possible to specify that wusage must be in the title and "free" could be anywhere in the text. Wusage 3.2 was listed as the second of only two entries; no digging here! Excite failed to find the site at all, and HotBot found only 10 matches, all of them statistics pages from a server in Omaha.

Curiously, a search for "download wusage" did not improve the results over the single-word searches for any of the search engines! (It may be time for rudimentary standardized categories to be used on the Web: e.g. this is a download archive, this is an information only site, this is an authoritative site, etc.) The lesson here may just be "if at first you don't succeed..."

Catalogs

Catalogs were not helpful. Yahoo!, under computers/software, had nothing whatever to try for wusage: no http, no HTML, no wusage, not even servers. In Excite!, under computing/www/web ware, three more clicks got me to wusage, but (surprise!) I could not get to the free version. See why you don't want anyone else filtering your information?

The lessons from this search, which I have found repeated in other searches, are given in the "Examples: Summary . . ." box below.

Examples Summary: How To Improve Your Searches

The most valuable search tool is specific information about your target

In the search for wusage, I had no problems once I knew that version 3.2 was what I needed.

Think about your search terms - the next most important search tool

Obviously, since I wanted the free version of wusage in the example, I should have searched for "free AND wusage"; with most engines, I got nothing useful from just "wusage."

Good site summaries save you time by saving you surfing

Use Magellan or OpenText if possible. To research the example above, I had to pore over dozens of pages. Only Magellan's summaries really gave me any confidence that I did not have to check every site.

Specify a "title only" search if applicable

Title-only searches are available only with OpenText and Alta Vista. In the examples, they yielded more practical results than adding lots of search words (as help pages suggest) or forming logically complex queries (as one might think). Adding more search words made the results above worse, not better. A Boolean search also did no better; e.g., "wusage AND (free or download)" yielded nothing from Alta Vista.

Searches Can Yield New Information, but they are never complete

None of my searches ever found the good page on tegu care that I know exists.

Example - Finding The Really Obscure

For this example, let's try to find out how to care for a "tegu," a South American lizard that is only moderately popular even among lizard enthusiasts. (If that's not an adequate example of obscure information, I don't know what is.) I know that a page called "TEGU INTRO" exists at http://www.concentric.net/~tegu/tegu.html, but we will simulate a blind search here. This search was full of surprises.

I began by just searching for the string "tegu." Infoseek's first match was a tegu page I did NOT know about! Still, the one I wanted was not listed on the first page. Excite yielded nothing about tegus, only information on a vaguely related reptile, the "dwarf tegu," and a search on the string "tegu care" yielded nothing relevant. (A search of its handy Usenet database did find the old tegu article I was looking for, three weeks old and no longer on my local news server. Other engines found this as well.) Lycos came up with the URL Infoseek found, plus two more; however, the additional listings were only pictures, not information. Searching Lycos for the string "tegu care" got nothing. Alta Vista found nothing useful either way, just ads for lizard food. OpenText found nothing, even when I searched for "tegu lizard." HotBot found a picture of a tegu with "tegu care," but it did not return any relevant information with any search.

None of the searches I tried came up with the URL I knew about. The lesson here is that you can really find new things on the Web with search engines, but if you need to find a specific page, it will always be a crapshoot. Advanced searches yielded nothing more with any engine ("tegu in title AND (care or lizard)", etc.). Some way to restrict searches to English-language documents would have been much more helpful: some northern European language apparently has the word tegu in it, not referring to a lizard, and many foreign-language pages fouled my results on some engines. Another feature that would really have made a difference would be a filter for sales pages; most of the mentions of tegu on the net are ads for "Monitor and Tegu Food," containing no care information. As expected, the Yahoo! and Excite! catalogs were useless here as well.

Example - Selectivity: Apple Trees NOT Apple Computers

There are gobs of stuff on the net about Apple Computers, but what about growing apple trees? Surprisingly, this search was very easy! apple* alone always yielded lots of stuff about the computers, and one often had to add as many as five excluded terms (apple* -vendor* -hard* -soft* -comp* -mac*) before receiving any matches for apples you can eat. However, just apple* tree* usually yielded detailed information on growing apple trees on the first page of results. Engines that gave poorer results required extending the query to apple* tree* grow*.

And The Winner Is. . .

I don't really want to pick a winner . . . All right, if you insist: the "Search Engine Test Results" table below lists the engines in order of their ranking. Lycos is therefore the official heavyweight search engine champion of the universe, based on the tests above. However, I think this misses the point. As shown in the "Which Search Page . . . ?" table above, you should choose different engines for different tasks. None of the engines tested except Alta Vista were able to limit their searches to images. Alta Vista must therefore surely be the best one for graphic designers if they are allowed to use only one, but for most other purposes, the user will have to wade through mountains of chaff and dreck to find what they want. It is more beneficial to use different engines for different tasks; at most, only a few are required.

Search Engine Test Results

Engine | "One Item Among Many Related Pages" Test | "Obscure Item" Test | "Selectivity: Apple Trees Not Computers" Test | Comments
Lycos | Found item with broad search word and exact name. Found item first on results list with two search terms. | Found unknown item, but not known item. | Just apple$ tree$ yielded good results. | Returned the most relevant matches in the tests, but requires more time than Magellan to check bad matches.
Infoseek | Found item with broad search word and exact name. Found item with two search terms. | Found unknown item, but not known item. | Just apple$ tree$ yielded good results. | Poor summaries.
OpenText | Found wusage in title search. | Found nothing. | Good results with 2 or 3 terms, most useful with 3 terms due to superior summaries. | Ability to specify title searches very useful and user-friendly. Summaries very good.
Alta Vista | Failed with approximate and exact words. Found item low on first page with two search terms. | Found nothing. | Good results with apple* tree* grow*. | Keyword searches for images, titles, etc. are very useful in other searches.
Magellan | Found with exact name. Found item low on first page with two search terms. | Found nothing. | Required three search terms: apple* tree* grow*. | Superior summaries always save you surf time.
Excite | Found with exact name, failed with two-word search. | Found nothing. | Required third search term: apple* tree* grow*; even then irrelevant results were first. | . . .
HotBot | Failed all searches. | Failed all searches. | Found only images, and did worse when grow* was added!!! | Poorest performer (excluding catalogs).
Excite! Catalog (not engine) | Failed all searches. | Failed all searches. | Failed all searches. | Catalogs not at all useful.
Yahoo! Catalog (not engine) | Failed all searches. | Failed all searches. | Failed all searches. | Catalogs not at all useful.

IV. Conclusions

Different engines have different strong points; use the engine and feature that best fit the job you need to do. One thing is obvious: the engine with the most pages in the database IS NOT the best. Not surprisingly, you can get the most out of your engine by using your head to select search words, knowing your search engine to avoid mistakes with spelling and truncation, and using the special tools available, such as specifiers for titles, images, and links. The hardware power for rapid searches and databases covering a large fraction of the net is yesterday's accomplishment. We, as users, are living in a special time when search engines are undergoing a more profound evolution: the refinement of their special tools. I believe that very soon the Web will evolve standards, such as standard categories, ways of automatically classifying information into these categories, and search tools that take advantage of them, which will really improve searching. I think it's exciting to be on the Web in this era, to be able to watch all the changes, and to evolve along with the Web as we use it.

V. References and Recommended Reading

A fairly extensive list of search engines and related services appears on Netscape's Net Search Page, but you should also look at WebCrawler and the many others that exist. Remember, a new, better engine could come online at any moment, and the underdogs need your support.

For an overly techy article on search engines, try the IW Labs review of engines in Internet World, May 1996.

Keep software agents off your site by reading and using A Standard for Robot Exclusion.

The author gratefully acknowledges technical assistance from the very expert Opus One, the most knowledgeable and enjoyable people you will ever meet in this or any other business. This outfit is an excellent reference for anything having to do with computers or the Internet.

* * * *

Dr. Bruce Grossan, when he is not out climbing, hunts supernovae and gamma-ray burst counterparts at the University of California at Berkeley's Space Sciences Laboratory and Lawrence Berkeley National Laboratory. Lately he has also been exploring consulting on educational and business web projects, and writing The Great American Novel.

The Brief

Welcome to Internet Detective - a free online tutorial that will help you develop Internet research skills for your university and college work. The tutorial looks at the critical thinking required when using the Internet for research and offers practical advice on evaluating the quality of web sites.

Who is the tutorial for?

It's designed to help students in higher and further education who want to use the Internet to help with research for coursework and assignments.

What does the tutorial cover?

The tutorial is divided into the following sections:

What's the Story? - understand the advanced Internet skills required for university and college work.

The Good, the Bad and the Ugly - see why information quality is an issue on the web, especially for academic research. Learn how to avoid wasting time on Internet searches and how to recognise scams and hoaxes.

Detective Work - get hints and tips that help to critically evaluate the information you find on the Internet.

Get On the Case - try out your Internet Detective skills with these practical exercises.

Keep the Right Side of the Law - be warned about plagiarism, copyright and citation.

What does the tutorial involve?

You can work through the whole tutorial by selecting the Next button at the bottom of each screen, or use the table of contents in the left margin to skip to a section.

The tutorial will take around an hour to complete, but you can do it in more than one sitting.

If you get stuck, use "HELP" at the top of the page.

OK, let's get on the case!

What's the Story?

University and college work requires some advanced Internet skills

Use this section of the tutorial to learn:

Why students fail if they use the Internet badly

About the potential pitfalls of using the Internet indiscriminately for research

Why you need to step up your Internet skills at university and college

Crime Scene

Picture the scene

You've just spent a week working hard on a piece of coursework. You spent ages doing the research and found loads of information on the Internet.

You have high hopes for a good grade. Then all of a sudden, BANG, you get a fail!

What went wrong?

You scan your feedback comments …

It seems your lecturer is not happy with the references you used.

Apparently you missed out all the key sources of information that you should have used. They ask why you didn't refer to your reading list or any resources from the library.

Some of the sources you quote are inappropriate - they were looking for academic sources such as journal articles, rather than random web sites.

They are also unhappy with the content of some of the sites you quote - there was a lot of bias and you don't give both sides of the argument. Much of the information you cited was out of date and downright inaccurate!

They also warn you to watch out where the information you use is coming from - all the sources you used were from the USA and you missed out all the European research in this area.

But perhaps most embarrassing: apparently you're not allowed to "cut and paste" text from web sites into your assignments unless you use proper citation methods - otherwise it's plagiarism - so you get an outright fail!

Your lecturer suggests you brush up on your Internet research skills.

What does this mean?

Wise Up

University and college students sometimes fail assignments or get poor marks in their coursework because they have used the Internet in ways that are inappropriate for work at this level.

You may have used the Internet to help with school work or personal research but you can't necessarily rely on the same web sites and skills to get you through higher or further education.

Repeating information from a single source (eg. a textbook, encyclopaedia or web site) is not likely to get you very far.

Common mistakes made by students:

They rely on Internet searches for their research and ignore other key sources

They don't critically evaluate the quality of the information they find

They copy information from the Internet and don't acknowledge their sources

At university or college you will need to take your Internet research skills to the next level

At this level of your education you will be expected to:

Be able to do your own independent research

Locate and use a wide range of information sources

Critically evaluate the information you find

Synthesize information to form your own original piece of work

Present a balanced and well-informed argument leading to your own conclusions.

You should take full advantage of your reading list, course materials and library resources. You might also be tempted to turn to the wider web, in which case you need to tread very carefully.

You will need to develop some advanced Internet research skills.

This tutorial can help!

Sum Up

In this section we have looked at how developing your Internet research skills can help you succeed in your university and college work.

OK, so let's look at some specific Internet Detective skills ...

The Good, The Bad and The Ugly

The quality of information on the Internet is extremely variable.

At best the Internet is a great research tool; at worst it can seriously degrade your work by feeding you misinformation.

Use this section of the tutorial to learn about:

The good: academic publishing on the Internet

The bad: time wasting on Internet searches

The ugly: Internet hoaxes, scams and legends

The Good

The good news is that many sources of authoritative research information now publish on the Internet.

In the academic world it is considered very important that new research builds upon past research and that the quality of information is assured. There are formal processes to facilitate this, and it's essential you understand these if you are to succeed at university.

Let's look at some of the information sources that are traditionally used to support academic research and at how these are increasingly available online ...

The Academic publishing process

Academics usually publish their research in formal publications such as journal papers and articles or reports. These follow formal procedures designed to quality-assure the work.

Peer review / refereeing

Peer review is what characterises academic research. If a publication is peer reviewed it means it has been read, checked and authenticated (reviewed) by independent, third party academics (peers). Peer review has been the quality-control system of academic publishing for hundreds of years.

Scholarly journals

Peer-reviewed articles are often collated into scholarly journals, which are usually published by academic publishing houses, professional societies or university presses. Journals will be a key source of information for you at university - you will be expected to reference articles from them in your work.

Electronic journals

A university library may have shelves full of journals, but nowadays many are also available in electronic form over the Internet. Ask your lecturers or librarians how to find and use the key journals for your subject - the sooner you do this the quicker you will succeed in your research.

Library eJournal services

Access to eJournals is not usually free - a subscription has to be paid. However, a university library will have paid some subscriptions for its users - who can then get free access to these journals via their library web services, using a special password (check with your library for details).

eJournal publishers

If you can't get access to eJournals from your library, you may be able to get it via the publisher's web services. Some offer "pay-per-view", which means you pay a small fee for each article you view.

ePrints

Increasingly academics are offering free access to their refereed journal articles (and sometimes other material) by means of databases accessible via the Web called Institutional Repositories (IRs).

Bibliographic databases

Most academics rely on specialist databases to access details of past research. The databases draw together details of scholarly publications from a wide range of sources including academic publishers, journals, archives and sometimes books, and so enable you to search a large body of the scholarly literature in one go.

Academic web directories

Of course a lot of information on the web can be useful for research even if it hasn't come from the traditional sources. Academic web directories, such as Intute, guide you to the best online resources for research - and each resource has been selected and reviewed by a subject specialist.

Library web sites

The library web site for your university or college will be an important source of information for you, as it will quickly guide you to the key electronic journals, bibliographic databases and archives that you should be using for your research.

Ask your lecturers and librarians for advice on which sources you should be using.

The Bad

The bad news is that the Internet also leads to a lot of information that is completely inappropriate for your research, and it takes time and skill to weed this out.

The quality of information on the Internet

As things stand the Internet has no standard system of quality control so it's important to be careful about which information you use and not to trust everything you read.

Think about it - the Internet links millions of computers:

Anyone can put something on the Internet - an amateur or an expert

From anywhere in the World - be it the United Kingdom or Uruguay

They can say anything they like - be it true or false

And leave it there as long as they like - even if it goes out of date

Or change it without warning - perhaps even remove it completely

There is a danger that the information you find on the Internet will:

Be from a source that is unreliable, lacking in authority or credibility

Have content that is invalid, inaccurate or out of date

Not be what it seems!

Weeding out poor quality information takes time

Most people use very simple search techniques when they want to find information on the Internet using a search engine such as Google.

These can produce thousands if not millions of web sites to explore: some information will be useful, some will be useless - it's up to you to discern which is which!

It can take considerable time and skill to sift through search engine results and evaluate which are the best sources.

Although it may seem a quick and easy option to turn to a search engine for your research, it might be more effective to turn to web services designed specifically for university and college research such as your library web site.

It's easy to miss key information

If you want to find something on the Internet, you go to a search engine, as they contain everything that is available online, right? Wrong!

Search engines only cover a proportion of what is available online; a lot of information is hidden or invisible to them. For example, some of the databases of research literature that we discussed earlier will not appear in search engine results, especially if they require a subscription or password to get access.

It's also worth remembering that search engines only search information that is online, and of course a huge body of research literature is still only available in print form in books and journals.

If you try the same search in different search engines, you will get a different set of results from each one, which reveals that none of them index the whole Internet.


It's a common misconception that search engines (such as Google) search everything - they don't - so if you rely on them alone you may miss some of the key sources for your research - consider using other sources too, such as your library catalogue, other databases and academic web search tools.

The Ugly

At worst the Internet can lead you to misinformation that could land you in real trouble.

Unfortunately there are a lot of sharks on the Internet - people who want to trick you, misinform you, deceive you and defraud you. Some web sites and emails can be real crime scenes.

Be sceptical, not paranoid!

This page will highlight some classic cases of misinformation on the Internet: Internet hoaxes, urban legends, scams and hate sites.

You need to develop some healthy scepticism when using the Internet for research but there's no need to get paranoid - we've already seen that there's plenty of good stuff out there too. OK, let's get ugly ...

Internet hoaxes

Some web sites are fakes designed to be spoofs, parodies or jokes. This is fine as long as you realise it's a fake and don't take it at face value!

Hoaxes are often about famous people, politics, products or organisations. Their content is humorous and the fact that they are not 'real' sites can be easy to spot. Some sites even include a disclaimer, just in case you don't get the joke, freely admitting that the web site is a hoax.


Some spoofs mirror the design of the real site. This is a clever trick to deceive you into thinking you have accessed the genuine site; in some cases the design is so like the original that you have to look very carefully to determine whether it is real or fake.

Sometimes fake web sites are designed to make a more serious point, be it political or educational.


Urban legends

Urban legends can be harmless but only if you realise they are not actually true!

What are urban legends? They are stories or rumours that have been circulated from person to person. In the past they were spread by word of mouth but now are often spread via email or web sites. Some may originally have contained elements of truth, but have become distorted by mistakes being made in the retelling. Others have been complete fabrications from the start.

Warning: if an email contains a phrase like: "Please, send this message to as many people possible!!!!" it should alert you to the idea that you may be looking at an urban legend and so the last thing you should do is forward the email to anyone.

The Internet is awash with false information, which people endlessly forward to others in the belief that it is true. These messages become spam that clogs up networks and people's email, misinforms them and wastes their valuable time.


Scams and frauds

Scams and frauds are more serious, as they involve criminals trying to steal your identity or con you out of your cash.

The Office of Fair Trading describes SCAMS as:

Scheming

Crafty

Aggressive

Malicious

Their advice is that "If it looks too good to be true it probably is!"


Hate sites

Sadly, the Internet can reflect the worst side of human nature and is sometimes used for defamation or to advocate hate, violence and hostility.

Some web sites with malicious intent have become known as Hate Sites because they disseminate such information. This could be about a person, an organisation, a religion, a political viewpoint - the list is endless.


How do you spot the fakes?

A number of web sites exist to expose fake sites and frauds.

If you are unsure if a site is genuine then check these sites to see if it is listed there as a fake. A quick search here could save you a lot of embarrassment!

Snopes [ http://www.snopes.com/ ] is a really great site for checking out anything you think might be an urban legend, hoax or scam. It keeps a huge archive of examples of urban legends, myths and hoaxes - so if you do have suspicions about an email check this site to see if it is a hoax.

The Office of Fair Trading: Advice on Scams [ http://www.oft.gov.uk/oft_at_work/consumer_initiatives/scams/ ] gives the official line on what to do if you become a victim of Internet fraud and has good advice on how to spot scams and frauds.

Scambusters [ http://www.scambusters.com/ ] gives information about how to avoid becoming a victim of identity theft, or of frauds such as pyramid selling, or money laundering scams.

Remember, it's up to you to make sure you don't degrade your work by quoting misinformation from the Internet. If in doubt, leave it out!

Quiz


Q1. What is the traditional quality control system for work published by academics?

 Peer review

 Proof reading

 Publishing research


Q2. You've just been set an assignment. Where should you start looking for sources?

 A search engine

 The library web site


Q3. What should you do if you are unsure whether a web site you are thinking of using as a source for your work is genuine?

 Use it in your work anyway - your tutor probably won't notice

 Look to see whether it is listed on a web site where hoaxes are posted.

 Leave it out of your work

Sum Up

In this section we have looked at the good, the bad and the ugly for Internet research:

The good: academic publishing on the Internet

The bad: wasting time on Internet searching

The ugly: Internet scams and frauds, urban legends and myths

It's often up to you to discern which is which!

The next section of the tutorial will help you do just this ...

Detective Work

In this section we will look at some practical steps you can take to critically evaluate information you find on the Internet.

It can pay to think like a detective: 

Take a case-by-case approach

Ask questions (who, what, where) and look for clues

Weigh up the evidence to make a judgement

Case by Case

"Quality is in the eye of the beholder" - you need to take a case-by-case approach to evaluating information.

The value of information is subjective as different information will be appropriate in different circumstances - it all depends on what you need it for.

For example, if you are doing formal scientific research you will probably want to rely on peer-reviewed articles that have been validated and checked by qualified scientists.

If you are writing an essay on something like popular culture or political bias it might be appropriate to reference informal or primary sources that represent different points of view and to discuss the strengths and weaknesses of these.

The key is to be clear about your purpose, to decide what types of sources would be acceptable in light of it, and then to weigh up any information you find against that purpose.

What information do you need?

What are the best sources of this information?

What type of Internet resources (if any) would be worth looking for?

Warning!

If you don't know what you are looking for on the Internet you are likely to spend a lot of time drifting aimlessly through cyberspace - so save time by deciding exactly what you're trying to find before you start searching!

Check which sources your lecturers are happy for you to use - do they want you to stick to your reading list or library resources or are they happy for you to search the wider web?

Once you know what you're looking for you can get on the case.

Questions

The phrase "don't judge a book by its cover" also applies to web sites.

You need to question the quality of information you find on the Internet before you use it in your research. 

A novice searcher will make judgements based purely on the look and feel of the site.

An expert researcher will make judgements based on the content of the site, and the credibility of the source of the information.

There is a simple line of questioning that can help:

On the WWW ask WWW: Who? What? Where?

Who? - question the source of information

What? - question the content of information

Where? - question the location of the information

Who?

Can you trust your sources? You will need to establish their credibility, reliability and authority.

Authors, publishers, sponsors and developers will all impact on the reliability and credibility of the content of the information.

It's important to identify who is providing the information and to consider whether they can be relied on to provide the information you need.

Quality warning!

Remember, your search results might list:

Scholarly journals next to tabloid news.

Peer-reviewed articles next to vanity publishing.

The site of a Nobel prize winning scientist next to that of an Internet quack.

Detective work on sources

You need to identify and verify your sources.

Ask questions

Who is the author?

Who is the publisher?

Who sponsored or funded the site?

Do you recognise them as an authoritative source?

What are their credentials, qualifications, background and experience?

Has the information been edited or peer reviewed?

Are the sources trustworthy?

What are their motives for publishing the information?

What standpoint do they take: impartial? Biased?

Do other Internet sources that you trust link to this site?

Look for clues

To gather evidence look for:

Author details: is there a biographical statement that lists their job title, contact details, qualifications and publications? Is this on the web site of their employer, or is it their own personal web site?

Details about the publisher, sponsor or developer of the site.

The About Us section, Mission Statement or Help - these might help establish their history, affiliations and standpoint.

The Contact Details - is there a physical address which verifies claims of authorship?

Photographs of the author or offices of the organisation.

A Copyright Statement to help establish the owner.

Consider how you came by the site: was it a link from a trusted source?

The URL (more on this later in this section).

Tips on checking your sources

On the Internet the source of the information may not always be made explicit but in academic work you must be able to cite your sources. Always look for statements of authorship. Is there any information about their qualifications, their position or who they work for?

If you've never heard of the sources, try doing a quick Internet search on their name. Does Google tell you more about their credentials?

You can check to see if the author has published anything else by conducting a search on a relevant bibliographic database.

If you are quoting information taken from the web site of an organisation, always check that it is a reputable body. Look to see if it is listed in any of the directories of associations or organisations that you will find in your local library. Check if it quotes support or sponsorship from any other established bodies.

Be wary of contact details that give you a PO Box number as an address or that offer a premium-rate phone number - these are common tactics used by Internet fraudsters.

If the sources are not disclosed - consider rejecting the information.

What?

Can you trust the content of what you see?

You will need to establish its coverage, validity, accuracy and currency.

If the material presented is inaccurate, untrue, illogical, or out-of-date then it's unlikely to be a lot of use for serious research.

It's important for you to evaluate the content of the information you find and think critically about the arguments, assertions, facts and data that are presented - are they of sufficient quality for your needs?

Quality warning! 

Remember, your search results might list:

Scientific facts next to unfounded opinions.

Professional advice next to idle gossip.

The latest research next to last year's news.

Detecting the value of information content

Ask questions 

Are the arguments and conclusions valid ie. well founded in logic or truth?

Does the author back up any claims with reliable third-party support (eg. citations, references, research data and source material)?

Is there a balanced argument or is it one-sided?

Do you agree with the conclusions it draws?

Is the information accurate, or can you spot errors (eg. typographical errors or broken links)?

Is the information current, or might it be out of date or superseded by more recent publications? Is there a "last-updated" date?

Is the coverage sufficient? Does it include all the aspects of the subject that you need in enough breadth or depth?

Is the level of the site appropriate? Does it treat the subject at the level you require or is it an introductory guide that is too basic?

Is it complete - is it available in full or has it been abridged?

Is it a commentary or an original text? A primary or secondary source?

Is it fact or opinion?

Are there adverts everywhere that might make you question the motives of the online publication?

Look for clues

Take time to gather evidence about the content. Look for:

Bias and controversial statements that are unsubstantiated - use your own knowledge to question content and if it goes against what you know then look for evidence to back it up.

Research evidence - to back up the arguments and assertions presented (eg. look for good-quality research methods, research data and reviews of past literature in the field).

Proper references - especially in academic works - these should follow conventional citation practices and come from authoritative sources.

Mistakes and inaccuracies: if you spot any of these it should be a cause for concern - an editor or reviewer should have picked these up so maybe it hasn't been properly checked and cannot be relied upon in other ways?

Dates - for when it was written, published and last updated - how useful is it for your purposes?

Tips on checking the content

Site maps, Content pages and About Us statements - they often tell you the scope and coverage of the work

You will need to cite the title of the work and the date it was published in your references so make sure you can find these.

If you are looking for current news headlines or the most recent version of an article, it is important that you are seeing the most up-to-date information.

If the site offers something for nothing, asks you to send money to claim a free gift or prize, or asks for your bank account details it's probably a scam!

It has been said that the Internet can be used to find evidence to support any argument, but it's up to you to make sure that the evidence will stand up in court (well, stand up to the critical eye of your lecturers as they mark your coursework!)

If in doubt, leave it out!

Where?

Do you know where your information is coming from?

You will need to establish the location and origin of the information.

Which part of the World is your information coming from and on whose computer is it located?

Remember, information on the Internet might be based on a computer anywhere in the world.
