Website Credibility Trust Rank Computer Science Essay


One of the serious challenges on the Internet is Web authorship and credibility. With the continuous expansion of Internet services, especially when they are used to communicate important or sensitive information, there is a need to verify Web page content and authors. In this paper, we evaluate the elements required to assess the credibility of websites and pages based on website, Web page, and author credibility metrics. A case study of selected websites in Jordan is used to assess the proposed credibility metrics. Results show that there are many metrics for measuring trust in a website or a Web page. Results also show the need for a clear standard to evaluate the authenticity and credibility of website content.

KEYWORDS

Information retrieval, Website credibility, trust rank, content authentication

1 INTRODUCTION

Research papers and publications are important indicators of the ability of an author or an academic community to conduct research projects in the various fields of human science. In general, the number of publications, and its growth over time, is a direct indicator of the volume of research activity for a particular author or university. Nonetheless, the number of publications alone has been shown to be a limited indicator of the impact of those publications. The number of citations a particular paper receives has proven more relevant than the raw number of publications. This is why early citation indices such as the h-index and g-index gave more weight to the number of citations than to the number of publications.

The changing nature and huge size of the Web have drawn attention to information retrieval systems. It has become increasingly difficult to retrieve the Web pages users require. Users need to search for specific queries while receiving a minimum number of irrelevant pages, with support for desired search features such as file type, domain, required words, and so on. To address this, programs called spiders were built to retrieve the desired Web pages automatically.

Crawlers, or spiders, are automated tools that parse websites and retrieve their pages and contents. Users' needs are dynamic, and over time they may need to reuse Web pages they have downloaded before. Two types of crawlers address this: the batch crawler, which does not allow duplication and instead returns the last snapshot of each downloaded page, and the incremental crawler, which allows duplicate occurrences of pages and treats crawling as a continuous process.
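The batch behavior described above can be sketched in a few lines of Python. The in-memory FAKE_WEB map is a hypothetical stand-in for real HTTP fetching and HTML parsing, and all names here are illustrative assumptions:

```python
from collections import deque

# A stand-in for real HTTP fetching: maps a URL to (content, outlinks).
# In a real crawler this would issue an HTTP request and parse the HTML.
FAKE_WEB = {
    "http://a.example": ("page A", ["http://b.example", "http://c.example"]),
    "http://b.example": ("page B", ["http://c.example"]),
    "http://c.example": ("page C", ["http://a.example"]),
}

def batch_crawl(seed):
    """Batch-style crawl: each page is fetched at most once (no duplication),
    producing a single snapshot of the reachable pages."""
    visited, pages = set(), {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # batch crawler: skip already-downloaded pages
        visited.add(url)
        content, outlinks = FAKE_WEB[url]
        pages[url] = content
        queue.extend(outlinks)
    return pages

snapshot = batch_crawl("http://a.example")
```

An incremental crawler would instead revisit URLs continuously and keep every downloaded version rather than skipping pages already in `visited`.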

There are several parameters for measuring crawler performance:

1. The importance of the page, measured by: keywords (unique terms or their frequency); similarity to the user's query description; similarity to seed pages, computed as the cosine similarity to the relevant ones; classifier scores, assigned based on the classifier's existing knowledge; retrieval-system ranking, which uses many crawlers; and link popularity, computed with the PageRank or HITS algorithm.

ISBN: 978-0-9853483-3-5 ©2013 SDIWC 174

2. Precision and recall. Unreliable information misleads Internet users who rely on the Web as a major source of knowledge. Search engines focus on retrieving the Web pages that are most popular and most relevant to the user's query, without taking the credibility of those pages into consideration.

Many studies and algorithms focus on measuring page rank, the relevance of results to the user's query, and users' browsing behavior on the Web, using data mining techniques.
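The PageRank popularity measure referred to above can be sketched as a simple power iteration. This is the standard textbook formulation, not the exact variant any particular search engine uses:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling node: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# C collects links from both A and B, so it ends up with the highest rank
```

The ranks form a probability distribution over pages, so they sum to 1.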

This research aims to study the credibility of Web pages by translating credibility guidelines into several measurements.

We will study the credibility from three perspectives:

1. Domain or website: measured in terms of domain age, number of pages indexed in various search engines such as Google, inlinks, outlinks, number of broken links, website size, number of authenticated pages, trust rank, popularity, traffic, number of materials and publications, number of contacts, and freshness.

2. Web page/file: each web page or file will be measured in terms of freshness, popularity, trust, inlinks, outlinks, and age.

3. Author: authors are measured by the number of citations and, for example, the number of pages indexed in Google.

2 RELATED WORKS

The growing number of Web users has made the Web a dynamically changing environment in many areas. Researchers recognize the credibility of Web pages as a measure of Web page quality, and many users are concerned with finding adequate and trusted sources of information to gain the knowledge they need. Researchers have studied many elements of credibility, such as the freshness and publication dates (P-dates) of Web pages; these are extremely important for verifying the quality of Web content, where older Web pages are supposed to be a strong indication of confidence for users. Yet most search engines rely on the relevance between the user's query and Web page content when retrieving results, without considering novelty. Another element studied as an indicator of credibility is finding the citations of published papers, which helps users evaluate academic research and gauge the strength or weakness of an author; obtaining an author's citations is itself a challenging task.

In [1], the authors study the freshness of a Web page in terms of two elements, the page's own freshness and the freshness of its inlinks, and then apply a temporal correlation.

In [2], the authors developed a temporal ranking algorithm; their work beats the PageRank algorithm by 17.8% and 13.5% in terms of NDCG. The limitation of their work is that not all Web pages of a given website are indexed or archived.

In [3], the author defined five elements of trusted Web pages. Three of them (expertise, experience, and impartiality) express the relation between a user and a topic, while affinity and track record express the relation between two users. The authors developed the Hoonoh ontology, whose name combines "who" and "know", to capture the relations involved in searching for trusted information, and built a search engine on top of it to help users seek trusted information on the Web and to provide worthwhile suggestions and directions for their queries.

In [4], the authors developed a supervised machine learning method to detect P-dates, using linguistic information and layout information extracted from the Document Object Model (DOM) tree of Web pages as learning features. Experiments show that the model achieves a higher F1 score for English and Chinese Web pages across three types of dates: first, last, and latest. A page ranking model was then improved using the P-dates, relevance scores between the user's query and page content, and page importance scores.

In a preprocessing phase, the Web page is taken as input and split into a series of units, each consisting of a temporal element and text content; the output is represented as a DOM tree. In the training phase, each P-date is assigned a score. In the post-processing phase, P-dates are extracted using heuristic rules based on the following elements:

1. Linguistic information, including temporal elements, the count of numeric characters, the count of alphabetic characters, and words that indicate publication, such as "updated", "published", and so on.

2. The location of the unit on the Web page: for example, before the title, after the title, at the bottom, or at the end.

3. The formatting of the information, such as font type, alignment, and font size.
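A rough sketch of how such heuristic rules might be applied is below. The cue words, date format, and scoring weights here are illustrative assumptions, not the actual features used in [4]:

```python
import re

# A date pattern is more likely to be the publication date (P-date)
# when a linguistic cue word appears in the same unit.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")          # e.g. 2013-05-21
CUE_RE = re.compile(r"\b(published|updated|posted)\b", re.I)

def score_units(units):
    """Score each text unit; the highest-scoring date is the guessed P-date."""
    scored = []
    for text in units:
        date = DATE_RE.search(text)
        if not date:
            continue
        score = 1.0
        if CUE_RE.search(text):
            score += 1.0  # cue word near the date raises confidence
        scored.append((score, date.group()))
    return max(scored) if scored else None

units = [
    "Contact us at 2010-01-01 street",   # date-like, no cue word
    "Published: 2013-05-21",             # cue word + date
    "Copyright notice",                  # no date at all
]
best = score_units(units)  # → (2.0, '2013-05-21')
```

The location and formatting features from the list above would enter the same score as additional weighted terms.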

The page ranking is then calculated according to the following formula:

rank(i) = α · sim(i, q) + β · f(i) · PageRank(i)

where sim(i, q) is the relevance of page i to query q, f(i) is the P-date (freshness) score of page i, and α and β are weighting coefficients.

A limitation of this work is that the implicit P-date of the Web page is not considered.
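The ranking formula translates directly into code. The weight values and inputs below are arbitrary placeholders chosen for illustration:

```python
def rank_score(sim, freshness, pagerank, alpha=0.6, beta=0.4):
    """rank(i) = alpha * sim(i, q) + beta * f(i) * PageRank(i).
    sim: query relevance in [0, 1]; freshness: P-date score f(i) in [0, 1];
    pagerank: importance score. alpha/beta here are illustrative weights."""
    return alpha * sim + beta * freshness * pagerank

# A fresh, relevant page outranks an equally relevant but stale one.
fresh = rank_score(sim=0.8, freshness=0.9, pagerank=0.5)
stale = rank_score(sim=0.8, freshness=0.1, pagerank=0.5)
```

With these numbers the fresh page scores 0.66 against 0.50 for the stale one, showing how the freshness term modulates the popularity contribution.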

In [5], the authors proposed an approach for obtaining an author's citations using his or her name and vocabulary extracted from the titles of the published articles. The approach is applied using Google Scholar, with a filter on the data as a preprocessing phase. Their results give an average sensitivity of 98% and specificity of 72% compared with traditional search.

The limitation of their work relates to the accuracy obtained with the vocabulary filter. They recommend applying other kinds of filtering to words, such as handling plurals and misspellings, or using a clustering technique as a preprocessing phase.
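A minimal sketch of the kind of vocabulary filtering involved is shown below, with a crude plural rule standing in for real stemming. All names, rules, and thresholds are hypothetical, not those of [5]:

```python
def normalize(term):
    """Crude plural/case normalization (a stand-in for proper stemming)."""
    term = term.lower()
    if term.endswith("s") and len(term) > 3:
        term = term[:-1]
    return term

def title_match(query_terms, result_title, threshold=0.5):
    """Accept a search result as citing the author's paper when enough
    normalized title vocabulary overlaps with the query terms."""
    q = {normalize(t) for t in query_terms}
    r = {normalize(t) for t in result_title.split()}
    return len(q & r) / len(q) >= threshold

terms = ["credibility", "metrics", "websites"]
hit = title_match(terms, "On credibility metric for website ranking")
miss = title_match(terms, "Cooking recipes for beginners")
```

The normalization makes "metrics"/"metric" and "websites"/"website" match, which is exactly the plural-handling the authors recommend adding.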

In [6], the authors proposed a system to help users judge the credibility of Web search results and to search for credible Web pages, providing brief knowledge of a given topic. Conventional Web search engines present only titles, snippets, and URLs, which give users few clues for judging the credibility of search results.

Moreover, the ranking algorithms of conventional Web search engines are often based on the relevance and popularity of pages. The authors implemented three functions: (1) computing and visualizing credibility scores of Web pages, (2) using users' credibility feedback to estimate a model of users' credibility decisions, and (3) re-ranking Web pages based on that feedback.

In [7], the author proposed an approach for measuring the credibility of Web articles using Wikipedia articles, for two reasons: first, their wide use by students and researchers; second, Wikipedia is a free online encyclopedia. 200 articles were selected for testing; the key sentences of each article were extracted and assigned a score, taking into account natural language processing features such as text similarity and word count, and credibility was also measured using the PageRank algorithm. The key sentences of the articles were then tested using Google. According to the author, the summary findings are:

1. Google does not retrieve credible search results based on the key sentences of an article.

2. Google returns untrusted and unrelated Web pages.

3. Key sentences retrieve credible Web pages when there is an exact match, but perform poorly with partial matches among the pages Google retrieves.

4. Credibility differs when a key sentence uses different words or synonyms, or contains more or fewer words.

5. The key sentences may not be clear.

6. Some key sentences depend on the trustworthiness of the author, because they are used in a specific domain.

The following is a list of studies that developed trust metrics; each focuses on one perspective and neglects the rest.

In this paper, we will integrate some of these metrics and assign a score for each website.

1. Compete Rank: an online service that provides users with a website's traffic and its usage, measured by number of visitors [8].

2. Search Engine Optimization (SEO) scores: researchers developed a formula that uses a website's content, such as its number of links, images, and unique terms, to measure its credibility [9].

3. Alexa Traffic: an online service that provides a website's traffic globally and locally, and the top 100 websites linking to it [10].

4. Wayback Machine (WBM): an archive of more than 150 billion Web pages since 1996; it provides important metrics such as the number of indexed pages for a given domain, the domain age, and the update frequency of a given Web page [11].

5. PageRank: described in the related works section.

6. The number of pages indexed for a given domain, which is an indicator of its credibility.
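Since this paper's approach is to integrate such metrics into a single score per website, one simple realization is a weighted sum over normalized metric values. The metric names and weights below are illustrative assumptions, not a calibrated formula from the paper:

```python
def credibility_score(metrics, weights):
    """Weighted sum of normalized metrics. Each metric value is assumed
    to be normalized to [0, 1] beforehand (e.g. min-max scaled over the
    dataset); the weights here are illustrative, not calibrated."""
    return sum(weights[name] * metrics[name] for name in weights)

weights = {"seo": 0.2, "traffic": 0.3, "age": 0.2, "indexed_pages": 0.3}
site = {"seo": 0.7, "traffic": 0.9, "age": 0.8, "indexed_pages": 0.95}
score = credibility_score(site, weights)
```

Because the weights sum to 1 and each input lies in [0, 1], the resulting score is also in [0, 1] and can be compared across websites.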

3 WEBSITE CREDIBILITY

Website credibility is an indicator of how much to trust or believe what a website says (i.e., its content). It consists of two elements. The first is trustworthiness, where terms such as unbiased, truthful, good, and honest are attributed to the website. The second concerns the level of expertise, referred to with terms such as experienced, intelligent, powerful, and knowledgeable. It is also agreed that credibility is a "perceived quality" [12].

The aim of this paper is to highlight metrics for assessing the credibility of websites in order to give users important clues about a particular site. We used a case study of several Jordanian websites selected from three sectors: universities, banks, and e-government. Tables 1 and 2 show a sample of credibility metrics measured for several websites of universities, banks, and e-government entities in Jordan. These sectors were selected because their websites should provide highly credible information, and the entities that own them are liable for any incorrect information they publish.

Results showed that universities obtain higher trust ranks than banks and e-government websites, due to several factors such as the large potential audience, the age of those websites, their popularity, etc.

Table 1: Metrics related to credibility measured for several Jordanian universities. ("—" marks a value missing in the source.)

Univ. | Visits | SEO  | Alexa   | Traffic in JO | Sites linking in | Age in days | Indexed pages in Google | Trust Rank
JU    | 3111   | 0.7  | 132839  | 73            | 1169             | 5042        | 983                     | 5.69
JUST  | 3270   | 0.84 | 303883  | 181           | 763              | 5066        | 976                     | 4.54
YU    | 2352   | 0.81 | 191279  | 93            | 893              | 4911        | 990                     | 5.24
HU    | 277    | 0.58 | 239692  | 219           | 662              | 4206        | 972                     | 4.05
UOP   | 1908   | 0.6  | 208004  | 591           | —                | 4127        | 978                     | 5.56
AABU  | 190    | 0.89 | 530194  | 463           | 469              | 4747        | 981                     | 3.74
PHIL  | 2350   | 0.81 | 210676  | 802           | 2039             | 4447        | 986                     | 5.92
BAU   | 947    | 0.55 | 351294  | 238           | 448              | 4601        | 992                     | 4.38
Mutah | 325    | 0.83 | 678491  | 702           | 495              | 4767        | 971                     | 3.38
Zayt. | 367    | 0.52 | 2131857 | 5044          | 253              | 3490        | 917                     | 2.34
ASU   | 704    | 0.68 | 913016  | 1760          | 305              | 4711        | 965                     | 2.85
GJU   | 443    | 0.84 | 1754165 | 2811          | 825              | 2317        | 982                     | 2.73
IPU   | 118    | 0.65 | 2077034 | 2685          | 189              | 3907        | 918                     | 2.31
AHU   | —      | 0.52 | 1428590 | 2604          | 209              | 3731        | 979                     | 2.22
ZPU   | 97     | 0.95 | 3162594 | 3153          | 230              | 4090        | 953                     | 2.34

Table 2: Metrics related to credibility measured for several Jordanian banks and e-government websites. ("—" marks a value missing in the source.)

Site | Visits | SEO  | Alexa    | Traffic in JO | Sites linking in | Age in days | Pages in Google | Trust Rank
1    | 319    | 0.87 | 484303   | 408           | 92               | 3316        | 708             | 2.69
2    | —      | 0.59 | 2636623  | 3831          | 42               | 3985        | 94              | 1.43
3    | 366    | 0    | 1399037  | 1216          | 111              | 3958        | 428             | 1.91
4    | —      | 0.79 | 1106440  | 993           | 65               | 2431        | 404             | 1.59
5    | 111    | 0    | 1239967  | 2556          | 194              | 5096        | 593             | 2.18
6    | —      | 0.81 | 27772225 | —             | 39               | 4622        | 245             | 0.87
7    | 134    | 0.62 | 756371   | 542           | 238              | 5542        | 580             | 2.66
8    | —      | 0.93 | 8237534  | —             | 11               | 927         | 275             | 0.49
9    | —      | 0.76 | 26528582 | —             | 4                | —           | 52              | 0.48
10   | —      | 0    | 8432702  | —             | 31               | 3633        | 265             | 0.77
11   | —      | 0.86 | 2126147  | 2079          | 2                | —           | 466             | 0.33
12   | —      | 0.83 | 1137603  | 728           | 57               | 3372        | 102             | 1.54
13   | 46     | 0.65 | 1547411  | 1696          | 160              | 4243        | 445             | 1.95
14   | 452    | 0    | 1000681  | 688           | 261              | 4466        | 990             | 2.84
15   | —      | —    | 16194768 | —             | 119              | 3607        | 0               | 0.69
16   | —      | 0.57 | 3814020  | 1141          | 106              | 3877        | 325             | 1.8
17   | —      | 0.69 | 3252366  | 5092          | 281              | 3500        | 493             | 2.22
18   | 509    | 0.53 | 2726580  | —             | 252              | 5123        | 944             | 2.83
19   | 65     | 0.5  | 302803   | 147           | 595              | 4416        | 748             | 3.98
20   | 200    | 0.67 | 970546   | 1692          | 274              | 3962        | 765             | 2.77
21   | —      | 0.27 | 2845981  | 5284          | 138              | 4276        | 839             | 2.62
22   | —      | 0.73 | 3626074  | 3542          | 176              | 3137        | 0               | 1.27
23   | 875    | 0    | 3939072  | —             | 267              | 3446        | 391             | 1.53
24   | —      | 0    | 11740014 | —             | 51               | 2482        | 0               | 0.59

4 RESULTS AND DISCUSSION

We used data mining prediction to evaluate which metric(s) have a significant impact on calculating the credibility of a particular website. It should be noted, however, that experiments in this area are still immature; the trust rank metric appears to be calculated from a few specific attributes while ignoring several others that should also be considered in future work. To convert the trust rank metric into a categorical attribute, we divided its values heuristically into three levels: values below 1 are labeled low, values between 1 and 3 medium, and values above 3 high.
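The discretization step described above can be written as a small helper function, a direct sketch of the stated thresholds:

```python
def trust_label(trust_rank):
    """Discretize the numeric trust rank into the three levels used here:
    below 1 -> low, 1 to 3 -> medium, above 3 -> high."""
    if trust_rank < 1:
        return "low"
    if trust_rank <= 3:
        return "medium"
    return "high"

# Sample values drawn from Tables 1 and 2.
labels = [trust_label(v) for v in (0.49, 2.34, 5.69)]
```

Applying this to every row yields the class labels used for the prediction experiment.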

Figure 1 shows the result of applying the J48 prediction algorithm to the collected dataset, using the trust rank metric as the class label. The figure shows that trust rank depends solely on one attribute: the number of pages indexed in Yahoo. This may indicate that the website calculating trust rank actually takes its data from Yahoo's page counts. As explained earlier, future formulas should take all relevant attributes into consideration rather than focusing on one attribute, which may bias results. Figure 2 shows the accuracy of the predicted rank; both recall and precision are high.
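The single-attribute behavior of the learned tree can be illustrated with a depth-1 "decision stump" on hypothetical data. J48 itself is Weka's C4.5 implementation; this sketch only mimics the case where one split on one attribute suffices:

```python
def fit_stump(values, labels):
    """Fit a one-feature threshold classifier (a depth-1 decision tree).
    Returns (training accuracy count, threshold, left label, right label)."""
    best = None
    pairs = sorted(zip(values, labels))
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= thr]
        right = [l for v, l in pairs if v > thr]
        # majority label on each side of the split
        pred_l = max(set(left), key=left.count)
        pred_r = max(set(right), key=right.count)
        correct = sum(1 for v, l in pairs
                      if l == (pred_l if v <= thr else pred_r))
        if best is None or correct > best[0]:
            best = (correct, thr, pred_l, pred_r)
    return best

# Hypothetical (indexed-pages, trust-class) data where one split is enough:
pages = [94, 245, 325, 708, 917, 976, 990]
cls = ["low", "low", "low", "low", "high", "high", "high"]
acc, thr, _, _ = fit_stump(pages, cls)
```

When a stump like this already classifies every example correctly, a deeper tree gains nothing, which is consistent with a trust rank derived mostly from one underlying count.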

Figure 1: J48 trust rank prediction results.

Figure 2: Trust rank prediction performance metrics.

Tables 1 and 3 show that current trust rank metrics depend highly, and possibly solely, on popularity- and traffic-related metrics. While popularity should of course be an important criterion for indicating trust in a website, since a high number of visitors means the site is known and trusted, it should not be the only or even the major criterion taken into consideration.

In this paper, we focused only on trust rank metrics related to the website as a whole. However, our preliminary investigations showed a triangle of three factors that may impact a website's trust rank: the website itself, the Web pages it contains, and the authors who write on it. Each of these has unique attributes that can define its own trust metric, which may in turn impact the trust rank of the others. For example, authors with high trust ranks usually write or post on websites that also have high trust ranks, and vice versa.

In the second experiment, we studied the effect of the class labels assigned to the trust rank by the original trust rank website (http://www.seomastering.com/trust-rank-checker.php), shown in Table 3.

Table 3: Trust rank class labels.

Range           | Label
More than 5     | Excellent
Between 4 and 5 | Very Good
Between 3 and 4 | Good
Less than 3     | Poor

Figure 3 also agrees with our previous finding that the Yahoo backlinks metric is a major factor in deciding the trust rank. It additionally shows other parameters related to traffic (i.e., the Alexa and visitor metrics).

Figure 3: Trust rank prediction

Figure 4 shows the decision tree for trust rank based on domain type. For the three Jordanian domain types tested (i.e., universities, ministries, and banks), results show that universities have the highest rank values compared to the other two domains. Results also showed that, this time, a good popularity or page rank value came first in distinguishing trust rank across domains.

Figure 4: Trust rank based on domain types

In general, results confirmed two major points regarding the trust rank metric:

There is a clear, high dependence of trust rank on popularity metrics. While popularity should be a major factor, it should not be the de facto basis for judging trustworthiness. It is possible that, because these metrics are easy to collect and less subjective, they are the first to be considered.

While the trust rank checker website claims to base its formula on several other factors, our results and statistics could not substantiate that claim.

5 CONCLUSION

In this paper, we evaluated metrics related to the credibility and authenticity of websites and pages. These metrics indicate the level of confidence and trust users should place in the websites they visit and in their content. Results showed that the issue is very complex: while we listed several important metrics to evaluate, the process of evaluating credibility can still be far more complicated. Results also showed that credibility is an integral process across three major dimensions of a website: credibility of the website itself, credibility of its Web pages and their content, and credibility of the authors of the website and its pages' content.
