Web Mining And Its Applications Computer Science Essay


This paper surveys semantic-based Web mining, a combination of two fast-developing domains: the Semantic Web and Web mining. Both fields address current challenges of the World Wide Web (WWW). The idea is to improve the results of Web mining by exploiting the new semantic structures in the Web, and to use Web mining to help create the Semantic Web. The Semantic Web can make mining the Web much easier because background knowledge is available, and Web mining can in turn construct new semantic structures in the Web. This survey analyses how the two areas approach each other. The paper first introduces the Semantic Web and Web mining techniques, and then discusses semantic-based Web mining and its applications.

The current World Wide Web (WWW) holds a huge amount of data that is often unstructured and understandable only by humans. Although the Web is rich in information, gathering and making sense of that data is difficult because Web documents are largely unorganized and unstructured. The Semantic Web addresses how to create, effectively and efficiently, a machine-understandable and queryable layer of information and knowledge from this unorganized, human-readable Web data. If a computer can understand the meaning behind the information, it can learn what we are interested in and help us find what we want.

The Semantic Web focuses mainly on data and information. Data in the Semantic Web is well defined and linked in a way that supports more effective discovery and automation. Most data on the Web is unstructured and can be understood only by humans, yet the amount of data is so large that it can be processed efficiently only by machines.

The goal of the Semantic Web is to develop enabling standards and technologies that make Web content understandable to both users and machines. Semantic Web information can support data integration, data discovery, navigation, and automation of tasks. The layered architecture of the Semantic Web is shown in Figure 1.

Figure 1: Layers of the Semantic Web

Uniform Resource Identifiers (URI) and Unicode carry over important features of the existing WWW. A URI is simply a Web identifier. Such identification enables interaction with representations of the resource over a network (typically the World Wide Web) using specific protocols such as HTTP or FTP. The purpose of a URI is to identify a Web resource in a uniform way. URIs are used to identify information representations and constructs, including classes, properties and individuals. URIs are a fundamental benefit of Semantic Web technology: they let users know exactly what is being referred to.
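As a small illustration of how a URI names a resource in a uniform way, the sketch below splits a URI into its parts with Python's standard urllib.parse module. The URI itself (http://example.org/people/alice#me) is a made-up example and is not taken from any real vocabulary.

```python
from urllib.parse import urlparse

# Hypothetical URI, used only for illustration.
uri = "http://example.org/people/alice#me"

parts = urlparse(uri)
print(parts.scheme)    # protocol used to reach the resource
print(parts.netloc)    # authority (host) that owns the name
print(parts.path)      # path within that authority
print(parts.fragment)  # fragment identifying a part of the resource
```

Because every part of the identifier follows the same syntax, a program can compare or dereference identifiers without knowing anything about the resource they name.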

Unicode is a character encoding standard that allows all human languages to be read and written on the Web in a standardized form.

Extensible Markup Language (XML): XML is a language used to transport and store data on the Web. XML is designed only to carry data, not to display it. XML documents contain user-defined tags. An XML schema describes the structure of an XML document; the XML Schema language is also known as XSD (XML Schema Definition). XML namespaces are used in the Semantic Web to avoid naming conflicts. The XML layer aims to provide the basic syntax and structure of data on the Web.
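The role of namespaces in avoiding name conflicts can be sketched with Python's standard xml.etree.ElementTree module. In the example below two vocabularies both define a tag called "title"; the namespace URIs, prefixes and values are hypothetical and serve only to show how the two tags stay distinct.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a "title" tag; namespaces keep them apart.
# The namespace URIs below are hypothetical.
doc = """
<catalog xmlns:bk="http://example.org/book"
         xmlns:ppl="http://example.org/person">
  <entry>
    <bk:title>Web Mining Basics</bk:title>
    <ppl:title>Dr.</ppl:title>
    <ppl:name>A. Author</ppl:name>
  </entry>
</catalog>
"""

# Map prefixes to namespace URIs for use in find() queries.
ns = {"bk": "http://example.org/book", "ppl": "http://example.org/person"}
root = ET.fromstring(doc.strip())
entry = root.find("entry")

book_title = entry.find("bk:title", ns).text
person_title = entry.find("ppl:title", ns).text
print(book_title)
print(person_title)
```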

Resource Description Framework (RDF): RDF is a framework for the Semantic Web that is commonly serialized in XML. It is used to describe information and resources on the Web in terms of classes, properties and values. RDF represents metadata about World Wide Web resources, such as the author, title, and modification date of a Web page, and it can also be used to store other data. The Semantic Web uses RDF as its primary representation language and as a means of data interchange on the Web.
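RDF statements are usually pictured as subject, predicate, object triples. The following minimal sketch models the page metadata mentioned above (author, title, modification date) as plain Python tuples. It does not use an RDF library, and the page URI, the vocabulary URI and all values are assumptions made for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# All URIs and values below are hypothetical examples.
PAGE = "http://example.org/index.html"
VOCAB = "http://example.org/terms/"

triples = [
    (PAGE, VOCAB + "author", "A. Author"),
    (PAGE, VOCAB + "title", "Home Page"),
    (PAGE, VOCAB + "modified", "2012-01-15"),
]

def describe(subject, graph):
    """Collect the property/value pairs stated about one resource."""
    return {p: o for s, p, o in graph if s == subject}

metadata = describe(PAGE, triples)
for prop, value in metadata.items():
    print(prop, "->", value)
```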

Resource Description Framework Schema (RDFS): RDFS is an extension of RDF. RDF Schema provides a framework for describing application-specific classes and properties, and is used to describe taxonomies of classes and properties. RDFS does not itself define application-specific classes and properties; it supplies the means to define them. In this respect it is similar to object-oriented programming (OOP).
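The analogy with class hierarchies in object-oriented programming can be sketched as follows. The code stores a small taxonomy as child-to-parent links, in the spirit of the RDFS subclass relation, and follows them upward. The class names are hypothetical, and the sketch assumes each class has at most one direct superclass and that the taxonomy contains no cycles, which RDFS itself does not require.

```python
# Subclass links, child -> parent. Class names are hypothetical.
# Assumption: single parent per class and no cycles.
SUBCLASS_OF = {
    "Student": "Person",
    "Lecturer": "Person",
    "Person": "Agent",
}

def superclasses(cls, taxonomy):
    """Return every ancestor of cls by following subclass links upward."""
    result = []
    while cls in taxonomy:
        cls = taxonomy[cls]
        result.append(cls)
    return result

def is_instance_of(instance_class, target, taxonomy):
    """An instance of a class is also an instance of all its superclasses."""
    return instance_class == target or target in superclasses(instance_class, taxonomy)

print(superclasses("Student", SUBCLASS_OF))
print(is_instance_of("Student", "Agent", SUBCLASS_OF))
```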

Web Ontology Language (OWL): OWL is built on top of RDF and can be written in XML. OWL is used to represent rich and complex knowledge about things and the relationships between them, and it supports the processing of information on the Web. OWL is a part of the Semantic Web stack. There are two types of OWL properties: object properties and datatype properties. The OWL layer is used to represent the ontologies of the Semantic Web.

Rule Interchange Format (RIF) and Semantic Web Rule Language (SWRL): RDFS and OWL have defined semantics that can be used for reasoning over Web data. SWRL builds on the OWL Lite and OWL DL sublanguages of OWL. It is also based on XML. RIF and SWRL provide rules for the Semantic Web.

SPARQL Protocol and RDF Query Language (SPARQL): SPARQL is a query language and protocol for RDF. It is used to query RDF data as well as RDFS and OWL ontologies. SPARQL is based on the RDF data model. The results of SPARQL queries can be returned in XML.
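The core idea behind a SPARQL query, matching a pattern that contains variables against RDF triples, can be illustrated with a toy matcher. This is only a sketch of matching a single triple pattern, not an implementation of SPARQL; the function name match, the ex: prefixed names and the data are all hypothetical.

```python
def match(pattern, graph):
    """Match one triple pattern against a list of triples.

    Terms starting with "?" are variables; other terms must match exactly.
    Returns one dict of variable bindings per matching triple.
    """
    results = []
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.get(term, value) != value:
                    break
                bindings[term] = value
            elif term != value:
                break
        else:
            results.append(bindings)
    return results

# Hypothetical data, written with an "ex:" prefix for brevity.
graph = [
    ("ex:page1", "ex:author", "A. Author"),
    ("ex:page2", "ex:author", "B. Writer"),
    ("ex:page3", "ex:author", "A. Author"),
]

# Roughly: SELECT ?page WHERE { ?page ex:author "A. Author" }
rows = match(("?page", "ex:author", "A. Author"), graph)
pages = [row["?page"] for row in rows]
print(pages)
```

A real query engine joins many such patterns and supports filters and optional parts; the sketch covers only the simplest case.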

Proof and Trust layer:

The Proof layer is used to verify whether the results produced by agents should be believed, or to authenticate agent behaviour. The Trust layer provides a mechanism for trust and confidence between information users (human or machine) and information sources.

User interfaces and applications are built on top of these layers. The layers of the Semantic Web are summarized in Table 1.

Web Mining:

The Web is a collection of inter-related files on one or more Web servers. Web mining is the use of data mining techniques to extract knowledge from the Web. It is a helpful tool for transforming human-understandable content into machine-understandable semantics. The classification of Web mining techniques is shown in Figure 2.

Figure 2: Classification of Web Mining Techniques

Web content mining: Web content mining is the process of extracting information from the contents of Web documents. It examines the content of Web pages as well as the results of Web searching. Content data corresponds to the collection of facts a Web page was designed to convey to its users. Web content may be unstructured (plain text), semi-structured (HTML documents), or structured (extracted from databases into dynamic Web pages). Such dynamic data cannot be indexed and constitutes what is called "the hidden Web". A research area closely related to content mining is text mining.
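A very simple form of content mining on a semi-structured HTML document is to strip the markup and count terms. The sketch below does this with Python's standard html.parser module. The sample page, the class name TextExtractor and the function term_frequencies are assumptions for illustration; real content mining would add steps such as stop-word removal and stemming.

```python
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def term_frequencies(html):
    """Return a Counter of lower-cased words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).lower().split()
    return Counter(w.strip(".,;:!?") for w in words)

# Hypothetical page used only for illustration.
page = (
    "<html><body><h1>Web mining</h1>"
    "<p>Mining the web finds patterns.</p>"
    "<script>var x = 1;</script></body></html>"
)
freq = term_frequencies(page)
print(freq.most_common(2))
```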

Web structure mining: Web structure mining is mostly concerned with the hyperlinks between Web pages. It is the process of mining structural information from the Web and is used to improve the structure of Web pages. Based on their hyperlinks, Web pages and related information can be categorized, including at the inter-domain level.
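One basic piece of structural information is how many pages link to a given page. The sketch below counts in-links over a small hand-written link graph. The page names and the graph are hypothetical, and in-link counting is used here only as a simple stand-in for the more elaborate link analysis methods used in practice.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "home": ["about", "products", "blog"],
    "about": ["home"],
    "products": ["home", "blog"],
    "blog": ["products"],
}

def in_link_counts(graph):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts

counts = in_link_counts(links)
# Most-linked pages first; ties are broken alphabetically.
ranked = sorted(counts, key=lambda p: (-counts[p], p))
print(ranked)
```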

Web usage mining: Web usage mining is the process of extracting information from server logs, i.e. users' history and browsing behaviour. The logs can be examined from a client perspective or a server perspective. The process takes usage data as input, i.e. the data in the Web server logs that record users' visits to the Web site. Web usage mining identifies browsing patterns by analysing users' navigational behaviour.
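The step from raw log records to browsing patterns can be sketched as follows: visits are grouped per user, ordered by time, and page-to-page transitions are counted. The log format used here ("user timestamp page") is a simplified, made-up format; real server logs carry more fields and usually need session identification first.

```python
from collections import Counter, defaultdict

# Simplified, made-up log records: "user timestamp page".
log_lines = [
    "u1 100 /home",
    "u2 101 /home",
    "u1 105 /products",
    "u2 107 /blog",
    "u1 110 /cart",
    "u3 111 /home",
    "u3 115 /products",
]

def navigation_paths(lines):
    """Group page visits by user, ordered by timestamp."""
    visits = defaultdict(list)
    for line in lines:
        user, ts, page = line.split()
        visits[user].append((int(ts), page))
    return {u: [p for _, p in sorted(v)] for u, v in visits.items()}

def transition_counts(paths):
    """Count how often one page is followed directly by another."""
    counts = Counter()
    for pages in paths.values():
        counts.update(zip(pages, pages[1:]))
    return counts

paths = navigation_paths(log_lines)
transitions = transition_counts(paths)
print(transitions.most_common(1))
```

The most frequent transitions give a first, rough picture of how users navigate the site.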

To this end, Web data (usage, content, structure) are represented using an emerging model of representation: ontologies. This representation bridges the gap between the Semantic Web and Web mining areas, creating the research area of semantic-based Web mining [1].

Semantic based Web Mining:

Semantic-based Web mining is a combination of two fast-developing domains: the Semantic Web and Web mining. The term can be read both as Semantic (Web Mining) and as (Semantic Web) Mining. The Semantic Web addresses the challenge of making data understandable to both machines and users, while Web mining addresses the automatic extraction of useful knowledge and hidden information from Web data. Semantic-based Web mining is essentially the mining of information pertaining to the Semantic Web. Web pages are mined so that machines can better understand the information they contain. It also means mining data sources in order to develop an effective Semantic Web. Figure 3 illustrates the semantic-based Web mining technologies.

Figure 3: Semantic based web mining Technologies

It is basically the mining of XML and RDF documents along with ontologies and metadata. Semantic-based Web mining includes mining the data sources and information relating to the information management technologies on the Web. Semantic Web mining is expected to develop from Web mining. The goal of semantic-based Web mining is to make the Web easier to use and to help both humans and machines perform their tasks better. Semantic Web requirements are considered in three major groups: ontology, Semantic Web content and Web services. Table 2 describes the Web mining techniques used in the Semantic Web.

An ontology approach: Ontology is the backbone of the Semantic Web. A Semantic Web vocabulary can be considered a special form of (usually lightweight) ontology, or simply a collection of URIs with a usually informally described meaning. Ontologies are represented in a formal ontology language. In [2, 3] ontology plays a major role. An ontology is represented as a set of concepts and their inter-relationships within some knowledge domain; it can be defined as the set of objects, concepts, and other entities that exist in some area, together with the relationships that hold among them. The knowledge provided by an ontology is very useful in defining the structure and scope for mining Web content. Figure 4 illustrates the ontology and semantic-based Web mining representation.

Semantic-based Web mining also mines the ontologies on the Web; using ontologies makes the Web more intelligent. Ontologies are developed from metadata. For example, RDF schemas can be mined for metadata and ontologies. RDF Schema is used in developing the Semantic Web, i.e. information or data mined from ontologies and RDF schemas can be used to improve the understanding of Web pages. Semantic Web technologies represent meaning using ontologies and provide reasoning over the rules, logic, relationships and conditions represented in the ontologies.


Mahindra Pratap et al., [4] The Internet has developed from a collection of static HTML pages into a medium that provides dynamic, interactive content. Semantic-based Web mining applications play a major role in e-commerce for managing business processes. Semantic Web applications are decentralized and open, and operate on distributed data. They follow semi-structured schemas.

Berendt, B et al., [5] Applications of semantic-based Web mining span many areas, such as e-activities, health care, bioinformatics, privacy and security, knowledge management and information retrieval. It offers great opportunities in finance, business, marketing, commerce, education, and research and development.

P. Markellou et al., [6] Web mining research in E-learning focuses on Web usage mining, based on how students perform and on their activities. Nowadays digital libraries are also accessible from the Web, and many commercial institutions are moving their businesses and services online. The challenge for semantic-based Web mining technologies in the E-learning domain is to provide personalized experiences for users. Such applications can take into account the individual needs and requirements of users or learners.

Lappas, G [7] Semantic-based Web mining is applied in E-services areas such as E-Government, E-Politics and E-Democracy. So far, only Web mining applications have been related to these areas. Most government information is placed on the Web. Current Web mining research on E-Politics is based on Web structure mining to identify political groups. It appears that the fields of E-services and Web mining have recently brought leading benefits to society.

Dzeroski S et al., [8] Semantic-based Web mining is also applied in genetics, social network analysis, the analysis of molecules and natural language processing. It is also being applied in search engines.

Ontology-based applications of semantic-based Web mining are grouped into two classes: improved search of Web data and better Web browsing capability.

Naing et al., [9] Improved search of Web data: with a semantic ontology, Web data can be indexed by concept and relationship. Using the concepts and relationships of the ontology provides a better search.
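The idea of indexing Web data by concept and relationship can be sketched in a few lines. This is not the method of the cited work, only a minimal illustration under assumed data: concept_index maps concepts to the pages annotated with them, and narrower records a hypothetical "has narrower concept" relationship that is used to expand a query.

```python
# Hypothetical concept index: concept -> pages annotated with it.
concept_index = {
    "Laptop": {"page1", "page4"},
    "Tablet": {"page2"},
    "Computer": {"page3"},
}

# Hypothetical ontology relationship: concept -> narrower concepts.
narrower = {"Computer": ["Laptop", "Tablet"]}

def search(concept):
    """Return pages for a concept and for all concepts narrower than it."""
    pages = set(concept_index.get(concept, set()))
    for child in narrower.get(concept, []):
        pages |= search(child)
    return pages

print(sorted(search("Computer")))
print(sorted(search("Tablet")))
```

A keyword search for "Computer" would miss pages that only mention laptops or tablets; following the relationship in the ontology retrieves them as well.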

Jean Vincent et al., [10] Better Web browsing capabilities: Web pages can be browsed using the concepts and relationships of an ontology. If instances of concepts and relationships can be created for Web pages, virtual links can be established between Web pages that belong to the concepts of interest. This provides better Web browsing capabilities.

Xiaohui Tao et al., [11] An ontology-based application of semantic-based Web mining is the semantic-based personalized Web search engine, which displays only the Web pages recommended for the user. The displayed pages are personalized according to the user's interests. Personalized Web pages take advantage of both the Semantic Web and Web mining, and may improve Web search.

Table 3: Semantic based web mining applications


In this survey we have studied two fast-developing research areas of the World Wide Web: Web mining and the Semantic Web. The combined area of Semantic Web mining offers new techniques to improve both. It can improve the results of Web mining by exploiting the new semantic structures in the Web, and it can use Web mining to build up the Semantic Web. The Semantic Web can make mining the Web much easier because background knowledge is available, and Web mining can in turn construct new semantic structures in the Web. The resulting research benefits many areas, such as e-activities, health care, bioinformatics, privacy and security, search engines, knowledge management and information retrieval.




Figure 4: Ontologies and web mining




Layer | Name | Function
Layer 1 | URI and Unicode | Unicode: character encoding for processing resources. URI: identification of resources.
Layer 2 | XML | Represents the content and structure of data.
Layer 3 | RDF and RDF Schema | Describes resources on the Web and their types.
Layer 4 | OWL | Describes the various types of resources and the relationships between them.
Layer 5 | RIF and SWRL | Performs logical reasoning on the basis of the four layers below.
Layer 6 | SPARQL | Query language and protocol for RDF.
Layer 7 | Proof and Trust | Verifies statements logically in order to draw conclusions, and establishes a trust relationship between users.

Table 1: Layers of Semantic web

Web mining technique | Ontology: construction | Ontology: management | Semantic web content: semantic annotation | Semantic web content: metadata access | Web services
Web content mining | Highly used | Highly used | Highly used | Highly used | Rarely used
Web structure mining | Rarely used | Not used | Rarely used | Rarely used | Not used
Web usage mining | Rarely used | Highly used | Highly used | Rarely used | Highly used

Table 2: Web mining technique used in Semantic web


Author | Semantic based web mining applications
Mahindra Pratap et al., [4] | E-Commerce for business processes
Berendt, B et al., [5] | Health care, bioinformatics, privacy and security, and information retrieval
P. Markellou et al., [6] | E-Learning
Lappas, G [7] | E-Services such as E-Politics, E-Government and E-Democracy
Dzeroski S et al., [8] | Genetics, social network analysis and natural language processing
Naing et al., [9] | Improved search of Web data
Jean Vincent et al., [10] | Better Web browsing capabilities
Xiaohui Tao et al., [11] | Semantic based personalized search engine

Table 3: Semantic based web mining applications