Querying RDF in the Semantic Web Using SPARQL


The Semantic Web aims to make the present web more machine-readable so that intelligent agents can retrieve and manipulate pertinent information. It can be viewed as data integrated from various sources, combined with intelligent search. The Resource Description Framework (RDF) is a framework for describing and interchanging metadata on the Web. The RDF data model is used to represent data on the Web and is commonly serialized as XML. SPARQL, the RDF query language, defines a standard query language and data access protocol for use with the RDF data model, and it works with any data source that can be mapped to RDF. Because RDF data sets are generally very large, an effective and efficient technique is needed to retrieve data quickly. In this paper we propose a framework for SPARQL queries, in the form of a model, that evaluates results efficiently by rewriting the SPARQL queries. The paper also discusses various approaches to optimizing SPARQL.

Keywords: Semantic web, RDF, SPARQL, TWINKLE, Jena ARQ.


Semantic Web

The promise of the Semantic Web is based on the principle that online content will be semantically annotated, creating machine-understandable content using interlinking ontologies [1]. The information on the web should thus be expressed in a meaningful way accessible to computers. The Semantic Web uses the Resource Description Framework (RDF) as its basic data format, which aims to represent information about resources [4].

RDF is the data model recommended by the W3C for representing information about resources on the Web. The RDF specification includes a set of reserved keywords with their own semantics, the RDFS (RDF Schema) vocabulary. This vocabulary is designed to describe special relationships between resources, such as typing and inheritance of classes and properties [7].

The Semantic Web is the Web of data, whose fundamental principle is the creation and use of semantic metadata. Various tools have been developed, and more are being developed, in ongoing Semantic Web research projects. These tools may help with overall Semantic Web development or with ontology development, which supports various applications and helps in knowledge management. They often provide easy-to-use functionality and an environment for consistency checking, promote easy and fast navigation between concepts, include tutorial support, and offer plug-ins [4].

Fig. 1: Semantic Web Layered Architecture [1]


RDF is used to describe the resources available on the web and to identify the relationships between them. It is a general-purpose language for representing web metadata. Its main purpose is to represent the semantics (meaning) of web metadata and to support reasoning about it [6].

The RDF Data Model

RDF is a data model for representing information about World Wide Web resources. The W3C design principles that RDF follows are interoperability, extensibility, evolution and decentralization. Above all, RDF was designed to have a simple data model with a formal semantics and provable inference, and an extensible URI-based vocabulary [11]. The model allows anyone to enquire about any resource. In the RDF data model, data is stored in a universal format: anything that can have a Uniform Resource Identifier (URI) can be described in RDF. RDF data consists of a set of triples of the form (s, p, o), where s is called the subject, p the predicate and o the object of the triple [2]. The language for describing resources is a set of properties, technically binary predicates. Descriptions are statements in subject, predicate, object form, where the predicate is a resource and the object is a resource or a string. Both subject and object can be anonymous objects, known as blank nodes. In addition, the RDF specification includes a built-in vocabulary with a normative semantics (RDFS). This vocabulary deals with inheritance of classes and properties, as well as typing, among other features.

Choosing RDF to store data makes it easy to change the data schema [2]. Information is stored in the form of triples, so adding a new attribute is simply a matter of creating a new triple. In a relational database this usually requires altering a whole table and adding a new column, with some default value for all the information already stored in that table [5].
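The schema-flexibility argument can be illustrated with a minimal sketch in which an RDF data set is modeled as a plain Python set of (subject, predicate, object) tuples. The `http://example.org/` resources, the sample names, and the helper functions `add_attribute` and `describe` are assumptions made up for this illustration; only the FOAF namespace comes from the essay.

```python
# Hypothetical example data; the example.org URIs are invented for illustration.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# An RDF data set modeled as a set of (subject, predicate, object) triples.
triples = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "bob", FOAF + "name", "Bob"),
}


def add_attribute(graph, subject, predicate, value):
    """Adding a new attribute is just adding one more triple."""
    graph.add((subject, predicate, value))


def describe(graph, subject):
    """Return all (predicate, object) pairs recorded for a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


# Only Alice gets the new attribute; Bob's data needs no default value.
add_attribute(triples, EX + "alice", FOAF + "mbox", "mailto:alice@example.org")
print(describe(triples, EX + "alice"))
print(describe(triples, EX + "bob"))
```

In a relational layout the same change would mean altering the table and giving every existing row a value for the new column; here the other subjects are simply left untouched.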

RDF Graphs

An RDF graph is a set of RDF triples. A graph is called ground if it has no blank nodes. Graphically, we represent RDF graphs as follows: each triple (s, p, o) is represented by an edge from s to o labeled p, where s is the subject, p is the predicate and o is the object. Notice that the set of arc labels can have a non-empty intersection with the set of node labels. Thus, technically speaking, an "RDF graph" is not a graph in the classical sense [7].
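Both observations about RDF graphs can be checked mechanically. The sketch below assumes that blank nodes are written with a leading `_:` (the convention used in N-Triples and Turtle) and that all terms are plain strings; the sample graph and the function names are hypothetical.

```python
def is_blank(term):
    # Assumed convention: blank node labels start with "_:".
    return term.startswith("_:")


def is_ground(graph):
    """A graph is ground if no triple has a blank node as subject or object."""
    return not any(is_blank(s) or is_blank(o) for s, _, o in graph)


def shared_labels(graph):
    """Labels that occur both as a node (subject/object) and as an arc."""
    nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}
    arcs = {p for _, p, _ in graph}
    return nodes & arcs


# Hypothetical graph: ex:knows is used as a predicate and also described
# as a resource, and one object is a blank node.
graph = {
    ("ex:knows", "rdf:type", "rdf:Property"),
    ("ex:alice", "ex:knows", "_:b1"),
}

print(is_ground(graph))
print(shared_labels(graph))
```

The second function shows why an RDF graph is not a classical graph: `ex:knows` appears as an arc label in one triple and as a node in another.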

The subject or predicate of an RDF statement is usually a URI (Uniform Resource Identifier), which denotes a resource or a relationship between resources. One popular application of RDF is FOAF (the Friend of a Friend ontology), and the query language for RDF graphs is SPARQL.


SPARQL stands for "SPARQL Protocol and RDF Query Language". It defines a standard query language and data access protocol for use with the RDF data model, and it works for any data source that can be mapped to RDF. Although a number of RDF query languages are available, the Connected Services Framework (CSF) Profile Manager supports only SPARQL queries. SPARQL's constructs closely resemble those of SQL.


A query expressed in a high-level query language must first be scanned, parsed and validated. The scanner identifies the language tokens, such as keywords, attribute names and relation names, whereas the parser checks the query syntax to determine whether it is formulated according to the grammar of the query language. The query must also be validated by checking that all attribute and relation names it uses are valid.
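The scanning step can be sketched for a small, SPARQL-like subset. The token classes, the keyword list and the `scan` function below are simplifying assumptions for illustration only; they do not reproduce the full SPARQL grammar or the scanner of any particular engine.

```python
import re

# Assumed keyword subset; real SPARQL has many more keywords.
KEYWORDS = {"PREFIX", "SELECT", "WHERE", "OPTIONAL", "FILTER"}

TOKEN_RE = re.compile(r"""
    (?P<IRI><[^<>\s]*>)                   # <http://...>
  | (?P<VAR>\?[A-Za-z_]\w*)               # ?name
  | (?P<PNAME>[A-Za-z_][\w-]*:[\w-]*)     # foaf:name or the prefix foaf:
  | (?P<WORD>[A-Za-z_]\w*)                # candidate keyword
  | (?P<PUNCT>[{}.])                      # braces and the triple terminator
  | (?P<SKIP>\s+)                         # whitespace, discarded
""", re.VERBOSE)


def scan(query):
    """Split a query string into (kind, text) tokens or raise SyntaxError."""
    tokens = []
    pos = 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {query[pos]!r} at {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "WORD":
            # A bare word must be a known keyword in this subset.
            if text.upper() not in KEYWORDS:
                raise SyntaxError(f"unknown keyword {text!r}")
            kind, text = "KEYWORD", text.upper()
        tokens.append((kind, text))
    return tokens


for token in scan("SELECT ?name WHERE { ?p foaf:name ?name . }"):
    print(token)
```

A parser would then consume this token list and check it against the grammar, and a validation pass would confirm that the names used in the query actually exist in the data source.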

An internal representation of the query is then created, usually as a tree data structure called a query tree. It is also possible to represent the query using a graph data structure called a query graph.

To execute the query, the processor applies optimization techniques to the query graph and produces an execution plan. The query code generator then generates the code to execute that plan, and the runtime database processor runs the query code to produce the query result.


Like normal query processing, SPARQL query processing follows a cycle to retrieve the data. The SPARQL query is first checked for syntax errors by the parser. The query rewriting stage then optimizes the query by rewriting it. Finally, the QEP (query execution plan) generator produces a plan, which is executed to retrieve the data from the RDF data.

A Framework for SPARQL Query Execution and Optimization

General Query Optimization

A query typically has many possible execution strategies for retrieving the result. The process of choosing a suitable one is known as query optimization. Before optimization, some internal processing has to be done. The steps are as follows:

1- Convert the query into an intermediate form. For SQL this form is basically relational algebra, which may then be converted into a query tree or query graph, also known as an intermediate form of the query. There are basic rules for converting relational algebra (RA) into an equivalent query tree or query graph.

Optimization Techniques

There are two techniques for optimizing a query. The first applies heuristic rules and the second is the cost estimation approach.

1- Heuristic Approach

This approach is widely used in query processing and optimization today. The parser generates an initial internal representation, which is then optimized according to heuristic rules. The main heuristic is to apply first the operations that reduce the size of intermediate results. This means performing SELECT operations as early as possible to reduce the number of tuples, and PROJECT operations as early as possible to reduce the number of attributes. This is done by moving SELECT and PROJECT operations as far down the query tree as possible.
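The effect of pushing a SELECT below a join can be seen in a small sketch. The `employees` and `departments` tables, the salary threshold and the nested-loop `join` helper are hypothetical; the number of row pairs compared stands in for the size of the intermediate work.

```python
# Hypothetical tables, invented for illustration.
employees = [
    {"emp_id": i, "dept_id": i % 3, "salary": 1000 * i} for i in range(1, 7)
]
departments = [
    {"dept_id": d, "dept_name": n} for d, n in enumerate(["HR", "IT", "Ops"])
]


def join(left, right, key):
    """Nested-loop join; also reports how many row pairs were compared."""
    rows = [{**l, **r} for l in left for r in right if l[key] == r[key]]
    return rows, len(left) * len(right)


def select(rows, predicate):
    """Relational SELECT: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def high_paid(row):
    return row["salary"] >= 5000


# Plan A: join first, then select.
joined, cost_a = join(employees, departments, "dept_id")
plan_a = select(joined, high_paid)

# Plan B: the SELECT is pushed below the join, so fewer rows reach it.
filtered = select(employees, high_paid)
plan_b, cost_b = join(filtered, departments, "dept_id")

print(plan_a == plan_b)
print(cost_a, cost_b)
```

Both plans return the same rows, but the second one compares fewer row pairs because the selection has already removed most employees before the join runs.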

2- Cost Estimation Approach

This approach uses a traditional optimization technique that searches the solution space of a problem for a solution that minimizes an objective (cost) function. The processor estimates and compares the execution costs of a query under different execution strategies, and then chooses the strategy with the lowest cost estimate. For this approach to work, accurate cost estimates are required so that the strategies are compared fairly and realistically. In addition, the number of strategies considered should be limited, otherwise too much time is spent estimating the cost of the many possible execution strategies. This approach is therefore more suitable for compiled queries, where optimization is done at compilation time.
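A minimal sketch of cost-based plan selection follows. It assumes a very simple cost model (the sum of estimated intermediate result sizes for a left-deep join order) and made-up statistics for three relations A, B and C; real optimizers use far richer models. The `max_plans` parameter is a hypothetical way to cap how many strategies are examined.

```python
from itertools import permutations

# Hypothetical statistics: row counts and pairwise join selectivities.
cardinality = {"A": 1000, "B": 100, "C": 10}
selectivity = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.1,
    frozenset(("A", "C")): 1.0,  # no join condition: a cross product
}


def estimate_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = cardinality[order[0]]
    cost = 0
    for relation in order[1:]:
        size *= cardinality[relation]
        for previous in joined:
            size *= selectivity[frozenset((previous, relation))]
        joined.append(relation)
        cost += size
    return cost


def choose_plan(relations, max_plans=None):
    """Pick the cheapest join order among at most max_plans candidates."""
    candidates = list(permutations(relations))
    if max_plans is not None:
        candidates = candidates[:max_plans]
    return min(candidates, key=estimate_cost)


best = choose_plan(["A", "B", "C"])
print(best, estimate_cost(best))
```

The quality of the chosen plan depends entirely on how accurate the statistics are, and capping the candidates trades optimization time against the chance of missing the cheapest order.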


Several open-source tools are available for executing SPARQL queries. Among them, TWINKLE and Jena ARQ are the most popular. One sample query and its execution are demonstrated with the TWINKLE tool.

Query: The query below finds the name and, where available, the email (stored as a hashed mailbox) of all employees.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?email
WHERE {
  ?person rdf:type foaf:Person .
  ?person foaf:name ?name .
  OPTIONAL { ?person foaf:mbox_sha1sum ?email }
}

A query tree corresponds to the above SPARQL query.
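One way to picture such a query tree is as nested operators in the style of the SPARQL algebra, where OPTIONAL becomes a left join: a projection on ?name and ?email sits over a left join of the two mandatory triple patterns with the optional one. The sketch below builds that tree as nested tuples and evaluates it over a tiny in-memory graph. The sample triples, the use of prefixed names as plain strings, and the substitution-based evaluator are simplifying assumptions, not how TWINKLE or Jena ARQ work internally.

```python
# Hypothetical data; prefixed names are kept as plain strings for brevity.
data = {
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox_sha1sum", "abc123"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
}

# Query tree: Project(?name ?email, LeftJoin(BGP(...), BGP(...))).
query_tree = (
    "project", ["?name", "?email"],
    ("leftjoin",
     ("bgp", [("?person", "rdf:type", "foaf:Person"),
              ("?person", "foaf:name", "?name")]),
     ("bgp", [("?person", "foaf:mbox_sha1sum", "?email")])),
)


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(patterns, graph, binding):
    """Match every triple pattern in turn, extending the bindings."""
    solutions = [binding]
    for pattern in patterns:
        extended = []
        for b in solutions:
            for triple in graph:
                m = match(pattern, triple, b)
                if m is not None:
                    extended.append(m)
        solutions = extended
    return solutions


def evaluate(node, graph, binding=None):
    binding = binding or {}
    kind = node[0]
    if kind == "bgp":
        return eval_bgp(node[1], graph, binding)
    if kind == "leftjoin":
        out = []
        for left in evaluate(node[1], graph, binding):
            right = evaluate(node[2], graph, left)
            # Keep the left solution even when the optional part fails.
            out.extend(right if right else [left])
        return out
    if kind == "project":
        rows = evaluate(node[2], graph, binding)
        return [{v: b.get(v) for v in node[1]} for b in rows]
    raise ValueError(f"unknown operator {kind!r}")


for row in sorted(evaluate(query_tree, data), key=lambda r: r["?name"]):
    print(row)
```

A person without a hashed mailbox still appears in the result, with ?email left unbound (shown here as `None`), which is the behavior OPTIONAL is meant to provide.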

Query:- The Query below

SPARQL Query Optimization

Due to the declarative nature of SPARQL, a query engine has to choose an efficient way to evaluate a query.

Although all RDF repositories provide query capabilities, some of them require manual interaction to minimize the query execution time.

The SPARQL query graph model (SQGM), together with transformation rules for rewriting a query into a semantically equivalent one, has been proposed for this purpose.

The goal of rewriting is to find an efficient query execution plan.
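As a hedged illustration of rewriting, the sketch below reorders the triple patterns of a basic graph pattern so that patterns with more bound (non-variable) terms are evaluated first. This variable-counting heuristic is a commonly used stand-in and is not the set of SQGM transformation rules; the sample patterns and the function names are assumptions. The rewrite is semantically equivalent because a basic graph pattern is a conjunction, so the order of its triple patterns does not change the result.

```python
def bound_terms(pattern):
    """Count the terms of a triple pattern that are not variables."""
    return sum(1 for term in pattern if not term.startswith("?"))


def rewrite(patterns):
    """Put patterns with more bound terms first; ties keep their order."""
    return sorted(patterns, key=lambda p: -bound_terms(p))


# Hypothetical basic graph pattern, written with prefixed names as strings.
patterns = [
    ("?person", "foaf:name", "?name"),
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:mbox", "mailto:alice@example.org"),
]

for pattern in rewrite(patterns):
    print(pattern)
```

Patterns with more bound terms tend to match fewer triples, so evaluating them first usually keeps the intermediate results small, in the same spirit as moving SELECT operations down a relational query tree.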

Conclusion and Future Work

In this paper we discussed the Semantic Web and its RDF data, and analyzed both general query processing and SPARQL query processing.