Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.
XML Query Routing in Structured P2P Systems
(Leonidas Fegaras, Weimin He, Gowtham Das, David Levine)
- Rahul Rayineni
The structured peer-to-peer network is a decentralized architecture which consists of large number of nodes that share data and resources with other nodes. They use a distributed hash table to determine the location of data. The nodes in structured P2P systems maintains a list of neighbors so that they from a overlay network in which lookup time for a key take logarithmic number of routing hops between peers. They offer better availability and scalability than unstructured P2P systems but the main difficulty in using these systems lies in data placement and query processing as queries are more complex. If the queries are not properly optimized routing and processing takes lot of time. This paper demonstrates about data placement, querying, and indexing large data repositories distributed over an existing DHT based P2P systems like pastry (Reference 4 and 6).
According to Leonidas Fegaras, Weimen He, Gautam das, and David Levine, “There are lot of earlier proposals like XPath lookup queries in P2P networks(Reference1) and Locating data sources in large Distributed Systems(reference 3) on indexing and querying XML data distributed data over a P2P network but there is no work reported on complex XML query processing with full text search that uses data synopses to selectively route queries to peers”.
The framework proposed in this paper is implemented on DHT-based P2P system Pastry (Reference 4 and 6). But this framework can be implemented on any P2P infrastructure. This frameworks works on summarized data namely structural synopsis and data synopsis for mapping XML queries. The high query routing precision, low data placement and low maintenance overheads are achieved through novel data synopses structures. This framework gives more accurate evaluation of textual and containment constraints in a query compared to bloom filters.
The publishing process makes the documents available to other peers while unpublishing is removing the document by the owner. If the peer wants to update the document first it will unpublish the document and later publish it again to reflect the document updates. According to Leonidas Fegaras, Weimen He, Gautam Das, and David Levine,” The XPath syntax is modified to add the predicate e ~ S, where e is an arbitrary XPath expression that returns true if at least from the sequence returned by e matches the search specification, S. A search specification is an IR-style Boolean keyword query that takes the form
“term” | S1 and S2 | S1 or S2 | (S)
where S, S1 and S2 are search specifications. A term is a keyword that must be present in the text of an element returned by the expression e”.
The label path of an XML document consists of child/attribute steps and can distinguish non-empty set of data nodes in the document. There are two types of data synopses used in this framework in which one is content synopses that contain structural summary nodes associated with bit matrices. The second one is positional filters which are bit vectors consists of internal structure summary nodes and their positions. To achieve good load balance with small number of nodes structural summary information and data synopsis are distributed over the existing P2P network. This framework is capable of finding all possible structural summaries applicable to the query with one DHT lookup.
The content synopses consists of index terms along with their positions. Index terms are formed breaking the document in to simple terms. The position of the index term depends on the position of its begin/end tags. The position of the elements tag depends on the number of
the begin and end tags preceding the elements. So index terms of a same element consists of same positional range. The positional bit vector consists of all the positions of the document.
According to Leonidas Fegaras, Weimen He, Gautam Das, and David Levine,” The search specification e ~ t1 and t2 for two terms t1 and t2 becomes true if and only if there is at least one document node returned by e that contains both terms. Using one-dimensional term bitmaps alone, such as bloom filters, and checking whether both the t1 and t2 bits are on, will give us a prohibitive number of false positives.”
The core operation in this framework is containments filtering, CF(F,V) which test element containment. Here F is positional filter and V is a bit vector. The output of the above function is new positional filter F’. In the bit vector V, if there is at least one bit is on with in this range it copies all the bits in F to F’.
Data placement involves placing of structural summary and data synopsis. The structural summaries are routed to every peer using every different tag name. Thus, with a single DHT lookup we can able to find out all structural summaries matching the structural footprint of the query. The label path of data synopsis is used in placing it. Since Multiple documents consists of label path , the synopsis from these documents is placed at a single peer. Thus with single DHT look up we can locate all the documents. Query routing involves collecting and filtering documents all the way. The triples(peer, document, positional-filter) are used as communication between the peers. They contain id of a matching document and owner of the document along with the document positions. At each peer size of the list is reduced by removing the documents whose positional filter are zeroes.
The network updates like node arrival, departure and failure are handled by novel methods when the node arrives the overlay network, it invokes the Pastry method notifyReady() and start sending and receiving messages. It receives all the information like structural summaries and data synopses from its predecessor. Similarly when a node decides to leave the network, it routes all its structural summaries and data synopses to its successor and leave the network. This can be done using one single message. When peer doesn’t find a matching structural summary the predecessor node may be failed. In this case peer chooses another tagname and search request to other peers to find structural summary. The Id of the failed predecessor is used to check for the node failure. If the predecessor is failed it will abort the query and scan the list again for the documents and send a message to publisher to publish the data. The main advantage of this method it will abort only one query at a time and data synopsis is restored that is associated with the failed peer
The closest work to this is done by L.Galanis, Y.Wang, S. R.Jeffery, and D.J.DeWitt. locating data sources in large distributed systems.(Reference 3). In this framework, the distributed indexing targets location of data sources which is different from the framework in this paper. The structural summaries are similar to the one that are presented in this paper. Here they use the tag name of the element that contain the text as search key which is contrary to the framework in this paper in which text was broken before indexing and label paths are used as keys. They don’t address the indexing cost also. According to Leonidas Fegaras, Weimen He, Gautam Das, and David Levine, ” the framework proposed by Galanis (Reference 3) is more suitable for data-centric XML data rather than to document-centric ones, since the later may include large text portions inside specific tag names, which results to the routing of large parts of document to the same nodes.” Another related framework is by A. Bonifati, Xpath Lookup Queries in P2P Networks. WIDM 2004. in which XML data fragments are indexed based on their path. The search key is the hash value of its path. This framework answer simple Xpath queries in one peer hop. The drawback of this framework is it requires additional hops to retrieve complex data fragment. Also this framework doesn’t support XPath predicates. There are other distributed summaries for XML data like XSketch(Reference 5) which is used in selectivity estimation than in query routing. In the paper presented by J.M. Bremer and M. Gertz on distributing XML repositories (Reference 2), the structural summary is used as a global scheme to show how XML data are fragmented and distributed over the network.
My opinion of this paper, it provides best framework for XML routing in structured P2P networks. The data synopsis and content synopsis used for indexing are better than regular bloom filters. The framework maps a query with full-text search into a distributed program that migrates from peer to peer. The index terms used in this framework are label paths which are better single tag names used in the previous frameworks for routing XML data. The containment filtering of this framework is efficient in addressing the containment relationships between predicates in a query. It can find all structural summaries of a query using one DHT lookup. It can handle complicated XPath queries using structural summaries. The network updates are handled effectively using novel methods which are very crucial in structured P2P networks. The data placement strategy gives load balancing in the system. The framework is easily scalable and it can be implemented on top of any existing P2P infrastructure.
- A. Bonifati, et al. XPath Lookup Queries in P2P Networks. WIDM 2004.
- J.-M. Bremer and M. Gertz. On Distributing XML Repositories. WebDB 2003.
- L.Galanis, Y.Wang, S.R.Jeery, and D. J. DeWitt. Locating Data Sources in Large Distributed Systems. VLDB 2003.
- Pastry. http://freepastry .rice.edu/.
- N. Polyzotis and M. Garofalakis. Structure and Value Synopses for XML Data Graphs. VLDB 2002.
- A. Rowstron and P . Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. International Conference on Distributed Systems Platforms 2001.
Cite This Work
To export a reference to this article please select a referencing stye below:
Related ServicesView all
DMCA / Removal Request
If you are the original writer of this essay and no longer wish to have the essay published on the UK Essays website then please: