An Indexing Structure Over Distributed Hash Table Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The goal of this research paper is to analyze and understand the importance of Prefix Hash Tree indexing technique. Prefix Hash Tree is a distributed data structure that enables more sophisticated queries over a Distributed Hash table (DHT)[7]. This paper describes the design and implementation of the Prefix Hash Tree, which uses the lookup interface of a DHT to construct a trie-based structure that is both efficient and resilient.[1]

With emerging technologies like cloud computing, decentralization and scalability are two important features that are required by a distributed system. It is essential to understand how PHT provides efficient indexing and availability is provided by PHT during failure of any node.


DHT are self-organizing, requiring no centralized authority or manual configuration. They are robust against node failures and easily accommodate new nodes. In many large distributed systems it is very important for the system should be easily scalable and the number of hops per lookup is efficient. In these systems the number of hops per lookup should increase logarithmically with the number of nodes to provide good scalability.

While DHT is widely used in various enterprise applications, they are seriously deficient in one regard: they only directly support exact match queries. Equality joins can also be supported within DHT framework but Range queries, asking for all objects with values in a certain range, are particularly difficult to implement in DHT's. This is because DHTs use hashing to distribute the keys uniformly and there are no structural properties of the key space such as ordering among keys.

The Prefix Hash Tree (PHT) supports range queries. It also supports various other queries like heap query (what is the maximum/minumum?) and proximity query (what is nearest element to x?). Moreover PHT also tolerates failures well; while it cannot by itself protect against data loss when nodes go down, the failure of any given node in the Prefix Hash Tree doesn't not affect the availability of data that is stored in that node.[1]

One of the critical advantages of using the Prefix Hash Tree is it is built entirely on top of the DHT interface and thus can run over any DHT. PHT uses only the lookup(key) operation that is very common in al the DHT as compared to other services like SkipGraph[2] and other approaches that assume knowledge of changing the existing DHT topology or routing behavior. Hence PHT provides the ease of use as compared to other implementation.

Key Concepts

This section describes the PHT data structure, along with related algorithms. Operations such as lookup, insertion, Range Queries and deletion etc. are discussed in the following section.

Trie or Prefix Tree:

A trie or prefix tree is an ordered tree data structure that is used to store associative array where the keys are usually strings. Unlike binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.[4]

An example of trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn" is shown in Figure 1:[4]

Figure 1: prefix tree[4]

PHT Description

For the sake of simplicity, let us assume the domain being indexed is {0,1}D , i.e binary string of length D, although the discussion extends naturally to other domains. Therefore, the data set indexed by PHT consists of some number N of D bit binary keys.

In PHT, each node of the trie is labeled with a prefix that is labeled recursively: given a node with label l, their left and right nodes are labeled l0 and l1 respectively.

The following properties are invariant in a PHT.[1]

Universal Prefix:

Each node has either 0 or 2 children

Key storage:

A key K is stored at a leaf node whose label is a prefix of K.


Each of the node stores atmost B keys.


Each internal node contains atleast (B+1) keys in its sub-tree.

Threaded leaves:

Each leaf node maintains a pointer to the leaf nodes on its immediate left and immediate right respectively.

Property 1 states that any key of a given node should have its key, K ¥ {0,1}D .The property 2 states that there is exactly one leaf node leaf(K) whose label is a prefix of K. The figure 1 shows PHT containing N=20 6 bit keys with B=4. The table shows list of 20 keys and the leaf nodes they are stored in.

Figure 2: Prefix Hash Tree - [1]

Property 3 and property 4 govern how PHT adapts to the distribution of keys in the data set. If a new node is added then the number of nodes that is stored by a node may be violated hence we split into two child nodes and key is redistributed among the children according to property 2. Conversely deletion of a node from the PHT will lead to the number of keys to fall below B+1, causing property 4 to be violated. The property 5 ensures the leaves of the PHT form a doubly linked list, which enables sequential travelling of leaves for answering range queries.

The Prefix Hash Tree nodes use DHT for faster lookup this is achieved by hashing the prefix labels of the PHT node over the DHT identifier space. Thus a node of PHT is directly mapped to a particular key space by the function HASH(l). The direct access of the node of PHT provides various added advantages instead of traversing many nodes.

PHT Operations:


Lookup functionality is used to get a particular node with the help of prefix. As the number of nodes is a D+1 distinct node then there are D+1 potential nodes that can be extracted. Linear approach of searching is not suitable and it is a big overhead. We can use binary search algorithm for the lookup. The algorithm for the same is as follows

Algorithm : PHT-Lookup

Input : key K

Output: leaf(K)

Low = 0;

High = D;

While(low<=High) do

mid = low+high/2;

node <- DHT-lookup(Pmid (K))

if(node is a leaf node) then return node;


if(node is an internal node) then return low<- mid+1;

else hi <- mid-1;



return false;

The binary search reduces the number of DHT lookups from linear search (D+1) to logD.

Range Queries:

Consider a query for a range [001001,001011] is performed then in traditional approaches we find the prefix of the lower bound and we know that tree is a doubly linked list and nodes are traversed from leaf of lower bound to the higher bound. As the node traversal is happening in sequential manner the complexity is N. As initially it uses the DHT lookup to obtain the prefix of the lower bound, its complexity is logD hence the total complexity is logD + N.

Alternatively we can improve the efficiency by using the DHT to locate the node whose label corresponds to smallest prefix range that completely covers the specified range. This process continues until the leaf nodes overlapping with the query are reached. In the figure shown below for the query range from [001001,001011] we consider the lowest range of the set(i.e) 001001 and the nearest node to that range is 00100*. After this a traversal of linked list forwards the query in a sequential manner to other paths in parallel such as the 001011* or 001010*, which resolves the query as shown in the Figure-2.

The smallest prefix range that covers the entire range is 0010* in the above figure 2. PHT uses a single DHT lookup to directly jump into this node after which the query is forwarded in parallel within the sub tree until the leaf nodes overlap with the search range.

Figure 3: Range Query[1].


Insert/ Delete Operations:

In PHT, insertion or deletion of a key into the PHT first requires to lookup for a leaf(k). Insertion of a new key can cause this leaf node to split into two different children and it has to re-distribute the keys. In worst case all the keys in the PHT has to be redistributed. Similarly for the deletion of the node from the PHT also does the same. In the worst case the redistribution can cascade until the depth of the PHT making the cost of the operations as expensive.

It is possible to optimize the solution by limiting the number of split per insertion to be one and even the number of merge per deletion to one. This process is called staggered updates. This approach violates the property 3 and property 4 mentioned above. But the PHT does the key redistribution in the later stages to make the tree balanced.

Tries v/s Trees

In this section we compare the difference of using trie-based index such as the PHT as compared to the tree-based indices such as the B tree implemented using the interfaces of the DHT. The main difference is trie partitions the space whereas tree partitions the data set. In trie-based approach, the space used by the trie is constant and doesn't depend on the data set whereas in tree based approach the node represent the particular set of keys. The prefix of the key is stored as the key and it is possible to get the node via a single DHT lookup, which is not possible in case of the tree-based approach. PHT has several advantages as any node in the PHT can be retrieved by single DHT lookup. Few of the advantages provided by the PHT are


Comparing the cost for the lookup for a key in tree-based approach is higher as compared to the PHT model. If we consider there are million 32-bit keys, a tree index would require 20 DHT lookup as compared to 6 for the PHT to retrieve the key. Efficiency is greater when we choose PHT model of indexing.

Load Balancing:

In tree-based approach, every lookup in a tree must goes through root node, creating a potential bottleneck to the system. In trie based, binary search allows the load to be spread over 2D/2 nodes, thus avoiding the single point of failure of the system thus improving the performance of the system.

Fault Resilience:

In tree-based approach loss of one of the nodes results in loss of entire sub tree rooted at the failed nodes. The availability of other data is affected due to one of the faulty nodes in the network. In PHT approach the node can be easily retrieved by DHT lookup and there is no necessity of top-down traversal. In both of the approaches final node is found by sequential traversing of the node. But Fault tolerance is higher in PHT based system than the tree-based system.

Refreshing and recovering from failure

PHT inherits the failure recovery property of the underlying DHT. PHT restores the lost data by relying on soft state update. Every node in the PHT has a time to live (TTL) associated with the node. When the TTL expires, the entry is automatically removed from the DHT.

The mapping server periodically does a refresh of the values that are inserted in the DHT. Each of the keys that are inserted in the PHT has a time to live associated to it. Let us assume that is T seconds, Every T/2 second, a mapping server does a refreshes its key by resetting their TTL to T seconds. At the same time it even checks whether the TTL of the parent node is less than T/2 seconds, if yes then it does the refresh to the parent too. The same process is continued until the root node is obtained. This process ensures to recover data that is lost.

Dealing with Concurrency:

If two of the mapping servers attempt to insert key K1 and K2 into a same leaf and the leaf node is full then both the servers will attempt to spilt the node this leads to a race conditions that can result in loss of temporary data. But mechanisms such as soft state update ensure the data is retrieved back.

Various implementation changes were proposed to solve the concurrency problem. One such approach is the introduction of put_conditional(). When inserting key K into leaf(K), we use the put_conditional() to ensure that the leaf has not been modified or split since we performed the lookup. When a leaf node needs to be split then we mark the node as transition using the put_conditional() method, it is like holding a lock on that leaf node. If multiple servers are trying to split the same leaf node then only one of the servers is successful in the split operation of the node. The in transition state is removed once the split operation is completed.

Caching to improve performance

The main advantage of using the PHT is the single time lookup of DHT. Hence introduction of caching to this model decreases the time taken for the query to complete. The caching can be achieved by using a hint cache on the client side that keeps track of the shape of the PHT based on the previous lookup operation. A cache hit thus generates a single DHT operation. Upon a cache miss, however the lookup must revert to the binary search algorithm.

For query operations the query is first performed in the cache scheme to retrieve the leaf node. Hence performance is improved by introduction of the cache.






Sql Support

Supports SQL queries

Supports simple key lookup which is slower compared to PHT

Very quick key lookup

Data Size

Suitable for large amount of data.

Small size of data.

Small size of data

Data Range Search

Indexing methodologies help in data range search

Not supported



Support various functions such as Max, Min and SUM

Not supported

Supported due to indexing using Prefix Hash tree

Table 1: Comparison between RDBMS, DHT and PHT


DHT is widely used in the enterprise application space and has gained high popularity. DHT provides a decentralized distributed system that provides lookup service similar to hash table. Even though DHT provides various advantages but it lacks in basic range querying functionality. Our goal was to address the shortcoming of the DHT and come up with a feasible solution based on the research that is carried out by various researchers in this field. Earlier solutions such as SkipGraph[2] has the dependency or needs knowledge of the underlying DHT routing algorithm. PHT indexing solution is built over the DHT interface so it doesn't require extra knowledge about the system and this solution can be used over any of the existing DHT easily.

In brief, PHT supports various query functionalities like heap query, proximity queries and others. PHT over comes the shortcomings of the existing DHT and can broaden the applicability of the system.