There are two types of peer-to-peer network, unstructured peer-to-peer network and structured peer-to-peer network. Load balancing is an important problem to be solved for efficiency of a peer-to-peer network. Due to P2P's lack of restriction, the main problem for load balancing in self-organizing networks is to identify overloaded nodes and reassign their loads to others.
One of the most important factors to be considered in P2P system is heterogeneity. The meaning of heterogeneity is different nodes have different bandwidth and other resources like CPU, memory and disk. Overloading of any nodes can be avoided if we take more attention to node heterogeneity by taking capacity thresholds into consideration.
Load balancing issues in unstructured peer-to-peer networks
In unstructured P2P system, there is no unique identifier associated with each data item and node in network. Therefore, load balancing issue in unstructured P2P is even harder to deal with compared to structured P2P. Besides, there is no distributed hash table in unstructured P2P. Because of lacking a well defined overlay topology, nodes in unstructured P2P are randomly connected.
Load balancing issues in structured peer-to-peer networks
In structured P2P network based on DHT, load balancing remains an issue. Some nodes are map to more data keys, and some are map to less data keys regardless of their heterogeneity. These would cause some nodes are overloaded and some are under-loaded. Clearly, a well designed load balancing algorithm is greatly needed.
Due to the scalability, self organization and lookup efficiency of distributed hash table, it is the current trend among the P2P research community. DHT protocol like Chord, Cycloid and Koorde use consistent hashing to divide the identifier space evenly among the participating nodes. These protocols were designed with assumption, which is every participating node has same amount of resource. This is impossible because nodes are always different in their resources. Therefore, some nodes might be assigned more data while some might assign lesser than their fair allocated amount of data. Overloaded or under-loaded nodes are occurred at this time. Obviously, the node who owns a larger amount of data has to take more duty compared to the node with lesser data.
As a result of imbalanced load, it will cause slower file retrieval and slow down overall system performance. Therefore, DHT has two main limitations:
1.) Use consistent hash function to generate objects ID in DHT does not produce perfect load balance.
2.) In a homogenous structure overlay network, DHT ignore the heterogeneity characteristic of P2P system.
One of the algorithms for storage load balancing in distributed hash table is virtual server. However, the load balancing problem does not completely solved by using virtual server. When a node is overloaded, it simply drops parts of its load by removing its virtual server. However, deleting virtual servers may make other nodes become overloaded. Therefore, it brings the load problem to other nodes.
In a heterogeneity P2P system, it is desirable that each node has a load proportional to its resource capability. Even if the P2P system is homogeneous, it is hard to ensure that load in every nodes is balance because of the dynamism in P2P system.
There are few main issues in P2P2 load balancing:
1.) How to determine if a node is overloaded or under-loaded?
2.) If so, how to distribute the workload on the nodes?
3.) How to divide the system workload?
The system performance is highly depended on the answers to these questions.
In this section, I have summarised some of the academic publication I have found in the internet. There are few P2P load balancing issueââ‚¬â„¢s solution here:
1.) Routing based algorithm for unstructured P2P
Routing Discovery Process
if Source node S receives a query request then
forwards SETUP message to all available neighbour nodes of S;
Updates the load value and writes it to the route records;
While routing path is not selected do
for all intermediate node i do
if node i receives SETUP message then
forwards the message to all its unvisited neighbours and records the visited node information
for building a backward path;
if node i receives ACK message then
backwards the message to its upstream neighbours from route records;
if node i receives ERROR message then
discards the ACK message it holds and sends ERROR message backward along the path to the
if node i is source node and receives ACK message then
return routing path;
return NULL .
Load Reassignment Process
If all the paths found in forward stage are heavy loaded, the destination node will choose the least hops path and initiate load reassignment process. The node with the heaviest load is selected and parts of load items are reassigned to its neighbourhood.
2.) Service Discovery Mechanism for unstructured P2P
First, create an agent call P2P Registry (P2PReg) as a middleware within DNS and peers. In this scheme, each node is able to register and discover desirable services automatically through current DNS within a short duration in an efficient way with the help of P2PReg. With the help of DNS, we can discover desirable peers and obtain services from the peer within a short duration. It can ensure that the searching process results in a nearest peer that has the desired content. Finding nearby peers can greatly balance load in order to avoid hot-spot problems. Hence, we can balance the load for content demand while provide an efficient search. Their simulation study showed that serving data discovery over local search is indeed effective in improving the system performance significantly.
3.) Allocating and reallocating process for unstructured P2P
In this research paper, they build a new protocol for heterogeneous unstructured P2P networks. This protocol consists of two parts, which is allocating process and reallocating process.
In allocating process, new nodes select suitable neighbours in terms of capacity and connectivity. Besides, it decides if that node can be used for working on the workload based on its capability during allocating process. This can achieve load-balancing.
In reallocating process, the node will allocate its work to other node when it leaves the network.
This protocol can optimize a P2P network by solving load balancing issue.
4.) RESOURCE DONATION BASED
In this research paper, resource donation mean a node donate their resource to P2P system during its life-time. In this approach, each node is mapped with multiple identifiers. This can improve the lookup path length and the lookup failures significantly when multiple identifiers focused in one node. It can also balance the load due to key distribution and query load.
However, this approach has a disadvantage which is each node needs extra space to store multiple node identifiers.
5.) Adaptive load-balancing
The aim of this approach is to balance the routing load. It consists of two parts, which are item-balancing and caching popular keys.
A heuristics based is used on the average load. It ensures that an under-loaded node never takes more than the average load. Unneeded data item are limited when transferring data. The main advantage of this strategy happens when the load of a node is larger than the average load.
In this approach, the system will replicate and cache the object on other nodes. This method can reduce the request load. Hence, if a node holds replica of the requested object in its cache, the node can directly respond to the request.
6.) Collect file access history and stored on each peer
In this approach, every file access history is stored in each node. The node resourceââ‚¬â„¢s information is included in the history. This information is used distributing or redistributing load and it is important because of the dynamism in P2P system. In summary:
1.) Collect file access history for future use.
2.) A peer which owns a certain zone will be well divided when a new peer comes.
3.) Using low overhead to redistribute load. The size of each zone owned by a node may be adjusted during the system working time.
4.) Use topologically-aware in data replication. The copies of files on the peer are placed close to the group of peers which have high chance to access the file.
5.) No virtual server are used to avoid heavy routing metadata maintenance overhead.
8.) proximity-aware load balancing
This approach is based on the concept of virtual servers. This load balancing approach consists of four phases:
1.) Sum up the load and capacity information the whole system.
2.) Class node into different category like overloaded nodes, under-loaded nodes or neutral nodes.
3.) Assign virtual server according to the nodes capacity and load.
4.) Transfer virtual server from overloaded nodes to under-loaded nodes.
9.) Histogram-Based Global Load Balancing
HiGLOB is used for global load balancing in structured P2P system. Histogram here shows the view of the distribution of the load in the system. Two key components in HiGLOB are:
1.) A histogram manager ââ‚¬" maintains the histogram.
2.) A load balancing manager ââ‚¬" redistributes the load when the load is not balanced.