Volume Replication Group Replication Tree Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Self Synchronization in a distributed file system is the scope of the work. A spanning tree is constructed and maintained for each file volume. The servers that include 'volume replicas' and also the caches for the particular file volume are represented using the spanning tree. Many distributed algorithms for self synchronization are used to construct and maintain the spanning trees. The survey proposes a γ synchronizer algorithm to ensure consistency in the replication trees.

Keywords used:

File volume, Server, Volume replica, Volume replication group, Replication tree.

1. Introduction

A typical network file system is centralized and depends on a devoted file server. This leads to performance issues and reliability anarchy. Also such type of designs cannot endure faults as it can be considered as a "single point of failure". Here we propose a system design which is scalable and non-centralized or it can be called as server-less distributed storage system. The cache, storage and control are distributed across co-operating workstations. A self stabilizing system is one which recovers from any arbitrary configuration of the system. In other words, it tolerates the existence of faults that occur frequently and unexpectedly and also it saves the system from the message corruption which is difficult to detect or predict. Once the faults cease, the system starts to recover.

2. Previous work

NFS is the current standard remote file access protocol. NFS does not support caching. It requires frequent access to the server and does not provide self synchronizing mechanisms. One of the most common distributed file systems is Andrew File System or AFS. The advantage of AFS is that it provides caching and consistency guarantees. A centralized server controls each file volume. AFS provides only one copy for reads and writes, and other copies are 'read only replications'.

3. Traditional Approach

In traditional approaches of Self Synchronizing distributed file system, a distributed replica tree for each directory is maintained. The key structure will be the choice of a distributed spanning tree that can relate and connect replicas. The novel techniques that involve construction of tree and election of leader in a TCP/IP network are implemented.

Here, the TCP IP network is implemented as a communication graph G = (V, E) where there is an edge e between every two nodes, and also a cost function, fc : E → N between every two nodes . Each v ∈ V represents a system/computer in the network and the communication distance between the nodes is represented by the cost function fc. Also the design is based on the assumption that each computer is reachable from every other computer and hence the network is represented as a graph in which for each e ∈ E the value fc (e) should be finite.

The state of a server is denoted by its memory content which consists of its program counter also. The communication contents are specified by the values in communication buffers. There are two buffers, one for each direction of each communication edge. A system configuration, c1 = (s1, s2, · · · , b1, b2 , · · ·), is a vector of the states of the servers and contents of buffers. A step is an execution sequence of local computations and it should consist of at most one communication operation, either sends or receives operation. A real time value is related to each step. An execution can be considered as a series of steps and configurations, E1 = (c1, a1, c2, a2, c3, a3· · ·) in which each ci+1 is calculated by executing ai , when the system's current configuration is in ci. A task is defined as a finite set of execution called legal executions which are denoted by LE. A configuration denoted by c is considered as a safe configuration for an algorithm and for a task LE, if and only if any execution that starts with c is a legal execution. An algorithm is Self Synchronizing related to a task denoted by LE if and only if every execution which is infinite reaches a safe configuration with respect to the corresponding task and the algorithm.

3.1 Methods Involved

The replication tree requires a tree structure of the replication group nodes. The main task of the replication tree is to maintain data consistency in the volume replication group. Hence it makes sure that each server has the same replica.

A replication tree is said to be consistent if and only if every replica in it is exactly the same as the other one. The β synchronizer algorithm is used to ensure that the consistency of the replication tree is maintained.

Initially, the self synchronizing update algorithm is used to create the spanning tree. This algorithm makes use of local IP multicast process as a source of communication. In this procedure, the messages are sent up to a definite communication radius denoted by ϵ, where ϵ is the greatest number of routers through which the message can travel starting from the sender to a receiver. Messages sent by IP multicast are unreliable connection-less messages.

The self synchronization algorithm used is a combination of several algorithms. The first layer deals with the election of a leader in a self stabilizing way. The second layer involves construction of a replication spanning tree using a self synchronizing update algorithm. This replication spanning tree which is the output of the second layer is used by the leader to tune up a communication radius, ϵ for IP multicast messages. This replication tree is used for file systems operations like read and write.

The β synchronization algorithm used for consistency maintenance needs an initialization phase. Here a leader and a spanning tree, rooted at the leader are constructed and maintained. It operates as follows: The leader learns that the nodes in the network are safe after the execution of a particular pulse. It then broadcast a safe message to all the nodes in the network, so that may spawn a new pulse. The time taken is calculated using a certain communication pattern which is represented as "convergecast". This is initiated at the leaves and terminates at the root of tree. The strategy used is that whenever a node knows that it is safe and all its descendants are also safe, the node notifies this face to its father.

The communication complexity of a synchronization algorithm is denoted as C and is defined as the total number of messages sent during the algorithm. The time complexity, T is the total number of pulses passed from its starting time till its extinction. This assumption is meant for evaluating the performance of the algorithm and the algorithm is supposed to operate accurately with arbitrary delays. The trade-off between communication and time is a typical phenomenon in the case of communication networks.

The communication complexity of β Synchronization algorithm, ). The time complexity is given as T(β) = ). This is because all processes are carried out along the spanning tree. It can be seen that the time is only proportional to the height of the tree.

4. Proposed Work

In this paper, an algorithm called γ synchronization algorithm is proposed to maintain the consistency of the replication tree. This algorithm is combination of α synchronization algorithm and β synchronization algorithm. The β synchronization algorithm is discussed above. The α synchronization algorithm is inefficient in terms of communication but good in terms of time. But the β synchronization algorithm is good in terms of communication but inefficient in terms of time. The γ synchronization algorithm, which is the scope of our work, is efficient in terms of both communication and time.

A 'node' is considered to be safe if each message of the synchronous algorithm sent by the node for a particular pulse reaches its destination without fail. The messages can only be sent to neighbors. Eventually each node, with respect to a certain pulse becomes safe, after the execution of a particular pulse. If an algorithm is required to be sent back whenever a message of the algorithm is obtained from a neighbor, then a node is considered to be safe after all the messages are being acknowledged. It has to be noted that the asymptotic complexity is not increased by acknowledgements and hence each node knows that it is safe in a constant time after the new pulse has been entered. When a node learns that no message sent at the previous pulses of the algorithm may reach at that node in the future, a new pulse may be generated at that particular node. This can be the case when all the neighbors of that particular node are known to be safe in accordance with a particular pulse. It only remains to find a way to distribute this information to each node with less communication and time cost.

4.1 α Synchronization algorithm

In α Synchronization algorithm, each node learns that it is safe and conveys this message to all its neighboring nodes eventually using the mechanism described above. A new pulse is produced when a node understands that all its neighbors are safe.

The Communication complexity of α Synchronization algorithm is ), where E =. The time complexity is given as T(α) = ), since one message is sent to each link in each direction and the communication is established only between neighboring nodes.

4.2 γ Synchronization algorithm

The γ synchronization algorithm, which is combination of both the α synchronization algorithm and β synchronization algorithm discussed in the previous sections, consists of an initialization phase where the network is partitioned in to clusters. Any spanning forest of the communication graph (V, E) of the network defines the partition. An 'intracluster' tree consists of a cluster of nodes defined by each tree of the forest. A communication link is chosen between any two neighboring clusters. A leader is chosen inside each cluster. This will coordinate the operations of the cluster through the 'intracluster' tree. A cluster is defined to be safe if all the nodes in the cluster are safe.

The γ synchronization algorithm is carried out in two phases. The β Synchronization algorithm is applied to each cluster of the 'intracluster' trees separately in the first phase. The leader of the cluster notifies all the other nodes in the cluster as well as to all leaders of neighboring clusters as soon as it learns that its cluster is safe. In the second phase of the γ synchronization algorithm, all the clusters wait for all the neighboring clusters to be safe and then generate the next pulse. It can be seen that α synchronization algorithm is applied among all clusters internally.

A detailed description of the γ synchronization algorithm is given below: A cluster leader will broadcast a PULSE message along the tree and this PULSE message triggers the nodes to enter the new pulse. When a node enters the first phase of the γ synchronization algorithm in which SAFE messages are 'convergecast' along each intracluster tree as in the case of β synchronization algorithm. This procedure which starts at the leaves which then send SAFE messages to father nodes as soon as they realize they are safe. In case of non leaf nodes, which are non-leaders, they send SAFE message to their corresponding father nodes when they realize they are safe. If these non leaf nodes are not leaders, they learn that their cluster is safe and sends a CLUSTER-SAFE broadcast message to all neighboring clusters. The nodes on receiving this message forward it to its sons in the cluster along all preferred communication links.

In the second phase of the algorithm, the time at which all the neighboring nodes are found to be safe are determined. This is done by a standard 'convergecast' process in which a READY message is sent by a node to its father whenever all the neighboring clusters or any of its descendants are found to be safe. The node on receiving the READY message from all its sons and CLUSTER-SAFE messages from all preferable communication links and from its father, initiates this process. The whole process starts at the leaves and ends when the above mentioned conditions are satisfied at the leader of the cluster. The leader of each cluster learns that all the neighboring clusters as well as its own cluster are safe. The whole procedure then starts again and the leader now triggers the broadcast of PULSE message to all the nodes in its cluster so that they can generate the next pulse.

4.2.1 Complexities of γ Synchronization Algorithm

Let be the set of all tree links and preferred communication links in a particular partition P. Let be the maximum height of a tree in a forest of P. It can be determined that at most four messages are sent through each link of . Thus C(γ) = O(│. Also it can be verified that O( is required for each cluster to verify that it is safe and an additional time O( is needed to verify that all neighboring clusters are safe. Thus T(γ)= O(. Thus it can be seen that complexities of γ synchronization algorithm is smaller when compared to that of β synchronization algorithm.

5. Future Work

From the analysis of the complexities of the γ synchronization algorithm, it is clear that the complexities will be lesser only if we can find a partition P for which and are smaller. This can be implemented by finding out an algorithm in which each cluster is chosen as a maximal subset of nodes whose diameter does not go beyond the logarithm of its cardinality.

6. Integrating in an Operating System

The interactions between the user who issues commands and the algorithms and the structures which we have defined are described in this section. Two basic operations used in file system are 'mount' and 'unmount' commands. 'mount' operation attaches a file system to a mount point which is a local path identifier for a file system. The functionality of 'unmount' operation is to detach the file system from the local system.

The 'mount' command connects the cache to the caching tree, start executing the self-stabilizing algorithms described above, while the 'unmount' command disconnects the cache from the tree, or stop executing these algorithms.

6. Conclusion

The Self Synchronization of the distributed file system is achieved using synchronization algorithm. In this paper, a new simulation technique called γ synchronization algorithm has been proposed, whose communication trade off is optimal within a common factor. The γ synchronization algorithm is an efficient and simple methodology for designing distributed algorithms in asynchronous networks. This algorithm paves the way for the construction of low complexity algorithms.