Global State Of Distributed Systems Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications and high-performance oriented applications.

Chandy K. M. and Lamport L., "Distributed Snapshots: Determining Global State of Distributed Systems," ACM Transaction on Computing Systems, vol. 3, No. 1, pp. 63-75, February 1985.

In [1] Chandy and Lamport proposed a global snapshot algorithm for distributed systems. It is observed that every checkpointing algorithm proposed for message passing system uses Chandy and Lamport's algorithm as the base. It is observed that most of the algorithms proposed for message passing systems use Chandy and Lamport's algorithm as a base. The algorithms proposed in literature for MP systems may be derived by relaxing various assumptions made by the demand modifying the way each step is carried out.

Chandy and Lamport's Algorithm

Chandy and Lamport's algorithm is based on following assumptions:

Distributed system consists of a finite set of processors and a finite set of channels.

The processors communicate with each other by exchanging messages through communication channels.

The channels are fault free.

Communication delay is arbitrary but finite.

The global state of the system includes the local states of processors and the state of communication channels.

State of a channel refers to the set of messages sent along that channel and not yet received by the destination node from that channel.

Buffers are of infinite capacity.

Termination of the algorithm is ensured by fault-free communication.

Algorithm: The global state is constructed by coordinating all the processors and logging the channel state at the time of checkpointing. Special messages called markers are used for coordination and for identifying the messages originating at different checkpointing intervals. The algorithm is initiated by centralized nodes. The steps followed after a checkpoint initiation are the same in all the nodes except that a centralized node initiates checkpoint on its own and the other nodes initiates checkpoints as soon as they receive a marker. The steps are below:

Save the local context in a stable storage.

For i = 1 to all outgoing channels do send markers along channel I ;

Continue regular computation;

For i=1 to all incoming channels do Save incoming messages in channel i until a marker I is received along that channel.

J.L. Kim and T. Park. "An efficient protocol for checkpointing recovery in Distributed Systems" IEEE Transaction On Parallel and Distributed Systems, 4(8):pp.955-960, Aug 1993.

In [2] J.L. Kim and T. Park had presented a new efficient synchronized checkpointing protocol which exploits the dependency relation between processes in distributed systems. In their protocol, a process takes a checkpoint when it knows that all processes on which it computationally depends took their checkpoints, and hence the process need not always wait for the decision made by the checkpointing coordinator as in the conventional synchronized protocols. As a result, the checkpointing coordination time is substantially reduced and the possibility of total abort of the checkpointing coordination is reduced. By doing so the second phase of the checkpointing coordination may be removed. When multiple checkpointing co-ordinations are overlapped. Time under their protocol can also be saved if it is possible to use the decision of one checkpointing coordination for other coordination. The checkpointing commitment decision can be made locally so that the total abort of checkpointing is avoid i.e. when a process involved in a checkpointing coordination fails, the processes not affected by failed one can make their decision, while the protocols following the straightforward two-phase mechanism abort the whole checkpointing activity. Even if the checkpointing and rollback coordination overlap, the processes which are involved in checkpointing coordination but not involved in the rollback coordination can successfully make their decisions.

Franco Zambonelli: On the Effectiveness of Distributed Checkpoint Algorithms for Domino Free Recovery, IEEE, Proceeding of HPDC-7,98, July 1998, at Chicago, pp 124-131.

In [3] Franco Zambonelli Domino free Snapshot algorithm to permit one process to consistently restore its execution from its latest local checkpoint before the fault, one must grant that all its local checkpoints are useful can belong to at least one consistent global checkpoint. Otherwise, the execution of the process must be rolled back in the past until a useful local checkpoint is found from which to build a consistent global checkpoint. Rollback propagation, often called the domino effect because of its recursive nature, limits forward execution progresses in presence of faults.Franco zambonelli algorithm deals with on-line algorithms that grant domino-free recovery by monitoring the application execution and by forcing additional local checkpoints in processes, when the arrival of one message is likely to make some local checkpoint useless. Several well known checkpoint algorithms are presented and integrated within a single theoretical framework. The effectiveness of the algorithms was evaluated in a heterogeneous set of message passing applications. The main result was that none of the algorithms shows itself capable of reasonably limiting the number of forced checkpoints, thus introducing a high overhead on applications.

Koo R, Toueg S 1987 Checkpointing and rollback recovery for distributed systems. IEEE Trans. Software Eng. SE-13: 23-31.

In [4] Koo-Toueg's proposed a minimum process blocking checkpointing algorithm for distributed systems. The algorithm consists of two phases. During the first phase, the checkpoint initiator identifies all processes with which it has communicated since the last checkpoint and sends them a request. Upon receiving the request, each process in turn identifies all process it has communicated with since the last checkpoint and sends them a request, and so on, until no more processes can be identified. During the second phase, all process identified in the first phase take a checkpoint. The result is a consistent checkpoint that involves only the participating processes. In this protocol, after a process takes a checkpoint, it cannot send any message until the second phase terminates successfully, although receiving messages after the checkpoint is permissible.

Silva L, Silva J 1992 Global checkpointing for distributed programs. Proc. IEEE 11th Symp. On Reliable Distributed Syst. pp 155-162.

In [5] Silva and Silva proposed all process coordinated checkpointing protocol for distributed systems. The non- intrusiveness during checkpointing is achieved by piggybacking monotonically increasing checkpoint number along with computational message . When a process receives a computational message with the high checkpoint number, it takes its checkpoint before processing the message. When it actually gets the checkpoint request from the initiator, it ignores the same. If each process of the distributed program is allowed to initiate the checkpoint operation, the network may be flooded with control messages and process might waste their time making unnecessary checkpoints. In order to avoid this, Silva and Silva give the key to initiate checkpoint algorithm to one process. The checkpoint event is triggered periodically by a local timer mechanism. When this timer expires, the initiator process the checkpoint state of process running in the machine and force all the others to take checkpoint by sending a broadcast message. The interval between adjacent checkpoints is called checkpoint interval.

R. Prakash and M. Singhal. "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems". IEEE Trans. on Parallel and Distributed System, pages 1035- 1048,0ct. 1996.

In [6] Prakash-Singhal algorithm was the first algorithm to combine these two approaches. More specifically, it forces only a minimum number of processes to take checkpoints and does not block the underlying computation during checkpointing. Prakash Singhal algorithm forces only part of processes to take checkpoints, the csn of some processes may be out-of-date, and may not be able to avoid inconsistencies. Prakash-Singhal algorithm attempts to solve this problem by having each process maintains an array to save the csn, where csn,-[i] has been the expected csn of Pi. Note that Pi's csn,[i] may be different from P, 's csn,- [i] if there is no communication between P,- and P, for several checkpoint intervals. By using csn and the initiator identification number, they claim that their non- blocking algorithm can avoid inconsistencies and minimize the number of checkpoints during checkpointing.

G. Cao and M. Singhal. "On impossibility of Min- Process and Non-Blocking Checkpointing and An Efficient Checkpointing algorithm for mobile computing Systems". OSU Technical Report #0SU-CISRC-9/97-TR44, 1997, pp 37-44.

In [7] Guohong Cao and Mukesh Singhal proposed a min-process checkpointing algorithm that no min-process non-blocking algorithm exists. There are two directions in designing efficient coordinated checkpointing algorithms. First is to relax the non-blocking condition while keeping the min-process property. The other is to relax the min-process condition while keeping the non-blocking property. The new constraints in mobile computing system, such as low bandwidth of wireless channel, high search cost, and limited battery life, suggest. That the proposed checkpointing algorithm should be a min-process algorithm. Therefore, we develop an algorithm that relaxes the non- blocking condition; that is, it is a

In this, they introduce the concept of mutable checkpoint, which is neither a tentative checkpoint nor a permanent checkpoint, to design efficient checkpointing algorithms for mobile computing systems. Mutable checkpoints can be saved anywhere, e.g., the main memory or local disk of MHs.

The Basic Idea behind Non-blocking Algorithms

Algorithms rely on the two-phase commit protocol and save two kinds of checkpoints on the stable storage: tentative and permanent.

first phase, the initiator takes a tentative checkpoint and forces all relevant processes to take tentative checkpoints. Each process informs the initiator whether it succeeded in taking a tentative checkpoint. After the initiator has received positive replies from all relevant processes.

second phase, if the initiator learns that all processes have successfully taken tentative checkpoints, it asks them to make their tentative checkpoints permanent; otherwise, it asks them to discard them. A process, on receiving the message from the initiator, acts accordingly. A non-blocking checkpointing algorithm does not require any process to suspend its underlying computation. When processes do not suspend their computations, it is possible for a process to receive a computation message from another process which is already running in a new checkpoint interval. If this situation is not properly handled, it may result in an inconsistency.

Kalaiselvi S, Rajaraman V 2000, "Task graph based checkpointing in parallel/distributed systems". Parallel Distributed Comput. (submitted). Sadhana, Vol. 25, Part 5, October 2000, pp. 489-510.

In [8] Kalaiselvi VI and V Rajaraman Chandy & Lamport proposed a global snapshot algorithm for distributed systems. They observe that every checkpointing algorithm proposed for message-passing (MP) systems uses Chandy & Lamport's [1] algorithm as the base. They show that most of the algorithms proposed in the literature for checkpointing MP systems may be derived by relaxing various assumptions made by them and by modifying the way each step is carried out.

Modifications of Chandy and Lamport's algorithm:

Each step of the CL algorithm can be modified to accommodate some improvements in the basic global snapshot algorithm. In step one, a node saves its context in stable storage. The overhead associated with step one is context-saving overhead. The objective of saving the context in stable storage is to ensure its availability after a node failure. The overhead of context saving is proportional to the size of the context and the time taken to access the stable storage.

(a) Minimizing the context size, and

(b) Overlapping context saving with computation can thus reduce context- saving


In step two, markers are sent along all the outgoing channels. The purpose of a marker is

To inform the receiving node that a new checkpoint has to be taken;

To separate the messages of the previous and the current checkpoint interval.

At the time of checkpointing the centralized node informs all the nodes to initiate checkpoints through this marker message. CL algorithm sends markers along every channel to inform the nodes to log all transit messages onto stable storage. It is not necessary to send markers along all the channels as they may be safely eliminated along those channels in which there was no message exchange between the previous and the current checkpoint [9], [10].

Coordination through markers can also be achieved in two phases by delaying the message transmission between the two phases. Checkpointing can be coordinated without using markers by sending with regular messages a header, which has the checkpoint interval number in which the message originated. The simplest would be a one-bit header, which toggles between one and zero indicating the consecutive checkpoint intervals. Note that the marker overhead has now become header overhead; overhead due to appending headers with regular messages. When a message is received with a header value different from that of the receiving node, either a new checkpoint is initiated or the message is logged depending on whether the message is an orphan message or a missing message. This one-bit header complicates checkpoint initiation when out-of-sequence messages are encountered. Message sequence numbers along with checkpoint interval number in the message header can help in controlling the number of checkpoints along with logging of missing messages and elimination of orphan messages. The cost of this approach is the size of the header for maintaining the message sequence numbers and checkpoint interval umber.

When nodes initiate checkpoints on their own, it is called distributed checkpointing. If checkpoint initiation completely depends on the header of regular messages, those nodes, which have not communicated, with other nodes between consecutive checkpoints cannot participate in a global checkpoint. One can also use markers just to inform about checkpoints; markers take care of coordination and headers take care of message logging [10]. In all the schemes mentioned above, coordination is achieved at runtime and a consistent global state is always maintained in stable storage.

The next major alternative called independent checkpointing eliminates coordination overhead at runtime and forms a consistent global state only when it is needed, i.e. only at recovery time. Instead of coordinating the nodes during every global checkpoint, nodes can be coordinated once at recovery time to form a consistent global state. When there is no coordination, nodes should be able to initiate checkpoints independently on their own. To form a consistent global state at recovery time, nodes have to maintain multiple checkpoints and messages in stable storage. The advantages of this independent checkpointing are that

(i)incoordination and thereby the use of markers is eliminated;

(ii) nodes can initiate checkpoints at their convenience without being forced to initiate by

the receipt of marker messages.

The disadvantage is the maintenance of multiple checkpoints and message logs. Multiple checkpoints occupy more space and garbage collection algorithms can be run periodically to reclaim the space occupied by unwanted checkpoints. Consistent global state is constructed periodically and all the checkpoints which do not belong to the recovery line are declared unwanted checkpoints. Though special messages are used for identifying the recovery line, the frequency of usage is lower when compared with a coordinated algorithm based on markers. The other significant overhead in independent checkpointing is due to logging of messages (logging overhead) since it has to log all the messages received. Pessimistic logging approach has the advantage of faster recovery since it logs a message as and when it is received (Borg et al 1989).

By grouping the messages over a period and logging them once in a while optimistic logging approaches reduce the stable storage access overhead. If sufficient messages are not logged, multiple rollbacks are possible in optimistic logging schemes. One can also send sufficient information with regular messages so that messages can be logged selectively thereby reducing the message logging overhead. Optimistic schemes need a complicated recovery procedure. The advantages of pessimistic and optimistic schemes can be combined to achieve minimum logging overhead with faster recovery. Further modification of independent checkpointing algorithm is possible depending on where a message is logged; at places other than the receiver. The advantage is that the messages need not be logged onto stable storage. Yet another mode of coordination is to synchronize the clocks and initiate the checkpoints approximately at the same time in all the nodes. To account for the differences in the clock values, message sending can either be delayed during checkpointing or headers can be used with messages. Step three of the CL algorithm allows regular processing to proceed without waiting for the channel state recording and consequently the checkpoint operation to be completed. This is a good way of reducing the intrusion of a checkpointing algorithm but a better approach would be to overlap the context-saving process with regular computation. Step four of CL algorithm logs those messages which cannot be generated at recovery time. The purpose served by markers in identifying these messages can also be fulfilled by headers and this was mentioned.

D.V. Subba Rao and MM Naidu: A new, efficient corrdinated checkpointing protocol combined with selective sender based message logging, IEEE, 2008, Page(s): 444 - 447.

In [11] Suba Rao and Naidu presented their work for checkpointing algorithm combined with selective sender based message logging. This algorithm is free from problem of lost messages. This algorithm tolerates permanent faults in the presence of spare processors. In their absence it tolerates only transient failures. The term selective implies that messages are logged only within a specified interval known as active interval, thereby reducing message logging overhead. This algorithm minimizes different overheads like checkpointing overhead, message logging overhead, recovery overhead and blocking overhead.

"An Adaptive Index-based Algorithm using Time-coordination in Mobile Computing", by Yanping Gao, Changhui Deng, Yandong Che in the Proceedings of the 2008 International Symposium on Information Processing (ISIP 08), May 2008, pp.578-585. 

In [12] Gao-Deng-Che algorithm presented their work for an indes based algorithm using time coordination in mobile computing. They use integration of time base and index based checkpointing algorithm. The proposed algorithm does not use any control message. It is more efficient because it takes lesser number of checkpoints and does not need to compute dependency relationship. In time based checkpointing protocols there is no need to send extra coordination messages. However they have to deal with the synchronization of timers. This type of algorithm is suitable for applications where processes have low message sending rate.

Ajay D Kshemkalyani: "A symmetric O(n log n) message distributed snapshot algorithm for large scale systems", IEEE, 2010, pp 1-4

In [13] Ajay D Kshemkalyani presented a fast and message efficientshow that new algorithm is more efficient. He presented two new algorithms Simple Tree and Hypercube that use fewer message and have lower response time and parallel communication times. In addition the hypercube algorithm is symmetrical and has greater potential for balanced workload and congestion freedom. This algorithm have direct applicable in large scale distributed systems such as peer to peer and MIMD supercomputers which are a fully connected topology of a large number of processors. This algorithm is also useful for determine checkpoint in large scale distributed mobile systems.

Ajay D Kshemkalyani " Fast and message efficient global snapshot algorithms for large scale distributed systems" IEEE 2010. Page(s): 1281 - 1289.

In [14] Ajay D. Kshemkalyani has presented his work on large scale distributed systems and give two approaches, first are Simple Tree and second is Hypercube. He has shown that the response time and message complexity is minimum in these cases. Both algorithms are fast and required small numbers of message, this property make them highly scalable. The applications of this algorithm are in supercomputers and in MIMD processors.


Here we summirized that Checkpointing algorithms has the following desirable features:

The time taken by checkpointing algorithms should be minimum during failure free run.

Domino effect or Rollback propagations should be minimum.

Selective rollback should be possible.

Resources requirement for checkpointing should be minimum.

Recovery should be fast in event of failure .Availability of consistent global state in stable storage expedite recovery.