Failure Nodes Detector For Parallel And Distributed Computer Science Essay




Detecting failed nodes is a complex task in parallel and distributed image processing on a cluster. In this paper I propose an algorithm that detects multiple failure nodes. The algorithm combines a time-incrementing mechanism, the turnaround time of each node, and an automatic message-update mechanism. First I calculate the turnaround time of each node and take the maximum turnaround time (tamax) and the four minimum turnaround times (tamin1, tamin2, tamin3, tamin4). All nodes broadcast the message "fine". I start the process with the smallest turnaround time (tamin1) as the time-out: if all nodes answer within tamin1, all nodes are working. If only some nodes answer, I put those nodes in a working list, assign the next minimum turnaround time (tamin2) to the nodes that did not answer, and update the message queue with the latest messages. This process is repeated until the maximum turnaround time is reached. A node that has still sent no message when the maximum turnaround time expires is declared a failure node.

Keywords: Image processing, Failure Detector, Turnaround time, Update messages, Fault Tolerant image processing


Fault tolerance is one of the major concerns in parallel and distributed image processing. As the image size increases, the cluster size increases with it, and so does the chance of failures. Identifying and correcting these errors immediately is very important in parallel and distributed image processing. Several kinds of error can occur, such as node failure and communication link failure. This paper addresses node failure, i.e. deciding which nodes in a group have failed and which are active. If a failed node is not corrected, the entire cluster may stop working, which affects the entire system. Sometimes a healthy node carries such a heavy load that it cannot send its reply message to the coordinator in time, and some recent algorithms wrongly declare such healthy nodes as failed. This paper proposes a solution to that problem. One recently proposed algorithm gives better results by increasing the time in every step for heavily loaded nodes, but it still relies on a fixed time within which every node must respond.

In parallel and distributed image processing, accurate results matter most. The algorithms mentioned above are only 90% accurate, which is not enough for image processing, where at least 99.99% is needed. Accuracy matters because image processing and computing have become key to many areas, such as medicine, defense, satellite imaging, weather forecasting, geography, and biology.

Related work:

Failure node detection algorithms depend largely on the heartbeat strategy. In this strategy every node must respond to its master whenever the master sends a message to check whether the node is active or failed. To learn the status of every node, the central coordinator sends messages to all nodes in the cluster and fixes a single time within which every node must send its reply to the coordinator [1]. This is not reliable: a node under heavy load and traffic may not reply in time, and the algorithm then declares it a failure node even though it is active. To overcome this drawback, an algorithm proposed in 2012 lets the user fix a minimum time and a maximum time. The process starts with the minimum time, within which every node should respond. The time is then incremented for the nodes that did not reply because of traffic and load, and incremented again until it reaches the maximum time. Nodes that have still not responded by then are declared failure nodes [2].

N. Hayashibara proposed a failure detector, κ-FD, that is based on a statistical method. It associates with each monitored process a real number representing the level of confidence that the process has crashed. κ-FD outputs a value calculated as the sum of contributions from expected heartbeats. The basic idea is that each missed heartbeat raises the failure detector's level of suspicion [3].

W. Chen studied the quality of service (QoS) of failure detectors and proposed a set of metrics that measure how fast a failure detector finds actual faults and how well it avoids false detections, in systems with message losses and message delays. In short, the metrics describe how well a failure detector works [4][8].

Another researcher designed an adaptive failure detection protocol in which process Pi monitors the state of process Pj to learn whether it has crashed. For each process Pi the protocol maintains two arrays of local variables. One holds the sending times of the messages that Pi has sent to Pj and whose acknowledgements Pi has not yet received. The other holds the biggest round-trip time of the messages that Pi has sent to Pj and that have been acknowledged; this variable is initially set to zero. The protocol provides three primitives. SEND M to Pj is used by Pi to send an application message M to Pj. RECEIVE M is used by Pi to receive an application message. Query(j) is used to learn whether Pj is suspected to have crashed; it returns the value suspect or no suspect [7].

A slow process must be distinguished from a crashed process, and slowness depends on load and traffic. For this, one author proposed load balancing in the form of a unified repartitioning algorithm that reduces load overhead. With less load a process works faster and can send its response. The algorithm partitions the load into equal parts and distributes them among all processes (i.e. it shares the work) [9]. Building on the papers above, I propose an improved algorithm that detects failure nodes using the turnaround time of each node.
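The adaptive protocol of [7] described above can be illustrated with a minimal sketch. The class name, the method names, and the use of a dictionary in place of the two arrays are hypothetical. The suspicion rule (suspect Pj when some unacknowledged message has been pending longer than the biggest round-trip time seen so far) is an assumption of this sketch, since the text above does not state the exact rule.

```python
class AdaptiveDetector:
    """Process Pi's view of one monitored process Pj (illustrative only)."""

    def __init__(self):
        self.pending = {}    # message id -> sending time, not yet acknowledged
        self.max_rtt = 0.0   # biggest round-trip time seen so far, initially zero

    def send(self, msg_id, now):
        # Remember when the message left Pi.
        self.pending[msg_id] = now

    def acknowledge(self, msg_id, now):
        # The message is no longer pending; update the biggest round-trip time.
        sent_at = self.pending.pop(msg_id)
        self.max_rtt = max(self.max_rtt, now - sent_at)

    def query(self, now):
        # Assumed rule: suspect Pj if any message has waited longer than max_rtt.
        overdue = any(now - sent_at > self.max_rtt
                      for sent_at in self.pending.values())
        return "suspect" if overdue else "no suspect"


detector = AdaptiveDetector()
detector.send(1, now=0.0)
detector.acknowledge(1, now=2.0)   # round-trip time of 2.0
detector.send(2, now=5.0)
print(detector.query(now=6.0))     # waited 1.0, not longer than 2.0
print(detector.query(now=8.0))     # waited 3.0, longer than 2.0
```

Under this assumed rule the first query prints "no suspect" and the second prints "suspect". Because the biggest round-trip time starts at zero, any message still pending before the first acknowledgement is suspected at once, which a real protocol would need to handle.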

Existing System: Detecting failure nodes in parallel and distributed image processing is a very complex task. Many algorithms have been proposed, but their accuracy is low. Their main disadvantage is that under heavy load they sometimes identify a healthy node as a failure node, and sometimes they cannot distinguish a crashed process from a slow one. Improving the accuracy of the algorithm is therefore very important.

Proposed System

I calculate the turnaround time of each node in the cluster: before a node is inserted into the cluster, I assign it a job, and the time it takes to complete that job is taken as the node's turnaround time. From the turnaround times of all nodes I take the minimum (tamin), the maximum (tamax), and the next-larger values after tamin (tamax1, tamax2, ...). Even if the load increases, a node executes its task within its turnaround time, so it responds within tamax. I assign tamin as the starting time, then tamax1, then tamax2, and so on up to tamax, which is the final time. For example, suppose there are 10 nodes n1, n2, ..., n10 with turnaround times 10, 12, 2, 8, 4, 3, 9, 6, 7, 15. Then tamin = 2 and tamax = 15.
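As a rough illustration, the escalating time-outs for the example above could be derived as follows. The function name `timeout_schedule` and the parameter `n_min` (how many of the smallest turnaround times to use before jumping to the maximum) are assumptions of this sketch; the value 4 follows the four minimum times mentioned in the abstract.

```python
def timeout_schedule(turnaround_times, n_min=4):
    """Return the n_min smallest distinct turnaround times followed by the maximum."""
    distinct = sorted(set(turnaround_times))
    if not distinct:
        return []
    steps = distinct[:n_min]
    # Always finish with the maximum turnaround time (tamax).
    if distinct[-1] not in steps:
        steps.append(distinct[-1])
    return steps


# Turnaround times of nodes n1..n10 from the example in the text.
times = [10, 12, 2, 8, 4, 3, 9, 6, 7, 15]
print(timeout_schedule(times))  # [2, 3, 4, 6, 15]
```

The schedule starts at the minimum turnaround time (2) and ends at the maximum (15). Setting `n_min` to the number of nodes makes the schedule step through every distinct turnaround time instead.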


If an answer does not arrive within 2 seconds, I move to the next time, i.e. tamax1 = 3, and so on up to tamax = 15. Nodes that have not answered even after the 15 seconds are up are declared failure nodes. At every time increment the message queue is updated with the latest messages, which are used to find the failure nodes. In every round all nodes must send the response "fine"; for each node that answers within the time, the corresponding message and node information is stored in the queue. When the time is increased for the nodes that did not answer, all nodes answer again, including those that already sent a message in the previous round, so the queue must be updated with the latest messages. After the maximum time has elapsed the algorithm stops. I then search the message queue to see which nodes sent a message within the time and which did not; any node that did not send a reply has failed.
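The rounds described above can be sketched as a small simulation. This is not the author's Java simulator. The function name `detect_failures`, the representation of a node by the delay before it answers (`None` for a node that never answers), and the layout of the queue entries are all assumptions made for illustration.

```python
def detect_failures(response_delay, schedule):
    """Simulate the escalating time-out rounds.

    response_delay: node name -> seconds until it answers "fine", or None if it never answers.
    schedule: ascending list of time-outs, from tamin up to tamax.
    Returns (working_list, failure_nodes).
    """
    message_queue = {}  # node name -> latest message and the round it arrived in
    for timeout in schedule:
        # Every node that can answer within this time-out sends "fine" again,
        # so its queue entry is overwritten by the latest message.
        for node, delay in response_delay.items():
            if delay is not None and delay <= timeout:
                message_queue[node] = ("fine", timeout)
        if len(message_queue) == len(response_delay):
            break  # all nodes answered, no need to increase the time
    working = [n for n in response_delay if n in message_queue]
    failed = [n for n in response_delay if n not in message_queue]
    return working, failed


delays = {"n1": 1, "n2": 5, "n3": None, "n4": 14}
working, failed = detect_failures(delays, [2, 3, 4, 6, 15])
print(working)  # ['n1', 'n2', 'n4']
print(failed)   # ['n3']
```

Here n4 is heavily loaded and answers only in the last round, yet it still ends up in the working list. Only n3, which never answers, is reported as a failure node.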


Flowchart of the proposed algorithm:

Calculate the turnaround time of each node.
Take the maximum and minimum turnaround times and the next two minimum times.
Set time(i).
If any reply arrives, put the replying nodes in the working list and update the list.
Stop the procedure and display the failure nodes.

Results and Discussion:

I developed a simulator in Java for failure node detection. The results obtained from the simulation are listed in the table below; they are better than those of the previous algorithm.

Table: failure nodes detected by the two algorithms.

No. of nodes | Adaptive staircase algorithm | Proposed algorithm
The algorithm successfully identifies a node as failed only when it has actually failed, even when nodes carry heavy traffic and load. The turnaround time is what makes this possible: every node has a turnaround time within which it completes its given task even under heavy load. I set the maximum turnaround time as the maximum time within which all nodes must send their message, and the nodes that do not are the failure nodes.