Mobile ad hoc networking allows portable mobile devices to establish communication paths without any centralized infrastructure. Because there is no centralized infrastructure and the mobile devices move randomly, various problems arise, such as routing and detecting faulty mobile nodes in the network. Nodes may fail because of battery discharge, crashes, or aging. In this thesis, the problem of adaptive fault diagnosis in mobile ad hoc networks (MANETs) is considered. Fault diagnosis in MANETs is a very challenging task.
In fact, fault diagnosis is an important building block for establishing dependability in a MANET. An important problem in MANETs is the distributed system-level diagnosis problem, whose purpose is to have each fault-free mobile node determine the state of all the mobile nodes in the system. Parameters such as diagnosis latency and message complexity are used to evaluate the proposed diagnosis algorithm. The results show that diagnosis latency and message complexity are reduced compared with the non-clustering distributed diagnosis algorithm Forward Heartbeat proposed in . A diagnosis algorithm should be efficient enough to find the status (faulty or fault-free) of each mobile node in the network. The models in the literature address either static faults or dynamic faults, and identifying dynamic faults is more complex and difficult than identifying static faults.
A MANET is an autonomous collection of mobile nodes without any centralized infrastructure such as a base station. MANETs are very useful when infrastructure is unavailable, impractical, or expensive, because they can be deployed rapidly without prior planning or any existing infrastructure. Mobile ad hoc networks are used mostly in military communication among soldiers, planes, tanks, and so on. Each node is equipped with a wireless transmitter and receiver. Mobile hosts in a MANET may be highly mobile or stationary and may vary widely in their characteristics, uses, and capabilities: they may differ in communication range, processing, storage, and power capabilities, and exhibit varying degrees of reliability. Since nodes are mobile, the network topology may change rapidly and unpredictably over time. The network has to support multi-hop paths over wireless links so that mobile nodes can communicate with each other, and its connection point to the Internet may also change. If two mobile nodes are within communication range of each other, the source can send a message directly to the destination; otherwise, it sends the message through intermediate nodes.
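The direct-versus-relayed delivery rule above can be illustrated with a small sketch. It assumes a simplified unit-disk radio model in which every node has the same range (the text notes that real ranges may differ) and finds a minimum-hop relay path with breadth-first search; the function names and coordinates are hypothetical.

```python
import math
from collections import deque

def neighbors(positions, radio_range):
    """Unit-disk model (assumption): u and v talk directly if within range."""
    return {u: [v for v in positions
                if v != u and math.dist(positions[u], positions[v]) <= radio_range]
            for u in positions}

def hop_path(positions, radio_range, src, dst):
    """Minimum-hop path from src to dst through intermediate nodes, or None."""
    adj = neighbors(positions, radio_range)
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:          # walk back along BFS parents
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None                           # destination unreachable

positions = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
print(hop_path(positions, 1.2, "A", "C"))  # ['A', 'B', 'C'] (relayed via B)
print(hop_path(positions, 2.5, "A", "C"))  # ['A', 'C'] (direct)
```

Because the topology changes as nodes move, such a path would have to be recomputed whenever positions change.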
Mobile ad hoc networks achieve robust and efficient operation by incorporating routing functionality into the mobile nodes, which therefore act as routers and not just as hosts; this reduces routing overhead and saves energy for other nodes.
CHARACTERISTICS OF MANETS
The main characteristics of ad hoc networks are:
Dynamic topologies:
This is the most essential property of an ad hoc network: nodes can move arbitrarily with respect to other nodes in the network.
Bandwidth-constrained links:
Because nodes in an ad hoc network are mobile, they communicate over radio links, which have far lower capacity than hardwired links. In practice, the realized throughput of a wireless network is lower than a radio's theoretical maximum transmission rate.
Energy constrained operation:
Mobile nodes are likely to rely on batteries, so energy conservation may sometimes be the primary design criterion.
Limited physical security:
In general, radio networks are more vulnerable to physical security threats than fixed networks; the possibility of eavesdropping, spoofing, and DoS attacks is higher. Existing link security techniques can be applied. On the other hand, because ad hoc networks are decentralized, a single point of failure is not as critical as in more centralized networks.
DISTRIBUTED SYSTEM LEVEL DIAGNOSIS:
An important problem in designing dependable MANETs that are subject to the failure of mobile hosts is the distributed self-diagnosis problem. In distributed self-diagnosis, each working (fault-free) mobile host maintains correct information about the status (working or failed) of every mobile host in the entire MANET so that corrective action can be taken. Existing distributed self-diagnosis algorithms were developed for wired networks and assume a centralized infrastructure, which creates a bottleneck and a single point of failure.
ADAPTIVE SYSTEM LEVEL DIAGNOSIS
The field of distributed system-level diagnosis has flourished for years. Consider a system consisting of N units, each of which can be faulty or fault-free. The goal of system-level diagnosis is to determine the state of those units, and researchers have worked on this problem for almost 30 years. An adaptive approach, which requires fewer tests, assumes that each unit is capable of testing any other and issues tests adaptively; that is, the choice of the next tests depends on the results of previous tests rather than on a fixed pattern. Bianchini and Buskens introduced the Adaptive Distributed System-level Diagnosis (Adaptive-DSD) algorithm, which is both distributed and adaptive, together with its implementation in an Ethernet environment. Adaptive-DSD is executed at each node of the system at predefined testing intervals, and each node is tested only once per testing interval. Each time the algorithm is executed on a fault-free node, that node tests other nodes until it finds another fault-free node or runs out of nodes to test. A testing round is defined as the period of time in which all nodes of the system have executed Adaptive-DSD at least once. All fault-free nodes achieve consistent diagnosis in at most N testing rounds, so Adaptive-DSD has a diagnosis latency of N testing rounds for a network of N nodes, and there is no limit on the number of faulty nodes for fault-free nodes to diagnose the system. After one testing round, if there are at least two fault-free units, the testing graph forms a ring, as shown in Fig. In the example shown in Fig., nodes 1, 4, and 5 are faulty, and the rest are fault-free. Node 0 tests node 1 and finds it faulty, so it goes on to test node 2, which is fault-free, and then stops testing. Node 2 then tests node 3 and finds it fault-free, and so on.
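A single Adaptive-DSD testing round, as described above, can be sketched as follows. This is a simplified synchronous simulation under stated assumptions: node identifiers are 0..N-1, each node tests its successors in increasing cyclic order, faulty nodes are simply skipped (their real behavior is arbitrary), and N = 8 is assumed because the figure's node count is not given in the text.

```python
def adaptive_dsd_round(n, faulty):
    """Return {tester: [nodes tested, in order]} for every fault-free node.

    Each fault-free node i tests i+1, i+2, ... (mod n) until it finds a
    fault-free node or runs out of nodes to test.
    """
    tests = {}
    for i in range(n):
        if i in faulty:
            continue                  # a faulty node's tests are arbitrary; not modelled
        seq = []
        for step in range(1, n):
            j = (i + step) % n
            seq.append(j)
            if j not in faulty:
                break                 # next fault-free node found: stop testing
        tests[i] = seq
    return tests

tests = adaptive_dsd_round(8, faulty={1, 4, 5})
for tester, seq in tests.items():
    print(tester, "tests", seq)
```

The last node in each list is the tester's fault-free successor, so the fault-free nodes form the ring 0, 2, 3, 6, 7 and back to 0, matching the example in the text.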
Each node i that executes the algorithm keeps an array, called TESTED-UP_i, with N entries indexed by node identifier. The entry TESTED-UP_i[k] = j means that node i has received diagnostic information from a fault-free node specifying that node k has tested node j and found it fault-free. An entry TESTED-UP_i[j] is arbitrary if node j is faulty. When node i finds node j to be fault-free, it records this in its own entry, TESTED-UP_i[i] = j. In the next testing round, this test result is taken by the first fault-free predecessor of i (the node that tests i), and so on, until all nodes have the information. In this way, the diagnostic information in the TESTED-UP arrays is forwarded in the reverse direction of the testing network. Using the information in TESTED-UP_i, node i can diagnose the state of all nodes in the system; a separate algorithm, called Diagnose, is employed for this task. Because the information moves back one node per testing round around a ring of up to N nodes, Adaptive-DSD has a diagnosis latency of N testing rounds, and it is desirable to reduce this latency.
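The Diagnose step can be sketched as a walk along the TESTED-UP chain: starting from node i, every node reached by following fault-free test results is marked fault-free, and every node not reached is marked faulty. The sketch assumes the array has already been fully propagated (that is, after the N-round latency), represents the arbitrary entries of faulty nodes as None, and encodes the ring from the example above with N = 8 assumed.

```python
def diagnose(i, tested_up):
    """Diagnose every node from node i's TESTED-UP array.

    tested_up[k] = j means k tested j and found it fault-free; entries of
    faulty nodes are arbitrary (None here).
    """
    n = len(tested_up)
    state = ["faulty"] * n            # nodes not reached on the chain stay faulty
    node = i
    # Stop when the chain returns to an already-marked node (normally i itself);
    # the check also guards against an inconsistent array.
    while node is not None and state[node] != "fault-free":
        state[node] = "fault-free"
        node = tested_up[node]        # follow the chain to the next fault-free node
    return state

# TESTED-UP for the example ring 0 -> 2 -> 3 -> 6 -> 7 -> 0 (N = 8 assumed)
tested_up = [2, None, 3, 6, None, None, 7, 0]
print(diagnose(0, tested_up))
```

Every fault-free node holding the same propagated array reaches the same verdict, which is what "consistent diagnosis" means here.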
In a mobile ad hoc network, clustering can be defined as a notional arrangement of the dynamic nodes into groups. Nodes are grouped according to their proximity within each other's transmission range, which allows them to establish bidirectional links. Depending on cluster diameter, clustering architectures are classified as single-hop or multi-hop (k-hop). In single-hop clustering, every member node is at most one hop from a central coordinator, the clusterhead, so all member nodes of a logical cluster are at most two hops from each other. Multi-hop clustering removes the requirement that members be immediate neighbors of the head, allowing them to be up to k hops away from it within a cluster.
Ordinary nodes (cluster members): As the name suggests, ordinary nodes perform no function beyond the normal node role. Each is a member of exactly one cluster, independent of neighbors residing in a different cluster.
Cluster gateway nodes: A gateway is a node that acts as a common or distributed access point for two clusterheads; a node takes this role when it lies within the transmission range of two clusterheads.
Clusterhead nodes: For any efficient cluster operation (a cluster being a subset of nodes in the network that satisfies a particular property), there must be a support or backbone to sustain all essential control functions, such as channel access, routing, calculation of routes for longer-distance messages, bandwidth allocation, forwarding of inter-cluster packets, power control, and virtual-circuit support. This backbone takes the form of connected clusterheads in a managerial role, linked either directly or via gateway nodes, with the subordinate nodes of each cluster linked to them. Clusterheads also handle communication for the nodes inside their cluster by forwarding inter-cluster messages. To send a packet, an ordinary node must first direct it to its 'superior', i.e., its directly connected clusterhead.
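The three node roles can be illustrated with a single-hop clustering sketch. The text does not name a clustering algorithm, so this uses the well-known lowest-ID heuristic as a stand-in: nodes are visited in increasing ID order, an uncovered node becomes a clusterhead and claims its uncovered neighbours, and a non-head node that can hear two or more clusterheads is reported as a gateway. The chain topology is hypothetical.

```python
def lowest_id_clusters(adj):
    """Single-hop lowest-ID clustering (stand-in algorithm).

    adj maps node id -> list of neighbour ids (symmetric links assumed).
    Returns (clusterheads, head_of, gateways).
    """
    head_of = {}
    heads = set()
    for v in sorted(adj):
        if v in head_of:
            continue                    # already covered by a lower-ID clusterhead
        heads.add(v)                    # uncovered node becomes a clusterhead
        head_of[v] = v
        for u in adj[v]:
            head_of.setdefault(u, v)    # uncovered neighbours join this cluster
    gateways = {v for v in adj
                if v not in heads and sum(u in heads for u in adj[v]) >= 2}
    return heads, head_of, gateways

chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
heads, head_of, gateways = lowest_id_clusters(chain)
print(heads, head_of, gateways)
```

Here nodes 0, 2, and 4 become clusterheads, and nodes 1 and 3 are ordinary members that also act as gateways because each hears two clusterheads. An ordinary node would hand an outgoing packet to `head_of[node]` first.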
A node becomes faulty because of battery discharge, crashes, or aging. Faulty nodes reduce the efficiency and throughput of the network and make it inconsistent: they cannot communicate with the other mobiles, or they behave unexpectedly and send unexpected results, thereby consuming energy unnecessarily and causing inconsistency. Many of the protocols that researchers have introduced to identify faults in ad hoc networks are for static diagnosis, where nodes cannot change their status during the diagnosis session. Identifying (hard or soft) faults in dynamic diagnosis is more complex than in static diagnosis, because a fault-free node can become faulty during the diagnosis session.
Types of Faults
Each node in the system is in one of two states: faulty or fault-free. Faults can be categorized by their duration, by how the node behaves after failure, and by when the fault occurs relative to the diagnosis session.
Based on duration: faults can be of three types:
1. Transient fault: A transient fault appears in the network for a short time and can disappear without any visible event. Recovery from transient faults is addressed using repeated-round (retry) techniques: a probabilistic model is used for the behavior of faulty periods, and fault analysis is used to obtain the optimum retry period.
2. Intermittent fault: A problematic type of transient fault whose appearance and disappearance in the network cannot be predicted. An intermittent fault can be caused by several factors, which can only be identified when the malfunction occurs. Intermittent faults are difficult to identify and repair.
3. Permanent fault: Once it appears in the network, it remains until it is removed and repaired by an external administrator. Permanent faults are simpler to deal with.
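The difference between the three duration classes can be made concrete with a heuristic that inspects a node's per-round test outcomes. This is a hypothetical sketch, not a method from the text: a node is called permanently faulty only if its last few rounds (the assumed `permanent_tail` parameter, standing in for a retry period) all failed, a single burst of failures that has cleared is called transient, and repeated bursts are called intermittent.

```python
def classify_fault(history, permanent_tail=3):
    """Classify a node from per-round test outcomes (True = test failed)."""
    bursts = 0
    previous = False
    for failed in history:
        if failed and not previous:
            bursts += 1               # start of a new run of failed rounds
        previous = failed
    if bursts == 0:
        return "fault-free"
    tail = history[-permanent_tail:]
    if len(tail) == permanent_tail and all(tail):
        return "permanent"            # still failing after the retry window
    return "transient" if bursts == 1 else "intermittent"

print(classify_fault([False, True, True, False, False]))              # transient
print(classify_fault([False, True, False, True, False, True, False]))  # intermittent
print(classify_fault([False, False, True, True, True, True]))         # permanent
```

A fault that has been failing for fewer than `permanent_tail` rounds is still reported as transient, which is exactly why the choice of retry period matters.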
Based on behavior: faults can be of two types:
1. Soft fault: Soft-faulted units can communicate with their neighbors but behave unexpectedly and always give undesirable responses.
2. Hard fault: Hard-faulted units cannot communicate with their neighbors; they neither send nor receive any information from the network.
Based on occurrence: faults can be of two types:
1. Static fault: All faulty nodes are faulty from the start of the diagnosis session, and fault-free nodes cannot become faulty during the session.
2. Dynamic fault: A fault-free node may become faulty during the diagnosis session. Such faults are hard to diagnose because any node may fail after a fault-free node has diagnosed it as fault-free.
Another type of fault is the Byzantine fault, in which components of a system fail in arbitrary ways, for example by processing requests incorrectly. Byzantine failures are of two types:
1. Omission failures: The component does not respond to a request, e.g., it crashes, fails to receive a request, or fails to send a response.
2. Commission failures: The component may respond in any unpredictable way, e.g., by processing a request incorrectly, corrupting its local state, and/or sending an incorrect or inconsistent response to a request.
With the fast development of mobile ad hoc networks (MANETs), fault diagnosis has become a critical need for guaranteeing robust service to various applications. Many techniques have been suggested to solve this problem, but they still cannot satisfy the special needs of MANETs. Fault identification is an important part of many protocols. When the system or nodes of the network show altered behavior, a diagnosis function is started to determine which nodes have behaved abnormally. This process is termed diagnosis. Based on when faults occur, diagnosis is classified as static or dynamic.
In static diagnosis, faults do not occur during the diagnosis session. In dynamic diagnosis, faults can occur during the diagnosis session, which is difficult to handle because a node can become faulty after another node has diagnosed it as fault-free.
Methods of Fault Diagnosis
Several diagnosis methods have been adopted, based either on invalidation models, such as the PMC model, or on comparison models, including the generalized comparison model. The comparison model is the most promising approach: a set of tasks is assigned to the nodes, and their outcomes are compared with their neighbors' outcomes. Various generalized comparison approaches have been used; in these, the comparison is done by the nodes themselves. The generalized comparison outcomes can be summarized as follows. If the tester and the tested nodes are fault-free, the comparison outcome is 0. If the tester is fault-free and at least one of the tested nodes is faulty, the comparison outcome is 1. If the tester is faulty, the comparison outcome is unpredictable (0 or 1).
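The generalized comparison rules above can be turned into a small brute-force diagnoser. The sketch is an illustration under stated assumptions: a fully connected five-node system in which every node compares every pair of other nodes, a faulty tester's outcome drawn at random, and a search over all fault sets of at most `t_max` nodes for those consistent with the collected syndrome. The function names are hypothetical, and practical diagnosis algorithms avoid this exponential search.

```python
import itertools
import random

def comparison_outcome(tester, a, b, faulty, rng):
    """Outcome of `tester` comparing the outputs of nodes a and b."""
    if tester in faulty:
        return rng.randint(0, 1)                  # faulty tester: unpredictable
    return 1 if (a in faulty or b in faulty) else 0

def make_syndrome(tests, faulty, seed=0):
    """Collect the outcome of every (tester, a, b) comparison test."""
    rng = random.Random(seed)
    return [comparison_outcome(t, a, b, faulty, rng) for t, a, b in tests]

def consistent(candidate, tests, syndrome):
    """Could the fault set `candidate` have produced this syndrome?"""
    for (t, a, b), outcome in zip(tests, syndrome):
        if t in candidate:
            continue                              # any outcome allowed from a faulty tester
        if outcome != (1 if (a in candidate or b in candidate) else 0):
            return False
    return True

def diagnose_by_comparison(nodes, tests, syndrome, t_max):
    """All fault sets with at most t_max nodes that explain the syndrome."""
    return [set(c) for k in range(t_max + 1)
            for c in itertools.combinations(nodes, k)
            if consistent(set(c), tests, syndrome)]

nodes = range(5)
tests = [(t, a, b) for t in nodes
         for a, b in itertools.combinations(nodes, 2) if t not in (a, b)]
syndrome = make_syndrome(tests, faulty={3})
print(diagnose_by_comparison(nodes, tests, syndrome, t_max=1))   # [{3}]
```

The faulty node's own random outcomes do not matter, because any fault set containing the tester is allowed to explain them.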
1) Fault-tolerant cluster-based routing approach in wireless mobile ad hoc networks (2001)
In this paper, the authors (Sheng Xu and Symeon Papavassiliou) propose a new routing algorithm for mobile survivable networks that combines position-based routing concepts with fault-tolerant routing techniques from computer networks. This combination localizes routing overhead in a simplified way and, at the same time, improves the operational effectiveness of position-based routing by alleviating some of its drawbacks, such as routing deadlocks, thereby creating a robust and fault-tolerant routing strategy.
The starting point is the Position-guided Sliding-window Routing (PSR) protocol, which provides a single-tier routing organization by localizing routing overhead in a simplified way. The paper enhances this approach by adding a further level of hierarchy (at the cluster level, which is of much smaller scale) to improve the operational effectiveness of the scheme and alleviate some of the drawbacks of position-based protocols, such as routing deadlocks.
To overcome the deadlocks and loops inherent in position-based routing schemes, gateways are used as intermediate hops along the path to the destination. When a packet arrives at a gateway, the gateway checks whether a path exists from the local node to another gateway of the local cluster that is closer to the destination. If a deadlock or loop is detected during this operation, the gateway of the previous cluster is asked to change the path to another cluster. In this way, grid-clustered PSR avoids creating deadlocks and loops.
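The deadlock problem that the gateways work around can be reproduced with plain greedy position-based forwarding, in which each node forwards to the neighbour closest to the destination and gets stuck when no neighbour is closer. The sketch below is not PSR or its grid-clustered variant: it contrasts greedy forwarding with a simple backtracking fallback, loosely analogous to asking the previous hop to change the path. The coordinates, radio range, and function names are hypothetical.

```python
import math

def build_neighbors(pos, radio_range):
    """Nodes within radio_range of each other are neighbours (unit-disk assumption)."""
    return {u: [v for v in pos if v != u and math.dist(pos[u], pos[v]) <= radio_range]
            for u in pos}

def greedy_route(pos, nbrs, src, dst):
    """Forward to the neighbour closest to dst; stop if no neighbour is closer."""
    path, cur = [src], src
    while cur != dst:
        best = min(nbrs[cur], key=lambda w: math.dist(pos[w], pos[dst]), default=None)
        if best is None or math.dist(pos[best], pos[dst]) >= math.dist(pos[cur], pos[dst]):
            return path, False        # local minimum: greedy forwarding deadlocks
        path.append(best)
        cur = best
    return path, True

def backtracking_route(pos, nbrs, src, dst):
    """Depth-first search trying closer neighbours first and backing out of dead ends."""
    visited, path = {src}, [src]

    def dfs(u):
        if u == dst:
            return True
        for v in sorted(nbrs[u], key=lambda w: math.dist(pos[w], pos[dst])):
            if v not in visited:
                visited.add(v)
                path.append(v)
                if dfs(v):
                    return True
                path.pop()            # dead end: return to the previous hop
        return False

    return path if dfs(src) else None

pos = {0: (0, 0), 1: (1, 0), 2: (0, 1.2), 3: (1, 2.2),
       4: (2.4, 2.2), 5: (3.3, 1.2), 6: (3.3, 0)}
nbrs = build_neighbors(pos, 1.5)
print(greedy_route(pos, nbrs, 0, 6))        # ([0, 1], False)
print(backtracking_route(pos, nbrs, 0, 6))  # [0, 2, 3, 4, 5, 6]
```

Node 1 is closer to the destination than any of its neighbours, so greedy forwarding stops there even though a detour through nodes 2 to 5 exists.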
The paper addresses the problem of providing scalable fault-tolerant routing in large-scale mobile wireless networks. Although the scheme adopts the concept of cluster creation, it avoids the use of clusterheads and thus achieves a high level of fault tolerance. Also, because of the introduction of the cluster level, less service information needs to be transmitted, which makes it easier to use more elaborate adaptive routing techniques and further improves network performance. The paper also points to a complete, in-depth comparative evaluation of the basic PSR approach and the grid-clustered PSR scheme in terms of performance parameters such as average packet delivery time, maximal throughput, number of hops, and packet delivery ratio.
2) A Hierarchical Adaptive Distributed System-Level diagnosis algorithm (1998)
In this paper, the authors (Elias Procópio Duarte and Takashi Nanya) consider a system composed of N nodes that can be faulty or fault-free. The purpose of distributed system-level diagnosis is to have each fault-free node determine the state of all nodes of the system. The paper presents a Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm, a fully distributed algorithm that allows every fault-free node to achieve diagnosis in at most $(\log_2 N)^2$ testing rounds. Nodes are mapped into progressively larger logical clusters, so that tests are run in a hierarchical fashion. Each node executes its tests independently of the other nodes, i.e., tests are run asynchronously. All the information that nodes exchange is diagnostic information. The algorithm assumes no link faults and a fully connected network, and imposes no bound on the number of faults. Both the worst-case diagnosis latency and the correctness of the algorithm are formally proved. As an example application, the algorithm was implemented on a 37-node Ethernet LAN and integrated into a network management system based on SNMP (Simple Network Management Protocol). Experimental results of fault and repair diagnosis are presented. The implementation is itself a significant contribution: although fault management is a key functional area of network management systems, currently deployed applications often implement only rudimentary diagnosis mechanisms.
Hi-ADSD maps nodes to clusters and uses a divide-and-conquer testing strategy to achieve diagnosis in at most $(\log_2 N)^2$ testing rounds. In this way, Hi-ADSD improves on the diagnosis latency of previous algorithms while keeping the number of tests conveniently low.
3) A Fault-Tolerant Mutual Exclusion Resource Reservation Protocol for Clustered Mobile Ad hoc Networks (2007)
In this paper, the authors (Mohammad Moallemi and Mohammad Hossien Yaghmaee Moghaddam) note that resource reservation and mutual exclusion are challenging problems in mobile ad hoc networks (MANETs), and that because of the dynamic characteristics of nodes in these networks, few algorithms have been proposed so far. Another problem in these networks is link or node failure, which has many causes (e.g., running out of battery, hardware or software crashes, or moving out of transmission range due to high mobility). Fault tolerance is therefore another necessity for these algorithms that has not been fully achieved. The authors propose an algorithm that is completely fault tolerant (it covers temporary and permanent faults) and that also provides mutual exclusion for critical resource reservations. The proposed algorithm uses three recovery processes to maintain a stable state for the whole system.
The proposed algorithm adds fault tolerance to the Naimi-Trehel algorithm and distributes it independently across the clusters. It uses a hierarchical structure and supports fault tolerance for both temporary and permanent failures, with three recovery processes to recover from failures. Recovering from a failure in recovery processes R1 and R2 takes three intra-cluster broadcasts and one overall broadcast. The algorithm preserves the order of token requests after a failure and can tolerate N-1 failures out of N nodes. The authors also examine the algorithm's safety and liveness properties to show its integrity.
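The fault-tolerance layer itself is not described here in enough detail to reconstruct. As background, the following sketches the underlying failure-free Naimi-Trehel token algorithm as we understand it: each node keeps a `last` pointer toward the probable token owner and a `next` pointer to the node it must hand the token to, requests travel along `last` pointers to the current root, and the token is sent directly to the requester. The single in-order message queue is a simplifying assumption, and the recovery processes R1 and R2 are not modelled.

```python
from collections import deque

class NaimiTrehel:
    """Failure-free Naimi-Trehel token mutual exclusion (simulation sketch)."""

    def __init__(self, n, root=0):
        self.last = [None if i == root else root for i in range(n)]  # probable owner
        self.next = [None] * n            # node to pass the token to after the CS
        self.has_token = [i == root for i in range(n)]
        self.requesting = [False] * n
        self.in_cs = [False] * n
        self.queue = deque()              # in-flight messages: (kind, destination, requester)

    def request(self, i):
        self.requesting[i] = True
        if self.last[i] is None:          # i is the idle root, so it holds the token
            self.in_cs[i] = True
        else:
            self.queue.append(("REQ", self.last[i], i))
            self.last[i] = None           # i becomes the new root

    def release(self, i):
        self.in_cs[i] = False
        self.requesting[i] = False
        if self.next[i] is not None:      # hand the token to the waiting node
            self.has_token[i] = False
            self.queue.append(("TOKEN", self.next[i], None))
            self.next[i] = None

    def deliver_all(self):
        while self.queue:
            kind, dst, j = self.queue.popleft()
            if kind == "TOKEN":
                self.has_token[dst] = True
                self.in_cs[dst] = True
            elif self.last[dst] is None:  # dst is the root
                if self.requesting[dst]:
                    self.next[dst] = j    # serve j after dst's own critical section
                else:
                    self.has_token[dst] = False
                    self.queue.append(("TOKEN", j, None))
                self.last[dst] = j
            else:                         # forward the request toward the root
                self.queue.append(("REQ", self.last[dst], j))
                self.last[dst] = j
            assert sum(self.in_cs) <= 1   # mutual exclusion holds in this run

mutex = NaimiTrehel(4)
mutex.request(2)
mutex.request(3)
mutex.deliver_all()
print(mutex.in_cs)   # [False, False, True, False]
mutex.release(2)
mutex.deliver_all()
print(mutex.in_cs)   # [False, False, False, True]
```

Request order is preserved through the `next` pointers, which is the property the fault-tolerant extension is said to maintain after a failure.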
4) Design and Evaluation of a Failure Detection Algorithm for Large Scale Ad Hoc Networks Using Cluster Based Approach (2008)
In this paper, the authors (Pabitra Mohan Khilar and Jitendra Kumar Singh) propose a scalable failure detection service for large-scale ad hoc networks using an efficient cluster-based communication architecture. The failure detection service adapts its detection parameter to the current load of the wireless ad hoc network. The approach uses a heartbeat-based testing mechanism to detect failures in each cluster and takes advantage of the cluster-based architecture to forward the failure report to other clusters and their members.
The paper relies on clustering: the failure detection algorithm, coupled with a suitable clustering algorithm, yields an efficient failure detection service for wireless ad hoc networks. Clustering divides the network into a two-level communication architecture, intra-cluster and inter-cluster, so two types of message overhead must be maintained: intra-cluster and inter-cluster. The disadvantage of the clustering approach is that the clusterhead (CH) itself may fail, so the leader must also be monitored, and if it fails another node must take over as CH. The authors solve this problem with a deputy (backup) clusterhead: a cluster member is chosen as deputy clusterhead (DCH) and monitors the leader as follows: (i) after every heartbeat interval, the CH sends a packet to the DCH; (ii) the packet contains information about each node in the group, and its arrival indicates that the CH is up and running; (iii) the DCH updates its database using the data obtained from this packet; (iv) if the packet does not arrive, indicating that the primary CH has failed, the DCH assumes the role of leader; (v) this change is multicast to the cluster members, who update their databases to change the communication path of their heartbeat messages; and (vi) the same change is multicast to the other CHs through gateways (GWs), which multicast it to their respective members.
Simulation results show that the approach is linearly scalable in terms of message complexity and consensus time.
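The intra-cluster part of this scheme, member heartbeats to the CH plus the CH's status packet to the DCH with DCH takeover, can be sketched as below. It is a hypothetical simplification: times are abstract units, messages arrive instantly, steps (v) and (vi) (the multicasts) are not modelled, and choosing the lowest-ID unsuspected member as the new DCH is an assumption, since the text does not say how the replacement is picked.

```python
class ClusterFailureDetector:
    """Heartbeat failure detection in one cluster with a deputy clusterhead."""

    def __init__(self, ch, dch, members, interval, timeout):
        self.ch, self.dch = ch, dch
        self.members = set(members) - {ch}   # non-CH nodes, DCH included
        self.interval = interval             # heartbeat interval
        self.timeout = timeout               # silence after which a member is suspected
        self.last_hb = {m: 0 for m in self.members}
        self.last_ch_packet = 0              # when the DCH last heard from the CH

    def heartbeat(self, node, t):
        """A member's heartbeat reaches the CH at time t."""
        self.last_hb[node] = t

    def ch_packet(self, t):
        """Steps (i)-(iii): the CH's status packet reaches the DCH at time t."""
        self.last_ch_packet = t

    def suspected(self, t):
        """Members whose heartbeats have been silent for longer than the timeout."""
        return {m for m in self.members if t - self.last_hb[m] > self.timeout}

    def dch_check(self, t):
        """Step (iv): if the CH's packet is overdue, the DCH takes over as CH."""
        if t - self.last_ch_packet <= self.interval:
            return False
        self.members.discard(self.dch)
        self.ch = self.dch
        alive = sorted(self.members - self.suspected(t))
        self.dch = alive[0] if alive else None   # new deputy (assumed rule)
        self.last_ch_packet = t                  # the new CH starts its own packets now
        return True

fd = ClusterFailureDetector(ch=0, dch=1, members={0, 1, 2, 3}, interval=1, timeout=3)
fd.heartbeat(3, 1)                   # node 3 falls silent after t = 1
for t in (1, 2):                     # the CH falls silent after t = 2
    fd.ch_packet(t)
    fd.heartbeat(1, t)
    fd.heartbeat(2, t)
print(fd.suspected(5))               # {3}
print(fd.dch_check(5), fd.ch, fd.dch)  # True 1 2
```

The CH monitors its members and the DCH monitors the CH, so no single failure leaves the cluster unmonitored.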
5) Hierarchically Adaptive Distributed Fault Diagnosis in Mobile Ad hoc Networks Using Clustering (2010)
In this paper, the authors (Nishi Yadav and P. M. Khilar) consider the problem of fault diagnosis in MANETs, where the lack of centralized infrastructure and the random movement of mobile devices make routing and the detection of faulty mobile nodes difficult. Fault diagnosis is an important building block for establishing dependability in a MANET, and the distributed system-level diagnosis problem is to have each fault-free mobile node determine the state of all mobile nodes in a MANET composed of N nodes that can be faulty or fault-free. The paper uses the hierarchical clustering approach proposed by Duarte and Nanya for diagnosing nodes in MANETs, and proposes a hierarchically adaptive distributed diagnosis algorithm, based on Hi-ADSD, for diagnosing crash-faulty and value-faulty nodes. Like Hi-ADSD, it maps nodes to clusters and uses a divide-and-conquer testing strategy to achieve diagnosis. The proposed algorithm is linearly scalable under the assumption that mobiles may be (i) crash faulty, due to moving out of range or physical damage, or (ii) value faulty, due to sending erroneous messages while operating in the field. Generic parameters such as diagnosis latency and message complexity are used to evaluate the algorithm.
6) Comparison-Based System-Level Fault Diagnosis in Ad Hoc Networks (2011)
In this paper, the authors (Stefano Chessa and Paolo Santi) consider the problem of fault identification in ad hoc networks. They present a new comparison-based diagnostic model based on the one-to-many communication paradigm, which takes advantage of the shared nature of communication typical of multi-hop packet radio networks, together with two implementations of the model. The first implementation assumes that the network topology is fixed. Under this scenario, hard faults can be detected using a timeout, and efficient diagnosis protocols can easily be designed. If the fixed-topology assumption is relaxed, thus taking into account a relevant feature of ad hoc networks, the "diagnostic efficiency" of the model decreases notably: hard faults cannot be detected, and fault-free nodes are no longer guaranteed to correctly diagnose all their neighbors within a certain time. This indicates that designing diagnosis protocols that ensure correct diagnosis within a limited time could turn out to be very difficult under this scenario. In the authors' opinion, achieving correct diagnosis in the traditional sense (i.e., all the fault-free units of the system correctly diagnose the state of every other unit in finite time) in mobile systems is extremely hard unless some restrictions on the mobility of the units are imposed. Identifying a "minimal" set of restrictions that ensures a somewhat weaker notion of correct diagnosis is a matter of ongoing research.
Current diagnostic models were designed for wired networks, so they do not take advantage of the shared nature of communication typical of ad hoc networks. In the first implementation, the network topology is assumed not to change during diagnosis, and both hard and soft faults can easily be detected; a diagnosis protocol based on this implementation is presented. The evaluation of the protocol's communication and time complexity indicates that efficient diagnosis protocols for ad hoc networks can be designed on the basis of this model. The second implementation allows the system topology to change during diagnosis; as expected, the ability to diagnose faults under this scenario is significantly reduced compared with the stationary case.
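Under the fixed-topology assumption, the one-to-many idea can be sketched as a single broadcast test: a fault-free tester sends one task to all its neighbours at once, a neighbour that does not answer before the timeout is declared hard-faulty, and one whose answer differs from the tester's own result is declared soft-faulty. This is a simplification for illustration, not the authors' protocol; the task, the node behaviours, and the function names are hypothetical.

```python
def one_to_many_test(tester_value, neighbors, replies):
    """Classify neighbours after one broadcast test (fixed topology assumed).

    tester_value: result the fault-free tester computed for the task itself.
    replies: neighbour -> result received before the timeout expired.
    """
    hard = {n for n in neighbors if n not in replies}             # timeout: no reply
    soft = {n for n, r in replies.items() if r != tester_value}   # wrong reply
    return hard, soft

def simulate_replies(neighbors, task, x, behaviour):
    """Hypothetical node behaviours: 'ok', 'soft' (wrong value) or 'hard' (silent)."""
    replies = {}
    for n in neighbors:
        kind = behaviour.get(n, "ok")
        if kind == "hard":
            continue                  # a hard-faulty node never answers
        replies[n] = task(x) + 1 if kind == "soft" else task(x)
    return replies

def square(v):
    return v * v

neighbors = [1, 2, 3, 4]
replies = simulate_replies(neighbors, square, 7, {2: "hard", 4: "soft"})
print(one_to_many_test(square(7), neighbors, replies))   # ({2}, {4})
```

With mobility, a missing reply could equally mean that the neighbour moved out of range, which is why timeout-based hard-fault detection breaks down in the second implementation.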
7) A Distributed System-Level Diagnosis Algorithm for Arbitrary Network Topologies (2005)
In this paper, the authors (Sampath Rangarajan and Anton T. Dahbura) describe a distributed algorithm for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated in parallel throughout the network. The algorithm is formally proven correct, and it is shown to be optimal in terms of the time required for all fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.
The authors present a distributed fault-diagnosis algorithm that uses parallel dissemination of fault-event information to minimize information latency in the network. Simulation of the algorithm using the process-oriented simulation language CSIM shows that parallelizing the dissemination stage also allows nodes local to the event to learn about it, in general, before more distant nodes. Further, a newly repaired node can rejoin the system without relying on other nodes to first detect that it has been repaired; equivalently, faulty nodes do not have to be periodically tested. The algorithm provides an option through which dead messages can be removed at the cost of increased information latency; that is, a trade-off can be made between message overhead and latency.
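The benefit of parallel dissemination can be illustrated by flooding an event from the node that detected it: in each round, every newly informed node forwards the news to all its neighbours at once, so nodes close to the event learn of it first. This is a hypothetical synchronous sketch of the dissemination stage only; the testing stage, the rejoining of repaired nodes, and dead-message removal are not modelled.

```python
def disseminate(adj, source, faulty=frozenset()):
    """Round in which each reachable fault-free node learns of an event at `source`.

    In every round, each node that learned the event in the previous round
    forwards it to all its neighbours in parallel.
    """
    learned = {source: 0}
    frontier = [source]
    rnd = 0
    while frontier:
        rnd += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in learned and v not in faulty:
                    learned[v] = rnd
                    next_frontier.append(v)
        frontier = next_frontier
    return learned

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(disseminate(line, source=2))               # {2: 0, 1: 1, 3: 1, 0: 2, 4: 2}
print(disseminate(line, source=2, faulty={3}))   # {2: 0, 1: 1, 0: 2}
```

The round in which a node learns of the event equals its hop distance from the source, which is the sense in which parallel dissemination minimizes information latency in this toy model.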
8) A Failure Detection Service for Large-Scale Dependable Wireless Ad-Hoc and
In this paper, the authors (Mourad Elhadef and Azzedine Boukerche) note that dependable mobile ad hoc networks are being designed to provide reliable and continuous service despite the failure of some of their components. One of the basic building blocks identified for such fault-tolerant systems is the failure detection service, which provides information on which hosts have crashed. The paper presents a new implementation of a failure detection service for wireless ad hoc and sensor systems, based on an adaptation of a gossip-style failure detection protocol and the heartbeat failure detector. The authors show that the failure detector is eventually perfect, that is, it satisfies both strong completeness and eventual strong accuracy. Strong completeness means that there is a time after which every faulty host is permanently suspected by every fault-free host. Eventual strong accuracy means that there is a time after which no fault-free host is suspected by any fault-free host.
The proposed failure detector is a variant of the heartbeat failure detector and allows each host to maintain a list of hosts it currently suspects of having crashed. Its main characteristic is that it is adaptable and dynamic; that is, it adapts its freshness points to the current network or host load. The distributed failure detection service can be used directly by distributed applications, or it can support other middleware services such as system management, load balancing, and group communication and membership services. As such, failure detection is a valuable extension to the dependable services that a wireless environment is expected to provide. As future work, the authors plan to conduct extensive simulations of the proposed failure detector for various MANETs, to develop a more adaptable failure detection service that scales better to large MANETs, and to investigate the QoS of the adaptive failure detection service.
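The combination of gossip-style heartbeat counters with adaptive freshness points can be sketched as follows. The merge rule (keep the highest counter seen per host) is the usual gossip-style scheme; the freshness-point rule used here (time of last increase, plus the mean recent gap, plus a fixed margin) and all names are assumptions made for illustration, not the authors' exact estimator.

```python
class GossipHeartbeatFD:
    """Gossip-style heartbeat failure detector with adaptive freshness points."""

    def __init__(self, me, hosts, margin, window=10):
        self.me = me
        self.counter = {h: 0 for h in hosts}         # highest heartbeat counter seen
        self.last_increase = {h: 0.0 for h in hosts}  # local time the counter last grew
        self.gaps = {h: [] for h in hosts}           # recent gaps between increases
        self.margin = margin                         # safety margin (assumed constant)
        self.window = window

    def tick(self, now):
        """Increment this host's own heartbeat counter."""
        self.counter[self.me] += 1
        self.last_increase[self.me] = now

    def merge(self, gossip, now):
        """Merge a gossiped counter table, keeping the maximum per host."""
        for h, c in gossip.items():
            if c > self.counter[h]:
                self.gaps[h].append(now - self.last_increase[h])
                self.gaps[h] = self.gaps[h][-self.window:]
                self.counter[h] = c
                self.last_increase[h] = now

    def freshness_point(self, h):
        """Time by which host h's next heartbeat increase is expected."""
        recent = self.gaps[h]
        expected_gap = sum(recent) / len(recent) if recent else 1.0
        return self.last_increase[h] + expected_gap + self.margin

    def suspects(self, now):
        return {h for h in self.counter if h != self.me and now > self.freshness_point(h)}

fd = GossipHeartbeatFD(me=0, hosts=[0, 1, 2], margin=0.5)
for t in range(1, 7):
    fd.tick(t)
    fd.merge({1: t, 2: min(t, 3)}, t)   # host 2's counter stops at 3
print(fd.suspects(6.2))                  # {2}
fd.merge({2: 4}, 7)                      # host 2 was only slow
print(fd.suspects(7))                    # set()
```

A wrongly suspected host is cleared as soon as a newer counter arrives, and the longer observed gap pushes its next freshness point later, which is the kind of adaptation to load the text describes.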
9) Adaptive Fault Tolerance in Distributed Systems (2011)
In this paper, the authors (Roger Bharath, Melanie Dumas, and Mevlut Erdem Kurul) observe that reliable distributed systems provide high availability for an important class of applications through a combination of software and hardware support. Redundancy and replication are essential features of these systems, but both come at a high cost. One trend that promises to bring more intelligence to the allocation of resources in this environment is adaptation. Adaptive fault tolerance (AFT) is the idea of adaptively configuring system resources in response to environmental changes (i.e., faults). The paper presents an overview of several adaptive fault-tolerant systems and describes the challenges involved in their implementation.
The authors present a unified model highlighting the fundamental components in the design of an adaptive fault-tolerant system, and use it to describe a selection of recent representative systems and expose the design decisions made during their construction. Adaptive fault tolerance can increase availability and reliability and decrease cost in a distributed computing environment. Present-day AFT systems are mature in their use of redundancy, communication, and synchronization, but to further the goal of reliability, other directions need to be explored.
Environment awareness and other proactive measures are features of AFT that the authors believe future systems will attempt to leverage.
11) Multipath Routing in Mobile Ad Hoc Networks: Issues and Challenges
In this paper, the authors (Stephen Mueller, Rose P. Tsang, and Dipak Ghosal) explain that mobile ad hoc networks (MANETs) consist of a collection of wireless mobile nodes which dynamically exchange data among themselves without relying on a fixed base station or a wired backbone network. MANET nodes are typically distinguished by their limited power, processing, and memory resources as well as their high degree of mobility. In such networks, wireless mobile nodes may dynamically enter and leave the network. Due to the limited transmission range of wireless nodes, multiple hops are usually needed for a node to exchange information with any other node in the network, so routing is a crucial issue in the design of a MANET. The authors specifically examine the issues of multipath routing in MANETs. Multipath routing allows the establishment of multiple paths between a single source and a single destination node. It is typically proposed in order to increase the reliability of data transmission (i.e., fault tolerance) or to provide load balancing. Load balancing is of special importance in MANETs because of the limited bandwidth between the nodes. The authors also discuss the application of multipath routing to support application constraints such as reliability, load balancing, energy conservation, and Quality of Service (QoS).
The survey discusses multipath routing in ad hoc networks mostly in terms of the network layer and does not address the interaction of multipath routing with the transport layer, in particular TCP. The main issue that must be dealt with at the transport layer is the arrival of out-of-order packets when multiple paths are used in a round-robin fashion. In TCP, out-of-order packets are assumed to signal congestion in the network, at which point TCP reduces its window size. This can have a detrimental effect on the overall throughput of TCP connections, so a TCP-friendly multipath protocol is necessary. The authors discuss CHAMP, which uses equal-length paths to reduce out-of-order packets. However, restricting the search to equal-length paths limits the number of paths that can be found. If unequal paths are chosen, intelligent traffic allocation based on path lengths and path delays could minimize out-of-order packets. For instance, sending later packets over shorter paths and earlier packets over longer paths may reduce out-of-order arrivals at the receiver. This amounts to intelligently sending packets out of order so that they arrive in order at the receiver.
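The idea of sending packets out of order so that they arrive in order can be made concrete with a small Python model. This is a hedged sketch under assumptions not taken from the survey or from CHAMP: each path has a fixed, known one-way delay and can start one packet per time slot, and the function names `schedule` and `reordered` are hypothetical. It compares round-robin assignment with an earliest-arrival rule that puts each packet on whichever path would deliver it first.

```python
def schedule(num_packets, delays, policy="earliest_arrival"):
    """Assign packets (in sequence order) to paths.

    Returns a list of (seq, path, send_time, arrival_time). Hypothetical model:
    path p has a fixed one-way delay delays[p] and starts one packet per slot.
    """
    next_slot = [0] * len(delays)   # next free send slot on each path
    plan = []
    for seq in range(num_packets):
        if policy == "round_robin":
            path = seq % len(delays)
        else:
            # Earliest arrival: choose the path on which this packet would
            # arrive first; ties go to the lower path index.
            path = min(range(len(delays)), key=lambda p: (next_slot[p] + delays[p], p))
        send = next_slot[path]
        plan.append((seq, path, send, send + delays[path]))
        next_slot[path] += 1
    return plan


def reordered(plan):
    """Count packets that arrive before some packet with a lower sequence number."""
    count, latest = 0, float("-inf")
    for _, _, _, arrival in plan:   # plan is listed in sequence order
        if arrival < latest:
            count += 1
        latest = max(latest, arrival)
    return count
```

With a 5-slot path and a 1-slot path, `reordered(schedule(10, [5, 1], "round_robin"))` gives 5, while the earliest-arrival plan gives 0. The long path is used only once the short path is backlogged, so packet 4 leaves on the long path at time 0, before packets 1 to 3 leave on the short path, yet every packet arrives in order. Arrival times never decrease because each packet takes the current minimum over paths, and only the chosen path's next arrival time grows.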
In the discussion of using multipath routing to support QoS, most of the protocols proposed provide QoS only in terms of a single metric, such as bandwidth, delay, or reliability. However, it may be necessary to develop mechanisms that support QoS in terms of multiple metrics. For instance, when searching for multiple paths that have the required bandwidth, it is desirable to find reliable paths. Given the faulty nature of MANETs, constructing a multipath route that meets the bandwidth requirements while also meeting certain reliability requirements would result in better performance. Also, the mechanisms proposed for supporting QoS in terms of delay only attempt to minimize or improve the delay. It would be desirable to develop a multipath protocol that can provide delay bounds or guarantees, which some real-time applications require. Using multipath routing with source coding to provide adaptive QoS is also a promising technique that can be extended to applications other than video.
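Selecting paths by two metrics at once can be sketched in a few lines of Python. The sketch assumes that candidate paths have already been discovered, that a path's bandwidth is its bottleneck (minimum) link bandwidth, and that its reliability is the product of its link reliabilities, which holds only if links fail independently. The function names and the example topology are hypothetical and do not describe any specific protocol from the survey.

```python
from math import prod


def path_metrics(path, links):
    """Bottleneck bandwidth and end-to-end reliability of a path.

    `links` maps frozenset({u, v}) -> (bandwidth, reliability). Multiplying
    link reliabilities assumes links fail independently.
    """
    hops = [links[frozenset(hop)] for hop in zip(path, path[1:])]
    bandwidth = min(bw for bw, _ in hops)
    reliability = prod(rel for _, rel in hops)
    return bandwidth, reliability


def select_paths(candidates, links, min_bandwidth, k=2):
    """Keep paths that meet the bandwidth requirement, then prefer reliable ones."""
    feasible = []
    for path in candidates:
        bandwidth, reliability = path_metrics(path, links)
        if bandwidth >= min_bandwidth:
            feasible.append((reliability, path))
    feasible.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in feasible[:k]]


links = {
    frozenset("SA"): (10, 0.90), frozenset("AD"): (10, 0.90),
    frozenset("SB"): (2, 0.99), frozenset("BD"): (2, 0.99),
    frozenset("SC"): (8, 0.80), frozenset("CD"): (8, 0.95),
}
candidates = [["S", "A", "D"], ["S", "B", "D"], ["S", "C", "D"]]
```

With `min_bandwidth=1` the most reliable path through B is chosen first, but with `min_bandwidth=5` that path is filtered out and the selection becomes the paths through A and C. This shows why optimizing reliability alone, or bandwidth alone, can pick a different route set than a combined requirement.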
12) An Adaptive Fault Tolerant Multipath Routing (AFTMR) Protocol for Wireless Ad Hoc Networks (2012)
In this paper, the authors (K. Vanaja and R. Umarani) note that the increasing popularity of wireless communication devices and advances in wireless technology have made communication more effective and efficient. A mobile ad hoc network (MANET) is a kind of wireless network consisting of a collection of mobile nodes that communicate through wireless links without any infrastructure. Routing protocols are necessary to forward data packets for effective communication, but the performance of MANET routing protocols degrades when a link breaks.
The paper mainly deals with fault management to resolve mobility-induced link breaks. The proposed protocol, adaptive fault tolerant multipath routing (AFTMR), reduces the packet loss caused by mobility-induced link breaks. In this fault-tolerant protocol, battery power and residual energy are taken into account to determine multiple disjoint routes to every active destination. When a link in the existing path breaks, AFTMR initiates a Local Route Recovery Process. The protocol is implemented in Network Simulator NS-2, and its performance is analyzed using quantitative metrics such as packet delivery ratio, end-to-end delay, control overhead, throughput, and packet drop. Simulation results show that the proposed protocol achieves higher throughput and packet delivery ratio, with reduced delay, packet drop, and energy consumption.
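The energy-aware choice of disjoint routes can be sketched in Python under simplifying assumptions. The summary above only states that battery power and residual energy are taken into account. The concrete rule below is a hypothetical stand-in, not the published AFTMR procedure or its Local Route Recovery Process: rank candidate routes by the lowest residual energy of their intermediate nodes, drop routes below a threshold, keep only node-disjoint routes, and fall back to the next kept route when a link breaks. All names and energy values are illustrative.

```python
def route_energy(route, residual):
    """Lowest residual energy among the intermediate nodes of a route."""
    return min((residual[node] for node in route[1:-1]), default=float("inf"))


def disjoint_routes(candidates, residual, min_energy):
    """Greedily keep node-disjoint routes, highest bottleneck energy first."""
    chosen, used = [], set()
    ranked = sorted(candidates, key=lambda r: route_energy(r, residual), reverse=True)
    for route in ranked:
        inner = set(route[1:-1])
        if route_energy(route, residual) < min_energy:
            continue   # some relay is too close to depletion
        if inner & used:
            continue   # shares a relay with a route already kept
        chosen.append(route)
        used |= inner
    return chosen


def failover(routes, broken):
    """Return the first kept route that does not use the broken link (u, v)."""
    u, v = broken
    for route in routes:
        hops = set(zip(route, route[1:]))
        if (u, v) not in hops and (v, u) not in hops:
            return route
    return None


residual = {"A": 80, "B": 30, "C": 60, "D": 90, "E": 70}   # hypothetical energy levels
candidates = [["S", "A", "D", "T"], ["S", "C", "D", "T"], ["S", "E", "T"], ["S", "B", "T"]]
routes = disjoint_routes(candidates, residual, min_energy=40)
```

In this example, the route through C is dropped because it shares relay D with the first route. The route through B is dropped because B's residual energy is below the threshold. If link A-D breaks, `failover` returns the route through E.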
13) An Adaptive Fault Identification Protocol for an Emergency/Rescue-Based Wireless and Mobile Ad-Hoc Network (2007)
In this paper, the authors (Mourad Elhadef, Azzedine Boukerche, and Hisham Elkadiki) address the fault diagnosis problem in MANETs, i.e., the problem of identifying faulty hosts by fault-free ones. The diagnosis scheme considered is based on the comparison approach, where hosts transmit test tasks to their neighbors and the outcomes are compared. By comparing the received outcomes, fault-free hosts are able to diagnose the fault status of the network. The authors propose an adaptive distributed diagnosis algorithm that uses an adaptable spanning tree to disseminate the local diagnosis views throughout the ad-hoc network. The protocol allows all fault-free hosts to correctly identify all faulty ones, and it constitutes a viable addition to existing ...
The authors present a new adaptive fault identification protocol, called Adaptive-DSDP, for fixed-topology MANETs. The diagnosis is based on the comparison approach and achieves correct and complete fault identification. Adaptive-DSDP uses a spanning tree to disseminate the local diagnosis views gathered separately by the mobile hosts. The spanning tree is initially configured with the MANET and then adapted to any faulty situation that might affect any of its internal nodes. In future work, the authors are investigating dynamic fault identification solutions that can tolerate the occurrence of faults during the diagnosis session, as well as a self-diagnosis approach that would be more appropriate for sensor networks. Last but not least, they aim to develop a new adaptive failure detector that MANET applications or routing protocols can use to collect information on the fault status of the MANET.
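The comparison approach and the spanning-tree dissemination can be illustrated with a simplified Python sketch. It is not Adaptive-DSDP itself and rests on several stated assumptions:

- Each fault-free tester compares its own task outcome with each neighbor's outcome. The paper's scheme compares the outcomes it receives from neighbors.
- Only value faults are modelled.
- Only fault-free hosts contribute local views.
- The spanning tree is assumed to have already been adapted so that it spans only fault-free hosts.
- The final broadcast of the merged view back down the tree is omitted.

The helper names and the example topology are hypothetical.

```python
def run_task(host, task, faulty):
    """Hypothetical test task: fault-free hosts return the correct result,
    faulty hosts (value faults) return a wrong one."""
    result = sum(task)
    return result + 1 if host in faulty else result


def local_view(tester, neighbors, task, faulty):
    """Tester compares its own outcome with each neighbor's outcome;
    a mismatch marks the neighbor as faulty (True)."""
    own = run_task(tester, task, faulty)
    view = {tester: False}
    for n in neighbors[tester]:
        view[n] = run_task(n, task, faulty) != own
    return view


def gather(root, tree, views):
    """Convergecast along the spanning tree: merge all views in root's subtree."""
    merged = dict(views[root])
    for child in tree.get(root, []):
        merged.update(gather(child, tree, views))
    return merged


faulty = {3}                                   # ground truth, unknown to the hosts
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 4], 4: [2, 5], 5: [4, 3]}
tree = {0: [1, 2], 2: [4], 4: [5]}             # spanning tree over fault-free hosts
task = [4, 8, 15]
views = {h: local_view(h, neighbors, task, faulty) for h in neighbors}
merged = gather(0, tree, views)
diagnosed = {h for h, is_faulty in merged.items() if is_faulty}
```

Hosts 1 and 5 each detect the mismatch with host 3 locally. After the convergecast, the root holds a complete view of all six hosts, and `diagnosed` contains only host 3.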
14) Adaptive Fault Detection Approaches for Dynamic Mobile Networks (2011)
In this paper, the authors (Dingxiang Liu and Jamie Payton) observe that recent technological advancements have led to the popularity of mobile devices, which can dynamically form wireless networks. Unfortunately, mobile devices are vulnerable to failure because of various factors, including physical damage due to deployment in harsh environmental conditions, limited energy, and malicious attacks. Detecting node failure is an important problem that has been widely studied, and recent attention has focused on detecting failures when nodes are mobile. Detecting node failure requires additional messages to be sent across the network, which is costly in terms of energy consumption. The authors contend that fault detection algorithms should be designed with consideration of the tradeoffs between the cost and the accuracy of fault detection. They present two approaches to dynamically adapting a fault detection algorithm, compare these adaptive approaches with existing ones, and evaluate the tradeoffs between cost and accuracy.
A cluster-based probe-and-ack algorithm is used to illustrate how a) application-specific requirements can drive the adaptation of the rate at which failure detection probes are issued, and b) failure detection history can drive the adaptation of the interrogation period. Either approach can reduce network load and message overhead, which can extend the lifetime of the network.
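History-driven adaptation of the interrogation period can be sketched in Python. The summary does not give the authors' adaptation rule. The multiplicative-increase-and-reset policy below, along with the names `AdaptiveProber`, `min_period`, `max_period`, and `growth`, is a hypothetical illustration of the idea, not the paper's algorithm.

```python
class AdaptiveProber:
    """History-driven interrogation period for one cluster head (sketch).

    Hypothetical rule: each acknowledged probe multiplies a member's period by
    `growth` (up to `max_period`); a missed ack resets it to `min_period`.
    """

    def __init__(self, min_period=1.0, max_period=16.0, growth=2.0):
        self.min_period = min_period
        self.max_period = max_period
        self.growth = growth
        self.period = {}   # member -> current interrogation period

    def record(self, member, acked):
        """Update and return the period to wait before probing `member` again."""
        current = self.period.get(member, self.min_period)
        if acked:
            new = min(current * self.growth, self.max_period)
        else:
            new = self.min_period   # be vigilant again after a missed ack
        self.period[member] = new
        return new


prober = AdaptiveProber()
history = [prober.record("n1", ack) for ack in (True, True, True, True, True, False)]
```

After five acknowledged probes, the period for `n1` grows from 2 to the cap of 16 time units, so a stable member is probed far less often than with a fixed period of 1. A single missed ack resets the period to 1. The design trades message overhead for detection latency, which is the cost-accuracy tradeoff the authors emphasize.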
15) Survey of Modern Fault Diagnosis Methods in Networks (2012)
In this paper, the authors (Zijian Yang, Yong Wang, and Jiaguo Lv) note that fault diagnosis has been a focus of research activity in modern computer networks. The paper reviews the history of fault diagnosis in networks and discusses the main methods in the information gathering, information analysis, and diagnosis and resolution phases of network fault diagnosis. Emphasis is placed on knowledge-based methods, with a discussion of the advantages and shortcomings of the different methods. The survey concludes with a description of some open problems.
The survey focuses on the contributions to fault diagnosis in computer networks that the authors consider close to modern theory and likely to gain relevance for future research and practical applications. Fault diagnosis in networks has made great progress in common fault detection and localization. Each fault diagnosis method relies on one or more theories, which determine where the method can be applied.
16) Failure detection and consensus in the crash-recovery model (2000)
In this paper, the authors (Marcos Kawazoe Aguilera, Wei Chen, and Sam Toueg) study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. They first propose new failure detectors that are particularly suitable for the crash-recovery model. They next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, they give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice, namely those with no failures or failure detector mistakes. In such runs, consensus is achieved within 3δ time and with 4n messages, where δ is the maximum message delay and n is the number of processes in the system.
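The two bounds can be made concrete for a hypothetical system of n = 10 processes with a maximum message delay of δ = 50 ms. These values are illustrative and do not come from the paper. In a run with no failures and no failure detector mistakes:

```latex
\begin{aligned}
T &\le 3\delta = 3 \times 50\,\mathrm{ms} = 150\,\mathrm{ms},\\
M &= 4n = 4 \times 10 = 40 \text{ messages}.
\end{aligned}
```

The time bound does not depend on the number of processes, while the message count grows linearly with n.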
18) Towards Network Invariant Fault Diagnosis in MANETs via Statistical Modeling: The Global Strength of Local Weak Decisions (2012)
In this paper, the authors (Akshay Vashist, Rauf Izmailov, Kyriakos Manousakis, Ritu Chadha, C. Jason Chiang, and Constantin Serban) note that fault detection and localization is a well-studied problem in communication networks, as attested by the many techniques designed to address it. The inherent variability, limited component reliability, and constrained resources of MANETs (Mobile Ad hoc Networks) make the problem not just more important but critical. Practical development and deployment considerations imply that fault detection and localization methods must i) avoid relying on overly detailed models of network protocols and traffic assumptions and instead rely on actual cross-layer measurements/observations, and ii) be applicable across different network scales and topologies with minimal adjustment.
The paper examines the feasibility of such goals and proposes an important, as yet unexplored approach to fault management in MANETs: network-invariant fault detection, localization, and diagnosis with limited knowledge of the underlying network and traffic models. The authors show how fault management methods can be derived by observing statistical network/traffic measurements in one network and subsequently applied to other networks with satisfactory performance. They demonstrate that a carefully designed but widely applicable set of local and weak global indicators of faults can be efficiently aggregated to produce highly sensitive and specific methods that perform well when applied to MANETs with varying sizes, topologies, and traffic matrices.
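The central idea that many weak local indicators can be combined into a sensitive fault decision can be illustrated with a standard naive-Bayes (log-likelihood-ratio) fusion rule in Python. This technique was chosen for illustration and is not the aggregation method the authors describe. The indicator names and their detection rates are hypothetical. The weights are assumed to be estimated once on a training network and then reused unchanged on other networks, mirroring the network-invariance goal.

```python
from math import log


def indicator_weights(stats):
    """Turn per-indicator detection rates into log-likelihood-ratio weights.

    `stats` maps indicator name -> (true-positive rate, false-positive rate),
    assumed to be estimated once on a training network.
    """
    weights = {}
    for name, (tpr, fpr) in stats.items():
        raised = log(tpr / fpr)                 # evidence when the indicator fires
        silent = log((1 - tpr) / (1 - fpr))     # evidence when it stays quiet
        weights[name] = (raised, silent)
    return weights


def fault_score(observed, weights, prior_odds=1.0):
    """Naive-Bayes fusion: sum the log-likelihood ratios (assumes the indicators
    are conditionally independent). A positive score favors 'fault'."""
    score = log(prior_odds)
    for name, fired in observed.items():
        raised, silent = weights[name]
        score += raised if fired else silent
    return score


stats = {"high_retx": (0.60, 0.20), "route_churn": (0.55, 0.25), "queue_drops": (0.50, 0.30)}
weights = indicator_weights(stats)
all_fired = fault_score({name: True for name in stats}, weights)
none_fired = fault_score({name: False for name in stats}, weights)
```

Each indicator alone is weak (its true-positive rate is at most 0.6), but when all three fire the combined score is about 2.4, a likelihood ratio of roughly 11 in favor of a fault. When none fire, the score is about -1.5.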
Finding and Analysis
• To design and develop an adaptive distributed system-level diagnosis algorithm that identifies the fault status of the nodes in a cluster-based MANET, where nodes are subject to crash and value faults.
• To analyze and validate the performance of the proposed adaptive distributed diagnosis algorithm using MATLAB.
• To compare the proposed method with existing algorithms in terms of message and time complexity.
Introduction to MATLAB
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:
Math and computation
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including Graphical User Interface building
MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar, non-interactive language such as C or Fortran.
The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represent the state-of-the-art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.
MATLAB features a family of application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.
The MATLAB System
The MATLAB system consists of five main parts:
The MATLAB language.
This is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both "programming in the small" to rapidly create quick and dirty throw-away programs, and "programming in the large" to create complete large and complex application programs.
The MATLAB working environment.
This is the set of tools and facilities that users work with as MATLAB users or programmers. It includes facilities for managing the variables in the workspace and for importing and exporting data. It also includes tools for developing, managing, debugging, and profiling M-files, MATLAB's applications.
Handle Graphics.
This is the MATLAB graphics system. It includes high-level commands for two-dimensional and three-dimensional data visualization, image processing, animation, and presentation graphics. It also includes low-level commands that allow users to fully customize the appearance of graphics and to build complete Graphical User Interfaces for their MATLAB applications.
The MATLAB mathematical function library.
This is a vast collection of computational algorithms ranging from elementary functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
The MATLAB Application Program Interface (API).
This is a library that allows users to write C and Fortran programs that interact with MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling MATLAB as a computational engine, and reading and writing MAT-files.