The Adaptive Distributed Diagnosis In Mobile Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Mobile ad hoc networking allows portable mobile devices to establish communication path without having any centralized infrastructure. As there is no centralized infrastructure and the mobile devices are moving randomly, this gives rise to various kinds of problems such as routing and detecting faulty mobile nodes in the network. The nodes may fail because of battery discharge, crash or limitation in age. In this thesis, the problem of adaptive fault diagnosis in mobile ad hoc networks (MANETs) is considered. Fault diagnosis in Mobile Ad-hoc Networks (MANETs) is very challenging task.

In fact, fault-diagnosis becomes an important building block to establish dependability in MANET. An important problem in MANET is the distributed system-level diagnosis problem whose purpose is to have each fault-free mobile node to determine the state of all the mobile nodes in the system. The parameters such as diagnostic latency and message complexity are used for evaluating the proposed diagnosis algorithm. The result shows that diagnosis latency and message complexity is reduced as compared to non-clustering distributed diagnosis algorithm Forward Heartbeat proposed in [1]. Diagnosis algorithm should be efficient enough to find the status (either faulty or fault free) of each mobile in the network. The models in the literature are either for static fault or dynamic fault. Dynamic fault identification is more complex and difficult than static fault



We define some terms as follows before we use them in this document. A router or node is a basic hardware/software unit in the network that has the capability to forward packets based on its local routing table. Examples include Routers 1, 2, and 3 in Figure 2.1. A host is another basic unit in the network that may attach to or act as a router. It can be either the source or the sink of a data flow in the network. In the example shown in Figure 2.1, three computers, Host 1, Host 2, and Host 3, are three hosts in the example network. Network components refer to hosts and routers in a network. A host M is said to be a neighbor of another host, say N, if there is a bidirectional link between M and N. We can also say that hosts M and N are neighbors. For example, Router 3 and Host 3 in Figure 2.1 are neighbors. Router 1 and Router 2 are also neighbors. A route or path is a sequence of hosts connecting two end hosts or routers within which any two adjacent hosts in the route or path are neighbors. For example, a path between Host 1 and Host 2 contains Host1, Router 1, Router 2 and Host 2.

FIGURE 1.1: Illustration of some terms used in this document.

Wireless Networks

Like traditional wired networks, wireless networks are formed by routers and hosts. In a wireless network, the routers are responsible for forwarding packets in the network and hosts may be sources or sinks of data flows. The fundamental difference between wired and wireless networks is the way that the network components communicate. A wired network relies on physical cables to transfer data. In a wireless network, the communication between different network components can be either wired or wireless. Since wireless communication does not have the constraint of physical cables, it allows a certain freedom for the hosts and/or routers in the wireless network to move. This is one of the advantages of a wireless network. Network components in a wireless network communicate with others using wireless channels. Different radio frequency (RF) spectrum ranges are used in wireless networks, for example, 27.5-29.5 GHz for the Local Multipoint Distribution System (LMDS)[2] , 2.5-2.7 GHz for the Multipoint Multichannel Distribution System[3] , and 5.15-5.35 GHz and 2.4-2.58 GHz for IEEE 802.11a[4] and 802.11b[5] , respectively. Signal strength in a wireless medium decreases when the signal travels further.

When the signal travels beyond a certain distance, the strength reduces to the point where reception is not possible[6]. The distance that the signal travels when it reaches this point is called the radio range for this signal. To simplify the transmission model regarding this property, people assume that the wireless signal is strong enough for the receivers to receive the signal if the receivers are inside of the radio range. Otherwise, the receivers cannot receive the signal at all. The details of wireless communications are not in the scope of this dissertation. Refer to [6] for details.

Several medium access control (MAC) protocols are used in wireless networks to manage the use of the wireless medium. Examples include the Bluetooth MAC layer protocol[7] and IEEE 802.11MAClayer protocol[5] .The details of these MAC protocols are beyond our scope. Refer to [7] and [5] for more details.

Because radio range is usually limited and the network components may have some mobility, the topology of a wireless network can vary with time. According to the relative mobility of hosts and routers, there are three different types of wireless networks.

Fixed wireless network: Fixed hosts and routers use wireless channels to communicate with each other and form a fixed wireless network. An example is a wireless network formed by fixed network devices using directed antennas, as shown in Figure1.2.

Wireless network with fixed access points: Mobile hosts use wireless channels to communicate with fixed access points, which may act as routers for those mobile hosts, to form a mobile network with fixed access points. An example is a number of mobile laptop users in a building that access fixed access points, as illustrated in Figure 1.3.

Mobile ad hoc network

A mobile ad hoc network is formed by mobile hosts .Some of these mobile hosts are willing to forward packets for neighbours. Examples include vehicle-to-vehicle and ship-to-ship networks that communicate with each other by relying on peer-to-peer routings, as shown in Figure 1.4.

Generally speaking, traditional routing protocols that are used in wired networks can support routing in fixed wireless networks and mobile networks with fixed access points. Only one-hop routing is required over a link in a wireless network with fixed access points and many fixed wireless network. Routing in mobile ad hoc networks and some fixed wireless networks use multiple-hop routing. Routing protocols for this kind of wireless network should be able to maintain paths to other nodes and in most cases, must handle changes in paths due to mobility. Traditional routing cannot properly support routing in a MANET.

To simplify the description of a MANET, the MANET model is usually illustrated as shown in Figure 1.5. Nodes i, j, and k are mobile nodes in the network. The dashed circles shown in the figure imply the radio coverage areas of nodes. In wireless networks, node i can hear node j if i is within the radio range of j. Node i is a neighbor of node j if node j can also hear node i. This is called a bi-directional connection. Two nodes are disconnected if one node is not in the radio range of the other. For example, nodes j and k are disconnected in the figure.

The nodes in a MANET can usually move. Generally, the connection between two nodes is lost when a node moves out of another node's radio range. Thus, the network topology may change from time to time. We call this kind of MANET a truly mobile ad hoc network.

There is another kind of MANET besides the truly mobile ad hoc network. All nodes in this type of ad hoc network are fixed. The connectivity between components is relatively static. For example, a sensor network is typically a fixed ad hoc network . Network components in a sensor network are wireless sensors instead of general-purpose computers. They are often deployed in a large area. The sensors can collect information and route data back to a central processor or monitor. Those routes rely on intermediate sensor nodes. The positions of sensors are often fixed once they are in place. Therefore, the topology is relatively static. Since people often deploy sensors in a random way, for example, by scattering them from an airplane, the topology of a sensor network is often unpredictable. Moreover, the topology may also change when sensors lose power. Therefore, we consider the sensor network as a fixed ad hoc network. Many MANET issues and solutions do apply to fixed ad hoc networks.

FIGURE1.3: An example of fixed wireless networks.[8]

FIGURE1.4: An example of a wireless network with access points.[8]

FIGURE1.5: An example of a mobile ad hoc networks.[8]

FIGURE1.6: A simple network model for mobile ad hoc networks.[8]


MANET are new paradigm of networks, offering unrestricted mobility without any underlying infrastructure. Basically, an ad-hoc network is a collection of nodes communicating with each other by forming a multi-hop network. The characteristics[9] of ad-hoc network routing protocol are:

Dynamic topologies :

The topic refers to the most essential property of an ad hoc network. Nodes can move arbitrarily with respect to other nodes in the network.

Bandwidth-constrained :

Nodes in an ad hoc network are mobile. Thus, they are using radio links that have far lower capacity than hardwired links could use. In practice the realized throughput of a wireless network is less than a radio's theoretical maximum transmission rate.

Energy constrained operation:

Mobile nodes are likely to rely on batteries. That is why the primary design criteria may sometimes be energy conservation.

Limited physical security:

In general, radio networks are vulnerable to physical security threats compared to ¬xed networks. The possibility of eavesdropping, spoo¬ng and DoS attacks is higher. Existing link security techniques can be applied. However, a single point failure in an ad hoc network is not as crucial as in more centralized networks.


A mobile ad hoc network has following features[9]:

Autonomous Terminal

In MANET, each mobile terminal is an autonomous node, which may function as both a host and a router. In other, since there is no background network words, besides the basic processing ability as a host, the mobile nodes can also perform

switching functions as a router. So usually endpoints and switches are indistinguishable in MANET.

Distributed Operation

For the central control of the network operations, the control and management of the network is distributed among the terminals. The nodes involved in a MANET should collaborate amongst themselves and each node acts as a relay as needed, to implement functions e.g. security and routing.

Multi-hop Routing

Basic types of ad hoc routing algorithms can be single-hop and multi-hop, based on different link layer attributes and routing protocols. Single-hop MANET is simpler than multi hop in terms of structure and implementation, with the cost of lesser functionality and applicability. When delivering data packets from a source to its destination out of the direct wireless transmission range, the packets should be forwarded via one or more intermediate nodes.

Dynamic Network Topology

Since the nodes are mobile, the network topology may change rapidly and unpredictably and the connectivity among the terminals may vary with time. MANET should adapt to the traffic and propagation conditions as well as the mobility patterns of the mobile network nodes. The mobile nodes in the network dynamically establish routing among themselves as they move about, forming their own network on the fly.

Light Weight Terminal

In most cases, the MANET nodes are mobile devices with less CPU processing capability, small memory size, and low power storage. Such devices need optimized algorithms and mechanisms that implement the computing and communicating


Fluctuating link capacity

The nature of high bit-error rates of wireless connection might be more profound in a MANET. One end-to-end path can be shared by several sessions. The channel over which the terminals communicate is subject to noise, fading, and interference, and has less bandwidth than a wired network. In some scenarios, the path between any pair of users can traverse multiple wireless links and the link themselves can be heterogeneous.


With the increase of portable devices as well as progress in wireless communication, ad hoc networking is gaining importance with the increasing number of widespread applications. Ad hoc networking can be applied anywhere where there is little or no communication infrastructure or the existing infrastructure is expensive or inconvenient to use. Ad hoc networking allows the devices to maintain connections to the network as well as easily adding and removing devices to and from the network. The set of applications for MANETs[9] is diverse, ranging from large-scale, mobile, highly dynamic networks, to small, static networks that are constrained by power sources. Besides the legacy applications that move from traditional infrastructure environment into the ad hoc context, a great deal of new services can and will be generated for the new environment. It includes[9]:

Military battlefields

Military equipment now routinely contains some sort of computer equipment. Ad hoc networking would allow the military to take advantage of commonplace network technology to maintain an information network between the soldiers, vehicles, and military information head quarters. The basic techniques of ad hoc network came from this field.

Commercial Sector

Ad hoc can be used in emergency/rescue operations for disaster relief efforts, e.g. in fire, flood, or earthquake. Emergency rescue operations must take place where non-existing or damaged communications infrastructure and rapid deployment of a communication network is needed. Information is relayed from one rescue team member to another over a small handheld. Other commercial scenarios include e.g. ship-to-ship ad hoc mobile communication, law enforcement, etc.

Local level

Ad hoc networks can autonomously link an instant and temporary multimedia network using notebook computers or palmtop computers to spread and share information among participants at a e.g. conference or classroom. Another appropriate local level application might be in home networks where devices can communicate directly to exchange information. Similarly in other civilian environments like taxicab, sports stadium, boat and small aircraft, mobile ad hoc communications will have many applications.

Personal Area Network(PAN)

Network Short-range MANET can simplify the intercommunication between various mobile devices (such as a PDA, a laptop, and a cellular phone). Tedious wired cables are replaced with wireless connections. Such an ad hoc network can also extend the access to the Internet or other networks by mechanisms e.g. Wireless LAN (WLAN), GPRS, and UMTS. The PAN is potentially a promising application field of MANET in the future pervasive computing context.



In MANETs communication between nodes is done through the wireless medium. Because nodes are mobile and may join or leave the network, MANETs have a dynamic topology. Nodes that are in transmission range of each other are called neighbors. Neighbors can send directly to each other. However, when a node needs to send data to another non- neighboring node, the data is routed through a sequence of multiple hops, with intermediate nodes acting as routers. There are numerous issues to consider when deploying MANETs.[10] The following

are some of the main issues.

Unpredictability of environment

Ad hoc networks may be deployed in unknown terrains, hazardous conditions, and even hostile environments where tampering or the actual destruction of a node may be imminent. Depending on the environment, node failures may occur frequently.

Unreliability of wireless medium:

Communication through the wireless medium is unreliable and subject to errors. Also, due to varying environmental conditions such as high levels of electro-magnetic interference (EMI) or inclement weather, the quality of the wireless link may be unpredictable. Furthermore, in some applications, nodes may be resource-constrained and thus would not be able to support transport protocols necessary to ensure reliable communication on a lossy link. Thus, link quality may fluctuate in a MANET.

Resource-constrained nodes

Nodes in MANET are typically battery powered as well as limited in storage and processing capabilities. Also, they may be situated in areas where it is not possible to re-charge and thus have limited lifetimes. The available bandwidth of the wireless medium may also be limited because nodes may not be able to sacrifice the energy consumed by operating at full speed.

Dynamic topology

The topology in an ad hoc network may change constantly due to the mobility of nodes. As nodes move in and out of range of each other, some links break while new links between nodes are created.

2.2 Faults

Wireless Communication System is playing a great important role in information systems and its management is very important and essential. There are many managements such as configuration management, fault management, performance management, security management, accounting management and etc. Fault management is more important.

A node becomes faulty because of battery discharge, crash and limitation in age. The presence of faulty node affects the efficiency and throughput of the network, which makes the network inconsistent. Faulty nodes cannot communicate with the other mobiles or behave unexpectedly and send unexpected results. In this way it unnecessarly consumes energy and cause inconsistency.

2.2.1 Types of Faults

Each node in the system can be in one of two states faulty or fault-free. In order to exchange information, a node requires multiple hops in the MANETs due to restricted transmission power in the network. Due to issues like unpredictability of environment, unreliability of wireless medium, resource constrained nodes and dynamic topology, MANETs are susceptible to various types of faults.[11][12]

Based on the Duration- Based on duration faults can be of three types:

Transient fault: A transient fault can disappear without any visible event; it appears in a network for short time. The recovery of transient faults from system is addressed using repeated-round techniques. A probabilistic model used for the action of faulty periods, and a fault analysis is used to obtain the optimum retry period.

Intermittent fault: It is problematic type of transient fault; can't predict its appearance and disappearance in the network. An intermittent fault is occurred by several factors . These factors can only be identified when malfunction is occurred. Intermittent faults are difficult to identify and repair.

3. Permanent fault: Once it appears in network it remains until it removed and repaired by some external administrator. Permanent faults are simpler to deal.

Based on the Behavior-Based on behavior faults can be of two types:

Soft Fault: Soft faulted units can communicate with its neighbors but with unexpected behaviors and always give undesirable response.

Hard fault: Hard faulted units cannot communicate with its neighbors. It neither sends nor receives any information from the network.

Based on the Occurrence-Based on occurrence faults can be of two types:

Static fault: All faulty nodes be faulty from the starting of diagnosis session.The fault-free node can't be faulty during diagnosis session.

Dynamic fault: Fault-free node may become faulty during diagnosis session.It is hard to diagnosis because any node may fail after it diagnosed fault-free by any fault-free node.

Other faults

Another type of fault is Byzantine fault which fail the components of a system in arbitrary ways by processing requests incorrectly. It is of two types:

Omission failures: This type of failure doesn't response for a request, e.g.,crash, failing to receive a request, or failing to send a response.

Commission failures: This type of failure may respond in any unpredictable way, e.g., processing a request incorrectly, corrupting local state, and/or sending an incorrect or inconsistent response to a request.


With the fast development of mobile ad hoc networks (MANETs), fault diagnosis has become a critical need to guarantee robust service for various applications. Fault diagnosis put extra burden on the sensor node in addition with their normal task and it will also consume extra energy of the sensor nodes. Many techniques have been suggested to solve this problem, but they still cannot satisfy the special need of MANETs Fault identification is one of the important parts in many protocols. When any altered behavior is shown by system or nodes of the network, a diagnosis function is started to determine which node(s) has (have) shown abnormal behavior. This is termed as Diagnosis; diagnosis is classified based on the occurrence of fault. It is simply classified as static diagnosis and dynamic diagnosis.

In static diagnosis, the faults are not occurring during the diagnosis session.

In dynamic diagnosis, the faults can occur during the diagnosis session, which is difficult to handle because node can be faulty after it has been diagnosed as fault-free by other node.

The Heartbeat Approach:

The most common technique for monitoring crash faults is the heartbeat mechanism. The classical heartbeat approach [13] is based on the continuous monitoring of a node to know whether it is alive or not. Hence, each mobile transmits periodically an "I Am Alive" message to mobiles that monitor its state. After the expiration of a timeout, if a mobile u does not receive such a message from one of the neighbours , say v, it is in charge of detecting its failure, then it starts suspecting it as being faulty. Mobile u adds v to a list of suspected nodes, and will remove it from this list once it receives later v 's "I Am Alive" message. The heartbeat mechanism can be characterized by two main parameters:

• The heartbeat period Δ that denotes the time between two successive "I Am Alive" messages.

• The timeout delay Δto which refers to the time between the last reception of an "I Am Alive" message from v, and the time where u starts suspecting v, until u starts receiving again v's "I am Alive" messages.

The classical heartbeat approach suffers from two main weaknesses. The first one is that the detection time depends

on the last heartbeat. This weakness may have a negative impact on the accuracy of the failure detector since premature timeouts may occur. The second weakness is that it relies on a fixed timeout delay that does not take into account the network and system's load. That is, a node may be mistakenly suspected as faulty if it slows down due to heavy workload or if the network suffers from links failure that may delay the delivery of "I Am Alive" messages.


An important problem in designing dependable MANETS that are subject to the failure of mobile hosts is the distributed self diagnosis problem. In distributed self diagnosis each working (fault free) mobile host maintain correct information about the status (working or failed) of each mobile host in the entire MANET for some corrective actions. The existing distributed self diagnosis algorithms have been developed for wired networks assuming a centralized infrastructure which creates a bottleneck and single point of failures.


The field of distributed system-level diagnosis has flourished for years. Bianchini and Buskens introduced the Adaptive Distributed System level Diagnosis (Adaptive DSD) algorithm[14], and also its implementation in an Ethernet environment. Adaptive DSD has diagnosis latency of N testing rounds for a network of N nodes. Consider a system consisting of N units, which can be faulty or fault-free. The goal of system-level diagnosis is to determine the state of those units . For almost 30 years, researchers have worked on this problem. An adaptive approach, which requires fewer tests, is to assume that each unit is capable of testing any other, and to issue the tests adaptively, i.e., the choice of the next tests depends on the results of previous tests, and not on a fixed pattern. The Adaptive Distributed System-level Diagnosis algorithm, Adaptive-DSD is, at the same time, distributed and adaptive. Each node must be tested only one time per testing interval. All fault-free nodes achieve consistent diagnosis in at most N testing rounds. There is no limit on the number of faulty nodes for fault-free nodes to diagnose the system. Adaptive-DSD is executed at each node of the system at predefined testing intervals. Each time the algorithm is executed on a fault-free node, it performs tests on other nodes until it finds another fault-free node, or it runs out of nodes to test. A testing round is defined as the period of time in which all nodes of the system have executed Adaptive-DSD at least once. After one testing round, if there are at least two fault-free units, the testing graph has the format of a ring, as shown in Fig. In the example shown in Fig, node 1, node 4, and node 5 are faulty, and the rest are fault free. Node 0 tests node 1 and finds it faulty, so it goes on and tests node 2, which is fault-free, and then stops testing. Node 2 then tests node 3 as fault-free, and so on.

Each node i that executes the algorithm has an array, called TESTED-UPi, that contains N entries, indexed by the node identifier. The entry TESTED-UPi[k] = j means that node i has received diagnostic information from a

fault-free node specifying that node k has tested j to be fault-free. An entry TESTED-UPi[j] is "arbitrary" if node j is faulty. When node i finds node j to be fault-free, it saves this information in TESTED-UPi[i]. In the next testing round, this test data of i is taken by its first fault-free predecessor, and so on, until all nodes get the information. In this way, the diagnostic information in the TESTED-UP array is forwarded to nodes in the reverse direction of the testing network. Using the information in TESTED-UPi , a node i has to diagnose the state of all nodes in system; for this task, another algorithm, called Diagnose, is employed. Adaptive-DSD has a diagnosis latency of N testing rounds. It is desirable to reduce this latency.

2.6 Clustering

In mobile ad hoc network , clustering can be defined as a notional arrangement of the dynamic nodes into various groups. These virtual collections of nodes are grouped together regarding their relative transmission range proximity to each other that allows them to establish a bidirectional link. The diameter size of the clusters determines the control architectures as single-hop clustering and multi-hop (K-hop) clustering. In single-hop clustering every member node is never more than 1-hop from a central coordinator - the clusterhead. Thus all the member nodes remain at most two hops n distance away from each other within a logical cluster. In multi-hop clustering, the limitation or restriction of an immediate proximity to member nodes from the head is removed, allowing them to be present in serial k-hop distance to form a cluster.

Ordinary nodes (cluster member): As the name suggests, ordinary nodes do not perform any

other function beyond a normal node role. They are members of an exclusive cluster independent of neighbors residing in a different cluster.

Cluster Gateway Nodes: Is a node that works as the common or distributed access point for two clusterheads. When a node remains within the transmission range of two clusterheads.

Clusterhead nodes: for any efficient cluster (subsets of nodes in a network satisfying a

particular property) operation there must be a support or backbone to sustain all essential control

functions such as channel access, routing, calculation of the routes for longer-distance messages,

bandwidth allocation, forwarding inter-cluster packets, power control and virtual-circuit support. This support or backbone takes the form of connected clusterheads, in managerial role;

linked either directly or via gateway nodes and they will have the subordinate nodes of that cluster linked to them. Another function of clusterheads is internal node communication, to forward interclustermessages. To send a packet an ordinary node must first direct it to its 'superior' its directly connected clusterhead.





2.7 Methods of Fault Diagnosis

Several diagnosis methods have been adopted based either on invalidation models, such as the PMC model, or comparison models and the generalized comparison model[15]. The comparison model is most promising approach in which a set of task is assigned to nodes and outcomes are compared with their neighbour's outcomes. Various generalized comparison approach have been used. In this approach the comparison is done by the nodes themselves. The generalized comparison outcomes can be summerized as follows. If the tester and the tested nodes are fault-free, the comparison outcome is 0. If at least one of the tested nodes is faulty and the tester node is fault-free comparison outcome is 1. If the tester node is faulty, the comparison result is unpredictable (0 or 1).


Xu (2001) et al. [17] proposed a new algorithm for routing in mobile survivable networks, based on the combination of position-based routing concepts and fault tolerant routing techniques in computer networks. By using combination of these two concepts, they employed a simplified way of localizing routing overhead while at the same time they improved the operational effectiveness of the position-based routing approaches by alleviating some of the drawbacks associated with them, such as routing deadlock occurrences, and therefore creating a robust and fault tolerant routing strategy.

The algorithm proposed here was based on the Position guided Sliding-window Routing (PSR) protocol. This protocol provides a single-tier routing organization scheme by employing a simplified way of localizing routing overhead. In this paper they enhance this approach by adding an additional level of hierarchy (on the cluster level which is of much smaller scale) in order to improve the operational effectiveness of this scheme and alleviate some of the drawbacks associated with the position-based protocols (such as routing deadlock occurrences).

To overcome the drawback Deadlock and loop that is inherent to the position based routing schemes, gateways are used as intermediate hops along the path to the destination. When a packet arrives to a gateway, some calculations is performed at the gateway to see if there exists a path between the local node to another gateway of local cluster that is closer to the destination. If deadlock/loop is found during this operation, it is better to request the gateway of the previous cluster to change the path to another cluster. The grid-clustered PSR is used to avoid deadlock/loop creation.

In this the problem of providing scalable fault tolerant routing in large-scale mobile wireless networks is considered. A new algorithm for routing in mobile survivable networks, based on the combination of position-based routing concepts and fault-tolerant routing techniques in computer networks. By using combination of these two concepts, achieve to employ a simplified way of localizing routing overhead while at the same time improve the operational effectiveness of the position-based routing approaches by alleviating some of the drawbacks associated with them, such as routing deadlock occurrences, and therefore creating a robust and fault tolerant routing strategy. It should be noted that although adopt the concept of cluster creation, avoid the use of cluster-heads and thus achieve high level of fault-tolerance. Also, because of the introduction of cluster level, less service information is required to be transmitted, thus it is easier to use more elaborate adaptive routing techniques, further improving network performance. Performing a complete and in-depth comparative evaluation of the basic PSR approach with the grid-clustered PSR scheme, in terms of many performance parameters such as, average packet delivery time, maximal throughput, number of hops, packet delivery ratio, etc.

Duarte and Nanya(1998)[14] Considered a system composed of N nodes that can be faulty or fault-free. The purpose of distributed system-level diagnosis is to have each fault-free node determine the state of all nodes of the system. This paper presents a Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm, which is a fully distributed algorithm that allows every fault-free node to achieve diagnosis in, at most, (log2 N)2 testing rounds. Nodes are mapped into progressively larger logical clusters, so that tests are run in a hierarchical fashion. Each node executes its tests independently of the other nodes, i.e., tests are run asynchronously. All the information that nodes exchange is diagnostic information. The algorithm assumes no link faults, a fully-connected network and imposes no bounds on the number of faults. Both the worst-case diagnosis latency and correctness of the algorithm are formally proved. As an example application, the algorithm was implemented on a 37-node Ethernet LAN, integrated to a network management system based on SNMP (Simple Network Management Protocol). Experimental results of fault and repair diagnosis are presented. This implementation by itself is also a significant contribution, for, although fault management is a key functional area of network management systems, currently deployed applications often implement only rudimentary diagnosis mechanisms.

Hi-ADSD maps nodes to clusters and uses a divide-and-conquer testing strategy to achieve diagnosis in, at most, log2 N testing rounds. In this way, Hi-ADSD improves the diagnosis latency of previous algorithms, while keeping the number of tests conveniently low. The correctness and worst-case latency of the algorithm were formally proven. Hi-ADSD was implemented, integrated to an SNMP based network management system on a 37-node Ethernet LAN. As SNMP applications are currently widely deployed, but fault management is still based on rudimentary procedures, this implementation by itself is also a significant contribution to the field of network management.

Moallemi and Moghaddam (2007) [16] described that Resource reservation and mutual exclusion are challenging problems in mobile ad-hoc networks(MANET). Due to the dynamic characteristics of nodes in these networks, yet, few algorithms have been proposed. The other problem in these networks is link or node failure due to many reasons (e.g. running out of battery, hardware software crash, getting out of transmission range due to high mobility). Thus fault tolerance for these algorithms is another necessity which hasn't been completely accomplished. In this paper author proposed an algorithm which is completely fault tolerant (covers temporary and permanent faults). It also has the mutual exclusion property for critical resource reservations. The proposed algorithm uses three recovery processes to maintain the stable state for whole system.

The proposed algorithm builds a fault tolerance for Naimi-Trehel algorithm and distributed it independently on different clusters. They have used the hierarchical structure and the proposed algorithm supports fault tolerance capability for both temporary and permanent failures. It has three recovery processes to recover failures. They have three internal cluster broadcasts and one overall broadcast for recovering a failure in recovery processes R1 and R2. The proposed algorithm preserves the order of the token requests after the failure and can cover N-1 failures of N nodes. The proposed algorithm's Safety and Liveness properties to show its integrity.

Khilar(2008) et al.[18] proposed a scalable failure detection service for large scale ad hoc networks using an efficient cluster based communication architecture. Their failure detection service adapts the detection parameter to the current load of the wireless ad-hoc network. The proposed approach uses a heartbeat based testing mechanism to detect failure in each cluster and take the advantage of cluster based architecture to forward the failure report to other cluster and their respective members.

In This paper the Clustering concept is used. The failure detection algorithm coupled with suitable clustering algorithm make a very efficient failure detection service for wireless ad-hoc networks. Clustering divides whole network into two level communication architecture namely intra-cluster and inter-cluster. Two types of message overheads are required to maintain such as intra-cluster and inter-cluster. The disadvantage of the clustering approach is that CH itself may fail, hence it becomes necessary that the presence of leader is also need to be monitored and in case of its failure another node takes over the CH. Author use the concept of deputy clusterhead or backup cluster head to solve this problem. They will choose this member as deputy clusterhead (DCH), who can monitor the leader as follows: (i) After every heartbeat interval, CH node sends a packet to the backup clusterhead, (ii) The packet contains information about each nodes in the group and its arrival indicates that the CH is up and running, (iii) The deputy clusterhead (DCH) updates its database using data obtained from this packet, (iv) In case of absence of this packet indicating that the primary CH has failed, DCH assumes the role of the leader, (v) This change is multicast to the cluster members who update their database in order to change the communication path of the heartbeat messages, and (vi) The same is multicast to the other CHs through GWs who multicast it to their respective members.

A scalable failure detection service for large scale ad hoc networks using an efficient cluster based communication architecture. Failure detection service adapts the detection parameter to the current load of the wireless ad hoc network. The proposed approach uses a heartbeat based testing mechanism to detect failure in each cluster and take the advantage of cluster based architecture to forward the failure report to other cluster and their respective members. The simulation results show that this approach is linearly scalable in terms of message complexity and consensus time.

Yadav and Khilar (2010) [21] described that Ad hoc networking allows portable mobile devices to establish communication path without having any central infrastructure. As there is no centralized infrastructure and the mobile devices are moving randomly, this gives rise to various kinds of problems such as routing and detecting faulty mobile nodes in the network. In this paper, the problem of fault diagnosis in mobile ad hoc networks (MANETs) is considered. In fact, fault-diagnosis becomes important building block to establish dependability in MANET. An important problem in MANET is the distributed system-level diagnosis problem whose purpose is to have each fault-free mobile node to determine the state of all the mobile nodes assuming a MANET composed of N nodes that can be faulty or fault-free. This paper uses a hierarchical clustering approach proposed by authors Durate and Nanya for diagnosing nodes in mobile ad hoc networks (MANETs). The proposed diagnosis algorithm is linearly scalable under the assumption that the mobiles may be: (i) crash faulty due to out of range or physical damage and (ii) value faulty due to sending erroneous messages while operating in the field. The generic parameters such as diagnostic latency and message complexity are used for evaluating the proposed diagnosis algorithm. In this paper, author proposed a hierarchically adaptive distributed diagnosis algorithm for diagnosing crash and value faulty nodes in MANET based on Hi-ADSD. Hi-ADSD maps nodes to cluster and uses a divide-and-conquer testing strategy to achieve diagnosis.

Chessa and Santi(2011)[20] considered the problem of fault identification in ad-hoc networks. They presented a new comparison-based diagnostic model based on the one-to many communication paradigm which takes advantage of the shared nature of communication typical of multi-hop packet radio networks. They presented two implementations of the model. The first implementation assumes that the network topology is fixed. Under this scenario, hard faults can be detected using a timeout, and efficient diagnosis protocols can be easily designed. If the fixed topology assumption is released, thus taking into account a relevant feature of ad-hoc networks, the .diagnostic efficiency of the model decreases notably: hard-faults cannot be detected and fault-free nodes are no longer guaranteed to correctly diagnose all their neighbors within a certain time. This indicates that the design of diagnosis protocols ensuring correct diagnosis within a limited time could turn out to be very difficult under this scenario. In their opinion achieving correct diagnosis in the traditional sense (i.e., all the fault-free units of the system correctly diagnose the state of any other unit in finite time) in mobile systems be extremely hard, unless some restrictions on the mobility of the units are imposed. The identification of a .minimal set of restrictions ensuring a somewhat weaker notion of correct diagnosis is matter of ongoing research.

The problem of identifying faulty mobiles in ad-hoc networks is considered. Current diagnostic models were designed for wired networks, thus they do not take advantage of the shared nature of communication typical of ad-hoc networks. In this paper we introduce a new comparison-based diagnostic model based on the one-to many communication paradigm. Two implementations of the model are presented. In the first implementation, we assume that the network topology does not change during diagnosis, and we show that both hard and soft faults can be easily detected. Based on this implementation, a diagnosis protocol is presented. The evaluation of the communication and time complexity of the protocol indicates that efficient diagnosis protocols for ad-hoc networks based on our model can be designed. In the second implementation we allow the system topology to change during diagnosis. As expected, the ability of diagnosing faults under this scenario is significantly reduced with respect to the stationary case.

Rangarajan and Dahbura (2005)[19]- described a distributed algorithm for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly-repaired processor joins the network, this new information is disseminated in parallel throughout the network. It is formally proven that the algorithm is correct; and it is also shown that the algorithm is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.

They presented a distributed algorithm for fault diagnosis that uses parallel dissemination of fault event information to minimize the information latency in the network. Simulation of this algorithm using the process-oriented simulation language CSIM shows that parallelizing the dissemination stage also allows for nodes that are local to the event to, in general, learn about the event before more distant nodes. Further, in their algorithm, a newly repaired node can rejoin the system without relying on other nodes to first detect that it has been repaired; equivalently, faulty nodes do not have to be periodically tested. The algorithm provides an option through which dead messages can be removed at the cost of increasing the information latency; that is, a tradeoff can be made between message overhead and latency.

Elhadef and Boukerche(2007)[22] Dependable mobile ad-hoc networks are being designed to provide reliable and continuous service despite the failure of some of their components. One of the basic building blocks that has been identified for such fault tolerant systems is the failure detection service which aims at providing some information on which hosts have crashed. In this paper, we present a new implementation of a failure detection service for wireless ad-hoc and sensor systems that is based on an adaptation of a gossip-style failure detection protocol and the heartbeat failure detector. We show that our failure detector is eventually perfect−That is, it satisfies both properties: strong completeness and eventual strong accuracy. Strong completeness means that there is a time after which every faulty mobile is permanently suspected by every fault-free host. While, eventual strong accuracy refers to the fact that no host will be suspected before it crashes.

The proposed failure detector is a variant of the heartbeat failure detector and allows each host to maintain a list of hosts it currently suspects of having crashed. The main characteristics of their failure detector is that it is adaptable and dynamic, that is, it adapts the freshness points to the current network or hosts' load. The distributed failure detection service can be used by distributed applications directly, or support other middleware services such as system management, load balancing and group communication and membership services. As such, failure detection is a valuable extension to current dependable services that a wireless environment is expected to provide. In future investigations, we are planning to conduct extensive simulations of the proposed failure detector for various MANETs. They are also looking at a more adaptable failure detection service that would be more scalable for large MANETs. They are also investigating the QoS of adaptive failure detection service.

Bharath et al. (2011)[24] described that reliable distributed systems provide high availability for an important class of applications through a combination of software and hardware support. Redundancy and replication are essential features of these systems but both come with a high cost. One trend that promises to provide more intelligence to the allocation of resources in this environment is adaptation. Adaptive fault tolerance is the idea of adaptively configuring system resources to respond to environmental changes (i.e. faults). They presented an overview of several adaptive fault tolerant systems, and describes the challenges involved in their implementation .They presented a unified model highlighting fundamental components in the design of an adaptive fault tolerant system. They used our model to describe a selection of recent representative systems and expose the design decisions made during their construction. Adaptive fault tolerance can increase availability, reliability and decrease cost in a distributed computing environment. Present-day AFT systems are mature in their use of redundancy, communication and synchronization but to further the goal of reliability other directions need to be explored. Environment awareness and other proactive measures are features of AFT that they believe future systems will attempt to leverage.

Aguilera(1998) et al.[21] considered the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. They firstly proposed new failure detectors that are particularly suitable to the crash-recovery model. They next determined under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, They gave two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice - those with no failures or failure detector mistakes. In such runs, consensus is achieved within 3_ time and with 4n messages, where _ is the maximum message delay and n is the number of processes in the system

Mueller(2004) et al. [10] described that Mobile ad hoc networks (MANETs) consist of a collection of wireless mobile nodes which dynamically exchange data among themselves without the reliance on a fixed base station or a wired backbone network. MANET nodes are typically distinguished by their limited power, processing, and memory resources as well as high degree of mobility. In such networks, the wireless mobile nodes may dynamically enter the network as well as leave the network. Due to the limited transmission range of wireless network nodes, multiple hops are usually needed for a node to exchange information with any other node in the network. Thus routing is a crucial issue to the design of a MANET. The authors specifically examined the issues of multipath routing in MANETs. Multipath routing allows the establishment of multiple paths between a single source and single destination node. It is typically proposed in order to increase the reliability of data transmission (i.e., fault tolerance) or to provide load balancing. Load balancing is of special importance in MANETs because of the limited bandwidth between the nodes. They also discussed the application of multipath routing to support application constraints such as reliability, load-balancing, energy-conservation, and Quality-of-Service (QoS) multipath routing in ad hoc networks mostly in terms of the network layer. They didn't mentioned about the interaction of multipath routing with the transport layer, in particular, TCP. The main issue that must be dealt with at the transport layer is the arrival of out-of order packets when multiple paths are used in a round-robin fashion. In TCP, out-of-order packets are assumed to signal congestion in the network, at which point TCP reduces its window size. This can have a detrimental effect on the overall throughput seen by TCP connections. Therefore, the implementation of a TCP-friendly multipath protocol is necessary. They discussed CHAMP, which uses equal length paths to reduce out-of-order packets. However, finding only equal length paths puts a restriction on the number of paths you can find. If unequal paths are chosen, there could also be ways to perform intelligent traffic allocation depending on path lengths and path delays such that out-of-order packets are minimized. For instance, sending later packets over shorter paths and earlier packets over longer paths may result in reduced out-of-order packets at the receiver. This implies intelligently sending packets out-of-order such that they arrive in-order at the receiver.

In their discussion of using multipath routing to support QoS, most of the protocols proposed only provide QoS in terms of specific metrics, such as bandwidth, delay, or reliability. However, it may be necessary to develop mechanisms to support QoS in terms of multiple metrics. For instance, when searching for multiple paths that have the required bandwidth, it is desirable to find reliable paths. Given the faulty nature of MANETs, constructing a multipath route that meets the bandwidth requirements while also meeting certain reliability requirements would result in better performance. Also, the mechanisms proposed for supporting QoS in terms of delay only attempt to minimize or improve on the delay. It would be desirable to develop a multipath protocol that can provide delay bounds or guarantees, which are required by some real-time applications. Using multipath routing to provide adaptive QoS using source coding is also

a promising technique that can be expanded upon for applications other than video.

Vanaja and Umarani (2012) [25] stated that the increasing popularity in wireless communication devices and the advancements in wireless technology make the communication in an effective and efficient manner. Mobile ad hoc Network (MANET) is a kind of wireless network, having collection of mobile nodes communicated through wireless links without using any infrastructure. Routing Protocols are necessary for forwarding of data packets to have effective communication. The performance of MANET routing protocols degrade the network performance when there is a link break. This paper mainly deals with the fault management to resolve the mobility induced link break. The proposed protocol is the adaptive fault tolerant multipath routing (AFTMR) protocol which reduces the packet loss due to mobility induced link break. In this fault tolerant protocol, battery power and residual energy are taken into account to determine multiple disjoint routes to every active destination. When there is link break in the existing path, AFTMR initiates Local Route Recovery Process. Network Simulator NS-2 is used for implementation and performance analyzed using the quantitative metrics such as packet delivery ratio, end to end delay, control overhead, throughput and packet drop. Simulation results show that the proposed protocol achieves better throughput, packet delivery ratio with reduced delay, packet drop and energy.

13)An Adaptive Fault Identification Protocol for an Emergency/Rescue-Based Wireless and Mobile Ad-Hoc Network(2007)

Boukerche(2007) et al.[27] considered the fault diagnosis problem in MANETs, i.e. the problem of identifying faulty hosts by fault-free ones. The diagnosis scheme that we consider is that based on the comparison approach, where hosts transmit test tasks to their neighbors and the outcomes are compared. By comparing the received outcomes fault-free hosts are able to diagnose the fault status of the network. They proposed an adaptive distributed diagnosis algorithm that uses an adaptable spanning tree to disseminate the local diagnosis views throughout the ad-hoc network. The protocol allows all fault-free hosts to correctly identify all faulty ones, and it constitutes a viable addition to existing self-diagnosis protocols, new adaptive fault identification protocol, called Adaptive-DSDP, for fixed topology MANETs. The diagnosis is based on the comparison approach and accomplishes a correct and complete fault identification. Adaptive-DSDP uses a spanning tree in order to disseminate the local diagnosis views gathered separately by the mobiles. The spanning tree is initially configured with the MANET, and then adapted to any faulty situation that might affect any of its internal nodes.

Liu and Payton(2011)[26] described that recent technological advancements have led to the popularity of mobile devices, which can dynamically form wireless networks. Unfortunately, mobile devices are vulnerable to failure because of various factors, including physical damage due to deployment in harsh environmental conditions, limited energy, and malicious attacks. Detecting node failure is an important problem that has been widely studied; recent attention has focused on determining failure when nodes are mobile. Detection of node failure requires additional messages to be sent across the network, which is costly in terms of energy consumption. The authors contended that fault detection algorithms should be designed with consideration of the tradeoffs between cost and accuracy of fault detection. In this paper, they presented two approaches to dynamically adapting a fault detection algorithm. They compared their adaptive approaches to existing approaches and evaluate the tradeoffs between cost and accuracy.

Two approaches to adapting fault detection in dynamic mobile networks. They used a cluster based probe-and-ack algorithm to illustrate how a) application specific requirements can be used to drive the adaptation of the rate at which failure detection probes are issued and b) how failure detection history can be used to drive adaptation of the interrogation period. The use of either of these approaches can result in the reduction of network load and message overhead, which can extend the lifetime of the network.

Yan and Lv [28](2012)described that in modern computer networks, fault diagnosis has been a focus of research activity. This paper described the history of fault diagnosis in networks and discusses the main methods in information gathering section, information analyzing section and diagnosing and revolving section of fault diagnosis in networks. Emphasis were placed upon knowledge-based methods with discussing the advantages and shortcomings of the different methods. The survey was concluded with a description of some open problems.

Modern fault Diagnosis methods in computer networks, focuses on the contributions which we think close to the modern theory and may gain some relevance for the future research and practical applications. fault diagnosis in networks has made great progress in common fault detecting and localization. Each method of fault diagnosis in networks relies on one or more theories, which determinates the application of method.

Vashist et al. [29](2012) described that fault detection and localization is a well-studied problem in communication networks, as attested by the many techniques designed to address this problem. The inherent variability, limited component reliability, and constrained resources of MANETs (Mobile Ad hoc Networks) make the problem not just more important, but also critical. Practical development and deployment considerations imply that fault detection and localization methods must i) avoid relying on overly detailed models of network protocols and traffic assumptions and instead rely on actual cross-layer measurements/observations, and ii) be applicable across different network scales and topologies with minimum adjustments.

The authors proposed an important and as yet unexplored approach to fault management in MANETs: network-invariant fault detection, localization and diagnosis with limited knowledge of the underlying network and traffic models. They showed how fault management methods can be derived by observing statistical network/traffic measurements in one network, and subsequently applied to other networks with satisfactory performance. The authors demonstrated that a carefully designed but widely applicable set of local and weak global indicators of faults can be efficiently aggregated to produce highly sensitive and specific methods that perform well when applied to MANETs with varying sizes, topologies, and traffic matrices.


Finding and Analysis


To design and develop a adaptive distributed system level diagnosis algorithm for identifying the fault status of various nodes in cluster where nodes are subjected to crash and value faulted nodes MANET based on adaptive distributed diagnosis algorithm

• To analyze and validate the performance of the proposed adaptive distributed diagnosis algorithm using mat lab.

• To compare the proposed method with the existing algorithms based on message and time complexity.

Tool used

Introduction to MATLAB

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:

Math and computation

Algorithm development

Modeling, simulation, and prototyping

Data analysis, exploration, and visualization

Scientific and engineering graphics

Application development, including Graphical User Interface building

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non interactive language such as C or Fortran.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represent the state-of-the-art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.

MATLAB features a family of application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

The MATLAB System

The MATLAB system consists of five main parts:

The MATLAB language.

This is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both "programming in the small" to rapidly create quick and dirty throw-away programs, and "programming in the large" to create complete large and complex application programs.

The MATLAB working environment.

This is the set of tools and facilities that user work with as the MATLAB user or programmer. It includes facilities for managing the variables in their workspace and importing and exporting data. It also includes tools for developing, managing, debugging, and profiling M-files, MATLAB's applications.

Handle Graphics.

This is the MATLAB graphics system. It includes high-level commands for two-dimensional and three-dimensional data visualization, image processing, animation, and presentation graphics. It also includes low-level commands that allow user to fully customize the appearance of graphics as well as to build complete Graphical User Interfaces on their MATLAB applications.

The MATLAB mathematical function library.

This is a vast collection of computational algorithms ranging from elementary functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix inverse, matrix Eigen values, Bessel functions, and fast Fourier transforms.

The MATLAB Application Program Interface (API).

This is a library that allows user to write C and Fortran programs that interact with MATLAB. It include facilities for calling routines from MATLAB (dynamic linking), calling MATL