Coding Scheme For Data Aggregation Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

An energy efficient secured cluster-reconfiguration scheme in data aggregation is proposed where data aggregation is performed by reducing the data redundancy using Slepian-wolf coding technique and security is provided by an energy efficient encryption algorithm. This scheme is mainly intended for reducing the data redundancy in highly correlated data sensed by the sensor nodes based on the assumption that each sensor node has knowledge of the correlation structure of the network a priori and thereby increasing the overall network lifetime by using less energy. However, this encoding scheme is not tolerant to loss of data at several nodes that are dependent on a single node failure and therefore the proposed scheme deals with formation of clusters and an efficient cluster head reconfiguration scheme applicable to the slepian-wolf coding technique. Security is also an important aspect in data aggregation, therefore an efficient encryption technique for the encoded data is applied to provide confidentiality to the aggregated data.

Wireless sensor network (WSN) is a collection of sensor nodes which achieve mutual communication and sense environmental change. A number of sensor nodes are densely deployed in the field of interest to observe the phenomena at different points in the field, which are sent to a data sink, located either at the center or out of the field, for further processing [1]. Usually, the observed phenomena are highly correlated, which leads to considerable data redundancy in the network [2].

The main issues in a wireless sensor network include the energy constraints of a sensor node and limited bandwidth. To increase energy efficiency in data transmission, it is desirable to reduce such data redundancy and for this purpose data aggregation has been widely used. On the other hand, data security is an important issue for many WSN applications. To provide data security, the conventional way is to encrypt the sensed data at each sensor node and then decrypt the data at the data sink(s). However, a network-wide encryption would cause considerable computational, communication, and storage overhead due to data encryption and key management operations. Hence, an energy efficient technique for WSN is required.

In most of the WSN applications, the sensors are densely deployed due to which there exists correlation in the data being sensed by the sensor nodes. The observed phenomenon by the sensor nodes is usually a spatially dependent, continuous process, in which the observed data have a certain spatial correlation. In general, the degree of spatial correlation in the data increases with decrease in distance between the sensor nodes. Therefore, spatially proximal sensor observations are highly correlated, which leads to considerable data redundancy in the network [2]. To efficiently use network resources and increase energy efficiency in data transmission, it is highly desirable to remove such redundancy through effective data-aggregation techniques.

Data aggregation is the process of summarizing and combining sensor data in order to remove data redundancy so that the amount of data transmission is reduced in the network. This results in prolonging the overall average lifetime of the network and minimizes energy consumption. Data aggregation along with data compression is necessary for WSN applications which have large amount of data to send across the network.

Slepian-Wolf coding [3], [4] is a distributed source coding technique that can completely remove data redundancy with no need for inter-sensor communication and is therefore a promising technique for data aggregation in a WSN. This technique is based on the assumption that each sensor node has knowledge of the correlation structure of the network a priori, which depends on the distances between sensor nodes and the characteristics of the observed phenomenon [4]. However, globally applying Slepian-Wolf coding to the whole network is difficult, because each sensor node needs to know the global correlation structure to encode its own data, which would incur significant additional costs. Moreover, Slepian-Wolf coding is not tolerant to communication failures, because the data from one node may affect the decoding of the data from other nodes. For these reasons, Slepian-Wolf coding is usually not suitable for global application in a large network. In a cluster-based network, however, each cluster covers a smaller number of sensor nodes within a smaller local range of the network. This makes it more feasible to locally apply Slepian-Wolf coding within each cluster as, in this case, a sensor node only needs knowledge of a local correlation structure to perform encoding. Thus, an efficient clustering algorithm has to be employed to form clusters and select the cluster heads efficiently.

Since the data from one node may affect the decoding of data from other nodes, this slepian-wolf coding technique is not tolerant to node failures due to which the data at the other nodes are also lost. Hence, a cluster reconfiguration mechanism is required to select a new cluster head in case of a cluster head failure. The case of a node (a cluster member) failure is also considered as the data from the failed node also may affect the decoding of data from other nodes within a cluster. Security is also an important aspect of data aggregation where data which is aggregated has to be sent confidentially, but use of encryption techniques may degrade the performance of the sensor network as it requires high energy and computational power. Therefore an energy efficient encryption technique has to be employed such that it encrypts the slepian-wolf coded data of the sensor nodes efficiently.

In this paper, the proposal is to apply slepian-wolf coding technique to eliminate data redundancy as given in [5]. A cluster formation protocol applicable to slepian-wolf coding was proposed in [5] but it does not address the problem of data loss in case of cluster head failure and other node failures within a cluster. Thus, an efficient cluster reconfiguration scheme is proposed. In addition to that, in this coding technique all the readings of the cluster members depend on the data of the cluster head. Therefore, an energy efficient encryption technique applicable for wireless sensor networks is applied for the cluster head data so that the data confidentiality within the entire network is ensured.

The rest of this paper is organized as follows. In section II the related work is given and in Section III deals with the description of slepian-wolf coding technique. Section IV gives the description of our proposed scheme. Section V contains performance evaluation and Section VI concludes with possible future work in this area.

2. Related work

Several mechanisms have been proposed about cluster head election in mobile ad hoc sensor networks. Two of most used heuristics in mobile ad hoc networks are Highest-Degree heuristic and Lowest-Id heuristic [6]. Highest-Degree heuristic chooses node with maximum number of neighbors as cluster head while Lowest-Id heuristic assigns each node with a unique id and chooses node with minimum id as cluster head. The generic weight-based clustering algorithms [12, 13] associate each sensor with some weight. WCA [7] and DCA [11] are different from [2] and [3] in that they not only considers signal factor, but also combines weight with various parameters such as battery power, node degree and mobility. In wireless sensor networks, the most popular clustering based routing protocol is LEACH [8] which uses random rotation of cluster head to distribute the energy load among nodes in system.

In case of distributed source coding techniques, a detailed review of the use of distributed source coding technique in wireless sensor networks is given in [16]. A practical distributed scheme called DISCUS (distributed source coding using syndromes) [9] has been proposed. The problems pertaining to the practical use of slepian-wolf coding technique is discussed in [10].

Several security encryption algorithms have been proposed mainly for wireless sensor networks. A detailed survey of several secure data aggregation algorithms have been discussed in [14] [15] presents an overview of end-to-end encryption solutions for convergecast traffic in wireless sensor networks that support in-network processing at forwarding intermediate nodes.

The most relevant protocols to our proposed security protocol for wireless sensor networks are SPINS, SNAKE and LEAP key distribution protocols. In SPINS protocol, the base station (Key Server) assigns a unique key to each session for communication between any pair of nodes. SNAKE is a protocol that can negotiate the session key in an ad-hoc way. Nodes do not need a key server to perform key management.

LEAP [19] (Localized Encryption and Authentication Protocol) is a deterministic security key management protocol that reduces the security impact of a node compromise and limits the impact to a region within the immediate network neighborhood of the compromised node. The design of the protocol is designed based on the observation that a single keying mechanism is not suitable as there are different security requirements for different types of messages exchanged between sensor nodes at different levels. Hence, LEAP supports the establishment of four types of keys for each sensor node: an individual key shared with the base station, a pair-wise key shared with another sensor node, a cluster key shared with multiple neighboring nodes, and a group key that is shared by all the nodes in the network.

3. Slepian-wolf coding technique

Slepian-Wolf coding [3] is a distributed source-coding technique that can completely remove data redundancy with no need for inter-sensor communication and is therefore a promising technique for data aggregation in a WSN. This technique is based on the assumption that each sensor node has knowledge of the correlation structure of the network a priori, which depends on the distances between sensor nodes [3] [4].

Consider a network consisting of N sensor nodes uniformly distributed in a region of interest, where each node i produces reading, and all the readings constitute a set of jointly ergodic sources denoted by with distribution , which corresponds to the spatial correlation structure known by each node a priori. To obtain the spatial correlation structure, different estimation approaches can be employed. According to the Slepian-Wolf Theorem, the nodes can jointly encode their data without inter-node communication, with a rate (in bits) that is lower bounded by their joint entropy, as long as their respective rates satisfy the constraints


for all, where is a set of the indices of sensor nodes in the network, is the complementary set of, is the entropy of , and



Thus, for an arbitrary ordering of N there exists a rate allocation (vector) such that the number of generated bits from all nodes can achieve the value of their joint entropy, e.g.,




Therefore, a cluster of nodes A can be encoded with bits using Slepian-Wolf coding without communicating with each other, where |A| is the number of nodes in cluster A.

4. Proposed work

To efficiently use slepian-wolf coding, there exists an optimal cluster formation algorithm using DOC protocol as described in [5]. However, this protocol does not consider the re-establishment of cluster head in case of failure. Since this encoding scheme is not fault tolerant an energy efficient mechanism is proposed. The proposed scheme, a Fault Tolerant Mechanism (FTM) two cases are considered: 1. CLUSTER HEAD FAILURE, 2. NODE FAILURE. In addition to that a security mechanism to secure the data of the sensor nodes is also described.

4.1 Cluster Head Failure:

In WSN, a major amount of energy of the sensor nodes is consumed due to packet transmissions. Since the cluster head transfers more data and is the one which is responsible for communicating with the sink on behalf of the entire cluster, the energy of the cluster head drains out fast. This may lead to cluster head failure due to which the entire data of the cluster is lost. Hence a mechanism to select a new cluster head is proposed. The description of the modifications done to the DOC protocol to enable re-selection of cluster head in case of cluster head failure are as follows. In the DOC protocol [5], the cluster head is selected based on the minimum average entropy of the node. During this process, the cluster head is selected in a distributed manner where a node compares the average entropy of its neighboring nodes and its own average entropy. The entropy of each node is calculated as specified in [5] according to equation (7) and (8). The one with minimum average entropy is selected as the cluster head. In addition, this cluster head will store the id of the node which is having the second lowest average entropy. Then, a threshold value is set and when the energy of the cluster head falls below this threshold value, a new cluster head is to be selected.

The energy-consumption model used is similar to that in [22] and the energy consumption and the remaining energy are calculated using equation (5). To send an n-bit message over a distance d, a transmitter consumes

ETX = Eelec + Eamp*d2 (5)

where Eelec = 50 nJ/bit is the energy consumption to run the radio circuit, and Eamp = 10 pJ/bit/m2 is the energy consumption to run the power amplifier.

In case a cluster head is about to fail, the node with the stored id whose average entropy is the second lowest is considered and say its id is N1. Then the cluster member which is nearest to the current cluster head is considered, say its id is N2. N1 and N2 are the favorable candidates to become a cluster head. The reason for considering N2 as one of the candidates for becoming a cluster head is due to the fact that its residual energy will be greater compared to that of the other cluster members. It is because the data packets sent by N2 to the cluster head is not transferred over a long distance as it is nearest due to which energy consumed for transmission will be less compared to others. It is assumed that in WSN, most of the energy of a sensor node is consumed during data transmission where the amount of energy consumed is directly proportional to the distance of the packets transmitted. The residual energy of a node is also an important parameter to be considered as it elongates the network lifetime and thereby, the data of the entire cluster can be received and decoded, without data loss, at the sink for a longer duration. Now considering the two candidates N1 and N2, a weighted average of average entropy and the residual energy for each node N1 and N2 is calculated.



Wi - weighted average of node i

AEi - average entropy of node i

REi - residual energy of node i

The node whose weighted average is greater is selected as the new cluster head. The current cluster head will send all its information regarding the list of neighboring nodes and other information required by a cluster head to the newly selected cluster head.

4.2 Node Failure:

Another aspect of fault tolerance is the node failure. In case of slepian wolf coding the decoding of data from one node depends on the data of the other node. Hence, the failure of a node N will affect the decoding of data of the other nodes that depend on N and thereby information of the other nodes is also lost. The encoding process using slepian-wolf coding as specified in [5] is that the reading of a sensor node XN is encoded with a rate of conditional entropy H(XN|XN-1,…,X1) where X1 is the cluster head and all other nodes X2 to XN-1 are the sensor nodes nearer to X1 than XN. Therefore, if any node fails, the data of other nodes which are encoded at a rate depending on the node that failed is also lost along with the data of the failed sensor node.

To avoid the loss of data of other active nodes that is dependent on the failed node for decoding, a fault tolerant mechanism is suggested. The procedure involves the assumption similar to [5] that the network correlation structure is known a priori. When the cluster head begins to collect information initially, it sends a list of nodes to each node that are nearer to the cluster head than itself. Later when information is to be collected by the cluster head from its cluster members, it sends a beacon signal to each of its members to check if the nodes are active. The cluster head waits for a certain period of time to receive the replies from each node. If the cluster head does not receive a reply from a node then it assumes that the node has failed and the cluster head will send the id of the failed node to all its other members. The cluster members, on receiving the id of the failed node, will delete this id from its list of nodes earlier received from the cluster head. Now the cluster members will encode at a rate without involving the failed node. Thus the data of the active nodes that are dependent on the failed node is not lost as the cluster members will not encode the data at a rate depending on the failed node.

4.3 Security mechanism:

According to the slepian-wolf coding scheme as described in [5], the data of an entire cluster depends on the decoding of the cluster head data and the cluster head's data is sent to the sink directly without any compression. This may result in the entire data of the cluster being decoded by attackers if the cluster head's data alone is known. Hence a security mechanism to encrypt the cluster head's data alone is sufficient to secure the data of the entire cluster.

The conventional way of ensuring the confidentiality of data is to encrypt the sensed data at each sensor node and then decrypt the data at the data sink(s) [18]. Since encryption and decryption consumes lot of computational energy, only few selected node's data are encrypted such that the confidentiality of the data transmitted is maintained. Thus, selecting only the cluster heads alone to encrypt their data will reduce the overall energy consumption within the clusters and also ensures the confidentiality of the data of the cluster members.

The security mechanism to maintain confidentiality of the sensor data can be divided into two phases: Key establishment and Encryption. Usually, encryption is performed either by symmetric key cryptography or asymmetric key cryptography.

In the key establishment phase, initially every node is pre-assigned with a unique master key (K) before network deployment. A key is to be generated for secure communication between a node and the sink namely Node-Sink key. Node-Sink (NS) key is to be established in order to secure the communication between a node and the sink which can be a function of the node id, sink address and the master key. After generating the NS key, the node deletes the master key (K) so that even in case of node capture the master key is not known. The data is encrypted using this NS key and hence can be decoded only by the sink which only has the master key. Here it is assumed that the sink cannot be compromised by the attacker nodes. The NS key of node I can be derived as follows:

NSi = F(i||sink address||K) (7)

Therefore a cluster head can encrypt its data using the NS key it has generated earlier and then transmits this encrypted data along with the data of the cluster members to the sink.

A computationally less, energy efficient algorithm is required for Wireless Sensor Networks(WSN) since the sensor nodes run on battery power and have less energy. There are few algorithms [21] that support wireless sensor networks like Rabin's algorithm, RC4, RC5 etc. The use of Rabin's algorithm can be energy efficient but the disadvantage is that it cannot be used in cases of numerical data as the decrypted data may have more than one value of which the correct data cannot be identified.

Hence, RC5 [20] block cipher encryption algorithm which has less number of computations is chosen as an efficient algorithm for WSN. The encryption is done only to the cluster head data as the overall energy of the network is conserved and thereby increasing the network lifetime. The encryption of the cluster head data alone is sufficient to maintain the data confidentiality of the cluster as the data of the remaining nodes are encoded with respect to the cluster head data and the data redundancy is eliminated using the slepian-wolf coding scheme. The confidentiality of the entire cluster is maintained if the data of the cluster head is protected. Hence, it is sufficient to encrypt the cluster head alone instead of encrypting the data of all the nodes so that energy is conserved.

5. Performance evaluation

The performance evaluation of the proposed mechanisms is evaluated using simulations in Omnet++. The system parameters that are considered for simulation as in [5] include network size as 100x100m2, initial energy at node as 2J.

Fig. 1 shows the effects of the correlation degree and the network size on the compression performance. The compression performance is measured in an overall compression ratio, which is the total amount of data produced in the whole network after CSWC (Clustered Slepian-wolf coding) [5] is applied over the total number of bits generated by all nodes without using this distributed source-coding scheme. The network size or the total number of sensor nodes n is set to be {80, 90, 100, 110, 120}. The parameter θ in the covariance model is set to be {0.01, 0.009, . . . , 0.0003}, where θ = 0.01 indicates low correlation, and θ = 0.0003 indicates high correlation.

Fig.1. Impact of degree of correlation parameter and network sizes on overall compression ratio

For the correlation structure, we assume that the observations at N sensor nodes are modeled as an N-dimensional random vector, which has a multivariate normal distribution with mean (0, 0,…, 0) and covariance matrix K, i.e.,

the density of X is (8)

and the differential entropy of is

bits (9)

where |K| denotes determinant of the matrix X [4]. An exponential model of the covariance to model the observed physical event such as electromagnetic waves [2], where dij denotes the distance between the nodes measuring Xi and Xj respectively. The parameter θ controls the relation between the distance dij and the covariance kij, and it can be set to be different values to indicate different correlation degrees within a given distance.

The performance of the proposed fault tolerant mechanism is evaluated based on the parameters of the time till which all the clusters fail to send data and get decoded. Another parameter considered, ratio of decodable clusters, is used to find the number of decodable clusters available even in the presence of node failure. Decodable clusters are considered to be those clusters in the network where a cluster contains a cluster head and at least a single cluster member without failure so that the cluster head can send its own data as well as the cluster member's data encoded by slepian-wolf coding to the sink. Once the energy of all the nodes within a cluster drains out, the cluster is considered to be non-decodable as the entire information of that cluster will be lost. The ratio of decodable clusters is defined as the ratio of the number of decodable clusters to the total number of clusters formed initially. Fig. 2 shows that the data from clusters are still received even in the presence of node failures. Only when a majority of the nodes fail, few of the cluster heads fail to send the data due to cluster head failure and non-availability of other cluster members that can take the role of the cluster head.

Fig.2. Ratio of decodable clusters Vs Number of nodes

The network lifetime parameter considered for performance evaluation is the time till all the clusters become non-decodable called as LCD (last cluster decodable). Fig. 3 shows the increase in the LCD lifetime by varying the number of nodes deployed in the network. The simulation results show that the time till which a cluster is considered to be decodable and able to send data to the sink is greater than the lifetime without using the proposed mechanism.

Fig 3. Network Lifetime Vs Number of nodes

6. Conclusion

A fault tolerant data aggregation scheme discussed above involves the initial formation of clusters using DOC [5] protocol. A cluster head reconfiguration and node failure tolerant scheme is also discussed based on the minimum average entropy of the nodes. Data redundancy among the cluster members is reduced using Slepian-Wolf coding for efficient data transmission in wireless sensor networks (WSNs). Therefore a fault tolerant secured slepian-wolf coding scheme is achieved based on the formation of clusters and an efficient cluster head reconfiguration scheme applicable to the slepian-wolf coding technique.