Secure Data Aggregation In Sensor Networks Computer Science Essay


Sensor networks offer solutions to many monitoring problems. A sensor network consists of many wireless sensors that monitor and control physical environments from remote locations. Sensor nodes often have limited computation resources and battery power, so data transmission is a very energy-consuming process. To maximize performance, it is essential to minimize the amount of data sent and received by each node. One solution to this problem is data aggregation. However, aggregation may make the network less secure. This paper presents a survey of secure data aggregation in sensor networks.

KEYWORDS: Sensor Networks, Secure aggregation, Data aggregation


A wireless sensor network is used to monitor physical or environmental conditions such as temperature, pressure, sound, and vibration, and its nodes cooperatively pass the data to a main server. Such networks are used in military applications such as battlefield surveillance, and today they are also used for process monitoring and control, machine health monitoring, environment monitoring, and home automation. A sensor network may consist of thousands of nodes, each connected to a sensor; together they monitor an area and generate a substantial amount of data.

One of the major design challenges in these networks is to reduce the energy cost of transmitting this large volume of data. Data aggregation is used for this purpose: designated nodes called "aggregators" collect data, process it locally, and reply to the queries of a remote user. However, if malicious code is present in the network, its influence on the collected data is significantly magnified. Sensor nodes and aggregators may also be compromised through physical tampering, and a compromised aggregator can report spurious results to the remote user. Processing and aggregation therefore need to be resilient against attacks. An attacker can mount a denial-of-service attack, for example by not responding to queries. Another attack is the stealthy attack, in which the attacker's goal is to make the user accept false aggregation results that differ from the true results. We therefore want to guarantee that, with high probability, the user accepts only a reported aggregation result that is close to the true aggregation value. Techniques are needed to ensure that the user obtains correct data even when sensor nodes and aggregators are under the control of an attacker who is attempting to inject false data, and many protocols have been proposed for secure data aggregation. This paper describes different methods for secure data aggregation and presents a comparative study of secure data aggregation and transmission in sensor networks. Four approaches are considered: first, a cluster-based secure system; second, a secure hop-by-hop data aggregation protocol; third, privacy-preserving data aggregation and its two methods; and last, secure in-network aggregation.


In data aggregation, the term "sink" refers to the base station at which data are gathered. Data aggregation collects the most critical data and makes it available to the sink in an energy-efficient manner with minimum latency.

2.1 Secure DAV: A Secure Data Aggregation and Verification protocol in sensor networks

The main goal of this protocol is to provide secure and reliable communication in a sensor network. The term "cluster" denotes a collection of one or more nodes in the network. Instead of each node sending its data to the sink, the data is aggregated within the network and then unicast to the sink. Each node transmits its reading to a cluster head, which saves transmission energy; the cluster head aggregates the data and sends it to the base station. The protocol assumes that at most t nodes within a cluster are compromised. For secure data aggregation it takes the following steps:

1. A key establishment protocol generates a secret key for each cluster. Each sensor node holds a share of this secret key. The share is used to produce a partial signature on the readings, which ensures authentication.

2. Secure DAV ensures that the sink does not accept faulty readings as long as the number of compromised nodes stays within the upper bound t. Threshold signatures are used for verification.

The Cluster Key Establishment (CKE) protocol generates a secret key for each cluster. The public key corresponding to this secret key is known to all nodes within the cluster as well as to the base station (sink). Because the secret key is hidden, the cluster is protected from attacks. Each cluster generates a public/private key pair and broadcasts the public key to all nodes within the cluster. This key pair is distinct from the shared secret key that is used for secure transmission with other nodes. The shared key is generated using the Diffie-Hellman algorithm and is used to generate partial signatures. The cluster head collects all the partial signatures, combines them into a full signature, and sends it to the base station along with the aggregated data. The base station, which holds the public key, then verifies the signature. The CKE protocol is executed within each node to generate the keys; after it has run, each sensor holds its own share of the secret key.

In the next step, the cluster head of each cluster aggregates the readings from the sensors, computes the average, and broadcasts the average to all members of the cluster. Each sensor node compares its own reading with this average; if the difference is less than a threshold, the node creates its partial signature and sends it to the cluster head. The cluster head combines the partial signatures into a full signature and sends it to the sink along with the average. The base station (sink) verifies the signature using the cluster's public key. Since the cluster key is not available to an attacker, the attacker cannot generate the full signature. The threshold signature ensures the authenticity, and thus the integrity, of the message, and because at most t nodes within the cluster are compromised, the base station does not accept faulty messages.
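The averaging and endorsement round described above can be sketched roughly as follows. This is only an illustration under stated assumptions: an HMAC over the average stands in for a real partial signature (it is not a threshold signature scheme), the rule that more than t endorsements are required is an assumption, and the names `endorse` and `cluster_round` are hypothetical.

```python
import hashlib
import hmac
from statistics import mean


def endorse(share_key: bytes, reading: float, average: float, threshold: float):
    """Return a stand-in partial signature if the node's reading agrees with the average.

    Assumption: HMAC-SHA256 replaces the real partial signature for illustration.
    """
    if abs(reading - average) < threshold:
        message = f"{average:.3f}".encode()
        return hmac.new(share_key, message, hashlib.sha256).digest()
    return None  # the node refuses to endorse a suspicious average


def cluster_round(readings: dict, keys: dict, threshold: float, t: int):
    """Cluster head: compute the average, 'broadcast' it, and collect endorsements.

    Returns (average, endorsements) when more than t nodes endorsed (assumed rule),
    otherwise None.
    """
    average = mean(readings.values())
    endorsements = {}
    for node, reading in readings.items():
        signature = endorse(keys[node], reading, average, threshold)
        if signature is not None:
            endorsements[node] = signature
    if len(endorsements) > t:
        return average, endorsements
    return None
```

In this sketch a node whose reading is far from the broadcast average simply withholds its endorsement, so a cluster head that reports a manipulated average cannot gather enough endorsements to pass the check.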

2.2 SDAP: Secure Hop by Hop Data Aggregation Protocol for Sensor Networks.

Hop-by-hop data aggregation is one method for reducing communication overhead and energy loss in sensors. Sensor nodes are organized into a tree hierarchy with the base station at the root. The non-leaf nodes act as aggregators, which are responsible for collecting data from their child nodes and sending it toward the base station. Data is thus processed and combined at each hop on the way to the base station, which reduces the communication overhead. However, hop-by-hop data aggregation is vulnerable to false data injection attacks. An attacker who compromises a node obtains confidential information such as private keys and can alter the data collected, so that false data is fused at the parent node. The attack becomes more serious when multiple nodes are compromised. Because individual sensor readings are lost in the per-hop aggregation process, the base station cannot see the original readings and so cannot directly verify the correctness of the aggregated data. SDAP is a solution that allows the base station to verify the aggregated data without losing the efficiency of per-hop data aggregation.

SDAP is designed around two principles: "divide-and-conquer" and "commit-and-attest". Under divide-and-conquer, it dynamically partitions the tree topology into multiple logical groups (subtrees) of similar sizes. Commit-and-attest ensures that once a group commits to its aggregate, it cannot deny it later. The design goals are:

- To defend against false data injection attacks

- To provide low communication overhead.

SDAP uses a novel probabilistic grouping technique to partition the nodes in a tree into subtrees of similar sizes. It then performs commitment-based hop-by-hop aggregation in each group to generate a group aggregate. Based on these group aggregates, the base station identifies suspicious groups. Finally, an attestation process is carried out with each suspicious group to prove the correctness of its aggregate. The protocol thus operates in three phases: a) query dissemination, b) data aggregation, and c) attestation.

a) Query dissemination

In the query dissemination phase, a tree is first constructed using a tree construction algorithm. The base station (sink) then disseminates the query message through this tree. Along with the aggregation function, the query carries a random number that serves as the grouping seed for the probabilistic grouping in the next phase. The query message can be written as BS → * : Fagg, Sg, where Fagg is the specific aggregation function and Sg is the random number.

b) Data Aggregation

Through query dissemination, every node has identified its parent node. In this phase SDAP randomly partitions the nodes into multiple logical groups and performs aggregation within each group. The probabilistic grouping is achieved through the selection of a leader node for each group.

Group leaders are selected on the fly. Two functions are used for this purpose: a cryptographically secure pseudo-random function H and a grouping function Fg. A node x becomes a leader when H(Sg|x) < Fg(c), where c is the count value of node x. Once the leaders are selected, the groups are formed. Aggregation takes two forms: leaf node aggregation and intermediate node aggregation. Each aggregation packet carries three values: the sender's ID, an aggregated data value, and a count denoting how many nodes contributed to the aggregate. In addition, a flag field is kept; a flag value of 1 indicates that no further aggregation is needed. Aggregation proceeds from the leaf nodes toward the base station. When an intermediate node receives an aggregate from a child, it checks the flag: it performs further aggregation only if the flag is 0; otherwise it forwards the packet directly to its parent node.
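The leader test H(Sg|x) < Fg(c) can be illustrated with a small sketch. Several things here are assumptions made only for illustration: SHA-256 over the seed and node ID stands in for H, its output is scaled to the interval [0, 1), and the grouping function `grouping_threshold` (a linear ramp that reaches 1 at a hypothetical target group size) is not taken from the protocol.

```python
import hashlib


def prf(seed: bytes, node_id: str) -> float:
    """Stand-in for H(Sg | x): hash the seed and node ID, scale the result to [0, 1)."""
    digest = hashlib.sha256(seed + b"|" + node_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def grouping_threshold(count: int, target_size: int) -> float:
    """Hypothetical Fg(c): grows with the count c and reaches 1 at the target size."""
    return min(1.0, count / target_size)


def is_leader(seed: bytes, node_id: str, count: int, target_size: int = 8) -> bool:
    """A node becomes a group leader when H(Sg | x) < Fg(c)."""
    return prf(seed, node_id) < grouping_threshold(count, target_size)
```

Because the threshold grows with the count, a node that has already aggregated many readings is more likely to close its group, which is what keeps group sizes similar. Anyone who knows the seed, the node ID, and the count can recompute the same test.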

c) Attestation

After receiving the aggregated messages from the group leaders, the base station needs to verify their authenticity. This includes verifying the packet content and verifying that the sender is a legitimate leader. The base station checks the latter by testing whether H(Sg|x) < Fg(cx), where cx is the count value of leader x. If this is not satisfied, or any item in the packet is invalid, the base station discards the packet.

2.3 PDA: Privacy-preserving Data Aggregation in Wireless Sensor Networks.

Preserving data privacy during data aggregation is a challenging problem. There are two PDA schemes: CPDA (Cluster-based Private Data Aggregation) and SMART (Slice-Mix-AggRegaTe). CPDA has the advantage of lower communication overhead. The goal is to bridge the gap between collaborative data collection and data privacy in sensor networks. Packet loss is a major problem in wireless sensor networks; in its absence, these methods can guarantee that no private sensor reading is released to other sensors. The schemes are built on top of existing protocols, so they provide both security and privacy. In the CPDA scheme, sensor nodes are grouped into clusters, and the data aggregated in each cluster is then collectively aggregated toward a base station or sink. In SMART, each node preserves the privacy of its data by slicing it into pieces. Each slice is encrypted and sent to a different intermediate aggregation node. When the intermediate nodes have received these pieces, they calculate intermediate aggregate values and aggregate them further toward the sink. Both schemes preserve privacy along with aggregation, which is their advantage over the general data aggregation method TAG.

CPDA and SMART protect against attackers by encrypting messages. Key distribution consists of three phases: (1) key pre-distribution, (2) shared-key discovery, and (3) path-key establishment. In the pre-distribution phase, a large pool of K keys and their corresponding identities is generated. During the shared-key discovery phase, each sensor node exchanges discovery messages with its neighbours to find a key they have in common. If two neighbouring nodes share a common key, there is a secure link between them. In the path-key establishment phase, a path key is assigned to pairs of neighbouring sensor nodes that do not share a common key.
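The first two phases can be sketched as follows. The sketch assumes that each node is loaded with a random subset (a "key ring") of the pool, which the text above does not state explicitly; the ring size, the use of integer key identities in place of key material, and the function names are all hypothetical.

```python
import random


def predistribute(num_nodes: int, pool_size: int, ring_size: int, rng: random.Random) -> dict:
    """Key pre-distribution: give every node a random ring of key identities.

    Assumption: keys are represented only by their identities 0..pool_size-1.
    """
    pool = range(pool_size)
    return {node: set(rng.sample(pool, ring_size)) for node in range(num_nodes)}


def shared_key(ring_a: set, ring_b: set):
    """Shared-key discovery: return a key identity common to both rings, or None.

    When None is returned, the pair would need a path key instead.
    """
    common = ring_a & ring_b
    return min(common) if common else None
```

For example, `rings = predistribute(50, 1000, 40, random.Random(7))` followed by `shared_key(rings[0], rings[1])` reports whether nodes 0 and 1 can form a direct secure link. The larger the rings relative to the pool, the more likely two neighbours are to share a key.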

(a) Cluster-based Private Data Aggregation (CPDA)

CPDA consists of two steps: formation of clusters and calculation within clusters. A distributed protocol constructs the clusters in which intermediate aggregation is performed. When a node receives a HELLO message, it may elect itself as a cluster leader and send HELLO messages on to its neighbours; otherwise it waits for HELLO messages from its neighbours and joins a cluster by sending a JOIN message. The second step of CPDA is the calculation of intermediate aggregates within each cluster. After this calculation, the next step is data aggregation. A common technique for data aggregation is to build a routing tree. CPDA is implemented on top of TAG (the Tiny AGgregation protocol), so each cluster leader transmits its aggregated result to the server through the TAG routing tree.

(b) Slice-Mix-AggRegaTe (SMART)

One disadvantage of CPDA is the computational overhead within clusters. SMART is the alternative; as its name implies, it is a three-step process:

STEP 1-Slicing: Each node randomly selects a set of nodes within a given number of hops, then slices its private data randomly into pieces. One of the pieces is kept at the node itself; the others are encrypted and sent to the nodes in the randomly selected set.

STEP 2-Mixing: When a node receives an encrypted slice, it decrypts it using its shared key. It sums the slices only after it has received all of them.

STEP 3-Aggregation: In this step all nodes aggregate the data and send the result to the server. As in CPDA, aggregation uses a tree-based routing protocol: once a node has received all the data slices, it sums them and forwards the result toward the server.
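The three steps can be sketched for a SUM query as follows. The sketch is deliberately simplified and rests on assumptions: encryption of the slices is omitted, every other node is treated as a candidate recipient (no hop limit), the random slice range is arbitrary, and the names `slice_value` and `smart_sum` are hypothetical.

```python
import random


def slice_value(value: int, pieces: int, rng: random.Random) -> list:
    """Slicing: split a private value into random pieces that sum back to the value."""
    parts = [rng.randint(-1000, 1000) for _ in range(pieces - 1)]
    parts.append(value - sum(parts))
    return parts


def smart_sum(readings: dict, pieces: int, rng: random.Random) -> int:
    """Run slice, mix, and aggregate over all nodes and return the network-wide sum.

    Requires pieces - 1 <= number of other nodes.
    """
    nodes = list(readings)
    inbox = {node: [] for node in nodes}
    for node, value in readings.items():
        parts = slice_value(value, pieces, rng)
        inbox[node].append(parts[0])  # one piece stays at the node itself
        others = [n for n in nodes if n != node]
        # The remaining pieces go to distinct, randomly chosen nodes (encryption omitted).
        for part, target in zip(parts[1:], rng.sample(others, pieces - 1)):
            inbox[target].append(part)
    # Mixing: each node sums every slice it holds.
    mixed = {node: sum(parts) for node, parts in inbox.items()}
    # Aggregation: the mixed values are summed on the way to the server.
    return sum(mixed.values())
```

No single recipient sees more than one random piece of another node's reading, yet the mixed values still add up to the true total because every piece is counted exactly once.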

The privacy of the data is then evaluated using a privacy metric. The table below compares CPDA and SMART.



[Table: comparison of CPDA and SMART by efficiency in privacy preservation, communication overhead, aggregation accuracy, and computational overhead]



2.4 Secure Hierarchical In Network Aggregation in Sensor Networks.

Most methods, including those above, assume that all intermediate nodes are protected. This method is instead guaranteed to detect manipulation of the aggregate by an attacker. It is described as the first provably secure in-network data aggregation algorithm, and it limits the ability of an attacker to manipulate the aggregated data. Unlike other schemes, it is designed for hierarchical data aggregation strategies. The assumptions are as follows: the network is a general multihop network with n sensor nodes and a single base station, which can communicate with the user outside the network; aggregation is performed over an aggregation tree; each sensor node has a unique ID and shares a secret key with the server; and sensor nodes can perform symmetric-key encryption and decryption. An algorithm is said to be optimally secure if an attacker is unable to induce the querier to accept any aggregation result that is not already achievable through direct data injection. The goal is to build an optimally secure aggregation algorithm with sublinear edge congestion. Congestion is a commonly used metric that indicates how quickly the most heavily loaded nodes exhaust their batteries.

SUM algorithm: This algorithm computes a cryptographic commitment structure over the data values of the sensor nodes. The overall process has three steps: query dissemination, aggregation commit, and result checking. In query dissemination, the base station broadcasts the query to the nodes over the aggregation tree, a directed spanning tree over the network. In aggregation commit, the sensor nodes iteratively construct a commitment structure similar to a hash tree. The leaf nodes transmit their data to their parents, and each intermediate node performs aggregation. When a node performs an aggregation operation, it creates a commitment to the set of inputs used, by computing a hash over all of them. Both the aggregation result and the commitment are then passed on to the parent node. Once all commitment values have been reported to the base station, the attacker can no longer claim a false aggregation result. In the third step, result checking, the querier disseminates the final commitment values to the rest of the network in an authenticated broadcast. At the same time, sensor nodes disseminate information that allows their peers to verify that their respective data values were incorporated into the aggregate. If a sensor node determines that its data value was indeed added into the final sum, it sends an authentication code up the aggregation tree toward the base station. Authentication codes are aggregated along the way with the XOR function for communication efficiency. When the querier has received the XOR of all the authentication codes, it can verify that all the sensor nodes have confirmed that the aggregation structure is consistent with their data values. If so, it accepts the aggregation result.
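The final confirmation step, in which authentication codes are combined with XOR, can be sketched as below. The sketch relies on the stated assumption that the base station shares a secret key with every node; the message format (`nonce` followed by `|OK`), the choice of HMAC-SHA256, and the function names are hypothetical choices made for illustration.

```python
import hashlib
import hmac
from functools import reduce


def auth_code(key: bytes, nonce: bytes) -> bytes:
    """A node's confirmation: a MAC over the query nonce and an OK marker (assumed format)."""
    return hmac.new(key, nonce + b"|OK", hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def aggregate_codes(codes) -> bytes:
    """Combine authentication codes with XOR, as done hop by hop up the tree."""
    return reduce(xor_bytes, codes, bytes(32))


def querier_accepts(keys: dict, nonce: bytes, received: bytes) -> bool:
    """Base station: recompute the XOR of every node's code and compare."""
    expected = aggregate_codes(auth_code(key, nonce) for key in keys.values())
    return hmac.compare_digest(expected, received)
```

Since XOR is associative and commutative, intermediate nodes can fold their children's codes together in any order and forward a single fixed-size value. If even one node withholds its confirmation, the combined value no longer matches what the base station expects.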

This algorithm is based on a novel method of distributing the verification of aggregation results onto the sensor nodes, and combining this with a unique technique for balancing commitment trees to achieve sublinear congestion bounds.


Security in data aggregation and transmission is an important issue in the design of sensor networks. In most cases sensors are deployed in an open environment, so they may be susceptible to physical attacks that compromise their cryptographic keys. Secure aggregation of information is a challenging task when data aggregators and sensors may be malicious. There are many methods for secure data aggregation; some of them are discussed above.

Secure data aggregation must address several issues: a) some sensor nodes may be compromised and transmit false data values to the aggregator; b) the aggregator itself may be compromised and report false aggregate values to the base station; c) the sampling techniques used while computing the aggregate may introduce estimation errors. Most of the protocols and methods discussed above offer solutions to these problems.

The Secure DAV protocol is a method of secure data aggregation based on cluster keys. It uses threshold signatures with elliptic curve cryptosystems to ensure the authenticity of the aggregated message. To establish a cluster key, it defines a protocol and a hierarchical network structure using elliptic curve cryptosystems. The cluster key is kept secret from all the sensor nodes, which eliminates eavesdropping of the cluster key. Each sensor that holds a share of the secret cluster key generates a partial signature; the partial signatures are combined at the cluster head and given to the base station for authentication. The SecureDAV protocol thus ensures that the base station accepts the aggregate readings with high reliability, even if the cluster head or some nodes within a cluster are compromised. There are, however, two main issues with the encryption methods: the size of the encrypted message and the time needed to encrypt messages or data.

Another security model is SDAP, which prevents several general attacks. It is based on the commit-and-attest technique, which aims to ensure that the group attestation process detects an attack if a group does not report its original aggregate after committing to its aggregation result. The attack is detected through an inconsistency in the committed aggregate and/or the MAC. The technique is secure as long as a cryptographically secure MAC function such as HMAC is used. An advantage is that it adds little overhead compared with the general hop-by-hop aggregation process. SDAP also follows the divide-and-conquer approach, partitioning the tree into subtrees, while commit-and-attest gives the base station a way to verify the aggregated result.

The third type of secure aggregation is privacy-based, with two methods discussed: CPDA and SMART. CPDA is similar to Secure DAV in that it is cluster-based, but compared with Secure DAV it has moderate communication overhead. SMART is a slicing method: the private data is sliced into pieces, so it is more secure than CPDA and Secure DAV. The fourth type of security is provided through in-network aggregation. Unlike the other schemes, it is built on top of existing methods, so it appears more secure than the other methods discussed so far.


This paper has presented a survey of secure methods for data aggregation in sensor networks. All of them focus on protecting the aggregation result from an attacker who attempts to inject falsified data. The main mechanisms are encryption of data at each sensor node and at the base station, with decryption enabled by key exchange. The paper also described methods built on existing protocols for preserving data privacy during aggregation, and compared the different methods on the basis of how they provide security during data aggregation. Many other methods are available for secure data aggregation in sensor networks.