Developing data-centric sensor

Introduction

Sensor networks are envisioned to be extremely useful for a broad spectrum of emerging civil and military applications such as remote surveillance, habitat monitoring, and collaborative target tracking. As sensor networks scale in size, so does the amount of sensing data they generate. The large volume of data, coupled with the fact that the data are spread across the entire network, creates a demand for efficient data dissemination and access techniques to find the relevant data within the network. This demand has led to the development of Data-Centric Sensor (DCS) networks. DCS exploits the notion that the nature of the data is more important than the identities of the nodes that collect the data. Thus, sensor data, as contrasted to sensor nodes, are "named" based on attributes such as event type (e.g., elephant-sightings) or geographic location. According to their names, the sensing data are passed to and stored at corresponding sensor nodes determined by a mapping function such as a Geographic Hash Table (GHT). Because sensing data with the same name are stored in the same location, queries for data of a particular name can be sent directly to the storing nodes using geographic routing protocols such as GPSR, rather than being flooded throughout the network.
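The name-to-location idea behind a geographic hash table can be sketched in a few lines of Python. This is only an illustration, not the GHT specification: the function name `ght_location`, the use of SHA-256, and the rectangular field are assumptions made for the example.

```python
import hashlib
from typing import Tuple


def ght_location(event_name: str, width: float, height: float) -> Tuple[float, float]:
    """Hash an event name to a point inside a width x height field (hypothetical helper)."""
    digest = hashlib.sha256(event_name.encode("utf-8")).digest()
    # Use two separate 8-byte slices of the digest as x and y fractions in [0, 1].
    x_frac = int.from_bytes(digest[:8], "big") / 2**64
    y_frac = int.from_bytes(digest[8:16], "big") / 2**64
    return (x_frac * width, y_frac * height)


# Every node that hashes the same name obtains the same storage location,
# so producers and queriers agree without any coordination.
loc_a = ght_location("elephant-sightings", 1000.0, 1000.0)
loc_b = ght_location("elephant-sightings", 1000.0, 1000.0)
print(loc_a == loc_b)
```

Because the function is public and deterministic, anyone who knows the event names can compute where each kind of data is stored.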

Fig. 1 shows an example of using a DCS-based sensor network to monitor the activities or presence of animals in a wild animal habitat. The sensed data can be used by zoologists to study the animals or by an authorized hunter to locate certain types of animals (e.g., boars and deer) for hunting. With DCS, all the sensing data regarding one type of animal are forwarded to and stored in one location. As a result, a zoologist only needs to send one query to the right location to find the information about that type of animal. Similarly, a soldier can easily obtain enemy tank information from storage sensors through a DCS-based sensor network in the battlefield. In many cases, DCS-based data dissemination offers a significant advantage over previous external storage-based data dissemination approaches, where an external base station (BS) is used for collecting and storing the sensing data. If many queries are issued from nodes within the network, the external storage-based scheme is very inefficient, since data must be sent back and forth between the sensors and the BS, causing the nodes close to the BS to die rapidly due to energy depletion. Further, for sensor networks deployed in hostile environments such as a battlefield, an external BS may not be available, because the BS is a very attractive target for physical destruction and compromise and thus becomes a single point of failure from both security and operation perspectives. In contrast, the operation of a DCS system does not assume the availability of a persistent BS; instead, mobile sinks (MSs) such as mobile sensors, users, or soldiers may be dispatched on demand to collect the stored data (or to perform other tasks) on appropriate occasions.

The previous DCS systems, however, were not designed with security in mind. All data of the same event type are stored at the same node or several nodes based on a publicly known mapping function. As long as the mapping function and the types of events monitored in the system are known, one can easily determine the locations of the sensors storing different types of data. In our previous example, a zoologist can use the DCS system to locate any animals of interest, whereas a hunter is only permitted to hunt certain kinds of animals (e.g., boars and deer) but not the protected ones (e.g., elephants). Nevertheless, a nonconforming hunter may acquire the locations of the protected animals for hunting purposes. As such, security and privacy should be provided for DCS systems. Securing DCS systems is complicated by the network scale, the highly constrained system resources, the difficulty of dealing with node compromises, and the fact that sensor networks are often deployed in unattended and hostile environments. The low cost of sensor nodes (e.g., as envisioned for smart dust [8]) precludes building tamper resistance into the nodes. Thus, the lack of tamper resistance, coupled with the unattended nature of the network, gives an adversary the opportunity to break into captured sensor nodes and read out sensor data and cryptographic keys.

We present pDCS, a privacy-enhanced DCS system for unattended sensor networks. To the best of our knowledge, pDCS is the first one to provide security and privacy to DCS networks. Specifically, pDCS provides the following features. First, even if an attacker can compromise a sensor node and obtain all its keys, he cannot decrypt the data stored in the compromised node. Second, after an attacker has compromised a sensor node, he cannot know where this compromised node stored its event data generated in the previous time intervals. Third, pDCS includes very efficient key management schemes for revoking a compromised node once its compromise has been detected, thus preventing an attacker from knowing the future storage location for particular events. Finally, pDCS provides a novel query optimization scheme to significantly reduce the message overhead without losing any query privacy.

The salient features of pDCS are due to the following techniques. Instead of using a publicly known mapping function, pDCS provides private data-location mapping based on cryptographic keys. The keys are assigned and updated so as to prevent outsider or insider attackers from deriving the locations of the storage cells for previous sensor data. The key management scheme for updating compromised keys makes a seamless mapping between location keys and logical keys. On the other hand, because private mapping may reduce the efficiency of sending MS queries, we propose several query optimization techniques based on the Euclidean Steiner Tree (EST) [9] and the keyed Bloom Filter (KBF) to minimize the query message overhead while maintaining the same query privacy.

AIM

To design a privacy-enhanced DCS network that offers different levels of privacy for data access, where each level of privacy is based on different cryptographic keys.

Objective

In sensor networks, the demand for efficient techniques to find relevant data has led to the development of the Data-Centric Sensor (DCS) network. In a DCS network, sensor data are named according to attributes such as geographic location or event type. Storing the data inside the sensor network also raises security problems, owing to factors such as the unattended nature of sensor networks and the lack of tamper resistance of sensor nodes. An attacker can locate the node that stores the events of interest to him and compromise it. The main objective is to provide security for data-centric sensor networks.

Wireless communication

"The increasing demand for high data rates in wireless communications due to emerging new technologies makes wireless communications an exciting and challenging field. The spectrum or bandwidth available to the service provider is often limited and the allotment of new spectrum by the federal government is often slow in coming".

Devices with limited power budgets ought to use little power in order to conserve battery life and reduce production cost. Wireless designers therefore face a two-part challenge: to deliver higher data rates and to improve performance without increasing power. Moreover, the wireless channel is volatile and unpredictable, and its error rates are worse than those of a wired channel.

Many current and emerging wireless communication systems make use of diversity, a classic and well-known concept that has been used for the past half century to combat the detrimental effects of multipath fading. In diversity techniques at the receiver, two or more copies of the same information-bearing signal are combined skillfully to increase the overall signal-to-noise ratio (SNR); such techniques still offer one of the greatest potentials for radio link performance improvement in many current and future wireless technologies. For example, to meet stringent quality-of-service requirements with spectrally efficient multilevel constellations, antenna (space) diversity is needed to offset the SNR penalty due to fading and denser signal constellations. In addition, one of the most promising features of wideband code division multiple access systems is their ability to resolve additional multipath components, resulting in increased multipath diversity that can be exploited by rake reception.

The major problem faced in wireless communications is fading: the out-of-phase reception of multipath components, which causes deep attenuation in the received signal. During a deep fade, the signal-to-noise ratio (SNR) decreases, which results in reception errors and degrades the performance of the link.

Under fading, reliable communication is achieved through diversity techniques, in which the receiver is provided with multiple replicas of the transmitted signal that arrive under different fading conditions. This reduces the probability that all replicas are affected by deep attenuation at the same time.
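A short calculation makes the benefit of diversity concrete. As a simplifying assumption not made in the text, suppose each of the L received copies is in a deep fade independently of the others, each with the same probability p (both symbols are introduced here only for illustration). Then the probability that every copy is lost at once is:

```latex
P(\text{all } L \text{ copies in deep fade}) = p^{L},
\qquad \text{e.g. } p = 0.1,\; L = 3 \;\Rightarrow\; p^{L} = 10^{-3}.
```

With a single copy, one transmission in ten would be lost to a deep fade in this example. With three independent copies the figure drops to one in a thousand. Real channels are rarely fully independent, so this is an optimistic bound.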

Wireless Propagation characteristics

In an ideal radio channel, the signal at the receiver would consist of a single direct-path signal, which would be a perfect reconstruction of the transmitted signal.

In a real channel, the signal is changed during transmission. The received signal consists of a combination of attenuated, reflected, refracted, and diffracted replicas of the transmitted signal. The channel also adds noise to the signal and, if the transmitter or receiver is moving, causes a shift in the carrier frequency, which is known as the Doppler effect. Because these effects determine how the signal is received, the performance of a radio system depends on the characteristics of the radio channel.

1.7 FACTORS AFFECTING WIRELESS COMMUNICATION

Techniques for improving the reliability of wireless channels range from new transport-layer protocols to robust physical-layer schemes, including improved modulation and coding. The development of these techniques is based on the statistical nature of the errors and signal corruption that occur in the channel. Thus, a good understanding of the nature of the errors or corruptions that occur in these channels is crucial for providing reliable wireless communication to upper-layer applications. Some of the main causes of bit errors, and consequently packet losses, in widely deployed wireless channels are described by Haowei Bai as follows:

"Attenuation: This is due to a decrease in the intensity of electromagnetic energy at the receiver (e.g., due to long distance), which leads to low signal-to-noise ratio (SNR).

Inter Symbol Interference (ISI): This is caused by delay spread (the arrival of a transmitted symbol is delayed), resulting in partial cancellation of the current symbol.

Doppler shift: This is due to the relative velocities of the transmitter and the receiver. Doppler shift causes frequency shifts in the arriving signal, thereby complicating the successful reception of the signal.

Multipath fading: Caused by multipath propagation of radio frequency (RF) signals between a transmitter and a receiver. Multipath propagation can lead to fluctuations in the amplitude, phase, and angle of the signal received at a receiver". (Bai, 2003)
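The size of the Doppler shift mentioned above can be estimated with the standard narrowband formula. The numbers below (a speed of 30 m/s and a 2.4 GHz carrier) are illustrative assumptions, not values from the text:

```latex
f_d = \frac{v}{c}\, f_c \cos\theta,
\qquad v = 30\ \text{m/s},\; f_c = 2.4\ \text{GHz},\; \theta = 0
\;\Rightarrow\; f_d = \frac{30}{3\times 10^{8}} \times 2.4\times 10^{9}\ \text{Hz} = 240\ \text{Hz}.
```

Here v is the relative speed, c the speed of light, f_c the carrier frequency, and θ the angle between the direction of motion and the direction of arrival of the signal.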

What is Wireless sensor network (example)

A wireless sensor network consists of a large number of heterogeneous sensor devices that form an ad hoc network, together with more complex sensor nodes that have communication, processing, and storage capabilities.

Challenges of WSN

  • Requirements: small size, large number, tetherless, and low cost. Hence constrained by energy, computation, and communication
  • Small form factors => prohibits large, long-lasting batteries
  • Cost & energy => low-power processors, small radios with minimum bandwidth & small transmission ranges.
  • Ad-hoc deployment => no maintenance and battery replacement
  • Increase NW lifetime => No raw data to gateway for compilation

Literature survey

Existing System:

The previous DCS systems, however, were not designed with security in mind. All data of the same event type are stored at the same node or several nodes based on a publicly known mapping function. As long as the mapping function and the types of events monitored in the system are known, one can easily determine the locations of the sensors storing different types of data. Securing DCS systems is complicated by the network scale, the highly constrained system resources, the difficulty of dealing with node compromises, and the fact that sensor networks are often deployed in unattended and hostile environments. The low cost of sensor nodes (e.g., as envisioned for smart dust) precludes building tamper resistance into the nodes.

Proposed System:

We present pDCS, a privacy-enhanced DCS system for unattended sensor networks. To the best of our knowledge, pDCS is the first one to provide security and privacy to DCS networks. Specifically, pDCS provides the following features. First, even if an attacker can compromise a sensor node and obtain all its keys, he cannot decrypt the data stored in the compromised node. Second, after an attacker has compromised a sensor node, he cannot know where this compromised node stored its event data generated in the previous time intervals. Third, pDCS includes very efficient key management schemes for revoking a compromised node once its compromise has been detected, thus preventing an attacker from knowing the future storage location for particular events. Finally, pDCS provides a novel query optimization scheme to significantly reduce the message overhead without losing any query privacy.

Need of the project

To provide security and privacy in wireless sensor networks.

Related Diagrams

Architectural Diagram

Data flow diagram

Class Diagram

Modules

Related works

We introduce the related work in three categories: privacy and anonymity, key management, and location-based forwarding.

2.1 Location Privacy and Communication Anonymity

There are mainly two approaches for restricting MS access to sensor data: policy enforcement and data perturbation. In the spirit of the first approach, Myles et al., Hengartner and Steenkiste, and Snekkenes studied the issue of specifying location privacy policies on which access control decisions are based. Alternatively, anonymity mechanisms could also be employed to provide the required level of privacy by properly perturbing the sensor data before its release. Gruteser et al. proposed techniques such as data cloaking and hierarchical data aggregation to prevent an attacker from tracking the precise location of an individual monitored by sensors. The main difference between our work and the previous work is that we achieve sensor data privacy in an unattended environment by encryption as well as random location mapping, instead of policy enforcement or data perturbation. These techniques are complementary to each other and could be applied jointly if needed.

Deng et al. studied how to conceal BSs from outsider attackers. In their schemes, all sensor nodes transmit at a constant rate and the mix technique is used to hide sender-receiver correlations. Ozturk et al. studied an outsider attack in which a single attacker tries to trace back to the data source by analyzing the observed traffic in sensor networks where sensor nodes report sensing data to a fixed external sink. To defend against the attack, a phantom flooding scheme is proposed to disturb the traffic pattern and mislead the attacker. Currently, pDCS does not include its own anonymous communication techniques. Instead, it relies on one of the existing schemes to provide the service when required.

In earlier work, we proposed a preliminary version of pDCS, but important issues such as key management and load balancing evaluation were not addressed.

Fig. 1. A DCS-based sensor network which can be used by zoologists (who are authorized to know the locations of all animals) and hunters (who should only know the locations of boars and deer, but not elephants).

2.2 Key Management for Sensor Networks

Key management for sensor networks has been extensively studied recently. There are pairwise key establishment schemes that use a trusted third party (BS), that exploit the initial trustworthiness of newly deployed sensors [19], and that are based on the framework of probabilistic key predeployment. pDCS may adopt one of these pairwise key establishment schemes according to security requirements and resource constraints. Many logical-key-tree (LKH)-based group key management schemes have been proposed for secure multicast in wired networks, including LKH [28], ELK [29], and subset difference, to name a few. Since these schemes were not designed for sensor networks, they are less optimized and less efficient when employed in sensor networks directly. A few schemes also discuss the management of group keys in sensor networks. In one scheme, an updated group key is distributed in a network through hop-by-hop encryption, trading computation for communication. In another, geographical information is exploited to map an LKH to the physical tree structure so as to optimize the energy expenditure of a group rekeying operation. There are mainly two differences between our key management scheme and the above. First, in addition to group key updating, in pDCS row keys and cell keys also need to be updated upon a node revocation. Second, in pDCS, the key encryption keys (KEKs) in an LKH are location-dependent keys, and our cell-based network partition allows our scheme to further reduce rekeying overhead.
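To make the logical key tree idea concrete, the sketch below lists which keys have to be replaced when one member is revoked. In a logical key hierarchy, a member holds every key on the path from its leaf to the root, so those are the keys that must change. The heap-style node numbering and the function name `keys_to_update` are assumptions chosen for the example. The sketch shows plain LKH only, not the location-dependent variant used in pDCS.

```python
from typing import List


def keys_to_update(leaf_index: int, num_leaves: int) -> List[int]:
    """Return the tree-node IDs whose keys must be replaced when the member
    at leaf_index is revoked, ordered from the bottom of the tree to the root.

    Nodes are numbered heap-style: the root is 1 and node k has children
    2k and 2k+1, so leaf i (0-based) sits at node num_leaves + i.
    """
    if num_leaves < 2 or num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a power of two >= 2")
    if not 0 <= leaf_index < num_leaves:
        raise ValueError("leaf_index out of range")
    node = (num_leaves + leaf_index) // 2  # parent of the revoked leaf
    path = []
    while node >= 1:
        path.append(node)
        node //= 2
    return path


# Revoking member 5 of 8: only log2(8) = 3 keys change, not one per member.
revoked_path = keys_to_update(leaf_index=5, num_leaves=8)
print(revoked_path)  # [6, 3, 1]
```

The last entry (node 1) plays the role of the group key, and the others are key encryption keys. The revoked member's own leaf key is simply discarded.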

2.3 Location-Based Forwarding

Location-based forwarding has been studied for both mobile ad hoc networks and sensor networks. Location-aided routing was proposed to reduce the cost of discovery through restricted-area flooding when the uncertainty about a destination is limited. Greedy routing schemes, e.g., GPSR, choose the next hop that provides the most progress toward the destination. In these schemes, the delivery of packets is guaranteed by planarizing the network graph and applying detour algorithms which avoid obstacles using the "right hand rule" strategy. Niculescu and Nath proposed trajectory-based routing, in which the source encodes the trajectory to traverse and embeds it into each packet. Upon the arrival of each packet, intermediate nodes employ greedy forwarding techniques such that the packet follows its trajectory as closely as possible. With this scheme, routing becomes source-based, and there is no need to maintain routing tables at intermediate nodes. We note that this scheme is suitable for a regularly shaped trajectory, not for a totally random one, which is the case in pDCS. pDCS employs two approaches for forwarding query packets to randomly distributed locations. One is trajectory-based routing, in which the trajectory is explicitly encoded in each packet using EST. In the other approach, a novel KBF technique is applied to encode the trajectory implicitly, which can achieve destination anonymity while guaranteeing that each query packet reaches its destination.
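The idea of encoding a set of cells implicitly in a keyed Bloom filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the pDCS construction: the class name `KeyedBloomFilter`, the use of HMAC-SHA256 as the keyed hash, the per-cell keys, and the filter size are all hypothetical choices made for the example.

```python
import hashlib
import hmac
from typing import List


class KeyedBloomFilter:
    """Bloom filter whose bit positions depend on a secret key (illustrative only)."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored as a Python int

    def _positions(self, key: bytes, cell_id: int) -> List[int]:
        positions = []
        for i in range(self.num_hashes):
            msg = f"{cell_id}:{i}".encode("utf-8")
            digest = hmac.new(key, msg, hashlib.sha256).digest()
            positions.append(int.from_bytes(digest[:4], "big") % self.num_bits)
        return positions

    def add(self, key: bytes, cell_id: int) -> None:
        for pos in self._positions(key, cell_id):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes, cell_id: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key, cell_id))


# Hypothetical per-cell keys known to the MS and to the nodes of each cell.
cell_keys = {cid: f"hypothetical-cell-key-{cid}".encode() for cid in range(16)}

kbf = KeyedBloomFilter()
for cid in (3, 7, 11):  # cells the query should reach
    kbf.add(cell_keys[cid], cid)

# A node tests its own cell with its own key; inserted cells always match.
on_path = [cid for cid in (3, 7, 11) if kbf.might_contain(cell_keys[cid], cid)]
print(on_path)
```

A Bloom filter never gives a false negative, which fits the requirement that each query packet reaches its destination. It can give false positives, which cost some extra forwarding. An eavesdropper who lacks the keys sees only a bit string and cannot test which cells it encodes.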

3 MODELS AND DESIGN GOAL

3.1 Network Model

As in other DCS systems, our pDCS system assumes that a sensor network is divided into cells (or grids) where each pair of nodes in neighboring cells can communicate directly with each other. A cell is the minimum unit for detecting events (referred to as a detection cell) and for storing sensor data (referred to as a storage cell); for example, a cell head coordinates all the actions inside a cell. Each cell has a unique ID, and every sensor node knows in which cell it is located through GPS when affordable. In cases where either GPS services are not available or GPS devices are too expensive, attack-resilient GPS-free localization techniques may be employed instead, because pDCS does not rely on absolute coordinates. For example, in Verifiable Multilateration (VM), distances are measured based on radio signal propagation time, which provides secure and reasonably accurate sensor positioning. We assume the events of interest to the MSs are classified into multiple types. For example, when a sensor network is deployed for monitoring the activities and locations of the animals in a wild animal habitat, all the activities of a certain kind of animal may be considered as belonging to one event type. We do not assume a fixed BS in the network. Instead, a trusted MS may enter the network at an appropriate time and work as the network controller for collecting data or performing key management. We also assume the clocks of sensor nodes in a network are loosely synchronized based on an attack-resilient time synchronization protocol.

3.2 Attack Model

Given the unattended nature of a sensor network, an attacker may launch various security attacks in the network at all layers of the protocol stack. Due to the lack of a one-for-all solution, these attacks are studied separately in the literature and the proposed defense techniques are also attack specific. As such, instead of addressing all attacks, we focus on the specific security problems in our pDCS network. We assume that in a pDCS network the (ultimate) goal of an attacker is to obtain the event data of his interest. To achieve this goal, an attacker may launch the following attacks:

  • Passive attack. An attacker may passively eavesdrop on the message transmissions in the network.
  • Query attack. An attacker may simply send a query into the network to obtain the sensor data of interest to him.
  • Readout attack. An attacker may capture some sensor nodes and read out the stored sensor data directly. It is not hard to download data from both the RAM and ROM spaces of sensor nodes (e.g., Mica motes).
  • Mapping attack. In this attack, the goal of an attacker is to identify the mapping relation between two cells. Specifically, he may either identify the storage cell for a specific detection cell or, in reverse, figure out the detection cell for a storage cell of his interest. A mapping attack is normally followed by a readout attack.

The passive attack can be addressed relatively easily by message encryption with keys of sufficient length, and the query attack can be addressed by source authentication so that a node only answers queries from an authorized entity. Given that compromising nodes is much easier than breaking the underlying encryption/authentication algorithm, we assume that the readout attack and the mapping attack are preferable to the attacker. Note that letting detection cells encrypt sensor data and store the encrypted data locally cannot address the readout attack, because an attacker can read out the encryption keys from the captured sensor nodes as well.

3.3 Security Assumption

We assume that an authorized MS has a mechanism to authenticate broadcast messages (e.g., based on μTESLA), and every node can verify the broadcast messages. We also assume that when an attacker compromises a node he can obtain all the sensitive keying material possessed by the compromised node. Note that although technically an attacker can compromise an arbitrary number of current-generation sensor nodes without much effort, we assume that only nodes in a small number of cells have been compromised. For instance, it may not be very easy for sensor nodes to be captured because of their geographic locations or their tiny sizes. Also, the attacker needs to spend more time to compromise more sensor nodes, which may increase the chance of being identified. For simplicity, we say a cell is compromised when at least one node in the cell is compromised. To deal with the worst scenario, we allow an attacker to selectively compromise s cells. We assume the existence of anti-traffic analysis techniques if so required. If an attacker is capable of monitoring and collecting all the traffic in the network, he may be able to correlate the detection cells and the storage cells without knowing the mapping functions. Therefore, we assume one of the existing schemes may be applied to counter traffic analysis if the attacker is assumed to be capable of analyzing traffic.

3.4 Design Goal

Our main objective is to prevent an attacker from obtaining the data of his interest in a DCS network through various attacks. In more detail, our goal is to address the types of attacks that are specific to pDCS, i.e., passive attack, query attack, readout attack, and mapping attack. As the passive attack and the query attack are easy to address, below we mainly discuss the requirements to be met for addressing the readout attack and the mapping attack:

  • Event data confidentiality. Even if an attacker can compromise a sensor node and obtain all its keys, he should be prevented from knowing the event data stored in the compromised node.
  • Backward event privacy. An attacker should be prevented from obtaining the previous sensor data for an event of his interest even if he has compromised some nodes.
  • Forward event privacy. We should also thwart (if not completely prevent) an attacker from obtaining the sensor data regarding an event in the future even if he has compromised some nodes.
  • Query privacy. An MS query should reveal as little location information of the sensor data as possible. For example, if multiple events are mapped and stored in the same storage cell, a query for one of the events will also reveal the storage cell of the other events. As such, an attacker may eavesdrop on MS queries to minimize his efforts in launching a mapping attack.

In addition, as sensor networks are scarce in resources, especially nonregenerative power, our security mechanisms should be resource efficient. For example, we should avoid network-wide flooding and public-key operations if at all possible. In particular, as communication normally consumes much more energy than computation, we will prefer computation to communication when they achieve the same goal.

Requirements

1.1 Business Description

A large volume of data is spread across a wide network, so efficient dissemination and access techniques are needed to extract the relevant data. In DCS, the nature of the data is more important than the identities of the nodes. Sensor data are named based on event type or geographic location.

  • Sensor data is stored in nodes determined by Geo. Hash Table (GHT)
  • Data with same name are co-located
  • Queries are sent directly using Geo. Routing protocol (e.g. GPSR) vs. flooding
  • Fig. 1 Sensing data about an animal aggregated and stored in one location
  • BS based is inefficient since large data is exchanged back and forth
  • Nodes close to BS will die very quickly due to energy depletion
  • BS is attractive for attack and single point of failure
  • DCS does not need presence of BS, Mobile sinks (MSs) are dispatched on demand to collect stored data.

1.2 Problem Statement

First, even if an attacker can compromise a sensor node and obtain all its keys, he cannot decrypt the data stored in the compromised node. Second, after an attacker has compromised a sensor node, he cannot know where this compromised node stored its event data generated in the previous time intervals. Third, pDCS includes very efficient key management schemes for revoking a compromised node once its compromise has been detected, thus preventing an attacker from knowing the future storage location for particular events. Finally, pDCS provides a novel query optimization scheme to significantly reduce the message overhead without losing any query privacy.

  • First one to provide security and privacy to DCS networks.
  • Can not get the sensor data from a node even with key compromise
  • Can not get previous event data even with node compromise.
  • Revokes compromised node to prevent attacks on future storage locations.
  • Provides novel query optimization to reduce message overhead still preserving privacy
  • Private data-location mapping based on cryptographic keys, with periodic key updates.
  • Query optimization based on Euclidean Steiner Tree (EST) and keyed Bloom Filter (KBF) to reduce message overhead.

2. System Planning

2.1 Time schedule for various phases

1st module

2nd module

3rd module

4th module

3. Functional Requirements

3.1 DFD Diagram

3.2 Use case description

The use cases mainly address the readout and mapping attacks. Event data confidentiality: even if the keys of a node are compromised, the attacker cannot decrypt the data. Backward event privacy: an attacker is prevented from obtaining previous sensor data even if some nodes are compromised. Forward event privacy: an attacker is thwarted from obtaining future data even if some nodes are compromised. Query privacy: an MS query reveals as little location information about the sensor data as possible. Because resources are constrained, network-wide flooding and public-key operations are avoided as much as possible.

4. Non-Functional Requirements

a) Performance requirement

Location Privacy and Communication Anonymity

  • Restrict data access using policy enforcement and data perturbation.
  • Data Cloaking and hierarchical data aggregation
  • pDCS in contrast uses encryption and random location mapping.
  • conceal BS using constant rate and mix techniques to hide sender-receiver correlations.
  • phantom flooding and disturbed data to mislead attacker.

b) Interface Requirement

Key Management

  • pair wise key management with trusted BS.
  • LKH based group key management for multicast.
  • Not suited for sensor networks.
  • updated group key distribution using hop-by-hop encryption
  • Use geographic based mapping for efficient group re-keying.
  • pDCS uses row keys and cell keys in addition to group key. Cell based partition reduces re-keying overhead.

Location-based forwarding

  • location aided routing to reduce flooding overhead
  • greedy routing (GPSR) chooses the next hop that provides the most progress toward the destination
  • pDCS uses trajectory-based routing, with the trajectory encoded in each packet using EST, and a novel KBF-based approach
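The greedy rule in the list above, "choose the next hop that provides the most progress to the destination", can be sketched as follows. The function name `greedy_next_hop`, the neighbor table layout, and the coordinates are hypothetical. The sketch covers only the greedy step, not the detour strategy that GPSR applies when greedy forwarding gets stuck.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def greedy_next_hop(current: Point, neighbors: Dict[str, Point],
                    destination: Point) -> Optional[str]:
    """Return the ID of the neighbor closest to the destination, or None when
    no neighbor is closer than the current node (a local maximum)."""
    best_id = None
    best_dist = math.dist(current, destination)
    for node_id, position in neighbors.items():
        d = math.dist(position, destination)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id


# Neighbor "b" is nearest to the destination at (5, 0), so it is chosen.
hop = greedy_next_hop((0.0, 0.0),
                      {"a": (1.0, 0.0), "b": (2.0, 1.0), "c": (0.0, 2.0)},
                      (5.0, 0.0))
# With only a neighbor behind the node, greedy forwarding is stuck.
stuck = greedy_next_hop((0.0, 0.0), {"x": (-1.0, 0.0)}, (5.0, 0.0))
print(hop, stuck)  # b None
```

Each node needs only its own position, its neighbors' positions, and the destination carried in the packet, so no routing tables are kept.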

c) Operational requirement

  • Assumes attacker targets specific event data
  • Attacker may launch
  • Passive attack: By eavesdropping. Solution: encryption
  • Query attack: Send query to target data. Solution: Authentication e.g. using micro-Tesla for broadcast.
  • Readout attack: Capture some nodes and read data.
  • Mapping attack: Obtain mapping storage vs. detection cells.

Software Requirements:

  • Core Java
  • Swing Front End
  • JDK 1.5
  • Windows XP

Hardware Requirements:

Hard Disk : 40 GB

RAM : 256 MB

Processor : Pentium IV

d) Resource Requirement

Each sensor possesses 5 types of keys

  • master key shared only with MS.
  • pair wise key shared with every neighbor.
  • row key shared by all sensors in same row.
  • cell key shared by all sensors in a cell.
  • group key shared by all sensors in a network.

Sensed data are handled in 6 steps (event E at time T, detection cell u, and storage cell v)

  • Cell u determines the storage cell using a keyed hash function.
  • Cell u encrypts the recorded information with its cell key.
  • Cell u forwards the message toward the destination. Techniques are applied to prevent an attacker from analyzing traffic and injecting false packets.
  • Storage cell v stores the message locally.
  • An authorized MS interested in event E at cell u determines storage cell v using the mapping and queries cell v directly. Query optimization is used to reduce message overhead.
  • After the MS receives the data of interest, it decrypts them using the cell key.
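The first and fifth steps both rely on the same keyed mapping, so the detection cell and an authorized MS can compute the storage cell independently. The sketch below illustrates this with HMAC-SHA256 as the keyed hash. The function name `storage_cell`, the choice of inputs (event type, detection cell, time interval), and the key value are assumptions made for the example, since this section does not specify them.

```python
import hashlib
import hmac


def storage_cell(mapping_key: bytes, event_type: str, detection_cell: int,
                 time_interval: int, num_cells: int) -> int:
    """Map (event, detection cell, time interval) to a storage cell ID in
    the range 0 .. num_cells - 1 using a keyed hash (illustrative only)."""
    msg = f"{event_type}|{detection_cell}|{time_interval}".encode("utf-8")
    digest = hmac.new(mapping_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


key = b"hypothetical-mapping-key"  # placeholder; real keys come from key management

# Detection cell u = 17 computes where to send its data for interval 42 ...
v_sensor = storage_cell(key, "elephant", detection_cell=17, time_interval=42, num_cells=100)
# ... and an MS holding the same key computes where to send its query.
v_sink = storage_cell(key, "elephant", detection_cell=17, time_interval=42, num_cells=100)
print(v_sensor == v_sink)
```

Without the key, an observer who knows the event type and the detection cell still cannot compute the storage cell, which is the property a public hash function lacks.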

e) Security requirement

  • Without knowing the mapping key, an attacker cannot obtain the mapping between cell u and cell v.
  • Since the storage cell does not possess the decryption key, a readout attack is difficult even if a node in cell v is compromised.
  • An attacker can launch various attacks only if he knows the mapping.
  • The key point of the design is hence to secure the mapping function so as to randomize the mapping among cells.

f) Quality and reliability requirement

  • All m detection cells are mapped to one storage location.
  • The attacker randomly compromises a node to obtain the group key.
  • The attacker locates the storage cell based on the group key.
  • The stored data is encrypted using individual cell keys, so the attacker has to randomly guess the IDs of the m detection cells.
  • Assume the attacker can compromise up to s cells.
  • The first compromised cell is the storage cell with probability 1/N; the attacker then randomly compromises (s-1) cells from the remaining (N-1) cells.

Research

pDCS: PRIVACY-ENHANCED DATA-CENTRIC SENSOR NETWORKS

In this section, we first give an operational overview of pDCS. Then, we present several schemes to randomize the mapping function and propose efficient protocols to manage various keys involved in the system. Finally, we describe optimization techniques for issuing queries.

4.1 The Overview of pDCS

First of all, we assume that each sensor possesses five types of keys: a master key (shared only with the MS), pairwise keys (one shared with each neighbor), a cell key (shared by all sensors in the same cell), a row key (shared by all sensors in the same row), and a group key (shared by all sensors in the network). Different keys are useful in different schemes or under different circumstances. The details of key management will be discussed in Section 4.3. Our solution involves six basic steps in handling sensed data: determine the storage cell, encrypt, forward, store, query, and decrypt. We demonstrate the whole process through an example in which a cell u has detected an event E:

  1. Cell u first determines the location of the storage cell v through a keyed hash function.
  2. u encrypts the recorded information with its cell key. To enable MS queries, either the event type E or the detection time interval T is in its plain text format, subject to the requirement of the application.
  3. u then forwards the message toward the destination storage cell. Here, techniques should be applied to prevent traffic analysis and to prevent an attacker from injecting false packets.
  4. On receiving the message, v stores it locally.
  5. If an authorized MS is interested in the event E that occurred in cell u, it determines the storage cell v and issues a query (optimized query schemes are discussed in Section 4.4).
  6. After it retrieves the data of interest, the MS decrypts it with the proper cell key.

The first step defends against the mapping attack. Without the mapping key, an attacker cannot determine the mapping from the detection cell to the storage cell. The second step prevents the readout attack. Since the storage cell v does not possess the decryption key for the stored message $M_e$, an attacker cannot decipher $M_e$ even after he has compromised a node in v. Step 3 and Step 4 deal with forwarding and storing the sensed data, Step 5 shows the basic operation for issuing an MS query, and Step 6 describes the local processing of retrieved data. The following sections focus on the performance and security issues related to Step 1, Step 2, Step 5, and Step 6. Currently, we assume some existing schemes for Step 3 and Step 4; we believe research in these areas bears its own importance and deserves independent study.

4.2 Privacy-Enhanced Data-Location Mapping

From the system overview, we can see that an attacker can launch various attacks if he can find the correct mapping relation between a detection cell and a storage cell. This motivated our design of secure mapping functions to randomize the mapping relationship among cells. Below, we present three representative secure mapping schemes in the order of increasing privacy. Throughout the discussion, N denotes the total number of cells ($N = N_r \times N_c$ for a grid of $N_r$ rows and $N_c$ columns), $L(i, j)$ denotes the cell in row i and column j, and H denotes a one-way hash function. To quantify and compare the privacy levels of different schemes, we assume that an attacker is capable of compromising a total of s cells of his choice. To simplify the analysis, we assume that there are m detection cells for the event of interest to the attacker, and the locations of these m cells are independent and identically distributed (iid) over N cells. (In real applications, the locations of these m detection cells may be correlated.) We further introduce the concept of event privacy level (EPL).

Definition 1. EPL is the probability that an attacker cannot obtain both the sensor data and the encryption keys for an event of his interest. According to this definition, the larger the EPL, the higher the privacy. This definition can be easily extended to the concepts of backward event privacy level (BEPL) and forward event privacy level (FEPL).

4.2.1 Scheme I: Group-Key-Based Mapping

In this scheme, all nodes store the same type of event E in the same location $(L_r, L_c)$ based on a group-wide shared key K. To prevent the stand-alone readout attack, a cell should not store its data in its own cell. Hence, if a cell $L(x, y)$ finds that its storage cell is itself, that is, $L_r = x$ and $L_c = y$, it applies H on $L_r$ and $L_c$ until either $L_r \neq x$ or $L_c \neq y$. To simplify the presentation, however, we will not mention this case again in the following discussion.
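The group-key-based mapping can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not given in the text: SHA-256 stands in for H, the labels `b"row"` and `b"col"` separate the two coordinates, and the re-hash rule bumps a counter (rather than hashing $L_r$ and $L_c$ directly) so that the loop cannot get stuck on a fixed point.

```python
import hashlib
from typing import Tuple


def keyed_hash(*parts: bytes) -> int:
    """Hash length-prefixed parts with SHA-256 and return a non-negative integer."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return int.from_bytes(h.digest(), "big")


def storage_cell(group_key: bytes, event: str, n_rows: int, n_cols: int,
                 own_cell: Tuple[int, int]) -> Tuple[int, int]:
    """Map an event type to a storage cell that differs from the detecting cell."""
    if n_rows * n_cols < 2:
        raise ValueError("need at least two cells")
    attempt = 0
    while True:
        tag = attempt.to_bytes(4, "big")
        row = keyed_hash(b"row", group_key, event.encode(), tag) % n_rows
        col = keyed_hash(b"col", group_key, event.encode(), tag) % n_cols
        if (row, col) != own_cell:
            return (row, col)
        attempt += 1  # assumed re-hash rule: bump a counter and hash again
```

Every detection cell other than the storage cell itself obtains the same location for a given key and event type, which is what allows a Type 1 query to be answered with one message.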

Type 1 query. An MS can answer the following query with one message: what is the information about an event E? This is because all the information about event E is stored in one location. An MS first determines the location based on the key K and E, then sends a query to it directly to fetch the data by, e.g., the GPSR protocol (shortly we will discuss several query methods with optimized performance and higher query privacy).

Security and performance analysis

In this scheme, all m detection cells are mapped to one storage cell. An attacker first randomly compromises a node to read out the group key, based on which he locates the storage cell for the event. Because the data stored in the compromised node were encrypted by individual cell keys and the IDs of the detection cells were also encrypted, the attacker has to randomly guess the IDs of these m detection cells. Assume that an attacker can compromise up to s cells. If the first compromised cell is the storage cell (with probability $1/N$; see footnote 1), the attacker will randomly compromise $(s-1)$ cells from the remaining $(N-1)$ cells. There are $\binom{N-1}{s-1}$ combinations in total, among which $\binom{N-1-m}{s-1-i}\binom{m}{i}$ combinations correspond to the case where i out of the m detection cells are compromised. On the other hand, when the first compromised node is not the storage cell, the attacker first compromises the storage cell and then randomly compromises $(s-2)$ cells from the remaining $(N-2)$ cells. There are $\binom{N-2}{s-2}$ combinations in total, among which $\binom{N-2-m}{s-2-i}\binom{m}{i}$ combinations correspond to the case where i out of the m detection cells are compromised. Also note that an attacker can only obtain $i/m$ of the event data when i out of the m detection cells are compromised. Let $B_1 = \min(s-1, m)$ and $B_2 = \min(s-2, m)$; then the BEPL of this scheme is

$$p_b^1(m, s) = 1 - \frac{1}{N}\sum_{i=1}^{B_1} \frac{\binom{m}{i}\binom{N-1-m}{s-1-i}}{\binom{N-1}{s-1}} \cdot \frac{i}{m} - \frac{N-1}{N}\sum_{i=1}^{B_2} \frac{\binom{m}{i}\binom{N-2-m}{s-2-i}}{\binom{N-2}{s-2}} \cdot \frac{i}{m}.$$

Fig. 2 shows the analytical result of BEPL as a function of m and s for a network size of $N = 20 \times 20 = 400$ cells, from which we can make two observations. First, without surprise, BEPL decreases with s. Second, BEPL does not change with m. This is due to the tradeoff between the number of detection cells and storage cells that are probably compromised and the fraction of event data possessed by the compromised storage cells.

Footnote 1. For simplicity, we ignore the case when the first compromised cell is a detection cell. Our study shows that the error introduced by this simplification is negligible.

Fig. 2. The BEPL as a function of m and s, where m is the number of detection cells and s is the number of compromised cells.
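The counting argument for the BEPL of Scheme I can be checked numerically. The sketch below is an illustration under the assumptions stated in the analysis: the storage cell is compromised first with probability 1/N, otherwise it is compromised second, and i compromised detection cells leak a fraction i/m of the event data. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_leak(pool: int, draws: int, m: int) -> Fraction:
    """Expected fraction i/m of detection cells hit when `draws` cells are
    picked at random from `pool` cells that contain the m detection cells."""
    if draws <= 0:
        return Fraction(0)
    total = comb(pool, draws)
    hit = sum(comb(m, i) * comb(pool - m, draws - i) * i
              for i in range(1, min(draws, m) + 1))
    return Fraction(hit, m * total)


def bepl_scheme1(n_cells: int, m: int, s: int) -> float:
    """BEPL of group-key-based mapping for N cells, m detection cells, s compromised cells."""
    first_is_storage = Fraction(1, n_cells) * expected_leak(n_cells - 1, s - 1, m)
    first_is_other = Fraction(n_cells - 1, n_cells) * expected_leak(n_cells - 2, s - 2, m)
    return float(1 - first_is_storage - first_is_other)
```

For N = 400 and s = 40 this evaluates to about 0.9045 for any m, in line with both observations: the value falls as s grows and does not depend on m, because the inner sums are hypergeometric means divided by m.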

Suppose the attacker compromises s cells, including the storage cell, at time $t_0$. He can come back at a time $t_1$ in the future to obtain the event data from the storage cell, and then simply decrypt all the data that were detected by these s cells between $t_0$ and $t_1$. Assume that m cells will detect the event between $t_0$ and $t_1$ and that the locations of these m cells are independent and identically distributed over the N cells. On average, $ms/N$ out of the s compromised cells are detection cells, and they will provide the encryption keys. Hence, the FEPL of this scheme is simply

$$p_f^1(m, s) = 1 - \frac{s}{N}.$$

Note that this formula holds after the attacker has compromised s cells and cannot compromise any more cells. We do not consider the FEPL during the process of compromising s cells.

Because all information about one event is stored in one location, Scheme I is subject to a single point of failure. Furthermore, both the traffic load and the resources for storing the information are not uniformly distributed among all the nodes.

4.2.2 Scheme II: Time-Based Mapping

In this scheme, all nodes store the event E occurring in the same time interval T (including a start time and an end time; the duration is denoted as $|T|$) in the same location $(L_r, L_c)$ based on a group-wide shared key $K_T$.

In addition, every sensor node maintains a timer that fires periodically with time period $|T|$. When its timer fires, a node derives the next group key $K_T = H(K_T)$. Finally, it erases the previous key $K_T$.
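The periodic key update is a one-way hash chain, sketched below. SHA-256 standing in for H, the class name `SensorNode`, and the helper `key_for_period` are assumptions for illustration; the helper presumes that a trusted party keeps the initial key and can therefore recompute the key of any past period.

```python
import hashlib


def next_group_key(current: bytes) -> bytes:
    """One-way update K_T <- H(K_T); SHA-256 stands in for H."""
    return hashlib.sha256(current).digest()


class SensorNode:
    def __init__(self, initial_key: bytes):
        self.group_key = initial_key
        self.period = 0

    def on_timer(self) -> None:
        """Called once every |T| time units."""
        # The old key is overwritten, so a later capture does not reveal it.
        self.group_key = next_group_key(self.group_key)
        self.period += 1


def key_for_period(initial_key: bytes, period: int) -> bytes:
    """Recompute the key of a given period by hashing forward from the initial key."""
    key = initial_key
    for _ in range(period):
        key = next_group_key(key)
    return key
```

A node captured in period 3 holds only the period-3 key; recovering the period-2 key would require inverting the hash.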

Type 2 query. An MS can answer the following query with one message: what is the information about the event E during the time interval T? This is because the information about E in T is stored in one location. An MS first determines the location based on $K_T$, E, and T, and then sends a query to fetch the data.

Security and performance analysis

Due to the use of the one-way hash function, an attacker cannot derive the old group keys from the current group key of a captured node. Hence, the locations for storing the events that occurred during previous time periods are not derivable. An attacker has to randomly guess the previous storage cells and detection cells for the event of his interest. The BEPL $p_b^2(m, s)$ of the previous data is very complicated to derive because it depends on the spatial and temporal distribution of the m detection cells and on the number of previous storage cells for the event, which in turn depends on the number of previous key updating periods and the probability of hash collisions. For ease of analysis, we ignore the case where a cell serves as both a detection cell and a storage cell. Under this assumption, on average, an attacker can correctly guess an $s/N$ fraction of the detection cells and an $s/N$ fraction of the storage cells. Only when these detection cells are mapped to these storage cells can the attacker decrypt the encrypted data. As such, $p_b^2(m, s) = 1 - (s/N)^2$. Consider the case $s = 40$ and $N = 400$: the BEPL of Scheme II is 99 percent. From Fig. 2, we can see that the BEPL of Scheme I under the same condition is slightly over 90 percent. Thus, Scheme II provides higher BEPL (i.e., higher backward privacy) than Scheme I.

There are two cases for the FEPL. If the attacker changes the code of the compromised nodes such that in the future these nodes keep their detected event data locally, the FEPL is the same as that of Scheme I. However, if the compromised nodes follow our protocol and hence do not keep a local copy of their data, the FEPL will increase. This is because in the future the event data might be forwarded to new storage cells that are not controlled by the attacker (who is assumed not to be able to compromise more than s cells).

4.2.3 Scheme III: Cell-Based Mapping

In this scheme, all the nodes in the same cell $L(i, j)$ of the gridded sensor field store in the same location $(L_r, L_c)$ the same type of event E occurring during a time interval T, based on a cell key $K_{ij}$ shared among all the nodes in the cell $L(i, j)$. Here, the cell key is updated periodically for two reasons. First, a node erases the old cell key to achieve backward event privacy. Second, since cell keys are also used for encryption, updating the cell keys changes the encryption key for the same event detected by the same cell in different time periods.

Type 3 query. An MS can answer the following query with one message: has event E happened in cell $L(i, j)$ during the time interval T? An MS first determines the location based on the key $K_{ij}$, T, E, and the detection cell $L(i, j)$ of interest, and then sends a query to the storage cell to fetch the data.

Security and performance analysis

The updating of cell keys prevents an attacker from deriving old cell keys from the current cell key of a compromised cell. Hence, the event data recorded in previous periods are indecipherable irrespective of the number of compromised cells (the network controller, however, still keeps the older keys to decrypt previous event data). In other words, the BEPL of this scheme is $p_b^3(m, s) = 1$; Scheme III provides the highest BEPL.

The FEPL $p_f^3(m, s)$ of this scheme is the same as that in Scheme II. It can also be seen that this scheme is the least subject to the single point of failure problem compared to the previous schemes. Moreover, both the traffic load and the resources for storing the information are the most uniformly distributed among all the nodes.

4.2.4 Comparison of Different Mapping Schemes

Above, we have presented three data-to-location mapping schemes with increasing privacy and complexity. These three mapping schemes certainly do not exhaust the design space, because we have three dimensions (time, space, and key) to manipulate. In the Appendix, we further introduce a row-based mapping scheme. In general, the higher the event privacy, the larger the message overhead for a query. On the other hand, these schemes may be used simultaneously based on the levels of privacy required by different types of data.

Next, we use simulations to compare the message overhead of the three mapping schemes: group-key-based mapping, time-based mapping, and cell-based mapping. Message overhead is defined as the total number of transmission hops of all the messages sent out by the detection cells toward their storage cells. The simulations were run for 20,000 time units in a DCS network with 20 × 20 cells. In each time unit, 10 events are generated from randomly selected cells and a random event type ID (ranging from 1 to 3) is assigned to each event. After an event is sensed in a cell, the cell calculates the storage cell coordinates based on the mapping scheme and forwards a message toward it. Fig. 3 shows that the amortized message overhead (message overhead per time unit per cell) increases linearly with the number of events. We observe that cell-based mapping incurs a slightly higher message overhead than the other two schemes. Also, even when there are as many as 50 events happening in one time unit, the amortized message overhead is low, e.g., 1.2 in group-key-based mapping and 1.39 in cell-based mapping. In Fig. 4, we use 3D plots to show the message overhead distribution over a plane of cells. We observe that the message overhead is the most balanced with the cell-based mapping scheme and the least balanced with the group-key-based mapping scheme. In general, when message overhead is more balanced among all the cells, the network can have a longer lifetime.

Note that we also varied the time period $|T|$, the number of event types, and the event rate in each time unit; the message overhead distributions of these mapping schemes remain similar. Finally, we briefly mention the memory usage of sensor nodes. Since sensed data have to be stored somewhere in the network, the overall memory requirement is the same in all these mapping schemes. But because the cell-based scheme involves the most storage cells, intuitively it will best balance the memory requirement among sensor nodes. So we expect a memory usage distribution similar to the message overhead distribution results.

4.3 Key Management

So far we have seen several types of symmetric keys involved in pDCS. Now, we are ready to show the complete list of keys that are used in pDCS and discuss their purposes as well as efficient ways for management of these keys:

Master key. Every node u has a master key $K_u$ shared only with the MS. Although the master key is not explicitly used in the data-location mapping schemes, it is necessary to secure the communications between the MS and individual sensors. In our application, for example, when a node wants to report the misbehavior of another node in the same cell to the MS, it may use the master key to compute a message authentication code over the report; or, when the MS distributes a new cell key to a cell with a node to be revoked, the master keys of the remaining nodes in the cell can be used to encrypt the new cell key for secure key distribution.

Fig. 3. Overhead comparisons among different mapping schemes.

Fig. 4. Message overhead distribution of different mapping schemes. (a) Group-key-based mapping. (b) Time-based mapping. (c) Cell-based mapping.

Pairwise key. Every pair of neighboring nodes shares a pairwise key. This key is used for 1) secure distribution of keying material such as a new cell key within a cell or 2) hop-by-hop authentication of data messages between neighboring cells for preventing packet injection attacks.

Cell key. A cell key can be used 1) for encrypting sensed data to be stored in a storage cell, 2) for private cell-to-cell mapping, or 3) as a KEK for secure delivery of a row key.

Row key. A row key can be used 1) for private row-to-cell mapping or 2) as a KEK for secure delivery of a group key.

Group key. A group key is used 1) for secure group-to-cell mapping or 2) when the MS broadcasts a secure query or command to all the nodes.

Of these five keys, four (all except pairwise keys) can be organized into an LKH [28], [46], [47] data structure maintained by the MS, as shown in Fig. 5. The first-level key (i.e., the root key) is the group key, the second-level keys are row keys, the third-level keys are cell keys, and the fourth-level keys are master keys. The out-degrees of the key nodes at the first three levels are $N_r$, $N_c$, and $N_{ij}$, respectively, where $N_{ij}$ is the number of nodes in cell $L(i, j)$. As in LKH, every node only knows the keys on the path from its leaf key to the root key. Unlike in LKH, where group members do not share pairwise keys, in our scheme a node shares a pairwise key with every neighbor node. We will show shortly that pairwise keys help reduce the bandwidth overhead of a group rekeying operation for revoking a node.

Initial key setup. Next, we show how nodes establish all these types of keys initially. Pairwise keys can be established by an existing scheme introduced in Section 2.2. The group key and master keys are easy to establish by loading every node with them before network deployment. However, it might not be feasible to set up row keys and cell keys by preloading every node with the corresponding keys in large-scale sensor networks. For massive deployment of sensor nodes (e.g., through aerial scattering), it is hard to guarantee the precise locations of sensor nodes. If a node does not have the cell key for the actual cell it falls in, it will not be able to communicate with the other nodes in the same cell. To address this key setup issue, we need to establish row and cell keys after deployment.

Based on real experiments, Deng et al. [48] showed that it is possible for an experienced attacker to obtain copies of all the memory and data of a Mica2 mote in minutes after a node is captured. Zhu et al. [49] showed through experiments that it takes several seconds for a node with a reasonable node density (on the order of 20 neighbors) to communicate with each neighbor and establish a secret key with each of them. As the number of messages exchanged in a localization protocol [34] is no more than that in [49], in pDCS we assume that during the initial network deployment phase, a node will not be compromised before it discovers its location based on a secure location scheme [34], [50]. This assumption also holds if the initial deployment is monitored. With this assumption, our scheme works by preloading every node with the same initial network key $K_I$. A node located in cell $(i, j)$ can derive its cell key as follows:

$$K_{ij} = H(K_I, i|j). \qquad (4)$$

After this, it erases $K_I$ from its memory completely. A row key can be established similarly as $K_i = H(K_I, i)$.

Key updating upon node revocations. pDCS does not include a mechanism for detecting compromised nodes, although its key updating operation introduced below is triggered by the detection of node compromises. Instead, pDCS assumes the employment of such schemes.
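The post-deployment derivation of cell and row keys from the initial network key can be sketched as follows. HMAC-SHA-256 standing in for H, the `cell|` and `row|` labels, and the class and function names are assumptions made for illustration; the text only specifies that the keys are hashes of $K_I$ and the cell or row index.

```python
import hashlib
import hmac


def derive_cell_key(k_initial: bytes, i: int, j: int) -> bytes:
    """K_ij = H(K_I, i|j), with HMAC-SHA-256 standing in for H."""
    return hmac.new(k_initial, b"cell|%d|%d" % (i, j), hashlib.sha256).digest()


def derive_row_key(k_initial: bytes, i: int) -> bytes:
    """K_i = H(K_I, i)."""
    return hmac.new(k_initial, b"row|%d" % i, hashlib.sha256).digest()


class SensorNode:
    def __init__(self, k_initial: bytes):
        self.initial_key = k_initial
        self.cell_key = None
        self.row_key = None

    def bootstrap(self, i: int, j: int) -> None:
        """Run once, after the node has learned its cell (i, j)."""
        self.cell_key = derive_cell_key(self.initial_key, i, j)
        self.row_key = derive_row_key(self.initial_key, i)
        # Drop K_I. Python cannot guarantee memory erasure; a real mote
        # would overwrite the key storage explicitly.
        self.initial_key = None
```

The explicit separator between i and j keeps cells such as (1, 12) and (11, 2) from deriving the same key.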

Suppose node u in cell $L(2, 2)$ is compromised and its cell reports the compromise to the MS; for example, a majority of the other nodes in the cell each compute a MAC over the report using their master keys. Since node u knows the keys $K_{22}$, $K_2$, and $K_g$, these keys need to be updated to new versions, say $K'_{22}$, $K'_2$, and $K'_g$. Based on LKH, the MS encrypts each updated key with its child keys (the new version if updated) and then broadcasts all the encryptions. For example, the new group key $K'_g$ is encrypted by $K_0$, $K_1$, $K'_2$, and $K_3$, respectively; $K'_2$ is encrypted by $K_{20}$, $K_{21}$, $K'_{22}$, and $K_{23}$, respectively; and $K'_{22}$ is encrypted by $K_{v_0}$, $K_{v_1}$, $K_{v_2}$, and $K_{v_3}$, respectively. In general, $N_r + N_c + N_{ij} - 1$ encrypted keys will be broadcast and flooded in the network.

Fig. 5. The mapping of the physical network into an LKH and the rekeying packet flows for revoking node u. (a) A sensor network divided into cells. (b) An LKH (each dot denotes a key). (c) Demonstration of rekeying packet flows.

Next, we present a variant of the above scheme, which incorporates two techniques to further improve the rekeying efficiency. The first technique is based on network topology. Instead of flooding all the keys in the network, the MS sends them separately to different sets of nodes. This is based on the observation that nodes in different locations should receive different sets of encrypted keys. Suppose the node to be revoked is in cell $L(i, j)$. Nodes in row m ($m \neq i$) only need to receive the new group key $K'_g$ encrypted by their row key $K_m$. Hence, the MS only needs to send one encrypted key to the cell $(m, 0)$, and the key is then propagated to the other cells in row m. For nodes in row i, there are two scenarios. If the nodes are in column n ($n \neq j$), they only need to receive $K'_g$ encrypted with $K'_i$ and $K'_i$ encrypted with the cell key $K_{in}$. Otherwise, if they are located in the same cell as node u, each of them needs to receive $K'_{ij}$ encrypted with its own master key. In these scenarios, the MS sends $N_c + N_{ij} - 1$ keys to the cell $(i, 0)$, and the keys are then propagated in row i. Note that a cell can remove from the keying message the encrypted keys that are of interest only to itself before forwarding the message to the next cell. As such, the size of a keying message decreases as it is forwarded.

Our second technique trades computation for communication, because communication is more energy consuming than computation in sensor networks. It has been shown that the energy consumption for encrypting or computing a MAC over an 8-byte packet based on RC5 is equivalent to that for transmitting one byte. As such, instead of sending the $N_{ij} - 1$ encryptions of $K'_{ij}$ to the cell $(i, j)$ across multiple hops, the MS may send only one of the encryptions to a specific node (e.g., $v_0$ in Fig. 5) and then request that node to securely propagate $K'_{ij}$ to the nodes other than u using their pairwise keys for encryption.

Key management performance analysis. Now, we analyze the performance of our rekeying scheme upon a node revocation. For simplicity, we define the performance overhead C as the average number of keys that traverse each cell during a rekeying event.
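The basic LKH rekeying step can be tallied with a small sketch that lists which new key is encrypted under which key when a node is revoked. The string labels for keys and the function name are hypothetical and serve only to make the count visible; no actual encryption is performed.

```python
from typing import List, Tuple


def rekey_messages(n_rows: int, n_cols: int, cell_nodes: List[str],
                   revoked: Tuple[int, int, str]) -> List[Tuple[str, str]]:
    """Return (new_key, encrypting_key) pairs for revoking node u in cell (i, j).

    cell_nodes lists all nodes of cell (i, j), including the revoked one.
    """
    i, j, u = revoked
    msgs = []
    # New group key under every row key (the new row key for row i).
    for r in range(n_rows):
        msgs.append(("Kg'", f"K{r}'" if r == i else f"K{r}"))
    # New row key under every cell key of row i (the new cell key for cell j).
    for c in range(n_cols):
        msgs.append((f"K{i}'", f"K{i}{c}'" if c == j else f"K{i}{c}"))
    # New cell key under the master key of every remaining node in the cell.
    for v in cell_nodes:
        if v != u:
            msgs.append((f"K{i}{j}'", f"K_{v}"))
    return msgs
```

For a 4 × 4 grid with five nodes in cell (2, 2), the list has 4 + 4 + 5 - 1 = 12 entries, matching the general count of $N_r + N_c + N_{ij} - 1$ encrypted keys.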

4.4.1 The Basic Scheme

Suppose an MS needs to send multiple query messages to multiple storage cells to serve a query. Due to the randomness of the mapping function, these storage cells may be separated by other cells. In the basic scheme, as shown in Fig. 6a, the MS sends one query message to each cell using a routing protocol such as GPSR [5]. Since each query message contains the query information and the ID of the destination storage cell, these query messages are different and have to be sent out separately. It is easy to see that this scheme has very high message overhead. Another weakness of the basic scheme is its lack of query privacy. Query privacy is measured by the probability that an attacker cannot find the IDs of the storage cells from eavesdropped MS query messages. In the basic scheme, since the MS has to specify the IDs of the destination storage cells, the query privacy of this scheme, denoted by $P_1$, is $P_1 = 0$.

Fig. 6. Three schemes for delivering a query to the storage cells. (a) Basic scheme. (b) EST scheme. (c) BF scheme.

4.4.2 The Euclidean Steiner Tree (EST) Scheme

A natural solution to reduce the message overhead of the basic scheme is to organize the storage cells as a minimum spanning tree. In this way, the MS can first generate the minimum spanning tree that includes all the storage cells, and then send the query message to these cells along this tree. Although this solution increases the message size, it greatly reduces the number of query messages. Because every message carries redundant header information, combining multiple messages can significantly reduce the overall message overhead. Similar to the basic scheme, the MS has to include the IDs of the destination storage cells in its query messages. Thus, the query privacy of this solution is still 0.

To further reduce the message overhead, we can use an EST, which has been shown to perform better than a minimum spanning tree and is widely used in network multicasting. Fig. 6b shows an EST, which includes some cells other than the storage cells, called Steiner cells. Note that these Steiner cells can also help improve the query privacy because they add noise to the set of storage cells. With EST, the cell in which the MS resides is the root cell. The MS constructs a query message, which contains the IDs of the cells in the EST, and sends it to its child cells using a routing protocol such as GPSR. When a cell head receives a query message, it reconstructs an EST subtree by removing information such as its own ID and the IDs of its sibling nodes, keeping only the information about the subtree rooted at itself. Then, it forwards the query message with the EST subtree to its child cell. This recursive process continues until each storage cell in the EST receives the query message.

To construct an EST, we use a technique proposed by Winter and Zachariasen [9]. Since their solution may return a noninteger Steiner cell, we replace each noninteger Steiner cell with the nearest integer Steiner cell. Let n denote the number of storage cells. With this solution, an EST spanning k ($2 \le k \le n$) cells has at most $k - 2$ integer Steiner cells, which means that at most $2k - 2$ cells are included in the Steiner tree.
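The minimum-spanning-tree baseline mentioned at the start of this subsection can be sketched with Prim's algorithm over cell coordinates. This is only the baseline, not the Winter and Zachariasen EST construction; Euclidean distance between cell coordinates and the function names are assumptions for illustration.

```python
import math
from typing import List, Tuple

Cell = Tuple[int, int]


def minimum_spanning_tree(cells: List[Cell]) -> List[Tuple[Cell, Cell]]:
    """Prim's algorithm rooted at cells[0]; returns (parent, child) edges."""
    remaining = set(cells[1:]) - {cells[0]}
    in_tree = [cells[0]]
    edges = []
    while remaining:
        # Cheapest edge from any cell already in the tree to a cell outside it.
        _, parent, child = min(
            (math.dist(a, b), a, b) for a in in_tree for b in remaining
        )
        edges.append((parent, child))
        in_tree.append(child)
        remaining.remove(child)
    return edges


def max_est_cells(k: int) -> int:
    """Upper bound on the cells in an EST spanning k cells: k terminals plus at most k - 2 Steiner cells."""
    return 2 * k - 2
```

With `cells[0]` taken as the cell of the MS, the returned parent and child pairs give the order in which a combined query message would be forwarded.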

4.4.3 The Keyed Bloom Filter Scheme

Bloom Filter. A Bloom Filter is a popular data structure used for membership queries. It represents a set $S = \{s_1, s_2, \ldots, s_n\}$ using k independent hash functions $h_1, h_2, \ldots, h_k$ and a string of m bits, each of which is initially set to 0. For each $s \in S$, we hash it with all the k hash functions and obtain their values $h_i(s)$ ($1 \le i \le k$). The bits corresponding to these values are then set to 1 in the string. Note that multiple values may map to the same bit (see Fig. 7 for an example). To determine whether an item $s'$ is in S, the bits $h_i(s')$ are checked. If all these bits are 1s, $s'$ is considered to be in S. Since multiple hash values may map to the same bit, a Bloom Filter may yield false positives. That is, an element is not in S but its bits $h_i(s')$ are collectively marked by elements in S.

A Bloom Filter can be used to construct query messages. A basic approach is as follows: After an MS determines the location information of all the storage cells, it builds an EST and gathers the IDs of all the cells covered by the tree. The MS then inserts the IDs into a Bloom Filter, which is sent with other query information to the root cell of the EST using the GPSR algorithm (as shown in Fig. 6c). When a query message arrives at a cell, the cell checks the embedded Bloom Filter to determine which of its neighbors belong to the Bloom Filter, and then forwards the message to them. Recursively, every storage cell receives one query message. Using a Bloom Filter for directed forwarding provides higher query privacy than EST. This is because the Bloom Filter introduces some additional noise cells, including the nonstorage cells connecting the Steiner cells in the EST and a small number of noise cells caused by the false positive rate.

KBF. In the Bloom Filter-based scheme, an attacker can freely check whether a cell is one of the storage cells, although there could be a high false positive rate. To further improve the query privacy, we should disable the attacker's capability of performing membership verification over a Bloom Filter. This motivated our design of a KBF scheme, which uses cell keys to "encrypt" the cell IDs before they are inserted. In this way, an attacker can derive none or only a small number of cell IDs from a query message. As such, the attacker has a negligible probability of identifying the storage cells other than by random guessing. In the KBF scheme, each cell ID is concatenated with the cell key of its parent node in the EST before it is inserted into the Bloom Filter. Specifically, to insert cell ID x, the bits corresponding to $H_i(x | k_p)$ ($i = 1, \ldots, k$) are set to 1, where $k_p$ is the cell key of the parent of cell x. When a query message arrives at a cell, the cell concatenates its own cell key with the ID of each neighboring cell that is not a neighbor of its own parent node (to avoid redundant computation and forwarding), and determines whether the neighbor is in the Bloom Filter. If it is, the message is forwarded to the neighbor. Algorithms 1 and 2 formally describe the ways to create a Bloom Filter and to forward a query message, respectively.

Fig. 7. A Bloom Filter with k hash functions.

Algorithm 1. Create a Bloom Filter

Input: an array of storage-cell Cartesian coordinates c[];

Output: Bloom Filter BF;

Procedure:

  1. initialize a Bloom Filter BF;
  2. build Steiner tree based on c[];
  3. for each cell u in the Steiner tree do
  4.   p = parent of u; k_p = cell key of p;
  5.   map (u | k_p) into BF;
  6. end for
  7. return BF;

Algorithm 2. Forward a query message

Input: a query message received by cell u, which includes a Bloom Filter BF.

Procedure:

  1. k_u = cell key of u;
  2. for each neighboring cell u' of u do
  3.   if u' ≠ parent of u ∧ u' ≠ neighbor of the parent of u ∧ BF contains (u' | k_u) then
  4.     forward the query message to u'
  5.   end if
  6. end for
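Algorithms 1 and 2 can be illustrated together with a small Python sketch. The filter size, the use of SHA-256 with an index byte as the k hash functions, the 8-neighbor grid adjacency, and all names are assumptions for illustration; the Steiner tree is taken as given in the form of a child-to-parent map rather than computed.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


class KeyedBloomFilter:
    def __init__(self, m_bits: int = 1024, k_hashes: int = 4):
        self.m_bits, self.k_hashes, self.bits = m_bits, k_hashes, 0

    def _positions(self, cell: Cell, key: bytes):
        # H_i(x | k): SHA-256 over an index byte, the cell ID, and the key.
        for i in range(self.k_hashes):
            data = bytes([i]) + b"%d,%d|" % cell + key
            yield int.from_bytes(hashlib.sha256(data).digest(), "big") % self.m_bits

    def add(self, cell: Cell, key: bytes) -> None:
        for pos in self._positions(cell, key):
            self.bits |= 1 << pos

    def contains(self, cell: Cell, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(cell, key))


def create_kbf(parent_of: Dict[Cell, Cell], cell_keys: Dict[Cell, bytes]) -> KeyedBloomFilter:
    """Algorithm 1: insert (u | k_p) for every non-root tree cell u with parent p."""
    bf = KeyedBloomFilter()
    for u, p in parent_of.items():
        bf.add(u, cell_keys[p])
    return bf


def grid_neighbors(c: Cell) -> List[Cell]:
    """Assumed adjacency: the 8 cells surrounding c."""
    return [(c[0] + dx, c[1] + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def forward_targets(u: Cell, parent: Optional[Cell], bf: KeyedBloomFilter,
                    k_u: bytes) -> List[Cell]:
    """Algorithm 2: neighbors of u to which the query message is forwarded."""
    skip = set()
    if parent is not None:
        skip = {parent, *grid_neighbors(parent)}
    return [v for v in grid_neighbors(u) if v not in skip and bf.contains(v, k_u)]
```

A cell can test only its own neighbors, because membership of a cell ID can be checked only with the key of that cell's parent. An eavesdropper without any cell key cannot run the membership test at all.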

Query privacy. In this scheme, cell IDs are "encrypted" with cell keys before being inserted into the Bloom Filter. If an attacker has not compromised any cells in the EST, he will not know any cell keys. In this case, he cannot obtain any information about storage cells from an eavesdropped query message. Next, we consider the case that the attacker has compromised some cells in the EST. If a compromised cell is contained in the EST, from the received query message, it can find out which of its neighboring cells also belong to the EST. However, it cannot verify the membership of the other cells. In fact, this is one prominent advantage of the KBF scheme over the EST scheme. To make the EST scheme more secure, a straightforward extension would be to encrypt the EST tree. To enable every cell in the tree to access the information for correct forwarding of a query message, a group key will need to be used to encrypt the EST tree. Thus, an attacker can decrypt the entire EST as long as he can compromise one cell. Clearly, the KBF scheme offers much better query privacy than the EST scheme. The query privacy of the KBF scheme and other schemes are compared in Section 5, and the results show that the KBF scheme has the highest privacy.

4.4.4 Plane Partition

The EST scheme reduces the number of query messages at the price of larger messages. The limited packet size, e.g., 29 bytes in TinyOS, may prevent the MS from piggybacking all the storage cell IDs together with the query information in a single packet. A Bloom Filter may be designed to fit in a packet, but to maintain a low false positive rate, only a limited number of cell IDs should be included in a packet. To address this problem, we use multiple Steiner trees, each of which is encoded into a single packet. Because partitioning a Steiner tree into multiple Steiner trees, known as the minimum forest partition problem, is NP-hard, we propose heuristics to perform the partition. In Fig. 8a, the solid lines represent the EST, and the shaded areas along these solid lines are used by Bloom Filters to encode the EST. An intuitive partition method is to first cluster the storage cells in a top-down and left-right fashion, and then build a sub-EST within each partition. We can let the EST scheme and the KBF scheme have the same partitions and build the same sub-ESTs. After the partition, the MS sends a query to each partition at the same time. In this way, the message size can be reduced. Further, since multiple queries are sent out at the same time, the average query delay is also reduced.

Fanlike partition method. With the intuitive partition, the query message from the MS has to go through some redundant cells. For example, in Fig. 8a, the query message of the MS has to go through many cells before reaching the top partition. To address this problem, we change the Cartesian coordinates into polar coordinates. In this new coordinate system, storage cells are within $[-\pi, \pi]$. The partition algorithm scans the plane from $-\pi$ to $\pi$ and collects enough storage cells into each partition. Fig. 8b shows one example of dividing the plane into three partitions using the Fanlike partition method. The detailed description is shown in Algorithm 3.

Algorithm 3. Fanlike partition method

Input: an array of Cartesian coordinates c[], where s is the size of the array and c[0] is the cell that the MS resides;

Output: Partition Sets;

Procedure:

  1. initiate an array degree[] to store the degree of each cell;
  2. for i ← 1 to s do
  3.   degree[i] ← $\tan^{-1}\left(\frac{c[i].y - c[0].y}{c[i].x - c[0].x}\right)$;
  4.   if c[i] is in the 2nd quadrant then
  5.     degree[i] += $\pi$;
  6.   end if
  7.   if c[i] is in the 3rd quadrant then
  8.     degree[i] −= $\pi$;
  9.   end if
  10. end for
  11. Sort all the cells according to their degrees, and then uniformly divide the cells into the specified number of partitions and put them into a set array A[].
  12. return A;
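As a rough illustration, Algorithm 3 can be sketched in Python as follows. The sketch swaps the arctangent plus quadrant corrections for `math.atan2`, which returns an angle in $[-\pi, \pi]$ directly and avoids division by zero when a cell lies directly above or below the MS. The function name `fanlike_partition`, the tuple representation of cells, and the rule for spreading leftover cells over the first partitions are assumptions made for this example.

```python
import math


def fanlike_partition(cells, ms_cell, num_partitions):
    """Sort storage cells by polar angle around the MS cell and split them
    into num_partitions groups of (nearly) equal size.

    cells: list of (x, y) storage-cell coordinates
    ms_cell: (x, y) coordinate of the cell where the MS resides
    """
    def angle(cell):
        # atan2 covers all four quadrants, so no manual +pi / -pi fix-up
        return math.atan2(cell[1] - ms_cell[1], cell[0] - ms_cell[0])

    ordered = sorted(cells, key=angle)

    # Uniform division: the first `extra` partitions get one more cell.
    base, extra = divmod(len(ordered), num_partitions)
    partitions, start = [], 0
    for p in range(num_partitions):
        size = base + (1 if p < extra else 0)
        partitions.append(ordered[start:start + size])
        start += size
    return partitions
```

With 17 storage cells and three partitions, as in Fig. 8, this rule yields groups of 6, 6, and 5 cells; how the original implementation distributes the remainder is not stated.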

Fig. 8. Seventeen storage cells are partitioned into three parts.

(a) Intuitive partition. (b) Fanlike partition.

4.5 MS Data Processing

Through the above query process, an MS can retrieve the message of his interest, which is encrypted with the cell key of the detection cell. To process the event, the MS needs to decrypt the message first. However, to prevent selective compromise attacks, in our design the ID of the detection cell is also encrypted. As such, the MS has to try the cell keys one by one until the decrypted message is meaningful (e.g., it includes a source cell ID and follows a certain format). The average number of decryptions is $N/2$. Though this may not be a big issue for a laptop-class MS, which can perform about 4 million en/decryptions per second [58], we will continue to design more efficient ways in our future work. Another concern in pDCS is the number of keys that an MS must possess when it needs to decrypt data from many cells. If we assume that the MS cannot be compromised, we can simply load it with a single key, the initial group key $K_I$. From this initial key, the MS can derive the cell key $K_{ij}$ of each cell $(i, j)$ as $K_{ij} = H(K_I, i \| j)$. This is, however, dangerous if the MS could be compromised, because all the cell keys would be exposed. This problem can be relieved in the following way. Instead of applying its cell key for encryption directly, every node may first use a hash function to derive variants of its cell key (variance keys) for specific events or time intervals. The variance keys are then used to encrypt event messages. The MS is loaded only with the variance keys for the events of his interest. If the MS is compromised, the other variance keys remain secure.
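The key derivation described above might be sketched in Python as follows. The hash function $H$ is not specified in this section, so HMAC-SHA256 is used here as an assumed stand-in; the fixed-width encoding of $i \| j$, the event label format, and the function names `derive_cell_key` and `derive_variance_key` are likewise hypothetical.

```python
import hashlib
import hmac
import struct


def derive_cell_key(group_key: bytes, i: int, j: int) -> bytes:
    """K_ij = H(K_I, i || j), with HMAC-SHA256 assumed for H.

    i and j are packed as fixed-width 16-bit integers so that the
    concatenation i || j is unambiguous, e.g. (1, 23) differs from (12, 3).
    """
    return hmac.new(group_key, struct.pack(">HH", i, j), hashlib.sha256).digest()


def derive_variance_key(cell_key: bytes, label: bytes) -> bytes:
    """Derive a per-event or per-interval variance key from a cell key."""
    return hmac.new(cell_key, label, hashlib.sha256).digest()


# An MS that is trusted only for one event type would be loaded with
# variance keys such as this one, never with K_I or the cell keys.
k_i = b"initial-group-key (example value)"
k_34 = derive_cell_key(k_i, 3, 4)
k_34_elephant = derive_variance_key(k_34, b"event:elephant")
```

Because the derivation is one-way, an attacker who extracts `k_34_elephant` from a compromised MS cannot feasibly recover `k_34` or the variance keys for other events, which is the property the paragraph relies on.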

Result analysis

In this section, we evaluate and compare the performance of three query schemes: the Basic scheme, the EST scheme, and the KBF scheme. In our simulation setup, each query message contains the query information and the encoded query path. The query information occupies 4 bytes, which represent the time and event (footnote 2), and 25 bytes are used to represent the query path. For evaluation purposes, we do not consider the overhead of source authentication. In the EST scheme, the query path is encoded as a Steiner tree. Each ID is represented by 2 bytes, so only 12 cell IDs can be encoded in each packet. In the KBF scheme, the 25 bytes are used to encode the query path with a Bloom Filter, which is expected to achieve an acceptable false positive rate, say 0.1. Considering these limitations, we choose $(n, k) = (20, 5)$.

These schemes are evaluated under various storage cell densities, ranging from $1/40$ to $1/2.5$. The storage cell density is defined as the ratio of the number of storage cells to the total number of cells in the plane. For example, with our setting of $20 \times 20$ cells, a density of $1/10$ means that there are about $400 \times 1/10 = 40$ storage cells.

Four metrics are used to evaluate the performance of the proposed schemes: the number of query messages, the average query delay, the maximum query delay, and the message overhead. The number of query messages is the total number of messages sent out by the MS for a query. The average query delay is the average of the query delays for the different storage cells. The maximum query delay is the maximum among all the query delays. The message overhead is defined as the total number of hops traversed by all the messages sent out by the MS to serve a query. In the KBF scheme, the message overhead also includes the extra messages caused by false positives. As query messages are forwarded in the network in a hop-by-hop fashion, the number of query messages and the message overhead also proportionally reflect the communication cost incurred by the sensor nodes.
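The packet budget and the four metrics can be restated as a short Python sketch. The constants come from the setup above; the function names are hypothetical, and `min_est_packets` is only a lower bound, since any Steiner cells added to the tree would also consume ID slots.

```python
import math
from statistics import mean

PACKET_BYTES = 29                                    # TinyOS payload size
QUERY_INFO_BYTES = 4                                 # time and event
PATH_BYTES = PACKET_BYTES - QUERY_INFO_BYTES         # 25 bytes for the path
CELL_ID_BYTES = 2
IDS_PER_EST_PACKET = PATH_BYTES // CELL_ID_BYTES     # 12 IDs per packet
GRID_CELLS = 20 * 20                                 # 400 cells in the plane


def storage_cells(density: float) -> int:
    """Approximate number of storage cells for a given density."""
    return round(GRID_CELLS * density)


def min_est_packets(num_storage_cells: int) -> int:
    """Lower bound on EST packets (ignores extra Steiner cells)."""
    return math.ceil(num_storage_cells / IDS_PER_EST_PACKET)


def query_metrics(hops_per_message, delay_per_storage_cell):
    """Compute the four evaluation metrics for one query.

    hops_per_message: hops traversed by each message the MS sent
    delay_per_storage_cell: query delay observed at each storage cell
    """
    return {
        "num_messages": len(hops_per_message),
        "avg_delay": mean(delay_per_storage_cell),
        "max_delay": max(delay_per_storage_cell),
        "message_overhead": sum(hops_per_message),
    }
```

For the density of $1/10$ used in the example, `storage_cells(1 / 10)` gives 40 cells, which need at least four EST packets.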

5.1 Choosing the Partition Method

In this section, we evaluate the performance of EST with the intuitive partition and EST with the Fanlike partition. As shown in Fig. 9, the Fanlike partition method outperforms the intuitive method in terms of average query delay, maximum query delay, and message overhead. We do not show the number of messages, since both methods produce the same number of messages, which is determined by the packet size. As discussed earlier, in the intuitive partition method, each query message sent from the MS to its partition may go through many redundant cells, which increases the message overhead. In the Fanlike partition, fewer redundant cells are involved, and hence the message overhead is lower. This also explains why the Fanlike partition has lower average and maximum query delays than the intuitive partition.

Footnote 2 (on the 4-byte query information): Some applications may require more bytes; nevertheless, since we are interested in the comparative results of multiple schemes, the payload size normally does not affect the comparison much. Further, the time should be expressed at the hour/minute level rather than the microsecond level, and hence needs only a small number of bits.

Fig. 9. Performance comparisons between different partitioning schemes. (a) Average query delay. (b) Maximum query delay. (c) Message overhead.

In Fig. 9a, with the Fanlike partition, the average query delay drops as the storage cell density increases. This can be explained as follows. When the storage cell density is high, each partition is small. Therefore, the Steiner tree is confined to a small range, and the zigzag paths from the MS to the storage cells tend to be shorter. This results in smaller average query delays. The same reason explains why the maximum query delay decreases as the storage cell density increases for the Fanlike partition in Fig. 9b. However, when the density is very low ($1/40$), the intuitive partition has a slightly lower maximum query delay than the Fanlike partition. We checked the simulation trace and found the following reason. Because of the use of Steiner cells and the limit of 12 cell IDs per packet, a very small number of cells (one or two) are left over for a second packet. These leftover cells tend to be faraway in the intuitive partition method but not in the Fanlike partition. As a result, the intuitive partition can achieve a slightly shorter maximum delay than the Fanlike partition method when the storage cell density is very low. We also evaluated the performance of the KBF scheme under both partition methods. The results are similar to those for EST, with the Fanlike partition performing better. Thus, we use the Fanlike partition method in the following comparisons.

5.2 Performance Comparisons of Different Schemes

This section compares the performance of three schemes: the Basic scheme, the EST scheme, and the KBF scheme. Fig. 10 compares the number of messages and the message overhead of the three schemes. As can be seen, both optimization schemes (EST and KBF) outperform the Basic scheme, since the optimization schemes combine several messages into one. We can also see that the message overhead of the KBF scheme is higher than that of the EST scheme, although both schemes have a similar number of messages. This is because the query messages in the KBF scheme may go through some redundant cells due to false positives.

Figs. 11a and 11b compare the average delay and the maximum delay of the three schemes. As can be seen, the Basic scheme outperforms the other two. This is because in the Basic scheme, the query messages are sent directly to the storage cells in parallel along shortest paths, resulting in a lower query delay. Although EST and KBF reduce the message overhead, the query delay increases, since each message has to go through many intermediate cells sequentially. As shown in Figs. 11a and 11b, when the storage cell density is low, KBF outperforms EST in terms of query delay. To explain this, we need to understand the effect of the number of partitions. When the number of partitions is small, and hence each partition is large, the path to each storage cell zigzags more, which may result in a long delay. As shown in Fig. 10a, when the density is low, EST has fewer messages and hence fewer partitions, which means that EST has large partitions and a long delay. Similarly, when the density is high, EST has more partitions and a shorter delay.

In addition, as shown in Fig. 11c, the KBF scheme has the highest query privacy. Even after $s = 20$ cells have been compromised, the query privacy level is still above 83 percent. In summary, there is a tradeoff among query delay, message overhead, and query privacy. The Basic scheme has the lowest delay but the highest message overhead and the lowest query privacy. The EST scheme and the KBF scheme can significantly reduce the number of messages and the message overhead with the same level of query delay. In particular, the query privacy level of KBF is far higher than that of the other schemes.

6 CONCLUSIONS

In this paper, we proposed solutions for privacy support in DCS networks (pDCS). The proposed schemes offer different levels of location privacy and allow a tradeoff between privacy and query efficiency. pDCS also includes an efficient key management scheme that provides a seamless mapping between location keys and logical keys, and several query optimization techniques based on EST and Bloom Filters to minimize the query message overhead and increase the query privacy. Simulation results verified that the KBF scheme can significantly reduce the message overhead with the same level of query delay. More importantly, the KBF scheme can achieve these benefits without losing any query privacy. To the best of our knowledge, this is the first paper to address privacy issues in DCS networks. As this is initial work, we do not expect to have solved all the problems. In the future, we will address other issues such as source anonymity, and look into other query techniques to balance the tradeoff between query delay and message overhead. Techniques for initial key setup that do not rely on a short safe time period are also needed.