Nearest Neighbour Search On Encrypted Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Cloud computing services which enable designers and individuals to outsource the authority of their data to a service provider, and to reduce maintenance cost. The spatial queries act as an additional threat to privacy, because the query location may be comfortable to reveal sensitive information about the query. Only authorized user is allowed to access the query, even though service provider not able to view the query. This paper focuses on adducing a novel k nearest neighbor search on the encrypted database. It describes the strong location privacy, that renders a query identical in data space from any location. Due to communication cost in query processing, existing work fails to hold this search. We include a method that endeavors strong location privacy, by amalgamate Metric Preserving Transformation (MPT). Empirical results reveal the efficacy of the adduced methodology as compared to the related works.

Key words:

Protection, Query Processing, Security, Data Privacy, Encryption, Nearest Neighbor Search.


The positioning capabilities in mobile devices is location-based services (LBS), it is considered as the next killer application in the wireless data market. LBS allow client to query based a service provider in all around manner, concerning to retrieve detailed information about the point of interest. Association exploration has wide applications in different database systems, similar as multimedia databases, biological databases and many various systems, similarity is often defined as distance function. The most ordinary problem in this field is k- nearest neighbor which search to retrieve k most objects to the query object. The problem in distributed setting where consider as S it is distributed between multiple parties, which disclose their private data. These problems occur when data to be mined, for eg. Commercial secrets, proprietary, a national security effects and preserved private legal requirements. This process uses to develop the privacy preserving protocol for data mining. The particular domain is nearest neighbor performing privacy protocol it includes comparisons among chemical architecture, biology, a location privacy.

The authors of [10] proposed adequate secure protocols for this problem. This problem additionally identi¬ed some shortcomings of their solution, it includes the discharge information about private data and quadratic complication. Furthermore, we use techniques which allow for provably and efficient privacy applications of multi-step powerful approach is given in [8], the higher dimensional situations is performing an accurate distance among two objects and has low accurate but has particularly inexpensive distance function that have lower bound on actual distance. With privacy preserving benchmark kNN protocol and feature distance, our work can decrease the number of times which needed for application edit distance protocol. In our work, can can be used to improve the nearest neighbor algorithm that preserves the privacy, as KNN classification and outlier detection. In our work, we focus on k Nearest Neighbor queries concentrate at highest privacy, which has strong location privacy. The existing techniques can contribute a degree of location privacy. This can be classified into three concepts: Location confusion, metric preserving transformation and data transformation. In our proposed work, kNN search with strong location privacy has two main elements: MPT and query plan. We dispute that no method can support KNN query providing location privacy. Additionally confusion techniques the location based service can constrict the client in a subspace of overall domain. MPT approaches developed a simple query original, this process retrieves the database block from LBS without discovering that the block are retrieved. This original process can oppose to pattern attacks. Then, the client use to reduce a spatial query to a set of private retrievals.

To the best of our acquaintance, the MPT method deals with KNN process search, our proposing algorithm may involve a flexible number of MPT method block can retrieves the spatial query. Even though, each query retrieval is completing a private process, then the MPT approach can request KNN query process it may acknowledge the information, which is parallel to data transformation techniques. Therefore, this method disobeys the strong location privacy. Thus, it renders all queries equal in satisfying the strong location privacy. However, this process can handle single NN queries. Moreover, this approach can lead to computational process and communication cost is very small in POI databases, due to relies on an expensive MPT.

The remainder of this paper is organized as follows. Section 2 gives a deliberation on the related work. Section 3 presents the proposed approach to the problem and introduce a perceptive privacy. Section 4 presents the experimental results and Section 5 concludes the paper with pointers to future work.


Some of the notable research efforts have been made in the previous research work. Shashank et al. [7], this paper focuses on the content based multimedia retrieval is an excess of encrypted database it can be in both the query and database, which the document is encrypted and their privacy is protected. This method is proposed in this paper allow efficient and it retrieved directly in the encrypted domain, but it should be without multiple round of communication method between the user and server. They have to reveal the proposed method using the image, even though this method is valid to other multimedia such as video. Several algorithms were proposed [11,15], however they have to assign that n input value points the query point q as public. Currently consider the problem is dispersed setting where S is distributed among the several parties, who wish to release their private data to others. This occurs while data mining is sensitive, for eg : containing marketable secret, national security significance, to be reserved privately for legal supplies. This motivates the expansion of a private protocol for data mining.

The author [3], described a method of DBH for estimating nearest neighbor retrieval in arbitrary spaces. The DBH hashing method can create several hashing tables into which database object and query processing object are mapped. The formulation is applicable for distance measures and arbitrary spaces in the key feature of DBH. It is inspired by LSH, DBH can create a technique that allows some key concept in LSH that has to be applied in arbitrary spaces. The key value is DBH and LSH can be applied to locality responsive in hashing functions that can be demonstrated to exist, it can use a family of binary hashing functions that is defined as a distance based function and constructed in any space. This indexing recital cannot be analyzed using the geometric properties and optimization is based on sample data. In paper [9], they solve the problem of (PPGQ) privacy preserving graph query in cloud computing. This problem has to reduce the subgraph isomorphism and they adopt the competent standard of filtering and verification that can prune to many relative graph data as possible before the verification process. The feature based index is built for feature related sequence about each and every encrypted graph. Then, this process needs to select the most efficient technical inner creation as the pruning instrument to carry out the filtering procedure. This functionality is needed to gain the index construction in each and every associated binary vector as subindex, each bit value is represented as the parallel feature is a subgraph isomorphic to its data graph. Then the query graph is also defined as a binary vector in each bit value that is parallel to the query graph. This inner creation of the query vector is used to filter this negative value and do not contain the query graph. This k-anonymity model is defined in [13] and applied in the privacy preserving publication of the data set process. This can be generalized in least k tuples in the table value. But object cannot be distinguished from k-1 objects. Then, it is frequently used to simplify the medical records of patients so that the challenger cannot link a detailed medical record. But for some person-related data like DNA data, the majority of the metric data that composed of nature slightly than from people. Rakesh Agrawal and Jerry Kiernan [4] summarized the order preserving encryption system, in OPES, that allows queries with evaluation operators to be directly applied to encrypted numeric columns. The query results neither hold any false positive nor miss some answer tuple. The new values can be added without triggering changes in the encryption of additional values. The OPES is intended to function in environments in that the interloper can get contact to the encrypted database, however does not have priority in sequence such as the allocation of values and cannot encrypt or decrypt arbitrary morals of his choice. The authors [1], described a multi-step query processing algorithm for the KNN search using the lower and the upper bound in the filter step. Then, this process shows that is optimal algorithm can produce and maintains the minimum number of the candidates that needs to be refined. Due to this, they can use to generalize the R-optimality in the distance estimation that is available in the filter into a different account. On paper [6] discussed abut the revisit of the security problem in nearest neighbor. Then they show that the insecurity of existing solutions and main is the hardness of the problem. They designed a new model of partition based secure voronoi diagram method (SVD). This technique is used to secure the encryption function and its uses and it can secure several encryption schemes to be employed in the SVD techniques. Subsequently, the paper [14] described a first effort of the content based retrieval in excess of the encrypted database. By image database can be focused on the building protected search indexes that can be privacy for the image in the content from the server side and it can preserve the ability of same comparison. There are two types of securities in indexing scheme,

Secure is inverted index

The secure is minimized in the hash sketch method

By using this method they can achieve a good retrieval in performance over the encrypted image indexes and server is mentioned as good for privacy preserving method in multimedia retrieval. In [5], an easy but effective method has been planned to encrypt a search for index consisting of pairs. The objective is to let the parallel value to be retrieved condition and only if an applicant input is provided. The necessary plan is to encrypt absolute tuples, however connect with each tuple the one-way hash charge of its key. Consequently, refusal tuple will be retrieved if an unacceptable key is present. Attractive schemes are projected to hold keyword searches over an encrypted text repository is given in [7]. The driving application for this work is the ef¬cient retrieval of encrypted email messages. Obviously, they do not discuss relational queries and it is not clear how their techniques can be adapted for relational databases. In [12], a smart card with encryption and query processing capabilities is used to make sure the official and protected retrieval of encrypted data stored on untrusted servers. The smart card can interpret exact match queries into equal queries over encrypted information. Though, range queries require creating a disjunction for every possible value in the range, which is infeasible for real data values. The smart card implementation might bene¬t from our encryption plan in that variety queries might be translated into equivalent queries in excess of encrypted data.


In our proposed work, a location based service acquire a database and the client desire to issue a NN query in database without revealing its location. It constructs an index structure on the database. Then, LBS integrate database with index and arrange them in m which is disjoint databases , where m () count on the proposed solution, this decomposition will become clear soon. Each Is consists of a set of blocks Bi, 1, Bi, 2… all in equal size. Considering black box it implements a query original Qi, j 1 archive on DBi. Then, the results are denoted as Ci, j computed version of jth block DBi Qi, j and Ci, j are understandable only by client and secure hardware. Fig 1 shows overflow of our methodology.


In this method Q be the client KNN query. Client process uses to execute an algorithm, that process Q in informed multi step method. This algorithm is initially has set of blocks to retrieve privately for the location based services. Consequently, the client process uses to send the LBS parallel set Qi, j, Qu, v…… MPT queries. This process the query and send replies in Ci, j, Cu, v…. These block processes contain the index or results data, that algorithm is used to determine the blocks which is to be retrieved in the next step. Then, the above process is frequent until the collection of Q results. A KNN query process translates to MPT query. The two obligatory supplies for the security of KNN query.

Databases must be queried in the identical order

Database access in the category must involve the same number of MPT query, LBS must create a query processing due to these supplies.

The location based service creates Query processing in offline pre-processing stage and it's widely available. While generating MPT queries, the query algorithm at client side takes into description query processing. While completing this process QP has guaranteed the successful result retrieval in any query data space.

3.1 Threat Model and Protection

We assume that query algorithm, in our framework primary privacy is strong location privacy. In this process we do not look for database to protect. Hence, we suppose that database and index value are not encrypted. Our proposed model provides strong location privacy for the database. In MPT Qi, j and Ci, j do not reveal information about the parallel request block Bi, j to any client and the query processing force to every KNN query process in same MPT retrievals on the database in similar order. Thus, the KNN query process becomes identical.

3.2 BenchMark Explanation

We plan the solutions called benchmarks and computing a query processing is classifying to implement strong location privacy in databases. It describes a KNN algorithm.

3.2.1 KNN Algorithm

Let database is a POI database, then we defined P ∈ DB has it is the unique identi¬er of P, that are denoted as P's and P. tail where represents as additional database connected with P. Location based services creates a g-g grid matrix G, over the POIs. This process can construct two databases DB1, DB2 includes B1, i and B2, i blocks. It's used to determine by MPT protocol. Then, we focus on DB1. In every cell c ∈ G, the LBS process creates a block B, it stores an entry for POI in c, it has P.x, P.y both are similar meaning. Moreover, B cannot provide the entries of all POIs, and it LBS can create additional blocks which form a list in the B. Thus, the LBS process can store each cell in DB1. Then the further block is appended at the end of DB1.

The KNN algorithm process runs on the client and it consists of two phases. CPM is the first phase it implements the grid based KNN method. This process retrieves the cell from the grid in ascending order number from the query. These techniques is always leads to the optimal retrieves cell, that has circle centered at the query, through radius distance to kth NN. While the method is determines the cell It should be accessed and all blocks with .

This is possible because the client process can identify head block with In DB1 it is described as i-1, g+j. The KNN algorithm is determined as a second phase, its result is based on DB1 entries which is retrieved in the first phase. Hence, DB2 blocks locate the results tails using the ptr pointer process.

3.3 Transformation Methods

We present the transformation methods and the equivalent query processing methods. In our work we propose a three transformation method EHI, MPT, FDH.

They have various tradeoffs between data privacy and query cost and accuracy. Then, they're extensions of MPT and FDH in order to comply with the gap guarantee.

3.3.1 Encrypted Hierarchical Index Search (EHI)

In Encrypted Hierarchical Index Search (EHI) introduce a client algorithm for developing NN search on EHI, which is stored in the server process. In this technique it offers data privacy for the data owner, however this approach incurs the multiple communication round trips throughout the query processing. Query Processing

There is tree index method which is stored on the server and its encrypted, then the server cannot process the NN query by itself. A algorithm is developed to answer the NN query process correctly for communication among the server and clients' needs. The hole response time of algorithm contains round trip latency and data transfer time. And these two measures look for analogous to time and relocate time in hard disks. The data relocate time is minimized by best first NN search algorithm. Though, this process needs to send a message to the server in each time while a node is requested, this process would acquire very high in round trip latency. Moreover, it is used to enhance the best first NN search algorithm in client-server design.

3.3.2 Metric Preserving Transformation (MPT)

In our proposed work, we introduce approaches which shift search functionality to the server. Our proposed work Metric Preserving Transformation stores qualified information on the server, along respect to a private set of anchor objects. This technique guarantees correctness of the final search result, however at the cost of two round communication. Dissimilar the EHI method and MPT acquire only 2 rounds of communication in the query phase. The fundamental idea behind the MPT is a small subset of dataset T and the set of anchor objects is assigned to all objects of T in nearest anchor. Then, all object pointers have to compute its distance dist from its anchor ci need to apply an order preserving encryption function OPE on the distance value. Then, the order preserving encrypted distance is used to store on the server and it utilized for processing NN queries. Transformation In MPT

Algorithm process the pseudocode for implementing MPT from the input dataset T. The A is also denoted the number of anchor objects. In our proposed we can concern the M-tree to select a set of anchor A from T. In this heuristic is aims to optimizing the quality of anchors. In next line 2, we have to calculate the value of B its defined as the maximum number of the objects, that can be assigned to the equal anchor. Subsequently, we can apply a heuristic have to allocate each object to its anchor objects. Next, we observe each anchor object. Ci denotes the ith anchor object and set the value ci, the value S is assigned as set of objects. The total anchor value is covered by radius ri, that can be denoted the maximum distance from ci to any object value it set as ci : S. In query processing anchor distance plays an important role. We compute distance dist (ci ; t) from anchor, for each objects p in set of ci : S and need to apply an order preserving encryption function OPE on (dist (ci; t)) and encrypted object is ECR (t: CK) it will send to server. The profit of using OPE is hiding the inventive distance and then allows comparisons correctly to evaluate at the server side process.

Algorithm: MPT Building Algorithm for Data Owner

3.3.3 Flexible Distance-based Hashing (FDH)

In Flexible Distance Based Hashing we propose a hashing techniques for processing the NN query. The main advantage of this FDH technique is a server, which always return a set of constant sized candidate set. Client process then refines the candidate to gather the final results. Moreover, FDH techniques not guaranteed to return the exact final results, that is close to the actual NN practice. In this query processing, the FDH allows the client side process to specify the integer limitation for increasing the accuracy value of query result, server can be stored without rebuilding the data transformed. In our earlier work development query accuracy cannot be built the index structure.

In our proposed work, FDH techniques can employ the novel method for linking same hash buckets and it has to minimize the transformed data for answering queries.


In order to test the results of our adduced work, we organized several executions on the synthetic sequence of tasks. We proposed the methodology with the model of nearest neighbor search on the encrypted database. We include real data set methods, a measurement and parameters. Our concept includes the effect of various factors on quality and performance of the distinct methods.

Fig 2: Time Complexity for Yeast

Fig 2 depicts the ranking of results kNN on the datasets YEAST. Recognize that the result rank of MPT is better than EHI and FDI in the time complexity for YEAST. Considering time complexity, largest ranks for EHI much more than FDI, MPT. Moreover, a certain time has a large rank on the dataset EHI. According to our proposed work MPT is better in time complexity for YEAST.

Fig 3: Query Cost for YEAST

Fig 3 exhibits the effect of deviating the number of anchors that are used in the EHI, MPT and FDH techniques. In this graph, results show the effects on query cost for YEAST. We recognize a regular improvement in query cost with increasing the numbers of anchors. This affords more information on the location of objects due to more anchors. Moreover, it's reflected in the approximation quality of EHI that improves with increasing number of anchors.

Fig 4: Time Complexity for MUSH

Fig 4 depicts the ranking of results kNN on the datasets MUSH. Recognize that the result rank of MPT is better than EHI and FDI in the time complexity for MUSH. Considering time complexity, largest ranks for EHI much more than FDI, MPT. Moreover, a certain time has a large rank on the dataset EHI. According to our proposed work MPT is better in time complexity for MUSH.

Fig 5: Query Cost for MUSH

Fig 5 exhibits the effect of deviating the number of anchors that are used in the EHI, MPT and FDH techniques. In this graph, results show the effects on query cost for the MUSH. We recognize a regular improvement in query cost with increasing the numbers of anchors. This affords more information on the location of objects due to more anchors. Moreover, it's reflected in the approximation quality of EHI that improves with increasing number of anchors.


In this paper, we derived an adept methodology for nearest neighbor search on encrypted databases, that distribute a query corresponding from any location in the data space. We propose amalgamate solutions that decompose a kNN query into a series of database block retrievals. In our proposed work, we introduce a Metric Preserving Transformation (MPT) approach, which stores the relative information on the server along private set of anchor objects. The cost of two round communication in MPT guarantee correctness of final search result. Then, the Flexible Distance-based Hashing (FDH) methods ¬nishes a single round of communication, it does not guarantee retrieval of the exact result. The experimental evaluation shows that the approximation is very close to the exact result.