Keyword Based Search Over Outsourced Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Cloud computing is one of the best beguiling technology domains referable to its cost-efficiency and flexibility. Therefore, outsourcing data to the cloud servers become much reliable. But unencrypted data that are stored in the cloud are insecure so it is more in force to encrypt the data before outsourcing. Many encryption techniques are used; merely searching on encrypted data is a difficult task. So many algorithms where used for searching the encrypted content to make much efficient for the user, while retrieval of data. Keyword based search on encrypted data is much efficient and fast. The survey is done on searching algorithms to find out the best efficient methodology for searching the encrypted data that have been outsourced.

KEYWORDS:-Cloud, Encrypted data, Keyword based search, Outsourced data, Survey.


Cloud computing, the most trendy computing in information technology where everything is based on on-demand service and pay-for-use service. It is bringing of computing services such as SaaS, PaaS and IaaS over the internet that are supervised by arbitrator at outback locations. Many applications such as emails, file storage, business data, etc. are outsourced to cloud server. Outsourcing unencrypted data to cloud by the owner is not much secure because server may leak information to cyberpunks. Thence encryption plays a major role before outsourcing the data into the cloud server. In spite of encrypting, retrieval of data becomes an intriguing task when searching has to be made on vast data. The best way is to use keyword based search on encrypted data for data concealing.

Many searchable techniques has been proposed on the basis of keyword search .Discussion is made on the existing techniques that are been intend by many authors .This paper analyses the algorithms for searching the encrypted content. Survey is made on these algorithms based on the working principle, merits and demerits. It also compares the complexity, efficiency overhead of various algorithms and shows which technique is better to handle while retrieving the encrypted content.


Symmetric Key Cryptography

Symmetric key cryptography works by encrypting each word in a file using two layered encryption construction. Probabilistic searching is made on the encrypted data which deals with sequential scan and indexing methodologies Provable secrecy, controllable searching, hidden queries, query isolation (D.Song et al.,2003) are the four techniques which makes the algorithm efficient, simple and fast. Sequential scan meets all the above techniques but it is not effective when searching is made on huge data content. Therefore to induce searching pre-computed index plays a vital role which support advanced search queries. But to make indexing technique secure, Secure Index data structure can be used which admits queries with a trapdoor. It is semantically secure and practicable in multi-user settings where indexes are updated frequently on the remotely located server.

Public Encryption Keyword Search (PEKS)

PEKS, searchable encryption technique which corresponds to symmetric key encryption .In this, file is encrypted using public key by the people who wants to store in the server but the authorized users can search a file using their private key. Fig 1 illustrates the working principle of PEKS. The four algorithms are used for this technique. First, keyGen is used to generate public key and private key pair for both server and user. Second, PEKS algorithm produces searchable encryption. Third, Trapdoor algorithm is used to calculate trapdoor with private key and keyword. Fourth, Test is used to match the keyword and requested word. If matches then the file are sent to the user .PEKS means Identity Based Encryption (IBE) which has major advantages such as

Effective system based on Diffie-Hellman problem

Limited system based on general trapdoor definitions

This scheme fails regarding access policy and dictionary attack. The major disadvantages are trapdoor contains meaningful keywords and one-one mapping takes place between trapdoor and keyword.

Figure 1. PEKS Working

Hidden Vector Encryption (HVE)

HVE supports continuative queries (Boneh et al., 2007) whereas PEKS supports only comparison and subset queries. In which it works with four algorithms namely Setup, Encrypt, GenToken and Query .First, Setup creates a bilinear group of elements using random primes and random elements, Second, Encrypt chooses the random element and using public key it encrypt the contents in a file. Third, GenToken will generate the token for the predicate using a secret key. Fourth, Query finds the keyword from the cipher text and if matches it return the file. Even so HVE fails for disjunctive queries because cipher text is linear to attribute.

Attribute Based Encryption (ABE)

ABE affirms one upload many download policy most formally PEKS and HVE does not support. ABE uses access policy while searching on encrypted data with its Boolean expressions. It works on the basis of nine algorithms .The first is the Setup algorithm used to compute secret key and master key by the trusted authority. Second, KeyGen algorithm is used to generate the public/private key pair. Fourth, fifth and sixth algorithm such as PseudoGen, Encrypt, AttrScm are used for outsourcing the data using cryptographic primitives such as access structure, bilinear maps and attribute scrambling procedure. Seventh, eighth and ninth algorithm such as query, retrieve and decrypt is mainly for the retrieval of data. In which query algorithm works as the retriever take on pseudonym list from the cloud service provider and receiver sends the scrambled index to the CSP. Then the CSP checks whether the request made by the retriever and the encrypted index stored are same by using Retrieve algorithm. If it matches decrypt algorithm works where the encrypted data are decrypted and sent to the retriever. It provides best quality for searching over encrypted data and faster in accessing.

Predicate Privacy Preserving in Public Key Encryption

Predicate privacy preserving search on keyword get the better of PEKS by using randomization technique. In which keywords are randomized and therefore trapdoors does not provide any meaningful keywords. The user and the receiver share a secret key which is not logical when there are huge number of users .To make tolerant of guessing attacks, two framework were introduced namely PEKSrand-BG for brute-force guessing and PEKSrand-SG for statistical guessing.

PEKSrand-BG provides a proxy server which in advance processes the PEKS cipher text from the sender. The working principle illustrates in fig 2.

PEKSrand-SG has two methods Proxy Farm and Random Walk, in which several proxies can be maintained for storing the secret key and indirect mapping between keyword and trapdoor respectively.

Therefore overall communication, computation and storage overhead are sensible when predicate privacy made in PEKS.

Fig 2.PEKSrand-BG working principle

Privacy Preserving Keyword Search

It is a multi-round protocol between server and user on single keyword .It uses per index file where each document contains a keyword. The keyword index is encrypted using pseudorandom bits using heuristic pseudorandom functions. On the setup phase user chooses a random secret key to encrypt the file. Then the user submits index and file content to server .On the retrieval phase, when the user wants to search or retrieve file from the server, user retrieves the index file and then computes keyword with the secret key .The computed key is sent to server, where server matches the file and then sent to the user .The per-index file scheme using pseudorandom functions is the better than using bloom filters. This scheme fails when multiple keywords are used.

Secure Privacy Preserving Keyword Search (SPKS)

SPKS allows cloud service provider to decrypt the data and return file containing keywords. This technique overcomes the computation and communication overhead, provides query and data privacy for the users. It figures out six algorithms for efficient searching on encrypted data. Fig. 3 shows the working principle. First, KeyGen used to generate a public/private key pair. Second, EMBEnc&KWEnc encrypts all the content in the file and keywords are encrypted respectively which then stored in the server. Third, Tcompute used on the retrieving phase where user generates a trapdoor and pass it to CSP. Fourth, KWTest checks whether the keyword contain in the encrypted data. Fifth, PDecrypt mainly for CSP to decrypt the intermediate result partly and sends the cipher text and the partial decrypted content .Sixth, Recovery runs by the user to decrypt the plain text. Therefore it provides semantic security in plain text attack.

Fig 3. SPKS Working Principle

Authorized Private keyword Search (APKS)

APKS deals with multi-keyword search while above techniques conducts with single keyword which misses query flexibility and efficiency (M.Li,2011). In fine-grained authorization framework, every user obtains searching capabilities authorization from Local Trusted Authority (LTA). Hierarchical Predicate Encryption (HPE), a cryptographic primitive uses attribute hierarchy for simple range queries.

APKS based on HPE

The following steps are required while searching takes place in encrypted data using multi-keyword

Multi-dimensional query are converted to its CNF (Conjunctive Normal Form) formula.

Attributes are defined in a hierarchical way. i.e., attribute hierarchy.

Indexes and capabilities are generated by GenIndex and GenCap algorithm respectively.

When the user wants to retrieval a file using a keyword from LTA, LTA checks whether the user has an attribute value set and if it matches then user can retrieval the file from the server. But the major disadvantage is APKS also does not prevent keyword attack.

APKS+ adds a secret key while encryption and decryption takes place which hides the data from the attackers. Therefore it prevents dictionary keyword attack and accomplishes index privacy and query privacy.

Fuzzy Keyword Search

It enhances system usability when searching input exactly matches. Keywords are measured using edit distance and fuzzy keyword sets are constructed .Straight forward and wild card based are the two approaches are dealt with edit distance .In straight forward approach edit distance are calculated where all the forms of keywords are to be listed .Based on this indexing is built .Trapdoor are shared between user and the owner .While retrieving file user computes the trapdoor based on the request, server compares with index table and return all potential identifiers. The major disadvantage is large storage is needed and so lack of efficiency .Wild card based approach overcomes the disadvantage by building a wild card fuzzy sets which calculates edit distance , keyword takes place at the same position are put together in a set.

The above all techniques based on searchable encryption supports only Boolean search which has two major drawbacks. They are,

Retrieving the file based on the keyword user wants to decrypt every file that contains the keyword to match their file.

Retrieving all files that contains the keyword leads to network traffic.

Ranked Keyword Search over Encrypted Data

The major disadvantage of above mentioned techniques gets the better of in ranked keyword search. Ranked Searchable Symmetric Encryption (RSSE) framework is used to support rank search which built over the SSE cryptographic primitive. Four algorithms are used namely KeyGen, BuildIndex, TrapdoorGen, SearchIndex. Two phases such as setup phase which uses KeyGen algorithm for generating public/private key pair and BuildIndex algorithm to generate index file containing keywords educed from file. The file collection and the index file are outsourced after encryption with frequency based relevance score. While the retrieval phase uses TrapdoorGen algorithm generates a trapdoor using the user’s request. Upon the user request server runs SearchIndex algorithm which searches the files based on ids and relevance scores and sent files to the user. But RSSE has huge communication overhead when ranking is on user side and two round trip time is taken .Therefore efficient RSSE frame work uses Order Preserving Symmetric Encryption Scheme (OPSE) (C.Wang,2012). It supports deterministic property in which TapeGen (.), a random coin generator and HYGEINV (.) for sampling function are implemented .OPSE is used instead of encrypting scores in RSSE and in retrieval phase OPSE values are much more relevant. They provide better efficiency while retrieving files with top-k retrieval.


In this paper rigorous analysis is made on encryption techniques and search based retrieval of files from the outsourced encrypted data. We first put forth many searchable encryption schemes based on single keyword and multi-keyword search. Many disadvantages have been focussed on these techniques since they rely on Boolean expressions. Therefore rank based retrieval of data has been discussed which proves the data security, fast search access and does not leak information to untrusted authorities. This paper concludes rank based retrieval is most efficient for searching on encrypted data.