A cloud-based system is one of the most widely used web-based distributed systems. The cloud is an architecture that provides reusable products and services in a public or private environment. In existing work, a keyword-based analysis of cloud data has been defined, and ranks are assigned on the basis of keyword search over the cloud. That work also includes rank-based search on secure clouds, meaning that the search is performed on encrypted data. The present work differs in that a search engine is built over a public cloud to find the best cloud service for a user query. The search mechanism combines keyword-based search with a parametric analysis of each cloud; besides keywords, the parameters include response time, availability and similar measures. The user states the cloud requirements in a query, and a ranking rule is defined on the basis of that query. The results are reported in terms of parametric satisfaction and as a ranking order based on the weightage assigned to these parameters.
This is to certify that Ramandeep Kaur has completed the M.Tech dissertation proposal titled "A Cloud based search engine to perform the rank based cloud server selection" under my guidance and supervision. To the best of my knowledge, the present work is the result of her original investigation and study. No part of the dissertation proposal has ever been submitted for any other degree or diploma.
The dissertation proposal is fit for submission in partial fulfilment of the requirements for the award of M.Tech Information Technology.
Date: ____________ Signature of Advisor
Success comes to those who strive for it. Achieving one's goal takes a great deal of hard work and efficiency. Unless you venture into the real world, you never know how lame and unproductive your efforts could be without the help of others, or how tough the real-world environment is. It is therefore a hard task to enumerate and thank all the individuals whose contributions went into the making of this project.
I also take the privilege to express my deepest appreciation and heartiest thanks to Mr. Maninder Singh, whose constant guidance has been an unbounded source of inspiration for me. He is responsible for involving me in the dissertation in the first place. He showed me different ways to approach a research problem and the need to be persistent to accomplish any goal.
I take this opportunity to express my profound gratitude for the support given to me by Lovely Professional University, Phagwara. I also acknowledge the encouragement and moral support received from my family.
TABLE OF CONTENTS
List of Figures
List of Tables
List of Abbreviations
Chapter 1. Introduction
1.2 Characteristics of Cloud Computing
1.3 Search Engine
Chapter 2. Review of Literature
2.1 Related Work
Chapter 3. Present Work
3.1 Scope of Study
3.2 Problem Formulation
3.3 Objectives
3.4 Research Methodology
3.4.1 Formulation of Hypothesis
3.4.2 Sources of Data
3.4.3 Research Design
3.4.4 Research Tool
3.5 Expected Outcome
Chapter 4. Work Plan
4.1 Gantt Chart
LIST OF FIGURES
Figure 3.4.3 Design view for ranking
Figure 3.5.1 Intermediate outcome of approach
Figure 3.5.2 Final outcome of approach
LIST OF TABLES
Table 4.1 Gantt chart
LIST OF ABBREVIATIONS
URL- Uniform Resource Locator
API- Application Programming Interface
PC- Personal Computer
This chapter covers the basics of cloud computing, the characteristics of cloud computing, and search engines. Cloud computing is a new trend in computing in which readily available computing resources are exposed as a service. These resources are generally offered on pay-as-you-go plans and have therefore become attractive to cost-conscious customers.
A public cloud is one in which service providers offer their resources as a service to the general public. Public clouds offer several key benefits to service providers, including no initial capital investment in infrastructure and the shifting of risks to infrastructure providers. However, public clouds lack fine-grained control over data, network and security settings, which hampers their effectiveness in many business scenarios. A public cloud is offered as a service, usually over an Internet connection.
Public clouds typically charge a monthly usage fee per GB of data, combined with bandwidth transfer charges.
Users can scale the storage on demand and do not need to purchase storage hardware.
Service providers manage the infrastructure and pool resources into capacity that customers can claim.
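The charging model in the first point above (a monthly usage fee per GB of data combined with bandwidth transfer charges) reduces to simple arithmetic. The following is a minimal sketch under that assumption only; the class name `CloudCostEstimate` and the per-GB rates used in `main` are hypothetical placeholders, not the prices of any real provider.

```java
import java.util.Locale;

// Minimal sketch: monthly public-cloud bill = storage fee per GB + bandwidth transfer charges.
// The rates passed in main() are hypothetical placeholders, not any provider's real prices.
class CloudCostEstimate {

    // storageGb:         average GB stored during the month
    // storageRatePerGb:  monthly usage fee per GB stored (assumed value)
    // transferGb:        GB transferred during the month
    // transferRatePerGb: bandwidth charge per GB transferred (assumed value)
    static double monthlyCost(double storageGb, double storageRatePerGb,
                              double transferGb, double transferRatePerGb) {
        if (storageGb < 0 || transferGb < 0) {
            throw new IllegalArgumentException("usage cannot be negative");
        }
        double storageFee = storageGb * storageRatePerGb;
        double bandwidthFee = transferGb * transferRatePerGb;
        return storageFee + bandwidthFee;
    }

    public static void main(String[] args) {
        // 500 GB stored at 0.02 per GB, plus 100 GB transferred at 0.09 per GB
        double cost = monthlyCost(500, 0.02, 100, 0.09);
        System.out.println(String.format(Locale.ROOT, "Estimated monthly cost: %.2f", cost));
    }
}
```

With these illustrative rates the program prints `Estimated monthly cost: 19.00` (10.00 for storage plus 9.00 for bandwidth). Because the user pays only for what is stored and transferred, scaling storage up or down changes the bill without any hardware purchase.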
1.2 Characteristics of Cloud Computing
Cloud computing has the following characteristics:
Shared Infrastructure: Uses a virtualized software model, enabling the sharing of physical services, storage, and networking capabilities. The cloud infrastructure, regardless of deployment model, seeks to make the most of the available infrastructure across a number of users.
Dynamic Provisioning: Allows for the provision of services based on current demand requirements. This is done automatically using software automation, enabling the expansion and contraction of service capability, as needed.
Network Access: Needs to be accessed across the internet from a broad range of devices such as PCs, laptops, and mobile devices, using standards-based APIs. Deployments of services in the cloud include everything from using business applications to the latest application on the newest smart phones.
Managed Metering: Uses metering to manage and optimize the service and to provide reporting and billing information. In this way, consumers are billed for services according to how much they have actually used during the billing period.
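The Dynamic Provisioning characteristic listed above (expanding and contracting service capability automatically according to current demand) can be illustrated with a simple threshold rule. This is a minimal sketch, not the mechanism of any particular cloud platform: the class `DynamicProvisioner`, the assumed capacity of 100 requests per second per instance, and the bounds of 1 to 10 instances are all hypothetical.

```java
// Minimal sketch of dynamic provisioning: the number of running service instances follows
// current demand so that no instance exceeds its assumed capacity, within fixed bounds.
// Capacity and bounds are hypothetical parameters chosen for illustration.
class DynamicProvisioner {
    private final int requestsPerInstance; // assumed capacity of one instance (requests/second)
    private final int minInstances;        // never shrink below this
    private final int maxInstances;        // never grow beyond this

    DynamicProvisioner(int requestsPerInstance, int minInstances, int maxInstances) {
        if (requestsPerInstance <= 0 || minInstances < 1 || maxInstances < minInstances) {
            throw new IllegalArgumentException("invalid provisioning bounds");
        }
        this.requestsPerInstance = requestsPerInstance;
        this.minInstances = minInstances;
        this.maxInstances = maxInstances;
    }

    // Instances needed for the current demand: ceiling(demand / capacity), clamped to the bounds.
    int instancesFor(int currentDemand) {
        int demand = Math.max(0, currentDemand);
        int needed = (demand + requestsPerInstance - 1) / requestsPerInstance;
        return Math.max(minInstances, Math.min(maxInstances, needed));
    }

    public static void main(String[] args) {
        DynamicProvisioner provisioner = new DynamicProvisioner(100, 1, 10);
        int[] demandOverTime = {50, 250, 980, 2000, 120};
        for (int demand : demandOverTime) {
            System.out.println("demand=" + demand + " -> instances=" + provisioner.instancesFor(demand));
        }
    }
}
```

For the sample demand sequence the instance count goes 1, 3, 10, 10, 2: capability expands as load rises, is capped at the upper bound when demand reaches 2000, and contracts again when demand falls to 120.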
1.3 Search Engine
Search engines came to life along with the development of the information age. Search engines are the main users of web crawlers, but web crawlers have other users as well. Typical users include:
Search engines use web crawlers to collect information about what is available on public web pages. The primary purpose is to collect data so that, when Internet users enter a search term on the search engine's site, it can quickly provide them with relevant web sites. When a search engine's web crawler visits a web page, it reads the visible text, the hyperlinks, and the content of the various tags used in the site, such as keyword-rich meta tags. Using the information gathered by the crawler, the search engine determines what the site is about and indexes the information. The website is then included in the search engine's database and in its page-ranking process.
Linguists may use a web crawler to perform textual analysis; that is, they may comb the Internet to determine which words are commonly used today.
Market researchers may use a web crawler to determine and assess trends in a given market. There are numerous unethical uses of web crawlers as well.
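The crawling process described above for search engines (visit a page, read its hyperlinks and keyword meta tags, then record the page in an index) can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions: to stay self-contained it "fetches" pages from an in-memory map instead of the network, the class `ToyCrawler` and the `*.example` URLs are hypothetical, and the regular expressions only handle the simple HTML used here; they are not a general HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of a breadth-first web crawler over an in-memory "web" (URL -> HTML).
// The regexes below suit this toy input only; a real crawler would use an HTML parser.
class ToyCrawler {
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern META_KEYWORDS =
            Pattern.compile("<meta name=\"keywords\" content=\"([^\"]*)\"");

    // Crawls from seedUrl and returns, for each visited page, the keywords from its meta tag.
    static Map<String, List<String>> crawl(Map<String, String> web, String seedUrl, int maxPages) {
        Map<String, List<String>> index = new LinkedHashMap<>(); // keeps visit order
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(seedUrl);
        seen.add(seedUrl);
        while (!frontier.isEmpty() && index.size() < maxPages) {
            String url = frontier.poll();
            String html = web.get(url);       // stand-in for an HTTP GET
            if (html == null) continue;       // unreachable page: skip it
            List<String> keywords = new ArrayList<>();
            Matcher meta = META_KEYWORDS.matcher(html);
            if (meta.find()) {
                for (String k : meta.group(1).split(",")) {
                    if (!k.isBlank()) keywords.add(k.trim().toLowerCase(Locale.ROOT));
                }
            }
            index.put(url, keywords);
            Matcher link = LINK.matcher(html);
            while (link.find()) {
                String next = link.group(1);
                if (seen.add(next)) frontier.add(next); // enqueue each URL only once
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> web = new HashMap<>();
        web.put("http://a.example", "<meta name=\"keywords\" content=\"cloud, storage\">"
                + "<a href=\"http://b.example\">b</a><a href=\"http://c.example\">c</a>");
        web.put("http://b.example", "<meta name=\"keywords\" content=\"compute\">"
                + "<a href=\"http://a.example\">a</a>");
        web.put("http://c.example", "<p>no keywords here</p>");
        crawl(web, "http://a.example", 10).forEach((url, kw) -> System.out.println(url + " " + kw));
    }
}
```

The `seen` set prevents the crawler from looping on pages that link back to each other, and `maxPages` bounds a one-time crawl; a production crawler would additionally respect site crawling policies and rate limits.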
In the end, a web crawler may be used by anyone seeking to collect information on the Internet. Web crawlers may run only once, for example for a particular one-time project. If the purpose is long term, as with search engines, they may be programmed to comb through the Internet periodically to determine whether there have been any significant changes. If a site is experiencing heavy traffic or technical difficulties, the crawler (spider) may be programmed to note this and revisit the site later, ideally after the technical issues have subsided.
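The revisit behaviour described above (periodic recrawling, with a later retry when a site has heavy traffic or technical difficulties) can be expressed as a small scheduling rule. This is a minimal sketch assuming one common policy, exponential backoff after failures; the class `RevisitScheduler`, the 24-hour normal period and the one-week cap are hypothetical values.

```java
// Minimal sketch of a recrawl scheduler: healthy pages are revisited on a fixed period,
// while pages whose last visits failed (heavy traffic, technical faults) are retried
// after an exponentially growing delay. All interval values are hypothetical.
class RevisitScheduler {
    static final long BASE_PERIOD_HOURS = 24;  // normal revisit period (assumed)
    static final long MAX_BACKOFF_HOURS = 168; // cap the retry delay at one week (assumed)

    // lastVisitHour:       time of the last visit, in hours on an arbitrary clock
    // consecutiveFailures: number of visits in a row that failed (0 = last visit succeeded)
    static long nextVisitHour(long lastVisitHour, int consecutiveFailures) {
        if (consecutiveFailures <= 0) {
            return lastVisitHour + BASE_PERIOD_HOURS;
        }
        // Retry quickly at first, then back off: 1, 2, 4, 8, ... hours, up to the cap.
        long delay = 1L << Math.min(consecutiveFailures - 1, 30);
        return lastVisitHour + Math.min(delay, MAX_BACKOFF_HOURS);
    }

    public static void main(String[] args) {
        for (int failures = 0; failures <= 9; failures++) {
            System.out.println("failures=" + failures
                    + " -> next visit at hour " + nextVisitHour(100, failures));
        }
    }
}
```

Backing off exponentially avoids adding load to a struggling site while still checking back soon after a short outage; the cap ensures that a site which stays down is still revisited eventually.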
Web crawling is an important method for collecting data on, and keeping up with, the rapidly expanding Internet. A vast number of web pages are added every day, and information is constantly changing. A web crawler is a way for search engines and other users to ensure regularly that their databases are up to date.
From this chapter we conclude that cloud computing provides shared access to resources over the Internet in a scalable and simple way.
Review of Literature
One of the major requirements on the web is the selection of the best service and service provider. For cloud services, this selection is more specific and parametric. Many researchers have worked in this direction; their work is reviewed in this chapter.
2.1 Related Work:
Byron Y-L. Kuo, Thomas Hentrich, Benjamin M. Good and Mark D. Wilkinson (2007) suggested a tag-based summarization approach for web search on the public cloud. The work integrates the web architecture with database extraction and refines the user query on the basis of cloud tags. The words extracted from the query are summarized, and this summarized query is passed to the public cloud. The cloud interface enables the extraction of new and required information.
Georgia Koutrika, Zahra Mohammadi Zadeh and Hector Garcia-Molina (2009) presented a data cloud in which search is performed using a query summarization approach. Their work is structural: keywords are extracted and summarized, and on this basis navigation and visualization of the data are suggested. The implementation assigns tags to different kinds of keywords, and these tags are used to refine the query. Finally, a flexible search over the database is performed to derive the final outcome. The results are analyzed in terms of the effectiveness and efficiency of the cloud services.
Georgia Koutrika, Zahra Mohammadi Zadeh and Hector Garcia-Molina (2009) also presented a query refinement model based on summarization. The summarized query is presented to the web architecture, and the search is performed accordingly to find a reliable and effective cloud service.
Wei-Ying Ma (2009) suggested multimedia search for the cloud architecture. The work considers different client devices, such as PCs, phones and TVs. A knowledge-based search is performed to retrieve multimedia content for the particular multimedia service requested by the client.
Hang Guo, Jidong Chen, Wentao Wu and Wei Wang (2009) presented a personalization architecture for cloud services, in which each user accesses the cloud individually to perform queries. The work has two main parts: a client side and a cloud side. The client side periodically fetches information from the user's system, whereas the cloud-side data search engine uses this data for modeling.
Cong Wang, Ning Cao, Jin Li, Kui Ren and Wenjing Lou (2011) presented ranked keyword search over secure clouds. The search is performed on encrypted data and is based on symmetric-key techniques. Keywords are extracted from the cloud data, and ranks are assigned on the basis of keyword analysis. The ranking parameters include the number of keywords, the number of files, and the frequency of keyword occurrence. Because the search operates on encrypted data, the work provides security guarantees.
Venkateshprasanna H.M, Rujuswami D. Gandhi, Kavi Mahesh and J. K. Suresh (2011) presented enterprise search using tag clouds. A tag carries information derived from keyword classification and categorizes content according to its role in the business environment. On the basis of this information, knowledge criteria are defined for the enterprise system. The work suggests a novel approach for the automated selection of clouds in an enterprise query system. The system is content based and integrated with the search system.
Dimitrios Skoutas and Mohammad Alrifai (2011) presented work on tag clouds in which summarization is used to condense keywords. The authors propose a set of evaluation metrics that consider the use of tag clouds for search, navigation and recommendation.
Ju-Chiang Wang, Yu-Chin Shih, Meng-Sung Wu, Hsin-Min Wang and Shyh-Kang Jeng (2011) presented content-oriented, tag-based search over a music database. Multiple levels of preference are defined on the basis of the desired tag clouds: the user's query is analyzed and divided into different colors, or levels, to perform effective content-based music retrieval. A probabilistic fusion model is defined on the basis of a Gaussian mixture model and a multinomial mixture model. The authors evaluate the effectiveness of the proposed system for user queries and the related results.
Cengiz Orencik and Erkay Savas (2012) presented rank-based keyword search on the data cloud. Documents are retrieved from the cloud server on the basis of keyword analysis. The work operates on encrypted data, which improves the security and reliability of retrieval, and on this basis a secure protocol based on Private Information Retrieval is suggested. The system performs the query and presents the final results on the basis of parametric ranking, with efficient computation and communication.
Mathew J. Wilson, Jonathan Hurlock and Max L. Wilson (2012) presented work on keyword clouds for web search engines. Clouds are represented by tags, called metadata, which describe each cloud in terms of parameters such as security, efficiency and reliability. Keyword matching is performed against these cloud keywords. The work includes a learning stage for keyword extraction, and a comparative analysis is performed to extract the related cloud services from the system.
Daniel E. Rose (2012) suggested cloud search based on information retrieval and presented the work on an Amazon cloud service. The search was tested under different criteria, such as scalability and configuration. It lowers the barrier for a person or an organization to perform content-oriented search, and it was tested in enterprise environments as well as for web search.
From this chapter we conclude that, so far, ranked search has been performed on data clouds to retrieve documents, and the search has been keyword based or tag based.
The cloud is one of the major parts of the web architecture. Cloud architecture is widely deployed in the form of public clouds that are available to all users, and there are a number of cloud service providers. This research proposal suggests a search engine for the case in which a user passes a query for a specific cloud service with well-specified requirements. The work is agent based: a middle layer performs the search on behalf of the user and handles the service level agreement.
3.1 Scope of Study
The proposed work is a public-cloud search engine that accepts the user query and extracts keywords from it. The keywords are matched, together with the required parameters, against different cloud services, and a rank-based selection of the cloud services is performed. The search gives better suggestions so that the user has a better choice of services. Because the approach is rule based, the results are expected to be more reliable. The work performs a segmented search, which is also expected to increase the efficiency of the search mechanism.
In existing work, a keyword-based analysis of cloud data has been defined, and ranks are assigned on the basis of keyword search over the cloud. That work also includes rank-based search on secure clouds, meaning that it is performed on encrypted data. In the present work, a search engine is built over a public cloud, which is open to all servers along with their service descriptions, to find the best cloud service for a user query. The user passes a query describing the required services. The service provider at the middle layer decomposes this query and finds the list of eligible clouds. A ranking model is then applied to identify the cloud servers that can provide the required services. The ranking criteria are:
The highest weightage will be given to the availability parameter.
The second-level weightage will be given to the response time of the services.
The number of users who recommended a service will also be considered when assigning the rank.
The number of shared resources will also be considered.
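The ranking criteria above can be combined into a single weighted score. The following is a minimal sketch under explicit assumptions: the weights 0.4, 0.3, 0.2 and 0.1 are hypothetical values that only respect the stated order (availability first, response time second), each parameter is min-max normalised to [0, 1] across the candidate clouds, a shorter response time scores higher, and a larger number of shared resources is assumed to be better. The class `CloudRanker`, its records and the sample URLs are illustrative, not a fixed design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

// Minimal sketch of the weighted ranking rule; weights and normalisation are assumptions.
class CloudRanker {
    record CloudService(String url, double availability, double responseTimeMs,
                        int recommendations, int sharedResources) {}

    record Ranked(CloudService service, double score) {}

    static final double W_AVAILABILITY = 0.4; // highest weightage (assumed value)
    static final double W_RESPONSE = 0.3;     // second-level weightage (assumed value)
    static final double W_RECOMMEND = 0.2;    // assumed value
    static final double W_SHARED = 0.1;       // assumed value

    // {min, max} of one parameter over all candidate clouds.
    private static double[] range(List<CloudService> clouds, ToDoubleFunction<CloudService> f) {
        DoubleSummaryStatistics st = clouds.stream().mapToDouble(f).summaryStatistics();
        return new double[] {st.getMin(), st.getMax()};
    }

    // Min-max normalisation to [0, 1]; inverted when a smaller value is better.
    private static double normalise(double v, double[] range, boolean higherIsBetter) {
        double lo = range[0], hi = range[1];
        if (hi == lo) return 1.0; // all candidates equal on this parameter
        double n = (v - lo) / (hi - lo);
        return higherIsBetter ? n : 1.0 - n;
    }

    // Scores every cloud and returns them in descending order of score.
    static List<Ranked> rank(List<CloudService> clouds) {
        double[] avail = range(clouds, CloudService::availability);
        double[] resp = range(clouds, CloudService::responseTimeMs);
        double[] rec = range(clouds, CloudService::recommendations);
        double[] shared = range(clouds, CloudService::sharedResources);
        List<Ranked> ranked = new ArrayList<>();
        for (CloudService s : clouds) {
            double score = W_AVAILABILITY * normalise(s.availability(), avail, true)
                    + W_RESPONSE * normalise(s.responseTimeMs(), resp, false)
                    + W_RECOMMEND * normalise(s.recommendations(), rec, true)
                    + W_SHARED * normalise(s.sharedResources(), shared, true);
            ranked.add(new Ranked(s, score));
        }
        ranked.sort(Comparator.comparingDouble(Ranked::score).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical catalogue entries; URLs and figures are illustrative only.
        List<CloudService> clouds = List.of(
                new CloudService("http://cloud-a.example", 99.9, 120, 50, 10),
                new CloudService("http://cloud-b.example", 99.0, 80, 200, 30),
                new CloudService("http://cloud-c.example", 99.99, 300, 10, 5));
        for (Ranked r : rank(clouds)) {
            System.out.printf(Locale.ROOT, "%.3f  %s%n", r.score(), r.service().url());
        }
    }
}
```

With this sample data, cloud-a ranks first (score about 0.671), followed by cloud-b (0.600) and cloud-c (0.400). Cloud-c has the highest availability but ranks last because its response time is the slowest and it has the fewest recommendations, which shows that the weights trade the parameters off against each other rather than sorting on availability alone.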
The present work has the following research objectives:
Study and analysis of Existing Search Engine work and the rank assignment approaches.
Creation of cloud service database with relative URL and metadata.
Design of summarization approach to extract the keywords.
Design of ranking algorithm based on cloud search parameters.
Analysis of Result.
The proposed work aims to optimize cloud search with a rank-based selection approach. For this, a new architecture is proposed that uses a keyword summarization approach to detect the related cloud services. The system performs the search on the public cloud on the basis of the user query and metadata analysis.
3.4.1 Formulation of Hypothesis
The hypothesis consists of research questions defined by the researcher to show that the results are effective and reliable. The present work addresses the following research questions:
Is the presented keyword extraction and summarization approach more flexible and better optimized than other keyword extraction approaches?
Which parameters should be selected to perform effective ranking?
Is the work significant enough to perform queries on a cloud architecture?
3.4.2 Sources of Data
The proposed work requires data in the form of multiple clouds and web services. For this, an adequate search over the web is needed to find some general-purpose web services. While collecting such data, the following constraints must be met:
Data on cloud services is required; it can be obtained from web search or from the work of earlier researchers.
Knowledge of the web service parameters and return values is required.
Different ranking parameters are required; these will be collected from the work done by earlier researchers.
3.4.3 Research Design
In the proposed architecture, the user interacts with the web through a topic-based query to retrieve web pages. When the query is performed, a request is sent to the web and a basic URL list is generated, after which the data is retrieved from the web. For the URL collection, the system uses indexing and ranking: indexing provides fast access to the web pages, whereas ranking arranges the list according to priority.
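The roles of indexing (fast access) and ranking (priority order) described above are commonly realised with an inverted index. The following is a minimal sketch assuming that structure: each keyword maps to the URLs containing it and how often, and matching URLs are ordered by their summed keyword frequency. That priority rule is chosen only for illustration, since the proposal does not fix the exact scoring function; the class `InvertedIndex` and the sample URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Minimal sketch of an inverted index with a simple frequency-based ranking (assumed rule).
class InvertedIndex {
    // keyword -> (URL -> number of times the keyword occurs on that page)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Indexes a page once, when it is collected, so later queries need no page scan.
    void addPage(String url, List<String> words) {
        for (String w : words) {
            postings.computeIfAbsent(w.toLowerCase(Locale.ROOT), k -> new HashMap<>())
                    .merge(url, 1, Integer::sum);
        }
    }

    // Reads only the posting lists of the query keywords and orders the matching URLs by
    // total keyword frequency, highest first; ties are broken alphabetically.
    List<String> search(List<String> queryKeywords) {
        Map<String, Integer> scores = new HashMap<>();
        for (String q : queryKeywords) {
            Map<String, Integer> hits = postings.getOrDefault(q.toLowerCase(Locale.ROOT), Map.of());
            hits.forEach((url, count) -> scores.merge(url, count, Integer::sum));
        }
        List<String> urls = new ArrayList<>(scores.keySet());
        Comparator<String> byScoreDesc = Comparator.comparing(scores::get, Comparator.reverseOrder());
        urls.sort(byScoreDesc.thenComparing(Comparator.<String>naturalOrder()));
        return urls;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addPage("http://a.example", List.of("cloud", "storage", "backup", "storage"));
        index.addPage("http://b.example", List.of("cloud", "compute"));
        index.addPage("http://c.example", List.of("storage"));
        System.out.println(index.search(List.of("storage", "cloud")));
    }
}
```

For the query `storage cloud`, `http://a.example` scores 3 (two occurrences of storage, one of cloud) and is listed first, followed by `http://b.example` and `http://c.example`, which tie at 1 and are ordered alphabetically.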
Figure 3.4.3 Design View for Ranking
As the architecture shows, the user first performs a query, and query analysis is then performed on it. The analysis includes keyword extraction by removing stop words. Once the keywords are extracted, keyword summarization is performed on the basis of keyword frequency. The summarized keywords are then used for content-based analysis.
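The query analysis steps above (stop-word removal followed by frequency-based keyword summarization) can be sketched as follows. This is an illustration under stated assumptions, not the final algorithm of the work: the class `KeywordSummarizer`, its small stop-word list, and the rule of keeping the `topK` most frequent keywords with alphabetical tie-breaking are all hypothetical choices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Minimal sketch of query analysis: stop-word removal, then frequency-based summarization.
class KeywordSummarizer {
    // A small illustrative stop-word list; a real system would use a much fuller list.
    private static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "the", "is", "are", "of", "for", "to", "in", "on", "with",
            "and", "or", "i", "me", "my", "need", "want", "that", "which", "should", "be");

    // Splits the query into lower-case words, drops stop words, and counts the rest.
    static Map<String, Integer> keywordFrequencies(String query) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }

    // Keeps the topK most frequent keywords; ties are broken alphabetically for stable output.
    static List<String> summarize(String query, int topK) {
        return keywordFrequencies(query).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(topK)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String query = "I need a storage cloud with high availability and low response time for storage backup";
        System.out.println(summarize(query, 3));
    }
}
```

For the sample query the stop words `i`, `need`, `a`, `with`, `and` and `for` are removed, `storage` occurs twice, and the three-keyword summary is `[storage, availability, backup]`. These summarized keywords are what would be passed on to the cloud search.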
On the cloud side, a database maintains the cloud information, which includes the cloud URL along with parameters such as availability, response time and shared resources. The cloud services are selected on the basis of this information.
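The cloud-side database and the selection of eligible clouds described above can be sketched with an in-memory list standing in for the database. This is a minimal sketch under stated assumptions: the class `CloudCatalog`, the record fields, and the eligibility rule (at least one query keyword in the cloud's metadata, plus a user-supplied minimum availability and maximum response time) are hypothetical choices that reflect the "parametric satisfaction" described in this proposal.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Minimal sketch of the cloud service table and the eligibility filter (assumed rule).
class CloudCatalog {
    // One row of the hypothetical cloud service table.
    record CloudEntry(String url, Set<String> metadata, double availability,
                      double responseTimeMs, int sharedResources) {}

    private final List<CloudEntry> entries = new ArrayList<>();

    void add(CloudEntry entry) {
        entries.add(entry);
    }

    // A cloud is eligible when its metadata shares at least one keyword with the query and it
    // meets the user's parametric limits (minimum availability, maximum response time).
    List<CloudEntry> eligible(Collection<String> keywords, double minAvailability, double maxResponseTimeMs) {
        List<CloudEntry> result = new ArrayList<>();
        for (CloudEntry e : entries) {
            boolean keywordMatch = keywords.stream().anyMatch(e.metadata()::contains);
            boolean meetsLimits = e.availability() >= minAvailability
                    && e.responseTimeMs() <= maxResponseTimeMs;
            if (keywordMatch && meetsLimits) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CloudCatalog catalog = new CloudCatalog();
        catalog.add(new CloudEntry("http://storage.example", Set.of("storage", "backup"), 99.9, 120, 10));
        catalog.add(new CloudEntry("http://compute.example", Set.of("compute", "vm"), 99.95, 90, 20));
        catalog.add(new CloudEntry("http://slowstore.example", Set.of("storage"), 99.9, 450, 5));
        for (CloudEntry e : catalog.eligible(List.of("storage", "backup"), 99.5, 200)) {
            System.out.println(e.url());
        }
    }
}
```

Only `http://storage.example` is printed: `compute.example` shares no keyword with the query, and `slowstore.example` matches the keyword but exceeds the 200 ms response-time limit. The eligible list would then be handed to the ranking step.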
3.4.4 Research Tool
Java was designed to meet real-world requirements through its key features, which are explained in the following paragraphs.
Java is Simple and Powerful
Java was designed to be easy for the professional programmer to learn and use efficiently. Java keeps itself simple by avoiding surprising features. Because its behaviour is clearly defined, the programmer can perform the desired actions with confidence. Unlike other programming systems that provide dozens of complicated ways to perform a simple task, Java provides a small number of clear ways to achieve a given task.
Security of Java
Today everyone is concerned about safety and security. Many people feel that conducting commerce over the Internet is as safe as printing their credit card number on the front page of a newspaper, and the threat of viruses and system hackers also exists. To address these fears, Java has safety and security as key design principles.
Using a Java-compatible browser, anyone can safely download Java applets without fear of viral infection or malicious intent. Java achieves this protection by confining a Java program to the Java execution environment and not allowing it to access other parts of the computer. Applets can therefore be downloaded with confidence that no harm will be done and no security will be breached.
Portability of Java
In Java, the same mechanism that provides security also provides portability. Many types of computers and operating systems are in use throughout the world and are connected to the Internet. Downloading programs across these different platforms requires portable, executable code. Java's answer to this problem is its architecture: programs are compiled to platform-independent bytecode that is executed by the Java Virtual Machine.
This chapter discussed the proposed research problem together with its scope and objectives. The architecture of the proposed approach is shown in Figure 3.4.3 to clarify the overall concept. Java will be used as the research tool.
In the present work, a GUI will be created through which the user passes a query, as in a search engine. The search engine will be integrated with the cloud architecture.
The first output will be obtained from query filtration and keyword extraction. Once the keyword analysis is performed, the keywords will be reduced, and the final keywords will be produced as output.
Figure 3.5.1 Intermediate outcome of approach
These extracted keywords will then serve as input to the cloud search architecture, which, on the basis of the algorithmic approach, will return the relevant URL list along with its ranking.
Figure 3.5.2 Final outcome of approach
This is the expected outcome of the work.
This chapter divides the work into subtasks and allocates a time slot to each task so that every task can be completed on time.
4.1 Gantt Chart
Table 4.1 Gantt chart
Time in Weeks
Data Collection and Study of Existing System
Study of Tool
Creating Cloud Database and its management
Design of GUI and User Query Filtration
Cloud Search and ranking
In this section, a different time period is allocated to each task so that the tasks are managed properly and completed on time.