This section offers a critical review of the current literature, as it relates to the domain Of discourse. It is achieved in two main sub-sections - ‘Definitions', and ‘The Literature'. Investigation of the literature revealed at an early stage that credible studies and academic works relating directly Web cache optimization are rare. This presents a constraint (and hence a challenge) to this study, not only in the preparation of a literature review, but also in the collection and analysis of the data, which is largely literature-based.
In the early stages of the review, (which also formed the early stages of data collection) it was deemed appropriate to ‘map' the literature into broad categories of interest, in order to reveal what relevant material was prominently available, and what was not. It is also the purpose of this part of the study to identify those concepts, models and frameworks that might be found to be useful later for comparing and evaluating real word situations. Some general points relating to the literature are worthy of note. It is recognised that those articles that appear in journals and specialist periodicals are likely to differ by comparison to those that appear in published books. Apart from the obvious point that it takes less time and commitment to write a journal paper than it does a book, journals tend to reflect a more transient level of thinking, and act as a forum for the expression of current views and academic thinking. Books on the other hand, being more substantial works, tend to reflect the author's hope that what is being proposed is of some lasting value to the body of literature. These considerations have been taken account of in this work by including, so far as is prudent, only recent journal articles - in the main those of the past two years or so, whilst for books this has been extended to cover about the past eight years (though most are much more recent).
2.1Introduction to web caching
Rapid growth in the Web browsing dominates today's Internet. In looking at how to improve the quality of service delivered by the Internet, a very productive way to start is examining the performance of Web transactions. It is here that Web caching can play a valuable role in improving service quality for a large range of Internet users. Web caching is a technology that has demonstrated to improve bus traffic on the Internet.
This project has been carried out for Web caching targeting application to provide reduction in the latency to retrieve web pages, frequency of access, Access regency, i.e., time elapsed since the last access. This project work implements a relative value algorithm, least weighted usage algorithm and Least Unified-Value (LUV) algorithm. A new algorithm, called least Grade Replacement (LGR), has been proposed which takes frequency of access, access recency and document size into account for web cache optimization.
2.2 Nature of Web Today:
2.3 About Web cache optimization:
Web caching is a process of storing the web pages, documents and any web content that passes through it. Requests can be satisfied from cache if certain requirements are met.
Controlling the caches can be done by the three mechanisms freshness, validation and invalidation.
http is having two objects request and response. The request object is carrying the query data to the server and the response in having the answer as the web page or any data. Freshness uses the response object which is already cached without going to the server. Over the time the response will get stale the validation will check the state of the cached response.
Whenever the validation fails the response is invalidated and the fresh response object is taken from the server.
A web cache can be properly designed by improving the access time of the web site results in reduction of the network traffic.
2.4Web Cache Optimization Algorithms:
Steven Glassman First wrote the caching relay in 1994 which examined the behaviour of the user acts and the performance of the relay. He provided the first cache design architecture. He tested it in an Unix operating system with 80,000 web pages cached in it. Although this model suffered from many memory and caching constraints this model became the basis for the later development of the web cache optimization.
Later many authors worked on the similar concepts and developed algorithms for the web caching.
“The next major advancement in the web caching is done by the Timo Koskela, Jukka Heikkonen and Kimo kaski of University of Technology, Finland in 2002”. They optimized the web cache by utilizing the syntactic features extracted from the cache objects. Each cache object is valued by using the non liner model by observing the features from the HTTP response, access log of the cache and HTML structure of the object. The Least Recently Used algorithm (LRU) cache model is optimized here. A generalized linear model is implemented on the LRU to dynamically improve the performance.
2.5 Page replacement algorithms
Paging is the process of storing the memory pages in to the secondary storage devices like hard disks. These are later retrieved whenever the request is received by the system. This is mostly used by the operating systems. This is also used for the virtual memory management.
The pages stored in the secondary storage are swapped with the pages in the main memory based on their utility and the requests from the programs. The page replacement algorithm decides which pages must be swapped in the system. Although the page replacement algorithm are proposed during the early stages of the computer development i.e., during 1960-70, these are recently implemented for the web cache optimization.
There are many page replacement algorithms exists today which evolved during the years of research.
The theoretically optimized page replacement algorithm is popularly known as OPT is used for swapping since this cannot be used for the general purpose operating systems due to its in efficiency to compute reliability of a page before it is going to be used.
Not recently used (NRU): This will work with the logic of paging the pages in the memory which are not recently used. The NRU picks the random page from the lowest category for removal. This always implies that a reference page is more important than a modified page.
First In First Out(FIFO): This algorithm will implement a queue where the pages are queued together. This is used for the book keeping process of the operating system. Since it cannot effectively handle the caching process it is still facing many loop holes.
Least Recently Used: This is similar to NRU but LRU can track the page usage over the short term period of time. It uses the linked lists to handle the memory pages. LRU performance will be reduced in the situations under quite common reference patterns.
Random Page Replacement Algorithm: This algorithm picks up the random page from the memory and replaces the page. Since it eliminates the tracking cost of processing it is optimal for the systems where the process is a constraint. It fails in working with the web cache since the cost of processing is not a factor in the current process.
Aging: This algorithm considers the age of the page that was stored in the memory and then replaces the page. Since age can be taken into consideration only when the cache reserves over time, this is not applicable for the web cache optimization due to its unpredictable nature of the usage.
2.6 LGR RELATED ALGORITHMS OF WEB CACHING
The algorithm for the replacing the documents is based on the statics of the collected Web data. The factors influencing it are as follows
- Frequency of Document Reference
- Recency of Document Reference
- Size of Document
- Document Consistency (Strong or loose)
- Replica Document in different proxies
- Non-Replica Document in different proxies.
These factors are combined to form a effective scheme. The cache architecture is changed to improve performance. We are considering the three algorithms LRV, LWU and LUV for comparing with our LGR algorithm.
2.6.1 Lowest Relative Value:
This algorithm was proposed by Lugi and Vicisano intended for proxy. This uses the cost/benefit model to calculate the relative value of each document in the cache. The possibility of the document to be accessed again is calculated.
- Extreme performance when compared to the traditional algorithms LFO, FIFO, random (As discussed early).
- Use full in small caches.
1. If the cache capacity increases the performance falls.
2.6.2 Least Weighted Usage Algorithm:
“This algorithm is proposed by Edward and Ye-Sho. They proposed the model driven simulation. The Algorithm they proposed is called Least Weighted Usage (LWU)”. This is a dynamic and self-adaptive. This access the history based on the same size interval. The cached document is assigned with a Boolean value corresponding weather it is accessed or not accessed during this interval.
- Wide range of access patterns.
- Performs best due to adoption to both frequency and recency in different environments.
1. It ignores the size of web documents.
2.6.3 LUV Algorithm:
This was proposed by Bahn. This algorithm uses the complete web history of documents. The lowest value is replaced based on the studied history.
1. Uses the full history of the documents.
2. Optimization of weighting function is weak.
Our work towards the above algorithms gave rise to a new algorithm named Least Grade Page Replacement Algorithm. This algorithm out performed when compared with the simulation results of the above algorithms.
Our rest of the work discusses the proposed algorithm and simulation results are provided with the analysis results which proved the algorithm works better when compared with these three algorithms in web cache optimization.
2.7 Defining the terms
Internet is the interconnection of the computers all over the world. Over the past decades the utilization of the internet is drastically increased in both the research and the business sectors. The first main origin of the internet came from the ARPANET (Advanced Research Project Network) Department of the defence, United States of America. Later the ARPANET allowed the universities to join their network. In later years the network gave birth to Internet.
The Web Browser:
Web browser is a client application of a web application. Which is capable of executing the HTML text and displaying the content on the screen. The web browser will also work as a compiler and interpreter for the scripting languages like java script, vb script etc, Today there are many web browsers available some of them are listed below. The web browsers are having some controls which enable the user to navigate through the pages. There are also options to store history and bookmark the favourite pages.
Hypertext mark-up language is today the heart of the web. This is developed by the World Wide Web consortium (W3C). The file consists of the HTML is known as the Web page. This page can be read by special software called Browser. The web pages are basically stored in the web servers. Later different versions of HTML are released and the latest version of HTML is 4.0.
Microsoft Internet Explorer:
This is the first web browser that is released with the operating system. Microsoft has released the version 1.0 web browser with his new operating system Windows 3.1 but it got its popularity after the Microsoft internet explorer 4.0 was released with Microsoft Windows 95 in the year 1995. Today it is most successful browser in the marker which occupies 78% of the consumer segment. Today's version of the internet explorer is 8.0.
The founder of java script released a web browser which started its boom during the early internet ages. It lost its market and significance in later years. Today we can say that there is no more Netscape Navigator since it contributed its source to the open source community.
This is the next competitor to the Microsoft Internet explorer. It is developed by the open source community for the use of Linux operating system. But later versions for the Windows are used. The secret for its success is for the availability of plug-in.
Although this is not having such a significance in the computer browsers. It is successful in the mobile devices. This is having a rich quality for displaying the graphics.
This is developed by the apple for the web browsing. It is popular for its rich user interface and ease of use.
The later browser that is bundled with many features and themes. This is released by Google. The chrome is a multi tabbed browser which shows the maximum space to be utilized for the web page. Very less space is occupied by the menus.
DNS: Domain Naming System is the major mile stone in the development of the internet. Basically every host in a network is uniquely identified by a number with in the network called ip address. Since ip address is a four part number each part consists of a numeric value ranging from the 0-255. Each host is having an IP address and a subnet mask which defines the type of network the host is connected to. A simple example of the ip-address is displayed below.
IP Address: 192.168.0.1
Sub Net Mask: 255.255.255.0
The DNS is capable of converting ip-address to a name called domain name. users can access the system by typing the domain name instead of remembering the ip-address.
2.8 Object Oriented Analysis and Design for web cache optimization
Object oriented analysis and design is an approach for developing software applications. This is an approach where it uses the real time objects for the development process. The entire process is divided into four major parts. The object oriented approach is first started by the Rational software corporation. Grady brooch is the first person to start the process of object oriented software engineering.
The first part analysis is the major part in the object oriented software engineering. This is also popularly known as the requirements engineering. This is a study about the problem which includes the problem scope and problem identification. In this process the analysts will create the high level requirements for the projects. These requirements are categorized into two parts. Functional and Non Functional Requirements. In functional requirement we will work with the functionalities of the system like Business rules, features and screen mocks, where as no functional requirements will consider with the performance, security, user interface, reliability.etc.,
The design is the detailed solution for the existing problem. This uses the UML (Unified Modeling Language which will be described in the next sessions) for developing a solution for the problem. Using the UML will enable you to present the plan in a diagrammatic way. Since diagrams can be understood easily by the people. This includes diagrams like class diagrams, use case diagrams, sequence diagrams, activity diagrams ...etc.,
The coding phase is also known as the implementation phase. The code will be developed based on the design provided by the architect of the software. This involves in actual development of the software. Here all the software dependencies are assembled together to make the system more functional.
In the testing phase the test engineers will check whether the developed project meets the requirements or not. There are many types of testing processes majorly the white box testing and black box testing. The white box testing is done by the third party test engineers or the developers who is developing the code. The white box testing is called so because the code is visible to the tester. The black box testing is done by the third party people or client who can handle the requirements. They will consider with different testing processes like requirements testing, integration testing and functional testing.
According the done research every algorithm that is referred we found that each algorithm is having its boundaries for the working. The boundaries are discussed above. We found there must be a requirement for a new algorithm we meet the daily requirements of a web browser for the caching of the web pages. The new algorithm as per the 2008 paper released by IEEE is based on the grading of the pages and the least graded page is replaced into the memory from the cache. This algorithm implementation will work well and dynamically increases the performance of the system.
1.Pei Cao, Snady Irani, “Cost-Aware WWW Proxy Caching Algorithms”, inProceedings of the USENIX Symposium on Internet Technologies and Systems, December, 1997.
2. Replacement policies for proxy cache byLuigi Rizzo, Lorenzo Vicisa
3.K.Thompson, G. Miller, and R. Wilder, “Wilde-Area Internet Traffic Patterns and Characteristics,” in Proceedings of 3rd International Conference Web caching,May, 1998.
4.Ganesh, Santhanakrishnan, Ahmed, Amer, Panos K.Chrysanthis and Dan Li, “GDGhOST: A Goal Oriented Self Tuning Caching Algorithm”, inProceeding of the 19th ACM Symposium on Applied Computing.March 2004.
5. Ben Smith, Anurag Acharya, Tao Yang, and Huican Zhu, “Exploring Result Equivalence in Caching Dynamic Web Content”,2nd USENIX Symposium on Internet Technologies and Systems, October, 1999
6. H.M. Vin, J.S. Kay "Design Considerations for Distributed Caching on the Internet", R. Tewari, M.Dahlin. Proc. ICDCS, Austin, June 1999.
7.K. Yagoub, D. Florescu, P.Valduriez, V. Issarny "Caching Strategies for Data-Intensive Web Sites",. Proc. VLDB, Cairo, September 2000.
8. Publication manual of American Psychological Association 5th Edition http://owl.english.prudue.edu
9. Timo Koskela, Jukka Heikkonen and Kimmo Kaski web cache optimization with non- linear model using object features computer networks vol 43
10. Object Oriented Software Engineering by Pressman