A method for caching specified data in an n-way set associative memory with a copy-back update policy consists of the following steps. First, a row of the associative memory, organized as a plurality of rows and having n ways per row, is selected according to the main memory address of the specified data. The main memory provides primary storage for the data being cached. If one of the ways of the selected row holds invalid data, the specified data is cached in the way holding the invalid data and the data caching process is discontinued. If all n ways of the selected row hold valid data, the following steps are performed. First, a replacement strategy is used to select a way from the selected row. If the way selected in accordance with the replacement strategy holds unmodified data, the specified data is cached in the way selected by the replacement strategy and the data caching process is discontinued. However, if the way selected by the replacement strategy holds modified data, the ways of the selected row are examined again to find a way that holds data from the currently open page of the main memory. If such at least one such way is found, the specified data is cached in one of the ways holding data from the open page, and the data caching process is discontinued. Finally, if none of the ways in the selected row meet the above criteria, the specified data is cached in the way previously selected by the replacement algorithm, and the method terminates.
What is a cache?
A cache is an amount of faster memory used to improve data access by storing portions of a data set the whole of which is slower to access. For example, on most computers disk access is very slow relative to the speed of the main memory; to speed repeated accesses to files or disk blocks, most computers cache recently accessed data from disk in main memory or some other form of fast memory. At the other extreme, most modern processors run at a much higher clock speed than the main memory of the computers they are in; in order to avoid being slowed down to the speed of main memory every time they have to access some data, processors typically have higher speed caches to store the data they are currently working with. Sometimes the total data set isn't actually stored at all; instead, each data item is calculated as necessary, in which case the cache stores results from the calculations.
How does a cache work?
When a datum is needed, the cache is checked to see if it contains the datum. If it does, the datum is used from the cache, without having to access the main data store. This is known as a 'cache hit'. If the datum is not in the cache, it is transferred from the data store; this is known as a 'cache miss'. When the cache fills up, items are ejected from the cache to make space for new items.
When should my software use a cache?
Your software probably already uses some built in caches - for example, the processor cache and disk cache mentioned above. For most software, the computer's built in caches are sufficient.
If you work with a large data store, a cache may speed up your software. If you perform calculations that may be expensive relative to the amount of data produced - for example, complex database queries - caching some of the results may provide benefits. As with all optimizations, it's best to thoroughly understand where your software spends its time before implementing a cache.
It's usually a bad idea to implement a cache that simply duplicates the purpose of an existing cache. You're not likely to be able to improve on the work of the disk system experts who implement the computer's block level disk cache, for example, and even if you did, it probably wouldn't make up for the added overhead of having two block level disk caches.
What issues should I consider when designing a cache?
Probably the key design issue for a cache is selection of the algorithm used to decide which data to retain in the cache. To decide which algorithm is best for your purposes, it's good to know your data access patterns. Key issues are whether your data access is temporally clustered - that is, whether data that has been recently accessed is more likely to be accessed again; whether there tend to be scans - sequential accesses - to significant chunks of the data, as opposed to random accesses; and how uniform the frequency of access is to various items in the data set. In addition, some algorithms are more complex to implement than others; some require more or less calculational and memory overhead than others; and some have parameters that need to be tuned to get good performance.
If the main data set can be modified by multiple clients each maintaining their own cache, you may also have to worry aboutÂ cache coherencyÂ - whether the different clients' caches have consistent, 'coherent' views of the main data set.
What are some of the popular caching algorithms?
Some of the most popular and theoretically important algorithms are FIFO, LRU, LFU, LRU2, 2Q and time-based expiration. Most of the titles are based on the strategies used to eject items from cache when the cache gets full, except the time-based expiration algorithms.
FIFOÂ (First In First Out): Items are added to the cache as they are accessed, putting them in a queue or buffer and not changing their location in the buffer; when the cache is full, items are ejected in the order they were added. Cache access overhead is constant time regardless of the size of the cache. The advantage of this algorithm is that it's simple and fast; it can be implemented using just an array and an index. The disadvantage is that it's not very smart; it doesn't make any effort to keep more commonly used items in cache.
Summary for FIFO: fast, not adaptive, not scan resistant
LRUÂ - (Least Recently Used): Items are added to the cache as they are accessed; when the cache is full, the least recently used item is ejected. This type of cache is typically implemented as a linked list, so that an item in cache, when it is accessed again, can be moved back up to the head of the queue; items are ejected from the tail of the queue. Cache access overhead is again constant time. This algorithm is simple and fast, and it has a significant advantage over FIFO in being able to adapt somewhat to the data access pattern; frequently used items are less likely to be ejected from the cache. The main disadvantage is that it can still get filled up with items that are unlikely to be reaccessed soon; in particular, it can become useless in the face of scans over a larger number of items than fit in the cache. Nonetheless, this is by far the most frequently used caching algorithm.
Summary for LRU: fast, adaptive, not scan resistant
LRU2Â - (Least Recently Used Twice): Items are added to the main cache the second time they are accessed; when the cache is full, the item whose second most recent access is ejected. Because of the need to track the two most recent accesses, access overhead increases logarithmically with cache size, which can be a disadvantage. In addition, accesses have to be tracked for some items not yet in the cache. There may also be a second, smaller, time limited cache to capture temporally clustered accesses, but the optimal size of this cache relative to the main cache depends strongly on the data access pattern, so there's some tuning effort involved. The advantage is that it adapts to changing data patterns, like LRU, and in addition won't fill up from scanning accesses, since items aren't retained in the main cache unless they've been accessed more than once.
Summary for LRU2: not especially fast, adaptive, scan resistant
2QÂ - (Two Queues): Items are added to an LRU cache as they are accessed. If accessed again, they are moved to a second, larger, LRU cache. Items are typically ejected so as to keep the first cache at about 1/3 the size of the second. This algorithm attempts to provide the advantages of LRU2 while keeping cache access overhead constant, rather than having it increase with cache size. Published data seems to indicate that it largely succeeds.
Summary for 2Q: fairly fast, adaptive, scan resistant
LFUÂ - (Least Frequently Used): Frequency of use data is kept on all items. The most frequently used items are kept in the cache. Because of the bookkeeping requirements, cache access overhead increases logarithmically with cache size; in addition, data needs to be kept on all items whether or not in the cache. The advantage is that long term usage patterns are captured well, incidentally making the algorithm scan resistant as well; the disadvantage, besides the larger access overhead, is that the algorithm doesn't adapt quickly to changing usage patterns, and in particular doesn't help with temporally clustered accesses.
Note: This is sometimes referred to as "perfect LFU", which is in contrast to "in cache LFU". The latter retains frequency of use data only on items that are already in the cache, and generally does not peform as well.
Summary for LFU: not fast, captures frequency of use, scan resistant
Simple time-based expirationÂ - Data in the cache is invalidated based on absolute time periods. Items are added to the cache, and remains in the cache for a specific amount of time.
Summary for Simple time-based expiration: Fast, not adaptive, not scan resistant.
Extended time-based expirationÂ - Data in the cache is invalidated based on relative time periods. Items are added to the cache, and remains in the cache until they are invalidated at certain points in time, sucha as everi five minutes, each day at 12.00 etc.
Summary for Extended time-based expiration: Fast, not adaptive, not scan resistant.
Sliding time-based expirationÂ - Data in the cache is invalidated by specifying the amount of time the item is allowed to be idle in the cache after last access time.
Summary for Sliding time-based expiration: Fast, adaptive, not scan resistant.
Working setÂ - Based on Dr Peter Denning's classic "Working Set" paper from ACM Computing Surveys (CSUR) Volume 2 , Issue 3 (September 1970
Data in the cache marked with a flag for every access. The cache is periodically checked, recently access members are considered part of the "working set". Members not in the working set are candidates for removal. Size of cache is not defined directly, rather the frequency of the periodic checks indirectly controls how many items are deleted.
Summary for Working Set: Fast, adaptive, theoretically near optimal, not scan resistant.
Other algorithmsÂ - there are other caching algorithms available that have been tested in published papers. Some of the popular ones includeÂ CLOCK,Â GCLOCK, andÂ LRDÂ (Least Reference Density). Of possible interest is IBM's ARC (Adaptive Replacement Cache) web page athttp://www.almaden.ibm.com/StorageSystems/autonomic_storage/ARC/index.shtmlÂ - in particular the first link to a PDF format paper ("ARC: A Self-Tuning, Low Overhead Replacement Cache"), which includes some useful tables giving overhead times and hit ratios as functions of cache size and some other parameters.