This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Introduction to selective victim caching techniques
As there are fewer cache lines than main memory blocks, an algorithm is designed for mapping main memory blocks into cache lines. A means is needed for determining which main memory block currently occupies a cache line. The way on how the cache mapping decides how the organiztion structure of the cache. These techniques are: direct, associative, and set associative. As stated in the topic, we discuss a method to improve the performance of direct-mapped caches which is called "Selective Victim Caching".
Direct mapping is the technique that maps every block of memory into one cache line. Implementation is easy for such mapping function by using address. The method, a fully-associative cache is employed to enchance the main direct-mapped cache is to store the "victims" of replacements from the main cache. For every replacement, the block is removed from cache and is added to victim cache where it replaces the least recently used block. The victim cache is accessed in parallel. Effective memory access time was improved, this is done by reducing miss rate in the second level of the memory hierarchy.
Selective Victim Caching
This scheme is to reduce the miss rate of direct-mapped caches.
With a small fully-associate cache, victim cache, it enhances the direct-mappe main cache.
Cache blocks removed are stored by victim cache during replacements.
Incoming blocks are selectively placed into first-level cache of the main cache or into victim cache.This mechanism is done based a prediction scheme according to their past history of use.
Furthermore, interchanges of blocks between the main cache and the victim cache are also performed selectively.
A significant decrease in miss rate by using this scheme.
Number of interchanges between the two caches for both small and large caches(4 Kbytes - 128 Kbytes) are improved.
For example, implementation in an on-chip processor cache:
An average improvement about 20 percent in miss rate for a 16K cache are seen with the simulation with 4 traces from SPEC Release 1 programs.
Approximately 74% number of blocks interchanged between the main and victim is reduced.
Why an improved cache mapping scheme has to be introduced?
An increasing speed gap between processor and the underlying memory hierarchy since rapid technology development in manufacturing processor.
Two contributing factors to this scenario are:
(i) processor's cycle time has been decreasing at a faster rate than memory access time which had been predicted by Moore's Law.
Pipelining and superscalar processing techniques have caused a tremendous reduction in the average number of cycles per instruction(CPI).
These 2 factors have increased the number of wasted instruction cycles which is as the cache-miss penalty. So innovations on an improved cache memory system is required.
Why choosing direct-mapped technique as a scheme to be improved?
Most microprocessors use on-chip L1(first level) caches to provide fast access to instructions and data.
A tradeoff occurs between miss-rate and access time in such design of caches.
A direct-mapped cache has the shortest access time, although has its weakness of high miss rates due to conflicts among memory references.
Set-associate cache improve the miss rates caching.
Compared to other mapping functions, single-cycle access to the cache by a direct-mapped configuration is most suitable to be applied.
In order to cover the trade off, the conflicts among memory references in a direct-mapped cache always cause a increases in miss rate over set-associative caches.
Researchers are motivated to solve this problem without affecting its hit time(cache access time).
Introduces and describes the selective victim caching idea.
The figure shown is the memory hierarchy for the selective victim caching scheme.
The first-level(L1) cache has a direct-mapped main cache and a small fully-associative victim cache.
A line buffer is put into work. Sequential accesses to words in the same cache block would not cause more than single access to the cache.
The whole line is brought into the line buffer parallel with the execution in the CPU during the first access to cache. Subsequent accesses to words result in no update of state bits in cache. The algorithm of prediction would consider the sequential references to words within a single access to the cache line.
In accordance to the "least recently used" algorithm, lines are substituted in the victim cache
A L2 cache or the main memory could be next lower level of the memory hierarchy.
The line buffer, main cache, and the victim cache are accessed in parallel on every memory activity.
No other action is required when the line is found in the line buffer. If the line is not found, 3 different cases must be taken into account:
1.The word will be sent to CPU and the whole line containing the word is sent into line buffer once the word is found in the direct-mapped cache. In such scenario, it is same with simple direct-mapped cache.
Main differences between selective victim caching and simple caching
2. The word is fetched from victim cache into line buffer and sent to CPU when there is a miss in main cache. The prediction algorithm decides if the accessed block in victim cache seems would be accessed in the next time compared to the block in main cache.An interchange between the 2 blocks is operated if the prediction algoritm decides that the the victim cache block is probably to be accessed again rather than the conflicting block in the main cache. If not, no such interchange is operated. The blocks in the victim cache is recored as most recently accessed. The conflicting blocks in the main cache and the victiom cache are always swapped in simple victim caching. This scheme processes the swaps selectively.
When the word is not available in both main cache and victim cache this implies that maybe the partcular line in the direct-mapped cache is null, or the new line conflicts with another line already recorded. Incoming blocks are always allocated in the main cache in simple victim caching scheme where this scheme allocate them selectively in either cache.
Prediction algorithm employed
When the access conflicts with line stored in main cache, prediction algorithm is called.
The purpose of the algorithm is to decide which of the 2 conflicting lines is most probably to be accessed next time.
If the line has higher probability of access, then it is allocated in the main cache. Other lines are put into victim cache.
Due to small capacity of victim cache, the line in victim cache is replaced next time.
Two state bits, hit bit and sticky bit are used by algorithm for every cache line.
Performance evaluation of the scheme
The effectiveness of the selective victim caching is tested using simulations.
Simulations were operated for instruction and data caches in different context.
The size of the victim cache is fixed as 8 lines. The size of the direct-mapped main cache is differ from 2Kbytes to 128Kbytes.
The effect of different line size on those simulations retains the size of main cache as 16 Kbytes.
L2 cache with 8 times more than the capacity of L1 main cache is used in simulation.The line size of the L2 cache is configured equal to the L1 cache.
L2 cache stores the hit bits used by prediction algorithm.
Hit bit is started of to zero when a new line was introduced to L2 cache from main memory.For replacement of the line in L1, hit bit is copied from L2.
Lines in L1 are invalidated once they are replaced in L2 to maintain inclusion property of data caches.
How this scheme was tested?
Program trace~2 C programs and 2 FORTRAN programs are tested with their trace length.
Instruction cache simulations~Direct-mapped cache and simple victim cache are compared by simulation of the instruction cache.The miss-ratios for both caching methods are compared. Selective victim caching shows improvement compared to the pure victim cache when the cache is not that large to suit the whole program.Selective victim caching can improve the result of the first-level cache system by working up on miss ratio and the number of exchanges.
Data cache simulations~First-level data cache is tested. Selective victim caching improve the performance for 2 of the traces - spice2g6, and doduc. For li the improved performance of both caching is found to expand with the cache size. Both caching schemes are competent to remove a large percentage of misses.
Hit bit and sticky bit are needed to record the history of cache.The sticky bit is logically associated with a line in the main cache. So, it is intuitively to store it in the direct-mapped cache.The hit bit, is logically associated with every line in main memory. So, as an ideal implementation, the hit bits have to be stored in the main memory. This approach is impractical usually. Thus, we must adapt to an rough implementation.
Hit array is maintained as part of the L1 cache system.
Every memory line is mapped to one of the bits of these array. The hit array length is smaller than the maximum number of lines to be addressed.
So, more than one line would be mapped to the same hit bit. Some randomization happens in the prediction. Although this may decrease the performance of selective victim caching, simulation results showed almost no degradation.
Selective victim caching scheme can be processed without an L2-cache with hit bit array, this improve the prediction scheme slightly.
As L2-cache is designed as set-associative, the effect of invalidations was reduced.