Performance Estimation Of Reconfigurable Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

One of the recent research areas includes reconfiguration of FPGA, which involves minimum bit stream size and optimum memory. Reconfiguration also helps in improvement of communication bandwidth and reduced reconfiguration time. This is done by using bit stream compression techniques. Some bit stream compression techniques are used in multiple bit stream compression but are devoid of real-time decompression and some focuses on improving the decompression performance leaving the compression part. So it's a major task to design an efficient code compression technique which results in reduction of code size without affecting the system performance. This paper deals with an efficient compression bit stream placement technique which supports decompression without affecting the compression efficiency. This technique uses bitmasks which improves the efficiency without any decompression penalty. This helps in the reduction of code size and therefore results in the area, power and the performance improvement. The proposed technique overcomes the performance of other existing compression techniques by 15%.Also the operating speed of the decompression hardware for variable length encoding is closer to the best known field-programmable gate array-based decoder for fixed-length coding.

Keywords -Dictionary based compression,Bitmask based compression, RLE based compression.


FIELD-PROGRAMMABLE GATE ARRAYS (FPGAs) are widely used in reconfigurable systems. Since the configuration information for FPGA has to be stored in internal or external memory as bitstreams, the limited memory size, and access bandwidth become the key factors in determining the different functionalities that system can be configured and how quickly the configuration can be performed. While it is quite costly to employ memory with more capacity and access bandwidth, bitstream compression technique alleviates the memory constraint by reducing the size of the bitstreams. With compressed bitstreams, more configuration information can be stored using the same memory. The access delay is also reduced, because less bits need to be transferred through the memory interface. To measure the efficiency of bitstream compression, compression ratio (CR) is widely used as a metric. It is defined as the ratio between the compressed bitstream size (CS) and the original bitstream size (OS) i.e., CR=CS/OS . Therefore, a smaller compression ratio implies a better compression technique. There are two major challenges in bitstream compression: 1) how to compress the bitstream as much as possible and 2) how to efficiently decompress the bitstream without affecting the reconfiguration time.

Our approach combines the advantages of previous compression techniques with good compression ratio and those with fast decompression. This paper makes four important contributions. First, it performs smart placement of compressed bitstreams to enable fast decompression of variable-length coding (VLC)[1]-[3]. Next, it selects bitmask-based compression parameters suitable for bitstream compression. Finally, it efficiently combines run length encoding (RLC) [4] and bitmask-based compression to obtain better compression and faster decompression.


Fig.1.shows our decode-aware bitstreamcompression framework.Onthe compression side, FPGA configuration bitstream is analyzed for selection of profitable dictionary entries and bitmask patterns. The compressed bitstream is then generated using bitmask-based compression [5] and run length encoding (RLE). Next, our decode-aware placement algorithm is employed to place the compressed bitstream in the memory for efficient decompression. During run-time, the compressed bitstream is transmitted from the memory to the decompression engine, and the original configuration bitstream is produced by decompression.

Algorithm 1 outlines four important steps in our decode-aware compression framework (shown in Fig. 1):

1) bitmask selection; 2) dictionary selection;3) integrated

RLE compression; and 4) decode-aware placement. The input bitstream is first divided into a sequence of symbols with length of . Then bitmask patterns and dictionary entries used for bitmask-based compression are selected. Next, the symbol sequence is compressed using bitmask and RLE. We use the same algorithm [5] to perform the bitmask-based compression. The RLE Compression in our algorithm is also discussed in later chapter. Finally, we place the compressed bitstream into a decode friendly layout within the memory using placement algorithm.



This section describes the existing dictionary-based approaches and analyzes their limitations. First, we describe the standard dictionary-based approach [5]. Next, we describe the existing techniques that improve the standard approach by considering mismatches (hamming distance). Finally, we perform a detailed cost-benefit analysis of the recent approaches in terms of how many repeating patterns they can generate from the mismatches. This analysis forms the basis of our technique to maximize the repeating patterns using bitmasks.

Dictionary-Based Approach:

Dictionary-based code-compression techniques provide compression efficiency as well as fast decompression mechanism. The basic idea is to take advantage of commonly occurring instruction sequences by using a dictionary. The repeating occurrences are replaced with a code word that points to the index of the dictionary that contains the pattern. The compressed program consists of both code words and uncompressed instructions. Fig. 2 shows an example of dictionary based code compression using a simple program binary [6]. The binary consists of ten 8-b patterns, i.e., a total of 80 b. The dictionary has two 8-b entries. The compressed program requires 62 b, and the dictionary requires 16 b. In this case, the CR is 97.5%. This example shows a variable-length encoding. As a result, there are several factors that may need to be included in the computation of the CR, such as byte alignments for branch targets and the address-mapping table.

The largest limitation of the decoder in terms of speed is the number of bits it can process at a time. In this respect, compression methods like LZ78 most widely known through its variant LZW, have the distinct advantage of being able to read an entire input word at a time, as encoded words have the same length. However, the same technique has the disadvantage of having to dynamically generate and maintain the contents of the dictionary.

A solution that targets both speed and simplicity would be to use a statistical dictionary that is computed based on the contents of the entire bitstream and is used throughout the entire decompression. Unlike the Huffman dictionary, there is no clear methodology for how such a dictionary can be created in an optimal way (at least not to the knowledge of the authors), but the characteristics of the bitstream make the choice an easy one. In particular, due to the high probability of occurrence of the zero symbol, the coding scheme degenerates into a bit-level RLE with minor modifications.


Bitmask-based compression is an enhancement on the dictionary-based compression scheme, that helps us to get more matching patterns. In dictionary-based compression, each vector is compressed only if it completely matches with a dictionary entry.

As seen in Fig. 4, we can compress up to six data entries using bitmask based compression. The compressed data is represented as follows. Those vectors that match directly are compressed with 3 bits. The first bit represents whether it is compressed (using 0) or not (using 1). The second bit indicates whether it is compressed using bitmask (using 0) or not (using 1). The last bit indicates the dictionary index. Data that are compressed using bitmask requires 7 bits. The first two bits, as before, represent if the data is compressed, and whether the data is compressed using bitmasks. The next two bits indicate the bitmask position and followed by two bits that indicate the bitmask pattern.

For example, the last data vector in Fig. 4 is compressed using a bitmask. The bitmask position is 11, which indicates the fourth even bit position from left. For this case, we have assumed fixed bitmasks, which are always employed on even-bit positions and hence only 2 bits are sufficient to represent the four positions in a 8-bit data. The last bit gives the dictionary index. The bitmask XORed with the dictionary entry produces the original data. In this example, the compression efficiency is 27.5%, based on the following formula expressed as percentage:

Comp. Efficiency = Original Size -Compressed Size

Original Size

Since existing approach does not handle don't cares ("X"), in this example we have replaced all don't cares by 1. Note that we could have replaced all don't cares with 0's as well. In that case, it will result in worse compression efficiency of 2.5%. A better compression efficiency can be achieved by selectively replacing the don't cares with "0" or "1" instead of replacing all by 0's (or 1's). It is a major challenge to identify the selective replacement to generate the best possible compression efficiency.


The configuration bitstream usually contains consecutive repeating bit sequences. Although the bitmask-based compression encodes such patterns using same repeated compressed words, it is suggested in that run length encoding of these sequences may yield a better compression result. Interestingly, to represent such encoding no extra bits are needed. Note that bitmask value 0 is never used, because this value means that it is an exact match and would have encoded using zero bitmasks. Using this as a special marker, these repetitions can be encoded without changing the code format of bitmask-based compression.

Fig. 4 illustrates the bitmask-based RLE. The input contains word "00000000" repeating five times. In normal bitmask-based compression these words will be compressed with repeated compressed words, whereas our approach replaces such repetitions using a bitmask of "00". In this example, the first occurrence will be encoded as usual, whereas the remaining 4 repetitions will be encoded using RLE. The number of repetition is encoded as bitmask offset and dictionary bits combined together. In this example, the bitmask offset is "10" and dictionary index is "0". Therefore, the number of repetition will be "100" (i.e., 4).

The compressed words are run length encoded only if the RLE yields a shorter code length than the original bitmask encoding. In other words, if there are r repetitions of code with length and the number of bits required to encode them using RLE is bits, RLE is used only if bits. Since RLE is performed independently, the bit savings calculation during dictionary selection should be modified accordingly to model the effect of RLE.


The decompression engine is a hardware component used to decode the compressed configuration bitstream and feed the uncompressed bitstream to the configuration unit in FPGAs. As discussed previously, a decompression engine usually has two parts: the buffering circuitry is used to buffer and align codes fetched from the memory, while decoders perform decompression operation to generate original symbols. Since the decoders are well studied in previous literatures, we implement our bitmask and RLE decoder based designs in a successful way.

Fig 6 Decompression Engine

The structure of our decompression engine for 8-bit memory is shown in Fig. 7. An "Assemble Buffer with a Left Shifter Array" (ABLSA) is employed to replace the original "Buffer with a Barrel Shifter" (BBS). The basic working principle of ABLSA is to use an array of left shift registers to buffer the power-two bitstreams separately. Since the code length in bitmask-based compression is uniquely determined by the first two bits of a code (is Compressed and is- Bit masked flags), we can easily obtain the length of a code by checking of front bits of stream CS and BS. Then, the shift register (or PT streams) that hold bits of the code is identified based on the binary representation of the code length. Finally, the original code is assembled in the assemble buffer and fed to the bitmask or RLE decoders. When some shifter becomes empty, it is guaranteed to be loaded correctly by our decompression algorithm.

Decompression Efficiency:

We measured the decompression efficiency using the time required to reconfigure a compressed bitstream, the resource usage and maximum operating frequency of the decompression engine. The reconfiguration time is calculated using the product of number of cycles required to decode the compressed bitstream and operating clock speed. We observed that our approach can operate at a much higher frequency and occupies only 60% area compared to original bitmask-based decompression engine. Since our approach has the identical bitmask decoding circuit of the original one, the improvement is due to our ABLSA as we expected. Compared with previous techniques, our approach achieves almost the same operating speed with less area and also achieved 15%-20% better compression which means we can decompress more configuration information during the same amount of time.



The compacted scan chain network, which is mentioned in this section, reduces the scan depth of the scan chain, and reduces the test cost. Then combine the dictionary-based compression scheme with the compacted scan chain network to achieve the high compression ratio and the fast application time.


Here the bit stream which has been compressed is converted to its original size by subjecting it to the decompression engine. The simulation results shown here prove that the original data is retrieved from the compressed data.


This have been developed using an efficient test data compression algorithm using bitmasks. A compressed code stores information regarding the mask type, mask location and the mask pattern itself. The mask can be applied on different places on a vector and th number of bits required for indicating the position varies depending on the mask type.


A variation of Run-Length encoding perfectly meets the requirements for the address compression. A series of addresses with a common offset can be compressed into a codeword of the form base, offset and length.


This strategy uses a more intensive decompression algorithm. It attempts to reorder the address data pairs in more optimal manner. Since the addresses must be decompressed and sent in addition to the data, our decompression strategies use both the address and data buses to send the decompressed code words.

Compression Type


Total No. Of Bits=72


Total No. Of Bits=72


Total No. Of Bits=72


Total No. Of Bits=72






Dictionary +






Dictionary +

Bitmask RLE+





Table 1: Compression of various circuits Bitstreams

Using Our Proposed Compressed Methods.


The existing compression algorithms either provide good compression with slow decompression or fast decompression at the cost of compression efficiency. In this paper, we proposed a decoding-aware compression technique that tries to obtain both best possible compression and fast decompression performance.We also exploit run length encoding of consecutive repetitive patterns efficiently combined with bitmask-based compression to further improve both compression ratio and decompression efficiency. Our experimental results demonstrated that our technique improves the compression ratio by 10% to 15% while the decompression engine is capable of retrieving the compressed data without any loss. The configuration time is reduced by 15% to 20% compared to the best known decompression accelerator. In the future, we plan to investigate more placement algorithms that are compatible with other compression techniques such as Huffman coding, Goulomb coding and Arithmetic coding.