Abstract- In data compression or source coding algorithms, input sequences of symbols are converted into shorter sequences while the original information is preserved. One of the well-known data compression algorithms is Deflate, which is designed based on the LZ method. Deflate has three different modes, of which the second is applicable to real-time applications. In this mode, a fixed static table of Huffman codes is employed during the coding procedure. In this paper, a new version of the Deflate algorithm is proposed and implemented in hardware. In the proposed method, a new basic coding table is employed, which is modified adaptively based on the input sequence. Simulation results show that this adaptive algorithm improves the coding performance. In the hardware implementation of the new method, parallelism is exploited to improve hardware utilization and throughput.
Keywords-Data Compression, Data Coding, LZ Encoder, Lempel Ziv Algorithm, Deflate, Huffman Codes, Adaptive Algorithms, FPGA.
With the popularity of multimedia applications, people transfer huge files over the Internet or store and play them back on digital systems such as computers or CD players. Because of technical limitations in bandwidth and storage space, these files or data sequences should be compressed efficiently. In data compression or source coding algorithms, input sequences of symbols are converted into shorter sequences while the information stored in the original sequences is preserved [1-4].
Compression methods may be performed in one or two passes. In two-pass methods, the input sequence is analyzed statistically in order to determine the distribution of symbols. This analysis is performed in the first pass. In the second pass, coding is performed based on the symbol distribution and other useful information determined in the first pass. While two-pass algorithms have better performance than one-pass methods, they are not applicable to real-time applications.
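The two-pass idea can be sketched in a few lines of Python: the first pass gathers the symbol distribution, from which per-symbol Huffman code lengths follow. This is a simplified illustrative sketch, not the paper's coder.

```python
import heapq
from collections import Counter

def huffman_lengths(data):
    """Two-pass sketch: pass 1 counts symbols; pass 2 would then code the
    stream. Here we only derive each symbol's Huffman code length."""
    freq = Counter(data)                      # pass 1: symbol distribution
    # heap entries: (frequency, tie-break index, {symbol: depth})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)        # merge the two rarest subtrees
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, nxt, merged))
        nxt += 1
    return heap[0][2]                         # {symbol: code length in bits}

print(huffman_lengths(b"mississippi"))
```

As expected, the most frequent symbols receive the shortest codes.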
In one-pass strategies, a pre-determined coding table is used, which may or may not be changed adaptively during the coding process. One-pass methods are well suited to real-time applications.
While most compression methods are implemented in software, hardware implementation of these algorithms may improve the coding performance in terms of latency and throughput.
One of the well-known data compression algorithms is Deflate, which is designed based on the LZ method. Deflate has three different modes, of which the second is applicable to real-time applications. In this mode, a fixed static table of Huffman codes is employed during the coding procedure.
In this paper, a new version of the Deflate algorithm is proposed. In the proposed method, a new basic coding table is employed and is modified adaptively based on the input sequence. Simulation results show that the coding performance is improved.
Besides the above-mentioned modifications, the proposed method is implemented in hardware. In the proposed hardware implementation, parallelism is exploited to improve hardware utilization and throughput.
In general, in the first step we modify the original, standard Deflate method by inserting additional processing that makes it smarter while compression is taking place. This additional processing forces the software implementation of the proposed method to take extra run time. In the second step, considering hardware utilization and throughput, we present a hardware implementation in which this overhead time is eliminated, so that the run time of the proposed method remains almost the same as that of standard Deflate.
The rest of the paper is organized as follows: in Section 2, the adaptive Deflate algorithm is proposed. Section 3 describes the hardware implementation of the new algorithm. Section 4 is dedicated to concluding remarks.
There are many compression algorithms that have been designed for different applications. While these methods may act in different manners, all of them are based on a common basic concept: information contains unnecessary redundant data that may be removed.
The Lempel-Ziv (LZ) algorithm is one of the first and best-known dictionary-based algorithms. It was developed by Jacob Ziv and Abraham Lempel in the 1970s. In the LZ algorithm, sequences of characters are translated into codes based on a dictionary. This dictionary is generated from the data that have been coded previously. The LZ method employs a sliding window to find the longest substring of the input data in the dictionary. Then, appropriate data about this substring and its place in the dictionary is generated in the output sequence. Lempel and Ziv proved that their dictionary-based method can compress data down to the entropy, but they pointed out that enormous quantities of data would be needed to approach this ideal compression.
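The sliding-window search can be illustrated with a naive Python sketch. This is illustrative only; real Deflate encoders use hash chains rather than a linear scan over the window.

```python
def lz77_tokens(data, window=4096, max_len=258):
    """Greedy LZ77: emit (distance, length) for matches, else a literal."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window)
        best_len, best_dist = 0, 0
        # scan the sliding window for the longest match starting at i
        for j in range(start, i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1          # overlapping matches are allowed
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 3:            # Deflate only codes matches of length >= 3
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens

print(lz77_tokens(b"abcabcabc"))
```

The repeated "abc" is replaced by a single (distance, length) pair referencing the window.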
Deflate is a popular compression method that was originally used in the well-known Zip and Gzip software and has since been adopted by many applications, the most important of which are the HTTP protocol, PNG (Portable Network Graphics) and Adobe's PDF. Deflate is a dictionary-based compression algorithm designed based on the LZ solution. Deflate has three different modes. In the first mode (no compression), the input sequence of symbols is segmented into sub-sequences with no compression. This mode is applicable to files that are incompressible, or to cases where the compression software is asked to segment a file without compressing it. The second mode (compression with fixed code tables) is a one-pass compression solution in which a pre-determined coding table is employed during the coding process; this mode is applicable to real-time applications. The third mode of Deflate is a two-pass compression solution that works with dictionaries generated from statistical features of the input file. Because of its expected long latency, this mode is not suitable for real-time applications.
One of the main drawbacks of the Deflate algorithm in its second mode is that it employs fixed static coding tables that remain unchanged during the coding process. While coding based on a pre-determined table reduces the coding latency, it can be shown that adaptive code tables may yield better performance in terms of compression rate.
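The cost of the fixed tables can be observed with Python's standard `zlib` module, which exposes the `Z_FIXED` strategy to force the fixed Huffman tables of mode 2. This is only an illustration of the fixed-vs-adaptive gap, not the paper's implementation.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 200

def deflate(data, strategy):
    # raw deflate stream (wbits=-15), maximum compression level
    co = zlib.compressobj(9, zlib.DEFLATED, -15, 9, strategy)
    return co.compress(data) + co.flush()

fixed   = deflate(data, zlib.Z_FIXED)             # mode 2: fixed Huffman tables
dynamic = deflate(data, zlib.Z_DEFAULT_STRATEGY)  # per-block tables allowed
print(len(data), len(fixed), len(dynamic))
```

For data whose symbol distribution differs from the one the fixed tables assume, the output with per-block tables is never larger.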
The statistical models used in this research were generated from the different distributions of verbatims and code lengths observed in different data types.
Fig. 1 and Fig. 2 show the average distributions of lengths and verbatims in the employed data types. It is clear from Fig. 1 that only 20% of the ASCII codes account for more than 70% of the occurrences in the data. Also, Fig. 2 shows that only 5% of the lengths account for about 90% of the occurrences. Based on these facts, observed directly in the simulation results, we predict that if the code table gradually adapts itself to the output of the LZ method, better results will follow.
It may be shown that adaptive source coding algorithms result in better compression rates. Hence, in this paper, a new adaptive version of Deflate (in its second mode) is proposed. This adaptive version is built upon a simple idea: at each iteration, the current coding tables are modified such that they remain valid. A valid table is a coding table that assigns shorter codes to more frequently occurring uncoded cases. To realize this idea, both the length and literal tables are sorted in descending order of the frequency of the literals and lengths. Then, instances with greater occurrence frequency are coded with shorter codes.
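The re-sorting step above can be sketched as follows. This is a simplified model with hypothetical names: it keeps running frequency counts and, after each symbol, re-assigns a fixed ladder of code lengths so that the shortest codes go to the most frequent symbols. A real implementation must also keep the table prefix-free (the ladder here satisfies the Kraft inequality).

```python
from collections import Counter

def rebuild_table(freq, code_lengths):
    """Assign the shortest code lengths to the most frequent symbols.
    `code_lengths` is a fixed ladder, e.g. the lengths of the base table."""
    ranked = [s for s, _ in freq.most_common()]   # most frequent first
    return {sym: length for sym, length in zip(ranked, sorted(code_lengths))}

freq = Counter()
stream = list(b"mississippi")
for sym in stream:
    freq[sym] += 1                          # update statistics as symbols arrive
    table = rebuild_table(freq, [2, 3, 3, 4])   # hypothetical 4-symbol ladder
print(table)
```

After the whole stream, the most frequent symbol holds the 2-bit slot and the rarest the 4-bit slot.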
Another modification made to the second mode of Deflate was the design of a new coding table for this algorithm. The new coding table is designed based on 1600 simulations with different file types, including text files, binary executable files, bitmap files, audio wave files, Windows DLL files and Adobe PDF files. The best table (among 20 candidates) was selected with respect to the Compression Ratio. The candidates were designed based on statistical analysis of different types of data.
Tables 1 and 2 show the Huffman codes for Verbatims and Lengths used in the standard second mode of Deflate. The corresponding codes for our proposed method are given in Tables 3 and 4.
Fig. 3 shows the difference between the Compression Ratio of our proposed method and that of the standard Deflate algorithm in its second mode. It is clear from this figure that the proposed adaptive version of Deflate achieves a Compression Ratio about 0.5 to 6% better than the standard version. Table 5 shows the average improvement gained from our proposed method for some popular data types.
Figure 1. Average distribution of literals in popular data types.
Figure 2. Average distribution of lengths in popular data types.
Table 1. Huffman codes for Verbatims used in standard deflate.
Verbatim value 0 - 143: 8-bit codes 00110000 - 10111111
Verbatim value 144 - 255: 9-bit codes 110010000 - 111111111
Table 2. Huffman codes for Lengths used in standard deflate.
Length code 256 - 279: 7-bit codes 0000000 - 0010111
Length code 280 - 287: 8-bit codes 11000000 - 11000111
Table 3. Huffman codes used for Verbatim in the proposed method.
Table 4. Huffman codes used for Lengths in the proposed method.
Figure 3. Average difference between compression ratio of the proposed method and the standard deflate.
Table 5. Average improvements gained from the proposed method for different data types.
Popular Data Types | Average Improvement in Compression Ratio
In this section, a novel hardware implementation of the proposed adaptive Deflate is described. To implement the algorithm in hardware, the system is described in Verilog and synthesized on an FPGA.
The block diagram of the adaptive Deflate is illustrated in Fig. 4. In this block diagram, functions that can be performed in parallel are placed in separate blocks. This parallelization reduces the run time and the size of the RAM employed in the final circuit.
As illustrated in Fig. 4, the system has two main parts: the LZ Encoder and the Dynamic Deflate Coder. At each positive edge of the reading clock, the LZ Encoder reads a character from the input. Processing starts at the following positive edge of the processing clock. The frequency of the processing clock can be much greater than that of the reading clock. The processing unit of the LZ Encoder implements the LZ algorithm. This unit includes functions such as search-buffer shift and update and look-ahead-buffer shift and update, and generates the outputs of the LZ method: distance, length and verbatim.
In the Dynamic Deflate Coder, the outputs of the LZ Encoder are translated into appropriate output codes. This block is composed of five smaller sub-units. In the preprocessing unit, some high-level decisions are made. According to the Deflate algorithm, if the length value is one or two, only the verbatim(s) should be encoded; to this end, the dynamic verbatim code table unit is triggered by driving its corresponding signal high. If, on the other hand, the length value is three or larger, both the length and the distance values should be encoded; similarly, driving a specific signal high activates the dynamic length code table unit. In addition to the dynamic length code table, a unit called the distance coder is employed to encode the distance values. At the end of the encoding procedure, a unit called Finalize Output Code combines the outputs of the other units to produce the finalized code and its size, which are used by the next block, which may be responsible for writing the code into a memory or transmitting it over a communication channel. These blocks are illustrated in Fig. 5.
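The preprocessing decision can be summarized in a minimal software sketch (function and token names are hypothetical; in the hardware, the length and distance branches run in parallel):

```python
def dispatch(length, verbatims, distance):
    """Preprocessing decision of the Dynamic Deflate Coder (simplified).
    Returns which sub-units to trigger for one LZ Encoder output."""
    if length == 1:
        return [("verbatim", verbatims[0])]
    if length == 2:
        # the verbatim code table is triggered twice, once per character
        return [("verbatim", verbatims[0]), ("verbatim", verbatims[1])]
    # length >= 3: both the length and the distance are encoded
    return [("length", length), ("distance", distance)]

print(dispatch(2, [ord('a'), ord('b')], 0))
print(dispatch(5, [], 37))
```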
Handshake signals are used to synchronize the different blocks of the system. Each block, after generating its output, waits for the next block to assert a ready signal. On the following negative edge, the data valid signal is activated and remains active until the next negative edge. The next block receives the valid data on the positive edge of the clock.
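The core of this ready/valid handshake can be modeled in software. The sketch below is a toy model with hypothetical names that ignores the clock-edge timing described above and keeps only the transfer condition: data moves exactly when the producer's output is valid and the consumer is ready.

```python
class Block:
    """Toy model of one pipeline stage with ready/valid handshaking."""
    def __init__(self, name):
        self.name = name
        self.ready = True      # asserted when the block can accept data
        self.valid = False     # asserted when the block's output is stable
        self.data = None

def transfer(src, dst, log):
    # a transfer happens only when the producer's data is valid
    # and the consumer has signalled ready
    if src.valid and dst.ready:
        dst.data = src.data
        src.valid = False      # the producer may now prepare new data
        log.append(f"{src.name}->{dst.name}:{dst.data}")

lz, coder, log = Block("LZ"), Block("Coder"), []
lz.data, lz.valid = "token", True
transfer(lz, coder, log)
print(log)
```

This condition is what prevents a fast producer from overrunning a slow consumer.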
During the hardware implementation, the LZ Encoder and the Dynamic Deflate Coder unit were parallelized. In fact, in the hardware implementation, operations such as shifting the buffers and sorting the tables take place concurrently. The next parallelization was to perform distance encoding and length encoding at the same time. Hence, the time overhead of the extra processing introduced by the proposed method is entirely eliminated, and the adaptive approach takes about as much run time as standard Deflate.
To implement the system on an FPGA, the individual blocks and the overall system have been described in the Verilog hardware description language. The related pseudo-code is illustrated in Figs. 6, 7, 8 and 9.
The proposed implementation of the modified adaptive Deflate has been synthesized for Spartan and Virtex FPGAs from Xilinx. Table 6 shows the results generated by the synthesis software. The results show that the processing clock frequency can be up to 56 MHz on a Virtex2 (Xc2v500) chip.
Figure 4. Block diagram of the overall system.
Figure 5. Block diagram of Dynamic Deflate Coder.
Do @posedge of readingclock
Figure 6. Pseudo-code of the reading block.
Do @posedge of processingclock
Find(verbatim1, verbatim2, length, distance);
Figure 7. Pseudo-code of the LZEncoder block.
Do When (LZOutputValid==TRUE)
  If (length==1)
    dynamicVerbatimsCodesTableInputValid = TRUE;
  else If (length==2)
    dynamicVerbatimsCodesTableInputValid = TRUE;
    (trigger the dynamicVerbatimsCodesTable for the first time)
    wait for dynamicVerbatimsCodesTable to finish updating.
    dynamicVerbatimsCodesTableInputValid = TRUE;
    (trigger the dynamicVerbatimsCodesTable for the second time)
  else
    dynamicLengthsCodesTableInputValid = TRUE;
    distanceCoderInputValid = TRUE;
Do @posedge of processingclock
Distance Encoder Block
Do @posedge of processingclock
Do @posedge of processingclock
Figure 8. Pseudo-code of dynamic deflate coder sub-block.
Finalize Output Code Block
Join(coded length, coded verbatim, coded distance)
and calculate the output code size.
Figure 9. Pseudo-code of Finalize sub-block.
Table 6. Synthesizing Results
Processing clock period
Reading clock period
Total number of register slices: 646 (18% fit), 643 (16% fit)
Total number of logic slices: 1726 (99% fit), 1768 (92% fit), 1757 (57% fit)
Total number of 4-input LUTs: 3250 (94% fit), 3201 (83% fit), 3205 (52% fit)
In the proposed hardware implementation, several ideas could further improve the throughput and performance. A customized code table could be employed for each specific data type rather than the single table used in this paper. Such tables would be well tuned for certain types of data and could be selected by the user at run time, improving the coding performance. Also, ideas such as employing extra buffers to speed up the shift operations could improve the throughput effectively.
One of the well-known data compression algorithms is Deflate, which is designed based on the LZ method. In this paper, a new version of the Deflate algorithm has been proposed and implemented on FPGAs. The proposed version of Deflate acts based on a new coding table. This table is used only as a starting point; the algorithm updates it adaptively based on the input data stream. Simulation results show that in about 95% of cases (covering different file types) the proposed method yields better compression ratios compared with the standard Deflate algorithm.