# Image Compression System Using Dct And Idc Biology Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Image compression, the art and science of reducing the amount of data required to represent an image, is one of the most useful and commercially successful technologies in the field of digital image processing. Image data compression is the technique to reduce the redundancies in the image data representation in order to decrease the data storage requirement and hence the communication cost. Reducing the storage requirement is equivalent to increasing the capacity of storage medium and hence communication bandwidth. Thus the development of efficient compression techniques will continue to be a design challenge for future communication system and advanced media application. Image compression plays an important role in many areas of interest like tele-video conferencing, remote sensing, document and medical imaging. An increasing number of applications depend on efficient manipulation, storage, and transmission of binary, gray scale and colour image.

Image compression algorithms can be broadly classified into lossy compression and lossless compression. The JPEG is one of the most popular and comprehensive continuous tone, still frame compression standard which is centred around Discrete Cosine Transform (DCT). In JPEG

Compression is performed in three steps viz, computation of DCT, quantization and Variable Length Coding. The DCT transforms the input image which is split into non-overlapping blocks of 8x8 matrix in spatial domain into frequency domain with different frequencies. During quantization the actual compression occurs wherein the less important and high frequency components are discarded and only the most important low frequency components which remain is used to retrieve the image in the decompression process. Once quantization is performed the quantized DCT coefficients are compressed using variable length codes. The JPEG standard uses the Huffman tables for variable length encoding.

DCT is computationally intensive since it takes on large number of multiplications. Many algorithms for DCT has been proposed [2] to reduce the number of computations and hence the power. Its also been proved that the lower bound of multiplications for the computation of DCT is 11[3].

In this paper architecture for DCT and IDCT is implemented using Lo-effler algorithm which requires 11 multiplications and 29 additions. This algorithm was selected as it uses minimum number of mathematical operations. In the design Canonic Signed Digit (CSD) representation for the constant coefficients is used which again reduces the effort of multiplications. Also the multipliers are replaced with adders and shifters. A technique known as Common Sub-expression Elimination [2] is used so as to obtain the utilization of the resources. Thus all of these making the design a low power implementation.

The rest of the paper is organised as follows: section 2 describes the DCT and IDCT algorithms, section 3 gives the description of Canonic Signed Digits (CSD); section 4 introduces to common sub-expression elimination (CSE); section 5 presents the proposed architecture; section 6 gives the implementation of the results and section 7 is the conclusion.

II. DCT AND IDCT ALGORITHM

The 8x8 2D DCT for the input f(x,y) is given as

N ï€1 N ï€1

f (x, y) cosïƒ©° (2x ï€«1)u ïƒ¹ cosïƒ©° (2 y ï€«1) ïƒ¹ -(1)

c(u, v) ï€½ a(u)a(v)

## ïƒ¥ïƒ¥

## ïƒª

2N

## ïƒº

## ïƒª

2N

## ïƒº

xï€½0 yï€½0

## ïƒ«

## ïƒ»

## ïƒ«

## ïƒ»

## ïƒ¬

1 for u/v ï€½ 0

Where

## ïƒ¯

(2)

## ïƒ¯

N

a(u / v) ï€½

## ïƒ

2

## ïƒ¯

for u/v ï‚¹ 0

## ïƒ¯

N

## ïƒ®

## 01

## A Low Power VLSI Architecture for Image Compression System Using DCT and IDCT

The 8x8 2D IDCT is given as

N ï€1 N ï€1

ïƒ©° (2x ï€«1)u ïƒ¹

ïƒ©° (2 y ï€«1)v ïƒ¹

(3)

f (x, y) ï€½

## ïƒ¥ïƒ¥

a(u)a(v)c(u, v) cos

cos

## ïƒª

## ïƒº

## ïƒª

## ïƒº

2N

2N

uï€½0 vï€½0

## ïƒ«

## ïƒ»

## ïƒ«

## ïƒ»

Direct implementation of the above equations will require 1024 multiplications and 896 additions. Thus the design will be more complex and also expensive.

By making use of separability property of DCT the 2D DCT/IDCT can be implemented by using the cascade of two 1D DCT and a transposition block as shown below. The transposition block transposes the output of 1D DCT.

Figure 1. 2 D DCT/IDCT

The 1D DCT is as given

N ï€1

ïƒ©° (2x ï€«1)u ïƒ¹

(4)

c(u) ï€½ a(u)

## ïƒ¥

f (x) cos

## ïƒª

## ïƒº

2N

xï€½0

## ïƒ«

## ïƒ»

and IDCT equations is as given below

N ï€1

ïƒ©° (2x ï€«1)u ïƒ¹

(5)

f (x) ï€½ a(u)c(u) cos

## ïƒ¥

## ïƒª

2N

## ïƒº

xï€½0

## ïƒ«

## ïƒ»

Where a(u) is as defined in equation (2).

A. Lo-effler Algorithm

Christoph Lo-effler has proposed a 8-point DCT algorithm that requires only 11 multiplications and 29 additions [3]. In this scheme the number of multiplications has been reduced to a theoretical lower bound which is 11.

The Lo-effler algorithm for the DCT is as shown in the fig 2. It has four stages, each stage has to be executed in series due to the data dependencies. As is seen in the figure, stage 1 requires 4 additions and 4 subtractions. In the second stage the algorithm is split into two parts, one of which for even coefficients and the other half for the odd coefficients. Again in the third stage the even coefficients is separated into even and odd parts. The scaling factor k= √2.

Figure 2. Flow graph of Lo-effler DCT The building blocks are as shown below:

Oo = I0 kcos(nπ/16) + I1k.sin(nπ/16)

O1= -I0 ksin(nπ/16) + I1k.cos(nπ/16)

Similar to DCT. The IDCT is also as given by Lo-effler algorithm as below with k=1/√2.

## Figure 3. Flow graph of Lo-effler IDCT

The equations of the kcn is modified in IDCT and is given

as

Oo = I0 k.cos(nπ/16) - I1k.sin(nπ/16) O1= I0 k.sin(nπ/16) + I1k.cos(nπ/16)

III. CANONIC SIGNED DIGIT (CSD)

Canonic Signed Digit was introduced by Avizienis, is a signed representation. Containing fewest number of non zero bits[4]. Thus for the constant multipliers, the number of toggles will be minimum and hence reducing the power consumption.

For a constant coefficient c, the CSD representation is as shown

N ï€1

where ci = {-1,0,1} ≡ {-,0,+}.

c ï€½ ïƒ¥ci 2

i

i ï€½0

CSD numbers have essentially two properties

no 2 consecutive bits in a CSD numbers are non zeros.

The CSD representation of a number contains the

minimum possible number of non zero bits.

The CSD representation contains about 33% fewer non zero bits than 2's complement number. Consequently, for constant multipliers the number of partial products are reduced. The CSD representation for the constant DCT

## 02

coefficients is as shown in the table 1.

TABLE I:

CONSTANT DCT COEFFICIENTS IN CSD

## constant

## Fractional

## Binary value

## Csd equivalent

## value

cos(6π/16

0.38268

00110001

0+0-000+

## )

sin(6π/16)

0.92388

01110110

+000-0-0

cos(3π/16

0.83147

01101010

+0-0+0+0

## )

sin(3π/16)

0.55557

01000111

0+00+00-

cos(π/16)

0.98079

01111110

+00000-0

sin(π/16)

0.19509

00011001

00+0-00+

IV. COMMON SUB-EXPRESSION ELIMINATION (CSE)

To replace the multipliers with adders and shifters CSE technique is used. CSE technique enhances the usage of adders and shifters by identifying the common expressions[2]. Thus by the use of CSE resource utilization is achieved. In the design the CSE has taken the advantage of CSD representation for identifying the common sub-expressions. Let us take for example the evaluation of Y(1).

Y(1) = c(5) + c(8);

b(5) + b(7) + b(6) + b(8) ;

a(5) * cos(3π/16) + a(8) * sin(3π/16) - a(6) * sin(π/16) + a(7) * cos(π/16) + a(6) * cos(π/16) + a(7) *

sin(π/16) - a(5) * sin(3π/16) + a(8) * cos(3π/16) ;

= a(5) * (27- 25+ 23+ 21) + a(8) * ( 26+ 23- 20) - a(6) *(25-23+ 20) + a(7) * (27- 21) + a(6) *(27- 21) + a(7) * (25- 23+ 20) - a(5) * (26+ 23 - 20) + a(8) * (27-25+23+21) ;

= 27* (a(5) + a(8) + a(7) + a(6)) + 26* (a(8) - a(5)) - 25 *(a(5) + a(8) + a(6) - a(7)) + 23 ( a(5) + a(8) + a(6) - a(7) - a(5) + a(8)) +21* (a(5) + a(8) - a(7) - a(6)) -20 * (a(5) + a(8) + a(6)

a(7)) ;

27* (sb1 + sb4) + 26*(sb2) -25* (sb1 + sb3) + 23 *(sb1 +sb2 + sb3) + 21 * (sb1 - sb4) - 20 *(sb2 + sb3) ;

Where sb1=a(5)+a(8); sb2=a(8)-a(5); sb3=a(6)-a(7); sb4=a(6)+a(7).

Thus as can be seen sb1 occurs 4 times, sb2 occurs 3 times, sb3 for 3 times and sb4 occurs 2 times. Thus implementing these sub-expressions once we can share the hardware and hence power reduction can be achieved. Now the multipliers above can be replaced by the shift operations yielding

Y(1) = (sb1+sb4)<<7 + (sb2)<<6 - (sb1+sb3)<<5 + (sb1+sb2+sb3)<<3 + (sb1-sb4)<<1 - (sb2+sb3) .

V. PROPOSED DCT AND IDCT ARCHITECTURE

The 2D DCT is implemented by using two 1D DCT blocks and a transposition block. The 8x8 image matrix having the values in the range of 0 to 255 is first converted from unsigned to signed representation which is called level shifting by subtracting from 128. The signed converted values are now given as the input to first 1D DCT in either rowwise or columnwise. The output of the first 1D DCT is given to the transposition block where the transpose of the input takes place. Then the output of this transposition block is given as input to the second 1D DCT to obtain the final 2D DCT.

The complete 2D DCT is as shown in figure 4.

Figure 4. The 2D DCT architecture

Similarly to the DCT the 2D IDCT is also implemented using two 1D IDCT blocks and a transposition block. In the IDCT process to the obtained 2D IDCT an addition of 128 is done to reconstruct the original image.

The architecture for 1D DCT which is implemented using Lo-effler algorithm and using CSD and CSE is as shown in figure 5.

## Figure 5. The 1D DCT architecture

## 03

## A Low Power VLSI Architecture for Image Compression System Using DCT and IDCT

The eight pixels in a column of 8x8 matrix is input in parallel to the input registers and then the registered inputs are sent for processing. The architecture is implemented in pipeline structure. The design implemented here is a parallel in and parallel out thereby reducing the latency of the design

## Figure 6. The 1D DCT top module

## Figure 7. The 2D DCT simulation results

## Figure 10. MATLAB image results

A. Power Analysis

RTL compiler of CADENCE was used to synthesize the design. Once the synthesize was done the power and area reports were obtained by mapping the design to 180 nm TSMC library. The RTL schematic of the design was also obtained.

VI. IMPLEMENTATION AND RESULTS

The architecture for the proposed DCT/IDCT is modelled using Verilog HDL. MATLAB was used to obtain the input image pixels for the design and also to reconstruct the image after obtaining the outputs from the IDCT core. The functionality of the designs were verified by simulating the design in ISIM of XILINX. RTL compiler of CADENCE was used to synthesize the designs and the power and area reports were obtained. The simulation results for 2D DCT and 2D IDCT from ISIM simulator is as shown in figures 7 and 8. MATLAB results are as shown in figure 9 and 10.

B. Power Analysis

## Figure 8. The 2D IDCT simulation results

## Fig 9. MATLAB results

RTL compiler of CADENCE was used to synthesize the design. Once the synthesize was done the power and area reports were obtained by mapping the design to 180 nm TSMC library. The RTL schematic of the design was also obtained.

TABLE 2

POWER AND AREA REPORT OF DCT AND IDCT

## Features

## DCT

## IDCT

Power

2.488mW

3.143Mw

No.of cells

8827

10901

Cell area

0.1033mm2

0.1235mm2

VII. CONCLUSION

This paper proposed a low power VLSI architecture for DCT and IDCT for image compression system. Since there were no multipliers used in the design a very low power was obtained. Also due to pipelined structure and parallel input and parallel output the design had very low latency. Power reduction was achieved due to both CSD and CSE techniques and hence a low power design.