Design Of High Speed Fir Filter By Using 32 Bit Parallel Computer Science Essay


Filters are signal conditioners. Filters have two uses: signal separation and signal restoration. Signal separation is needed when the signal has been contaminated with interference, noise or other signals.

Filtering is one of the most important operations in DSP, and the FIR filter is one of the primary types of digital filter used there. Digital filters take a digital input, give a digital output, and consist of digital components. A finite impulse response (FIR) filter is a filter structure that can be used to implement almost any sort of frequency response digitally. An FIR filter is usually implemented by using a series of delays, multipliers, and adders to create the filter's output. When implementing an FIR filter, most of the hardware complexity is due to multiplication. To overcome this, in this paper I first designed a parallel decimal floating point multiplier, which offers low latency and high throughput, and then used it to implement the FIR filter. It also provides more accuracy and precision compared to an iterative decimal floating point multiplier. Most filters in DSP applications are implemented using highly specialized DSP processors. Such a processor is capable of carrying out high-speed MAC operations but has bandwidth limitations, whereas VHDL-based filters are implemented with a parallel-pipelined architecture, which enhances the overall performance. Most processors use binary floating point (BFP) to represent real numbers. But when representing and rounding decimal values such as 0.10, a BFP system introduces errors, which often cause computer-based calculations to differ from results calculated by hand. To avoid this, I have used decimal arithmetic in the multiplier design. I have also used floating point instead of fixed point arithmetic: fixed point has a fixed window of representation, which prevents it from representing very large and very small numbers and makes it prone to loss of precision, whereas floating point employs a "sliding window" to represent very large and very small numbers. In this paper a new clocking scheme is used for the realization of the parallel FIR filter. The traditional clocking scheme uses one kind of clock edge, either rising or falling. In my design of the FIR filter, I have used both edges alternately. This can reduce the latency of the filter by half without increasing its clock frequency, while maintaining the system power dissipation.

INDEX TERMS: - low latency, high throughput, parallel decimal floating point multiplier

INTRODUCTION: - Signal processing has become one of the most demanding applications of digital design. Digital signal processing (DSP) deals with the manipulation of digital signals using complex signal processing systems built from basic building blocks such as filters and signal transformations. Filtering is the most important operation in DSP. A filter is used to modify an input signal to prepare it for further processing. Digital filters can be divided into two categories: finite impulse response (FIR) filters and infinite impulse response (IIR) filters. Although FIR filters, in general, require more taps than IIR filters to obtain similar frequency characteristics, FIR filters are widely used because they have linear phase characteristics, guarantee stability and are easy to implement with multipliers, adders and delay elements. The filter which I have designed in VHDL uses a parallel DFP multiplier based on a parallel fixed point multiplier [1] and complies with the IEEE 754 Standard [3][4].

While designing the FIR filter, I focused on latency. The latency is the difference between the time at which a response is generated at the output and the time at which its corresponding stimulus is received at the input. To get a faster response, designers often increase the system clock frequency. However, this increases the system's dynamic power dissipation as well. To reduce the latency while avoiding the power increase, I used a new clocking scheme [2] for fully parallel pipelined FIR filters. The paper is organized as follows: Section I is the introduction. Section II describes the structure of the FIR filter. Section III illustrates the design of the 32 bit parallel decimal floating point multiplier with its simulation. Section IV illustrates the design of an FIR filter of order 3 using the 32 bit parallel decimal floating point multiplier. Section V gives the conclusion.


A) Basic structure of FIR filter:-

First let's see the basic structure of an FIR filter of order 3, as shown in Fig. 1.

It consists of a series of delay elements (d0, d1, d2), multipliers [h(0), h(1), h(2), h(3)] and adders; x(n) is the input and y(n) is the output.

Fig1: structure of FIR filter

It can be described by the non-recursive difference equation, which for the order-3 filter of Fig. 1 is

$$ y(n) = \sum_{k=0}^{3} h(k)\, x(n-k) $$
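The standard non-recursive FIR relation, y(n) = h(0)x(n) + h(1)x(n−1) + h(2)x(n−2) + h(3)x(n−3) for order 3, can be sketched in software as follows. This is only a behavioural illustration in Python, not the VHDL design of this paper; the function name `fir_filter` and the coefficient values are assumptions chosen for the example, and samples before n = 0 are taken as zero.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y(n) = sum over k of h(k) * x(n - k).

    x: list of input samples, h: list of filter coefficients (taps).
    Input samples before n = 0 are assumed to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:              # skip taps that reach before the first sample
                acc += coeff * x[n - k]
        y.append(acc)
    return y


# Hypothetical order-3 coefficients h(0)..h(3), chosen only for illustration.
h = [1, 2, 3, 4]
impulse = [1, 0, 0, 0, 0, 0]
print(fir_filter(impulse, h))  # the impulse response reproduces the coefficients: [1, 2, 3, 4, 0, 0]
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is why the response is called finite.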

B) Properties:-An FIR filter has a number of useful properties which sometimes make it preferable to an IIR filter. FIR filters:

i) Are inherently stable. This is due to the fact that all the poles are located at the origin and thus are located within the unit circle.

ii) Require no feedback. This means that any rounding errors are not compounded by summed iterations. The same relative error occurs in each calculation. This also makes implementation simpler.

iii) Can easily be designed to be linear phase by making the coefficient sequence symmetric; linear phase, or phase change proportional to frequency, corresponds to equal delay at all frequencies. This property is sometimes desired for phase-sensitive applications, for example crossover filters and mastering.
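The linear phase property in item iii) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper's design: it evaluates the frequency response H(w) = Σ h(k)·e^(−jwk) and removes a constant delay of (N−1)/2 samples, where N is the number of taps. If what remains is purely real at every tested frequency, the phase is linear. The function names and the coefficient sets are hypothetical.

```python
import cmath
import math


def frequency_response(h, w):
    """H(w) = sum over k of h(k) * exp(-j*w*k) for angular frequency w (rad/sample)."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h))


def has_symmetric_linear_phase(h, points=16, tol=1e-9):
    """True if H(w) is real once a delay of (len(h) - 1) / 2 samples is removed."""
    delay = (len(h) - 1) / 2
    for i in range(1, points):
        w = math.pi * i / points
        zero_phase = frequency_response(h, w) * cmath.exp(1j * w * delay)
        if abs(zero_phase.imag) > tol:
            return False
    return True


print(has_symmetric_linear_phase([1, 2, 2, 1]))  # symmetric coefficients
print(has_symmetric_linear_phase([1, 2, 3, 4]))  # non-symmetric coefficients
```

The symmetric set passes the check and the non-symmetric set fails it, which matches the statement that symmetry of the coefficient sequence gives equal delay at all frequencies.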

The main disadvantage of FIR filters is that considerably more computation power is required compared to an IIR filter with similar sharpness or selectivity, especially when low cutoff frequencies (relative to the sample rate) are needed.

C) Applications:- FIR filters are widely used in various digital signal processing applications, such as:

Signal preconditioning

Video convolution function

Wireless communication

Digital video broadcast


A] Decimal arithmetic in IEEE 754

The IEEE 754 standard [3][4] specifies formats for both binary floating-point (BFP) and decimal floating-point (DFP) numbers [11]. The primary difference between the two formats, besides the radix, is the normalization of the significand. BFP significands are normalized with the radix point to the right of the MSB, while DFP significands are not required to be normalized and are typically represented as integers. In this paper, all DFP operands use integer significands. The IEEE P754 standard specifies DFP formats of 32, 64, and 128 bits. An IEEE 754 DFP number contains a sign bit, an integer significand with a precision of p digits, and a biased exponent. The value of a finite DFP number is:

$$ D = (-1)^{s} \times C \times 10^{E - \text{bias}} $$

where s is the sign bit, C is the non-negative integer significand, and E is the biased non-negative integer exponent. The significand can be encoded either in binary or in Densely Packed Decimal (DPD) [4], which in the draft standard is referred to as the decimal encoding. The exponent must be in the range [Emin, Emax] when biased by bias. Representations for infinity and Not-a-Number (NaN) are also provided. The non-normalized significand allows redundant representations of numbers. For example, multiplying $32 \times 10^{15}$ by $70 \times 10^{15}$ yields a result that could be represented as $22400 \times 10^{29}$, $2240 \times 10^{30}$, or $224 \times 10^{31}$. Because of the possibility of multiple representations, IEEE 754 defines a preferred exponent, which for multiplication is:

$$ PE = E_A + E_B - \text{bias} \qquad (2) $$

where $E_A$ and $E_B$ are the biased exponents of the first and second operands, respectively. The multiplier design presented in this paper uses 32-bit DFP numbers with significands encoded in the DPD format. This format has p = 7 decimal digits of precision in the significand, an unbiased exponent range of [−95, 96], and a bias of 101.
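The value formula D = (−1)^s × C × 10^(E − bias) and the preferred exponent PE = EA + EB − bias can be tried out with a short Python sketch. It works on already-decoded fields (sign, integer significand, biased exponent) and does not model the DPD bit encoding. The function names are assumptions made for this illustration, and the bias of 101 is the 32-bit value stated above.

```python
from decimal import Decimal

BIAS = 101  # exponent bias of the 32-bit DFP format, as stated in the text


def dfp_value(sign, significand, biased_exponent, bias=BIAS):
    """Value of a finite DFP number: (-1)^s * C * 10^(E - bias)."""
    return (-1) ** sign * Decimal(significand) * Decimal(10) ** (biased_exponent - bias)


def preferred_exponent(ea, eb, bias=BIAS):
    """Preferred biased exponent of a product: PE = EA + EB - bias."""
    return ea + eb - bias


# 32 x 10^15 times 70 x 10^15: both operands have the biased exponent 15 + 101 = 116.
ea = eb = 15 + BIAS
pe = preferred_exponent(ea, eb)
print(pe - BIAS)  # unbiased preferred exponent: 30

# The three redundant representations from the text all have the same value.
a = dfp_value(0, 22400, 29 + BIAS)
b = dfp_value(0, 2240, 30 + BIAS)
c = dfp_value(0, 224, 31 + BIAS)
print(a == b == c)  # True
```

The preferred exponent selects the representation 2240 × 10^30 from the three equivalent candidates.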

The table shows the IEEE 754 draft standard formats for 32-bit, 64-bit and 128-bit numbers.

B] Design of Parallel Decimal floating point multiplier

Fig2: parallel decimal floating point multiplier

The multiplication begins with reading two operands in IEEE 754 format and decoding each to produce the sign bit, significand, exponent, and flags for the special values Not-a-Number (NaN) and infinity. The significands of the two operands are then decoded from the DPD encoding to Binary Coded Decimal (BCD). As soon as the decoded significands become available, a decimal fixed-point multiplication begins. If one or both of the operands is a NaN, its value is preserved through the multiplier by forcing the other operand to a value of one. The fixed-point multiplier generates decimal partial products in parallel and adds them, along with a possible correction term, using a carry-save adder (CSA) tree followed by a high-speed decimal carry-propagate adder. The result is a non-redundant BCD number referred to as the intermediate product (IP).

In parallel with the multiplication, a shift-left amount (SLA) and the corresponding intermediate exponent of the intermediate product (IEIP) are calculated for shifting the fixed-point result to fit into p = 7 digits of precision. This calculation is performed using leading-zero detection (LZD) on both operands in order to estimate the number of significant digits in the result. Since the calculation occurs prior to the computation of the IP, the SLA may be off by one, because the product may have one significant digit fewer than expected. In addition to the SLA and IEIP values, this unit calculates the sign bit of the final result and detects exception conditions, which are later used by the rounding unit.

The IP from the multiplier is then shifted by the SLA amount, forming the shifted intermediate product (SIP). The design sets the decimal point to be in the middle of the intermediate product, thus splitting it into a truncated product (TP+0) and a fractional product (FRP). This design choice keeps the decimal point in the same location throughout the data path and requires only a left shift to produce the SIP.

Next, the FRP is used to produce the guard digit, round digit, and sticky bit. In parallel, the TP+0 is incremented to allow the rounding logic to select between TP+0 and TP+1, which is sufficient to support all rounding modes. Finally, the rounding and exception logic uses the rounding mode and exception conditions to select between TP+0, TP+1, and special case values to produce the rounded intermediate product (RIP).
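The selection between the truncated product and its increment can be illustrated with a small behavioural model. This is a simplified Python sketch, not the hardware data path described above. It computes the exact product first instead of estimating a shift amount by leading-zero detection, it uses only a guard digit and a sticky bit, and it assumes the round-half-even mode. The function name `round_product` and the returned exponent adjustment are assumptions made for this example.

```python
P = 7  # digits of precision of the 32-bit DFP format


def round_product(ca, cb, p=P):
    """Multiply two integer significands and round the product to p digits (half-even).

    Returns (rounded_significand, exponent_adjust), where exponent_adjust is the
    number of low-order digits dropped, to be added to the preferred exponent.
    """
    ip = ca * cb                          # intermediate product, up to 2p digits
    excess = max(len(str(ip)) - p, 0)     # digits that do not fit into p digits
    if excess == 0:
        return ip, 0                      # exact result, no rounding needed
    scale = 10 ** excess
    tp, frp = divmod(ip, scale)           # truncated product and fractional product
    guard = frp * 10 // scale             # first discarded digit
    sticky = (frp * 10) % scale != 0      # any non-zero digit after the guard digit
    round_up = guard > 5 or (guard == 5 and (sticky or tp % 2 == 1))
    rip = tp + 1 if round_up else tp      # select between TP+0 and TP+1
    if rip == 10 ** p:                    # the increment carried out of p digits
        rip //= 10
        excess += 1
    return rip, excess


print(round_product(32, 70))            # (2240, 0): fits, no rounding
print(round_product(9999999, 9999999))  # (9999998, 7): fractional part rounds down
print(round_product(1000001, 15))       # (1500002, 1): tie, odd TP rounds up to even
```

Computing TP+1 alongside TP+0, as the text describes, means the rounding decision reduces to a final two-way selection rather than an addition on the critical path.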

The fixed-point multiplier unit [1] takes two operands, calculates the partial products in parallel and returns their sum as an integer. There are three main components in the fixed-point multiplier design: generation of multiplicand multiples, selection of partial products, and reduction of partial products. All the resulting products are encoded in BCD-4221 to simplify the CSA tree.
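The three components named above can be mirrored in a few lines of Python. This is only a behavioural sketch with ordinary integers: it does not model BCD-4221 encoding or the carry-save adder tree, and it precomputes all ten multiples for simplicity, which is an assumption of this example rather than a statement about the design in [1]. The function names are hypothetical.

```python
def to_digits(n):
    """Decimal digits of a non-negative integer, least significant digit first."""
    return [int(ch) for ch in reversed(str(n))]


def fixed_point_multiply(multiplicand, multiplier):
    # 1) Generation of multiplicand multiples: 0*A, 1*A, ..., 9*A, computed once.
    multiples = [multiplicand * d for d in range(10)]
    # 2) Selection of partial products: one multiple per multiplier digit,
    #    weighted by the decimal position of that digit.
    partial_products = [
        multiples[d] * 10 ** i for i, d in enumerate(to_digits(multiplier))
    ]
    # 3) Reduction of partial products to a single sum.
    return sum(partial_products)


print(fixed_point_multiply(32, 70))  # 2240
```

Because each partial product depends only on one multiplier digit and the shared multiples, all of them can be formed at the same time, which is what makes the parallel scheme possible.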

C] Simulation result for the 32 bit decimal floating point multiplier using ModelSim

D] Comparison table for iterative & parallel DFP



[Table: latency (cycles), throughput (ops/cycle) and cell count of the fixed-point and floating-point multipliers, for the iterative and the parallel design; the table values are not preserved.]

V. A] Design of FIR filter

To design an FIR filter with low latency, a new clocking method is used in my work.

The traditional clocking scheme for a fully parallel pipelined FIR filter uses the same kind of clock edge, either positive or negative, for all registers.

Figure 3 shows the fully parallel pipelined structure of an FIR filter in which all the registers are clocked by the rising edge of the clock. The latency, in clock cycles, from the filter's input to its output is therefore equal to 3, assuming the input x(n) is registered.

Fig 3. Parallel Pipeline FIR Structure using Rising Edge

To reduce latency, this paper uses both clock edges, rising and falling, alternately for every two consecutive registers, as shown in Fig. 4. In this case the filter latency is reduced to half that of the previous structure.

Fig4:- Parallel pipeline FIR structure using rising edge & falling edge

If the designs shown in Fig. 3 and Fig. 4 have the same clock frequency, both should consume the same amount of dynamic power, because the dynamic power dissipation, as specified by the following equation, is proportional to the frequency:

$$ P_{dynamic} = C \cdot V_{dd}^{2} \cdot f $$

where Vdd is the circuit supply voltage, f is the operating frequency and C is the capacitance.
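The latency and power argument can be put into numbers with a short sketch. It is an idealised model under stated assumptions, not a measurement of the design: each register stage is taken to cost one full clock period with single-edge clocking and half a period when consecutive registers use alternating edges, and dynamic power is taken as C·Vdd²·f. The function names and all numeric values are hypothetical.

```python
def pipeline_latency_cycles(register_stages, dual_edge=False):
    """Input-to-output latency in clock cycles for a chain of pipeline registers.

    Single-edge clocking: one full cycle per register stage.
    Alternating rising/falling edges: half a cycle per register stage.
    """
    half_cycles = register_stages if dual_edge else 2 * register_stages
    return half_cycles / 2


def dynamic_power(c_farads, vdd_volts, f_hz):
    """Dynamic power dissipation P = C * Vdd^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * f_hz


stages = 3  # latency of the rising-edge structure in Fig. 3
print(pipeline_latency_cycles(stages))                  # 3.0 cycles
print(pipeline_latency_cycles(stages, dual_edge=True))  # 1.5 cycles

# Hypothetical values: the clock frequency is the same for both schemes,
# so this model gives the same dynamic power for both.
print(dynamic_power(c_farads=1e-12, vdd_volts=1.0, f_hz=100e6))
```

The latency halves while f, and therefore the modelled dynamic power, stays unchanged, which is the trade-off the new clocking scheme aims for.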

B] Simulation result of FIR filter with order 3


A high speed FIR filter is implemented by using a 32 bit parallel decimal floating point multiplier. It uses a new clocking scheme to reduce latency while the power dissipation remains the same. The use of decimal floating point arithmetic gives more precise and accurate results compared to binary floating point arithmetic. Compared with a DSP processor, a VHDL-based filter implemented with a parallel-pipelined architecture enhances the overall performance.