# Parallel Pipelining For Portable Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Fast Fourier transform is widely used in the field of digital signal processing such as filtering, spectral analysis. FFT plays a critical role in modern digital communications such as digital video broadcasting and orthogonal frequency division multiplexing systems. Algorithms including radix-4,split-radix, radix-22 have been developed based on the basic radix-2 FFT approach [1],[2],[3],[4]. In this Hardware architectures are not fully utilized and require high hardware complexity [7],[11].

II.METHODS AND MATERIALS

The folding transformation and register minimization methods are used to derive several fast fourier transform. The parallel pipelined architectures are derived from the Fast Fourier Transform flow graphs. Fast Fourier Transform consists of complex and real valued signals. The parallel pipelined architectures are derived for computation of both complex and real valued signals. The folding transformation and register minimization techniques are used to derive several FFT architectures

Fig.1. Block Diagram of Parallel Pipelined Architectures for Fast Fourier Transform.

The process is described using an 8-point radix-2 DIF FFT as an example. It can be extended to other radices in a similar fashion. Fig.2 shows the flow graph of a radix-2 8-point DIF FFT. The graph is divided into three stages and each of them consists of a set of butterflies and multipliers The twiddle factor in between the stages indicates a multiplication by WNk where WN denotes the Nth root of unity, with its exponent evaluated modulo N.

Fig.2. Flow Graph of a Radix-2 8-point FFT.

Fig.3. Data Flow Graph of a Radix-2 8-point FFT.

This algorithm can be represented as a data flow graph (DFG) as shown in Fig.3. The nodes in the DFG represent tasks or computations. In this case, all the nodes represent the butterfly computations of the radix-2 FFT algorithm. In particular, assume nodes A and B have the multiplier operation on the bottom edge of the butterfly. The folding transformation is used on the DFG in to derive a pipelined architecture.

Fig.4. Pipelined DFG of a 8-point DIF FFT as a preprocessing step for folding.

The Folding Transformation is a technique to reduce the area of hardware architecture. In the folding transformation many butterflies in the same column can be mapped to one butterfly unit. To transform the DFG, require a folding set, which is an ordered set of operations executed by the same functional unit. Each folding set contains K entries i.e., the number of operations folded into a single functional unit.

The operation in the jth position within the folding set (where j goes from 0 to K-1) is executed by the functional unit during the time partition. The term is the folding order, i.e., the time instance to which the node is scheduled to be executed in hardware. For example, consider the folding set A = {O,O,O,O,A0,A1,A2,A3} for K=8. The Operation A0 belongs to the folding set A with the folding order 4. The functional unit executes the operations A0,A1,A2,A3 at the respective time instances and will be idle during the null operations. This situation to use the systematic folding techniques to derive the 8-point FFT architecture.

From the lifetime chart, the folded architecture requires 4 registers as opposed to 16 registers in a straightforward implementation.

The register minimization is a technique is to reduce the number of registers used in hardware architecture. The register minimization technique is based on life time of the variables. Maximum Number of live variables represents the minimum number of registers.

Table.1. Register allocation tablefor Radix-2 8-point FFT

From the allocation table in Fig.1 and the folding equations, the final architecture in can be synthesized.

The control signals for all the Multiplexer scan be easily generated using a 2-bit counter. The control signals for the 1st stage and 2nd stage multiplexers are obtained by dividing clock signal by 4 and by 2, respectively.

III. SIMULATION RESULTS

The results can be obtained by using the simulation tools Modelsim 6.3F and Xilinx ISE 10.1i. The Modelsim is a hardware simulation software in which design the hardware in VHDL/Verilog HDL. The Xilinx ISE software is used to synthesize the hardware. The complex FFT for different word lengths were implemented on Xilinx Virtex-5 FPGA (XC5VLX20T) using 16-bit fixed-point implementation. For RFFT, the input sequence is assumed to be real. It is easy to show that, if is real, then the output is symmetric.

Fig.5. simulation result of generating the dataflow of signals.

IV. CONCLUSION

This paper has presented a approach to derive the FFT architectures for a given algorithm. The proposed approach can be used to derive parallel architectures of any arbitrary parallelism level. Using this approach parallel architectures have been proposed for the computation of complex FFT based on radix-2n algorithms. The power consumption can be reduced by 37% and 50% in proposed architectures. For RFFT two different scheduling approaches have been proposed with one having less complexity in control logic while the other has fewer delay elements. Due to the capability of processing two input samples in parallel, the frequency of operation can be reduced by 2 which in turn reduces the power consumption upto 50%. These are very suitable for applications in implantable or portable devices due to their low area and power consumption.