The discrete wavelet transform (DWT) has been used extensively in many applications. The wavelet transform decomposes a signal into a set of basis functions called wavelets. In one decomposition level, the DWT converts an input series x0, x1, …, xm into one high-pass wavelet coefficient series and one low-pass wavelet coefficient series.
WHY DWT: (Wavelet Vs Fourier Transforms)
The Fourier transform provides a frequency-domain representation of a signal, while the wavelet transform provides a time-frequency representation. The Fourier transform is well suited to the analysis of stationary signals, whereas the wavelet transform works well for both stationary and non-stationary signals. The Fourier transform gives all the frequency components of a signal but no time-domain information about when they occur. Wavelet analysis is a multi-resolution analysis: it provides different time and frequency resolutions for analyzing different frequency components. For Fourier analysis, which breaks a signal down into constituent sinusoids, signal analysts already have an impressive cache of tools at their disposal.
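The time-frequency contrast can be made concrete with a small experiment. The sketch below uses plain Python with no libraries. It places the same short oscillating burst early and late in a signal.

- It uses the Haar wavelet, the simplest wavelet, as a stand-in for the 9/7 filter discussed later.
- It moves the burst with a circular shift, so the Fourier magnitudes come out exactly equal.
- The function names are illustrative only.

```python
import cmath
import math


def dft_magnitude(x):
    """|X[k]| of the discrete Fourier transform, computed directly (O(n^2))."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def haar_level1(x):
    """One-level orthonormal Haar DWT of an even-length list: (low-pass, high-pass)."""
    even, odd = x[0::2], x[1::2]
    low = [(e + o) / math.sqrt(2) for e, o in zip(even, odd)]
    high = [(e - o) / math.sqrt(2) for e, o in zip(even, odd)]
    return low, high


def nonzero_indices(values, tol=1e-9):
    return [i for i, v in enumerate(values) if abs(v) > tol]


n = 64
burst = [math.sin(math.pi * k / 2) for k in range(8)]   # 0, 1, 0, -1, 0, 1, 0, -1
early = [0.0] * 8 + burst + [0.0] * (n - 16)             # burst at samples 8..15
late = early[-32:] + early[:-32]                         # circular shift: samples 40..47

# Identical magnitude spectra: the Fourier view says which frequencies occur, not when.
same_spectrum = all(math.isclose(a, b, abs_tol=1e-9)
                    for a, b in zip(dft_magnitude(early), dft_magnitude(late)))

# The wavelet high-pass coefficients are non-zero only where the burst is.
_, high_early = haar_level1(early)
_, high_late = haar_level1(late)
print(same_spectrum)                  # True
print(nonzero_indices(high_early))    # [4, 5, 6, 7]
print(nonzero_indices(high_late))     # [20, 21, 22, 23]
```

Each high-pass coefficient covers two input samples. Its index therefore locates the burst in time, which the magnitude spectrum cannot do.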
DWT architectures are mainly classified into two categories: 1) convolution based and 2) lifting based. For the one-dimensional (1-D) DWT, several convolution-based architectures have been proposed, because the DWT computation is intrinsically a filter convolution. Since the appearance of the lifting scheme and of a factorization method for lifting steps, the lifting scheme has been widely used to reduce the computation of the DWT and the control complexity of boundary extension. Because lifting-based architectures have advantages over convolution-based ones in computational complexity and memory requirement, more attention has been paid to them.
Jou et al. proposed an architecture that directly implements the lifting scheme. Based on this direct architecture, Lian et al. proposed a folded architecture to increase hardware utilization. Unfortunately, these architectures have limitations in critical path latency and memory requirement. The flipping structure can reduce the critical path latency by eliminating the multipliers on the path from the input node to the computation node, without hardware overhead. Another approach modified the conventional lifting scheme by merging the predictor and updater stages into a single lifting step, which shortens the critical path latency and reduces the memory requirement. However, these last two architectures involve a complex control procedure, and round-off noise has to be considered.
In order to solve those problems, we propose an efficient folded architecture (EFA) for the lifting-based DWT in this brief. The EFA can be obtained according to the following procedures: First, we give a new formula for the lifting algorithm, leading to a novel form of the lifting scheme. Due to this form, the intermediate data that were used to compute the output data are distributed on different paths. Thus, we can process these intermediate data in parallel by employing the parallel and pipeline techniques.
With the aforementioned operations, the conventional serial data flow of the lifting-based DWT is optimized into a parallel one. The corresponding optimized architecture (OA) therefore has a short critical path latency. More importantly, the resulting OA has a repeatable structure. Based on this property, the EFA is derived from the OA by employing the folding technique.

With the proposed EFA, the required hardware resources are reduced and hardware utilization is greatly increased. The critical path latency and the number of registers are also reduced. In addition, shift-add operations are adopted to implement the multiplications, which further reduces the hardware resources and cuts down the implementation complexity.
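The essay does not say how the coefficients are decomposed for shift-add multiplication. The sketch below makes two assumptions:

- Each lifting coefficient is quantized to a hypothetical 8 fractional bits.
- The quantized value is recoded in non-adjacent form (NAF), one common canonical-signed-digit recoding.

Under these assumptions, a multiplication by a constant becomes a few shifts and additions or subtractions. This is one possible decomposition, not necessarily the one used in the EFA.

```python
# Standard 9/7 lifting coefficients and scaling constant.
COEFFS = {
    "alpha": -1.586134342,
    "beta": -0.052980118,
    "gamma": 0.8829110762,
    "delta": 0.4435068522,
    "K": 1.149604398,
}
FRAC_BITS = 8  # hypothetical precision: each coefficient is stored as round(c * 2**8)


def quantize(coeff, frac_bits=FRAC_BITS):
    """Fixed-point integer representing coeff / 2**frac_bits."""
    return round(coeff * (1 << frac_bits))


def naf_terms(n):
    """Non-adjacent form of integer n as (digit, shift) pairs with digit in {+1, -1},
    so that n == sum(digit * 2**shift) and no two non-zero digits are adjacent."""
    terms, shift = [], 0
    while n != 0:
        if n & 1:
            digit = 2 - (n % 4)   # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= digit             # n is now divisible by 4
            terms.append((digit, shift))
        n >>= 1
        shift += 1
    return terms


def shift_add_multiply(x, terms):
    """Integer x times the recoded constant, using only shifts and adds/subtracts."""
    acc = 0
    for digit, shift in terms:
        if digit > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc


for name, coeff in COEFFS.items():
    c = quantize(coeff)
    terms = naf_terms(c)
    # The product is still scaled by 2**FRAC_BITS; a final right shift would remove it.
    assert all(shift_add_multiply(x, terms) == x * c for x in (-2048, -7, 0, 1, 123))
    print(f"{name}: {c} = {terms} ({len(terms)} non-zero signed digits)")
```

The fewer non-zero digits a coefficient has, the fewer adders its constant multiplier needs. The number of fractional bits trades accuracy against that count.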
ADVANTAGE OF LIFTING BASED ARCHITECTURE
Because lifting-based architectures have advantages over convolution-based ones in computational complexity and memory requirement, more attention has been paid to the lifting scheme. Several lifting-based architectures exist:

- The direct implementation has limitations in critical path latency and memory requirement.
- The flipping structure reduces the critical path latency by eliminating the multipliers on the path from the input node to the computation node, without hardware overhead. However, it involves a complex control procedure, and round-off noise has to be considered.

To solve these problems, in this project we propose an efficient folded architecture (EFA) for the lifting-based DWT. With the proposed EFA, the required hardware is reduced, and the critical path latency and the number of registers are reduced as well.
In our work, we take the 9/7 wavelet filters as an example to explain the proposed EFA. The performance comparisons and FPGA implementation results indicate the efficiency of the proposed architecture.
Basic FPGA Structure
The highest-capacity general-purpose logic chips available today are the traditional gate arrays, sometimes referred to as mask-programmable gate arrays (MPGAs). An MPGA consists of an array of prefabricated transistors. These can be customized into the user's logic circuit by connecting the transistors with custom wires. Customization is performed during chip fabrication by specifying the metal interconnect. As a result, a user who wants to employ an MPGA faces a large setup cost and a long manufacturing time.

Although MPGAs are clearly not field-programmable devices (FPDs), they motivated the design of their user-programmable equivalent, the FPGA. Like MPGAs, FPGAs comprise an array of uncommitted circuit elements, called logic blocks, and interconnect resources. The difference is that FPGA configuration is performed through programming by the end user. FPGAs have been responsible for a major shift in the way digital circuits are designed.
The figure above gives the logic capacity of various devices in terms of the equivalent number of 2-input NAND gates. Such a chart serves as a guide for selecting a specific device for a given application, depending on the logic capacity needed. Each type of FPGA is inherently better suited to some applications than to others. There are also special-purpose devices aimed at specific applications, such as state machines, analog gate arrays, and large interconnection problems.
The FPGA market is one of the fastest-growing segments of the semiconductor industry, and it is volatile. Because most companies are undergoing rapid changes, it is very difficult to say which products will be most suitable at any given time. Rather than discussing every type of FPGA, we describe only a few. Each description includes details such as capacity, given nominally in equivalent 2-input NAND gates as quoted by the vendor; gate count is a very important issue for FPGAs.
FPGAs fall into two categories: SRAM-based FPGAs and antifuse-based FPGAs. The main vendors of SRAM-based devices are Xilinx and Altera, and those of antifuse-based devices are Actel, QuickLogic, and Cypress. Here we discuss only Xilinx and Altera.
In Xilinx FPGAs, the basic structure is array based. Each chip consists of a two-dimensional array of logic blocks that can be interconnected through horizontal and vertical routing channels.

The first Xilinx FPGA was the XC2000 series, followed by three more series: the XC3000, XC4000, and XC5000. Although the XC3000 was widely used, the XC4000 is more often used nowadays. The XC5000 has the same features as the XC4000 but offers higher speed. Xilinx also recently introduced an antifuse-based family, the XC8100, which has many interesting features but is not yet widely used.

The XC4000 ranges from about 2,000 to more than 15,000 equivalent gates. Its logic block, called a configurable logic block (CLB), is based on look-up tables (LUTs). A LUT is a small one-bit-wide memory array: the address lines of the memory are the inputs of the logic block, and the one-bit output of the memory is the LUT output.
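The LUT description can be modeled directly. The sketch below does three things:

- It "configures" a LUT by storing a truth table in a 2^n × 1-bit memory.
- It "evaluates" the LUT by using the logic-block inputs as the memory address.
- It uses four inputs as an example and a hypothetical Boolean function; all names are illustrative.

```python
def make_lut(func, n_inputs):
    """Configure a LUT: store func's truth table in a 2**n_inputs x 1-bit memory."""
    memory = []
    for address in range(1 << n_inputs):
        inputs = [(address >> i) & 1 for i in range(n_inputs)]  # input i = address bit i
        memory.append(func(*inputs) & 1)
    return memory


def lut_read(memory, *inputs):
    """The logic-block inputs drive the address lines; the stored bit is the output."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return memory[address]


# Example: a 4-input LUT configured to compute (a AND b) XOR (c OR d).
lut = make_lut(lambda a, b, c, d: (a & b) ^ (c | d), 4)
print(len(lut))                    # 16
print(lut_read(lut, 1, 1, 0, 0))   # 1
print(lut_read(lut, 1, 1, 1, 0))   # 0
```

Any 4-input Boolean function fits in the same 16 bits. Only the memory contents change, which is why LUT-based logic blocks are so flexible.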
Altera's FLEX 8000 series has a three-level hierarchy much like that found in CPLDs. However, the lowest level of the hierarchy consists of a set of lookup tables rather than an SPLD-like block, so the FLEX 8000 is best viewed as a combination of FPGA and CPLD technologies. The FLEX 8000 is SRAM based and uses a four-input LUT as its basic logic block. Its capacity ranges from about 4,000 to more than 15,000 gates.
FPGA DESIGN FLOW
Design entry: two methods, HDL (Verilog or VHDL) or schematic drawings.
Synthesize: translate the .v, .vhd, and .sch source files into a netlist in the industry-standard EDIF format.
Implement design: translate, map, and place and route.
Configure FPGA: download the .bit file into the FPGA.
Custom ICs are sometimes designed to replace large amounts of glue logic, reducing system complexity and manufacturing cost and improving performance. However, custom ICs are very expensive to develop and take a long time to fabricate, which lengthens the time to market (TTM).
Two kinds of cost must be considered:

- Development cost, sometimes called non-recurring engineering (NRE) cost.
- Manufacturing cost.

Custom ICs are suitable only for products made in very high volume, over which the NRE can be amortized, and for products that are not sensitive to time to market.

FPGAs were introduced as an alternative to custom ICs, offering improved density relative to discrete MSI components. With the aid of computer-aided design (CAD) tools, circuits can be implemented in a short time compared with ASICs. There is no physical layout process, no mask making, and no IC manufacturing, which gives a lower NRE and a shorter TTM. FPGAs also compete with microprocessors in dedicated and embedded applications.

Reducing resource utilization, reducing switching activity, voltage scaling, and the parasitic capacitance of gates and of the programmable interconnect are all considerations that fall under power reduction.
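The NRE argument can be made concrete with a simple cost model: total cost = NRE + unit cost × volume. The numbers below are purely hypothetical. They are chosen only to show where the break-even volume between a custom IC and an FPGA comes from.

```python
import math


def total_cost(nre, unit_cost, volume):
    """Simple cost model: one-time NRE plus a per-unit manufacturing cost."""
    return nre + unit_cost * volume


def break_even_volume(nre_custom, unit_custom, nre_fpga, unit_fpga):
    """Volume above which the custom IC is cheaper overall than the FPGA."""
    if unit_fpga <= unit_custom:
        return math.inf  # the custom IC never recovers its higher NRE
    return (nre_custom - nre_fpga) / (unit_fpga - unit_custom)


# Hypothetical figures, for illustration only.
NRE_CUSTOM, UNIT_CUSTOM = 500_000, 5.0
NRE_FPGA, UNIT_FPGA = 20_000, 25.0

print(break_even_volume(NRE_CUSTOM, UNIT_CUSTOM, NRE_FPGA, UNIT_FPGA))  # 24000.0
for volume in (10_000, 50_000):
    fpga = total_cost(NRE_FPGA, UNIT_FPGA, volume)
    custom = total_cost(NRE_CUSTOM, UNIT_CUSTOM, volume)
    print(volume, "FPGA cheaper" if fpga < custom else "custom IC cheaper")
```

Below the break-even volume the FPGA's low NRE wins; above it the custom IC's lower unit cost wins. This model ignores time to market, which pushes the balance further toward FPGAs.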
APPLICATIONS OF FPGAs
FPGAs have gained recognition and grown rapidly over the past ten years because they can be applied to a very wide range of applications. These include random logic, integration of multiple SPLDs, device controllers, communication encoding and filtering, and small to medium-sized systems with SRAM blocks, among many others.
Other FPGA applications include prototyping of designs that will later be implemented in gate arrays, and emulation of complete large hardware systems. The first of these may be possible with a single large FPGA, which corresponds to a small gate array in capacity. The second would entail many FPGAs connected by some sort of interconnect. For hardware emulation, Quickturn has recently developed products that comprise many FPGAs, along with the software needed to partition and map circuits.
Another promising area for FPGA application is custom computing machines. Here the programmable parts are used to execute software, rather than compiling the software for execution on a regular CPU. The reader is referred to the FPGA-based custom computing workshop that has been held for the past four years.

On the other hand, a design mapped to an FPGA is broken into small, logic-block-sized pieces that are distributed across different areas of the FPGA. Depending on the interconnect resources of the FPGA, there may be delays associated with the interconnections between the logic blocks. For this reason, FPGA performance depends more on the CAD tools that map circuits into the chip than is the case for CPLDs.
In the future, programmable logic will become one of the dominant forms of digital logic design and implementation. The low cost of the devices makes them attractive to small firms and to small divisions of larger companies, and fast manufacturing turnaround is an essential element of success in many industries. As architectures and CAD tools improve, the disadvantages of FPDs compared with MPGAs will lessen, and FPDs will dominate.
The lifting scheme is an efficient way to construct the DWT. Generally, it consists of three steps: 1) split; 2) predict; and 3) update. Fig. 1 shows the block diagram of the lifting-based structure. The basic principle is to break up the polyphase matrix of the wavelet filters into a sequence of alternating upper and lower triangular matrices and a diagonal normalization matrix.
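The three steps can be illustrated with the Haar wavelet, whose lifting form is short enough to write out. It is used here as a stand-in for the 9/7 filter, which follows the same split/predict/update pattern with different predict and update operators and an extra scaling step. The sketch also shows why lifting is easy to invert: each step is undone by subtracting what was added. The function names are illustrative.

```python
def lifting_forward(x):
    # 1) Split into even and odd samples (the "lazy wavelet").
    even, odd = x[0::2], x[1::2]
    # 2) Predict each odd sample from its even neighbour; keep the prediction error.
    d = [o - e for e, o in zip(even, odd)]
    # 3) Update the even samples with the detail so that they hold the pairwise mean.
    s = [e + di / 2 for e, di in zip(even, d)]
    return s, d


def lifting_inverse(s, d):
    # Undo the steps in reverse order, subtracting what each step added.
    even = [si - di / 2 for si, di in zip(s, d)]
    odd = [di + e for di, e in zip(d, even)]
    x = []
    for e, o in zip(even, odd):  # merge: the inverse of the split
        x.extend([e, o])
    return x


x = [3, 7, 1, 1, -2, 5, 4, 4]
s, d = lifting_forward(x)
print(s)                          # [5.0, 1.0, 1.5, 4.0]
print(d)                          # [4, 0, 7, 0]
print(lifting_inverse(s, d) == x) # True
```

Perfect reconstruction holds whatever predict and update operators are chosen, because each step only adds a function of the other branch.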
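According to this principle, the polyphase matrix of the 9/7 wavelet can be factorized as

$$
P(z)=\begin{bmatrix}1 & \alpha(1+z^{-1})\\ 0 & 1\end{bmatrix}
\begin{bmatrix}1 & 0\\ \beta(1+z) & 1\end{bmatrix}
\begin{bmatrix}1 & \gamma(1+z^{-1})\\ 0 & 1\end{bmatrix}
\begin{bmatrix}1 & 0\\ \delta(1+z) & 1\end{bmatrix}
\begin{bmatrix}K & 0\\ 0 & 1/K\end{bmatrix}
\tag{1}
$$

Here, $\alpha(1+z^{-1})$ and $\gamma(1+z^{-1})$ are the predict polynomials, $\beta(1+z)$ and $\delta(1+z)$ are the update polynomials, and $K$ is the scale normalization constant. The lifting coefficients and the constant take the following values:

- $\alpha \approx -1.586134342$
- $\beta \approx -0.052980118$
- $\gamma \approx 0.8829110762$
- $\delta \approx 0.4435068522$
- $K \approx 1.149604398$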
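Given the input sequence $x_n$, $n = 0, 1, \ldots, N-1$, where $N$ is the length of the input sequence, the lifting procedure is carried out in four steps: splitting, the first lifting step, the second lifting step, and scaling. The two lifting steps are given below.

First lifting step:

$$
d_i^{(1)} = x_{2i+1} + \alpha\left(x_{2i} + x_{2i+2}\right), \qquad
s_i^{(1)} = x_{2i} + \beta\left(d_{i-1}^{(1)} + d_i^{(1)}\right)
\tag{3}
$$

Second lifting step:

$$
d_i = d_i^{(1)} + \gamma\left(s_i^{(1)} + s_{i+1}^{(1)}\right), \qquad
s_i = s_i^{(1)} + \delta\left(d_{i-1} + d_i\right)
\tag{4}
$$

Here $d_i^{(l)}$ and $s_i^{(l)}$ are intermediate data, where $l$ denotes the stage of the lifting step. The outputs $d_i$ and $s_i$, $i = 0, \ldots, \lfloor (N-1)/2 \rfloor$, are the high-pass and low-pass wavelet coefficients before normalization by $1/K$ and $K$.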
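A direct software model of the two lifting steps is a useful check on the equations before they are mapped to hardware. The sketch below is a floating-point reference with these assumptions:

- It uses the standard 9/7 lifting coefficients.
- The input has even length.
- Borders use whole-sample symmetric extension, since the essay does not state its boundary handling.
- Scaling by K and 1/K is left out.

```python
ALPHA, BETA = -1.586134342, -0.052980118      # first lifting step
GAMMA, DELTA = 0.8829110762, 0.4435068522     # second lifting step


def lift_97(x):
    """Both lifting steps of the 9/7 DWT for an even-length sequence x.

    Returns (s, d) before scaling. Out-of-range neighbours use whole-sample
    symmetric extension (assumed): x[N] -> x[N-2], d[-1] -> d[0], s[M] -> s[M-1].
    """
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("expected an even length of at least 4")
    m = n // 2

    def x_at(k):
        return x[k] if k < n else x[2 * (n - 1) - k]

    # First lifting step: predict with alpha, then update with beta.
    d1 = [x[2 * i + 1] + ALPHA * (x[2 * i] + x_at(2 * i + 2)) for i in range(m)]
    s1 = [x[2 * i] + BETA * (d1[max(i - 1, 0)] + d1[i]) for i in range(m)]

    # Second lifting step: identical structure with gamma and delta.
    d = [d1[i] + GAMMA * (s1[i] + s1[min(i + 1, m - 1)]) for i in range(m)]
    s = [s1[i] + DELTA * (d[max(i - 1, 0)] + d[i]) for i in range(m)]
    return s, d


s, d = lift_97([5.0] * 16)
print(max(abs(v) for v in d) < 1e-6)   # True: a constant has no high-pass content
```

As a sanity check, constant inputs and (away from the borders) linear inputs give high-pass coefficients that are zero up to coefficient rounding. This is expected because the 9/7 analysis high-pass filter has vanishing moments.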
From (3) and (4), it is clear that the first and second lifting steps can be implemented with the same architecture by alternating the lifting coefficients. The architecture for the first lifting step can therefore be multiplexed using the folding method, reducing hardware resources and area. Based on this idea, we propose a novel folded architecture for the lifting-based DWT.
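The folding idea can be mirrored in software. One lifting-step function plays the role of the shared predict/update datapath. A coefficient table plays the role of the multiplexer that alternates between (α, β) and (γ, δ). This is a behavioural sketch only, assuming an even-length input and mirrored borders. It says nothing about the cycle timing or register allocation of the actual EFA, and the names are hypothetical.

```python
# Coefficient "multiplexer": (predict, update) pair for each pass through the shared step.
COEFF_PAIRS = [
    (-1.586134342, -0.052980118),   # first lifting step: (alpha, beta)
    (0.8829110762, 0.4435068522),   # second lifting step: (gamma, delta)
]


def lifting_step(even, odd, p, u):
    """Shared predict/update datapath (behavioural model).

    d[i] = odd[i] + p * (even[i] + even[i+1])
    s[i] = even[i] + u * (d[i-1] + d[i])
    Out-of-range neighbours are mirrored: even[M] -> even[M-1], d[-1] -> d[0].
    """
    m = len(even)
    d = [odd[i] + p * (even[i] + even[min(i + 1, m - 1)]) for i in range(m)]
    s = [even[i] + u * (d[max(i - 1, 0)] + d[i]) for i in range(m)]
    return s, d


def folded_97(x):
    """Run the same lifting_step twice, switching coefficients between passes."""
    s, d = x[0::2], x[1::2]               # even samples feed s, odd samples feed d
    for p, u in COEFF_PAIRS:
        s, d = lifting_step(s, d, p, u)   # pass-1 outputs are fed back as pass-2 inputs
    return s, d


s, d = folded_97([5.0] * 16)
print(all(abs(v) < 1e-6 for v in d))   # True
```

Feeding the first pass's outputs back into the same function corresponds to feeding the intermediate data back into the pipeline registers. Only the coefficient pair changes between passes.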
Data flow of 9/7 lifting based DWT
With the processing described above, the data flow of the 9/7 lifting scheme can be optimized into a four-stage pipelined flow, as shown in Fig. 3(b). In this figure, the data read from the four delay registers (gray circles) are used for the current computation. Data D1, D2, D3, and D4 along the arrows are the candidate values for the delay registers. They are computed in the current cycle and will be used in the next cycle.
Based on the optimized data flow shown in Fig. 3(b), the corresponding OA can be obtained, as shown in Fig. 4. In this figure, the dashed line divides the architecture into two similar parts. Therefore, the left-side architecture can be multiplexed to replace the right-side one. In this way, we obtain the proposed EFA, shown in the dashed area of Fig. 5.
The following describes how the EFA processes the two lifting steps of the 9/7 filter. The intermediate data $d_i^{(1)}$ and $s_i^{(1)}$ obtained from the first lifting step are fed back to pipeline registers P1 and P2 and used for the second lifting step. As a result, the first and second lifting steps are interleaved, each selecting its own coefficients. In this procedure, two delay registers, D3 and D4, are needed in each lifting step for proper scheduling.
In the proposed architecture, the internal processing unit runs at twice the rate of the even (odd) input/output data. This means that the input/output data rate of the DWT processor is one sample per clock cycle. The proposed architecture needs only four adders and two multipliers, half as many as the architecture shown in Fig. 4.
For splitting, one delay register and two switches divide the input into odd and even sequences. The scaling unit consists of one multiplier and one multiplexer; by selecting the coefficient 1/K or K, the high-pass and low-pass coefficients are normalized. Fig. 5 shows the complete structure of the proposed EFA, including the splitting, lifting, and scaling steps.
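The splitting and scaling units can be sketched behaviourally under these assumptions:

- The low-pass output is multiplied by K and the high-pass output by 1/K, matching the diag(K, 1/K) normalization matrix.
- A generator stands in for the delay register and the switches.
- All names are illustrative; this models behaviour, not the actual circuit.

```python
K = 1.149604398


def split_stream(samples):
    """Serial splitter: a one-sample delay register plus a switch that toggles
    every clock. Yields (even, odd) pairs once both samples of a pair have arrived;
    a trailing unpaired even sample stays in the register and is not emitted."""
    delay = None
    for clock, sample in enumerate(samples):
        if clock % 2 == 0:
            delay = sample          # even sample waits in the delay register
        else:
            yield delay, sample     # switch routes the pair to the lifting core


def scale(value, is_low_pass):
    """One shared multiplier; the multiplexer selects K (low-pass) or 1/K (high-pass)."""
    coefficient = K if is_low_pass else 1.0 / K
    return value * coefficient


pairs = list(split_stream([10, 11, 12, 13, 14, 15]))
print(pairs)   # [(10, 11), (12, 13), (14, 15)]
```

With this convention, the unscaled low-pass gain of about 1.2301741 for a constant input becomes about √2 after scaling. That is the expected DC gain of an orthonormal-style normalization.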
It is well known that, in the lifting scheme, the way the intermediate data are processed determines the hardware scale and critical path latency of the implementing architecture. Here, parallel and pipeline techniques are used to process the intermediate data. The resulting architecture has a repeatable structure, so it can be improved further, leading to the EFA.
Fig. 4: Corresponding optimized architecture
EFFICIENT FOLDED ARCHITECTURE:
Fig. 5: Efficient folded architecture