Efficient Architecture For Various Image Computer Science Essay



Reconfigurable hardware offers significant potential for the efficient implementation of a wide range of computationally intensive signal and image processing algorithms. The advantages of using Field Programmable Gate Arrays (FPGAs) instead of DSPs include reductions in the size, weight and power required to implement the computational platform, together with higher performance. FPGA implementations are also preferred over ASIC implementations because FPGAs offer more flexibility and lower up-front cost.

Recently, Field Programmable Gate Array (FPGA) technology has become a viable target for the implementation of algorithms suited to video image processing applications. The unique architecture of the FPGA has allowed the technology to be used in many such applications encompassing all aspects of image processing. The goal of this thesis is to develop FPGA realizations of three such algorithms on two FPGA architectures.

Traditional DSP processor arrays, with fixed architectures and relatively short lifetimes, are costly to program line by line with thousands of lines of code. As an alternative, this paper presents a high-level abstract implementation method that bridges the current programming gap between coding parallel algorithms and the final FPGA implementation. The proposed FPGA implementation method is built on the Xilinx System Generator development tool within the ISE 11.3 development suite.

On the other hand, parallel multidimensional image filtering algorithms for the aerospace, defense, digital communications, multimedia, video and imaging industries demand very large numbers of computationally complex operations at the maximum sampling frequency.

Edge detection algorithm using ModelSim:

The HDL designer can use Link for ModelSim to simulate the HDL design in the Simulink environment using ModelSim, and compare the output of the HDL design to the output of the executable specification. Note that in this process, there is no need for generating an HDL test bench. The Simulink model feeds the input test vector to ModelSim through Link for ModelSim and extracts the data from ModelSim back to the Simulink environment. The HDL designer can readily verify whether the HDL code runs in accordance with the specifications.

The link between ModelSim and MATLAB is shown in Fig. 1.

In our example, the input to the edge detection algorithm has been a two-dimensional image of size 200x100. In a real-time system, the input is most likely not a matrix but a serial stream of data; for example, this serial stream of data can be generated by a Charge-Coupled Device (CCD). Therefore, we need to modify the structure of the design such that the edge detection algorithm accepts and performs 2D filtering on a serial stream of data.

Fig 1: Interlink of ModelSim with MATLAB

Convolution kernel: In general, let the original image, x(n1, n2), be of size (N x N) and the kernel, β(m1, m2), be of size (M x M). The output image, y(n1, n2), can then be expressed by the 2-D convolution formula:

$$y(n_1, n_2) = \sum_{m_1=0}^{M-1} \sum_{m_2=0}^{M-1} \beta(m_1, m_2)\, x(n_1 - m_1,\, n_2 - m_2)$$

where m1 and m2 run over the kernel indices. Moreover, the 2-D image is equally subdivided into small sub-images of size ((N/n) x (N/n)), which are independently convolved.
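The convolution sum and the independent sub-image convolution can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the output uses the "full" size (N + M - 1), and the sub-image results are recombined by overlap-add, which is an assumption since the original recombination formula did not survive in the text.

```python
def conv2d_full(image, kernel):
    """Direct 2-D convolution; output is (N + M - 1) on each side."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = [[0] * (n_cols + k_cols - 1) for _ in range(n_rows + k_rows - 1)]
    for n1 in range(n_rows):
        for n2 in range(n_cols):
            for m1 in range(k_rows):
                for m2 in range(k_cols):
                    # Each input pixel contributes kernel[m1][m2] * x to y(n1+m1, n2+m2).
                    out[n1 + m1][n2 + m2] += kernel[m1][m2] * image[n1][n2]
    return out


def conv2d_tiled(image, kernel, tile):
    """Convolve tile x tile sub-images independently, then overlap-add the results."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = [[0] * (n_cols + k_cols - 1) for _ in range(n_rows + k_rows - 1)]
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            sub = [row[c0:c0 + tile] for row in image[r0:r0 + tile]]
            part = conv2d_full(sub, kernel)  # independent of all other tiles
            for i, row in enumerate(part):
                for j, value in enumerate(row):
                    out[r0 + i][c0 + j] += value
    return out


if __name__ == "__main__":
    sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(conv2d_tiled(img, sobel_x, 2) == conv2d_full(img, sobel_x))
```

Because convolution is linear, the tiled result matches the direct one, which is what allows the sub-images to be processed in parallel.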



Nine 3x3 convolution kernels are used for the parallel 2-D MRI image filtering algorithms. One of the nine, the Edge algorithm, is empirically modified with a new matrix of orthogonal edge-enhancement kernels to enhance fine detail in images.


Xilinx System Generator:

Xilinx System Generator (XSG) [12,13] is an integrated design environment (IDE) for FPGAs within the ISE 11.3 development suite. It uses Simulink [14] as its development environment and is presented in the form of model-based design. It has an integrated design flow that moves directly from the Simulink design environment to the bitstream file (*.bit), which is necessary for programming the FPGA.

One of the most important features of XSG is its arithmetic abstraction: it works with fixed-point representations of arbitrary precision, including quantization and overflow handling. XSG can also run simulations in double precision as well as in fixed point. XSG automatically generates VHDL/Verilog code and an ISE project for the model being developed.
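To make the quantization and overflow behaviour concrete, the following Python sketch models a fixed-point conversion in software. It is only an illustration under stated assumptions: the function name `to_fixed` and its parameters are hypothetical and are not part of XSG, and the two quantization modes (round or truncate) and two overflow modes (saturate or wrap) are the generic textbook ones.

```python
import math


def to_fixed(value, word_bits, frac_bits, signed=True, saturate=True, rounding=True):
    """Quantize a real value to a fixed-point grid and return it as a float.

    word_bits is the total word length, frac_bits the number of fractional bits.
    """
    scale = 1 << frac_bits
    raw = value * scale
    # Quantization: round to nearest, or truncate towards minus infinity.
    q = math.floor(raw + 0.5) if rounding else math.floor(raw)
    if signed:
        lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << word_bits) - 1
    if saturate:
        q = max(lo, min(hi, q))           # clamp to the representable range
    else:
        q = (q - lo) % (1 << word_bits) + lo  # wrap around like two's complement
    return q / scale


if __name__ == "__main__":
    print(to_fixed(0.3, 8, 4))                    # rounded onto a 1/16 grid
    print(to_fixed(10.0, 8, 4))                   # saturates at the largest value
    print(to_fixed(10.0, 8, 4, saturate=False))   # wraps around instead
```

A signed 8-bit word with 4 fractional bits covers -8 to 7.9375 in steps of 1/16, so 10.0 cannot be represented and the overflow mode decides what is stored.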

Fig 2. Xilinx design flow


These parallel 2-D MRI image filtering algorithms can be behaviorally captured as a stream-based, model-based synchronous dataflow system using the System Generator libraries. The clock and its corresponding enable logic do not appear in the System Generator block diagram but are generated internally when the FPGA implementation is behaviorally compiled within the Xilinx/Simulink environment.

The 2-D convolution operation can be functionally implemented as an n-tap MAC FIR filter. Consequently, the parallel 2-D image filtering algorithms can be efficiently realized using n-tap MAC FIR filters with nine programmable coefficient sets. A more abstract implementation can be achieved using a 3x3 filter image block, as in Fig. 2.
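The idea of building a 2-D filter from 1-D MAC FIR filters can be sketched in Python as follows. This is a software illustration, not the XSG block itself: the function names are hypothetical, one 3-tap FIR is assumed per kernel row, and only the "valid" output region (no border padding) is computed.

```python
def fir_mac(samples, taps):
    """One FIR output computed as a multiply-accumulate loop over the taps."""
    acc = 0
    for sample, tap in zip(samples, taps):
        acc += sample * tap
    return acc


def filter3x3_via_fir(image, kernel):
    """3x3 convolution as three row-wise 3-tap FIR filters plus an adder stage."""
    # Convolution flips the kernel in both directions before the weighted sum.
    flipped = [row[::-1] for row in kernel[::-1]]
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            # One FIR per image line of the 3x3 neighbourhood.
            partial = [fir_mac(image[r + i][c:c + 3], flipped[i]) for i in range(3)]
            out_row.append(sum(partial))  # adders combine the per-line results
        out.append(out_row)
    return out


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    print(filter3x3_via_fir(img, sobel_x))
```

Changing the coefficient set passed as `kernel` changes the filter without changing the structure, which mirrors the programmable coefficient sets described above.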

The entire edge detection operation proposed here, using Simulink and Xilinx blocks, goes through three phases:

1. Image pre-processing blocks.

2. Edge detection using XSG.

3. Image post-processing blocks.

To design filters that meet hardware requirements, the image must be pre-processed before it reaches the main hardware architecture. In a software-level simulation using Simulink blocksets alone, where the image is used as a two-dimensional (2D) arrangement such as M x N, no image pre-processing is needed. At the hardware level, however, this matrix must be presented as a one-dimensional (1D) array, namely a vector, and this is what requires image pre-processing.
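The 2D-to-1D conversion, and the inverse step needed later for display, can be sketched in a few lines of Python. The function names are hypothetical, and row-major (raster) ordering is an assumption; the ordering used by the actual Simulink conversion blocks may differ.

```python
def serialize(image):
    """Flatten an M x N image into a 1-D pixel stream in raster (row-major) order."""
    return [pixel for row in image for pixel in row]


def deserialize(stream, rows, cols):
    """Rebuild the M x N image from the 1-D stream (the post-processing direction)."""
    if len(stream) != rows * cols:
        raise ValueError("stream length does not match the image size")
    return [stream[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6]]
    stream = serialize(img)
    print(stream)
    print(deserialize(stream, 2, 3) == img)
```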

The implementation diagram consists of three stages: MRI input, processing and output. In the first stage, the magnetic resonance imaging (MRI) pixels are sequentially sub-streamed into three Virtex line buffers via a pipelined gateway block. Each line is delayed by 64 samples, and line 3 is a copy of the MRI scan. The second stage consists of a structure of five parallel n-tap MAC FIR filters and four adder blocks, which can be abstractly provided by the 3x3 filter block, as shown in Fig. 2, to filter the 64x64 grayscale MRI scan.
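The role of the line buffers can be illustrated with a small Python model of a raster pixel stream. It is a sketch under stated assumptions rather than the hardware design: the function name is hypothetical, each line buffer is modelled as a delay of one image width (64 samples for the 64x64 scan), and border pixels are simply skipped.

```python
from collections import deque


def stream_windows(pixels, width):
    """Yield (row, col, 3x3 window) for each interior pixel of a raster stream."""
    line1 = deque([0] * width)  # delays the stream by one image line
    line2 = deque([0] * width)  # delays it by a second image line
    window = [[0, 0, 0] for _ in range(3)]  # 3-sample shift register per line
    for index, pixel in enumerate(pixels):
        one_line_ago = line1.popleft()
        line1.append(pixel)
        two_lines_ago = line2.popleft()
        line2.append(one_line_ago)
        # Shift the newest sample of each line into its register.
        for taps, sample in zip(window, (two_lines_ago, one_line_ago, pixel)):
            taps.pop(0)
            taps.append(sample)
        row, col = divmod(index, width)
        if row >= 2 and col >= 2:
            # The window is centred one line and one sample behind the input.
            yield row - 1, col - 1, [taps[:] for taps in window]


if __name__ == "__main__":
    stream = list(range(16))  # a 4x4 test image in raster order
    for r, c, win in stream_windows(stream, 4):
        print(r, c, win)
```

The undelayed input plays the part of the line that is a copy of the scan, while the two delayed copies supply the two lines above it, so a full 3x3 neighbourhood is available on every clock without storing the whole image.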

Nine different 2-D FIR filters can be applied via the 3x3 filter block. The nine filters are Edge, SobelX, SobelY, SobelXY, Blur, Smooth, Sharpen, Gaussian and Identity.
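Selecting a filter by name can be pictured as a lookup into a table of coefficient masks. The Python sketch below shows only three of the nine filters, and the coefficients are the common textbook Sobel and identity masks, which is an assumption: the values actually stored in the 3x3 filter block are not given in the text. The names `KERNELS` and `apply_mask` are hypothetical.

```python
# Textbook masks, assumed for illustration; not taken from the filter block.
KERNELS = {
    "Identity": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    "SobelX": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "SobelY": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
}


def apply_mask(window, name):
    """Weighted sum of a 3x3 pixel window with the named mask (no kernel flip)."""
    mask = KERNELS[name]
    return sum(mask[i][j] * window[i][j] for i in range(3) for j in range(3))


if __name__ == "__main__":
    vertical_edge = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
    for name in KERNELS:
        print(name, apply_mask(vertical_edge, name))
```

On a vertical step edge the horizontal-gradient mask responds strongly while the vertical-gradient mask gives zero, which is the behaviour that distinguishes the SobelX and SobelY outputs.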

Thus, the stored coefficients can be modified by changing the mask of the 3x3 FIR filter. Each n-tap MAC FIR filter is clocked 5 times faster than the input rate, and the 3x3 filter operates at 213 MHz. The throughput of the design is therefore 213 MHz / 5 = 42.6 million pixels/second.

For the 64x64 MRI image, this is 42.6x10^6 / (64x64) ≈ 10,400 frames/sec.
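The throughput arithmetic can be checked with a short Python helper. The function name and parameter names are hypothetical; the numbers are the ones quoted in the text.

```python
def throughput(clock_hz, clocks_per_pixel, width, height):
    """Return (pixels per second, frames per second) for a streaming filter."""
    pixels_per_second = clock_hz / clocks_per_pixel
    frames_per_second = pixels_per_second / (width * height)
    return pixels_per_second, frames_per_second


if __name__ == "__main__":
    pps, fps = throughput(213e6, 5, 64, 64)
    print(f"{pps / 1e6:.1f} Mpixels/s, {fps:.0f} frames/s")
```

The same helper shows how the frame rate would scale with image size: the pixel rate is fixed by the clock, so a larger frame lowers the frame rate in proportion to its pixel count.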

Fig 3. Xilinx System Generator capture of the nine parallel 2-D image filtering algorithms.

Fig 4. The 2-D MRI images filtered, via the Virtex-6 X240T, using the 2-D filter types: A. Edge, B. SobelX, C. SobelY, D. SobelXY, E. Blur, F. Smooth, G. Sharpen, H. Gaussian, I. Identity.

The third stage is pipelined by inserting a delay block between the 3x3 filter and the gateway boundary block. The output is displayed via a Simulink block (Fig. 3) that shows the original MRI image together with the filtered result, as shown in Fig. 4.

Any of the nine 2-D filter types can be selected by changing the mask parameter on the 3x3 filter block, and the filters can also be modified.

The single System Generator diagram in Fig. 2 is behaviorally equivalent to 7140 lines of VHDL code and 8423 lines of Verilog code. In a traditional HDL flow, those thousands of lines must be manually verified, refined and re-entered line by line, which can waste valuable time. Consequently, this paper develops and proposes an FPGA implementation method.


The developed method is a high-level FPGA implementation method for any DSP algorithm that avoids the drawbacks of traditional HDL programming.

The method has only five simple steps, namely:

1. State the DSP algorithm.

2. Structure the DSP algorithm architecture.

3. Capture the algorithm using Xilinx System Generator.

4. Quality of results is verified, refined and


5. FPGA bit stream generation.

Region of interest:

The ROI feature is added in the post-processing blockset. The image post-processing blocks, which convert the output image back to floating-point type, are shown in Fig. 5.

Post-processing uses a Buffer block, which converts scalar samples to frame output at a lower sampling rate, followed by a 1D to 2D (matrix) format signal block. Finally, a sink built from Simulink blocksets displays the output image on the monitor. The proposed design architecture has also been used in an application-oriented design by adding appropriate image post-processing blocks, as shown in Fig. 6. The added features include a region of interest (ROI) section, which defines the shape and position of the ROI, and statistical feature extraction for the analysis of different tissues. The textural statistics that can differentiate the tissues, such as mean, variance and standard deviation, are computed using the following equation

ROI extraction and statistical analysis are shown below.
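As a software illustration of ROI extraction followed by statistical feature extraction, the Python sketch below crops a region and computes its mean, variance and standard deviation. Several points are assumptions, since the original equation and figures are not available: the ROI is taken to be a rectangle, the variance is the population variance (divided by the number of pixels), and the function name `roi_stats` is hypothetical.

```python
import math


def roi_stats(image, top, left, height, width):
    """Crop a rectangular ROI and return (roi, mean, variance, standard deviation)."""
    roi = [row[left:left + width] for row in image[top:top + height]]
    values = [pixel for row in roi for pixel in row]
    count = len(values)
    mean = sum(values) / count
    # Population variance: average squared deviation from the mean.
    variance = sum((v - mean) ** 2 for v in values) / count
    return roi, mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    roi, mean, variance, std = roi_stats(img, top=1, left=1, height=2, width=2)
    print(roi, mean, variance, round(std, 4))
```

Comparing these statistics between regions is one simple way to tell tissue types apart, since regions with different texture tend to differ in mean intensity and spread.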


The Xilinx System Generator tool is a new application in image processing and offers a model-based design approach. The filters are designed with blocks, and the tool even supports MATLAB code through user-customizable blocks. Its GUI environment also makes designing easier. The tool supports software simulation but, most importantly, it generates the files needed for implementation on all Xilinx FPGAs, offering parallelism, robustness, speed and automatic area minimization. These features are essential in real-time image processing.

Furthermore, the implementation frequency on the X240T FPGA increased from 194 MHz to 229 MHz at roughly the same total power consumption of 1.56 W. On the other hand, the power consumption of the X130T FPGA is comparatively lower, at 0.96 W, at a maximum frequency of 228 MHz.

The complete design is implemented on a Xilinx Virtex-6 X240T FPGA board. In the convolution operation, the original image is convolved with 3x3 kernels. Using a 3x3 kernel rather than a 5x5 kernel reduces hardware complexity, since each output pixel needs 9 multiply-accumulate operations instead of 25. The ROI is applied to the brain image; the corresponding figures are given below.

The ROI-extracted image is shown below.


The ROI feature is added to the filtered image to find the location of the defect. Comparing the region of interest with the previously filtered image makes it easier to find the location of the defect.

The design architecture used in this paper can be used with any Xilinx FPGA kit, given proper user configuration in the System Generator block, and could be extended to real-time image processing.