Environmental systems research institute

CHAPTER 1

INTRODUCTION

1.1 Company Overview

Environmental Systems Research Institute (ESRI), founded by Jack and Laura Dangermond in 1969, is the world leader in Geographic Information System (GIS) technology. ESRI has evolved from a small consulting firm specializing in land use analysis projects into the largest research and development organization dedicated to GIS. The worldwide headquarters of ESRI are located in a multi-campus environment in Redlands, California.

ESRI provides GIS solutions and support to small businesses, large corporations, non-government organizations (NGOs) and governments at all levels. The suite of GIS products offered by ESRI is referred to as ArcGIS. ArcGIS features client-level applications, server-level software and applications for mobile devices. ArcGIS also comes bundled with developer products and web services.

ESRI has offices throughout the United States; a business partner program with more than 2,000 developers, consultants, resellers, and data providers; and a network of more than 80 international distributors with more than a million users in more than 200 countries.

1.2 ESRI Software Products

ArcGIS is the flagship software suite of ESRI and is an integrated collection of GIS software products for building and deploying a complete GIS solution wherever needed—on desktops, servers, or custom applications; over the Web; or in the field.

Whether you need to perform spatial analysis, manage large amounts of spatial data, or produce cartographically appealing maps to aid in decision making, ArcGIS allows you to use one common platform to meet all your GIS needs. And because ArcGIS is built using technology standards, it will integrate well with your existing systems.

1.2.1 ArcGIS Desktop

ArcGIS Desktop is software that allows you to discover patterns, relationships, and trends in your data that are not readily apparent in databases, spreadsheets, or statistical packages.

Beyond showing you your data as points on a map, ArcGIS Desktop gives you the power to manage and integrate your data, perform advanced analysis, model and automate operational processes, and display your results on professional-quality maps.

ArcGIS Desktop Products include:

1.2.2 Server GIS

ArcGIS Server enables you to distribute maps, models, and tools to others within your organization and beyond in a way that fits well into their workflows. Server GIS products allow GIS functionality and data to be deployed from a central environment. Server GIS complements ArcGIS Desktop by allowing GIS analysts to author cost-effective maps, globes, and geoprocessing tasks on their desktops and publish them via a server using integrated tools. GIS functions can then be delivered as services throughout the enterprise.

With ArcGIS Server the customer can:

1.2.3 Mobile GIS

ArcGIS Mobile is a geographic information system (GIS) software platform that enables organizations to deliver GIS data and services from centralized servers, providing real-time access to information over wireless networks to a range of Mobile devices. These include:

ESRI provides two major Mobile GIS products:

1.2.4 Online GIS

ArcGIS Online: With ArcGIS Online you can:

CHAPTER 2

Geographical Information System (GIS)

2.1 Overview

A geographic information system (GIS) is an integrated collection of computer software and data used to view and manage information about geographic places, analyze spatial relationships, and model spatial processes. A GIS provides a framework for gathering and organizing spatial data and related information so that it can be displayed and analyzed.

GIS gives you tools to analyze your data and see the results in the form of powerful, interactive maps that reveal how things work together, allowing you to make the most informed decisions possible.

GIS is a rapidly growing technological field that incorporates graphical features with tabular data in order to assess real-world problems. What is now the GIS field began around 1960, with the discovery that maps could be programmed using simple code and then stored in a computer, allowing for future modification when necessary. This was a welcome change from the era of hand cartography, when maps had to be painstakingly created by hand and even small changes could require the creation of a new map. The earliest version of a GIS was known as computer cartography and involved simple linework to represent land features. From that evolved the concept of overlaying different mapped features on top of each other to determine patterns and causes of spatial phenomena.

The capabilities of GIS are a far cry from the simple beginnings of computer cartography. At the simplest level, GIS can be thought of as a high-tech equivalent of a map. However, not only can paper maps be produced far more quickly and efficiently, but the storage of data in an easily accessible digital format also enables complex analysis and modeling not previously possible. The reach of GIS expands into all disciplines, and it has been used for problems ranging from prioritizing sensitive species habitat to determining optimal real estate locations for new businesses.

The key word in this technology is geography. This usually means that the data (or at least some proportion of the data) is spatial; in other words, it is in some way referenced to locations on the earth. Coupled with this data is usually tabular data known as attribute data. Attribute data is generally defined as additional information about each of the features, which can then be tied to the spatial data. Information about a city (like Dubai) is an example. The actual location of the city (its latitude, longitude, etc.) is the spatial data. Additional data such as the city name, population, number of vehicles, etc. make up the attribute data. It is the partnership of these two data types that enables GIS to be such an effective problem-solving tool through spatial analysis.

GIS operates on many levels. On the most basic level, GIS is used as computer cartography, i.e. mapping. The real power in GIS is through using spatial and statistical methods to analyze attribute and geographic information. The end result of the analysis can be derivative information, interpolated information or prioritized information. [3]

2.2 Defining a GIS

Different firms involved in the GIS field have provided varying definitions of the term “GIS”. Some of them are given below:

“In the strictest sense, a GIS is a computer system capable of assembling, storing, manipulating, and displaying geographically referenced information, i.e. data identified according to their locations. Practitioners also regard the total GIS as including operating personnel and the data that go into the system.”

- USGS

“A geographic information system (GIS) is a computer-based tool for mapping and analyzing things that exist and events that happen on earth. GIS technology integrates common database operations such as query and statistical analysis with the unique visualization and geographic analysis benefits offered by maps.”

- ESRI

“GIS is an integrated system of computer hardware, software, and trained personnel, linking topographic, demographic, utility, facility, image and other resource data that is geographically referenced.”

- NASA [3]

2.3 Applications of GIS

Computerized mapping and spatial analysis have been developed simultaneously in several related fields. The present status would not have been achieved without close interaction between various fields such as utility networks, cadastral mapping, topographic mapping, thematic cartography, surveying and photogrammetric remote sensing, image processing, computer science, rural and urban planning, earth science, and geography.

The GIS technology is rapidly becoming a standard tool for management of natural resources. The effective use of large spatial data volumes is dependent upon the existence of an efficient geographic handling and processing system to transform this data into usable information.

The GIS technology is used to assist decision-makers by indicating various alternatives in development and conservation planning and by modeling the potential outcomes of a series of scenarios. It should be noted that any task begins and ends with the real world. Data are collected about the real world. Of necessity, the product is an abstraction; it is not possible (and not desired) to handle every last detail. After the data are analyzed, information is compiled for decision-makers. Based on this information, actions are taken and plans implemented in the real world.

Major areas of application include:

Different streams of planning

Urban planning, housing, transportation planning, architectural conservation, urban design, and landscape.

Street Network Based Application

Address matching, vehicle routing and scheduling, location and site selection, and disaster planning.

Natural Resource Based Application

Management and environmental impact analysis of wild and scenic recreational resources, flood plain, wetlands, aquifers, forests, and wildlife.

View Shed Analysis

Siting of hazardous or toxic factories and groundwater modeling; wildlife habitat study and migration route planning.

Land Parcel Based

Zoning, sub-division plans review, land acquisition, environment impact analysis, nature quality management and maintenance etc.

Facilities Management

Locating underground pipes and cables for maintenance and planning; tracking energy use. [5]

2.4 Tasks Performed by a GIS

A general-purpose GIS essentially performs six processes or tasks:

Input

Before geographic data can be used in a GIS, the data must be converted into a suitable digital format. The process of converting data from paper maps into computer files is called digitizing.

Modern GIS technology can automate this process fully for large projects using scanning technology; smaller jobs may require some manual digitizing (using a digitizing table). Today many types of geographic data already exist in GIS-compatible formats. These data can be obtained from data suppliers and loaded directly into a GIS.

Manipulation

It is likely that data types required for a particular GIS project will need to be transformed or manipulated in some way to make them compatible with your system. For example, geographic information is available at different scales (detailed street centerline files; less detailed census boundaries; and postal codes at a regional level). Before this information can be integrated, it must be transformed to the same scale (degree of detail or accuracy). This could be a temporary transformation for display purposes or a permanent one required for analysis. GIS technology offers many tools for manipulating spatial data and for weeding out unnecessary data.

Management

For small GIS projects it may be sufficient to store geographic information as simple files. However, when data volumes become large and the number of data users becomes more than a few, it is often best to use a database management system (DBMS) to help store, organize, and manage data. A DBMS is nothing more than computer software for managing a database.

There are many different designs of DBMSs, but in GIS the relational design has been the most useful. In the relational design, data are stored conceptually as a collection of tables. Common fields in different tables are used to link them together. This surprisingly simple design has been so widely used primarily because of its flexibility and very wide deployment in applications both within and without GIS.
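The relational idea described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names, field names and rows (parcels, owners, parcel_id and so on) are hypothetical and do not come from any ESRI product.

```python
import sqlite3

# In-memory database; both tables and all rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, zoning TEXT);
    CREATE TABLE owners  (parcel_id INTEGER, owner_name TEXT);
    INSERT INTO parcels VALUES (1, 'residential'), (2, 'commercial');
    INSERT INTO owners  VALUES (1, 'A. Example'), (2, 'B. Example');
""")

# The common field parcel_id links the two tables together.
rows = conn.execute("""
    SELECT p.parcel_id, p.zoning, o.owner_name
    FROM parcels AS p
    JOIN owners AS o ON o.parcel_id = p.parcel_id
    ORDER BY p.parcel_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Each table stays simple on its own; the link exists only through the shared field, which is what makes the design flexible.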

Query and Analysis

Once you have a functioning GIS containing your geographic information, you can begin to ask simple questions such as

And analytical questions such as

GIS provides both simple point-and-click query capabilities and sophisticated analysis tools to provide timely information to managers and analysts alike. GIS technology really comes into its own when used to analyze geographic data to look for patterns and trends and to undertake "what if" scenarios. Modern GISs have many powerful analytical tools, but two are especially important:

§ Proximity Analysis

To answer such questions, GIS technology uses a process called buffering to determine the proximity relationship between features.
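A buffer query on point features can be sketched as a simple distance test. The sketch below assumes planar (projected) coordinates in metres and uses hypothetical feature names and positions; a real GIS would also buffer lines and polygons and would use a spatial index.

```python
import math

def within_buffer(center, features, radius):
    """Return the names of point features lying within `radius` of `center`.

    Coordinates are assumed to be planar (for example metres in a
    projected coordinate system), so straight-line distance is valid.
    """
    cx, cy = center
    return [name for name, (x, y) in features.items()
            if math.hypot(x - cx, y - cy) <= radius]

# Hypothetical point features and query site.
schools = {
    "school_a": (100.0, 100.0),
    "school_b": (400.0, 300.0),
    "school_c": (130.0, 140.0),
}
site = (100.0, 120.0)

nearby = within_buffer(site, schools, radius=50.0)
print(nearby)  # ['school_a', 'school_c']
```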

§ Overlay Analysis

The integration of different data layers involves a process called overlay. At its simplest, this could be a visual operation, but analytical operations require one or more data layers to be joined physically. This overlay, or spatial join, can integrate data on soils, slope, and vegetation, or land ownership with tax assessment.
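For raster layers, an overlay can be sketched as a cell-by-cell combination of two grids that cover the same area at the same resolution. The layer contents and the suitability rule below are hypothetical; vector overlays (polygon on polygon) need geometric intersection and are not shown.

```python
def overlay(layer_a, layer_b, combine):
    """Combine two co-registered grids cell by cell using `combine`."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(ra) == len(rb) for ra, rb in zip(layer_a, layer_b))
    if not same_shape:
        raise ValueError("layers must have identical dimensions")
    return [[combine(a, b) for a, b in zip(ra, rb)]
            for ra, rb in zip(layer_a, layer_b)]

# Hypothetical 2x2 layers: soil type and slope in percent.
soil = [["clay", "sand"],
        ["sand", "loam"]]
slope = [[2, 12],
         [4, 3]]

# Hypothetical rule: a cell is suitable if it is not clay and slope < 10.
suitable = overlay(soil, slope, lambda s, g: s != "clay" and g < 10)
print(suitable)  # [[False, False], [True, True]]
```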

Visualization

For many types of geographic operation the end result is best visualized as a map or graph. Maps are very efficient at storing and communicating geographic information. While cartographers have created maps for millennia, GIS provides new and exciting tools to extend the art and science of cartography. Map displays can be integrated with reports, three-dimensional views, photographic images, and other output such as multimedia.

2.5 Future of GIS

Many disciplines can benefit from GIS technology. An active GIS market has resulted in lower costs and continual improvements in the hardware and software components of GIS. These developments will, in turn, result in a much wider use of the technology throughout science, government, business, and industry, with applications including real estate, public health, crime mapping, national defense, sustainable development, natural resources, landscape architecture, archaeology, regional and community planning, transportation and logistics. GIS is also diverging into location-based services (LBS). LBS allows GPS enabled mobile devices to display their location in relation to fixed assets (nearest restaurant, gas station, fire hydrant), mobile assets (friends, children, police car) or to relay their position back to a central server for display or other processing. These services continue to develop with the increased integration of GPS functionality with increasingly powerful mobile electronics (cell phones, PDAs, laptops).

CHAPTER 3

Digital Image Processing

3.1 Overview

An image may be defined as a two-dimensional function, f(x,y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity or gray level of the image at that point. When x, y and the intensity values are all finite, discrete quantities, we call the image a digital image. The field of digital image processing refers to the processing of digital images on a digital computer, treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Note that a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called picture elements, image elements, pels, or pixels.

3.2 Implementation and application

Digital Image processing finds application in:

3.2.1 Feature extraction

When the amount of input data is too large to be processed directly, it is transformed into a reduced representation set of features (also called a feature vector). This procedure is referred to as feature extraction. In image processing this technique is used in areas which involve detection and isolation of various desired portions or shapes (features) of a digitized image or video stream.

In GIS this technique is used for such algorithms:

3.2.2 Pattern recognition

Pattern recognition involves input of raw data, followed by application of feature extraction algorithms, followed by classification algorithms, and finally action based on the evaluated result. The classification or description scheme is usually based on the availability of a set of patterns that have already been classified or described.
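The stages named above (raw data, feature extraction, classification against already classified patterns) can be sketched in a few lines. The features (mean and range of gray levels), the nearest-neighbour rule and the labelled patterns are all assumptions chosen for brevity, not a method taken from any particular GIS package.

```python
def extract_features(pixels):
    """Reduce a list of gray levels to a two-element feature vector (mean, range)."""
    return (sum(pixels) / len(pixels), max(pixels) - min(pixels))

def classify(features, labelled):
    """Return the label of the closest already classified feature vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled, key=lambda item: dist2(item[0], features))[1]

# Hypothetical set of patterns that have already been classified.
training = [((20.0, 10.0), "water"), ((200.0, 60.0), "urban")]

# Raw data: a small patch of gray levels (hypothetical).
patch = [18, 25, 22, 30]

label = classify(extract_features(patch), training)
print(label)  # water
```

The final "action" stage would use the label, for example to colour the patch in a classified output image.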

3.2.3 Projection

Projection has a number of meanings in image processing. It can refer to the simple display of an image on a hardware device. It can also refer to the reduction of a three-dimensional surface to a flat map, or vice versa.

3.2.4 Convolution

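In this method the image is treated with a convolution filter. The filter is a small two-dimensional matrix. It is multiplied, element by element, with an equally sized block of pixels from the image, and the products are summed to give one output pixel. The filter is then moved to the next block, and this is repeated until the filter has been applied to the entire image.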

EXAMPLE 1:

The following is a 5X5 matrix which is used to perform the sharpen function.

EXAMPLE 2:

An example small grayscale image (10x10):

 34   22   77   48  237  205   29  212  107   41
 50  150   77  158  233  251  112  165   47  229
 93    0   77  219   43   56   42  113  140   94
 32   19   44   30   36   94  151  101   28   84
 10   90   48   73   63  148  159  183   99   22
192   70   27   88   20  230   53   34   38  106
239  202  196  205   50  123  192   88   41   37
230  174   14   22  127  100  189  186  214  187
227   86  195    6   53  168   46  166   36  249
215  165  237  110  125  191  191   94  123    8

An example convolution filter for line detection:

-1  -1  -1
-1   8  -1
-1  -1  -1

The row=2, column=2 pixel and its neighborhood from the image above:

 34   22   77
 50  150   77
 93    0   77

To apply the convolution filter, multiply each filter value by the corresponding value in the image data block. Work with each pixel and its 3x3 neighborhood:

-1*34   -1*22   -1*77
-1*50    8*150  -1*77
-1*93   -1*0    -1*77

Then sum all the values:
(-34)+(-22)+(-77)+
(-50)+(1200)+(-77)+
(-93)+(0)+(-77) = 770

Divide by the divisor and add the bias.
(770/divisor)+bias=770 (in this example divisor=1, bias=0)

If the new pixel value is > 255 set it to 255
If the new pixel value is < 0 set it to 0

The new pixel value is 255 (770 is greater than 255, so it is set to 255). Store that in a new image:

 34   22   77
 50  255   77
 93    0   77

Continue with all other 3x3 blocks in the image, using the original values. For example, the next image block could be:

 22   77   48
150   77  158
  0   77  219

Note that the 3x3 "window" is shifted to the right by one, and that the new pixel value is NOT used as input but is stored in a second, new image. Most of the image is processed in this manner. Image borders create problems and are ignored.
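The worked example above can be reproduced with a short Python sketch. It uses the first three rows of the 10x10 image and the 3x3 filter from the text. The function names are hypothetical, indices are zero-based (so the text's row=2, column=2 pixel is image[1][1]), and border pixels are simply copied unchanged, which is one assumed way of "ignoring" them. The filter is applied without flipping, as in the text; for this symmetric filter that makes no difference.

```python
def weighted_sum(image, kernel, r, c):
    """Sum of filter values times the 3x3 neighbourhood centred on (r, c)."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def apply_filter(image, kernel, divisor=1, bias=0):
    """Apply a 3x3 filter, writing results to a new image (originals are read only)."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]          # borders are copied unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = int(weighted_sum(image, kernel, r, c) / divisor) + bias
            out[r][c] = max(0, min(255, value))   # clamp to the 8-bit range
    return out

# First three rows of the example image from the text.
IMAGE = [
    [34, 22, 77, 48, 237, 205, 29, 212, 107, 41],
    [50, 150, 77, 158, 233, 251, 112, 165, 47, 229],
    [93, 0, 77, 219, 43, 56, 42, 113, 140, 94],
]
KERNEL = [[-1, -1, -1],
          [-1, 8, -1],
          [-1, -1, -1]]

print(weighted_sum(IMAGE, KERNEL, 1, 1))  # 770, before clamping
result = apply_filter(IMAGE, KERNEL)
print(result[1][1])                       # 255
```

Because every output pixel is computed only from the original image, the pixels are independent of one another, which is what makes this operation a good candidate for parallel execution.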

3.3 GDAL: Geospatial Data Abstraction Library

GDAL is a translator library for raster geospatial data formats. It enables us to perform image reads and writes on a raster level, block level and even on the pixel level. As a library, it presents a single abstract data model to the calling application for all supported formats.

GDAL is the standard library for most GIS software, including ArcGIS. All current versions of ArcGIS make use of the GDAL 1.4.1 library to access rasters.

CHAPTER 4

GPGPU AND CUDA

4.1 General-Purpose computation on Graphics Processing Units

GPGPU stands for General-Purpose computation on Graphics Processing Units. Graphics Processing Units (GPUs) are high-performance many-core processors that can be used to accelerate a wide range of applications.

This concept works by tricking the GPU into general-purpose computing by casting problems as graphics, turning data into images (“texture maps”) and turning algorithms into image synthesis (“rendering passes”).

The increase in computing speed is not without overheads. All the algorithms need to be converted into “texture processing form” in order to utilize the full power of the GPU. Also, these chips are designed for and driven by video game development; the programming model is unusual, resources are tightly constrained, and the underlying architectures are largely secret.

4.2 Compute Unified Device Architecture

NVIDIA CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex computational problems in a fraction of the time required on a CPU.

4.2.1 CPU vs. GPU

Image sampling processes involve a lot of Floating-Point Operations and intensive Memory Bandwidth usage.

The reason behind the discrepancy in floating-point capability between the CPU and the GPU is that the GPU is specialized for compute-intensive, highly parallel computation - exactly what graphics rendering is about - and therefore designed such that more transistors are devoted to data processing rather than data caching and flow control.

More specifically, the GPU is especially well-suited to address problems that can be expressed as data-parallel computations with high arithmetic intensity. Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control; and because it is executed on many data elements and has high arithmetic intensity, the memory access latency can be hidden with calculations instead of big data caches.

Data-parallel processing maps data elements to parallel processing threads. Many applications that process large data sets can use a data-parallel programming model to speed up the computations. In 2D and 3D rendering, large sets of pixels and vertices are mapped to parallel threads. Similarly, image applications such as post-processing of rendered images, encoding and decoding, image scaling, and pattern recognition can map image blocks and pixels to parallel processing threads.
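The mapping of data elements to parallel workers can be illustrated on the CPU with Python's standard thread pool. This is only an analogy for the programming model: Python threads do not run this arithmetic truly in parallel, and a GPU would launch one lightweight thread per pixel instead. The stretch function and its limits (50 and 200) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def stretch(pixel, low=50, high=200):
    """Linear contrast stretch of one 8-bit value; limits are hypothetical."""
    scaled = (pixel - low) * 255 // (high - low)
    return max(0, min(255, scaled))

# One short row of gray levels (hypothetical values).
row = [34, 50, 125, 200, 237]

# The same function is applied to every element; no element depends on another.
with ThreadPoolExecutor(max_workers=4) as pool:
    stretched = list(pool.map(stretch, row))

print(stretched)  # [0, 0, 127, 255, 255]
```

The key property is that the same program runs on every element with no dependency between elements, so the work can be split across as many workers as the hardware offers.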

GPU clusters have recently gone mainstream and allow the user to scale computational power further by using several GPUs working in tandem.

4.2.2 CUDA: a General-Purpose Parallel Computing Architecture

The CUDA architecture and programming model have exposed more of the flexibility of GPU hardware. The Compute Unified Device Architecture (CUDA) allows programmers to write C programs which no longer require knowledge of, or dependence on, the graphics pipeline. Programmers can now concentrate on parallelism without bothering about the low-level details. CUDA is supported on NVIDIA GeForce 8 Series and later cards.

CUDA brings to the table several significant upgrades which allow the full power of the GPU to be harnessed. Unlike traditional GPU APIs, CUDA exposes a fast shared memory region that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture look-ups. Perhaps the most important feature of CUDA is its automatic thread manager, which takes care of thread creation and scheduling. Application programmers therefore do not need to write threaded code explicitly, which avoids many of the deadlock hazards of hand-managed threads. Scalability is another issue that CUDA addresses: the hardware is free to schedule thread blocks on any processor.

4.2.3 SYSTEM ARCHITECTURE

The architecture of a GPU is very different from that of a CPU. A CPU relies on high clock frequencies for performance, whereas a GPU relies on its massive number of cores. Though these cores might be slower than CPU cores, their parallel processing capabilities give them an edge.

CUDA makes use of on-board device memory for all computational purposes. Data must first be transferred from the host (CPU) via the host-device memory bus to the device memory before any operations can be performed on it.

Apart from the standard DRAM the device also has cache memory exclusive to each streaming processor which can be used for register level speedy access.

4.2.4 IMAGE PROCESSING USING CUDA

Image processing involves analyzing 2D arrays of color values (1D or 3D). Most image processing algorithms are inherently parallel and involve a lot of calculation on a pixel level. Image processing is also memory intensive and involves a great deal of redundant memory look-ups. Such a scenario maps well to GPUs.

Raster level ArcGIS Image Server processes can be either Radiometric or Geometric. The former involves changing pixel values but not the number of pixels or where they are placed; for example, Convolution Filter or Stretching. Geometric processes refer to the process of placing pixels in their correct positions on the ground; for example, Warp.

Significant performance improvements can be achieved using the GPU as a co-processor.

Most processes take image input row-wise. The normal CPU code buffers the row pixels and then modifies them individually, one pixel at a time. A row can be expected to have more than 4000 pixels. Using CUDA, the task is made much more efficient: all the pixels of the row can be worked on simultaneously. The thread manager initializes a matrix of threads, each of which represents an algorithmic pass. If needed, the threads can be synchronized explicitly. The CPU might have a higher clock frequency than the GPU, but the parallel paradigm compensates for the lower clock, and a significant performance increase is noted.

The algorithm can be further optimized by using texture or shared memory instead of global memory. CUDA facilitates sharing of data between threads within a thread block. If the process involves a pixel and its neighbors, then that entire block can be read into the shared memory cache, thus avoiding redundant fetches from the global memory and increasing performance.
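The saving from reading a block of pixels into shared memory once can be estimated with simple counting. The sketch below is plain arithmetic, not GPU code: it assumes a hypothetical 16x16 thread block and a 3x3 neighborhood, and it ignores hardware caching and memory coalescing, so it only indicates the order of magnitude.

```python
def global_reads_naive(tile_w, tile_h, k=3):
    """Every output pixel fetches its full k x k neighbourhood from global memory."""
    return tile_w * tile_h * k * k

def global_reads_tiled(tile_w, tile_h, k=3):
    """The block loads the tile plus a border of k // 2 pixels once into shared memory."""
    halo = k // 2
    return (tile_w + 2 * halo) * (tile_h + 2 * halo)

naive = global_reads_naive(16, 16)   # 16 * 16 * 9 = 2304 fetches
tiled = global_reads_tiled(16, 16)   # 18 * 18 = 324 fetches

print(naive, tiled, round(naive / tiled, 1))  # 2304 324 7.1
```

Under these assumptions the tiled approach issues roughly seven times fewer global memory fetches for the same block of output pixels.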

4.2.5 EXAMPLES

SULTAN PROCESS

The Sultan process is one of many raster processes which involve pixel-level manipulation. The Sultan process takes a six-band 8-bit image and uses Sultan's formula to produce a three-band 8-bit image. The resulting image is a classified image which shows the classification of rock formations called ophiolites on coastlines.

Application of Sultan's formula to each pixel involves a lot of overhead. The process can be made efficient if ported to the GPU using the CUDA API. This involves buffering the pixel values from the image, transferring them to device memory, and then carrying out calculations on all the values in the buffer simultaneously using an array of threads. Memory copies do involve read and write overheads, but these become negligible when the volume of calculation is large.

CHAPTER 5

CONCLUSION

During my training at ESRI I hope to acquire as much knowledge as possible and aid the company by developing something unique and original. Using CUDA I hope to optimize code snippets and increase the efficiency of their software. In the months of November and December I hope to attain a working version of the watermark code with CUDA implementation.

SULTAN PROCESS REPORT

THE SULTAN PROCESS

The Sultan process is one of many raster processes which involve pixel-level manipulation. The Sultan process takes a six-band 8-bit image and uses Sultan's formula to produce a three-band 8-bit image. The resulting image is a classified image which shows the classification of rock formations called ophiolites on coastlines.

Application of Sultan's formula to each pixel involves a lot of overhead. The process can be made efficient if ported to the GPU using the CUDA API. This involves buffering the pixel values from the image, transferring them to device memory, and then carrying out calculations on all the values in the buffer simultaneously using an array of threads. Memory copies do involve read and write overheads, but these become negligible when the volume of calculation is large.

SIX BAND IMAGE

Once downloaded and uncompressed, all of the data layers for a single multispectral satellite image need to be assembled into one file. This operation can be performed from either ArcTools or the Image Analysis extension of ArcGIS. For the purpose of this study ArcTools was used.

PROCEDURE: From ArcTools select Data Management Tools | Raster | Composite Bands. In the “Composite Bands” window enter the data layers in order. When entering the output filename, it is very important to include the file extension “.img” to create the image.

Once created the image can be added to any image service definition as and when required.

PROCESSING BY IMAGE SERVER

Tools used: ArcGIS ImageServer 9.3.1 SP1 - Gold Image Server Development kit

Once the image is constructed, the Sultan process can be applied to it. This is done by first creating a new image service definition from the “Image service Definition Editor” toolbar (this toolbar is available after installation of Image Server).

PROCEDURE:

1) Navigate to Image Service->Advanced->New Service definition.

The spatial reference is data dependent and can be acquired from the image metadata. The service type is changed from the default “Color (RGB)” to “Custom” in order to accommodate the six-band image. The number of input bands is also modified to “6”. Pixel type, bit depth and color space are kept at their respective default values.

2) Navigate to Image Service->Advanced->Add Raster Dataset. Select the raster type as ERDAS Imagine. This extends Image Server support to add ERDAS Imagine (IMG) files. Add the file using the “Add ERDAS Imagine Raster” UI.

3) Navigate to Image Service->Advanced->Build

This builds the image service definition and enables image preview.

4) Applying the Sultan process: Processes can be applied at various levels, such as “Raster level” or “Service Process level”. The Sultan process is applied at the Service Process level, as the process is applicable to the entire dataset and not to any of the individual rasters separately.

Navigate to Image Service Toolbar->Image Service Properties->Service process

Select Sultans Process from the list of available processes and Apply.

5) Enable “Preview” to see process application results.

SULTAN PROCESS WORKING

All processes in ArcGIS ImageServer 9.3 have a common application method. The image is acquired and then cached in the memory. Application of any process is done by sending a “Handle” of the image to the corresponding process function. The function reads the assigned area of interest row by row, performs the calculation and returns the modified row for each call. The rows are obtained and stored in “unsigned character” buffers. The buffer is accessed using simple loops which treat it as a character array.

Two main functions are highlighted during the process:

  1. “Set the area of interest in rows and columns” function. This function takes as input the “Image Handle” and the extents of the AOI (area of interest). It calculates the output buffer size of each input row for a particular handle of the image. It can also be modified to calculate the number of rows or columns in the AOI. The function returns 0 if the buffer size is 0; otherwise it returns 2.
  2. “Process and return one row of pixels” function. This is the core of the process code. This function receives a client handle and the index of the row to be processed. After modification has been made to the input row, an unsigned output row buffer is returned.

Another important aspect of the process code is the Handle structure. It is defined in the Process header file. It contains several variables for storing the Client Handle properties, as well as several pointers to functions which perform tasks on the handle, such as fetching rows or creating rasters.
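The division of work between the two functions can be sketched as follows. This is not the Image Server SDK: the Handle class, the function names set_aoi and process_row, and the pixel-interleaved row layout are all hypothetical stand-ins. In particular, placeholder_formula is NOT Sultan's formula (which is not reproduced in this report); it only averages band pairs so that six input bands become three output bands.

```python
from dataclasses import dataclass

IN_BANDS, OUT_BANDS = 6, 3

@dataclass
class Handle:
    """Hypothetical stand-in for the SDK's handle structure."""
    rows: list              # each row: bytes, pixel-interleaved, IN_BANDS per pixel
    aoi_cols: int = 0
    out_row_size: int = 0

def set_aoi(handle, cols):
    """Record the AOI width and the output buffer size for one row."""
    handle.aoi_cols = cols
    handle.out_row_size = cols * OUT_BANDS
    # Return codes as described in the text: 0 for an empty buffer, else 2.
    return 0 if handle.out_row_size == 0 else 2

def placeholder_formula(bands):
    """Placeholder only, not Sultan's formula: average each pair of bands."""
    return [(bands[i] + bands[i + 1]) // 2 for i in range(0, IN_BANDS, 2)]

def process_row(handle, row_index):
    """Process one row of pixels and return the unsigned 8-bit output row."""
    src = handle.rows[row_index]
    out = bytearray(handle.out_row_size)
    for col in range(handle.aoi_cols):
        pixel = src[col * IN_BANDS:(col + 1) * IN_BANDS]
        out[col * OUT_BANDS:(col + 1) * OUT_BANDS] = bytes(placeholder_formula(pixel))
    return bytes(out)

# One hypothetical row holding two six-band pixels.
handle = Handle(rows=[bytes([10, 20, 30, 40, 50, 60, 0, 255, 100, 100, 7, 8])])
status = set_aoi(handle, cols=2)
out_row = process_row(handle, 0)
print(status, list(out_row))  # 2 [15, 35, 55, 127, 100, 7]
```

Each pass of the loop over col touches only its own pixel, which is the property a GPU port would exploit by giving every pixel its own thread.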

CUDA HARDWARE BEING USED

Manufacturer: Nvidia

Series: GeForce

Model: 8500 GT

Specifications:

The 8500 GT has two cores. Each of these cores has 8 streaming microprocessors, and each microprocessor can handle around 512 threads. That makes a total of approximately 8K threads (2 x 8 x 512 = 8,192) which can run simultaneously.

Threads are organized into blocks and grids. A thread block can contain a maximum of 512 threads and can be 2D or 3D. A grid which contains thread blocks is generally 2D.
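How a row of pixels is spread over blocks and threads can be worked out with integer arithmetic. The sketch below is plain Python rather than CUDA code. It uses the 512-thread block limit quoted above and a 4000-pixel row as an example; the choice of 256 threads per block and the function names are assumptions.

```python
MAX_THREADS_PER_BLOCK = 512   # block size limit quoted in the text

def launch_config(n_pixels, threads_per_block=256):
    """Return (blocks, threads_per_block) for a one-dimensional launch."""
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block exceeds the block size limit")
    # Round up so that every pixel is covered by some thread.
    blocks = (n_pixels + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

def global_index(block_idx, block_dim, thread_idx):
    """Counterpart of blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel."""
    return block_idx * block_dim + thread_idx

n_pixels = 4000
blocks, threads = launch_config(n_pixels)
print(blocks, threads)                    # 16 256
print(blocks * threads - n_pixels)        # 96 surplus threads
print(global_index(15, threads, 159))     # 3999, the last valid pixel
```

Because 16 blocks of 256 threads give 4096 threads for 4000 pixels, a kernel written this way would need a bounds check so that the 96 surplus threads do nothing.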

Warp size is defined as the number of threads that the GPU actually processes concurrently per processor. On CUDA hardware a warp consists of 32 threads.

IMPLEMENTING CUDA

The calculations involved in Sultans Process can be done in parallel, because no result (in a single iteration of a loop) depends on another. Modifying the code to use the GPU is therefore a natural choice.

Limitations lie in the way the image is presented to the function. In this case the function is presented with a single row of pixels, with a row buffer size in the thousands. The single-row input poses several challenges for a GPU implementation of this function.

Each CUDA-enabled GPU has limited memory and a limited number of registers, so the kernel must be modified with this in mind. Efficiency varies with the number of threads processed simultaneously and several other factors. If the number of threads is low, the gain will not be visible and there might even be a performance drop compared with code running entirely on the CPU. On the other hand, if the number of threads allocated to the kernel is very high, time is spent scheduling the threads, which adds overhead to the execution time. Another important factor affecting performance is memory. The amount of shared memory per block is limited, and so is the amount of global memory. Too many allocations can lead to paging and thus harm efficiency. A balance between all these factors must therefore be found through testing in order to fully optimize the program.

Each thread block can hold up to 512 threads. Block occupancy also affects performance; an occupancy of around 50% or above is a must. The 8500 GT is a compute capability 1.1 card (major revision 1, minor revision 1). Being one of the earliest models to implement CUDA, it is not as polished as its successors, and its support for some of the CUDA 2.3 API features is limited.

INITIAL TRIALS

Installations: ArcGIS Image Server Developer Kit

The Image Server SDK comes with several example raster processes. These examples are provided with prebuilt Visual Studio 2005 solution files. They can be compiled and used to replace the original raster process DLLs for testing purposes. The solution contains a main “.cpp” file and a supporting header file for the Image Handle structure declarations and other global variables and functions.

The first step in building any CUDA program is deciding the block and grid dimensions. For this example a one-dimensional block was chosen, with each block containing 256 threads, consistent with 50% occupancy. The grid was also one-dimensional, with its length set dynamically (depending on the size of the row or rows being processed). The dimensions of the block and grid must cover the number of simultaneous threads being called. Whether they are one-, two- or three-dimensional depends entirely on the program and the programmer's preference.
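A dynamic grid length of this kind is usually obtained by ceiling division of the element count by the block size. The sketch below shows that calculation on the host; the function name `GridBlocks` is an assumption made for illustration, and the example element count of 8,608 simply reuses the column count of the sample image.

```cpp
#include <cassert>

constexpr int kThreadsPerBlock = 256; // block size chosen in the text

// Number of blocks needed so that blocks * threads_per_block >= n
// (ceiling division), i.e. the length of a one-dimensional grid.
int GridBlocks(int n, int threads_per_block = kThreadsPerBlock) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

For a buffer of 8,608 elements this gives 34 blocks, of which the last is only partly used, which is why the kernel needs an out-of-bounds check.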

CUDA requires the use of device memory so all input must first be copied to device memory before it can be sent to the Kernel for calculation. The same is true when obtaining output from the Kernel. The result has to be copied back to HOST memory for display and further use.

The CUDA kernel is the section where parallel execution takes place. The kernel must always be declared __global__. Only device variables and functions can be used inside a kernel. The present CUDA architecture does not allow a kernel call inside another kernel, but such calls might become possible using Nvidia's SLI technology and may be implemented in future versions. Inside the kernel the basic loop variables are declared, their bounds are set, and the calculations are ported to the GPU. The result is then copied to the output buffer of the Image Handle and returned to the calling function.

SAMPLE IMAGE

Specifications:

Rows and Columns: 7576, 8608

Uncompressed Size: 373.16 MB

Pixel Type: unsigned integer

Pixel Depth: 8 Bit

Format: IMAGINE

Bands: 6
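The uncompressed size quoted above is consistent with the other specifications: rows × columns × bands × one byte per 8-bit sample. The sketch below reproduces that arithmetic, assuming 1 MB = 1024 × 1024 bytes; the function names are illustrative only.

```cpp
#include <cstdint>

// Uncompressed size in bytes: rows * columns * bands * bytes per sample.
std::int64_t UncompressedBytes(std::int64_t rows, std::int64_t cols,
                               std::int64_t bands, std::int64_t bytes_per_sample) {
    return rows * cols * bands * bytes_per_sample;
}

// Converts bytes to megabytes, taking 1 MB = 1024 * 1024 bytes.
double ToMegabytes(std::int64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

With 7576 rows, 8608 columns, 6 bands and 1 byte per sample this gives 391,285,248 bytes, or about 373.16 MB.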

CASE 1: Processing one Row at a time.

The CUDA kernel was placed in the supporting header file along with other device functions. No modification was made to the “Set area of Interest” function. The remaining changes were made in the “Process and Return” function. First, the input was copied to a device buffer after sufficient memory had been allocated on the device. Then the dimensions for the block and grid were set. The device buffer was passed to the kernel along with the device output buffer and the output buffer size.

In the kernel, thread IDs were assigned and out-of-bounds conditions were set. The calculations remained the same except that the loop variable was replaced by the thread variable. The changes were stored in the device output buffer. This was copied back to host memory and the transformed row was returned.
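The indexing pattern described here can be sketched on the CPU, without CUDA itself. In the emulation below, `KernelBody` plays the role of one GPU thread and the nested loops stand in for the grid launch; on the device these iterations would run in parallel. The function names and the inversion used as the calculation are assumptions for illustration, not the code of the original process.

```cpp
#include <vector>

// CPU stand-in for the kernel body: one call corresponds to one GPU thread.
void KernelBody(int block_idx, int block_dim, int thread_idx,
                const unsigned char* in, unsigned char* out, int n) {
    int i = block_idx * block_dim + thread_idx; // global thread ID replaces the loop variable
    if (i >= n) return;                         // out-of-bounds guard for the last block
    out[i] = static_cast<unsigned char>(255 - in[i]); // placeholder calculation
}

// Emulates launching a 1D grid of 1D blocks over one row of pixels.
std::vector<unsigned char> ProcessRowEmulated(const std::vector<unsigned char>& row,
                                              int block_dim = 256) {
    int n = static_cast<int>(row.size());
    int grid_dim = (n + block_dim - 1) / block_dim; // enough blocks to cover the row
    std::vector<unsigned char> out(row.size());
    for (int b = 0; b < grid_dim; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            KernelBody(b, block_dim, t, row.data(), out.data(), n);
        }
    }
    return out;
}
```

The bounds check matters because the grid usually covers slightly more threads than there are elements in the row.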

Testing:

Original DLL processing a single row in a single function call.

Average execution time = 32 ms.

CUDA DLL processing a single row in a function call.

Average execution time = 26 ms.

Performance improvement: 19 %

Result: No significant Speedup.
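The improvement figure follows from the two averages: (32 − 26) / 32 = 18.75%, which rounds to 19%, and corresponds to a speedup factor of about 1.23. A small sketch of both measures is given below; the function names are illustrative.

```cpp
#include <cassert>

// Percentage reduction in execution time relative to the baseline.
double PercentImprovement(double baseline_ms, double new_ms) {
    assert(baseline_ms > 0.0);
    return (baseline_ms - new_ms) / baseline_ms * 100.0;
}

// Speedup factor: baseline time divided by the new time.
double Speedup(double baseline_ms, double new_ms) {
    assert(new_ms > 0.0);
    return baseline_ms / new_ms;
}
```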

CASE 2: Processing multiple rows in single call

The “Process and return” function receives as arguments the Image Handle and the row index. Taking this index as the starting row, several rows can be requested. These rows are copied to a single large input buffer block, which is a single linear stream of contiguous rows. The entire block is copied on to the device and passed to the kernel. The kernel remains the same as in the first test case.

Output from the kernel is fetched in a similar block. The modified base row (the row whose index is the argument of the “Process and return” function) is returned in the same call. In subsequent calls of the function no actual processing is done: the start pointer for the output buffer block is shifted and the corresponding row is returned. This is repeated until the desired set of rows is exhausted.
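The caching scheme described above can be sketched as a small host-side helper. This is a minimal sketch under assumptions: `RowBatchCache`, `DemoProcessBlock` and `kDemoRowBytes` are hypothetical names, not part of the SDK, and the block-processing callback is where the copy to the device, the kernel launch, and the copy back would take place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: processes rows in batches and serves later requests
// from the cached output block by shifting the start pointer.
class RowBatchCache {
public:
    // Transforms `count` contiguous rows starting at `first_row` and returns
    // them as one linear buffer (the GPU round trip would happen here).
    typedef std::function<std::vector<unsigned char>(int first_row, int count)> BlockFn;

    RowBatchCache(int total_rows, std::size_t row_bytes, int batch_rows, BlockFn process_block)
        : total_rows_(total_rows), row_bytes_(row_bytes), batch_rows_(batch_rows),
          process_block_(process_block), base_row_(-1), cached_rows_(0), launches_(0) {}

    // Only a request outside the cached range triggers real processing.
    const unsigned char* GetRow(int row) {
        assert(row >= 0 && row < total_rows_);
        if (base_row_ < 0 || row < base_row_ || row >= base_row_ + cached_rows_) {
            cached_rows_ = std::min(batch_rows_, total_rows_ - row);
            block_ = process_block_(row, cached_rows_);
            base_row_ = row;
            ++launches_;
        }
        return block_.data() + static_cast<std::size_t>(row - base_row_) * row_bytes_;
    }

    int launches() const { return launches_; } // number of batches actually processed

private:
    int total_rows_;
    std::size_t row_bytes_;
    int batch_rows_;
    BlockFn process_block_;
    int base_row_;
    int cached_rows_;
    int launches_;
    std::vector<unsigned char> block_;
};

const std::size_t kDemoRowBytes = 4; // hypothetical row size for the demo

// Demo block processor: fills every byte of row r with the value r.
std::vector<unsigned char> DemoProcessBlock(int first_row, int count) {
    std::vector<unsigned char> block(static_cast<std::size_t>(count) * kDemoRowBytes);
    for (int r = 0; r < count; ++r) {
        for (std::size_t c = 0; c < kDemoRowBytes; ++c) {
            block[r * kDemoRowBytes + c] = static_cast<unsigned char>(first_row + r);
        }
    }
    return block;
}
```

Requesting ten rows in order with a batch size of four would trigger only three block-processing calls, which is the effect the multi-row approach relies on to spread the per-call transfer and launch cost over several rows.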

TEST VALUES (times in ms)

| Rows | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
|------|-------|-------|-------|-------|-------|---------|
| 1 | 25.2784 | 25.4294 | 25.9841 | 26.3801 | 29.095 | 26.38214 |
| 2 | 26.3688 | 26.8742 | 26.5678 | 27.3151 | 24.7848 | 17.78738 |
| 5 | 17.9359 | 18.4597 | 17.5171 | 17.6685 | 17.3557 | 11.32461 |
| 10 | 11.9089 | 11.0696 | 11.1459 | 10.8476 | 11.651 | 9.743098 |
| 15 | 10.4265 | 9.8746 | 9.38527 | 9.55469 | 9.47443 | 8.853054 |
| 20 | 8.8104 | 9.06944 | 8.7529 | 8.62359 | 9.00894 | 8.337746 |
| 30 | 8.18536 | 8.51943 | 8.28034 | 8.5789 | 8.1247 | 7.940742 |
| 40 | 7.86275 | 7.99491 | 7.8758 | 7.94203 | 8.02822 | 7.791984 |
| 45 | 7.61754 | 7.61803 | 7.58214 | 8.20298 | 7.93923 | 7.630104 |
| 50 | 7.41656 | 7.6768 | 7.43067 | 7.91563 | 7.71086 | 7.569892 |
| 55 | 7.50348 | 7.45891 | 7.33029 | 7.61211 | 7.44467 | 7.483532 |
| 60 | 7.3415 | 7.44735 | 7.78371 | 7.61791 | 7.48606 | 7.436166 |
| 65 | 7.32748 | 7.42857 | 7.47626 | 7.50045 | 7.44807 | 7.540004 |
| 70 | 7.51796 | 7.71864 | 7.0817 | 7.61136 | 7.77036 | 7.568942 |
| 75 | 7.51987 | 7.70987 | 7.24756 | 7.45231 | 7.78996 | 7.615633 |
| 80 | 7.61567 | 7.56342 | 7.52879 | 7.45876 | 7.53489 | 7.628946 |
| 85 | 7.58229 | 7.82686 | 7.56733 | 7.50504 | 7.66318 | 7.920242 |
| 90 | 7.85296 | 7.68757 | 7.78371 | 8.23635 | 8.04062 | 7.923518 |
| 95 | 8.24098 | 7.95283 | 8.04228 | 7.74149 | 7.64001 | 7.953834 |
| 100 | 7.71901 | 7.96191 | 8.20189 | 7.94303 | 8.14333 | 8.087268 |
| 125 | 8.26499 | 7.91638 | 7.97393 | 7.93809 | 8.34295 | 8.139088 |
| 150 | 7.37787 | 7.74114 | 7.77113 | 7.54441 | 7.76089 | 8.166316 |
| 175 | 7.92588 | 7.95761 | 8.6373 | 8.00349 | 8.3073 | 8.299515 |
| 200 | 8.23638 | 8.24406 | 8.34183 | 8.34365 | 8.33183 | 8.699838 |

CALCULATIONS:

MAXIMIZING PERFORMANCE AND CUDA LIMITATIONS

  1. The CUDA driver has a significant initialization time on the first call, around 200 ms.
  2. The kernel also has a significant initialization time on the first call.
  3. Transferring memory to and from the GPU is slow. The objective therefore becomes to push the entire algorithm on to the GPU, so that copies are needed only at the beginning and end of a long calculation.
  4. There is still a small overhead (5 µs to 100 µs, depending on grid size) for each kernel launch. For very fast kernels this overhead can dominate.
  5. Small kernel executions on the GPU are amortized. If one block executes in time A, then on a GTX card up to 128 blocks can run concurrently (depending on the grid/block size and register usage), so executing 128 blocks still takes time A. Only when the 129th block is added does the kernel take time 2A, because the last block must wait for the others to finish. The block count can then rise all the way to 256 without the execution time increasing beyond 2A.
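The amortization in the last point can be expressed as a count of sequential "waves" of blocks. The sketch below is an idealised model only: it assumes that every wave costs the same time A as a single block and that the concurrency limit is fixed (128 in the example from the text), neither of which holds exactly on real hardware. The function names are illustrative.

```cpp
#include <cassert>

// Number of sequential waves needed when at most `concurrent_blocks`
// blocks can be resident at once (ceiling division).
int Waves(int blocks, int concurrent_blocks) {
    assert(blocks >= 0 && concurrent_blocks > 0);
    return (blocks + concurrent_blocks - 1) / concurrent_blocks;
}

// Idealised kernel time: each wave costs the same time as a single block.
double KernelTime(int blocks, int concurrent_blocks, double single_block_time) {
    return Waves(blocks, concurrent_blocks) * single_block_time;
}
```

Under this model 1 to 128 blocks take time A, 129 to 256 blocks take 2A, and the 257th block pushes the time to 3A.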
