# Porting And Analysis Of Ray Triangle Intersection Computer Science Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Today, computers and computer generated images touch each and every aspect of human life. In the recent years, almost every computer can be used to design graphics and people came to such a situation that they even expect to control their computer through icons and pictures instead of typing. Computer Graphics can be described in a broader sense as "almost everything on computers that is not text or sound." (Foley, Dam, Feiner, Hughes1995). Developments in this field provided a better understanding and interpreting various types of data. It exerted a profound impact on different types of media and revolutionized animation, movie and videogame industries. In addition, areas related to engineering, architecture, mass media and medical industries were found to be benefitted with the wide spread use and developments of Computer Graphics (Foley, Dam, Feiner, Hughes1995).

Ray Tracing (in Computer Graphics) is a technique used to generate an image by tracing the path of the light through pixels on an image plane and subsequently simulating the effects of its critical encounters with the virtual objects (Nikodym 2010). This technique has a potential of producing a very high degree of visual realism which is usually more than that of Scanline rendering methods. This principle makes it best suited for certain applications like still images, movie and television special effects (Ward 2007). But owing to its high computational cost, it is more poorly suited for real time applications (such as videogames) where the speed is the critical factor. However, this technique is quite capable of simulating a variety of optical effects like reflection, refraction, chromatic aberration and scattering (Nikodym 2010).

Ray Tracing Algorithms

Optical ray tracing involves producing visual images constructed within three dimensional computer graphics environment with high photorealism (Foscari1989). Scenes in ray tracing can be described mathematically by a programmer or visual artist (utilizing intermediary tools). These scenes incorporate data from the images and models captured by digital photography (Nikodym 2010). Typically, each ray needs to be tested for its intersection with some subset of all the objects in the scene. Once the nearest object is identified, the algorithm (expressed as a finest list of well defined instructions) will then estimate the incoming light at the intersection point, examines the material properties of the object and add up this information to detect and calculate the final pixel color (Nikodym 2010). Some illumination algorithms require more rays to be re-casted in to the scene. The important research breakthrough in the area of ray tracing technique came from Turner Whitted in 1980 (Whitted 1980). Previous algorithms designed by various researchers cast rays from the eye in to the scene but the rays were not traced further (Nikodym 2010). To accurately render the two dimensional picture of a three dimensional scene, the information pertaining to global illumination which affects the intensity of each pixel should be known at the time of intensity calculations (Foscari1989). According to Whitted, this information is protected in a tree of "rays" extending from the viewer to the light sources. A visible surface algorithm plays a key role in creation of this tress for each pixel of the display and enables the passage to the shader (Whitted 1980). Subsequently, the shader transverses the tree to calculate the intensity of light received by the viewer. Conventional shaders simulate accurately the effects of true reflection, shadows and refraction by carefully considering all the above factors (Whitted 1980).

The following figure demonstrates the typical ray tracing algorithm that builds an image by extending in to the scene.

Figure 1: Depicts the ray tracing algorithm and image building process

(Source:Foscari 1989)

Matt Pharr (Pharr 1997) divides the ray tracing algorithm process in to seven objects or steps that involve the use of Cameras (to generate rays from the view point), Ray object intersections (Precise location of meeting point that pertain to ray and object), Light distribution (most important component that distributes light throughout the scene), Visibility (locating and finding the light energy deposits through ray tracers), Surface Scattering (As different objects possess different surfaces and as a result light scattered may also vary (Pharr 1997). Each surface should indicate its scattered light and its degree of interaction), Recursive ray tracing (especially important for shiny surfaces like metal and glass; involves tracing additional rays that originate from the surface to fully capture this effect), Ray propagation (tracing the ray through the spaces) (Pharr 1997). Apart from the use of above parameters in detecting ray tracing algorithms, the phenomenon of Global Illumination needs to be studied. This increases the realism of the image by utilizing ray intersection process as a base (Pharr 1997). Generally, these are the group of algorithms that take in to account the direct light source as well as reflected light rays (from the same source that are reflected by other surfaces in the scene) (Pharr 1997). Theoretically, it is the simulations of diffuse inter reflection process and accommodates reflection, refraction and shadows. But, the high photorealistic images obtained through this process are computationally more expensive as well as slower to generate (Pharr 1997). Besides, producing high degree of realism, Ray Tracing can simulate the camera effects owing to the field depth and aperture shape. The number of reflections and refractions taken up by a ray and effects produced each time when it encounters a surface is completely controlled through software settings (Foscari1989).

Ray Triangle Intersection-An Application of Ray Tracing

Utilizing the principles of Ray Tracing, it can seem to be possible to extend those algorithms to include three dimensional or 3D triangles that are common elements of 3D surface and polyhedron models (Moller, Trumbore 1997). But consideration is given only to the transversal interactions where the intersecting objects lie in different planes rather within the same plane. The computation of ray triangle intersection is the most frequent nontrivial operation in the field of computer graphics rendered with the use of Ray Tracing technique (Synder, Barr 1987). Intersecting ray with the triangle is a little complex process as a triangle needs a slightly complex definition (Moller, Haines 1999).

A triangle is the subset region of the plane defined by its three vertices which is bound by its three edges (Moller, Haines 1999). Two steps need to be considered for ray triangle intersection:

Intersect the ray with that of triangle's plane.

Check if the intersection persists inside the triangle.

As the triangle is the most fundamental element enabling the intersection, a convenient algorithm is definitely needed to handle it. An effective and efficient algorithm for the intersection can be designed by using Bary centric coordinates (Arenberg 1988). These coordinates provide a route to parameterize the triangle in terms of b1 and b2 variables with the conditions of b1> 0, b2>0 and b1+b2<1 (Arenberg 1988).

Detailed analysis of ray triangle intersection together with the research works carried out can be discussed in the subsequent sections.

Computer Unified Device Architecture (CUDA)

CUDA is a revolutionary parallel computing architecture developed in parallel to Nvidia. It is the computing engine in NVIDIA Graphics Processing Units (GPU's) and is accessible to all the software developers through variants of standard programming languages. In general, programmers use C-Language for CUDA (C-Language supported with Nvidia extensions and restrictions) compiled through Path Scale Open64 C compiler (Svetlin, Valle 2008) to code the specific algorithms for the execution on the GPU. The goal of CUDA programming interface is to enable a relatively simple path for the users familiar with C-Language in order to write execution programs easily (Mark 2007). It possess limited set of extensions to the C-Language, which allow the programmers to target specific portions of source code for execution on the device; a run time library that is split in to host, device and common components (NVIDIA 2008).

The following figure (Figure 2) presents an example of CUDA processing flow that involves the following steps (Mark 2007):

Copying data from the main menu to the GPU memory.

Central Processing Unit (CPU) instructs the processing to GPU.

Execution of GPU in a way parallel to each core.

Finally copying the result obtained from the GPU memory to the Main memory.

Figure 2: Representation of CUDA processing flow

(Source: NVIDIA 2008)

http://upload.wikimedia.org/wikipedia/commons/5/59/CUDA_processing_flow_%28En%29.PNG

This architecture combines certain specific hardware like G80-based FX5600 along with drivers, libraries and C-Language extensions in order to provide reliable access to GPU hardware without any possible constraints (generally, these constraints are imposed by traditional GPU programming technique and it depends upon Graphics Application Programming Interface) (NVIDIA 2008). In general, Code is organized in Kernels (Functions that are invoked from the CPU but get executed on the GPU) and the multi processors of GPU tend to execute these kernels in to high efficient Single Instruction Multiple Data (SIMD) modes by maintaining a standard C-Language control flow. In addition to conjuring the kernels, the CPU (host) is also responsible for managing the memory in the device that is segregated into varied multiple spaces of different capabilities (NVIDIA 2008).

Kernels are raised upon various threads through a consecutive way and internally CUDA schedules the execution of these threads in order to potentiate the performance. In general, these Kernels fail in accessing regular memory and therefore memory needed for its usage should be specially allocated. Threads are organized in to various different blocks that are further categorized in to two -dimensional grids (NVIDIA 2008). Each of the Thread pertaining to a single block share some of the device resources and it synchronizes with another. Within the thread block, threads are arranged in well defined size 32 groups called Warps. A single Warp can be signified as minimum collection of threads for SIMD execution. However, a divergence of control flow within a warp extracts a considerable performance penalty (NVIDIA 2008).

On the whole, it can be ascertained that CUDA Architecture consist of several different components like Parallel computing engines present inside the Nvidia GPU's, Operating system kernel-level support for initialization and configuration of hardware, User mode drivers that provide device level API for software developers and PTX (Parallel Thread Execution Assembly Language File) Instruction Set Architecture (ISA) for parallel computing kernels and functions (NVIDIA 2008).

Literature Review

Ray triangle intersection is an important algorithm not only in the area of realistic rendering (which is based on the principles of ray tracing) but also in simulation physics, collision detection, modeling (Havel, Herout 2010). Obviously, the fastness or speed of this well defined algorithm implementation is vital as calls to that routine are immense in rendering and simulation applications. For the intersection of rays and triangles, operations like horizontal addition or dot product are necessarily needed (Havel, Herout 2010). The research work carried out by Havel and Herout (2010), sets a new modification on algorithms of fast-ray triangle intersection (Havel, Harout 2010). They implemented these algorithms on Streaming SIMD Extensions4 (SSE4) and it was observed that SSE4 outperformed the current state of the art algorithms (Havel, Harout 2010). It also allowed single ray and ray packet intersection calculation with the similar pre computed data (Havel, Herout 2010). In the recent years, numerous articles on ray tracing news have discussed solutions for the ray triangle intersections using barycentric coordinates. The research work carried out by Hanrahan exhibited another way of thinking about the problem of ray-triangle intersection (Harhanan 2003). The underlying idea was based on thinking of barycentric coordinates intersection point as the volume ratios instead of area ratios and sooner this technique was used in deriving formulas for ray quadrilateral intersection with the fact that associated with every planar quadrilateral is a unique triangle (Harhanan 2003). The research work led to the development of quite efficient algorithms for ray triangle intersection and the results obtained established a reliable connection to Plucker line coordinates (Harnahan 2003).

The application of Ray-Triangle Intersection algorithm was further extended to Modern CPU Architectures and this was evident from the research article published by Shervtsov, Soupikov and Kapustin (2007). This test utilized SIMD instructions for efficient handling of ray packets and this study proved to be highly essential for achieving high ray tracing performance on the modern CPU's (Shervstov, Soupikov, Kapustin 2007).

Moller (1997) provided a faster and reliable way to calculate ray triangle intersection. He eliminated ray intersection formula and transformed it in to a unit triangle along y and z axis with the ray direction potentially aligned on the x-axis (Moller 1997). This method resulted in saving a lot of memory and the formula designed by Muller is presented below:

In the above equation, P= and Q= (. Moller (1997) asserts that these variables can be reused in future to save memory. By the year of 2005, he designed a clean algorithm to determine the intersection of ray with a triangle. The advantage of this method was that the plane equation neither needed to be computed on the fly nor be stored and it was believed in a way as fastest ray-triangle intersection followed for triangles that do not followed pre computed plane equations (Moller 2005).

Wald (2004) carried out similar work as Moller and he optimized the algorithm with the barycentric co-ordinate test (Wald 2004). His equation first computes the ray triangle intersection and later on barycentric coordinates are calculated by projecting the triangle on to the axis of aligned plane. This method was found to be highly suitable for SSE instructions and to intersect four rays within a single triangle (Wald 2004).

The principle of Hardware accelerated Ray Triangle Intersection Testing was applied for High Performance Collision Detection by Kim, Nam and Lee (2007). Within their research work, they designed the hardware architecture for primitive testing components and implemented it on a multi-FPGA Xilinx Virtex-II prototyping system (Kim, Lee, Nam 2007). The result obtained was a hardware accelerated ray-triangle intersection engine that has a potential of out -performing a 2.8Hz Xeon processor, running a known high performance software ray triangle intersection algorithm upto a factor of seventy (Kim, Lee, Nam 2007). This approach proved to be much faster than current GPU and CPU based algorithms for the ray triangle intersection process (Kim, Lee, Nam 2007).

Project Specifications and Methodology

In the field of Computer Science, Porting involves typical process of adapting software with an aim to create an executable program for a computing environment that is completely different from the originally designed program (by CPU, Operating system and third party library). With relation to the present study, the algorithms of ray triangle intersection can be ported in CUDA using CUDA Integrated Development Environment (CUDA IDE) (also known as Integrated debugging environment or Integrated design environment). The specifications required to execute efficient and faster ray triangle intersection algorithms include the following:

CUDA IDE:

It is a software application that provides comprehensive facilities to various computer programmers for software development (Nourie 2005). These are designed to maximize the programmer productivity in a way by providing tightly-knit components with similar user interfaces. Owing to its high complicated nature, higher productivity can be witnessed only after a lengthy learning process (Nourie 2005). Typically CUDA IDE presents a single program that support various features of authoring, modifying, compiling, deploying and debugging software with an aim to abstract necessary configuration in combining the command line utilities to a cohesive unit and thereby theoretically reducing language learning time and increasing developer productivity (Nourie 2005). Precisely, these IDE's perform tasks like integrating web application software, provides code completion process, integrating the source code repositories, build utilities and web application servers and monitoring HTTP for debugging web applications and generating refractory codes (Nourie 2005).

The results of implemented algorithms can be calculated on Intel Core 2 E6850 3.00GHz processor, 4.00GB of RAM, GeForce 8800 GTX (Ge Force 8 Series, computer graphics processing unit) equipped with 768MB of video memory running through Windows XP64. GeForce 8800 GTX was the fastest GPU released and possess 128 stream processors clocked at 1.35GHz (Giga Hertz), a core clock of 575MHz (Mega Hertz) and 768MB (Mega Byte) of 384-bit GDDR3 memory supported at 1.8GHz giving a total bandwidth of 86.4GB/sec (Ryan 2006). Its unified architecture consist lot of ALU's and simple cores (Ryan 2006). With an 8KB (KiloByte) memory upon thread of multi processor supported by a total of 16KB shared memory per each multi processor, this card performs much faster activity in comparison to its other series (like Single Radeon HD2900 XT and GeForce 7900GTX) (Ryan 2006). In addition, it possess sixteen memory banks and parallel Data Cache to enhance parallel processing of 1000 threads (Threads: Part of CUDA Architecture) (Ryan 2006). The algorithms can be coded using CUDA Computer Engine (Architecture is described in the above chapter). The typical process of code execution involves, allocation of code (described below) upon the graphic card, copying it on to the respective array, running 104 kernel executions that were on 26 blocks of 4 threads each and a final copying process of result obtained back to the main memory (Ryan 2006) (Typcal Flow process of CUDA is mentioned in Figure 2). An example of sample code is presented below:

void Increment Array()

## {

// Create an array of 102 floats

#define N 102

float mainMemoryArr[N];

for (i=0; i<N; i++)

mainMemoryArr[i] = (float)i;

// Copy data to the graphics card memory

float *gpuMemoryArr;

cudaMalloc((void **) &gpuMemoryArr, size);

cudaMemcpy(gpuMemoryArr, mainMemoryArr, sizeof(float)*N, cudaMemcpyHostToDevice);

// Calculate how many 'blocks' of threads we'll run

int blockSize = 4;

int nBlocks = N/blockSize + (N%blockSize == 0?0:1);

// Run the GPU function

incrementArrayOnGPU <<< nBlocks, blockSize >>> (gpuMemoryArr, N);

// Copy the modified memory back to main memory

cudaMemcpy(mainMemoryArr, gpuMemoryArr, sizeof(float)*N, cudaMemcpyDeviceToHost);

## }

The final measurements and analysis of ray triangle intersection algorithms can be tested using specified models like Bunny, Dragon, Conference, Sponge and other models. The typical selection of these models depends upon the intersection process together with its algorithm. Thus, the effective combination of specific processor optimizations together with the defined ray triangle intersection algorithms that explore the potential application of ray tracing coherence makes it possible to ultimately achieve success in real time performance. Much was explored in the field of Computer Graphics through various research studies and more is still needed to be achieved in the future with an aim to produce cost effective illuminations. Use of Sophisticated techniques along with the innovative developments with relation to the software used and the availability of individuals with a determined potential to carry out the research process result in ultimate progress.