# 3D Modeling, Computer Vision and Image Processing: Computer Science Essay



3D models are used worldwide in many applications, such as the movie, videogame, and architecture industries. 3D modeling is the technique of mathematically representing the three-dimensional surface of an object; the products generated are called 3D models. Efficient understanding of 3D structure is still an open research problem for many computer vision applications. There are numerous techniques for 3D reconstruction given multiple views of a scene, but inferring depth from a single 2D image is difficult. The goal of this article is to outline the available 3D modeling techniques.

There are numerous techniques available for 3D reconstruction. 3D models can be created from point clouds as well as from 2D slices; models created from 2D slices are used in medical applications. Reconstruction methods also differ in the number of views available in the input, and are accordingly classified into multiple view and single view reconstruction techniques. Techniques such as structure from motion and stereo vision rely on geometry, using triangulation to infer 3D points. Besides these, there are techniques that consider monocular cues in cases where only a single 2D image is available. 3D depth reconstruction techniques have been used widely. The different 3D reconstruction techniques are discussed in this article.

## 2. Reconstruction from point clouds

Point clouds can be produced by scanners. They can be used directly for visualization in the architecture and construction world.

## 2.1 Polygon Mesh Modeling

Polygon models are also known as mesh models. This technique has applications in both organic and inorganic modeling, and the resulting meshes are useful for visualization. The disadvantage is that they are generally heavy and are not easily editable in this form. A curved surface is modeled as many small flat surfaces in order to create a polygonal representation. Reconstruction to a polygonal model is done by connecting adjacent points with straight lines to create a continuous surface.
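The idea of connecting adjacent points into small flat faces can be sketched for the simple case of a regular grid of sample points. The function name and grid indexing below are illustrative, not part of any standard library:

```python
def grid_to_triangles(rows, cols):
    """Connect adjacent points of a rows x cols vertex grid into triangles.

    Vertices are indexed v = r * cols + c; each grid cell is split into
    two triangles, approximating a curved surface with many flat faces.
    """
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            v = r * cols + c                                # top-left corner of the cell
            triangles.append((v, v + 1, v + cols))          # upper triangle
            triangles.append((v + 1, v + cols + 1, v + cols))  # lower triangle
    return triangles

# A 3 x 3 grid of points yields 2 x 2 cells, i.e. 8 triangles.
mesh = grid_to_triangles(3, 3)
```

Real scanned point clouds are unstructured, so meshing them needs more elaborate triangulation, but the principle of stitching neighbours into a continuous surface is the same.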

## 2.2 NURBS Modeling

NURBS stands for Non-Uniform Rational B-Spline. Initially the outline of the object is created. Curves extending from one side of the profile shape to the other are then joined, and they define the shape of the figure. These curves can be joined to form a 3D NURBS surface. This is an easy-to-model approach, but it has very limited extensibility.

## 2.3 Surface Modeling

In this approach, a spline cage profile is first created for each 3D character. A complete closed profile of the character is created by joining the splines with each other. A surface modifier is applied to the spline cage as soon as the cage profile is completed. This technique is applicable to organic modeling, is widely adopted, and can be adapted to varied requirements.

## 2.4 Sub-division Modeling

This is a hybrid modeling technique that combines the merits of both NURBS and polygonal modeling. In sub-division modeling, a polygonal model is created first and then converted into a sub-divisional model. The original polygonal model that lies beneath can be altered by pulling and moving the control points and lattices in the model. The level of refinement in a particular area can be selected: more control points can be assigned to areas requiring higher detail, while other regions remain at the base vertex density. Thus performance can be boosted.

## 2.5 Solid CAD models

CAD is commonly used in industry to describe, edit, and maintain the shape of assets in enterprises, from an engineering perspective. The representation of a digitized shape is an editable, parametric CAD model. Vendors offer different approaches to developing a parametric CAD model. Usually NURBS surfaces are exported and CAD designers complete the model in CAD. CAD applications are also robust enough to manipulate polygon models within the CAD environment.

## 3. Reconstruction from a group of 2D slices

CT, MRI, or Micro-CT scanners produce a set of 2D slices which are 'stacked together' to produce a 3D representation. There are several ways to do this depending on the output required:

## 3.1 Volume rendering

Threshold values differ for different parts of an object, and each component is represented by a different color depending on the thresholds; multiple models can thus be constructed from this result. Surfaces of equal value can be extracted from the volume and rendered as polygonal meshes, or the volume can be rendered directly as a block of data. One common technique for extracting a surface from volume data is the marching cubes algorithm.
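Classifying voxels by threshold ranges can be sketched as below. The range names and numeric values are assumed for illustration (loosely modelled on Hounsfield units); real thresholds depend on the scanner and the tissue of interest:

```python
# Hypothetical intensity ranges per component; real values are scanner-dependent.
THRESHOLDS = {
    "air":  (-2000, -300),
    "soft": (-300, 300),
    "bone": (300, 4000),
}

def label_voxel(value):
    """Assign a voxel to a component by comparing it against the thresholds."""
    for name, (lo, hi) in THRESHOLDS.items():
        if lo <= value < hi:
            return name
    return "unknown"

def label_volume(volume):
    """Label every voxel in a nested-list volume[z][y][x]."""
    return [[[label_voxel(v) for v in row] for row in slc] for slc in volume]
```

Each labelled component can then be meshed or rendered separately, which is how multiple models are obtained from one scan.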

## 3.2 Surface rendering

A popular technique for presenting 3D structures is surface rendering. In this two-stage approach, a set of relevant surfaces is first extracted from the data set. These surfaces are then polygonized and displayed on conventional graphics hardware. In earlier methods, simple thresholding or hand-drawn contours were used as the basis of surface extraction. The main purpose of surface extraction is to give a certain threshold value to organs of interest, known as "tagging" or "flagging." Once the voxels are tagged as either belonging to the structure of interest or not, the individual polygons are identified using different methods.

A well known surface rendering algorithm called Marching Cubes [2] creates triangle models of constant-density surfaces from 3D medical data. The algorithm processes the 3D data in scan-line order and creates the surfaces as follows:

1. Read two adjacent slices from memory and form cubes from eight pixels, four from each slice.
2. Calculate an index for the cube by comparing the eight density values at its vertices with the surface constant.
3. Using this index, look up the list of edges the surface intersects from a precalculated table.
4. Using the densities at each edge's vertices, find the surface-edge intersection by linear interpolation.
5. Calculate a unit normal at each cube vertex and interpolate it to each triangle vertex to produce a shaded image.

The resulting polygonal structure can be displayed on conventional display systems. Another 3D reconstruction algorithm [3] for medical volume visualization can be described in two steps; it was developed around a non-approximating technique for edge detection and a fast 3D reconstruction process. First, the areas occupied by the organ are identified, generally in three steps: preliminary processing, outline tracing, and recognition of the geometric features of the image outline. Second, the identified outlines are reconstructed in 3D. The proposed solution creates triangle strips from the points of all identified outlines between every two neighbouring cross-sections. The algorithm searches for the nearest neighbour points in the other cross-section; if two consecutive nearest neighbours are separated by unconnected points, it connects them with the currently considered point in the other cross-section. The algorithm must also take into account the number of identified outlines in each pair of cross-sections, since this information is the most significant in choosing how the outlines are connected. The level of detail depends only on the CT scan resolution.
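The cube-index calculation and edge interpolation at the heart of Marching Cubes can be sketched as follows. The iso-value and vertex ordering here are assumed for illustration; the full algorithm also needs the 256-entry edge table, which is omitted:

```python
ISO = 0.5  # surface constant (iso-value); an assumed example value

def cube_index(densities):
    """Build the 8-bit Marching Cubes case index: bit i is set when the
    density at cube vertex i lies below the surface constant."""
    index = 0
    for i, d in enumerate(densities):
        if d < ISO:
            index |= 1 << i
    return index  # 0..255, used to look up intersected edges in a table

def edge_intersection(p0, p1, d0, d1):
    """Linearly interpolate where the iso-surface crosses the edge p0-p1,
    given the densities d0 and d1 at its endpoints."""
    t = (ISO - d0) / (d1 - d0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

An index of 0 (all vertices above the iso-value) or 255 (all below) means the cube contains no surface and is skipped.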

A contour-based reconstruction algorithm [4] works by extracting iso-contours from each slice and connecting them to create iso-surfaces. The process consists of constructing a surface over a set of cross-sectional contours. This surface is composed of triangular tiles, constructed by separately determining optimal surfaces between each pair of consecutive contours. The problem thus reduces to constructing a sequence of surfaces, one between each pair of adjacent contours. These surfaces are built from elementary triangular tiles, each defined by two consecutive points on the same contour and a single point on an adjacent contour. The algorithm is computationally inexpensive, but composite surfaces with more than one component, such as a human head, cause ambiguities when connecting contours.

## 4. Reconstruction based on number of views

## 4.1 Multiple view reconstruction

Stereo vision [9] can be considered similar to human vision. Humans have two eyes spaced by approximately 60 mm; each eye produces a slightly different image at a different angle, and the brain merges the two images into one, giving a perception of depth. Depth is determined by the amount of difference between the images received. Structure and depth are ambiguous from single views, so pixel correspondences across images are determined to construct 3D structure. The approach is most effective when the distance between viewpoints is small. With at least two pictures, depth can be measured by triangulation: a feature is matched in at least two views in order to determine its 3D position. Gathering good points requires many views, otherwise holes may appear, and complex reconstruction and merge algorithms are needed to construct a 3D model; global consistency of the merged model cannot be guaranteed. For regions with homogeneous color and intensity, point correspondences are difficult to compute, and correspondence mismatches usually occur for images with view-dependent effects such as specular highlights or reflections. Obtaining dense correspondences is difficult, and point correspondence between images is also hard to compute in the presence of occlusions, which poses a severe problem for scenes where occlusions occur frequently.
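For the simplest rectified two-camera setup, triangulation reduces to the classic relation depth = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity of the matched feature. The numeric values below are assumed for illustration:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Rectified-stereo triangulation: depth = f * B / d.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- distance between the two camera centres (e.g. ~0.06 m,
                    roughly the spacing of human eyes)
    disparity_px -- horizontal shift of the matched feature between views
    """
    if disparity_px <= 0:
        raise ValueError("feature must be matched with positive disparity")
    return focal_px * baseline_m / disparity_px

# A nearby point shifts more between the two views than a distant one.
near = depth_from_disparity(700.0, 0.06, 40.0)  # large disparity -> close
far = depth_from_disparity(700.0, 0.06, 4.0)    # small disparity -> far
```

The inverse relation between depth and disparity is why stereo depth estimates degrade quickly for distant points: a small matching error in d causes a large error in depth.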

## 4.1.1 Shape From Silhouettes

The shape-from-silhouette technique [12] recovers the shape of an object, called the visual hull, from a set of silhouette images. The 3D model is computed as the intersection of visual rays from the camera centers through all points on the silhouette of the object. It is based on geometric information, and its main defect is that it cannot recover concavities. The method is popular for real-time virtual reality applications because of its simplicity and speed. Its accuracy depends on the number and position of cameras; the reconstruction approaches the true shape as the number of cameras approaches infinity. If the scene contains multiple objects, reconstruction artifacts become more likely. The two approaches for computing the visual hull are the volumetric approach and the polyhedral approach, of which the volumetric approach is more widely used because of its efficiency and simplicity.

## 4.1.1.1 Volumetric approach

The reconstruction volume is subdivided into many basic computational units called voxels, and the properties of each voxel are computed independently. The occupancy of each voxel has to be checked: the voxel is projected into each silhouette image, and it is empty if the projection falls outside at least one silhouette and occupied if the projection falls inside all silhouettes. Octrees can be used to speed up the computation: an octree cell is subdivided if it is partially occupied, and is not subdivided further if it is entirely empty or full. Since the occupancy of each voxel is checked independently, the method is easily parallelized, so octree implementations can be made parallel, which helps real-time performance. Voxel resolution plays a major role in determining the accuracy of the reconstruction; low spatial resolution can produce artifacts.
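The occupancy test described above can be sketched as below. The toy orthographic cameras and silhouette pixel sets are assumptions made purely for illustration; a real system would use calibrated perspective projection:

```python
def voxel_occupied(voxel, projections, silhouettes):
    """A voxel is kept only if its projection falls inside every silhouette.

    projections -- one function per camera mapping a 3D point to (u, v)
    silhouettes -- one set of (u, v) pixels per camera marking the object
    """
    for project, sil in zip(projections, silhouettes):
        if project(voxel) not in sil:
            return False  # outside at least one silhouette: carve it away
    return True

# Toy orthographic cameras looking along the z and x axes (assumed setup).
cams = [lambda p: (p[0], p[1]), lambda p: (p[1], p[2])]
sils = [{(0, 0), (0, 1)}, {(0, 0), (1, 0)}]
hull = [v for v in [(0, 0, 0), (0, 1, 0), (0, 0, 5)]
        if voxel_occupied(v, cams, sils)]
```

Because each voxel's test is independent, the loop parallelizes trivially, which is exactly the property octree implementations exploit.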

## 4.1.1.2 Polyhedral approach

In this method, silhouettes are backprojected and intersected in 3D. There are no discretization artifacts, and the silhouette resolution determines the precision. The defect of this method is its sensitivity to numerical problems in the intersection computations. It has nevertheless been shown to work well in real-time systems.

The shape-from-silhouette method [12] fails in cases of occlusion. It has certain advantages: it is robust, efficient, and easy to implement, no correspondences are needed, and only silhouette images are required. The disadvantages are that silhouette images have to be available and the input images have to be calibrated. If the scene is complex and the number of images is low, artifacts may occur, and concavities that do not appear in any silhouette image cannot be recovered. A solution to this problem is the cell carving algorithm, which is able to reconstruct concavities and can handle occlusions. Scene parts that are not consistent with the input images are removed, and the shape is reconstructed from the remaining parts. For a perfect reconstruction we need to check whether each surface point is photoconsistent with the input images; non-photoconsistent parts are eliminated. But this method too has disadvantages: if a wrong voxel is removed, other correct parts may be accidentally removed with it; it uses a greedy approach; and it requires calibrated input images.

## 4.2 Photometric Techniques

This technique [5] is used for reconstruction when the same surface is viewed from a fixed viewpoint under changing illumination conditions. The factors that change the appearance in an image are illumination, shape, and reflectance. Illumination depends on the direction of the light source; reflectance depends on the type of surface, i.e. whether it is Lambertian or non-Lambertian. This method works better than the former since additional photometric constraints are used to recover the shape of the object. For a Lambertian surface, the luminance is the same regardless of the viewing angle, since the light is scattered equally in all directions. Voxel coloring removes non-photoconsistent voxels from a 3D volume by considering all images. The approach can be extended with graphics hardware acceleration and multiple color hypotheses, but it fails for large scenes with large differences in scale, and interactive viewing may introduce mesh inconsistencies since the process is very lengthy and tedious. An analytic model of reflectance is the basis for all photometric techniques, so they can fail if global illumination effects such as shadows or transparencies are present. Normally surfaces are assumed to be Lambertian, since recovery of the surface in the presence of global illumination is difficult, and the methods do not work well for objects with homogeneous surface color or texture. Once the photometric images are taken, the surface normal at each pixel is calculated and the local orientation and depth are recovered.
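For a Lambertian surface lit by three known directional lights, the per-pixel normal recovery mentioned above is a small linear solve: the intensities satisfy I = L g, where the rows of L are the light directions and g is the albedo-scaled normal. A minimal sketch using Cramer's rule (light directions and intensities below are assumed example values):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def surface_normal(lights, intensities):
    """Lambertian photometric stereo at one pixel: solve L g = I for
    g = albedo * normal via Cramer's rule, then normalise g."""
    l1, l2, l3 = lights
    i1, i2, i3 = intensities
    det = dot(l1, cross(l2, l3))  # assumes the three lights are not coplanar
    g = tuple((i1 * a + i2 * b + i3 * c) / det
              for a, b, c in zip(cross(l2, l3), cross(l3, l1), cross(l1, l2)))
    albedo = dot(g, g) ** 0.5     # |g| is the surface albedo
    return tuple(x / albedo for x in g), albedo

# Axis-aligned lights (assumed); a pixel lit only by the third light
# must have its normal pointing along that light's direction.
n, rho = surface_normal([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
                        (0.0, 0.0, 1.0))
```

Repeating this at every pixel yields a normal map, which is then integrated to recover depth.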

## 4.3 Image-based Modeling

In image-based modeling the work is divided between the user and the computer. A model can be reconstructed from a single image using depth images, and some methods provide tools to paint a depth image, edit the model, and change the illumination in the photograph. Even though these methods are flexible, a large amount of user interaction is required, and the resulting 3D model may not be globally consistent with all the input photographs. Geometric constraints have to be taken into account if the scene is known to have a certain structure; in order to reconstruct architectural scenes, the geometric characteristics of scenes with planes and parallel lines have to be considered. The method may fail for arbitrary scenes, since the optimization procedures can be very lengthy.

## 4.4 Structure from motion

This technique [10] is used to reconstruct a 3D estimate of scene structure from passive video image sequences. Automatic reconstruction of the 3D location of sparse scene features is possible for sequences of 50-100 images, but may not be applicable to longer sequences. Structure from motion can be viewed as stereo vision with many camera positions under fixed lighting, and the processing speed can be increased accordingly. The framework consists of two complementary modules which can run in parallel. The first module processes every new incoming image: it matches features with the previous image, calculates new 3D point reconstructions, and computes the camera pose of the resulting image. The computed camera poses and 3D reconstructions are written to disk, and after a block of images has been processed, it is sent to the second module. There, a windowed bundle adjustment refines the point reconstructions, with a delay of one block with respect to the first module. The first module finds matches between the previous and current image features and computes the camera pose from the correspondences; epipolar geometry can be used to speed up the search for further features. The midpoint of the shortest line segment connecting the lines of sight at the start and end of a feature track is calculated, and the reconstruction is based on this calculation; the 3D point is then corrected by re-triangulation using only the start and the new end of the feature track.
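The midpoint triangulation step can be sketched in closed form: for two lines of sight p + s·u and q + t·v, setting the derivatives of the squared distance to zero gives a 2x2 linear system in s and t. The function name and the example rays are illustrative only:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(p, u, q, v):
    """Midpoint of the shortest segment connecting two lines of sight
    p + s*u and q + t*v (e.g. the start and end of a feature track)."""
    w = tuple(pi - qi for pi, qi in zip(p, q))
    a, b, c = dot(u, u), dot(u, v), dot(v, v)
    d, e = dot(u, w), dot(v, w)
    denom = a * c - b * b            # zero when the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(pi + s * ui for pi, ui in zip(p, u))  # closest point on ray 1
    p2 = tuple(qi + t * vi for qi, vi in zip(q, v))  # closest point on ray 2
    return tuple((x + y) / 2 for x, y in zip(p1, p2))

# Two skew rays: one along x through the origin, one along y at height z=1.
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                             (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
```

Because image rays rarely intersect exactly due to noise, the midpoint is a cheap compromise; the later re-triangulation step refines it once a longer feature track is available.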

## 5. Single view reconstruction

This task is harder than multiple view reconstruction, where 3D locations can be inferred from geometry; depth information is ambiguous when only a single view is given. By computing vanishing points and lines in a single image we can infer 3D location. Once calibration is achieved, a single vanishing line allows complete rectification of the corresponding plane and determination of its orientation relative to the camera, so 3D directions for lines and planes can be estimated. Since projective transformations do not preserve angles, automatic identification of perpendicular sets of lines is very difficult, and several methods have been introduced to overcome this limitation. One such method initially chooses a reference plane and computes its vanishing line, together with a vanishing point for its orthogonal direction, using sets of parallel lines. The orthogonal distance of every point in the image to that plane can then be computed by specifying the relative distance of a single point from the reference plane. Absolute 3D positioning of points in space and feature grouping yield a simple planar scene approximation.
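In homogeneous coordinates, both the line through two image points and the intersection of two lines are cross products, which makes vanishing point computation a one-liner. A minimal sketch with assumed example coordinates:

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points (x, y) -> (x, y, 1)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(l1, l2):
    """Intersection of two image lines; projections of parallel scene
    lines meet at this point."""
    x, y, w = cross(l1, l2)
    return (x / w, y / w)  # w == 0 would mean the image lines are parallel

# Two image lines converging toward the same point (assumed example).
vp = vanishing_point(line_through((0.0, 0.0), (2.0, 1.0)),
                     line_through((0.0, 2.0), (2.0, 1.0)))  # (2.0, 1.0)
```

In practice many line segments are fitted and their intersections clustered, since noisy segments will not all pass through one exact point.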

## 6. 3D depth reconstruction

Humans have no difficulty understanding the 3D structure of an image upon seeing it. From geometry alone, however, depth cannot be determined from a single image. Besides relying on geometry we should also consider monocular cues. There are numerous monocular cues, such as texture variations and gradients, defocus, and color, that contain useful and important depth information. Humans usually perceive depth by seamlessly combining many of these stereo and monocular cues.

## 6.1 SCN method (Saxena, Chung, and Ng)

In this method [6] the task of depth estimation from a single monocular image is approached using supervised learning. Initially a training set of monocular images is collected along with their corresponding ground-truth depth maps. A supervised learning algorithm is then applied to predict the depth map as a function of the image. Local features alone are insufficient to determine the depth at a point, so the global context of the image also has to be taken into consideration. The input image is first divided into small rectangular patches, and the depth of each patch is determined: absolute depth at a particular patch is estimated using absolute features, and relative depth is estimated using relative features between two patches. The image is convolved with a number of filters chosen to capture three types of local cues: texture variations, texture gradients, and color. Since local image features are insufficient, global information is also extracted at multiple spatial scales.
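The patch-division step and a crude local cue can be sketched as below. The patch layout and the choice of mean/variance as a stand-in texture feature are simplifications of my own; the actual method uses banks of oriented filters at multiple scales:

```python
def split_into_patches(height, width, size):
    """Rectangular (y0, x0, h, w) patches covering the image, as in the
    first step of patch-based depth estimation."""
    return [(y, x, size, size)
            for y in range(0, height, size) for x in range(0, width, size)]

def patch_features(image, patch):
    """Mean intensity and variance (a crude texture cue) of one patch.
    image is a nested list image[y][x]; patch is (y0, x0, h, w)."""
    y0, x0, h, w = patch
    vals = [image[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

patches = split_into_patches(4, 4, 2)   # a 4x4 image yields four 2x2 patches
```

A learned model would then map such per-patch features, plus features of neighbouring patches and coarser scales, to a depth value per patch.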

## 6.2 DLN method (Delage, Lee, and Ng)

On looking at an image, our prior knowledge about the world allows us to infer 3D information about the scene. We also differentiate objects, determine their orientations, and infer their connections within the environment. A dynamic Bayesian network model [7] capable of eliminating some of these ambiguities was developed, and it proved useful for recovering 3D information for many images. However, this method considered only indoor scenes. It assumes a floor-wall geometry in the scene so that the floor-wall boundary can be recognized in each column of the image. Provided the image is produced under perspective geometry, this model can be used for 3D reconstruction from a single image. This was the first monocular approach used to recover 3D reconstructions from single indoor images.
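A drastic simplification of the per-column floor-wall idea: in each image column, take the lowest edge pixel as the candidate floor-wall boundary. This ignores the Bayesian network that makes the actual method robust, and serves only to show what "one boundary per column" means:

```python
def floor_wall_boundary(edge_column):
    """Return the row of the lowest edge pixel in one image column, a
    naive stand-in for the per-column floor-wall boundary. edge_column
    is a list of truthy edge flags indexed top (0) to bottom."""
    for row in range(len(edge_column) - 1, -1, -1):
        if edge_column[row]:
            return row
    return None  # no edge found in this column

# Edges at rows 1 and 3; the lower one (row 3) is taken as the boundary.
boundary = floor_wall_boundary([0, 1, 0, 1, 0])
```

Under perspective geometry, the row of the boundary in each column directly determines the distance to the wall, which is why recovering it suffices for a coarse 3D reconstruction.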

## 6.3 HEH method (Hoiem, Efros, and Hebert)

HEH [8] considered outdoor scenes. An image is first segmented into n geometrically homogeneous regions; groups of pixels with homogeneous features are called superpixels. The superpixels are first shuffled, the initial n superpixels are assigned to different regions, and each remaining superpixel is then assigned based on a pairwise affinity function. Each patch of an image is induced with some orientation in the real world, and all cues are needed to determine the most likely orientations. HEH divides an image into "ground", "upright", and "sky" components, and learns an affinity metric between superpixel pairs using features such as color, texture, location, and vanishing points. The method concentrates on outdoor scenes and, for several reasons, is unsuitable for video sequences.
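A pairwise affinity of the kind used to group superpixels can be sketched as a weighted feature distance passed through an exponential. The feature layout and weights below are assumed for illustration; HEH learns its metric from training data rather than fixing it by hand:

```python
import math

def affinity(f1, f2, weights):
    """Pairwise affinity between two superpixels: a weighted squared
    distance over their feature vectors (colour, texture, location, ...)
    mapped through exp(-d), so identical features give affinity 1.0."""
    d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2))
    return math.exp(-d)

# Identical feature vectors -> maximal affinity; distant ones -> near zero.
same = affinity((0.2, 0.5), (0.2, 0.5), (1.0, 1.0))
diff = affinity((0.2, 0.5), (0.9, 0.1), (1.0, 1.0))
```

Superpixels with high mutual affinity are merged into the same geometric region, after which each region is classified as ground, upright, or sky.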

## 7. Conclusion

This article has given an extensive survey of 3D reconstruction methods, together with a brief review of related topics. The different reconstruction methods and their categories have also been discussed.