Introduction Backgrounds And Motivations Computer Science Essay


A robot is a mechanical device that can perform tasks automatically and may, but need not, be humanoid in appearance. Some robots require some guidance, which can be provided through a remote control or a computer interface. A robot is usually an electromechanical machine that is guided by a program or circuitry. Robots can be autonomous, semiautonomous, or remotely controlled, and they range from humanoid machines such as ASIMO and TOPIO to nanorobots, swarm robots, and industrial robots. By mimicking a lifelike appearance or automating movements, a robot may convey a sense of intelligence or thought of its own.

Advances in mechanical techniques during the Industrial Revolution led to more practical applications, including those by Nikola Tesla [55], who designed a radio-controlled boat in 1898. Electronics evolved to become the driving force of development with the creation of autonomous robots by William Grey Walter in Bristol, England, in 1948. The first digital and programmable robot, the so-called "Unimate," was invented in 1954 by George Devol. It was sold to General Motors in 1961 to be used for lifting pieces of hot metal from die casting machines at its Inland Fisher Guide Plant in the West Trenton section of Ewing Township, New Jersey.

Robots [55] have replaced humans in repetitive and dangerous tasks that humans shun or cannot perform because of physical limitations, including tasks in outer space or at the bottom of the sea, where humans cannot survive.

With rapid advances in mobile robot technologies, recent years have witnessed the increasing popularity of mobile robots. Robots [55] have been employed for a wide range of industrial applications, including factory automation, medical assistance, and rehabilitation for the provision of new forms of services. A typical example is the autonomous mobile robot that explores and navigates in dynamic or unstructured environments for various tasks ranging from material-handling systems to military operations.

Mobile robots [56] can move around in their environment. That is, they are not fixed to a specific physical location. For example, mobile robots are widely used as automated guided vehicles (AGVs). An AGV is a mobile robot that follows specific markers or wires on the floor by using vision or lasers.

Mobile robots [56] also appear as consumer products for entertainment purposes or for certain tasks such as vacuuming the floor. Therefore, mobile robots have become a topic of special interest in robotics and received considerable attention from both scholars and practitioners. Many studies have examined mobile robots, and most of the major universities have labs focusing on mobile robots.

Modern robots [55] are usually used in tightly controlled environments such as on assembly lines because they have difficulty responding to unexpected interference. Therefore, most humans rarely encounter robots. However, robots for cleaning and maintenance are increasingly common in and around homes in developed countries, and some are even employed for military applications.

Many studies have focused on motion planning or simultaneous localization and mapping (SLAM), often using the unscented Kalman filter [23] or Monte Carlo localization methods, but there remains a key problem in localization, particularly for service robots in dynamic environments. It is important for a mobile robot to know where it is at any moment relative to its environment. To accomplish a given task, a mobile robot must determine its pose (orientation and location). That is, it must be capable of self-localization in any given environment.

This paper proposes a simple and convenient self-localization estimation method for an indoor mobile robot based on the natural features of the regular ceiling.


The use of the global positioning system (GPS) [57] has made the task of self-localization easier, but when the robot is operating in indoor environments, the use of the GPS for localization purposes may be limited or not feasible. That is, mobile robots require other methods for their self-localization.

In general, a mobile robot's indoor self-localization is mandatory for its full autonomy during its navigation [2]. Various solutions to the problem of self-localization have been proposed in the field of robotics and can be classified into two groups: relative (dead-reckoning) and absolute localization.

Relative Localization and Absolute Localization

Relative localization is applied in most wheeled mobile robots, and odometry (or dead-reckoning methods) is widely used for calculating the position of the robot from a starting reference point because of its ease of use, efficient data output, and low cost. However, the main disadvantage is that it leads to the unbounded accumulation of errors resulting from wheel slippage or surface roughness. Therefore, a robot may fail to keep track of its location and thus lose accuracy over long distances. Although very simple and fast, dead-reckoning algorithms tend to accumulate errors in the system because they employ information only from proprioceptive sensors, such as odometer readings (e.g., incremental encoders on robot wheels).
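To make the accumulation of odometry errors concrete, the following is a minimal sketch of dead reckoning. It assumes a differential-drive robot, a wheel base of 0.3 m, and a right encoder that over-reports by 1 %; none of these details come from the text above, and the function names are purely illustrative.

```python
import math

def dead_reckon(pose, d_left, d_right, wheel_base):
    """Advance a pose (x, y, theta) by one pair of wheel travel distances."""
    x, y, theta = pose
    d_center = (d_left + d_right) / 2.0         # distance moved by the robot center
    d_theta = (d_right - d_left) / wheel_base   # change in heading
    # Use the heading at the middle of the step (midpoint approximation).
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return (x, y, theta + d_theta)

def drive(steps, d_left, d_right, wheel_base=0.3):
    """Integrate the same encoder increments for a number of steps."""
    pose = (0.0, 0.0, 0.0)
    for _ in range(steps):
        pose = dead_reckon(pose, d_left, d_right, wheel_base)
    return pose

# The robot really drives straight (10 mm per step on both wheels), but the
# hypothetical right encoder reports 10.1 mm per step.
ideal = drive(1000, 0.010, 0.010)
biased = drive(1000, 0.010, 0.0101)
error = math.hypot(ideal[0] - biased[0], ideal[1] - biased[1])
print(f"position error after 10 m: {error:.2f} m")
```

Because each step is integrated on top of the previous estimate, a small systematic encoder error turns into a position error that keeps growing with the distance traveled.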

In absolute localization, the location of an object can be determined by detecting and recognizing landmarks in the environment. Here the location is estimated from known coordinates of landmarks based on ranging and/or bearing measurements between the object and the landmarks. Absolute-localization methods are based on information from exteroceptive sensors. Although they yield stable localization errors, they are more complex and costly in terms of computation time.
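As one illustration of estimating a location from known landmark coordinates and ranging measurements, the sketch below solves the planar three-landmark case (trilateration). The landmark positions and the noise-free ranges are made-up values, and the function name is hypothetical; real systems typically use more landmarks and a least-squares or filtering approach.

```python
import math

def trilaterate(landmarks, ranges):
    """Estimate (x, y) from three known landmarks and the distances to them."""
    (x1, y1), (x2, y2), (x3, y3) = landmarks
    r1, r2, r3 = ranges
    # Subtracting the first circle equation from the other two removes the
    # quadratic terms and leaves a 2x2 linear system A [x, y]^T = b.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear; position is ambiguous")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_position = (1.0, 2.0)
ranges = [math.dist(true_position, lm) for lm in landmarks]
estimate = trilaterate(landmarks, ranges)
print(estimate)
```

Unlike dead reckoning, each estimate here depends only on the current measurements, so the error does not grow with the distance traveled.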

Table . Comparison of relative localization and absolute localization.

Method                   Computational Complexity   Operation Complexity   Error
Relative Localization                                                      Unbounded Accumulation
Absolute Localization                                                      Stable Error

A popular method for achieving online localization consists of combining relative and absolute methods [2]. Relative localization is used with a high sampling rate to update the robot pose, whereas absolute localization is applied periodically with a lower sampling rate to correct for any positioning misalignment.
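The combination described above can be sketched with a deliberately simple one-dimensional simulation. The step size, the odometry bias, the correction interval, and the assumption of an error-free absolute fix are all illustrative choices, not values from the cited work.

```python
def max_error_with_fixes(true_step, odo_bias, n_steps, fix_every):
    """Largest position error seen while tracking a 1D position.

    Odometry is applied at every step (high rate); an absolute fix resets
    the estimate every `fix_every` steps (low rate). Use fix_every=0 to
    disable the absolute correction.
    """
    true_pos = 0.0
    est_pos = 0.0
    max_err = 0.0
    for k in range(1, n_steps + 1):
        true_pos += true_step
        est_pos += true_step + odo_bias          # relative update with a small bias
        max_err = max(max_err, abs(est_pos - true_pos))
        if fix_every and k % fix_every == 0:
            est_pos = true_pos                   # absolute correction (assumed exact)
    return max_err

drift_only = max_error_with_fixes(0.01, 0.001, 1000, fix_every=0)
with_fixes = max_error_with_fixes(0.01, 0.001, 1000, fix_every=50)
print(drift_only, with_fixes)
```

Without corrections the error grows with the number of steps; with periodic fixes it is bounded by the drift accumulated between two consecutive fixes.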

Traditionally, an autonomous vehicle is equipped with an odometer to measure the current location of the vehicle with respect to the starting point. However, this scheme usually entails incremental mechanical errors caused by the vehicle's wheel system.

Because of rapid advances in microelectronics, sensors, and wireless communications systems, the cost of localization hardware has decreased sharply while its performance has improved markedly. Some methods based on exteroceptive sensors, such as ultrasonic sensors, laser range finders (or light detection and ranging devices), and vision sensors, can address this limitation by obtaining information on the external environment.

Visual positioning methods play an important role in the self-localization of autonomous mobile robots working in indoor environments [1]. In general, knowledge of indoor environments can be used to determine the position and orientation of a mobile robot through visual positioning approaches.

Data on visual images have the potential to disambiguate objects for localization because they provide high-resolution images and additional information such as the color, texture, and shape of objects [1]. To compensate for accumulated navigation errors, mobile robots must use external sensors to estimate their position. Active ranging devices can provide direct distance measurements and have been widely used for robot localization. However, these sensors do not provide the feature recognition needed to resolve ambiguities between objects.

Some methods combine proprioceptive and exteroceptive sensors. A set of self-localization methods for robots is discussed next.

Self-Localization Based on Wireless Network

Distributed wireless networks and RF identification (RFID) [38] systems have become increasingly popular for many sensing applications ranging from environmental monitoring to the classification and tracking of military targets. As shown in the figure of an example network to be localized, the self-localization problem entails combining internode measurements, collected in a measurement vector, with prior information to obtain coordinate estimates of the N constituent nodes of the network.

Figure . Example of a network to be localized. Edges between sensor nodes indicate availability of an internode measurement such as distances or angles-of-arrival. The measurement set, which need not contain all possible pairs, is combined with prior information in order to obtain coordinate estimates of each node (from Ash, 2008, see ref [38])

With active RFID tags [22], a calibration step is required for constructing an empirical model of the strength and distance of received signals, and then a localization step follows. Otherwise, a number of passive RFID tags may be placed in a grid-like pattern only for the estimation of the position without orientation.

Vision-based Global Self-Localization

It is common to mount a global camera on the ceiling or some overhead location to estimate the position and orientation of a robot by measuring the position of a mark attached to the top of the robot (see the figure on global localization) [5]. For example, an overhead camera can be employed to compute the position of the robot, the desired goal location indicated by a marker, and all obstacles. In [18], a camera is mounted 3.5 m above the ground, yielding a viewable floor area of approximately 3.2 m × 2.4 m. If the image size is 640 × 480 pixels, then the resolution of robot positioning is approximately 5 mm. In [19], a camera mounted on the ceiling is used to assess the precision of the final position of an AGV and observe the position and orientation of the AGV based on the measurement of two lamps on the AGV. The overhead global vision system is popular in robot soccer [20]. It is clear that the positioning resolution trades off against the available workspace of mobile robots: for a fixed image size, a larger viewable area means a coarser resolution.
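The resolution figure quoted from [18] follows from dividing the viewable floor area by the image size. The short calculation below reproduces it; the second call uses a hypothetical workspace twice as large in each direction to show how the resolution degrades when the same camera has to cover more floor.

```python
def ground_resolution_mm(view_width_m, view_height_m, image_width_px, image_height_px):
    """Floor distance covered by one pixel, in millimetres, along each image axis."""
    return (view_width_m * 1000.0 / image_width_px,
            view_height_m * 1000.0 / image_height_px)

# Values from the setup described in [18].
res_x, res_y = ground_resolution_mm(3.2, 2.4, 640, 480)

# Hypothetical larger workspace seen by the same 640 x 480 camera.
wide_x, wide_y = ground_resolution_mm(6.4, 4.8, 640, 480)

print(f"{res_x:.1f} mm/pixel, {res_y:.1f} mm/pixel")
print(f"{wide_x:.1f} mm/pixel, {wide_y:.1f} mm/pixel")
```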


Figure . Global Localization: a global camera on the ceiling.

Landmark-based Localization

Another common strategy is to employ various features to estimate the position and orientation of mobile robots [5]. Feature-based localization algorithms are often simpler and more reliable, particularly in dynamic environments [22]. Features such as landmarks or specific objects in a given environment play an important role in localization because such features can describe an environment in a simple and clear manner. That is, they can be used to define a set of absolute positions of landmarks such as natural landmarks, specifically designed landmarks, and geometric/topological information on a map after the feature extraction process. Active beacons (e.g., ultrasonic sensors and infrared or radio frequency (RF) nodes) can also be landmarks (see Section 2).

Landmarks used in various approaches to mobile robot localization range from artificial markers such as barcodes, through more natural objects such as a set of dining chairs, an array of partitions, and a series of ceiling lights or doors, to geometric features such as straight wall segments and corners. Indeed, the selected visual feature influences the performance of the positioning approach.

Some studies have used specific objects attached to the wall, ceiling, or floor as landmarks to determine the position and orientation of mobile robots. For example, in [4], a method based on a SIFT-based object-image matching process, a 2D affine transformation scheme, and an analytic 3D space transformation technique is used for estimating the vehicle location. This method can estimate the vehicle location and monitor objects simultaneously.

Figure . Self-localization based on a specific object. The right image is the object image. The method uses object images attached to the wall as landmarks to determine the position and orientation of mobile robots (from Chen, 2010, see ref [4]).

Other studies have used natural objects in indoor environments as landmarks to determine the position and orientation of mobile robots, and some have introduced perceptual models for Monte Carlo localization (MCL) through a 3D laser scanner to observe the ceiling. MCL matches ceiling structures such as beams, columns, air conditioners, and lighting installations against a world model consisting of line and point features.

For example, in [49]-[54], a vision-based simultaneous localization and mapping technique is proposed. This technique makes use of both line and corner features as landmarks and localizes and reconstructs 3D line and corner landmarks simultaneously within a framework based on the Kalman filter. However, the number of corner features is smaller, and therefore an error in matching landmarks can lead to large localization errors.

Figure . Corner features detected by horizontal-horizontal and horizontal-vertical lines. The corner points are used as natural landmarks (from An, 2010, see ref [52]).

In general, image processing consisting of feature extraction and landmark recognition is not sufficiently robust because human activity makes indoor environments dynamic. Moving objects in indoor environments can sometimes occlude landmarks from the camera, which can lead to a feature extraction or landmark recognition failure. In comparison with floor or wall features or landmarks, ceiling features are regular and rarely occluded. Many studies have considered various positioning methods based on ceiling features.

In monocular vision [2], the self-localization process is generally composed of the following five stages:

Figure . Landmark-based self-localization stages.

In [10], the scale-invariant feature transform (SIFT) algorithm is introduced to extract invariant features from images. An input image is convolved with 2D Gaussian functions scaled by different smoothing factors, and the local minima and maxima of the differences between adjacent smoothed images are taken as keypoints. Global vision localization can be achieved through a random sample consensus (RANSAC) approach by matching SIFT features between the current image and a database map.
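The scale-space idea can be illustrated with a one-dimensional analogue. The sketch below smooths a signal with Gaussians of increasing sigma, subtracts adjacent smoothed versions (the difference-of-Gaussians form used in Lowe's published formulation), and keeps samples that are strict extrema among their neighbors in position and scale. It is only an illustration under simplifying assumptions: a 1D signal, no octaves or downsampling, no contrast threshold, and no sub-sample refinement. All names and values are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at three standard deviations."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def dog_extrema(signal, sigmas):
    """Return (index, sigma) pairs that are strict extrema in position and scale."""
    blurred = [smooth(signal, s) for s in sigmas]
    dogs = [[b - a for a, b in zip(blurred[k], blurred[k + 1])]
            for k in range(len(sigmas) - 1)]
    found = []
    for k in range(1, len(dogs) - 1):
        for i in range(1, len(signal) - 1):
            value = dogs[k][i]
            neighbors = [dogs[kk][ii]
                         for kk in (k - 1, k, k + 1)
                         for ii in (i - 1, i, i + 1)
                         if (kk, ii) != (k, i)]
            if value > max(neighbors) or value < min(neighbors):
                found.append((i, sigmas[k]))
    return found

# A flat signal with one bump of width 7 centered on sample 30.
signal = [0.0] * 61
for i in range(27, 34):
    signal[i] = 1.0

keypoints = dog_extrema(signal, [1.0, 1.6, 2.56, 4.1, 6.55])
print(keypoints)
```

In this toy signal the bump center is reported as a keypoint, and the sigma attached to it reflects the size of the bump, which is the property that makes such keypoints scale invariant.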

Previous studies have compared various image-matching algorithms [59]. Detecting features and matching images represent two important tasks in photogrammetry, and their application has increased in a wide range of fields. From simple photogrammetry tasks such as feature recognition to the development of sophisticated 3D modeling software, there are several applications in which image-matching algorithms play an important role. In addition, this has been a very active area of research in recent decades, as indicated by the tremendous amount of work on this topic. With needs changing and becoming more demanding, researchers have increasingly focused on developing new technologies that can satisfy such needs.

This paper's algorithms and respective images are classified into feature- and texture-based categories. Based on this broad classification, only three of the most widely used algorithms are assessed: SIFT, SURF, and FAST [59]. Here FAST is the only one belonging to the feature-based category. When an evaluated image corresponds to the cluttered background or a considerably busy scene, texture-based algorithms can detect a large number of features and matches.

Feature-based Algorithm (FAST)

Feature detection efforts started with Harris and Stephens, leading to the Harris Corner Detector I [59], which can successfully detect robust features in any given image, meeting basic requirements. However, because it can detect only corners, it lacks the connectivity of feature points, a major limitation in terms of obtaining major descriptors such as surfaces and objects. As a result, points detected using this method do not have the level of invariance required for obtaining reliable image matching and 3D reconstructions.

In 1997, almost a decade after the Harris Detector I, a new corner detection algorithm called FAST was introduced [59]. FAST prioritizes the detection of corners over edges because it assumes that corners are one of the most intuitive types of features that show clear changes in the 2D intensity and thus that they can be better distinguished from neighboring points. FAST also modifies the Harris Detector I such that the computational time for the algorithm is reduced without compromising the results. A key limitation of most corner detectors is that they are not effective in detecting corners if the background of an image is highly cluttered. This is because such detectors are based only on the analysis of a pixel and its neighbors, with no additional filtering process, which can sometimes lead to detection errors.
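To show what "analysis of a pixel and its neighbors" can look like in practice, the sketch below implements the segment test in the form in which FAST (features from accelerated segment test) is commonly described in the literature: a pixel is a corner candidate if enough contiguous pixels on a 16-pixel circle around it are all brighter or all darker than the center by a threshold. This form is an assumption brought in for illustration and is not taken from [59]; the threshold of 20 and the run length of 9 are likewise illustrative, and the caller is assumed to stay at least three pixels away from the image border.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(image, x, y, threshold=20, n_contiguous=9):
    """Segment test on a grayscale image given as a list of rows."""
    center = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):                      # +1: brighter ring, -1: darker ring
        flags = [sign * (value - center) > threshold for value in ring]
        run = 0
        for flag in flags + flags:            # doubled list handles wrap-around
            run = run + 1 if flag else 0
            if run >= n_contiguous:
                return True
    return False

# Synthetic 7x7 test images: a bright quadrant (corner), a straight edge, a flat patch.
corner_img = [[100 if (x >= 3 and y >= 3) else 0 for x in range(7)] for y in range(7)]
edge_img = [[100 if x >= 3 else 0 for x in range(7)] for y in range(7)]
flat_img = [[50] * 7 for _ in range(7)]

print(is_fast_corner(corner_img, 3, 3),
      is_fast_corner(edge_img, 3, 3),
      is_fast_corner(flat_img, 3, 3))
```

Because the decision uses only the ring of pixels around the candidate, there is no later filtering stage to reject responses caused by busy texture, which is consistent with the limitation noted above.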

Scale Invariant Feature Transform (SIFT)

Scale Invariant Feature Transform (SIFT) [10] was developed by David Lowe in 2004 as a continuation of his previous work on invariant feature detection (Lowe, 1999). It presents a method for detecting distinctive invariant features from images that can later be used to perform reliable matching between different views of an object or scene. Two key concepts are used in this definition: distinctive invariant features and reliable matching. What makes Lowe's features more suited to reliable matching than those obtained from any previous descriptor? According to Lowe's explanation, the answer lies in the cascade filtering approach used to detect the features, which transforms image data into scale-invariant coordinates relative to local features.

This approach is what Lowe has named SIFT [10], and it is broken down into four major computational stages:

Figure . Flow chart of SURF algorithm.

Each of these stages is executed in descending order (which is why it is referred to as a cascade approach), and at every stage a filtering process is applied so that only the keypoints that are robust enough are allowed to pass to the next stage. According to Lowe [10], this significantly reduces the cost of detecting the features. However, researchers who tested the SIFT algorithm stated that although SIFT seemed to be the more appealing descriptor, the 128 dimensions of the descriptor vector make feature detection a relatively expensive process.

Speeded Up Robust Features (SURF)

After Lowe, Ke and Sukthankar (2004) applied principal component analysis (PCA) to the normalized gradient patch instead of using histograms. They showed that PCA-based local descriptors were also distinctive and robust to image deformations. However, the methods for extracting robust features were still very slow. Bay and Tuytelaars (2006) proposed speeded up robust features (SURF) [11], which use integral images for image convolutions and a Fast-Hessian detector (see the figure on the SIFT family).

SURF was conceived to ensure high speed in three of the feature detection steps: detection, description, and matching [58]. Unlike PCA-SIFT, SURF sped up SIFT's detection process without sacrificing the quality of the detected points. The reason why SURF is capable of detecting image features at the same level of distinctiveness as SIFT and at the same speed as PCA-SIFT is explained by its authors as follows:

An entire body of work is available on speeding up the matching step. All of them come at the expense of getting an approximate matching. Complementary to the current approaches we suggest the use of the Hessian matrix's trace to significantly increase the matching speed. Together with the descriptor's low dimensionality, any matching algorithm is bound to perform faster [59].


Comparison of Image Matching Algorithms

In general, FAST detects significantly smaller numbers of features and substantially fewer matches than SIFT or SURF. Here mistakes in detecting landmarks can lead to large localization errors.

The SIFT and SURF algorithms employ slightly different methods for detecting features [58]. SIFT [10] builds image pyramids, filtering each layer with Gaussians of increasing sigma values and taking the difference, whereas SURF creates a "stack" without 2:1 downsampling for higher levels in pyramids, resulting in images with the same resolution. Because of the use of integral images, SURF filters stacks through the box filter approximation of second-order Gaussian partial derivatives. This is because integral images allow for the computation of rectangular box filters in near constant time.
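The constant-time property of integral images can be seen in a few lines. The sketch below builds a summed-area table with one extra row and column of zeros and then evaluates the sum over any axis-aligned rectangle with four lookups, regardless of the rectangle's size. The function names and the small test image are illustrative only.

```python
def integral_image(image):
    """Summed-area table with a leading row and column of zeros."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def box_sum(table, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)

whole = box_sum(table, 0, 0, 3, 3)         # every pixel
lower_right = box_sum(table, 1, 1, 3, 3)   # the 2x2 block 5, 6, 8, 9
print(whole, lower_right)
```

A box filter response is a weighted combination of a few such rectangle sums, so its cost does not depend on the filter size. This is why SURF can enlarge the filter instead of shrinking the image.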

In the keypoint matching step [1], the nearest neighbor is defined as the keypoint with the minimum Euclidean distance for the invariant descriptor vector. Lowe [10] used a more effective measure obtained by comparing the distance to the closest neighbor with the distance to the second-closest neighbor. In this comparison, SURF uses 0.5 as the distance ratio, as in the case of SIFT.
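A minimal sketch of this nearest-neighbor matching with the distance-ratio check follows. The brute-force search and the tiny two-dimensional "descriptors" are simplifications for illustration (real SIFT and SURF descriptors have many more dimensions), and the function name is hypothetical; the ratio of 0.5 is the value quoted above.

```python
import math

def match_descriptors(query, train, ratio=0.5):
    """Return (query_index, train_index) pairs that pass the distance-ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # Distances from this query descriptor to every train descriptor, sorted.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue                      # the ratio test needs two neighbors
        (best, best_idx), (second, _) = dists[0], dists[1]
        # Accept only if the closest neighbor is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches

train = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query = [(0.5, 0.0),    # close to train[0] only: unambiguous
         (5.0, 0.0)]    # equally far from train[0] and train[1]: ambiguous
matches = match_descriptors(query, train)
print(matches)
```

The second query point is rejected even though it has a nearest neighbor, because the runner-up is just as close. This is how the ratio test suppresses ambiguous matches.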

Figure . SIFT family. PCA-SIFT applies principal component analysis (PCA) to the normalized gradient patch. CSIFT is used for color images. The ASIFT image-matching algorithm extends the SIFT method to a fully affine invariant device.

The SIFT, PCA-SIFT, and SURF algorithms [58] are the most widely used ones in the field of computer vision. Their efficiency and robustness have been verified for invariant feature localization. Some studies have compared these three algorithms.

Table . Comparison of SIFT, PCA-SIFT and SURF.

The table comparing SIFT, PCA-SIFT, and SURF shows that there is no single best method for all deformations [58]. SIFT's matching success can be attributed to its feature representation, which is carefully designed to be robust to localization errors. As discussed in other research, the PCA approach is known to be sensitive to registration errors. Using a small number of dimensions can provide substantial benefits in terms of the storage space and matching speed. SURF demonstrates its stability and rapid processing in experiments. It is known that the "Fast-Hessian" detector, which is used in SURF, is three times faster than DoG, which is used in SIFT, and five times faster than the Hessian-Laplace. As shown in the table, there is a need for PCA-SIFT to improve its blur and scale performance. SIFT is stable in all experiments except with respect to time, because it detects a large number of keypoints and finds numerous matches. SURF is fast and effective in most situations except when the rotation is large.

Vision-based Self-Localization

Generally, feature-based methods are often very efficient, provided that some features can be found. The position of the robot can be determined from the information carried by these features. In this section, we describe our relative localization system, which is shown in the figure on vision-based self-localization. A series of interest points I (feature points) is extracted using image processing. Each interest point has three coordinates: a global coordinate and two relative coordinates in the two images (before and after moving). The relative self-localization is calculated using the changes in the coordinates of the interest points.

Figure . Vision based self-localization. The global coordinate origin OG, the robot's current posture Xe, the interest point Ii.
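One generic way to turn coordinate changes of matched interest points into a robot motion estimate is a least-squares rigid alignment in the plane. The sketch below is such a stand-in, not the algorithm developed later in this paper: it assumes static interest points on a planar ceiling, coordinates already expressed in metric robot-frame units, and correct matches. All function names are hypothetical.

```python
import math

def estimate_rigid_2d(before, after):
    """Least-squares rotation phi and shift s such that after ~= R(phi) * before + s."""
    n = len(before)
    pcx = sum(p[0] for p in before) / n
    pcy = sum(p[1] for p in before) / n
    qcx = sum(q[0] for q in after) / n
    qcy = sum(q[1] for q in after) / n
    cos_sum = 0.0
    sin_sum = 0.0
    for (px, py), (qx, qy) in zip(before, after):
        ax, ay = px - pcx, py - pcy          # centered point before moving
        bx, by = qx - qcx, qy - qcy          # centered point after moving
        cos_sum += ax * bx + ay * by
        sin_sum += ax * by - ay * bx
    phi = math.atan2(sin_sum, cos_sum)
    sx = qcx - (math.cos(phi) * pcx - math.sin(phi) * pcy)
    sy = qcy - (math.sin(phi) * pcx + math.cos(phi) * pcy)
    return phi, (sx, sy)

def robot_motion(before, after):
    """Robot rotation theta and translation t, expressed in the frame before moving.

    Static points seen from a moved robot obey after = R(-theta) * (before - t),
    so theta = -phi and t = -R(theta) * s.
    """
    phi, (sx, sy) = estimate_rigid_2d(before, after)
    theta = -phi
    c, s = math.cos(theta), math.sin(theta)
    return theta, (-(c * sx - s * sy), -(s * sx + c * sy))

def observe(points, theta, t):
    """Coordinates of static points as seen after the robot moved by (theta, t)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * (x - t[0]) + s * (y - t[1]), -s * (x - t[0]) + c * (y - t[1]))
            for x, y in points]

before = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (2.0, 2.0)]
after = observe(before, theta=0.3, t=(1.0, 0.5))
theta_est, t_est = robot_motion(before, after)
print(theta_est, t_est)
```

With noise-free matches the simulated motion is recovered; with real image data, mismatched points would first need to be rejected, for example with a RANSAC-style check.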


This paper is divided into seven chapters:

Chapter 1 discusses the development status of robots and the motivation for self-localization and then describes some methods for self-localization and image-matching algorithms.

Chapter 2 discusses the task environment of robots and the hardware and software design for self-localization.

Chapter 3 focuses on the methods for detecting interest points by using SURF.

Chapter 4 analyzes the orientation of interest points and the extraction of descriptors by using SURF and proposes a new algorithm for calculating the orientation and descriptor of interest points.

Chapter 5 designs a relative self-localization algorithm based on matched interest points.

Chapter 6 shows some simulation results based on SURF and presents the results for the proposed algorithm.

Chapter 7 presents the conclusions and suggestions for future research on image processing.