An Edge-Based Algorithm for Moving Object Detection, Matching and Tracking


In this paper, we propose a fast and robust approach to the detection, matching and tracking of moving objects. Our method uses lines computed from a gradient-based optical flow field and a Canny edge detector. Edges extracted by the edge detector are stored as lines together with their optical-flow velocities, and background lines of the previous frame are subtracted. Gradient-based optical flow and edges are well suited to accurate computation of velocity, yet little attention has been paid to building systems that detect and track objects using this combination of features. Matched objects are tracked, and each tracked object carries a state used for handling special cases. The proposed scheme is applied to the detection, matching and tracking of vehicles and is simulated in MATLAB.

1. Introduction

Object detection, matching and tracking of moving objects have applications in a wide variety of fields. Motion detection is a well-studied problem in computer vision, with two broad families of approaches: region-based and boundary-based. For motion detection without models, the most popular region-based approaches are background subtraction and optical flow. Background subtraction detects moving objects by subtracting an estimated background model from each image.

Object matching in two-dimensional (2-D) images has been an important topic in computer vision, object recognition, and image analysis [1]. The performance of a matching method depends on the type of features used, the matching criterion, and so on. Low-level matching algorithms, i.e., algorithms using a distance transform (DT) [2] or a Hausdorff distance (HD) [3], have been investigated because they are simple and insensitive to changes of image characteristics. In this paper, we apply the robust statistics of regression analysis [4] to the computation of HD measures for object matching, resulting in two robust HD measures: M-HD, based on M-estimation, and least trimmed square HD (LTS-HD), based on LTS. The two proposed robust approaches yield correct results even when the input data are severely corrupted.

The goal of the tracking module is to track the targets detected by the motion detection stage as long as they remain visible in the camera's field of view. This is critical for obtaining tracks that reflect the motion characteristics of the tracked object over longer durations. In this paper, the problem of detecting, matching and tracking a moving object is addressed.

This paper is organized as follows. Sections 2 and 3 give details of detecting moving edges using a gradient-based optical flow technique and a Canny edge detector, respectively. Matching of detected objects using robust HD measures is presented in Section 4. Section 5 deals with methods of tracking matched objects. Experimental results and conclusions are described in Sections 6 and 7, respectively.


2. Gradient-Based Optical Flow

For the computation of the optical flow of 2-D image motion, the following approximation is used:

I(x, t) ≈ I(x + dx, t + dt)    (1)

where I(x, t) is the spatiotemporal image intensity function. The intensity at times t and t + dt is assumed to remain the same, i.e., no illumination changes are supposed to occur. Expanding (1) in a Taylor series and ignoring higher-order terms gives:

∇Iᵀ·v + It = 0    (2)

where ∇I = (Ix, Iy)ᵀ is the spatial gradient, It is the temporal derivative of I(x, t), and v = (u, v)ᵀ is the image velocity. However, one linear constraint equation is insufficient to determine the two-dimensional velocity. This problem is known as the aperture problem.

Many methods have been proposed to add further constraints. Barron et al. evaluated several of them and showed quantitatively that the method proposed by Lucas and Kanade is one of the most reliable. This method minimizes a weighted sum-of-squares error over a local spatial region. All pixels in a region are assumed to have the same velocity, and the error is written as:

E = Σ(Xi∈R) W(Xi)[∇Iᵀ(Xi, t)·v + It(Xi, t)]²    (3)

where R is a local spatial region and W(Xi) is a positive weight for each pixel of the region. To minimize the error in (3), its first-order derivative with respect to v is set to zero. The estimated velocity is then obtained as:

v̂ = −M⁻¹b    (4)

M = Σ(Xi∈R) W(Xi) ∇I(Xi, t) ∇Iᵀ(Xi, t)    (5)

b = Σ(Xi∈R) W(Xi) It(Xi, t) ∇I(Xi, t)    (6)

assuming that M is invertible. Horn and Schunck [5] applied pre-smoothing over a global region in order to simplify the optical flow problem.
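To make the estimation step concrete, the following minimal pure-Python sketch solves (4)-(6) for a single local region; the function name and the flat lists of sampled derivatives are our own illustration, not code from the paper:

```python
def lucas_kanade(Ix, Iy, It, W=None):
    """Estimate v = -M^(-1) b (eq. 4) for one local region.

    Ix, Iy, It: spatial and temporal derivatives sampled at the
    region's pixels; W: the positive per-pixel weights of eq. (3).
    """
    n = len(Ix)
    w = W or [1.0] * n                       # uniform weights by default
    # Eq. (5): M = sum W * grad(I) grad(I)^T, a 2x2 matrix
    m11 = sum(w[i] * Ix[i] * Ix[i] for i in range(n))
    m12 = sum(w[i] * Ix[i] * Iy[i] for i in range(n))
    m22 = sum(w[i] * Iy[i] * Iy[i] for i in range(n))
    # Eq. (6): b = sum W * It * grad(I)
    b1 = sum(w[i] * It[i] * Ix[i] for i in range(n))
    b2 = sum(w[i] * It[i] * Iy[i] for i in range(n))
    det = m11 * m22 - m12 * m12
    if abs(det) < 1e-12:                     # aperture problem: M singular
        return None
    # v = -M^(-1) b, using the closed-form 2x2 inverse
    u = -(m22 * b1 - m12 * b2) / det
    v = -(-m12 * b1 + m11 * b2) / det
    return u, v
```

For derivatives generated by a pure translation (u, v), brightness constancy gives It = −(Ix·u + Iy·v), and the function recovers (u, v); when all gradients in the region are parallel, M is singular and the aperture problem leaves the velocity undetermined.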

Even though the Lucas-Kanade method is more reliable than other comparable techniques, the approximation in (1) is problematic when the illumination changes. Simoncelli et al. showed that the uncertainty of the estimate in (2) decreases nonlinearly with the trace of M in (5), i.e., with the magnitude of the spatial gradients.

Our method is based on features with strong gradient magnitudes, extracted by a Canny edge detector. Edges are masked by thresholding the velocity magnitude, eliminating background edges with little motion, i.e.,

|u| + |v| > T    (7)

where T is the motion threshold. The Manhattan magnitude |u| + |v| is used to simplify the computation.
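As a sketch, the masking step can be written in a few lines of Python; the pixel-list and flow-dictionary layout is a hypothetical representation, not the paper's data structure:

```python
def moving_edges(edge_pixels, flow, T=1.0):
    """Keep edge pixels whose flow has Manhattan magnitude |u| + |v| > T.

    edge_pixels: list of (x, y) Canny edge locations;
    flow: dict mapping (x, y) -> (u, v) optical-flow velocity.
    """
    return [p for p in edge_pixels
            if abs(flow[p][0]) + abs(flow[p][1]) > T]
```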


3. Canny Edge Detector

The Canny algorithm uses an optimal edge detector based on a set of criteria: finding the most edges by minimizing the error rate, marking edges as closely as possible to the actual edges to maximize localization, and marking a single edge only once for minimal response [6].

According to Canny, the optimal filter that meets all three criteria above can be efficiently approximated using the first derivative of a Gaussian function.

G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))    (8)


The first stage involves smoothing the image by convolving with a Gaussian filter. This is followed by finding the gradient of the image by feeding the smoothed image through a convolution operation with the derivative of the Gaussian in both the vertical and horizontal directions. The 2-D convolution operation is described in the following equation.

I′(x, y) = Σ(k=−N..N) Σ(l=−N..N) g(k, l) I(x − k, y − l)    (9)


where: g(k,l) = convolutional kernel

I(x,y) = original image

I'(x,y) = filtered image

2N + 1 = size of convolutional kernel
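A direct (unoptimized) implementation of this 2-D convolution, with zero padding at the borders, might look like the following Python sketch; the row-major list-of-lists image layout is our own choice:

```python
def conv2d(img, g):
    """I'(x, y) = sum over k, l of g(k, l) * I(x - k, y - l).

    img: 2-D image as a list of rows; g: (2N+1) x (2N+1) kernel.
    Pixels outside the image are treated as zero.
    """
    N = len(g) // 2
    H, W = len(img), len(img[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            s = 0.0
            for k in range(-N, N + 1):
                for l in range(-N, N + 1):
                    yy, xx = y - k, x - l
                    if 0 <= yy < H and 0 <= xx < W:
                        s += g[k + N][l + N] * img[yy][xx]
            out[y][x] = s
    return out
```

Convolving with the identity kernel returns the image unchanged; replacing g with a sampled Gaussian gives the smoothing stage, and with the Gaussian's derivative gives the gradient stage.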

Both the Gaussian mask and its derivative are separable, allowing the 2-D convolution to be performed as two 1-D convolutions. This optimization is not limited to software implementations; it applies to hardware implementations as well. The non-maximal suppression stage finds local maxima in the direction of the gradient and suppresses all others, minimizing false edges. Each local maximum is found by comparing a pixel with its neighbors along the gradient direction. This maintains single-pixel-thin edges before the final thresholding stage.

Instead of using a single static threshold for the entire image, the Canny algorithm introduced hysteresis thresholding, which adapts to the local content of the image. There are two threshold levels, a high threshold th and a low threshold tl, with th > tl. Pixels with gradient magnitude above th are immediately classified as edges. By tracing the edge contour, neighboring pixels with gradient magnitudes below th can still be marked as edges as long as they are above tl. This alleviates edge discontinuities by identifying strong edges and preserving the relevant weak edges, while maintaining some level of noise suppression. While the results are desirable, the hysteresis stage slows the overall algorithm down considerably.

The performance of the Canny algorithm depends heavily on its adjustable parameters: σ, the standard deviation of the Gaussian filter, and the threshold values th and tl. σ also controls the size of the Gaussian filter: the larger σ, the larger the filter, which implies more blurring, necessary for noisy images, as well as detection of larger edges. As expected, however, the larger the scale of the Gaussian, the less accurate the localization of the edge. Smaller values of σ imply a smaller filter, limiting the amount of blurring and preserving finer edges in the image. The user can tune these parameters to adapt the algorithm to environments with different noise levels.
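The double-threshold tracing described above can be sketched as a simple flood fill over the gradient-magnitude map (function and variable names are illustrative):

```python
def hysteresis(mag, th, tl):
    """Mark pixels with magnitude >= th as edges, then grow each marked
    edge into 8-connected neighbors whose magnitude is >= tl."""
    H, W = len(mag), len(mag[0])
    edge = [[False] * W for _ in range(H)]
    stack = [(y, x) for y in range(H) for x in range(W) if mag[y][x] >= th]
    for y, x in stack:
        edge[y][x] = True                  # strong edges
    while stack:                           # trace contours into weak edges
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < H and 0 <= nx < W and not edge[ny][nx]
                        and mag[ny][nx] >= tl):
                    edge[ny][nx] = True
                    stack.append((ny, nx))
    return edge
```

A weak pixel (tl ≤ magnitude < th) is kept only if a chain of weak pixels connects it to some strong pixel; isolated weak responses are discarded as noise.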

Results can be further improved by performing edge detection at multiple resolutions using multi-scale representations, similar to the Marr-Hildreth algorithm [7]. This is achieved with different standard deviations, which correspond to different resolution versions of the image. Edges produce zero crossings at multiple scale values, and combining maxima information from different scales allows better classification of true edges. This may, however, prove challenging to implement in software for real-time applications.


4. Object Matching Using Robust HD Measures

This work proposes two robust HD measures based on robust statistics: M-estimation and LTS. The directed distance hM(A, B) of the proposed M-HD is obtained by replacing the Euclidean distance with a cost function that can eliminate outliers.

M-estimation minimizes the summation of a cost function ρ(·). The directed distance hM(A, B) is defined as

hM(A, B) = (1/NA) Σ(a∈A) ρ(dB(a))    (11)

where the cost function ρ is convex and symmetric and has a unique minimum value at zero. In our experiments, we use the cost function defined by

ρ(x) = |x|, if |x| ≤ τ; ρ(x) = τ, otherwise    (12)

where τ is a threshold to eliminate outliers, so outliers yielding large errors are discarded. In M-HD the matching performance depends on τ, so it is important to determine τ appropriately. If τ is set to infinity, this HD measure is equivalent to the conventional MHD. Because the cost function is applied to the distance values dB(a), a threshold value between three and five is selected experimentally.
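A brute-force sketch of the directed M-HD in Python for small point sets (the DT-map acceleration mentioned later is omitted; names are illustrative):

```python
def directed_m_hd(A, B, tau=3.0):
    """h_M(A, B): average of the clipped distances rho(d_B(a)) over a in A.

    A, B: lists of (x, y) edge points; tau: outlier-elimination threshold.
    """
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    total = 0.0
    for a in A:
        d_B = min(dist(a, b) for b in B)   # distance from a to point set B
        total += min(d_B, tau)             # clip large errors (outliers)
    return total / len(A)
```

With tau set to infinity the clipping never triggers and the measure reduces to the conventional MHD, matching the remark above.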

In the proposed LTS-HD, based on the LTS scheme, the directed distance hLTS(A, B) is defined by a linear combination of order statistics:

hLTS(A, B) = (1/H) Σ(i=1..H) dB(x)(i)    (13)

where H denotes h × NA, as in the partial HD case, and dB(x)(i) represents the ith distance value in the sorted sequence dB(x)(1) ≤ dB(x)(2) ≤ … ≤ dB(x)(NA). The measure hLTS(A, B) is computed from the distance values that remain after large distance values are eliminated, so even if the object is occluded or degraded by noise, this matching scheme yields good results. The optimal fraction h, whose range is from zero to one, depends on the amount of occlusion; if h equals one, this HD measure corresponds to the conventional MHD. In the proposed algorithms a full search is adopted for finding the optimal matching location, that is, the point yielding the minimum distance is selected, because the proposed algorithms are based on a distance measure. The computation time can be reduced by introducing efficient search algorithms such as hierarchical or pyramid schemes.
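A corresponding sketch of the directed LTS-HD, again brute force over small point sets:

```python
def directed_lts_hd(A, B, h=0.7):
    """h_LTS(A, B): mean of the H = h * N_A smallest distances d_B(a).

    Sorting the distances and discarding the largest ones makes the
    measure robust to occlusion and noise.
    """
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    dists = sorted(min(dist(a, b) for b in B) for a in A)
    H = max(1, int(h * len(A)))            # number of distances kept
    return sum(dists[:H]) / H
```

With h = 1 every distance is kept and the measure again reduces to the conventional MHD.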

The object matching algorithms based on the two robust HD measures are insensitive to outliers and occlusion because robust estimation is employed in computing the HD measures. They are also more efficient than conventional measures because of the embedded averaging operation, which makes the algorithms relatively insensitive to changes of the parameters, e.g., τ of the M-HD and h of the LTS-HD. The M-HD measure requires comparison and summation operations, whereas the LTS-HD measure requires sorting and summation operations. The partial HD requires a sorting operation as in the LTS-HD computation, and sorting can be performed with linear-time, real-time algorithms [8]. As a result, the computational complexity of the two proposed HD measures is almost the same as that of conventional HD measures such as the partial HD and the MHD. Note that the DT map can be used for fast computation of various HD measures.


5. Tracking of Matched Objects

To solve the correspondence problem for detected objects in different frames, we define the similarity between an object Rprev(m) of the previous frame and an object Rcur(m′) of the current frame using the positions of lines estimated by optical flow. Each line xi of Rprev(m) is shifted by its optical-flow velocity vi to its estimated position in the current frame:

x̂i = xi + vi·dt,  i ∈ {1, …, Nm}    (14)

Then the similarity is defined as follows:

Sim(m, m′) = (1/Nm) Σ(i=1..Nm) li    (15)

where Nm is the number of lines in Rprev(m), and

li = 1 if x̂i lies inside Rcur(m′), and li = 0 otherwise.    (16)

The two objects Rprev(m) and Rcur(m′) are considered the same object if the similarity is non-zero, i.e.,

Rprev(m) ≡ Rcur(m′), if Sim(m, m′) > 0    (17)
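The correspondence test can be sketched as follows, assuming for illustration that each previous-frame line is represented by a point with an optical-flow velocity and that the current-frame object is approximated by its bounding box; these representations are our simplification, not the paper's exact data structures:

```python
def similarity(prev_lines, prev_flow, cur_bbox):
    """Fraction of flow-shifted lines of R_prev(m) landing in R_cur(m')."""
    x0, y0, x1, y1 = cur_bbox
    hits = 0
    for (x, y), (u, v) in zip(prev_lines, prev_flow):
        ex, ey = x + u, y + v          # estimated position in current frame
        if x0 <= ex <= x1 and y0 <= ey <= y1:
            hits += 1
    return hits / len(prev_lines)
```

Two objects are matched whenever this fraction is non-zero; a value near one indicates that almost all predicted lines fall inside the candidate object.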

In addition, we define the following states of tracked objects to handle special cases:

A. Occluded

If an object disappears without reaching the borders of the frame, the object is considered to be occluded.

The center of the contour and the velocity of the center are registered in the list of occluded objects. Occluded objects are deleted from the list when all estimated positions of the contour's points move out of the frame.

B. Reappeared

Newly detected objects whose contours are entirely inside the frame (i.e., not attached to the borders of the frame) are regarded as reappeared. The corresponding occluded object is searched for in the list. The center of each occluded object is re-estimated with a Kalman filter [9] from its tracked history of positions, and the closest object is considered the same object.
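Re-identification relies on predicting the occluded center forward in time. As an illustration, a textbook 1-D constant-velocity Kalman filter along one coordinate is sketched below; the state model and noise values are generic assumptions, not parameters taken from the paper:

```python
def kalman_predict_next(zs, r=1e-4):
    """Track positions zs with a constant-velocity Kalman filter and
    return the predicted position one frame after the last measurement.

    State x = [position, velocity]; the transition advances position by
    velocity each frame; only the position is measured (variance r).
    """
    x = [zs[0], 0.0]
    P = [[1.0, 0.0], [0.0, 1e3]]           # large initial velocity uncertainty
    for z in zs[1:]:
        # predict: x <- F x, P <- F P F^T with F = [[1, 1], [0, 1]]
        x = [x[0] + x[1], x[1]]
        P = [[P[0][0] + P[0][1] + P[1][0] + P[1][1], P[0][1] + P[1][1]],
             [P[1][0] + P[1][1], P[1][1]]]
        # update with position measurement z (H = [1, 0])
        s = P[0][0] + r                    # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        y = z - x[0]                       # innovation
        x = [x[0] + k0 * y, x[1] + k1 * y]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return x[0] + x[1]                     # one-frame-ahead prediction
```

Running one such filter per coordinate of the occluded center gives the predicted 2-D position against which reappeared objects are compared.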

C. Merged

If a detected object of the current frame includes estimated lines of two or more objects of the previous frame, the detected object is regarded as merged; i.e. two or more objects of the previous frame satisfy (15) with a single object of the current frame. Each velocity of merged objects is stored for finding corresponding objects when they separate again.

D. Separated

If two or more detected objects include estimated lines of one object of the previous frame, these detected objects are regarded as separated; i.e., a single object of the previous frame satisfies (15) with two or more objects of the current frame. If the object of the previous frame is labeled as merged, corresponding objects are found according to their velocity.


6. Experimental Results

We tested our method on moving vehicles, pedestrians, approaching cars, occluded cars, and interfering objects. The sequences shown in the figures were recorded in outdoor scenes that include sky, trees, buildings and ground, and they contain several kinds of noise caused by illumination changes, small movements in the background, and reflections. Nevertheless, our results showed remarkable robustness in these environments. Our method detected the vehicle of Fig. 1(a) in the video sequence; the corresponding Canny edge detection result is shown in Fig. 1(b). The edge-detected number plate of the vehicle to be tracked (Fig. 2(a)) is then matched against the Canny edge-detected image (Fig. 2(b)). If a correct match is found, the vehicle is tracked. Figs. 3(a)-(d) show moving objects tracked accurately through the video sequence.


Fig. 1. The flow of detecting objects. (a) Original image. (b) Canny edge-detected image.



Fig. 2. Matching result of a real test image. (a) Object model (72 × 72). (b) Test image (256 × 256). (c) Matching result of (a) and (b) by the two proposed HD measures.

As a real test image, we use the 256 × 256 "road" image contaminated by Gaussian noise (standard deviation σ = 30) [10]. In addition, about 35% of the pixels in the target portion of the input image are deleted. The 72 × 72 object model and the test image are shown in Fig. 2(a) and (b), respectively. The matching result obtained by the two proposed HD measures is shown in Fig. 2(c), where the object model is superimposed on the test image. The matching position estimated by the two proposed HD measures (M-HD with τ = 3, 4, 5 and LTS-HD with h = 0.60, 0.70, 0.80) is correct, so the superimposed matching results are identical, as shown in Fig. 2(c).

On the other hand, the matching position detected by the conventional HD measures is not correct (with f = 0.6, the partial HD result is almost correct).

Results using the proposed algorithm are given in Figs. 3(a)-(d). In all figures, the red ellipse shows the vehicle position and size. Figs. 3(a) and (b) show that the proposed method finds the vehicle when it reappears from occlusion, the most ambiguous case in tracking, while Figs. 3(c) and (d) illustrate that the proposed method follows the vehicle.

Finally, from Fig. 3 we can judge the accuracy of the system in combined detection, matching and tracking.


Fig. 3. (a)-(d) Tracking of a moving object.

In our experiments we chose the following parameters: the region size for optical flow was 5 × 5, and the weight W(Xi) in (6) was 1.0, identical for all points.


7. Conclusion

We proposed a new system for the detection, matching and tracking of moving objects; existing systems have not considered all of these tasks together. Our method uses Canny-based detection, which allows users to obtain more accurate information, and it proved reliable compared with other techniques. The technique can be used in applications such as aerial and ground sensing, situational awareness, day or night operations, detection and tracking of objects in motion, and border protection and monitoring. Future work will focus on multiple object tracking.


References

[1] N. Ayache and O. D. Faugeras, "Hyper: A new approach for the recognition and positioning of two-dimensional objects," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 44-54, Jan. 1986.

[2] G. Borgefors, "Hierarchical chamfer matching: A parametric edge matching algorithm," IEEE Trans. Pattern Anal. Machine Intell., vol.10, pp. 849-865, Nov. 1988.

[3] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, "Comparing images using the Hausdorff distance," IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp. 850-863, Sept. 1993.

[4] P. Meer, D. Mintz, D. Y. Kim, and A. Rosenfeld, "Robust regression methods for computer vision: A review," Int. J. Comput. Vis., vol. 6, pp. 59-70, Apr. 1991.

[5] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.

[6] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 679-698, Nov. 1986.

[7] D. Marr and E. Hildreth, "Theory of edge detection," Proc. Royal Soc. London B, vol. 207, pp. 187-217, 1980.

[8] D. S. Richards, "VLSI median filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 145-153, Jan. 1990.

[9] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME - J. Basic Engineering, vol. 82, pp. 35-45, March 1960.

[10] R. Azencott, F. Durbin, and J. Paumard, "Multiscale identification of building in compressed large aerial scenes," in Proc. 13th Int. Conf. Pattern Recognition, Vienna, Austria, Aug. 1996, vol. 2, pp. 974-978.